Feature-intensity differences exist between images of different modalities; hence, complementary features are easily ignored or submerged in most multimodal image fusion algorithms. In this study, a novel two-stage fusion algorithm is proposed to reduce the loss of complementary features. In the first stage, a multiscale transform based on hybrid l0-l1 layer decomposition and Gaussian filtering decomposes the source images into structure-, large-scale-, and detail-layer images. Pre-fusion images are then generated as a linear combination of one modality and the feature-layer images of the other modality, thereby eliminating the differences between the images from different sensors. The second stage strengthens feature fusion and improves the fusion effect. The pre-fusion images are decomposed using the non-subsampled shearlet transform (NSST); the low-frequency sub-images are fused separately using principal component analysis (PCA) and local texture energy, and the two results are combined by local contrast weighting to obtain the final low-frequency fusion image. The high-frequency sub-images are fused separately using an exponential-function-based local standard deviation matching degree and a local modified-spatial-frequency weighted average, and the two results are integrated by weighted averaging to obtain the final high-frequency fusion images. Finally, the fused image is obtained by applying the inverse NSST. Experimental results show that the quality of the fused images is considerably improved and that the results are sufficiently clear.
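The sketch below is only meant to make the two-stage structure concrete; it is not the paper's implementation. It assumes two pre-registered grayscale inputs of equal size, replaces the hybrid l0-l1 layer decomposition and the NSST with simple Gaussian low/high-pass stand-ins, and uses a generic PCA rule for the low-frequency part and a local-standard-deviation weighting for the high-frequency part instead of the exact fusion rules described above. All helper names and parameters are illustrative assumptions.

```python
# Minimal two-stage fusion sketch (stand-ins, not the paper's exact method).
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter


def three_layer_decompose(img, sigma_large=5.0, sigma_small=1.0):
    """Stand-in for the hybrid l0-l1 / Gaussian decomposition:
    structure (coarse), large-scale (mid), and detail (fine) layers."""
    structure = gaussian_filter(img, sigma_large)
    mid = gaussian_filter(img, sigma_small)
    large_scale = mid - structure
    detail = img - mid
    return structure, large_scale, detail


def stage_one(img_a, img_b, alpha=0.6):
    """Pre-fusion: linearly combine one modality with feature layers of the other."""
    _, large_b, detail_b = three_layer_decompose(img_b)
    return img_a + alpha * (large_b + detail_b)


def pca_weights(low_a, low_b):
    """PCA fusion rule: weights from the principal eigenvector of the
    2x2 covariance matrix of the two low-frequency bands."""
    cov = np.cov(np.stack([low_a.ravel(), low_b.ravel()]))
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = np.abs(eigvecs[:, np.argmax(eigvals)])
    return v / v.sum()


def stage_two(pre_a, pre_b, sigma=2.0, win=7):
    """Stand-in for the NSST stage: Gaussian low/high split, PCA rule for the
    low-frequency part, local-standard-deviation weighting for the high part."""
    low_a, low_b = gaussian_filter(pre_a, sigma), gaussian_filter(pre_b, sigma)
    high_a, high_b = pre_a - low_a, pre_b - low_b

    w = pca_weights(low_a, low_b)
    low_fused = w[0] * low_a + w[1] * low_b

    def local_std(x):
        mean = uniform_filter(x, win)
        return np.sqrt(np.maximum(uniform_filter(x * x, win) - mean ** 2, 0.0))

    sa, sb = local_std(high_a), local_std(high_b)
    wa = sa / (sa + sb + 1e-12)
    high_fused = wa * high_a + (1.0 - wa) * high_b

    # With this simple split, the "inverse transform" is just low + high.
    return low_fused + high_fused


def fuse(img_a, img_b):
    pre_a = stage_one(img_a, img_b)   # modality A enriched with B's features
    pre_b = stage_one(img_b, img_a)   # modality B enriched with A's features
    return stage_two(pre_a, pre_b)
```

The point of the sketch is the data flow: each modality is first enriched with the other's feature layers (stage one), and only then are the pre-fusion images decomposed, fused band by band, and reconstructed (stage two).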