This article examines object segmentation methods based on contextual semantic aggregation. It compares DeepLab V2 and DeepLab V3+ on the VOC2012 dataset and modifies DeepLab V3+ to reduce its parameter count while keeping segmentation accuracy at an acceptable level. Object segmentation is a core task in computer vision that aims to separate target objects from the background of an image. Advances in deep learning have significantly improved segmentation performance, especially through convolutional neural network (CNN)-based methods such as DeepLab V2 and DeepLab V3+. These methods use atrous convolution and multi-scale pooling to capture object information at several scales, and DeepLab V3+ builds on multi-scale atrous convolution to further improve segmentation accuracy. However, DeepLab V3+ has a relatively high parameter count, which limits its efficiency. We therefore propose an improved method based on MobileNetV2 that adjusts the combination of sampling rates in the Atrous Spatial Pyramid Pooling (ASPP) module and incorporates the Convolutional Block Attention Module (CBAM), a mixed channel and spatial attention mechanism. In addition, we add parallel self-attention mechanisms to the decoder of the DeepLab V3+ structure and restructure feature fusion as a multi-branch feature fusion module. These modifications reduce the parameter count of DeepLab V3+ while maintaining segmentation accuracy at an acceptable level. Experimental results indicate that the improved DeepLab V3+ is both efficient and accurate, making it well suited to object segmentation tasks.
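Two ideas in the summary above can be illustrated with a small, library-free sketch: atrous convolution widens the receptive field without adding weights, and the depthwise separable convolutions that MobileNetV2 relies on need far fewer parameters than standard convolutions. The helper names, the 1-D toy signal, the rates (6, 12, 18), and the 256-channel layer size are illustrative assumptions. They are not the adjusted rate combination or the layer configuration used in this article.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous (dilated) convolution in cross-correlation form.

    Kernel taps are spaced `rate` samples apart; rate=1 is an ordinary convolution.
    """
    span = (len(kernel) - 1) * rate  # distance between the first and last tap
    return [
        sum(w * signal[i + j * rate] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def effective_kernel_size(kernel_size, rate):
    """Width covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    signal = list(range(10))   # toy ramp signal (assumption)
    kernel = [1, 0, -1]        # three weights regardless of the rate

    for rate in (1, 2):
        print("rate", rate, "->", atrous_conv1d(signal, kernel, rate))

    # Illustrative ASPP-style rates: same 3 weights per axis, wider context.
    for rate in (6, 12, 18):
        print("rate", rate, "covers", effective_kernel_size(3, rate), "positions")

    # Illustrative 3 x 3 layer with 256 input and 256 output channels.
    print("standard :", conv_params(256, 256, 3))
    print("separable:", separable_conv_params(256, 256, 3))
```

In this sketch the kernel always holds three weights, so raising the rate only changes which input positions are sampled. This is why parallel branches with different rates can gather context at several scales at a fixed parameter cost per branch.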