Abstract
VoxelNet is a classical end-to-end 3D object detection architecture that takes only the point cloud as input and generates high-precision 3D bounding boxes. However, the network processes all points in the scene identically. It ignores a key characteristic of LiDAR data: points are dense near the sensor and sparse far from it, so different points can matter differently for object detection. In addition, the heavy computation of its 3D convolutions slows inference considerably. To address these shortcomings, this paper proposes AttentionVoxelNet, which introduces an attention mechanism and sparse convolution. During voxel feature extraction, an attention network learns a weight for each point and fuses it with the point features, so that the model focuses on the features of more important points. Sparse convolution replaces the standard 3D convolution to improve detection efficiency. Finally, vehicle detection experiments on the KITTI dataset compare the detection average precision (AP) and inference speed with those of classical algorithms under the three difficulty levels. The results show that the model generates relatively accurate bounding boxes in both the point cloud and the bird's-eye view (BEV). Compared with VoxelNet, AP increases by 5.98, 12.12, and 8.47 percentage points on the easy, moderate, and hard levels, respectively, and inference time is reduced by more than half. These experiments demonstrate that introducing the attention mechanism and sparse convolution is effective, and that the proposed method offers meaningful gains over existing 3D vehicle detection architectures.
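A minimal NumPy sketch of the point-weighting idea follows. The abstract does not specify the attention network. This sketch therefore assumes a hypothetical single linear layer with a sigmoid that produces one weight per point. That weight is multiplied into the point features, and the result is max-pooled over the points of each voxel, as in a simplified VoxelNet voxel feature encoding. The function `attention_vfe` and all parameter names are illustrative, not the paper's implementation. Sparse convolution is not covered here.

```python
import numpy as np


def attention_vfe(points, w_feat, b_feat, w_att, b_att, mask):
    """Hypothetical attention-weighted voxel feature encoding (assumed design).

    points: (V, T, D) array, up to T points per voxel, D input features each.
    mask:   (V, T) boolean array, True where a real (non-padded) point exists.
    w_feat, b_feat: shared point-wise linear layer, shapes (D, C) and (C,).
    w_att, b_att:   assumed attention layer, shapes (D, 1) and (1,).
    Returns a (V, C) array with one feature vector per voxel.
    """
    # Point-wise features: shared linear layer followed by ReLU (non-negative).
    feats = np.maximum(points @ w_feat + b_feat, 0.0)      # (V, T, C)
    # One attention score per point, squashed to a weight in (0, 1).
    scores = points @ w_att + b_att                         # (V, T, 1)
    weights = 1.0 / (1.0 + np.exp(-scores))                 # (V, T, 1)
    # Fuse weights with features and zero out padded slots. Because all
    # weighted features are >= 0, zeros cannot exceed a real point's value.
    weighted = feats * weights * mask[..., None]            # (V, T, C)
    # Aggregate the points of each voxel by element-wise max-pooling.
    return weighted.max(axis=1)                             # (V, C)
```

Setting a large attention bias drives every weight toward 1. The layer then reduces to plain max-pooling of point features, which is one way to view the attention branch as a learned reweighting on top of the original encoder.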