Traditional motion posture recognition methods cannot capture the temporal relationships in a video sequence, so they perform poorly on time-dependent behaviors. To address this, this paper proposes a cascade attention-based spatial-temporal convolutional neural network for motion posture recognition. First, a convolutional neural network models the temporal relationships in the video so that spatial-temporal information is captured efficiently. A cascade attention mechanism then compensates for the weakened learning of spatial features caused by shifting channel information along the time axis. In addition, a new spatial-temporal network structure is constructed, comprising a spatial-temporal appearance information flow and a spatial-temporal motion information flow. Finally, the outputs of the two spatial-temporal networks are fused by weighted averaging to obtain the final recognition result. Experiments on the UCF101 and HMDB51 datasets yield recognition accuracies of 96.8% and 79.6%, respectively. The results show that the proposed method achieves higher recognition accuracy and better robustness than state-of-the-art network methods.
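The final fusion step can be illustrated with a minimal sketch of weighted-average late fusion over two streams. This is not the implementation used in this paper. The function names, the softmax normalization of each stream's scores, the example logits, and the weight `w_appearance` are all assumptions made for illustration, since the abstract does not state the fusion weights.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_weighted_average(appearance_probs, motion_probs, w_appearance=0.5):
    """Weighted average of two per-class probability vectors.

    w_appearance is a hypothetical weight in [0, 1]; the motion stream
    receives the remaining weight 1 - w_appearance.
    """
    if len(appearance_probs) != len(motion_probs):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= w_appearance <= 1.0:
        raise ValueError("w_appearance must lie in [0, 1]")
    w_motion = 1.0 - w_appearance
    return [
        w_appearance * a + w_motion * m
        for a, m in zip(appearance_probs, motion_probs)
    ]


def predict_class(appearance_logits, motion_logits, w_appearance=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_weighted_average(
        softmax(appearance_logits), softmax(motion_logits), w_appearance
    )
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for three classes from each stream.
appearance_logits = [2.0, 1.0, 0.1]
motion_logits = [0.5, 2.5, 0.3]
equal_weight_prediction = predict_class(appearance_logits, motion_logits, 0.5)
appearance_heavy_prediction = predict_class(appearance_logits, motion_logits, 0.9)
```

Because both inputs are probability vectors and the weights sum to one, the fused vector is itself a valid probability distribution. In this example the two streams disagree, so the predicted class depends on the chosen weight, which in practice would be selected on a validation set.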