Abstract: Human action recognition is one of the most popular fields of computer vision. However, traditional methods based on hand-crafted features suffer from heavy background interference and can hardly establish an accurate human model, while deep learning-based methods run slowly because of their large number of parameters. In this paper, we propose a new method that combines the two. First, we extract time-series 3D human skeleton key points using YOLO v4 and apply the Mean Shift target tracking algorithm; we then convert the key points into a spatial RGB representation and feed it into a multi-layer convolutional neural network for recognition. The method achieves a high recognition rate and fast recognition speed in a variety of environments, such as enclosed environments and public scenes. It can quickly identify abnormal behaviors such as holding guns, armed attacks, throwing, climbing, and approaching.
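The step of converting key points into a spatial RGB representation can be illustrated with a minimal sketch. The abstract does not specify the encoding, so the scheme below is an assumption: each (x, y, z) coordinate is min-max scaled per axis to 0-255 and mapped to the (R, G, B) channels, with joints as image rows and frames as image columns. The function name `skeleton_to_rgb` and the input layout `(T, J, 3)` are hypothetical and chosen only for illustration.

```python
import numpy as np


def skeleton_to_rgb(seq):
    """Encode a skeleton sequence as an RGB image (assumed encoding).

    seq: array-like of shape (T, J, 3) holding T frames, J joints,
         and (x, y, z) coordinates per joint.
    Returns a uint8 array of shape (J, T, 3): rows are joints,
    columns are frames, and channels carry the scaled x, y, z values.
    """
    seq = np.asarray(seq, dtype=np.float64)
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (T, J, 3)")

    # Per-axis minimum and maximum over all frames and joints.
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)

    # Avoid division by zero when an axis is constant.
    span = np.where(hi > lo, hi - lo, 1.0)

    # Min-max scale to [0, 1], then quantize to 8-bit pixel values.
    scaled = (seq - lo) / span
    img = np.rint(scaled * 255.0).astype(np.uint8)

    # Reorder from (T, J, 3) to (J, T, 3) so time runs along image width.
    return img.transpose(1, 0, 2)
```

An image produced this way has a fixed channel layout, so it can be resized and passed to a standard convolutional network; whether the original method normalizes per sequence, per frame, or relative to a reference joint is not stated in the abstract.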