With advances in localization technology, wireless localization methods can position humans rapidly. However, signal attenuation in indoor environments, caused by path loss and noise, makes wireless localization unstable. The TensorFlow Object Detection API, recently released by Google and built on an SSD neural network model, can detect people accurately. In this paper, we apply the TensorFlow Object Detection API to detect humans and transform their image coordinates to real-world positions with Inverse Perspective Mapping (IPM). This lets us track humans accurately in camera sequences and localize them rapidly, at the cost of a lower frame rate. Our experimental results show that the proposed method correctly detects humans once per second and improves positioning accuracy to within 100 cm.
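The IPM step can be illustrated with a minimal sketch. It assumes the camera has already been calibrated against the floor plane, so that a 3x3 homography maps image pixels to ground coordinates, and it assumes a person's position is taken from the bottom-center of the detected bounding box. Neither detail is stated in the abstract. The names `foot_point`, `image_to_ground`, and `H_EXAMPLE`, and the matrix values, are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def foot_point(box: Tuple[float, float, float, float]) -> Point:
    """Return the bottom-center pixel of a box (x_min, y_min, x_max, y_max).

    Assumption: the feet touch the floor here, so this pixel lies on the
    ground plane that the homography was calibrated for.
    """
    x_min, _y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)


def image_to_ground(H: Matrix3, pixel: Point) -> Point:
    """Map an image pixel (u, v) to ground-plane coordinates (X, Y).

    Applies the homography in homogeneous coordinates and then divides
    by the third component to undo the perspective scaling.
    """
    u, v = pixel
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel lies on the horizon line and has no ground position")
    return (x / w, y / w)


# Hypothetical homography (pixels -> meters); a real one comes from calibration,
# for example from four floor markers with known world positions.
H_EXAMPLE: Matrix3 = [
    [0.01, 0.0, -3.2],
    [0.0, 0.02, -4.8],
    [0.0, 0.001, 1.0],
]

# Example: a detection box in pixel coordinates, mapped to the floor plane.
detection_box = (300.0, 120.0, 340.0, 400.0)
ground_position = image_to_ground(H_EXAMPLE, foot_point(detection_box))
```

In practice the bounding box would come from the detector's output for the person class, and the homography would be estimated once per fixed camera.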