With the growing number of mobile communication devices, spectrum resources have become insufficient. The traditional spectrum allocation model exacerbates the problem by underutilizing idle spectrum. We propose a dynamic spectrum allocation scheme based on Tabu-Q learning. First, the spectrum allocation problem is formulated as a continuous Markov decision process (MDP), subject to the power constraints of primary users (PUs) and secondary users (SUs), and Tabu-Q learning is applied to adjust the optimization strategy so as to maximize the total transmission rate of users. Second, cooperative learning is added to the scheme to speed up convergence: new users are allowed to learn from the experience of existing users, which speeds up spectrum sensing and shortens the execution time of the algorithm. Finally, the mean opinion score (MOS) is used to evaluate different traffic types. Simulations show that, for the same number of users, Tabu-Q learning improves the transmission bit rate by about 13% compared with Q-learning and keeps the MOS above the acceptable level (MOS > 3). In summary, the proposed scheme effectively improves the utilization of idle spectrum.
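
To make the Tabu-Q idea more concrete, the following is a minimal sketch, not the scheme evaluated in this paper. It assumes one plausible reading of Tabu-Q learning: ordinary Q-learning whose exploration step skips actions held in a short tabu list of recently used channels. The function names (`choose_action`, `q_update`, `train`), the toy channel model, the state definition, the reward values, and all hyperparameters are hypothetical choices made for illustration. Power constraints, cooperative learning, and MOS are left out.

```python
import random
from collections import deque


def choose_action(q_row, tabu, epsilon, rng):
    """Epsilon-greedy choice whose exploration step skips tabu actions."""
    actions = range(len(q_row))
    if rng.random() < epsilon:
        # Explore among non-tabu actions; fall back to all if none remain.
        allowed = [a for a in actions if a not in tabu] or list(actions)
        return rng.choice(allowed)
    # Exploit: pick the action with the highest Q-value (first one on ties).
    return max(actions, key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def train(n_channels=4, busy=(0, 2), steps=2000, epsilon=0.2,
          tabu_size=2, seed=0):
    """Toy run with a single secondary user (all settings are assumptions).

    State  = channel used in the previous step.
    Action = channel to use next.
    Reward = +1 for an idle channel, -1 for one occupied by a primary user.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_channels for _ in range(n_channels)]
    tabu = deque(maxlen=tabu_size)  # most recently used channels
    state = 0
    for _ in range(steps):
        action = choose_action(q[state], tabu, epsilon, rng)
        tabu.append(action)
        reward = -1.0 if action in busy else 1.0
        q_update(q, state, action, reward, next_state=action)
        state = action
    return q


if __name__ == "__main__":
    for row in train():
        print([round(v, 2) for v in row])
```

In this sketch the tabu list only restricts exploration, so a channel that was just tried is not sampled again immediately, while the greedy step is left unchanged. Other designs, such as applying the tabu list to (state, action) pairs or to the greedy step as well, are equally conceivable.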