English Abstract |
Differential privacy, proposed by Dwork, is a privacy protection model based on data distortion. Because the model makes no assumptions about the attacker's prior knowledge, it has become a research hotspot in the field of privacy protection. The traditional differentially private K-means algorithm is sensitive to the selection of the initial center points, which reduces the usability of its clustering results. To address this problem, an improved differentially private clustering algorithm (DEDP K-means) is proposed that introduces an adaptive opposition-based learning technique and the differential evolution algorithm. The improved algorithm is also parallelized on the Spark platform. Parallel experiments demonstrate that the improved algorithm optimizes the selection of the initial centers, improves the usability of clustering results, and achieves a good speedup when processing massive data. |
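The baseline that the abstract calls "traditional differential privacy K-means" can be illustrated with a minimal DPLloyd-style sketch in plain Python. This is not DEDP K-means: the adaptive opposition-based learning, the differential evolution step, and the Spark parallelization are not reproduced. It assumes data normalized to [0, 1]^d, neighboring datasets that differ by adding or removing one record, and a privacy budget split evenly across iterations. The names `laplace_noise` and `dp_kmeans` are hypothetical. The initial centers are passed in as `init_centers`, which is the choice the abstract identifies as the weak point of the traditional algorithm.

```python
import random

# Hypothetical helper names; an illustrative sketch, not the paper's implementation.


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans(points, init_centers, epsilon, iterations, seed=None):
    """Baseline DPLloyd-style differentially private K-means (sketch).

    Assumptions (not taken from the paper): every coordinate lies in [0, 1];
    neighboring datasets differ by adding or removing one record; the total
    budget epsilon is split evenly across the iterations.
    """
    rng = random.Random(seed)
    d = len(points[0])
    eps_iter = epsilon / iterations
    # One record changes one cluster's count by 1 and its sum by at most d (L1),
    # so the per-iteration L1 sensitivity of all counts and sums is d + 1.
    scale = (d + 1) / eps_iter
    centers = [list(c) for c in init_centers]
    k = len(centers)
    for _ in range(iterations):
        counts = [0.0] * k
        sums = [[0.0] * d for _ in range(k)]
        # Assign each point to its nearest current center (squared Euclidean distance).
        for p in points:
            j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        # Release noisy counts and sums, then recompute centers from them only.
        new_centers = []
        for j in range(k):
            noisy_count = counts[j] + laplace_noise(scale, rng)
            noisy_sum = [s + laplace_noise(scale, rng) for s in sums[j]]
            if noisy_count <= 1.0:
                # Too few noisy members for a stable mean: keep the previous center.
                new_centers.append(centers[j])
            else:
                # Clipping to [0, 1] is post-processing and costs no extra budget.
                new_centers.append([min(1.0, max(0.0, s / noisy_count)) for s in noisy_sum])
        centers = new_centers
    return centers


if __name__ == "__main__":
    gen = random.Random(0)
    # Two synthetic, well-separated clusters inside the unit square.
    data = [[gen.uniform(0.0, 0.2), gen.uniform(0.0, 0.2)] for _ in range(500)]
    data += [[gen.uniform(0.8, 1.0), gen.uniform(0.8, 1.0)] for _ in range(500)]
    # The initial centers are supplied by the caller; this is the choice the
    # traditional algorithm is sensitive to.
    print(dp_kmeans(data, [[0.3, 0.3], [0.6, 0.6]], epsilon=1.0, iterations=5, seed=1))
```

Because each record falls into exactly one cluster, the counts and sums released in one iteration together have L1 sensitivity d + 1. Adding Laplace noise of scale (d + 1)/ε_iter to each value therefore gives ε_iter-differential privacy per iteration, and sequential composition over the iterations gives ε overall.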