After significant investments of time and resources, deep neural network (DNN) models have reached commercially viable accuracy and are increasingly deployed on cloud platforms for commercial services. However, the threats facing these models continue to evolve, with a growing range of attacks aimed at compromising their integrity. In particular, DNNs are vulnerable to poisoning attacks and backdoor attacks, both of which rely on malicious fine-tuning of the deployed model and can cause it to produce unpredictable outputs. Existing countermeasures often increase model complexity or degrade model performance. We propose a black-box watermarking technique based on trigger image sets that effectively detects malicious fine-tuning while also enabling copyright authentication. Building on black-box watermarking methods, our approach leverages trigger image sets and fine-tunes the model with a two-stage alternating training procedure. During training, a novel loss function optimizes the trigger images so that the watermark is embedded while the model's original classification capability is preserved. The resulting watermarked model is highly sensitive to malicious fine-tuning, which destabilizes its classification of the trigger images. The integrity of the DNN model can therefore be verified by feeding the trigger image set to the watermarked model and analyzing its outputs. Experimental results demonstrate the effectiveness of this approach in verifying the integrity of DNN models.
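To make the verification step concrete, the sketch below illustrates the general black-box trigger-set check described above: the trigger images are fed to the deployed model and the agreement with the recorded trigger labels is measured, with a low agreement rate taken as evidence of tampering. This is a minimal sketch under assumed conventions, not the paper's implementation; the PyTorch setting and the names `verify_integrity` and `match_threshold` are illustrative assumptions.

```python
# Minimal sketch of black-box trigger-set integrity verification (illustrative only).
# Assumptions: a PyTorch image classifier, a trigger set stored as
# (image, expected_label) pairs, and a simple agreement-rate threshold.

import torch
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def verify_integrity(model: torch.nn.Module,
                     trigger_images: torch.Tensor,   # shape (N, C, H, W)
                     expected_labels: torch.Tensor,  # shape (N,)
                     match_threshold: float = 0.95) -> bool:
    """Return True if the model still classifies the trigger set as recorded.

    A watermarked model that has been maliciously fine-tuned is expected to
    become unstable on the trigger images, so its agreement rate drops.
    """
    model.eval()
    loader = DataLoader(TensorDataset(trigger_images, expected_labels),
                        batch_size=64)
    correct, total = 0, 0
    for images, labels in loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    agreement = correct / max(total, 1)
    return agreement >= match_threshold
```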