Repeated Knowledge Distillation with Confidence Masking to Mitigate Membership Inference Attacks

Federico Mazzone*, Leander van den Heuvel, Maximilian Huber, Cristian Verdecchia, Maarten Hinderik Everts, Florian Hahn, Andreas Peter

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



Machine learning models are often trained on sensitive data, such as medical records or bank transactions, posing high privacy risks. In fact, membership inference attacks can use the model parameters or predictions to determine whether a given data point was part of the training set. One of the most promising mitigations in the literature is Knowledge Distillation (KD). This mitigation consists of first training a teacher model on the sensitive private dataset, and then transferring the teacher's knowledge to a student model by means of a surrogate dataset. The student model is then deployed in place of the teacher model. Unfortunately, KD on its own does not give users much flexibility, in the sense of being able to arbitrarily decide how much utility to trade for membership privacy. To address this problem, we propose a novel approach that combines KD with confidence score masking. Concretely, we repeat the distillation procedure multiple times in series and, during each distillation, perturb the teacher predictions using confidence masking techniques. We show that our solution provides more flexibility than standard KD, as it allows users to tune the number of distillation rounds and the strength of the masking function. We implement our approach in a tool, RepKD, and assess our mitigation against white- and black-box attacks on multiple models and datasets. Even when the surrogate dataset is different from the private one (which we believe to be a more realistic setting than is commonly found in the literature), our mitigation is able to make the black-box attack completely ineffective and significantly reduce the accuracy of the white-box attack at the cost of only 0.6% test accuracy loss.
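The repeated-distillation idea from the abstract can be sketched as follows. This is a minimal illustration, not the RepKD implementation: it assumes a simple top-k confidence masking function (the paper's exact masking choices may differ), and `teacher_predict` and `train_student` are hypothetical placeholders for the user's own model-inference and training routines.

```python
import numpy as np

def mask_confidences(probs, k=1):
    """One possible masking function: keep only each prediction's top-k
    confidence scores, zero out the rest, and renormalize to sum to 1.
    Smaller k means stronger masking (less information leaked)."""
    probs = np.asarray(probs, dtype=float)
    masked = np.zeros_like(probs)
    top = np.argsort(probs, axis=-1)[:, -k:]        # indices of top-k classes per row
    rows = np.arange(probs.shape[0])[:, None]
    masked[rows, top] = probs[rows, top]
    return masked / masked.sum(axis=-1, keepdims=True)

def repeated_distillation(teacher_predict, surrogate_x, train_student,
                          rounds=3, k=1):
    """Chain several distillations in series: each round, the current model
    labels the surrogate data with masked soft predictions, and a fresh
    student is trained on them. `train_student(x, soft_labels)` must return
    a callable that maps inputs to class-probability predictions."""
    model = teacher_predict
    for _ in range(rounds):
        soft_labels = mask_confidences(model(surrogate_x), k=k)
        model = train_student(surrogate_x, soft_labels)
    return model                                    # final student, deployed in place of the teacher
```

Both knobs the abstract mentions appear here: `rounds` (number of distillation rounds) and `k` (masking strength), letting the user trade utility for membership privacy.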

Original language: English
Title of host publication: AISec 2022 - Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security, co-located with CCS 2022
Publisher: Association for Computing Machinery
Number of pages: 12
ISBN (Electronic): 9781450398800
Publication status: Published - 11 Nov 2022
Event: 15th ACM Workshop on Artificial Intelligence and Security, AISec 2022 - Los Angeles, United States
Duration: 11 Nov 2022 - 11 Nov 2022
Conference number: 15


Conference: 15th ACM Workshop on Artificial Intelligence and Security, AISec 2022
Abbreviated title: AISec 2022
Country/Territory: United States
City: Los Angeles
Other: Co-located with CCS 2022


  • Confidence score masking
  • Defense
  • Knowledge distillation
  • Membership inference attack
  • Mitigation

