Abstract
Among all privacy attacks against Machine Learning (ML), membership inference attacks (MIA) have attracted the most attention. In these attacks, the attacker is given an ML model and a data point and must infer whether the data point was used for training. The attacker also has an auxiliary dataset with which to tune their inference algorithm.
Attack papers commonly simulate setups in which the attacker’s and the target’s datasets are sampled from the same distribution. This setting is convenient for experiments, but it rarely holds in practice. ML literature commonly starts with similar simplifying assumptions (i.e., “i.i.d.” datasets) and later generalizes the results to support heterogeneous data distributions. Similarly, our work takes a first step toward generalizing MIA evaluation to heterogeneous data.
First, we design a metric to measure the heterogeneity between any pair of tabular data distributions. This metric provides a continuous scale on which to analyze the phenomenon. Second, we compare two methods to simulate data heterogeneity between the target and the attacker. These setups yield opposite results: 90% attack accuracy vs. 50% (i.e., random guessing). Our results show that MIA accuracy depends on the experimental setup, and although research on MIA considers heterogeneous data setups, there is no standardized baseline for simulating them. The lack of such a baseline for MIA experiments poses a significant challenge to risk assessments in real-world machine learning scenarios.
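The attack setting described above can be illustrated with a minimal sketch. It is not the authors' attack, metric, or experimental setup: it assumes a simple loss-threshold attack in which the attacker guesses "member" when the model's loss on a data point is low, and tunes the threshold on auxiliary data. The function names `attack_accuracy` and `tune_threshold` are hypothetical, and all loss values are made-up toy numbers chosen only to show how a threshold tuned on one distribution can fail to transfer to another.

```python
from typing import Sequence


def attack_accuracy(member_losses: Sequence[float],
                    nonmember_losses: Sequence[float],
                    threshold: float) -> float:
    """Fraction of correct guesses when predicting 'member' iff loss <= threshold."""
    correct = sum(loss <= threshold for loss in member_losses)
    correct += sum(loss > threshold for loss in nonmember_losses)
    return correct / (len(member_losses) + len(nonmember_losses))


def tune_threshold(member_losses: Sequence[float],
                   nonmember_losses: Sequence[float]) -> float:
    """Pick the candidate threshold with the best accuracy on the attacker's auxiliary data."""
    candidates = sorted(set(member_losses) | set(nonmember_losses))
    return max(
        candidates,
        key=lambda t: attack_accuracy(member_losses, nonmember_losses, t),
    )


# Toy auxiliary losses (assumed, not measured): the attacker knows membership here.
aux_member = [0.05, 0.10, 0.12, 0.20]
aux_nonmember = [0.60, 0.75, 0.90, 1.10]
threshold = tune_threshold(aux_member, aux_nonmember)

# Toy target whose losses resemble the auxiliary data (same-distribution assumption).
same_member = [0.08, 0.15, 0.18, 0.11]
same_nonmember = [0.70, 0.85, 0.65, 1.00]
acc_same = attack_accuracy(same_member, same_nonmember, threshold)

# Toy target whose losses are shifted upward (a stand-in for heterogeneous data).
shifted_member = [0.40, 0.55, 0.35, 0.50]
shifted_nonmember = [0.95, 1.20, 1.05, 1.40]
acc_shifted = attack_accuracy(shifted_member, shifted_nonmember, threshold)

print(f"threshold={threshold}, same={acc_same}, shifted={acc_shifted}")
```

In this toy example the tuned threshold separates members from non-members when the target resembles the auxiliary data, but labels every shifted point as a non-member. This only illustrates why the choice of simulation setup matters; it says nothing about the magnitudes reported in the paper.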
| Original language | English |
|---|---|
| Title of host publication | Applied Cryptography and Network Security Workshops |
| Pages | 109-117 |
| ISBN (Electronic) | 978-3-032-01823-6 |
| Publication status | Published - 25 Oct 2025 |
| Event | 23rd International Conference on Applied Cryptography and Network Security, ACNS 2025, Munich, Germany. Duration: 23 Jun 2025 → 26 Jun 2025. Conference number: 23 |
Publication series
| Name | Applied Cryptography and Network Security Workshops |
|---|---|
| Volume | 15655 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 23rd International Conference on Applied Cryptography and Network Security, ACNS 2025 |
|---|---|
| Abbreviated title | ACNS 2025 |
| Country/Territory | Germany |
| City | Munich |
| Period | 23/06/25 → 26/06/25 |
Keywords
- 2026 OA procedure
Research output
- 1 Preprint
- Evaluating Membership Inference Attacks in heterogeneous-data setups. van Dartel, B., Damie, M. & Hahn, F., 26 Feb 2025, ArXiv.org. Research output: Working paper › Preprint › Academic. Open Access.