Abstract
In this paper, we tackle the problem of finding potentially problematic samples and complex regions of the input space in large pools of data without any supervision, so that these findings can be relayed to and validated by a domain expert. This information can be critical: even a low level of noise in the dataset may severely bias the model through spurious correlations between unrelated samples, and under-represented groups of data points exacerbate the issue. We therefore present two practical applications of influence functions in neural network models to industrial use cases: exploration of complex regions and cleanup of mislabeled examples in datasets. This tool from robust statistics approximates how much an estimator would change if the training dataset were slightly modified. In particular, we apply this technique to an ACAS Xu neural network surrogate model use case [14] for complex region exploration, and to the canonical CIFAR-10 RGB image classification problem [20] for mislabeled sample detection, with promising results.
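To make the idea of an influence function concrete, the sketch below applies the standard first-order formulation (parameter shift under upweighting a sample is approximately -H⁻¹∇ℓ) to a small ridge regression problem rather than a neural network. It is an illustration under stated assumptions, not the paper's implementation: the synthetic data, the `fit` helper, the corrupted index `bad`, and the self-influence score used to rank suspects are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 1e-3  # sample count, feature count, ridge strength (assumed values)

# Synthetic regression data, a hypothetical stand-in for a real training set.
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Corrupt one label to mimic a mislabeled sample.
bad = 17
y[bad] += 10.0


def fit(X_fit, y_fit, n_total):
    """Minimise (1/n_total) * sum 0.5*(x@theta - y)^2 + 0.5*lam*||theta||^2.

    Returns the minimiser and the Hessian of the objective.
    """
    H = X_fit.T @ X_fit / n_total + lam * np.eye(X_fit.shape[1])
    theta = np.linalg.solve(H, X_fit.T @ y_fit / n_total)
    return theta, H


theta_hat, H = fit(X, y, n)

# Per-sample loss gradients at theta_hat: grad_i = x_i * (x_i @ theta_hat - y_i).
residuals = X @ theta_hat - y
grads = X * residuals[:, None]

# Row i holds H^{-1} grad_i (H is symmetric, so the transposes are safe).
H_inv_grads = np.linalg.solve(H, grads.T).T

# Upweighting sample i by eps shifts the parameters by about -eps * H^{-1} grad_i.
# Removing the sample corresponds to eps = -1/n.
predicted_shift = H_inv_grads / n

# Ground truth for one sample: actually retrain without it.
mask = np.arange(n) != bad
theta_loo, _ = fit(X[mask], y[mask], n)
actual_shift = theta_loo - theta_hat

# Self-influence grad_i^T H^{-1} grad_i: one possible score for ranking suspects.
self_influence = np.einsum("ij,ij->i", grads, H_inv_grads)
suspects = np.argsort(self_influence)[::-1][:5]

print("predicted shift:", predicted_shift[bad])
print("actual shift:   ", actual_shift)
print("top suspects:   ", suspects)
```

For a quadratic loss the Hessian is available in closed form, so the prediction can be checked directly against retraining. For a neural network the Hessian inverse would typically have to be approximated, which this sketch does not attempt.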
Original language | English |
---|---|
Number of pages | 8 |
Publication status | Published - 2022 |
Event | 11th European Congress Embedded Real Time Systems, ERTS 2022, Toulouse, France. Duration: 1 Jun 2022 → 2 Jun 2022. Conference number: 11 |
Conference
Conference | 11th European Congress Embedded Real Time Systems, ERTS 2022 |
---|---|
Abbreviated title | ERTS 2022 |
Country/Territory | France |
City | Toulouse |
Period | 1/06/22 → 2/06/22 |