The Best of both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning

Stefanie Scherzinger, Christin Seifert, Lena Wiese

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    Abstract

    Machine learning experts prefer to think of their input as a single, homogeneous, and consistent data set. However, when analyzing large volumes of data, the entire data set may not be manageable on a single server, but must be stored on a distributed file system instead. Moreover, with the pressing demand to deliver explainable models, the experts may no longer focus on the machine learning algorithms in isolation, but must take into account the distributed nature of the data stored, as well as the impact of any data pre-processing steps upstream in their data analysis pipeline. In this paper, we make the point that even basic transformations during data preparation can impact the model learned, and that this is exacerbated in a distributed setting. We then sketch our vision of end-to-end explainability of the model learned, taking the pre-processing into account. In particular, we point out the potential of linking the contributions of research on data provenance with the efforts on explainability in machine learning. In doing so, we highlight pitfalls we may experience in a distributed system on the way to generating more holistic explanations for our machine learning models.
    Original language: English
    Title of host publication: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)
    Publisher: IEEE
    ISBN (Electronic): 978-1-7281-2519-0
    Publication status: Published - 31 Oct 2019
    Event: 39th IEEE International Conference on Distributed Computing Systems 2019 - University of Texas, Dallas, United States
    Duration: 7 Jul 2019 to 9 Jul 2019
    Conference number: 39
    https://theory.utdallas.edu/ICDCS2019/

    Conference

    Conference: 39th IEEE International Conference on Distributed Computing Systems 2019
    Abbreviated title: ICDCS 2019
    Country/Territory: United States
    City: Dallas
    Period: 7/07/19 to 9/07/19
