Verifying Sanitizer Correctness through Black-Box Learning: A Symbolic Finite Transducer Approach

Sophie Lathouwers, Maarten Everts, Marieke Huisman

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

154 Downloads (Pure)


String sanitizers are widely used functions for preventing injection attacks such as SQL injections and cross-site scripting (XSS). It is therefore crucial that the implementations of such string sanitizers are correct. We present a novel approach to reason about a sanitizer’s correctness by automatically generating a model of the implementation and comparing it to a model of the expected behaviour. To automatically derive a model of the implementation of the sanitizer, this paper introduces a black-box learning algorithm that derives a Symbolic Finite Transducer (SFT). This black-box algorithm uses membership and equivalence oracles to derive such a model. In contrast to earlier research, SFTs not only describe the input or output language of a sanitizer but also how a sanitizer transforms the input into the output. As a result, we can reason about the transformations from input into output that are performed by the sanitizer. We have implemented this algorithm in an open-source tool of which we show that it can reason about the correctness of non-trivial sanitizers within a couple
of minutes without any adjustments to the existing sanitizers.
Original languageEnglish
Title of host publicationProceedings of the 6th International Conference on Information Systems Security and Privacy
Subtitle of host publicationVolume 1: ForSE
EditorsSteven Furnell, Paolo Mori, Edgar Weippl, Olivier Camp
Number of pages12
ISBN (Print)978-989-758-399-5
Publication statusPublished - 2020


  • Automata learning
  • Sanitizers
  • Symbolic finite transducers
  • Injection attacks
  • Software verification


Dive into the research topics of 'Verifying Sanitizer Correctness through Black-Box Learning: A Symbolic Finite Transducer Approach'. Together they form a unique fingerprint.

Cite this