
Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

  • Vitalii Fishchuk
  • Daniel Braun*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that is able to detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly influenced by most of the evaluated attack strategies.
Original language: English
Article number: 100367
Pages (from-to): 861–874
Number of pages: 14
Journal: International journal of speech technology
Volume: 27
Issue number: 4
Early online date: 16 Oct 2024
Publication status: Published - Dec 2024

Keywords

  • UT-Hybrid-D
