Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in education. For these tools to be effective, it is therefore important that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we evaluate a broad range of adversarial attacks on English texts against six different neural text detectors, including both commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved very robust and was not significantly affected by most of the evaluated attack strategies.
| Original language | English |
|---|---|
| Article number | 100367 |
| Pages (from-to) | 861–874 |
| Number of pages | 14 |
| Journal | International Journal of Speech Technology |
| Volume | 27 |
| Issue number | 4 |
| Early online date | 16 Oct 2024 |
| Publication status | Published - Dec 2024 |
Keywords
- UT-Hybrid-D
Title
- Robustness of generative AI detection: adversarial attacks on black-box neural text detectors