Efficient Black-Box Adversarial Attacks on Neural Text Detectors

Vitalii Fishchuk, Daniel Braun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Neural text detectors are models trained to detect whether a given text was generated by a language model or written by a human. In this paper, we investigate three simple and resource-efficient strategies (parameter tweaking, prompt engineering, and character-level mutations) for altering texts generated by GPT-3.5 in ways that are inconspicuous or unnoticeable to humans but cause misclassification by neural text detectors. The results show that parameter tweaking and character-level mutations in particular are effective strategies.
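
For illustration, the sketch below shows one plausible character-level mutation: substituting visually near-identical Cyrillic homoglyphs for a small fraction of Latin characters, so the text looks unchanged to a human reader but its character sequence differs for a detector. The homoglyph table, mutation rate, and function name are illustrative assumptions, not necessarily the exact mutations evaluated in the paper.

import random

# Hypothetical Latin -> Cyrillic homoglyph mapping; these Cyrillic
# characters render (near-)identically to their Latin counterparts.
HOMOGLYPHS = {
    "a": "\u0430", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "c": "\u0441", "x": "\u0445",
}

def mutate(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Replace a random subset of mappable characters with homoglyphs.

    `rate` is the fraction of mappable characters to swap; both the
    rate and the mapping are illustrative assumptions.
    """
    rng = random.Random(seed)
    chars = list(text)
    # Positions where a homoglyph substitution is possible.
    candidates = [i for i, ch in enumerate(chars) if ch in HOMOGLYPHS]
    for i in rng.sample(candidates, k=int(len(candidates) * rate)):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

print(mutate("The quick brown fox jumps over the lazy dog.", rate=0.2))

Because the mutated text is visually unchanged, a human reviewer is unlikely to notice the attack, while a detector operating on raw character or token sequences sees a different input.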
Original language: English
Title of host publication: Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP-2023)
Editors: Mourad Abbas, Abed Alhakim Freihat
Publisher: Association for Computational Linguistics (ACL)
Pages: 78-83
Number of pages: 6
ISBN (Electronic): 979-8-89176-065-3
Publication status: Published - 1 Dec 2023
