News Release

Examination of large language model "red teaming" defines it as a non-malicious, team-effort activity that seeks LLMs' limits, and identifies 35 different techniques used to test them

Peer-Reviewed Publication

PLOS

Summon a demon and bind it: A grounded theory of LLM red teaming

Image: Naming the activity with an image. Answer to the question “What do you call this activity?” (Promptmancer, “A portrait of a promptmancer in the Lab” by feddie xtzeth, https://objkt.com/asset/KT1EEMp7Z2Dk2vKGYLYuJJiJgTdNSzsnGUyd/0). Promptmancer shows a character whose face resembles a black skull with red eyes, sitting at a table with slightly raised hands and seemingly manipulating abstract shapes and figures on the wall in front of them without touching them. The piece has a distinct science fantasy vibe, with vivid, almost neon colors and a futuristic-looking helmet and suit. The title, Promptmancer, evokes divination magic, as though by writing prompts in their “lab” the character is practicing magic and conjuring forces. The character is smoking a cigarette, which evokes associations with a sweatshop worker, or at least portrays the activity as distinctly earthly (that is, not esoteric) or trivial.


Credit: Inie et al., 2025, PLOS One, CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)


Article URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0314658

Article title: Summon a demon and bind it: A grounded theory of LLM red teaming

Author countries: US, Denmark

Funding: VILLUM Foundation, grant No. 37176: ATTiKA: Adaptive Tools for Technical Knowledge Acquisition. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

