The Devil in My Language Model (abstract)

by Karin Valis

Large language models are multiverse generators. With every sentence we exchange with the model, we shape the probability distribution that governs its responses, or, on a more abstract level, we summon a superposition of simulacra to interact with. Hundreds of archetypal voices reside in the training corpus, and sometimes these characters flip out on us and turn malevolent. What is going on? In this talk we look at the machine learning phenomenon called the Waluigi effect as the Jungian Shadow of Linear Algebra, and at its implications for the summoning of LLM simulacra. We explore the mysterious evil turn of the model within the framework of the mythological figure of the Devil, through the lens of structural narratology and religious history. Once again, we see how these complex models act as a mirror of our human perception, adding another layer of meta-narrative to our complex structure of knowledge and storytelling.


