Dot Code and Hidden AI Messages
What if a secret message was hidden right underneath your nose - in plain sight - and only a select few could ever hope to decipher it? The study of communicating secret messages and deciphering them is a field called cryptography. In this essay, I briefly describe a secret communication method used in World War II and then showcase how an analogous method is being used to manipulate AI.Figure 1: Alan Turing did groundbreaking work at Bletchley Park and was crucial to cracking the Enigma code.
In World War II, British Intelligence was a leader in cryptography. Thanks to films like 'The Imitation Game,' more people outside of computer science now recognize Alan Turing and his brilliant team at Bletchley Park. They cracked the famous German ENIGMA encryption machine, which allowed the Allies to decipher German military communication messages and plan strategies to counter the Nazis.
The British Intelligence services used several innovative ways to communicate covertly. One such method was the dot code. They embedded dots in public newspapers, which were widely available and read by many. The dots were placed in specific locations within the newspaper text. Each dot represented a letter, number, or word based on a pre-arranged key shared between the sender and receiver... These dots were small and integrated into the standard text of the newspaper, only noticeable by the keen observer, and even if observed, could easily be dismissed as a typographical error or a decorative element. Thus, secret communication was blended seamlessly with publicly available material. The intended recipient would make note of the dots, use their secret key, and decode the communication. Thus, even if someone familiar with the dot code observed the encrypted message in the newspaper, they would have a difficult time deciphering the message unless they had the key agreed upon between the sender and receiver.
Secret Communications to LLMs
Large Language Models (LLMs) like OpenAI’s ChatGPT, Google’s Gemini, and other similar applications can read typed text (usually in text or PDF format). Many companies use AI to review, summarize, or filter written documents to help speed up their work. For example, companies feed resumes through LLMs to filter for desired job candidates. Universities use LLMs to review essays for plagiarism. Scientists used LLMs to summarize research articles. Scientific journals use LLMs to produce peer review reports.
While LLMs offer immense benefits, their increasing integration also introduces new vulnerabilities. For instance, recent reports indicate that unscrupulous scientists are exploiting LLMs used in peer review processes by including secret messages in their papers.
According to Elizabeth Gibney, Nature independently found 18 such pre-print studies containing such hidden messages, and the authors span 44 institutions in the field of Computer Science, across North America, Europe, Asia, and Oceania.
Prompt Injection
This practice of inserting secret or coded messages to exploit LLMs is called prompt injection. It is a form of cybersecurity exploit that targets LLMs by crafting deceptive inputs to manipulate the model's output or behavior. Attackers exploit LLM’s tendency to treat all text inputs as potential commands and leverage this to execute unintended actions.
These types of malicious injections can result in misinformation, disclosure of sensitive information, and even remote code execution. Similar to existing vulnerabilities such as SQL Injection, Bash injection, and cross-site scripting, prompt injection is a specific vulnerability that affects LLM models.
How to Prevent Prompt Injection?
To my knowledge at the time of writing this article, there isn’t a single pre-built coding method that prevents prompt injection. Unlike SQL injection, you cannot sanitize the input by using a parameterized query. Instead, you need to take several steps to reduce the likelihood of prompt injection.
These are some methods that an LLM designer should implement to reduce prompt injection.
Conclusion
As developers, when designing LLMs, we need to be careful about sanitizing the input provided by users and creating protection mechanisms (such as the ones I have outliner in this essay) to protect it.
As a user, please start a new chat (which clears context) when we are changing conversation topics with an LLM. LLMs have context memory, and they may remember your past prompts even if you don’t want them to.