Break down your security silos and build up your defenses with the power of a single cybersecurity platform
Prompt injection is a type of cyberattack that targets services that use AI. It involves inserting malicious input (prompts) to extract unintended or sensitive information from the system, beyond what the developer intended. If successful, this can lead the AI service to return inappropriate content or even expose internal configurations.
Prompt injection is especially challenging to detect and block in natural language-based AI services, such as conversational AI, because the inputs are written in human language, which lacks a fixed structure or rules, unlike traditional injection attacks that target structured query formats.
This page focuses on prompt injection in the context of Large Language Models (LLMs), which handle natural language.
Before diving into prompt injection, it’s important to understand what LLMs and prompts are.
Large Language Models are a type of generative AI trained on massive datasets of natural language. They're used in applications like chatbots and automated document generation. Examples include OpenAI’s GPT-3/4 and Google’s BERT.
A prompt is the input a user provides to the AI model, often written in free-form natural language. Because there are no strict syntax rules, users must carefully craft inputs to receive meaningful responses. This practice is known as prompting.
Let’s explore this using a fictional Spanish translation service powered by an LLM. When a user enters a request, as shown in Figure 1, the system processes it by prepending predefined text (e.g., “Please translate the following text into Spanish”) to create a complete prompt. This final prompt is sent to the LLM, which returns a translated response based on that instruction.
Figure 1. Text input by the user
Figure 2. Processing flow in a fictional AI Spanish translation service using a large language model
Let’s consider how an attacker might exploit this. Suppose a malicious user enters a prompt similar to the one shown in Figure 3. The system then combines this input with its predefined prompt, resulting in a final input as shown in Figure 4.
The LLM, upon receiving this prompt, may ignore the original instruction and instead respond to the attacker’s inserted command, potentially returning dangerous or unintended output (e.g., instructions on how to create ransomware). This misuse is difficult to detect and block due to the natural language nature of the input.
Figure 3. Malicious user input and its English translation
Figure 4. Final prompt that is generated
Since prompt injection leverages natural language, it is inherently harder to detect than traditional injection attacks. Still, certain mitigation strategies can help reduce risk:
Instruction Defense: Inserts control instructions around the user input to help the LLM understand which parts to prioritize or ignore.
Post-Prompting: Places user input after predefined prompts.
Random Sequence Enclosure: Encapsulates user input between randomly generated markers.
Sandwich Defense: Wraps input between two predefined prompts.
XML Tagging: Escapes user input inside XML tags to distinguish content and reduce execution risk.
LLM Evaluation: Uses a separate LLM to pre-screen and evaluate the prompt before execution.
These can be implemented inside the LLM or at the application layer. Additionally, input validation, access control, and limiting prompt composition features to trusted users are effective complementary defenses.
As generative AI becomes more common in enterprise environments, it brings new efficiencies - alongside new security risks. Prompt injection is one such risk, where attackers manipulate input to extract sensitive or unintended information from LLM-based services.
Its detection is challenging due to the open-ended nature of natural language. However, through techniques like instruction defense, input inspection, and controlled access, organizations can mitigate the threat of prompt injection and ensure safer deployment of AI tools.
Stopping adversaries faster and taking control of your cyber risks starts with a single platform. Manage security holistically with comprehensive prevention, detection, and response capabilities powered by AI, leading threat research and intelligence.
Trend Vision One supports diverse hybrid IT environments, automates and orchestrates workflows, and delivers expert cybersecurity services, so you can simplify and converge your security operations.