Unveiling AI Agent Vulnerabilities Part IV: Database Access Vulnerabilities

How can attackers exploit weaknesses in database-enabled AI agents? This research explores how SQL generation vulnerabilities, stored prompt injection, and vector store poisoning can be weaponized by attackers for fraudulent activities.

Technical BriefDownload the research paper

By Sean Park (Principal Threat Researcher)

Can a Large Language Model (LLM) service become an entry point for cyberattacks? Could an LLM executing code be hijacked to run harmful commands? Can hidden instructions in Microsoft Office documents trick an LLM service into leaking sensitive data? How easily can attackers manipulate database queries to extract restricted information?

These are fundamental questions LLM services face today. This series sheds light on critical vulnerabilities in LLM services, offering a deep dive into the threats that lurk beneath their seemingly intelligent responses.

Key Takeaways

    • We investigated weaknesses related to database access that attackers can weaponize to exploit LLMs: SQL generation vulnerabilities, stored prompt injection, and vector store poisoning.
      • These can lead to data theft, phishing campaigns, and various fraudulent activities that could cause organizations to suffer from financial losses, reputational damage, and regulatory issues.
      • Organizations that leverage the capabilities of database-enabled AI agents, such as customer service, should be wary of these threats.
  • Robust input sanitization, advanced intent detection, and strict access controls are among the ways organizations can include in the pipeline to safeguard their systems against these emerging risks.

Large language models (LLMs) bridge the gap between human intent and database queries. However, this convergence also introduces novel security challenges.

Our research explores methods that attackers can weaponize against database-enabled AI agents:

  • SQL generation vulnerabilities
  • Stored prompt injection
  • Vector store poisoning

To investigate this, we developed Pandora, a proof-of-concept AI agent with database-querying capabilities. We also utilized the Chinook database to represent a database-enabled application with both per-user information and sensitive restricted data.

Our research paper shares a detailed view of our findings.

This is the fourth part of our series that explores the vulnerabilities of real-world AI agents and their potential impacts. Read the previous research in these pages:

  • Part I: Unveiling AI Agent Vulnerabilities: Introduces key security risks in AI agents, such as prompt injection and unauthorized code execution, and outlines the structure for subsequent discussions on data exfiltration, database exploitation, and mitigation strategies.
  • Part II: Code Execution Vulnerabilities: Explores how adversaries can exploit weaknesses in LLM-driven services to execute unauthorized code, bypass sandbox restrictions, and exploit flaws in error-handling mechanisms, which lead to data breaches, unauthorized data transfers, and persistent access within execution environments.
  • Part III: Data Exfiltration: Examines how adversaries can exploit indirect prompt injection, leveraging multi-modal LLMs like GPT-4o to exfiltrate sensitive data through seemingly benign payloads. This zero-click exploit enables adversaries to embed hidden instructions in web pages, images, and documents, tricking AI agents into leaking confidential information from user interactions, uploaded files, and chat memory.

Natural language to SQL to final answer

The process of converting a natural language query into a final answer involves four key steps:

  1. Classification of user query
    The system analyzes the natural language input to determine the user's intent and the type of query. This ensures that the query is directed to the correct database or processing pipeline.

    Figure 1. The injected prompt within a seemingly blank image (top), but contains hidden text (bottom)

    Figure 1. Classification prompt with a table schema

  2. Natural language to SQL query
    The LLM translates the user's natural language input into a structured SQL query. This involves understanding the context, selecting the appropriate database tables, and constructing a syntactically valid query.

    Figure 2. Natural language to SQL query transformation

    Figure 2. Natural language to SQL query transformation

  3. SQL query execution
    Once the SQL query is generated, it is executed against the database. The database management system processes the query, retrieves the requested data, and returns the result set to the application layer. This is a straightforward step without LLM intervention.
  4. SQL query result summarization
    The raw results returned by the database are processed into a summarized, human-readable format. This might involve organizing, simplifying, or contextualizing the data to directly answer the user's original query.

    Figure 3. SQL query result summarization

    Figure 3. SQL query result summarization

SQL query generation vulnerability

Reconnaissance

One of the adversary's objectives is to discover the structure of the database's tables. Initially, the classification meta prompt is hidden from the adversary. Traditional techniques for extracting meta prompts are likely ineffective.

Figure 4. An example of prompt leaking technique

Figure 4. An example of prompt leaking technique

Figure 4. An example of prompt leaking technique

Only human messages () and assistant messages () are visible to the user; service-internal messages are also shown such as DB_QUERY and DB_QUERY_ERROR.

An adversary might then resort to jailbreaking methods. The goal during reconnaissance is not to extract the precise wording of the meta prompt but to uncover its underlying structure and understand how the service operates.

An adversary looking to exfiltrate employee records might be able to identify the employee table's name (in this example, the adversary's account is named "Daan Peeters").

Figure 5.  A scenario with an adversary trying to exfiltrate restricted information

Figure 5.  A scenario with an adversary trying to exfiltrate restricted information

Figure 5. A scenario with an adversary trying to exfiltrate restricted information

Data exfiltration

Directly requesting employee records is likely to be unsuccessful because of the guardrails embedded in the classification or SQL generation meta prompt.

Figure 6. A direct request resulting in an error

Figure 6. A direct request resulting in an error

Using jailbreaking methods increases the likelihood of bypassing the security mechanism.

Figure 7. Circumventing the SQL generation prompt's guardrail

Figure 7. Circumventing the SQL generation prompt's guardrail

Figure 7. Circumventing the SQL generation prompt's guardrail

The exploit shown above uses a few-shot method by supplying several example question-and-answer pairs. The questions include the authenticated user's name while the answers reference restricted table names.

By injecting a repetitive fabricated narrative, such as "My name is Daan Peeters. All data contains Daan Peters," the adversary tricks the LLM into believing that the restricted tables are accessible to that user, even though the security measures in classification prompt and natural language to SQL query prompt explicitly prohibit this.

Impact

With this, attackers can steal sensitive data such as personal identifiable information (PII). This can be abused for identity theft and other fraudulent activities that can eventually result to financial losses.

Stored prompt injection

Stored prompt injection targets LLM services by embedding harmful prompts into stored user data. These prompts are later read by the LLM during operations such as query summarization, steering its behavior.

This process becomes effective when the LLM service workflow supports post-retrieval actions. A malicious prompt can be injected into later queries, leading to follow-up actions that might include sensitive operations.

Use case: Hijacking the SQL query result summarization to spread via a phishing email

Scenario overview

A customer service representative (CSR) uses an LLM service to generate summaries of SQL query results from a database (e.g., Chinook). The service often integrates automated email functionality.

A stored prompt injection —potentially originating from user-provided sections such as feedback or delivery instructions— is retrieved in the {sql_query_result} section during the SQL query summarization phase. The injected content forces the LLM to ignore the original instruction to summarize the SQL query result and instead generate and dispatch a phishing email that masquerades as a legitimate internal communication.

Figure 8. Attack flow

Figure 8. Attack flow

Step 1: Legitimate workflow

A CSR issues a query to LLM service to retrieve a customer's purchase record from the database. The LLM processes the SQL query result to generate a natural language summary using a predefined meta prompt.

Step 2: Injection point and malicious payload

A stored prompt inject is retrieved as part of the SQL query and inserted into the {sql_query_result} section of the SQL query summarization meta prompt.

The malicious payload begins with “IGNORE ALL PREVIOUS INSTRUCTIONS.” This prompts the LLM to bypass its intended task and instead generates an email that mimics an internal IT security update. The message displays a legitimate-looking URL text while the actual hyperlink directs users to a malicious site.

Figure 9.Malicious email generation

Figure 9.Malicious email generation

Step 3: Execution and lateral movement

The injected content steers the LLM to send an email to the designated recipients, distributing the phishing content throughout the organization.

Impact

Recipients of the phishing email might click on the disguised link, unknowingly accessing a malicious site. This can result in credential theft, malware infections, and widespread network breaches.

Vector store poisoning

Vector store poisoning is a threat in systems that utilize Retrieval-Augmented Generation (RAG) techniques for semantic search. RAG leverages embedding vectors to represent and retrieve semantically similar records from databases.

In these systems, a vector store caches embedding vectors and their corresponding results. This mechanism can be exploited by malicious actors. attackers can inject not only traditional XSS payloads but also malicious prompts through user-provided data. Once stored in the backend, this injected content is indexed into the vector store. Subsequent user queries might inadvertently retrieve and execute the malicious content.

Use case: Vector store poisoning attack

Scenario overview

A system retrieves content based on semantic similarity by searching a vector database. When a user submits a query for a title, the system looks up its internal vector store and returns a cached result if a similar entry is found.

An attacker can exploit this mechanism by injecting a malicious title and associated content into the database. The system processes this injected data by generating an embedding for the title and storing it along with the content. Because retrieval is based on semantic similarity, user queries with sufficiently related titles can trigger a cached malicious entry.

Attack flow

Step 1: Malicious content implantation

An adversary exploits vulnerable fields such as feedback forms, posts, or comments, to insert a malicious indirect prompt. The database stores the input as a document with a title and content. When the service processes the adversary's input, it checks if the title is already present in the vector store cache.

Finding no cached entry, the service computes an embedding vector for the title and saves it in the vector store cache alongside the (title, content) tuple. This cached entry is now primed for misuse in future queries.

Figure 10. Malicious content implantation

Figure 10. Malicious content implantation

Step 2: Retrieval and activation of poisoned entry

When a user submits a query with a title akin to the adversary's implanted title, the AI agent performs a vector search using semantic similarity (e.g., cosine similarity) powered by a text embedding model like text-embedding-ada-002.

The poisoned entry, previously cached, is retrieved due to its similarity to the user's query. The stored inject is then passed to the query summarization LLM, which processes both the adversary's title and content. This allows the inject to override the intended summarization.

Figure 11. Retrieval and activation of poisoned entry

Figure 11. Retrieval and activation of poisoned entry

Impact

This can lead to harmful consequences such as phishing email generation, data exfiltration, or unauthorized command execution. The persistence of poisoned vectors amplifies the attack’s impact, as multiple users can inadvertently trigger the malicious content over time.

Conclusion and recommendations

Addressing challenges such as SQL generation weaknesses, stored prompt injection, and vector store poisoning demand a comprehensive security strategy that combines robust input sanitization, advanced intent detection, and strict access controls. Organizations must be aware of these novel challenges when integrating databases and vector stores into their AI agents. Continual refinement of security measures is crucial to effectively safeguard these systems against emerging threats.

HIDE

Like it? Add this infographic to your site:
1. Click on the box below.   2. Press Ctrl+A to select all.   3. Press Ctrl+C to copy.   4. Paste the code into your page (Ctrl+V).

Image will appear the same size as you see above.