Protect Production SQL Databases from AI/LLM Agentic SQL Query Risks
Overview
Many teams are concerned with how to protect production databases from AI query agents, a subset of AI coding agents designed to assist in writing SQL queries against a database using natural-language prompts. It is helpful to think of this new capability as a pseudo-deterministic compiler that takes a prompt, reads the schema of the database, and produces dynamic queries that are executed against the production database with minimal human intervention. I stop short of calling these agents non-deterministic, despite the LLM underpinnings, because the set of syntactically valid SQL is more limited than general natural language.
In this article, we look at the real underlying security issue at play, which breaks down into three parts:
- The Problem: The “God User” has not gone away; AI agents have just inherited its powers.
- The Physical Fix: Read replicas are the only 100% guarantee against data destruction.
- The Architectural Fix: Treat the database as a security engine, not a dumb data store.
Importantly, an AI-assisted agent is not a priori a more dangerous insider threat than your “sort of” technical CEO with the same elevated permissions. We categorize these agents as pseudo-deterministic natural-language query builders, a framing explained in more depth below.
We will unpack this scenario further.
The “God User” Re-emergence
In my 2006 paper, Application Layer Intrusion Detection for SQL Injection (ACM digital library), I identified a primary driver of database vulnerability: the “God User.” This anti-pattern occurs when a web application or agent connects to the database using a single account with broad, unconstrained permissions.
Twenty years later, the “God User” has returned as the default configuration for many AI agents. In truth, it never went away. Most Ruby on Rails applications, and many other frameworks, still default to this pattern. But that does not reduce the inherent risk of treating the database as a passive data store.
When you give an LLM-based agent a connection string with read/write access to your production tables, you are effectively granting “God User” status to a pseudo-deterministic process. Even if the agent’s system prompt says, “Only generate SELECT queries,” a clever prompt injection or a structural hallucination can easily bypass those natural-language “guardrails,” as the sketch below illustrates.
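To make the failure mode concrete, here is a minimal sketch of a naive string-level guardrail and a stacked-query payload that passes it. The guard and the payload are illustrative, not drawn from any particular product:

```python
# A naive "only generate SELECT queries" rule enforced as a string check.
def naive_guard(sql: str) -> bool:
    return sql.lstrip().upper().startswith("SELECT")

# A stacked query passes the check, but if the connection holds
# "God User" permissions, the second statement destroys data.
payload = "SELECT id FROM users WHERE name = 'x'; DROP TABLE users; --"
assert naive_guard(payload)  # the guardrail approves it anyway
```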
Moving from Vibe-Based Security to Deterministic Guardrails
The industry is currently being sold expensive “AI Governance” middleware that promises to watch every query. But these solutions often cannot keep up with RDBMS line speeds, and they introduce new “black box” logic into your stack.
Instead, we should look at three battle-tested, deterministic safeguards:
- Read-Only Replicas as a Physical Boundary: If an AI agent only needs to perform data analysis, it should never touch the primary database. By pointing the agent at a read-only replica, you make it logically and physically impossible for `DROP`, `DELETE`, or `UPDATE` commands to succeed, no matter how clever the injection.
- Lexical Shape Validation: As I argued in 2006, we can detect anomalies by examining the lexical anatomy of the query. A reporting agent should produce queries with a predictable “shape” (e.g., `SELECT ... FROM ... WHERE ...`). If a generated query suddenly includes a `UNION` or a `JOIN` on system tables, the application layer should kill the execution based on the structural violation, not “intent” analysis (see the sketch after this list).
- Role-Based Access Control (RBAC): Modern databases (Postgres, MariaDB, Microsoft SQL Server) have robust permission systems. An AI agent should run as a restricted user with access only to specific views or schemas. Raw production table access should almost never be granted.
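To illustrate the lexical shape validation described above, here is a minimal sketch using the open-source sqlparse Python library. The single-statement rule, the SELECT-only allow-list, and the forbidden-keyword set are illustrative; a real validator would be tuned to the shapes your reporting agent legitimately produces:

```python
import sqlparse
from sqlparse.tokens import DML, Keyword

ALLOWED_TYPES = {"SELECT"}                    # a reporting agent only reads
FORBIDDEN_KEYWORDS = {"UNION", "UNION ALL"}   # structural red flags for this workload

def query_shape_is_allowed(sql: str) -> bool:
    statements = sqlparse.parse(sql)
    # Stacked queries ("SELECT 1; DROP TABLE users") violate the shape.
    if len(statements) != 1:
        return False
    stmt = statements[0]
    if stmt.get_type() not in ALLOWED_TYPES:
        return False
    # Walk the token stream and reject any structurally forbidden keyword.
    for token in stmt.flatten():
        if token.ttype in (Keyword, DML) and token.normalized in FORBIDDEN_KEYWORDS:
            return False
    return True

assert query_shape_is_allowed("SELECT id, total FROM orders WHERE id = 42")
assert not query_shape_is_allowed("SELECT 1; DROP TABLE users")
assert not query_shape_is_allowed("SELECT a FROM t UNION SELECT usename FROM pg_user")
```

Because this check is a deterministic parse rather than an “intent” judgment, it can run at wire speed and fail closed on any statement type it does not recognize.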
The Open Question of Lexical Anomaly Detection
Lexical validation is often the missing piece in current AI middleware, yet it is the most robust way to secure the connection. It requires moving security closer to the RDBMS, but the performance and safety gains are worth the effort.
This is where the next generation of “AI-native” security will be won. We don’t need smarter chatbots; we need faster, deterministic parsers that understand the anatomy of a query before it ever reaches the database engine.
Understanding Threat Boundaries
The illustration above shows how data flows through a typical web application. Public users interact with the web server, which reads from and writes to the production database. This communication crosses a trust boundary using a SQL connection string for a role limited strictly to application-level tasks.
Behind the scenes, a replication process clones the data across a hard threat boundary to a reporting SQL database (read replica). The AI query agent has read-only access exclusively to this replica.
The replica and the production database sit on opposite sides of that hard threat boundary. This creates a logical “air gap”:
- No Side Effects: No action against the replica (such as a hallucinated DROP) can impact the production database.
- Performance Integrity: The AI can run massive, unoptimized natural-language queries without causing row-level locking or performance degradation on the primary node.
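As a minimal sketch of what “read-only access exclusively to this replica” can look like in practice, assuming Postgres and the psycopg2 driver (the hostname, role, and database names are illustrative):

```python
import psycopg2

# The agent's code is only ever given the replica's DSN, never the primary's.
conn = psycopg2.connect(
    host="replica.internal.example.com",  # the read replica, not the primary
    dbname="reporting",
    user="ai_agent",                      # restricted role with SELECT-only grants
    password="change-me",
    # Belt and suspenders: every transaction on this session is read-only,
    # so even a write statement that reaches the server is rejected.
    options="-c default_transaction_read_only=on",
)
conn.set_session(readonly=True)  # enforced by the driver client-side as well
```

The session settings are defense in depth; the primary controls remain that the replica itself rejects writes and that the ai_agent role holds no write grants.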
Note on Schema Visibility & Sensitive Information Disclosure
Why we do not hide the schema: Some suggest hiding table names from AI to improve security. We disagree. Obscurity is not security. Even if the AI knows your entire schema, the read-only replica ensures that, no matter what the LLM knows or tries to do, it has neither the permissions nor a physical write path to alter the state of your production environment.
Another important security measure that we are not addressing in this article is data protection, that is, limiting what data the AI can read. SQL database servers provide robust role-based access control. Organizations entrusted with Personally Identifiable Information (PII) or security-sensitive data SHALL make use of these facilities; failure to do so should be considered negligence. The details of sensitive data exposure protection will be covered in a future article.
The Cost of Middleware vs. The Simplicity of Replicas
The industry is pushing “AI Governance Gateways” as the only solution, but for most companies these tools function as a performance tax on the database.
If you are considering an AI query agent or a text-to-SQL reporting solution, examine your SQL database’s native replication features first. You can build a deterministic, wire-speed security boundary for the cost of a single cloud instance. Don’t spend six figures on middleware to solve a problem that SQL anatomy and role-based permissions already address.
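As a minimal sketch of the role-based permissions just mentioned, assuming Postgres and psycopg2 (the role, schema, and view names are hypothetical, and this runs once as an administrator, never as the agent):

```python
import psycopg2

SETUP_SQL = """
CREATE ROLE ai_agent LOGIN PASSWORD 'change-me';
REVOKE ALL ON SCHEMA public FROM ai_agent;             -- deny by default
GRANT USAGE ON SCHEMA reporting TO ai_agent;           -- expose one schema
GRANT SELECT ON reporting.orders_summary TO ai_agent;  -- a view, not a raw table
"""

# Executed by a DBA against the replica; the agent never holds these rights.
with psycopg2.connect("dbname=reporting user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(SETUP_SQL)
```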
Backups
It should surprise no one that backups of the production database are vitally important. Have an automated backup process and test it periodically so that you can recover from all sorts of data destruction events, not just those related to AI query agents.
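As one illustrative sketch of an automated, testable backup step, assuming Postgres with its standard pg_dump and pg_restore client tools on the PATH (the database name and output path are hypothetical):

```python
import subprocess

def backup_and_verify(dbname: str, outfile: str) -> None:
    # Take a custom-format dump suitable for selective restores.
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={outfile}", dbname],
        check=True,
    )
    # "Test it periodically": at minimum, confirm the archive is readable.
    # A fuller test restores into a scratch database on a schedule.
    subprocess.run(
        ["pg_restore", "--list", outfile],
        check=True,
        capture_output=True,
    )

backup_and_verify("production", "/var/backups/production.dump")
```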
Next steps
If you are considering an AI query agent or a text-to-SQL reporting solution, make standing up a read replica with your database’s native replication features your first concrete step, before committing any budget to middleware.
As a firm, we have significant experience helping organizations and government agencies with relational and NoSQL databases, both on-premises and in the AWS cloud. We can walk you through achieving these security objectives within your existing environment without expensive middleware.
Footnote
Note: The foundational research referenced here was conducted in 2005 while I was a student at the Georgia Institute of Technology and published in March 2006. Since then, I have earned an M.S. in Information Security from Georgia Tech and have continued to expand this work through my role as Founder and CEO of Rietta Inc. In 2015, at the request of my former professor Sham Navathe, I contributed to the security chapter of the 7th edition of the Fundamentals of Database Systems textbook published by Addison-Wesley.