Text2SQL: The Art of Teaching Machines to Speak Database
The Joy, Pain, and Chaos of Letting LLMs Write SQL
Somewhere between human curiosity and database rigor lies a strange little problem: how do you turn a natural question—“Which customers churned last month?”—into a rigid SQL query that a database won’t complain about?
It turns out the answer isn’t just about syntax or token prediction. It’s about context, intention, ambiguity, and sometimes—just plain common sense. Which, ironically, LLMs don't really have. But they're getting surprisingly good at faking it.
And that’s both thrilling and terrifying.
When Questions Get Lost in Translation
Let me start with a confession: I love SQL. Not in a romantic way, but in the way a chess player might love the rules: constrained, expressive, brutally logical. But most people don’t think in WHERE clauses and LEFT JOINs. They think in stories:
“Show me all the customers who downgraded their subscription after contacting support more than twice in Q1.”
To a human, this is a straightforward ask. To a model, it’s a riddle wrapped in an enigma inside a schema full of snake_case table names.
The funny thing is—sometimes the model nails it. Other times, it confidently gives you a syntactically perfect query that subtly misinterprets the entire request. You might get users who are thrilled to see a dashboard… until they realize it excludes all EU customers due to a missed join.
SQL by Proxy: The Illusion of Understanding
What LLMs are doing here is a kind of mimicry. They aren’t solving SQL; they’re performing a language trick with statistical memory and prompt context. Watch closely and it’s like seeing a magician shuffle cards behind their back. You’re impressed, but skeptical.
That’s why real-world Text2SQL systems have to act more like orchestras than soloists. You need a retrieval system to bring the right schema to the model. You need a validator to check for dangerous or absurd queries. You need policy enforcement to stop someone from querying the ssn field in users.
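That policy layer doesn’t have to be exotic to be useful. Here’s a minimal sketch of a column blocklist check; the column names are hypothetical, and the token-matching is deliberately naive, since a production system would parse the SQL into an AST so aliases and quoted identifiers are handled correctly:

```python
import re

# Hypothetical policy: columns that must never appear in generated SQL.
BLOCKED_COLUMNS = {"ssn", "password_hash", "credit_card"}

def violates_policy(sql: str) -> set:
    """Return the set of blocked column names referenced in the query.

    Naive token-level check: it flags any identifier-like token that
    matches the blocklist, which can over-match (e.g. a table literally
    named 'ssn') but never silently under-matches on plain column refs.
    """
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
    return BLOCKED_COLUMNS & tokens

print(violates_policy("SELECT name, ssn FROM users"))  # {'ssn'}
```

A check like this runs after generation and before execution, which is the whole point: the model proposes, the policy layer disposes.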
You also need… a therapist? Sort of.
Because users don’t always know what they want. They’ll ask a vague question and expect a miracle. They’ll contradict themselves mid-dialogue. Sometimes the hardest part is teasing out intent, and that requires conversational design, not just code.
Beyond BI: What We’re Really Building
People think Text2SQL is about making analytics easier. I think it’s about making data conversational. We’re shifting from dashboards and drill-downs to something more fluid. Less command-line, more curiosity-driven.
At its best, a good Text2SQL system lets you reason with data. You ask a question, spot something odd, rephrase, dive deeper. That loop of ask, analyze, refine is how insight happens. It's not just about answering one-off queries; it's about enabling data dialogue.
But for that loop to work, you need trust. You need consistency. You need guardrails that let users explore without falling off a cliff.
The Weird Stuff That Actually Matters
Let me give you a few odd lessons that don’t show up in most blog posts:
Synonym drift is real. “Revenue” and “sales” don’t always mean the same thing in practice, even if they look interchangeable. Your LLM might not know your company’s language politics. Train it—or better yet, give it a glossary.
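Giving the model a glossary can be as simple as injecting definitions into the prompt whenever a known term appears in the question. A toy sketch, where every term and definition is a made-up placeholder for your company’s actual vocabulary:

```python
# Hypothetical company glossary: terms and their canonical meanings.
GLOSSARY = {
    "revenue": "gross booked amount before refunds (orders.total_amount)",
    "sales": "net amount after refunds (orders.net_amount)",
    "churned": "no active subscription row in the last 30 days",
}

def glossary_context(question: str) -> str:
    """Build a prompt snippet defining any glossary terms the question uses."""
    hits = [f"- {term}: {meaning}" for term, meaning in GLOSSARY.items()
            if term in question.lower()]
    return "Company definitions:\n" + "\n".join(hits) if hits else ""

print(glossary_context("What was our revenue vs sales last quarter?"))
```

Prepending this snippet to the generation prompt is crude but effective: the model no longer has to guess which of two near-synonyms your schema actually means.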
People ask subjective questions. “Who are our best customers?” is not an objective query. It’s a negotiation. Do you mean highest spenders? Most loyal? Most engaged? Systems that treat all language literally miss the forest for the trees.
Your schema is a liability. Legacy table names, redundant fields, undocumented foreign keys—LLMs don’t care why your data is messy, but your users will care when the model trips over it. Cleaning metadata is as important as fine-tuning prompts.
A More Honest Architecture
Here’s what I’ve come to believe: great Text2SQL systems aren’t built on one model. They’re built on many layers working in concert:
Input interpretation: Is this even a SQL-able question?
Schema mapping: Which tables and columns might be relevant?
LLM generation: A first pass, not the final say
Validation and rewriting: Catch mistakes, apply constraints
Feedback and revision: Let users say, “That’s not quite what I meant…”
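The layers above can be sketched as a toy pipeline. Every stage here is a stub with hypothetical heuristics; in a real system, schema mapping would be an embedding search over table documentation and generation would call an actual model:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PipelineResult:
    sql: Optional[str]
    error: Optional[str] = None

def is_sqlable(question: str) -> bool:
    # Input interpretation: reject questions with no apparent data intent.
    return any(w in question.lower() for w in ("how many", "which", "show", "list"))

def map_schema(question: str) -> List[str]:
    # Schema mapping: stubbed; real systems retrieve relevant tables.
    return ["customers", "subscriptions"]

def generate_sql(question: str, tables: List[str]) -> str:
    # LLM generation: a first pass, not the final say.
    return f"SELECT * FROM {tables[0]} LIMIT 10"

def validate(sql: str) -> Optional[str]:
    # Validation and rewriting: block obviously dangerous statements.
    if any(kw in sql.upper() for kw in ("DROP", "DELETE", "UPDATE")):
        return "write statements are not allowed"
    return None

def answer(question: str) -> PipelineResult:
    if not is_sqlable(question):
        return PipelineResult(None, "that doesn't look like a data question")
    sql = generate_sql(question, map_schema(question))
    err = validate(sql)
    return PipelineResult(None, err) if err else PipelineResult(sql)
```

The shape matters more than the stubs: each layer can fail independently and hand the user a specific, recoverable error instead of a wrong answer, which is what the feedback-and-revision step feeds on.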
And somewhere in there, you need humor. No joke. Users will ask nonsense. They’ll try to break the system. Sometimes the best thing you can do is handle a failed query with a bit of charm: “That query gave me SQL indigestion—mind rephrasing?”
The Future: Not Self-Service, But Co-Pilots
The end goal isn’t full automation. It’s augmentation. Text2SQL systems don’t replace analysts; they sit beside them, helping less technical users explore confidently, and helping experts move faster. The real revolution isn’t that people can query databases with plain English. It’s that language becomes a tool for iteration, not just specification. That’s a subtle but powerful shift.
I’m excited about this direction not because it’s solved, but because it’s hard. Because it mixes language, logic, UI design, governance, and trust. Because it refuses to be reduced to “just another prompt.” And because somewhere out there, someone is still calling their transactions table “tx_2021_bkup_final_final_reallyfinal”.
The machines have no idea what they’re walking into. But we do.