QueryNLQ
Ask your database questions in plain English.
A 14-agent AI pipeline that turns natural language into production-grade SQL — against databases with 500+ tables. Not a ChatGPT wrapper. A self-correcting, type-safe agentic system with vector-based schema discovery, multi-stage verification, and real-time progress tracking.
See It Work
User asks:
“What were our top 10 customers by revenue last quarter?”
QueryNLQ processes through 14 specialized agents:
- Classifies intent → complex SQL query
- Plans approach → join orders + customers, aggregate revenue, filter by date range
- Searches the 500+ table schema via Milvus → finds the orders, customers, invoices tables
- Generates SQL with pruned schema context
- Verifies by executing in read-only transaction → 10 rows, no errors
- Reviews for logical correctness and performance
- Returns results + the SQL for full transparency
```sql
SELECT c.name, SUM(i.amount) AS total_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN invoices i ON o.id = i.order_id
WHERE o.order_date >= '2025-10-01' AND o.order_date < '2026-01-01'
GROUP BY c.id, c.name
ORDER BY total_revenue DESC
LIMIT 10;
```
Time: under 3 seconds for simple queries; 15-30 seconds for complex multi-table analysis.
Follow-up: “Now filter that by the Asia region” → QueryNLQ maintains conversation context and modifies the query.
Complex analytical queries that work:
- “Run a cohort analysis — group customers by signup month and show retention over 6 months” → multi-CTE query with date bucketing, self-joins, and pivot logic
- “Analyze sentiment trends across support tickets by product category over the last quarter” → aggregation across multiple tables with text-based classification and time-series grouping
These aren’t cherry-picked demos. They represent the kind of multi-step analytical queries that typically require a data analyst writing custom SQL — handled end-to-end by the agent pipeline.
Results
- 90%+ query accuracy on production schemas with 500+ tables
- Sub-3s response for simple queries via the quick-pass path
- Self-correcting — verification + review loops catch and fix SQL errors before execution
- Zero unauthorized data access — read-only transactions, column-level permissions at the validation layer
- Real-time progress — users see which agent is working via WebSocket updates
The Problem
Business users drown in data they can’t access. At one MSME finance company, every ad-hoc question — “What’s our receivables aging by region?” or “Which customers haven’t ordered in 90 days?” — required a developer, a Slack thread, and a 2-day turnaround. Finance teams needed answers in minutes, not days.
The problem gets harder at enterprise scale. Databases with 500+ tables, cryptic column names like cust_acct_xref_id, and complex relationships across schemas. No single LLM prompt can hold 500 table DDLs. Single-prompt NL-to-SQL tools break at 20-30 tables.
Architecture: 14-Agent Query Pipeline
The query pipeline is orchestrated by Pydantic-Graph — a stateful directed graph where each node is a specialized Pydantic-AI agent with its own system prompt, output schema, and validation rules.
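The stateful-graph idea can be sketched with the standard library alone. This is not the Pydantic-Graph API, just an illustration of the pattern it formalizes: each node reads shared state, mutates it, and names its successor (node and field names here are illustrative).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Shared state threaded through every node. Pydantic-Graph adds typed,
# validated state on top of this same idea.
@dataclass
class QueryState:
    question: str
    sql: str = ""
    log: list = field(default_factory=list)

Node = Callable[[QueryState], str]  # a node returns the name of the next node

def run_graph(nodes: Dict[str, Node], state: QueryState, start: str) -> QueryState:
    current = start
    while current != "END":
        state.log.append(current)       # record the route for auditability
        current = nodes[current](state)
    return state

def classify(state: QueryState) -> str:
    return "generate"                   # stub: always route to generation

def generate(state: QueryState) -> str:
    state.sql = "SELECT 1"              # stub: the real node calls an LLM agent
    return "END"

result = run_graph({"classify": classify, "generate": generate},
                   QueryState("top customers"), "classify")
```

In the real system each node is a Pydantic-AI agent whose output schema is enforced before routing continues.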
Entry: Intent Classification The UserInteraction agent classifies the query into six categories: general knowledge, simple SQL, complex SQL, data analysis, schema metadata, or needs clarification. A “quick pass” handles simple queries inline in under 3 seconds; complex queries are dispatched to Celery workers for async processing.
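The taxonomy and quick-pass split can be modeled as a small routing table. The classifier itself is a Pydantic-AI agent in the real system; the inline-vs-async assignment below is an assumption based on the description above.

```python
from enum import Enum

# The six intent categories named in the text.
class Intent(Enum):
    GENERAL = "general_knowledge"
    SIMPLE_SQL = "simple_sql"
    COMPLEX_SQL = "complex_sql"
    ANALYSIS = "data_analysis"
    METADATA = "schema_metadata"
    CLARIFY = "needs_clarification"

# Hypothetical split: cheap intents run inline (the "quick pass");
# expensive ones go to async workers.
QUICK_PASS = {Intent.GENERAL, Intent.SIMPLE_SQL, Intent.METADATA, Intent.CLARIFY}

def route(intent: Intent) -> str:
    return "inline" if intent in QUICK_PASS else "celery"
```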
Planning: Approach → Execution → Tasks Three planning agents decompose complex queries. The ApproachPlanner generates a numbered strategy. The ExecutionPlanner converts it into concrete SQL steps. The TaskPlanner breaks it into discrete, executable tasks — each becomes an independent SQL generation cycle.
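The three-stage decomposition amounts to successively narrower data contracts. A sketch with stdlib dataclasses standing in for the Pydantic output models (all field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ApproachPlan:
    steps: list           # numbered strategy, e.g. "1. join orders + customers"

@dataclass
class ExecutionStep:
    description: str      # one concrete SQL step from the ExecutionPlanner
    tables_hint: list     # tables the step likely needs

@dataclass
class Task:
    step: ExecutionStep   # each task becomes one independent SQL generation cycle

def to_tasks(steps):
    """TaskPlanner stand-in: one discrete task per execution step."""
    return [Task(step=s) for s in steps]
```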
Discovery: Semantic Search → Schema Retrieval → Table Selection For each task, the SemanticSearch agent queries Milvus with an embedding of the enriched query. It returns the top-k most relevant tables from the 500+ table index. The SchemaRetrieval agent fetches their full JSON schemas. If there’s ambiguity, TableSelection asks the user to confirm.
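The lookup itself is a top-k nearest-neighbor search. A toy stand-in for the Milvus call: with unit-normalized vectors, inner product equals cosine similarity, so ranking by dot product surfaces the most semantically similar table descriptions (3-dim toy vectors here; the real index holds 3072-dim embeddings).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def top_k(query_vec, table_vecs, k=2):
    """Return the k table names most similar to the query embedding."""
    q = normalize(query_vec)
    scored = [(name, dot(q, normalize(v))) for name, v in table_vecs.items()]
    return [name for name, _ in sorted(scored, key=lambda t: -t[1])[:k]]
```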
Generation: SQL → Verification → Review (self-correcting loop) The SQLGeneration agent produces SQL using the pruned schema, task instructions, and conversation history. SQLVerification executes it in a read-only transaction to check for errors and anomalies. SQLReviewer validates logical correctness and performance. If either finds issues, the query loops back to SQLGeneration with specific feedback — a self-correcting cycle that runs until the SQL passes all checks.
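The self-correcting cycle reduces to a small loop: generate, run each checker, and feed any failure message back into generation. A stdlib sketch with stub checkers (the real verifier executes in a read-only transaction; function names are illustrative):

```python
def correcting_loop(generate, checkers, max_rounds=3):
    """Regenerate SQL with checker feedback until all checks pass."""
    feedback = ""
    for _ in range(max_rounds):
        sql = generate(feedback)
        for check in checkers:
            ok, feedback = check(sql)
            if not ok:
                break          # loop back to generation with this feedback
        else:
            return sql         # passed every checker
    raise RuntimeError("could not produce valid SQL")

# Stub generator: fails first, then "fixes" the query using the feedback.
attempts = []
def gen(feedback):
    attempts.append(feedback)
    return "SELECT *" if not feedback else "SELECT id FROM t"

def verify(sql):
    return ("FROM" in sql, "missing FROM clause")

fixed = correcting_loop(gen, [verify])
```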
Safety: Error Handling & Loop Detection An ErrorHandler agent analyzes failures and routes to recovery. A LoopDetectionManager tracks per-agent visit counts — if an agent is visited too many times, the system breaks the loop and asks the user to rephrase. No infinite cycles.
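The loop breaker is essentially a per-agent visit counter. A minimal sketch (the class name matches the text; the threshold value is illustrative):

```python
from collections import Counter

class LoopDetectionManager:
    """Track per-agent visits; signal when a loop should be broken."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.visits = Counter()

    def visit(self, agent):
        """Record a visit; return False once the agent exceeds its threshold."""
        self.visits[agent] += 1
        return self.visits[agent] <= self.threshold
```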
Architecture: Schema Embedding Pipeline
Before a single query can be answered, the database must be indexed. This is the pipeline that makes QueryNLQ work at enterprise scale.
1. Extract — SQLAlchemy’s inspector extracts every table, column, type, constraint, index, and foreign key. A ThreadPoolExecutor (10 workers) processes tables in parallel. Row counts and sample values are collected for context.
2. Describe — A Pydantic-AI agent generates natural-language descriptions of every table and column — using column names, types, sample values, and relationships as context. Runs as Celery batch tasks with dynamic sizing: the system monitors active tasks and adjusts batch size to stay within concurrent LLM call limits. Memory monitored per-batch to prevent OOM.
3. Enrich — Each table and column gets an “enriched embedding text” — a structured summary including description, expanded types, sample values, relationships, indexes, purpose, and synonyms. This enriched text is what gets embedded, not just the raw column name.
4. Embed — Azure OpenAI’s text-embedding-3-large (3072 dimensions) generates vectors with token-aware batching. Embeddings are unit-normalized for inner product similarity.
5. Store — Two Milvus collections (tables + columns) with HNSW indexing. When the schema changes, the system diffs live vs. stored, deletes orphans, and upserts new entries — no full re-index needed.
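Steps 3 and 4 can be sketched in a few lines: build the enriched text, then unit-normalize the vector so inner-product search behaves like cosine similarity. The enrichment template and field names below are illustrative; the real vectors come from text-embedding-3-large.

```python
import math

def enrich(table, description, columns, synonyms):
    """Build the enriched embedding text: this, not the raw name, is embedded."""
    return (f"Table {table}: {description}. "
            f"Columns: {', '.join(columns)}. "
            f"Also known as: {', '.join(synonyms)}.")

def unit_normalize(vec):
    """Scale a vector to unit length so inner product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```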
Key Technical Decisions
- Pydantic-Graph, not LangGraph — Type-safe node definitions with Pydantic output schemas enforced at every stage. Each agent has explicit input/output contracts, not free-form string passing.
- 14 specialized agents, not one mega-prompt — A single prompt with the full schema and all instructions hits context limits and produces unreliable SQL. Specialized agents with focused prompts and constrained outputs are more reliable and auditable.
- Milvus vector search, not prompt stuffing — Stuffing 500 table DDLs into a prompt is impossible. Semantic search over embedded schema descriptions returns the 10 most relevant tables per query, keeping the SQL generation prompt focused.
- sqlglot + SQLAlchemy for validation, not LLM self-check — SQL is validated by parsing (sqlglot), executing in a read-only transaction (SQLAlchemy), and checking results — not by asking the LLM “is this correct?”
- Celery + Redis for async — Complex queries take 15-30 seconds across multiple agent hops. Celery workers process asynchronously while WebSocket pushes real-time progress to the UI.
- Quick pass for simple queries — UserInteraction runs inline with a 3-second timeout. Simple classification and general questions are handled without Celery overhead.
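The execute-don't-ask validation idea can be demonstrated with stdlib sqlite3 standing in for sqlglot and SQLAlchemy: SQLite's query_only pragma makes the connection reject writes, so a candidate statement is judged by actually running it, not by asking a model whether it looks right.

```python
import sqlite3

def verify_sql(conn, sql):
    """Run candidate SQL under a read-only guard; return (ok, rows, error)."""
    conn.execute("PRAGMA query_only = ON")   # reject any write for this connection
    try:
        rows = conn.execute(sql).fetchall()
        return True, rows, ""
    except sqlite3.Error as exc:
        return False, [], str(exc)
    finally:
        conn.execute("PRAGMA query_only = OFF")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.commit()

ok, rows, err = verify_sql(conn, "SELECT name FROM customers")
blocked, _, err2 = verify_sql(conn, "DELETE FROM customers")  # write is rejected
```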
Tech Stack
- Agent Framework: Pydantic-AI 1.60 — structured output schemas, output validators, multi-provider support
- Orchestration: Pydantic-Graph 1.60 — stateful directed graph with typed state and node routing
- LLM Providers: Azure OpenAI (GPT-5.4), Anthropic Claude, Google Gemini, Groq
- Vector Database: Milvus 2.5 — HNSW indexing, 3072-dim embeddings, two collections (tables + columns)
- Embeddings: Azure OpenAI text-embedding-3-large
- SQL Parsing: sqlglot 26 + SQLAlchemy 2.0
- Backend: Django 5.2 + django-ninja (FastAPI-style routing)
- Async: Celery 5.4 + Redis + Django Channels (WebSocket)
- Database: PostgreSQL
Visual Architecture
How the 14 agents collaborate — and how databases are indexed for semantic search.
Query Pipeline Flow
- User Query — natural language input
- User Interaction Agent — intent classification · clarification · quick pass (3s)
- Approach Planner — strategy generation
- Execution Planner — concrete SQL steps
- Task Planner — discrete task queue
- Semantic Search — Milvus vector lookup
- Schema Retrieval — JSON schema pruning
- Table Selection — disambiguation
- SQL Generation — Pydantic-AI agent
- SQL Verification — execute in transaction
- SQL Reviewer — logic + performance
- SQL Execution — results returned · max 1000 rows · WebSocket push
- Error Handler + Loop Detection — recovery routing · graceful degradation · per-agent visit thresholds
Schema Embedding Pipeline
How QueryNLQ indexes a 500+ table database for semantic search — before a single query is answered.
Want to build something like this?
Let's talk about your project. No commitment, no slides — just a conversation about what's possible.
Get in Touch