Skip to content

zaydabash/llm-sql-auditor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Auditor

Python FastAPI License Tests Coverage

Production-ready LLM-driven SQL optimization and analysis tool for detecting performance issues, suggesting optimizations, and measuring real-world improvements.

Current Status:

High Test Coverage - 147 passing tests, 91% coverage
Real-time Performance Validation - Measures actual speedups in test databases
Comprehensive Monitoring - Prometheus metrics, Grafana dashboards, and alerts
LLM Cost Controls - Token tracking, budget limits, and usage reporting

Screenshots

The input view, where you paste a schema and one or more queries:

SQL Auditor input

The results view, showing detected issues, an estimated speedup, optimized rewrites, and index recommendations:

SQL Auditor results

Demo

Running the analyzer on a single query returns the detected issues, an estimated speedup, and concrete index recommendations:

$ python -m backend.scripts.demo
Query: SELECT * FROM orders o JOIN users u ON u.id = o.user_id WHERE LOWER(u.email) = 'admin@example.com' ORDER BY o.created_at DESC;

Issues:
  [R001] WARN  SELECT_STAR        Avoid SELECT * in production queries
  [R002] WARN  UNUSED_JOIN        Join on table 'users' appears unused
  [R004] WARN  NON_SARGABLE       Function applied to column in WHERE clause prevents index usage
  [R006] INFO  ORDER_BY_NO_INDEX  ORDER BY may benefit from an index on the sorted columns

Summary:
  totalIssues=4  highSeverity=0
  estImprovement=Moderate improvements available (2-4x speedup)

Recommended indexes:
  CREATE INDEX ON orders (created_at)  -- Improves ORDER BY performance on created_at
  CREATE INDEX ON orders (user_id)     -- Optimizes JOIN performance on user_id
  CREATE INDEX ON users (email)        -- Supports WHERE clause filtering on email
  CREATE INDEX ON users (id)           -- Optimizes JOIN performance on id

Features

  • Static Analysis: Detects 10+ common SQL anti-patterns (SELECT *, unused joins, cartesian products, non-SARGable predicates, etc.)
  • LLM Integration: Uses OpenAI GPT-4 to generate natural language explanations and optimized query rewrites
  • Index Advisor: Recommends indexes based on query patterns (WHERE, JOIN, ORDER BY, GROUP BY)
  • Cost Estimation: Heuristic-based cost scoring and improvement estimates
  • Web UI: Clean React + TypeScript interface for interactive analysis
  • Jupyter Notebooks: Prototyping and experimentation environment
  • Docker Support: One-command deployment with docker-compose

Architecture

┌─────────────┐
│   Frontend  │  React + TypeScript + Tailwind
│  (Port 5173)│
└──────┬──────┘
       │ HTTP
       ▼
┌─────────────┐
│   FastAPI   │  REST API
│  (Port 8000)│
└──────┬──────┘
       │
       ├──► Parser (SQLGlot)
       ├──► Rules Engine (10 rules)
       ├──► Cost Estimator
       ├──► Index Advisor
       └──► LLM Provider (OpenAI)

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+ (for frontend)
  • Poetry (recommended) or pip
  • SQLite3 (for demo database)
  • OpenAI API key (optional, for LLM features)

Installation

  1. Clone the repository

    git clone <repo-url>
    cd llm-sql-auditor
  2. Set up environment

    cp .env.example .env
    # Edit .env and add your OPENAI_API_KEY (optional - static analysis works without it)
  3. Install Python dependencies

    poetry install
    # OR
    pip install -r requirements.txt
  4. Seed demo database

    ./scripts/seed_demo.sh
  5. Start development servers

    ./scripts/dev_up.sh

    Or manually:

    # Terminal 1: Backend
    poetry run uvicorn backend.app:app --reload --port 8000
    
    # Terminal 2: Frontend
    cd frontend && npm install && npm run dev
  6. Access the application

    • Frontend: Available on the configured port (default: 5173)
    • API: Available on the configured port (default: 8000)
    • API Docs: Available at /docs endpoint

Docker

docker-compose up --build

Usage

Web UI

  1. Open the frontend application in your browser
  2. Paste your schema DDL in the "Schema" textarea
  3. Enter SQL queries (separate multiple queries with ---)
  4. Select SQL dialect (PostgreSQL or SQLite)
  5. Click "Analyze Queries"
  6. Review issues, rewrites, index suggestions, and LLM explanations

API

Audit Multiple Queries

curl -X POST <API_URL>/api/audit \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "CREATE TABLE users (id INTEGER, email TEXT);",
    "queries": [
      "SELECT * FROM users;",
      "SELECT * FROM users WHERE LOWER(email) = '\''test@example.com'\'';"
    ],
    "dialect": "postgres"
  }'

Explain Single Query

curl -X POST <API_URL>/api/explain \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "CREATE TABLE users (id INTEGER, email TEXT);",
    "query": "SELECT * FROM users WHERE LOWER(email) = '\''test@example.com'\'';",
    "dialect": "postgres"
  }'

Jupyter Notebook

cd notebooks
jupyter notebook 01_prototype.ipynb

Detected Issues

The rules engine detects the following issues:

Code Rule Severity Description Example
R001 SELECT_STAR warn Avoid SELECT * in production SELECT * FROM users;
R002 UNUSED_JOIN warn Joins where columns aren't referenced SELECT u.id FROM users u JOIN orders o ON ...
R003 CARTESIAN_JOIN error Joins without ON predicate SELECT * FROM users, orders;
R004 NON_SARGABLE warn Functions on indexed columns in WHERE WHERE LOWER(email) = 'test'
R005 MISSING_PREDICATE warn Large table scans without WHERE SELECT * FROM orders; (100k+ rows)
R006 ORDER_BY_NO_INDEX info ORDER BY columns lacking index ORDER BY created_at (no index)
R007 DISTINCT_MISUSE info DISTINCT as de-dupe band-aid SELECT DISTINCT ... with joins
R008 N_PLUS_ONE_PATTERN warn Correlated subqueries WHERE EXISTS (SELECT ... WHERE t.id = outer.id)
R009 LIKE_PREFIX_WILDCARD warn LIKE with leading wildcard WHERE name LIKE '%value'
R010 AGG_NO_GROUPING_INDEX info Aggregations missing covering index GROUP BY category (no index)

Example Output

Before & After Query Optimization

Original Query:

SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE LOWER(u.email) = 'admin@example.com'
ORDER BY o.created_at DESC;

Detected Issues:

  • R001: SELECT * usage
  • R004: Non-SARGable predicate (LOWER function)
  • R006: ORDER BY without supporting index

Optimized Query:

SELECT o.id, o.user_id, o.created_at, o.total_cents,
       u.id, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = 'admin@example.com'  -- Removed LOWER()
ORDER BY o.created_at DESC;

Recommended Index:

CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_users_email ON users(email);

API Response Example

{
  "summary": {
    "totalIssues": 5,
    "highSeverity": 1,
    "estImprovement": "2-4x potential speedup - Issues: Full scan on large table 'orders' (100000 rows), Non-SARGable function in WHERE clause"
  },
  "issues": [
    {
      "code": "R001",
      "severity": "warn",
      "message": "Avoid SELECT * in production queries...",
      "rule": "SELECT_STAR"
    }
  ],
  "rewrites": [
    {
      "original": "SELECT * FROM orders...",
      "optimized": "SELECT o.id, o.user_id FROM orders o...",
      "rationale": "Project only needed columns..."
    }
  ],
  "indexes": [
    {
      "table": "orders",
      "columns": ["user_id", "created_at"],
      "type": "btree",
      "rationale": "Supports ORDER BY and join"
    }
  ],
  "llmExplain": "Your query scans entire orders due to SELECT *..."
}

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=backend --cov-report=html

# Run specific test file
pytest backend/tests/test_rules.py

Project Structure

sql-auditor/
├── backend/
│   ├── app.py                      # FastAPI entry point and routes
│   ├── core/
│   │   ├── config.py               # Settings (env vars)
│   │   ├── models.py               # Pydantic request/response models
│   │   ├── dialects.py             # SQL dialect handling
│   │   ├── auth.py                 # API-key authentication
│   │   ├── security.py             # Input validation, rate limiting
│   │   ├── monitoring.py           # Prometheus metrics
│   │   ├── alerts.py               # Threshold alerting
│   │   └── error_handler.py        # Centralized error handling
│   ├── services/
│   │   ├── analyzer/
│   │   │   ├── parser.py           # SQL parsing
│   │   │   ├── rules_engine.py     # 10 detection rules (R001-R010)
│   │   │   ├── cost_estimator.py
│   │   │   └── index_advisor.py
│   │   ├── llm/
│   │   │   ├── provider.py         # OpenAI integration (with stub fallback)
│   │   │   ├── prompts.py          # LLM prompts
│   │   │   └── cost_tracker.py     # Token usage and budget tracking
│   │   ├── pipeline.py             # Main orchestration
│   │   ├── persistence.py          # Audit history (SQLite / PostgreSQL)
│   │   └── performance_validator.py # Real EXPLAIN-based validation
│   ├── db/
│   │   ├── seed.sql                # Demo schema
│   │   ├── explain_executor.py     # EXPLAIN execution
│   │   └── explain_helpers.py
│   ├── scripts/
│   │   ├── demo.py                 # Command-line demo
│   │   └── benchmark.py            # Anti-pattern benchmark suite
│   └── tests/                      # Test suite (147 tests)
├── frontend/
│   ├── src/
│   │   ├── App.tsx
│   │   ├── api.ts
│   │   └── components/             # Badge, InputPanel, QueryCard, ReportView, ExampleQueries
│   └── package.json
├── grafana/
│   ├── provisioning/               # Datasource + dashboard providers
│   └── dashboards/
│       └── sql-auditor.json        # Pre-built metrics dashboard
├── notebooks/
│   └── 01_prototype.ipynb
├── scripts/
│   ├── dev_up.sh
│   └── seed_demo.sh
├── docker-compose.yml
├── Dockerfile
└── README.md

Configuration

Environment variables (see .env.example):

  • OPENAI_API_KEY: OpenAI API key for LLM features (optional)
  • SQLAUDITOR_LOG_LEVEL: Logging level (default: INFO)
  • SQLAUDITOR_DEMO_DB: Path to demo SQLite database
  • DEFAULT_DIALECT: Default SQL dialect (postgres/sqlite)

Roadmap

  • Real EXPLAIN integration with live database connections
  • Multi-dialect support (PostgreSQL, SQLite)
  • Comprehensive monitoring and alerting
  • LLM cost tracking and budgeting
  • API-key authentication (optional, header-based)
  • Audit History Persistence (SQLite default, PostgreSQL optional)
  • GitHub Actions CI/CD integration
  • JWT/OAuth authentication for multi-user support
  • Slack/Teams integration for alerts

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass and coverage ≥ 90%
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

About

SaaS-style SQL auditor for B2B data teams: analyzes schema + queries using LLM + rules engine to audit, optimize, and explain SQL in natural language.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors