LexIntel is a full-stack AI-powered legal document analysis platform that allows users to upload PDF contracts or documents, ask questions, and receive source-grounded answers using Retrieval-Augmented Generation (RAG).
The project was built using FastAPI, React, Gemini, Qdrant, and document embeddings. It started as a basic PDF chatbot and was improved into a more trustworthy RAG system with page-level citations, source text, filename tracking, and similarity scores.
Basic PDF chatbots often generate answers without showing where the information came from. This creates a trust issue, especially for legal or contract-related documents.
LexIntel solves this by retrieving relevant document chunks from Qdrant and returning:
- AI-generated answer
- Source filename
- Page number
- Retrieved source text
- Similarity score
- Chunk index
This makes every answer more transparent and easier to verify.
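To make the similarity score concrete, here is a minimal sketch of how retrieved chunks might be ranked. It assumes cosine similarity, which the source does not confirm (Qdrant supports other distance metrics as well), and the names `cosine_similarity` and `top_k` are illustrative, not part of LexIntel. In the real system Qdrant performs this search server-side.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunks, k=3):
    """Return the k best-matching chunk payloads, each with a rounded 'score' field.

    `chunks` is a list of {"vector": [...], "payload": {...}} dicts
    (a hypothetical in-memory stand-in for a Qdrant collection).
    """
    scored = [
        {**chunk["payload"], "score": round(cosine_similarity(query_vector, chunk["vector"]), 4)}
        for chunk in chunks
    ]
    return sorted(scored, key=lambda item: item["score"], reverse=True)[:k]
```

Each returned item carries the stored metadata plus its score, which is the same shape as an entry in the `sources` array of the analysis response.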
Features:
- PDF contract/document upload
- Text extraction from uploaded PDFs
- Page-wise document processing
- Chunk creation for retrieval
- Embedding generation for each chunk
- Vector storage using Qdrant
- RAG-based question answering
- Gemini-powered legal document explanation
- Source-grounded answers
- Page number citations
- Similarity scores for retrieved chunks
- React-based dashboard UI
- Upload status and document statistics
- Source preview display in frontend
The initial version of LexIntel only uploaded documents and returned AI answers. The improved Phase 1 version adds source grounding.
The basic version worked like this:
PDF Upload
↓
Extract Text
↓
Create Chunks
↓
Store Embeddings
↓
Ask Question
↓
Get AI Answer
The issue was that the answer did not clearly show which page or document section supported it.
The improved version works like this:
PDF Upload
↓
Extract Text Page-by-Page
↓
Create Chunks with Metadata
↓
Store Text + Filename + Page Number + Chunk Index in Qdrant
↓
Ask Question
↓
Retrieve Relevant Chunks
↓
Generate Grounded Answer with Sources
↓
Display Page Numbers and Similarity Scores in UI
Now every answer includes supporting evidence from the uploaded document.
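The "Create Chunks with Metadata" step can be sketched as follows. This is an illustration under stated assumptions, not LexIntel's actual code: the function name `chunk_pages`, the character-based chunk size, the overlap, and the choice to restart `chunk_index` on every page are all hypothetical.

```python
def chunk_pages(pages, filename, chunk_size=500, overlap=50):
    """Split per-page text into overlapping chunks that remember their page.

    `pages` is a list of page texts in document order. With pypdf this could be
    built as: [page.extract_text() or "" for page in PdfReader(path).pages]
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):  # pages are 1-based for citations
        text = text.strip()
        chunk_index = 0  # assumption: the index restarts on each page
        start = 0
        while start < len(text):
            chunks.append({
                "text": text[start:start + chunk_size],
                "filename": filename,
                "page": page_number,
                "chunk_index": chunk_index,
            })
            chunk_index += 1
            if start + chunk_size >= len(text):
                break  # this chunk already reaches the end of the page
            start += step
    return chunks
```

Because chunks never cross a page boundary in this sketch, every retrieved chunk maps to exactly one page number, which is what makes page-level citations possible.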
Tech stack:
- React
- Vite
- Axios
- CSS
- FastAPI
- Python
- Pydantic
- Uvicorn
- Google Gemini
- Qdrant Vector Database
- Embeddings
- Retrieval-Augmented Generation
- pypdf
lexintel/
│
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   └── routes/
│   │   │       ├── upload.py
│   │   │       ├── analysis.py
│   │   │       └── health.py
│   │   │
│   │   ├── core/
│   │   │   └── config.py
│   │   │
│   │   ├── repositories/
│   │   │   ├── file_repository.py
│   │   │   └── qdrant_repository.py
│   │   │
│   │   ├── schemas/
│   │   │   └── analysis_schema.py
│   │   │
│   │   ├── services/
│   │   │   ├── analysis_service.py
│   │   │   ├── document_service.py
│   │   │   ├── embedding_service.py
│   │   │   ├── gemini_service.py
│   │   │   └── pdf_service.py
│   │   │
│   │   └── main.py
│   │
│   ├── requirements.txt
│   └── .env.example
│
├── frontend/
│   ├── src/
│   │   ├── api/
│   │   │   └── contractApi.js
│   │   ├── components/
│   │   │   ├── chat/
│   │   │   ├── dashboard/
│   │   │   └── upload/
│   │   └── pages/
│   │       └── Dashboard.jsx
│   │
│   └── package.json
│
├── docs/
│   └── LexIntel_Phase1_Issues_and_Improvements.docx
│
└── README.md
Go to the backend folder:
cd backend
Create a virtual environment:
python -m venv venv
Activate the virtual environment (Windows):
venv\Scripts\activate
On macOS or Linux, activate it with:
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Create a .env file inside the backend folder.
Example:
GOOGLE_API_KEY=your_google_api_key
QDRANT_URL=your_qdrant_cluster_url
QDRANT_API_KEY=your_qdrant_api_key
QDRANT_COLLECTION=legal_documents_v2
Run the backend:
uvicorn app.main:app --reload
If port 8000 is already occupied, run:
uvicorn app.main:app --reload --port 8001
Open Swagger docs:
http://127.0.0.1:8000/docs
or:
http://127.0.0.1:8001/docs
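A missing or malformed setting in the backend .env file is easier to diagnose if it is checked at startup. The sketch below shows one way to do that with the standard library only. It is an assumption, not the contents of the project's `config.py`: the `Settings` class and `load_settings` function are hypothetical, and it reads the process environment, so the .env file would still need to be loaded by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass

# Variable names taken from the .env example; treating these three as required is an assumption.
REQUIRED_VARS = ("GOOGLE_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")
DEFAULT_COLLECTION = "legal_documents_v2"


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    qdrant_collection: str = DEFAULT_COLLECTION


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        qdrant_url=env["QDRANT_URL"].rstrip("/"),  # normalise a trailing slash
        qdrant_api_key=env["QDRANT_API_KEY"],
        qdrant_collection=env.get("QDRANT_COLLECTION") or DEFAULT_COLLECTION,
    )
```

Failing at startup with a named variable is usually quicker to debug than a later error from the vector database.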
Go to the frontend folder:
cd frontend
Install dependencies:
npm install
Run frontend:
npm run dev
Open the frontend URL shown in the terminal, usually:
http://localhost:5173
POST /api/contracts/upload
Uploads a PDF, extracts text, chunks it, creates embeddings, and stores the chunks in Qdrant.
Example response:
{
  "filename": "contract.pdf",
  "file_path": "uploads/contract.pdf",
  "pages_extracted": 5,
  "chunks_created": 12,
  "message": "Contract uploaded and indexed successfully"
}
POST /api/analysis/analyze
Accepts a user query and returns a grounded AI response with sources.
Example request:
{
  "query": "What are the important clauses in this document?"
}
Example response:
{
  "analysis": "The document contains important clauses related to termination, payment, and responsibilities.",
  "sources": [
    {
      "text": "Either party may terminate this agreement...",
      "filename": "contract.pdf",
      "page": 2,
      "chunk_index": 0,
      "score": 0.8123
    }
  ]
}
The upload endpoint was failing with a 500 error. The issue was not the file upload itself, but the indexing step after upload.
The upload pipeline depended on Qdrant because the backend ran these steps in sequence:
Upload PDF
↓
Extract text
↓
Create embeddings
↓
Store in Qdrant
When Qdrant failed, the full upload failed.
Fix:
- Debugged the upload route
- Added clearer error handling
- Understood that upload and indexing were connected
- Fixed Qdrant configuration
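One way to keep an indexing failure from surfacing as an opaque 500 is to catch it and report which stage failed. The sketch below is a hypothetical illustration, not the project's upload route: `upload_and_index`, `save_file`, and `index_document` are invented names, and the two callables stand in for the file repository and the embedding plus Qdrant step.

```python
def upload_and_index(filename, save_file, index_document):
    """Save the file, then try to index it, reporting indexing errors explicitly.

    save_file(filename) -> stored file path
    index_document(path) -> dict of stats, e.g. pages_extracted and chunks_created
    """
    file_path = save_file(filename)
    try:
        stats = index_document(file_path)
    except Exception as exc:  # broad on purpose: any indexing error should be reported
        return {
            "filename": filename,
            "file_path": file_path,
            "indexed": False,
            "message": f"File saved, but indexing failed: {exc}",
        }
    return {
        "filename": filename,
        "file_path": file_path,
        "indexed": True,
        **stats,
        "message": "Contract uploaded and indexed successfully",
    }
```

With this shape, a Qdrant outage yields a response that names the failing step instead of hiding it behind a generic server error.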
Qdrant returned:
Unexpected Response: 404 Not Found
Reason:
The Qdrant URL was incorrect. The backend was not pointing to the proper Qdrant cluster API endpoint.
Fix:
- Updated QDRANT_URL in .env
- Used correct Qdrant Cloud cluster URL
- Restarted backend
- Re-tested upload successfully
The earlier PDF extraction logic combined the whole document into one large text string. Because of that, the backend could not identify which page a retrieved answer came from.
Fix:
- Changed PDF extraction to work page-by-page
- Stored page number with each chunk
- Added page metadata into Qdrant payload
Earlier, Qdrant stored only:
{
  "text": "chunk text",
  "filename": "contract.pdf",
  "chunk_index": 0
}
Improved payload:
{
  "text": "chunk text",
  "filename": "contract.pdf",
  "page": 2,
  "chunk_index": 0
}
This allowed the frontend to show page-level citations.
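As a rough sketch of how the improved payload might be assembled before it is sent to Qdrant, the helper below pairs each chunk with its embedding vector. The function name `build_points` is hypothetical, and plain dicts are used so the example runs without a Qdrant client installed. The `id`, `vector`, and `payload` keys mirror the fields of a Qdrant point.

```python
import uuid


def build_points(chunks, vectors):
    """Pair each chunk with its embedding and wrap both as a Qdrant-style point."""
    if len(chunks) != len(vectors):
        raise ValueError("each chunk needs exactly one vector")

    points = []
    for chunk, vector in zip(chunks, vectors):
        points.append({
            "id": str(uuid.uuid4()),  # Qdrant point IDs are unsigned integers or UUIDs
            "vector": list(vector),
            "payload": {
                "text": chunk["text"],
                "filename": chunk["filename"],
                "page": chunk["page"],  # the field added in the improved payload
                "chunk_index": chunk["chunk_index"],
            },
        })
    return points
```

Building the payload from explicit keys means a chunk without a page number fails with a `KeyError` at indexing time rather than producing an uncitable result later.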
During development, port 8000 was stuck with an older process, so backend testing was moved to port 8001.
Fix:
- Updated frontend API base URL from port 8000 to 8001 during testing
- Increased frontend timeout for longer RAG responses
The backend was returning sources, but the chat message component only displayed the answer text.
Fix:
- Updated message state to store sources
- Passed sources into MessageBubble
- Added source cards below AI responses
- Displayed filename, page number, similarity score, and preview text
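The data each source card needs can be sketched independently of the UI. The frontend itself is written in React, so the Python below only illustrates the transformation from one entry of the `sources` array to display strings. The function name `format_source`, the preview length, and the two-decimal score format are all assumptions.

```python
def format_source(source, preview_chars=120):
    """Turn one entry of the API's `sources` array into display-ready strings."""
    text = " ".join(source["text"].split())  # collapse line breaks left over from PDF extraction
    if len(text) > preview_chars:
        text = text[:preview_chars].rstrip() + "..."
    return {
        "label": f"{source['filename']}, page {source['page']}",
        "score": f"{source['score']:.2f}",
        "preview": text,
    }
```

Applied to the example response above, this yields the label `contract.pdf, page 2` and the score `0.81`.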
The project was improved from a basic RAG chatbot into a more reliable document intelligence system.
Basic version:
- Uploaded PDFs
- Extracted text
- Stored embeddings
- Returned AI-generated answers
Improved version:
- Extracts PDF text page-by-page
- Stores metadata with each chunk
- Retrieves relevant chunks from Qdrant
- Returns filename, page number, chunk index, and similarity score
- Displays sources in the frontend
- Improves trust and reduces hallucination risk
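Grounding depends on how the retrieved chunks reach the model. A minimal sketch of one possible prompt builder follows. The wording of the instructions and the name `build_grounded_prompt` are assumptions, and the actual prompt sent to Gemini in LexIntel may differ.

```python
def build_grounded_prompt(query, sources):
    """Compose a prompt that restricts the model to the retrieved passages."""
    blocks = [
        f"[{number}] {src['filename']}, page {src['page']}:\n{src['text']}"
        for number, src in enumerate(sources, start=1)
    ]
    context = "\n\n".join(blocks) if blocks else "(no relevant passages found)"
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passages by number, and say so if the passages do not contain the answer.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}"
    )
```

Numbering the passages gives the model something concrete to cite, and the same numbers can be mapped back to filename and page when the answer is displayed.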
Planned improvements for the next phases:
- Risk detection for legal clauses
- Clause extraction
- Contract summary report
- Missing clause detection
- Contract comparison
- Chat history per document
- RAG evaluation dashboard
- Better reranking of retrieved chunks
- Authentication and user-specific documents
This project can be described on a resume as:
Built LexIntel, a full-stack legal document intelligence platform using FastAPI, React, Gemini, and Qdrant. Improved the RAG pipeline by adding source-grounded answers with filename, page number, retrieved text, chunk index, and similarity scores for better trust and reduced hallucination.
The advanced Phase 2 RAG upgrade is available in a separate branch:
lexintel-3-rag-upgrade
This branch includes parent-child chunking, Qdrant metadata improvements, and cross-encoder re-ranking.
Phani M
BTech Electronics and Communication Engineering
Interested in AI/ML, GenAI, FastAPI, React, and full-stack AI applications.
LexIntel is an educational AI project. It explains document content based on uploaded sources, but it does not provide legal advice.