
ScrapeGraph MCP Server Documentation

Welcome to the ScrapeGraph MCP Server documentation hub. This directory contains comprehensive documentation for understanding, developing, and maintaining the ScrapeGraph MCP Server.

📚 Available Documentation

System Documentation (system/)

Complete system architecture documentation including:

  • System Overview - MCP server purpose and capabilities
  • Technology Stack - Python 3.10+, FastMCP, httpx dependencies
  • Project Structure - File organization and key files
  • Core Architecture - MCP design, server architecture, patterns
  • MCP Tools - API v2 tools (markdownify, scrape, smartscraper, searchscraper, crawl, credits, history, monitor, …)
  • API Integration - ScrapeGraphAI API endpoints and credit system
  • Deployment - Smithery, Claude Desktop, Cursor, Docker setup
  • Recent Updates - SmartCrawler integration and latest features

MCP Protocol Documentation (system/)

Complete Model Context Protocol integration documentation:

  • What is MCP? - Protocol overview and key concepts
  • MCP in ScrapeGraph - Architecture and FastMCP usage
  • Communication Protocol - JSON-RPC over stdio transport
  • Tool Schema - Schema generation from Python type hints
  • Error Handling - Graceful error handling patterns
  • Client Integration - Claude Desktop, Cursor, custom clients
  • Advanced Topics - Versioning, streaming, authentication, rate limiting
  • Debugging - MCP Inspector, logs, troubleshooting
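The communication protocol listed above is easy to see on the wire: each JSON-RPC 2.0 message is a single JSON object written to the server's stdin or stdout. A minimal sketch of the request a client sends to list tools (field names follow the JSON-RPC 2.0 spec; tools/list is a standard MCP method):

```python
import json

# JSON-RPC 2.0 request asking an MCP server for its tool list.
# Over the stdio transport this is serialized as one line on stdin.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}

wire_message = json.dumps(request)
print(wire_message)
```

The server replies on stdout with a JSON-RPC response carrying the same id, which is how clients match responses to requests over a single pipe.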

Task Documentation (tasks/)

Future: PRD and implementation plans for specific features

SOP Documentation (sop/)

Future: Standard operating procedures (e.g., adding new tools, testing)


🚀 Quick Start

For New Engineers

  1. Read First:

    • Project Architecture (.agent/system/project_architecture.md)
    • MCP Protocol (.agent/system/mcp_protocol.md)

  2. Setup Development Environment:

    • Install Python 3.10+
    • Clone repository: git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
    • Install dependencies: pip install -e ".[dev]"
    • Get API key from: dashboard.scrapegraphai.com
  3. Run the Server:

    export SGAI_API_KEY=your-api-key
    scrapegraph-mcp
  4. Test with MCP Inspector:

    npx @modelcontextprotocol/inspector scrapegraph-mcp
  5. Integrate with Claude Desktop:


πŸ” Finding Information

I want to understand...

...what MCP is:

  • Read: MCP Protocol - What is MCP?

...how to add a new tool:

  • Read: Contributing - Adding a New Tool (below in this document)

...how tools are defined:

  • Read: MCP Protocol - Tool Schema

...how to debug MCP issues:

  • Read: MCP Protocol - Debugging

...how to deploy:

  • Read: Project Architecture - Deployment

...available tools and their parameters:

  • Read: Project Architecture - MCP Tools
  • Quick reference: see the README "Available Tools" table (v2 adds scrape, crawl_stop/resume, credits, sgai_history, and monitor_*; removes the sitemap, agentic_scrapper, and *_status tools)

...error handling:

  • Read: MCP Protocol - Error Handling

...how SmartCrawler works:

  • Read: Project Architecture - Recent Updates


πŸ› οΈ Development Workflows

Running Locally

# Install dependencies
pip install -e ".[dev]"

# Set API key
export SGAI_API_KEY=your-api-key

# Run server
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server

Testing

Manual Testing (MCP Inspector):

npx @modelcontextprotocol/inspector scrapegraph-mcp

Manual Testing (stdio):

echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"markdownify","arguments":{"website_url":"https://scrapegraphai.com"}},"id":1}' | scrapegraph-mcp
# (v2: same tool name; backend calls POST /scrape)

Integration Testing (Claude Desktop):

  1. Configure MCP server in Claude Desktop
  2. Restart Claude
  3. Ask: "Convert https://scrapegraphai.com to markdown"
  4. Verify tool invocation and results

Code Quality

# Linting
ruff check src/

# Type checking
mypy src/

# Format checking
ruff format --check src/

Building Docker Image

# Build
docker build -t scrapegraph-mcp .

# Run
docker run -e SGAI_API_KEY=your-api-key scrapegraph-mcp

# Test
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | docker run -i -e SGAI_API_KEY=your-api-key scrapegraph-mcp

📊 MCP Tools Reference

Quick reference to all MCP tools:

Tool (with endpoint notes):

  • markdownify / scrape - POST /scrape (v2)
  • smartscraper - POST /extract; URL only
  • searchscraper - POST /search; num_results 3–20
  • smartcrawler_*, crawl_stop, crawl_resume - POST/GET /crawl
  • credits, sgai_history - GET /credits, /history
  • monitor_* - /monitor namespace

For detailed tool documentation, see Project Architecture - MCP Tools.
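Every tool above ultimately wraps an HTTPS call in the v2 wire format noted in the changelog (SGAI-APIKEY header against the v2 base URL). A minimal sketch of how such a request could be assembled; build_scrape_request is an illustrative helper, not the server's actual internals:

```python
# Sketch of the v2 wire format described in this document's changelog.
# build_scrape_request is illustrative, not part of the codebase.
BASE_URL = "https://v2-api.scrapegraphai.com/api"

def build_scrape_request(api_key: str, website_url: str):
    """Return (url, headers, payload) for a v2 /scrape call."""
    url = f"{BASE_URL}/scrape"
    headers = {"SGAI-APIKEY": api_key, "Content-Type": "application/json"}
    payload = {"website_url": website_url}
    return url, headers, payload

url, headers, payload = build_scrape_request("sgai-...", "https://scrapegraphai.com")
print(url)
```

The actual request is then issued with httpx, as shown in the Contributing section below.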


🔧 Key Files Reference

Core Files

  • src/scrapegraph_mcp/server.py - Main server implementation (all code)
  • src/scrapegraph_mcp/__init__.py - Package initialization

Configuration

  • pyproject.toml - Project metadata, dependencies, build config
  • Dockerfile - Docker container definition
  • smithery.yaml - Smithery deployment config

Documentation

  • README.md - User-facing documentation
  • .agent/README.md - This file (developer documentation index)
  • .agent/system/project_architecture.md - Architecture documentation
  • .agent/system/mcp_protocol.md - MCP protocol documentation

🚨 Troubleshooting

Common Issues

Issue: "ScapeGraph client not initialized"

  • Cause: Missing SGAI_API_KEY environment variable
  • Solution: Set export SGAI_API_KEY=your-api-key or pass via --config

Issue: "Error 401: Unauthorized"

  • Cause: Invalid or revoked API key
  • Solution: Verify that SGAI_API_KEY matches the key shown at dashboard.scrapegraphai.com

Issue: "Error 402: Payment Required"

  • Cause: Insufficient credits
  • Solution: Add credits to your ScrapeGraphAI account

Issue: Tools not appearing in Claude Desktop

  • Cause: Server not starting or config error
  • Solution: Check Claude logs at ~/Library/Logs/Claude/ (macOS)

Issue: SmartCrawler not returning results

  • Cause: Still processing (async operation)
  • Solution: Keep polling smartcrawler_fetch_results() until status == "completed"
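The polling advice above can be wrapped in a small helper. This is a sketch, not code from the server: fetch_results stands in for any callable that returns the crawl status dict, and the "completed"/"failed" status values are assumptions based on this guide:

```python
import time

def wait_for_crawl(fetch_results, request_id, poll_interval=5.0, max_polls=60):
    """Poll until a crawl job reports a terminal status.

    fetch_results: callable taking a request id and returning a dict with
    at least a "status" key (e.g. a smartcrawler_fetch_results wrapper).
    """
    for _ in range(max_polls):
        result = fetch_results(request_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"Crawl failed: {result}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Crawl {request_id} still pending after {max_polls} polls")

# Stub client that completes on the third poll.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "result": {"pages": 3}},
])
final = wait_for_crawl(lambda rid: next(responses), "req-123", poll_interval=0.0)
print(final["status"])
```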

Issue: Python version error

  • Cause: Python < 3.10
  • Solution: Upgrade Python to 3.10+

For more troubleshooting, see the Debugging section of .agent/system/mcp_protocol.md.


🤝 Contributing

Before Making Changes

  1. Read relevant documentation - Understand MCP and the server architecture
  2. Check existing issues - Avoid duplicate work
  3. Test locally - Use MCP Inspector to verify changes
  4. Test with clients - Verify with Claude Desktop or Cursor

Adding a New Tool

Step-by-step guide:

  1. Add method to ScapeGraphClient class:
def new_tool(self, param: str) -> Dict[str, Any]:
    """Tool description."""
    url = f"{self.BASE_URL}/new-endpoint"
    data = {"param": param}
    response = self.client.post(url, headers=self.headers, json=data)
    if response.status_code != 200:
        raise Exception(f"Error {response.status_code}: {response.text}")
    return response.json()
  2. Add MCP tool decorator:
@mcp.tool()
def new_tool(param: str) -> Dict[str, Any]:
    """
    Tool description for AI assistants.

    Args:
        param: Parameter description

    Returns:
        Dictionary containing results
    """
    if scrapegraph_client is None:
        return {"error": "ScapeGraph client not initialized. Please provide an API key."}

    try:
        return scrapegraph_client.new_tool(param)
    except Exception as e:
        return {"error": str(e)}
  3. Test with MCP Inspector:
npx @modelcontextprotocol/inspector scrapegraph-mcp
  4. Update documentation:
  5. Submit pull request

Development Process

  1. Make changes - Edit src/scrapegraph_mcp/server.py
  2. Run linting - ruff check src/
  3. Run type checking - mypy src/
  4. Test locally - MCP Inspector + Claude Desktop
  5. Update docs - Keep .agent/ docs in sync
  6. Commit - Clear commit message
  7. Create PR - Describe changes thoroughly

Code Style

  • Ruff: Line length 100, target Python 3.12
  • mypy: Strict mode, disallow untyped defs
  • Type hints: Always use type hints for parameters and return values
  • Docstrings: Google-style docstrings for all public functions
  • Error handling: Return error dicts, don't raise exceptions in tools

📖 External Documentation

MCP Resources

ScrapeGraphAI Resources

AI Assistant Integration

Development Tools


πŸ“ Documentation Maintenance

When to Update Documentation

Update .agent/system/project_architecture.md when:

  • Adding new MCP tools
  • Changing tool parameters or return types
  • Updating deployment methods
  • Modifying technology stack

Update .agent/system/mcp_protocol.md when:

  • Changing MCP protocol implementation
  • Adding new communication patterns
  • Modifying error handling strategy
  • Updating authentication method

Update .agent/README.md when:

  • Adding new documentation files
  • Changing development workflows
  • Updating quick start instructions

Documentation Best Practices

  1. Keep it current - Update docs with code changes in the same PR
  2. Be specific - Include code snippets, file paths, line numbers
  3. Include examples - Show real-world usage patterns
  4. Link related sections - Cross-reference between documents
  5. Test examples - Verify all code examples work

📅 Changelog

April 2026

  • ✅ Migrated MCP client and tools to API v2 (scrapegraph-py#84): base https://v2-api.scrapegraphai.com/api, SGAI-APIKEY header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, and status-polling tools. Env vars aligned with SDK: SGAI_API_URL, SGAI_TIMEOUT (legacy alias SGAI_TIMEOUT_S still honored).
  • ✅ Added monitor_activity tool for paginated tick history (GET /monitor/:id/activity), mirroring sgai.monitor.activity() in scrapegraph-py v2.
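The environment-variable handling described above (SGAI_TIMEOUT with legacy alias SGAI_TIMEOUT_S) can be sketched as follows; the resolve_config helper and the 30-second default are illustrative assumptions, not the server's exact code:

```python
import os

DEFAULT_API_URL = "https://v2-api.scrapegraphai.com/api"

def resolve_config(env=None):
    """Resolve API base URL and timeout, honoring the legacy timeout alias."""
    env = os.environ if env is None else env
    api_url = env.get("SGAI_API_URL", DEFAULT_API_URL)
    # SGAI_TIMEOUT wins over the legacy SGAI_TIMEOUT_S; 30s default is assumed.
    raw_timeout = env.get("SGAI_TIMEOUT") or env.get("SGAI_TIMEOUT_S") or "30"
    return api_url, float(raw_timeout)

print(resolve_config({"SGAI_TIMEOUT_S": "45"}))
```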

January 2026

  • ✅ Added time_range parameter to SearchScraper for filtering results by recency (v1-era; ignored on API v2)
  • ✅ Supported time ranges: past_hour, past_24_hours, past_week, past_month, past_year
  • ✅ Documentation updated to reflect SDK changes (scrapegraph-py#77, scrapegraph-js#2)

October 2025

  • ✅ Initial comprehensive documentation created
  • ✅ Project architecture fully documented
  • ✅ MCP protocol integration documented
  • ✅ All 5 MCP tools documented
  • ✅ SmartCrawler integration (initiate + fetch_results)
  • ✅ Deployment guides (Smithery, Docker, Claude Desktop, Cursor)
  • ✅ Recent updates: Enhanced error handling, extraction mode validation

🔗 Quick Links


📧 Support

For questions or issues:

  1. Check this documentation first
  2. Review Project Architecture and MCP Protocol
  3. Test with MCP Inspector
  4. Search GitHub issues
  5. Create a new issue with detailed information

Made with ❤️ by ScrapeGraphAI Team

Happy Coding! 🚀