Welcome to the ScrapeGraph MCP Server documentation hub. This directory contains comprehensive documentation for understanding, developing, and maintaining the ScrapeGraph MCP Server.
Complete system architecture documentation including:
- System Overview - MCP server purpose and capabilities
- Technology Stack - Python 3.10+, FastMCP, httpx dependencies
- Project Structure - File organization and key files
- Core Architecture - MCP design, server architecture, patterns
- MCP Tools - API v2 tools (markdownify, scrape, smartscraper, searchscraper, crawl, credits, history, monitor, …)
- API Integration - ScrapeGraphAI API endpoints and credit system
- Deployment - Smithery, Claude Desktop, Cursor, Docker setup
- Recent Updates - SmartCrawler integration and latest features
Complete Model Context Protocol integration documentation:
- What is MCP? - Protocol overview and key concepts
- MCP in ScrapeGraph - Architecture and FastMCP usage
- Communication Protocol - JSON-RPC over stdio transport
- Tool Schema - Schema generation from Python type hints
- Error Handling - Graceful error handling patterns
- Client Integration - Claude Desktop, Cursor, custom clients
- Advanced Topics - Versioning, streaming, authentication, rate limiting
- Debugging - MCP Inspector, logs, troubleshooting
Future: PRD and implementation plans for specific features
Future: Standard operating procedures (e.g., adding new tools, testing)
- Read First:
- Setup Development Environment:
  - Install Python 3.10+
  - Clone repository: `git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp`
  - Install dependencies: `pip install -e ".[dev]"`
  - Get API key from: dashboard.scrapegraphai.com
- Run the Server:

  ```bash
  export SGAI_API_KEY=your-api-key
  scrapegraph-mcp
  ```

- Test with MCP Inspector:

  ```bash
  npx @modelcontextprotocol/inspector scrapegraph-mcp
  ```
- Integrate with Claude Desktop:
  - See: Project Architecture - Deployment
  - Add config to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)
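A minimal config entry might look like the following (the server key `scrapegraph` and the API key value are placeholders):

```json
{
  "mcpServers": {
    "scrapegraph": {
      "command": "scrapegraph-mcp",
      "env": {
        "SGAI_API_KEY": "your-api-key"
      }
    }
  }
}
```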
...what MCP is:
- Read: MCP Protocol - What is MCP?
...how to add a new tool:
- Read: Project Architecture - Contributing - Adding New Tools
- Example: See existing tools in `src/scrapegraph_mcp/server.py`
...how tools are defined:
- Read: MCP Protocol - Tool Schema
- Code: `src/scrapegraph_mcp/server.py` (lines 232-372)
...how to debug MCP issues:
- Read: MCP Protocol - Debugging MCP
- Tools: MCP Inspector, Claude Desktop logs
...how to deploy:
- Read: Project Architecture - Deployment
- Options: Smithery (automated), Docker, pip install
...available tools and their parameters:
- Read: Project Architecture - MCP Tools
- Quick reference: see README "Available Tools" table (v2: + scrape, crawl_stop/resume, credits, sgai_history, monitor_*; removed sitemap, agentic_scrapper, *_status tools)
...error handling:
- Read: MCP Protocol - Error Handling
- Pattern: Return `{"error": "message"}` instead of raising exceptions
...how SmartCrawler works:
- Read: Project Architecture - Tool #4 & #5
- Pattern: Initiate (async) → Poll fetch_results until complete
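The initiate-then-poll pattern above can be sketched as follows (the `client` object and the `"status"` response key are assumptions based on the tool names, not the exact implementation):

```python
import time

def wait_for_crawl(client, request_id: str,
                   poll_interval_s: float = 5.0,
                   timeout_s: float = 300.0) -> dict:
    """Poll smartcrawler_fetch_results until the crawl job completes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = client.smartcrawler_fetch_results(request_id)
        if result.get("status") == "completed":
            return result
        # Still processing: back off before the next poll.
        time.sleep(poll_interval_s)
    raise TimeoutError(f"Crawl {request_id} did not finish within {timeout_s}s")
```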
```bash
# Install dependencies
pip install -e ".[dev]"

# Set API key
export SGAI_API_KEY=your-api-key

# Run server
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server
```

Manual Testing (MCP Inspector):

```bash
npx @modelcontextprotocol/inspector scrapegraph-mcp
```

Manual Testing (stdio):

```bash
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"markdownify","arguments":{"website_url":"https://scrapegraphai.com"}},"id":1}' | scrapegraph-mcp
# (v2: same tool name; backend calls POST /scrape)
```

Integration Testing (Claude Desktop):
- Configure MCP server in Claude Desktop
- Restart Claude
- Ask: "Convert https://scrapegraphai.com to markdown"
- Verify tool invocation and results
```bash
# Linting
ruff check src/

# Type checking
mypy src/

# Format checking
ruff format --check src/
```

```bash
# Build
docker build -t scrapegraph-mcp .

# Run
docker run -e SGAI_API_KEY=your-api-key scrapegraph-mcp

# Test
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | docker run -i -e SGAI_API_KEY=your-api-key scrapegraph-mcp
```

Quick reference to all MCP tools:
| Tool | Notes |
|---|---|
| `markdownify` / `scrape` | `POST /scrape` (v2) |
| `smartscraper` | `POST /extract`; URL only |
| `searchscraper` | `POST /search`; `num_results` 3–20 |
| `smartcrawler_*`, `crawl_stop`, `crawl_resume` | `POST`/`GET /crawl` |
| `credits`, `sgai_history` | `GET /credits`, `/history` |
| `monitor_*` | `/monitor` namespace |
For detailed tool documentation, see Project Architecture - MCP Tools.
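Any of these tools can be exercised over the stdio transport with a hand-built JSON-RPC request. A small helper for building one (the `smartscraper` argument names are assumed to mirror the API):

```python
import json

def tools_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request line for the stdio transport."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    })

# Example: a smartscraper call, suitable for piping into `scrapegraph-mcp`
request = tools_call("smartscraper", {
    "user_prompt": "Extract the page title",
    "website_url": "https://scrapegraphai.com",
})
```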
- `src/scrapegraph_mcp/server.py` - Main server implementation (all code)
- `src/scrapegraph_mcp/__init__.py` - Package initialization
- `pyproject.toml` - Project metadata, dependencies, build config
- `Dockerfile` - Docker container definition
- `smithery.yaml` - Smithery deployment config
- `README.md` - User-facing documentation
- `.agent/README.md` - This file (developer documentation index)
- `.agent/system/project_architecture.md` - Architecture documentation
- `.agent/system/mcp_protocol.md` - MCP protocol documentation
Issue: "ScapeGraph client not initialized"
- Cause: Missing `SGAI_API_KEY` environment variable
- Solution: Set `export SGAI_API_KEY=your-api-key` or pass via `--config`
Issue: "Error 401: Unauthorized"
- Cause: Invalid API key
- Solution: Verify API key at dashboard.scrapegraphai.com
Issue: "Error 402: Payment Required"
- Cause: Insufficient credits
- Solution: Add credits to your ScrapeGraphAI account
Issue: Tools not appearing in Claude Desktop
- Cause: Server not starting or config error
- Solution: Check Claude logs at `~/Library/Logs/Claude/` (macOS)
Issue: SmartCrawler not returning results
- Cause: Still processing (async operation)
- Solution: Keep polling `smartcrawler_fetch_results()` until `status == "completed"`
Issue: Python version error
- Cause: Python < 3.10
- Solution: Upgrade Python to 3.10+
For more troubleshooting, see MCP Protocol - Debugging.
- Read relevant documentation - Understand MCP and the server architecture
- Check existing issues - Avoid duplicate work
- Test locally - Use MCP Inspector to verify changes
- Test with clients - Verify with Claude Desktop or Cursor
Step-by-step guide:
- Add a method to the `ScapeGraphClient` class:

  ```python
  def new_tool(self, param: str) -> Dict[str, Any]:
      """Tool description."""
      url = f"{self.BASE_URL}/new-endpoint"
      data = {"param": param}

      response = self.client.post(url, headers=self.headers, json=data)

      if response.status_code != 200:
          raise Exception(f"Error {response.status_code}: {response.text}")

      return response.json()
  ```

- Add the MCP tool decorator:

  ```python
  @mcp.tool()
  def new_tool(param: str) -> Dict[str, Any]:
      """
      Tool description for AI assistants.

      Args:
          param: Parameter description

      Returns:
          Dictionary containing results
      """
      if scrapegraph_client is None:
          return {"error": "ScapeGraph client not initialized. Please provide an API key."}

      try:
          return scrapegraph_client.new_tool(param)
      except Exception as e:
          return {"error": str(e)}
  ```

- Test with MCP Inspector:

  ```bash
  npx @modelcontextprotocol/inspector scrapegraph-mcp
  ```

- Update documentation:
- Add tool to Project Architecture - MCP Tools
- Add schema to MCP Protocol - Tool Schema
- Update tool reference table in this README
- Submit pull request
- Make changes - Edit `src/scrapegraph_mcp/server.py`
- Run linting - `ruff check src/`
- Run type checking - `mypy src/`
- Test locally - MCP Inspector + Claude Desktop
- Update docs - Keep `.agent/` docs in sync
- Commit - Clear commit message
- Create PR - Describe changes thoroughly
- Ruff: Line length 100, target Python 3.12
- mypy: Strict mode, disallow untyped defs
- Type hints: Always use type hints for parameters and return values
- Docstrings: Google-style docstrings for all public functions
- Error handling: Return error dicts, don't raise exceptions in tools
Update .agent/system/project_architecture.md when:
- Adding new MCP tools
- Changing tool parameters or return types
- Updating deployment methods
- Modifying technology stack
Update .agent/system/mcp_protocol.md when:
- Changing MCP protocol implementation
- Adding new communication patterns
- Modifying error handling strategy
- Updating authentication method
Update .agent/README.md when:
- Adding new documentation files
- Changing development workflows
- Updating quick start instructions
- Keep it current - Update docs with code changes in the same PR
- Be specific - Include code snippets, file paths, line numbers
- Include examples - Show real-world usage patterns
- Link related sections - Cross-reference between documents
- Test examples - Verify all code examples work
- ✅ Migrated MCP client and tools to API v2 (scrapegraph-py#84): base `https://v2-api.scrapegraphai.com/api`, `SGAI-APIKEY` header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, and status polling tools. Env vars aligned with SDK: `SGAI_API_URL`, `SGAI_TIMEOUT` (legacy alias `SGAI_TIMEOUT_S` still honored).
- ✅ Added `monitor_activity` tool for paginated tick history (`GET /monitor/:id/activity`), mirroring `sgai.monitor.activity()` in scrapegraph-py v2.
- ✅ Added `time_range` parameter to SearchScraper for filtering results by recency (v1-era; ignored on API v2)
- ✅ Supported time ranges: `past_hour`, `past_24_hours`, `past_week`, `past_month`, `past_year`
- ✅ Documentation updated to reflect SDK changes (scrapegraph-py#77, scrapegraph-js#2)
- ✅ Initial comprehensive documentation created
- ✅ Project architecture fully documented
- ✅ MCP protocol integration documented
- ✅ All 5 MCP tools documented
- ✅ SmartCrawler integration (initiate + fetch_results)
- ✅ Deployment guides (Smithery, Docker, Claude Desktop, Cursor)
- ✅ Recent updates: Enhanced error handling, extraction mode validation
- Main README - User-facing documentation
- Server Implementation - All code (single file)
- pyproject.toml - Project metadata
- Dockerfile - Docker configuration
- smithery.yaml - Smithery config
- GitHub Repository
For questions or issues:
- Check this documentation first
- Review Project Architecture and MCP Protocol
- Test with MCP Inspector
- Search GitHub issues
- Create a new issue with detailed information
Made with ❤️ by the ScrapeGraphAI Team

Happy Coding! 🚀