|
1 | 1 | --- |
2 | | -title: Scraping behind a proxy |
3 | | -description: 'Route requests through your own proxy for geo-targeting or privacy' |
| 2 | +title: Proxy & Fetch Configuration |
| 3 | +description: 'Control proxy routing, stealth mode, and geo-targeting with FetchConfig' |
4 | 4 | --- |
5 | 5 |
|
6 | | -Using a proxy lets you route ScrapeGraphAI requests through a specific IP address or geographic location. This is useful for accessing geo-restricted content, bypassing IP-based blocks, or testing region-specific pages. |
| 6 | +In v2, all proxy and fetch behaviour is controlled through the `FetchConfig` object. You can set the fetch and proxy strategy (`mode`), country-based geo-targeting (`country`), wait times, scrolling, custom headers and cookies, and more. |
7 | 7 |
|
8 | | -## How to pass a proxy |
| 8 | +See the [full proxy reference](/services/additional-parameters/proxy) for all available options. |
9 | 9 |
|
10 | | -Use the `proxy` parameter available in SmartScraper, SearchScraper, and Markdownify: |
| 10 | +## Choosing a fetch mode |
11 | 11 |
|
12 | | -```python |
13 | | -from scrapegraph_py import Client |
| 12 | +The `mode` parameter controls how pages are retrieved: |
| 13 | + |
| 14 | +| Mode | Description | |
| 15 | +|------|-------------| |
| 16 | +| `auto` | Automatically selects the best strategy (default) | |
| 17 | +| `fast` | Direct HTTP fetch with no JS rendering (fastest option) |
| 18 | +| `js` | Headless browser for JavaScript-heavy pages | |
| 19 | +| `direct+stealth` | Residential proxy with stealth headers (no JS) | |
| 20 | +| `js+stealth` | JS rendering + residential/stealth proxy | |
| 21 | + |
| 22 | +## Examples |
| 23 | + |
| 24 | +### Geo-targeted content |
| 25 | + |
| 26 | +Access content from a specific country using the `country` parameter: |
| 27 | + |
| 28 | +<CodeGroup> |
| 29 | + |
| 30 | +```python Python |
| 31 | +from scrapegraph_py import Client, FetchConfig |
14 | 32 |
|
15 | 33 | client = Client(api_key="your-api-key") |
16 | 34 |
|
17 | | -response = client.smartscraper( |
18 | | - website_url="https://example.com", |
19 | | - user_prompt="Extract the main content", |
20 | | - proxy="http://username:password@proxy-host:8080", |
| 35 | +response = client.extract( |
| 36 | + url="https://example.com", |
| 37 | + prompt="Extract the main content", |
| 38 | + fetch_config=FetchConfig(country="de"), # Route through Germany |
21 | 39 | ) |
22 | 40 | ``` |
23 | 41 |
|
24 | | -```javascript |
25 | | -import { scrapegraphai } from "scrapegraph-js"; |
| 42 | +```javascript JavaScript |
| 43 | +import { scrapegraphai } from 'scrapegraph-js'; |
26 | 44 |
|
27 | | -const sgai = scrapegraphai({ apiKey: "your-api-key" }); |
28 | | -const { data } = await sgai.extract("https://example.com", { |
29 | | - prompt: "Extract the main content", |
30 | | - fetchConfig: { |
31 | | - proxy: "http://username:password@proxy-host:8080", |
32 | | - }, |
| 45 | +const sgai = scrapegraphai({ apiKey: 'your-api-key' }); |
| 46 | + |
| 47 | +const { data } = await sgai.extract('https://example.com', { |
| 48 | + prompt: 'Extract the main content', |
| 49 | + fetchConfig: { country: 'de' }, |
33 | 50 | }); |
34 | 51 | ``` |
35 | 52 |
|
36 | | -See the [proxy parameter documentation](/services/additional-parameters/proxy) for the full reference. |
| 53 | +</CodeGroup> |
37 | 54 |
|
38 | | -## Proxy URL format |
| 55 | +### Stealth mode for protected sites |
39 | 56 |
|
40 | | -``` |
41 | | -http://username:password@host:port |
42 | | -socks5://username:password@host:port |
43 | | -``` |
| 57 | +Use stealth modes to bypass anti-bot protections: |
44 | 58 |
|
45 | | -If the proxy does not require authentication: |
| 59 | +<CodeGroup> |
46 | 60 |
|
47 | | -``` |
48 | | -http://host:port |
| 61 | +```python Python |
| 62 | +from scrapegraph_py import Client, FetchConfig |
| 63 | + |
| 64 | +client = Client(api_key="your-api-key") |
| 65 | + |
| 66 | +response = client.scrape( |
| 67 | + url="https://protected-site.com", |
| 68 | + format="markdown", |
| 69 | + fetch_config=FetchConfig( |
| 70 | + mode="js+stealth", |
| 71 | + wait=3000, |
| 72 | + scrolls=3, |
| 73 | + country="us", |
| 74 | + ), |
| 75 | +) |
49 | 76 | ``` |
50 | 77 |
|
51 | | -## Common use cases |
| 78 | +```javascript JavaScript |
| 79 | +const { data } = await sgai.scrape('https://protected-site.com', { |
| 80 | + format: 'markdown', |
| 81 | + fetchConfig: { |
| 82 | + mode: 'js+stealth', |
| 83 | + wait: 3000, |
| 84 | + scrolls: 3, |
| 85 | + country: 'us', |
| 86 | + }, |
| 87 | +}); |
| 88 | +``` |
52 | 89 |
|
53 | | -### Geo-targeted content |
| 90 | +</CodeGroup> |
54 | 91 |
|
55 | | -Access content that is only available in a specific country: |
| 92 | +### Custom headers and cookies |
56 | 93 |
|
57 | | -```python |
58 | | -# Using a proxy located in Germany |
59 | | -proxy = "http://user:pass@de-proxy.example.com:8080" |
60 | | -``` |
| 94 | +Pass custom HTTP headers or cookies with your requests: |
61 | 95 |
|
62 | | -### Bypassing IP-based rate limits |
| 96 | +<CodeGroup> |
63 | 97 |
|
64 | | -If the target website blocks your IP after too many requests, rotate through a pool of proxy IPs: |
| 98 | +```python Python |
| 99 | +from scrapegraph_py import Client, FetchConfig |
65 | 100 |
|
66 | | -```python |
67 | | -import itertools |
| 101 | +client = Client(api_key="your-api-key") |
68 | 102 |
|
69 | | -proxies = itertools.cycle([ |
70 | | - "http://user:pass@proxy1.example.com:8080", |
71 | | - "http://user:pass@proxy2.example.com:8080", |
72 | | - "http://user:pass@proxy3.example.com:8080", |
73 | | -]) |
| 103 | +response = client.extract( |
| 104 | + url="https://example.com", |
| 105 | + prompt="Extract product details", |
| 106 | + fetch_config=FetchConfig( |
| 107 | + headers={"Accept-Language": "en-US"}, |
| 108 | + cookies={"session": "abc123"}, |
| 109 | + ), |
| 110 | +) |
| 111 | +``` |
74 | 112 |
|
75 | | -for url in urls_to_scrape: |
76 | | - response = client.smartscraper( |
77 | | - website_url=url, |
78 | | - user_prompt="Extract the product details", |
79 | | - proxy=next(proxies), |
80 | | - ) |
| 113 | +```javascript JavaScript |
| 114 | +const { data } = await sgai.extract('https://example.com', { |
| 115 | + prompt: 'Extract product details', |
| 116 | + fetchConfig: { |
| 117 | + headers: { 'Accept-Language': 'en-US' }, |
| 118 | + cookies: { session: 'abc123' }, |
| 119 | + }, |
| 120 | +}); |
81 | 121 | ``` |
82 | 122 |
|
| 123 | +</CodeGroup> |
| 124 | + |
83 | 125 | ## Tips |
84 | 126 |
|
85 | | -- Use a reputable proxy provider for reliable uptime and performance. |
86 | | -- Test your proxy connection independently before passing it to ScrapeGraphAI to rule out proxy-side issues. |
87 | | -- Do not use public/free proxies for sensitive data — they may log or modify your traffic. |
| 127 | +- Start with `mode: "auto"` and switch to a specific mode only if the default does not return the content you need. |
| 128 | +- Use `js+stealth` for sites with strong anti-bot protections. |
| 129 | +- Add `wait` time for pages that load content dynamically after the initial render. |
| 130 | +- Use `scrolls` to trigger lazy-loaded content on infinite-scroll pages. |
| 131 | +- The `country` parameter doesn't affect pricing: credits are charged the same regardless of proxy location. |
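The Tips in the new version of the page recommend starting with `auto` and reserving `js+stealth` for sites with strong anti-bot protections. Here is a minimal sketch of that escalation in plain Python. `fetch_with_escalation`, `DEFAULT_MODES`, and the `is_usable` check are hypothetical helpers and are not part of `scrapegraph_py`. The caller supplies a `fetch(mode)` function that performs one request. The default usability check (`bool`, which treats empty results as failures) is an assumption, so replace it with a check that matches your actual response shape.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical default order, following the Tips: try "auto" first and fall back
# to "js+stealth" only when the automatic strategy does not produce usable output.
DEFAULT_MODES: Tuple[str, ...] = ("auto", "js+stealth")


def fetch_with_escalation(
    fetch: Callable[[str], T],
    modes: Iterable[str] = DEFAULT_MODES,
    is_usable: Callable[[T], bool] = bool,
) -> Tuple[str, T]:
    """Return (mode, result) for the first mode whose result passes `is_usable`.

    `fetch` performs a single request with the given mode. Any exception it raises
    is recorded and the next mode is tried instead.
    """
    failures: List[Tuple[str, Optional[Exception]]] = []
    for mode in modes:
        try:
            result = fetch(mode)
        except Exception as exc:  # assumption: any error means "try the next mode"
            failures.append((mode, exc))
            continue
        if is_usable(result):
            return mode, result
        failures.append((mode, None))  # request succeeded but the result was rejected
    raise RuntimeError(f"No fetch mode produced a usable result: {failures}")
```

To connect the helper to the SDK, pass a function such as `lambda m: client.scrape(url="https://protected-site.com", format="markdown", fetch_config=FetchConfig(mode=m))` as `fetch`. This reuses the `client.scrape` and `FetchConfig` calls from the stealth example. Because the SDK call stays outside the helper, you can test the helper with a stub and reuse it with `client.extract`.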