Commit 4547fb9

VinciGit00 and claude committed
docs: update proxy docs with v2 FetchConfig modes and SDK examples
Rewrite proxy configuration page to document FetchConfig object with mode parameter (auto/fast/js/direct+stealth/js+stealth), country-based geotargeting, and all fetch options. Update knowledge-base proxy guide and fix FetchConfig examples in both Python and JavaScript SDK pages to match the actual v2 API surface.

Refs: ScrapeGraphAI/scrapegraph-js#11, ScrapeGraphAI/scrapegraph-py#82

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4507eec commit 4547fb9

4 files changed

Lines changed: 390 additions & 177 deletions

knowledge-base/scraping/proxy.mdx

Lines changed: 97 additions & 53 deletions
@@ -1,87 +1,131 @@
11
---
2-
title: Scraping behind a proxy
3-
description: 'Route requests through your own proxy for geo-targeting or privacy'
2+
title: Proxy & Fetch Configuration
3+
description: 'Control proxy routing, stealth mode, and geo-targeting with FetchConfig'
44
---
55

6-
Using a proxy lets you route ScrapeGraphAI requests through a specific IP address or geographic location. This is useful for accessing geo-restricted content, bypassing IP-based blocks, or testing region-specific pages.
6+
In v2, all proxy and fetch behaviour is controlled through the `FetchConfig` object. You can set the proxy strategy (`mode`), country-based geotargeting (`country`), wait times, scrolling, custom headers, and more.
77

8-
## How to pass a proxy
8+
See the [full proxy reference](/services/additional-parameters/proxy) for all available options.
99

10-
Use the `proxy` parameter available in SmartScraper, SearchScraper, and Markdownify:
10+
## Choosing a fetch mode
1111

12-
```python
13-
from scrapegraph_py import Client
12+
The `mode` parameter controls how pages are retrieved:
13+
14+
| Mode | Description |
15+
|------|-------------|
16+
| `auto` | Automatically selects the best strategy (default) |
17+
| `fast` | Direct HTTP fetch, no JS rendering — fastest option |
18+
| `js` | Headless browser for JavaScript-heavy pages |
19+
| `direct+stealth` | Residential proxy with stealth headers (no JS) |
20+
| `js+stealth` | JS rendering + residential/stealth proxy |
21+
22+
## Examples
23+
24+
### Geo-targeted content
25+
26+
Access content from a specific country using the `country` parameter:
27+
28+
<CodeGroup>
29+
30+
```python Python
31+
from scrapegraph_py import Client, FetchConfig
1432

1533
client = Client(api_key="your-api-key")
1634

17-
response = client.smartscraper(
18-
website_url="https://example.com",
19-
user_prompt="Extract the main content",
20-
proxy="http://username:password@proxy-host:8080",
35+
response = client.extract(
36+
url="https://example.com",
37+
prompt="Extract the main content",
38+
fetch_config=FetchConfig(country="de"), # Route through Germany
2139
)
2240
```
2341

24-
```javascript
25-
import { scrapegraphai } from "scrapegraph-js";
42+
```javascript JavaScript
43+
import { scrapegraphai } from 'scrapegraph-js';
2644

27-
const sgai = scrapegraphai({ apiKey: "your-api-key" });
28-
const { data } = await sgai.extract("https://example.com", {
29-
prompt: "Extract the main content",
30-
fetchConfig: {
31-
proxy: "http://username:password@proxy-host:8080",
32-
},
45+
const sgai = scrapegraphai({ apiKey: 'your-api-key' });
46+
47+
const { data } = await sgai.extract('https://example.com', {
48+
prompt: 'Extract the main content',
49+
fetchConfig: { country: 'de' },
3350
});
3451
```
3552

36-
See the [proxy parameter documentation](/services/additional-parameters/proxy) for the full reference.
53+
</CodeGroup>
3754

38-
## Proxy URL format
55+
### Stealth mode for protected sites
3956

40-
```
41-
http://username:password@host:port
42-
socks5://username:password@host:port
43-
```
57+
Use stealth modes to bypass anti-bot protections:
4458

45-
If the proxy does not require authentication:
59+
<CodeGroup>
4660

47-
```
48-
http://host:port
61+
```python Python
62+
from scrapegraph_py import Client, FetchConfig
63+
64+
client = Client(api_key="your-api-key")
65+
66+
response = client.scrape(
67+
url="https://protected-site.com",
68+
format="markdown",
69+
fetch_config=FetchConfig(
70+
mode="js+stealth",
71+
wait=3000,
72+
scrolls=3,
73+
country="us",
74+
),
75+
)
4976
```
5077

51-
## Common use cases
78+
```javascript JavaScript
79+
const { data } = await sgai.scrape('https://protected-site.com', {
80+
format: 'markdown',
81+
fetchConfig: {
82+
mode: 'js+stealth',
83+
wait: 3000,
84+
scrolls: 3,
85+
country: 'us',
86+
},
87+
});
88+
```
5289

53-
### Geo-targeted content
90+
</CodeGroup>
5491

55-
Access content that is only available in a specific country:
92+
### Custom headers and cookies
5693

57-
```python
58-
# Using a proxy located in Germany
59-
proxy = "http://user:pass@de-proxy.example.com:8080"
60-
```
94+
Pass custom HTTP headers or cookies with your requests:
6195

62-
### Bypassing IP-based rate limits
96+
<CodeGroup>
6397

64-
If the target website blocks your IP after too many requests, rotate through a pool of proxy IPs:
98+
```python Python
99+
from scrapegraph_py import Client, FetchConfig
65100

66-
```python
67-
import itertools
101+
client = Client(api_key="your-api-key")
68102

69-
proxies = itertools.cycle([
70-
"http://user:pass@proxy1.example.com:8080",
71-
"http://user:pass@proxy2.example.com:8080",
72-
"http://user:pass@proxy3.example.com:8080",
73-
])
103+
response = client.extract(
104+
url="https://example.com",
105+
prompt="Extract product details",
106+
fetch_config=FetchConfig(
107+
headers={"Accept-Language": "en-US"},
108+
cookies={"session": "abc123"},
109+
),
110+
)
111+
```
74112

75-
for url in urls_to_scrape:
76-
response = client.smartscraper(
77-
website_url=url,
78-
user_prompt="Extract the product details",
79-
proxy=next(proxies),
80-
)
113+
```javascript JavaScript
114+
const { data } = await sgai.extract('https://example.com', {
115+
prompt: 'Extract product details',
116+
fetchConfig: {
117+
headers: { 'Accept-Language': 'en-US' },
118+
cookies: { session: 'abc123' },
119+
},
120+
});
81121
```
82122

123+
</CodeGroup>
124+
83125
## Tips
84126

85-
- Use a reputable proxy provider for reliable uptime and performance.
86-
- Test your proxy connection independently before passing it to ScrapeGraphAI to rule out proxy-side issues.
87-
- Do not use public/free proxies for sensitive data — they may log or modify your traffic.
127+
- Start with `mode: "auto"` and only switch to a specific mode if you need to.
128+
- Use `js+stealth` for sites with strong anti-bot protections.
129+
- Add `wait` time for pages that load content dynamically after the initial render.
130+
- Use `scrolls` to trigger lazy-loaded content on infinite-scroll pages.
131+
- The `country` parameter doesn't affect pricing — credits are charged the same regardless of proxy location.
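
Reviewer note: the mode table and tips in this file could be condensed into a small selection helper. The sketch below is only an illustration and does not call the SDK. `choose_mode` and its two flags are hypothetical names, and the mapping is read directly from the table's descriptions rather than from any documented SDK behaviour.

```python
from typing import Optional


def choose_mode(needs_js: Optional[bool] = None,
                needs_stealth: Optional[bool] = None) -> str:
    """Pick a FetchConfig mode string from two requirements.

    Hypothetical helper: the mapping follows the mode table in this
    diff and is an assumption, not part of scrapegraph_py.
    """
    # No stated requirements: keep the documented default.
    if needs_js is None and needs_stealth is None:
        return "auto"
    if needs_stealth:
        # Stealth modes differ only in whether JS is rendered.
        return "js+stealth" if needs_js else "direct+stealth"
    # No stealth needed: headless browser or plain HTTP fetch.
    return "js" if needs_js else "fast"


if __name__ == "__main__":
    print(choose_mode())
    print(choose_mode(needs_js=True, needs_stealth=True))
```

The returned string would then be passed as `mode` in `FetchConfig`, as in the stealth example above.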

sdks/javascript.mdx

Lines changed: 10 additions & 12 deletions
@@ -137,8 +137,7 @@ const { data } = await sgai.extract(
137137
{
138138
prompt: "Extract the main heading",
139139
fetchConfig: {
140-
stealth: true,
141-
render: true,
140+
mode: 'js+stealth',
142141
wait: 2000,
143142
scrolls: 3,
144143
},
@@ -288,19 +287,18 @@ data.items.forEach((entry) => {
288287

289288
### FetchConfig
290289

291-
Controls how pages are fetched.
290+
Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.
292291

293292
```javascript
294293
{
295-
stealth: true, // Anti-detection mode
296-
render: true, // Render JavaScript
297-
wait: 2000, // Wait time after page load (ms)
298-
scrolls: 3, // Number of scrolls (0-100)
299-
country: "us", // Proxy country code
300-
cookies: { key: "value" },
301-
headers: { "X-Custom": "header" },
302-
timeout: 15000, // Fetch timeout in ms (1000-30000)
303-
mock: false, // Enable mock mode for testing
294+
mode: 'js+stealth', // Proxy strategy: auto, fast, js, direct+stealth, js+stealth
295+
timeout: 15000, // Request timeout in ms (1000-60000)
296+
wait: 2000, // Wait after page load in ms (0-30000)
297+
scrolls: 3, // Number of scrolls (0-100)
298+
country: 'us', // Proxy country code (ISO 3166-1 alpha-2)
299+
headers: { 'X-Custom': 'header' },
300+
cookies: { key: 'value' },
301+
mock: false, // Enable mock mode for testing
304302
}
305303
```
306304

sdks/python.mdx

Lines changed: 9 additions & 10 deletions
@@ -126,9 +126,8 @@ response = client.extract(
126126
url="https://example.com",
127127
prompt="Extract the main heading",
128128
fetch_config=FetchConfig(
129-
stealth=True,
130-
render_js=True,
131-
wait_ms=2000,
129+
mode="js+stealth",
130+
wait=2000,
132131
scrolls=3,
133132
),
134133
llm_config=LlmConfig(
@@ -282,20 +281,20 @@ for entry in history["items"]:
282281

283282
### FetchConfig
284283

285-
Controls how pages are fetched.
284+
Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.
286285

287286
```python
288287
from scrapegraph_py import FetchConfig
289288

290289
config = FetchConfig(
291-
mock=False, # Enable mock mode for testing
292-
stealth=True, # Anti-detection mode
290+
mode="js+stealth", # Proxy strategy: auto, fast, js, direct+stealth, js+stealth
291+
timeout=15000, # Request timeout in ms (1000-60000)
292+
wait=2000, # Wait after page load in ms (0-30000)
293293
scrolls=3, # Number of scrolls (0-100)
294-
country="us", # Proxy country code
295-
cookies={"key": "value"},
294+
country="us", # Proxy country code (ISO 3166-1 alpha-2)
296295
headers={"X-Custom": "header"},
297-
wait_ms=2000, # Wait time after page load (ms)
298-
render_js=True, # Render JavaScript
296+
cookies={"key": "value"},
297+
mock=False, # Enable mock mode for testing
299298
)
300299
```
301300

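
Reviewer note: since this hunk renames `wait_ms` to `wait` and replaces the `stealth`/`render_js` booleans with `mode`, a migration sketch may help callers update existing code. The helpers below work on plain dictionaries and do not import the SDK. `migrate_fetch_kwargs` and `check_ranges` are hypothetical names. Only the `stealth=True, render_js=True` to `"js+stealth"` mapping appears in this diff; the other mappings are assumptions inferred from the mode table. The numeric ranges are the ones given in the comments above.

```python
from __future__ import annotations

VALID_MODES = ("auto", "fast", "js", "direct+stealth", "js+stealth")

# Ranges copied from the FetchConfig comments in this diff.
LIMITS = {"timeout": (1000, 60000), "wait": (0, 30000), "scrolls": (0, 100)}


def migrate_fetch_kwargs(old: dict) -> dict:
    """Translate pre-v2 keyword arguments into v2-style ones (a sketch)."""
    removed = ("stealth", "render_js", "wait_ms")
    new = {k: v for k, v in old.items() if k not in removed}
    stealth = old.get("stealth", False)
    render_js = old.get("render_js", False)
    if stealth and render_js:
        new["mode"] = "js+stealth"        # mapping shown in the diff
    elif stealth:
        new["mode"] = "direct+stealth"    # assumption
    elif render_js:
        new["mode"] = "js"                # assumption
    # Neither flag set: leave mode unset so the default (auto) applies.
    if "wait_ms" in old:
        new["wait"] = old["wait_ms"]
    return new


def check_ranges(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if "mode" in cfg and cfg["mode"] not in VALID_MODES:
        problems.append(f"mode {cfg['mode']!r} is not one of {VALID_MODES}")
    for key, (low, high) in LIMITS.items():
        if key in cfg and not low <= cfg[key] <= high:
            problems.append(f"{key}={cfg[key]} is outside {low}-{high}")
    return problems


if __name__ == "__main__":
    v2 = migrate_fetch_kwargs(
        {"stealth": True, "render_js": True, "wait_ms": 2000, "scrolls": 3}
    )
    print(v2, check_ranges(v2))
```

The migrated dictionary could then be unpacked into the v2 constructor, for example `FetchConfig(**v2)`, assuming the constructor accepts exactly the keyword names shown in this diff.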