
Commit f40d3e0

docs: align ruby gem guides with current config behavior (#1083)
* docs(ruby-gem): align reference pages with current config surface
* docs(ruby-gem): make guides and tutorials use valid configs
* Apply suggestions from code review
1 parent 249ba37 commit f40d3e0

13 files changed

Lines changed: 295 additions & 58 deletions

src/content/docs/ruby-gem/how-to/advanced-features.mdx

Lines changed: 22 additions & 1 deletion
@@ -35,12 +35,16 @@ html2rss is designed to be memory-efficient:
 For websites with many items:

 ```yaml
-# Use specific selectors to limit items
+channel:
+  url: "https://example.com/articles"
 selectors:
   items:
     selector: ".article:not(.advertisement)" # Exclude ads
   title:
     selector: "h2" # More specific than generic selectors
+  url:
+    selector: "a"
+    extractor: "href"
 ```

 ## Error Recovery
@@ -59,6 +63,16 @@ Optimize requests with appropriate headers:
 headers:
   Accept: "text/html,application/xhtml+xml" # Avoid JSON if not needed
   Accept-Encoding: "gzip, deflate" # Enable compression
+channel:
+  url: "https://example.com/articles"
+selectors:
+  items:
+    selector: "article"
+  title:
+    selector: "h2"
+  url:
+    selector: "a"
+    extractor: "href"
 ```

 ## Monitoring and Debugging
@@ -98,13 +112,20 @@ Invalid articles are automatically filtered out to prevent empty or broken feed
 You can add custom validation by using post-processors:

 ```yaml
+channel:
+  url: "https://example.com/articles"
 selectors:
+  items:
+    selector: "article"
   title:
     selector: "h2"
     post_process:
       - name: "gsub"
         pattern: "^\\s*$"
         replacement: "Untitled"
+  url:
+    selector: "a"
+    extractor: "href"
 ```

 ## Best Practices
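
As a rough illustration of the `gsub` post-processor in the hunk above, the Ruby sketch below applies the same pattern and replacement to a few sample titles with `String#gsub`. It assumes the `pattern` string is interpreted as a regular expression; how `html2rss` parses that string is not shown in this diff, and `fallback_title` is a hypothetical helper name.

```ruby
# Hypothetical helper: mirrors pattern "^\\s*$" and replacement "Untitled"
# from the YAML above, assuming the pattern is used as a regular expression.
BLANK_TITLE = /^\s*$/

def fallback_title(title)
  title.gsub(BLANK_TITLE, "Untitled")
end

puts fallback_title("")              # empty title gets the placeholder
puts fallback_title("   ")           # whitespace-only title gets the placeholder
puts fallback_title("Release notes") # non-blank titles pass through unchanged
```

Note that this only substitutes a placeholder for blank titles; it does not drop the item from the feed.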

src/content/docs/ruby-gem/how-to/custom-http-requests.mdx

Lines changed: 75 additions & 9 deletions
@@ -3,7 +3,7 @@ title: "Custom HTTP Requests"
 description: "Learn how to customize HTTP requests with custom headers, authentication, and API interactions for html2rss."
 ---

-Some websites require custom HTTP headers, authentication, or specific request configurations to access their content. html2rss makes it easy to customize your requests to handle these scenarios.
+Some websites require custom HTTP headers, authentication, or other request settings to access their content. `html2rss` lets you customize requests for those cases.

 ## When You Need Custom Headers

@@ -17,7 +17,7 @@ You might need custom HTTP requests when:

 ## Basic Configuration

-Add a `headers` section to your feed configuration:
+Add a `headers` section to your feed configuration. This example is a complete, valid config:

 ```yaml
 headers:
@@ -28,9 +28,11 @@ channel:
   url: https://api.example.com/posts
 selectors:
   items:
-    selector: ".post"
+    selector: "array > object"
   title:
-    selector: "h2"
+    selector: "title"
+  url:
+    selector: "url"
 ```

 ## Common Use Cases
@@ -43,6 +45,15 @@ Many APIs require authentication tokens:
 headers:
   Authorization: "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
   X-API-Key: "your-api-key-here"
+channel:
+  url: "https://api.example.com/posts"
+selectors:
+  items:
+    selector: "array > object"
+  title:
+    selector: "title"
+  url:
+    selector: "url"
 ```

 ### User Agent Spoofing
@@ -55,6 +66,16 @@ headers:
   Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
   Accept-Language: "en-US,en;q=0.5"
   Accept-Encoding: "gzip, deflate"
+channel:
+  url: "https://example.com/articles"
+selectors:
+  items:
+    selector: "article"
+  title:
+    selector: "h2"
+  url:
+    selector: "a"
+    extractor: "href"
 ```

 ### Content Type Negotiation
@@ -63,9 +84,16 @@ Request specific content types:

 ```yaml
 headers:
-  Accept: "application/json" # For JSON APIs
-  Accept: "text/html" # For HTML content
-  Accept: "application/rss+xml" # For RSS feeds
+  Accept: "application/json"
+channel:
+  url: "https://api.example.com/posts"
+selectors:
+  items:
+    selector: "array > object"
+  title:
+    selector: "title"
+  url:
+    selector: "url"
 ```

 ### Custom API Headers
@@ -77,6 +105,15 @@ headers:
   X-Requested-With: "XMLHttpRequest"
   X-Custom-Header: "your-value"
   Content-Type: "application/json"
+channel:
+  url: "https://api.example.com/posts"
+selectors:
+  items:
+    selector: "array > object"
+  title:
+    selector: "title"
+  url:
+    selector: "url"
 ```

 ## Dynamic Headers
@@ -85,12 +122,27 @@ You can use dynamic parameters in headers for runtime values:

 ```yaml
 headers:
-  Authorization: "Bearer {{api_token}}"
-  X-User-ID: "{{user_id}}"
+  Authorization: "Bearer %<api_token>s"
+  X-User-ID: "%<user_id>s"
+channel:
+  url: "https://api.example.com/users/%<user_id>s/posts"
+selectors:
+  items:
+    selector: "array > object"
+  title:
+    selector: "title"
+  url:
+    selector: "url"
 ```

 See our [Dynamic Parameters guide](/ruby-gem/how-to/dynamic-parameters) for more details.

+## Notes
+
+- Header examples that target third-party APIs are illustrative. Authentication requirements, header names, and response shapes can change independently of `html2rss`.
+- For JSON APIs, validate the response structure before assuming selectors like `array > object` or `html_url` will match.
+- If you document or share a config for reuse, prefer placeholder values and parameterized headers over embedding real tokens.
+
 ## Testing Your Headers

 Test your configuration to ensure headers work correctly:
@@ -130,6 +182,13 @@ headers:
   User-Agent: "html2rss/1.0"
 channel:
   url: https://api.github.com/repos/owner/repo/issues
+selectors:
+  items:
+    selector: "array > object"
+  title:
+    selector: "title"
+  url:
+    selector: "html_url"
 ```

 ### Reddit API
@@ -140,6 +199,13 @@ headers:
   Accept: "application/json"
 channel:
   url: https://www.reddit.com/r/programming.json
+selectors:
+  items:
+    selector: "data > children > object > data"
+  title:
+    selector: "title"
+  url:
+    selector: "url"
 ```

 ## Related Topics
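
The note in this file about validating the response structure before relying on selectors such as `array > object` or `html_url` could be checked along these lines. This is a minimal sketch using only Ruby's standard `json` library; the sample payload and the `missing_keys` helper are hypothetical and not part of `html2rss`.

```ruby
require "json"

# Sample payload standing in for an API response (hypothetical data).
body = <<~JSON
  [
    {"title": "First issue", "html_url": "https://example.com/issues/1"},
    {"title": "Second issue", "html_url": "https://example.com/issues/2"}
  ]
JSON

# Returns the required keys that are absent from at least one item.
# Raises if the top level is not an array of objects.
def missing_keys(payload, required)
  raise ArgumentError, "expected a JSON array" unless payload.is_a?(Array)
  raise ArgumentError, "expected objects inside the array" unless payload.all? { |item| item.is_a?(Hash) }

  required.reject { |key| payload.all? { |item| item.key?(key) } }
end

payload = JSON.parse(body)
p missing_keys(payload, %w[title html_url]) # keys used by the GitHub example
p missing_keys(payload, %w[title url])      # "url" is not present in this sample
```

An empty result suggests the selectors have something to match; a non-empty one points at the selector that needs adjusting for that API.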

src/content/docs/ruby-gem/how-to/dynamic-parameters.mdx

Lines changed: 16 additions & 2 deletions
@@ -3,17 +3,25 @@ title: Dynamic Parameters
 description: "Learn how to use dynamic parameters in URLs and headers for creating reusable feed configurations. Pass runtime values to customize feeds."
 ---

-For websites with similar structures but varying content based on a parameter in the URL or headers, you can use dynamic parameters.
+Use dynamic parameters when websites share the same structure but vary by URL or header values.

 ## Solution

 You can add dynamic parameters to the `channel` and `headers` values. This is useful for creating feeds from structurally similar pages with different URLs.

 ```yaml
 channel:
-  url: "http://domainname.tld/whatever/%<id>s.html"
+  url: "https://domainname.tld/whatever/%<id>s.html"
 headers:
   X-Something: "%<foo>s"
+selectors:
+  items:
+    selector: "article"
+  title:
+    selector: "h2"
+  url:
+    selector: "a"
+    extractor: "href"
 ```

 You can then pass the values for these parameters when you run `html2rss`:
@@ -30,6 +38,12 @@ html2rss feed the_feed_config.yml --params id:42 foo:bar
 - You provide the actual values for these parameters at runtime using the `--params` option.
 - This allows you to reuse the same feed configuration for multiple similar pages or APIs.

+## Notes
+
+- Dynamic substitution applies to `channel` and `headers`. Selector definitions are not parameterized by this feature.
+- If a config references `%<param>s` and you do not provide a value, feed generation fails unless the caller supplies a fallback.
+- For shared config repositories such as `html2rss-configs`, it is common to store default parameter values alongside the config so examples, validation, and tests have concrete inputs.
+
 ## Related Topics

 - **[Custom HTTP Requests](/ruby-gem/how-to/custom-http-requests/)** - Using dynamic parameters in headers
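
The `%<id>s` placeholders in this file use the same syntax as named references in Ruby's `Kernel#format`. The sketch below shows how such a template expands, and how a missing value fails unless a fallback is merged in first. It assumes format-style substitution; the `expand` helper and its `defaults` argument are hypothetical and only illustrate the behavior described in the notes, not the gem's internals.

```ruby
# Hypothetical helper: expands %<name>s placeholders with Kernel#format.
# `defaults` stands in for the fallback values mentioned in the notes.
def expand(template, params, defaults = {})
  format(template, defaults.merge(params))
end

url = "https://domainname.tld/whatever/%<id>s.html"

puts expand(url, { id: 42 })           # value supplied at runtime
puts expand(url, {}, { id: "latest" }) # value taken from the fallback

begin
  expand(url, {})                      # no value and no fallback
rescue KeyError => e
  puts "missing parameter: #{e.message}"
end
```

Keys must be symbols here because `format` looks up named references by symbol.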

src/content/docs/ruby-gem/how-to/managing-feed-configs.mdx

Lines changed: 14 additions & 2 deletions
@@ -21,12 +21,24 @@ feeds:
     channel:
       url: "https://example.com/blog"
     selectors:
-      # ...
+      items:
+        selector: ".post"
+      title:
+        selector: "h2"
+      url:
+        selector: "a"
+        extractor: "href"
   my-second-feed:
     channel:
       url: "https://example.com/news"
     selectors:
-      # ...
+      items:
+        selector: ".news-item"
+      title:
+        selector: "h2"
+      url:
+        selector: "a"
+        extractor: "href"
 ```

 ## Building Feeds from a YAML File
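
To show why the placeholder `# ...` selectors were replaced with complete ones, the sketch below loads a multi-feed YAML document with Ruby's standard `yaml` library and checks each entry for a channel URL and an items selector. The `complete_feed?` helper is hypothetical and is not how `html2rss` validates configs; the nesting under `feeds:` is assumed from the hunk context.

```ruby
require "yaml"

config = YAML.safe_load(<<~YAML)
  feeds:
    my-first-feed:
      channel:
        url: "https://example.com/blog"
      selectors:
        items:
          selector: ".post"
        title:
          selector: "h2"
        url:
          selector: "a"
          extractor: "href"
    placeholder-feed:
      channel:
        url: "https://example.com/news"
      selectors:
        # ...
YAML

# Hypothetical check: a feed entry needs a channel URL and an items selector.
def complete_feed?(feed)
  !feed.dig("channel", "url").nil? && !feed.dig("selectors", "items", "selector").nil?
end

config.fetch("feeds").each do |name, feed|
  puts "#{name}: #{complete_feed?(feed) ? 'ok' : 'incomplete'}"
end
```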

src/content/docs/ruby-gem/how-to/scraping-json.mdx

Lines changed: 8 additions & 4 deletions
@@ -68,10 +68,12 @@ Html2rss.feed(
     Accept: 'application/json'
   },
   channel: {
-    url: 'http://domainname.tld/whatever.json'
+    url: 'https://domainname.tld/whatever.json'
   },
   selectors: {
-    title: { selector: 'foo' }
+    items: { selector: 'array > object' },
+    title: { selector: 'title' },
+    url: { selector: 'url' }
   }
 )
 ```
@@ -82,10 +84,12 @@ Html2rss.feed(
 headers:
   Accept: application/json
 channel:
-  url: "http://domainname.tld/whatever.json"
+  url: "https://domainname.tld/whatever.json"
 selectors:
   items:
     selector: "array > object"
   title:
-    selector: "foo"
+    selector: ".title"
+  url:
+    selector: "url"
 ```

src/content/docs/ruby-gem/reference/auto-source.mdx

Lines changed: 4 additions & 0 deletions
@@ -36,6 +36,8 @@ You can customize `auto_source` to improve its accuracy.
 Enable or disable specific scrapers and adjust their settings:

 ```yaml
+channel:
+  url: https://example.com
 auto_source:
   scraper:
     schema:
@@ -55,6 +57,8 @@ auto_source:
 Remove unwanted items from the results:

 ```yaml
+channel:
+  url: https://example.com
 auto_source:
   cleanup:
     keep_different_domain: false # default: true

src/content/docs/ruby-gem/reference/channel.mdx

Lines changed: 18 additions & 2 deletions
@@ -3,7 +3,9 @@ title: Channel
 description: "Learn about the channel configuration block for RSS feed metadata. Configure feed title, description, author, and other RSS channel properties."
 ---

-The `channel` configuration block defines the metadata for your RSS feed.
+The `channel` configuration block defines your feed metadata.
+
+This example is a complete feed config so you can see the `channel` block in context:

 ```yaml
 channel:
@@ -12,8 +14,16 @@ channel:
   description: "A feed of the latest news from Example.com"
   author: "[email protected] (Jane Doe)"
   ttl: 60
-  language: "en-us"
+  language: "en"
   time_zone: "Europe/Berlin"
+selectors:
+  items:
+    selector: "article"
+  title:
+    selector: "h2"
+  url:
+    selector: "a"
+    extractor: "href"
 ```

 ## Options
@@ -28,6 +38,12 @@ channel:
 | `language` | Optional | The language of the feed. Defaults to the `lang` attribute of the `<html>` tag. |
 | `time_zone` | Optional | The time zone for parsing dates. See the [list of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). |

+## Notes
+
+- `language` is runtime-validated. Use a valid language code such as `en`, not an arbitrary string.
+- `author` should follow the RSS-style `email (Name)` format when you set it explicitly.
+- `time_zone` must be a known TZ database identifier such as `UTC` or `Europe/Berlin`.
+
 ---

 For detailed documentation on the Ruby API, see the [official YARD documentation](https://www.rubydoc.info/gems/html2rss).
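
The `email (Name)` author format mentioned in the notes for this file can be checked before a config is shared. The sketch below is a hypothetical pre-flight check, not the gem's own validation: the `AUTHOR_FORMAT` pattern is deliberately loose and the sample address is made up for illustration.

```ruby
# Hypothetical pre-flight check, not the gem's own validation:
# accepts "email (Name)" as described in the notes.
AUTHOR_FORMAT = /\A[^@\s]+@[^@\s]+ \([^()]+\)\z/

def author_format_ok?(author)
  AUTHOR_FORMAT.match?(author)
end

puts author_format_ok?("jane@example.com (Jane Doe)") # address plus name in parentheses
puts author_format_ok?("Jane Doe")                    # name only, no address
```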
