title	Strategy
description	Learn about different strategies for fetching website content with html2rss. Choose between faraday and browserless strategies for optimal performance.

The strategy key defines how html2rss fetches a website's content.

faraday (default): Makes a direct HTTP request. It is fast but does not execute JavaScript.
browserless: Renders the website in a headless Chrome browser, which is necessary for JavaScript-heavy sites.

strategy is a top-level config key. Request-specific controls live under request.

Start with faraday for pages that serve their content directly in the initial HTML, such as newsroom, listing, and changelog pages. Switch to browserless when the target is client-rendered, protected by anti-bot checks, or otherwise requires JavaScript to expose article links.

`browserless`

To use the browserless strategy, you need a running instance of Browserless.io.

Docker

You can run a local Browserless.io instance using Docker:

docker run \
  --rm \
  -p 3000:3000 \
  -e "CONCURRENT=10" \
  -e "TOKEN=6R0W53R135510" \
  ghcr.io/browserless/chromium

Configuration

Set the strategy at the top level of your feed configuration and put request controls under request:

strategy: browserless
request:
  max_redirects: 5
  max_requests: 6
channel:
  url: "https://example.com/app"
selectors:
  items:
    selector: ".article"
  title:
    selector: "h2"
  url:
    selector: "a"
    extractor: "href"

Request Structure

Use this split consistently:

strategy: selects faraday or browserless
headers: top-level headers shared by all strategies
request.max_redirects: redirect limit for the request session
request.max_requests: total request budget for the whole feed build
request.browserless.*: Browserless-only options

Example:

strategy: browserless
headers:
  User-Agent: "Mozilla/5.0 (compatible; html2rss/1.0)"
request:
  max_redirects: 5
  max_requests: 6
  browserless:
    preload:
      wait_after_ms: 5000
channel:
  url: "https://example.com/app"
selectors:
  items:
    selector: ".article"
  title:
    selector: "h2"
  url:
    selector: "a"
    extractor: "href"
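
The split above can be checked mechanically before a feed is built. The following is a minimal sketch using only Ruby's standard `yaml` library; `layout_problems` and its constants are hypothetical helpers written for this illustration and are not part of html2rss. It only flags keys that sit at the wrong level according to the list above.

```ruby
require "yaml"

# Hypothetical helper, not part of html2rss: reports keys that are placed
# at the wrong level according to the strategy/headers/request split.
KNOWN_STRATEGIES = %w[faraday browserless].freeze
REQUEST_ONLY_KEYS = %w[max_redirects max_requests browserless].freeze

def layout_problems(config)
  problems = []

  # `strategy` is top-level and falls back to faraday when omitted.
  strategy = config.fetch("strategy", "faraday")
  problems << "unknown strategy: #{strategy}" unless KNOWN_STRATEGIES.include?(strategy)

  # Request controls belong under `request`, not at the top level.
  (config.keys & REQUEST_ONLY_KEYS).each do |key|
    problems << "move top-level `#{key}` under `request`"
  end

  # Headers are shared by all strategies and belong at the top level.
  request = config.fetch("request", {}) || {}
  problems << "move `request.headers` to top-level `headers`" if request.key?("headers")

  problems
end

config = YAML.safe_load(<<~CONFIG)
  strategy: browserless
  headers:
    User-Agent: "Mozilla/5.0 (compatible; html2rss/1.0)"
  request:
    max_redirects: 5
    browserless:
      preload:
        wait_after_ms: 5000
  max_requests: 6
CONFIG

puts layout_problems(config)
```

Here `max_requests` was left at the top level by mistake, so the sketch reports that it should move under `request`.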

Browserless Preload

Browserless can interact with the page before html2rss captures the final HTML. Configure preload steps under request.browserless.preload.

strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 5000
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          wait_after_ms: 250
      scroll_down:
        iterations: 5
        wait_after_ms: 200

wait_after_ms: inserts a fixed wait before or after preload steps
click_selectors: clicks matching elements until they disappear or max_clicks is reached
scroll_down: scrolls until the page height stops growing or iterations is reached
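
The stop conditions for click_selectors and scroll_down can be illustrated with a small simulation. This is a sketch of the described semantics only, not the html2rss or Browserless implementation: `FakePage`, `click_until_gone`, and `scroll_until_stable` are hypothetical names, and the waits configured through wait_after_ms are left out.

```ruby
# Illustrative stand-in for a rendered page; not an html2rss or Browserless class.
class FakePage
  attr_reader :height

  def initialize(load_more_available:, growth_steps:)
    @load_more_available = load_more_available # clicks left before the button disappears
    @growth_steps = growth_steps               # scrolls that still load more content
    @height = 1000
  end

  def button_visible?
    @load_more_available.positive?
  end

  def click
    @load_more_available -= 1
  end

  def scroll_to_bottom
    return unless @growth_steps.positive?

    @growth_steps -= 1
    @height += 500
  end
end

# Click while the element is still present, but never more than max_clicks times.
def click_until_gone(page, max_clicks:)
  clicks = 0
  while clicks < max_clicks && page.button_visible?
    page.click
    clicks += 1
  end
  clicks
end

# Scroll until the page height stops growing, but never more than `iterations` times.
def scroll_until_stable(page, iterations:)
  performed = 0
  iterations.times do
    before = page.height
    page.scroll_to_bottom
    performed += 1
    break if page.height == before
  end
  performed
end

page = FakePage.new(load_more_available: 2, growth_steps: 10)
puts click_until_gone(page, max_clicks: 3)    # button is gone after 2 clicks
puts scroll_until_stable(page, iterations: 5) # page keeps growing, so the cap of 5 applies
```

Both limits act as upper bounds: a page that settles early ends the step early, and a page that never settles is cut off at max_clicks or iterations.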

If preload triggers a real navigation or redirect, html2rss keeps the final document metadata. Relative links and follow-up pagination therefore resolve against the page that was actually rendered after preload completed.
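
To see why the final document matters, consider how a relative link resolves against two different base URLs. The sketch below uses Ruby's standard `URI.join` and shows general URL resolution rather than html2rss internals; both URLs and the `href` value are made up for the example.

```ruby
require "uri"

requested_url = "https://example.com/app"           # URL from the feed config
final_url     = "https://example.com/news/latest/"  # hypothetical URL after preload navigated
href          = "article-1"                         # relative link found in the rendered page

# Resolved against the originally requested URL:
puts URI.join(requested_url, href) # https://example.com/article-1

# Resolved against the page that was actually rendered:
puts URI.join(final_url, href)     # https://example.com/news/latest/article-1
```

Only the second result points at the article as the rendered page would link to it.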

Command-Line Usage

You can also specify the strategy on the command line:

# Set environment variables for your Browserless.io instance
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" \
BROWSERLESS_IO_API_TOKEN="6R0W53R135510" \
  html2rss feed my_config.yml --strategy browserless

# Override request budgets at runtime
html2rss feed my_config.yml --max-redirects 5 --max-requests 6

# Or rely on the strategy stored in the YAML config
html2rss feed my_config.yml

Browserless Troubleshooting

If html2rss cannot connect to Browserless, it reports a "Browserless connection failed (...)" error that includes hints about the endpoint and token.

Check these first:

BROWSERLESS_IO_WEBSOCKET_URL is reachable from where html2rss runs
BROWSERLESS_IO_API_TOKEN matches your Browserless TOKEN
your Browserless service is running and accepting connections
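
The first and third checks can be scripted. The following is a hypothetical pre-flight sketch using Ruby's standard `socket` and `uri` libraries; `browserless_endpoint` and `reachable?` are not part of html2rss. It only tests that a TCP connection to the endpoint can be opened, and it cannot tell whether the token is accepted.

```ruby
require "socket"
require "uri"

# Hypothetical helper: reads the endpoint from the environment and falls back
# to the local default endpoint when the variable is unset.
def browserless_endpoint(env = ENV)
  url = env.fetch("BROWSERLESS_IO_WEBSOCKET_URL", "ws://127.0.0.1:3000")
  uri = URI.parse(url)
  port = uri.port || (uri.scheme == "wss" ? 443 : 80)
  token_set = !env["BROWSERLESS_IO_API_TOKEN"].to_s.empty?
  { host: uri.host, port: port, token_set: token_set }
end

# Hypothetical helper: true if a TCP connection can be opened within the timeout.
def reachable?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

if __FILE__ == $PROGRAM_NAME
  endpoint = browserless_endpoint
  puts "endpoint:  #{endpoint[:host]}:#{endpoint[:port]}"
  puts "token set: #{endpoint[:token_set]}"
  puts "reachable: #{reachable?(endpoint[:host], endpoint[:port])}"
end
```

Run it from the same host or container that runs html2rss, since reachability depends on where the request originates.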

For custom Browserless websocket endpoints, BROWSERLESS_IO_API_TOKEN is mandatory. The local default endpoint (ws://127.0.0.1:3000) can use the default local token 6R0W53R135510.

For detailed documentation on the Ruby API, see the official YARD documentation.
