sni-router: break domain-fronting loop #478

Open
dolonet wants to merge 2 commits into 9seconds:master from dolonet:fix/sni-router-fronting-loop

Conversation

dolonet (Collaborator) commented Apr 25, 2026

Summary

Follow-up to #462. When the secret's domain resolves back to this server (the SNI-router default), mtg's fallback fronting dial lands on HAProxy, which sees the SNI matching the secret and routes the connection back to mtg → loop.

Reported by @gaudima in #462 (comment).

Fix

Pin [domain-fronting].host = "web" in mtg-config.toml so mtg dials the Caddy container directly via compose-network DNS, bypassing HAProxy. Requires mtg ≥ 2.4 (#480 added hostname acceptance for the fronting target — already merged).

mtg-config.toml:

[domain-fronting]
host = "web"
port = 8443
proxy-protocol = true

README gains a "Fronting loop" section explaining the cause. The existing "Real client IPs" sync list grows by a fourth piece ([domain-fronting].proxy-protocol), since mtg now also writes a PROXY v2 header on the fronting dial.

Net diff: +51 −3 / 2 files (README.md, mtg-config.toml). No docker-compose.yml changes — compose-network DNS handles the routing without a static subnet.

Test plan

  • On a test VPS with DNS pointing at the host:
    • Telegram client connects through the proxy as before
    • curl https://DOMAIN/ returns Caddy's content
    • curl --resolve DOMAIN:443:HOST_IP -k -I https://DOMAIN/ (probe simulation: SNI matches the secret, no MTProto handshake) — connection terminates against Caddy without looping; Caddy's access log shows the real client IP

Follow-up

Runtime/doctor self-loop detection (sketched up-thread) tracked separately so this PR stays config + docs only.

Comment thread contrib/sni-router/README.md Outdated
Comment on lines +92 to +94
> (Caddy's pinned address). Caddy may refuse the mixed-family header
> and log the docker-network address instead of the real client IP for
> that connection. Telegram traffic is unaffected.
Contributor

Is it fixable?

Collaborator, Author

Yes — it disappears with the hostname change in #480. Dual-stack docker DNS lets mtg dial an IPv6 backend for IPv6 clients, so the PROXY v2 source/dest stay same-family. Caveat will be dropped once #480 merges.

Comment thread contrib/sni-router/README.md Outdated
Comment on lines +81 to +82
`docker-compose.yml` (mtg's `domain-fronting.ip` only accepts a literal
IP, not a hostname, hence the static `sni` network). `proxy-protocol =
Contributor

Is this a fundamental restriction in mtg? Maybe try to fix it there?

Collaborator, Author

Not fundamental — TypeIP just calls net.ParseIP, but the rest of the dial path is hostname-capable. Opened #480 to add a sibling [domain-fronting].host that accepts hostname or IP. Once it lands this PR shrinks to a host = "web" line and the static subnet/pin go away.
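To make the restriction concrete, here is a minimal Go sketch of the difference between an IP-only field and a hostname-or-IP field. It is not mtg's actual code: `classifyTarget` is a hypothetical helper, and the only real API it relies on is `net.ParseIP`, which returns nil for anything that is not a literal IP.

```go
package main

import (
	"fmt"
	"net"
)

// classifyTarget is a hypothetical helper (not part of mtg). It reports
// whether a fronting target is a literal IP or has to be treated as a
// hostname and resolved at dial time.
func classifyTarget(target string) string {
	// net.ParseIP returns nil for anything that is not a literal
	// IPv4/IPv6 address, which is why an IP-only field rejects "web".
	if ip := net.ParseIP(target); ip != nil {
		return "ip"
	}
	return "hostname"
}

func main() {
	// "172.28.0.10" and "web" come from this thread; "::1" is just an
	// extra IPv6 literal for illustration.
	for _, t := range []string{"172.28.0.10", "::1", "web"} {
		fmt.Printf("%-12s -> %s\n", t, classifyTarget(t))
	}
}
```

A field validated only with `net.ParseIP` has no way to accept a compose service name, while the dialer itself can resolve one, so a sibling hostname-capable field can be added without touching the dial path.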

Owner

I suggest landing #480 first, so this whole PR can be simplified.

Contributor

So we can shrink it now, right?

Comment thread contrib/sni-router/docker-compose.yml Outdated

networks:
  sni:
    driver: bridge
bam80 (Contributor) commented Apr 27, 2026

Is the bridge driver necessary?

bam80 (Contributor) commented Apr 28, 2026

Just to clarify: does this problem happen only if the hostname and the domain are exactly equal, or also if they only partially overlap, e.g. the hostname is a.b.com and the domain is b.com?

dolonet (Collaborator, Author) commented Apr 28, 2026

Good question — it's not about the names overlapping, it's about DNS.

HAProxy matches the SNI exactly (req_ssl_sni -i …), so a partial suffix wouldn't route. What actually triggers the loop is that mtg's default fronting target is the secret's hostname, which it resolves via DNS. In an SNI-router setup that hostname has to point at this same server (otherwise clients couldn't reach mtg at all), so mtg's fronting dial lands back on HAProxy carrying the original ClientHello → HAProxy sees the secret's SNI → routes to mtg → loop.

So in your a.b.com / b.com example, the relationship between the two names doesn't matter. As long as the secret's hostname resolves to this host (which it must, for the setup to work), the loop reproduces. The [domain-fronting] pin in this PR sidesteps it in every case by routing mtg directly to Caddy without going back through HAProxy.
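The hop sequence described above can be traced with a toy Go model. This is a sketch of the routing logic only, with no real networking: `secretHost`, the node names, and `maxHops` are illustrative assumptions, `haproxyRoute` stands in for the exact, case-insensitive SNI match, and `frontingDial` stands in for mtg's fallback dial with or without a pin.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	secretHost = "example.com" // hypothetical secret hostname; its DNS points at this server
	maxHops    = 8             // cut-off so the toy model terminates when it loops
)

// haproxyRoute models the exact (case-insensitive) SNI match: the
// secret's SNI goes to mtg, everything else goes to the web backend.
func haproxyRoute(sni string) string {
	if strings.EqualFold(sni, secretHost) {
		return "mtg"
	}
	return "web"
}

// frontingDial models where mtg's fallback dial lands. Without a pin it
// dials the secret's hostname, which resolves to HAProxy on this host.
func frontingDial(pin string) string {
	if pin == "" {
		return "haproxy"
	}
	return pin
}

// trace follows a non-MTProto probe that carries the secret's SNI and
// returns the hops it visits plus whether the cut-off was reached.
func trace(pin string) (hops []string, looped bool) {
	node := "haproxy"
	for len(hops) < maxHops {
		hops = append(hops, node)
		switch node {
		case "haproxy":
			node = haproxyRoute(secretHost)
		case "mtg":
			node = frontingDial(pin)
		default: // reached the web server; the probe terminates here
			return hops, false
		}
	}
	return hops, true
}

func main() {
	for _, pin := range []string{"", "web"} {
		hops, looped := trace(pin)
		fmt.Printf("pin=%q looped=%v hops=%v\n", pin, looped, hops)
	}
}
```

Note that the relationship between names never enters the model: the only input that changes the outcome is where the fronting dial lands.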

Pushed a small README tweak (bcfacec) leading with "the trigger is DNS, not name equality" so the doc doesn't imply the matching-name case is the only one.

bam80 (Contributor) commented Apr 29, 2026

Could we add loop detection to the mtg runtime and/or its config-check doctor mode?

dolonet (Collaborator, Author) commented Apr 30, 2026

Good idea, but I'd rather not expand 478 (it's config + docs only) — happy to track it as a separate issue/PR.

Sketch of a feasible runtime check: when [domain-fronting] isn't set, resolve the secret's hostname at startup and compare the result against the local interface addresses / the bind address. On a match, warn that the fronting dial may loop back through the same listener and recommend pinning [domain-fronting] upstream. Warning, not fatal — legitimate self-fronting is rare but possible.

Caveat worth being upfront about: the check only sees the direct case. An SNI-router on a separate IP that ultimately routes back to mtg would slip through, since mtg's outbound dial lands on a "foreign" IP. A precise detector would need an out-of-band marker on the fronting connection, which MTProto doesn't expose cleanly. So 80% coverage from a cheap check, the rest stays a documentation problem.

Shall I open a follow-up issue with this scope?

bam80 (Contributor) commented Apr 30, 2026

Sure, thanks.

dolonet (Collaborator, Author) commented Apr 30, 2026

Wanna hear @9seconds's opinion on that before diving in :)

dolonet (Collaborator, Author) commented May 4, 2026

Sounds good — happy to wait on #480. Once it lands, this PR collapses to:

  • drop the static networks: block and ipv4_address pin in docker-compose.yml,
  • replace [domain-fronting].ip = "172.28.0.10" with host = "web" (the compose service name),
  • simplify the README's "Fronting loop" section accordingly (the dual-stack caveat noted at L101 also goes away — docker DNS gives mtg an A or AAAA per client, so PROXY v2 stays same-family).

I'll rebase and force-push the simplified version after #480 merges.

Separately, on the runtime loop-detection idea raised above (#478 (comment)) — would you like me to open a follow-up issue with the "resolve secret hostname at startup, warn if it matches a local interface, non-fatal" sketch? Easy to track, easy to scope, but I didn't want to open it without your nod.

When the secret's domain resolves back to this server (the SNI-router
default), mtg's fallback fronting dial lands on HAProxy, the SNI
matches the secret, HAProxy routes the connection back to mtg -> loop.

Set [domain-fronting].host = "web" in mtg-config.toml so mtg dials
Caddy directly via compose-network DNS, bypassing HAProxy.  Requires
mtg >= 2.4 (9seconds#480 added hostname acceptance for the fronting target).

README gains a "Fronting loop" section explaining the cause.
dolonet force-pushed the fix/sni-router-fronting-loop branch from bf501a8 to 0fdf6cb on May 9, 2026 00:37
dolonet (Collaborator, Author) commented May 9, 2026

Pushed the simplified version (force-pushed):

  • Drops the static networks: sni block and the pinned ipv4_address in docker-compose.yml.
  • Replaces [domain-fronting].ip = "172.28.0.10" with host = "web" (mtg resolves the service name through compose-network DNS), per #480 (config: accept hostname for [domain-fronting] target).
  • README's "Fronting loop" section trimmed accordingly — no more network-sync caveats, no more dual-stack note (compose DNS gives a single A or AAAA per query, so PROXY v2 stays same-family for the common path).

Net diff is now +45 / 2 files.

mtg now also sends PROXY v2 on the fronting dial (introduced in the
previous commit via [domain-fronting].proxy-protocol = true), so the
"Real client IPs" section's sync list must include that fourth piece.
Without it, an operator who disables Caddy's PROXY listener wrapper
without also flipping [domain-fronting].proxy-protocol will leave mtg
sending an unparsed PROXY v2 prefix to Caddy on every fronted probe.
dolonet changed the title from "sni-router: break domain-fronting loop with pinned Caddy IP" to "sni-router: break domain-fronting loop" on May 9, 2026
dolonet (Collaborator, Author) commented May 9, 2026

Quick self-review pass before this lands on your queue:

  • README's "Real client IPs" sync list now lists four pieces instead of three (5d9a5ef). The new [domain-fronting].proxy-protocol = true is a fourth PROXY-protocol consumer (mtg → Caddy on fronting). Without flipping it together with the Caddyfile wrapper an operator gets a parse failure on every fronted probe, so it belongs in the sync list.
  • Refreshed PR title and body to match the simplified post-#480 approach (config: accept hostname for [domain-fronting] target); the original wording still described the IP-pinning revision and would have been confusing on review.

dolonet (Collaborator, Author) commented May 9, 2026

One correction on a side claim in the previous self-review note (#478 (comment)): I wrote that PROXY v2 "stays same-family for the common path." That holds only while the client's family matches the resolved family for web.

Compose's default network is single-stack IPv4, so an IPv6 client fronted to Caddy yields a mixed-family PROXY v2 header (TCPv6 source, IPv4 destination encoded as IPv4-mapped). The PROXY v2 spec marks that as invalid, but Caddy's go-proxyproto parser accepts it (the destination just renders as ::ffff:x.x.x.x in logs). So the path works; the access-log line for the rare "IPv6 client → fronted → Caddy" case will just show the client's IP in IPv4-mapped form.

If pristine IPv6 logging matters for someone's deployment, they can set enable_ipv6: true on the compose default network so web ends up with an AAAA record and the dial stays TCPv6 end-to-end. IPv4 clients are unaffected. Not worth a README change in this PR; flagging it here so the audit trail is honest.
