Skip to content

Improve firewall and NAT64 rules / enable RCP recovery in beta#4500

Open
agners wants to merge 8 commits intomasterfrom
improve-firewall-and-nat64-rules-and-enable-recovery-in-beta
Open

Improve firewall and NAT64 rules / enable RCP recovery in beta#4500
agners wants to merge 8 commits intomasterfrom
improve-firewall-and-nat64-rules-and-enable-recovery-in-beta

Conversation

@agners
Copy link
Copy Markdown
Member

@agners agners commented Mar 18, 2026

Summary

  • Replace broad FORWARD iptables rules that accepted all traffic on the infra interface with a dedicated OTBR_FORWARD_NAT64 chain. Outbound Thread traffic is matched by fwmark, return traffic is scoped to established connections only.
  • Add proper teardown of NAT64 iptables rules in finish to prevent rule accumulation on container restart (fixes OTBR finish script does not clean up NAT64 iptables rules, causing duplicates on restart #4488).
  • Use ${} style and quoting for variables throughout, resolving all shellcheck exceptions.
  • Enable spinel recovery for beta builds.

Details

The previous NAT64 rules appended directly to the FORWARD chain with blanket -i/-o accepts on the infra interface. This was overly broad (allowed all forwarded traffic through that interface, not just Thread-originated) and had no cleanup path — every restart added duplicate rules.

The new approach mirrors the existing ip6tables pattern: a named chain (OTBR_FORWARD_NAT64) with an unconditional jump from FORWARD. Inside the chain, outbound traffic is identified by the fwmark set in mangle/PREROUTING on the Thread interface, and return traffic is accepted only for established connections via conntrack. Unmatched packets fall through to the rest of FORWARD via implicit RETURN.

Summary by CodeRabbit

  • New Features

    • Beta: enabled automatic recovery from radio transmission timeouts
  • Bug Fixes

    • Fixed and tightened NAT64 firewall handling with more thorough teardown
    • Narrowed non-firewall forwarding to only Thread and infrastructure interfaces
    • Improved restoration/retry handling to reduce RCP restore failures and improve startup/firewall reliability
  • Chore

    • Version bumped to 2.16.6

@agners agners requested a review from puddly March 18, 2026 21:16
@agners agners marked this pull request as draft March 18, 2026 21:16
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Update OTBR scripts to quote variables, add a NAT64 forwarding chain and mark (OTBR_FORWARD_NAT64, 0x1001) with conntrack-aware rules and comprehensive teardown; add Docker CMake flag DOT_RCP_RESTORATION_MAX_COUNT=2; bump config to 2.16.6; move ShellCheck opts into the lint job step.

Changes

Cohort / File(s) Summary
Build & Changelog
openthread_border_router/Dockerfile, openthread_border_router/CHANGELOG.md, openthread_border_router/config.yaml
Added CMake flag -DOT_RCP_RESTORATION_MAX_COUNT=2 in beta & stable build stages; bumped config to 2.16.6 and added changelog entries.
OTBR agent runtime
openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run
Quote variable expansions; introduce OTBR_FORWARD_INGRESS/OTBR_FORWARD_EGRESS usage consistency and new OTBR_FORWARD_NAT64 chain; use ${otbr_fw_mark} instead of hardcoded mark; route NAT64 via dedicated chain with conntrack-aware ACCEPT rules.
OTBR agent teardown
openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish
Quote variable expansions; add comprehensive NAT64 cleanup loops: delete FORWARD jump rules, flush & delete OTBR_FORWARD_NAT64, remove mangle PREROUTING mark rules and nat POSTROUTING MASQUERADE rules; updated final log message.
Shared agent vars
openthread_border_router/rootfs/etc/s6-overlay/scripts/otbr-agent-common
Declare otbr_forward_nat64_chain="OTBR_FORWARD_NAT64" and otbr_fw_mark="0x1001".
CI lint step
.github/workflows/lint.yml
Moved SHELLCHECK_OPTS from workflow top-level into the ShellCheck job step and added -x tracing flag.

Sequence Diagram(s)

sequenceDiagram
    participant Run as otbr-agent (run)
    participant IPT as iptables/nftables
    participant Conn as conntrack/kernel
    participant Finish as otbr-agent (finish)

    Run->>IPT: create OTBR_FORWARD_INGRESS / EGRESS chains\ncreate OTBR_FORWARD_NAT64 chain\nadd mangle PREROUTING mark (mark=${otbr_fw_mark})\nadd nat POSTROUTING MASQUERADE for mark\ninsert FORWARD -> OTBR_FORWARD_NAT64 jump
    IPT->>Conn: mark packets / update conntrack state
    Conn-->>IPT: established/related recognized
    Finish->>IPT: remove PREROUTING mark rules\nremove POSTROUTING MASQUERADE\nremove FORWARD jumps\nflush & delete OTBR_FORWARD_NAT64
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: improving firewall and NAT64 rules and enabling RCP recovery for beta, directly reflecting the PR objectives.
Linked Issues check ✅ Passed The PR addresses all requirements from issue #4488: removes NAT64 iptables rules in finish script, implements loop-check-and-delete logic, and prevents duplicate rules on restart.
Out of Scope Changes check ✅ Passed All changes directly support the PR objectives: NAT64 cleanup, firewall rule narrowing, shellcheck fixes, RCP recovery enablement, and changelog/version updates are all in-scope.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch improve-firewall-and-nat64-rules-and-enable-recovery-in-beta
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run (1)

106-106: Inconsistent variable quoting.

${thread_if} is unquoted here while all other variable references in this file use quotes (e.g., "${thread_if}" on lines 104, 113, 124, 136). This should be quoted for consistency and to prevent word splitting issues.

Suggested fix
-    ip6tables -A "${otbr_forward_ingress_chain}" -m pkttype --pkt-type unicast -i ${thread_if} -j DROP
+    ip6tables -A "${otbr_forward_ingress_chain}" -m pkttype --pkt-type unicast -i "${thread_if}" -j DROP
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run` at
line 106, The ip6tables invocation uses an unquoted variable ${thread_if}
causing inconsistent quoting and potential word-splitting; update the ip6tables
command that references otbr_forward_ingress_chain and thread_if so the
interface variable is quoted (use "${thread_if}" where currently ${thread_if}
appears) to match other lines and avoid splitting issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openthread_border_router/CHANGELOG.md`:
- Line 6: The changelog entry "Enable recovery mechanism from \"radio tx
timeout\" errors for beta" conflicts with the Dockerfile change that adds
-DOT_RCP_RESTORATION_MAX_COUNT=2 to both otbr-beta-builder and
otbr-stable-builder; decide whether the flag is intended for beta only or for
both. If beta-only, remove the -DOT_RCP_RESTORATION_MAX_COUNT=2 addition from
the otbr-stable-builder section in the Dockerfile; if intended for both, update
the CHANGELOG.md entry to reference both builds (or remove the "for beta"
qualifier) so it matches the presence of -DOT_RCP_RESTORATION_MAX_COUNT=2 in
otbr-beta-builder and otbr-stable-builder.

---

Nitpick comments:
In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run`:
- Line 106: The ip6tables invocation uses an unquoted variable ${thread_if}
causing inconsistent quoting and potential word-splitting; update the ip6tables
command that references otbr_forward_ingress_chain and thread_if so the
interface variable is quoted (use "${thread_if}" where currently ${thread_if}
appears) to match other lines and avoid splitting issues.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 22a61be7-1691-4d73-bcaf-7cef81fd5ef0

📥 Commits

Reviewing files that changed from the base of the PR and between cb90eb8 and a9bf96f.

📒 Files selected for processing (5)
  • openthread_border_router/CHANGELOG.md
  • openthread_border_router/Dockerfile
  • openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish
  • openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run
  • openthread_border_router/rootfs/etc/s6-overlay/scripts/otbr-agent-common

@agners agners force-pushed the improve-firewall-and-nat64-rules-and-enable-recovery-in-beta branch from 465c963 to 755b1d0 Compare March 18, 2026 21:21
Use ${} and quoting for variables throuought, make sure the common file
is found by shellcheck. This allows to remove all shellcheck exceptions.
@agners agners force-pushed the improve-firewall-and-nat64-rules-and-enable-recovery-in-beta branch 2 times, most recently from 7d3304f to 069ecff Compare March 18, 2026 21:35
agners added 3 commits March 18, 2026 22:44
Replace the broad FORWARD rules that accepted all traffic on the
infra interface with a dedicated OTBR_FORWARD_NAT64 chain. Outbound
Thread traffic is identified by fwmark instead of interface name,
and return traffic is scoped to established connections only.

Add proper teardown in finish to prevent rule accumulation on restart.
@agners agners force-pushed the improve-firewall-and-nat64-rules-and-enable-recovery-in-beta branch from 069ecff to c013c51 Compare March 18, 2026 21:44
@agners
Copy link
Copy Markdown
Member Author

agners commented Mar 18, 2026

Tested with a tado Thermostat X Matter over Thread device, the device appears on the tado App (which connects through the cloud to the device).

@agners agners marked this pull request as ready for review March 18, 2026 22:05
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run`:
- Around line 133-136: The iptables rule that accepts forwarded NAT64 traffic
(the rule using otbr_forward_nat64_chain, otbr_fw_mark and backbone_if) is too
broad because it lacks an input interface check; restrict it by adding an input
interface match using thread_if (i.e., include -i "${thread_if}") so the accept
only applies to packets coming from the Thread interface to backbone_if, keeping
other marked traffic from other origins from matching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 458ebda9-668a-4a76-859b-0485ccda5535

📥 Commits

Reviewing files that changed from the base of the PR and between a9bf96f and c013c51.

📒 Files selected for processing (5)
  • .github/workflows/lint.yml
  • openthread_border_router/CHANGELOG.md
  • openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish
  • openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run
  • openthread_border_router/rootfs/etc/s6-overlay/scripts/otbr-agent-common
🚧 Files skipped from review as they are similar to previous changes (1)
  • openthread_border_router/CHANGELOG.md

The non-firewall configuration simply changed the IPv6 default forward
policy to forward, making the host a IPv6 router. This is anyways the
case on HAOS since we enable IPv6 forwarding, which makes Docker by
default leave the default ACCEPT policy on the FORWARD chain.

But rather than messing with the default policy, let OTBR just add
explicit forward rules for the interfaces we need routing for. On
exit also cleanup those rules.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run (1)

132-135: ⚠️ Potential issue | 🟠 Major

Still scope the outbound NAT64 allow rule to thread_if.

Line 133 accepts any packet that already carries 0x1001 as long as it exits ${backbone_if}. Add the ingress match so this bypass stays limited to Thread traffic.

🛠️ Suggested fix
-    iptables -A "${otbr_forward_nat64_chain}" -m mark --mark "${otbr_fw_mark}" -o "${backbone_if}" -j ACCEPT
+    iptables -A "${otbr_forward_nat64_chain}" -i "${thread_if}" -m mark --mark "${otbr_fw_mark}" -o "${backbone_if}" -j ACCEPT
In Linux iptables, is the skb mark matched by `-m mark --mark VALUE` global within a network namespace, and can a rule matching only `-m mark --mark VALUE -o IFACE` accept packets from any ingress interface that already carries that mark?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run` around
lines 132 - 135, The iptables rule that accepts marked packets on
"${otbr_forward_nat64_chain}" uses only -o "${backbone_if}" and thus lets any
packet with mark "${otbr_fw_mark}" exit via "${backbone_if}"; update that rule
to also match ingress interface "-i \"${thread_if}\"" so the accept is scoped to
Thread-originated traffic (modify the rule referencing otbr_forward_nat64_chain,
otbr_fw_mark, backbone_if, and thread_if accordingly and keep its order relative
to the conntrack ESTABLISHED,RELATED rule).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish`:
- Around line 54-61: finish currently does not load backbone_if before the
ip6tables delete loops, so the FORWARD rules targeting the backbone interface
are never removed; modify finish to read/initialize backbone_if from the
persisted state (e.g. /tmp/otbr-backbone-if) before the delete loops, wrap both
ip6tables -C/-D FORWARD loops that reference backbone_if and thread_if in a
guard that checks [ -n "${backbone_if}" ], and after successfully removing rules
remove the persisted file (rm -f /tmp/otbr-backbone-if) so subsequent restarts
don’t reuse a stale value.

---

Duplicate comments:
In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run`:
- Around line 132-135: The iptables rule that accepts marked packets on
"${otbr_forward_nat64_chain}" uses only -o "${backbone_if}" and thus lets any
packet with mark "${otbr_fw_mark}" exit via "${backbone_if}"; update that rule
to also match ingress interface "-i \"${thread_if}\"" so the accept is scoped to
Thread-originated traffic (modify the rule referencing otbr_forward_nat64_chain,
otbr_fw_mark, backbone_if, and thread_if accordingly and keep its order relative
to the conntrack ESTABLISHED,RELATED rule).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d01131e7-e5ae-4c4e-9367-a3e640fa788c

📥 Commits

Reviewing files that changed from the base of the PR and between fb7e361 and f710397.

📒 Files selected for processing (3)
  • openthread_border_router/CHANGELOG.md
  • openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish
  • openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run

Comment on lines +54 to +61
# Cleanup non-firewall rules
while ip6tables -C FORWARD -i "${thread_if}" -o "${backbone_if}" -j ACCEPT 2>/dev/null; do
ip6tables -D FORWARD -i "${thread_if}" -o "${backbone_if}" -j ACCEPT
done

while ip6tables -C FORWARD -i "${backbone_if}" -o "${thread_if}" -j ACCEPT 2>/dev/null; do
ip6tables -D FORWARD -i "${backbone_if}" -o "${thread_if}" -j ACCEPT
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Persist backbone_if so teardown can remove these FORWARD rules.

finish never initializes backbone_if, so Line 55 and Line 59 end up matching an empty interface and these accepts survive every restart.

🛠️ Suggested fix
# openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/run
+printf '%s\n' "${backbone_if}" > /tmp/otbr-backbone-if
# openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish
+backbone_if=""
+if [ -f /tmp/otbr-backbone-if ]; then
+    backbone_if="$(< /tmp/otbr-backbone-if)"
+fi

Wrap the delete loops in an [ -n "${backbone_if}" ] guard and rm -f /tmp/otbr-backbone-if after cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openthread_border_router/rootfs/etc/s6-overlay/s6-rc.d/otbr-agent/finish`
around lines 54 - 61, finish currently does not load backbone_if before the
ip6tables delete loops, so the FORWARD rules targeting the backbone interface
are never removed; modify finish to read/initialize backbone_if from the
persisted state (e.g. /tmp/otbr-backbone-if) before the delete loops, wrap both
ip6tables -C/-D FORWARD loops that reference backbone_if and thread_if in a
guard that checks [ -n "${backbone_if}" ], and after successfully removing rules
remove the persisted file (rm -f /tmp/otbr-backbone-if) so subsequent restarts
don’t reuse a stale value.

Use existing igress/egress tables to generate "non-firewall" rules. This
simplifies teardown.

Also don't restrict forwarding to the backbone interface. We didn't do
that before, and there might be usecases where routing in other networks
is required. Also, backbone_if is not available in the finish script, so
this sidesteps that.
@agners agners force-pushed the improve-firewall-and-nat64-rules-and-enable-recovery-in-beta branch from 4c6cfcd to 77fcdc0 Compare March 19, 2026 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OTBR finish script does not clean up NAT64 iptables rules, causing duplicates on restart

1 participant