TL;DR
On macOS Tahoe 26.3.1 with ZeroTier 1.16.1, on a NATed corporate LAN where this exact setup worked fine until ~7 days ago, LEAF peer discovery never completes. zerotier-cli listpeers shows LEAFs with path: -, latency: -1; /peer/<addr> returns paths: [], version: -1.-1.-1. The daemon also writes negative "no path" entries to peers.d/<ztaddr>.peer that survive every soft remedy (service restart, leave/join, uplink change). The forceTcpRelay workaround does establish a TCP connection to 204.80.128.1:443 but only sometimes actually forwards traffic.
Possibly related to #2584 (same OS major, same ZT version, also involves coexisting with another VPN), but the symptom and code path are different.
Environment
macOS: Tahoe 26.3.1, build 25D771280a (Apple Silicon, MacBook Pro).
ZeroTier: 1.16.1, installed from the official .pkg. Binary mtime Dec 12 20:11:57 2025, unchanged for ~5 months.
Network: Private, default flow rules (accept;), 5 members, all authorized and active < 1 min on Central throughout the issue.
Coexisting VPN (during initial diagnosis): Homebrew OpenVPN /opt/homebrew/sbin/openvpn using dev tun (classic tun driver, not NetworkExtension / NEPacketTunnelProvider). Split-tunnel (no redirect-gateway). Pushes specific 10.x routes only.
Timing — almost certainly a recent macOS regression
This exact environment (same Mac, same network, same ZT install, same OpenVPN config) worked fine until roughly May 5–6, 2026. Today is May 12.
log show --predicate 'subsystem == "com.apple.SoftwareUpdate"' --info --last 14d
…shows substantial OS-update activity on 2026-05-05 and 2026-05-06. ZeroTier itself was not updated in that window (stat of zerotier-one confirms Dec 12 20:11:57 2025). Strong correlation to a macOS Tahoe dot-release rather than to ZT.
Symptoms
1. LEAF peer paths never form on the corporate LAN
zerotier-cli info → ONLINE
zerotier-cli listnetworks → OK PRIVATE feth3618 10.147.18.235/24
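Control plane to the network controller and several PLANETs works, but with degraded freshness: lastReceive values of 100+ seconds for some PLANETs, on the same uplink that's perfectly fine for everything else.
listpeers shows LEAFs with path: -, latency: -1.
/peer/<ztaddr> returns: { "address": "98da488706", "paths": [], "version": "-1.-1.-1", "latency": -1, "role": "LEAF" }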
After timeout, the daemon prunes the LEAF entries entirely from listpeers.
Kernel route for the peer's ZT IP picks up the REJECT flag (<UP,HOST,REJECT,...>) on the feth interface, and ping returns No route to host / Host is down.
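The REJECT check above can be scripted. This is a minimal sketch, assuming the text being parsed contains a `flags: <...>` line in the same form as the flag string quoted above (as printed by `route -n get <ip>` on macOS). The sample text, including the peer IP `10.147.18.10`, is illustrative and not captured output.

```python
import re


def parse_route_flags(route_output: str) -> set[str]:
    """Return the flags found in a `flags: <A,B,C>` line, or an empty set."""
    match = re.search(r"flags:\s*<([^>]*)>", route_output)
    if not match:
        return set()
    return {flag.strip() for flag in match.group(1).split(",") if flag.strip()}


def is_rejected(route_output: str) -> bool:
    """True when the route carries the REJECT flag described in this report."""
    return "REJECT" in parse_route_flags(route_output)


# Illustrative sample only; the IP is a hypothetical peer address.
SAMPLE = """
   route to: 10.147.18.10
  interface: feth3618
      flags: <UP,HOST,REJECT,DONE>
"""

if is_rejected(SAMPLE):
    print("peer route is marked REJECT")
```

Running this against each peer's ZT IP during the failure window would show whether the REJECT flag appears at the same moment the LEAF entries are pruned.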
This persists across:
launchctl kickstart -k system/com.zerotier.one
Killing OpenVPN entirely + waiting for utunX interfaces to drop
zerotier-cli leave 35c192ce9b9531d1 + zerotier-cli join (membership reverifies OK, identity preserved, still no LEAF paths)
Changing primary uplink while ZT is running
2. peers.d/ cache poisoning
Pre-wipe peers.d/ listing (during the failure window):
35c192ce9b.peer May 12 16:33 82B ← controller (was being refreshed)
778cde7190.peer May 12 16:33 96B ← PLANET (refreshed)
cafe04eba9.peer May 12 16:33 117B ← PLANET (refreshed, with full path)
cafe80ed74.peer May 12 15:46 89B ← PLANET (refreshed)
cafefd6717.peer May 12 16:33 96B ← PLANET (refreshed)
98da488706.peer May 12 15:28 82B ← LEAF — frozen at daemon's first-start mtime
d624836ca0.peer May 12 15:28 82B ← LEAF — frozen
e751b8120a.peer May 12 15:28 82B ← LEAF — frozen
ef6f1e242a.peer May 12 16:00 82B
62f865ae71.peer May 12 15:27 82B ← orphan, not a member of any current network
cafe9efeb9.peer May 12 15:27 82B ← orphan PLANET
LEAF .peer files stay at the minimal 82-byte "identity-only, no path" size and at the daemon's first-start mtime for the entire failure window, while PLANET files are being rewritten every few minutes. This suggests that the daemon writes a negative entry on the initial WHOIS timeout and then doesn't re-attempt WHOIS for those peers on subsequent ticks. Two orphan entries are also present for peers no longer on this network.
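The "frozen at minimal size" pattern in the listing above can be detected mechanically. The sketch below flags .peer files that are at most 82 bytes (the size observed in this report, not a documented constant) and have not been rewritten within a chosen window. The 30-minute threshold and the helper names are assumptions for illustration. `scan_peers_dir` expects the path of the daemon's peers.d directory, which normally requires root to read.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path

MINIMAL_PEER_SIZE = 82  # bytes; "identity-only, no path" size seen in this report


@dataclass
class PeerFile:
    name: str
    mtime: datetime
    size: int


def scan_peers_dir(path: str) -> list[PeerFile]:
    """Collect name, mtime and size for every *.peer file in a directory."""
    files = []
    for entry in Path(path).glob("*.peer"):
        st = entry.stat()
        files.append(PeerFile(entry.stem, datetime.fromtimestamp(st.st_mtime), st.st_size))
    return files


def find_stale_minimal(files, now, max_age=timedelta(minutes=30)):
    """Names of minimal-size entries that have not been rewritten within max_age."""
    return sorted(
        f.name for f in files
        if f.size <= MINIMAL_PEER_SIZE and now - f.mtime > max_age
    )


# A subset of the listing above, evaluated at the time of the newest rewrite.
now = datetime(2026, 5, 12, 16, 33)
listing = [
    PeerFile("35c192ce9b", datetime(2026, 5, 12, 16, 33), 82),   # controller
    PeerFile("cafe04eba9", datetime(2026, 5, 12, 16, 33), 117),  # PLANET
    PeerFile("98da488706", datetime(2026, 5, 12, 15, 28), 82),   # LEAF
    PeerFile("d624836ca0", datetime(2026, 5, 12, 15, 28), 82),   # LEAF
    PeerFile("e751b8120a", datetime(2026, 5, 12, 15, 28), 82),   # LEAF
]
print(find_stale_minimal(listing, now))
```

The controller entry is also 82 bytes but is excluded because it is still being refreshed, which is the distinction the listing is meant to show.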
Resolution: wipe peers.d/. However, this only fully fixes things on a network where peer discovery can actually complete. On the broken corporate LAN, the cache wipe is followed by a fresh round of failed WHOIS, the LEAFs are pruned again, and we're back where we started. The cache wipe is therefore necessary but not sufficient.
3. forceTcpRelay works but converges very slowly
After a full reset (stop daemon, wipe peers.d/, write local.conf, bootstrap), the TCP relay socket is ESTABLISHED immediately, and an independent nc -vz 204.80.128.1 443 succeeds. But for 2+ minutes the daemon shows info: TUNNELED, listnetworks: OK (netconf came through), and only the 4 PLANETs in listpeers with path: -, latency: -1. No controller and no LEAFs appear during that time. The first attempt of the day after a fresh reboot can converge in ~75 s; subsequent attempts on the same uplink frequently need ~120–180 s of continuous outbound peer traffic (ICMP, application packets) before LEAFs and controller suddenly populate in listpeers with real paths.
Concretely measured today: 12 rounds of ping separated by 15 s sleeps; peer paths first appeared at round 9 (~2 minutes in), then stable from then on. Sample populated state:
Bidirectional ping then works at ~290 ms RTT, no loss. So the TCP relay path is functionally fine; it's the convergence latency that's surprising — and during those 2 minutes there's no signal at all in info / listpeers / API responses that progress is being made, so it looks identical to a stuck state.
Suggested daemon improvement: either expose a "TCP relay handshake/WHOIS retry" counter in /peer or /status so operators can tell convergence-in-progress from stuck, or shorten the WHOIS retry interval over the TCP fallback path.
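Until such a counter exists, convergence can only be observed from outside by polling. The sketch below mirrors the 12-round, 15-second measurement described above. It assumes that `zerotier-cli -j listpeers` prints a JSON array whose entries carry the `role` and `paths` fields seen in the /peer response quoted earlier. The function names are hypothetical, and `fetch_via_cli` is defined but not called here.

```python
import json
import subprocess
import time


def leafs_converged(peers) -> bool:
    """True when at least one LEAF is listed and every LEAF has a path."""
    leafs = [p for p in peers if p.get("role") == "LEAF"]
    return bool(leafs) and all(p.get("paths") for p in leafs)


def fetch_via_cli():
    """Assumption: `zerotier-cli -j listpeers` emits a JSON array of peers."""
    out = subprocess.run(
        ["zerotier-cli", "-j", "listpeers"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def poll_until_converged(fetch, rounds=12, interval_s=15.0, sleep=time.sleep):
    """Return the 1-based round at which LEAF paths appeared, or None."""
    for round_no in range(1, rounds + 1):
        if leafs_converged(fetch()):
            return round_no
        if round_no < rounds:
            sleep(interval_s)
    return None


# Illustrative states: PLANET-only, then a LEAF with no path, then a path.
STATES = [
    [{"address": "cafe04eba9", "role": "PLANET", "paths": []}],
    [{"address": "98da488706", "role": "LEAF", "paths": []}],
    [{"address": "98da488706", "role": "LEAF",
      "paths": [{"address": "192.0.2.10/9993"}]}],  # documentation-range IP
]
```

On a real host, `poll_until_converged(fetch_via_cli)` returns the round number to compare against the "round 9" figure above. A return value of None after all rounds is the case that currently looks identical to slow convergence.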
What is not the cause
OpenVPN. Killing it (and removing all utunX interfaces it created) does not change the symptom.
NetworkExtension VPN. None is installed. The only NE-style extension on the system is Karabiner-DriverKit (keyboard).
Endpoint security / EDR / packet inspectors. None present. systemextensionsctl list shows only Karabiner. kmutil showloaded has only stock Apple kexts. No CrowdStrike, SentinelOne, Cisco, Netskope, Zscaler, GlobalProtect, etc.
pf / macOS Application Firewall. Both Disabled.
Corporate firewall. Same symptom reproduces on a totally different uplink (iPhone cellular hotspot — 172.20.10.x NAT, no corporate gear in path). And TCP/443 from corporate to 204.80.128.1 succeeds; only the data plane through the relay is unreliable.
Membership / authorization. Central shows all peers as active < 1 min continuously. Network status is OK. leave/join cycle works, returns to OK, doesn't fix peers.
Flow rules. Default (just accept;).
Moons. None configured.
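The TCP/443 reachability check used to rule out the corporate firewall (the `nc -vz` test) can be reproduced as below. This is a rough equivalent, not the same tool. A successful connect only shows that the TCP handshake completes. It says nothing about whether the relay forwards traffic afterwards, which is the failure described in this report.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For the relay in this report the call would be `tcp_reachable("204.80.128.1", 443)`.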
Working workaround (use both steps)
Switch uplink to a less restrictive network (iPhone hotspot is fine).
Wipe peers.d/.
On a network where UDP peer discovery works (e.g. cellular), this immediately restores connectivity (via PLANET relay first, then direct P2P paths form shortly after) and listpeers populates fully with real paths, latency, and versions. Switching back to the corporate LAN breaks it again, and the cycle repeats.
Hypothesis
Three failure modes that probably share an upstream cause:
1. macOS Tahoe 26.3.x changed something (kernel networking, NE framework, packet-filter ordering, route-installation behavior, or similar) that makes ZT's UDP-based peer discovery / NAT-traversal fail far more often than it used to on certain networks. The same network was unproblematic 7 days ago.
2. ZeroTier 1.16.1 writes negative peer-cache entries on a single WHOIS timeout and never re-attempts WHOIS for those peers until the file is deleted. This turns transient discovery failures into permanent ones. The user-facing symptoms persist across kickstart, leave/join, and uplink changes.
3. ZeroTier's TCP fallback relay (forceTcpRelay) has at least one state where the outbound TCP socket to 204.80.128.1:443 is ESTABLISHED but the daemon does not actually pump control-plane / peer traffic through it. Identical config produces a working session one run and a broken session the next, with the same network and Mac state.
(1) is the macOS-side regression. (2) and (3) are robustness bugs in ZT that amplify it. I'm happy to split (2) and (3) into separate issues if the maintainers prefer.
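Relationship to #2584
There: (… NEPacketTunnelProvider in full-tunnel mode); peer paths do form there (DIRECT / RELAY visible); outbound packets leave; inbound packets are dropped by NE. Disconnecting Teleport resolves it instantly.
Here: peer paths never form (paths: [], version: -1.-1.-1); no NE-based VPN involved; coexisting OpenVPN is irrelevant (killing it doesn't help); persists across uplink changes; only a peers.d/ wipe on a non-broken network restores connectivity, until you return to the broken network.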
Plausibly both are downstream of the same macOS-26 networking change, manifesting in different ZT code paths.
Data I can provide
Happy to attach sysdiagnose, tcpdump captures during failure, zerotier-cli peer API dumps before/after peers.d/ wipe, full lsof output of zerotier-one in both working and broken states, or anything else useful. Reproducible reliably on my setup just by switching between the corporate LAN and cellular hotspot.