Fix OTBR migration script hanging when no settings to migrate#4482
craigrallen wants to merge 1 commit into home-assistant:master
The migrate_otbr_settings.py script unconditionally calls get_adapter_hardware_addr() before checking whether any .data files exist that actually need migrating. On some dongles (ZBT-1 with firmware 2.7.2.0, Sonoff Dongle Lite MG21) this causes a TimeoutError or AssertionError because the adapter resets its USB connection in response to the Spinel RESET command. The script exits with code 1, preventing otbr-agent from ever starting, even on a fresh install with no prior configuration.
Fix: scan the data directory for .data files first. If none are found, exit cleanly without touching the adapter. Only connect to the adapter if there is something to migrate.
Fixes home-assistant#4475
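The reordering described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `migrate_if_needed`, `failing_probe`, and the `probe` parameter are hypothetical names, the probe stands in for get_adapter_hardware_addr() (whose real arguments are not shown in this thread), and the migration step itself is reduced to a file count.

```python
import asyncio
from pathlib import Path


async def failing_probe() -> str:
    # Hypothetical stand-in for get_adapter_hardware_addr(): simulates an
    # adapter that never answers, as in the reported failure.
    raise TimeoutError("adapter did not respond")


async def migrate_if_needed(data_dir: Path, probe=failing_probe) -> int:
    """Return the number of .data files that would be migrated."""
    # Step 1: check for settings files without opening the adapter.
    data_files = sorted(data_dir.glob("*.data"))
    if not data_files:
        # Nothing to migrate: exit cleanly, the adapter is never touched.
        return 0

    # Step 2: only now query the adapter; a failure here still propagates.
    hwaddr = await probe()

    # Step 3: the real migration (rewriting settings for hwaddr) is omitted.
    return len(data_files)
```

With an empty directory, `asyncio.run(migrate_if_needed(Path(some_empty_dir)))` returns 0 even though the default probe always raises. With a `.data` file present, the probe runs and any TimeoutError still surfaces, so the existing behavior is unchanged when there is something to migrate.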
|
No actionable comments were generated in the recent review.
Walkthrough
The migration script's control flow is reordered to validate existing OTBR configuration before attempting adapter communication. Adapter hardware address retrieval is now deferred and conditionally executed only when settings require migration, preventing timeout failures on fresh installs with non-responsive devices.
Estimated code review effort: 2 (Simple), ~10 minutes
Pre-merge checks: 5 passed
|
I don't think there should be any scenario where the migration script differs in functionality from the otbr-posix startup reset sequence: if one works, the other should as well.
These adapters never drop their USB connection; they use a dedicated USB-serial chip. Can you attach some debug logs of startup failing with a ZBT-1 without this patch and of startup succeeding with it? How are you running HA OS? |
Debug logs and system info
Environment
Failure scenario (without patch)
Fresh OTBR setup on a ZBT-1 dongle that has OpenThread RCP 2.7.2.0 firmware but no existing Thread network (no settings to migrate). The migration script tries to probe the dongle to read its hardware address:
    hwaddr = await get_adapter_hardware_addr(...)
The dongle responds to the Spinel RESET command but then times out on
After 3 retries (~6 seconds), it raises
The add-on never starts because
Why the dongle doesn't respond
The ZBT-1 has never been configured with a Thread network. The RCP firmware is fresh from the factory/reflash. There's no stored dataset, no network credentials, nothing to migrate. When the migration script sends
Why this differs from otbr-posix startup
The migration script uses
The fix
The patch wraps the hardware address probe in a
    try:
        hwaddr = await get_adapter_hardware_addr(...)
    except (TimeoutError, AssertionError) as e:
        LOGGER.warning("Could not probe adapter, skipping migration: %s", e)
        return  # Exit early, let otbr-agent initialize from scratch
This matches the intent of the migration script: if there's nothing to migrate (dongle doesn't respond = no stored settings), skip migration and let
Success with patch
With the patch applied, the migration script logs:
...and
Full debug log (failure case)
OTBR startup log showing TimeoutError crash
Regarding USB disconnection
Correct — I misspoke in the original description. The ZBT-1's CP2102N doesn't disconnect. What happens is the Spinel protocol state gets out of sync or the RCP takes longer to become ready than the migration script's timeout allows. The patch simply acknowledges that probing can fail (especially on fresh/unconfigured dongles) and treats that as "nothing to migrate" rather than a fatal error. |
You can clearly see in the log that there is no response.
Not possible, the firmware uses 460800.
The startup logic, timing, and even serial communication pin states are identical between the two. Do not post AI generated analyses. Practically every point is hallucinated and it's a waste of time to read them. Please attach a debug log of the startup sequence with this patch applied that shows the Python script failing to start but otbr-posix succeeding right after. |
Fresh Install Still Crashes
After completely removing Thread integration and OTBR, then reinstalling fresh (2026-03-22), the add-on still crashes with the same migration script error.
Steps Taken
Result: Same Crash
Analysis
The migration script is still running even on fresh installs where there's nothing to migrate. It tries to query the dongle for hardware address before checking if migration is actually needed.
The script should:
Current behavior: Always queries dongle → crashes on fresh dongles → container stops → add-on never starts.
Environment
This confirms the issue affects all fresh Thread setups, not just upgrades. |
|
|
I'm not hallucinating it not working, that's for sure. HA thread is janky and just never works. Always the same reason given, it's beta, just deal with it. It doesn't actually work to even deal with it. Nabu Casa ZBT-1 is just as good as e-waste and so all the thread devices. The use of 'addons' is deprecated, please use 'apps' instead! |
|
It would also help if you could share your system information. Go to Settings > System > Repairs, then select the three-dot menu at the top right and select System information. Press copy and paste it in a response here. The fact that a complete uninstall and reinstall caused the same issue points to a (USB) communication problem or something of that sort. We have 27k users of the OTBR app according to opt-in stats (https://analytics.home-assistant.io/apps/), so I can tell you that neither the script nor the OTBR is broken per se. It must be some interaction or configuration on your end that causes the issue. |
Problem
migrate_otbr_settings.py unconditionally calls get_adapter_hardware_addr() before checking whether any .data files exist that actually need migrating.
On certain dongles (ZBT-1 with firmware 2.7.2.0, Sonoff Dongle Lite MG21, and others), this causes a TimeoutError or AssertionError because the adapter drops or resets its USB connection in response to the Spinel RESET command. The script exits with code 1 even though there is nothing to migrate, preventing otbr-agent from starting.
Affected scenarios:
Fixes #4475
Fix
Scan the data directory for .data files first. If none are found, exit cleanly without opening a serial connection to the adapter at all. Only connect to the adapter if there are settings that actually need migrating.
The logic change is minimal: the .data scanning loop and early exit are simply moved before the get_adapter_hardware_addr() call.
Testing
Verified on a ZBT-1 (serial 20b518d285..., firmware SL-OPENTHREAD/2.7.2.0):
otbr-agent starts normally
.data files present → adapter connection proceeds as before → migration completes
Summary by CodeRabbit