otbr: Automatically migrate OTBR settings to last-good backup#4417
otbr: Automatically migrate OTBR settings to last-good backup#4417puddly wants to merge 3 commits intohome-assistant:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds automatic recovery functionality to the OTBR (OpenThread Border Router) settings migration script to handle corrupted data files. The PR addresses an issue where corrupted 8-byte datasets were being loaded by the migration script, causing TLV parsing failures and breaking Thread network functionality.
Changes:
- Enhanced error handling in the migration script by replacing assertions with explicit ValueError exceptions
- Added validation logic to detect corrupted OTBR settings files (active dataset < 32 bytes)
- Implemented automatic recovery mechanism that attempts to restore from backup files when corruption is detected
- Added unit tests using real test data to verify the migration script's behavior
- Replaced print statements with proper logging throughout the migration script
Reviewed changes
Copilot reviewed 2 out of 22 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| openthread_border_router/rootfs/usr/local/bin/migrate_otbr_settings.py | Core migration script with enhanced validation, recovery logic, refactored helper functions, and logging |
| openthread_border_router/tests/test_migration.py | Unit tests for migration script covering both normal and corrupted settings scenarios |
| openthread_border_router/tests/data/otbr_settings_complex_running/* | Test data files representing a working OTBR configuration with multiple adapters and backups |
| openthread_border_router/tests/data/otbr_settings_broken/* | Test data files representing a corrupted configuration (incomplete - missing required files) |
| openthread_border_router/.gitignore | Standard Python project ignore patterns for cache and system files |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| sys.path.append(str(Path(__file__).parent.parent / "rootfs/usr/local/bin")) | ||
|
|
||
| import pytest | ||
| import shutil | ||
| from unittest.mock import patch | ||
| from migrate_otbr_settings import main, parse_otbr_settings, is_valid_otbr_settings_file | ||
|
|
||
|
|
There was a problem hiding this comment.
The sys.path manipulation on line 4 modifies the global Python path for the entire test session. This can cause issues if multiple tests run in the same session and have conflicting module names. Consider using a more isolated approach such as importlib or pytest fixtures that handle the path manipulation within the test scope only.
| sys.path.append(str(Path(__file__).parent.parent / "rootfs/usr/local/bin")) | |
| import pytest | |
| import shutil | |
| from unittest.mock import patch | |
| from migrate_otbr_settings import main, parse_otbr_settings, is_valid_otbr_settings_file | |
| import importlib.util | |
| import pytest | |
| import shutil | |
| from unittest.mock import patch | |
| _MIGRATE_OTBR_SETTINGS_PATH = ( | |
| Path(__file__).parent.parent / "rootfs/usr/local/bin" / "migrate_otbr_settings.py" | |
| ) | |
| _spec = importlib.util.spec_from_file_location( | |
| "migrate_otbr_settings", _MIGRATE_OTBR_SETTINGS_PATH | |
| ) | |
| if _spec is None or _spec.loader is None: | |
| raise ImportError(f"Cannot load migrate_otbr_settings from {_MIGRATE_OTBR_SETTINGS_PATH}") | |
| _migrate_otbr_settings = importlib.util.module_from_spec(_spec) | |
| _spec.loader.exec_module(_migrate_otbr_settings) # type: ignore[union-attr] | |
| main = _migrate_otbr_settings.main | |
| parse_otbr_settings = _migrate_otbr_settings.parse_otbr_settings | |
| is_valid_otbr_settings_file = _migrate_otbr_settings.is_valid_otbr_settings_file |
agners
left a comment
There was a problem hiding this comment.
Generally looks good, I wonder if we have enough such cases which makes it worthwhile to add this complexity 🤔
Not sure if the "Reset border router" command via Thread configuration in HA is possible in this situation, if not users could also just uninstall the add-on and reinstall it to clear the internal settings, then let the OTBR integration setup a "new" border router 🤔.
I guess we can also add this, and remove it after a couple releases to not have the long term maintenance burden of the complexity 🤷 .
| """Check if parsed settings represent a valid OTBR settings file.""" | ||
| return {OtbrSettingsKey.ACTIVE_DATASET} <= {key for key, _ in settings} | ||
| active_dataset = get_active_dataset(settings) | ||
| return active_dataset is not None and len(active_dataset) >= MIN_ACTIVE_DATASET_SIZE |
There was a problem hiding this comment.
I guess instead of going by length we could just test for some usually available tags 🤔
I'd go for these five, they are pretty much required:
- Network Key
- Network Name
- Extended PAN ID
- PAN ID
- Channel
Current TLV parsing issues in Core appear to be the result of the "bad"
tmpdata file being loaded by the migration script and passed through to OTBR. It manages to start up and broadcast the broken 8 byte dataset, which breaks TLV parsing in Core and leaves the Thread network broken.I've amended the migration script to run a data migration and revert to the last-known backup if we see that the current OTBR data file is invalid.
I'm not sure if this is acceptable but I've written a few unit tests using real data and test the migration script's behavior using them. These do not run as part of CI but they are in a
tests/folder. Is this fine to include in the addons repo?