DHCP-less/inband Bootz spec changes #316
Code Review
This pull request introduces a DHCP-less (inband) operating mode for Bootz, updating the documentation to include new entry points, cleanup procedures, and a standardized CLI specification across Unary Bootz and BootstrapStream (v0.6 and v1.0) protocols. Review feedback highlights a numbering error in the Unary Bootz section and suggests consistent formatting for the cleanup steps across all protocol versions.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
#### DHCP-less (inband)
In environments where DHCP is not available or out-of-band (OOB) management
If the device needs to do dhcp-less bootz whenever local dhcp-less config is available on the device, do we need a default option on the device set to "dhcp-less"?
As per the definition above, the default behaviour is to use DHCP-based discovery.
- What should the criteria be for falling back to dhcp-less?
- Is the "local configuration" (specific to dhcp-less config) the trigger for using "dhcp-less", i.e. if the config is present, always use "dhcp-less"?
(Or does the device need to run regular DHCP-based discovery first and, if that fails after the maximum number of tries, look for the dhcp-less method?)
Does this (dhcp-less) case apply only to manual initiation, or also to the cases below? If it applies, what will the behaviour be in each case?
- factory reset
Only DHCP-based discovery is possible.
- reload (both can be active)
Assumed: in this case the device will look for the local persistent config and, if present, perform dhcp-less.
- reimage
Make sure the persistent config is not removed.
Assume the device should use dhcp-less if the local config is present.
- manual initiate
The device should follow the initiate command; if no specific option is given, use dhcp-less lookup as the first method.
I don't quite follow your comment so please let me know if I've misunderstood.
DHCP-less Bootz should only ever be attempted if someone has explicitly enabled it via the CLI. The implementation of this is up to the vendor but you can think of this as writing something to the filesystem that enables this feature. The existence (or lack of) startup config should have no effect on what Bootz mode is used.
To answer your questions:
- Factory reset
Yes, this would go straight to DHCP mode since an operator didn't run the DHCP-less CLI after the factory reset.
- reload (both can be active)
The device will look for persistent Bootz parameters and, if they exist, start DHCP-less Bootz. If they don't exist, it won't enter the Bootz loop.
- reimage
Assuming this means an in-place OS upgrade/downgrade without touching the configuration? Again, this would depend on whether the Bootz parameters have been persisted on the device. If not, then it would use the existing local config without entering Bootz.
- manual initiate
The CLI command is intended specifically for the DHCP-less method. To trigger DHCP Bootz, a normal factory reset or config wipe + reload would work.
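The startup decision described in these answers can be sketched in Python. This is a minimal illustration under stated assumptions, not the spec's or any vendor's implementation. The parameters-file path, `BootzMode`, `select_boot_mode`, and the `has_config` flag are all hypothetical names. Storing the parameters as a file is only one way a vendor might persist them.

```python
from enum import Enum
from pathlib import Path


class BootzMode(Enum):
    DHCP = "dhcp"            # standard DHCP-based Bootz
    DHCP_LESS = "dhcp-less"  # inband Bootz driven by persisted parameters
    SKIP = "skip"            # do not enter the Bootz loop


# Hypothetical location of the persisted DHCP-less parameters; the real
# storage mechanism is vendor-specific.
PARAMS_FILE = Path("/var/lib/bootz/dhcpless_params.json")


def select_boot_mode(has_config: bool, params_file: Path = PARAMS_FILE) -> BootzMode:
    """Choose the Bootz mode at startup (illustrative sketch).

    Only the persisted parameters select DHCP-less mode; whether a local
    config exists never turns DHCP-less on or off.
    """
    if params_file.exists():
        # Reload/reimage after an operator enabled DHCP-less via the CLI.
        return BootzMode.DHCP_LESS
    if not has_config:
        # Factory reset, or config wipe + reload: regular DHCP Bootz.
        return BootzMode.DHCP
    # Reload/reimage with no persisted parameters: keep the local config.
    return BootzMode.SKIP
```

The key design point is the check order. The persisted parameters are consulted first, so an existing local config (including local users) cannot stop DHCP-less Bootz from starting.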
DHCP-less Bootz — Exit Behavior Gap
Core Issue
DHCP-less bootz has no exit criteria other than success. Unlike regular bootz, the DHCP-less workflow bypasses the normal exit conditions (e.g., "exit if device is already provisioned"). Because the DHCP-less trigger is stored in persistent disk configuration, bootz will run forever in the background, surviving reloads, reimages, and shutdown/boot cycles. The only way to stop it today is a factory reset or explicit user intervention (stopping bootz or removing the dhcp-less config).
Existing Bootz Exit Behavior (Regular / DHCP-based)
Bootz exits under the following conditions today:
| Workflow | Exit Condition |
|---|---|
| Normal boot (reload/reimage) | Exits if the device is already provisioned |
| bootz | Exits on success (provisioning complete) |
| bootz | On failure: bootz does not exit and retries forever |
| Automated (DHCP-based) | Exits on first user creation (device is considered provisioned) |
| Manual bootz start (CLI) | Exits when the bootz terminate CLI is executed |
| Manual bootz start (CLI) | Exits on reload, and does not start on boot if the device is already configured |
Note: In normal cases, bootz is not running, and is not started automatically, if the device is already provisioned (through manual provisioning or prior bootz provisioning).
DHCP-less Bootz
- The device already contains configuration (user/reachability config present).
- Whether to use dhcp-less or regular discovery is determined by the persistent configuration on disk.
- The dhcp-less config is saved on disk and, once created, is not cleared by applying new xr-config or a commit replace.
Steps to Initiate DHCP-less Bootz
- The user is already logged in to the device (user configuration is present).
- Additional (reachability) configuration has been added.
- A dhcp-less config entry has been added.
- Bootz is started manually using the bootz CLI or a reload.
What Happens After Reboot
- The device boots and bootz starts executing.
- Bootz looks for "dhcp-less" as its first step (the regular bootz workflow exits if it detects user-name configuration).
- If "dhcp-less" is configured, the device attempts "dhcp-less" bootz and skips the user config checks.
Scenario 1: Bootz fails to reach the server
- The device is not able to complete bootz with the server (for any reason).
- Bootz continues to run in the background; the server receives a status based on the type of failure.
Scenario 2: User logs in while bootz is still running (after continuous failures)
- The user logs in to the device (through console/**).
- The user is prompted for authentication and gets access to the device.
- bootz will still be running in the background.
(Expecting the user to manually stop bootz or remove the dhcp-less configuration after logging in is not always feasible and is error-prone).
Impact
- If the user does not explicitly stop bootz, the device will have bootz running in the background.
- DHCP-less bootz start is tied to a persistent configuration; if that configuration is present, bootz is started again even after reboot/reimage/shutdown and boot.
- If the user attempts dhcp-less bootz and, after it fails, decides to provision the device manually:
  - bootz will run in the background even after the device is provisioned.
  - there is no exit case for bootz other than success (which would alter the manually provisioned data).
  - bootz will be attempted continuously on the device, and it will survive reload/reimage/shutdown and boot.
- With stale dhcp-less data on disk, bootz will run forever in the background; the automated workflow has no dhcp-less-specific exit criterion for bootz (other than factory reset).
Summary
DHCP-less bootz has no graceful exit path other than success or factory reset. This means:
- Stale DHCP-less config on persistent disk = bootz runs indefinitely
- User manually provisioning after a failed DHCP-less attempt = bootz still running in background
- Bootz workflow survives/gets started even after reload/reimage/shutdown
Open Question: Should DHCP-less Bootz Require Explicit Preparation Steps?
Should the dhcp-less workflow require the following steps before initiation:
- bootz cleanup
- commit-replace/config removal
- apply required configuration for reachability
- dhcp-less command initiate (which will initiate bootz with a reload, or start without reload)
Note: Here, user-name configuration is still an exit criterion for bootz. On device boot, even if bootz finds a dhcp-less configuration, bootz exits if the device is configured with a user configuration.
-> Or should the dhcp-less workflow have a maximum retry count and then exit?
-> Or should dhcp-less be started with an expiry?
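The two alternatives raised at the end of this comment (a retry cap or an expiry) can be sketched as a small exit policy. Neither is part of the current spec. `DhcpLessExitPolicy`, `max_attempts`, and `expiry_seconds` are hypothetical names used only to make the options concrete.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DhcpLessExitPolicy:
    """Hypothetical bounded exit policy for DHCP-less Bootz (not in the spec).

    Setting a limit to None disables it; with both None the policy never
    gives up, i.e. Bootz keeps retrying until success.
    """
    max_attempts: Optional[int] = None      # retry cap
    expiry_seconds: Optional[float] = None  # expiry window
    # Wall-clock start time, so it could be persisted across reboots.
    started_at: float = field(default_factory=time.time)
    attempts: int = 0

    def record_failure(self) -> None:
        self.attempts += 1

    def should_give_up(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.max_attempts is not None and self.attempts >= self.max_attempts:
            return True
        if (self.expiry_seconds is not None
                and now - self.started_at >= self.expiry_seconds):
            return True
        return False
```

DHCP-less Bootz survives reloads, so a real implementation would need to persist `attempts` and `started_at` alongside the Bootz parameters. This sketch keeps them in memory. It uses wall-clock time so that a persisted start time would stay meaningful after a reboot.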
I don't think username configuration is a good exit criteria for Bootz. There are some cases where the pre-Bootz local configuration will want to have a local user configured and still have DHCP-less Bootz continue.
I do agree though that there should be some other way to prevent DHCP-less Bootz from running in the background indefinitely. Let me chat with a few of the openconfig maintainers and get back to you.
1. A network operator manually configures the device with a local
   configuration that gives it reachability to the Bootz server. This
   includes static routing, interface configuration and local admin
   credentials.
If local admin credentials are already configured:
- What will be the exit criteria for the bootz workflow, as it is going to run in the background until success?
- The bootz workflow involves reload/reimage, and the exit criterion is the detection of configuration. If the device already contains configuration, bootz assumes the device is already provisioned; otherwise it attempts bootz, starting with DHCP discovery
(with the new case, this will be either DHCP discovery or dhcp-less).
If we have admin creds on the device before starting bootz and the bootz workflow reloads the device, bootz will run in the background. Even if the user logs in, bootz will still be running in the background and needs an explicit terminate from the user.
The exit criteria is actually that the device sends a final ReportStatus gRPC to the Bootz server which indicates that it has finished bootstrapping. As I mentioned above, this would mean that the presence of a local config on the device doesn't determine whether Bootz is started or not. The persisted Bootz parameters is authoritative here.
2. A network operator triggers DHCP-less Bootz via CLI, providing the
   Bootz Server URI and Source Interface. These are saved to a
   persistent parameters file on disk to survive reboots.
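The following is a minimal sketch of how the persisted parameters might be stored. It assumes a JSON file that holds only the two CLI values. The file format, `DhcpLessParams`, and the helper names are hypothetical. The write goes through a temporary file and `os.replace`, so a power loss mid-write does not leave a truncated file. `clear_params` corresponds to the deletion on success or operator cancel described later in this thread.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DhcpLessParams:
    # The two values the operator supplies on the DHCP-less CLI.
    server_uri: str
    source_interface: str


def save_params(path: Path, params: DhcpLessParams) -> None:
    """Persist parameters atomically so a crash never leaves a torn file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".bootz-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(params), f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data hits the disk
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise


def load_params(path: Path) -> Optional[DhcpLessParams]:
    """Return the persisted parameters, or None if DHCP-less is not enabled."""
    try:
        data = json.loads(path.read_text())
    except FileNotFoundError:
        return None
    return DhcpLessParams(**data)


def clear_params(path: Path) -> None:
    """Delete the parameters, e.g. after success or an operator cancel."""
    path.unlink(missing_ok=True)
```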
What will be the scope and lifetime of this persistent data?
- If this data is present on the device, then every time the device gets a (manual) reload/reimage there is a risk of it starting the dhcp-less workflow, as there is no other exit criterion (user credentials are no longer an exit criterion).
These persistent Bootz parameters exist for the lifetime of the Bootstrapping process. They are deleted when Bootz finishes successfully or when the operator manually cancels DHCP-less mode. This is explained further down in the doc.
4. If the DHCP-less Bootz process fails at any point, the device MUST
   revert back to the operator-provided local configuration and attempt
   to connect to the Bootz server again. The device MUST remain in this
   recovery loop until either:
On failure:
- the device should fall back to the initial configuration.
Assume that the device needs to retry bootz again (in dhcp-less mode).
Yes, the device would keep trying dhcp-less mode on failure, but must revert to the previous local configuration (and not an empty factory-reset configuration).
   1. Bootz completes successfully
   2. The operator manually resets the device to standard DHCP mode
      via the CLI, at which point Option A takes effect.
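The recovery loop in step 4 can be sketched as follows. `attempt_bootz` and `restore_local_config` are hypothetical hooks that stand in for vendor-specific logic. `attempt_bootz` is assumed to return `True` only once the final ReportStatus has been sent, which is the exit criterion stated earlier in this thread. A `threading.Event` models the operator's CLI reset to DHCP mode.

```python
import threading
from typing import Callable


def run_dhcpless_recovery_loop(
    attempt_bootz: Callable[[], bool],
    restore_local_config: Callable[[], None],
    cancelled: threading.Event,
    retry_interval: float = 30.0,
) -> bool:
    """Retry DHCP-less Bootz until success or operator cancellation (sketch).

    attempt_bootz: hypothetical hook running one full Bootz exchange; returns
        True once the final ReportStatus has been sent successfully.
    restore_local_config: hypothetical hook that rolls the device back to the
        operator-provided local configuration (not a factory-empty config).
    cancelled: set when the operator resets the device to DHCP mode via CLI.
    Returns True on success, False if cancelled.
    """
    while not cancelled.is_set():
        if attempt_bootz():
            return True
        # Failure: go back to the pre-Bootz local config so the device keeps
        # reachability to the Bootz server, then try again.
        restore_local_config()
        # Sleep between retries, but wake immediately if the operator cancels.
        cancelled.wait(retry_interval)
    return False
```

The loop reverts to the operator's local config rather than an empty one on every failure. That revert is what keeps the Bootz server reachable for the next attempt.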
On this mode change (removing the config / dhcp-less persistent config), do we need to exit the current instance of bootz?
- Do we need bootz to autostart on mode change
(or wait for the user to trigger it again using a manual reload/initiate command)?
If Bootz is successful, then you exit Bootz because it is not needed anymore.
If you manually cancel DHCP-less Bootz mode, you should stop the current Bootz attempt and reload into DHCP mode. See the sequence diagrams below.
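The cancel path described in this reply (stop the current attempt, then reload into DHCP mode) can be sketched as follows. The sketch assumes the Bootz attempt runs in its own thread and watches a `threading.Event` that signals cancellation. `cancel_dhcpless` and `reload_into_dhcp_mode` are hypothetical names, and how the reload reaches DHCP mode is left to the vendor. Deleting the parameters file follows the earlier reply that the persisted parameters are removed when the operator cancels DHCP-less mode.

```python
import threading
from pathlib import Path
from typing import Callable


def cancel_dhcpless(
    params_file: Path,
    cancelled: threading.Event,
    bootz_thread: threading.Thread,
    reload_into_dhcp_mode: Callable[[], None],
    join_timeout: float = 60.0,
) -> None:
    """Handle the operator's "cancel DHCP-less" CLI command (sketch)."""
    # 1. Signal the in-flight DHCP-less attempt to stop and wait for it.
    #    If it does not stop within join_timeout, proceed anyway.
    cancelled.set()
    bootz_thread.join(timeout=join_timeout)
    # 2. Remove the persisted parameters so the next boot does not
    #    re-enter DHCP-less mode.
    params_file.unlink(missing_ok=True)
    # 3. Reload into standard DHCP mode (hypothetical, vendor-defined hook).
    reload_into_dhcp_mode()
```

The parameters are deleted before the reload. If they were deleted afterwards, a device that reboots would find them still present and restart DHCP-less Bootz.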