
Take shutdown inhibitor lock for graceful service teardown on external shutdown#6631

Draft
agners wants to merge 9 commits into main from take-shutdown-inhibit-lock-on-system-shutdown

Conversation

@agners
Member

@agners agners commented Mar 13, 2026

Proposed change

When the host is shut down externally (e.g. ACPI power button, hypervisor shutdown command), Docker sends SIGTERM to Supervisor without any prior warning. This means managed containers (Home Assistant Core, add-ons, plugins) are killed abruptly without graceful shutdown.

This PR adds handling for logind's PrepareForShutdown signal so Supervisor can gracefully stop all services before the host proceeds with shutdown:

  1. On startup, Supervisor takes a delay inhibitor lock from logind via Inhibit()
  2. A background task listens for the PrepareForShutdown(true) signal
  3. When the signal fires (and Supervisor is in RUNNING state), it runs core.shutdown() to gracefully stop all managed containers
  4. The inhibitor lock is then released, allowing the host to proceed
  5. The monitor task is tracked and cleanly cancelled during Core.stop() stage 2

Key details:

  • Unix FD negotiation is enabled on the D-Bus message bus (negotiate_unix_fd=True) since Inhibit() returns a file descriptor
  • A state guard prevents double-shutdown when Supervisor itself initiates the shutdown via API
  • The monitor task reference is stored and cancelled cleanly via a new HostManager.unload() method
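The flow described above (take the lock, wait for the signal, shut down, release the lock) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeLogind`, `monitor_shutdown`, and `demo` are hypothetical names, and a pipe stands in for the file descriptor that logind would return from Inhibit(), so the example runs without a system D-Bus.

```python
import asyncio
import os


class FakeLogind:
    """Hypothetical stand-in for a logind D-Bus wrapper; not the Supervisor API."""

    def __init__(self) -> None:
        self._prepare = asyncio.Event()
        self._peer_fd = -1

    async def inhibit(self, what: str, who: str, why: str, mode: str) -> int:
        # Real logind hands back a file descriptor over D-Bus. Here the write
        # end of a pipe plays that role, and the read end lets us observe it.
        self._peer_fd, lock_fd = os.pipe()
        os.set_blocking(self._peer_fd, False)
        return lock_fd

    async def wait_prepare_for_shutdown(self) -> None:
        await self._prepare.wait()

    def emit_prepare_for_shutdown(self) -> None:
        self._prepare.set()

    def lock_released(self) -> bool:
        # EOF on the read end means every copy of the lock fd has been closed.
        try:
            return os.read(self._peer_fd, 1) == b""
        except BlockingIOError:
            return False

    def close(self) -> None:
        os.close(self._peer_fd)


async def monitor_shutdown(logind: FakeLogind, graceful_shutdown) -> None:
    # Step 1: take a delay inhibitor lock (argument values are assumptions).
    lock_fd = await logind.inhibit("shutdown", "Supervisor", "Graceful teardown", "delay")
    try:
        # Steps 2 and 3: wait for the signal, then stop managed services.
        await logind.wait_prepare_for_shutdown()
        await graceful_shutdown()
    finally:
        # Step 4: closing the fd releases the lock, also on cancellation.
        os.close(lock_fd)


async def demo() -> tuple:
    logind = FakeLogind()
    seen = []

    async def graceful_shutdown() -> None:
        # Stand-in for core.shutdown(); the lock must still be held here.
        seen.append(logind.lock_released())

    task = asyncio.create_task(monitor_shutdown(logind, graceful_shutdown))
    await asyncio.sleep(0)  # let the monitor take the lock and start waiting
    logind.emit_prepare_for_shutdown()
    await task
    seen.append(logind.lock_released())
    logind.close()
    return tuple(seen)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The `try`/`finally` around the wait means the lock is released whether the shutdown completes, fails, or the monitor task is cancelled, which is the property the host relies on to proceed.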

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Contributor

Copilot AI left a comment

Pull request overview

This PR adds systemd-logind integration to gracefully stop Supervisor-managed services when the host is shutting down/rebooting, using a logind inhibitor lock and the PrepareForShutdown signal.

Changes:

  • Add logind D-Bus support for Inhibit() and PrepareForShutdown signal handling.
  • Start a background shutdown-monitor task in HostManager to trigger coresys.core.shutdown() when host shutdown is detected.
  • Extend/adjust tests and D-Bus service mocks to cover the new behavior, and enable UNIX FD negotiation on the system D-Bus connection.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/host/test_manager.py | Adds end-to-end tests ensuring Supervisor shuts down (or ignores) when PrepareForShutdown is emitted. |
| tests/dbus/test_login.py | Adds tests for Logind.inhibit() and prepare_for_shutdown() signal wrapper behavior. |
| tests/dbus_service_mocks/logind.py | Extends logind mock with Inhibit (FD return) and PrepareForShutdown signal. |
| supervisor/host/manager.py | Introduces shutdown monitor task that takes a logind inhibitor lock and listens for host shutdown signal. |
| supervisor/dbus/manager.py | Enables UNIX FD negotiation on system D-Bus connection (needed for inhibitor FD passing). |
| supervisor/dbus/logind.py | Adds logind inhibit + signal wrapper API. |
| supervisor/dbus/const.py | Adds constant for logind PrepareForShutdown signal. |
| supervisor/core.py | Ensures HostManager unload() is invoked during shutdown. |

@agners
Member Author

agners commented Mar 16, 2026

This is an overview of Supervisor shutdown/reboot. What is a bit counterintuitive is that Supervisor shutdown is really about all components (Core, plug-ins, apps) and is a separate flow from Supervisor's own "shutdown", which is named stop instead.

This code adds the External (ACPI/hypervisor) path via logind PrepareForShutdown, along with some improvements to the shutdown() function itself (making it re-entrant). This new path can call shutdown() in pretty much any state: we initialize the host monitor in SETUP and uninitialize it during STOPPING, so shutdown() can be called in SETUP, STARTUP, RUNNING, SHUTDOWN, or STOPPING. Extra measures are therefore needed to handle shutdown() calls from all states gracefully.

  ┌─────────────────────────────────────────────────────────────────────────┐
  │                        SHUTDOWN TRIGGERS                                │
  ├──────────────────┬────────────────────┬─────────────────────────────────┤
  │  API / UI        │  Backup Restore    │  External (ACPI/hypervisor)     │
  │                  │                    │                                 │
  │  host.control    │  backups.manager   │  logind PrepareForShutdown      │
  │  .shutdown()     │  .full_restore()   │  signal received by             │
  │  .reboot()       │                    │  host.manager monitor task      │
  │                  │                    │                                 │
  │  state: RUNNING  │  state: FREEZE     │  state: any                     │
  └────────┬─────────┴─────────┬──────────┴────────────────┬────────────────┘
           │                   │                           │
           ▼                   ▼                           ▼
      ┌─────────────────────────────────────────────────────────────────┐
      │                    core.shutdown()                              │
      │                                                                 │
      │  ┌───────────────────────────────────────────────────────────┐  │
      │  │ Guard 1: state in (SETUP, STARTUP)?                       │  │
      │  │   → log debug, wait for startup to complete               │  │
      │  │                                                           │  │
      │  │ Guard 2: state in (STOPPING, CLOSE)?                      │  │
      │  │   → log warning, return (Supervisor tearing down,         │  │
      │  │     too late for graceful container shutdown)             │  │
      │  │                                                           │  │
      │  │ Guard 3: state == SHUTDOWN?                               │  │
      │  │   → await _shutdown_event.wait() (reentrant, wait for     │  │
      │  │     in-progress shutdown to finish)                       │  │
      │  │                                                           │  │
      │  │ _shutdown_event.clear()  ← reset for this cycle           │  │
      │  └───────────────────────────────────────────────────────────┘  │
      │                                                                 │
      │  state → SHUTDOWN (only if was RUNNING)                         │
      │                                                                 │
      │  try:                                                           │
      │    1. Stop Application add-ons                                  │
      │    2. Stop Home Assistant Core                                  │
      │    3. Stop Services add-ons                                     │
      │    4. Stop System add-ons                                       │
      │    5. Stop Initialize add-ons                                   │
      │    6. Stop Plugins (DNS, Audio, CLI, etc.)                      │
      │  finally:                                                       │
      │    _shutdown_event.set()  ← unblocks reentrant waiters          │
      └──────────────────────────┬──────────────────────────────────────┘
                                 │
           ┌─────────────────────┼──────────────────────┐
           │                     │                      │
           ▼                     ▼                      ▼
      API / UI path         Backup path          External path
      ─────────────         ───────────          ─────────────
      logind.power_off()    state → RUNNING      release inhibitor
      or logind.reboot()    (continue restore)   lock (os.close fd)
           │                future shutdown          │
           │                cycles work thanks       │
           ▼                to event.clear()         ▼
      Host begins                              Host proceeds
      shutdown/reboot                          with shutdown
           │                                        │
           └──────────────┬─────────────────────────┘
                          │
                          ▼
                ┌───────────────────┐
                │ Docker stops the  │
                │ Supervisor        │
                │ container         │
                │ (sends SIGTERM)   │
                └─────────┬─────────┘
                          │
                          ▼
      ┌─────────────────────────────────────────────────────────────────┐
      │               __main__.py shutdown_handler                      │
      │                                                                 │
      │  SIGTERM/SIGHUP/SIGINT received                                 │
      │  → cancel startup_task (if still running)                       │
      │  → create_task(core.stop())                                     │
      └─────────────────────────┬───────────────────────────────────────┘
                                │
                                ▼
      ┌─────────────────────────────────────────────────────────────────┐
      │                      core.stop()                                │
      │                                                                 │
      │  Guard: if state in (STOPPING, CLOSE): return                   │
      │                                                                 │
      │  state → STOPPING                                               │
      │                                                                 │
      │  ★ host.unload() ← cancels monitor task FIRST,                 │
      │                     before infrastructure teardown              │
      │                                                                 │
      │  Stage 1 (10s timeout):                                         │
      │    ├─ Stop API server                                           │
      │    ├─ Stop scheduler                                            │
      │    └─ Unload Docker monitor                                     │
      │                                                                 │
      │  Stage 2 (10s timeout):                                         │
      │    ├─ Close websession                                          │
      │    ├─ Unload ingress                                            │
      │    ├─ Unload hardware                                           │
      │    └─ Unload D-Bus                                              │
      │                                                                 │
      │  state → CLOSE                                                  │
      │  loop.stop()  ← event loop ends, process exits                  │
      └─────────────────────────────────────────────────────────────────┘
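The three guards in core.shutdown() from the diagram can be sketched roughly like this. It is a simplified illustration under assumptions, not the Supervisor implementation: the `State` enum, the `_stop_everything()` helper, and the `stop_calls` counter are hypothetical, and a reentrant caller is assumed to simply return once the in-progress shutdown has finished.

```python
import asyncio
from enum import Enum


class State(Enum):
    SETUP = "setup"
    STARTUP = "startup"
    RUNNING = "running"
    FREEZE = "freeze"
    SHUTDOWN = "shutdown"
    STOPPING = "stopping"
    CLOSE = "close"


class Core:
    """Hypothetical reduction of Core to its shutdown guards."""

    def __init__(self) -> None:
        self.state = State.SETUP
        self.stop_calls = 0  # counts how often the stop sequence really ran
        self._startup_complete = asyncio.Event()
        self._shutdown_event = asyncio.Event()

    def set_state(self, state: State) -> None:
        self.state = state
        if state is State.RUNNING:
            self._startup_complete.set()

    async def _stop_everything(self) -> None:
        # Placeholder for stopping add-ons, Core, and plug-ins in order.
        self.stop_calls += 1
        await asyncio.sleep(0.01)

    async def shutdown(self) -> None:
        # Guard 1: still starting up, so wait until startup has completed.
        if self.state in (State.SETUP, State.STARTUP):
            await self._startup_complete.wait()
        # Guard 2: Supervisor is tearing down, too late for a graceful stop.
        if self.state in (State.STOPPING, State.CLOSE):
            return
        # Guard 3: a shutdown is in progress, wait for it instead of racing it.
        if self.state is State.SHUTDOWN:
            await self._shutdown_event.wait()
            return

        self._shutdown_event.clear()  # reset for this cycle
        if self.state is State.RUNNING:
            self.set_state(State.SHUTDOWN)
        try:
            await self._stop_everything()
        finally:
            self._shutdown_event.set()  # unblocks reentrant waiters


async def demo() -> int:
    core = Core()
    core.set_state(State.RUNNING)
    # An API-initiated shutdown and PrepareForShutdown arriving together.
    await asyncio.gather(core.shutdown(), core.shutdown())
    return core.stop_calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With two concurrent callers the stop sequence runs once, and the second caller only returns after the first has finished. That is what allows the external path to hold the inhibitor lock until shutdown has truly completed.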

agners and others added 8 commits March 16, 2026 15:27

Handle external shutdown events (ACPI power button, hypervisor shutdown)
by listening to logind's PrepareForShutdown signal. A delay inhibitor
lock is acquired on startup so Supervisor has time to gracefully stop
all managed services before the host proceeds with shutdown.

Changes:
- Add inhibit() and prepare_for_shutdown() methods to Logind D-Bus interface
- Enable Unix FD negotiation on the D-Bus message bus for inhibitor lock FDs
- Add background monitor task in HostManager that listens for the signal
- Track the monitor task and cancel it cleanly via new unload() method
- Wire host unload into Core.stop() stage 2 for clean shutdown
- Add PrepareForShutdown signal constant to dbus/const.py

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Make Core.shutdown() reentrant using an asyncio.Event so that
concurrent callers await the in-progress shutdown instead of
starting a second one. This ensures the inhibitor lock is held
until shutdown truly completes, even if PrepareForShutdown fires
while an API-initiated shutdown is already in progress.

The state guard in the monitor task is no longer needed since
core.shutdown() now handles reentrance itself.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Move host.unload() before Stage 1 in stop() so the shutdown monitor
task is cancelled before infrastructure teardown begins. This prevents
a race where the monitor could react to PrepareForShutdown after
stop() has already started tearing down.

For the remaining edge case where shutdown() is called while stop()
is running, log a warning and return immediately instead of awaiting
the shutdown event (which would deadlock since stop() never sets it).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

If PrepareForShutdown fires while Supervisor is still starting up,
wait for startup to complete before running the graceful shutdown
sequence. This prevents shutting down partially initialized
containers and services.

The _startup_complete event is set automatically in set_state()
whenever state transitions to RUNNING. If SIGTERM arrives during
startup, core.stop() cancels the monitor task via host.unload(),
cleanly interrupting the wait.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
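
The interaction described in the last commit message (the monitor waits for startup to complete, and host.unload() cancels that wait if SIGTERM arrives first) might look roughly like the sketch below. The `HostManager` here is a hypothetical reduction; its `load()` signature and the `shutdown_ran` flag exist only for illustration.

```python
import asyncio


class HostManager:
    """Hypothetical reduction of the host manager to its monitor-task handling."""

    def __init__(self) -> None:
        self._monitor_task = None
        self.shutdown_ran = False

    def load(self, startup_complete: asyncio.Event) -> None:
        # Keep a reference so the task can be cancelled later.
        self._monitor_task = asyncio.create_task(self._monitor(startup_complete))

    async def _monitor(self, startup_complete: asyncio.Event) -> None:
        # Assume PrepareForShutdown already fired during startup: the graceful
        # shutdown has to wait until startup has completed.
        await startup_complete.wait()
        self.shutdown_ran = True  # stand-in for awaiting core.shutdown()

    async def unload(self) -> None:
        task, self._monitor_task = self._monitor_task, None
        if task is None or task.done():
            return
        task.cancel()
        try:
            await task  # wait until the cancellation has been processed
        except asyncio.CancelledError:
            pass


async def demo() -> bool:
    startup_complete = asyncio.Event()  # never set: SIGTERM interrupts startup
    host = HostManager()
    host.load(startup_complete)
    await asyncio.sleep(0)  # let the monitor start waiting
    await host.unload()     # what core.stop() would do first
    return host.shutdown_ran


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the wait is cancelled before the event is ever set, the graceful shutdown sequence never starts against partially initialized services.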
@agners agners force-pushed the take-shutdown-inhibit-lock-on-system-shutdown branch from c2feeef to def618b Compare March 16, 2026 14:28
@agners
Member Author

agners commented Mar 16, 2026

In my testing this works quite well, but there are some small gaps. When Supervisor updates or restarts, it calls core.stop(), which terminates the process. The Docker container exits and is restarted by the hassos-supervisor systemd service. During this window, no shutdown inhibitor lock is held, so the host is free to proceed with a pending shutdown. Supervisor also can't hold a lock during that time: the lock is a file descriptor, and because the entire container disappears, there is no way to keep the fd open.

I then thought about using systemd-inhibit in the hassos-supervisor service, but this proves to be somewhat awkward. Also, ChatGPT was quick to point out that systemd itself has lifecycle management, with strict ordering guarantees (After=, Before=) which are reversed during shutdown, and timeouts for services (TimeoutStopSec=...).

Which actually made me wonder: are we on the right track here? Maybe it would be better to react to the hassos-supervisor service shutdown and tear down all the other containers there (essentially have stop() also perform the shutdown). We'd probably need to increase the service stop timeout. This also ties into home-assistant/operating-system#4584, which shows another lifecycle problem we have between our containers and the systemd world. Thoughts?

@agners agners marked this pull request as draft March 16, 2026 14:35
2 participants