Use Unix socket for Supervisor communication #163907
When running under Supervisor (detected via SUPERVISOR env var), the HTTP server now additionally listens on a Unix socket at /run/core/http.sock. This enables efficient local IPC between Supervisor and Core without going through TCP. The Unix socket shares the same aiohttp app and runner, so all routes, middleware, and authentication are shared with the TCP server. The socket is started before the TCP site and cleaned up on shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
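As a rough illustration of "one app, two transports", the sketch below serves a single handler on both a Unix socket and a TCP port using only stdlib asyncio. It is an analogy, not the aiohttp site/runner code from this PR; `handle`, `ask`, and `serve_on_both` are hypothetical names.

```python
import asyncio
import os


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """One handler for every listener (stands in for the shared app)."""
    await reader.readline()
    writer.write(b"pong\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> bytes:
    """Send one line and return the one-line reply."""
    writer.write(b"ping\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def serve_on_both(socket_path: str) -> tuple:
    """Start the Unix listener first, then TCP, and query each once."""
    unix_server = await asyncio.start_unix_server(handle, path=socket_path)
    tcp_server = await asyncio.start_server(handle, host="127.0.0.1", port=0)
    port = tcp_server.sockets[0].getsockname()[1]
    try:
        via_unix = await ask(*await asyncio.open_unix_connection(socket_path))
        via_tcp = await ask(*await asyncio.open_connection("127.0.0.1", port))
        return via_unix, via_tcp
    finally:
        # Clean up on shutdown: stop both listeners, then remove the socket file
        # if the server did not already remove it.
        for server in (unix_server, tcp_server):
            server.close()
            await server.wait_closed()
        if os.path.exists(socket_path):
            os.unlink(socket_path)
```

Both transports return the same reply because they dispatch to the same handler, which is the property the commit message relies on for routes and middleware.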
Replace the hardcoded socket path constant with the SUPERVISOR_CORE_API_SOCKET environment variable, allowing Supervisor to specify where Core should listen. Only absolute paths are accepted; relative paths are rejected with an error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
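A minimal sketch of the path validation described above. Only the environment variable name comes from this PR; the helper name is hypothetical, and raising `ValueError` is an assumption, since the commit only states that relative paths are rejected with an error.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

ENV_SUPERVISOR_CORE_API_SOCKET = "SUPERVISOR_CORE_API_SOCKET"


def supervisor_socket_path(environ: Mapping = os.environ) -> Optional[Path]:
    """Return the configured socket path, or None if the variable is unset."""
    raw = environ.get(ENV_SUPERVISOR_CORE_API_SOCKET)
    if not raw:
        return None
    path = Path(raw)
    if not path.is_absolute():
        # Assumption: the real code may log an error instead of raising.
        raise ValueError(
            f"{ENV_SUPERVISOR_CORE_API_SOCKET} must be an absolute path, got {raw!r}"
        )
    return path
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.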
Requests arriving over the Unix socket are implicitly trusted and authenticated as the Supervisor system user, removing the need for token-based authentication on this channel. The ban middleware also skips IP-based checks for Unix socket connections since there is no remote IP address. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR updates Home Assistant Core’s HTTP server to optionally expose the existing aiohttp application on a Unix domain socket for Supervisor-to-Core communication, driven by the SUPERVISOR_CORE_API_SOCKET environment variable.
Changes:
- Add a Unix-socket aiohttp site (HomeAssistantUnixSite) and start/stop lifecycle handling in the HTTP component.
- Treat Unix-socket requests as trusted: bypass IP ban middleware and authenticate them as the Supervisor system user.
- Add/extend HTTP component tests to cover Unix-socket startup, ban bypass, and Supervisor-user auth behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `homeassistant/components/http/__init__.py` | Reads SUPERVISOR_CORE_API_SOCKET and starts/stops an additional Unix-socket site alongside TCP. |
| `homeassistant/components/http/web_runner.py` | Introduces HomeAssistantUnixSite using loop.create_unix_server on the configured path. |
| `homeassistant/components/http/const.py` | Adds is_unix_socket_request() helper to detect Unix-socket transport. |
| `homeassistant/components/http/ban.py` | Skips ban checks for Unix-socket requests. |
| `homeassistant/components/http/auth.py` | Adds Unix-socket auth path mapping requests to the Supervisor system user; adjusts debug logging. |
| `tests/components/http/test_init.py` | Adds tests ensuring Unix socket starts only when env var is present and rejects relative paths. |
| `tests/components/http/test_ban.py` | Adds test that Unix-socket requests bypass ban middleware. |
| `tests/components/http/test_auth.py` | Adds tests for Supervisor-user authentication and caching behavior for Unix-socket requests. |
    async def async_authenticate_unix_socket(request: Request) -> bool:
        """Authenticate a request from a Unix socket as the Supervisor user."""
        nonlocal supervisor_user_id

        # Fast path: use cached user ID
        if supervisor_user_id is not None:
            if user := await hass.auth.async_get_user(supervisor_user_id):
                request[KEY_HASS_USER] = user
                return True
            supervisor_user_id = None

        # Slow path: find the Supervisor user by name
        for user in await hass.auth.async_get_users():
            if user.system_generated and user.name == HASSIO_USER_NAME:
                supervisor_user_id = user.id
                request[KEY_HASS_USER] = user
Unix-socket authentication sets request[KEY_HASS_USER] but never sets request[KEY_HASS_REFRESH_TOKEN_ID]. Some authenticated endpoints assume a refresh token ID is always present (for example, onboarding uses request[KEY_HASS_REFRESH_TOKEN_ID] directly), which can lead to runtime errors or incorrect signing/audit behavior for unix-socket requests. Consider associating the Supervisor system user with a refresh token ID (creating one if missing) and setting KEY_HASS_REFRESH_TOKEN_ID alongside KEY_HASS_USER for this auth path.
    supervisor_refresh_token_id: str | None = None

    async def async_authenticate_unix_socket(request: Request) -> bool:
        """Authenticate a request from a Unix socket as the Supervisor user."""
        nonlocal supervisor_user_id, supervisor_refresh_token_id

        # Fast path: use cached user and refresh token IDs
        if supervisor_user_id is not None and supervisor_refresh_token_id is not None:
            user = await hass.auth.async_get_user(supervisor_user_id)
            refresh_token = hass.auth.async_get_refresh_token(
                supervisor_refresh_token_id
            )
            if user is not None and refresh_token is not None:
                request[KEY_HASS_USER] = user
                request[KEY_HASS_REFRESH_TOKEN_ID] = refresh_token.id
                return True
            supervisor_user_id = None
            supervisor_refresh_token_id = None

        # Slow path: find the Supervisor user by name and associate a refresh token
        for user in await hass.auth.async_get_users():
            if user.system_generated and user.name == HASSIO_USER_NAME:
                # Reuse an existing refresh token for this user if available
                refresh_token = next(iter(user.refresh_tokens.values()), None)
                if refresh_token is None:
                    # Without a refresh token ID, we cannot safely authenticate
                    return False
                supervisor_user_id = user.id
                supervisor_refresh_token_id = refresh_token.id
                request[KEY_HASS_USER] = user
                request[KEY_HASS_REFRESH_TOKEN_ID] = refresh_token.id
I'd prefer if we can avoid setting KEY_HASS_REFRESH_TOKEN_ID. From what I can tell, the two possible locations seem safe for the Supervisor case:
- onboarding/views.py:286 — IntegrationOnboardingView.post() — requires auth, only called by the frontend during onboarding. Would KeyError on a Unix socket request, but Supervisor has no reason to call this.
- auth.py:71 — async_sign_path() — uses KEY_HASS_REFRESH_TOKEN_ID in request (with an in check, not direct access), so it gracefully falls back to the content user if the key isn't present. Safe.
What's the reason we want to avoid setting KEY_HASS_REFRESH_TOKEN_ID?
I'd suggest to add a comment here explaining we intentionally don't set KEY_HASS_REFRESH_TOKEN_ID.
WRT to the IntegrationOnboardingView it seems there's some missing error handling there and it should check that there's a KEY_HASS_REFRESH_TOKEN_ID in the request? Could it be added to the schema, or is that wrong?
> What's the reason we want to avoid setting KEY_HASS_REFRESH_TOKEN_ID?

Really only to avoid having to maintain a "fake" authentication and refresh token. I've added a comment.

> WRT to the IntegrationOnboardingView it seems there's some missing error handling there and it should check that there's a KEY_HASS_REFRESH_TOKEN_ID in the request? Could it be added to the schema, or is that wrong?

The schema is for payload, but KEY_HASS_REFRESH_TOKEN_ID is data added to the request by our authentication system. So no, we can't add this to schema. The Supervisor should never use this view though, and the view is only active during onboarding. I've added a check for completeness.
Create the socket with start_serving=False, chmod to 0600, then start serving. This avoids a race window where the socket could accept connections before permissions are restricted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
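The ordering described in this commit can be sketched with stdlib asyncio as follows. This is an illustration of the bind, chmod, then serve sequence, not the HomeAssistantUnixSite code itself; the function names are hypothetical.

```python
import asyncio
import os
import stat


async def _ignore_client(
    reader: asyncio.StreamReader, writer: asyncio.StreamWriter
) -> None:
    """Placeholder connection handler for the sketch."""
    writer.close()


async def start_restricted_unix_server(path: str) -> asyncio.AbstractServer:
    """Bind a Unix socket, restrict its permissions, then start serving."""
    # Bind the socket file, but do not accept connections yet.
    server = await asyncio.start_unix_server(
        _ignore_client, path=path, start_serving=False
    )
    # Restrict the socket file to its owner before any client can be accepted.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
    await server.start_serving()
    return server
```

Because the chmod happens before `start_serving()`, there is no window in which a connection can be accepted on a socket with looser permissions.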
Replace patching asyncio.unix_events._UnixSelectorEventLoop with patch.object on the running loop instance. This avoids depending on a private CPython implementation detail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
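For illustration, patching the running loop instance might look like the sketch below. `start_unix_site` is a hypothetical stand-in for the code under test; the actual tests in this PR may be structured differently.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def start_unix_site(path: str) -> None:
    """Hypothetical stand-in for the code under test."""
    loop = asyncio.get_running_loop()
    await loop.create_unix_server(asyncio.Protocol, path)


async def run_with_patched_loop(path: str) -> str:
    """Patch the running loop instance and return the path it was asked to bind."""
    loop = asyncio.get_running_loop()
    # Patch the attribute on this loop object only; no private CPython class is named.
    with patch.object(
        loop, "create_unix_server", new_callable=AsyncMock
    ) as mock_create:
        await start_unix_site(path)
    mock_create.assert_awaited_once()
    return mock_create.await_args.args[1]
```

Since the patch targets whatever loop is running, the test does not need to know which event loop class the platform provides.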
emontnemery left a comment
Some comments.
Also, please improve the PR description to make it clear which IPC will now run over unix sockets, is it all supervisor initiated communication, i.e. both HTTP and WS? Will supervisor fall back to TCP if unix socket doesn't work?
Finally, does this make connecting a test core to remote supervisor even harder than it is today?
    await ws.send_json({"id": 1, "type": "ping"})
    pong_msg = await ws.receive_json()
    assert pong_msg["type"] == "pong"
    assert pong_msg["id"] == 1

    KEY_HASS_REFRESH_TOKEN_ID: Final = "hass_refresh_token_id"


    def is_unix_socket_request(request: Request) -> bool:
If I'm reading this right, it trusts all Unix socket requests - this means any request would be assumed to be supervisor and treated as such. Can we add some validation to this to check it's actually supervisor traffic?
> If I'm reading this right, it trusts all Unix socket requests - this means any request would be assumed to be supervisor and treated as such.

That is correct.

> Can we add some validation to this to check it's actually supervisor traffic?

Since these are regular HTTP/WS requests, the requests themselves don't have any specific pattern we could check. Making sure it is Supervisor at the application layer would call for authentication, since that is what authentication is all about: asking the source to prove that it is who it says it is.
But the idea is to remove the token based authentication. Today, the hassio integration in Core creates an admin user and hands the token to Supervisor:
core/homeassistant/components/hassio/__init__.py
Lines 413 to 432 in 81a8dee
We then send this token to (hopefully) the Supervisor:
We hand it out to whoever answers at the IP address learned from the SUPERVISOR environment variable. In other words, today, whoever controls the environment has the power to learn the authentication token and be "Supervisor". This authentication token, like that of any regular Home Assistant (admin) user, can also be used to authenticate remotely (via HTTP).
By creating a new environment variable SUPERVISOR_CORE_API_SOCKET, which offers a local-only way to connect Supervisor to Core, we essentially don't increase the attack surface: controlling the environment still means you have the means to be "Supervisor". But since it's a Unix socket, which only works locally per se, and there is no longer an authentication token that also works remotely via TCP/IP, the overall attack surface is lowered.
Note that in general Unix security, controlling the environment means game over anyway: via LD_PRELOAD one can load external code into a process, and hence control the process.
The only risk here is that some other process connects to the SUPERVISOR_CORE_API_SOCKET Unix socket. But this will be mitigated by pointing it to a location which is only available to the host OS and the Supervisor container (see the current implementation on the Supervisor side https://github.com/home-assistant/supervisor/blob/bffc3e80a044b4226800277a108a65d05fed3de5/supervisor/docker/homeassistant.py#L187-L188).
In the end this is a bit of a chicken-and-egg problem: we want two systems to talk to each other, so how do we make sure they can trust each other? There will always be some inherent trust required, be it Core trusting that Supervisor is at the other end of the SUPERVISOR env IP, or Core trusting whoever connects to the SUPERVISOR_CORE_API_SOCKET socket to be Supervisor. What this really does is simplify this inherent trust and, with a Unix socket, use a local-only communication channel, which is essentially best practice for this type of communication (the Docker socket does a similar thing: it is an unauthenticated Unix socket, typically at /var/run/docker.sock, which allows controlling any container on the system).
Is there a way to check if the request belongs to the unix socket specified in SUPERVISOR_CORE_API_SOCKET?
I don't like that we have a generic function checking for a Unix socket, and then we imply that only the supervisor is using it. That's true for the moment, but it can change in the future.
Hm, yeah, that is a good idea. In fact, yes: transport.get_extra_info("sockname") returns the Unix socket path, so we can definitely say this request came from that (Supervisor) Unix socket. Since we only have one Unix socket today, this does not really give us much. But we might add Unix sockets for other use cases later, or custom components might do things... So agreed, worth doing. I'll add that extra check to the renamed is_(supervisor_)unix_socket_request.
All Unix socket variables, methods, and functions are renamed to include "supervisor" since the socket is dedicated to Supervisor communication and implicitly authenticates as the Supervisor user. The is_supervisor_unix_socket_request() check now verifies the request arrived on the specific Supervisor socket path (via transport sockname) rather than just checking for any AF_UNIX socket, making it safe for future additional Unix sockets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
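A minimal sketch of the sockname comparison, assuming the configured socket path is passed in explicitly. The signature is hypothetical: the real helper takes an aiohttp Request, whereas this version takes anything exposing `get_extra_info` so it stays self-contained.

```python
from typing import Any, Optional, Protocol


class TransportLike(Protocol):
    """Anything exposing asyncio's transport get_extra_info()."""

    def get_extra_info(self, name: str, default: Any = None) -> Any: ...


def is_supervisor_unix_socket_request(
    transport: Optional[TransportLike], supervisor_socket_path: Optional[str]
) -> bool:
    """Return True only if the connection arrived on the Supervisor socket."""
    if transport is None or supervisor_socket_path is None:
        return False
    # A TCP listener reports an (address, port) tuple here, and another Unix
    # socket reports a different path, so neither compares equal.
    return transport.get_extra_info("sockname") == supervisor_socket_path
```

Comparing against the specific path, rather than testing for AF_UNIX, is what keeps the check valid if further Unix sockets are added later.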
Check if the refresh token is available in async_sign_path, and if not (Supervisor Unix socket request), fall back to the content user's refresh token (a read-only system user created specifically for signing, which is a sensible fallback for the Supervisor case).
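The fallback amounts to a membership check instead of direct indexing. A simplified sketch with a plain mapping standing in for the request follows; the function name and the `content_user_token_id` parameter are hypothetical, and only the key name is taken from the PR.

```python
from collections.abc import Mapping
from typing import Any

KEY_HASS_REFRESH_TOKEN_ID = "hass_refresh_token_id"


def refresh_token_id_for_signing(
    request: Mapping[str, Any], content_user_token_id: str
) -> str:
    """Pick the refresh token ID used to sign a path."""
    # Token-authenticated requests carry their own refresh token ID.
    if KEY_HASS_REFRESH_TOKEN_ID in request:
        return request[KEY_HASS_REFRESH_TOKEN_ID]
    # Supervisor Unix socket requests do not, so use the content user's token.
    return content_user_token_id
```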
FWIW, I checked @emontnemery's remark a bit more in depth:
Connecting to a remote Supervisor works through the remote_api add-on, which exposes the Supervisor API. This PR is about the Core API. The Supervisor always tries to connect to the Core API through the IP address Supervisor learns from the local Docker instance. So Supervisor does not connect back to the remotely running Core. So no, this change doesn't make it worse than it is today. Ultimately, we should probably have a "fake Core" running on the remote Supervisor instance, which then proxies connections in both directions.
Fix the auth/delete_all_refresh_tokens endpoint to not fail if the current token is not available when requested (which is the case when the call is made through the Supervisor Unix socket).
Breaking change
Proposed change
This PR moves Supervisor-to-Core HTTP and WebSocket communication from TCP/IP to a Unix socket. Unix sockets avoid the overhead of TCP/IP networking and provide lower latency for local IPC. A dedicated socket also isolates Supervisor traffic from external HTTP requests, e.g. it sidesteps potential configuration issues with the http.server_host key.

The socket path is provided by Supervisor through the SUPERVISOR_CORE_API_SOCKET environment variable. This allows changing the location of the Unix socket if needed. For now the default location will be /run/supervisor, which is bind mounted to the same location on the operating system. The Unix socket shares the same aiohttp application and runner as the TCP server, so all existing routes and middleware are available on both transports. Supervisor detection uses the environment variable rather than is_hassio(hass) because the HTTP component loads in bootstrap stage 0 while the hassio integration loads in stage 1. The HTTP component then waits for the hassio integration to load to make sure the Supervisor user is set up before any connections from Supervisor can occur.

Since the Unix socket is only reachable by processes on the same host, requests arriving over it are implicitly trusted and authenticated as the existing "Supervisor" system user. This removes the current token round-trip where Core creates a refresh token, hands it to Supervisor via a Supervisor API call to /core/options, and Supervisor sends it back as a Bearer token on every Core API call. The ban middleware similarly skips its IP-based checks for these connections since there is no remote address. The same is true for the WebSocket connection: it is assumed authenticated and skips the authentication overhead.

Related Supervisor PR home-assistant/supervisor#6590.