
Use Unix socket for Supervisor communication #163907

Merged
edenhaus merged 25 commits into dev from use-unix-socket-for-supervisor on Mar 25, 2026

Conversation

@agners
Member

@agners agners commented Feb 23, 2026

Breaking change

Proposed change

This PR moves Supervisor-to-Core HTTP and WebSocket communication from TCP/IP to a Unix socket. Unix sockets avoid the overhead of TCP/IP networking and provide lower latency for local IPC. A dedicated socket also isolates Supervisor traffic from external HTTP requests; for example, it sidesteps potential configuration issues with the http.server_host key.

The socket path is provided by Supervisor through the SUPERVISOR_CORE_API_SOCKET environment variable. This allows the location of the Unix socket to be changed if needed. For now the default location will be /run/supervisor, which is bind-mounted to the same location on the operating system. The Unix socket shares the same aiohttp application and runner as the TCP server, so all existing routes and middleware are available on both transports. Supervisor detection uses the environment variable rather than is_hassio(hass) because the HTTP component loads in bootstrap stage 0 while the hassio integration loads in stage 1. The HTTP component then waits for the hassio integration to load to make sure the Supervisor user is set up before any connections from Supervisor can occur.
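As a rough illustration of the environment-variable handling described above, the following sketch resolves the socket path and rejects relative paths (a later commit in this PR notes that only absolute paths are accepted). The function name, the example file name `core.sock`, and the choice to raise `ValueError` are assumptions made for this sketch, not the code in the PR.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Name of the environment variable, as given in the PR description.
ENV_SUPERVISOR_CORE_API_SOCKET = "SUPERVISOR_CORE_API_SOCKET"


def resolve_supervisor_socket_path(
    environ: Mapping[str, str] = os.environ,
) -> Path | None:
    """Return the configured socket path, or None when not running under Supervisor.

    Hypothetical helper: raising ValueError for a relative path is an
    assumption of this sketch; the PR only states that relative paths are
    rejected with an error.
    """
    raw = environ.get(ENV_SUPERVISOR_CORE_API_SOCKET)
    if not raw:
        # Variable unset or empty: no Unix socket should be started.
        return None
    path = Path(raw)
    if not path.is_absolute():
        raise ValueError(
            f"{ENV_SUPERVISOR_CORE_API_SOCKET} must be an absolute path, got {raw!r}"
        )
    return path


if __name__ == "__main__":
    # "core.sock" is an illustrative file name, not taken from the PR.
    example = {ENV_SUPERVISOR_CORE_API_SOCKET: "/run/supervisor/core.sock"}
    print(resolve_supervisor_socket_path(example))
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.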

Since the Unix socket is only reachable by processes on the same host, requests arriving over it are implicitly trusted and authenticated as the existing "Supervisor" system user. This removes the current token round-trip where Core creates a refresh token, hands it to Supervisor via a Supervisor API call to /core/options, and Supervisor sends it back as a Bearer token on every Core API call. The ban middleware similarly skips its IP-based checks for these connections since there is no remote address. The same is true for the WebSocket connection: it is assumed to be authenticated and skips the authentication overhead.
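The trust model in the paragraph above can be sketched with plain Python objects. `FakeRequest`, `authenticate`, the token table, and the banned-IP set are all hypothetical stand-ins invented for this illustration; the real middleware in Home Assistant is structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass

SUPERVISOR_USER = "Supervisor"


@dataclass
class FakeRequest:
    """Hypothetical stand-in for an incoming HTTP request."""

    via_supervisor_socket: bool
    remote_ip: str | None = None
    bearer_token: str | None = None


def authenticate(
    request: FakeRequest, tokens: dict[str, str], banned_ips: set[str]
) -> str | None:
    """Return the authenticated user name, or None if the request is rejected."""
    if request.via_supervisor_socket:
        # Local-only transport: no remote address to ban, no token round-trip.
        return SUPERVISOR_USER
    if request.remote_ip is None or request.remote_ip in banned_ips:
        # TCP requests still go through the IP-based ban check.
        return None
    # TCP requests must present a known Bearer token.
    return tokens.get(request.bearer_token or "")


if __name__ == "__main__":
    tokens = {"secret": "alice"}
    banned = {"203.0.113.7"}
    print(authenticate(FakeRequest(via_supervisor_socket=True), tokens, banned))
    print(authenticate(FakeRequest(False, "192.0.2.10", "secret"), tokens, banned))
    print(authenticate(FakeRequest(False, "203.0.113.7", "secret"), tokens, banned))
```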

Related Supervisor PR home-assistant/supervisor#6590.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • I understand the code I am submitting and can explain how it works.
  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.
  • Any generated code has been carefully reviewed for correctness and compliance with project standards.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies a diff between library versions and ideally a link to the changelog/release notes is added to the PR description.

To help with the load of incoming pull requests:

agners and others added 3 commits February 23, 2026 23:00
When running under Supervisor (detected via SUPERVISOR env var),
the HTTP server now additionally listens on a Unix socket at
/run/core/http.sock. This enables efficient local IPC between
Supervisor and Core without going through TCP.

The Unix socket shares the same aiohttp app and runner, so all
routes, middleware, and authentication are shared with the TCP
server. The socket is started before the TCP site and cleaned up
on shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the hardcoded socket path constant with the
SUPERVISOR_CORE_API_SOCKET environment variable, allowing
Supervisor to specify where Core should listen. Only absolute
paths are accepted; relative paths are rejected with an error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Requests arriving over the Unix socket are implicitly trusted and
authenticated as the Supervisor system user, removing the need for
token-based authentication on this channel. The ban middleware also
skips IP-based checks for Unix socket connections since there is no
remote IP address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (http) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of http can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign http Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component, problem in config, problem in device, feature-request) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component, problem in config, problem in device, feature-request) on the pull request.

@agners agners marked this pull request as ready for review February 24, 2026 10:34
@agners agners requested a review from a team as a code owner February 24, 2026 10:34
Copilot AI review requested due to automatic review settings February 24, 2026 10:34
Contributor

Copilot AI left a comment

Pull request overview

This PR updates Home Assistant Core’s HTTP server to optionally expose the existing aiohttp application on a Unix domain socket for Supervisor-to-Core communication, driven by the SUPERVISOR_CORE_API_SOCKET environment variable.

Changes:

  • Add a Unix-socket aiohttp site (HomeAssistantUnixSite) and start/stop lifecycle handling in the HTTP component.
  • Treat Unix-socket requests as trusted: bypass IP ban middleware and authenticate them as the Supervisor system user.
  • Add/extend HTTP component tests to cover Unix-socket startup, ban bypass, and Supervisor-user auth behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| homeassistant/components/http/__init__.py | Reads SUPERVISOR_CORE_API_SOCKET and starts/stops an additional Unix-socket site alongside TCP. |
| homeassistant/components/http/web_runner.py | Introduces HomeAssistantUnixSite using loop.create_unix_server on the configured path. |
| homeassistant/components/http/const.py | Adds is_unix_socket_request() helper to detect Unix-socket transport. |
| homeassistant/components/http/ban.py | Skips ban checks for Unix-socket requests. |
| homeassistant/components/http/auth.py | Adds Unix-socket auth path mapping requests to the Supervisor system user; adjusts debug logging. |
| tests/components/http/test_init.py | Adds tests ensuring Unix socket starts only when env var is present and rejects relative paths. |
| tests/components/http/test_ban.py | Adds test that Unix-socket requests bypass ban middleware. |
| tests/components/http/test_auth.py | Adds tests for Supervisor-user authentication and caching behavior for Unix-socket requests. |

Comment on lines +217 to +233

    async def async_authenticate_unix_socket(request: Request) -> bool:
        """Authenticate a request from a Unix socket as the Supervisor user."""
        nonlocal supervisor_user_id

        # Fast path: use cached user ID
        if supervisor_user_id is not None:
            if user := await hass.auth.async_get_user(supervisor_user_id):
                request[KEY_HASS_USER] = user
                return True
            supervisor_user_id = None

        # Slow path: find the Supervisor user by name
        for user in await hass.auth.async_get_users():
            if user.system_generated and user.name == HASSIO_USER_NAME:
                supervisor_user_id = user.id
                request[KEY_HASS_USER] = user
Copilot AI Feb 24, 2026

Unix-socket authentication sets request[KEY_HASS_USER] but never sets request[KEY_HASS_REFRESH_TOKEN_ID]. Some authenticated endpoints assume a refresh token ID is always present (for example, onboarding uses request[KEY_HASS_REFRESH_TOKEN_ID] directly), which can lead to runtime errors or incorrect signing/audit behavior for unix-socket requests. Consider associating the Supervisor system user with a refresh token ID (creating one if missing) and setting KEY_HASS_REFRESH_TOKEN_ID alongside KEY_HASS_USER for this auth path.

Suggested change
-    async def async_authenticate_unix_socket(request: Request) -> bool:
-        """Authenticate a request from a Unix socket as the Supervisor user."""
-        nonlocal supervisor_user_id
-        # Fast path: use cached user ID
-        if supervisor_user_id is not None:
-            if user := await hass.auth.async_get_user(supervisor_user_id):
-                request[KEY_HASS_USER] = user
-                return True
-            supervisor_user_id = None
-        # Slow path: find the Supervisor user by name
-        for user in await hass.auth.async_get_users():
-            if user.system_generated and user.name == HASSIO_USER_NAME:
-                supervisor_user_id = user.id
-                request[KEY_HASS_USER] = user
+    supervisor_refresh_token_id: str | None = None
+    async def async_authenticate_unix_socket(request: Request) -> bool:
+        """Authenticate a request from a Unix socket as the Supervisor user."""
+        nonlocal supervisor_user_id, supervisor_refresh_token_id
+        # Fast path: use cached user and refresh token IDs
+        if supervisor_user_id is not None and supervisor_refresh_token_id is not None:
+            user = await hass.auth.async_get_user(supervisor_user_id)
+            refresh_token = hass.auth.async_get_refresh_token(
+                supervisor_refresh_token_id
+            )
+            if user is not None and refresh_token is not None:
+                request[KEY_HASS_USER] = user
+                request[KEY_HASS_REFRESH_TOKEN_ID] = refresh_token.id
+                return True
+            supervisor_user_id = None
+            supervisor_refresh_token_id = None
+        # Slow path: find the Supervisor user by name and associate a refresh token
+        for user in await hass.auth.async_get_users():
+            if user.system_generated and user.name == HASSIO_USER_NAME:
+                # Reuse an existing refresh token for this user if available
+                refresh_token = next(iter(user.refresh_tokens.values()), None)
+                if refresh_token is None:
+                    # Without a refresh token ID, we cannot safely authenticate
+                    return False
+                supervisor_user_id = user.id
+                supervisor_refresh_token_id = refresh_token.id
+                request[KEY_HASS_USER] = user
+                request[KEY_HASS_REFRESH_TOKEN_ID] = refresh_token.id

Member Author

I'd prefer if we can avoid setting KEY_HASS_REFRESH_TOKEN_ID. From what I can tell, the two possible locations seem safe for the Supervisor case:

  1. onboarding/views.py:286 — IntegrationOnboardingView.post() — requires auth, only called by the frontend during onboarding. Would KeyError on a Unix socket request, but Supervisor has no reason to call this.
  2. auth.py:71 — async_sign_path() — uses KEY_HASS_REFRESH_TOKEN_ID in request (with an in check, not direct access), so it gracefully falls back to the content user if the key isn't present. Safe.

Contributor

What's the reason we want to avoid setting KEY_HASS_REFRESH_TOKEN_ID?
I'd suggest adding a comment here explaining that we intentionally don't set KEY_HASS_REFRESH_TOKEN_ID.

WRT the IntegrationOnboardingView it seems there's some missing error handling there and it should check that there's a KEY_HASS_REFRESH_TOKEN_ID in the request? Could it be added to the schema, or is that wrong?

Member Author

> What's the reason we want to avoid setting KEY_HASS_REFRESH_TOKEN_ID?

Really only to avoid having to maintain a "fake" authentication and refresh token. I've added a comment.

> WRT the IntegrationOnboardingView it seems there's some missing error handling there and it should check that there's a KEY_HASS_REFRESH_TOKEN_ID in the request? Could it be added to the schema, or is that wrong?

The schema is for the payload, but KEY_HASS_REFRESH_TOKEN_ID is data added to the request by our authentication system. So no, we can't add this to the schema. The Supervisor should never use this view though, and the view is only active during onboarding. I've added a check for completeness.
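The fast-path/slow-path lookup discussed in this thread can be tried out in isolation. The sketch below mirrors the shape of the snippet under review but runs against a hypothetical `FakeAuthManager`; the fake classes, the factory function, and the plain-dict request are assumptions of this illustration. In line with the decision above, it deliberately does not set a refresh token ID.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass

HASSIO_USER_NAME = "Supervisor"
KEY_HASS_USER = "hass_user"  # plain dict key used as a stand-in in this sketch


@dataclass
class FakeUser:
    id: str
    name: str
    system_generated: bool


class FakeAuthManager:
    """Hypothetical stand-in for hass.auth that counts full user scans."""

    def __init__(self, users: list[FakeUser]) -> None:
        self._users = {user.id: user for user in users}
        self.full_scans = 0

    async def async_get_user(self, user_id: str) -> FakeUser | None:
        return self._users.get(user_id)

    async def async_get_users(self) -> list[FakeUser]:
        self.full_scans += 1
        return list(self._users.values())


def make_supervisor_authenticator(auth: FakeAuthManager):
    """Build an authenticator that caches the Supervisor user ID."""
    supervisor_user_id: str | None = None

    async def authenticate(request: dict) -> bool:
        nonlocal supervisor_user_id
        # Fast path: look the user up by the cached ID.
        if supervisor_user_id is not None:
            if user := await auth.async_get_user(supervisor_user_id):
                request[KEY_HASS_USER] = user
                return True
            supervisor_user_id = None  # cached user disappeared, rescan
        # Slow path: scan all users for the system-generated Supervisor user.
        for user in await auth.async_get_users():
            if user.system_generated and user.name == HASSIO_USER_NAME:
                supervisor_user_id = user.id
                # Intentionally no refresh token ID is stored on the request.
                request[KEY_HASS_USER] = user
                return True
        return False

    return authenticate


if __name__ == "__main__":
    manager = FakeAuthManager(
        [FakeUser("u1", "Alice", False), FakeUser("u2", HASSIO_USER_NAME, True)]
    )
    authenticate = make_supervisor_authenticator(manager)
    print(asyncio.run(authenticate({})), asyncio.run(authenticate({})))
    print("full scans:", manager.full_scans)
```

Only the first request pays for the full user scan; later requests resolve the user directly from the cached ID.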

agners and others added 2 commits February 24, 2026 13:43
Create the socket with start_serving=False, chmod to 0600, then
start serving. This avoids a race window where the socket could
accept connections before permissions are restricted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace patching asyncio.unix_events._UnixSelectorEventLoop with
patch.object on the running loop instance. This avoids depending
on a private CPython implementation detail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
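The permission-race fix from the first of these two commits (create with start_serving=False, chmod to 0600, then start serving) can be sketched with the standard library alone. The helper name and the temporary socket path are assumptions of this example, and `asyncio.Protocol` is used only as a do-nothing protocol factory; the PR's HomeAssistantUnixSite wires this into aiohttp instead.

```python
import asyncio
import os
import stat
import tempfile


async def start_restricted_unix_server(path: str) -> asyncio.AbstractServer:
    """Bind a Unix socket, restrict it to the owner, and only then accept."""
    loop = asyncio.get_running_loop()
    # Bind the socket file without accepting connections yet.
    server = await loop.create_unix_server(
        asyncio.Protocol, path, start_serving=False
    )
    # Tighten permissions while nothing can connect.
    os.chmod(path, 0o600)
    return server


async def main() -> tuple[int, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "core.sock")  # illustrative file name
        server = await start_restricted_unix_server(path)
        serving_before = server.is_serving()
        await server.start_serving()
        serving_after = server.is_serving()
        mode = stat.S_IMODE(os.stat(path).st_mode)
        server.close()
        await server.wait_closed()
        return mode, serving_before, serving_after


if __name__ == "__main__":
    mode, before, after = asyncio.run(main())
    print(oct(mode), before, after)
```

Because the server is not serving until `start_serving()` is awaited, there is no window in which a client could connect to a socket that still has its default, umask-derived permissions.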
Copilot AI review requested due to automatic review settings February 24, 2026 12:52
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

@agners agners added this to the 2026.3.0b0 milestone Feb 25, 2026
@frenck frenck modified the milestones: 2026.3.0b0, 2026.3.0 Feb 25, 2026
@emontnemery emontnemery removed this from the 2026.3.0 milestone Feb 26, 2026
Contributor

@emontnemery emontnemery left a comment

Some comments.

Also, please improve the PR description to make it clear which IPC will now run over unix sockets, is it all supervisor initiated communication, i.e. both HTTP and WS? Will supervisor fall back to TCP if unix socket doesn't work?

Finally, does this make connecting a test core to remote supervisor even harder than it is today?

@home-assistant home-assistant bot marked this pull request as draft February 26, 2026 07:52
@home-assistant

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

Copilot AI review requested due to automatic review settings March 17, 2026 10:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

await ws.send_json({"id": 1, "type": "ping"})
pong_msg = await ws.receive_json()
assert pong_msg["type"] == "pong"
assert pong_msg["id"] == 1
KEY_HASS_REFRESH_TOKEN_ID: Final = "hass_refresh_token_id"


def is_unix_socket_request(request: Request) -> bool:

If I'm reading this right, it trusts all Unix socket requests - this means any request would be assumed to be supervisor and treated as such. Can we add some validation to this to check it's actually supervisor traffic?

Member Author

> If I'm reading this right, it trusts all Unix socket requests - this means any request would be assumed to be supervisor and treated as such.

That is correct.

> Can we add some validation to this to check it's actually supervisor traffic?

Since these are regular HTTP/WS requests, the requests themselves don't have any specific pattern we could check. Making sure it is Supervisor at the application layer would call for authentication, since that is what authentication is all about: asking the source to prove that it is who it says it is.

But the idea is to remove the token-based authentication. Today, the hassio integration in Core creates an admin user and hands the token to Supervisor:

    refresh_token = None
    if (hassio_user := config_store.data.hassio_user) is not None:
        user = await hass.auth.async_get_user(hassio_user)
        if user and user.refresh_tokens:
            refresh_token = list(user.refresh_tokens.values())[0]
            # Migrate old Hass.io users to be admin.
            if not user.is_admin:
                await hass.auth.async_update_user(user, group_ids=[GROUP_ID_ADMIN])
            # Migrate old name
            if user.name == "Hass.io":
                await hass.auth.async_update_user(user, name=HASSIO_USER_NAME)
    if refresh_token is None:
        user = await hass.auth.async_create_system_user(
            HASSIO_USER_NAME, group_ids=[GROUP_ID_ADMIN]
        )
        refresh_token = await hass.auth.async_create_refresh_token(user)
        config_store.update(hassio_user=user.id)

We then send this token to (hopefully) the Supervisor:

    await supervisor_client.homeassistant.set_options(options)

We hand it out to whoever is answering at the IP address learned from the SUPERVISOR environment variable. In other words, today, whoever controls the environment has the power to learn the authentication token needed to be "Supervisor". This authentication token, just like that of any regular Home Assistant (admin) user, can be used to authenticate remotely (via HTTP) too.

By creating a new environment variable SUPERVISOR_CORE_API_SOCKET, which offers a local-only way to connect Supervisor to Core, we essentially don't increase the attack surface: controlling the environment still means you have the means to be "Supervisor". But since it's a Unix socket, which inherently only works locally, and there is no authentication token that also works remotely via TCP/IP, the overall attack surface is lowered.

Note that in general Unix security, controlling the environment means game over anyway: via LD_PRELOAD one can load external code into a process, and hence control the process.

The only risk here is that some other process connects to the SUPERVISOR_CORE_API_SOCKET Unix socket. But this will be mitigated by pointing it to a location which is only available to the host OS and the Supervisor container (see the current implementation on the Supervisor side: https://github.com/home-assistant/supervisor/blob/bffc3e80a044b4226800277a108a65d05fed3de5/supervisor/docker/homeassistant.py#L187-L188).

In the end this is a bit of a chicken-and-egg problem: we want two systems to talk to each other, so how do we make sure they can trust each other? There will always be some inherent trust required, be it Core trusting that Supervisor is at the other end of the SUPERVISOR env IP, or Core trusting whoever connects to the SUPERVISOR_CORE_API_SOCKET socket to be Supervisor. What this really does is simplify this inherent trust and, with a Unix socket, use a local-only communication channel, which is essentially best practice for this type of communication (the Docker socket does a similar thing: it is an unauthenticated Unix socket, typically at /var/run/docker.sock, which allows controlling any container on the system).

Member

Is there a way to check if the request belongs to the unix socket specified in SUPERVISOR_CORE_API_SOCKET?
I don't like that we have a generic function checking for a Unix socket, and then we imply that only the supervisor is using it. That's true for the moment, but it can change in the future.

Member Author

Hm, yeah, that is a good idea. In fact yes, transport.get_extra_info("sockname") returns the Unix socket path, so we can definitely say this request came from that (Supervisor) Unix socket. Since we only have one Unix socket today, this does not really give us much. But we might add Unix sockets for other use cases later, or a custom component might do things... So agreed, worth doing. I'll add that extra check to the renamed is_(supervisor_)unix_socket_request.
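To see the sockname check in action outside of aiohttp, here is a small standard-library sketch: two Unix socket servers run side by side, and the server-side handler compares `get_extra_info("sockname")` against the path designated for Supervisor. The helper name, the stream-based servers, and the temporary paths are assumptions of this example rather than the code merged in the PR.

```python
import asyncio
import os
import socket
import tempfile


def is_supervisor_unix_socket(writer: asyncio.StreamWriter, supervisor_path: str) -> bool:
    """Return True only for connections accepted on the Supervisor socket path."""
    sock = writer.get_extra_info("socket")
    if sock is None or sock.family != socket.AF_UNIX:
        return False
    # For an accepted AF_UNIX connection, sockname is the listening path.
    return writer.get_extra_info("sockname") == supervisor_path


async def main() -> dict[str, bool]:
    results: dict[str, bool] = {}
    with tempfile.TemporaryDirectory() as tmp:
        supervisor_path = os.path.join(tmp, "supervisor.sock")
        other_path = os.path.join(tmp, "other.sock")

        def make_handler(label: str):
            async def handle(reader, writer):
                # Record the check result, then close to signal EOF to the client.
                results[label] = is_supervisor_unix_socket(writer, supervisor_path)
                writer.close()
                await writer.wait_closed()

            return handle

        servers = [
            await asyncio.start_unix_server(make_handler("supervisor"), path=supervisor_path),
            await asyncio.start_unix_server(make_handler("other"), path=other_path),
        ]
        for path in (supervisor_path, other_path):
            reader, writer = await asyncio.open_unix_connection(path)
            await reader.read()  # returns once the server side has closed
            writer.close()
            await writer.wait_closed()
        for server in servers:
            server.close()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A connection on the second socket is still an AF_UNIX connection, but it fails the path comparison, which is the property the renamed check is meant to provide.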

@frenck frenck added this to the 2026.4.0b0 milestone Mar 24, 2026

@home-assistant home-assistant bot marked this pull request as draft March 24, 2026 12:59
agners and others added 2 commits March 24, 2026 16:41
All Unix socket variables, methods, and functions are renamed to
include "supervisor" since the socket is dedicated to Supervisor
communication and implicitly authenticates as the Supervisor user.

The is_supervisor_unix_socket_request() check now verifies the
request arrived on the specific Supervisor socket path (via
transport sockname) rather than just checking for any AF_UNIX
socket, making it safe for future additional Unix sockets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 24, 2026 15:44
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

agners added 3 commits March 24, 2026 16:50
Check if the refresh token is available in async_sign_path, and if
not (Supervisor Unix socket request), fall back to the content
user's refresh token (a read-only system user created specifically
for signing, a sensible fallback for the Supervisor case).
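The fallback described in this commit message boils down to a membership check on the request. The sketch below uses a plain dict in place of an aiohttp request and a hypothetical helper name; only the key value is taken from the const.py excerpt shown earlier in this conversation.

```python
KEY_HASS_REFRESH_TOKEN_ID = "hass_refresh_token_id"


def refresh_token_id_for_signing(request: dict, content_user_token_id: str) -> str:
    """Pick the refresh token ID used to sign a path (hypothetical helper)."""
    if KEY_HASS_REFRESH_TOKEN_ID in request:
        # Token-authenticated request: sign with the caller's own token.
        return request[KEY_HASS_REFRESH_TOKEN_ID]
    # Supervisor Unix socket request: no token on the request, so fall
    # back to the content user's refresh token.
    return content_user_token_id


if __name__ == "__main__":
    print(refresh_token_id_for_signing({KEY_HASS_REFRESH_TOKEN_ID: "abc"}, "content"))
    print(refresh_token_id_for_signing({}, "content"))
```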
@agners agners marked this pull request as ready for review March 24, 2026 16:25
Copilot AI review requested due to automatic review settings March 24, 2026 16:25
@home-assistant home-assistant bot requested a review from edenhaus March 24, 2026 16:25
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

@agners agners changed the title Use unix socket for Supervisor communication Use Unix socket for Supervisor communication Mar 24, 2026
@agners
Member Author

agners commented Mar 24, 2026

FWIW, I checked @emontnemery's remark a bit more in depth:

> Finally, does this make connecting a test core to remote supervisor even harder than it is today?

Connecting to a remote Supervisor works through the remote_api add-on, which exposes the Supervisor API. This PR is about the Core API. The Supervisor always tries to connect to the Core API through the IP address Supervisor learns from the local Docker instance, so Supervisor does not connect back to the remotely running Core. So no, this change doesn't make it worse than it is today.

Ultimately, we should probably have a "fake Core" running on the remote Supervisor instance, which then proxies connections in both directions.

Fix the auth/delete_all_refresh_tokens endpoint to not fail if the
current token is not available when requested (which is the case
when the call is made through the Supervisor Unix socket).
Member

@edenhaus edenhaus left a comment

Thanks @agners 👍

@edenhaus edenhaus merged commit d5ff890 into dev Mar 25, 2026
46 of 47 checks passed
@edenhaus edenhaus deleted the use-unix-socket-for-supervisor branch March 25, 2026 09:06
@frenck frenck removed this from the 2026.4.0b0 milestone Mar 25, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 26, 2026

7 participants