commands: stop dial-stdio when the builder connection closes #3790

Merged
tonistiigi merged 1 commit into docker:master from crazy-max:dial-stop
May 6, 2026

@crazy-max (Member) commented Apr 9, 2026

fixes #3668

Fixes a dial-stdio hang that could persist after the builder connection closed. The command now exits when the read side finishes instead of waiting for a new write on stdin to unblock the process.

The old code used two io.Copy calls and waited for both to finish, which meant a blocked read from stdin could keep the command alive even after the builder connection had already died. This showed up when the Docker daemon restarted and dial-stdio stayed hung until the user pressed Enter.
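The hang described above can be reproduced in miniature. The following is a minimal sketch, not the code from this PR: `proxyWaitBoth` and `stillRunningAfterRemoteClose` are hypothetical names, `net.Pipe` stands in for the builder connection, and an `io.Pipe` that is never written to stands in for an idle terminal on stdin.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// proxyWaitBoth is a hypothetical stand-in for the old shape: copy in both
// directions and return only after BOTH copies have finished.
func proxyWaitBoth(conn net.Conn, stdin io.Reader, stdout io.Writer) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		_, _ = io.Copy(conn, stdin) // stays blocked in stdin.Read while stdin is idle
	}()
	go func() {
		defer wg.Done()
		_, _ = io.Copy(stdout, conn) // ends as soon as the remote side closes
	}()
	wg.Wait()
}

// stillRunningAfterRemoteClose reports whether proxyWaitBoth is still running
// shortly after the remote end closed while stdin stays idle.
func stillRunningAfterRemoteClose() bool {
	local, remote := net.Pipe()
	stdinR, stdinW := io.Pipe() // never written to: simulates an idle terminal
	defer local.Close()
	defer stdinW.Close() // releases the blocked stdin copy when we are done

	done := make(chan struct{})
	go func() {
		proxyWaitBoth(local, stdinR, io.Discard)
		close(done)
	}()

	_ = remote.Close() // simulate the builder connection going away
	select {
	case <-done:
		return false
	case <-time.After(200 * time.Millisecond):
		return true
	}
}

func main() {
	fmt.Println("still running after remote close:", stillRunningAfterRemoteClose())
}
```

In this sketch the remote-to-stdout copy finishes immediately, but `wg.Wait()` keeps blocking on the stdin copy, which is the same shape as the hang that only cleared once the user pressed Enter.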

I tried to add an integration test for this, but that's tricky. We would need some wiring for an explicit restart/kill hook into the worker to test this regression, so it needs changes in the BuildKit test framework, imo.

cc @invidian @cpuguy83

@crazy-max requested a review from tonistiigi April 15, 2026 14:58
@crazy-max marked this pull request as ready for review April 15, 2026 14:58
@crazy-max added this to the v0.34.0 milestone Apr 15, 2026
@tonistiigi (Member)

This kind of leaves the reader in an unknown state. Normally you would read to io.Discard here, but I'm not sure whether it would hang.

@crazy-max (Member, Author)

> This kind of leaves the reader in an unknown state. Normally you would read to io.Discard here, but I'm not sure whether it would hang.

I simplified this by removing the cancelable reader wrapper entirely. The proxy now starts the two io.Copy directions directly and waits on their completion channels.

When stdin finishes first, it closes the connection write side and keeps waiting for remote output. When the remote side finishes first, it closes the read side and returns immediately, so dial-stdio no longer hangs on a blocked stdin read.

That avoids leaving the original reader in an odd state, since we no longer put a pipe reader in front of it or try to cancel it from the outside. I also removed the extra errgroup, so the implementation is much closer to the original copy-loop shape.

I tested it on a real remote builder by closing the connection.
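The two completion paths described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: `proxy`, `benign`, and the two helper functions are hypothetical names, the half-close calls are made only if the connection happens to offer `CloseWrite`/`CloseRead` (as `*net.TCPConn` does; `net.Pipe` does not), and applying the closed-connection filter to both directions is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net"
	"strings"
	"time"
)

// benign reports whether err only says that a pipe or connection was already
// closed. Using it for both directions is an assumption of this sketch.
func benign(err error) bool {
	return err == nil || errors.Is(err, net.ErrClosed) || errors.Is(err, io.ErrClosedPipe)
}

// proxy starts both copy directions and waits on their completion channels.
func proxy(conn net.Conn, stdin io.Reader, stdout io.Writer) error {
	stdinDone := make(chan error, 1)
	remoteDone := make(chan error, 1)
	go func() { _, err := io.Copy(conn, stdin); stdinDone <- err }()
	go func() { _, err := io.Copy(stdout, conn); remoteDone <- err }()

	for {
		select {
		case err := <-stdinDone:
			if !benign(err) {
				return err
			}
			// stdin finished first: half-close the write side if supported,
			// then keep waiting for remote output.
			if cw, ok := conn.(interface{ CloseWrite() error }); ok {
				_ = cw.CloseWrite()
			}
		case err := <-remoteDone:
			// Remote finished first: return now, even if stdin is still blocked.
			if cr, ok := conn.(interface{ CloseRead() error }); ok {
				_ = cr.CloseRead()
			}
			if !benign(err) {
				return err
			}
			return nil
		}
	}
}

// returnsWhenRemoteCloses checks the remote-first path with an idle stdin.
func returnsWhenRemoteCloses() bool {
	local, remote := net.Pipe()
	stdinR, stdinW := io.Pipe() // never written to: simulates an idle terminal
	defer local.Close()
	defer stdinW.Close()

	done := make(chan error, 1)
	go func() { done <- proxy(local, stdinR, io.Discard) }()

	_ = remote.Close()
	select {
	case err := <-done:
		return err == nil
	case <-time.After(2 * time.Second):
		return false
	}
}

// outputAfterStdinEOF checks the stdin-first path: remote output that arrives
// after stdin hit EOF must still reach stdout.
func outputAfterStdinEOF() string {
	local, remote := net.Pipe()
	defer local.Close()

	var out bytes.Buffer
	done := make(chan error, 1)
	go func() { done <- proxy(local, strings.NewReader(""), &out) }()

	_, _ = remote.Write([]byte("hi"))
	_ = remote.Close()
	<-done
	return out.String()
}

func main() {
	fmt.Println("returns when remote closes:", returnsWhenRemoteCloses())
	fmt.Printf("output after stdin EOF: %q\n", outputAfterStdinEOF())
}
```

Note that in the remote-first path the stdin copy goroutine is left blocked; in a short-lived command that exits right after the proxy returns, that is harmless, but a long-running caller would need to account for it.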

@cpuguy83 (Contributor) left a comment

Sorry, I don't recall this one coming in.
LGTM.

Comment thread commands/dial_stdio.go
    if err != nil && !errors.Is(err, net.ErrClosed) && !errors.Is(err, io.ErrClosedPipe) {
        return err
    }
    stdinDone = nil
Contributor

Doesn't seem necessary.
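The reasoning behind this remark is not spelled out in the thread. One possible reading is the general Go rule that setting a channel to nil only matters when its select case could fire again. The sketch below illustrates that rule in isolation; `receives` is a hypothetical helper, and nothing here is taken from `commands/dial_stdio.go`.

```go
package main

import "fmt"

// receives counts how often the select case for ch fires across three
// non-blocking passes. With setNil, ch is set to nil after the first hit,
// which disables the case because a receive from a nil channel never proceeds.
func receives(ch chan error, setNil bool) int {
	n := 0
	for i := 0; i < 3; i++ {
		select {
		case <-ch:
			n++
			if setNil {
				ch = nil
			}
		default:
		}
	}
	return n
}

// oneShot returns a buffered channel holding exactly one value, like a
// completion channel that a goroutine sends to once and never closes.
func oneShot() chan error {
	ch := make(chan error, 1)
	ch <- nil
	return ch
}

// closedChan returns a closed channel, whose receive case is always ready.
func closedChan() chan error {
	ch := make(chan error)
	close(ch)
	return ch
}

func main() {
	fmt.Println("one-shot, kept:  ", receives(oneShot(), false))
	fmt.Println("one-shot, nil'd: ", receives(oneShot(), true))
	fmt.Println("closed, kept:    ", receives(closedChan(), false))
	fmt.Println("closed, nil'd:   ", receives(closedChan(), true))
}
```

For a channel that delivers a single value and is never closed, both variants see exactly one receive, so the nil assignment changes nothing. It only makes a difference for a closed channel, where the case would otherwise fire on every pass.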

@tonistiigi merged commit 0439d7f into docker:master May 6, 2026
160 checks passed
@crazy-max deleted the dial-stop branch May 6, 2026 20:19
Development

Successfully merging this pull request may close these issues.

dial-stdio hangs until write attempt before returning an error when Docker daemon is restarted
