Websocket lifecycle fixes #2411

Open
stephenberry wants to merge 8 commits into main from websocket

@stephenberry
Owner

@stephenberry stephenberry commented Mar 30, 2026

Fix websocket_client crash on Windows IOCP (issue #2409)

Problem

websocket_client crashes with STATUS_ACCESS_VIOLATION (0xC0000005) on Windows IOCP during run(). The crash occurs at op->complete() in ASIO's IOCP event loop, intermittently, during the SSL handshake / early connection phase.

Root cause

This is ASIO issue #312 — an unfixed upstream bug. When PostQueuedCompletionStatus fails due to IOCP resource exhaustion, ASIO falls back to an internal completed_ops_ queue. But the completion key (overlapped_contains_result) is only passed as an argument to PostQueuedCompletionStatus and never stored on the operation object. When do_one() later picks up the operation from the fallback queue, it dispatches with the wrong completion key, causing op->complete() to be called with incorrect error codes and byte counts — leading to undefined behavior and crashes.

Fix

ASIO IOCP patch (cmake/asio-iocp-fix.cmake): Applied automatically to bundled ASIO (FetchContent path). Based on MongoDB's proven patch:

  • Adds a completionKey_ member to win_iocp_operation to store the completion key on the operation object
  • Stores the key before calling PostQueuedCompletionStatus, so it survives the fallback path
  • Passes op->completionKey() instead of a local variable when re-posting

The patch is 16 lines across 2 ASIO internal headers. It is idempotent (detects if already applied) and only runs for the bundled ASIO path. Users with system-installed ASIO or Boost.Asio should apply the patch to their ASIO installation if they experience the crash.

cancel_all() safety (websocket_client.hpp): Cancel and close raw sockets before resetting their shared_ptrs, then drain pending IOCP completion handlers with ctx->poll(). During the handshake phase, the websocket_connection doesn't exist yet, so force_close() is a no-op — the raw sockets must be cancelled directly.

Handler ordering (websocket_client.hpp): Set on_message/on_close/on_error handlers on the websocket_connection before calling set_initial_data(), so WebSocket frames that arrive in the same TCP segment as the HTTP upgrade response are not silently dropped.

New tests (websocket_client_lifetime_test.cpp) — 8 tests

  • Destroy-during-resolve, destroy-during-connect, destroy-during-handshake, destroy-during-active-connection
  • Rapid connect/destroy stress test (50 cycles at varying timings)
  • Destroy after connection failure, shared context destruction order
  • immediate_server_message_received: server sends a message immediately on open; verifies the client receives it (fails in roughly 25% of runs without the handler ordering fix)

@GTruf

GTruf commented Mar 31, 2026

@stephenberry, hello, I used your websocket branch and verified that the code in the library is indeed the updated version, but unfortunately, the error persists. Based on the logs from asio, it’s still crashing at op->complete.

[image]

Try using my code example:

#include <iostream>
#include <string_view>
#include <system_error>

#include "glaze/net/websocket_client.hpp"

int main() {
    glz::websocket_client client;

    client.set_ssl_verify_mode(asio::ssl::verify_none);

    client.on_open([]() {
        std::cout << "Connected to WebSocket server!" << std::endl;
    });

    client.on_message([](std::string_view message, glz::ws_opcode opcode) {
        std::cout << message << std::endl;
    });

    client.on_close([](glz::ws_close_code code, std::string_view reason) {
        std::cout << "Connection closed with code: " << static_cast<int>(code);
        if (!reason.empty()) {
            std::cout << ", reason: " << reason;
        }
        std::cout << std::endl;
    });

    client.on_error([](std::error_code ec) {
        std::cerr << "Error: " << ec.message() << std::endl;
    });

    client.connect("wss://wseea.okx.com:8443/ws/v5/public");
    client.run();

    return 0;
}

The same problem:
[image]

[image]

@stephenberry
Owner Author

stephenberry commented Mar 31, 2026

@GTruf, the issue is with the handshake/upgrade being asynchronous. I've now made it synchronous, but the current fix spawns a thread for each client that connects. I want to make a long-term asynchronous solution, but this might be a valid short-term fix. Are you expecting to deal with tens of thousands of websocket clients?

@GTruf

GTruf commented Mar 31, 2026

@stephenberry, not tens of thousands, but in the long run it could be several dozen or a few hundred. A thread per connection doesn't seem like the best solution.

@stephenberry
Owner Author

Yeah, I don't like the fixes I've tried so far. The original code looks correct, so this is probably a Windows IOCP bug. The synchronous handshake sidesteps the problem by never registering the socket with IOCP during the handshake phase. The only major cost is serialization when multiple clients share one io_context, which is an uncommon pattern for a WebSocket client.

@GTruf

GTruf commented Mar 31, 2026

@stephenberry, so, for now, you’ll keep the synchronous version specifically for Windows and the asynchronous version for Unix? If so, could you please test my example on Unix (Ubuntu 24/25, for example) using your asynchronous version?

And what are your long-term plans for Windows? Will you open an issue for asio, or will you try to find workarounds?

@GTruf

GTruf commented Mar 31, 2026

@stephenberry, in this issue, someone added a commit whose title suggests it fixes the problem. Have you tried applying that patch to the asio code? It might just work.

[image]

Comment thread tests/CMakeLists.txt Outdated

# Apply IOCP fix for issue #312 (Windows crash in op->complete)
include(${CMAKE_CURRENT_SOURCE_DIR}/../cmake/asio-iocp-fix.cmake)
apply_asio_iocp_fix("${asio_SOURCE_DIR}/asio/include")

@GTruf GTruf Mar 31, 2026

If all of this works, it will be necessary to add a check that the OS is Windows.

Comment thread tests/CMakeLists.txt
# Apply IOCP fix for issue #312 (Windows crash in op->complete)
include(${CMAKE_CURRENT_SOURCE_DIR}/../cmake/asio-iocp-fix.cmake)
apply_asio_iocp_fix("${asio_SOURCE_DIR}/asio/include")
if(WIN32)

@GTruf GTruf Mar 31, 2026

@stephenberry, was the original IOCP problem finally solved?

@stephenberry
Owner Author

@GTruf, I'm going to run tests on Windows and get back to you soon. Still trying to understand the whole of the problem.

@GTruf

GTruf commented Apr 7, 2026

@stephenberry, is there any progress on this PR? If you need any help writing tests, I don't mind helping :)

@stephenberry
Owner Author

@GTruf, I haven't had time to test on Windows. So, if you could test this branch and provide feedback, that would be helpful.

@GTruf

GTruf commented Apr 7, 2026

@stephenberry, I tried running my example on the websocket and v7.3.0 branches with the asio patch applied, using both asio 1-36-0 and 1-38-0. (By the way, your CMake script doesn't apply the full patch from the mongo commit, so I applied that commit manually and verified it.) Unfortunately, none of the combinations solved the problem; every one of them still crashes intermittently at startup.

@stephenberry
Owner Author

@GTruf, thanks for testing. I need to dive into this, but I'm not sure how soon I can really delve in. I'm looking into more aggressive fixes as well.

@GTruf

GTruf commented Apr 7, 2026

@stephenberry, I'm going to run all of this with ASAN and MSVC right now. If this really is an ASIO issue on Windows, it will likely pinpoint the exact location where the problem occurs.

@GTruf

GTruf commented Apr 7, 2026

@stephenberry, in short, here's the result:

  • everything works fine on MSVC 19.50.35728.0 (in Debug and Release builds). I ran the websocket and v7.3.0 branches over 20 times each with unpatched asio 1-38-0 and OpenSSL 3.6.2, and there wasn't a single crash;
  • under MinGW with GCC 15.2.0, asio 1-38-0 (both patched and unpatched), and OpenSSL 3.6.2, it crashes on both the websocket and v7.3.0 branches at roughly the same rate.

@GTruf

GTruf commented Apr 7, 2026

What I don't understand is that the code in asio is exactly the same. There are no #ifdef directives; it's literally the same code, just compiled with MSVC instead of GCC. I've tried both with optimizations enabled and without them.

@GTruf

GTruf commented Apr 8, 2026

@stephenberry, with MinGW it crashes whenever the GLZ_ENABLE_SSL macro is defined. Even connecting to ws:// crashes in this case. When I remove the definition of this macro, connecting to ws:// works fine.

P.S. By the way, there is outdated code in your documentation. Please update it.

@GTruf

GTruf commented Apr 8, 2026

@stephenberry, update: the remote server detects the connection being established and calls the on_open handler:

WebSocket connection opened from 127.0.0.1
WebSocket connection closed:
WebSocket error: An existing connection was forcibly closed by the remote host.
WebSocket connection opened from 127.0.0.1
WebSocket error: An existing connection was forcibly closed by the remote host.
WebSocket connection closed:

Development

Successfully merging this pull request may close these issues.

CRITICAL BLOCKER BUG: crash on websocket_client::run()
