
websocket optimization and benchmarking#2399

Open
stephenberry wants to merge 6 commits into main from websocket-optimization


@stephenberry
Owner

@stephenberry stephenberry commented Mar 24, 2026

WebSocket Optimization and Benchmarking

Performance Optimizations

Shared receive buffers — All WebSocket connections on a given thread now share a single 512KB receive buffer (one allocation per thread instead of per-connection). Unconsumed partial-frame bytes spill to a small per-connection buffer. Deferred reclamation avoids thrashing. Enabled by default; disable with ws_recv_buffer_size(0).
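A minimal sketch of the shared-buffer idea described above, not the code in this PR. The names `thread_recv_buffer`, `recv_connection_sketch`, and `save_partial_frame` are hypothetical, and the sketch assumes each read lands at the start of the thread's buffer and that the parser reports how many bytes it consumed. Deferred reclamation is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size taken from the description above (512KB per thread).
constexpr std::size_t shared_recv_size = 512 * 1024;

// One receive buffer per thread, allocated on first use (hypothetical helper).
inline std::vector<std::uint8_t>& thread_recv_buffer()
{
    thread_local std::vector<std::uint8_t> buffer(shared_recv_size);
    return buffer;
}

// Hypothetical per-connection state: only leftover partial-frame bytes live here.
struct recv_connection_sketch
{
    std::vector<std::uint8_t> spill;
};

// After a read placed `received` bytes at the start of the shared buffer and the
// frame parser consumed `consumed` of them, copy the unconsumed tail into the
// connection's small spill buffer so the shared buffer can serve the next connection.
inline void save_partial_frame(recv_connection_sketch& conn, std::size_t received, std::size_t consumed)
{
    const auto& buf = thread_recv_buffer();
    conn.spill.assign(buf.begin() + static_cast<std::ptrdiff_t>(consumed),
                      buf.begin() + static_cast<std::ptrdiff_t>(received));
}
```

In a sketch like this, a connection with a non-empty `spill` would prepend those bytes to its next read before parsing; how the PR actually does that is not shown here.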

Fused unmask + ASCII detection — XOR unmasking now processes 8 bytes at a time and simultaneously checks whether all bytes are ASCII. For ASCII-only text frames, the separate UTF-8 validation pass is skipped entirely.
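To illustrate the fused pass, here is a rough sketch under stated assumptions; it is not the function from this PR. `unmask_and_check_ascii` is a hypothetical name. It applies the 4-byte masking key from RFC 6455 in 8-byte chunks and ORs every unmasked chunk into an accumulator, so a single high-bit test at the end tells whether the payload was pure ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: unmasks `data` in place and returns true when every
// unmasked byte is ASCII (high bit clear), so UTF-8 validation could be skipped.
inline bool unmask_and_check_ascii(std::uint8_t* data, std::size_t n, const std::uint8_t key[4])
{
    // Repeat the 4-byte key twice so one XOR covers 8 payload bytes.
    std::uint8_t key8[8];
    for (int k = 0; k < 8; ++k) key8[k] = key[k % 4];
    std::uint64_t mask64;
    std::memcpy(&mask64, key8, 8);

    std::uint64_t seen_bits = 0; // OR of all unmasked 8-byte chunks
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t chunk;
        std::memcpy(&chunk, data + i, 8); // memcpy avoids unaligned-access issues
        chunk ^= mask64;
        seen_bits |= chunk;
        std::memcpy(data + i, &chunk, 8);
    }

    // Tail bytes: i is a multiple of 8 here, so key[i % 4] stays in phase.
    std::uint8_t tail_bits = 0;
    for (; i < n; ++i) {
        data[i] ^= key[i % 4];
        tail_bits |= data[i];
    }

    // Any byte >= 0x80 leaves a high bit set in one of the accumulators.
    return (seen_bits & 0x8080808080808080ULL) == 0 && (tail_bits & 0x80) == 0;
}
```

Because XOR works byte by byte and both the key and the data are copied in memory order, the sketch does not depend on endianness. A frame that fails the check would still need the normal UTF-8 validation pass.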

Zero-allocation write fast path — When no write is in flight, outgoing frames are built directly in a persistent per-connection buffer (capacity reused across messages). Frames are only heap-allocated and queued when a concurrent write is already in progress.
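The fast path versus queued path can be sketched roughly as follows. This is an illustration only: `append_frame`, `write_connection_sketch`, `send`, and `on_write_complete` are hypothetical names, the actual asynchronous write is left out, and frames are built unmasked as a server would send them under RFC 6455.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string_view>
#include <utility>
#include <vector>

// Appends one unmasked, final (FIN=1) WebSocket frame to `out`.
inline void append_frame(std::vector<std::uint8_t>& out, std::uint8_t opcode, std::string_view payload)
{
    out.push_back(static_cast<std::uint8_t>(0x80 | (opcode & 0x0F)));
    const std::uint64_t n = payload.size();
    if (n < 126) {
        out.push_back(static_cast<std::uint8_t>(n));
    } else if (n <= 0xFFFF) {
        out.push_back(126); // 16-bit big-endian length follows
        out.push_back(static_cast<std::uint8_t>(n >> 8));
        out.push_back(static_cast<std::uint8_t>(n));
    } else {
        out.push_back(127); // 64-bit big-endian length follows
        for (int shift = 56; shift >= 0; shift -= 8) out.push_back(static_cast<std::uint8_t>(n >> shift));
    }
    const auto* p = reinterpret_cast<const std::uint8_t*>(payload.data());
    out.insert(out.end(), p, p + payload.size());
}

struct write_connection_sketch
{
    std::vector<std::uint8_t> write_buffer;             // persistent, capacity reused
    std::deque<std::vector<std::uint8_t>> write_queue;  // only used under contention
    bool write_in_flight = false;

    // Returns true when the fast path was taken.
    bool send(std::uint8_t opcode, std::string_view payload)
    {
        if (!write_in_flight) {
            write_buffer.clear(); // in practice keeps the existing capacity
            append_frame(write_buffer, opcode, payload);
            write_in_flight = true; // a real server would start the async write here
            return true;
        }
        std::vector<std::uint8_t> frame; // slow path: allocate and queue
        append_frame(frame, opcode, payload);
        write_queue.push_back(std::move(frame));
        return false;
    }

    // Called when the in-flight write finishes; starts the next queued frame, if any.
    void on_write_complete()
    {
        if (write_queue.empty()) { write_in_flight = false; return; }
        write_buffer.assign(write_queue.front().begin(), write_queue.front().end());
        write_queue.pop_front();
    }
};
```

Copying a queued frame into `write_buffer` with `assign` is one choice that keeps the persistent buffer's capacity; moving the queued vector in instead would avoid the copy at the cost of replacing that allocation.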

Write queue simplification — Replaced std::deque<std::unique_ptr<std::vector<uint8_t>>> with std::deque<std::vector<uint8_t>>, removing a level of indirection.

Benchmark Suite

Added benchmarks/ws_benchmark/ comparing Glaze against uWebSockets using Boost.Beast as a neutral client. Tests cover:

  • Single-connection echo at 64B, 1KB, 64KB, and JSON payloads
  • Connection upgrade (new WebSocket per message)
  • Concurrent echo with N clients (single-threaded and multi-threaded server)

@GTruf

GTruf commented Apr 22, 2026

@stephenberry, Hi, are there any plans to optimize the WebSockets/HTTP part? And maybe the benchmarks you were planning to do, at least with uWebSockets?

@stephenberry
Owner Author

@GTruf, optimizing websockets is an aim. uWebSockets is pretty well optimized, so in my work to make Glaze just as fast I realized there are core limitations due to the design of asio. I need to decide whether to optimize asio or rework core networking logic to support extreme optimization.

I think in the meantime I'll merge optimizations that still use the asio architecture and include benchmarks. But I want to get this right and not make any API-breaking changes without good reason.

@GTruf

GTruf commented Apr 23, 2026

@stephenberry, by "rework core networking logic," do you mean moving away from ASIO? And do you have any plans to support DPDK via F-Stack? This is quite a complex task, mainly due to TLS, but there is an existing implementation (using SIMD for WS masks, etc.) that could be adapted: the flashws library.

@RazielXYZ

This is an interesting dilemma; are there any better alternatives to asio at this point or is this a situation in which one would have to roll their own?
F-Stack doesn't seem to be cross-platform, and even DPDK itself is only partially cross-platform. I assume this would be an issue unless we're fine with different backends on different platforms, plus being fine with it being basically as low-level as BSD sockets.

@stephenberry
Owner Author

stephenberry commented Apr 25, 2026

There is a lot I like about asio, but it also has some core flaws and lots of unfixed bugs. I've thought of trying to contribute heavily to asio, but I'd rather work on a more modern library and not need to support the old C++ versions that asio has to care about. I have an experimental fork of asio that I have massively cleaned up, which drops lots of deprecated code and cleans things up with modern C++20 concepts, etc. If I could get enough developers to help maintain this I would consider this direction, but I'm wary of the required time investment. The other option is to implement custom cross-platform networking code for websockets, but this is more prone to bugs and bifurcates the networking codebase. A couple of years ago I was using uWebSockets heavily, and core bugs and design flaws were requiring us to do strange hacks and quick exits to avoid segfaults. So tightly optimized networking code is hard, but also very desirable.

I might open source my asio fork soon for feedback. It's a tough call. It removes many thousands of lines of code that aren't needed due to historical debt, but I wouldn't want to aim for parity with asio any more. This would be a completely new library with a similar API, but would evolve in another direction for the sake of modern C++ and performance.

@GTruf

GTruf commented Apr 25, 2026

@RazielXYZ, DPDK is fully implemented for Linux. It’s already available for Windows, but is still under active development. No one is suggesting that the backend should be implemented exclusively on F-Stack; I mean supporting it as an additional option to maximize performance right at the kernel-bypass level.

@GTruf

GTruf commented Apr 25, 2026

@stephenberry, I think you'll have no trouble finding developers on Reddit.

@RazielXYZ

> I might open source my asio fork soon for feedback. It's a tough call. It removes many thousands of lines of code that aren't needed due to historical debt, but I wouldn't want to aim for parity with asio any more. This would be a completely new library with a similar API, but would evolve in another direction for the sake of modern C++ and performance.

I don't think there's necessarily much point in aiming for parity with asio anyway, since asio still exists and is still under development, so if people want parity with asio, well, they can use asio. Something nicer and modern-er, as you describe it there, would certainly be welcome.

As for other ws libraries: I did try uWebSockets a bit, but did not like the API or design much. Anything slightly different than expected or somewhat involved was either not doable easily or really damn ugly. I've used IXWebSocket quite a bit recently and it was fine in that regard, but quite limited by the one-thread-per-connection/client design, and not really under active development anymore. Way back in the day I also used websocketpp, which is quite far removed from the niceties we have nowadays.

@stephenberry
Owner Author

Thanks for the feedback and encouragement. I do think an asio fork would be great for the future C++ community. And I'm excited to do more networking in C++; I'm just short on time at the moment and I don't want to do something half-baked.
