websocket optimization and benchmarking by stephenberry · Pull Request #2399 · stephenberry/glaze

stephenberry · 2026-03-24T14:36:13Z

WebSocket Optimization and Benchmarking

Performance Optimizations

Shared receive buffers — All WebSocket connections on a given thread now share a single 512KB receive buffer (one allocation per thread instead of per-connection). Unconsumed partial-frame bytes spill to a small per-connection buffer. Deferred reclamation avoids thrashing. Enabled by default; disable with ws_recv_buffer_size(0).
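
A minimal sketch of the shared-buffer idea, not the code from this PR: `thread_recv_buffer`, `connection_sketch`, and `stash_unconsumed` are hypothetical names, "512KB" is assumed to mean 512 * 1024 bytes, and deferred reclamation is not modeled. Every connection on a thread reads into the same `thread_local` buffer, and only the bytes the frame parser did not consume are copied into a small per-connection vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: "512KB" is taken to mean 512 * 1024 bytes.
inline constexpr std::size_t default_recv_size = 512 * 1024;

// Hypothetical helper: returns the calling thread's shared receive buffer,
// allocating it on first use (one allocation per thread).
inline std::vector<std::uint8_t>& thread_recv_buffer(std::size_t size = default_recv_size)
{
    assert(size > 0);
    thread_local std::vector<std::uint8_t> buffer;
    if (buffer.size() < size) buffer.resize(size);
    return buffer;
}

// Hypothetical per-connection state: holds only partial-frame bytes.
struct connection_sketch
{
    std::vector<std::uint8_t> leftover;

    // After a read delivered `received` bytes at `data` and the parser used
    // `consumed` of them, keep the remainder so the shared buffer can be
    // handed to the next connection on this thread.
    void stash_unconsumed(const std::uint8_t* data, std::size_t received, std::size_t consumed)
    {
        assert(consumed <= received);
        leftover.assign(data + consumed, data + received);
    }
};
```

Because the buffer is `thread_local`, no locking is needed as long as each connection is serviced by a single thread at a time.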

Fused unmask + ASCII detection — XOR unmasking now processes 8 bytes at a time and simultaneously checks whether all bytes are ASCII. For ASCII-only text frames, the separate UTF-8 validation pass is skipped entirely.
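
The fused pass can be sketched as follows. This is an illustration of the technique under stated assumptions, not the PR's implementation: `unmask_and_check_ascii` is a hypothetical name, and the buffer is assumed to start at payload offset 0 so the mask phase is aligned. Each 8-byte word is XORed with the repeated 4-byte key, and the unmasked words are ORed into an accumulator whose high bits reveal any non-ASCII byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: unmasks `data` in place with the 4-byte WebSocket
// masking key and returns true if every unmasked byte is ASCII (< 0x80).
inline bool unmask_and_check_ascii(std::uint8_t* data, std::size_t n, const std::uint8_t key[4])
{
    assert(data != nullptr || n == 0);

    // Repeat the key twice in memory order so the word XOR is
    // independent of the host's endianness.
    std::uint8_t key8[8];
    for (int i = 0; i < 8; ++i) key8[i] = key[i % 4];
    std::uint64_t key64;
    std::memcpy(&key64, key8, sizeof(key64));

    std::uint64_t seen = 0; // OR of all unmasked words
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof(word)); // safe unaligned load
        word ^= key64;
        seen |= word;
        std::memcpy(data + i, &word, sizeof(word));
    }

    // Tail: i is a multiple of 8 here, so i % 4 is still the right key index.
    std::uint8_t tail = 0;
    for (; i < n; ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] ^ key[i % 4]);
        tail = static_cast<std::uint8_t>(tail | data[i]);
    }

    // Any byte >= 0x80 sets a high bit in its lane.
    return (seen & 0x8080808080808080ULL) == 0 && (tail & 0x80) == 0;
}
```

When the function returns true for a text frame, the payload is valid UTF-8 by construction, so a caller could skip the separate validation pass; when it returns false, full UTF-8 validation is still required.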

Zero-allocation write fast path — When no write is in flight, outgoing frames are built directly in a persistent per-connection buffer (capacity reused across messages). Frames are only heap-allocated and queued when a concurrent write is already in progress.
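
A rough sketch of the fast path described above, with the actual socket write left out. `writer_sketch`, `append_frame`, `send`, and `on_write_complete` are hypothetical names, not Glaze's API; the frame header follows the unmasked server-to-client layout from RFC 6455.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string_view>
#include <utility>
#include <vector>

// Appends one unmasked frame (FIN set) to `out`.
inline void append_frame(std::vector<std::uint8_t>& out, std::uint8_t opcode, std::string_view payload)
{
    assert(opcode <= 0x0F);
    out.push_back(static_cast<std::uint8_t>(0x80 | opcode));
    const std::uint64_t n = payload.size();
    if (n < 126) {
        out.push_back(static_cast<std::uint8_t>(n));
    } else if (n <= 0xFFFF) {
        out.push_back(126); // 16-bit big-endian length follows
        out.push_back(static_cast<std::uint8_t>(n >> 8));
        out.push_back(static_cast<std::uint8_t>(n));
    } else {
        out.push_back(127); // 64-bit big-endian length follows
        for (int shift = 56; shift >= 0; shift -= 8)
            out.push_back(static_cast<std::uint8_t>(n >> shift));
    }
    out.insert(out.end(), payload.begin(), payload.end());
}

struct writer_sketch
{
    std::vector<std::uint8_t> write_buf;          // persistent, capacity reused
    std::deque<std::vector<std::uint8_t>> pending; // used only under contention
    bool write_in_flight = false;

    // Returns true when the fast path was taken.
    bool send(std::uint8_t opcode, std::string_view payload)
    {
        if (!write_in_flight) {
            write_buf.clear(); // keeps the existing capacity
            append_frame(write_buf, opcode, payload);
            write_in_flight = true; // real code would start the async write here
            return true;
        }
        std::vector<std::uint8_t> frame; // slow path: allocate and queue
        append_frame(frame, opcode, payload);
        pending.push_back(std::move(frame));
        return false;
    }

    // Called when the in-flight write finishes.
    void on_write_complete()
    {
        if (pending.empty()) { write_in_flight = false; return; }
        write_buf.swap(pending.front()); // next frame becomes the active buffer
        pending.pop_front();             // real code would start the next write here
    }
};
```

In the common request/response pattern the queue stays empty, so after the first few messages `write_buf` has grown to the working size and later sends do not allocate.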

Write queue simplification — Replaced std::deque<std::unique_ptr<std::vector<uint8_t>>> with std::deque<std::vector<uint8_t>>, removing a level of indirection.
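
To illustrate why the extra `unique_ptr` layer is unnecessary, here is a small sketch; the aliases and the `enqueue` helper are hypothetical. A `std::vector` is already cheap to move, so moving a frame into the deque transfers ownership of its heap storage without copying the bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

using frame = std::vector<std::uint8_t>;

// Before: each queued frame costs two allocations (the vector object
// behind the unique_ptr, plus the vector's own storage).
using old_write_queue = std::deque<std::unique_ptr<frame>>;

// After: the deque stores the vector object directly.
using new_write_queue = std::deque<frame>;

// Moves a frame into the queue and returns a pointer to its bytes,
// which is the same storage the caller's vector owned before the move.
inline const std::uint8_t* enqueue(new_write_queue& queue, frame&& f)
{
    assert(!f.empty());
    queue.push_back(std::move(f));
    return queue.back().data();
}
```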

Benchmark Suite

Added benchmarks/ws_benchmark/ comparing Glaze against uWebSockets using Boost.Beast as a neutral client. Tests cover:

Single-connection echo at 64B, 1KB, 64KB, and JSON payloads
Connection upgrade (new WebSocket per message)
Concurrent echo with N clients (single-threaded and multi-threaded server)

GTruf · 2026-04-22T21:15:38Z

@stephenberry, Hi, are there any plans to optimize the WebSockets/HTTP part? And maybe the benchmarks you were planning to do, at least with uWebSockets?

stephenberry · 2026-04-23T06:59:30Z

@GTruf, optimizing WebSockets is a goal. uWebSockets is pretty well optimized, and in my work to make Glaze just as fast I realized there are core limitations due to the design of asio. I need to decide whether to optimize asio or rework the core networking logic to support extreme optimization.

I think in the meantime I'll merge optimizations that still use the asio architecture and include benchmarks. But I want to get this right and not make any API-breaking changes without good reason.

GTruf · 2026-04-23T08:45:35Z

@stephenberry, by "rework core networking logic", do you mean moving away from asio? And do you have any plans to support DPDK via F-Stack? This is quite a complex task, mainly due to TLS, but there is an existing implementation (using SIMD for WS masks, etc.) that could be adapted: the flashws library.

RazielXYZ · 2026-04-24T22:24:01Z

This is an interesting dilemma; are there any better alternatives to asio at this point or is this a situation in which one would have to roll their own?
F-Stack doesn't seem to be cross-platform, and even DPDK itself is only partially cross-platform. I assume this would be an issue unless we're fine with different backends on different platforms, plus being fine with it being basically as low-level as BSD sockets.

stephenberry · 2026-04-25T11:31:47Z

There is a lot I like about asio, but it also has some core flaws and lots of unfixed bugs. I've thought of trying to contribute heavily to asio, but I'd rather work on a more modern library that doesn't need to support the old C++ versions asio has to care about. I have an experimental fork of asio that I have massively cleaned up; it drops lots of deprecated code and cleans things up with modern C++20 concepts, etc. If I could get enough developers to help maintain this I would consider this direction, but I'm wary of the required time investment. The other option is to implement custom cross-platform networking code for WebSockets, but this is more prone to bugs and bifurcates the networking codebase. A couple of years ago I was using uWebSockets heavily, and core bugs and design flaws required us to do strange hacks and quick exits to avoid segfaults. So tightly optimized networking code is hard, but also very desirable.

I might open source my asio fork soon for feedback. It's a tough call. It removes many thousands of lines of code that aren't needed due to historical debt, but I wouldn't want to aim for parity with asio any more. This would be a completely new library with a similar API, but would evolve in another direction for the sake of modern C++ and performance.

GTruf · 2026-04-25T13:02:24Z

@RazielXYZ, DPDK is fully implemented for Linux. It's already available for Windows, but is still under active development there. No one is suggesting that the backend should be implemented exclusively on F-Stack; I mean supporting it as an additional option to maximize performance at the kernel-bypass level.

GTruf · 2026-04-25T18:41:20Z

@stephenberry, I think you'll have no trouble finding developers on Reddit.

RazielXYZ · 2026-04-26T00:44:20Z

> I might open source my asio fork soon for feedback. It's a tough call. It removes many thousands of lines of code that aren't needed due to historical debt, but I wouldn't want to aim for parity with asio any more. This would be a completely new library with a similar API, but would evolve in another direction for the sake of modern C++ and performance.

I don't think there's necessarily much point in aiming for parity with asio anyway, since asio still exists and is still under development, so if people want parity with asio, well, they can use asio. Something nicer and modern-er, as you describe it there, would certainly be welcome.

As for other WebSocket libraries: I did try uWebSockets a bit, but did not like the API or design much. Anything slightly different than expected or somewhat involved was either not easily doable or really damn ugly. I've used IXWebSocket quite a bit recently and it was fine in that regard, but quite limited by the one-thread-per-connection/client design, and not really under active development anymore. Way back in the day I also used websocketpp, which is quite far removed from the niceties we have nowadays.

stephenberry · 2026-04-28T14:06:16Z

Thanks for the feedback and encouragement. I do think an asio fork would be great for the future C++ community. And I'm excited to do more networking in C++; I'm just short on time at the moment and I don't want to do something half-baked.

stephenberry added 6 commits March 24, 2026 09:36

websocket optimization and benchmarking

c38401f

shared receive buffers

b4e1570

boost asio ec

1d8b609

Update websocket_connection.hpp

005ddab

threading updates

9ede030

Default to faster shared receive buffer

6cfa509

stephenberry wants to merge 6 commits into main from websocket-optimization