Implement HTTP headers class #2402

Draft
annihilatorq wants to merge 3 commits into stephenberry:main from annihilatorq:implement-http-headers

@annihilatorq
Contributor

@annihilatorq annihilatorq commented Mar 26, 2026

Motivation

Currently, the net part of glaze uses std::unordered_map as header storage. This comes with some issues, the main ones being:

TL;DR:

  1. current storage cannot represent repeated field lines losslessly;
  2. therefore the parser must either drop values or merge them;
  3. that pushes HTTP-specific semantics into storage/parsing;
  4. the resulting user API becomes awkward.

Details:

  1. Lowercasing of header names upon insertion.
    The original case of header names is lost, which can be unpleasant for the user. This could be fixed by implementing custom Hash & KeyEqual functors for std::unordered_map that lowercase keys during the hashing and KeyEqual stages; however, this would still not solve the problems described below.

  2. Loss of multi-value headers.
    HTTP allows multiple field lines with the same field name, and not all such fields can be safely merged into a single comma-separated value.

  3. Disrupting the order of headers.
    std::unordered_map does not preserve the order in which headers were received. In the current implementation of http_headers, however, std::vector is used, which allows the original order of the headers to be preserved.

  4. Potential performance losses in real-world scenarios.
    For the typical size of an HTTP header section, a flat std::vector-based representation may be competitive with, or even faster than, std::unordered_map in practice, thanks to better cache locality, fewer allocations, and the absence of hashing (this claim should be validated by benchmarks, not words). I'm not sure exactly where the crossover is, because it depends on the size of the keys, but I'd guess that once you have a few dozen or a hundred entries or more, std::vector will start to lag behind std::unordered_map on lookups. Even in that case, if we want maximum speed with massive header sets, and if the library's design allows allocating extra memory on insertion, we can add internal indexing or something similar.

  5. Inability to preserve multi-value headers without the parser concatenating them.
    A std::unordered_map is a [key:value] structure, which means we have a choice: either we lose the additional value (which is, of course, prohibited by the RFC), or we concatenate it with the previous value. Concatenation complicates the header parser, since it also needs to know the concatenation rules for specific headers (such as Cache-Control and Set-Cookie). http_headers in its current form, by contrast, stores headers in their low-level representation (without concatenating fields with the same name into a single field), which enables convenient iteration over ANY repeated headers, whether it's a Set-Cookie that cannot be combined with a comma, or any other header whose values are separated by commas. The http_headers class doesn't care about this; it is the responsibility of the header parser (which makes for a rather nice SRP split). All it does is store the original header fields in a std::vector<pair_of_strings>. Yes, the implementation may incur a small overhead for storing duplicate keys in the std::vector (for example, there could be many entries for the same Set-Cookie name), but in return we get an interface that is extremely convenient for iterating over values, and I believe even low-level developers would be willing to accept that small trade-off.
    P.S. The argument that knowing the rules for header value concatenation complicates the parser can admittedly be refuted: the parser needs to know them anyway in order to correctly parse and validate all types of headers.

  6. Inconvenient API.
    Based on the previous point (5), we get the following scenario: the user ends up with a std::unordered_map whose keys have lost their insertion order and whose values are concatenated into a single string, separated EITHER by commas ", " OR by semicolons "; " depending on the key, which forces the user to know which separator divides the values of a given key. With the http_headers class, the user no longer needs to know this; the only time that information is necessary is when concatenating the values of a list-based header. On top of that, .add("Header-Name", "Header value") & .replace("Header-Name", "Header value") take the place of .emplace(...) in std::unordered_map, and there is a range-like view API for lightweight iteration over headers. Overall, the current class has a quite convenient API relative to std::unordered_map (tailored for this task) that will be a joy to work with for anyone who has ever dealt with HTTP headers.
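To make the storage model from the points above concrete, here is a minimal sketch of a vector-backed header list. It is an illustration only and not the glz::http_headers API from this PR: the names header_field, header_list, and iequals are hypothetical, and values() returns a std::vector rather than a lazy view to keep the sketch short.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the glz::http_headers API.
struct header_field {
   std::string name;  // stored exactly as provided, original case kept
   std::string value;
};

// ASCII case-insensitive comparison of two field names.
inline bool iequals(std::string_view a, std::string_view b)
{
   return a.size() == b.size() &&
          std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
             return std::tolower(x) == std::tolower(y);
          });
}

class header_list {
  public:
   // Appends a field line; existing lines with the same name are kept.
   void add(std::string name, std::string value)
   {
      fields_.push_back({std::move(name), std::move(value)});
   }

   // Removes every line with this name, then appends the new line at the end.
   void replace(std::string name, std::string value)
   {
      fields_.erase(std::remove_if(fields_.begin(), fields_.end(),
                                   [&](const header_field& f) { return iequals(f.name, name); }),
                    fields_.end());
      add(std::move(name), std::move(value));
   }

   // Collects the values of all lines with this name, in insertion order.
   // The views point into this object and are invalidated by add/replace.
   std::vector<std::string_view> values(std::string_view name) const
   {
      std::vector<std::string_view> out;
      for (const auto& f : fields_) {
         if (iequals(f.name, name)) out.emplace_back(f.value);
      }
      return out;
   }

   // Read-only access to every stored field line, in insertion order.
   const std::vector<header_field>& fields() const { return fields_; }

  private:
   std::vector<header_field> fields_;
};
```

In this sketch, lookup is a linear scan, which is the trade-off discussed in point 4; name case, insertion order, and repeated field lines are all preserved because nothing is normalized on insertion.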

Other, slight advantages

  1. Easy class extensibility. This class is open to extension: if tomorrow we need convenient getters for common headers, we can add them. For example, .content_length() could parse the Content-Length header for the user into a clear std::expected<size_t, std::error_code>, which would be very convenient and save the user a lot of boilerplate code.
  2. Built-in serialization. The ability to .serialize() all HTTP headers with a single method call into a formatted HTTP header section string will make developers' lives much, much easier.
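As a rough sketch of what these two conveniences could look like, the free functions below operate on a plain vector of name/value pairs. Everything here is an assumption made for illustration: field_list, serialize, parse_content_length, and content_length are hypothetical names, the serializer emits only the field lines (not the blank line that follows the header section), and std::optional stands in for the std::expected mentioned above so that the sketch also compiles before C++23.

```cpp
#include <algorithm>
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical storage type for this sketch: one pair per field line.
using field_list = std::vector<std::pair<std::string, std::string>>;

// ASCII case-insensitive comparison of two field names.
inline bool iequals(std::string_view a, std::string_view b)
{
   return a.size() == b.size() &&
          std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
             return std::tolower(x) == std::tolower(y);
          });
}

// Writes one "Name: value\r\n" line per stored field, in insertion order.
inline std::string serialize(const field_list& fields)
{
   std::string out;
   for (const auto& [name, value] : fields) {
      out += name;
      out += ": ";
      out += value;
      out += "\r\n";
   }
   return out;
}

// Accepts decimal digits only; rejects empty input, signs, spaces, and overflow.
inline std::optional<std::size_t> parse_content_length(std::string_view text)
{
   std::size_t result = 0;
   const char* first = text.data();
   const char* last = first + text.size();
   const auto [ptr, ec] = std::from_chars(first, last, result);
   if (ec != std::errc{} || ptr != last) return std::nullopt;
   return result;
}

// Parses the first Content-Length field line, if there is one.
inline std::optional<std::size_t> content_length(const field_list& fields)
{
   for (const auto& [name, value] : fields) {
      if (iequals(name, "Content-Length")) return parse_content_length(value);
   }
   return std::nullopt;
}
```

A real getter would probably also want to report why parsing failed, which is where an error-carrying return type such as std::expected becomes useful.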

Implementation details

I tried to follow the glaze codestyle as best I could (and of course used glaze clang-format), but it's possible I made a mistake somewhere. I've left "REVIEW:" comments in the code that would be great to review and fix or remove before merging.

While writing the code, I had a question about the philosophy behind this class. I can't answer it myself, since I certainly don't understand glaze's error-handling model and level of abstraction as well as @stephenberry or other contributors do, so I'm asking for help in the discussion. Should http_headers be treated as a low-level storage for header fields, or is it better to make API access immutable (in the form of a const view) and validate all user-provided headers during the constructor, .add(), and .replace() calls? I had three thoughts on this matter, two quite definitive, and one somewhere in the middle:

  1. No field validation (the current implementation stops here) - the constructor and the .add() & .replace() methods currently do not validate the names and values of header fields in any way, meaning a user could push through an RFC-invalid header, and we would then serialize that invalid header. Currently, .serialize() only checks whether the header name is empty; if we reject the second option, I think we must add full token validation of the entire header (both name and value) to .serialize().
  2. Full field validation - validate fields for potential errors according to the RFC, reject invalid tokens, empty field names, CR & LF characters in field values, and so on, while also removing the mutable range API (.fields() &), as keeping it would breach the original contract and allow class invariants to be violated. In this implementation, the validation in .serialize() can be omitted, since the contract of the class itself ensures that headers violating the RFC format cannot get through.
  3. A combination of the previous two options - validation from the second option, while retaining the mutable .fields() range API with a note in the documentation stating that modifying it may violate expected invariants.
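For options 2 and 3, the checks themselves could be as small as the sketch below. The function names are hypothetical and not part of this PR. The name check follows the token grammar of RFC 9110; the value check is deliberately conservative and only rejects control characters (including CR, LF, and NUL) other than HTAB, so it does not enforce the rule that a field value has no leading or trailing whitespace.

```cpp
#include <algorithm>
#include <string_view>

// tchar per RFC 9110: ALPHA / DIGIT / one of "!#$%&'*+-.^_`|~".
inline bool is_tchar(unsigned char c)
{
   if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
      return true;
   }
   return std::string_view{"!#$%&'*+-.^_`|~"}.find(static_cast<char>(c)) != std::string_view::npos;
}

// A field name is a non-empty token.
inline bool is_valid_field_name(std::string_view name)
{
   return !name.empty() &&
          std::all_of(name.begin(), name.end(), [](unsigned char c) { return is_tchar(c); });
}

// Conservative value check: allow HTAB, SP, visible ASCII, and bytes >= 0x80;
// reject every other control character (CR, LF, NUL, ...) and DEL.
inline bool is_valid_field_value(std::string_view value)
{
   return std::all_of(value.begin(), value.end(), [](unsigned char c) {
      return c == '\t' || (c >= 0x20 && c != 0x7f);
   });
}
```

Under option 1 the same predicates could run inside .serialize(); under options 2 and 3 they would run in the constructor, .add(), and .replace().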

If the current class API works for you, please let me know, and I'll continue with the implementation and write the documentation for it. Also, if necessary, I can write a basic parse_headers function (which, if desired and necessary, depending on the benchmark results - can be optimized) and replace std::unordered_map<std::string, std::string> with http_headers throughout the HTTP code.

I removed rvalue overloading from some of the http_headers methods to prevent the use of APIs that return references, iterators, string_view, or lazy ranges/views on temporary objects.

Simply put: to prevent users from accidentally writing code that compiles and looks fine but immediately creates a dangling lifetime.
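One way to get this effect in C++ is to ref-qualify the accessors and delete the rvalue overloads. The sketch below uses a hypothetical header_line type rather than the actual http_headers class, and the can_call_value trait exists only to demonstrate at compile time that the rvalue call is rejected.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical type, used only to illustrate ref-qualified accessors.
class header_line {
  public:
   header_line(std::string name, std::string value)
      : name_(std::move(name)), value_(std::move(value))
   {}

   // Callable on lvalues: the view stays valid while *this is alive and unmodified.
   std::string_view name() const& { return name_; }
   std::string_view value() const& { return value_; }

   // Deleted for rvalues: header_line{...}.value() would return a view
   // into a temporary that is destroyed at the end of the full expression.
   std::string_view name() const&& = delete;
   std::string_view value() const&& = delete;

  private:
   std::string name_;
   std::string value_;
};

// Detection idiom: true when `std::declval<T>().value()` is a valid expression.
template <class T, class = void>
struct can_call_value : std::false_type {};

template <class T>
struct can_call_value<T, std::void_t<decltype(std::declval<T>().value())>> : std::true_type {};

static_assert(can_call_value<header_line&>::value, "lvalue access compiles");
static_assert(can_call_value<const header_line&>::value, "const lvalue access compiles");
static_assert(!can_call_value<header_line&&>::value, "rvalue access is rejected");
```

With this in place, `header_line{"Accept", "text/html"}.value()` fails to compile instead of silently producing a dangling std::string_view.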

I took the idea for http_headers from my repository blazeauth/aero and refactored it, improving its API and adapting it to the glaze style, as I have great respect for the quality of glaze and want to help make it even better.

Examples (even more motivation 😁)

Print Set-Cookie

void print_set_cookie_with_unordered_map()
{
   std::unordered_map<std::string, std::string> headers;

   headers.emplace("Set-Cookie", "session=abc; Path=/; HttpOnly");
   headers["Set-Cookie"] = "theme=dark; Path=/; Max-Age=3600";

   const auto iterator = headers.find("Set-Cookie");
   if (iterator == headers.end()) {
      return;
   }

   std::println("Set-Cookie: {}", iterator->second);

   // Problem:
   // The first Set-Cookie field line is already gone.
   // std::unordered_map<std::string, std::string> can only store one value per key.

   // Problem:
   // This is not just an inconvenience during iteration.
   // The storage model itself has already destroyed valid HTTP data.
}

vs:

void print_set_cookie_with_http_headers(const glz::http_headers& headers)
{
   for (const auto& field : headers.fields("Set-Cookie")) {
      std::println("{}: {}", field.name, field.value);
   }

   // Good:
   // Each Set-Cookie field line is preserved as a separate entry.

   // Good:
   // Iteration matches the actual HTTP wire format:
   // Repeated field lines remain repeated field lines.
}

Print comma-separated header

std::vector<std::string_view> split_header_list(std::string_view text)
{
   std::vector<std::string_view> parts;

   size_t part_begin = 0;
   while (part_begin < text.size()) {
      const size_t comma_position = text.find(',', part_begin);
      const size_t part_end = comma_position == std::string_view::npos ? text.size() : comma_position;

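      // Trim optional whitespace (space or horizontal tab) on both sides.
      while (part_begin < part_end && (text[part_begin] == ' ' || text[part_begin] == '\t')) {
         ++part_begin;
      }

      size_t trimmed_end = part_end;
      while (trimmed_end > part_begin && (text[trimmed_end - 1] == ' ' || text[trimmed_end - 1] == '\t')) {
         --trimmed_end;
      }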

      parts.emplace_back(text.substr(part_begin, trimmed_end - part_begin));

      if (comma_position == std::string_view::npos) {
         break;
      }

      part_begin = comma_position + 1;
   }

   return parts;
}

void print_accept_encoding_with_unordered_map(const std::unordered_map<std::string, std::string>& headers)
{
   const auto iterator = headers.find("Accept-Encoding");
   if (iterator == headers.end()) {
      return;
   }

   for (const std::string_view encoding : split_header_list(iterator->second)) {
      std::println("{}", encoding);
   }

   // Problem:
   // Iteration is no longer iteration over stored header fields.
   // The user first has to know that this header is comma-separated.

   // Problem:
   // The user must manually split the string and implement trimming logic.

   // Problem:
   // This logic is header-specific.
   // It is valid for some headers, but invalid for Set-Cookie.

   // Problem:
   // The container API gives no help here.
   // The user has to know HTTP semantics and write parsing code by hand.
}

vs

void print_accept_encoding_with_http_headers(const glz::http_headers& headers)
{
   for (std::string_view value : headers.values("Accept-Encoding")) {
      std::println("{}", value);
   }

   // Good:
   // The container can return every matching field line directly.

   // Good:
   // If the parser stored multiple Accept-Encoding field lines,
   // the user can iterate over them without any extra storage conventions.

   // Good:
   // The user stays at the "iterate matching header fields" level first,
   // and only then decides whether to split comma-separated syntax.
}

@annihilatorq
Contributor Author

This also opens up the potential for interacting with custom headers during the WebSocket handshake in the future. While this would have a positive impact on the library UX, it would most likely come at the cost of performance, since storing headers requires memory allocation, whereas the current WS handshake parses headers as a view without any additional memory allocation, which is a major performance advantage (great as always 🥇)

P.S. - what I meant when I was talking about working with custom headers in the upgrade handshake

@packit-as-a-service

One of the tests failed for 929ecba. @admin check logs None, packit dashboard https://dashboard.packit.dev/jobs/copr/3410125 and external service dashboard https://copr.fedorainfracloud.org/coprs/build/10263826/

@annihilatorq
Contributor Author

I should also note that I am not attempting to finalize the class API for working with headers with this implementation, nor am I requesting a merge of this specific implementation. This PR is not ready to merge and requires further discussion of the API if further work is planned in this area. It is more of a starting PoC: a building block for something that would be extremely useful to have in glaze, at least from a functional standpoint (so as not to lose multiple headers with the same name).

@stephenberry
Owner

Thanks for this prototype! You've raised some great points and I'll look this over in depth as I am able.

@annihilatorq
Contributor Author

Just sharing a few thoughts in writing while they are still fresh, in case any of it ends up being useful when you get a chance to look through it.

After thinking this through, I came to the conclusion that the API I proposed above is probably not the best fit.

Instead of reinventing the wheel, I think it makes sense to look at how HTTP headers are represented in Go's standard library and see if it might be a good idea to adapt that model for C++. In case you have never worked with Go before, I will briefly describe below how its http.Header type works.

http.Header is essentially map[string][]string, with helper methods for case-insensitive access to header fields.

  • Add(key, value) appends a new value for a key
  • Get(key) returns the first value for a key
  • Values(key) returns all values for that key

Both Get and Values are case-insensitive, and Values returns the underlying slice.
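A rough C++ adaptation of this model might look like the sketch below. The names header_map and ci_less are hypothetical. One deliberate difference from Go: Go canonicalizes the key before looking it up in a regular map, while this sketch keeps the first spelling it saw and uses a case-insensitive comparator instead.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Orders keys by their ASCII-lowercased content, so "X-Foo" and "x-foo" are the same key.
struct ci_less {
   using is_transparent = void;  // allows lookup by std::string_view without a temporary std::string

   bool operator()(std::string_view a, std::string_view b) const
   {
      return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
                                          [](unsigned char x, unsigned char y) {
                                             return std::tolower(x) < std::tolower(y);
                                          });
   }
};

// Hypothetical C++ counterpart of Go's map[string][]string with helper methods.
class header_map {
  public:
   // Like Add: appends one more value to the key.
   void add(std::string_view key, std::string value)
   {
      auto it = values_.find(key);
      if (it == values_.end()) {
         it = values_.emplace(std::string(key), std::vector<std::string>{}).first;
      }
      it->second.push_back(std::move(value));
   }

   // Like Get: the first value for the key, or an empty view if there is none.
   std::string_view get(std::string_view key) const
   {
      const auto it = values_.find(key);
      if (it == values_.end() || it->second.empty()) return {};
      return it->second.front();
   }

   // Like Values: all values for the key, without copying and without splitting on commas.
   const std::vector<std::string>& values(std::string_view key) const
   {
      static const std::vector<std::string> empty;
      const auto it = values_.find(key);
      return it == values_.end() ? empty : it->second;
   }

  private:
   std::map<std::string, std::vector<std::string>, ci_less> values_;
};
```

Unlike the vector-based storage proposed earlier in this PR, a map keyed by name keeps the order of values within one key but not the relative order of different header names.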

Practical meaning of this

The important part is that Go does not treat Header as a semantic parser for comma-separated HTTP field values.
It stores header values as they were associated with the header key, not as a fully normalized list of parsed items (unlike the interface I suggested above).

In other words:

  • Get(key) returns the first value associated with the key
  • Values(key) returns all values associated with the key
  • Add(key, value) appends one more value to the key
  • values inside a single string such as "text/html, application/json" are not automatically split into separate entries by Header.Values (again, unlike the interface I suggested)

Mock examples

Below are five concrete examples that show how this model behaves.

A. A single regular header

X-Foo: one

B. Multiple headers with the same name

X-Foo: one
X-Foo: two

C. A list-based header stored in a single header line

Accept: text/html, application/json

D. Multiple list-based headers with the same name

Accept: text/html, application/json
Accept: image/png

E. Multiple Set-Cookie headers

Set-Cookie: a=1; Path=/
Set-Cookie: b=2; Path=/

Behavior

| Case | Logical header key | Stored representation | Get(key) | Values(key) |
| --- | --- | --- | --- | --- |
| A | X-Foo | {"X-Foo": ["one"]} | "one" | ["one"] |
| B | X-Foo | {"X-Foo": ["one", "two"]} | "one" | ["one", "two"] |
| C | Accept | {"Accept": ["text/html, application/json"]} | "text/html, application/json" | ["text/html, application/json"] |
| D | Accept | {"Accept": ["text/html, application/json", "image/png"]} | "text/html, application/json" | ["text/html, application/json", "image/png"] |
| E | Set-Cookie | {"Set-Cookie": ["a=1; Path=/", "b=2; Path=/"]} | "a=1; Path=/" | ["a=1; Path=/", "b=2; Path=/"] |

Key takeaway

This distinction is important:

  • Get does not concatenate all values
  • Get returns only the first value
  • Values returns the full list of values for that header key
  • Values is useful both for repeated header lines and for headers such as Set-Cookie, where multiple values are naturally represented as multiple entries
  • if a single header line already contains a comma-separated value, such as Accept: text/html, application/json, then Values still returns that as one string, not as two parsed items

Why this design is useful

This approach keeps the header container simple and predictable:

  • repeated header fields are preserved as repeated values
  • case-insensitive lookup is still convenient
  • list parsing, if needed, can be done separately and explicitly
  • special cases such as Set-Cookie work naturally without inventing header-specific storage rules

That is the main reason why I think this model is a better starting point than trying to encode too much header-specific behavior directly into the container itself.
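To illustrate the "separately and explicitly" part, here is a sketch of a list-splitting step that a caller could apply on top of whatever the container returns for a key. The names trim_ows and split_list_values are hypothetical. The splitter is intentionally naive: it does not understand quoted strings, and it must not be applied to Set-Cookie.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Strips optional whitespace (space or horizontal tab) from both ends.
inline std::string_view trim_ows(std::string_view s)
{
   const auto is_ows = [](char c) { return c == ' ' || c == '\t'; };
   while (!s.empty() && is_ows(s.front())) s.remove_prefix(1);
   while (!s.empty() && is_ows(s.back())) s.remove_suffix(1);
   return s;
}

// Splits every stored value on commas and flattens the result into one list.
// Empty elements are skipped. The returned views point into `values`.
inline std::vector<std::string_view> split_list_values(const std::vector<std::string>& values)
{
   std::vector<std::string_view> items;
   for (const std::string& value : values) {
      std::string_view rest{value};
      while (true) {
         const std::size_t comma = rest.find(',');
         const std::string_view item = trim_ows(rest.substr(0, comma));
         if (!item.empty()) items.push_back(item);
         if (comma == std::string_view::npos) break;
         rest.remove_prefix(comma + 1);
      }
   }
   return items;
}
```

Applied to case D above, the two stored Accept values become three items (text/html, application/json, image/png), while the container itself still holds exactly the two strings it was given.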

@annihilatorq annihilatorq marked this pull request as draft March 30, 2026 17:46