Implement HTTP headers class#2402
Conversation
|
This also opens up the potential for interacting with custom headers during the WebSocket handshake in the future, but while this would have a positive impact on the library UX, it would most likely come at the cost of performance (after all, storing headers requires memory allocation), since the current WS handshake parses headers as a view without any additional memory allocation, which is a major performance advantage (great as always 🥇) P.S. - what I meant when I was talking about working with custom headers in the upgrade handshake |
|
One of the tests failed for 929ecba. @admin check logs None, packit dashboard https://dashboard.packit.dev/jobs/copr/3410125 and external service dashboard https://copr.fedorainfracloud.org/coprs/build/10263826/ |
|
I should also note that I am not attempting to finalize the class API for working with headers with this implementation, nor am I requesting a merge for this specific implementation. This PR is not ready for a merge and requires further discussion regarding the API if further work is planned in this area; this is more of a starting PoC as a building block for something that would be extremely useful to have in |
|
Thanks for this prototype! You've raised from great points and I'll look this over in depth as I am able. |
|
Just sharing a few thoughts in writing while they were still fresh, in case any of it ends up being useful when you get a chance to look through it. After thinking this through, I came to the conclusion that the API I proposed above is probably not the best fit. Instead of reinventing the wheel, I think it makes sense to look at how HTTP headers are represented in Go's standard library and see if it might be a good idea to adapt that model for C++. In case you have never worked with Go before, I will briefly describe below how its
Both Practical meaning of thisThe important part is that Go does not treat In other words:
Mock examplesBelow are five concrete examples that show how this model behaves. A. A single regular headerX-Foo: oneB. Multiple headers with the same nameX-Foo: one
X-Foo: twoC. A list-based header stored in a single header lineAccept: text/html, application/jsonD. Multiple list-based headers with the same nameAccept: text/html, application/json
Accept: image/pngE. Multiple
|
| Case | Logical header key | Stored representation | Get(key) |
Values(key) |
|---|---|---|---|---|
| A | X-Foo |
{"X-Foo": ["one"]} |
"one" |
["one"] |
| B | X-Foo |
{"X-Foo": ["one", "two"]} |
"one" |
["one", "two"] |
| C | Accept |
{"Accept": ["text/html, application/json"]} |
"text/html, application/json" |
["text/html, application/json"] |
| D | Accept |
{"Accept": ["text/html, application/json", "image/png"]} |
"text/html, application/json" |
["text/html, application/json", "image/png"] |
| E | Set-Cookie |
{"Set-Cookie": ["a=1; Path=/", "b=2; Path=/"]} |
"a=1; Path=/" |
["a=1; Path=/", "b=2; Path=/"] |
Key takeaway
This distinction is important:
Getdoes not concatenate all valuesGetreturns only the first valueValuesreturns the full list of values for that header keyValuesis useful both for repeated header lines and for headers such asSet-Cookie, where multiple values are naturally represented as multiple entries- if a single header line already contains a comma-separated value, such as
Accept: text/html, application/json, thenValuesstill returns that as one string, not as two parsed items
Why this design is useful
This approach keeps the header container simple and predictable:
- repeated header fields are preserved as repeated values
- case-insensitive lookup is still convenient
- list parsing, if needed, can be done separately and explicitly
- special cases such as
Set-Cookiework naturally without inventing header-specific storage rules
That is the main reason why I think this model is a better starting point than trying to encode too much header-specific behavior directly into the container itself.
Motivation
Currently, the net part of
glazeusesstd::unordered_mapas a header storage. This comes with some issues, the main ones being:TL;DR:
Details:
Lowercasing of header names upon insertion.
The original case of header names is lost, which can be unpleasant for the user. This could be fixed by implementing custom
Hash&KeyEqualfunctors forstd::unordered_mapthat lowercase keys during the hashing andKeyEqualstages; however, this would still not solve the problems described below.Loss of multi-value headers.
As you know, HTTP allows multiple field lines with the same field name, and not all such fields can be safely merged into a single comma-separated value.
HTTP allows multiple field lines with the same field name, and not all such fields can be safely merged into a single comma-separated value.
Disrupting the order of headers.
std::unordered_mapdoes not preserve the order in which headers were received. In the current implementation ofhttp_headers, however,std::vectoris used, which allows the original order of the headers to be preserved.Potential performance losses in real-world scenarios
For the typical size of an HTTP header section, a flat std::vector-based representation may be competitive with or even faster than std::unordered_map in practice, due to better cache locality, fewer allocations, and the absence of hashing (this statement should be validated by benchmarks, not words). I'm not sure exactly when this happens, because it depends on the size of the keys, but I'd guess that once you have a few dozen or a hundred entries or more,
std::vectorwill start to lag behindstd::unordered_mapin terms of optimization (but even in that case, if we want maximum speed with massive header sets, and if the library's design allows for the allocation of extra memory on insertion, we can add internal indexing or something simillar).Inability to preserve multi-value headers without the parser concatenating them.
A
std::unordered_mapis a [key:value] structure, which automatically means we have a choice: either we lose the additional value (which is, of course, prohibited by the RFC), or we concatenate it with the previous value. This will complicate the work of the header parser, since it will also need to know the concatenation rules for specific headers (such asCache-ControlandSet-Cookie). Whilehttp_headersin its current form, allows headers to be stored in their low-level representation (without concatenating fields with the same name into a single field), which enables convenient iteration over ANY combined headers, whether it's aSet-Cookiethat cannot be combined with a comma, or any other headers whose values are separated by commas. Thehttp_headersclass doesn't care about this; it is the responsibility of the header parser (which allows for a rather nice SRP). All it does is store the original header fields in astd::vector<pair_of_strings>. Yes, the implementation may incur a small overhead for storing duplicate keys instd::vector(for example, multiple entries for the sameSet-Cookieinstd:: vectorcould potentially be numerous), but in return we get an interface that is extremely convenient for iterating over values, for which, I believe, even low-level developers would be willing to accept the small trade-off described above.P.S. Although the argument that knowing the rules for header value concatenation complicates the parser can be refuted, because the parser needs to know them anyway in order to correctly parse and validate all types of headers.
Inconvenient API.
Based on the previous point (5), we get the following scenario: the user ends up with a
std::unordered_mapcontaining keys with their insertion order lost, as well as values concatenated into a single string - EITHER separated by commas ", " OR by semicolons "; " for some keys, which automatically forces the user to know which separator divides the values of a given key. By using thehttp_headersclass, the user no longer needs to know which specific separator is used to separate the values of a given header; the only time this information is necessary is when concatenating the values of a list-based header. And of course,.add("Header-Name", "Header value")&.replace("Header-Name", "Header value")instead of.emplace(...)instd::unordered_map, a range-like view API for lightweight iteration over headers, overall the current class has a quite convenient API relative tostd::unordered_map(tailored for this task) that will be a joy to work with for anyone who has ever dealt with HTTP headers.Other, slight advantages
.content_length(), which will parse theContent-Lengthheader for the user into a clearstd::expected<size_t, std::error_code>- this would be very convenient and save the user from a lot of boilerplate code..serialize()all HTTP headers for a single method call into a formatted HTTP header section string will make developer lives much, much easier.Implementation details
I tried to follow the
glazecodestyle as best I could (and of course usedglazeclang-format), but it's possible I made a mistake somewhere. I've left "REVIEW:" comments in the code that would be great to review and fix or remove before merging.While writing the code, I had some questions about the philosophy behind this class. I can't answer this question myself, since I certainly don't understand glaze's error-handling model and level of abstraction as well as @stephenberry or other contributors does, so I'm asking for help in the discussion. Should
http_headersbe treated as a low-level storage for header fields, or is it better to make API access immutable (in the form of a const view) and validate all user-provided headers during the constructor,.add(), and.replace()calls? I had three thoughts on this matter, two quite definitive, and one somewhere in the middle:.add()&.replace()methods currently do not validate the names and values of header fields in any way, meaning a user could potentially push through an RFC-invalid header, and during serialization we would serialize that invalid header (currently,.serialize()only checks whether the header name is empty, and I think that we must add full token validation of the entire header—both name and value—to serialize in case we reject the second option).serialize()can be omitted, since the contract of the class itself ensures that there is no possibility of passing through headers that violate the RFC format).fields()range API with a note in the documentation stating that modifying it may violate expected invariants.If the current class API works for you, please let me know, and I'll continue with the implementation and write the documentation for it. Also, if necessary, I can write a basic
parse_headersfunction (which, if desired and necessary, depending on the benchmark results - can be optimized) and replacestd::unordered_map<std::string, std::string>withhttp_headersthroughout the HTTP code.I removed rvalue overloading from some of the
http_headersmethods to prevent the use of APIs that return references, iterators,string_view, or lazy ranges/views on temporary objects.Simply put: to prevent users from accidentally writing code that compiles and looks fine but immediately creates a dangling lifetime.
I took the idea for
http_headersfrom my repository blazeauth/aero and refactored it, improving its API and adapting it to theglazestyle, as I have great respect forglazequality, and want to help make it even betterExamples (even more motivation 😁)
Print Set-Cookie
vs:
Print comma-separated header
vs