Skip to content

No ticket: Clarify when the zigzag encoding is used [skip ci]#3503

Open
abellgithub wants to merge 1 commit into
apache:masterfrom
abellgithub:compact-correction
Open

No ticket: Clarify when the zigzag encoding is used [skip ci]#3503
abellgithub wants to merge 1 commit into
apache:masterfrom
abellgithub:compact-correction

Conversation

@abellgithub
Copy link
Copy Markdown

Client: all

  • Did you create an Apache Jira ticket? (Request account here, not required for trivial changes)
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

@abellgithub abellgithub requested a review from Jens-G as a code owner May 20, 2026 16:41
possible to handle unknown fields while decoding by ignoring them. The field type is used to determine how to decode field values.

Note that the field name is not encoded so field renames in the IDL do not affect forward and backward compatibility.
Note that the fields are identified by their integer value and not thier name.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thier name?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK to me, but you're welcome to change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not "their name"?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, spelling. Will fix.

@Jens-G Jens-G changed the title No ticket: Clarify when the zigzag encoding is used. No ticket: Clarify when the zigzag encoding is used [skip ci] May 20, 2026
@Jens-G
Copy link
Copy Markdown
Member

Jens-G commented May 20, 2026

Code review

Found 1 issue:

  1. Double encoding description contradicts the document's own i64 definition — the new "Integer encoding" section defines i16/i32/i64 as ZigZag-encoded and then varint-encoded. Saying doubles are "encoded as if they were i64" implies they go through the same ZigZag+varint pipeline, which is incorrect — doubles are written as 8 raw little-endian bytes with no ZigZag and no varint transformation.

Values of type `i8` are encoded as one byte.
Values of type `i16`, `i32` and `i64` are first transformed to using [ZigZag](#zigzag-encoding)
and that result is then [varint](#varint-encoding) encoded before being written.
### Enum encoding
`Enum`s are encoded as their integer value as defined by the Thrift IDL.
### Binary encoding
Binary is sent as follows:
```
Binary protocol, binary data, 1+ bytes:
+--------+...+--------+--------+...+--------+
| byte length | bytes |
+--------+...+--------+--------+...+--------+
```
Where:
* `byte length` is the length of the byte array (non-negative), encoded as a
[varint](#varint-encoding) (**not** [ZigZag'ed](#zigzag-encoding)).
* `bytes` are the bytes of the byte array.
### String encoding
Strings are written as [binary](#binary-encoding) and must be encoded as UTF-8. Strings do
not include a NULL(0) terminator.
### Double encoding
Values of type `double` are converted to little-endian and then encoded as if they were `i64` (the bytes are treated as the bytes of an 64-bit integer).

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@Jens-G Jens-G added the doc label May 20, 2026
@abellgithub
Copy link
Copy Markdown
Author

abellgithub commented May 20, 2026

I don't see any way to respond explicitly to the comment about doubles. A look at the code shows that I misinterpreted the old language. I will edit.

@abellgithub
Copy link
Copy Markdown
Author

Actually, the old language talking about int64 seems pointless. The bytes of a double seem to be written directly in little-endian byte order -- any reference to i64 would strike me as confusing (and was probably why I was confused when I read the old language).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants