Merged
6 changes: 3 additions & 3 deletions clients/src/main/java/org/apache/kafka/common/Uuid.java
@@ -70,12 +70,12 @@ private static Uuid unsafeRandomUuid() {

     /**
      * Static factory to retrieve a type 4 (pseudo randomly generated) UUID.
-     *
-     * This will not generate a UUID equal to 0, 1, or one whose string representation starts with a dash ("-")
+     * <p>
+     * This will not generate a UUID equal to 0, 1, or one whose string representation contains a dash ("-").
      */
     public static Uuid randomUuid() {
         Uuid uuid = unsafeRandomUuid();
-        while (RESERVED.contains(uuid) || uuid.toString().startsWith("-")) {
+        while (RESERVED.contains(uuid) || uuid.toString().contains("-")) {
Contributor

@squah-confluent squah-confluent Jan 16, 2026

Note that the probability of rejecting a generated uuid rises from 1.56% (= 1/64) to 30.5% (= 1 - (63/64)**19 * 15/16, taking into account fixed bits in version 4 uuids). The new p99 number of rejections is ~3.88 (was 1.11) which I think is not too bad.
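
The figures in this comment can be reproduced with a short Java sketch. It assumes that `Uuid.toString()` yields 22 characters of unpadded URL-safe base64, takes the `(63/64)**19 * 15/16` term from the comment as given, and models retries as independent draws, so the p99 value is the real-valued `k` solving `p^k = 0.01`. The class and method names are hypothetical and not part of Kafka.

```java
// Hypothetical helper, not part of Kafka: reproduces the rejection odds quoted above.
final class DashRejectionOdds {

    /** Old check: only the first character is tested, and it is '-' with probability 1/64. */
    static double rejectStartsWith() {
        return 1.0 / 64.0;
    }

    /**
     * New check: any character may be '-'. Uses the formula from the comment:
     * 19 unconstrained characters (63/64 each to avoid '-') and one character
     * whose fixed version-4 bits leave a 15/16 chance of avoiding '-'.
     */
    static double rejectContains() {
        return 1.0 - Math.pow(63.0 / 64.0, 19) * (15.0 / 16.0);
    }

    /**
     * Assumes independent retries: P(at least k rejections) = p^k.
     * Returns the real-valued k at which that probability falls to 1%.
     */
    static double p99Rejections(double p) {
        return Math.log(0.01) / Math.log(p);
    }

    public static void main(String[] args) {
        double before = rejectStartsWith();
        double after = rejectContains();
        System.out.printf("startsWith: p=%.4f, p99 rejections=%.2f%n", before, p99Rejections(before));
        System.out.printf("contains:   p=%.4f, p99 rejections=%.2f%n", after, p99Rejections(after));
    }
}
```

Under these assumptions the computed values round to the 1.56% / 30.5% rejection rates and the 1.11 / 3.88 p99 rejection counts quoted above.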

Member

Thanks for doing the math :)

Member

30% failure rate for a general utility class is really high and I don't think it makes sense. Uuids are occasionally used in cases where performance matters. Also, the reasoning for the change is unclear.

Member

@mumrah mumrah Jun 1, 2026

Thanks, @ijuma -- that's a very good point about performance! I did check through fetch/produce paths originally, but of course that doesn't protect us from future usages (or things not caught during the review).

I'll file a patch shortly that makes this new behavior opt-in.

As to the reasoning, the original Jira I wrote was about issues with double-clicking on IDs. Really it's just meant to be a small quality of life improvement.

Member

@mumrah mumrah Jun 1, 2026

Addressed in #22442

             uuid = unsafeRandomUuid();
         }
         return uuid;
@@ -83,7 +83,7 @@ public void testRandomUuid() {

         assertNotEquals(Uuid.ZERO_UUID, randomID);
         assertNotEquals(Uuid.METADATA_TOPIC_ID, randomID);
-        assertFalse(randomID.toString().startsWith("-"));
+        assertFalse(randomID.toString().contains("-"));
     }

@Test
16 changes: 16 additions & 0 deletions docs/design/protocol.md
@@ -215,3 +215,19 @@ Others have asked if maybe we shouldn't support many different protocols. Prior
Another question is why we don't adopt XMPP, STOMP, AMQP or an existing protocol. The answer to this varies by protocol, but in general the problem is that the protocol does determine large parts of the implementation and we couldn't do what we are doing if we didn't have control over the protocol. Our belief is that it is possible to do better than existing messaging systems have in providing a truly distributed messaging system, and to do this we need to build something that works differently.

A final question is why we don't use a system like Protocol Buffers or Thrift to define our request messages. These packages excel at helping you manage lots and lots of serialized messages. However, we have only a few messages. Support across languages is somewhat spotty (depending on the package). The mapping between binary log format and wire protocol is something we manage somewhat carefully, and this would not be possible with these systems. Finally, we prefer the style of versioning APIs explicitly and checking this to inferring new values as nulls, as it allows more nuanced control of compatibility.

## Recommendations for 3rd‑party Clients: Member ID Format

When a Kafka client participates in group protocols (e.g., `ConsumerGroupHeartbeat` RPC), it must generate a **member ID** to identify itself to the broker. While the protocol does not strictly enforce the format of this ID, we strongly recommend the following:

1. **Use a base64‑encoded UUID** as the member ID.
2. **Encode the UUID using URL‑safe base64** (without `+` or `/` characters).
3. **Omit hyphens** — the resulting string should be a continuous sequence of alphanumeric characters (e.g., `abc123def456`).

**Example**
The all-zero UUID (`00000000-0000-0000-0000-000000000000`), encoded from its 16 raw bytes as URL‑safe base64 with padding omitted (as Kafka's own `Uuid` does), becomes: `AAAAAAAAAAAAAAAAAAAAAA`

*(Note: This is illustrative; actual encoding depends on the UUID bytes.)*

**Important**
While this is a strong recommendation, the protocol does **not** reject member IDs that deviate from this format.
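
One way a third-party Java client might follow these recommendations is sketched below. This is a minimal illustration, not an official client API: `MemberIdSketch`, `encode`, and `newMemberId` are hypothetical names. It also rests on two assumptions: that the encoding covers the UUID's 16 raw bytes with padding omitted, and that "omit hyphens" means the encoded string must not contain `-`, which mirrors the retry loop in `Uuid.randomUuid()` from this change.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical sketch for third-party clients; not part of the Kafka client library.
final class MemberIdSketch {

    /** Encodes the 16 raw bytes of a UUID as URL-safe base64 without padding (22 characters). */
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    /** Draws random UUIDs until the encoded form contains no '-' character. */
    static String newMemberId() {
        String id = encode(UUID.randomUUID());
        while (id.contains("-")) {
            id = encode(UUID.randomUUID());
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(encode(new UUID(0L, 0L)));
        System.out.println(newMemberId());
    }
}
```

The URL-safe alphabet replaces `+` and `/` with `-` and `_`, so the result can still contain underscores. The retry loop only removes `-`.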