Optimize RawBsonDocument encode and decode #1913
Conversation
…e allocations - Add BsonWriter.pipe(byte[], int, int) with BsonBinaryWriter override to write raw BSON bytes directly to the output, avoiding intermediate object allocation on the encode path - Add BsonInput.pipe(BsonOutput, int) to remove the temporary byte[] copy in BsonBinaryWriter.pipeDocument() on both encode and decode paths - Add public getByteBacking(), getByteOffset(), getByteLength() on RawBsonDocument to expose the backing byte array JAVA-6133
- Removes pipe(byte[], int, int) from BsonWriter interface to avoid coupling it to concrete IO classes; dispatches via instanceof BsonBinaryWriter in the codec instead - Renames getByteBacking() to getBackingArray() for clarity - Validates minimum BSON document size before writing raw bytes, consistent with the reader-based pipe path - Adds tests for the raw-byte pipe happy path and invalid-size rejection JAVA-6133
| byte[] bytes = new byte[size - 4]; | ||
| bsonInput.readBytes(bytes); | ||
| bsonOutput.writeBytes(bytes); |
There was a problem hiding this comment.
This avoids an extra temporary byte[] allocation by reading directly into the target buffer, reducing both allocation pressure and byte-copy/processing overhead.
| @Override | ||
| public void pipe(final byte[] bytes, final int offset, final int length) { |
There was a problem hiding this comment.
Instead of routing through a BsonReader (which wraps a BsonInput but doesn’t expose the underlying array unless copied), we write the bytes directly to the output.
There was a problem hiding this comment.
Pull request overview
This PR optimizes RawBsonDocument encode/decode paths by enabling direct piping of raw BSON bytes between inputs/writers/outputs, reducing intermediate allocations and copies.
Changes:
- Added
BsonBinaryWriter.pipe(byte[], int, int)to write raw BSON document bytes directly to the output. - Added
BsonInput.pipe(BsonOutput, int)and implemented it inByteBufferBsonInputto avoid temporary byte array copies during piping. - Exposed
RawBsonDocumentbacking byte array, offset, and length via new public accessors, with accompanying tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| bson/src/test/unit/org/bson/RawBsonDocumentSpecification.groovy | Adds coverage for new RawBsonDocument backing-array/offset/length accessors. |
| bson/src/test/unit/org/bson/BsonBinaryWriterTest.java | Adds tests for piping raw BSON bytes and invalid-size handling. |
| bson/src/main/org/bson/RawBsonDocument.java | Adds public accessors to expose the backing byte array, offset, and length. |
| bson/src/main/org/bson/io/ByteBufferBsonInput.java | Implements BsonInput.pipe with an array-backed fast path. |
| bson/src/main/org/bson/io/BsonInput.java | Introduces the new pipe(BsonOutput, int) API on BsonInput. |
| bson/src/main/org/bson/codecs/RawBsonDocumentCodec.java | Uses a fast-path to pipe raw bytes when the writer is a BsonBinaryWriter. |
| bson/src/main/org/bson/BsonWriter.java | Minor formatting cleanup. |
| bson/src/main/org/bson/BsonBinaryWriter.java | Implements raw-byte piping and refactors pipe-document completion logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
| public void encode(final BsonWriter writer, final RawBsonDocument value, final EncoderContext encoderContext) { | ||
| try (BsonBinaryReader reader = new BsonBinaryReader(new ByteBufferBsonInput(value.getByteBuffer()))) { | ||
| writer.pipe(reader); | ||
| if (writer instanceof BsonBinaryWriter) { | ||
| // Fast path. The pipe method should ideally exist on BsonWriter, but adding it as | ||
| // abstract would be a breaking change, and adding it as a default method would force | ||
| // BsonWriter to depend on BsonBinaryReader/ByteBufferBsonInput, violating the | ||
| // interface's abstraction. | ||
| // TODO JAVA-6211 move pipe(byte[], int, int) to BsonWriter to remove this instanceof. | ||
| ((BsonBinaryWriter) writer).pipe(value.getBackingArray(), value.getByteOffset(), value.getByteLength()); | ||
| } else { |
| @Test | ||
| void defaultPipeShouldCopyBytesFromInputToOutput() { | ||
| // given | ||
| byte[] inputBytes = {0x4a, 0x61, 0x76, 0x61, 0x21}; |
There was a problem hiding this comment.
can we do
"Java!".getBytes(StandardCharsets.UTF_8);
instead ? Same thing but for me personally it's really hard to reason about raw bytes , what do you think ?
|
|
||
| @Test | ||
| public void testPipeOfRawBytesWithInvalidSize() { | ||
| byte[] bytes = {4, 0, 0, 0}; // minimum document size is 5 |
There was a problem hiding this comment.
I saw the validation for size of 5, just curious why is it 5 ?
| } | ||
|
|
||
|
|
||
| def 'getBackingArray, getByteOffset and getByteLength should expose the document range'() { |
There was a problem hiding this comment.
I might be mistaken but I saw a lot of PRs that also remove groovy spec class, can we move this test case to java test instead ?
| @Test | ||
| void defaultPipeShouldCopyPartialBytesFromInputToOutput() { | ||
| // given | ||
| byte[] inputBytes = {0x4a, 0x61, 0x76, 0x61, 0x21}; |
There was a problem hiding this comment.
Performance analyzer: link
JAVA-6133