Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions .github/workflows/vector-plugins.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ jobs:
strategy:
fail-fast: false
matrix:
java: [ '17' ]
java: [ '17', '21', '25' ]
codes: [ 'uncompressed' ]
name: Build Parquet with JDK ${{ matrix.java }} and ${{ matrix.codes }}

Expand All @@ -46,12 +46,19 @@ jobs:
run: |
EXTRA_JAVA_TEST_ARGS=$(./mvnw help:evaluate -Dexpression=extraJavaTestArgs -q -DforceStdout)
export MAVEN_OPTS="$MAVEN_OPTS $EXTRA_JAVA_TEST_ARGS"
./mvnw install --batch-mode -Pvector-plugins -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true -Djava.version=${{ matrix.java }} -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
./mvnw install --batch-mode -Pvector-plugins -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true -Dspotless.check.skip=true -Djava.version=${{ matrix.java }} -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use similar pattern like SPOTLESS_ARGS below?

- name: verify
env:
TEST_CODECS: ${{ matrix.codes }}
JAVA_VERSION: ${{ matrix.java }}
run: |
EXTRA_JAVA_TEST_ARGS=$(./mvnw help:evaluate -Dexpression=extraJavaTestArgs -q -DforceStdout)
export MAVEN_OPTS="$MAVEN_OPTS $EXTRA_JAVA_TEST_ARGS"
./mvnw verify --batch-mode -Pvector-plugins javadoc:javadoc -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
# Spotless check uses palantir-java-format which relies on internal javac APIs
# that are not available on all JDK versions (e.g. JDK 25+). Since the formatting
# result is JDK-independent, running the check on JDK 17 alone is sufficient.
SPOTLESS_ARGS=""
if [ "${{ matrix.java }}" != "17" ]; then
SPOTLESS_ARGS="-Dspotless.check.skip=true"
fi
./mvnw verify --batch-mode -Pvector-plugins javadoc:javadoc $SPOTLESS_ARGS -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
*.class
.project
.classpath
.factorypath
.settings
target
# Package Files #
Expand All @@ -20,4 +21,3 @@ target/
mvn_install.log
.vscode/*
.DS_Store

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this blank line

Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(BYTE_SPECIES_64, in, inPos, in.order());
ByteVector byteVector = fromByteBuffer(BYTE_SPECIES_64, in, inPos);
ShortVector tempRes = byteVector
.castShape(SHORT_SPECIES_512, 0)
.reinterpretAsBytes()
Expand Down Expand Up @@ -260,7 +260,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(BYTE_SPECIES, in, inPos, in.order());
ByteVector byteVector = fromByteBuffer(BYTE_SPECIES, in, inPos);
ShortVector tempRes = byteVector
.castShape(LONG_SPECIES, 0)
.reinterpretAsBytes()
Expand Down Expand Up @@ -377,9 +377,8 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B128, in, inPos, in.order())
.castShape(S512, 0)
.reinterpretAsBytes();
ByteVector byteVector =
fromByteBuffer(B128, in, inPos, inp_mask).castShape(S512, 0).reinterpretAsBytes();
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -466,7 +465,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(BSPECIES, in, inPos, in.order());
ByteVector byteVector = fromByteBuffer(BSPECIES, in, inPos);
ShortVector tempRes = byteVector
.castShape(ISPECIES, 0)
.reinterpretAsBytes()
Expand Down Expand Up @@ -582,9 +581,8 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B256, in, inPos, in.order(), inp_mask)
.castShape(S512, 0)
.reinterpretAsBytes();
ByteVector byteVector =
fromByteBuffer(B256, in, inPos, inp_mask).castShape(S512, 0).reinterpretAsBytes();

ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
Expand Down Expand Up @@ -705,9 +703,8 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B256, in, inPos, in.order(), inp_mask)
.castShape(S512, 0)
.reinterpretAsBytes();
ByteVector byteVector =
fromByteBuffer(B256, in, inPos, inp_mask).castShape(S512, 0).reinterpretAsBytes();

ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
Expand Down Expand Up @@ -827,9 +824,8 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B256, in, inPos, in.order(), inp_mask)
.castShape(S512, 0)
.reinterpretAsBytes();
ByteVector byteVector =
fromByteBuffer(B256, in, inPos, inp_mask).castShape(S512, 0).reinterpretAsBytes();
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -914,7 +910,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order());
ByteVector byteVector = fromByteBuffer(B512, in, inPos);
byteVector
.castShape(ISPECIES, 0)
.lanewise(VectorOperators.AND, 255)
Expand Down Expand Up @@ -1004,7 +1000,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1084,7 +1080,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order());
ByteVector byteVector = fromByteBuffer(B512, in, inPos);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1194,7 +1190,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1280,7 +1276,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order());
ByteVector byteVector = fromByteBuffer(B512, in, inPos);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1388,7 +1384,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1512,7 +1508,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1630,7 +1626,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
ShortVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsShorts()
Expand Down Expand Up @@ -1703,7 +1699,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
ShortVector shortVector = byteVector.reinterpretAsShorts();
shortVector
.castShape(I512, 0)
Expand Down Expand Up @@ -1783,7 +1779,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -1866,7 +1862,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -1944,7 +1940,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2022,7 +2018,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2102,7 +2098,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2182,7 +2178,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2261,7 +2257,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2332,7 +2328,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 =
byteVector.rearrange(perm_mask0).reinterpretAsInts().lanewise(VectorOperators.AND, 16777215);
tempRes1.intoArray(out, outPos, out_mask);
Expand Down Expand Up @@ -2407,7 +2403,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2486,7 +2482,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2603,7 +2599,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2718,7 +2714,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2832,7 +2828,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -2960,7 +2956,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -3089,7 +3085,7 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector
.rearrange(perm_mask0)
.reinterpretAsInts()
Expand Down Expand Up @@ -3175,13 +3171,30 @@ public final void unpackValuesUsingVector(final byte[] in, final int inPos, fina

public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos, in.order(), inp_mask);
ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 = byteVector.rearrange(perm_mask0).reinterpretAsInts();

tempRes1.intoArray(out, outPos, out_mask);
}
}

private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on this, but Codex complains as below:

The new fromByteBuffer(species, input, inPos, mask) helper changes the contract of the old JDK 17 masked load. Per the JDK 17 ByteVector.fromByteBuffer(..., m) docs, only masked-on lanes must be in bounds, but this fallback always copies species.length() bytes for direct buffers before applying the mask. That breaks packers whose mask intentionally covers fewer bytes than the vector width, such as Packer3 (B128 with a 12-byte mask): a direct ByteBuffer with exactly the required packed bytes now throws BufferUnderflowException, even though the old implementation and the array-backed path both accept it. The fix needs to preserve masked-load bounds semantics instead of unconditionally reading the whole vector window.

return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}
Comment on lines +3203 to +3214
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the masked overload, readInputBytes(..., mask.trueCount()) creates a backing array smaller than species.length(). ByteVector.fromArray(species, array, 0, mask) may still perform bounds checks assuming array.length >= species.length() (and also breaks immediately if a mask ever has a true lane at an index >= array.length). To keep masked loads safe across JDK implementations, allocate an array of species.length() and only fill the needed prefix bytes (e.g., mask.trueCount()), leaving the rest as zero.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this was wrong. mask.trueCount() only happens to be safe at the call sites in this file because every mask is built as a contiguous-prefix mask, but ByteVector.fromArray(species, array, 0, mask) requires array.length >= species.length() for any mask shape, and the masked overload of OpenJDK 17's ByteVector.fromByteBuffer actually loads the full window unconditionally and then blends with the mask — not a prefix copy. The fix in dffb6e6 is exactly that: always allocate species.length() bytes and let fromArray(..., mask) apply the mask, matching JDK semantics for arbitrary masks. Verified the AVX-512 unpack tests still pass on JDK 17 / 21 / 25.

Comment on lines +3189 to +3214
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fromByteBuffer(...) helpers introduce a distinct fallback path for non-array-backed buffers (!input.hasArray()), including ByteBuffer.allocateDirect() and read-only heap buffers. The existing vector unpack tests only cover heap ByteBuffer.wrap(byte[]), so this direct/read-only path is currently untested. Add a unit test that runs unpackValuesUsingVector(ByteBuffer, ...) against a direct (and optionally read-only) buffer to validate correctness and protect against regressions in readInputBytes/masked loads on newer JDKs.

Copilot uses AI. Check for mistakes.

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
byte[] bytes = new byte[byteCount];
ByteBuffer source = input.duplicate();
source.position(inPos);
source.get(bytes);
Comment on lines +3189 to +3220
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readInputBytes allocates a new byte[] and creates a new ByteBuffer via duplicate() on every vector unpack from a ByteBuffer. This is a significant per-call allocation/copy regression compared to the previous ByteVector.fromByteBuffer(...) path and can materially impact the performance benefits of the vector plugin. Consider reducing allocations by reusing a scratch buffer (e.g., via ThreadLocal<byte[]> sized to the max species length) and/or fast-pathing heap buffers (input.hasArray()) to load directly from the backing array when possible.

Suggested change
private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}
private static ByteVector fromByteBuffer(
VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}
private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
byte[] bytes = new byte[byteCount];
ByteBuffer source = input.duplicate();
source.position(inPos);
source.get(bytes);
private static final ThreadLocal<byte[]> INPUT_SCRATCH =
ThreadLocal.withInitial(() -> new byte[B512.length()]);
private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
if (input.hasArray()) {
return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos);
}
return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}
private static ByteVector fromByteBuffer(
VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
if (input.hasArray()) {
return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos, mask);
}
return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}
private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
byte[] bytes = INPUT_SCRATCH.get();
for (int i = 0; i < byteCount; i++) {
bytes[i] = input.get(inPos + i);
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

@iemejia iemejia Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the regression concern, and the heap-buffer fast path is in dffb6e6input.hasArray() returns ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos[, mask]) directly, which HotSpot intrinsifies to the same vmovdqu64 AVX-512 load as the original fromByteBuffer. Since BytePacker call sites in parquet-mr use ByteBuffer.wrap(byte[]) (heap), this fast path covers the dominant production case with no copy and no duplicate().

I'd like to push back on the ThreadLocal<byte[]> part:

  1. It only eliminates the byte[] allocation, not the ByteBuffer.duplicate(), which is the larger of the two costs on the direct-buffer path. To eliminate both you'd need a thread-local ByteBuffer view as well, at which point the helper is doing more bookkeeping than the load itself.
  2. The byte[] is 8–64 bytes, TLAB-allocated, never escapes the helper, and is a strong escape-analysis candidate — HotSpot will frequently scalar-replace it on hot paths. ThreadLocal.get() (~5 ns) never scalarizes, so on the common JIT-hot case the ThreadLocal is a net loss.
  3. ThreadLocal of a mutable byte[] carries the usual operational risks in long-lived JVMs (thread-pool retention, classloader leak surface) for code that is explicitly throwaway: dffb6e6 adds a TODO marking ByteVector.fromMemorySegment (JEP 454, JDK 22+) as the proper replacement, which removes the copy entirely for both heap and direct buffers via ScopedMemoryAccess and is intrinsifiable. Once the project's minimum JDK reaches 22, both helpers and readInputBytes go away.

So the change keeps the direct-buffer path as a simple byte[] + duplicate() fallback (matching the previous PR's behavior on that path), adds the heap fast path you suggested for the dominant case, and defers further optimization to the fromMemorySegment migration tracked in the TODO.

return bytes;
}

private static void notSupport() {
throw new RuntimeException(
"ByteBitPacking512VectorLE doesn't support the function, please use ByteBitPackingLE!");
Expand Down