Skip to content

Add $rerank aggregation stage support#1963

Merged
rozza merged 5 commits into
mongodb:mainfrom
rozza:JAVA-6052
May 20, 2026
Merged

Add $rerank aggregation stage support#1963
rozza merged 5 commits into
mongodb:mainfrom
rozza:JAVA-6052

Conversation

@rozza
Copy link
Copy Markdown
Member

@rozza rozza commented May 6, 2026

Add $rerank aggregation stage support

Adds builder support for the $rerank pipeline stage (MongoDB 8.3, Atlas only).

API:

  • RerankQuery: query object with text shorthand or full Bson for future modalities
  • Aggregates.rerank(): 2 overloads (single path, multi path)
  • Scala wrappers and type alias in Aggregates.scala

JAVA-6052

@rozza rozza requested a review from Copilot May 6, 2026 15:06
@rozza rozza force-pushed the JAVA-6052 branch 2 times, most recently from c71c695 to 23de292 Compare May 6, 2026 15:22
Adds builder support for the $rerank pipeline stage (MongoDB 8.3, Atlas only).

API:
- RerankQuery: query object with text shorthand or full Bson for future modalities
- Aggregates.rerank(): 2 overloads (single path, multi path)
- Scala wrappers and type alias in Aggregates.scala

JAVA-6052
@rozza rozza marked this pull request as ready for review May 6, 2026 15:44
@rozza rozza requested a review from a team as a code owner May 6, 2026 15:44
@rozza rozza requested a review from nhachicha May 6, 2026 15:44
Copy link
Copy Markdown
Collaborator

@nhachicha nhachicha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two comments plus AI review below (most are nits & praises 👍 )

AI review (Opus 4.7, 1M with /driver-code-review skill)

Findings

driver-core/src/main/com/mongodb/client/model/RerankQuery.java

  • 🟡 [important] L154: Missing @Immutable annotation on a value-object class with only final fields
    Detail: Public client.model value objects in this driver are typically annotated with @Immutable (from com.mongodb.annotations) to document the thread-safety contract. RerankQuery qualifies (final
    class, all final fields). Add the annotation.

  • 🟡 [important] L155, L183-186: The Bson overload stores the caller's reference without eager conversion — caller can mutate after construction
    Detail: rerankQuery(Bson) stores the passed Bson reference directly. If a caller passes a mutable Document, mutating it later changes the serialized output of this RerankQuery (and any pipeline that
    captures it). The String overload is safe (it builds a BsonDocument internally). For the Bson overload, eagerly snapshot: return new RerankQuery(query.toBsonDocument(BsonDocument.class, DEFAULT_CODEC_REGISTRY)), or document the no-mutate contract explicitly.

  • 🟢 [nit] L154-197: No equals/hashCode on a public value-object class
    Detail: Not a blocker (matches VectorSearchBson precedent), but value-shaped public types often benefit from structural equality for users composing pipelines in maps/sets. Skip unless other builders in
    this area have it.

driver-core/src/main/com/mongodb/client/model/Aggregates.java

  • 🟡 [important] L55-66 (multi-path overload): numDocsToRerank range (1-1000) is documented in Javadoc but never enforced
    Detail: The Javadoc on both overloads says (1-1000), but there's no isTrueArgument check. Negative, zero, or >1000 values silently produce a stage the server will reject with an opaque error. Add
    isTrueArgument("numDocsToRerank must be between 1 and 1000", numDocsToRerank >= 1 && numDocsToRerank <= 1000); in the multi-path overload (the single-path overload delegates, so one place suffices). Confirm
    the exact server-side range before locking in the bounds.

  • 🟡 [important] L62-65 (multi-path overload): No per-element null check on paths
    Detail: notNull("paths", paths) and isTrueArgument(...!paths.isEmpty()) check the container, but asList("content", null) would produce a $rerank stage with a null entry. Validate each element, e.g.,
    paths.forEach(p -> notNull("paths element", p));.

  • 🟢 [nit] L85-87 (RerankBson constructor): paths stored without defensive copy
    Detail: A mutable list passed via the multi-path overload can be mutated later, affecting toBsonDocument output and toString. this.paths = new ArrayList<>(paths) (or List.copyOf(paths)) keeps the
    internal state stable. The single-path delegation already uses singletonList (immutable), so only the multi-path overload is affected.

  • 💡 [suggestion] L31, L55: Consider tightening numDocsToRerank to a typed/bounded helper or accepting long to avoid silent narrowing
    Detail: Optional. The int choice is consistent with similar builders (e.g., vectorSearch's limit), so this is purely a style question.

  • 🎉 [praise] L37-38: Clean delegation — single-path overload validates only its own parameter and forwards to the multi-path implementation, avoiding validation duplication.

  • 🎉 [praise] L92-95 (toBsonDocument): Smart serialization choice — emitting a bare string for single-path and an array for multi-path matches the server's expected shape exactly, while exposing two
    natural overloads at the API level.

driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java

  • 🟢 [nit] L262-339: No negative tests — null inputs, empty path list, out-of-range numDocsToRerank
    Detail: All four new tests verify happy-path serialization. There are no tests for rerank(null, "path", 25, "model"), empty paths list, or boundary numDocsToRerank values. Adding validation-failure
    tests would lock in the contract — especially if the range check is added per the important finding above.

  • 💡 [suggestion] L262-339: These tests don't touch a server — consider moving them to a unit test class
    Detail: assertPipeline is a pure serialization check. The driver convention is to keep server-dependent tests in functional/ and pure-serialization tests in unit/. Not blocking — matches what the
    existing testExactVectorSearchWithQueryObject and friends do in this same file.

  • 🎉 [praise] L302-319, L321-339: Good coverage of the Bson query overload, including the multi-modal case (imageURL) the API was designed to support — these tests double as documentation of the
    intended future modality use case.

driver-scala/src/main/scala/org/mongodb/scala/model/Aggregates.scala

  • 🎉 [praise] L370-399: Idiomatic Scala wrappers — Seq[String] parameter, .toList.asJava conversion, no get-prefixed methods, @Beta(Array(Reason.SERVER)) annotation syntax. No Java getter/setter
    patterns leaked into the Scala API surface.

driver-scala/src/main/scala/org/mongodb/scala/model/package.scala

  • 🎉 [praise] L990: Type alias correctly added so Scala users can refer to org.mongodb.scala.model.RerankQuery instead of the Java FQN. Placed appropriately near GeoNearOptions.

driver-scala/src/test/scala/org/mongodb/scala/model/AggregatesSpec.scala

  • 🟡 [important] L425: Scala test imports com.mongodb.client.model.RerankQuery directly, bypassing the Scala type alias
    Detail: import com.mongodb.client.model.RerankQuery defeats the purpose of the alias added in package.scala. The test should use org.mongodb.scala.model.RerankQuery (or rely on the existing
    org.mongodb.scala.model._ wildcard if present), so users copying this as a usage example get the idiomatic Scala import. Compare with how GeoNearOptions and similar are imported in nearby tests.

  • 🟢 [nit] L433-475: Only the text-query variant is exercised; the Bson query overload is not covered
    Detail: The Java functional tests cover rerankQuery(Bson) (the multi-modal use case), but the Scala spec does not. Adding one example using the Bson overload would complete parity and document the
    Scala-side calling convention for that overload.

Cross-cutting

  • 📚 [learning] Kotlin coverage: No Kotlin-specific changes are needed — driver-kotlin-coroutine and driver-kotlin-sync reuse Aggregates directly via Java interop, matching how vectorSearch is
    exposed. Worth a one-line mention in the PR description so reviewers don't go looking.

  • 📚 [learning] Binary compatibility: All changes are pure additions — new public methods, a new public final class, new Scala wrappers, a new Scala type alias. No existing signatures changed. No
    binary-compatibility concerns.

  • 🟢 [nit] @since 5.8 consistency: Verify 5.8 matches the upcoming release version (the @mongodb.server.release 8.3 tag already documents the server requirement). If 5.8 has already shipped, this
    needs to be bumped.

Summary

The PR is a clean, well-structured addition that follows the driver's established patterns for $-stage builders (Bson-returning static factories on Aggregates, a final value-object query class, parallel
Scala wrappers with a type alias, @Beta(Reason.SERVER) gating). The two issues most worth fixing before merge are runtime enforcement of the documented numDocsToRerank 1-1000 range — currently the
constraint exists only in Javadoc — and the Scala test importing RerankQuery from the Java package directly, which defeats the type alias and sets a misleading example for Scala users. The missing
@Immutable annotation, lack of null-element validation on paths, and absence of defensive copies for the wrapped Bson query and paths list are also worth addressing while the API is still under
@Beta.

Comment thread driver-core/src/main/com/mongodb/client/model/Aggregates.java
Comment thread driver-core/src/main/com/mongodb/client/model/Aggregates.java
@rozza rozza requested a review from nhachicha May 19, 2026 13:24
@rozza
Copy link
Copy Markdown
Member Author

rozza commented May 19, 2026

For the important notes via code review:

 driver-core/src/main/com/mongodb/client/model/RerankQuery.java                                                                                                                                                
  🟡 [important] L154: Missing @Immutable annotation on a value-object class with only final fields                                                                                                             
  Detail: Public client.model value objects in this driver are typically annotated with @Immutable (from com.mongodb.annotations) to document the thread-safety contract. RerankQuery qualifies (final          
  class, all final fields). Add the annotation.                                                                                                                                                                 
                                                                                                                                                                                                                
  🟡 [important] L155, L183-186: The Bson overload stores the caller's reference without eager conversion — caller can mutate after construction                                                                
  Detail: rerankQuery(Bson) stores the passed Bson reference directly. If a caller passes a mutable Document, mutating it later changes the serialized output of this RerankQuery (and any pipeline that        
  captures it). The String overload is safe (it builds a BsonDocument internally). For the Bson overload, eagerly snapshot: return new RerankQuery(query.toBsonDocument(BsonDocument.class,                     
  DEFAULT_CODEC_REGISTRY)), or document the no-mutate contract explicitly.  
  
⏺ Good points. Let me check how the codebase handles these patterns:

  Searched for 7 patterns, read 1 file (ctrl+o to expand)

⏺ Here's my assessment on those two review comments:

  1. @Immutable annotation: I'd say this is a false positive. Looking at the codebase, @Immutable is only used in the geojson sub-package (on Position, CoordinateReferenceSystem, etc.) — it's not a pattern
  applied to Bson-implementing value objects in the client.model package. VectorSearchOptions, GeoNearOptions, and other similar classes don't use it. Adding it to RerankQuery alone would be inconsistent. I'd
   dismiss this one.
  
  2. Storing the Bson reference without eager conversion: This one is more nuanced but I'd also lean toward not acting on it. The pattern of storing a Bson reference and calling toBsonDocument at
  serialization time is pervasive throughout the driver (filters, projections, sorts — all the builder classes store Bson references). Eagerly snapshotting would:
  - Require a CodecRegistry at construction time (which isn't available)
  - Break the lazy evaluation model the driver uses everywhere
  - Be inconsistent with how every other builder in the codebase works

  The implicit contract is already "don't mutate objects after passing them to builders" — this is standard Java builder library behavior (same as passing a List to Collections.unmodifiableList wrapper). The
  VectorSearchBson class similarly stores its queryVector object reference without copying.

  TL;DR: Both findings are technically correct in isolation but don't match the conventions of this codebase. I'd dismiss both.
❯ 🟡 [important] L62-65 (multi-path overload): No per-element null check on paths                                                                                                                               
  Detail: notNull("paths", paths) and isTrueArgument(...!paths.isEmpty()) check the container, but asList("content", null) would produce a $rerank stage with a null entry. Validate each element, e.g.,        
  paths.forEach(p -> notNull("paths element", p));.                                                                                                                                                             

  Searched for 2 patterns (ctrl+o to expand)

⏺ I'd dismiss this one too. The driver doesn't do per-element null checking on list parameters anywhere in the client.model builders — Aggregates.unset(List<String>), Projections.include(List<String>), etc.
  all just check the container. A null element would eventually surface as a server error or NPE at serialization time, which is the same contract the rest of the API follows.

  Adding per-element validation to rerank alone would be inconsistent with the rest of the codebase.

@nhachicha
Copy link
Copy Markdown
Collaborator

@copilot review

@nhachicha nhachicha requested review from Copilot and removed request for Copilot May 19, 2026 15:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comment on lines +1044 to +1047
/**
* Creates a {@code $rerank} pipeline stage supported by MongoDB Atlas.
* You may use the {@code $meta: "score"} expression to extract the relevance score
* assigned to each reranked document.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No public page yet.

Comment thread driver-core/src/main/com/mongodb/client/model/Aggregates.java Outdated
Comment thread driver-core/src/main/com/mongodb/client/model/RerankQuery.java
Comment on lines +750 to +754
/**
* Creates a `\$rerank` pipeline stage supported by MongoDB Atlas.
* You may use the `\$meta: "score"` expression to extract the relevance score
* assigned to each reranked document.
*
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No public page yet.

Comment thread driver-core/src/main/com/mongodb/client/model/Aggregates.java
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@rozza rozza merged commit 028fec3 into mongodb:main May 20, 2026
53 checks passed
@rozza rozza deleted the JAVA-6052 branch May 20, 2026 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants