JAVA-6194 Add MongoSocksProxyException for CMAP backpressure labeling#1968
JAVA-6194 Add MongoSocksProxyException for CMAP backpressure labeling#1968nhachicha wants to merge 42 commits into
Conversation
…ABEL` (mongodb#1926) This commit only adds the labels, and does not fully implement the tickets specified below. The reason there are four JAVA tickets specified is that the is a single specification commit that resolved the four corresponding DRIVERS tickets. All of these JAVA tickets have to be done together. The relevant spec changes: - https://github.com/mongodb/specifications/blame/ba14b6bdc1dc695aa9cc20ccf9378592da1b2329/source/client-backpressure/client-backpressure.md#L52-L80 - it's a subset of [DRIVERS-3239, DRIVERS-3411, DRIVERS-3370, DRIVERS-3412: Client backpressure (mongodb#1907)](mongodb/specifications@1125200) JAVA-5956, JAVA-6117, JAVA-6113, JAVA-6119
…b#1929) The relevant spec changes: - https://github.com/mongodb/specifications/blame/ba14b6bdc1dc695aa9cc20ccf9378592da1b2329/source/retryable-writes/tests/README.md#L265-L418 - See also https://jira.mongodb.org/browse/DRIVERS-3432 for the phrasing fixes for "Test 3 Case 3" JAVA-6055
JAVA-6141
- Deprioritize sharded clusters on any error, all other topologies only on SystemOverloadedError. - Pass ClusterType to updateCandidate so onAttemptFailure can distinguish topology types. - Add retryable reads prose tests 3.1 and 3.2. - Change ServerSelectionSelectionTest to use BaseCluster server selection chain. JAVA-6105 JAVA-6021 JAVA-6074 --------- Co-authored-by: Valentin Kovalenko <[email protected]> Co-authored-by: Ross Lawley <[email protected]>
The relevant spec changes: - https://github.com/mongodb/specifications/blob/8a8a7c56429c80b51ec62268dcafc5e5e3c477ef/source/client-backpressure/tests/README.md JAVA-5956, JAVA-6117, JAVA-6113, JAVA-6119, JAVA-6141
- Add enableOverloadRetargeting boolean option to MongoClientSettings and ConnectionString to allow the driver to route requests to a different replica set member on retries when the previously used server is overloaded - Add prose test 3.3 to verify that overload errors are retried on the same server when retargeting is disabled JAVA-6167 --------- Co-authored-by: Ross Lawley <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR introduces a dedicated MongoSocksProxyException to represent SOCKS5 proxy connection/handshake failures (with phase + optional RFC 1928 reply code), and updates the SOCKS implementation/tests to throw and assert this new exception type. This supports more precise classification/handling of SOCKS-related connection failures (per JAVA-6194).
Changes:
- Added
MongoSocksProxyException(withHandshakePhaseand optional proxy reply code). - Updated
SocksSocketandSocketStreamto throw/propagateMongoSocksProxyExceptionfor SOCKS negotiation/auth/connect failures and proxy TCP connect failures. - Updated/added tests to validate the new exception type and its phase/reply-code tagging.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| driver-sync/src/test/functional/com/mongodb/client/Socks5ProseTest.java | Updates prose test assertions to expect MongoSocksProxyException during SOCKS auth failures. |
| driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java | Adds unit tests validating handshake phase tagging and CONNECT reply-code tagging. |
| driver-core/src/main/com/mongodb/MongoSocksProxyException.java | Adds new public exception type carrying SOCKS handshake phase and optional RFC 1928 reply code. |
| driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java | Replaces some SOCKS protocol failures with MongoSocksProxyException and adds helper to build a ServerAddress. |
| driver-core/src/main/com/mongodb/internal/connection/SocketStream.java | Wraps proxy-enabled IO failures as MongoSocksProxyException and rethrows SOCKS exceptions directly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2d1b2e1 to
11a7b73
Compare
…re/socks5_exception
- Delegate single-arg constructor to four-arg variant - Clarify Javadoc that proxyReplyCode may be null - Remove duplicate blank lines between constructors
MongoSocksProxyException extends RuntimeException and bypassed the existing catch (SocketException) block, leaking the underlying proxy TCP socket on every SOCKS5 protocol failure (negotiation/auth/relay).
Defensive: avoids any reverse-DNS risk in error paths if the unresolved invariant on remoteAddress is ever weakened.
- Close inner socket on failure inside initializeSocketOverSocksProxy and initializeSslSocketOverSocksProxy so that failures before this.socket is assigned do not leak the underlying file descriptor. - Hoist translateInterruptedException out of the two orElseThrow branches.
Server thread now blocks on the client closing the connection instead of guessing how long the SOCKS exchange will take. Removes a CI flake under load.
Port 1 (tcpmux) is unassigned on most systems but not guaranteed. Binding then releasing an ephemeral port gives a reliably-closed port.
The previous fix used InputStream.transferTo (Java 9+) and OutputStream.nullOutputStream (Java 11+), which CI on Java 8 surfaced as NoSuchMethodError. The test source set is compiled by the Groovy compiler (because src/test/unit contains both Java and Groovy), which does not honor options.release.set(8) — Java 11 API calls slipped past compile-time validation but failed at link time on Java 8. Replace with a plain read-into-discard-buffer loop. Same semantics (blocks until client closes), Java 8 source compatible.
ApiAliasAndCompanionSpec discovers every public subtype of MongoException in the Java driver and asserts each one has a corresponding Scala alias. MongoSocksProxyException is new in 5.8 and was missing from the wrapper.
…eler JAVA-6194: SOCKS5 failures during post-TCP phases (negotiation, auth, CONNECT-relay) are configuration/protocol errors and must not carry SystemOverloadedError or RetryableError labels. Failures during the PROXY_TCP_CONNECT phase are plain TCP-level reach failures (proxy host unreachable / overloaded) and continue to receive the labels like any other socket-open failure. Removes the corresponding TODO.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java:147
SocksSocket.connectonly closes the underlying socket onMongoSocksProxyExceptionandSocketException. A TCP connect timeout (SocketTimeoutException) is anIOException(not aSocketException), so it can escape this method without closing the socket, which risks leaking an FD for callers that don’t explicitly close on failure. Consider closing on any failedconnectattempt (e.g., catchIOExceptionaround the whole body or use a failurefinally).
} catch (SocketException socketException) {
/*
* The 'close()' call here has two purposes:
*
* 1. Enforces self-closing under RFC 1928 if METHOD is X'FF'.
The per-phase try-blocks in SocksSocket.connect each had an explicit
catch (MongoSocksProxyException e) { throw e; } before the
catch (IOException e) wrapper. Since MongoSocksProxyException is a
RuntimeException, not an IOException, the IOException catch would
never have absorbed it anyway — the explicit re-throws are dead code
that Java's type system already enforces.
Remove the three redundant catches and extend the explanatory comment
to make the propagation path explicit: a MongoSocksProxyException
thrown directly by a phase method falls through to the outer block
that closes the socket.
No behavior change; SocksSocketTest (11) and BackpressureErrorLabelerTest
(49) both pass.
The defensive null check on getHandshakePhase() and its rationale comment predated commit dee79f0, which added notNull("handshakePhase", ...) to all four MongoSocksProxyException constructors. Combined with the field being private final, the phase reference is guaranteed non-null for any instance created through the public API, making the labeler-side check unreachable in normal use. Collapse the body of isExcludedSocksPostTcpPhase to the bare phase comparison; the constructor contract is the authoritative non-null guarantee. Deserialization-driven null is not defended against here for the same reason it is not defended against elsewhere in the driver hierarchy: if it were a real concern, readObject on the exception itself is the correct enforcement point, not a downstream labeler. SocksSocketTest (11) and BackpressureErrorLabelerTest (49) both pass.
Backpressure labels (SystemOverloadedError, RetryableError) signal that the target mongod is overloaded and the driver should back off. The prior policy excluded post-TCP SOCKS5 phases (NEGOTIATION / AUTHENTICATION / CONNECT_RELAY) and labeled PROXY_TCP_CONNECT on the "structurally identical to a socket-open failure" rationale (98483bb, b3da503). That rule conflated failure mechanism with semantic attribution: backpressure is about WHICH party is overloaded, not WHAT kind of failure surfaces. Re-frame: a SOCKS5 failure is mongod-attributable only when the proxy acted as a reach-proxy for mongod and reported a transport-level outcome that mirrors a direct-connection socket-open failure. That is exactly CONNECT_RELAY with RFC 1928 reply codes 3 (network unreachable), 4 (host unreachable), or 5 (connection refused) — the SOCKS5 analogs of NoRouteToHostException and ConnectException. Policy changes: - PROXY_TCP_CONNECT flips from labeled to NOT labeled — the proxy itself is unreachable; mongod was never reached. - CONNECT_RELAY with codes 3, 4, or 5 flips from NOT labeled to LABELED — these mirror direct-connection mongod-overload signals. - NEGOTIATION, AUTHENTICATION, CONNECT_RELAY with codes 1, 2, 6, 7, 8, and CONNECT_RELAY with null replyCode (I/O failure / unrecognised reply) remain NOT labeled — proxy-side or ambiguous failures with no mongod-side attribution. Rename isExcludedSocksPostTcpPhase -> isNonMongodAttributableSocksFailure to reflect the actual semantic and add reply-code-aware logic. Update MongoSocksProxyException class Javadoc and the labeler method Javadoc to describe the attribution rule. Test recalibration: - socksProxyPostTcpPhaseShouldNotBeLabeled split and renamed: socksProxyNonMongodAttributableShouldNotBeLabeled (9 cases) and socksProxyMongodAttributableShouldBeLabeled (3 cases for codes 3/4/5). - Old socksProxyTcpConnectPhaseShouldBeLabeled removed; PROXY_TCP_CONNECT is now covered as the first case of the non-attributable parameterised test. This inverts the explicit Round 2 decision documented in 98483bb and the matching class-level Javadoc updated in b3da503.
When SocketStream.open() wraps an initial-connect IOException as a MongoSocksProxyException with PROXY_TCP_CONNECT, the message text was "Exception connecting to SOCKS5 proxy" without identifying which proxy. Operators reading a log line could not tell whether the ServerAddress attached to the exception was the proxy or the target mongod. Append the proxy host:port to the message string so the proxy endpoint is identifiable from the message alone. Keep the ServerAddress as the target mongod — SDAM, the connection pool, and retry logic key off the cluster-member identity, and substituting the proxy endpoint there would break those consumers (per the discussion on PR mongodb#1968 thread r3267511517: message-string enrichment accepted, ServerAddress substitution rejected).
The throw at SocksSocket.checkServerReply used reply.message verbatim
(e.g. "Host is unreachable") which is indistinguishable in raw log
output from a non-proxy ICMP-driven ConnectException carrying the
same text. Prefix the message with "SOCKS5 CONNECT reply:" and append
the RFC 1928 reply code so operators can attribute the failure to the
SOCKS5 protocol layer from the message alone.
Phrasing chosen to remain distinguishable from the sibling IO-failure
path message ("SOCKS5 CONNECT relay failed: ..." at SocksSocket.java
line 130): "reply" signals a parsed proxy response; "relay failed"
signals an IO failure during the CONNECT exchange. Two textual styles
let operators tell the two CONNECT_RELAY sub-cases apart.
reply.replyNumber remains available via getProxyReplyCode() for
programmatic consumers; embedding it in the message is purely a
diagnostic convenience for log readers.
After 78d6b01 (per-phase IOException wrapping) and 27417eb (removal of redundant re-throws), the two outer catches in SocksSocket.connect handle disjoint, well-defined cases — but the comments hadn't caught up. - catch (MongoSocksProxyException) — now the destination for every SOCKS5 protocol failure, including the RFC 1928 X'FF' "no acceptable method" self-close which performNegotiation throws as MongoSocksProxyException directly (not as an IOException). Note this in the comment. - catch (IOException) — now only ever fires for IOExceptions from the initial proxy TCP connect (timeout.checkedRun above); inner-phase IOExceptions are caught and rewrapped by the per-phase wrappers and land in the MongoSocksProxyException catch above. The stale comment block here referenced the RFC 1928 X'FF' path, which no longer reaches this catch. Replace with an accurate description of what actually lands here, and the leak-prevention rationale (JDK auto-close on connect timeout is implementation-defined).
The helper wrapped the SocksSocket try-with-resources in
catch (Exception ignored) {} followed by a trailing return null, which
silently absorbed any non-IOException, non-MongoSocksProxyException
failure (RuntimeException from SocksSocket internals, close()
failures, etc.) and returned null. assertProxy(null) then failed with
"Expected MongoSocksProxyException but got: null" instead of the
actual stack trace — masking real regressions.
Remove the catch and the trailing return null. Unexpected exceptions
now propagate with the original stack trace; the inner catch still
captures the expected MongoSocksProxyException / IOException as the
helper's return value. Move t.join(5000) into its own try with a
narrow InterruptedException catch that preserves the interrupt flag
without masking a primary exception that may already be unwinding.
Reviewers reading the four constructor overloads can mistake the ordering for inconsistent because handshakePhase appears at a different absolute position in each (third in the 3-arg ctor, fourth when cause or proxyReplyCode is present, etc.). The ordering is actually rule- driven: parent-class params first (message, address, optional cause), then SOCKS-specific args (handshakePhase, optional proxyReplyCode). Adding the rule to the class-level Javadoc closes the documentation gap without restructuring the API — keeping parent-class symmetry on the super(...) calls and avoiding a one-off divergence from the rest of the MongoSocket*Exception hierarchy.
JAVA-6194