cli: implement round-robin DNS IP rotation This change introduces det…#454
maniche1024 wants to merge 9 commits into minio:master
Hi @klauspost @harshavardhana, I've implemented transport-level DNS rotation and SNI preservation to fix gateway hotspots. The description includes benchmark results showing a ~16% throughput improvement. Ready for review when you have a moment; thanks in advance!
I am not a particular fan, mainly because round-robin is quite ineffective if there is even the slightest imbalance in the servers. You then see throughput significantly below your capacity. I would much rather see something like this:
```go
import (
	"net/url"

	"github.com/minio/minio-go/v7"
)

// Client wraps *minio.Client and can override the reported endpoint.
type Client struct {
	*minio.Client
	Host *url.URL
}

// EndpointURL returns Host when it is set, otherwise the embedded client's endpoint URL.
func (c *Client) EndpointURL() *url.URL {
	if c.Host != nil {
		return c.Host
	}
	return c.Client.EndpointURL()
}
```
Does that make sense?
Hi Klaus, thanks for the guidance. Yes, that makes sense. I’ve located the client initialization logic in cli/client.go (I believe you were referring to this rather than pkg/bench, since that's where minio.New is called). I am currently refactoring getClient and newClient to use the Client wrapper you suggested, so that EndpointURL() returns the original hostname for SNI and Host headers even when connecting to the resolved IPs. I'll run some validation tests and update the PR once verified. Does this sound like the right direction?
I think I only fully understood the context of the changes you suggested once I started working on them. It required more work than I had initially anticipated, but I was eventually able to figure it out. Here is a summary of the changes:
I ran several tests to ensure:
Let me know your thoughts on this, or if you need more information.
@harshavardhana You are probably the best person to evaluate this. It sounds reasonable to me.
@klauspost @harshavardhana Just a quick bump on this Warp PR when you have a moment. I'm keen to get it merged so I can start on the next phase. Let me know if there's anything I can clarify!
@klauspost Thanks for the approval! It looks like I don't have merge privileges, though, so I would either need those or someone would need to merge the PR.
@maniche1024 I would like @harshavardhana to take a look as well. He can merge if he approves.
Hi @ramondeklein, thanks for the feedback. I’ve addressed all the review comments by removing the Client wrapper and updating the IP-pinning logic in the relevant transport-layer files. I've also made the Dialer proxy-aware: it now performs IP pinning only when no proxy is configured. I’ve verified the fix with my previous test suite, including cases for hosts supplied with custom ports. Everything behaves as expected, and the results remain consistent. I hope things look good this time!
ramondeklein left a comment
I think the proper way to deal with IP pinning is to use a custom dialer and SNI only when IP pinning is actually being used; otherwise it has too many side effects. It's probably fine not to support HTTP(S) proxies when using IP pinning, but proxies should keep working when IP pinning is not used.
@ramondeklein I’ve pushed updates addressing all your recent comments, including the proxy-safety logic and the kTLS dialer optimization. The logic is now unified across client_tls, client_ktls, and client_transport. Ready for another look!
Critical:
…erministic IP-level load distribution at the transport layer. Previously, Warp relied on the OS or a one-time resolution, which could lead to uneven traffic distribution when a single hostname sits in front of multiple gateways/IPs.
Key improvements:
- Added a thread-safe DNS cache (sync.Map) and an atomic counter to rotate between IPs for every new connection.
- Fixed TLS SNI verification: when dialing a resolved IP, the original hostname is now explicitly set in TLSClientConfig.ServerName to prevent certificate hostname mismatches.
- Applied the rotation logic to both the standard TLS and kTLS transport paths.
- Optimized performance by reducing redundant DNS lookups during high-concurrency benchmarks.
…ing when using --resolve-host flag
fa5e768 to a1137f2
Hi @ramondeklein, thanks again for the thorough review. I was out for about a month and a half for personal reasons, so I could not address your review comments until now. I've addressed them since coming back; here's a summary:
I have also tested my changes thoroughly, and everything behaves as expected.
Hi @ramondeklein, just wanted to follow up on this PR. I've addressed all the review comments you raised (SNI fix, endpoint hostname for signing, proxy bypass, and the stale description) and pushed the updated changes. I would really appreciate another look when you get a chance. Happy to answer any questions or make further adjustments. Thanks!
Summary
This PR introduces deterministic, transport-level IP rotation to ensure even load distribution across multi-A record S3 endpoints. It specifically addresses environments where the storage backend (e.g., Hitachi Virtual Storage Platform One) or its ingress layer (e.g., Istio Gateways) experiences connection hotspots due to the client pinning to a single resolved IP.
The Problem: Connection Hotspots & SNI Mismatches
When benchmarking S3-compatible storage sitting behind multiple gateways, Warp can experience unequal traffic distribution. While Warp has a --resolve-host flag, using it often leads to two issues in modern architectures:
Observed behavior (Before): In a test against 5 gateways, traffic was pinned to only 3, with a severe skew:
Resolution: Summary of Changes
cli/client.go — Introduces a hostPair struct pairing each resolved IP with its original hostname. parseHostPairs() produces one pair per IP; getClient() uses originalHost as the S3 endpoint (for correct request signing and virtual-host bucket routing) and the resolved IP only for dialing.
cli/client_transport.go — Adds a withResolveHost() transport option that rewrites dial addresses from the logical hostname to the resolved IP, with correct proxy bypass handling.
cli/client_tls.go and cli/client_ktls.go — SNI (ServerName) is now derived from originalHost (the hostname), not the resolved IP. Both standard TLS and Kernel TLS paths use withResolveHost() in resolve-host mode.
cli/client_default.go — Plain HTTP transport also uses withResolveHost() when a resolved host is provided.
cli/flags.go — Removed the stale warning ("This can break SSL certificates, use --insecure if so"), since TLS now works correctly without --insecure.
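A sketch of the hostPair idea, under stated assumptions: only the names hostPair, parseHostPairs, and originalHost come from the description above; the field names ip and originalHost's exact contents, the serverName helper, and the parseHostPairs signature (with an injectable lookup) are hypothetical. It shows the split between the address that gets dialed and the hostname used for the S3 endpoint and SNI, including a host supplied with a custom port.

```go
package main

import (
	"fmt"
	"net"
)

// hostPair (field names hypothetical): ip is the address that gets dialed,
// originalHost is the user-supplied host[:port] used as the S3 endpoint.
type hostPair struct {
	ip           string
	originalHost string
}

// serverName returns originalHost without its port, suitable for TLS SNI.
func (p hostPair) serverName() string {
	if h, _, err := net.SplitHostPort(p.originalHost); err == nil {
		return h
	}
	return p.originalHost
}

// parseHostPairs (signature hypothetical) resolves host, which may carry a
// custom port, and returns one hostPair per resolved IP.
func parseHostPairs(host string, lookup func(string) ([]string, error)) ([]hostPair, error) {
	name, port, err := net.SplitHostPort(host)
	if err != nil {
		// No port given: the whole string is the hostname.
		name, port = host, ""
	}
	ips, err := lookup(name)
	if err != nil {
		return nil, err
	}
	pairs := make([]hostPair, 0, len(ips))
	for _, ip := range ips {
		dial := ip
		if port != "" {
			dial = net.JoinHostPort(ip, port) // brackets IPv6 literals
		}
		pairs = append(pairs, hostPair{ip: dial, originalHost: host})
	}
	return pairs, nil
}

func main() {
	// Placeholder resolver; net.LookupHost has the same signature.
	fake := func(string) ([]string, error) { return []string{"10.0.0.1", "fd00::2"}, nil }
	pairs, err := parseHostPairs("s3.example.com:9000", fake)
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		fmt.Printf("dial %s, endpoint %s, SNI %s\n", p.ip, p.originalHost, p.serverName())
	}
}
```

Keeping the port on originalHost but stripping it for serverName matters because TLS SNI carries a bare hostname, while the endpoint (and therefore request signing) needs the host exactly as the user supplied it.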
Evidence: Test Results
Post-implementation tests demonstrate perfectly balanced connection distribution across all 5 gateways without any HTTP 400 or TLS handshake errors.
Observed behavior (After):
Performance Impact & Verification
A comparison between the stock Warp binary and this PR shows that deterministic IP rotation improves throughput (by ~16% in these benchmarks) by preventing gateway saturation.
Test environment: concurrency of 600, 1 KB PUT operations, 5-gateway backend.