fix: connect via all DNS IPs when first contact point is non-responsive (DRIVER-201)#865
Open
nikagra wants to merge 1 commit intoscylladb:scylla-4.xfrom
Open
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses DRIVER-201 by improving initial contact point connection behavior when RESOLVE_CONTACT_POINTS=false, ensuring that if a hostname resolves to multiple IPs and the first is unreachable, the driver can fall back to subsequent IPs instead of failing immediately.
Changes:
- Refactors
ChannelFactoryconnection logic to expand unresolved hostnames to all DNS IPs at connection time and attempt them sequentially. - Introduces a
ConnectRequestparameter object and decomposes connection steps into smaller methods. - Adds an integration test covering the “first DNS A record is unreachable” scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java |
Expands unresolved hostnames to all resolved IPs at connect time and adds sequential fallback across addresses. |
integration-tests/src/test/java/com/datastax/oss/driver/core/resolver/MockResolverIT.java |
Adds an integration test to validate successful connection when the first DNS entry is non-responsive. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java
Outdated
Show resolved
Hide resolved
…ve (DRIVER-201)
When RESOLVE_CONTACT_POINTS=false (the default), a hostname is stored as a
single unresolved InetSocketAddress. At connection time Netty's bootstrap
called InetAddress.getByName(), returning only the first IP. If that IP was
non-responsive the driver raised AllNodesFailedException with no fallback.
Refactor ChannelFactory to:
- Introduce ConnectRequest parameter object, eliminating the 9-parameter
private connect() method.
- Decompose the connection logic into three single-responsibility methods:
connect(ConnectRequest) – resolves hostname via getAllByName() and
builds a candidate list of all IPs;
tryNextAddress(...) – iterates candidates, falling back to the
next IP on failure;
connectToAddress(...) – performs a single Netty connect and handles
protocol-version negotiation on the same IP.
- DNS resolution is still deferred to connection time, preserving the
dynamic-DNS semantics of RESOLVE_CONTACT_POINTS=false.
Add integration test should_connect_when_first_dns_entry_is_non_responsive
to MockResolverIT that maps the first DNS entry to a non-existent IP and
asserts the session opens successfully via the subsequent entries.
89e53ea to
7da0919
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When
RESOLVE_CONTACT_POINTS=false(the default), a contact point hostname is stored as a single unresolvedInetSocketAddress. At connection time,ChannelFactorypassed this address directly to Netty'sbootstrap.connect(), which internally calledInetAddress.getByName()— returning only the first IP for the hostname. If that IP was non-responsive, the driver raisedAllNodesFailedExceptionwith no fallback to other IPs the hostname might resolve to.This is particularly impactful in dynamic DNS environments where a hostname can map to multiple nodes and the first one may be temporarily unavailable.
Fixes DRIVER-201.
Changes
ChannelFactory.javaRefactored the connection logic using the Parameter Object pattern and SRP decomposition:
ConnectRequestinner class: bundles all per-connection-attempt state, eliminating the unwieldy 9-parameter privateconnect()method.connect(ConnectRequest): resolves the endpoint address. For unresolved hostnames, callsInetAddress.getAllByName()to expand to all known IPs, building a candidate list. DNS resolution is still deferred to connection time — theEndPointcontinues to hold the unresolved hostname — preserving the dynamic-DNS semantics ofRESOLVE_CONTACT_POINTS=false.tryNextAddress(request, candidates, index): iterates the candidate list. On per-address failure it tries the next candidate; only when all are exhausted does it fail the overallresultFuture.connectToAddress(request, address): performs a single Netty bootstrap connect to one resolved IP. Protocol-version negotiation (downgrade retries) is scoped to the same IP, which is semantically correct. Returns its ownCompletableFuture<DriverChannel>sotryNextAddresscan distinguish a per-address failure from the final outcome.tryNextAddressRaw: unchanged pass-through for non-InetSocketAddresstypes (e.g. Unix domain sockets).MockResolverIT.javaNew integration test
should_connect_when_first_dns_entry_is_non_responsive:127.0.1.xtest.cluster.fakeresolves to127.0.1.11(non-existent) first, then nodes 1 and 2RESOLVE_CONTACT_POINTS=false,RECONNECT_ON_INIT=false