feat: optimize system table queries with column projection (DRIVER-368) by nikagra · Pull Request #862 · scylladb/java-driver

nikagra · 2026-04-07T12:11:07Z

Closes https://scylladb.atlassian.net/browse/DRIVER-368

Backport of the same optimization done for the 4.x driver (DefaultTopologyMonitor), applied here to ControlConnection in the 3.x driver.

On the first connection to each system table (system.local, system.peers, system.peers_v2) the driver issues SELECT * to discover which columns the server exposes. It caches the intersection of those columns with a driver-internal *_COLUMNS_OF_INTEREST set. Subsequent queries project only those columns, reducing bytes on the wire and deserialization work.

Changes:

Three LOCAL/PEERS/PEERS_V2_COLUMNS_OF_INTEREST ImmutableSet<String> constants
Three volatile Set<String> cache fields (null = uninitialized sentinel)
intersectWithNeeded() and buildProjectedQuery() static helpers (@VisibleForTesting)
Cache reset in setNewConnection() and peers_v2 fallback path
Projected queries in refreshNodeListAndTokenMap(), selectPeersFuture(), and fetchNodeInfo() (system.local only)
fetchNodeInfo() uses SELECT * for peer WHERE-clause lookups (single-row reconnection path) to stay compatible with Scassandra primes on non-restarted nodes
New ControlConnectionUnitTest.java: 16 pure unit tests, no cluster required

dkropachev · 2026-04-07T13:17:28Z

@nikagra please check why cicd is failing

Copilot

Pull request overview

This PR backports the 4.x “system table column projection” optimization to the 3.x driver’s ControlConnection, caching discovered system table columns (via initial SELECT *) and using projected SELECT col1, col2... for subsequent queries to reduce bytes/deserialization work.

Changes:

Add *_COLUMNS_OF_INTEREST constants, projected-column caches, and helper methods (intersectWithNeeded, buildProjectedQuery) in ControlConnection.
Update control connection system table queries to use projections once caches are warmed, and reset caches on reconnect / peers_v2 fallback.
Extend Scassandra priming to include projected-query primes and add unit tests covering the new helpers/caches.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java	Adds column-interest sets, cache fields, cache reset, and switches system table queries to projected form after discovery.
driver-core/src/test/java/com/datastax/driver/core/ScassandraCluster.java	Re-primes after restart and primes projected queries to match the driver’s new projected system-table queries.
driver-core/src/test/java/com/datastax/driver/core/ControlConnectionUnitTest.java	New pure unit tests for projection helpers/constants and cache-field declarations.
driver-core/src/test/java/com/datastax/driver/core/ControlConnectionTest.java	Resets column caches when Scassandra primes are cleared to avoid projected-query misses.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-09T21:30:17Z

driver-core/src/test/java/com/datastax/driver/core/ScassandraCluster.java

+    // Pre-generate one UUID per node so that system.local and system.peers rows for the same node
+    // use the same host_id. Keys are nodeCount values (1-based).
+    Map<Integer, java.util.UUID> hostIdByNodeCount = new HashMap<>();
+    int tempCount = 1;
+    for (Integer dc : new TreeSet<Integer>(dcNodeMap.keySet())) {
+      for (int n = 0; n < dcNodeMap.get(dc).size(); n++) {
+        hostIdByNodeCount.put(tempCount, UUIDs.random());
+        tempCount++;
+      }
+    }


hostIdByNodeCount is rebuilt on every primeMetadata() call (and primeMetadata() is invoked per Scassandra node, including on restart). That means the same logical node can get different host_id values depending on which Scassandra process you query, defeating the intent of having consistent host_ids across system.local and system.peers for the cluster and potentially breaking host identity during control-connection failover/restart scenarios. Consider computing the host-id map once per ScassandraCluster instance (store as a field) or deriving deterministic UUIDs from nodeCount, and reusing it in every primeMetadata() call.

Fixed. hostIdByNodeCount is now a private final Map<Integer, UUID> instance field on ScassandraCluster, populated once in the constructor (after dcNodeMap is built) using the same DC/node iteration order. Every subsequent primeMetadata() call — whether at init() time or after a node restart — reads from this stable map, so system.local and system.peers rows for the same logical node always carry the same host_id.

Copilot · 2026-04-09T21:30:17Z

driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java

+  static String buildProjectedQuery(String table, Set<String> columns, String whereClause) {
+    StringBuilder sb = new StringBuilder("SELECT ");
+    boolean first = true;
+    for (String col : columns) {
+      if (!first) sb.append(", ");
+      sb.append(col);
+      first = false;
+    }
+    sb.append(" FROM ").append(table);
+    if (whereClause != null) sb.append(" WHERE ").append(whereClause);
+    return sb.toString();
+  }


buildProjectedQuery() will produce an invalid CQL statement if columns is empty (it becomes SELECT FROM ...). Since intersectWithNeeded() can legitimately return an empty set if the server/prime doesn't expose any of the needed columns, this can turn a schema mismatch into a confusing syntax error and break subsequent refreshes once the empty set is cached. Add a guard to fall back to SELECT * (or keep the cache as null) when columns.isEmpty().

Fixed. intersectWithNeeded() now returns null instead of an empty set when no server columns overlap the needed set. Since null is the uninitialised sentinel value for the cache fields, this keeps the cache in the "needs discovery" state and ensures buildProjectedQuery() is never called with an empty column list — the driver will continue issuing SELECT * until it gets a non-empty intersection.

Copilot · 2026-04-09T21:30:18Z

driver-core/src/test/java/com/datastax/driver/core/ControlConnectionUnitTest.java

+  @Test(groups = "unit")
+  public void testBuildProjectedQueryAllColumnsPresent() {
+    // Every column in the needed set appears somewhere in the resulting query
+    Set<String> columns = ControlConnection.PEERS_COLUMNS_OF_INTEREST;
+    String query = ControlConnection.buildProjectedQuery("system.peers", columns, null);
+
+    for (String col : columns) {
+      assertThat(query).as("query should contain column: " + col).contains(col);


testBuildProjectedQueryAllColumnsPresent() uses String.contains(col) to assert that each column is present in the generated query. This can yield false positives when one column name is a substring of another (e.g. native_port vs native_transport_port). Consider parsing the selected column list (split between SELECT and FROM, then split on commas and trim) and comparing exact identifiers instead.

Suggested change

@Test(groups = "unit")

public void testBuildProjectedQueryAllColumnsPresent() {

// Every column in the needed set appears somewhere in the resulting query

Set<String> columns = ControlConnection.PEERS_COLUMNS_OF_INTEREST;

String query = ControlConnection.buildProjectedQuery("system.peers", columns, null);

for (String col : columns) {

assertThat(query).as("query should contain column: " + col).contains(col);

private Set<String> extractSelectedColumns(String query) {

int selectStart = query.indexOf("SELECT ");

int fromStart = query.indexOf(" FROM ");

assertThat(selectStart).as("query should contain SELECT").isEqualTo(0);

assertThat(fromStart).as("query should contain FROM").isGreaterThan(selectStart);

String selectedColumns = query.substring("SELECT ".length(), fromStart);

ImmutableSet.Builder<String> builder = ImmutableSet.builder();

for (String column : selectedColumns.split(",")) {

builder.add(column.trim());

}

return builder.build();

}

@Test(groups = "unit")

public void testBuildProjectedQueryAllColumnsPresent() {

// Every column in the needed set appears in the projected SELECT column list.

Set<String> columns = ControlConnection.PEERS_COLUMNS_OF_INTEREST;

String query = ControlConnection.buildProjectedQuery("system.peers", columns, null);

Set<String> selectedColumns = extractSelectedColumns(query);

for (String col : columns) {

assertThat(selectedColumns)

.as("query should project column: " + col)

.contains(col);

Fixed. Added a private extractSelectedColumns(String query) helper that parses the column list between SELECT and FROM, splits on commas, and trims whitespace — returning an exact Set<String>. testBuildProjectedQueryAllColumnsPresent() now asserts membership in that parsed set rather than using String.contains(), eliminating the false-positive risk from substrings like native_port inside native_transport_port.

Backport of the 4.x DefaultTopologyMonitor optimization to the 3.x driver's ControlConnection. On the first query to each system table (system.local, system.peers, system.peers_v2) the driver sends SELECT * to discover available columns. It caches the intersection with an internal *_COLUMNS_OF_INTEREST set; subsequent queries project only those columns, reducing bytes on the wire and deserialization work. Changes in ControlConnection: - LOCAL/PEERS/PEERS_V2_COLUMNS_OF_INTEREST ImmutableSet<String> constants - Volatile Set<String> cache fields (null = uninitialized sentinel) - intersectWithNeeded() returns null on empty intersection to prevent invalid empty-column projections - buildProjectedQuery() static helper - Cache reset in setNewConnection() and peers_v2 fallback path - Projected queries in refreshNodeListAndTokenMap(), selectPeersFuture(), and fetchNodeInfo() (system.local only; peer WHERE lookups use SELECT * for Scassandra compatibility) Test changes: - ScassandraCluster: stable per-instance hostIdByNodeCount field; primes projected full-scan queries; re-primes after node restart - ControlConnectionTest: resets column caches before clearing primes - ControlConnectionUnitTest: 16 pure unit tests covering constants, intersectWithNeeded, buildProjectedQuery, and cache field declarations

nikagra · 2026-04-09T22:28:10Z

All CI failures have been resolved. The root issues were: projected WHERE-clause queries not primed in Scassandra for non-restarted nodes (fixed by using SELECT * for peer WHERE lookups in fetchNodeInfo), inconsistent host_id UUIDs across system.local and system.peers primes (fixed by computing UUIDs once per ScassandraCluster instance), and missing contains()-guarded columns in the PEERS/PEERS_V2_COLUMNS_OF_INTEREST sets. All 11 CI checks are now passing on the latest commit.

When the control connection reconnects to a new host, tryConnect() calls refreshNodeListAndTokenMap() before setNewConnection() is called. This meant the stale localColumns/peersColumns/peersV2Columns from the previous connection were used to build projected queries, which could fail if the new host (e.g. a Scassandra mock) only has the unfiltered SELECT * primed. Fix by resetting the three column caches to null at the start of the try block in tryConnect(), before the first query is issued. The reset in setNewConnection() remains as a safety net but is now redundant for the reconnection case.

nikagra marked this pull request as draft April 7, 2026 14:44

nikagra requested review from Copilot and dkropachev April 9, 2026 21:26

nikagra marked this pull request as ready for review April 9, 2026 21:26

Copilot started reviewing on behalf of nikagra April 9, 2026 21:27 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

nikagra force-pushed the driver-368-optimize-system-local-query branch from 0bf9092 to 4f9a052 Compare April 9, 2026 22:25

nikagra force-pushed the driver-368-optimize-system-local-query branch from c37c98f to 4f9a052 Compare April 9, 2026 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: optimize system table queries with column projection (DRIVER-368)#862

feat: optimize system table queries with column projection (DRIVER-368)#862
nikagra wants to merge 2 commits intoscylladb:scylla-3.xfrom
nikagra:driver-368-optimize-system-local-query

nikagra commented Apr 7, 2026 •

edited

Loading

Uh oh!

dkropachev commented Apr 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

nikagra Apr 9, 2026

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

nikagra Apr 9, 2026

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

nikagra Apr 9, 2026

Uh oh!

nikagra commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-  @Test(groups = "unit")
-  public void testBuildProjectedQueryAllColumnsPresent() {
-    // Every column in the needed set appears somewhere in the resulting query
-    Set<String> columns = ControlConnection.PEERS_COLUMNS_OF_INTEREST;
-    String query = ControlConnection.buildProjectedQuery("system.peers", columns, null);
-    for (String col : columns) {
-      assertThat(query).as("query should contain column: " + col).contains(col);
+  private Set<String> extractSelectedColumns(String query) {
+    int selectStart = query.indexOf("SELECT ");
+    int fromStart = query.indexOf(" FROM ");
+    assertThat(selectStart).as("query should contain SELECT").isEqualTo(0);
+    assertThat(fromStart).as("query should contain FROM").isGreaterThan(selectStart);
+    String selectedColumns = query.substring("SELECT ".length(), fromStart);
+    ImmutableSet.Builder<String> builder = ImmutableSet.builder();
+    for (String column : selectedColumns.split(",")) {
+      builder.add(column.trim());
+    }
+    return builder.build();
+  }
+  @Test(groups = "unit")
+  public void testBuildProjectedQueryAllColumnsPresent() {
+    // Every column in the needed set appears in the projected SELECT column list.
+    Set<String> columns = ControlConnection.PEERS_COLUMNS_OF_INTEREST;
+    String query = ControlConnection.buildProjectedQuery("system.peers", columns, null);
+    Set<String> selectedColumns = extractSelectedColumns(query);
+    for (String col : columns) {
+      assertThat(selectedColumns)
+          .as("query should project column: " + col)
+          .contains(col);

Conversation

nikagra commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dkropachev commented Apr 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

nikagra Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

nikagra Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

nikagra Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

nikagra commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nikagra commented Apr 7, 2026 •

edited

Loading