The asyncio reactor (AsyncioConnection) uses loop.sock_recv() and loop.sock_sendall() with an ssl.SSLSocket. CPython explicitly rejects SSLSocket in these APIs — raising TypeError("Socket cannot be of type SSLSocket") — on all Python versions this driver supports (≥3.9). The TypeError is silently swallowed because the run_coroutine_threadsafe futures are never inspected, causing the read and write coroutines to die immediately. The connection then hangs with no I/O until Connection.factory() times out with OperationTimedOut.
This was surfaced by PR #773 (updating the CI Scylla version from 2025.2 to 2026.1), which enables the test_client_routes.py SSL tests for the first time. All 5 asyncio CI jobs fail on TestSslThroughNlb::test_ssl_without_hostname_verification_through_nlb with OperationTimedOut, while all libev and asyncore jobs pass.
In short: asyncio + SSL has never worked on supported Python versions. It was never caught because no CI-exercised test path combined these two until now.
Root Cause
The failure chain
AsyncioConnection.__init__() (asyncioreactor.py:88) calls _connect_socket(), which wraps the raw socket with ssl_context.wrap_socket() (connection.py:1073), producing an ssl.SSLSocket. The TCP connection and SSL handshake complete synchronously (blocking) during _connect_socket().
The socket is set to non-blocking (asyncioreactor.py:93), then handle_read() and handle_write() coroutines are scheduled on the event loop thread via run_coroutine_threadsafe() (asyncioreactor.py:102-107).
_send_options_message() (asyncioreactor.py:108) queues the CQL OPTIONS message via push() → _push_msg() → _write_queue.put_nowait().
On the event loop thread, handle_write() dequeues the message and calls await self._loop.sock_sendall(self._socket, next_msg) (asyncioreactor.py:203). Since self._socket is an ssl.SSLSocket, CPython's _check_ssl_socket() guard immediately raises TypeError("Socket cannot be of type SSLSocket").
The TypeError is not caught by except socket.error (TypeError is not a subclass of OSError). It propagates out of the coroutine and is stored in the concurrent.futures.Future returned by run_coroutine_threadsafe().
Nobody ever calls .result() on _read_watcher or _write_watcher, so the TypeError is silently swallowed. Both coroutines are dead.
With no read/write I/O, the CQL OPTIONS message is never sent, no SUPPORTED response is ever received, connected_event is never set, and Connection.factory() (connection.py:980) times out with OperationTimedOut.
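The first half of this chain can be reproduced without a server. The following is a minimal sketch, not the driver's code: it wraps an unconnected socket in an ssl.SSLSocket, schedules a sock_sendall() call with run_coroutine_threadsafe(), and shows that the TypeError only becomes visible when the returned future is inspected. The function name demo_swallowed_type_error and the host name example.invalid are placeholders invented for the illustration, and a selector-based event loop is assumed.

```python
import asyncio
import socket
import ssl
import threading


def demo_swallowed_type_error():
    """Return the exception stored in the future of a failed sock_sendall()."""
    loop = asyncio.SelectorEventLoop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()

    # Wrapping an unconnected socket is enough: asyncio rejects the socket
    # by type before it attempts any I/O.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tls_sock = ctx.wrap_socket(raw, server_hostname="example.invalid")
    tls_sock.setblocking(False)

    async def handle_write():
        # Mirrors the shape of the reactor's write path.
        await loop.sock_sendall(tls_sock, b"OPTIONS")

    future = asyncio.run_coroutine_threadsafe(handle_write(), loop)
    try:
        # Without this call (or .result()), the error is never observed.
        error = future.exception(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)
        thread.join()
        loop.close()
        tls_sock.close()
    return error


if __name__ == "__main__":
    print(repr(demo_swallowed_type_error()))
```

Inspecting the watcher futures (or attaching a done-callback to them) would have turned the silent hang into an immediate, visible error, independent of the SSL fix itself.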
Why libev/asyncore work
Both libevreactor.py and asyncorereactor.py call self._socket.recv() / self._socket.send() directly — the SSL socket's own methods — rather than passing the socket through asyncio's sock_recv()/sock_sendall() APIs. They also handle SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE on both read and write paths:
    # libevreactor.py handle_write (line 330-341)
    except socket.error as err:
        if (err.args[0] in NONBLOCKING or
                err.args[0] in (ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE)):
            ...

    # libevreactor.py handle_read (line 367-374)
    if isinstance(err, ssl.SSLError):
        if err.args[0] in (ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE):
            ...
Why the right fix is asyncio SSL Transport, not just catching TypeError
Even if you bypassed the _check_ssl_socket guard (e.g., by calling self._socket.recv() directly like libev does), sock_recv()/sock_sendall() have deeper architectural problems with SSL sockets:
They only catch BlockingIOError/InterruptedError internally. SSL sockets raise SSLWantReadError/SSLWantWriteError instead, which are not caught by the retry logic.
Direction mismatch. sock_recv() registers add_reader() on the fd; sock_sendall() registers add_writer(). But SSL operations can need the opposite direction: recv() may need to write (TLS key update response), and send() may need to read. These APIs have no mechanism for cross-direction I/O.
TLS 1.3 makes cross-direction I/O common. TLS 1.3 NewSessionTicket messages arrive immediately after the handshake, and KeyUpdate messages can arrive at any time. Processing these during a recv() may require writing, and vice versa. With TLS 1.2 this was rare; with TLS 1.3 it is nearly guaranteed on the first read.
handle_write has no SSL error handling. Even with direct recv()/send(), the current handle_write (asyncioreactor.py:198) only catches socket.error — SSLWantReadError/SSLWantWriteError during a write would defunct the connection.
Asyncio's built-in SSL transport (SSLProtocol + ssl.MemoryBIO) handles all of this correctly: it keeps the raw TCP socket for the selector, uses ssl.SSLObject over memory BIOs for the TLS state machine, catches SSLWantReadError as "try again later," and flushes outgoing TLS frames after every operation.
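The memory-BIO model can be seen in isolation with the standard ssl module. This is a small sketch under stated assumptions (a client context with verification disabled purely so the example needs no certificates; the function name start_client_handshake is invented here). It shows a handshake step that both produces outgoing bytes and reports SSLWantReadError, which is the cross-direction situation that sock_recv()/sock_sendall() cannot express.

```python
import ssl


def start_client_handshake():
    """Start a TLS handshake over memory BIOs, with no socket involved."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Verification is disabled only to keep the sketch self-contained.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    incoming = ssl.MemoryBIO()   # bytes received from the peer go in here
    outgoing = ssl.MemoryBIO()   # bytes to send to the peer come out here
    tls = ctx.wrap_bio(incoming, outgoing, server_side=False)

    wanted_read = False
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Not a failure: the TLS state machine needs peer data to continue.
        wanted_read = True

    # The ClientHello is already waiting to be flushed to the network.
    return wanted_read, outgoing.read()


if __name__ == "__main__":
    wanted_read, client_hello = start_client_handshake()
    print(wanted_read, len(client_hello))
```

An SSL transport treats SSLWantReadError as "try again later" and flushes the outgoing BIO after each operation, so the raw TCP socket is the only object the selector ever sees.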
Why it wasn't caught before
The client_routes SSL tests are gated by @skip_scylla_version_lt("2026.1.0") because system.client_routes didn't exist in 2025.2. PR #773 ("ci: update Scylla test version from 2025.2 to 2026.1") bumps the CI version to 2026.1, enabling these tests for the first time.
The tests/integration/long/test_ssl.py suite (20+ SSL tests) is never run in CI — only tests/integration/standard/ is run.
No other standard integration test exercises SSL through the asyncio reactor.
The asyncio reactor unit tests (tests/unit/io/test_asyncioreactor.py) only test timers — they don't cover read/write/SSL at all. The shared ReactorTestMixin (which covers SSL error recovery, partial reads, etc.) is not used by the asyncio tests.
Proposed Fix: Use asyncio SSL Transport (SSLProtocol + MemoryBIO)
Replace the raw sock_recv()/sock_sendall() approach with asyncio's built-in SSL transport for SSL connections. Non-SSL connections continue to use the existing approach.
Implementation Plan
Step 1: Override _wrap_socket_from_context() to skip SSL wrapping
Override _wrap_socket_from_context() in AsyncioConnection to return the raw socket unchanged (no ssl.SSLSocket wrapping). Store self.ssl_context and extract server_hostname for later use by create_connection().
server_hostname extraction must follow the same logic as the current _wrap_socket_from_context() (connection.py:1019-1033):
If ssl_options provides server_hostname, use it.
If ssl_context.check_hostname is True and no explicit server_hostname, use self.endpoint.address.
If check_hostname is False and no explicit server_hostname, pass server_hostname="" (empty string satisfies the asyncio API requirement without enabling hostname verification).
Note: loop.create_connection(ssl=ctx, sock=sock) requires server_hostname when using a pre-connected socket; omitting it raises ValueError("You must set server_hostname when using ssl without a host").
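The three hostname rules above can be captured in one small helper. This is only a sketch: the function name resolve_server_hostname and its parameters are hypothetical, and it assumes ssl_options is a dict (or None) and that "provides server_hostname" means the key is present with a non-None value.

```python
def resolve_server_hostname(ssl_options, check_hostname, endpoint_address):
    """Pick the server_hostname to pass to loop.create_connection().

    ssl_options      -- dict of SSL options, or None (assumed shape)
    check_hostname   -- value of ssl_context.check_hostname
    endpoint_address -- address of the endpoint being connected to
    """
    explicit = (ssl_options or {}).get("server_hostname")
    if explicit is not None:
        # Rule 1: an explicit hostname always wins.
        return explicit
    if check_hostname:
        # Rule 2: verification is on, so verify against the endpoint address.
        return endpoint_address
    # Rule 3: an empty string satisfies asyncio's argument check while
    # leaving hostname matching disabled.
    return ""


if __name__ == "__main__":
    print(resolve_server_hostname({"server_hostname": "db.example"}, True, "10.0.0.1"))
    print(resolve_server_hostname({}, True, "10.0.0.1"))
    print(repr(resolve_server_hostname(None, False, "10.0.0.1")))
```

Keeping this logic in a pure function makes the check_hostname=True and check_hostname=False paths easy to unit test without opening a socket.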
Step 2: Create an asyncio Protocol bridge
    class _CQLProtocol(asyncio.Protocol):
        """Bridge between asyncio's transport/protocol model and Connection."""

        def __init__(self, connection):
            self._connection = connection
            self.transport = None

        def connection_made(self, transport):
            self.transport = transport
            # SSL handshake is complete; now start the CQL handshake
            self._connection._send_options_message()

        def data_received(self, data):
            self._connection._iobuf.write(data)
            self._connection.process_io_buffer()

        def connection_lost(self, exc):
            if exc:
                self._connection.defunct(exc)
            else:
                self._connection.close()

        def pause_writing(self):
            self._connection._socket_writable = False

        def resume_writing(self):
            self._connection._socket_writable = True
Note on pause_writing/resume_writing: the current asyncio reactor never sets _socket_writable = False (pre-existing gap), so implementing these callbacks is an improvement. _socket_writable gates send_msg() at connection.py:1216 — when False, it raises ConnectionBusy. This provides proper write backpressure through the asyncio transport's high/low water marks.
Step 3: Refactor AsyncioConnection.__init__ for SSL vs non-SSL
For SSL connections, replace the handle_read() / handle_write() coroutines with a single _create_ssl_connection() coroutine that sets up the transport:
    async def _create_ssl_connection(self):
        try:
            transport, protocol = await asyncio.wait_for(
                self._loop.create_connection(
                    lambda: _CQLProtocol(self),
                    sock=self._socket,
                    ssl=self.ssl_context,
                    server_hostname=self._server_hostname,
                ),
                timeout=self._ssl_handshake_timeout,
            )
            self._transport = transport
            self._protocol = protocol
            # connection_made() has already been called by this point,
            # which triggers _send_options_message()
        except Exception as exc:
            self.defunct(exc)
For non-SSL connections, the existing handle_read() / handle_write() / _send_options_message() flow remains unchanged.
The connect_timeout budget must be split:
TCP connect: happens synchronously in _connect_socket() (bounded by socket.settimeout(connect_timeout))
SSL handshake timeout: remaining budget after TCP connect, passed as _ssl_handshake_timeout and enforced via asyncio.wait_for()
CQL startup: remaining budget after handshake, bounded by Connection.factory()'s connected_event.wait(timeout - elapsed)
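One way to split the budget is to fix a single deadline up front and ask for the remaining time before each phase. The sketch below assumes connect_timeout is a number of seconds; the class name ConnectBudget and the injectable clock parameter are invented for illustration and are not part of the driver.

```python
import time


class ConnectBudget:
    """Track how much of connect_timeout is left across connection phases."""

    def __init__(self, connect_timeout, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + connect_timeout

    def remaining(self):
        """Seconds left before the overall deadline, never negative."""
        return max(0.0, self._deadline - self._clock())


if __name__ == "__main__":
    # A fake clock makes the three phases visible without sleeping.
    now = [100.0]
    budget = ConnectBudget(5.0, clock=lambda: now[0])

    now[0] += 1.5                      # TCP connect took 1.5 s
    print(budget.remaining())          # budget left for the SSL handshake

    now[0] += 2.0                      # SSL handshake took 2.0 s
    print(budget.remaining())          # budget left for CQL startup
```

Using a monotonic clock avoids surprises from wall-clock adjustments, and clamping at zero gives asyncio.wait_for() an immediate timeout instead of a negative value.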
Step 4: Preserve connected_event semantics exactly
connected_event must NOT be set from connection_made(). In this driver, connected_event means "CQL-level startup is complete (READY/auth received) or has failed." It is set at:
connection.py:1584 — after ReadyMessage received
connection.py:1640 — after AuthSuccessMessage received
connection.py:1136 — on defunct() (error path)
asyncioreactor.py:168 — on _close() (cleanup path)
Connection.factory() at connection.py:980 waits on this event with connect_timeout. Setting it earlier would expose half-initialized connections that haven't completed the CQL handshake.
The flow is: connection_made() → _send_options_message() → OPTIONS/SUPPORTED/STARTUP/READY exchange → connected_event.set().
Step 5: Refactor push() for SSL connections
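For SSL connections, replace the write queue + sock_sendall() with transport.write(), scheduled onto the event loop thread via call_soon_threadsafe() because push() can be called from other threads.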
No chunking is needed for the transport path — asyncio's SSLProtocol handles buffering internally.
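A minimal sketch of the write path, assuming push() may be called from any thread and that the transport must only be touched on the event loop thread. RecordingTransport is a hypothetical stand-in for the real asyncio transport, and the free function push() here is a simplification of the method on AsyncioConnection.

```python
import asyncio
import threading


class RecordingTransport:
    """Hypothetical stand-in for an asyncio transport; records writes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))


def push(loop, transport, data):
    """Hand data to the transport on the loop thread, from any thread."""
    loop.call_soon_threadsafe(transport.write, data)


def demo_push():
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    transport = RecordingTransport()
    try:
        push(loop, transport, b"OPTIONS")
        push(loop, transport, b"STARTUP")
        # Barrier: callbacks run in FIFO order, so once this coroutine has
        # finished, both writes above have been delivered.
        asyncio.run_coroutine_threadsafe(asyncio.sleep(0), loop).result(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)
        thread.join()
        loop.close()
    return transport.chunks


if __name__ == "__main__":
    print(demo_push())
```

Because call_soon_threadsafe() preserves submission order, messages reach the transport in the order they were pushed, which is the ordering guarantee the write queue provided before.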
Step 6: Update close() / _close() for SSL connections
For SSL connections:
Call self._transport.close() instead of self._socket.close() — the transport owns the socket and handles graceful TLS shutdown (close_notify).
Skip remove_reader()/remove_writer() calls — the transport manages its own fd registration.
Cancel the _create_ssl_connection future if still pending.
For non-SSL connections, keep the existing close logic.
Step 7: Error propagation
All errors from the transport/protocol must reach the driver's error handling:
_create_ssl_connection() catches exceptions from create_connection() (handshake failures, cert errors, timeouts) and calls self.defunct(exc), which sets last_error, calls close(), errors all requests, and sets connected_event.
connection_lost(exc) in _CQLProtocol calls self.defunct(exc) for error cases and self.close() for clean shutdown.
asyncio.wait_for() raises asyncio.TimeoutError if the handshake exceeds the timeout budget, which _create_ssl_connection() catches and passes to defunct().
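The timeout path can be exercised without any networking. In this sketch, FakeConnection and create_with_timeout are hypothetical names; the point is only that a handshake that never completes ends up as an exception handed to defunct().

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in that records what defunct() receives."""

    def __init__(self):
        self.last_error = None

    def defunct(self, exc):
        self.last_error = exc


async def create_with_timeout(connection, connect_coro, timeout):
    """Await connect_coro within timeout; route any failure to defunct()."""
    try:
        return await asyncio.wait_for(connect_coro, timeout=timeout)
    except Exception as exc:
        # Covers handshake failures, certificate errors, and timeouts alike.
        connection.defunct(exc)
        return None


async def _never_finishes():
    # Stands in for a TLS handshake against an unresponsive peer.
    await asyncio.sleep(3600)


def demo_timeout():
    conn = FakeConnection()
    asyncio.run(create_with_timeout(conn, _never_finishes(), 0.01))
    return conn.last_error


if __name__ == "__main__":
    print(repr(demo_timeout()))
```

Catching Exception rather than BaseException leaves asyncio.CancelledError alone, so cancelling the pending future during close() is not reported as a connection error.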
Step 8: Tests
Unit tests (new, required):
Successful SSL startup via transport: mock create_connection() → connection_made() → OPTIONS/SUPPORTED/STARTUP/READY → connected_event set
SSL handshake failure: create_connection() raises ssl.SSLCertVerificationError → defunct() called → connected_event set with error
SSL handshake timeout: asyncio.wait_for() raises TimeoutError → defunct() called
connection_lost(exc) → defunct() called
connection_lost(None) → close() called
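Write via transport.write(): verify data reaches the transport mock
pause_writing() / resume_writing() → _socket_writable toggled
server_hostname extraction: check_hostname=True with/without explicit hostname, check_hostname=False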
Non-SSL connections unchanged — existing sock_recv/sock_sendall path still used
Integration tests (existing, need validation):
All test_client_routes.py SSL tests pass under EVENT_LOOP_MANAGER=asyncio (this is the primary acceptance criterion)
Run tests/integration/long/test_ssl.py under EVENT_LOOP_MANAGER=asyncio at least once manually (not in CI, but verified before merge)
Both check_hostname=True and check_hostname=False paths work
Regression tests:
Adapt ReactorTestMixin from tests/unit/io/utils.py for the asyncio reactor (currently not used by asyncio tests — only timers are tested)
Non-SSL asyncio connections must pass all existing tests unchanged
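Several of the unit tests listed above need nothing but a mocked connection. The sketch below repeats the protocol bridge from Step 2 so that it is self-contained, then drives its callbacks directly; run_bridge_checks is a hypothetical helper name, and a real test suite would split these checks into separate test methods.

```python
import asyncio
from unittest import mock


class _CQLProtocol(asyncio.Protocol):
    """Copy of the bridge sketched in Step 2, repeated for self-containment."""

    def __init__(self, connection):
        self._connection = connection
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport
        self._connection._send_options_message()

    def data_received(self, data):
        self._connection._iobuf.write(data)
        self._connection.process_io_buffer()

    def connection_lost(self, exc):
        if exc:
            self._connection.defunct(exc)
        else:
            self._connection.close()

    def pause_writing(self):
        self._connection._socket_writable = False

    def resume_writing(self):
        self._connection._socket_writable = True


def run_bridge_checks():
    conn = mock.MagicMock()
    proto = _CQLProtocol(conn)

    # connection_made() starts the CQL handshake.
    proto.connection_made(mock.MagicMock())
    conn._send_options_message.assert_called_once_with()

    # data_received() feeds the I/O buffer and triggers processing.
    proto.data_received(b"\x00")
    conn._iobuf.write.assert_called_once_with(b"\x00")
    conn.process_io_buffer.assert_called_once_with()

    # Backpressure callbacks toggle _socket_writable.
    proto.pause_writing()
    assert conn._socket_writable is False
    proto.resume_writing()
    assert conn._socket_writable is True

    # An error defuncts the connection; a clean close does not.
    err = ConnectionResetError("peer reset")
    proto.connection_lost(err)
    conn.defunct.assert_called_once_with(err)
    conn.close.assert_not_called()
    proto.connection_lost(None)
    conn.close.assert_called_once_with()
    return True


if __name__ == "__main__":
    print(run_bridge_checks())
```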
Scope and Risk
Files changed: Primarily cassandra/io/asyncioreactor.py, with minor changes to cassandra/connection.py (to allow subclass override of _wrap_socket_from_context()).
Backward compatibility: The public API (Cluster(ssl_context=...)) is unchanged. Only the internal transport mechanism changes for asyncio+SSL.
Risk areas:
Timeout budget splitting between TCP connect, SSL handshake, and CQL startup
_send_options_message() being called from connection_made() (async, on event loop thread) vs. from __init__() (sync, on caller thread) — verify thread safety of the CQL startup path
push() being called from both the caller thread and the event loop thread — the call_soon_threadsafe approach handles this, but verify no races with transport readiness
Non-SSL connections: Completely unchanged — sock_recv()/sock_sendall() work correctly for plain TCP sockets and the existing code path is not modified.