Skip to content

Refactor/better outbound request types#1772

Draft
Stevenjin8 wants to merge 28 commits intoistio:masterfrom
Stevenjin8:refactor/better-outbound-request-types
Draft

Refactor/better outbound request types#1772
Stevenjin8 wants to merge 28 commits intoistio:masterfrom
Stevenjin8:refactor/better-outbound-request-types

Conversation

@Stevenjin8
Copy link
Contributor

The way I kinda want to see outbound.rs change

Copilot AI and others added 27 commits February 11, 2026 05:20
…eEntry

- Proto: Add ConnectStrategy enum and field to LoadBalancing message
- Rust model: Add ConnectStrategy enum and field to LoadBalancer struct
- Proto→Rust conversion: Map connect_strategy in TryFrom<&XdsService>
- Add resolve_all_on_demand_dns method for returning all DNS IPs
- Thread race_candidates through Upstream and Request structs
- Implement proxy_to_race using FuturesUnordered for connection racing
- Dispatch to proxy_to_race when race_candidates is non-empty
- Add rst_close helper for RST-closing on total failure
- Fix all test struct literals for new fields

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
…placeholder

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
1. Explicitly drop FuturesUnordered after finding winner to cancel
   remaining in-progress connection attempts.
2. Replace separate actual_destination + race_candidates fields with
   UpstreamTarget enum (Single/Race) for clearer type-level modeling.
3. Guard against racing for in-mesh (HBONE) destinations - only race
   for TCP (non-mesh) destinations.

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
… fix error log level

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Instead of spawning all connection attempts simultaneously, use a
timer-based approach inspired by Happy Eyeballs (RFC 8305):

1. Start with the first candidate
2. If an attempt fails immediately (e.g. RST), start the next
   candidate right away without waiting
3. If neither success nor failure occurs within 300ms, start the
   next candidate in parallel while keeping current attempts going
4. First successful TCP handshake wins; remaining attempts are cancelled

This reduces connection amplification: with 10 DNS entries and a
300ms delay, typically only 2 connections are made (assuming ~500ms
handshake time) instead of 10.

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Remove the representative field from UpstreamTarget::Race and instead
pass the original destination address (orig_dst) to proxy_to_race.
ConnectionResultBuilder is now constructed after racing determines the
actual winner via peer_addr(), ensuring metrics report the real
destination rather than a pre-selected representative.

Move connection tracking (track_outbound) into proxy_to_race so it
also uses the actual winning address.
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
* prioritize canonical services on inbound

* delete MatchReason

* fix itertools thing

* continue not return

* drop continue for into_iter

* as deref

* gencheck
* dry-run no allow policies match

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* authpol logging macro + env var to allow info log level if desired

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* move macro to better location

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* clean up

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* update tests

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* fmt

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Ian Rudie <ian.rudie@solo.io>
* proxy: handle peer_addr() failure gracefully instead of panicking

Replace .expect("must receive peer addr") with proper error handling in
outbound and socks5 proxy paths. A client can send RST immediately after
the TCP handshake completes, causing getpeername(2) to return ENOTCONN on
the already-queued socket. The previous .expect() converted this transient
OS-level error into a panic that killed the Tokio task.

- outbound: match on peer_addr(), log debug and return early on error
- socks5/handle_socks_connection: same pattern
- socks5/negotiate_socks_connection: use ? operator (From<io::Error> for
  SocksError already exists)

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* i socket: avoid panic in orig_dst_addr diagnostic log

peer_addr() and local_addr() inside the warn!() in orig_dst_addr() were
called with .unwrap(), which could panic if the socket became unavailable
before the log statement was reached. Replace with .map(...).unwrap_or_else()
so the warning logs "N.A." instead of crashing when the address cannot be
retrieved.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Replace the Request struct + UpstreamTarget enum with a Request enum
having four variants: Tcp, RaceTcp, Hbone, DoubleHbone. Each variant
is a separate struct carrying only the fields relevant to that protocol
mode:

- TcpRequest: direct TCP to single address
- RaceTcpRequest: race TCP connections to multiple candidates
- HboneRequest: single HBONE tunnel (hbone_target_destination is
  non-optional)
- DoubleHboneRequest: nested HBONE via E/W gateway (includes
  final_sans)

Key changes:
- Remove UpstreamTarget enum (subsumed into Request variants)
- Each proxy_to_* handler takes its specific variant type and owns its
  own connection tracking and telemetry setup
- proxy_to() is now a clean match dispatch with no pre-dispatch setup
- send_hbone_request and create_hbone_request take individual fields
  instead of &Request
- baggage() takes &Workload directly
- conn_metrics() takes shared fields as parameters

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@istio-testing
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 4, 2026
@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants