Add jitter support for HTTP max_connection_duration #42410

@naren-13

Title: Add jitter support for HTTP max_connection_duration

Description:
Currently, Envoy supports jitter for TCP max_connection_duration (implemented in #40686), but not for HTTP connections. As a result, HTTP/2 connections that were established together reach the same max_connection_duration at the same moment and drain in lockstep, which leads to a thundering herd problem.

Use Case / Problem Statement

We run a production Istio service mesh with approximately x HTTP/2 SIDECAR_INBOUND connections, using max_connection_duration: 7200s (2 hours).

Observed Behavior

When all connections hit the 2-hour mark simultaneously:

  1. Synchronised draining: All connections shut down at the same time
  2. Service disruption: Incoming requests receive 503 errors during the drain window
  3. Response flags: Metrics show extensive UC (Upstream Connection Termination) flags in Istio telemetry

Evidence

From Istio/Envoy metrics during drain cycles:

istio_requests_total{response_code="503",response_flags="UC"} [high counts]
istio_requests_total{response_code="503",response_flags="UC"} 
istio_requests_total{response_code="503",response_flags="UC"} 

This is a classic thundering herd problem caused by synchronised connection lifecycle management.
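To illustrate why a fixed limit keeps connections in lockstep, here is a minimal sketch. It is not Envoy code: the burst of 1000 connections established 2 ms apart and the helper names are hypothetical, and only the 7200s limit comes from this report. With a fixed duration, the expiry times are spread over exactly the same window as the establishment times, so a burst of new connections becomes a burst of closes two hours later.

```python
from collections import Counter

# Value from this report: max_connection_duration of 7200s, in milliseconds.
MAX_CONNECTION_DURATION_MS = 7200 * 1000


def expiry_times_ms(established_ms, duration_ms=MAX_CONNECTION_DURATION_MS):
    """Return the time at which each connection reaches its fixed duration limit."""
    return [t + duration_ms for t in established_ms]


def peak_closes_per_second(expiries_ms):
    """Largest number of connections that expire within the same one-second bucket."""
    buckets = Counter(t // 1000 for t in expiries_ms)
    return max(buckets.values())


# Hypothetical burst: 1000 connections established 2 ms apart (for example after a rollout).
established = [i * 2 for i in range(1000)]
expiries = expiry_times_ms(established)

# The spread of expiry times equals the spread of establishment times.
spread_ms = max(expiries) - min(expiries)
peak = peak_closes_per_second(expiries)
print(f"spread: {spread_ms} ms, peak closes in one second: {peak}")
```

In this toy model all 1000 connections close within about two seconds of each other, which matches the drain pattern described above.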

We need to enforce connection duration limits for compliance and security reasons, but cannot do so with the current behavior because the synchronised draining disrupts service. We would appreciate help with this feature.
Expected behavior: extend the existing TCP jitter implementation (from #40686) to HTTP connection durations, so that connections are closed in a staggered manner.
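The staggering requested here can be sketched as follows. This is a rough illustration under stated assumptions, not the scheme used in #40686: the function name `jittered_duration`, the `jitter_pct` parameter, and the choice of adding a uniformly random extra of up to that percentage of the base duration are all hypothetical.

```python
import random


def jittered_duration(base_s, jitter_pct, rng):
    """Return base_s plus a random extra of up to jitter_pct percent of base_s.

    Assumption for illustration: jitter only lengthens the duration and is
    drawn uniformly. The real Envoy behavior may differ.
    """
    if not 0.0 <= jitter_pct <= 100.0:
        raise ValueError("jitter_pct must be between 0 and 100")
    return base_s + rng.uniform(0.0, base_s * jitter_pct / 100.0)


# Seeded generator so the sketch is reproducible.
rng = random.Random(42)

# 1000 connections, each given its own limit when it is established:
# 7200s base (from this report) with a hypothetical 10% jitter.
durations = [jittered_duration(7200.0, 10.0, rng) for _ in range(1000)]

# Every limit falls between 7200s and 7920s, so closes are spread over
# up to 12 minutes instead of landing at the same instant.
print(min(durations) >= 7200.0, max(durations) <= 7920.0)
```

With a scheme like this, each connection draws its own limit once, so connections established in the same burst no longer expire together.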
