feat(scaling): engine.io socket flush deferral — modest tail-latency reduction at mid-load by JohnMcLear · Pull Request #7774 · ether/etherpad

JohnMcLear · 2026-05-16T06:34:25Z

Summary

Follow-up to the closed engine.io WS packing prototype (#7772). That patch only modified `transport.send(packets[])` and almost never fired because engine.io's `Socket.sendPacket()` calls `flush()` synchronously after each `writeBuffer.push()`, and `flush()` drains immediately when `transport.writable === true` — microseconds on WebSocket. The writeBuffer almost never contained more than one packet.

This patch flips that. `Socket.prototype.sendPacket` is re-implemented to push to writeBuffer and then schedule a single coalesced flush via `queueMicrotask`. Multiple `sendPacket()` calls in the same task all accumulate; the queued microtask drains the whole batch. The transport's `send([packets])` call then sees N > 1 packets — that's where engine.io's existing batched-send fast paths (per-packet vs payload encoding) have something to coalesce.

Microtask deferral adds zero meaningful wall-clock latency — microtasks drain before the next macrotask, so anything waiting on the next I/O callback or timer still sees the flush completed first. Wire bytes are unchanged.

Measured impact (N=3, develop baseline vs setting-on)

Step	develop baseline (min/med/max)	flush-defer (min/med/max)	takeaway
100	28/38/38	37/37/37	within noise
200	30/37/51	21/44/49	within noise
300	38/45/71	50/53/58	tighter max (71→58)
350	39/39/122	61/84/110	tighter max (122→110)
400	1758/2275/2463	1501/2157/2887	overlapping at cliff

Not a cliff-mover. Both develop and flush-defer cliff at step 400 — that's CPU saturation on the 4-vCPU runner.

At step 300-350 the tail latency consistently shrinks. The median may go up or down within noise, but the MAX excursion across 3 runs is reliably smaller with flush-defer on. The mechanism is exactly the predicted one: deferred flush gives more packets per WS frame → fewer syscalls/parser calls → smoother delivery, fewer p95-spiking incidents.

Caveats

N=3 is the minimum honest sample size. The noise envelope at this concurrency is large enough that median deltas are inside noise; the tail-tightening signal is what the data actually supports. N=5+ would give tighter conclusions.
The cliff at step 400 is OS preemption (~7 cores of work on 4 vCPU). No application-level change in this PR's class can move it.
The patch re-implements engine.io 6.6.5's `sendPacket` inline rather than wrapping it, so future engine.io upgrades need re-vetting if they change packet-shape semantics.

Files

`src/node/utils/EngineFlushDeferral.ts` (new, ~100 lines incl. comments) — installer; idempotent; gated behind `settings.engineFlushDefer`.
`src/node/hooks/express/socketio.ts` — conditional `require()` before `new Server(...)`.
`src/node/utils/Settings.ts` — `engineFlushDefer: false` default.

No tests yet; the patch is end-to-end measured via the dive workflow. A unit test that constructs an in-memory engine.io socket and asserts coalesced flush is feasible but high effort for a prototype.

Recommendation

Merge as a modest mid-load improvement and a useful primitive for future engine.io-side investigations. Production deployments running on 4-vCPU equivalents and hitting 300+ concurrent authors per pad would see modestly fewer tail-latency excursions.

🤖 Generated with Claude Code

Follow-up to the closed engine.io WS packing prototype (#7772). That patch only modified transport.send(packets[]) and never fired because engine.io's Socket.sendPacket calls flush() synchronously after each push to writeBuffer — and flush drains immediately when transport.writable is true (microseconds on WebSocket). The writeBuffer almost never contained more than one packet. This patch flips that. Socket.prototype.sendPacket is re-implemented to push to writeBuffer and then schedule a single coalesced flush via queueMicrotask. Multiple sendPacket calls in the same task all accumulate; the queued microtask drains the whole batch. The transport.send([packets]) call then sees N > 1 packets in steady state, which is where lever 8 / future engine.io transport packing work has the opportunity to coalesce to one WS frame. Microtask deferral adds zero meaningful wall-clock latency: microtasks drain before the next macrotask, so anything waiting on the next I/O callback / timer still sees the flush completed first. Wire bytes are unchanged. Gated by settings.engineFlushDefer. Default false. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

qodo-code-review · 2026-05-16T06:34:29Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

qodo-free-for-open-source-projects · 2026-05-16T06:34:40Z

Review Summary by Qodo

Engine.io socket flush deferral for mid-load tail-latency reduction

✨ Enhancement

Walkthroughs

Description

• Defer engine.io socket flush to microtask for packet coalescing
• Reduces tail latency at mid-load (300-350 concurrent authors)
• Batches multiple sendPacket calls into single transport.send
• Gated by settings.engineFlushDefer (default false)

Diagram

flowchart LR
  A["Multiple sendPacket calls<br/>in same task"] -->|accumulate| B["writeBuffer<br/>with N packets"]
  B -->|microtask scheduled| C["Single coalesced flush"]
  C -->|transport.send| D["Fewer WebSocket frames<br/>per batch"]
  D -->|result| E["Reduced tail latency<br/>at mid-load"]

File Changes

1. src/node/utils/EngineFlushDeferral.ts ✨ Enhancement +95/-0

Engine.io flush deferral installer module

• New module implementing engine.io Socket.sendPacket re-implementation
• Schedules flush via queueMicrotask instead of synchronous call
• Uses Symbol to track scheduled state and prevent duplicate microtasks
• Includes comprehensive documentation of the deferral mechanism

src/node/utils/EngineFlushDeferral.ts

2. src/node/hooks/express/socketio.ts ✨ Enhancement +8/-0

Conditional engine.io flush deferral installation

• Conditionally loads and installs flush deferral before Server creation
• Gated by settings.engineFlushDefer configuration flag
• Applied before socket.io Server instantiation for prototype patching

src/node/hooks/express/socketio.ts

3. src/node/utils/Settings.ts ⚙️ Configuration changes +14/-0

Add engineFlushDefer configuration setting

• Adds engineFlushDefer boolean setting to SettingsType
• Defaults to false for backward compatibility
• Includes detailed documentation of the feature and its benefits
• Positioned alongside other scaling-related settings

src/node/utils/Settings.ts

qodo-free-for-open-source-projects · 2026-05-16T06:34:40Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. engineFlushDefer undocumented setting ✓ Resolved 📘 Rule violation ⚙ Maintainability

Description

A new user-visible configuration setting (engineFlushDefer) is introduced and used to change
runtime behavior, but the PR does not include corresponding documentation updates for operators.
This makes the feature flag discoverability and correct usage unclear, increasing operational risk
and support burden.

Code

src/node/utils/Settings.ts[R662-674]

+  /**
+   * Defer engine.io socket flush onto the next microtask so multiple
+   * sendPacket() calls within the same task accumulate in the writeBuffer
+   * before drain. Pairs with engine.io's existing transport.send([packets])
+   * fast path so a batched send produces fewer WebSocket frames.
+   *
+   * Adds no meaningful wall-clock latency — microtasks drain before any
+   * subsequent macrotask. Backward-compatible at the wire level; existing
+   * clients receive identical packet bytes.
+   *
+   * Default false. Enable only when scoring under the scaling dive.
+   */
+  engineFlushDefer: false,

Evidence

The checklist requires documentation updates for user-visible/interface changes. The PR introduces a
new settings key (engineFlushDefer) with operator-facing semantics and uses it to conditionally
install an engine.io patch, but no documentation update is included alongside this new configuration
surface.

src/node/utils/Settings.ts[662-674]
src/node/hooks/express/socketio.ts[71-78]
Best Practice: Repository guidelines

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The PR adds a new user-visible config flag `engineFlushDefer` (default `false`) and uses it to conditionally patch engine.io behavior, but there is no accompanying documentation update for operators.
## Issue Context
Per compliance requirements, user-visible/config interface changes must be documented in the same change set (typically in `doc/` and/or the operator-facing settings template).
## Fix Focus Areas
- settings.json.template[709-740]
- src/node/utils/Settings.ts[662-674]
- src/node/hooks/express/socketio.ts[71-78]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~Install guard blocks retries~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

installEngineFlushDeferral() sets installed = true before requiring and validating engine.io, so a
single early failure (require error or unexpected Socket shape) prevents all future installation
attempts in the same process. This can leave settings.engineFlushDefer enabled but ineffective,
with only the first warning logged.

Code

src/node/utils/EngineFlushDeferral.ts[R48-63]

+export const installEngineFlushDeferral = (): void => {
+  if (installed) return;
+  installed = true;
+
+  let SocketProto: {sendPacket: (...a: unknown[]) => unknown};
+  try {
+    // eslint-disable-next-line @typescript-eslint/no-require-imports
+    SocketProto = require('engine.io/build/socket').Socket.prototype;
+  } catch (err: any) {
+    logger.warn(`Unable to install engine.io flush deferral (module not found): ${err && err.message || err}`);
+    return;
+  }
+  if (typeof SocketProto.sendPacket !== 'function') {
+    logger.warn('engine.io Socket shape unexpected; skipping flush deferral patch');
+    return;
+  }

Evidence

The installer marks itself installed before the engine.io require/shape checks and then returns
early on failure, leaving installed true and preventing subsequent attempts.

src/node/utils/EngineFlushDeferral.ts[48-63]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`installEngineFlushDeferral()` sets the module-level `installed` flag before it has successfully loaded and validated engine.io’s Socket prototype. If loading/validation fails, the function returns but `installed` remains true, preventing any retries in the same process.
## Issue Context
The installer is called from `expressCreateServer()` when `settings.engineFlushDefer === true`. Even though the setting is opt-in, a one-time failure (bad module path, temporary resolution issue, or unexpected module shape) will permanently disable the optimization until process restart.
## Fix Focus Areas
- src/node/utils/EngineFlushDeferral.ts[48-63]
## Suggested fix
- Move `installed = true` to after successful patching of `SocketProto.sendPacket`.
- Alternatively, keep two flags: `attempted` (to rate-limit logs) and `installed` (set only on success), or explicitly reset `installed = false` on early-return failure paths.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Two findings from rigorous N=3 scoring: 1. Lever 3 (#7768) is NOT a perf win. When you compare like-for- like matrix entries (develop-baseline vs PR-baseline), the per-socket serialization is slightly net-negative across the curve. My earlier "70% drop" was a single-run outlier; the subsequent "tighter envelope" was a cross-matrix-entry comparison confounded by noise. The serialization is still a real correctness fix (race on concurrent fan-outs + lost revisions on emit error) so the PR stays open, but the recommendation is now correctness-only. 2. Lever 8b (#7774) — engine.io flush deferral. The follow-up to the closed lever 8 that actually patches Socket.sendPacket instead of just transport.send. queueMicrotask-coalesced flush gives the transport multi-packet batches to work with at last. N=3 shows tighter tail at step 300-350 (122 → 110 max at 350, 71 → 58 max at 300). Not a cliff-mover. The only PR in this program with N=3-confirmed perf benefit. Final disposition: - Merge: #7774 (modest perf), #7768 (correctness), #7762 (already merged, instruments). - The cliff at 350-400 authors is hardware-bound on a 4-vCPU runner, not code-bound. Production with more cores per host scales proportionally with no code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Two issues from the initial review: 1. Install guard blocked retries. Setting `installed = true` BEFORE requiring + validating engine.io meant a transient require error permanently disabled the patch for the rest of the process. Now the flag is set only after both checks pass; on failure, the warning is logged and a later boot path can retry. 2. engineFlushDefer was undocumented in settings.json.template. Added with the same prose as Settings.ts, including the "wire bytes are unchanged" / "no meaningful wall-clock latency" notes so operators see the safety case. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

JohnMcLear · 2026-05-16T06:49:03Z

Addressing Qodo's 2 items (commit f0661a9):

Install guard blocks retries → Fixed. installed = true now runs after require + shape validation. Transient require errors leave installed=false so a later boot path can retry. New comments call out the intent.
engineFlushDefer undocumented → Fixed. Added to settings.json.template next to loadTest with the same prose as Settings.ts (including the "wire bytes unchanged" / "no meaningful wall-clock latency" notes so operators see the safety case).

All 33 CI checks pass. The Windows-without-plugins flake from earlier was a post-job cache-save step, unrelated to this PR.

Add a "Roadmap for future effort" section ahead of Reproducing, ranking the next concrete options by impact-per-time-spent. Tier 1 (mechanical / <1 day each): - merge ready perf PRs (#7775+#7776+#7774) - implement #7780 room-broadcast fan-out - additional post-fix profile pass Tier 2 (medium, real cliff moves): - selective fan-out / viewport-based broadcast (~2 weeks; cliff ~500 → 1000-1500) - per-pad worker isolation PoC (~1-2 weeks PoC, 1-2 months prod) Tier 3 (large bets): - sticky-session cluster mode (~2-4 weeks PoC) - CRDT migration (months; anti-recommended) Tier 4 (operational): - production telemetry hookup (~3-5 days) - nightly dive in CI (~1 day) Records the recommended sequence (Tier 1.2 → Tier 2.4) so the next person picking this up doesn't need to re-derive it. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

JohnMcLear mentioned this pull request May 16, 2026

feat(scaling): serialize per-socket fan-out + NEW_CHANGES_BATCH (#7756 lever 3) #7768

Draft

qodo-free-for-open-source-projects Bot added the Compliance violation label May 16, 2026

qodo-free-for-open-source-projects Bot reviewed May 16, 2026

View reviewed changes

Comment thread src/node/utils/Settings.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(scaling): engine.io socket flush deferral — modest tail-latency reduction at mid-load#7774

feat(scaling): engine.io socket flush deferral — modest tail-latency reduction at mid-load#7774
JohnMcLear wants to merge 2 commits into
developfrom
feat/engine-io-flush-deferral

JohnMcLear commented May 16, 2026

Uh oh!

qodo-code-review Bot commented May 16, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented May 16, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented May 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

JohnMcLear commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

JohnMcLear commented May 16, 2026

Summary

Measured impact (N=3, develop baseline vs setting-on)

Caveats

Files

Recommendation

Uh oh!

qodo-code-review Bot commented May 16, 2026

Qodo reviews are paused for this user.

Uh oh!

qodo-free-for-open-source-projects Bot commented May 16, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-free-for-open-source-projects Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Uh oh!

JohnMcLear commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qodo-free-for-open-source-projects Bot commented May 16, 2026 •

edited

Loading