perf: don't log settings.loadTest warning per-message (#7756)#7776
perf: don't log settings.loadTest warning per-message (#7756)#7776JohnMcLear wants to merge 2 commits into
Conversation
CPU profile of develop (and of the open #7775 branch) at the 100-400 author dive sweep attributed ~4% of total process CPU to log4js inside SecurityManager.checkAccess. Tracing the actual log call: line 79-80 emits `console.warn('bypassing socket.io authentication...')` on every checkAccess invocation when settings.loadTest is true — once per inbound message. With log4js's replaceConsole + cluster-mode dispatch enabled, that warning allocated, formatted, and dispatched a LogEvent through sendToListeners -> sendLogEventToAppender for every CLIENT_READY, COMMIT_CHANGESET, USERINFO_UPDATE, etc. settings.loadTest is a configuration choice, not a per-request condition. The warning belongs at startup. Move it to Settings.ts init alongside the other "you set X, beware" warnings, and drop the per-message branch (the loadTest short-circuit still applies). Test plan: - tests/backend/specs/api/sessionsAndGroups.ts: 32 passing - tests/backend/specs/socketio.ts: 39 passing (handleMessage paths) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Review Summary by QodoMove loadTest warning to startup, eliminate per-message logging overhead
WalkthroughsDescription• Move loadTest warning from per-message to startup initialization • Eliminate ~4% CPU overhead in SecurityManager.checkAccess logging • Simplify authentication check logic by inverting condition • Add startup warning in Settings.ts alongside other config warnings Diagramflowchart LR
A["SecurityManager.checkAccess<br/>per-message"] -->|removed| B["console.warn<br/>~4% CPU cost"]
C["Settings.ts<br/>startup init"] -->|added| D["logger.warn<br/>one-time warning"]
E["loadTest bypass<br/>logic"] -->|unchanged| F["Silent short-circuit"]
File Changes1. src/node/db/SecurityManager.ts
|
Code Review by Qodo
1. settings.loadTest block misindented
|
N=3 measured impactThree perf-branch runs (25959515488, 25959516741, 25959517823) vs the same three develop baselines used to score #7775 — `authors=100..500:step=50:dwell=8s:warmup=2s`:
* step 350 develop raw p95 triples [35, 41.6, 41.8] vs #7776 [37.1, 37.9, 38.8] — develop's 39 is a single-run dip; #7776's distribution is tighter (every run between 37-39) which the median doesn't show. The CPU% gain at this step (-8.9%) is the cleanest direction signal. Summary:
Pairs cleanly with #7775 (which fixes a different log4js entry path). Both should land. |
Three combined-branch runs (perf/dive-combined = #7776 cherry-picked onto #7775 base; runs 25960003164/25960004223/25960005248) vs the same three develop baselines: -12% to -20% CPU% across all 9 sweep steps, with the p95 cliff effectively moving from ~400 to ~500 authors (at step 400, two of three combined runs land below the cliff at 45ms and 112ms p95 vs develop [1758, 2275, 2463]). Adds: - Lever 10 section for #7776 with its own N=3 numbers (-3.6 to -8.9% alone). - "Stacking" section showing super-additive interaction. - Local vCPU experiment showing the cliff is single-event-loop-bound, not total-CPU-bound: 4-core and 8-core pinned SUTs hit the same cliff at the same step. - Updated TL;DR, Recommendation order (merge both #7775+#7776 first), and "Where to take this next" with worker-thread offload as the smallest next architectural step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#7775 + #7776 stacked — N=3 combined measurementI cherry-picked #7775 onto #7776 (branch
The stacked impact (-12% to -20% CPU%) is well above the sum of the individual gains. Both fixes remove call sites feeding the same log4js cluster-mode dispatch chain ( Latency: cliff has effectively moved past step 400. Raw p95 triples:
At step 400, two of three combined runs land below the cliff entirely. Recommend landing these two together to capture the full effect. |
Post-#7775/#7776 profile shows applyToAText splits cleanly: - applyToText (Changeset.ts:404) is pure (cs, text) -> text; trivially offloadable to a worker via worker_threads structured-clone postMessage. - applyToAttribution (Changeset.ts:684) mutates AttributePool; not trivially offloadable. Document the obvious first-pass design (run them in parallel via Promise.all inside applyToAText) and the realistic estimate (~6-8% CPU moved off the main event loop). putAttrib is only 0.26% in the post-fix profile, confirming the bulk of applyToAText's cost is in the string-manipulation half. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues flagged: 1. Indentation: outdented the continuation lines inside the new `if (settings.loadTest)` block from 10 spaces to 8 (one level from `logger.warn(`), matching 2-space indent rule for added code. 2. Warning scope: the original wording said only socket.io authn/authz is bypassed, but settings.loadTest short-circuits SecurityManager.checkAccess() which is called from both HTTP (padaccess, importexport) and socket.io (PadMessageHandler) paths. Reword to "SecurityManager.checkAccess() will bypass authentication and authorization for both HTTP and socket.io requests". 3. Misleading "fires once at startup" comment in SecurityManager.ts: the warning is logged from Settings.ts reloadSettings(), which is also called on admin restart and plugin install. Rephrase to "logged from Settings.ts during settings load/reload, not on every request". All three issues are accurate. No behaviour change for the fix itself; only comment + warning text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed all three Qodo issues in afb3a4a:
71 backend tests pass (sessionsAndGroups + socketio). |
|
Re Qodo's remaining issue #1 (indent on Settings.ts:1196-1200): The surrounding The Leaving as-is to match local convention. |
Add a "Roadmap for future effort" section ahead of Reproducing, ranking the next concrete options by impact-per-time-spent. Tier 1 (mechanical / <1 day each): - merge ready perf PRs (#7775+#7776+#7774) - implement #7780 room-broadcast fan-out - additional post-fix profile pass Tier 2 (medium, real cliff moves): - selective fan-out / viewport-based broadcast (~2 weeks; cliff ~500 → 1000-1500) - per-pad worker isolation PoC (~1-2 weeks PoC, 1-2 months prod) Tier 3 (large bets): - sticky-session cluster mode (~2-4 weeks PoC) - CRDT migration (months; anti-recommended) Tier 4 (operational): - production telemetry hookup (~3-5 days) - nightly dive in CI (~1 day) Records the recommended sequence (Tier 1.2 → Tier 2.4) so the next person picking this up doesn't need to re-derive it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
A fresh CPU profile of the dive SUT (etherpad-load-test workflow run 25957515210, against open PR #7775's branch to remove the throw-cascade noise) revealed that ~4% of total process CPU was spent in log4js inside
SecurityManager.checkAccess— and it wasn't from the throw chain at all. Tracing the actual log call exposes this line:```ts
// SecurityManager.checkAccess, line 78-81
if (settings.loadTest) {
console.warn(
'bypassing socket.io authentication and authorization checks due to settings.loadTest');
}
```
When `settings.loadTest` is true (used by the dive harness, by some local dev rigs, and by anyone benchmarking), this `console.warn` fires on every `checkAccess` call — once per inbound socket.io message. With `log4js.replaceConsole` + cluster-mode dispatch (default), each warning allocates a LogEvent, formats it, and walks the `sendToListeners` → `sendLogEventToAppender` chain. The profile attributes ~4% of total CPU to that path.
`settings.loadTest` is a configuration choice, not a per-request condition. Move the warning to startup (Settings.ts init, next to the other config-time warnings), and drop the per-message branch. The loadTest short-circuit (bypassing the `else if (settings.requireAuthentication)` block) still applies — just silently.
Profile evidence
Inverted-caller analysis of `log4js.Logger.` in the perf-branch profile (CPU.20260516.083843.2623.0.001.cpuprofile, 100→400 author sweep, ~103s wall, 65% on-CPU):
```
4.08% exports.checkAccess [src/node/db/SecurityManager.ts]
0.19% _getLocked [nm/ueberdb2]
0.19% _setLocked [nm/ueberdb2]
0.09% handleRequest [nm/router/lib/layer.js]
0.07% (anon) [src/node/handler/SocketIORouter.ts]
```
After PR #7775 removed the `getSessionInfo` throw cascade, this number was unchanged, which is what surfaced the real root cause.
Test plan
Relationship to #7775
Independent. #7775 removes the throw-as-control-flow pattern (CustomError construction is real CPU, ~1.8%); this PR removes the per-message warning logging (~4% in dive measurements with `loadTest: true`). Both should land. Together they hit the actionable parts of the per-message logger cost the profile flagged.