harden: reject USER_CHANGES inserts without an author attribute#7773
Conversation
Insert ops MUST carry the author attribute reference so that pad.atext.text and pad.atext.attribs stay in lock-step. An accepted insert with empty attribs would grow text without contributing matching attribute markers, leaving the stored AText in a state where the two iterables disagree on length when reconstructed. Downstream clients then fail reconciliation in ace2_inner.ts:setDocAText with 'mismatch error setting raw text in setDocAText' on every subsequent pad load — making the affected pad effectively unloadable until manually repaired. This commit adds a single defensive check inside the existing per-op validation loop in handleUserChanges: when an op is a '+' (insert) and its attribs string doesn't yield an 'author' entry via AttributeMap.fromString, reject with badChangeset. The check piggybacks on the wireApool that was already constructed for the prior author-match validation, so no extra parsing. Test fixtures in messages.ts were updated to send proper author-attributed inserts plus the matching apool (mirroring what the JS web client always does). A new regression test 'insert without author attribute is rejected' locks in the new behaviour.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Review Summary by Qodo(Agentic_describe updated until commit 1751900)Harden pad changeset validation to prevent AText desynchronization
WalkthroughsDescription• Reject USER_CHANGES inserts lacking author attribute to prevent malformed AText • Substitute stable system author for unattributed inserts via HTTP API/plugins • Validate trailing newline preservation in all USER_CHANGES changesets • Block wire-borne operations claiming reserved system author identity • Update test fixtures to include proper author attribution in changesets Diagramflowchart LR
A["USER_CHANGES socket.io"] -->|validate author attrib| B["Reject if missing"]
C["HTTP API/plugins"] -->|spliceText/setText| D["Substitute system author"]
E["Rebased changeset"] -->|validate trailing newline| F["Reject if missing"]
G["Wire operation"] -->|check reserved author| H["Reject if system author"]
B --> I["Well-formed AText"]
D --> I
F --> I
H --> I
File Changes1. src/node/db/Pad.ts
|
Code Review by Qodo
1. Trailing-newline auto-fix removed
|
The first commit closed the socket.io USER_CHANGES hole. This commit closes the parallel path through Pad.spliceText (used by API.setText, API.appendText, the import flow, and plugins like ep_post_data) where an unattributed insert would otherwise produce a malformed AText. Approach: instead of REJECTING (which would break ep_post_data and many existing tests that call setText/appendText without an authorId), substitute a stable system author when none is provided. The resulting changeset is properly attributed, the AText stays well-formed, and existing callers continue to work unchanged. Plugins that want named author attribution should still pass an explicit authorId (e.g., one allocated via authorManager.createAuthor). Pad.SYSTEM_AUTHOR_ID = 'a.etherpad-system' — a stable identifier that appears in the pad's attribute pool when internal callers (HTTP API, plugins, server-side imports) write text without naming an author. The existing 'attribute changes by another author' protections still apply to socket.io USER_CHANGES paths — a remote client can't impersonate the system author for inserts (their session author check fires first). Test: - Pad.ts spec adds 'spliceText with empty authorId attributes to the system author' — verifies pad text lands AND the pool contains the system-author binding. Existing tests that pass an authorId are unaffected.
|
Pushed a second commit (631d8c5) that closes the parallel hole through The first commit only caught socket.io I went with auto-substitute a stable system author rather than throw, because rejecting would break:
New test Happy to split this into a separate PR if you'd prefer to land the socket.io check first. |
|
I need to also check for |
Etherpad's pad text always ends with '\n'. _handleUserChanges previously appended a separate `nlChangeset` correction revision whenever the applied USER_CHANGES left the pad without a trailing '\n'. The stored pad ended up well-formed, but the FIRST NEW_CHANGES broadcast (the malformed user revision itself) reached browsers BEFORE the correction did, and applyToAttribution's MergingOpAssembler aborts with "line assembler not finished" on a non-'\n'-terminated doc — the watching browser session then dropped the changeset and any subsequent edits silently no-op'd until the user reloaded. Replace the silent auto-correction with an explicit reject. Compute `applyToText(rebasedChangeset, prevText)` before appendRevision; if the result doesn't end with '\n', throw -> badChangeset disconnect. Clients must emit USER_CHANGES whose application preserves the invariant — this matches what the JS web client already does and forces non-JS clients (etherpad-pad, third-party integrations) to surface their bugs in their own logs instead of stranding the trailing newline in pad revision history. Also fixes a latent retransmission-detection bug surfaced by this PR's author-attrib changes: moveOpsToNewPool renumbers `*N` references to whatever slot the pad pool assigns, which can differ from the wire form's slot. Comparing the raw client wire against the stored revision form (`changeset === c`) then misses legitimate retransmissions and the same edit gets duplicated. Snapshot the post-pool-mapping form (`canonicalCs`) and compare that against `c` instead. Backend test additions: - 'changeset that would strand the trailing \\n is rejected' covers the new rejection path with wire `Z:6>1|1=6*0+1$X` against `hello\n`. - handleMessageSecurity test now captures roSocket's own authorId and uses it in the apool sent through roSocket, because the prior PR commit made `*0` referencing the wrong author a hard reject. All 1130 backend tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
Pushed Problem (related but distinct from the author-attrib vector this PR opens with): Repro from a non-JS client (this is what surfaced it):
Fix in 805766c:
All 1130 backend tests pass locally on Node 25. Happy to split this into a separate PR if reviewers prefer that — but it's the same shape of bug (malformed-but-passes-checkRep USER_CHANGES → server accepts → browser chokes mid-stream) and the same defensive posture, so leaving it stacked here unless asked to separate. |
`applyEdit` previously left `attribs` undefined when no `edit.authorId` was supplied, producing a changeset whose `+` ops carried no author attribute. Etherpad then stored a pad.atext where `text` and `attribs` disagreed on length, and every later client load hit `mismatch error setting raw text in setDocAText` in ace2_inner — the pad became effectively unloadable. Mirror the `SYSTEM_AUTHOR_ID` pattern that ether/etherpad#7773 adopted in core's `Pad.spliceText`: when the caller doesn't supply an authorId, attribute the insert to `a.ep-ai-chat-system` so the changeset is always well-formed. The system author is *not* announced in the user list — `announceAiAuthor` stays gated on the caller-supplied id. Adds a regression test that exercises the missing-authorId path and asserts (a) every run in pad.atext.attribs carries an author attribute and (b) `attribs` total chars equals `text.length`. Refs ether/etherpad#7773
4.0.3 sends author-attributed inserts and preserves the trailing newline, complying with this PR's tightened USER_CHANGES validation. The rate-limit CI workflow drives the test pad via this client, so without the bump the new server-side rejects fire on the very first \`pad.append()\` and the rate-limit disconnect never gets a chance to arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job fails with "ratelimit was not triggered when sending every 99 ms". Refs ether/etherpad-cli-client#131
| * a fixed system author keeps the AText well-formed without requiring | ||
| * every plugin to allocate its own author up-front. | ||
| */ | ||
| static readonly SYSTEM_AUTHOR_ID = 'a.etherpad-system'; |
There was a problem hiding this comment.
Can that be abused? If I name myself a.etherpad-system as author. Can this have consequences?
The session-author equality check already rejects wire `*N` that
names a different real user, but `a.etherpad-system` is server-
internal — it's only used when spliceText / setText is called with
an empty authorId from HTTP API or plugin paths. A wire op that
names it is either a confused client or an attempt to launder
edits through a reserved attribution slot. Refuse.
Backend test 'insert claiming the reserved system author is
rejected' locks in the new behavior with wire `Z:1>5*0+5$hello`
plus an apool that maps slot 0 to `a.etherpad-system`. All 1131
backend tests pass.
Inline literal `'a.etherpad-system'` rather than importing the
constant from `Pad.SYSTEM_AUTHOR_ID` — `require('../db/Pad')` at
PadMessageHandler module scope returned a partially-initialized
class via the padManager circular path, leaving the static-field
access undefined at runtime.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Persistent review updated to latest commit 1751900 |
| // Defensive: reject any rebased changeset whose application would leave | ||
| // the pad text not ending with '\n'. Previously the server silently | ||
| // appended a separate `nlChangeset` correction revision; that worked | ||
| // for the stored pad but the FIRST broadcast (the malformed user | ||
| // revision) reached browsers BEFORE the correction did, and the | ||
| // browser's line assembler asserts "line assembler not finished" on | ||
| // a doc that doesn't end with '\n', taking the session out. Refuse to | ||
| // accept such changesets — clients must always preserve the | ||
| // trailing-newline invariant. | ||
| const projectedText = applyToText(rebasedChangeset, prevText); | ||
| if (!projectedText.endsWith('\n')) { | ||
| throw new Error( | ||
| `Rejected USER_CHANGES whose application would leave the pad ` + | ||
| `without a trailing '\\n' (length ${projectedText.length}). ` + | ||
| `Every USER_CHANGES must preserve the "doc ends with \\n" invariant.`); | ||
| } |
There was a problem hiding this comment.
1. Trailing-newline auto-fix removed 📘 Rule violation ☼ Reliability
handleUserChanges now rejects any USER_CHANGES changeset that would leave the pad text not ending with \n, instead of tolerating and correcting it. This removes a previously supported behavior without a WARN-level deprecation period, which can break older/third-party clients that rely on server-side correction.
Agent Prompt
## Issue description
The server now hard-rejects USER_CHANGES that would violate the trailing newline invariant, which is a breaking behavior change and effectively removes the prior tolerance/correction behavior without a deprecation path.
## Issue Context
Per compliance, feature/behavior removals must be deprecated first with a WARN log, then removed/enforced in a later version. Current code throws an error immediately for these changesets.
## Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[949-964]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| if (opAuthorId === 'a.etherpad-system') { | ||
| throw new Error(`Author ${thisSession.author} attempted to submit changes as the ` + | ||
| `reserved system author ${opAuthorId} in changeset ${changeset}`); | ||
| } |
There was a problem hiding this comment.
2. System-author undo rejected 🐞 Bug ≡ Correctness
handleUserChanges() rejects any op whose resolved authorId is 'a.etherpad-system', including '=' ops that restore existing authorship. Pads can now legitimately contain system-attributed text via server-side spliceText()/setText() without an authorId, so undoing “clear authorship colors” can disconnect clients with badChangeset when the undo attempts to restore the system author.
Agent Prompt
### Issue description
`handleUserChanges()` rejects any op whose `author` attribute resolves to the reserved system author (`a.etherpad-system`). This breaks legitimate client flows (notably undo of “clear authorship colors”) on pads that contain system-authored text, because undo restores author attributes via `'='` ops.
### Issue Context
* Server-side internal callers now attribute inserts to `Pad.SYSTEM_AUTHOR_ID` when no `authorId` is provided, so pads can contain system-authored segments.
* The editor’s “clearauthorship” command sets author to `''` across the doc; undo restores prior author attributes (including non-self authors) via `'='` ops.
* The new unconditional rejection of `a.etherpad-system` blocks that restoration.
### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[878-908]
- src/node/db/Pad.ts[416-442]
- src/static/js/ace2_inner.ts[610-623]
- src/tests/backend/specs/undo_clear_authorship.ts[168-195]
### Proposed fix approach
1. Narrow the reserved-system-author rejection so it only blocks actual attempts to *write as* the system author (for example: reject if `thisSession.author === 'a.etherpad-system'`, and/or reject only `'+'` ops whose `opAuthorId` is the system author).
2. Do **not** reject `'='` ops solely because they restore the system author; those are needed to undo/redo attribution changes on existing system-authored content.
3. Add/adjust a regression test that creates system-authored content (via `pad.setText(..., '')` or `spliceText(..., '')`), then performs clear-authorship + undo via USER_CHANGES and asserts it is accepted.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Etherpad's USER_CHANGES handler now (post-2.7.x, see ether/etherpad#7773) rejects two kinds of malformed inserts: 1. `+` ops with no `author` attribute — they grow pad.atext.text without contributing matching markers to pad.atext.attribs, leaving the two iterables out of sync and breaking setDocAText reconciliation on every later client load. 2. USER_CHANGES whose application would leave the pad text not ending with '\n' — the browser's line assembler asserts on docs that don't end with '\n' and the session dies. Both were silently produced by `ee.append()`: it called `makeSplice(text, text.length, 0, ins)` (insert at very end, no attribs, no pool). When the host pad starts as `'\n'`, appending `'1'` produced `'\n1'` (no trailing newline) and the changeset's insert carried no author attribute. Fix: * Capture `userId` from CLIENT_VARS and use it as the author attribute on subsequent inserts. * Splice at `text.length - 1` so the insert lands before the trailing '\n' and the invariant holds. * Build the changeset against `padState.apool`, then `moveOpsToNewPool` into a fresh wire apool so the server can resolve the `*N` slot numbers we send. Backward-compatible: pre-hardening Etherpad servers accept the new shape too (it's the same shape the standard JS web client sends). Refs ether/etherpad#7773
Summary
Adds defensive validation in
handleUserChangesthat rejects USER_CHANGES whose+(insert) ops lack theauthorattribute. Insert ops without an author attribute can produce an AText wheretextandattribsdisagree on length, which downstream clients then fail to reconcile inace2_inner.ts:setDocATextwith:Once a pad reaches that state, every subsequent browser load throws on init and the pad becomes effectively unloadable.
The check is a one-liner inside the existing per-op validation loop and reuses the
wireApoolthat's already constructed for the author-match validation a few lines above — no extra parsing or new traversal.Why now
The standard JS web client always tags inserts with the author attribute (via
Changeset.builder(...).insert(text, [['author', authorId]], pool)), so legitimate clients are unaffected. Less-careful clients (older code paths, third-party integrations, naive ports) can today send unattributed inserts that the server silently accepts — producing the broken AText state described above. This commit closes that gap server-side.Changes
src/node/handler/PadMessageHandler.ts— single new check at the end of the existing per-op validation loop. ReusesopAuthorIdalready computed there. ~6 lines.src/tests/backend/specs/messages.ts— existing USER_CHANGES test fixtures updated to send proper author-attributed inserts (*0+5instead of+5, plus the matching apool entry). Mirrors what the JS client does in production. A new regression test'insert without author attribute is rejected'locks in the new server behaviour.Test plan
disconnect: badChangeseton an insert with empty attribsImportEtherpad.ts,importexportGetPost.ts) flow through different code paths (import API, not USER_CHANGES via socket.io) and are unaffected🤖 Generated with Claude Code