Skip to content

harden: reject USER_CHANGES inserts without an author attribute#7773

Merged
JohnMcLear merged 5 commits into
ether:developfrom
JohnMcLear:harden/reject-insert-without-author-attrib
May 16, 2026
Merged

harden: reject USER_CHANGES inserts without an author attribute#7773
JohnMcLear merged 5 commits into
ether:developfrom
JohnMcLear:harden/reject-insert-without-author-attrib

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

Adds defensive validation in handleUserChanges that rejects USER_CHANGES whose + (insert) ops lack the author attribute. Insert ops without an author attribute can produce an AText where text and attribs disagree on length, which downstream clients then fail to reconcile in ace2_inner.ts:setDocAText with:

mismatch error setting raw text in setDocAText

Once a pad reaches that state, every subsequent browser load throws on init and the pad becomes effectively unloadable.

The check is a one-liner inside the existing per-op validation loop and reuses the wireApool that's already constructed for the author-match validation a few lines above — no extra parsing or new traversal.

Why now

The standard JS web client always tags inserts with the author attribute (via Changeset.builder(...).insert(text, [['author', authorId]], pool)), so legitimate clients are unaffected. Less-careful clients (older code paths, third-party integrations, naive ports) can today send unattributed inserts that the server silently accepts — producing the broken AText state described above. This commit closes that gap server-side.

Changes

  1. src/node/handler/PadMessageHandler.ts — single new check at the end of the existing per-op validation loop. Reuses opAuthorId already computed there. ~6 lines.
  2. src/tests/backend/specs/messages.ts — existing USER_CHANGES test fixtures updated to send proper author-attributed inserts (*0+5 instead of +5, plus the matching apool entry). Mirrors what the JS client does in production. A new regression test 'insert without author attribute is rejected' locks in the new server behaviour.

Test plan

  • Existing USER_CHANGES tests still pass with the updated fixtures
  • New regression test asserts disconnect: badChangeset on an insert with empty attribs
  • Other test files that use unattributed inserts (ImportEtherpad.ts, importexportGetPost.ts) flow through different code paths (import API, not USER_CHANGES via socket.io) and are unaffected

🤖 Generated with Claude Code

Insert ops MUST carry the author attribute reference so that pad.atext.text
and pad.atext.attribs stay in lock-step. An accepted insert with empty
attribs would grow text without contributing matching attribute markers,
leaving the stored AText in a state where the two iterables disagree on
length when reconstructed. Downstream clients then fail reconciliation in
ace2_inner.ts:setDocAText with 'mismatch error setting raw text in
setDocAText' on every subsequent pad load — making the affected pad
effectively unloadable until manually repaired.

This commit adds a single defensive check inside the existing per-op
validation loop in handleUserChanges: when an op is a '+' (insert) and
its attribs string doesn't yield an 'author' entry via
AttributeMap.fromString, reject with badChangeset. The check piggybacks
on the wireApool that was already constructed for the prior author-match
validation, so no extra parsing.

Test fixtures in messages.ts were updated to send proper author-attributed
inserts plus the matching apool (mirroring what the JS web client always
does). A new regression test 'insert without author attribute is rejected'
locks in the new behaviour.
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 16, 2026

Review Summary by Qodo

(Agentic_describe updated until commit 1751900)

Harden pad changeset validation to prevent AText desynchronization

🐞 Bug fix ✨ Enhancement

Grey Divider

Walkthroughs

Description
• Reject USER_CHANGES inserts lacking author attribute to prevent malformed AText
• Substitute stable system author for unattributed inserts via HTTP API/plugins
• Validate trailing newline preservation in all USER_CHANGES changesets
• Block wire-borne operations claiming reserved system author identity
• Update test fixtures to include proper author attribution in changesets
Diagram
flowchart LR
  A["USER_CHANGES socket.io"] -->|validate author attrib| B["Reject if missing"]
  C["HTTP API/plugins"] -->|spliceText/setText| D["Substitute system author"]
  E["Rebased changeset"] -->|validate trailing newline| F["Reject if missing"]
  G["Wire operation"] -->|check reserved author| H["Reject if system author"]
  B --> I["Well-formed AText"]
  D --> I
  F --> I
  H --> I
Loading

Grey Divider

File Changes

1. src/node/db/Pad.ts ✨ Enhancement +23/-2

Add system author substitution for unattributed inserts

• Added SYSTEM_AUTHOR_ID static constant for server-internal author attribution
• Modified spliceText to substitute system author when authorId is empty and insert is non-empty
• Updated appendRevision call to use effective author instead of original parameter
• Preserves backward compatibility for existing callers without requiring explicit author allocation

src/node/db/Pad.ts


2. src/node/handler/PadMessageHandler.ts 🐞 Bug fix +56/-9

Harden USER_CHANGES validation with multiple defensive checks

• Added import of applyToText from Changeset module
• Removed makeSplice from imports (no longer used in this file)
• Added validation to reject + ops without author attribute in per-op loop
• Added defense-in-depth check to reject wire operations claiming reserved system author
• Added pre-application validation to reject changesets that would strand trailing newline
• Captured canonical changeset form after pool mapping for accurate retransmission detection
• Removed automatic newline correction revision logic (now preventive instead of corrective)

src/node/handler/PadMessageHandler.ts


3. src/tests/backend/specs/Pad.ts 🧪 Tests +23/-2

Add system author substitution test coverage

• Updated spliceText and setText calls to pass explicit author IDs in existing tests
• Added new test verifying system author substitution for empty authorId with non-empty insert
• Test validates that system author binding appears in pad's attribute pool

src/tests/backend/specs/Pad.ts


View more (3)
4. src/tests/backend/specs/messages.ts 🧪 Tests +92/-12

Update USER_CHANGES tests with proper author attribution

• Captured authorId and roAuthorId from handshake responses in beforeEach
• Created authorPool helper to generate proper apool with author attribute for session
• Updated all USER_CHANGES test fixtures to include *0 author attribute reference and matching
 apool
• Added regression test for insert without author attribute rejection
• Added test for rejection of operations claiming reserved system author
• Added test for rejection of changesets that would strand trailing newline
• Updated read-only socket tests to use matching author in apool

src/tests/backend/specs/messages.ts


5. pnpm-lock.yaml Dependencies +14/-13

Update dependency versions

• Updated etherpad-cli-client from 4.0.2 to 4.0.3
• Updated eslint dependency resolution for eslint-import-resolver-typescript and
 eslint-plugin-import

pnpm-lock.yaml


6. src/package.json Dependencies +1/-1

Bump etherpad-cli-client version

• Updated etherpad-cli-client dependency from ^4.0.2 to ^4.0.3

src/package.json


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 16, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (3)

Grey Divider


Action required

1. Trailing-newline auto-fix removed 📘 Rule violation ☼ Reliability ⭐ New
Description
handleUserChanges now rejects any USER_CHANGES changeset that would leave the pad text not
ending with \n, instead of tolerating and correcting it. This removes a previously supported
behavior without a WARN-level deprecation period, which can break older/third-party clients that
rely on server-side correction.
Code

src/node/handler/PadMessageHandler.ts[R949-964]

+    // Defensive: reject any rebased changeset whose application would leave
+    // the pad text not ending with '\n'. Previously the server silently
+    // appended a separate `nlChangeset` correction revision; that worked
+    // for the stored pad but the FIRST broadcast (the malformed user
+    // revision) reached browsers BEFORE the correction did, and the
+    // browser's line assembler asserts "line assembler not finished" on
+    // a doc that doesn't end with '\n', taking the session out. Refuse to
+    // accept such changesets — clients must always preserve the
+    // trailing-newline invariant.
+    const projectedText = applyToText(rebasedChangeset, prevText);
+    if (!projectedText.endsWith('\n')) {
+      throw new Error(
+          `Rejected USER_CHANGES whose application would leave the pad ` +
+          `without a trailing '\\n' (length ${projectedText.length}). ` +
+          `Every USER_CHANGES must preserve the "doc ends with \\n" invariant.`);
+    }
Evidence
Compliance requires a WARN deprecation path before removing previously supported behavior. The new
code explicitly throws (rejects) when the applied text would not end with \n, immediately
enforcing the new rule without any WARN-based deprecation stage.

src/node/handler/PadMessageHandler.ts[949-964]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The server now hard-rejects USER_CHANGES that would violate the trailing newline invariant, which is a breaking behavior change and effectively removes the prior tolerance/correction behavior without a deprecation path.

## Issue Context
Per compliance, feature/behavior removals must be deprecated first with a WARN log, then removed/enforced in a later version. Current code throws an error immediately for these changesets.

## Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[949-964]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. System-author undo rejected 🐞 Bug ≡ Correctness ⭐ New
Description
handleUserChanges() rejects any op whose resolved authorId is 'a.etherpad-system', including '=' ops
that restore existing authorship. Pads can now legitimately contain system-attributed text via
server-side spliceText()/setText() without an authorId, so undoing “clear authorship colors” can
disconnect clients with badChangeset when the undo attempts to restore the system author.
Code

src/node/handler/PadMessageHandler.ts[R905-908]

+      if (opAuthorId === 'a.etherpad-system') {
+        throw new Error(`Author ${thisSession.author} attempted to submit changes as the ` +
+                        `reserved system author ${opAuthorId} in changeset ${changeset}`);
+      }
Evidence
The server now creates system-authored inserts when internal callers omit authorId, making system
author a legitimate existing attribution in pads. The editor’s clear-authorship command sets author
to '' and undo restores prior author attributes via '=' ops; rejecting all occurrences of the system
author therefore blocks that undo/restore path.

src/node/db/Pad.ts[95-106]
src/node/db/Pad.ts[416-442]
src/node/handler/PadMessageHandler.ts[848-908]
src/static/js/ace2_inner.ts[610-623]
src/tests/backend/specs/undo_clear_authorship.ts[168-195]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`handleUserChanges()` rejects any op whose `author` attribute resolves to the reserved system author (`a.etherpad-system`). This breaks legitimate client flows (notably undo of “clear authorship colors”) on pads that contain system-authored text, because undo restores author attributes via `'='` ops.

### Issue Context
* Server-side internal callers now attribute inserts to `Pad.SYSTEM_AUTHOR_ID` when no `authorId` is provided, so pads can contain system-authored segments.
* The editor’s “clearauthorship” command sets author to `''` across the doc; undo restores prior author attributes (including non-self authors) via `'='` ops.
* The new unconditional rejection of `a.etherpad-system` blocks that restoration.

### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[878-908]
- src/node/db/Pad.ts[416-442]
- src/static/js/ace2_inner.ts[610-623]
- src/tests/backend/specs/undo_clear_authorship.ts[168-195]

### Proposed fix approach
1. Narrow the reserved-system-author rejection so it only blocks actual attempts to *write as* the system author (for example: reject if `thisSession.author === 'a.etherpad-system'`, and/or reject only `'+'` ops whose `opAuthorId` is the system author).
2. Do **not** reject `'='` ops solely because they restore the system author; those are needed to undo/redo attribution changes on existing system-authored content.
3. Add/adjust a regression test that creates system-authored content (via `pad.setText(..., '')` or `spliceText(..., '')`), then performs clear-authorship + undo via USER_CHANGES and asserts it is accepted.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Wrong author in roSocket ✓ Resolved 🐞 Bug ≡ Correctness
Description
The USER_CHANGES test helper always builds the apool using authorId from the first socket’s
CLIENT_VARS, but roSocket can have a different author identity because common.connect() only
forwards cookies from the *current* HTTP response and /p/:pad does not re-issue the author-token
cookie if it already exists. With the new server validation requiring an author attribute on '+'
ops, sendUserChanges(roSocket, ...) can now be rejected as “changes as another author”, breaking
the permitOnce test.
Code

src/tests/backend/specs/messages.ts[R172-179]

+    // Insert ops MUST carry the author attribute (server-side validation
+    // added in this PR — see PadMessageHandler.ts). Helper assembles the
+    // wire form `*0+N` plus the matching apool entry for the session author.
+    const authorPool = () =>
+        ({numToAttrib: {0: ['author', authorId]}, nextNum: 1});
  const sendUserChanges =
-        async (socket:any, cs:any) => await common.sendUserChanges(socket, {baseRev: rev, changeset: cs});
+        async (socket:any, cs:any, apool: any = authorPool()) =>
+            await common.sendUserChanges(socket, {baseRev: rev, changeset: cs, apool});
Evidence
The test code captures only one authorId but uses it to build the default apool for all
USER_CHANGES. roSocket’s handshake response is not used to set its authorId. Because
common.connect() forwards only Set-Cookie-derived cookies, and /p/:pad won’t re-issue the author
token cookie if it already exists, roSocket can connect without the author-token cookie and will
fall back to a new random CLIENT_READY token, giving it a different authorId than authorId used by
authorPool()—triggering server-side author validation failures for roSocket inserts.

src/tests/backend/specs/messages.ts[26-46]
src/tests/backend/specs/messages.ts[171-179]
src/tests/backend/specs/messages.ts[255-270]
src/tests/backend/common.ts[189-206]
src/tests/backend/common.ts[228-236]
src/node/utils/ensureAuthorTokenCookie.ts[17-35]
src/node/hooks/express/specialpages.ts[195-218]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/tests/backend/specs/messages.ts` defaults the USER_CHANGES `apool` to the main socket’s `authorId` for *all* sockets. But `roSocket` can end up with a different authorId (no token cookie forwarded on its socket.io connection), so `sendUserChanges(roSocket, '...*0+...')` may be rejected by the new server-side validation.
## Issue Context
- `authorId` is captured only from the first socket’s CLIENT_VARS; the roSocket CLIENT_VARS is ignored.
- `common.connect(res)` forwards only cookies present in the provided HTTP response’s `set-cookie` headers.
- `/p/:pad` uses `ensureAuthorTokenCookie()`, which does not set a Set-Cookie header if the request already has a valid token cookie—so the second GET (to the read-only URL) often won’t provide the token cookie to `common.connect()`.
## Fix Focus Areas
- src/tests/backend/specs/messages.ts[26-46]
- src/tests/backend/specs/messages.ts[171-179]
- src/tests/backend/specs/messages.ts[255-270]
## Suggested fix
1. Capture `roAuthorId` from `common.handshake(roSocket, roPadId)` (read CLIENT_VARS).
2. Change `authorPool()` to accept an explicit authorId (or create two pools: one for `socket`, one for `roSocket`).
3. Ensure `sendUserChanges(roSocket, ...)` uses `roAuthorId`’s pool (or explicitly pass an `apool` argument in the roSocket test).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. HTTP API author default undocumented 📘 Rule violation ⚙ Maintainability ⭐ New
Description
Pad.spliceText() now substitutes a stable system author (a.etherpad-system) when an internal
caller performs an insert with an empty authorId, changing how setText()/appendText() behave
when authorId is omitted. The HTTP API documentation describes authorId as optional but does not
document the new default attribution behavior.
Code

src/node/db/Pad.ts[R429-441]

+    // An unattributed insert (empty authorId + non-empty ins) would produce
+    // an AText where `text` and `attribs` disagree on length — clients fail
+    // setDocAText reconciliation on load. Backward-compat fix: if the caller
+    // didn't provide an authorId, attribute the insert to a stable system
+    // author. ep_post_data and other plugins that want named attribution
+    // should still pass an explicit authorId.
+    const effectiveAuthorId =
+        (ins.length > 0 && !authorId) ? Pad.SYSTEM_AUTHOR_ID : authorId;
+    const attribs = effectiveAuthorId
+        ? [['author', effectiveAuthorId] as [string, string]]
+        : undefined;
    const changeset = makeSplice(orig, start, ndel, ins, attribs, this.pool);
-    await this.appendRevision(changeset, authorId);
+    await this.appendRevision(changeset, effectiveAuthorId);
Evidence
The code change introduces a new default author attribution (Pad.SYSTEM_AUTHOR_ID) for inserts
when authorId is empty, which affects documented HTTP API methods that accept optional authorId.
The HTTP API docs show authorId as optional but do not describe the default attribution behavior,
so the documentation is no longer aligned with the API behavior.

src/node/db/Pad.ts[429-441]
doc/api/http_api.md[373-399]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Behavior of HTTP API calls that omit `authorId` has changed: inserts are now attributed to a stable system author rather than being unattributed. This is an API behavior change that should be documented.

## Issue Context
The docs currently list `setText(padID, text, [authorId])` and `appendText(padID, text, [authorId])` but do not explain what happens when `authorId` is omitted.

## Fix Focus Areas
- src/node/db/Pad.ts[429-441]
- doc/api/http_api.md[373-399]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. Extra full-text projection 🐞 Bug ➹ Performance ⭐ New
Description
The new trailing-newline invariant check applies the rebased changeset to the entire pad text via
applyToText(), then appendRevision() applies the same changeset again to build the new AText. This
adds an extra full-document string build on every USER_CHANGES, increasing CPU and allocations on a
hot path.
Code

src/node/handler/PadMessageHandler.ts[R958-964]

+    const projectedText = applyToText(rebasedChangeset, prevText);
+    if (!projectedText.endsWith('\n')) {
+      throw new Error(
+          `Rejected USER_CHANGES whose application would leave the pad ` +
+          `without a trailing '\\n' (length ${projectedText.length}). ` +
+          `Every USER_CHANGES must preserve the "doc ends with \\n" invariant.`);
+    }
Evidence
The new handler calls applyToText(), which constructs a full new string by iterating ops and
appending remaining text; appendRevision() then calls applyToAText() to build the new AText, which
includes the new text string. These are two separate full applications/allocation paths for each
edit.

src/node/handler/PadMessageHandler.ts[949-967]
src/static/js/Changeset.ts[397-440]
src/node/db/Pad.ts[228-235]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`handleUserChanges()` computes `projectedText` by calling `applyToText(rebasedChangeset, prevText)` to check the trailing `\n` invariant, and then calls `pad.appendRevision(rebasedChangeset, ...)`, which applies the changeset again (to AText) to produce/stash the new document state.

### Issue Context
This happens for every accepted USER_CHANGES and adds an O(document length) extra pass and allocation, independent of whether the change touches the end of the document.

### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[949-967]
- src/node/db/Pad.ts[228-276]
- src/static/js/Changeset.ts[397-440]

### Proposed fix approach
1. Avoid building `projectedText` just to check the last character:
  * Option A: Apply once in `appendRevision()` (it already computes `newAText.text`) and validate `newAText.text.endsWith('\n')` **before** committing/copying `newAText` into the pad.
  * Option B: Add a helper that returns the would-be new text/AText so validation and commit can reuse the same computed result (no second apply).
2. Keep the current behavior (reject) but ensure the validation does not require constructing a full new string twice per edit.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Unwarned rejection of inserts 📘 Rule violation ⚙ Maintainability
Description
The PR introduces an incompatible behavior change by rejecting previously accepted USER_CHANGES +
ops that lack an author attribute, but it does not emit a WARN-level deprecation message before
enforcing the new behavior. Older/third-party clients will be disconnected without advance warning
or a deprecation period.
Code

src/node/handler/PadMessageHandler.ts[R886-889]

+      if (op.opcode === '+' && !opAuthorId) {
+        throw new Error(`Author ${thisSession.author} submitted an insert without an ` +
+                        `author attribute in changeset ${changeset}`);
+      }
Evidence
PR Compliance ID 3 requires a WARN-level deprecation warning before incompatible behavior changes.
The new code immediately throws an Error for unattributed + ops, with no WARN log indicating
deprecation or future enforcement timeline.

src/node/handler/PadMessageHandler.ts[886-889]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Server behavior for USER_CHANGES insert ops has changed incompatibly (unattributed inserts now throw), but there is no WARN-level deprecation message to give existing clients time to adapt.
## Issue Context
A new validation rejects `op.opcode === '+'` when `opAuthorId` is falsy by throwing an Error. This can break older/third-party clients that previously worked, and the compliance checklist requires a WARN-level deprecation notice before incompatible removals/behavior changes.
## Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[886-889]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit 1751900

Results up to commit N/A


🐞 Bugs (0) 📘 Rule violations (1) 📎 Requirement gaps (0)


Action required
1. Wrong author in roSocket ✓ Resolved 🐞 Bug ≡ Correctness
Description
The USER_CHANGES test helper always builds the apool using authorId from the first socket’s
CLIENT_VARS, but roSocket can have a different author identity because common.connect() only
forwards cookies from the *current* HTTP response and /p/:pad does not re-issue the author-token
cookie if it already exists. With the new server validation requiring an author attribute on '+'
ops, sendUserChanges(roSocket, ...) can now be rejected as “changes as another author”, breaking
the permitOnce test.
Code

src/tests/backend/specs/messages.ts[R172-179]

+    // Insert ops MUST carry the author attribute (server-side validation
+    // added in this PR — see PadMessageHandler.ts). Helper assembles the
+    // wire form `*0+N` plus the matching apool entry for the session author.
+    const authorPool = () =>
+        ({numToAttrib: {0: ['author', authorId]}, nextNum: 1});
   const sendUserChanges =
-        async (socket:any, cs:any) => await common.sendUserChanges(socket, {baseRev: rev, changeset: cs});
+        async (socket:any, cs:any, apool: any = authorPool()) =>
+            await common.sendUserChanges(socket, {baseRev: rev, changeset: cs, apool});
Evidence
The test code captures only one authorId but uses it to build the default apool for all
USER_CHANGES. roSocket’s handshake response is not used to set its authorId. Because
common.connect() forwards only Set-Cookie-derived cookies, and /p/:pad won’t re-issue the author
token cookie if it already exists, roSocket can connect without the author-token cookie and will
fall back to a new random CLIENT_READY token, giving it a different authorId than authorId used by
authorPool()—triggering server-side author validation failures for roSocket inserts.

src/tests/backend/specs/messages.ts[26-46]
src/tests/backend/specs/messages.ts[171-179]
src/tests/backend/specs/messages.ts[255-270]
src/tests/backend/common.ts[189-206]
src/tests/backend/common.ts[228-236]
src/node/utils/ensureAuthorTokenCookie.ts[17-35]
src/node/hooks/express/specialpages.ts[195-218]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/tests/backend/specs/messages.ts` defaults the USER_CHANGES `apool` to the main socket’s `authorId` for *all* sockets. But `roSocket` can end up with a different authorId (no token cookie forwarded on its socket.io connection), so `sendUserChanges(roSocket, '...*0+...')` may be rejected by the new server-side validation.
## Issue Context
- `authorId` is captured only from the first socket’s CLIENT_VARS; the roSocket CLIENT_VARS is ignored.
- `common.connect(res)` forwards only cookies present in the provided HTTP response’s `set-cookie` headers.
- `/p/:pad` uses `ensureAuthorTokenCookie()`, which does not set a Set-Cookie header if the request already has a valid token cookie—so the second GET (to the read-only URL) often won’t provide the token cookie to `common.connect()`.
## Fix Focus Areas
- src/tests/backend/specs/messages.ts[26-46]
- src/tests/backend/specs/messages.ts[171-179]
- src/tests/backend/specs/messages.ts[255-270]
## Suggested fix
1. Capture `roAuthorId` from `common.handshake(roSocket, roPadId)` (read CLIENT_VARS).
2. Change `authorPool()` to accept an explicit authorId (or create two pools: one for `socket`, one for `roSocket`).
3. Ensure `sendUserChanges(roSocket, ...)` uses `roAuthorId`’s pool (or explicitly pass an `apool` argument in the roSocket test).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended
2. Unwarned rejection of inserts 📘 Rule violation ⚙ Maintainability
Description
The PR introduces an incompatible behavior change by rejecting previously accepted USER_CHANGES +
ops that lack an author attribute, but it does not emit a WARN-level deprecation message before
enforcing the new behavior. Older/third-party clients will be disconnected without advance warning
or a deprecation period.
Code

src/node/handler/PadMessageHandler.ts[R886-889]

+      if (op.opcode === '+' && !opAuthorId) {
+        throw new Error(`Author ${thisSession.author} submitted an insert without an ` +
+                        `author attribute in changeset ${changeset}`);
+      }
Evidence
PR Compliance ID 3 requires a WARN-level deprecation warning before incompatible behavior changes.
The new code immediately throws an Error for unattributed + ops, with no WARN log indicating
deprecation or future enforcement timeline.

src/node/handler/PadMessageHandler.ts[886-889]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Server behavior for USER_CHANGES insert ops has changed incompatibly (unattributed inserts now throw), but there is no WARN-level deprecation message to give existing clients time to adapt.
## Issue Context
A new validation rejects `op.opcode === '+'` when `opAuthorId` is falsy by throwing an Error. This can break older/third-party clients that previously worked, and the compliance checklist requires a WARN-level deprecation notice before incompatible removals/behavior changes.
## Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[886-889]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

Comment thread src/tests/backend/specs/messages.ts
The first commit closed the socket.io USER_CHANGES hole. This commit closes
the parallel path through Pad.spliceText (used by API.setText, API.appendText,
the import flow, and plugins like ep_post_data) where an unattributed insert
would otherwise produce a malformed AText.

Approach: instead of REJECTING (which would break ep_post_data and many
existing tests that call setText/appendText without an authorId), substitute
a stable system author when none is provided. The resulting changeset is
properly attributed, the AText stays well-formed, and existing callers
continue to work unchanged. Plugins that want named author attribution
should still pass an explicit authorId (e.g., one allocated via
authorManager.createAuthor).

Pad.SYSTEM_AUTHOR_ID = 'a.etherpad-system' — a stable identifier that
appears in the pad's attribute pool when internal callers (HTTP API,
plugins, server-side imports) write text without naming an author. The
existing 'attribute changes by another author' protections still apply
to socket.io USER_CHANGES paths — a remote client can't impersonate the
system author for inserts (their session author check fires first).

Test:
- Pad.ts spec adds 'spliceText with empty authorId attributes to the
  system author' — verifies pad text lands AND the pool contains the
  system-author binding. Existing tests that pass an authorId are
  unaffected.
@JohnMcLear
Copy link
Copy Markdown
Member Author

Pushed a second commit (631d8c5) that closes the parallel hole through Pad.spliceText — which is the path used by API.setText / API.appendText, the import flow, and plugins like ep_post_data.

The first commit only caught socket.io USER_CHANGES. But callers reaching Pad.spliceText with an empty authorId produce the same malformed AText: attribs field on + ops is undefined, so the resulting AText's text and attribs drift out of sync.

I went with auto-substitute a stable system author rather than throw, because rejecting would break:

  • ~10 existing tests that call pad.setText(...) / pad.appendText(...) without an authorId
  • ep_post_data and any other plugins that call API.appendText(padId, content) (the second arg defaults to '')

Pad.SYSTEM_AUTHOR_ID = 'a.etherpad-system' is now used when internal callers don't name an author. The AText stays well-formed. Plugins/callers that want named attribution should still pass an explicit authorId (allocate via authorManager.createAuthor).

New test 'spliceText with empty authorId attributes to the system author' verifies the pool gets the binding.

Happy to split this into a separate PR if you'd prefer to land the socket.io check first.

@JohnMcLear JohnMcLear self-assigned this May 16, 2026
@JohnMcLear JohnMcLear marked this pull request as draft May 16, 2026 06:47
@JohnMcLear
Copy link
Copy Markdown
Member Author

JohnMcLear commented May 16, 2026

I need to also check for

Uncaught (in promise) Error: doRepApplyChangeset length mismatch: 53/65

Etherpad's pad text always ends with '\n'. _handleUserChanges previously
appended a separate `nlChangeset` correction revision whenever the
applied USER_CHANGES left the pad without a trailing '\n'. The stored
pad ended up well-formed, but the FIRST NEW_CHANGES broadcast (the
malformed user revision itself) reached browsers BEFORE the correction
did, and applyToAttribution's MergingOpAssembler aborts with
"line assembler not finished" on a non-'\n'-terminated doc — the
watching browser session then dropped the changeset and any subsequent
edits silently no-op'd until the user reloaded.

Replace the silent auto-correction with an explicit reject. Compute
`applyToText(rebasedChangeset, prevText)` before appendRevision; if the
result doesn't end with '\n', throw -> badChangeset disconnect. Clients
must emit USER_CHANGES whose application preserves the invariant —
this matches what the JS web client already does and forces non-JS
clients (etherpad-pad, third-party integrations) to surface their bugs
in their own logs instead of stranding the trailing newline in pad
revision history.

Also fixes a latent retransmission-detection bug surfaced by this PR's
author-attrib changes: moveOpsToNewPool renumbers `*N` references to
whatever slot the pad pool assigns, which can differ from the wire
form's slot. Comparing the raw client wire against the stored revision
form (`changeset === c`) then misses legitimate retransmissions and
the same edit gets duplicated. Snapshot the post-pool-mapping form
(`canonicalCs`) and compare that against `c` instead.

Backend test additions:
- 'changeset that would strand the trailing \\n is rejected' covers
  the new rejection path with wire `Z:6>1|1=6*0+1$X` against
  `hello\n`.
- handleMessageSecurity test now captures roSocket's own authorId and
  uses it in the apool sent through roSocket, because the prior PR
  commit made `*0` referencing the wrong author a hard reject.

All 1130 backend tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@JohnMcLear
Copy link
Copy Markdown
Member Author

Pushed 805766c extending the PR with a second hardening pass, motivated by a related browser-disconnect vector surfaced by a non-JS client (etherpad-pad terminal editor):

Problem (related but distinct from the author-attrib vector this PR opens with):
Etherpad's pad text always ends with \n. When a client sends USER_CHANGES whose application strands the trailing \n mid-document, _handleUserChanges used to silently appended a second nlChangeset correction revision to fix the stored pad. But the FIRST NEW_CHANGES broadcast (the malformed user revision itself) reaches browsers BEFORE the correction does, and applyToAttribution's MergingOpAssembler asserts line assembler not finished on a non-\n-terminated doc — the watching browser session drops the changeset and any later edits silently no-op until reload.

Repro from a non-JS client (this is what surfaced it):

  • Wire Z:1h>6|g=1h*0+6$hello from a 53-char doc ending in \n — Keep covers all 53 chars, Insert lands at position 53, past the existing \n.
  • Server applies, sees no trailing \n, appends correction.
  • Browser receives the bad rev first and crashes.

Fix in 805766c:

  • Replace the silent auto-correction with an explicit reject. Compute applyToText(rebasedChangeset, prevText) before appendRevision; if the result doesn't end with \n, throw → badChangeset disconnect. Forces buggy clients to surface their issues in their own logs instead of stranding \n in pad history.
  • Side-fix for a latent retransmission-detection bug uncovered by this PR's author-attrib changes: moveOpsToNewPool renumbers *N references to whatever slot the pad pool assigns, which can differ from the wire form's slot. Comparing raw client changeset === c then misses retransmissions and duplicates the edit. Snapshot the post-pool-mapping form and compare against c instead.
  • Backend test 'changeset that would strand the trailing \\n is rejected' locks in the new behaviour with wire Z:6>1|1=6*0+1$X against hello\n.
  • Updated handleMessageSecurity test to capture roSocket's own authorId for the apool — the PR's prior commit made *0 referencing the wrong author a hard reject, which the test was tripping over.

All 1130 backend tests pass locally on Node 25.

Happy to split this into a separate PR if reviewers prefer that — but it's the same shape of bug (malformed-but-passes-checkRep USER_CHANGES → server accepts → browser chokes mid-stream) and the same defensive posture, so leaving it stacked here unless asked to separate.

JohnMcLear added a commit to ether/ep_ai_chat that referenced this pull request May 16, 2026
`applyEdit` previously left `attribs` undefined when no `edit.authorId`
was supplied, producing a changeset whose `+` ops carried no author
attribute. Etherpad then stored a pad.atext where `text` and `attribs`
disagreed on length, and every later client load hit
`mismatch error setting raw text in setDocAText` in ace2_inner — the
pad became effectively unloadable.

Mirror the `SYSTEM_AUTHOR_ID` pattern that ether/etherpad#7773 adopted
in core's `Pad.spliceText`: when the caller doesn't supply an authorId,
attribute the insert to `a.ep-ai-chat-system` so the changeset is
always well-formed. The system author is *not* announced in the user
list — `announceAiAuthor` stays gated on the caller-supplied id.

Adds a regression test that exercises the missing-authorId path and
asserts (a) every run in pad.atext.attribs carries an author attribute
and (b) `attribs` total chars equals `text.length`.

Refs ether/etherpad#7773
4.0.3 sends author-attributed inserts and preserves the trailing
newline, complying with this PR's tightened USER_CHANGES validation.
The rate-limit CI workflow drives the test pad via this client, so
without the bump the new server-side rejects fire on the very first
\`pad.append()\` and the rate-limit disconnect never gets a chance to
arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job
fails with "ratelimit was not triggered when sending every 99 ms".

Refs ether/etherpad-cli-client#131
Comment thread src/node/db/Pad.ts
* a fixed system author keeps the AText well-formed without requiring
* every plugin to allocate its own author up-front.
*/
static readonly SYSTEM_AUTHOR_ID = 'a.etherpad-system';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that be abused? If I name myself a.etherpad-system as author. Can this have consequences?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patch incoming

The session-author equality check already rejects wire `*N` that
names a different real user, but `a.etherpad-system` is server-
internal — it's only used when spliceText / setText is called with
an empty authorId from HTTP API or plugin paths. A wire op that
names it is either a confused client or an attempt to launder
edits through a reserved attribution slot. Refuse.

Backend test 'insert claiming the reserved system author is
rejected' locks in the new behavior with wire `Z:1>5*0+5$hello`
plus an apool that maps slot 0 to `a.etherpad-system`. All 1131
backend tests pass.

Inline literal `'a.etherpad-system'` rather than importing the
constant from `Pad.SYSTEM_AUTHOR_ID` — `require('../db/Pad')` at
PadMessageHandler module scope returned a partially-initialized
class via the padManager circular path, leaving the static-field
access undefined at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@JohnMcLear JohnMcLear marked this pull request as ready for review May 16, 2026 16:23
@JohnMcLear JohnMcLear merged commit 294158e into ether:develop May 16, 2026
19 checks passed
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 16, 2026

Persistent review updated to latest commit 1751900

Comment on lines +949 to +964
// Defensive: reject any rebased changeset whose application would leave
// the pad text not ending with '\n'. Previously the server silently
// appended a separate `nlChangeset` correction revision; that worked
// for the stored pad but the FIRST broadcast (the malformed user
// revision) reached browsers BEFORE the correction did, and the
// browser's line assembler asserts "line assembler not finished" on
// a doc that doesn't end with '\n', taking the session out. Refuse to
// accept such changesets — clients must always preserve the
// trailing-newline invariant.
const projectedText = applyToText(rebasedChangeset, prevText);
if (!projectedText.endsWith('\n')) {
throw new Error(
`Rejected USER_CHANGES whose application would leave the pad ` +
`without a trailing '\\n' (length ${projectedText.length}). ` +
`Every USER_CHANGES must preserve the "doc ends with \\n" invariant.`);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Trailing-newline auto-fix removed 📘 Rule violation ☼ Reliability

handleUserChanges now rejects any USER_CHANGES changeset that would leave the pad text not
ending with \n, instead of tolerating and correcting it. This removes a previously supported
behavior without a WARN-level deprecation period, which can break older/third-party clients that
rely on server-side correction.
Agent Prompt
## Issue description
The server now hard-rejects USER_CHANGES that would violate the trailing newline invariant, which is a breaking behavior change and effectively removes the prior tolerance/correction behavior without a deprecation path.

## Issue Context
Per compliance, feature/behavior removals must be deprecated first with a WARN log, then removed/enforced in a later version. Current code throws an error immediately for these changesets.

## Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[949-964]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +905 to +908
if (opAuthorId === 'a.etherpad-system') {
throw new Error(`Author ${thisSession.author} attempted to submit changes as the ` +
`reserved system author ${opAuthorId} in changeset ${changeset}`);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. System-author undo rejected 🐞 Bug ≡ Correctness

handleUserChanges() rejects any op whose resolved authorId is 'a.etherpad-system', including '=' ops
that restore existing authorship. Pads can now legitimately contain system-attributed text via
server-side spliceText()/setText() without an authorId, so undoing “clear authorship colors” can
disconnect clients with badChangeset when the undo attempts to restore the system author.
Agent Prompt
### Issue description
`handleUserChanges()` rejects any op whose `author` attribute resolves to the reserved system author (`a.etherpad-system`). This breaks legitimate client flows (notably undo of “clear authorship colors”) on pads that contain system-authored text, because undo restores author attributes via `'='` ops.

### Issue Context
* Server-side internal callers now attribute inserts to `Pad.SYSTEM_AUTHOR_ID` when no `authorId` is provided, so pads can contain system-authored segments.
* The editor’s “clearauthorship” command sets author to `''` across the doc; undo restores prior author attributes (including non-self authors) via `'='` ops.
* The new unconditional rejection of `a.etherpad-system` blocks that restoration.

### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[878-908]
- src/node/db/Pad.ts[416-442]
- src/static/js/ace2_inner.ts[610-623]
- src/tests/backend/specs/undo_clear_authorship.ts[168-195]

### Proposed fix approach
1. Narrow the reserved-system-author rejection so it only blocks actual attempts to *write as* the system author (for example: reject if `thisSession.author === 'a.etherpad-system'`, and/or reject only `'+'` ops whose `opAuthorId` is the system author).
2. Do **not** reject `'='` ops solely because they restore the system author; those are needed to undo/redo attribution changes on existing system-authored content.
3. Add/adjust a regression test that creates system-authored content (via `pad.setText(..., '')` or `spliceText(..., '')`), then performs clear-authorship + undo via USER_CHANGES and asserts it is accepted.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

pull Bot pushed a commit to fgreinacher/etherpad-cli-client that referenced this pull request May 16, 2026
Etherpad's USER_CHANGES handler now (post-2.7.x, see ether/etherpad#7773)
rejects two kinds of malformed inserts:

  1. `+` ops with no `author` attribute — they grow pad.atext.text without
     contributing matching markers to pad.atext.attribs, leaving the two
     iterables out of sync and breaking setDocAText reconciliation on
     every later client load.
  2. USER_CHANGES whose application would leave the pad text not ending
     with '\n' — the browser's line assembler asserts on docs that don't
     end with '\n' and the session dies.

Both were silently produced by `ee.append()`: it called
`makeSplice(text, text.length, 0, ins)` (insert at very end, no attribs,
no pool). When the host pad starts as `'\n'`, appending `'1'` produced
`'\n1'` (no trailing newline) and the changeset's insert carried no
author attribute.

Fix:

* Capture `userId` from CLIENT_VARS and use it as the author attribute
  on subsequent inserts.
* Splice at `text.length - 1` so the insert lands before the trailing
  '\n' and the invariant holds.
* Build the changeset against `padState.apool`, then `moveOpsToNewPool`
  into a fresh wire apool so the server can resolve the `*N` slot
  numbers we send.

Backward-compatible: pre-hardening Etherpad servers accept the new
shape too (it's the same shape the standard JS web client sends).

Refs ether/etherpad#7773
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants