🤖 fix(goals): treat text-only continuation turns as goal completion by ammar-agent · Pull Request #3326 · coder/mux

ammar-agent · 2026-05-18T23:47:16Z

Summary

Fixes a bug in the Goals continuation loop where the agent could finish a goal_continuation turn with a plain-text "looks done" reply and no tool calls — leaving the goal in active, so the continuation loop would re-fire the same prompt against an idle agent until budget or cooldown gates intervened. We now interpret a tool-less goal-continuation turn as an implicit complete_goal call.

Background

The continuation prompt explicitly asks the agent to call complete_goal after a completion audit, but in practice models occasionally just narrate that they're done without invoking the tool. From the workspace's perspective the loop has no signal to stop — requestContinuationAfterStreamEnd reads the still-active goal and arms another continuation. The user-visible failure mode is repeated, no-op assistant turns that burn budget without making progress.

Implementation

The hook lives in AgentSession's stream-end forward handler (mirroring the existing extractSwitchAgentResult precedent for inspecting parts at the same site):

After applyPendingAfterStreamEnd settles any mid-stream goal mutations, and inside the same guard that arms goalContinuationRequest, check whether the just-finished turn was a goal continuation (activeStreamContext.goalKind === GOAL_CONTINUATION_KIND).
Inspect streamEndPayload.parts for any dynamic-tool entry. If none, treat the turn as silent.
Synthesize a completion summary by lifting the last non-empty text part (trimmed, capped at 500 chars, ellipsised if truncated), falling back to a constant when no usable text exists.
Delegate to a new WorkspaceGoalService.completeGoalFromSilentContinuation({ workspaceId, completionSummary }) which:
- Re-reads goal state and short-circuits unless status is active (paused / complete / missing / budget-limited all fall through).
- Calls setGoal({ status: "complete", initiator: "model", expectedGoalId }) so a concurrent clear/replace surfaces as a typed goal_conflict rather than a confusing validation error.
- Logs and swallows typed Result errors so the stream-end cleanup path never throws on this best-effort hook.

The subsequent requestContinuationAfterStreamEnd call now no-ops on the completed goal (status !== "active" && status !== "budget_limited" early-return), so the continuation loop stops without additional plumbing.

Scope decisions

Continuation kind only. Restricts to GOAL_CONTINUATION_KIND so a user's first manual turn answered with text is never mistaken for completion (the manual auto-pause hook already covers that case).
budget_limited is out-of-scope. That status owns a one-shot wrap-up message; collapsing it here would race with tryMarkBudgetLimitInjected.
initiator: "model". Matches the explicit complete_goal tool path so analytics treats both completions as model-driven.

Validation

bun test src/node/services/agentSession.goalAutoPause.test.ts (16/16 pass, including 4 new tests).
bun test over related goal suites (workspaceGoalService, switchAgent, tools/goal, idleDispatcher) — 163/163 pass.
make static-check — green (typecheck + lint + fmt + docs link check).

New tests cover:

Text-only continuation → goal transitions to complete with the synthesized summary and a goal_completed analytics event tagged initiator: "model".
Tool-call continuation (dynamic-tool part present) → goal stays active.
Manual user message text-only finish → goal stays paused (auto-pause wins; auto-complete does not fire).
Continuation against a paused goal → status unchanged (service helper short-circuits on non-active status).

Risks

Low-to-moderate, scoped to the goal continuation flow:

The hook only fires for synthetic GOAL_CONTINUATION_KIND turns, so user-driven turns and budget-limit wrap-ups are unaffected.
Optimistic concurrency via expectedGoalId prevents a race with concurrent clear/replace from producing a stale completion.
All error paths are swallowed (typed Result errors via setGoal, plus a defensive try/catch around the helper), so a bad goal state cannot break stream-end cleanup.
Behavioral change: previously a silent continuation looped; now it terminates. If a model emits text-only as an intentional "still thinking" signal, the goal will be marked complete instead of continuing — but the existing continuation prompt explicitly requires tool calls, so any such pattern was already misuse.

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max • Cost: n/a

When an agent ends a `goal_continuation` turn with a plain-text reply and no tool calls, treat that as an implicit `complete_goal`. The continuation prompt asks the agent to call `complete_goal` explicitly, but real models sometimes finish with "looks done" without invoking the tool. Without this fallback the continuation loop re-fires on the same idle output until budget or cooldown gates intervene. Scope: - Only fires when `activeStreamContext.goalKind === GOAL_CONTINUATION_KIND` (skips manual user turns and budget-limit wrap-ups by design). - Skips when any `dynamic-tool` part is present in the stream-end payload (the agent acted; let the next turn resolve naturally). - Skips when goal status is not `active` (paused / complete / missing / budget_limited fall through; budget-limited owns its own wrap-up flow). - Uses `expectedGoalId` for optimistic-concurrency safety against concurrent clear/replace. - Completion summary defaults to the last text part (trimmed and capped at 500 chars) and falls back to a constant when no usable text exists. - Errors from the underlying `setGoal` are logged and swallowed so stream-end cleanup never throws on this best-effort path. The follow-on `requestContinuationAfterStreamEnd` call short-circuits on the now-complete goal (`status !== "active" && status !== "budget_limited"`), so the loop stops without additional plumbing. Tests cover: text-only continuation → complete, tool-call continuation → stays active, manual user message text-only → stays paused (the existing auto-pause hook wins, not auto-complete), and paused-goal continuation → stays paused (helper short-circuits on non-active status).

ammar-agent · 2026-05-18T23:47:30Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edad7637f0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Addresses Codex P2 (#3326 PRRT_kwDOPxxmWM6DAGFi). A goal_continuation turn that ends with text + no tool parts may have been truncated by the provider (e.g. output-token limit → `finishReason === "length"`), not naturally finished. Marking such turns complete would lose work. Gate the silent-completion fallback on `metadata.finishReason === "stop"` so only clean natural stops qualify; length-truncated / content-filtered / unknown finishes keep the goal active and resume on the next continuation. Mirrors `TaskService`'s same conservatism for implicit agent-task report finalization (taskService.ts: "Length-truncated or other provider finish reasons still go through explicit completion-tool recovery so partial assistant text cannot prematurely finalize the task"). Tests updated: - Default the shared `emitStreamEnd` helper to `finishReason: "stop"` so existing silent-continuation tests still trigger the auto-complete. - Add a new "length-truncated text-only stream-end does not auto-complete the goal" case that emits `finishReason: "length"` and asserts the goal stays `active`.

ammar-agent · 2026-05-18T23:55:47Z

@codex review

Addressed the P2 finishReason feedback in 5854347. The silent-completion fallback now requires metadata.finishReason === "stop" (matching the same conservatism TaskService uses for implicit task-report finalization), so length-truncated / content-filtered / unknown finishes keep the goal active and resume on the next continuation. Added a length-truncated test case alongside the existing scenarios.

chatgpt-codex-connector · 2026-05-18T23:59:06Z

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

Comment thread src/node/services/agentSession.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🤖 fix(goals): treat text-only continuation turns as goal completion#3326

🤖 fix(goals): treat text-only continuation turns as goal completion#3326
ammar-agent wants to merge 2 commits into
mainfrom
goals-snm0

ammar-agent commented May 18, 2026

Uh oh!

ammar-agent commented May 18, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

ammar-agent commented May 18, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ammar-agent commented May 18, 2026

Summary

Background

Implementation

Scope decisions

Validation

Risks

Uh oh!

ammar-agent commented May 18, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

ammar-agent commented May 18, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant