🤖 fix(goals): treat text-only continuation turns as goal completion#3326
🤖 fix(goals): treat text-only continuation turns as goal completion#3326ammar-agent wants to merge 2 commits into
Conversation
When an agent ends a `goal_continuation` turn with a plain-text reply and no tool calls, treat that as an implicit `complete_goal`. The continuation prompt asks the agent to call `complete_goal` explicitly, but real models sometimes finish with "looks done" without invoking the tool. Without this fallback the continuation loop re-fires on the same idle output until budget or cooldown gates intervene. Scope: - Only fires when `activeStreamContext.goalKind === GOAL_CONTINUATION_KIND` (skips manual user turns and budget-limit wrap-ups by design). - Skips when any `dynamic-tool` part is present in the stream-end payload (the agent acted; let the next turn resolve naturally). - Skips when goal status is not `active` (paused / complete / missing / budget_limited fall through; budget-limited owns its own wrap-up flow). - Uses `expectedGoalId` for optimistic-concurrency safety against concurrent clear/replace. - Completion summary defaults to the last text part (trimmed and capped at 500 chars) and falls back to a constant when no usable text exists. - Errors from the underlying `setGoal` are logged and swallowed so stream-end cleanup never throws on this best-effort path. The follow-on `requestContinuationAfterStreamEnd` call short-circuits on the now-complete goal (`status !== "active" && status !== "budget_limited"`), so the loop stops without additional plumbing. Tests cover: text-only continuation → complete, tool-call continuation → stays active, manual user message text-only → stays paused (the existing auto-pause hook wins, not auto-complete), and paused-goal continuation → stays paused (helper short-circuits on non-active status).
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edad7637f0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Addresses Codex P2 (#3326 PRRT_kwDOPxxmWM6DAGFi). A goal_continuation turn that ends with text + no tool parts may have been truncated by the provider (e.g. output-token limit → `finishReason === "length"`), not naturally finished. Marking such turns complete would lose work. Gate the silent-completion fallback on `metadata.finishReason === "stop"` so only clean natural stops qualify; length-truncated / content-filtered / unknown finishes keep the goal active and resume on the next continuation. Mirrors `TaskService`'s same conservatism for implicit agent-task report finalization (taskService.ts: "Length-truncated or other provider finish reasons still go through explicit completion-tool recovery so partial assistant text cannot prematurely finalize the task"). Tests updated: - Default the shared `emitStreamEnd` helper to `finishReason: "stop"` so existing silent-continuation tests still trigger the auto-complete. - Add a new "length-truncated text-only stream-end does not auto-complete the goal" case that emits `finishReason: "length"` and asserts the goal stays `active`.
|
@codex review Addressed the P2 finishReason feedback in 5854347. The silent-completion fallback now requires |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Fixes a bug in the Goals continuation loop where the agent could finish a
goal_continuationturn with a plain-text "looks done" reply and no tool calls — leaving the goal inactive, so the continuation loop would re-fire the same prompt against an idle agent until budget or cooldown gates intervened. We now interpret a tool-less goal-continuation turn as an implicitcomplete_goalcall.Background
The continuation prompt explicitly asks the agent to call
complete_goalafter a completion audit, but in practice models occasionally just narrate that they're done without invoking the tool. From the workspace's perspective the loop has no signal to stop —requestContinuationAfterStreamEndreads the still-activegoal and arms another continuation. The user-visible failure mode is repeated, no-op assistant turns that burn budget without making progress.Implementation
The hook lives in
AgentSession'sstream-endforward handler (mirroring the existingextractSwitchAgentResultprecedent for inspecting parts at the same site):applyPendingAfterStreamEndsettles any mid-stream goal mutations, and inside the same guard that armsgoalContinuationRequest, check whether the just-finished turn was a goal continuation (activeStreamContext.goalKind === GOAL_CONTINUATION_KIND).streamEndPayload.partsfor anydynamic-toolentry. If none, treat the turn as silent.WorkspaceGoalService.completeGoalFromSilentContinuation({ workspaceId, completionSummary })which:active(paused / complete / missing / budget-limited all fall through).setGoal({ status: "complete", initiator: "model", expectedGoalId })so a concurrent clear/replace surfaces as a typedgoal_conflictrather than a confusing validation error.Resulterrors so the stream-end cleanup path never throws on this best-effort hook.The subsequent
requestContinuationAfterStreamEndcall now no-ops on the completed goal (status !== "active" && status !== "budget_limited"early-return), so the continuation loop stops without additional plumbing.Scope decisions
GOAL_CONTINUATION_KINDso a user's first manual turn answered with text is never mistaken for completion (the manual auto-pause hook already covers that case).budget_limitedis out-of-scope. That status owns a one-shot wrap-up message; collapsing it here would race withtryMarkBudgetLimitInjected.initiator: "model". Matches the explicitcomplete_goaltool path so analytics treats both completions as model-driven.Validation
bun test src/node/services/agentSession.goalAutoPause.test.ts(16/16 pass, including 4 new tests).bun testover related goal suites (workspaceGoalService,switchAgent,tools/goal,idleDispatcher) — 163/163 pass.make static-check— green (typecheck + lint + fmt + docs link check).New tests cover:
completewith the synthesized summary and agoal_completedanalytics event taggedinitiator: "model".dynamic-toolpart present) → goal staysactive.paused(auto-pause wins; auto-complete does not fire).pausedgoal → status unchanged (service helper short-circuits on non-active status).Risks
Low-to-moderate, scoped to the goal continuation flow:
GOAL_CONTINUATION_KINDturns, so user-driven turns and budget-limit wrap-ups are unaffected.expectedGoalIdprevents a race with concurrent clear/replace from producing a stale completion.setGoal, plus a defensivetry/catcharound the helper), so a bad goal state cannot break stream-end cleanup.Generated with
mux• Model:anthropic:claude-opus-4-7• Thinking:max• Cost:n/a