Skip to content

🤖 fix(goals): treat text-only continuation turns as goal completion#3326

Open
ammar-agent wants to merge 2 commits into
mainfrom
goals-snm0
Open

🤖 fix(goals): treat text-only continuation turns as goal completion#3326
ammar-agent wants to merge 2 commits into
mainfrom
goals-snm0

Conversation

@ammar-agent
Copy link
Copy Markdown
Collaborator

Summary

Fixes a bug in the Goals continuation loop where the agent could finish a goal_continuation turn with a plain-text "looks done" reply and no tool calls — leaving the goal in active, so the continuation loop would re-fire the same prompt against an idle agent until budget or cooldown gates intervened. We now interpret a tool-less goal-continuation turn as an implicit complete_goal call.

Background

The continuation prompt explicitly asks the agent to call complete_goal after a completion audit, but in practice models occasionally just narrate that they're done without invoking the tool. From the workspace's perspective the loop has no signal to stop — requestContinuationAfterStreamEnd reads the still-active goal and arms another continuation. The user-visible failure mode is repeated, no-op assistant turns that burn budget without making progress.

Implementation

The hook lives in AgentSession's stream-end forward handler (mirroring the existing extractSwitchAgentResult precedent for inspecting parts at the same site):

  1. After applyPendingAfterStreamEnd settles any mid-stream goal mutations, and inside the same guard that arms goalContinuationRequest, check whether the just-finished turn was a goal continuation (activeStreamContext.goalKind === GOAL_CONTINUATION_KIND).
  2. Inspect streamEndPayload.parts for any dynamic-tool entry. If none, treat the turn as silent.
  3. Synthesize a completion summary by lifting the last non-empty text part (trimmed, capped at 500 chars, ellipsised if truncated), falling back to a constant when no usable text exists.
  4. Delegate to a new WorkspaceGoalService.completeGoalFromSilentContinuation({ workspaceId, completionSummary }) which:
    • Re-reads goal state and short-circuits unless status is active (paused / complete / missing / budget-limited all fall through).
    • Calls setGoal({ status: "complete", initiator: "model", expectedGoalId }) so a concurrent clear/replace surfaces as a typed goal_conflict rather than a confusing validation error.
    • Logs and swallows typed Result errors so the stream-end cleanup path never throws on this best-effort hook.

The subsequent requestContinuationAfterStreamEnd call now no-ops on the completed goal (status !== "active" && status !== "budget_limited" early-return), so the continuation loop stops without additional plumbing.

Scope decisions

  • Continuation kind only. Restricts to GOAL_CONTINUATION_KIND so a user's first manual turn answered with text is never mistaken for completion (the manual auto-pause hook already covers that case).
  • budget_limited is out-of-scope. That status owns a one-shot wrap-up message; collapsing it here would race with tryMarkBudgetLimitInjected.
  • initiator: "model". Matches the explicit complete_goal tool path so analytics treats both completions as model-driven.

Validation

  • bun test src/node/services/agentSession.goalAutoPause.test.ts (16/16 pass, including 4 new tests).
  • bun test over related goal suites (workspaceGoalService, switchAgent, tools/goal, idleDispatcher) — 163/163 pass.
  • make static-check — green (typecheck + lint + fmt + docs link check).

New tests cover:

  • Text-only continuation → goal transitions to complete with the synthesized summary and a goal_completed analytics event tagged initiator: "model".
  • Tool-call continuation (dynamic-tool part present) → goal stays active.
  • Manual user message text-only finish → goal stays paused (auto-pause wins; auto-complete does not fire).
  • Continuation against a paused goal → status unchanged (service helper short-circuits on non-active status).

Risks

Low-to-moderate, scoped to the goal continuation flow:

  • The hook only fires for synthetic GOAL_CONTINUATION_KIND turns, so user-driven turns and budget-limit wrap-ups are unaffected.
  • Optimistic concurrency via expectedGoalId prevents a race with concurrent clear/replace from producing a stale completion.
  • All error paths are swallowed (typed Result errors via setGoal, plus a defensive try/catch around the helper), so a bad goal state cannot break stream-end cleanup.
  • Behavioral change: previously a silent continuation looped; now it terminates. If a model emits text-only as an intentional "still thinking" signal, the goal will be marked complete instead of continuing — but the existing continuation prompt explicitly requires tool calls, so any such pattern was already misuse.

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max • Cost: n/a

When an agent ends a `goal_continuation` turn with a plain-text reply and
no tool calls, treat that as an implicit `complete_goal`. The continuation
prompt asks the agent to call `complete_goal` explicitly, but real models
sometimes finish with "looks done" without invoking the tool. Without this
fallback the continuation loop re-fires on the same idle output until
budget or cooldown gates intervene.

Scope:

- Only fires when `activeStreamContext.goalKind === GOAL_CONTINUATION_KIND`
  (skips manual user turns and budget-limit wrap-ups by design).
- Skips when any `dynamic-tool` part is present in the stream-end payload
  (the agent acted; let the next turn resolve naturally).
- Skips when goal status is not `active` (paused / complete / missing /
  budget_limited fall through; budget-limited owns its own wrap-up flow).
- Uses `expectedGoalId` for optimistic-concurrency safety against
  concurrent clear/replace.
- Completion summary defaults to the last text part (trimmed and capped
  at 500 chars) and falls back to a constant when no usable text exists.
- Errors from the underlying `setGoal` are logged and swallowed so
  stream-end cleanup never throws on this best-effort path.

The follow-on `requestContinuationAfterStreamEnd` call short-circuits on
the now-complete goal (`status !== "active" && status !== "budget_limited"`),
so the loop stops without additional plumbing.

Tests cover: text-only continuation → complete, tool-call continuation →
stays active, manual user message text-only → stays paused (the existing
auto-pause hook wins, not auto-complete), and paused-goal continuation →
stays paused (helper short-circuits on non-active status).
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edad7637f0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/node/services/agentSession.ts
Addresses Codex P2 (#3326 PRRT_kwDOPxxmWM6DAGFi). A goal_continuation
turn that ends with text + no tool parts may have been truncated by the
provider (e.g. output-token limit → `finishReason === "length"`), not
naturally finished. Marking such turns complete would lose work.

Gate the silent-completion fallback on `metadata.finishReason === "stop"`
so only clean natural stops qualify; length-truncated / content-filtered
/ unknown finishes keep the goal active and resume on the next
continuation. Mirrors `TaskService`'s same conservatism for implicit
agent-task report finalization (taskService.ts: "Length-truncated or
other provider finish reasons still go through explicit completion-tool
recovery so partial assistant text cannot prematurely finalize the
task").

Tests updated:
- Default the shared `emitStreamEnd` helper to `finishReason: "stop"` so
  existing silent-continuation tests still trigger the auto-complete.
- Add a new "length-truncated text-only stream-end does not auto-complete
  the goal" case that emits `finishReason: "length"` and asserts the
  goal stays `active`.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Addressed the P2 finishReason feedback in 5854347. The silent-completion fallback now requires metadata.finishReason === "stop" (matching the same conservatism TaskService uses for implicit task-report finalization), so length-truncated / content-filtered / unknown finishes keep the goal active and resume on the next continuation. Added a length-truncated test case alongside the existing scenarios.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant