113 commits
a397886
feat: add XML tool calling support as provider setting
jthweny Mar 21, 2026
1ce143a
fix: add useXmlToolCalling i18n keys to all locales
jthweny Mar 21, 2026
4269390
fix: add useXmlToolCalling keys to advancedSettings in all locale set…
jthweny Mar 21, 2026
c0877f3
feat: add useXmlToolCalling support to all providers
jthweny Mar 21, 2026
8a6d111
fix: improve XML tool calling reliability across providers
jthweny Mar 21, 2026
116061a
fix: update tests and snapshots for compact XML tool descriptions
jthweny Mar 22, 2026
1977f54
chore: remove unused xml-tool-descriptions.ts (knip)
jthweny Mar 22, 2026
fe27805
fix: update test assertions for missing tool_use.id error message
jthweny Mar 22, 2026
5e02378
docs: add intelligent memory system design spec
jthweny Mar 22, 2026
f563508
docs: address spec review feedback — sql.js, schema versioning, PII f…
jthweny Mar 22, 2026
5fbe17d
docs: add intelligent memory system implementation plan
jthweny Mar 22, 2026
98d6d31
feat: add three specialized subagents for intelligent memory system i…
jthweny Mar 22, 2026
93b4f25
feat(memory): add types and interfaces for intelligent memory system
jthweny Mar 22, 2026
d021796
feat(memory): add scoring module with decay and reinforcement formulas
jthweny Mar 22, 2026
b999753
feat(memory): add types and message preprocessor with noise filtering
jthweny Mar 22, 2026
1407657
feat(memory): add memory learning settings and message types
jthweny Mar 22, 2026
7ad6d0d
feat(memory): add prompt compiler for system prompt and analysis agen…
jthweny Mar 22, 2026
1969e48
feat(memory): add SQLite memory store via sql.js with schema versioning
jthweny Mar 22, 2026
2891cfe
feat(memory): add analysis agent with LLM invocation and response par…
jthweny Mar 22, 2026
0fe62ef
feat(memory): inject user profile section into system prompt
jthweny Mar 22, 2026
e63276b
feat(memory): add memory writer with PII filter, dedup, and workspace…
jthweny Mar 22, 2026
2f5eb45
feat(memory): add pipeline orchestrator with triggers, concurrency gu…
jthweny Mar 22, 2026
3f44db8
feat(memory): integrate orchestrator with extension host and message …
jthweny Mar 22, 2026
673cdbc
feat(memory): add memory learning toggle indicator to chat UI
jthweny Mar 22, 2026
0a8ac63
feat(memory): add memory learning settings section to SettingsView
jthweny Mar 22, 2026
2f2226e
feat(memory): add memory learning settings section to SettingsView
jthweny Mar 22, 2026
ef1482a
build: ensure sql.js WASM files are bundled in extension dist
jthweny Mar 22, 2026
5201e29
feat: add 8 verification and cleanup subagents for memory system
jthweny Mar 22, 2026
9fa6f0d
test(memory): add E2E tests for full pipeline, scoring, workspace sco…
jthweny Mar 22, 2026
df96e99
fix(memory): resolve type errors in src/core/memory
jthweny Mar 22, 2026
7961bff
fix(memory): resolve cross-agent type mismatches and add JSDoc
jthweny Mar 22, 2026
a7126a7
feat(memory): add personality traits system and frontend integration
jthweny Mar 22, 2026
6b80262
fix(memory): wire missing state flow and pipeline triggers
jthweny Mar 22, 2026
a55ea3e
docs: add memory sync spec and 5 new subagents
jthweny Mar 22, 2026
6fc24a7
fix(memory): resolve memory-specific provider profile instead of main…
jthweny Mar 22, 2026
effe896
Add deleteAllEntries() to MemoryStore
jthweny Mar 22, 2026
a4af083
feat(memory): add MemoryChatPicker dialog component
jthweny Mar 22, 2026
e136127
Add batchAnalyzeHistory() and clearAllMemory() to MemoryOrchestrator
jthweny Mar 22, 2026
0a26055
Add memory sync message types to WebviewMessage and ExtensionMessage
jthweny Mar 22, 2026
1644707
test(memory): add tests for clearAllMemory and provider-null guard
jthweny Mar 22, 2026
6e3dd14
Add startMemorySync and clearMemory message handlers
jthweny Mar 22, 2026
47aee65
feat(memory): add prior chat sync UI with progress and clear memory
jthweny Mar 22, 2026
83c9faa
docs: add memory debugging spec for system prompt, sync persistence, …
jthweny Mar 22, 2026
4a2895b
Fix system prompt preview missing memory profile section
jthweny Mar 22, 2026
6e80dcc
Add [Memory] debug logging to analysis pipeline
jthweny Mar 22, 2026
9c73980
fix: resolve race condition where memory store is queried before init…
jthweny Mar 22, 2026
4b4efcd
fix(memory): harden prompt compiler token cap and raise to 2000
jthweny Mar 22, 2026
e1010ca
fix: guard against concurrent memory syncs causing flickering
jthweny Mar 22, 2026
40f4e63
fix: persist memory sync progress bar across settings tab switches
jthweny Mar 22, 2026
cd4ecc8
feat(memory): auto-refresh memory status after sync/clear actions
jthweny Mar 22, 2026
cc05554
feat(memory): add visual feedback for memory status in Settings and c…
jthweny Mar 22, 2026
30d5af2
fix: resolve lint warnings in PersonalityTraitsPanel
jthweny Mar 22, 2026
1e43390
fix: guard MemoryChatPicker against undefined taskHistory in tests
jthweny Mar 22, 2026
4bdf50c
docs: add multi-orchestrator mode design spec
jthweny Mar 22, 2026
9c9efa0
docs: rewrite multi-orchestrator spec with full per-agent task instru…
jthweny Mar 22, 2026
aba3af3
feat(multi-orch): add shared types and constants
jthweny Mar 22, 2026
5a7d55f
feat(multi-orch): add message types and global settings
jthweny Mar 22, 2026
8b5a571
feat(multi-orch): add multi-orchestrator mode definition
jthweny Mar 22, 2026
175707c
feat(multi-orch): add agent count selector to chat area for multi-orc…
jthweny Mar 22, 2026
bd3d185
feat(multi-orch): add worktree manager for agent isolation
jthweny Mar 22, 2026
5b82735
feat(multi-orch): add report aggregator for unified result formatting
jthweny Mar 22, 2026
c04d89c
feat(multi-orch): add panel spawner for parallel agent tab panels
jthweny Mar 22, 2026
ce1faeb
feat(multi-orch): add status panel and plan review panel components
jthweny Mar 22, 2026
2b90de6
feat(multi-orch): add LLM-powered plan generator for task decomposition
jthweny Mar 22, 2026
94d9f9a
feat(multi-orch): add top-level orchestrator coordinating full lifecycle
jthweny Mar 22, 2026
7c76d6f
feat(multi-orch): add message handlers for plan, approve, abort, and …
jthweny Mar 22, 2026
b5f2859
test(multi-orch): add unit tests for types, plan generator, and repor…
jthweny Mar 22, 2026
1aa2eba
feat(multi-orch): add multi-orchestrator settings section
jthweny Mar 22, 2026
d7bf0c1
feat: add getOrCreateMultiOrchestrator factory to ClineProvider
jthweny Mar 22, 2026
4784c50
fix(multi-orch): wire full message chain between extension and webview
jthweny Mar 22, 2026
ed966a1
Add E2E integration tests for multi-orchestrator subsystem
jthweny Mar 22, 2026
d1c8798
fix: resolve test failures from personality traits update and MemoryS…
jthweny Mar 22, 2026
7a73a84
fix: getMultiOrchestrator now lazily creates the instance on-demand
jthweny Mar 22, 2026
c4a1958
fix: persist multi-orchestrator settings across webview reloads
jthweny Mar 22, 2026
1dd470f
fix: multi-orchestrator send path bypasses standard task guards
jthweny Mar 22, 2026
0c3e9cf
fix(multi-orchestrator): prevent architect mode from running as paral…
jthweny Mar 22, 2026
1cf3e6f
fix(multi-orchestrator): smarter task decomposition to avoid over-spl…
jthweny Mar 22, 2026
cc24b3c
fix: enforce agent count limit in multi-orchestrator
jthweny Mar 22, 2026
e666071
fix: switch spawned agent panels to correct mode before task creation
jthweny Mar 22, 2026
11941dd
debug: add MultiOrch tracing logs across the full agent-count pipeline
jthweny Mar 22, 2026
b020f0d
fix: enable auto-approval for spawned multi-orchestrator agent panels
jthweny Mar 22, 2026
bb21c14
debug: add extensive logging and robustness to plan-generator
jthweny Mar 22, 2026
a07d7ea
fix: harden PanelSpawner error handling, disposal, and ViewColumn safety
jthweny Mar 22, 2026
dda1a66
debug: add [MultiOrch:Handler] tracing across full Enter-to-execute flow
jthweny Mar 22, 2026
9b27f86
fix: add missing contextTokens to makeTokenUsage in e2e tests
jthweny Mar 22, 2026
ac7481d
fix(multi-orchestrator): close race conditions in agent startup lifec…
jthweny Mar 22, 2026
033d205
fix: align multi-orchestrator tests with updated plan-generator imple…
jthweny Mar 22, 2026
13e0d42
fix: wire MultiOrchStatusPanel and PlanReviewPanel into ChatView main…
jthweny Mar 22, 2026
a282304
docs: add multi-orchestrator debugging spec for agent count, parallel…
jthweny Mar 22, 2026
4b60c92
fix: remove short-request heuristic that overrode user's agent count
jthweny Mar 22, 2026
4f2203d
fix: parallelize panel spawning and task creation in multi-orchestrator
jthweny Mar 22, 2026
9edf375
debug(multi-orch): add diagnostic logging for spawned task failure in…
jthweny Mar 22, 2026
ac1bd80
fix: resolve multi-orchestrator auto-approval by using per-provider o…
jthweny Mar 22, 2026
2158d18
fix(multi-orch): use ViewColumn.Beside for panel placement, sequentia…
jthweny Mar 22, 2026
161bcff
fix(multi-orch): force exact agent count from user selection
jthweny Mar 22, 2026
ba99070
fix(multi-orch): force-approve ALL operations in spawned agent panels
jthweny Mar 22, 2026
33ffa3c
fix(multi-orch): prevent task completion loop by excluding resume ask…
jthweny Mar 23, 2026
ca73a7c
fix(multi-orch): wire worktree paths to spawned provider working dire…
jthweny Mar 23, 2026
17c4553
fix(multi-orch): stop task completion loop + add agent system prompt
jthweny Mar 23, 2026
e7d910d
fix(multi-orch): close panels after completion + capture agent reports
jthweny Mar 23, 2026
02b8581
feat(multi-orch): use vscode.setEditorLayout for proper N-column pane…
jthweny Mar 23, 2026
135c26e
docs: create Multi-Orchestrator Master Spec (living document)
jthweny Mar 23, 2026
b8c6a13
fix(multi-orch): BUG-002 — fire all agent start() calls simultaneously
jthweny Mar 23, 2026
8a20a6c
fix(multi-orch): BUG-003 — use focusNextGroup to place panels in corr…
jthweny Mar 23, 2026
09dc855
fix(multi-orch): thread ViewColumn to DiffViewProvider to fix BUG-001
jthweny Mar 23, 2026
0b05564
fix(BUG-005): suppress approve/deny button flicker in multi-orch agen…
jthweny Mar 23, 2026
15a0fb6
feat(multi-orch): add post-completion verification phase (FEAT-001)
jthweny Mar 23, 2026
d272928
fix: verification sweep — fix test failures and missing type export
jthweny Mar 23, 2026
94771f7
fix(multi-orch): use actual ViewColumn from panel, not symbolic value
jthweny Mar 23, 2026
ec9027d
docs: update master spec — BUG-001 and BUG-002 marked as fixed
jthweny Mar 23, 2026
f037640
fix(multi-orch): stagger agent starts + suppress diff views in agent …
jthweny Mar 23, 2026
7d5a867
docs: create exhaustive multi-orchestrator bug report and engineering…
jthweny Mar 23, 2026
f78cb0b
docs: add Bug #21 — finished sub-tasks don't flow back to multi-orche…
jthweny Mar 23, 2026
3 changes: 3 additions & 0 deletions packages/types/src/provider-settings.ts
@@ -187,6 +187,9 @@ const baseProviderSettingsSchema = z.object({

// Model verbosity.
verbosity: verbosityLevelsSchema.optional(),

// Tool calling protocol.
useXmlToolCalling: z.boolean().optional(),
})

// Several of the providers share common model config properties.
7 changes: 7 additions & 0 deletions src/api/index.ts
@@ -86,6 +86,13 @@ export interface ApiHandlerCreateMessageMetadata {
* Only applies to providers that support function calling restrictions (e.g., Gemini).
*/
allowedFunctionNames?: string[]
/**
* When true, native tool definitions are omitted from the API request body.
* The model relies solely on XML tool documentation in the system prompt
* and outputs tool calls as raw XML text, which the existing TagMatcher
* in presentAssistantMessage() parses into ToolUse objects.
Comment on lines +91 to +93
Copilot AI Mar 21, 2026

The doc comment claims XML tool calls are parsed by TagMatcher in presentAssistantMessage(), but presentAssistantMessage currently treats missing tool_use.id as an invalid legacy/XML tool call and rejects it, and tools generally require nativeArgs. Please update this comment to reflect the actual execution/parsing flow, or add the missing XML parsing implementation and adjust this description accordingly.

Suggested change
- * The model relies solely on XML tool documentation in the system prompt
- * and outputs tool calls as raw XML text, which the existing TagMatcher
- * in presentAssistantMessage() parses into ToolUse objects.
+ * The model is expected to rely solely on XML tool documentation in the system prompt
+ * and may output tool calls as raw XML (or XML-like) text.
+ *
+ * This flag only affects how the request is constructed; any parsing of XML tool
+ * calls into ToolUse objects must be handled by higher-level consumer code.

*/
useXmlToolCalling?: boolean
}

export interface ApiHandler {
58 changes: 58 additions & 0 deletions src/api/providers/__tests__/anthropic.spec.ts
@@ -787,5 +787,63 @@ describe("AnthropicHandler", () => {
arguments: '"London"}',
})
})

it("should omit tools and tool_choice when useXmlToolCalling is true", async () => {
const stream = handler.createMessage(systemPrompt, messages, {
taskId: "test-task",
tools: mockTools,
tool_choice: "auto",
useXmlToolCalling: true,
})

// Consume the stream to trigger the API call
for await (const _chunk of stream) {
// Just consume
}

const callArgs = mockCreate.mock.calls[mockCreate.mock.calls.length - 1][0]
// When useXmlToolCalling is true, the tools and tool_choice should NOT be in the request
expect(callArgs.tools).toBeUndefined()
expect(callArgs.tool_choice).toBeUndefined()
})

it("should include tools when useXmlToolCalling is false", async () => {
const stream = handler.createMessage(systemPrompt, messages, {
taskId: "test-task",
tools: mockTools,
tool_choice: "auto",
useXmlToolCalling: false,
})

// Consume the stream to trigger the API call
for await (const _chunk of stream) {
// Just consume
}

const callArgs = mockCreate.mock.calls[mockCreate.mock.calls.length - 1][0]
// When useXmlToolCalling is false, tools should be included normally
expect(callArgs.tools).toBeDefined()
expect(callArgs.tools.length).toBeGreaterThan(0)
expect(callArgs.tool_choice).toBeDefined()
})

it("should include tools when useXmlToolCalling is undefined", async () => {
const stream = handler.createMessage(systemPrompt, messages, {
taskId: "test-task",
tools: mockTools,
tool_choice: "auto",
})

// Consume the stream to trigger the API call
for await (const _chunk of stream) {
// Just consume
}

const callArgs = mockCreate.mock.calls[mockCreate.mock.calls.length - 1][0]
// Default behavior: tools should be included
expect(callArgs.tools).toBeDefined()
expect(callArgs.tools.length).toBeGreaterThan(0)
expect(callArgs.tool_choice).toBeDefined()
})
})
})
13 changes: 9 additions & 4 deletions src/api/providers/anthropic-vertex.ts
@@ -75,10 +75,15 @@ export class AnthropicVertexHandler extends BaseProvider implements SingleComple
// Filter out non-Anthropic blocks (reasoning, thoughtSignature, etc.) before sending to the API
const sanitizedMessages = filterNonAnthropicBlocks(messages)

-const nativeToolParams = {
-	tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
-	tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
-}
+// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
+// The model will rely on XML tool documentation in the system prompt instead,
+// and output tool calls as raw XML text parsed by TagMatcher.
+const nativeToolParams = metadata?.useXmlToolCalling
+	? {}
+	: {
+			tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
+			tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
+		}
Comment on lines +78 to +86
Copilot AI Mar 21, 2026

Same issue as Anthropic: omitting native tools/tool_choice when useXmlToolCalling is true will leave the system without a working tool-call execution path unless XML tool calls are parsed into ToolUse blocks with ids/nativeArgs. As-is, this will likely break tool use for Anthropic Vertex. Either implement the XML parsing + tool documentation path, or continue sending native tool params.

Suggested change
-// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
-// The model will rely on XML tool documentation in the system prompt instead,
-// and output tool calls as raw XML text parsed by TagMatcher.
-const nativeToolParams = metadata?.useXmlToolCalling
-	? {}
-	: {
-			tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
-			tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
-		}
+// Always send native tool definitions to the API request so that tool calling
+// continues to work even when XML-based tool documentation is used elsewhere.
+const nativeToolParams = {
+	tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
+	tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
+}
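
For illustration, here is a minimal sketch of the gating that this hunk introduces and that the new `anthropic.spec.ts` tests assert. It is not the handler's actual code: `ToolDefinition`, `CreateMessageMetadata`, and `buildNativeToolParams` are hypothetical names, the OpenAI-to-Anthropic conversion helpers are left out, and defaulting `tool_choice` to `"auto"` is an assumption of the sketch. The point is only that spreading an empty object leaves `tools` and `tool_choice` out of the request body.

```typescript
// Hypothetical, simplified shapes; the real metadata and tool types differ.
type ToolChoice = "auto" | "none"

interface ToolDefinition {
	name: string
	description: string
}

interface CreateMessageMetadata {
	tools?: ToolDefinition[]
	tool_choice?: ToolChoice
	useXmlToolCalling?: boolean
}

interface NativeToolParams {
	tools?: ToolDefinition[]
	tool_choice?: ToolChoice
}

// Returns an empty object in XML mode, so spreading the result into the
// request body adds neither `tools` nor `tool_choice`.
// An undefined flag is treated the same as false (native mode).
function buildNativeToolParams(metadata?: CreateMessageMetadata): NativeToolParams {
	if (metadata?.useXmlToolCalling) {
		return {}
	}
	return {
		tools: metadata?.tools ?? [],
		// Assumption of this sketch: fall back to "auto" when no choice is given.
		tool_choice: metadata?.tool_choice ?? "auto",
	}
}

const tools: ToolDefinition[] = [{ name: "read_file", description: "Read a file" }]

// "example-model" is a placeholder, not a real model id.
const xmlRequest = {
	model: "example-model",
	...buildNativeToolParams({ tools, tool_choice: "auto", useXmlToolCalling: true }),
}
const nativeRequest = {
	model: "example-model",
	...buildNativeToolParams({ tools, tool_choice: "auto" }),
}
```

In this sketch `xmlRequest` carries no tool keys at all, which is exactly the situation the review comment warns about: unless something downstream parses XML tool calls, the model has no working way to invoke a tool.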


/**
* Vertex API has specific limitations for prompt caching:
13 changes: 9 additions & 4 deletions src/api/providers/anthropic.ts
@@ -75,10 +75,15 @@ export class AnthropicHandler extends BaseProvider implements SingleCompletionHa
betas.push("context-1m-2025-08-07")
}

-const nativeToolParams = {
-	tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
-	tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
-}
+// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
+// The model will rely on XML tool documentation in the system prompt instead,
+// and output tool calls as raw XML text parsed by TagMatcher.
+const nativeToolParams = metadata?.useXmlToolCalling
+	? {}
+	: {
+			tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
+			tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
+		}
Comment on lines +78 to +86
Copilot AI Mar 21, 2026

When metadata.useXmlToolCalling is true, this omits tools/tool_choice from the Anthropic request, but the codebase currently executes tools via native tool_use blocks with ids/nativeArgs (XML/legacy tool calls are explicitly rejected in presentAssistantMessage/BaseTool). Without an XML-to-ToolUse parser (and tool schema documentation) this will prevent any tool execution. Either implement the XML parsing + tool catalog path end-to-end, or keep sending native tools for Anthropic.

Suggested change
-// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
-// The model will rely on XML tool documentation in the system prompt instead,
-// and output tool calls as raw XML text parsed by TagMatcher.
-const nativeToolParams = metadata?.useXmlToolCalling
-	? {}
-	: {
-			tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
-			tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
-		}
+// Always send native tool definitions for Anthropic so that tool_use blocks are produced.
+// The useXmlToolCalling flag is currently ignored here because the rest of the codebase
+// expects native tool_use events and does not support XML-based tool calling.
+const nativeToolParams = {
+	tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
+	tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
+}
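
The "XML parsing path" this comment asks for could look roughly like the sketch below. It is an assumption-laden illustration, not the project's parser: `ParsedToolUse`, `parseXmlToolCall`, and the `xml_tool_N` id scheme are all hypothetical, and it handles only flat `<tool><param>value</param></tool>` blocks with no streaming, nesting, or escaping. It does show the two things the comment says are missing: a synthesized id and structured arguments.

```typescript
// Hypothetical shape; the real ToolUse type in the codebase may differ.
interface ParsedToolUse {
	id: string
	name: string
	params: Record<string, string>
}

let nextToolCallId = 0

// Checks the known tool names in list order and parses the first one that
// appears in `text` as a complete <name>...</name> block.
// Returns undefined when no complete block is found.
function parseXmlToolCall(text: string, knownTools: string[]): ParsedToolUse | undefined {
	for (const name of knownTools) {
		const open = `<${name}>`
		const close = `</${name}>`
		const start = text.indexOf(open)
		if (start === -1) continue
		const end = text.indexOf(close, start + open.length)
		if (end === -1) continue // missing closing tag: treat the call as incomplete

		const body = text.slice(start + open.length, end)
		const params: Record<string, string> = {}
		// Match <param>value</param> pairs; the backreference \1 requires the
		// closing tag name to equal the opening tag name.
		const paramPattern = /<([A-Za-z_]\w*)>([\s\S]*?)<\/\1>/g
		let match: RegExpExecArray | null
		while ((match = paramPattern.exec(body)) !== null) {
			params[match[1]] = match[2].trim()
		}

		// Synthesize an id, since raw XML output carries none.
		nextToolCallId += 1
		return { id: `xml_tool_${nextToolCallId}`, name, params }
	}
	return undefined
}

const parsed = parseXmlToolCall(
	"I will read the file.\n<read_file>\n<path>src/app.ts</path>\n</read_file>",
	["read_file", "execute_command"],
)
const incomplete = parseXmlToolCall("<read_file><path>src/app.ts</path>", ["read_file"])
```

Here `parsed` describes a `read_file` call with `path` set to `src/app.ts`, while `incomplete` is `undefined` because the closing tag is missing. A real implementation would also need to map `params` onto whatever `nativeArgs` the tools expect.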


switch (modelId) {
case "claude-sonnet-4-6":
175 changes: 159 additions & 16 deletions src/core/prompts/sections/__tests__/tool-use.spec.ts
@@ -1,31 +1,174 @@
import { getSharedToolUseSection } from "../tool-use"
import { getToolUseGuidelinesSection } from "../tool-use-guidelines"

describe("getSharedToolUseSection", () => {
it("should include native tool-calling instructions", () => {
const section = getSharedToolUseSection()
describe("default (native) mode", () => {
it("should include native tool-calling instructions", () => {
const section = getSharedToolUseSection()

expect(section).toContain("provider-native tool-calling mechanism")
expect(section).toContain("Do not include XML markup or examples")
expect(section).toContain("provider-native tool-calling mechanism")
expect(section).toContain("Do not include XML markup or examples")
})

it("should include multiple tools per message guidance", () => {
const section = getSharedToolUseSection()

expect(section).toContain("You must call at least one tool per assistant response")
expect(section).toContain("Prefer calling as many tools as are reasonably needed")
})

it("should NOT include single tool per message restriction", () => {
const section = getSharedToolUseSection()

expect(section).not.toContain("You must use exactly one tool call per assistant response")
expect(section).not.toContain("Do not call zero tools or more than one tool")
})

it("should NOT include XML formatting instructions", () => {
const section = getSharedToolUseSection()

expect(section).not.toContain("<actual_tool_name>")
expect(section).not.toContain("</actual_tool_name>")
})

it("should return native instructions when useXmlToolCalling is false", () => {
const section = getSharedToolUseSection(false)

expect(section).toContain("provider-native tool-calling mechanism")
expect(section).not.toContain("<actual_tool_name>")
})
})

it("should include multiple tools per message guidance", () => {
const section = getSharedToolUseSection()
describe("XML tool calling mode", () => {
it("should include XML formatting instructions when useXmlToolCalling is true", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("<actual_tool_name>")
expect(section).toContain("</actual_tool_name>")
expect(section).toContain("Tool uses are formatted using XML-style tags")
})

it("should NOT include provider-native tool-calling text when useXmlToolCalling is true", () => {
const section = getSharedToolUseSection(true)

expect(section).not.toContain("provider-native tool-calling mechanism")
expect(section).not.toContain("Do not include XML markup or examples")
})

it("should include parameter tag syntax example when useXmlToolCalling is true", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("<parameter1_name>value1</parameter1_name>")
expect(section).toContain("<parameter2_name>value2</parameter2_name>")
})

it("should include TOOL USE header when useXmlToolCalling is true", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("TOOL USE")
expect(section).toContain("You have access to a set of tools")
})

it("should include new_task XML example", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("<new_task>")
expect(section).toContain("<mode>code</mode>")
expect(section).toContain("</new_task>")
})

it("should include execute_command XML example", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("You must call at least one tool per assistant response")
expect(section).toContain("Prefer calling as many tools as are reasonably needed")
expect(section).toContain("<execute_command>")
expect(section).toContain("<command>npm run dev</command>")
expect(section).toContain("</execute_command>")
})

it("should include IMPORTANT XML FORMATTING RULES section", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("IMPORTANT XML FORMATTING RULES")
expect(section).toContain("Every opening tag MUST have a matching closing tag")
expect(section).toContain("Do NOT use self-closing tags")
expect(section).toContain("Do NOT include JSON objects")
expect(section).toContain("Do NOT wrap tool calls in markdown code blocks")
})

it("should include COMMON MISTAKES TO AVOID section", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("COMMON MISTAKES TO AVOID")
expect(section).toContain("Using JSON format")
expect(section).toContain("Missing closing tags")
expect(section).toContain("Using self-closing")
expect(section).toContain("Correct XML format")
})

it("should include read_file correct example in common mistakes", () => {
const section = getSharedToolUseSection(true)

expect(section).toContain("<read_file>")
expect(section).toContain("<path>src/app.ts</path>")
expect(section).toContain("</read_file>")
})
})
})

describe("getToolUseGuidelinesSection", () => {
describe("default (non-XML) mode", () => {
it("should include base guidelines without XML reinforcement", () => {
const section = getToolUseGuidelinesSection()

expect(section).toContain("# Tool Use Guidelines")
expect(section).toContain("Assess what information you already have")
expect(section).toContain("Choose the most appropriate tool")
expect(section).toContain("If multiple actions are needed")
})

it("should NOT include single tool per message restriction", () => {
const section = getSharedToolUseSection()
it("should NOT include XML reinforcement when called without arguments", () => {
const section = getToolUseGuidelinesSection()

expect(section).not.toContain("You must use exactly one tool call per assistant response")
expect(section).not.toContain("Do not call zero tools or more than one tool")
expect(section).not.toContain("REMINDER: You MUST format all tool calls as XML")
expect(section).not.toContain("Formulate your tool use using the XML format")
})

it("should NOT include XML reinforcement when useXmlToolCalling is false", () => {
const section = getToolUseGuidelinesSection(false)

expect(section).not.toContain("REMINDER: You MUST format all tool calls as XML")
expect(section).not.toContain("Formulate your tool use using the XML format")
})
})

it("should NOT include XML formatting instructions", () => {
const section = getSharedToolUseSection()
describe("XML tool calling mode", () => {
it("should include XML reinforcement guidelines when useXmlToolCalling is true", () => {
const section = getToolUseGuidelinesSection(true)

expect(section).toContain("Formulate your tool use using the XML format")
expect(section).toContain("REMINDER: You MUST format all tool calls as XML")
})

it("should include XML-specific numbered steps", () => {
const section = getToolUseGuidelinesSection(true)

expect(section).toContain("4. Formulate your tool use using the XML format")
expect(section).toContain("5. After each tool use, the user will respond")
expect(section).toContain("6. ALWAYS wait for user confirmation")
})

it("should still include base guidelines alongside XML reinforcement", () => {
const section = getToolUseGuidelinesSection(true)

expect(section).toContain("# Tool Use Guidelines")
expect(section).toContain("Assess what information you already have")
expect(section).toContain("Choose the most appropriate tool")
})

it("should include explicit XML structure reminder", () => {
const section = getToolUseGuidelinesSection(true)

expect(section).not.toContain("<actual_tool_name>")
expect(section).not.toContain("</actual_tool_name>")
expect(section).toContain("<tool_name><param>value</param></tool_name>")
})
})
})
13 changes: 11 additions & 2 deletions src/core/prompts/sections/tool-use-guidelines.ts
@@ -1,9 +1,18 @@
-export function getToolUseGuidelinesSection(): string {
+export function getToolUseGuidelinesSection(useXmlToolCalling?: boolean): string {
+	const xmlReinforcement = useXmlToolCalling
+		? `
+4. Formulate your tool use using the XML format specified for each tool. The tool name becomes the outermost XML tag, with each parameter as a nested child tag.
+5. After each tool use, the user will respond with the result of that tool use. This result will provide you with the necessary information to continue your task or make further decisions.
+6. ALWAYS wait for user confirmation after each tool use before proceeding. Never assume the success of a tool use without explicit confirmation of the result from the user.
+
+**REMINDER: You MUST format all tool calls as XML.** Do not use JSON, function-call syntax, or any other format. Each tool call must use the exact XML structure: \`<tool_name><param>value</param></tool_name>\`.`
+		: ""

return `# Tool Use Guidelines

1. Assess what information you already have and what information you need to proceed with the task.
2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like \`ls\` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
3. If multiple actions are needed, you may use multiple tools in a single message when appropriate, or use tools iteratively across messages. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.
Copilot AI Mar 21, 2026

In XML mode, step 3 still says multiple tools may be used in a single message, but other XML instructions reinforce one-at-a-time tool usage. Please reconcile the XML-mode guidance so it consistently reflects the intended/implemented behavior (single-tool vs multi-tool) to avoid conflicting instructions in the system prompt.

Suggested change
-	: ""
-return `# Tool Use Guidelines
-1. Assess what information you already have and what information you need to proceed with the task.
-2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like \`ls\` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
-3. If multiple actions are needed, you may use multiple tools in a single message when appropriate, or use tools iteratively across messages. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.
+	: "";
+const step3Guideline = useXmlToolCalling
+	? `3. If multiple actions are needed, use tools iteratively across messages, making at most one XML tool call per assistant message. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.`
+	: `3. If multiple actions are needed, you may use multiple tools in a single message when appropriate, or use tools iteratively across messages. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.`;
+return `# Tool Use Guidelines
+1. Assess what information you already have and what information you need to proceed with the task.
+2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like \`ls\` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
+${step3Guideline}

Copilot uses AI. Check for mistakes.

${xmlReinforcement}
By carefully considering the user's response after tool executions, you can react accordingly and make informed decisions about how to proceed with the task. This iterative process helps ensure the overall success and accuracy of your work.`
}
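
A regression check could guard against the single-tool vs multi-tool drift raised in this comment. The sketch below is not part of the PR: `findToolCountConflict` and both phrase lists are hypothetical, with the phrases taken from the prompt text quoted in this review. It only reports a prompt that contains both kinds of wording and makes no claim about which behavior is the intended one.

```typescript
// Hypothetical helper, not part of the PR: flags a system prompt that contains
// both "one tool per message" and "multiple tools per message" wording.

interface ToolCountConflict {
	single: string[] // phrases that imply one tool call per message
	multi: string[] // phrases that imply several tool calls per message
}

// Phrase lists are assumptions based on the prompt text quoted in the review.
const SINGLE_TOOL_PATTERNS: RegExp[] = [/one tool per message/i, /at most one XML tool call/i]
const MULTI_TOOL_PATTERNS: RegExp[] = [/multiple tools in a single message/i, /one or more tools per message/i]

function matchedPhrases(prompt: string, patterns: RegExp[]): string[] {
	const found: string[] = []
	for (const pattern of patterns) {
		const match = pattern.exec(prompt)
		if (match) found.push(match[0])
	}
	return found
}

// Returns the conflicting phrases, or null when the prompt is consistent.
function findToolCountConflict(prompt: string): ToolCountConflict | null {
	const single = matchedPhrases(prompt, SINGLE_TOOL_PATTERNS)
	const multi = matchedPhrases(prompt, MULTI_TOOL_PATTERNS)
	return single.length > 0 && multi.length > 0 ? { single, multi } : null
}
```

Run against the assembled XML-mode prompt in a unit test, a non-null result would indicate that two sections disagree about how many tools may be used per message.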
50 changes: 49 additions & 1 deletion src/core/prompts/sections/tool-use.ts
@@ -1,4 +1,52 @@
export function getSharedToolUseSection(): string {
export function getSharedToolUseSection(useXmlToolCalling?: boolean): string {
if (useXmlToolCalling) {
return `====

TOOL USE

You have access to a set of tools that are executed upon the user's approval. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
Copilot AI Mar 21, 2026

In XML mode this section says "You can use one tool per message", but the general tool-use guidelines (and native mode) explicitly allow multiple tools per message. This internal inconsistency can confuse the model and cause unpredictable tool behavior. Align the XML instructions with the actual supported behavior (either document single-tool restriction everywhere for XML mode, or remove the single-tool claim here).

Suggested change
You have access to a set of tools that are executed upon the user's approval. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
You have access to a set of tools that are executed upon the user's approval. You can use one or more tools per message, and will receive the result of those tool uses in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.


# Tool Use Formatting

Tool uses are formatted using XML-style tags. The tool name itself becomes the XML tag name. Each parameter is enclosed within its own set of tags. Here's the structure:

<actual_tool_name>
<parameter1_name>value1</parameter1_name>
<parameter2_name>value2</parameter2_name>
...
</actual_tool_name>

For example, to use the new_task tool:

<new_task>
<mode>code</mode>
<message>Implement a new feature for the application.</message>
</new_task>

For example, to use the execute_command tool:

<execute_command>
<command>npm run dev</command>
</execute_command>

**IMPORTANT XML FORMATTING RULES:**
- Always use the actual tool name as the XML tag name for proper parsing and execution.
- Every opening tag MUST have a matching closing tag (e.g., <tool_name>...</tool_name>).
- Parameter tags must be nested inside the tool tag.
- Do NOT use self-closing tags (e.g., <param /> is invalid).
- Do NOT include JSON objects or other non-XML formatting for tool calls.
- Do NOT wrap tool calls in markdown code blocks - output raw XML directly.

**COMMON MISTAKES TO AVOID:**
- ❌ Using JSON format: { "tool": "read_file", "path": "src/app.ts" }
- ❌ Missing closing tags: <read_file><path>src/app.ts</path>
- ❌ Using self-closing: <read_file path="src/app.ts" />
- ✅ Correct XML format:
<read_file>
<path>src/app.ts</path>
</read_file>`
}

return `====

TOOL USE
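
The formatting rules listed in the XML branch above can be expressed as a small checker. This is a minimal sketch, not the parser used in the PR: `checkXmlToolCall` and the issue names are hypothetical. It treats any tag-like text as a tag, so parameter values that contain `<` (source code, for instance) would need a real parser.

```typescript
// Hypothetical checker, not part of the PR: applies the prompt's XML
// formatting rules to one candidate tool call and lists the rules it breaks.

type FormatIssue = "json" | "markdown-fence" | "self-closing" | "mismatched-tag" | "unclosed-tag"

function checkXmlToolCall(text: string): FormatIssue[] {
	const issues: FormatIssue[] = []
	const add = (issue: FormatIssue): void => {
		if (issues.indexOf(issue) === -1) issues.push(issue)
	}

	const trimmed = text.trim()
	const fence = "`".repeat(3) // markdown code-fence marker

	// Rule: no JSON objects in place of XML.
	if (trimmed.charAt(0) === "{") add("json")
	// Rule: output raw XML, not a markdown code block.
	if (trimmed.indexOf(fence) !== -1) add("markdown-fence")

	// Walk every tag and keep a stack of the ones still open.
	const tagPattern = /<(\/?)([A-Za-z_][\w-]*)([^<>]*?)(\/?)>/g
	const openTags: string[] = []
	let match: RegExpExecArray | null
	while ((match = tagPattern.exec(trimmed)) !== null) {
		const isClosing = match[1] === "/"
		const name = match[2]
		const isSelfClosing = match[4] === "/"

		if (isSelfClosing) {
			add("self-closing") // Rule: <param /> is invalid.
		} else if (!isClosing) {
			openTags.push(name)
		} else if (openTags.pop() !== name) {
			add("mismatched-tag") // Closing tag does not match the innermost open tag.
		}
	}

	// Rule: every opening tag must have a matching closing tag.
	if (openTags.length > 0) add("unclosed-tag")
	return issues
}
```

Applied to the examples from the "COMMON MISTAKES TO AVOID" list, the JSON form yields `["json"]`, the call with the missing closing tag yields `["unclosed-tag"]`, the self-closing form yields `["self-closing"]`, and the correct `<read_file>` example yields an empty list.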