feat: parse GitHub.copilot-chat/transcripts/*.jsonl event-stream format #70
hora7ce wants to merge 3 commits
…SH / devcontainer

findVsCodeDirs() only scanned desktop installation paths (~/.config/Code, AppData, ~/Library/...) and missed the VS Code Server path used by WSL2, Remote SSH, and Dev Containers:

    ~/.vscode-server/data/User/workspaceStorage
    ~/.vscode-server-insiders/data/User/workspaceStorage

Add both server editions to the scan on non-Windows platforms.

Also extend harnessFromPath() with .vscode-server-insiders and .vscode-server checks (ordered most-specific first to avoid the Insiders path matching the plain .vscode-server substring) so sessions discovered via these paths are labelled 'Local Agent (Server)' or 'Local Agent (Server Insiders)' rather than the fallback 'Local Agent'.

Fixes microsoft#62
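The ordering constraint in this commit message can be illustrated with a small sketch. This is not the project's actual `harnessFromPath()`; the function name `harnessFromPathSketch` and the separator normalisation are assumptions made for illustration, and only the three labels come from the commit message.

```typescript
// Hypothetical sketch: the real harnessFromPath() in parser-vscode.ts may
// handle more cases and use different matching rules.
function harnessFromPathSketch(filePath: string): string {
  // Normalise Windows separators so the substring checks behave the same everywhere.
  const normalized = filePath.replace(/\\/g, "/");

  // Most specific first: ".vscode-server-insiders" contains ".vscode-server",
  // so checking the plain name first would mislabel Insiders sessions.
  if (normalized.includes(".vscode-server-insiders")) {
    return "Local Agent (Server Insiders)";
  }
  if (normalized.includes(".vscode-server")) {
    return "Local Agent (Server)";
  }
  // Fallback label named in the commit message.
  return "Local Agent";
}
```

Swapping the two `if` statements would make every Insiders path return `Local Agent (Server)`, which is the substring collision the commit message refers to.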
- README.extension.md: add Local Agent (Server) and Local Agent (Server Insiders) rows to Supported Harnesses table
- docs/content/_index.md: add server harness row to Multi-Harness Support table
- docs/content/getting-started/supported-tools.md: note Remote-WSL/SSH/devcontainer log paths under Local Agent section
- parser-vscode.ts: tighten harnessFromPath ordering comment (substring collision) and findVsCodeDirs platform guard comment per reviewer suggestions
- parser-vscode.test.ts: add findVsCodeDirs test covering server workspaceStorage path inclusion via temporary home directory
Fixes microsoft#64.

VS Code stores Copilot Chat sessions in two locations inside each workspace's workspaceStorage entry:

1. chatSessions/*.{json,jsonl}: existing format (already parsed)
2. GitHub.copilot-chat/transcripts/*.jsonl: newer event-stream format (silently ignored until now)

This commit adds support for the second format.

## New helpers (parser-vscode.ts)

- listTranscriptFiles(dir): lists *.jsonl files in a transcripts/ dir
- parseTranscriptLines(raw): parses JSONL text into TranscriptEvent[]
- buildToolNameIndex(events): pre-indexes toolCallId → toolName
- collectToolsFromToolRequests(...): extracts tool names from assistant message toolRequests arrays with fallback to the pre-built index
- buildRequestsFromTranscriptEvents(events, toolNames): groups events into per-turn SessionRequest[] (one request per user.message)
- parseTranscriptFile(filePath, wsId, wsName, harness, customInstrBytes): public API; reads a transcript file and returns a Session or null

## Integration

processWorkspaceEntry / processWorkspaceEntryAsync now scan the transcript directory alongside chatSessions and wire discovered sessions into the same sessions[] / sessionSourceIndex pipeline so the dashboard picks them up transparently. The async path tracks transcript files in the same progress-reporting budget as chat files (totalUnits includes both).

## Tests (parser-vscode.test.ts)

Five new cases in the parseTranscriptFile describe block:

- full flow: session.start → user.message → assistant.message with tool calls → tool.execution_start/complete → final assistant.message
- multi-turn: two user/assistant pairs produce two requests
- empty session: no user messages → null
- malformed file: all-corrupt lines → null (events.length === 0)
- deduplication: same tool appearing in both toolRequests and tool.execution_start is deduplicated to a single entry
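The turn grouping and tool deduplication described in this commit message could look roughly like the sketch below. It is not the code from the PR. The `{ type, data }` envelope, the `name` field on a tool request, and the `...Sketch` function names are assumptions; only the event type names and `content`, `toolCallId`, `toolName`, `toolRequests`, `messageText`, `responseText`, and `toolsUsed` appear in the PR text.

```typescript
// Hypothetical event shape (the real TranscriptEvent type may differ).
interface TranscriptEvent {
  type: string;
  data?: {
    content?: string;
    toolCallId?: string;
    toolName?: string;
    toolRequests?: { toolCallId?: string; name?: string }[];
  };
}

interface TurnSketch {
  messageText: string;
  responseText: string;
  toolsUsed: string[];
}

// Pre-index toolCallId -> toolName from tool.execution_start events.
function buildToolNameIndexSketch(events: TranscriptEvent[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const ev of events) {
    const id = ev.data?.toolCallId;
    const name = ev.data?.toolName;
    if (ev.type === "tool.execution_start" && id && name) index.set(id, name);
  }
  return index;
}

// Each user.message opens a new turn; later events attach to the latest turn.
function buildTurnsSketch(events: TranscriptEvent[]): TurnSketch[] {
  const toolNames = buildToolNameIndexSketch(events);
  const open: { messageText: string; responses: string[]; tools: Set<string> }[] = [];

  for (const ev of events) {
    if (ev.type === "user.message") {
      open.push({ messageText: ev.data?.content ?? "", responses: [], tools: new Set<string>() });
      continue;
    }
    const turn = open[open.length - 1];
    if (!turn) continue; // events before the first user.message are ignored

    if (ev.type === "assistant.message") {
      if (ev.data?.content) turn.responses.push(ev.data.content);
      for (const req of ev.data?.toolRequests ?? []) {
        // Prefer the name on the request; otherwise fall back to the index.
        const name = req.name ?? (req.toolCallId ? toolNames.get(req.toolCallId) : undefined);
        if (name) turn.tools.add(name); // the Set removes duplicates
      }
    } else if (ev.type === "tool.execution_start" && ev.data?.toolName) {
      turn.tools.add(ev.data.toolName);
    }
  }

  return open.map((t) => ({
    messageText: t.messageText,
    responseText: t.responses.join("\n"),
    toolsUsed: Array.from(t.tools),
  }));
}
```

Collecting tool names in a `Set` per turn is one simple way to get the behaviour the deduplication test describes; an empty result here corresponds to the "no user messages" case in which `parseTranscriptFile` is said to return `null`.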
@san360 LGTM. Can you check whether this PR works and does not introduce any duplication? Thanks!
Thanks @hora7ce for putting this together, and @eyehiel for the duplicate-sessions screenshot; both were extremely useful.

After reviewing the PR locally on Windows desktop VS Code (built from this branch and from …

Before landing the transcript parser, we need to answer a set of open questions about when each format is present, whether the same …

I've opened #87 to track that investigation in detail, including: …
Closing this PR in favor of #87. The code here is a good starting point and will be referenced from the issue. Please feel free to follow up on #87 with your environment details (it sounds like you're on WSL / Remote-SSH per #62 / #63, which is exactly the scenario we most need data from).

Also note one test ( …

Closing as: more investigation required (tracked in #87).

Summary
Closes #64.
VS Code stores Copilot Chat sessions in two distinct locations inside each workspace's `workspaceStorage` entry:

1. `chatSessions/*.{json,jsonl}`: the existing format (already parsed)
2. `GitHub.copilot-chat/transcripts/*.jsonl`: the newer event-stream format (silently ignored until now)

Sessions recorded only in format 2 never appeared in the dashboard. This PR makes the extension parse both formats transparently.
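As a rough illustration of the two locations, the sketch below classifies a path relative to a workspace's `workspaceStorage` entry. The function `classifySessionFileSketch` and the `SessionFileKind` type are hypothetical and not part of the PR; only the two path patterns come from the description.

```typescript
// Hypothetical helper: decides which of the two session formats a file
// belongs to, based on its path relative to the workspaceStorage entry.
type SessionFileKind = "chatSession" | "transcript" | null;

function classifySessionFileSketch(relativePath: string): SessionFileKind {
  // Accept Windows separators as well.
  const p = relativePath.replace(/\\/g, "/");

  // Format 1: chatSessions/*.{json,jsonl}
  if (/^chatSessions\/[^/]+\.(json|jsonl)$/.test(p)) return "chatSession";

  // Format 2: GitHub.copilot-chat/transcripts/*.jsonl
  if (/^GitHub\.copilot-chat\/transcripts\/[^/]+\.jsonl$/.test(p)) return "transcript";

  return null;
}
```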
The transcript event-stream format
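To make the format in this section concrete, here is a minimal sketch of reading one transcript, with one JSON event per line and corrupt lines skipped. It is an illustration under assumptions, not the PR's `parseTranscriptLines`: the name `parseTranscriptLinesSketch`, the requirement that every event has a string `type` field, and the flat field layout in the usage example are all hypothetical.

```typescript
// Hypothetical minimal event type: only a string `type` is required here.
interface TranscriptEventSketch {
  type: string;
  [key: string]: unknown;
}

function parseTranscriptLinesSketch(raw: string): TranscriptEventSketch[] {
  const events: TranscriptEventSketch[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { type?: unknown }).type === "string"
      ) {
        events.push(parsed as TranscriptEventSketch);
      }
    } catch {
      // Corrupt line: skip it instead of failing the whole file.
    }
  }
  return events;
}

// Usage: two valid events and one corrupt line yield two parsed events.
const sampleTranscript = [
  '{"type":"session.start","sessionId":"s1"}',
  "not json",
  '{"type":"user.message","content":"hi"}',
].join("\n");
const sampleEvents = parseTranscriptLinesSketch(sampleTranscript);
```

A file in which every line is corrupt produces an empty array, which matches the `events.length === 0` case that the malformed-file test is said to guard.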
Each `.jsonl` file represents one session. Each line is a typed JSON event:

- `session.start`: `sessionId` and metadata
- `user.message`: `content` text
- `assistant.message`: `toolRequests[]` array
- `tool.execution_start`: `toolCallId` + `toolName`
- `tool.execution_complete`

Changes
`src/core/parser-vscode.ts`

New private helpers, each kept small to stay within the ESLint complexity limit:

- `listTranscriptFiles(dir)`: lists `*.jsonl` files under a `transcripts/` directory; returns `[]` when the dir does not exist
- `parseTranscriptLines(raw)`: parses JSONL text into `TranscriptEvent[]`
- `buildToolNameIndex(events)`: pre-indexes `toolCallId → toolName` from `tool.execution_start` events
- `collectToolsFromToolRequests(...)`: extracts tool names from `assistant.message.toolRequests[]`, with `toolCallId` fallback to the pre-built index
- `buildRequestsFromTranscriptEvents(events, toolNames)`: groups events into per-turn `SessionRequest[]`; each `user.message` starts a new turn that is flushed when the next `user.message` (or end-of-stream) arrives

New exported function:
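`parseTranscriptFile(filePath, wsId, wsName, harness, customInstructionsBytes?)`: reads a transcript file, builds the `Session` object (or returns `null` for empty / unreadable files), and is the single public surface consumed by the integration points below.

Integration in `processWorkspaceEntry` and `processWorkspaceEntryAsync`:

- Alongside `chatSessions/`, both functions now also scan `GitHub.copilot-chat/transcripts/` and add discovered sessions to the same `sessions[]` / `sessionSourceIndex` pipeline, fully transparent to the rest of the analyzer.
- The async path counts transcript files in `totalUnits` so progress reporting stays accurate.

`src/core/parser-vscode.test.ts`

Five new tests in a `parseTranscriptFile` describe block:

- Full flow (`session.start` → `user.message` → `assistant.message` with tool calls → `tool.execution_start`/`complete` → final `assistant.message`): … `SessionData.requests[]`; validates `sessionId`, `workspaceId`, `harness`, `messageText`, `responseText`, `toolsUsed`, `agentMode`, `timestamp`
- Multi-turn: two user/assistant pairs produce two requests
- Empty session (no user messages): returns `null` gracefully
- Malformed file (all-corrupt lines): returns `null` gracefully (guards the `events.length === 0` path)
- Deduplication: the same tool appearing in both `toolRequests` and `tool.execution_start` collapses to a single entry

Dependency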
This branch is based on PR #63 (`fix/vscode-server-log-discovery`), which adds `~/.vscode-server` path discovery. The two PRs are independent at the code level (they touch disjoint functions), but merging #63 first is recommended so the transcript sessions are attributed the correct `Local Agent (Server)` harness label for VS Code Server users.

Checklist
- `npm run check` passes (typecheck + lint + spellcheck + knip + tests)