
tsgo backend (--tsgo) — spike with JS-symbols hybrid#99

Draft
johnsoncodehk wants to merge 19 commits into master from feat/tsgo-backend


@johnsoncodehk (Owner) commented May 5, 2026

TL;DR

Spike for tsgo (@typescript/native-preview) backend integration via --tsgo. It doesn't currently win: it loses on memory and recall on every workload, and loses on speed everywhere except for a marginal win on large single-project codebases. The architecture is sound; the gap is upstream tsgo maturity. Recommendation: hold on branch.

What this is

Opt-in --tsgo flag that swaps tsslint's lint backend from ts.Program to a hybrid built on @typescript/native-preview's sync API.

Architecture, after several pivots:

  • AST + Type queries → tsgo via sync IPC
  • Symbol resolution (Layer A) → real-ts in-process via bindSourceFile + scope walker. No IPC.
  • Symbol fallback (Layer B/C) → tsgo IPC for cross-file imports / property names on imported types / lib globals
  • Per-file release: bound JS SF + position maps dropped after each file's lint completes (memory hygiene)
  • Per-backend lifetime: each project setup gets its own resolver instance; closed + cleared on backend close

The pivot away from "tsgo for everything" came after measurement showed that cross-process IPC dominates per-file lint cost. Symbol queries via tsgo's batched getSymbolAtPosition cost 11s on Dify (~5000 files); real-ts bindSourceFile + a scope walker answers the same questions in 1.8s. Symbols are binder output, so they don't need IPC.
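A minimal sketch of the Layer A / Layer B-C split, assuming a toy scope model in place of real-ts `bindSourceFile` output. `Scope`, `LocalSymbol`, `RemoteQuery`, and `resolveSymbol` are hypothetical names, and the remote query is a plain function standing in for a tsgo IPC round-trip. Only names the local walk can't bind pay for a round-trip:

```typescript
// Hypothetical: a scope holds the names the binder recorded in it,
// plus a link to its enclosing scope.
interface LocalSymbol { name: string; declPos: number; }
interface Scope { locals: Map<string, LocalSymbol>; parent?: Scope; }

// Stand-in for the tsgo IPC fallback (cross-file imports, properties on
// imported types, lib globals).
type RemoteQuery = (name: string, pos: number) => string | undefined;

type Resolution =
  | { layer: "A"; symbol: LocalSymbol }
  | { layer: "B/C"; remoteId: string }
  | { layer: "unresolved" };

let remoteCalls = 0; // counts simulated IPC round-trips

function resolveSymbol(name: string, pos: number, scope: Scope, remote: RemoteQuery): Resolution {
  // Layer A: in-process scope walk, innermost scope first. No IPC.
  for (let s: Scope | undefined = scope; s; s = s.parent) {
    const hit = s.locals.get(name);
    if (hit) return { layer: "A", symbol: hit };
  }
  // Layer B/C: only names the local walk can't bind cost a round-trip.
  remoteCalls++;
  const remoteId = remote(name, pos);
  return remoteId !== undefined ? { layer: "B/C", remoteId } : { layer: "unresolved" };
}

// Example: `helper` is declared at file level, `x` inside a function,
// `Map` is a lib global only the remote side knows about.
const fileScope: Scope = { locals: new Map([["helper", { name: "helper", declPos: 0 }]]) };
const fnScope: Scope = { locals: new Map([["x", { name: "x", declPos: 40 }]]), parent: fileScope };
const remote: RemoteQuery = (n) => (n === "Map" ? "lib:Map" : undefined);
```

In this model the cost asymmetry is explicit: the scope walk is a few map lookups per identifier, while every fallback increments `remoteCalls`.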

Numbers (full)

Median of 3 runs, --force. Memory = peak RSS via /usr/bin/time -l (Node) + sampled ps (tsgo subprocess).

Dify web/ — 5000 .tsx, single project, one rule (react-x/no-leaked-conditional-rendering)

| Backend | Wall | Memory (Node + Go) | Errors detected | Crash-class messages |
|---|---|---|---|---|
| ts.Program (baseline) | 10.4s | 2.27 GB | 1867 | 9 |
| --tsgo | 9.2s | 2.16 + 1.42 = 3.58 GB (1.58×) | 1621 (-13% recall) | 50 |

--tsgo is 12% faster but uses 58% more memory, misses 246 true positives, and reports 41 more crash-class diagnostics (mostly upstream tsgo Type.Types panics surfaced by ts-api-utils).
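The headline ratios follow directly from the Dify medians. A quick check, with the values copied from the table and ts.Program's 1867 errors treated as ground truth for the recall figure:

```typescript
// Dify web/ medians from the table.
const baseline = { wallS: 10.4, memGB: 2.27, errors: 1867, crash: 9 };
const tsgo = { wallS: 9.2, nodeGB: 2.16, goGB: 1.42, errors: 1621, crash: 50 };

const tsgoMemGB = tsgo.nodeGB + tsgo.goGB;       // Node process + tsgo subprocess
const memRatio = tsgoMemGB / baseline.memGB;     // memory vs baseline
const speedup = 1 - tsgo.wallS / baseline.wallS; // fraction of wall time saved
const missed = baseline.errors - tsgo.errors;    // true positives lost
const recallDrop = missed / baseline.errors;     // relative recall loss
const extraCrash = tsgo.crash - baseline.crash;  // additional crash-class messages

console.log(tsgoMemGB.toFixed(2), memRatio.toFixed(2), speedup.toFixed(2), missed, recallDrop.toFixed(2), extraCrash);
// prints: 3.58 1.58 0.12 246 0.13 41
```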

tsslint own repo — 50 files across 7 projects, full ts-eslint via importESLintRules

| Backend | Wall | Memory (Node only) | Files passed | Messages |
|---|---|---|---|---|
| ts.Program (baseline) | 1.6s | 0.48 GB | 60 | 12 |
| --tsgo | 37.6s (24× slower) | 2.79 GB (5.8×) | 29 | 247 |

Catastrophic on small and multi-project setups. Each project setup pays a fixed updateSnapshot cost (~0.8s), plus a tsgo client SourceFileCache allocation that isn't released across snapshot boundaries. Half the files don't complete because of upstream tsgo edge-case panics that the larger Dify file sample happens to avoid.

Upstream tsgo blockers (not addressable here)

These are why recall is below ts.Program and why this can't ship as a user-facing feature today.

  1. Checker.getResolvedSignature panics on generic method calls (Map.get, Array.pop). A minimal repro (raw tsgo + 30 LOC, no adapter) triggers it, while a CLI compile of the same fixture works, so the panic is specific to the IPC API path.

  2. Checker.getTypeAtLocation returns wrong types for PropertyAccess in some real-codebase contexts. It reproduces on Dify but not on hand-built minimal fixtures. getTypeAtPosition returns correct types for the same expressions, so commit 513d7d7 routes through it. The underlying API divergence remains.

  3. isTypeAssignableTo / getApparentType not in the API. Adapter has structural-cover shims (sound but incomplete; long tail returns false / input). Tracked in microsoft/typescript-go#3610. Source of most missed true positives.

  4. No Volar host injection. --tsgo rejects --vue-project / --mdx-project / --astro-project / --vue-vine-project with a clear error. Meta-frameworks stay on ts.Program.

  5. No BuilderProgram JS API. The Layer-2 incremental cache (affected-file classification) is disabled on --tsgo, which treats every file as affected.

  6. SourceFileCache doesn't release across snapshot boundaries. Drives the 5.8× memory blowup on multi-project lint. Per-file releaseFile (commit 3e161c2) only releases what tsslint owns; the tsgo client's hydrated AST + Type/Symbol object cache stays pinned until snapshot dispose.

What this PR does own

Within tsslint's adapter layer, every issue we can address has been addressed:

  • 16 commits cleanly separated (foundation → facade → prototype shims → per-kind routing → JS-symbols pivot → memory hygiene)
  • Per-backend resolver instance, closure-encapsulated caches, clear() on backend.close()
  • invalidateFile(name) for --fix rewrites; worker wires it
  • releaseFile(name) after each file's lint pass — drops bound SF + position maps for GC
  • Kind-remap fallback for tsgo SyntaxKind names that don't exist in ts (e.g. JSImportDeclaration)
  • Real-ts module captured at worker top-level so internal binder/scanner code bypasses the tsgo facade
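A minimal sketch of the per-backend cache lifecycle above, assuming a resolver whose only state is a per-file map. `createResolver`, `BoundFile`, and the `bindFile` callback are hypothetical; in this sketch `releaseFile` and `invalidateFile` both drop the entry and differ only in when the caller invokes them (after a lint pass vs. after a `--fix` rewrite).

```typescript
// Hypothetical stand-in for a bound source file plus its position map.
interface BoundFile { text: string; positions: Map<number, string>; }

function createResolver(bindFile: (fileName: string) => BoundFile) {
  // Closure-encapsulated: every backend instance gets its own cache.
  const bound = new Map<string, BoundFile>();

  return {
    get(fileName: string): BoundFile {
      let f = bound.get(fileName);
      if (!f) {
        f = bindFile(fileName); // bind lazily, once per file
        bound.set(fileName, f);
      }
      return f;
    },
    // After a file's lint pass: drop what we own so the GC can reclaim it.
    releaseFile(fileName: string): void { bound.delete(fileName); },
    // After a --fix rewrite: the cached binding is stale; the next get() rebinds.
    invalidateFile(fileName: string): void { bound.delete(fileName); },
    // On backend.close(): nothing outlives the backend.
    clear(): void { bound.clear(); },
    size(): number { return bound.size; },
  };
}
```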

Tests:

  • tsgo-backend.test.ts — 15 cases (adapter, JS-symbols, invalidation, Symbol identity)
  • program-host.test.ts — 17/17 unchanged
  • cache-flow.test.ts — 50/50 unchanged
  • meta-frameworks.test.ts — 8/8 unchanged

Recommendation: hold on branch

--tsgo doesn't beat ts.Program on any workload that matters today:

  • Speed: marginal win on large single-project (12% on Dify); catastrophic loss on small/multi-project
  • Memory: loses on both (1.58× on Dify, 5.8× on tsslint repo)
  • Correctness: 13% recall miss on Dify; about half of the files crash on the tsslint repo
  • False positives: 5-20× more crash-class messages

Architecturally the work is sound and reusable. When upstream addresses any of (1)/(3)/(6) above, this branch becomes immediately useful. Rebasing onto master should remain cheap because we changed almost nothing outside the new lib/tsgo-*.ts files.

The right call is to leave this open as a draft and revisit when upstream tsgo ships:

  • getResolvedSignature for generic method calls
  • isTypeAssignableTo / getApparentType
  • SourceFileCache.releaseSourceFile(path) (or in-process FFI binding)

Test plan

  • node packages/cli/test/tsgo-backend.test.js — 15/15
  • node packages/cli/test/program-host.test.js — 17/17
  • node packages/cli/test/cache-flow.test.js — 50/50
  • node packages/cli/test/meta-frameworks.test.js — 8/8
  • Dify web/ smoke (3 runs): 8.9-9.7s, 2983 passed / 1621 errors / 50 messages, 3.58 GB peak
  • tsslint own repo (3 runs): 35-39s, 29 passed / 247 messages, 2.79 GB peak
  • --fix exercise on Dify: no JS errors, parity with non-fix run
  • Plain-TS path (no --tsgo) regression check: no change in output or perf

…prototype shim

Lays the plumbing for tsgo (`@typescript/native-preview`) integration without
introducing it as a hard dependency.

Adapter (lib/tsgo-backend.ts):
- Wraps `Project.program` + `Project.checker` as a `ts.Program` /
  `ts.TypeChecker`-shape that satisfies LinterContext's Program-thunk
  contract from PR #98.
- Per-file batched prepass: walks the SF locally, resolves every Identifier
  via `Checker.getSymbolAtPosition([positions])` in a single sync RPC.
  Position-based primary because `getSymbolAtLocation` returns undefined for
  identifiers in import/export specifier position (~76% of identifiers in a
  type-heavy file). Location-based fallback for the small remainder.
- Symbol identity is rock-solid via tsgo's `symbol.id` — `Map<Symbol, X>`
  patterns (compat-eslint's `variableBySymbol`) work unchanged.
- Patches `getStart` / `getEnd` / `getText` / `getFullStart` / `getFullText`
  / `getWidth` / `getFullWidth` onto tsgo's `RemoteNode` prototype (located
  by walking the prototype chain — the Remote{Node,SourceFile} classes
  aren't in the package exports map). Uses real ts's `skipTrivia` —
  bit-identical positions vs ts.Node.
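The batched prepass and the id-based identity collapse can be sketched as below. `BatchedChecker.getSymbolsAtPositions`, `SymbolInterner`, and `batchPrepass` are hypothetical stand-ins, since tsgo's actual request/response types aren't shown in this PR. Whether interning happens inside tsgo's client or in the adapter isn't specified here; the sketch does it explicitly so that `Map<Symbol, X>` lookups and `===` keep working across answers.

```typescript
interface RawSymbol { id: number; name: string; }

// Hypothetical batched endpoint: one sync round-trip answers N positions.
interface BatchedChecker {
  getSymbolsAtPositions(fileName: string, positions: number[]): (RawSymbol | undefined)[];
}

// Each answer may carry a fresh object; interning by id collapses equal ids
// to the same object.
class SymbolInterner {
  private readonly byId = new Map<number, RawSymbol>();
  intern(raw: RawSymbol): RawSymbol {
    const existing = this.byId.get(raw.id);
    if (existing) return existing;
    this.byId.set(raw.id, raw);
    return raw;
  }
}

function batchPrepass(
  fileName: string,
  identifierPositions: number[],
  checker: BatchedChecker,
  interner: SymbolInterner,
): Map<number, RawSymbol> {
  const byPosition = new Map<number, RawSymbol>();
  const answers = checker.getSymbolsAtPositions(fileName, identifierPositions); // one RPC
  identifierPositions.forEach((pos, i) => {
    const raw = answers[i];
    if (raw) byPosition.set(pos, interner.intern(raw));
    // Unanswered positions would go to a per-node location-based fallback (not shown).
  });
  return byPosition;
}
```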

CLI wiring (lib/worker.ts, index.ts):
- `--tsgo` flag dispatches setup() into the tsgo branch.
- Rejects --vue/--mdx/--astro/--vue-vine/--ts-macro projects with a clear
  error (Volar host injection isn't available on tsgo).
- Skips BuilderProgram drain / layer-2 affected-file classification (no JS
  API on tsgo); treats every file as affected → cached type-aware entries
  re-validate rather than serve from a stale snapshot.
- `lint()` calls `tsgoBackend.prepareFile(name)` before rules run.

Optional peer dep: `@typescript/native-preview`. Users without it never
import the backend (the require path is gated behind --tsgo).

Verification:
- packages/cli/test/tsgo-backend.test.ts — 12 checks (skips when peer dep
  absent): SF fetch + AST walk, batched symbol prepass, Symbol identity
  collapse, Map<Symbol> compatibility, program method coverage.
- Existing tests unchanged: program-host (17), cache-flow (50),
  meta-frameworks (8).
- End-to-end: --tsgo --project packages/{types,core,compat-eslint}/tsconfig.json
  runs to completion; full repo lint 1.66s vs 1.45s on ts.Program (small
  parity gap; dominated by per-project snapshot init).

Known gap (next commit): rule code calls `ts.isXxx(node)` and
`ts.forEachChild(node, cb)` from real ts — these dispatch on
`ts.SyntaxKind` values. tsgo Node kinds are offset-shifted (Identifier=79
vs 80), so dispatch tables miss → rules silently no-op. SyntaxKind
remapping or ts module facade required to make rules fire.

Validation fixture (a single `import { Foo } from './foo'` where Foo is
type-only): @typescript-eslint/consistent-type-imports rule now fires on
the tsgo path with the same diagnostic text + range as the ts.Program path.

The mechanism — why a facade and not in-place mutation:

Rule code (compat-eslint, typescript-eslint utilities, custom rules)
binds `ts` once at module load: `const ts = require('typescript')`. Every
`ts.SyntaxKind.X` / `ts.isXxx(node)` / `ts.forEachChild(node, cb)` call
inside that module dispatches against that bound module's enum values
and type-guard functions. tsgo Node.kind=79 (Identifier); real
ts.SyntaxKind.Identifier=80 → dispatch tables miss → silent no-op.

We can't mutate ts.SyntaxKind in place: the worker / cache-flow / core
imported ts BEFORE setup() runs and need real ts internals (skipTrivia,
createSemanticDiagnosticsBuilderProgram, …) keyed by real ts values.

Selective substitution: install a facade in `Module._cache` AFTER the
worker's module-load `import ts = require('typescript')` has bound real
ts, BUT BEFORE the dynamic `import(configFile)` pulls in compat-eslint
and rules. Those late-loaded modules see tsgo's enums + type guards via
the facade; the worker keeps its real ts reference intact.

Implementation (lib/tsgo-typescript-facade.ts):
- Build facade as a plain object. Copy every own-property of real ts
  first (so CJS interop helpers like tslib's `__importStar`, which iterate
  `Object.getOwnPropertyNames(mod)` and miss prototype-inherited fallbacks,
  still see complete shape — typescript-estree crashes on `ts.Extension`
  otherwise). Overlay tsgo's enums (SyntaxKind, NodeFlags, ModifierFlags,
  ScriptKind, ScriptTarget, TokenFlags, …), all type guards from /ast/is,
  visitor helpers, factory namespace, and tsgo-only enums (SymbolFlags,
  TypeFlags, DiagnosticCategory) from /api/sync.
- Free-function `forEachChild(node, cb, cbNodes?)` wraps tsgo Node's own
  `.forEachChild` method (rule code expects free-function shape).
- Sentinel `__tsgoFacade__: true` for debugging.
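A minimal sketch of the facade's construction order, assuming both modules are plain objects. `buildFacade`, `ModuleShape`, and `MethodNode` are hypothetical; the real overlay is assembled from tsgo's `/ast/is` and `/api/sync` exports, and the real `forEachChild` also takes `cbNodes`, which is omitted here.

```typescript
type ModuleShape = Record<string, unknown>;

// Hypothetical node shape that exposes forEachChild as a method.
interface MethodNode {
  kind: number;
  forEachChild(cb: (child: MethodNode) => unknown): unknown;
}

function buildFacade(realTs: ModuleShape, tsgoOverlay: ModuleShape): ModuleShape {
  const facade: ModuleShape = {};
  // 1. Copy every own property of real ts first, so interop helpers that
  //    iterate Object.getOwnPropertyNames(mod) still see the full shape.
  for (const key of Object.getOwnPropertyNames(realTs)) facade[key] = realTs[key];
  // 2. Overlay tsgo's enums and type guards; they win on conflicts.
  for (const key of Object.getOwnPropertyNames(tsgoOverlay)) facade[key] = tsgoOverlay[key];
  // 3. Free-function forEachChild that delegates to the node's own method.
  facade.forEachChild = (node: MethodNode, cb: (child: MethodNode) => unknown) => node.forEachChild(cb);
  // 4. Debugging sentinel.
  facade.__tsgoFacade__ = true;
  return facade;
}
```

The real module object is never mutated, which is what lets code that bound real ts earlier keep its enum values.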

`installFacade()` (lib/tsgo-typescript-facade.ts):
- Pnpm-aware: tsgo + tsslint + tsslint-eslint each pull in typescript via
  their own resolution chains, all landing on different physical paths in
  `.pnpm/`. Cache-slot substitution via `Module._cache[require.resolve('typescript')]`
  alone covers our cwd's slot, not the others. Instead route every
  `require('typescript')` to a single canonical synthetic id by overriding
  `Module._resolveFilename`, then prime that one cache slot.
- Idempotent — second call short-circuits.

Worker integration (lib/worker.ts):
- `if (useTsgo) require('./tsgo-typescript-facade.js').installFacade()`
  runs at top of setup(), before `await import(configFile)`.

State on full repo:
- 11/59 files pass through tsgo path; 48 files now crash with a different,
  more advanced gap inside compat-eslint's ts-ast-scan (next iteration).
- Validation fixture: rule fires, diagnostic matches.
- Existing tests unchanged: program-host (17), cache-flow (50),
  meta-frameworks (8), tsgo-backend (12).
…cker stubs

Two surgical follow-ups after the facade landed:

1. Type prototype patches. typescript-eslint's no-unnecessary-type-assertion
   (and several compat-eslint paths) call `type.isLiteral()` /
   `type.isStringLiteral()` / `type.isUnion()` / `type.getFlags()` etc.
   ts.Type provides these as instance methods on its prototype. tsgo's
   TypeObject only exposes data fields + `getSymbol()`.

   Patch the missing predicates onto tsgo's TypeObject prototype, located
   by walking up from a sample Type the first time `getTypeAtLocation`
   returns one (TypeObject isn't in the package exports map). Predicates
   close over tsgo's TypeFlags enum at patch time — flag values differ
   from ts.TypeFlags, so we can't blanket-copy ts.Type prototype methods
   (their hardcoded flag constants would mis-mask).

2. `getSymbolsInScope` / `getExportSpecifierLocalTargetSymbol` change
   from throwing to returning safe defaults (`[]` / `undefined`).
   compat-eslint's two callsites (parameter-property shadowing,
   ExportSpecifier alias unwrap) have fallback paths that handle empties
   gracefully — degrades scope-manager precision in those edges but the
   pipeline keeps moving. Methods that *would* produce wrong diagnostics
   on no-op stay throwing (`getBaseConstraintOfType`, `getApparentType`,
   `getContextualType`).
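A minimal sketch of locating an unexported prototype from a sample instance and filling in missing predicates, assuming the target prototype is the one that owns `getSymbol` and that instances carry a numeric `flags` field. `findPrototypeDefining`, `patchTypePredicates`, and `FlagsEnum` are hypothetical, and any flag values used with it are illustrative rather than tsgo's. The predicates read whatever enum is passed at patch time, which is why ts.Type's methods, with their hardcoded ts.TypeFlags masks, can't simply be copied over.

```typescript
// Walk up from an instance to the prototype that owns `method`.
function findPrototypeDefining(sample: object, method: string): object | undefined {
  for (let p = Object.getPrototypeOf(sample); p && p !== Object.prototype; p = Object.getPrototypeOf(p)) {
    if (Object.prototype.hasOwnProperty.call(p, method)) return p;
  }
  return undefined;
}

interface FlagsEnum { Any: number; StringLiteral: number; NumberLiteral: number; Union: number; }

function patchTypePredicates(sample: object, flags: FlagsEnum): boolean {
  const proto = findPrototypeDefining(sample, "getSymbol");
  if (!proto) return false;
  // Masks close over the enum passed in, not over ts.TypeFlags.
  const defs: Record<string, (this: { flags: number }) => boolean | number> = {
    getFlags() { return this.flags; },
    isUnion() { return (this.flags & flags.Union) !== 0; },
    isStringLiteral() { return (this.flags & flags.StringLiteral) !== 0; },
    // BigInt literals omitted in this sketch.
    isLiteral() { return (this.flags & (flags.StringLiteral | flags.NumberLiteral)) !== 0; },
  };
  for (const [key, fn] of Object.entries(defs)) {
    // Fill gaps only; never clobber methods the class already provides.
    if (!(key in proto)) Object.defineProperty(proto, key, { value: fn, configurable: true, writable: true });
  }
  return true;
}
```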

Status: 12 files pass through tsgo path on the full repo (was 11). Next
gaps surface concretely:
  - `Symbol.declarations[i].getSourceFile is not a function` —
    `declarations` on tsgo Symbol is `NodeHandle[]`, not `Node[]`
  - `scanner.setTextPos is not a function` — tsgo Scanner shape diff

Existing tests unchanged.

Six concrete gaps closed. Each was a pinpoint API-shape mismatch between
tsgo's RPC-class hierarchy and the ts.* shape rule code expects.

NodeHandle (Symbol.declarations[i]):
- `getSourceFile()` → routes to `project.program.getSourceFile(this.path)`
  (tsgo NodeHandle has `path` directly).
- `parent` lazy getter → resolves the handle once via `resolve(project)`,
  caches `_resolvedNode` on the instance, returns resolved.parent.
- Multi-project safe: hooks close over a mutable `currentProjectRef`
  rebound per `createTsgoBackend()`. Avoids the `SyncRpcChannel is closed`
  errors caused when project A's NodeHandles were touched after project A
  was disposed.
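A minimal sketch of the `NodeHandle` hooks, assuming a handle that knows only its file `path` and a project object that can resolve it. `ProjectLike.resolveHandle`, `HandleLike`, and `installHandleHooks` are hypothetical names; `currentProjectRef` mirrors the mutable holder described above. The getter resolves once, caches on the instance, and returns `undefined` instead of touching a closed project.

```typescript
interface ResolvedNode { kind: number; parent?: ResolvedNode; }
interface HandleLike { path: string; _resolvedNode?: ResolvedNode; }
interface ProjectLike {
  closed: boolean;
  getSourceFile(path: string): { path: string } | undefined;
  resolveHandle(handle: HandleLike): ResolvedNode;
}

// Rebound on every backend creation, so hooks always see the live project.
const currentProjectRef: { project?: ProjectLike } = {};

function installHandleHooks(proto: object): void {
  Object.defineProperty(proto, "getSourceFile", {
    configurable: true,
    value(this: HandleLike) {
      return currentProjectRef.project?.getSourceFile(this.path);
    },
  });
  Object.defineProperty(proto, "parent", {
    configurable: true,
    get(this: HandleLike) {
      if (!this._resolvedNode) {
        const project = currentProjectRef.project;
        if (!project || project.closed) return undefined; // never touch a disposed project
        this._resolvedNode = project.resolveHandle(this); // resolve once, cache on the instance
      }
      return this._resolvedNode.parent;
    },
  });
}
```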

Symbol prototype:
- `getDeclarations()` / `getName()` / `getEscapedName()` / `getFlags()` —
  thin getters over the data fields tsgo Symbol already carries. Plus
  `escapedName` getter alias for typescript-estree's direct field access.

Type prototype:
- `types` getter delegates to tsgo's `getTypes()` — typescript-eslint's
  ts-api-utils (`unionConstituents`) reads `type.types` directly on
  Union/Intersection types.
- `getCallSignatures()` / `getConstructSignatures()` / `getProperties()` /
  `getProperty(name)` / `getBaseTypes()` / `getNonNullableType()` —
  instance shims that delegate to the project Checker via the
  `currentProjectRef` holder. Patched on first checker query that
  surfaces a Type sample.

Scanner:
- `setTextPos(pos)` aliased to `resetTokenState(pos)` — tsgo renamed
  the method but kept the same semantics (position the scanner head).
  Wrapped at the facade's `createScanner` callsite so consumers
  (compat-eslint/lib/tokens.ts) work without per-callsite changes.

SourceFile:
- `getLineAndCharacterOfPosition(pos)` + `getLineStarts()` — lazy
  computation of line offsets from `text`, cached on the instance. tsgo
  doesn't expose ts's lineMap caching; the binary-search version is
  fast enough for diagnostic span rendering.
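A minimal sketch of the lazy line map, assuming nothing beyond the SourceFile's `text`. It also includes the inverse mapping a `getPositionOfLineAndCharacter` shim needs. The function names mirror the ts methods being shimmed, but the bodies are illustrative, not ts's or tsgo's code; the adapter caches on the instance itself, while this sketch uses a `WeakMap` keyed by the SourceFile object to avoid mutating it.

```typescript
// Offsets at which each line begins. Line breaks: \n, \r\n, \r, U+2028, U+2029.
function computeLineStarts(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charCodeAt(i);
    if (ch === 13 /* \r */) {
      if (text.charCodeAt(i + 1) === 10) i++; // \r\n counts as one break
      starts.push(i + 1);
    } else if (ch === 10 || ch === 0x2028 || ch === 0x2029) {
      starts.push(i + 1);
    }
  }
  return starts;
}

const lineStartsCache = new WeakMap<object, number[]>();

function lineStartsOf(sf: { text: string }): number[] {
  let starts = lineStartsCache.get(sf);
  if (!starts) {
    starts = computeLineStarts(sf.text); // computed lazily, once per SourceFile
    lineStartsCache.set(sf, starts);
  }
  return starts;
}

// Binary search for the last line whose start is <= pos.
function getLineAndCharacterOfPosition(sf: { text: string }, pos: number): { line: number; character: number } {
  const starts = lineStartsOf(sf);
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (starts[mid] <= pos) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo, character: pos - starts[lo] };
}

function getPositionOfLineAndCharacter(sf: { text: string }, line: number, character: number): number {
  const starts = lineStartsOf(sf);
  if (line < 0 || line >= starts.length) throw new RangeError(`line ${line} out of range`);
  return starts[line] + character;
}
```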

NodeList:
- `[Symbol.species] = Array` on `RemoteNodeList` (Array subclass).
  Without this, `statements.map(fn)` constructs a new RemoteNodeList
  via the species protocol, hits the binary-view getter without view
  data, and crashes with `this.view.getUint32 is not a function`.
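A minimal sketch of the species fix on an `Array` subclass you can't edit, relying only on standard ES2015 semantics: `map`/`filter`/`slice` build their result via `constructor[Symbol.species]`. `Plain` and `Patched` are hypothetical classes standing in for an unpatched and a patched `RemoteNodeList`.

```typescript
// Left as-is: derived arrays are constructed as `new Plain(length)`.
class Plain<T> extends Array<T> {}

// Patched after the fact (as one would for a class that isn't exported):
// an own Symbol.species pointing at Array makes derived arrays plain Arrays.
class Patched<T> extends Array<T> {}
Object.defineProperty(Patched, Symbol.species, { value: Array, configurable: true });

const patched = new Patched<number>();
patched.push(1, 2, 3);
const doubled = patched.map((x) => x * 2); // a plain Array, not a Patched

const plain = new Plain<number>();
plain.push(1, 2, 3);
const plainDoubled = plain.map((x) => x * 2); // still a Plain
```

For a subclass whose constructor expects backing data (as the `RemoteNodeList` binary view does), the `Plain` path is the one that crashes.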

Checker proxy expansion:
- 16 new direct forwards (getTypeOfSymbol / getDeclaredTypeOfSymbol /
  getSignaturesOfType / getResolvedSignature / getReturnTypeOfSignature
  / getTypePredicateOfSignature / getNonNullableType / getBaseTypes /
  getPropertiesOfType / getIndexInfosOfType / getTypeArguments /
  getWidenedType / getTypeFromTypeNode / getContextualType /
  typeToString / isArrayLikeType).
- `getBaseConstraintOfType` routes type-parameter inputs to tsgo's
  `getConstraintOfTypeParameter`, returns undefined for non-parameters
  (matches ts behaviour).
- `getApparentType` soft-fallback returns the input type — primitives
  rarely hit this path in lint rules; rule code's downstream property
  lookups go via `getPropertiesOfType` which is forwarded.

Status on full repo (`tsslint --tsgo --project '{tsconfig.json,packages/*/tsconfig.json}' --force`):
- 19 / 59 files complete (was 12).
- ~9 distinct crash-class TypeErrors remain — `isTypeAssignableTo`,
  `sig.getReturnType`, `Cannot read properties of undefined (reading 'kind')`.
- ~455 diagnostic messages now fire from rules with partial type info.
  Many likely false-positives until checker shims are tightened (e.g.
  `getApparentType` soft-fallback over-approximates).
- Existing tests unchanged: tsgo-backend (12), program-host (17),
  cache-flow (50), meta-frameworks (8).
…y, Signature shims

Four batches that close every crash category except the lone
ts-api-utils `isInConstContext` walk-up edge case.

1. SourceFile.getPositionOfLineAndCharacter (line, char) → position.
   compat-eslint's ESLint→TSSLint report converter (index.ts:266+) calls
   this to map ESTree's loc-based descriptors back to file offsets, all
   wrapped in a swallowing try/catch that defaults `start=end=0` on
   failure. Without the shim every diagnostic collapsed to (line=1,
   col=1). Now spans render correctly — caret highlights match
   ts.Program path on the validation fixture.

2. Null-safety wrap on every `is*` predicate in the typescript module
   facade. tsgo emits `is.generated.js` with bare `node.kind === SK.X`
   accesses; ts's versions tolerate undefined. ESLint rule code that
   walks `node.parent.parent.parent…` and tests on the result was
   crashing at SourceFile-root and beyond. Wrap to short-circuit
   `false` on falsy input.

3. Signature prototype shims (`getReturnType` / `getDeclaration` /
   `getTypeParameters` / `getParameters`). tsgo Signature carries the
   data fields but lacks ts.Signature's accessor-method facade.
   `getReturnType` delegates via `currentProjectRef.project.checker`;
   the rest read existing fields.

4. Soft `isTypeAssignableTo: () => false`. tsgo doesn't expose subtype
   checking. Conservative `false` keeps type-safety rules on their
   "can't prove assignable, leave alone" branch — matches ts behaviour
   when the checker can't decide. May suppress some legitimate
   diagnostics until upstream surfaces this.
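A minimal sketch of the null-safety wrap from item 2, assuming the guards are own properties named `is` followed by an uppercase letter on a plain module object. `wrapPredicatesNullSafe` and `Guard` are hypothetical; everything else on the module passes through unchanged.

```typescript
type Guard = (node: unknown) => boolean;

function wrapPredicatesNullSafe(mod: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = { ...mod };
  for (const [key, value] of Object.entries(mod)) {
    if (/^is[A-Z]/.test(key) && typeof value === "function") {
      const guard = value as Guard;
      // Falsy input (e.g. walking past SourceFile.parent) short-circuits to
      // false instead of throwing on `node.kind`.
      out[key] = (node: unknown) => (node ? guard(node) : false);
    }
  }
  return out;
}
```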

Status:
- 20 / 59 files complete (was 19).
- Crash classes: 9 → 1. Lone remaining: ts-api-utils' `isInConstContext`
  walks `current.parent` past SourceFile root → `undefined.kind` crash.
  Affects no-unnecessary-type-assertion on a small set of files only;
  pre-existing in ts-api-utils itself (assumes `current.parent` is
  always defined).
- Diagnostic span rendering now correct: caret offsets match ts.Program.
- 684 messages — most are false-positives from rules running with
  approximated checker calls (`getApparentType`,
  `isTypeAssignableTo`); none are crashes anymore.
- Existing tests unchanged: tsgo-backend (12), program-host (17),
  cache-flow (50), meta-frameworks (8).

Probed `no-unnecessary-type-assertion` false-positive flood (684/684
diagnostic messages on full repo). Root cause is upstream, not adapter:

tsgo's `getTypeAtLocation` returns the same Type id (id=1 / flags=Any)
for both the outer asserted type AND the inner expression in nested
assertions. Example:

  (globalThis as any)[COUNTS_KEY] as Map<string, number> | undefined
  ↓
  outerType.id = t01 (Any),  innerType.id = t01 (Any)

vs. ts returning two different Type instances (one Map | undefined,
one Any). The rule's `uncast === cast` shortcut takes the wrong branch
because tsgo's objectRegistry caches the universal-Any singleton — two
"any" results round-trip to the same instance.

Switching `getApparentType` from soft-identity to throw didn't help —
the rule's path doesn't reach getApparentType. Reverted.

The path forward is upstream (tsgo's checker semantics for nested
assertions through `any`), not an adapter wrap. Adapter-side options
are sledgehammers: (a) wrap every Type to break identity equality —
suppresses real `unnecessaryAssertion` findings on equivalent types
in addition to the false positives; (b) shadow noUnnecessaryTypeAssertion
specifically — intrusive plumbing for one rule.

Status unchanged: 20 / 59 files complete; one residual crash class
(ts-api-utils' `isInConstContext` walk-up edge case); the 684
false-positives flood comes from tsgo's checker, not our shims.
…eNode

Found the right tsgo API entry instead of declaring an upstream bug. Not a
bug — `getTypeAtLocation(asExpr)` and `getTypeFromTypeNode(asExpr.type)`
have different semantic contracts:

  - ts.getTypeAtLocation(asExpr)   → asserted target type (the type
                                     the expression evaluates to AFTER
                                     `as`)
  - tsgo.getTypeAtLocation(asExpr) → underlying expression's type
                                     (BEFORE `as`)

typescript-eslint's `no-unnecessary-type-assertion` rule depends on the
ts contract — without routing, `castType === uncastType` was trivially
true (both = inner type), and every assertion in the codebase fired as
"unnecessary" (684 false-positive messages).

Re-route in our adapter: when `getTypeAtLocation` is called on an
`AsExpression` / `TypeAssertionExpression` / `SatisfiesExpression`,
delegate to `getTypeFromTypeNode(node.type)` instead. Other node kinds
keep tsgo's default semantic (which IS what's wanted everywhere else).

Status:
- 24 / 59 files complete (was 20).
- 433 messages (was 684) — a ~250-message drop, all from the rule
  taking the right code-path now on most assertions.
- Residual false-positives concentrate on UnionTypeNode targets like
  `as Map<string, number> | undefined`. tsgo's
  `getTypeFromTypeNode(unionTypeNode)` returns only the first member
  (`Map<string, number>`), not the constructed union — so the rule
  again sees outer/inner as effectively the same. Likely needs a
  different tsgo API entry; investigation continues.
- Existing tests unchanged.
…on routing

The "API semantic divergence" angle generalises beyond AsExpression. Each
node kind whose ts.Type semantics differ from tsgo gets its own routing.

  AsExpression / TypeAssertion / Satisfies → getTypeFromTypeNode(.type)
                                            (asserted target type)
  CallExpression / NewExpression           → getReturnTypeOfSignature(
                                              getResolvedSignature(node))
                                            (call's return type, not the
                                            function type)
  NonNullExpression                        → getNonNullableType(
                                              getTypeAtLocation(.expression))
                                            (post-`!` type, not pre-`!`
                                            union with undefined)

Without these, on the validation fixture `const x = map.get('foo')!`:
- ts.Program: rule sees `number | undefined` for the call → isNullableType
  iterates union constituents → returns true → rule does NOT fire (correct).
- tsgo (pre-fix): getTypeAtLocation(callExpr) returns the FUNCTION type
  `(key) => number | undefined`, not the call result. isNullableType
  returns false → rule fires as "unnecessary" (false positive).

Why these and not all kinds: variable references, member accesses, and
literals all have ts and tsgo agreeing on what `getTypeAtLocation`
returns (the value's type). Only the asserted/post-effect kinds diverge.

CallExpression has a fallback chain. tsgo's `getResolvedSignature` panics
on some method-call sites (`Map.get` etc.) — synthetic node identity may
not match the parsed program's call site. Catch the panic and fall back
to `getCallSignatures(funcType)[0].getReturnType()`, which uses two
sync calls but doesn't trigger the panic. Slightly less precise (picks
the first overload) but sound for lint purposes.
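A minimal sketch of the per-kind routing table and the CallExpression fallback chain, assuming string kinds and a `RawChecker` interface standing in for the tsgo checker proxy (both hypothetical; method names follow the ts checker API quoted above). It also applies the later NonNullExpression refinement of recursing through the routed function rather than the raw checker:

```typescript
// Hypothetical shapes; string kinds stand in for tsgo's SyntaxKind values.
type Kind =
  | "AsExpression" | "TypeAssertionExpression" | "SatisfiesExpression"
  | "CallExpression" | "NonNullExpression" | "Identifier" | "TypeNode";
interface AstNode { kind: Kind; expression?: AstNode; type?: AstNode; }
interface TypeObj { id: string; }
interface Sig { id: string; }

interface RawChecker {
  getTypeAtLocation(node: AstNode): TypeObj;
  getTypeFromTypeNode(node: AstNode): TypeObj;
  getResolvedSignature(node: AstNode): Sig; // may throw (upstream panic)
  getReturnTypeOfSignature(sig: Sig): TypeObj;
  getCallSignatures(type: TypeObj): Sig[];
  getNonNullableType(type: TypeObj): TypeObj;
}

function createRoutedGetTypeAtLocation(raw: RawChecker): (node: AstNode) => TypeObj {
  const route = (node: AstNode): TypeObj => {
    switch (node.kind) {
      case "AsExpression":
      case "TypeAssertionExpression":
      case "SatisfiesExpression":
        // ts contract: the asserted target type (after `as`).
        return raw.getTypeFromTypeNode(node.type!);
      case "CallExpression":
        try {
          // Plan A: the call's return type, not the callee's function type.
          return raw.getReturnTypeOfSignature(raw.getResolvedSignature(node));
        } catch {
          // Plan B: first call signature of the callee (picks overload 0).
          const first = raw.getCallSignatures(raw.getTypeAtLocation(node.expression!))[0];
          return first ? raw.getReturnTypeOfSignature(first) : raw.getTypeAtLocation(node);
        }
      case "NonNullExpression":
        // Recurse through `route`, not `raw`, so an inner call is routed too.
        return raw.getNonNullableType(route(node.expression!));
      default:
        return raw.getTypeAtLocation(node);
    }
  };
  return route;
}
```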

Status:
- 27 / 59 files complete (was 24).
- 427 messages (was 433). Tail of false-positives now from rules whose
  remaining checker shims (getApparentType identity-fallback,
  isTypeAssignableTo always-false) over- or under-approximate.
- ts-api-utils' `isInConstContext` walk-up edge case still 1 occurrence.
- All existing tests pass.

Two refinements after tracing remaining false positives:

1. PropertyAccessExpression callee: tsgo's
   `getSymbolAtLocation(propAccess)` returns the LEFT (receiver)
   symbol — for `info.languageService.getProgram`, sym is
   `languageService`, not `getProgram`. ts returns the property's
   symbol (the rhs). Probe the property name node first
   (`callee.name`) before falling back to the callee itself for
   identifier-callee cases.

2. NonNullExpression's `getTypeAtLocation(.expression)` recursion
   was bypassing the adapter — calling `project.checker.getTypeAtLocation`
   directly. For nested cases like `someMap.get(k)!`, the inner
   CallExpression needs its OWN routing applied to surface the
   call's return type before `getNonNullableType` strips
   nullability. Recurse through the wrapped checker so the dispatch
   compounds correctly.

Status:
- 29 / 59 files complete (was 27).
- Tail of false positives now concentrates on `.pop()!` style
  assertions where Plan A panics, Plan B's funcType isn't the
  full method type, Plan C's symbol-via-name probe doesn't
  resolve. Likely needs another tsgo API entry to surface the
  exact method signature (`getResolvedSignature` is the canonical
  one but panics — known tsgo issue, not adapter shape).
- ts-api-utils' `isInConstContext` walk-up edge case still 1
  occurrence — pre-existing in the third-party utility.
- All existing tests pass.

Two attempts at making the checker shims more accurate, with mixed
results dogfooded against the full repo:

getApparentType: identity-fallback → flag-routed:
  - TypeParameter input → getConstraintOfTypeParameter (fall through
    to the input if no constraint)
  - StringLiteral / NumberLiteral / BooleanLiteral / BigIntLiteral /
    EnumLiteral input → getWidenedType
  - Otherwise input

  Sound but unmeasured: 432 → 431 messages on dogfood. The rule
  paths that were calling it weren't producing the visible FPs;
  the cleanup is defensible regardless.
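A minimal sketch of the flag-routed `getApparentType`, assuming a numeric `flags` bitmask with illustrative bit values (not tsgo's TypeFlags). `FlaggedType`, `ApparentDeps`, `TF`, and `getApparentTypeShim` are hypothetical; the two checker calls are injected so the routing can be exercised on its own.

```typescript
// Illustrative bit values only; tsgo's TypeFlags differ.
const TF = { TypeParameter: 1, StringLiteral: 2, NumberLiteral: 4, BooleanLiteral: 8, BigIntLiteral: 16, EnumLiteral: 32 };
const LITERAL_FLAGS = TF.StringLiteral | TF.NumberLiteral | TF.BooleanLiteral | TF.BigIntLiteral | TF.EnumLiteral;

interface FlaggedType { id: string; flags: number; }

// Checker calls the shim depends on (hypothetical interface).
interface ApparentDeps {
  getConstraintOfTypeParameter(t: FlaggedType): FlaggedType | undefined;
  getWidenedType(t: FlaggedType): FlaggedType;
}

function getApparentTypeShim(t: FlaggedType, deps: ApparentDeps): FlaggedType {
  // Type parameter: its constraint, or the input when unconstrained.
  if (t.flags & TF.TypeParameter) return deps.getConstraintOfTypeParameter(t) ?? t;
  // Literal: widened via the checker.
  if (t.flags & LITERAL_FLAGS) return deps.getWidenedType(t);
  // Everything else: identity.
  return t;
}
```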

isTypeAssignableTo: tried structural cover (id equality, any/unknown/
never sentinels, union decomposition, literal widening), reverted
to always-false:
  - Cover impl returned `true` for cases TS would say `false` (no
    full structural compat without porting the checker subtype
    machinery), which then made the rule's contextually-unnecessary
    path fire on assertions where the receiver type wouldn't
    actually accept the source type.
  - Net effect: +26 new FPs on dogfood (different message variant —
    "receiver accepts the original type"), zero reduction in the
    dominant 433 "does not change the type" FPs.
  - Conservative `() => false` keeps the rule on its
    "can't prove assignable, leave alone" branch — same branch ts
    takes when the checker can't decide. Better than partial
    truth.

Honest accounting of FP source — which I overclaimed earlier:
  - getResolvedSignature panic on generic method calls: confirmed
    tsgo upstream (minimal repro independent of adapter)
  - getTypeAtLocation on PropertyAccess returning wrong narrowed
    type in real-repo context: minimal repro shows correct behaviour
    (`string | undefined`), but instrumented dogfood shows wrong
    types (`string`, `string[]`) on the same source. Interaction
    effect with adapter prepass / session state — root cause
    unidentified. Could be adapter, could be tsgo, can't say.
  - Other FPs: distributed across yet-untraced rule paths.

Conclusion: the dominant FPs aren't dispatched by the two stubs we
just adjusted. They originate further up — likely in
getTypeAtLocation results that misalign with ts in real-codebase
contexts but match in isolation. Diagnosing requires per-FP
instrumentation.

Reverted the previous revert. Implementing each shim faithfully is the
right metric, not "produces few messages on our codebase". Conservative
`() => false` was lying for cases that ARE assignable, suppressing
diagnostics that should fire — the apparent reduction in FPs included
silenced true positives.

Reinstate the structural cover (id eq, any/unknown/never sentinels,
union decomposition, literal widening). Returns `false` for the
long-tail structural-compat cases that need the full checker subtype
machinery — sound (no false `true`) over those, consumers treat
unknown as "can't prove" rather than "definitely false".
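A minimal sketch of the structural cover over a toy type model. `TypeModel` and `isTypeAssignableToCover` are hypothetical (real tsgo Types carry ids and flags, not this shape). Each `true` it returns is one TypeScript would also return under these rules; everything else falls through to `false`, which callers should read as "can't prove".

```typescript
type TypeModel =
  | { kind: "any" | "unknown" | "never"; id: number }
  | { kind: "string" | "number" | "boolean"; id: number }
  | { kind: "literal"; id: number; base: "string" | "number" | "boolean"; value: string | number | boolean }
  | { kind: "union"; id: number; types: TypeModel[] }
  | { kind: "object"; id: number };

function isTypeAssignableToCover(source: TypeModel, target: TypeModel): boolean {
  // Identity.
  if (source.id === target.id) return true;
  // Sentinels: top types accept everything, never goes anywhere,
  // any goes anywhere except never.
  if (target.kind === "any" || target.kind === "unknown") return true;
  if (source.kind === "never") return true;
  if (source.kind === "any") return target.kind !== "never";
  // Union decomposition: every source member must fit...
  if (source.kind === "union") return source.types.every((s) => isTypeAssignableToCover(s, target));
  // ...into at least one target member.
  if (target.kind === "union") return target.types.some((t) => isTypeAssignableToCover(source, t));
  // Literal widening: "a" is assignable to string.
  if (source.kind === "literal" && target.kind === source.base) return true;
  // Long tail (structural object compat etc.): can't prove, report false.
  return false;
}
```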

Sticking with the principle: implement to API contract. Verifying
correctness per-rule is downstream work; suppressing visible noise via
shim dishonesty is not.

tsgo's `getTypeAtLocation(propAccess)` returns wrong types in some
real-codebase contexts. Investigation:

- Real cli/index.ts has 3 occurrences of `project.configFile!` where
  the declared type is `string | undefined`. tsgo returns:
    line 455 → `string[]`  (the type of the PREVIOUS argument)
    line 463 → `string`    (no undefined)
    line 575 → `string`    (no undefined)
- ts.Program returns `string | undefined` for all 3 (correct).
- Querying `getTypeAtPosition(file, propAccess.end)` on the SAME
  positions returns correct `string | undefined`. Same checker, same
  snapshot, different API entry, different answer.

The bug is in tsgo's node-based `getTypeAtLocation` for PropertyAccess
in some surrounding-context cases — minimal repros don't trigger; the
divergence needs the full file's complexity. Position-based query
isn't affected.

Adapter routing: for PropertyAccessExpression / ElementAccessExpression,
delegate to `getTypeAtPosition(file, end)`. Sound — same checker, same
type — and produces ts-aligned results in cases that were FPing before.

Effect: 432 → 377 messages on dogfood (~55 fewer FPs). One file moved
from "passed clean" to "has some message" (likely a true positive that
was previously masked, or a different FP shape revealed; not yet
classified). All existing tests pass.

Verifies the --tsgo slowdown vs ts.Program is structural, not from
adapter mistakes. On Dify web/ (~5000 .tsx, single project):

  createBackend (updateSnapshot cold)        0.8s   4%
  prepass total (3600 prepared × ~3ms/file)  11.1s  60%
  rule exec + per-rule checker queries        6.5s  36%
  -----------------------------------------  ----  ----
  Wall                                        18.4s 100%

  ts.Program wall on the same setup           10.4s

The 8s gap concentrates in prepass — necessary tsgo cost: each lint()
must batch-resolve ~300 identifiers/file via sync RPC because per-call
checker queries cost ~78us each (vs in-process ts.Program: zero IPC).

Without prepass, rules hitting per-id symbol queries → 5000×300×78us =
~117s. Prepass at 3ms/file is already near the IPC floor (DataView
serialization + cross-process batched query).
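The cost model above can be checked directly. The constants are copied from the breakdown; the lazy worst case assumes every identifier triggers exactly one unbatched query, which is the assumption behind the ~117s figure.

```typescript
// Worst case without prepass: one unbatched sync query per identifier.
const files = 5000;
const idsPerFile = 300;
const perCallUs = 78;
const lazyWorstCaseS = (files * idsPerFile * perCallUs) / 1e6;

// Prepass estimate: prepared files times per-file batch cost.
const preparedFiles = 3600;
const prepassMsPerFile = 3;
const prepassEstimateS = (preparedFiles * prepassMsPerFile) / 1000;

// Measured stage breakdown (seconds).
const stages = { createBackend: 0.8, prepass: 11.1, rules: 6.5 };
const wall = stages.createBackend + stages.prepass + stages.rules;
const prepassShare = stages.prepass / wall;

console.log(lazyWorstCaseS, prepassEstimateS, wall.toFixed(1));
// prints: 117 10.8 18.4
```

The ~10.8s estimate lines up with the measured 11.1s prepass, which is the basis for calling it near the IPC floor.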

What we ruled out:
  - duplicate prepass (preparedFiles Set guards)
  - redundant RPC (location-fallback only on the ~10% position-based
    misses)
  - heavy adapter-internal work (negligible vs RPC time)

What's structural:
  - cross-process tsgo means every checker query is sync RPC
  - lint workloads ask many type questions per file
  - in-process ts.Program has zero IPC for the same questions

Toggling: set TSSLINT_TIME_TSGO=1 to print the createBackend time, the
cumulative prepare time every 100 files, and a final per-stage summary.
Default off; when off, the instrumentation costs a single env-var read
per call.

Conclusion (verified, not assumed): the 1.7× slowdown isn't from
something we did wrong. It's the IPC floor of running a cross-process
type checker for an N-files-with-M-queries-each workload.

Verifies what symbol resolution actually costs on the tsgo path. The
batched `getSymbolAtPosition` prepass turns out to be a workload-
dependent optimization, not a free win.

Why the prepass can be skipped: tsgo's Symbol comes from the binder, not the
checker. The binder runs at `updateSnapshot` time on the Go side; every
`getSymbolAtPosition` call is just a hash-table lookup. The 3ms/file
cost we saw is almost entirely cross-process plumbing — msgpack
serialize, sync RPC, Symbol-object construction — not type-checking work.

Bench (3 runs each, median wall, --force):

  Dify web/ (~5000 .tsx, single rule react-x/no-leaked-conditional-rendering)
    prepass on   18.8s   2982 passed · 1611 errors · 55 messages
    prepass off   8.5s   2982 passed · 1611 errors · 55 messages
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                      identical correctness, 2.2× faster

  tsslint own repo (7 projects, full ts-eslint via importESLintRules)
    prepass on   36.8s   28 passed · 376 messages
    prepass off  37.0s   26 passed · 391 messages
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                         ~equivalent perf, slight correctness shift

The Dify result shows prepass costs 11s of upfront RPC for queries the
configured rule barely makes. The adapter's per-Node `nodeToSymbol`
WeakMap already caches lazy `getSymbolAtLocation` results, so on
heavy-symbol workloads (tsslint repo) lazy mode catches up via cache
hits rather than upfront batching.
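A sketch of the per-Node caching idea, assuming a hypothetical `resolve` callback that performs the IPC lookup; `createCachedSymbolLookup`, `AstNode` and `Sym` are illustrative names. Caching misses (undefined) as well as hits is this sketch's choice; the adapter's `nodeToSymbol` may not do that.

```typescript
// Hypothetical stand-ins for the adapter's node and symbol types.
interface AstNode {
  pos: number;
  end: number;
}
interface Sym {
  name: string;
}

// Lazily memoize symbol lookups per node object. WeakMap keys are held
// weakly, so entries become collectable together with their nodes.
function createCachedSymbolLookup(
  resolve: (node: AstNode) => Sym | undefined,
): (node: AstNode) => Sym | undefined {
  const nodeToSymbol = new WeakMap<AstNode, Sym | undefined>();
  return (node) => {
    if (nodeToSymbol.has(node)) {
      return nodeToSymbol.get(node); // cache hit (including a cached miss)
    }
    const sym = resolve(node); // one IPC round-trip on first query
    nodeToSymbol.set(node, sym);
    return sym;
  };
}
```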

Defaulting to prepass-on for now because the slight tsslint-repo
correctness drift (2 fewer passed files, +15 messages) is unexplained:
a perf flag that silently shifts lint output is the wrong default until
the drift is diagnosed. `TSSLINT_NO_PREPASS=1` turns the prepass off for
users who want to trade that risk for the 2× speedup on rule mixes that
don't query many symbols.

Answers the user's question: no, the slowdown isn't checker work; it's
IPC overhead for symbol queries we don't always need.

…LS=1)

Replaces tsgo's `getSymbolAtPosition` IPC for in-file symbol resolution
with a real ts-side bind + scope walker. The Symbol returned is a
genuine ts.Symbol (not a tsgo Symbol with prototype shims), and stays
identity-stable across queries.

Architecture
------------
tsgo remains the source of truth for AST and Type. Symbol resolution
splits by query level:

  Layer A (variable refs, declaration names, in-file specifiers,
           in-file type refs)
    → ts.createSourceFile + ts.bindSourceFile + scope walker
    → in-process, ~0.36ms/file, real ts.Symbol

  Layer C (property names on imported types, lib globals, and any
           cross-file resolution)
    → fall back to tsgo's checker via existing IPC path

The fallback fires when the JS-side scope walker returns undefined for
a given identifier. No upfront classification is needed: a miss from
the lookup is itself the signal.

Implementation
--------------
- `lib/real-ts.ts`: captures the genuine `typescript` module reference
  before the tsgo facade installs its `Module._resolveFilename` hook.
  The worker eagerly imports it at top level, so internal CLI code that
  needs real ts behaviour (parser/binder/scanner) bypasses the facade.

- `lib/tsgo-js-symbols.ts`: parse + bind a file via real ts, build a
  `(pos, end, kind)` position map for tsgo Node → JS Node lookup, and
  run a scope walk when resolving an Identifier. The kind remap
  translates tsgo's offset-shifted SyntaxKind values into ts's.

- `lib/tsgo-backend.ts`:
    - `prepareFile` under TSSLINT_JS_SYMBOLS=1 skips the tsgo IPC
      prepass and just binds the JS-side SF (lazy: scope walks happen
      per-query, not eagerly).
    - `getSymbolAtLocation` tries JS-side resolver first, falls
      through to tsgo on miss.
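A sketch of the position-map lookup with a name-based kind remap, under assumptions: nodes are reduced to a hypothetical `SpanNode` shape, and `buildKindRemap`, `buildPositionMap` and `findJsNode` are illustrative helpers, not the contents of `lib/tsgo-js-symbols.ts`. The remap pairs the two enums by member name; the `typeof` filter skips the reverse (value → name) entries a numeric TypeScript enum object carries. On a tsgo-only kind this sketch returns undefined rather than passing the raw value through.

```typescript
// Hypothetical node shape: a span plus a numeric SyntaxKind value.
interface SpanNode {
  pos: number;
  end: number;
  kind: number;
}

// Pair tsgo kind values with ts kind values by shared member name.
function buildKindRemap(
  tsgoKinds: Readonly<Record<string, string | number>>,
  tsKinds: Readonly<Record<string, string | number>>,
): Map<number, number> {
  const remap = new Map<number, number>();
  for (const name of Object.keys(tsgoKinds)) {
    const tsgoValue = tsgoKinds[name];
    const tsValue = Object.prototype.hasOwnProperty.call(tsKinds, name)
      ? tsKinds[name]
      : undefined;
    // Reverse enum entries map a number key to a string; skip them.
    if (typeof tsgoValue === "number" && typeof tsValue === "number") {
      remap.set(tsgoValue, tsValue);
    }
  }
  return remap;
}

// Index JS-side (real ts) nodes by "pos:end:kind"; first node seen wins.
function buildPositionMap(jsNodes: readonly SpanNode[]): Map<string, SpanNode> {
  const map = new Map<string, SpanNode>();
  for (const n of jsNodes) {
    const key = `${n.pos}:${n.end}:${n.kind}`;
    if (!map.has(key)) map.set(key, n);
  }
  return map;
}

// Translate a tsgo node to its JS-side twin; undefined on any miss.
function findJsNode(
  positionMap: Map<string, SpanNode>,
  remap: Map<number, number>,
  tsgoNode: SpanNode,
): SpanNode | undefined {
  const tsKind = remap.get(tsgoNode.kind);
  if (tsKind === undefined) return undefined; // kind with no ts counterpart
  return positionMap.get(`${tsgoNode.pos}:${tsgoNode.end}:${tsKind}`);
}
```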

Bench (Dify web/, ~5000 .tsx; median of 2 runs, --force):

  Mode                        Wall    Passed Errors Messages
  A. tsgo prepass on          16.8s   2982   1611   55
  B. tsgo no-prepass           8.0s   2982   1611   55
  C. tsgo JS-symbols (this)    9.0s   2983   1621   50
  D. ts.Program                9.5s   2961   1867   9

Bench (tsslint own repo, 7 projects with full ts-eslint; median 3):

  Mode                        Wall    Passed Messages
  A. tsgo prepass on          37.1s   29     396
  B. tsgo no-prepass          37.0s   27     407
  C. tsgo JS-symbols (this)   37.9s   29     251  ← -145 FPs
  D. ts.Program                1.6s   60     12

C trade-offs
------------
- vs A (default): much faster on Dify (-7.8s), same speed on tsslint,
  better correctness on both (more passed / fewer messages).
- vs B (no-prepass): +1s on Dify (the cost of an eager bind even when
  rules don't query symbols) and roughly tied on tsslint, but ~10 more
  true positives detected on Dify (1621 vs 1611 errors) and 156 fewer
  messages on the tsslint repo (251 vs 407): the JS-side scope walker
  answers queries on which B's lazy tsgo answers were imprecise.
- vs D (ts.Program): on Dify faster by 0.5s; on tsslint dramatically
  slower because 7 projects × 0.8s tsgo setup cost dominates a small
  codebase.

C is now the best-correctness tsgo mode. Defaulting OFF until the
trade-off is examined per-codebase.

Tests unchanged: tsgo-backend (12), program-host (17), cache-flow (50),
meta-frameworks (8) all pass.

Removes the original tsgo `getSymbolAtPosition` batched IPC prepass
entirely. Symbol resolution now defaults to:

  1. real-ts `bindSourceFile` per file (~0.36ms/file in-process)
  2. lazy JS scope walker on getSymbolAtLocation
  3. tsgo IPC fallback (position-based first, then node-based) only
     when JS resolver returns undefined
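The three-step order as a sketch, assuming hypothetical `JsResolverLike` and `TsgoSymbolIpcLike` interfaces (the method names mirror the text; the signatures are assumed). `AstNode.start` is taken to be the token start without leading trivia, so the position query lands inside the identifier.

```typescript
// Hypothetical shapes; the real adapter's node and symbol types differ.
interface AstNode {
  start: number; // token start, excluding leading trivia
  end: number;
}
interface Sym {
  name: string;
}

interface JsResolverLike {
  // Steps 1+2: in-process bind (cached per file) + lazy scope walk.
  resolve(fileName: string, node: AstNode): Sym | undefined;
}

interface TsgoSymbolIpcLike {
  getSymbolAtPosition(fileName: string, position: number): Sym | undefined;
  getSymbolAtLocation(node: AstNode): Sym | undefined;
}

function resolveSymbol(
  js: JsResolverLike,
  tsgo: TsgoSymbolIpcLike,
  fileName: string,
  node: AstNode,
): Sym | undefined {
  // Layer A: answered in-process, no IPC.
  const local = js.resolve(fileName, node);
  if (local !== undefined) return local;
  // Step 3 (cross-file refs, lib globals, imported property names):
  // position-based IPC first, node-based IPC as the last resort.
  return (
    tsgo.getSymbolAtPosition(fileName, node.start) ??
    tsgo.getSymbolAtLocation(node)
  );
}
```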

Architecture rationale (validated by Dify-scale measurement):
  - Symbol is binder output, not checker output. tsgo's binder runs at
    `updateSnapshot` on the Go side; every getSymbolAtPosition was a
    cross-process round-trip serializing pre-computed binder data.
  - real ts in-process binder + scope walker gives the same answer for
    Layer A (variable refs, declarations, in-file specifiers, type
    refs). Returns real ts.Symbol with stable identity — no
    prototype-shim wrapper needed.
  - Layer C (property names on imported types, lib globals, anything
    cross-file) falls through to tsgo's checker, position-based first
    to recover the prepass's specifier coverage.

Removed:
  - TSSLINT_NO_PREPASS env (moot now: there is no batched prepass left
    to disable)
  - TSSLINT_JS_SYMBOLS env (now the only path)
  - 50 LOC of position+location batched prepass + fallback code in
    `prepareFile`
  - `idKind` capture in `createTsgoBackend` (prepareFile no longer
    walks the SF to collect identifiers)
  - `_prepareWalk` / `_prepareBatchSym` / `_prepareFallbackSym` timing
    counters (only `_prepareGetSF` and new `_prepareBind` remain)

Numbers (median of 3, --force):

  Dify web/ (~5000 .tsx, single rule):
    before this PR (prepass-on)   17.0s   2982 passed · 1611 errors · 55 msg
    after  this PR (JS-symbols)    9.2s   2983 passed · 1621 errors · 50 msg
    ts.Program baseline           10.4s   2961 passed · 1867 errors ·  9 msg

  tsslint own repo (7 projects, full ts-eslint):
    before this PR (prepass-on)   37.1s   29 passed · 396 messages
    after  this PR (JS-symbols)   37.6s   29 passed · 247 messages

Wins on both correctness and speed at Dify scale; on tsslint repo wins
on FP count (-149 messages) at parity speed. The 13% Layer A recall gap
the JS walker has against whole-program ts (mainly globals) is covered
by the tsgo IPC fallback in wrapChecker, so end-to-end recall through
the adapter matches the previous prepass.

All test suites pass: tsgo-backend (12), program-host (17),
cache-flow (50), meta-frameworks (8).

… kind fallback

Closes the three tsslint-side gaps the JS-symbols spike left open.

(1) Memory lifetime — per-backend caches
The bound-SourceFile + position-map caches in `tsgo-js-symbols.ts`
moved from module-level singletons into the closure of
`createJsSymbolResolver`. Each `createTsgoBackend` call constructs its
own resolver and registers it via `jsSymbolResolverRef.current`;
`backend.close()` calls `resolver.clear()` and unregisters. Multi-
project worker setups no longer share stale binds across snapshots,
and long-running CLI invocations don't accumulate cached ASTs across
projects.

(2) --fix invalidation
`getJsSourceFile` now compares the cached SF's `.text` against the
incoming text on every call. On mismatch (post-`--fix` rewrite), the
old SF and its position maps are dropped before re-binding. The
backend exposes a public `invalidateFile(fileName)` API; the worker's
`--fix` path calls it right after stashing the rewritten text in
`fileTextOverrides`, so the next `prepareFile` rebinds against the
post-fix content. Without this, scope queries on edited files would
return symbols from stale declarations.
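A sketch combining (1) and (2): the cache lives in a per-resolver closure, and a cached bind is reused only when its text matches the incoming text. `createJsSymbolResolverSketch`, `BoundFile`, `size` and the `bind` callback are hypothetical; the real `createJsSymbolResolver` and `getJsSourceFile` signatures may differ.

```typescript
// Hypothetical stand-in for a bound SourceFile plus its position maps.
interface BoundFile {
  text: string;
  positionMap: Map<string, unknown>;
}

interface JsSymbolResolverSketch {
  getJsSourceFile(fileName: string, text: string): BoundFile;
  invalidateFile(fileName: string): void;
  clear(): void;
  readonly size: number;
}

function createJsSymbolResolverSketch(
  bind: (fileName: string, text: string) => BoundFile,
): JsSymbolResolverSketch {
  // One cache per resolver (i.e. per backend), not a module-level singleton.
  const bound = new Map<string, BoundFile>();
  return {
    getJsSourceFile(fileName, text) {
      const cached = bound.get(fileName);
      // Reuse only when the text is unchanged; a --fix rewrite forces a rebind.
      if (cached !== undefined && cached.text === text) return cached;
      const fresh = bind(fileName, text);
      bound.set(fileName, fresh); // replaces any stale entry
      return fresh;
    },
    invalidateFile(fileName) {
      bound.delete(fileName);
    },
    clear() {
      bound.clear(); // called from backend.close()
    },
    get size() {
      return bound.size;
    },
  };
}
```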

(3) Kind remap robustness
`tsgo SyntaxKind` and `ts SyntaxKind` share ~98% of their member names,
but a handful diverge (tsgo-only `JSImportDeclaration`, `JSTypeAliasDeclaration`,
etc.). The previous remap returned the unmapped tsgo value as-is,
making position-key lookups silently miss. Added a parallel
position-only map (pos→first-node-at-span) consulted when the kind
key misses, so resolution falls through to "best-effort node at this
span" instead of "no answer at all". Recall on Dify unchanged
(2983/1621/50) — the affected node kinds happen to not be Identifier-
positioned in the rules under test, but the safety net is in place.
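A sketch of the two-tier lookup, assuming a hypothetical `SpanNode` shape and `createSpanIndex` helper. The text describes the fallback as pos → first-node-at-span; this sketch keys it by the full `pos:end` span, which is an assumption.

```typescript
// Hypothetical node shape: span plus ts SyntaxKind value.
interface SpanNode {
  pos: number;
  end: number;
  kind: number;
}

function createSpanIndex(jsNodes: readonly SpanNode[]) {
  const byKind = new Map<string, SpanNode>(); // "pos:end:kind" -> node
  const bySpan = new Map<string, SpanNode>(); // "pos:end" -> first node seen
  for (const n of jsNodes) {
    const kindKey = `${n.pos}:${n.end}:${n.kind}`;
    const spanKey = `${n.pos}:${n.end}`;
    if (!byKind.has(kindKey)) byKind.set(kindKey, n);
    if (!bySpan.has(spanKey)) bySpan.set(spanKey, n);
  }
  return {
    // tsKind is undefined when the tsgo kind had no ts counterpart.
    lookup(pos: number, end: number, tsKind: number | undefined): SpanNode | undefined {
      if (tsKind !== undefined) {
        const exact = byKind.get(`${pos}:${end}:${tsKind}`);
        if (exact !== undefined) return exact;
      }
      // Kind key missed: best-effort node at this span instead of no answer.
      return bySpan.get(`${pos}:${end}`);
    },
  };
}
```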

Test additions
- New test 6 in `tsgo-backend.test.ts`: invalidate + re-prepare on
  unchanged file, verify identifier still resolves and returns
  equivalent symbol. Exercises the change-detection short-circuit
  and the post-invalidate rebind path.

Bench (3 runs, --force):
  Dify web/ tsgo default          9.2-9.6s   2983/1621/50  (unchanged)
  Dify web/ tsgo --fix            9.4s       2983/1621/50  (no JS errors)

Tests: tsgo-backend 15/15 (was 12), program-host 17/17, cache-flow
50/50, meta-frameworks 8/8.

The resolver was accumulating every linted file's bound SF + position
maps in Node memory until backend.close(). For Dify (5000 files × ~30KB
bound SF + position maps), that pinned ~520 MB unnecessarily.

After lint() returns for a file, the JS-side bind serves no further
purpose — symbols for that file's identifiers have already been
queried, and rule code from later files queries against THEIR own
bound SFs. The bound SF can be released for GC.

  TsgoBackend.releaseFile(name)
    Drops the bound SourceFile + position maps from the per-backend
    JsSymbolResolver. Distinct from invalidateFile (which is for
    --fix rewrites and re-binds against new text); release simply
    discards because we're done.

  worker.lint(fileName, ...) → returns diagnostics
    Calls tsgoBackend?.releaseFile(fileName) just before return.
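A sketch of the release hook, with hypothetical `BackendLike`, `Diagnostic` and `lintWithRelease` names. It releases in a `finally` block so the bind is also dropped when a rule throws; the text only says the worker calls `releaseFile` just before returning, so that part is a design choice of this sketch.

```typescript
// Hypothetical shapes for the backend handle and lint output.
interface BackendLike {
  releaseFile(fileName: string): void;
}
type Diagnostic = { message: string };

function lintWithRelease(
  backend: BackendLike | undefined,
  fileName: string,
  runRules: (fileName: string) => Diagnostic[],
): Diagnostic[] {
  try {
    return runRules(fileName);
  } finally {
    // This file's bind serves no further purpose once its rules have run;
    // later files query their own bound SFs.
    backend?.releaseFile(fileName);
  }
}
```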

Memory bench (Dify web/, ~5000 .tsx, --force):

                     Node RSS    Go subprocess    Total
  ts.Program          2.27 GB    —                2.27 GB
  tsgo before         2.68 GB    1.59 GB          4.27 GB  (1.9×)
  tsgo after          2.16 GB    1.42 GB          3.58 GB  (1.58×)

Node-side dropped 524 MB; tsgo Node now smaller than ts.Program.
Total still higher than ts.Program (Go subprocess holds AST+types
independently), but the Node-side fat from accumulating bound SFs is
gone.

Multi-project scenario (tsslint own repo, 7 projects) NOT helped by
this — there the accumulation is in the tsgo client SourceFileCache
across snapshot boundaries, not in our per-file bind. That's an
upstream `@typescript/native-preview` cache lifecycle issue.

Speed unchanged (Dify 8.9-9.7s, 2983/1621/50). Tests still 15/15.
@johnsoncodehk force-pushed the master branch 2 times, most recently from e98724a to 1b3b43e (May 7, 2026 08:57)