feat(webapp): consolidate auth path + add comprehensive auth tests#3499
Conversation
🦋 Changeset detectedLatest commit: 0a2094d The changes in this PR will be included in the next version bump. This PR includes changesets to release 31 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds RBAC plugin and fallback, RBAC contracts and plugin surface, rewrites API and realtime route authorization to typed {type,id}/anyResource/everyResource forms, centralizes auth in updated API/dashboard route builders, exposes dashboardLoader/dashboardAction client shim, wires RBAC into webapp services/models/UI (roles page, team RolePicker, invite/token role fields), unifies AuthenticatedEnvironment types across packages, extracts scheduleEmail, adds DB migration for invite role id, and provides comprehensive Vitest e2e suites with shared test server/helpers plus a CI workflow. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
|
Five real-bug fixes from CodeRabbit + Devin review of #3499: #1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string was 'RBAC fallback not installed' but the OSS fallback actually returns 'RBAC plugin not installed'. The mismatch made every PAT creation with a roleId hit the compensating-delete branch on self-hosters with no plugin installed — including the dashboard PAT-creation flow. Aligns the constant with the canonical string. #2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped the revoked-API-key grace window (RevokedApiKey table), so a freshly-rotated env API key would 401 immediately on the new auth path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey logic so the auth-cross-cutting e2e tests pass. #3 api.v1.query.ts: multi-table queries built a plain RbacResource array, which checkAuth treats as 'any element passes'. A JWT scoped to one detected table could submit a query that also reads another table it isn't scoped to. Wrap with everyResource — same AND-semantics fix as the batch trigger routes. #4 account.tokens/route.tsx: defaultRoleId could land on a custom or plan-blocked role when userRoleId wasn't in the picker's assignable set. The action's submit-revalidation would then 400 until the user manually changed the dropdown. Clamp the default to roles the picker actually renders. #5 settings.team/route.tsx: the role Select used defaultValue, so a failed set-role submit left the attempted role visible while the server kept the old one. Switch to a controlled value bound to currentRoleId.
Five real-bug fixes from CodeRabbit + Devin review of #3499: #1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string was 'RBAC fallback not installed' but the OSS fallback actually returns 'RBAC plugin not installed'. The mismatch made every PAT creation with a roleId hit the compensating-delete branch on self-hosters with no plugin installed — including the dashboard PAT-creation flow. Aligns the constant with the canonical string. #2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped the revoked-API-key grace window (RevokedApiKey table), so a freshly-rotated env API key would 401 immediately on the new auth path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey logic so the auth-cross-cutting e2e tests pass. #3 api.v1.query.ts: multi-table queries built a plain RbacResource array, which checkAuth treats as 'any element passes'. A JWT scoped to one detected table could submit a query that also reads another table it isn't scoped to. Wrap with everyResource — same AND-semantics fix as the batch trigger routes. #4 account.tokens/route.tsx: defaultRoleId could land on a custom or plan-blocked role when userRoleId wasn't in the picker's assignable set. The action's submit-revalidation would then 400 until the user manually changed the dropdown. Clamp the default to roles the picker actually renders. #5 settings.team/route.tsx: the role Select used defaultValue, so a failed set-role submit left the attempted role visible while the server kept the old one. Switch to a controlled value bound to currentRoleId.
Five real-bug fixes from CodeRabbit + Devin review of #3499: #1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string was 'RBAC fallback not installed' but the OSS fallback actually returns 'RBAC plugin not installed'. The mismatch made every PAT creation with a roleId hit the compensating-delete branch on self-hosters with no plugin installed — including the dashboard PAT-creation flow. Aligns the constant with the canonical string. #2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped the revoked-API-key grace window (RevokedApiKey table), so a freshly-rotated env API key would 401 immediately on the new auth path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey logic so the auth-cross-cutting e2e tests pass. #3 api.v1.query.ts: multi-table queries built a plain RbacResource array, which checkAuth treats as 'any element passes'. A JWT scoped to one detected table could submit a query that also reads another table it isn't scoped to. Wrap with everyResource — same AND-semantics fix as the batch trigger routes. #4 account.tokens/route.tsx: defaultRoleId could land on a custom or plan-blocked role when userRoleId wasn't in the picker's assignable set. The action's submit-revalidation would then 400 until the user manually changed the dropdown. Clamp the default to roles the picker actually renders. #5 settings.team/route.tsx: the role Select used defaultValue, so a failed set-role submit left the attempted role visible while the server kept the old one. Switch to a controlled value bound to currentRoleId.
zizmor flagged on PR #3499: - L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the rest of the repo). - L74 credential persistence → add `persist-credentials: false`. This job doesn't push, so leaving GITHUB_TOKEN in .git/config has no upside and a leak path through any subsequent step. - L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows; this job runs on ubuntu-latest, no buildjet runner needed). - L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…orts fix Four issues flagged on PR #3499: 1. (apps/webapp/app/models/runtimeEnvironment.server.ts) toAuthenticated() was documented as coercing Prisma's Decimal `concurrencyLimitBurstFactor` to a number but actually passed it through unchanged. Slim AE accepts `number | DecimalLike`, so it compiled, but at runtime any consumer doing arithmetic would get NaN. Now calls `.toNumber()` explicitly. 2. (internal-packages/rbac/src/fallback.ts) Same issue in the fallback's toAuthenticatedEnvironment — was a typed identity function. Now coerces the burst factor to number via the same union-narrowing pattern used by the EnvironmentQueuePresenter. 3 + 4. (internal-packages/rbac/src/fallback.ts, apps/webapp/app/services/rbac.server.ts) The fallback was using the primary PrismaClient for read-only auth-path queries (env lookup, revoked-key grace, session user lookup, PAT lookup). Pre-PR, findEnvironmentByApiKey used $replica for these. Shifting auth-path reads from replica to primary is a real perf regression on high-traffic deployments. RoleBaseAccessFallback now accepts either a single PrismaClient or `{ primary, replica }`. The webapp passes both — `prisma` for writes (role mutations) and `$replica` for the read-only authenticate* paths. The fallback's role-mutation methods are no-op stubs in the plugin-not-installed case, so writes only need the primary nominally; the type kept for symmetry. 5. (packages/core/package.json) Two new subpath exports added in this PR — `v3/auth/environment` and `v3/utils/gitBranch` — were missing from the `typesVersions` block. attw flagged "resolution failed" on both, failing the `check-exports` CI job. Added the entries. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
22345f6 to
e9340b2
Compare
Five real-bug fixes from CodeRabbit + Devin review of #3499: #1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string was 'RBAC fallback not installed' but the OSS fallback actually returns 'RBAC plugin not installed'. The mismatch made every PAT creation with a roleId hit the compensating-delete branch on self-hosters with no plugin installed — including the dashboard PAT-creation flow. Aligns the constant with the canonical string. #2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped the revoked-API-key grace window (RevokedApiKey table), so a freshly-rotated env API key would 401 immediately on the new auth path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey logic so the auth-cross-cutting e2e tests pass. #3 api.v1.query.ts: multi-table queries built a plain RbacResource array, which checkAuth treats as 'any element passes'. A JWT scoped to one detected table could submit a query that also reads another table it isn't scoped to. Wrap with everyResource — same AND-semantics fix as the batch trigger routes. #4 account.tokens/route.tsx: defaultRoleId could land on a custom or plan-blocked role when userRoleId wasn't in the picker's assignable set. The action's submit-revalidation would then 400 until the user manually changed the dropdown. Clamp the default to roles the picker actually renders. #5 settings.team/route.tsx: the role Select used defaultValue, so a failed set-role submit left the attempted role visible while the server kept the old one. Switch to a controlled value bound to currentRoleId.
zizmor flagged on PR #3499: - L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the rest of the repo). - L74 credential persistence → add `persist-credentials: false`. This job doesn't push, so leaving GITHUB_TOKEN in .git/config has no upside and a leak path through any subsequent step. - L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows; this job runs on ubuntu-latest, no buildjet runner needed). - L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…orts fix Four issues flagged on PR #3499: 1. (apps/webapp/app/models/runtimeEnvironment.server.ts) toAuthenticated() was documented as coercing Prisma's Decimal `concurrencyLimitBurstFactor` to a number but actually passed it through unchanged. Slim AE accepts `number | DecimalLike`, so it compiled, but at runtime any consumer doing arithmetic would get NaN. Now calls `.toNumber()` explicitly. 2. (internal-packages/rbac/src/fallback.ts) Same issue in the fallback's toAuthenticatedEnvironment — was a typed identity function. Now coerces the burst factor to number via the same union-narrowing pattern used by the EnvironmentQueuePresenter. 3 + 4. (internal-packages/rbac/src/fallback.ts, apps/webapp/app/services/rbac.server.ts) The fallback was using the primary PrismaClient for read-only auth-path queries (env lookup, revoked-key grace, session user lookup, PAT lookup). Pre-PR, findEnvironmentByApiKey used $replica for these. Shifting auth-path reads from replica to primary is a real perf regression on high-traffic deployments. RoleBaseAccessFallback now accepts either a single PrismaClient or `{ primary, replica }`. The webapp passes both — `prisma` for writes (role mutations) and `$replica` for the read-only authenticate* paths. The fallback's role-mutation methods are no-op stubs in the plugin-not-installed case, so writes only need the primary nominally; the type kept for symmetry. 5. (packages/core/package.json) Two new subpath exports added in this PR — `v3/auth/environment` and `v3/utils/gitBranch` — were missing from the `typesVersions` block. attw flagged "resolution failed" on both, failing the `check-exports` CI job. Added the entries. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Five real-bug fixes from CodeRabbit + Devin review of #3499: #1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string was 'RBAC fallback not installed' but the OSS fallback actually returns 'RBAC plugin not installed'. The mismatch made every PAT creation with a roleId hit the compensating-delete branch on self-hosters with no plugin installed — including the dashboard PAT-creation flow. Aligns the constant with the canonical string. #2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped the revoked-API-key grace window (RevokedApiKey table), so a freshly-rotated env API key would 401 immediately on the new auth path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey logic so the auth-cross-cutting e2e tests pass. #3 api.v1.query.ts: multi-table queries built a plain RbacResource array, which checkAuth treats as 'any element passes'. A JWT scoped to one detected table could submit a query that also reads another table it isn't scoped to. Wrap with everyResource — same AND-semantics fix as the batch trigger routes. #4 account.tokens/route.tsx: defaultRoleId could land on a custom or plan-blocked role when userRoleId wasn't in the picker's assignable set. The action's submit-revalidation would then 400 until the user manually changed the dropdown. Clamp the default to roles the picker actually renders. #5 settings.team/route.tsx: the role Select used defaultValue, so a failed set-role submit left the attempted role visible while the server kept the old one. Switch to a controlled value bound to currentRoleId.
zizmor flagged on PR #3499: - L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the rest of the repo). - L74 credential persistence → add `persist-credentials: false`. This job doesn't push, so leaving GITHUB_TOKEN in .git/config has no upside and a leak path through any subsequent step. - L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows; this job runs on ubuntu-latest, no buildjet runner needed). - L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
#1 internal-packages/rbac/src/ability.ts (severity: 🔴 silent privilege escalation): buildJwtAbility was treating any scope starting with `admin:` as a universal wildcard. Pre-RBAC, the legacy checkAuthorization string-matched superScopes — `admin:sessions` only granted access to routes that explicitly listed it. After the JWT- ability split, the same scope was returning true for every action on every resource. Restrict the bypass to bare `admin` (no second segment); `admin:<type>` now flows through normal matching as action="admin" against resources of that type. Adds 2 regression tests in ability.test.ts. #2 apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (status discard): authenticateRequestForApiBuilder hardcoded `status: 401` even though BearerAuthResult.status is `401 | 403`. A plugin returning 403 (e.g. suspended account, IP block) would silently get downgraded to 401 — semantically wrong (401 = "who are you?", 403 = "you're not allowed") and confusing for client retry logic. Plumb result.status through. #3 apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (everyResource([]) vacuous truth): [].every() returns true, so everyResource([]) was passing auth for any token. Not exploitable today (Zod rejects empty bodies before auth), but the auth layer should never grant on empty input. Same defensive guard added to anyResource() for symmetry — only PERMISSIVE_ABILITY would have granted there, but the pattern shouldn't depend on the ability's choice. #4 internal-packages/rbac/src/fallback.ts (PREVIEW env regression): the fallback's authenticateBearer looked up environments by apiKey only, skipping the branch-aware resolution that findEnvironmentByApiKey does for PREVIEW envs. Self-hosters using preview/branch envs would either fail or operate against the parent env. Mirror the legacy path: read x-trigger-branch, include matching child env, and pivot the resolved env to the child (apiKey/orgMember/organization/project inherited from parent). sanitizeBranchName inlined here because internal-packages can't import webapp code; comment notes the duplication. All four flagged by Devin's PR review. Cloud plugin's buildJwtAbility gets the same #1 fix in a sibling commit on this PR's cloud branch. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…core/v3/utils/gitBranch Both helpers were originally in apps/webapp/app/v3/gitBranch.ts. internal-packages/rbac needed sanitizeBranchName for the PREVIEW-env branch resolution added in 8246234, and copy-pasting the function into the fallback was a smell. Moves the canonical home into core (no internal-package can import webapp code), and updates the four webapp call sites + the rbac fallback to import from @trigger.dev/core/v3/utils/gitBranch. `sanitizeBranchName`'s input type is widened slightly to `string | null | undefined` so callers passing `Headers.get(...)` (which returns `string | null`) don't need a `?? undefined` workaround. The existing webapp callers all pass `string | undefined`; the new union is backwards-compatible. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…minate redundant findEnvironmentById Plugins now return the full env shape they fetched. The apiBuilder bridge passes it straight through to handlers — no follow-up findEnvironmentById round-trip. @trigger.dev/core/v3/auth/environment (new) defines the slim AuthenticatedEnvironment structurally, independent of @trigger.dev/database. @trigger.dev/plugins re-exports it plus sanitizeBranchName so cloud's workspace-linked plugin imports from one surface. Internal-packages/rbac/src/fallback.ts drops toRbacEnvironment stripping. apiBuilder.server.ts bridge drops findEnvironmentById. Model finders coerce to slim shape. Various downstream signature updates accept slim AE. Net: single DB call per API request. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
zizmor flagged on PR #3499: - L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the rest of the repo). - L74 credential persistence → add `persist-credentials: false`. This job doesn't push, so leaving GITHUB_TOKEN in .git/config has no upside and a leak path through any subsequent step. - L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows; this job runs on ubuntu-latest, no buildjet runner needed). - L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…orts fix Four issues flagged on PR #3499: 1. (apps/webapp/app/models/runtimeEnvironment.server.ts) toAuthenticated() was documented as coercing Prisma's Decimal `concurrencyLimitBurstFactor` to a number but actually passed it through unchanged. Slim AE accepts `number | DecimalLike`, so it compiled, but at runtime any consumer doing arithmetic would get NaN. Now calls `.toNumber()` explicitly. 2. (internal-packages/rbac/src/fallback.ts) Same issue in the fallback's toAuthenticatedEnvironment — was a typed identity function. Now coerces the burst factor to number via the same union-narrowing pattern used by the EnvironmentQueuePresenter. 3 + 4. (internal-packages/rbac/src/fallback.ts, apps/webapp/app/services/rbac.server.ts) The fallback was using the primary PrismaClient for read-only auth-path queries (env lookup, revoked-key grace, session user lookup, PAT lookup). Pre-PR, findEnvironmentByApiKey used $replica for these. Shifting auth-path reads from replica to primary is a real perf regression on high-traffic deployments. RoleBaseAccessFallback now accepts either a single PrismaClient or `{ primary, replica }`. The webapp passes both — `prisma` for writes (role mutations) and `$replica` for the read-only authenticate* paths. The fallback's role-mutation methods are no-op stubs in the plugin-not-installed case, so writes only need the primary nominally; the type kept for symmetry. 5. (packages/core/package.json) Two new subpath exports added in this PR — `v3/auth/environment` and `v3/utils/gitBranch` — were missing from the `typesVersions` block. attw flagged "resolution failed" on both, failing the `check-exports` CI job. Added the entries. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Webapp's vitest config sets `pool: "forks"` and the test script uses
`--no-file-parallelism`, so every test file in a shard executes
sequentially in the same forked Node process. globalThis persists
across files even though vitest clears the module cache between them.
Two places call `provider.register()`:
- `~/v3/tracer.server.ts` via the `setupTelemetry` singleton — sets the
OTel global APIs (trace/context/propagation).
- `test/utils/tracing.ts#createInMemoryTracing()` — also calls
`provider.register()` to make `trace.getTracer("test-tracer")`
return its in-memory provider's tracer.
Once the webapp's tracer.server.ts has been loaded by any test in the
shard, the globals are set. The next test that calls
createInMemoryTracing fails with `@opentelemetry/api: Attempted
duplicate registration of API: trace/context/propagation`. Failed test
stack pointed at `runsBackfiller.test.ts:30` calling
`createInMemoryTracing()` after triggerTask.test.ts (which loads
tracer.server.ts via `~/runEngine/services/triggerTask.server`) ran
first in shard 6.
The same shape failure cascades through the rest of the shard's tests
because the consumer keeps trying and tracer.server.ts can't be
re-singleton'd cleanly after the cache eviction.
Why now: this PR's slim-AuthenticatedEnvironment refactor changed the
import graph so that triggerTask test's transitive imports actually
reach tracer.server.ts in CI's shard 6 distribution. It worked before
because the shard's tests didn't co-load both register() callers. But
the latent issue was always there — fixing the test util is the right
layer.
Fix: drop `provider.register()` from createInMemoryTracing. The
exporter still receives spans because the consumer uses the tracer
returned from `provider.getTracer(...)` directly (provider-scoped, not
global). Tests that genuinely need the global must register their own
provider explicitly and clean up — but none of the current callers
do.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…t") that collides with OTel auto-instrumentation
The previous fix in this PR (lazy-load `expect` inside assertNonNullable
via require("vitest")) was intended to keep the module importable from
contexts where vitest isn't initialized at module-load time (a vitest
globalSetup, where the worker doesn't exist yet).
But require() inside the function body collides with OTel's
`@opentelemetry/instrumentation`, which uses `require-in-the-middle`
to hook every Node `require()` call. vitest is ESM-only, so once OTel's
been touched in the same process, the next `require("vitest")` call
throws "Vitest cannot be imported in a CommonJS module using require()".
That instrumentation runs as soon as the run-engine — or any code that
loads its OTel-traced internals — is imported. That's:
- Every run-engine internal test (run-queue, heartbeats, pendingVersion)
uses assertNonNullable.
- Every webapp test that transitively reaches RunEngine (triggerTask,
runsBackfiller, plus any test that imports services using engine
internals) does too.
Each shard ran ~7/8 failing because some early test loaded run-engine,
require-in-the-middle armed, then the next assertNonNullable call
exploded — cascading the rest of the shard's tests via vitest's
fail-fast on uncaught errors in the same process.
Fix: replace the vitest.expect call with a plain throw. Vitest still
gets a useful failure (the message shows in the stack trace) without
the require() hazard. The test that PR #3438 originally needed this
for (a globalSetup that imported assertNonNullable before workers
existed) still works — there's no top-level vitest import any more.
Verified: triggerTask.test.ts (which previously failed locally with
the require-in-the-middle error) now passes 8/8.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…okie helper
Background. The RBAC plugin contract took a `helpers.getSessionUserId(request)`
callback at create time. The OSS rbac.server module wired it up by statically
importing `getUserId` from `~/services/session.server`, which transitively
loaded the entire remix-auth pipeline:
rbac.server.ts
→ session.server.ts → auth.server.ts
→ emailAuth.server.tsx (throws on missing MAGIC_LINK_SECRET at module load)
→ email.server.ts → commonWorker.server.ts (V1 services)
→ marqs/index.server.ts and taskRunConcurrencyTracker.server.ts
(throw on missing REDIS_HOST/REDIS_PORT at module load)
Any module that pulled `rbac` in — including PAT-only call sites that have
no session-cookie path at all — therefore inherited a hard dependency on
the entire dashboard auth chain and the V1 queue. Webapp tests that mock
`~/env.server` to a stripped object (e.g. personalAccessToken.test.ts) hit
the module-load throws before their assertions ran.
Contract change (packages/plugins/src/rbac.ts).
- `authenticateSession` and `authenticateAuthorizeSession` now take
`userId: string | null` in their `context` argument.
- `RoleBasedAccessControlPlugin.create()` no longer takes a helpers arg.
- The plugin treats `userId === null` as "no authenticated user"
(same shape `helpers.getSessionUserId === null` returned).
OSS fallback (internal-packages/rbac/src/fallback.ts) reads
`context.userId` directly. LazyController (internal-packages/rbac/src/index.ts)
drops the helpers parameter through.
Webapp (apps/webapp/app/services/rbac.server.ts) drops the session.server
import entirely. dashboardBuilder.server.ts — the only `authenticateSession`
caller — resolves userId via session.server itself before calling rbac.
Side change (apps/webapp/app/services/scheduleEmail.server.ts new file).
Moved `scheduleEmail` out of email.server.ts to break a parallel chain:
email.server.ts → commonWorker → V1 services → marqs. email.server is
now just the SMTP/Resend client, as it should have been. Three caller
files updated: routes/invite-resend.tsx, routes/_app.orgs.$organizationSlug.invite/route.tsx,
services/mfa/multiFactorAuthentication.server.ts.
Verified locally with `REDIS_HOST="" REDIS_PORT="" pnpm vitest run
test/services/personalAccessToken.test.ts test/engine/triggerTask.test.ts`
— both files pass (11/11) with the env mocked stripped, which was the
CI failure mode on shards 1/2/4/5/6/7. Webapp + plugins + rbac
typecheck clean.
Cloud-repo plugin update is in a coordinated branch (rbac-packages on
APIHero/cloud) — same contract change in
enterprise/plugins/src/rbac/{index,controller,controller.integration.test}.ts.
Changeset: minor for `@trigger.dev/plugins` (contract change).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…n createInMemoryTracing The previous form (commit 133d468) skipped `provider.register()` to avoid "Attempted duplicate registration of API: trace" when another test in the same shard had already loaded `~/v3/tracer.server.ts` (which calls register() via its `singleton("opentelemetry", …)`). That fix went too far: tests like `sentryTraceContext.server.test.ts` exercise OTel's *global* API (`trace.getActiveSpan()` via `getActiveTraceIds`), which requires a registered global context manager. Without register(), `trace.getActiveSpan()` returned undefined — 6/9 assertions in that file flipped red on shard 1 of the latest CI run. Webapp's vitest config uses `pool: "forks"` with `--no-file-parallelism`, so all test files in a shard share one process. globalThis persists across files even though vitest clears the module cache between them. OTel's `setGlobal*` no-ops (logs an error, returns false) when a global is already set — so the second caller's register() is a no-op, leaving the globals pointed at the first caller's provider. Fix: call `trace.disable(); context.disable(); propagation.disable();` *before* `provider.register()`. The disables clear any previously-set globals (whether from `tracer.server.ts`'s singleton or a prior createInMemoryTracing call), so the test's provider always wins for both span recording (via SimpleSpanProcessor) AND the global API. Each createInMemoryTracing call rotates the globals; no leakage to the next test file. Verified locally: `pnpm vitest run test/sentryTraceContext.server.test.ts` 9/9 pass. Running it alongside test/engine/triggerTask.test.ts (which loads tracer.server.ts) reaches the test-setup phase normally — no more module-load OTel collision. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ticated mapper
CodeRabbit flagged: after switching the API-key branch to return the
slim AuthenticatedEnvironment shape (via runtimeEnvironment.server.ts's
`toAuthenticated()` + `authIncludeBase`), the PAT and OAT branches in
`authenticatedEnvironmentForAuthentication()` still ran their own
`runtimeEnvironment.findFirst` calls with a smaller include
(`{ project: true, organization: true }`) and returned the raw Prisma
row. The function's return type narrowed structurally, but the runtime
shape was auth-method-dependent — `concurrencyLimitBurstFactor` came
back as Prisma's `Decimal` for PAT/OAT but `number` for API-key, and
PAT/OAT lacked `orgMember` entirely while API-key had it.
Fix: export the shared `authIncludeBase` / `authIncludeWithParent`
include shapes and `toAuthenticated()` from runtimeEnvironment.server,
then use them in the PAT/OAT branches. PREVIEW envs still override
apiKey with the parent's apiKey before mapping. All four branches now
return the same normalized slim shape — Decimal coerced to number,
columns the slim type doesn't carry stripped, orgMember populated
consistently.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…hten team action authz
Two related findings reviewing the RBAC route migration.
1. api.v1.runs / realtime.v1.runs: type-level scope no longer granted unfiltered access
Pre-RBAC the resource was a plain object (`{ tasks: searchParams[...] }`
for api.v1.runs, the searchParams object itself for realtime.v1.runs)
and the legacy `checkAuthorization` iterated `Object.keys`. So a JWT
with type-level `read:tasks` (api.v1.runs) or `read:tags`
(realtime.v1.runs) — no id — granted access to the unfiltered list.
The new `anyResource([…])` model only matches resources we list,
and the post-migration list dropped the type-level alternative when
no filter was set, so those JWTs started 403'ing on unfiltered
listings. Add `{ type: "tasks" }` and `{ type: "tags" }` (no id)
alongside `{ type: "runs" }` so the legacy semantic is preserved
verbatim — per-id `read:tasks:foo` / `read:tags:bar` still grants
only when the filter contains the matching id.
2. team action: explicit per-intent authz gate
The team route's action handler accepts three intents (set-role,
purchase-seats, remove-member). set-role already gates on
`manage:members`, but purchase-seats and remove-member relied on
model-layer enforcement (SetSeatsAddOnService, removeTeamMember).
The loader gates on `read:members` (broader audience than
pre-RBAC's owner/admin-only loader), so any user reaching the page
could submit those forms — banking on defense in depth at the
model layer.
Fix:
- purchase-seats → require `manage:billing`
- remove-member → require `manage:members`, with self-leave
(target.userId === actor.userId) carved out as always allowed
- removeTeamMember model fn previously only verified the actor was
a member of the org — any org member could remove any other
member by id. The new route gate closes that hole.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…tring coupling
The PAT-creation flow needed to know "is a real RBAC plugin loaded, or
just the OSS fallback?" so it could skip the compensating-delete
rollback path when the fallback returns ok:false from setTokenRole
(no plugin → no role table to write to → that's not a failure).
Previous implementation stringly-typed it: the fallback returned
`{ ok: false, error: "RBAC plugin not installed" }`, and the webapp's
PAT module compared `roleResult.error === FALLBACK_NOT_INSTALLED_ERROR`
to detect the case. If the fallback's error string ever drifted, PAT
creation would blow up with a compensating-delete on every install
without an enterprise plugin.
Replace with an explicit capability method on the controller:
RoleBaseAccessController.isUsingPlugin(): Promise<boolean>
OSS fallback returns false; cloud plugin returns true. LazyController
delegates. The webapp gates the rbac.setTokenRole call on
`await rbac.isUsingPlugin()`, skipping it when the fallback is in
use. No string magic, decoupled from any specific error message,
and the capability is generally useful — UI surfaces that gate
role-pickers / admin-only sections on plugin presence can use the
same method.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
`buildJwtAbility` parsed scopes with `[a, b, c] = scope.split(":")`
which captures only the first three segments. A scope like
`read:tags:env:staging` (a tag id containing a colon) lost everything
after the second colon — `scopeId` became `"env"`, and the tag
`{ type: "tags", id: "env:staging" }` silently failed to match.
Fix: split into all parts, then take everything after the second
colon as the resource id (`parts.slice(2).join(":")`). Two-segment
scopes (`read:tags`) still produce `scopeId === undefined`,
preserving the type-level wildcard semantic.
Test coverage added for the multi-colon-id path. The bug was
pre-existing in the legacy `checkAuthorization` too — system-generated
ids (friendlyIds like `run_abc`) use underscores, so the issue only
surfaced for user-provided strings (tags), which made it easy to miss.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… dead authorization.server.ts Two related cleanups falling out of the RBAC migration. The `scopes?: string[]` field on `ApiAuthenticationResultSuccess` was populated from JWT claims by the legacy auth path and consumed only by `services/authorization.server.ts`'s `checkAuthorization`. The new apiBuilder bridge (which constructs the same result type) doesn't populate scopes, and every call site has migrated to the rbac ability model. Verified by grep: no handler reads `authentication.scopes` / `authenticationResult.scopes` anywhere outside the dead `authorization.server` module. A perpetually-undefined optional field is a footgun — future code might branch on it and silently misbehave. Drop it from the type and remove the two dead-write populations in `authenticateApiKey` and `authenticateApiKeyWithFailure`. `services/authorization.server.ts` (the file that defined `AuthorizationEntity`, `AuthorizationResources`, and the `checkAuthorization` function) is wholly dead — nothing in `apps/webapp/app` imports it any more. Delete it. The route comments that reference "the legacy `checkAuthorization`" are kept for historical context. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ter OSS empty-state + dedupe orgId lookup
Four targeted fixes off the RBAC PR audit.
1. Batch retrieve routes (api.v1, api.v2, realtime.v1) accept read:runs again
Pre-RBAC, these routes' superScopes included `read:runs`, so a JWT
minted with `read:runs` could read batches. The new strict scope-type
match dropped that — `read:runs` no longer trivially matches
`{type: "batch"}`. JWTs in the wild that batched on `read:runs`
started 403'ing. Wrap the resource in `anyResource([{type:"batch",
id}, {type:"runs"}])` to preserve the legacy semantic. Per-id and
type-level `read:batch` still grant via the first element.
2. Document the deprecated pk_* token cutover
The new RBAC fallback intentionally does NOT call
`findEnvironmentByPublicApiKey`. pk_* tokens haven't been issued for
years; public access is JWT-based now. Added explicit comments
alongside the bearer-auth path (fallback) and the deprecated model
helper (`findEnvironmentByPublicApiKey` jsdoc) so future contributors
understand the cutover is deliberate.
3. Roles page shows a deployment-aware empty state
Old empty state told OSS self-host users to "upgrade to Pro to
unlock RBAC" — wrong copy: there's no plan to upgrade to in a
self-hosted deployment. Thread `rbac.isUsingPlugin()` through the
loader and branch the `EmptyState` component: plugin loaded →
plan-upsell copy (existing behaviour); plugin absent → explain
that role infrastructure isn't part of the self-hosted edition.
Also gate the `CreateRoleUpsell` button on `isUsingPlugin` so OSS
doesn't see a dangling Enterprise-pitch dialog.
4. Drop the redundant org-by-slug lookup in purchase-seats
The team route's action handler's `purchase-seats` branch
re-resolved the org by slug instead of reading the orgId the
dashboardBuilder's context callback had already cached.
`context.organizationId` is the source of truth on both the OSS
fallback and cloud plugin paths (the plugin takes orgId as input,
doesn't re-resolve), so one resolution per request is enough.
Other branches in the same handler already read from context;
purchase-seats was the lone outlier.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
"Roles aren't available in this self-hosted deployment. All members have full access. Role-Based Access Controls are available in Trigger.dev Cloud or with an enterprise self-hosted license." Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The previous apiBuilder PAT flow ran two PAT lookups per request: authenticateApiRequestWithPersonalAccessToken first (token validation + lastAccessedAt write), then rbac.authenticatePat (cap+floor join). Two DB roundtrips for what should be one — a regression vs pre-RBAC main's single findFirst + conditional updateMany. Option B from the audit: - plugin contract: `PatAuthResult` now includes `lastAccessedAt: Date | null` - OSS fallback selects + returns it - cloud plugin (paired PR): selects + returns it; Drizzle schema mirror gets the column webapp apiBuilder now calls rbac.authenticatePat as the sole PAT auth (empty ctx when no `context`/`authorization` declared — fallback validates fine), then fires `updateLastAccessedAtIfStale` once with the plugin-returned timestamp. `updateLastAccessedAtIfStale` is a new shared helper in `personalAccessToken.server.ts` that two-layer throttles the write: JS check elides the SQL entirely when the cached timestamp is fresh (< 5 min), and the SQL WHERE inside the conditional updateMany is race-safe for concurrent auths that both decide to fire. The legacy `authenticatePersonalAccessToken` is refactored to use the same helper — picks up the JS-side smart-skip for free (was always firing the updateMany regardless). Query count (apiBuilder PAT route, fresh cache): main: 1 findFirst + 1 conditional updateMany = 2 PR before: 1 findFirst + 1 plugin join + 1 updateMany = 3 this commit: 1 plugin auth + 0 (skipped, fresh) = 1 ✓ (stale-cache case still matches main at 2.) PERMISSIVE_ABILITY local sentinel removed — the plugin always returns a real ability (permissive in the OSS fallback case, computed in cloud). Co-located comment in checkAuth() generalised to mention the permissive ability rather than the removed constant. Coordinated with the cloud-side change in triggerdotdev/cloud#763 (rbac-packages branch). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
origin/main switched from `react-window-splitter` to `@window-splitter/react` (along with the usual cascade of transitive bumps). The rebase preserved our older lockfile commits, leaving the workspace unable to resolve the renamed package — every file using ResizablePanel failed typecheck. Standard lockfile-conflict resolution: take origin/main's pnpm-lock.yaml verbatim, run `pnpm install`, commit the reconciled result. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
99c629c to
704b072
Compare
…match backwards-compat fix Two CI failures on PR #3499: 1. Shard 7 — `test/authorization.test.ts` failed at module load with "Cannot find module '../app/services/authorization.server'". That module was deleted in 18e0701 (the entire `checkAuthorization` function went with the apiBuilder migration), but I missed the test file that exercised it — the earlier grep for live consumers only walked `app/`, not `test/`. The test was unit coverage for the now-deleted function; nothing else worth saving in it. Delete. 2. E2E auth — `JWT read:runs: 403 (resource type is 'batch', not 'runs')` was asserting the strict-match behaviour that 9f987ae deliberately reverted for backwards compat (batch retrieve routes wrap `{type: "batch", id}` in anyResource([...]) alongside `{type: "runs"}` so JWTs minted with read:runs against batch endpoints keep working). The test caught my own intentional change. Update the expectation: `read:runs` JWT should now pass (not be 403), and rename the test to reflect the new behaviour with a comment pointing at the backwards-compat rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Before: the org-settings sidebar gated the "Roles" link on `isManagedCloud` — a deploy-config flag set by triggerdotdev's hosted-cloud env. That's wrong on both sides: - self-hosted enterprise installs (with the plugin loaded but isManagedCloud=false) wrongly HID the link - OSS self-hosted (no plugin) showed it via the page itself only with the empty-state copy After: gate on `rbac.isUsingPlugin()` — true on both managed cloud and self-hosted enterprise installs (anywhere the plugin actually loads), false on OSS where there's no role infrastructure to manage. Cleaner semantic, and uses the same capability flag the Roles page itself uses for its empty-state branching. Pluming: the org-settings layout loader (already runs server-side for every settings page) calls `await rbac.isUsingPlugin()` and threads the boolean into the sidebar component. The deployment-aware empty-state on the Roles page itself stays — direct URL navigation still reaches the route, so the page needs to handle that case for any auditor / engineer who knows the URL. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Consolidates the webapp's authentication and authorization into a small set of route helpers, replacing the ad-hoc
requireUser/requireUserId/authenticatedEnvironmentForAuthenticationcalls scattered across routes. Same security model, but the per-request flow (authenticate → authorize → load) now lives in one place per route family.Introduces a plugin seam (
@trigger.dev/plugins) that lets the cloud build install a richer RBAC implementation without touching webapp code. The OSS fallback keeps the pre-RBAC permissive behaviour intact, so self-hosted deployments work unchanged.Adds a comprehensive end-to-end auth test suite that didn't exist before — 193
it()blocks (vitest reports ~199 afterit.eachexpansion) covering API key, PAT and JWT auth across the public API surface, plus dashboard session auth for admin pages.Changes
Plugin contract —
@trigger.dev/pluginsRoleBaseAccessControllerinterface authoritative for both OSS (fallback) and cloud (enterprise plugin):authenticateBearer(request, { allowJWT? })— API-key / public-JWT auth, returns env + abilityauthenticateSession(request, { userId, organizationId?, projectId? })— dashboard auth, caller resolvesuserIdfrom the session cookie and passes it in (nohelpers.getSessionUserIdcallback — decouples the plugin host from session-cookie code)authenticatePat(request, { organizationId?, projectId? })— PAT auth, returns identity +lastAccessedAtso the host can throttle the per-request updateauthenticateAuthorize*variants for the auth-and-check-in-one-call casesisUsingPlugin(): Promise<boolean>— capability flag for UI / branching where plugin-present-ness matters; replaces the sentinel-string coupling that hadpersonalAccessToken.servermatching"RBAC plugin not installed"literallyDashboard auth (started, partial rollout)
Admin and settings pages migrated to a unified
dashboardLoader/dashboardActionhelper that authenticates the session, runs an authorization check, and exposes the result to the route. Other dashboard routes still on the old pattern; remaining migration tracked in TRI-8730.Migrated routes:
admin.*(14 admin / back-office / feature-flags / LLM-models / notifications / orgs / concurrency pages)_app.orgs.$organizationSlug.settings.team_app.orgs.$organizationSlug.settings.rolesAPI / realtime / engine auth (complete for the migrated families)
71 routes migrated to a unified
apiBuilderthat centralizes Bearer / PAT / Public-JWT authentication and applies the per-route authorization check before the handler runs. Includes:api.v1.*andapi.v2.*andapi.v3.*— tasks, runs, batches, queues, prompts, deployments, query, sessions, waitpoints, packets, workers, idempotency keysrealtime.v1.*— runs, batches, sessions, streamsengine.v1.*— dev / worker-action protocols29 routes still on the legacy
authenticateApiRequest*helpers — tracked as a post-deploy follow-up in TRI-9228.Multi-resource auth direction is now explicit at the call site via
anyResource(...)(OR) andeveryResource(...)(AND). Bare arrays no longer typecheck — fixes a class of bug where a JWT scoped to one resource could implicitly access others under OR semantics.PAT auth path consolidated: was three DB queries per request (legacy
authenticateApiRequestWithPersonalAccessTokenfindFirst +rbac.authenticatePatjoin +lastAccessedAtupdate). Now one query in the steady state — plugin returnslastAccessedAt, host smart-skips the update via JS-side throttle when fresh.Side effect: action aliases preserved historic JWT scope semantics where the new model is stricter (e.g. a
write:tasksJWT now also satisfiestrigger/batchTrigger/updateactions on the same resource — matched at the auth boundary, not in the route handler).Backwards-compat fixes
The strict-match model regressed several real-world JWT shapes. Each preserved via explicit
anyResource(...)entries in the route's authz block:api.v1.batches.$batchId,api.v2.*,realtime.v1.batches.*) acceptread:runsJWTs again (pre-RBAC literal-match superScope behaviour)api.v1.runs,realtime.v1.runs) accept type-levelread:tasks/read:tagson unfiltered queries (matched the legacyObject.keysiteration semantic)toAuthenticatedso all auth methods return the same slimAuthenticatedEnvironment(was: API-key returned the slim shape but PAT/OAT returned raw PrismaDecimal/ noorgMember):preservation in resource ids —read:tags:env:stagingnow correctly identifies the tag id asenv:staging, notenvSlim
AuthenticatedEnvironmentExtracted to
@trigger.dev/core/v3/auth/environment— a structural shape independent of@trigger.dev/database. The plugin contract returns this; webapp consumers import from there; the cloud plugin (Drizzle) returns the same shape without Prisma'sDecimalclass leaking into the public surface. Lets internal-packages (run-engine, etc.) refer toAuthenticatedEnvironmentwithout pulling Prisma in.Auth test suite (new —
*.e2e.full.test.ts)193 e2e tests run against a real spawned webapp + Postgres (no mocks). Coverage matrix:
Hygiene cleanups
app/services/authorization.server.ts(legacycheckAuthorization+ types — no live consumers post-migration) and its orphaned testscopesfield fromApiAuthenticationResultSuccessscheduleEmailmoved out ofemail.server.tsinto its own module — breaks acommonWorker → marqs/V1import chain that was poisoning the auth test graphrbac.isUsingPlugin()manage:billingfor purchase-seats,manage:membersfor set-role + remove-member with self-leave carve-out)Cross-repo coordination
All public-package contract changes paired in
triggerdotdev/cloud#763(rbac-packages branch) — the enterprise plugin implements the sameRoleBaseAccessControllerinterface against Drizzle.Test plan
pnpm run typecheck --filter webappcleanpnpm --filter webapp exec vitest run --config vitest.e2e.full.config.ts— 193/193 pass (requires Docker for testcontainers)