feat(rbac): end-to-end RBAC — Keycloak, agentgateway/MCP enforcement, audit, Slack/UI identity (Specs 098/102/103/104)#1257
… git workflow rules Add the Enterprise Identity Federation and User Impersonation architecture document to docs/docs/architecture/, covering OAuth 2.0 Token Exchange (RFC 8693), OBO delegation, Keycloak integration, and the full chain-of-trust design for CAIPE agents acting on behalf of authenticated users. Also update CLAUDE.md, .cursorrules, and .specify/.cursorrules to reflect the git worktree-based development workflow and the corrected branch naming convention: prebuild/<type>/<description> (e.g. prebuild/docs/enterprise-identity-federation). Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…ederation doc The document referenced 'Pattern 2' without defining other patterns, making the label confusing. Replaced all instances with the architectural name each reference already used: 'One-Time User Consent with Identity Linking'. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Replace all organization-specific references with generic equivalents: - sri@cisco.com → user@example.com - cisco.okta.com → your-org.okta.com - @Sri-GH → @myusername - 'Cisco Okta SSO' → 'Enterprise IdP (Okta)' - '(e.g., Cisco)' prose removed; reframed as generic enterprise environment Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…eric placeholders Replace the Keycloak identity provider JSON config block with fully generic placeholders so the document reads as reference architecture rather than a Cisco-specific runbook: - alias: okta-enterprise → enterprise-idp - displayName: 'Enterprise IdP (Okta)' kept, example clarified - all URLs: your-org.okta.com → <idp-domain> - clientId: caipe-keycloak-client → <keycloak-client-id> - clientSecret vault key: okta-client-secret → idp-client-secret - matching alias filter in Python snippet updated to enterprise-idp Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Rules changes (CLAUDE.md, .cursorrules, .specify/.cursorrules removal) extracted to prebuild/chore/git-worktree-workflow-rules (PR #976). This branch now contains only the architecture doc and spec. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…agrams Add two new architecture docs with Mermaid diagrams: - slack-bot-authorization.md: End-to-end authorization topology, pre-authorization identity binding (Okta→Keycloak), runtime token exchange sequence with 4 scope validation gates, multi-agent scope isolation, JWT delegation chain, and error recovery flows. Clearly labels WebSocket (Socket Mode) for Slack↔Bot and A2A Protocol for Bot↔CAIPE communication. - slack-io-guardrails.md: Input/output guardrail architecture for the Slack bot pipeline. Input guardrails (length, secrets, PII, prompt injection, content policy) and output guardrails (credential scan, PII leak, hallucination markers, content safety, format sanitization) with pluggable chain pattern, configuration schema, and observability/metrics integration. Also adds both docs plus enterprise-identity-federation to the docs sidebar, and cross-references the authorization diagrams from the federation doc. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
…tt chart Converts the CAIPE Architecture Evolution slide into a Docusaurus markdown page with a Mermaid Gantt chart covering the roadmap from static distributed agents through dynamic/single unification and persona-based profiles. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Spec 096: policy engine comparison (Cedar, CEL, Casbin, OPA/Rego), AgentGateway/Keycloak/Slack-Webex external authz research, and supporting architecture docs (identity federation, Slack authorization and I/O guardrails, architecture evolution) consolidated under docs/docs/specs. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Rename docs/docs/specs/096-policy-engine-comparison to 093-agent-enterprise-identity; update spec number (093), feature branch name, and research context lines. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
…mparison Keep git branch name unchanged for the open PR; document spec folder slug 093-agent-enterprise-identity in the same line. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Fold documentation-site and contributor-workflow checklist from former 095-enterprise-identity-federation-docs into 093 spec.md and README. Remove redundant 095 spec file. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Add User Story 7 (P1), FR-026 (RAG server Keycloak JWT integration), FR-027 (per-KB access control with hybrid Keycloak roles + team ownership), SC-010/SC-011, Phase 10 tasks (T116-T127), and architecture section documenting defense-in-depth enforcement for KB operations. Key additions across 5 spec documents: - spec.md: Session clarification, FR-026, FR-027, US7, edge cases, SC-010/SC-011 - architecture.md: RAG RBAC to Keycloak + Per-KB Access Control section with flow diagrams, role mapping table, query-time filtering - plan.md: Phase 10 reference, updated summary and project structure - tasks.md: 12 new tasks (T116-T127) in 3 sub-phases, updated dependencies, FR/SC coverage maps, execution strategy - permission-matrix.md: per-KB capability rows, enforcement points, per-KB roles in roles summary Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Add Phase 11 / User Story 8 for dynamic agent RBAC with CEL as the mandated policy engine across all enforcement points: - spec.md: Session 2026-03-25 clarifications, FR-028 (three-layer dynamic agent RBAC), FR-029 (CEL everywhere), FR-030 (deepagent MCP routing via AG), US8, SC-012/SC-013, edge cases, key entities - architecture.md: Three-layer enforcement flow, per-agent access resolution, CEL universal engine diagram, deepagent MCP sequence diagram, component summary update - permission-matrix.md: dynamic_agent component rows, per-agent access control section, per-agent roles, CEL enforcement points - plan.md: Phase 11 note, dynamic_agents project structure, CEL complexity tracking, updated summary and task count (88 tasks) - tasks.md: Phase 11 tasks T128-T140 (11A CEL library, 11B Keycloak integration, 11C deepagent MCP routing), FR/SC coverage maps Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
…pecs Clarification session (5 Q/A) for Slack channel-level RBAC: - FR-031: Slack channel-to-team scope mapping — channels act as team selectors, not permission grants; Keycloak roles sole authority - FR-032: Admin UI Slack Management Dashboard with full operational view (user bootstrapping + channel-to-team mapping manager) - Edge cases: role mismatch deny, unlinked channel, stale mapping, disabled Keycloak account, 60s cache TTL for bot performance - Bot talks to Keycloak for identity/auth, MongoDB for team context Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
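A minimal sketch of the 60s cache TTL mentioned above, assuming the bot resolves a channel to a team through some lookup function. `TtlCache`, `loader`, and `clock` are hypothetical names for illustration, not the bot's actual code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[str], Optional[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # hypothetical lookup, e.g. a MongoDB query
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, channel_id: str) -> Optional[str]:
        """Return the cached team for a channel, reloading once the entry is stale."""
        now = self._clock()
        cached = self._entries.get(channel_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._loader(channel_id)
        # A None result (unlinked channel) is cached too, so repeated
        # messages in an unmapped channel do not hit the database each time.
        self._entries[channel_id] = (now, value)
        return value
```

One consequence of this design is that a mapping change can take up to one TTL to become visible to the bot, which is the trade-off the stale-mapping edge case refers to.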
Phase 12 / User Story 9 for Slack channel-level RBAC: - plan.md: Phase 12 note, project structure, task count update - architecture.md: Slack Bot RBAC flow diagram, Admin UI dashboard - permission-matrix.md: slack_channel + slack_admin component rows - tasks.md: Phase 12 tasks T141-T152, FR-031/FR-032 coverage Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Add US8/US9 to Phase Dependencies, User Story Dependencies, Parallel Opportunities, and Implementation Strategy sections. Update MVP scope to include US8 (P1). Update Incremental Delivery to include US9 (P2). Expand Parallel Team Strategy for 4 devs. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Implement Phases 8-12 of the 098 Enterprise RBAC spec:
Phase 8 (US6): Admin UI RBAC management
- Keycloak Admin REST API client for role/mapper CRUD
- BFF API routes for roles, role-mappings, team roles
- RolesAccessTab with realm roles, group mappings, team assignments
- CreateRoleDialog and GroupRoleMappingDialog components
Phase 10 (US7): RAG server Keycloak RBAC + per-KB access
- OIDC configuration for RAG server pointing to Keycloak
- KeycloakRole constants and KbPermission model
- Per-KB role extraction from JWT claims (kb_reader/ingestor/admin:<id>)
- Query-time KB filtering via inject_kb_filter
- require_kb_access FastAPI dependency for KB-mutating endpoints
Phase 11 (US8): Dynamic Agent RBAC + CEL policy engine
- Shared CEL evaluator library (Python celpy + TypeScript cel-js)
- CEL integration in RAG server rbac.py and BFF api-middleware.ts
- KeycloakSyncService for agent resource/role lifecycle
- CEL-based access checks replacing code-based can_view/can_use_agent
- OBO JWT forwarding through Agent Gateway for MCP calls
- Agent Gateway CEL policy configuration
Phase 12 (US9): Slack channel-to-team RBAC + Admin dashboard
- ChannelTeamMapper with 60s TTL cache and stale-team handling
- Channel-to-team RBAC middleware integration in Slack bot
- Team context propagation via X-Team-Id header
- SlackUsersTab and SlackChannelMappingTab admin components
- BFF API routes for Slack user management and channel mappings
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Made-with: Cursor
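The per-KB role extraction in Phase 10 can be sketched as below. This is an illustration only: it assumes the roles are spelled `kb_reader:<id>`, `kb_ingestor:<id>`, and `kb_admin:<id>`, and that any per-KB role implies read access. `kb_permissions` and `readable_kb_ids` are hypothetical helpers, not the RAG server's functions.

```python
from typing import Dict, Iterable, Set

# Assumed spelling of the per-KB role prefixes.
KB_ROLE_PREFIXES = ("kb_reader", "kb_ingestor", "kb_admin")


def kb_permissions(roles: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each KB id to the set of per-KB role prefixes the caller holds."""
    result: Dict[str, Set[str]] = {}
    for role in roles:
        prefix, sep, kb_id = role.partition(":")
        # Skip roles without a ":<id>" suffix and roles with other prefixes.
        if sep and kb_id and prefix in KB_ROLE_PREFIXES:
            result.setdefault(kb_id, set()).add(prefix)
    return result


def readable_kb_ids(roles: Iterable[str]) -> Set[str]:
    """KB ids a query may touch, assuming any per-KB role grants read."""
    return set(kb_permissions(roles))
```

A query-time filter could then restrict retrieval to `readable_kb_ids(...)`, so the KB list is derived from the token rather than from request parameters.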
Add Swagger-style API documentation covering all 5 CAIPE services organized into 8 functional domains (FR-034):
Markdown reference docs (docs/docs/api/):
- Admin & User Management (1,812 lines)
- RBAC & Roles (735 lines)
- Chat & Conversations (1,153 lines)
- RAG & Knowledge Bases (1,031 lines)
- Dynamic Agents & MCP (1,132 lines)
- Slack Integration (451 lines)
- Platform health/config/settings (667 lines)
- CAIPE Supervisor Agent (434 lines)
OpenAPI 3.1.0 YAML specs (docs/docs/api/openapi/):
- bff.yaml (UI BFF, ~73 paths)
- rag-server.yaml (RAG Server)
- dynamic-agents.yaml (Dynamic Agents)
- caipe-supervisor.yaml (A2A Supervisor)
Also fixes Next.js build error caused by conflicting dynamic path slugs ([email] vs [id]) by consolidating the user role update route under [id]/role with auto-detection of email vs UUID input.
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Made-with: Cursor
…efresh on role change - Add UserDetailModal with full dark theme support - Add UserManagementTab for admin user listing and management - Force-refresh access token after role/team changes so RAG server and other services pick up updated Keycloak realm roles immediately - Update auth-config JWT callback to support forceRefresh trigger - Update Keycloak init-idp.sh, identity_linker, keycloak-admin - Refactor admin page tabs to flex layout for scrollable overflow - Update spec docs (plan, tasks, data-model, research) Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Made-with: Cursor
Duo issues access tokens signed with keys not published in their public JWKS (key ID is a 64-char hex SHA-256 absent from the JWKS endpoint). The rag-server's JWKS validation hard-fails on every Duo access token, blocking the userinfo fetch that follows, which means email, groups, and role mapping are never reached, causing 401 on all caipe-ui requests.
When JWKS validation fails for a provider, now attempt to validate the token implicitly via the userinfo endpoint. A 200 response from the provider's userinfo endpoint proves the token is valid; the returned claims (email, groups) flow into the existing RBAC role-mapping logic.
This unblocks the full enterprise RBAC path: user sends Bearer token → userinfo returns email + groups → groups map to admin/ingestor/viewer role → per-user access control works correctly.
Fixes: #1137
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
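The fallback described above can be sketched roughly as follows. The two validators are passed in as callables so the control flow is visible without network access. `resolve_claims`, `validate_with_jwks`, `fetch_userinfo`, and `TokenRejected` are hypothetical names, and the sketch assumes `fetch_userinfo` returns `None` for any non-200 response.

```python
from typing import Any, Callable, Dict, Optional

Claims = Dict[str, Any]


class TokenRejected(Exception):
    """Raised when neither JWKS validation nor userinfo accepts the token."""


def resolve_claims(
    token: str,
    validate_with_jwks: Callable[[str], Claims],
    fetch_userinfo: Callable[[str], Optional[Claims]],
) -> Claims:
    """Prefer local JWKS validation; fall back to the provider's userinfo."""
    try:
        return validate_with_jwks(token)
    except Exception:
        # Assumption: any validation failure (for example an unknown key ID)
        # is allowed to fall through to the userinfo check.
        pass
    claims = fetch_userinfo(token)
    if claims is None:
        raise TokenRejected("token rejected by JWKS validation and userinfo")
    return claims
```

The fallback costs one round trip to the identity provider per request, so a real deployment would likely want to cache accepted tokens for a short period.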
uv.lock and package-lock.json files were resolved to main versions during rebase conflict resolution. Regenerate them to include new RBAC dependencies (cel-python, python-jose, motor, langfuse, cel-js, copilotkit) added in pyproject.toml and package.json. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Root workspace uv.lock was stale after rebase — missing cel-python, google-re2, and pendulum added by RBAC pyproject.toml changes. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Signed-off-by: suwhang-cisco <suwhang@cisco.com>
…tion, dynamic agents
Implements the remaining pieces of the enterprise RBAC feature across all
service boundaries so user identity and roles are enforced consistently from
login through to MCP tool execution.
Auth / Keycloak:
- init-idp.sh: create caipe-silent-broker-login flow (idp-create-user-if-unique
+ idp-auto-link, both ALTERNATIVE) so users never see "Account already exists"
prompt on first SSO login via any federated IdP
- init-idp.sh: fix executions endpoint to use flow alias (not ID) — Keycloak
REST API requires alias for POST /flows/{alias}/executions/execution
- init-idp.sh: add slack_user_id to user profile schema + unmanagedAttributePolicy
so custom attributes are not silently dropped by Keycloak 26
- init-idp.sh: ensure chat_user and offline_access are in default-roles-caipe
composite so all brokered users receive them automatically
Supervisor (A2A):
- Add JwtUserContextMiddleware to main.py middleware stack — decodes JWT claims
and stores user identity in a per-request ContextVar
- jwt_user_context_middleware.py: new Starlette BaseHTTPMiddleware
- agent_executor.py: read user identity from get_jwt_user_context() when
ENABLE_USER_INFO_TOOL=true; fall back to "by user:" message prefix
Dynamic Agents:
- auth.py: extract realm_access.roles (Keycloak format) in
extract_groups_from_claims so is_admin and oidc_required_group checks
correctly see Keycloak roles
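A sketch of what reading `realm_access.roles` alongside a flat groups claim might look like. The function name comes from the commit message, but the body, and the assumption that the other source is a top-level `groups` claim, are illustrative only.

```python
from typing import Any, Dict, List


def extract_groups_from_claims(claims: Dict[str, Any]) -> List[str]:
    """Collect group-like values from a flat claim and Keycloak realm roles."""
    groups: List[str] = []
    raw_groups = claims.get("groups", [])  # assumed flat claim name
    if isinstance(raw_groups, list):
        groups.extend(str(g) for g in raw_groups)
    # Keycloak nests realm roles as {"realm_access": {"roles": [...]}}.
    realm_access = claims.get("realm_access")
    if isinstance(realm_access, dict):
        roles = realm_access.get("roles", [])
        if isinstance(roles, list):
            groups.extend(str(r) for r in roles)
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(groups))
```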
UI:
- api-middleware.ts: bootstrap admins bypass ALL resource checks (not just
admin resource); pass user: session.user to all requireRbacPermission calls
- RAG proxy routes: add user field so isBootstrapAdmin receives the email
- admin/page.tsx: fix ReferenceError — feedbackUsers not users
- CreateTeamDialog: replace free-text members textarea with searchable
MultiSelect fetching /api/admin/users
Slack Bot:
- app.py: add linking prompt cooldown (SLACK_LINKING_PROMPT_COOLDOWN, default
3600s) to prevent spamming unlinked users
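The cooldown could be sketched as a small per-user throttle. `SLACK_LINKING_PROMPT_COOLDOWN` and its 3600s default come from the commit message; `LinkingPromptThrottle` and its in-memory storage are assumptions for illustration.

```python
import os
import time
from typing import Callable, Dict

# Default of 3600 seconds, overridable through the environment.
COOLDOWN_SECONDS = float(os.environ.get("SLACK_LINKING_PROMPT_COOLDOWN", "3600"))


class LinkingPromptThrottle:
    """Allow at most one linking prompt per user per cooldown window."""

    def __init__(
        self,
        cooldown_seconds: float = COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_prompt: Dict[str, float] = {}

    def should_prompt(self, slack_user_id: str) -> bool:
        """Return True and record the time if the user may be prompted now."""
        now = self._clock()
        last = self._last_prompt.get(slack_user_id)
        if last is not None and now - last < self._cooldown:
            return False
        self._last_prompt[slack_user_id] = now
        return True
```

Because the state lives in process memory in this sketch, a bot restart would reset the window for every user.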
Docs:
- how-rbac-works.md: new architecture reference for junior + security engineers
— badge analogies, component deep-dives, threat model, OBO sequence diagram,
support for Duo SSO / Okta / Entra ID / generic OIDC
- slack-rag-auth-flow.md: Slack → RAG end-to-end auth sequence diagram
- CLAUDE.md: RBAC living documentation rule — keep how-rbac-works.md in sync
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Merge origin/release/0.4.0 into the RBAC branch, resolving conflicts:
- agent_runtime.py: keep client_context (0.4.0) + _auth_bearer (our JWT forwarding)
- mongo.py: accept 0.4.0 read-only model (CRUD moved to Next.js)
- slack_bot/app.py: accept client_context replacing platform_team_id
- slack_bot/utils/ai.py: accept event_stream loop pattern from 0.4.0
- docker-compose.dev.yaml: keep RBAC env vars + add SEED_CONFIG_PATH; drop deleted config.yaml volume mount; keep AGENT_GATEWAY_URL
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…nt routing
Slack channel routing now points directly to a dynamic agent (1:1) instead of a platform team. The channel_agent_mapper resolves the mapping from MongoDB and performs a basic RBAC check before routing:
- global agents: any authenticated user
- team agents: user must have team_member:<team> Keycloak realm role
- private agents: denied (not appropriate for channel routing)
Changes:
- New channel_agent_mapper.py: MongoDB-backed resolver with TTL cache + RBAC
- route.ts: switched collection/field from team→agent; validates agent exists
- SlackChannelMappingTab.tsx: UI now shows dynamic agents dropdown
- app.py: _rbac_enrich_context sets channel_agent_id; _get_agent_id takes mapped_agent_id override (DB mapping > channel config > global default)
- ai.py: handle_ai_alert_processing accepts mapped_agent_id instead of platform_team_id (removed in 0.4.0 stream_response signature)
- how-rbac-works.md: new "Channel → Dynamic Agent Routing" section + file map
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
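The three routing rules and the agent-id precedence above can be expressed as a short sketch. `AgentRecord`, its `visibility` field values, `can_route_to_agent`, and `pick_agent_id` are hypothetical. The sketch also assumes the caller has already been authenticated before the check runs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    visibility: str  # assumed values: "global", "team", or "private"
    team: Optional[str] = None


def can_route_to_agent(agent: AgentRecord, realm_roles: Iterable[str]) -> bool:
    """Apply the channel-routing rules for an already authenticated user."""
    if agent.visibility == "global":
        return True
    if agent.visibility == "team" and agent.team:
        return f"team_member:{agent.team}" in set(realm_roles)
    # Private agents, and anything unrecognized, are denied by default.
    return False


def pick_agent_id(
    db_mapping: Optional[str],
    channel_config: Optional[str],
    global_default: Optional[str],
) -> Optional[str]:
    """Precedence: DB mapping > channel config > global default."""
    for candidate in (db_mapping, channel_config, global_default):
        if candidate:
            return candidate
    return None
```

Denying unknown visibility values keeps the check fail-closed if a new agent type is added later without updating the mapper.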
Summary
End-to-end RBAC across the platform — identity, authorization, audit, and test infrastructure — landing Specs 098 (enterprise RBAC for Slack + UI), 102 (comprehensive RBAC tests & completion), 103, and 104 (team-scoped RBAC).
Identity & Auth
- DA_REQUIRE_BEARER=true by default; outlier BFF callers migrated
Authorization
- acl_tags filter (Spec 102 §1.3)
- cel-js) for client-side gating
Routing
- client_context
Observability & Ops
- how-rbac-works.md (Spec 102 US8)
Platform changes pulled in via merges
Docs & Migration
- docs/docs/security/rbac/ (architecture / workflows / file-map / usage)
- scripts/migrations/0.5.0/RUN.md)
Stats
Test plan
- make lint: clean
- make test: all suites pass (supervisor + multi-agents + agents)
- make caipe-ui-tests: UI Jest tests pass
- make test-rbac: Spec 102 RBAC matrix passes
- DA_REQUIRE_BEARER=true
- acl_tags when enabled
🤖 Assisted-by: Claude:claude-opus-4-7