chore: Phase 2 OpenSpec scoping — proposal, design, specs, tasks

8 workstreams scoped per OpenSpec standards:

1. HashiCorp Vault integration (secret management)
2. Python SDK (sentryagent-idp)
3. Go SDK (idp-sdk-go)
4. Java SDK (ai.sentryagent:idp-sdk)
5. OPA policy engine (dynamic ABAC, hot-reload Rego)
6. Web Dashboard UI (React 18 + TypeScript)
7. Prometheus + Grafana monitoring (7 metrics, pre-built dashboard)
8. Multi-region Terraform deployment (AWS + GCP)

Status: proposed, awaiting CEO dependency approvals (A0.1–A0.5) before any implementation begins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 14:53:09 +00:00
parent 0d56895fae
commit 7593bfe1c1
12 changed files with 682 additions and 0 deletions
--- a/openspec/changes/phase-2-production-ready/.openspec.yaml
+++ b/openspec/changes/phase-2-production-ready/.openspec.yaml
@@ -0,0 +1,3 @@
 change: phase-2-production-ready
 status: proposed
 date: 2026-03-28
--- a/openspec/changes/phase-2-production-ready/design.md
+++ b/openspec/changes/phase-2-production-ready/design.md
@@ -0,0 +1,218 @@
 # Phase 2: Production-Ready — Technical Design
 **Date**: 2026-03-28
 **Author**: Virtual Architect
 **Status**: Draft — pending CEO approval of proposal
 ---
 ## 1. HashiCorp Vault Integration
 ### Architecture
 ```
 AgentIdP Server
  └── CredentialService
        └── VaultClient (new)
              └── HashiCorp Vault (sidecar or external)
                    └── KV Secrets Engine v2
 ```
 ### Design Decisions
 **ADR-001: Vault over AWS KMS/GCP Secret Manager**
 Vault is cloud-agnostic, open-source, and already standard in enterprise environments. Using Vault keeps Phase 2 cloud-provider independent.
 **ADR-002: KV Secrets Engine v2**
 KV v2 provides versioned secrets and metadata. When a credential is rotated, the old version is retained in Vault history, enabling audit-grade secret lifecycle tracking.
 **ADR-003: AgentIdP stores Vault path, not secret**
 `credentials.vault_path` stores the Vault KV path (e.g. `secret/agentidp/agents/{agentId}/credentials/{credentialId}`). The secret itself is never written to PostgreSQL.
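
A minimal sketch of how the path in ADR-003 might be built. The helper names `buildVaultPath` and `toKvV2DataPath` are hypothetical and do not come from the design. The `data/` segment reflects how the KV v2 HTTP API addresses secret data under a mount.

```typescript
// Hypothetical helper: builds the logical path stored in credentials.vault_path.
function buildVaultPath(mount: string, agentId: string, credentialId: string): string {
  // Reject empty or slash-containing IDs so one ID cannot address another path.
  for (const part of [agentId, credentialId]) {
    if (part.length === 0 || part.includes("/")) {
      throw new Error(`invalid path segment: "${part}"`);
    }
  }
  return `${mount}/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

// KV v2 serves secret data under "<mount>/data/<path>", so the API path
// differs from the logical path by one inserted segment.
function toKvV2DataPath(mount: string, logicalPath: string): string {
  const prefix = `${mount}/`;
  if (!logicalPath.startsWith(prefix)) {
    throw new Error(`path is not under mount "${mount}"`);
  }
  return `${mount}/data/${logicalPath.slice(prefix.length)}`;
}
```

Storing the logical path rather than the API path keeps the database value independent of the KV engine version.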
 ### New environment variables
 | Variable | Description |
 |----------|-------------|
 | `VAULT_ADDR` | Vault server address |
 | `VAULT_TOKEN` | Vault token used by AgentIdP (a scoped service token; use a root token only against a local dev server) |
 | `VAULT_MOUNT` | KV mount path (default: `secret`) |
 ### Migration
 Add `vault_path` column to `credentials` table (`005_add_vault_path.sql`). Existing credentials retain bcrypt hashes; new credentials use Vault. Both code paths coexist until all credentials are rotated (migration guide provided).
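
The coexistence rule above could be expressed as a small routing function. This is a sketch under assumptions: the `CredentialRow` shape and the `secretHash` field name are hypothetical; only `vault_path` is named in the design.

```typescript
// Hypothetical row shape; only vault_path is named in the design.
interface CredentialRow {
  credentialId: string;
  secretHash: string | null; // legacy bcrypt hash
  vaultPath: string | null;  // set for credentials created after migration 005
}

type SecretBackend =
  | { kind: "vault"; path: string }
  | { kind: "bcrypt"; hash: string };

// Picks the verification backend for one credential during the coexistence window.
function selectBackend(row: CredentialRow): SecretBackend {
  if (row.vaultPath !== null) {
    return { kind: "vault", path: row.vaultPath };
  }
  if (row.secretHash !== null) {
    return { kind: "bcrypt", hash: row.secretHash };
  }
  // Neither column is populated: fail loudly rather than guess.
  throw new Error(`credential ${row.credentialId} has no secret backend`);
}
```

Preferring `vaultPath` when both columns are set means a rotated credential stops using its old hash immediately.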
 ---
 ## 2. Multi-Language SDKs
 ### Shared contract (all SDKs implement identically)
 ```
 AgentIdPClient(baseUrl, clientId, clientSecret, scopes?)
  .agents     → AgentRegistryClient   (5 methods)
  .credentials → CredentialClient     (4 methods)
  .tokens     → TokenClient           (2 methods)
  .audit      → AuditClient           (2 methods)
  .clearTokenCache()
 TokenManager — auto-refresh 60s before expiry
 AgentIdPError — code, message, httpStatus, details
 ```
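
To illustrate the "auto-refresh 60s before expiry" rule in the contract, here is a minimal TypeScript sketch. The `Token` shape, the `fetchToken` callback, and the injectable clock are assumptions made for the example, not the existing SDK's API.

```typescript
interface Token {
  accessToken: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

const REFRESH_BUFFER_MS = 60000; // "60s before expiry" from the shared contract

// True when there is no cached token or it is inside the refresh window.
function needsRefresh(token: Token | null, nowMs: number): boolean {
  return token === null || nowMs >= token.expiresAtMs - REFRESH_BUFFER_MS;
}

class TokenManager {
  private cached: Token | null = null;
  private inFlight: Promise<Token> | null = null;
  private readonly fetchToken: () => Promise<Token>;
  private readonly now: () => number;

  // fetchToken is a hypothetical callback that calls the token endpoint.
  constructor(fetchToken: () => Promise<Token>, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  async getToken(): Promise<string> {
    const cached = this.cached;
    if (cached !== null && !needsRefresh(cached, this.now())) {
      return cached.accessToken;
    }
    // Concurrent callers share a single in-flight refresh request.
    if (this.inFlight === null) {
      this.inFlight = this.fetchToken().finally(() => {
        this.inFlight = null;
      });
    }
    const fresh = await this.inFlight;
    this.cached = fresh;
    return fresh.accessToken;
  }

  clearTokenCache(): void {
    this.cached = null;
  }
}
```

Sharing the in-flight promise plays the role that the mutex or `synchronized` guard plays in the Go and Java variants.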
 ### Python SDK (`sentryagent-idp`)
 - Python 3.9+ (httpx for async, requests for sync)
 - Both sync and async client variants
 - PyPI package: `sentryagent-idp`
 - Type hints throughout (`mypy --strict` clean)
 ### Go SDK (`github.com/sentryagent/idp-sdk-go`)
 - Go 1.21+, standard library `net/http`
 - Context-aware methods (`context.Context` first arg)
 - Idiomatic Go error handling (`error` return, no panic)
 - Go module: `github.com/sentryagent/idp-sdk-go`
 ### Java SDK (`ai.sentryagent:idp-sdk`)
 - Java 17+, Apache HttpClient 5
 - Synchronous and CompletableFuture async variants
 - Maven Central: `ai.sentryagent:idp-sdk`
 - Fully typed with generics
 ---
 ## 3. OPA Policy Engine
 ### Architecture
 ```
 HTTP Request
  → Auth Middleware (JWT verify) — unchanged
  → OPA Middleware (new) — evaluates policy
      → OPA Wasm (embedded, no network call)
          → Rego policy files (hot-reloadable)
  → Controller
 ```
 ### Design Decisions
 **ADR-004: OPA Wasm over OPA sidecar**
 Embedding OPA as Wasm in the Node.js process eliminates a network hop and removes a runtime dependency. Policy files are loaded from the `policies/` directory (configurable via `POLICY_DIR`) at startup and reloaded on SIGHUP.
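
One way the SIGHUP reload could avoid serving a half-loaded policy is to swap the active policy only after the replacement loads successfully. The sketch below is an illustration under assumptions: `PolicyHolder` and the `load` callback are hypothetical names, and the loader is assumed to return an already compiled, ready-to-evaluate policy.

```typescript
// Holds the active policy and replaces it only when a reload succeeds,
// so a broken policy file never leaves the server without a policy.
class PolicyHolder<P> {
  private active: P;

  constructor(initial: P) {
    this.active = initial;
  }

  get current(): P {
    return this.active;
  }

  // Returns true when the swap happened, false when the old policy was kept.
  async reload(load: () => Promise<P>): Promise<boolean> {
    try {
      this.active = await load();
      return true;
    } catch {
      return false;
    }
  }
}

// Hypothetical wiring in the server entry point:
//   process.on("SIGHUP", () => { void holder.reload(loadPolicyFromDir); });
```

Startup can still fail fast by loading the initial policy before constructing the holder; only reloads are tolerant of errors.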
 **ADR-005: Policy replaces, does not wrap, scope check**
 The existing static scope check in `auth.ts` is replaced by an OPA policy evaluation. This keeps the policy as the single source of truth for access control.
 ### Policy structure (`policies/`)
 ```
 policies/
  authz.rego          — main policy: allow/deny
  data/
    scopes.json       — scope → permission mapping
 ```
 ---
 ## 4. Web Dashboard UI
 ### Architecture
 ```
 dashboard/            (new — separate from sdk/)
  src/
    components/       — reusable UI components
    pages/            — Agents, Credentials, Audit, Health
    hooks/            — useAgents, useCredentials, useAudit
    lib/
      client.ts       — wraps @sentryagent/idp-sdk
      auth.ts         — credential entry and storage
 ```
 ### Tech Stack
 - React 18 + TypeScript strict
 - Vite 5 (build tool)
 - TanStack Query v5 (server state)
 - shadcn/ui components (Radix UI + Tailwind CSS)
 ### Pages
 | Page | Scope Required | Features |
 |------|---------------|----------|
 | Agents | `agents:read` | List, search, view detail; suspend/reactivate (requires `agents:write`) |
 | Credentials | `agents:read` | List credentials per agent; rotate, revoke (require `agents:write`) |
 | Audit Log | `audit:read` | Filter by agent/action/outcome/date, paginate |
 | Health | None | Server uptime, Redis/PostgreSQL connectivity |
 ### Authentication
 The dashboard accepts `clientId` + `clientSecret` via a login form. The `@sentryagent/idp-sdk` `TokenManager` handles token acquisition and caching in `sessionStorage`. No backend session — all state is client-side.
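
A rough sketch of the `sessionStorage` caching described above. The storage key, the `StoredToken` shape, and the `KeyValueStore` interface are assumptions for illustration; `KeyValueStore` mirrors the subset of the Web Storage API that `sessionStorage` implements, so the same functions can run in a test without a browser.

```typescript
// Subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface StoredToken {
  accessToken: string;
  expiresAtMs: number;
}

const TOKEN_KEY = "agentidp.token"; // hypothetical storage key

function saveToken(store: KeyValueStore, token: StoredToken): void {
  store.setItem(TOKEN_KEY, JSON.stringify(token));
}

// Returns the stored token, or null (and clears the entry) if it is
// missing, malformed, or already expired.
function loadToken(store: KeyValueStore, nowMs: number): StoredToken | null {
  const raw = store.getItem(TOKEN_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<StoredToken>;
    if (
      typeof parsed.accessToken === "string" &&
      typeof parsed.expiresAtMs === "number" &&
      parsed.expiresAtMs > nowMs
    ) {
      return { accessToken: parsed.accessToken, expiresAtMs: parsed.expiresAtMs };
    }
  } catch {
    // Fall through to cleanup on malformed JSON.
  }
  store.removeItem(TOKEN_KEY);
  return null;
}
```

Only the access token is persisted in this sketch; the `clientSecret` entered in the login form stays in memory.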
 ---
 ## 5. Prometheus + Grafana Monitoring
 ### Metrics exposed at `GET /metrics`
 | Metric | Type | Description |
 |--------|------|-------------|
 | `agentidp_tokens_issued_total` | Counter | Tokens issued, labelled by outcome |
 | `agentidp_agents_registered_total` | Counter | Agent registrations |
 | `agentidp_http_requests_total` | Counter | All requests, labelled by method/path/status |
 | `agentidp_http_request_duration_seconds` | Histogram | Request latency |
 | `agentidp_rate_limit_rejections_total` | Counter | 429 responses |
 | `agentidp_db_query_duration_seconds` | Histogram | PostgreSQL query latency |
 | `agentidp_redis_command_duration_seconds` | Histogram | Redis command latency |
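
To show what a labelled counter from the table looks like on the wire, here is a dependency-free sketch that renders the Prometheus text exposition format. `LabelledCounter` is a hypothetical stand-in written for this example; the server itself is specified to use `prom-client`, which produces this format through its own registry.

```typescript
// Stand-in for a labelled counter, for illustration only.
class LabelledCounter {
  private readonly values = new Map<string, number>();
  private readonly name: string;
  private readonly help: string;
  private readonly labelNames: string[];

  constructor(name: string, help: string, labelNames: string[]) {
    this.name = name;
    this.help = help;
    this.labelNames = labelNames;
  }

  inc(labels: Record<string, string> = {}): void {
    // Label values escape backslash, double quote, and newline.
    const key = this.labelNames
      .map((n) => {
        const raw = labels[n] ?? "";
        const escaped = raw.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
        return `${n}="${escaped}"`;
      })
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + 1);
  }

  // Renders HELP and TYPE lines followed by one sample per label combination.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, key) => {
      lines.push(key === "" ? `${this.name} ${value}` : `${this.name}{${key}} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

Each distinct label combination becomes its own time series, which is worth keeping in mind for the `path` label on the HTTP metrics.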
 ### Grafana dashboard
 Pre-built JSON dashboard shipped in `monitoring/grafana/dashboards/agentidp.json`. Auto-provisioned via `monitoring/grafana/provisioning/`.
 ### Docker Compose extension
 Add `prometheus` and `grafana` services to a `docker-compose.monitoring.yml` overlay — keeps the base `docker-compose.yml` clean for developers who don't need monitoring.
 ---
 ## 6. Multi-Region Deployment (Terraform)
 ### Structure
 ```
 terraform/
  modules/
    agentidp/         — reusable module: compute + networking
    rds/              — managed PostgreSQL
    redis/            — managed Redis
    lb/               — load balancer + TLS
  environments/
    aws/              — AWS-specific config (ECS + RDS + ElastiCache)
    gcp/              — GCP-specific config (Cloud Run + Cloud SQL + Memorystore)
 ```
 ### Design Decisions
 **ADR-006: Two provider targets (AWS + GCP) in Phase 2**
 AWS and GCP cover the majority of developer deployments. Azure module is Phase 3. Each environment is a thin wrapper over the shared `agentidp` module.
 **ADR-007: Terraform over Pulumi/CDK**
 Terraform is one of the most widely used IaC tools and is familiar to most DevOps teams. Its HCL syntax is also simpler to present in documentation.
 ---
 ## Component Interaction Map (Phase 2)
 ```
                      ┌────────────────────┐
                      │   Web Dashboard    │
                      │  (React + Vite)    │
                      └────────┬───────────┘
                               │ HTTPS
              ┌────────────────▼────────────────┐
              │         AgentIdP Server         │
              │  Auth MW → OPA MW → Controllers │
              │  /metrics (prom-client)         │
              └──┬──────────┬──────────┬────────┘
                 │          │          │
           ┌─────▼──┐  ┌────▼───┐  ┌──▼───────┐
           │Postgres│  │ Redis  │  │  Vault   │
           └────────┘  └────────┘  └──────────┘
                 │
        ┌────────▼────────┐
        │   Prometheus    │
        └────────┬────────┘
                 │
        ┌────────▼────────┐
        │    Grafana      │
        └─────────────────┘
 ```
--- a/openspec/changes/phase-2-production-ready/proposal.md
+++ b/openspec/changes/phase-2-production-ready/proposal.md
@@ -0,0 +1,96 @@
 # Phase 2: Production-Ready — Change Proposal
 **Date**: 2026-03-28
 **Author**: Virtual CTO
 **Status**: Proposed — awaiting CEO approval
 ---
 ## Summary
 Phase 1 delivered a complete, working AgentIdP MVP. Phase 2 makes it production-ready: hardened secrets management, multi-language SDKs, a policy engine, a web dashboard, observability, and multi-region deployment.
 ---
 ## Problem Statement
 Phase 1 is functional but has the following production gaps:
 | Gap | Risk |
 |-----|------|
 | Credentials stored as bcrypt hashes in PostgreSQL | No HSM/KMS — acceptable for MVP, not for enterprise |
 | Only Node.js SDK | Developers in Python/Go/Java cannot use the SDK |
 | No policy engine | Scope enforcement is static — no dynamic ABAC/RBAC |
 | No web UI | Operators must use `curl` to manage agents |
 | No observability | No metrics, no dashboards, no alerting |
 | Single-region deployment | No HA, no geo-redundancy |
 ---
 ## Proposed Changes
 ### 1. HashiCorp Vault Integration
 Replace bcrypt-hash credential storage in PostgreSQL with Vault-backed secret management for new credentials; existing credentials keep their bcrypt hashes until rotated. Vault handles secret generation, versioning, and revocation. AgentIdP stores only Vault secret paths, not the secrets themselves.
 ### 2. Multi-Language SDKs
 Add Python, Go, and Java SDKs with the same API surface as the existing Node.js SDK: `AgentIdPClient`, `TokenManager`, service clients for all 14 endpoints, and a typed error hierarchy.
 ### 3. Advanced Policy Engine (OPA)
 Integrate Open Policy Agent (OPA), embedded in the server process as Wasm rather than run as a sidecar, for dynamic scope and attribute-based access control. Policies are Rego files that are hot-reloaded on SIGHUP, so no server restart is required.
 ### 4. Web Dashboard UI
 A React + TypeScript dashboard for operators: agent list and management, credential overview, audit log viewer, system health panel. Read-only by default; write operations require `agents:write` scope.
 ### 5. Prometheus + Grafana Monitoring
 Instrument all services with Prometheus metrics (`/metrics` endpoint). Ship a pre-built Grafana dashboard for: token issuance rate, agent registration rate, error rates, Redis latency, PostgreSQL query latency.
 ### 6. Multi-Region Deployment
 Terraform modules for AWS/GCP deployment with: managed PostgreSQL (RDS/Cloud SQL), managed Redis (ElastiCache/Memorystore), container orchestration (ECS/Cloud Run), load balancer, and a deployment guide.
 ---
 ## Out of Scope for Phase 2
 - AGNTCY federation (Phase 3)
 - W3C DID support (Phase 3)
 - SOC 2 certification (Phase 3)
 - Rust/C++ SDKs (Phase 3)
 ---
 ## Dependencies
 | New Dependency | Purpose | CEO Approval Required |
 |---------------|---------|----------------------|
 | `@openpolicyagent/opa-wasm` | OPA policy evaluation | Yes |
 | `node-vault` | HashiCorp Vault client | Yes |
 | React 18 + Vite | Web dashboard | Yes |
 | `prom-client` | Prometheus metrics | Yes |
 | Terraform | Infrastructure as code | Yes |
 ---
 ## Delivery Sequence (per OpenSpec spec-first workflow)
 ```
 1. Vault integration (highest security impact)
 2. Python SDK (highest developer demand)
 3. Go SDK
 4. Java SDK
 5. OPA policy engine
 6. Web dashboard UI
 7. Prometheus + Grafana monitoring
 8. Multi-region deployment (Terraform)
 ```
 ---
 ## Success Criteria
 - All new dependencies CEO-approved before implementation begins
 - All new API endpoints have OpenAPI 3.0 specs before implementation
 - TypeScript strict mode + zero `any` maintained throughout
 - >80% test coverage on all new services
 - All SDKs pass the same QA gate: 14-endpoint coverage, typed errors, zero `any`
 - Web dashboard passes OWASP Top 10 security review
 - Monitoring stack ships with pre-built dashboards — zero manual setup required
--- a/openspec/changes/phase-2-production-ready/specs/deployment/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/deployment/spec.md
@@ -0,0 +1,44 @@
 # Spec: Multi-Region Deployment (Terraform)
 **Status**: Pending CEO approval
 **Workstream**: 8 of 8
 ## Scope
 - `terraform/` directory at project root
 - Shared `agentidp` module (compute, networking, secrets)
 - `environments/aws/` — ECS Fargate + RDS PostgreSQL + ElastiCache Redis
 - `environments/gcp/` — Cloud Run + Cloud SQL + Memorystore Redis
 - Deployment guide: `docs/devops/deployment.md`
 ## Module structure
 ```
 terraform/
  modules/
    agentidp/
      main.tf       — compute (ECS task or Cloud Run service)
      networking.tf — VPC, subnets, security groups
      variables.tf  — all configurable inputs
      outputs.tf    — service URL, DB endpoint, Redis endpoint
    rds/            — managed PostgreSQL
    redis/          — managed Redis
    lb/             — ALB (AWS) or Cloud LB (GCP), TLS cert
  environments/
    aws/
      main.tf       — calls modules, sets AWS-specific vars
      variables.tf
      terraform.tfvars.example
    gcp/
      main.tf
      variables.tf
      terraform.tfvars.example
 ```
 ## Acceptance Criteria
 - [ ] `terraform validate` passes for both aws and gcp environments
 - [ ] `terraform plan` produces no errors against a live AWS/GCP account (test in dev env)
 - [ ] JWT_PRIVATE_KEY and JWT_PUBLIC_KEY injected as environment secrets (not hardcoded)
 - [ ] TLS termination at load balancer — HTTPS only in production modules
 - [ ] PostgreSQL and Redis not publicly accessible — VPC-internal only
 - [ ] `docs/devops/deployment.md` — end-to-end deployment walkthrough for AWS and GCP
 - [ ] `terraform.tfvars.example` provided for both environments — no secrets in version control
--- a/openspec/changes/phase-2-production-ready/specs/go-sdk/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/go-sdk/spec.md
@@ -0,0 +1,23 @@
 # Spec: Go SDK (`github.com/sentryagent/idp-sdk-go`)
 **Status**: Pending CEO approval
 **Workstream**: 3 of 8
 ## Scope
 - `sdk-go/` directory at project root
 - Context-aware `AgentIdPClient` using standard library `net/http`
 - `TokenManager` with mutex-guarded cache and 60s auto-refresh
 - Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
 - Idiomatic Go error type `AgentIdPError` implementing `error` interface
 - `go.mod` module: `github.com/sentryagent/idp-sdk-go`
 - `sdk-go/README.md`
 ## Acceptance Criteria
 - [ ] All 14 endpoints covered
 - [ ] All methods take `context.Context` as first argument
 - [ ] No panics — all errors returned as `error`
 - [ ] `AgentIdPError` implements `error` and exposes `.Code`, `.HTTPStatus`, `.Details`
 - [ ] `TokenManager` is goroutine-safe (`sync.Mutex` on cache)
 - [ ] `go vet` and `staticcheck` pass with zero warnings
 - [ ] `go test ./...` with >80% coverage
 - [ ] README matches Node.js SDK structure
--- a/openspec/changes/phase-2-production-ready/specs/java-sdk/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/java-sdk/spec.md
@@ -0,0 +1,23 @@
 # Spec: Java SDK (`ai.sentryagent:idp-sdk`)
 **Status**: Pending CEO approval
 **Workstream**: 4 of 8
 ## Scope
 - `sdk-java/` directory at project root
 - `AgentIdPClient` with sync and `CompletableFuture` async variants
 - `TokenManager` with thread-safe cache and 60s auto-refresh
 - Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
 - `AgentIdPException` extending `RuntimeException` with `code`, `httpStatus`, `details`
 - `pom.xml`: groupId=`ai.sentryagent`, artifactId=`idp-sdk`, Java 17+
 - `sdk-java/README.md`
 ## Acceptance Criteria
 - [ ] All 14 endpoints covered
 - [ ] Sync methods return typed POJOs; async methods return `CompletableFuture<T>`
 - [ ] `AgentIdPException` thrown (not raw IOException) on all failure paths
 - [ ] `TokenManager` is thread-safe (`synchronized` on cache)
 - [ ] Apache HttpClient 5 for HTTP transport
 - [ ] Jackson for JSON serialization
 - [ ] `mvn verify` passes with >80% coverage (JUnit 5)
 - [ ] README matches Node.js SDK structure
--- a/openspec/changes/phase-2-production-ready/specs/monitoring/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/monitoring/spec.md
@@ -0,0 +1,32 @@
 # Spec: Prometheus + Grafana Monitoring
 **Status**: Pending CEO approval
 **Workstream**: 7 of 8
 ## Scope
 - `prom-client` integration — expose `GET /metrics`
 - 7 metrics (counters + histograms) across all services
 - `monitoring/` directory: Prometheus config + Grafana provisioning
 - `docker-compose.monitoring.yml` overlay (adds prometheus + grafana services)
 - Pre-built Grafana dashboard JSON (`monitoring/grafana/dashboards/agentidp.json`)
 ## Metrics
 | Metric | Type | Labels |
 |--------|------|--------|
 | `agentidp_tokens_issued_total` | Counter | `outcome` (success/failure) |
 | `agentidp_agents_registered_total` | Counter | `outcome` |
 | `agentidp_http_requests_total` | Counter | `method`, `path`, `status_code` |
 | `agentidp_http_request_duration_seconds` | Histogram | `method`, `path` |
 | `agentidp_rate_limit_rejections_total` | Counter | — |
 | `agentidp_db_query_duration_seconds` | Histogram | `operation` |
 | `agentidp_redis_command_duration_seconds` | Histogram | `command` |
 ## Acceptance Criteria
 - [ ] `GET /metrics` returns Prometheus text format
 - [ ] `/metrics` endpoint does NOT require Bearer auth (Prometheus scrapes it)
 - [ ] All 7 metrics present and updating under load
 - [ ] Grafana dashboard auto-provisions on `docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up`
 - [ ] Grafana runs on port 3001 (no conflict with AgentIdP on 3000)
 - [ ] `docs/devops/operations.md` updated with monitoring section
 - [ ] `prom-client` added as new dependency — CEO approval gate
--- a/openspec/changes/phase-2-production-ready/specs/opa-policy/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/opa-policy/spec.md
@@ -0,0 +1,37 @@
 # Spec: OPA Policy Engine Integration
 **Status**: Pending CEO approval
 **Workstream**: 5 of 8
 ## Scope
 - New `OpaMiddleware` replacing static scope check in `auth.ts`
 - `@openpolicyagent/opa-wasm` integration (embedded Wasm, no sidecar)
 - `policies/authz.rego` — main allow/deny policy
 - `policies/data/scopes.json` — scope to permission mapping
 - SIGHUP handler to hot-reload policies without restart
 - New env var: `POLICY_DIR` (default: `./policies`)
 ## Policy interface
 ```
 input = {
  "method": "GET",
  "path": "/api/v1/agents",
  "scopes": ["agents:read"],
  "agentId": "uuid"
 }
 output = {
  "allow": true | false,
  "reason": "string"   // populated when allow=false
 }
 ```
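
The policy interface above might map onto middleware code roughly as follows. This is a sketch under assumptions: the `VerifiedRequest` shape, the space-delimited `scope` claim, the use of `sub` as the agent ID, and the error body fields are all hypothetical; only the `input`/`output` fields and the `403` status come from this spec.

```typescript
interface PolicyInput {
  method: string;
  path: string;
  scopes: string[];
  agentId: string;
}

interface PolicyDecision {
  allow: boolean;
  reason?: string;
}

// Hypothetical view of a request after the auth middleware verified the JWT.
interface VerifiedRequest {
  method: string;
  path: string;
  token: { scope: string; sub: string };
}

function buildPolicyInput(req: VerifiedRequest): PolicyInput {
  return {
    method: req.method.toUpperCase(),
    path: req.path,
    // Assumes scopes arrive as one space-delimited string.
    scopes: req.token.scope.split(" ").filter((s) => s.length > 0),
    agentId: req.token.sub,
  };
}

// Returns null when the request may continue to the controller,
// otherwise the 403 response carrying the policy's reason.
function toDenyResponse(
  decision: PolicyDecision,
): { status: 403; body: { error: string; reason: string } } | null {
  if (decision.allow) return null;
  return {
    status: 403,
    body: { error: "forbidden", reason: decision.reason ?? "denied by policy" },
  };
}
```

Keeping both functions pure makes the allow/deny, missing-scope, and invalid-input unit tests straightforward to write without loading Wasm.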
 ## Acceptance Criteria
 - [ ] All existing scope checks replaced by OPA evaluation
 - [ ] Policy files hot-reloadable on SIGHUP (no restart required)
 - [ ] OPA Wasm loaded at startup — fail-fast if `POLICY_DIR` invalid
 - [ ] `allow=false` responses return `403` with `reason` in error body
 - [ ] Existing test suite passes unchanged (OPA evaluates same rules as before)
 - [ ] New unit tests for OPA middleware: allow/deny cases, missing scope, invalid input
 - [ ] `POLICY_DIR` env var documented in `docs/devops/environment-variables.md`
--- a/openspec/changes/phase-2-production-ready/specs/python-sdk/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/python-sdk/spec.md
@@ -0,0 +1,24 @@
 # Spec: Python SDK (`sentryagent-idp`)
 **Status**: Pending CEO approval
 **Workstream**: 2 of 8
 ## Scope
 - `sdk-python/` directory at project root
 - `AgentIdPClient` with sync and async variants
 - `TokenManager` with 60s auto-refresh
 - Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
 - `AgentIdPError` typed exception
 - Full type hints — `mypy --strict` clean
 - `sdk-python/README.md` with installation and usage
 ## Acceptance Criteria
 - [ ] All 14 API endpoints covered
 - [ ] Sync client: `requests` library
 - [ ] Async client: `httpx` library
 - [ ] `mypy --strict` passes with zero errors
 - [ ] Zero untyped code
 - [ ] `AgentIdPError` raised (not raw requests/httpx exceptions) on all failure paths
 - [ ] `TokenManager` tested: caches token, refreshes at exp-60s
 - [ ] `pyproject.toml` with: name=sentryagent-idp, python>=3.9, dependencies declared
 - [ ] README matches Node.js SDK structure
--- a/openspec/changes/phase-2-production-ready/specs/vault/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/vault/spec.md
@@ -0,0 +1,21 @@
 # Spec: HashiCorp Vault Integration
 **Status**: Pending CEO approval
 **Workstream**: 1 of 8
 ## Scope
 - VaultClient class wrapping `node-vault`
 - `005_add_vault_path.sql` migration
 - Updated CredentialService to write secrets to Vault instead of PostgreSQL
 - New env vars: VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT
 - Migration guide: bcrypt → Vault coexistence strategy
 ## Acceptance Criteria
 - [ ] New credentials: secret written to Vault KV v2, `vault_path` stored in PostgreSQL
 - [ ] Credential rotation: Vault versioned update, `vault_path` unchanged
 - [ ] Credential revocation: Vault secret deleted, DB status = `revoked`
 - [ ] Existing bcrypt credentials continue to work until rotated
 - [ ] VaultClient follows existing service interface pattern (DRY, SOLID)
 - [ ] Zero `any` types, TypeScript strict
 - [ ] `VAULT_ADDR` / `VAULT_TOKEN` validation at startup (fail-fast)
 - [ ] DevOps docs updated with Vault setup section
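
The fail-fast startup check in the criteria above could look something like this. The function name `loadVaultConfig`, the `VaultConfig` shape, and the URL-scheme check are assumptions for illustration; the variable names and the `secret` default come from the design.

```typescript
interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

// Validates Vault settings once at startup and throws on the first problem,
// so a misconfigured server never begins accepting requests.
function loadVaultConfig(env: Record<string, string | undefined>): VaultConfig {
  const addr = env["VAULT_ADDR"];
  const token = env["VAULT_TOKEN"];
  const missing: string[] = [];
  if (!addr) missing.push("VAULT_ADDR");
  if (!token) missing.push("VAULT_TOKEN");
  if (!addr || !token) {
    throw new Error(`missing required environment variables: ${missing.join(", ")}`);
  }
  if (!/^https?:\/\//.test(addr)) {
    throw new Error("VAULT_ADDR must start with http:// or https://");
  }
  // VAULT_MOUNT is optional and defaults to "secret".
  return { addr, token, mount: env["VAULT_MOUNT"] || "secret" };
}
```

In the server entry point this would be called with `process.env` before any route is registered.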
--- a/openspec/changes/phase-2-production-ready/specs/web-dashboard/spec.md
+++ b/openspec/changes/phase-2-production-ready/specs/web-dashboard/spec.md
@@ -0,0 +1,34 @@
 # Spec: Web Dashboard UI
 **Status**: Pending CEO approval
 **Workstream**: 6 of 8
 ## Scope
 - `dashboard/` directory at project root
 - React 18 + TypeScript strict, built with Vite 5
 - TanStack Query v5 for server state
 - shadcn/ui (Radix UI + Tailwind CSS) for components
 - Four pages: Agents, Credentials, Audit Log, Health
 - Client-side auth: `clientId` + `clientSecret` → `TokenManager`
 - Served from AgentIdP server at `GET /dashboard` (static build)
 ## Pages
 | Page | Route | Scope Required |
 |------|-------|---------------|
 | Login | `/dashboard/login` | None |
 | Agents | `/dashboard/agents` | `agents:read` |
 | Agent Detail | `/dashboard/agents/:id` | `agents:read` |
 | Credentials | `/dashboard/agents/:id/credentials` | `agents:read` |
 | Audit Log | `/dashboard/audit` | `audit:read` |
 | Health | `/dashboard/health` | None |
 ## Acceptance Criteria
 - [ ] TypeScript strict — zero `any` across all dashboard files
 - [ ] `dashboard/tsconfig.json` with `strict: true`
 - [ ] Login form stores token in `sessionStorage` only (not `localStorage`)
 - [ ] All write operations (suspend, revoke, rotate) require confirmation dialog
 - [ ] OWASP Top 10 review: no XSS, no CSRF, no sensitive data in URL params
 - [ ] Vite build outputs to `dashboard/dist/`; AgentIdP serves it as static
 - [ ] `dashboard/README.md` — how to build and serve
 - [ ] Responsive layout — functional on desktop and tablet
--- a/openspec/changes/phase-2-production-ready/tasks.md
+++ b/openspec/changes/phase-2-production-ready/tasks.md
@@ -0,0 +1,127 @@
 # Phase 2: Production-Ready — Tasks
 **Status**: Awaiting CEO dependency approvals before any implementation begins.
 ## CEO Approval Gates (required before implementation)
 - [ ] A0.1 Approve dependency: `node-vault` (Vault integration)
 - [ ] A0.2 Approve dependency: `@openpolicyagent/opa-wasm` (OPA policy engine)
 - [ ] A0.3 Approve dependency: React 18 + Vite 5 (web dashboard)
 - [ ] A0.4 Approve dependency: `prom-client` (Prometheus metrics)
 - [ ] A0.5 Approve dependency: Terraform (infrastructure as code)
 ---
 ## Workstream 1: HashiCorp Vault Integration
 - [ ] 1.1 Write `src/vault/VaultClient.ts` — wraps `node-vault`; methods: writeSecret, readSecret, deleteSecret, rotateSecret
 - [ ] 1.2 Write `src/db/migrations/005_add_vault_path.sql` — add `vault_path` column to `credentials`
 - [ ] 1.3 Update `CredentialService.ts` — new credentials use Vault; existing bcrypt credentials continue to work
 - [ ] 1.4 Update `docs/devops/environment-variables.md` — add VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT
 - [ ] 1.5 Write `docs/devops/vault-setup.md` — Vault dev server setup, production Vault config, migration guide
 - [ ] 1.6 Write unit tests for VaultClient (mocked Vault) and updated CredentialService
 - [ ] 1.7 QA sign-off: zero `any`, TypeScript strict, >80% coverage, coexistence verified
 ## Workstream 2: Python SDK
 - [ ] 2.1 Create `sdk-python/` with `pyproject.toml` — name: sentryagent-idp, python>=3.9
 - [ ] 2.2 Write `sdk-python/src/sentryagent_idp/types.py` — all request/response dataclasses
 - [ ] 2.3 Write `sdk-python/src/sentryagent_idp/errors.py` — AgentIdPError exception
 - [ ] 2.4 Write `sdk-python/src/sentryagent_idp/token_manager.py` — sync TokenManager
 - [ ] 2.5 Write `sdk-python/src/sentryagent_idp/async_token_manager.py` — async TokenManager (httpx)
 - [ ] 2.6 Write `sdk-python/src/sentryagent_idp/services/agents.py` — AgentRegistryClient (sync + async)
 - [ ] 2.7 Write `sdk-python/src/sentryagent_idp/services/credentials.py` — CredentialClient (sync + async)
 - [ ] 2.8 Write `sdk-python/src/sentryagent_idp/services/token.py` — TokenClient (sync + async)
 - [ ] 2.9 Write `sdk-python/src/sentryagent_idp/services/audit.py` — AuditClient (sync + async)
 - [ ] 2.10 Write `sdk-python/src/sentryagent_idp/client.py` — AgentIdPClient (sync) + AsyncAgentIdPClient
 - [ ] 2.11 Write `sdk-python/src/sentryagent_idp/__init__.py` — barrel exports
 - [ ] 2.12 Write `sdk-python/README.md`
 - [ ] 2.13 QA: `mypy --strict` clean, all 14 endpoints, AgentIdPError on all failure paths, pytest >80%
 ## Workstream 3: Go SDK
 - [ ] 3.1 Create `sdk-go/` with `go.mod` — module: github.com/sentryagent/idp-sdk-go, go 1.21
 - [ ] 3.2 Write `sdk-go/types.go` — all request/response structs
 - [ ] 3.3 Write `sdk-go/errors.go` — AgentIdPError type implementing error interface
 - [ ] 3.4 Write `sdk-go/token_manager.go` — mutex-guarded TokenManager
 - [ ] 3.5 Write `sdk-go/services/agents.go` — AgentRegistryClient
 - [ ] 3.6 Write `sdk-go/services/credentials.go` — CredentialClient
 - [ ] 3.7 Write `sdk-go/services/token.go` — TokenClient
 - [ ] 3.8 Write `sdk-go/services/audit.go` — AuditClient
 - [ ] 3.9 Write `sdk-go/client.go` — AgentIdPClient
 - [ ] 3.10 Write `sdk-go/README.md`
 - [ ] 3.11 QA: `go vet` clean, `staticcheck` clean, all 14 endpoints, goroutine-safe, `go test ./...` >80%
 ## Workstream 4: Java SDK
 - [ ] 4.1 Create `sdk-java/` with `pom.xml` — groupId: ai.sentryagent, artifactId: idp-sdk, Java 17
 - [ ] 4.2 Write all POJO request/response model classes
 - [ ] 4.3 Write `AgentIdPException.java` extending RuntimeException
 - [ ] 4.4 Write `TokenManager.java` — synchronized cache with 60s refresh buffer
 - [ ] 4.5 Write `AgentRegistryClient.java` — sync + CompletableFuture methods
 - [ ] 4.6 Write `CredentialClient.java` — sync + CompletableFuture methods
 - [ ] 4.7 Write `TokenClient.java` — sync + CompletableFuture methods
 - [ ] 4.8 Write `AuditClient.java` — sync + CompletableFuture methods
 - [ ] 4.9 Write `AgentIdPClient.java` — composes all service clients
 - [ ] 4.10 Write `sdk-java/README.md`
 - [ ] 4.11 QA: `mvn verify` passes, all 14 endpoints, AgentIdPException on all failure paths, JUnit 5 >80%
 ## Workstream 5: OPA Policy Engine
 - [ ] 5.1 Write `policies/authz.rego` — allow/deny rules matching all current scope checks
 - [ ] 5.2 Write `policies/data/scopes.json` — scope to endpoint permission mapping
 - [ ] 5.3 Write `src/middleware/opa.ts` — OpaMiddleware: loads Wasm, evaluates input, returns allow/deny
 - [ ] 5.4 Replace static scope check in `src/middleware/auth.ts` with OpaMiddleware
 - [ ] 5.5 Add SIGHUP handler in `src/server.ts` to hot-reload policy files
 - [ ] 5.6 Update `docs/devops/environment-variables.md` — add POLICY_DIR
 - [ ] 5.7 QA: all existing auth tests pass unchanged, new OPA unit tests, hot-reload verified
 ## Workstream 6: Web Dashboard UI
 - [ ] 6.1 Create `dashboard/` with Vite 5 + React 18 + TypeScript strict configuration
 - [ ] 6.2 Set up shadcn/ui with Tailwind CSS
 - [ ] 6.3 Write `dashboard/src/lib/auth.ts` — credential entry, TokenManager, sessionStorage
 - [ ] 6.4 Write `dashboard/src/lib/client.ts` — wraps @sentryagent/idp-sdk AgentIdPClient
 - [ ] 6.5 Write Login page (`/dashboard/login`)
 - [ ] 6.6 Write Agents page (`/dashboard/agents`) — list, search, filter by status
 - [ ] 6.7 Write Agent Detail page (`/dashboard/agents/:id`) — suspend/reactivate with confirm dialog
 - [ ] 6.8 Write Credentials page (`/dashboard/agents/:id/credentials`) — rotate/revoke with confirm
 - [ ] 6.9 Write Audit Log page (`/dashboard/audit`) — filters, pagination
 - [ ] 6.10 Write Health page (`/dashboard/health`) — PostgreSQL + Redis connectivity status
 - [ ] 6.11 Configure AgentIdP Express app to serve `dashboard/dist/` at `/dashboard`
 - [ ] 6.12 Write `dashboard/README.md`
 - [ ] 6.13 QA: TypeScript strict, zero `any`, OWASP Top 10 review, responsive layout verified
 ## Workstream 7: Prometheus + Grafana Monitoring
 - [ ] 7.1 Add `prom-client` to dependencies (after CEO approval A0.4)
 - [ ] 7.2 Write `src/metrics/registry.ts` — shared Prometheus Registry with all 7 metric definitions
 - [ ] 7.3 Instrument `OAuth2Service.ts` — increment `agentidp_tokens_issued_total`
 - [ ] 7.4 Instrument `AgentService.ts` — increment `agentidp_agents_registered_total`
 - [ ] 7.5 Instrument `src/middleware/` — HTTP request counter and duration histogram
 - [ ] 7.6 Instrument `src/db/pool.ts` — DB query duration histogram
 - [ ] 7.7 Instrument `src/cache/redis.ts` — Redis command duration histogram
 - [ ] 7.8 Add `GET /metrics` route (unauthenticated, Prometheus text format)
 - [ ] 7.9 Write `monitoring/prometheus/prometheus.yml` — scrape config
 - [ ] 7.10 Write `monitoring/grafana/provisioning/` — datasource + dashboard provisioning
 - [ ] 7.11 Write `monitoring/grafana/dashboards/agentidp.json` — pre-built Grafana dashboard
 - [ ] 7.12 Write `docker-compose.monitoring.yml` overlay
 - [ ] 7.13 Update `docs/devops/operations.md` — monitoring section
 - [ ] 7.14 QA: all 7 metrics verified under load, Grafana auto-provisions, no auth leak on /metrics
 ## Workstream 8: Multi-Region Deployment (Terraform)
 - [ ] 8.1 Write `terraform/modules/agentidp/main.tf` + `variables.tf` + `outputs.tf`
 - [ ] 8.2 Write `terraform/modules/rds/` — managed PostgreSQL module
 - [ ] 8.3 Write `terraform/modules/redis/` — managed Redis module
 - [ ] 8.4 Write `terraform/modules/lb/` — load balancer + TLS module
 - [ ] 8.5 Write `terraform/environments/aws/main.tf` + `variables.tf` + `terraform.tfvars.example`
 - [ ] 8.6 Write `terraform/environments/gcp/main.tf` + `variables.tf` + `terraform.tfvars.example`
 - [ ] 8.7 Write `docs/devops/deployment.md` — end-to-end AWS and GCP deployment walkthrough
 - [ ] 8.8 QA: `terraform validate` passes, secrets not hardcoded, TLS enforced, DB/Redis VPC-internal
 ---
 ## Phase 2 Complete Criteria
 All 8 workstreams done. All tasks checked. All QA gates passed. CEO reviewed.