# Phase 2: Production-Ready — Tasks

**Status**: In progress — Workstreams 1 through 6 complete.

## CEO Approval Gates (required before implementation)

- [x] A0.1 Approve dependency: `node-vault` (Vault integration)
- [x] A0.2 Approve dependency: `@openpolicyagent/opa-wasm` (OPA policy engine)
- [x] A0.3 Approve dependency: React 18 + Vite 5 (web dashboard)
- [x] A0.4 Approve dependency: `prom-client` (Prometheus metrics)
- [x] A0.5 Approve dependency: Terraform (infrastructure as code)

---

## Workstream 1: HashiCorp Vault Integration

- [x] 1.1 Write `src/vault/VaultClient.ts` — wraps `node-vault`; methods: writeSecret, readSecret, deleteSecret, verifySecret
- [x] 1.2 Write `src/db/migrations/005_add_vault_path.sql` — add `vault_path` column to `credentials`
- [x] 1.3 Update `CredentialService.ts` — new credentials use Vault; existing bcrypt credentials continue to work
- [x] 1.4 Update `docs/devops/environment-variables.md` — add VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT
- [x] 1.5 Write `docs/devops/vault-setup.md` — Vault dev server setup, production Vault config, migration guide
- [x] 1.6 Write unit tests for VaultClient (mocked Vault) and updated CredentialService
- [x] 1.7 QA sign-off: zero `any`, TypeScript strict, >80% coverage, coexistence verified

## Workstream 2: Python SDK

- [x] 2.1 Create `sdk-python/` with `pyproject.toml` — name: sentryagent-idp, python>=3.9
- [x] 2.2 Write `sdk-python/src/sentryagent_idp/types.py` — all request/response dataclasses
- [x] 2.3 Write `sdk-python/src/sentryagent_idp/errors.py` — AgentIdPError exception
- [x] 2.4 Write `sdk-python/src/sentryagent_idp/token_manager.py` — sync TokenManager
- [x] 2.5 Write `sdk-python/src/sentryagent_idp/async_token_manager.py` — async TokenManager (httpx)
- [x] 2.6 Write `sdk-python/src/sentryagent_idp/services/agents.py` — AgentRegistryClient (sync + async)
- [x] 2.7 Write `sdk-python/src/sentryagent_idp/services/credentials.py` — CredentialClient (sync + async)
- [x] 2.8 Write `sdk-python/src/sentryagent_idp/services/token.py` — TokenClient (sync + async)
- [x] 2.9 Write `sdk-python/src/sentryagent_idp/services/audit.py` — AuditClient (sync + async)
- [x] 2.10 Write `sdk-python/src/sentryagent_idp/client.py` — AgentIdPClient (sync) + AsyncAgentIdPClient
- [x] 2.11 Write `sdk-python/src/sentryagent_idp/__init__.py` — barrel exports
- [x] 2.12 Write `sdk-python/README.md`
- [x] 2.13 QA: `mypy --strict` clean, all 14 endpoints, AgentIdPError on all failure paths, pytest >80%

## Workstream 3: Go SDK

- [x] 3.1 Create `sdk-go/` with `go.mod` — module: github.com/sentryagent/idp-sdk-go, go 1.21
- [x] 3.2 Write `sdk-go/types.go` — all request/response structs
- [x] 3.3 Write `sdk-go/errors.go` — AgentIdPError type implementing error interface
- [x] 3.4 Write `sdk-go/token_manager.go` — mutex-guarded TokenManager
- [x] 3.5 Write `sdk-go/agents.go` — AgentRegistryClient (flat package; see ADR below)
- [x] 3.6 Write `sdk-go/credentials.go` — CredentialClient
- [x] 3.7 Write `sdk-go/token_service.go` — TokenServiceClient
- [x] 3.8 Write `sdk-go/audit.go` — AuditClient
- [x] 3.9 Write `sdk-go/client.go` — AgentIdPClient
- [x] 3.10 Write `sdk-go/README.md`
- [x] 3.11 QA: `go vet` clean, `staticcheck` clean, all 14 endpoints, goroutine-safe, `go test ./...` >80%

## Workstream 4: Java SDK

- [x] 4.1 Create `sdk-java/` with `pom.xml` — groupId: ai.sentryagent, artifactId: idp-sdk, Java 17
- [x] 4.2 Write all POJO request/response model classes
- [x] 4.3 Write `AgentIdPException.java` extending RuntimeException
- [x] 4.4 Write `TokenManager.java` — synchronized cache with 60s refresh buffer
- [x] 4.5 Write `AgentRegistryClient.java` — sync + CompletableFuture methods
- [x] 4.6 Write `CredentialClient.java` — sync + CompletableFuture methods
- [x] 4.7 Write `TokenClient.java` — sync + CompletableFuture methods
- [x] 4.8 Write `AuditClient.java` — sync + CompletableFuture methods
- [x] 4.9 Write `AgentIdPClient.java` — composes all service clients
- [x] 4.10 Write `sdk-java/README.md`
- [x] 4.11 QA: `mvn verify` passes, all 14 endpoints, AgentIdPException on all failure paths, JUnit 5 >80%

## Workstream 5: OPA Policy Engine

- [x] 5.1 Write `policies/authz.rego` — allow/deny rules matching all current scope checks
- [x] 5.2 Write `policies/data/scopes.json` — scope to endpoint permission mapping
- [x] 5.3 Write `src/middleware/opa.ts` — OpaMiddleware: loads Wasm, evaluates input, returns allow/deny
- [x] 5.4 Replace static scope check in `src/middleware/auth.ts` with OpaMiddleware
- [x] 5.5 Add SIGHUP handler in `src/server.ts` to hot-reload policy files
- [x] 5.6 Update `docs/devops/environment-variables.md` — add POLICY_DIR
- [x] 5.7 QA: all existing auth tests pass unchanged, new OPA unit tests, hot-reload verified

## Workstream 6: Web Dashboard UI

- [x] 6.1 Create `dashboard/` with Vite 5 + React 18 + TypeScript strict configuration
- [x] 6.2 Set up shadcn/ui with Tailwind CSS
- [x] 6.3 Write `dashboard/src/lib/auth.ts` — credential entry, TokenManager, sessionStorage
- [x] 6.4 Write `dashboard/src/lib/client.ts` — wraps @sentryagent/idp-sdk AgentIdPClient
- [x] 6.5 Write Login page (`/dashboard/login`)
- [x] 6.6 Write Agents page (`/dashboard/agents`) — list, search, filter by status
- [x] 6.7 Write Agent Detail page (`/dashboard/agents/:id`) — suspend/reactivate with confirm dialog
- [x] 6.8 Write Credentials page (`/dashboard/agents/:id/credentials`) — rotate/revoke with confirm
- [x] 6.9 Write Audit Log page (`/dashboard/audit`) — filters, pagination
- [x] 6.10 Write Health page (`/dashboard/health`) — PostgreSQL + Redis connectivity status
- [x] 6.11 Configure AgentIdP Express app to serve `dashboard/dist/` at `/dashboard`
- [x] 6.12 Write `dashboard/README.md`
- [x] 6.13 QA: TypeScript strict, zero `any`, OWASP Top 10 review, responsive layout verified

## Workstream 7: Prometheus + Grafana Monitoring

- [ ] 7.1 Add `prom-client` to dependencies (after CEO approval A0.4)
- [ ] 7.2 Write `src/metrics/registry.ts` — shared Prometheus Registry with all 7 metric definitions
- [ ] 7.3 Instrument `OAuth2Service.ts` — increment `agentidp_tokens_issued_total`
- [ ] 7.4 Instrument `AgentService.ts` — increment `agentidp_agents_registered_total`
- [ ] 7.5 Instrument `src/middleware/` — HTTP request counter and duration histogram
- [ ] 7.6 Instrument `src/db/pool.ts` — DB query duration histogram
- [ ] 7.7 Instrument `src/cache/redis.ts` — Redis command duration histogram
- [ ] 7.8 Add `GET /metrics` route (unauthenticated, Prometheus text format)
- [ ] 7.9 Write `monitoring/prometheus/prometheus.yml` — scrape config
- [ ] 7.10 Write `monitoring/grafana/provisioning/` — datasource + dashboard provisioning
- [ ] 7.11 Write `monitoring/grafana/dashboards/agentidp.json` — pre-built Grafana dashboard
- [ ] 7.12 Write `docker-compose.monitoring.yml` overlay
- [ ] 7.13 Update `docs/devops/operations.md` — monitoring section
- [ ] 7.14 QA: all 7 metrics verified under load, Grafana auto-provisions, no auth leak on /metrics

## Workstream 8: Multi-Region Deployment (Terraform)

- [ ] 8.1 Write `terraform/modules/agentidp/main.tf` + `variables.tf` + `outputs.tf`
- [ ] 8.2 Write `terraform/modules/rds/` — managed PostgreSQL module
- [ ] 8.3 Write `terraform/modules/redis/` — managed Redis module
- [ ] 8.4 Write `terraform/modules/lb/` — load balancer + TLS module
- [ ] 8.5 Write `terraform/environments/aws/main.tf` + `variables.tf` + `terraform.tfvars.example`
- [ ] 8.6 Write `terraform/environments/gcp/main.tf` + `variables.tf` + `terraform.tfvars.example`
- [ ] 8.7 Write `docs/devops/deployment.md` — end-to-end AWS and GCP deployment walkthrough
- [ ] 8.8 QA: `terraform validate` passes, secrets not hardcoded, TLS enforced, DB/Redis VPC-internal

---

## Phase 2 Complete Criteria

All 8 workstreams done. All tasks checked. All QA gates passed. CEO reviewed.
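
## Illustrative Sketch: Token Cache with Refresh Buffer

Tasks 2.4, 3.4, and 4.4 each call for a TokenManager; 3.4 specifies a mutex guard and 4.4 a synchronized cache with a 60s refresh buffer. As a rough illustration of that pattern, here is a minimal Python sketch. It is not the SDK's actual code: the `fetch` callback, the `clock` parameter, and the class layout are assumptions made for this example, and applying the 60s buffer to a Python class is itself borrowed from task 4.4 rather than stated for the Python SDK.

```python
import threading
import time
from typing import Callable, Optional, Tuple

# Hypothetical type for this sketch: a fetcher returns
# (access_token, expires_in_seconds), e.g. from a token endpoint.
TokenFetcher = Callable[[], Tuple[str, float]]


class TokenManager:
    """Caches a token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch: TokenFetcher,
        refresh_buffer: float = 60.0,  # assumed default, per task 4.4
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._refresh_buffer = refresh_buffer
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        # Holding the lock for the whole check-and-refresh means
        # concurrent callers never run two refreshes at once.
        with self._lock:
            now = self._clock()
            stale = now >= self._expires_at - self._refresh_buffer
            if self._token is None or stale:
                token, expires_in = self._fetch()
                self._token = token
                self._expires_at = now + expires_in
            return self._token
```

A caller would construct the manager once and call `get_token()` before each request. Injecting `clock` keeps the expiry logic testable without sleeping; a real implementation might instead release the lock during the network call and coalesce waiters, which this sketch does not attempt.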