chore: Phase 2 OpenSpec scoping — proposal, design, specs, tasks

8 workstreams scoped per OpenSpec standards:
1. HashiCorp Vault integration (secret management)
2. Python SDK (sentryagent-idp)
3. Go SDK (idp-sdk-go)
4. Java SDK (ai.sentryagent:idp-sdk)
5. OPA policy engine (dynamic ABAC, hot-reload Rego)
6. Web Dashboard UI (React 18 + TypeScript)
7. Prometheus + Grafana monitoring (7 metrics, pre-built dashboard)
8. Multi-region Terraform deployment (AWS + GCP)

Status: proposed — awaiting CEO dependency approvals (A0.1–A0.5)
before any implementation begins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
SentryAgent.ai Developer
2026-03-28 14:53:09 +00:00
parent 0d56895fae
commit 7593bfe1c1
12 changed files with 682 additions and 0 deletions

View File

@@ -0,0 +1,44 @@
# Spec: Multi-Region Deployment (Terraform)
**Status**: Pending CEO approval
**Workstream**: 8 of 8
## Scope
- `terraform/` directory at project root
- Shared `agentidp` module (compute, networking, secrets)
- `environments/aws/` — ECS Fargate + RDS PostgreSQL + ElastiCache Redis
- `environments/gcp/` — Cloud Run + Cloud SQL + Memorystore Redis
- Deployment guide: `docs/devops/deployment.md`
## Module structure
```
terraform/
modules/
agentidp/
main.tf — compute (ECS task or Cloud Run service)
networking.tf — VPC, subnets, security groups
variables.tf — all configurable inputs
outputs.tf — service URL, DB endpoint, Redis endpoint
rds/ — managed PostgreSQL
redis/ — managed Redis
lb/ — ALB (AWS) or Cloud LB (GCP), TLS cert
environments/
aws/
main.tf — calls modules, sets AWS-specific vars
variables.tf
terraform.tfvars.example
gcp/
main.tf
variables.tf
terraform.tfvars.example
```
## Acceptance Criteria
- [ ] `terraform validate` passes for both aws and gcp environments
- [ ] `terraform plan` produces no errors against a live AWS/GCP account (test in dev env)
- [ ] JWT_PRIVATE_KEY and JWT_PUBLIC_KEY injected as environment secrets (not hardcoded)
- [ ] TLS termination at load balancer — HTTPS only in production modules
- [ ] PostgreSQL and Redis not publicly accessible — VPC-internal only
- [ ] `docs/devops/deployment.md` — end-to-end deployment walkthrough for AWS and GCP
- [ ] `terraform.tfvars.example` provided for both environments — no secrets in version control

View File

@@ -0,0 +1,23 @@
# Spec: Go SDK (`github.com/sentryagent/idp-sdk-go`)
**Status**: Pending CEO approval
**Workstream**: 3 of 8
## Scope
- `sdk-go/` directory at project root
- Context-aware `AgentIdPClient` using standard library `net/http`
- `TokenManager` with mutex-guarded cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- Idiomatic Go error type `AgentIdPError` implementing `error` interface
- `go.mod` module: `github.com/sentryagent/idp-sdk-go`
- `sdk-go/README.md`
## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] All methods take `context.Context` as first argument
- [ ] No panics — all errors returned as `error`
- [ ] `AgentIdPError` implements `error` and exposes `.Code`, `.HTTPStatus`, `.Details`
- [ ] `TokenManager` is goroutine-safe (`sync.Mutex` on cache)
- [ ] `go vet` and `staticcheck` pass with zero warnings
- [ ] `go test ./...` with >80% coverage
- [ ] README matches Node.js SDK structure

View File

@@ -0,0 +1,23 @@
# Spec: Java SDK (`ai.sentryagent:idp-sdk`)
**Status**: Pending CEO approval
**Workstream**: 4 of 8
## Scope
- `sdk-java/` directory at project root
- `AgentIdPClient` with sync and `CompletableFuture` async variants
- `TokenManager` with thread-safe cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPException` extending `RuntimeException` with `code`, `httpStatus`, `details`
- `pom.xml`: groupId=`ai.sentryagent`, artifactId=`idp-sdk`, Java 17+
- `sdk-java/README.md`
## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] Sync methods return typed POJOs; async methods return `CompletableFuture<T>`
- [ ] `AgentIdPException` thrown (not raw IOException) on all failure paths
- [ ] `TokenManager` is thread-safe (`synchronized` on cache)
- [ ] Apache HttpClient 5 for HTTP transport
- [ ] Jackson for JSON serialization
- [ ] `mvn verify` passes with >80% coverage (JUnit 5)
- [ ] README matches Node.js SDK structure

View File

@@ -0,0 +1,32 @@
# Spec: Prometheus + Grafana Monitoring
**Status**: Pending CEO approval
**Workstream**: 7 of 8
## Scope
- `prom-client` integration — expose `GET /metrics`
- 7 metrics (counters + histograms) across all services
- `monitoring/` directory: Prometheus config + Grafana provisioning
- `docker-compose.monitoring.yml` overlay (adds prometheus + grafana services)
- Pre-built Grafana dashboard JSON (`monitoring/grafana/dashboards/agentidp.json`)
## Metrics
| Metric | Type | Labels |
|--------|------|--------|
| `agentidp_tokens_issued_total` | Counter | `outcome` (success/failure) |
| `agentidp_agents_registered_total` | Counter | `outcome` |
| `agentidp_http_requests_total` | Counter | `method`, `path`, `status_code` |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `path` |
| `agentidp_rate_limit_rejections_total` | Counter | — |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` |
## Acceptance Criteria
- [ ] `GET /metrics` returns Prometheus text format
- [ ] `/metrics` endpoint does NOT require Bearer auth (Prometheus scrapes it)
- [ ] All 7 metrics present and updating under load
- [ ] Grafana dashboard auto-provisions on `docker compose -f docker-compose.monitoring.yml up`
- [ ] Grafana runs on port 3001 (no conflict with AgentIdP on 3000)
- [ ] `docs/devops/operations.md` updated with monitoring section
- [ ] `prom-client` added as new dependency — CEO approval gate

View File

@@ -0,0 +1,37 @@
# Spec: OPA Policy Engine Integration
**Status**: Pending CEO approval
**Workstream**: 5 of 8
## Scope
- New `OpaMiddleware` replacing static scope check in `auth.ts`
- `@openpolicyagent/opa-wasm` integration (embedded Wasm, no sidecar)
- `policies/authz.rego` — main allow/deny policy
- `policies/data/scopes.json` — scope to permission mapping
- SIGHUP handler to hot-reload policies without restart
- New env var: `POLICY_DIR` (default: `./policies`)
## Policy interface
```
input = {
"method": "GET",
"path": "/api/v1/agents",
"scopes": ["agents:read"],
"agentId": "uuid"
}
output = {
"allow": true | false,
"reason": "string" // populated when allow=false
}
```
## Acceptance Criteria
- [ ] All existing scope checks replaced by OPA evaluation
- [ ] Policy files hot-reloadable on SIGHUP (no restart required)
- [ ] OPA Wasm loaded at startup — fail-fast if `POLICY_DIR` invalid
- [ ] `allow=false` responses return `403` with `reason` in error body
- [ ] Existing test suite passes unchanged (OPA evaluates same rules as before)
- [ ] New unit tests for OPA middleware: allow/deny cases, missing scope, invalid input
- [ ] `POLICY_DIR` env var documented in `docs/devops/environment-variables.md`

View File

@@ -0,0 +1,24 @@
# Spec: Python SDK (`sentryagent-idp`)
**Status**: Pending CEO approval
**Workstream**: 2 of 8
## Scope
- `sdk-python/` directory at project root
- `AgentIdPClient` with sync and async variants
- `TokenManager` with 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPError` typed exception
- Full type hints — `mypy --strict` clean
- `sdk-python/README.md` with installation and usage
## Acceptance Criteria
- [ ] All 14 API endpoints covered
- [ ] Sync client: `requests` library
- [ ] Async client: `httpx` library
- [ ] `mypy --strict` passes with zero errors
- [ ] Zero untyped code
- [ ] `AgentIdPError` raised (not raw requests/httpx exceptions) on all failure paths
- [ ] `TokenManager` tested: caches token, refreshes at exp-60s
- [ ] `pyproject.toml` with: name=sentryagent-idp, python>=3.9, dependencies declared
- [ ] README matches Node.js SDK structure

View File

@@ -0,0 +1,21 @@
# Spec: HashiCorp Vault Integration
**Status**: Pending CEO approval
**Workstream**: 1 of 8
## Scope
- VaultClient class wrapping `node-vault`
- `005_add_vault_path.sql` migration
- Updated CredentialService to write secrets to Vault instead of PostgreSQL
- New env vars: VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT
- Migration guide: bcrypt → Vault coexistence strategy
## Acceptance Criteria
- [ ] New credentials: secret written to Vault KV v2, `vault_path` stored in PostgreSQL
- [ ] Credential rotation: Vault versioned update, `vault_path` unchanged
- [ ] Credential revocation: Vault secret deleted, DB status = `revoked`
- [ ] Existing bcrypt credentials continue to work until rotated
- [ ] VaultClient follows existing service interface pattern (DRY, SOLID)
- [ ] Zero `any` types, TypeScript strict
- [ ] `VAULT_ADDR` / `VAULT_TOKEN` validation at startup (fail-fast)
- [ ] DevOps docs updated with Vault setup section

View File

@@ -0,0 +1,34 @@
# Spec: Web Dashboard UI
**Status**: Pending CEO approval
**Workstream**: 6 of 8
## Scope
- `dashboard/` directory at project root
- React 18 + TypeScript strict, built with Vite 5
- TanStack Query v5 for server state
- shadcn/ui (Radix UI + Tailwind CSS) for components
- Four pages: Agents, Credentials, Audit Log, Health
- Client-side auth: `clientId` + `clientSecret``TokenManager`
- Served from AgentIdP server at `GET /dashboard` (static build)
## Pages
| Page | Route | Scope Required |
|------|-------|---------------|
| Login | `/dashboard/login` | None |
| Agents | `/dashboard/agents` | `agents:read` |
| Agent Detail | `/dashboard/agents/:id` | `agents:read` |
| Credentials | `/dashboard/agents/:id/credentials` | `agents:read` |
| Audit Log | `/dashboard/audit` | `audit:read` |
| Health | `/dashboard/health` | None |
## Acceptance Criteria
- [ ] TypeScript strict — zero `any` across all dashboard files
- [ ] `dashboard/tsconfig.json` with `strict: true`
- [ ] Login form stores token in `sessionStorage` only (not `localStorage`)
- [ ] All write operations (suspend, revoke, rotate) require confirmation dialog
- [ ] OWASP Top 10 review: no XSS, no CSRF, no sensitive data in URL params
- [ ] Vite build outputs to `dashboard/dist/`; AgentIdP serves it as static
- [ ] `dashboard/README.md` — how to build and serve
- [ ] Responsive layout — functional on desktop and tablet