chore: Phase 2 OpenSpec scoping — proposal, design, specs, tasks
8 workstreams scoped per OpenSpec standards:
1. HashiCorp Vault integration (secret management)
2. Python SDK (sentryagent-idp)
3. Go SDK (idp-sdk-go)
4. Java SDK (ai.sentryagent:idp-sdk)
5. OPA policy engine (dynamic ABAC, hot-reload Rego)
6. Web Dashboard UI (React 18 + TypeScript)
7. Prometheus + Grafana monitoring (7 metrics, pre-built dashboard)
8. Multi-region Terraform deployment (AWS + GCP)

Status: proposed, awaiting CEO dependency approvals (A0.1–A0.5) before any implementation begins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,44 @@
# Spec: Multi-Region Deployment (Terraform)

**Status**: Pending CEO approval
**Workstream**: 8 of 8

## Scope
- `terraform/` directory at project root
- Shared `agentidp` module (compute, networking, secrets)
- `environments/aws/`: ECS Fargate + RDS PostgreSQL + ElastiCache Redis
- `environments/gcp/`: Cloud Run + Cloud SQL + Memorystore Redis
- Deployment guide: `docs/devops/deployment.md`

## Module structure

```
terraform/
  modules/
    agentidp/
      main.tf          # compute (ECS task or Cloud Run service)
      networking.tf    # VPC, subnets, security groups
      variables.tf     # all configurable inputs
      outputs.tf       # service URL, DB endpoint, Redis endpoint
    rds/               # managed PostgreSQL
    redis/             # managed Redis
    lb/                # ALB (AWS) or Cloud LB (GCP), TLS cert
  environments/
    aws/
      main.tf          # calls modules, sets AWS-specific vars
      variables.tf
      terraform.tfvars.example
    gcp/
      main.tf
      variables.tf
      terraform.tfvars.example
```

## Acceptance Criteria
- [ ] `terraform validate` passes for both aws and gcp environments
- [ ] `terraform plan` produces no errors against a live AWS/GCP account (test in dev env)
- [ ] `JWT_PRIVATE_KEY` and `JWT_PUBLIC_KEY` injected as environment secrets (not hardcoded)
- [ ] TLS termination at load balancer; HTTPS only in production modules
- [ ] PostgreSQL and Redis not publicly accessible (VPC-internal only)
- [ ] `docs/devops/deployment.md` provides an end-to-end deployment walkthrough for AWS and GCP
- [ ] `terraform.tfvars.example` provided for both environments; no secrets in version control
@@ -0,0 +1,23 @@
# Spec: Go SDK (`github.com/sentryagent/idp-sdk-go`)

**Status**: Pending CEO approval
**Workstream**: 3 of 8

## Scope
- `sdk-go/` directory at project root
- Context-aware `AgentIdPClient` using standard library `net/http`
- `TokenManager` with mutex-guarded cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- Idiomatic Go error type `AgentIdPError` implementing `error` interface
- `go.mod` module: `github.com/sentryagent/idp-sdk-go`
- `sdk-go/README.md`

## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] All methods take `context.Context` as first argument
- [ ] No panics: all errors returned as `error`
- [ ] `AgentIdPError` implements `error` and exposes `.Code`, `.HTTPStatus`, `.Details`
- [ ] `TokenManager` is goroutine-safe (`sync.Mutex` on cache)
- [ ] `go vet` and `staticcheck` pass with zero warnings
- [ ] `go test ./...` with >80% coverage
- [ ] README matches Node.js SDK structure
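The mutex-guarded cache in the criteria above can be illustrated with a language-neutral sketch, written here in Python. It shows only the locking pattern: the check and the fetch happen under one lock, so concurrent callers do not trigger duplicate fetches. The class name `LockedTokenCache`, the fetcher signature, and the reading of "60s auto-refresh" as "refresh 60 seconds before expiry" are assumptions for illustration; a Go implementation would use `sync.Mutex` as the spec states.

```python
import threading
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetcher signature: returns (token, expires_at_epoch_seconds).
FetchToken = Callable[[], Tuple[str, float]]


class LockedTokenCache:
    """Token cache guarded by a single lock (the role sync.Mutex plays in Go)."""

    def __init__(self, fetch: FetchToken, refresh_margin: float = 60.0) -> None:
        self._fetch = fetch
        self._margin = refresh_margin  # assumed: refresh this long before expiry
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Hold the lock across the whole check-then-fetch sequence so that
        # only one caller fetches while the others wait for the result.
        with self._lock:
            if self._token is None or time.time() >= self._expires_at - self._margin:
                self._token, self._expires_at = self._fetch()
            return self._token
```

Holding the lock during the fetch serializes callers, which is simple and correct but blocks readers while a refresh is in flight; whether that trade-off is acceptable is a design decision the spec leaves open.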
@@ -0,0 +1,23 @@
# Spec: Java SDK (`ai.sentryagent:idp-sdk`)

**Status**: Pending CEO approval
**Workstream**: 4 of 8

## Scope
- `sdk-java/` directory at project root
- `AgentIdPClient` with sync and `CompletableFuture` async variants
- `TokenManager` with thread-safe cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPException` extending `RuntimeException` with `code`, `httpStatus`, `details`
- `pom.xml`: groupId=`ai.sentryagent`, artifactId=`idp-sdk`, Java 17+
- `sdk-java/README.md`

## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] Sync methods return typed POJOs; async methods return `CompletableFuture<T>`
- [ ] `AgentIdPException` thrown (not raw IOException) on all failure paths
- [ ] `TokenManager` is thread-safe (`synchronized` on cache)
- [ ] Apache HttpClient 5 for HTTP transport
- [ ] Jackson for JSON serialization
- [ ] `mvn verify` passes with >80% coverage (JUnit 5)
- [ ] README matches Node.js SDK structure
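The requirement that every failure path surfaces a typed exception carrying `code`, `httpStatus`, and `details` can be sketched in a language-neutral way, shown here in Python. The error body shape (`{"code": ..., "details": {...}}`), the fallback code `UNKNOWN_ERROR`, and the helper name `error_from_response` are assumptions for illustration; the spec does not define the wire format.

```python
import json
from typing import Any, Dict, Optional


class AgentIdPException(Exception):
    """Typed error carrying the three fields named in the spec."""

    def __init__(
        self,
        code: str,
        http_status: int,
        details: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(f"{code} (HTTP {http_status})")
        self.code = code
        self.http_status = http_status
        self.details = details or {}


def error_from_response(status: int, body: str) -> AgentIdPException:
    """Build a typed error from a failed HTTP response.

    Assumes a hypothetical JSON error body; anything unparseable falls back
    to a generic code so callers never see a raw parsing error.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    details = payload.get("details")
    return AgentIdPException(
        code=str(payload.get("code", "UNKNOWN_ERROR")),
        http_status=status,
        details=details if isinstance(details, dict) else {},
    )
```

In the Java SDK the same mapping would sit wherever transport or deserialization errors are caught, so that callers only ever handle `AgentIdPException`.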
@@ -0,0 +1,32 @@
# Spec: Prometheus + Grafana Monitoring

**Status**: Pending CEO approval
**Workstream**: 7 of 8

## Scope
- `prom-client` integration: expose `GET /metrics`
- 7 metrics (counters + histograms) across all services
- `monitoring/` directory: Prometheus config + Grafana provisioning
- `docker-compose.monitoring.yml` overlay (adds prometheus + grafana services)
- Pre-built Grafana dashboard JSON (`monitoring/grafana/dashboards/agentidp.json`)

## Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `agentidp_tokens_issued_total` | Counter | `outcome` (success/failure) |
| `agentidp_agents_registered_total` | Counter | `outcome` |
| `agentidp_http_requests_total` | Counter | `method`, `path`, `status_code` |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `path` |
| `agentidp_rate_limit_rejections_total` | Counter | (none) |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` |

## Acceptance Criteria
- [ ] `GET /metrics` returns Prometheus text format
- [ ] `/metrics` endpoint does NOT require Bearer auth (Prometheus scrapes it)
- [ ] All 7 metrics present and updating under load
- [ ] Grafana dashboard auto-provisions on `docker compose -f docker-compose.monitoring.yml up`
- [ ] Grafana runs on port 3001 (no conflict with AgentIdP on 3000)
- [ ] `docs/devops/operations.md` updated with monitoring section
- [ ] `prom-client` added as new dependency (CEO approval gate)
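To make "returns Prometheus text format" concrete, the sketch below renders one labeled counter from the metrics table in the text exposition format. It is a standalone Python illustration, not `prom-client`: the `Counter` class and its methods are hypothetical, the help string is invented, and label-value escaping is omitted for brevity.

```python
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


class Counter:
    """Minimal labeled counter that renders itself in Prometheus text format."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[LabelSet, float] = {}

    def inc(self, **labels: str) -> None:
        # Sort label pairs so the same label set always maps to the same series.
        key: LabelSet = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + 1.0

    def render(self) -> str:
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            series = f"{self.name}{{{label_str}}}" if label_str else self.name
            lines.append(f"{series} {value:g}")
        return "\n".join(lines) + "\n"


tokens = Counter("agentidp_tokens_issued_total", "Tokens issued by outcome")
tokens.inc(outcome="success")
tokens.inc(outcome="success")
tokens.inc(outcome="failure")
print(tokens.render())
```

Each distinct label combination becomes its own series line, which is why high-cardinality labels (for example, raw request paths containing IDs in the `path` label) are usually normalized before being recorded.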
@@ -0,0 +1,37 @@
# Spec: OPA Policy Engine Integration

**Status**: Pending CEO approval
**Workstream**: 5 of 8

## Scope
- New `OpaMiddleware` replacing static scope check in `auth.ts`
- `@openpolicyagent/opa-wasm` integration (embedded Wasm, no sidecar)
- `policies/authz.rego`: main allow/deny policy
- `policies/data/scopes.json`: scope to permission mapping
- SIGHUP handler to hot-reload policies without restart
- New env var: `POLICY_DIR` (default: `./policies`)

## Policy interface

```
input = {
  "method": "GET",
  "path": "/api/v1/agents",
  "scopes": ["agents:read"],
  "agentId": "uuid"
}

output = {
  "allow": true | false,
  "reason": "string" // populated when allow=false
}
```

## Acceptance Criteria
- [ ] All existing scope checks replaced by OPA evaluation
- [ ] Policy files hot-reloadable on SIGHUP (no restart required)
- [ ] OPA Wasm loaded at startup; fail fast if `POLICY_DIR` is invalid
- [ ] `allow=false` responses return `403` with `reason` in error body
- [ ] Existing test suite passes unchanged (OPA evaluates same rules as before)
- [ ] New unit tests for OPA middleware: allow/deny cases, missing scope, invalid input
- [ ] `POLICY_DIR` env var documented in `docs/devops/environment-variables.md`
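The policy interface and the `403` criterion above can be exercised with a small stand-in for the policy decision, sketched here in Python. This is not OPA or Rego: `evaluate` is a hypothetical function that mimics the input/output contract, `REQUIRED_SCOPE` is an invented substitute for `policies/data/scopes.json` (whose real shape the spec does not give), and the `agents:write` scope and the error body layout are assumptions.

```python
from typing import Dict, List, Optional, Tuple, TypedDict


class PolicyInput(TypedDict):
    method: str
    path: str
    scopes: List[str]
    agentId: str


class PolicyDecision(TypedDict):
    allow: bool
    reason: str


# Hypothetical stand-in for policies/data/scopes.json:
# maps "METHOD path" to the scope that request requires.
REQUIRED_SCOPE: Dict[str, str] = {
    "GET /api/v1/agents": "agents:read",
    "POST /api/v1/agents": "agents:write",
}


def evaluate(policy_input: PolicyInput) -> PolicyDecision:
    """Return an allow/deny decision in the shape of the policy interface."""
    key = f"{policy_input['method']} {policy_input['path']}"
    required = REQUIRED_SCOPE.get(key)
    if required is None:
        # Deny by default when no rule matches the request.
        return {"allow": False, "reason": f"no policy for {key}"}
    if required not in policy_input["scopes"]:
        return {"allow": False, "reason": f"missing scope {required}"}
    return {"allow": True, "reason": ""}


def deny_response(decision: PolicyDecision) -> Optional[Tuple[int, Dict[str, str]]]:
    """Map a deny decision to (403, body); return None when the request may proceed."""
    if decision["allow"]:
        return None
    return 403, {"reason": decision["reason"]}
```

Denying requests that match no rule keeps the sketch fail-closed; whether the real `authz.rego` defaults to deny is not stated in the spec.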
@@ -0,0 +1,24 @@
# Spec: Python SDK (`sentryagent-idp`)

**Status**: Pending CEO approval
**Workstream**: 2 of 8

## Scope
- `sdk-python/` directory at project root
- `AgentIdPClient` with sync and async variants
- `TokenManager` with auto-refresh 60s before expiry
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPError` typed exception
- Full type hints, `mypy --strict` clean
- `sdk-python/README.md` with installation and usage

## Acceptance Criteria
- [ ] All 14 API endpoints covered
- [ ] Sync client: `requests` library
- [ ] Async client: `httpx` library
- [ ] `mypy --strict` passes with zero errors
- [ ] Zero untyped code
- [ ] `AgentIdPError` raised (not raw requests/httpx exceptions) on all failure paths
- [ ] `TokenManager` tested: caches token, refreshes at exp-60s
- [ ] `pyproject.toml` with: name=sentryagent-idp, python>=3.9, dependencies declared
- [ ] README matches Node.js SDK structure
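The `TokenManager` criterion (cache the token, refresh at exp-60s) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: the `Token` dataclass, the `fetch` callback, and the injectable `clock` are hypothetical names chosen so the refresh boundary can be tested without real time passing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Token:
    value: str
    expires_at: float  # absolute expiry, in seconds since the epoch


class TokenManager:
    """Caches one token and refetches it once it is within the refresh margin of expiry."""

    def __init__(
        self,
        fetch: Callable[[], Token],
        clock: Callable[[], float] = time.time,
        refresh_margin: float = 60.0,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._cached: Optional[Token] = None

    def get_token(self) -> str:
        cached = self._cached
        # Refetch when nothing is cached or the token expires within the margin.
        if cached is None or self._clock() >= cached.expires_at - self._refresh_margin:
            cached = self._fetch()
            self._cached = cached
        return cached.value
```

With a token that expires at t=300, calls before t=240 return the cached value and the first call at or after t=240 triggers a refetch. Passing a fake clock makes that boundary directly testable.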
@@ -0,0 +1,21 @@
# Spec: HashiCorp Vault Integration

**Status**: Pending CEO approval
**Workstream**: 1 of 8

## Scope
- `VaultClient` class wrapping `node-vault`
- `005_add_vault_path.sql` migration
- Updated `CredentialService` to write secrets to Vault instead of PostgreSQL
- New env vars: `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT`
- Migration guide: bcrypt → Vault coexistence strategy

## Acceptance Criteria
- [ ] New credentials: secret written to Vault KV v2, `vault_path` stored in PostgreSQL
- [ ] Credential rotation: Vault versioned update, `vault_path` unchanged
- [ ] Credential revocation: Vault secret deleted, DB status = `revoked`
- [ ] Existing bcrypt credentials continue to work until rotated
- [ ] `VaultClient` follows existing service interface pattern (DRY, SOLID)
- [ ] Zero `any` types, TypeScript strict
- [ ] `VAULT_ADDR` / `VAULT_TOKEN` validation at startup (fail-fast)
- [ ] DevOps docs updated with Vault setup section
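The create, rotate, and revoke criteria above describe a small state machine, sketched below in Python against an in-memory stand-in for a versioned KV store. Nothing here is the Vault or `node-vault` API: `FakeKvV2`, `CredentialRow`, the method names, and the `agents/<id>/credentials/<id>` path layout are all hypothetical and exist only to show which values change at each step.

```python
from dataclasses import dataclass
from typing import Dict, List


class FakeKvV2:
    """In-memory stand-in for a versioned KV store; not the Vault API."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def write(self, path: str, secret: str) -> int:
        versions = self._versions.setdefault(path, [])
        versions.append(secret)
        return len(versions)  # version number of the secret just written

    def read(self, path: str) -> str:
        return self._versions[path][-1]

    def delete(self, path: str) -> None:
        self._versions.pop(path, None)


@dataclass
class CredentialRow:
    """Stand-in for the PostgreSQL row: it holds the path, never the secret."""

    credential_id: str
    vault_path: str
    status: str = "active"


class CredentialService:
    def __init__(self, kv: FakeKvV2) -> None:
        self._kv = kv
        self.rows: Dict[str, CredentialRow] = {}

    def create(self, agent_id: str, credential_id: str, secret: str) -> CredentialRow:
        path = f"agents/{agent_id}/credentials/{credential_id}"  # hypothetical layout
        self._kv.write(path, secret)
        row = CredentialRow(credential_id, path)
        self.rows[credential_id] = row
        return row

    def rotate(self, credential_id: str, new_secret: str) -> int:
        # Rotation writes a new version at the same path; the row is untouched.
        return self._kv.write(self.rows[credential_id].vault_path, new_secret)

    def revoke(self, credential_id: str) -> None:
        row = self.rows[credential_id]
        self._kv.delete(row.vault_path)
        row.status = "revoked"
```

Keeping `vault_path` stable across rotations means the database row never needs updating when a secret changes, which matches the "versioned update, `vault_path` unchanged" criterion.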
@@ -0,0 +1,34 @@
# Spec: Web Dashboard UI

**Status**: Pending CEO approval
**Workstream**: 6 of 8

## Scope
- `dashboard/` directory at project root
- React 18 + TypeScript strict, built with Vite 5
- TanStack Query v5 for server state
- shadcn/ui (Radix UI + Tailwind CSS) for components
- Four pages: Agents, Credentials, Audit Log, Health
- Client-side auth: `clientId` + `clientSecret` → `TokenManager`
- Served from AgentIdP server at `GET /dashboard` (static build)

## Pages

| Page | Route | Scope Required |
|------|-------|---------------|
| Login | `/dashboard/login` | None |
| Agents | `/dashboard/agents` | `agents:read` |
| Agent Detail | `/dashboard/agents/:id` | `agents:read` |
| Credentials | `/dashboard/agents/:id/credentials` | `agents:read` |
| Audit Log | `/dashboard/audit` | `audit:read` |
| Health | `/dashboard/health` | None |

## Acceptance Criteria
- [ ] TypeScript strict: zero `any` across all dashboard files
- [ ] `dashboard/tsconfig.json` with `strict: true`
- [ ] Login form stores token in `sessionStorage` only (not `localStorage`)
- [ ] All write operations (suspend, revoke, rotate) require confirmation dialog
- [ ] OWASP Top 10 review: no XSS, no CSRF, no sensitive data in URL params
- [ ] Vite build outputs to `dashboard/dist/`; AgentIdP serves it as static
- [ ] `dashboard/README.md` explains how to build and serve
- [ ] Responsive layout, functional on desktop and tablet