chore: Phase 2 OpenSpec scoping — proposal, design, specs, tasks
8 workstreams scoped per OpenSpec standards:
1. HashiCorp Vault integration (secret management)
2. Python SDK (sentryagent-idp)
3. Go SDK (idp-sdk-go)
4. Java SDK (ai.sentryagent:idp-sdk)
5. OPA policy engine (dynamic ABAC, hot-reload Rego)
6. Web Dashboard UI (React 18 + TypeScript)
7. Prometheus + Grafana monitoring (7 metrics, pre-built dashboard)
8. Multi-region Terraform deployment (AWS + GCP)

Status: proposed; awaiting CEO dependency approvals (A0.1–A0.5) before any implementation begins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
96
openspec/changes/phase-2-production-ready/proposal.md
Normal file
@@ -0,0 +1,96 @@
# Phase 2: Production-Ready — Change Proposal
**Date**: 2026-03-28
**Author**: Virtual CTO
**Status**: Proposed, awaiting CEO approval

---
## Summary
Phase 1 delivered a complete, working AgentIdP MVP. Phase 2 makes it production-ready: hardened secrets management, multi-language SDKs, a policy engine, a web dashboard, observability, and multi-region deployment.

---
## Problem Statement
Phase 1 is functional but has the following production gaps:

| Gap | Risk |
|-----|------|
| Credentials stored as bcrypt hashes in PostgreSQL | No HSM/KMS backing; acceptable for an MVP, not for enterprise use |
| Only a Node.js SDK | Python, Go, and Java developers have no native SDK |
| No policy engine | Scope enforcement is static; no dynamic ABAC/RBAC |
| No web UI | Operators must use `curl` to manage agents |
| No observability | No metrics, no dashboards, no alerting |
| Single-region deployment | No HA, no geo-redundancy |

---
## Proposed Changes
### 1. HashiCorp Vault Integration
Replace raw bcrypt credential storage with Vault-backed secret management. Vault handles secret generation, versioning, and revocation. AgentIdP stores only Vault secret paths, not the secrets themselves.
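To make the storage split concrete, the sketch below shows one possible shape for a credential row that holds only a Vault reference. Everything named here (`CredentialRecord`, `SecretStore`, `InMemorySecretStore`, `issueCredential`, and the `agentidp/agents/...` path layout) is a hypothetical illustration, not the proposal's actual schema. The in-memory store merely stands in for a Vault client such as `node-vault`, whose calls would be asynchronous, and the secret is passed in for simplicity even though the proposal has Vault generate it.

```typescript
// Hypothetical sketch: the database row holds a Vault reference, never the secret.

/** What AgentIdP might persist in PostgreSQL for one credential (assumed shape). */
interface CredentialRecord {
  credentialId: string;
  agentId: string;
  vaultPath: string;    // where the secret lives in Vault
  vaultVersion: number; // which version of the secret is current
}

/** Minimal stand-in for a Vault KV client; a real client would be async. */
interface SecretStore {
  write(path: string, secret: string): number; // returns the new version
  read(path: string, version: number): string | undefined;
  revoke(path: string): void;
}

class InMemorySecretStore implements SecretStore {
  private readonly versions = new Map<string, string[]>();

  write(path: string, secret: string): number {
    const history = this.versions.get(path) ?? [];
    history.push(secret);
    this.versions.set(path, history);
    return history.length; // versions start at 1
  }

  read(path: string, version: number): string | undefined {
    return this.versions.get(path)?.[version - 1];
  }

  revoke(path: string): void {
    this.versions.delete(path);
  }
}

/** Assumed path layout; the real mount and hierarchy are a design decision. */
function credentialPath(agentId: string, credentialId: string): string {
  return `agentidp/agents/${agentId}/credentials/${credentialId}`;
}

function issueCredential(
  store: SecretStore,
  agentId: string,
  credentialId: string,
  secret: string,
): CredentialRecord {
  const vaultPath = credentialPath(agentId, credentialId);
  const vaultVersion = store.write(vaultPath, secret);
  // Only the reference is returned for persistence; the secret stays in the store.
  return { credentialId, agentId, vaultPath, vaultVersion };
}
```

In this shape, revoking a credential removes the secret from the store while the database row keeps only a path that no longer resolves.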
### 2. Multi-Language SDKs
Add Python, Go, and Java SDKs whose API surface matches the existing Node.js SDK: `AgentIdPClient`, `TokenManager`, service clients for all 14 endpoints, and a typed error hierarchy.
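As a rough illustration of what a typed error hierarchy could look like, here is a minimal sketch. The class names (`AgentIdPError`, `AuthenticationError`, `NotFoundError`, `RateLimitError`) and the `errorFromStatus` helper are assumptions made for this example; the proposal does not show the Node.js SDK's actual hierarchy, which the other SDKs would mirror.

```typescript
// Hypothetical error hierarchy; class names and the status mapping are assumptions.
class AgentIdPError extends Error {
  readonly status: number;

  constructor(message: string, status: number) {
    super(message);
    this.status = status;
    this.name = new.target.name;
    // Restore the prototype chain so instanceof also works on older compile targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AuthenticationError extends AgentIdPError {}
class NotFoundError extends AgentIdPError {}
class RateLimitError extends AgentIdPError {}

/** Map an HTTP status code to the most specific error type. */
function errorFromStatus(status: number, message: string): AgentIdPError {
  switch (status) {
    case 401:
      return new AuthenticationError(message, status);
    case 404:
      return new NotFoundError(message, status);
    case 429:
      return new RateLimitError(message, status);
    default:
      return new AgentIdPError(message, status);
  }
}
```

Callers can then branch with `instanceof` rather than comparing status codes, which is the property each language's SDK would need to reproduce in its own idiom.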
### 3. Advanced Policy Engine (OPA)
Integrate Open Policy Agent (OPA) as a sidecar for dynamic scope and attribute-based access control. Policies are hot-reloadable Rego files — no server restart required.
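The sidecar interaction might look like the sketch below, which builds a request for OPA's Data API (`POST /v1/data/<path>` with an `input` document) and interprets the response. The policy path `agentidp/authz/allow`, the `AuthzInput` fields, and the helper names are assumptions for illustration; the proposal does not define the policy package or input schema.

```typescript
// Sketch of an authorization check against an OPA sidecar's Data API.
// Package path, input fields, and helper names are assumptions.

interface AuthzInput {
  agentId: string;
  scopes: string[];
  action: string;
  resource: string;
}

interface OpaRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/** Build the POST /v1/data/<path> request that asks OPA for a decision. */
function buildOpaRequest(baseUrl: string, policyPath: string, input: AuthzInput): OpaRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/data/${policyPath}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input }),
  };
}

/** Deny unless OPA returned exactly `true`; an undefined decision omits `result`. */
function isAllowed(responseBody: string): boolean {
  const parsed: unknown = JSON.parse(responseBody);
  return (
    typeof parsed === "object" &&
    parsed !== null &&
    (parsed as { result?: unknown }).result === true
  );
}
```

A caller would send the request with `fetch(req.url, req)` and pass the response text to `isAllowed`. Because each decision is evaluated by the sidecar at request time, a reloaded Rego file would take effect on the next check without restarting the server, assuming OPA is configured to reload policies.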
### 4. Web Dashboard UI
A React + TypeScript dashboard for operators: agent list and management, credential overview, audit log viewer, and a system health panel. The dashboard is read-only by default; write operations require the `agents:write` scope.
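The read-only default could be expressed as a small scope gate like the one sketched here. Only the `agents:write` scope comes from the proposal; the `DashboardAction` type and `canPerform` helper are hypothetical names for this example.

```typescript
// Hypothetical scope gate for dashboard actions; only `agents:write` is from the proposal.
type DashboardAction = "view" | "create" | "update" | "delete";

const WRITE_SCOPE = "agents:write";

/** Views are always allowed; mutating actions need the write scope. */
function canPerform(action: DashboardAction, grantedScopes: readonly string[]): boolean {
  if (action === "view") return true;
  return grantedScopes.indexOf(WRITE_SCOPE) !== -1;
}
```

A check like this would only decide which controls the UI shows; the API would still need to enforce the scope on every write request.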
### 5. Prometheus + Grafana Monitoring
Instrument all services with Prometheus metrics (`/metrics` endpoint). Ship a pre-built Grafana dashboard for: token issuance rate, agent registration rate, error rates, Redis latency, PostgreSQL query latency.
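For readers unfamiliar with what a `/metrics` endpoint returns, the sketch below renders a counter in the Prometheus text exposition format. It is deliberately hand-rolled and dependency-free so the output format is visible; in practice a library such as `prom-client` (listed under Dependencies) would handle this. The `Counter` class, the metric name `agentidp_tokens_issued_total`, and the `grant_type` label are assumptions, and label-value escaping is omitted.

```typescript
// Minimal counter rendered in the Prometheus text exposition format.
// Hand-rolled for illustration; metric and label names are assumptions.
type Labels = Record<string, string>;

class Counter {
  readonly name: string;
  readonly help: string;
  private readonly values = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  /** Increment the series identified by this label set. */
  inc(labels: Labels = {}, by = 1): void {
    const key = Counter.seriesKey(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  /** Render HELP/TYPE headers followed by one sample line per series. */
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, key) => {
      lines.push(`${this.name}${key} ${value}`);
    });
    return lines.join("\n") + "\n";
  }

  /** Serialize labels as {a="1",b="2"}, sorted by name (no escaping in this sketch). */
  private static seriesKey(labels: Labels): string {
    const pairs = Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`);
    return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
  }
}

const tokensIssued = new Counter("agentidp_tokens_issued_total", "Tokens issued.");
tokensIssued.inc({ grant_type: "client_credentials" });
tokensIssued.inc({ grant_type: "client_credentials" });
```

A rate such as "token issuance rate" would then be derived on the Grafana side from successive scrapes of this monotonically increasing counter.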
### 6. Multi-Region Deployment
Terraform modules for AWS/GCP deployment with: managed PostgreSQL (RDS/Cloud SQL), managed Redis (ElastiCache/Memorystore), container orchestration (ECS/Cloud Run), load balancer, and a deployment guide.

---
## Out of Scope for Phase 2
- AGNTCY federation (Phase 3)
- W3C DID support (Phase 3)
- SOC 2 certification (Phase 3)
- Rust/C++ SDKs (Phase 3)

---
## Dependencies
| New Dependency | Purpose | CEO Approval Required |
|---------------|---------|----------------------|
| `@openpolicyagent/opa-wasm` | OPA policy evaluation | Yes |
| `node-vault` | HashiCorp Vault client | Yes |
| React 18 + Vite | Web dashboard | Yes |
| `prom-client` | Prometheus metrics | Yes |
| Terraform | Infrastructure as code | Yes |

---
## Delivery Sequence (per OpenSpec spec-first workflow)
```
1. Vault integration (highest security impact)
2. Python SDK (highest developer demand)
3. Go SDK
4. Java SDK
5. OPA policy engine
6. Web dashboard UI
7. Prometheus + Grafana monitoring
8. Multi-region deployment (Terraform)
```

---
## Success Criteria
- All new dependencies CEO-approved before implementation begins
- All new API endpoints have OpenAPI 3.0 specs before implementation
- TypeScript strict mode + zero `any` maintained throughout
- >80% test coverage on all new services
- All SDKs pass the same QA gate: 14-endpoint coverage, typed errors, zero `any`
- Web dashboard passes OWASP Top 10 security review
- Monitoring stack ships with pre-built dashboards; zero manual setup required