chore(openspec): archive engineering-docs and phase-2-production-ready changes
- engineering-docs → archive/2026-03-29-engineering-docs (63/63 tasks complete)
- phase-2-production-ready → archive/2026-03-29-phase-2-production-ready (89/89 tasks complete)
- openspec/specs/ synced with all Phase 1 + Phase 2 + engineering-docs capabilities (22 specs total)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
35
openspec/specs/architecture-guide/spec.md
Normal file
@@ -0,0 +1,35 @@
## ADDED Requirements

### Requirement: System architecture document
The system SHALL include a document (`docs/engineering/02-architecture.md`) that describes the full system architecture: components, their responsibilities, how they communicate, and the deployment topology.

#### Scenario: Component diagram present
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL find an ASCII or Mermaid component diagram showing all major components (API server, PostgreSQL, Redis, Vault, OPA, Web Dashboard, Prometheus, Grafana) and their connections

#### Scenario: Request lifecycle explained
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand how an incoming HTTP request flows from client → Express router → middleware chain → controller → service → repository → database and back
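The controller → service → repository layering named in the request lifecycle scenario can be illustrated without Express. The sketch below is only an illustration under stated assumptions: every name in it (`InMemoryAgentRepository`, `AgentService`, `handleGetAgent`, the response shape) is hypothetical, and a `Map` stands in for PostgreSQL.

```typescript
// Hypothetical sketch of the controller → service → repository layering.
// All names are illustrative; a Map stands in for the database.

interface Agent {
  id: string;
  name: string;
}

interface AgentRepository {
  findById(id: string): Promise<Agent | undefined>;
}

// Repository: data access only.
class InMemoryAgentRepository implements AgentRepository {
  private readonly rows = new Map<string, Agent>();

  constructor(seed: Agent[]) {
    for (const agent of seed) this.rows.set(agent.id, agent);
  }

  async findById(id: string): Promise<Agent | undefined> {
    return this.rows.get(id);
  }
}

class AgentNotFoundError extends Error {
  constructor(id: string) {
    super(`Agent ${id} not found`);
    this.name = "AgentNotFoundError";
  }
}

// Service: business rules only, no HTTP concerns.
class AgentService {
  private readonly repo: AgentRepository;

  constructor(repo: AgentRepository) {
    this.repo = repo;
  }

  async getAgent(id: string): Promise<Agent> {
    const agent = await this.repo.findById(id);
    if (!agent) throw new AgentNotFoundError(id);
    return agent;
  }
}

interface HttpResponse {
  status: number;
  body: unknown;
}

// Controller: translates between HTTP and the service layer.
async function handleGetAgent(service: AgentService, id: string): Promise<HttpResponse> {
  try {
    return { status: 200, body: await service.getAgent(id) };
  } catch (err) {
    if (err instanceof AgentNotFoundError) {
      return { status: 404, body: { error: err.message } };
    }
    return { status: 500, body: { error: "internal error" } };
  }
}
```

Keeping the HTTP status mapping in the controller means the service can be unit tested without any web framework.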
#### Scenario: Data flow for authentication described
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the OAuth 2.0 Client Credentials flow: client presents credentials → token service validates → Redis checked for existing token → JWT signed and returned
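The order of steps in the Client Credentials flow (validate, check the cache, sign only when needed) can be sketched as follows. This is a minimal illustration, not the project's token service: `TokenIssuer`, `CredentialVerifier`, `TokenSigner` and `InvalidClientError` are hypothetical names, a `Map` stands in for Redis, and the signer is injected rather than being a real JWT library.

```typescript
// Hypothetical sketch: validate credentials → check cache → sign → return.
// A Map stands in for Redis; `sign` stands in for the JWT signer.

interface CachedToken {
  token: string;
  expiresAtMs: number;
}

type CredentialVerifier = (clientId: string, clientSecret: string) => Promise<boolean>;
type TokenSigner = (clientId: string, expiresAtMs: number) => Promise<string>;

class InvalidClientError extends Error {}

class TokenIssuer {
  private readonly cache = new Map<string, CachedToken>();
  private readonly verify: CredentialVerifier;
  private readonly sign: TokenSigner;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    verify: CredentialVerifier,
    sign: TokenSigner,
    ttlMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.verify = verify;
    this.sign = sign;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async issue(clientId: string, clientSecret: string): Promise<string> {
    // 1. Credentials are validated on every request, even if a token is cached.
    if (!(await this.verify(clientId, clientSecret))) {
      throw new InvalidClientError("invalid client credentials");
    }

    // 2. Reuse an unexpired token instead of signing a new one.
    const nowMs = this.now();
    const cached = this.cache.get(clientId);
    if (cached && cached.expiresAtMs > nowMs) return cached.token;

    // 3. Otherwise sign a new token, cache it, and return it.
    const expiresAtMs = nowMs + this.ttlMs;
    const token = await this.sign(clientId, expiresAtMs);
    this.cache.set(clientId, { token, expiresAtMs });
    return token;
  }
}
```

Injecting the clock and the signer keeps the sketch deterministic in tests; a real implementation would replace the `Map` with Redis keys that carry their own TTL.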
#### Scenario: Deployment topology covered
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the multi-region deployment model (US, EU, APAC) and how Terraform provisions it

### Requirement: Technology stack and ADR document
The system SHALL include a document (`docs/engineering/03-tech-stack.md`) that lists every technology in the stack and explains why it was chosen over alternatives.

#### Scenario: Every major technology documented with rationale
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL find an entry for each technology (Node.js 18, TypeScript 5.3, Express 4.18, PostgreSQL 14, Redis 7, HashiCorp Vault, OPA, React 18, Vite 5, Prometheus, Grafana, Terraform) with: what it does in the system, why it was chosen, and what was considered but rejected

#### Scenario: TypeScript strict mode rationale explained
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand why strict mode is mandatory (safety, correctness, no implicit any) and what the consequences of violating it are

#### Scenario: PostgreSQL vs Redis responsibility boundary clear
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand what is stored in PostgreSQL (persistent state: agents, credentials, audit logs) vs Redis (ephemeral state: active tokens, rate limit counters)
27
openspec/specs/code-walkthroughs/spec.md
Normal file
@@ -0,0 +1,27 @@
## ADDED Requirements

### Requirement: Annotated code walkthrough documents
The system SHALL include a document (`docs/engineering/06-walkthroughs.md`) containing three annotated end-to-end walkthroughs of the system's critical flows, with file:line references to actual source code.

#### Scenario: Token issuance walkthrough complete
- **WHEN** a new engineer reads the token issuance walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /oauth2/token → Express router → auth middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt credential check → Redis token cache check → JWT signing (src/utils/jwt.ts) → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number where it occurs

#### Scenario: Agent registration walkthrough complete
- **WHEN** a new engineer reads the agent registration walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents → auth middleware → validation middleware → AgentController → AgentService.createAgent → input validation (src/utils/validators.ts) → AgentRepository.create → PostgreSQL INSERT → AuditService.logEvent → HTTP 201 response with agent object
- **AND** every step SHALL reference the actual file and line number

#### Scenario: Credential rotation walkthrough complete
- **WHEN** a new engineer reads the credential rotation walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents/:id/credentials/:credId/rotate → auth middleware → CredentialController → CredentialService.rotateCredential → old credential revocation → new secret generation (src/utils/crypto.ts) → Vault write or bcrypt hash → CredentialRepository.update → token revocation for old credentials → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number

#### Scenario: Walkthroughs include version reference
- **WHEN** a new engineer reads any walkthrough
- **THEN** the document SHALL include a header stating the commit hash it was last verified against, so engineers know if the walkthrough may have drifted from the current code

#### Scenario: Each walkthrough annotates why, not just what
- **WHEN** a new engineer reads a walkthrough step
- **THEN** each step SHALL explain not just what the code does but WHY — e.g., why Redis is checked before signing a new JWT, why constant-time comparison is used for credential verification, why audit logging happens after persistence not before
24
openspec/specs/codebase-structure/spec.md
Normal file
@@ -0,0 +1,24 @@
## ADDED Requirements

### Requirement: Codebase structure document
The system SHALL include a document (`docs/engineering/04-codebase-structure.md`) that provides an annotated map of every top-level directory and key file in the repository, explaining what lives where and why.

#### Scenario: Full directory tree annotated
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find an annotated directory tree covering: `src/`, `tests/`, `docs/`, `sdk/`, `sdk-python/`, `sdk-go/`, `sdk-java/`, `terraform/`, `dashboard/`, `migrations/`, `openspec/`, `scripts/`

#### Scenario: src/ subdirectory roles explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand the role of each `src/` subdirectory: `controllers/` (HTTP layer), `services/` (business logic), `repositories/` (data access), `middleware/` (cross-cutting concerns), `utils/` (shared utilities), `types/` (TypeScript interfaces), `routes/` (Express router definitions)

#### Scenario: Where to add new code explained
- **WHEN** a new engineer needs to add a new feature
- **THEN** the document SHALL tell them exactly where each type of code belongs: new endpoint → controller + route; new business logic → service; new DB query → repository; new shared utility → utils/

#### Scenario: Key files identified and explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find explanations of: `src/app.ts` (Express app setup), `src/server.ts` (entry point), `src/types/index.ts` (canonical type definitions), `src/utils/errors.ts` (error hierarchy), `docker-compose.yml` (local dev stack), `tsconfig.json` (TypeScript config)

#### Scenario: DRY principle mapped to structure
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand how the directory structure enforces DRY: one location for types, one for crypto utilities, one for JWT utilities, one for validators — and why duplication across these is a blocking PR issue
28
openspec/specs/deployment-operations/spec.md
Normal file
@@ -0,0 +1,28 @@
## ADDED Requirements

### Requirement: Deployment and operations guide
The system SHALL include a document (`docs/engineering/10-deployment.md`) that explains how the application is built, deployed, and operated — covering Docker, Terraform, environment configuration, and monitoring.

#### Scenario: Docker build and run documented
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the multi-stage Dockerfile (builder stage compiles TypeScript, production stage runs compiled JS with node:18-alpine and non-root USER node), how to build the image, and how to run it with the required environment variables

#### Scenario: Environment variables fully documented
- **WHEN** a new engineer needs to configure the application
- **THEN** the guide SHALL provide a complete table of all environment variables: name, purpose, required/optional, example value — covering database, Redis, JWT signing key, Vault, OPA, and rate limiting config
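A documented variable table pairs naturally with a loader that fails fast when a required value is missing. The sketch below is an assumption-laden illustration: `loadConfig` and `AppConfig` are hypothetical, and `DATABASE_URL` and `REDIS_URL` are placeholder names. Only `JWT_PRIVATE_KEY`, `JWT_PUBLIC_KEY`, `VAULT_ADDR` and `POLICY_DIR` (default `./policies`) appear in the specs in this commit.

```typescript
// Hypothetical config loader: required vs optional environment variables.
// DATABASE_URL and REDIS_URL are assumed names, not taken from the specs.

type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  redisUrl: string;
  jwtPrivateKey: string;
  jwtPublicKey: string;
  vaultAddr: string | undefined; // Vault is opt-in
  policyDir: string;
}

function loadConfig(env: Env): AppConfig {
  const missing: string[] = [];

  // Collect every missing name so the operator sees all problems at once.
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === "") {
      missing.push(name);
      return "";
    }
    return value;
  };

  const config: AppConfig = {
    databaseUrl: required("DATABASE_URL"),
    redisUrl: required("REDIS_URL"),
    jwtPrivateKey: required("JWT_PRIVATE_KEY"),
    jwtPublicKey: required("JWT_PUBLIC_KEY"),
    vaultAddr: env["VAULT_ADDR"] || undefined,
    policyDir: env["POLICY_DIR"] || "./policies",
  };

  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return config;
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the loader easy to test.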
#### Scenario: Database migrations documented
- **WHEN** a new engineer needs to run or write migrations
- **THEN** the guide SHALL explain: where migration files live (`migrations/`), the naming convention, how to run them (`npm run migrate`), and how to write a new migration following the existing pattern

#### Scenario: Terraform multi-region deployment explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the Terraform structure: what modules exist, what the three regions (US, EU, APAC) deploy, how to run `terraform plan` and `terraform apply`, and what AWS/GCP resources are provisioned

#### Scenario: Prometheus metrics and Grafana explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL find: which endpoint exposes metrics (`/metrics`), the key metrics tracked, how to access the Grafana dashboard locally (port, login), and how to add a new metric counter or histogram to the API server

#### Scenario: Operational runbook for common tasks
- **WHEN** a new engineer is on-call or supporting operations
- **THEN** the guide SHALL include a runbook covering: how to check application health, how to rotate the JWT signing key, how to revoke all tokens for a compromised agent, and how to read audit logs for an incident
44
openspec/specs/deployment/spec.md
Normal file
@@ -0,0 +1,44 @@
# Spec: Multi-Region Deployment (Terraform)

**Status**: Pending CEO approval
**Workstream**: 8 of 8

## Scope
- `terraform/` directory at project root
- Shared `agentidp` module (compute, networking, secrets)
- `environments/aws/` — ECS Fargate + RDS PostgreSQL + ElastiCache Redis
- `environments/gcp/` — Cloud Run + Cloud SQL + Memorystore Redis
- Deployment guide: `docs/devops/deployment.md`

## Module structure

```
terraform/
  modules/
    agentidp/
      main.tf          — compute (ECS task or Cloud Run service)
      networking.tf    — VPC, subnets, security groups
      variables.tf     — all configurable inputs
      outputs.tf       — service URL, DB endpoint, Redis endpoint
    rds/               — managed PostgreSQL
    redis/             — managed Redis
    lb/                — ALB (AWS) or Cloud LB (GCP), TLS cert
  environments/
    aws/
      main.tf          — calls modules, sets AWS-specific vars
      variables.tf
      terraform.tfvars.example
    gcp/
      main.tf
      variables.tf
      terraform.tfvars.example
```

## Acceptance Criteria
- [ ] `terraform validate` passes for both aws and gcp environments
- [ ] `terraform plan` produces no errors against a live AWS/GCP account (test in dev env)
- [ ] JWT_PRIVATE_KEY and JWT_PUBLIC_KEY injected as environment secrets (not hardcoded)
- [ ] TLS termination at load balancer — HTTPS only in production modules
- [ ] PostgreSQL and Redis not publicly accessible — VPC-internal only
- [ ] `docs/devops/deployment.md` — end-to-end deployment walkthrough for AWS and GCP
- [ ] `terraform.tfvars.example` provided for both environments — no secrets in version control
32
openspec/specs/dev-environment-setup/spec.md
Normal file
@@ -0,0 +1,32 @@
## ADDED Requirements

### Requirement: Development environment setup guide
The system SHALL include a document (`docs/engineering/07-dev-setup.md`) that takes a new engineer from zero to a fully running local stack in under 30 minutes, with no prior knowledge of the project assumed.

#### Scenario: Prerequisites listed completely
- **WHEN** a new engineer reads 07-dev-setup.md
- **THEN** they SHALL find a complete prerequisites list: Node.js 18+, Docker Desktop, Git, a PostgreSQL client (optional), and links to install each — with no undocumented dependencies

#### Scenario: Repository clone and setup steps complete
- **WHEN** a new engineer follows the clone and setup steps
- **THEN** they SHALL be able to: clone the repo, copy `.env.example` to `.env`, run `npm install`, and have all dependencies installed with zero manual configuration

#### Scenario: Docker Compose local stack starts successfully
- **WHEN** a new engineer runs `docker-compose up -d`
- **THEN** all services (PostgreSQL, Redis, API server) SHALL start, migrations SHALL run automatically, and the guide SHALL show how to verify each service is healthy

#### Scenario: Smoke test confirms working stack
- **WHEN** a new engineer follows the smoke test section
- **THEN** they SHALL run a curl command to POST /oauth2/token with the seed credentials and receive a valid JWT — confirming the full stack is operational

#### Scenario: Common setup errors documented
- **WHEN** a new engineer encounters a setup error
- **THEN** the guide SHALL include a troubleshooting section covering the 5 most common errors: port already in use, migration failure, Node version mismatch, Docker not running, and missing .env variables

#### Scenario: Running tests locally documented
- **WHEN** a new engineer wants to run the test suite
- **THEN** the guide SHALL show: `npm test` (unit tests only, no services needed), `npm run test:integration` (requires Docker stack), and how to run a single test file

#### Scenario: Web dashboard local development documented
- **WHEN** a new engineer wants to run the web dashboard
- **THEN** the guide SHALL show how to start the Vite dev server (`npm run dev` in `dashboard/`) and which port it runs on, and confirm it connects to the local API server
28
openspec/specs/engineering-overview/spec.md
Normal file
@@ -0,0 +1,28 @@
## ADDED Requirements

### Requirement: Company and product overview document
The system SHALL include a document (`docs/engineering/01-overview.md`) that explains SentryAgent.ai's mission, the AgentIdP product, target users, and why the product exists — providing new engineers with business and product context before they read any technical content.

#### Scenario: Mission and vision covered
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what SentryAgent.ai builds, why it exists, and what problem it solves for AI developers

#### Scenario: AGNTCY alignment explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what AGNTCY is, why SentryAgent.ai aligns to it, and what "first-class agent identity" means

#### Scenario: Product features listed
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see a summary of all product capabilities: agent registry, OAuth 2.0 auth, credential management, audit logs, SDKs, web dashboard, policy engine, and monitoring

#### Scenario: Phase roadmap visible
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand which capabilities belong to Phase 1, Phase 2, and Phase 3

#### Scenario: Engineering team structure explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand the Virtual Engineering Team model (CTO → Architect → Developer → QA) and how Claude operates as the engineering partner

#### Scenario: Free tier limits documented
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see the free tier limits (100 agents, 10,000 token requests/month, 90-day audit retention, 100 req/min) and understand the product's positioning
32
openspec/specs/engineering-workflow/spec.md
Normal file
@@ -0,0 +1,32 @@
## ADDED Requirements

### Requirement: Engineering workflow and contribution guide
The system SHALL include a document (`docs/engineering/08-workflow.md`) that prescribes the exact steps an engineer MUST follow to contribute any new feature or change, from idea to merged code.

#### Scenario: OpenSpec spec-first workflow explained
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand that NO implementation begins without an approved OpenAPI spec — and the exact sequence: CEO approves → Architect writes spec → CTO reviews → Developer implements → QA signs off → CEO approves merge

#### Scenario: OpenSpec CLI commands documented
- **WHEN** a new engineer wants to start a new change
- **THEN** the guide SHALL provide the exact commands: `openspec new change <name>`, `openspec status --change <name>`, `openspec instructions <artifact> --change <name>`, and what each command does

#### Scenario: Branching strategy documented
- **WHEN** a new engineer creates a branch
- **THEN** the guide SHALL prescribe: feature branches from `develop`, naming convention `feature/<change-name>`, PR targets `develop`, `develop` → `main` requires CTO + CEO approval

#### Scenario: TypeScript and code standards enforced in workflow
- **WHEN** a new engineer writes code
- **THEN** the guide SHALL state the non-negotiable standards: strict mode, no `any`, DRY, SOLID, JSDoc on all public methods — and that PRs violating these are blocked by the CTO regardless of functionality

#### Scenario: PR checklist documented
- **WHEN** a new engineer opens a PR
- **THEN** the guide SHALL provide a PR checklist: TypeScript compiles with zero errors, ESLint passes with zero warnings, unit tests pass, coverage gate met (>80%), integration tests pass, OpenAPI spec updated if endpoint changed, engineering docs updated if architecture changed

#### Scenario: Virtual engineering team roles explained for contributors
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand the role separation: they contribute as the Principal Developer role, the CTO reviews all PRs, the Architect owns spec changes, and QA owns the test sign-off — and how to interact with each role in practice

#### Scenario: Commit message conventions documented
- **WHEN** a new engineer writes a commit message
- **THEN** the guide SHALL prescribe the Conventional Commits format: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` prefixes — with examples for each
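The commit message convention is simple enough to check mechanically. The sketch below is a hypothetical validator (`isConventionalCommit` is not a project function) that accepts only the six prefixes listed in the scenario, with an optional lowercase scope and an optional `!` breaking-change marker as allowed by the Conventional Commits format.

```typescript
// Hypothetical validator for the six commit prefixes named in the workflow spec.

const ALLOWED_TYPES = ["feat", "fix", "docs", "test", "chore", "refactor"] as const;

// Header shape: type, optional (scope), optional "!", then ": " and a description.
const HEADER_PATTERN = new RegExp(
  `^(${ALLOWED_TYPES.join("|")})(\\([a-z0-9-]+\\))?!?: \\S.*$`,
);

function isConventionalCommit(message: string): boolean {
  // Only the first line (the header) is checked; the body is free-form.
  const header = message.split("\n", 1)[0] ?? "";
  return HEADER_PATTERN.test(header);
}
```

For instance, this commit's own title, `chore(openspec): archive engineering-docs and phase-2-production-ready changes`, passes the check, while `update stuff` does not.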
23
openspec/specs/go-sdk/spec.md
Normal file
@@ -0,0 +1,23 @@
# Spec: Go SDK (`github.com/sentryagent/idp-sdk-go`)

**Status**: Pending CEO approval
**Workstream**: 3 of 8

## Scope
- `sdk-go/` directory at project root
- Context-aware `AgentIdPClient` using standard library `net/http`
- `TokenManager` with mutex-guarded cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- Idiomatic Go error type `AgentIdPError` implementing `error` interface
- `go.mod` module: `github.com/sentryagent/idp-sdk-go`
- `sdk-go/README.md`

## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] All methods take `context.Context` as first argument
- [ ] No panics — all errors returned as `error`
- [ ] `AgentIdPError` implements `error` and exposes `.Code`, `.HTTPStatus`, `.Details`
- [ ] `TokenManager` is goroutine-safe (`sync.Mutex` on cache)
- [ ] `go vet` and `staticcheck` pass with zero warnings
- [ ] `go test ./...` with >80% coverage
- [ ] README matches Node.js SDK structure
23
openspec/specs/java-sdk/spec.md
Normal file
@@ -0,0 +1,23 @@
# Spec: Java SDK (`ai.sentryagent:idp-sdk`)

**Status**: Pending CEO approval
**Workstream**: 4 of 8

## Scope
- `sdk-java/` directory at project root
- `AgentIdPClient` with sync and `CompletableFuture` async variants
- `TokenManager` with thread-safe cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPException` extending `RuntimeException` with `code`, `httpStatus`, `details`
- `pom.xml`: groupId=`ai.sentryagent`, artifactId=`idp-sdk`, Java 17+
- `sdk-java/README.md`

## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] Sync methods return typed POJOs; async methods return `CompletableFuture<T>`
- [ ] `AgentIdPException` thrown (not raw IOException) on all failure paths
- [ ] `TokenManager` is thread-safe (`synchronized` on cache)
- [ ] Apache HttpClient 5 for HTTP transport
- [ ] Jackson for JSON serialization
- [ ] `mvn verify` passes with >80% coverage (JUnit 5)
- [ ] README matches Node.js SDK structure
32
openspec/specs/monitoring/spec.md
Normal file
@@ -0,0 +1,32 @@
# Spec: Prometheus + Grafana Monitoring

**Status**: Pending CEO approval
**Workstream**: 7 of 8

## Scope
- `prom-client` integration — expose `GET /metrics`
- 7 metrics (counters + histograms) across all services
- `monitoring/` directory: Prometheus config + Grafana provisioning
- `docker-compose.monitoring.yml` overlay (adds prometheus + grafana services)
- Pre-built Grafana dashboard JSON (`monitoring/grafana/dashboards/agentidp.json`)

## Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `agentidp_tokens_issued_total` | Counter | `outcome` (success/failure) |
| `agentidp_agents_registered_total` | Counter | `outcome` |
| `agentidp_http_requests_total` | Counter | `method`, `path`, `status_code` |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `path` |
| `agentidp_rate_limit_rejections_total` | Counter | — |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` |
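To make the Counter rows in the table concrete, the sketch below hand-rolls a labelled counter and renders it in the Prometheus text exposition format. It deliberately does not use `prom-client`, which is the library the spec actually calls for; the `Counter` class here is a hypothetical stand-in that only shows the shape of the data a `GET /metrics` response carries.

```typescript
// Hypothetical stand-in for a labelled counter; the real integration uses prom-client.

type Labels = Record<string, string>;

// Label values must escape backslashes, double quotes and newlines.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class Counter {
  private readonly name: string;
  private readonly help: string;
  private readonly values = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Increment the series identified by this label set.
  inc(labels: Labels = {}, amount = 1): void {
    const key = Counter.serialize(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + amount);
  }

  // Sorted keys give each label set one stable series key.
  private static serialize(labels: Labels): string {
    const parts = Object.keys(labels)
      .sort()
      .map((key) => `${key}="${escapeLabelValue(labels[key] ?? "")}"`);
    return parts.length > 0 ? `{${parts.join(",")}}` : "";
  }

  // Render HELP and TYPE metadata followed by one line per series.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, labelSet) => {
      lines.push(`${this.name}${labelSet} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}

const tokensIssued = new Counter("agentidp_tokens_issued_total", "Tokens issued by outcome");
tokensIssued.inc({ outcome: "success" });
tokensIssued.inc({ outcome: "success" });
tokensIssued.inc({ outcome: "failure" });
const metricsText = tokensIssued.render();
```

Each distinct label combination becomes its own time series, which is why high-cardinality labels such as raw request paths are usually normalised before being recorded.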
## Acceptance Criteria
- [ ] `GET /metrics` returns Prometheus text format
- [ ] `/metrics` endpoint does NOT require Bearer auth (Prometheus scrapes it)
- [ ] All 7 metrics present and updating under load
- [ ] Grafana dashboard auto-provisions on `docker compose -f docker-compose.monitoring.yml up`
- [ ] Grafana runs on port 3001 (no conflict with AgentIdP on 3000)
- [ ] `docs/devops/operations.md` updated with monitoring section
- [ ] `prom-client` added as new dependency — CEO approval gate
37
openspec/specs/opa-policy/spec.md
Normal file
@@ -0,0 +1,37 @@
# Spec: OPA Policy Engine Integration

**Status**: Pending CEO approval
**Workstream**: 5 of 8

## Scope
- New `OpaMiddleware` replacing static scope check in `auth.ts`
- `@openpolicyagent/opa-wasm` integration (embedded Wasm, no sidecar)
- `policies/authz.rego` — main allow/deny policy
- `policies/data/scopes.json` — scope to permission mapping
- SIGHUP handler to hot-reload policies without restart
- New env var: `POLICY_DIR` (default: `./policies`)

## Policy interface

```
input = {
  "method": "GET",
  "path": "/api/v1/agents",
  "scopes": ["agents:read"],
  "agentId": "uuid"
}

output = {
  "allow": true | false,
  "reason": "string"  // populated when allow=false
}
```
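The policy interface above maps directly onto a pair of TypeScript types. The sketch below pairs those types with a plain-TypeScript stand-in for the policy decision, so the allow/deny contract and the `403` mapping can be seen end to end. It is an illustration only: the real evaluation runs Rego compiled to Wasm, and `SCOPE_PERMISSIONS`, `evaluate`, `toErrorResponse`, the `agents:write` scope and the error body shape are all assumptions.

```typescript
// Hypothetical stand-in for the OPA decision; the real policy lives in Rego.

interface PolicyInput {
  method: string;
  path: string;
  scopes: string[];
  agentId: string;
}

interface PolicyOutput {
  allow: boolean;
  reason?: string; // populated when allow is false
}

interface Grant {
  method: string;
  pathPrefix: string;
}

// Assumed scope → permission mapping, in the spirit of policies/data/scopes.json.
const SCOPE_PERMISSIONS: Record<string, Grant[]> = {
  "agents:read": [{ method: "GET", pathPrefix: "/api/v1/agents" }],
  "agents:write": [{ method: "POST", pathPrefix: "/api/v1/agents" }],
};

function evaluate(input: PolicyInput): PolicyOutput {
  for (const scope of input.scopes) {
    const grants = SCOPE_PERMISSIONS[scope] ?? [];
    const permitted = grants.some(
      (grant) => grant.method === input.method && input.path.startsWith(grant.pathPrefix),
    );
    if (permitted) return { allow: true };
  }
  return {
    allow: false,
    reason: `no scope in [${input.scopes.join(", ")}] permits ${input.method} ${input.path}`,
  };
}

// Deny decisions become a 403 carrying the reason; allow decisions pass through.
function toErrorResponse(
  decision: PolicyOutput,
): { status: number; body: { error: string; reason: string } } | undefined {
  if (decision.allow) return undefined;
  return {
    status: 403,
    body: { error: "forbidden", reason: decision.reason ?? "denied by policy" },
  };
}
```

Because the middleware only depends on the input and output shapes, the same unit tests can exercise both this stand-in and the Wasm-backed evaluator.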
## Acceptance Criteria
- [ ] All existing scope checks replaced by OPA evaluation
- [ ] Policy files hot-reloadable on SIGHUP (no restart required)
- [ ] OPA Wasm loaded at startup — fail-fast if `POLICY_DIR` invalid
- [ ] `allow=false` responses return `403` with `reason` in error body
- [ ] Existing test suite passes unchanged (OPA evaluates same rules as before)
- [ ] New unit tests for OPA middleware: allow/deny cases, missing scope, invalid input
- [ ] `POLICY_DIR` env var documented in `docs/devops/environment-variables.md`
24
openspec/specs/python-sdk/spec.md
Normal file
@@ -0,0 +1,24 @@
# Spec: Python SDK (`sentryagent-idp`)

**Status**: Pending CEO approval
**Workstream**: 2 of 8

## Scope
- `sdk-python/` directory at project root
- `AgentIdPClient` with sync and async variants
- `TokenManager` with 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPError` typed exception
- Full type hints — `mypy --strict` clean
- `sdk-python/README.md` with installation and usage

## Acceptance Criteria
- [ ] All 14 API endpoints covered
- [ ] Sync client: `requests` library
- [ ] Async client: `httpx` library
- [ ] `mypy --strict` passes with zero errors
- [ ] Zero untyped code
- [ ] `AgentIdPError` raised (not raw requests/httpx exceptions) on all failure paths
- [ ] `TokenManager` tested: caches token, refreshes at exp-60s
- [ ] `pyproject.toml` with: name=sentryagent-idp, python>=3.9, dependencies declared
- [ ] README matches Node.js SDK structure
28
openspec/specs/sdk-guide/spec.md
Normal file
@@ -0,0 +1,28 @@
## ADDED Requirements

### Requirement: SDK integration guide
The system SHALL include a document (`docs/engineering/11-sdk-guide.md`) that explains how each of the four language SDKs is structured, how to use them, and how to contribute to or extend them.

#### Scenario: SDK architecture overview present
- **WHEN** a new engineer reads 11-sdk-guide.md
- **THEN** they SHALL understand that all four SDKs (Node.js, Python, Go, Java) implement the same API surface (14 endpoints, 4 service clients, 1 TokenManager, 1 error type) with identical semantics, and why consistency across SDKs is a non-negotiable standard

#### Scenario: Node.js SDK documented
- **WHEN** a new engineer reads the Node.js SDK section
- **THEN** they SHALL find: installation (`npm install @sentryagent/idp-sdk`), the AgentIdPClient constructor, all 4 service clients (agents, credentials, tokens, audit), TokenManager auto-refresh behaviour, AgentIdPError structure, and a complete working code example for the most common flow (register agent → generate credential → issue token)
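The auto-refresh behaviour shared by the SDK `TokenManager`s can be sketched as a cache with a refresh margin. This is a hypothetical illustration, not the published SDK: the class shape, `TokenFetcher`, and the reading of "60s auto-refresh" as "refresh once the token is within 60 seconds of expiry" are assumptions (the last one based on the Python SDK criterion "refreshes at exp-60s").

```typescript
// Hypothetical TokenManager sketch: cache the token, refresh 60s before expiry.

interface IssuedToken {
  accessToken: string;
  expiresAtMs: number;
}

type TokenFetcher = () => Promise<IssuedToken>;

const REFRESH_MARGIN_MS = 60 * 1000;

class TokenManager {
  private current: IssuedToken | undefined;
  private inFlight: Promise<IssuedToken> | undefined;
  private readonly fetchToken: TokenFetcher;
  private readonly now: () => number;

  constructor(fetchToken: TokenFetcher, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  async getToken(): Promise<string> {
    // Serve from cache while the token is more than 60s away from expiry.
    if (this.current && this.now() < this.current.expiresAtMs - REFRESH_MARGIN_MS) {
      return this.current.accessToken;
    }

    // Concurrent callers share a single refresh instead of each fetching.
    if (!this.inFlight) {
      this.inFlight = this.fetchToken().then(
        (token) => {
          this.current = token;
          this.inFlight = undefined;
          return token;
        },
        (err) => {
          this.inFlight = undefined;
          throw err;
        },
      );
    }
    return (await this.inFlight).accessToken;
  }
}
```

Sharing the in-flight promise plays the role that the mutex plays in the Go SDK and `synchronized` plays in the Java SDK: one refresh per expiry, however many callers are waiting.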
#### Scenario: Python SDK documented
- **WHEN** a new engineer reads the Python SDK section
- **THEN** they SHALL find: installation (`pip install sentryagent-idp`), both sync (AgentIdPClient) and async (AsyncAgentIdPClient) variants, TokenManager and AsyncTokenManager auto-refresh, AgentIdPError, and a complete working example for sync and async usage

#### Scenario: Go SDK documented
- **WHEN** a new engineer reads the Go SDK section
- **THEN** they SHALL find: installation (`go get github.com/sentryagent/idp-sdk-go`), AgentIdPClient construction, goroutine-safe TokenManager, context.Context usage pattern, AgentIdPError with Code/HTTPStatus/Details, and a complete working example

#### Scenario: Java SDK documented
- **WHEN** a new engineer reads the Java SDK section
- **THEN** they SHALL find: Maven/Gradle dependency snippet, AgentIdPClient construction with builder pattern, sync methods and CompletableFuture async counterparts, thread-safe TokenManager, AgentIdPException, and a complete working example

#### Scenario: SDK contribution guide included
- **WHEN** a new engineer needs to add a new endpoint to all SDKs
- **THEN** the guide SHALL provide a step-by-step checklist for adding a new method to all four SDKs consistently: where to add the method, what the signature pattern is, how to write the corresponding test, and how to verify it compiles/passes in each language
40
openspec/specs/service-deep-dives/spec.md
Normal file
@@ -0,0 +1,40 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Service deep-dive documents
|
||||
The system SHALL include a document (`docs/engineering/05-services.md`) providing a deep-dive reference for every core service and component, following a consistent template: Purpose → Responsibility boundary → Public interface → Key methods → Database schema (if applicable) → Error types → Configuration.
|
||||
|
||||
#### Scenario: AgentService documented
|
||||
- **WHEN** a new engineer reads 05-services.md
|
||||
- **THEN** they SHALL find the AgentService section covering: responsibility (agent CRUD only), public methods (createAgent, getAgent, listAgents, updateAgent, deleteAgent), the `agents` table schema, AgentNotFoundError and AgentAlreadyExistsError, and what AgentService does NOT do (no auth, no credentials — Single Responsibility)
|
||||
|
||||
#### Scenario: OAuth2Service documented
|
||||
- **WHEN** a new engineer reads 05-services.md
|
||||
- **THEN** they SHALL find the OAuth2Service section covering: responsibility (token issuance and revocation only), public methods (issueToken, validateToken, revokeToken), Redis token storage schema, JWT payload structure, token TTL configuration, and the Vault credential verification path vs bcrypt path
|
||||
|
||||
#### Scenario: CredentialService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the CredentialService section covering: responsibility (credential lifecycle only), public methods (generateCredential, rotateCredential, revokeCredential, listCredentials), the `credentials` table schema, bcrypt vs Vault storage decision, and the `vault_path` column purpose

#### Scenario: AuditService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AuditService section covering: responsibility (immutable audit logging only), public methods (logEvent, queryLogs), the `audit_logs` table schema, event types enum, 90-day retention policy, and why audit records are never updated or deleted

#### Scenario: VaultClient documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the VaultClient section covering: purpose (wraps node-vault for KV v2 operations), public methods (writeSecret, readSecret, verifySecret, deleteSecret), the opt-in configuration (VAULT_ADDR env var), and the constant-time comparison in verifySecret and why it matters (timing attack prevention)

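The constant-time comparison named in the VaultClient scenario can be illustrated with a minimal sketch. It is not the project's `verifySecret`; the helper name `secretsMatch` and the choice to hash both inputs first are assumptions made for illustration. Only `createHash` and `timingSafeEqual` from Node's built-in `crypto` module are used.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not from the codebase): compares a candidate secret
// against the stored secret without short-circuiting on the first mismatch.
export function secretsMatch(candidate: string, stored: string): boolean {
  // timingSafeEqual throws if the buffers differ in length, so both values
  // are hashed to fixed-size SHA-256 digests first. This also avoids leaking
  // the stored secret's length through an early length check.
  const a = createHash("sha256").update(candidate, "utf8").digest();
  const b = createHash("sha256").update(stored, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A plain `===` comparison may return as soon as one character differs, so response time can hint at how much of a guess is correct. A constant-time comparison removes that signal.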
#### Scenario: OPA policy engine documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OPA section covering: purpose (dynamic access control beyond static OAuth scopes), how policies are loaded, how authorization decisions are made, the policy file locations, and how to write and test a new policy

#### Scenario: Web Dashboard documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the Web Dashboard section covering: React 18 + Vite 5 + TypeScript structure, how it authenticates against the AgentIdP API, the main views (agent list, credential management, audit log viewer, policy editor), and how to run it locally

#### Scenario: Monitoring stack documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the monitoring section covering: Prometheus metrics exposed by the API server (`/metrics`), the key metrics (request count, latency histograms, active tokens, agent count), Grafana dashboard structure, and how to add a new metric to the API server

#### Scenario: Consistent template enforced
- **WHEN** a new engineer looks up any service
- **THEN** every service section SHALL follow the same template so the engineer knows exactly where to find each type of information
32
openspec/specs/testing-strategy/spec.md
Normal file
@@ -0,0 +1,32 @@
## ADDED Requirements

### Requirement: Testing strategy document
The system SHALL include a document (`docs/engineering/09-testing.md`) that explains the test architecture, how to run tests, coverage requirements, and how to write new tests following project conventions.

#### Scenario: Test types and their purposes explained
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL understand the distinction between: unit tests (test one service/util in isolation, mock all dependencies, no running services needed) and integration tests (test full HTTP request/response cycle with real PostgreSQL + Redis)

#### Scenario: Test framework stack documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL find the test stack listed and explained: Jest 29.7 (test runner + assertions), ts-jest (TypeScript compilation), Supertest 6.3 (HTTP integration testing), and how each is configured

#### Scenario: Coverage gates documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL know the mandatory gates (>80% statements, >80% branches, >80% functions, >80% lines) and that PRs below these thresholds are blocked

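As a sketch of how the coverage gates could be enforced, the fragment below uses Jest's `coverageThreshold` option. The file name `jest.config.ts`, the `ts-jest` preset, and the `collectCoverageFrom` glob are assumptions, not settings confirmed by this spec.

```typescript
// jest.config.ts (hypothetical location and contents)
const config = {
  preset: "ts-jest",            // assumes ts-jest compiles the TypeScript sources
  testEnvironment: "node",
  collectCoverageFrom: ["src/**/*.ts"], // assumed source layout
  coverageThreshold: {
    // Jest fails the run when any global metric falls below these minimums.
    global: { statements: 80, branches: 80, functions: 80, lines: 80 },
  },
};

export default config;
```

Jest treats these numbers as minimums, so exactly 80% would pass. If the gate is meant to be strictly greater than 80%, the values would need to be raised accordingly.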
#### Scenario: How to run the test suite documented
- **WHEN** a new engineer wants to run tests
- **THEN** the guide SHALL show: `npm test` (unit tests, no services), `npm run test:coverage` (unit tests + coverage report), `npm run test:integration` (requires Docker stack), and `npx jest src/services/agentService.test.ts` (single file)

#### Scenario: Unit test writing conventions shown
- **WHEN** a new engineer writes a new unit test
- **THEN** the guide SHALL show a complete example, taken from an actual test in the codebase: how to mock a repository with `jest.mock()`, how to structure `describe`/`it` blocks, how to assert on thrown errors, and how to verify mock calls

#### Scenario: Integration test writing conventions shown
- **WHEN** a new engineer writes a new integration test
- **THEN** the guide SHALL show a complete example using Supertest: how to boot the Express app, how to seed test data, how to make authenticated requests (including getting a JWT first), and how to clean up after the test

#### Scenario: OWASP security testing reference included
- **WHEN** a new engineer writes security-relevant code
- **THEN** the guide SHALL include a reference to the OWASP Top 10 checks that are verified in QA sign-off and what each means in the context of this codebase (SQL injection, JWT attacks, credential exposure, etc.)
21
openspec/specs/vault/spec.md
Normal file
@@ -0,0 +1,21 @@
# Spec: HashiCorp Vault Integration

**Status**: Pending CEO approval
**Workstream**: 1 of 8

## Scope
- VaultClient class wrapping `node-vault`
- `005_add_vault_path.sql` migration
- Updated CredentialService to write secrets to Vault instead of PostgreSQL
- New env vars: VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT
- Migration guide: bcrypt → Vault coexistence strategy

## Acceptance Criteria
- [ ] New credentials: secret written to Vault KV v2, `vault_path` stored in PostgreSQL
- [ ] Credential rotation: Vault versioned update, `vault_path` unchanged
- [ ] Credential revocation: Vault secret deleted, DB status = `revoked`
- [ ] Existing bcrypt credentials continue to work until rotated
- [ ] VaultClient follows existing service interface pattern (DRY, SOLID)
- [ ] Zero `any` types, TypeScript strict
- [ ] `VAULT_ADDR` / `VAULT_TOKEN` validation at startup (fail-fast)
- [ ] DevOps docs updated with Vault setup section

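The fail-fast startup validation criterion could look roughly like the sketch below. The function name `loadVaultConfig`, the `VaultConfig` shape, the default mount `secret`, and the rule that an unset `VAULT_ADDR` disables Vault (matching the opt-in behaviour described elsewhere in these specs) are assumptions for illustration only.

```typescript
// Hypothetical config shape; field names are illustrative.
export interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

// Returns null when Vault is not configured (assumed opt-in behaviour),
// and throws immediately when the configuration is present but invalid.
export function loadVaultConfig(
  env: Record<string, string | undefined>,
): VaultConfig | null {
  const addr = env.VAULT_ADDR;
  if (!addr) return null; // Vault disabled: bcrypt path stays in use

  try {
    new URL(addr); // throws on malformed URLs
  } catch {
    throw new Error(`VAULT_ADDR is not a valid URL: ${addr}`);
  }

  const token = env.VAULT_TOKEN;
  if (!token) {
    throw new Error("VAULT_TOKEN must be set when VAULT_ADDR is set");
  }

  // Assumed default mount name; the spec only lists VAULT_MOUNT as a new env var.
  return { addr, token, mount: env.VAULT_MOUNT ?? "secret" };
}
```

Calling this once during server bootstrap, for example `loadVaultConfig(process.env)`, would surface a misconfiguration before the first request is served rather than on the first credential operation.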
34
openspec/specs/web-dashboard/spec.md
Normal file
@@ -0,0 +1,34 @@
# Spec: Web Dashboard UI

**Status**: Pending CEO approval
**Workstream**: 6 of 8

## Scope
- `dashboard/` directory at project root
- React 18 + TypeScript strict, built with Vite 5
- TanStack Query v5 for server state
- shadcn/ui (Radix UI + Tailwind CSS) for components
- Four pages: Agents, Credentials, Audit Log, Health
- Client-side auth: `clientId` + `clientSecret` → `TokenManager`
- Served from AgentIdP server at `GET /dashboard` (static build)

## Pages

| Page | Route | Scope Required |
|------|-------|---------------|
| Login | `/dashboard/login` | None |
| Agents | `/dashboard/agents` | `agents:read` |
| Agent Detail | `/dashboard/agents/:id` | `agents:read` |
| Credentials | `/dashboard/agents/:id/credentials` | `agents:read` |
| Audit Log | `/dashboard/audit` | `audit:read` |
| Health | `/dashboard/health` | None |

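The route-to-scope table above could be encoded as data and checked in a route guard. The following is a sketch under assumptions: the names `REQUIRED_SCOPE` and `canAccess` are hypothetical, routes are keyed by their pattern strings exactly as written in the table, and denying unknown routes is a design choice of this example rather than something the spec states.

```typescript
type Scope = "agents:read" | "audit:read";

// Route pattern -> required scope (null means no scope is required).
const REQUIRED_SCOPE = new Map<string, Scope | null>([
  ["/dashboard/login", null],
  ["/dashboard/agents", "agents:read"],
  ["/dashboard/agents/:id", "agents:read"],
  ["/dashboard/agents/:id/credentials", "agents:read"],
  ["/dashboard/audit", "audit:read"],
  ["/dashboard/health", null],
]);

// Hypothetical guard: true when the granted scopes allow the given route pattern.
export function canAccess(
  routePattern: string,
  granted: ReadonlySet<string>,
): boolean {
  const required = REQUIRED_SCOPE.get(routePattern);
  if (required === undefined) return false; // unknown route: deny (assumption)
  return required === null || granted.has(required);
}
```

Keeping the mapping in one place means a new page only needs one added entry, and the guard logic stays unchanged.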
## Acceptance Criteria
- [ ] TypeScript strict: zero `any` across all dashboard files
- [ ] `dashboard/tsconfig.json` with `strict: true`
- [ ] Login form stores token in `sessionStorage` only (not `localStorage`)
- [ ] All write operations (suspend, revoke, rotate) require confirmation dialog
- [ ] OWASP Top 10 review: no XSS, no CSRF, no sensitive data in URL params
- [ ] Vite build outputs to `dashboard/dist/`; AgentIdP serves it as static
- [ ] `dashboard/README.md`: how to build and serve
- [ ] Responsive layout: functional on desktop and tablet

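The `sessionStorage`-only criterion could be met with a small wrapper like the one sketched here. This is not the project's `TokenManager`; the class name `SessionTokenStore`, the storage key, and the injected `KeyValueStore` interface are hypothetical. Injecting the store keeps the class testable outside a browser.

```typescript
// Minimal subset of the Web Storage API that this sketch relies on.
export interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const TOKEN_KEY = "agentidp.accessToken"; // hypothetical key name

// Hypothetical wrapper: the only place the dashboard reads or writes the token.
export class SessionTokenStore {
  private readonly store: KeyValueStore;

  constructor(store: KeyValueStore) {
    this.store = store;
  }

  save(token: string): void {
    this.store.setItem(TOKEN_KEY, token);
  }

  load(): string | null {
    return this.store.getItem(TOKEN_KEY);
  }

  clear(): void {
    this.store.removeItem(TOKEN_KEY);
  }
}
```

In the browser this would be constructed as `new SessionTokenStore(window.sessionStorage)`, so the token is dropped when the tab closes and is never written to `localStorage`.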