chore(openspec): archive engineering-docs and phase-2-production-ready changes

- engineering-docs → archive/2026-03-29-engineering-docs (63/63 tasks complete)
- phase-2-production-ready → archive/2026-03-29-phase-2-production-ready (89/89 tasks complete)
- openspec/specs/ synced with all Phase 1 + Phase 2 + engineering-docs capabilities (22 specs total)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-03-29 12:41:53 +00:00
parent eced5f8699
commit d42c653eea
44 changed files with 999 additions and 0 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-29


@@ -0,0 +1,105 @@
## Context
SentryAgent.ai has completed Phase 1 (MVP) and Phase 2 (Production-Ready), producing a fully implemented AgentIdP with 12 capabilities across ~150 source files, 4 language SDKs, Terraform infrastructure, and a React web dashboard. The codebase is mature but undocumented at the engineering level — there are bedroom developer guides (`docs/developers/`) and DevOps guides (`docs/devops/`), but no structured internal engineering knowledge base.
New hires arrive with BSc Computer Science and one year of industrial experience. They understand programming fundamentals and have worked on codebases before, but they have no context on: what SentryAgent.ai is building, why architectural decisions were made, how the codebase is structured, how to navigate the services, how to contribute per our standards, or how the OpenSpec workflow operates. Without documentation, onboarding is fragmented and relies entirely on the CTO's time.
The goal is a `docs/engineering/` directory that a new engineer can read sequentially from top to bottom and arrive ready to contribute within their first week.
## Goals / Non-Goals
**Goals:**
- Produce a complete top-down engineering knowledge base readable in sequence
- Cover all 10 capability areas identified in the proposal
- Calibrate depth for BSc + 1yr experience — assume programming competence, explain domain and architectural decisions
- Every document is self-contained with internal cross-links where needed
- All code examples are complete and runnable (no ellipses, no `// ... rest of code`)
- Development environment setup is achievable in under 30 minutes following the guide alone
- Annotated walkthroughs trace the three critical flows through every layer of code with file:line references
**Non-Goals:**
- Not a replacement for `docs/developers/` (end-user API reference) or `docs/devops/` (operator runbooks)
- Not a tutorial for learning TypeScript, React, or Terraform — assumes language competence
- Not a complete API reference — `docs/developers/api-reference.md` already covers that
- Not roadmap documentation — focuses on what is built, not what is planned
## Decisions
### D1: Location — `docs/engineering/` as a flat directory with an index
**Decision**: All engineering docs live in `docs/engineering/` as flat markdown files with a `README.md` index.
**Rationale**: Deeply nested directory structures create navigation friction. A flat layout with numbered filenames (`01-overview.md`, `02-architecture.md`) makes the reading order obvious without needing a build tool. Gitea renders markdown natively, so no documentation site tooling is required.
**Alternatives considered**:
- `docs/engineering/<subdirs>/` — rejected: adds navigation complexity with no benefit at our current document count
- Docusaurus site — rejected: adds build infrastructure overhead; plain markdown in-repo is sufficient and always in sync with code
---
### D2: Numbered file naming for enforced reading order
**Decision**: Files are named `01-overview.md` through `11-sdk-guide.md`.
**Rationale**: New engineers need a guided path, not a reference library. Numbers make the intended reading sequence unambiguous without any tooling. The `README.md` index maps numbers to sections.
---
### D3: Annotated walkthroughs use file:line references
**Decision**: Code walkthrough documents reference actual source files with line numbers (e.g., `src/controllers/agentController.ts:45`).
**Rationale**: Engineers with 1yr experience learn fastest by reading real code, not simplified pseudocode. File:line references let them jump directly to the relevant section in their editor or on Gitea.
**Trade-off**: Line numbers drift as code changes. Mitigation: walkthrough documents include a "last verified" version comment and note which commit they were verified against. The CTO adds walkthrough review to the Phase 3 change process as a maintenance item.
---
### D4: Three walkthroughs selected by criticality and complexity
**Decision**: Walkthroughs cover: (1) OAuth 2.0 token issuance, (2) agent registration, (3) credential rotation.
**Rationale**:
- Token issuance is the highest-traffic path and touches the most layers (controller → service → repository → Redis → JWT signing)
- Agent registration is the entry point for all users and demonstrates the full validation + persistence + audit pattern
- Credential rotation demonstrates the Vault integration path and shows how Phase 2 extended Phase 1 patterns
These three flows collectively exercise every architectural layer and every major design pattern in the codebase.
---
### D5: Service deep-dives use a consistent template
**Decision**: Each service deep-dive follows the structure: Purpose → Responsibility boundary → Interface → Key methods → Database schema (if applicable) → Error types → Configuration.
**Rationale**: Consistency reduces cognitive load. An engineer who has read the AgentService deep-dive knows exactly where to look for the same information in the OAuth2Service deep-dive. The template mirrors SOLID's Single Responsibility — each section answers one question.
---
### D6: Engineering workflow doc is prescriptive, not descriptive
**Decision**: The workflow guide tells engineers exactly what to do step by step, not just what the process is.
**Rationale**: Engineers with 1yr experience have worked in teams but may not have used a spec-first workflow before. A prescriptive guide ("Step 1: run `openspec new change <name>`") reduces ambiguity and enforces our standards from day one.
## Risks / Trade-offs
**[Line numbers drift as code evolves]** → Walkthroughs include a "last verified against commit X" header. The CTO assigns a quarterly walkthrough review task in each Phase change.
**[Docs can become stale if not maintained]** → Each document has a "Last updated" field in its header. The engineering workflow guide explicitly requires updating relevant engineering docs as part of any PR that changes architecture or public service interfaces.
**[Scope is large — ~15 documents, ~10,000 lines]** → Tasks are broken into discrete documents, each independently completable. No document depends on another being written first (only the index depends on all others).
## Migration Plan
1. Create `docs/engineering/` directory
2. Write all documents (~15 files covering the 10 capability areas, some split across multiple files)
3. Write `docs/engineering/README.md` index with links and reading order
4. Commit all to `develop` in a single commit
5. No existing documentation is modified or removed
No rollback required — this is additive only.
## Open Questions
_(none — all decisions made above; scope fully defined in proposal)_


@@ -0,0 +1,42 @@
## Why
SentryAgent.ai is growing and hiring engineers with BSc Computer Science and one year of industrial experience. There are currently no internal engineering documents that explain how the system works from the top down — new engineers have no structured path from product vision to running code, and no reference for how to contribute correctly. This gap slows onboarding, increases mistakes, and risks divergence from our architecture and standards.
## What Changes
- New `docs/engineering/` directory added to the repository as the canonical engineering knowledge base
- Top-down documentation suite covering all layers of the system: company vision → architecture → codebase → services → workflows → operations
- Annotated code walkthroughs for the three most critical system flows (token issuance, agent registration, credential rotation)
- Development environment setup guide targeting < 30 minutes from clone to running local stack
- Engineering workflow guide covering the full OpenSpec → Architect → Developer → QA → merge cycle
- Service deep-dive documents for all 8 core services/components
- SDK integration guide covering all four language SDKs
- Testing strategy and quality gate reference
- Deployment and operations reference covering Docker, Terraform, and monitoring
## Capabilities
### New Capabilities
- `engineering-overview`: Company mission, product vision, system purpose, and how the engineering team operates — the entry point for all new hires
- `architecture-guide`: System architecture including component diagram, data flow diagrams, deployment topology, and technology stack rationale (ADRs)
- `codebase-structure`: Annotated directory map explaining every top-level directory and key file, what lives where and why
- `service-deep-dives`: Per-service documentation for AgentService, OAuth2Service, CredentialService, AuditService, VaultClient, OPA policy engine, Web Dashboard, and Prometheus/Grafana monitoring
- `code-walkthroughs`: Step-by-step annotated traces of the three critical flows: token issuance end-to-end, agent registration end-to-end, credential rotation end-to-end
- `dev-environment-setup`: Local development environment setup — prerequisites, clone, configure, Docker Compose up, smoke test — targeting < 30 minutes
- `engineering-workflow`: How to contribute — OpenSpec spec-first workflow, branching strategy, PR standards, quality gates, and the role of each virtual engineering team member
- `testing-strategy`: Test framework, test types (unit vs integration), coverage gates, how to run tests, and how to write new tests following project conventions
- `deployment-operations`: Docker build and run, Terraform multi-region deployment, environment configuration, Prometheus/Grafana monitoring, and operational runbooks
- `sdk-guide`: Integration guide for Node.js, Python, Go, and Java SDKs — installation, authentication, all major operations, error handling
### Modified Capabilities
_(none — this change adds documentation only; no existing spec-level behavior changes)_
## Impact
- **Repository**: New `docs/engineering/` directory (~15 documents, ~10,000 lines of markdown)
- **No code changes**: Documentation only — zero impact on `src/`, `tests/`, `sdk/`, or infrastructure
- **Dependencies**: None — no new packages required
- **APIs**: No API changes
- **Existing docs**: `docs/developers/` (bedroom developer guide) and `docs/devops/` (operations) remain unchanged; this is an additive engineering-internal knowledge base


@@ -0,0 +1,35 @@
## ADDED Requirements
### Requirement: System architecture document
The system SHALL include a document (`docs/engineering/02-architecture.md`) that describes the full system architecture: components, their responsibilities, how they communicate, and the deployment topology.
#### Scenario: Component diagram present
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL find an ASCII or Mermaid component diagram showing all major components (API server, PostgreSQL, Redis, Vault, OPA, Web Dashboard, Prometheus, Grafana) and their connections
#### Scenario: Request lifecycle explained
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand how an incoming HTTP request flows from client → Express router → middleware chain → controller → service → repository → database and back
#### Scenario: Data flow for authentication described
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the OAuth 2.0 Client Credentials flow: client presents credentials → token service validates → Redis checked for existing token → JWT signed and returned
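The cache-then-sign step of this flow can be sketched in a few lines. This is a minimal illustration that makes three assumptions: HS256 signing with Node's built-in `crypto`, an in-memory `Map` standing in for Redis, and a client whose credentials have already been validated. The names `issueToken`, `signJwt`, and `TokenCache` and the claim set are hypothetical. The real logic lives in OAuth2Service and `src/utils/jwt.ts` and may differ, for example by using RS256 or a JWT library.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical stand-in for the Redis token cache: clientId -> token and its expiry (epoch seconds).
type TokenCache = Map<string, { token: string; expiresAt: number }>;

// JWTs use unpadded base64url encoding (RFC 7515).
function b64url(input: string): string {
  return Buffer.from(input, "utf8").toString("base64url");
}

// Sign a compact JWT with HS256: HMAC-SHA256 over "header.payload".
function signJwt(payload: Record<string, unknown>, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Issue a token for an already-validated client. An unexpired cached token is reused
// instead of signing a new one. Otherwise a fresh JWT is signed and cached until it expires.
function issueToken(
  clientId: string,
  cache: TokenCache,
  secret: string,
  ttlSeconds: number,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const cached = cache.get(clientId);
  if (cached && cached.expiresAt > nowSeconds) {
    return cached.token;
  }
  const expiresAt = nowSeconds + ttlSeconds;
  const token = signJwt({ sub: clientId, iat: nowSeconds, exp: expiresAt }, secret);
  cache.set(clientId, { token, expiresAt });
  return token;
}
```

Checking the cache first avoids a signing operation on every request and keeps the number of live tokens per client bounded.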
#### Scenario: Deployment topology covered
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the multi-region deployment model (US, EU, APAC) and how Terraform provisions it
### Requirement: Technology stack and ADR document
The system SHALL include a document (`docs/engineering/03-tech-stack.md`) that lists every technology in the stack and explains why it was chosen over alternatives.
#### Scenario: Every major technology documented with rationale
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL find an entry for each technology (Node.js 18, TypeScript 5.3, Express 4.18, PostgreSQL 14, Redis 7, HashiCorp Vault, OPA, React 18, Vite 5, Prometheus, Grafana, Terraform) with: what it does in the system, why it was chosen, and what was considered but rejected
#### Scenario: TypeScript strict mode rationale explained
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand why strict mode is mandatory (safety, correctness, no implicit any) and what the consequences of violating it are
#### Scenario: PostgreSQL vs Redis responsibility boundary clear
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand what is stored in PostgreSQL (persistent state: agents, credentials, audit logs) vs Redis (ephemeral state: active tokens, rate limit counters)
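Rate limit counters are a good example of ephemeral state. They only matter for the current time window, so losing them costs little. The sketch below is a fixed-window limiter in which an in-memory `Map` stands in for a Redis counter with a TTL (in production, typically `INCR` plus `EXPIRE`). The class name and the choice of a fixed window are assumptions, not the project's confirmed algorithm. The 100 requests per minute used in the test comes from the free tier limit in the overview spec.

```typescript
// In-memory stand-in for a Redis counter with a TTL.
// Hypothetical sketch: the real limiter may use a different algorithm or key scheme.
class FixedWindowRateLimiter {
  private readonly counters = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  // Returns true if the request is allowed, or false once the key has used up
  // its quota for the current window. Counters reset when a new window starts.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const windowStart = nowMs - (nowMs % this.windowMs);
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      this.counters.set(key, { count: 1, windowStart });
      return this.limit >= 1;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```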


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Annotated code walkthrough documents
The system SHALL include a document (`docs/engineering/06-walkthroughs.md`) containing three annotated end-to-end walkthroughs of the system's critical flows, with file:line references to actual source code.
#### Scenario: Token issuance walkthrough complete
- **WHEN** a new engineer reads the token issuance walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /oauth2/token → Express router → auth middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt credential check → Redis token cache check → JWT signing (src/utils/jwt.ts) → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number where it occurs
#### Scenario: Agent registration walkthrough complete
- **WHEN** a new engineer reads the agent registration walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents → auth middleware → validation middleware → AgentController → AgentService.createAgent → input validation (src/utils/validators.ts) → AgentRepository.create → PostgreSQL INSERT → AuditService.logEvent → HTTP 201 response with agent object
- **AND** every step SHALL reference the actual file and line number
#### Scenario: Credential rotation walkthrough complete
- **WHEN** a new engineer reads the credential rotation walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents/:id/credentials/:credId/rotate → auth middleware → CredentialController → CredentialService.rotateCredential → old credential revocation → new secret generation (src/utils/crypto.ts) → Vault write or bcrypt hash → CredentialRepository.update → token revocation for old credentials → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number
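The rotation walkthrough mostly explains ordering. The sketch below shows only those ordering constraints: persist the new secret, then invalidate tokens minted with the old one, then write the audit event. It is simplified in one respect. It folds "old credential revocation" into a single in-place update. `rotateCredential`, `RotationDeps`, and `generateSecret` are hypothetical names, and the dependencies are injected functions standing in for CredentialRepository, the Redis token store, bcrypt or Vault, and AuditService.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical, trimmed-down credential shape.
interface Credential {
  id: string;
  agentId: string;
  secretHash: string;
}

// Injected collaborators (stand-ins for the real repository, token store, and audit service).
interface RotationDeps {
  hashSecret(secret: string): string;             // bcrypt hash or Vault write in the real system
  updateCredential(cred: Credential): void;        // persistence (PostgreSQL)
  revokeTokensForCredential(credId: string): void; // drop tokens issued against the old secret
  logEvent(eventType: string, credId: string): void;
}

// 32 random bytes (256 bits) encoded as unpadded base64url: 43 characters.
function generateSecret(): string {
  return randomBytes(32).toString("base64url");
}

function rotateCredential(cred: Credential, deps: RotationDeps): { credential: Credential; secret: string } {
  const secret = generateSecret();
  const updated: Credential = { ...cred, secretHash: deps.hashSecret(secret) };
  deps.updateCredential(updated);               // 1. persist first: if this throws, nothing else happens
  deps.revokeTokensForCredential(cred.id);      // 2. old tokens stop working only after the new secret exists
  deps.logEvent("credential.rotated", cred.id); // 3. audit records only what actually succeeded
  return { credential: updated, secret };       // plaintext is returned once and never stored
}
```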
#### Scenario: Walkthroughs include version reference
- **WHEN** a new engineer reads any walkthrough
- **THEN** the document SHALL include a header stating the commit hash it was last verified against, so engineers know if the walkthrough may have drifted from the current code
#### Scenario: Each walkthrough annotates why, not just what
- **WHEN** a new engineer reads a walkthrough step
- **THEN** each step SHALL explain not just what the code does but WHY — e.g., why Redis is checked before signing a new JWT, why constant-time comparison is used for credential verification, and why audit logging happens after persistence, not before


@@ -0,0 +1,24 @@
## ADDED Requirements
### Requirement: Codebase structure document
The system SHALL include a document (`docs/engineering/04-codebase-structure.md`) that provides an annotated map of every top-level directory and key file in the repository, explaining what lives where and why.
#### Scenario: Full directory tree annotated
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find an annotated directory tree covering: `src/`, `tests/`, `docs/`, `sdk/`, `sdk-python/`, `sdk-go/`, `sdk-java/`, `terraform/`, `dashboard/`, `migrations/`, `openspec/`, `scripts/`
#### Scenario: src/ subdirectory roles explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand the role of each `src/` subdirectory: `controllers/` (HTTP layer), `services/` (business logic), `repositories/` (data access), `middleware/` (cross-cutting concerns), `utils/` (shared utilities), `types/` (TypeScript interfaces), `routes/` (Express router definitions)
#### Scenario: Where to add new code explained
- **WHEN** a new engineer needs to add a new feature
- **THEN** the document SHALL tell them exactly where each type of code belongs: new endpoint → controller + route; new business logic → service; new DB query → repository; new shared utility → utils/
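Here is a compressed sketch of the controller, service, and repository split this scenario describes. The class and error names mirror those used in the spec, but the bodies, the `Map`-backed repository, and the status-code mapping are hypothetical illustrations, not the project's code. Canonical types belong in `src/types/index.ts` and errors in `src/utils/errors.ts`.

```typescript
import { randomUUID } from "node:crypto";

interface Agent { id: string; name: string }

class AgentAlreadyExistsError extends Error {
  constructor(name: string) {
    super(`Agent "${name}" already exists`);
    this.name = "AgentAlreadyExistsError";
    Object.setPrototypeOf(this, AgentAlreadyExistsError.prototype); // keeps instanceof reliable on older targets
  }
}

// Repository layer: data access only (a Map stands in for PostgreSQL).
class AgentRepository {
  private readonly rows = new Map<string, Agent>();
  create(agent: Agent): Agent {
    this.rows.set(agent.id, agent);
    return agent;
  }
  findByName(name: string): Agent | undefined {
    return Array.from(this.rows.values()).find((a) => a.name === name);
  }
}

// Service layer: business rules only, no HTTP objects and no SQL.
class AgentService {
  constructor(private readonly repo: AgentRepository) {}
  createAgent(name: string): Agent {
    if (this.repo.findByName(name)) throw new AgentAlreadyExistsError(name);
    return this.repo.create({ id: randomUUID(), name });
  }
}

// Controller layer: maps HTTP-shaped input/output and delegates to the service.
class AgentController {
  constructor(private readonly service: AgentService) {}
  create(body: { name?: unknown }): { status: number; body: unknown } {
    if (typeof body.name !== "string" || body.name.length === 0) {
      return { status: 400, body: { error: "name is required" } };
    }
    try {
      return { status: 201, body: this.service.createAgent(body.name) };
    } catch (err) {
      if (err instanceof AgentAlreadyExistsError) return { status: 409, body: { error: err.message } };
      throw err;
    }
  }
}
```

A new DB query therefore touches only the repository, a new rule only the service, and a new endpoint only the controller and its route.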
#### Scenario: Key files identified and explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find explanations of: `src/app.ts` (Express app setup), `src/server.ts` (entry point), `src/types/index.ts` (canonical type definitions), `src/utils/errors.ts` (error hierarchy), `docker-compose.yml` (local dev stack), `tsconfig.json` (TypeScript config)
#### Scenario: DRY principle mapped to structure
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand how the directory structure enforces DRY: one location for types, one for crypto utilities, one for JWT utilities, one for validators — and why duplication across these is a blocking PR issue


@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: Deployment and operations guide
The system SHALL include a document (`docs/engineering/10-deployment.md`) that explains how the application is built, deployed, and operated — covering Docker, Terraform, environment configuration, and monitoring.
#### Scenario: Docker build and run documented
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the multi-stage Dockerfile (builder stage compiles TypeScript, production stage runs compiled JS with node:18-alpine and non-root USER node), how to build the image, and how to run it with the required environment variables
#### Scenario: Environment variables fully documented
- **WHEN** a new engineer needs to configure the application
- **THEN** the guide SHALL provide a complete table of all environment variables: name, purpose, required/optional, example value — covering database, Redis, JWT signing key, Vault, OPA, and rate limiting config
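A config loader that fails fast on missing variables complements the table. It turns a half-configured deployment into a clear startup error instead of a runtime surprise. In this sketch, every variable name except `VAULT_ADDR` (named in the VaultClient spec) is hypothetical. The default of 100 requests per minute is an assumption borrowed from the free tier limit. Treat `.env.example` as the source of truth.

```typescript
// Hypothetical config shape; real variable names come from .env.example.
interface AppConfig {
  databaseUrl: string;
  redisUrl: string;
  jwtSecret: string;
  vaultAddr?: string; // optional: Vault integration is opt-in
  opaUrl?: string;
  rateLimitPerMinute: number;
}

function requireVar(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Reads and validates configuration once at startup; throws on the first problem found.
function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  const rateLimit = Number(env["RATE_LIMIT_PER_MINUTE"] ?? "100");
  if (!Number.isInteger(rateLimit) || rateLimit <= 0) {
    throw new Error("RATE_LIMIT_PER_MINUTE must be a positive integer");
  }
  return {
    databaseUrl: requireVar(env, "DATABASE_URL"),
    redisUrl: requireVar(env, "REDIS_URL"),
    jwtSecret: requireVar(env, "JWT_SECRET"),
    vaultAddr: env["VAULT_ADDR"],
    opaUrl: env["OPA_URL"],
    rateLimitPerMinute: rateLimit,
  };
}
```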
#### Scenario: Database migrations documented
- **WHEN** a new engineer needs to run or write migrations
- **THEN** the guide SHALL explain: where migration files live (`migrations/`), the naming convention, how to run them (`npm run migrate`), and how to write a new migration following the existing pattern
#### Scenario: Terraform multi-region deployment explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the Terraform structure: what modules exist, what the three regions (US, EU, APAC) deploy, how to run `terraform plan` and `terraform apply`, and what AWS/GCP resources are provisioned
#### Scenario: Prometheus metrics and Grafana explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL find: which endpoint exposes metrics (`/metrics`), the key metrics tracked, how to access the Grafana dashboard locally (port, login), and how to add a new metric counter or histogram to the API server
#### Scenario: Operational runbook for common tasks
- **WHEN** a new engineer is on-call or supporting operations
- **THEN** the guide SHALL include a runbook covering: how to check application health, how to rotate the JWT signing key, how to revoke all tokens for a compromised agent, and how to read audit logs for an incident


@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Development environment setup guide
The system SHALL include a document (`docs/engineering/07-dev-setup.md`) that takes a new engineer from zero to a fully running local stack in under 30 minutes, with no prior knowledge of the project assumed.
#### Scenario: Prerequisites listed completely
- **WHEN** a new engineer reads 07-dev-setup.md
- **THEN** they SHALL find a complete prerequisites list: Node.js 18+, Docker Desktop, Git, a PostgreSQL client (optional), and links to install each — with no undocumented dependencies
#### Scenario: Repository clone and setup steps complete
- **WHEN** a new engineer follows the clone and setup steps
- **THEN** they SHALL be able to: clone the repo, copy `.env.example` to `.env`, run `npm install`, and have all dependencies installed with zero manual configuration
#### Scenario: Docker Compose local stack starts successfully
- **WHEN** a new engineer runs `docker-compose up -d`
- **THEN** all services (PostgreSQL, Redis, API server) SHALL start, migrations SHALL run automatically, and the guide SHALL show how to verify each service is healthy
#### Scenario: Smoke test confirms working stack
- **WHEN** a new engineer follows the smoke test section
- **THEN** they SHALL run a curl command to POST /oauth2/token with the seed credentials and receive a valid JWT — confirming the full stack is operational
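The smoke test's token request follows the standard OAuth 2.0 Client Credentials shape (RFC 6749, section 4.4). The helper below builds that request. It assumes the server accepts the client credentials as form body parameters rather than HTTP Basic auth. The base URL, port, and credentials in the test are placeholders, not the repository's seed values.

```typescript
// Builds the POST /oauth2/token request for the Client Credentials grant.
// Assumption: credentials are sent as form fields (client_secret_post style).
function buildTokenRequest(
  baseUrl: string,
  clientId: string,
  clientSecret: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  });
  return {
    url: new URL("/oauth2/token", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}
```

On Node 18 the result can be passed to the global `fetch(url, init)`. A healthy stack should answer with HTTP 200 and an `access_token` made of three dot-separated parts.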
#### Scenario: Common setup errors documented
- **WHEN** a new engineer encounters a setup error
- **THEN** the guide SHALL include a troubleshooting section covering the 5 most common errors: port already in use, migration failure, Node version mismatch, Docker not running, and missing .env variables
#### Scenario: Running tests locally documented
- **WHEN** a new engineer wants to run the test suite
- **THEN** the guide SHALL show: `npm test` (unit tests only, no services needed), `npm run test:integration` (requires Docker stack), and how to run a single test file
#### Scenario: Web dashboard local development documented
- **WHEN** a new engineer wants to run the web dashboard
- **THEN** the guide SHALL show how to start the Vite dev server (`npm run dev` in `dashboard/`) and which port it runs on, and confirm it connects to the local API server


@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: Company and product overview document
The system SHALL include a document (`docs/engineering/01-overview.md`) that explains SentryAgent.ai's mission, the AgentIdP product, target users, and why the product exists — providing new engineers with business and product context before they read any technical content.
#### Scenario: Mission and vision covered
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what SentryAgent.ai builds, why it exists, and what problem it solves for AI developers
#### Scenario: AGNTCY alignment explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what AGNTCY is, why SentryAgent.ai aligns to it, and what "first-class agent identity" means
#### Scenario: Product features listed
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see a summary of all product capabilities: agent registry, OAuth 2.0 auth, credential management, audit logs, SDKs, web dashboard, policy engine, and monitoring
#### Scenario: Phase roadmap visible
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand which capabilities belong to Phase 1, Phase 2, and Phase 3
#### Scenario: Engineering team structure explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand the Virtual Engineering Team model (CTO → Architect → Developer → QA) and how Claude operates as the engineering partner
#### Scenario: Free tier limits documented
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see the free tier limits (100 agents, 10,000 token requests/month, 90-day audit retention, 100 req/min) and understand the product's positioning


@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Engineering workflow and contribution guide
The system SHALL include a document (`docs/engineering/08-workflow.md`) that prescribes the exact steps an engineer MUST follow to contribute any new feature or change, from idea to merged code.
#### Scenario: OpenSpec spec-first workflow explained
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand that NO implementation begins without an approved OpenAPI spec — and the exact sequence: CEO approves → Architect writes spec → CTO reviews → Developer implements → QA signs off → CEO approves merge
#### Scenario: OpenSpec CLI commands documented
- **WHEN** a new engineer wants to start a new change
- **THEN** the guide SHALL provide the exact commands: `openspec new change <name>`, `openspec status --change <name>`, `openspec instructions <artifact> --change <name>`, and what each command does
#### Scenario: Branching strategy documented
- **WHEN** a new engineer creates a branch
- **THEN** the guide SHALL prescribe: feature branches from `develop`, naming convention `feature/<change-name>`, PR targets `develop`, and `develop` → `main` requires CTO + CEO approval
#### Scenario: TypeScript and code standards enforced in workflow
- **WHEN** a new engineer writes code
- **THEN** the guide SHALL state the non-negotiable standards: strict mode, no `any`, DRY, SOLID, JSDoc on all public methods — and that PRs violating these are blocked by the CTO regardless of functionality
#### Scenario: PR checklist documented
- **WHEN** a new engineer opens a PR
- **THEN** the guide SHALL provide a PR checklist: TypeScript compiles with zero errors, ESLint passes with zero warnings, unit tests pass, coverage gate met (>80%), integration tests pass, OpenAPI spec updated if endpoint changed, engineering docs updated if architecture changed
#### Scenario: Virtual engineering team roles explained for contributors
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand the role separation: they contribute as the Principal Developer role, the CTO reviews all PRs, the Architect owns spec changes, and QA owns the test sign-off — and how to interact with each role in practice
#### Scenario: Commit message conventions documented
- **WHEN** a new engineer writes a commit message
- **THEN** the guide SHALL prescribe the Conventional Commits format: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` prefixes — with examples for each
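The prefix rule can be checked mechanically, for example in a commit-msg hook. This sketch validates only the header line against the six types listed above, with an optional scope and an optional `!` breaking-change marker. The Conventional Commits specification itself permits other types. The function name is hypothetical, and whether the project uses such a hook is not stated here.

```typescript
// Types prescribed by the workflow spec (the Conventional Commits spec allows others).
const ALLOWED_TYPES = ["feat", "fix", "docs", "test", "chore", "refactor"] as const;

// <type>(<optional scope>)<optional !>: <description>
const HEADER_PATTERN = new RegExp(`^(${ALLOWED_TYPES.join("|")})(\\([a-z0-9._/-]+\\))?(!)?: \\S.*$`);

// Validates only the first line of the message; the body and footers are free-form.
function isConventionalCommit(message: string): boolean {
  const header = message.split("\n", 1)[0];
  return HEADER_PATTERN.test(header);
}
```

The header of this very commit, `chore(openspec): archive engineering-docs and phase-2-production-ready changes`, passes the check.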


@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: SDK integration guide
The system SHALL include a document (`docs/engineering/11-sdk-guide.md`) that explains how each of the four language SDKs is structured, how to use them, and how to contribute to or extend them.
#### Scenario: SDK architecture overview present
- **WHEN** a new engineer reads 11-sdk-guide.md
- **THEN** they SHALL understand that all four SDKs (Node.js, Python, Go, Java) implement the same API surface (14 endpoints, 4 service clients, 1 TokenManager, 1 error type) with identical semantics, and why consistency across SDKs is a non-negotiable standard
#### Scenario: Node.js SDK documented
- **WHEN** a new engineer reads the Node.js SDK section
- **THEN** they SHALL find: installation (`npm install @sentryagent/idp-sdk`), the AgentIdPClient constructor, all 4 service clients (agents, credentials, tokens, audit), TokenManager auto-refresh behaviour, AgentIdPError structure, and a complete working code example for the most common flow (register agent → generate credential → issue token)
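The essential idea behind TokenManager's auto-refresh is to cache the token and refresh it slightly before expiry. A second concern is concurrent callers: they should share one in-flight refresh rather than each requesting a token. The class below is a generic sketch of that pattern, not the SDK's actual implementation. The `TokenResponse` shape, the injected `fetchToken` and `now` functions, and the 60-second skew are all assumptions.

```typescript
// Hypothetical token response shape; expiresIn is in seconds.
interface TokenResponse { accessToken: string; expiresIn: number }

class TokenManager {
  private token: string | null = null;
  private expiresAtMs = 0;
  private inflight: Promise<string> | null = null;

  constructor(
    private readonly fetchToken: () => Promise<TokenResponse>,
    private readonly now: () => number = Date.now,
    private readonly refreshSkewSeconds = 60,
  ) {}

  // Returns the cached token while it is outside the skew window; otherwise refreshes.
  // Concurrent callers during a refresh all await the same promise.
  getToken(): Promise<string> {
    if (this.token !== null && this.now() < this.expiresAtMs - this.refreshSkewSeconds * 1000) {
      return Promise.resolve(this.token);
    }
    if (this.inflight === null) {
      this.inflight = this.fetchToken()
        .then((res) => {
          this.token = res.accessToken;
          this.expiresAtMs = this.now() + res.expiresIn * 1000;
          return res.accessToken;
        })
        .finally(() => {
          this.inflight = null;
        });
    }
    return this.inflight;
  }
}
```

The skew covers clock drift and request latency, so a token is never presented in its final seconds of validity.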
#### Scenario: Python SDK documented
- **WHEN** a new engineer reads the Python SDK section
- **THEN** they SHALL find: installation (`pip install sentryagent-idp`), both sync (AgentIdPClient) and async (AsyncAgentIdPClient) variants, TokenManager and AsyncTokenManager auto-refresh, AgentIdPError, and a complete working example for sync and async usage
#### Scenario: Go SDK documented
- **WHEN** a new engineer reads the Go SDK section
- **THEN** they SHALL find: installation (`go get github.com/sentryagent/idp-sdk-go`), AgentIdPClient construction, goroutine-safe TokenManager, context.Context usage pattern, AgentIdPError with Code/HTTPStatus/Details, and a complete working example
#### Scenario: Java SDK documented
- **WHEN** a new engineer reads the Java SDK section
- **THEN** they SHALL find: Maven/Gradle dependency snippet, AgentIdPClient construction with builder pattern, sync methods and CompletableFuture async counterparts, thread-safe TokenManager, AgentIdPException, and a complete working example
#### Scenario: SDK contribution guide included
- **WHEN** a new engineer needs to add a new endpoint to all SDKs
- **THEN** the guide SHALL provide a step-by-step checklist for adding a new method to all four SDKs consistently: where to add the method, what the signature pattern is, how to write the corresponding test, and how to verify it compiles/passes in each language


@@ -0,0 +1,40 @@
## ADDED Requirements
### Requirement: Service deep-dive documents
The system SHALL include a document (`docs/engineering/05-services.md`) providing a deep-dive reference for every core service and component, following a consistent template: Purpose → Responsibility boundary → Public interface → Key methods → Database schema (if applicable) → Error types → Configuration.
#### Scenario: AgentService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AgentService section covering: responsibility (agent CRUD only), public methods (createAgent, getAgent, listAgents, updateAgent, deleteAgent), the `agents` table schema, AgentNotFoundError and AgentAlreadyExistsError, and what AgentService does NOT do (no auth, no credentials — Single Responsibility)
#### Scenario: OAuth2Service documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OAuth2Service section covering: responsibility (token issuance and revocation only), public methods (issueToken, validateToken, revokeToken), Redis token storage schema, JWT payload structure, token TTL configuration, and the Vault credential verification path vs bcrypt path
#### Scenario: CredentialService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the CredentialService section covering: responsibility (credential lifecycle only), public methods (generateCredential, rotateCredential, revokeCredential, listCredentials), the `credentials` table schema, bcrypt vs Vault storage decision, and the `vault_path` column purpose
#### Scenario: AuditService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AuditService section covering: responsibility (immutable audit logging only), public methods (logEvent, queryLogs), the `audit_logs` table schema, event types enum, 90-day retention policy, and why audit records are never updated or deleted
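An immutable audit log is easiest to enforce by never exposing a mutation path. The sketch below offers only `logEvent` and `queryLogs`, with no update or delete method, and shallowly freezes each record. The event type names are a hypothetical subset. The sketch applies the 90-day retention only as a query filter. How the real system expires old rows is not described in this spec.

```typescript
type AuditEventType = "agent.created" | "credential.rotated" | "token.issued"; // hypothetical subset

interface AuditRecord {
  readonly id: number;
  readonly eventType: AuditEventType;
  readonly agentId: string;
  readonly occurredAt: Date;
}

const RETENTION_DAYS = 90;

// Append-only log: there is intentionally no update or delete method.
class AuditLog {
  private readonly records: AuditRecord[] = [];

  logEvent(eventType: AuditEventType, agentId: string, occurredAt: Date = new Date()): AuditRecord {
    // Object.freeze is shallow: it stops field reassignment on the record itself.
    const record = Object.freeze({ id: this.records.length + 1, eventType, agentId, occurredAt });
    this.records.push(record);
    return record;
  }

  // Returns one agent's events that fall inside the retention window.
  queryLogs(agentId: string, now: Date = new Date()): AuditRecord[] {
    const cutoff = now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
    return this.records.filter((r) => r.agentId === agentId && r.occurredAt.getTime() >= cutoff);
  }
}
```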
#### Scenario: VaultClient documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the VaultClient section covering: purpose (wraps node-vault for KV v2 operations), public methods (writeSecret, readSecret, verifySecret, deleteSecret), the opt-in configuration (VAULT_ADDR env var), and the constant-time comparison in verifySecret and why it matters (timing attack prevention)
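A naive `===` comparison returns as soon as the first differing character is found. Response time can therefore leak how much of a guessed secret was correct. The sketch below shows the standard Node.js remedy: hash both values to fixed-length digests, then compare with `crypto.timingSafeEqual`, which throws if the buffer lengths differ. This assumes the stored secret has already been read back from Vault KV v2 as plaintext. The function is an illustration, not VaultClient's actual `verifySecret`.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Constant-time secret comparison. Hashing first gives both sides the same
// 32-byte length, so timingSafeEqual never throws and no early length check
// leaks information about the stored secret.
function verifySecret(presented: string, stored: string): boolean {
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(stored, "utf8").digest();
  return timingSafeEqual(a, b);
}
```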
#### Scenario: OPA policy engine documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OPA section covering: purpose (dynamic access control beyond static OAuth scopes), how policies are loaded, how authorization decisions are made, the policy file locations, and how to write and test a new policy
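OPA's REST Data API evaluates a policy rule through `POST /v1/data/<package path>/<rule>` with the decision input wrapped in an `input` field. It returns `{"result": ...}` and omits `result` when the rule is undefined for that input. The helpers below build such a request and interpret the response with default-deny semantics. The `AuthzInput` fields and the policy path in the test are hypothetical; the real package names live in the repository's policy files.

```typescript
// Hypothetical authorization input; the real policies may expect different fields.
interface AuthzInput { agentId: string; action: string; resource: string }

// Builds a request for OPA's Data API. A dotted policy path such as
// "pkg.authz.allow" maps to the URL path /v1/data/pkg/authz/allow.
function buildOpaRequest(opaUrl: string, policyPath: string, input: AuthzInput): { url: string; body: string } {
  const path = policyPath.split(".").join("/");
  return {
    url: `${opaUrl.replace(/\/+$/, "")}/v1/data/${path}`,
    body: JSON.stringify({ input }),
  };
}

// Default deny: only an explicit boolean true allows the request. A missing
// "result" (rule undefined) or any other value is treated as a denial.
function parseOpaDecision(response: unknown): boolean {
  if (typeof response !== "object" || response === null) return false;
  return (response as { result?: unknown }).result === true;
}
```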
#### Scenario: Web Dashboard documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the Web Dashboard section covering: React 18 + Vite 5 + TypeScript structure, how it authenticates against the AgentIdP API, the main views (agent list, credential management, audit log viewer, policy editor), and how to run it locally
#### Scenario: Monitoring stack documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the monitoring section covering: Prometheus metrics exposed by the API server (`/metrics`), the key metrics (request count, latency histograms, active tokens, agent count), Grafana dashboard structure, and how to add a new metric to the API server
#### Scenario: Consistent template enforced
- **WHEN** a new engineer looks up any service
- **THEN** every service section SHALL follow the same template so the engineer knows exactly where to find each type of information

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Testing strategy document
The system SHALL include a document (`docs/engineering/09-testing.md`) that explains the test architecture, how to run tests, coverage requirements, and how to write new tests following project conventions.
#### Scenario: Test types and their purposes explained
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL understand the distinction between: unit tests (test one service/util in isolation, mock all dependencies, no running services needed) and integration tests (test full HTTP request/response cycle with real PostgreSQL + Redis)
#### Scenario: Test framework stack documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL find the test stack listed and explained: Jest 29.7 (test runner + assertions), ts-jest (TypeScript compilation), Supertest 6.3 (HTTP integration testing), and how each is configured
#### Scenario: Coverage gates documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL know the mandatory gates: >80% statements, >80% branches, >80% functions, >80% lines — and that PRs below these thresholds are blocked
#### Scenario: How to run the test suite documented
- **WHEN** a new engineer wants to run tests
- **THEN** the guide SHALL show: `npm test` (unit tests, no services), `npm run test:coverage` (unit tests + coverage report), `npm run test:integration` (requires Docker stack), and `npx jest src/services/agentService.test.ts` (single file)
#### Scenario: Unit test writing conventions shown
- **WHEN** a new engineer writes a new unit test
- **THEN** the guide SHALL show a complete example: how to mock a repository with `jest.mock()`, how to structure `describe`/`it` blocks, how to assert on thrown errors, and how to verify mock calls — using an actual test from the codebase as the example
#### Scenario: Integration test writing conventions shown
- **WHEN** a new engineer writes a new integration test
- **THEN** the guide SHALL show a complete example using Supertest: how to boot the Express app, how to seed test data, how to make authenticated requests (including getting a JWT first), and how to clean up after the test
#### Scenario: OWASP security testing reference included
- **WHEN** a new engineer writes security-relevant code
- **THEN** the guide SHALL include a reference to the OWASP Top 10 checks that are verified in QA sign-off and what each means in the context of this codebase (SQL injection, JWT attacks, credential exposure, etc.)

@@ -0,0 +1,35 @@
## ADDED Requirements
### Requirement: System architecture document
The system SHALL include a document (`docs/engineering/02-architecture.md`) that describes the full system architecture: components, their responsibilities, how they communicate, and the deployment topology.
#### Scenario: Component diagram present
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL find an ASCII or Mermaid component diagram showing all major components (API server, PostgreSQL, Redis, Vault, OPA, Web Dashboard, Prometheus, Grafana) and their connections
#### Scenario: Request lifecycle explained
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand how an incoming HTTP request flows from client → Express router → middleware chain → controller → service → repository → database and back
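
The controller → service → repository layering can be illustrated with a framework-free TypeScript sketch. The names `AgentController`, `AgentService` and `AgentRepository` follow this spec's naming, but everything else is an assumption made for illustration: the signatures, the in-memory `Map` standing in for PostgreSQL, the 409 status for duplicates, and placing input validation in the controller (the real flow has a separate validation middleware). Express routing and middleware are omitted.

```typescript
// Hypothetical shapes; the real project uses Express, PostgreSQL and richer types.
interface Agent {
  id: string;
  name: string;
}

interface HttpResult {
  status: number;
  body: unknown;
}

class AgentAlreadyExistsError extends Error {}

/** Repository layer: the only code that touches storage (a Map here, PostgreSQL in the real system). */
class AgentRepository {
  private readonly rows = new Map<string, Agent>();
  private nextId = 1;

  async insert(name: string): Promise<Agent> {
    const agent: Agent = { id: String(this.nextId++), name };
    this.rows.set(agent.id, agent);
    return agent;
  }

  async findByName(name: string): Promise<Agent | undefined> {
    return [...this.rows.values()].find((a) => a.name === name);
  }
}

/** Service layer: business rules only; knows nothing about HTTP. */
class AgentService {
  private readonly repo: AgentRepository;

  constructor(repo: AgentRepository) {
    this.repo = repo;
  }

  async createAgent(name: string): Promise<Agent> {
    if ((await this.repo.findByName(name)) !== undefined) {
      throw new AgentAlreadyExistsError(`agent "${name}" already exists`);
    }
    return this.repo.insert(name);
  }
}

/** Controller layer: translates HTTP input/output and maps domain errors to status codes. */
class AgentController {
  private readonly service: AgentService;

  constructor(service: AgentService) {
    this.service = service;
  }

  async create(body: { name?: unknown }): Promise<HttpResult> {
    if (typeof body.name !== "string" || body.name.length === 0) {
      return { status: 400, body: { error: "name is required" } };
    }
    try {
      return { status: 201, body: await this.service.createAgent(body.name) };
    } catch (err) {
      if (err instanceof AgentAlreadyExistsError) {
        return { status: 409, body: { error: err.message } };
      }
      throw err; // unknown errors propagate to the global error handler
    }
  }
}
```

Each layer only calls the one directly below it, which is what allows a unit test to replace the repository with a mock without changing the service.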
#### Scenario: Data flow for authentication described
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the OAuth 2.0 Client Credentials flow: client presents credentials → token service validates → Redis checked for existing token → JWT signed and returned
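
The authentication data flow (validate credentials, check the cache for a live token, otherwise sign a new JWT) might look roughly like the sketch below. It rests on several assumptions: a `Map` stands in for Redis, the secret check is an injected callback rather than the real bcrypt/Vault path, RS256 signing uses Node's built-in `crypto` rather than whatever JWT library the project uses, the cache is keyed by client ID only, and the claim names and 3600-second TTL are illustrative.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

/** Per-client cache entry; in the real system this would live in Redis with a TTL. */
interface CachedToken {
  jwt: string;
  expiresAt: number; // epoch seconds
}

type SecretVerifier = (clientId: string, clientSecret: string) => boolean;

const b64url = (buf: Buffer): string => buf.toString("base64url");

/** Sign a compact RS256 JWT (RSASSA-PKCS1-v1_5 with SHA-256) using Node's built-in crypto. */
function signJwt(payload: Record<string, unknown>, privateKey: KeyObject): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "RS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const signature = sign("sha256", Buffer.from(`${header}.${body}`), privateKey);
  return `${header}.${body}.${b64url(signature)}`;
}

/** Checks the signature only; a real verifier must also check exp, iss, aud and so on. */
function verifyJwtSignature(jwt: string, publicKey: KeyObject): boolean {
  const parts = jwt.split(".");
  if (parts.length !== 3) return false;
  const [header, body, signature] = parts;
  return verify("sha256", Buffer.from(`${header}.${body}`), publicKey, Buffer.from(signature, "base64url"));
}

class TokenIssuer {
  private readonly cache = new Map<string, CachedToken>(); // stand-in for Redis
  private readonly privateKey: KeyObject;
  private readonly verifySecret: SecretVerifier;
  private readonly ttlSeconds: number;
  private readonly now: () => number;

  constructor(
    privateKey: KeyObject,
    verifySecret: SecretVerifier,
    ttlSeconds = 3600,
    now: () => number = () => Math.floor(Date.now() / 1000),
  ) {
    this.privateKey = privateKey;
    this.verifySecret = verifySecret;
    this.ttlSeconds = ttlSeconds;
    this.now = now;
  }

  /** Validate the client, reuse a still-valid cached token, otherwise sign and cache a new one. */
  issue(clientId: string, clientSecret: string): string {
    if (!this.verifySecret(clientId, clientSecret)) {
      throw new Error("invalid_client");
    }
    const now = this.now();
    const cached = this.cache.get(clientId);
    if (cached !== undefined && cached.expiresAt > now) {
      return cached.jwt;
    }
    const expiresAt = now + this.ttlSeconds;
    const jwt = signJwt({ sub: clientId, iat: now, exp: expiresAt }, this.privateKey);
    this.cache.set(clientId, { jwt, expiresAt });
    return jwt;
  }
}
```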
#### Scenario: Deployment topology covered
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the multi-region deployment model (US, EU, APAC) and how Terraform provisions it
### Requirement: Technology stack and ADR document
The system SHALL include a document (`docs/engineering/03-tech-stack.md`) that lists every technology in the stack and explains why it was chosen over alternatives.
#### Scenario: Every major technology documented with rationale
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL find an entry for each technology (Node.js 18, TypeScript 5.3, Express 4.18, PostgreSQL 14, Redis 7, HashiCorp Vault, OPA, React 18, Vite 5, Prometheus, Grafana, Terraform) with: what it does in the system, why it was chosen, and what was considered but rejected
#### Scenario: TypeScript strict mode rationale explained
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand why strict mode is mandatory (safety, correctness, no implicit any) and what the consequences of violating it are
#### Scenario: PostgreSQL vs Redis responsibility boundary clear
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand what is stored in PostgreSQL (persistent state: agents, credentials, audit logs) vs Redis (ephemeral state: active tokens, rate limit counters)

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Annotated code walkthrough documents
The system SHALL include a document (`docs/engineering/06-walkthroughs.md`) containing three annotated end-to-end walkthroughs of the system's critical flows, with file:line references to actual source code.
#### Scenario: Token issuance walkthrough complete
- **WHEN** a new engineer reads the token issuance walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /oauth2/token → Express router → auth middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt credential check → Redis token cache check → JWT signing (src/utils/jwt.ts) → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number where it occurs
#### Scenario: Agent registration walkthrough complete
- **WHEN** a new engineer reads the agent registration walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents → auth middleware → validation middleware → AgentController → AgentService.createAgent → input validation (src/utils/validators.ts) → AgentRepository.create → PostgreSQL INSERT → AuditService.logEvent → HTTP 201 response with agent object
- **AND** every step SHALL reference the actual file and line number
#### Scenario: Credential rotation walkthrough complete
- **WHEN** a new engineer reads the credential rotation walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents/:id/credentials/:credId/rotate → auth middleware → CredentialController → CredentialService.rotateCredential → old credential revocation → new secret generation (src/utils/crypto.ts) → Vault write or bcrypt hash → CredentialRepository.update → token revocation for old credentials → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number
#### Scenario: Walkthroughs include version reference
- **WHEN** a new engineer reads any walkthrough
- **THEN** the document SHALL include a header stating the commit hash it was last verified against, so engineers know if the walkthrough may have drifted from the current code
#### Scenario: Each walkthrough annotates why, not just what
- **WHEN** a new engineer reads a walkthrough step
- **THEN** each step SHALL explain not just what the code does but WHY — e.g., why Redis is checked before signing a new JWT, why constant-time comparison is used for credential verification, why audit logging happens after persistence not before

@@ -0,0 +1,24 @@
## ADDED Requirements
### Requirement: Codebase structure document
The system SHALL include a document (`docs/engineering/04-codebase-structure.md`) that provides an annotated map of every top-level directory and key file in the repository, explaining what lives where and why.
#### Scenario: Full directory tree annotated
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find an annotated directory tree covering: `src/`, `tests/`, `docs/`, `sdk/`, `sdk-python/`, `sdk-go/`, `sdk-java/`, `terraform/`, `dashboard/`, `migrations/`, `openspec/`, `scripts/`
#### Scenario: src/ subdirectory roles explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand the role of each `src/` subdirectory: `controllers/` (HTTP layer), `services/` (business logic), `repositories/` (data access), `middleware/` (cross-cutting concerns), `utils/` (shared utilities), `types/` (TypeScript interfaces), `routes/` (Express router definitions)
#### Scenario: Where to add new code explained
- **WHEN** a new engineer needs to add a new feature
- **THEN** the document SHALL tell them exactly where each type of code belongs: new endpoint → controller + route; new business logic → service; new DB query → repository; new shared utility → utils/
#### Scenario: Key files identified and explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find explanations of: `src/app.ts` (Express app setup), `src/server.ts` (entry point), `src/types/index.ts` (canonical type definitions), `src/utils/errors.ts` (error hierarchy), `docker-compose.yml` (local dev stack), `tsconfig.json` (TypeScript config)
#### Scenario: DRY principle mapped to structure
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand how the directory structure enforces DRY: one location for types, one for crypto utilities, one for JWT utilities, one for validators — and why duplication across these is a blocking PR issue

@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: Deployment and operations guide
The system SHALL include a document (`docs/engineering/10-deployment.md`) that explains how the application is built, deployed, and operated — covering Docker, Terraform, environment configuration, and monitoring.
#### Scenario: Docker build and run documented
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the multi-stage Dockerfile (builder stage compiles TypeScript, production stage runs compiled JS with node:18-alpine and non-root USER node), how to build the image, and how to run it with the required environment variables
#### Scenario: Environment variables fully documented
- **WHEN** a new engineer needs to configure the application
- **THEN** the guide SHALL provide a complete table of all environment variables: name, purpose, required/optional, example value — covering database, Redis, JWT signing key, Vault, OPA, and rate limiting config
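
A fail-fast configuration loader is one way to keep that table and the running code in agreement. In this hypothetical sketch, `JWT_PRIVATE_KEY`, `JWT_PUBLIC_KEY`, `VAULT_ADDR` and `POLICY_DIR` (default `./policies`) are taken from the specs in this change; `DATABASE_URL`, `REDIS_URL` and `RATE_LIMIT_PER_MINUTE` are assumed names, and the default of 100 simply mirrors the free-tier limit of 100 req/min. Replace them with whatever `.env.example` actually defines.

```typescript
/** Typed configuration; field names are illustrative. */
interface AppConfig {
  databaseUrl: string;
  redisUrl: string;
  jwtPrivateKey: string;
  jwtPublicKey: string;
  vaultAddr: string | undefined; // Vault is opt-in via VAULT_ADDR
  policyDir: string;
  rateLimitPerMinute: number;
}

type Env = Readonly<Record<string, string | undefined>>;

// Assumed set of required variables for this sketch.
const REQUIRED_VARS = ["DATABASE_URL", "REDIS_URL", "JWT_PRIVATE_KEY", "JWT_PUBLIC_KEY"] as const;

/** Read and validate configuration once at startup, listing every missing variable in one error. */
function loadConfig(env: Env): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const required = (name: (typeof REQUIRED_VARS)[number]): string => env[name] ?? "";

  const rateLimitPerMinute = Number(env.RATE_LIMIT_PER_MINUTE ?? "100");
  if (!Number.isInteger(rateLimitPerMinute) || rateLimitPerMinute <= 0) {
    throw new Error("RATE_LIMIT_PER_MINUTE must be a positive integer");
  }

  return {
    databaseUrl: required("DATABASE_URL"),
    redisUrl: required("REDIS_URL"),
    jwtPrivateKey: required("JWT_PRIVATE_KEY"),
    jwtPublicKey: required("JWT_PUBLIC_KEY"),
    vaultAddr: env.VAULT_ADDR || undefined,
    policyDir: env.POLICY_DIR ?? "./policies",
    rateLimitPerMinute,
  };
}
```

Calling `loadConfig(process.env)` before the server starts listening turns a missing secret into an immediate, readable startup error instead of a failure on the first request.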
#### Scenario: Database migrations documented
- **WHEN** a new engineer needs to run or write migrations
- **THEN** the guide SHALL explain: where migration files live (`migrations/`), the naming convention, how to run them (`npm run migrate`), and how to write a new migration following the existing pattern
#### Scenario: Terraform multi-region deployment explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the Terraform structure: what modules exist, what the three regions (US, EU, APAC) deploy, how to run `terraform plan` and `terraform apply`, and what AWS/GCP resources are provisioned
#### Scenario: Prometheus metrics and Grafana explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL find: which endpoint exposes metrics (`/metrics`), the key metrics tracked, how to access the Grafana dashboard locally (port, login), and how to add a new metric counter or histogram to the API server
#### Scenario: Operational runbook for common tasks
- **WHEN** a new engineer is on-call or supporting operations
- **THEN** the guide SHALL include a runbook covering: how to check application health, how to rotate the JWT signing key, how to revoke all tokens for a compromised agent, and how to read audit logs for an incident

@@ -0,0 +1,44 @@
# Spec: Multi-Region Deployment (Terraform)
**Status**: Pending CEO approval
**Workstream**: 8 of 8
## Scope
- `terraform/` directory at project root
- Shared `agentidp` module (compute, networking, secrets)
- `environments/aws/` — ECS Fargate + RDS PostgreSQL + ElastiCache Redis
- `environments/gcp/` — Cloud Run + Cloud SQL + Memorystore Redis
- Deployment guide: `docs/devops/deployment.md`
## Module structure
```
terraform/
modules/
agentidp/
main.tf — compute (ECS task or Cloud Run service)
networking.tf — VPC, subnets, security groups
variables.tf — all configurable inputs
outputs.tf — service URL, DB endpoint, Redis endpoint
rds/ — managed PostgreSQL
redis/ — managed Redis
lb/ — ALB (AWS) or Cloud LB (GCP), TLS cert
environments/
aws/
main.tf — calls modules, sets AWS-specific vars
variables.tf
terraform.tfvars.example
gcp/
main.tf
variables.tf
terraform.tfvars.example
```
## Acceptance Criteria
- [ ] `terraform validate` passes for both aws and gcp environments
- [ ] `terraform plan` produces no errors against a live AWS/GCP account (test in dev env)
- [ ] JWT_PRIVATE_KEY and JWT_PUBLIC_KEY injected as environment secrets (not hardcoded)
- [ ] TLS termination at load balancer — HTTPS only in production modules
- [ ] PostgreSQL and Redis not publicly accessible — VPC-internal only
- [ ] `docs/devops/deployment.md` — end-to-end deployment walkthrough for AWS and GCP
- [ ] `terraform.tfvars.example` provided for both environments — no secrets in version control

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Development environment setup guide
The system SHALL include a document (`docs/engineering/07-dev-setup.md`) that takes a new engineer from zero to a fully running local stack in under 30 minutes, with no prior knowledge of the project assumed.
#### Scenario: Prerequisites listed completely
- **WHEN** a new engineer reads 07-dev-setup.md
- **THEN** they SHALL find a complete prerequisites list: Node.js 18+, Docker Desktop, Git, a PostgreSQL client (optional), and links to install each — with no undocumented dependencies
#### Scenario: Repository clone and setup steps complete
- **WHEN** a new engineer follows the clone and setup steps
- **THEN** they SHALL be able to: clone the repo, copy `.env.example` to `.env`, run `npm install`, and have all dependencies installed with zero manual configuration
#### Scenario: Docker Compose local stack starts successfully
- **WHEN** a new engineer runs `docker-compose up -d`
- **THEN** all services (PostgreSQL, Redis, API server) SHALL start, migrations SHALL run automatically, and the guide SHALL show how to verify each service is healthy
#### Scenario: Smoke test confirms working stack
- **WHEN** a new engineer follows the smoke test section
- **THEN** they SHALL run a curl command to POST /oauth2/token with the seed credentials and receive a valid JWT — confirming the full stack is operational
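
For reference, the smoke-test request follows the standard OAuth 2.0 client credentials grant (RFC 6749, section 4.4), which sends `grant_type=client_credentials` as a form-encoded body. The sketch below builds the same request in TypeScript. The base URL, port 3000 (the AgentIdP port named in the monitoring spec), and sending `client_id`/`client_secret` in the body rather than an HTTP Basic header are assumptions to check against the actual endpoint.

```typescript
interface TokenRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

/** Build a client-credentials token request; the caller passes the result to fetch(). */
function buildTokenRequest(baseUrl: string, clientId: string, clientSecret: string, scope?: string): TokenRequest {
  const form = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  });
  if (scope !== undefined) {
    form.set("scope", scope); // space-separated scopes, e.g. "agents:read agents:write"
  }
  return {
    url: new URL("/oauth2/token", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: form.toString(),
    },
  };
}
```

With Node 18's built-in `fetch`, usage would be `const { url, init } = buildTokenRequest("http://localhost:3000", id, secret); const res = await fetch(url, init);`, where `id` and `secret` are the seed credentials.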
#### Scenario: Common setup errors documented
- **WHEN** a new engineer encounters a setup error
- **THEN** the guide SHALL include a troubleshooting section covering the 5 most common errors: port already in use, migration failure, Node version mismatch, Docker not running, and missing .env variables
#### Scenario: Running tests locally documented
- **WHEN** a new engineer wants to run the test suite
- **THEN** the guide SHALL show: `npm test` (unit tests only, no services needed), `npm run test:integration` (requires Docker stack), and how to run a single test file
#### Scenario: Web dashboard local development documented
- **WHEN** a new engineer wants to run the web dashboard
- **THEN** the guide SHALL show how to start the Vite dev server (`npm run dev` in `dashboard/`) and which port it runs on, and confirm it connects to the local API server

@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: Company and product overview document
The system SHALL include a document (`docs/engineering/01-overview.md`) that explains SentryAgent.ai's mission, the AgentIdP product, target users, and why the product exists — providing new engineers with business and product context before they read any technical content.
#### Scenario: Mission and vision covered
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what SentryAgent.ai builds, why it exists, and what problem it solves for AI developers
#### Scenario: AGNTCY alignment explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what AGNTCY is, why SentryAgent.ai aligns to it, and what "first-class agent identity" means
#### Scenario: Product features listed
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see a summary of all product capabilities: agent registry, OAuth 2.0 auth, credential management, audit logs, SDKs, web dashboard, policy engine, and monitoring
#### Scenario: Phase roadmap visible
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand which capabilities belong to Phase 1, Phase 2, and Phase 3
#### Scenario: Engineering team structure explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand the Virtual Engineering Team model (CTO → Architect → Developer → QA) and how Claude operates as the engineering partner
#### Scenario: Free tier limits documented
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see the free tier limits (100 agents, 10,000 token requests/month, 90-day audit retention, 100 req/min) and understand the product's positioning

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Engineering workflow and contribution guide
The system SHALL include a document (`docs/engineering/08-workflow.md`) that prescribes the exact steps an engineer MUST follow to contribute any new feature or change, from idea to merged code.
#### Scenario: OpenSpec spec-first workflow explained
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand that NO implementation begins without an approved OpenAPI spec — and the exact sequence: CEO approves → Architect writes spec → CTO reviews → Developer implements → QA signs off → CEO approves merge
#### Scenario: OpenSpec CLI commands documented
- **WHEN** a new engineer wants to start a new change
- **THEN** the guide SHALL provide the exact commands: `openspec new change <name>`, `openspec status --change <name>`, `openspec instructions <artifact> --change <name>`, and what each command does
#### Scenario: Branching strategy documented
- **WHEN** a new engineer creates a branch
- **THEN** the guide SHALL prescribe: feature branches from `develop`, naming convention `feature/<change-name>`, PR targets `develop`, `develop` → `main` requires CTO + CEO approval
#### Scenario: TypeScript and code standards enforced in workflow
- **WHEN** a new engineer writes code
- **THEN** the guide SHALL state the non-negotiable standards: strict mode, no `any`, DRY, SOLID, JSDoc on all public methods — and that PRs violating these are blocked by the CTO regardless of functionality
#### Scenario: PR checklist documented
- **WHEN** a new engineer opens a PR
- **THEN** the guide SHALL provide a PR checklist: TypeScript compiles with zero errors, ESLint passes with zero warnings, unit tests pass, coverage gate met (>80%), integration tests pass, OpenAPI spec updated if endpoint changed, engineering docs updated if architecture changed
#### Scenario: Virtual engineering team roles explained for contributors
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand the role separation: they contribute as the Principal Developer role, the CTO reviews all PRs, the Architect owns spec changes, and QA owns the test sign-off — and how to interact with each role in practice
#### Scenario: Commit message conventions documented
- **WHEN** a new engineer writes a commit message
- **THEN** the guide SHALL prescribe the Conventional Commits format: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` prefixes — with examples for each
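
A commit header check for this convention could be sketched as below, for example in a hypothetical commit-msg hook. It accepts only the six prefixes listed above, an optional `(scope)`, and the optional `!` breaking-change marker from Conventional Commits 1.0.0; the function name and return shape are illustrative.

```typescript
const ALLOWED_TYPES = ["feat", "fix", "docs", "test", "chore", "refactor"] as const;
type CommitType = (typeof ALLOWED_TYPES)[number];

interface ParsedCommit {
  type: CommitType;
  scope?: string;
  breaking: boolean;
  description: string;
}

// type, optional (scope), optional "!", then ": " and a non-empty description
const HEADER = /^(\w+)(?:\(([\w.-]+)\))?(!)?: (.+)$/;

/** Parse the first line of a commit message; returns null if it does not follow the convention. */
function parseCommitHeader(message: string): ParsedCommit | null {
  const header = message.split("\n", 1)[0];
  const match = HEADER.exec(header);
  if (match === null) return null;
  const [, type, scope, bang, description] = match;
  if (!(ALLOWED_TYPES as readonly string[]).includes(type)) return null;
  return { type: type as CommitType, scope, breaking: bang === "!", description };
}
```

The header of this very commit, `chore(openspec): archive engineering-docs and phase-2-production-ready changes`, parses as type `chore` with scope `openspec`.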

@@ -0,0 +1,23 @@
# Spec: Go SDK (`github.com/sentryagent/idp-sdk-go`)
**Status**: Pending CEO approval
**Workstream**: 3 of 8
## Scope
- `sdk-go/` directory at project root
- Context-aware `AgentIdPClient` using standard library `net/http`
- `TokenManager` with mutex-guarded cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- Idiomatic Go error type `AgentIdPError` implementing `error` interface
- `go.mod` module: `github.com/sentryagent/idp-sdk-go`
- `sdk-go/README.md`
## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] All methods take `context.Context` as first argument
- [ ] No panics — all errors returned as `error`
- [ ] `AgentIdPError` implements `error` and exposes `.Code`, `.HTTPStatus`, `.Details`
- [ ] `TokenManager` is goroutine-safe (`sync.Mutex` on cache)
- [ ] `go vet` and `staticcheck` pass with zero warnings
- [ ] `go test ./...` with >80% coverage
- [ ] README matches Node.js SDK structure

@@ -0,0 +1,23 @@
# Spec: Java SDK (`ai.sentryagent:idp-sdk`)
**Status**: Pending CEO approval
**Workstream**: 4 of 8
## Scope
- `sdk-java/` directory at project root
- `AgentIdPClient` with sync and `CompletableFuture` async variants
- `TokenManager` with thread-safe cache and 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPException` extending `RuntimeException` with `code`, `httpStatus`, `details`
- `pom.xml`: groupId=`ai.sentryagent`, artifactId=`idp-sdk`, Java 17+
- `sdk-java/README.md`
## Acceptance Criteria
- [ ] All 14 endpoints covered
- [ ] Sync methods return typed POJOs; async methods return `CompletableFuture<T>`
- [ ] `AgentIdPException` thrown (not raw IOException) on all failure paths
- [ ] `TokenManager` is thread-safe (`synchronized` on cache)
- [ ] Apache HttpClient 5 for HTTP transport
- [ ] Jackson for JSON serialization
- [ ] `mvn verify` passes with >80% coverage (JUnit 5)
- [ ] README matches Node.js SDK structure

@@ -0,0 +1,32 @@
# Spec: Prometheus + Grafana Monitoring
**Status**: Pending CEO approval
**Workstream**: 7 of 8
## Scope
- `prom-client` integration — expose `GET /metrics`
- 7 metrics (counters + histograms) across all services
- `monitoring/` directory: Prometheus config + Grafana provisioning
- `docker-compose.monitoring.yml` overlay (adds prometheus + grafana services)
- Pre-built Grafana dashboard JSON (`monitoring/grafana/dashboards/agentidp.json`)
## Metrics
| Metric | Type | Labels |
|--------|------|--------|
| `agentidp_tokens_issued_total` | Counter | `outcome` (success/failure) |
| `agentidp_agents_registered_total` | Counter | `outcome` |
| `agentidp_http_requests_total` | Counter | `method`, `path`, `status_code` |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `path` |
| `agentidp_rate_limit_rejections_total` | Counter | — |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` |
## Acceptance Criteria
- [ ] `GET /metrics` returns Prometheus text format
- [ ] `/metrics` endpoint does NOT require Bearer auth (Prometheus scrapes it)
- [ ] All 7 metrics present and updating under load
- [ ] Grafana dashboard auto-provisions on `docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up` (the monitoring file is an overlay on the base stack)
- [ ] Grafana runs on port 3001 (no conflict with AgentIdP on 3000)
- [ ] `docs/devops/operations.md` updated with monitoring section
- [ ] `prom-client` added as new dependency — CEO approval gate
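
To make "Prometheus text format" concrete, the sketch below renders a labelled counter in the exposition format: a `# HELP` line, a `# TYPE` line, then one `name{label="value"} value` sample per label set. It is illustrative only. In the service itself `prom-client` produces this output, and the small `Counter` class here is a hypothetical stand-in, not `prom-client`'s API (it also skips escaping of the HELP text).

```typescript
type Labels = Readonly<Record<string, string>>;

/** Escape a label value per the text format: backslash, double quote and newline. */
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class Counter {
  private readonly series = new Map<string, { labels: Labels; value: number }>();
  readonly name: string;
  readonly help: string;

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Labels = {}, amount = 1): void {
    if (amount < 0) throw new Error("a counter can only go up");
    // Key each series by its sorted label pairs so {a, b} and {b, a} are the same series.
    const key = JSON.stringify(Object.entries(labels).sort(([a], [b]) => a.localeCompare(b)));
    const entry = this.series.get(key) ?? { labels, value: 0 };
    entry.value += amount;
    this.series.set(key, entry);
  }

  /** Render this metric in Prometheus text exposition format. */
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const { labels, value } of this.series.values()) {
      const pairs = Object.entries(labels).map(([k, v]) => `${k}="${escapeLabelValue(v)}"`);
      lines.push(`${this.name}${pairs.length > 0 ? `{${pairs.join(",")}}` : ""} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}
```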

@@ -0,0 +1,37 @@
# Spec: OPA Policy Engine Integration
**Status**: Pending CEO approval
**Workstream**: 5 of 8
## Scope
- New `OpaMiddleware` replacing static scope check in `auth.ts`
- `@openpolicyagent/opa-wasm` integration (embedded Wasm, no sidecar)
- `policies/authz.rego` — main allow/deny policy
- `policies/data/scopes.json` — scope to permission mapping
- SIGHUP handler to hot-reload policies without restart
- New env var: `POLICY_DIR` (default: `./policies`)
## Policy interface
```
input = {
"method": "GET",
"path": "/api/v1/agents",
"scopes": ["agents:read"],
"agentId": "uuid"
}
output = {
"allow": true | false,
"reason": "string" // populated when allow=false
}
```
## Acceptance Criteria
- [ ] All existing scope checks replaced by OPA evaluation
- [ ] Policy files hot-reloadable on SIGHUP (no restart required)
- [ ] OPA Wasm loaded at startup — fail-fast if `POLICY_DIR` invalid
- [ ] `allow=false` responses return `403` with `reason` in error body
- [ ] Existing test suite passes unchanged (OPA evaluates same rules as before)
- [ ] New unit tests for OPA middleware: allow/deny cases, missing scope, invalid input
- [ ] `POLICY_DIR` env var documented in `docs/devops/environment-variables.md`
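
The policy interface above maps directly onto TypeScript types. The sketch below shows that `input`/`output` contract and how a middleware might turn `allow=false` into a `403` carrying `reason`. The evaluator is deliberately a simple stand-in, not OPA and not the `@openpolicyagent/opa-wasm` API (the real middleware evaluates `policies/authz.rego`). The shape of the scope map is a guess at `policies/data/scopes.json`, and the `"METHOD /path-prefix"` permission format is an assumption.

```typescript
interface PolicyInput {
  method: string;
  path: string;
  scopes: string[];
  agentId: string;
}

interface PolicyDecision {
  allow: boolean;
  reason?: string; // populated when allow is false
}

/** Hypothetical scopes.json shape: scope -> list of "METHOD /path-prefix" permissions. */
type ScopeMap = Readonly<Record<string, readonly string[]>>;

/** Stand-in evaluator: allow if any scope grants the method on the path or a sub-path. */
function evaluate(input: PolicyInput, scopeMap: ScopeMap): PolicyDecision {
  if (input.scopes.length === 0) {
    return { allow: false, reason: "token carries no scopes" };
  }
  for (const scope of input.scopes) {
    for (const permission of scopeMap[scope] ?? []) {
      const [method, prefix] = permission.split(" ");
      if (!prefix) continue; // malformed entry: never treat it as a wildcard
      const pathMatches = input.path === prefix || input.path.startsWith(`${prefix}/`);
      if (method === input.method && pathMatches) {
        return { allow: true };
      }
    }
  }
  return { allow: false, reason: `no scope grants ${input.method} ${input.path}` };
}

/** Map a deny decision to the 403 error body described in the acceptance criteria. */
function toHttpError(decision: PolicyDecision): { status: 403; body: { error: string; reason: string } } | null {
  if (decision.allow) return null;
  return { status: 403, body: { error: "forbidden", reason: decision.reason ?? "denied by policy" } };
}
```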

@@ -0,0 +1,24 @@
# Spec: Python SDK (`sentryagent-idp`)
**Status**: Pending CEO approval
**Workstream**: 2 of 8
## Scope
- `sdk-python/` directory at project root
- `AgentIdPClient` with sync and async variants
- `TokenManager` with 60s auto-refresh
- Service clients: `AgentRegistryClient`, `CredentialClient`, `TokenClient`, `AuditClient`
- `AgentIdPError` typed exception
- Full type hints — `mypy --strict` clean
- `sdk-python/README.md` with installation and usage
## Acceptance Criteria
- [ ] All 14 API endpoints covered
- [ ] Sync client: `requests` library
- [ ] Async client: `httpx` library
- [ ] `mypy --strict` passes with zero errors
- [ ] Zero untyped code
- [ ] `AgentIdPError` raised (not raw requests/httpx exceptions) on all failure paths
- [ ] `TokenManager` tested: caches token, refreshes at exp-60s
- [ ] `pyproject.toml` with: name=sentryagent-idp, python>=3.9, dependencies declared
- [ ] README matches Node.js SDK structure
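
The "refreshes at exp-60s" rule shared by the SDK specs can be sketched in TypeScript, the Node.js SDK's language. This is an illustration under assumptions, not SDK code: `fetchToken` is a hypothetical callback returning the access token and its `exp` in epoch seconds, and the clock is injectable so the refresh boundary can be tested. Concurrent callers share a single in-flight request, which is the JavaScript analogue of the mutex-guarded (Go) and `synchronized` (Java) caches those specs require.

```typescript
interface IssuedToken {
  accessToken: string;
  expiresAt: number; // epoch seconds (the JWT "exp" claim)
}

type FetchToken = () => Promise<IssuedToken>;

class TokenManager {
  private cached: IssuedToken | null = null;
  private inFlight: Promise<IssuedToken> | null = null;
  private readonly fetchToken: FetchToken;
  private readonly refreshSkewSeconds: number;
  private readonly now: () => number;

  constructor(fetchToken: FetchToken, refreshSkewSeconds = 60, now: () => number = () => Math.floor(Date.now() / 1000)) {
    this.fetchToken = fetchToken;
    this.refreshSkewSeconds = refreshSkewSeconds;
    this.now = now;
  }

  /** Return the cached token until exp - skew, then fetch a new one (at most one fetch at a time). */
  async getToken(): Promise<string> {
    if (this.cached !== null && this.now() < this.cached.expiresAt - this.refreshSkewSeconds) {
      return this.cached.accessToken;
    }
    if (this.inFlight === null) {
      this.inFlight = this.fetchToken()
        .then((token) => {
          this.cached = token;
          return token;
        })
        .finally(() => {
          this.inFlight = null; // cleared on success and on failure, so a failed fetch can be retried
        });
    }
    return (await this.inFlight).accessToken;
  }
}
```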

@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: SDK integration guide
The system SHALL include a document (`docs/engineering/11-sdk-guide.md`) that explains how each of the four language SDKs is structured, how to use them, and how to contribute to or extend them.
#### Scenario: SDK architecture overview present
- **WHEN** a new engineer reads 11-sdk-guide.md
- **THEN** they SHALL understand that all four SDKs (Node.js, Python, Go, Java) implement the same API surface (14 endpoints, 4 service clients, 1 TokenManager, 1 error type) with identical semantics, and why consistency across SDKs is a non-negotiable standard
#### Scenario: Node.js SDK documented
- **WHEN** a new engineer reads the Node.js SDK section
- **THEN** they SHALL find: installation (`npm install @sentryagent/idp-sdk`), the AgentIdPClient constructor, all 4 service clients (agents, credentials, tokens, audit), TokenManager auto-refresh behaviour, AgentIdPError structure, and a complete working code example for the most common flow (register agent → generate credential → issue token)
#### Scenario: Python SDK documented
- **WHEN** a new engineer reads the Python SDK section
- **THEN** they SHALL find: installation (`pip install sentryagent-idp`), both sync (AgentIdPClient) and async (AsyncAgentIdPClient) variants, TokenManager and AsyncTokenManager auto-refresh, AgentIdPError, and a complete working example for sync and async usage
#### Scenario: Go SDK documented
- **WHEN** a new engineer reads the Go SDK section
- **THEN** they SHALL find: installation (`go get github.com/sentryagent/idp-sdk-go`), AgentIdPClient construction, goroutine-safe TokenManager, context.Context usage pattern, AgentIdPError with Code/HTTPStatus/Details, and a complete working example
#### Scenario: Java SDK documented
- **WHEN** a new engineer reads the Java SDK section
- **THEN** they SHALL find: Maven/Gradle dependency snippet, AgentIdPClient construction with builder pattern, sync methods and CompletableFuture async counterparts, thread-safe TokenManager, AgentIdPException, and a complete working example
#### Scenario: SDK contribution guide included
- **WHEN** a new engineer needs to add a new endpoint to all SDKs
- **THEN** the guide SHALL provide a step-by-step checklist for adding a new method to all four SDKs consistently: where to add the method, what the signature pattern is, how to write the corresponding test, and how to verify it compiles/passes in each language

@@ -0,0 +1,40 @@
## ADDED Requirements
### Requirement: Service deep-dive documents
The system SHALL include a document (`docs/engineering/05-services.md`) providing a deep-dive reference for every core service and component, following a consistent template: Purpose → Responsibility boundary → Public interface → Key methods → Database schema (if applicable) → Error types → Configuration.
#### Scenario: AgentService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AgentService section covering: responsibility (agent CRUD only), public methods (createAgent, getAgent, listAgents, updateAgent, deleteAgent), the `agents` table schema, AgentNotFoundError and AgentAlreadyExistsError, and what AgentService does NOT do (no auth, no credentials — Single Responsibility)
#### Scenario: OAuth2Service documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OAuth2Service section covering: responsibility (token issuance and revocation only), public methods (issueToken, validateToken, revokeToken), Redis token storage schema, JWT payload structure, token TTL configuration, and the Vault credential verification path vs bcrypt path
#### Scenario: CredentialService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the CredentialService section covering: responsibility (credential lifecycle only), public methods (generateCredential, rotateCredential, revokeCredential, listCredentials), the `credentials` table schema, bcrypt vs Vault storage decision, and the `vault_path` column purpose
#### Scenario: AuditService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AuditService section covering: responsibility (immutable audit logging only), public methods (logEvent, queryLogs), the `audit_logs` table schema, event types enum, 90-day retention policy, and why audit records are never updated or deleted
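
The "never updated or deleted" rule is easiest to see as an interface that simply has no update or delete method. The in-memory sketch below is hypothetical: the real store is the `audit_logs` PostgreSQL table, and the event names and field names are assumptions. It shows an append-only log whose query applies the 90-day retention window by filtering on timestamp; how expired rows are eventually purged is not specified here and is left out.

```typescript
type AuditEventType = "agent.created" | "credential.rotated" | "token.issued"; // hypothetical subset

interface AuditRecord {
  readonly id: number;
  readonly eventType: AuditEventType;
  readonly agentId: string;
  readonly occurredAt: Date;
}

const RETENTION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

class AuditLog {
  private readonly records: AuditRecord[] = [];

  /** Append-only: each record is frozen on write, and there is deliberately no update or delete method. */
  logEvent(eventType: AuditEventType, agentId: string, occurredAt: Date = new Date()): AuditRecord {
    const record: AuditRecord = Object.freeze({ id: this.records.length + 1, eventType, agentId, occurredAt });
    this.records.push(record);
    return record;
  }

  /** Events for one agent inside the retention window, oldest first. */
  queryLogs(agentId: string, now: Date = new Date()): readonly AuditRecord[] {
    const cutoff = now.getTime() - RETENTION_DAYS * DAY_MS;
    return this.records.filter((r) => r.agentId === agentId && r.occurredAt.getTime() >= cutoff);
  }
}
```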
#### Scenario: VaultClient documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the VaultClient section covering: purpose (wraps node-vault for KV v2 operations), public methods (writeSecret, readSecret, verifySecret, deleteSecret), the opt-in configuration (VAULT_ADDR env var), and the constant-time comparison in verifySecret and why it matters (timing attack prevention)
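
The timing-attack point is worth seeing in code. A plain `===` on strings may stop at the first differing character, so response time can leak how much of a guessed secret was correct. Node's `crypto.timingSafeEqual` compares two equal-length buffers in constant time and throws if the lengths differ, so the sketch below hashes both inputs with SHA-256 first to get fixed-length digests. This is a generic sketch of the technique, not the project's `verifySecret`.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

/**
 * Compare a presented secret with the stored one without leaking, through timing,
 * how many leading characters matched. Hashing both values to SHA-256 yields
 * equal-length buffers, which timingSafeEqual requires (it throws on a length mismatch).
 */
function constantTimeEquals(presented: string, stored: string): boolean {
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(stored, "utf8").digest();
  return timingSafeEqual(a, b);
}
```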
#### Scenario: OPA policy engine documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OPA section covering: purpose (dynamic access control beyond static OAuth scopes), how policies are loaded, how authorization decisions are made, the policy file locations, and how to write and test a new policy
#### Scenario: Web Dashboard documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the Web Dashboard section covering: React 18 + Vite 5 + TypeScript structure, how it authenticates against the AgentIdP API, the main views (agent list, credential management, audit log viewer, policy editor), and how to run it locally
#### Scenario: Monitoring stack documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the monitoring section covering: Prometheus metrics exposed by the API server (`/metrics`), the key metrics (request count, latency histograms, active tokens, agent count), Grafana dashboard structure, and how to add a new metric to the API server
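To show what "adding a new metric" produces at `/metrics`, here is a dependency-free labeled counter rendered in the Prometheus text exposition format. In a Node API server this would normally be done with a client library such as `prom-client`; whether the API server uses it is not stated, and the metric name below is hypothetical.

```typescript
// Escapes a label value as the Prometheus text format requires.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Minimal monotonic counter with labels; one series per distinct label set.
class LabeledCounter {
  private readonly name: string;
  private readonly help: string;
  private readonly values = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Record<string, string> = {}, by = 1): void {
    if (by < 0) throw new Error("counters can only increase");
    // Sort label names so {a,b} and {b,a} map to the same series.
    const key = Object.entries(labels)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // Produces the HELP/TYPE header followed by one sample line per series.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const [key, value] of this.values) {
      lines.push(key === "" ? `${this.name} ${value}` : `${this.name}{${key}} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}
```

The `_total` suffix and a small, fixed set of label values (for example `outcome`) follow Prometheus naming conventions and keep series cardinality bounded.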
#### Scenario: Consistent template enforced
- **WHEN** a new engineer looks up any service
- **THEN** every service section SHALL follow the same template so the engineer knows exactly where to find each type of information

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Testing strategy document
The system SHALL include a document (`docs/engineering/09-testing.md`) that explains the test architecture, how to run tests, coverage requirements, and how to write new tests following project conventions.
#### Scenario: Test types and their purposes explained
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL understand the distinction between: unit tests (test one service/util in isolation, mock all dependencies, no running services needed) and integration tests (test full HTTP request/response cycle with real PostgreSQL + Redis)
#### Scenario: Test framework stack documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL find the test stack listed and explained: Jest 29.7 (test runner + assertions), ts-jest (TypeScript compilation), Supertest 6.3 (HTTP integration testing), and how each is configured
#### Scenario: Coverage gates documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL know the mandatory gates: >80% statements, >80% branches, >80% functions, >80% lines — and that PRs below these thresholds are blocked
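A minimal sketch of how these gates are typically enforced with Jest's `coverageThreshold` option, assuming the `ts-jest` preset named in the test stack. The `collectCoverageFrom` glob is an assumption; in a real `jest.config.ts` this object would be the default export, typed with `Config` from `jest`.

```typescript
// Sketch of the relevant part of a jest.config.ts (shown as a plain object here).
const config = {
  preset: "ts-jest",
  testEnvironment: "node",
  collectCoverageFrom: ["src/**/*.ts"], // assumed source layout
  coverageThreshold: {
    // Jest fails the run if any global metric falls below these percentages.
    global: { statements: 80, branches: 80, functions: 80, lines: 80 },
  },
};
```

Note that a Jest threshold is a minimum, so exactly 80% passes; if the gate is meant to be strictly greater than 80%, the configured values would need to sit slightly above 80.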
#### Scenario: How to run the test suite documented
- **WHEN** a new engineer wants to run tests
- **THEN** the guide SHALL show: `npm test` (unit tests, no services), `npm run test:coverage` (unit tests + coverage report), `npm run test:integration` (requires Docker stack), and `npx jest src/services/agentService.test.ts` (single file)
#### Scenario: Unit test writing conventions shown
- **WHEN** a new engineer writes a new unit test
- **THEN** the guide SHALL show a complete example: how to mock a repository with `jest.mock()`, how to structure `describe`/`it` blocks, how to assert on thrown errors, and how to verify mock calls — using an actual test from the codebase as the example
#### Scenario: Integration test writing conventions shown
- **WHEN** a new engineer writes a new integration test
- **THEN** the guide SHALL show a complete example using Supertest: how to boot the Express app, how to seed test data, how to make authenticated requests (including getting a JWT first), and how to clean up after the test
#### Scenario: OWASP security testing reference included
- **WHEN** a new engineer writes security-relevant code
- **THEN** the guide SHALL include a reference to the OWASP Top 10 checks that are verified in QA sign-off and what each means in the context of this codebase (SQL injection, JWT attacks, credential exposure, etc.)
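For the SQL injection item, the core contrast is string-built SQL versus a bound parameter. The sketch returns the `{ text, values }` query-config shape that node-postgres accepts, without connecting to a database; the `client_id` column is hypothetical.

```typescript
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Unsafe: user input becomes part of the SQL text itself.
function unsafeFindCredentials(clientId: string): string {
  return `SELECT id FROM credentials WHERE client_id = '${clientId}'`;
}

// Safe: the SQL text is fixed and clientId travels separately as parameter $1,
// so the database never parses it as SQL.
function findCredentialsQuery(clientId: string): QueryConfig {
  return { text: "SELECT id, status FROM credentials WHERE client_id = $1", values: [clientId] };
}
```

With input `x' OR '1'='1`, the unsafe version produces a query that matches every row, while the parameterized version just looks for a client literally named that.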

@@ -0,0 +1,21 @@
# Spec: HashiCorp Vault Integration
**Status**: Pending CEO approval
**Workstream**: 1 of 8
## Scope
- VaultClient class wrapping `node-vault`
- `005_add_vault_path.sql` migration
- Updated CredentialService to write new credential secrets to Vault instead of PostgreSQL
- New env vars: VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT
- Migration guide: bcrypt → Vault coexistence strategy
## Acceptance Criteria
- [ ] New credentials: secret written to Vault KV v2, `vault_path` stored in PostgreSQL
- [ ] Credential rotation: Vault versioned update, `vault_path` unchanged
- [ ] Credential revocation: Vault secret deleted, DB status = `revoked`
- [ ] Existing bcrypt credentials continue to work until rotated
- [ ] VaultClient follows existing service interface pattern (DRY, SOLID)
- [ ] Zero `any` types, TypeScript strict
- [ ] `VAULT_ADDR` / `VAULT_TOKEN` validation at startup (fail-fast)
- [ ] DevOps docs updated with Vault setup section
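The fail-fast criterion combined with Vault being opt-in via `VAULT_ADDR` suggests a startup check like the one sketched below: no `VAULT_ADDR` means Vault is disabled, while a half-configured setup aborts startup. Defaulting `VAULT_MOUNT` to `secret` (the KV v2 mount Vault's dev server enables) is an assumption, as is the secret path layout; the `<mount>/data/<path>` prefix is how KV v2 addresses secret data in Vault's HTTP API.

```typescript
interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

// Returns null when Vault is not configured, so credentials stay on the bcrypt path.
// Throws at startup when VAULT_ADDR is set but the rest of the configuration is unusable.
function loadVaultConfig(env: Record<string, string | undefined>): VaultConfig | null {
  const addr = env["VAULT_ADDR"];
  if (addr === undefined || addr === "") return null;
  if (!/^https?:\/\//.test(addr)) {
    throw new Error(`VAULT_ADDR must be an http(s) URL, got: ${addr}`);
  }
  const token = env["VAULT_TOKEN"];
  if (token === undefined || token === "") {
    throw new Error("VAULT_ADDR is set but VAULT_TOKEN is missing");
  }
  const mount = env["VAULT_MOUNT"] || "secret"; // assumed default mount
  return { addr, token, mount };
}

// KV v2 reads and writes go through <mount>/data/<path>.
function kvV2DataPath(config: VaultConfig, secretPath: string): string {
  return `${config.mount}/data/${secretPath.replace(/^\/+/, "")}`;
}
```

Usage at startup might be `const vault = loadVaultConfig(process.env)`, called before the server starts listening so a misconfiguration never reaches request handling.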

@@ -0,0 +1,34 @@
# Spec: Web Dashboard UI
**Status**: Pending CEO approval
**Workstream**: 6 of 8
## Scope
- `dashboard/` directory at project root
- React 18 + TypeScript strict, built with Vite 5
- TanStack Query v5 for server state
- shadcn/ui (Radix UI + Tailwind CSS) for components
- Four pages: Agents, Credentials, Audit Log, Health
- Client-side auth: `clientId` + `clientSecret` → `TokenManager`
- Served from AgentIdP server at `GET /dashboard` (static build)
## Pages
| Page | Route | Scope Required |
|------|-------|---------------|
| Login | `/dashboard/login` | None |
| Agents | `/dashboard/agents` | `agents:read` |
| Agent Detail | `/dashboard/agents/:id` | `agents:read` |
| Credentials | `/dashboard/agents/:id/credentials` | `agents:read` |
| Audit Log | `/dashboard/audit` | `audit:read` |
| Health | `/dashboard/health` | None |
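The route table translates directly into a client-side guard. This is a sketch with hypothetical names (`ROUTE_SCOPES`, `canAccess`); client-side checks only decide what to render, and the AgentIdP API must still enforce scopes on every request.

```typescript
// Required scope per dashboard route, from the pages table; null means no scope needed.
const ROUTE_SCOPES: ReadonlyArray<{ pattern: RegExp; scope: string | null }> = [
  { pattern: /^\/dashboard\/login$/, scope: null },
  { pattern: /^\/dashboard\/health$/, scope: null },
  { pattern: /^\/dashboard\/agents$/, scope: "agents:read" },
  { pattern: /^\/dashboard\/agents\/[^\/]+$/, scope: "agents:read" },
  { pattern: /^\/dashboard\/agents\/[^\/]+\/credentials$/, scope: "agents:read" },
  { pattern: /^\/dashboard\/audit$/, scope: "audit:read" },
];

// Unknown routes are denied rather than allowed by default.
function canAccess(path: string, grantedScopes: ReadonlySet<string>): boolean {
  const route = ROUTE_SCOPES.find((r) => r.pattern.test(path));
  if (route === undefined) return false;
  return route.scope === null || grantedScopes.has(route.scope);
}
```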
## Acceptance Criteria
- [ ] TypeScript strict — zero `any` across all dashboard files
- [ ] `dashboard/tsconfig.json` with `strict: true`
- [ ] Login form stores token in `sessionStorage` only (not `localStorage`)
- [ ] All write operations (suspend, revoke, rotate) require confirmation dialog
- [ ] OWASP Top 10 review: no XSS, no CSRF, no sensitive data in URL params
- [ ] Vite build outputs to `dashboard/dist/`; AgentIdP serves it as static
- [ ] `dashboard/README.md` — how to build and serve
- [ ] Responsive layout — functional on desktop and tablet
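The `sessionStorage`-only criterion can be sketched as a token store that depends on a minimal storage interface, so `window.sessionStorage` satisfies it in the browser and a `Map` can stand in under test. The key name and method names are hypothetical and not necessarily the real `TokenManager` API; the design point is that only the short-lived access token is persisted, never the `clientSecret`.

```typescript
// The subset of the Web Storage API this sketch needs; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface StoredToken {
  accessToken: string;
  expiresAt: number; // ms since epoch
}

const TOKEN_KEY = "agentidp.token"; // hypothetical key name

class TokenManager {
  private readonly store: KeyValueStore;

  constructor(store: KeyValueStore) {
    this.store = store;
  }

  // Persists only the access token and its expiry; the client secret is never stored.
  save(accessToken: string, expiresInSec: number, nowMs: number): void {
    const value: StoredToken = { accessToken, expiresAt: nowMs + expiresInSec * 1000 };
    this.store.setItem(TOKEN_KEY, JSON.stringify(value));
  }

  // Returns the token while it is unexpired; clears it once it has expired.
  get(nowMs: number): string | null {
    const raw = this.store.getItem(TOKEN_KEY);
    if (raw === null) return null;
    const parsed = JSON.parse(raw) as StoredToken;
    if (parsed.expiresAt <= nowMs) {
      this.store.removeItem(TOKEN_KEY);
      return null;
    }
    return parsed.accessToken;
  }

  clear(): void {
    this.store.removeItem(TOKEN_KEY);
  }
}
```

In the dashboard this would be constructed as `new TokenManager(window.sessionStorage)`, so the token disappears when the tab closes, unlike `localStorage`.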