docs: engineering knowledge base for new hires
Complete docs/engineering/ suite — 12 documents covering company overview, system architecture, tech stack ADRs, codebase structure, service deep dives, annotated code walkthroughs, dev setup, engineering workflow, testing strategy, deployment/ops, SDK guide, and README index. All content verified against source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md marked complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
140
docs/engineering/02-architecture.md
Normal file
@@ -0,0 +1,140 @@
# System Architecture

---

## 1. Component Diagram
```mermaid
graph TD
    Client["Client (AI Agent / Browser / CI)"]

    Client -->|HTTPS| ExpressApp["Express App (AgentIdP)"]

    subgraph ExpressApp["Express App — src/app.ts"]
        Router["Router (src/routes/)"]
        AuthMW["authMiddleware (src/middleware/auth.ts)"]
        OpaMW["opaMiddleware (src/middleware/opa.ts)"]
        Controller["Controller (src/controllers/)"]
        Service["Service (src/services/)"]
        Repository["Repository (src/repositories/)"]
        Router --> AuthMW --> OpaMW --> Controller --> Service --> Repository
    end

    Repository -->|parameterized SQL| PG["PostgreSQL 14\n(agents, credentials, audit_events, token_revocations)"]
    Service -->|Redis commands| Redis["Redis 7\n(token revocation list, monthly counts, rate-limit counters)"]
    Service -->|KV v2 read/write| Vault["HashiCorp Vault\n(opt-in — when VAULT_ADDR is set)"]

    ExpressApp -->|evaluate input| OPA["OPA Policy Engine\n(policies/authz.rego + data/scopes.json)"]
    ExpressApp -->|expose| Metrics["/metrics (prom-client)"]

    Dashboard["Dashboard SPA (React 18 + Vite 5)\ndashboard/dist/ served from /dashboard"]
    Client -->|browser| Dashboard
    Dashboard -->|REST API calls| ExpressApp

    Grafana["Grafana (port 3001)"] -->|scrapes| Metrics
```
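
The direction of dependencies inside the Express app (controller calls service, service calls repository) can be illustrated with a small sketch. It is not taken from the AgentIdP source: `Agent`, `AgentRepository`, `InMemoryAgentRepository`, `AgentService`, and `getAgentController` are hypothetical names, and an in-memory map stands in for PostgreSQL. The point is only that the service never sees HTTP details and the repository holds no business rules.

```typescript
// Hypothetical sketch of the Controller -> Service -> Repository layering.
// None of these names come from the AgentIdP codebase.

interface Agent {
  agentId: string;
  status: "active" | "suspended";
}

// Repository layer: data access only, no business rules.
interface AgentRepository {
  findById(agentId: string): Agent | undefined;
}

class InMemoryAgentRepository implements AgentRepository {
  private readonly rows = new Map<string, Agent>();

  constructor(seed: Agent[]) {
    for (const agent of seed) this.rows.set(agent.agentId, agent);
  }

  findById(agentId: string): Agent | undefined {
    return this.rows.get(agentId);
  }
}

// Service layer: business rules, no knowledge of HTTP.
class AgentService {
  private readonly repo: AgentRepository;

  constructor(repo: AgentRepository) {
    this.repo = repo;
  }

  getActiveAgent(agentId: string): Agent {
    const agent = this.repo.findById(agentId);
    if (!agent) throw new Error("not_found");
    if (agent.status !== "active") throw new Error("suspended");
    return agent;
  }
}

// Controller layer: translates between HTTP-shaped data and the service.
interface HttpResult {
  status: number;
  body: unknown;
}

function getAgentController(service: AgentService, agentId: string): HttpResult {
  try {
    return { status: 200, body: service.getActiveAgent(agentId) };
  } catch (err) {
    const code = err instanceof Error ? err.message : "unknown";
    return { status: code === "not_found" ? 404 : 403, body: { error: code } };
  }
}
```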

---

## 2. HTTP Request Lifecycle
Every authenticated API request travels through the following sequence. Understanding
this sequence end-to-end is essential for debugging and for writing new endpoints
correctly.

1. The HTTP request arrives at the Node.js HTTP listener, configured in `src/server.ts`, which calls `app.listen(PORT)` after `createApp()` resolves.
2. App-level middleware runs in registration order: `helmet()` sets security headers; `cors()` applies the CORS policy from `CORS_ORIGIN`; `morgan('combined')` logs the request line (skipped when `NODE_ENV=test`); `express.json()` and `express.urlencoded()` parse the body; and `metricsMiddleware` (`src/middleware/metrics.ts`) starts the request timer and records `agentidp_http_requests_total` and `agentidp_http_request_duration_seconds` when the response finishes.
3. The Express router matches the path to a route definition in `src/routes/*.ts` and hands off to the appropriate middleware chain.
4. `authMiddleware` (`src/middleware/auth.ts`) validates the Bearer JWT: it extracts the token from the `Authorization` header, calls `verifyToken()` to check the RS256 signature and expiry, then calls `redis.get('revoked:{jti}')` to check the revocation list. On success, it attaches the decoded `ITokenPayload` to `req.user`.
5. `opaMiddleware` (`src/middleware/opa.ts`) evaluates the OPA policy: it builds an `OpaInput` object from `req.method`, `req.baseUrl + req.path`, and `req.user.scope.split(' ')`, then calls `evaluate(input)`. Evaluation uses the Wasm bundle (`policies/authz.wasm`) when present, and otherwise the TypeScript fallback that reads `policies/data/scopes.json`. If the policy denies the request, the middleware calls `next(new AuthorizationError())`.
6. The controller (`src/controllers/*.ts`) receives the authorized request, extracts and validates path params and body using Joi schemas, then delegates to the service layer.
7. The service (`src/services/*.ts`) executes all business logic: it enforces free-tier limits, resolves domain rules, and calls repositories. The service has no knowledge of HTTP.
8. The repository (`src/repositories/*.ts`) executes parameterized SQL against PostgreSQL via `node-postgres`, or issues Redis commands via the `redis` client. No business logic lives here.
9. The controller serialises the service result and calls `res.status(code).json(payload)` with the appropriate HTTP status code.
10. `AuditService.logEvent()` is called. For high-throughput paths (token issuance, introspection, revocation) the call is fire-and-forget (prefixed with `void`, not awaited); for CRUD operations it is awaited. The audit event is written as an immutable row to the `audit_events` table in PostgreSQL.

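
Steps 4 and 5 can be sketched without Express as a few plain functions. This is an illustrative sketch, not the contents of `src/middleware/auth.ts` or `src/middleware/opa.ts`: the `TokenPayload` and `OpaInput` shapes, the `RevocationStore` interface, the `requiredScopes` table, and the function names `checkToken` and `buildInput` are assumptions. A `Map` stands in for both Redis and `policies/data/scopes.json`, and signature verification is omitted. The policy is deny-by-default: a route with no entry in the table is rejected.

```typescript
// Hypothetical shapes; the real ITokenPayload and OpaInput may differ.
interface TokenPayload {
  sub: string;
  jti: string;
  scope: string; // space-delimited, e.g. "agents:read agents:write"
  exp: number; // expiry, seconds since the Unix epoch
}

interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

// Stand-in for the Redis client: only the lookup used by the auth step.
interface RevocationStore {
  get(key: string): string | null;
}

// Step 4 (after signature verification): reject expired or revoked tokens.
function checkToken(payload: TokenPayload, store: RevocationStore, nowSeconds: number): boolean {
  if (payload.exp <= nowSeconds) return false;
  return store.get(`revoked:${payload.jti}`) === null;
}

// Assumed policy data: "METHOD path" -> scope required to call it.
const requiredScopes = new Map<string, string>([
  ["GET /api/v1/agents", "agents:read"],
  ["POST /api/v1/agents", "agents:write"],
]);

// Step 5: deny by default; allow only when the token carries the required scope.
function evaluate(input: OpaInput): boolean {
  const required = requiredScopes.get(`${input.method} ${input.path}`);
  return required !== undefined && input.scopes.includes(required);
}

// Builds the policy input from the request method, baseUrl + path, and token scopes.
function buildInput(method: string, baseUrl: string, path: string, payload: TokenPayload): OpaInput {
  return { method, path: baseUrl + path, scopes: payload.scope.split(" ") };
}
```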

---

## 3. OAuth 2.0 Client Credentials Flow
```mermaid
sequenceDiagram
    actor Agent
    participant AgentIdP
    participant PostgreSQL
    participant Redis
    participant Vault as Vault (optional)

    Agent->>AgentIdP: POST /api/v1/token<br/>grant_type=client_credentials<br/>client_id=<agentId><br/>client_secret=sk_live_...<br/>scope=agents:read agents:write

    AgentIdP->>PostgreSQL: SELECT * FROM agents WHERE agent_id = $1
    PostgreSQL-->>AgentIdP: agent row (status, etc.)

    AgentIdP->>PostgreSQL: SELECT * FROM credentials WHERE agent_id = $1 AND status = 'active'
    PostgreSQL-->>AgentIdP: active credential rows

    alt Vault path (vaultPath IS NOT NULL and VAULT_ADDR is set)
        AgentIdP->>Vault: readSecret(agentId, credentialId)
        Vault-->>AgentIdP: plain-text secret
        AgentIdP->>AgentIdP: crypto.timingSafeEqual(stored, candidate)
    else bcrypt path (fallback)
        AgentIdP->>AgentIdP: bcrypt.compare(clientSecret, secretHash)
    end

    AgentIdP->>Redis: GET monthly:tokens:{agentId}:{yyyy-mm}
    Redis-->>AgentIdP: current monthly count

    AgentIdP->>AgentIdP: signToken(payload, privateKey) — RS256 JWT

    AgentIdP->>Redis: INCR monthly:tokens:{agentId}:{yyyy-mm} (fire-and-forget)

    AgentIdP-->>Agent: 200 OK<br/>{ access_token, token_type: "Bearer", expires_in: 3600, scope }

    Note over Agent,AgentIdP: Subsequent protected API call

    Agent->>AgentIdP: GET /api/v1/agents<br/>Authorization: Bearer <access_token>
    AgentIdP->>AgentIdP: verifyToken(token, publicKey) — RS256 verify + expiry
    AgentIdP->>Redis: GET revoked:{jti}
    Redis-->>AgentIdP: null (not revoked)
    AgentIdP->>AgentIdP: OPA evaluate({method, path, scopes})
    AgentIdP-->>Agent: 200 OK — agents list
```
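
Three mechanics from the flow above can be sketched with Node's built-in `node:crypto` module: the constant-time secret comparison on the Vault path, the monthly counter key, and RS256 signing and verification. This is a sketch under stated assumptions, not the AgentIdP implementation: `secretsMatch`, `monthlyKey`, `encodeJson`, `signRs256`, and `verifyRs256` are hypothetical names, the key pair is a throwaway generated in-process, the secret strings in any usage are illustrative, and the verifier checks only the signature (the expiry check described for `verifyToken()` is left out). Hashing both secrets before comparing is one way to satisfy `timingSafeEqual`, which throws when its inputs differ in length; the real code may handle that differently.

```typescript
import { KeyObject, createHash, generateKeyPairSync, sign, timingSafeEqual, verify } from "node:crypto";

// Constant-time comparison of two secrets. timingSafeEqual throws when the
// inputs differ in length, so both sides are hashed to 32 bytes first.
function secretsMatch(stored: string, candidate: string): boolean {
  const a = createHash("sha256").update(stored).digest();
  const b = createHash("sha256").update(candidate).digest();
  return timingSafeEqual(a, b);
}

// Counter key following the pattern monthly:tokens:{agentId}:{yyyy-mm} (UTC assumed).
function monthlyKey(agentId: string, now: Date): string {
  const month = String(now.getUTCMonth() + 1).padStart(2, "0");
  return `monthly:tokens:${agentId}:${now.getUTCFullYear()}-${month}`;
}

// base64url-encodes a JSON value, as used for JWT header and payload segments.
function encodeJson(value: object): string {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// Produces header.payload.signature, signed with RSASSA-PKCS1-v1_5 + SHA-256 (RS256).
function signRs256(payload: object, privateKey: KeyObject): string {
  const signingInput = `${encodeJson({ alg: "RS256", typ: "JWT" })}.${encodeJson(payload)}`;
  const signature = sign("RSA-SHA256", Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Signature check only; a real verifier would also validate exp and other claims.
function verifyRs256(token: string, publicKey: KeyObject): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, signature] = parts;
  return verify(
    "RSA-SHA256",
    Buffer.from(`${header}.${body}`),
    publicKey,
    Buffer.from(signature, "base64url"),
  );
}

// Usage: generate a throwaway key pair, sign a payload, then verify it.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const token = signRs256({ sub: "agent-1", scope: "agents:read" }, privateKey);
console.log(verifyRs256(token, publicKey));
```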

---

## 4. Multi-Region Deployment Topology
```mermaid
graph LR
    TFRoot["Terraform Root Module\nterraform/"]
    TFRoot --> AWSMod["AWS Module\nterraform/environments/aws/"]
    TFRoot --> GCPMod["GCP Module\nterraform/environments/gcp/"]

    subgraph AWS["AWS (us-east-1 default)"]
        AWSVPC["VPC"] --> ECSCluster["ECS Cluster (Fargate)"]
        ECSCluster --> ECSTask["ECS Task — AgentIdP container"]
        ECSTask --> RDS["RDS PostgreSQL 14 (Multi-AZ)"]
        ECSTask --> Elasticache["ElastiCache Redis 7"]
        ALB["Application Load Balancer"] --> ECSCluster
    end

    subgraph GCP["GCP (us-central1 default)"]
        GCPVPC["VPC"] --> CloudRun["Cloud Run service — AgentIdP"]
        CloudRun --> CloudSQL["Cloud SQL PostgreSQL 14"]
        CloudRun --> Memorystore["Memorystore Redis 7"]
        GCPLB["Cloud Load Balancer"] --> CloudRun
    end

    AWSMod --> AWS
    GCPMod --> GCP

    ECR["ECR / Artifact Registry\n(container image)"] --> ECSTask
    ECR --> CloudRun
```

Each region is an independent deployment with its own PostgreSQL and Redis instances.
The Terraform root module sets `aws_region` (default `us-east-1`) and `gcp_region`
(default `us-central1`) as input variables. Infrastructure modules live under
`terraform/modules/` (agentidp, lb, rds, redis) with environment-specific configuration
under `terraform/environments/aws/` and `terraform/environments/gcp/`. Cross-region
data replication and federation are Phase 3 goals.