
SentryAgent.ai Developer eced5f8699 docs: engineering knowledge base for new hires

Complete docs/engineering/ suite — 12 documents covering company overview,
system architecture, tech stack ADRs, codebase structure, service deep dives,
annotated code walkthroughs, dev setup, engineering workflow, testing strategy,
deployment/ops, SDK guide, and README index. All content verified against
source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md
marked complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-29 12:38:42 +00:00


System Architecture

1. Component Diagram

graph TD
    Client["Client (AI Agent / Browser / CI)"]

    Client -->|HTTPS| ExpressApp["Express App (AgentIdP)"]

    subgraph ExpressApp["Express App — src/app.ts"]
        Router["Router (src/routes/)"]
        AuthMW["authMiddleware (src/middleware/auth.ts)"]
        OpaMW["opaMiddleware (src/middleware/opa.ts)"]
        Controller["Controller (src/controllers/)"]
        Service["Service (src/services/)"]
        Repository["Repository (src/repositories/)"]
        Router --> AuthMW --> OpaMW --> Controller --> Service --> Repository
    end

    Repository -->|parameterized SQL| PG["PostgreSQL 14\n(agents, credentials, audit_events, token_revocations)"]
    Service -->|Redis commands| Redis["Redis 7\n(token revocation list, monthly counts, rate-limit counters)"]
    Service -->|KV v2 read/write| Vault["HashiCorp Vault\n(opt-in — when VAULT_ADDR is set)"]

    ExpressApp -->|evaluate input| OPA["OPA Policy Engine\n(policies/authz.rego + data/scopes.json)"]
    ExpressApp -->|expose| Metrics["/metrics (prom-client)"]

    Dashboard["Dashboard SPA (React 18 + Vite 5)\ndashboard/dist/ served from /dashboard"]
    Client -->|browser| Dashboard
    Dashboard -->|REST API calls| ExpressApp

    Grafana["Grafana (port 3001)"] -->|scrapes| Metrics
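The Controller, Service, and Repository boxes in the diagram can be read as a strict one-way dependency chain. The sketch below is a minimal illustration of that layering, not the project's actual code: `AgentRow`, `AgentRepository`, `AgentService`, `registerAgentHandler`, the `free_tier_limit` reason, and the 201/403 status codes are all hypothetical names and values chosen for the example, and an in-memory repository stands in for PostgreSQL so the block runs on its own.

```typescript
// All names here are hypothetical; they only illustrate the layering.
interface AgentRow {
  agentId: string;
  ownerId: string;
}

// Repository layer: data access only, no business rules.
interface AgentRepository {
  countByOwner(ownerId: string): Promise<number>;
  insert(row: AgentRow): Promise<AgentRow>;
}

type RegisterResult =
  | { ok: true; agent: AgentRow }
  | { ok: false; reason: "free_tier_limit" };

// Service layer: business rules only, no HTTP request or response types.
class AgentService {
  constructor(
    private readonly repo: AgentRepository,
    private readonly freeTierMax: number, // assumed limit, not from the source
  ) {}

  async register(row: AgentRow): Promise<RegisterResult> {
    const existing = await this.repo.countByOwner(row.ownerId);
    if (existing >= this.freeTierMax) {
      return { ok: false, reason: "free_tier_limit" };
    }
    return { ok: true, agent: await this.repo.insert(row) };
  }
}

// Controller layer: translates the service result into a status and a body.
async function registerAgentHandler(
  service: AgentService,
  body: AgentRow,
): Promise<{ status: number; payload: unknown }> {
  const result = await service.register(body);
  if (result.ok) {
    return { status: 201, payload: result.agent };
  }
  return { status: 403, payload: { error: result.reason } };
}

// In-memory stand-in so the sketch runs without PostgreSQL.
class InMemoryAgentRepository implements AgentRepository {
  private rows: AgentRow[] = [];

  async countByOwner(ownerId: string): Promise<number> {
    return this.rows.filter((r) => r.ownerId === ownerId).length;
  }

  async insert(row: AgentRow): Promise<AgentRow> {
    this.rows.push(row);
    return row;
  }
}
```

Because the service depends only on the `AgentRepository` interface, it can be exercised in unit tests with the in-memory implementation and no HTTP server.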

2. HTTP Request Lifecycle

Every authenticated API request travels through the following sequence. Understanding this sequence end-to-end is essential for debugging and for writing new endpoints correctly.

1. The HTTP request arrives at the Node.js HTTP listener, configured in src/server.ts, which calls app.listen(PORT) after createApp() resolves.
2. App-level middleware runs in registration order: helmet() sets security headers; cors() applies the CORS policy from CORS_ORIGIN; morgan('combined') logs the request line (skipped when NODE_ENV=test); express.json() and express.urlencoded() parse the body; metricsMiddleware (src/middleware/metrics.ts) starts the request timer and, when the response finishes, records agentidp_http_requests_total and agentidp_http_request_duration_seconds.
3. The Express router matches the path to a route definition in src/routes/*.ts and hands off to the appropriate middleware chain.
4. authMiddleware (src/middleware/auth.ts) validates the Bearer JWT: it extracts the token from the Authorization header, calls verifyToken() to check the RS256 signature and expiry, then calls redis.get('revoked:{jti}') to check the revocation list. On success, it attaches the decoded ITokenPayload to req.user.
5. opaMiddleware (src/middleware/opa.ts) evaluates the OPA policy: it builds an OpaInput object from req.method, req.baseUrl + req.path, and req.user.scope.split(' '), then calls evaluate(input). Evaluation uses the Wasm bundle (policies/authz.wasm) when present; otherwise it uses the TypeScript fallback, which reads policies/data/scopes.json. If the policy denies the request, the middleware calls next(new AuthorizationError()).
6. The controller (src/controllers/*.ts) receives the authorized request, extracts and validates path params and body using Joi schemas, then delegates to the service layer.
7. The service (src/services/*.ts) executes all business logic: it enforces free-tier limits, resolves domain rules, and calls repositories. The service has no knowledge of HTTP.
8. The repository (src/repositories/*.ts) executes parameterized SQL against PostgreSQL via node-postgres, or issues Redis commands via the redis client. No business logic lives here.
9. The controller serialises the service result and calls res.status(xxx).json(payload).
10. AuditService.logEvent() is called. On high-throughput paths (token issuance, introspection, revocation) the call is fire-and-forget (void, not awaited); for CRUD operations it is awaited. The audit event is written as an immutable row to the audit_events table in PostgreSQL.
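The authentication and authorization steps above can be sketched as plain functions with injected dependencies. This is an illustrative sketch under stated assumptions, not the contents of src/middleware/auth.ts or src/middleware/opa.ts: `Req`, `Deps`, `TokenPayload`, `AuthenticationError`, `authenticate`, `buildOpaInput`, and `authorize` are hypothetical names, the `OpaInput` field names follow the `{method, path, scopes}` shape used elsewhere in this document, and the real `ITokenPayload` may carry more fields. The injected `verifyToken`, `redisGet`, and `evaluate` stand in for the real JWT verifier, Redis client, and OPA evaluator.

```typescript
// Hypothetical shapes for illustration only.
interface TokenPayload {
  sub: string;
  jti: string;
  scope: string; // space-delimited scopes
}

interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

interface Req {
  method: string;
  baseUrl: string;
  path: string;
  headers: Record<string, string | undefined>;
  user?: TokenPayload;
}

// Stand-ins for verifyToken(), the Redis client, and evaluate().
interface Deps {
  verifyToken(token: string): TokenPayload; // expected to throw on bad signature or expiry
  redisGet(key: string): Promise<string | null>;
  evaluate(input: OpaInput): boolean;
}

class AuthenticationError extends Error {} // hypothetical name
class AuthorizationError extends Error {}

// Auth step: Bearer extraction, token verification, revocation lookup.
async function authenticate(req: Req, deps: Deps): Promise<void> {
  const header = req.headers["authorization"] ?? "";
  if (!header.startsWith("Bearer ")) {
    throw new AuthenticationError("missing bearer token");
  }
  const payload = deps.verifyToken(header.slice("Bearer ".length));
  // Any value stored under revoked:{jti} is treated as "revoked".
  const revoked = await deps.redisGet(`revoked:${payload.jti}`);
  if (revoked !== null) {
    throw new AuthenticationError("token revoked");
  }
  req.user = payload;
}

// OPA step, part 1: build the policy input from the request.
function buildOpaInput(req: Req): OpaInput {
  if (!req.user) {
    throw new AuthenticationError("authenticate() must run first");
  }
  return {
    method: req.method,
    path: req.baseUrl + req.path,
    scopes: req.user.scope.split(" "),
  };
}

// OPA step, part 2: deny by throwing, mirroring next(new AuthorizationError()).
function authorize(req: Req, deps: Deps): void {
  if (!deps.evaluate(buildOpaInput(req))) {
    throw new AuthorizationError("policy denied");
  }
}
```

Keeping the dependencies injectable means the revocation and policy branches can be tested without a running Redis or OPA bundle; in an Express middleware, the thrown errors would instead be passed to `next()`.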

3. OAuth 2.0 Client Credentials Flow

sequenceDiagram
    actor Agent
    participant AgentIdP
    participant PostgreSQL
    participant Redis
    participant Vault as Vault (optional)

    Agent->>AgentIdP: POST /api/v1/token<br/>grant_type=client_credentials<br/>client_id=&lt;agentId&gt;<br/>client_secret=sk_live_...<br/>scope=agents:read agents:write

    AgentIdP->>PostgreSQL: SELECT * FROM agents WHERE agent_id = $1
    PostgreSQL-->>AgentIdP: agent row (status, etc.)

    AgentIdP->>PostgreSQL: SELECT * FROM credentials WHERE agent_id = $1 AND status = 'active'
    PostgreSQL-->>AgentIdP: active credential rows

    alt Vault path (vaultPath IS NOT NULL and VAULT_ADDR is set)
        AgentIdP->>Vault: readSecret(agentId, credentialId)
        Vault-->>AgentIdP: plain-text secret
        AgentIdP->>AgentIdP: crypto.timingSafeEqual(stored, candidate)
    else bcrypt path (fallback)
        AgentIdP->>AgentIdP: bcrypt.compare(clientSecret, secretHash)
    end

    AgentIdP->>Redis: GET monthly:tokens:{agentId}:{yyyy-mm}
    Redis-->>AgentIdP: current monthly count

    AgentIdP->>AgentIdP: signToken(payload, privateKey) — RS256 JWT

    AgentIdP->>Redis: INCR monthly:tokens:{agentId}:{yyyy-mm} (fire-and-forget)

    AgentIdP-->>Agent: 200 OK<br/>{ access_token, token_type: "Bearer", expires_in: 3600, scope }

    Note over Agent,AgentIdP: Subsequent protected API call

    Agent->>AgentIdP: GET /api/v1/agents<br/>Authorization: Bearer &lt;access_token&gt;
    AgentIdP->>AgentIdP: verifyToken(token, publicKey) — RS256 verify + expiry
    AgentIdP->>Redis: GET revoked:{jti}
    Redis-->>AgentIdP: null (not revoked)
    AgentIdP->>AgentIdP: OPA evaluate({method, path, scopes})
    AgentIdP-->>Agent: 200 OK — agents list
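The secret-verification branch and the monthly counter key in the flow above can be sketched as follows. This is a hedged illustration rather than the service's real implementation: the `Credential` shape, the `Verifiers` interface, and the function names are hypothetical, `readSecret` and `bcryptCompare` are injected stand-ins for the Vault client and `bcrypt.compare`, and formatting the `{yyyy-mm}` key segment in UTC is an assumption. Only `crypto.timingSafeEqual` and `Buffer` come from Node.js itself.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical credential row; the source names only vaultPath and secretHash.
interface Credential {
  credentialId: string;
  vaultPath: string | null;
  secretHash: string;
}

// Injected stand-ins for the Vault client and bcrypt.compare.
interface Verifiers {
  vaultEnabled: boolean; // true when VAULT_ADDR is set
  readSecret(agentId: string, credentialId: string): Promise<string>;
  bcryptCompare(candidate: string, hash: string): Promise<boolean>;
}

// timingSafeEqual throws if the buffers differ in length, so compare lengths first.
function constantTimeEquals(stored: string, candidate: string): boolean {
  const a = Buffer.from(stored, "utf8");
  const b = Buffer.from(candidate, "utf8");
  return a.length === b.length && timingSafeEqual(a, b);
}

// Vault path when the credential has a vaultPath and Vault is enabled; bcrypt otherwise.
async function verifySecret(
  agentId: string,
  cred: Credential,
  candidate: string,
  v: Verifiers,
): Promise<boolean> {
  if (cred.vaultPath !== null && v.vaultEnabled) {
    const stored = await v.readSecret(agentId, cred.credentialId);
    return constantTimeEquals(stored, candidate);
  }
  return v.bcryptCompare(candidate, cred.secretHash);
}

// Builds monthly:tokens:{agentId}:{yyyy-mm}; using the UTC month is an assumption.
function monthlyKey(agentId: string, now: Date): string {
  return `monthly:tokens:${agentId}:${now.toISOString().slice(0, 7)}`;
}
```

The early length check leaks only the secret's length, which is the usual trade-off when wrapping `timingSafeEqual` for variable-length strings.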

4. Multi-Region Deployment Topology

graph LR
    TFRoot["Terraform Root Module\nterraform/"]
    TFRoot --> AWSMod["AWS Module\nterraform/environments/aws/"]
    TFRoot --> GCPMod["GCP Module\nterraform/environments/gcp/"]

    subgraph AWS["AWS (us-east-1 default)"]
        AWSVPC["VPC"] --> ECSCluster["ECS Cluster (Fargate)"]
        ECSCluster --> ECSTask["ECS Task — AgentIdP container"]
        ECSTask --> RDS["RDS PostgreSQL 14 (Multi-AZ)"]
        ECSTask --> Elasticache["ElastiCache Redis 7"]
        ALB["Application Load Balancer"] --> ECSCluster
    end

    subgraph GCP["GCP (us-central1 default)"]
        GCPVPC["VPC"] --> CloudRun["Cloud Run service — AgentIdP"]
        CloudRun --> CloudSQL["Cloud SQL PostgreSQL 14"]
        CloudRun --> Memorystore["Memorystore Redis 7"]
        GCPLB["Cloud Load Balancer"] --> CloudRun
    end

    AWSMod --> AWS
    GCPMod --> GCP

    ECR["ECR / Artifact Registry\n(container image)"] --> ECSTask
    ECR --> CloudRun

Each region is an independent deployment with its own PostgreSQL and Redis instances. The Terraform root module sets aws_region (default us-east-1) and gcp_region (default us-central1) as input variables. Infrastructure modules live under terraform/modules/ (agentidp, lb, rds, redis) with environment-specific configuration under terraform/environments/aws/ and terraform/environments/gcp/. Cross-region data replication and federation are Phase 3 goals.
