# System Architecture

---

## 1. Component Diagram

```mermaid
graph TD
    Client["Client (AI Agent / Browser / CI)"]

    Client -->|HTTPS| ExpressApp["Express App (AgentIdP)"]

    subgraph ExpressApp["Express App — src/app.ts"]
        Router["Router (src/routes/)"]
        AuthMW["authMiddleware (src/middleware/auth.ts)"]
        TierMW["tierMiddleware (src/middleware/tier.ts)"]
        OpaMW["opaMiddleware (src/middleware/opa.ts)"]
        Controller["Controller (src/controllers/)"]
        Service["Service (src/services/)"]
        Repository["Repository (src/repositories/)"]
        Router --> AuthMW --> TierMW --> OpaMW --> Controller --> Service --> Repository
    end

    Repository -->|parameterized SQL| PG["PostgreSQL 14\n(agents, credentials, audit_events,\nanalytics_events, organizations,\nfederation_partners, webhook_subscriptions,\nagent_did_keys, delegation_chains)"]
    Service -->|Redis commands| Redis["Redis 7\n(token revocation list, daily tier counters,\nJWKS cache, compliance report cache,\nDID document cache)"]
    Service -->|KV v2 read/write| Vault["HashiCorp Vault\n(opt-in — credentials, DID private keys,\nwebhook secrets — when VAULT_ADDR is set)"]

    ExpressApp -->|evaluate input| OPA["OPA Policy Engine\n(policies/authz.rego + data/scopes.json)"]
    ExpressApp -->|expose| Metrics["/metrics (prom-client)"]
    ExpressApp -->|checkout session / webhooks| Stripe["Stripe\n(billing — when STRIPE_SECRET_KEY is set)"]

    Dashboard["Dashboard SPA (React 18 + Vite 5)\ndashboard/dist/ served from /dashboard"]
    Portal["Developer Portal (Next.js 14)\nportal/ — served separately on port 3002"]
    Client -->|browser| Dashboard
    Client -->|browser| Portal
    Dashboard -->|REST API calls| ExpressApp
    Portal -->|REST API calls| ExpressApp

    Grafana["Grafana (port 3001)"] -->|scrapes| Metrics

    OIDCProvider["OIDC Provider (oidc-provider v9)\nmounted at /oidc — A2A delegation tokens"]
    ExpressApp --- OIDCProvider
```

---

## 2. HTTP Request Lifecycle

Every authenticated API request travels through the following sequence. Understanding
this sequence end-to-end is essential for debugging and for writing new endpoints
correctly.

1. HTTP request arrives at the Node.js HTTP listener — configured in `src/server.ts`, which calls `app.listen(PORT)` after `createApp()` resolves.
2. App-level middleware runs in registration order: `helmet()` sets security headers; `cors()` applies the CORS policy from `CORS_ORIGIN`; `morgan('combined')` logs the request line (skipped when `NODE_ENV=test`); `express.json()` and `express.urlencoded()` parse the body; and `metricsMiddleware` (`src/middleware/metrics.ts`) starts the request timer, then records `agentidp_http_requests_total` and `agentidp_http_request_duration_seconds` when the response finishes.
3. The Express router matches the path to a route definition in `src/routes/*.ts` and hands off to the appropriate middleware chain.
4. `authMiddleware` (`src/middleware/auth.ts`) validates the Bearer JWT: extracts the token from the `Authorization` header, calls `verifyToken()` for RS256 signature and expiry, then calls `redis.get('revoked:{jti}')` to check the revocation list. On success, attaches the decoded `ITokenPayload` to `req.user`.
5. `tierMiddleware` (`src/middleware/tier.ts`) enforces per-tier daily API call limits. It reads the organisation's current tier from `TierService.fetchTier(orgId)`, checks the daily call counter from Redis key `rate:tier:calls:<orgId>` against `TIER_CONFIG[tier].maxCallsPerDay`, increments the counter on each passing request (fire-and-forget `INCR` with TTL set to next UTC midnight), and throws `TierLimitError` (429) when the limit is reached. This middleware is applied only to API routes, not to `/health`, `/metrics`, or `/dashboard`.
6. `opaMiddleware` (`src/middleware/opa.ts`) evaluates the OPA policy: builds an `OpaInput` object from `req.method`, `req.baseUrl + req.path`, and `req.user.scope.split(' ')`, then calls `evaluate(input)`. Uses the Wasm bundle (`policies/authz.wasm`) when present, or the TypeScript fallback reading `policies/data/scopes.json`. Calls `next(new AuthorizationError())` if the policy denies.
7. The controller (`src/controllers/*.ts`) receives the validated request, extracts and validates path params and body using Joi schemas, then delegates to the service layer.
8. The service (`src/services/*.ts`) executes all business logic: it enforces tier limits, resolves domain rules, and calls repositories. It has no knowledge of HTTP. Phases 3–6 introduce specialised services: `AnalyticsService` (fire-and-forget event recording), `TierService` (enforces per-tier agent and call limits), `ComplianceService` (AGNTCY compliance reports, cached 5 min in Redis), `FederationService` (cross-IdP JWT verification with cached JWKS), `DIDService` (W3C DID document generation and caching), `WebhookService` (subscription management with Vault-backed HMAC secrets), and `BillingService` (Stripe Checkout and webhook processing).
9. The repository (`src/repositories/*.ts`) executes parameterized SQL against PostgreSQL via `node-postgres`, or issues Redis commands via the `redis` client. No business logic lives here. Phases 3–6 added the following tables: `analytics_events` (daily metric counters), `organizations` (org tier and billing), `federation_partners` (cross-IdP trust registry), `webhook_subscriptions` and `webhook_deliveries` (outbound event delivery), `agent_did_keys` (public EC keys for DID documents), `delegation_chains` (A2A delegation records), and `tenant_subscriptions` (Stripe subscription status).
10. The controller serialises the service result and calls `res.status(xxx).json(payload)`.
11. `AuditService.logEvent()` is called — for high-throughput paths (token issuance, introspection, revocation) this is fire-and-forget (`void` — not awaited); for CRUD operations it is awaited. The audit event is written as an immutable row to the `audit_events` table in PostgreSQL.
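
The parsing work in steps 4 and 6 can be sketched as three small pure functions. This is a minimal illustration, not the code in `src/middleware/`: the function names and the `OpaInput` field names are assumptions, and only the `revoked:{jti}` key format and the three inputs to the policy (method, `baseUrl + path`, scopes split on spaces) come from the steps above.

```typescript
// Hypothetical helpers; the real middleware may be organised differently.

interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

/** Step 4: pull the token out of an `Authorization: Bearer <token>` header. */
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const [scheme, token, ...rest] = header.trim().split(/\s+/);
  // Reject other schemes, a missing token, and trailing junk.
  if (scheme?.toLowerCase() !== "bearer" || !token || rest.length > 0) return null;
  return token;
}

/** Step 4: Redis key looked up to see whether the token was revoked. */
function revocationKey(jti: string): string {
  return `revoked:${jti}`;
}

/** Step 6: assemble the policy input from the request and the token's scope claim. */
function buildOpaInput(method: string, baseUrl: string, path: string, scope: string): OpaInput {
  return {
    method,
    path: baseUrl + path,
    // Dropping empty entries is an addition here; the step above only says split(' ').
    scopes: scope.split(" ").filter((s) => s.length > 0),
  };
}
```

For example, `buildOpaInput("GET", "/api/v1", "/agents", "agents:read agents:write")` yields a path of `/api/v1/agents` and two scopes, which is the shape passed to `evaluate(input)`.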

---

## 3. OAuth 2.0 Client Credentials Flow

```mermaid
sequenceDiagram
    actor Agent
    participant AgentIdP
    participant PostgreSQL
    participant Redis
    participant Vault as Vault (optional)

    Agent->>AgentIdP: POST /api/v1/token<br/>grant_type=client_credentials<br/>client_id=&lt;agentId&gt;<br/>client_secret=sk_live_...<br/>scope=agents:read agents:write

    AgentIdP->>PostgreSQL: SELECT * FROM agents WHERE agent_id = $1
    PostgreSQL-->>AgentIdP: agent row (status, etc.)

    AgentIdP->>PostgreSQL: SELECT * FROM credentials WHERE agent_id = $1 AND status = 'active'
    PostgreSQL-->>AgentIdP: active credential rows

    alt Vault path (vaultPath IS NOT NULL and VAULT_ADDR is set)
        AgentIdP->>Vault: readSecret(agentId, credentialId)
        Vault-->>AgentIdP: plain-text secret
        AgentIdP->>AgentIdP: crypto.timingSafeEqual(stored, candidate)
    else bcrypt path (fallback)
        AgentIdP->>AgentIdP: bcrypt.compare(clientSecret, secretHash)
    end

    AgentIdP->>Redis: GET monthly:tokens:{agentId}:{yyyy-mm}
    Redis-->>AgentIdP: current monthly count

    AgentIdP->>AgentIdP: signToken(payload, privateKey) — RS256 JWT

    AgentIdP->>Redis: INCR monthly:tokens:{agentId}:{yyyy-mm} (fire-and-forget)

    AgentIdP-->>Agent: 200 OK<br/>{ access_token, token_type: "Bearer", expires_in: 3600, scope }

    Note over Agent,AgentIdP: Subsequent protected API call

    Agent->>AgentIdP: GET /api/v1/agents<br/>Authorization: Bearer &lt;access_token&gt;
    AgentIdP->>AgentIdP: verifyToken(token, publicKey) — RS256 verify + expiry
    AgentIdP->>Redis: GET revoked:{jti}
    Redis-->>AgentIdP: null (not revoked)
    AgentIdP->>AgentIdP: OPA evaluate({method, path, scopes})
    AgentIdP-->>Agent: 200 OK — agents list
```
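
The Vault branch of the diagram compares secrets with `crypto.timingSafeEqual`, which throws when its two buffers differ in byte length. The sketch below shows one common way to handle that, by comparing fixed-length SHA-256 digests instead of the raw strings. The hashing step, the function names, and the `CredentialRow` shape are assumptions for illustration; the source only states that the Vault path is taken when `vaultPath` is not null and `VAULT_ADDR` is set.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical row shape; only `vaultPath` is named in the diagram.
interface CredentialRow {
  vaultPath: string | null;
}

/** Branch selection from the diagram: Vault path only if both conditions hold. */
function usesVaultPath(cred: CredentialRow, vaultAddr: string | undefined): boolean {
  return cred.vaultPath !== null && vaultAddr !== undefined && vaultAddr !== "";
}

/** Constant-time comparison that is safe for inputs of different lengths. */
function secretsMatch(stored: string, candidate: string): boolean {
  // Digests are always 32 bytes, so timingSafeEqual never throws on length.
  const a = createHash("sha256").update(stored, "utf8").digest();
  const b = createHash("sha256").update(candidate, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

On the bcrypt branch no such helper is needed, since `bcrypt.compare` takes the candidate and the stored hash directly.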

---

## 3b. Analytics Event Capture Flow

Every successful token issuance writes a fire-and-forget analytics event:

```mermaid
sequenceDiagram
    participant Controller as TokenController
    participant OAuth2Svc as OAuth2Service
    participant AnalyticsSvc as AnalyticsService
    participant PG as PostgreSQL

    Controller->>OAuth2Svc: issueToken(clientId, clientSecret, scope, ...)
    OAuth2Svc->>OAuth2Svc: signToken() — RS256 JWT
    OAuth2Svc-->>Controller: ITokenResponse

    Note over OAuth2Svc,AnalyticsSvc: fire-and-forget (void)
    OAuth2Svc-)AnalyticsSvc: recordEvent(tenantId, 'token_issued')
    AnalyticsSvc-)PG: INSERT INTO analytics_events ... ON CONFLICT DO UPDATE count + 1
```

`recordEvent` uses a PostgreSQL upsert (`INSERT ... ON CONFLICT DO UPDATE`) that maintains one row per `(organization_id, date, metric_type)`. If the insert conflicts (same org, same date, same metric), the `count` column of the existing row is incremented atomically. This keeps the table compact (one row per day per metric type per org) and fast to query.
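
A minimal sketch of such a statement follows. It assumes a unique constraint on `(organization_id, date, metric_type)` and the column names quoted above; the `Queryable` interface is a hypothetical stand-in for whatever database client the repository layer uses, and `CURRENT_DATE` is one possible way to pick the day (it follows the session time zone).

```typescript
// Hypothetical minimal client interface: anything with a query(text, values) method.
interface Queryable {
  query(text: string, values: unknown[]): Promise<unknown>;
}

// Insert the first event of the day, or bump the counter of the existing row.
const UPSERT_EVENT_SQL = `
  INSERT INTO analytics_events (organization_id, date, metric_type, count)
  VALUES ($1, CURRENT_DATE, $2, 1)
  ON CONFLICT (organization_id, date, metric_type)
  DO UPDATE SET count = analytics_events.count + 1
`;

async function recordEvent(db: Queryable, organizationId: string, metricType: string): Promise<void> {
  await db.query(UPSERT_EVENT_SQL, [organizationId, metricType]);
}

/** Fire-and-forget wrapper: the caller never waits, and failures are only reported. */
function recordEventDetached(db: Queryable, organizationId: string, metricType: string): void {
  void recordEvent(db, organizationId, metricType).catch((err: unknown) => {
    console.error("analytics event dropped", err);
  });
}
```

Because the increment happens inside a single statement, two concurrent token issuances for the same org and day cannot lose an update.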

---

## 3c. Tier Enforcement Middleware Chain

```mermaid
sequenceDiagram
    actor Agent
    participant TierMW as tierMiddleware
    participant TierSvc as TierService
    participant Redis
    participant PG as PostgreSQL

    Agent->>TierMW: API request (with valid Bearer token)
    TierMW->>TierSvc: fetchTier(orgId)
    TierSvc->>PG: SELECT tier FROM organizations WHERE organization_id = $1
    PG-->>TierSvc: 'pro'
    TierSvc-->>TierMW: 'pro'

    TierMW->>Redis: GET rate:tier:calls:<orgId>
    Redis-->>TierMW: "4999" (current daily count)

    Note over TierMW: TIER_CONFIG['pro'].maxCallsPerDay = 50000 — limit not reached

    TierMW-)Redis: INCR rate:tier:calls:<orgId> (fire-and-forget, TTL = next UTC midnight)
    TierMW->>Agent: next() — request proceeds to opaMiddleware
```

When the counter equals or exceeds the tier limit, `tierMiddleware` throws `TierLimitError` (429) before `opaMiddleware` runs. The daily counter resets at UTC midnight via Redis TTL.
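
The two calculations behind this behaviour, the limit comparison and the TTL that expires at the next UTC midnight, can be sketched as pure functions. The function names are hypothetical, and treating a missing or unparsable counter as zero is an assumption, not something the source specifies.

```typescript
/** Seconds from `now` until 00:00:00 UTC of the following day (at least 1). */
function secondsUntilNextUtcMidnight(now: Date): number {
  // Date.UTC normalises day overflow, so "day + 1" also works on the last day of a month.
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.max(1, Math.ceil((next - now.getTime()) / 1000));
}

/** True when the counter read from Redis equals or exceeds the tier's daily limit. */
function isLimitReached(rawCount: string | null, maxCallsPerDay: number): boolean {
  // Redis returns null for a missing key; assume that means no calls yet today.
  const parsed = rawCount === null ? 0 : Number.parseInt(rawCount, 10);
  const count = Number.isNaN(parsed) ? 0 : parsed;
  return count >= maxCallsPerDay;
}
```

With the values from the diagram, `isLimitReached("4999", 50000)` is `false`, so the request proceeds and the counter is incremented with a TTL of `secondsUntilNextUtcMidnight(new Date())`.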

---

## 3d. A2A Delegation End-to-End Flow

```mermaid
sequenceDiagram
    actor Delegator as Delegator Agent
    actor Delegatee as Delegatee Agent
    participant AgentIdP
    participant DelegationSvc as DelegationService
    participant OIDCProvider as OIDC Provider
    participant PG as PostgreSQL

    Delegator->>AgentIdP: POST /api/v1/oauth2/token/delegate<br/>{ delegatee_id, scope }
    AgentIdP->>DelegationSvc: createDelegation(delegatorId, delegateeId, scope)
    DelegationSvc->>PG: INSERT INTO delegation_chains ...
    PG-->>DelegationSvc: chain_id
    DelegationSvc->>OIDCProvider: issue delegation JWT (delegator claims + delegatee sub)
    OIDCProvider-->>DelegationSvc: signed delegation token
    DelegationSvc-->>AgentIdP: IDelegationChain (with token)
    AgentIdP-->>Delegator: 201 { token, chain_id }

    Note over Delegatee,AgentIdP: Delegatee uses the delegation token
    Delegatee->>AgentIdP: POST /api/v1/oauth2/token/verify-delegation<br/>{ token }
    AgentIdP->>DelegationSvc: verifyDelegation(token, delegateeId)
    DelegationSvc->>PG: SELECT * FROM delegation_chains WHERE chain_id = $1 AND status = 'active'
    PG-->>DelegationSvc: chain row (not expired, not revoked)
    DelegationSvc->>OIDCProvider: verify token signature
    OIDCProvider-->>DelegationSvc: verified claims
    DelegationSvc-->>AgentIdP: IDelegationVerifyResult { valid: true, ... }
    AgentIdP-->>Delegatee: 200 { valid: true, delegatorId, scope }
```
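
The database-side checks in the verification half of this flow can be sketched as a single pure function. Everything here is illustrative: the row field names (including `expiresAt`), the `reason` strings, and the function name are assumptions, and signature verification is left to the OIDC provider as in the diagram.

```typescript
// Hypothetical projection of a delegation_chains row.
interface DelegationChainRow {
  chainId: string;
  delegateeId: string;
  status: string; // the diagram selects rows with status = 'active'
  expiresAt: Date; // assumed column backing the "not expired" check
}

interface ChainCheck {
  valid: boolean;
  reason?: string;
}

/** Checks that a chain row exists, is active, is unexpired, and belongs to the caller. */
function checkChain(row: DelegationChainRow | undefined, presentedBy: string, now: Date): ChainCheck {
  if (!row) return { valid: false, reason: "chain_not_found" };
  if (row.status !== "active") return { valid: false, reason: "chain_not_active" };
  if (row.expiresAt.getTime() <= now.getTime()) return { valid: false, reason: "chain_expired" };
  if (row.delegateeId !== presentedBy) return { valid: false, reason: "delegatee_mismatch" };
  return { valid: true };
}
```

Keeping these checks separate from token signature verification means a revoked or expired chain is rejected even when the token itself still verifies.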

---

## 4. Multi-Region Deployment Topology

```mermaid
graph LR
    TFRoot["Terraform Root Module\nterraform/"]
    TFRoot --> AWSMod["AWS Module\nterraform/environments/aws/"]
    TFRoot --> GCPMod["GCP Module\nterraform/environments/gcp/"]

    subgraph AWS["AWS (us-east-1 default)"]
        AWSVPC["VPC"] --> ECSCluster["ECS Cluster (Fargate)"]
        ECSCluster --> ECSTask["ECS Task — AgentIdP container"]
        ECSTask --> RDS["RDS PostgreSQL 14 (Multi-AZ)"]
        ECSTask --> Elasticache["ElastiCache Redis 7"]
        ALB["Application Load Balancer"] --> ECSCluster
    end

    subgraph GCP["GCP (us-central1 default)"]
        GCPVPC["VPC"] --> CloudRun["Cloud Run service — AgentIdP"]
        CloudRun --> CloudSQL["Cloud SQL PostgreSQL 14"]
        CloudRun --> Memorystore["Memorystore Redis 7"]
        GCPLB["Cloud Load Balancer"] --> CloudRun
    end

    AWSMod --> AWS
    GCPMod --> GCP

    ECR["ECR / Artifact Registry\n(container image)"] --> ECSTask
    ECR --> CloudRun
```

Each region is an independent deployment with its own PostgreSQL and Redis instances.
The Terraform root module sets `aws_region` (default `us-east-1`) and `gcp_region`
(default `us-central1`) as input variables. Infrastructure modules live under
`terraform/modules/` (agentidp, lb, rds, redis) with environment-specific configuration
under `terraform/environments/aws/` and `terraform/environments/gcp/`. Cross-region
data replication and federation are Phase 3 goals.