# Service Deep Dives

---

### AgentService

**Purpose**: Manages the full lifecycle of AI agent identities — registration, retrieval, updates, and decommissioning.

**Responsibility boundary**: AgentService does not handle HTTP, credential secrets,
token issuance, or audit log queries. It delegates all data access to
`AgentRepository` and `CredentialRepository`, and all audit logging to `AuditService`.
It enforces free-tier limits and domain rules before any data is written.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `registerAgent` | `data: ICreateAgentRequest, ipAddress: string, userAgent: string` | `Promise<IAgent>` | Checks the free-tier 100-agent limit, enforces email uniqueness, creates the agent record, writes an `agent.created` audit event, and increments the `agentidp_agents_registered_total` Prometheus counter |
| `getAgentById` | `agentId: string` | `Promise<IAgent>` | Retrieves a single agent by UUID; throws `AgentNotFoundError` if not found |
| `listAgents` | `filters: IAgentListFilters` | `Promise<IPaginatedAgentsResponse>` | Returns a paginated, optionally filtered list; filters include `owner`, `agentType`, `status`, `page`, `limit` |
| `updateAgent` | `agentId: string, data: IUpdateAgentRequest, ipAddress: string, userAgent: string` | `Promise<IAgent>` | Partially updates agent metadata; rejects updates to decommissioned agents; determines the correct audit action (`agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`) based on status transition |
| `decommissionAgent` | `agentId: string, ipAddress: string, userAgent: string` | `Promise<void>` | Soft-deletes the agent (sets `status = 'decommissioned'`); revokes all active credentials by calling `credentialRepository.revokeAllForAgent(agentId)` before decommissioning |
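
The status-transition rule described for `updateAgent` can be sketched as a small pure function. This is only an illustration of the mapping in the table: the type names and `resolveAuditAction` are hypothetical, and the status values are taken from the statuses mentioned elsewhere in this document.

```typescript
// Hypothetical sketch: type and function names are illustrative, not the service's own.
type AgentStatus = "active" | "suspended" | "decommissioned";
type AgentAuditAction =
  | "agent.updated"
  | "agent.suspended"
  | "agent.reactivated"
  | "agent.decommissioned";

function resolveAuditAction(
  previous: AgentStatus,
  requested: AgentStatus | undefined
): AgentAuditAction {
  // No status in the patch, or no change: an ordinary metadata update.
  if (requested === undefined || requested === previous) return "agent.updated";
  if (requested === "suspended") return "agent.suspended";
  if (requested === "decommissioned") return "agent.decommissioned";
  // Only "active" remains, reached from a non-active state.
  return "agent.reactivated";
}
```

Because `updateAgent` rejects updates to decommissioned agents before this point, the sketch does not special-case `previous === "decommissioned"`.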

**Database / storage schema**:
- Table `agents`: `agent_id` (UUID PK), `email` (UNIQUE), `agent_type`, `version`, `capabilities` (text array), `owner`, `deployment_env`, `status`, `created_at`, `updated_at`.
- No Redis usage — AgentService is PostgreSQL-only.

**Error types**:
- `FreeTierLimitError` (403) — 100-agent limit reached
- `AgentAlreadyExistsError` (409) — email already registered
- `AgentNotFoundError` (404) — agent UUID not found
- `AgentAlreadyDecommissionedError` (409) — agent is already decommissioned

**Configuration**: None — AgentService reads no environment variables. The free-tier limit (`FREE_TIER_MAX_AGENTS = 100`) is a module-level constant.
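
A minimal sketch of the pre-write checks in `registerAgent`, assuming the limit is checked before email uniqueness (the order in which the table lists them). `RegistrationSnapshot` and `assertCanRegister` are hypothetical names, and whether decommissioned agents count toward the limit is not stated here, so the sketch simply takes a count as input.

```typescript
const FREE_TIER_MAX_AGENTS = 100;

class FreeTierLimitError extends Error {
  readonly statusCode = 403;
}

class AgentAlreadyExistsError extends Error {
  readonly statusCode = 409;
}

// Hypothetical snapshot of what the repository lookups would return.
interface RegistrationSnapshot {
  agentCount: number;
  emailAlreadyRegistered: boolean;
}

// Throws before any data is written if a free-tier or domain rule is violated.
function assertCanRegister(snapshot: RegistrationSnapshot): void {
  if (snapshot.agentCount >= FREE_TIER_MAX_AGENTS) {
    throw new FreeTierLimitError("Free tier is limited to 100 agents");
  }
  if (snapshot.emailAlreadyRegistered) {
    throw new AgentAlreadyExistsError("Email already registered");
  }
}
```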

---

### OAuth2Service

**Purpose**: Issues, introspects, and revokes RS256 JWT access tokens via the OAuth 2.0 Client Credentials grant.

**Responsibility boundary**: OAuth2Service does not know about HTTP or routing. It
receives already-extracted values (`clientId`, `clientSecret`, `scope`) from the
controller, verifies the credential (via Vault or bcrypt), enforces the 10,000
tokens/month free-tier limit, and returns a typed `ITokenResponse`. All audit writes
on high-throughput paths (issue, introspect, revoke) are fire-and-forget (`void`) to
keep token endpoint latency low.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `issueToken` | `clientId: string, clientSecret: string, scope: string, ipAddress: string, userAgent: string` | `Promise<ITokenResponse>` | Verifies credentials (Vault or bcrypt), checks agent status, enforces the 10k/month limit, and signs an RS256 JWT; the monthly counter increment and the audit event write are fire-and-forget |
| `introspectToken` | `token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string` | `Promise<IIntrospectResponse>` | Verifies the JWT signature and checks the Redis revocation list; always resolves with `active: true` or `active: false`, which the controller returns as a 200 response per RFC 7662 |
| `revokeToken` | `token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string` | `Promise<void>` | Decodes token without verification; enforces that caller can only revoke their own tokens (`decoded.sub === callerPayload.sub`); adds JTI to Redis revocation list with TTL matching token expiry |

**Database / storage schema**:
- Redis key `revoked:{jti}` — value `1`, TTL = seconds until token expiry. Written on revocation; read on every authenticated request via `authMiddleware`.
- Redis key `monthly:tokens:{agentId}:{yyyy-mm}` — integer counter, incremented on every successful token issuance. Read to enforce the 10k/month free-tier limit.
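
The two key formats and the revocation TTL can be sketched as pure helpers. The function names and the constant name are hypothetical, and the use of UTC for the `yyyy-mm` component is an assumption, since the time zone is not specified above.

```typescript
// Hypothetical constant name; the value is the documented free-tier limit.
const FREE_TIER_MAX_TOKENS_PER_MONTH = 10000;

function revokedKey(jti: string): string {
  return `revoked:${jti}`;
}

// Assumption: the month bucket is computed in UTC.
function monthlyTokensKey(agentId: string, now: Date): string {
  const yyyy = now.getUTCFullYear();
  const mm = ("0" + (now.getUTCMonth() + 1)).slice(-2);
  return `monthly:tokens:${agentId}:${yyyy}-${mm}`;
}

// TTL in seconds until the token's `exp` claim; never negative.
function revocationTtlSeconds(expEpochSeconds: number, now: Date): number {
  const nowSeconds = Math.floor(now.getTime() / 1000);
  return Math.max(0, expEpochSeconds - nowSeconds);
}

function isOverMonthlyLimit(issuedThisMonth: number): boolean {
  return issuedThisMonth >= FREE_TIER_MAX_TOKENS_PER_MONTH;
}
```

Matching the TTL to the remaining token lifetime means a revocation entry disappears at the same moment the token would have expired anyway, so the revocation list does not grow without bound.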

**Error types**:
- `AuthenticationError` (401) — agent not found, or no active credential matches the provided secret
- `AuthorizationError` (403) — agent is suspended or decommissioned; or caller attempts to revoke another agent's token
- `FreeTierLimitError` (403) — 10,000 tokens/month limit reached

**Configuration**:
- `JWT_PRIVATE_KEY` — PEM-encoded RSA private key, required, read at app startup in `src/app.ts`
- `JWT_PUBLIC_KEY` — PEM-encoded RSA public key, required, read at app startup and in `authMiddleware`
- `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT` — optional; when set, Vault is used for credential verification instead of bcrypt

---

### CredentialService

**Purpose**: Manages the full lifecycle of agent credentials — generation, listing, rotation, and revocation.

**Responsibility boundary**: CredentialService does not know about HTTP or token
issuance. It enforces that credentials can only be generated for `active` agents. It
delegates secret storage to either `VaultClient` (Phase 2) or bcrypt (Phase 1 fallback).
The plain-text `clientSecret` is generated here, returned once in the response, and
never logged or written to PostgreSQL: only the bcrypt hash or the Vault path is
persisted there. In Vault mode the secret itself is held in Vault.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `generateCredential` | `agentId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string` | `Promise<ICredentialWithSecret>` | Verifies agent exists and is `active`; generates a cryptographically random secret via `generateClientSecret()`; writes to Vault (when configured) or hashes with bcrypt; returns `ICredentialWithSecret` — the only time the plain-text secret is returned |
| `listCredentials` | `agentId: string, filters: ICredentialListFilters` | `Promise<IPaginatedCredentialsResponse>` | Returns paginated credentials for an agent; `clientSecret` is never included in list responses |
| `rotateCredential` | `agentId: string, credentialId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string` | `Promise<ICredentialWithSecret>` | Generates a new secret for the same `credentialId`; overwrites Vault entry (new KV v2 version) or updates bcrypt hash; old secret is immediately invalidated; returns new `ICredentialWithSecret` once |
| `revokeCredential` | `agentId: string, credentialId: string, ipAddress: string, userAgent: string` | `Promise<void>` | Sets credential `status = 'revoked'`; permanently deletes the Vault secret via `vaultClient.deleteSecret()` when Vault is configured; rejects already-revoked credentials with `CredentialAlreadyRevokedError` |

**Database / storage schema**:
- Table `credentials`: `credential_id` (UUID PK), `client_id` (= `agentId`, FK to `agents`), `secret_hash` (bcrypt hash; empty string when Vault path is set), `vault_path` (nullable — KV v2 data path), `status`, `created_at`, `expires_at` (nullable), `revoked_at` (nullable).
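
How the two secret-related columns relate can be shown with a small sketch. The interface and both functions are hypothetical. Treating a non-null `vault_path` as the signal that a credential is Vault-backed is an assumption drawn from the column description, not a confirmed implementation detail.

```typescript
// Hypothetical row shape covering only the secret-related columns.
interface CredentialSecretColumns {
  secret_hash: string;
  vault_path: string | null;
}

type SecretStorage = { vaultPath: string } | { bcryptHash: string };

function secretColumns(storage: SecretStorage): CredentialSecretColumns {
  if ("vaultPath" in storage) {
    // Vault mode: the hash column is left as an empty string.
    return { secret_hash: "", vault_path: storage.vaultPath };
  }
  // bcrypt mode: no Vault path is recorded.
  return { secret_hash: storage.bcryptHash, vault_path: null };
}

// Assumption: verification picks its code path per credential from this column.
function usesVault(row: CredentialSecretColumns): boolean {
  return row.vault_path !== null;
}
```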

**Error types**:
- `AgentNotFoundError` (404) — agent UUID not found
- `CredentialError` (400) — agent is not in `active` status (code: `AGENT_NOT_ACTIVE`)
- `CredentialNotFoundError` (404) — credential not found or belongs to a different agent
- `CredentialAlreadyRevokedError` (409) — credential is already revoked

**Configuration**:
- `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT` — optional; when set, new credentials are stored in Vault KV v2 instead of bcrypt. Existing bcrypt-based credentials continue to work unchanged.

---

### AuditService

**Purpose**: Creates and queries immutable audit events for compliance and observability.

**Responsibility boundary**: AuditService does not know about HTTP, tokens, or agents.
It receives already-assembled event data from other services and delegates all
persistence to `AuditRepository`. It enforces the 90-day free-tier retention window
on all query and retrieval operations — events older than 90 days are treated as
non-existent.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `logEvent` | `agentId: string, action: AuditAction, outcome: AuditOutcome, ipAddress: string, userAgent: string, metadata: Record<string, unknown>` | `Promise<IAuditEvent>` | Writes an immutable audit row to PostgreSQL. For token endpoints, callers use `void` (fire-and-forget). For CRUD operations, callers `await` this method. |
| `queryEvents` | `filters: IAuditListFilters` | `Promise<IPaginatedAuditEventsResponse>` | Returns paginated, filtered audit events; enforces the 90-day retention window by computing the cutoff date and rejecting queries with `fromDate` before the cutoff; validates that `fromDate <= toDate` |
| `getEventById` | `eventId: string` | `Promise<IAuditEvent>` | Retrieves a single event by UUID; throws `AuditEventNotFoundError` for both genuinely missing events and events outside the 90-day retention window (indistinguishable by design) |
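
The retention checks described for `queryEvents` and `getEventById` reduce to simple date arithmetic. The sketch below is illustrative: the function names are hypothetical, results are returned as string codes instead of thrown errors to keep it short, and checking the retention cutoff before the `fromDate <= toDate` rule is an assumed order.

```typescript
const FREE_TIER_RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function retentionCutoff(now: Date): Date {
  return new Date(now.getTime() - FREE_TIER_RETENTION_DAYS * MS_PER_DAY);
}

type WindowCheck = "ok" | "RetentionWindowError" | "ValidationError";

// Mirrors the queryEvents rules: reject a fromDate before the cutoff,
// then require fromDate <= toDate when both are supplied.
function checkQueryWindow(now: Date, fromDate?: Date, toDate?: Date): WindowCheck {
  if (fromDate !== undefined && fromDate.getTime() < retentionCutoff(now).getTime()) {
    return "RetentionWindowError";
  }
  if (fromDate !== undefined && toDate !== undefined && fromDate.getTime() > toDate.getTime()) {
    return "ValidationError";
  }
  return "ok";
}

// getEventById treats anything older than the cutoff as not found.
function isWithinRetention(eventTimestamp: Date, now: Date): boolean {
  return eventTimestamp.getTime() >= retentionCutoff(now).getTime();
}
```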

**Database / storage schema**:
- Table `audit_events`: `event_id` (UUID PK), `agent_id` (text FK to agents), `action` (text — one of the `AuditAction` union type values), `outcome` (`success` or `failure`), `ip_address` (text), `user_agent` (text), `metadata` (JSONB), `timestamp` (timestamptz, NOT NULL, indexed).
- No Redis usage — AuditService is PostgreSQL-only.

**Error types**:
- `AuditEventNotFoundError` (404) — event not found or outside retention window
- `RetentionWindowError` (400) — query `fromDate` is before the 90-day retention cutoff
- `ValidationError` (400) — `fromDate` is after `toDate`

**Configuration**: None — the retention window (`FREE_TIER_RETENTION_DAYS = 90`) is a module-level constant.

---

### VaultClient

**Purpose**: Wraps HashiCorp Vault KV v2 operations for credential secret storage and verification.

**Responsibility boundary**: VaultClient is a client adapter — it knows only about
Vault API calls. It has no knowledge of business rules, HTTP, or PostgreSQL. It is
injected into `CredentialService` and `OAuth2Service` via constructor injection. When
`VAULT_ADDR` is not set, `createVaultClientFromEnv()` returns `null` and the bcrypt
code path is used unchanged.

**Public methods**:

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `writeSecret` | `agentId: string, credentialId: string, plainSecret: string` | `Promise<string>` | Writes the plain-text secret to the KV v2 data path; returns the path; creates a new KV v2 version on subsequent calls (used for rotation) |
| `readSecret` | `agentId: string, credentialId: string` | `Promise<string>` | Reads and returns the plain-text secret from Vault; throws `CredentialError` if the path is not found or the read fails |
| `verifySecret` | `agentId: string, credentialId: string, candidateSecret: string` | `Promise<boolean>` | Reads the stored secret via `readSecret`, then compares using `crypto.timingSafeEqual` to prevent timing-based side-channel attacks; returns `false` on any Vault error rather than throwing |
| `deleteSecret` | `agentId: string, credentialId: string` | `Promise<void>` | Permanently deletes all versions of a credential secret by calling the KV v2 metadata path (`DELETE {mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`) |

**KV v2 path structure**:
- Data path: `{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}`
- Metadata path (for permanent deletion): `{mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`
- Default mount: `secret` (overridable via `VAULT_MOUNT`)
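
For illustration, the path templates above can be expressed as small builder functions. The function names are hypothetical; the path layout and the `secret` default come straight from the list above.

```typescript
type Env = { [key: string]: string | undefined };

// Falls back to the documented default mount when VAULT_MOUNT is unset or empty.
function resolveMount(env: Env): string {
  return env.VAULT_MOUNT || "secret";
}

function kvDataPath(mount: string, agentId: string, credentialId: string): string {
  return `${mount}/data/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

// Used for permanent deletion of all versions.
function kvMetadataPath(mount: string, agentId: string, credentialId: string): string {
  return `${mount}/metadata/agentidp/agents/${agentId}/credentials/${credentialId}`;
}
```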

**Opt-in configuration**:
- `VAULT_ADDR` — Vault server address (e.g. `http://127.0.0.1:8200`) — required to enable Vault mode
- `VAULT_TOKEN` — Vault authentication token — required to enable Vault mode
- `VAULT_MOUNT` — KV v2 mount path — optional, defaults to `secret`

**Constant-time comparison rationale**: The `verifySecret` method uses Node.js
`crypto.timingSafeEqual` instead of `===` to prevent attackers from inferring the
length or content of stored secrets by measuring how long the comparison takes. When
the stored and candidate secrets differ in length, a dummy `timingSafeEqual` call is
still performed to eliminate the timing signal from the early-exit path.
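
A minimal sketch of this comparison, assuming UTF-8 encoded secrets. `safeCompare` is a hypothetical name and this is not necessarily how `verifySecret` is written. Node's `crypto.timingSafeEqual` throws when its two inputs differ in byte length, which is why the mismatch branch compares a buffer against itself.

```typescript
import { timingSafeEqual } from "crypto";

function safeCompare(stored: string, candidate: string): boolean {
  const storedBuf = Buffer.from(stored, "utf8");
  const candidateBuf = Buffer.from(candidate, "utf8");
  if (storedBuf.length !== candidateBuf.length) {
    // Dummy comparison so the mismatch path still does comparable work;
    // timingSafeEqual would throw if given buffers of unequal length.
    timingSafeEqual(storedBuf, storedBuf);
    return false;
  }
  return timingSafeEqual(storedBuf, candidateBuf);
}
```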

---

### OPA Policy Engine

**Purpose**: Enforces scope-based authorisation on every protected HTTP request without requiring a code deployment to change access rules.

**Responsibility boundary**: The OPA policy engine (`src/middleware/opa.ts`) is a
middleware layer — it does not know about business rules, credentials, or audit events.
It receives the HTTP method, full request path, and caller scopes from `req.user`, and
returns allow or deny. All policy logic lives in `policies/authz.rego` and
`policies/data/scopes.json`.

**Policy file locations**:
- `policies/authz.rego` — Rego policy defining `normalise_path`, `lookup_key`, and the `allow` rule. Evaluated by the Wasm bundle when compiled; replicated in TypeScript for the fallback path.
- `policies/data/scopes.json` — JSON map of `"METHOD:/path/pattern"` → `[required_scopes]`. Loaded as data into the Wasm policy and used directly by the TypeScript fallback.
- `policies/authz.wasm` — compiled Wasm bundle (not committed to source control; built from `authz.rego` using the OPA CLI). When present, the Wasm path is used; when absent, the TypeScript fallback reads `scopes.json`.

**How `opaMiddleware` evaluates input**:

1. `createOpaMiddleware()` is called once at app startup in `src/app.ts`.
2. It attempts to load `policies/authz.wasm`. If found, `loadPolicy(wasmBuffer)` is called and `scopes.json` data is injected via `loaded.setData(parsed)`.
3. If no Wasm bundle is found, `scopes.json` is loaded into `scopesMap` as the TypeScript fallback.
4. On every request, the middleware builds an `OpaInput` object: `{ method: req.method, path: req.baseUrl + req.path, scopes: req.user.scope.split(' ') }`.
5. `evaluate(input)` checks the Wasm policy (if loaded) or applies `normalisePath` + scope-intersection logic against `scopesMap`. Returns `false` if neither is loaded (fail-closed).
6. If `evaluate` returns `false`, the middleware calls `next(new AuthorizationError())`.
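
The TypeScript fallback in step 5 might look roughly like the sketch below. Several details are assumptions rather than documented behaviour: that `normalisePath` replaces UUID segments with `:id`, that a route missing from `scopes.json` is denied, and that holding any one of the required scopes is enough (a non-empty intersection).

```typescript
interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

type ScopesMap = { [key: string]: string[] };

const UUID_SEGMENT = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

// Assumption: UUID path segments are collapsed to ":id" before lookup.
function normalisePath(path: string): string {
  return path.replace(UUID_SEGMENT, ":id");
}

function evaluate(input: OpaInput, scopesMap: ScopesMap | null): boolean {
  // Fail-closed: nothing loaded means nothing is allowed.
  if (scopesMap === null) return false;
  const key = `${input.method}:${normalisePath(input.path)}`;
  if (!Object.prototype.hasOwnProperty.call(scopesMap, key)) return false;
  const required = scopesMap[key];
  // Allow when the caller holds at least one required scope.
  return required.some((scope) => input.scopes.indexOf(scope) !== -1);
}
```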

**How to write a new policy rule**:

1. Add the new endpoint's scope requirement to `policies/data/scopes.json`:
   ```json
   "GET:/api/v1/reports": ["reports:read"]
   ```
2. Add `"reports:read"` to the `OAuthScope` union type in `src/types/index.ts`.
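3. If Wasm mode is in use, recompile `authz.rego` with the OPA CLI. `opa build` needs the Wasm target and an entrypoint, and it writes a bundle archive rather than a bare `.wasm` file: run `opa build -t wasm -e authz/allow -o bundle.tar.gz policies/authz.rego`, then extract `policy.wasm` from the archive and save it as `policies/authz.wasm`. The entrypoint `authz/allow` assumes the policy's package is `authz`. The scopes data does not need to be bundled, because it is injected at startup via `setData`.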
4. Send `SIGHUP` to the running process to hot-reload: `kill -HUP <pid>`.

**How to test a policy rule**:
```bash
# Using the OPA CLI directly. --input expects a file path, so the JSON
# document is piped in via --stdin-input instead.
echo '{"method":"GET","path":"/api/v1/agents","scopes":["agents:read"]}' | \
opa eval --stdin-input --format raw \
         --data policies/data/scopes.json \
         --bundle policies/ \
         'data.authz.allow'
```
Expected output: `true`. Replace method/path/scopes to test deny cases.

**Hot-reload via SIGHUP**: When `SIGHUP` is received by the Node.js process,
`server.ts` calls `reloadOpaPolicy()`. This re-executes the same startup loading logic:
tries to load the Wasm bundle, falls back to `scopes.json`. The in-memory `wasmPolicy`
and `scopesMap` module-level variables are replaced atomically. No requests are dropped.
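
One way to picture the swap is to build the replacement completely and only then publish it. The sketch below is hypothetical. It uses a single state object where the source describes two module-level variables, and it assumes a failed reload keeps the previous policy, which is not stated above.

```typescript
type ReloadScopesMap = { [key: string]: string[] };

// Hypothetical container combining the two module-level variables.
interface PolicyState {
  wasmPolicy: object | null;
  scopesMap: ReloadScopesMap | null;
}

let state: PolicyState = { wasmPolicy: null, scopesMap: null };

function getPolicyState(): PolicyState {
  return state;
}

function reloadOpaPolicySketch(load: () => PolicyState): boolean {
  try {
    const next = load(); // build the complete replacement first
    state = next;        // then publish it with one assignment
    return true;
  } catch (err) {
    return false;        // assumption: keep serving the previous policy
  }
}
```

Because Node.js runs request handlers on a single thread, a synchronous assignment like this is never observed half-done: each request sees either the old state or the new one.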

---

### Web Dashboard

**Purpose**: Provides a browser-based UI for human operators to manage agents, credentials, and audit logs without writing API calls directly.

**Responsibility boundary**: The dashboard is a pure client-side React SPA. It has no
server-side logic. It calls the AgentIdP REST API using the `@sentryagent/idp-sdk`
`TokenManager` for authentication and a typed `ApiClient` from `dashboard/src/lib/client.ts`
for all API calls. It never stores the `access_token` in `localStorage`; only
`clientId`, `clientSecret`, and `baseUrl` are stored, and those go in `sessionStorage`
(cleared on tab close).

**React component structure**:

```
dashboard/src/
├── main.tsx               # React root — mounts App into #root, wraps with BrowserRouter
├── App.tsx                # Route definitions — AuthProvider, RequireAuth, AppShell
├── lib/
│   ├── auth.tsx           # AuthContext, AuthProvider, useAuth hook, sessionStorage helpers
│   └── client.ts          # Typed ApiClient class — wraps fetch with TokenManager token injection
├── components/
│   ├── RequireAuth.tsx    # Route guard — redirects to /dashboard/login if not authenticated
│   └── layout/AppShell.tsx # Persistent sidebar navigation + Outlet for page content
└── pages/
    ├── Login.tsx          # Login form — calls auth.login(), redirects to /dashboard/agents
    ├── Agents.tsx         # Paginated agents list with status filter and search
    ├── AgentDetail.tsx    # Single agent view — status, metadata, update, decommission actions
    ├── Credentials.tsx    # Credential list for an agent — generate, rotate, revoke actions
    ├── AuditLog.tsx       # Paginated audit log with date range and action filters
    └── Health.tsx         # /health endpoint response — PostgreSQL and Redis status display
```

**Authentication flow with sessionStorage**:
1. On `Login.tsx` form submit, `auth.login(creds)` is called.
2. `validateCredentials(creds)` creates a `TokenManager` and calls `getToken()` — if this succeeds, the credentials are valid.
3. `saveCredentials(creds)` stores `{ clientId, clientSecret, baseUrl }` in `sessionStorage` under key `agentidp_credentials`.
4. On every subsequent API call, `getClient()` in `lib/client.ts` reads credentials from `sessionStorage`, creates a `TokenManager`, and injects the current `access_token` into the `Authorization: Bearer` header. The `TokenManager` handles automatic token refresh when the token is expired.
5. `auth.logout()` calls `clearCredentials()` (removes the `sessionStorage` key) and navigates to `/dashboard/login`.
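
The `sessionStorage` helpers from steps 3 to 5 could be sketched as follows. The `KeyValueStore` interface and `MemoryStore` class are hypothetical additions so the example runs outside a browser, and the real helpers in `lib/auth.tsx` may differ. In the dashboard the store would be `window.sessionStorage`.

```typescript
interface StoredCredentials {
  clientId: string;
  clientSecret: string;
  baseUrl: string;
}

// Hypothetical subset of the Web Storage API used by the helpers.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "agentidp_credentials";

function saveCredentials(store: KeyValueStore, creds: StoredCredentials): void {
  store.setItem(STORAGE_KEY, JSON.stringify(creds));
}

function loadCredentials(store: KeyValueStore): StoredCredentials | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as StoredCredentials;
  } catch (err) {
    return null; // treat corrupt data as "not logged in"
  }
}

function clearCredentials(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}

// In-memory stand-in for sessionStorage, for use outside the browser.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
  removeItem(key: string): void {
    delete this.data[key];
  }
}
```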

**Main views and their API calls**:
- **Agents** — `GET /api/v1/agents?page=N&limit=20` — paginated list with `status` filter
- **AgentDetail** — `GET /api/v1/agents/:id`, `PATCH /api/v1/agents/:id`, `DELETE /api/v1/agents/:id`
- **Credentials** — `GET /api/v1/agents/:id/credentials`, `POST /api/v1/agents/:id/credentials`, `POST /api/v1/agents/:id/credentials/:credId/rotate`, `DELETE /api/v1/agents/:id/credentials/:credId`
- **AuditLog** — `GET /api/v1/audit?page=N&limit=20&fromDate=...&toDate=...`
- **Health** — `GET /health`

**Local development**:
```bash
cd dashboard
npm install
npm run dev    # Vite dev server with HMR — dashboard available at http://localhost:5173/dashboard
```
The Vite dev server proxies `/api/` calls to the Express server at `http://localhost:3000`.
The Express server must be running separately for API calls to work.

---

### Prometheus/Grafana Monitoring

**Purpose**: Provides operational visibility into AgentIdP's HTTP traffic, token issuance rates, agent registration rates, database latency, and Redis command latency.

**Responsibility boundary**: The metrics middleware (`src/middleware/metrics.ts`) and
the metrics registry (`src/metrics/registry.ts`) are observability concerns only — they
do not affect business logic. Metrics are exposed at `GET /metrics` via
`createMetricsRouter()` using `metricsRegistry.metrics()` from `prom-client`. The
`/metrics` endpoint is unauthenticated and intended only for Prometheus scraping, so it
must not be exposed to the public internet.

**Key metrics with labels**:

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | Total HTTP requests received; route is normalised (UUIDs replaced with `:id`) |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration; buckets from 5ms to 2.5s |
| `agentidp_tokens_issued_total` | Counter | `scope` | Total OAuth 2.0 access tokens successfully issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Total AI agents successfully registered |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration; buckets from 1ms to 1s |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration; buckets from 0.5ms to 250ms |
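
The route normalisation mentioned for the HTTP metrics can be sketched as below. The function names and the exact UUID pattern are assumptions. The point is only that the `route` label carries `:id` instead of a concrete UUID.

```typescript
const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

function normaliseRoute(path: string): string {
  return path.replace(UUID_PATTERN, ":id");
}

// Builds the label set shared by the HTTP counter and histogram.
function httpLabels(method: string, path: string, statusCode: number) {
  return {
    method: method,
    route: normaliseRoute(path),
    status_code: String(statusCode),
  };
}
```

Without this step every agent UUID would create its own time series, so normalising keeps label cardinality bounded.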

**How to add a new Counter**:
1. Open `src/metrics/registry.ts`.
2. Add a new `Counter` export:
   ```typescript
   export const myNewCounter = new Counter({
     name: 'agentidp_my_new_counter_total',
     help: 'Description of what this counts.',
     labelNames: ['label_one'] as const,
     registers: [metricsRegistry],
   });
   ```
3. Import and call `myNewCounter.inc({ label_one: value })` in the service or middleware where the event occurs.

**How to add a new Histogram**:
1. Open `src/metrics/registry.ts`.
2. Add a new `Histogram` export with appropriate buckets:
   ```typescript
   export const myDurationHistogram = new Histogram({
     name: 'agentidp_my_operation_duration_seconds',
     help: 'Duration of my operation in seconds.',
     labelNames: ['operation'] as const,
     buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
     registers: [metricsRegistry],
   });
   ```
3. Use `const end = myDurationHistogram.startTimer({ operation: 'name' }); ... end();` around the operation being measured.

**Grafana access in local Docker**:

Start the monitoring overlay:
```bash
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up
```
- Prometheus: `http://localhost:9090`
- Grafana: `http://localhost:3001` — default credentials: `admin` / `agentidp`

Grafana is pre-provisioned with a Prometheus data source pointing to `http://prometheus:9090`
and dashboard JSON files from `monitoring/grafana/dashboards/`. No manual configuration
is needed after startup.