
# 06 — Code Walkthroughs
Last verified against commit: `1f95cfe89d1f45fa43b9fb7cff237f07bf9e889e`
These walkthroughs trace three real production code paths from the HTTP request
to the database and back. Every step includes a `file:line` reference and a
"why" annotation explaining the design decision.
---
## Walkthrough 1 — Token Issuance
**Request:** `POST /api/v1/token` with `grant_type=client_credentials`
This is the most security-critical path in the codebase. An AI agent calling this
endpoint is proving its identity and receiving a token that grants access to the
entire API for one hour.
---
### Step 1 — Express middleware stack
**File:** `src/app.ts` lines 57–83
```
helmet() → security headers
cors() → CORS headers
morgan() → access log line (skipped in test env)
express.json() → parse JSON bodies
express.urlencoded({ extended: false }) → parse form-encoded bodies
metricsMiddleware → start request timer, record counters on finish
```
**Why `extended: false`?** The token endpoint receives `application/x-www-form-urlencoded`
bodies (RFC 6749 mandates this format for OAuth 2.0). The `express.urlencoded`
middleware parses them into `req.body`. `extended: false` uses the native `querystring`
parser, which is sufficient and avoids `qs` library complexity for flat key-value data.
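The flat parse this setting produces can be sketched with Node's built-in `URLSearchParams` (an illustration only; the middleware actually uses the `querystring` module internally):

```typescript
// Sketch: parsing a flat application/x-www-form-urlencoded body into
// the same shape express.urlencoded({ extended: false }) produces.
function parseFormBody(raw: string): Record<string, string> {
  const params = new URLSearchParams(raw);
  const body: Record<string, string> = {};
  params.forEach((value, key) => {
    body[key] = value; // flat key-value pairs only; no nested objects or arrays
  });
  return body;
}

const body = parseFormBody('grant_type=client_credentials&scope=agents%3Aread');
// body.grant_type === 'client_credentials', body.scope === 'agents:read'
```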
---
### Step 2 — Route dispatch
**File:** `src/routes/token.ts` line 24
```typescript
router.post('/', asyncHandler(rateLimitMiddleware), asyncHandler(tokenController.issueToken.bind(tokenController)));
```
**Why no `authMiddleware` here?** The token endpoint is where the agent _gets_ its
token — it cannot present a Bearer token to authenticate. Instead, credentials go
in the request body (`client_id`, `client_secret`). `POST /token` is deliberately
unauthenticated at the transport layer; authentication happens inside the controller.
**Why `asyncHandler`?** Express does not natively support async middleware. `asyncHandler`
wraps the async function and calls `next(err)` if the promise rejects, routing the
error to `errorHandler`.
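A minimal `asyncHandler` can be written in a few lines (a hypothetical reimplementation for illustration; the real helper lives elsewhere in the codebase):

```typescript
// A promise-aware wrapper: any rejection is forwarded to next(err),
// which Express then routes to the central errorHandler.
type Next = (err?: unknown) => void;
type Handler = (req: unknown, res: unknown, next: Next) => void | Promise<void>;

function asyncHandler(fn: Handler): Handler {
  return (req, res, next) => {
    // Promise.resolve() normalises sync and async handlers alike.
    Promise.resolve(fn(req, res, next)).catch(next);
  };
}
```

Without a wrapper like this, a rejected promise in an Express 4 handler never reaches the error middleware at all.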
---
### Step 3 — Rate limit check
**File:** `src/middleware/rateLimit.ts`
The rate limiter checks a Redis sliding-window counter for the client's IP address.
If the counter exceeds 100 requests/minute, it throws `RateLimitError` (429).
**Why Redis, not in-memory?** If the server restarts or scales horizontally to multiple
instances, an in-memory counter would reset. Redis maintains the counter across
instances and restarts.
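The window logic itself can be illustrated with an in-memory stand-in (the production limiter keeps this state in Redis, and its exact data layout may differ; only the sliding-window idea is mirrored here):

```typescript
// In-memory model of a sliding-window rate limiter. Each hit is a
// timestamp; only hits inside the trailing window count toward the limit.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // caller maps this to RateLimitError (HTTP 429)
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```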
---
### Step 4 — Controller: validate grant_type
**File:** `src/controllers/TokenController.ts` lines 84–103
```typescript
issueToken = async (req: Request, res: Response, _next: NextFunction): Promise<void> => {
  const body = req.body as ITokenRequest;
  if (!body.grant_type) { ... return res.status(400).json({ error: 'invalid_request', ... }) }
  if (body.grant_type !== 'client_credentials') { ... return res.status(400).json(...) }
```
**Why does this method catch errors itself instead of calling `next(err)`?** The token
endpoint must return errors in the **OAuth 2.0 error format** (`{ error, error_description }`)
per RFC 6749 §5.2, not the standard SentryAgent.ai format (`{ code, message }`). The
`mapToOAuth2Error()` helper translates `AuthenticationError` and `AuthorizationError`
into OAuth2 error codes. The `_next` parameter is intentionally unused for the error path.
---
### Step 5 — Controller: Joi validation and credential extraction
**File:** `src/controllers/TokenController.ts` lines 106–138
```typescript
const { error, value } = tokenRequestSchema.validate(body, { abortEarly: false });
// ...
// Support HTTP Basic auth fallback (RFC 6749 §2.3.1)
const authHeader = req.headers['authorization'];
if (authHeader?.startsWith('Basic ')) {
  const base64 = authHeader.slice(6);
  const decoded = Buffer.from(base64, 'base64').toString('utf-8');
  const colonIndex = decoded.indexOf(':');
  clientId = decoded.slice(0, colonIndex);
  clientSecret = decoded.slice(colonIndex + 1);
}
```
**Why `abortEarly: false`?** This returns all validation errors at once, so the
client can fix all problems in one round trip.
**Why Basic auth support?** RFC 6749 §2.3.1 specifies that client credentials MAY
be sent via HTTP Basic authentication. Some OAuth libraries default to this method.
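The round trip looks like this (a simplified sketch; RFC 6749 additionally form-encodes the credential values before base64, which is omitted here). Using `indexOf(':')` rather than `split(':')` matters because the secret itself may contain colons:

```typescript
// Build and decode an RFC 6749 §2.3.1-style Basic authentication header.
function encodeBasic(clientId: string, clientSecret: string): string {
  return 'Basic ' + Buffer.from(`${clientId}:${clientSecret}`).toString('base64');
}

function decodeBasic(header: string): { clientId: string; clientSecret: string } {
  const decoded = Buffer.from(header.slice(6), 'base64').toString('utf-8');
  const colonIndex = decoded.indexOf(':'); // first colon splits id from secret
  return {
    clientId: decoded.slice(0, colonIndex),
    clientSecret: decoded.slice(colonIndex + 1), // secret keeps any later colons
  };
}
```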
---
### Step 6 — Controller: scope validation
**File:** `src/controllers/TokenController.ts` lines 141–151
```typescript
const requestedScope = tokenBody.scope ?? 'agents:read';
const validScopes = ['agents:read', 'agents:write', 'tokens:read', 'audit:read'];
const scopeList = requestedScope.split(' ');
const invalidScope = scopeList.find((s) => !validScopes.includes(s));
if (invalidScope) { return res.status(400).json({error: 'invalid_scope', ...}) }
```
**Why validate scopes here?** Scope validation at the controller layer provides an
RFC 6749-compliant `invalid_scope` error before we even look up the agent. This is
faster and gives the client a clearer error message.
---
### Step 7 — Service: agent lookup
**File:** `src/services/OAuth2Service.ts` lines 83–94
```typescript
const agent = await this.agentRepository.findById(clientId);
if (!agent) {
  void this.auditService.logEvent(clientId, 'auth.failed', 'failure', ..., { reason: 'agent_not_found', clientId });
  throw new AuthenticationError('Client authentication failed...');
}
```
**Why log auth failures?** Failed authentication attempts may indicate a brute-force
attack or a misconfigured client. Having them in the audit log enables incident
investigation and alerting.
**Why not distinguish between "agent not found" and "wrong secret" in the error message?**
Revealing which is wrong gives an attacker information — they can enumerate valid
`client_id` values by checking whether they get "agent not found" vs "wrong secret".
Both cases return the same message.
---
### Step 8 — Service: credential verification
**File:** `src/services/OAuth2Service.ts` lines 97–131
```typescript
const { credentials } = await this.credentialRepository.findByAgentId(clientId, { status: 'active', page: 1, limit: 100 });
for (const cred of credentials) {
  const credRow = await this.credentialRepository.findById(cred.credentialId);
  if (credRow) {
    if (credRow.expiresAt !== null && credRow.expiresAt < new Date()) { continue; }
    let matches: boolean;
    if (credRow.vaultPath !== null && this.vaultClient !== null) {
      matches = await this.vaultClient.verifySecret(clientId, credRow.credentialId, clientSecret);
    } else {
      matches = await verifySecret(clientSecret, credRow.secretHash);
    }
    if (matches) { credentialVerified = true; break; }
  }
}
```
**Why iterate over multiple credentials?** An agent can have multiple active
credentials (e.g. one per service that calls it). The agent rotates credentials
one at a time — if credential A is rotated while service X is still using it,
service X will fail. By checking all active credentials, we allow overlapping rotation.
**Why check expiry before hashing?** Bcrypt is intentionally slow (~100ms). Checking
expiry first is a cheap early exit that avoids the bcrypt computation on expired
credentials.
---
### Step 9 — Service: status and monthly limit checks
**File:** `src/services/OAuth2Service.ts` lines 144–176
```typescript
if (agent.status === 'suspended') { throw new AuthorizationError(...) }
if (agent.status === 'decommissioned') { throw new AuthorizationError(...) }
const monthlyCount = await this.tokenRepository.getMonthlyCount(clientId);
if (monthlyCount >= FREE_TIER_MAX_MONTHLY_TOKENS) { throw new FreeTierLimitError(...) }
```
**Why check status after credential verification?** We verify credentials first so
a suspended agent with a wrong secret gets `AuthenticationError` (401) not
`AuthorizationError` (403). This prevents leaking which agents are suspended to
unauthenticated callers.
---
### Step 10 — Service: sign the JWT
**File:** `src/services/OAuth2Service.ts` lines 179–190
```typescript
const jti = uuidv4();
const payload: Omit<ITokenPayload, 'iat' | 'exp'> = { sub: clientId, client_id: clientId, scope, jti };
const accessToken = signToken(payload, this.privateKey);
```
**File:** `src/utils/jwt.ts` lines 19–31
```typescript
export function signToken(payload: Omit<ITokenPayload, 'iat' | 'exp'>, privateKey: string): string {
  const now = Math.floor(Date.now() / 1000);
  const fullPayload: ITokenPayload = { ...payload, iat: now, exp: now + TOKEN_EXPIRES_IN };
  return jwt.sign(fullPayload, privateKey, { algorithm: 'RS256' });
}
```
**Why RS256 instead of HS256?** RS256 (RSA asymmetric) allows any consumer of the
token to verify it using the public key without needing the private signing key.
HS256 (HMAC symmetric) would require sharing the secret with every service that
verifies tokens.
**Why `jti` (JWT ID)?** The `jti` is a unique identifier for this specific token.
It is used as the key in the Redis revocation list. Without `jti`, you cannot
revoke a single token without revoking all tokens for the agent.
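The asymmetry can be demonstrated with Node builtins (a mechanics sketch only; the service signs real JWTs with the `jsonwebtoken` library, which also emits the standard header and registered claims):

```typescript
import { generateKeyPairSync, createSign, createVerify, KeyObject } from 'crypto';

const { publicKey, privateKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });

// Sign a payload with the private key...
function rs256Sign(payload: object, key: KeyObject): string {
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  const signature = createSign('RSA-SHA256').update(body).sign(key, 'base64url');
  return `${body}.${signature}`;
}

// ...and verify it with only the public key; no shared secret is required.
function rs256Verify(token: string, key: KeyObject): boolean {
  const [body, signature] = token.split('.');
  return createVerify('RSA-SHA256').update(body).verify(key, signature, 'base64url');
}
```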
---
### Step 11 — Service: fire-and-forget operations
**File:** `src/services/OAuth2Service.ts` lines 193–207
```typescript
void this.tokenRepository.incrementMonthlyCount(clientId);
void this.auditService.logEvent(clientId, 'token.issued', 'success', ..., { scope, expiresAt });
tokensIssuedTotal.inc({ scope });
```
**Why `void` (fire-and-forget)?** The token has been signed and is ready to return.
Waiting for the Redis increment and audit write would add ~5–10ms to every token
request. These operations are best-effort — if they fail, the token is still valid.
**Why is the Prometheus `.inc()` call synchronous?** Prometheus counters are
in-process memory operations — they do not write to Redis or PostgreSQL. They are
O(1) and sub-microsecond.
---
### Step 12 — Response
**File:** `src/controllers/TokenController.ts` lines 163–167
```typescript
res.setHeader('Cache-Control', 'no-store');
res.setHeader('Pragma', 'no-cache');
res.status(200).json(tokenResponse);
```
**Why `Cache-Control: no-store`?** RFC 6749 §5.1 mandates that token responses
must not be cached. Without this header, a shared proxy or CDN could cache the
response and replay it to another client.
Final response:
```json
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "agents:read agents:write"
}
```
---
## Walkthrough 2 — Agent Registration
**Request:** `POST /api/v1/agents` with Bearer token and agent data JSON body
After token issuance, registering an agent is the second most common operation.
This walkthrough shows a request that goes through all three auth middleware layers.
---
### Step 1 — Middleware stack
**File:** `src/app.ts` lines 57–83 (same security and parsing middleware as Walkthrough 1)
---
### Step 2 — Route dispatch
**File:** `src/routes/agents.ts` lines 22–27
```typescript
router.use(asyncHandler(authMiddleware));
router.use(opaMiddleware);
router.use(asyncHandler(rateLimitMiddleware));
router.post('/', asyncHandler(agentController.registerAgent.bind(agentController)));
```
All three middleware run on every request to the agents router before the handler.
---
### Step 3 — Auth middleware: Bearer token verification
**File:** `src/middleware/auth.ts` lines 28–77
```typescript
const authHeader = req.headers['authorization'];
if (!authHeader || !authHeader.startsWith('Bearer ')) { throw new AuthenticationError(...) }
const token = authHeader.slice(7).trim();
const publicKey = process.env['JWT_PUBLIC_KEY'];
let payload: ITokenPayload;
try {
  payload = verifyToken(token, publicKey);
} catch (err) {
  if (err instanceof TokenExpiredError) { throw new AuthenticationError('Token has expired.') }
  if (err instanceof JsonWebTokenError) { throw new AuthenticationError('Token signature is invalid.') }
}
const redis = await getRedisClient();
const revocationKey = `revoked:${payload.jti}`;
const isRevoked = await redis.get(revocationKey);
if (isRevoked !== null) { throw new AuthenticationError('Token has been revoked.') }
req.user = payload;
next();
```
**Why check Redis after signature verification?** Signature verification is a pure
cryptographic operation (no I/O). If the token is expired or has a bad signature,
there is no need to hit Redis. The fast path exits early; Redis is the slower
secondary check.
**Why `await getRedisClient()` instead of storing the client?** `getRedisClient()`
returns the same singleton every time — the connection is created once and reused.
The `await` is fast (no I/O after the first call).
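The pattern can be sketched like this (names and shapes are assumptions; only the caching idea mirrors the real `getRedisClient()`):

```typescript
// Caching the *promise* (not the resolved client) means concurrent first
// callers share one connection attempt instead of racing to create several.
type FakeClient = { id: number };

let connects = 0;
let cached: Promise<FakeClient> | null = null;

async function connect(): Promise<FakeClient> {
  connects += 1; // the real version would open the TCP connection here
  return { id: connects };
}

function getClient(): Promise<FakeClient> {
  if (cached === null) cached = connect();
  return cached; // every later await resolves instantly to the same client
}
```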
---
### Step 4 — OPA middleware: scope enforcement
**File:** `src/middleware/opa.ts` lines 230–257
```typescript
const input: OpaInput = {
  method: req.method,                 // "POST"
  path: req.baseUrl + req.path,       // "/api/v1/agents"
  scopes: req.user.scope.split(' '),  // ["agents:read", "agents:write"]
};
if (!evaluate(input)) {
  next(new AuthorizationError());
  return;
}
```
For `POST /api/v1/agents`, the policy requires `["agents:write"]`. If `agents:write`
is not in the token's scope, the request is rejected with 403 before the controller
runs.
**Why reconstruct the full path with `req.baseUrl + req.path`?** The OPA policy
uses full paths (`/api/v1/agents/:id`). Inside a nested router, `req.path` is
relative to the router's mount point (e.g. `/`). `req.baseUrl` is the mount prefix
(`/api/v1/agents`). Concatenating them gives the full path the policy expects.
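A tiny model makes the concatenation concrete (the trailing-slash handling below is an assumption for the router-root case, not taken from the middleware):

```typescript
interface ReqPathParts {
  baseUrl: string; // router mount prefix, e.g. '/api/v1/agents'
  path: string;    // path relative to the mount point, e.g. '/'
}

function fullPath(req: ReqPathParts): string {
  const joined = req.baseUrl + req.path;
  // Drop the trailing '/' that appears when the route is the router root.
  return joined.length > 1 && joined.endsWith('/') ? joined.slice(0, -1) : joined;
}

fullPath({ baseUrl: '/api/v1/agents', path: '/' });    // '/api/v1/agents'
fullPath({ baseUrl: '/api/v1/agents', path: '/abc' }); // '/api/v1/agents/abc'
```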
---
### Step 5 — Controller: validation
**File:** `src/controllers/AgentController.ts` lines 37–60
```typescript
registerAgent = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  if (!req.user) { throw new AuthorizationError() }
  const { error, value } = createAgentSchema.validate(req.body, { abortEarly: false });
  if (error) {
    throw new ValidationError('Request validation failed.', {
      details: error.details.map((d) => ({ field: d.path.join('.'), reason: d.message })),
    });
  }
  const data = value as ICreateAgentRequest;
  const ipAddress = req.ip ?? '0.0.0.0';
  const userAgent = req.headers['user-agent'] ?? 'unknown';
  const agent = await this.agentService.registerAgent(data, ipAddress, userAgent);
  res.status(201).json(agent);
```
**Why check `req.user` in the controller when `authMiddleware` already set it?**
TypeScript's type system marks `req.user` as `ITokenPayload | undefined`. The check
at line 39 narrows the type so subsequent code can use `req.user` without null
assertions. It is a guard, not redundant authentication.
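A stripped-down model of the narrowing (hypothetical types for illustration):

```typescript
interface TokenPayload { sub: string; scope: string; }
interface AuthedRequest { user?: TokenPayload; } // undefined until authMiddleware runs

function requireUser(req: AuthedRequest): TokenPayload {
  if (!req.user) throw new Error('AuthorizationError'); // the guard
  // After the guard, TypeScript narrows req.user to TokenPayload,
  // so no `req.user!` non-null assertions are needed downstream.
  return req.user;
}
```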
**Why pass `ipAddress` and `userAgent` to the service?** The service logs audit events.
Audit events include the client IP and User-Agent for forensic value. These values
come from the HTTP request, which the service has no access to — so the controller
extracts them and passes them down.
---
### Step 6 — Service: free-tier limit check
**File:** `src/services/AgentService.ts` lines 59–65
```typescript
const currentCount = await this.agentRepository.countActive();
if (currentCount >= FREE_TIER_MAX_AGENTS) {
  throw new FreeTierLimitError('Free tier limit of 100 registered agents has been reached.', ...);
}
```
**Why count before checking email uniqueness?** If the limit is reached, there is
no point checking whether the email already exists. Doing the cheaper check (count)
first avoids an unnecessary query.
---
### Step 7 — Service: email uniqueness check
**File:** `src/services/AgentService.ts` lines 68–71
```typescript
const existing = await this.agentRepository.findByEmail(data.email);
if (existing !== null) { throw new AgentAlreadyExistsError(data.email) }
```
**Why not rely on the database UNIQUE constraint?** We could, but catching a
PostgreSQL `23505` error code in the repository would be less readable and would
not produce a typed `AgentAlreadyExistsError` with a structured `details` field.
The explicit check gives better error messages and keeps the repository layer clean.
---
### Step 8 — Repository: INSERT
**File:** `src/repositories/AgentRepository.ts` lines 67–85
```typescript
async create(data: ICreateAgentRequest): Promise<IAgent> {
  const agentId = uuidv4();
  const result: QueryResult<AgentRow> = await this.pool.query(
    `INSERT INTO agents (agent_id, email, agent_type, version, capabilities, owner, deployment_env, status, created_at, updated_at)
     VALUES ($1, $2, $3, $4, $5, $6, $7, 'active', NOW(), NOW())
     RETURNING *`,
    [agentId, data.email, data.agentType, data.version, data.capabilities, data.owner, data.deploymentEnv],
  );
  return mapRowToAgent(result.rows[0]);
}
```
**Why generate `agentId` in application code instead of relying on `gen_random_uuid()`?**
Because we use the UUID as the OAuth 2.0 `client_id`. We need the UUID before writing
to the database so we can use it in the audit event and the response. Having it in
application code avoids a separate SELECT after the INSERT.
**Why `RETURNING *`?** PostgreSQL's `RETURNING` clause sends back the inserted row
in the same round trip as the INSERT. This avoids a second SELECT to fetch the
newly created record.
---
### Step 9 — Service: audit event
**File:** `src/services/AgentService.ts` lines 76–83
```typescript
await this.auditService.logEvent(
  agent.agentId,
  'agent.created',
  'success',
  ipAddress,
  userAgent,
  { agentType: agent.agentType, owner: agent.owner },
);
```
**Why `await` here but `void` for token audit events?** Agent registration is a
database write operation that happens once. Adding ~5ms for the audit write is
acceptable and ensures the audit event is recorded before the 201 response is sent.
Token issuance happens far more frequently — audit is fire-and-forget there.
---
### Step 10 — Response
**File:** `src/controllers/AgentController.ts` line 56
```typescript
res.status(201).json(agent);
```
Returns the full `IAgent` object with HTTP 201 Created.
---
## Walkthrough 3 — Credential Rotation
**Request:** `POST /api/v1/agents/:agentId/credentials/:credentialId/rotate`
Credential rotation is the process of replacing an existing client secret with a
new one without changing the `credentialId`. This is the recommended security
practice — rotate periodically and rotate immediately after suspected compromise.
---
### Step 1 — Route dispatch
**File:** `src/routes/credentials.ts` line 34
```typescript
router.post('/:credentialId/rotate', asyncHandler(credentialController.rotateCredential.bind(credentialController)));
```
The credentials router is mounted at `/api/v1/agents/:agentId/credentials` in `app.ts`.
The full path becomes `POST /api/v1/agents/:agentId/credentials/:credentialId/rotate`.
---
### Step 2 — Auth middleware
Same as Walkthrough 2, Step 3. Bearer token is verified via RS256 and Redis revocation check.
`req.user` is populated with the JWT payload.
---
### Step 3 — OPA middleware
The path `/api/v1/agents/:agentId/credentials/:credId/rotate` is normalised to
`/api/v1/agents/:id/credentials/:credId/rotate`. The policy requires `["agents:write"]`.
---
### Step 4 — Controller: ownership check
**File:** `src/controllers/CredentialController.ts` lines 127–137
```typescript
rotateCredential = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  if (!req.user) { throw new AuthenticationError() }
  const { agentId, credentialId } = req.params;
  if (req.user.sub !== agentId) {
    throw new AuthorizationError('You do not have permission to manage credentials for this agent.');
  }
```
**Why check `req.user.sub !== agentId`?** An agent's token contains its own
`agentId` as the `sub` claim. This check enforces that an agent can only manage
its own credentials. Even if an agent has `agents:write` scope, it cannot rotate
another agent's credentials. This is Phase 1 behaviour — there is no admin scope yet.
---
### Step 5 — Controller: request validation
**File:** `src/controllers/CredentialController.ts` lines 139–157
```typescript
const { error, value } = generateCredentialSchema.validate(req.body ?? {}, { abortEarly: false });
// generateCredentialSchema validates optional `expiresAt` field
const data = value as IGenerateCredentialRequest;
const result = await this.credentialService.rotateCredential(agentId, credentialId, data, ipAddress, userAgent);
res.status(200).json(result);
```
**Why `req.body ?? {}`?** The rotation body is optional — an agent may rotate a
credential without an expiry date, in which case the body may be empty. Passing
`undefined` to Joi would cause a different error than passing `{}`.
---
### Step 6 — Service: existence checks
**File:** `src/services/CredentialService.ts` lines 163–177
```typescript
const agent = await this.agentRepository.findById(agentId);
if (!agent) { throw new AgentNotFoundError(agentId) }
const existing = await this.credentialRepository.findById(credentialId);
if (!existing || existing.clientId !== agentId) { throw new CredentialNotFoundError(credentialId) }
if (existing.status === 'revoked') {
  throw new CredentialAlreadyRevokedError(credentialId, existing.revokedAt?.toISOString() ?? ...);
}
```
**Why check `existing.clientId !== agentId`?** Even though OPA restricts the agent
to its own credentials, a malicious actor could craft a request with a valid
`agentId` in the path but a `credentialId` belonging to another agent. This check
ensures that a credential is only accessible to the agent it was created for.
---
### Step 7 — Service: generate new secret and write to Vault or bcrypt
**File:** `src/services/CredentialService.ts` lines 180–192
```typescript
const expiresAt = data.expiresAt !== undefined ? new Date(data.expiresAt) : null;
const plainSecret = generateClientSecret(); // sk_live_<64 hex chars>
let updated: ICredential | null;
if (this.vaultClient !== null) {
  // Phase 2: overwrite the existing Vault secret (KV v2 creates a new version)
  const vaultPath = await this.vaultClient.writeSecret(agentId, credentialId, plainSecret);
  updated = await this.credentialRepository.updateVaultPath(credentialId, vaultPath, expiresAt);
} else {
  // Phase 1: use bcrypt
  const newHash = await hashSecret(plainSecret);
  updated = await this.credentialRepository.updateHash(credentialId, newHash, expiresAt);
}
```
**Why does Vault rotation write to the same path?** Vault KV v2 is versioned — writing
to an existing path creates a new version without overwriting previous versions.
This preserves an audit trail in Vault itself.
**Why does the Vault path stay the same after rotation?** The `vault_path` column
stores the path, not the secret. The path is deterministic:
`{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}`. Since the
`credentialId` does not change on rotation, the path does not change either.
Only the Vault version at that path changes.
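The determinism is easy to see in a sketch of the path builder (the mount name and exact helper are assumptions; the path template is quoted from above):

```typescript
// Same inputs always yield the same path; rotation changes neither
// agentId nor credentialId, so the stored vault_path never changes.
function vaultPath(mount: string, agentId: string, credentialId: string): string {
  return `${mount}/data/agentidp/agents/${agentId}/credentials/${credentialId}`;
}
```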
---
### Step 8 — Repository: UPDATE the credential
**File:** `src/repositories/CredentialRepository.ts` lines 180–218
```typescript
// Bcrypt path (updateHash):
UPDATE credentials
SET secret_hash = $1, vault_path = NULL, expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *
// Vault path (updateVaultPath):
UPDATE credentials
SET vault_path = $1, secret_hash = '', expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *
```
**Why `status = 'active'` in the UPDATE?** A credential could theoretically be
in any state when rotated. The UPDATE explicitly sets it to active. This handles
edge cases where a revoked credential is being "un-revoked" by rotation (though
the service layer prevents this — revoked credentials throw `CredentialAlreadyRevokedError`).
The belt-and-suspenders approach at the SQL layer ensures data integrity.
---
### Step 9 — Service: audit event
**File:** `src/services/CredentialService.ts` lines 199–206
```typescript
await this.auditService.logEvent(
  agentId,
  'credential.rotated',
  'success',
  ipAddress,
  userAgent,
  { credentialId },
);
```
The audit event records which credential was rotated. Combined with the timestamp,
this gives a complete rotation history for each credential.
---
### Step 10 — Response
**File:** `src/controllers/CredentialController.ts` line 161
```typescript
res.status(200).json(result);
```
Returns `ICredentialWithSecret` — the updated credential including the new
`clientSecret`. This is the only time the new secret is ever returned. The caller
must store it securely.
```json
{
  "credentialId": "d4e5f6a7-...",
  "clientId": "a1b2c3d4-...",
  "status": "active",
  "clientSecret": "sk_live_4f8a2e9b...",
  "createdAt": "2026-01-15T10:00:00Z",
  "expiresAt": "2027-01-15T10:00:00Z",
  "revokedAt": null
}
```
---
## Walkthrough 4 — A2A Delegation End-to-End
**Request:** `POST /api/v1/oauth2/token/delegate` — one AI agent delegating a scoped capability to another
This walkthrough traces how agent A (an orchestrator) issues a delegation token that grants agent B (a sub-agent) the right to act on its behalf with a restricted scope.
---
### Step 1 — Route dispatch
**File:** `src/routes/delegation.ts`
```typescript
router.post(
  '/token/delegate',
  asyncHandler(authMiddleware),
  opaMiddleware,
  asyncHandler(delegationController.createDelegation.bind(delegationController))
);
```
Both `authMiddleware` and `opaMiddleware` run. The OPA policy requires scope `agents:write` for delegation creation.
---
### Step 2 — Controller: extract delegator and validate
**File:** `src/controllers/DelegationController.ts`
```typescript
const delegatorId = req.user.sub; // From the Bearer token's sub claim
const { delegatee_id, scope, expires_at } = req.body;
```
The controller validates that `delegatee_id` is a non-empty UUID, `scope` is a non-empty string, and `expires_at` (if provided) is a valid ISO 8601 datetime in the future. It passes these to `DelegationService.createDelegation()`.
---
### Step 3 — Service: verify both agents exist
**File:** `src/services/DelegationService.ts`
```typescript
const delegator = await this.agentRepository.findById(delegatorId);
if (!delegator || delegator.status !== 'active') { throw new AgentNotFoundError(delegatorId) }
const delegatee = await this.agentRepository.findById(delegateeId);
if (!delegatee || delegatee.status !== 'active') { throw new AgentNotFoundError(delegateeId) }
```
Both agents must exist and be in `active` status. A suspended or decommissioned agent cannot participate in delegation.
---
### Step 4 — Service: insert delegation chain record
**File:** `src/services/DelegationService.ts`
```typescript
await this.pool.query(
  `INSERT INTO delegation_chains (chain_id, delegator_id, delegatee_id, scope, status, expires_at)
   VALUES ($1, $2, $3, $4, 'active', $5)`,
  [chainId, delegatorId, delegateeId, scope, expiresAt]
);
```
The `chain_id` is a UUID generated by the service. The `delegation_chains` table provides the authoritative source of truth for which delegations are active, independent of any token.
---
### Step 5 — Response
```json
{
  "chain_id": "f1e2d3c4-...",
  "token": "eyJhbGciOiJSUzI1NiJ9...",
  "delegator_id": "a1b2c3d4-...",
  "delegatee_id": "b2c3d4e5-...",
  "scope": "agents:read",
  "status": "active",
  "expires_at": "2026-04-05T00:00:00Z"
}
```
The `token` field is the signed delegation JWT. The delegatee presents this token to `POST /api/v1/oauth2/token/verify-delegation` to prove it has authority to act on the delegator's behalf.
**Why store both the DB record and the JWT?** The DB record allows revocation — when the delegator calls `DELETE /api/v1/delegation-chains/:chainId`, the record is soft-deleted and all subsequent `verify-delegation` calls will fail even if the JWT itself has not yet expired.
---
## Walkthrough 5 — Tier Enforcement Request Lifecycle
**Request:** Any authenticated API request when the organisation's daily call limit is reached
This walkthrough traces how `tierMiddleware` intercepts a request before it reaches the OPA middleware, preventing quota-exceeded traffic from consuming service resources.
---
### Step 1 — Auth middleware passes
Same as Walkthrough 2, Step 3. The Bearer JWT is verified and `req.user` is populated with `sub` (agentId) and `organization_id`.
---
### Step 2 — Tier middleware: fetch org tier
**File:** `src/middleware/tier.ts`
```typescript
const orgId = req.user.organization_id;
const tier = await tierService.fetchTier(orgId);
const config = TIER_CONFIG[tier];
```
`fetchTier()` issues `SELECT tier FROM organizations WHERE organization_id = $1`. Returns `'free'` if no row is found (safe default).
---
### Step 3 — Tier middleware: read daily counter
**File:** `src/middleware/tier.ts`
```typescript
const callsKey = `rate:tier:calls:${orgId}`;
const callsToday = await redis.get(callsKey);
const count = callsToday !== null ? parseInt(callsToday, 10) : 0;
if (count >= config.maxCallsPerDay) {
  throw new TierLimitError('calls', config.maxCallsPerDay, { orgId, tier, current: count });
}
```
The Redis key `rate:tier:calls:<orgId>` is read. If null (first call of the day), count is 0. When count equals or exceeds the tier limit, `TierLimitError` (HTTP 429) is thrown immediately — no further middleware runs.
---
### Step 4 — Tier middleware: increment counter (fire-and-forget)
**File:** `src/middleware/tier.ts`
```typescript
// Set TTL to next UTC midnight if key is new
void redis.multi()
  .incr(callsKey)
  .expireAt(callsKey, nextUtcMidnightUnix())
  .exec();
next();
```
The counter is incremented atomically using a Redis MULTI block. The `EXPIREAT` command sets the key to auto-delete at the next UTC midnight, resetting the daily counter without any scheduled job. The increment is fire-and-forget — the request proceeds immediately to `opaMiddleware`.
**Why expire at UTC midnight rather than a rolling 24-hour window?** Tier limits are documented as "per day", which users interpret as resetting at midnight. A rolling window would allow a user to consume their full daily quota twice within a 48-hour period straddling midnight, which is counterintuitive. UTC midnight is predictable and easy to reason about.
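One way to compute that expiry (a sketch; the real `nextUtcMidnightUnix()` helper may differ in detail). The same value minus the current Unix time also gives the `Retry-After` seconds used in Step 5:

```typescript
// Unix timestamp (seconds) of the next 00:00:00 UTC after `now`.
function nextUtcMidnightUnix(now: Date = new Date()): number {
  // Date.UTC handles month/year rollover when we ask for "day + 1".
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.floor(next / 1000);
}
```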
---
### Step 5 — Error handler serialises TierLimitError
**File:** `src/middleware/errorHandler.ts`
```json
HTTP 429
{
  "code": "TIER_LIMIT_EXCEEDED",
  "message": "Daily API call limit reached for your tier.",
  "details": {
    "tier": "free",
    "limit": 1000,
    "current": 1000
  }
}
```
The `Retry-After` header is set to the number of seconds until next UTC midnight so clients can implement automatic backoff.
---
## Walkthrough 6 — Analytics Event Capture Flow
**Trigger:** Any successful token issuance (`POST /api/v1/token`)
This walkthrough traces how an analytics event is captured without affecting the latency of the primary token issuance response.
---
### Step 1 — Token issuance completes
**File:** `src/services/OAuth2Service.ts`
```typescript
const accessToken = signToken(payload, this.privateKey);
// Primary response is ready — analytics is now fire-and-forget
void this.analyticsService.recordEvent(tenantId, 'token_issued');
tokensIssuedTotal.inc({ scope });
```
The `signToken()` call completes synchronously (RSA signing is CPU-bound, not I/O). The controller can now send the response. `analyticsService.recordEvent()` is called with `void` — the `await` is deliberately omitted.
**Why `void` instead of `await`?** Token issuance latency must remain below 100ms (per the QA performance gate). A PostgreSQL write adds 5–15ms. Since analytics data is aggregated (not transactional), losing an occasional event due to an error is acceptable. The response is never delayed for analytics.
---
### Step 2 — AnalyticsService: UPSERT daily counter
**File:** `src/services/AnalyticsService.ts`
```typescript
async recordEvent(tenantId: string, metricType: string): Promise<void> {
  try {
    await this.pool.query(
      `INSERT INTO analytics_events (organization_id, date, metric_type, count)
       VALUES ($1, CURRENT_DATE, $2, 1)
       ON CONFLICT (organization_id, date, metric_type)
       DO UPDATE SET count = analytics_events.count + 1`,
      [tenantId, metricType],
    );
  } catch (err) {
    console.error('[AnalyticsService] recordEvent failed — primary path unaffected', err);
  }
}
```
The `ON CONFLICT DO UPDATE` upsert is atomic. Whether this is the first or the ten-thousandth `token_issued` event for this tenant today, the row is updated correctly. All errors are caught and swallowed — the token has already been returned to the caller.
**Why one row per day per metric, not one row per event?** Storing a row per event would create millions of rows. The daily aggregate model keeps the table compact while still providing daily trend data (the granularity that analytics dashboards need). Sub-day granularity is available from the Prometheus `agentidp_tokens_issued_total` counter if needed.
---
### Step 3 — Dashboard query (deferred)
When a developer visits the analytics page in the developer portal, the portal calls:
```
GET /api/v1/analytics/token-trend?days=30
```
**File:** `src/services/AnalyticsService.ts`, method `getTokenTrend(tenantId, 30)`
```sql
SELECT
  gs.date::DATE::TEXT AS date,
  COALESCE(ae.count, 0)::INTEGER AS count
FROM generate_series(
  CURRENT_DATE - 29 * INTERVAL '1 day',
  CURRENT_DATE,
  INTERVAL '1 day'
) AS gs(date)
LEFT JOIN analytics_events ae
  ON ae.date = gs.date::DATE
  AND ae.organization_id = $2
  AND ae.metric_type = 'token_issued'
ORDER BY gs.date ASC
```
The `generate_series` + `LEFT JOIN` pattern ensures all 30 days appear in the result, with `count: 0` for days with no events. This avoids the need for the client to fill in gaps.