docs: engineering knowledge base for new hires
Complete docs/engineering/ suite — 12 documents covering company overview, system architecture, tech stack ADRs, codebase structure, service deep dives, annotated code walkthroughs, dev setup, engineering workflow, testing strategy, deployment/ops, SDK guide, and README index. All content verified against source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md marked complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
717
docs/engineering/06-walkthroughs.md
Normal file
@@ -0,0 +1,717 @@
# 06 — Code Walkthroughs

Last verified against commit: `1f95cfe89d1f45fa43b9fb7cff237f07bf9e889e`

These walkthroughs trace three real production code paths from the HTTP request
to the database and back. Every step includes a `file:line` reference and a
"why" annotation explaining the design decision.

---

## Walkthrough 1 — Token Issuance

**Request:** `POST /api/v1/token` with `grant_type=client_credentials`

This is the most security-critical path in the codebase. An AI agent calling this
endpoint proves its identity and receives a token that grants access to the API,
within the granted scopes, for one hour.

---

### Step 1 — Express middleware stack

**File:** `src/app.ts` lines 57–83

```
helmet()                                 → security headers
cors()                                   → CORS headers
morgan()                                 → access log line (skipped in test env)
express.json()                           → parse JSON bodies
express.urlencoded({ extended: false })  → parse form-encoded bodies
metricsMiddleware                        → start request timer, record counters on finish
```

**Why `extended: false`?** The token endpoint receives `application/x-www-form-urlencoded`
bodies (RFC 6749 requires this format for token requests). The `express.urlencoded`
middleware parses them into `req.body`. `extended: false` uses Node's built-in `querystring`
parser, which is sufficient and avoids `qs` library complexity for flat key-value data.

---

### Step 2 — Route dispatch

**File:** `src/routes/token.ts` line 24

```typescript
router.post('/', asyncHandler(rateLimitMiddleware), asyncHandler(tokenController.issueToken.bind(tokenController)));
```

**Why no `authMiddleware` here?** The token endpoint is where the agent _gets_ its
token, so it cannot present a Bearer token to authenticate. Instead, credentials go
in the request body (`client_id`, `client_secret`). `POST /token` is deliberately
unauthenticated at the transport layer; authentication happens inside the controller.

**Why `asyncHandler`?** Express (before version 5) does not forward a rejected promise
from an async handler to the error middleware. `asyncHandler` wraps the async function
and calls `next(err)` if the promise rejects, routing the error to `errorHandler`.
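
As a rough illustration of the wrapper described above, here is a minimal sketch. The `Req`, `Res` and `Next` types are simplified stand-ins for the Express types, and this is not the project's actual `asyncHandler`; it only shows how a rejected promise can end up in `next(err)`. Returning the promise is a choice made here so the behaviour is easy to observe.

```typescript
// Simplified stand-ins for Express's Request, Response and NextFunction (assumptions).
type Req = Record<string, unknown>;
type Res = Record<string, unknown>;
type Next = (err?: unknown) => void;
type AsyncMiddleware = (req: Req, res: Res, next: Next) => Promise<void>;

// Wrap an async middleware so that a rejected promise is forwarded to next(err)
// instead of becoming an unhandled rejection.
function asyncHandler(fn: AsyncMiddleware) {
  return (req: Req, res: Res, next: Next): Promise<void> =>
    // Starting from Promise.resolve() also captures synchronous throws inside fn.
    Promise.resolve()
      .then(() => fn(req, res, next))
      .catch((err: unknown) => next(err));
}
```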

---

### Step 3 — Rate limit check

**File:** `src/middleware/rateLimit.ts`

The rate limiter checks a Redis sliding-window counter for the client's IP address.
If the counter exceeds 100 requests/minute, it throws `RateLimitError` (429).

**Why Redis, not in-memory?** An in-memory counter resets whenever the server
restarts, and it is not shared between instances when the service scales
horizontally. Redis maintains a single counter across instances and restarts.
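
The window logic itself can be sketched in a few lines. The class below is an in-memory stand-in written for illustration only (the name `SlidingWindowLimiter` and its API are hypothetical); the real middleware keeps this state in Redis for the reasons given above. The clock is passed in explicitly so the behaviour is easy to follow.

```typescript
// In-memory sliding-window limiter, for illustration only.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should receive a 429.
  allow(key: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(nowMs);
    }
    this.hits.set(key, recent);
    return allowed;
  }
}

// 100 requests per minute per key, matching the limit described above.
const limiter = new SlidingWindowLimiter(100, 60 * 1000);
```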

---

### Step 4 — Controller: validate grant_type

**File:** `src/controllers/TokenController.ts` lines 84–103

```typescript
issueToken = async (req: Request, res: Response, _next: NextFunction): Promise<void> => {
  const body = req.body as ITokenRequest;

  if (!body.grant_type) { ... return res.status(400).json({error: 'invalid_request', ...}) }
  if (body.grant_type !== 'client_credentials') { ... return res.status(400).json(...) }
```

**Why does this method catch errors itself instead of calling `next(err)`?** The token
endpoint must return errors in the **OAuth 2.0 error format** (`{ error, error_description }`)
per RFC 6749 §5.2, not the standard SentryAgent.ai format (`{ code, message }`). The
`mapToOAuth2Error()` helper translates `AuthenticationError` and `AuthorizationError`
into OAuth2 error codes. The `_next` parameter is intentionally unused for the error path.
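
To make the translation concrete, here is a hedged sketch of what such a helper could look like. The error classes are local stand-ins, and the specific codes and statuses chosen (`invalid_client` with 401, `unauthorized_client` with 403) are assumptions for illustration; the real `mapToOAuth2Error()` may map differently.

```typescript
// Local stand-ins for the project's error classes (assumptions).
class AuthenticationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AuthenticationError";
  }
}

class AuthorizationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AuthorizationError";
  }
}

// Error body shape from RFC 6749 §5.2.
interface OAuth2ErrorBody {
  error: string;
  error_description: string;
}

// Returns null for errors this sketch does not know how to translate,
// so the caller can fall back to generic error handling.
function mapToOAuth2Error(err: unknown): { status: number; body: OAuth2ErrorBody } | null {
  if (!(err instanceof Error)) return null;
  switch (err.name) {
    case "AuthenticationError": // assumed mapping
      return { status: 401, body: { error: "invalid_client", error_description: err.message } };
    case "AuthorizationError": // assumed mapping
      return { status: 403, body: { error: "unauthorized_client", error_description: err.message } };
    default:
      return null;
  }
}
```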

---

### Step 5 — Controller: Joi validation and credential extraction

**File:** `src/controllers/TokenController.ts` lines 106–138

```typescript
const { error, value } = tokenRequestSchema.validate(body, { abortEarly: false });
// ...
// Support HTTP Basic auth fallback (RFC 6749 §2.3.1)
const authHeader = req.headers['authorization'];
if (authHeader?.startsWith('Basic ')) {
  const base64 = authHeader.slice(6);
  const decoded = Buffer.from(base64, 'base64').toString('utf-8');
  const colonIndex = decoded.indexOf(':');
  clientId = decoded.slice(0, colonIndex);
  clientSecret = decoded.slice(colonIndex + 1);
}
```

**Why `abortEarly: false`?** This returns all validation errors at once, so the
client can fix all problems in one round trip.

**Why Basic auth support?** RFC 6749 §2.3.1 requires authorization servers to support
HTTP Basic authentication for clients that were issued a client password; sending the
credentials in the request body is the optional alternative. Some OAuth libraries
default to Basic authentication.
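
The header parsing can be isolated into a small function. The sketch below uses a hypothetical helper name, `parseBasicAuth`, and adds an explicit guard for a decoded value without a colon; that guard is an addition made for this sketch, not something the excerpt above shows.

```typescript
import { Buffer } from "node:buffer";

// Parse "Authorization: Basic <base64(client_id:client_secret)>".
// Returns null when the header is missing, is not Basic, or has no colon separator.
function parseBasicAuth(
  header: string | undefined,
): { clientId: string; clientSecret: string } | null {
  if (!header || !header.startsWith("Basic ")) return null;

  const decoded = Buffer.from(header.slice("Basic ".length), "base64").toString("utf-8");
  const colonIndex = decoded.indexOf(":");
  if (colonIndex === -1) return null; // malformed: nothing separates id and secret

  return {
    clientId: decoded.slice(0, colonIndex),
    // Split on the first colon only, so the secret itself may contain colons.
    clientSecret: decoded.slice(colonIndex + 1),
  };
}
```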

---

### Step 6 — Controller: scope validation

**File:** `src/controllers/TokenController.ts` lines 141–151

```typescript
const requestedScope = tokenBody.scope ?? 'agents:read';
const validScopes = ['agents:read', 'agents:write', 'tokens:read', 'audit:read'];
const scopeList = requestedScope.split(' ');
const invalidScope = scopeList.find((s) => !validScopes.includes(s));
if (invalidScope) { return res.status(400).json({error: 'invalid_scope', ...}) }
```

**Why validate scopes here?** Scope validation at the controller layer provides an
RFC 6749-compliant `invalid_scope` error before we even look up the agent. This is
faster and gives the client a clearer error message.
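
The same check, pulled out as a pure function so its behaviour on a few inputs is easy to see. The function name `findInvalidScope` is hypothetical; the scope list is the one shown in the excerpt above.

```typescript
const VALID_SCOPES = ["agents:read", "agents:write", "tokens:read", "audit:read"];

// Returns the first unrecognised scope in a space-delimited scope string,
// or undefined when every requested scope is valid.
function findInvalidScope(requested: string): string | undefined {
  return requested.split(" ").find((s) => !VALID_SCOPES.includes(s));
}

const allValid = findInvalidScope("agents:read agents:write"); // undefined
const unknownScope = findInvalidScope("agents:read admin");    // "admin"
// A doubled space yields an empty segment, which is reported as "". Note that ""
// is falsy, so a caller testing `if (invalidScope)` would not treat it as an error.
const emptySegment = findInvalidScope("agents:read  agents:write"); // ""
```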

---

### Step 7 — Service: agent lookup

**File:** `src/services/OAuth2Service.ts` lines 83–94

```typescript
const agent = await this.agentRepository.findById(clientId);
if (!agent) {
  void this.auditService.logEvent(clientId, 'auth.failed', 'failure', ..., { reason: 'agent_not_found', clientId });
  throw new AuthenticationError('Client authentication failed...');
}
```

**Why log auth failures?** Failed authentication attempts may indicate a brute-force
attack or a misconfigured client. Having them in the audit log enables incident
investigation and alerting.

**Why not distinguish between "agent not found" and "wrong secret" in the error message?**
Revealing which is wrong gives an attacker information: they can enumerate valid
`client_id` values by checking whether they get "agent not found" vs "wrong secret".
Both cases return the same message.

---

### Step 8 — Service: credential verification

**File:** `src/services/OAuth2Service.ts` lines 97–131

```typescript
const { credentials } = await this.credentialRepository.findByAgentId(clientId, { status: 'active', page: 1, limit: 100 });

for (const cred of credentials) {
  const credRow = await this.credentialRepository.findById(cred.credentialId);
  if (credRow) {
    if (credRow.expiresAt !== null && credRow.expiresAt < new Date()) { continue; }

    let matches: boolean;
    if (credRow.vaultPath !== null && this.vaultClient !== null) {
      matches = await this.vaultClient.verifySecret(clientId, credRow.credentialId, clientSecret);
    } else {
      matches = await verifySecret(clientSecret, credRow.secretHash);
    }
    if (matches) { credentialVerified = true; break; }
  }
}
```

**Why iterate over multiple credentials?** An agent can hold several active
credentials at the same time (e.g. one per service that calls it). If only one
credential were accepted, replacing credential A while service X still presents it
would lock service X out. Checking every active credential allows overlapping
rotation: a new credential can be issued and rolled out before the old one is retired.

**Why check expiry before hashing?** Bcrypt is intentionally slow (roughly 100 ms per
comparison, depending on the cost factor). Checking expiry first is a cheap early
exit that avoids the bcrypt computation on expired credentials.
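
The ordering argument can be shown with a stripped-down version of the loop. Everything here is a sketch: `StoredCredential`, `findMatchingCredential` and the injected `verify` callback are hypothetical names, and `verify` stands in for the slow hash comparison so that the number of comparisons can be counted.

```typescript
interface StoredCredential {
  credentialId: string;
  expiresAt: Date | null;
  secretHash: string;
}

// Stand-in for the slow hash comparison (bcrypt in the text above).
type Verify = (plain: string, hash: string) => Promise<boolean>;

// Returns the id of the first unexpired credential that matches, or null.
async function findMatchingCredential(
  credentials: StoredCredential[],
  clientSecret: string,
  verify: Verify,
  now: Date,
): Promise<string | null> {
  for (const cred of credentials) {
    // Cheap check first: never spend a hash comparison on an expired credential.
    if (cred.expiresAt !== null && cred.expiresAt < now) continue;
    if (await verify(clientSecret, cred.secretHash)) return cred.credentialId;
  }
  return null;
}
```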

---

### Step 9 — Service: status and monthly limit checks

**File:** `src/services/OAuth2Service.ts` lines 144–176

```typescript
if (agent.status === 'suspended') { throw new AuthorizationError(...) }
if (agent.status === 'decommissioned') { throw new AuthorizationError(...) }

const monthlyCount = await this.tokenRepository.getMonthlyCount(clientId);
if (monthlyCount >= FREE_TIER_MAX_MONTHLY_TOKENS) { throw new FreeTierLimitError(...) }
```

**Why check status after credential verification?** We verify credentials first so
a suspended agent with a wrong secret gets `AuthenticationError` (401) not
`AuthorizationError` (403). This prevents leaking which agents are suspended to
unauthenticated callers.

---

### Step 10 — Service: sign the JWT

**File:** `src/services/OAuth2Service.ts` lines 179–190

```typescript
const jti = uuidv4();
const payload: Omit<ITokenPayload, 'iat' | 'exp'> = { sub: clientId, client_id: clientId, scope, jti };
const accessToken = signToken(payload, this.privateKey);
```

**File:** `src/utils/jwt.ts` lines 19–31

```typescript
export function signToken(payload: Omit<ITokenPayload, 'iat' | 'exp'>, privateKey: string): string {
  const now = Math.floor(Date.now() / 1000);
  const fullPayload: ITokenPayload = { ...payload, iat: now, exp: now + TOKEN_EXPIRES_IN };
  return jwt.sign(fullPayload, privateKey, { algorithm: 'RS256' });
}
```

**Why RS256 instead of HS256?** RS256 (RSA asymmetric) allows any consumer of the
token to verify it using the public key without needing the private signing key.
HS256 (HMAC symmetric) would require sharing the secret with every service that
verifies tokens.

**Why `jti` (JWT ID)?** The `jti` is a unique identifier for this specific token.
It is used as the key in the Redis revocation list. Without `jti`, you cannot
revoke a single token without revoking all tokens for the agent.
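
The asymmetry is easier to appreciate with the signature made explicit. The sketch below does not use the project's `signToken` or the `jsonwebtoken` library; it builds an RS256-style token by hand with Node's built-in `node:crypto`, using a throwaway key pair and made-up claim values, purely to show that signing needs the private key while verification needs only the public key.

```typescript
import { Buffer } from "node:buffer";
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Throwaway key pair for the demo; a real service loads its keys from configuration.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const b64url = (data: string): string => Buffer.from(data).toString("base64url");

// JWT signing input: base64url(header) + "." + base64url(payload).
const jwtHeader = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
const jwtPayload = b64url(JSON.stringify({ sub: "agent-123", scope: "agents:read", jti: "demo-jti" }));
const signingInput = `${jwtHeader}.${jwtPayload}`;

// RS256 is RSASSA-PKCS1-v1_5 with SHA-256. Producing the signature needs the private key.
const signature = sign("RSA-SHA256", Buffer.from(signingInput), privateKey);
const demoToken = `${signingInput}.${signature.toString("base64url")}`;

// Checking the signature needs only the public key.
const isValid = verify("RSA-SHA256", Buffer.from(signingInput), publicKey, signature);
// Any change to the signed bytes invalidates the signature.
const tamperedIsValid = verify("RSA-SHA256", Buffer.from(signingInput + "x"), publicKey, signature);
```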

---

### Step 11 — Service: fire-and-forget operations

**File:** `src/services/OAuth2Service.ts` lines 193–207

```typescript
void this.tokenRepository.incrementMonthlyCount(clientId);
void this.auditService.logEvent(clientId, 'token.issued', 'success', ..., { scope, expiresAt });
tokensIssuedTotal.inc({ scope });
```

**Why `void` (fire-and-forget)?** The token has been signed and is ready to return.
Waiting for the Redis increment and audit write would add ~5–10ms to every token
request. These operations are best-effort: if they fail, the token is still valid.

**Why is the Prometheus `.inc()` call synchronous?** Prometheus counters are
in-process memory operations; they do not write to Redis or PostgreSQL. They are
O(1) and sub-microsecond.
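
One detail worth keeping in mind with this pattern: a promise started with `void` still needs its rejection handled somewhere, because an unhandled rejection terminates recent Node.js versions by default. Whether the project's repository and audit methods catch internally is not shown here, so the sketch below simply assumes a small hypothetical helper, `fireAndForget`, that attaches the handler at the call site.

```typescript
const failures: string[] = [];

// Start a best-effort task without awaiting it, recording (not rethrowing) failures.
function fireAndForget(label: string, task: () => Promise<void>): void {
  void task().catch((err: unknown) => {
    failures.push(`${label}: ${String(err)}`);
  });
}

function issueToken(clientId: string): string {
  const accessToken = `token-for-${clientId}`; // stands in for the signed JWT

  fireAndForget("incrementMonthlyCount", async () => {
    throw new Error("redis unavailable"); // simulate a failing side effect
  });

  // Returned immediately; the failed side effect does not affect the caller.
  return accessToken;
}
```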

---

### Step 12 — Response

**File:** `src/controllers/TokenController.ts` lines 163–167

```typescript
res.setHeader('Cache-Control', 'no-store');
res.setHeader('Pragma', 'no-cache');
res.status(200).json(tokenResponse);
```

**Why `Cache-Control: no-store`?** RFC 6749 §5.1 mandates that token responses
must not be cached. Without this header, a shared proxy or CDN could cache the
response and replay it to another client.

Final response:

```json
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "agents:read agents:write"
}
```

---

## Walkthrough 2 — Agent Registration

**Request:** `POST /api/v1/agents` with Bearer token and agent data JSON body

After token issuance, registering an agent is the second most common operation.
This walkthrough shows a request that goes through all three router-level middleware
layers: authentication, authorization (OPA), and rate limiting.

---

### Step 1 — Middleware stack

**File:** `src/app.ts` lines 57–83 (same security and parsing middleware as Walkthrough 1)

---

### Step 2 — Route dispatch

**File:** `src/routes/agents.ts` lines 22–27

```typescript
router.use(asyncHandler(authMiddleware));
router.use(opaMiddleware);
router.use(asyncHandler(rateLimitMiddleware));
router.post('/', asyncHandler(agentController.registerAgent.bind(agentController)));
```

All three middleware functions run, in this order, on every request to the agents
router before the handler.

---

### Step 3 — Auth middleware: Bearer token verification

**File:** `src/middleware/auth.ts` lines 28–77

```typescript
const authHeader = req.headers['authorization'];
if (!authHeader || !authHeader.startsWith('Bearer ')) { throw new AuthenticationError(...) }

const token = authHeader.slice(7).trim();
const publicKey = process.env['JWT_PUBLIC_KEY'];
let payload: ITokenPayload;
try {
  payload = verifyToken(token, publicKey);
} catch (err) {
  if (err instanceof TokenExpiredError) { throw new AuthenticationError('Token has expired.') }
  if (err instanceof JsonWebTokenError) { throw new AuthenticationError('Token signature is invalid.') }
}

const redis = await getRedisClient();
const revocationKey = `revoked:${payload.jti}`;
const isRevoked = await redis.get(revocationKey);
if (isRevoked !== null) { throw new AuthenticationError('Token has been revoked.') }

req.user = payload;
next();
```

**Why check Redis after signature verification?** Signature verification is a pure
cryptographic operation (no I/O). If the token is expired or has a bad signature,
there is no need to hit Redis. The fast path exits early; Redis is the slower
secondary check.

**Why `await getRedisClient()` instead of storing the client?** `getRedisClient()`
returns the same singleton every time: the connection is created once and reused.
The `await` is fast (no I/O after the first call).
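
A common way to get this behaviour is to cache the connection promise. The sketch below is an assumption about the general pattern, not a copy of the project's `getRedisClient()`; `FakeClient`, `connect` and `getClient` are hypothetical names, and no real Redis connection is made.

```typescript
// Hypothetical client shape; the real function returns a Redis client.
interface FakeClient {
  id: number;
}

let connectCalls = 0;

// Stands in for the expensive connection setup.
async function connect(): Promise<FakeClient> {
  connectCalls += 1;
  return { id: connectCalls };
}

// Cache the promise rather than the resolved client, so that concurrent
// first callers share a single connection attempt.
let clientPromise: Promise<FakeClient> | null = null;

function getClient(): Promise<FakeClient> {
  if (clientPromise === null) {
    clientPromise = connect();
  }
  return clientPromise;
}
```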

---

### Step 4 — OPA middleware: scope enforcement

**File:** `src/middleware/opa.ts` lines 230–257

```typescript
const input: OpaInput = {
  method: req.method,                 // "POST"
  path: req.baseUrl + req.path,       // "/api/v1/agents"
  scopes: req.user.scope.split(' '),  // ["agents:read", "agents:write"]
};

if (!evaluate(input)) {
  next(new AuthorizationError());
  return;
}
```

For `POST /api/v1/agents`, the policy requires `["agents:write"]`. If `agents:write`
is not in the token's scope, the request is rejected with 403 before the controller
runs.

**Why reconstruct the full path with `req.baseUrl + req.path`?** The OPA policy
uses full paths (`/api/v1/agents/:id`). Inside a nested router, `req.path` is
relative to the router's mount point (e.g. `/`). `req.baseUrl` is the mount prefix
(`/api/v1/agents`). Concatenating them gives the full path the policy expects.
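
To illustrate how a scope check over full paths might work, here is a small sketch. The policy table, the UUID-based `:id` normalisation and the trailing-slash handling are all assumptions made for this example; the real `evaluate()` in `src/middleware/opa.ts` may be structured differently.

```typescript
interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

// Hypothetical policy table: "<METHOD> <normalised path>" -> scopes that must all be present.
const REQUIRED_SCOPES: Record<string, string[]> = {
  "GET /api/v1/agents": ["agents:read"],
  "POST /api/v1/agents": ["agents:write"],
  "GET /api/v1/agents/:id": ["agents:read"],
};

const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Replace concrete UUID segments with ":id" and drop a trailing slash, so that
// "/api/v1/agents" + "/" (baseUrl + path at the router root) matches "/api/v1/agents".
function normalisePath(path: string): string {
  const joined = path
    .split("/")
    .map((segment) => (UUID_SEGMENT.test(segment) ? ":id" : segment))
    .join("/");
  return joined.length > 1 && joined.endsWith("/") ? joined.slice(0, -1) : joined;
}

function evaluate(input: OpaInput): boolean {
  const required = REQUIRED_SCOPES[`${input.method} ${normalisePath(input.path)}`];
  if (required === undefined) return false; // default deny for unknown routes
  return required.every((scope) => input.scopes.includes(scope));
}
```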

---

### Step 5 — Controller: validation

**File:** `src/controllers/AgentController.ts` lines 37–60

```typescript
registerAgent = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  if (!req.user) { throw new AuthorizationError() }

  const { error, value } = createAgentSchema.validate(req.body, { abortEarly: false });
  if (error) {
    throw new ValidationError('Request validation failed.', {
      details: error.details.map((d) => ({ field: d.path.join('.'), reason: d.message })),
    });
  }

  const data = value as ICreateAgentRequest;
  const ipAddress = req.ip ?? '0.0.0.0';
  const userAgent = req.headers['user-agent'] ?? 'unknown';

  const agent = await this.agentService.registerAgent(data, ipAddress, userAgent);
  res.status(201).json(agent);
```

**Why check `req.user` in the controller when `authMiddleware` already set it?**
TypeScript's type system marks `req.user` as `ITokenPayload | undefined`. The check
at line 39 narrows the type so subsequent code can use `req.user` without non-null
assertions. It is a type guard, not redundant authentication.

**Why pass `ipAddress` and `userAgent` to the service?** The service logs audit events.
Audit events include the client IP and User-Agent for forensic value. These values
come from the HTTP request, which the service has no access to, so the controller
extracts them and passes them down.

---

### Step 6 — Service: free-tier limit check

**File:** `src/services/AgentService.ts` lines 59–65

```typescript
const currentCount = await this.agentRepository.countActive();
if (currentCount >= FREE_TIER_MAX_AGENTS) {
  throw new FreeTierLimitError('Free tier limit of 100 registered agents has been reached.', ...);
}
```

**Why count before checking email uniqueness?** If the limit is already reached, the
request fails whatever the email is, so there is no point checking whether the email
already exists. Checking the limit first avoids an unnecessary query.

---

### Step 7 — Service: email uniqueness check

**File:** `src/services/AgentService.ts` lines 68–71

```typescript
const existing = await this.agentRepository.findByEmail(data.email);
if (existing !== null) { throw new AgentAlreadyExistsError(data.email) }
```

**Why not rely on the database UNIQUE constraint?** We could, but catching a
PostgreSQL `23505` error code in the repository would be less readable and would
not produce a typed `AgentAlreadyExistsError` with a structured `details` field.
The explicit check gives better error messages and keeps the repository layer clean.
Two concurrent registrations with the same email can still both pass this check, so
a UNIQUE constraint is still needed as the final safeguard against duplicates.

---

### Step 8 — Repository: INSERT

**File:** `src/repositories/AgentRepository.ts` lines 67–85

```typescript
async create(data: ICreateAgentRequest): Promise<IAgent> {
  const agentId = uuidv4();
  const result: QueryResult<AgentRow> = await this.pool.query(
    `INSERT INTO agents (agent_id, email, agent_type, version, capabilities, owner, deployment_env, status, created_at, updated_at)
     VALUES ($1, $2, $3, $4, $5, $6, $7, 'active', NOW(), NOW())
     RETURNING *`,
    [agentId, data.email, data.agentType, data.version, data.capabilities, data.owner, data.deploymentEnv],
  );
  return mapRowToAgent(result.rows[0]);
}
```

**Why generate `agentId` in application code instead of relying on `gen_random_uuid()`?**
Because we use the UUID as the OAuth 2.0 `client_id`. We need the UUID before writing
to the database so we can use it in the audit event and the response. Having it in
application code avoids a separate SELECT after the INSERT.

**Why `RETURNING *`?** PostgreSQL's `RETURNING` clause sends back the inserted row
in the same round trip as the INSERT. This avoids a second SELECT to fetch the
newly created record.

---

### Step 9 — Service: audit event

**File:** `src/services/AgentService.ts` lines 76–83

```typescript
await this.auditService.logEvent(
  agent.agentId,
  'agent.created',
  'success',
  ipAddress,
  userAgent,
  { agentType: agent.agentType, owner: agent.owner },
);
```

**Why `await` here but `void` for token audit events?** Agent registration is a
database write that happens once per agent. Adding ~5ms for the audit write is
acceptable and ensures the audit event is recorded before the 201 response is sent.
Token issuance happens far more frequently, so audit is fire-and-forget there.

---

### Step 10 — Response

**File:** `src/controllers/AgentController.ts` line 56

```typescript
res.status(201).json(agent);
```

Returns the full `IAgent` object with HTTP 201 Created.

---

## Walkthrough 3 — Credential Rotation

**Request:** `POST /api/v1/agents/:agentId/credentials/:credentialId/rotate`

Credential rotation is the process of replacing an existing client secret with a
new one without changing the `credentialId`. This is the recommended security
practice: rotate periodically and rotate immediately after suspected compromise.

---

### Step 1 — Route dispatch

**File:** `src/routes/credentials.ts` line 34

```typescript
router.post('/:credentialId/rotate', asyncHandler(credentialController.rotateCredential.bind(credentialController)));
```

The credentials router is mounted at `/api/v1/agents/:agentId/credentials` in `app.ts`.
The full path becomes `POST /api/v1/agents/:agentId/credentials/:credentialId/rotate`.

---

### Step 2 — Auth middleware

Same as Walkthrough 2, Step 3. Bearer token is verified via RS256 and Redis revocation check.
`req.user` is populated with the JWT payload.

---

### Step 3 — OPA middleware

The path `/api/v1/agents/:agentId/credentials/:credId/rotate` is normalised to
`/api/v1/agents/:id/credentials/:credId/rotate`. The policy requires `["agents:write"]`.

---

### Step 4 — Controller: ownership check

**File:** `src/controllers/CredentialController.ts` lines 127–137

```typescript
rotateCredential = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  if (!req.user) { throw new AuthenticationError() }

  const { agentId, credentialId } = req.params;

  if (req.user.sub !== agentId) {
    throw new AuthorizationError('You do not have permission to manage credentials for this agent.');
  }
```

**Why check `req.user.sub !== agentId`?** An agent's token contains its own
`agentId` as the `sub` claim. This check enforces that an agent can only manage
its own credentials. Even if an agent has `agents:write` scope, it cannot rotate
another agent's credentials. This is Phase 1 behaviour; there is no admin scope yet.

---

### Step 5 — Controller: request validation

**File:** `src/controllers/CredentialController.ts` lines 139–157

```typescript
const { error, value } = generateCredentialSchema.validate(req.body ?? {}, { abortEarly: false });
// generateCredentialSchema validates optional `expiresAt` field
const data = value as IGenerateCredentialRequest;
const result = await this.credentialService.rotateCredential(agentId, credentialId, data, ipAddress, userAgent);
res.status(200).json(result);
```

**Why `req.body ?? {}`?** The rotation body is optional: an agent may rotate a
credential without an expiry date, in which case the body may be empty. By default
Joi accepts `undefined` for a schema that is not marked `required()` and returns it
unchanged, so `data` would be `undefined`. Defaulting to `{}` guarantees that the
schema always validates an object.

---

### Step 6 — Service: existence checks

**File:** `src/services/CredentialService.ts` lines 163–177

```typescript
const agent = await this.agentRepository.findById(agentId);
if (!agent) { throw new AgentNotFoundError(agentId) }

const existing = await this.credentialRepository.findById(credentialId);
if (!existing || existing.clientId !== agentId) { throw new CredentialNotFoundError(credentialId) }

if (existing.status === 'revoked') {
  throw new CredentialAlreadyRevokedError(credentialId, existing.revokedAt?.toISOString() ?? ...);
}
```

**Why check `existing.clientId !== agentId`?** The controller's ownership check
(Step 4) only ties the caller to the `agentId` in the path. A malicious actor could
still craft a request with their own valid `agentId` in the path but a `credentialId`
belonging to another agent. This check ensures that a credential is only accessible
to the agent it was created for.

---

### Step 7 — Service: generate new secret and write to Vault or bcrypt

**File:** `src/services/CredentialService.ts` lines 180–192

```typescript
const expiresAt = data.expiresAt !== undefined ? new Date(data.expiresAt) : null;
const plainSecret = generateClientSecret();  // sk_live_<64 hex chars>

let updated: ICredential | null;

if (this.vaultClient !== null) {
  // Phase 2: overwrite the existing Vault secret (KV v2 creates a new version)
  const vaultPath = await this.vaultClient.writeSecret(agentId, credentialId, plainSecret);
  updated = await this.credentialRepository.updateVaultPath(credentialId, vaultPath, expiresAt);
} else {
  // Phase 1: use bcrypt
  const newHash = await hashSecret(plainSecret);
  updated = await this.credentialRepository.updateHash(credentialId, newHash, expiresAt);
}
```

**Why does Vault rotation write to the same path?** Vault KV v2 is versioned: writing
to an existing path creates a new version without overwriting previous versions.
This preserves an audit trail in Vault itself.

**Why does the Vault path stay the same after rotation?** The `vault_path` column
stores the path, not the secret. The path is deterministic:
`{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}`. Since the
`credentialId` does not change on rotation, the path does not change either.
Only the Vault version at that path changes.
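
Both ideas can be sketched briefly. The two functions below are illustrations written for this walkthrough, not the project's implementations: the secret generator assumes the `sk_live_<64 hex chars>` shape noted in the excerpt and assumes a cryptographically secure random source, and `vaultSecretPath` is a hypothetical helper that simply fills in the path template quoted above.

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes encode to 64 hex characters, matching "sk_live_<64 hex chars>".
function generateClientSecret(): string {
  return `sk_live_${randomBytes(32).toString("hex")}`;
}

// The path depends only on the mount and the two ids, so rotating the secret
// (which keeps the credentialId) never changes it.
function vaultSecretPath(mount: string, agentId: string, credentialId: string): string {
  return `${mount}/data/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

const pathBeforeRotation = vaultSecretPath("secret", "agent-1", "cred-1");
const rotatedSecret = generateClientSecret(); // new secret, same credentialId
const pathAfterRotation = vaultSecretPath("secret", "agent-1", "cred-1");
```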

---

### Step 8 — Repository: UPDATE the credential

**File:** `src/repositories/CredentialRepository.ts` lines 180–218

```sql
-- Bcrypt path (updateHash):
UPDATE credentials
SET secret_hash = $1, vault_path = NULL, expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *

-- Vault path (updateVaultPath):
UPDATE credentials
SET vault_path = $1, secret_hash = '', expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *
```

**Why `status = 'active'` in the UPDATE?** A credential could theoretically be
in any state when rotated. The UPDATE explicitly sets it to active. This handles
edge cases where a revoked credential is being "un-revoked" by rotation (though
the service layer prevents this: revoked credentials throw `CredentialAlreadyRevokedError`).
The belt-and-suspenders approach at the SQL layer ensures data integrity.

---

### Step 9 — Service: audit event

**File:** `src/services/CredentialService.ts` lines 199–206

```typescript
await this.auditService.logEvent(
  agentId,
  'credential.rotated',
  'success',
  ipAddress,
  userAgent,
  { credentialId },
);
```

The audit event records which credential was rotated. Combined with the timestamp,
this gives a complete rotation history for each credential.

---

### Step 10 — Response

**File:** `src/controllers/CredentialController.ts` line 161

```typescript
res.status(200).json(result);
```

Returns `ICredentialWithSecret`: the updated credential including the new
`clientSecret`. This is the only time the new secret is ever returned. The caller
must store it securely.

```json
{
  "credentialId": "d4e5f6a7-...",
  "clientId": "a1b2c3d4-...",
  "status": "active",
  "clientSecret": "sk_live_4f8a2e9b...",
  "createdAt": "2026-01-15T10:00:00Z",
  "expiresAt": "2027-01-15T10:00:00Z",
  "revokedAt": null
}
```