06 — Code Walkthroughs
Last verified against commit: 1f95cfe89d1f45fa43b9fb7cff237f07bf9e889e
These walkthroughs trace three real production code paths from the HTTP request
to the database and back. Every step includes a file:line reference and a
"why" annotation explaining the design decision.
Walkthrough 1 — Token Issuance
Request: POST /api/v1/token with grant_type=client_credentials
This is the most security-critical path in the codebase. An AI agent calling this endpoint is proving its identity and receiving a token that grants access to the entire API for one hour.
Step 1 — Express middleware stack
File: src/app.ts lines 57–83
helmet() → security headers
cors() → CORS headers
morgan() → access log line (skipped in test env)
express.json() → parse JSON bodies
express.urlencoded({ extended: false }) → parse form-encoded bodies
metricsMiddleware → start request timer, record counters on finish
Why extended: false? The token endpoint receives application/x-www-form-urlencoded
bodies (RFC 6749 mandates this format for OAuth 2.0). The express.urlencoded
middleware parses them into req.body. extended: false uses the native querystring
parser, which is sufficient and avoids qs library complexity for flat key-value data.
Step 2 — Route dispatch
File: src/routes/token.ts line 24
router.post('/', asyncHandler(rateLimitMiddleware), asyncHandler(tokenController.issueToken.bind(tokenController)));
Why no authMiddleware here? The token endpoint is where the agent gets its
token — it cannot present a Bearer token to authenticate. Instead, credentials go
in the request body (client_id, client_secret). POST /token is deliberately
unauthenticated at the transport layer; authentication happens inside the controller.
Why asyncHandler? Express does not natively support async middleware. asyncHandler
wraps the async function and calls next(err) if the promise rejects, routing the
error to errorHandler.
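The wrapper can be sketched as follows. This is a minimal illustrative version; the real asyncHandler's file location and exact signature are assumptions, and `Req`/`Res` stand in for Express's `Request`/`Response` so the sketch is self-contained.

```typescript
// Hedged sketch of asyncHandler; Req/Res stand in for Express types.
type Next = (err?: unknown) => void;

function asyncHandler<Req, Res>(
  fn: (req: Req, res: Res, next: Next) => Promise<void>,
): (req: Req, res: Res, next: Next) => void {
  return (req, res, next) => {
    // An async function turns synchronous throws into rejections,
    // so .catch(next) routes every failure to the error handler.
    Promise.resolve(fn(req, res, next)).catch(next);
  };
}
```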
Step 3 — Rate limit check
File: src/middleware/rateLimit.ts
The rate limiter checks a Redis sliding-window counter for the client's IP address.
If the counter exceeds 100 requests/minute, it throws RateLimitError (429).
Why Redis, not in-memory? If the server restarts or scales horizontally to multiple instances, an in-memory counter would reset. Redis maintains the counter across instances and restarts.
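The core of the limiter can be sketched like this. It is a simplified, hypothetical version: `RedisLike` is a stand-in interface, and it uses a fixed one-minute bucket, a common approximation of the sliding window the real middleware implements.

```typescript
// Simplified sketch of the rate limiter's core logic (fixed-window
// approximation; the real sliding-window bookkeeping lives in
// src/middleware/rateLimit.ts).
interface RedisLike {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

const WINDOW_SECONDS = 60;
const MAX_REQUESTS = 100;

async function isAllowed(redis: RedisLike, ip: string, nowMs = Date.now()): Promise<boolean> {
  const bucket = Math.floor(nowMs / 1000 / WINDOW_SECONDS);
  const key = `ratelimit:${ip}:${bucket}`;
  const count = await redis.incr(key);
  if (count === 1) {
    // First hit in this window: set a TTL so stale buckets expire on their own.
    await redis.expire(key, WINDOW_SECONDS * 2);
  }
  return count <= MAX_REQUESTS; // false → middleware throws RateLimitError (429)
}
```

Because the counter lives in Redis, every instance behind the load balancer increments the same key, which is exactly the property the in-memory alternative lacks.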
Step 4 — Controller: validate grant_type
File: src/controllers/TokenController.ts lines 84–103
issueToken = async (req: Request, res: Response, _next: NextFunction): Promise<void> => {
const body = req.body as ITokenRequest;
if (!body.grant_type) { ... return res.status(400).json({error: 'invalid_request', ...}) }
if (body.grant_type !== 'client_credentials') { ... return res.status(400).json(...) }
Why does this method catch errors itself instead of calling next(err)? The token
endpoint must return errors in the OAuth 2.0 error format ({ error, error_description })
per RFC 6749 §5.2, not the standard SentryAgent.ai format ({ code, message }). The
mapToOAuth2Error() helper translates AuthenticationError and AuthorizationError
into OAuth2 error codes. The _next parameter is intentionally unused for the error path.
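A hypothetical sketch of that translation, for orientation. The real `mapToOAuth2Error()` lives in TokenController.ts and its exact mappings may differ; the status codes below follow RFC 6749 §5.2 conventions.

```typescript
// Hypothetical sketch of mapToOAuth2Error; mappings are illustrative.
class AuthenticationError extends Error {}
class AuthorizationError extends Error {}

interface IOAuth2Error {
  error: string;
  error_description: string;
}

function mapToOAuth2Error(err: Error): { status: number; body: IOAuth2Error } {
  if (err instanceof AuthenticationError) {
    // RFC 6749 §5.2: client authentication failure → invalid_client
    return { status: 401, body: { error: 'invalid_client', error_description: err.message } };
  }
  if (err instanceof AuthorizationError) {
    return { status: 400, body: { error: 'unauthorized_client', error_description: err.message } };
  }
  return { status: 500, body: { error: 'server_error', error_description: 'Internal server error.' } };
}
```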
Step 5 — Controller: Joi validation and credential extraction
File: src/controllers/TokenController.ts lines 106–138
const { error, value } = tokenRequestSchema.validate(body, { abortEarly: false });
// ...
// Support HTTP Basic auth fallback (RFC 6749 §2.3.1)
const authHeader = req.headers['authorization'];
if (authHeader?.startsWith('Basic ')) {
const base64 = authHeader.slice(6);
const decoded = Buffer.from(base64, 'base64').toString('utf-8');
const colonIndex = decoded.indexOf(':');
clientId = decoded.slice(0, colonIndex);
clientSecret = decoded.slice(colonIndex + 1);
}
Why abortEarly: false? This returns all validation errors at once, so the
client can fix all problems in one round trip.
Why Basic auth support? RFC 6749 §2.3.1 specifies that client credentials MAY be sent via HTTP Basic authentication. Some OAuth libraries default to this method.
Step 6 — Controller: scope validation
File: src/controllers/TokenController.ts lines 141–151
const requestedScope = tokenBody.scope ?? 'agents:read';
const validScopes = ['agents:read', 'agents:write', 'tokens:read', 'audit:read'];
const scopeList = requestedScope.split(' ');
const invalidScope = scopeList.find((s) => !validScopes.includes(s));
if (invalidScope) { return res.status(400).json({error: 'invalid_scope', ...}) }
Why validate scopes here? Scope validation at the controller layer provides an
RFC 6749-compliant invalid_scope error before we even look up the agent. This is
faster and gives the client a clearer error message.
Step 7 — Service: agent lookup
File: src/services/OAuth2Service.ts lines 83–94
const agent = await this.agentRepository.findById(clientId);
if (!agent) {
void this.auditService.logEvent(clientId, 'auth.failed', 'failure', ..., { reason: 'agent_not_found', clientId });
throw new AuthenticationError('Client authentication failed...');
}
Why log auth failures? Failed authentication attempts may indicate a brute-force attack or a misconfigured client. Having them in the audit log enables incident investigation and alerting.
Why not distinguish between "agent not found" and "wrong secret" in the error message?
Revealing which is wrong gives an attacker information — they can enumerate valid
client_id values by checking whether they get "agent not found" vs "wrong secret".
Both cases return the same message.
Step 8 — Service: credential verification
File: src/services/OAuth2Service.ts lines 97–131
const { credentials } = await this.credentialRepository.findByAgentId(clientId, { status: 'active', page: 1, limit: 100 });
for (const cred of credentials) {
const credRow = await this.credentialRepository.findById(cred.credentialId);
if (credRow) {
if (credRow.expiresAt !== null && credRow.expiresAt < new Date()) { continue; }
let matches: boolean;
if (credRow.vaultPath !== null && this.vaultClient !== null) {
matches = await this.vaultClient.verifySecret(clientId, credRow.credentialId, clientSecret);
} else {
matches = await verifySecret(clientSecret, credRow.secretHash);
}
if (matches) { credentialVerified = true; break; }
}
}
Why iterate over multiple credentials? An agent can have multiple active credentials (e.g. one per service that calls it). During a rotation window, an old credential and its replacement may both be in use at once; if only the newest credential were checked, a service still presenting the old secret would fail. Checking every active credential allows overlapping rotation with no downtime.
Why check expiry before hashing? Bcrypt is intentionally slow (~100ms). Checking expiry first is a cheap early exit that avoids the bcrypt computation on expired credentials.
Step 9 — Service: status and monthly limit checks
File: src/services/OAuth2Service.ts lines 144–176
if (agent.status === 'suspended') { throw new AuthorizationError(...) }
if (agent.status === 'decommissioned') { throw new AuthorizationError(...) }
const monthlyCount = await this.tokenRepository.getMonthlyCount(clientId);
if (monthlyCount >= FREE_TIER_MAX_MONTHLY_TOKENS) { throw new FreeTierLimitError(...) }
Why check status after credential verification? We verify credentials first so
a suspended agent with a wrong secret gets AuthenticationError (401) not
AuthorizationError (403). This prevents leaking which agents are suspended to
unauthenticated callers.
Step 10 — Service: sign the JWT
File: src/services/OAuth2Service.ts lines 179–190
const jti = uuidv4();
const payload: Omit<ITokenPayload, 'iat' | 'exp'> = { sub: clientId, client_id: clientId, scope, jti };
const accessToken = signToken(payload, this.privateKey);
File: src/utils/jwt.ts lines 19–31
export function signToken(payload: Omit<ITokenPayload, 'iat' | 'exp'>, privateKey: string): string {
const now = Math.floor(Date.now() / 1000);
const fullPayload: ITokenPayload = { ...payload, iat: now, exp: now + TOKEN_EXPIRES_IN };
return jwt.sign(fullPayload, privateKey, { algorithm: 'RS256' });
}
Why RS256 instead of HS256? RS256 (RSA asymmetric) allows any consumer of the token to verify it using the public key without needing the private signing key. HS256 (HMAC symmetric) would require sharing the secret with every service that verifies tokens.
Why jti (JWT ID)? The jti is a unique identifier for this specific token.
It is used as the key in the Redis revocation list. Without jti, you cannot
revoke a single token without revoking all tokens for the agent.
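The write side of revocation can be sketched as follows. Only the `revoked:${jti}` key format is taken from the codebase (it matches the lookup in the auth middleware); the function itself and its signature are illustrative.

```typescript
// Hedged sketch of the revocation write; the function is hypothetical,
// only the key format mirrors the middleware's lookup.
interface RedisLike {
  set(key: string, value: string, opts: { EX: number }): Promise<unknown>;
}

async function revokeToken(
  redis: RedisLike,
  jti: string,
  exp: number, // token's exp claim (seconds since epoch)
  nowSeconds = Math.floor(Date.now() / 1000),
): Promise<void> {
  // The TTL only needs to cover the token's remaining lifetime: once exp
  // passes, signature verification rejects the token anyway, so the key
  // can expire and Redis stays small.
  const ttl = Math.max(exp - nowSeconds, 1);
  await redis.set(`revoked:${jti}`, '1', { EX: ttl });
}
```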
Step 11 — Service: fire-and-forget operations
File: src/services/OAuth2Service.ts lines 193–207
void this.tokenRepository.incrementMonthlyCount(clientId);
void this.auditService.logEvent(clientId, 'token.issued', 'success', ..., { scope, expiresAt });
tokensIssuedTotal.inc({ scope });
Why void (fire-and-forget)? The token has been signed and is ready to return.
Waiting for the Redis increment and audit write would add ~5–10ms to every token
request. These operations are best-effort — if they fail, the token is still valid.
Why is the Prometheus .inc() call synchronous? Prometheus counters are
in-process memory operations — they do not write to Redis or PostgreSQL. They are
O(1) and sub-microsecond.
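To make the point concrete, here is a stripped-down labeled counter in the same spirit (illustrative only, not prom-client's actual implementation): incrementing is just a Map update, so there is nothing to await.

```typescript
// Illustrative in-process counter: .inc() is a synchronous Map update.
class LabeledCounter {
  private readonly counts = new Map<string, number>();

  inc(labels: Record<string, string>, by = 1): void {
    const key = JSON.stringify(labels);
    this.counts.set(key, (this.counts.get(key) ?? 0) + by);
  }

  get(labels: Record<string, string>): number {
    return this.counts.get(JSON.stringify(labels)) ?? 0;
  }
}
```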
Step 12 — Response
File: src/controllers/TokenController.ts lines 163–167
res.setHeader('Cache-Control', 'no-store');
res.setHeader('Pragma', 'no-cache');
res.status(200).json(tokenResponse);
Why Cache-Control: no-store? RFC 6749 §5.1 mandates that token responses
must not be cached. Without this header, a shared proxy or CDN could cache the
response and replay it to another client.
Final response:
{
"access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "Bearer",
"expires_in": 3600,
"scope": "agents:read agents:write"
}
Walkthrough 2 — Agent Registration
Request: POST /api/v1/agents with Bearer token and agent data JSON body
After token issuance, registering an agent is the second most common operation. This walkthrough shows a request that goes through all three auth middleware layers.
Step 1 — Middleware stack
File: src/app.ts lines 57–83 (same security and parsing middleware as Walkthrough 1)
Step 2 — Route dispatch
File: src/routes/agents.ts lines 22–27
router.use(asyncHandler(authMiddleware));
router.use(opaMiddleware);
router.use(asyncHandler(rateLimitMiddleware));
router.post('/', asyncHandler(agentController.registerAgent.bind(agentController)));
All three middleware run on every request to the agents router before the handler.
Step 3 — Auth middleware: Bearer token verification
File: src/middleware/auth.ts lines 28–77
const authHeader = req.headers['authorization'];
if (!authHeader || !authHeader.startsWith('Bearer ')) { throw new AuthenticationError(...) }
const token = authHeader.slice(7).trim();
const publicKey = process.env['JWT_PUBLIC_KEY'];
let payload: ITokenPayload;
try {
payload = verifyToken(token, publicKey);
} catch (err) {
if (err instanceof TokenExpiredError) { throw new AuthenticationError('Token has expired.') }
if (err instanceof JsonWebTokenError) { throw new AuthenticationError('Token signature is invalid.') }
throw err; // any other verification failure still rejects the request
}
const redis = await getRedisClient();
const revocationKey = `revoked:${payload.jti}`;
const isRevoked = await redis.get(revocationKey);
if (isRevoked !== null) { throw new AuthenticationError('Token has been revoked.') }
req.user = payload;
next();
Why check Redis after signature verification? Signature verification is a pure cryptographic operation (no I/O). If the token is expired or has a bad signature, there is no need to hit Redis. The fast path exits early; Redis is the slower secondary check.
Why await getRedisClient() instead of storing the client? getRedisClient()
returns the same singleton every time — the connection is created once and reused.
The await is fast (no I/O after the first call).
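The pattern can be sketched like this (illustrative; the real getRedisClient lives in the codebase's Redis utility module). Storing the promise rather than the resolved client means two concurrent first callers share a single connection attempt instead of racing to open two.

```typescript
// Sketch of the lazy-singleton pattern; connect() stands in for the real
// TCP connect + auth handshake.
type Client = { id: number };

let connectCount = 0;
async function connect(): Promise<Client> {
  connectCount += 1;
  return { id: connectCount };
}

let clientPromise: Promise<Client> | null = null;

function getRedisClient(): Promise<Client> {
  // First call kicks off the connection; every later call (and every
  // concurrent call during connection) awaits the same promise.
  if (clientPromise === null) clientPromise = connect();
  return clientPromise;
}
```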
Step 4 — OPA middleware: scope enforcement
File: src/middleware/opa.ts lines 230–257
const input: OpaInput = {
method: req.method, // "POST"
path: req.baseUrl + req.path, // "/api/v1/agents"
scopes: req.user.scope.split(' '), // ["agents:read", "agents:write"]
};
if (!evaluate(input)) {
next(new AuthorizationError());
return;
}
For POST /api/v1/agents, the policy requires ["agents:write"]. If agents:write
is not in the token's scope, the request is rejected with 403 before the controller
runs.
Why reconstruct the full path with req.baseUrl + req.path? The OPA policy
uses full paths (/api/v1/agents/:id). Inside a nested router, req.path is
relative to the router's mount point (e.g. /). req.baseUrl is the mount prefix
(/api/v1/agents). Concatenating them gives the full path the policy expects.
Step 5 — Controller: validation
File: src/controllers/AgentController.ts lines 37–60
registerAgent = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
if (!req.user) { throw new AuthorizationError() }
const { error, value } = createAgentSchema.validate(req.body, { abortEarly: false });
if (error) {
throw new ValidationError('Request validation failed.', {
details: error.details.map((d) => ({ field: d.path.join('.'), reason: d.message })),
});
}
const data = value as ICreateAgentRequest;
const ipAddress = req.ip ?? '0.0.0.0';
const userAgent = req.headers['user-agent'] ?? 'unknown';
const agent = await this.agentService.registerAgent(data, ipAddress, userAgent);
res.status(201).json(agent);
Why check req.user in the controller when authMiddleware already set it?
TypeScript's type system marks req.user as ITokenPayload | undefined. The check
at line 39 narrows the type so subsequent code can use req.user without null
assertions. It is a guard, not redundant authentication.
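A minimal illustration of the narrowing guard, with stand-in names for the real ITokenPayload and Express request:

```typescript
// Minimal sketch of the narrowing guard; names are stand-ins.
interface ITokenPayload {
  sub: string;
  scope: string;
}

interface ReqLike {
  user?: ITokenPayload; // what authMiddleware may or may not have set
}

function subjectOf(req: ReqLike): string {
  if (!req.user) {
    throw new Error('AuthorizationError'); // guard, mirrors the controller
  }
  // After the guard, TypeScript narrows req.user to ITokenPayload,
  // so no non-null assertion (!) is needed.
  return req.user.sub;
}
```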
Why pass ipAddress and userAgent to the service? The service logs audit events.
Audit events include the client IP and User-Agent for forensic value. These values
come from the HTTP request, which the service has no access to — so the controller
extracts them and passes them down.
Step 6 — Service: free-tier limit check
File: src/services/AgentService.ts lines 59–65
const currentCount = await this.agentRepository.countActive();
if (currentCount >= FREE_TIER_MAX_AGENTS) {
throw new FreeTierLimitError('Free tier limit of 100 registered agents has been reached.', ...);
}
Why count before checking email uniqueness? If the limit is reached, there is no point checking whether the email already exists. Doing the cheaper check (count) first avoids an unnecessary query.
Step 7 — Service: email uniqueness check
File: src/services/AgentService.ts lines 68–71
const existing = await this.agentRepository.findByEmail(data.email);
if (existing !== null) { throw new AgentAlreadyExistsError(data.email) }
Why not rely on the database UNIQUE constraint? We could, but catching a
PostgreSQL 23505 error code in the repository would be less readable and would
not produce a typed AgentAlreadyExistsError with a structured details field.
The explicit check gives better error messages and keeps the repository layer clean.
Step 8 — Repository: INSERT
File: src/repositories/AgentRepository.ts lines 67–85
async create(data: ICreateAgentRequest): Promise<IAgent> {
const agentId = uuidv4();
const result: QueryResult<AgentRow> = await this.pool.query(
`INSERT INTO agents (agent_id, email, agent_type, version, capabilities, owner, deployment_env, status, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, 'active', NOW(), NOW())
RETURNING *`,
[agentId, data.email, data.agentType, data.version, data.capabilities, data.owner, data.deploymentEnv],
);
return mapRowToAgent(result.rows[0]);
}
Why generate agentId in application code instead of relying on gen_random_uuid()?
Because we use the UUID as the OAuth 2.0 client_id. We need the UUID before writing
to the database so we can use it in the audit event and the response. Having it in
application code avoids a separate SELECT after the INSERT.
Why RETURNING *? PostgreSQL's RETURNING clause sends back the inserted row
in the same round trip as the INSERT. This avoids a second SELECT to fetch the
newly created record.
Step 9 — Service: audit event
File: src/services/AgentService.ts lines 76–83
await this.auditService.logEvent(
agent.agentId,
'agent.created',
'success',
ipAddress,
userAgent,
{ agentType: agent.agentType, owner: agent.owner },
);
Why await here but void for token audit events? Agent registration is a
database write operation that happens once. Adding ~5ms for the audit write is
acceptable and ensures the audit event is recorded before the 201 response is sent.
Token issuance happens far more frequently — audit is fire-and-forget there.
Step 10 — Response
File: src/controllers/AgentController.ts line 56
res.status(201).json(agent);
Returns the full IAgent object with HTTP 201 Created.
Walkthrough 3 — Credential Rotation
Request: POST /api/v1/agents/:agentId/credentials/:credentialId/rotate
Credential rotation is the process of replacing an existing client secret with a
new one without changing the credentialId. This is the recommended security
practice — rotate periodically and rotate immediately after suspected compromise.
Step 1 — Route dispatch
File: src/routes/credentials.ts line 34
router.post('/:credentialId/rotate', asyncHandler(credentialController.rotateCredential.bind(credentialController)));
The credentials router is mounted at /api/v1/agents/:agentId/credentials in app.ts.
The full path becomes POST /api/v1/agents/:agentId/credentials/:credentialId/rotate.
Step 2 — Auth middleware
Same as Walkthrough 2, Step 3. Bearer token is verified via RS256 and Redis revocation check.
req.user is populated with the JWT payload.
Step 3 — OPA middleware
The path /api/v1/agents/:agentId/credentials/:credId/rotate is normalised to
/api/v1/agents/:id/credentials/:credId/rotate. The policy requires ["agents:write"].
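A hypothetical sketch of that normalisation step; the real implementation in src/middleware/opa.ts may differ. The placeholder names (:id, :credId) follow the normalised path above, and the UUID regex is an assumption.

```typescript
// Hypothetical sketch: map UUID path segments to policy placeholders.
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function normalisePath(path: string): string {
  let uuidsSeen = 0;
  return path
    .split('/')
    .map((segment) => {
      if (!UUID_SEGMENT.test(segment)) return segment;
      uuidsSeen += 1;
      // First UUID is the agent, second is the credential.
      return uuidsSeen === 1 ? ':id' : ':credId';
    })
    .join('/');
}
```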
Step 4 — Controller: ownership check
File: src/controllers/CredentialController.ts lines 127–137
rotateCredential = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
if (!req.user) { throw new AuthenticationError() }
const { agentId, credentialId } = req.params;
if (req.user.sub !== agentId) {
throw new AuthorizationError('You do not have permission to manage credentials for this agent.');
}
Why check req.user.sub !== agentId? An agent's token contains its own
agentId as the sub claim. This check enforces that an agent can only manage
its own credentials. Even if an agent has agents:write scope, it cannot rotate
another agent's credentials. This is Phase 1 behaviour — there is no admin scope yet.
Step 5 — Controller: request validation
File: src/controllers/CredentialController.ts lines 139–157
const { error, value } = generateCredentialSchema.validate(req.body ?? {}, { abortEarly: false });
// generateCredentialSchema validates optional `expiresAt` field
const data = value as IGenerateCredentialRequest;
const result = await this.credentialService.rotateCredential(agentId, credentialId, data, ipAddress, userAgent);
res.status(200).json(result);
Why req.body ?? {}? The rotation body is optional — an agent may rotate a
credential without an expiry date, in which case the body may be empty. Passing
undefined to Joi would cause a different error than passing {}.
Step 6 — Service: existence checks
File: src/services/CredentialService.ts lines 163–177
const agent = await this.agentRepository.findById(agentId);
if (!agent) { throw new AgentNotFoundError(agentId) }
const existing = await this.credentialRepository.findById(credentialId);
if (!existing || existing.clientId !== agentId) { throw new CredentialNotFoundError(credentialId) }
if (existing.status === 'revoked') {
throw new CredentialAlreadyRevokedError(credentialId, existing.revokedAt?.toISOString() ?? ...);
}
Why check existing.clientId !== agentId? The controller's ownership check (Step 4)
only confirms that the agentId in the path matches the token's sub claim. An agent
could still craft a request with its own agentId in the path but a credentialId
belonging to another agent. This check ensures that a credential is only accessible
to the agent it was created for.
Step 7 — Service: generate new secret and store it (Vault write or bcrypt hash)
File: src/services/CredentialService.ts lines 180–192
const expiresAt = data.expiresAt !== undefined ? new Date(data.expiresAt) : null;
const plainSecret = generateClientSecret(); // sk_live_<64 hex chars>
let updated: ICredential | null;
if (this.vaultClient !== null) {
// Phase 2: overwrite the existing Vault secret (KV v2 creates a new version)
const vaultPath = await this.vaultClient.writeSecret(agentId, credentialId, plainSecret);
updated = await this.credentialRepository.updateVaultPath(credentialId, vaultPath, expiresAt);
} else {
// Phase 1: use bcrypt
const newHash = await hashSecret(plainSecret);
updated = await this.credentialRepository.updateHash(credentialId, newHash, expiresAt);
}
Why does Vault rotation write to the same path? Vault KV v2 is versioned — writing to an existing path creates a new version without overwriting previous versions. This preserves an audit trail in Vault itself.
Why does the Vault path stay the same after rotation? The vault_path column
stores the path, not the secret. The path is deterministic:
{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}. Since the
credentialId does not change on rotation, the path does not change either.
Only the Vault version at that path changes.
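The path template above can be written as a small helper. The template itself comes from the text; treating it as a standalone function, and the example mount name, are assumptions.

```typescript
// The deterministic Vault path from the text, as a helper.
// `mount` is the KV v2 mount point (e.g. "secret"); its value is an assumption.
function buildVaultPath(mount: string, agentId: string, credentialId: string): string {
  // credentialId is stable across rotations, so this path never changes;
  // only the KV v2 version stored at the path advances.
  return `${mount}/data/agentidp/agents/${agentId}/credentials/${credentialId}`;
}
```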
Step 8 — Repository: UPDATE the credential
File: src/repositories/CredentialRepository.ts lines 180–218
// Bcrypt path (updateHash):
UPDATE credentials
SET secret_hash = $1, vault_path = NULL, expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *
// Vault path (updateVaultPath):
UPDATE credentials
SET vault_path = $1, secret_hash = '', expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *
Why status = 'active' in the UPDATE? A credential could theoretically be
in any state when rotated. The UPDATE explicitly sets it to active. This handles
edge cases where a revoked credential is being "un-revoked" by rotation (though
the service layer prevents this — revoked credentials throw CredentialAlreadyRevokedError).
The belt-and-suspenders approach at the SQL layer ensures data integrity.
Step 9 — Service: audit event
File: src/services/CredentialService.ts lines 199–206
await this.auditService.logEvent(
agentId,
'credential.rotated',
'success',
ipAddress,
userAgent,
{ credentialId },
);
The audit event records which credential was rotated. Combined with the timestamp, this gives a complete rotation history for each credential.
Step 10 — Response
File: src/controllers/CredentialController.ts line 161
res.status(200).json(result);
Returns ICredentialWithSecret — the updated credential including the new
clientSecret. This is the only time the new secret is ever returned. The caller
must store it securely.
{
"credentialId": "d4e5f6a7-...",
"clientId": "a1b2c3d4-...",
"status": "active",
"clientSecret": "sk_live_4f8a2e9b...",
"createdAt": "2026-01-15T10:00:00Z",
"expiresAt": "2027-01-15T10:00:00Z",
"revokedAt": null
}