docs: engineering knowledge base for new hires

Complete docs/engineering/ suite: 12 documents covering company overview, system architecture, tech stack ADRs, codebase structure, service deep dives, annotated code walkthroughs, dev setup, engineering workflow, testing strategy, deployment/ops, SDK guide, and README index. All content verified against source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md marked complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 12:38:42 +00:00
parent 1f95cfe89d
commit eced5f8699
13 changed files with 3820 additions and 0 deletions
--- a/docs/engineering/09-testing.md
+++ b/docs/engineering/09-testing.md
@@ -0,0 +1,424 @@
+# 09 — Testing Strategy
+
+---
+
+## 10.1 Test Types and Purposes
+
+This codebase uses two types of tests. Understanding when to use each prevents
+you from writing integration tests for things that should be unit tests (slow)
+and unit tests for things that need a real database (misleading).
+
+### Unit Tests
+
+**Location:** `tests/unit/`
+
+**What they test:** A single class or function in complete isolation. All
+dependencies (repositories, services, external clients) are replaced with Jest mocks.
+
+**When to use:**
+- Testing service business logic (free-tier limits, status transitions, error cases)
+- Testing utility functions (crypto, jwt, validators)
+- Testing error hierarchy behaviour
+- Any code that has conditional logic you want to test exhaustively
+
+**What they do NOT test:**
+- Whether the SQL queries are correct
+- Whether the HTTP routing works
+- Whether middleware chains execute in the right order
+
+**Speed:** Milliseconds. Hundreds of unit tests should complete in under 10 seconds.
+
+### Integration Tests
+
+**Location:** `tests/integration/`
+
+**What they test:** A full HTTP request through the Express application against
+a real PostgreSQL database and real Redis instance.
+
+**When to use:**
+- Testing that a route is correctly wired to the right controller method
+- Testing authentication and authorisation middleware in combination
+- Testing database operations end-to-end (INSERT → read back → verify)
+- Testing response shapes match the OpenAPI spec exactly
+
+**What they require:**
+- Running PostgreSQL (at `TEST_DATABASE_URL` or default)
+- Running Redis (at `TEST_REDIS_URL` or default)
+- The test creates its own tables and cleans up after every test case
+
+**Speed:** Seconds. Expect 2–5 seconds per integration test file.
+
+---
+
+## 10.2 Test Framework Stack
+
+| Tool | Role |
+|------|------|
+| **Jest 29.7** | Test runner. `describe`, `it`, `expect`, `beforeEach`, `afterAll`. Also provides mocking via `jest.mock()`, `jest.fn()`, `jest.spyOn()`. |
+| **ts-jest** | Transforms TypeScript test files for Jest without a separate compilation step. Configured in `jest.config.ts`. |
+| **Supertest 6.3** | HTTP testing library. Used in integration tests to make real HTTP requests against the Express app without starting the server yourself. You pass the `Application` object directly and Supertest binds it to an ephemeral port for the duration of each request. |
+
+**Jest configuration** (`jest.config.ts`):
+```typescript
+export default {
+  preset: 'ts-jest',
+  testEnvironment: 'node',
+  roots: ['<rootDir>/tests'],
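+  // `testPathPattern` is a CLI flag, not a config key; `testMatch` is the config equivalent
+  testMatch: ['<rootDir>/tests/unit/**/*.test.ts', '<rootDir>/tests/integration/**/*.test.ts'],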
+  collectCoverageFrom: ['src/**/*.ts', '!src/server.ts'],
+};
+```
+
+---
+
+## 10.3 Coverage Gates
+
+All four coverage metrics must be at 80% or higher before a feature is considered complete:
+
+| Metric | Gate | What it measures |
+|--------|------|------------------|
+| Statements | ≥80% | Share of statements executed at least once |
+| Branches | ≥80% | Share of `if`/`else`/`switch` branches taken at least once |
+| Functions | ≥80% | Share of functions called at least once |
+| Lines | ≥80% | Share of lines executed at least once |
+
+**Enforcement:**
+
+Coverage is checked in the PR process:
+```bash
+npm run test:unit -- --coverage
+# Fails if any metric is below 80%
+```
+
+Coverage reports are output to `coverage/lcov-report/index.html` for visual inspection.
+
+The coverage threshold configuration is in `jest.config.ts`:
+```typescript
+coverageThreshold: {
+  global: {
+    statements: 80,
+    branches: 80,
+    functions: 80,
+    lines: 80,
+  },
+},
+```
+
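A minimal sketch of the comparison this gate performs, assuming a simplified coverage summary. The `CoverageSummary` type and the `failingMetrics` helper are hypothetical names for illustration, not Jest APIs. It shows that a metric at exactly 80% passes while anything lower fails:

```typescript
// Hypothetical types and helper for illustration; not part of Jest.
type Metric = 'statements' | 'branches' | 'functions' | 'lines';
type CoverageSummary = Record<Metric, number>; // percentages, 0 to 100

const THRESHOLDS: CoverageSummary = {
  statements: 80,
  branches: 80,
  functions: 80,
  lines: 80,
};

// Returns the metrics whose coverage is below the configured minimum.
function failingMetrics(
  actual: CoverageSummary,
  thresholds: CoverageSummary = THRESHOLDS,
): Metric[] {
  const metrics = Object.keys(thresholds) as Metric[];
  return metrics.filter((metric) => actual[metric] < thresholds[metric]);
}

const summary: CoverageSummary = {
  statements: 91.2,
  branches: 79.5,
  functions: 80, // exactly at the threshold: passes
  lines: 88,
};
console.log(failingMetrics(summary)); // [ 'branches' ]
```

Branch coverage is usually the first metric to dip, since every untested `else` or early return counts against it.
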
+---
+
+## 10.4 How to Run the Test Suite
+
+```bash
+# Run all tests (unit + integration)
+npm test
+
+# Run only unit tests
+npm run test:unit
+
+# Run only integration tests
+npm run test:integration
+
+# Run unit tests with coverage report
+npm run test:unit -- --coverage
+# HTML report: coverage/lcov-report/index.html
+
+# Run a single test file
+npx jest tests/unit/services/AgentService.test.ts
+
+# Run tests matching a name pattern
+npx jest --testNamePattern="should throw FreeTierLimitError"
+
+# Run tests in watch mode (re-runs on file changes)
+npx jest --watch
+
+# Run with verbose output (shows each test name)
+npx jest --verbose
+```
+
+**Integration test environment variables:**
+```bash
+export TEST_DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test
+export TEST_REDIS_URL=redis://localhost:6379/1
+npm run test:integration
+```
+
+Pointing tests at Redis logical database `1` (the `/1` suffix on the URL) keeps test
+runs from polluting database `0`, which local development uses.
+
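A small guard can make the Redis separation explicit. The sketch below assumes the conventional `redis://host:port/<db>` URL form. The `redisDbIndex` and `assertNotDevDatabase` helpers are hypothetical and do not exist in the codebase:

```typescript
// Hypothetical helper: extract the logical database index from a Redis URL.
// A missing path means the default database, index 0.
function redisDbIndex(redisUrl: string): number {
  const path = new URL(redisUrl).pathname.replace(/^\//, '');
  if (path === '') return 0;
  const index = Number.parseInt(path, 10);
  if (!Number.isInteger(index) || index < 0) {
    throw new Error(`Invalid Redis database index in URL: ${redisUrl}`);
  }
  return index;
}

// Hypothetical guard: refuse to run tests against the development database.
function assertNotDevDatabase(redisUrl: string): void {
  if (redisDbIndex(redisUrl) === 0) {
    throw new Error('Refusing to run tests against Redis database 0');
  }
}

assertNotDevDatabase('redis://localhost:6379/1'); // passes silently
```

A guard like this could be called at the top of an integration test file, before the app is imported.
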
+---
+
+## 10.5 Unit Test Writing Conventions
+
+Unit tests follow a strict pattern. Study this example carefully; the conventions
+it follows are explained after the code.
+
+**Real example from `tests/unit/services/AgentService.test.ts`:**
+
+```typescript
+/**
+ * Unit tests for src/services/AgentService.ts
+ */
+
+import { AgentService } from '../../../src/services/AgentService';
+import { AgentRepository } from '../../../src/repositories/AgentRepository';
+import { CredentialRepository } from '../../../src/repositories/CredentialRepository';
+import { AuditService } from '../../../src/services/AuditService';
+import {
+  AgentAlreadyExistsError,
+  FreeTierLimitError,
+} from '../../../src/utils/errors';
+import { IAgent, ICreateAgentRequest } from '../../../src/types/index';
+
+// Mock all dependencies — none of them execute real code
+jest.mock('../../../src/repositories/AgentRepository');
+jest.mock('../../../src/repositories/CredentialRepository');
+jest.mock('../../../src/services/AuditService');
+
+// Get typed mock constructors so instance methods accept .mockResolvedValue()
+const MockAgentRepository = AgentRepository as jest.MockedClass<typeof AgentRepository>;
+const MockCredentialRepository = CredentialRepository as jest.MockedClass<typeof CredentialRepository>;
+const MockAuditService = AuditService as jest.MockedClass<typeof AuditService>;
+
+// Define a complete test fixture — reuse this instead of duplicating object literals
+const MOCK_AGENT: IAgent = {
+  agentId: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890',
+  email: 'agent@sentryagent.ai',
+  agentType: 'screener',
+  version: '1.0.0',
+  capabilities: ['resume:read'],
+  owner: 'team-a',
+  deploymentEnv: 'production',
+  status: 'active',
+  createdAt: new Date('2026-03-28T09:00:00Z'),
+  updatedAt: new Date('2026-03-28T09:00:00Z'),
+};
+
+describe('AgentService', () => {
+  let agentService: AgentService;
+  let agentRepo: jest.Mocked<AgentRepository>;
+  let credentialRepo: jest.Mocked<CredentialRepository>;
+  let auditService: jest.Mocked<AuditService>;
+
+  beforeEach(() => {
+    // Clear all mocks before each test — prevents state leakage
+    jest.clearAllMocks();
+    // Create fresh mock instances for each test
+    agentRepo = new MockAgentRepository({} as never) as jest.Mocked<AgentRepository>;
+    credentialRepo = new MockCredentialRepository({} as never) as jest.Mocked<CredentialRepository>;
+    auditService = new MockAuditService({} as never) as jest.Mocked<AuditService>;
+    // Inject mocks into the system under test
+    agentService = new AgentService(agentRepo, credentialRepo, auditService);
+  });
+
+  describe('registerAgent()', () => {
+    const createData: ICreateAgentRequest = {
+      email: 'agent@sentryagent.ai',
+      agentType: 'screener',
+      version: '1.0.0',
+      capabilities: ['resume:read'],
+      owner: 'team-a',
+      deploymentEnv: 'production',
+    };
+
+    it('should create and return a new agent', async () => {
+      // Arrange — set up mock return values
+      agentRepo.countActive.mockResolvedValue(0);
+      agentRepo.findByEmail.mockResolvedValue(null);
+      agentRepo.create.mockResolvedValue(MOCK_AGENT);
+      auditService.logEvent.mockResolvedValue({} as never);
+
+      // Act — call the method under test
+      const result = await agentService.registerAgent(createData, '127.0.0.1', 'test/1.0');
+
+      // Assert — verify the result
+      expect(result).toEqual(MOCK_AGENT);
+      // Also verify the mock was called with the right arguments
+      expect(agentRepo.create).toHaveBeenCalledWith(createData);
+    });
+
+    it('should throw FreeTierLimitError when 100 agents already registered', async () => {
+      // Arrange — simulate limit reached
+      agentRepo.countActive.mockResolvedValue(100);
+
+      // Assert error — rejects.toThrow checks the error type
+      await expect(agentService.registerAgent(createData, '127.0.0.1', 'test/1.0'))
+        .rejects.toThrow(FreeTierLimitError);
+    });
+
+    it('should throw AgentAlreadyExistsError if email is already registered', async () => {
+      agentRepo.countActive.mockResolvedValue(0);
+      agentRepo.findByEmail.mockResolvedValue(MOCK_AGENT); // Simulate existing agent
+
+      await expect(agentService.registerAgent(createData, '127.0.0.1', 'test/1.0'))
+        .rejects.toThrow(AgentAlreadyExistsError);
+    });
+  });
+});
+```
+
+### Conventions explained:
+
+1. **One test file per source file.** `AgentService.test.ts` tests `AgentService.ts`.
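+2. **`jest.mock()` at the top level of the test file.** ts-jest hoists these calls above the imports, so the mocked modules are replaced before they load even though the calls appear after the `import` lines in the file.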
+3. **`jest.clearAllMocks()` in `beforeEach`.** Prevents mock call counts from leaking between tests.
+4. **AAA pattern (Arrange, Act, Assert).** Every `it` block follows this order.
+5. **Test both the happy path and every error case.** A service with 3 error conditions
+   needs at least 4 tests (1 success + 3 failures).
+6. **Verify mock calls for side effects.** Use `.toHaveBeenCalledWith()` to verify that
+   `auditService.logEvent` was called with the right arguments, not just that it was called.
+7. **Use typed error assertions.** `.rejects.toThrow(FreeTierLimitError)` verifies the
+   error type, not just a message string.
+
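Mocking works in the example because `AgentService` receives its dependencies through its constructor. Here is a framework-free sketch of the same idea, using a hand-written stub instead of Jest mocks. The `RegistrationService` class, the `AgentStore` interface, the order of checks and the limit constant are simplified assumptions for illustration (synchronous for brevity), not the real `AgentService`:

```typescript
class FreeTierLimitError extends Error {
  constructor() {
    super('Free tier limit reached');
    Object.setPrototypeOf(this, FreeTierLimitError.prototype);
  }
}

class AgentAlreadyExistsError extends Error {
  constructor() {
    super('Agent already exists');
    Object.setPrototypeOf(this, AgentAlreadyExistsError.prototype);
  }
}

// Hypothetical, simplified stand-in for the repository dependency.
interface AgentStore {
  countActive(): number;
  findByEmail(email: string): { email: string } | null;
  create(data: { email: string }): { email: string };
}

const FREE_TIER_LIMIT = 100; // assumed from the test name in the example

class RegistrationService {
  private readonly store: AgentStore;

  constructor(store: AgentStore) {
    this.store = store;
  }

  register(data: { email: string }): { email: string } {
    if (this.store.countActive() >= FREE_TIER_LIMIT) throw new FreeTierLimitError();
    if (this.store.findByEmail(data.email) !== null) throw new AgentAlreadyExistsError();
    return this.store.create(data);
  }
}

// Hand-written stub: each test overrides only the behaviour it cares about.
function stubStore(overrides: Partial<AgentStore> = {}): AgentStore {
  return {
    countActive: () => 0,
    findByEmail: () => null,
    create: (data) => data,
    ...overrides,
  };
}

// Arrange, Act, Assert without a test runner:
const service = new RegistrationService(stubStore({ countActive: () => 100 }));
let thrown: unknown;
try {
  service.register({ email: 'agent@sentryagent.ai' });
} catch (error) {
  thrown = error;
}
console.log(thrown instanceof FreeTierLimitError); // true
```

Two error conditions plus one success path means this sketch would need at least three tests, matching the counting rule in convention 5.
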
+---
+
+## 10.6 Integration Test Writing Conventions
+
+Integration tests use Supertest to make real HTTP requests against a live Express app.
+
+**Real example from `tests/integration/agents.test.ts`:**
+
+```typescript
+/**
+ * Integration tests for Agent Registry endpoints.
+ */
+
+import crypto from 'crypto';
+import request from 'supertest';
+import { Application } from 'express';
+import { v4 as uuidv4 } from 'uuid';
+import { Pool } from 'pg';
+
+// Generate RSA keys for test tokens — done once per test module
+const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', {
+  modulusLength: 2048,
+  publicKeyEncoding: { type: 'spki', format: 'pem' },
+  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
+});
+
+// Set environment variables BEFORE importing the app
+process.env['DATABASE_URL'] = process.env['TEST_DATABASE_URL'] ?? 'postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test';
+process.env['REDIS_URL'] = process.env['TEST_REDIS_URL'] ?? 'redis://localhost:6379/1';
+process.env['JWT_PRIVATE_KEY'] = privateKey;
+process.env['JWT_PUBLIC_KEY'] = publicKey;
+process.env['NODE_ENV'] = 'test';
+
+import { createApp } from '../../src/app';
+import { signToken } from '../../src/utils/jwt';
+import { closePool } from '../../src/db/pool';
+import { closeRedisClient } from '../../src/cache/redis';
+
+// Helper: mint a valid test token
+function makeToken(sub: string = uuidv4(), scope: string = 'agents:read agents:write'): string {
+  return signToken({ sub, client_id: sub, scope, jti: uuidv4() }, privateKey);
+}
+
+describe('Agent Registry Integration Tests', () => {
+  let app: Application;
+  let pool: Pool;
+
+  beforeAll(async () => {
+    // Boot the real Express app
+    app = await createApp();
+    pool = new Pool({ connectionString: process.env['DATABASE_URL'] });
+
+    // Create test tables (idempotent)
+    await pool.query(`CREATE TABLE IF NOT EXISTS agents (...)`);
+  });
+
+  afterEach(async () => {
+    // Clean up after each test — order matters (foreign key constraints)
+    await pool.query('DELETE FROM audit_events');
+    await pool.query('DELETE FROM credentials');
+    await pool.query('DELETE FROM agents');
+  });
+
+  afterAll(async () => {
+    // Close all connections — prevents Jest from hanging
+    await pool.end();
+    await closePool();
+    await closeRedisClient();
+  });
+
+  describe('POST /api/v1/agents', () => {
+    it('should register a new agent and return 201', async () => {
+      const token = makeToken();
+
+      const res = await request(app)
+        .post('/api/v1/agents')
+        .set('Authorization', `Bearer ${token}`)
+        .send({
+          email: 'test-agent@sentryagent.ai',
+          agentType: 'screener',
+          version: '1.0.0',
+          capabilities: ['resume:read'],
+          owner: 'test-team',
+          deploymentEnv: 'development',
+        });
+
+      expect(res.status).toBe(201);
+      expect(res.body.agentId).toBeDefined();
+      expect(res.body.email).toBe('test-agent@sentryagent.ai');
+      expect(res.body.status).toBe('active');
+    });
+
+    it('should return 401 without a token', async () => {
+      const res = await request(app)
+        .post('/api/v1/agents')
+        .send({ email: 'test@sentryagent.ai' });
+
+      expect(res.status).toBe(401);
+    });
+
+    it('should return 409 for duplicate email', async () => {
+      const token = makeToken();
+      const body = { email: 'dup@sentryagent.ai', agentType: 'screener', version: '1.0', capabilities: [], owner: 'team', deploymentEnv: 'development' };
+
+      await request(app).post('/api/v1/agents').set('Authorization', `Bearer ${token}`).send(body);
+      const res = await request(app).post('/api/v1/agents').set('Authorization', `Bearer ${token}`).send(body);
+
+      expect(res.status).toBe(409);
+      expect(res.body.code).toBe('AGENT_ALREADY_EXISTS');
+    });
+  });
+});
+```
+
+### Conventions explained:
+
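+1. **Set `process.env` before importing the app.** The app reads env vars at import
+   time (`getPool()`, JWT keys), so setting them after the import has no effect. The
+   ordering works assuming ts-jest's default CommonJS output, where imports run in
+   source order.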
+2. **`afterEach` cleanup.** Delete all rows after each test so tests are independent.
+   Always delete in child-to-parent order (audit_events → credentials → agents)
+   to respect foreign key constraints.
+3. **`afterAll` close connections.** Always close the pool and Redis client at the end
+   of the suite. Jest will hang if connections remain open.
+4. **Test both success and failure status codes.** Every endpoint test must include
+   an unauthenticated request (401) and an invalid request (400).
+5. **Verify response body shape.** Check `res.body.code` for error responses to
+   verify the correct error type, not just the status code.
+6. **Use `makeToken()` for test tokens.** A helper function keeps token creation
+   consistent across all integration test files.
+
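The child-to-parent rule in convention 2 can be derived mechanically from the foreign keys. This is a minimal sketch, assuming `credentials` and `audit_events` both reference `agents` (the actual schema is not shown in this document). The `deletionOrder` helper is hypothetical:

```typescript
// Maps each table to the tables it references through foreign keys.
type ForeignKeys = Record<string, string[]>;

// Returns tables ordered so that every table is cleared before the tables it references.
function deletionOrder(foreignKeys: ForeignKeys): string[] {
  const remaining = new Set(Object.keys(foreignKeys));
  const order: string[] = [];

  while (remaining.size > 0) {
    const candidates = Array.from(remaining);
    // A table is safe to clear once no other remaining table still references it.
    const safe = candidates.filter((table) =>
      candidates.every(
        (other) => other === table || !foreignKeys[other].includes(table),
      ),
    );
    if (safe.length === 0) {
      throw new Error('Circular foreign key references; cannot order deletes');
    }
    for (const table of safe) {
      order.push(table);
      remaining.delete(table);
    }
  }
  return order;
}

// Assumed relationships, for illustration only.
const order = deletionOrder({
  audit_events: ['agents'],
  credentials: ['agents'],
  agents: [],
});
console.log(order); // [ 'audit_events', 'credentials', 'agents' ]
```

When a new table with a foreign key is added, its `DELETE` belongs before the table it references in `afterEach`.
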
+---
+
+## 10.7 OWASP Top 10 Security Testing Reference
+
+These are the security concerns most relevant to an identity provider. For each,
+here is what AgentIdP does to mitigate the risk and how to test it.
+
+| OWASP Category | Relevant risk | Mitigation | Test approach |
+|---------------|--------------|-----------|---------------|
+| **A01 Broken Access Control** | Agent A accesses agent B's credentials | `req.user.sub !== agentId` check in all credential endpoints | Test: send credential request with a token for agent A but agentId for agent B in the path — expect 403 |
+| **A02 Cryptographic Failures** | Weak credential secrets or JWT algorithm | `sk_live_<64 hex>` = 256-bit entropy; RS256 signing; bcrypt 10 rounds | Test: verify generated secrets are 72 chars; verify JWT header shows `alg: RS256` |
+| **A03 Injection** | SQL injection via input fields | Parameterised queries (`$1, $2, ...`) in all repositories | Test: send `'; DROP TABLE agents; --` as `owner` field — expect 400 from Joi validation |
+| **A05 Security Misconfiguration** | Server leaking stack traces | `errorHandler` returns generic 500 for unknown errors | Test: trigger an unexpected error (mock a repository to throw `new Error()`) — verify response body does not contain stack trace |
+| **A06 Vulnerable Components** | Outdated dependencies with CVEs | Regular `npm audit` | Run: `npm audit` in CI; fail on high/critical findings |
+| **A07 Auth Failures** | Timing attack on credential verification | `crypto.timingSafeEqual` in `VaultClient.verifySecret()`; bcrypt inherently timing-safe | Test: time repeated failed verifications using wrong secrets that share progressively longer prefixes with the real secret. Response time should not grow with the shared prefix length |
+| **A08 Integrity Failures** | Forged JWT tokens | RS256 verification rejects tokens signed with wrong key | Test: create a token signed with a different private key — expect 401 |
+| **A09 Logging Failures** | Auth failures not logged | `auth.failed` audit events written for every authentication failure | Test: attempt token issuance with wrong secret, then verify the `audit_events` table contains an `auth.failed` row |
+| **A10 SSRF** | Not applicable to current API surface | No outbound HTTP from user-supplied URLs | N/A — no URL-accepting fields in current API |
+
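One detail worth knowing when testing the A07 row: Node's `crypto.timingSafeEqual` throws if its two inputs differ in byte length, so callers usually normalise lengths first. The sketch below shows one common approach, hashing both values before comparing. The `safeEqual` helper is hypothetical and is not necessarily how `VaultClient.verifySecret()` is implemented:

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// SHA-256 digests are always 32 bytes, so the comparison never sees a length mismatch.
function sha256(value: string) {
  return createHash('sha256').update(value, 'utf8').digest();
}

// Hypothetical helper: constant-time equality for strings of any length.
function safeEqual(candidate: string, expected: string): boolean {
  return timingSafeEqual(sha256(candidate), sha256(expected));
}

console.log(safeEqual('sk_live_abc', 'sk_live_abc')); // true
console.log(safeEqual('sk_live_abc', 'sk_live_abcdef')); // false, and no exception
```

A unit test for the real verification path could include a wrong secret of a different length to confirm that it returns a failure rather than throwing.
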
+**JWT algorithm confusion (bonus):**
+Test that the server rejects tokens with `alg: none` or `alg: HS256`. The
+`verifyToken()` function specifies `algorithms: ['RS256']`, which causes jsonwebtoken
+to reject any token with a different algorithm header.
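
To exercise this in a test you need a forged token to send. This is a minimal sketch of building an unsigned `alg: none` token by hand. The helper names are hypothetical, and the final allow-list check only illustrates the idea; real verification stays inside `verifyToken()`:

```typescript
// Hypothetical helpers for building attack inputs in tests.
function base64UrlEncode(text: string): string {
  return btoa(text).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function base64UrlDecode(segment: string): string {
  const base64 = segment.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  return atob(padded);
}

// Builds an unsigned token: header and payload, followed by an empty signature.
function unsignedToken(payload: Record<string, unknown>): string {
  const header = base64UrlEncode(JSON.stringify({ alg: 'none', typ: 'JWT' }));
  const body = base64UrlEncode(JSON.stringify(payload));
  return `${header}.${body}.`;
}

// Reads the algorithm a token claims in its header, without verifying anything.
function claimedAlgorithm(token: string): string {
  const [headerSegment] = token.split('.');
  const header = JSON.parse(base64UrlDecode(headerSegment)) as { alg?: unknown };
  if (typeof header.alg !== 'string') throw new Error('Token header has no alg');
  return header.alg;
}

const forged = unsignedToken({ sub: 'attacker', scope: 'agents:write' });
console.log(claimedAlgorithm(forged)); // none
console.log(['RS256'].includes(claimedAlgorithm(forged))); // false
```

An integration test could send `forged` as the Bearer token and expect a 401, mirroring the wrong-key test in the A08 row.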