docs: engineering knowledge base for new hires
Complete docs/engineering/ suite — 12 documents covering company overview, system architecture, tech stack ADRs, codebase structure, service deep dives, annotated code walkthroughs, dev setup, engineering workflow, testing strategy, deployment/ops, SDK guide, and README index. All content verified against source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md marked complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
424
docs/engineering/09-testing.md
Normal file
@@ -0,0 +1,424 @@
# 09 — Testing Strategy

---

## 10.1 Test Types and Purposes

This codebase uses two types of tests. Understanding when to use each prevents
you from writing integration tests for things that should be unit tests (slow)
and unit tests for things that need a real database (misleading).

### Unit Tests

**Location:** `tests/unit/`

**What they test:** A single class or function in complete isolation. All
dependencies (repositories, services, external clients) are replaced with Jest mocks.

**When to use:**
- Testing service business logic (free-tier limits, status transitions, error cases)
- Testing utility functions (crypto, jwt, validators)
- Testing error hierarchy behaviour
- Any code that has conditional logic you want to test exhaustively

**What they do NOT test:**
- Whether the SQL queries are correct
- Whether the HTTP routing works
- Whether middleware chains execute in the right order

**Speed:** Milliseconds. Hundreds of unit tests should complete in under 10 seconds.

### Integration Tests

**Location:** `tests/integration/`

**What they test:** A full HTTP request through the Express application against
a real PostgreSQL database and real Redis instance.

**When to use:**
- Testing that a route is correctly wired to the right controller method
- Testing authentication and authorisation middleware in combination
- Testing database operations end-to-end (INSERT → read back → verify)
- Testing response shapes match the OpenAPI spec exactly

**What they require:**
- Running PostgreSQL (at `TEST_DATABASE_URL`, or the default set in the test file)
- Running Redis (at `TEST_REDIS_URL`, or the default set in the test file)
- No manual schema setup: each suite creates its own tables and deletes its rows after every test case

**Speed:** Seconds. Expect 2–5 seconds per integration test file.

---

## 10.2 Test Framework Stack

| Tool | Role |
|------|------|
| **Jest 29.7** | Test runner. `describe`, `it`, `expect`, `beforeEach`, `afterAll`. Also provides mocking via `jest.mock()`, `jest.fn()`, `jest.spyOn()`. |
| **ts-jest** | Transforms TypeScript test files for Jest without a separate compilation step. Configured in `jest.config.ts`. |
| **Supertest 6.3** | HTTP testing library. Used in integration tests to make real HTTP requests against the Express app without starting a server on a fixed port yourself. You pass the `Application` object directly and Supertest binds it to an ephemeral port for the request. |

**Jest configuration** (`jest.config.ts`):
```typescript
export default {
  preset: 'ts-jest',
  testEnvironment: 'node',
  roots: ['<rootDir>/tests'],
  testPathPattern: ['tests/unit', 'tests/integration'],
  collectCoverageFrom: ['src/**/*.ts', '!src/server.ts'],
};
```
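
One caveat about the listing above: Jest documents `testPathPattern` as a command-line flag, while the configuration-file options for selecting test files are `testMatch` and `testRegex`. The sketch below shows how the same intent could be expressed with `testMatch`. It is an illustration only, not the repository's actual file, and the `*.test.ts` naming pattern is an assumption based on the file names used elsewhere in this document.

```typescript
// jest.config.ts (illustrative sketch, not the repository's actual file)
const config = {
  preset: 'ts-jest',
  testEnvironment: 'node',
  roots: ['<rootDir>/tests'],
  // Assumed naming pattern: every test file ends in .test.ts
  testMatch: [
    '<rootDir>/tests/unit/**/*.test.ts',
    '<rootDir>/tests/integration/**/*.test.ts',
  ],
  collectCoverageFrom: ['src/**/*.ts', '!src/server.ts'],
};

export default config;
```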

---

## 10.3 Coverage Gates

All four coverage metrics must be at least 80% before a feature is considered complete:

| Metric | Gate | What it means |
|--------|------|---------------|
| Statements | ≥80% | Share of statements executed at least once |
| Branches | ≥80% | Share of `if`/`else`/`switch` branches taken at least once |
| Functions | ≥80% | Share of functions called at least once |
| Lines | ≥80% | Share of lines executed at least once |

**Enforcement:**

Coverage is checked in the PR process:
```bash
npm run test:unit -- --coverage
# Fails if any metric is below 80%
```

Coverage reports are output to `coverage/lcov-report/index.html` for visual inspection.

The coverage threshold configuration is in `jest.config.ts`:
```typescript
coverageThreshold: {
  global: {
    statements: 80,
    branches: 80,
    functions: 80,
    lines: 80,
  },
},
```
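
To make the gate concrete, here is a minimal sketch of the comparison that a threshold of `80` implies. The type and function names are hypothetical and are not part of Jest's API; the point is only that a metric at exactly 80% passes and anything lower fails.

```typescript
// Hypothetical types for illustration; these are not Jest's own types.
type Metric = 'statements' | 'branches' | 'functions' | 'lines';
type CoverageSummary = Record<Metric, number>; // percentages, 0 to 100

const THRESHOLDS: CoverageSummary = { statements: 80, branches: 80, functions: 80, lines: 80 };

// Returns the metrics that fall below their minimum.
// A value equal to the threshold passes.
function failingMetrics(
  actual: CoverageSummary,
  thresholds: CoverageSummary = THRESHOLDS,
): Metric[] {
  const metrics = Object.keys(thresholds) as Metric[];
  return metrics.filter((metric) => actual[metric] < thresholds[metric]);
}

console.log(failingMetrics({ statements: 92.5, branches: 79.9, functions: 80, lines: 88 }));
// -> [ 'branches' ]
```

In this sample, `functions` sits exactly on the threshold and passes, while `branches` at 79.9% fails the run.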

---

## 10.4 How to Run the Test Suite

```bash
# Run all tests (unit + integration)
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

# Run unit tests with coverage report
npm run test:unit -- --coverage
# HTML report: coverage/lcov-report/index.html

# Run a single test file
npx jest tests/unit/services/AgentService.test.ts

# Run tests matching a name pattern
npx jest --testNamePattern="should throw FreeTierLimitError"

# Run tests in watch mode (re-runs on file changes)
npx jest --watch

# Run with verbose output (shows each test name)
npx jest --verbose
```

**Integration test environment variables:**
```bash
export TEST_DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test
export TEST_REDIS_URL=redis://localhost:6379/1
npm run test:integration
```

The `/1` path segment selects Redis logical database `1`, which keeps test runs from
polluting database `0`, the default that local development uses.
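
The database index is simply the path segment of the Redis URL. As a minimal sketch, a test helper could parse it and refuse to run against database `0`. Both function names below are hypothetical and do not exist in the codebase.

```typescript
// Extracts the logical database index from a Redis URL path ("/1" -> 1).
// A URL without a path segment selects database 0, the Redis default.
function redisDbIndex(redisUrl: string): number {
  const path = new URL(redisUrl).pathname.replace(/^\//, '');
  if (path === '') return 0;
  const index = Number.parseInt(path, 10);
  if (Number.isNaN(index) || index < 0) {
    throw new Error(`Invalid Redis database index in URL: ${redisUrl}`);
  }
  return index;
}

// Hypothetical guard a test suite could call before touching Redis.
function assertNotDevDatabase(redisUrl: string): void {
  if (redisDbIndex(redisUrl) === 0) {
    throw new Error('Refusing to run tests against Redis database 0');
  }
}

console.log(redisDbIndex('redis://localhost:6379/1')); // -> 1
```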

---

## 10.5 Unit Test Writing Conventions

Unit tests follow a strict pattern. Study this example carefully: it shows every
convention in use.

**Real example from `tests/unit/services/AgentService.test.ts`:**

```typescript
/**
 * Unit tests for src/services/AgentService.ts
 */

import { AgentService } from '../../../src/services/AgentService';
import { AgentRepository } from '../../../src/repositories/AgentRepository';
import { CredentialRepository } from '../../../src/repositories/CredentialRepository';
import { AuditService } from '../../../src/services/AuditService';
import {
  AgentAlreadyExistsError,
  FreeTierLimitError,
} from '../../../src/utils/errors';
import { IAgent, ICreateAgentRequest } from '../../../src/types/index';

// Mock all dependencies: none of them execute real code
jest.mock('../../../src/repositories/AgentRepository');
jest.mock('../../../src/repositories/CredentialRepository');
jest.mock('../../../src/services/AuditService');

// Get typed mock constructors so we can call .mockResolvedValue() on them
const MockAgentRepository = AgentRepository as jest.MockedClass<typeof AgentRepository>;
const MockCredentialRepository = CredentialRepository as jest.MockedClass<typeof CredentialRepository>;
const MockAuditService = AuditService as jest.MockedClass<typeof AuditService>;

// Define a complete test fixture: reuse this instead of duplicating object literals
const MOCK_AGENT: IAgent = {
  agentId: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890',
  email: 'agent@sentryagent.ai',
  agentType: 'screener',
  version: '1.0.0',
  capabilities: ['resume:read'],
  owner: 'team-a',
  deploymentEnv: 'production',
  status: 'active',
  createdAt: new Date('2026-03-28T09:00:00Z'),
  updatedAt: new Date('2026-03-28T09:00:00Z'),
};

describe('AgentService', () => {
  let agentService: AgentService;
  let agentRepo: jest.Mocked<AgentRepository>;
  let credentialRepo: jest.Mocked<CredentialRepository>;
  let auditService: jest.Mocked<AuditService>;

  beforeEach(() => {
    // Clear all mocks before each test to prevent state leakage
    jest.clearAllMocks();
    // Create fresh mock instances for each test
    agentRepo = new MockAgentRepository({} as never) as jest.Mocked<AgentRepository>;
    credentialRepo = new MockCredentialRepository({} as never) as jest.Mocked<CredentialRepository>;
    auditService = new MockAuditService({} as never) as jest.Mocked<AuditService>;
    // Inject mocks into the system under test
    agentService = new AgentService(agentRepo, credentialRepo, auditService);
  });

  describe('registerAgent()', () => {
    const createData: ICreateAgentRequest = {
      email: 'agent@sentryagent.ai',
      agentType: 'screener',
      version: '1.0.0',
      capabilities: ['resume:read'],
      owner: 'team-a',
      deploymentEnv: 'production',
    };

    it('should create and return a new agent', async () => {
      // Arrange: set up mock return values
      agentRepo.countActive.mockResolvedValue(0);
      agentRepo.findByEmail.mockResolvedValue(null);
      agentRepo.create.mockResolvedValue(MOCK_AGENT);
      auditService.logEvent.mockResolvedValue({} as never);

      // Act: call the method under test
      const result = await agentService.registerAgent(createData, '127.0.0.1', 'test/1.0');

      // Assert: verify the result
      expect(result).toEqual(MOCK_AGENT);
      // Also verify the mock was called with the right arguments
      expect(agentRepo.create).toHaveBeenCalledWith(createData);
    });

    it('should throw FreeTierLimitError when 100 agents already registered', async () => {
      // Arrange: simulate limit reached
      agentRepo.countActive.mockResolvedValue(100);

      // Assert error: rejects.toThrow checks the error type
      await expect(agentService.registerAgent(createData, '127.0.0.1', 'test/1.0'))
        .rejects.toThrow(FreeTierLimitError);
    });

    it('should throw AgentAlreadyExistsError if email is already registered', async () => {
      agentRepo.countActive.mockResolvedValue(0);
      agentRepo.findByEmail.mockResolvedValue(MOCK_AGENT); // Simulate existing agent

      await expect(agentService.registerAgent(createData, '127.0.0.1', 'test/1.0'))
        .rejects.toThrow(AgentAlreadyExistsError);
    });
  });
});
```

### Conventions explained:

1. **One test file per source file.** `AgentService.test.ts` tests `AgentService.ts`.
2. **`jest.mock()` at the top level of the test file.** ts-jest hoists `jest.mock()` calls
   above the `import` statements, so the mocks are registered before the mocked modules
   load, even though the calls are written after the imports.
3. **`jest.clearAllMocks()` in `beforeEach`.** Prevents mock call counts from leaking between tests.
4. **AAA pattern (Arrange, Act, Assert).** Every `it` block follows this order.
5. **Test both the happy path and every error case.** A service with 3 error conditions
   needs at least 4 tests (1 success + 3 failures).
6. **Verify mock calls for side effects.** Use `.toHaveBeenCalledWith()` to verify that
   `auditService.logEvent` was called with the right arguments, not just that it was called.
7. **Use typed error assertions.** `.rejects.toThrow(FreeTierLimitError)` verifies the
   error type, not just a message string.
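
Convention 7 matters most when two errors carry similar messages. The following sketch uses hypothetical stand-ins for the classes in `src/utils/errors` (the real class shapes and messages are not shown in this document) to illustrate how a message check can pass for the wrong error while a type check does not.

```typescript
// Hypothetical stand-ins for the error classes exported by src/utils/errors.
class AppError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working when TypeScript compiles to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class FreeTierLimitError extends AppError {
  constructor() {
    super('Cannot register agent: free-tier limit reached');
  }
}

class AgentAlreadyExistsError extends AppError {
  constructor() {
    super('Cannot register agent: email already registered');
  }
}

// Suppose the code under test throws the wrong error.
const thrown: unknown = new AgentAlreadyExistsError();

// A loose message check is satisfied by either error.
const matchesMessage = thrown instanceof Error && /Cannot register agent/.test(thrown.message);
// A type check distinguishes them.
const isLimitError = thrown instanceof FreeTierLimitError;

console.log(matchesMessage, isLimitError); // -> true false
```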

---

## 10.6 Integration Test Writing Conventions

Integration tests use Supertest to make real HTTP requests against a live Express app.

**Real example from `tests/integration/agents.test.ts`:**

```typescript
/**
 * Integration tests for Agent Registry endpoints.
 */

import crypto from 'crypto';
import request from 'supertest';
import { Application } from 'express';
import { v4 as uuidv4 } from 'uuid';
import { Pool } from 'pg';

// Generate RSA keys for test tokens, done once per test module
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

// Set environment variables BEFORE importing the app
process.env['DATABASE_URL'] = process.env['TEST_DATABASE_URL'] ?? 'postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test';
process.env['REDIS_URL'] = process.env['TEST_REDIS_URL'] ?? 'redis://localhost:6379/1';
process.env['JWT_PRIVATE_KEY'] = privateKey;
process.env['JWT_PUBLIC_KEY'] = publicKey;
process.env['NODE_ENV'] = 'test';

import { createApp } from '../../src/app';
import { signToken } from '../../src/utils/jwt';
import { closePool } from '../../src/db/pool';
import { closeRedisClient } from '../../src/cache/redis';

// Helper: mint a valid test token
function makeToken(sub: string = uuidv4(), scope: string = 'agents:read agents:write'): string {
  return signToken({ sub, client_id: sub, scope, jti: uuidv4() }, privateKey);
}

describe('Agent Registry Integration Tests', () => {
  let app: Application;
  let pool: Pool;

  beforeAll(async () => {
    // Boot the real Express app
    app = await createApp();
    pool = new Pool({ connectionString: process.env['DATABASE_URL'] });

    // Create test tables (idempotent)
    await pool.query(`CREATE TABLE IF NOT EXISTS agents (...)`);
  });

  afterEach(async () => {
    // Clean up after each test; order matters (foreign key constraints)
    await pool.query('DELETE FROM audit_events');
    await pool.query('DELETE FROM credentials');
    await pool.query('DELETE FROM agents');
  });

  afterAll(async () => {
    // Close all connections so that Jest does not hang
    await pool.end();
    await closePool();
    await closeRedisClient();
  });

  describe('POST /api/v1/agents', () => {
    it('should register a new agent and return 201', async () => {
      const token = makeToken();

      const res = await request(app)
        .post('/api/v1/agents')
        .set('Authorization', `Bearer ${token}`)
        .send({
          email: 'test-agent@sentryagent.ai',
          agentType: 'screener',
          version: '1.0.0',
          capabilities: ['resume:read'],
          owner: 'test-team',
          deploymentEnv: 'development',
        });

      expect(res.status).toBe(201);
      expect(res.body.agentId).toBeDefined();
      expect(res.body.email).toBe('test-agent@sentryagent.ai');
      expect(res.body.status).toBe('active');
    });

    it('should return 401 without a token', async () => {
      const res = await request(app)
        .post('/api/v1/agents')
        .send({ email: 'test@sentryagent.ai' });

      expect(res.status).toBe(401);
    });

    it('should return 409 for duplicate email', async () => {
      const token = makeToken();
      const body = { email: 'dup@sentryagent.ai', agentType: 'screener', version: '1.0', capabilities: [], owner: 'team', deploymentEnv: 'development' };

      await request(app).post('/api/v1/agents').set('Authorization', `Bearer ${token}`).send(body);
      const res = await request(app).post('/api/v1/agents').set('Authorization', `Bearer ${token}`).send(body);

      expect(res.status).toBe(409);
      expect(res.body.code).toBe('AGENT_ALREADY_EXISTS');
    });
  });
});
```
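
The `afterEach` cleanup above hard-codes the delete order. As the schema grows, that order can be derived from the foreign keys instead. The sketch below is one way to do it; the `REFERENCES` map is an assumption made for illustration (the actual constraints are not shown in this document) and `deleteOrder` is a hypothetical helper.

```typescript
// Assumed foreign keys: each table maps to the parent tables it references.
const REFERENCES: Record<string, string[]> = {
  audit_events: ['agents'],
  credentials: ['agents'],
  agents: [],
};

// Orders tables so that every table is emptied before any table it references.
// Cycles are not handled; this sketch assumes the graph has none.
function deleteOrder(references: Record<string, string[]>): string[] {
  const ordered: string[] = [];
  const visited = new Set<string>();

  const visit = (table: string): void => {
    if (visited.has(table)) return;
    visited.add(table);
    for (const parent of references[table] ?? []) visit(parent);
    ordered.push(table); // a parent is always pushed before its children
  };

  Object.keys(references).forEach(visit);
  return ordered.reverse(); // reversed, so children come before parents
}

console.log(deleteOrder(REFERENCES));
// -> [ 'credentials', 'audit_events', 'agents' ]
```

Any order that empties `agents` last satisfies the assumed constraints, so this result is as valid as the order used in the example.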

### Conventions explained:

1. **Set `process.env` before importing the app.** The app reads env vars at import
   time (`getPool()`, JWT keys). Setting them after import does nothing. This ordering
   relies on ts-jest's default CommonJS output, where each `import` becomes a
   `require()` call that runs in source order.
2. **`afterEach` cleanup.** Delete all rows after each test so tests are independent.
   Always delete in child-to-parent order (audit_events → credentials → agents)
   to respect foreign key constraints.
3. **`afterAll` close connections.** Always close the pool and Redis client at the end
   of the suite. Jest will hang if connections remain open.
4. **Test both success and failure status codes.** Every endpoint test must include
   an unauthenticated request (401) and an invalid request (400).
5. **Verify response body shape.** Check `res.body.code` for error responses to
   verify the correct error type, not just the status code.
6. **Use `makeToken()` for test tokens.** A helper function keeps token creation
   consistent across all integration test files.
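
Convention 1 is easiest to see in isolation. The sketch below is a simplified stand-in for a module that reads its configuration once, when it is loaded. `loadConfigOnce` and both connection strings are hypothetical; the real app's loading code is not shown in this document.

```typescript
type Env = Record<string, string | undefined>;

// Stands in for a module that captures its configuration at load time.
function loadConfigOnce(env: Env): { databaseUrl: string } {
  const databaseUrl = env['DATABASE_URL'] ?? 'postgresql://localhost:5432/dev';
  return { databaseUrl };
}

const env: Env = {};

// Wrong order: the "import" happens before the variable is set.
const tooLate = loadConfigOnce(env);
env['DATABASE_URL'] = 'postgresql://localhost:5432/test';
console.log(tooLate.databaseUrl); // -> postgresql://localhost:5432/dev

// Right order: the variable is already set when the "import" happens.
const inTime = loadConfigOnce(env);
console.log(inTime.databaseUrl); // -> postgresql://localhost:5432/test
```

The first configuration object keeps the development URL even though the variable was set one line later, which is the failure mode the convention guards against.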

---

## 10.7 OWASP Top 10 Security Testing Reference

These are the security concerns most relevant to an identity provider. For each,
here is what AgentIdP does to mitigate the risk and how to test it.

| OWASP Category | Relevant risk | Mitigation | Test approach |
|---------------|--------------|-----------|---------------|
| **A01 Broken Access Control** | Agent A accesses agent B's credentials | `req.user.sub !== agentId` check in all credential endpoints | Test: send credential request with a token for agent A but agentId for agent B in the path; expect 403 |
| **A02 Cryptographic Failures** | Weak credential secrets or JWT algorithm | `sk_live_<64 hex>` = 256-bit entropy; RS256 signing; bcrypt 10 rounds | Test: verify generated secrets are 72 characters long (8-character prefix + 64 hex characters); verify JWT header shows `alg: RS256` |
| **A03 Injection** | SQL injection via input fields | Parameterised queries (`$1, $2, ...`) in all repositories | Test: send `'; DROP TABLE agents; --` as `owner` field; expect 400 from Joi validation |
| **A05 Security Misconfiguration** | Server leaking stack traces | `errorHandler` returns generic 500 for unknown errors | Test: trigger an unexpected error (mock a repository to throw `new Error()`); verify response body does not contain stack trace |
| **A06 Vulnerable Components** | Outdated dependencies with CVEs | Regular `npm audit` | Run: `npm audit` in CI; fail on high/critical findings |
| **A07 Auth Failures** | Timing attack on credential verification | `crypto.timingSafeEqual` in `VaultClient.verifySecret()`; bcrypt inherently timing-safe | Test: time repeated failed verifications using wrong secrets that share progressively longer prefixes with the real secret; response time should not grow with the shared prefix length |
| **A08 Integrity Failures** | Forged JWT tokens | RS256 verification rejects tokens signed with wrong key | Test: create a token signed with a different private key; expect 401 |
| **A09 Logging Failures** | Auth failures not logged | `auth.failed` audit events written for every authentication failure | Test: attempt token issuance with wrong secret; verify `audit_events` table contains `auth.failed` row |
| **A10 SSRF** | Not applicable to current API surface | No outbound HTTP from user-supplied URLs | N/A: no URL-accepting fields in current API |
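
The arithmetic behind the A02 row can be checked directly. This is a minimal sketch of a format check a unit test might apply to generated secrets. It assumes lowercase hex, which the table does not specify, and `isWellFormedSecret` is a hypothetical helper rather than a function from the codebase.

```typescript
const SECRET_PREFIX = 'sk_live_';
// Assumption: the 64 hex characters are lowercase.
const SECRET_PATTERN = /^sk_live_[0-9a-f]{64}$/;

function isWellFormedSecret(secret: string): boolean {
  return SECRET_PATTERN.test(secret);
}

// 64 hex characters at 4 bits each give 256 bits of entropy.
const entropyBits = 64 * 4;
// 8 prefix characters plus 64 hex characters give 72 characters in total.
const expectedLength = SECRET_PREFIX.length + 64;

// A fixed sample for illustration only; real secrets must be random.
const sample = SECRET_PREFIX + 'ab'.repeat(32);

console.log(expectedLength, entropyBits, isWellFormedSecret(sample));
// -> 72 256 true
```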

**JWT algorithm confusion (bonus):**
Test that the server rejects tokens with `alg: none` or `alg: HS256`. The
`verifyToken()` function specifies `algorithms: ['RS256']`, which causes jsonwebtoken
to reject any token with a different algorithm header.
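
For the `alg: none` case, the test needs a token that no signing helper will produce. The sketch below builds one by hand and reads the header back. `makeUnsignedToken`, `headerAlg`, and `isAllowedAlgorithm` are hypothetical helpers, and the allow-list check only mirrors the idea behind `algorithms: ['RS256']`; it is not how jsonwebtoken is implemented.

```typescript
import { Buffer } from 'node:buffer';

const ALLOWED_ALGORITHMS = ['RS256'];

function base64UrlJson(value: object): string {
  return Buffer.from(JSON.stringify(value), 'utf8').toString('base64url');
}

// Builds an unsigned token whose header claims `alg: none`.
// The trailing dot leaves the signature part empty.
function makeUnsignedToken(payload: object): string {
  return `${base64UrlJson({ alg: 'none', typ: 'JWT' })}.${base64UrlJson(payload)}.`;
}

// Reads the `alg` value from a token header without verifying anything.
function headerAlg(token: string): string {
  const encodedHeader = token.split('.')[0] ?? '';
  const header = JSON.parse(
    Buffer.from(encodedHeader, 'base64url').toString('utf8'),
  ) as { alg?: string };
  return header.alg ?? '';
}

function isAllowedAlgorithm(token: string): boolean {
  return ALLOWED_ALGORITHMS.includes(headerAlg(token));
}

const forged = makeUnsignedToken({ sub: 'agent-a', scope: 'agents:read' });
console.log(headerAlg(forged), isAllowedAlgorithm(forged)); // -> none false
```

An integration test could send `forged` as the Bearer token to any authenticated route and expect a 401 response.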