## ADDED Requirements

### Requirement: Redis-backed distributed rate limiting replaces in-memory limiter

The system SHALL use `ioredis` + `rate-limiter-flexible` to enforce rate limits across all Express instances using a Redis sliding window algorithm. The in-memory `express-rate-limit` store SHALL be removed. Rate limit configuration SHALL be injectable via environment variables (`RATE_LIMIT_WINDOW_MS`, `RATE_LIMIT_MAX_REQUESTS`). When `REDIS_RATE_LIMIT_ENABLED=false`, the system SHALL fall back to an in-memory limiter for local development.

#### Scenario: Rate limit enforced across multiple instances

- **WHEN** two Express instances are running behind a load balancer and a client sends requests alternating between instances
- **THEN** the rate limit counter is shared across both instances via Redis and the client is rejected after the combined limit is reached

#### Scenario: Redis unavailable — graceful fallback

- **WHEN** Redis is unreachable and `REDIS_RATE_LIMIT_ENABLED=true`
- **THEN** the system SHALL log a warning and fall back to in-memory limiting rather than rejecting all requests

#### Scenario: Rate limit exceeded

- **WHEN** a client exceeds the configured request limit within the window
- **THEN** the system SHALL respond with HTTP 429 and a `Retry-After` header indicating when the window resets

### Requirement: Database connection pool is explicitly configured

The system SHALL configure the `pg` connection pool with explicit `max`, `min`, `idleTimeoutMillis`, and `connectionTimeoutMillis` parameters via environment variables (`DB_POOL_MAX`, `DB_POOL_MIN`, `DB_POOL_IDLE_TIMEOUT_MS`, `DB_POOL_CONNECTION_TIMEOUT_MS`). Defaults SHALL be: max=20, min=2, idleTimeout=30000ms, connectionTimeout=5000ms.
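The env-to-pool mapping above can be centralized in one small helper. A minimal sketch, assuming the documented variable names and defaults; the helper name `poolConfigFromEnv` is illustrative, not from the codebase, and the resulting object is what `pg`'s `Pool` constructor accepts:

```javascript
'use strict';

// Parse an integer env var, falling back to the requirement's default
// when the variable is unset or not a number.
function intFromEnv(env, key, fallback) {
  const raw = env[key];
  const n = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(n) ? fallback : n;
}

// Maps the documented environment variables onto pg pool options,
// applying the defaults from the requirement:
// max=20, min=2, idleTimeout=30000ms, connectionTimeout=5000ms.
function poolConfigFromEnv(env = process.env) {
  return {
    max: intFromEnv(env, 'DB_POOL_MAX', 20),
    min: intFromEnv(env, 'DB_POOL_MIN', 2),
    idleTimeoutMillis: intFromEnv(env, 'DB_POOL_IDLE_TIMEOUT_MS', 30000),
    connectionTimeoutMillis: intFromEnv(env, 'DB_POOL_CONNECTION_TIMEOUT_MS', 5000),
  };
}

// Usage (requires the `pg` package):
//   const { Pool } = require('pg');
//   const pool = new Pool(poolConfigFromEnv());
```

Keeping the parsing in one place makes the defaults testable in isolation, without a live database.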
#### Scenario: Pool exhaustion under load

- **WHEN** all pool connections are in use and a new query is requested
- **THEN** the system SHALL queue the request and resolve it within `DB_POOL_CONNECTION_TIMEOUT_MS`, or reject with a 503 if the timeout is exceeded

#### Scenario: Idle connections are reaped

- **WHEN** a connection has been idle for longer than `DB_POOL_IDLE_TIMEOUT_MS`
- **THEN** the pool SHALL close the connection and reduce the active pool size toward `min`

### Requirement: Detailed health endpoint reports per-service status

The system SHALL expose `GET /health/detailed` returning a JSON object with an individual status for each dependency: `database`, `redis`, `vault` (if configured), and `opa` (if configured). Each service SHALL report `status` (`healthy` | `degraded` | `unreachable`), `latencyMs`, and an optional `message`. The overall response status SHALL be HTTP 200 if all services are healthy, HTTP 207 if any are degraded, and HTTP 503 if any are unreachable.

#### Scenario: All services healthy

- **WHEN** all dependencies respond within acceptable latency
- **THEN** `GET /health/detailed` returns HTTP 200 with all services reporting `status: "healthy"`

#### Scenario: Redis unreachable

- **WHEN** Redis does not respond within 2000ms
- **THEN** `GET /health/detailed` returns HTTP 503 with `redis.status: "unreachable"` and overall `status: "unhealthy"`

#### Scenario: Vault degraded

- **WHEN** Vault responds but with latency exceeding 1000ms
- **THEN** `GET /health/detailed` returns HTTP 207 with `vault.status: "degraded"` and a latency measurement

### Requirement: k6 load test suite validates production readiness

The system SHALL include a k6 load test suite at `tests/load/` covering: agent registration under load (100 virtual users, 60s), token issuance under load (1000 virtual users, 60s), and credential rotation under load (50 virtual users, 60s). Each scenario SHALL define pass/fail thresholds: p95 response time < 500ms, error rate < 1%.
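The VU counts, duration, and pass/fail thresholds can be declared directly in each script's `options` block, which is also what makes k6 exit non-zero on a breach. A sketch of the token-issuance scenario under stated assumptions: the `BASE_URL` default, the `/tokens` path, and the request payload are placeholders, not taken from the actual service.

```javascript
// tests/load/token-issuance.js (sketch; endpoint and payload are assumptions)
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 1000,        // 1000 virtual users, per the requirement
  duration: '60s',
  thresholds: {
    // A breached threshold makes the k6 process exit non-zero, failing CI.
    http_req_duration: ['p(95)<500'], // p95 response time < 500ms
    http_req_failed: ['rate<0.01'],   // error rate < 1%
  },
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  const res = http.post(`${BASE_URL}/tokens`, JSON.stringify({}), {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```

The other two scenarios would differ only in `vus` (100 for registration, 50 for rotation) and the endpoint exercised. This script runs only under the k6 runtime, e.g. `k6 run tests/load/token-issuance.js`.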
#### Scenario: Token issuance load test passes thresholds

- **WHEN** the k6 load test `token-issuance.js` runs with 1000 virtual users for 60 seconds
- **THEN** p95 response time SHALL be below 500ms and error rate SHALL be below 1%

#### Scenario: Load test threshold failure surfaces clearly

- **WHEN** a k6 threshold is breached during the load test run
- **THEN** the k6 process SHALL exit with a non-zero exit code, making CI failure explicit
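Returning to the health endpoint requirement above, the 200/207/503 mapping reduces to a pure aggregation over per-service statuses, which keeps it unit-testable without live dependencies. A minimal sketch; the function and field names are illustrative:

```javascript
'use strict';

// Aggregates per-service health into the overall HTTP status and body
// described by the requirement: 200 if all healthy, 207 if any degraded,
// 503 if any unreachable ("unreachable" takes precedence over "degraded").
function aggregateHealth(services) {
  const statuses = Object.values(services).map((s) => s.status);
  let httpStatus = 200;
  let overall = 'healthy';
  if (statuses.includes('degraded')) {
    httpStatus = 207;
    overall = 'degraded';
  }
  if (statuses.includes('unreachable')) {
    httpStatus = 503;
    overall = 'unhealthy';
  }
  return { httpStatus, body: { status: overall, services } };
}

// Usage, e.g. inside the GET /health/detailed handler:
//   const { httpStatus, body } = aggregateHealth({
//     database: { status: 'healthy', latencyMs: 4 },
//     redis: { status: 'unreachable', latencyMs: 2000, message: 'timeout' },
//   });
//   res.status(httpStatus).json(body);
```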