feat(phase-4): WS1 — Production Hardening (Redis rate limiting, DB pool, health endpoint, k6)

Rate limiting:
- Replace in-memory express-rate-limit with ioredis + rate-limiter-flexible (sliding window)
- Graceful fallback to RateLimiterMemory when Redis unreachable
- RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS env var config
- Retry-After header on 429 responses
- agentidp_rate_limit_hits_total Prometheus counter

Database pool:
- Explicit pg.Pool config via DB_POOL_MAX/MIN/IDLE_TIMEOUT_MS/CONNECTION_TIMEOUT_MS
- Defaults: max=20, min=2, idle=30s, conn timeout=5s
- agentidp_db_pool_active_connections + agentidp_db_pool_waiting_requests gauges

Health endpoint:
- GET /health/detailed: per-service status (database, Redis, Vault, OPA)
- healthy / degraded (>1000ms) / unreachable classification
- HTTP 200 (all healthy) / 207 (any degraded) / 503 (any unreachable)

Load tests:
- tests/load/ with k6 scenarios for agent registration (100 VUs), token issuance (1000 VUs), credential rotation (50 VUs)
- npm run load-test script

Tests: 586 passing, zero TypeScript errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 04:20:37 +00:00
parent b0f70b7ac4
commit 1b682c22b2
16 changed files with 1467 additions and 113 deletions
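
The health classification described in the commit message (healthy / degraded above 1000 ms / unreachable, mapped to HTTP 200 / 207 / 503) can be sketched as below. This is only an illustration, not the code from this commit: the function names `classifyService` and `overallHttpStatus` are hypothetical, and the rule that an unreachable service takes precedence over a degraded one is an assumption, since the message does not say which wins when both occur.

```javascript
// Latency above this value marks a reachable service as "degraded"
// (the 1000 ms figure is taken from the commit message).
const DEGRADED_THRESHOLD_MS = 1000;

// Hypothetical helper: classify one dependency (database, Redis, Vault, OPA).
function classifyService(reachable, latencyMs) {
  if (!reachable) return 'unreachable';
  return latencyMs > DEGRADED_THRESHOLD_MS ? 'degraded' : 'healthy';
}

// Hypothetical helper: derive the HTTP status for GET /health/detailed.
// Assumption: "unreachable" outranks "degraded" when both are present.
function overallHttpStatus(statuses) {
  if (statuses.includes('unreachable')) return 503;
  if (statuses.includes('degraded')) return 207;
  return 200;
}
```

For example, `overallHttpStatus(['healthy', 'degraded', 'healthy', 'healthy'])` yields 207 under these assumptions.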
--- a/tests/load/README.md
+++ b/tests/load/README.md
@@ -0,0 +1,87 @@
+# Load Tests — SentryAgent.ai AgentIdP
+
+Load tests are written for [k6](https://k6.io/) and cover the three most
+performance-critical API flows.
+
+## Prerequisites
+
+Install k6 on your machine (one-time):
+
+```bash
+# macOS
+brew install k6
+
+# Ubuntu / Debian
+sudo gpg -k
+sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
+  --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
+echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
+  | sudo tee /etc/apt/sources.list.d/k6.list
+sudo apt-get update && sudo apt-get install k6
+
+# Windows (Chocolatey)
+choco install k6
+```
+
+## Environment Variables
+
+Each script reads the following env vars:
+
+| Variable          | Default                        | Description                          |
+|-------------------|--------------------------------|--------------------------------------|
+| `BASE_URL`        | `http://localhost:3000`        | AgentIdP base URL                    |
+| `CLIENT_ID`       | *(required for token test)*    | OAuth2 client_id for token issuance  |
+| `CLIENT_SECRET`   | *(required for token test)*    | OAuth2 client_secret                 |
+| `AGENT_ID`        | *(required for rotation test)* | Agent ID for credential rotation     |
+
+Export them before running:
+
+```bash
+export BASE_URL=http://localhost:3000
+export CLIENT_ID=your-client-id
+export CLIENT_SECRET=your-client-secret
+export AGENT_ID=your-agent-id
+```
+
+## Running Individual Scenarios
+
+```bash
+# Agent Registration — 100 VUs, 60s
+k6 run tests/load/agent-registration.js
+
+# Token Issuance — 1000 VUs, 60s
+k6 run tests/load/token-issuance.js
+
+# Credential Rotation — 50 VUs, 60s
+k6 run tests/load/credential-rotation.js
+```
+
+## Running All Scenarios (npm script)
+
+```bash
+npm run load-test
+```
+
+This runs all three scenarios sequentially, in the same order as the CI
+pipeline.
+
+## Pass / Fail Thresholds
+
+All scenarios enforce these thresholds (tests FAIL if any is breached):
+
+| Metric                  | Threshold  |
+|-------------------------|------------|
+| p95 response time       | < 500 ms   |
+| HTTP error rate         | < 1 %      |
+
+k6 exits with a non-zero status code when any threshold is breached, making it
+safe to use in CI pipelines.
+
+## Results
+
+k6 prints a summary table to stdout on completion. For HTML reports, use the web dashboard export (recent k6 releases):
+
+```bash
+K6_WEB_DASHBOARD=true K6_WEB_DASHBOARD_EXPORT=report.html \
+  k6 run --out json=results.json tests/load/agent-registration.js
+```
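
The pass/fail thresholds listed in the README could be expressed in a k6 script roughly as follows. The scenario files themselves are not shown in this part of the diff, so this is a sketch under assumptions rather than their actual contents: it uses k6's built-in `http_req_duration` and `http_req_failed` metrics with the 100 VU / 60 s figures quoted for agent registration, and the real scripts may configure load differently (for example with stages or named scenarios).

```javascript
// Sketch of k6 options for the agent-registration scenario (100 VUs, 60s).
// Assumption: the real scripts may define load and thresholds differently.
const options = {
  vus: 100,        // concurrent virtual users
  duration: '60s', // total test duration
  thresholds: {
    // p95 response time must stay below 500 ms
    http_req_duration: ['p(95)<500'],
    // fewer than 1% of requests may fail
    http_req_failed: ['rate<0.01'],
  },
};

// In a k6 script this object would be exported instead:
//   export const options = { ... };
console.log(JSON.stringify(options.thresholds));
```

Declaring thresholds in `options` is what lets k6 exit with a non-zero status when one is breached, which is the behavior the README relies on for CI.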