feat(phase-4): WS1 — Production Hardening (Redis rate limiting, DB pool, health endpoint, k6)

Rate limiting:
- Replace in-memory express-rate-limit with ioredis + rate-limiter-flexible (sliding window)
- Graceful fallback to RateLimiterMemory when Redis unreachable
- RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS env var config
- Retry-After header on 429 responses
- agentidp_rate_limit_hits_total Prometheus counter
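The sliding-window check and the Retry-After value could look roughly like the sketch below. It is a self-contained in-memory illustration, not the code in this commit (which relies on rate-limiter-flexible backed by Redis); `createSlidingWindowLimiter` and `consume` are hypothetical names, and the injectable clock exists only to make the behaviour easy to exercise.

```javascript
// Hypothetical in-memory sliding-window limiter (illustration only).
// windowMs / maxRequests play the role of RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS.
function createSlidingWindowLimiter(windowMs, maxRequests, now = Date.now) {
  const hits = new Map(); // key -> ascending timestamps of accepted requests

  return function consume(key) {
    const t = now();
    // Keep only the timestamps that are still inside the window.
    const recent = (hits.get(key) || []).filter((ts) => t - ts < windowMs);

    if (recent.length >= maxRequests) {
      hits.set(key, recent);
      // The oldest accepted hit leaves the window at recent[0] + windowMs.
      const msBeforeNext = recent[0] + windowMs - t;
      return {
        allowed: false,
        status: 429,
        // Retry-After is expressed in whole seconds, rounded up.
        retryAfterSeconds: Math.ceil(msBeforeNext / 1000),
      };
    }

    recent.push(t);
    hits.set(key, recent);
    return { allowed: true, remaining: maxRequests - recent.length };
  };
}
```

A middleware would call `consume(clientKey)` per request and, on a rejected result, respond with status 429 and a `Retry-After` header set to `retryAfterSeconds`.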

Database pool:
- Explicit pg.Pool config via DB_POOL_MAX/MIN/IDLE_TIMEOUT_MS/CONNECTION_TIMEOUT_MS
- Defaults: max=20, min=2, idle=30s, conn timeout=5s
- agentidp_db_pool_active_connections + agentidp_db_pool_waiting_requests gauges

Health endpoint:
- GET /health/detailed — per-service status (database, Redis, Vault, OPA)
- healthy / degraded (>1000ms) / unreachable classification
- HTTP 200 (all healthy) / 207 (any degraded) / 503 (any unreachable)
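The classification and status-code mapping above can be sketched as follows. The function names are hypothetical, and the precedence rule (an unreachable service outranks a degraded one when both occur) is an assumption; the commit message does not say which wins.

```javascript
const DEGRADED_THRESHOLD_MS = 1000;

// Classify one dependency check. `reachable` is false when the probe failed.
function classifyService(reachable, latencyMs) {
  if (!reachable) return 'unreachable';
  return latencyMs > DEGRADED_THRESHOLD_MS ? 'degraded' : 'healthy';
}

// Map per-service statuses to an HTTP status code.
// Assumption: 'unreachable' outranks 'degraded' when both occur.
function httpStatusFor(services) {
  const statuses = Object.values(services);
  if (statuses.includes('unreachable')) return 503;
  if (statuses.includes('degraded')) return 207;
  return 200;
}
```

For example, `httpStatusFor({ database: 'healthy', redis: 'degraded', vault: 'healthy', opa: 'healthy' })` yields 207 under these assumptions.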

Load tests:
- tests/load/ with k6 scenarios for agent registration (100 VUs), token issuance (1000 VUs), credential rotation (50 VUs)
- npm run load-test script

Tests: 586 passing, zero TypeScript errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-04-02 04:20:37 +00:00
parent b0f70b7ac4
commit 1b682c22b2
16 changed files with 1467 additions and 113 deletions

tests/load/README.md Normal file

@@ -0,0 +1,87 @@
# Load Tests — SentryAgent.ai AgentIdP
Load tests are written for [k6](https://k6.io/) and cover the three most
performance-critical API flows.
## Prerequisites
Install k6 on your machine (one-time):
```bash
# macOS
brew install k6
# Ubuntu / Debian
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
| sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6
# Windows (Chocolatey)
choco install k6
```
## Environment Variables
Each script reads the following environment variables:

| Variable          | Default                        | Description                          |
|-------------------|--------------------------------|--------------------------------------|
| `BASE_URL`        | `http://localhost:3000`        | AgentIdP base URL                    |
| `CLIENT_ID`       | *(required for token test)*    | OAuth2 client_id for token issuance  |
| `CLIENT_SECRET`   | *(required for token test)*    | OAuth2 client_secret                 |
| `AGENT_ID`        | *(required for rotation test)* | Agent ID for credential rotation     |

Export them before running:
```bash
export BASE_URL=http://localhost:3000
export CLIENT_ID=your-client-id
export CLIENT_SECRET=your-client-secret
export AGENT_ID=your-agent-id
```
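Inside a k6 script, environment variables are available on the global `__ENV` object. The sketch below shows one way a script might resolve the variables from the table and fail fast when a required one is missing; `readConfig` is a hypothetical helper, not necessarily what the scripts in `tests/load/` do.

```javascript
// Hypothetical helper: resolve settings from an env-like object.
// In a k6 script, pass the global __ENV; under Node, pass process.env.
function readConfig(env, requiredVars = []) {
  const missing = requiredVars.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  return {
    baseUrl: env.BASE_URL || 'http://localhost:3000', // default from the table
    clientId: env.CLIENT_ID,
    clientSecret: env.CLIENT_SECRET,
    agentId: env.AGENT_ID,
  };
}
```

The token issuance script could then call `readConfig(__ENV, ['CLIENT_ID', 'CLIENT_SECRET'])`, and the rotation script `readConfig(__ENV, ['AGENT_ID'])`.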
## Running Individual Scenarios
```bash
# Agent Registration — 100 VUs, 60s
k6 run tests/load/agent-registration.js
# Token Issuance — 1000 VUs, 60s
k6 run tests/load/token-issuance.js
# Credential Rotation — 50 VUs, 60s
k6 run tests/load/credential-rotation.js
```
## Running All Scenarios (npm script)
```bash
npm run load-test
```
This runs all three scenarios sequentially, in the same order as the CI
pipeline.
## Pass / Fail Thresholds
All scenarios enforce these thresholds (tests FAIL if any is breached):

| Metric                  | Threshold  |
|-------------------------|------------|
| p95 response time       | < 500 ms   |
| HTTP error rate         | < 1 %      |

k6 exits with a non-zero status code when any threshold is breached, making it
safe to use in CI pipelines.
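In k6, thresholds like these are declared in the script's `options` object using the built-in `http_req_duration` and `http_req_failed` metrics. A minimal sketch, assuming each scenario uses a constant number of VUs for a fixed duration (`buildOptions` is a hypothetical helper):

```javascript
// Hypothetical helper that builds a k6 options object for one scenario.
// A k6 script would expose it with: export const options = buildOptions(100);
function buildOptions(vus, duration = '60s') {
  return {
    vus,
    duration,
    thresholds: {
      // 95th percentile of request duration must stay under 500 ms.
      http_req_duration: ['p(95)<500'],
      // Fewer than 1 % of requests may fail.
      http_req_failed: ['rate<0.01'],
    },
  };
}
```

With this shape, the three scenarios would differ only in the VU count passed in (100, 1000, and 50).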
## Results
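k6 prints a summary table to stdout on completion. To keep the raw metrics, write them to a JSON file. For an HTML report, k6 v0.49 and later can export its built-in web dashboard:
```bash
# Raw metric samples as newline-delimited JSON
k6 run --out json=results.json tests/load/agent-registration.js
# HTML report via the built-in web dashboard (k6 v0.49 or later)
K6_WEB_DASHBOARD=true K6_WEB_DASHBOARD_EXPORT=report.html k6 run tests/load/agent-registration.js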
```
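Another option is k6's `handleSummary` hook, which receives the aggregated end-of-test metrics and returns a map of output targets to content. The sketch below is an assumption about how a script could use it, not something the scripts in this directory are known to do; the file name `summary.json` is hypothetical. Note that defining `handleSummary` replaces the default end-of-test summary, so anything wanted on the console must be returned under the `stdout` key.

```javascript
// Sketch of a k6 handleSummary hook; a k6 script would export this function.
// k6 calls it once at the end of the run and writes each returned key as a file
// (the special key 'stdout' is printed to the console instead).
function handleSummary(data) {
  const duration = data.metrics.http_req_duration;
  const p95 = duration ? duration.values['p(95)'] : undefined;
  return {
    stdout: `p95 response time: ${p95} ms\n`,
    'summary.json': JSON.stringify(data, null, 2),
  };
}
```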