feat(phase-4): WS1 — Production Hardening (Redis rate limiting, DB pool, health endpoint, k6)

Rate limiting:
- Replace in-memory express-rate-limit with ioredis + rate-limiter-flexible (sliding window)
- Graceful fallback to RateLimiterMemory when Redis unreachable
- RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS env var config
- Retry-After header on 429 responses
- agentidp_rate_limit_hits_total Prometheus counter
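
The env var handling and the `Retry-After` value can be sketched without any library. This is not the commit's code: the fallback values (60 s window, 100 requests) and the helper names are assumptions for illustration, since the message does not state the defaults. In rate-limiter-flexible, the remaining window time is reported as `msBeforeNext` on the rejection result, which is what a helper like `retryAfterSeconds` would be fed.

```typescript
// Illustrative sketch only; helper names and default values are hypothetical.
interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

// Assumed fallbacks: the commit message does not list the real defaults.
const ASSUMED_DEFAULTS: RateLimitConfig = { windowMs: 60_000, maxRequests: 100 };

// Parse a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function loadRateLimitConfig(env: Record<string, string | undefined>): RateLimitConfig {
  return {
    windowMs: parsePositiveInt(env.RATE_LIMIT_WINDOW_MS, ASSUMED_DEFAULTS.windowMs),
    maxRequests: parsePositiveInt(env.RATE_LIMIT_MAX_REQUESTS, ASSUMED_DEFAULTS.maxRequests),
  };
}

// Retry-After is expressed in whole seconds: round up, and never send 0.
function retryAfterSeconds(msUntilReset: number): number {
  return Math.max(1, Math.ceil(msUntilReset / 1000));
}
```

Rounding up keeps a client from retrying a fraction of a second before the window actually resets.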

Database pool:
- Explicit pg.Pool config via DB_POOL_MAX/MIN/IDLE_TIMEOUT_MS/CONNECTION_TIMEOUT_MS
- Defaults: max=20, min=2, idle=30s, conn timeout=5s
- agentidp_db_pool_active_connections + agentidp_db_pool_waiting_requests gauges
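
A minimal sketch of how these variables could map onto pool options, using the defaults listed above. The expanded variable names (`DB_POOL_MIN`, `DB_POOL_IDLE_TIMEOUT_MS`, `DB_POOL_CONNECTION_TIMEOUT_MS`) are inferred from the shorthand, and the helper names and the clamping of `min` to `max` are assumptions, not taken from the commit. The property names follow the `pg.Pool` options `max`, `idleTimeoutMillis` and `connectionTimeoutMillis`; whether `min` is honoured depends on the installed `pg` version.

```typescript
// Illustrative sketch only; loadPoolSettings and intFromEnv are hypothetical helpers.
interface PoolSettings {
  max: number;
  min: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

// Parse a non-negative integer, falling back when the value is missing or invalid.
function intFromEnv(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadPoolSettings(env: Record<string, string | undefined>): PoolSettings {
  const max = intFromEnv(env.DB_POOL_MAX, 20);
  // Assumption: a min larger than max is clamped rather than rejected.
  const min = Math.min(intFromEnv(env.DB_POOL_MIN, 2), max);
  return {
    max,
    min,
    idleTimeoutMillis: intFromEnv(env.DB_POOL_IDLE_TIMEOUT_MS, 30_000),
    connectionTimeoutMillis: intFromEnv(env.DB_POOL_CONNECTION_TIMEOUT_MS, 5_000),
  };
}
```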

Health endpoint:
- GET /health/detailed — per-service status (database, Redis, Vault, OPA)
- Each service classified as healthy, degraded (response time >1000ms), or unreachable
- HTTP 200 (all healthy) / 207 (any degraded) / 503 (any unreachable)
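
The classification and status mapping above can be sketched as two small functions. The function names are hypothetical, a `null` latency stands in for a failed or timed-out check, and the precedence of 503 over 207 when both conditions occur is an assumption that the message does not spell out.

```typescript
// Illustrative sketch only; not the commit's implementation.
type ServiceStatus = 'healthy' | 'degraded' | 'unreachable';

const DEGRADED_THRESHOLD_MS = 1000;

// latencyMs === null models a check that failed or timed out (assumption).
function classify(latencyMs: number | null): ServiceStatus {
  if (latencyMs === null) return 'unreachable';
  return latencyMs > DEGRADED_THRESHOLD_MS ? 'degraded' : 'healthy';
}

// Assumed precedence: unreachable outranks degraded, degraded outranks healthy.
function overallHttpStatus(statuses: ServiceStatus[]): 200 | 207 | 503 {
  if (statuses.includes('unreachable')) return 503;
  if (statuses.includes('degraded')) return 207;
  return 200;
}
```

For example, a database answering in 40 ms alongside a Vault check taking 1500 ms would yield 207 under these assumptions.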

Load tests:
- tests/load/ with k6 scenarios for agent registration (100 VUs), token issuance (1000 VUs), credential rotation (50 VUs)
- npm run load-test script
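
For orientation, the three scenarios could be expressed in the shape of k6's `options.scenarios` roughly as follows. Only the VU counts come from this commit; the `constant-vus` executor choice, the one-minute durations, the scenario keys and the `exec` function names are placeholders. In a real k6 script the object would be exported as `options`.

```typescript
// Illustrative sketch only; durations, keys and exec names are hypothetical.
const loadTestOptions = {
  scenarios: {
    agent_registration: {
      executor: 'constant-vus', // hold a fixed number of virtual users
      vus: 100,
      duration: '1m', // placeholder duration
      exec: 'registerAgent', // hypothetical exported scenario function
    },
    token_issuance: {
      executor: 'constant-vus',
      vus: 1000,
      duration: '1m',
      exec: 'issueToken',
    },
    credential_rotation: {
      executor: 'constant-vus',
      vus: 50,
      duration: '1m',
      exec: 'rotateCredential',
    },
  },
} as const;
```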

Tests: 586 passing, zero TypeScript errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-04-02 04:20:37 +00:00
parent b0f70b7ac4
commit 1b682c22b2
16 changed files with 1467 additions and 113 deletions


@@ -116,3 +116,34 @@ export const auditChainIntegrity = new Gauge({
  help: 'Binary gauge: 1 = most recent audit chain verification passed, 0 = failed.',
  registers: [metricsRegistry],
});

/**
 * Total number of HTTP 429 responses returned by the rate limiter.
 * Labels: endpoint (req.path at time of rejection)
 */
export const rateLimitHitsTotal = new Counter({
  name: 'agentidp_rate_limit_hits_total',
  help: 'Total number of HTTP 429 responses returned by the rate limiter.',
  labelNames: ['endpoint'] as const,
  registers: [metricsRegistry],
});

/**
 * Current number of active (checked-out) PostgreSQL pool connections.
 * Updated on pool `acquire` and `remove` events.
 */
export const dbPoolActiveConnections = new Gauge({
  name: 'agentidp_db_pool_active_connections',
  help: 'Current number of active (checked-out) PostgreSQL pool connections.',
  registers: [metricsRegistry],
});

/**
 * Current number of waiting client requests in the PostgreSQL pool queue.
 * Updated whenever the pool queue length changes.
 */
export const dbPoolWaitingRequests = new Gauge({
  name: 'agentidp_db_pool_waiting_requests',
  help: 'Current number of requests waiting for a PostgreSQL connection.',
  registers: [metricsRegistry],
});
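
The wiring between the pool and these two gauges is not part of this hunk. One way it might look, as a sketch: `trackPool`, `GaugeLike` and `PoolLike` are hypothetical names, the interfaces only mirror the `totalCount`, `idleCount` and `waitingCount` properties and the `on` method that `pg.Pool` exposes, and subscribing to `connect` and `release` in addition to the documented `acquire` and `remove` is an assumption.

```typescript
// Illustrative sketch only; structural stand-ins for pg.Pool and a prom-client Gauge.
interface GaugeLike {
  set(value: number): void;
}

interface PoolLike {
  on(event: string, listener: () => void): unknown;
  readonly totalCount: number;
  readonly idleCount: number;
  readonly waitingCount: number;
}

function trackPool(pool: PoolLike, active: GaugeLike, waiting: GaugeLike): void {
  // Re-read the pool counters and push them into both gauges.
  const sample = (): void => {
    active.set(pool.totalCount - pool.idleCount); // checked-out = total - idle
    waiting.set(pool.waitingCount);
  };
  // Assumed event set; the doc comments above mention only acquire and remove.
  for (const event of ['connect', 'acquire', 'release', 'remove']) {
    pool.on(event, sample);
  }
  sample(); // publish an initial value before any event fires
}
```

Because the counters are read at the moment an event fires, a gauge may briefly lag the pool's true state; sampling on a timer is a reasonable alternative if that matters.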