sentryagent-idp/openspec/changes/phase-4-developer-growth/tasks.md
SentryAgent.ai Developer 1b682c22b2 feat(phase-4): WS1 — Production Hardening (Redis rate limiting, DB pool, health endpoint, k6)
Rate limiting:
- Replace in-memory express-rate-limit with ioredis + rate-limiter-flexible (sliding window)
- Graceful fallback to RateLimiterMemory when Redis unreachable
- RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS env var config
- Retry-After header on 429 responses
- agentidp_rate_limit_hits_total Prometheus counter

Database pool:
- Explicit pg.Pool config via DB_POOL_MAX/MIN/IDLE_TIMEOUT_MS/CONNECTION_TIMEOUT_MS
- Defaults: max=20, min=2, idle=30s, conn timeout=5s
- agentidp_db_pool_active_connections + agentidp_db_pool_waiting_requests gauges

Health endpoint:
- GET /health/detailed — per-service status (database, Redis, Vault, OPA)
- healthy / degraded (>1000ms) / unreachable classification
- HTTP 200 (all healthy) / 207 (any degraded) / 503 (any unreachable)

Load tests:
- tests/load/ with k6 scenarios for agent registration (100 VUs), token issuance (1000 VUs), credential rotation (50 VUs)
- npm run load-test script

Tests: 586 passing, zero TypeScript errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 04:20:37 +00:00

1. WS1: Production Hardening — Redis Rate Limiting

  • 1.1 Install ioredis and rate-limiter-flexible — add to package.json dependencies
  • 1.2 Create src/infrastructure/redisClient.ts — singleton ioredis client with connection error handling and REDIS_RATE_LIMIT_ENABLED env var guard
  • 1.3 Replace in-memory express-rate-limit with RateLimiterRedis from rate-limiter-flexible — sliding window, configurable via RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX_REQUESTS
  • 1.4 Implement graceful fallback to RateLimiterMemory when Redis is unreachable
  • 1.5 Add agentidp_rate_limit_hits_total Prometheus counter (labels: endpoint) — increment on HTTP 429
  • 1.6 Update rate limiter middleware to set Retry-After header on rejection
  • 1.7 Write unit tests for rate limiter middleware — Redis path, fallback path, 429 response shape
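
Tasks 1.3 to 1.6 can be sketched without external packages. The sketch assumes a hypothetical `Limiter` interface whose `consume(key)` resolves to an allowed flag plus the milliseconds until the window resets. This is loosely modelled on rate-limiter-flexible's `consume`, which instead rejects when the limit is hit. The in-memory stand-in uses a fixed window for brevity, and the `rate_limit_exceeded` error code is a placeholder. The rate-limiter-flexible README describes its limiters as fixed-window, so the "sliding window" wording in 1.3 is worth checking against the library docs. The library also documents an `insuranceLimiter` option that may cover 1.4 directly.

```typescript
// Hypothetical limiter contract; not rate-limiter-flexible's exact API.
interface ConsumeResult {
  allowed: boolean;
  msBeforeNext: number; // ms until the current window resets
}

interface Limiter {
  consume(key: string): Promise<ConsumeResult>;
}

// Fixed-window in-memory limiter standing in for RateLimiterMemory.
class MemoryLimiter implements Limiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly maxRequests: number,
    private readonly windowMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async consume(key: string): Promise<ConsumeResult> {
    const t = this.now();
    let w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      w = { start: t, count: 0 }; // open a new window for this key
      this.windows.set(key, w);
    }
    w.count += 1;
    return {
      allowed: w.count <= this.maxRequests,
      msBeforeNext: w.start + this.windowMs - t,
    };
  }
}

// Task 1.4: try the Redis-backed limiter first; on any error (for example
// Redis unreachable) fall back to the in-memory limiter.
class FallbackLimiter implements Limiter {
  constructor(private readonly primary: Limiter, private readonly fallback: Limiter) {}

  async consume(key: string): Promise<ConsumeResult> {
    try {
      return await this.primary.consume(key);
    } catch {
      return this.fallback.consume(key);
    }
  }
}

interface RejectionResponse {
  status: 429;
  headers: { "Retry-After": string };
  body: { error: string };
}

// Task 1.6: Retry-After is expressed in whole seconds, rounded up, minimum 1.
function buildRejection(result: ConsumeResult): RejectionResponse {
  const seconds = Math.max(1, Math.ceil(result.msBeforeNext / 1000));
  return {
    status: 429,
    headers: { "Retry-After": String(seconds) },
    body: { error: "rate_limit_exceeded" }, // placeholder error code
  };
}
```

In an Express middleware, a rejected result could be sent with `res.set(rejection.headers).status(429).json(rejection.body)` after incrementing `agentidp_rate_limit_hits_total`.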

2. WS1: Production Hardening — Database Pool & Health

  • 2.1 Add DB_POOL_MAX, DB_POOL_MIN, DB_POOL_IDLE_TIMEOUT_MS, DB_POOL_CONNECTION_TIMEOUT_MS env vars to .env.example and database config
  • 2.2 Configure pg.Pool with explicit pool parameters; defaults: max=20, min=2, idle=30000ms, conn timeout=5000ms
  • 2.3 Expose agentidp_db_pool_active_connections gauge and agentidp_db_pool_waiting_requests gauge — update on pool events
  • 2.4 Create GET /health/detailed route and controller — check database, Redis, Vault (if configured), OPA (if configured)
  • 2.5 Implement per-service health checks with latency measurement — healthy / degraded (>1000ms) / unreachable (timeout/error)
  • 2.6 Return HTTP 200 (all healthy), HTTP 207 (any degraded), HTTP 503 (any unreachable)
  • 2.7 Write unit tests for health controller — all healthy, degraded, unreachable scenarios
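
Tasks 2.1, 2.2, 2.5 and 2.6 reduce to a few pure functions. This sketch uses the env var names and defaults from the list; the helper names are hypothetical. The `PoolSettings` fields follow node-postgres's `max`, `idleTimeoutMillis` and `connectionTimeoutMillis` pool options. Whether `min` is honoured depends on the pg-pool version, so it is worth checking. Each service check is assumed to be wrapped in a timeout upstream, with a timeout or error reported as `ok = false`.

```typescript
// Sketch of tasks 2.1/2.2 (pool settings) and 2.5/2.6 (health status).
// Env var names and defaults are from the task list; helper names are hypothetical.

interface PoolSettings {
  max: number;
  min: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

type Env = Record<string, string | undefined>;

// Reads a non-negative integer env var, falling back to a default when unset.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function loadPoolSettings(env: Env): PoolSettings {
  return {
    max: intFromEnv(env, "DB_POOL_MAX", 20),
    min: intFromEnv(env, "DB_POOL_MIN", 2),
    idleTimeoutMillis: intFromEnv(env, "DB_POOL_IDLE_TIMEOUT_MS", 30_000),
    connectionTimeoutMillis: intFromEnv(env, "DB_POOL_CONNECTION_TIMEOUT_MS", 5_000),
  };
}

type ServiceStatus = "healthy" | "degraded" | "unreachable";

const DEGRADED_THRESHOLD_MS = 1000;

// Error or timeout => unreachable; a successful check slower than 1000 ms => degraded.
function classify(ok: boolean, latencyMs: number): ServiceStatus {
  if (!ok) return "unreachable";
  return latencyMs > DEGRADED_THRESHOLD_MS ? "degraded" : "healthy";
}

// The worst individual status decides the response code.
function overallHttpStatus(statuses: ServiceStatus[]): 200 | 207 | 503 {
  if (statuses.includes("unreachable")) return 503;
  if (statuses.includes("degraded")) return 207;
  return 200;
}
```

The database config would then pass `loadPoolSettings(process.env)` into the pool constructor, failing fast at startup on a malformed value instead of silently using a default.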

3. WS1: Production Hardening — Load Tests

  • 3.1 Install k6 and create tests/load/ directory with README.md explaining how to run tests
  • 3.2 Write tests/load/agent-registration.js — 100 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
  • 3.3 Write tests/load/token-issuance.js — 1000 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
  • 3.4 Write tests/load/credential-rotation.js — 50 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
  • 3.5 Add npm run load-test script to package.json running all three k6 scenarios sequentially
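
The three scenarios in 3.2 to 3.4 differ only in VU count, so their k6 options can come from one helper. `vus`, `duration` and `thresholds` are standard k6 options, and `http_req_duration` and `http_req_failed` are k6 built-in metrics. The helper names are hypothetical, and each scenario file would still need its own `export const options = ...` plus a default function that makes the HTTP calls.

```typescript
// Shared options shape for the three k6 scenarios (tasks 3.2 to 3.4).
interface K6Options {
  vus: number;
  duration: string;
  thresholds: Record<string, string[]>;
}

// VU counts per scenario, taken from the task list.
const SCENARIO_VUS = {
  "agent-registration": 100,
  "token-issuance": 1000,
  "credential-rotation": 50,
} as const;

type Scenario = keyof typeof SCENARIO_VUS;

function k6Options(scenario: Scenario): K6Options {
  return {
    vus: SCENARIO_VUS[scenario],
    duration: "60s",
    thresholds: {
      http_req_duration: ["p(95)<500"], // p95 latency under 500 ms
      http_req_failed: ["rate<0.01"], // error rate under 1%
    },
  };
}

// Command string for `npm run load-test` (task 3.5): run scenarios in order.
function loadTestCommand(): string {
  return (Object.keys(SCENARIO_VUS) as Scenario[])
    .map((name) => `k6 run tests/load/${name}.js`)
    .join(" && ");
}
```

Chaining with `&&` is a design choice: k6 exits non-zero when a threshold fails, so later scenarios are skipped. Joining with `;` instead would always run all three.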

4. WS2: Developer Portal — Setup & Core Pages

  • 4.1 Scaffold portal/ as a standalone Next.js 14 app with Tailwind CSS — npx create-next-app@14 portal --typescript --tailwind
  • 4.2 Add NEXT_PUBLIC_API_URL env var support — create portal/.env.example
  • 4.3 Create portal home page (portal/app/page.tsx) — hero, product description, CTA to /get-started
  • 4.4 Create /pricing page with free tier limits table (10 agents, 1,000 calls/day) and paid tier CTA
  • 4.5 Create /sdks page listing all 4 SDKs with installation commands and minimal code examples
  • 4.6 Create shared nav component with links to: Home, API Explorer, Get Started, SDKs, Pricing
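
The shared nav in 4.6 is mostly data. Below is a sketch of the link list and an active-link helper. The routes come from 4.3 to 4.6, while mapping Home to `/` and the `isActive` helper are assumptions. In an App Router component, the current path would typically come from `usePathname()` in `next/navigation`.

```typescript
interface NavLink {
  label: string;
  href: string;
}

// Link order follows task 4.6; "/" for Home is an assumption.
const NAV_LINKS: readonly NavLink[] = [
  { label: "Home", href: "/" },
  { label: "API Explorer", href: "/api-explorer" },
  { label: "Get Started", href: "/get-started" },
  { label: "SDKs", href: "/sdks" },
  { label: "Pricing", href: "/pricing" },
];

// "/" only matches exactly; other links also match their nested paths.
function isActive(pathname: string, href: string): boolean {
  if (href === "/") return pathname === "/";
  return pathname === href || pathname.startsWith(href + "/");
}
```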

5. WS2: Developer Portal — API Explorer & Onboarding Wizard

  • 5.1 Install swagger-ui-react in portal/ — add to portal package.json
  • 5.2 Create /api-explorer page embedding Swagger UI loaded from NEXT_PUBLIC_API_URL/openapi.json
  • 5.3 Configure Swagger UI with persistAuthorization: true and Bearer token auth scheme
  • 5.4 Create /get-started wizard — Step 1: account setup instructions
  • 5.5 Create wizard Step 2: agent name input → calls POST /agents via API → displays agent ID
  • 5.6 Create wizard Step 3: generate credentials → calls credentials endpoint → displays client ID/secret with copy buttons
  • 5.7 Create wizard Step 4: SDK selection → displays ready-to-run code snippet for chosen SDK (Node.js / Python / Go / Java)
  • 5.8 Wizard state management using React useState — no external state library needed
  • 5.9 Build portal: npm run build in portal/ passes with no build or TypeScript errors
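
Keeping the wizard logic in pure functions (5.8) makes it testable outside React. In this sketch the state field names are hypothetical. `advance` only moves forward once the current step's API result is present, and `openApiSpecUrl` builds the spec URL for 5.2 from `NEXT_PUBLIC_API_URL`.

```typescript
type Sdk = "Node.js" | "Python" | "Go" | "Java";

// Hypothetical wizard state shape.
interface WizardState {
  step: 1 | 2 | 3 | 4;
  agentName: string;
  agentId?: string; // set after POST /agents succeeds (step 2)
  clientId?: string; // set after credentials are generated (step 3)
  clientSecret?: string;
  sdk?: Sdk; // chosen in step 4
}

const initialWizardState: WizardState = { step: 1, agentName: "" };

// A step may only advance once the data it produces is present.
function canAdvance(s: WizardState): boolean {
  if (s.step === 1) return true; // instructions only
  if (s.step === 2) return s.agentId !== undefined;
  if (s.step === 3) return s.clientId !== undefined && s.clientSecret !== undefined;
  return false; // step 4 is the last step
}

function advance(s: WizardState): WizardState {
  if (!canAdvance(s)) return s;
  return { ...s, step: (s.step + 1) as WizardState["step"] };
}

// Task 5.2: spec URL for Swagger UI, tolerant of a trailing slash.
function openApiSpecUrl(apiUrl: string): string {
  return `${apiUrl.replace(/\/+$/, "")}/openapi.json`;
}
```

Because `advance` takes the previous state, a component holding `useState(initialWizardState)` can call `setState(advance)` as a functional update.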

6. WS3: CLI Tool — Setup & Configuration

  • 6.1 Scaffold cli/ directory with package.json (name: sentryagent, bin: sentryagent) — TypeScript with commander and chalk
  • 6.2 Create cli/src/config.ts — read/write ~/.sentryagent/config.json with apiUrl, clientId, clientSecret
  • 6.3 Implement sentryagent configure command — prompts for API URL, client ID, client secret using readline — writes to config file
  • 6.4 Implement config validation helper — fail with "Not configured. Run sentryagent configure first." if config missing
  • 6.5 Implement sentryagent --version outputting version from package.json
  • 6.6 Implement sentryagent --help showing all available commands
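
The validation helper in 6.4 can be separated from file I/O, which keeps it testable. This sketch takes the raw contents of `~/.sentryagent/config.json`, or `null` when the file is missing. Treating malformed JSON or an empty field as "not configured" is an assumption. The hand-rolled `configPath` only keeps the example free of Node imports; `path.join` with `os.homedir()` is the usual choice.

```typescript
interface CliConfig {
  apiUrl: string;
  clientId: string;
  clientSecret: string;
}

// Message text taken from task 6.4.
const NOT_CONFIGURED = "Not configured. Run sentryagent configure first.";

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// `raw` is the file contents, or null when the config file does not exist.
function parseConfig(raw: string | null): CliConfig {
  if (raw === null) throw new Error(NOT_CONFIGURED);
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error(NOT_CONFIGURED); // malformed file treated as missing
  }
  if (typeof data !== "object" || data === null) throw new Error(NOT_CONFIGURED);
  const { apiUrl, clientId, clientSecret } = data as Record<string, unknown>;
  if (!isNonEmptyString(apiUrl) || !isNonEmptyString(clientId) || !isNonEmptyString(clientSecret)) {
    throw new Error(NOT_CONFIGURED);
  }
  return { apiUrl, clientId, clientSecret };
}

// Hypothetical helper; joins paths by hand to avoid Node imports.
function configPath(homeDir: string): string {
  return `${homeDir.replace(/\/+$/, "")}/.sentryagent/config.json`;
}
```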

7. WS3: CLI Tool — Agent Commands

  • 7.1 Implement sentryagent register-agent --name <name> [--description <desc>] — calls POST /agents, outputs agent ID
  • 7.2 Implement sentryagent list-agents — calls GET /agents, outputs formatted table with chalk
  • 7.3 Implement sentryagent issue-token --agent-id <id> — calls POST /oauth2/token, outputs access token and expiry
  • 7.4 Implement sentryagent rotate-credentials --agent-id <id> — prompts for confirmation, calls rotate endpoint, outputs new secret
  • 7.5 Implement sentryagent tail-audit-log [--agent-id <id>] — polls GET /audit/logs every 5s, streams new events to stdout, runs until Ctrl+C
  • 7.6 Implement sentryagent completion bash and sentryagent completion zsh — output shell completion scripts
  • 7.7 Write cli/README.md — installation, configuration, all commands with examples, shell completion setup
  • 7.8 Build CLI — npm run build in cli/ passes; node dist/index.js --help works
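
The core of `tail-audit-log` (7.5) is deduplicating events across polls. This sketch assumes each audit event has a stable `id` and an ISO-8601 `timestamp` (both hypothetical field names). It takes the fetch function as a parameter, so the 5-second `setInterval` loop and Ctrl+C handling stay outside it.

```typescript
// Hypothetical audit event shape.
interface AuditEvent {
  id: string;
  timestamp: string; // ISO-8601, same format and timezone for all events
  action: string;
}

type FetchPage = (agentId?: string) => Promise<AuditEvent[]>;

// Returns a poll function that yields only events not returned before.
function createAuditTailer(fetchPage: FetchPage, agentId?: string) {
  const seen = new Set<string>();
  return async function poll(): Promise<AuditEvent[]> {
    const events = await fetchPage(agentId);
    const fresh = events.filter((e) => !seen.has(e.id));
    fresh.forEach((e) => seen.add(e.id));
    // Oldest first, whatever order the API used; ISO strings sort lexically.
    return fresh.sort((a, b) => a.timestamp.localeCompare(b.timestamp));
  };
}
```

The `seen` set grows without bound. A long-running tail might instead pass a `since` cursor to GET /audit/logs, if the API supports one.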

8. WS4: Agent Marketplace

  • 8.1 Add is_public boolean column (default false) to agents table — create migration 006_add_agent_marketplace.sql
  • 8.2 Update PATCH /agents/:id to accept isPublic field — update AgentService and AgentController
  • 8.3 Create MarketplaceService with listPublicAgents(filters, pagination) and getPublicAgent(agentId) methods
  • 8.4 Create GET /marketplace/agents endpoint — unauthenticated, paginated, supports ?q=, ?capability=, ?publisher= filters
  • 8.5 Create GET /marketplace/agents/:agentId endpoint — unauthenticated, returns agent with DID document and agent card
  • 8.6 Add agentidp_tenant_api_calls_total Prometheus counter (label: tenant_id) — increment on authenticated requests
  • 8.7 Add MARKETPLACE_ENABLED feature flag — return 404 on all marketplace routes when disabled
  • 8.8 Write unit tests for MarketplaceService — list, filter, get, public/private visibility
  • 8.9 Update OpenAPI spec to document /marketplace/agents endpoints
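
An in-memory version of `listPublicAgents` makes the filter and visibility rules of 8.3, 8.4 and 8.8 concrete; the real service would push them into SQL on the `is_public` column. The `Agent` fields, the case-insensitive `?q=` match and the default page size of 20 are assumptions. So is treating any `MARKETPLACE_ENABLED` value other than `"true"` as disabled.

```typescript
// Hypothetical agent shape for the sketch.
interface Agent {
  agentId: string;
  name: string;
  description: string;
  publisher: string;
  capabilities: string[];
  isPublic: boolean;
}

interface MarketplaceFilters {
  q?: string; // ?q= free-text match on name or description
  capability?: string; // ?capability=
  publisher?: string; // ?publisher=
}

interface Page<T> {
  items: T[];
  total: number;
  page: number;
  pageSize: number;
}

function listPublicAgents(agents: Agent[], filters: MarketplaceFilters, page = 1, pageSize = 20): Page<Agent> {
  const q = filters.q?.toLowerCase();
  const matches = agents.filter(
    (a) =>
      a.isPublic && // private agents are never listed
      (!q || a.name.toLowerCase().includes(q) || a.description.toLowerCase().includes(q)) &&
      (!filters.capability || a.capabilities.includes(filters.capability)) &&
      (!filters.publisher || a.publisher === filters.publisher),
  );
  const start = (page - 1) * pageSize;
  return { items: matches.slice(start, start + pageSize), total: matches.length, page, pageSize };
}

// Task 8.7: when this returns false, every marketplace route answers 404.
function marketplaceEnabled(env: Record<string, string | undefined>): boolean {
  return env.MARKETPLACE_ENABLED === "true";
}
```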

9. WS5: GitHub Actions

  • 9.1 Create .github/actions/register-agent/action.yml — inputs: api-url, agent-name, agent-description; outputs: agent-id
  • 9.2 Implement register-agent Action script (action.js) — exchange GitHub OIDC token via POST /oidc/token, then call POST /agents
  • 9.3 Implement OIDC token exchange error handling in register-agent — clear error message with trust policy setup link
  • 9.4 Create .github/actions/issue-token/action.yml — inputs: api-url, agent-id; outputs: access-token, expires-at
  • 9.5 Implement issue-token Action script — exchange GitHub OIDC token, call POST /oauth2/token, mask token with core.setSecret()
  • 9.6 Create POST /oidc/trust-policies endpoint — accepts provider, repository, branch, agentId — stores trust policy
  • 9.7 Enforce trust policy on GitHub OIDC token exchange — reject tokens from repos not matching a registered policy with HTTP 403
  • 9.8 Write register-agent/README.md — purpose, OIDC trust policy setup, inputs, outputs, example workflow
  • 9.9 Write issue-token/README.md — same structure as register-agent README
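
The check in 9.7 compares claims from an already-verified GitHub OIDC token against stored policies. `repository` (for example `octo-org/octo-repo`) and `ref` (for example `refs/heads/main`) are real GitHub Actions OIDC claims. The `TrustPolicy` shape mirrors the fields of 9.6. Everything else is an assumption, including skipping the agent match when register-agent has no agent ID yet. Token signature, issuer and audience validation are out of scope here.

```typescript
// Subset of GitHub Actions OIDC claims used by the check.
interface GitHubOidcClaims {
  repository: string; // e.g. "octo-org/octo-repo"
  ref: string; // e.g. "refs/heads/main"
}

// Mirrors the fields accepted by POST /oidc/trust-policies (task 9.6).
interface TrustPolicy {
  provider: "github";
  repository: string;
  branch: string; // bare branch name, e.g. "main"
  agentId: string;
}

type TrustDecision =
  | { allowed: true; policy: TrustPolicy }
  | { allowed: false; status: 403; reason: string };

function evaluateTrust(
  claims: GitHubOidcClaims,
  policies: TrustPolicy[],
  agentId?: string, // omitted by register-agent, which has no agent yet
): TrustDecision {
  const match = policies.find(
    (p) =>
      p.provider === "github" &&
      // GitHub owner/repo names are case-insensitive, so compare lowercased
      p.repository.toLowerCase() === claims.repository.toLowerCase() &&
      claims.ref === `refs/heads/${p.branch}` &&
      (agentId === undefined || p.agentId === agentId),
  );
  if (match === undefined) {
    return { allowed: false, status: 403, reason: `No trust policy matches ${claims.repository} on ${claims.ref}` };
  }
  return { allowed: true, policy: match };
}
```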

10. WS6: Billing & Usage Metering

  • 10.1 Create migration 007_add_billing.sql with a tenant_subscriptions table (tenant_id, status, stripe_customer_id, stripe_subscription_id, current_period_end) and a usage_events table (tenant_id, date, metric_type, count)
  • 10.2 Install stripe npm package — add to package.json
  • 10.3 Create UsageMeteringMiddleware — increments in-memory per-tenant counters on every authenticated request; flushes to usage_events every 60s
  • 10.4 Create UsageService with getDailyUsage(tenantId, date) and getActiveAgentCount(tenantId) methods
  • 10.5 Create FreeTierEnforcementMiddleware — checks usage cache (Redis, 60s TTL) before agent creation and API calls; rejects with HTTP 429 when limit exceeded; skips when BILLING_ENABLED=false
  • 10.6 Add agentidp_billing_limit_rejections_total Prometheus counter (labels: tenant_id, limit_type)
  • 10.7 Create BillingService with createCheckoutSession(tenantId), handleWebhookEvent(event), getSubscriptionStatus(tenantId) methods
  • 10.8 Create POST /billing/checkout endpoint — creates Stripe Checkout session, returns checkoutUrl
  • 10.9 Create POST /billing/webhook endpoint — verifies Stripe signature, processes subscription events, updates tenant_subscriptions
  • 10.10 Create GET /billing/usage endpoint (authenticated) — returns current period usage summary for tenant
  • 10.11 Add BILLING_ENABLED env var — disable enforcement and Stripe processing when false; document in .env.example
  • 10.12 Write unit tests for UsageService, BillingService, FreeTierEnforcementMiddleware — free tier block, paid tier pass-through, webhook processing
  • 10.13 Update web dashboard — add "Usage" tab to navigation with billing status panel and usage metrics from GET /billing/usage
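
The decision inside `FreeTierEnforcementMiddleware` (10.5) and the counters behind `UsageMeteringMiddleware` (10.3) can be sketched as follows. The limits come from the pricing page in 4.4 (10 agents, 1,000 calls/day). Treating an `"active"` subscription status as paid is an assumption, as are all type and function names. Metering is kept independent of `billingEnabled`, matching the expectation in 11.8 that disabling billing must not break metering.

```typescript
type LimitType = "agents" | "api_calls";

interface TenantUsage {
  activeAgents: number;
  apiCallsToday: number; // calls already made today, excluding this one
}

type Decision = { allow: true } | { allow: false; status: 429; limitType: LimitType };

const FREE_TIER = { maxAgents: 10, maxApiCallsPerDay: 1000 };

function enforceFreeTier(
  usage: TenantUsage,
  subscriptionStatus: string | undefined, // assumed "active" for paid tenants
  creatingAgent: boolean,
  billingEnabled: boolean,
): Decision {
  if (!billingEnabled || subscriptionStatus === "active") return { allow: true };
  if (creatingAgent && usage.activeAgents >= FREE_TIER.maxAgents) {
    return { allow: false, status: 429, limitType: "agents" };
  }
  if (usage.apiCallsToday >= FREE_TIER.maxApiCallsPerDay) {
    return { allow: false, status: 429, limitType: "api_calls" };
  }
  return { allow: true };
}

// Per-tenant request counters. flush() returns a snapshot and resets, so a
// 60-second timer can write the snapshot to usage_events.
class UsageCounter {
  private counts = new Map<string, number>();

  increment(tenantId: string): void {
    this.counts.set(tenantId, (this.counts.get(tenantId) ?? 0) + 1);
  }

  flush(): Array<{ tenantId: string; count: number }> {
    const snapshot = Array.from(this.counts, ([tenantId, count]) => ({ tenantId, count }));
    this.counts = new Map();
    return snapshot;
  }
}
```

Each rejection would also increment `agentidp_billing_limit_rejections_total` with the tenant ID and `limitType` as labels.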

11. QA & Release

  • 11.1 Run full TypeScript check across all packages (tsc --noEmit) — zero errors
  • 11.2 Run all unit tests (npm test) — all pass, coverage ≥ 80%
  • 11.3 Run k6 load tests — all thresholds pass (p95 < 500ms, error rate < 1%)
  • 11.4 Verify GET /health/detailed returns correct status for all dependency states
  • 11.5 Verify marketplace endpoints are unauthenticated and return correct data
  • 11.6 Verify Stripe webhook signature rejection on invalid signature
  • 11.7 Verify free tier limit enforcement with BILLING_ENABLED=true
  • 11.8 Verify BILLING_ENABLED=false disables enforcement without breaking metering
  • 11.9 Build portal — npm run build passes in portal/
  • 11.10 Build CLI — npm run build passes in cli/; sentryagent --help works
  • 11.11 Commit all Phase 4 work on main — conventional commit message per workstream