sentryagent-idp/openspec/changes/phase-4-developer-growth/tasks.md at 1b682c22b2f21ec31df35fb8044246e10a7009c0

SentryAgent.ai Developer 1b682c22b2 feat(phase-4): WS1 — Production Hardening (Redis rate limiting, DB pool, health endpoint, k6)

Rate limiting:
- Replace in-memory express-rate-limit with ioredis + rate-limiter-flexible (sliding window)
- Graceful fallback to RateLimiterMemory when Redis unreachable
- RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS env var config
- Retry-After header on 429 responses
- agentidp_rate_limit_hits_total Prometheus counter

Database pool:
- Explicit pg.Pool config via DB_POOL_MAX/MIN/IDLE_TIMEOUT_MS/CONNECTION_TIMEOUT_MS
- Defaults: max=20, min=2, idle=30s, conn timeout=5s
- agentidp_db_pool_active_connections + agentidp_db_pool_waiting_requests gauges

Health endpoint:
- GET /health/detailed — per-service status (database, Redis, Vault, OPA)
- healthy / degraded (>1000ms) / unreachable classification
- HTTP 200 (all healthy) / 207 (any degraded) / 503 (any unreachable)

Load tests:
- tests/load/ with k6 scenarios for agent registration (100 VUs), token issuance (1000 VUs), credential rotation (50 VUs)
- npm run load-test script

Tests: 586 passing, zero TypeScript errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-02 04:20:37 +00:00

1. WS1: Production Hardening — Redis Rate Limiting

1.1 Install ioredis and rate-limiter-flexible — add to package.json dependencies
1.2 Create src/infrastructure/redisClient.ts — singleton ioredis client with connection error handling and REDIS_RATE_LIMIT_ENABLED env var guard
1.3 Replace in-memory express-rate-limit with RateLimiterRedis from rate-limiter-flexible — sliding window, configurable via RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX_REQUESTS
1.4 Implement graceful fallback to RateLimiterMemory when Redis is unreachable
1.5 Add agentidp_rate_limit_hits_total Prometheus counter (labels: endpoint) — increment on HTTP 429
1.6 Update rate limiter middleware to set Retry-After header on rejection
1.7 Write unit tests for rate limiter middleware — Redis path, fallback path, 429 response shape
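
The configuration and rejection behaviour in 1.3, 1.5 and 1.6 can be sketched as pure helpers, independent of the limiter backend. This is a minimal sketch, not the project's middleware: the function names, the default values (60 s window, 100 requests) and the response body shape are assumptions. It assumes the limiter reports the milliseconds until the next allowed request, which rate-limiter-flexible exposes as `msBeforeNext` on its rejection result.

```typescript
// Hypothetical helpers; names, defaults and body shape are illustrative only.

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

// Parse a positive integer from an env value, falling back to a default.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// The defaults below are assumptions; the task list does not state them.
function loadRateLimitConfig(env: Record<string, string | undefined>): RateLimitConfig {
  return {
    windowMs: parsePositiveInt(env["RATE_LIMIT_WINDOW_MS"], 60_000),
    maxRequests: parsePositiveInt(env["RATE_LIMIT_MAX_REQUESTS"], 100),
  };
}

// Retry-After is given in whole seconds; round up so clients never retry early.
function retryAfterSeconds(msBeforeNext: number): number {
  return Math.max(1, Math.ceil(msBeforeNext / 1000));
}

interface RejectionResponse {
  status: 429;
  headers: { "Retry-After": string };
  body: { error: string; retryAfterSeconds: number };
}

// Build the 429 response that the middleware would send on rejection.
function buildRejection(msBeforeNext: number): RejectionResponse {
  const seconds = retryAfterSeconds(msBeforeNext);
  return {
    status: 429,
    headers: { "Retry-After": String(seconds) },
    body: { error: "rate_limit_exceeded", retryAfterSeconds: seconds },
  };
}
```

Keeping these helpers free of Redis makes the 429 response shape in 1.7 testable without a running Redis instance, on both the Redis path and the fallback path.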

2. WS1: Production Hardening — Database Pool & Health

2.1 Add DB_POOL_MAX, DB_POOL_MIN, DB_POOL_IDLE_TIMEOUT_MS, DB_POOL_CONNECTION_TIMEOUT_MS env vars to .env.example and database config
2.2 Configure pg.Pool with explicit pool parameters; defaults: max=20, min=2, idle=30000ms, conn timeout=5000ms
2.3 Expose agentidp_db_pool_active_connections gauge and agentidp_db_pool_waiting_requests gauge — update on pool events
2.4 Create GET /health/detailed route and controller — check database, Redis, Vault (if configured), OPA (if configured)
2.5 Implement per-service health checks with latency measurement — healthy / degraded (>1000ms) / unreachable (timeout/error)
2.6 Return HTTP 200 (all healthy), HTTP 207 (any degraded), HTTP 503 (any unreachable)
2.7 Write unit tests for health controller — all healthy, degraded, unreachable scenarios
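
The classification and status-code rules in 2.5 and 2.6 reduce to two small functions. The sketch below assumes each dependency probe has already produced either a latency or a failure; the type and function names are hypothetical.

```typescript
type ServiceStatus = "healthy" | "degraded" | "unreachable";

// Result of probing one dependency: either it answered (with a latency)
// or it timed out / threw.
type ProbeOutcome = { ok: true; latencyMs: number } | { ok: false };

const DEGRADED_THRESHOLD_MS = 1000; // from task 2.5: degraded when > 1000 ms

function classify(outcome: ProbeOutcome): ServiceStatus {
  if (!outcome.ok) return "unreachable";
  return outcome.latencyMs > DEGRADED_THRESHOLD_MS ? "degraded" : "healthy";
}

// Task 2.6: 503 if any service is unreachable, else 207 if any is degraded, else 200.
function overallHttpStatus(statuses: ServiceStatus[]): 200 | 207 | 503 {
  if (statuses.includes("unreachable")) return 503;
  if (statuses.includes("degraded")) return 207;
  return 200;
}

// Example: database healthy, Redis slow, Vault and OPA not configured (omitted).
const example: Record<string, ServiceStatus> = {
  database: classify({ ok: true, latencyMs: 12 }),
  redis: classify({ ok: true, latencyMs: 1400 }),
};
const exampleStatus = overallHttpStatus(Object.values(example));
```

Unreachable takes precedence over degraded, so a response with one slow and one failing dependency is reported as 503.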

3. WS1: Production Hardening — Load Tests

3.1 Install k6 and create tests/load/ directory with README.md explaining how to run tests
3.2 Write tests/load/agent-registration.js — 100 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
3.3 Write tests/load/token-issuance.js — 1000 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
3.4 Write tests/load/credential-rotation.js — 50 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
3.5 Add npm run load-test script to package.json running all three k6 scenarios sequentially
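
All three scenarios share the same duration and thresholds and differ only in VU count, so the k6 `options` object can be produced by one helper. The sketch below is written as plain TypeScript so it runs outside k6; the helper name is hypothetical, while `http_req_duration` and `http_req_failed` are k6's built-in metric names. The task list specifies `.js` scripts, so in practice the returned object would be inlined as `export const options = ...` in each file.

```typescript
interface ScenarioOptions {
  vus: number;
  duration: string;
  thresholds: Record<string, string[]>;
}

// Shared settings from tasks 3.2-3.4: 60 s, p95 < 500 ms, error rate < 1%.
function scenarioOptions(vus: number): ScenarioOptions {
  return {
    vus,
    duration: "60s",
    thresholds: {
      http_req_duration: ["p(95)<500"], // 95th percentile latency under 500 ms
      http_req_failed: ["rate<0.01"], // fewer than 1% failed requests
    },
  };
}

const loadScenarios: Record<string, ScenarioOptions> = {
  "agent-registration": scenarioOptions(100),
  "token-issuance": scenarioOptions(1000),
  "credential-rotation": scenarioOptions(50),
};
```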

4. WS2: Developer Portal — Setup & Core Pages

4.1 Scaffold portal/ as a standalone Next.js 14 app with Tailwind CSS — npx create-next-app@14 portal --typescript --tailwind
4.2 Add NEXT_PUBLIC_API_URL env var support — create portal/.env.example
4.3 Create portal home page (portal/app/page.tsx) — hero, product description, CTA to /get-started
4.4 Create /pricing page with free tier limits table (10 agents, 1,000 calls/day) and paid tier CTA
4.5 Create /sdks page listing all 4 SDKs with installation commands and minimal code examples
4.6 Create shared nav component with links to: Home, API Explorer, Get Started, SDKs, Pricing

5. WS2: Developer Portal — API Explorer & Onboarding Wizard

5.1 Install swagger-ui-react in portal/ — add to portal package.json
5.2 Create /api-explorer page embedding Swagger UI loaded from NEXT_PUBLIC_API_URL/openapi.json
5.3 Configure Swagger UI with persistAuthorization: true and Bearer token auth scheme
5.4 Create /get-started wizard — Step 1: account setup instructions
5.5 Create wizard Step 2: agent name input → calls POST /agents via API → displays agent ID
5.6 Create wizard Step 3: generate credentials → calls credentials endpoint → displays client ID/secret with copy buttons
5.7 Create wizard Step 4: SDK selection → displays ready-to-run code snippet for chosen SDK (Node.js / Python / Go / Java)
5.8 Wizard state management using React useState — no external state library needed
5.9 Build portal/ — npm run build passes with no build or TypeScript errors
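
Because the wizard in 5.4 to 5.8 keeps its state in `useState`, the step transitions can be written as a pure function and tested without rendering. The state shape and function names below are hypothetical; the rule that each step needs the previous step's result before advancing is an assumption drawn from the step descriptions.

```typescript
type Sdk = "Node.js" | "Python" | "Go" | "Java";

interface WizardState {
  step: 1 | 2 | 3 | 4;
  agentId?: string; // set after POST /agents succeeds (step 2)
  clientId?: string; // set after credentials are generated (step 3)
  clientSecret?: string;
  sdk?: Sdk; // chosen in step 4
}

const initialWizardState: WizardState = { step: 1 };

// A step can be left only once its result is available.
function canAdvance(s: WizardState): boolean {
  switch (s.step) {
    case 1:
      return true; // instructions only
    case 2:
      return s.agentId !== undefined;
    case 3:
      return s.clientId !== undefined && s.clientSecret !== undefined;
    case 4:
      return false; // last step
  }
}

// Returns a new state object; never mutates the input.
function advance(s: WizardState): WizardState {
  if (!canAdvance(s)) return s;
  return { ...s, step: (s.step + 1) as WizardState["step"] };
}
```

In a component this would be used as `const [state, setState] = useState(initialWizardState)` with `setState(advance)` on the "Next" button.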

6. WS3: CLI Tool — Setup & Configuration

6.1 Scaffold cli/ directory with package.json (name: sentryagent, bin: sentryagent) — TypeScript with commander and chalk
6.2 Create cli/src/config.ts — read/write ~/.sentryagent/config.json with apiUrl, clientId, clientSecret
6.3 Implement sentryagent configure command — prompts for API URL, client ID, client secret using readline — writes to config file
6.4 Implement config validation helper — fail with "Not configured. Run sentryagent configure first." if config missing
6.5 Implement sentryagent --version outputting version from package.json
6.6 Implement sentryagent --help showing all available commands
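
The config file handling in 6.2 and 6.4 needs only Node's standard library. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the file mode `0o600` is an added precaution not mentioned in the tasks, and a file with missing or empty fields is treated the same as a missing file.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface CliConfig {
  apiUrl: string;
  clientId: string;
  clientSecret: string;
}

const NOT_CONFIGURED = "Not configured. Run sentryagent configure first.";

function defaultConfigPath(): string {
  return path.join(os.homedir(), ".sentryagent", "config.json");
}

// True only when all three fields are non-empty strings.
function isCliConfig(value: unknown): value is CliConfig {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return ["apiUrl", "clientId", "clientSecret"].every(
    (key) => typeof v[key] === "string" && v[key] !== "",
  );
}

function writeConfig(config: CliConfig, file: string = defaultConfigPath()): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // Assumption: restrict the file to the owner because it holds a client secret.
  fs.writeFileSync(file, JSON.stringify(config, null, 2) + "\n", { mode: 0o600 });
}

// Throws the "Not configured" error when the file is missing, unreadable,
// not valid JSON, or incomplete.
function loadConfig(file: string = defaultConfigPath()): CliConfig {
  let parsed: unknown;
  try {
    parsed = JSON.parse(fs.readFileSync(file, "utf8"));
  } catch {
    throw new Error(NOT_CONFIGURED);
  }
  if (!isCliConfig(parsed)) throw new Error(NOT_CONFIGURED);
  return parsed;
}
```

Accepting the path as a parameter lets tests point the helpers at a temporary directory instead of the real home directory.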

7. WS3: CLI Tool — Agent Commands

7.1 Implement sentryagent register-agent --name <name> [--description <desc>] — calls POST /agents, outputs agent ID
7.2 Implement sentryagent list-agents — calls GET /agents, outputs formatted table with chalk
7.3 Implement sentryagent issue-token --agent-id <id> — calls POST /oauth2/token, outputs access token and expiry
7.4 Implement sentryagent rotate-credentials --agent-id <id> — prompts for confirmation, calls rotate endpoint, outputs new secret
7.5 Implement sentryagent tail-audit-log [--agent-id <id>] — polls GET /audit/logs every 5s, streams new events to stdout, runs until Ctrl+C
7.6 Implement sentryagent completion bash and sentryagent completion zsh — output shell completion scripts
7.7 Write cli/README.md — installation, configuration, all commands with examples, shell completion setup
7.8 Build CLI — npm run build in cli/ passes; node dist/index.js --help works
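
The polling behaviour of `tail-audit-log` in 7.5 depends on printing each event once even though successive polls return overlapping pages. A sketch of that de-duplication follows. The event shape (`id`, `timestamp`, `action`) is an assumption, since the audit log response format is not given here, and `fetchEvents` is a hypothetical stand-in for the call to GET /audit/logs.

```typescript
// Assumed event shape; the real audit log schema may differ.
interface AuditEvent {
  id: string;
  timestamp: string; // assumed ISO 8601 in a single timezone, so string order is time order
  action: string;
}

// Return events not seen before, oldest first, and record them as seen.
function selectNewEvents(fetched: AuditEvent[], seen: Set<string>): AuditEvent[] {
  const fresh: AuditEvent[] = [];
  for (const event of fetched) {
    if (seen.has(event.id)) continue;
    seen.add(event.id);
    fresh.push(event);
  }
  return fresh.sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}

// Poll every intervalMs (5 s per task 7.5) and print new events.
// Returns a stop function, e.g. for a SIGINT (Ctrl+C) handler.
function startTail(
  fetchEvents: () => Promise<AuditEvent[]>,
  print: (line: string) => void,
  intervalMs = 5000,
): () => void {
  const seen = new Set<string>();
  let busy = false; // skip a tick if the previous poll is still in flight
  const tick = async (): Promise<void> => {
    if (busy) return;
    busy = true;
    try {
      for (const e of selectNewEvents(await fetchEvents(), seen)) {
        print(`${e.timestamp} ${e.action}`);
      }
    } catch (err) {
      print(`poll failed: ${String(err)}`);
    } finally {
      busy = false;
    }
  };
  void tick();
  const timer = setInterval(() => void tick(), intervalMs);
  return () => clearInterval(timer);
}
```

The `seen` set grows for as long as the command runs; a long-lived tail might instead track the newest timestamp, if the API supports filtering by time.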

8. WS4: Agent Marketplace

8.1 Add is_public boolean column (default false) to agents table — create migration 006_add_agent_marketplace.sql
8.2 Update PATCH /agents/:id to accept isPublic field — update AgentService and AgentController
8.3 Create MarketplaceService with listPublicAgents(filters, pagination) and getPublicAgent(agentId) methods
8.4 Create GET /marketplace/agents endpoint — unauthenticated, paginated, supports ?q=, ?capability=, ?publisher= filters
8.5 Create GET /marketplace/agents/:agentId endpoint — unauthenticated, returns agent with DID document and agent card
8.6 Add agentidp_tenant_api_calls_total Prometheus counter (label: tenant_id) — increment on authenticated requests
8.7 Add MARKETPLACE_ENABLED feature flag — return 404 on all marketplace routes when disabled
8.8 Write unit tests for MarketplaceService — list, filter, get, public/private visibility
8.9 Update OpenAPI spec to document /marketplace/agents endpoints
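
The visibility and filter semantics that 8.3, 8.4 and 8.8 describe can be illustrated with an in-memory version of `listPublicAgents`. This is only a sketch of the expected behaviour: the record fields other than `isPublic`, the case-insensitive name match for `?q=`, and the default page size of 20 are assumptions, and a real service would presumably push the filtering into SQL.

```typescript
// Assumed record shape; only isPublic is named in the tasks.
interface AgentRecord {
  id: string;
  name: string;
  publisher: string;
  capabilities: string[];
  isPublic: boolean;
}

interface MarketplaceFilters {
  q?: string; // assumed: case-insensitive substring match on name
  capability?: string;
  publisher?: string;
}

interface Page<T> {
  items: T[];
  total: number;
  page: number;
  pageSize: number;
}

function listPublicAgents(
  all: AgentRecord[],
  filters: MarketplaceFilters = {},
  page = 1,
  pageSize = 20,
): Page<AgentRecord> {
  const q = filters.q?.trim().toLowerCase();
  const matches = all.filter(
    (a) =>
      a.isPublic && // private agents are never listed
      (!q || a.name.toLowerCase().includes(q)) &&
      (!filters.capability || a.capabilities.includes(filters.capability)) &&
      (!filters.publisher || a.publisher === filters.publisher),
  );
  const start = (page - 1) * pageSize;
  return { items: matches.slice(start, start + pageSize), total: matches.length, page, pageSize };
}

// A private agent is reported as not found rather than forbidden.
function getPublicAgent(all: AgentRecord[], agentId: string): AgentRecord | undefined {
  return all.find((a) => a.id === agentId && a.isPublic);
}
```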

9. WS5: GitHub Actions

9.1 Create .github/actions/register-agent/action.yml — inputs: api-url, agent-name, agent-description; outputs: agent-id
9.2 Implement register-agent Action script (action.js) — exchange GitHub OIDC token via POST /oidc/token, then call POST /agents
9.3 Implement OIDC token exchange error handling in register-agent — clear error message with trust policy setup link
9.4 Create .github/actions/issue-token/action.yml — inputs: api-url, agent-id; outputs: access-token, expires-at
9.5 Implement issue-token Action script — exchange GitHub OIDC token, call POST /oauth2/token, mask token with core.setSecret()
9.6 Create POST /oidc/trust-policies endpoint — accepts provider, repository, branch, agentId — stores trust policy
9.7 Enforce trust policy on GitHub OIDC token exchange — reject tokens from repos not matching a registered policy with HTTP 403
9.8 Write register-agent/README.md — purpose, OIDC trust policy setup, inputs, outputs, example workflow
9.9 Write issue-token/README.md — same structure as register-agent README
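
The trust policy check in 9.6 and 9.7 compares claims from the GitHub OIDC token against stored policies. The sketch below assumes the token's signature, issuer and audience have already been verified elsewhere, and that policies store the provider as the string `"github"`. GitHub's OIDC tokens carry `repository` (as `owner/repo`) and `ref` (for example `refs/heads/main`); the function names are hypothetical.

```typescript
// Fields from task 9.6.
interface TrustPolicy {
  provider: string;
  repository: string; // "owner/repo"
  branch: string; // e.g. "main"
  agentId: string;
}

// Subset of GitHub OIDC claims used here; assumed to come from a verified token.
interface GithubOidcClaims {
  repository: string;
  ref: string;
}

// Extract the branch name from a ref; tags and other refs yield undefined.
function branchFromRef(ref: string): string | undefined {
  const prefix = "refs/heads/";
  return ref.startsWith(prefix) ? ref.slice(prefix.length) : undefined;
}

function findMatchingPolicy(
  claims: GithubOidcClaims,
  policies: TrustPolicy[],
): TrustPolicy | undefined {
  const branch = branchFromRef(claims.ref);
  if (branch === undefined) return undefined;
  return policies.find(
    (p) =>
      p.provider === "github" && // assumed provider identifier
      p.repository === claims.repository &&
      p.branch === branch,
  );
}

// Task 9.7: reject with HTTP 403 when no registered policy matches.
function exchangeStatus(claims: GithubOidcClaims, policies: TrustPolicy[]): 200 | 403 {
  return findMatchingPolicy(claims, policies) ? 200 : 403;
}
```

Matching is exact on both repository and branch, so a workflow run on a tag or on a different branch is rejected.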

10. WS6: Billing & Usage Metering

10.1 Create migration 007_add_billing.sql — tenant_subscriptions table (tenant_id, status, stripe_customer_id, stripe_subscription_id, current_period_end) and usage_events table (tenant_id, date, metric_type, count)
10.2 Install stripe npm package — add to package.json
10.3 Create UsageMeteringMiddleware — increments in-memory per-tenant counters on every authenticated request; flushes to usage_events every 60s
10.4 Create UsageService with getDailyUsage(tenantId, date) and getActiveAgentCount(tenantId) methods
10.5 Create FreeTierEnforcementMiddleware — checks usage cache (Redis, 60s TTL) before agent creation and API calls; rejects with HTTP 429 when limit exceeded; skips when BILLING_ENABLED=false
10.6 Add agentidp_billing_limit_rejections_total Prometheus counter (labels: tenant_id, limit_type)
10.7 Create BillingService with createCheckoutSession(tenantId), handleWebhookEvent(event), getSubscriptionStatus(tenantId) methods
10.8 Create POST /billing/checkout endpoint — creates Stripe Checkout session, returns checkoutUrl
10.9 Create POST /billing/webhook endpoint — verifies Stripe signature, processes subscription events, updates tenant_subscriptions
10.10 Create GET /billing/usage endpoint (authenticated) — returns current period usage summary for tenant
10.11 Add BILLING_ENABLED env var — disable enforcement and Stripe processing when false; document in .env.example
10.12 Write unit tests for UsageService, BillingService, FreeTierEnforcementMiddleware — free tier block, paid tier pass-through, webhook processing
10.13 Update web dashboard — add "Usage" tab to navigation with billing status panel and usage metrics from GET /billing/usage
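
The metering and enforcement logic in 10.3 and 10.5 can be sketched without Redis or Stripe. The free tier limits (10 agents, 1,000 calls/day) come from task 4.4. The class and function names, the `limitType` values, and the rule that agent creation is checked against the agent limit while other calls are checked against the daily call limit are assumptions.

```typescript
const FREE_TIER_LIMITS = { maxAgents: 10, maxCallsPerDay: 1000 } as const;

// In-memory per-tenant counters; drain() is what a 60 s flush timer would call
// before writing rows to usage_events.
class UsageCounter {
  private counts = new Map<string, number>();

  increment(tenantId: string): void {
    this.counts.set(tenantId, (this.counts.get(tenantId) ?? 0) + 1);
  }

  // Return the accumulated counts and reset them.
  drain(): Array<{ tenantId: string; count: number }> {
    const rows: Array<{ tenantId: string; count: number }> = [];
    this.counts.forEach((count, tenantId) => rows.push({ tenantId, count }));
    this.counts.clear();
    return rows;
  }
}

type Operation = "create_agent" | "api_call";

type Decision =
  | { allowed: true }
  | { allowed: false; status: 429; limitType: "agents" | "api_calls" };

function checkFreeTier(
  op: Operation,
  usage: { activeAgents: number; callsToday: number },
  ctx: { billingEnabled: boolean; paid: boolean },
): Decision {
  // BILLING_ENABLED=false and paid tenants skip enforcement entirely.
  if (!ctx.billingEnabled || ctx.paid) return { allowed: true };
  if (op === "create_agent" && usage.activeAgents >= FREE_TIER_LIMITS.maxAgents) {
    return { allowed: false, status: 429, limitType: "agents" };
  }
  if (op === "api_call" && usage.callsToday >= FREE_TIER_LIMITS.maxCallsPerDay) {
    return { allowed: false, status: 429, limitType: "api_calls" };
  }
  return { allowed: true };
}
```

Because `checkFreeTier` returns early when billing is disabled while `UsageCounter` is untouched by that flag, metering continues to work with `BILLING_ENABLED=false`, which is the behaviour 11.8 verifies.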

11. QA & Release

11.1 Run full TypeScript check across all packages (tsc --noEmit) — zero errors
11.2 Run all unit tests (npm test) — all pass, coverage ≥ 80%
11.3 Run k6 load tests — all thresholds pass (p95 < 500ms, error rate < 1%)
11.4 Verify GET /health/detailed returns correct status for all dependency states
11.5 Verify marketplace endpoints are unauthenticated and return correct data
11.6 Verify Stripe webhook signature rejection on invalid signature
11.7 Verify free tier limit enforcement with BILLING_ENABLED=true
11.8 Verify BILLING_ENABLED=false disables enforcement without breaking metering
11.9 Build portal — npm run build passes in portal/
11.10 Build CLI — npm run build passes in cli/; sentryagent --help works
11.11 Commit all Phase 4 work on main — conventional commit message per workstream
