sentryagent-idp/openspec/changes/phase-6-market-expansion/design.md
SentryAgent.ai Developer 0fad328329 feat(openspec): propose phase-6-market-expansion change
Analytics Dashboard, API Gateway Tiers, AGNTCY Compliance — 62 tasks across 8 groups.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 12:57:23 +00:00

Context

SentryAgent.ai AgentIdP has completed four major phases: core OAuth2/agent identity (Phase 1), enterprise features — multi-tenancy, DIDs, OIDC, federation, webhooks, SOC2 (Phase 3), production hardening and developer tooling (Phase 4), and SDK ecosystem + A2A authorization (Phase 5). The platform is feature-complete for identity; Phase 6 makes it commercially competitive by adding visibility (analytics), monetization (tiers), and ecosystem certification (AGNTCY compliance).

Current state constraints:

  • Stack: Node.js 18+, TypeScript strict, Express, PostgreSQL 14+, Redis 7+, React/Next.js 14 (portal)
  • Existing Redis rate limiter: per-agent token bucket keys (rate:agent:<id>)
  • Existing tenant model: tenants table with plan column (currently unused for enforcement)
  • Existing audit log: audit_events table (used for SOC2, not analytics)
  • Portal: Next.js 14 at portal/, API explorer via Stoplight Elements added in Phase 5

Goals / Non-Goals

Goals:

  • Deliver tenant-facing analytics (token trends, agent activity heatmaps) via new API + portal views
  • Enforce tier-based rate limits (free/pro/enterprise) transparently via existing middleware layer
  • Expose self-service tier upgrade endpoint (integration with existing Stripe billing from Phase 4)
  • Generate AGNTCY-standard compliance reports and agent cards on demand
  • Provide an interoperability test suite for AGNTCY ecosystem partners

Non-Goals:

  • Real-time streaming analytics (WebSocket push) — batch/polling sufficient for Phase 6
  • Billing tier downgrades (upgrade only; downgrade requires support intervention)
  • Multi-region analytics aggregation — single-region PostgreSQL sufficient
  • Full AGNTCY federation partner onboarding UI — report + card export is the deliverable

Decisions

D1: Analytics Storage — Dedicated Table vs. Querying audit_events

Decision: New analytics_events table with pre-aggregated daily rollups.

Rationale: audit_events is an append-only SOC2 ledger with a Merkle chain. Querying it for analytics requires full scans, and adapting it for analytics workloads would put its integrity guarantees at risk. A separate analytics_events table with a (tenant_id, date, metric_type, count) schema enables indexed range queries without touching the audit chain.

Alternative considered: Materialized views over audit_events — rejected because the Merkle chain structure makes column selection fragile and any schema change risks audit integrity.
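
A minimal sketch of the daily rollup described in D1, assuming raw events carry a tenant, a metric type, and a timestamp. The RawEvent and RollupRow shapes, the rollup helper, and the upsert statement (including the unique constraint it relies on) are hypothetical and are not taken from the actual migration.

```typescript
// Hypothetical row shape mirroring the (tenant_id, date, metric_type, count) schema.
interface RollupRow {
  tenantId: string;
  date: string; // UTC day, YYYY-MM-DD
  metricType: string;
  count: number;
}

// Hypothetical raw event; field names are assumptions for illustration.
interface RawEvent {
  tenantId: string;
  metricType: string;
  occurredAt: Date;
}

// Truncate a timestamp to its UTC calendar day.
function utcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Fold raw events into one row per (tenant, day, metric).
function rollup(events: RawEvent[]): RollupRow[] {
  const byKey = new Map<string, RollupRow>();
  for (const e of events) {
    const date = utcDay(e.occurredAt);
    const key = `${e.tenantId}|${date}|${e.metricType}`;
    const row = byKey.get(key);
    if (row) {
      row.count += 1;
    } else {
      byKey.set(key, { tenantId: e.tenantId, date, metricType: e.metricType, count: 1 });
    }
  }
  return Array.from(byKey.values());
}

// Assumed upsert for persisting a rollup row. It presumes a unique constraint
// on (tenant_id, date, metric_type), which the design does not state.
const UPSERT_SQL = `
  INSERT INTO analytics_events (tenant_id, date, metric_type, count)
  VALUES ($1, $2, $3, $4)
  ON CONFLICT (tenant_id, date, metric_type)
  DO UPDATE SET count = analytics_events.count + EXCLUDED.count`;
```

Because each (tenant, day, metric) combination produces one row, table growth scales with the number of tenants and metric types rather than with request volume.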

D2: Tier Enforcement — Middleware vs. Per-Endpoint Guards

Decision: New tierEnforcement Express middleware inserted before route handlers, reading tier config from an in-process TIER_CONFIG constant (not the DB) so that the tier lookup adds no database read to the request path.

Rationale: Per-endpoint guards duplicate logic and are easy to miss on new routes. A single middleware checking req.tenant.tier against TIER_CONFIG[tier].limits is DRY and auditable. Redis keys prefixed rate:tier:<tier>:<tenant_id> track daily call counts.

Alternative considered: DB-driven tier limits per tenant row — rejected because it adds a DB read to every request; tier configs change rarely and belong in config, not data.
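
A rough sketch of how such a middleware could look. The limit values, the dailyCalls field name, the Counter interface, and the minimal Req/Res/Next types are assumptions made so the example is self-contained; in the real service these would be the Express types and a Redis client. Daily key expiry is omitted.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Placeholder quotas: the design does not state actual limits.
const TIER_CONFIG: Record<Tier, { limits: { dailyCalls: number } }> = {
  free: { limits: { dailyCalls: 1000 } },
  pro: { limits: { dailyCalls: 100000 } },
  enterprise: { limits: { dailyCalls: 10000000 } },
};

// Key layout from the design: rate:tier:<tier>:<tenant_id>.
function tierKey(tier: Tier, tenantId: string): string {
  return `rate:tier:${tier}:${tenantId}`;
}

// True once the post-increment count exceeds the tier's daily quota.
function isOverLimit(tier: Tier, count: number): boolean {
  return count > TIER_CONFIG[tier].limits.dailyCalls;
}

// Hypothetical stand-in for a Redis client: increments a key, returns the new value.
interface Counter {
  incr(key: string): Promise<number>;
}

// Minimal structural stand-ins for the Express request/response types.
interface Req { tenant: { id: string; tier: Tier } }
interface Res { status(code: number): Res; json(body: unknown): void }
type Next = (err?: unknown) => void;

// Middleware factory: counts the call, then rejects with 429 or passes through.
function tierEnforcement(counter: Counter) {
  return async (req: Req, res: Res, next: Next): Promise<void> => {
    const { id, tier } = req.tenant;
    const count = await counter.incr(tierKey(tier, id));
    if (isOverLimit(tier, count)) {
      res.status(429).json({ error: "tier_limit_exceeded", tier });
      return;
    }
    next();
  };
}
```

Keeping the quota check in a pure function (isOverLimit) makes the limit logic unit-testable without Redis, while the factory keeps the counter injectable.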

D3: AGNTCY Compliance Report — Generated On-Demand vs. Scheduled

Decision: Generated on-demand (GET /api/compliance/report) with 5-minute Redis cache.

Rationale: Compliance reports are requested infrequently (audit cycles, partner onboarding) and must reflect current state. A 5-minute cache prevents redundant computation while bounding staleness to at most five minutes. No background job is needed.

Alternative considered: Nightly scheduled generation — rejected because it could be stale at the moment of an audit request.
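
The cache-aside behaviour in D3 can be sketched as follows. This is an in-memory stand-in, not the Redis-backed implementation: the TtlCache class, the injectable clock, and the cache key are illustrative assumptions. With Redis, the equivalent would be a SET with a 300-second expiry on a miss.

```typescript
// Five-minute TTL from the design decision.
const REPORT_TTL_MS = 5 * 60 * 1000;

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory cache; the clock is injectable so expiry can be tested.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now,
  ) {}

  // Return the cached value if still fresh; otherwise compute, store, and return.
  getOrCompute(key: string, compute: () => T): T {
    const t = this.now();
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > t) {
      return hit.value;
    }
    const value = compute();
    this.entries.set(key, { value, expiresAt: t + this.ttlMs });
    return value;
  }
}
```

A request arriving within five minutes of the previous one receives the cached report; the first request after expiry regenerates it, so no background job is involved.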

D4: recharts Integration — Dashboard vs. Portal

Decision: Analytics views live in portal/src/pages/analytics/ as new Next.js pages using recharts components. No changes to the existing dashboard/ directory (Phase 3 admin dashboard).

Rationale: The developer portal (portal/) is the tenant-facing surface. The dashboard/ is the internal admin view. Tenant analytics belong in the portal alongside existing usage stats.

D5: Delivery Sequence

Decision: WS3 (Analytics) → WS4 (API Tiers) → WS6 (AGNTCY Compliance), sequential.

Rationale: Analytics requires the analytics_events table; tiers build on top of existing Redis + tenant model and can reuse analytics data for usage display; compliance requires all prior capabilities to be stable for report generation. Each workstream is independently deployable.

Risks / Trade-offs

  • Analytics table growth → Mitigation: daily rollup rows (not per-event rows); add created_at index with a 90-day retention policy, enforced either by partition management with the pg_partman extension or by a simple cron DELETE.
  • Tier enforcement bypass during Redis failure → Mitigation: fail-open on Redis errors (log + allow) is a deliberate choice to avoid availability impact; the degraded state is flagged in the health endpoint so that bypass windows are visible.
  • recharts bundle size in portal → Mitigation: dynamic import (next/dynamic) for chart components — only loaded on analytics pages.
  • AGNTCY spec version drift → Mitigation: pin to AGNTCY agent-card schema v1.0; add schema version field to report output; test suite validates against pinned version.
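
The fail-open behaviour in the Redis-failure mitigation above could look roughly like this. The failOpen helper, the Decision shape, and the degraded flag are hypothetical names; the check callback stands in for whatever Redis-backed quota check the middleware performs.

```typescript
// Outcome of a quota check; `degraded` marks decisions made without Redis.
interface Decision {
  allowed: boolean;
  degraded: boolean;
}

// Run the quota check; if it throws (for example, Redis is unreachable),
// log the error and allow the request instead of failing it.
async function failOpen(
  check: () => Promise<boolean>,
  log: (err: unknown) => void,
): Promise<Decision> {
  try {
    return { allowed: await check(), degraded: false };
  } catch (err) {
    log(err);
    return { allowed: true, degraded: true };
  }
}
```

A health endpoint could report the most recent degraded value so operators can see when enforcement is being bypassed.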

Migration Plan

  1. WS3: Run migration 023_add_analytics_events.sql → deploy API changes → deploy portal analytics pages
  2. WS4: Run migration 024_add_tenant_tiers.sql → deploy tier middleware (with TIER_ENFORCEMENT=false feature flag initially) → smoke test → enable flag
  3. WS6: No migration required — compliance report queries existing tables; deploy endpoint + test suite
  4. Rollback: Each WS has a feature flag (ANALYTICS_ENABLED, TIER_ENFORCEMENT, COMPLIANCE_ENABLED). Set the flag to false to disable that workstream without reverting the deployment or its migration.
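
A small sketch of how the three flags might be read. The flagEnabled helper and the rule that an unset or unrecognised value counts as disabled are assumptions; the design only names the flags and the false setting.

```typescript
// Flag names from the migration plan.
type FlagName = "ANALYTICS_ENABLED" | "TIER_ENFORCEMENT" | "COMPLIANCE_ENABLED";

// Environment-like map; in Node.js this would typically be process.env.
type Env = Record<string, string | undefined>;

// Assumption: only the literal "true" (case-insensitive) enables a flag;
// anything else, including an unset variable, leaves the workstream off.
function flagEnabled(env: Env, name: FlagName): boolean {
  return (env[name] ?? "").trim().toLowerCase() === "true";
}

// List the workstream flags currently switched on.
function enabledFlags(env: Env): FlagName[] {
  const all: FlagName[] = ["ANALYTICS_ENABLED", "TIER_ENFORCEMENT", "COMPLIANCE_ENABLED"];
  return all.filter((name) => flagEnabled(env, name));
}
```

Defaulting to disabled matches step 2 of the plan, where the tier middleware ships with TIER_ENFORCEMENT=false and is enabled only after the smoke test.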

Open Questions

  • None — scope is fully defined from Phase 5 deferred workstreams. AGNTCY agent-card schema version confirmed as v1.0 per Phase 3 W3C DIDs implementation.