Analytics Dashboard, API Gateway Tiers, AGNTCY Compliance — 62 tasks across 8 groups. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Context
SentryAgent.ai AgentIdP has completed four major phases: core OAuth2/agent identity (Phase 1), enterprise features — multi-tenancy, DIDs, OIDC, federation, webhooks, SOC2 (Phase 3), production hardening and developer tooling (Phase 4), and SDK ecosystem + A2A authorization (Phase 5). The platform is feature-complete for identity; Phase 6 makes it commercially competitive by adding visibility (analytics), monetization (tiers), and ecosystem certification (AGNTCY compliance).
Current state constraints:
- Stack: Node.js 18+, TypeScript strict, Express, PostgreSQL 14+, Redis 7+, React/Next.js 14 (portal)
- Existing Redis rate limiter: per-agent token bucket keys (rate:agent:<id>)
- Existing tenant model: tenants table with plan column (currently unused for enforcement)
- Existing audit log: audit_events table (used for SOC2, not analytics)
- Portal: Next.js 14 at portal/, API explorer via Stoplight Elements added in Phase 5
Goals / Non-Goals
Goals:
- Deliver tenant-facing analytics (token trends, agent activity heatmaps) via new API + portal views
- Enforce tier-based rate limits (free/pro/enterprise) transparently via existing middleware layer
- Expose self-service tier upgrade endpoint (integration with existing Stripe billing from Phase 4)
- Generate AGNTCY-standard compliance reports and agent cards on demand
- Provide an interoperability test suite for AGNTCY ecosystem partners
Non-Goals:
- Real-time streaming analytics (WebSocket push) — batch/polling sufficient for Phase 6
- Billing tier downgrades (upgrade only; downgrade requires support intervention)
- Multi-region analytics aggregation — single-region PostgreSQL sufficient
- Full AGNTCY federation partner onboarding UI — report + card export is the deliverable
Decisions
D1: Analytics Storage — Dedicated Table vs. Querying audit_events
Decision: New analytics_events table with pre-aggregated daily rollups.
Rationale: audit_events is an append-only SOC2 ledger with a Merkle chain — querying it for analytics requires full scans and breaks its integrity guarantees. A separate analytics_events table with (tenant_id, date, metric_type, count) schema enables indexed range queries without touching the audit chain.
Alternative considered: Materialized views over audit_events — rejected because the Merkle chain structure makes column selection fragile and any schema change risks audit integrity.
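As a rough illustration of the rollup idea in D1, the sketch below collapses per-event records into one row per (tenant_id, date, metric_type). The RawEvent shape, the rollupDaily name, and the choice of UTC day boundaries are assumptions for illustration; the source only fixes the rollup columns.

```typescript
// Hypothetical raw event shape; the design only specifies the rollup columns.
interface RawEvent {
  tenantId: string;
  metricType: string; // e.g. "token_issued" (illustrative value)
  occurredAt: Date;
}

// Mirrors the (tenant_id, date, metric_type, count) schema from D1.
interface DailyRollup {
  tenantId: string;
  date: string; // UTC calendar day, YYYY-MM-DD (assumed boundary)
  metricType: string;
  count: number;
}

// Groups events by tenant, UTC day, and metric type, counting each group.
function rollupDaily(events: RawEvent[]): DailyRollup[] {
  const buckets = new Map<string, DailyRollup>();
  for (const e of events) {
    const date = e.occurredAt.toISOString().slice(0, 10);
    const key = `${e.tenantId}|${date}|${e.metricType}`;
    const row = buckets.get(key);
    if (row) {
      row.count += 1;
    } else {
      buckets.set(key, { tenantId: e.tenantId, date, metricType: e.metricType, count: 1 });
    }
  }
  // Map preserves insertion order, so rows come out in first-seen order.
  return Array.from(buckets.values());
}
```

In a deployment the same grouping would more likely happen in SQL or as an upsert per event; the in-memory version only shows why range queries over rollup rows stay small.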
D2: Tier Enforcement — Middleware vs. Per-Endpoint Guards
Decision: New tierEnforcement Express middleware inserted before route handlers, reading tier config from a TIER_CONFIG constant (not DB) for zero-latency enforcement.
Rationale: Per-endpoint guards duplicate logic and are easy to miss on new routes. A single middleware checking req.tenant.tier against TIER_CONFIG[tier].limits is DRY and auditable. Redis keys prefixed rate:tier:<tier>:<tenant_id> track daily call counts.
Alternative considered: DB-driven tier limits per tenant row — rejected because it adds a DB read to every request; tier configs change rarely and belong in config, not data.
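A minimal sketch of how the D2 middleware might look, under several assumptions: the limit numbers in TIER_CONFIG are placeholders (the source does not state them), CounterStore, TenantRequest, and MinimalResponse are structural stand-ins for a Redis client and Express's request/response types, and the 429 response body is invented for illustration. Daily key expiry is omitted.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Placeholder limits; the actual per-tier numbers are not given in the source.
const TIER_CONFIG: Record<Tier, { limits: { dailyCalls: number } }> = {
  free: { limits: { dailyCalls: 1000 } },
  pro: { limits: { dailyCalls: 100000 } },
  enterprise: { limits: { dailyCalls: Number.POSITIVE_INFINITY } },
};

// Stand-in for a Redis client; an INCR-style call would satisfy this.
interface CounterStore {
  incr(key: string): Promise<number>;
}

// Structural stand-ins for Express's Request and Response.
interface TenantRequest {
  tenant: { id: string; tier: Tier };
}
interface MinimalResponse {
  status(code: number): MinimalResponse;
  json(body: unknown): void;
}

// Key format from D2: rate:tier:<tier>:<tenant_id>
function tierKey(tier: Tier, tenantId: string): string {
  return `rate:tier:${tier}:${tenantId}`;
}

// True once the call count (including the current call) exceeds the tier's daily limit.
function isOverLimit(tier: Tier, used: number): boolean {
  return used > TIER_CONFIG[tier].limits.dailyCalls;
}

// Builds the middleware; limits come from the constant, so no DB read per request.
function tierEnforcement(store: CounterStore) {
  return async (req: TenantRequest, res: MinimalResponse, next: () => void): Promise<void> => {
    const { id, tier } = req.tenant;
    const used = await store.incr(tierKey(tier, id));
    if (isOverLimit(tier, used)) {
      res.status(429).json({ error: "tier_limit_exceeded", tier });
      return;
    }
    next();
  };
}
```

Splitting the key builder and the limit check out of the middleware keeps the policy testable without a running Redis instance.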
D3: AGNTCY Compliance Report — Generated On-Demand vs. Scheduled
Decision: Generated on-demand (GET /api/compliance/report) with 5-minute Redis cache.
Rationale: Compliance reports are requested infrequently (audit cycles, partner onboarding) and must reflect current state. A 5-minute cache avoids redundant computation while bounding staleness to at most five minutes. No background job is needed.
Alternative considered: Nightly scheduled generation — rejected because it could be stale at the moment of an audit request.
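The on-demand-plus-cache pattern in D3 could be sketched roughly as follows. TtlCache is an in-memory stand-in for Redis with an injectable clock, the compliance:report:<tenant> key name is hypothetical, and generation is shown as synchronous for brevity where the real report builder would presumably be async.

```typescript
const REPORT_TTL_MS = 5 * 60 * 1000; // 5-minute cache window from D3

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

// In-memory stand-in for the Redis cache; the clock is injectable for testing.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string): T | undefined {
    const hit = this.entries.get(key);
    if (!hit || hit.expiresAt <= this.now()) return undefined;
    return hit.value;
  }

  set(key: string, value: T, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Cache-aside: serve a fresh-enough cached report, otherwise generate and store one.
function getReport<T>(cache: TtlCache<T>, tenantId: string, generate: () => T): T {
  const key = `compliance:report:${tenantId}`; // hypothetical key name
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = generate();
  cache.set(key, fresh, REPORT_TTL_MS);
  return fresh;
}
```

With this shape, a report served from cache is never older than the TTL, which is the staleness bound the decision relies on.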
D4: recharts Integration — Dashboard vs. Portal
Decision: Analytics views live in portal/src/pages/analytics/ as new Next.js pages using recharts components. No changes to the existing dashboard/ directory (Phase 3 admin dashboard).
Rationale: The developer portal (portal/) is the tenant-facing surface. The dashboard/ is the internal admin view. Tenant analytics belong in the portal alongside existing usage stats.
D5: Delivery Sequence
Decision: WS3 (Analytics) → WS4 (API Tiers) → WS6 (AGNTCY Compliance), sequential.
Rationale: Analytics requires the analytics_events table; tiers build on top of existing Redis + tenant model and can reuse analytics data for usage display; compliance requires all prior capabilities to be stable for report generation. Each workstream is independently deployable.
Risks / Trade-offs
- Analytics table growth → Mitigation: daily rollup rows (not per-event rows); add created_at index with a 90-day retention policy via PostgreSQL pg_partman or a simple cron DELETE.
- Tier enforcement bypass during Redis failure → Mitigation: fail-open on Redis errors (log + allow) to avoid availability impact; flag in health endpoint.
- recharts bundle size in portal → Mitigation: dynamic import (next/dynamic) for chart components, so they are only loaded on analytics pages.
- AGNTCY spec version drift → Mitigation: pin to AGNTCY agent-card schema v1.0; add schema version field to report output; test suite validates against pinned version.
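The fail-open mitigation for Redis errors might be wrapped along these lines. The failOpen helper, the HealthFlags shape, and the log message are all hypothetical names introduced for this sketch; the source only states the behaviour (log, allow the request, flag it in the health endpoint).

```typescript
// Hypothetical flag the health endpoint could read.
interface HealthFlags {
  redisDegraded: boolean;
}

// Runs a Redis-backed operation; on error, logs, marks health as degraded,
// and returns the fallback so the request is allowed rather than rejected.
async function failOpen<T>(
  op: () => Promise<T>,
  fallback: T,
  health: HealthFlags,
  log: (msg: string, err: unknown) => void = (m, e) => console.error(m, e),
): Promise<T> {
  try {
    const result = await op();
    health.redisDegraded = false;
    return result;
  } catch (err) {
    health.redisDegraded = true;
    log("tier enforcement: Redis unavailable, allowing request", err);
    return fallback;
  }
}
```

For a call counter, passing a fallback of 0 would make the limit check pass, which is the "allow" half of the mitigation. The trade-off is that limits are not enforced for the duration of the outage.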
Migration Plan
- WS3: Run migration 023_add_analytics_events.sql → deploy API changes → deploy portal analytics pages
- WS4: Run migration 024_add_tenant_tiers.sql → deploy tier middleware (with TIER_ENFORCEMENT=false feature flag initially) → smoke test → enable flag
- WS6: No migration required; the compliance report queries existing tables. Deploy endpoint + test suite
- Rollback: Each WS has a feature flag (ANALYTICS_ENABLED, TIER_ENFORCEMENT, COMPLIANCE_ENABLED). Set the flag to false to disable a workstream without rolling back.
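One way the three flags could be read is sketched below. The isEnabled helper is hypothetical, and treating an unset or unrecognized value as disabled is an assumption chosen to match the "deploy with TIER_ENFORCEMENT=false first" step; the source does not specify parsing rules.

```typescript
type FeatureFlag = "ANALYTICS_ENABLED" | "TIER_ENFORCEMENT" | "COMPLIANCE_ENABLED";

// Assumption: a flag is on only when its value is "true" (case-insensitive,
// surrounding whitespace ignored). Unset or any other value means off.
function isEnabled(env: Record<string, string | undefined>, flag: FeatureFlag): boolean {
  return (env[flag] ?? "").trim().toLowerCase() === "true";
}
```

In a Node.js service this would typically be called as isEnabled(process.env, "TIER_ENFORCEMENT") at the point where the middleware or route is registered.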
Open Questions
- None — scope is fully defined from Phase 5 deferred workstreams. AGNTCY agent-card schema version confirmed as v1.0 per Phase 3 W3C DIDs implementation.