feat(openspec): propose phase-6-market-expansion change

Analytics Dashboard, API Gateway Tiers, AGNTCY Compliance: 62 tasks across 8 groups.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
openspec/changes/phase-6-market-expansion/design.md
@@ -0,0 +1,81 @@
## Context

SentryAgent.ai AgentIdP has completed four major phases: core OAuth2/agent identity (Phase 1); enterprise features, including multi-tenancy, DIDs, OIDC, federation, webhooks, and SOC2 (Phase 3); production hardening and developer tooling (Phase 4); and the SDK ecosystem plus A2A authorization (Phase 5). The platform is feature-complete for identity; Phase 6 makes it commercially competitive by adding visibility (analytics), monetization (tiers), and ecosystem certification (AGNTCY compliance).

**Current state constraints:**
- Stack: Node.js 18+, TypeScript strict, Express, PostgreSQL 14+, Redis 7+, React/Next.js 14 (portal)
- Existing Redis rate limiter: per-agent token bucket keys (`rate:agent:<id>`)
- Existing tenant model: `tenants` table with `plan` column (currently unused for enforcement)
- Existing audit log: `audit_events` table (used for SOC2, not analytics)
- Portal: Next.js 14 at `portal/`, API explorer via Stoplight Elements added in Phase 5

## Goals / Non-Goals

**Goals:**
- Deliver tenant-facing analytics (token trends, agent activity heatmaps) via a new API and portal views
- Enforce tier-based rate limits (free/pro/enterprise) transparently via the existing middleware layer
- Expose a self-service tier upgrade endpoint (integrating with the existing Stripe billing from Phase 4)
- Generate AGNTCY-standard compliance reports and agent cards on demand
- Provide an interoperability test suite for AGNTCY ecosystem partners

**Non-Goals:**
- Real-time streaming analytics (WebSocket push); batch/polling is sufficient for Phase 6
- Billing tier downgrades (upgrade only; downgrade requires support intervention)
- Multi-region analytics aggregation; single-region PostgreSQL is sufficient
- Full AGNTCY federation partner onboarding UI; report and card export is the deliverable

## Decisions

### D1: Analytics Storage (Dedicated Table vs. Querying `audit_events`)

**Decision**: New `analytics_events` table with pre-aggregated daily rollups.

**Rationale**: `audit_events` is an append-only SOC2 ledger with a Merkle chain. Querying it for analytics requires full scans, and reshaping it to serve analytics would put its integrity guarantees at risk. A separate `analytics_events` table with a `(tenant_id, date, metric_type, count)` schema enables indexed range queries without touching the audit chain.

**Alternative considered**: Materialized views over `audit_events`. Rejected because the Merkle chain structure makes column selection fragile and any schema change risks audit integrity.
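
To make the rollup shape in D1 concrete, here is a minimal sketch of collapsing raw events into one row per tenant, UTC day, and metric type. The `RawEvent` shape, the `rollupDaily` helper, and the upsert statement (including the unique constraint it relies on) are assumptions for illustration; only the table name and the four columns come from this design.

```typescript
// One pre-aggregated row, mirroring the (tenant_id, date, metric_type, count)
// schema named in D1. Field names here are illustrative.
interface RollupRow {
  tenantId: string;
  date: string; // UTC calendar day, "YYYY-MM-DD"
  metricType: string;
  count: number;
}

// Hypothetical raw event shape; this design does not define one.
interface RawEvent {
  tenantId: string;
  metricType: string;
  occurredAt: Date;
}

// Collapse raw events into one row per (tenant, UTC day, metric type).
function rollupDaily(events: RawEvent[]): RollupRow[] {
  const rows = new Map<string, RollupRow>();
  for (const e of events) {
    const date = e.occurredAt.toISOString().slice(0, 10);
    const key = `${e.tenantId}|${date}|${e.metricType}`;
    const existing = rows.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      rows.set(key, { tenantId: e.tenantId, date, metricType: e.metricType, count: 1 });
    }
  }
  return Array.from(rows.values());
}

// Assumed PostgreSQL upsert for persisting one row. It relies on a unique
// constraint over (tenant_id, date, metric_type), which is an assumption.
const UPSERT_ROLLUP_SQL = `
  INSERT INTO analytics_events (tenant_id, date, metric_type, count)
  VALUES ($1, $2, $3, $4)
  ON CONFLICT (tenant_id, date, metric_type)
  DO UPDATE SET count = analytics_events.count + EXCLUDED.count`;
```

Keeping the aggregation a pure function means it can be unit-tested without a database; the upsert would then run once per returned row.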

### D2: Tier Enforcement (Middleware vs. Per-Endpoint Guards)

**Decision**: New `tierEnforcement` Express middleware inserted before route handlers, reading tier limits from a `TIER_CONFIG` constant (not the DB) so that enforcement adds no database round trip.

**Rationale**: Per-endpoint guards duplicate logic and are easy to miss on new routes. A single middleware checking `req.tenant.tier` against `TIER_CONFIG[tier].limits` is DRY and auditable. Redis keys prefixed `rate:tier:<tier>:<tenant_id>` track daily call counts.

**Alternative considered**: DB-driven tier limits per tenant row. Rejected because it adds a DB read to every request; tier configs change rarely and belong in config, not data.
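
A rough sketch of how such a middleware could look follows. The limit values, the `dailyCalls` field, the `CounterStore` interface, the UTC date suffix on the key, and the 429 response body are all assumptions made for illustration. The request and response types are minimal structural stand-ins so the sketch compiles without Express typings.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Placeholder limits for illustration only; the real per-tier numbers
// are not stated in this design.
const TIER_CONFIG: Record<Tier, { limits: { dailyCalls: number } }> = {
  free: { limits: { dailyCalls: 1000 } },
  pro: { limits: { dailyCalls: 100000 } },
  enterprise: { limits: { dailyCalls: 10000000 } },
};

// Minimal structural types standing in for Express request/response.
interface TenantRequest {
  tenant: { id: string; tier: Tier };
}
interface JsonResponse {
  status(code: number): JsonResponse;
  json(body: unknown): void;
}

// Hypothetical counter abstraction. A Redis-backed version could wrap
// INCR and set an expiry so each daily key ages out.
interface CounterStore {
  incr(key: string): Promise<number>;
}

// Key layout from D2, plus a UTC date suffix (an assumption) so that
// counts reset each day.
function tierKey(tier: Tier, tenantId: string, now: Date): string {
  return `rate:tier:${tier}:${tenantId}:${now.toISOString().slice(0, 10)}`;
}

function tierEnforcement(store: CounterStore, clock: () => Date = () => new Date()) {
  return async (req: TenantRequest, res: JsonResponse, next: () => void): Promise<void> => {
    const { id, tier } = req.tenant;
    const limit = TIER_CONFIG[tier].limits.dailyCalls;
    const used = await store.incr(tierKey(tier, id, clock()));
    if (used > limit) {
      // 429 Too Many Requests; the body shape is illustrative.
      res.status(429).json({ error: "tier_limit_exceeded", tier, limit });
      return;
    }
    next();
  };
}
```

Injecting the store and the clock keeps the limit lookup in process memory and lets tests run against an in-memory counter instead of Redis.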

### D3: AGNTCY Compliance Report (Generated On-Demand vs. Scheduled)

**Decision**: Generated on demand (`GET /api/compliance/report`) with a 5-minute Redis cache.

**Rationale**: Compliance reports are requested infrequently (audit cycles, partner onboarding) and must reflect current state. A 5-minute cache prevents redundant computation while bounding staleness to at most five minutes. No background job is needed.

**Alternative considered**: Nightly scheduled generation. Rejected because the report could be stale at the moment of an audit request.
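
The on-demand-plus-cache behaviour in D3 is a cache-aside pattern, sketched below. The `ComplianceReport` fields, the cache key name, and the in-memory cache are assumptions; a Redis-backed cache would be asynchronous and could rely on key expiry, and it is kept synchronous here only so the sketch runs on its own.

```typescript
// Hypothetical report shape; the real AGNTCY report fields are not given here.
interface ComplianceReport {
  tenantId: string;
  generatedAt: string;
}

// Minimal cache contract used by the sketch.
interface ReportCache {
  get(key: string): ComplianceReport | undefined;
  set(key: string, value: ComplianceReport, ttlMs: number): void;
}

const REPORT_TTL_MS = 5 * 60 * 1000; // the 5-minute cache window from D3

// In-memory stand-in for Redis with an injectable clock.
function createMemoryCache(now: () => number): ReportCache {
  const entries = new Map<string, { value: ComplianceReport; expiresAt: number }>();
  return {
    get(key: string) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key);
        return undefined;
      }
      return entry.value;
    },
    set(key: string, value: ComplianceReport, ttlMs: number) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

// Cache-aside: serve a cached report if it is still fresh, else regenerate.
function getComplianceReport(
  tenantId: string,
  cache: ReportCache,
  generate: (tenantId: string) => ComplianceReport,
): ComplianceReport {
  const key = `compliance:report:${tenantId}`; // key name is an assumption
  const cached = cache.get(key);
  if (cached) return cached;
  const fresh = generate(tenantId);
  cache.set(key, fresh, REPORT_TTL_MS);
  return fresh;
}
```

Keying the cache per tenant keeps one tenant's request from serving another tenant's report.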

### D4: recharts Integration (Dashboard vs. Portal)

**Decision**: Analytics views live in `portal/src/pages/analytics/` as new Next.js pages using `recharts` components. No changes to the existing `dashboard/` directory (Phase 3 admin dashboard).

**Rationale**: The developer portal (`portal/`) is the tenant-facing surface. The `dashboard/` is the internal admin view. Tenant analytics belong in the portal alongside existing usage stats.

### D5: Delivery Sequence

**Decision**: WS3 (Analytics) → WS4 (API Tiers) → WS6 (AGNTCY Compliance), sequential.

**Rationale**: Analytics requires the `analytics_events` table; tiers build on top of existing Redis + tenant model and can reuse analytics data for usage display; compliance requires all prior capabilities to be stable for report generation. Each workstream is independently deployable.

## Risks / Trade-offs

- **Analytics table growth** → Mitigation: store daily rollup rows (not per-event rows); add a `created_at` index and enforce a 90-day retention policy via PostgreSQL `pg_partman` or a simple cron `DELETE`.
- **Tier enforcement bypass during Redis failure** → Mitigation: accepted as a trade-off. Fail open on Redis errors (log and allow) to avoid availability impact, and flag the degraded state in the health endpoint.
- **recharts bundle size in portal** → Mitigation: dynamic import (`next/dynamic`) for chart components, so they load only on analytics pages.
- **AGNTCY spec version drift** → Mitigation: pin to AGNTCY agent-card schema v1.0; add a schema version field to report output; the test suite validates against the pinned version.
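
The fail-open mitigation for Redis errors could be sketched as a small wrapper around the limit check. The `EnforcementHealth` shape, the `checkFailOpen` name, and the log message are hypothetical; the sketch only illustrates the "log, allow, and flag" behaviour described above.

```typescript
// Hypothetical health state that a health endpoint could report.
interface EnforcementHealth {
  degraded: boolean;
  lastError?: string;
}

// Run a limit check. If the backing store throws, log the error, mark
// enforcement as degraded, and allow the request (fail open).
async function checkFailOpen(
  check: () => Promise<boolean>,
  health: EnforcementHealth,
  log: (message: string) => void,
): Promise<boolean> {
  try {
    const allowed = await check();
    health.degraded = false; // a successful check clears the degraded flag
    return allowed;
  } catch (err) {
    health.degraded = true;
    health.lastError = String(err);
    log(`tier check unavailable, allowing request: ${health.lastError}`);
    return true;
  }
}
```

A denial from a healthy store is still honored; only thrown errors are converted into an allow, so the bypass window is limited to the outage itself.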

## Migration Plan

1. **WS3**: Run migration `023_add_analytics_events.sql` → deploy API changes → deploy portal analytics pages
2. **WS4**: Run migration `024_add_tenant_tiers.sql` → deploy tier middleware (with the `TIER_ENFORCEMENT=false` feature flag initially) → smoke test → enable flag
3. **WS6**: No migration required, since the compliance report queries existing tables; deploy endpoint + test suite
4. **Rollback**: Each WS has a feature flag (`ANALYTICS_ENABLED`, `TIER_ENFORCEMENT`, `COMPLIANCE_ENABLED`). Set a flag to `false` to disable that workstream without reverting the deployment.
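
As an illustration of the flag-based rollback in step 4, the three flags could be read from an environment-like map as below. The `readFlags` helper and its parsing rule (anything other than a case-insensitive `true` counts as off) are assumptions, chosen so that a missing or mistyped value leaves the workstream disabled.

```typescript
type FlagName = "ANALYTICS_ENABLED" | "TIER_ENFORCEMENT" | "COMPLIANCE_ENABLED";
type Flags = Record<FlagName, boolean>;

// Parse the three workstream flags from an environment-like map.
// Only a case-insensitive "true" enables a flag (an assumption).
function readFlags(env: Record<string, string | undefined>): Flags {
  const isOn = (name: FlagName): boolean =>
    (env[name] ?? "").trim().toLowerCase() === "true";
  return {
    ANALYTICS_ENABLED: isOn("ANALYTICS_ENABLED"),
    TIER_ENFORCEMENT: isOn("TIER_ENFORCEMENT"),
    COMPLIANCE_ENABLED: isOn("COMPLIANCE_ENABLED"),
  };
}
```

In a Node.js service this would typically be called once at startup with `process.env`, and the tier middleware would be registered only when `TIER_ENFORCEMENT` is on.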

## Open Questions

- None. Scope is fully defined from Phase 5 deferred workstreams. AGNTCY agent-card schema version confirmed as v1.0 per Phase 3 W3C DIDs implementation.