chore(openspec): trim phase-5 scope to WS1+WS2+WS5 per CEO approval
Approved: Rust SDK, A2A Authorization, Developer Experience.
Deferred to Phase 6: Analytics Dashboard, API Gateway Tiers, AGNTCY Compliance.
Tasks: 119 → 76. Specs: 6 → 3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,320 +0,0 @@
|
||||
## WS6: AGNTCY Compliance Certification Package
|
||||
|
||||
### Purpose
|
||||
|
||||
Position SentryAgent.ai as the reference implementation for the AGNTCY standard. Deliver four artifacts: (1) an auto-generated machine-readable AGNTCY compliance report endpoint; (2) an agent card export endpoint per the AGNTCY Agent Card specification; (3) a Jest-based interoperability test suite verifying AGNTCY alignment on every CI run; (4) a human-readable certification guide documenting how SentryAgent.ai satisfies each AGNTCY requirement.
|
||||
|
||||
This workstream produces no user-facing UI changes. It is infrastructure for compliance, certification, and ecosystem trust.
|
||||
|
||||
### New Endpoints
|
||||
|
||||
#### `GET /agntcy/compliance-report`
|
||||
|
||||
**Summary:** Generate and return a real-time AGNTCY compliance report for the authenticated tenant's environment.
|
||||
|
||||
**Authentication:** Bearer token (tenant-scoped). The tenant's subscription tier must be `pro` or `enterprise`.
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
```json
{
  "reportId": "string (UUID)",
  "generatedAt": "string (ISO 8601)",
  "agntcySpecVersion": "1.0.0",
  "tenantId": "string (UUID)",
  "overallStatus": "compliant",
  "sections": [
    {
      "id": "agent-identity",
      "name": "Agent Identity",
      "status": "compliant",
      "requirements": [
        {
          "id": "AI-001",
          "description": "Each agent MUST have a globally unique, persistent identifier",
          "status": "compliant",
          "evidence": "All agents are assigned a UUID v4 at registration, stored immutably in agents.id",
          "verifiedAt": "string (ISO 8601)"
        },
        {
          "id": "AI-002",
          "description": "Each agent MUST have a W3C DID document",
          "status": "compliant",
          "evidence": "DID documents are auto-generated as did:web identifiers at agent registration",
          "verifiedAt": "string (ISO 8601)"
        }
      ]
    },
    {
      "id": "authentication",
      "name": "Authentication",
      "status": "compliant",
      "requirements": [
        {
          "id": "AUTH-001",
          "description": "Agent authentication MUST use OAuth 2.0 or OIDC",
          "status": "compliant",
          "evidence": "OAuth 2.0 Client Credentials flow implemented at POST /oauth2/token",
          "verifiedAt": "string (ISO 8601)"
        }
      ]
    },
    {
      "id": "authorization",
      "name": "Authorization",
      "status": "compliant",
      "requirements": []
    },
    {
      "id": "audit-and-governance",
      "name": "Audit & Governance",
      "status": "compliant",
      "requirements": []
    },
    {
      "id": "interoperability",
      "name": "Interoperability",
      "status": "compliant",
      "requirements": []
    },
    {
      "id": "delegation",
      "name": "Agent-to-Agent Delegation",
      "status": "compliant",
      "requirements": []
    }
  ],
  "summary": {
    "totalRequirements": 24,
    "compliant": 24,
    "nonCompliant": 0,
    "notApplicable": 0
  }
}
```
|
||||
|
||||
**`overallStatus`** values: `"compliant"` | `"partial"` | `"non-compliant"`
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 401 | `UNAUTHORIZED` | Missing or invalid Bearer token |
| 403 | `TIER_REQUIRED` | Compliance report requires Pro or Enterprise tier |
| 429 | `RATE_LIMITED` | Rate limit exceeded |
|
||||
|
||||
**Business Rules:**
|
||||
- Report is generated on demand from live system state (no cache)
- Each requirement's `status` is computed by querying current system configuration (e.g., verify DID documents exist by checking `agents` table, verify audit log is enabled by checking config)
- `agntcySpecVersion` is hardcoded to the AGNTCY spec version the system was last validated against
- An audit log entry is created with `event_type: "compliance.report_generated"`
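
The roll-up from per-requirement results to `summary` and `overallStatus` could look roughly like the sketch below. The spec does not define when a report is `"partial"`, nor the non-passing requirement status strings, so the `RequirementResult` type, the `"non-compliant"` / `"not-applicable"` values, and the roll-up rule are all assumptions made for illustration.

```typescript
// Hypothetical result shape; only "compliant" appears as a requirement status in the spec.
type RequirementStatus = "compliant" | "non-compliant" | "not-applicable";
type OverallStatus = "compliant" | "partial" | "non-compliant";

interface RequirementResult {
  id: string;
  status: RequirementStatus;
}

interface ReportSummary {
  totalRequirements: number;
  compliant: number;
  nonCompliant: number;
  notApplicable: number;
}

// Count requirement results per status to fill the report's `summary` object.
function summarize(results: RequirementResult[]): ReportSummary {
  const count = (s: RequirementStatus): number =>
    results.filter((r) => r.status === s).length;
  return {
    totalRequirements: results.length,
    compliant: count("compliant"),
    nonCompliant: count("non-compliant"),
    notApplicable: count("not-applicable"),
  };
}

// Assumed rule: no failures => compliant; failures but no passes => non-compliant;
// a mix of passes and failures => partial.
function rollUp(summary: ReportSummary): OverallStatus {
  if (summary.nonCompliant === 0) return "compliant";
  if (summary.compliant === 0) return "non-compliant";
  return "partial";
}
```

Keeping the roll-up a pure function of the evaluated results makes the "compliant system" and "partially compliant" unit-test cases straightforward to set up without a database.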
|
||||
|
||||
---
|
||||
|
||||
#### `GET /agents/:id/agent-card`
|
||||
|
||||
**Summary:** Return the AGNTCY-compliant Agent Card for a specific agent. Agent Cards are publicly accessible for public agents and require authentication for private agents.
|
||||
|
||||
**Authentication:** Optional. Required only if the agent's `is_public` is `false`.
|
||||
|
||||
**Path Parameter:**
|
||||
|
||||
| Parameter | Type | Description |
|---|---|---|
| `id` | string (UUID) | Agent ID |
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
|
||||
Per the AGNTCY Agent Card specification:
|
||||
```json
{
  "agntcyVersion": "1.0",
  "type": "agent-card",
  "agent": {
    "id": "string (UUID)",
    "name": "string",
    "description": "string | null",
    "did": "did:web:sentryagent.ai:agents:abc123",
    "capabilities": ["string"],
    "version": "string",
    "publisher": {
      "tenantId": "string (UUID)",
      "name": "string"
    },
    "endpoints": {
      "tokenEndpoint": "https://api.sentryagent.ai/oauth2/token",
      "delegationEndpoint": "https://api.sentryagent.ai/oauth2/token/delegate"
    },
    "authentication": {
      "schemes": ["oauth2_client_credentials"],
      "tokenEndpoint": "https://api.sentryagent.ai/oauth2/token"
    },
    "governance": {
      "auditLogEnabled": true,
      "credentialRotationPolicy": "manual",
      "complianceStandards": ["AGNTCY-1.0", "OAuth2-RFC6749", "W3C-DID"]
    },
    "metadata": {}
  },
  "issuedAt": "string (ISO 8601)",
  "expiresAt": "string (ISO 8601)"
}
```
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 401 | `UNAUTHORIZED` | Agent is private and no Bearer token provided |
| 403 | `FORBIDDEN` | Agent is private and authenticated tenant does not own it |
| 404 | `AGENT_NOT_FOUND` | No agent with the given ID |
| 429 | `RATE_LIMITED` | Rate limit exceeded |
|
||||
|
||||
**Business Rules:**
|
||||
- Public agents (`is_public: true`) return agent card without authentication
- Private agents require the owning tenant's Bearer token
- Agent card `expiresAt` is `issuedAt + 24 hours` (cards are short-lived; consumers should re-fetch daily)
- `complianceStandards` array is sourced from system config, not per-agent configuration
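
A minimal sketch of the visibility rules and the 24-hour validity window, written as pure functions. The `AgentRecord` shape and the function names are hypothetical, and checking for a missing agent before checking authentication is an assumed ordering that the spec does not state.

```typescript
// Hypothetical projection of an agents row; only is_public and ownership matter here.
interface AgentRecord {
  id: string;
  tenantId: string;
  isPublic: boolean;
}

type CardAccess =
  | { status: 200 }
  | { status: 401; code: "UNAUTHORIZED" }
  | { status: 403; code: "FORBIDDEN" }
  | { status: 404; code: "AGENT_NOT_FOUND" };

// callerTenantId is undefined when the request carried no valid Bearer token.
function resolveCardAccess(
  agent: AgentRecord | undefined,
  callerTenantId: string | undefined
): CardAccess {
  if (!agent) return { status: 404, code: "AGENT_NOT_FOUND" };
  if (agent.isPublic) return { status: 200 };
  if (callerTenantId === undefined) return { status: 401, code: "UNAUTHORIZED" };
  if (callerTenantId !== agent.tenantId) return { status: 403, code: "FORBIDDEN" };
  return { status: 200 };
}

const CARD_TTL_MS = 24 * 60 * 60 * 1000;

// expiresAt = issuedAt + 24 hours, both serialized as ISO 8601 strings.
function cardValidity(issuedAt: Date): { issuedAt: string; expiresAt: string } {
  return {
    issuedAt: issuedAt.toISOString(),
    expiresAt: new Date(issuedAt.getTime() + CARD_TTL_MS).toISOString(),
  };
}
```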
|
||||
|
||||
---
|
||||
|
||||
### AGNTCY Interoperability Test Suite
|
||||
|
||||
**File:** `tests/agntcy/interoperability.test.ts`
|
||||
|
||||
A Jest test suite that verifies AGNTCY alignment on every CI run. Tests run against a live API instance (reads `AGENTIDP_API_URL` from environment).
|
||||
|
||||
**Test categories and cases:**
|
||||
|
||||
```typescript
// AGNTCY-AI-001: Agent identity uniqueness
test('each registered agent receives a unique UUID', ...)
test('agent UUID is immutable after registration', ...)

// AGNTCY-AI-002: W3C DID documents
test('registered agent has a valid did:web DID', ...)
test('DID document resolves via GET /agents/:id', ...)

// AGNTCY-AUTH-001: OAuth 2.0 token issuance
test('POST /oauth2/token returns access_token and token_type: bearer', ...)
test('access token is a valid JWT with correct claims', ...)
test('expired token is rejected with 401', ...)

// AGNTCY-AUTH-002: OIDC compliance
test('GET /.well-known/openid-configuration returns valid OIDC discovery document', ...)
test('JWKS endpoint returns valid JWK Set', ...)

// AGNTCY-AUTHZ-001: Scope-based access control
test('token with agent:read scope cannot call agent:write operations', ...)
test('scopes are included in JWT payload', ...)

// AGNTCY-DEL-001: Agent-to-Agent delegation
test('POST /oauth2/token/delegate creates a valid delegation chain', ...)
test('delegated scopes cannot exceed delegator scopes', ...)
test('POST /oauth2/token/verify-delegation returns valid: true for active chain', ...)
test('POST /oauth2/token/verify-delegation returns valid: false for expired chain', ...)

// AGNTCY-AUDIT-001: Immutable audit logs
test('every token issuance creates an audit log entry', ...)
test('audit log entries cannot be deleted via API', ...)

// AGNTCY-GOV-001: Agent lifecycle governance
test('credential rotation is logged in audit log', ...)
test('agent deletion logs deletion event in audit log', ...)

// AGNTCY-INTER-001: Agent Card export
test('GET /agents/:id/agent-card returns valid AGNTCY Agent Card', ...)
test('Agent Card contains required agntcyVersion, did, capabilities fields', ...)

// AGNTCY-COMP-001: Compliance report
test('GET /agntcy/compliance-report returns compliant status', ...)
test('compliance report covers all 6 AGNTCY sections', ...)
test('compliance report totalRequirements >= 24', ...)
```
|
||||
|
||||
**Running the suite:**
|
||||
```bash
# In CI (requires a live API):
AGENTIDP_API_URL=http://localhost:3000 npm run test:agntcy

# Script entry added to the "scripts" section of package.json (JSON, not a shell command):
#   "test:agntcy": "jest --testPathPattern=tests/agntcy --forceExit"
```
|
||||
|
||||
---
|
||||
|
||||
### AGNTCY Certification Guide
|
||||
|
||||
**File:** `docs/agntcy/certification-guide.md`
|
||||
|
||||
A markdown document structured as follows:
|
||||
1. **Overview**: What AGNTCY certification means and how SentryAgent.ai achieves it
2. **Requirement Mapping**: Table mapping each AGNTCY requirement ID to the SentryAgent.ai implementation (endpoint, service, or config)
3. **Running the Compliance Report**: Step-by-step guide to generating and interpreting the compliance report
4. **Agent Card Usage**: How to retrieve, cache, and use Agent Cards in multi-agent workflows
5. **Self-Certification Checklist**: Checklist for operators deploying self-hosted SentryAgent.ai to verify their instance's compliance
6. **Submitting for Official AGNTCY Certification**: Links and instructions for the Linux Foundation AGNTCY certification program
|
||||
|
||||
---
|
||||
|
||||
### New Source Files
|
||||
|
||||
| File | Description |
|---|---|
| `src/services/ComplianceService.ts` | Business logic: query system state, evaluate each AGNTCY requirement, build report |
| `src/controllers/ComplianceController.ts` | HTTP handlers for compliance report and agent card endpoints |
| `src/routes/agntcy.ts` | Express router: `GET /agntcy/compliance-report`, `GET /agents/:id/agent-card` |
| `src/types/compliance.ts` | TypeScript interfaces: `ComplianceReport`, `ComplianceSection`, `ComplianceRequirement`, `AgentCard` |
| `src/config/agntcyRequirements.ts` | Static array of AGNTCY requirement definitions (id, description, evaluator function reference) |
| `tests/agntcy/interoperability.test.ts` | Jest interoperability test suite |
| `docs/agntcy/certification-guide.md` | Human-readable certification guide |
|
||||
|
||||
### Modified Source Files
|
||||
|
||||
| File | Change |
|---|---|
| `src/routes/index.ts` | Register `agntcy` router |
| `src/routes/agents.ts` | Add `GET /agents/:id/agent-card` route here, or register it via the `agntcy` router instead (the route is agent-scoped, so it must be defined in exactly one of the two) |
| `package.json` (API) | Add `"test:agntcy"` script |
| `docs/openapi.yaml` | Add `GET /agntcy/compliance-report` and `GET /agents/:id/agent-card` endpoints |
|
||||
|
||||
### `ComplianceService` Interface
|
||||
|
||||
```typescript
interface IComplianceService {
  /**
   * Generate a live AGNTCY compliance report for the given tenant.
   * Evaluates all registered AGNTCY requirements against current system state.
   */
  generateComplianceReport(tenantId: string): Promise<ComplianceReport>;

  /**
   * Generate an AGNTCY Agent Card for a specific agent.
   */
  generateAgentCard(agentId: string): Promise<AgentCard>;
}
```
|
||||
|
||||
### Prometheus Metrics
|
||||
|
||||
| Metric | Type | Labels | Description |
|---|---|---|---|
| `agentidp_compliance_reports_generated_total` | Counter | `tenant_id` | Total compliance reports generated |
| `agentidp_compliance_report_duration_ms` | Histogram | (none) | Time to generate compliance report |
| `agentidp_agent_cards_served_total` | Counter | `visibility` (public/private) | Agent cards served by visibility |
|
||||
|
||||
### Feature Flag
|
||||
|
||||
`AGNTCY_ENABLED` (default: `true`). When `false`, all `/agntcy/` routes and `GET /agents/:id/agent-card` return HTTP 404.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- `GET /agntcy/compliance-report` returns a report with `overallStatus: "compliant"` on a correctly configured instance
- Report contains all 6 sections: agent-identity, authentication, authorization, audit-and-governance, interoperability, delegation
- Report `totalRequirements >= 24`
- `GET /agents/:id/agent-card` returns a valid AGNTCY Agent Card with all required fields
- Agent Card is accessible without auth for public agents
- Agent Card requires owning tenant's auth for private agents
- All interoperability test cases pass against a live API instance (the suite outline lists 24 cases; the target is 25 or more)
- `npm run test:agntcy` exits 0 on a correctly configured instance
- `docs/agntcy/certification-guide.md` is complete, with no TODOs and no placeholders
- Unit tests cover: compliance report generation (compliant system, partially compliant), agent card generation (public agent, private agent)
|
||||
@@ -1,279 +0,0 @@
|
||||
## WS3: Advanced Analytics Dashboard
|
||||
|
||||
### Purpose
|
||||
|
||||
Give paying tenants actionable visibility into their agent usage patterns. Analytics surface four dimensions: agent activity over time (heatmap), token issuance frequency and volume (trends), credential rotation frequency (rotation frequency table), and per-endpoint API call patterns (call patterns breakdown). Data is pre-aggregated nightly from the existing `usage_events` table into a new `analytics_daily_aggregates` table. Analytics are rendered in a new Analytics tab in the existing React web dashboard.
|
||||
|
||||
### New Endpoints
|
||||
|
||||
#### `GET /analytics/usage-summary`
|
||||
|
||||
**Summary:** Return a high-level usage summary for the authenticated tenant over a date range.
|
||||
|
||||
**Authentication:** Bearer token (tenant-scoped).
|
||||
|
||||
**Query Parameters:**
|
||||
|
||||
| Parameter | Type | Required | Default | Constraints |
|---|---|---|---|---|
| `from` | string (YYYY-MM-DD) | no | 30 days ago | Must be <= `to` |
| `to` | string (YYYY-MM-DD) | no | today | Must be <= today |
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
```json
{
  "tenantId": "string (UUID)",
  "period": {
    "from": "string (YYYY-MM-DD)",
    "to": "string (YYYY-MM-DD)"
  },
  "summary": {
    "totalApiCalls": 84320,
    "totalTokenIssuances": 12400,
    "totalCredentialRotations": 48,
    "activeAgentCount": 23,
    "averageDailyApiCalls": 2810,
    "peakDailyApiCalls": 5102,
    "peakDate": "2026-03-28"
  }
}
```
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 400 | `INVALID_DATE_RANGE` | `from` > `to`, or date range exceeds 365 days |
| 401 | `UNAUTHORIZED` | Missing or invalid Bearer token |
| 403 | `ANALYTICS_NOT_AVAILABLE` | Tenant is on free tier; analytics require Pro or Enterprise |
| 429 | `RATE_LIMITED` | Rate limit exceeded |
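
The `from` / `to` defaults and the `INVALID_DATE_RANGE` checks could be centralized in one helper shared by all three analytics endpoints, with the maximum range passed in per endpoint. The sketch below is illustrative only: `resolveRange` and `parseDay` are hypothetical names, dates are interpreted as UTC days, and "exceeds N days" is assumed to mean that `to - from` is greater than N days.

```typescript
const DAY_MS = 86_400_000;
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

type RangeResult =
  | { ok: true; from: Date; to: Date }
  | { ok: false; code: "INVALID_DATE_RANGE" };

// Parse a YYYY-MM-DD string as a UTC day; reject malformed or impossible dates.
function parseDay(value: string): Date | null {
  if (!DATE_RE.test(value)) return null;
  const d = new Date(`${value}T00:00:00Z`);
  if (Number.isNaN(d.getTime())) return null;
  // A date such as 2026-02-30 may roll over instead of failing, so round-trip it.
  return d.toISOString().slice(0, 10) === value ? d : null;
}

function resolveRange(
  fromRaw: string | undefined,
  toRaw: string | undefined,
  maxDays: number,
  now: Date
): RangeResult {
  const invalid: RangeResult = { ok: false, code: "INVALID_DATE_RANGE" };
  const today = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate())
  );
  // Defaults from the parameter table: `to` = today, `from` = 30 days ago.
  const to = toRaw === undefined ? today : parseDay(toRaw);
  const from =
    fromRaw === undefined ? new Date(today.getTime() - 30 * DAY_MS) : parseDay(fromRaw);
  if (!from || !to) return invalid;
  if (from.getTime() > to.getTime()) return invalid;
  if (to.getTime() > today.getTime()) return invalid;
  if ((to.getTime() - from.getTime()) / DAY_MS > maxDays) return invalid;
  return { ok: true, from, to };
}
```

Passing `now` in as an argument, rather than reading the clock inside the helper, keeps the function deterministic under test.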
|
||||
|
||||
---
|
||||
|
||||
#### `GET /analytics/agent-activity`
|
||||
|
||||
**Summary:** Return per-agent daily activity counts for heatmap rendering.
|
||||
|
||||
**Authentication:** Bearer token (tenant-scoped).
|
||||
|
||||
**Query Parameters:**
|
||||
|
||||
| Parameter | Type | Required | Default | Constraints |
|---|---|---|---|---|
| `from` | string (YYYY-MM-DD) | no | 30 days ago | Must be <= `to` |
| `to` | string (YYYY-MM-DD) | no | today | Max range: 90 days |
| `agentId` | string (UUID) | no | (all agents) | Filter to a single agent |
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
```json
{
  "tenantId": "string (UUID)",
  "period": {
    "from": "string (YYYY-MM-DD)",
    "to": "string (YYYY-MM-DD)"
  },
  "agents": [
    {
      "agentId": "string (UUID)",
      "agentName": "string",
      "dailyActivity": [
        {
          "date": "2026-03-01",
          "apiCalls": 342,
          "tokenIssuances": 12,
          "credentialRotations": 0
        }
      ]
    }
  ]
}
```
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 400 | `INVALID_DATE_RANGE` | `from` > `to`, or date range exceeds 90 days |
| 401 | `UNAUTHORIZED` | Missing or invalid Bearer token |
| 403 | `ANALYTICS_NOT_AVAILABLE` | Free tier; requires Pro or Enterprise |
| 404 | `AGENT_NOT_FOUND` | `agentId` filter specified but agent does not belong to tenant |
| 429 | `RATE_LIMITED` | Rate limit exceeded |
|
||||
|
||||
---
|
||||
|
||||
#### `GET /analytics/token-trends`
|
||||
|
||||
**Summary:** Return daily token issuance counts and success/failure breakdown for trend charts.
|
||||
|
||||
**Authentication:** Bearer token (tenant-scoped).
|
||||
|
||||
**Query Parameters:**
|
||||
|
||||
| Parameter | Type | Required | Default | Constraints |
|---|---|---|---|---|
| `from` | string (YYYY-MM-DD) | no | 30 days ago | Must be <= `to` |
| `to` | string (YYYY-MM-DD) | no | today | Max range: 365 days |
| `granularity` | string | no | `day` | Enum: `day`, `week` |
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
```json
{
  "tenantId": "string (UUID)",
  "period": {
    "from": "string (YYYY-MM-DD)",
    "to": "string (YYYY-MM-DD)"
  },
  "granularity": "day",
  "dataPoints": [
    {
      "date": "2026-03-01",
      "totalIssuances": 420,
      "successfulIssuances": 415,
      "failedIssuances": 5,
      "uniqueAgents": 8
    }
  ]
}
```
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 400 | `INVALID_DATE_RANGE` | `from` > `to`, or date range exceeds 365 days |
| 400 | `INVALID_GRANULARITY` | `granularity` is not `day` or `week` |
| 401 | `UNAUTHORIZED` | Missing or invalid Bearer token |
| 403 | `ANALYTICS_NOT_AVAILABLE` | Free tier; requires Pro or Enterprise |
| 429 | `RATE_LIMITED` | Rate limit exceeded |
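
For `granularity=week`, one plausible approach is to roll the daily data points up into weekly buckets. The spec does not say where a week starts or how a bucket is labelled, so the sketch below assumes Monday-start weeks in UTC, labelled with the Monday's date. It deliberately leaves out `uniqueAgents`: a distinct count cannot be obtained by summing daily values and would have to be recomputed from the underlying data.

```typescript
// Subset of the token-trends data point; uniqueAgents is omitted on purpose (see above).
interface TokenDataPoint {
  date: string; // YYYY-MM-DD
  totalIssuances: number;
  successfulIssuances: number;
  failedIssuances: number;
}

// Assumption: weeks start on Monday (UTC) and are labelled with that Monday's date.
function weekStart(date: string): string {
  const d = new Date(`${date}T00:00:00Z`);
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}

// Sum daily points into weekly buckets, returned in ascending date order.
function toWeekly(daily: TokenDataPoint[]): TokenDataPoint[] {
  const buckets = new Map<string, TokenDataPoint>();
  for (const p of daily) {
    const key = weekStart(p.date);
    const b = buckets.get(key) ?? {
      date: key,
      totalIssuances: 0,
      successfulIssuances: 0,
      failedIssuances: 0,
    };
    b.totalIssuances += p.totalIssuances;
    b.successfulIssuances += p.successfulIssuances;
    b.failedIssuances += p.failedIssuances;
    buckets.set(key, b);
  }
  return Array.from(buckets.values()).sort((a, b) => a.date.localeCompare(b.date));
}
```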
|
||||
|
||||
---
|
||||
|
||||
### Database Schema Changes
|
||||
|
||||
#### Migration: `009_add_analytics_aggregates.sql`
|
||||
|
||||
```sql
CREATE TABLE analytics_daily_aggregates (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
  agent_id UUID REFERENCES agents(id) ON DELETE SET NULL, -- NULL = tenant-wide aggregate
  date DATE NOT NULL,
  metric_type VARCHAR(64) NOT NULL, -- 'api_calls' | 'token_issuances' | 'credential_rotations' | 'token_failures'
  count BIGINT NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  CONSTRAINT uq_daily_aggregate UNIQUE (tenant_id, agent_id, date, metric_type)
);

-- Index for analytics queries (tenant + date range)
CREATE INDEX idx_analytics_tenant_date ON analytics_daily_aggregates(tenant_id, date);
CREATE INDEX idx_analytics_agent_date ON analytics_daily_aggregates(agent_id, date) WHERE agent_id IS NOT NULL;
```
|
||||
|
||||
#### Nightly Aggregation Job
|
||||
|
||||
A `node-cron` job runs at `00:05 UTC` daily inside the Express API process. It executes an upsert query aggregating the previous day's `usage_events` rows into `analytics_daily_aggregates`. The job is idempotent — running it twice for the same date produces no duplicates (upsert on the unique constraint).
|
||||
|
||||
Job logic (pseudocode):
|
||||
```
1. Compute target_date = yesterday (UTC)
2. SELECT tenant_id, agent_id, metric_type, SUM(count)
   FROM usage_events
   WHERE date = target_date
   GROUP BY tenant_id, agent_id, metric_type
3. UPSERT INTO analytics_daily_aggregates
   ON CONFLICT (tenant_id, agent_id, date, metric_type)
   DO UPDATE SET count = EXCLUDED.count, updated_at = NOW()
```
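
The idempotency property can be illustrated with an in-memory model of the job. This is a sketch, not the actual job: `UsageEvent`, `previousUtcDay`, `aggregateDay`, and the `Map` standing in for `analytics_daily_aggregates` are all hypothetical, and the real implementation would issue SQL instead.

```typescript
// Hypothetical in-memory stand-in for a usage_events row.
interface UsageEvent {
  tenantId: string;
  agentId: string | null; // null = tenant-wide
  date: string; // YYYY-MM-DD (UTC)
  metricType: string;
  count: number;
}

// Step 1: target date is the previous UTC day relative to the job's run time.
function previousUtcDay(now: Date): string {
  const d = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - 1)
  );
  return d.toISOString().slice(0, 10);
}

// Stand-in for the aggregates table, keyed like the unique constraint.
type AggregateStore = Map<string, number>;

// Steps 2 and 3: group and sum the target day's events, then upsert the totals.
function aggregateDay(
  events: UsageEvent[],
  targetDate: string,
  store: AggregateStore
): void {
  const totals = new Map<string, number>();
  for (const e of events) {
    if (e.date !== targetDate) continue;
    const key = [e.tenantId, e.agentId ?? "", e.date, e.metricType].join("|");
    totals.set(key, (totals.get(key) ?? 0) + e.count);
  }
  // Overwrite rather than add, mirroring DO UPDATE SET count = EXCLUDED.count.
  totals.forEach((count, key) => {
    store.set(key, count);
  });
}
```

Overwriting the stored count, rather than incrementing it, is what makes a second run for the same date a no-op. The sketch maps a null `agentId` to an empty string so that tenant-wide rows collide on the same key. PostgreSQL's default `UNIQUE` semantics treat NULLs as distinct, so the table itself may need `NULLS NOT DISTINCT` (PostgreSQL 15 and later) or an equivalent for `ON CONFLICT` to match tenant-wide rows.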
|
||||
|
||||
### New Source Files
|
||||
|
||||
| File | Description |
|---|---|
| `src/services/AnalyticsService.ts` | Business logic: query aggregates, build response shapes, Redis caching |
| `src/controllers/AnalyticsController.ts` | HTTP handlers for analytics endpoints |
| `src/routes/analytics.ts` | Express router for `/analytics/` prefix |
| `src/jobs/analyticsAggregation.ts` | `node-cron` job that aggregates usage_events nightly |
| `src/types/analytics.ts` | TypeScript interfaces: `UsageSummary`, `AgentActivityResponse`, `TokenTrendsResponse`, `DailyAggregate` |
| `dashboard/src/pages/Analytics.tsx` | New Analytics tab in existing React dashboard |
| `dashboard/src/components/charts/AgentHeatmap.tsx` | Heatmap component using `recharts` `ResponsiveContainer` + custom cells |
| `dashboard/src/components/charts/TokenTrendsChart.tsx` | Line chart of token issuance over time using `recharts` `LineChart` |
| `dashboard/src/components/charts/RotationFrequencyTable.tsx` | Sortable table of credential rotation counts per agent |
| `dashboard/src/api/analyticsApi.ts` | Typed fetch functions for analytics endpoints |
|
||||
|
||||
### Modified Source Files
|
||||
|
||||
| File | Change |
|---|---|
| `src/app.ts` | Register `analytics` router; start nightly aggregation cron job |
| `src/infrastructure/migrations/` | Add `009_add_analytics_aggregates.sql` |
| `dashboard/src/App.tsx` | Add Analytics route and nav link |
| `package.json` (API) | Add `node-cron` dependency |
| `package.json` (dashboard) | Add `recharts`, `date-fns` dependencies |
| `docs/openapi.yaml` | Add analytics endpoints |
|
||||
|
||||
### Redis Caching
|
||||
|
||||
Analytics responses are cached in Redis under keys of the form `analytics:{tenantId}:{endpoint}:{queryHash}`. TTL: 5 minutes for agent-activity and token-trends; 60 seconds for usage-summary. Entries expire by TTL only; there is no explicit invalidation, so the first request after expiry recomputes the response and repopulates the cache.
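
A sketch of how the cache key and TTL might be derived. The spec does not say how `queryHash` is computed, so the canonicalization (sorted `key=value` pairs) and the FNV-1a hash used here are assumptions, chosen only to keep the example dependency-free; any stable hash would serve.

```typescript
type AnalyticsEndpoint = "usage-summary" | "agent-activity" | "token-trends";

// TTLs from the caching rules: 60 s for usage-summary, 5 min for the other two.
const CACHE_TTL_SECONDS: Record<AnalyticsEndpoint, number> = {
  "usage-summary": 60,
  "agent-activity": 300,
  "token-trends": 300,
};

// FNV-1a (32-bit) as a stand-in hash, rendered as 8 hex characters.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Sort parameters so that the same query always maps to the same key,
// regardless of the order in which the client sent them.
function cacheKey(
  tenantId: string,
  endpoint: AnalyticsEndpoint,
  query: Record<string, string | undefined>
): string {
  const canonical = Object.keys(query)
    .filter((k) => query[k] !== undefined)
    .sort()
    .map((k) => `${k}=${query[k]}`)
    .join("&");
  return `analytics:${tenantId}:${endpoint}:${fnv1a(canonical)}`;
}
```

Including `tenantId` in the key, as the spec's format does, keeps one tenant's cached response from ever being served to another.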
|
||||
|
||||
### `AnalyticsService` Interface
|
||||
|
||||
```typescript
interface IAnalyticsService {
  /**
   * Return a high-level usage summary for a tenant over a date range.
   */
  getUsageSummary(tenantId: string, from: Date, to: Date): Promise<UsageSummary>;

  /**
   * Return per-agent daily activity data for heatmap rendering.
   */
  getAgentActivity(
    tenantId: string,
    from: Date,
    to: Date,
    agentId?: string
  ): Promise<AgentActivityResponse>;

  /**
   * Return daily token issuance trends with success/failure breakdown.
   */
  getTokenTrends(
    tenantId: string,
    from: Date,
    to: Date,
    granularity: 'day' | 'week'
  ): Promise<TokenTrendsResponse>;
}
```
|
||||
|
||||
### Prometheus Metrics
|
||||
|
||||
| Metric | Type | Labels | Description |
|---|---|---|---|
| `agentidp_analytics_query_duration_ms` | Histogram | `endpoint` | Analytics query latency (before cache) |
| `agentidp_analytics_cache_hits_total` | Counter | `endpoint` | Analytics Redis cache hits |
| `agentidp_analytics_cache_misses_total` | Counter | `endpoint` | Analytics Redis cache misses |
| `agentidp_analytics_aggregation_job_duration_ms` | Gauge | (none) | Nightly aggregation job runtime |
| `agentidp_analytics_aggregation_job_last_run` | Gauge | (none) | Unix timestamp of last successful aggregation job run |
|
||||
|
||||
### Feature Flags
|
||||
|
||||
| Variable | Default | Description |
|---|---|---|
| `ANALYTICS_ENABLED` | `true` | When `false`, all `/analytics/` routes return HTTP 404 |
| `ANALYTICS_FREE_TIER` | `false` | When `true`, free tier tenants can access analytics (for beta/testing) |
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- `GET /analytics/usage-summary` returns correct aggregate counts for a date range
- `GET /analytics/agent-activity` returns per-agent daily rows matching `analytics_daily_aggregates`
- `GET /analytics/token-trends` returns daily and weekly granularity correctly
- All three endpoints return HTTP 403 for free-tier tenants (when `ANALYTICS_FREE_TIER=false`)
- Date range validation rejects `from > to` with HTTP 400
- Nightly aggregation job runs idempotently: running twice for the same date produces no duplicates
- Analytics responses are cached in Redis: a second identical request within the TTL does not hit the DB
- Dashboard Analytics tab renders heatmap, trend chart, and rotation table with mock data in Storybook
- Unit test coverage >= 80% on `AnalyticsService`
- Integration tests cover: summary, activity, trends (daily), trends (weekly), free-tier rejection, invalid date range
|
||||
@@ -1,276 +0,0 @@
|
||||
## WS4: Public API Gateway & Rate Limiting SaaS
|
||||
|
||||
### Purpose
|
||||
|
||||
Replace the single flat rate limit (Phase 4) with a multi-tier enforcement model where each tenant's rate limits are determined by their subscription tier (`free` | `pro` | `enterprise`). Expose the tier definitions publicly via `GET /tiers` so developers can understand limits before registering. Add `POST /billing/upgrade` so tenants can self-service upgrade their tier without contacting support.
|
||||
|
||||
This workstream closes the gap between Phase 4's flat rate limiter and a proper commercial SaaS gateway model.
|
||||
|
||||
### New Endpoints
|
||||
|
||||
#### `GET /tiers`
|
||||
|
||||
**Summary:** Return the current tier definitions including rate limits, feature flags, and pricing.
|
||||
|
||||
**Authentication:** None (public endpoint).
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
```json
{
  "tiers": [
    {
      "id": "free",
      "name": "Free",
      "price": {
        "monthly": 0,
        "currency": "USD"
      },
      "limits": {
        "registeredAgents": 10,
        "apiCallsPerDay": 1000,
        "tokenIssuancesPerDay": 200,
        "rateLimitPerMinute": 60,
        "rateLimitBurst": 10,
        "auditLogRetentionDays": 30
      },
      "features": {
        "marketplace": true,
        "githubActions": true,
        "analytics": false,
        "webhooks": false,
        "sso": false,
        "sla": false,
        "customDomain": false,
        "prioritySupport": false
      }
    },
    {
      "id": "pro",
      "name": "Pro",
      "price": {
        "monthly": 49,
        "currency": "USD"
      },
      "limits": {
        "registeredAgents": 100,
        "apiCallsPerDay": 50000,
        "tokenIssuancesPerDay": 10000,
        "rateLimitPerMinute": 600,
        "rateLimitBurst": 100,
        "auditLogRetentionDays": 90
      },
      "features": {
        "marketplace": true,
        "githubActions": true,
        "analytics": true,
        "webhooks": true,
        "sso": false,
        "sla": false,
        "customDomain": false,
        "prioritySupport": false
      }
    },
    {
      "id": "enterprise",
      "name": "Enterprise",
      "price": {
        "monthly": null,
        "currency": "USD",
        "note": "Contact sales"
      },
      "limits": {
        "registeredAgents": null,
        "apiCallsPerDay": null,
        "tokenIssuancesPerDay": null,
        "rateLimitPerMinute": 6000,
        "rateLimitBurst": 1000,
        "auditLogRetentionDays": 365
      },
      "features": {
        "marketplace": true,
        "githubActions": true,
        "analytics": true,
        "webhooks": true,
        "sso": true,
        "sla": true,
        "customDomain": true,
        "prioritySupport": true
      }
    }
  ]
}
```
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 429 | `RATE_LIMITED` | Rate limit exceeded (even unauthenticated endpoints have a global IP-based limit) |
|
||||
|
||||
**Notes:**
|
||||
- `null` limits mean unlimited
- Tier definitions are sourced from a static configuration object in the codebase, not a database table
- The response is cached at the HTTP layer with `Cache-Control: public, max-age=3600`
|
||||
|
||||
---
|
||||
|
||||
#### `POST /billing/upgrade`
|
||||
|
||||
**Summary:** Initiate a self-service tier upgrade for the authenticated tenant. Creates a Stripe Checkout session for the target tier.
|
||||
|
||||
**Authentication:** Bearer token (tenant-scoped).
|
||||
|
||||
**Request Body** (`application/json`):
|
||||
```json
{
  "targetTier": "pro"
}
```
|
||||
|
||||
| Field | Type | Required | Constraints |
|---|---|---|---|
| `targetTier` | string | yes | Enum: `pro`, `enterprise` (cannot downgrade via this endpoint) |
|
||||
|
||||
**Response 200** (`application/json`):
|
||||
```json
{
  "checkoutUrl": "https://checkout.stripe.com/pay/cs_...",
  "sessionId": "cs_...",
  "targetTier": "pro",
  "expiresAt": "string (ISO 8601)"
}
```
|
||||
|
||||
**Error Responses:**
|
||||
|
||||
| Status | Code | Description |
|---|---|---|
| 400 | `ALREADY_ON_TIER` | Tenant is already subscribed to `targetTier` |
| 400 | `INVALID_TARGET_TIER` | `targetTier` is not a valid upgradeable tier |
| 400 | `DOWNGRADE_NOT_SUPPORTED` | `targetTier` is lower than the tenant's current tier |
| 401 | `UNAUTHORIZED` | Missing or invalid Bearer token |
| 422 | `STRIPE_ERROR` | Stripe API returned an error creating the Checkout session |
| 429 | `RATE_LIMITED` | Rate limit exceeded |
|
||||
|
||||
**Business Rules:**
|
||||
- This endpoint extends the existing `BillingService`: a new `upgradeTier(tenantId, targetTier)` method creates a Stripe Checkout session with the correct Stripe Price ID for the target tier
- The Stripe Price IDs per tier are configured via environment variables: `STRIPE_PRICE_ID_PRO`, `STRIPE_PRICE_ID_ENTERPRISE`
- After payment, Stripe sends `customer.subscription.created` webhook → existing webhook handler updates `tenant_subscriptions`
- The `TierRateLimiter` reads the updated tier from `tenant_subscriptions` within 60 seconds (Redis cache TTL for tier lookup)
- Downgrade is handled via the existing Stripe customer portal and is not exposed as an API endpoint
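
Before any Stripe call is made, the three 400-level checks can be expressed as a pure function. The sketch below is illustrative: `checkUpgrade`, `isUpgradeTarget`, and `TIER_RANK` are hypothetical names, and the order in which the checks run is an assumption.

```typescript
type TierName = "free" | "pro" | "enterprise";

// Ordering used to tell upgrades from downgrades.
const TIER_RANK: Record<TierName, number> = { free: 0, pro: 1, enterprise: 2 };

type UpgradeCheck =
  | { ok: true; targetTier: "pro" | "enterprise" }
  | {
      ok: false;
      status: 400;
      code: "INVALID_TARGET_TIER" | "ALREADY_ON_TIER" | "DOWNGRADE_NOT_SUPPORTED";
    };

// Only `pro` and `enterprise` are accepted by the request body enum.
function isUpgradeTarget(value: unknown): value is "pro" | "enterprise" {
  return value === "pro" || value === "enterprise";
}

function checkUpgrade(currentTier: TierName, targetTier: unknown): UpgradeCheck {
  if (!isUpgradeTarget(targetTier)) {
    return { ok: false, status: 400, code: "INVALID_TARGET_TIER" };
  }
  if (targetTier === currentTier) {
    return { ok: false, status: 400, code: "ALREADY_ON_TIER" };
  }
  if (TIER_RANK[targetTier] < TIER_RANK[currentTier]) {
    return { ok: false, status: 400, code: "DOWNGRADE_NOT_SUPPORTED" };
  }
  return { ok: true, targetTier };
}
```

Only when the check passes would `upgradeTier` go on to create the Checkout session with the Price ID configured for the target tier.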
|
||||
|
||||
---
|
||||
|
||||
### `TierRateLimiter` Middleware

`TierRateLimiter` replaces the single `RateLimiterRedis` middleware on all authenticated routes. It reads the tenant's current tier, looks up that tier's rate limit configuration, and enforces it with per-tenant Redis keys via `rate-limiter-flexible`.

**Middleware behavior:**
1. Extract `tenantId` from the authenticated request context
2. Look up tier from Redis cache key `tier:{tenantId}` (TTL: 60 seconds)
3. On cache miss: query `tenant_subscriptions` for `tenantId`, cache result for 60s
4. Look up rate limit configuration for the tier from the static tier config
5. Apply `rate-limiter-flexible` with key `rl:{tier}:{tenantId}` and tier-specific limits
6. On rate limit exceeded: return HTTP 429 with headers:
   - `X-RateLimit-Limit: <limit>`
   - `X-RateLimit-Remaining: <remaining>`
   - `X-RateLimit-Reset: <unix timestamp>`
   - `Retry-After: <seconds>`
7. For each rejected request, increment the `agentidp_rate_limit_hits_total` counter (labels: `tier`, `tenant_id`, `endpoint`)

**Unauthenticated routes:** Continue to use the existing flat `RateLimiterRedis` with IP-based keys (unchanged from Phase 4).

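Steps 2, 3, and 5 of the authenticated-route flow amount to a cache-aside lookup followed by key construction. The sketch below illustrates that pattern under stated assumptions: `TierLookup`, `rateLimitKey`, and the injected `loadTierFromDb` and `now` functions are hypothetical, and a plain in-memory object stands in for Redis so the example runs without a server. A real middleware would use asynchronous Redis and database calls.

```typescript
type TierName = 'free' | 'pro' | 'enterprise';

const TIER_CACHE_TTL_MS = 60 * 1000; // 60-second TTL from the spec

interface CacheEntry {
  tier: TierName;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for the Redis tier cache (assumption for illustration only).
class TierLookup {
  private cache: { [key: string]: CacheEntry } = {};
  hits = 0;   // would map to agentidp_tier_cache_hits_total
  misses = 0; // would map to agentidp_tier_cache_misses_total

  constructor(
    // Stand-in for the `tenant_subscriptions` query.
    private readonly loadTierFromDb: (tenantId: string) => TierName,
    // Injectable clock so expiry can be exercised deterministically.
    private readonly now: () => number = () => Date.now(),
  ) {}

  getTier(tenantId: string): TierName {
    const key = `tier:${tenantId}`;
    const entry = this.cache[key];
    if (entry !== undefined && entry.expiresAt > this.now()) {
      this.hits += 1;
      return entry.tier;
    }
    // Cache miss or expired entry: fall back to the database and re-cache.
    this.misses += 1;
    const tier = this.loadTierFromDb(tenantId);
    this.cache[key] = { tier, expiresAt: this.now() + TIER_CACHE_TTL_MS };
    return tier;
  }
}

// Builds the per-tenant limiter key from step 5.
function rateLimitKey(tier: TierName, tenantId: string): string {
  return `rl:${tier}:${tenantId}`;
}
```

Because the tier is part of the limiter key, a tenant that upgrades starts counting against a fresh key once the cached tier expires, which matches the "within 60 seconds" behavior described for upgrades.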
### Tier Configuration Object

Centralized in `src/config/tiers.ts`, this object is the single source of truth for all tier limits and features. Both `GET /tiers` and `TierRateLimiter` read from it.

```typescript
import type { TierDefinition, TierName } from '../types/tiers';

export const TIER_CONFIG: Record<TierName, TierDefinition> = {
  free: {
    id: 'free',
    limits: {
      registeredAgents: 10,
      apiCallsPerDay: 1000,
      tokenIssuancesPerDay: 200,
      rateLimitPerMinute: 60,
      rateLimitBurst: 10,
      auditLogRetentionDays: 30,
    },
    features: { analytics: false, webhooks: false, sso: false, sla: false },
    stripeProductId: null, // free tier has no Stripe price
  },
  pro: {
    id: 'pro',
    limits: {
      registeredAgents: 100,
      apiCallsPerDay: 50000,
      tokenIssuancesPerDay: 10000,
      rateLimitPerMinute: 600,
      rateLimitBurst: 100,
      auditLogRetentionDays: 90,
    },
    features: { analytics: true, webhooks: true, sso: false, sla: false },
    // Note: despite the field name, this holds a Stripe Price ID.
    stripeProductId: process.env.STRIPE_PRICE_ID_PRO ?? '',
  },
  enterprise: {
    id: 'enterprise',
    limits: {
      registeredAgents: null,
      apiCallsPerDay: null,
      tokenIssuancesPerDay: null,
      rateLimitPerMinute: 6000,
      rateLimitBurst: 1000,
      auditLogRetentionDays: 365,
    },
    features: { analytics: true, webhooks: true, sso: true, sla: true },
    stripeProductId: process.env.STRIPE_PRICE_ID_ENTERPRISE ?? '',
  },
};
```

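`TIER_CONFIG` depends on the `TierName` and `TierDefinition` types from `src/types/tiers.ts`, whose definitions are not shown in this spec. The sketch below gives one plausible shape inferred from the object literal above. The `null` handling is an assumption: the enterprise tier sets three limits to `null`, which is read here as "no cap", and `isWithinLimit` is a hypothetical helper illustrating how a caller might treat that value.

```typescript
type TierName = 'free' | 'pro' | 'enterprise';

interface TierLimits {
  registeredAgents: number | null;     // null assumed to mean "no cap"
  apiCallsPerDay: number | null;
  tokenIssuancesPerDay: number | null;
  rateLimitPerMinute: number;
  rateLimitBurst: number;
  auditLogRetentionDays: number;
}

interface TierFeatures {
  analytics: boolean;
  webhooks: boolean;
  sso: boolean;
  sla: boolean;
}

interface TierDefinition {
  id: TierName;
  limits: TierLimits;
  features: TierFeatures;
  stripeProductId: string | null;
}

// Hypothetical helper: returns true while usage is strictly below the cap,
// and always true when the cap is null.
function isWithinLimit(limit: number | null, currentUsage: number): boolean {
  return limit === null || currentUsage < limit;
}
```

With these shapes, a check such as `isWithinLimit(tier.limits.registeredAgents, agentCount)` needs no special case for the enterprise tier.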
### New Source Files

| File | Description |
|---|---|
| `src/config/tiers.ts` | Static tier configuration: single source of truth for limits and features |
| `src/middleware/tierRateLimiter.ts` | `TierRateLimiter` middleware: reads tenant tier, enforces tier-specific limits |
| `src/routes/tiers.ts` | Express router for `GET /tiers` |
| `src/types/tiers.ts` | TypeScript interfaces: `TierDefinition`, `TierName`, `TierLimits`, `TierFeatures` |

### Modified Source Files

| File | Change |
|---|---|
| `src/middleware/rateLimiter.ts` | Retain for unauthenticated routes; authenticated routes switch to `tierRateLimiter` |
| `src/services/BillingService.ts` | Add `upgradeTier(tenantId, targetTier)` method |
| `src/controllers/BillingController.ts` | Add handler for `POST /billing/upgrade` |
| `src/routes/billing.ts` | Register `POST /billing/upgrade` route |
| `src/routes/index.ts` | Register `tiers` router |
| `.env.example` | Add `STRIPE_PRICE_ID_PRO`, `STRIPE_PRICE_ID_ENTERPRISE`, `TIER_RATE_LIMITING_ENABLED` |
| `docs/openapi.yaml` | Add `GET /tiers` and `POST /billing/upgrade` endpoints |

### Prometheus Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `agentidp_rate_limit_hits_total` | Counter | `tier`, `tenant_id`, `endpoint` | Rate limit rejections per tier (replaces old flat counter) |
| `agentidp_tier_cache_hits_total` | Counter | none | Tier Redis cache hits |
| `agentidp_tier_cache_misses_total` | Counter | none | Tier Redis cache misses |
| `agentidp_billing_upgrades_total` | Counter | `from_tier`, `to_tier` | Self-service upgrade checkout sessions created |

### Feature Flag

`TIER_RATE_LIMITING_ENABLED` (default: `true`). When `false`, the system falls back to the old flat `RateLimiterRedis` middleware; this is the rollback mechanism.

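The flag's default matters because an unset variable must leave tier limiting on. A minimal sketch of how the flag might be parsed and used to pick a limiter follows. `isTierRateLimitingEnabled` and `chooseAuthenticatedLimiter` are hypothetical names, and the parsing rule (only the literal string `false`, case-insensitive, disables the feature) is an assumption the spec does not state.

```typescript
// Hypothetical helpers; only the variable name and its default of `true` come from the spec.
type EnvLike = { [name: string]: string | undefined };

function isTierRateLimitingEnabled(env: EnvLike): boolean {
  const raw = env['TIER_RATE_LIMITING_ENABLED'];
  // Unset or empty: use the documented default (enabled).
  if (raw === undefined || raw.trim() === '') {
    return true;
  }
  // Assumed rule: only an explicit "false" turns the feature off.
  return raw.trim().toLowerCase() !== 'false';
}

type LimiterChoice = 'tierRateLimiter' | 'flatRateLimiter';

// Picks which middleware to mount on authenticated routes.
function chooseAuthenticatedLimiter(env: EnvLike): LimiterChoice {
  return isTierRateLimitingEnabled(env) ? 'tierRateLimiter' : 'flatRateLimiter';
}
```

In an Express app this choice would typically be made once at startup, for example by passing `process.env` to `chooseAuthenticatedLimiter`, so that rollback only requires a config change and a restart.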
### Acceptance Criteria

- `GET /tiers` returns all three tier definitions matching `TIER_CONFIG` exactly, without a database query, and the response carries `Cache-Control: max-age=3600`
- `POST /billing/upgrade` creates a Stripe Checkout session and returns `checkoutUrl`
- `POST /billing/upgrade` returns HTTP 400 `ALREADY_ON_TIER` when the tenant is already on the target tier
- `POST /billing/upgrade` returns HTTP 400 `DOWNGRADE_NOT_SUPPORTED` when the target tier is lower than the current tier
- `TierRateLimiter` enforces free tier limits (60 req/min) for free tenants
- `TierRateLimiter` enforces pro tier limits (600 req/min) for pro tenants
- Tier lookup is cached in Redis: a second request within the TTL does not query `tenant_subscriptions`
- Rate limit response includes `X-RateLimit-*` headers and `Retry-After`
- After a Stripe webhook updates `tenant_subscriptions` to `pro`, `TierRateLimiter` applies pro limits within 60 seconds (next cache refresh)
- Unit tests cover: tier lookup (cached), tier lookup (miss), free limit enforcement, pro limit enforcement, upgrade (success), upgrade (already on tier), upgrade (downgrade rejected)

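The `X-RateLimit-*` and `Retry-After` headers required above can all be derived from three values: the tier's limit, the points remaining, and the time until the window resets. The sketch below shows one way to compute them. `buildRateLimitHeaders` and `RateLimitState` are hypothetical names; in practice the inputs could come from the `remainingPoints` and `msBeforeNext` fields that `rate-limiter-flexible` reports when it rejects a request, though that wiring is not specified here. Rounding up to whole seconds is also an assumption.

```typescript
interface RateLimitState {
  limit: number;         // points allowed per window, e.g. rateLimitPerMinute
  remaining: number;     // points left in the current window
  msBeforeReset: number; // milliseconds until the window resets
}

// Builds the headers for an HTTP 429 response.
function buildRateLimitHeaders(
  state: RateLimitState,
  nowMs: number,
): { [header: string]: string } {
  // Round up so clients never retry before the window has actually reset.
  const retryAfterSeconds = Math.ceil(state.msBeforeReset / 1000);
  const resetUnixSeconds = Math.ceil((nowMs + state.msBeforeReset) / 1000);
  return {
    'X-RateLimit-Limit': String(state.limit),
    'X-RateLimit-Remaining': String(Math.max(0, state.remaining)),
    'X-RateLimit-Reset': String(resetUnixSeconds),
    'Retry-After': String(retryAfterSeconds),
  };
}
```

Passing the current time as a parameter keeps the function pure, which makes the header values straightforward to assert in the unit tests listed in the acceptance criteria.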