# WS5 — Remaining Documentation Updates

**Targets:** 5 separate files with surgical edits.

---

## File 1: `docs/engineering/01-overview.md`

**Operation:** Replace the Phase Roadmap table (Section 4) to reflect Phase 3–6 completion status and add Phase 6 capabilities to the Product Features table.

---

### Change 1a — Update Phase Roadmap Table

**Find (Section 4, the Phase 3 row):**

```
| Phase 3 — Enterprise | PLANNED | AGNTCY federation (cross-IdP agent identity), W3C Decentralised Identifiers (DIDs), agent marketplace, advanced compliance reporting, SOC 2 Type II certification, enterprise tier (custom retention, SLAs, advanced RBAC) |
```

**Replace with (4 rows — Phase 3 was completed and Phases 4–6 have been added):**

```
| Phase 3 — Enterprise | COMPLETE | AGNTCY federation (cross-IdP agent identity), W3C Decentralised Identifiers (DIDs), agent marketplace, OIDC provider (A2A delegation), Rust SDK, developer portal (Next.js 14) |
| Phase 4 — Compliance & Security | COMPLETE | AGNTCY compliance reports (agent-identity + audit-trail sections), audit hash chain verification, SOC 2 CC6.1 AES-256-CBC column encryption (`EncryptionService`), DID document caching, federation partner JWKS caching |
| Phase 5 — Scale & Ecosystem | COMPLETE | Multi-tier subscription model (free/pro/enterprise), Stripe billing integration (`BillingService`, `TierService`), tier enforcement middleware (daily call and token limits), webhook subscriptions + delivery history (`WebhookService`), analytics service (daily event aggregation + trend queries) |
| Phase 6 — Market Expansion | COMPLETE | AGNTCY conformance test suite (4 conformance scenarios), API tiers enforced end-to-end, analytics dashboard in developer portal, full Phase 6 engineering documentation update |
```

---

### Change 1b — Add Phase 3–6 Capabilities to Product Features Table

**Find (Section 3, the last row of the features table):**

```
| Health Check | `GET /health` | Checks PostgreSQL and Redis connectivity; unauthenticated; used by load balancers |
```

**Insert the following rows after that line (before the closing of the table):**

```
| W3C Decentralised Identifiers | `GET /api/v1/agents/:id/did`, `GET /api/v1/.well-known/did.json` | DID Core 1.0 documents; `did:web` method; EC P-256 keys; AGNTCY extension fields |
| AGNTCY Agent Cards | `GET /api/v1/agents/:id/card` | Machine-readable agent identity summary; AGNTCY schema v1.0 |
| AGNTCY Compliance Reports | `GET /api/v1/compliance/report`, `GET /api/v1/compliance/agent-cards` | Compliance sections: agent-identity + audit-trail; cached 5 min; AGNTCY schema v1.0 |
| Federation (Cross-IdP) | `POST /api/v1/federation/partners`, `GET /api/v1/federation/partners`, `POST /api/v1/federation/verify` | Register partner IdPs; verify cross-IdP JWTs using cached partner JWKS |
| A2A Delegation | `POST /api/v1/oauth2/token/delegate`, `POST /api/v1/oauth2/token/verify-delegation` | Agent-to-agent delegation tokens; OIDC provider (oidc-provider v9) mounted at `/oidc` |
| Webhook Subscriptions | `POST /api/v1/webhooks`, `GET /api/v1/webhooks`, `GET /api/v1/webhooks/:id/deliveries` | Outbound event delivery with HMAC signing; Vault-backed secrets; delivery history |
| Tier Management | `GET /api/v1/tiers/status`, `POST /api/v1/tiers/upgrade` | Free / Pro / Enterprise tiers; daily call and token limits; Stripe Checkout upgrade flow |
| Billing | `POST /api/v1/billing/checkout`, `POST /api/v1/billing/webhook`, `GET /api/v1/billing/status` | Stripe subscription management; webhook event processing |
| Analytics | Internal (via `AnalyticsService`) | Daily aggregated event counts per org; token trend queries (up to 90 days); agent activity heatmap; usage summary |
| Developer Portal | `/portal` (Next.js 14, separate process) | Get-started wizard, SDK explorer, API reference, analytics dashboard, pricing page |
```

---

### Change 1c — Update Free Tier Limits Table

**Find (Section 6, entire table):**

```
| Limit | Value |
|-------|-------|
| Max agents | 100 |
| Max credentials per agent | No hard cap enforced in code (5 is the documented recommendation) |
| Max tokens in flight | 10,000 per agent per calendar month |
| Token TTL | 3,600 seconds (1 hour) |
| Audit log retention | 90 days |
| API rate limit | 100 requests per minute per IP address |
```

**Replace with:**

```
| Limit | Free Tier | Pro Tier | Enterprise Tier |
|-------|-----------|----------|-----------------|
| Max agents | 100 | 1,000 | Unlimited |
| Max API calls per day | Configured in `TIER_CONFIG` | Configured in `TIER_CONFIG` | Unlimited |
| Max tokens per day | Configured in `TIER_CONFIG` | Configured in `TIER_CONFIG` | Unlimited |
| Token TTL | 3,600 seconds (1 hour) | 3,600 seconds (1 hour) | 3,600 seconds (1 hour) |
| Audit log retention | 90 days | 1 year | Custom |
| API rate limit (per IP) | 100 req/min | 100 req/min | 100 req/min |
| Webhook subscriptions | 0 | 10 | Unlimited |
| Analytics retention | 90 days | 1 year | Custom |

Tier limits are configured in `src/config/tiers.ts` (`TIER_CONFIG`). Enforcement is handled by `TierService.enforceAgentLimit()` (agent cap) and `src/middleware/tier.ts` (daily call/token caps). Tier upgrades are initiated via `POST /api/v1/tiers/upgrade` and confirmed via the Stripe webhook.
```

---

## File 2: `docs/engineering/03-tech-stack.md`

**Operation:** Append new ADR entries after the existing `### ADR-10: Terraform` section.

**Find (last line of the file):**

```
**Consequences**: All infrastructure changes must go through Terraform. No manual edits via the AWS console or GCP console are permitted — they will be overwritten on the next `terraform apply`. Terraform state is stored in a remote backend and must not be edited manually.
```

**Append the following after that line:**

```markdown
---

### ADR-11: Stripe

**Status**: Adopted

**Component**: Billing — subscription management and payment processing

**Decision**: Use Stripe as the payment processing and subscription management platform.
The `stripe` npm package (v21+) handles Checkout Session creation, webhook event verification, and subscription lifecycle events.

**Rationale**: Stripe's hosted Checkout flow eliminates the need to handle PCI-DSS scope for card data. The `stripe.webhooks.constructEvent()` method uses HMAC-SHA256 to verify incoming webhook payloads, preventing replay attacks. The `checkout.session.completed` event carries `metadata: { orgId, targetTier }`, allowing `BillingService` to delegate tier upgrades to `TierService.applyUpgrade()` without coupling billing logic to tier logic.

**Alternatives considered**:

- Paddle — rejected because its global merchant-of-record model introduced complexities with the open-source free tier.
- Braintree — rejected because Stripe's webhook reliability and developer experience are superior.

**Consequences**: Stripe requires `STRIPE_SECRET_KEY` (for API calls) and `STRIPE_WEBHOOK_SECRET` (`whsec_...`, for webhook verification). Per-tier Stripe price IDs are configured via `STRIPE_PRICE_ID_PRO` and `STRIPE_PRICE_ID_ENTERPRISE`. All billing webhook handlers must pass the raw `Buffer` body (not parsed JSON) to `stripe.webhooks.constructEvent()` — use `express.raw()` middleware on the webhook route.

---

### ADR-12: oidc-provider (A2A Delegation)

**Status**: Adopted

**Component**: A2A delegation — OIDC provider for agent-to-agent trust tokens

**Decision**: Use the `oidc-provider` npm package (v9.7.x) as the OIDC provider for issuing A2A delegation tokens. The provider is mounted as a sub-application at `/oidc` within the Express app.

**Rationale**: `oidc-provider` is a certified OpenID Connect implementation that handles the full OIDC protocol, including JWKS serving, the token endpoint, and the discovery document. Rather than implementing a custom delegation token format, using a standards-compliant OIDC provider means delegation tokens can be verified by any OIDC-aware party using the published JWKS at `/oidc/jwks`.

**Alternatives considered**:

- Custom JWT signing — rejected because hand-rolled token formats cannot benefit from OIDC tooling and interoperability.

**Consequences**: The `A2A_ENABLED` env var gates the OIDC provider — when set to `'false'`, delegation endpoints return 404. The `OIDC_ISSUER` env var must be set to the full base URL of the OIDC provider (e.g. `https://api.sentryagent.ai`).

---

### ADR-13: Next.js 14 (Developer Portal)

**Status**: Adopted

**Component**: Developer Portal (`portal/`) — public-facing documentation and onboarding

**Decision**: Use Next.js 14 (App Router) with Tailwind CSS for the developer portal. The portal is a separate process served on its own port (independent of the Express API server).

**Rationale**: The developer portal has different performance and SEO requirements than the internal operator dashboard (`dashboard/`). Next.js 14's App Router supports React Server Components, which allows the marketing and documentation pages to be statically generated while the analytics dashboard and API Explorer are client-rendered. Tailwind CSS enables rapid UI development consistent with the design system.

**Alternatives considered**:

- Extending the Vite dashboard — rejected because the developer portal requires server-side rendering for SEO on marketing pages, which Vite does not provide.
- Docusaurus — rejected because the portal includes interactive components (Swagger Explorer, analytics charts) that are not well-suited to a documentation-only tool.

**Consequences**: The portal (`portal/`) has its own `package.json`, `tsconfig.json`, `tailwind.config.ts`, and `next.config.js`. It is built and run independently: `cd portal && npm install && npm run dev`. The portal calls the AgentIdP REST API using the same `@sentryagent/idp-sdk` as the dashboard.

---

### ADR-14: bull (Job Queue) + kafkajs (Event Streaming)

**Status**: Adopted (opt-in)

**Component**: Async job processing and event streaming

**Decision**: Use `bull` (Redis-backed job queue) for async webhook delivery retries and `kafkajs` for event streaming to external consumers. Both are opt-in — the system operates correctly without Kafka configured.

**Rationale**: Webhook delivery requires retry logic with exponential backoff and dead-letter handling. `bull` provides this out of the box using the existing Redis dependency. `kafkajs` enables high-throughput event streaming for analytics and audit events to external data pipelines without blocking the primary request path.

**Alternatives considered**:

- BullMQ — considered as a more modern alternative to `bull` but rejected to avoid adding a new package family during Phase 6. Migration is a future backlog item.

**Consequences**: Kafka is entirely optional. When `KAFKA_BROKERS` is not set, `kafkajs` is not initialised and no events are published. The `bull` queue for webhook delivery requires only the existing Redis instance.

---

### ADR-15: did-resolver + web-did-resolver (W3C DIDs)

**Status**: Adopted

**Component**: W3C DID Core 1.0 document resolution

**Decision**: Use `did-resolver` (v4.1.x) as the DID resolution framework and `web-did-resolver` (v2.0.x) for the `did:web` method implementation.

**Rationale**: `did-resolver` provides a pluggable resolver interface used by both the server (for internal resolution) and by third parties who want to verify AgentIdP-issued DIDs. The `did:web` method maps DID identifiers to HTTPS URLs hosting the DID document JSON, requiring no blockchain. `DIDService` generates documents that conform to the W3C DID Core 1.0 specification and include AGNTCY-specific extension fields.

**Consequences**: The `DID_WEB_DOMAIN` env var is required for DID generation. DID documents are cached in Redis (`did:doc:`, TTL from `DID_DOCUMENT_CACHE_TTL_SECONDS`, default 300s).
Private keys are stored in HashiCorp Vault KV v2 when Vault is configured; in dev mode, a `dev:no-vault` marker is stored and keys are ephemeral.
```

---

## File 3: `docs/engineering/04-codebase-structure.md`

**Operation:** Two surgical edits — update the directory tree and update the `src/` subdirectory table.

---

### Change 3a — Update the Annotated Directory Tree

**Find (inside the code block in Section 1, after the `sdk-java/` line):**

```
├── policies/ # OPA policy files
```

**Replace the entire block from `├── policies/` down through `└── jest.config.ts # Jest configuration — ts-jest, test timeouts, coverage thresholds` with the following updated version:**

```
├── sdk-rust/                     # Rust SDK (sentryagent-idp crate) — async, tokio, reqwest, typed errors
├── policies/                     # OPA policy files
│   ├── authz.rego                # Rego policy — normalise_path + scope-intersection allow rule
│   └── data/scopes.json          # Endpoint permission map — used by Rego and TypeScript fallback
├── portal/                       # Developer Portal — Next.js 14 App Router, Tailwind CSS
│   ├── app/                      # Next.js App Router pages (get-started, pricing, sdks, analytics, settings, login)
│   ├── components/               # Shared UI components (Nav.tsx, SwaggerExplorer.tsx, GetStartedWizard.tsx)
│   ├── hooks/                    # React hooks (useAuth.ts)
│   └── types/                    # TypeScript type definitions for portal-only types
├── terraform/                    # Terraform infrastructure as code
│   ├── modules/                  # Reusable modules: agentidp, lb, rds, redis
│   └── environments/             # Environment configs: aws/ (ECS+RDS+ElastiCache), gcp/ (Cloud Run+SQL+Memorystore)
├── monitoring/                   # Prometheus and Grafana configuration
│   ├── prometheus/               # prometheus.yml scrape configuration
│   └── grafana/                  # Grafana provisioning YAML and dashboard JSON files
├── docs/                         # All project documentation
│   ├── engineering/              # Internal engineering knowledge base (this directory)
│   ├── developers/               # End-user API reference and developer guides
│   ├── devops/                   # Operator runbooks and environment variable reference
│   ├── agntcy/                   # AGNTCY alignment documentation
│   └── openapi/                  # OpenAPI 3.0 specification files
├── openspec/                     # OpenSpec change management — proposals, designs, specs, tasks, archives
├── tests/                        # Jest test suite — mirrors src/ structure
│   ├── unit/                     # Unit tests (mocked dependencies) — mirrors src/
│   ├── integration/              # Integration tests (real DB + Redis)
│   ├── agntcy-conformance/       # AGNTCY conformance test suite (separate Jest config)
│   └── load/                     # k6 load test scripts
├── Dockerfile                    # Multi-stage production build (build + runtime stages)
├── docker-compose.yml            # Local development: PostgreSQL 14 (port 5432) + Redis 7 (port 6379)
├── docker-compose.monitoring.yml # Monitoring overlay: Prometheus (port 9090) + Grafana (port 3001)
├── package.json                  # Node.js dependencies and npm scripts
├── tsconfig.json                 # TypeScript strict configuration — compiled to dist/
└── jest.config.ts                # Jest configuration — ts-jest, test timeouts, coverage thresholds
```

---

### Change 3b — Add New src/ Subdirectories to Section 2

**Find (Section 2 table, the last row):**

```
| `src/cache/` | Redis client factory — creates and caches a single `redis` client instance | Client is a singleton created once in `src/app.ts` and passed to repositories |
```

**Insert these rows after that line:**

```
| `src/config/` | Configuration constants — `tiers.ts` exports `TIER_CONFIG`, `TIER_RANK`, `TierName`, and `isTierName()` type guard | Imported by `TierService` and `tierMiddleware`; never imports from services |
| `src/middleware/tier.ts` | Tier enforcement middleware — reads org tier from `TierService`, checks daily call counter in Redis, throws `TierLimitError` (429) when limit is exceeded, increments counter on pass | Applied only to API routes; skips `/health`, `/metrics`, and static file routes |
```

---

### Change 3c — Add New Entries to Section 3 (Where to Add New Code)

**Find (Section 3 table, after the `A new Prometheus metric` row):**

```
| A new TypeScript type used in 2+ files | `src/types/index.ts` | A new `AgentGroupMembership` interface |
```

**Insert these rows after that line:**

```
| A new tier-gated feature | `src/config/tiers.ts` (add limit field) + `src/middleware/tier.ts` (add check) + service (enforce) | Adding a `maxWebhooksPerOrg` tier limit |
| A webhook event handler | `src/services/WebhookService.ts` (add event type to `WebhookEventType`) + the producer that calls `void webhookService.dispatch(orgId, eventType, payload)` | Emitting `agent.decommissioned` events to subscriber URLs |
| A new analytics metric type | `src/services/AnalyticsService.ts` (call `recordEvent(tenantId, 'new_metric')` in the relevant service using `void`) | Recording `credential_rotated` events for analytics |
| A new DID endpoint | `src/controllers/DIDController.ts` + `src/routes/did.ts` + `src/services/DIDService.ts` (if new method needed) + `policies/data/scopes.json` | Adding `GET /api/v1/agents/:id/did/rotate-key` |
```

---

## File 4: `docs/engineering/README.md`

**Operation:** Replace the reading order table and quick reference table to reflect all Phase 6 additions.

---

### Change 4a — Update Reading Order Table

**Find (Section "Reading Order (New Engineers Start Here)", the last row):**

```
| 11 | [SDK Integration Guide](11-sdk-guide.md) | All 4 SDKs — installation, examples, contribution guide | 20 min |
```

**Replace with (adds the Rust SDK to the description and updates the estimated time):**

```
| 11 | [SDK Integration Guide](11-sdk-guide.md) | All 5 SDKs (Node.js, Python, Go, Java, Rust) — installation, examples, contribution guide | 25 min |
```

**Find (the line after the table):**

```
**Total estimated reading time for new engineers: ~3.5 hours**
```

**Replace with:**

```
**Total estimated reading time for new engineers: ~4 hours**
```

---

### Change 4b — Update "Service Deep Dives" Entry

**Find:**

```
| 5 | [Service Deep Dives](05-services.md) | All 8 services/components — purpose, interface, schema, error types | 30 min |
```

**Replace with:**

```
| 5 | [Service Deep Dives](05-services.md) | All 17 services/components (incl. Phase 3–6: AnalyticsService, TierService, ComplianceService, FederationService, DIDService, WebhookService, BillingService, DelegationService, OIDCService) — purpose, interface, schema, error types | 45 min |
```

---

### Change 4c — Update Quick Reference Table

**Find (in the Quick Reference section):**

```
| Integrate with the SDK | [11-sdk-guide.md](11-sdk-guide.md) |
```

**Replace with:**

```
| Integrate with the SDK (Node.js, Python, Go, Java, Rust) | [11-sdk-guide.md](11-sdk-guide.md) |
```

**Find (after the "Integrate with the SDK" row):**

```
| Understand why a technology was chosen | [03-tech-stack.md](03-tech-stack.md) |
```

**Insert after that row:**

```
| Understand tier limits and billing | [01-overview.md](01-overview.md) (Section 6) + [03-tech-stack.md](03-tech-stack.md) (ADR-11) |
| Understand AGNTCY compliance reports | [05-services.md](05-services.md) (ComplianceService) |
| Understand the A2A delegation flow | [06-walkthroughs.md](06-walkthroughs.md) (Walkthrough 4) |
| Run the AGNTCY conformance suite | [09-testing.md](09-testing.md) (Section 10.8) |
| Add a new Rust SDK endpoint | [11-sdk-guide.md](11-sdk-guide.md) (Section 6 contribution guide) |
```

---

## File 5: `docs/engineering/06-walkthroughs.md`

**Operation:** Append three new walkthrough sections at the end of the file.

**Find (the last line of the file):**

```
Returns `ICredentialWithSecret` — the updated credential including the new `clientSecret`. This is the only time the new secret is ever returned. The caller must store it securely.
```

**Append the following after that line:**

```markdown
---

## Walkthrough 4 — A2A Delegation End-to-End

**Request:** `POST /api/v1/oauth2/token/delegate` — one AI agent delegating a scoped capability to another

This walkthrough traces how agent A (an orchestrator) issues a delegation token that grants agent B (a sub-agent) the right to act on its behalf with a restricted scope.

---

### Step 1 — Route dispatch

**File:** `src/routes/delegation.ts`

```typescript
router.post(
  '/token/delegate',
  asyncHandler(authMiddleware),
  opaMiddleware,
  asyncHandler(delegationController.createDelegation.bind(delegationController))
);
```

Both `authMiddleware` and `opaMiddleware` run. The OPA policy requires scope `agents:write` for delegation creation.

---

### Step 2 — Controller: extract delegator and validate

**File:** `src/controllers/DelegationController.ts`

```typescript
const delegatorId = req.user.sub; // From the Bearer token's sub claim
const { delegatee_id, scope, expires_at } = req.body;
```

The controller validates that `delegatee_id` is a non-empty UUID, `scope` is a non-empty string, and `expires_at` (if provided) is a valid ISO 8601 datetime in the future. It passes these to `DelegationService.createDelegation()`.
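
The checks above can be sketched as a small standalone validator. This is illustrative only — the helper name, error strings, and UUID regex are assumptions, not the actual `DelegationController` code:

```typescript
// Illustrative sketch of the request-body validation described above.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface DelegateRequestBody {
  delegatee_id: string;
  scope: string;
  expires_at?: string; // ISO 8601, must be in the future when present
}

function validateDelegateBody(body: DelegateRequestBody): string[] {
  const errors: string[] = [];
  if (!UUID_RE.test(body.delegatee_id ?? '')) {
    errors.push('delegatee_id must be a UUID');
  }
  if (!body.scope || body.scope.trim() === '') {
    errors.push('scope must be a non-empty string');
  }
  if (body.expires_at !== undefined) {
    const ts = Date.parse(body.expires_at);
    if (Number.isNaN(ts)) errors.push('expires_at must be ISO 8601');
    else if (ts <= Date.now()) errors.push('expires_at must be in the future');
  }
  return errors;
}
```

An empty result means the body is acceptable; a non-empty result maps to a 400 response listing each failed check.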

---

### Step 3 — Service: verify both agents exist

**File:** `src/services/DelegationService.ts`

```typescript
const delegator = await this.agentRepository.findById(delegatorId);
if (!delegator || delegator.status !== 'active') {
  throw new AgentNotFoundError(delegatorId);
}

const delegatee = await this.agentRepository.findById(delegateeId);
if (!delegatee || delegatee.status !== 'active') {
  throw new AgentNotFoundError(delegateeId);
}
```

Both agents must exist and be in `active` status. A suspended or decommissioned agent cannot participate in delegation.

---

### Step 4 — Service: insert delegation chain record

**File:** `src/services/DelegationService.ts`

```typescript
await this.pool.query(
  `INSERT INTO delegation_chains (chain_id, delegator_id, delegatee_id, scope, status, expires_at)
   VALUES ($1, $2, $3, $4, 'active', $5)`,
  [chainId, delegatorId, delegateeId, scope, expiresAt]
);
```

The `chain_id` is a UUID generated by the service. The `delegation_chains` table provides the authoritative source of truth for which delegations are active, independent of any token.

---

### Step 5 — Response

```json
{
  "chain_id": "f1e2d3c4-...",
  "token": "eyJhbGciOiJSUzI1NiJ9...",
  "delegator_id": "a1b2c3d4-...",
  "delegatee_id": "b2c3d4e5-...",
  "scope": "agents:read",
  "status": "active",
  "expires_at": "2026-04-05T00:00:00Z"
}
```

The `token` field is the signed delegation JWT. The delegatee presents this token to `POST /api/v1/oauth2/token/verify-delegation` to prove it has authority to act on the delegator's behalf.

**Why store both the DB record and the JWT?** The DB record allows revocation — when the delegator calls `DELETE /api/v1/delegation-chains/:chainId`, the record is soft-deleted and all subsequent `verify-delegation` calls will fail even if the JWT itself has not yet expired.
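
The revocation semantics can be sketched as follows. The row shape and helper are hypothetical (not the actual `DelegationService` code), and the real `verify-delegation` endpoint also verifies the JWT signature and claims:

```typescript
// Illustrative sketch: verification must consult the DB record, not just the JWT.
interface DelegationChainRow {
  chain_id: string;
  status: 'active' | 'revoked';
  expires_at: Date;
}

function isDelegationValid(
  row: DelegationChainRow | undefined,
  now: Date = new Date(),
): boolean {
  if (!row) return false;                    // unknown chain_id — reject
  if (row.status !== 'active') return false; // revoked — reject even if the JWT is unexpired
  return row.expires_at.getTime() > now.getTime();
}
```

This is why a structurally valid, unexpired JWT is still rejected after revocation: the DB lookup, not the token, has the final say.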

---

## Walkthrough 5 — Tier Enforcement Request Lifecycle

**Request:** Any authenticated API request when the organisation's daily call limit is reached

This walkthrough traces how `tierMiddleware` intercepts a request before it reaches the OPA middleware, preventing quota-exceeded traffic from consuming service resources.

---

### Step 1 — Auth middleware passes

Same as Walkthrough 2, Step 3. The Bearer JWT is verified and `req.user` is populated with `sub` (agentId) and `organization_id`.

---

### Step 2 — Tier middleware: fetch org tier

**File:** `src/middleware/tier.ts`

```typescript
const orgId = req.user.organization_id;
const tier = await tierService.fetchTier(orgId);
const config = TIER_CONFIG[tier];
```

`fetchTier()` issues `SELECT tier FROM organizations WHERE organization_id = $1`. Returns `'free'` if no row is found (safe default).

---

### Step 3 — Tier middleware: read daily counter

**File:** `src/middleware/tier.ts`

```typescript
const callsKey = `rate:tier:calls:${orgId}`;
const callsToday = await redis.get(callsKey);
const count = callsToday !== null ? parseInt(callsToday, 10) : 0;

if (count >= config.maxCallsPerDay) {
  throw new TierLimitError('calls', config.maxCallsPerDay, { orgId, tier, current: count });
}
```

The Redis key `rate:tier:calls:<orgId>` is read. If it is null (first call of the day), the count is 0. When the count equals or exceeds the tier limit, `TierLimitError` (HTTP 429) is thrown immediately — no further middleware runs.

---

### Step 4 — Tier middleware: increment counter (fire-and-forget)

**File:** `src/middleware/tier.ts`

```typescript
// Set TTL to next UTC midnight if key is new
void redis.multi()
  .incr(callsKey)
  .expireAt(callsKey, nextUtcMidnightUnix())
  .exec();

next();
```

The counter is incremented atomically using a Redis MULTI block. The `EXPIREAT` command sets the key to auto-delete at the next UTC midnight, resetting the daily counter without any scheduled job.
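
`nextUtcMidnightUnix()` is referenced but not shown above. A minimal implementation consistent with that behaviour might look like this (a sketch, not the actual `src/middleware/tier.ts` helper):

```typescript
// Sketch: Unix timestamp (seconds) of the next UTC midnight after `now`.
function nextUtcMidnightUnix(now: Date = new Date()): number {
  const nextMidnightMs = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1, // Date.UTC normalises day overflow (month/year roll over)
  );
  return Math.floor(nextMidnightMs / 1000);
}
```

Because `Date.UTC` normalises an out-of-range day, this rolls over month and year boundaries correctly (e.g. 31 December advances to 1 January).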
The increment is fire-and-forget — the request proceeds immediately to `opaMiddleware`.

**Why expire at UTC midnight rather than a rolling 24-hour window?** Tier limits are documented as "per day", which users interpret as resetting at midnight. A rolling window would allow a user to consume their full daily quota twice within a 48-hour period straddling midnight, which is counterintuitive. UTC midnight is predictable and easy to reason about.

---

### Step 5 — Error handler serialises TierLimitError

**File:** `src/middleware/errorHandler.ts`

**Response (HTTP 429):**

```json
{
  "code": "TIER_LIMIT_EXCEEDED",
  "message": "Daily API call limit reached for your tier.",
  "details": { "tier": "free", "limit": 1000, "current": 1000 }
}
```

The `Retry-After` header is set to the number of seconds until the next UTC midnight so clients can implement automatic backoff.

---

## Walkthrough 6 — Analytics Event Capture Flow

**Trigger:** Any successful token issuance (`POST /api/v1/token`)

This walkthrough traces how an analytics event is captured without affecting the latency of the primary token issuance response.

---

### Step 1 — Token issuance completes

**File:** `src/services/OAuth2Service.ts`

```typescript
const accessToken = signToken(payload, this.privateKey);

// Primary response is ready — analytics is now fire-and-forget
void this.analyticsService.recordEvent(tenantId, 'token_issued');
tokensIssuedTotal.inc({ scope });
```

The `signToken()` call completes synchronously (RSA signing is CPU-bound, not I/O). The controller can now send the response. `analyticsService.recordEvent()` is called with `void` — the `await` is deliberately omitted.

**Why `void` instead of `await`?** Token issuance latency must remain below 100ms (per the QA performance gate). A PostgreSQL write adds 5–15ms. Since analytics data is aggregated (not transactional), losing an occasional event due to an error is acceptable. The response is never delayed for analytics.
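
Note the contract this implies: the `void` operator only discards the promise — it does not handle rejections, so the callee must never reject. A sketch of that contract with a hypothetical wrapper (not the actual service code):

```typescript
// Illustrative contract for fire-and-forget callees: never reject.
type Logger = (msg: string) => void;

function makeSafeRecorder(
  write: () => Promise<void>, // e.g. the analytics DB insert
  log: Logger,
): () => Promise<void> {
  return async () => {
    try {
      await write();
    } catch {
      // Swallow the error: the primary response has already been sent.
      log('[analytics] write failed — primary path unaffected');
    }
  };
}
```

If the callee rejected instead, each failed analytics write would surface as an unhandled promise rejection in the Node.js process.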

---

### Step 2 — AnalyticsService: UPSERT daily counter

**File:** `src/services/AnalyticsService.ts`

```typescript
async recordEvent(tenantId: string, metricType: string): Promise<void> {
  try {
    await this.pool.query(
      `INSERT INTO analytics_events (organization_id, date, metric_type, count)
       VALUES ($1, CURRENT_DATE, $2, 1)
       ON CONFLICT (organization_id, date, metric_type)
       DO UPDATE SET count = analytics_events.count + 1`,
      [tenantId, metricType],
    );
  } catch (err) {
    console.error('[AnalyticsService] recordEvent failed — primary path unaffected', err);
  }
}
```

The `ON CONFLICT DO UPDATE` upsert is atomic. Whether this is the first or the ten-thousandth `token_issued` event for this tenant today, the row is updated correctly. All errors are caught and swallowed — the token has already been returned to the caller.

**Why one row per day per metric, not one row per event?** Storing a row per event would create millions of rows. The daily aggregate model keeps the table compact while still providing daily trend data (the granularity that analytics dashboards need). Sub-day granularity is available from the Prometheus `agentidp_tokens_issued_total` counter if needed.

---

### Step 3 — Dashboard query (deferred)

When a developer visits the analytics page in the developer portal, the portal calls:

```
GET /api/v1/analytics/token-trend?days=30
```

**File:** `src/services/AnalyticsService.ts` — `getTokenTrend(tenantId, 30)`

```sql
SELECT
  gs.date::DATE::TEXT AS date,
  COALESCE(ae.count, 0)::INTEGER AS count
FROM generate_series(
  CURRENT_DATE - 29 * INTERVAL '1 day',
  CURRENT_DATE,
  INTERVAL '1 day'
) AS gs(date)
LEFT JOIN analytics_events ae
  ON  ae.date = gs.date::DATE
  AND ae.organization_id = $1  -- the tenant's organization_id
  AND ae.metric_type = 'token_issued'
ORDER BY gs.date ASC
```

The `generate_series` + `LEFT JOIN` pattern ensures all 30 days appear in the result, with `count: 0` for days with no events. This avoids the need for the client to fill in gaps.
```
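
For reviewers of Walkthrough 6: the gap-filling that the `generate_series` + `LEFT JOIN` query performs can be expressed in TypeScript for illustration. The helper and its types are hypothetical, not repo code:

```typescript
// Sketch: the gap-filling the SQL performs, expressed in TypeScript.
// `counts` maps 'YYYY-MM-DD' to the stored daily aggregate (missing = 0).
function fillTrend(
  days: number,
  counts: Map<string, number>,
  today: Date,
): { date: string; count: number }[] {
  const out: { date: string; count: number }[] = [];
  for (let i = days - 1; i >= 0; i--) {
    const d = new Date(
      Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - i),
    );
    const key = d.toISOString().slice(0, 10); // 'YYYY-MM-DD'
    out.push({ date: key, count: counts.get(key) ?? 0 });
  }
  return out;
}
```

Every requested day appears exactly once, in ascending order, with `0` substituted for days that have no `analytics_events` row — the same contract the SQL guarantees server-side.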