# WS5 — Remaining Documentation Updates

**Targets:** 5 separate files with surgical edits.

---

## File 1: `docs/engineering/01-overview.md`

**Operation:** Replace the Phase Roadmap table (Section 4) to reflect Phase 3–6 completion status and add Phase 6 capabilities to the Product Features table.

---

### Change 1a — Update Phase Roadmap Table

**Find (Section 4, the Phase 3 row):**

```
| Phase 3 — Enterprise | PLANNED | AGNTCY federation (cross-IdP agent identity), W3C Decentralised Identifiers (DIDs), agent marketplace, advanced compliance reporting, SOC 2 Type II certification, enterprise tier (custom retention, SLAs, advanced RBAC) |
```

**Replace with (4 rows — Phase 3 was completed and Phases 4–6 have been added):**

```
| Phase 3 — Enterprise | COMPLETE | AGNTCY federation (cross-IdP agent identity), W3C Decentralised Identifiers (DIDs), agent marketplace, OIDC provider (A2A delegation), Rust SDK, developer portal (Next.js 14) |
| Phase 4 — Compliance & Security | COMPLETE | AGNTCY compliance reports (agent-identity + audit-trail sections), audit hash chain verification, SOC 2 CC6.1 AES-256-CBC column encryption (`EncryptionService`), DID document caching, federation partner JWKS caching |
| Phase 5 — Scale & Ecosystem | COMPLETE | Multi-tier subscription model (free/pro/enterprise), Stripe billing integration (`BillingService`, `TierService`), tier enforcement middleware (daily call and token limits), webhook subscriptions + delivery history (`WebhookService`), analytics service (daily event aggregation + trend queries) |
| Phase 6 — Market Expansion | COMPLETE | AGNTCY conformance test suite (4 conformance scenarios), API tiers enforced end-to-end, analytics dashboard in developer portal, full Phase 6 engineering documentation update |
```

---

### Change 1b — Add Phase 3–6 Capabilities to Product Features Table

**Find (Section 3, the last row of the features table):**

```
| Health Check | `GET /health` | Checks PostgreSQL and Redis connectivity; unauthenticated; used by load balancers |
```

**Insert the following rows after that line (before the closing of the table):**

```
| W3C Decentralised Identifiers | `GET /api/v1/agents/:id/did`, `GET /api/v1/.well-known/did.json` | DID Core 1.0 documents; `did:web` method; EC P-256 keys; AGNTCY extension fields |
| AGNTCY Agent Cards | `GET /api/v1/agents/:id/card` | Machine-readable agent identity summary; AGNTCY schema v1.0 |
| AGNTCY Compliance Reports | `GET /api/v1/compliance/report`, `GET /api/v1/compliance/agent-cards` | Compliance sections: agent-identity + audit-trail; cached 5 min; AGNTCY schema v1.0 |
| Federation (Cross-IdP) | `POST /api/v1/federation/partners`, `GET /api/v1/federation/partners`, `POST /api/v1/federation/verify` | Register partner IdPs; verify cross-IdP JWTs using cached partner JWKS |
| A2A Delegation | `POST /api/v1/oauth2/token/delegate`, `POST /api/v1/oauth2/token/verify-delegation` | Agent-to-agent delegation tokens; OIDC provider (oidc-provider v9) mounted at `/oidc` |
| Webhook Subscriptions | `POST /api/v1/webhooks`, `GET /api/v1/webhooks`, `GET /api/v1/webhooks/:id/deliveries` | Outbound event delivery with HMAC signing; Vault-backed secrets; delivery history |
| Tier Management | `GET /api/v1/tiers/status`, `POST /api/v1/tiers/upgrade` | Free / Pro / Enterprise tiers; daily call and token limits; Stripe Checkout upgrade flow |
| Billing | `POST /api/v1/billing/checkout`, `POST /api/v1/billing/webhook`, `GET /api/v1/billing/status` | Stripe subscription management; webhook event processing |
| Analytics | Internal (via `AnalyticsService`) | Daily aggregated event counts per org; token trend queries (up to 90 days); agent activity heatmap; usage summary |
| Developer Portal | `/portal` (Next.js 14, separate process) | Get-started wizard, SDK explorer, API reference, analytics dashboard, pricing page |
```

---

### Change 1c — Update Free Tier Limits Table

**Find (Section 6, entire table):**

```
| Limit | Value |
|-------|-------|
| Max agents | 100 |
| Max credentials per agent | No hard cap enforced in code (5 is the documented recommendation) |
| Max tokens in flight | 10,000 per agent per calendar month |
| Token TTL | 3,600 seconds (1 hour) |
| Audit log retention | 90 days |
| API rate limit | 100 requests per minute per IP address |
```

**Replace with:**

```
| Limit | Free Tier | Pro Tier | Enterprise Tier |
|-------|-----------|----------|-----------------|
| Max agents | 100 | 1,000 | Unlimited |
| Max API calls per day | Configured in `TIER_CONFIG` | Configured in `TIER_CONFIG` | Unlimited |
| Max tokens per day | Configured in `TIER_CONFIG` | Configured in `TIER_CONFIG` | Unlimited |
| Token TTL | 3,600 seconds (1 hour) | 3,600 seconds (1 hour) | 3,600 seconds (1 hour) |
| Audit log retention | 90 days | 1 year | Custom |
| API rate limit (per IP) | 100 req/min | 100 req/min | 100 req/min |
| Webhook subscriptions | 0 | 10 | Unlimited |
| Analytics retention | 90 days | 1 year | Custom |

Tier limits are configured in `src/config/tiers.ts` (`TIER_CONFIG`). Enforcement is handled by `TierService.enforceAgentLimit()` (agent cap) and `src/middleware/tier.ts` (daily call/token caps). Tier upgrades are initiated via `POST /api/v1/tiers/upgrade` and confirmed via the Stripe webhook.
```

---

## File 2: `docs/engineering/03-tech-stack.md`

**Operation:** Append new ADR entries after the existing `### ADR-10: Terraform` section.

**Find (last line of the file):**

```
**Consequences**: All infrastructure changes must go through Terraform. No manual edits via the AWS console or GCP console are permitted — they will be overwritten on the next `terraform apply`. Terraform state is stored in a remote backend and must not be edited manually.
```

**Append the following after that line:**

```markdown
---

### ADR-11: Stripe

**Status**: Adopted

**Component**: Billing — subscription management and payment processing

**Decision**: Use Stripe as the payment processing and subscription management platform.
The `stripe` npm package (v21+) handles Checkout Session creation, webhook event verification, and subscription lifecycle events.

**Rationale**: Stripe's hosted Checkout flow eliminates the need to handle PCI-DSS scope for card data. The `stripe.webhooks.constructEvent()` method uses HMAC-SHA256 to verify incoming webhook payloads, preventing replay attacks. The `checkout.session.completed` event carries `metadata: { orgId, targetTier }`, allowing `BillingService` to delegate tier upgrades to `TierService.applyUpgrade()` without coupling billing logic to tier logic.

**Alternatives considered**:

- Paddle — rejected because its global merchant-of-record model introduced complexities with the open-source free tier.
- Braintree — rejected because Stripe's webhook reliability and developer experience are superior.

**Consequences**: Stripe requires `STRIPE_SECRET_KEY` (for API calls) and `STRIPE_WEBHOOK_SECRET` (`whsec_...`, for webhook verification). Per-tier Stripe price IDs are configured via `STRIPE_PRICE_ID_PRO` and `STRIPE_PRICE_ID_ENTERPRISE`. All billing webhook handlers must pass the raw `Buffer` body (not parsed JSON) to `stripe.webhooks.constructEvent()` — use `express.raw()` middleware on the webhook route.

---

### ADR-12: oidc-provider (A2A Delegation)

**Status**: Adopted

**Component**: A2A delegation — OIDC provider for agent-to-agent trust tokens

**Decision**: Use the `oidc-provider` npm package (v9.7.x) as the OIDC provider for issuing A2A delegation tokens. The provider is mounted as a sub-application at `/oidc` within the Express app.

**Rationale**: `oidc-provider` is a certified OpenID Connect implementation that handles the full OIDC protocol, including JWKS serving, the token endpoint, and the discovery document. Rather than implementing a custom delegation token format, using a standards-compliant OIDC provider means delegation tokens can be verified by any OIDC-aware party using the published JWKS at `/oidc/jwks`.

**Alternatives considered**:

- Custom JWT signing — rejected because hand-rolled token formats cannot benefit from OIDC tooling and interoperability.

**Consequences**: The `A2A_ENABLED` env var gates the OIDC provider — when set to `'false'`, delegation endpoints return 404. The `OIDC_ISSUER` env var must be set to the full base URL of the OIDC provider (e.g. `https://api.sentryagent.ai`).

---

### ADR-13: Next.js 14 (Developer Portal)

**Status**: Adopted

**Component**: Developer Portal (`portal/`) — public-facing documentation and onboarding

**Decision**: Use Next.js 14 (App Router) with Tailwind CSS for the developer portal. The portal is a separate process served on its own port (independent of the Express API server).

**Rationale**: The developer portal has different performance and SEO requirements than the internal operator dashboard (`dashboard/`). Next.js 14's App Router supports React Server Components, which allows the marketing and documentation pages to be statically generated while the analytics dashboard and API Explorer are client-rendered. Tailwind CSS enables rapid UI development consistent with the design system.

**Alternatives considered**:

- Extending the Vite dashboard — rejected because the developer portal requires server-side rendering for SEO on marketing pages, which Vite does not provide.
- Docusaurus — rejected because the portal includes interactive components (Swagger Explorer, analytics charts) that are not well-suited to a documentation-only tool.

**Consequences**: The portal (`portal/`) has its own `package.json`, `tsconfig.json`, `tailwind.config.ts`, and `next.config.js`. It is built and run independently: `cd portal && npm install && npm run dev`. The portal calls the AgentIdP REST API using the same `@sentryagent/idp-sdk` as the dashboard.

---

### ADR-14: bull (Job Queue) + kafkajs (Event Streaming)

**Status**: Adopted (opt-in)

**Component**: Async job processing and event streaming

**Decision**: Use `bull` (Redis-backed job queue) for async webhook delivery retries and `kafkajs` for event streaming to external consumers. Both are opt-in — the system operates correctly without Kafka configured.

**Rationale**: Webhook delivery requires retry logic with exponential backoff and dead-letter handling. `bull` provides this out of the box using the existing Redis dependency. `kafkajs` enables high-throughput event streaming for analytics and audit events to external data pipelines without blocking the primary request path.

**Alternatives considered**:

- BullMQ — considered as a more modern alternative to `bull` but rejected to avoid adding a new package family during Phase 6. Migration is a future backlog item.

**Consequences**: Kafka is entirely optional. When `KAFKA_BROKERS` is not set, `kafkajs` is not initialised and no events are published. The `bull` queue for webhook delivery requires only the existing Redis instance.

---

### ADR-15: did-resolver + web-did-resolver (W3C DIDs)

**Status**: Adopted

**Component**: W3C DID Core 1.0 document resolution

**Decision**: Use `did-resolver` (v4.1.x) as the DID resolution framework and `web-did-resolver` (v2.0.x) for the `did:web` method implementation.

**Rationale**: `did-resolver` provides a pluggable resolver interface used by both the server (for internal resolution) and by third parties who want to verify AgentIdP-issued DIDs. The `did:web` method maps DID identifiers to HTTPS URLs hosting the DID document JSON, requiring no blockchain. `DIDService` generates documents that conform to the W3C DID Core 1.0 specification and include AGNTCY-specific extension fields.

**Consequences**: The `DID_WEB_DOMAIN` env var is required for DID generation. DID documents are cached in Redis (`did:doc:`, TTL from `DID_DOCUMENT_CACHE_TTL_SECONDS`, default 300s).
Private keys are stored in HashiCorp Vault KV v2 when Vault is configured; in dev mode, a `dev:no-vault` marker is stored and keys are ephemeral.
```

---

## File 3: `docs/engineering/04-codebase-structure.md`

**Operation:** Two surgical edits — update the directory tree and update the `src/` subdirectory table.

---

### Change 3a — Update the Annotated Directory Tree

**Find (inside the code block in Section 1, after the `sdk-java/` line):**

```
├── policies/ # OPA policy files
```

**Replace the entire block from `├── policies/` down through `└── jest.config.ts # Jest configuration — ts-jest, test timeouts, coverage thresholds` with the following updated version:**

```
├── sdk-rust/                     # Rust SDK (sentryagent-idp crate) — async, tokio, reqwest, typed errors
├── policies/                     # OPA policy files
│   ├── authz.rego                # Rego policy — normalise_path + scope-intersection allow rule
│   └── data/scopes.json          # Endpoint permission map — used by Rego and TypeScript fallback
├── portal/                       # Developer Portal — Next.js 14 App Router, Tailwind CSS
│   ├── app/                      # Next.js App Router pages (get-started, pricing, sdks, analytics, settings, login)
│   ├── components/               # Shared UI components (Nav.tsx, SwaggerExplorer.tsx, GetStartedWizard.tsx)
│   ├── hooks/                    # React hooks (useAuth.ts)
│   └── types/                    # TypeScript type definitions for portal-only types
├── terraform/                    # Terraform infrastructure as code
│   ├── modules/                  # Reusable modules: agentidp, lb, rds, redis
│   └── environments/             # Environment configs: aws/ (ECS+RDS+ElastiCache), gcp/ (Cloud Run+SQL+Memorystore)
├── monitoring/                   # Prometheus and Grafana configuration
│   ├── prometheus/               # prometheus.yml scrape configuration
│   └── grafana/                  # Grafana provisioning YAML and dashboard JSON files
├── docs/                         # All project documentation
│   ├── engineering/              # Internal engineering knowledge base (this directory)
│   ├── developers/               # End-user API reference and developer guides
│   ├── devops/                   # Operator runbooks and environment variable reference
│   ├── agntcy/                   # AGNTCY alignment documentation
│   └── openapi/                  # OpenAPI 3.0 specification files
├── openspec/                     # OpenSpec change management — proposals, designs, specs, tasks, archives
├── tests/                        # Jest test suite — mirrors src/ structure
│   ├── unit/                     # Unit tests (mocked dependencies) — mirrors src/
│   ├── integration/              # Integration tests (real DB + Redis)
│   ├── agntcy-conformance/       # AGNTCY conformance test suite (separate Jest config)
│   └── load/                     # k6 load test scripts
├── Dockerfile                    # Multi-stage production build (build + runtime stages)
├── docker-compose.yml            # Local development: PostgreSQL 14 (port 5432) + Redis 7 (port 6379)
├── docker-compose.monitoring.yml # Monitoring overlay: Prometheus (port 9090) + Grafana (port 3001)
├── package.json                  # Node.js dependencies and npm scripts
├── tsconfig.json                 # TypeScript strict configuration — compiled to dist/
└── jest.config.ts                # Jest configuration — ts-jest, test timeouts, coverage thresholds
```

---

### Change 3b — Add New src/ Subdirectories to Section 2

**Find (Section 2 table, the last row):**

```
| `src/cache/` | Redis client factory — creates and caches a single `redis` client instance | Client is a singleton created once in `src/app.ts` and passed to repositories |
```

**Insert these rows after that line:**

```
| `src/config/` | Configuration constants — `tiers.ts` exports `TIER_CONFIG`, `TIER_RANK`, `TierName`, and `isTierName()` type guard | Imported by `TierService` and `tierMiddleware`; never imports from services |
| `src/middleware/tier.ts` | Tier enforcement middleware — reads org tier from `TierService`, checks daily call counter in Redis, throws `TierLimitError` (429) when limit is exceeded, increments counter on pass | Applied only to API routes; skips `/health`, `/metrics`, and static file routes |
```

---

### Change 3c — Add New Entries to Section 3 (Where to Add New Code)

**Find (Section 3 table, after the `A new Prometheus metric` row):**

```
| A new TypeScript type used in 2+ files | `src/types/index.ts` | A new `AgentGroupMembership` interface |
```

**Insert these rows after that line:**

```
| A new tier-gated feature | `src/config/tiers.ts` (add limit field) + `src/middleware/tier.ts` (add check) + service (enforce) | Adding a `maxWebhooksPerOrg` tier limit |
| A webhook event handler | `src/services/WebhookService.ts` (add event type to `WebhookEventType`) + the producer that calls `void webhookService.dispatch(orgId, eventType, payload)` | Emitting `agent.decommissioned` events to subscriber URLs |
| A new analytics metric type | `src/services/AnalyticsService.ts` (call `recordEvent(tenantId, 'new_metric')` in the relevant service using `void`) | Recording `credential_rotated` events for analytics |
| A new DID endpoint | `src/controllers/DIDController.ts` + `src/routes/did.ts` + `src/services/DIDService.ts` (if new method needed) + `policies/data/scopes.json` | Adding `GET /api/v1/agents/:id/did/rotate-key` |
```

---

## File 4: `docs/engineering/README.md`

**Operation:** Replace the reading order table and quick reference table to reflect all Phase 6 additions.

---

### Change 4a — Update Reading Order Table

**Find (Section "Reading Order (New Engineers Start Here)", the last row):**

```
| 11 | [SDK Integration Guide](11-sdk-guide.md) | All 4 SDKs — installation, examples, contribution guide | 20 min |
```

**Replace with (adds the Rust SDK to the description and updates the estimated time):**

```
| 11 | [SDK Integration Guide](11-sdk-guide.md) | All 5 SDKs (Node.js, Python, Go, Java, Rust) — installation, examples, contribution guide | 25 min |
```

**Find (the line after the table):**

```
**Total estimated reading time for new engineers: ~3.5 hours**
```

**Replace with:**

```
**Total estimated reading time for new engineers: ~4 hours**
```

---

### Change 4b — Update "Service Deep Dives" Entry

**Find:**

```
| 5 | [Service Deep Dives](05-services.md) | All 8 services/components — purpose, interface, schema, error types | 30 min |
```

**Replace with:**

```
| 5 | [Service Deep Dives](05-services.md) | All 17 services/components (incl. Phase 3–6: AnalyticsService, TierService, ComplianceService, FederationService, DIDService, WebhookService, BillingService, DelegationService, OIDCService) — purpose, interface, schema, error types | 45 min |
```

---

### Change 4c — Update Quick Reference Table

**Find (in the Quick Reference section):**

```
| Integrate with the SDK | [11-sdk-guide.md](11-sdk-guide.md) |
```

**Replace with:**

```
| Integrate with the SDK (Node.js, Python, Go, Java, Rust) | [11-sdk-guide.md](11-sdk-guide.md) |
```

**Find (after the "Integrate with the SDK" row):**

```
| Understand why a technology was chosen | [03-tech-stack.md](03-tech-stack.md) |
```

**Insert after that row:**

```
| Understand tier limits and billing | [01-overview.md](01-overview.md) (Section 6) + [03-tech-stack.md](03-tech-stack.md) (ADR-11) |
| Understand AGNTCY compliance reports | [05-services.md](05-services.md) (ComplianceService) |
| Understand the A2A delegation flow | [06-walkthroughs.md](06-walkthroughs.md) (Walkthrough 4) |
| Run the AGNTCY conformance suite | [09-testing.md](09-testing.md) (Section 10.8) |
| Add a new Rust SDK endpoint | [11-sdk-guide.md](11-sdk-guide.md) (Section 6 contribution guide) |
```

---

## File 5: `docs/engineering/06-walkthroughs.md`

**Operation:** Append three new walkthrough sections at the end of the file.

**Find (the last line of the file):**

```
Returns `ICredentialWithSecret` — the updated credential including the new `clientSecret`. This is the only time the new secret is ever returned. The caller must store it securely.
```

**Append the following after that line:**

```markdown
---

## Walkthrough 4 — A2A Delegation End-to-End

**Request:** `POST /api/v1/oauth2/token/delegate` — one AI agent delegating a scoped capability to another

This walkthrough traces how agent A (an orchestrator) issues a delegation token that grants agent B (a sub-agent) the right to act on its behalf with a restricted scope.

---

### Step 1 — Route dispatch

**File:** `src/routes/delegation.ts`

```typescript
router.post(
  '/token/delegate',
  asyncHandler(authMiddleware),
  opaMiddleware,
  asyncHandler(delegationController.createDelegation.bind(delegationController))
);
```

Both `authMiddleware` and `opaMiddleware` run. The OPA policy requires scope `agents:write` for delegation creation.

---

### Step 2 — Controller: extract delegator and validate

**File:** `src/controllers/DelegationController.ts`

```typescript
const delegatorId = req.user.sub; // From the Bearer token's sub claim
const { delegatee_id, scope, expires_at } = req.body;
```

The controller validates that `delegatee_id` is a non-empty UUID, `scope` is a non-empty string, and `expires_at` (if provided) is a valid ISO 8601 datetime in the future. It passes these to `DelegationService.createDelegation()`.
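
The checks above can be sketched as a small standalone validator. This is illustrative only — the helper name, error strings, and UUID regex are assumptions, not the actual `DelegationController` code:

```typescript
// Illustrative sketch of the request-body validation described above.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface DelegateRequestBody {
  delegatee_id: string;
  scope: string;
  expires_at?: string; // ISO 8601, must be in the future when present
}

function validateDelegateBody(body: DelegateRequestBody): string[] {
  const errors: string[] = [];
  if (!UUID_RE.test(body.delegatee_id ?? '')) {
    errors.push('delegatee_id must be a UUID');
  }
  if (!body.scope || body.scope.trim() === '') {
    errors.push('scope must be a non-empty string');
  }
  if (body.expires_at !== undefined) {
    const ts = Date.parse(body.expires_at);
    if (Number.isNaN(ts)) errors.push('expires_at must be ISO 8601');
    else if (ts <= Date.now()) errors.push('expires_at must be in the future');
  }
  return errors;
}
```

An empty result means the body is acceptable; a non-empty result maps to a 400 response listing each failed check.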

---

### Step 3 — Service: verify both agents exist

**File:** `src/services/DelegationService.ts`

```typescript
const delegator = await this.agentRepository.findById(delegatorId);
if (!delegator || delegator.status !== 'active') {
  throw new AgentNotFoundError(delegatorId);
}

const delegatee = await this.agentRepository.findById(delegateeId);
if (!delegatee || delegatee.status !== 'active') {
  throw new AgentNotFoundError(delegateeId);
}
```

Both agents must exist and be in `active` status. A suspended or decommissioned agent cannot participate in delegation.

---

### Step 4 — Service: insert delegation chain record

**File:** `src/services/DelegationService.ts`

```typescript
await this.pool.query(
  `INSERT INTO delegation_chains (chain_id, delegator_id, delegatee_id, scope, status, expires_at)
   VALUES ($1, $2, $3, $4, 'active', $5)`,
  [chainId, delegatorId, delegateeId, scope, expiresAt]
);
```

The `chain_id` is a UUID generated by the service. The `delegation_chains` table provides the authoritative source of truth for which delegations are active, independent of any token.

---

### Step 5 — Response

```json
{
  "chain_id": "f1e2d3c4-...",
  "token": "eyJhbGciOiJSUzI1NiJ9...",
  "delegator_id": "a1b2c3d4-...",
  "delegatee_id": "b2c3d4e5-...",
  "scope": "agents:read",
  "status": "active",
  "expires_at": "2026-04-05T00:00:00Z"
}
```

The `token` field is the signed delegation JWT. The delegatee presents this token to `POST /api/v1/oauth2/token/verify-delegation` to prove it has authority to act on the delegator's behalf.

**Why store both the DB record and the JWT?** The DB record allows revocation — when the delegator calls `DELETE /api/v1/delegation-chains/:chainId`, the record is soft-deleted and all subsequent `verify-delegation` calls will fail even if the JWT itself has not yet expired.
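
The revocation semantics can be sketched as follows. The row shape and helper are hypothetical (not the actual `DelegationService` code), and the real `verify-delegation` endpoint also verifies the JWT signature and claims:

```typescript
// Illustrative sketch: verification must consult the DB record, not just the JWT.
interface DelegationChainRow {
  chain_id: string;
  status: 'active' | 'revoked';
  expires_at: Date;
}

function isDelegationValid(
  row: DelegationChainRow | undefined,
  now: Date = new Date(),
): boolean {
  if (!row) return false;                    // unknown chain_id — reject
  if (row.status !== 'active') return false; // revoked — reject even if the JWT is unexpired
  return row.expires_at.getTime() > now.getTime();
}
```

This is why a structurally valid, unexpired JWT is still rejected after revocation: the DB lookup, not the token, has the final say.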

---

## Walkthrough 5 — Tier Enforcement Request Lifecycle

**Request:** Any authenticated API request when the organisation's daily call limit is reached

This walkthrough traces how `tierMiddleware` intercepts a request before it reaches the OPA middleware, preventing quota-exceeded traffic from consuming service resources.

---

### Step 1 — Auth middleware passes

Same as Walkthrough 2, Step 3. The Bearer JWT is verified and `req.user` is populated with `sub` (agentId) and `organization_id`.

---

### Step 2 — Tier middleware: fetch org tier

**File:** `src/middleware/tier.ts`

```typescript
const orgId = req.user.organization_id;
const tier = await tierService.fetchTier(orgId);
const config = TIER_CONFIG[tier];
```

`fetchTier()` issues `SELECT tier FROM organizations WHERE organization_id = $1`. Returns `'free'` if no row is found (safe default).

---

### Step 3 — Tier middleware: read daily counter

**File:** `src/middleware/tier.ts`

```typescript
const callsKey = `rate:tier:calls:${orgId}`;
const callsToday = await redis.get(callsKey);
const count = callsToday !== null ? parseInt(callsToday, 10) : 0;

if (count >= config.maxCallsPerDay) {
  throw new TierLimitError('calls', config.maxCallsPerDay, { orgId, tier, current: count });
}
```

The Redis key `rate:tier:calls:<orgId>` is read. If it is null (first call of the day), the count is 0. When the count equals or exceeds the tier limit, `TierLimitError` (HTTP 429) is thrown immediately — no further middleware runs.

---

### Step 4 — Tier middleware: increment counter (fire-and-forget)

**File:** `src/middleware/tier.ts`

```typescript
// Set TTL to next UTC midnight if key is new
void redis.multi()
  .incr(callsKey)
  .expireAt(callsKey, nextUtcMidnightUnix())
  .exec();

next();
```

The counter is incremented atomically using a Redis MULTI block. The `EXPIREAT` command sets the key to auto-delete at the next UTC midnight, resetting the daily counter without any scheduled job.
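
`nextUtcMidnightUnix()` is referenced but not shown above. A minimal implementation consistent with that behaviour might look like this (a sketch, not the actual `src/middleware/tier.ts` helper):

```typescript
// Sketch: Unix timestamp (seconds) of the next UTC midnight after `now`.
function nextUtcMidnightUnix(now: Date = new Date()): number {
  const nextMidnightMs = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1, // Date.UTC normalises day overflow (month/year roll over)
  );
  return Math.floor(nextMidnightMs / 1000);
}
```

Because `Date.UTC` normalises an out-of-range day, this rolls over month and year boundaries correctly (e.g. 31 December advances to 1 January).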
The increment is fire-and-forget — the request proceeds immediately to `opaMiddleware`.

**Why expire at UTC midnight rather than a rolling 24-hour window?** Tier limits are documented as "per day", which users interpret as resetting at midnight. A rolling window would allow a user to consume their full daily quota twice within a 48-hour period straddling midnight, which is counterintuitive. UTC midnight is predictable and easy to reason about.

---

### Step 5 — Error handler serialises TierLimitError

**File:** `src/middleware/errorHandler.ts`

**Response (HTTP 429):**

```json
{
  "code": "TIER_LIMIT_EXCEEDED",
  "message": "Daily API call limit reached for your tier.",
  "details": { "tier": "free", "limit": 1000, "current": 1000 }
}
```

The `Retry-After` header is set to the number of seconds until the next UTC midnight so clients can implement automatic backoff.

---

## Walkthrough 6 — Analytics Event Capture Flow

**Trigger:** Any successful token issuance (`POST /api/v1/token`)

This walkthrough traces how an analytics event is captured without affecting the latency of the primary token issuance response.

---

### Step 1 — Token issuance completes

**File:** `src/services/OAuth2Service.ts`

```typescript
const accessToken = signToken(payload, this.privateKey);

// Primary response is ready — analytics is now fire-and-forget
void this.analyticsService.recordEvent(tenantId, 'token_issued');
tokensIssuedTotal.inc({ scope });
```

The `signToken()` call completes synchronously (RSA signing is CPU-bound, not I/O). The controller can now send the response. `analyticsService.recordEvent()` is called with `void` — the `await` is deliberately omitted.

**Why `void` instead of `await`?** Token issuance latency must remain below 100ms (per the QA performance gate). A PostgreSQL write adds 5–15ms. Since analytics data is aggregated (not transactional), losing an occasional event due to an error is acceptable. The response is never delayed for analytics.
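
Note the contract this implies: the `void` operator only discards the promise — it does not handle rejections, so the callee must never reject. A sketch of that contract with a hypothetical wrapper (not the actual service code):

```typescript
// Illustrative contract for fire-and-forget callees: never reject.
type Logger = (msg: string) => void;

function makeSafeRecorder(
  write: () => Promise<void>, // e.g. the analytics DB insert
  log: Logger,
): () => Promise<void> {
  return async () => {
    try {
      await write();
    } catch {
      // Swallow the error: the primary response has already been sent.
      log('[analytics] write failed — primary path unaffected');
    }
  };
}
```

If the callee rejected instead, each failed analytics write would surface as an unhandled promise rejection in the Node.js process.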

---

### Step 2 — AnalyticsService: UPSERT daily counter

**File:** `src/services/AnalyticsService.ts`

```typescript
async recordEvent(tenantId: string, metricType: string): Promise<void> {
  try {
    await this.pool.query(
      `INSERT INTO analytics_events (organization_id, date, metric_type, count)
       VALUES ($1, CURRENT_DATE, $2, 1)
       ON CONFLICT (organization_id, date, metric_type)
       DO UPDATE SET count = analytics_events.count + 1`,
      [tenantId, metricType],
    );
  } catch (err) {
    console.error('[AnalyticsService] recordEvent failed — primary path unaffected', err);
  }
}
```

The `ON CONFLICT DO UPDATE` upsert is atomic. Whether this is the first or the ten-thousandth `token_issued` event for this tenant today, the row is updated correctly. All errors are caught and swallowed — the token has already been returned to the caller.

**Why one row per day per metric, not one row per event?** Storing a row per event would create millions of rows. The daily aggregate model keeps the table compact while still providing daily trend data (the granularity that analytics dashboards need). Sub-day granularity is available from the Prometheus `agentidp_tokens_issued_total` counter if needed.

---

### Step 3 — Dashboard query (deferred)

When a developer visits the analytics page in the developer portal, the portal calls:

```
GET /api/v1/analytics/token-trend?days=30
```

**File:** `src/services/AnalyticsService.ts` — `getTokenTrend(tenantId, 30)`

```sql
SELECT
  gs.date::DATE::TEXT AS date,
  COALESCE(ae.count, 0)::INTEGER AS count
FROM generate_series(
  CURRENT_DATE - 29 * INTERVAL '1 day',
  CURRENT_DATE,
  INTERVAL '1 day'
) AS gs(date)
LEFT JOIN analytics_events ae
  ON  ae.date = gs.date::DATE
  AND ae.organization_id = $1  -- the tenant's organization_id
  AND ae.metric_type = 'token_issued'
ORDER BY gs.date ASC
```

The `generate_series` + `LEFT JOIN` pattern ensures all 30 days appear in the result, with `count: 0` for days with no events. This avoids the need for the client to fill in gaps.
```
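
For reviewers of Walkthrough 6: the gap-filling that the `generate_series` + `LEFT JOIN` query performs can be expressed in TypeScript for illustration. The helper and its types are hypothetical, not repo code:

```typescript
// Sketch: the gap-filling the SQL performs, expressed in TypeScript.
// `counts` maps 'YYYY-MM-DD' to the stored daily aggregate (missing = 0).
function fillTrend(
  days: number,
  counts: Map<string, number>,
  today: Date,
): { date: string; count: number }[] {
  const out: { date: string; count: number }[] = [];
  for (let i = days - 1; i >= 0; i--) {
    const d = new Date(
      Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - i),
    );
    const key = d.toISOString().slice(0, 10); // 'YYYY-MM-DD'
    out.push({ date: key, count: counts.get(key) ?? 0 });
  }
  return out;
}
```

Every requested day appears exactly once, in ascending order, with `0` substituted for days that have no `analytics_events` row — the same contract the SQL guarantees server-side.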