docs: commit all Phase 6 documentation updates and OpenSpec archives
- devops docs: 8 files updated for Phase 6 state; field-trial.md added (946-line runbook)
- developer docs: api-reference (50+ endpoints), quick-start, 5 existing guides updated, 5 new guides added
- engineering docs: all 12 files updated (services, architecture, SDK guide, testing, overview)
- OpenSpec archives: phase-7-devops-field-trial, developer-docs-phase6-update, engineering-docs-phase6-update
- VALIDATOR.md + scripts/start-validator.sh: V&V Architect tooling added
- .gitignore: exclude session artifacts, build artifacts, and agent workspaces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -715,3 +715,260 @@ must store it securely.
  "revokedAt": null
}
```

---
## Walkthrough 4 — A2A Delegation End-to-End

**Request:** `POST /api/v1/oauth2/token/delegate` — one AI agent delegating a scoped capability to another

This walkthrough traces how agent A (an orchestrator) issues a delegation token that grants agent B (a sub-agent) the right to act on its behalf with a restricted scope.

---

### Step 1 — Route dispatch

**File:** `src/routes/delegation.ts`

```typescript
router.post(
  '/token/delegate',
  asyncHandler(authMiddleware),
  opaMiddleware,
  asyncHandler(delegationController.createDelegation.bind(delegationController))
);
```

Both `authMiddleware` and `opaMiddleware` run. The OPA policy requires scope `agents:write` for delegation creation.

---

### Step 2 — Controller: extract delegator and validate

**File:** `src/controllers/DelegationController.ts`

```typescript
const delegatorId = req.user.sub; // From the Bearer token's sub claim
const { delegatee_id, scope, expires_at } = req.body;
```

The controller validates that `delegatee_id` is a non-empty UUID, `scope` is a non-empty string, and `expires_at` (if provided) is a valid ISO 8601 datetime in the future. It passes these to `DelegationService.createDelegation()`.
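Those rules can be sketched as a standalone helper. This is a hypothetical `validateDelegationRequest`, not the project's actual controller code, and the error strings are illustrative:

```typescript
// Hypothetical sketch of the validation rules described above.
// ValidationError handling and field names follow the request body shown.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface DelegationRequest {
  delegatee_id: string;
  scope: string;
  expires_at?: string;
}

function validateDelegationRequest(body: DelegationRequest): string[] {
  const errors: string[] = [];
  if (!body.delegatee_id || !UUID_RE.test(body.delegatee_id)) {
    errors.push('delegatee_id must be a non-empty UUID');
  }
  if (typeof body.scope !== 'string' || body.scope.trim() === '') {
    errors.push('scope must be a non-empty string');
  }
  if (body.expires_at !== undefined) {
    const ts = Date.parse(body.expires_at); // NaN for malformed ISO 8601
    if (Number.isNaN(ts) || ts <= Date.now()) {
      errors.push('expires_at must be a valid ISO 8601 datetime in the future');
    }
  }
  return errors;
}
```

Collecting all violations (rather than throwing on the first) lets the controller report every problem in one 400 response.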

---

### Step 3 — Service: verify both agents exist

**File:** `src/services/DelegationService.ts`

```typescript
const delegator = await this.agentRepository.findById(delegatorId);
if (!delegator || delegator.status !== 'active') {
  throw new AgentNotFoundError(delegatorId);
}

const delegatee = await this.agentRepository.findById(delegateeId);
if (!delegatee || delegatee.status !== 'active') {
  throw new AgentNotFoundError(delegateeId);
}
```

Both agents must exist and be in `active` status. A suspended or decommissioned agent cannot participate in delegation.

---

### Step 4 — Service: insert delegation chain record

**File:** `src/services/DelegationService.ts`

```typescript
await this.pool.query(
  `INSERT INTO delegation_chains (chain_id, delegator_id, delegatee_id, scope, status, expires_at)
   VALUES ($1, $2, $3, $4, 'active', $5)`,
  [chainId, delegatorId, delegateeId, scope, expiresAt]
);
```

The `chain_id` is a UUID generated by the service. The `delegation_chains` table provides the authoritative source of truth for which delegations are active, independent of any token.

---

### Step 5 — Response

```json
{
  "chain_id": "f1e2d3c4-...",
  "token": "eyJhbGciOiJSUzI1NiJ9...",
  "delegator_id": "a1b2c3d4-...",
  "delegatee_id": "b2c3d4e5-...",
  "scope": "agents:read",
  "status": "active",
  "expires_at": "2026-04-05T00:00:00Z"
}
```

The `token` field is the signed delegation JWT. The delegatee presents this token to `POST /api/v1/oauth2/token/verify-delegation` to prove it has authority to act on the delegator's behalf.

**Why store both the DB record and the JWT?** The DB record allows revocation — when the delegator calls `DELETE /api/v1/delegation-chains/:chainId`, the record is soft-deleted and all subsequent `verify-delegation` calls will fail even if the JWT itself has not yet expired.
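That revocation semantics can be sketched as a predicate over the stored chain row. This is a hypothetical `isChainUsable` helper, not the actual `verify-delegation` implementation; the row shape follows the `INSERT` in Step 4:

```typescript
// Hypothetical predicate: JWT signature/exp checks pass first, then the
// delegation_chains row is consulted so a revoked (soft-deleted) chain
// fails verification even before the JWT itself expires.
interface ChainRow {
  status: string;          // 'active' | 'revoked' | ...
  expires_at: Date | null; // null means no server-side expiry
}

function isChainUsable(row: ChainRow | undefined, now: Date): boolean {
  if (!row) return false;                    // chain never existed
  if (row.status !== 'active') return false; // revoked / soft-deleted
  if (row.expires_at && row.expires_at.getTime() <= now.getTime()) {
    return false;                            // chain expired server-side
  }
  return true;
}
```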

---

## Walkthrough 5 — Tier Enforcement Request Lifecycle

**Request:** Any authenticated API request when the organisation's daily call limit is reached

This walkthrough traces how `tierMiddleware` intercepts a request before it reaches the OPA middleware, preventing quota-exceeded traffic from consuming service resources.

---

### Step 1 — Auth middleware passes

Same as Walkthrough 2, Step 3. The Bearer JWT is verified and `req.user` is populated with `sub` (agentId) and `organization_id`.

---

### Step 2 — Tier middleware: fetch org tier

**File:** `src/middleware/tier.ts`

```typescript
const orgId = req.user.organization_id;
const tier = await tierService.fetchTier(orgId);
const config = TIER_CONFIG[tier];
```

`fetchTier()` issues `SELECT tier FROM organizations WHERE organization_id = $1`. Returns `'free'` if no row is found (safe default).
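The shape of `TIER_CONFIG` and the safe-default fallback might look like the following sketch. Only the `free` tier and its 1000-call limit are confirmed by this document (via the 429 example in Step 5); the `pro` tier name and its limit are invented for illustration:

```typescript
// Illustrative sketch only: tier names beyond 'free' and their limits
// are assumptions, not the project's actual configuration.
type Tier = 'free' | 'pro';

interface TierConfig {
  maxCallsPerDay: number;
}

const TIER_CONFIG: Record<Tier, TierConfig> = {
  free: { maxCallsPerDay: 1000 },
  pro: { maxCallsPerDay: 100000 }, // assumed value
};

// Safe default: an unknown or missing org row resolves to the most
// restrictive tier instead of failing open with no limit at all.
function resolveTier(row: { tier: string } | undefined): Tier {
  return row && row.tier in TIER_CONFIG ? (row.tier as Tier) : 'free';
}
```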

---

### Step 3 — Tier middleware: read daily counter

**File:** `src/middleware/tier.ts`

```typescript
const callsKey = `rate:tier:calls:${orgId}`;
const callsToday = await redis.get(callsKey);
const count = callsToday !== null ? parseInt(callsToday, 10) : 0;

if (count >= config.maxCallsPerDay) {
  throw new TierLimitError('calls', config.maxCallsPerDay, { orgId, tier, current: count });
}
```

The Redis key `rate:tier:calls:<orgId>` is read. If null (first call of the day), count is 0. When count equals or exceeds the tier limit, `TierLimitError` (HTTP 429) is thrown immediately — no further middleware runs.

---

### Step 4 — Tier middleware: increment counter (fire-and-forget)

**File:** `src/middleware/tier.ts`

```typescript
// Set TTL to next UTC midnight if key is new
void redis.multi()
  .incr(callsKey)
  .expireAt(callsKey, nextUtcMidnightUnix())
  .exec();
next();
```

The counter is incremented atomically using a Redis MULTI block. The `EXPIREAT` command sets the key to auto-delete at the next UTC midnight, resetting the daily counter without any scheduled job. The increment is fire-and-forget — the request proceeds immediately to `opaMiddleware`.

**Why expire at UTC midnight rather than a rolling 24-hour window?** Tier limits are documented as "per day", which users interpret as resetting at midnight. A rolling window would allow a user to consume their full daily quota twice within a 48-hour period straddling midnight, which is counterintuitive. UTC midnight is predictable and easy to reason about.
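The `nextUtcMidnightUnix()` helper is referenced but not shown in this walkthrough; a plausible implementation (an assumption, not the project's actual code) is:

```typescript
// Hypothetical implementation of nextUtcMidnightUnix(). Date.UTC rolls
// the day over correctly across month and year boundaries, so passing
// getUTCDate() + 1 always yields tomorrow at 00:00:00 UTC.
function nextUtcMidnightUnix(now: Date = new Date()): number {
  const nextMidnightMs = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1, // tomorrow, 00:00:00 UTC
  );
  return Math.floor(nextMidnightMs / 1000); // EXPIREAT takes Unix seconds
}
```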

---

### Step 5 — Error handler serialises TierLimitError

**File:** `src/middleware/errorHandler.ts`

HTTP 429:

```json
{
  "code": "TIER_LIMIT_EXCEEDED",
  "message": "Daily API call limit reached for your tier.",
  "details": {
    "tier": "free",
    "limit": 1000,
    "current": 1000
  }
}
```

The `Retry-After` header is set to the number of seconds until next UTC midnight so clients can implement automatic backoff.
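Assuming the handler reuses the same next-UTC-midnight arithmetic as the counter TTL, the header value could be computed as follows (a sketch, not the actual `errorHandler` code):

```typescript
// Sketch: seconds remaining until the quota resets at next UTC midnight.
// Clamped to at least 1 so Retry-After is never zero or negative.
function retryAfterSeconds(now: Date = new Date()): number {
  const nextMidnightMs = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1,
  );
  return Math.max(1, Math.ceil((nextMidnightMs - now.getTime()) / 1000));
}

// Hypothetical usage inside an Express error handler:
// res.set('Retry-After', String(retryAfterSeconds())).status(429).json(body);
```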

---

## Walkthrough 6 — Analytics Event Capture Flow

**Trigger:** Any successful token issuance (`POST /api/v1/token`)

This walkthrough traces how an analytics event is captured without affecting the latency of the primary token issuance response.

---

### Step 1 — Token issuance completes

**File:** `src/services/OAuth2Service.ts`

```typescript
const accessToken = signToken(payload, this.privateKey);
// Primary response is ready — analytics is now fire-and-forget
void this.analyticsService.recordEvent(tenantId, 'token_issued');
tokensIssuedTotal.inc({ scope });
```

The `signToken()` call completes synchronously (RSA signing is CPU-bound, not I/O). The controller can now send the response. `analyticsService.recordEvent()` is called with `void` — the `await` is deliberately omitted.

**Why `void` instead of `await`?** Token issuance latency must remain below 100ms (per the QA performance gate). A PostgreSQL write adds 5–15ms. Since analytics data is aggregated (not transactional), losing an occasional event due to an error is acceptable. The response is never delayed for analytics.

---

### Step 2 — AnalyticsService: UPSERT daily counter

**File:** `src/services/AnalyticsService.ts`

```typescript
async recordEvent(tenantId: string, metricType: string): Promise<void> {
  try {
    await this.pool.query(
      `INSERT INTO analytics_events (organization_id, date, metric_type, count)
       VALUES ($1, CURRENT_DATE, $2, 1)
       ON CONFLICT (organization_id, date, metric_type)
       DO UPDATE SET count = analytics_events.count + 1`,
      [tenantId, metricType],
    );
  } catch (err) {
    console.error('[AnalyticsService] recordEvent failed — primary path unaffected', err);
  }
}
```

The `ON CONFLICT DO UPDATE` upsert is atomic. Whether this is the first or the ten-thousandth `token_issued` event for this tenant today, the row is updated correctly. All errors are caught and swallowed — the token has already been returned to the caller.

**Why one row per day per metric, not one row per event?** Storing a row per event would create millions of rows. The daily aggregate model keeps the table compact while still providing daily trend data (the granularity that analytics dashboards need). Sub-day granularity is available from the Prometheus `agentidp_tokens_issued_total` counter if needed.

---

### Step 3 — Dashboard query (deferred)

When a developer visits the analytics page in the developer portal, the portal calls:

```
GET /api/v1/analytics/token-trend?days=30
```

**File:** `src/services/AnalyticsService.ts` — `getTokenTrend(tenantId, 30)`

```sql
SELECT
  gs.date::DATE::TEXT AS date,
  COALESCE(ae.count, 0)::INTEGER AS count
FROM generate_series(
  CURRENT_DATE - ($1::INTEGER - 1) * INTERVAL '1 day',
  CURRENT_DATE,
  INTERVAL '1 day'
) AS gs(date)
LEFT JOIN analytics_events ae
  ON ae.date = gs.date::DATE
  AND ae.organization_id = $2
  AND ae.metric_type = 'token_issued'
ORDER BY gs.date ASC
```

With `$1` bound to the `days` argument (30 here), the series starts `days - 1` days ago and ends today. The `generate_series` + `LEFT JOIN` pattern ensures all 30 days appear in the result, with `count: 0` for days with no events. This avoids the need for the client to fill in gaps.
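For contrast, here is roughly what a client would have to do if the query returned only sparse rows. This `fillGaps` helper is illustrative, not part of the codebase; it exists only to show the work the SQL gap-fill avoids:

```typescript
// Client-side equivalent of the generate_series gap-fill: expand a sparse
// list of {date, count} rows into a dense, zero-filled series of N days.
interface TrendPoint {
  date: string; // YYYY-MM-DD
  count: number;
}

function fillGaps(sparse: TrendPoint[], start: string, days: number): TrendPoint[] {
  const byDate = new Map(sparse.map(p => [p.date, p.count] as [string, number]));
  const out: TrendPoint[] = [];
  const d = new Date(start + 'T00:00:00Z');
  for (let i = 0; i < days; i++) {
    const iso = d.toISOString().slice(0, 10); // YYYY-MM-DD
    out.push({ date: iso, count: byDate.get(iso) ?? 0 });
    d.setUTCDate(d.getUTCDate() + 1); // advance one UTC day
  }
  return out;
}
```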
|
||||
|
||||