docs: commit all Phase 6 documentation updates and OpenSpec archives

- devops docs: 8 files updated for Phase 6 state; field-trial.md added (946-line runbook)
- developer docs: api-reference (50+ endpoints), quick-start, 5 existing guides updated, 5 new guides added
- engineering docs: all 12 files updated (services, architecture, SDK guide, testing, overview)
- OpenSpec archives: phase-7-devops-field-trial, developer-docs-phase6-update, engineering-docs-phase6-update
- VALIDATOR.md + scripts/start-validator.sh: V&V Architect tooling added
- .gitignore: exclude session artifacts, build artifacts, and agent workspaces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-04-07 02:24:24 +00:00
parent 0fb00256b4
commit 8cabc0191c
56 changed files with 12780 additions and 446 deletions


@@ -18,21 +18,22 @@ Always start services in this order. Starting the application before PostgreSQL
### Startup checklist
```bash
# 1. Start PostgreSQL and Redis
docker-compose up -d postgres redis
# 1. Start the full stack
docker compose up --build -d
# 2. Wait for healthy status
docker-compose ps
# Both postgres and redis must show "healthy" before proceeding
# 2. Verify all three services are healthy
docker compose ps
# app, postgres, and redis must all show "healthy"
# 3. Run migrations
npm run db:migrate
# Must complete with 0 errors before starting the app
docker compose exec app npm run db:migrate
# 4. Start the application
npm run dev # development
# or
npm start # production (requires prior npm run build)
# 4. Verify application health
curl http://localhost:3000/health
# Expected: {"status":"ok"}
# 5. (Optional) Start the portal for local dev
cd portal && npm run dev
```
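The checklist above depends on each service being healthy before the next step runs. A minimal sketch of a polling helper for that wait follows. `wait_until` is a hypothetical name, not part of the repository, and the attempt count and delay in the usage note are illustrative assumptions.

```bash
# Hypothetical helper (not part of the repo): retry a command until it succeeds.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
# Note: variables are global because POSIX sh has no `local`.
wait_until() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the check quietly; return success as soon as it passes
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$(( attempt + 1 ))
    # Pause before the next attempt (also runs after the final failure)
    sleep "$delay"
  done
  # Every attempt failed
  return 1
}
```

For example, `wait_until 30 2 curl -fsS http://localhost:3000/health` would poll the health endpoint from step 4 up to 30 times, two seconds apart, and return non-zero if it never responds successfully.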
---
@@ -115,9 +116,12 @@ docker-compose exec redis redis-cli
| Key pattern | Example | Purpose | TTL |
|------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-b5a6-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per minute window | 60 seconds |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Token issuance count for free tier | End of month |
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |
Inspect keys:
@@ -130,6 +134,16 @@ redis-cli GET "rate:<client_id>:<window_key>"
# Check monthly token count for a specific client
redis-cli GET "monthly:<client_id>:2026:3"
# Check tier API call counter for a tenant
redis-cli GET "rate:tier:calls:<org_id>"
# Check tier token counter for a tenant
redis-cli GET "rate:tier:tokens:<org_id>"
# Check cached compliance report for a tenant
redis-cli GET "compliance:report:<org_id>"
redis-cli TTL "compliance:report:<org_id>"
```
Where `<window_key>` is `floor(unix_ms / 60000)`. For the current window:
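The calculation can be sketched as below, assuming the 60000 ms window used in the formula above. The key table lists the TTL as `RATE_LIMIT_WINDOW_MS`, so a different configured value would presumably change the divisor. `window_key` is a hypothetical helper name, not something the repository provides.

```bash
# Hypothetical helper: derive the rate-limit window key from a millisecond timestamp.
# The second argument is the window size in ms; 60000 is assumed from the formula above.
window_key() {
  echo $(( $1 / ${2:-60000} ))
}

# Build the key for the current window.
# date +%s is in seconds, so multiply by 1000 to get milliseconds.
now_ms=$(( $(date +%s) * 1000 ))
echo "rate:<client_id>:$(window_key "$now_ms")"
```

Shell integer division truncates, which matches `floor` for the positive timestamps involved here.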
@@ -258,12 +272,25 @@ AgentIdP exposes a Prometheus metrics endpoint at `GET /metrics` (unauthenticate
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued successfully |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered successfully |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests received |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration |
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP latency |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |
| `agentidp_webhook_dead_letters_total` | Counter | `event_type` | Webhook deliveries moved to dead-letter queue |
| `agentidp_credentials_expiring_soon_total` | Gauge | — | Credentials expiring within 7 days |
| `agentidp_audit_chain_integrity` | Gauge | — | `1` if audit chain is intact, `0` if broken |
| `agentidp_rate_limit_hits_total` | Counter | `client_id` | Rate limit rejections |
| `agentidp_db_pool_active_connections` | Gauge | — | Active PostgreSQL connections |
| `agentidp_db_pool_waiting_requests` | Gauge | — | Requests waiting for a pool connection |
| `agentidp_tenant_api_calls_total` | Counter | `org_id`, `tier` | API calls per tenant per tier |
| `agentidp_billing_limit_rejections_total` | Counter | `org_id`, `limit_type` | Tier limit enforcement rejections |
| `agentidp_did_documents_generated_total` | Counter | — | DID documents generated |
| `agentidp_oidc_tokens_issued_total` | Counter | — | OIDC ID tokens issued |
| `agentidp_federation_events_total` | Counter | `event_type` | Federation partner events |
| `agentidp_delegation_chains_created_total` | Counter | — | A2A delegation chains created |
| `agentidp_compliance_reports_generated_total` | Counter | — | Compliance reports generated |
### Starting the Monitoring Stack
@@ -282,3 +309,50 @@ The Grafana dashboard auto-provisions on first start. Navigate to **Dashboards
`GET /metrics` is unauthenticated. In production, ensure this endpoint is:
- Only accessible from your internal network (firewall rule or reverse proxy restriction)
- Not exposed on a public-facing port
---
### Tier limit rejected — 429 with `tier_limit_exceeded` code
Symptom: `429 TOO_MANY_REQUESTS` with body `{"code":"tier_limit_exceeded","message":"..."}`
Check the tenant's current tier counter:
```bash
# Check API call counter
docker compose exec redis redis-cli GET "rate:tier:calls:<org_id>"
# Check the tenant's tier
psql "$DATABASE_URL" -c "SELECT org_id, tier FROM tenant_tiers WHERE org_id = '<org_id>';"
```
If the org is on the `free` tier and has hit 1,000 calls/day, upgrade the tier or wait until
midnight UTC for the counter to reset.
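To estimate how long a tenant would have to wait, the time remaining until the reset can be computed locally. This is a sketch that assumes the counter resets exactly at 00:00 UTC, as stated above. `seconds_until_utc_midnight` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: seconds from a Unix timestamp until the next 00:00 UTC.
# Defaults to the current time when no argument is given.
seconds_until_utc_midnight() {
  now_s="${1:-$(date +%s)}"
  # Unix time counts 86400-second days from midnight UTC, so the remainder
  # is the number of seconds already elapsed in the current UTC day.
  echo $(( 86400 - (now_s % 86400) ))
}

echo "Counter should reset in $(seconds_until_utc_midnight) seconds"
```

Running `redis-cli TTL "rate:tier:calls:<org_id>"` against the live key should report a comparable value if the counter's expiry is set as the key table describes.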
---
### Analytics endpoints return 404
Cause: `ANALYTICS_ENABLED` is set to `false` in `.env`.
Fix: Set `ANALYTICS_ENABLED=true` and restart the application.
---
### Compliance report returns 404
Cause: `COMPLIANCE_ENABLED` is set to `false` in `.env`.
Fix: Set `COMPLIANCE_ENABLED=true` and restart the application.
---
### Portal CORS error
Symptom: Browser console shows `Access-Control-Allow-Origin` error on requests to
`http://localhost:3000`.
Fix: Ensure `CORS_ORIGIN` in `.env` includes `http://localhost:3001`:
```
CORS_ORIGIN=http://localhost:3001
```
Restart the application after changing this variable.