docs: commit all Phase 6 documentation updates and OpenSpec archives

- devops docs: 8 files updated for Phase 6 state; field-trial.md added (946-line runbook)
- developer docs: api-reference (50+ endpoints), quick-start, 5 existing guides updated, 5 new guides added
- engineering docs: all 12 files updated (services, architecture, SDK guide, testing, overview)
- OpenSpec archives: phase-7-devops-field-trial, developer-docs-phase6-update, engineering-docs-phase6-update
- VALIDATOR.md + scripts/start-validator.sh: V&V Architect tooling added
- .gitignore: exclude session artifacts, build artifacts, and agent workspaces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -18,21 +18,22 @@ Always start services in this order. Starting the application before PostgreSQL
 ### Startup checklist

 ```bash
-# 1. Start PostgreSQL and Redis
-docker-compose up -d postgres redis
+# 1. Start the full stack
+docker compose up --build -d

-# 2. Wait for healthy status
-docker-compose ps
-# Both postgres and redis must show "healthy" before proceeding
+# 2. Verify all three services are healthy
+docker compose ps
+# app, postgres, and redis must all show "healthy"

 # 3. Run migrations
-npm run db:migrate
-# Must complete with 0 errors before starting the app
+docker compose exec app npm run db:migrate

-# 4. Start the application
-npm run dev          # development
-# or
-npm start            # production (requires prior npm run build)
+# 4. Verify application health
+curl http://localhost:3000/health
+# Expected: {"status":"ok"}
+
+# 5. (Optional) Start the portal for local dev
+cd portal && npm run dev
 ```

 ---
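The health check in the startup checklist (`curl http://localhost:3000/health`, expecting `{"status":"ok"}`) can be scripted so that a deploy script blocks until the application answers. The following is a minimal sketch under those assumptions; the `wait_for_health` function name, the retry count, and the delay are illustrative choices, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a health URL until it reports {"status":"ok"}.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i body
  for (( i = 1; i <= attempts; i++ )); do
    # -s silences progress output; -f makes curl exit non-zero on HTTP errors (>= 400).
    # The match assumes the compact JSON form shown in the checklist.
    if body=$(curl -sf "$url") && [[ "$body" == *'"status":"ok"'* ]]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Example (commented out so the snippet has no side effects when sourced):
# wait_for_health http://localhost:3000/health 30 2
```

The function returns a non-zero status on timeout, so it can gate later steps with `wait_for_health ... && next_step`.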
@@ -115,9 +116,12 @@ docker-compose exec redis redis-cli

 | Key pattern | Example | Purpose | TTL |
 |------------|---------|---------|-----|
-| `revoked:<jti>` | `revoked:f1e2d3c4-b5a6-...` | Revoked token JTI | Remaining token lifetime |
-| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per minute window | 60 seconds |
-| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Token issuance count for free tier | End of month |
+| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
+| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
+| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
+| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
+| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
+| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |

 Inspect keys:
@@ -130,6 +134,16 @@ redis-cli GET "rate:<client_id>:<window_key>"

 # Check monthly token count for a specific client
 redis-cli GET "monthly:<client_id>:2026:3"
+
+# Check tier API call counter for a tenant
+redis-cli GET "rate:tier:calls:<org_id>"
+
+# Check tier token counter for a tenant
+redis-cli GET "rate:tier:tokens:<org_id>"
+
+# Check cached compliance report for a tenant
+redis-cli GET "compliance:report:<org_id>"
+redis-cli TTL "compliance:report:<org_id>"
 ```

 Where `<window_key>` is `floor(unix_ms / 60000)`. For the current window:
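The window-key formula `floor(unix_ms / 60000)` can be evaluated directly in the shell. The following is a minimal sketch, assuming the 60-second window used in that formula; the `window_key` helper and its optional window-size argument are hypothetical names introduced for illustration, not part of AgentIdP.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute the rate-limit window key from a Unix
# timestamp in milliseconds. The 60000 ms default mirrors the formula
# floor(unix_ms / 60000); pass a second argument if your window differs.
window_key() {
  local unix_ms="$1"
  local window_ms="${2:-60000}"
  # Bash integer division truncates, which equals floor() for non-negative values.
  echo $(( unix_ms / window_ms ))
}

# Current window: `date +%s` prints whole seconds, so scale to milliseconds first.
now_ms=$(( $(date +%s) * 1000 ))
echo "current window key: $(window_key "$now_ms")"

# Use the key to read the counter (replace <client_id> with a real client ID):
# redis-cli GET "rate:<client_id>:$(window_key "$now_ms")"
```

If the deployment overrides the window length, passing that value in milliseconds as the second argument keeps the key aligned with the counter being inspected.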
@@ -258,12 +272,25 @@ AgentIdP exposes a Prometheus metrics endpoint at `GET /metrics` (unauthenticate

 | Metric | Type | Labels | Description |
 |--------|------|--------|-------------|
-| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued successfully |
-| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered successfully |
-| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests received |
-| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration |
+| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued |
+| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered |
+| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests |
+| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP latency |
 | `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
 | `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |
+| `agentidp_webhook_dead_letters_total` | Counter | `event_type` | Webhook deliveries moved to dead-letter queue |
+| `agentidp_credentials_expiring_soon_total` | Gauge | — | Credentials expiring within 7 days |
+| `agentidp_audit_chain_integrity` | Gauge | — | `1` if audit chain is intact, `0` if broken |
+| `agentidp_rate_limit_hits_total` | Counter | `client_id` | Rate limit rejections |
+| `agentidp_db_pool_active_connections` | Gauge | — | Active PostgreSQL connections |
+| `agentidp_db_pool_waiting_requests` | Gauge | — | Requests waiting for a pool connection |
+| `agentidp_tenant_api_calls_total` | Counter | `org_id`, `tier` | API calls per tenant per tier |
+| `agentidp_billing_limit_rejections_total` | Counter | `org_id`, `limit_type` | Tier limit enforcement rejections |
+| `agentidp_did_documents_generated_total` | Counter | — | DID documents generated |
+| `agentidp_oidc_tokens_issued_total` | Counter | — | OIDC ID tokens issued |
+| `agentidp_federation_events_total` | Counter | `event_type` | Federation partner events |
+| `agentidp_delegation_chains_created_total` | Counter | — | A2A delegation chains created |
+| `agentidp_compliance_reports_generated_total` | Counter | — | Compliance reports generated |

 ### Starting the Monitoring Stack
@@ -282,3 +309,50 @@ The Grafana dashboard auto-provisions on first start. Navigate to **Dashboards
 `GET /metrics` is unauthenticated. In production, ensure this endpoint is:
 - Only accessible from your internal network (firewall rule or reverse proxy restriction)
 - Not exposed on a public-facing port
+
+---
+
+### Tier limit rejected — 429 with `tier_limit_exceeded` code
+
+Symptom: `429 TOO_MANY_REQUESTS` with body `{"code":"tier_limit_exceeded","message":"..."}`
+
+Check the tenant's current tier counter:
+```bash
+# Check API call counter
+docker compose exec redis redis-cli GET "rate:tier:calls:<org_id>"
+
+# Check the tenant's tier
+psql "$DATABASE_URL" -c "SELECT org_id, tier FROM tenant_tiers WHERE org_id = '<org_id>';"
+```
+
+If the org is on the `free` tier and has hit 1,000 calls/day, upgrade the tier or wait until
+midnight UTC for the counter to reset.
+
+---
+
+### Analytics endpoints return 404
+
+Cause: `ANALYTICS_ENABLED` is set to `false` in `.env`.
+
+Fix: Set `ANALYTICS_ENABLED=true` and restart the application.
+
+---
+
+### Compliance report returns 404
+
+Cause: `COMPLIANCE_ENABLED` is set to `false` in `.env`.
+
+Fix: Set `COMPLIANCE_ENABLED=true` and restart the application.
+
+---
+
+### Portal CORS error
+
+Symptom: Browser console shows `Access-Control-Allow-Origin` error on requests to
+`http://localhost:3000`.
+
+Fix: Ensure `CORS_ORIGIN` in `.env` includes `http://localhost:3001`:
+```
+CORS_ORIGIN=http://localhost:3001
+```
+Restart the application after changing this variable.
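The `CORS_ORIGIN` fix for the portal can be checked without a browser by sending a request that carries an `Origin` header and inspecting the response headers. The following is a minimal sketch, assuming the API listens on `http://localhost:3000` and that the `/health` route passes through the same CORS handling as the other routes (not confirmed by the diff); the `allows_origin` helper is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and succeed if
# Access-Control-Allow-Origin equals the given origin (or the wildcard *).
allows_origin() {
  local origin="$1"
  awk -v origin="$origin" '
    BEGIN { found = 1 }                       # 1 = not found (shell failure status)
    # Header names are case-insensitive, so compare the first field in lowercase.
    tolower($1) == "access-control-allow-origin:" {
      sub(/\r$/, "", $2)                      # strip the CR from CRLF line endings
      if ($2 == origin || $2 == "*") found = 0
    }
    END { exit found }
  '
}

# Example (commented out because it needs the running stack):
# -D - writes the response headers to stdout, -o /dev/null discards the body.
# curl -s -o /dev/null -D - -H "Origin: http://localhost:3001" \
#   http://localhost:3000/health | allows_origin "http://localhost:3001" \
#   && echo "CORS OK" || echo "CORS header missing or mismatched"
```

The comparison is exact, so an allowed origin of `http://localhost:30010` does not satisfy a check for `http://localhost:3001`.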