sentryagent-idp/docs/devops/architecture.md
SentryAgent.ai Developer 8cabc0191c docs: commit all Phase 6 documentation updates and OpenSpec archives
- devops docs: 8 files updated for Phase 6 state; field-trial.md added (946-line runbook)
- developer docs: api-reference (50+ endpoints), quick-start, 5 existing guides updated, 5 new guides added
- engineering docs: all 12 files updated (services, architecture, SDK guide, testing, overview)
- OpenSpec archives: phase-7-devops-field-trial, developer-docs-phase6-update, engineering-docs-phase6-update
- VALIDATOR.md + scripts/start-validator.sh: V&V Architect tooling added
- .gitignore: exclude session artifacts, build artifacts, and agent workspaces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 02:24:24 +00:00

# Architecture
## Component Overview
```
┌───────────────────────────────────────────┐
│  Next.js Portal (port 3001)               │
│  portal/ (Next.js 14)                     │
│  /login /agents /credentials /audit       │
│  /analytics /settings/tier /compliance    │
│  /webhooks /marketplace                   │
└────────────────┬──────────────────────────┘
                 │ HTTP (localhost:3000)
┌────────────────▼──────────────────────────┐
│  AgentIdP Application                     │
│  Node.js / Express (port 3000)            │
│                                           │
│  TLS MW → Helmet → CORS → Morgan          │
│  Metrics MW → OrgContext MW               │
│  UsageMetering MW → TierEnforcement MW    │
│  Auth MW → OPA MW → Routes                │
│  ↓                                        │
│  Controllers → Services → Repos           │
└───────────┬─────────────────────────┬─────┘
            │                         │
┌───────────▼────────────┐   ┌────────▼─────────┐
│ PostgreSQL 14          │   │ Redis 7          │
│ Port 5432              │   │ Port 6379        │
│                        │   │                  │
│ 26 migrations          │   │ Rate limits      │
│ (001-026)              │   │ Token revoke     │
│ organizations          │   │ Monthly counts   │
│ agents + DID keys      │   │ Tier counters    │
│ credentials            │   │ Compliance cache │
│ audit_events           │   └──────────────────┘
│ token_revocations      │
│ oidc_keys              │   ┌──────────────────┐
│ federation_partners    │   │ HashiCorp Vault  │
│ webhook_subscriptions  │   │ (optional)       │
│   + deliveries         │   │ KV v2: creds     │
│ agent_marketplace      │   └──────────────────┘
│ github_oidc_trust      │
│ billing                │   ┌──────────────────┐
│ delegation_chains      │   │ Stripe           │
│ analytics_events       │   │ (optional)       │
│ tenant_tiers           │   │ Billing/upgrades │
└────────────────────────┘   └──────────────────┘
```
## Components
### AgentIdP Application
A stateless Express HTTP server. Every request is handled independently — no in-process shared state. This means it can be horizontally scaled (multiple instances) as long as all instances share the same PostgreSQL and Redis.
**Internal layers:**
| Layer | Responsibility |
|-------|---------------|
| Routes | Wire HTTP methods and paths to controllers |
| TLS middleware | Redirect HTTP → HTTPS when `ENFORCE_TLS=true` |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| OrgContext middleware | Resolve `organization_id` from JWT and attach to `req` |
| UsageMetering middleware | Fire-and-forget analytics event recording |
| TierEnforcement middleware | Enforce daily API call and token limits via Redis (when `TIER_ENFORCEMENT=true`) |
| OPA middleware | Scope-based authorization via embedded Wasm or JSON policy |
| Controllers | Parse and validate request, call service, return response |
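| Services | Business logic and orchestration; core services access data through repositories, while several newer services (for example `AnalyticsService` and `FederationService`) query the pool directly (see Service Map) |
| Repositories | Parameterised SQL queries and row mapping; no business logic |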
| Utils | JWT sign/verify, bcrypt, error types, async handler |
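Because each middleware can end the request, the order of these layers is significant. The following framework-free TypeScript sketch models the chain from the Component Overview diagram; the `Req`/`Res` shapes, `runChain`, and the simplified `auth` check (which only tests for a `Bearer ` prefix and performs no JWT verification) are illustrative assumptions rather than the project's Express code.

```typescript
// Framework-free model of the middleware order (hypothetical Req/Res types,
// not Express's). Each middleware either calls next() or ends the request.
type Req = { headers: Record<string, string>; trace: string[] };
type Res = { status?: number };
type Middleware = (req: Req, res: Res, next: () => void) => void;

// Runs middlewares in array order; one that skips next() short-circuits the rest.
function runChain(chain: Middleware[], req: Req, res: Res): void {
  const step = (i: number): void => {
    if (i < chain.length) chain[i](req, res, () => step(i + 1));
  };
  step(0);
}

// Placeholder layer: records its name and always continues.
const pass = (name: string): Middleware => (req, _res, next) => {
  req.trace.push(name);
  next();
};

// Simplified auth stand-in: only checks that a Bearer token is present.
const auth: Middleware = (req, res, next) => {
  req.trace.push("Auth");
  if (!req.headers["authorization"]?.startsWith("Bearer ")) {
    res.status = 401; // OPA and Routes never run
    return;
  }
  next();
};

// Order copied from the Component Overview diagram.
const chain: Middleware[] = [
  pass("TLS"), pass("Helmet"), pass("CORS"), pass("Morgan"),
  pass("Metrics"), pass("OrgContext"), pass("UsageMetering"),
  pass("TierEnforcement"), auth, pass("OPA"),
  (req, res) => {
    req.trace.push("Routes");
    res.status = 200;
  },
];
```

Without an `Authorization` header the trace stops at `Auth` with a 401 and `OPA` never runs; with a Bearer token all eleven layers execute.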
### PostgreSQL 14+
Primary durable data store. All agent identities, credentials, audit events, and token revocation records live here. See [database.md](database.md) for schema details.
The application connects via a connection pool (`pg.Pool`) initialised from `DATABASE_URL`. The pool is a singleton shared across all request handlers.
### Redis 7+
Ephemeral store for rate limiting, token revocation, usage counters, and cached compliance reports:
| Key pattern | Example | Purpose | TTL |
|------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |
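The key patterns and TTLs above can be expressed as small pure functions, shown here as a TypeScript sketch with hypothetical helper names. Two details are inferred from the examples rather than stated: the window segment of `rate:` keys is taken to be `floor(epochMs / RATE_LIMIT_WINDOW_MS)` (consistent with the `29086156` example for a 60-second window), and the month in `monthly:` keys is 1-based and unpadded.

```typescript
// Hypothetical helpers mirroring the documented Redis key patterns and TTLs.
// Window numbering and month format are inferred from the table's examples.
const revokedKey = (jti: string): string => `revoked:${jti}`;

// Fixed-window index: the window of length windowMs that contains nowMs.
const rateKey = (clientId: string, nowMs: number, windowMs: number): string =>
  `rate:${clientId}:${Math.floor(nowMs / windowMs)}`;

// Month is 1-based and unpadded (monthly:<id>:2026:3 is March 2026).
const monthlyKey = (clientId: string, now: Date): string =>
  `monthly:${clientId}:${now.getUTCFullYear()}:${now.getUTCMonth() + 1}`;

const tierCallsKey = (tenantId: string): string => `rate:tier:calls:${tenantId}`;
const tierTokensKey = (tenantId: string): string => `rate:tier:tokens:${tenantId}`;
const complianceKey = (tenantId: string): string => `compliance:report:${tenantId}`;
const COMPLIANCE_TTL_SECONDS = 5 * 60;

// TTLs in whole seconds, rounded up so a key never expires early.
function secondsUntilMidnightUtc(now: Date): number {
  // Date.UTC normalises day overflow (e.g. day 32 rolls into next month).
  const midnight = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.ceil((midnight - now.getTime()) / 1000);
}

function secondsUntilEndOfMonthUtc(now: Date): number {
  const nextMonth = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
  return Math.ceil((nextMonth - now.getTime()) / 1000);
}

// Revocation TTL = remaining token lifetime; exp is a JWT NumericDate (seconds).
function revocationTtlSeconds(expSec: number, now: Date): number {
  return Math.max(0, expSec - Math.floor(now.getTime() / 1000));
}
```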
**Redis is supplementary, not the source of truth.** Token revocations are also written to the `token_revocations` PostgreSQL table for durability across Redis restarts. On Redis restart, the revocation list is cold — previously revoked tokens will pass auth until the PostgreSQL-backed warm-up is implemented (Phase 2).
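The dual write, and the cold-start gap it leaves, can be illustrated with in-memory stand-ins. `FakeRedis`, `RevocationStore`, and `warmUp` are hypothetical names; `warmUp` only illustrates what a PostgreSQL-backed warm-up might do and is not existing behaviour.

```typescript
// Hypothetical in-memory stand-ins for Redis and the token_revocations table.
type RevocationRow = { jti: string; expiresAtSec: number };

class FakeRedis {
  private store = new Map<string, number>(); // key -> expiry (epoch seconds)

  setWithExpiry(key: string, expiresAtSec: number): void {
    this.store.set(key, expiresAtSec);
  }

  exists(key: string, nowSec: number): boolean {
    const exp = this.store.get(key);
    return exp !== undefined && exp > nowSec;
  }

  flushAll(): void {
    this.store.clear(); // simulates a Redis restart without persistence
  }
}

class RevocationStore {
  private redis: FakeRedis;
  private table: RevocationRow[];

  constructor(redis: FakeRedis, table: RevocationRow[]) {
    this.redis = redis;
    this.table = table;
  }

  // Dual write: durable row (PostgreSQL) plus a fast lookup key (Redis).
  revoke(jti: string, expiresAtSec: number): void {
    this.table.push({ jti, expiresAtSec });
    this.redis.setWithExpiry(`revoked:${jti}`, expiresAtSec);
  }

  // As described above, the auth check consults Redis only.
  isRevoked(jti: string, nowSec: number): boolean {
    return this.redis.exists(`revoked:${jti}`, nowSec);
  }

  // Hypothetical warm-up: reload still-unexpired revocations into Redis.
  warmUp(nowSec: number): number {
    let loaded = 0;
    for (const row of this.table) {
      if (row.expiresAtSec > nowSec) {
        this.redis.setWithExpiry(`revoked:${row.jti}`, row.expiresAtSec);
        loaded++;
      }
    }
    return loaded;
  }
}
```

After `flushAll()` simulates a restart, `isRevoked` returns `false` for a token that was revoked, which is the gap described above; reloading the unexpired rows closes it.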
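## Request Data Flow
The flow below is simplified and shows only the authentication and rate-limiting middleware; the full middleware order is shown in the Component Overview diagram.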
```
HTTP Request
      ↓
Express Router (matches path + method)
      ↓
Auth Middleware
  - Extract Bearer token from Authorization header
  - Verify RS256 signature using JWT_PUBLIC_KEY
  - Check Redis for revocation (key: revoked:<jti>)
  - Attach decoded payload to req.user
      ↓
Rate Limit Middleware
  - Key: rate:<client_id>:<60s-window>
  - Increment counter in Redis (INCR + EXPIRE)
  - Set X-RateLimit-* headers
  - Reject with 429 if count > 100
      ↓
Controller
  - Validate request body / query params (Joi schemas)
  - Call service method
  - Return HTTP response
      ↓
Service
  - Business logic and orchestration
  - Calls one or more repositories
  - Fires audit log writes (async, fire-and-forget)
      ↓
Repository
  - Executes parameterised SQL queries
  - Maps DB rows to typed interfaces
  - Returns typed results to service
      ↓
PostgreSQL / Redis
```
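The Rate Limit Middleware step is a fixed-window counter. The sketch below emulates Redis `INCR` + `EXPIRE` with an in-memory `Map` so that it runs standalone; the limit (100) and the 60-second window come from the flow above, while the header names `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` are assumptions, since the flow only specifies `X-RateLimit-*`.

```typescript
// Fixed-window rate limiter modelled on the flow above. An in-memory Map
// stands in for Redis; incrWithExpire() emulates the INCR + EXPIRE pair.
type Counter = { count: number; expiresAtMs: number };

const LIMIT = 100; // "Reject with 429 if count > 100"
const WINDOW_MS = 60_000; // 60 s window

const counters = new Map<string, Counter>();

// Emulates INCR, setting the expiry only when the key is (re)created.
function incrWithExpire(key: string, nowMs: number, ttlMs: number): number {
  const existing = counters.get(key);
  if (existing === undefined || existing.expiresAtMs <= nowMs) {
    counters.set(key, { count: 1, expiresAtMs: nowMs + ttlMs });
    return 1;
  }
  existing.count += 1;
  return existing.count;
}

interface RateDecision {
  allowed: boolean;
  status: number;
  headers: Record<string, string>; // header names are assumed, see lead-in
}

function checkRateLimit(clientId: string, nowMs: number): RateDecision {
  const windowIndex = Math.floor(nowMs / WINDOW_MS);
  const count = incrWithExpire(`rate:${clientId}:${windowIndex}`, nowMs, WINDOW_MS);
  const resetSec = Math.ceil(((windowIndex + 1) * WINDOW_MS) / 1000);
  const headers = {
    "X-RateLimit-Limit": String(LIMIT),
    "X-RateLimit-Remaining": String(Math.max(0, LIMIT - count)),
    "X-RateLimit-Reset": String(resetSec),
  };
  const allowed = count <= LIMIT;
  return { allowed, status: allowed ? 200 : 429, headers };
}
```

A fixed window needs only one counter per client per window, but a client can send up to twice the limit in a short burst that straddles a window boundary; sliding-window schemes avoid this at the cost of more Redis state.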
## Service Map
| Route prefix | Controller | Service(s) | Repository/ies |
|-------------|-----------|-----------|----------------|
| `/api/v1/agents` | `AgentController` | `AgentService` | `AgentRepository` |
| `/api/v1/credentials` | `CredentialController` | `CredentialService` | `CredentialRepository` |
| `/api/v1/token` | `TokenController` | `OAuth2Service` | `TokenRepository`, `CredentialRepository`, `AgentRepository` |
| `/api/v1/audit` | `AuditController` | `AuditService` | `AuditRepository` |
| `/api/v1/organizations` | `OrgController` | `OrgService` | `OrgRepository` |
| `/api/v1/compliance/*` | `ComplianceController` | `ComplianceService` | `AuditRepository` |
| `/api/v1/analytics/*` | `AnalyticsController` | `AnalyticsService` | direct pool queries |
| `/api/v1/tiers/*` | `TierController` | `TierService` | pool queries, Stripe SDK |
| `/api/v1/webhooks` | `WebhookController` | `WebhookService` | `WebhookRepository` |
| `/api/v1/federation` | `FederationController` | `FederationService` | direct pool queries |
| `/api/v1/marketplace` | `MarketplaceController` | `MarketplaceService` | direct pool queries |
| `/api/v1/billing` | `BillingController` | `BillingService` | direct pool queries |
| `/.well-known/did.json`, `/api/v1/did/*` | `DIDController` | `DIDService` | `AgentRepository` |
| `/.well-known/openid-configuration`, `/api/v1/oidc/*` | `OIDCController` | `OIDCKeyService`, `IDTokenService` | direct pool queries |
| `/api/v1/oidc/trust-policies` | `OIDCTrustPolicyController` | `OIDCTrustPolicyService` | direct pool queries |
| `/api/v1/delegation` | `DelegationController` | `DelegationService` | direct pool queries |
| `/api/v1/scaffold` | `ScaffoldController` | `ScaffoldService` | — |
| `/health` | inline | — | pool, redis |
| `/metrics` | inline | — | prom-client |
## New Services (Phases 3-6)
| Service | Source file | Responsibility |
|---------|------------|----------------|
| `AnalyticsService` | `src/services/AnalyticsService.ts` | Fire-and-forget `recordEvent`, time-series `getTokenTrend`, heatmap `getAgentActivity`, per-agent `getAgentUsageSummary` |
| `TierService` | `src/services/TierService.ts` | `getStatus` (reads `tenant_tiers`), `initiateUpgrade` (creates Stripe Checkout Session), `applyUpgrade` (handles Stripe webhook), `enforceAgentLimit` |
| `ComplianceService` | `src/services/ComplianceService.ts` | `generateReport` (Redis-cached 5 min), `exportAgentCards` (AGNTCY format) |
| `DelegationService` | `src/services/DelegationService.ts` | A2A delegation chain creation and verification |
| `DIDService` | `src/services/DIDService.ts` | `did:web` identifier generation and DID document management |
| `OIDCKeyService` | `src/services/OIDCKeyService.ts` | OIDC key rotation, JWKS endpoint |
| `IDTokenService` | `src/services/IDTokenService.ts` | OIDC ID token issuance |
| `FederationService` | `src/services/FederationService.ts` | Cross-tenant agent identity federation |
| `WebhookService` | `src/services/WebhookService.ts` | Event subscriptions, delivery with retry, dead-letter queue |
| `VaultService` | `src/services/VaultService.ts` | HashiCorp Vault KV v2 read/write for credential storage |
| `BillingService` | `src/services/BillingService.ts` | Stripe customer and subscription management |
| `MarketplaceService` | `src/services/MarketplaceService.ts` | Agent listing and discovery |
| `OIDCTrustPolicyService` | `src/services/OIDCTrustPolicyService.ts` | GitHub OIDC trust policy management |
| `EventPublisher` | `src/services/EventPublisher.ts` | Routes domain events to webhook delivery and Kafka (if configured) |
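`WebhookService` is described as delivering events with retry and a dead-letter queue. A minimal sketch of that pattern follows, assuming exponential backoff, a 2xx-means-success rule, five attempts, and a 1-second base delay; none of these values come from the source, and `deliverWithRetry` is a hypothetical name. `send` and `sleep` are injected so the logic can be exercised without real HTTP calls or timers.

```typescript
// Sketch of webhook delivery with bounded retries and a dead-letter queue.
// Attempt count, backoff, and the send() contract are assumptions.
type WebhookEvent = { id: string; url: string; payload: unknown };
type Send = (event: WebhookEvent) => Promise<number>; // resolves to HTTP status
type Sleep = (ms: number) => Promise<void>;

interface DeliveryResult {
  delivered: boolean;
  attempts: number;
  delays: number[]; // backoff delays actually waited, in ms
}

async function deliverWithRetry(
  event: WebhookEvent,
  send: Send,
  sleep: Sleep,
  deadLetters: WebhookEvent[],
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<DeliveryResult> {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await send(event);
      if (status >= 200 && status < 300) {
        return { delivered: true, attempts: attempt, delays };
      }
    } catch {
      // A network error counts as a failed attempt, like a non-2xx status.
    }
    if (attempt < maxAttempts) {
      const delay = baseDelayMs * 2 ** (attempt - 1); // 1 s, 2 s, 4 s, ...
      delays.push(delay);
      await sleep(delay);
    }
  }
  deadLetters.push(event); // attempts exhausted: park for inspection or replay
  return { delivered: false, attempts: maxAttempts, delays };
}
```

In a real deployment `sleep` would typically wrap `setTimeout`, and the dead-letter list would be a persistent store rather than an in-memory array.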
## Ports
| Service | Internal port | Exposed port (local dev) |
|---------|--------------|--------------------------|
| AgentIdP app | 3000 | 3000 |
| Next.js portal | 3001 | 3001 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |
## API Routes (Phase 6 complete)
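Versioned API routes share the base path `/api/v1`. Discovery documents (`/.well-known/*`), `/health`, and `/metrics` are served from the root path.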
| Route | Method(s) | Auth | Feature flag |
|-------|----------|------|-------------|
| `/api/v1/agents` | GET, POST, PATCH, DELETE | Bearer JWT | always on |
| `/api/v1/credentials` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/token` | POST | none (client credentials) | always on |
| `/api/v1/audit` | GET | Bearer JWT | always on |
| `/api/v1/audit/verify` | GET | Bearer JWT | always on |
| `/api/v1/organizations` | GET, POST | Bearer JWT | always on |
| `/api/v1/compliance/controls` | GET | none | always on |
| `/api/v1/compliance/report` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/compliance/agent-cards` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/analytics/token-trend` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/agent-activity` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/usage-summary` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/tiers/status` | GET | Bearer JWT | always on |
| `/api/v1/tiers/upgrade` | POST | Bearer JWT | always on |
| `/api/v1/webhooks` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/federation` | GET, POST | Bearer JWT | always on |
| `/api/v1/delegation` | GET, POST | Bearer JWT | always on |
| `/api/v1/marketplace` | GET | none | always on |
| `/api/v1/billing` | GET, POST | Bearer JWT | always on |
| `/api/v1/did/*` | GET | none | always on |
| `/api/v1/oidc/*` | GET, POST | mixed | always on |
| `/.well-known/openid-configuration` | GET | none | always on |
| `/.well-known/jwks.json` | GET | none | always on |
| `/.well-known/did.json` | GET | none | always on |
| `/health` | GET | none | always on |
| `/metrics` | GET | none | always on |
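Only the compliance report/agent-card routes and the analytics routes in this table depend on a feature flag. A small sketch of that gating follows, assuming a flag counts as on only when its value is exactly `true` and that the environment is passed in as a plain record (for example `process.env`); `FLAGGED_ROUTES` and `enabledFlaggedRoutes` are hypothetical names.

```typescript
// Which feature-flagged routes would be mounted for a given environment.
// Assumes a flag is on only when its value is exactly "true".
type Env = Record<string, string | undefined>;

const FLAGGED_ROUTES: Record<string, string[]> = {
  COMPLIANCE_ENABLED: [
    "/api/v1/compliance/report",
    "/api/v1/compliance/agent-cards",
  ],
  ANALYTICS_ENABLED: [
    "/api/v1/analytics/token-trend",
    "/api/v1/analytics/agent-activity",
    "/api/v1/analytics/usage-summary",
  ],
};

const isEnabled = (env: Env, flag: string): boolean => env[flag] === "true";

// Every other route in the table is always on and is not listed here.
function enabledFlaggedRoutes(env: Env): string[] {
  const routes: string[] = [];
  for (const flag of Object.keys(FLAGGED_ROUTES)) {
    if (isEnabled(env, flag)) routes.push(...FLAGGED_ROUTES[flag]);
  }
  return routes;
}
```

The sketch models only which routes are gated; how the application responds to a request for a disabled route is not specified in this document.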
## Graceful Shutdown
The server listens for `SIGTERM` and `SIGINT`. On receipt:
1. `server.close()` is called — stops accepting new connections
2. In-flight requests complete
3. `process.exit(0)` is called
The PostgreSQL pool and Redis client are not explicitly closed in the current shutdown path. This is safe for single-instance deployments; connection cleanup is handled by the OS.
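The three shutdown steps can be captured in a small, testable handler. In this sketch `Closable` is a minimal stand-in for the `close(callback)` method of Node's `http.Server`, `exit` is injected in place of `process.exit`, and the optional `cleanups` list (for example `() => pool.end()`) is a hypothetical extension, since the current shutdown path does not close the pool or the Redis client. `makeShutdownHandler` is an illustrative name.

```typescript
// Minimal stand-in for the part of Node's http.Server used here.
interface Closable {
  close(callback: (err?: Error) => void): void;
}

// Returns a SIGTERM/SIGINT handler that follows the documented steps.
function makeShutdownHandler(
  server: Closable,
  exit: (code: number) => void,
  cleanups: Array<() => Promise<void>> = [], // hypothetical, e.g. () => pool.end()
): () => void {
  let shuttingDown = false;
  return () => {
    if (shuttingDown) return; // a second signal does not restart shutdown
    shuttingDown = true;
    // Step 1: stop accepting connections. The callback fires once existing
    // connections have ended, i.e. after in-flight requests (step 2).
    server.close(async () => {
      for (const cleanup of cleanups) {
        try {
          await cleanup();
        } catch {
          // Best effort: a failed cleanup must not block exit.
        }
      }
      exit(0); // Step 3
    });
  };
}
```

Wiring it up might look like `const onSignal = makeShutdownHandler(server, (code) => process.exit(code)); process.on("SIGTERM", onSignal); process.on("SIGINT", onSignal);`. Node calls the `server.close()` callback only after all existing connections have ended, so in this sketch `exit` runs after in-flight requests finish.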