Implements W3C DID Core 1.0 per-agent identity for every registered agent:

Schema:
- agent_did_keys table: stores EC P-256 public key JWK + Vault path for private key
- agents.did + agents.did_created_at columns

Key management:
- EC P-256 key pair generated on every agent registration via Node.js crypto
- Private key stored in Vault KV v2 (dev:no-vault marker when Vault not configured)
- Public key JWK stored in PostgreSQL agent_did_keys table

API (4 new endpoints):
- GET /.well-known/did.json — instance DID Document (public, cached)
- GET /api/v1/agents/:id/did — per-agent DID Document (public, 410 for decommissioned)
- GET /api/v1/agents/:id/did/resolve — W3C DID Resolution result (agents:read scope)
- GET /api/v1/agents/:id/did/card — AGNTCY agent card (public)

Implementation:
- DIDService: DID construction, key generation, Redis caching (TTL configurable)
- DIDController: 410 Gone for decommissioned agents, correct Content-Type on resolve
- AgentService: calls DIDService.generateDIDForAgent on every new registration

Tests: 429 passing, DIDService 98.93% coverage, private key absence verified in all responses

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Phase 3: Enterprise — Tasks
**Status**: In Progress — WS1 and WS2 complete
## CEO Approval Gates (required before implementation)
- [x] A0.1 Approve dependency: `did-resolver` + `web-did-resolver` (W3C DID support)
- [x] A0.2 Approve dependency: `oidc-provider` (certified OIDC server library)
- [x] A0.3 Approve dependency: `bull` (Redis-backed webhook delivery queue)
- [x] A0.4 Approve dependency: `kafkajs` (optional Kafka adapter for webhooks)
- [x] A0.5 Approve dependency: `node-forge` (column-level encryption for SOC 2)
---
## Workstream 1: Multi-Tenancy
- [x] 1.1 Write `src/db/migrations/006_create_organizations_table.sql` — organizations table with slug, plan_tier, max_agents, max_tokens_per_month, status
- [x] 1.2 Write `src/db/migrations/007_create_organization_members_table.sql` — organization_members with agent_id FK and role
- [x] 1.3 Write `src/db/migrations/008_add_organization_id_to_agents.sql` — add organization_id column + index + RLS policy on agents
- [x] 1.4 Write `src/db/migrations/009_add_organization_id_to_credentials.sql` — add organization_id column + index + RLS policy on credentials
- [x] 1.5 Write `src/db/migrations/010_add_organization_id_to_audit_logs.sql` — add organization_id column + index + RLS policy on audit_logs
- [x] 1.6 Write `src/db/migrations/011_seed_system_organization.sql` — insert default system org and backfill existing rows
- [x] 1.7 Write `src/types/organization.ts` — IOrganization, ICreateOrgRequest, IUpdateOrgRequest, IOrgMember, IPaginatedOrgsResponse, OrgStatus, PlanTier interfaces
- [x] 1.8 Write `src/services/OrgService.ts` — createOrg, listOrgs, getOrg, updateOrg, deleteOrg, addMember; all methods accept organizationId context
- [x] 1.9 Write `src/controllers/OrgController.ts` — request parsing and validation for all 6 org endpoints
- [x] 1.10 Write `src/routes/organizations.ts` — mount all 6 org endpoints with admin:orgs scope guard
- [x] 1.11 Write `src/middleware/orgContext.ts` — OrgContextMiddleware: extracts organization_id from JWT and calls SET app.organization_id before each DB query
- [x] 1.12 Update `src/middleware/auth.ts` — extend ITokenPayload with organization_id claim; backfill from DEFAULT_ORG_ID for backward compat
- [x] 1.13 Update `src/services/AgentService.ts` — organizationId propagated via RLS session variable (orgContext middleware)
- [x] 1.14 Update `src/services/CredentialService.ts` — organizationId propagated via RLS session variable
- [x] 1.15 Update `src/services/AuditService.ts` — organizationId propagated via RLS session variable
- [x] 1.16 Update `src/services/OAuth2Service.ts` — include organization_id claim in issued JWT payload
- [x] 1.17 Update `src/types/index.ts` — extend ITokenPayload with organization_id field, admin:orgs scope, org audit actions
- [x] 1.18 Update OPA policy `policies/authz.rego` + `policies/data/scopes.json` — 6 new org endpoint → admin:orgs mappings
- [x] 1.19 Write unit tests for OrgService (CRUD, member management, org isolation)
- [x] 1.20 Write integration tests — all 6 /organizations endpoints, cross-org isolation via RLS
- [x] 1.21 QA sign-off: 373 tests passing, 80.64% branch coverage, zero `any`, TypeScript clean
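
Task 1.11 sets `app.organization_id` before each query so the RLS policies can read it. The following is a minimal sketch of that step, not the project's actual middleware: `buildOrgContextQuery` and `ParameterizedQuery` are hypothetical names, and the policy in the trailing comment is an assumed shape. PostgreSQL's `SET` statement does not accept bind parameters, so the sketch uses `set_config(name, value, is_local)` instead; passing `true` for `is_local` scopes the value to the current transaction, which matters when connections are pooled.

```typescript
// Hypothetical helper: names and shapes are illustrative, not taken from the codebase.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Builds a parameterized statement that sets the RLS session variable.
// set_config(..., true) limits the setting to the current transaction,
// so a pooled connection does not carry one tenant's context into the next request.
function buildOrgContextQuery(organizationId: string): ParameterizedQuery {
  const trimmed = organizationId.trim();
  if (trimmed.length === 0) {
    throw new Error("organizationId must be a non-empty string");
  }
  return {
    text: "SELECT set_config('app.organization_id', $1, true)",
    values: [trimmed],
  };
}

// Assumed shape of a policy that reads the variable (not copied from the migrations):
//   CREATE POLICY org_isolation ON agents
//     USING (organization_id::text = current_setting('app.organization_id'));
```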
---
## Workstream 2: W3C DIDs
- [x] 2.1 Write `src/db/migrations/012_create_agent_did_keys_table.sql` — agent_did_keys table with public_key_jwk JSONB and vault_key_path
- [x] 2.2 Write `src/db/migrations/013_add_did_columns_to_agents.sql` — add did and did_created_at columns to agents
- [x] 2.3 Write `src/types/did.ts` — IDIDDocument, IVerificationMethod, IDIDService, IDIDResolutionResult, IAgentCard interfaces
- [x] 2.4 Write `src/services/DIDService.ts` — generateDIDForAgent (EC P-256 key pair, Vault storage, public key in DB), buildInstanceDIDDocument, buildAgentDIDDocument, buildAgentCard, buildResolutionResult
- [x] 2.5 Update `src/services/AgentService.ts` — call DIDService.generateDIDForAgent on every new agent registration
- [x] 2.6 Write `src/controllers/DIDController.ts` — handlers for root DID Document, per-agent DID Document (410 for decommissioned), resolution endpoint, agent card
- [x] 2.7 Write `src/routes/did.ts` — createDIDRouter for `/agents/:id/did`, `/did/resolve`, `/did/card`; `/.well-known/did.json` registered in app.ts
- [x] 2.8 Implement Redis caching in DIDService — cache DID Documents with TTL from DID_DOCUMENT_CACHE_TTL_SECONDS (default 300s)
- [x] 2.9 Handle decommissioned agents — deactivated: true in metadata; HTTP 410 Gone from DIDController
- [x] 2.10 Write unit tests for DIDService — 39 tests, 98.93% coverage; private key security asserted
- [x] 2.11 Write integration tests — all 4 DID endpoints; 22 tests
- [x] 2.12 QA sign-off: 429 tests passing, 98.93% DIDService coverage, private key never in response, zero `any`
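
Task 2.4 generates an EC P-256 key pair per agent, keeps the private key in Vault, and stores only the public JWK. The sketch below illustrates that split with Node's built-in `crypto` module. It is an illustration under assumptions, not the `DIDService` implementation: `generateDidKey`, the `#key-1` fragment, the `JsonWebKey2020` method type, and the example DID passed in are all hypothetical choices.

```typescript
import { generateKeyPairSync } from "node:crypto";

// Hypothetical shapes for illustration; the real interfaces live in src/types/did.ts.
interface PublicJwk {
  kty?: string;
  crv?: string;
  x?: string;
  y?: string;
}

interface VerificationMethodSketch {
  id: string;
  type: "JsonWebKey2020";
  controller: string;
  publicKeyJwk: PublicJwk;
}

interface GeneratedDidKey {
  verificationMethod: VerificationMethodSketch;
  privateKeyPem: string; // would be written to Vault, never to the database or a response
}

function generateDidKey(did: string): GeneratedDidKey {
  // "prime256v1" is the OpenSSL name for the NIST P-256 curve.
  const { publicKey, privateKey } = generateKeyPairSync("ec", {
    namedCurve: "prime256v1",
  });
  const exported = publicKey.export({ format: "jwk" });
  return {
    verificationMethod: {
      id: `${did}#key-1`,
      type: "JsonWebKey2020",
      controller: did,
      // Copy only the public members so a private "d" value can never slip through.
      publicKeyJwk: {
        kty: exported.kty,
        crv: exported.crv,
        x: exported.x,
        y: exported.y,
      },
    },
    privateKeyPem: privateKey.export({ type: "pkcs8", format: "pem" }).toString(),
  };
}
```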
---
## Workstream 3: OpenID Connect (OIDC)
- [ ] 3.1 Write `src/db/migrations/014_create_oidc_keys_table.sql` — oidc_keys table with kid, public_key_jwk, vault_key_path, is_current
- [ ] 3.2 Write `src/services/OIDCKeyService.ts` — generateSigningKeyPair (RSA-2048 or EC P-256), storeKeyInVault, getPublicJWKS, getCurrentKeyId, rotateKey
- [ ] 3.3 Write `src/services/IDTokenService.ts` — buildIDTokenClaims (agent claims), signIDToken using current Vault-stored key, verifyIDToken
- [ ] 3.4 Write `src/types/oidc.ts` — IIDTokenClaims, IJWKSResponse, IOIDCDiscoveryDocument, IAgentInfoResponse interfaces
- [ ] 3.5 Write `src/controllers/OIDCController.ts` — handlers for discovery, JWKS, agent-info
- [ ] 3.6 Write `src/routes/oidc.ts` — mount `/.well-known/openid-configuration`, `/.well-known/jwks.json`, `/agent-info`
- [ ] 3.7 Update `src/services/OAuth2Service.ts` — when `openid` scope is present in request, generate and append `id_token` to token response
- [ ] 3.8 Implement JWKS caching — cache JWKS in Redis with TTL; invalidate on key rotation
- [ ] 3.9 Implement key rotation logic — on rotation, old key remains in JWKS until all tokens signed with it have expired
- [ ] 3.10 Write unit tests for OIDCKeyService and IDTokenService — key generation, token signing, JWKS format
- [ ] 3.11 Write integration tests — POST /oauth2/token with `openid` scope returns id_token; validate id_token against JWKS; GET /agent-info returns correct claims
- [ ] 3.12 QA sign-off: OIDC discovery document passes conformance checks, id_token verifiable, `alg: none` rejected, zero `any`, >80% coverage
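
Task 3.9 keeps a rotated key in the JWKS until every token it signed has expired. One way to express that rule is sketched below. The `SigningKeyRecord` shape, the `retiredAt` field, and the idea of bounding retention by a single maximum token lifetime are assumptions for illustration; the planned `oidc_keys` table only names `kid`, `public_key_jwk`, `vault_key_path`, and `is_current`.

```typescript
// Hypothetical record shape, loosely modeled on the oidc_keys columns in task 3.1.
interface SigningKeyRecord {
  kid: string;
  isCurrent: boolean;
  retiredAt: Date | null; // assumed column: when the key stopped signing new tokens
}

// Returns the key IDs that should still be published in the JWKS.
// A retired key stays published until the longest-lived token it could have
// signed (issued at the moment of retirement) has expired.
function selectPublishedKeyIds(
  keys: SigningKeyRecord[],
  maxTokenLifetimeSeconds: number,
  now: Date,
): string[] {
  return keys
    .filter((key) => {
      // Keep the current key, and keep any key with no retirement time recorded
      // (the conservative choice when it is unclear whether tokens are outstanding).
      if (key.isCurrent || key.retiredAt === null) {
        return true;
      }
      const lastValidMs = key.retiredAt.getTime() + maxTokenLifetimeSeconds * 1000;
      return now.getTime() < lastValidMs;
    })
    .map((key) => key.kid);
}
```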
---
## Workstream 4: AGNTCY Federation
- [ ] 4.1 Write `src/db/migrations/015_create_federation_partners_table.sql` — federation_partners table with issuer, jwks_uri, allowed_organizations JSONB, status, expires_at
- [ ] 4.2 Write `src/types/federation.ts` — IFederationPartner, ICreatePartnerRequest, IVerifyFederatedTokenRequest, IFederationVerifyResult interfaces
- [ ] 4.3 Write `src/services/FederationService.ts` — registerPartner (validates by fetching JWKS), listPartners, deletePartner, verifyFederatedToken (fetch-or-cache JWKS, verify signature, validate claims)
- [ ] 4.4 Implement JWKS caching in FederationService — store partner JWKS in Redis with TTL configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS
- [ ] 4.5 Write `src/controllers/FederationController.ts` — handlers for POST /federation/trust, GET /federation/partners, DELETE /federation/partners/:id, POST /federation/verify
- [ ] 4.6 Write `src/routes/federation.ts` — mount all 4 federation endpoints
- [ ] 4.7 Implement partner expiry check — partners past `expires_at` are treated as status `expired`; their tokens rejected
- [ ] 4.8 Implement `allowedOrganizations` filter — reject tokens whose `organization_id` is not in the allow list (if list is non-empty)
- [ ] 4.9 Write unit tests for FederationService — trust registration, token verification (valid/expired/untrusted/tampered), JWKS cache behavior
- [ ] 4.10 Write integration tests — end-to-end: register partner, verify a valid token from that partner, verify rejection for unknown issuer
- [ ] 4.11 QA sign-off: tampered token rejected, expired partner rejected, JWKS cache verified, zero `any`, >80% coverage
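
Tasks 4.7 and 4.8 add two claim-level checks on top of signature verification: partners past `expires_at` are rejected, and a non-empty `allowedOrganizations` list restricts which `organization_id` values are accepted. A possible shape for those checks is sketched below. `checkPartnerClaims`, the rejection reason strings, and the `active`/`suspended` status values are hypothetical; only the `expired` status is named in the tasks. Signature verification against the partner's JWKS would run separately and is not shown.

```typescript
// Hypothetical shapes; field names loosely follow the federation_partners columns in task 4.1.
interface FederationPartnerSketch {
  issuer: string;
  status: "active" | "suspended" | "expired"; // only "expired" appears in the task list
  expiresAt: Date | null;
  allowedOrganizations: string[];
}

type RejectionReason =
  | "unknown_issuer"
  | "partner_inactive"
  | "partner_expired"
  | "organization_not_allowed";

type PartnerCheckResult = { ok: true } | { ok: false; reason: RejectionReason };

function checkPartnerClaims(
  partners: FederationPartnerSketch[],
  issuer: string,
  organizationId: string,
  now: Date,
): PartnerCheckResult {
  const partner = partners.find((candidate) => candidate.issuer === issuer);
  if (partner === undefined) {
    return { ok: false, reason: "unknown_issuer" };
  }
  // A partner past expires_at is treated as expired even if the stored status is stale.
  const pastExpiry =
    partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime();
  if (partner.status === "expired" || pastExpiry) {
    return { ok: false, reason: "partner_expired" };
  }
  if (partner.status !== "active") {
    return { ok: false, reason: "partner_inactive" };
  }
  // An empty allow list means "no restriction"; a non-empty list must contain the org.
  if (
    partner.allowedOrganizations.length > 0 &&
    !partner.allowedOrganizations.includes(organizationId)
  ) {
    return { ok: false, reason: "organization_not_allowed" };
  }
  return { ok: true };
}
```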
---
## Workstream 5: Webhooks and Event Streaming
- [ ] 5.1 Write `src/db/migrations/016_create_webhook_subscriptions_table.sql` — webhook_subscriptions with url, events JSONB, secret_hash, vault_secret_path, active, failure_count
- [ ] 5.2 Write `src/db/migrations/017_create_webhook_deliveries_table.sql` — webhook_deliveries with status, http_status_code, attempt_count, next_retry_at
- [ ] 5.3 Write `src/types/webhook.ts` — IWebhookSubscription, ICreateWebhookRequest, IWebhookDelivery, IWebhookPayload, WebhookEventType interfaces
- [ ] 5.4 Write `src/services/WebhookService.ts` — createSubscription (store secret in Vault), listSubscriptions, getSubscription, updateSubscription, deleteSubscription, listDeliveries
- [ ] 5.5 Write `src/workers/WebhookDeliveryWorker.ts` — bull queue worker: fetch subscription, compute HMAC-SHA256 signature, POST to URL with headers, update delivery status, schedule retry on failure
- [ ] 5.6 Write `src/services/EventPublisher.ts` — buildEventPayload, publishEvent (enqueues to bull queue; also produces to Kafka if KAFKA_BROKERS is set)
- [ ] 5.7 Update `src/services/AgentService.ts` — call EventPublisher.publishEvent for: agent.created, agent.updated, agent.suspended, agent.reactivated, agent.decommissioned
- [ ] 5.8 Update `src/services/CredentialService.ts` — call EventPublisher.publishEvent for: credential.generated, credential.rotated, credential.revoked
- [ ] 5.9 Update `src/services/OAuth2Service.ts` — call EventPublisher.publishEvent for: token.issued, token.revoked
- [ ] 5.10 Write `src/controllers/WebhookController.ts` — handlers for all 6 webhook endpoints
- [ ] 5.11 Write `src/routes/webhooks.ts` — mount all 6 webhook endpoints with correct scope guards
- [ ] 5.12 Implement SSRF protection in WebhookDeliveryWorker — reject delivery to RFC 1918 addresses, loopback, and link-local ranges
- [ ] 5.13 Implement dead-letter handling — after max retries, set status to dead_letter and increment `agentidp_webhook_dead_letters_total` Prometheus metric
- [ ] 5.14 Write `src/adapters/KafkaAdapter.ts` — optional Kafka producer; activated only when KAFKA_BROKERS env var is set
- [ ] 5.15 Write unit tests for WebhookService, WebhookDeliveryWorker, EventPublisher — HMAC computation, retry schedule, dead-letter logic
- [ ] 5.16 Write integration tests — create subscription, trigger an event, verify delivery; verify SSRF rejection; verify retry on 5xx response
- [ ] 5.17 QA sign-off: HMAC verifiable, SSRF protection active, retry schedule correct, dead-letter metric fires, zero `any`, >80% coverage
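
Tasks 5.5 and 5.12 call for an HMAC-SHA256 signature on each delivery and for rejecting private, loopback, and link-local targets. The sketch below shows both pieces using Node's `crypto` module. The function names, the `timestamp.body` signing input, and the hex encoding are assumptions, since the tasks do not fix a signature format. The address check covers IPv4 literals only; a real worker would also need to resolve hostnames and check IPv6 ranges before connecting.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Signs "<timestamp>.<body>" with the subscription secret (assumed input format).
function signWebhookPayload(secret: string, timestamp: string, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Recomputes the signature and compares in constant time to avoid timing leaks.
function verifyWebhookSignature(
  secret: string,
  timestamp: string,
  body: string,
  signatureHex: string,
): boolean {
  const expected = Buffer.from(signWebhookPayload(secret, timestamp, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Returns true when an IPv4 literal falls in RFC 1918, loopback, or link-local space.
// Anything that does not parse as dotted-quad IPv4 is blocked (fail closed).
function isBlockedIPv4(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4 || !parts.every((part) => /^\d{1,3}$/.test(part))) {
    return true;
  }
  const octets = parts.map(Number);
  if (octets.some((octet) => octet > 255)) {
    return true;
  }
  const first = octets[0];
  const second = octets[1];
  return (
    first === 10 || // 10.0.0.0/8
    first === 127 || // loopback 127.0.0.0/8
    (first === 172 && second !== undefined && second >= 16 && second <= 31) || // 172.16.0.0/12
    (first === 192 && second === 168) || // 192.168.0.0/16
    (first === 169 && second === 254) // link-local 169.254.0.0/16
  );
}
```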
---
## Workstream 6: SOC 2 Type II Preparation
- [ ] 6.1 Enable `pgcrypto` PostgreSQL extension in `src/db/migrations/018_enable_pgcrypto.sql`
- [ ] 6.2 Write `src/services/EncryptionService.ts` — AES-256-CBC encrypt/decrypt using key from Vault; methods: encryptColumn, decryptColumn, isEncrypted
- [ ] 6.3 Write `src/db/migrations/019_encrypt_sensitive_columns.sql` — re-encrypt existing credentials.secret_hash and credentials.vault_path values using EncryptionService (migration script)
- [ ] 6.4 Update `src/services/CredentialService.ts` — all reads/writes of secret_hash and vault_path go through EncryptionService
- [ ] 6.5 Update `src/services/WebhookService.ts` — vault_secret_path column encrypted via EncryptionService
- [ ] 6.6 Update `src/services/DIDService.ts` — vault_key_path in agent_did_keys encrypted via EncryptionService
- [ ] 6.7 Write `src/middleware/TLSEnforcementMiddleware.ts` — redirect HTTP to HTTPS in production using X-Forwarded-Proto header; passthrough in development
- [ ] 6.8 Register TLSEnforcementMiddleware in `src/app.ts` — first in middleware stack
- [ ] 6.9 Write `src/db/migrations/020_add_audit_chain_columns.sql` — add hash and previous_hash columns to audit_logs; add immutability trigger; backfill chain for existing rows
- [ ] 6.10 Update `src/services/AuditService.ts` — compute the hash-chain link on every insert: hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
- [ ] 6.11 Write `src/services/AuditVerificationService.ts` — verifyChain(fromDate?, toDate?): reads rows in order, recomputes hashes, returns IChainVerificationResult
- [ ] 6.12 Write `src/jobs/SecretsRotationJob.ts` — cron job: identify expiring credentials, emit `agentidp_credentials_expiring_soon_total` metric, renew Vault leases
- [ ] 6.13 Write `src/jobs/AuditChainVerificationJob.ts` — cron job: runs verifyChain on a schedule, sets `agentidp_audit_chain_integrity` Prometheus gauge to 1 (pass) or 0 (fail)
- [ ] 6.14 Write `src/controllers/ComplianceController.ts` — handlers for GET /audit/verify and GET /compliance/controls
- [ ] 6.15 Write `src/routes/compliance.ts` — mount /audit/verify (rate-limited) and /compliance/controls
- [ ] 6.16 Write `monitoring/prometheus/alerts.yml` — all 6 alerting rules: AuthFailureSpike, RateLimitExhaustion, AnomalousTokenIssuance, WebhookDeadLetterAccumulating, AuditChainIntegrityFailed, CredentialExpiryApproaching
- [ ] 6.17 Update `monitoring/prometheus/prometheus.yml` — add alerting rules file reference
- [ ] 6.18 Write compliance documentation package: `docs/compliance/soc2-controls-matrix.md` (Trust Services Criteria → controls map), `docs/compliance/encryption-runbook.md` (key rotation procedure), `docs/compliance/audit-log-runbook.md` (chain verification guide)
- [ ] 6.19 Write operational runbooks: `docs/compliance/incident-response.md` (security event procedures), `docs/compliance/secrets-rotation.md` (credential and signing key rotation guide)
- [ ] 6.20 Write unit tests for EncryptionService (encrypt/decrypt round-trip, Vault key fetch) and AuditVerificationService (intact chain, tampered chain with correct brokenAtEventId)
- [ ] 6.21 Write integration tests — TLS enforcement verified, encrypted columns not plaintext-readable in direct DB query, chain verification returns correct results
- [ ] 6.22 QA sign-off: all 5 controls pass GET /compliance/controls, all 6 Prometheus alerts valid, zero `any`, >80% coverage
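
Tasks 6.10 and 6.11 chain each audit row to its predecessor with SHA-256 and later recompute the chain to find the first broken row. A minimal in-memory sketch of that idea follows. The type names, the all-zero genesis value, and the `|` delimiter between fields are assumptions; the task text describes plain concatenation, and the delimiter is swapped in here only so that adjacent fields cannot run together ambiguously.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; field names follow the hash formula in task 6.10.
interface AuditEventSketch {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
}

interface ChainedAuditRow extends AuditEventSketch {
  hash: string;
  previousHash: string;
}

interface ChainVerificationSketch {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
}

// Assumed placeholder for the first row's previousHash.
const GENESIS_HASH = "0".repeat(64);

function computeAuditHash(event: AuditEventSketch, previousHash: string): string {
  const material = [
    event.eventId,
    event.timestamp,
    event.action,
    event.outcome,
    event.agentId,
    event.organizationId,
    previousHash,
  ].join("|");
  return createHash("sha256").update(material).digest("hex");
}

// Appends an event, linking it to the hash of the last row (or the genesis value).
function appendAuditEvent(rows: ChainedAuditRow[], event: AuditEventSketch): ChainedAuditRow {
  const last = rows[rows.length - 1];
  const previousHash = last === undefined ? GENESIS_HASH : last.hash;
  const row: ChainedAuditRow = { ...event, previousHash, hash: computeAuditHash(event, previousHash) };
  rows.push(row);
  return row;
}

// Walks rows in order and reports the first row whose link or hash does not match.
function verifyChain(rows: ChainedAuditRow[]): ChainVerificationSketch {
  let expectedPrevious = GENESIS_HASH;
  let checkedCount = 0;
  for (const row of rows) {
    checkedCount += 1;
    const linkOk = row.previousHash === expectedPrevious;
    const hashOk = row.hash === computeAuditHash(row, row.previousHash);
    if (!linkOk || !hashOk) {
      return { verified: false, checkedCount, brokenAtEventId: row.eventId };
    }
    expectedPrevious = row.hash;
  }
  return { verified: true, checkedCount, brokenAtEventId: null };
}
```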
---
## Phase 3 Complete Criteria
All 6 workstreams done. All tasks checked. All QA gates passed. CEO reviewed. SOC 2 audit window begins.