SentryAgent.ai Developer 7593bfe1c1 chore: Phase 2 OpenSpec scoping — proposal, design, specs, tasks
8 workstreams scoped per OpenSpec standards:
1. HashiCorp Vault integration (secret management)
2. Python SDK (sentryagent-idp)
3. Go SDK (idp-sdk-go)
4. Java SDK (ai.sentryagent:idp-sdk)
5. OPA policy engine (dynamic ABAC, hot-reload Rego)
6. Web Dashboard UI (React 18 + TypeScript)
7. Prometheus + Grafana monitoring (7 metrics, pre-built dashboard)
8. Multi-region Terraform deployment (AWS + GCP)

Status: proposed — awaiting CEO dependency approvals (A0.1–A0.5)
before any implementation begins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 14:53:09 +00:00


Phase 2: Production-Ready — Technical Design

Date: 2026-03-28
Author: Virtual Architect
Status: Draft (pending CEO approval of proposal)


1. HashiCorp Vault Integration

Architecture

AgentIdP Server
  └── CredentialService
        └── VaultClient (new)
              └── HashiCorp Vault (sidecar or external)
                    └── KV Secrets Engine v2

Design Decisions

ADR-001: Vault over AWS KMS/GCP Secret Manager
Vault is cloud-agnostic, open-source, and already standard in enterprise environments. Using Vault keeps Phase 2 cloud-provider independent.

ADR-002: KV Secrets Engine v2
KV v2 provides versioned secrets and metadata. When a credential is rotated, the old version is retained in Vault history, enabling audit-grade secret lifecycle tracking.

ADR-003: AgentIdP stores Vault path, not secret
credentials.vault_path stores the Vault KV path (e.g. secret/agentidp/agents/{agentId}/credentials/{credentialId}). The secret itself is never written to PostgreSQL.
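
As an illustration of ADR-003, the path layout can be sketched as a pair of pure helpers. The function names are hypothetical and not part of the AgentIdP codebase, and the sketch assumes a single-segment mount such as `secret`. The second helper relies on the KV v2 HTTP API convention that reads go through `/v1/<mount>/data/<path>`.

```typescript
// Hypothetical helpers for illustration; names are not from the AgentIdP codebase.

/** Builds the logical KV path that would be stored in credentials.vault_path. */
function credentialVaultPath(
  mount: string,
  agentId: string,
  credentialId: string,
): string {
  return `${mount}/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

/**
 * Converts a stored path into a KV v2 read URL.
 * KV v2 inserts "data/" between the mount and the secret path.
 * Assumption: the mount is a single path segment (e.g. "secret").
 */
function kvV2ReadUrl(vaultAddr: string, vaultPath: string): string {
  const slash = vaultPath.indexOf("/");
  if (slash <= 0) {
    throw new Error(`vault path has no mount segment: ${vaultPath}`);
  }
  const mount = vaultPath.slice(0, slash);
  const rest = vaultPath.slice(slash + 1);
  const base = vaultAddr.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/${mount}/data/${rest}`;
}
```

Only the logical path would be persisted in PostgreSQL; the URL is derived at request time from VAULT_ADDR.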

New environment variables

| Variable | Description |
|---|---|
| VAULT_ADDR | Vault server address |
| VAULT_TOKEN | Vault root/service token |
| VAULT_MOUNT | KV mount path (default: secret) |
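
A minimal sketch of how these variables might be read at startup, assuming the server fails fast when a required value is missing. `VaultConfig` and `loadVaultConfig` are hypothetical names; the environment is passed in as a plain record so the function can be exercised without a running process.

```typescript
// Hypothetical config loader; the type and function names are illustrative.
interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

function loadVaultConfig(env: Record<string, string | undefined>): VaultConfig {
  const addr = env["VAULT_ADDR"];
  const token = env["VAULT_TOKEN"];
  if (!addr) throw new Error("VAULT_ADDR is required");
  if (!token) throw new Error("VAULT_TOKEN is required");
  // VAULT_MOUNT is optional and defaults to "secret", as in the table above.
  return { addr, token, mount: env["VAULT_MOUNT"] || "secret" };
}
```

In a Node.js process this would typically be called once as `loadVaultConfig(process.env)`.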

Migration

Add a vault_path column to the credentials table (005_add_vault_path.sql). Existing credentials retain their bcrypt hashes; new credentials use Vault. Both code paths coexist until all credentials are rotated (migration guide provided).
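
The coexistence of the two code paths could look roughly like the following dispatch. This is a sketch under assumptions: the row shape, the `secretHash` field name, and the `Verifiers` interface are hypothetical, and the Vault and bcrypt checks are injected rather than implemented here.

```typescript
// Hypothetical row and verifier shapes; only vault_path comes from the design.
interface CredentialRow {
  id: string;
  secretHash: string | null; // legacy bcrypt hash (assumed column)
  vaultPath: string | null;  // credentials.vault_path
}

interface Verifiers {
  verifyWithVault(vaultPath: string, presented: string): Promise<boolean>;
  verifyWithBcrypt(hash: string, presented: string): Promise<boolean>;
}

/** Prefers the Vault path when present, otherwise falls back to the bcrypt hash. */
async function verifyCredential(
  row: CredentialRow,
  presented: string,
  verifiers: Verifiers,
): Promise<boolean> {
  if (row.vaultPath !== null) {
    return verifiers.verifyWithVault(row.vaultPath, presented);
  }
  if (row.secretHash !== null) {
    return verifiers.verifyWithBcrypt(row.secretHash, presented);
  }
  return false; // neither storage path is populated: deny
}
```

Once every credential has been rotated, the bcrypt branch becomes dead code and can be removed.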


2. Multi-Language SDKs

Shared contract (all SDKs implement identically)

AgentIdPClient(baseUrl, clientId, clientSecret, scopes?)
  .agents     → AgentRegistryClient   (5 methods)
  .credentials → CredentialClient     (4 methods)
  .tokens     → TokenClient           (2 methods)
  .audit      → AuditClient           (2 methods)
  .clearTokenCache()

TokenManager — auto-refresh 60s before expiry
AgentIdPError — code, message, httpStatus, details
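
The TokenManager behaviour in the contract above (refresh 60 seconds before expiry) can be sketched as follows. This is an illustrative version, not the SDK implementation: it refreshes lazily on the next `getToken()` call rather than on a background timer, and the `TokenResponse` shape and the injected `fetchToken` and `now` functions are assumptions made so the logic stays testable.

```typescript
// Illustrative sketch; TokenResponse and the constructor signature are assumed.
interface TokenResponse {
  accessToken: string;
  expiresInSeconds: number;
}

const REFRESH_SKEW_MS = 60_000; // refresh 60s before expiry

/** True once the current time is within the refresh window of the expiry. */
function needsRefresh(expiresAtMs: number, nowMs: number): boolean {
  return nowMs >= expiresAtMs - REFRESH_SKEW_MS;
}

class TokenManager {
  private token: string | null = null;
  private expiresAtMs = 0;
  private readonly fetchToken: () => Promise<TokenResponse>;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<TokenResponse>,
    now: () => number = () => Date.now(),
  ) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  /** Returns the cached token, fetching a new one when inside the refresh window. */
  async getToken(): Promise<string> {
    if (this.token !== null && !needsRefresh(this.expiresAtMs, this.now())) {
      return this.token;
    }
    const res = await this.fetchToken();
    this.token = res.accessToken;
    this.expiresAtMs = this.now() + res.expiresInSeconds * 1000;
    return this.token;
  }

  /** Mirrors .clearTokenCache() in the shared contract. */
  clearTokenCache(): void {
    this.token = null;
    this.expiresAtMs = 0;
  }
}
```

Injecting the clock keeps the 60-second rule verifiable without real waiting; a production client would also need to deduplicate concurrent refreshes, which this sketch omits.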

Python SDK (sentryagent-idp)

  • Python 3.9+ (httpx for async, requests for sync)
  • Both sync and async client variants
  • PyPI package: sentryagent-idp
  • Type hints throughout (mypy --strict clean)

Go SDK (github.com/sentryagent/idp-sdk-go)

  • Go 1.21+, standard library net/http
  • Context-aware methods (context.Context first arg)
  • Idiomatic Go error handling (error return, no panic)
  • Go module: github.com/sentryagent/idp-sdk-go

Java SDK (ai.sentryagent:idp-sdk)

  • Java 17+, Apache HttpClient 5
  • Synchronous and CompletableFuture async variants
  • Maven Central: ai.sentryagent:idp-sdk
  • Fully typed with generics

3. OPA Policy Engine

Architecture

HTTP Request
  → Auth Middleware (JWT verify) — unchanged
  → OPA Middleware (new) — evaluates policy
      → OPA Wasm (embedded, no network call)
          → Rego policy files (hot-reloadable)
  → Controller

Design Decisions

ADR-004: OPA Wasm over OPA sidecar
Embedding OPA as Wasm in the Node.js process eliminates a network hop and removes a runtime dependency. Policy files are loaded from the policies/ directory at startup and reloaded on SIGHUP.
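
One way to structure the reload described in ADR-004 is a holder that swaps the active policy only after a replacement has loaded successfully, so a broken policy file never leaves the server without a policy. The sketch below does not touch OPA or Wasm: `PolicyEvaluator` stands in for whatever the compiled policy exposes, and all names are hypothetical.

```typescript
// Hypothetical types; a real evaluator would wrap the compiled OPA Wasm policy.
type PolicyInput = { method: string; path: string; scopes: string[] };
type PolicyEvaluator = (input: PolicyInput) => boolean;

class PolicyHolder {
  private current: PolicyEvaluator;

  constructor(initial: PolicyEvaluator) {
    this.current = initial;
  }

  /**
   * Loads a replacement policy and swaps it in.
   * If loading fails, the previous policy stays active and false is returned.
   */
  async reload(load: () => Promise<PolicyEvaluator>): Promise<boolean> {
    try {
      this.current = await load();
      return true;
    } catch {
      return false;
    }
  }

  evaluate(input: PolicyInput): boolean {
    return this.current(input);
  }
}
```

In the server, a SIGHUP handler (for example `process.on("SIGHUP", ...)` in Node.js) would call `reload` with a loader that recompiles or re-reads the policy bundle.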

ADR-005: Policy replaces, does not wrap, scope check
The existing static scope check in auth.ts is replaced by an OPA policy evaluation. This keeps the policy as the single source of truth for access control.

Policy structure (policies/)

policies/
  authz.rego          — main policy: allow/deny
  data/
    scopes.json       — scope → permission mapping
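
To make the scope → permission mapping concrete, here is a TypeScript mirror of the lookup that authz.rego might perform against scopes.json. The data shape and the action names are assumptions for illustration only; the design does not specify the file format, and the real decision would be made in Rego.

```typescript
// Assumed shape for policies/data/scopes.json: scope -> permitted actions.
type ScopeMap = Record<string, readonly string[]>;

// Example data; the action names are hypothetical.
const exampleScopes: ScopeMap = {
  "agents:read": ["agents.list", "agents.get"],
  "audit:read": ["audit.list"],
};

/** Deny by default: allow only if some granted scope maps to the action. */
function isAllowed(
  scopeMap: ScopeMap,
  granted: readonly string[],
  action: string,
): boolean {
  return granted.some((scope) => {
    // Own-property check avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(scopeMap, scope)) return false;
    const actions = scopeMap[scope];
    return actions !== undefined && actions.includes(action);
  });
}
```

Keeping the mapping in a data file rather than in the policy body means scopes can change without editing authz.rego.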

4. Web Dashboard UI

Architecture

dashboard/            (new — separate from sdk/)
  src/
    components/       — reusable UI components
    pages/            — Agents, Credentials, Audit, Health
    hooks/            — useAgents, useCredentials, useAudit
    lib/
      client.ts       — wraps @sentryagent/idp-sdk
      auth.ts         — credential entry and storage

Tech Stack

  • React 18 + TypeScript strict
  • Vite 5 (build tool)
  • TanStack Query v5 (server state)
  • shadcn/ui components (Radix UI + Tailwind CSS)

Pages

| Page | Scope Required | Features |
|---|---|---|
| Agents | agents:read | List, search, view detail, suspend/reactivate |
| Credentials | agents:read | List credentials per agent, rotate, revoke |
| Audit Log | audit:read | Filter by agent/action/outcome/date, paginate |
| Health | None | Server uptime, Redis/PostgreSQL connectivity |
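
The scope requirements in the table above could drive a simple client-side route guard. The following is a sketch, with hypothetical page keys and function name; it only hides pages in the UI, and the server remains responsible for enforcing access.

```typescript
// Page keys are illustrative; the required scopes are taken from the Pages table.
type Page = "agents" | "credentials" | "audit" | "health";

const REQUIRED_SCOPE: Record<Page, string | null> = {
  agents: "agents:read",
  credentials: "agents:read",
  audit: "audit:read",
  health: null, // no scope required
};

/** Returns true when the granted scopes are sufficient to show the page. */
function canView(page: Page, grantedScopes: readonly string[]): boolean {
  const required = REQUIRED_SCOPE[page];
  return required === null || grantedScopes.includes(required);
}
```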

Authentication

The dashboard accepts clientId + clientSecret via a login form. The @sentryagent/idp-sdk TokenManager handles token acquisition and caching in sessionStorage. No backend session — all state is client-side.
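
A rough sketch of token caching against a sessionStorage-like store follows. It is not the @sentryagent/idp-sdk implementation: the storage key, the `CachedToken` shape, and the function names are assumptions. The store is typed as a small interface that `window.sessionStorage` satisfies, which also allows an in-memory substitute in tests.

```typescript
// Minimal subset of the Web Storage API that the sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

const TOKEN_KEY = "agentidp.token"; // hypothetical storage key

function saveToken(store: KeyValueStore, token: CachedToken): void {
  store.setItem(TOKEN_KEY, JSON.stringify(token));
}

/** Returns the cached token, or null (and clears the entry) if missing, malformed, or expired. */
function loadToken(store: KeyValueStore, nowMs: number): CachedToken | null {
  const raw = store.getItem(TOKEN_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as CachedToken;
    const valid =
      typeof parsed.accessToken === "string" &&
      typeof parsed.expiresAtMs === "number" &&
      parsed.expiresAtMs > nowMs;
    if (valid) return parsed;
  } catch {
    // fall through to cleanup on malformed JSON
  }
  store.removeItem(TOKEN_KEY);
  return null;
}
```

Because sessionStorage is scoped to the tab, the cached token disappears when the tab closes, which matches the "no backend session" design.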


5. Prometheus + Grafana Monitoring

Metrics exposed at GET /metrics

| Metric | Type | Description |
|---|---|---|
| agentidp_tokens_issued_total | Counter | Tokens issued, labelled by outcome |
| agentidp_agents_registered_total | Counter | Agent registrations |
| agentidp_http_requests_total | Counter | All requests, labelled by method/path/status |
| agentidp_http_request_duration_seconds | Histogram | Request latency |
| agentidp_rate_limit_rejections_total | Counter | 429 responses |
| agentidp_db_query_duration_seconds | Histogram | PostgreSQL query latency |
| agentidp_redis_command_duration_seconds | Histogram | Redis command latency |
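
To show what a scrape of GET /metrics would return for a labelled counter, here is a hand-rolled sketch of the Prometheus text exposition format. It is deliberately not prom-client, which the server is expected to use; `LabelledCounter` is a hypothetical class written only to illustrate the output shape.

```typescript
// Illustrative only: a tiny labelled counter that renders Prometheus text format.
class LabelledCounter {
  private readonly values = new Map<string, number>();
  private readonly metricName: string;
  private readonly help: string;
  private readonly labelNames: readonly string[];

  constructor(metricName: string, help: string, labelNames: readonly string[]) {
    this.metricName = metricName;
    this.help = help;
    this.labelNames = labelNames;
  }

  /** Escapes backslash, double quote, and newline in label values. */
  private static escape(value: string): string {
    return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
  }

  inc(labels: Record<string, string> = {}, by = 1): void {
    const key = this.labelNames
      .map((n) => `${n}="${LabelledCounter.escape(labels[n] ?? "")}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  render(): string {
    const lines = [
      `# HELP ${this.metricName} ${this.help}`,
      `# TYPE ${this.metricName} counter`,
    ];
    for (const [key, value] of this.values) {
      // Omit the braces entirely for unlabelled series.
      const series = key === "" ? this.metricName : `${this.metricName}{${key}}`;
      lines.push(`${series} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}
```

For example, two successful and one failed token issuance on `agentidp_tokens_issued_total` would render one series per `outcome` label value, each with its own running total.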

Grafana dashboard

Pre-built JSON dashboard shipped in monitoring/grafana/dashboards/agentidp.json. Auto-provisioned via monitoring/grafana/provisioning/.

Docker Compose extension

Add prometheus and grafana services to a docker-compose.monitoring.yml overlay — keeps the base docker-compose.yml clean for developers who don't need monitoring.


6. Multi-Region Deployment (Terraform)

Structure

terraform/
  modules/
    agentidp/         — reusable module: compute + networking
    rds/              — managed PostgreSQL
    redis/            — managed Redis
    lb/               — load balancer + TLS
  environments/
    aws/              — AWS-specific config (ECS + RDS + ElastiCache)
    gcp/              — GCP-specific config (Cloud Run + Cloud SQL + Memorystore)

Design Decisions

ADR-006: Two provider targets (AWS + GCP) in Phase 2
AWS and GCP cover the majority of developer deployments. An Azure module is deferred to Phase 3. Each environment is a thin wrapper over the shared agentidp module.

ADR-007: Terraform over Pulumi/CDK
Terraform is the most widely used IaC tool and is familiar to most DevOps teams. Its HCL syntax is also simpler to present in documentation.


Component Interaction Map (Phase 2)

                      ┌────────────────────┐
                      │   Web Dashboard    │
                      │  (React + Vite)    │
                      └────────┬───────────┘
                               │ HTTPS
              ┌────────────────▼────────────────┐
              │         AgentIdP Server         │
              │  Auth MW → OPA MW → Controllers │
              │  /metrics (prom-client)         │
              └──┬──────────┬──────────┬────────┘
                 │          │          │
           ┌─────▼──┐  ┌────▼───┐  ┌──▼───────┐
           │Postgres│  │ Redis  │  │  Vault   │
           └────────┘  └────────┘  └──────────┘
                 │
        ┌────────▼────────┐
        │   Prometheus    │
        └────────┬────────┘
                 │
        ┌────────▼────────┐
        │    Grafana      │
        └─────────────────┘