docs: engineering knowledge base for new hires

Complete docs/engineering/ suite: 12 documents covering company overview, system architecture, tech stack ADRs, codebase structure, service deep dives, annotated code walkthroughs, dev setup, engineering workflow, testing strategy, deployment/ops, SDK guide, and README index.

All content verified against source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md marked complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 12:38:42 +00:00
parent 1f95cfe89d
commit eced5f8699
13 changed files with 3820 additions and 0 deletions
--- a/docs/engineering/01-overview.md
+++ b/docs/engineering/01-overview.md
@@ -0,0 +1,115 @@
+# SentryAgent.ai — Company and Product Overview
+
+---
+
+## 1. Company Mission
+
+SentryAgent.ai is building the world's first free, open-source Agent Identity Provider
+(AgentIdP) — democratising AI agent authentication, authorisation, and governance for
+developers worldwide. The core problem it solves is one that did not have a standard
+answer until now: when an AI agent needs to call an API, how does it prove who it is?
+How does it obtain a short-lived token? How does a security team revoke its access the
+moment it is compromised? How does a compliance team obtain a full, tamper-proof record of
+everything that agent ever did? Traditional identity infrastructure — built for humans
+and static service accounts — was not designed for the fluid lifecycle of AI agents.
+
+AgentIdP is the answer. It is a REST API server that acts as an identity provider
+designed specifically for non-human AI agents. Agents register with a stable UUID
+identity, authenticate via the OAuth 2.0 Client Credentials grant (RFC 6749), receive
+short-lived RS256 JWTs, and are governed by an OPA policy engine that enforces
+capability-based access control at runtime. Every significant event is written to an
+immutable audit log. The entire system is built on open standards: OAuth 2.0, RFC 7662
+(introspection), RFC 7009 (revocation), OpenAPI 3.0, and the AGNTCY interoperability
+standard from the Linux Foundation.
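
To make the authentication step concrete, here is a minimal sketch of how a client might assemble the Client Credentials request. It only builds the request description and sends nothing. The helper name `buildTokenRequest`, the base URL, and the sample credentials are hypothetical, and sending the credentials as form fields in the body is an assumption based on RFC 6749 rather than something confirmed against the AgentIdP source.

```typescript
// Hypothetical shape describing the HTTP call; not part of any AgentIdP SDK.
interface TokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) an OAuth 2.0 Client Credentials token request.
function buildTokenRequest(
  baseUrl: string,
  clientId: string,
  clientSecret: string,
): TokenRequest {
  const fields: Array<[string, string]> = [
    ["grant_type", "client_credentials"],
    ["client_id", clientId],
    ["client_secret", clientSecret],
  ];
  // Percent-encode each key and value so reserved characters survive transport.
  const body = fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/token`,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// Assumed values for illustration only.
const tokenRequest = buildTokenRequest("https://idp.example.test/", "agent-123", "s3cr3t/+");
console.log(tokenRequest.url);  // https://idp.example.test/api/v1/token
console.log(tokenRequest.body); // grant_type=client_credentials&client_id=agent-123&client_secret=s3cr3t%2F%2B
```

The resulting object can be handed to whatever HTTP client the agent already uses; the response would carry the signed RS256 JWT described above.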
+
+The market context is one of rapid proliferation. Enterprises are deploying dozens,
+then hundreds, then thousands of autonomous AI agents — each one acting on behalf of
+the organisation, calling APIs, reading sensitive data, and making decisions. Without
+standardised identity infrastructure, there is no way to audit what happened, no way
+to revoke a compromised agent cleanly, and no standard protocol for agents from
+different vendors to authenticate to each other. SentryAgent.ai fills this gap,
+providing every developer — from a student working alone to a global enterprise — the
+same enterprise-grade identity layer for free.
+
+---
+
+## 2. What is AGNTCY?
+
+AGNTCY (pronounced "agency") is an open interoperability standard for AI agents,
+maintained under the Linux Foundation. Its central premise is that AI agents must be
+treated as first-class identities — with stable identifiers, standard authentication
+protocols, lifecycle management, and accountability mechanisms — in the same way that
+human users and cloud service accounts are today.
+
+AgentIdP is the first production IdP implementing AGNTCY-aligned agent identity across
+all six AGNTCY domains:
+
+| AGNTCY Domain | How AgentIdP Implements It |
+|---------------|---------------------------|
+| Non-Human Identity | Every agent receives an immutable UUID (`agentId`) assigned at registration. The identifier is DID-ready — structured to be portable into W3C DID documents in Phase 3. |
+| Agent Registry | `POST /api/v1/agents` registers an agent. `GET /api/v1/agents` lists all agents. `GET /api/v1/agents/:id` retrieves a single agent by UUID. |
+| Credential Management | Each agent holds one or more `(client_id, client_secret)` credential pairs. Secrets are bcrypt-hashed in PostgreSQL or stored in HashiCorp Vault KV v2. Credentials can be rotated and revoked independently. |
+| Authentication | OAuth 2.0 Client Credentials grant per RFC 6749. Agents POST `grant_type=client_credentials` with their `client_id` and `client_secret` to receive a signed RS256 JWT. |
+| Lifecycle | Agents transition through `active`, `suspended`, and `decommissioned` states. Decommissioning is a soft delete that cascades to revoke all active credentials. Suspended agents cannot obtain new tokens. |
+| Audit | Every significant platform event is written to an immutable `audit_events` table in PostgreSQL. Events carry `agentId`, `action`, `outcome`, `ipAddress`, `userAgent`, `metadata`, and `timestamp`. The free tier retains 90 days of history. |
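
The Lifecycle row can be read as a small state machine. The sketch below models it in memory under stated assumptions: the transition table (suspension is reversible, decommissioning is terminal) is a guess, since the table only names the three states, and the type and function names (`AgentRecord`, `AgentCredential`, `transition`, `canIssueToken`) are hypothetical rather than taken from the codebase.

```typescript
type AgentStatus = "active" | "suspended" | "decommissioned";

// Hypothetical in-memory shapes; the real schema may differ.
interface AgentCredential {
  credId: string;
  revoked: boolean;
}

interface AgentRecord {
  agentId: string;
  status: AgentStatus;
  credentials: AgentCredential[];
}

// Assumption: suspension is reversible and decommissioning is terminal.
const ALLOWED: Record<AgentStatus, AgentStatus[]> = {
  active: ["suspended", "decommissioned"],
  suspended: ["active", "decommissioned"],
  decommissioned: [],
};

// Returns a new record; decommissioning cascades to revoke every credential.
function transition(agent: AgentRecord, next: AgentStatus): AgentRecord {
  if (ALLOWED[agent.status].indexOf(next) === -1) {
    throw new Error(`Cannot move agent from ${agent.status} to ${next}`);
  }
  const credentials =
    next === "decommissioned"
      ? agent.credentials.map((c) => ({ ...c, revoked: true }))
      : agent.credentials;
  return { ...agent, status: next, credentials };
}

// Only active agents holding at least one live credential may obtain tokens.
function canIssueToken(agent: AgentRecord): boolean {
  return agent.status === "active" && agent.credentials.some((c) => !c.revoked);
}

const registered: AgentRecord = {
  agentId: "hypothetical-uuid",
  status: "active",
  credentials: [{ credId: "cred-1", revoked: false }],
};
const suspended = transition(registered, "suspended");
console.log(canIssueToken(suspended)); // false
const retired = transition(suspended, "decommissioned");
console.log(retired.credentials.every((c) => c.revoked)); // true
```

Returning a new record instead of mutating in place keeps the soft-delete semantics visible: the decommissioned agent still exists, but none of its credentials remain usable.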
+
+---
+
+## 3. Product Features
+
+| Feature | Endpoint(s) | Notes |
+|---------|-------------|-------|
+| Agent Registry | `POST /api/v1/agents`, `GET /api/v1/agents`, `GET /api/v1/agents/:id`, `PATCH /api/v1/agents/:id`, `DELETE /api/v1/agents/:id` | Full CRUD with lifecycle; free tier capped at 100 registered agents |
+| OAuth 2.0 Token Issuance | `POST /api/v1/token` | Client Credentials flow (RFC 6749); issues RS256 JWTs; free tier capped at 10,000 tokens/month |
+| Token Introspection | `POST /api/v1/token/introspect` | RFC 7662 compliant; always returns HTTP 200, so callers must check the `active` field rather than the status code |
+| Token Revocation | `POST /api/v1/token/revoke` | RFC 7009 compliant; idempotent; agents may only revoke their own tokens |
+| Credential Management | `POST /api/v1/agents/:id/credentials`, `GET /api/v1/agents/:id/credentials`, `DELETE /api/v1/agents/:id/credentials/:credId` | `client_secret` returned once on creation; never retrievable again |
+| Credential Rotation | `POST /api/v1/agents/:id/credentials/:credId/rotate` | Generates new secret; old secret immediately invalidated; atomic |
+| Audit Log | `GET /api/v1/audit`, `GET /api/v1/audit/:id` | Immutable, filterable by `agentId`, `action`, `outcome`, date range; paginated |
+| Web Dashboard | `/dashboard` | React 18 SPA — agents list, agent detail, credentials management, audit log, health views |
+| OPA Policy Engine | (middleware on all protected routes) | Dynamic scope-based authorisation; Rego policy in `policies/authz.rego`; hot-reload via SIGHUP |
+| Prometheus Metrics | `GET /metrics` | prom-client; all HTTP routes instrumented with request counter and duration histogram |
+| HashiCorp Vault | (opt-in, via `VAULT_ADDR` + `VAULT_TOKEN`) | KV v2 secret storage; constant-time comparison; bcrypt fallback when Vault is not configured |
+| Health Check | `GET /health` | Checks PostgreSQL and Redis connectivity; unauthenticated; used by load balancers |
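
Because introspection answers with HTTP 200 whether or not the token is valid, a caller has to inspect the response body. The following sketch shows one way a resource server might do that. The `active` and `scope` fields come from RFC 7662; the function name `isTokenUsable` and the scope strings are hypothetical and not taken from the AgentIdP codebase.

```typescript
// Subset of the RFC 7662 introspection response used by this sketch.
interface IntrospectionResponse {
  active: boolean;
  scope?: string; // space-separated list of scopes, per RFC 7662
  exp?: number;   // expiry as seconds since the Unix epoch
}

// A token is usable only if it is active and carries the required scope.
function isTokenUsable(response: IntrospectionResponse, requiredScope: string): boolean {
  if (!response.active) {
    return false; // revoked, expired, or unknown tokens all look like this
  }
  const granted = (response.scope ?? "").split(" ").filter((s) => s.length > 0);
  return granted.indexOf(requiredScope) !== -1;
}

// Scope names below are illustrative assumptions.
console.log(isTokenUsable({ active: true, scope: "agents:read audit:read" }, "agents:read")); // true
console.log(isTokenUsable({ active: true, scope: "audit:read" }, "agents:read"));             // false
console.log(isTokenUsable({ active: false }, "agents:read"));                                 // false
```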
+
+---
+
+## 4. Phase Roadmap
+
+| Phase | Status | Key Deliverables |
+|-------|--------|-----------------|
+| Phase 1 — MVP | COMPLETE | Agent registry, OAuth 2.0 Client Credentials (RS256 JWTs), credential management (bcrypt), immutable audit log, Node.js SDK, Dockerfile, Docker Compose, AGNTCY alignment documentation, >80% test coverage |
+| Phase 2 — Production-Ready | COMPLETE | HashiCorp Vault opt-in integration, Python SDK (sync + async), Go SDK (context-aware), Java SDK (builder + CompletableFuture), OPA policy engine (Rego + Wasm + TypeScript fallback), React 18 + Vite 5 web dashboard, Prometheus metrics + Grafana dashboards, Terraform multi-region deployment (AWS ECS + RDS + ElastiCache; GCP Cloud Run + Cloud SQL + Memorystore) |
+| Phase 3 — Enterprise | PLANNED | AGNTCY federation (cross-IdP agent identity), W3C Decentralised Identifiers (DIDs), agent marketplace, advanced compliance reporting, SOC 2 Type II certification, enterprise tier (custom retention, SLAs, advanced RBAC) |
+
+---
+
+## 5. Virtual Engineering Team
+
+SentryAgent.ai uses a Virtual Engineering Team (VET) model — all engineering work is
+designed, implemented, tested, and reviewed by Claude Code instances fulfilling defined
+engineering roles. The CEO (human) is the sole business decision-maker. The CTO (Claude)
+owns technical architecture and manages the engineering team autonomously. The team
+follows a strict spec-first workflow governed by the OpenSpec change management process:
+no implementation begins until an OpenAPI specification is approved by the CTO.
+
+| Role | Responsibility | Approval Gate |
+|------|---------------|---------------|
+| CEO | Business priorities, scope approval, architecture approval | All scope changes, new dependencies, git push to main |
+| Virtual CTO | Architecture, technical standards, engineering team coordination, risk management | Reports to CEO; approves all implementation before commit; approves all QA sign-offs before merge |
+| Virtual Architect | OpenAPI specs, ADRs, system design, database schemas | CTO review required before implementation begins |
+| Virtual Principal Developer | TypeScript implementation per approved spec; JSDoc; zero `any` types | CTO review required before QA begins |
+| Virtual QA Engineer | Jest + Supertest test suites; >80% coverage; all quality gates | All gates must pass before CTO signs off for merge |
+
+---
+
+## 6. Free Tier Limits
+
+| Limit | Value |
+|-------|-------|
+| Max agents | 100 |
+| Max credentials per agent | No hard cap enforced in code (5 is the documented recommendation) |
+| Max tokens issued | 10,000 per agent per calendar month |
+| Token TTL | 3,600 seconds (1 hour) |
+| Audit log retention | 90 days |
+| API rate limit | 100 requests per minute per IP address |
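
As an illustration of the last row, the sketch below enforces 100 requests per minute per IP address with a fixed-window counter held in memory. This is an assumption made for clarity: the table does not say which algorithm or store AgentIdP actually uses, and the names `FREE_TIER` and `FixedWindowLimiter` are hypothetical.

```typescript
// Limits copied from the table above; the constant name is hypothetical.
const FREE_TIER = {
  maxAgents: 100,
  tokenTtlSeconds: 3600,
  auditRetentionDays: 90,
  requestsPerMinutePerIp: 100,
} as const;

interface WindowState {
  windowStart: number; // timestamp (ms) at which the current window opened
  count: number;       // requests seen in the current window
}

// Fixed-window counter: each key gets `limit` requests per `windowMs`.
class FixedWindowLimiter {
  private limit: number;
  private windowMs: number;
  private windows: Record<string, WindowState | undefined> = Object.create(null);

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; the clock is passed in for testability.
  allow(ip: string, nowMs: number): boolean {
    const current = this.windows[ip];
    if (current === undefined || nowMs - current.windowStart >= this.windowMs) {
      this.windows[ip] = { windowStart: nowMs, count: 1 }; // open a fresh window
      return true;
    }
    if (current.count >= this.limit) {
      return false; // quota for this window is exhausted
    }
    current.count += 1;
    return true;
  }
}

const limiter = new FixedWindowLimiter(FREE_TIER.requestsPerMinutePerIp, 60000);
let accepted = 0;
for (let i = 0; i < 150; i++) {
  if (limiter.allow("203.0.113.7", 1000 + i)) accepted++;
}
console.log(accepted); // 100
```

A fixed window is the simplest scheme to reason about, but it allows short bursts across a window boundary; a sliding window or token bucket would smooth that at the cost of extra state.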