docs: engineering knowledge base for new hires
Complete docs/engineering/ suite: 12 documents covering company overview, system architecture, tech stack ADRs, codebase structure, service deep dives, annotated code walkthroughs, dev setup, engineering workflow, testing strategy, deployment/ops, SDK guide, and README index. All content verified against source files. All 82 tasks in openspec/changes/engineering-docs/tasks.md marked complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
342
docs/engineering/05-services.md
Normal file
@@ -0,0 +1,342 @@
# Service Deep Dives

---

### AgentService

**Purpose**: Manages the full lifecycle of AI agent identities: registration, retrieval, updates, and decommissioning.

**Responsibility boundary**: AgentService does not handle HTTP, credential secrets,
token issuance, or audit log queries. It delegates all data access to
`AgentRepository` and `CredentialRepository`, and all audit logging to `AuditService`.
It enforces free-tier limits and domain rules before any data is written.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `registerAgent` | `data: ICreateAgentRequest, ipAddress: string, userAgent: string` | `Promise<IAgent>` | Checks the free-tier 100-agent limit, enforces email uniqueness, creates the agent record, writes an `agent.created` audit event, and increments the `agentidp_agents_registered_total` Prometheus counter |
| `getAgentById` | `agentId: string` | `Promise<IAgent>` | Retrieves a single agent by UUID; throws `AgentNotFoundError` if not found |
| `listAgents` | `filters: IAgentListFilters` | `Promise<IPaginatedAgentsResponse>` | Returns a paginated, optionally filtered list; filters include `owner`, `agentType`, `status`, `page`, `limit` |
| `updateAgent` | `agentId: string, data: IUpdateAgentRequest, ipAddress: string, userAgent: string` | `Promise<IAgent>` | Partially updates agent metadata; rejects updates to decommissioned agents; determines the correct audit action (`agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`) based on status transition |
| `decommissionAgent` | `agentId: string, ipAddress: string, userAgent: string` | `Promise<void>` | Soft-deletes the agent (sets `status = 'decommissioned'`); revokes all active credentials by calling `credentialRepository.revokeAllForAgent(agentId)` before decommissioning |

**Database / storage schema**:
- Table `agents`: `agent_id` (UUID PK), `email` (UNIQUE), `agent_type`, `version`, `capabilities` (text array), `owner`, `deployment_env`, `status`, `created_at`, `updated_at`.
- No Redis usage; AgentService is PostgreSQL-only.

**Error types**:
- `FreeTierLimitError` (403): 100-agent limit reached
- `AgentAlreadyExistsError` (409): email already registered
- `AgentNotFoundError` (404): agent UUID not found
- `AgentAlreadyDecommissionedError` (409): agent is already decommissioned

**Configuration**: None. AgentService reads no environment variables. The free-tier limit (`FREE_TIER_MAX_AGENTS = 100`) is a module-level constant.
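The pre-write checks that `registerAgent` performs can be sketched roughly as follows. This is an illustration only, not the service's actual code: `AgentRecord` and `assertCanRegister` are hypothetical names, the record shape is simplified, and it assumes every existing row counts toward the limit regardless of status.

```typescript
// Hypothetical, simplified record; the real IAgent has more fields.
interface AgentRecord {
  agentId: string;
  email: string;
}

// Value taken from the Configuration note above.
const FREE_TIER_MAX_AGENTS = 100;

class FreeTierLimitError extends Error {
  readonly statusCode = 403;
  constructor(message: string) {
    super(message);
    this.name = "FreeTierLimitError";
  }
}

class AgentAlreadyExistsError extends Error {
  readonly statusCode = 409;
  constructor(message: string) {
    super(message);
    this.name = "AgentAlreadyExistsError";
  }
}

// Runs the two checks in the order the method table lists them:
// first the free-tier limit, then email uniqueness. Throws on failure.
function assertCanRegister(existing: readonly AgentRecord[], email: string): void {
  if (existing.length >= FREE_TIER_MAX_AGENTS) {
    throw new FreeTierLimitError("Free tier is limited to " + FREE_TIER_MAX_AGENTS + " agents");
  }
  if (existing.some((agent) => agent.email === email)) {
    throw new AgentAlreadyExistsError("An agent with email " + email + " already exists");
  }
}
```

In the real service the count and the email lookup would go through `AgentRepository` rather than an in-memory array.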

---

### OAuth2Service

**Purpose**: Issues, introspects, and revokes RS256 JWT access tokens via the OAuth 2.0 Client Credentials grant.

**Responsibility boundary**: OAuth2Service does not know about HTTP or routing. It
receives already-extracted values (`clientId`, `clientSecret`, `scope`) from the
controller, verifies the credential (via Vault or bcrypt), enforces the 10,000
tokens/month free-tier limit, and returns a typed `ITokenResponse`. All audit writes
on high-throughput paths (issue, introspect, revoke) are fire-and-forget (`void`) to
keep token endpoint latency low.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `issueToken` | `clientId: string, clientSecret: string, scope: string, ipAddress: string, userAgent: string` | `Promise<ITokenResponse>` | Verifies credentials (Vault or bcrypt), checks agent status, enforces the 10k/month limit, signs an RS256 JWT, then increments the monthly counter and writes the audit event as fire-and-forget operations |
| `introspectToken` | `token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string` | `Promise<IIntrospectResponse>` | Verifies JWT signature and checks Redis revocation list; always returns 200 with `active: true/false` per RFC 7662 |
| `revokeToken` | `token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string` | `Promise<void>` | Decodes token without verification; enforces that caller can only revoke their own tokens (`decoded.sub === callerPayload.sub`); adds JTI to Redis revocation list with TTL matching token expiry |

**Database / storage schema**:
- Redis key `revoked:{jti}`: value `1`, TTL = seconds until token expiry. Written on revocation; read on every authenticated request via `authMiddleware`.
- Redis key `monthly:tokens:{agentId}:{yyyy-mm}`: integer counter, incremented on every successful token issuance. Read to enforce the 10k/month free-tier limit.
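As a rough illustration of the two key formats above, the helpers below build the Redis keys and compute the revocation TTL. The function names are hypothetical, and using UTC for the `yyyy-mm` part is an assumption; the source does not say which time zone the service uses.

```typescript
// Key for a revoked token, following the `revoked:{jti}` format above.
function revokedKey(jti: string): string {
  return "revoked:" + jti;
}

// Key for the monthly issuance counter, following `monthly:tokens:{agentId}:{yyyy-mm}`.
// Assumption: the month is taken in UTC.
function monthlyTokenKey(agentId: string, now: Date): string {
  const yyyy = now.getUTCFullYear();
  const mm = ("0" + (now.getUTCMonth() + 1)).slice(-2); // zero-padded month
  return "monthly:tokens:" + agentId + ":" + yyyy + "-" + mm;
}

// TTL for a revocation entry: seconds remaining until the token's `exp` claim
// (seconds since the Unix epoch). Never negative.
function revocationTtlSeconds(expSeconds: number, now: Date): number {
  const nowSeconds = Math.floor(now.getTime() / 1000);
  return Math.max(0, expSeconds - nowSeconds);
}
```

Matching the TTL to the token's remaining lifetime means the revocation list cleans itself up: once a token would have expired anyway, its entry no longer needs to exist.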

**Error types**:
- `AuthenticationError` (401): agent not found, or no active credential matches the provided secret
- `AuthorizationError` (403): agent is suspended or decommissioned, or the caller attempts to revoke another agent's token
- `FreeTierLimitError` (403): 10,000 tokens/month limit reached

**Configuration**:
- `JWT_PRIVATE_KEY`: PEM-encoded RSA private key, required, read at app startup in `src/app.ts`
- `JWT_PUBLIC_KEY`: PEM-encoded RSA public key, required, read at app startup and in `authMiddleware`
- `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT`: optional; when set, Vault is used for credential verification instead of bcrypt

---

### CredentialService

**Purpose**: Manages the full lifecycle of agent credentials: generation, listing, rotation, and revocation.

**Responsibility boundary**: CredentialService does not know about HTTP or token
issuance. It enforces that credentials can only be generated for `active` agents. It
delegates secret storage to either `VaultClient` (Phase 2) or bcrypt (Phase 1 fallback).
The plain-text `clientSecret` is generated here, returned once in the response, and
never written to PostgreSQL or logged; only the bcrypt hash or Vault path is persisted there.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `generateCredential` | `agentId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string` | `Promise<ICredentialWithSecret>` | Verifies agent exists and is `active`; generates a cryptographically random secret via `generateClientSecret()`; writes to Vault (when configured) or hashes with bcrypt; returns `ICredentialWithSecret`, the only time the plain-text secret is returned |
| `listCredentials` | `agentId: string, filters: ICredentialListFilters` | `Promise<IPaginatedCredentialsResponse>` | Returns paginated credentials for an agent; `clientSecret` is never included in list responses |
| `rotateCredential` | `agentId: string, credentialId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string` | `Promise<ICredentialWithSecret>` | Generates a new secret for the same `credentialId`; overwrites Vault entry (new KV v2 version) or updates bcrypt hash; old secret is immediately invalidated; returns new `ICredentialWithSecret` once |
| `revokeCredential` | `agentId: string, credentialId: string, ipAddress: string, userAgent: string` | `Promise<void>` | Sets credential `status = 'revoked'`; permanently deletes the Vault secret via `vaultClient.deleteSecret()` when Vault is configured; rejects already-revoked credentials with `CredentialAlreadyRevokedError` |

**Database / storage schema**:
- Table `credentials`: `credential_id` (UUID PK), `client_id` (= `agentId`, FK to `agents`), `secret_hash` (bcrypt hash; empty string when Vault path is set), `vault_path` (nullable; KV v2 data path), `status`, `created_at`, `expires_at` (nullable), `revoked_at` (nullable).
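To make the `secret_hash` / `vault_path` rule concrete, here is a minimal sketch. The body of `generateClientSecret()` shown here is a stand-in (the real function may use a different length or encoding), and `secretColumns` is a hypothetical helper that only illustrates which column is populated in each storage mode.

```typescript
import { randomBytes } from "node:crypto";

// Stand-in implementation: 32 random bytes rendered as 64 hex characters.
function generateClientSecret(): string {
  return randomBytes(32).toString("hex");
}

// The two columns of the `credentials` table that depend on the storage mode.
interface StoredSecretColumns {
  secret_hash: string;
  vault_path: string | null;
}

// Vault mode persists only the KV v2 path and leaves the hash empty;
// bcrypt mode persists only the hash and leaves the path null.
// The bcrypt hash is passed in already computed, so this sketch needs no bcrypt library.
function secretColumns(
  mode: { vaultPath: string } | { bcryptHash: string }
): StoredSecretColumns {
  if ("vaultPath" in mode) {
    return { secret_hash: "", vault_path: mode.vaultPath };
  }
  return { secret_hash: mode.bcryptHash, vault_path: null };
}
```

Either way, the plain-text secret itself never appears in the row.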

**Error types**:
- `AgentNotFoundError` (404): agent UUID not found
- `CredentialError` (400): agent is not in `active` status (code: `AGENT_NOT_ACTIVE`)
- `CredentialNotFoundError` (404): credential not found or belongs to a different agent
- `CredentialAlreadyRevokedError` (409): credential is already revoked

**Configuration**:
- `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT`: optional; when set, new credentials are stored in Vault KV v2 instead of bcrypt. Existing bcrypt-based credentials continue to work unchanged.

---

### AuditService

**Purpose**: Creates and queries immutable audit events for compliance and observability.

**Responsibility boundary**: AuditService does not know about HTTP, tokens, or agents.
It receives already-assembled event data from other services and delegates all
persistence to `AuditRepository`. It enforces the 90-day free-tier retention window
on all query and retrieval operations: events older than 90 days are treated as
non-existent.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `logEvent` | `agentId: string, action: AuditAction, outcome: AuditOutcome, ipAddress: string, userAgent: string, metadata: Record<string, unknown>` | `Promise<IAuditEvent>` | Writes an immutable audit row to PostgreSQL. For token endpoints, callers use `void` (fire-and-forget). For CRUD operations, callers `await` this method. |
| `queryEvents` | `filters: IAuditListFilters` | `Promise<IPaginatedAuditEventsResponse>` | Returns paginated, filtered audit events; enforces the 90-day retention window by computing the cutoff date and rejecting queries with `fromDate` before the cutoff; validates that `fromDate <= toDate` |
| `getEventById` | `eventId: string` | `Promise<IAuditEvent>` | Retrieves a single event by UUID; throws `AuditEventNotFoundError` for both genuinely missing events and events outside the 90-day retention window (indistinguishable by design) |

**Database / storage schema**:
- Table `audit_events`: `event_id` (UUID PK), `agent_id` (text FK to agents), `action` (text, one of the `AuditAction` union type values), `outcome` (`success` or `failure`), `ip_address` (text), `user_agent` (text), `metadata` (JSONB), `timestamp` (timestamptz, NOT NULL, indexed).
- No Redis usage; AuditService is PostgreSQL-only.

**Error types**:
- `AuditEventNotFoundError` (404): event not found or outside retention window
- `RetentionWindowError` (400): query `fromDate` is before the 90-day retention cutoff
- `ValidationError` (400): `fromDate` is after `toDate`

**Configuration**: None. The retention window (`FREE_TIER_RETENTION_DAYS = 90`) is a module-level constant.
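The retention rules described for `queryEvents` and `getEventById` can be sketched as pure date arithmetic. The helper names below are hypothetical, the error outcomes are returned as strings rather than thrown to keep the sketch short, and the order of the two checks is an assumption.

```typescript
// Value taken from the Configuration note above.
const FREE_TIER_RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Oldest timestamp that is still inside the retention window.
function retentionCutoff(now: Date): Date {
  return new Date(now.getTime() - FREE_TIER_RETENTION_DAYS * MS_PER_DAY);
}

type RangeCheck = "ok" | "ValidationError" | "RetentionWindowError";

// Mirrors the two query validations: fromDate must not be after toDate,
// and fromDate must not be before the retention cutoff.
function checkQueryRange(now: Date, fromDate?: Date, toDate?: Date): RangeCheck {
  if (fromDate && toDate && fromDate.getTime() > toDate.getTime()) {
    return "ValidationError";
  }
  if (fromDate && fromDate.getTime() < retentionCutoff(now).getTime()) {
    return "RetentionWindowError";
  }
  return "ok";
}

// For single-event lookups: an event older than the cutoff is treated as missing.
function isWithinRetention(eventTimestamp: Date, now: Date): boolean {
  return eventTimestamp.getTime() >= retentionCutoff(now).getTime();
}
```

For example, with `now` set to 1 June 2025 the cutoff is 3 March 2025, so a query with `fromDate` of 1 January 2025 would be rejected.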

---

### VaultClient

**Purpose**: Wraps HashiCorp Vault KV v2 operations for credential secret storage and verification.

**Responsibility boundary**: VaultClient is a client adapter: it knows only about
Vault API calls. It has no knowledge of business rules, HTTP, or PostgreSQL. It is
injected into `CredentialService` and `OAuth2Service` via constructor injection. When
`VAULT_ADDR` is not set, `createVaultClientFromEnv()` returns `null` and the bcrypt
code path is used unchanged.

**Public methods**:

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `writeSecret` | `agentId: string, credentialId: string, plainSecret: string` | `Promise<string>` | Writes the plain-text secret to the KV v2 data path; returns the path; creates a new KV v2 version on subsequent calls (used for rotation) |
| `readSecret` | `agentId: string, credentialId: string` | `Promise<string>` | Reads and returns the plain-text secret from Vault; throws `CredentialError` if the path is not found or the read fails |
| `verifySecret` | `agentId: string, credentialId: string, candidateSecret: string` | `Promise<boolean>` | Reads the stored secret via `readSecret`, then compares using `crypto.timingSafeEqual` to prevent timing-based side-channel attacks; returns `false` on any Vault error rather than throwing |
| `deleteSecret` | `agentId: string, credentialId: string` | `Promise<void>` | Permanently deletes all versions of a credential secret by calling the KV v2 metadata path (`DELETE {mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`) |

**KV v2 path structure**:
- Data path: `{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}`
- Metadata path (for permanent deletion): `{mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`
- Default mount: `secret` (overridable via `VAULT_MOUNT`)
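The path structure above can be expressed as two small builders. The function names are hypothetical; only the path templates and the default mount come from the list above.

```typescript
// Default KV v2 mount, per the list above (overridable via VAULT_MOUNT).
const DEFAULT_MOUNT = "secret";

// Portion of the path shared by the data and metadata endpoints.
function credentialSubPath(agentId: string, credentialId: string): string {
  return "agentidp/agents/" + agentId + "/credentials/" + credentialId;
}

// Path used to write and read secret versions.
function dataPath(agentId: string, credentialId: string, mount: string = DEFAULT_MOUNT): string {
  return mount + "/data/" + credentialSubPath(agentId, credentialId);
}

// Path used to permanently delete every version of a secret.
function metadataPath(agentId: string, credentialId: string, mount: string = DEFAULT_MOUNT): string {
  return mount + "/metadata/" + credentialSubPath(agentId, credentialId);
}
```

The split matters because KV v2 keeps version history: a delete against the data path removes only the latest version, whereas a delete against the metadata path removes the key and all of its versions.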

**Opt-in configuration**:
- `VAULT_ADDR`: Vault server address (e.g. `http://127.0.0.1:8200`); required to enable Vault mode
- `VAULT_TOKEN`: Vault authentication token; required to enable Vault mode
- `VAULT_MOUNT`: KV v2 mount path; optional, defaults to `secret`

**Constant-time comparison rationale**: The `verifySecret` method uses Node.js
`crypto.timingSafeEqual` instead of `===` to prevent attackers from inferring the
length or content of stored secrets by measuring how long the comparison takes. When
the stored and candidate secrets differ in length, a dummy `timingSafeEqual` call is
still performed to eliminate the timing signal from the early-exit path.
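A minimal sketch of the comparison described above, assuming both secrets are UTF-8 strings. `constantTimeEquals` is a hypothetical name, and the exact shape of the dummy call in the real `verifySecret` may differ.

```typescript
import { timingSafeEqual } from "node:crypto";

// Compares two secrets without leaking, through timing, where they first differ.
function constantTimeEquals(stored: string, candidate: string): boolean {
  const storedBuf = Buffer.from(stored, "utf8");
  const candidateBuf = Buffer.from(candidate, "utf8");

  if (storedBuf.length !== candidateBuf.length) {
    // timingSafeEqual throws when the byte lengths differ, so the length
    // mismatch is handled here. A dummy comparison of the stored buffer
    // against itself keeps this branch doing comparable work.
    timingSafeEqual(storedBuf, storedBuf);
    return false;
  }
  return timingSafeEqual(storedBuf, candidateBuf);
}
```

Comparing byte buffers rather than JavaScript strings also avoids surprises with multi-byte characters, since `timingSafeEqual` operates on byte length.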

---

### OPA Policy Engine

**Purpose**: Enforces scope-based authorisation on every protected HTTP request without requiring a code deployment to change access rules.

**Responsibility boundary**: The OPA policy engine (`src/middleware/opa.ts`) is a
middleware layer: it does not know about business rules, credentials, or audit events.
It receives the HTTP method, full request path, and caller scopes from `req.user`, and
returns allow or deny. All policy logic lives in `policies/authz.rego` and
`policies/data/scopes.json`.

**Policy file locations**:
- `policies/authz.rego`: Rego policy defining `normalise_path`, `lookup_key`, and the `allow` rule. Evaluated by the Wasm bundle when compiled; replicated in TypeScript for the fallback path.
- `policies/data/scopes.json`: JSON map of `"METHOD:/path/pattern"` → `[required_scopes]`. Loaded as data into the Wasm policy and used directly by the TypeScript fallback.
- `policies/authz.wasm`: compiled Wasm bundle (not committed to source control; built from `authz.rego` using the OPA CLI). When present, the Wasm path is used; when absent, the TypeScript fallback reads `scopes.json`.

**How `opaMiddleware` evaluates input**:

1. `createOpaMiddleware()` is called once at app startup in `src/app.ts`.
2. It attempts to load `policies/authz.wasm`. If found, `loadPolicy(wasmBuffer)` is called and `scopes.json` data is injected via `loaded.setData(parsed)`.
3. If no Wasm bundle is found, `scopes.json` is loaded into `scopesMap` as the TypeScript fallback.
4. On every request, the middleware builds an `OpaInput` object: `{ method: req.method, path: req.baseUrl + req.path, scopes: req.user.scope.split(' ') }`.
5. `evaluate(input)` checks the Wasm policy (if loaded) or applies `normalisePath` + scope-intersection logic against `scopesMap`. Returns `false` if neither is loaded (fail-closed).
6. If `evaluate` returns `false`, the middleware calls `next(new AuthorizationError())`.
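The TypeScript fallback in step 5 can be sketched as below. This is an approximation under stated assumptions: `normalisePath` is assumed to replace UUID path segments with `:id`, "scope intersection" is assumed to mean that holding any one of the required scopes is enough, and routes missing from the map are assumed to be denied. The real `src/middleware/opa.ts` may differ on each point.

```typescript
type ScopesMap = Record<string, string[]>;

interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Assumption: concrete UUIDs in the path are collapsed to ":id" so that
// "/api/v1/agents/<uuid>" matches the "/api/v1/agents/:id" pattern.
function normalisePath(path: string): string {
  return path
    .split("/")
    .map((segment) => (UUID_SEGMENT.test(segment) ? ":id" : segment))
    .join("/");
}

// Fallback evaluation against the scopes map; denies when nothing is loaded.
function evaluate(input: OpaInput, scopesMap: ScopesMap | null): boolean {
  if (scopesMap === null) {
    return false; // fail-closed: no policy data loaded
  }
  const key = input.method + ":" + normalisePath(input.path);
  const required = scopesMap[key];
  if (required === undefined) {
    return false; // assumption: unknown routes are denied
  }
  // Allow when the caller holds at least one of the required scopes.
  return required.some((scope) => input.scopes.indexOf(scope) !== -1);
}
```

Failing closed in both the "nothing loaded" and "unknown route" cases means a missing or incomplete `scopes.json` results in denied requests rather than unintended access.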

**How to write a new policy rule**:

1. Add the new endpoint's scope requirement to `policies/data/scopes.json`:
   ```json
   "GET:/api/v1/reports": ["reports:read"]
   ```
2. Add `"reports:read"` to the `OAuthScope` union type in `src/types/index.ts`.
3. If Wasm mode is in use, recompile `authz.rego` with the OPA CLI and place the resulting module at `policies/authz.wasm`. For example, `opa build -t wasm -e authz/allow -o bundle.tar.gz policies/authz.rego` produces a bundle whose `policy.wasm` entry is the compiled module (the `authz/allow` entrypoint assumes the package name used in the test query below).
4. Send `SIGHUP` to the running process to hot-reload: `kill -HUP <pid>`.

**How to test a policy rule**:
```bash
# Using the OPA CLI directly; the input document is piped in on stdin
# because --input expects a file path rather than inline JSON.
echo '{"method":"GET","path":"/api/v1/agents","scopes":["agents:read"]}' | \
  opa eval --stdin-input \
    --data policies/data/scopes.json \
    --bundle policies/ \
    'data.authz.allow'
```
Expected result value: `true`. Replace method/path/scopes to test deny cases.

**Hot-reload via SIGHUP**: When `SIGHUP` is received by the Node.js process,
`server.ts` calls `reloadOpaPolicy()`. This re-executes the same startup loading logic:
tries to load the Wasm bundle, falls back to `scopes.json`. The in-memory `wasmPolicy`
and `scopesMap` module-level variables are replaced atomically. No requests are dropped.
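One way to get the "replaced atomically" property is to build the new state completely and then swap it in with a single assignment. The sketch below shows that pattern only; it is not the project's `reloadOpaPolicy()`. It groups the two variables into one hypothetical `PolicyState` object, takes the loader as a parameter, and assumes the previous state is kept if loading throws.

```typescript
type ScopesMap = Record<string, string[]>;

// Hypothetical grouping of the two module-level variables named above.
interface PolicyState {
  wasmPolicy: unknown | null;
  scopesMap: ScopesMap | null;
}

let currentState: PolicyState = { wasmPolicy: null, scopesMap: null };

// Read by request handlers; always sees either the old or the new state, never a mix.
function getPolicyState(): PolicyState {
  return currentState;
}

// Builds the replacement state first, then publishes it with one assignment.
// Returns false and leaves the live state untouched if the loader throws
// (an assumption; the source does not describe failure handling).
function reloadPolicy(load: () => PolicyState): boolean {
  try {
    const next = load();
    currentState = next;
    return true;
  } catch (err) {
    return false;
  }
}
```

Wiring this to the signal would look like `process.on("SIGHUP", () => reloadPolicy(loadFromDisk))`, where `loadFromDisk` is a placeholder for the startup loading logic.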

---

### Web Dashboard

**Purpose**: Provides a browser-based UI for human operators to manage agents, credentials, and audit logs without writing API calls directly.

**Responsibility boundary**: The dashboard is a pure client-side React SPA. It has no
server-side logic. It calls the AgentIdP REST API using the `@sentryagent/idp-sdk`
`TokenManager` for authentication and a typed `ApiClient` from `dashboard/src/lib/client.ts`
for all API calls. It never stores the `access_token` in `localStorage`; only
`clientId`, `clientSecret`, and `baseUrl` are stored, in `sessionStorage` (cleared
on tab close).

**React component structure**:

```
dashboard/src/
├── main.tsx                    # React root: mounts App into #root, wraps with BrowserRouter
├── App.tsx                     # Route definitions: AuthProvider, RequireAuth, AppShell
├── lib/
│   ├── auth.tsx                # AuthContext, AuthProvider, useAuth hook, sessionStorage helpers
│   └── client.ts               # Typed ApiClient class: wraps fetch with TokenManager token injection
├── components/
│   ├── RequireAuth.tsx         # Route guard: redirects to /dashboard/login if not authenticated
│   └── layout/AppShell.tsx     # Persistent sidebar navigation + Outlet for page content
└── pages/
    ├── Login.tsx               # Login form: calls auth.login(), redirects to /dashboard/agents
    ├── Agents.tsx              # Paginated agents list with status filter and search
    ├── AgentDetail.tsx         # Single agent view: status, metadata, update, decommission actions
    ├── Credentials.tsx         # Credential list for an agent: generate, rotate, revoke actions
    ├── AuditLog.tsx            # Paginated audit log with date range and action filters
    └── Health.tsx              # /health endpoint response: PostgreSQL and Redis status display
```

**Authentication flow with sessionStorage**:
1. On `Login.tsx` form submit, `auth.login(creds)` is called.
2. `validateCredentials(creds)` creates a `TokenManager` and calls `getToken()`; if this succeeds, the credentials are valid.
3. `saveCredentials(creds)` stores `{ clientId, clientSecret, baseUrl }` in `sessionStorage` under key `agentidp_credentials`.
4. On every subsequent API call, `getClient()` in `lib/client.ts` reads credentials from `sessionStorage`, creates a `TokenManager`, and injects the current `access_token` into the `Authorization: Bearer` header. The `TokenManager` handles automatic token refresh when the token is expired.
5. `auth.logout()` calls `clearCredentials()` (removes the `sessionStorage` key) and navigates to `/dashboard/login`.
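The storage helpers in steps 3 to 5 might look roughly like the sketch below. The signatures are assumptions: the store is passed in as a parameter (in the browser it would be `window.sessionStorage`) so the helpers can also run outside a browser, and `loadCredentials` and `MemoryStore` are hypothetical names introduced for illustration.

```typescript
interface Credentials {
  clientId: string;
  clientSecret: string;
  baseUrl: string;
}

// Minimal subset of the Web Storage API that the helpers rely on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Storage key named in step 3 above.
const STORAGE_KEY = "agentidp_credentials";

function saveCredentials(store: KeyValueStore, creds: Credentials): void {
  store.setItem(STORAGE_KEY, JSON.stringify(creds));
}

// Returns null when nothing is stored or the stored value is not valid JSON.
function loadCredentials(store: KeyValueStore): Credentials | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) {
    return null;
  }
  try {
    return JSON.parse(raw) as Credentials;
  } catch (err) {
    return null;
  }
}

function clearCredentials(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}

// In-memory stand-in for sessionStorage, useful in tests.
class MemoryStore implements KeyValueStore {
  private items: Record<string, string> = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.items, key) ? this.items[key] : null;
  }
  setItem(key: string, value: string): void {
    this.items[key] = value;
  }
  removeItem(key: string): void {
    delete this.items[key];
  }
}
```

Note that only the long-lived credentials are persisted here; the short-lived `access_token` stays in memory inside the `TokenManager`.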

**Main views and their API calls**:
- **Agents**: `GET /api/v1/agents?page=N&limit=20` (paginated list with `status` filter)
- **AgentDetail**: `GET /api/v1/agents/:id`, `PATCH /api/v1/agents/:id`, `DELETE /api/v1/agents/:id`
- **Credentials**: `GET /api/v1/agents/:id/credentials`, `POST /api/v1/agents/:id/credentials`, `POST /api/v1/agents/:id/credentials/:credId/rotate`, `DELETE /api/v1/agents/:id/credentials/:credId`
- **AuditLog**: `GET /api/v1/audit?page=N&limit=20&fromDate=...&toDate=...`
- **Health**: `GET /health`

**Local development**:
```bash
cd dashboard
npm install
npm run dev   # Vite dev server with HMR; dashboard available at http://localhost:5173/dashboard
```
The Vite dev server proxies `/api/` calls to the Express server at `http://localhost:3000`.
The Express server must be running separately for API calls to work.

---

### Prometheus/Grafana Monitoring

**Purpose**: Provides operational visibility into AgentIdP's HTTP traffic, token issuance rates, agent registration rates, database latency, and Redis command latency.

**Responsibility boundary**: The metrics middleware (`src/middleware/metrics.ts`) and
the metrics registry (`src/metrics/registry.ts`) are observability concerns only: they
do not affect business logic. Metrics are exposed at `GET /metrics` via
`createMetricsRouter()` using `metricsRegistry.metrics()` from `prom-client`. The
`/metrics` endpoint is unauthenticated; it is intended for scraping by Prometheus only
and should not be exposed to the public internet.

**Key metrics with labels**:

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | Total HTTP requests received; route is normalised (UUIDs replaced with `:id`) |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration; buckets from 5ms to 2.5s |
| `agentidp_tokens_issued_total` | Counter | `scope` | Total OAuth 2.0 access tokens successfully issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Total AI agents successfully registered |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration; buckets from 1ms to 1s |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration; buckets from 0.5ms to 250ms |
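The route normalisation mentioned for the HTTP metrics keeps label cardinality bounded: without it, every agent UUID would create a new time series. A minimal sketch, with hypothetical function names and an assumed UUID pattern:

```typescript
// Matches UUIDs anywhere in the path (case-insensitive).
const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

// Replaces every UUID in the path with ":id" so the `route` label stays low-cardinality.
function normaliseRoute(path: string): string {
  return path.replace(UUID_PATTERN, ":id");
}

// Builds the label set used by the two HTTP metrics in the table above.
function httpLabels(
  method: string,
  path: string,
  statusCode: number
): { method: string; route: string; status_code: string } {
  return {
    method: method,
    route: normaliseRoute(path),
    status_code: String(statusCode),
  };
}
```

For example, requests for two different agents both end up under the single route label `/api/v1/agents/:id`.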

**How to add a new Counter**:
1. Open `src/metrics/registry.ts`.
2. Add a new `Counter` export:
   ```typescript
   // Assumes Counter (from prom-client) and metricsRegistry are already in scope in this file.
   export const myNewCounter = new Counter({
     name: 'agentidp_my_new_counter_total',
     help: 'Description of what this counts.',
     labelNames: ['label_one'] as const,
     registers: [metricsRegistry],
   });
   ```
3. Import and call `myNewCounter.inc({ label_one: value })` in the service or middleware where the event occurs.

**How to add a new Histogram**:
1. Open `src/metrics/registry.ts`.
2. Add a new `Histogram` export with appropriate buckets:
   ```typescript
   // Assumes Histogram (from prom-client) and metricsRegistry are already in scope in this file.
   export const myDurationHistogram = new Histogram({
     name: 'agentidp_my_operation_duration_seconds',
     help: 'Duration of my operation in seconds.',
     labelNames: ['operation'] as const,
     buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
     registers: [metricsRegistry],
   });
   ```
3. Use `const end = myDurationHistogram.startTimer({ operation: 'name' }); ... end();` around the operation being measured.

**Grafana access in local Docker**:

Start the monitoring overlay:
```bash
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up
```
- Prometheus: `http://localhost:9090`
- Grafana: `http://localhost:3001` (default credentials: `admin` / `agentidp`)

Grafana is pre-provisioned with a Prometheus data source pointing to `http://prometheus:9090`
and dashboard JSON files from `monitoring/grafana/dashboards/`. No manual configuration
is needed after startup.