Compare commits

26 commits (develop...389a764e8d)

| Author | SHA1 | Date |
|---|---|---|
|  | 389a764e8d |  |
|  | 831e91c467 |  |
|  | af630b43d4 |  |
|  | 26a56f84e1 |  |
|  | fefbf1e3ea |  |
|  | 89c99b666d |  |
|  | d1e6af25aa |  |
|  | 1b682c22b2 |  |
|  | b0f70b7ac4 |  |
|  | f1fbe0e29a |  |
|  | ceec22f714 |  |
|  | fd90b2acd1 |  |
|  | 272b69f18d |  |
|  | 03b5de300c |  |
|  | 5e465e596a |  |
|  | 3d1fff15f6 |  |
|  | d252097f71 |  |
|  | cb7d079ef6 |  |
|  | d42c653eea |  |
|  | eced5f8699 |  |
|  | 1f95cfe89d |  |
|  | 6913d62648 |  |
|  | a504964e5f |  |
|  | 7d6e248a14 |  |
|  | 7328a61c44 |  |
|  | c8f916b849 |  |
.github/actions/issue-token/README.md (vendored, new file, 110 lines)
@@ -0,0 +1,110 @@
# sentryagent/issue-token

Issues a SentryAgent.ai OAuth2 Bearer token for an existing agent from a GitHub
Actions workflow.

No long-lived API credentials are required. The action uses a GitHub-issued OIDC
token to authenticate with the SentryAgent.ai AgentIdP via `POST /oidc/token`.
The returned access token is automatically masked with `core.setSecret()` so it
never appears in plaintext in workflow logs.

## Prerequisites

### 1. Register the agent

The agent must already exist in SentryAgent.ai. If you need to create the agent
in CI, use [`sentryagent/register-agent@v1`](../register-agent/README.md) first.

### 2. Configure an OIDC Trust Policy for the agent

A trust policy linking the repository to the specific agent must be registered:

```bash
curl -X POST https://idp.sentryagent.ai/api/v1/oidc/trust-policies \
  -H "Authorization: Bearer <your-admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "github",
    "repository": "org/your-repo",
    "branch": "main",
    "agentId": "<agent-uuid>"
  }'
```

Omit `branch` to allow any branch to issue tokens for this agent.

### 3. Grant `id-token: write` permission

The workflow must have permission to request a GitHub OIDC token:

```yaml
permissions:
  id-token: write
  contents: read
```

## Inputs

| Input | Required | Description |
|-------|----------|-------------|
| `api-url` | Yes | Base URL of the SentryAgent.ai API (e.g. `https://idp.sentryagent.ai`) |
| `agent-id` | Yes | UUID of the agent for which to issue an access token |

## Outputs

| Output | Description |
|--------|-------------|
| `access-token` | Short-lived Bearer token. Masked in all log output. |
| `expires-at` | ISO 8601 timestamp indicating when the token expires. |

## Example workflow

```yaml
name: Deploy with Agent Token

on:
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Issue SentryAgent access token
        id: token
        uses: sentryagent/issue-token@v1
        with:
          api-url: https://idp.sentryagent.ai
          agent-id: ${{ vars.SENTRY_AGENT_ID }}

      - name: Call authenticated API
        run: |
          curl -H "Authorization: Bearer ${{ steps.token.outputs.access-token }}" \
            https://my-service.example.com/deploy
```

## Troubleshooting

**HTTP 403 — Trust policy violation**
No trust policy exists for this repository + agent combination. Register a trust
policy using the Prerequisites steps above.

**HTTP 403 — Branch not permitted**
A trust policy exists but specifies a branch constraint that does not match the
current workflow's branch. Add a policy for the current branch, or remove the
branch constraint to allow all branches.

**Failed to obtain a GitHub OIDC token**
Ensure `id-token: write` is set in the workflow's `permissions` block.

**Token expires too quickly**
The default token TTL is set by the SentryAgent.ai server configuration. Check
`expires-at` and re-issue a token before it expires if your workflow is long-running.
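For long-running workflows, a later step can compare `expires-at` with the current time before reusing the token. A minimal sketch (the helper name and the 60-second safety margin are illustrative assumptions, not part of the action):

```javascript
// Hypothetical helper: true when the token described by the `expires-at`
// output should be re-issued, keeping a safety margin before actual expiry.
function needsReissue(expiresAt, marginSeconds = 60) {
  const expiryMs = Date.parse(expiresAt); // ISO 8601 -> epoch milliseconds
  return Date.now() >= expiryMs - marginSeconds * 1000;
}

// A timestamp far in the past is always stale:
console.log(needsReissue('2000-01-01T00:00:00.000Z')); // true
```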
## Full documentation

[https://docs.sentryagent.ai/github-actions](https://docs.sentryagent.ai/github-actions)
.github/actions/issue-token/action.js (vendored, new file, 153 lines)
@@ -0,0 +1,153 @@
/**
 * issue-token GitHub Action script.
 *
 * Flow:
 *   1. Request a GitHub OIDC token via @actions/core.getIDToken()
 *   2. Exchange the OIDC token for a SentryAgent.ai access token via POST /oidc/token
 *   3. Set outputs: access-token (masked) and expires-at (ISO 8601)
 *
 * The access token is immediately registered with core.setSecret() so it never
 * appears in plaintext in workflow logs.
 *
 * Error handling:
 *   - OIDC exchange failures emit a clear message with a link to the trust policy setup docs
 */

'use strict';

const core = require('@actions/core');
const { HttpClient } = require('@actions/http-client');

/**
 * Exchanges a GitHub OIDC JWT for a SentryAgent.ai access token for a specific agent.
 *
 * @param {string} apiUrl - Base URL of the SentryAgent.ai AgentIdP API.
 * @param {string} oidcToken - GitHub OIDC JWT obtained from core.getIDToken().
 * @param {string} agentId - UUID of the agent for which to issue a token.
 * @returns {Promise<{ accessToken: string; expiresIn: number }>} The access token and its TTL in seconds.
 * @throws {Error} If the exchange fails, with a message including trust policy setup instructions.
 */
async function exchangeOIDCToken(apiUrl, oidcToken, agentId) {
  const client = new HttpClient('sentryagent-issue-token/1.0');
  const url = `${apiUrl}/api/v1/oidc/token`;

  const body = JSON.stringify({
    provider: 'github',
    token: oidcToken,
    agentId,
  });

  let response;
  try {
    response = await client.post(url, body, {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    });
  } catch (err) {
    throw new Error(
      `Failed to reach the SentryAgent.ai OIDC token endpoint at ${url}. ` +
        'Check that the api-url input is correct and the API is reachable.\n' +
        `Underlying error: ${err instanceof Error ? err.message : String(err)}`,
    );
  }

  const rawBody = await response.readBody();
  const statusCode = response.message.statusCode ?? 0;

  if (statusCode === 403) {
    throw new Error(
      'GitHub OIDC token exchange was rejected with HTTP 403 (Forbidden). ' +
        'This usually means no trust policy has been registered for this repository.\n\n' +
        'To fix this, register a trust policy by calling:\n' +
        `  POST ${apiUrl}/api/v1/oidc/trust-policies\n` +
        '  Body: { "provider": "github", "repository": "org/repo", "agentId": "<agent-id>" }\n\n' +
        'For full setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  if (statusCode < 200 || statusCode >= 300) {
    let detail = rawBody;
    try {
      const parsed = JSON.parse(rawBody);
      detail = parsed.message ?? parsed.error_description ?? rawBody;
    } catch {
      // use rawBody as-is
    }
    throw new Error(
      `OIDC token exchange failed with HTTP ${statusCode}: ${detail}\n` +
        'For trust policy setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  let tokenData;
  try {
    tokenData = JSON.parse(rawBody);
  } catch {
    throw new Error(`OIDC token exchange returned non-JSON response: ${rawBody}`);
  }

  if (typeof tokenData.access_token !== 'string' || tokenData.access_token.length === 0) {
    throw new Error('OIDC token exchange response did not include an access_token.');
  }

  const expiresIn = typeof tokenData.expires_in === 'number' ? tokenData.expires_in : 3600;

  return { accessToken: tokenData.access_token, expiresIn };
}

/**
 * Computes an ISO 8601 expiry timestamp from a TTL in seconds.
 *
 * @param {number} expiresInSeconds - Number of seconds until the token expires.
 * @returns {string} ISO 8601 timestamp string.
 */
function computeExpiresAt(expiresInSeconds) {
  return new Date(Date.now() + expiresInSeconds * 1000).toISOString();
}

/**
 * Main entry point for the issue-token GitHub Action.
 *
 * @returns {Promise<void>}
 */
async function run() {
  try {
    // Read inputs
    const apiUrl = core.getInput('api-url', { required: true }).replace(/\/$/, '');
    const agentId = core.getInput('agent-id', { required: true });

    core.info(`Requesting GitHub OIDC token for audience: ${apiUrl}`);
    let oidcToken;
    try {
      oidcToken = await core.getIDToken(apiUrl);
    } catch (err) {
      throw new Error(
        'Failed to obtain a GitHub OIDC token. ' +
          "Ensure the workflow has 'id-token: write' permission in its permissions block.\n\n" +
          'Example:\n' +
          'permissions:\n' +
          '  id-token: write\n' +
          '  contents: read\n\n' +
          `Underlying error: ${err instanceof Error ? err.message : String(err)}\n` +
          'For setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
      );
    }

    core.info(`Exchanging GitHub OIDC token for SentryAgent.ai access token (agent: ${agentId})...`);
    const { accessToken, expiresIn } = await exchangeOIDCToken(apiUrl, oidcToken, agentId);

    // Mask the token immediately — must happen before any logging or output
    core.setSecret(accessToken);

    const expiresAt = computeExpiresAt(expiresIn);

    core.setOutput('access-token', accessToken);
    core.setOutput('expires-at', expiresAt);

    core.info(`Access token issued successfully. Expires at: ${expiresAt}`);
  } catch (err) {
    core.setFailed(err instanceof Error ? err.message : String(err));
  }
}

run();
.github/actions/issue-token/action.yml (vendored, new file, 37 lines)
@@ -0,0 +1,37 @@
name: 'SentryAgent Issue Token'
description: >
  Issues a SentryAgent.ai OAuth2 access token for an agent using GitHub OIDC
  token exchange. No long-lived API credentials required. The issued access
  token is automatically masked in GitHub Actions logs via core.setSecret().

author: 'SentryAgent.ai'

branding:
  icon: 'key'
  color: 'blue'

inputs:
  api-url:
    description: >
      Base URL of the SentryAgent.ai AgentIdP API.
      Example: https://idp.sentryagent.ai
    required: true
  agent-id:
    description: >
      The UUID of the agent for which to issue an access token.
      Obtain this from the register-agent action output or from the API.
    required: true

outputs:
  access-token:
    description: >
      A short-lived Bearer access token for the specified agent.
      The token value is masked in all GitHub Actions log output.
  expires-at:
    description: >
      ISO 8601 timestamp indicating when the access token expires.
      Use this to decide when to re-issue a fresh token.

runs:
  using: 'node20'
  main: 'action.js'
.github/actions/register-agent/README.md (vendored, new file, 96 lines)
@@ -0,0 +1,96 @@
# sentryagent/register-agent

Registers a new AI agent in SentryAgent.ai from a GitHub Actions workflow.

No long-lived API credentials are required. The action uses a GitHub-issued OIDC
token to authenticate with the SentryAgent.ai AgentIdP via `POST /oidc/token`, then
calls `POST /agents` to create the agent.

## Prerequisites

### 1. Configure an OIDC Trust Policy

Before this action can exchange tokens, a trust policy must be registered in
SentryAgent.ai for the repository that will run the workflow.

```bash
curl -X POST https://idp.sentryagent.ai/api/v1/oidc/trust-policies \
  -H "Authorization: Bearer <your-admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "github",
    "repository": "org/your-repo",
    "branch": "main"
  }'
```

Omit `branch` to allow any branch to register agents from this repository.

### 2. Grant `id-token: write` permission

The workflow must have permission to request a GitHub OIDC token:

```yaml
permissions:
  id-token: write
  contents: read
```

## Inputs

| Input | Required | Description |
|-------|----------|-------------|
| `api-url` | Yes | Base URL of the SentryAgent.ai API (e.g. `https://idp.sentryagent.ai`) |
| `agent-name` | Yes | Unique name (email format) for the new agent |
| `agent-description` | No | Human-readable description of the agent's purpose |

## Outputs

| Output | Description |
|--------|-------------|
| `agent-id` | UUID of the newly registered agent. Use in subsequent steps to issue tokens or manage credentials. |

## Example workflow

```yaml
name: Register Agent

on:
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  register:
    runs-on: ubuntu-latest
    steps:
      - name: Register SentryAgent
        id: register
        uses: sentryagent/register-agent@v1
        with:
          api-url: https://idp.sentryagent.ai
          agent-name: my-ci-agent@acme.com
          agent-description: CI agent for the acme/my-repo build pipeline

      - name: Print agent ID
        run: echo "Registered agent ${{ steps.register.outputs.agent-id }}"
```

## Troubleshooting

**HTTP 403 — Trust policy not configured**
Register a trust policy for this repository first. See the Prerequisites section above.

**Failed to obtain a GitHub OIDC token**
Ensure `id-token: write` is set in the workflow's `permissions` block.

**Agent registration failed with HTTP 401**
The OIDC token exchange succeeded but the returned access token was rejected by
`POST /agents`. Check that the SentryAgent.ai API version matches and the
bootstrap token has `agents:write` scope.

## Full documentation

[https://docs.sentryagent.ai/github-actions](https://docs.sentryagent.ai/github-actions)
.github/actions/register-agent/action.js (vendored, new file, 200 lines)
@@ -0,0 +1,200 @@
/**
 * register-agent GitHub Action script.
 *
 * Flow:
 *   1. Request a GitHub OIDC token via @actions/core.getIDToken()
 *   2. Exchange the OIDC token for a SentryAgent.ai access token via POST /oidc/token
 *   3. Register a new agent via POST /agents using the access token
 *   4. Set the `agent-id` output
 *
 * Error handling:
 *   - OIDC exchange failures emit a clear message with a link to the trust policy setup docs
 *   - Agent registration failures surface the API error message
 */

'use strict';

const core = require('@actions/core');
const { HttpClient, BearerCredentialHandler } = require('@actions/http-client');

/**
 * Exchanges a GitHub OIDC JWT for a SentryAgent.ai access token.
 *
 * @param {string} apiUrl - Base URL of the SentryAgent.ai AgentIdP API.
 * @param {string} oidcToken - GitHub OIDC JWT obtained from core.getIDToken().
 * @returns {Promise<string>} The SentryAgent.ai access token.
 * @throws {Error} If the exchange fails, with a message including trust policy setup instructions.
 */
async function exchangeOIDCToken(apiUrl, oidcToken) {
  const client = new HttpClient('sentryagent-register-agent/1.0');
  const url = `${apiUrl}/api/v1/oidc/token`;

  const body = JSON.stringify({
    provider: 'github',
    token: oidcToken,
  });

  let response;
  try {
    response = await client.post(url, body, {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    });
  } catch (err) {
    throw new Error(
      `Failed to reach the SentryAgent.ai OIDC token endpoint at ${url}. ` +
        'Check that the api-url input is correct and the API is reachable.\n' +
        `Underlying error: ${err instanceof Error ? err.message : String(err)}`,
    );
  }

  const rawBody = await response.readBody();
  const statusCode = response.message.statusCode ?? 0;

  if (statusCode === 403) {
    throw new Error(
      'GitHub OIDC token exchange was rejected with HTTP 403 (Forbidden). ' +
        'This usually means no trust policy has been registered for this repository.\n\n' +
        'To fix this, register a trust policy by calling:\n' +
        `  POST ${apiUrl}/api/v1/oidc/trust-policies\n` +
        '  Body: { "provider": "github", "repository": "org/repo", "agentId": "<agent-id>" }\n\n' +
        'For full setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  if (statusCode < 200 || statusCode >= 300) {
    let detail = rawBody;
    try {
      const parsed = JSON.parse(rawBody);
      detail = parsed.message ?? parsed.error_description ?? rawBody;
    } catch {
      // use rawBody as-is
    }
    throw new Error(
      `OIDC token exchange failed with HTTP ${statusCode}: ${detail}\n` +
        'For trust policy setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  let tokenData;
  try {
    tokenData = JSON.parse(rawBody);
  } catch {
    throw new Error(`OIDC token exchange returned non-JSON response: ${rawBody}`);
  }

  if (typeof tokenData.access_token !== 'string' || tokenData.access_token.length === 0) {
    throw new Error('OIDC token exchange response did not include an access_token.');
  }

  return tokenData.access_token;
}

/**
 * Registers a new agent via POST /agents.
 *
 * @param {string} apiUrl - Base URL of the SentryAgent.ai AgentIdP API.
 * @param {string} accessToken - A valid SentryAgent.ai Bearer access token.
 * @param {string} agentName - Email (unique name) for the new agent.
 * @param {string} agentDescription - Optional description stored as the owner field.
 * @returns {Promise<string>} The UUID of the newly registered agent.
 * @throws {Error} If the API returns a non-2xx response.
 */
async function registerAgent(apiUrl, accessToken, agentName, agentDescription) {
  const auth = new BearerCredentialHandler(accessToken);
  const client = new HttpClient('sentryagent-register-agent/1.0', [auth]);
  const url = `${apiUrl}/api/v1/agents`;

  const payload = {
    email: agentName,
    agentType: 'custom',
    version: '1.0.0',
    capabilities: [],
    owner: agentDescription || agentName,
    deploymentEnv: 'production',
  };

  let response;
  try {
    response = await client.post(url, JSON.stringify(payload), {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    });
  } catch (err) {
    throw new Error(
      `Failed to reach the SentryAgent.ai agents endpoint at ${url}.\n` +
        `Underlying error: ${err instanceof Error ? err.message : String(err)}`,
    );
  }

  const rawBody = await response.readBody();
  const statusCode = response.message.statusCode ?? 0;

  if (statusCode < 200 || statusCode >= 300) {
    let detail = rawBody;
    try {
      const parsed = JSON.parse(rawBody);
      detail = parsed.message ?? parsed.error ?? rawBody;
    } catch {
      // use rawBody as-is
    }
    throw new Error(`Agent registration failed with HTTP ${statusCode}: ${detail}`);
  }

  let agentData;
  try {
    agentData = JSON.parse(rawBody);
  } catch {
    throw new Error(`Agent registration returned non-JSON response: ${rawBody}`);
  }

  if (typeof agentData.agentId !== 'string' || agentData.agentId.length === 0) {
    throw new Error('Agent registration response did not include an agentId.');
  }

  return agentData.agentId;
}

/**
 * Main entry point for the register-agent GitHub Action.
 *
 * @returns {Promise<void>}
 */
async function run() {
  try {
    // Read inputs
    const apiUrl = core.getInput('api-url', { required: true }).replace(/\/$/, '');
    const agentName = core.getInput('agent-name', { required: true });
    const agentDescription = core.getInput('agent-description') || '';

    core.info(`Requesting GitHub OIDC token for audience: ${apiUrl}`);
    let oidcToken;
    try {
      oidcToken = await core.getIDToken(apiUrl);
    } catch (err) {
      throw new Error(
        'Failed to obtain a GitHub OIDC token. ' +
          "Ensure the workflow has 'id-token: write' permission in its permissions block.\n\n" +
          'Example:\n' +
          'permissions:\n' +
          '  id-token: write\n' +
          '  contents: read\n\n' +
          `Underlying error: ${err instanceof Error ? err.message : String(err)}\n` +
          'For setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
      );
    }

    core.info('Exchanging GitHub OIDC token for SentryAgent.ai access token...');
    const accessToken = await exchangeOIDCToken(apiUrl, oidcToken);

    core.info(`Registering agent: ${agentName}`);
    const agentId = await registerAgent(apiUrl, accessToken, agentName, agentDescription);

    core.setOutput('agent-id', agentId);
    core.info(`Agent registered successfully. agent-id: ${agentId}`);
  } catch (err) {
    core.setFailed(err instanceof Error ? err.message : String(err));
  }
}

run();
.github/actions/register-agent/action.yml (vendored, new file, 39 lines)
@@ -0,0 +1,39 @@
name: 'SentryAgent Register Agent'
description: >
  Registers a new agent in SentryAgent.ai using GitHub OIDC token exchange.
  No long-lived API credentials required — the GitHub Actions OIDC token is
  exchanged for a short-lived SentryAgent.ai access token to call POST /agents.

author: 'SentryAgent.ai'

branding:
  icon: 'shield'
  color: 'blue'

inputs:
  api-url:
    description: >
      Base URL of the SentryAgent.ai AgentIdP API.
      Example: https://idp.sentryagent.ai
    required: true
  agent-name:
    description: >
      Unique name (email) for the agent being registered.
      Must be a valid email address format used as the agent identity.
    required: true
  agent-description:
    description: >
      Optional human-readable description of the agent's purpose.
      Stored as the agent owner field.
    required: false
    default: ''

outputs:
  agent-id:
    description: >
      The UUID of the newly registered agent.
      Use in subsequent steps to issue tokens or manage credentials.

runs:
  using: 'node20'
  main: 'action.js'
.gitignore (vendored, +4 lines)
@@ -5,3 +5,7 @@ coverage/
.env.*
*.log
.DS_Store

# Next.js build output
portal/.next/
portal/node_modules/
cli/README.md (new file, 348 lines)
@@ -0,0 +1,348 @@
# sentryagent CLI

The official command-line interface for [SentryAgent.ai](https://sentryagent.ai) — manage agents, issue OAuth2 tokens, rotate credentials, and stream audit logs from your terminal.

---

## Installation

### From npm (once published)

```bash
npm install -g sentryagent
```

### From source

```bash
cd cli/
npm install
npm run build
npm install -g .
```

---

## Configuration

Before using any command, configure the CLI with your API endpoint and credentials:

```bash
sentryagent configure
```

You will be prompted for:

| Field | Description |
|---------------|--------------------------------------------------|
| API URL | The SentryAgent.ai API base URL (e.g. `https://api.sentryagent.ai`) |
| Client ID | Your tenant client ID |
| Client Secret | Your tenant client secret |

Configuration is stored at `~/.sentryagent/config.json` with permissions `0600`.

If any command is run before `sentryagent configure` has been called, the CLI exits with:

```
Not configured. Run `sentryagent configure` first.
```

---

## Commands

### `sentryagent --version` / `-v`

Output the installed CLI version.

```bash
sentryagent --version
# 1.0.0
```

### `sentryagent --help` / `-h`

Show all available commands and global options.

```bash
sentryagent --help
```

---

### `sentryagent configure`

Interactively configure the CLI.

```bash
sentryagent configure
```

**Prompts:**

```
SentryAgent CLI Configuration
────────────────────────────────────────
API URL (e.g. https://api.sentryagent.ai): https://api.sentryagent.ai
Client ID: tenant_01ABC...
Client Secret: ****

✓ Configuration saved to ~/.sentryagent/config.json
```

---

### `sentryagent register-agent`

Register a new agent with the identity provider.

```bash
sentryagent register-agent --name <name> [--description <desc>]
```

**Options:**

| Flag | Required | Description |
|-------------------|----------|---------------------|
| `--name <name>` | Yes | Agent display name |
| `--description` | No | Agent description |

**Example:**

```bash
sentryagent register-agent --name "billing-agent" --description "Handles billing workflows"
```

**Output:**

```
✓ Agent registered successfully

Agent ID:    01ARZ3NDEKTSV4RRFFQ69G5FAV
Name:        billing-agent
Description: Handles billing workflows
Status:      active
```

---

### `sentryagent list-agents`

List all agents registered for your tenant, displayed as a formatted table.

```bash
sentryagent list-agents
```

**Output:**

```
AGENT ID                    NAME           STATUS  CREATED AT
────────────────────────────────────────────────────────────────────────────
01ARZ3NDEKTSV4RRFFQ69G5FAV  billing-agent  active  4/2/2026, 9:00:00 AM
01ARZ3NDEKTSV4RRFFQ69G5FAX  auth-agent     active  4/1/2026, 3:00:00 PM
────────────────────────────────────────────────────────────────────────────
Total: 2
```
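A fixed-width table like the one above can be produced with `String.prototype.padEnd`. A rough sketch (the helper name and column widths are illustrative assumptions, not the CLI's internals):

```javascript
// Hypothetical row formatter for a list-agents style table.
// Column widths are illustrative, not the CLI's actual values.
function formatRow(id, name, status, createdAt) {
  return id.padEnd(28) + name.padEnd(15) + status.padEnd(8) + createdAt;
}

console.log(formatRow('01ARZ3NDEKTSV4RRFFQ69G5FAV', 'billing-agent', 'active', '4/2/2026, 9:00:00 AM'));
```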
---

### `sentryagent issue-token`

Issue an OAuth2 `client_credentials` access token for a specific agent.

```bash
sentryagent issue-token --agent-id <id>
```

**Options:**

| Flag | Required | Description |
|--------------------|----------|-------------------------|
| `--agent-id <id>` | Yes | Target agent ID |

**Example:**

```bash
sentryagent issue-token --agent-id 01ARZ3NDEKTSV4RRFFQ69G5FAV
```

**Output:**

```
✓ Token issued successfully

Access Token:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

Token Type: Bearer
Expires In: 3600s
Expires At: 2026-04-02T10:00:00.000Z
```

---

### `sentryagent rotate-credentials`

Rotate the client secret for an agent. Prompts for confirmation before proceeding.

```bash
sentryagent rotate-credentials --agent-id <id>
```

**Options:**

| Flag | Required | Description |
|--------------------|----------|-------------------------|
| `--agent-id <id>` | Yes | Target agent ID |

**Example:**

```bash
sentryagent rotate-credentials --agent-id 01ARZ3NDEKTSV4RRFFQ69G5FAV
```

**Output:**

```
⚠ This will invalidate the current secret for agent 01ARZ3NDEKTSV4RRFFQ69G5FAV
This will invalidate the current secret. Continue? [y/N] y

✓ Credentials rotated successfully

Client ID:     01ARZ3NDEKTSV4RRFFQ69G5FAV
Client Secret: cs_new_secret_value_here

Store the new client secret securely — it will not be shown again.
```

---

### `sentryagent tail-audit-log`

Poll the audit log API every 5 seconds and stream new events to stdout. Press **Ctrl+C** to stop.

```bash
sentryagent tail-audit-log [--agent-id <id>]
```

**Options:**

| Flag | Required | Description |
|--------------------|----------|------------------------------------|
| `--agent-id <id>` | No | Filter events for a specific agent |

**Example (all events):**

```bash
sentryagent tail-audit-log
```

**Example (filtered by agent):**

```bash
sentryagent tail-audit-log --agent-id 01ARZ3NDEKTSV4RRFFQ69G5FAV
```

**Output:**

```
Tailing audit log — press Ctrl+C to stop
────────────────────────────────────────────────────────────
4/2/2026, 9:05:00 AM  agent.token.issued  outcome=success  agent=01ARZ3NDEKTSV...  id=evt_01...
4/2/2026, 9:10:03 AM  agent.registered    outcome=success  id=evt_02...
^C

Stopped.
```
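The tail loop boils down to polling plus de-duplication by event id, so events that appear in consecutive polls are printed only once. A minimal sketch of the de-duplication step (the `seen` set and function name are illustrative assumptions, not the CLI's internals):

```javascript
// Hypothetical de-duplication for a polling tail loop: each poll returns
// a page of events, and only ids not seen before are emitted.
const seen = new Set();

function newEvents(events) {
  const fresh = events.filter((e) => !seen.has(e.id));
  for (const e of fresh) seen.add(e.id);
  return fresh;
}

// First poll emits both events; the second poll repeats one of them,
// so only the genuinely new event comes through.
console.log(newEvents([{ id: 'evt_01' }, { id: 'evt_02' }]).length); // 2
console.log(newEvents([{ id: 'evt_02' }, { id: 'evt_03' }]).length); // 1
```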
|
||||
|
||||
---
|
||||
|
||||
### `sentryagent completion`

Output shell completion scripts.

#### Bash

```bash
sentryagent completion bash
```

To enable permanently, add to `~/.bashrc` or `~/.bash_profile`:

```bash
source <(sentryagent completion bash)
```

Or write to a file:

```bash
sentryagent completion bash > ~/.bash_completion.d/sentryagent
```

#### Zsh

```bash
sentryagent completion zsh
```

To enable permanently, add to `~/.zshrc`:

```bash
source <(sentryagent completion zsh)
```

Or write to a file in your `$fpath`:

```bash
sentryagent completion zsh > ~/.zsh/completions/_sentryagent
```

---
## Shell Completion Setup

### Bash (one-time setup)

```bash
mkdir -p ~/.bash_completion.d
sentryagent completion bash > ~/.bash_completion.d/sentryagent
echo 'source ~/.bash_completion.d/sentryagent' >> ~/.bashrc
source ~/.bashrc
```

### Zsh (one-time setup)

```bash
mkdir -p ~/.zsh/completions
sentryagent completion zsh > ~/.zsh/completions/_sentryagent
echo 'fpath=(~/.zsh/completions $fpath)' >> ~/.zshrc
echo 'autoload -Uz compinit && compinit' >> ~/.zshrc
source ~/.zshrc
```

After setup, pressing **Tab** after `sentryagent` will autocomplete commands and flags.

---
## Configuration File

The config file is stored at `~/.sentryagent/config.json`:

```json
{
  "apiUrl": "https://api.sentryagent.ai",
  "clientId": "tenant_01ABC...",
  "clientSecret": "cs_secret_value"
}
```

The directory is created with mode `0700` and the file with mode `0600` to prevent other users from reading your credentials.
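A minimal sketch of how those permissions can be applied at write time (a hypothetical helper — the CLI's actual `writeConfig` lives in `cli/src/config.ts`, which is not part of this excerpt):

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical sketch: create the directory owner-only (0700) and the
// config file owner-read/write (0600), matching the documented modes.
// Note: the mode passed to writeFileSync only applies when the file is
// created, and both modes are subject to the process umask.
function writeConfigSketch(dir: string, config: Record<string, string>): string {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  const file = path.join(dir, 'config.json');
  fs.writeFileSync(file, JSON.stringify(config, null, 2) + '\n', { mode: 0o600 });
  return file;
}
```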
---
## Environment

- Node.js >= 18.0.0 is required (uses the built-in `fetch` API)
- All HTTP requests use OAuth2 `client_credentials` tokens fetched automatically from your configuration
- Tokens are cached in memory for the duration of the CLI session (refreshed 30 seconds before expiry)
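The 30-second refresh skew in the last bullet boils down to a strict freshness check, as in `fetchToken` in `cli/src/api.ts`: a cached token is reused only if it will still be valid 30 seconds from now.

```typescript
// Freshness rule behind the token cache: reuse only when the token
// outlives "now + skew"; at or past that boundary, fetch a new one.
const REFRESH_SKEW_MS = 30_000;

function isTokenFresh(expiresAt: number, now: number): boolean {
  return expiresAt > now + REFRESH_SKEW_MS;
}
```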
267
cli/package-lock.json
generated
Normal file
@@ -0,0 +1,267 @@
{
  "name": "sentryagent",
  "version": "1.0.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "sentryagent",
      "version": "1.0.0",
      "license": "MIT",
      "dependencies": {
        "chalk": "^5.3.0",
        "commander": "^12.1.0"
      },
      "bin": {
        "sentryagent": "dist/index.js"
      },
      "devDependencies": {
        "@types/node": "^20.12.7",
        "ts-node": "^10.9.2",
        "typescript": "^5.4.5"
      },
      "engines": {
        "node": ">=18.0.0"
      }
    },
    "node_modules/@cspotcode/source-map-support": {
      "version": "0.8.1",
      "resolved": "https://registry.npmjs.org/@cspotcode/source-map-support/-/source-map-support-0.8.1.tgz",
      "integrity": "sha512-IchNf6dN4tHoMFIn/7OE8LWZ19Y6q/67Bmf6vnGREv8RSbBVb9LPJxEcnwrcwX6ixSvaiGoomAUvu4YSxXrVgw==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "@jridgewell/trace-mapping": "0.3.9"
      },
      "engines": {
        "node": ">=12"
      }
    },
    "node_modules/@jridgewell/resolve-uri": {
      "version": "3.1.2",
      "resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz",
      "integrity": "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==",
      "dev": true,
      "license": "MIT",
      "engines": {
        "node": ">=6.0.0"
      }
    },
    "node_modules/@jridgewell/sourcemap-codec": {
      "version": "1.5.5",
      "resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz",
      "integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/@jridgewell/trace-mapping": {
      "version": "0.3.9",
      "resolved": "https://registry.npmjs.org/@jridgewell/trace-mapping/-/trace-mapping-0.3.9.tgz",
      "integrity": "sha512-3Belt6tdc8bPgAtbcmdtNJlirVoTmEb5e2gC94PnkwEW9jI6CAHUeoG85tjWP5WquqfavoMtMwiG4P926ZKKuQ==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "@jridgewell/resolve-uri": "^3.0.3",
        "@jridgewell/sourcemap-codec": "^1.4.10"
      }
    },
    "node_modules/@tsconfig/node10": {
      "version": "1.0.12",
      "resolved": "https://registry.npmjs.org/@tsconfig/node10/-/node10-1.0.12.tgz",
      "integrity": "sha512-UCYBaeFvM11aU2y3YPZ//O5Rhj+xKyzy7mvcIoAjASbigy8mHMryP5cK7dgjlz2hWxh1g5pLw084E0a/wlUSFQ==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/@tsconfig/node12": {
      "version": "1.0.11",
      "resolved": "https://registry.npmjs.org/@tsconfig/node12/-/node12-1.0.11.tgz",
      "integrity": "sha512-cqefuRsh12pWyGsIoBKJA9luFu3mRxCA+ORZvA4ktLSzIuCUtWVxGIuXigEwO5/ywWFMZ2QEGKWvkZG1zDMTag==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/@tsconfig/node14": {
      "version": "1.0.3",
      "resolved": "https://registry.npmjs.org/@tsconfig/node14/-/node14-1.0.3.tgz",
      "integrity": "sha512-ysT8mhdixWK6Hw3i1V2AeRqZ5WfXg1G43mqoYlM2nc6388Fq5jcXyr5mRsqViLx/GJYdoL0bfXD8nmF+Zn/Iow==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/@tsconfig/node16": {
      "version": "1.0.4",
      "resolved": "https://registry.npmjs.org/@tsconfig/node16/-/node16-1.0.4.tgz",
      "integrity": "sha512-vxhUy4J8lyeyinH7Azl1pdd43GJhZH/tP2weN8TntQblOY+A0XbT8DJk1/oCPuOOyg/Ja757rG0CgHcWC8OfMA==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/@types/node": {
      "version": "20.19.37",
      "resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.37.tgz",
      "integrity": "sha512-8kzdPJ3FsNsVIurqBs7oodNnCEVbni9yUEkaHbgptDACOPW04jimGagZ51E6+lXUwJjgnBw+hyko/lkFWCldqw==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "undici-types": "~6.21.0"
      }
    },
    "node_modules/acorn": {
      "version": "8.16.0",
      "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.16.0.tgz",
      "integrity": "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw==",
      "dev": true,
      "license": "MIT",
      "bin": {
        "acorn": "bin/acorn"
      },
      "engines": {
        "node": ">=0.4.0"
      }
    },
    "node_modules/acorn-walk": {
      "version": "8.3.5",
      "resolved": "https://registry.npmjs.org/acorn-walk/-/acorn-walk-8.3.5.tgz",
      "integrity": "sha512-HEHNfbars9v4pgpW6SO1KSPkfoS0xVOM/9UzkJltjlsHZmJasxg8aXkuZa7SMf8vKGIBhpUsPluQSqhJFCqebw==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "acorn": "^8.11.0"
      },
      "engines": {
        "node": ">=0.4.0"
      }
    },
    "node_modules/arg": {
      "version": "4.1.3",
      "resolved": "https://registry.npmjs.org/arg/-/arg-4.1.3.tgz",
      "integrity": "sha512-58S9QDqG0Xx27YwPSt9fJxivjYl432YCwfDMfZ+71RAqUrZef7LrKQZ3LHLOwCS4FLNBplP533Zx895SeOCHvA==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/chalk": {
      "version": "5.6.2",
      "resolved": "https://registry.npmjs.org/chalk/-/chalk-5.6.2.tgz",
      "integrity": "sha512-7NzBL0rN6fMUW+f7A6Io4h40qQlG+xGmtMxfbnH/K7TAtt8JQWVQK+6g0UXKMeVJoyV5EkkNsErQ8pVD3bLHbA==",
      "license": "MIT",
      "engines": {
        "node": "^12.17.0 || ^14.13 || >=16.0.0"
      },
      "funding": {
        "url": "https://github.com/chalk/chalk?sponsor=1"
      }
    },
    "node_modules/commander": {
      "version": "12.1.0",
      "resolved": "https://registry.npmjs.org/commander/-/commander-12.1.0.tgz",
      "integrity": "sha512-Vw8qHK3bZM9y/P10u3Vib8o/DdkvA2OtPtZvD871QKjy74Wj1WSKFILMPRPSdUSx5RFK1arlJzEtA4PkFgnbuA==",
      "license": "MIT",
      "engines": {
        "node": ">=18"
      }
    },
    "node_modules/create-require": {
      "version": "1.1.1",
      "resolved": "https://registry.npmjs.org/create-require/-/create-require-1.1.1.tgz",
      "integrity": "sha512-dcKFX3jn0MpIaXjisoRvexIJVEKzaq7z2rZKxf+MSr9TkdmHmsU4m2lcLojrj/FHl8mk5VxMmYA+ftRkP/3oKQ==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/diff": {
      "version": "4.0.4",
      "resolved": "https://registry.npmjs.org/diff/-/diff-4.0.4.tgz",
      "integrity": "sha512-X07nttJQkwkfKfvTPG/KSnE2OMdcUCao6+eXF3wmnIQRn2aPAHH3VxDbDOdegkd6JbPsXqShpvEOHfAT+nCNwQ==",
      "dev": true,
      "license": "BSD-3-Clause",
      "engines": {
        "node": ">=0.3.1"
      }
    },
    "node_modules/make-error": {
      "version": "1.3.6",
      "resolved": "https://registry.npmjs.org/make-error/-/make-error-1.3.6.tgz",
      "integrity": "sha512-s8UhlNe7vPKomQhC1qFelMokr/Sc3AgNbso3n74mVPA5LTZwkB9NlXf4XPamLxJE8h0gh73rM94xvwRT2CVInw==",
      "dev": true,
      "license": "ISC"
    },
    "node_modules/ts-node": {
      "version": "10.9.2",
      "resolved": "https://registry.npmjs.org/ts-node/-/ts-node-10.9.2.tgz",
      "integrity": "sha512-f0FFpIdcHgn8zcPSbf1dRevwt047YMnaiJM3u2w2RewrB+fob/zePZcrOyQoLMMO7aBIddLcQIEK5dYjkLnGrQ==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "@cspotcode/source-map-support": "^0.8.0",
        "@tsconfig/node10": "^1.0.7",
        "@tsconfig/node12": "^1.0.7",
        "@tsconfig/node14": "^1.0.0",
        "@tsconfig/node16": "^1.0.2",
        "acorn": "^8.4.1",
        "acorn-walk": "^8.1.1",
        "arg": "^4.1.0",
        "create-require": "^1.1.0",
        "diff": "^4.0.1",
        "make-error": "^1.1.1",
        "v8-compile-cache-lib": "^3.0.1",
        "yn": "3.1.1"
      },
      "bin": {
        "ts-node": "dist/bin.js",
        "ts-node-cwd": "dist/bin-cwd.js",
        "ts-node-esm": "dist/bin-esm.js",
        "ts-node-script": "dist/bin-script.js",
        "ts-node-transpile-only": "dist/bin-transpile.js",
        "ts-script": "dist/bin-script-deprecated.js"
      },
      "peerDependencies": {
        "@swc/core": ">=1.2.50",
        "@swc/wasm": ">=1.2.50",
        "@types/node": "*",
        "typescript": ">=2.7"
      },
      "peerDependenciesMeta": {
        "@swc/core": {
          "optional": true
        },
        "@swc/wasm": {
          "optional": true
        }
      }
    },
    "node_modules/typescript": {
      "version": "5.9.3",
      "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz",
      "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
      "dev": true,
      "license": "Apache-2.0",
      "bin": {
        "tsc": "bin/tsc",
        "tsserver": "bin/tsserver"
      },
      "engines": {
        "node": ">=14.17"
      }
    },
    "node_modules/undici-types": {
      "version": "6.21.0",
      "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
      "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/v8-compile-cache-lib": {
      "version": "3.0.1",
      "resolved": "https://registry.npmjs.org/v8-compile-cache-lib/-/v8-compile-cache-lib-3.0.1.tgz",
      "integrity": "sha512-wa7YjyUGfNZngI/vtK0UHAN+lgDCxBPCylVXGp0zu59Fz5aiGtNXaq3DhIov063MorB+VfufLh3JlF2KdTK3xg==",
      "dev": true,
      "license": "MIT"
    },
    "node_modules/yn": {
      "version": "3.1.1",
      "resolved": "https://registry.npmjs.org/yn/-/yn-3.1.1.tgz",
      "integrity": "sha512-Ux4ygGWsu2c7isFWe8Yu1YluJmqVhxqK2cLXNQA5AcC3QfbGNpM7fu0Y8b/z16pXLnFxZYvWhd3fhBY9DLmC6Q==",
      "dev": true,
      "license": "MIT",
      "engines": {
        "node": ">=6"
      }
    }
  }
}
34
cli/package.json
Normal file
@@ -0,0 +1,34 @@
{
  "name": "sentryagent",
  "version": "1.0.0",
  "description": "SentryAgent.ai CLI — manage agents, tokens, and audit logs",
  "main": "dist/index.js",
  "bin": {
    "sentryagent": "./dist/index.js"
  },
  "scripts": {
    "build": "tsc",
    "dev": "ts-node src/index.ts",
    "clean": "rm -rf dist"
  },
  "dependencies": {
    "chalk": "^5.3.0",
    "commander": "^12.1.0"
  },
  "devDependencies": {
    "@types/node": "^20.12.7",
    "typescript": "^5.4.5",
    "ts-node": "^10.9.2"
  },
  "engines": {
    "node": ">=18.0.0"
  },
  "keywords": [
    "sentryagent",
    "agentidp",
    "cli",
    "agents",
    "identity"
  ],
  "license": "MIT"
}
95
cli/src/api.ts
Normal file
@@ -0,0 +1,95 @@
import { Config } from './config';

interface TokenCache {
  accessToken: string;
  expiresAt: number;
}

let tokenCache: TokenCache | null = null;

interface TokenResponse {
  access_token: string;
  expires_in: number;
  token_type: string;
}

async function fetchToken(config: Config): Promise<string> {
  const now = Date.now();
  if (tokenCache !== null && tokenCache.expiresAt > now + 30_000) {
    return tokenCache.accessToken;
  }

  const body = new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: config.clientId,
    client_secret: config.clientSecret,
  });

  const res = await fetch(`${config.apiUrl}/oauth2/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
  });

  if (!res.ok) {
    const text = await res.text();
    throw new Error(`Authentication failed (${res.status}): ${text}`);
  }

  const data = (await res.json()) as TokenResponse;
  tokenCache = {
    accessToken: data.access_token,
    expiresAt: now + data.expires_in * 1000,
  };
  return tokenCache.accessToken;
}

export function clearTokenCache(): void {
  tokenCache = null;
}

type HttpMethod = 'GET' | 'POST' | 'PUT' | 'PATCH' | 'DELETE';

interface ApiRequestOptions {
  method?: HttpMethod;
  body?: unknown;
  params?: Record<string, string>;
}

export async function apiRequest<T>(
  config: Config,
  endpoint: string,
  options: ApiRequestOptions = {},
): Promise<T> {
  const token = await fetchToken(config);
  const { method = 'GET', body, params } = options;

  let url = `${config.apiUrl}${endpoint}`;
  if (params !== undefined && Object.keys(params).length > 0) {
    const qs = new URLSearchParams(params);
    url = `${url}?${qs.toString()}`;
  }

  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };

  const fetchOptions: RequestInit = { method, headers };
  if (body !== undefined) {
    fetchOptions.body = JSON.stringify(body);
  }

  const res = await fetch(url, fetchOptions);

  if (!res.ok) {
    const text = await res.text();
    throw new Error(`API error (${res.status}): ${text}`);
  }

  if (res.status === 204) {
    return undefined as unknown as T;
  }

  return (await res.json()) as T;
}
155
cli/src/commands/completion.ts
Normal file
@@ -0,0 +1,155 @@
import { Command } from 'commander';

const BASH_COMPLETION = `
# sentryagent bash completion
# Add to ~/.bashrc or ~/.bash_profile:
# source <(sentryagent completion bash)

_sentryagent_completion() {
  local cur prev words cword
  _init_completion || return

  local commands="configure register-agent list-agents issue-token rotate-credentials tail-audit-log completion"
  local global_opts="--help --version"

  case "\${prev}" in
    sentryagent)
      COMPREPLY=( \$(compgen -W "\${commands} \${global_opts}" -- "\${cur}") )
      return 0
      ;;
    configure)
      COMPREPLY=( \$(compgen -W "--help" -- "\${cur}") )
      return 0
      ;;
    register-agent)
      COMPREPLY=( \$(compgen -W "--name --description --help" -- "\${cur}") )
      return 0
      ;;
    list-agents)
      COMPREPLY=( \$(compgen -W "--help" -- "\${cur}") )
      return 0
      ;;
    issue-token)
      COMPREPLY=( \$(compgen -W "--agent-id --help" -- "\${cur}") )
      return 0
      ;;
    rotate-credentials)
      COMPREPLY=( \$(compgen -W "--agent-id --help" -- "\${cur}") )
      return 0
      ;;
    tail-audit-log)
      COMPREPLY=( \$(compgen -W "--agent-id --help" -- "\${cur}") )
      return 0
      ;;
    completion)
      COMPREPLY=( \$(compgen -W "bash zsh --help" -- "\${cur}") )
      return 0
      ;;
    *)
      COMPREPLY=()
      return 0
      ;;
  esac
}

complete -F _sentryagent_completion sentryagent
`.trim();

const ZSH_COMPLETION = `
#compdef sentryagent

# sentryagent zsh completion
# Add to ~/.zshrc:
# source <(sentryagent completion zsh)
# Or generate a file and place it in your $fpath:
# sentryagent completion zsh > ~/.zsh/completions/_sentryagent

_sentryagent() {
  local state

  _arguments \\
    '(-v --version)'{-v,--version}'[Show version]' \\
    '(-h --help)'{-h,--help}'[Show help]' \\
    '1: :->command' \\
    '*: :->args'

  case \$state in
    command)
      local commands=(
        'configure:Configure CLI with API URL and credentials'
        'register-agent:Register a new agent'
        'list-agents:List all registered agents'
        'issue-token:Issue an OAuth2 access token for an agent'
        'rotate-credentials:Rotate credentials for an agent'
        'tail-audit-log:Poll and stream audit log events'
        'completion:Output shell completion script'
      )
      _describe 'command' commands
      ;;
    args)
      case \${words[2]} in
        configure)
          _arguments \\
            '(-h --help)'{-h,--help}'[Show help]'
          ;;
        register-agent)
          _arguments \\
            '--name[Agent name]:name' \\
            '--description[Agent description]:description' \\
            '(-h --help)'{-h,--help}'[Show help]'
          ;;
        list-agents)
          _arguments \\
            '(-h --help)'{-h,--help}'[Show help]'
          ;;
        issue-token)
          _arguments \\
            '--agent-id[Agent ID]:agent-id' \\
            '(-h --help)'{-h,--help}'[Show help]'
          ;;
        rotate-credentials)
          _arguments \\
            '--agent-id[Agent ID]:agent-id' \\
            '(-h --help)'{-h,--help}'[Show help]'
          ;;
        tail-audit-log)
          _arguments \\
            '--agent-id[Filter by agent ID]:agent-id' \\
            '(-h --help)'{-h,--help}'[Show help]'
          ;;
        completion)
          local shells=('bash:Generate bash completion script' 'zsh:Generate zsh completion script')
          _describe 'shell' shells
          ;;
      esac
      ;;
  esac
}

_sentryagent "\$@"
`.trim();

export function registerCompletion(program: Command): void {
  const completion = program
    .command('completion')
    .description('Output shell completion scripts');

  completion
    .command('bash')
    .description('Output bash completion script')
    .action(() => {
      console.log(BASH_COMPLETION);
    });

  completion
    .command('zsh')
    .description('Output zsh completion script')
    .action(() => {
      console.log(ZSH_COMPLETION);
    });

  completion.addHelpText(
    'after',
    '\nSupported shells: bash, zsh',
  );
}
63
cli/src/commands/configure.ts
Normal file
@@ -0,0 +1,63 @@
import * as readline from 'readline';
import { Command } from 'commander';
import chalk from 'chalk';
import { writeConfig } from '../config';

function prompt(rl: readline.Interface, question: string): Promise<string> {
  return new Promise((resolve) => {
    rl.question(question, (answer) => {
      resolve(answer.trim());
    });
  });
}

export function registerConfigure(program: Command): void {
  program
    .command('configure')
    .description('Configure the CLI with API URL and credentials')
    .action(async () => {
      const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout,
      });

      try {
        console.log(chalk.bold('SentryAgent CLI Configuration'));
        console.log(chalk.dim('─'.repeat(40)));

        const apiUrl = await prompt(
          rl,
          chalk.cyan('API URL') + ' (e.g. https://api.sentryagent.ai): ',
        );
        if (apiUrl === '') {
          console.error(chalk.red('API URL cannot be empty.'));
          process.exit(1);
        }

        const clientId = await prompt(rl, chalk.cyan('Client ID') + ': ');
        if (clientId === '') {
          console.error(chalk.red('Client ID cannot be empty.'));
          process.exit(1);
        }

        const clientSecret = await prompt(
          rl,
          chalk.cyan('Client Secret') + ': ',
        );
        if (clientSecret === '') {
          console.error(chalk.red('Client Secret cannot be empty.'));
          process.exit(1);
        }

        writeConfig({ apiUrl, clientId, clientSecret });

        console.log();
        console.log(
          chalk.green('✓') +
            ' Configuration saved to ~/.sentryagent/config.json',
        );
      } finally {
        rl.close();
      }
    });
}
70
cli/src/commands/issue-token.ts
Normal file
@@ -0,0 +1,70 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';

interface TokenResponse {
  access_token: string;
  expires_in: number;
  token_type: string;
  scope?: string;
}

export function registerIssueToken(program: Command): void {
  program
    .command('issue-token')
    .description('Issue an OAuth2 access token for an agent')
    .requiredOption('--agent-id <id>', 'Agent ID to issue a token for')
    .action(async (options: { agentId: string }) => {
      const config = requireConfig();

      try {
        const body = new URLSearchParams({
          grant_type: 'client_credentials',
          client_id: config.clientId,
          client_secret: config.clientSecret,
          agent_id: options.agentId,
        });

        const res = await fetch(`${config.apiUrl}/oauth2/token`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
          body: body.toString(),
        });

        if (!res.ok) {
          const text = await res.text();
          throw new Error(`Token issuance failed (${res.status}): ${text}`);
        }

        const data = (await res.json()) as TokenResponse;
        const expiresAt = new Date(
          Date.now() + data.expires_in * 1000,
        ).toISOString();

        console.log(chalk.green('✓') + ' Token issued successfully');
        console.log();
        console.log(chalk.bold('Access Token:'));
        console.log(chalk.cyan(data.access_token));
        console.log();
        console.log(chalk.bold('Token Type: ') + data.token_type);
        console.log(chalk.bold('Expires In: ') + `${data.expires_in}s`);
        console.log(chalk.bold('Expires At: ') + chalk.dim(expiresAt));
        if (data.scope !== undefined) {
          console.log(chalk.bold('Scope: ') + data.scope);
        }
      } catch (err) {
        console.error(
          chalk.red('Error:'),
          err instanceof Error ? err.message : String(err),
        );
        process.exit(1);
      }
    });
}
105
cli/src/commands/list-agents.ts
Normal file
@@ -0,0 +1,105 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';

interface Agent {
  id: string;
  name: string;
  status: string;
  createdAt: string;
  description?: string;
}

interface AgentsResponse {
  agents: Agent[];
  total?: number;
}

function truncate(str: string, maxLen: number): string {
  if (str.length <= maxLen) return str;
  return str.slice(0, maxLen - 1) + '…';
}

function padEnd(str: string, len: number): string {
  return str.padEnd(len, ' ');
}

export function registerListAgents(program: Command): void {
  program
    .command('list-agents')
    .description('List all registered agents')
    .action(async () => {
      const config = requireConfig();

      try {
        const data = await apiRequest<AgentsResponse | Agent[]>(
          config,
          '/agents',
        );

        const agents: Agent[] = Array.isArray(data)
          ? data
          : (data as AgentsResponse).agents ?? [];

        if (agents.length === 0) {
          console.log(chalk.yellow('No agents found.'));
          return;
        }

        const ID_W = 26;
        const NAME_W = 24;
        const STATUS_W = 10;
        const DATE_W = 20;

        const header =
          chalk.bold(padEnd('AGENT ID', ID_W)) +
          '  ' +
          chalk.bold(padEnd('NAME', NAME_W)) +
          '  ' +
          chalk.bold(padEnd('STATUS', STATUS_W)) +
          '  ' +
          chalk.bold('CREATED AT');

        const divider = chalk.dim(
          '─'.repeat(ID_W + NAME_W + STATUS_W + DATE_W + 6),
        );

        console.log(header);
        console.log(divider);

        for (const agent of agents) {
          const statusColor =
            agent.status === 'active'
              ? chalk.green
              : agent.status === 'inactive'
                ? chalk.yellow
                : chalk.red;

          const createdAt = new Date(agent.createdAt).toLocaleString();

          console.log(
            chalk.cyan(padEnd(truncate(agent.id, ID_W), ID_W)) +
              '  ' +
              padEnd(truncate(agent.name, NAME_W), NAME_W) +
              '  ' +
              statusColor(padEnd(truncate(agent.status, STATUS_W), STATUS_W)) +
              '  ' +
              chalk.dim(truncate(createdAt, DATE_W)),
          );
        }

        console.log(divider);
        const total = Array.isArray(data)
          ? agents.length
          : ((data as AgentsResponse).total ?? agents.length);
        console.log(chalk.dim(`Total: ${total}`));
      } catch (err) {
        console.error(
          chalk.red('Error:'),
          err instanceof Error ? err.message : String(err),
        );
        process.exit(1);
      }
    });
}
54
cli/src/commands/register-agent.ts
Normal file
@@ -0,0 +1,54 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';

interface AgentResponse {
  id: string;
  name: string;
  description?: string;
  status: string;
  createdAt: string;
}

export function registerRegisterAgent(program: Command): void {
  program
    .command('register-agent')
    .description('Register a new agent')
    .requiredOption('--name <name>', 'Agent name')
    .option('--description <desc>', 'Agent description')
    .action(async (options: { name: string; description?: string }) => {
      const config = requireConfig();

      try {
        const body: { name: string; description?: string } = {
          name: options.name,
        };
        if (options.description !== undefined) {
          body.description = options.description;
        }

        const agent = await apiRequest<AgentResponse>(config, '/agents', {
          method: 'POST',
          body,
        });

        console.log(chalk.green('✓') + ' Agent registered successfully');
        console.log();
        console.log(chalk.bold('Agent ID: ') + chalk.cyan(agent.id));
        console.log(chalk.bold('Name: ') + agent.name);
        if (agent.description !== undefined) {
          console.log(chalk.bold('Description:') + ' ' + agent.description);
        }
        console.log(chalk.bold('Status: ') + agent.status);
      } catch (err) {
        console.error(
          chalk.red('Error:'),
          err instanceof Error ? err.message : String(err),
        );
        process.exit(1);
      }
    });
}
85
cli/src/commands/rotate-credentials.ts
Normal file
@@ -0,0 +1,85 @@
import * as readline from 'readline';
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';

interface RotateResponse {
  clientId: string;
  clientSecret: string;
  rotatedAt?: string;
}

function prompt(rl: readline.Interface, question: string): Promise<string> {
  return new Promise((resolve) => {
    rl.question(question, (answer) => {
      resolve(answer.trim());
    });
  });
}

export function registerRotateCredentials(program: Command): void {
  program
    .command('rotate-credentials')
    .description('Rotate credentials for an agent (invalidates current secret)')
    .requiredOption('--agent-id <id>', 'Agent ID whose credentials to rotate')
    .action(async (options: { agentId: string }) => {
      const config = requireConfig();

      const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout,
      });

      try {
        console.log(
          chalk.yellow('⚠') +
            ' This will invalidate the current secret for agent ' +
            chalk.cyan(options.agentId),
        );

        const answer = await prompt(
          rl,
          chalk.bold('This will invalidate the current secret. Continue? [y/N] '),
        );

        if (answer.toLowerCase() !== 'y' && answer.toLowerCase() !== 'yes') {
          console.log(chalk.dim('Aborted.'));
          return;
        }

        const data = await apiRequest<RotateResponse>(
          config,
          `/agents/${options.agentId}/credentials/rotate`,
          { method: 'POST' },
        );

        console.log();
        console.log(chalk.green('✓') + ' Credentials rotated successfully');
        console.log();
        console.log(chalk.bold('Client ID: ') + chalk.cyan(data.clientId));
        console.log(
          chalk.bold('Client Secret: ') + chalk.yellow(data.clientSecret),
        );
        console.log();
        console.log(
          chalk.dim(
            'Store the new client secret securely — it will not be shown again.',
          ),
        );
        if (data.rotatedAt !== undefined) {
          console.log(chalk.dim('Rotated at: ') + chalk.dim(data.rotatedAt));
        }
      } catch (err) {
        console.error(
          chalk.red('Error:'),
          err instanceof Error ? err.message : String(err),
        );
        process.exit(1);
      } finally {
        rl.close();
      }
    });
}
122 cli/src/commands/tail-audit-log.ts Normal file
@@ -0,0 +1,122 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';

interface AuditEvent {
  id: string;
  timestamp: string;
  action: string;
  agentId?: string;
  tenantId?: string;
  outcome: string;
  details?: Record<string, unknown>;
}

interface AuditLogsResponse {
  events: AuditEvent[];
  nextCursor?: string;
}

function formatEvent(event: AuditEvent): string {
  const ts = chalk.dim(new Date(event.timestamp).toLocaleString());
  const outcome =
    event.outcome === 'success'
      ? chalk.green(event.outcome)
      : chalk.red(event.outcome);
  const action = chalk.cyan(event.action);
  const agentPart =
    event.agentId !== undefined
      ? ' ' + chalk.dim('agent=' + event.agentId)
      : '';

  return `${ts} ${action} outcome=${outcome}${agentPart} id=${chalk.dim(event.id)}`;
}

export function registerTailAuditLog(program: Command): void {
  program
    .command('tail-audit-log')
    .description(
      'Poll and stream audit log events every 5 seconds (Ctrl+C to stop)',
    )
    .option('--agent-id <id>', 'Filter events for a specific agent ID')
    .action(async (options: { agentId?: string }) => {
      const config = requireConfig();

      console.log(
        chalk.bold('Tailing audit log') +
          (options.agentId !== undefined
            ? chalk.dim(` (agent: ${options.agentId})`)
            : '') +
          chalk.dim(' — press Ctrl+C to stop'),
      );
      console.log(chalk.dim('─'.repeat(60)));

      const seenIds = new Set<string>();
      let cursor: string | undefined;
      let running = true;

      process.on('SIGINT', () => {
        running = false;
        console.log();
        console.log(chalk.dim('Stopped.'));
        process.exit(0);
      });

      while (running) {
        try {
          const params: Record<string, string> = {};
          if (options.agentId !== undefined) {
            params['agentId'] = options.agentId;
          }
          if (cursor !== undefined) {
            params['cursor'] = cursor;
          }
          // Fetch up to 50 events per poll
          params['limit'] = '50';

          const data = await apiRequest<AuditLogsResponse | AuditEvent[]>(
            config,
            '/audit/logs',
            { params },
          );

          // The endpoint may return either a bare array or a paginated object
          const events: AuditEvent[] = Array.isArray(data)
            ? data
            : data.events ?? [];

          if (!Array.isArray(data) && data.nextCursor !== undefined) {
            cursor = data.nextCursor;
          }

          for (const event of events) {
            if (!seenIds.has(event.id)) {
              seenIds.add(event.id);
              console.log(formatEvent(event));
            }
          }

          // Keep the seenIds set bounded to avoid unbounded memory growth
          if (seenIds.size > 10_000) {
            const arr = Array.from(seenIds);
            const keep = arr.slice(arr.length - 5_000);
            seenIds.clear();
            for (const id of keep) seenIds.add(id);
          }
        } catch (err) {
          console.error(
            chalk.yellow('⚠') +
              ' Poll error: ' +
              (err instanceof Error ? err.message : String(err)),
          );
        }

        // Wait 5 seconds between polls; unref the timer so it alone
        // does not keep the process alive during shutdown
        await new Promise<void>((resolve) => {
          const timer = setTimeout(resolve, 5000);
          if (typeof timer.unref === 'function') timer.unref();
        });
      }
    });
}
61 cli/src/config.ts Normal file
@@ -0,0 +1,61 @@
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

export interface Config {
  apiUrl: string;
  clientId: string;
  clientSecret: string;
}

const CONFIG_DIR = path.join(os.homedir(), '.sentryagent');
const CONFIG_FILE = path.join(CONFIG_DIR, 'config.json');

export function readConfig(): Config | null {
  if (!fs.existsSync(CONFIG_FILE)) {
    return null;
  }
  try {
    const raw = fs.readFileSync(CONFIG_FILE, 'utf-8');
    const parsed: unknown = JSON.parse(raw);
    if (
      parsed !== null &&
      typeof parsed === 'object' &&
      'apiUrl' in parsed &&
      'clientId' in parsed &&
      'clientSecret' in parsed &&
      typeof (parsed as Record<string, unknown>)['apiUrl'] === 'string' &&
      typeof (parsed as Record<string, unknown>)['clientId'] === 'string' &&
      typeof (parsed as Record<string, unknown>)['clientSecret'] === 'string'
    ) {
      const p = parsed as Record<string, unknown>;
      return {
        apiUrl: p['apiUrl'] as string,
        clientId: p['clientId'] as string,
        clientSecret: p['clientSecret'] as string,
      };
    }
    return null;
  } catch {
    return null;
  }
}

export function writeConfig(config: Config): void {
  if (!fs.existsSync(CONFIG_DIR)) {
    fs.mkdirSync(CONFIG_DIR, { recursive: true, mode: 0o700 });
  }
  fs.writeFileSync(CONFIG_FILE, JSON.stringify(config, null, 2), {
    encoding: 'utf-8',
    mode: 0o600,
  });
}

export function requireConfig(): Config {
  const config = readConfig();
  if (config === null) {
    console.error('Not configured. Run `sentryagent configure` first.');
    process.exit(1);
  }
  return config;
}
31 cli/src/index.ts Normal file
@@ -0,0 +1,31 @@
#!/usr/bin/env node

import { Command } from 'commander';
import packageJson from '../package.json';

import { registerConfigure } from './commands/configure';
import { registerRegisterAgent } from './commands/register-agent';
import { registerListAgents } from './commands/list-agents';
import { registerIssueToken } from './commands/issue-token';
import { registerRotateCredentials } from './commands/rotate-credentials';
import { registerTailAuditLog } from './commands/tail-audit-log';
import { registerCompletion } from './commands/completion';

const program = new Command();

program
  .name('sentryagent')
  .description('SentryAgent.ai CLI — manage agents, tokens, and audit logs')
  .version(packageJson.version, '-v, --version', 'Output the current version');

// Register all commands
registerConfigure(program);
registerRegisterAgent(program);
registerListAgents(program);
registerIssueToken(program);
registerRotateCredentials(program);
registerTailAuditLog(program);
registerCompletion(program);

// Parse args — commander will display help automatically on --help
program.parse(process.argv);
29 cli/tsconfig.json Normal file
@@ -0,0 +1,29 @@
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "lib": ["ES2020"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "noImplicitAny": true,
    "strictNullChecks": true,
    "strictFunctionTypes": true,
    "strictBindCallApply": true,
    "strictPropertyInitialization": true,
    "noImplicitThis": true,
    "alwaysStrict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noImplicitReturns": true,
    "noFallthroughCasesInSwitch": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": true,
    "sourceMap": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}
95 dashboard/README.md Normal file
@@ -0,0 +1,95 @@
# SentryAgent.ai AgentIdP — Web Dashboard

## 1. Overview

The AgentIdP Dashboard is a React 18 single-page application (SPA) that provides a visual
management interface for the AgentIdP API. It allows operators to:

- Browse, search, and filter all registered AI agents
- View agent details and manage lifecycle (suspend / reactivate)
- Generate, rotate, and revoke agent credentials
- Query the audit log with filters for agent, action, outcome, and date range
- Monitor PostgreSQL and Redis connectivity in real time

The dashboard is co-served by the Express API server at `/dashboard/` — no separate hosting
is required.

## 2. Prerequisites

- Node.js 18+
- A running AgentIdP server (local or remote)
- An active agent credential (Client ID + Client Secret) with full scopes

## 3. Development

Install dashboard dependencies:

```bash
cd dashboard
npm install
```

Start the Vite dev server:

```bash
npm run dev
```

The dev server starts at `http://localhost:5173/dashboard/`. API calls are made to
`window.location.origin` (the default in the Login form), so either:

- Set the **API Base URL** field to your local server (e.g. `http://localhost:3000`)
- Or configure a Vite proxy in `vite.config.ts` for the `/api` and `/health` paths
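
The proxy route can be sketched as a minimal `vite.config.ts`; the port, `base`, and plugin wiring below are assumptions, not the repo's actual config:

```typescript
// Hypothetical vite.config.ts — assumes the AgentIdP server runs on port 3000
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  base: '/dashboard/',
  plugins: [react()],
  server: {
    proxy: {
      // Forward API and health-check requests to the local AgentIdP server
      '/api': 'http://localhost:3000',
      '/health': 'http://localhost:3000',
    },
  },
});
```

With a proxy in place, the **API Base URL** field can be left at the dev-server origin and Vite forwards the calls.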

## 4. Building

Compile TypeScript and bundle with Vite:

```bash
npm run build
```

Output is written to `dashboard/dist/`. The build is an optimised static bundle (HTML, CSS, JS).

To verify the build locally:

```bash
npm run preview
```

## 5. Deployment

The AgentIdP Express server automatically serves the built dashboard:

- Static assets at `/dashboard/` (via `express.static`)
- SPA fallback — any `/dashboard/*` request not matching a static file returns `index.html`

**Steps:**

1. Build the dashboard: `cd dashboard && npm run build`
2. Start (or restart) the AgentIdP server: `npm start`
3. Open `https://your-api-host/dashboard/` in a browser

No additional nginx or CDN configuration is required for basic deployments.
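
As a rough sketch, the co-serving behaviour described above amounts to the following Express wiring; the `dist` path and port are assumptions, not the server's actual code:

```typescript
// Hypothetical sketch of serving the built dashboard from Express
import * as path from 'path';
import express from 'express';

const app = express();
const dashboardDist = path.join(__dirname, '../dashboard/dist');

// Serve the static bundle under /dashboard/
app.use('/dashboard', express.static(dashboardDist));

// SPA fallback: any /dashboard/* route with no matching file gets index.html,
// so react-router can resolve the route on the client
app.get('/dashboard/*', (_req, res) => {
  res.sendFile(path.join(dashboardDist, 'index.html'));
});

app.listen(3000);
```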

## 6. Login

The login form has three fields:

| Field | Description |
|---|---|
| **API Base URL** | Base URL of the AgentIdP server, e.g. `https://api.example.com`. Defaults to the current page origin, which works when the dashboard is co-served. |
| **Client ID** | The UUID of an agent registered in AgentIdP. The agent must have the scopes `agents:read agents:write tokens:read audit:read`. |
| **Client Secret** | The plain-text client secret for the agent. Validated against the token endpoint on login. |

Credentials are stored in `sessionStorage` only — they are cleared when the browser tab is closed.

## 7. Pages

| Page | Route | Description |
|---|---|---|
| **Agents** | `/dashboard/agents` | Paginated list of all agents. Search by email (debounced), filter by status. Click a row for details. |
| **Agent Detail** | `/dashboard/agents/:agentId` | Full agent metadata. Suspend or reactivate (with confirmation). Link to credentials. |
| **Credentials** | `/dashboard/agents/:agentId/credentials` | List all credentials. Generate, rotate, or revoke. New secrets are shown exactly once. |
| **Audit Log** | `/dashboard/audit` | Paginated audit events with filters for agent ID, action, outcome, and date range. |
| **Health** | `/dashboard/health` | PostgreSQL and Redis connectivity cards. Auto-refreshes every 30 seconds. |
12 dashboard/index.html Normal file
@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>SentryAgent.ai — AgentIdP Dashboard</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
2755 dashboard/package-lock.json (generated) Normal file
File diff suppressed because it is too large
29 dashboard/package.json Normal file
@@ -0,0 +1,29 @@
{
  "name": "@sentryagent/dashboard",
  "version": "1.0.0",
  "private": true,
  "scripts": {
    "dev": "vite",
    "build": "tsc -p tsconfig.app.json && vite build",
    "preview": "vite preview"
  },
  "dependencies": {
    "@sentryagent/idp-sdk": "file:../sdk",
    "react": "^18.3.1",
    "react-dom": "^18.3.1",
    "react-router-dom": "^6.26.2",
    "lucide-react": "^0.446.0",
    "clsx": "^2.1.1",
    "tailwind-merge": "^2.5.2"
  },
  "devDependencies": {
    "@types/react": "^18.3.5",
    "@types/react-dom": "^18.3.0",
    "@vitejs/plugin-react": "^4.3.1",
    "autoprefixer": "^10.4.20",
    "postcss": "^8.4.47",
    "tailwindcss": "^3.4.12",
    "typescript": "^5.5.3",
    "vite": "^5.4.8"
  }
}
6 dashboard/postcss.config.js Normal file
@@ -0,0 +1,6 @@
export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
};
35 dashboard/src/App.tsx Normal file
@@ -0,0 +1,35 @@
import * as React from 'react';
import { Routes, Route, Navigate } from 'react-router-dom';
import { AuthProvider } from '@/lib/auth';
import { RequireAuth } from '@/components/RequireAuth';
import { AppShell } from '@/components/layout/AppShell';
import Login from '@/pages/Login';
import Agents from '@/pages/Agents';
import AgentDetail from '@/pages/AgentDetail';
import Credentials from '@/pages/Credentials';
import AuditLog from '@/pages/AuditLog';
import Health from '@/pages/Health';
import { UsagePanel } from '@/components/UsagePanel';

/** Top-level router — defines all application routes. */
export default function App(): React.JSX.Element {
  return (
    <AuthProvider>
      <Routes>
        <Route path="/dashboard/login" element={<Login />} />
        <Route element={<RequireAuth />}>
          <Route element={<AppShell />}>
            <Route path="/dashboard/agents" element={<Agents />} />
            <Route path="/dashboard/agents/:agentId" element={<AgentDetail />} />
            <Route path="/dashboard/agents/:agentId/credentials" element={<Credentials />} />
            <Route path="/dashboard/audit" element={<AuditLog />} />
            <Route path="/dashboard/health" element={<Health />} />
            <Route path="/dashboard/usage" element={<UsagePanel />} />
          </Route>
        </Route>
        <Route path="/dashboard" element={<Navigate to="/dashboard/agents" replace />} />
        <Route path="*" element={<Navigate to="/dashboard/agents" replace />} />
      </Routes>
    </AuthProvider>
  );
}
11 dashboard/src/components/RequireAuth.tsx Normal file
@@ -0,0 +1,11 @@
import * as React from 'react';
import { Navigate, Outlet } from 'react-router-dom';
import { isAuthenticated } from '@/lib/auth';

/** Redirects to /dashboard/login if not authenticated. */
export function RequireAuth(): React.JSX.Element {
  if (!isAuthenticated()) {
    return <Navigate to="/dashboard/login" replace />;
  }
  return <Outlet />;
}
192 dashboard/src/components/UsagePanel.tsx Normal file
@@ -0,0 +1,192 @@
import * as React from 'react';
import { useAuth } from '@/lib/auth';
import { TokenManager } from '@sentryagent/idp-sdk';

/** Shape of the GET /api/v1/billing/usage response. */
interface UsageResponse {
  tenantId: string;
  date: string;
  apiCalls: number;
  agentCount: number;
  subscriptionStatus: string;
  currentPeriodEnd: string | null;
  stripeSubscriptionId: string | null;
}

type LoadState = 'idle' | 'loading' | 'success' | 'error';

interface UsageState {
  loadState: LoadState;
  data: UsageResponse | null;
  errorMessage: string | null;
}

const initialState: UsageState = {
  loadState: 'idle',
  data: null,
  errorMessage: null,
};

/**
 * Fetches the current usage summary from the API using the stored credentials.
 *
 * @param baseUrl - The API base URL.
 * @param clientId - The agent client ID.
 * @param clientSecret - The agent client secret.
 * @returns The usage response from the server.
 */
async function fetchUsage(
  baseUrl: string,
  clientId: string,
  clientSecret: string,
): Promise<UsageResponse> {
  const tokenManager = new TokenManager(
    baseUrl,
    clientId,
    clientSecret,
    'agents:read',
  );
  const token = await tokenManager.getToken();

  const response = await fetch(`${baseUrl}/api/v1/billing/usage`, {
    headers: { Authorization: `Bearer ${token}` },
  });

  if (!response.ok) {
    throw new Error(`Failed to fetch usage data (HTTP ${response.status})`);
  }

  return response.json() as Promise<UsageResponse>;
}

/** Badge shown for the tenant's subscription tier. */
function SubscriptionBadge({ status }: { status: string }): React.JSX.Element {
  const isPro = status !== 'free';

  return (
    <span
      className={`inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold ${
        isPro
          ? 'bg-brand-100 text-brand-700'
          : 'bg-slate-100 text-slate-600'
      }`}
    >
      {isPro ? 'Pro' : 'Free Tier'}
    </span>
  );
}

/** A single metric card with label and value. */
function MetricCard({ label, value }: { label: string; value: string | number }): React.JSX.Element {
  return (
    <div className="rounded-xl border border-slate-200 bg-white p-6 shadow-sm">
      <p className="text-sm font-medium text-slate-500">{label}</p>
      <p className="mt-1 text-2xl font-bold text-slate-900">{value}</p>
    </div>
  );
}

/**
 * Displays the current tenant's usage summary:
 * - API calls today
 * - Active agent count
 * - Subscription status (Free Tier / Pro)
 *
 * Fetches GET /api/v1/billing/usage with the current Bearer token.
 * Handles loading state and error state gracefully.
 */
export function UsagePanel(): React.JSX.Element {
  const { credentials } = useAuth();
  const [state, setState] = React.useState<UsageState>(initialState);

  const loadUsage = React.useCallback(async (): Promise<void> => {
    if (!credentials) return;

    setState((prev) => ({ ...prev, loadState: 'loading', errorMessage: null }));

    try {
      const data = await fetchUsage(
        credentials.baseUrl,
        credentials.clientId,
        credentials.clientSecret,
      );
      setState({ loadState: 'success', data, errorMessage: null });
    } catch (err) {
      const message = err instanceof Error ? err.message : 'Unknown error occurred.';
      setState({ loadState: 'error', data: null, errorMessage: message });
    }
  }, [credentials]);

  React.useEffect(() => {
    void loadUsage();
  }, [loadUsage]);

  const isLoading = state.loadState === 'loading' || state.loadState === 'idle';

  return (
    <div>
      <div className="mb-6 flex items-center justify-between">
        <h1 className="text-2xl font-bold text-slate-900">Usage & Billing</h1>
        <button
          onClick={() => { void loadUsage(); }}
          disabled={isLoading}
          className="rounded-md border border-slate-300 px-3 py-1.5 text-sm hover:bg-slate-50 disabled:opacity-40"
        >
          Refresh
        </button>
      </div>

      {/* Error state */}
      {state.loadState === 'error' && (
        <div className="mb-6 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
          {state.errorMessage ?? 'Failed to load usage data.'}
        </div>
      )}

      {/* Loading skeleton */}
      {isLoading && (
        <div className="grid grid-cols-1 gap-4 sm:grid-cols-3 animate-pulse">
          {[1, 2, 3].map((i) => (
            <div key={i} className="h-28 rounded-xl border border-slate-200 bg-slate-100" />
          ))}
        </div>
      )}

      {/* Data */}
      {state.loadState === 'success' && state.data !== null && (
        <>
          <div className="mb-4 flex items-center gap-3">
            <p className="text-sm text-slate-500">
              Showing usage for <strong>{state.data.date}</strong>
            </p>
            <SubscriptionBadge status={state.data.subscriptionStatus} />
          </div>

          <div className="grid grid-cols-1 gap-4 sm:grid-cols-3">
            <MetricCard label="API Calls Today" value={state.data.apiCalls.toLocaleString()} />
            <MetricCard label="Active Agents" value={state.data.agentCount.toLocaleString()} />
            <MetricCard label="Plan" value={state.data.subscriptionStatus === 'free' ? 'Free Tier' : 'Pro'} />
          </div>

          {state.data.subscriptionStatus === 'free' && (
            <div className="mt-6 rounded-xl border border-brand-200 bg-brand-50 p-5">
              <p className="text-sm font-medium text-brand-800">
                You are on the Free Tier — limited to 10 agents and 1,000 API calls/day.
              </p>
              <p className="mt-1 text-sm text-brand-700">
                Upgrade to Pro for unlimited agents and API calls.
              </p>
            </div>
          )}

          {state.data.currentPeriodEnd !== null && (
            <p className="mt-4 text-xs text-slate-400">
              Current period ends:{' '}
              {new Date(state.data.currentPeriodEnd).toLocaleDateString()}
            </p>
          )}
        </>
      )}
    </div>
  );
}
63 dashboard/src/components/layout/AppShell.tsx Normal file
@@ -0,0 +1,63 @@
import * as React from 'react';
import { NavLink, Outlet } from 'react-router-dom';
import { cn } from '@/lib/utils';
import { useAuth } from '@/lib/auth';

interface NavItem {
  to: string;
  label: string;
}

const NAV_ITEMS: NavItem[] = [
  { to: '/dashboard/agents', label: 'Agents' },
  { to: '/dashboard/audit', label: 'Audit Log' },
  { to: '/dashboard/health', label: 'Health' },
  { to: '/dashboard/usage', label: 'Usage' },
];

/**
 * Outer application shell: top navigation bar and main content area.
 * Renders the active page via <Outlet />.
 */
export function AppShell(): React.JSX.Element {
  const { logout } = useAuth();

  return (
    <div className="min-h-screen bg-slate-50">
      <header className="border-b border-slate-200 bg-white shadow-sm">
        <div className="mx-auto flex max-w-7xl items-center justify-between px-4 py-3">
          <div className="flex items-center gap-8">
            <span className="text-lg font-bold text-brand-700">SentryAgent.ai</span>
            <nav className="flex gap-1">
              {NAV_ITEMS.map(({ to, label }) => (
                <NavLink
                  key={to}
                  to={to}
                  className={({ isActive }) =>
                    cn(
                      'rounded-md px-3 py-2 text-sm font-medium transition-colors',
                      isActive
                        ? 'bg-brand-50 text-brand-700'
                        : 'text-slate-600 hover:bg-slate-100 hover:text-slate-900',
                    )
                  }
                >
                  {label}
                </NavLink>
              ))}
            </nav>
          </div>
          <button
            onClick={logout}
            className="text-sm text-slate-500 hover:text-slate-900"
          >
            Sign out
          </button>
        </div>
      </header>
      <main className="mx-auto max-w-7xl px-4 py-8">
        <Outlet />
      </main>
    </div>
  );
}
27 dashboard/src/components/ui/badge.tsx Normal file
@@ -0,0 +1,27 @@
import * as React from 'react';
import { cn } from '@/lib/utils';

type BadgeVariant = 'default' | 'success' | 'warning' | 'danger' | 'muted';

interface BadgeProps {
  variant?: BadgeVariant;
  children: React.ReactNode;
  className?: string;
}

const variantClasses: Record<BadgeVariant, string> = {
  default: 'bg-brand-100 text-brand-700',
  success: 'bg-green-100 text-green-700',
  warning: 'bg-yellow-100 text-yellow-700',
  danger: 'bg-red-100 text-red-700',
  muted: 'bg-slate-100 text-slate-600',
};

/** Small status badge. */
export function Badge({ variant = 'default', children, className }: BadgeProps): React.JSX.Element {
  return (
    <span className={cn('inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-medium', variantClasses[variant], className)}>
      {children}
    </span>
  );
}
65 dashboard/src/components/ui/button.tsx Normal file
@@ -0,0 +1,65 @@
import * as React from 'react';
import { cn } from '@/lib/utils';

type Variant = 'default' | 'destructive' | 'outline' | 'ghost';
type Size = 'sm' | 'md' | 'lg';

interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  variant?: Variant;
  size?: Size;
  loading?: boolean;
}

const variantClasses: Record<Variant, string> = {
  default: 'bg-brand-600 text-white hover:bg-brand-700 focus:ring-brand-500',
  destructive: 'bg-red-600 text-white hover:bg-red-700 focus:ring-red-500',
  outline: 'border border-slate-300 bg-white text-slate-700 hover:bg-slate-50 focus:ring-brand-500',
  ghost: 'text-slate-600 hover:bg-slate-100 hover:text-slate-900 focus:ring-brand-500',
};

const sizeClasses: Record<Size, string> = {
  sm: 'px-3 py-1.5 text-sm',
  md: 'px-4 py-2 text-sm',
  lg: 'px-6 py-3 text-base',
};

/**
 * Reusable button component with variant and size support.
 *
 * @param variant - Visual style: default | destructive | outline | ghost
 * @param size - Size: sm | md | lg
 * @param loading - When true, shows a spinner and disables the button
 */
export function Button({
  variant = 'default',
  size = 'md',
  loading = false,
  className,
  children,
  disabled,
  ...props
}: ButtonProps): React.JSX.Element {
  // Use `||` rather than `??` so the button is disabled whenever it is
  // loading, even if the caller explicitly passed `disabled={false}`
  return (
    <button
      className={cn(
        'inline-flex items-center justify-center gap-2 rounded-md font-medium',
        'focus:outline-none focus:ring-2 focus:ring-offset-2',
        'disabled:pointer-events-none disabled:opacity-50',
        'transition-colors duration-150',
        variantClasses[variant],
        sizeClasses[size],
        className,
      )}
      disabled={disabled || loading}
      {...props}
    >
      {loading && (
        <svg className="h-4 w-4 animate-spin" fill="none" viewBox="0 0 24 24">
          <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
          <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8v4a4 4 0 00-4 4H4z" />
        </svg>
      )}
      {children}
    </button>
  );
}
45 dashboard/src/components/ui/dialog.tsx Normal file
@@ -0,0 +1,45 @@
import * as React from 'react';
import { Button } from './button';

interface DialogProps {
  open: boolean;
  title: string;
  description: string;
  confirmLabel?: string;
  cancelLabel?: string;
  variant?: 'default' | 'destructive';
  onConfirm: () => void;
  onCancel: () => void;
}

/**
 * Modal confirmation dialog for destructive actions (suspend, revoke, rotate).
 */
export function ConfirmDialog({
  open,
  title,
  description,
  confirmLabel = 'Confirm',
  cancelLabel = 'Cancel',
  variant = 'default',
  onConfirm,
  onCancel,
}: DialogProps): React.JSX.Element | null {
  if (!open) return null;

  return (
    <div className="fixed inset-0 z-50 flex items-center justify-center">
      <div className="absolute inset-0 bg-black/50" onClick={onCancel} />
      <div className="relative z-10 w-full max-w-md rounded-lg bg-white p-6 shadow-xl">
        <h2 className="text-lg font-semibold text-slate-900">{title}</h2>
        <p className="mt-2 text-sm text-slate-600">{description}</p>
        <div className="mt-6 flex justify-end gap-3">
          <Button variant="outline" onClick={onCancel}>{cancelLabel}</Button>
          <Button variant={variant === 'destructive' ? 'destructive' : 'default'} onClick={onConfirm}>
            {confirmLabel}
          </Button>
        </div>
      </div>
    </div>
  );
}
26 dashboard/src/index.css Normal file
@@ -0,0 +1,26 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

@layer base {
  :root {
    --background: 0 0% 100%;
    --foreground: 222.2 84% 4.9%;
    --muted: 210 40% 96.1%;
    --muted-foreground: 215.4 16.3% 46.9%;
    --border: 214.3 31.8% 91.4%;
    --input: 214.3 31.8% 91.4%;
    --ring: 198 89% 48%;
    --radius: 0.5rem;
  }
}

* {
  box-sizing: border-box;
}

body {
  font-family: system-ui, -apple-system, sans-serif;
  background-color: #f8fafc;
  color: #0f172a;
}
109 dashboard/src/lib/auth.tsx Normal file
@@ -0,0 +1,109 @@
import { TokenManager } from '@sentryagent/idp-sdk';

const SESSION_KEY = 'agentidp_credentials';

interface StoredCredentials {
  clientId: string;
  clientSecret: string;
  baseUrl: string;
}

/**
 * Persists user credentials to sessionStorage (cleared on tab close).
 */
export function saveCredentials(creds: StoredCredentials): void {
  sessionStorage.setItem(SESSION_KEY, JSON.stringify(creds));
}

/**
 * Retrieves credentials from sessionStorage.
 * Returns null if not logged in.
 */
export function loadCredentials(): StoredCredentials | null {
  const raw = sessionStorage.getItem(SESSION_KEY);
  if (!raw) return null;
  try {
    return JSON.parse(raw) as StoredCredentials;
  } catch {
    return null;
  }
}

/**
 * Removes credentials from sessionStorage (logout).
 */
export function clearCredentials(): void {
  sessionStorage.removeItem(SESSION_KEY);
}

/**
 * Returns true if the user has stored credentials.
 */
export function isAuthenticated(): boolean {
  return loadCredentials() !== null;
}

/**
 * Validates stored credentials by requesting a token.
 * Returns true if successful; false on auth failure.
 */
export async function validateCredentials(creds: StoredCredentials): Promise<boolean> {
  try {
    const tm = new TokenManager(creds.baseUrl, creds.clientId, creds.clientSecret, 'agents:read agents:write tokens:read audit:read');
    await tm.getToken();
    return true;
  } catch {
    return false;
  }
}

// ── React context ──────────────────────────────────────────────────────────────

import * as React from 'react';
import { useNavigate } from 'react-router-dom';

interface AuthContextValue {
  credentials: StoredCredentials | null;
  login: (creds: StoredCredentials) => Promise<boolean>;
  logout: () => void;
}

const AuthContext = React.createContext<AuthContextValue | null>(null);

/**
 * Provides authentication state to the application.
 * Reads initial state from sessionStorage on mount.
 */
export function AuthProvider({ children }: { children: React.ReactNode }): React.JSX.Element {
  const [credentials, setCredentials] = React.useState<StoredCredentials | null>(loadCredentials);
  const navigate = useNavigate();

  const login = React.useCallback(async (creds: StoredCredentials): Promise<boolean> => {
    const valid = await validateCredentials(creds);
|
||||
if (valid) {
|
||||
saveCredentials(creds);
|
||||
setCredentials(creds);
|
||||
}
|
||||
return valid;
|
||||
}, []);
|
||||
|
||||
const logout = React.useCallback((): void => {
|
||||
clearCredentials();
|
||||
setCredentials(null);
|
||||
navigate('/dashboard/login');
|
||||
}, [navigate]);
|
||||
|
||||
const value = React.useMemo(() => ({ credentials, login, logout }), [credentials, login, logout]);
|
||||
|
||||
return <AuthContext.Provider value={value}>{children}</AuthContext.Provider>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns the current authentication context.
|
||||
* Must be used inside <AuthProvider>.
|
||||
*/
|
||||
export function useAuth(): AuthContextValue {
|
||||
const ctx = React.useContext(AuthContext);
|
||||
if (!ctx) throw new Error('useAuth must be used within AuthProvider');
|
||||
return ctx;
|
||||
}
|
||||
dashboard/src/lib/client.ts (Normal file, 18 lines)
@@ -0,0 +1,18 @@
import { AgentIdPClient } from '@sentryagent/idp-sdk';
import { loadCredentials } from './auth';

/**
 * Returns an AgentIdPClient configured with credentials from sessionStorage.
 * Throws if not authenticated (caller must ensure login first).
 */
export function getClient(): AgentIdPClient {
  const creds = loadCredentials();
  if (!creds) {
    throw new Error('Not authenticated. Please log in.');
  }
  return new AgentIdPClient({
    baseUrl: creds.baseUrl,
    clientId: creds.clientId,
    clientSecret: creds.clientSecret,
  });
}
dashboard/src/lib/utils.ts (Normal file, 7 lines)
@@ -0,0 +1,7 @@
import { clsx, type ClassValue } from 'clsx';
import { twMerge } from 'tailwind-merge';

/** Merges Tailwind class names, handling conflicts correctly. */
export function cn(...inputs: ClassValue[]): string {
  return twMerge(clsx(inputs));
}
dashboard/src/main.tsx (Normal file, 13 lines)
@@ -0,0 +1,13 @@
import React from 'react';
import ReactDOM from 'react-dom/client';
import { BrowserRouter } from 'react-router-dom';
import App from './App';
import './index.css';

ReactDOM.createRoot(document.getElementById('root')!).render(
  <React.StrictMode>
    <BrowserRouter>
      <App />
    </BrowserRouter>
  </React.StrictMode>,
);
dashboard/src/pages/AgentDetail.tsx (Normal file, 222 lines)
@@ -0,0 +1,222 @@
import * as React from 'react';
import { useParams, useNavigate } from 'react-router-dom';
import type { Agent } from '@sentryagent/idp-sdk';
import { Badge } from '@/components/ui/badge';
import { Button } from '@/components/ui/button';
import { ConfirmDialog } from '@/components/ui/dialog';
import { getClient } from '@/lib/client';

type BadgeVariant = 'success' | 'warning' | 'danger';

/** Maps AgentStatus to a Badge variant. */
function statusVariant(status: Agent['status']): BadgeVariant {
  switch (status) {
    case 'active': return 'success';
    case 'suspended': return 'warning';
    case 'decommissioned': return 'danger';
  }
}

/** Formats an ISO timestamp to a readable local date-time string. */
function formatDateTime(iso: string): string {
  return new Date(iso).toLocaleString(undefined, {
    year: 'numeric', month: 'short', day: 'numeric',
    hour: '2-digit', minute: '2-digit',
  });
}

interface DetailRowProps {
  label: string;
  value: string;
}

/** Single label/value row in the detail card. */
function DetailRow({ label, value }: DetailRowProps): React.JSX.Element {
  return (
    <div className="flex flex-col gap-1 sm:flex-row sm:gap-4">
      <dt className="w-36 shrink-0 text-sm font-medium text-slate-500">{label}</dt>
      <dd className="text-sm text-slate-900 break-all">{value}</dd>
    </div>
  );
}

type DialogAction = 'suspend' | 'reactivate';

/**
 * Agent Detail page — shows all agent fields and provides suspend/reactivate actions.
 * Route: /dashboard/agents/:agentId
 */
export default function AgentDetail(): React.JSX.Element {
  const { agentId } = useParams<{ agentId: string }>();
  const navigate = useNavigate();

  const [agent, setAgent] = React.useState<Agent | null>(null);
  const [loading, setLoading] = React.useState<boolean>(true);
  const [error, setError] = React.useState<string | null>(null);
  const [actionLoading, setActionLoading] = React.useState<boolean>(false);
  const [dialog, setDialog] = React.useState<DialogAction | null>(null);

  React.useEffect(() => {
    if (!agentId) return;
    let cancelled = false;
    setLoading(true);
    setError(null);

    const fetchAgent = async (): Promise<void> => {
      try {
        const result = await getClient().agents.getAgent(agentId);
        if (!cancelled) setAgent(result);
      } catch (err) {
        if (!cancelled) setError(err instanceof Error ? err.message : 'Failed to load agent.');
      } finally {
        if (!cancelled) setLoading(false);
      }
    };

    void fetchAgent();
    return () => { cancelled = true; };
  }, [agentId]);

  const handleAction = React.useCallback(
    async (action: DialogAction): Promise<void> => {
      if (!agentId) return;
      setActionLoading(true);
      setDialog(null);
      try {
        const newStatus = action === 'suspend' ? 'suspended' : 'active';
        const updated = await getClient().agents.updateAgent(agentId, { status: newStatus });
        setAgent(updated);
      } catch (err) {
        setError(err instanceof Error ? err.message : 'Action failed.');
      } finally {
        setActionLoading(false);
      }
    },
    [agentId],
  );

  if (loading) {
    return (
      <div className="space-y-4">
        {Array.from({ length: 6 }).map((_, i) => (
          <div key={i} className="h-5 w-full animate-pulse rounded bg-slate-200" />
        ))}
      </div>
    );
  }

  if (error || !agent) {
    return (
      <div className="rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
        {error ?? 'Agent not found.'}
      </div>
    );
  }

  const dialogConfig = dialog === 'suspend'
    ? {
        title: `Suspend agent ${agent.email}?`,
        description: `Suspending ${agent.email} means it will no longer be able to authenticate.`,
        confirmLabel: 'Suspend',
        variant: 'destructive' as const,
      }
    : {
        title: `Reactivate agent ${agent.email}?`,
        description: `Reactivating ${agent.email} will allow it to authenticate again.`,
        confirmLabel: 'Reactivate',
        variant: 'default' as const,
      };

  return (
    <div>
      {/* Back navigation */}
      <button
        onClick={() => { navigate('/dashboard/agents'); }}
        className="mb-6 flex items-center gap-1 text-sm text-brand-600 hover:text-brand-800"
      >
        ← Back to Agents
      </button>

      <div className="mb-6 flex items-start justify-between gap-4">
        <div>
          <h1 className="text-2xl font-bold text-slate-900">{agent.email}</h1>
          <p className="mt-1 text-sm text-slate-500">Agent ID: {agent.agentId}</p>
        </div>
        <Badge variant={statusVariant(agent.status)} className="mt-1">{agent.status}</Badge>
      </div>

      {error && (
        <div className="mb-4 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
          {error}
        </div>
      )}

      {/* Detail card */}
      <div className="rounded-xl border border-slate-200 bg-white p-6 shadow-sm">
        <dl className="space-y-4">
          <DetailRow label="Email" value={agent.email} />
          <DetailRow label="Agent ID" value={agent.agentId} />
          <DetailRow label="Type" value={agent.agentType} />
          <DetailRow label="Version" value={agent.version} />
          <DetailRow label="Owner" value={agent.owner} />
          <DetailRow label="Environment" value={agent.deploymentEnv} />
          <DetailRow label="Capabilities" value={agent.capabilities.join(', ') || '—'} />
          <DetailRow label="Status" value={agent.status} />
          <DetailRow label="Created" value={formatDateTime(agent.createdAt)} />
          <DetailRow label="Updated" value={formatDateTime(agent.updatedAt)} />
        </dl>
      </div>

      {/* Actions */}
      {agent.status !== 'decommissioned' && (
        <div className="mt-6 flex gap-3">
          {agent.status === 'active' && (
            <Button
              variant="destructive"
              loading={actionLoading}
              onClick={() => { setDialog('suspend'); }}
            >
              Suspend Agent
            </Button>
          )}
          {agent.status === 'suspended' && (
            <Button
              variant="default"
              loading={actionLoading}
              onClick={() => { setDialog('reactivate'); }}
            >
              Reactivate Agent
            </Button>
          )}
        </div>
      )}

      {/* Credentials section */}
      <div className="mt-8 rounded-xl border border-slate-200 bg-white p-6 shadow-sm">
        <h2 className="mb-4 text-lg font-semibold text-slate-900">Credentials</h2>
        <p className="mb-4 text-sm text-slate-600">
          Manage client secrets for this agent. Rotate or revoke credentials as needed.
        </p>
        <Button
          variant="outline"
          onClick={() => { navigate(`/dashboard/agents/${agent.agentId}/credentials`); }}
        >
          View Credentials
        </Button>
      </div>

      {/* Confirm dialog */}
      {dialog !== null && (
        <ConfirmDialog
          open
          title={dialogConfig.title}
          description={dialogConfig.description}
          confirmLabel={dialogConfig.confirmLabel}
          variant={dialogConfig.variant}
          onConfirm={() => { void handleAction(dialog); }}
          onCancel={() => { setDialog(null); }}
        />
      )}
    </div>
  );
}
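The `statusVariant` helper in AgentDetail.tsx relies on exhaustive `switch` narrowing: because every member of the status union returns a value, TypeScript accepts the function without a `default` branch. A minimal standalone sketch of that pattern (the union types here are assumed, independent of the SDK):

```typescript
// Assumed local stand-ins for the SDK's status union and the Badge variants.
type AgentStatus = 'active' | 'suspended' | 'decommissioned';
type BadgeVariant = 'success' | 'warning' | 'danger';

// Exhaustive switch: every case returns, so no default is needed and
// adding a new status later becomes a compile-time error here.
function statusVariant(status: AgentStatus): BadgeVariant {
  switch (status) {
    case 'active': return 'success';
    case 'suspended': return 'warning';
    case 'decommissioned': return 'danger';
  }
}
```

If a fourth status were added to the union, this function would stop compiling until a matching case is written, which is the main reason to prefer the exhaustive form over a `default` fallback.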
dashboard/src/pages/Agents.tsx (Normal file, 204 lines)
@@ -0,0 +1,204 @@
import * as React from 'react';
import { useNavigate } from 'react-router-dom';
import type { Agent, AgentStatus } from '@sentryagent/idp-sdk';
import { Badge } from '@/components/ui/badge';
import { getClient } from '@/lib/client';

const PAGE_LIMIT = 20;

/** Maps AgentStatus to a Badge variant. */
function statusVariant(status: AgentStatus): 'success' | 'warning' | 'danger' | 'muted' {
  switch (status) {
    case 'active': return 'success';
    case 'suspended': return 'warning';
    case 'decommissioned': return 'danger';
  }
}

/** Formats an ISO timestamp to a short local date string. */
function formatDate(iso: string): string {
  return new Date(iso).toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' });
}

/** Skeleton row shown while loading. */
function SkeletonRow(): React.JSX.Element {
  return (
    <tr>
      {Array.from({ length: 6 }).map((_, i) => (
        <td key={i} className="px-4 py-3">
          <div className="h-4 w-full animate-pulse rounded bg-slate-200" />
        </td>
      ))}
    </tr>
  );
}

/**
 * Agents list page — displays all registered agents with search, status filter, and pagination.
 * Clicking a row navigates to the Agent Detail page.
 */
export default function Agents(): React.JSX.Element {
  const navigate = useNavigate();

  const [agents, setAgents] = React.useState<Agent[]>([]);
  const [total, setTotal] = React.useState<number>(0);
  const [page, setPage] = React.useState<number>(1);
  const [loading, setLoading] = React.useState<boolean>(false);
  const [error, setError] = React.useState<string | null>(null);

  // Filters (client-side email search, server-side status)
  const [searchInput, setSearchInput] = React.useState<string>('');
  const [debouncedSearch, setDebouncedSearch] = React.useState<string>('');
  const [statusFilter, setStatusFilter] = React.useState<AgentStatus | ''>('');

  // Debounce search input 300ms
  React.useEffect(() => {
    const timer = setTimeout(() => { setDebouncedSearch(searchInput); }, 300);
    return () => { clearTimeout(timer); };
  }, [searchInput]);

  // Reset to page 1 on filter change
  React.useEffect(() => {
    setPage(1);
  }, [debouncedSearch, statusFilter]);

  React.useEffect(() => {
    let cancelled = false;
    setLoading(true);
    setError(null);

    const fetchAgents = async (): Promise<void> => {
      try {
        const client = getClient();
        const result = await client.agents.listAgents({
          page,
          limit: PAGE_LIMIT,
          status: statusFilter !== '' ? statusFilter : undefined,
        });
        if (!cancelled) {
          setAgents(result.data);
          setTotal(result.total);
        }
      } catch (err) {
        if (!cancelled) {
          setError(err instanceof Error ? err.message : 'Failed to load agents.');
        }
      } finally {
        if (!cancelled) setLoading(false);
      }
    };

    void fetchAgents();
    return () => { cancelled = true; };
  }, [page, statusFilter]);

  // Client-side email filter applied after API results arrive
  const filteredAgents = React.useMemo(() => {
    if (!debouncedSearch.trim()) return agents;
    const lower = debouncedSearch.toLowerCase();
    return agents.filter((a) => a.email.toLowerCase().includes(lower));
  }, [agents, debouncedSearch]);

  const totalPages = Math.max(1, Math.ceil(total / PAGE_LIMIT));

  return (
    <div>
      <div className="mb-6 flex flex-col gap-4 sm:flex-row sm:items-center sm:justify-between">
        <h1 className="text-2xl font-bold text-slate-900">Agents</h1>
        <div className="flex gap-3">
          <input
            type="search"
            value={searchInput}
            onChange={(e) => { setSearchInput(e.target.value); }}
            placeholder="Search by email…"
            className="w-60 rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
          />
          <select
            value={statusFilter}
            onChange={(e) => { setStatusFilter(e.target.value as AgentStatus | ''); }}
            className="rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
          >
            <option value="">All Statuses</option>
            <option value="active">Active</option>
            <option value="suspended">Suspended</option>
            <option value="decommissioned">Decommissioned</option>
          </select>
        </div>
      </div>

      {error && (
        <div className="mb-4 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
          {error}
        </div>
      )}

      <div className="overflow-hidden rounded-xl border border-slate-200 bg-white shadow-sm">
        <table className="min-w-full divide-y divide-slate-200 text-sm">
          <thead className="bg-slate-50">
            <tr>
              {['Name (Email)', 'Type', 'Status', 'Environment', 'Owner', 'Created'].map((col) => (
                <th key={col} className="px-4 py-3 text-left text-xs font-semibold uppercase tracking-wide text-slate-500">
                  {col}
                </th>
              ))}
            </tr>
          </thead>
          <tbody className="divide-y divide-slate-100">
            {loading
              ? Array.from({ length: 5 }).map((_, i) => <SkeletonRow key={i} />)
              : filteredAgents.length === 0
                ? (
                  <tr>
                    <td colSpan={6} className="px-4 py-12 text-center text-slate-400">
                      No agents found.
                    </td>
                  </tr>
                )
                : filteredAgents.map((agent) => (
                  <tr
                    key={agent.agentId}
                    onClick={() => { navigate(`/dashboard/agents/${agent.agentId}`); }}
                    className="cursor-pointer hover:bg-slate-50"
                  >
                    <td className="px-4 py-3 font-medium text-brand-700">{agent.email}</td>
                    <td className="px-4 py-3 text-slate-600">{agent.agentType}</td>
                    <td className="px-4 py-3">
                      <Badge variant={statusVariant(agent.status)}>{agent.status}</Badge>
                    </td>
                    <td className="px-4 py-3 text-slate-600">{agent.deploymentEnv}</td>
                    <td className="px-4 py-3 text-slate-600">{agent.owner}</td>
                    <td className="px-4 py-3 text-slate-500">{formatDate(agent.createdAt)}</td>
                  </tr>
                ))
            }
          </tbody>
        </table>
      </div>

      {/* Pagination */}
      {!loading && total > 0 && (
        <div className="mt-4 flex items-center justify-between text-sm text-slate-600">
          <span>
            Page {page} of {totalPages} ({total} total)
          </span>
          <div className="flex gap-2">
            <button
              onClick={() => { setPage((p) => Math.max(1, p - 1)); }}
              disabled={page <= 1}
              className="rounded-md border border-slate-300 px-3 py-1.5 hover:bg-slate-50 disabled:opacity-40"
            >
              Previous
            </button>
            <button
              onClick={() => { setPage((p) => Math.min(totalPages, p + 1)); }}
              disabled={page >= totalPages}
              className="rounded-md border border-slate-300 px-3 py-1.5 hover:bg-slate-50 disabled:opacity-40"
            >
              Next
            </button>
          </div>
        </div>
      )}
    </div>
  );
}
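The pagination controls in Agents.tsx keep `page` within `[1, totalPages]` and guarantee at least one page even for an empty list. A standalone sketch of that arithmetic (the helper names here are hypothetical, pulled out of the component for illustration):

```typescript
// Page size matching the component's constant.
const PAGE_LIMIT = 20;

// Total number of pages; never below 1, even when the list is empty.
function totalPages(total: number): number {
  return Math.max(1, Math.ceil(total / PAGE_LIMIT));
}

// Clamp a requested page into the valid range [1, totalPages].
function clampPage(page: number, total: number): number {
  return Math.min(totalPages(total), Math.max(1, page));
}
```

This mirrors the `Math.max(1, p - 1)` / `Math.min(totalPages, p + 1)` guards on the Previous/Next buttons: the state can never leave the valid range, so the disabled checks and the setters agree.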
dashboard/src/pages/AuditLog.tsx (Normal file, 223 lines)
@@ -0,0 +1,223 @@
import * as React from 'react';
import type { AuditEvent, AuditAction, AuditOutcome } from '@sentryagent/idp-sdk';
import { Badge } from '@/components/ui/badge';
import { getClient } from '@/lib/client';

const PAGE_LIMIT = 20;

/** All AuditAction values for the filter dropdown. */
const AUDIT_ACTIONS: AuditAction[] = [
  'agent.created',
  'agent.updated',
  'agent.decommissioned',
  'agent.suspended',
  'agent.reactivated',
  'token.issued',
  'token.revoked',
  'token.introspected',
  'credential.generated',
  'credential.rotated',
  'credential.revoked',
  'auth.failed',
];

/** Formats an ISO timestamp to a readable local date-time string. */
function formatDateTime(iso: string): string {
  return new Date(iso).toLocaleString(undefined, {
    year: 'numeric', month: 'short', day: 'numeric',
    hour: '2-digit', minute: '2-digit', second: '2-digit',
  });
}

/** Truncates a string to a maximum length with ellipsis. */
function truncate(value: string, maxLen = 24): string {
  return value.length > maxLen ? `${value.slice(0, maxLen)}…` : value;
}

/**
 * Audit Log page — displays audit events with filters for agent, action, outcome, and date range.
 * Route: /dashboard/audit
 */
export default function AuditLog(): React.JSX.Element {
  const [events, setEvents] = React.useState<AuditEvent[]>([]);
  const [total, setTotal] = React.useState<number>(0);
  const [page, setPage] = React.useState<number>(1);
  const [loading, setLoading] = React.useState<boolean>(false);
  const [error, setError] = React.useState<string | null>(null);

  // Filters
  const [agentIdFilter, setAgentIdFilter] = React.useState<string>('');
  const [actionFilter, setActionFilter] = React.useState<AuditAction | ''>('');
  const [outcomeFilter, setOutcomeFilter] = React.useState<AuditOutcome | ''>('');
  const [fromDate, setFromDate] = React.useState<string>('');
  const [toDate, setToDate] = React.useState<string>('');

  // Reset to page 1 on filter change
  React.useEffect(() => {
    setPage(1);
  }, [agentIdFilter, actionFilter, outcomeFilter, fromDate, toDate]);

  React.useEffect(() => {
    let cancelled = false;
    setLoading(true);
    setError(null);

    const fetchEvents = async (): Promise<void> => {
      try {
        const result = await getClient().audit.queryAuditLog({
          page,
          limit: PAGE_LIMIT,
          agentId: agentIdFilter.trim() || undefined,
          action: actionFilter !== '' ? actionFilter : undefined,
          outcome: outcomeFilter !== '' ? outcomeFilter : undefined,
          fromDate: fromDate || undefined,
          toDate: toDate || undefined,
        });
        if (!cancelled) {
          setEvents(result.data);
          setTotal(result.total);
        }
      } catch (err) {
        if (!cancelled) {
          setError(err instanceof Error ? err.message : 'Failed to load audit log.');
        }
      } finally {
        if (!cancelled) setLoading(false);
      }
    };

    void fetchEvents();
    return () => { cancelled = true; };
  }, [page, agentIdFilter, actionFilter, outcomeFilter, fromDate, toDate]);

  const totalPages = Math.max(1, Math.ceil(total / PAGE_LIMIT));

  return (
    <div>
      <h1 className="mb-6 text-2xl font-bold text-slate-900">Audit Log</h1>

      {/* Filters */}
      <div className="mb-6 grid grid-cols-1 gap-3 sm:grid-cols-2 lg:grid-cols-5">
        <input
          type="text"
          value={agentIdFilter}
          onChange={(e) => { setAgentIdFilter(e.target.value); }}
          placeholder="Agent ID…"
          className="rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
        />
        <select
          value={actionFilter}
          onChange={(e) => { setActionFilter(e.target.value as AuditAction | ''); }}
          className="rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
        >
          <option value="">All Actions</option>
          {AUDIT_ACTIONS.map((action) => (
            <option key={action} value={action}>{action}</option>
          ))}
        </select>
        <select
          value={outcomeFilter}
          onChange={(e) => { setOutcomeFilter(e.target.value as AuditOutcome | ''); }}
          className="rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
        >
          <option value="">All Outcomes</option>
          <option value="success">Success</option>
          <option value="failure">Failure</option>
        </select>
        <input
          type="date"
          value={fromDate}
          onChange={(e) => { setFromDate(e.target.value); }}
          className="rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
          title="From date"
        />
        <input
          type="date"
          value={toDate}
          onChange={(e) => { setToDate(e.target.value); }}
          className="rounded-md border border-slate-300 px-3 py-2 text-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
          title="To date"
        />
      </div>

      {error && (
        <div className="mb-4 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
          {error}
        </div>
      )}

      <div className="overflow-hidden rounded-xl border border-slate-200 bg-white shadow-sm">
        <table className="min-w-full divide-y divide-slate-200 text-sm">
          <thead className="bg-slate-50">
            <tr>
              {['Timestamp', 'Agent ID', 'Action', 'Outcome', 'IP Address'].map((col) => (
                <th key={col} className="px-4 py-3 text-left text-xs font-semibold uppercase tracking-wide text-slate-500">
                  {col}
                </th>
              ))}
            </tr>
          </thead>
          <tbody className="divide-y divide-slate-100">
            {loading
              ? Array.from({ length: 5 }).map((_, i) => (
                <tr key={i}>
                  {Array.from({ length: 5 }).map((__, j) => (
                    <td key={j} className="px-4 py-3">
                      <div className="h-4 w-full animate-pulse rounded bg-slate-200" />
                    </td>
                  ))}
                </tr>
              ))
              : events.length === 0
                ? (
                  <tr>
                    <td colSpan={5} className="px-4 py-12 text-center text-slate-400">
                      No audit events found.
                    </td>
                  </tr>
                )
                : events.map((event) => (
                  <tr key={event.eventId} className="hover:bg-slate-50">
                    <td className="px-4 py-3 text-slate-500 whitespace-nowrap">{formatDateTime(event.timestamp)}</td>
                    <td className="px-4 py-3 font-mono text-xs text-slate-700">{truncate(event.agentId)}</td>
                    <td className="px-4 py-3 text-slate-700">{event.action}</td>
                    <td className="px-4 py-3">
                      <Badge variant={event.outcome === 'success' ? 'success' : 'danger'}>
                        {event.outcome}
                      </Badge>
                    </td>
                    <td className="px-4 py-3 text-slate-500">{event.ipAddress}</td>
                  </tr>
                ))
            }
          </tbody>
        </table>
      </div>

      {/* Pagination */}
      {!loading && total > 0 && (
        <div className="mt-4 flex items-center justify-between text-sm text-slate-600">
          <span>
            Page {page} of {totalPages} ({total} total)
          </span>
          <div className="flex gap-2">
            <button
              onClick={() => { setPage((p) => Math.max(1, p - 1)); }}
              disabled={page <= 1}
              className="rounded-md border border-slate-300 px-3 py-1.5 hover:bg-slate-50 disabled:opacity-40"
            >
              Previous
            </button>
            <button
              onClick={() => { setPage((p) => Math.min(totalPages, p + 1)); }}
              disabled={page >= totalPages}
              className="rounded-md border border-slate-300 px-3 py-1.5 hover:bg-slate-50 disabled:opacity-40"
            >
              Next
            </button>
          </div>
        </div>
      )}
    </div>
  );
}
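The `truncate` helper used for agent IDs in AuditLog.tsx caps the original string at `maxLen` characters and then appends an ellipsis, so truncated output is `maxLen + 1` characters long. A standalone copy, for illustration only:

```typescript
// Same logic as the component helper: cap at maxLen, then append '…'.
// Note the ellipsis is added on top of the cap, so truncated strings
// come out maxLen + 1 characters long.
function truncate(value: string, maxLen = 24): string {
  return value.length > maxLen ? `${value.slice(0, maxLen)}…` : value;
}
```

Strings at or under the cap pass through unchanged, so short IDs are never padded or modified.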
dashboard/src/pages/Credentials.tsx (Normal file, 264 lines)
@@ -0,0 +1,264 @@
import * as React from 'react';
import { useParams, useNavigate } from 'react-router-dom';
import type { Credential, CredentialWithSecret } from '@sentryagent/idp-sdk';
import { Badge } from '@/components/ui/badge';
import { Button } from '@/components/ui/button';
import { ConfirmDialog } from '@/components/ui/dialog';
import { getClient } from '@/lib/client';

/** Truncates a string to a maximum length with ellipsis. */
function truncate(value: string, maxLen = 16): string {
  return value.length > maxLen ? `${value.slice(0, maxLen)}…` : value;
}

/** Formats an ISO timestamp to a short local date string. */
function formatDate(iso: string): string {
  return new Date(iso).toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' });
}

interface NewSecretBoxProps {
  secret: string;
  onDismiss: () => void;
}

/**
 * Displays a newly issued client secret exactly once.
 * Provides a copy button and a dismiss button.
 */
function NewSecretBox({ secret, onDismiss }: NewSecretBoxProps): React.JSX.Element {
  const [copied, setCopied] = React.useState<boolean>(false);

  const handleCopy = React.useCallback(async (): Promise<void> => {
    await navigator.clipboard.writeText(secret);
    setCopied(true);
    setTimeout(() => { setCopied(false); }, 2000);
  }, [secret]);

  return (
    <div className="mb-6 rounded-lg border-2 border-green-400 bg-green-50 p-4">
      <p className="mb-2 text-sm font-semibold text-green-800">
        New client secret — copy it now. It will not be shown again.
      </p>
      <div className="flex items-center gap-3">
        <code className="flex-1 break-all rounded bg-white px-3 py-2 text-sm font-mono text-green-900 border border-green-200">
          {secret}
        </code>
        <Button variant="outline" size="sm" onClick={() => { void handleCopy(); }}>
          {copied ? 'Copied!' : 'Copy'}
        </Button>
      </div>
      <button
        onClick={onDismiss}
        className="mt-3 text-xs text-green-700 underline hover:text-green-900"
      >
        I have saved this secret — dismiss
      </button>
    </div>
  );
}

type DialogAction = { type: 'rotate'; credentialId: string } | { type: 'revoke'; credentialId: string };

/**
 * Credentials page — lists all credentials for an agent with rotate/revoke actions.
 * Route: /dashboard/agents/:agentId/credentials
 */
export default function Credentials(): React.JSX.Element {
  const { agentId } = useParams<{ agentId: string }>();
  const navigate = useNavigate();

  const [credentials, setCredentials] = React.useState<Credential[]>([]);
  const [loading, setLoading] = React.useState<boolean>(true);
  const [error, setError] = React.useState<string | null>(null);
  const [actionLoading, setActionLoading] = React.useState<boolean>(false);
  const [dialog, setDialog] = React.useState<DialogAction | null>(null);
  const [newSecret, setNewSecret] = React.useState<CredentialWithSecret | null>(null);

  const fetchCredentials = React.useCallback(async (): Promise<void> => {
    if (!agentId) return;
    setLoading(true);
    setError(null);
    try {
      const result = await getClient().credentials.listCredentials(agentId);
      setCredentials(result.data);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to load credentials.');
    } finally {
      setLoading(false);
    }
  }, [agentId]);

  React.useEffect(() => {
    void fetchCredentials();
  }, [fetchCredentials]);

  const handleGenerate = React.useCallback(async (): Promise<void> => {
    if (!agentId) return;
    setActionLoading(true);
    setError(null);
    try {
      const result = await getClient().credentials.generateCredential(agentId, {});
      setNewSecret(result);
      await fetchCredentials();
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to generate credential.');
    } finally {
      setActionLoading(false);
    }
  }, [agentId, fetchCredentials]);

  const handleConfirm = React.useCallback(async (): Promise<void> => {
    if (!dialog || !agentId) return;
    setActionLoading(true);
    setDialog(null);
    setError(null);
    try {
      if (dialog.type === 'rotate') {
        const result = await getClient().credentials.rotateCredential(agentId, dialog.credentialId);
        setNewSecret(result);
      } else {
        await getClient().credentials.revokeCredential(agentId, dialog.credentialId);
      }
      await fetchCredentials();
    } catch (err) {
      setError(err instanceof Error ? err.message : `Failed to ${dialog.type} credential.`);
    } finally {
      setActionLoading(false);
    }
  }, [dialog, agentId, fetchCredentials]);

  const dialogConfig = React.useMemo(() => {
    if (!dialog) return null;
    if (dialog.type === 'rotate') {
      return {
        title: 'Rotate credential?',
        description: 'The existing secret will be invalidated immediately. You will receive a new secret — store it securely.',
|
||||
confirmLabel: 'Rotate',
|
||||
variant: 'destructive' as const,
|
||||
};
|
||||
}
|
||||
return {
|
||||
title: 'Revoke credential?',
|
||||
description: 'This will permanently revoke the credential. This cannot be undone.',
|
||||
confirmLabel: 'Revoke',
|
||||
variant: 'destructive' as const,
|
||||
};
|
||||
}, [dialog]);
|
||||
|
||||
return (
|
||||
<div>
|
||||
{/* Back navigation */}
|
||||
<button
|
||||
onClick={() => { navigate(`/dashboard/agents/${agentId ?? ''}`); }}
|
||||
className="mb-6 flex items-center gap-1 text-sm text-brand-600 hover:text-brand-800"
|
||||
>
|
||||
← Back to Agent
|
||||
</button>
|
||||
|
||||
<div className="mb-6 flex items-center justify-between">
|
||||
<h1 className="text-2xl font-bold text-slate-900">Credentials</h1>
|
||||
<Button
|
||||
loading={actionLoading}
|
||||
onClick={() => { void handleGenerate(); }}
|
||||
>
|
||||
Generate Credential
|
||||
</Button>
|
||||
</div>
|
||||
|
||||
{error && (
|
||||
<div className="mb-4 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
|
||||
{error}
|
||||
</div>
|
||||
)}
|
||||
|
||||
{/* New secret display — shown once */}
|
||||
{newSecret !== null && (
|
||||
<NewSecretBox
|
||||
secret={newSecret.clientSecret}
|
||||
onDismiss={() => { setNewSecret(null); }}
|
||||
/>
|
||||
)}
|
||||
|
||||
{/* Credentials table */}
|
||||
<div className="overflow-hidden rounded-xl border border-slate-200 bg-white shadow-sm">
|
||||
<table className="min-w-full divide-y divide-slate-200 text-sm">
|
||||
<thead className="bg-slate-50">
|
||||
<tr>
|
||||
{['Credential ID', 'Status', 'Created', 'Actions'].map((col) => (
|
||||
<th key={col} className="px-4 py-3 text-left text-xs font-semibold uppercase tracking-wide text-slate-500">
|
||||
{col}
|
||||
</th>
|
||||
))}
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody className="divide-y divide-slate-100">
|
||||
{loading ? (
|
||||
Array.from({ length: 3 }).map((_, i) => (
|
||||
<tr key={i}>
|
||||
{Array.from({ length: 4 }).map((__, j) => (
|
||||
<td key={j} className="px-4 py-3">
|
||||
<div className="h-4 w-full animate-pulse rounded bg-slate-200" />
|
||||
</td>
|
||||
))}
|
||||
</tr>
|
||||
))
|
||||
) : credentials.length === 0 ? (
|
||||
<tr>
|
||||
<td colSpan={4} className="px-4 py-12 text-center text-slate-400">
|
||||
No credentials found. Generate one above.
|
||||
</td>
|
||||
</tr>
|
||||
) : credentials.map((cred) => (
|
||||
<tr key={cred.credentialId} className="hover:bg-slate-50">
|
||||
<td className="px-4 py-3 font-mono text-xs text-slate-700">
|
||||
{truncate(cred.credentialId, 24)}
|
||||
</td>
|
||||
<td className="px-4 py-3">
|
||||
<Badge variant={cred.status === 'active' ? 'success' : 'muted'}>
|
||||
{cred.status}
|
||||
</Badge>
|
||||
</td>
|
||||
<td className="px-4 py-3 text-slate-500">{formatDate(cred.createdAt)}</td>
|
||||
<td className="px-4 py-3">
|
||||
{cred.status === 'active' && (
|
||||
<div className="flex gap-2">
|
||||
<Button
|
||||
variant="outline"
|
||||
size="sm"
|
||||
disabled={actionLoading}
|
||||
onClick={() => { setDialog({ type: 'rotate', credentialId: cred.credentialId }); }}
|
||||
>
|
||||
Rotate
|
||||
</Button>
|
||||
<Button
|
||||
variant="destructive"
|
||||
size="sm"
|
||||
disabled={actionLoading}
|
||||
onClick={() => { setDialog({ type: 'revoke', credentialId: cred.credentialId }); }}
|
||||
>
|
||||
Revoke
|
||||
</Button>
|
||||
</div>
|
||||
)}
|
||||
</td>
|
||||
</tr>
|
||||
))}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
|
||||
{/* Confirm dialog */}
|
||||
{dialog !== null && dialogConfig !== null && (
|
||||
<ConfirmDialog
|
||||
open
|
||||
title={dialogConfig.title}
|
||||
description={dialogConfig.description}
|
||||
confirmLabel={dialogConfig.confirmLabel}
|
||||
variant={dialogConfig.variant}
|
||||
onConfirm={() => { void handleConfirm(); }}
|
||||
onCancel={() => { setDialog(null); }}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
173
dashboard/src/pages/Health.tsx
Normal file
@@ -0,0 +1,173 @@
import * as React from 'react';

/** Shape of the /health API response. */
interface HealthResponse {
  status: 'ok' | 'degraded';
  version?: string;
  uptime?: number;
  services: {
    postgres: 'connected' | 'disconnected';
    redis: 'connected' | 'disconnected';
  };
}

type ServiceStatus = 'connected' | 'disconnected' | 'unknown';

interface HealthState {
  postgres: ServiceStatus;
  redis: ServiceStatus;
  version: string | null;
  uptime: number | null;
  lastChecked: Date | null;
  reachable: boolean;
}

const initialState: HealthState = {
  postgres: 'unknown',
  redis: 'unknown',
  version: null,
  uptime: null,
  lastChecked: null,
  reachable: true,
};

/** Formats seconds into a human-readable uptime string. */
function formatUptime(seconds: number): string {
  const days = Math.floor(seconds / 86400);
  const hours = Math.floor((seconds % 86400) / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  const parts: string[] = [];
  if (days > 0) parts.push(`${days}d`);
  if (hours > 0) parts.push(`${hours}h`);
  parts.push(`${minutes}m`);
  return parts.join(' ');
}

interface StatusCardProps {
  label: string;
  status: ServiceStatus;
}

/** Card displaying the connectivity status of a single service. */
function StatusCard({ label, status }: StatusCardProps): React.JSX.Element {
  const isConnected = status === 'connected';
  const isUnknown = status === 'unknown';

  return (
    <div className={`rounded-xl border p-6 shadow-sm ${
      isUnknown
        ? 'border-slate-200 bg-slate-50'
        : isConnected
          ? 'border-green-200 bg-green-50'
          : 'border-red-200 bg-red-50'
    }`}>
      <p className="text-sm font-medium text-slate-600">{label}</p>
      <div className="mt-2 flex items-center gap-2">
        <span className={`inline-block h-3 w-3 rounded-full ${
          isUnknown ? 'bg-slate-400' : isConnected ? 'bg-green-500' : 'bg-red-500'
        }`} />
        <span className={`text-lg font-semibold ${
          isUnknown ? 'text-slate-600' : isConnected ? 'text-green-700' : 'text-red-700'
        }`}>
          {isUnknown ? 'Checking…' : isConnected ? 'Connected' : 'Disconnected'}
        </span>
      </div>
    </div>
  );
}

/**
 * Health page — shows PostgreSQL and Redis connectivity status.
 * Polls GET /health every 30 seconds. No authentication required.
 * Route: /dashboard/health
 */
export default function Health(): React.JSX.Element {
  const [health, setHealth] = React.useState<HealthState>(initialState);
  const [loading, setLoading] = React.useState<boolean>(true);

  const checkHealth = React.useCallback(async (): Promise<void> => {
    try {
      const response = await fetch('/health');
      const data = (await response.json()) as HealthResponse;

      setHealth({
        postgres: data.services?.postgres ?? 'unknown',
        redis: data.services?.redis ?? 'unknown',
        version: data.version ?? null,
        uptime: data.uptime ?? null,
        lastChecked: new Date(),
        reachable: true,
      });
    } catch {
      setHealth((prev) => ({
        ...prev,
        postgres: 'disconnected',
        redis: 'disconnected',
        lastChecked: new Date(),
        reachable: false,
      }));
    } finally {
      setLoading(false);
    }
  }, []);

  React.useEffect(() => {
    void checkHealth();
    const interval = setInterval(() => { void checkHealth(); }, 30_000);
    return () => { clearInterval(interval); };
  }, [checkHealth]);

  return (
    <div>
      <div className="mb-6 flex items-center justify-between">
        <h1 className="text-2xl font-bold text-slate-900">System Health</h1>
        <button
          onClick={() => { void checkHealth(); }}
          disabled={loading}
          className="rounded-md border border-slate-300 px-3 py-1.5 text-sm hover:bg-slate-50 disabled:opacity-40"
        >
          Refresh
        </button>
      </div>

      {!health.reachable && (
        <div className="mb-6 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
          API is unreachable. Check that the server is running.
        </div>
      )}

      <div className="grid grid-cols-1 gap-4 sm:grid-cols-2">
        <StatusCard label="PostgreSQL" status={loading ? 'unknown' : health.postgres} />
        <StatusCard label="Redis" status={loading ? 'unknown' : health.redis} />
      </div>

      {/* Metadata */}
      {(health.version !== null || health.uptime !== null) && (
        <div className="mt-6 rounded-xl border border-slate-200 bg-white p-6 shadow-sm">
          <h2 className="mb-4 text-base font-semibold text-slate-900">API Details</h2>
          <dl className="space-y-2">
            {health.version !== null && (
              <div className="flex gap-4">
                <dt className="w-24 text-sm font-medium text-slate-500">Version</dt>
                <dd className="text-sm text-slate-900">{health.version}</dd>
              </div>
            )}
            {health.uptime !== null && (
              <div className="flex gap-4">
                <dt className="w-24 text-sm font-medium text-slate-500">Uptime</dt>
                <dd className="text-sm text-slate-900">{formatUptime(health.uptime)}</dd>
              </div>
            )}
          </dl>
        </div>
      )}

      {/* Last checked */}
      {health.lastChecked !== null && (
        <p className="mt-4 text-xs text-slate-400">
          Last checked: {health.lastChecked.toLocaleTimeString()} — auto-refreshes every 30 seconds
        </p>
      )}
    </div>
  );
}
109
dashboard/src/pages/Login.tsx
Normal file
@@ -0,0 +1,109 @@
import * as React from 'react';
import { useNavigate } from 'react-router-dom';
import { Button } from '@/components/ui/button';
import { useAuth } from '@/lib/auth';

/**
 * Login page — accepts API Base URL, Client ID, and Client Secret.
 * Validates credentials against the AgentIdP token endpoint before persisting.
 */
export default function Login(): React.JSX.Element {
  const { login } = useAuth();
  const navigate = useNavigate();

  const [baseUrl, setBaseUrl] = React.useState<string>(window.location.origin);
  const [clientId, setClientId] = React.useState<string>('');
  const [clientSecret, setClientSecret] = React.useState<string>('');
  const [loading, setLoading] = React.useState<boolean>(false);
  const [error, setError] = React.useState<string | null>(null);

  const handleSubmit = React.useCallback(
    async (e: React.FormEvent<HTMLFormElement>): Promise<void> => {
      e.preventDefault();
      setError(null);
      setLoading(true);

      try {
        const success = await login({ baseUrl: baseUrl.trim(), clientId: clientId.trim(), clientSecret });
        if (success) {
          navigate('/dashboard/agents', { replace: true });
        } else {
          setError('Invalid credentials. Please check your Client ID and secret.');
          setClientSecret('');
        }
      } catch {
        // A thrown error here is a network-level failure rather than a 401.
        setError('Could not reach the API. Check the Base URL and try again.');
      } finally {
        setLoading(false);
      }
    },
    [login, navigate, baseUrl, clientId, clientSecret],
  );

  return (
    <div className="flex min-h-screen items-center justify-center bg-slate-50 px-4">
      <div className="w-full max-w-md rounded-xl bg-white p-8 shadow-lg">
        <div className="mb-8 text-center">
          <h1 className="text-2xl font-bold text-brand-700">SentryAgent.ai</h1>
          <p className="mt-1 text-sm text-slate-500">AgentIdP Dashboard — Sign In</p>
        </div>

        <form onSubmit={(e) => { void handleSubmit(e); }} className="space-y-5">
          <div>
            <label htmlFor="baseUrl" className="block text-sm font-medium text-slate-700">
              API Base URL
            </label>
            <input
              id="baseUrl"
              type="url"
              required
              value={baseUrl}
              onChange={(e) => { setBaseUrl(e.target.value); }}
              className="mt-1 block w-full rounded-md border border-slate-300 px-3 py-2 text-sm shadow-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
              placeholder="https://api.example.com"
            />
          </div>

          <div>
            <label htmlFor="clientId" className="block text-sm font-medium text-slate-700">
              Client ID
            </label>
            <input
              id="clientId"
              type="text"
              required
              value={clientId}
              onChange={(e) => { setClientId(e.target.value); }}
              className="mt-1 block w-full rounded-md border border-slate-300 px-3 py-2 text-sm shadow-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
              placeholder="agent-uuid"
              autoComplete="username"
            />
          </div>

          <div>
            <label htmlFor="clientSecret" className="block text-sm font-medium text-slate-700">
              Client Secret
            </label>
            <input
              id="clientSecret"
              type="password"
              required
              value={clientSecret}
              onChange={(e) => { setClientSecret(e.target.value); }}
              className="mt-1 block w-full rounded-md border border-slate-300 px-3 py-2 text-sm shadow-sm focus:border-brand-500 focus:outline-none focus:ring-1 focus:ring-brand-500"
              autoComplete="current-password"
            />
          </div>

          {error && (
            <p className="rounded-md bg-red-50 px-3 py-2 text-sm text-red-700" role="alert">
              {error}
            </p>
          )}

          <Button type="submit" loading={loading} className="w-full" size="lg">
            {loading ? 'Validating…' : 'Sign In'}
          </Button>
        </form>
      </div>
    </div>
  );
}
1
dashboard/src/vite-env.d.ts
vendored
Normal file
@@ -0,0 +1 @@
/// <reference types="vite/client" />
19
dashboard/tailwind.config.js
Normal file
@@ -0,0 +1,19 @@
/** @type {import('tailwindcss').Config} */
export default {
  content: ['./index.html', './src/**/*.{ts,tsx}'],
  theme: {
    extend: {
      colors: {
        brand: {
          50: '#f0f9ff',
          100: '#e0f2fe',
          500: '#0ea5e9',
          600: '#0284c7',
          700: '#0369a1',
          900: '#0c4a6e',
        },
      },
    },
  },
  plugins: [],
};
25
dashboard/tsconfig.app.json
Normal file
@@ -0,0 +1,25 @@
{
  "compilerOptions": {
    "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
    "target": "ES2020",
    "useDefineForClassFields": true,
    "lib": ["ES2020", "DOM", "DOM.Iterable"],
    "module": "ESNext",
    "skipLibCheck": true,
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "isolatedModules": true,
    "moduleDetection": "force",
    "noEmit": true,
    "jsx": "react-jsx",
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedSideEffectImports": true,
    "paths": {
      "@/*": ["./src/*"]
    }
  },
  "include": ["src"]
}
7
dashboard/tsconfig.json
Normal file
@@ -0,0 +1,7 @@
{
  "files": [],
  "references": [
    { "path": "./tsconfig.app.json" },
    { "path": "./tsconfig.node.json" }
  ]
}
20
dashboard/tsconfig.node.json
Normal file
@@ -0,0 +1,20 @@
{
  "compilerOptions": {
    "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
    "target": "ES2022",
    "lib": ["ES2023"],
    "module": "ESNext",
    "skipLibCheck": true,
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "isolatedModules": true,
    "moduleDetection": "force",
    "noEmit": true,
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedSideEffectImports": true
  },
  "include": ["vite.config.ts"]
}
17
dashboard/vite.config.ts
Normal file
@@ -0,0 +1,17 @@
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import path from 'path';

export default defineConfig({
  plugins: [react()],
  base: '/dashboard/',
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src'),
    },
  },
  build: {
    outDir: 'dist',
    emptyOutDir: true,
  },
});
50
docker-compose.monitoring.yml
Normal file
@@ -0,0 +1,50 @@
version: '3.8'

# Monitoring overlay — extend the base docker-compose.yml
# Usage: docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up

services:
  prometheus:
    image: prom/prometheus:v2.53.0
    container_name: agentidp_prometheus
    volumes:
      - ./monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - '9090:9090'
    networks:
      - agentidp_network
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.2.0
    container_name: agentidp_grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning:ro
      - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards:ro
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=agentidp
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_AUTH_ANONYMOUS_ENABLED=false
    ports:
      - '3001:3000'
    networks:
      - agentidp_network
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

networks:
  agentidp_network:
    external: true
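The overlay above mounts `./monitoring/prometheus/prometheus.yml` into the Prometheus container. A minimal sketch of what that file might contain is shown below; the target host/port and metrics path are assumptions (not taken from this repository) and must be adjusted to the actual AgentIdP service definition:

```yaml
# monitoring/prometheus/prometheus.yml — minimal sketch.
# Assumes the AgentIdP API exposes Prometheus metrics at /metrics on port 3000;
# both the service name and port are hypothetical here.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'agentidp'
    metrics_path: /metrics
    static_configs:
      - targets: ['agentidp:3000']   # service name on agentidp_network
```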
172
docs/compliance/audit-log-runbook.md
Normal file
@@ -0,0 +1,172 @@
# Audit Log Chain Verification Runbook — SentryAgent.ai AgentIdP

**Control:** SOC 2 CC7.2 — Audit Log Integrity
**Service:** `src/services/AuditVerificationService.ts`
**Job:** `src/jobs/AuditChainVerificationJob.ts`
**Endpoint:** `GET /api/v1/audit/verify`

---

## Overview

Every audit event in the `audit_events` PostgreSQL table is linked to the previous one
via a SHA-256 hash chain. Each event stores:

- `hash` — SHA-256 of `(eventId + timestamp.toISOString() + action + outcome + agentId + organizationId + previousHash)`
- `previous_hash` — the `hash` of the immediately preceding event (ordered by `timestamp ASC, event_id ASC`)

The first event in the chain uses `previous_hash = ''` (empty string sentinel).

A PostgreSQL trigger (`trg_audit_events_immutable`) prevents UPDATE and DELETE operations
on `audit_events`, making the log tamper-evident at the database level.
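The verification walk implied by this scheme can be sketched as follows. This is a minimal illustration, not the actual `AuditVerificationService` code; the `AuditEvent` shape and function names are assumptions based on the fields and formula described above:

```typescript
import { createHash } from 'crypto';

interface AuditEvent {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  hash: string;
  previousHash: string;
}

/** Recompute an event's chain hash from the formula above. */
function computeHash(e: Omit<AuditEvent, 'hash'>): string {
  return createHash('sha256')
    .update(
      e.eventId +
        e.timestamp.toISOString() +
        e.action +
        e.outcome +
        e.agentId +
        e.organizationId +
        e.previousHash,
    )
    .digest('hex');
}

/**
 * Walk events in chain order (timestamp ASC, event_id ASC) and report the
 * first event whose linkage or recomputed hash does not match.
 */
function verifyChain(events: AuditEvent[]): {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
} {
  let previousHash = ''; // empty-string sentinel before the first event
  let checkedCount = 0;
  for (const event of events) {
    checkedCount += 1;
    const { hash, ...rest } = event;
    if (event.previousHash !== previousHash || computeHash(rest) !== hash) {
      return { verified: false, checkedCount, brokenAtEventId: event.eventId };
    }
    previousHash = hash;
  }
  return { verified: true, checkedCount, brokenAtEventId: null };
}
```

Note that a single tampered row breaks verification at that row even though every later row still links correctly to its (tampered) predecessor's stored hash, because the recomputed hash of the tampered row no longer matches.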
---

## Running GET /audit/verify

### Full chain verification (no date range)

```bash
# Requires Bearer token with audit:read scope
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/audit/verify"
```

**Response (chain intact):**
```json
{
  "verified": true,
  "checkedCount": 18504,
  "brokenAtEventId": null
}
```

**Response (chain break detected):**
```json
{
  "verified": false,
  "checkedCount": 1203,
  "brokenAtEventId": "c4d5e6f7-a8b9-0123-cdef-456789012345"
}
```

### Date-ranged verification

```bash
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/audit/verify?fromDate=2026-03-01T00:00:00.000Z&toDate=2026-03-31T23:59:59.999Z"
```

### Interpreting the response

| Field | Meaning |
|---|---|
| `verified: true` | All events in the checked range maintain valid hash chain linkage |
| `verified: false` | At least one chain break detected — see `brokenAtEventId` |
| `checkedCount` | Number of events examined (0 = no events in range) |
| `brokenAtEventId` | UUID of the first event where the chain fails (`null` if verified) |
| `fromDate` / `toDate` | Echo of the date range parameters (only present if supplied) |

---

## AuditChainVerificationJob

The `AuditChainVerificationJob` runs automatically in the background every hour (default).
Configure the interval via `AUDIT_CHAIN_VERIFICATION_INTERVAL_MS` (milliseconds).

On each tick it calls `verifyChain()` and:
- Sets Prometheus gauge `agentidp_audit_chain_integrity` to **1** (passing)
- Updates `ComplianceStatusStore` with `CC7.2 = passing`

If verification fails:
- Sets gauge to **0**
- Updates `ComplianceStatusStore` with `CC7.2 = failing`
- Prometheus alert `AuditChainIntegrityFailed` fires immediately (severity: critical)
- Application logs: `[AuditChainVerificationJob] Chain BROKEN at event <uuid>`

---

## What to Do When `brokenAtEventId` is Returned

### Step 1: Preserve Evidence

Immediately capture the full state of the audit log for forensic analysis:

```sql
-- Export all events around the break point
SELECT event_id, timestamp, action, outcome, agent_id, organization_id, hash, previous_hash
FROM audit_events
WHERE timestamp >= (
  SELECT timestamp - INTERVAL '1 hour'
  FROM audit_events WHERE event_id = '<brokenAtEventId>'
)
ORDER BY timestamp ASC, event_id ASC;
```

Save the output to a secure, immutable location (e.g. S3 with object locking).

### Step 2: Identify the Break Type

Compare the recomputed hash for the broken event with its stored hash:

```bash
# Using Node.js
node -e "
const crypto = require('crypto');
const eventId = '<event_id>';
const timestamp = '<timestamp_from_db>';
const action = '<action>';
const outcome = '<outcome>';
const agentId = '<agent_id>';
const orgId = '<organization_id>';
const prevHash = '<previous_hash_from_db>';
const expected = crypto.createHash('sha256')
  .update(eventId + new Date(timestamp).toISOString() + action + outcome + agentId + orgId + prevHash)
  .digest('hex');
console.log('Expected hash:', expected);
console.log('Stored hash: <hash_from_db>');
console.log('Match:', expected === '<hash_from_db>');
"
```

Possible break types:
- **Hash mismatch only** — event data was modified after insertion
- **previous_hash mismatch** — an event was inserted/deleted before this event in the chain
- **Both mismatched** — multiple modifications or an injection attack

### Step 3: Escalate

A chain break is a **critical security incident**. Immediately:

1. Notify the security team and CISO
2. Engage incident response procedure (`docs/compliance/incident-response.md` — Audit Chain Integrity Failure section)
3. Do NOT attempt to "fix" the hash — preserve the broken state as evidence
4. Consider temporarily suspending API access pending investigation
5. Notify affected customers per data breach notification obligations

### Step 4: Forensic Investigation

Using PostgreSQL audit logs, Vault audit logs, and application logs:
- Identify which application process or database connection modified the row
- Correlate with access logs and authentication events
- Determine the extent of the compromise (single row vs. systematic)

---

## Verification Rate Limiting

`GET /audit/verify` is rate-limited to **30 requests/minute** per `client_id`.
For continuous monitoring, use `AuditChainVerificationJob` (background job, no rate limit)
and poll `GET /compliance/controls` instead.

---

## SOC 2 Evidence Package

For auditors, provide:

1. `GET /audit/verify` response (full chain, no date filter) — save as JSON
2. Prometheus metric export: `agentidp_audit_chain_integrity` time series (30/60/90 days)
3. PostgreSQL trigger definition: `\d+ audit_events` in psql
4. `src/db/migrations/020_add_audit_chain_columns.sql` — shows immutability trigger DDL
5. `docs/openapi/compliance.yaml` — endpoint specification
159
docs/compliance/encryption-runbook.md
Normal file
@@ -0,0 +1,159 @@
# Encryption Key Rotation Runbook — SentryAgent.ai AgentIdP

**Control:** SOC 2 CC6.1 — Encryption at Rest
**Service:** `src/services/EncryptionService.ts`
**Vault path:** Configured via `ENCRYPTION_KEY_VAULT_PATH` env var (default: `secret/data/agentidp/encryption-key`)

---

## Overview

AgentIdP uses AES-256-CBC column-level encryption for sensitive PostgreSQL columns.
The encryption key is a 64-character hex string (32 bytes) stored in HashiCorp Vault.
The `EncryptionService` fetches the key once and caches it in process memory.

Encrypted format: `base64(IV):base64(ciphertext)` where IV is 16 random bytes per encryption call.
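The round-trip implied by this format can be sketched with Node's `crypto` module. This is an illustration of the stored format only, not the actual `EncryptionService` implementation; the function names are hypothetical:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// keyHex is the 64-character hex string from Vault (32 raw bytes for AES-256).
function encrypt(plaintext: string, keyHex: string): string {
  const key = Buffer.from(keyHex, 'hex');
  const iv = randomBytes(16); // fresh 16-byte IV per call, stored alongside the ciphertext
  const cipher = createCipheriv('aes-256-cbc', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return `${iv.toString('base64')}:${ciphertext.toString('base64')}`;
}

function decrypt(stored: string, keyHex: string): string {
  const [ivB64, ctB64] = stored.split(':');
  const decipher = createDecipheriv(
    'aes-256-cbc',
    Buffer.from(keyHex, 'hex'),
    Buffer.from(ivB64, 'base64'),
  );
  return Buffer.concat([
    decipher.update(Buffer.from(ctB64, 'base64')),
    decipher.final(),
  ]).toString('utf8');
}
```

Because the IV is random per call, encrypting the same plaintext twice yields different stored values; decryption recovers the IV from the prefix before the `:` separator.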
|
||||
|
||||
---
|
||||
|
||||
## Key Rotation Procedure
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Access to HashiCorp Vault with write permissions to the encryption key path
|
||||
- Access to the production application environment (to trigger restart)
|
||||
- At least one backup of the current key stored securely offline
|
||||
|
||||
### Step 1: Generate a New Key
|
||||
|
||||
Generate a cryptographically strong 32-byte (64-character hex) key:
|
||||
|
||||
```bash
|
||||
openssl rand -hex 32
|
||||
# Example output: a1b2c3d4e5f6... (64 hex chars)
|
||||
```
|
||||
|
||||
Record the new key securely.
|
||||
|
||||
### Step 2: Backup the Current Key
|
||||
|
||||
Before overwriting, read and securely store the current key:
|
||||
|
||||
```bash
|
||||
vault kv get -field=encryptionKey secret/agentidp/encryption-key > /secure/backup/encryption-key-$(date +%Y%m%d).txt
|
||||
```
|
||||
|
||||
Store in a hardware security module (HSM) or offline key store.
|
||||
|
||||
### Step 3: Write the New Key to Vault
|
||||
|
||||
```bash
|
||||
vault kv put secret/agentidp/encryption-key encryptionKey="<new-64-char-hex-key>"
|
||||
```
|
||||
|
||||
Verify the write:
|
||||
|
||||
```bash
|
||||
vault kv get secret/agentidp/encryption-key
|
||||
```
|
||||
|
||||
Confirm the `encryptionKey` field contains exactly 64 hex characters.
|
||||
|
||||
### Step 4: Restart the Application
|
||||
|
||||
The `EncryptionService` caches the key in process memory. A restart forces a re-fetch from Vault:
|
||||
|
||||
```bash
|
||||
# Kubernetes rolling restart
|
||||
kubectl rollout restart deployment/agentidp
|
||||
|
||||
# Docker Compose
|
||||
docker-compose restart agentidp
|
||||
|
||||
# PM2
|
||||
pm2 restart agentidp
|
||||
```
|
||||
|
||||
### Step 5: Verify Key Pick-Up
|
||||
|
||||
Check the application logs for:
|
||||
|
||||
```
|
||||
[AgentIdP] EncryptionService enabled — sensitive columns encrypted at rest (SOC 2 CC6.1)
|
||||
```
|
||||
|
||||
Call the compliance controls endpoint to confirm the control is passing:
|
||||
|
||||
```bash
|
||||
curl -s https://api.sentryagent.ai/v1/compliance/controls | jq '.controls[] | select(.id == "CC6.1")'
|
||||
```
|
||||
|
||||
Expected output:
|
||||
```json
|
||||
{ "id": "CC6.1", "name": "Encryption at Rest", "status": "passing", "lastChecked": "..." }
|
||||
```
|
||||
|
||||
### Step 6: Re-encryption of Existing Rows
|
||||
|
||||
Existing rows encrypted with the old key will fail to decrypt after key rotation.
|
||||
Re-encryption happens lazily: the next time each row is read and re-written (e.g. credential rotation,
|
||||
webhook update), the application will decrypt with the old key and re-encrypt with the new one.
|
||||
|
||||
For immediate full re-encryption, use the re-encryption script:
|
||||
|
||||
```bash
|
||||
# Run the re-encryption migration script (reads old key from backup, encrypts with new key)
|
||||
# Note: This script requires both old and new keys to be available
|
||||
ts-node scripts/reencrypt-columns.ts --old-key-file /secure/backup/encryption-key-<date>.txt
|
||||
```
|
||||
|
||||
---

## Emergency Rollback

If the new key causes issues (e.g. test failures, decryption errors), roll back:

### Step 1: Restore Old Key to Vault

```bash
vault kv put secret/agentidp/encryption-key encryptionKey="<old-64-char-hex-key-from-backup>"
```

### Step 2: Restart the Application

```bash
kubectl rollout restart deployment/agentidp
```

### Step 3: Verify Recovery

```bash
curl -s https://api.sentryagent.ai/v1/compliance/controls | jq '.controls[] | select(.id == "CC6.1")'
```

### Step 4: Investigate Root Cause

Review application logs for `AES-256-CBC decryption failed` errors and audit the cause before
reattempting rotation.

## Troubleshooting

| Symptom | Likely Cause | Resolution |
|---|---|---|
| `Invalid encryption key ... expected a 64-character hex string` | Key in Vault is wrong length or encoding | Re-write correct key to Vault, restart |
| `AES-256-CBC decryption failed — possible key mismatch` | Key rotated but rows still encrypted with old key | Roll back to old key, then migrate properly |
| `CC6.1` status shows `unknown` | Vault unreachable, key fetch failed | Check Vault connectivity, `VAULT_ADDR`, `VAULT_TOKEN` |

---

## Audit Evidence

After rotation, record the following for SOC 2 evidence:

- Date of rotation
- Who performed the rotation (approver + executor)
- Vault audit log entry confirming the key write
- Application log confirming EncryptionService initialised with the new key
- `GET /compliance/controls` response showing CC6.1 = passing

229 docs/compliance/incident-response.md Normal file
@@ -0,0 +1,229 @@

# Incident Response Runbook — SentryAgent.ai AgentIdP

**Owner:** Security Engineering
**Last updated:** 2026-03-31
**Applies to:** Production AgentIdP deployments

This runbook covers the four incident types most relevant to SOC 2 Type II compliance monitoring.

---

## 1. Auth Failure Spike

### Detection

**Prometheus alert:** `AuthFailureSpike`
```yaml
expr: rate(agentidp_http_requests_total{status_code="401"}[5m]) > 0.5
for: 2m
severity: warning
```

Triggers when the rate of HTTP 401 responses exceeds 0.5 per second sustained over 2 minutes.

### Immediate Actions

1. Acknowledge the alert in PagerDuty / alerting system
2. Check whether the spike correlates with a scheduled process (e.g. batch agent key rotation, deployment)
3. Check the Prometheus dashboard for the geographic distribution of the failing requests

### Investigation Steps

1. **Identify source agents:**
   ```bash
   # Query audit log for recent auth failures
   curl -s -H "Authorization: Bearer <admin-token>" \
     "https://api.sentryagent.ai/v1/audit?action=auth.failed&limit=100"
   ```

2. **Check for brute-force patterns:**
   Look for repeated failures from the same `client_id` or IP address.

3. **Check if an agent's credentials expired:**
   ```bash
   # Look for expired credentials
   psql "$DATABASE_URL" -c "
     SELECT credential_id, client_id, expires_at
     FROM credentials
     WHERE status = 'active' AND expires_at < NOW()
     ORDER BY expires_at DESC LIMIT 20;"
   ```

4. **Check for key compromise signals:**
   - Multiple agents failing simultaneously → possible key store issue
   - Single agent with high failure rate → possible credential stuffing or misconfiguration

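To make the brute-force check in step 2 concrete, the audit response can be grouped per client with `jq`. A sketch that assumes the endpoint returns an `events` array with a `clientId` field (the real response shape may differ); a sample payload stands in for the curl above:

```shell
# Count auth failures per clientId, highest first
cat <<'EOF' | jq '[.events[].clientId] | group_by(.) | map({clientId: .[0], failures: length}) | sort_by(-.failures)'
{"events":[{"clientId":"agent-a"},{"clientId":"agent-b"},{"clientId":"agent-a"}]}
EOF
# → agent-a: 2 failures, agent-b: 1 failure
```
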
### Escalation Path

- **Warning (< 2 req/s):** Engineering on-call investigates within 1 hour
- **Critical (> 2 req/s sustained):** CISO notified, potential account compromise investigation
- **If credential compromise confirmed:** Revoke affected credentials immediately via `POST /agents/:id/credentials/:credId/revoke`

---

## 2. Anomalous Token Issuance

### Detection

**Prometheus alert:** `AnomalousTokenIssuance`
```yaml
expr: rate(agentidp_tokens_issued_total[5m]) > 10
for: 5m
severity: warning
```

Triggers when token issuance rate exceeds 10 per second for 5 continuous minutes.

### Immediate Actions

1. Acknowledge the alert
2. Determine if a legitimate mass-scale operation is underway (e.g. new customer onboarding, load test)
3. Check the `scope` label breakdown on `agentidp_tokens_issued_total` to identify what scopes are being requested

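The scope breakdown in step 3 can be read straight from Prometheus, for example (assuming the counter carries a `scope` label as described above):

```promql
# Token issuance rate per scope over the last 5 minutes
sum by (scope) (rate(agentidp_tokens_issued_total[5m]))

# Top 5 scopes by issuance rate
topk(5, sum by (scope) (rate(agentidp_tokens_issued_total[5m])))
```
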
### Investigation Steps

1. **Identify top issuing agents:**
   ```bash
   # Query audit log for recent token issuances
   curl -s -H "Authorization: Bearer <admin-token>" \
     "https://api.sentryagent.ai/v1/audit?action=token.issued&limit=100"
   ```

2. **Check monthly token budget:**
   Each agent is limited to 10,000 tokens/month (free tier). A single agent hitting the limit may indicate automation abuse.

3. **Check for abnormal scope combinations:**
   If tokens are being issued with `admin:orgs` or `audit:read` at high volume, this warrants immediate investigation.

4. **Check for a valid business reason:**
   Contact the organization owner for the top-issuing agents.

### Escalation Path

- **Warning:** Engineering on-call investigates within 4 hours
- **If compromise suspected:** Revoke affected agent tokens via the Redis revocation list, rotate credentials
- **If systematic abuse confirmed:** Suspend the issuing agent(s) via `PATCH /agents/:id` with `status: suspended`

---

## 3. Audit Chain Integrity Failure

### Detection

**Prometheus alert:** `AuditChainIntegrityFailed`
```yaml
expr: agentidp_audit_chain_integrity == 0
for: 0m
severity: critical
```

Fires immediately when `AuditChainVerificationJob` detects a break in the audit event hash chain.
This is a **CRITICAL** security event — possible evidence of log tampering.

### Immediate Actions

1. **Do NOT attempt to repair the broken chain** — preserve all evidence
2. Notify the CISO and security team immediately
3. Page the on-call security engineer with P0 priority
4. Capture the current state:
   ```bash
   curl -s -H "Authorization: Bearer <audit-token>" \
     "https://api.sentryagent.ai/v1/audit/verify" | tee /secure/incident-$(date +%Y%m%d-%H%M).json
   ```

### Investigation Steps

1. **Determine the broken event:**
   The `brokenAtEventId` field in the `/audit/verify` response identifies the first broken event.

2. **Forensic analysis:**
   Follow the steps in `docs/compliance/audit-log-runbook.md` — "What to Do When brokenAtEventId is Returned".

3. **Check database access logs:**
   Review PostgreSQL `pg_stat_activity` and connection logs for unauthorized direct DB access.

4. **Check application logs:**
   Look for any errors from the immutability trigger (`audit_events_immutable`).

5. **Check Vault audit logs:**
   Review whether any encryption key access was abnormal.

### Escalation Path

- **Immediate:** CISO + Legal + Security Engineering
- **Within 1 hour:** Begin forensic preservation per the incident response plan
- **Within 24 hours:** Determine scope of compromise and notification obligations
- **Customer notification:** Per contractual and regulatory obligations (GDPR, SOC 2 requirements)

---

## 4. Webhook Dead-Letter Accumulation

### Detection

**Prometheus alert:** `WebhookDeadLetterAccumulating`
```yaml
expr: increase(agentidp_webhook_dead_letters_total[1h]) > 10
for: 0m
severity: critical
```

Fires when more than 10 webhook deliveries reach dead-letter status within an hour.

### Immediate Actions

1. Acknowledge the alert
2. Check which `organization_id` labels are accumulating dead-letters:
   ```promql
   # Top organizations by dead-letter growth over the last hour
   topk(5, sum by (organization_id) (increase(agentidp_webhook_dead_letters_total[1h])))
   ```

3. Check if the destination endpoints are reachable:
   ```bash
   curl -I https://<webhook-destination-url>/
   ```

### Investigation Steps

1. **List affected webhook subscriptions:**
   ```bash
   # Query delivery records for dead-letter status
   psql "$DATABASE_URL" -c "
     SELECT s.id, s.organization_id, s.url, COUNT(d.id) AS dead_letters
     FROM webhook_subscriptions s
     JOIN webhook_deliveries d ON d.subscription_id = s.id
     WHERE d.status = 'dead_letter'
       AND d.updated_at > NOW() - INTERVAL '2 hours'
     GROUP BY s.id
     ORDER BY dead_letters DESC
     LIMIT 20;"
   ```

2. **Check delivery failure reasons:**
   ```bash
   psql "$DATABASE_URL" -c "
     SELECT http_status_code, COUNT(*) AS count
     FROM webhook_deliveries
     WHERE status = 'dead_letter'
       AND updated_at > NOW() - INTERVAL '2 hours'
     GROUP BY http_status_code;"
   ```

3. **Common causes and resolutions:**

   | HTTP Status | Likely Cause | Resolution |
   |---|---|---|
   | 0 / null | Network unreachable / DNS failure | Check recipient endpoint availability |
   | 401 / 403 | HMAC signature validation failing | Customer to verify HMAC secret |
   | 404 | Endpoint URL changed | Customer to update webhook URL |
   | 5xx | Recipient server error | Customer to investigate their endpoint |
   | Timeout | Slow recipient endpoint | Customer to optimize endpoint response time |

4. **Notify affected customers:**
   Contact the organization owner for high-volume dead-letter subscriptions.

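For the 401/403 row, a customer can recompute the signature locally to check their shared secret. A hedged sketch only: the exact signing scheme, header name, and payload canonicalization are assumptions here, not the documented webhook contract:

```shell
# Recompute an HMAC-SHA256 signature over the raw webhook payload
secret='whsec_example_secret'                # hypothetical shared secret
payload='{"event":"credential.rotated"}'     # raw request body, byte-for-byte

sig=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
echo "computed signature: $sig"
# Compare against the signature header sent with the delivery
```
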
### Escalation Path

- **Warning (10–50/hr):** Engineering notifies affected customers, investigates endpoint health
- **Critical (> 50/hr):** Engineering on-call + Platform reliability team engaged
- **If systemic delivery infrastructure failure:** Activate the incident bridge, escalate to VP Engineering

142 docs/compliance/secrets-rotation.md Normal file
@@ -0,0 +1,142 @@

# Secrets Rotation Runbook — SentryAgent.ai AgentIdP

**Control:** SOC 2 CC9.2 — Secrets Rotation
**Last updated:** 2026-03-31

---

## Overview

AgentIdP manages three categories of secrets that require periodic rotation:

1. **Agent client secrets** — Per-credential client secrets used for OAuth 2.0 token issuance
2. **OIDC signing keys** — RSA/EC keys used to sign ID tokens
3. **AES-256-CBC encryption key** — Column-level database encryption key (see `encryption-runbook.md`)

---

## 1. Agent Credential (Client Secret) Rotation

### API endpoint

```
POST /api/v1/agents/:agentId/credentials/:credentialId/rotate
```

Requires a Bearer token with the `agents:write` scope.

### Procedure

```bash
# 1. List active credentials for the agent
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/agents/<agentId>/credentials?status=active"

# 2. Rotate the credential (generate new secret)
curl -s -X POST \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"expiresAt": "2027-03-31T00:00:00.000Z"}' \
  "https://api.sentryagent.ai/v1/agents/<agentId>/credentials/<credentialId>/rotate"

# Response includes the new clientSecret — store it immediately; it is never shown again
```

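Since the secret is shown once only, it is worth capturing it straight from the rotate response. A sketch assuming the response carries a top-level `clientSecret` field (a sample payload stands in for the live call; field names are assumptions):

```shell
# Extract the one-time clientSecret from the rotate response and keep it out of logs
response='{"credentialId":"cred-123","clientSecret":"cs_example_value","expiresAt":"2027-03-31T00:00:00.000Z"}'

client_secret=$(printf '%s' "$response" | jq -r '.clientSecret')
printf '%s' "$client_secret" > /dev/null   # write to your secret store here, not to stdout
echo "captured secret of length ${#client_secret}"
```
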
### Key points

- The new `clientSecret` is returned **once only** — store it securely before the response is discarded
- The agent's previous secret is immediately invalidated (Vault KV v2 version overwritten)
- An audit event `credential.rotated` is logged to the immutable audit chain
- A `credential.rotated` webhook event is dispatched to all active subscriptions

### Recommended rotation schedule

| Credential type | Recommended rotation interval |
|---|---|
| Production agent credentials | 90 days |
| Staging / development credentials | 180 days |
| Service account credentials | 365 days (annual) |
| Credentials involved in a security incident | Immediately |

### Automated expiry detection

`SecretsRotationJob` runs hourly and queries for credentials expiring within 7 days.
The Prometheus alert `CredentialExpiryApproaching` fires immediately when any are detected.
Respond to this alert by rotating the flagged credential(s) before the expiry date.

---

## 2. OIDC Signing Key Rotation

### Overview

OIDC signing keys are managed by `OIDCKeyService` (`src/services/OIDCKeyService.ts`).
Keys are stored in the `oidc_keys` PostgreSQL table. The current active key is used to
sign all new ID tokens; public keys are exposed via `GET /.well-known/jwks.json`.

### When to rotate

- Key compromise or suspected exposure
- Scheduled rotation (recommended every 90 days for production)
- Algorithm upgrade (e.g. RS256 → ES256)

### Rotation procedure

OIDC key rotation is handled automatically by `OIDCKeyService.ensureCurrentKey()`:

```bash
# Force generation of a new signing key by calling the internal rotate endpoint
# (or trigger by redeploying with OIDC_FORCE_KEY_ROTATION=true)

# 1. Mark the current key as inactive (if manual rotation is required)
psql "$DATABASE_URL" -c "
  UPDATE oidc_keys
  SET active = false
  WHERE active = true;"

# 2. Restart the application — ensureCurrentKey() will generate a new key on startup
kubectl rollout restart deployment/agentidp
```

### JWKS update behavior

- Old public keys remain in `GET /.well-known/jwks.json` for **24 hours** after rotation
  (grace period for in-flight tokens)
- After the grace period, old keys are removed from the JWKS endpoint
- Redis JWKS cache TTL is configured by `JWKS_CACHE_TTL_SECONDS` (default: 3600)

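During the grace period you can confirm that both the old and new keys are still served. A sketch assuming the standard JWKS document shape; a sample document (with hypothetical `kid` values) stands in for piping the live endpoint through `jq`:

```shell
# List the key IDs currently served by the JWKS endpoint
# (against the live service: curl -s <service-url>/.well-known/jwks.json | jq -r '.keys[].kid')
cat <<'EOF' | jq -r '.keys[].kid'
{"keys":[{"kty":"RSA","kid":"key-2026-03","use":"sig"},{"kty":"RSA","kid":"key-2025-12","use":"sig"}]}
EOF
# → key-2026-03
#   key-2025-12
```
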
### Impact on existing tokens

Existing valid tokens signed with the old key **continue to work** until they expire,
as long as the old public key remains in JWKS. After the grace period, old tokens
will fail verification.

---

## 3. Encryption Key Rotation

See `docs/compliance/encryption-runbook.md` for the full AES-256-CBC encryption key rotation procedure.

**Summary:** Generate a new 32-byte hex key → write it to Vault at `ENCRYPTION_KEY_VAULT_PATH` → restart the app → existing rows are re-encrypted lazily on the next read-write cycle.

---

## Schedule Recommendations

| Secret Type | Production Interval | Staging Interval | Trigger for Immediate Rotation |
|---|---|---|---|
| Agent client secrets | 90 days | 180 days | Credential suspected compromised |
| OIDC signing keys | 90 days | 180 days | Key file exposed, algorithm upgrade |
| AES-256-CBC encryption key | 365 days (annual) | On demand | Key exposed, Vault breach, compliance audit requirement |
| Webhook HMAC secrets | Per customer policy | N/A | Webhook endpoint compromised |

---

## Compliance Evidence

For SOC 2 CC9.2 evidence collection:

- Prometheus metric history: `agentidp_credentials_expiring_soon_total`
- Audit log entries with `action: credential.rotated` — query via `GET /audit?action=credential.rotated`
- Key rotation records from the Vault audit log
- This runbook + sign-off from Security Engineering

42 docs/compliance/soc2-controls-matrix.md Normal file
@@ -0,0 +1,42 @@

# SOC 2 Type II Controls Matrix — SentryAgent.ai AgentIdP

This document maps the five in-scope SOC 2 Trust Services Criteria (TSC) controls to their
corresponding implementation artefacts, mechanisms, and automated verification methods.

---

## Controls Matrix

| Control ID | TSC Criterion Name | Implementation File | Mechanism | Automated Check |
|---|---|---|---|---|
| **CC6.1** | Encryption at Rest | `src/services/EncryptionService.ts` | AES-256-CBC column-level encryption on `credentials.secret_hash`, `credentials.vault_path`, `webhook_subscriptions.vault_secret_path`, `agent_did_keys.vault_key_path`. Key is stored in HashiCorp Vault KV v2 at path configured by `ENCRYPTION_KEY_VAULT_PATH`. IV is randomised per encryption call. Backward-compat: `isEncrypted()` gate allows plaintext rows to coexist during migration. | `GET /api/v1/compliance/controls` returns `CC6.1` status. Status is set to `passing` on service startup when `EncryptionService` initialises. |
| **CC6.7** | TLS Enforcement | `src/middleware/TLSEnforcementMiddleware.ts` | Express middleware registered as the **first** middleware in the app stack (before all routes and body parsers). In `NODE_ENV=production`, checks `X-Forwarded-Proto` header set by the upstream load balancer/reverse proxy. Any non-HTTPS request receives a `301 Moved Permanently` redirect to `https://`. | `GET /api/v1/compliance/controls` returns `CC6.7` status. TLS enforcement is a static configuration control; status is set to `passing` on application startup. |
| **CC7.2** | Audit Log Integrity | `src/services/AuditVerificationService.ts`, `src/repositories/AuditRepository.ts`, `src/jobs/AuditChainVerificationJob.ts` | Each audit event (`audit_events` table) stores a `hash` (SHA-256 of `eventId + timestamp + action + outcome + agentId + organizationId + previousHash`) and `previous_hash` linking it to the prior event. An immutability trigger prevents UPDATE/DELETE on `audit_events`. `AuditChainVerificationJob` re-walks the entire chain every hour. | Prometheus gauge `agentidp_audit_chain_integrity` (1 = passing, 0 = failing). Prometheus alert `AuditChainIntegrityFailed` fires when gauge = 0. `GET /api/v1/audit/verify` triggers an on-demand verification. `GET /api/v1/compliance/controls` returns `CC7.2` status. |
| **CC9.2** | Secrets Rotation | `src/jobs/SecretsRotationJob.ts` | `SecretsRotationJob` runs every hour (configurable via `SECRETS_ROTATION_CHECK_INTERVAL_MS`) and queries `credentials` for `active` credentials expiring within 7 days. For each, it increments the `agentidp_credentials_expiring_soon_total` Prometheus counter with the owning `agent_id`. Operators are expected to act on the alert within the 7-day window. | Prometheus counter `agentidp_credentials_expiring_soon_total` per `agent_id`. Prometheus alert `CredentialExpiryApproaching` fires when any increase is detected. `GET /api/v1/compliance/controls` returns `CC9.2` status. |
| **CC7.1** | Webhook Dead-Letter Monitoring | `src/workers/WebhookDeliveryWorker.ts` | `WebhookDeliveryWorker` processes webhook deliveries from a Redis queue. After exhausting all retry attempts (configurable `WEBHOOK_MAX_RETRIES`), the delivery is moved to dead-letter status and `agentidp_webhook_dead_letters_total` is incremented. | Prometheus counter `agentidp_webhook_dead_letters_total` per `organization_id`. Prometheus alert `WebhookDeadLetterAccumulating` fires when > 10 dead-letters accumulate in 1 hour. `GET /api/v1/compliance/controls` returns `CC7.1` status. |

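The CC7.2 hash chain can be reproduced with standard tooling. A minimal sketch, assuming the fields are concatenated in the order listed in the Mechanism column with no separators (the real `AuditVerificationService` may serialize differently); the event values and `GENESIS` seed are hypothetical:

```shell
# Recompute the chain hash for two consecutive audit events:
# hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
h1=$(printf '%s' 'evt-1' '2026-03-31T00:00:00Z' 'token.issued' 'success' 'agent-1' 'org-1' 'GENESIS' \
  | sha256sum | cut -d' ' -f1)

# The second event stores previous_hash = h1; editing event 1 changes h1 and breaks this link
h2=$(printf '%s' 'evt-2' '2026-03-31T00:01:00Z' 'auth.failed' 'failure' 'agent-2' 'org-1' "$h1" \
  | sha256sum | cut -d' ' -f1)

echo "event 1 hash: $h1"
echo "event 2 hash: $h2"
```
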
---

## Evidence Collection

For a SOC 2 Type II audit, the following evidence should be collected:

| Evidence Type | Collection Method |
|---|---|
| Encryption at rest configuration | Export Vault KV v2 policy + `_encryption_migration_log` table contents |
| TLS certificate and enforcement logs | Load balancer access logs + `X-Forwarded-Proto` middleware responses |
| Audit chain integrity report | `GET /api/v1/audit/verify` with full date range |
| Secrets rotation compliance | Prometheus metric history for `agentidp_credentials_expiring_soon_total` |
| Webhook dead-letter rate | Prometheus metric history for `agentidp_webhook_dead_letters_total` |
| Immutable audit log dump | Direct PostgreSQL export of `audit_events` table with hash verification |

---

## References

- SOC 2 Trust Services Criteria: [AICPA TSC 2017](https://www.aicpa.org/resources/article/trust-services-criteria)
- OpenAPI spec: `docs/openapi/compliance.yaml`
- Encryption runbook: `docs/compliance/encryption-runbook.md`
- Audit log runbook: `docs/compliance/audit-log-runbook.md`
- Incident response: `docs/compliance/incident-response.md`
- Secrets rotation: `docs/compliance/secrets-rotation.md`

603 docs/devops/deployment.md Normal file
@@ -0,0 +1,603 @@

# Deployment Guide — SentryAgent.ai AgentIdP

End-to-end guide for deploying AgentIdP to AWS (primary) and GCP (secondary) using the Terraform infrastructure-as-code in `terraform/`.

---

## Table of Contents

1. [Prerequisites](#1-prerequisites)
2. [AWS Deployment](#2-aws-deployment)
3. [GCP Deployment](#3-gcp-deployment)
4. [Post-Deploy Verification](#4-post-deploy-verification)
5. [Rollback Procedure](#5-rollback-procedure)
6. [Environment Variable Reference](#6-environment-variable-reference)

---

## 1. Prerequisites

### Tools

| Tool | Minimum Version | Install |
|------|-----------------|---------|
| Terraform | 1.6.0 | https://developer.hashicorp.com/terraform/install |
| AWS CLI | 2.13 | https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html |
| gcloud CLI | 460.0 | https://cloud.google.com/sdk/docs/install |
| Docker | 24.0 | Required only for building and pushing images |
| openssl | any | Required for generating JWT key pairs |

Verify all tools are available:

```bash
terraform version
aws --version
gcloud version
docker version
openssl version
```

### Container Image

Build and push the `sentryagent/agentidp` image to your registry before deploying. Terraform references the image by tag — it does not build it.

```bash
# From the project root
docker build -t sentryagent/agentidp:1.0.0 .

# Push to your registry (ECR example):
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

docker tag sentryagent/agentidp:1.0.0 \
  123456789012.dkr.ecr.us-east-1.amazonaws.com/sentryagent/agentidp:1.0.0

docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/sentryagent/agentidp:1.0.0
```

Update `app_image_tag` in your `terraform.tfvars` to match.

### JWT Key Pair

Generate the RSA-2048 key pair used for signing and verifying JWTs:

```bash
openssl genrsa -out jwt_private.pem 2048
openssl rsa -in jwt_private.pem -pubout -out jwt_public.pem

# Verify
openssl rsa -in jwt_private.pem -check -noout
```

Keep `jwt_private.pem` secure — treat it with the same sensitivity as a TLS private key. You will paste its contents into `terraform.tfvars`.

---

## 2. AWS Deployment

### 2.1 Configure AWS CLI

```bash
aws configure
# Provide: AWS Access Key ID, Secret Access Key, region (e.g. us-east-1), output format (json)

# Verify credentials
aws sts get-caller-identity
```

The IAM principal running Terraform requires permissions to manage: VPC, ECS, RDS, ElastiCache, ALB, IAM roles, Secrets Manager, Route 53, CloudWatch, and VPC endpoints.

### 2.2 Provision an ACM Certificate

The ALB requires an ACM certificate for your domain. Create it in the same region as your deployment.

```bash
aws acm request-certificate \
  --domain-name idp.example.com \
  --validation-method DNS \
  --region us-east-1
```

Complete DNS validation by adding the CNAME record shown in the ACM console. Wait for the status to become `ISSUED` before proceeding.

```bash
# Monitor validation status
aws acm describe-certificate \
  --certificate-arn arn:aws:acm:us-east-1:123456789012:certificate/XXXX \
  --region us-east-1 \
  --query 'Certificate.Status'
```

### 2.3 Prepare tfvars

```bash
cd terraform/environments/aws
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars`. All fields marked `REPLACE_WITH_*` are required. Key fields:

- `region` — AWS region (must match the ACM certificate region)
- `domain_name` — your domain (e.g. `idp.example.com`)
- `certificate_arn` — ARN from step 2.2
- `app_image_tag` — tag of the image you pushed in step 1
- `db_password` — strong random password (no `@`, `#`, `?`, `/` characters — they break URL parsing)
- `redis_auth_token` — minimum 16 characters, no spaces
- `jwt_private_key` — full PEM contents of `jwt_private.pem` with literal `\n` for newlines
- `jwt_public_key` — full PEM contents of `jwt_public.pem` with literal `\n` for newlines

Example for encoding PEM keys in tfvars:

```bash
# Output the private key as a single line with \n separators (for pasting into tfvars)
awk 'NF {printf "%s\\n", $0}' jwt_private.pem
```

**Never commit `terraform.tfvars` to version control.**

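One way to generate values that satisfy the `db_password` and `redis_auth_token` constraints above is hex output, which is alphanumeric only (a sketch; any generator that avoids the listed characters works):

```shell
# 48-char password: hex output contains no @ # ? / characters
db_password=$(openssl rand -hex 24)

# 32-char token: well over the 16-character minimum, no spaces
redis_auth_token=$(openssl rand -hex 16)

echo "db_password length: ${#db_password}"
echo "redis_auth_token length: ${#redis_auth_token}"
```
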
### 2.4 Configure Remote State (Recommended)

Uncomment and configure the `backend "s3"` block in `terraform/environments/aws/main.tf`:

```hcl
backend "s3" {
  bucket         = "your-terraform-state-bucket"
  key            = "agentidp/aws/production/terraform.tfstate"
  region         = "us-east-1"
  encrypt        = true
  dynamodb_table = "your-terraform-locks-table"
}
```

Create the S3 bucket and DynamoDB table if they do not exist:

```bash
# S3 bucket with versioning and encryption
aws s3api create-bucket --bucket your-terraform-state-bucket --region us-east-1
aws s3api put-bucket-versioning \
  --bucket your-terraform-state-bucket \
  --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption \
  --bucket your-terraform-state-bucket \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# DynamoDB table for state locking
aws dynamodb create-table \
  --table-name your-terraform-locks-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1
```

### 2.5 Terraform Init

```bash
cd terraform/environments/aws
terraform init
```

Expected output: provider plugins downloaded, backend initialized.

### 2.6 Terraform Plan

```bash
terraform plan -out=tfplan
```

Review the plan carefully before applying. Expected resources on first apply: ~50–60 resources (VPC, subnets, NAT gateways, VPC endpoints, IAM roles, secrets, RDS, ElastiCache, ALB, ECS cluster, task definition, service, Route 53 record).

### 2.7 Terraform Apply

```bash
terraform apply tfplan
```

**First apply takes 20–30 minutes** — RDS Multi-AZ provisioning is the longest step (~15 min). Do not interrupt the apply.

When complete, note the outputs:

```bash
terraform output
```

Key outputs:
- `service_url` — the HTTPS URL of your deployed service
- `alb_dns_name` — ALB DNS name (verify the Route 53 alias points here)
- `ecs_service_name` — use for ECS deployment commands
- `cloudwatch_log_group` — where container logs appear

### 2.8 Run Database Migrations

After the first deploy, run migrations against the new RDS instance. The easiest approach is to exec into a running ECS task:

```bash
# Get a running task ARN
TASK_ARN=$(aws ecs list-tasks \
  --cluster sentryagent-agentidp-production \
  --service-name sentryagent-agentidp-production \
  --query 'taskArns[0]' \
  --output text)

# Run migrations via ECS Exec (requires enableExecuteCommand on the service)
aws ecs execute-command \
  --cluster sentryagent-agentidp-production \
  --task $TASK_ARN \
  --container agentidp \
  --command "node scripts/db-migrate.js" \
  --interactive
```

Alternatively, run a one-off ECS task with the migration command as the container override.

---

## 3. GCP Deployment

### 3.1 Configure gcloud CLI

```bash
gcloud auth login
gcloud config set project your-gcp-project-id
gcloud auth application-default login
```

Verify:

```bash
gcloud config list
gcloud projects describe your-gcp-project-id
```

The principal running Terraform requires the following roles on the project:
- `roles/owner`, or a custom role covering: Cloud Run Admin, Cloud SQL Admin, Redis Admin, Secret Manager Admin, IAM Admin, Compute Admin, Service Networking Admin.

### 3.2 Prepare tfvars

```bash
cd terraform/environments/gcp
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars`. Key fields:

- `project_id` — your GCP project ID
- `region` — GCP region (e.g. `us-central1`)
- `app_image_tag` — tag of the image you built
- `db_password` — strong random password for Cloud SQL
- `jwt_private_key` / `jwt_public_key` — the same PEM keys used for AWS (same key pair for both regions)

**Never commit `terraform.tfvars` to version control.**

### 3.3 Configure Remote State (Recommended)
|
||||
|
||||
Uncomment and configure the `backend "gcs"` block in `terraform/environments/gcp/main.tf`:
|
||||
|
||||
```hcl
|
||||
backend "gcs" {
|
||||
bucket = "your-terraform-state-bucket"
|
||||
prefix = "agentidp/gcp/production"
|
||||
}
|
||||
```
|
||||
|
||||
Create the GCS bucket:
|
||||
|
||||
```bash
|
||||
gsutil mb -l us-central1 gs://your-terraform-state-bucket
|
||||
gsutil versioning set on gs://your-terraform-state-bucket
|
||||
```

### 3.4 Terraform Init

```bash
cd terraform/environments/gcp
terraform init
```

### 3.5 Terraform Plan

```bash
terraform plan -out=tfplan
```

Review the plan. Expect roughly 35–45 resources (VPC, subnet, VPC connector, service accounts, secrets, Cloud SQL, Memorystore, Cloud Run service, IAM bindings, API enablement).

### 3.6 Terraform Apply

```bash
terraform apply tfplan
```

**First apply takes 15–20 minutes** — Cloud SQL provisioning is the longest step.

When complete:

```bash
terraform output
```

Key outputs:

- `service_url` — Cloud Run HTTPS URL (Google-managed TLS, no cert setup required)
- `cloud_sql_connection_name` — for Cloud SQL Proxy if needed
- `memorystore_host` — Redis private IP

### 3.7 Run Database Migrations

Cloud Run does not support exec into a running instance. Use a one-off Cloud Run Job for migrations:

```bash
gcloud run jobs create agentidp-migrate \
  --image sentryagent/agentidp:1.0.0 \
  --region us-central1 \
  --command node \
  --args "scripts/db-migrate.js" \
  --set-secrets "DATABASE_URL=sentryagent-agentidp-production-database-url:latest" \
  --vpc-connector sentryagent-agentidp-production-connector \
  --service-account sentryagent-agentidp-production-run-sa@your-gcp-project-id.iam.gserviceaccount.com

gcloud run jobs execute agentidp-migrate --region us-central1 --wait
```

---

## 4. Post-Deploy Verification

Run these checks after deploying to either environment. Replace `https://idp.example.com` with your actual service URL.

### 4.1 Health Check

```bash
curl -si https://idp.example.com/health
```

Expected response:

```
HTTP/2 200
content-type: application/json

{"status":"ok"}
```

If you receive a 502 or 503, the load balancer has not yet registered healthy targets. Wait 60–90 seconds and retry — ECS tasks and Cloud Run instances take time to pass health checks.

### 4.2 Metrics Endpoint

```bash
curl -si https://idp.example.com/metrics
```

Expected: HTTP 200 with Prometheus-format metrics text (lines beginning with `# HELP`, `# TYPE`, and metric values).

### 4.3 Token Endpoint (Smoke Test)

First, register a test agent client (requires a valid JWT or admin credentials — see the [developers guide](../developers/)):

```bash
# Issue a client credentials token (replace CLIENT_ID and CLIENT_SECRET with real values)
curl -s -X POST https://idp.example.com/api/v1/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=test-client&client_secret=test-secret&scope=read"
```

Expected response (abbreviated):

```json
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "read"
}
```
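The `access_token` is a standard three-part JWT (`header.payload.signature`, base64url-encoded). To inspect its claims without verifying the signature — `jq` is assumed to be installed:

```shell
TOKEN="PASTE_ACCESS_TOKEN_HERE"

# Extract the payload segment, convert base64url to base64, re-pad, and decode
seg=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
printf '%s' "$seg" | base64 -d | jq .
```

This is for inspection only; never skip signature verification in application code.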

### 4.4 JWKS Endpoint

```bash
curl -si https://idp.example.com/.well-known/jwks.json
```

Expected: HTTP 200 with a JSON object containing a `keys` array with at least one RSA public key entry.
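A scriptable variant of the same check (`jq` assumed installed; `-e` makes the exit code reflect the assertion, which is convenient in CI):

```shell
curl -s https://idp.example.com/.well-known/jwks.json | jq -e '.keys[0].kty == "RSA"'
```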

### 4.5 TLS Verification

```bash
# Verify the TLS certificate is valid and matches your domain
curl -vI https://idp.example.com 2>&1 | grep -E "(SSL|TLS|certificate|issuer|subject)"
```

Expected: TLS 1.2 or 1.3, a certificate issued by a trusted CA, and a subject matching your domain.

### 4.6 AWS-Specific: ECS Service Status

```bash
aws ecs describe-services \
  --cluster sentryagent-agentidp-production \
  --services sentryagent-agentidp-production \
  --query 'services[0].{desired:desiredCount,running:runningCount,pending:pendingCount,status:status}'
```

Expected: `running` equals `desired`, and `status` is `ACTIVE`.

### 4.7 GCP-Specific: Cloud Run Service Status

```bash
gcloud run services describe sentryagent-agentidp-production \
  --region us-central1 \
  --format='value(status.conditions[0].type,status.conditions[0].status)'
```

Expected: `Ready True`.

---

## 5. Rollback Procedure

### 5.1 Image Rollback (Recommended — fastest)

To roll back to a previous image tag without modifying infrastructure:

**AWS:**

```bash
# Find the previous task definition revision
aws ecs list-task-definitions \
  --family-prefix sentryagent-agentidp-production \
  --sort DESC \
  --query 'taskDefinitionArns[:5]'

# Update the service to use the previous task definition
aws ecs update-service \
  --cluster sentryagent-agentidp-production \
  --service sentryagent-agentidp-production \
  --task-definition sentryagent-agentidp-production:PREVIOUS_REVISION \
  --force-new-deployment

# Monitor the rollout
aws ecs wait services-stable \
  --cluster sentryagent-agentidp-production \
  --services sentryagent-agentidp-production
```

**GCP:**

```bash
# Deploy the previous image tag directly
gcloud run services update sentryagent-agentidp-production \
  --region us-central1 \
  --image sentryagent/agentidp:PREVIOUS_TAG

# Or route 100% of traffic to a specific revision
gcloud run services update-traffic sentryagent-agentidp-production \
  --region us-central1 \
  --to-revisions PREVIOUS_REVISION_NAME=100
```

### 5.2 Infrastructure Rollback via Terraform

If an infrastructure change (not an image update) caused the problem:

```bash
# Check the state and plan to understand what changed
terraform show
terraform plan

# If you have a previous state file (S3/GCS versioning), restore it:
# AWS:
aws s3 cp s3://your-state-bucket/agentidp/aws/production/terraform.tfstate.PREVIOUS ./terraform.tfstate
terraform apply -target=<affected_resource>

# GCP:
gsutil cp gs://your-state-bucket/agentidp/gcp/production/PREVIOUS_VERSION ./terraform.tfstate
terraform apply -target=<affected_resource>
```

**Never run `terraform destroy` in production without CEO approval.**

### 5.3 Database Rollback

RDS (AWS) and Cloud SQL (GCP) both support point-in-time restore. Use this only as a last resort — it creates a new DB instance and requires updating the `DATABASE_URL` secret.

**AWS:**

```bash
# Restore to a point before the problematic deployment
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier sentryagent-agentidp-production \
  --target-db-instance-identifier sentryagent-agentidp-production-restored \
  --restore-time 2026-01-01T12:00:00Z
```

**GCP:**

```bash
# List available backups
gcloud sql backups list --instance sentryagent-agentidp-production-pg14

# Restore from a backup
gcloud sql backups restore BACKUP_ID \
  --restore-instance sentryagent-agentidp-production-pg14
```

---

## 6. Environment Variable Reference

All environment variables injected into the AgentIdP container are documented in full at:

**[docs/devops/environment-variables.md](./environment-variables.md)**

### Quick Reference

| Variable | Required | Source (AWS) | Source (GCP) |
|----------|----------|--------------|--------------|
| `DATABASE_URL` | Yes | Secrets Manager: `/<project>/<env>/database-url` | Secret Manager: `<name-prefix>-database-url` |
| `REDIS_URL` | Yes | Secrets Manager: `/<project>/<env>/redis-url` | Secret Manager: `<name-prefix>-redis-url` |
| `JWT_PRIVATE_KEY` | Yes | Secrets Manager: `/<project>/<env>/jwt-private-key` | Secret Manager: `<name-prefix>-jwt-private-key` |
| `JWT_PUBLIC_KEY` | Yes | Secrets Manager: `/<project>/<env>/jwt-public-key` | Secret Manager: `<name-prefix>-jwt-public-key` |
| `PORT` | No | Task definition env var (default: 3000) | Cloud Run env var (default: 3000) |
| `NODE_ENV` | No | Task definition env var (`production`) | Cloud Run env var (`production`) |
| `CORS_ORIGIN` | No | Task definition env var | Cloud Run env var |
| `POLICY_DIR` | No | Task definition env var (`/app/policies`) | Cloud Run env var (`/app/policies`) |
| `VAULT_ADDR` | No | Task definition env var | Cloud Run env var |
| `VAULT_TOKEN` | No | Secrets Manager: `/<project>/<env>/vault-token` | Secret Manager: `<name-prefix>-vault-token` |
| `VAULT_MOUNT` | No | Task definition env var (default: `secret`) | Cloud Run env var (default: `secret`) |
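To confirm a secret exists and is readable before a deploy, the following sketch can help — secret names follow the table's conventions, so adjust them to your project prefix:

```shell
# AWS — confirm the secret exists and has a current version
aws secretsmanager describe-secret \
  --secret-id /sentryagent-agentidp/production/database-url

# GCP — confirm the latest version is accessible (value discarded)
gcloud secrets versions access latest \
  --secret sentryagent-agentidp-production-database-url >/dev/null && echo ok
```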

### Updating a Secret

**AWS:**

```bash
# Update a secret value (e.g. rotate JWT keys)
aws secretsmanager put-secret-value \
  --secret-id /sentryagent-agentidp/production/jwt-private-key \
  --secret-string "$(cat new_jwt_private.pem)"

# Force a new ECS deployment to pick up the new secret value
aws ecs update-service \
  --cluster sentryagent-agentidp-production \
  --service sentryagent-agentidp-production \
  --force-new-deployment
```

**GCP:**

```bash
# Add a new version of the secret
gcloud secrets versions add sentryagent-agentidp-production-jwt-private-key \
  --data-file=new_jwt_private.pem

# Deploy a new Cloud Run revision to pick up the latest secret version
gcloud run services update sentryagent-agentidp-production \
  --region us-central1 \
  --image sentryagent/agentidp:CURRENT_TAG
```

---

## Architecture Summary

### AWS

```
Route 53 (A alias)
└── ALB (public subnets, HTTPS/443, ACM cert, HTTP→HTTPS redirect)
    └── Target Group
        └── ECS Fargate Service (private subnets, 2+ tasks)
            ├── Secrets Manager (DATABASE_URL, REDIS_URL, JWT keys)
            ├── RDS PostgreSQL 14 (private subnets, Multi-AZ, encrypted)
            └── ElastiCache Redis 7 (private subnets, primary+replica, TLS)
```

### GCP

```
Internet → Cloud Run Service (Google-managed TLS, auto-scaling)
    ├── Secret Manager (DATABASE_URL, REDIS_URL, JWT keys)
    └── Serverless VPC Connector
        ├── Cloud SQL PostgreSQL 14 (private IP, REGIONAL HA)
        └── Memorystore Redis 7 (STANDARD_HA, TLS)
```

Both environments share the same Docker image (`sentryagent/agentidp`) and the same JWT key pair — tokens issued in one region are verifiable in the other.
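One way to spot-check that claim — the hostnames here are illustrative, and the introspection call assumes the caller authenticates with its own bearer token (confirm the exact auth requirements in the developers guide):

```shell
# Issue a token against the AWS deployment...
TOKEN=$(curl -s -X POST https://idp-aws.example.com/api/v1/token \
  -d "grant_type=client_credentials&client_id=test-client&client_secret=test-secret" \
  | jq -r '.access_token')

# ...then introspect it against the GCP deployment (RFC 7662: check the "active" field)
curl -s -X POST https://idp-gcp.example.com/api/v1/token/introspect \
  -H "Authorization: Bearer $TOKEN" \
  -d "token=$TOKEN" | jq '.active'
```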

@@ -117,6 +117,21 @@ KV v2 secrets engine mount path.

---

### `POLICY_DIR`

Directory containing OPA policy files (`authz.rego`, `authz.wasm`, `data/scopes.json`).

| | |
|-|-|
| **Required** | No |
| **Default** | `<cwd>/policies` |
| **Format** | Absolute or relative directory path |
| **Example** | `POLICY_DIR=/etc/sentryagent/policies` |

At startup the OPA authorization middleware loads `${POLICY_DIR}/authz.wasm` (Wasm mode) if present; otherwise it loads `${POLICY_DIR}/data/scopes.json` (fallback mode). Send `SIGHUP` to the process to hot-reload the policy files without a restart.
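For example, after editing the files in `POLICY_DIR` on a host where the server runs as `node dist/server.js` — the process-match pattern below is illustrative, adjust it to your entry point:

```shell
# Trigger a policy hot-reload without dropping in-flight requests
kill -HUP "$(pgrep -f 'node dist/server.js')"
```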

---

### `PORT`

HTTP port the Express server listens on.

@@ -187,6 +202,9 @@ MIIBIjANBgkq...
# VAULT_ADDR=http://127.0.0.1:8200
# VAULT_TOKEN=hvs.XXXXXXXXXXXXXXXXXXXXXX
# VAULT_MOUNT=secret

# OPA Policy Engine (Phase 2 — optional, defaults to <cwd>/policies)
# POLICY_DIR=/etc/sentryagent/policies
```

> Do not commit `.env` to version control. Add it to `.gitignore`.

@@ -247,3 +247,38 @@ docker-compose exec redis redis-cli GET "rate:<client_id>:$WINDOW"
```

**Fix:** Wait until `X-RateLimit-Reset` (a Unix timestamp in the response header) before retrying. The window resets every 60 seconds.
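A minimal retry sketch honouring that header — the endpoint and credentials are the same illustrative values used elsewhere in this guide:

```shell
# Capture response headers, read X-RateLimit-Reset, and sleep until the window reopens
headers=$(curl -s -D - -o /dev/null -X POST https://idp.example.com/api/v1/token \
  -d "grant_type=client_credentials&client_id=test-client&client_secret=test-secret")
reset=$(printf '%s\n' "$headers" | tr -d '\r' | awk 'tolower($1)=="x-ratelimit-reset:" {print $2}')
now=$(date +%s)
[ -n "$reset" ] && [ "$reset" -gt "$now" ] && sleep $(( reset - now ))
```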

---

## Monitoring

AgentIdP exposes a Prometheus metrics endpoint at `GET /metrics` (unauthenticated, plain text).

### Metrics Exposed

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued successfully |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered successfully |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests received |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |

### Starting the Monitoring Stack

```bash
# Start the full stack with monitoring
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d

# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3001 (admin / agentidp)
```

The Grafana dashboard auto-provisions on first start. Navigate to **Dashboards → AgentIdP → SentryAgent.ai — AgentIdP**.

### Security Note

`GET /metrics` is unauthenticated. In production, ensure this endpoint is:

- Only accessible from your internal network (firewall rule or reverse proxy restriction)
- Not exposed on a public-facing port

docs/engineering/01-overview.md (new file, 115 lines)
@@ -0,0 +1,115 @@

# SentryAgent.ai — Company and Product Overview

---

## 1. Company Mission

SentryAgent.ai is building the world's first free, open-source Agent Identity Provider (AgentIdP) — democratising AI agent authentication, authorisation, and governance for developers worldwide. The core problem it solves is one that did not have a standard answer until now: when an AI agent needs to call an API, how does it prove who it is? How does it obtain a short-lived token? How does a security team revoke its access the moment it is compromised? How does a compliance team obtain a full, tamper-proof record of everything that agent ever did? Traditional identity infrastructure — built for humans and static service accounts — was not designed for the fluid lifecycle of AI agents.

AgentIdP is the answer. It is a REST API server that acts as an identity provider designed specifically for non-human AI agents. Agents register with a stable UUID identity, authenticate via the OAuth 2.0 Client Credentials grant (RFC 6749), receive short-lived RS256 JWTs, and are governed by an OPA policy engine that enforces capability-based access control at runtime. Every significant event is written to an immutable audit log. The entire system is built on open standards: OAuth 2.0, RFC 7662 (introspection), RFC 7009 (revocation), OpenAPI 3.0, and the AGNTCY interoperability standard from the Linux Foundation.

The market context is one of rapid proliferation. Enterprises are deploying dozens, then hundreds, then thousands of autonomous AI agents — each one acting on behalf of the organisation, calling APIs, reading sensitive data, and making decisions. Without standardised identity infrastructure, there is no way to audit what happened, no way to revoke a compromised agent cleanly, and no standard protocol for agents from different vendors to authenticate to each other. SentryAgent.ai fills this gap, providing every developer — from a student working alone to a global enterprise — the same enterprise-grade identity layer for free.

---

## 2. What is AGNTCY?

AGNTCY (pronounced "agency") is an open interoperability standard for AI agents, maintained under the Linux Foundation. Its central premise is that AI agents must be treated as first-class identities — with stable identifiers, standard authentication protocols, lifecycle management, and accountability mechanisms — in the same way that human users and cloud service accounts are today.

AgentIdP is the first production IdP implementing AGNTCY-aligned agent identity across all six AGNTCY domains:

| AGNTCY Domain | How AgentIdP Implements It |
|---------------|---------------------------|
| Non-Human Identity | Every agent receives an immutable UUID (`agentId`) assigned at registration. The identifier is DID-ready — structured to be portable into W3C DID documents in Phase 3. |
| Agent Registry | `POST /api/v1/agents` registers an agent. `GET /api/v1/agents` lists all agents. `GET /api/v1/agents/:id` retrieves a single agent by UUID. |
| Credential Management | Each agent holds one or more `(client_id, client_secret)` credential pairs. Secrets are bcrypt-hashed in PostgreSQL or stored in HashiCorp Vault KV v2. Credentials can be rotated and revoked independently. |
| Authentication | OAuth 2.0 Client Credentials grant per RFC 6749. Agents POST `grant_type=client_credentials` with their `client_id` and `client_secret` to receive a signed RS256 JWT. |
| Lifecycle | Agents transition through `active`, `suspended`, and `decommissioned` states. Decommissioning is a soft delete that cascades to revoke all active credentials. Suspended agents cannot obtain new tokens. |
| Audit | Every significant platform event is written to an immutable `audit_events` table in PostgreSQL. Events carry `agentId`, `action`, `outcome`, `ipAddress`, `userAgent`, `metadata`, and `timestamp`. The free tier retains 90 days of history. |

---

## 3. Product Features

| Feature | Endpoint(s) | Notes |
|---------|-------------|-------|
| Agent Registry | `POST /api/v1/agents`, `GET /api/v1/agents`, `GET /api/v1/agents/:id`, `PATCH /api/v1/agents/:id`, `DELETE /api/v1/agents/:id` | Full CRUD with lifecycle; free tier capped at 100 registered agents |
| OAuth 2.0 Token Issuance | `POST /api/v1/token` | Client Credentials flow (RFC 6749); issues RS256 JWTs; free tier capped at 10,000 tokens/month |
| Token Introspection | `POST /api/v1/token/introspect` | RFC 7662 compliant; always returns 200, check `active` field |
| Token Revocation | `POST /api/v1/token/revoke` | RFC 7009 compliant; idempotent; agents may only revoke their own tokens |
| Credential Management | `POST /api/v1/agents/:id/credentials`, `GET /api/v1/agents/:id/credentials`, `DELETE /api/v1/agents/:id/credentials/:credId` | `client_secret` returned once on creation; never retrievable again |
| Credential Rotation | `POST /api/v1/agents/:id/credentials/:credId/rotate` | Generates new secret; old secret immediately invalidated; atomic |
| Audit Log | `GET /api/v1/audit`, `GET /api/v1/audit/:id` | Immutable, filterable by `agentId`, `action`, `outcome`, date range; paginated |
| Web Dashboard | `/dashboard` | React 18 SPA — agents list, agent detail, credentials management, audit log, health views |
| OPA Policy Engine | (middleware on all protected routes) | Dynamic scope-based authorisation; Rego policy in `policies/authz.rego`; hot-reload via SIGHUP |
| Prometheus Metrics | `GET /metrics` | prom-client; all HTTP routes instrumented with request counter and duration histogram |
| HashiCorp Vault | (opt-in, via `VAULT_ADDR` + `VAULT_TOKEN`) | KV v2 secret storage; constant-time comparison; bcrypt fallback when Vault is not configured |
| Health Check | `GET /health` | Checks PostgreSQL and Redis connectivity; unauthenticated; used by load balancers |

---

## 4. Phase Roadmap

| Phase | Status | Key Deliverables |
|-------|--------|-----------------|
| Phase 1 — MVP | COMPLETE | Agent registry, OAuth 2.0 Client Credentials (RS256 JWTs), credential management (bcrypt), immutable audit log, Node.js SDK, Dockerfile, Docker Compose, AGNTCY alignment documentation, >80% test coverage |
| Phase 2 — Production-Ready | COMPLETE | HashiCorp Vault opt-in integration, Python SDK (sync + async), Go SDK (context-aware), Java SDK (builder + CompletableFuture), OPA policy engine (Rego + Wasm + TypeScript fallback), React 18 + Vite 5 web dashboard, Prometheus metrics + Grafana dashboards, Terraform multi-region deployment (AWS ECS + RDS + ElastiCache; GCP Cloud Run + Cloud SQL + Memorystore) |
| Phase 3 — Enterprise | PLANNED | AGNTCY federation (cross-IdP agent identity), W3C Decentralised Identifiers (DIDs), agent marketplace, advanced compliance reporting, SOC 2 Type II certification, enterprise tier (custom retention, SLAs, advanced RBAC) |

---

## 5. Virtual Engineering Team

SentryAgent.ai uses a Virtual Engineering Team (VET) model — all engineering work is designed, implemented, tested, and reviewed by Claude Code instances fulfilling defined engineering roles. The CEO (human) is the sole business decision-maker. The CTO (Claude) owns technical architecture and manages the engineering team autonomously. The team follows a strict spec-first workflow governed by the OpenSpec change management process: no implementation begins until an OpenAPI specification is approved by the CTO.

| Role | Responsibility | Approval Gate |
|------|---------------|---------------|
| CEO | Business priorities, scope approval, architecture approval | All scope changes, new dependencies, git push to main |
| Virtual CTO | Architecture, technical standards, engineering team coordination, risk management | Reports to CEO; approves all implementation before commit; approves all QA sign-offs before merge |
| Virtual Architect | OpenAPI specs, ADRs, system design, database schemas | CTO review required before implementation begins |
| Virtual Principal Developer | TypeScript implementation per approved spec; JSDoc; zero `any` types | CTO review required before QA begins |
| Virtual QA Engineer | Jest + Supertest test suites; >80% coverage; all quality gates | All gates must pass before CTO signs off for merge |

---

## 6. Free Tier Limits

| Limit | Value |
|-------|-------|
| Max agents | 100 |
| Max credentials per agent | No hard cap enforced in code (5 is the documented recommendation) |
| Max tokens issued | 10,000 per agent per calendar month |
| Token TTL | 3,600 seconds (1 hour) |
| Audit log retention | 90 days |
| API rate limit | 100 requests per minute per IP address |

docs/engineering/02-architecture.md (new file, 140 lines)
@@ -0,0 +1,140 @@

# System Architecture

---

## 1. Component Diagram

```mermaid
graph TD
    Client["Client (AI Agent / Browser / CI)"]

    Client -->|HTTPS| ExpressApp["Express App (AgentIdP)"]

    subgraph ExpressApp["Express App — src/app.ts"]
        Router["Router (src/routes/)"]
        AuthMW["authMiddleware (src/middleware/auth.ts)"]
        OpaMW["opaMiddleware (src/middleware/opa.ts)"]
        Controller["Controller (src/controllers/)"]
        Service["Service (src/services/)"]
        Repository["Repository (src/repositories/)"]
        Router --> AuthMW --> OpaMW --> Controller --> Service --> Repository
    end

    Repository -->|parameterized SQL| PG["PostgreSQL 14\n(agents, credentials, audit_events, token_revocations)"]
    Service -->|Redis commands| Redis["Redis 7\n(token revocation list, monthly counts, rate-limit counters)"]
    Service -->|KV v2 read/write| Vault["HashiCorp Vault\n(opt-in — when VAULT_ADDR is set)"]

    ExpressApp -->|evaluate input| OPA["OPA Policy Engine\n(policies/authz.rego + data/scopes.json)"]
    ExpressApp -->|expose| Metrics["/metrics (prom-client)"]

    Dashboard["Dashboard SPA (React 18 + Vite 5)\ndashboard/dist/ served from /dashboard"]
    Client -->|browser| Dashboard
    Dashboard -->|REST API calls| ExpressApp

    Grafana["Grafana (port 3001)"] -->|scrapes| Metrics
```

---

## 2. HTTP Request Lifecycle

Every authenticated API request travels through the following sequence. Understanding this sequence end-to-end is essential for debugging and for writing new endpoints correctly.

1. HTTP request arrives at the Node.js HTTP listener — configured in `src/server.ts`, which calls `app.listen(PORT)` after `createApp()` resolves.
2. App-level middleware runs in registration order: `helmet()` sets security headers, `cors()` applies CORS policy from `CORS_ORIGIN`, `morgan('combined')` logs the request line (skipped in `NODE_ENV=test`), `express.json()` and `express.urlencoded()` parse the body, `metricsMiddleware` (`src/middleware/metrics.ts`) starts the request timer and records `agentidp_http_requests_total` and `agentidp_http_request_duration_seconds` on response finish.
3. The Express router matches the path to a route definition in `src/routes/*.ts` and hands off to the appropriate middleware chain.
4. `authMiddleware` (`src/middleware/auth.ts`) validates the Bearer JWT: extracts the token from the `Authorization` header, calls `verifyToken()` for RS256 signature and expiry, then calls `redis.get('revoked:{jti}')` to check the revocation list. On success, attaches the decoded `ITokenPayload` to `req.user`.
5. `opaMiddleware` (`src/middleware/opa.ts`) evaluates the OPA policy: builds an `OpaInput` object from `req.method`, `req.baseUrl + req.path`, and `req.user.scope.split(' ')`, then calls `evaluate(input)`. Uses the Wasm bundle (`policies/authz.wasm`) when present, or the TypeScript fallback reading `policies/data/scopes.json`. Calls `next(new AuthorizationError())` if the policy denies.
6. The controller (`src/controllers/*.ts`) receives the validated request, extracts and validates path params and body using Joi schemas, then delegates to the service layer.
7. The service (`src/services/*.ts`) executes all business logic — enforces free-tier limits, resolves domain rules, and calls repositories. The service has no knowledge of HTTP.
8. The repository (`src/repositories/*.ts`) executes parameterized SQL against PostgreSQL via `node-postgres`, or issues Redis commands via the `redis` client. No business logic lives here.
9. The controller serialises the service result and calls `res.status(xxx).json(payload)`.
10. `AuditService.logEvent()` is called — for high-throughput paths (token issuance, introspection, revocation) this is fire-and-forget (`void` — not awaited); for CRUD operations it is awaited. The audit event is written as an immutable row to the `audit_events` table in PostgreSQL.
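The early steps of this chain can be probed from outside. The 401/403 mapping below is the conventional one for authentication versus authorisation failures, and `$TOKEN_WITH_WRONG_SCOPE` is a hypothetical placeholder — confirm both against your deployment:

```shell
# No Authorization header: rejected by authMiddleware (step 4) — expect 401
curl -s -o /dev/null -w '%{http_code}\n' https://idp.example.com/api/v1/agents

# Valid token lacking the required scope: rejected by opaMiddleware (step 5) — expect 403
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $TOKEN_WITH_WRONG_SCOPE" \
  https://idp.example.com/api/v1/agents
```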

---

## 3. OAuth 2.0 Client Credentials Flow

```mermaid
sequenceDiagram
    actor Agent
    participant AgentIdP
    participant PostgreSQL
    participant Redis
    participant Vault as Vault (optional)

    Agent->>AgentIdP: POST /api/v1/token<br/>grant_type=client_credentials<br/>client_id=<agentId><br/>client_secret=sk_live_...<br/>scope=agents:read agents:write

    AgentIdP->>PostgreSQL: SELECT * FROM agents WHERE agent_id = $1
    PostgreSQL-->>AgentIdP: agent row (status, etc.)

    AgentIdP->>PostgreSQL: SELECT * FROM credentials WHERE agent_id = $1 AND status = 'active'
    PostgreSQL-->>AgentIdP: active credential rows

    alt Vault path (vaultPath IS NOT NULL and VAULT_ADDR is set)
        AgentIdP->>Vault: readSecret(agentId, credentialId)
        Vault-->>AgentIdP: plain-text secret
        AgentIdP->>AgentIdP: crypto.timingSafeEqual(stored, candidate)
    else bcrypt path (fallback)
        AgentIdP->>AgentIdP: bcrypt.compare(clientSecret, secretHash)
    end

    AgentIdP->>Redis: GET monthly:tokens:{agentId}:{yyyy-mm}
    Redis-->>AgentIdP: current monthly count

    AgentIdP->>AgentIdP: signToken(payload, privateKey) — RS256 JWT

    AgentIdP->>Redis: INCR monthly:tokens:{agentId}:{yyyy-mm} (fire-and-forget)

    AgentIdP-->>Agent: 200 OK<br/>{ access_token, token_type: "Bearer", expires_in: 3600, scope }

    Note over Agent,AgentIdP: Subsequent protected API call

    Agent->>AgentIdP: GET /api/v1/agents<br/>Authorization: Bearer <access_token>
    AgentIdP->>AgentIdP: verifyToken(token, publicKey) — RS256 verify + expiry
    AgentIdP->>Redis: GET revoked:{jti}
    Redis-->>AgentIdP: null (not revoked)
    AgentIdP->>AgentIdP: OPA evaluate({method, path, scopes})
    AgentIdP-->>Agent: 200 OK — agents list
```

---

## 4. Multi-Region Deployment Topology

```mermaid
graph LR
    TFRoot["Terraform Root Module\nterraform/"]
    TFRoot --> AWSMod["AWS Module\nterraform/environments/aws/"]
    TFRoot --> GCPMod["GCP Module\nterraform/environments/gcp/"]

    subgraph AWS["AWS (us-east-1 default)"]
        AWSVPC["VPC"] --> ECSCluster["ECS Cluster (Fargate)"]
        ECSCluster --> ECSTask["ECS Task — AgentIdP container"]
        ECSTask --> RDS["RDS PostgreSQL 14 (Multi-AZ)"]
        ECSTask --> Elasticache["ElastiCache Redis 7"]
        ALB["Application Load Balancer"] --> ECSCluster
    end

    subgraph GCP["GCP (us-central1 default)"]
        GCPVPC["VPC"] --> CloudRun["Cloud Run service — AgentIdP"]
        CloudRun --> CloudSQL["Cloud SQL PostgreSQL 14"]
        CloudRun --> Memorystore["Memorystore Redis 7"]
        GCPLB["Cloud Load Balancer"] --> CloudRun
    end

    AWSMod --> AWS
    GCPMod --> GCP

    ECR["ECR / Artifact Registry\n(container image)"] --> ECSTask
    ECR --> CloudRun
```

Each region is an independent deployment with its own PostgreSQL and Redis instances. The Terraform root module sets `aws_region` (default `us-east-1`) and `gcp_region` (default `us-central1`) as input variables. Infrastructure modules live under `terraform/modules/` (agentidp, lb, rds, redis) with environment-specific configuration under `terraform/environments/aws/` and `terraform/environments/gcp/`. Cross-region data replication and federation are Phase 3 goals.
|
||||
255  docs/engineering/03-tech-stack.md  Normal file
@@ -0,0 +1,255 @@
# Technology Stack and Architecture Decision Records

Every technology choice in AgentIdP was made deliberately. This document records the
decision, rationale, and alternatives considered for each major technology. New engineers
should read this before making any technology additions or changes — the pattern here is
the template for future ADRs.

---

### ADR-1: Node.js 18 LTS

**Status**: Adopted
**Component**: AgentIdP server runtime and Node.js SDK runtime

**Decision**: Use Node.js 18 LTS as the server runtime.

**Rationale**: Node.js 18 LTS provides native `fetch`, native ESM support, and a
stable V8 engine with long-term security updates. The ecosystem for Express, PostgreSQL
(`pg`), Redis (`redis`), JWT (`jsonwebtoken`), and bcrypt (`bcryptjs`) is mature and
well-maintained on this version. The non-blocking I/O model is well-suited for an IdP
that handles many concurrent short-lived authentication requests. The `engines.node`
field in `package.json` enforces `>=18.0.0`.

**Alternatives considered**:
- Deno — rejected because the npm ecosystem compatibility layer introduced friction with key dependencies (`pg`, `bcryptjs`), and the production deployment story on ECS and Cloud Run was less mature at the time of the decision.
- Bun — rejected because it lacked LTS stability guarantees at the time of the decision, which is not acceptable for a security-critical authentication service.

**Consequences**: All Dockerfiles and Terraform ECS/Cloud Run task definitions must
target Node.js 18 or a compatible LTS release. Upgrading the Node.js version requires
CTO approval and a QA sign-off on the full test suite.

---
### ADR-2: TypeScript 5.3 Strict Mode

**Status**: Adopted
**Component**: All source files — server, all SDKs, dashboard

**Decision**: TypeScript 5.3 with `strict: true` and every additional strictness flag enabled in `tsconfig.json`.

**Rationale**: AgentIdP handles authentication tokens and cryptographic secrets. Type
errors in this domain can cause security vulnerabilities — a value that should be
`string | null` treated as `string` can produce silent authentication bypasses. Strict
TypeScript with `noImplicitAny`, `strictNullChecks`, `noUnusedLocals`, `noUnusedParameters`,
and `noImplicitReturns` makes these classes of bug a compile-time error rather than a
runtime failure in production.

**Alternatives considered**:
- Plain JavaScript — rejected because a security-critical IdP with no type safety is not a system this team is willing to ship. Every public method, every error boundary, and every data transformation must be typed.

**Consequences**: All new code must compile cleanly under `tsc --strict`. Zero `any`
types — ever. No exceptions granted without CTO approval. The `tsconfig.json` enables
`noImplicitAny`, `strictNullChecks`, `strictFunctionTypes`, `strictBindCallApply`,
`strictPropertyInitialization`, `noImplicitThis`, `alwaysStrict`, `noUnusedLocals`,
`noUnusedParameters`, `noImplicitReturns`, and `noFallthroughCasesInSwitch`.
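A minimal sketch of the class of bug this ADR targets — the names (`TokenRecord`, `findToken`, `authorize`) are illustrative, not from the codebase. Under `strictNullChecks`, a lookup that may miss must be typed `TokenRecord | null`, and the compiler forces the null case to be handled before the value reaches an authorisation path:

```typescript
// Illustrative only — shows strictNullChecks turning a potential silent
// auth bypass into a compile error.
interface TokenRecord {
  jti: string;
  subject: string;
}

// A lookup that can miss is typed `TokenRecord | null`, never bare `TokenRecord`.
function findToken(store: Map<string, TokenRecord>, jti: string): TokenRecord | null {
  return store.get(jti) ?? null;
}

function authorize(record: TokenRecord): string {
  return `authorized:${record.subject}`;
}

const store = new Map<string, TokenRecord>([
  ["abc", { jti: "abc", subject: "agent-1" }],
]);

const found = findToken(store, "abc");
// authorize(found);  // compile error: 'TokenRecord | null' is not assignable to 'TokenRecord'
const result = found !== null ? authorize(found) : "denied"; // narrowing is mandatory
console.log(result); // → authorized:agent-1
```

Without `strictNullChecks`, the commented-out call would compile and fail only at runtime — exactly the failure mode the ADR rules out.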
---

### ADR-3: Express 4.18

**Status**: Adopted
**Component**: HTTP server framework

**Decision**: Use Express 4.18 as the HTTP framework.

**Rationale**: Express is the most widely understood Node.js HTTP framework. Its
middleware model (`(req, res, next)`) maps directly to the IdP's layered architecture:
`helmet` → `cors` → `metricsMiddleware` → `authMiddleware` → `opaMiddleware` →
controller → service → repository → `errorHandler`. The ecosystem for Express
middleware (`helmet`, `cors`, `morgan`) is mature. For a spec-first project, Express's
lack of convention about code structure is a feature — the architecture is explicit and
fully visible in `src/app.ts`.

**Alternatives considered**:
- Fastify — rejected because the team's familiarity was lower and the performance gains would be negligible for a token service whose latency is dominated by PostgreSQL queries and bcrypt comparisons.
- NestJS — rejected because its decorator-heavy convention-over-configuration style adds complexity not appropriate for the current team size and project scope.
- Koa — rejected because its ecosystem is smaller and fewer engineers are familiar with it.

**Consequences**: All HTTP concerns (routing, middleware, error handling) use the
Express 4 API. The `errorHandler` middleware must remain the last `app.use()` call in
`src/app.ts`.
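The `(req, res, next)` pipeline can be sketched without Express itself — the following is a toy dispatcher (all types and middleware names here are illustrative, not the project's) that shows why middleware ordering in `src/app.ts` matters: a middleware either passes control forward with `next()` or short-circuits the chain with an error:

```typescript
// Toy sketch of the Express-style middleware pipeline — not Express itself.
type Req = { path: string; user?: string };
type Res = { status: number; body: string };
type Next = (err?: Error) => void;
type Middleware = (req: Req, res: Res, next: Next) => void;

function run(middlewares: Middleware[], req: Req, res: Res): Res {
  let i = 0;
  const next: Next = (err) => {
    if (err) {
      // Stand-in for the terminal errorHandler middleware.
      res.status = 500;
      res.body = err.message;
      return;
    }
    const mw = middlewares[i++];
    if (mw) mw(req, res, next);
  };
  next();
  return res;
}

// An auth middleware short-circuits unauthenticated requests...
const auth: Middleware = (req, _res, next) =>
  req.user ? next() : next(new Error("unauthenticated"));
// ...so the handler only ever sees authenticated ones.
const handler: Middleware = (_req, res) => {
  res.status = 200;
  res.body = "ok";
};

const ok = run([auth, handler], { path: "/api/v1/agents", user: "agent-1" }, { status: 0, body: "" });
console.log(ok.status); // 200
```

Reordering the array (`[handler, auth]`) would let every request reach the handler before authentication — the same bug class that makes the `src/app.ts` middleware order a reviewed invariant.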
---

### ADR-4: PostgreSQL 14

**Status**: Adopted
**Component**: Primary data store for agents, credentials, and audit events

**Decision**: Use PostgreSQL 14 as the primary relational database.

**Rationale**: The audit log requires ACID guarantees — partial writes or uncommitted
reads are not acceptable for a compliance-grade append-only event store. PostgreSQL's
`JSONB` column type is used for the `metadata` field in `audit_events`, allowing
structured context data without schema changes for each new event type. PostgreSQL's
row-level security is available for multi-tenancy if that becomes a Phase 3 requirement.

**Alternatives considered**:
- MySQL — rejected because its JSON support is weaker than PostgreSQL's `JSONB` with GIN indexing, and its default transaction isolation level has historically produced surprises.
- MongoDB — rejected because the audit log must be append-only and ACID-safe. MongoDB's document model requires explicit multi-document transactions for ACID behaviour, and the schema flexibility is not needed here.

**Consequences**: All schema changes go through numbered SQL migration files in
`src/db/migrations/`. Migration files are append-only — never modify an existing
migration. New tables require a new numbered file (e.g. `005_create_agent_groups.sql`).

---

### ADR-5: Redis 7

**Status**: Adopted
**Component**: Token revocation list, monthly usage counters, rate-limit sliding window

**Decision**: Use Redis 7 as the in-memory data store.

**Rationale**: Token revocation requires O(1) key lookup with TTL-based automatic
expiry. `SET revoked:{jti} 1 EX {seconds_until_expiry}` stores a revocation entry
that expires precisely when the token itself would have expired — zero manual cleanup
required. The monthly token counter uses Redis `INCR`, which is atomic and O(1). The
rate-limiter uses a Redis sorted set for the sliding-window algorithm.

**Alternatives considered**:
- Memcached — rejected because Memcached does not support per-key TTL on sorted-set structures, which is required for the sliding-window rate-limiter.
- PostgreSQL for revocation — rejected because the token verification path is the hot path in every authenticated request. A PostgreSQL round-trip adds 5–15 ms compared to a Redis `GET` at sub-millisecond latency.

**Consequences**: Redis is a required infrastructure dependency. A Redis instance must
be running and reachable via `REDIS_URL` before the server starts. `docker-compose.yml`
provides a Redis 7 Alpine container for local development on port 6379.
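The TTL arithmetic behind `SET revoked:{jti} 1 EX {seconds_until_expiry}` can be sketched in isolation — a helper (hypothetical name; the real code's structure is not shown here) that derives the key and TTL from a token's `exp` claim:

```typescript
// Sketch: derive the revocation key and TTL for `SET revoked:{jti} 1 EX {ttl}`.
// Because the TTL equals the token's remaining lifetime, the entry vanishes
// exactly when the token would have expired — no cleanup job needed.
function revocationEntry(
  jti: string,
  exp: number,        // token `exp` claim, Unix seconds
  nowSeconds: number, // current time, Unix seconds
): { key: string; ttl: number } {
  const ttl = Math.max(exp - nowSeconds, 1); // Redis EX must be a positive integer
  return { key: `revoked:${jti}`, ttl };
}

const entry = revocationEntry("7f3a", 1_700_000_600, 1_700_000_000);
console.log(entry); // → key "revoked:7f3a", ttl 600
```

The issuing side would then call something like `redis.set(entry.key, "1", { EX: entry.ttl })`, and `authMiddleware`'s `GET revoked:{jti}` returns `null` once the entry (and the token) have expired.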
---

### ADR-6: HashiCorp Vault

**Status**: Adopted (opt-in)
**Component**: Credential secret storage — alternative to bcrypt in PostgreSQL

**Decision**: Integrate HashiCorp Vault KV v2 as an opt-in secret storage backend for agent credentials.

**Rationale**: The Phase 1 bcrypt approach stores hashes in PostgreSQL. While bcrypt
hashes cannot be reversed, some enterprises require that secrets never touch a relational
database — even in hashed form. Vault provides a dedicated secrets management plane with
HSM backing and an independent audit trail at the secrets level. The `verifySecret`
method in `VaultClient` uses `crypto.timingSafeEqual` to prevent timing-based
side-channel attacks when comparing stored and candidate secrets.

**Alternatives considered**:
- AWS Secrets Manager — rejected because it introduces cloud-vendor lock-in. AgentIdP must run identically on AWS, GCP, and on-premises; a Vault-based approach works in all environments.
- Plain bcrypt only — retained as the fallback path. When `VAULT_ADDR` is not set, `createVaultClientFromEnv()` returns `null` and the server operates identically to Phase 1.

**Consequences**: Vault is controlled by `VAULT_ADDR` (required), `VAULT_TOKEN`
(required), and `VAULT_MOUNT` (optional, defaults to `secret`). When these are not set,
bcrypt is used unchanged. Credential rows carry a nullable `vault_path` column: `null`
means bcrypt; a non-null path means Vault verification is used.
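A sketch of the constant-time comparison pattern `verifySecret` is described as using. The actual `VaultClient` implementation is not shown in this document, so this is one common way to apply `crypto.timingSafeEqual` (which requires equal-length buffers — hashing both sides first guarantees that):

```typescript
import { createHash, timingSafeEqual } from "crypto";

// Sketch of constant-time secret comparison. Hashing both inputs to a fixed
// 32-byte digest satisfies timingSafeEqual's equal-length requirement, and
// the comparison itself runs in time independent of where the bytes differ.
function secretsMatch(stored: string, candidate: string): boolean {
  const a = createHash("sha256").update(stored).digest();
  const b = createHash("sha256").update(candidate).digest();
  return timingSafeEqual(a, b);
}

console.log(secretsMatch("s3cret", "s3cret")); // true
console.log(secretsMatch("s3cret", "wrong"));  // false
```

A naive `stored === candidate` string comparison returns early at the first differing character, which is the timing side channel this pattern closes.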
---

### ADR-7: OPA (Open Policy Agent)

**Status**: Adopted
**Component**: Request authorisation — scope enforcement on all protected endpoints

**Decision**: Use Open Policy Agent with a Rego policy compiled to a Wasm bundle for runtime authorisation.

**Rationale**: Hard-coded scope checks in middleware would require a code deployment
for every policy change. OPA decouples the policy (`policies/authz.rego`) from the
server code. The policy can be updated, re-compiled to Wasm, and hot-reloaded via
`SIGHUP` without restarting the server. The `@open-policy-agent/opa-wasm` package
evaluates the compiled Wasm bundle in-process with microsecond latency. When no Wasm
bundle is present (development, CI), the middleware falls back to a TypeScript
implementation that reads `policies/data/scopes.json`.

**Alternatives considered**:
- Custom middleware with hard-coded scope checks — rejected because policy changes require code changes and a full deployment cycle. As the endpoint surface grows this becomes unmanageable.
- Casbin — rejected because its RBAC/ABAC model is less expressive than Rego for the compound `method + path + scope-intersection` pattern AgentIdP requires.

**Consequences**: All authorisation rules live in `policies/authz.rego` and
`policies/data/scopes.json`. Adding a new endpoint requires adding its scope
requirement to `scopes.json`. A policy change is deployed by updating `scopes.json`
(or `authz.wasm`) and sending `SIGHUP` to the running process — no redeployment needed.
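The TypeScript fallback's scope-intersection check can be sketched as follows. The inline scope map is a stand-in for `policies/data/scopes.json` (whose real keys and entries are not shown in this document), and path normalisation (`normalise_path` in the Rego policy) is omitted:

```typescript
// Sketch of the TypeScript fallback: allow a request when the token's scopes
// intersect the endpoint's required scopes. Inline map stands in for
// policies/data/scopes.json; entries here are illustrative.
type ScopeMap = Record<string, string[]>; // "METHOD /path" -> required scopes

const scopeMap: ScopeMap = {
  "GET /api/v1/agents": ["agents:read"],
  "POST /api/v1/agents": ["agents:write"],
};

function allow(method: string, path: string, tokenScopes: string[], map: ScopeMap): boolean {
  const required = map[`${method} ${path}`];
  if (!required) return false; // unknown endpoint: deny by default
  // Non-empty intersection between token scopes and required scopes.
  return required.some((scope) => tokenScopes.includes(scope));
}

console.log(allow("GET", "/api/v1/agents", ["agents:read", "tokens:issue"], scopeMap)); // true
console.log(allow("POST", "/api/v1/agents", ["agents:read"], scopeMap)); // false
```

Deny-by-default for unmapped endpoints mirrors why every new endpoint must add its scope requirement to `scopes.json` before it can be reached.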
---

### ADR-8: React 18 + Vite 5

**Status**: Adopted
**Component**: Web dashboard SPA (`dashboard/`)

**Decision**: Use React 18 with Vite 5 as the web dashboard framework and build tool.

**Rationale**: React 18's concurrent rendering model handles the dashboard's async data
fetching patterns cleanly. The `@sentryagent/idp-sdk` Node.js package is reused
directly in the dashboard via `TokenManager` for authentication, avoiding duplicated
API client code. Vite 5 provides sub-second HMR in development and a fast production
build with tree-shaking. The dashboard is built to `dashboard/dist/` and served as
static files from Express at `/dashboard`, keeping the deployment footprint to a
single container.

**Alternatives considered**:
- Next.js — rejected because server-side rendering is not needed for an internal operator dashboard, and the added complexity of a Next.js server is not justified.
- Vue — rejected because the broader SentryAgent.ai ecosystem is React-first; consistency reduces context-switching overhead.

**Consequences**: The dashboard must be built (`cd dashboard && npm run build`) before
Express can serve it. In local development, run `cd dashboard && npm run dev` to use
Vite's dev server with HMR; the Vite proxy forwards `/api/` calls to Express at
`localhost:3000`.

---

### ADR-9: Prometheus + Grafana

**Status**: Adopted
**Component**: Operational metrics collection and visualisation

**Decision**: Use Prometheus for metrics collection and Grafana for dashboards.

**Rationale**: Prometheus is the industry standard for metrics in container
environments. The `prom-client` npm package integrates natively with Express and
provides `Counter` and `Histogram` metric types that cover all observability needs for
AgentIdP. Grafana's YAML provisioning in `monitoring/grafana/provisioning/` makes
dashboards reproducible and version-controlled. The monitoring stack runs as a Docker
Compose overlay (`docker-compose.monitoring.yml`) without interfering with the base dev
environment.

**Alternatives considered**:
- Datadog — rejected because SaaS cost and vendor lock-in are not acceptable for a free, open-source product. Operators who self-host AgentIdP should not be required to pay for monitoring.
- StatsD — rejected because StatsD's flat metric model lacks label/dimension support, which is essential for distinguishing metrics by `method`, `route`, and `status_code`.

**Consequences**: All metric definitions live exclusively in `src/metrics/registry.ts`.
No other file may instantiate a `Counter` or `Histogram` — all other files import
specific metrics from that registry. Grafana is available at port 3001 when the
monitoring overlay is running.
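The single-registry rule can be illustrated with a toy stand-in for `prom-client` (the sketch below uses a plain class instead of the real `Counter`, and the metric names are the ones this document mentions — everything else is illustrative):

```typescript
// Toy sketch of the single-registry pattern — a plain class stands in for
// prom-client's Counter. One module owns every metric instance; consumers
// import and increment, and never call the constructor themselves.
class CounterSketch {
  private value = 0;
  inc(by = 1): void {
    this.value += by;
  }
  get(): number {
    return this.value;
  }
}

// registry.ts — the only place metrics are created.
const registry = {
  tokensIssuedTotal: new CounterSketch(),
  agentsRegisteredTotal: new CounterSketch(),
};

// Elsewhere in the codebase: import the registry, never instantiate.
registry.tokensIssuedTotal.inc();
registry.tokensIssuedTotal.inc();
console.log(registry.tokensIssuedTotal.get()); // 2
```

Centralising instantiation is what makes a duplicate `new Counter({ name: 'agentidp_tokens_issued_total' })` elsewhere detectable in review — with `prom-client`, a duplicate name also throws at runtime when registered.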
---

### ADR-10: Terraform

**Status**: Adopted
**Component**: Infrastructure as code — multi-region AWS + GCP deployment

**Decision**: Use Terraform with HCL for all infrastructure provisioning across AWS and GCP.

**Rationale**: Terraform's HCL syntax is readable and its provider ecosystem covers
both AWS and GCP with the same toolchain. Reusable modules in `terraform/modules/`
(agentidp, lb, rds, redis) are composed in environment-specific configurations under
`terraform/environments/aws/` and `terraform/environments/gcp/`. All infrastructure
changes go through `terraform plan` review before `terraform apply`, providing a
diff-based approval workflow.

**Alternatives considered**:
- Pulumi — rejected because the Pulumi provider ecosystem for AWS and GCP was less mature than Terraform's at the time of the Phase 2 decision, and HCL is more readable for non-engineers reviewing infrastructure changes.
- AWS CDK — rejected because it is AWS-only. AgentIdP must deploy identically to both AWS and GCP.

**Consequences**: All infrastructure changes must go through Terraform. No manual edits
via the AWS console or GCP console are permitted — they will be overwritten on the next
`terraform apply`. Terraform state is stored in a remote backend and must not be edited
manually.
146  docs/engineering/04-codebase-structure.md  Normal file
@@ -0,0 +1,146 @@
# Codebase Structure

---

## 1. Annotated Directory Tree

```
sentryagent-idp/
├── src/                            # Express application source — controllers, services, middleware, repositories, routes
│   ├── app.ts                      # Express app factory — creates and configures the app; does NOT call listen
│   ├── server.ts                   # Entry point — calls listen, handles SIGTERM/SIGINT/SIGHUP
│   ├── types/                      # Canonical TypeScript interfaces and type definitions
│   ├── controllers/                # HTTP layer — extract/validate inputs, call services, build responses
│   ├── services/                   # Business logic — pure domain operations, no HTTP knowledge
│   ├── repositories/               # Database and Redis access — parameterized SQL, no logic
│   ├── middleware/                 # Cross-cutting request concerns — auth, OPA, rate-limit, metrics, error handling
│   ├── routes/                     # Express router definitions — wiring only, no logic
│   ├── utils/                      # Shared pure utilities — errors, validators, crypto, JWT helpers
│   ├── vault/                      # HashiCorp Vault KV v2 client
│   ├── metrics/                    # Prometheus metrics registry — all Counter and Histogram definitions
│   ├── db/                         # PostgreSQL pool factory and SQL migration files
│   └── cache/                      # Redis client factory
├── tests/                          # Jest test suite — mirrors src/ structure (unit/ and integration/)
├── dashboard/                      # React 18 + Vite 5 web dashboard SPA
│   ├── src/                        # Dashboard source — pages, components, auth, API client
│   └── dist/                       # Built dashboard — served by Express at /dashboard (git-ignored)
├── sdk/                            # Node.js SDK (@sentryagent/idp-sdk) — TypeScript, auto token refresh
├── sdk-python/                     # Python SDK (sentryagent-idp) — sync + async clients
├── sdk-go/                         # Go SDK (github.com/sentryagent/idp-sdk-go) — context-aware, goroutine-safe
├── sdk-java/                       # Java SDK (ai.sentryagent:idp-sdk) — builder pattern, CompletableFuture
├── policies/                       # OPA policy files
│   ├── authz.rego                  # Rego policy — normalise_path + scope-intersection allow rule
│   └── data/scopes.json            # Endpoint permission map — used by Rego and TypeScript fallback
├── terraform/                      # Terraform infrastructure as code
│   ├── modules/                    # Reusable modules: agentidp, lb, rds, redis
│   └── environments/               # Environment configs: aws/ (ECS+RDS+ElastiCache), gcp/ (Cloud Run+SQL+Memorystore)
├── monitoring/                     # Prometheus and Grafana configuration
│   ├── prometheus/                 # prometheus.yml scrape configuration
│   └── grafana/                    # Grafana provisioning YAML and dashboard JSON files
├── docs/                           # All project documentation
│   ├── engineering/                # Internal engineering knowledge base (this directory)
│   ├── developers/                 # End-user API reference and developer guides
│   ├── devops/                     # Operator runbooks and environment variable reference
│   ├── agntcy/                     # AGNTCY alignment documentation
│   └── openapi/                    # OpenAPI 3.0 specification files
├── openspec/                       # OpenSpec change management — proposals, designs, specs, tasks, archives
├── Dockerfile                      # Multi-stage production build (build + runtime stages)
├── docker-compose.yml              # Local development: PostgreSQL 14 (port 5432) + Redis 7 (port 6379)
├── docker-compose.monitoring.yml   # Monitoring overlay: Prometheus (port 9090) + Grafana (port 3001)
├── package.json                    # Node.js dependencies and npm scripts
├── tsconfig.json                   # TypeScript strict configuration — compiled to dist/
└── jest.config.ts                  # Jest configuration — ts-jest, test timeouts, coverage thresholds
```
---

## 2. src/ Subdirectory Roles

| Directory | Role | Rule |
|-----------|------|------|
| `src/controllers/` | Receive HTTP requests, extract and validate inputs using Joi, call service methods, serialise responses | No business logic — controllers are thin wrappers that translate HTTP into service calls |
| `src/services/` | All business logic — free-tier limit enforcement, domain rule evaluation, orchestration of repository calls and audit events | Never import from controllers or routes; never know about `req` or `res` |
| `src/repositories/` | All database and Redis queries — parameterized SQL via `node-postgres`, Redis commands via `redis` client | Only called from services; never called directly from controllers; no business logic |
| `src/middleware/` | Cross-cutting request concerns — `authMiddleware`, `opaMiddleware`, `rateLimitMiddleware`, `metricsMiddleware`, `errorHandler` | Applied at router or app level in `src/app.ts`; never import from controllers |
| `src/routes/` | Map HTTP paths and methods to middleware chains and controller methods | Wiring only — no logic, no validation, no business rules |
| `src/utils/` | Shared pure utilities — `errors.ts`, `validators.ts`, `crypto.ts`, `jwt.ts`, `asyncHandler.ts` | No side effects; no imports from services or controllers |
| `src/types/` | All TypeScript type definitions, interfaces, and enums — the single source of truth for all shared types | Imported everywhere; never imports from anywhere else in `src/` |
| `src/vault/` | `VaultClient` — wraps HashiCorp Vault KV v2 operations; constant-time secret verification | Only instantiated by `createVaultClientFromEnv()` in `src/app.ts`; passed to services via constructor injection |
| `src/metrics/` | Prometheus metrics registry — all `Counter` and `Histogram` definitions in one place | Only file that calls `new Counter()` or `new Histogram()`; all other files import from here |
| `src/db/` | PostgreSQL connection pool factory (`pool.ts`) and numbered SQL migration files in `migrations/` | Pool is a singleton created once in `src/app.ts` and passed to repositories |
| `src/cache/` | Redis client factory — creates and caches a single `redis` client instance | Client is a singleton created once in `src/app.ts` and passed to repositories |

---

## 3. Where to Add New Code

| I need to add... | Where it goes | Example |
|-----------------|---------------|---------|
| A new API endpoint | `src/routes/` (wire it), `src/controllers/` (HTTP layer), `src/services/` (business logic), `src/repositories/` (data access) | Adding `DELETE /api/v1/agents/:id/credentials/:credId/bulk` |
| A new business rule | `src/services/[relevant]Service.ts` | Enforcing a maximum of 5 credentials per agent |
| A new database table | `src/db/migrations/` — new numbered SQL file (append-only) | Adding an `agent_groups` table as `005_create_agent_groups.sql` |
| A new authorisation policy rule | `policies/authz.rego` + `policies/data/scopes.json` | Adding a new scope `reports:read` for a `GET /api/v1/reports` endpoint |
| A new shared error type | `src/utils/errors.ts` | `VaultUnavailableError` extending `SentryAgentError` |
| A new environment variable | `src/utils/config.ts` (if it exists) or the relevant consumer file + `docs/devops/environment-variables.md` | `RATE_LIMIT_MAX` controlling the rate-limit ceiling |
| A new Prometheus metric | `src/metrics/registry.ts` | A `Histogram` for Vault lookup duration |
| A new TypeScript type used in 2+ files | `src/types/index.ts` | A new `AgentGroupMembership` interface |

---

## 4. Key Files

**`src/app.ts`**
Creates and configures the Express application. Registers all middleware (helmet, cors,
morgan, body parsers, metricsMiddleware), instantiates all infrastructure singletons
(PostgreSQL pool, Redis client, VaultClient), constructs the full dependency graph
(repositories → services → controllers), and mounts all routers. Returns the configured
`Application` without calling `listen`. Tests import `createApp()` directly — this is
the design decision that makes integration tests possible without binding a port.

**`src/server.ts`**
The only file that calls `app.listen()`. Loads environment variables via `dotenv.config()`,
calls `createApp()`, binds the port from the `PORT` env var (default 3000), and registers
`SIGTERM`, `SIGINT` (graceful shutdown), and `SIGHUP` (OPA policy hot-reload via
`reloadOpaPolicy()`) signal handlers. This file is never imported by tests.

**`src/types/index.ts`**
The canonical type definition file for the entire project. Contains all exported
interfaces (`IAgent`, `ICredential`, `ITokenPayload`, `IAuditEvent`, etc.), union types
(`AgentStatus`, `AgentType`, `AuditAction`, etc.), and the global Express `Request`
augmentation that adds `req.user?: ITokenPayload`. If a type is needed in two or more
files, it lives here — never redefined inline.

**`src/utils/errors.ts`**
The `SentryAgentError` base class and all typed error subclasses. Every error thrown
in the application must extend `SentryAgentError` — never `throw new Error('string')`.
The `errorHandler` middleware in `src/middleware/errorHandler.ts` maps
`SentryAgentError` subclasses to their `httpStatus` codes and serialises the response
as `IErrorResponse { code, message, details }`.
**`docker-compose.yml`**
Starts PostgreSQL 14 (Alpine) on port 5432 with database `sentryagent_idp` and
Redis 7 (Alpine) on port 6379. Used for local development only. Both services have
health checks so `depends_on` conditions work correctly. The `app` service mounts
`./src` as a read-only volume for live code reloading.

**`tsconfig.json`**
TypeScript compiler configuration. `strict: true` enables the full suite of strictness
checks. `target: ES2022`, `module: commonjs` (the project compiles to CommonJS for
Node.js compatibility). `outDir: ./dist`, `rootDir: ./src`. The `noUnusedLocals` and
`noUnusedParameters` flags are enabled — unused code is a compile error. Never disable
these flags.

---

## 5. DRY Enforcement

Every piece of logic lives in exactly one place. Violations are CTO-blocking.

| Concern | Single source of truth | Violation pattern to reject |
|---------|----------------------|----------------------------|
| Business logic | One service method — called from multiple controllers if needed | Business logic duplicated in a route handler or controller |
| Database queries | One repository method — never repeated inline | SQL written directly in a service or controller |
| Error types | `src/utils/errors.ts` — imported wherever errors are thrown | `new Error('AGENT_NOT_FOUND')` instead of `new AgentNotFoundError()` |
| TypeScript types | `src/types/index.ts` — imported in every consumer file | An interface defined inline in a service file |
| Validation logic | `src/utils/validators.ts` — Joi schemas used in controllers | Validation logic duplicated across multiple controllers |
| Prometheus metrics | `src/metrics/registry.ts` — one definition per metric | A second `new Counter({ name: 'agentidp_tokens_issued_total' })` anywhere |
342  docs/engineering/05-services.md  Normal file
@@ -0,0 +1,342 @@
# Service Deep Dives

---

### AgentService

**Purpose**: Manages the full lifecycle of AI agent identities — registration, retrieval, updates, and decommissioning.

**Responsibility boundary**: AgentService does not handle HTTP, credential secrets,
token issuance, or audit log queries. It delegates all data access to
`AgentRepository` and `CredentialRepository`, and all audit logging to `AuditService`.
It enforces free-tier limits and domain rules before any data is written.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `registerAgent` | `data: ICreateAgentRequest, ipAddress: string, userAgent: string` | `Promise<IAgent>` | Checks the free-tier 100-agent limit, enforces email uniqueness, creates the agent record, writes an `agent.created` audit event, increments the `agentidp_agents_registered_total` Prometheus counter |
| `getAgentById` | `agentId: string` | `Promise<IAgent>` | Retrieves a single agent by UUID; throws `AgentNotFoundError` if not found |
| `listAgents` | `filters: IAgentListFilters` | `Promise<IPaginatedAgentsResponse>` | Returns a paginated, optionally filtered list; filters include `owner`, `agentType`, `status`, `page`, `limit` |
| `updateAgent` | `agentId: string, data: IUpdateAgentRequest, ipAddress: string, userAgent: string` | `Promise<IAgent>` | Partially updates agent metadata; rejects updates to decommissioned agents; determines the correct audit action (`agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`) based on the status transition |
| `decommissionAgent` | `agentId: string, ipAddress: string, userAgent: string` | `Promise<void>` | Soft-deletes the agent (sets `status = 'decommissioned'`); revokes all active credentials by calling `credentialRepository.revokeAllForAgent(agentId)` before decommissioning |

**Database / storage schema**:
- Table `agents`: `agent_id` (UUID PK), `email` (UNIQUE), `agent_type`, `version`, `capabilities` (text array), `owner`, `deployment_env`, `status`, `created_at`, `updated_at`.
- No Redis usage — AgentService is PostgreSQL-only.

**Error types**:
- `FreeTierLimitError` (403) — 100-agent limit reached
- `AgentAlreadyExistsError` (409) — email already registered
- `AgentNotFoundError` (404) — agent UUID not found
- `AgentAlreadyDecommissionedError` (409) — agent is already decommissioned

**Configuration**: None — AgentService reads no environment variables. The free-tier limit (`FREE_TIER_MAX_AGENTS = 100`) is a module-level constant.
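A sketch of the free-tier gate `registerAgent` applies before any write. `FREE_TIER_MAX_AGENTS` is the constant this document names; the count argument and the throw-a-plain-Error shortcut are stand-ins for the repository call and `FreeTierLimitError`:

```typescript
// Sketch of the free-tier gate. In the real service the count comes from
// AgentRepository and the failure is a FreeTierLimitError (HTTP 403).
const FREE_TIER_MAX_AGENTS = 100;

function assertFreeTierCapacity(currentAgentCount: number): void {
  if (currentAgentCount >= FREE_TIER_MAX_AGENTS) {
    throw new Error("FREE_TIER_LIMIT"); // stand-in for FreeTierLimitError
  }
}

assertFreeTierCapacity(99); // ok — one slot left
let limited = false;
try {
  assertFreeTierCapacity(100); // at the limit: registration is rejected
} catch {
  limited = true;
}
console.log(limited); // true
```

The check runs before the INSERT, which is what keeps the limit a service-layer rule rather than a database constraint.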

---

### OAuth2Service

**Purpose**: Issues, introspects, and revokes RS256 JWT access tokens via the OAuth 2.0 Client Credentials grant.

**Responsibility boundary**: OAuth2Service does not know about HTTP or routing. It
receives already-extracted values (`clientId`, `clientSecret`, `scope`) from the
controller, resolves credential verification (Vault or bcrypt), enforces the 10,000
tokens/month free-tier limit, and returns a typed `ITokenResponse`. All audit writes
on high-throughput paths (issue, introspect, revoke) are fire-and-forget (`void`) to
keep token endpoint latency low.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `issueToken` | `clientId: string, clientSecret: string, scope: string, ipAddress: string, userAgent: string` | `Promise<ITokenResponse>` | Verifies credentials (Vault or bcrypt), checks agent status, enforces the 10k/month limit, signs an RS256 JWT, then increments the monthly counter and writes an audit event as fire-and-forget |
| `introspectToken` | `token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string` | `Promise<IIntrospectResponse>` | Verifies the JWT signature and checks the Redis revocation list; always returns 200 with `active: true/false` per RFC 7662 |
| `revokeToken` | `token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string` | `Promise<void>` | Decodes the token without verification; enforces that callers can only revoke their own tokens (`decoded.sub === callerPayload.sub`); adds the JTI to the Redis revocation list with a TTL matching token expiry |

**Database / storage schema**:

- Redis key `revoked:{jti}` — value `1`, TTL = seconds until token expiry. Written on revocation; read on every authenticated request via `authMiddleware`.
- Redis key `monthly:tokens:{agentId}:{yyyy-mm}` — integer counter, incremented on every successful token issuance. Read to enforce the 10k/month free-tier limit.
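
The key scheme and TTL rule above can be sketched as follows. This is an illustrative sketch only — the helper function names are hypothetical, not OAuth2Service's actual API; only the key formats and the TTL rule come from the schema above.

```typescript
// Hypothetical helpers illustrating the Redis key scheme described above.

/** Key holding the revocation marker for a single token. */
export function revocationKey(jti: string): string {
  return `revoked:${jti}`;
}

/** Key holding the monthly issuance counter for an agent. */
export function monthlyCounterKey(agentId: string, now: Date): string {
  const yyyy = now.getUTCFullYear();
  const mm = String(now.getUTCMonth() + 1).padStart(2, '0');
  return `monthly:tokens:${agentId}:${yyyy}-${mm}`;
}

/** TTL for a revocation entry: seconds remaining until the token's `exp` claim. */
export function revocationTtlSeconds(expEpochSeconds: number, now: Date): number {
  return Math.max(0, expEpochSeconds - Math.floor(now.getTime() / 1000));
}
```

Because the revocation entry expires exactly when the token itself does, the revocation list never grows beyond the set of still-valid revoked tokens.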

**Error types**:

- `AuthenticationError` (401) — agent not found, or no active credential matches the provided secret
- `AuthorizationError` (403) — agent is suspended or decommissioned; or caller attempts to revoke another agent's token
- `FreeTierLimitError` (403) — 10,000 tokens/month limit reached

**Configuration**:

- `JWT_PRIVATE_KEY` — PEM-encoded RSA private key, required, read at app startup in `src/app.ts`
- `JWT_PUBLIC_KEY` — PEM-encoded RSA public key, required, read at app startup and in `authMiddleware`
- `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT` — optional; when set, Vault is used for credential verification instead of bcrypt

---

### CredentialService

**Purpose**: Manages the full lifecycle of agent credentials — generation, listing, rotation, and revocation.

**Responsibility boundary**: CredentialService does not know about HTTP or token
issuance. It enforces that credentials can only be generated for `active` agents. It
delegates secret storage to either `VaultClient` (Phase 2) or bcrypt (Phase 1 fallback).
The plain-text `clientSecret` is generated here, returned once in the response, and
never stored or logged — only the bcrypt hash or Vault path is persisted.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `generateCredential` | `agentId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string` | `Promise<ICredentialWithSecret>` | Verifies the agent exists and is `active`; generates a cryptographically random secret via `generateClientSecret()`; writes to Vault (when configured) or hashes with bcrypt; returns `ICredentialWithSecret` — the only time the plain-text secret is returned |
| `listCredentials` | `agentId: string, filters: ICredentialListFilters` | `Promise<IPaginatedCredentialsResponse>` | Returns paginated credentials for an agent; `clientSecret` is never included in list responses |
| `rotateCredential` | `agentId: string, credentialId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string` | `Promise<ICredentialWithSecret>` | Generates a new secret for the same `credentialId`; overwrites the Vault entry (new KV v2 version) or updates the bcrypt hash; the old secret is immediately invalidated; returns the new `ICredentialWithSecret` once |
| `revokeCredential` | `agentId: string, credentialId: string, ipAddress: string, userAgent: string` | `Promise<void>` | Sets credential `status = 'revoked'`; permanently deletes the Vault secret via `vaultClient.deleteSecret()` when Vault is configured; rejects already-revoked credentials with `CredentialAlreadyRevokedError` |
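
The "return the secret once, persist only a hash" pattern described above can be sketched as follows. This is a hedged sketch: the function names are hypothetical, and scrypt (from Node's standard library) stands in for the bcrypt hashing the real service uses, to keep the sketch dependency-free.

```typescript
import { randomBytes, scryptSync } from 'node:crypto';

// Illustrative only — CredentialService itself uses bcrypt or Vault.

/** Generate a cryptographically random client secret (32 bytes, base64url). */
export function generateClientSecret(): string {
  return randomBytes(32).toString('base64url');
}

/** Derive a storable hash from the plain secret (scrypt as a bcrypt stand-in). */
export function hashSecret(plainSecret: string, salt: Buffer): string {
  return scryptSync(plainSecret, salt, 32).toString('hex');
}

// The plain secret is handed to the caller exactly once; only the hash
// (plus its salt) would be persisted in the `credentials` table.
const salt = randomBytes(16);
const secret = generateClientSecret();
const stored = hashSecret(secret, salt);
```

Verification later re-derives the hash from the candidate secret and compares it to the stored value; the plain text is never recoverable from what is persisted.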

**Database / storage schema**:

- Table `credentials`: `credential_id` (UUID PK), `client_id` (= `agentId`, FK to `agents`), `secret_hash` (bcrypt hash; empty string when Vault path is set), `vault_path` (nullable — KV v2 data path), `status`, `created_at`, `expires_at` (nullable), `revoked_at` (nullable).

**Error types**:

- `AgentNotFoundError` (404) — agent UUID not found
- `CredentialError` (400) — agent is not in `active` status (code: `AGENT_NOT_ACTIVE`)
- `CredentialNotFoundError` (404) — credential not found or belongs to a different agent
- `CredentialAlreadyRevokedError` (409) — credential is already revoked

**Configuration**:

- `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_MOUNT` — optional; when set, new credentials are stored in Vault KV v2 instead of bcrypt. Existing bcrypt-based credentials continue to work unchanged.

---

### AuditService

**Purpose**: Creates and queries immutable audit events for compliance and observability.

**Responsibility boundary**: AuditService does not know about HTTP, tokens, or agents.
It receives already-assembled event data from other services and delegates all
persistence to `AuditRepository`. It enforces the 90-day free-tier retention window
on all query and retrieval operations — events older than 90 days are treated as
non-existent.

**Public interface** (key methods):

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `logEvent` | `agentId: string, action: AuditAction, outcome: AuditOutcome, ipAddress: string, userAgent: string, metadata: Record<string, unknown>` | `Promise<IAuditEvent>` | Writes an immutable audit row to PostgreSQL. For token endpoints, callers use `void` (fire-and-forget). For CRUD operations, callers `await` this method. |
| `queryEvents` | `filters: IAuditListFilters` | `Promise<IPaginatedAuditEventsResponse>` | Returns paginated, filtered audit events; enforces the 90-day retention window by computing the cutoff date and rejecting queries with `fromDate` before the cutoff; validates that `fromDate <= toDate` |
| `getEventById` | `eventId: string` | `Promise<IAuditEvent>` | Retrieves a single event by UUID; throws `AuditEventNotFoundError` for both genuinely missing events and events outside the 90-day retention window (indistinguishable by design) |

**Database / storage schema**:

- Table `audit_events`: `event_id` (UUID PK), `agent_id` (text FK to agents), `action` (text — one of the `AuditAction` union type values), `outcome` (`success` or `failure`), `ip_address` (text), `user_agent` (text), `metadata` (JSONB), `timestamp` (timestamptz, NOT NULL, indexed).
- No Redis usage — AuditService is PostgreSQL-only.

**Error types**:

- `AuditEventNotFoundError` (404) — event not found or outside retention window
- `RetentionWindowError` (400) — query `fromDate` is before the 90-day retention cutoff
- `ValidationError` (400) — `fromDate` is after `toDate`

**Configuration**: None — the retention window (`FREE_TIER_RETENTION_DAYS = 90`) is a module-level constant.

---

### VaultClient

**Purpose**: Wraps HashiCorp Vault KV v2 operations for credential secret storage and verification.

**Responsibility boundary**: VaultClient is a client adapter — it knows only about
Vault API calls. It has no knowledge of business rules, HTTP, or PostgreSQL. It is
injected into `CredentialService` and `OAuth2Service` via constructor injection. When
`VAULT_ADDR` is not set, `createVaultClientFromEnv()` returns `null` and the bcrypt
code path is used unchanged.

**Public methods**:

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| `writeSecret` | `agentId: string, credentialId: string, plainSecret: string` | `Promise<string>` | Writes the plain-text secret to the KV v2 data path; returns the path; creates a new KV v2 version on subsequent calls (used for rotation) |
| `readSecret` | `agentId: string, credentialId: string` | `Promise<string>` | Reads and returns the plain-text secret from Vault; throws `CredentialError` if the path is not found or the read fails |
| `verifySecret` | `agentId: string, credentialId: string, candidateSecret: string` | `Promise<boolean>` | Reads the stored secret via `readSecret`, then compares using `crypto.timingSafeEqual` to prevent timing-based side-channel attacks; returns `false` on any Vault error rather than throwing |
| `deleteSecret` | `agentId: string, credentialId: string` | `Promise<void>` | Permanently deletes all versions of a credential secret by calling the KV v2 metadata path (`DELETE {mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`) |

**KV v2 path structure**:

- Data path: `{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}`
- Metadata path (for permanent deletion): `{mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`
- Default mount: `secret` (overridable via `VAULT_MOUNT`)

**Opt-in configuration**:

- `VAULT_ADDR` — Vault server address (e.g. `http://127.0.0.1:8200`) — required to enable Vault mode
- `VAULT_TOKEN` — Vault authentication token — required to enable Vault mode
- `VAULT_MOUNT` — KV v2 mount path — optional, defaults to `secret`

**Constant-time comparison rationale**: The `verifySecret` method uses Node.js
`crypto.timingSafeEqual` instead of `===` to prevent attackers from inferring the
length or content of stored secrets by measuring how long the comparison takes. When
the stored and candidate secrets differ in length, a dummy `timingSafeEqual` call is
still performed to eliminate the timing signal from the early-exit path.
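
A minimal sketch of that comparison pattern, including the dummy call on the length-mismatch path, looks like this. The helper name is hypothetical — it is not VaultClient's API, only an illustration of the technique described above.

```typescript
import { timingSafeEqual } from 'node:crypto';

// Illustrative constant-time string comparison in the style described above.
export function constantTimeStringEqual(stored: string, candidate: string): boolean {
  const a = Buffer.from(stored, 'utf8');
  const b = Buffer.from(candidate, 'utf8');
  if (a.length !== b.length) {
    // timingSafeEqual requires equal-length buffers, so a plain `return false`
    // here would be a fast path. Comparing `a` against itself first keeps the
    // mismatch branch's timing comparable to the equal-length branch.
    timingSafeEqual(a, a);
    return false;
  }
  return timingSafeEqual(a, b);
}
```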

---

### OPA Policy Engine

**Purpose**: Enforces scope-based authorisation on every protected HTTP request without requiring a code deployment to change access rules.

**Responsibility boundary**: The OPA policy engine (`src/middleware/opa.ts`) is a
middleware layer — it does not know about business rules, credentials, or audit events.
It receives the HTTP method, full request path, and caller scopes from `req.user`, and
returns allow or deny. All policy logic lives in `policies/authz.rego` and
`policies/data/scopes.json`.

**Policy file locations**:

- `policies/authz.rego` — Rego policy defining `normalise_path`, `lookup_key`, and the `allow` rule. Evaluated by the Wasm bundle when compiled; replicated in TypeScript for the fallback path.
- `policies/data/scopes.json` — JSON map of `"METHOD:/path/pattern"` → `[required_scopes]`. Loaded as data into the Wasm policy and used directly by the TypeScript fallback.
- `policies/authz.wasm` — compiled Wasm bundle (not committed to source control; built from `authz.rego` using the OPA CLI). When present, the Wasm path is used; when absent, the TypeScript fallback reads `scopes.json`.

**How `opaMiddleware` evaluates input**:

1. `createOpaMiddleware()` is called once at app startup in `src/app.ts`.
2. It attempts to load `policies/authz.wasm`. If found, `loadPolicy(wasmBuffer)` is called and `scopes.json` data is injected via `loaded.setData(parsed)`.
3. If no Wasm bundle is found, `scopes.json` is loaded into `scopesMap` as the TypeScript fallback.
4. On every request, the middleware builds an `OpaInput` object: `{ method: req.method, path: req.baseUrl + req.path, scopes: req.user.scope.split(' ') }`.
5. `evaluate(input)` checks the Wasm policy (if loaded) or applies `normalisePath` + scope-intersection logic against `scopesMap`. Returns `false` if neither is loaded (fail-closed).
6. If `evaluate` returns `false`, the middleware calls `next(new AuthorizationError())`.
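
The TypeScript fallback path (steps 4–5) can be sketched as follows. This is a hedged sketch: the helper names are illustrative, UUID-to-`:id` normalisation is assumed to mirror the route normalisation used elsewhere in the codebase, and "scope intersection" is interpreted here as requiring every listed scope — the actual Rego rule may differ.

```typescript
// Illustrative fallback evaluation against a scopes.json-style map.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

/** Replace concrete UUID path segments with `:id` so routes match the map keys. */
export function normalisePath(path: string): string {
  return path.replace(UUID_RE, ':id');
}

export function evaluateFallback(
  scopesMap: Record<string, string[]>,
  method: string,
  path: string,
  callerScopes: string[],
): boolean {
  const required = scopesMap[`${method}:${normalisePath(path)}`];
  if (!required) return false; // fail-closed: unknown routes are denied
  return required.every((scope) => callerScopes.includes(scope));
}
```

Returning `false` for unmapped routes keeps the fallback fail-closed, matching step 5's behaviour when no policy is loaded.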

**How to write a new policy rule**:

1. Add the new endpoint's scope requirement to `policies/data/scopes.json`:

   ```json
   "GET:/api/v1/reports": ["reports:read"]
   ```

2. Add `"reports:read"` to the `OAuthScope` union type in `src/types/index.ts`.
3. If Wasm mode is in use, recompile `authz.rego` to `authz.wasm` using the OPA CLI: `opa build policies/authz.rego -d policies/data/ -o policies/authz.wasm`.
4. Send `SIGHUP` to the running process to hot-reload: `kill -HUP <pid>`.

**How to test a policy rule**:

```bash
# Using the OPA CLI directly
opa eval --data policies/data/scopes.json \
  --input '{"method":"GET","path":"/api/v1/agents","scopes":["agents:read"]}' \
  --bundle policies/ \
  'data.authz.allow'
```

Expected output: `true`. Replace method/path/scopes to test deny cases.

**Hot-reload via SIGHUP**: When `SIGHUP` is received by the Node.js process,
`server.ts` calls `reloadOpaPolicy()`. This re-executes the same startup loading logic:
tries to load the Wasm bundle, falls back to `scopes.json`. The in-memory `wasmPolicy`
and `scopesMap` module-level variables are replaced atomically. No requests are dropped.
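
The shape of that hot-reload wiring can be sketched as below. This is a stand-in, not the real `server.ts`: the version counter merely illustrates the atomic swap of module-level state that the description above attributes to `reloadOpaPolicy()`.

```typescript
// Minimal sketch of SIGHUP-triggered hot reload.
let policyVersion = 0;

/** Stand-in for the real reload: swap module-level policy state atomically. */
export function reloadOpaPolicy(): void {
  // The real code path re-loads the Wasm bundle or scopes.json here;
  // this counter just demonstrates the single synchronous swap.
  policyVersion += 1;
}

export function currentPolicyVersion(): number {
  return policyVersion;
}

// Wire the signal handler once at startup, as server.ts is described doing.
process.on('SIGHUP', reloadOpaPolicy);
```

Because the swap is a single synchronous assignment, in-flight requests see either the old or the new policy in full — never a half-loaded state.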

---

### Web Dashboard

**Purpose**: Provides a browser-based UI for human operators to manage agents, credentials, and audit logs without writing API calls directly.

**Responsibility boundary**: The dashboard is a pure client-side React SPA. It has no
server-side logic. It calls the AgentIdP REST API using the `@sentryagent/idp-sdk`
`TokenManager` for authentication and a typed `ApiClient` from `dashboard/src/lib/client.ts`
for all API calls. It never stores the `access_token` in localStorage — only
`client_id`, `client_secret`, and `baseUrl` are stored in `sessionStorage` (cleared
on tab close).

**React component structure**:

```
dashboard/src/
├── main.tsx                  # React root — mounts App into #root, wraps with BrowserRouter
├── App.tsx                   # Route definitions — AuthProvider, RequireAuth, AppShell
├── lib/
│   ├── auth.tsx              # AuthContext, AuthProvider, useAuth hook, sessionStorage helpers
│   └── client.ts             # Typed ApiClient class — wraps fetch with TokenManager token injection
├── components/
│   ├── RequireAuth.tsx       # Route guard — redirects to /dashboard/login if not authenticated
│   └── layout/AppShell.tsx   # Persistent sidebar navigation + Outlet for page content
└── pages/
    ├── Login.tsx             # Login form — calls auth.login(), redirects to /dashboard/agents
    ├── Agents.tsx            # Paginated agents list with status filter and search
    ├── AgentDetail.tsx       # Single agent view — status, metadata, update, decommission actions
    ├── Credentials.tsx       # Credential list for an agent — generate, rotate, revoke actions
    ├── AuditLog.tsx          # Paginated audit log with date range and action filters
    └── Health.tsx            # /health endpoint response — PostgreSQL and Redis status display
```

**Authentication flow with sessionStorage**:

1. On `Login.tsx` form submit, `auth.login(creds)` is called.
2. `validateCredentials(creds)` creates a `TokenManager` and calls `getToken()` — if this succeeds, the credentials are valid.
3. `saveCredentials(creds)` stores `{ clientId, clientSecret, baseUrl }` in `sessionStorage` under key `agentidp_credentials`.
4. On every subsequent API call, `getClient()` in `lib/client.ts` reads credentials from `sessionStorage`, creates a `TokenManager`, and injects the current `access_token` into the `Authorization: Bearer` header. The `TokenManager` handles automatic token refresh when the token is expired.
5. `auth.logout()` calls `clearCredentials()` (removes the `sessionStorage` key) and navigates to `/dashboard/login`.
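
The storage helpers in steps 3 and 5 can be sketched as below. This is an illustration of the shape, not the dashboard's actual code: the storage backend is injected through a minimal `StorageLike` interface so the sketch runs outside a browser, and the function and type names are assumptions apart from the `agentidp_credentials` key.

```typescript
// Minimal Storage-like surface so the sketch does not require a browser DOM.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface StoredCredentials {
  clientId: string;
  clientSecret: string;
  baseUrl: string;
}

const STORAGE_KEY = 'agentidp_credentials';

export function saveCredentials(store: StorageLike, creds: StoredCredentials): void {
  store.setItem(STORAGE_KEY, JSON.stringify(creds));
}

export function loadCredentials(store: StorageLike): StoredCredentials | null {
  const raw = store.getItem(STORAGE_KEY);
  return raw === null ? null : (JSON.parse(raw) as StoredCredentials);
}

export function clearCredentials(store: StorageLike): void {
  store.removeItem(STORAGE_KEY);
}
```

In the dashboard itself the `StorageLike` argument would simply be `window.sessionStorage`, which is what gives the "cleared on tab close" behaviour noted above.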

**Main views and their API calls**:

- **Agents** — `GET /api/v1/agents?page=N&limit=20` — paginated list with `status` filter
- **AgentDetail** — `GET /api/v1/agents/:id`, `PATCH /api/v1/agents/:id`, `DELETE /api/v1/agents/:id`
- **Credentials** — `GET /api/v1/agents/:id/credentials`, `POST /api/v1/agents/:id/credentials`, `POST /api/v1/agents/:id/credentials/:credId/rotate`, `DELETE /api/v1/agents/:id/credentials/:credId`
- **AuditLog** — `GET /api/v1/audit?page=N&limit=20&fromDate=...&toDate=...`
- **Health** — `GET /health`

**Local development**:

```bash
cd dashboard
npm install
npm run dev    # Vite dev server with HMR — dashboard available at http://localhost:5173/dashboard
```

The Vite dev server proxies `/api/` calls to the Express server at `http://localhost:3000`.
The Express server must be running separately for API calls to work.

---

### Prometheus/Grafana Monitoring

**Purpose**: Provides operational visibility into AgentIdP's HTTP traffic, token issuance rates, agent registration rates, database latency, and Redis command latency.

**Responsibility boundary**: The metrics middleware (`src/middleware/metrics.ts`) and
the metrics registry (`src/metrics/registry.ts`) are observability concerns only — they
do not affect business logic. Metrics are exposed at `GET /metrics` via
`createMetricsRouter()` using `metricsRegistry.metrics()` from `prom-client`. The
`/metrics` endpoint is unauthenticated, intended for scraping by Prometheus only and
not exposed to the public internet.

**Key metrics with labels**:

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | Total HTTP requests received; route is normalised (UUIDs replaced with `:id`) |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration; buckets from 5ms to 2.5s |
| `agentidp_tokens_issued_total` | Counter | `scope` | Total OAuth 2.0 access tokens successfully issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Total AI agents successfully registered |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration; buckets from 1ms to 1s |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration; buckets from 0.5ms to 250ms |
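
The route normalisation mentioned for `agentidp_http_requests_total` (UUIDs replaced with `:id`) can be sketched as below. The helper name is hypothetical; the point is that each concrete agent UUID must collapse into one `route` label value, otherwise every agent would create its own Prometheus time series and label cardinality would grow without bound.

```typescript
// Illustrative route-label normalisation: collapse UUID path segments to `:id`.
const UUID_SEGMENT = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

export function normaliseRoute(path: string): string {
  return path.replace(UUID_SEGMENT, ':id');
}
```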

**How to add a new Counter**:

1. Open `src/metrics/registry.ts`.
2. Add a new `Counter` export:

   ```typescript
   export const myNewCounter = new Counter({
     name: 'agentidp_my_new_counter_total',
     help: 'Description of what this counts.',
     labelNames: ['label_one'] as const,
     registers: [metricsRegistry],
   });
   ```

3. Import and call `myNewCounter.inc({ label_one: value })` in the service or middleware where the event occurs.

**How to add a new Histogram**:

1. Open `src/metrics/registry.ts`.
2. Add a new `Histogram` export with appropriate buckets:

   ```typescript
   export const myDurationHistogram = new Histogram({
     name: 'agentidp_my_operation_duration_seconds',
     help: 'Duration of my operation in seconds.',
     labelNames: ['operation'] as const,
     buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
     registers: [metricsRegistry],
   });
   ```

3. Use `const end = myDurationHistogram.startTimer({ operation: 'name' }); ... end();` around the operation being measured.

**Grafana access in local Docker**:

Start the monitoring overlay:

```bash
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up
```

- Prometheus: `http://localhost:9090`
- Grafana: `http://localhost:3001` — default credentials: `admin` / `agentidp`

Grafana is pre-provisioned with a Prometheus data source pointing to `http://prometheus:9090`
and dashboard JSON files from `monitoring/grafana/dashboards/`. No manual configuration
is needed after startup.

---

`docs/engineering/06-walkthroughs.md` (new file, 717 lines)

# 06 — Code Walkthroughs

Last verified against commit: `1f95cfe89d1f45fa43b9fb7cff237f07bf9e889e`

These walkthroughs trace three real production code paths from the HTTP request
to the database and back. Every step includes a `file:line` reference and a
"why" annotation explaining the design decision.

---

## Walkthrough 1 — Token Issuance

**Request:** `POST /api/v1/token` with `grant_type=client_credentials`

This is the most security-critical path in the codebase. An AI agent calling this
endpoint is proving its identity and receiving a token that grants access to the
entire API for one hour.

---

### Step 1 — Express middleware stack

**File:** `src/app.ts` lines 57–83

```
helmet()                                → security headers
cors()                                  → CORS headers
morgan()                                → access log line (skipped in test env)
express.json()                          → parse JSON bodies
express.urlencoded({ extended: false }) → parse form-encoded bodies
metricsMiddleware                       → start request timer, record counters on finish
```

**Why `extended: false`?** The token endpoint receives `application/x-www-form-urlencoded`
bodies (RFC 6749 mandates this format for OAuth 2.0). The `express.urlencoded`
middleware parses them into `req.body`. `extended: false` uses the native `querystring`
parser, which is sufficient and avoids `qs` library complexity for flat key-value data.

---

### Step 2 — Route dispatch

**File:** `src/routes/token.ts` line 24

```typescript
router.post('/', asyncHandler(rateLimitMiddleware), asyncHandler(tokenController.issueToken.bind(tokenController)));
```

**Why no `authMiddleware` here?** The token endpoint is where the agent _gets_ its
token — it cannot present a Bearer token to authenticate. Instead, credentials go
in the request body (`client_id`, `client_secret`). `POST /token` is deliberately
unauthenticated at the transport layer; authentication happens inside the controller.

**Why `asyncHandler`?** Express does not natively support async middleware. `asyncHandler`
wraps the async function and calls `next(err)` if the promise rejects, routing the
error to `errorHandler`.

---

### Step 3 — Rate limit check

**File:** `src/middleware/rateLimit.ts`

The rate limiter checks a Redis sliding-window counter for the client's IP address.
If the counter exceeds 100 requests/minute, it throws `RateLimitError` (429).

**Why Redis, not in-memory?** If the server restarts or scales horizontally to multiple
instances, an in-memory counter would reset. Redis maintains the counter across
instances and restarts.
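
The sliding-window idea can be sketched in-memory as below. This is only a sketch of the algorithm, not the middleware itself: the real implementation keeps this state in Redis (for the reasons just given), and the function name, data structure, and exact eviction strategy here are illustrative assumptions.

```typescript
// In-memory sliding-window sketch: allow at most LIMIT hits per WINDOW_MS per IP.
const WINDOW_MS = 60_000;
const LIMIT = 100;

const hits = new Map<string, number[]>();

export function allowRequest(ip: string, nowMs: number): boolean {
  const windowStart = nowMs - WINDOW_MS;
  // Drop timestamps that have slid out of the window, then count what remains.
  const recent = (hits.get(ip) ?? []).filter((t) => t > windowStart);
  if (recent.length >= LIMIT) {
    hits.set(ip, recent);
    return false; // the middleware would throw RateLimitError (429) here
  }
  recent.push(nowMs);
  hits.set(ip, recent);
  return true;
}
```

Unlike a fixed-window counter, the sliding window prevents a burst of 2× the limit straddling a window boundary, at the cost of storing individual timestamps.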

---

### Step 4 — Controller: validate grant_type

**File:** `src/controllers/TokenController.ts` lines 84–103

```typescript
issueToken = async (req: Request, res: Response, _next: NextFunction): Promise<void> => {
  const body = req.body as ITokenRequest;

  if (!body.grant_type) { ... return res.status(400).json({error: 'invalid_request', ...}) }
  if (body.grant_type !== 'client_credentials') { ... return res.status(400).json(...) }
```

**Why does this method catch errors itself instead of calling `next(err)`?** The token
endpoint must return errors in the **OAuth 2.0 error format** (`{ error, error_description }`)
per RFC 6749 §5.2, not the standard SentryAgent.ai format (`{ code, message }`). The
`mapToOAuth2Error()` helper translates `AuthenticationError` and `AuthorizationError`
into OAuth2 error codes. The `_next` parameter is intentionally unused for the error path.

---

### Step 5 — Controller: Joi validation and credential extraction

**File:** `src/controllers/TokenController.ts` lines 106–138

```typescript
const { error, value } = tokenRequestSchema.validate(body, { abortEarly: false });
// ...
// Support HTTP Basic auth fallback (RFC 6749 §2.3.1)
const authHeader = req.headers['authorization'];
if (authHeader?.startsWith('Basic ')) {
  const base64 = authHeader.slice(6);
  const decoded = Buffer.from(base64, 'base64').toString('utf-8');
  const colonIndex = decoded.indexOf(':');
  clientId = decoded.slice(0, colonIndex);
  clientSecret = decoded.slice(colonIndex + 1);
}
```

**Why `abortEarly: false`?** This returns all validation errors at once, so the
client can fix all problems in one round trip.

**Why Basic auth support?** RFC 6749 §2.3.1 specifies that client credentials MAY
be sent via HTTP Basic authentication. Some OAuth libraries default to this method.

---

### Step 6 — Controller: scope validation

**File:** `src/controllers/TokenController.ts` lines 141–151

```typescript
const requestedScope = tokenBody.scope ?? 'agents:read';
const validScopes = ['agents:read', 'agents:write', 'tokens:read', 'audit:read'];
const scopeList = requestedScope.split(' ');
const invalidScope = scopeList.find((s) => !validScopes.includes(s));
if (invalidScope) { return res.status(400).json({error: 'invalid_scope', ...}) }
```

**Why validate scopes here?** Scope validation at the controller layer provides an
RFC 6749-compliant `invalid_scope` error before we even look up the agent. This is
faster and gives the client a clearer error message.

---

### Step 7 — Service: agent lookup

**File:** `src/services/OAuth2Service.ts` lines 83–94

```typescript
const agent = await this.agentRepository.findById(clientId);
if (!agent) {
  void this.auditService.logEvent(clientId, 'auth.failed', 'failure', ..., { reason: 'agent_not_found', clientId });
  throw new AuthenticationError('Client authentication failed...');
}
```

**Why log auth failures?** Failed authentication attempts may indicate a brute-force
attack or a misconfigured client. Having them in the audit log enables incident
investigation and alerting.

**Why not distinguish between "agent not found" and "wrong secret" in the error message?**
Revealing which is wrong gives an attacker information — they can enumerate valid
`client_id` values by checking whether they get "agent not found" vs "wrong secret".
Both cases return the same message.

---

### Step 8 — Service: credential verification

**File:** `src/services/OAuth2Service.ts` lines 97–131

```typescript
const { credentials } = await this.credentialRepository.findByAgentId(clientId, { status: 'active', page: 1, limit: 100 });

for (const cred of credentials) {
  const credRow = await this.credentialRepository.findById(cred.credentialId);
  if (credRow) {
    if (credRow.expiresAt !== null && credRow.expiresAt < new Date()) { continue; }

    let matches: boolean;
    if (credRow.vaultPath !== null && this.vaultClient !== null) {
      matches = await this.vaultClient.verifySecret(clientId, credRow.credentialId, clientSecret);
    } else {
      matches = await verifySecret(clientSecret, credRow.secretHash);
    }
    if (matches) { credentialVerified = true; break; }
  }
}
```

**Why iterate over multiple credentials?** An agent can have multiple active
credentials (e.g. one per service that calls it). The agent rotates credentials
one at a time — if credential A is rotated while service X is still using it,
service X will fail. By checking all active credentials, we allow overlapping rotation.

**Why check expiry before hashing?** Bcrypt is intentionally slow (~100ms). Checking
expiry first is a cheap early exit that avoids the bcrypt computation on expired
credentials.

---

### Step 9 — Service: status and monthly limit checks

**File:** `src/services/OAuth2Service.ts` lines 144–176

```typescript
if (agent.status === 'suspended') { throw new AuthorizationError(...) }
if (agent.status === 'decommissioned') { throw new AuthorizationError(...) }

const monthlyCount = await this.tokenRepository.getMonthlyCount(clientId);
if (monthlyCount >= FREE_TIER_MAX_MONTHLY_TOKENS) { throw new FreeTierLimitError(...) }
```

**Why check status after credential verification?** We verify credentials first so
a suspended agent with a wrong secret gets `AuthenticationError` (401) not
`AuthorizationError` (403). This prevents leaking which agents are suspended to
unauthenticated callers.

---

### Step 10 — Service: sign the JWT

**File:** `src/services/OAuth2Service.ts` lines 179–190

```typescript
const jti = uuidv4();
const payload: Omit<ITokenPayload, 'iat' | 'exp'> = { sub: clientId, client_id: clientId, scope, jti };
const accessToken = signToken(payload, this.privateKey);
```

**File:** `src/utils/jwt.ts` lines 19–31

```typescript
export function signToken(payload: Omit<ITokenPayload, 'iat' | 'exp'>, privateKey: string): string {
  const now = Math.floor(Date.now() / 1000);
  const fullPayload: ITokenPayload = { ...payload, iat: now, exp: now + TOKEN_EXPIRES_IN };
  return jwt.sign(fullPayload, privateKey, { algorithm: 'RS256' });
}
```

**Why RS256 instead of HS256?** RS256 (RSA asymmetric) allows any consumer of the
token to verify it using the public key without needing the private signing key.
HS256 (HMAC symmetric) would require sharing the secret with every service that
verifies tokens.

**Why `jti` (JWT ID)?** The `jti` is a unique identifier for this specific token.
It is used as the key in the Redis revocation list. Without `jti`, you cannot
revoke a single token without revoking all tokens for the agent.
|
||||
|
||||
### Step 11 — Service: fire-and-forget operations
|
||||
|
||||
**File:** `src/services/OAuth2Service.ts` lines 193–207
|
||||
|
||||
```typescript
|
||||
void this.tokenRepository.incrementMonthlyCount(clientId);
|
||||
void this.auditService.logEvent(clientId, 'token.issued', 'success', ..., { scope, expiresAt });
|
||||
tokensIssuedTotal.inc({ scope });
|
||||
```
|
||||
|
||||
**Why `void` (fire-and-forget)?** The token has been signed and is ready to return.
|
||||
Waiting for the Redis increment and audit write would add ~5–10ms to every token
|
||||
request. These operations are best-effort — if they fail, the token is still valid.
|
||||
|
||||
**Why is the Prometheus `.inc()` call synchronous?** Prometheus counters are
|
||||
in-process memory operations — they do not write to Redis or PostgreSQL. They are
|
||||
O(1) and sub-microsecond.
|
||||
|
||||
---

### Step 12 — Response

**File:** `src/controllers/TokenController.ts` lines 163–167

```typescript
res.setHeader('Cache-Control', 'no-store');
res.setHeader('Pragma', 'no-cache');
res.status(200).json(tokenResponse);
```

**Why `Cache-Control: no-store`?** RFC 6749 §5.1 mandates that token responses
must not be cached. Without this header, a shared proxy or CDN could cache the
response and replay it to another client.

Final response:

```json
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "agents:read agents:write"
}
```

---

## Walkthrough 2 — Agent Registration

**Request:** `POST /api/v1/agents` with Bearer token and agent data JSON body

After token issuance, registering an agent is the second most common operation.
This walkthrough shows a request that goes through all three auth middleware layers.

---

### Step 1 — Middleware stack

**File:** `src/app.ts` lines 57–83 (same security and parsing middleware as Walkthrough 1)

---

### Step 2 — Route dispatch

**File:** `src/routes/agents.ts` lines 22–27

```typescript
router.use(asyncHandler(authMiddleware));
router.use(opaMiddleware);
router.use(asyncHandler(rateLimitMiddleware));
router.post('/', asyncHandler(agentController.registerAgent.bind(agentController)));
```

All three middleware run on every request to the agents router before the handler.

---
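
`asyncHandler` wraps every async middleware and handler above. A minimal sketch of such a wrapper — an assumed implementation, the project's own version may differ — forwards promise rejections to `next()`, which Express 4 does not do on its own:

```typescript
// Assumed asyncHandler sketch; Req/Res are simplified stand-ins for
// Express's Request/Response types. Without this wrapper, a rejected
// promise from an async handler never reaches the error middleware.
type Next = (err?: unknown) => void;
type AsyncMiddleware<Req, Res> = (req: Req, res: Res, next: Next) => Promise<void>;

function asyncHandler<Req, Res>(fn: AsyncMiddleware<Req, Res>) {
  return (req: Req, res: Res, next: Next): void => {
    fn(req, res, next).catch(next); // rejection becomes a normal Express error
  };
}
```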

### Step 3 — Auth middleware: Bearer token verification

**File:** `src/middleware/auth.ts` lines 28–77

```typescript
const authHeader = req.headers['authorization'];
if (!authHeader || !authHeader.startsWith('Bearer ')) { throw new AuthenticationError(...) }

const token = authHeader.slice(7).trim();
const publicKey = process.env['JWT_PUBLIC_KEY'];
let payload: ITokenPayload;
try {
  payload = verifyToken(token, publicKey);
} catch (err) {
  if (err instanceof TokenExpiredError) { throw new AuthenticationError('Token has expired.') }
  if (err instanceof JsonWebTokenError) { throw new AuthenticationError('Token signature is invalid.') }
  throw err; // re-throw anything unexpected so `payload` is never used unassigned
}

const redis = await getRedisClient();
const revocationKey = `revoked:${payload.jti}`;
const isRevoked = await redis.get(revocationKey);
if (isRevoked !== null) { throw new AuthenticationError('Token has been revoked.') }

req.user = payload;
next();
```

**Why check Redis after signature verification?** Signature verification is a pure
cryptographic operation (no I/O). If the token is expired or has a bad signature,
there is no need to hit Redis. The fast path exits early; Redis is the slower
secondary check.

**Why `await getRedisClient()` instead of storing the client?** `getRedisClient()`
returns the same singleton every time — the connection is created once and reused.
The `await` is fast (no I/O after the first call).

---

### Step 4 — OPA middleware: scope enforcement

**File:** `src/middleware/opa.ts` lines 230–257

```typescript
const input: OpaInput = {
  method: req.method,                 // "POST"
  path: req.baseUrl + req.path,       // "/api/v1/agents"
  scopes: req.user.scope.split(' '),  // ["agents:read", "agents:write"]
};

if (!evaluate(input)) {
  next(new AuthorizationError());
  return;
}
```

For `POST /api/v1/agents`, the policy requires `["agents:write"]`. If `agents:write`
is not in the token's scope, the request is rejected with 403 before the controller
runs.

**Why reconstruct the full path with `req.baseUrl + req.path`?** The OPA policy
uses full paths (`/api/v1/agents/:id`). Inside a nested router, `req.path` is
relative to the router's mount point (e.g. `/`). `req.baseUrl` is the mount prefix
(`/api/v1/agents`). Concatenating them gives the full path the policy expects.

---
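
The reconstruction, plus the templating the policy relies on, can be illustrated with a simplified sketch. This is an assumption for illustration — it replaces UUID-shaped segments with `:id`, while the project's actual normaliser may use different placeholder names:

```typescript
// Assumed, simplified normalisation: rebuild the full path from baseUrl + path,
// then replace UUID-shaped segments with ':id' to match the policy's templates.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function policyPath(baseUrl: string, path: string): string {
  return (baseUrl + path)
    .split('/')
    .map((seg) => (UUID_RE.test(seg) ? ':id' : seg))
    .join('/');
}

console.log(policyPath('/api/v1/agents', '/a1b2c3d4-1111-2222-3333-444455556666'));
// → /api/v1/agents/:id
```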

### Step 5 — Controller: validation

**File:** `src/controllers/AgentController.ts` lines 37–60

```typescript
registerAgent = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  if (!req.user) { throw new AuthorizationError() }

  const { error, value } = createAgentSchema.validate(req.body, { abortEarly: false });
  if (error) {
    throw new ValidationError('Request validation failed.', {
      details: error.details.map((d) => ({ field: d.path.join('.'), reason: d.message })),
    });
  }

  const data = value as ICreateAgentRequest;
  const ipAddress = req.ip ?? '0.0.0.0';
  const userAgent = req.headers['user-agent'] ?? 'unknown';

  const agent = await this.agentService.registerAgent(data, ipAddress, userAgent);
  res.status(201).json(agent);
};
```

**Why check `req.user` in the controller when `authMiddleware` already set it?**
TypeScript's type system marks `req.user` as `ITokenPayload | undefined`. The check
at line 39 narrows the type so subsequent code can use `req.user` without null
assertions. It is a guard, not redundant authentication.

**Why pass `ipAddress` and `userAgent` to the service?** The service logs audit events.
Audit events include the client IP and User-Agent for forensic value. These values
come from the HTTP request, which the service has no access to — so the controller
extracts them and passes them down.

---

### Step 6 — Service: free-tier limit check

**File:** `src/services/AgentService.ts` lines 59–65

```typescript
const currentCount = await this.agentRepository.countActive();
if (currentCount >= FREE_TIER_MAX_AGENTS) {
  throw new FreeTierLimitError('Free tier limit of 100 registered agents has been reached.', ...);
}
```

**Why count before checking email uniqueness?** If the limit is reached, there is
no point checking whether the email already exists. Doing the cheaper check (count)
first avoids an unnecessary query.

---

### Step 7 — Service: email uniqueness check

**File:** `src/services/AgentService.ts` lines 68–71

```typescript
const existing = await this.agentRepository.findByEmail(data.email);
if (existing !== null) { throw new AgentAlreadyExistsError(data.email) }
```

**Why not rely on the database UNIQUE constraint?** We could, but catching a
PostgreSQL `23505` error code in the repository would be less readable and would
not produce a typed `AgentAlreadyExistsError` with a structured `details` field.
The explicit check gives better error messages and keeps the repository layer clean.

---
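
For contrast, the rejected alternative would look roughly like this. It is a sketch, not the project's code — only the `23505` code itself is PostgreSQL's documented `unique_violation` SQLSTATE; the error shape and helper name are assumptions:

```typescript
// Sketch of the rejected approach: let the INSERT hit the UNIQUE constraint
// and translate PostgreSQL's unique_violation (SQLSTATE 23505) afterwards.
interface PgLikeError { code?: string }

function isUniqueViolation(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as PgLikeError).code === '23505';
}

// A repository using this would do something like:
//   try { await pool.query(insertSql, params); }
//   catch (err) {
//     if (isUniqueViolation(err)) throw new AgentAlreadyExistsError(email);
//     throw err;
//   }
```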

### Step 8 — Repository: INSERT

**File:** `src/repositories/AgentRepository.ts` lines 67–85

```typescript
async create(data: ICreateAgentRequest): Promise<IAgent> {
  const agentId = uuidv4();
  const result: QueryResult<AgentRow> = await this.pool.query(
    `INSERT INTO agents (agent_id, email, agent_type, version, capabilities, owner, deployment_env, status, created_at, updated_at)
     VALUES ($1, $2, $3, $4, $5, $6, $7, 'active', NOW(), NOW())
     RETURNING *`,
    [agentId, data.email, data.agentType, data.version, data.capabilities, data.owner, data.deploymentEnv],
  );
  return mapRowToAgent(result.rows[0]);
}
```

**Why generate `agentId` in application code instead of relying on `gen_random_uuid()`?**
Because we use the UUID as the OAuth 2.0 `client_id`. We need the UUID before writing
to the database so we can use it in the audit event and the response. Having it in
application code avoids a separate SELECT after the INSERT.

**Why `RETURNING *`?** PostgreSQL's `RETURNING` clause sends back the inserted row
in the same round trip as the INSERT. This avoids a second SELECT to fetch the
newly created record.

---

### Step 9 — Service: audit event

**File:** `src/services/AgentService.ts` lines 76–83

```typescript
await this.auditService.logEvent(
  agent.agentId,
  'agent.created',
  'success',
  ipAddress,
  userAgent,
  { agentType: agent.agentType, owner: agent.owner },
);
```

**Why `await` here but `void` for token audit events?** Agent registration is a
database write operation that happens once. Adding ~5ms for the audit write is
acceptable and ensures the audit event is recorded before the 201 response is sent.
Token issuance happens far more frequently — audit is fire-and-forget there.

---

### Step 10 — Response

**File:** `src/controllers/AgentController.ts` line 56

```typescript
res.status(201).json(agent);
```

Returns the full `IAgent` object with HTTP 201 Created.

---

## Walkthrough 3 — Credential Rotation

**Request:** `POST /api/v1/agents/:agentId/credentials/:credentialId/rotate`

Credential rotation is the process of replacing an existing client secret with a
new one without changing the `credentialId`. This is the recommended security
practice — rotate periodically and rotate immediately after suspected compromise.

---

### Step 1 — Route dispatch

**File:** `src/routes/credentials.ts` line 34

```typescript
router.post('/:credentialId/rotate', asyncHandler(credentialController.rotateCredential.bind(credentialController)));
```

The credentials router is mounted at `/api/v1/agents/:agentId/credentials` in `app.ts`.
The full path becomes `POST /api/v1/agents/:agentId/credentials/:credentialId/rotate`.

---

### Step 2 — Auth middleware

Same as Walkthrough 2, Step 3. The Bearer token is verified via RS256 and the Redis
revocation check, and `req.user` is populated with the JWT payload.

---

### Step 3 — OPA middleware

The path `/api/v1/agents/:agentId/credentials/:credId/rotate` is normalised to
`/api/v1/agents/:id/credentials/:credId/rotate`. The policy requires `["agents:write"]`.

---

### Step 4 — Controller: ownership check

**File:** `src/controllers/CredentialController.ts` lines 127–137

```typescript
rotateCredential = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  if (!req.user) { throw new AuthenticationError() }

  const { agentId, credentialId } = req.params;

  if (req.user.sub !== agentId) {
    throw new AuthorizationError('You do not have permission to manage credentials for this agent.');
  }
```

**Why check `req.user.sub !== agentId`?** An agent's token contains its own
`agentId` as the `sub` claim. This check enforces that an agent can only manage
its own credentials. Even if an agent has `agents:write` scope, it cannot rotate
another agent's credentials. This is Phase 1 behaviour — there is no admin scope yet.

---

### Step 5 — Controller: request validation

**File:** `src/controllers/CredentialController.ts` lines 139–157

```typescript
const { error, value } = generateCredentialSchema.validate(req.body ?? {}, { abortEarly: false });
// generateCredentialSchema validates the optional `expiresAt` field
const data = value as IGenerateCredentialRequest;
const result = await this.credentialService.rotateCredential(agentId, credentialId, data, ipAddress, userAgent);
res.status(200).json(result);
```

**Why `req.body ?? {}`?** The rotation body is optional — an agent may rotate a
credential without an expiry date, in which case the body may be empty. Passing
`undefined` to Joi would cause a different error than passing `{}`.

---

### Step 6 — Service: existence checks

**File:** `src/services/CredentialService.ts` lines 163–177

```typescript
const agent = await this.agentRepository.findById(agentId);
if (!agent) { throw new AgentNotFoundError(agentId) }

const existing = await this.credentialRepository.findById(credentialId);
if (!existing || existing.clientId !== agentId) { throw new CredentialNotFoundError(credentialId) }

if (existing.status === 'revoked') {
  throw new CredentialAlreadyRevokedError(credentialId, existing.revokedAt?.toISOString() ?? ...);
}
```

**Why check `existing.clientId !== agentId`?** Even though OPA restricts the agent
to its own credentials, a malicious actor could craft a request with a valid
`agentId` in the path but a `credentialId` belonging to another agent. This check
ensures that a credential is only accessible to the agent it was created for.

---

### Step 7 — Service: generate new secret and write to Vault or bcrypt

**File:** `src/services/CredentialService.ts` lines 180–192

```typescript
const expiresAt = data.expiresAt !== undefined ? new Date(data.expiresAt) : null;
const plainSecret = generateClientSecret(); // sk_live_<64 hex chars>

let updated: ICredential | null;

if (this.vaultClient !== null) {
  // Phase 2: write to the existing Vault path (KV v2 creates a new version)
  const vaultPath = await this.vaultClient.writeSecret(agentId, credentialId, plainSecret);
  updated = await this.credentialRepository.updateVaultPath(credentialId, vaultPath, expiresAt);
} else {
  // Phase 1: use bcrypt
  const newHash = await hashSecret(plainSecret);
  updated = await this.credentialRepository.updateHash(credentialId, newHash, expiresAt);
}
```

**Why does Vault rotation write to the same path?** Vault KV v2 is versioned — writing
to an existing path creates a new version without overwriting previous versions.
This preserves an audit trail in Vault itself.

**Why does the Vault path stay the same after rotation?** The `vault_path` column
stores the path, not the secret. The path is deterministic:
`{mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}`. Since the
`credentialId` does not change on rotation, the path does not change either.
Only the Vault version at that path changes.

---
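
The determinism is easy to see if the path construction is written out. This sketch follows the template quoted above; the function name itself is an assumption:

```typescript
// Illustrative path builder: the Vault path is a pure function of mount,
// agentId and credentialId. Rotation changes neither id, so it always
// writes to the same path and only bumps the KV v2 version.
function vaultSecretPath(mount: string, agentId: string, credentialId: string): string {
  return `${mount}/data/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

console.log(vaultSecretPath('secret', 'a1', 'c9'));
// → secret/data/agentidp/agents/a1/credentials/c9
```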

### Step 8 — Repository: UPDATE the credential

**File:** `src/repositories/CredentialRepository.ts` lines 180–218

```sql
-- Bcrypt path (updateHash):
UPDATE credentials
SET secret_hash = $1, vault_path = NULL, expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *

-- Vault path (updateVaultPath):
UPDATE credentials
SET vault_path = $1, secret_hash = '', expires_at = $2, status = 'active', revoked_at = NULL
WHERE credential_id = $3
RETURNING *
```

**Why `status = 'active'` in the UPDATE?** A credential could theoretically be
in any state when rotated. The UPDATE explicitly sets it to active. This handles
edge cases where a revoked credential is being "un-revoked" by rotation (though
the service layer prevents this — revoked credentials throw `CredentialAlreadyRevokedError`).
The belt-and-suspenders approach at the SQL layer ensures data integrity.

---

### Step 9 — Service: audit event

**File:** `src/services/CredentialService.ts` lines 199–206

```typescript
await this.auditService.logEvent(
  agentId,
  'credential.rotated',
  'success',
  ipAddress,
  userAgent,
  { credentialId },
);
```

The audit event records which credential was rotated. Combined with the timestamp,
this gives a complete rotation history for each credential.

---

### Step 10 — Response

**File:** `src/controllers/CredentialController.ts` line 161

```typescript
res.status(200).json(result);
```

Returns `ICredentialWithSecret` — the updated credential including the new
`clientSecret`. This is the only time the new secret is ever returned. The caller
must store it securely.

```json
{
  "credentialId": "d4e5f6a7-...",
  "clientId": "a1b2c3d4-...",
  "status": "active",
  "clientSecret": "sk_live_4f8a2e9b...",
  "createdAt": "2026-01-15T10:00:00Z",
  "expiresAt": "2027-01-15T10:00:00Z",
  "revokedAt": null
}
```

docs/engineering/07-dev-setup.md

# 07 — Development Environment Setup

This guide takes you from a fresh machine to a running AgentIdP server with a
passing smoke test. Estimated time: 15–20 minutes.

---

## 8.1 Prerequisites

Install all of these before proceeding.

| Prerequisite | Minimum version | Install link |
|--------------|-----------------|--------------|
| Node.js | 18.x LTS | https://nodejs.org/en/download — use the LTS version |
| npm | 9.x (ships with Node.js 18) | Included with Node.js |
| Docker Desktop | Latest stable | https://docs.docker.com/get-docker/ |
| Git | 2.x | https://git-scm.com/downloads |

**Verify your versions:**
```bash
node --version    # Should print v18.x.x or higher
npm --version     # Should print 9.x or higher
docker --version  # Should print Docker version 24.x or higher
git --version     # Should print git version 2.x
```

---

## 8.2 Clone and Install

```bash
# Clone the repository
git clone https://github.com/sentryagent-ai/sentryagent-idp.git
cd sentryagent-idp

# Install Node.js dependencies
npm install
```

This installs all production dependencies (Express, pg, Redis, etc.) and
development dependencies (TypeScript, Jest, ts-jest, eslint).

---

## 8.3 Environment Variables Setup

The server requires a `.env` file at the project root. There is no `.env.example`
file — create it from scratch using the template below.

```bash
touch .env
```

Add the following content to `.env`. Every variable is documented below.

```bash
# ─────────────────────────────────────────────────────────────
# PostgreSQL connection
# ─────────────────────────────────────────────────────────────
DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp

# ─────────────────────────────────────────────────────────────
# Redis connection
# ─────────────────────────────────────────────────────────────
REDIS_URL=redis://localhost:6379

# ─────────────────────────────────────────────────────────────
# HTTP server port
# ─────────────────────────────────────────────────────────────
PORT=3000

# ─────────────────────────────────────────────────────────────
# JWT RSA keys (generate these below)
# ─────────────────────────────────────────────────────────────
JWT_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----"
JWT_PUBLIC_KEY="-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"

# ─────────────────────────────────────────────────────────────
# CORS (optional — defaults to '*' which allows all origins)
# ─────────────────────────────────────────────────────────────
CORS_ORIGIN=*

# ─────────────────────────────────────────────────────────────
# Node environment
# ─────────────────────────────────────────────────────────────
NODE_ENV=development

# ─────────────────────────────────────────────────────────────
# OPA policy directory (optional — defaults to ./policies)
# ─────────────────────────────────────────────────────────────
# POLICY_DIR=/path/to/policies

# ─────────────────────────────────────────────────────────────
# HashiCorp Vault (optional — omit to use bcrypt mode)
# ─────────────────────────────────────────────────────────────
# VAULT_ADDR=http://127.0.0.1:8200
# VAULT_TOKEN=your-vault-token
# VAULT_MOUNT=secret
```

**Complete environment variable reference:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `DATABASE_URL` | Yes | — | PostgreSQL connection string. Format: `postgresql://user:password@host:port/dbname` |
| `REDIS_URL` | Yes | — | Redis connection URL. Format: `redis://host:port[/db]` |
| `PORT` | No | `3000` | TCP port the HTTP server listens on |
| `JWT_PRIVATE_KEY` | Yes | — | PEM-encoded RSA private key (2048-bit minimum) for signing tokens |
| `JWT_PUBLIC_KEY` | Yes | — | PEM-encoded RSA public key (matching the private key above) for verifying tokens |
| `CORS_ORIGIN` | No | `*` | CORS allowed origin. Use `*` for development, set to your dashboard domain in production |
| `NODE_ENV` | No | `development` | Set to `test` to suppress Morgan HTTP logging during tests |
| `POLICY_DIR` | No | `./policies` | Absolute path to the directory containing `authz.wasm` or `data/scopes.json` |
| `VAULT_ADDR` | No | — | HashiCorp Vault server address (e.g. `http://127.0.0.1:8200`). When omitted, bcrypt mode is used |
| `VAULT_TOKEN` | No | — | Vault authentication token. Required when `VAULT_ADDR` is set |
| `VAULT_MOUNT` | No | `secret` | Vault KV v2 mount path. Only used when Vault is configured |

**Generating JWT keys:**
```bash
# Generate RSA 2048-bit private key
openssl genrsa -out private.pem 2048

# Extract public key
openssl rsa -in private.pem -pubout -out public.pem

# Print private key as single-line for .env (replace newlines with \n)
awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' private.pem

# Print public key as single-line for .env
awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' public.pem
```

Paste the output (including the `-----BEGIN/END-----` lines) as the value for
`JWT_PRIVATE_KEY` and `JWT_PUBLIC_KEY` in your `.env` file, surrounded by double
quotes.

---

## 8.4 Docker Compose Startup and Health Checks

Docker Compose starts PostgreSQL 14 and Redis 7. The application reads the
`DATABASE_URL` and `REDIS_URL` from your `.env` file to connect to them.

```bash
# Start PostgreSQL and Redis in the background
docker compose up postgres redis -d

# Wait for health checks to pass (usually 10-15 seconds)
docker compose ps
```

Expected output when both services are healthy:
```
NAME                         STATUS         PORTS
sentryagent-idp-postgres-1   Up (healthy)   0.0.0.0:5432->5432/tcp
sentryagent-idp-redis-1      Up (healthy)   0.0.0.0:6379->6379/tcp
```

**Manual health check:**
```bash
# Test PostgreSQL connection
docker exec sentryagent-idp-postgres-1 pg_isready -U sentryagent -d sentryagent_idp
# Expected: /var/run/postgresql:5432 - accepting connections

# Test Redis connection
docker exec sentryagent-idp-redis-1 redis-cli ping
# Expected: PONG
```

---

## 8.5 Database Migrations

Run the migration script to create all required tables:

```bash
npm run db:migrate
```

Expected output:
```
Running database migrations...
✓ Applied: 001_create_agents.sql
✓ Applied: 002_create_credentials.sql
✓ Applied: 003_create_audit_events.sql
✓ Applied: 004_create_tokens.sql
✓ Applied: 005_add_vault_path.sql

Migrations complete. 5 migration(s) applied.
```

Running `npm run db:migrate` a second time is safe — it skips already-applied migrations:
```
- Skipped (already applied): 001_create_agents.sql
...
Migrations complete. 0 migration(s) applied.
```

**Migration internals:**
The migration runner (`scripts/migrate.ts`) reads `.sql` files from `src/db/migrations/`
in alphabetical order, wraps each in a transaction, and records the filename in the
`schema_migrations` table. If a migration fails, the transaction rolls back and
the runner exits with code 1.

---
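
The described behaviour can be sketched as follows — a simplified model of what `scripts/migrate.ts` does, with an assumed `Db` interface standing in for the real pg client (the actual runner differs in detail):

```typescript
// Simplified migration-runner sketch: apply .sql files in alphabetical
// order, skipping any filename already recorded in schema_migrations.
interface Db {
  appliedMigrations(): Promise<Set<string>>;
  applyInTransaction(name: string, sql: string): Promise<void>;
}

async function runMigrations(db: Db, files: Map<string, string>): Promise<number> {
  const applied = await db.appliedMigrations();
  const names = [...files.keys()].sort(); // alphabetical order drives execution order
  let count = 0;
  for (const name of names) {
    if (applied.has(name)) continue; // idempotent: second run skips these
    await db.applyInTransaction(name, files.get(name)!);
    count += 1;
  }
  return count;
}
```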

## 8.6 Start the Server

```bash
npm run dev
# Expected: SentryAgent.ai AgentIdP listening on port 3000
```

`npm run dev` uses `ts-node` to execute `src/server.ts` directly without compiling.
This is faster for development. For a production-style start, compile first:

```bash
npm run build
npm start
```

---

## 8.7 Smoke Test

Verify the server is working with these three curl commands.

**1. Health check:**
```bash
curl http://localhost:3000/health
```
Expected response (200 OK):
```json
{
  "status": "healthy",
  "checks": {
    "database": "healthy",
    "redis": "healthy"
  }
}
```

**2. Register an agent:**
First, you need a token to authenticate. But to get a token, you need credentials,
and to get credentials, you need an agent. Agent registration itself requires an
`agents:write` scoped token, so the first agent must be bootstrapped another way.

For local development, the easiest approach is to mint a test token yourself,
signed with the same key pair the server verifies against:

```bash
# Generate test keys
openssl genrsa -out /tmp/test_private.pem 2048 2>/dev/null
openssl rsa -in /tmp/test_private.pem -pubout -out /tmp/test_public.pem 2>/dev/null

# Set them in your environment temporarily
export JWT_PRIVATE_KEY="$(cat /tmp/test_private.pem)"
export JWT_PUBLIC_KEY="$(cat /tmp/test_public.pem)"

# Start the server with these keys, then mint a test token with a short Node.js script
```
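
Such a minting script could look like this — a hedged sketch using only Node's built-in `crypto`, no external dependencies. The claim names mirror the walkthroughs, but the real server may require additional claims (e.g. `iss` or `aud`):

```typescript
// Hypothetical test-token minter: builds and signs an RS256 JWT by hand.
import { createSign, generateKeyPairSync } from 'node:crypto';

function b64url(input: Buffer | string): string {
  return Buffer.from(input).toString('base64url');
}

function mintTestToken(privateKeyPem: string, sub: string, scope: string): string {
  const header = b64url(JSON.stringify({ alg: 'RS256', typ: 'JWT' }));
  const now = Math.floor(Date.now() / 1000);
  const payload = b64url(JSON.stringify({
    sub, scope, jti: `test-${now}`, iat: now, exp: now + 3600,
  }));
  const signer = createSign('RSA-SHA256');
  signer.update(`${header}.${payload}`);
  const signature = signer.sign(privateKeyPem).toString('base64url');
  return `${header}.${payload}.${signature}`;
}

// Example with a throwaway key pair (use the /tmp/test_private.pem key in practice):
const { privateKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });
const token = mintTestToken(
  privateKey.export({ type: 'pkcs8', format: 'pem' }).toString(),
  'test-agent',
  'agents:read agents:write',
);
console.log(token);
```

Send the printed token as `Authorization: Bearer <token>` when registering the first agent.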

**3. Token endpoint with seeded credentials:**

Once you have an agent with credentials (e.g. created via the API or seeded in
development), issue a token:

```bash
curl -X POST http://localhost:3000/api/v1/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -d "client_id=YOUR_AGENT_ID" \
  -d "client_secret=sk_live_YOUR_SECRET" \
  -d "scope=agents:read agents:write"
```

Expected response (200 OK):
```json
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "agents:read agents:write"
}
```

---

## 8.8 Troubleshooting

### Error: `connection refused` on PostgreSQL or Redis
**Cause:** Docker services not running or not yet healthy.
**Fix:**
```bash
docker compose ps                    # Check status
docker compose up postgres redis -d  # Start if not running
docker compose logs postgres         # Check for startup errors
```

### Error: `DATABASE_URL environment variable is required`
**Cause:** `.env` file missing or not being loaded.
**Fix:** Ensure `.env` exists at the project root. `npm run dev` loads it via `dotenv.config()` in `src/server.ts`.

### Error: `JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required`
**Cause:** JWT keys not in `.env`, or newlines in the PEM keys are not properly escaped.
**Fix:** Ensure the keys are wrapped in double quotes and newlines are represented as `\n`. Use the `awk` command from section 8.3 to format them correctly.

### Error: `Cannot find module 'ts-node'`
**Cause:** `npm install` was not run, or ran against a different Node.js version.
**Fix:**
```bash
node --version   # Confirm Node.js 18+
rm -rf node_modules package-lock.json
npm install
```

### Error: `Cannot connect to Redis` (during migration or server start)
**Cause:** Redis container not running or `REDIS_URL` is incorrect.
**Fix:**
```bash
docker exec sentryagent-idp-redis-1 redis-cli ping
# If container is not running:
docker compose up redis -d
```

### Port 3000 already in use
**Cause:** Another process is listening on port 3000.
**Fix:**
```bash
lsof -i :3000   # Find the process
kill <PID>      # Kill it
# Or: set PORT=3001 in .env and restart
```

---

## 8.9 Running Tests Locally

```bash
# Run all tests (unit + integration)
npm test

# Run tests with coverage report
npm run test:unit -- --coverage
# Coverage report: coverage/lcov-report/index.html

# Run only unit tests
npm run test:unit

# Run only integration tests (requires running PostgreSQL and Redis)
npm run test:integration

# Run a single test file
npx jest tests/unit/services/AgentService.test.ts

# Run tests matching a pattern
npx jest --testNamePattern="registerAgent"

# Watch mode (re-runs on file changes)
npx jest --watch
```

**Integration test requirements:**
Integration tests connect to real PostgreSQL and Redis. Set these environment
variables before running integration tests:
```bash
TEST_DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test
TEST_REDIS_URL=redis://localhost:6379/1
```

The integration tests create their own tables (using `CREATE TABLE IF NOT EXISTS`)
and clean up after themselves with `DELETE FROM` statements in `afterAll`.

---
|
||||
|
||||
## 8.10 Web Dashboard Local Development
|
||||
|
||||
The web dashboard is a separate Vite project in the `dashboard/` directory.
|
||||
|
||||
```bash
|
||||
# From the project root
|
||||
cd dashboard
|
||||
npm install
|
||||
|
||||
# Start the Vite development server
|
||||
npm run dev
|
||||
# Dashboard available at http://localhost:5173
|
||||
```
|
||||
|
||||
The Vite dev server proxies all `/api/*` requests to `http://localhost:3000`,
|
||||
so the API server must be running concurrently (in a separate terminal).

**Build for production:**
```bash
cd dashboard
npm run build
# Output: dashboard/dist/ (served by Express at /dashboard)
```

After building, the Express server serves the built dashboard at
`http://localhost:3000/dashboard`. You do not need to run the Vite dev server
for this — the static files are served directly by Express.

372
docs/engineering/08-workflow.md
Normal file
@@ -0,0 +1,372 @@

# 08 — Engineering Workflow

---

## 9.1 OpenSpec Spec-First Workflow

Every feature in this codebase was designed before it was implemented.
The OpenSpec workflow enforces this order without exception.

### The Full Sequence

```
1. CEO identifies a feature or change
   └── Documents it in the backlog

2. CEO approves the feature for the current sprint
   └── Creates an OpenSpec change document

3. Virtual Architect designs the API
   └── Writes the OpenAPI 3.0 spec BEFORE any implementation
   └── Produces spec file in docs/openapi/ or openspec/changes/<name>/specs/
   └── Every endpoint in the spec must have:
       - Summary and description
       - Request body schema (with all validation rules)
       - All response schemas (every status code)
       - Error response schemas
       - Authentication requirements
       - Example requests and responses

4. Virtual CTO reviews the spec
   └── Checks DRY, SOLID, AGNTCY compliance, completeness
   └── Either approves or returns with corrections

5. CEO approves the spec
   └── Only after CTO approval
   └── Scope changes require re-running from step 1

6. Virtual Principal Developer implements the spec
   └── Implementation must match the spec exactly
   └── TypeScript strict mode, DRY, SOLID, JSDoc on all public methods
   └── Zero any types
   └── All errors typed and handled

7. Virtual QA Engineer writes and runs tests
   └── Unit tests: >80% coverage on all services
   └── Integration tests: every endpoint in the spec tested
   └── Edge cases: null, empty, invalid inputs
   └── Performance: token endpoints <100ms, all others <200ms
   └── Verifies spec matches implementation exactly

8. Virtual CTO reviews the implementation and QA report
   └── If quality gates not met: returns to step 6 or 7
   └── If approved: notifies CEO

9. CEO approves → code is merged to develop
   └── No code ever goes to main without CEO awareness
```

### Rule: No Code Without a Spec

If you find yourself implementing something without an approved OpenAPI spec,
stop. Write the spec first, get it reviewed, then implement. This is not
bureaucracy — it is how you avoid building the wrong thing.

---

## 9.2 OpenSpec CLI Commands Reference

OpenSpec is the change management workflow built into this project. Changes
are tracked in `openspec/changes/`.

```bash
# Create a new change (starts the design process)
openspec new change <name>
# Creates: openspec/changes/<name>/proposal.md, design.md, specs/, tasks.md

# Check the status of all active changes
openspec status
# Shows: change name, phase (design/implementation/review), completion %

# List all active changes
openspec list

# Get implementation instructions for a change
openspec instructions <name>
# Outputs the tasks.md formatted for implementation

# Archive a completed change
openspec archive <name>
# Moves: openspec/changes/<name>/ → openspec/archive/<name>/
```

### Change Lifecycle

```
openspec/changes/<name>/
├── proposal.md — Business case and feature description (CEO-authored)
├── design.md   — Technical design decisions (Architect-authored)
├── specs/      — OpenAPI specs and interface contracts
│   └── *.md or *.yaml
└── tasks.md    — Implementation tasks checklist (checked off as work completes)
```

---

## 9.3 Branching Strategy

### Branch naming

```
feature/<short-description> — New features (from develop)
fix/<short-description>     — Bug fixes (from develop)
docs/<short-description>    — Documentation changes only (from develop)
```

### Workflow

```
main — Production. Only CTO-approved, CEO-aware merges.
└── develop — Integration branch. All feature branches merge here.
    └── feature/my-feature — Your work branch
```

1. Create your branch from `develop`:

   ```bash
   git checkout develop
   git pull origin develop
   git checkout -b feature/my-feature
   ```

2. Work on your branch, committing as you go.

3. When ready, push and open a PR targeting `develop`:

   ```bash
   git push -u origin feature/my-feature
   gh pr create --base develop --title "feat: my feature" --body "..."
   ```

4. Virtual QA reviews the PR (all quality gates must pass).

5. Virtual CTO approves the PR.

6. Merge to `develop`.

7. `develop` → `main` requires an explicit CEO decision.

**Rule:** Never push directly to `main` or `develop`. Always work through a PR.

---

## 9.4 TypeScript and Code Standards

### Strict Mode

All compiler strictness flags are enabled in `tsconfig.json`. These are
non-negotiable:

```json
{
  "strict": true,
  "noImplicitAny": true,
  "strictNullChecks": true,
  "noUnusedLocals": true,
  "noUnusedParameters": true,
  "noImplicitReturns": true
}
```

**Consequence of violation:** The TypeScript compiler (`npm run build`) will fail.
PRs that cause build failures are rejected automatically.

### No `any` Types

Never use `any`. If a third-party library returns `unknown` or `any`, cast it to
a specific interface you define:

```typescript
// BAD
const result: any = await vault.read(path);
const secret = result.data.data.clientSecret;

// GOOD
interface KvV2ReadResponse {
  data: { data: Record<string, string>; metadata: { version: number; } };
}
const result = (await vault.read(path)) as KvV2ReadResponse;
const secret = result.data.data.clientSecret;
```

### DRY (Don't Repeat Yourself)

Zero code duplication. See `04-codebase-structure.md` section 5.5 for the complete
mapping of what lives where. Before writing a utility function, check whether
it already exists in `src/utils/`.

### SOLID Principles

Each service has a single, clear responsibility. If you find yourself adding a
method to `AgentService` that queries the `audit_events` table, stop — that
belongs in `AuditService`. If you find yourself adding SQL to a controller, stop —
that belongs in a repository.

### JSDoc on All Public Methods

Every public class, method, and interface must have a JSDoc comment that includes:
- `@param` for every parameter
- `@returns` describing the return value
- `@throws` for every error that can be thrown

```typescript
/**
 * Registers a new AI agent identity.
 *
 * @param data - Agent registration request data.
 * @param ipAddress - Client IP for audit logging.
 * @param userAgent - Client User-Agent for audit logging.
 * @returns The newly created agent record.
 * @throws FreeTierLimitError if the 100-agent limit is reached.
 * @throws AgentAlreadyExistsError if the email is already registered.
 */
async registerAgent(data: ICreateAgentRequest, ipAddress: string, userAgent: string): Promise<IAgent>
```

### Error Handling

Always throw a typed error from the `SentryAgentError` hierarchy. Never throw
raw `Error` objects with string messages in service or controller code:

```typescript
// BAD
throw new Error('Agent not found');

// GOOD
throw new AgentNotFoundError(agentId);
```

The `errorHandler` middleware maps `SentryAgentError` subclasses to HTTP status
codes automatically. Adding a new error type only requires adding a class in
`src/utils/errors.ts`.
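As an illustration of that pattern, here is a minimal sketch of how such a
hierarchy can carry its own status code. The `statusCode` and `code` fields and
the mapping helper are hypothetical examples, not the repo's actual
`src/utils/errors.ts` definitions; only the class names appear in this document.

```typescript
// Illustrative sketch only; field names and the helper are hypothetical.
class SentryAgentError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
    public readonly code: string,
  ) {
    super(message);
    this.name = (this.constructor as { name: string }).name;
  }
}

class AgentNotFoundError extends SentryAgentError {
  constructor(agentId: string) {
    super(`Agent ${agentId} not found`, 404, 'AGENT_NOT_FOUND');
  }
}

// A middleware can then map any SentryAgentError to a response generically:
function toHttpResponse(err: unknown): { status: number; body: { code: string; message: string } } {
  if (err instanceof SentryAgentError) {
    return { status: err.statusCode, body: { code: err.code, message: err.message } };
  }
  // Unknown errors become a generic 500 with no internal details leaked
  return { status: 500, body: { code: 'INTERNAL_ERROR', message: 'Internal server error' } };
}
```

With this shape, adding a new error type really is just adding one subclass;
the mapping helper never changes.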

---

## 9.5 PR Checklist

Every pull request must pass all items before it can be merged.

```
Code quality:
- [ ] TypeScript builds without errors: npm run build
- [ ] No any types introduced
- [ ] All new public methods have JSDoc
- [ ] ESLint passes: npm run lint
- [ ] No code duplication — logic extracted to utils/services

Testing:
- [ ] Unit tests added for all new service methods
- [ ] Integration tests added for all new API endpoints
- [ ] Coverage threshold maintained: npm run test:unit -- --coverage
      (>80% statements, branches, functions, lines)
- [ ] Integration tests pass against real PostgreSQL and Redis

Spec compliance:
- [ ] Implementation matches the approved OpenAPI spec exactly
- [ ] If the spec needs updating, the spec was updated BEFORE the code was changed

Documentation:
- [ ] docs/engineering/ updated if the change affects service interfaces or workflows
- [ ] CHANGELOG.md updated with a summary of the change
- [ ] Any new environment variables documented in docs/engineering/07-dev-setup.md

Database:
- [ ] If a new table or column is added, a migration file exists in src/db/migrations/
- [ ] Migration file is numbered correctly and tested locally

Review:
- [ ] Virtual CTO reviewed the implementation
- [ ] Virtual QA signed off on tests
```

---

## 9.6 Virtual Engineering Team Roles for Contributors

External contributors operate within the same team structure as the internal
Virtual Engineering Team. Here is how to interact with each role:

**Virtual CTO (architecture gate)**
Open architectural discussions by filing a GitHub issue labeled `architecture`.
The Virtual CTO must approve any change to:
- The layered architecture (e.g. adding a direct DB call in a controller)
- The error hierarchy
- The authentication or authorisation flow
- Any new dependency (package)
- Multi-region deployment topology

**Virtual Architect (spec owner)**
All API changes require an updated OpenAPI spec. If you are adding an endpoint,
file an issue labeled `spec-change` with your proposed additions before writing
any code. The Architect will review and approve the spec.

**Virtual Principal Developer (code reviewer)**
All implementation PRs are reviewed for TypeScript compliance, DRY violations,
SOLID violations, JSDoc completeness, and correctness against the spec.

**Virtual QA Engineer (quality gate)**
All PRs require >80% coverage and passing integration tests. The QA Engineer
will review test completeness and flag edge cases that need coverage.

---

## 9.7 Commit Message Conventions

This project uses **Conventional Commits** (https://www.conventionalcommits.org).

### Format

```
<type>(<optional scope>): <short description>

<optional body — why this change was made>

<optional footer — breaking changes, issue references>
```

### Types

| Type | When to use | Example |
|------|-------------|---------|
| `feat` | A new feature | `feat(agents): add agent status filter to list endpoint` |
| `fix` | A bug fix | `fix(auth): handle TokenExpiredError separately from JsonWebTokenError` |
| `docs` | Documentation only | `docs(engineering): add credential rotation walkthrough` |
| `test` | Adding or updating tests | `test(oauth2): add monthly token limit integration test` |
| `chore` | Build, tooling, dependencies | `chore: update jest to 29.7.0` |
| `refactor` | Code change with no behaviour change | `refactor(credential): extract secret storage to VaultClient` |
| `perf` | Performance improvement | `perf(token): make audit log write fire-and-forget` |

### Rules

- Keep the description under 72 characters
- Use the imperative mood: "add" not "added" or "adds"
- Include the scope in parentheses when the change is limited to one area
- Reference issues in the footer: `Closes #123`
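The format rules above can be sketched as a quick subject-line check. This is a
hypothetical helper for illustration, not part of the repo's tooling, and it
applies the common simplification of capping the whole subject line at 72
characters:

```typescript
// Hypothetical sketch; this repo does not necessarily ship such a checker.
const TYPES = ['feat', 'fix', 'docs', 'test', 'chore', 'refactor', 'perf'];

/** Returns true if the first line of a commit message follows the convention. */
function isConventionalSubject(subject: string): boolean {
  // <type>(<optional scope>): <short description>
  const re = new RegExp(`^(${TYPES.join('|')})(\\([a-z0-9-]+\\))?: .+$`);
  return re.test(subject) && subject.length <= 72;
}
```

A hook or CI step could run this against the PR's commit subjects and reject
anything that fails.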

### Examples

```
feat(vault): add optional HashiCorp Vault credential backend

Adds VaultClient wrapping node-vault for KV v2 operations.
When VAULT_ADDR and VAULT_TOKEN are set, new credentials are
stored in Vault instead of as bcrypt hashes in PostgreSQL.

Backwards compatible: existing bcrypt credentials continue to work.

Closes #45
```

```
fix(opa): normalise /token/revoke path before OPA lookup

The path /api/v1/token/revoke was not in the normalisation
switch, causing OPA to deny all revocation requests even with
the correct scope.
```

```
docs: add engineering knowledge base for new hires

Adds docs/engineering/ with 11 documents covering architecture,
service deep-dives, code walkthroughs, dev setup, workflow,
testing, deployment, and SDK guide.
```

424
docs/engineering/09-testing.md
Normal file
@@ -0,0 +1,424 @@

# 09 — Testing Strategy

---

## 10.1 Test Types and Purposes

This codebase uses two types of tests. Understanding when to use each prevents
you from writing integration tests for things that should be unit tests (slow)
and unit tests for things that need a real database (misleading).

### Unit Tests

**Location:** `tests/unit/`

**What they test:** A single class or function in complete isolation. All
dependencies (repositories, services, external clients) are replaced with Jest mocks.

**When to use:**
- Testing service business logic (free-tier limits, status transitions, error cases)
- Testing utility functions (crypto, jwt, validators)
- Testing error hierarchy behaviour
- Any code that has conditional logic you want to test exhaustively

**What they do NOT test:**
- Whether the SQL queries are correct
- Whether the HTTP routing works
- Whether middleware chains execute in the right order

**Speed:** Milliseconds. Hundreds of unit tests should complete in under 10 seconds.

### Integration Tests

**Location:** `tests/integration/`

**What they test:** A full HTTP request through the Express application against
a real PostgreSQL database and real Redis instance.

**When to use:**
- Testing that a route is correctly wired to the right controller method
- Testing authentication and authorisation middleware in combination
- Testing database operations end-to-end (INSERT → read back → verify)
- Testing response shapes match the OpenAPI spec exactly

**What they require:**
- Running PostgreSQL (at `TEST_DATABASE_URL` or default)
- Running Redis (at `TEST_REDIS_URL` or default)
- The test creates its own tables and cleans up after every test case

**Speed:** Seconds. Expect 2–5 seconds per integration test file.

---

## 10.2 Test Framework Stack

| Tool | Role |
|------|------|
| **Jest 29.7** | Test runner. `describe`, `it`, `expect`, `beforeEach`, `afterAll`. Also provides mocking via `jest.mock()`, `jest.fn()`, `jest.spyOn()`. |
| **ts-jest** | Transforms TypeScript test files for Jest without a separate compilation step. Configured in `jest.config.ts`. |
| **Supertest 6.3** | HTTP testing library. Used in integration tests to make real HTTP requests against the Express app without opening a network port. Works by passing the `Application` object directly. |

**Jest configuration** (`jest.config.ts`):
```typescript
export default {
  preset: 'ts-jest',
  testEnvironment: 'node',
  roots: ['<rootDir>/tests'],
  testPathPattern: ['tests/unit', 'tests/integration'],
  collectCoverageFrom: ['src/**/*.ts', '!src/server.ts'],
};
```

---

## 10.3 Coverage Gates

All four coverage metrics must be above 80% before a feature is considered complete:

| Metric | Gate | What it means |
|--------|------|---------------|
| Statements | >80% | Each statement was executed at least once |
| Branches | >80% | Each `if`/`else`/`switch` branch was taken at least once |
| Functions | >80% | Each function was called at least once |
| Lines | >80% | Each line was executed at least once |

**Enforcement:**

Coverage is checked in the PR process:
```bash
npm run test:unit -- --coverage
# Fails if any metric is below 80%
```

Coverage reports are output to `coverage/lcov-report/index.html` for visual inspection.

The coverage threshold configuration is in `jest.config.ts`:
```typescript
coverageThreshold: {
  global: {
    statements: 80,
    branches: 80,
    functions: 80,
    lines: 80,
  },
},
```

---

## 10.4 How to Run the Test Suite

```bash
# Run all tests (unit + integration)
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

# Run unit tests with coverage report
npm run test:unit -- --coverage
# HTML report: coverage/lcov-report/index.html

# Run a single test file
npx jest tests/unit/services/AgentService.test.ts

# Run tests matching a name pattern
npx jest --testNamePattern="should throw FreeTierLimitError"

# Run tests in watch mode (re-runs on file changes)
npx jest --watch

# Run with verbose output (shows each test name)
npx jest --verbose
```

**Integration test environment variables:**
```bash
export TEST_DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test
export TEST_REDIS_URL=redis://localhost:6379/1
npm run test:integration
```

Using database index `/1` for Redis in tests prevents test runs from polluting
the main database (index `0`) used for local development.
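The index is simply the path component of the Redis URL, which clients such as
node-redis read to select the logical database. A quick illustration using
Node's WHATWG `URL` parser:

```typescript
// The Redis logical database index is the URL's path component.
const testUrl = new URL('redis://localhost:6379/1');
const dbIndex = Number(testUrl.pathname.slice(1) || '0');
// dbIndex is 1 for the test URL above; local development uses index 0
```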

---

## 10.5 Unit Test Writing Conventions

Unit tests follow a strict pattern. Study this example carefully — it shows every
convention in use.

**Real example from `tests/unit/services/AgentService.test.ts`:**

```typescript
/**
 * Unit tests for src/services/AgentService.ts
 */

import { AgentService } from '../../../src/services/AgentService';
import { AgentRepository } from '../../../src/repositories/AgentRepository';
import { CredentialRepository } from '../../../src/repositories/CredentialRepository';
import { AuditService } from '../../../src/services/AuditService';
import {
  AgentAlreadyExistsError,
  FreeTierLimitError,
} from '../../../src/utils/errors';
import { IAgent, ICreateAgentRequest } from '../../../src/types/index';

// Mock all dependencies — none of them execute real code
jest.mock('../../../src/repositories/AgentRepository');
jest.mock('../../../src/repositories/CredentialRepository');
jest.mock('../../../src/services/AuditService');

// Get typed mock constructors so we can call .mockResolvedValue() on them
const MockAgentRepository = AgentRepository as jest.MockedClass<typeof AgentRepository>;
const MockCredentialRepository = CredentialRepository as jest.MockedClass<typeof CredentialRepository>;
const MockAuditService = AuditService as jest.MockedClass<typeof AuditService>;

// Define a complete test fixture — reuse this instead of duplicating object literals
const MOCK_AGENT: IAgent = {
  agentId: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890',
  email: 'agent@sentryagent.ai',
  agentType: 'screener',
  version: '1.0.0',
  capabilities: ['resume:read'],
  owner: 'team-a',
  deploymentEnv: 'production',
  status: 'active',
  createdAt: new Date('2026-03-28T09:00:00Z'),
  updatedAt: new Date('2026-03-28T09:00:00Z'),
};

describe('AgentService', () => {
  let agentService: AgentService;
  let agentRepo: jest.Mocked<AgentRepository>;
  let credentialRepo: jest.Mocked<CredentialRepository>;
  let auditService: jest.Mocked<AuditService>;

  beforeEach(() => {
    // Clear all mocks before each test — prevents state leakage
    jest.clearAllMocks();
    // Create fresh mock instances for each test
    agentRepo = new MockAgentRepository({} as never) as jest.Mocked<AgentRepository>;
    credentialRepo = new MockCredentialRepository({} as never) as jest.Mocked<CredentialRepository>;
    auditService = new MockAuditService({} as never) as jest.Mocked<AuditService>;
    // Inject mocks into the system under test
    agentService = new AgentService(agentRepo, credentialRepo, auditService);
  });

  describe('registerAgent()', () => {
    const createData: ICreateAgentRequest = {
      email: 'agent@sentryagent.ai',
      agentType: 'screener',
      version: '1.0.0',
      capabilities: ['resume:read'],
      owner: 'team-a',
      deploymentEnv: 'production',
    };

    it('should create and return a new agent', async () => {
      // Arrange — set up mock return values
      agentRepo.countActive.mockResolvedValue(0);
      agentRepo.findByEmail.mockResolvedValue(null);
      agentRepo.create.mockResolvedValue(MOCK_AGENT);
      auditService.logEvent.mockResolvedValue({} as never);

      // Act — call the method under test
      const result = await agentService.registerAgent(createData, '127.0.0.1', 'test/1.0');

      // Assert — verify the result
      expect(result).toEqual(MOCK_AGENT);
      // Also verify the mock was called with the right arguments
      expect(agentRepo.create).toHaveBeenCalledWith(createData);
    });

    it('should throw FreeTierLimitError when 100 agents already registered', async () => {
      // Arrange — simulate limit reached
      agentRepo.countActive.mockResolvedValue(100);

      // Assert error — rejects.toThrow checks the error type
      await expect(agentService.registerAgent(createData, '127.0.0.1', 'test/1.0'))
        .rejects.toThrow(FreeTierLimitError);
    });

    it('should throw AgentAlreadyExistsError if email is already registered', async () => {
      agentRepo.countActive.mockResolvedValue(0);
      agentRepo.findByEmail.mockResolvedValue(MOCK_AGENT); // Simulate existing agent

      await expect(agentService.registerAgent(createData, '127.0.0.1', 'test/1.0'))
        .rejects.toThrow(AgentAlreadyExistsError);
    });
  });
});
```

### Conventions explained:

1. **One test file per source file.** `AgentService.test.ts` tests `AgentService.ts`.
2. **`jest.mock()` before any imports from the mocked module.** Jest hoists mock declarations.
3. **`jest.clearAllMocks()` in `beforeEach`.** Prevents mock call counts from leaking between tests.
4. **AAA pattern (Arrange, Act, Assert).** Every `it` block follows this order.
5. **Test both the happy path and every error case.** A service with 3 error conditions
   needs at least 4 tests (1 success + 3 failures).
6. **Verify mock calls for side effects.** Use `.toHaveBeenCalledWith()` to verify that
   `auditService.logEvent` was called with the right arguments, not just that it was called.
7. **Use typed error assertions.** `.rejects.toThrow(FreeTierLimitError)` verifies the
   error type, not just a message string.

---

## 10.6 Integration Test Writing Conventions

Integration tests use Supertest to make real HTTP requests against a live Express app.

**Real example from `tests/integration/agents.test.ts`:**

```typescript
/**
 * Integration tests for Agent Registry endpoints.
 */

import crypto from 'crypto';
import request from 'supertest';
import { Application } from 'express';
import { v4 as uuidv4 } from 'uuid';
import { Pool } from 'pg';

// Generate RSA keys for test tokens — done once per test module
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

// Set environment variables BEFORE importing the app
process.env['DATABASE_URL'] = process.env['TEST_DATABASE_URL'] ?? 'postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp_test';
process.env['REDIS_URL'] = process.env['TEST_REDIS_URL'] ?? 'redis://localhost:6379/1';
process.env['JWT_PRIVATE_KEY'] = privateKey;
process.env['JWT_PUBLIC_KEY'] = publicKey;
process.env['NODE_ENV'] = 'test';

import { createApp } from '../../src/app';
import { signToken } from '../../src/utils/jwt';
import { closePool } from '../../src/db/pool';
import { closeRedisClient } from '../../src/cache/redis';

// Helper: mint a valid test token
function makeToken(sub: string = uuidv4(), scope: string = 'agents:read agents:write'): string {
  return signToken({ sub, client_id: sub, scope, jti: uuidv4() }, privateKey);
}

describe('Agent Registry Integration Tests', () => {
  let app: Application;
  let pool: Pool;

  beforeAll(async () => {
    // Boot the real Express app
    app = await createApp();
    pool = new Pool({ connectionString: process.env['DATABASE_URL'] });

    // Create test tables (idempotent)
    await pool.query(`CREATE TABLE IF NOT EXISTS agents (...)`);
  });

  afterEach(async () => {
    // Clean up after each test — order matters (foreign key constraints)
    await pool.query('DELETE FROM audit_events');
    await pool.query('DELETE FROM credentials');
    await pool.query('DELETE FROM agents');
  });

  afterAll(async () => {
    // Close all connections — prevents Jest from hanging
    await pool.end();
    await closePool();
    await closeRedisClient();
  });

  describe('POST /api/v1/agents', () => {
    it('should register a new agent and return 201', async () => {
      const token = makeToken();

      const res = await request(app)
        .post('/api/v1/agents')
        .set('Authorization', `Bearer ${token}`)
        .send({
          email: 'test-agent@sentryagent.ai',
          agentType: 'screener',
          version: '1.0.0',
          capabilities: ['resume:read'],
          owner: 'test-team',
          deploymentEnv: 'development',
        });

      expect(res.status).toBe(201);
      expect(res.body.agentId).toBeDefined();
      expect(res.body.email).toBe('test-agent@sentryagent.ai');
      expect(res.body.status).toBe('active');
    });

    it('should return 401 without a token', async () => {
      const res = await request(app)
        .post('/api/v1/agents')
        .send({ email: 'test@sentryagent.ai' });

      expect(res.status).toBe(401);
    });

    it('should return 409 for duplicate email', async () => {
      const token = makeToken();
      const body = { email: 'dup@sentryagent.ai', agentType: 'screener', version: '1.0', capabilities: [], owner: 'team', deploymentEnv: 'development' };

      await request(app).post('/api/v1/agents').set('Authorization', `Bearer ${token}`).send(body);
      const res = await request(app).post('/api/v1/agents').set('Authorization', `Bearer ${token}`).send(body);

      expect(res.status).toBe(409);
      expect(res.body.code).toBe('AGENT_ALREADY_EXISTS');
    });
  });
});
```

### Conventions explained:

1. **Set `process.env` before importing the app.** The app reads env vars at import
   time (`getPool()`, JWT keys). Setting them after import does nothing.
2. **`afterEach` cleanup.** Delete all rows after each test so tests are independent.
   Always delete in child-to-parent order (audit_events → credentials → agents)
   to respect foreign key constraints.
3. **`afterAll` close connections.** Always close the pool and Redis client at the end
   of the suite. Jest will hang if connections remain open.
4. **Test both success and failure status codes.** Every endpoint test must include
   an unauthenticated request (401) and an invalid request (400).
5. **Verify response body shape.** Check `res.body.code` for error responses to
   verify the correct error type, not just the status code.
6. **Use `makeToken()` for test tokens.** A helper function keeps token creation
   consistent across all integration test files.
|
||||
|
||||
---
|
||||
|
||||
## 10.7 OWASP Top 10 Security Testing Reference
|
||||
|
||||
These are the security concerns most relevant to an identity provider. For each,
|
||||
here is what AgentIdP does to mitigate the risk and how to test it.
|
||||
|
||||
| OWASP Category | Relevant risk | Mitigation | Test approach |
|
||||
|---------------|--------------|-----------|---------------|
|
||||
| **A01 Broken Access Control** | Agent A accesses agent B's credentials | `req.user.sub !== agentId` check in all credential endpoints | Test: send credential request with a token for agent A but agentId for agent B in the path — expect 403 |
|
||||
| **A02 Cryptographic Failures** | Weak credential secrets or JWT algorithm | `sk_live_<64 hex>` = 256-bit entropy; RS256 signing; bcrypt 10 rounds | Test: verify generated secrets are 72 chars; verify JWT header shows `alg: RS256` |
|
||||
| **A03 Injection** | SQL injection via input fields | Parameterised queries (`$1, $2, ...`) in all repositories | Test: send `'; DROP TABLE agents; --` as `owner` field — expect 400 from Joi validation |
|
||||
| **A05 Security Misconfiguration** | Server leaking stack traces | `errorHandler` returns generic 500 for unknown errors | Test: trigger an unexpected error (mock a repository to throw `new Error()`) — verify response body does not contain stack trace |
|
||||
| **A06 Vulnerable Components** | Outdated dependencies with CVEs | Regular `npm audit` | Run: `npm audit` in CI; fail on high/critical findings |
|
||||
| **A07 Auth Failures** | Timing attack on credential verification | `crypto.timingSafeEqual` in `VaultClient.verifySecret()`; bcrypt inherently timing-safe | Test: measure multiple failed verification attempts with wrong secrets of varying lengths — timing should not increase linearly with shared prefix length |
|
||||
| **A08 Integrity Failures** | Forged JWT tokens | RS256 verification rejects tokens signed with wrong key | Test: create a token signed with a different private key — expect 401 |
|
||||
| **A09 Logging Failures** | Auth failures not logged | `auth.failed` audit events written for every authentication failure | Test: attempt token issuance with wrong secret — verify `auth_events` table contains `auth.failed` row |
|
||||
| **A10 SSRF** | Not applicable to current API surface | No outbound HTTP from user-supplied URLs | N/A — no URL-accepting fields in current API |
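The A02 and A07 properties from the table can be sketched together (illustrative Python, not the server's actual TypeScript code; the secret format `sk_live_` + 64 hex chars is taken from the table):

```python
import hmac
import secrets

def generate_secret() -> str:
    # 32 random bytes -> 64 hex chars -> 256 bits of entropy (A02)
    return "sk_live_" + secrets.token_hex(32)

def verify_secret(candidate: str, stored: str) -> bool:
    # Constant-time comparison, analogous to crypto.timingSafeEqual (A07)
    return hmac.compare_digest(candidate.encode(), stored.encode())

s = generate_secret()
assert len(s) == 72            # 8-char prefix + 64 hex chars
assert verify_secret(s, s)
tampered = s[:-1] + ("0" if s[-1] != "0" else "1")
assert not verify_secret(tampered, s)
```

`hmac.compare_digest` (like `timingSafeEqual`) compares the full input regardless of where the first mismatch occurs, which is what the A07 timing test is probing for.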
|
||||
|
||||
**JWT algorithm confusion (bonus):**
|
||||
Test that the server rejects tokens with `alg: none` or `alg: HS256`. The
|
||||
`verifyToken()` function specifies `algorithms: ['RS256']`, which causes jsonwebtoken
|
||||
to reject any token with a different algorithm header.
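The algorithm-allowlist check can be illustrated without the real jsonwebtoken library (hand-rolled Python sketch; the helper names are hypothetical):

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def header_alg(token: str) -> str:
    # Decode only the (unsigned) JWT header segment
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["alg"]

def accept_algorithm(token: str, allowed=("RS256",)) -> bool:
    # The allowlist check runs BEFORE any signature verification
    return header_alg(token) in allowed

# Forge an unsigned `alg: none` token -- it must be rejected up front
forged = (b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
          + "." + b64url(json.dumps({"sub": "agent-1"}).encode()) + ".")
assert header_alg(forged) == "none"
assert not accept_algorithm(forged)
```

Passing `algorithms: ['RS256']` to `jwt.verify` has the same effect: the header's `alg` is checked against the allowlist before the signature is ever evaluated.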
|
||||
273
docs/engineering/10-deployment.md
Normal file
|
||||
# 10 — Deployment and Operations
|
||||
|
||||
This document covers building and running AgentIdP in production: Docker, environment variables, database migrations, Terraform multi-region deployment, Prometheus/Grafana monitoring, and operational runbooks for common incidents.
|
||||
|
||||
---
|
||||
|
||||
## 1. Docker Build and Run
|
||||
|
||||
The Dockerfile uses a two-stage build:
|
||||
|
||||
- **Stage 1 (builder):** `node:18-alpine` — installs all dependencies (including dev) and compiles TypeScript to `dist/`.
|
||||
- **Stage 2 (production):** `node:18-alpine` — copies `dist/` and `node_modules` (production only), runs as the built-in non-root `node` user.
|
||||
|
||||
```bash
|
||||
# Build
|
||||
docker build -t sentryagent-idp:latest .
|
||||
|
||||
# Run (supply required env vars)
|
||||
docker run -d \
|
||||
-p 3000:3000 \
|
||||
-e DATABASE_URL=postgresql://sentryagent:sentryagent@<host>:5432/sentryagent_idp \
|
||||
-e REDIS_URL=redis://<host>:6379 \
|
||||
-e JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\n..." \
|
||||
-e JWT_PUBLIC_KEY="-----BEGIN PUBLIC KEY-----\n..." \
|
||||
sentryagent-idp:latest
|
||||
```
|
||||
|
||||
The container exposes port `3000`. Override with `PORT` environment variable if needed.
|
||||
|
||||
For local full-stack development, use Docker Compose instead:
|
||||
|
||||
```bash
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The `docker-compose.yml` starts the app, PostgreSQL 14, and Redis 7 with health checks and data volumes.
|
||||
|
||||
---
|
||||
|
||||
## 2. Environment Variables Reference
|
||||
|
||||
All variables are loaded at startup via `dotenv`. In production, inject them directly into the process environment — do not commit `.env` to version control.
|
||||
|
||||
| Variable | Required | Default | Purpose |
|
||||
|----------|----------|---------|---------|
|
||||
| `DATABASE_URL` | Yes | — | PostgreSQL connection string. Format: `postgresql://<user>:<password>@<host>:<port>/<db>` |
|
||||
| `REDIS_URL` | Yes | — | Redis connection URL. Format: `redis://<host>:<port>` |
|
||||
| `JWT_PRIVATE_KEY` | Yes | — | PEM-encoded RSA-2048 private key for signing RS256 JWT tokens |
|
||||
| `JWT_PUBLIC_KEY` | Yes | — | PEM-encoded RSA-2048 public key for verifying tokens on every authenticated request |
|
||||
| `PORT` | No | `3000` | HTTP port the Express server listens on |
|
||||
| `NODE_ENV` | No | `undefined` | Set to `production` in production, `test` in test (disables Morgan logging in test) |
|
||||
| `CORS_ORIGIN` | No | `*` | Allowed CORS origin(s). Set to specific URL in production (e.g. `https://app.mycompany.ai`) |
|
||||
| `VAULT_ADDR` | No | — | HashiCorp Vault server address. When set with `VAULT_TOKEN`, new credentials are stored in Vault KV v2 instead of bcrypt |
|
||||
| `VAULT_TOKEN` | No | — | Vault authentication token. Required when `VAULT_ADDR` is set |
|
||||
| `VAULT_MOUNT` | No | `secret` | KV v2 secrets engine mount path |
|
||||
| `POLICY_DIR` | No | `<cwd>/policies` | Directory containing OPA policy files (`authz.wasm` or `data/scopes.json`) |
|
||||
|
||||
**Validation at startup:** `JWT_PRIVATE_KEY` and `JWT_PUBLIC_KEY` are checked in `createApp()` (see `src/app.ts:117–121`). If missing, the process exits before binding to any port. `DATABASE_URL` and `REDIS_URL` are validated when their respective singletons are first initialised.
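The fail-fast pattern can be sketched as follows (illustrative Python mirroring the variable names in the table; the function names are hypothetical, not the actual `createApp()` code):

```python
import sys

# Variables the table marks as required (no default)
REQUIRED = ("DATABASE_URL", "REDIS_URL", "JWT_PRIVATE_KEY", "JWT_PUBLIC_KEY")

def missing_env(env: dict) -> list:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]

def validate_or_exit(env: dict) -> None:
    # Exit before the server binds any port, as described above
    missing = missing_env(env)
    if missing:
        print("Missing required environment variables: " + ", ".join(missing),
              file=sys.stderr)
        sys.exit(1)

assert missing_env({"DATABASE_URL": "x"}) == ["REDIS_URL", "JWT_PRIVATE_KEY", "JWT_PUBLIC_KEY"]
```

Failing before the port binds means orchestrators (ECS, Cloud Run) see the crash immediately instead of routing traffic to a half-configured instance.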
|
||||
|
||||
---
|
||||
|
||||
## 3. Database Migrations
|
||||
|
||||
Migrations are plain SQL files in `src/db/migrations/`. They are append-only — never modify an existing migration file. Always create a new numbered file.
|
||||
|
||||
Current migration files:
|
||||
|
||||
| File | What it creates |
|
||||
|------|----------------|
|
||||
| `001_create_agents.sql` | `agents` table with UUID primary key, email unique constraint, status enum |
|
||||
| `002_create_credentials.sql` | `credentials` table linked to `agents` by `client_id` foreign key |
|
||||
| `003_create_audit_events.sql` | `audit_events` table with JSONB `metadata` column |
|
||||
| `004_create_tokens.sql` | `token_monthly_counts` table for free-tier token limit tracking |
|
||||
| `005_add_vault_path.sql` | Adds `vault_path VARCHAR(512)` column to the `credentials` table |
|
||||
|
||||
**Run migrations:**
|
||||
|
||||
```bash
|
||||
npm run db:migrate
|
||||
```
|
||||
|
||||
This executes `scripts/migrate.ts` which applies all SQL files that have not yet been recorded in the `schema_migrations` tracking table.
|
||||
|
||||
**Adding a new migration:**
|
||||
|
||||
1. Create `src/db/migrations/006_<description>.sql`
|
||||
2. Write idempotent SQL (use `CREATE TABLE IF NOT EXISTS`, `ADD COLUMN IF NOT EXISTS`, etc.)
|
||||
3. Run `npm run db:migrate`
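The tracking-table pattern that `scripts/migrate.ts` is described as implementing can be sketched language-agnostically (illustrative Python with SQLite standing in for PostgreSQL; the migration contents here are placeholders):

```python
import sqlite3

# Placeholder migrations keyed by filename; the numeric prefix gives ordering
MIGRATIONS = {
    "001_create_agents.sql": "CREATE TABLE IF NOT EXISTS agents (id TEXT PRIMARY KEY)",
    "002_create_credentials.sql": "CREATE TABLE IF NOT EXISTS credentials (id TEXT PRIMARY KEY)",
}

def migrate(conn: sqlite3.Connection) -> list:
    """Apply each migration exactly once, recording it in schema_migrations."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    applied = []
    for name in sorted(MIGRATIONS):
        if name not in done:
            conn.execute(MIGRATIONS[name])
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
            applied.append(name)
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
assert migrate(conn) == ["001_create_agents.sql", "002_create_credentials.sql"]
assert migrate(conn) == []   # second run is a no-op
```

Because already-applied files are skipped via the tracking table, `npm run db:migrate` is safe to run on every deploy.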
|
||||
|
||||
---
|
||||
|
||||
## 4. Terraform Multi-Region Deployment
|
||||
|
||||
The `terraform/` directory contains reusable modules and two environment configurations.
|
||||
|
||||
**Directory structure:**
|
||||
|
||||
```
|
||||
terraform/
|
||||
modules/
|
||||
agentidp/ # Core AgentIdP compute resources
|
||||
lb/ # Load balancer (ALB/Cloud Load Balancer)
|
||||
rds/ # RDS PostgreSQL (AWS)
|
||||
redis/ # ElastiCache Redis (AWS) / Memorystore (GCP)
|
||||
environments/
|
||||
aws/ # AWS deployment (ECS Fargate, ALB, RDS, ElastiCache)
|
||||
gcp/ # GCP deployment (Cloud Run, Cloud SQL, Memorystore)
|
||||
```
|
||||
|
||||
### AWS Deployment
|
||||
|
||||
Architecture: `Internet → Route 53 → ALB (public subnets, HTTPS) → ECS Fargate tasks (private subnets) → RDS PostgreSQL 14 (Multi-AZ) + ElastiCache Redis 7`
|
||||
|
||||
All secrets are stored in AWS Secrets Manager and injected into ECS task definitions at launch time.
|
||||
|
||||
```bash
|
||||
cd terraform/environments/aws
|
||||
terraform init
|
||||
terraform plan -var="aws_region=us-east-1"
|
||||
terraform apply
|
||||
```
|
||||
|
||||
Resources provisioned:
|
||||
- VPC with public and private subnets across multiple availability zones
|
||||
- ECS Cluster and Fargate task definition (running `sentryagent-idp` container)
|
||||
- Application Load Balancer with HTTPS listener and health check target group
|
||||
- RDS PostgreSQL 14 (Multi-AZ for high availability)
|
||||
- ElastiCache Redis 7 (primary + replica)
|
||||
- IAM roles and instance profiles for ECS task permissions
|
||||
- Security groups enforcing least-privilege network access
|
||||
|
||||
### GCP Deployment
|
||||
|
||||
Architecture: `Internet → Cloud Run (Google-managed TLS, auto-scaling) → Cloud SQL PostgreSQL 14 (REGIONAL HA) + Memorystore Redis 7 (STANDARD_HA)`
|
||||
|
||||
All secrets are stored in GCP Secret Manager and mounted into the Cloud Run service at startup.
|
||||
|
||||
```bash
|
||||
cd terraform/environments/gcp
|
||||
terraform init
|
||||
terraform plan -var="gcp_region=us-central1"
|
||||
terraform apply
|
||||
```
|
||||
|
||||
Resources provisioned:
|
||||
- VPC network with Serverless VPC Access connector (Cloud Run → private databases)
|
||||
- Cloud Run service (auto-scales to zero, Google-managed TLS)
|
||||
- Cloud Load Balancer with global anycast IP
|
||||
- Cloud SQL PostgreSQL 14 with regional high-availability
|
||||
- Memorystore Redis 7 (STANDARD_HA with in-transit encryption)
|
||||
- IAM service accounts and bindings
|
||||
|
||||
**Important:** All infrastructure changes must go through Terraform. Never make manual edits in the AWS console or GCP Cloud Console — they will be overwritten on the next `terraform apply` and will not be tracked in state.
|
||||
|
||||
---
|
||||
|
||||
## 5. Prometheus and Grafana
|
||||
|
||||
**Metrics endpoint:** `GET /metrics` (unauthenticated — restrict in production to internal network or scrape from within the cluster)
|
||||
|
||||
The metrics endpoint is served by the `prom-client` library using a dedicated registry (`metricsRegistry`) defined in `src/metrics/registry.ts`. The registry is isolated from the default global registry to prevent conflicts in tests.
|
||||
|
||||
### Metric Definitions
|
||||
|
||||
All 6 metrics are defined in `src/metrics/registry.ts`:
|
||||
|
||||
| Metric name | Type | Labels | What it measures |
|
||||
|-------------|------|--------|-----------------|
|
||||
| `agentidp_tokens_issued_total` | Counter | `scope` | Total OAuth 2.0 access tokens issued successfully |
|
||||
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Total AI agents registered successfully |
|
||||
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | Total HTTP requests received |
|
||||
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP request duration in seconds (buckets: 5ms–2.5s) |
|
||||
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration in seconds |
|
||||
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration in seconds |
|
||||
|
||||
The HTTP metrics (`agentidp_http_requests_total` and `agentidp_http_request_duration_seconds`) are populated by `metricsMiddleware` in `src/middleware/metrics.ts`, which is registered before all routes in `src/app.ts`. Route labels are normalised to replace UUIDs with `:id` to prevent high cardinality (e.g. `/api/v1/agents/:id` rather than `/api/v1/agents/a1b2c3...`).
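The cardinality-control step can be sketched as a single regex substitution (illustrative Python; the middleware itself is TypeScript and the helper name here is hypothetical):

```python
import re

# Standard 8-4-4-4-12 UUID shape
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I
)

def normalise_route(path: str) -> str:
    """Collapse UUID path segments to ':id' before using the route as a label."""
    return UUID_RE.sub(":id", path)

assert normalise_route(
    "/api/v1/agents/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
) == "/api/v1/agents/:id"
```

Without this step, every distinct agent ID would create a new label value, and the time-series count would grow unboundedly.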
|
||||
|
||||
### Local Grafana
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d
|
||||
```
|
||||
|
||||
- Prometheus: http://localhost:9090
|
||||
- Grafana: http://localhost:3001 (admin password: `agentidp`)
|
||||
|
||||
The monitoring compose overlay starts `prom/prometheus:v2.53.0` and `grafana/grafana:11.2.0`. Grafana dashboards and datasource provisioning are loaded from `monitoring/grafana/provisioning/`.
|
||||
|
||||
### Adding a New Metric
|
||||
|
||||
1. Define the metric in `src/metrics/registry.ts` using the shared `metricsRegistry` (not the default prom-client registry).
|
||||
2. Export it from that file.
|
||||
3. Import it in the file where the instrumentation point lives.
|
||||
4. Call `.inc(labels)` for Counters or `.observe(labels, value)` for Histograms at the instrumentation point.
|
||||
5. Verify it appears in `GET /metrics` after starting the server.
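To make the Counter pattern concrete, here is a minimal hand-rolled sketch (illustrative Python, not the actual prom-client API) of a labelled counter rendered in the Prometheus text exposition format that `GET /metrics` returns:

```python
class Counter:
    """Toy labelled counter for illustration only."""

    def __init__(self, name: str, label_names: tuple):
        self.name, self.label_names, self.values = name, label_names, {}

    def inc(self, **labels) -> None:
        key = tuple(labels[n] for n in self.label_names)
        self.values[key] = self.values.get(key, 0) + 1

    def render(self) -> str:
        lines = [f"# TYPE {self.name} counter"]
        for key, v in sorted(self.values.items()):
            pairs = ",".join(f'{n}="{val}"' for n, val in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {v}")
        return "\n".join(lines)

tokens_issued = Counter("agentidp_tokens_issued_total", ("scope",))
tokens_issued.inc(scope="agents:read")
tokens_issued.inc(scope="agents:read")
assert 'agentidp_tokens_issued_total{scope="agents:read"} 2' in tokens_issued.render()
```

In the real code, step 1 above does the equivalent of this constructor call against the shared `metricsRegistry`, and step 4 is the `.inc(labels)` call at the instrumentation point.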
|
||||
|
||||
---
|
||||
|
||||
## 6. Operational Runbook
|
||||
|
||||
### Health Check
|
||||
|
||||
```bash
|
||||
curl http://<host>/health
|
||||
```
|
||||
|
||||
Expected response:
|
||||
|
||||
```json
|
||||
{"status":"ok","postgres":"connected","redis":"connected"}
|
||||
```
|
||||
|
||||
Troubleshooting:
|
||||
- If `postgres: "error"` — verify `DATABASE_URL` is correct and PostgreSQL is reachable. Check `docker compose logs postgres` for local dev.
|
||||
- If `redis: "error"` — verify `REDIS_URL` is correct and Redis is reachable. Check `docker compose logs redis` for local dev.
|
||||
- If the health endpoint returns 502 or times out — the app process has crashed; check application logs.
|
||||
|
||||
---
|
||||
|
||||
### Rotate the JWT Signing Key
|
||||
|
||||
All active tokens become invalid after a key rotation — agents must re-authenticate.
|
||||
|
||||
1. Generate a new RSA-2048 key pair:
|
||||
```bash
|
||||
openssl genrsa -out new-private.pem 2048
|
||||
openssl rsa -in new-private.pem -pubout -out new-public.pem
|
||||
```
|
||||
2. Update `JWT_PRIVATE_KEY` and `JWT_PUBLIC_KEY` in your deployment environment (AWS Secrets Manager, GCP Secret Manager, or `.env`).
|
||||
3. Perform a rolling restart:
|
||||
- **ECS:** trigger a new task deployment — ECS drains existing tasks and starts new ones with the updated secret values.
|
||||
- **Cloud Run:** deploy a new revision — Cloud Run gradually shifts traffic to the new revision.
|
||||
4. Tokens signed with the old key will fail verification immediately after all instances have restarted.
|
||||
|
||||
---
|
||||
|
||||
### Revoke All Tokens for a Compromised Agent
|
||||
|
||||
Suspend the agent to stop new token issuance immediately:
|
||||
|
||||
```bash
|
||||
curl -X PATCH http://<host>/api/v1/agents/<agentId> \
|
||||
-H "Authorization: Bearer <admin_token>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"status": "suspended"}'
|
||||
```
|
||||
|
||||
This prevents any new `POST /api/v1/token` requests for that agent. Active tokens remain valid until their TTL (1 hour). To invalidate active tokens immediately, also revoke all credentials for the agent:
|
||||
|
||||
```bash
|
||||
# List credentials
|
||||
curl http://<host>/api/v1/agents/<agentId>/credentials \
|
||||
-H "Authorization: Bearer <admin_token>"
|
||||
|
||||
# Revoke each active credential
|
||||
curl -X DELETE http://<host>/api/v1/agents/<agentId>/credentials/<credentialId> \
|
||||
-H "Authorization: Bearer <admin_token>"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Read Audit Logs for an Incident
|
||||
|
||||
Query the audit log with date range and agent filter:
|
||||
|
||||
```bash
|
||||
curl "http://<host>/api/v1/audit?agentId=<agentId>&startDate=2026-01-01T00:00:00Z&endDate=2026-01-31T23:59:59Z" \
|
||||
-H "Authorization: Bearer <admin_token>"
|
||||
```
|
||||
|
||||
Events are returned newest-first. Audit log retention is 90 days on the free tier. Each event includes: `eventId`, `agentId`, `action`, `outcome`, `ipAddress`, `userAgent`, `metadata`, `timestamp`.
|
||||
|
||||
Common `action` values: `token.issued`, `token.revoked`, `token.introspected`, `agent.created`, `agent.updated`, `agent.suspended`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `auth.failed`.
|
||||
456
docs/engineering/11-sdk-guide.md
Normal file
|
||||
# 11 — SDK Integration Guide
|
||||
|
||||
AgentIdP ships four official client SDKs — Node.js, Python, Go, and Java. All four expose an identical API surface, handle OAuth 2.0 token acquisition automatically, and throw typed errors. This document covers installation, complete working examples, error handling, and the contribution guide for adding new endpoints.
|
||||
|
||||
---
|
||||
|
||||
## 1. SDK Architecture Overview
|
||||
|
||||
Every SDK composes the same four service clients:
|
||||
|
||||
| Service client | Node.js | Python | Go | Java |
|
||||
|---------------|---------|--------|----|------|
|
||||
| Agent Registry | `AgentRegistryClient` | `AgentRegistryClient` | `AgentsClient` | `AgentServiceClient` |
|
||||
| Credential Management | `CredentialClient` | `CredentialClient` | `CredentialsClient` | `CredentialServiceClient` |
|
||||
| Token Operations | `TokenClient` | `TokenClient` | `TokenServiceClient` | `TokenServiceClient` |
|
||||
| Audit Log | `AuditClient` | `AuditClient` | `AuditClient` | `AuditServiceClient` |
|
||||
|
||||
All four SDKs also implement:
|
||||
|
||||
- **`AgentIdPClient`** — the top-level client that composes all four service clients and wires them to a shared `TokenManager`.
|
||||
- **`TokenManager`** — fetches and caches the OAuth 2.0 access token. Automatically requests a new token when the cached one is within 60 seconds of expiry. Thread-safe / goroutine-safe.
|
||||
- **Typed error class** — `AgentIdPError` (Node.js, Python, Go) or `AgentIdPException` (Java) — with `code`, `httpStatus`, and `details` fields.
|
||||
|
||||
This consistency is a maintained standard. When a new API endpoint is added to the server, it must be added to all four SDKs simultaneously.
|
||||
|
||||
---
|
||||
|
||||
## 2. Node.js SDK
|
||||
|
||||
**Install:**
|
||||
|
||||
```bash
|
||||
npm install @sentryagent/idp-sdk
|
||||
```
|
||||
|
||||
**Requirements:** Node.js 18+ (uses native `fetch`).
|
||||
|
||||
**Complete example:**
|
||||
|
||||
```typescript
|
||||
import { AgentIdPClient, AgentIdPError } from '@sentryagent/idp-sdk';
|
||||
|
||||
const client = new AgentIdPClient({
|
||||
baseUrl: 'http://localhost:3000',
|
||||
clientId: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890',
|
||||
clientSecret: 'sk_live_...',
|
||||
// Optional: restrict scopes. Defaults to all four.
|
||||
// scopes: ['agents:read', 'tokens:read'],
|
||||
});
|
||||
|
||||
// Register a new agent
|
||||
const agent = await client.agents.registerAgent({
|
||||
email: 'classifier-v2@myorg.ai',
|
||||
agentType: 'classifier',
|
||||
version: '2.0.0',
|
||||
capabilities: ['resume:read', 'classify'],
|
||||
owner: 'platform-team',
|
||||
deploymentEnv: 'production',
|
||||
});
|
||||
console.log('Registered:', agent.agentId);
|
||||
|
||||
// List active agents (token acquired automatically)
|
||||
const { data: agents } = await client.agents.listAgents({ status: 'active' });
|
||||
console.log('Active agents:', agents.length);
|
||||
|
||||
// Generate credentials for an agent
|
||||
const cred = await client.credentials.generateCredential(agent.agentId);
|
||||
console.log('Client secret (store this — shown once):', cred.clientSecret);
|
||||
|
||||
// Rotate credentials
|
||||
const newCred = await client.credentials.rotateCredential(agent.agentId, cred.credentialId);
|
||||
console.log('New secret:', newCred.clientSecret);
|
||||
|
||||
// Introspect a token
|
||||
const introspection = await client.tokens.introspectToken('eyJ...');
|
||||
console.log('Active:', introspection.active);
|
||||
|
||||
// Error handling
|
||||
try {
|
||||
await client.agents.getAgent('non-existent-id');
|
||||
} catch (err) {
|
||||
if (err instanceof AgentIdPError) {
|
||||
console.error(err.code); // e.g. AGENT_NOT_FOUND
|
||||
console.error(err.httpStatus); // e.g. 404
|
||||
console.error(err.details); // optional structured context
|
||||
}
|
||||
}
|
||||
|
||||
// Force a fresh token on the next call (e.g. after credential rotation)
|
||||
client.clearTokenCache();
|
||||
```
|
||||
|
||||
**Token manager behaviour:** `TokenManager` in `sdk/src/token-manager.ts` caches the token and requests a new one when fewer than 60 seconds remain before expiry.
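The refresh rule can be sketched as follows (illustrative Python, not the TypeScript source; the fetch function and clock are injected so the 60-second skew is easy to exercise):

```python
REFRESH_SKEW_SECONDS = 60

class TokenManager:
    """Reuse the cached token unless fewer than 60 s remain before expiry."""

    def __init__(self, fetch_token, clock):
        self._fetch, self._clock = fetch_token, clock
        self._token, self._expires_at = None, 0.0

    def get_token(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - REFRESH_SKEW_SECONDS:
            self._token, ttl = self._fetch()       # fetch returns (token, ttl_seconds)
            self._expires_at = self._clock() + ttl
        return self._token

calls = []
now = [0.0]
mgr = TokenManager(lambda: (calls.append(1) or f"tok{len(calls)}", 3600),
                   lambda: now[0])
assert mgr.get_token() == "tok1"
now[0] = 3000.0      # 600 s left -> cached token reused
assert mgr.get_token() == "tok1"
now[0] = 3550.0      # 50 s left -> refreshed
assert mgr.get_token() == "tok2"
```

The skew means a token is never handed to a caller with so little lifetime left that the request it authorises could fail mid-flight.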
|
||||
|
||||
**Service clients are accessible at:**
|
||||
- `client.agents` — `AgentRegistryClient` (register, list, get, update, decommission)
|
||||
- `client.credentials` — `CredentialClient` (generate, list, rotate, revoke)
|
||||
- `client.tokens` — `TokenClient` (introspect, revoke)
|
||||
- `client.audit` — `AuditClient` (query, get event)
|
||||
|
||||
---
|
||||
|
||||
## 3. Python SDK
|
||||
|
||||
**Install:**
|
||||
|
||||
```bash
|
||||
pip install sentryagent-idp
|
||||
```
|
||||
|
||||
**Requirements:** Python 3.9+. Synchronous client uses `requests`; asynchronous client uses `httpx`.
|
||||
|
||||
### Synchronous example
|
||||
|
||||
```python
|
||||
from sentryagent_idp import AgentIdPClient, AgentIdPError, RegisterAgentRequest
|
||||
|
||||
client = AgentIdPClient(
|
||||
base_url="http://localhost:3000",
|
||||
client_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890",
|
||||
client_secret="sk_live_...",
|
||||
# scopes=["agents:read", "tokens:read"], # optional
|
||||
)
|
||||
|
||||
# Register an agent
|
||||
agent = client.agents.register_agent(RegisterAgentRequest(
|
||||
email="screener@myorg.ai",
|
||||
agent_type="screener",
|
||||
version="1.0.0",
|
||||
capabilities=["resume:read"],
|
||||
owner="recruiting-team",
|
||||
deployment_env="production",
|
||||
))
|
||||
print("Registered:", agent.agent_id)
|
||||
|
||||
# List agents
|
||||
result = client.agents.list_agents(status="active", page=1, limit=20)
|
||||
for a in result.data:
|
||||
print(a.agent_id, a.status)
|
||||
|
||||
# Generate credentials
|
||||
cred = client.credentials.generate_credential(agent.agent_id)
|
||||
print("Client secret (shown once):", cred.client_secret)
|
||||
|
||||
# Error handling
|
||||
try:
|
||||
client.agents.get_agent("non-existent-id")
|
||||
except AgentIdPError as e:
|
||||
print(e.code) # e.g. AGENT_NOT_FOUND
|
||||
print(e.http_status) # e.g. 404
|
||||
print(e.details) # optional dict
|
||||
```
|
||||
|
||||
### Asynchronous example
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
from sentryagent_idp import AsyncAgentIdPClient, AgentIdPError
|
||||
|
||||
async def main() -> None:
|
||||
client = AsyncAgentIdPClient(
|
||||
base_url="http://localhost:3000",
|
||||
client_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890",
|
||||
client_secret="sk_live_...",
|
||||
)
|
||||
|
||||
result = await client.agents.list_agents(status="active")
|
||||
print(f"Found {result.total} active agents")
|
||||
|
||||
# Rotate a credential
|
||||
new_cred = await client.credentials.rotate_credential(
|
||||
"agent-uuid", "credential-uuid"
|
||||
)
|
||||
print("New secret:", new_cred.client_secret)
|
||||
|
||||
asyncio.run(main())
|
||||
```
|
||||
|
||||
`AsyncAgentIdPClient` uses an `AsyncTokenManager` backed by `httpx.AsyncClient`. Both sync and async clients are available from the `sentryagent_idp` top-level package.
|
||||
|
||||
---
|
||||
|
||||
## 4. Go SDK
|
||||
|
||||
**Install:**
|
||||
|
||||
```bash
|
||||
go get github.com/sentryagent/idp-sdk-go
|
||||
```
|
||||
|
||||
**Requirements:** Go 1.21+.
|
||||
|
||||
**Complete example:**
|
||||
|
||||
```go
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
"errors"
|
||||
"fmt"
|
||||
"log"
|
||||
|
||||
agentidp "github.com/sentryagent/idp-sdk-go"
|
||||
)
|
||||
|
||||
func main() {
|
||||
ctx := context.Background()
|
||||
|
||||
client := agentidp.NewAgentIdPClient(agentidp.AgentIdPClientConfig{
|
||||
BaseURL: "http://localhost:3000",
|
||||
ClientID: "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
|
||||
ClientSecret: "sk_live_...",
|
||||
// Scope: "agents:read agents:write", // optional
|
||||
})
|
||||
|
||||
// Register an agent
|
||||
agent, err := client.Agents.RegisterAgent(ctx, agentidp.RegisterAgentRequest{
|
||||
Email: "screener@myorg.ai",
|
||||
AgentType: "screener",
|
||||
Version: "1.0.0",
|
||||
Capabilities: []string{"resume:read"},
|
||||
Owner: "recruiting-team",
|
||||
DeploymentEnv: "production",
|
||||
})
|
||||
if err != nil {
|
||||
// Type-assert for structured error information
|
||||
var idpErr *agentidp.AgentIdPError
|
||||
if errors.As(err, &idpErr) {
|
||||
log.Fatalf("API error: code=%s status=%d", idpErr.Code, idpErr.HTTPStatus)
|
||||
}
|
||||
log.Fatal(err)
|
||||
}
|
||||
fmt.Println("Registered:", agent.AgentID)
|
||||
|
||||
// List agents with filters
|
||||
list, err := client.Agents.ListAgents(ctx, &agentidp.ListAgentsParams{
|
||||
Status: "active",
|
||||
Page: 1,
|
||||
Limit: 20,
|
||||
})
|
||||
if err != nil {
|
||||
log.Fatal(err)
|
||||
}
|
||||
fmt.Printf("Found %d agents\n", list.Total)
|
||||
|
||||
// Generate credentials
|
||||
cred, err := client.Credentials.GenerateCredential(ctx, agent.AgentID, nil)
|
||||
if err != nil {
|
||||
log.Fatal(err)
|
||||
}
|
||||
fmt.Println("Client secret (shown once):", cred.ClientSecret)
|
||||
|
||||
// Rotate credentials
|
||||
newCred, err := client.Credentials.RotateCredential(ctx, agent.AgentID, cred.CredentialID, nil)
|
||||
if err != nil {
|
||||
log.Fatal(err)
|
||||
}
|
||||
fmt.Println("New secret:", newCred.ClientSecret)
|
||||
}
|
||||
```
|
||||
|
||||
`context.Context` is the first parameter of every method — use `context.Background()` for simple cases or a derived context with deadline/cancellation for production code. The `TokenManager` is goroutine-safe and the client is safe for concurrent use.
|
||||
|
||||
---
|
||||
|
||||
## 5. Java SDK
|
||||
|
||||
**Maven dependency:**
|
||||
|
||||
```xml
|
||||
<dependency>
|
||||
<groupId>ai.sentryagent</groupId>
|
||||
<artifactId>idp-sdk</artifactId>
|
||||
<version>1.0.0</version>
|
||||
</dependency>
|
||||
```
|
||||
|
||||
**Gradle:**
|
||||
|
||||
```groovy
|
||||
implementation 'ai.sentryagent:idp-sdk:1.0.0'
|
||||
```
|
||||
|
||||
**Requirements:** Java 17+.
|
||||
|
||||
### Synchronous example
|
||||
|
||||
```java
|
||||
import ai.sentryagent.idp.AgentIdPClient;
|
||||
import ai.sentryagent.idp.AgentIdPException;
|
||||
import ai.sentryagent.idp.models.*;
|
||||
|
||||
// Constructed directly with (baseUrl, clientId, clientSecret); scope is optional (defaults to all four scopes)
|
||||
AgentIdPClient client = new AgentIdPClient(
|
||||
"http://localhost:3000",
|
||||
"a1b2c3d4-e5f6-7890-abcd-ef1234567890",
|
||||
"sk_live_..."
|
||||
);
|
||||
|
||||
// Register an agent
|
||||
Agent agent = client.agents().registerAgent(
|
||||
RegisterAgentRequest.builder()
|
||||
.email("screener@myorg.ai")
|
||||
.agentType("screener")
|
||||
.version("1.0.0")
|
||||
.capabilities(List.of("resume:read"))
|
||||
.owner("recruiting-team")
|
||||
.deploymentEnv("production")
|
||||
.build()
|
||||
);
|
||||
System.out.println("Registered: " + agent.getAgentId());
|
||||
|
||||
// List agents
|
||||
PaginatedAgents result = client.agents().listAgents(
|
||||
ListAgentsParams.builder().status("active").page(1).limit(20).build()
|
||||
);
|
||||
System.out.println("Total: " + result.getTotal());
|
||||
|
||||
// Generate credentials
|
||||
CredentialWithSecret cred = client.credentials().generateCredential(agent.getAgentId());
|
||||
System.out.println("Client secret (shown once): " + cred.getClientSecret());
|
||||
|
||||
// Rotate credentials
|
||||
CredentialWithSecret newCred = client.credentials().rotateCredential(
|
||||
agent.getAgentId(), cred.getCredentialId()
|
||||
);
|
||||
System.out.println("New secret: " + newCred.getClientSecret());
|
||||
|
||||
// Error handling
|
||||
try {
|
||||
client.agents().getAgent("non-existent-id");
|
||||
} catch (AgentIdPException ex) {
|
||||
System.out.printf("code=%s status=%d%n", ex.getCode(), ex.getHttpStatus());
|
||||
// e.g. code=AGENT_NOT_FOUND status=404
|
||||
}
|
||||
```
|
||||
|
||||
### Async example (CompletableFuture)
|
||||
|
||||
```java
|
||||
import java.util.concurrent.CompletableFuture;
|
||||
|
||||
// Every sync method has an async counterpart
|
||||
CompletableFuture<Agent> future = client.agents().getAgentAsync("agent-uuid");
|
||||
future.thenAccept(a -> System.out.println(a.getAgentId()));
|
||||
|
||||
// Compose multiple async calls
|
||||
client.agents().getAgentAsync("agent-uuid")
|
||||
.thenCompose(a -> client.credentials().generateCredentialAsync(a.getAgentId()))
|
||||
.thenAccept(cred -> System.out.println("New secret: " + cred.getClientSecret()))
|
||||
.exceptionally(ex -> {
|
||||
if (ex.getCause() instanceof AgentIdPException idpEx) {
|
||||
System.err.printf("code=%s%n", idpEx.getCode());
|
||||
}
|
||||
return null;
|
||||
});
|
||||
```
|
||||
|
||||
The `TokenManager` is thread-safe. `AgentIdPClient` is safe for concurrent use from multiple threads.
|
||||
|
||||
---
|
||||
|
||||
## 6. SDK Contribution Guide — Adding a New Endpoint
|
||||
|
||||
When the server adds a new API endpoint, update all four SDKs. The checklist below covers each SDK.
|
||||
|
||||
### Node.js SDK (`sdk/`)
|
||||
|
||||
```
|
||||
src/
|
||||
services/
|
||||
agents.ts # AgentRegistryClient
|
||||
credentials.ts # CredentialClient
|
||||
token.ts # TokenClient
|
||||
audit.ts # AuditClient
|
||||
types.ts # All request/response type definitions
|
||||
token-manager.ts # TokenManager
|
||||
client.ts # AgentIdPClient (top-level)
|
||||
errors.ts # AgentIdPError
|
||||
```
|
||||
|
||||
Checklist:
|
||||
- [ ] Add method to the appropriate service client in `src/services/<client>.ts`
|
||||
- [ ] Add TypeScript request/response types in `src/types.ts`
|
||||
- [ ] Add JSDoc with `@param`, `@returns`, and `@throws`
|
||||
- [ ] Add unit test in `tests/<client>.test.ts`
|
||||
- [ ] Verify `npx tsc --strict` exits 0
|
||||
|
||||
### Python SDK (`sdk-python/`)
|
||||
|
||||
```
|
||||
src/sentryagent_idp/
|
||||
services/
|
||||
agents.py # AgentRegistryClient + AsyncAgentRegistryClient
|
||||
credentials.py # CredentialClient + AsyncCredentialClient
|
||||
token.py # TokenClient + AsyncTokenClient
|
||||
audit.py # AuditClient + AsyncAuditClient
|
||||
client.py # AgentIdPClient + AsyncAgentIdPClient
|
||||
token_manager.py # TokenManager (sync)
|
||||
async_token_manager.py # AsyncTokenManager
|
||||
errors.py # AgentIdPError
|
||||
types.py # TypedDict / dataclass definitions
|
||||
```
|
||||
|
||||
Checklist:
|
||||
- [ ] Add method to both the sync and async service clients
|
||||
- [ ] Add type hints (all parameters and return types)
|
||||
- [ ] Verify `mypy --strict` passes
|
||||
- [ ] Add unit test in `tests/`
|
||||
- [ ] Verify `pytest` passes with >80% coverage
|
||||
|
||||
### Go SDK (`sdk-go/`)
|
||||
|
||||
```
|
||||
agentidp/
|
||||
client.go # AgentIdPClient + AgentIdPClientConfig
|
||||
agents.go # AgentsClient
|
||||
credentials.go # CredentialsClient
|
||||
token_service.go # TokenServiceClient
|
||||
audit.go # AuditClient
|
||||
token_manager.go # TokenManager (goroutine-safe)
|
||||
errors.go # AgentIdPError
|
||||
types.go # All request/response struct types
|
||||
request.go # Shared HTTP request helper
|
||||
```
|
||||
|
||||
Checklist:
|
||||
- [ ] Add method to the appropriate `*Client` type
|
||||
- [ ] Use `context.Context` as the first parameter
|
||||
- [ ] Add godoc comment above the method
|
||||
- [ ] Add request/response struct types in `types.go` if needed
|
||||
- [ ] Add unit test in `<file>_test.go`
|
||||
- [ ] Verify `go vet ./... && staticcheck ./...` pass
|
||||
|
||||
### Java SDK (`sdk-java/`)
|
||||
|
||||
```
|
||||
src/main/java/ai/sentryagent/idp/
|
||||
AgentIdPClient.java # Top-level client
|
||||
services/
|
||||
AgentServiceClient.java # Agent Registry
|
||||
CredentialServiceClient.java
|
||||
TokenServiceClient.java
|
||||
AuditServiceClient.java
|
||||
models/ # Request/response POJOs (@JsonProperty)
|
||||
TokenManager.java # Thread-safe token caching
|
||||
AgentIdPException.java # Typed exception
|
||||
```
|
||||
|
||||
Checklist:
|
||||
- [ ] Add sync method to the appropriate service client
|
||||
- [ ] Add `CompletableFuture<T>` async counterpart with the `Async` suffix
|
||||
- [ ] Add request/response POJO in `models/` with `@JsonProperty` annotations
|
||||
- [ ] Add Javadoc on the method
|
||||
- [ ] Add JUnit 5 test in `src/test/java/`
|
||||
- [ ] Verify `mvn verify` passes (compiles, tests, and checks coverage)
|
||||
56
docs/engineering/README.md
Normal file
|
||||
# SentryAgent.ai — Engineering Knowledge Base

> Internal reference for engineers contributing to AgentIdP. Read in order if you're new. Jump to the relevant document if you know what you need.

---

## Reading Order (New Engineers Start Here)

| # | Document | What you'll learn | Time |
|---|----------|-------------------|------|
| 1 | [Company and Product Overview](01-overview.md) | What SentryAgent.ai builds, why it exists, the product feature set, Phase roadmap | 15 min |
| 2 | [System Architecture](02-architecture.md) | Component diagram, HTTP request lifecycle, OAuth 2.0 data flow, multi-region topology | 20 min |
| 3 | [Technology Stack and ADRs](03-tech-stack.md) | Why each technology was chosen — rationale and alternatives considered | 20 min |
| 4 | [Codebase Structure](04-codebase-structure.md) | Directory map, where to add new code, DRY enforcement rules | 15 min |
| 5 | [Service Deep Dives](05-services.md) | All 8 services/components — purpose, interface, schema, error types | 30 min |
| 6 | [Annotated Code Walkthroughs](06-walkthroughs.md) | Step-by-step traces of token issuance, agent registration, credential rotation | 30 min |
| 7 | [Development Environment Setup](07-dev-setup.md) | Clone to running local stack — under 30 minutes | 30 min |
| 8 | [Engineering Workflow](08-workflow.md) | OpenSpec spec-first workflow, branching, PR checklist, commit conventions | 20 min |
| 9 | [Testing Strategy](09-testing.md) | Unit vs integration, coverage gates, how to write tests, OWASP reference | 20 min |
| 10 | [Deployment and Operations](10-deployment.md) | Docker, Terraform, Prometheus/Grafana, operational runbook | 20 min |
| 11 | [SDK Integration Guide](11-sdk-guide.md) | All 4 SDKs — installation, examples, contribution guide | 20 min |

**Total estimated reading time for new engineers: ~3.5 hours**

---

## Quick Reference

| I need to... | Go to |
|--------------|-------|
| Understand the codebase layout | [04-codebase-structure.md](04-codebase-structure.md) |
| Run the project locally | [07-dev-setup.md](07-dev-setup.md) |
| Understand how token issuance works end-to-end | [06-walkthroughs.md](06-walkthroughs.md) |
| Add a new API endpoint | [08-workflow.md](08-workflow.md) + [04-codebase-structure.md](04-codebase-structure.md) |
| Write tests | [09-testing.md](09-testing.md) |
| Deploy to production | [10-deployment.md](10-deployment.md) |
| Integrate with the SDK | [11-sdk-guide.md](11-sdk-guide.md) |
| Understand why a technology was chosen | [03-tech-stack.md](03-tech-stack.md) |

---

## Document Conventions

- **File paths** are always relative to the project root unless otherwise noted.
- **Line numbers** in [06-walkthroughs.md](06-walkthroughs.md) were verified against commit `1f95cfe`.
- **Code examples** are complete and runnable — no ellipses, no placeholders.
- **ADR** stands for Architecture Decision Record — a short document recording a technology choice.

---

## Related Documentation

- `docs/developers/` — End-user API reference (for agents calling the AgentIdP API)
- `docs/devops/` — Operator runbooks and environment variable reference
- `docs/agntcy/` — AGNTCY alignment documentation
- `openspec/` — OpenSpec change management (proposals, designs, specs, tasks, archives)
docs/openapi/compliance.yaml — 548 lines — Normal file
@@ -0,0 +1,548 @@
openapi: 3.0.3

info:
  title: SentryAgent.ai — Compliance & SOC 2 Type II Service
  version: 1.0.0
  description: |
    The Compliance Service exposes endpoints supporting SentryAgent.ai's
    **SOC 2 Type II** audit readiness programme.

    Two categories of control are surfaced:

    **Audit chain verification** (`GET /audit/verify`) — Confirms cryptographic
    integrity of the immutable audit log chain across an optional date range.
    This endpoint provides auditors and compliance tooling with a single call to
    assert that no audit events have been tampered with, deleted, or reordered
    after initial capture.

    **SOC 2 control status** (`GET /compliance/controls`) — Returns a live status
    snapshot for each of the five in-scope SOC 2 Trust Services Criteria controls
    monitored by the platform. Designed as a lightweight, public health-style
    endpoint so that monitoring infrastructure can poll without bearer credentials.

    **In-scope SOC 2 controls:**
    | Control ID | Name | Description |
    |------------|------|-------------|
    | `CC6.1` | Encryption at Rest | Verifies database and secrets store encryption is active |
    | `CC6.7` | TLS Enforcement | Confirms TLS 1.2+ is enforced on all inbound connections |
    | `CC7.2` | Audit Log Integrity | Validates audit chain hash continuity |
    | `CC9.2` | Secrets Rotation | Checks that all managed secrets are within rotation policy |
    | `CC7.1` | Webhook Dead-Letter Monitoring | Asserts dead-letter queue depth is within threshold |

    **Required scope (audit chain verify only):** `audit:read`

servers:
  - url: http://localhost:3000/api/v1
    description: Local development server
  - url: https://api.sentryagent.ai/v1
    description: Production server
tags:
  - name: Audit Chain
    description: Cryptographic integrity verification of the immutable audit event chain
  - name: Compliance Controls
    description: SOC 2 Type II control status — public health-style monitoring endpoint

components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
      description: |
        JWT access token with `audit:read` scope, obtained via `POST /token`.
        Include as: `Authorization: Bearer <token>`
  schemas:
    ChainVerificationResult:
      type: object
      description: |
        Result of an audit event chain integrity verification run.

        The audit log is structured as a hash-linked chain. Each event stores a
        reference to the hash of the preceding event. `verified: true` means every
        event in the requested window was checked and no breaks in the chain were
        detected.

        When `verified` is `false`, `brokenAtEventId` identifies the first event
        where the chain integrity check failed, enabling targeted forensic investigation.
      required:
        - verified
        - checkedCount
        - brokenAtEventId
      properties:
        verified:
          type: boolean
          description: >
            `true` if every audit event in the checked range maintains an unbroken
            cryptographic hash chain; `false` if at least one chain break was detected.
          example: true
        checkedCount:
          type: integer
          description: Total number of audit events examined during this verification run.
          minimum: 0
          example: 2847
        brokenAtEventId:
          type: string
          format: uuid
          nullable: true
          description: >
            UUID of the first audit event where chain continuity failed, or `null`
            when `verified` is `true`. Only the first detected break is reported;
            subsequent events are not checked after a break is found.
          example: null
        fromDate:
          type: string
          format: date-time
          description: >
            The ISO 8601 lower bound of the date range that was verified.
            Present only when a `fromDate` query parameter was supplied.
          example: "2026-03-01T00:00:00.000Z"
        toDate:
          type: string
          format: date-time
          description: >
            The ISO 8601 upper bound of the date range that was verified.
            Present only when a `toDate` query parameter was supplied.
          example: "2026-03-31T23:59:59.999Z"
    ControlStatus:
      type: string
      description: Operational status of a SOC 2 control at the time of the last check.
      enum:
        - passing
        - failing
        - unknown
      example: passing

    ComplianceControl:
      type: object
      description: Status record for a single SOC 2 Trust Services Criteria control.
      required:
        - id
        - name
        - status
        - lastChecked
      properties:
        id:
          type: string
          description: SOC 2 Trust Services Criteria control identifier.
          enum:
            - CC6.1
            - CC6.7
            - CC7.2
            - CC9.2
            - CC7.1
          example: "CC6.1"
        name:
          type: string
          description: Human-readable name of the control.
          example: "Encryption at Rest"
        status:
          $ref: '#/components/schemas/ControlStatus'
        lastChecked:
          type: string
          format: date-time
          description: ISO 8601 timestamp of the most recent automated check for this control.
          example: "2026-03-31T06:00:00.000Z"
    ComplianceControlsResponse:
      type: object
      description: SOC 2 compliance control status summary for all in-scope controls.
      required:
        - controls
      properties:
        controls:
          type: array
          description: Status record for each of the five in-scope SOC 2 controls.
          minItems: 5
          maxItems: 5
          items:
            $ref: '#/components/schemas/ComplianceControl'
          example:
            - id: "CC6.1"
              name: "Encryption at Rest"
              status: "passing"
              lastChecked: "2026-03-31T06:00:00.000Z"
            - id: "CC6.7"
              name: "TLS Enforcement"
              status: "passing"
              lastChecked: "2026-03-31T06:00:00.000Z"
            - id: "CC7.2"
              name: "Audit Log Integrity"
              status: "passing"
              lastChecked: "2026-03-31T06:00:00.000Z"
            - id: "CC9.2"
              name: "Secrets Rotation"
              status: "passing"
              lastChecked: "2026-03-31T06:00:00.000Z"
            - id: "CC7.1"
              name: "Webhook Dead-Letter Monitoring"
              status: "passing"
              lastChecked: "2026-03-31T06:00:00.000Z"

    ErrorResponse:
      type: object
      description: Standard error response envelope used across all SentryAgent.ai APIs.
      required:
        - code
        - message
      properties:
        code:
          type: string
          description: Machine-readable error code.
          example: "UNAUTHORIZED"
        message:
          type: string
          description: Human-readable description of the error.
          example: "A valid Bearer token is required."
        details:
          type: object
          description: Optional structured details providing additional context.
          additionalProperties: true
          example: {}
  responses:
    Unauthorized:
      description: Missing or invalid Bearer token.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            code: "UNAUTHORIZED"
            message: "A valid Bearer token is required to access this resource."

    Forbidden:
      description: Valid token but insufficient permissions. Requires `audit:read` scope.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            code: "INSUFFICIENT_SCOPE"
            message: "The 'audit:read' scope is required to verify the audit chain."

    TooManyRequests:
      description: |
        Rate limit exceeded. Retry after the reset time indicated in `X-RateLimit-Reset`.
      headers:
        X-RateLimit-Limit:
          schema:
            type: integer
            description: Maximum requests allowed per minute.
            example: 30
        X-RateLimit-Remaining:
          schema:
            type: integer
            description: Requests remaining in the current window.
            example: 0
        X-RateLimit-Reset:
          schema:
            type: integer
            description: Unix timestamp when the rate limit window resets.
            example: 1743155400
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            code: "RATE_LIMIT_EXCEEDED"
            message: "Too many requests. Please retry after the rate limit window resets."

    InternalServerError:
      description: Unexpected server error.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            code: "INTERNAL_SERVER_ERROR"
            message: "An unexpected error occurred. Please try again later."
paths:
  /audit/verify:
    get:
      operationId: verifyAuditChain
      tags:
        - Audit Chain
      summary: Verify audit log chain integrity
      description: |
        Triggers a full integrity verification pass over the immutable audit event
        chain. Each event in the log contains a cryptographic hash of the previous
        event; this endpoint traverses the chain and confirms no breaks exist.

        **Use cases:**
        - Auditor evidence collection for SOC 2 Type II assessment
        - Continuous compliance monitoring (cron-driven)
        - Incident response — confirm audit log has not been tampered with

        **Requires:** Bearer token with `audit:read` scope.

        **Rate limit:** 30 requests/minute per `client_id`. Audit chain verification
        is a computationally intensive operation and is rate-limited more aggressively
        than standard read endpoints. For continuous monitoring, poll no more than
        once per minute.

        **Date range filtering:** Supply `fromDate` and/or `toDate` to restrict
        verification to a specific window. When omitted, the entire retained audit
        log is verified. `fromDate` must be before or equal to `toDate` when both
        are provided.

        **Result interpretation:**
        - `verified: true` — chain is intact across all checked events
        - `verified: false` — at least one chain break detected; `brokenAtEventId`
          identifies the first affected event
      security:
        - BearerAuth: []
      parameters:
        - name: fromDate
          in: query
          description: |
            ISO 8601 date-time lower bound for the verification window (inclusive).
            When omitted, verification starts from the earliest available audit event.
            Must be before or equal to `toDate` when both are supplied.
          required: false
          schema:
            type: string
            format: date-time
            example: "2026-03-01T00:00:00.000Z"
        - name: toDate
          in: query
          description: |
            ISO 8601 date-time upper bound for the verification window (inclusive).
            When omitted, verification runs up to and including the most recent
            audit event. Must be after or equal to `fromDate` when both are supplied.
          required: false
          schema:
            type: string
            format: date-time
            example: "2026-03-31T23:59:59.999Z"
      responses:
        '200':
          description: |
            Audit chain verification completed. Inspect `verified` to determine
            whether chain integrity is intact. A `200` is returned regardless of
            whether verification passed or failed — check the response body.
          headers:
            X-RateLimit-Limit:
              schema:
                type: integer
                description: Maximum requests allowed per minute for this endpoint.
                example: 30
            X-RateLimit-Remaining:
              schema:
                type: integer
                description: Requests remaining in the current rate limit window.
                example: 29
            X-RateLimit-Reset:
              schema:
                type: integer
                description: Unix timestamp when the rate limit window resets.
                example: 1743155400
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChainVerificationResult'
              examples:
                chainIntact:
                  summary: Verification passed — chain is intact
                  value:
                    verified: true
                    checkedCount: 2847
                    brokenAtEventId: null
                    fromDate: "2026-03-01T00:00:00.000Z"
                    toDate: "2026-03-31T23:59:59.999Z"
                chainBroken:
                  summary: Verification failed — chain break detected
                  value:
                    verified: false
                    checkedCount: 1203
                    brokenAtEventId: "c4d5e6f7-a8b9-0123-cdef-456789012345"
                    fromDate: "2026-03-01T00:00:00.000Z"
                    toDate: "2026-03-31T23:59:59.999Z"
                noDateRange:
                  summary: Full log verified (no date range supplied)
                  value:
                    verified: true
                    checkedCount: 18504
                    brokenAtEventId: null
        '400':
          description: Invalid query parameter value or date range.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              examples:
                invalidFromDate:
                  summary: fromDate is not a valid ISO 8601 date-time
                  value:
                    code: "VALIDATION_ERROR"
                    message: "Invalid query parameter value."
                    details:
                      field: "fromDate"
                      reason: "Must be a valid ISO 8601 date-time string (e.g. 2026-03-01T00:00:00.000Z)."
                invalidToDate:
                  summary: toDate is not a valid ISO 8601 date-time
                  value:
                    code: "VALIDATION_ERROR"
                    message: "Invalid query parameter value."
                    details:
                      field: "toDate"
                      reason: "Must be a valid ISO 8601 date-time string (e.g. 2026-03-31T23:59:59.999Z)."
                invalidDateRange:
                  summary: fromDate is after toDate
                  value:
                    code: "VALIDATION_ERROR"
                    message: "Invalid date range."
                    details:
                      reason: "fromDate must be before or equal to toDate."
        '401':
          $ref: '#/components/responses/Unauthorized'
        '403':
          $ref: '#/components/responses/Forbidden'
        '429':
          $ref: '#/components/responses/TooManyRequests'
        '500':
          $ref: '#/components/responses/InternalServerError'
  /compliance/controls:
    get:
      operationId: getComplianceControls
      tags:
        - Compliance Controls
      summary: Get SOC 2 control status summary
      description: |
        Returns a live status snapshot for each of the five in-scope SOC 2 Type II
        Trust Services Criteria controls monitored by the SentryAgent.ai platform.

        **No authentication required.** This endpoint is intentionally public
        (analogous to a health check) so that external monitoring infrastructure,
        status pages, and audit tooling can poll it without bearer credentials.

        **Controls monitored:**
        | Control ID | Name | What is checked |
        |------------|------|-----------------|
        | `CC6.1` | Encryption at Rest | Database and secrets store encryption is active and configured |
        | `CC6.7` | TLS Enforcement | TLS 1.2+ is enforced on all platform inbound connections |
        | `CC7.2` | Audit Log Integrity | Audit chain hash continuity — a summary of the latest `/audit/verify` result |
        | `CC9.2` | Secrets Rotation | All managed secrets are within the rotation policy window |
        | `CC7.1` | Webhook Dead-Letter Monitoring | Dead-letter queue depth is within the acceptable threshold |

        **Status values:**
        - `passing` — control is operating within policy
        - `failing` — control has breached policy; immediate attention required
        - `unknown` — automated check could not complete (e.g. dependency unavailable)

        **Caching note:** Responses may be cached for up to 60 seconds by
        intermediate proxies. The `lastChecked` field on each control indicates
        the timestamp of the most recent automated evaluation.

        **Rate limit:** 120 requests/minute per IP address.
      security: []
      responses:
        '200':
          description: SOC 2 control status summary returned successfully.
          headers:
            Cache-Control:
              schema:
                type: string
                description: >
                  Downstream caches may serve this response for up to 60 seconds.
                example: "public, max-age=60"
            X-RateLimit-Limit:
              schema:
                type: integer
                description: Maximum requests allowed per minute for this endpoint.
                example: 120
            X-RateLimit-Remaining:
              schema:
                type: integer
                description: Requests remaining in the current rate limit window.
                example: 119
            X-RateLimit-Reset:
              schema:
                type: integer
                description: Unix timestamp when the rate limit window resets.
                example: 1743155400
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ComplianceControlsResponse'
              examples:
                allPassing:
                  summary: All controls passing
                  value:
                    controls:
                      - id: "CC6.1"
                        name: "Encryption at Rest"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC6.7"
                        name: "TLS Enforcement"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC7.2"
                        name: "Audit Log Integrity"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC9.2"
                        name: "Secrets Rotation"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC7.1"
                        name: "Webhook Dead-Letter Monitoring"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                oneControlFailing:
                  summary: One control failing (secrets rotation overdue)
                  value:
                    controls:
                      - id: "CC6.1"
                        name: "Encryption at Rest"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC6.7"
                        name: "TLS Enforcement"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC7.2"
                        name: "Audit Log Integrity"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC9.2"
                        name: "Secrets Rotation"
                        status: "failing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC7.1"
                        name: "Webhook Dead-Letter Monitoring"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                unknownControl:
                  summary: One control in unknown state (dependency unavailable)
                  value:
                    controls:
                      - id: "CC6.1"
                        name: "Encryption at Rest"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC6.7"
                        name: "TLS Enforcement"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC7.2"
                        name: "Audit Log Integrity"
                        status: "unknown"
                        lastChecked: "2026-03-31T05:00:00.000Z"
                      - id: "CC9.2"
                        name: "Secrets Rotation"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
                      - id: "CC7.1"
                        name: "Webhook Dead-Letter Monitoring"
                        status: "passing"
                        lastChecked: "2026-03-31T06:00:00.000Z"
        '429':
          $ref: '#/components/responses/TooManyRequests'
        '500':
          $ref: '#/components/responses/InternalServerError'
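Consuming the public controls endpoint from monitoring code is straightforward. The sketch below is illustrative, not official client code: the struct names are assumptions derived from the `ComplianceControlsResponse` schema above, and an `httptest` server stands in for the live API (returning an abbreviated two-control payload for brevity).

```go
// Illustrative sketch: poll GET /compliance/controls (public, no auth per the
// spec above) and report any SOC 2 control whose status is not "passing".
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type complianceControl struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	Status      string `json:"status"`
	LastChecked string `json:"lastChecked"`
}

type complianceControlsResponse struct {
	Controls []complianceControl `json:"controls"`
}

// failingControls returns the IDs of controls whose status is not "passing"
// (both "failing" and "unknown" are treated as needing attention).
func failingControls(baseURL string) ([]string, error) {
	resp, err := http.Get(baseURL + "/compliance/controls")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var body complianceControlsResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	var bad []string
	for _, c := range body.Controls {
		if c.Status != "passing" {
			bad = append(bad, c.ID)
		}
	}
	return bad, nil
}

func main() {
	// Fake server returning an abbreviated version of the spec's
	// "one control failing" example.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(complianceControlsResponse{Controls: []complianceControl{
			{ID: "CC6.1", Name: "Encryption at Rest", Status: "passing", LastChecked: "2026-03-31T06:00:00.000Z"},
			{ID: "CC9.2", Name: "Secrets Rotation", Status: "failing", LastChecked: "2026-03-31T06:00:00.000Z"},
		}})
	}))
	defer srv.Close()

	bad, err := failingControls(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("failing=%v\n", bad)
}
```

Because the endpoint may be cached for up to 60 seconds, a poller should treat `lastChecked`, not the fetch time, as the evaluation timestamp.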
monitoring/grafana/dashboards/agentidp.json — 226 lines — Normal file
@@ -0,0 +1,226 @@
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": { "type": "grafana", "uid": "-- Grafana --" },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "description": "SentryAgent.ai AgentIdP — Application Overview",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "panels": [
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
      "id": 1,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "rate(agentidp_tokens_issued_total[1m])",
          "legendFormat": "scope={{ scope }}",
          "refId": "A"
        }
      ],
      "title": "Tokens Issued / min",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
      "id": 2,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "rate(agentidp_agents_registered_total[1m])",
          "legendFormat": "env={{ deployment_env }}",
          "refId": "A"
        }
      ],
      "title": "Agents Registered / min",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 8 },
      "id": 3,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "rate(agentidp_http_requests_total[1m])",
          "legendFormat": "{{ method }} {{ route }}",
          "refId": "A"
        }
      ],
      "title": "HTTP Request Rate / min",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 },
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "red", "value": 0.01 }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
      "id": 4,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "rate(agentidp_http_requests_total{status_code=~\"5..\"}[1m])",
          "legendFormat": "{{ method }} {{ route }} {{ status_code }}",
          "refId": "A"
        }
      ],
      "title": "HTTP Error Rate (5xx)",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 16 },
      "id": 5,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "histogram_quantile(0.99, rate(agentidp_http_request_duration_seconds_bucket[5m]))",
          "legendFormat": "p99 {{ method }} {{ route }}",
          "refId": "A"
        }
      ],
      "title": "HTTP P99 Latency",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 16 },
      "id": 6,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "histogram_quantile(0.95, rate(agentidp_db_query_duration_seconds_bucket[5m]))",
          "legendFormat": "p95 {{ operation }}",
          "refId": "A"
        }
      ],
      "title": "DB Query P95 Latency",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "lineWidth": 2, "fillOpacity": 10 },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 24 },
      "id": 7,
      "options": {
        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom" },
        "tooltip": { "mode": "multi" }
      },
      "targets": [
        {
          "datasource": { "type": "prometheus", "uid": "prometheus" },
          "expr": "histogram_quantile(0.95, rate(agentidp_redis_command_duration_seconds_bucket[5m]))",
          "legendFormat": "p95 {{ command }}",
          "refId": "A"
        }
      ],
      "title": "Redis Command P95 Latency",
      "type": "timeseries"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 39,
  "tags": ["agentidp", "sentryagent"],
  "templating": { "list": [] },
  "time": { "from": "now-1h", "to": "now" },
  "timepicker": {},
  "timezone": "browser",
  "title": "SentryAgent.ai — AgentIdP",
  "uid": "agentidp-overview",
  "version": 1,
  "weekStart": ""
}
monitoring/grafana/provisioning/dashboards/provider.yml — 11 lines — Normal file
@@ -0,0 +1,11 @@
apiVersion: 1

providers:
  - name: AgentIdP
    orgId: 1
    folder: AgentIdP
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    options:
      path: /var/lib/grafana/dashboards
@@ -0,0 +1,9 @@
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
monitoring/prometheus/alerts.yml — 50 lines — Normal file
@@ -0,0 +1,50 @@
groups:
  - name: agentidp_alerts
    rules:
      - alert: AuthFailureSpike
        expr: rate(agentidp_http_requests_total{status_code="401"}[5m]) > 0.5
        for: 2m
        labels: { severity: warning }
        annotations:
          summary: "Auth failure spike detected"
          description: "More than 0.5 auth failures/sec over the past 2 minutes."

      - alert: RateLimitExhaustion
        expr: rate(agentidp_http_requests_total{status_code="429"}[5m]) > 0.2
        for: 2m
        labels: { severity: warning }
        annotations:
          summary: "Rate limit exhaustion spike"
          description: "Sustained rate limit rejections over the past 2 minutes."

      - alert: AnomalousTokenIssuance
        expr: rate(agentidp_tokens_issued_total[5m]) > 10
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Anomalous token issuance rate"
          description: "More than 10 tokens/sec issued over the past 5 minutes."

      - alert: WebhookDeadLetterAccumulating
        expr: increase(agentidp_webhook_dead_letters_total[1h]) > 10
        for: 0m
        labels: { severity: critical }
        annotations:
          summary: "Webhook dead-letter accumulation"
          description: "More than 10 webhook deliveries moved to dead-letter in the past hour."

      - alert: AuditChainIntegrityFailed
        expr: agentidp_audit_chain_integrity == 0
        for: 0m
        labels: { severity: critical }
        annotations:
          summary: "Audit chain integrity failure"
          description: "Audit chain verification failed — possible log tampering detected."

      - alert: CredentialExpiryApproaching
        expr: increase(agentidp_credentials_expiring_soon_total[1h]) > 0
        for: 0m
        labels: { severity: info }
        annotations:
          summary: "Credentials expiring soon"
          description: "One or more agent credentials will expire within 7 days."
13
monitoring/prometheus/prometheus.yml
Normal file
@@ -0,0 +1,13 @@
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - alerts.yml

scrape_configs:
  - job_name: 'agentidp'
    static_configs:
      - targets: ['agentidp:3000']
    metrics_path: /metrics
    scheme: http
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-29
105
openspec/changes/archive/2026-03-29-engineering-docs/design.md
Normal file
@@ -0,0 +1,105 @@
## Context

SentryAgent.ai has completed Phase 1 (MVP) and Phase 2 (Production-Ready), producing a fully implemented AgentIdP with 12 capabilities across ~150 source files, 4 language SDKs, Terraform infrastructure, and a React web dashboard. The codebase is mature but undocumented at the engineering level — there are bedroom developer guides (`docs/developers/`) and DevOps guides (`docs/devops/`), but no structured internal engineering knowledge base.

New hires arrive with a BSc in Computer Science and one year of industrial experience. They understand programming fundamentals and have worked on codebases before, but they have no context on: what SentryAgent.ai is building, why architectural decisions were made, how the codebase is structured, how to navigate the services, how to contribute per our standards, or how the OpenSpec workflow operates. Without documentation, onboarding is fragmented and relies entirely on the CTO's time.

The goal is a `docs/engineering/` directory that a new engineer can read sequentially from top to bottom and arrive ready to contribute within their first week.

## Goals / Non-Goals

**Goals:**
- Produce a complete top-down engineering knowledge base readable in sequence
- Cover all 10 capability areas identified in the proposal
- Calibrate depth for BSc + 1yr experience — assume programming competence, explain domain and architectural decisions
- Every document is self-contained, with internal cross-links where needed
- All code examples are complete and runnable (no ellipses, no `// ... rest of code`)
- Development environment setup is achievable in under 30 minutes following the guide alone
- Annotated walkthroughs trace the three critical flows through every layer of code with file:line references

**Non-Goals:**
- Not a replacement for `docs/developers/` (end-user API reference) or `docs/devops/` (operator runbooks)
- Not a tutorial for learning TypeScript, React, or Terraform — assumes language competence
- Not a complete API reference — `docs/developers/api-reference.md` already covers that
- Not roadmap documentation — focuses on what is built, not what is planned

## Decisions

### D1: Location — `docs/engineering/` as a flat directory with an index

**Decision**: All engineering docs live in `docs/engineering/` as flat markdown files with a `README.md` index.

**Rationale**: Deeply nested directory structures create navigation friction. A flat layout with numbered filenames (`01-overview.md`, `02-architecture.md`) makes the reading order obvious without a build tool. Gitea renders markdown natively, so no documentation site tooling is required.

**Alternatives considered**:
- `docs/engineering/<subdirs>/` — rejected: adds navigation complexity with no benefit at our current document count
- Docusaurus site — rejected: adds build infrastructure overhead; plain markdown in-repo is sufficient and always in sync with code

---

### D2: Numbered file naming for enforced reading order

**Decision**: Files are named `01-overview.md` through `10-sdk-guide.md`.

**Rationale**: New engineers need a guided path, not a reference library. Numbers make the intended reading sequence unambiguous without any tooling. The `README.md` index maps numbers to sections.

---

### D3: Annotated walkthroughs use file:line references

**Decision**: Code walkthrough documents reference actual source files with line numbers (e.g., `src/controllers/agentController.ts:45`).

**Rationale**: Engineers with 1yr experience learn fastest by reading real code, not simplified pseudocode. File:line references let them jump directly to the relevant section in their editor or on Gitea.

**Trade-off**: Line numbers drift as code changes. Mitigation: walkthrough documents include a "last verified" version comment and note which commit they were verified against. The CTO adds walkthrough review to the Phase 3 change process as a maintenance item.

---

### D4: Three walkthroughs selected by criticality and complexity

**Decision**: Walkthroughs cover: (1) OAuth 2.0 token issuance, (2) agent registration, (3) credential rotation.

**Rationale**:
- Token issuance is the highest-traffic path and touches the most layers (controller → service → repository → Redis → JWT signing)
- Agent registration is the entry point for all users and demonstrates the full validation + persistence + audit pattern
- Credential rotation demonstrates the Vault integration path and shows how Phase 2 extended Phase 1 patterns

These three flows collectively exercise every architectural layer and every major design pattern in the codebase.

---

### D5: Service deep-dives use a consistent template

**Decision**: Each service deep-dive follows the structure: Purpose → Responsibility boundary → Interface → Key methods → Database schema (if applicable) → Error types → Configuration.

**Rationale**: Consistency reduces cognitive load. An engineer who has read the AgentService deep-dive knows exactly where to look for the same information in the OAuth2Service deep-dive. The template mirrors SOLID's Single Responsibility Principle — each section answers one question.

---

### D6: Engineering workflow doc is prescriptive, not descriptive

**Decision**: The workflow guide tells engineers exactly what to do step by step, not just what the process is.

**Rationale**: Engineers with 1yr experience have worked in teams but may not have used a spec-first workflow before. A prescriptive guide ("Step 1: run `openspec new change <name>`") reduces ambiguity and enforces our standards from day one.

## Risks / Trade-offs

**[Line numbers drift as code evolves]** → Walkthroughs include a "last verified against commit X" header. The CTO assigns a quarterly walkthrough review task in each Phase change.

**[Docs can become stale if not maintained]** → Each document has a "Last updated" field in its header. The engineering workflow guide explicitly requires updating relevant engineering docs as part of any PR that changes architecture or public service interfaces.

**[Scope is large — ~15 documents, ~10,000 lines]** → Tasks are broken into discrete documents, each independently completable. No document depends on another being written first (only the index depends on all others).

## Migration Plan

1. Create the `docs/engineering/` directory
2. Write all 15 documents (10 capability areas, some split across multiple files)
3. Write the `docs/engineering/README.md` index with links and reading order
4. Commit all to `develop` in a single commit
5. No existing documentation is modified or removed

No rollback required — this change is additive only.

## Open Questions

_(none — all decisions made above; scope fully defined in proposal)_
@@ -0,0 +1,42 @@
## Why

SentryAgent.ai is growing and hiring engineers with a BSc in Computer Science and one year of industrial experience. There are currently no internal engineering documents that explain how the system works from the top down — new engineers have no structured path from product vision to running code, and no reference for how to contribute correctly. This gap slows onboarding, increases mistakes, and risks divergence from our architecture and standards.

## What Changes

- New `docs/engineering/` directory added to the repository as the canonical engineering knowledge base
- Top-down documentation suite covering all layers of the system: company vision → architecture → codebase → services → workflows → operations
- Annotated code walkthroughs for the three most critical system flows (token issuance, agent registration, credential rotation)
- Development environment setup guide targeting < 30 minutes from clone to running local stack
- Engineering workflow guide covering the full OpenSpec → Architect → Developer → QA → merge cycle
- Service deep-dive documents for all 8 core services/components
- SDK integration guide covering all four language SDKs
- Testing strategy and quality gate reference
- Deployment and operations reference covering Docker, Terraform, and monitoring

## Capabilities

### New Capabilities

- `engineering-overview`: Company mission, product vision, system purpose, and how the engineering team operates — the entry point for all new hires
- `architecture-guide`: System architecture including component diagram, data flow diagrams, deployment topology, and technology stack rationale (ADRs)
- `codebase-structure`: Annotated directory map explaining every top-level directory and key file, what lives where and why
- `service-deep-dives`: Per-service documentation for AgentService, OAuth2Service, CredentialService, AuditService, VaultClient, OPA policy engine, Web Dashboard, and Prometheus/Grafana monitoring
- `code-walkthroughs`: Step-by-step annotated traces of the three critical flows: token issuance end-to-end, agent registration end-to-end, credential rotation end-to-end
- `dev-environment-setup`: Local development environment setup — prerequisites, clone, configure, Docker Compose up, smoke test — targeting < 30 minutes
- `engineering-workflow`: How to contribute — OpenSpec spec-first workflow, branching strategy, PR standards, quality gates, and the role of each virtual engineering team member
- `testing-strategy`: Test framework, test types (unit vs integration), coverage gates, how to run tests, and how to write new tests following project conventions
- `deployment-operations`: Docker build and run, Terraform multi-region deployment, environment configuration, Prometheus/Grafana monitoring, and operational runbooks
- `sdk-guide`: Integration guide for Node.js, Python, Go, and Java SDKs — installation, authentication, all major operations, error handling

### Modified Capabilities

_(none — this change adds documentation only; no existing spec-level behavior changes)_

## Impact

- **Repository**: New `docs/engineering/` directory (~15 documents, ~10,000 lines of markdown)
- **No code changes**: Documentation only — zero impact on `src/`, `tests/`, `sdk/`, or infrastructure
- **Dependencies**: None — no new packages required
- **APIs**: No API changes
- **Existing docs**: `docs/developers/` (bedroom developer guide) and `docs/devops/` (operations) remain unchanged; this is an additive engineering-internal knowledge base
@@ -0,0 +1,35 @@
## ADDED Requirements

### Requirement: System architecture document
The system SHALL include a document (`docs/engineering/02-architecture.md`) that describes the full system architecture: components, their responsibilities, how they communicate, and the deployment topology.

#### Scenario: Component diagram present
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL find an ASCII or Mermaid component diagram showing all major components (API server, PostgreSQL, Redis, Vault, OPA, Web Dashboard, Prometheus, Grafana) and their connections

#### Scenario: Request lifecycle explained
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand how an incoming HTTP request flows from client → Express router → middleware chain → controller → service → repository → database and back
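The layering that scenario names can be modeled without Express. Everything below (names, routes, handlers) is an illustrative stand-in for the real middleware chain and service classes, not project code:

```typescript
// Dependency-free model of the request path: middleware chain -> controller -> service -> repository.
interface Req { path: string; headers: Record<string, string>; }
interface Res { status: number; body: unknown; }
type Middleware = (req: Req, next: () => Res) => Res;

// Cross-cutting concern: reject requests with no Authorization header.
const authMiddleware: Middleware = (req, next) =>
  req.headers.authorization ? next() : { status: 401, body: { error: "unauthorized" } };

// Data access layer (in-memory here; PostgreSQL in the real system).
const agentRepository = { findById: (id: string) => ({ id, name: "demo-agent" }) };
// Business logic layer delegates persistence to the repository.
const agentService = { getAgent: (id: string) => agentRepository.findById(id) };
// HTTP layer translates the service result into a response.
const agentController = (_req: Req): Res => ({ status: 200, body: agentService.getAgent("a1") });

// Run the middleware chain, falling through to the controller at the end.
function handle(req: Req, chain: Middleware[], controller: (r: Req) => Res): Res {
  const run = (i: number): Res => (i < chain.length ? chain[i](req, () => run(i + 1)) : controller(req));
  return run(0);
}
```

The point of the sketch is the direction of the dependencies: each layer only calls the one beneath it, which is what the architecture document needs to make explicit.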
#### Scenario: Data flow for authentication described
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the OAuth 2.0 Client Credentials flow: client presents credentials → token service validates → Redis checked for existing token → JWT signed and returned
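A minimal sketch of that sequence, with an in-memory Map standing in for the Redis cache and a hard-coded secret standing in for the credential store. All names and values are illustrative, not the real service API:

```typescript
import { createHmac } from "node:crypto";

// Stand-in for the Redis token cache: clientId -> token.
const tokenCache = new Map<string, string>();

// HS256 signing of a header.payload.signature JWT, enough to show the shape.
function signJwt(payload: object, signingKey: string): string {
  const enc = (o: object) => Buffer.from(JSON.stringify(o)).toString("base64url");
  const head = enc({ alg: "HS256", typ: "JWT" });
  const body = enc(payload);
  const sig = createHmac("sha256", signingKey).update(`${head}.${body}`).digest("base64url");
  return `${head}.${body}.${sig}`;
}

function issueToken(clientId: string, clientSecret: string): string {
  // 1. Validate the presented credentials (the real path checks Vault or a bcrypt hash).
  if (clientSecret !== "s3cret") throw new Error("invalid_client");
  // 2. Check the cache for an existing token before signing a new one.
  const cached = tokenCache.get(clientId);
  if (cached) return cached;
  // 3. Sign a JWT and cache it, mirroring the Redis write.
  const token = signJwt({ sub: clientId, exp: Math.floor(Date.now() / 1000) + 3600 }, "signing-key");
  tokenCache.set(clientId, token);
  return token;
}
```

The cache check before signing is the detail worth calling out: repeated requests from the same client reuse one token instead of minting a fresh JWT per call.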
#### Scenario: Deployment topology covered
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the multi-region deployment model (US, EU, APAC) and how Terraform provisions it

### Requirement: Technology stack and ADR document
The system SHALL include a document (`docs/engineering/03-tech-stack.md`) that lists every technology in the stack and explains why it was chosen over alternatives.

#### Scenario: Every major technology documented with rationale
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL find an entry for each technology (Node.js 18, TypeScript 5.3, Express 4.18, PostgreSQL 14, Redis 7, HashiCorp Vault, OPA, React 18, Vite 5, Prometheus, Grafana, Terraform) with: what it does in the system, why it was chosen, and what was considered but rejected

#### Scenario: TypeScript strict mode rationale explained
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand why strict mode is mandatory (safety, correctness, no implicit any) and what the consequences of violating it are

#### Scenario: PostgreSQL vs Redis responsibility boundary clear
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand what is stored in PostgreSQL (persistent state: agents, credentials, audit logs) vs Redis (ephemeral state: active tokens, rate limit counters)
@@ -0,0 +1,27 @@
## ADDED Requirements

### Requirement: Annotated code walkthrough documents
The system SHALL include a document (`docs/engineering/06-walkthroughs.md`) containing three annotated end-to-end walkthroughs of the system's critical flows, with file:line references to actual source code.

#### Scenario: Token issuance walkthrough complete
- **WHEN** a new engineer reads the token issuance walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /oauth2/token → Express router → auth middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt credential check → Redis token cache check → JWT signing (src/utils/jwt.ts) → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number where it occurs

#### Scenario: Agent registration walkthrough complete
- **WHEN** a new engineer reads the agent registration walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents → auth middleware → validation middleware → AgentController → AgentService.createAgent → input validation (src/utils/validators.ts) → AgentRepository.create → PostgreSQL INSERT → AuditService.logEvent → HTTP 201 response with agent object
- **AND** every step SHALL reference the actual file and line number
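The validation → persistence → audit ordering that walkthrough traces can be sketched in a few lines. All names are in-memory stand-ins for the real AgentService, AgentRepository, and AuditService, and the validation rule is illustrative:

```typescript
interface Agent { id: string; name: string; }

const agents = new Map<string, Agent>(); // stand-in for the PostgreSQL agents table
const auditLog: string[] = [];           // stand-in for AuditService

function createAgent(name: string): Agent {
  // 1. Validation happens before any side effect.
  if (!/^[a-z0-9-]{3,64}$/.test(name)) throw new Error("invalid agent name");
  // 2. Persistence.
  const agent: Agent = { id: `agent-${agents.size + 1}`, name };
  agents.set(agent.id, agent);
  // 3. Audit only after persistence, so the log never records an event that did not happen.
  auditLog.push(`agent.created:${agent.id}`);
  return agent;
}
```

The ordering is the lesson: a failed validation produces no database write and no audit entry.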
#### Scenario: Credential rotation walkthrough complete
- **WHEN** a new engineer reads the credential rotation walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents/:id/credentials/:credId/rotate → auth middleware → CredentialController → CredentialService.rotateCredential → old credential revocation → new secret generation (src/utils/crypto.ts) → Vault write or bcrypt hash → CredentialRepository.update → token revocation for old credentials → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number
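The rotation steps above can be sketched with in-memory stand-ins for CredentialRepository, the token store, and AuditService; the real flow writes to Vault/PostgreSQL and revokes tokens in Redis, and all names here are illustrative:

```typescript
import { createHash, randomBytes } from "node:crypto";

const credentials = new Map<string, { hash: string; revoked: boolean }>();
const activeTokens = new Map<string, string>(); // credentialId -> issued token
const auditLog: string[] = [];

const hashSecret = (s: string) => createHash("sha256").update(s).digest("hex");

function rotateCredential(credId: string): string {
  const current = credentials.get(credId);
  if (!current) throw new Error("credential not found");
  current.revoked = true;                               // 1. revoke the old credential
  const secret = randomBytes(32).toString("hex");       // 2. generate a fresh secret
  credentials.set(credId, { hash: hashSecret(secret), revoked: false }); // 3. persist only the hash
  activeTokens.delete(credId);                          // 4. revoke tokens issued for the old credential
  auditLog.push(`credential.rotated:${credId}`);        // 5. audit after persistence
  return secret;                                        // plaintext is returned once, never stored
}
```

Returning the plaintext secret exactly once while persisting only its hash is the property the walkthrough should dwell on.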
#### Scenario: Walkthroughs include version reference
- **WHEN** a new engineer reads any walkthrough
- **THEN** the document SHALL include a header stating the commit hash it was last verified against, so engineers know if the walkthrough may have drifted from the current code

#### Scenario: Each walkthrough annotates why, not just what
- **WHEN** a new engineer reads a walkthrough step
- **THEN** each step SHALL explain not just what the code does but WHY — e.g., why Redis is checked before signing a new JWT, why constant-time comparison is used for credential verification, why audit logging happens after persistence, not before
@@ -0,0 +1,24 @@
## ADDED Requirements

### Requirement: Codebase structure document
The system SHALL include a document (`docs/engineering/04-codebase-structure.md`) that provides an annotated map of every top-level directory and key file in the repository, explaining what lives where and why.

#### Scenario: Full directory tree annotated
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find an annotated directory tree covering: `src/`, `tests/`, `docs/`, `sdk/`, `sdk-python/`, `sdk-go/`, `sdk-java/`, `terraform/`, `dashboard/`, `migrations/`, `openspec/`, `scripts/`

#### Scenario: src/ subdirectory roles explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand the role of each `src/` subdirectory: `controllers/` (HTTP layer), `services/` (business logic), `repositories/` (data access), `middleware/` (cross-cutting concerns), `utils/` (shared utilities), `types/` (TypeScript interfaces), `routes/` (Express router definitions)

#### Scenario: Where to add new code explained
- **WHEN** a new engineer needs to add a new feature
- **THEN** the document SHALL tell them exactly where each type of code belongs: new endpoint → controller + route; new business logic → service; new DB query → repository; new shared utility → utils/

#### Scenario: Key files identified and explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find explanations of: `src/app.ts` (Express app setup), `src/server.ts` (entry point), `src/types/index.ts` (canonical type definitions), `src/utils/errors.ts` (error hierarchy), `docker-compose.yml` (local dev stack), `tsconfig.json` (TypeScript config)
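One way 04-codebase-structure.md could illustrate the `src/utils/errors.ts` hierarchy. The base-class shape below is a guess at the pattern, not the actual code; the two subclass names are the ones the service deep-dive spec mentions:

```typescript
// Hypothetical HTTP-aware error hierarchy: the base class carries an HTTP status
// and a machine-readable code, and each subclass pins both down.
abstract class AppError extends Error {
  constructor(message: string, readonly statusCode: number, readonly code: string) {
    super(message);
    this.name = new.target.name; // keeps the subclass name in stack traces
  }
}

class AgentNotFoundError extends AppError {
  constructor(agentId: string) {
    super(`Agent ${agentId} not found`, 404, "AGENT_NOT_FOUND");
  }
}

class AgentAlreadyExistsError extends AppError {
  constructor(name: string) {
    super(`Agent ${name} already exists`, 409, "AGENT_ALREADY_EXISTS");
  }
}
```

A hierarchy like this lets one error-handling middleware map any thrown `AppError` to an HTTP response without per-endpoint switch statements.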
#### Scenario: DRY principle mapped to structure
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand how the directory structure enforces DRY: one location for types, one for crypto utilities, one for JWT utilities, one for validators — and why duplication across these is a blocking PR issue
@@ -0,0 +1,28 @@
## ADDED Requirements

### Requirement: Deployment and operations guide
The system SHALL include a document (`docs/engineering/10-deployment.md`) that explains how the application is built, deployed, and operated — covering Docker, Terraform, environment configuration, and monitoring.

#### Scenario: Docker build and run documented
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the multi-stage Dockerfile (the builder stage compiles TypeScript; the production stage runs the compiled JS on node:18-alpine as the non-root USER node), how to build the image, and how to run it with the required environment variables
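A minimal multi-stage Dockerfile matching that description might look like the following. This is a sketch of the pattern, not the project's actual file, and paths such as `dist/server.js` are assumptions:

```dockerfile
# Builder stage: install all deps and compile TypeScript to dist/
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage: runtime deps only, compiled JS, non-root user
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
```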
#### Scenario: Environment variables fully documented
- **WHEN** a new engineer needs to configure the application
- **THEN** the guide SHALL provide a complete table of all environment variables: name, purpose, required/optional, example value — covering database, Redis, JWT signing key, Vault, OPA, and rate limiting config

#### Scenario: Database migrations documented
- **WHEN** a new engineer needs to run or write migrations
- **THEN** the guide SHALL explain: where migration files live (`migrations/`), the naming convention, how to run them (`npm run migrate`), and how to write a new migration following the existing pattern

#### Scenario: Terraform multi-region deployment explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the Terraform structure: what modules exist, what the three regions (US, EU, APAC) deploy, how to run `terraform plan` and `terraform apply`, and what AWS/GCP resources are provisioned

#### Scenario: Prometheus metrics and Grafana explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL find: which endpoint exposes metrics (`/metrics`), the key metrics tracked, how to access the Grafana dashboard locally (port, login), and how to add a new metric counter or histogram to the API server
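To make "adding a new metric counter" concrete, here is a dependency-free model of a labelled counter and the Prometheus text exposition format it must produce. The real server would presumably use a client library such as prom-client; class and metric names below are illustrative:

```typescript
// Minimal labelled counter exposing the "# HELP / # TYPE / samples" text format
// that the /metrics endpoint serves and Prometheus scrapes.
class Counter {
  private values = new Map<string, number>();
  constructor(readonly name: string, readonly help: string) {}

  inc(labels: Record<string, string>, by = 1): void {
    const key = JSON.stringify(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  expose(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((v, key) => {
      const labels = Object.entries(JSON.parse(key) as Record<string, string>)
        .map(([k, val]) => `${k}="${val}"`)
        .join(",");
      lines.push(`${this.name}{${labels}} ${v}`);
    });
    return lines.join("\n");
  }
}
```

The `status_code="401"` sample this produces is exactly the series the `AuthFailureSpike` alert rule rates over.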
#### Scenario: Operational runbook for common tasks
- **WHEN** a new engineer is on-call or supporting operations
- **THEN** the guide SHALL include a runbook covering: how to check application health, how to rotate the JWT signing key, how to revoke all tokens for a compromised agent, and how to read audit logs for an incident
@@ -0,0 +1,32 @@
## ADDED Requirements

### Requirement: Development environment setup guide
The system SHALL include a document (`docs/engineering/07-dev-setup.md`) that takes a new engineer from zero to a fully running local stack in under 30 minutes, with no prior knowledge of the project assumed.

#### Scenario: Prerequisites listed completely
- **WHEN** a new engineer reads 07-dev-setup.md
- **THEN** they SHALL find a complete prerequisites list: Node.js 18+, Docker Desktop, Git, a PostgreSQL client (optional), and links to install each — with no undocumented dependencies

#### Scenario: Repository clone and setup steps complete
- **WHEN** a new engineer follows the clone and setup steps
- **THEN** they SHALL be able to: clone the repo, copy `.env.example` to `.env`, run `npm install`, and have all dependencies installed with zero manual configuration

#### Scenario: Docker Compose local stack starts successfully
- **WHEN** a new engineer runs `docker-compose up -d`
- **THEN** all services (PostgreSQL, Redis, API server) SHALL start, migrations SHALL run automatically, and the guide SHALL show how to verify each service is healthy

#### Scenario: Smoke test confirms working stack
- **WHEN** a new engineer follows the smoke test section
- **THEN** they SHALL run a curl command to POST /oauth2/token with the seed credentials and receive a valid JWT — confirming the full stack is operational

#### Scenario: Common setup errors documented
- **WHEN** a new engineer encounters a setup error
- **THEN** the guide SHALL include a troubleshooting section covering the 5 most common errors: port already in use, migration failure, Node version mismatch, Docker not running, and missing .env variables

#### Scenario: Running tests locally documented
- **WHEN** a new engineer wants to run the test suite
- **THEN** the guide SHALL show: `npm test` (unit tests only, no services needed), `npm run test:integration` (requires Docker stack), and how to run a single test file

#### Scenario: Web dashboard local development documented
- **WHEN** a new engineer wants to run the web dashboard
- **THEN** the guide SHALL show how to start the Vite dev server (`npm run dev` in `dashboard/`) and which port it runs on, and confirm it connects to the local API server
@@ -0,0 +1,28 @@
## ADDED Requirements

### Requirement: Company and product overview document
The system SHALL include a document (`docs/engineering/01-overview.md`) that explains SentryAgent.ai's mission, the AgentIdP product, target users, and why the product exists — providing new engineers with business and product context before they read any technical content.

#### Scenario: Mission and vision covered
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what SentryAgent.ai builds, why it exists, and what problem it solves for AI developers

#### Scenario: AGNTCY alignment explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what AGNTCY is, why SentryAgent.ai aligns to it, and what "first-class agent identity" means

#### Scenario: Product features listed
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see a summary of all product capabilities: agent registry, OAuth 2.0 auth, credential management, audit logs, SDKs, web dashboard, policy engine, and monitoring

#### Scenario: Phase roadmap visible
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand which capabilities belong to Phase 1, Phase 2, and Phase 3

#### Scenario: Engineering team structure explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand the Virtual Engineering Team model (CTO → Architect → Developer → QA) and how Claude operates as the engineering partner

#### Scenario: Free tier limits documented
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see the free tier limits (100 agents, 10,000 token requests/month, 90-day audit retention, 100 req/min) and understand the product's positioning
@@ -0,0 +1,32 @@
## ADDED Requirements

### Requirement: Engineering workflow and contribution guide
The system SHALL include a document (`docs/engineering/08-workflow.md`) that prescribes the exact steps an engineer MUST follow to contribute any new feature or change, from idea to merged code.

#### Scenario: OpenSpec spec-first workflow explained
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand that NO implementation begins without an approved OpenSpec spec — and the exact sequence: CEO approves → Architect writes spec → CTO reviews → Developer implements → QA signs off → CEO approves merge

#### Scenario: OpenSpec CLI commands documented
- **WHEN** a new engineer wants to start a new change
- **THEN** the guide SHALL provide the exact commands: `openspec new change <name>`, `openspec status --change <name>`, `openspec instructions <artifact> --change <name>`, and what each command does

#### Scenario: Branching strategy documented
- **WHEN** a new engineer creates a branch
- **THEN** the guide SHALL prescribe: feature branches from `develop`, naming convention `feature/<change-name>`, PR targets `develop`, `develop` → `main` requires CTO + CEO approval

#### Scenario: TypeScript and code standards enforced in workflow
- **WHEN** a new engineer writes code
- **THEN** the guide SHALL state the non-negotiable standards: strict mode, no `any`, DRY, SOLID, JSDoc on all public methods — and that PRs violating these are blocked by the CTO regardless of functionality

#### Scenario: PR checklist documented
- **WHEN** a new engineer opens a PR
- **THEN** the guide SHALL provide a PR checklist: TypeScript compiles with zero errors, ESLint passes with zero warnings, unit tests pass, coverage gate met (>80%), integration tests pass, OpenAPI spec updated if an endpoint changed, engineering docs updated if architecture changed

#### Scenario: Virtual engineering team roles explained for contributors
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand the role separation: they contribute as the Principal Developer role, the CTO reviews all PRs, the Architect owns spec changes, and QA owns the test sign-off — and how to interact with each role in practice

#### Scenario: Commit message conventions documented
- **WHEN** a new engineer writes a commit message
- **THEN** the guide SHALL prescribe the Conventional Commits format: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` prefixes — with examples for each
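A sketch of what enforcing that convention could look like, as a hypothetical pre-commit check rather than anything in the repo; the optional `(scope)` and `!` forms are common Conventional Commits extensions, not something this guide mandates:

```typescript
// Validates the first line of a commit message against the prescribed prefixes.
const CONVENTIONAL = /^(feat|fix|docs|test|chore|refactor)(\([\w-]+\))?(!)?: .+/;

const isValidCommit = (message: string): boolean =>
  CONVENTIONAL.test(message.split("\n")[0]);
```

For example, `feat: add credential rotation endpoint` and `fix(oauth2): reject expired refresh requests` pass, while an unprefixed summary line does not.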
@@ -0,0 +1,28 @@
## ADDED Requirements

### Requirement: SDK integration guide
The system SHALL include a document (`docs/engineering/11-sdk-guide.md`) that explains how each of the four language SDKs is structured, how to use them, and how to contribute to or extend them.

#### Scenario: SDK architecture overview present
- **WHEN** a new engineer reads 11-sdk-guide.md
- **THEN** they SHALL understand that all four SDKs (Node.js, Python, Go, Java) implement the same API surface (14 endpoints, 4 service clients, 1 TokenManager, 1 error type) with identical semantics, and why consistency across SDKs is a non-negotiable standard

#### Scenario: Node.js SDK documented
- **WHEN** a new engineer reads the Node.js SDK section
- **THEN** they SHALL find: installation (`npm install @sentryagent/idp-sdk`), the AgentIdPClient constructor, all 4 service clients (agents, credentials, tokens, audit), TokenManager auto-refresh behaviour, AgentIdPError structure, and a complete working code example for the most common flow (register agent → generate credential → issue token)
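The shape of that common flow, with an in-memory stub standing in for the real client. Method names follow the scenario's wording and are shown synchronously for brevity; this is a sketch of the call sequence only, not the actual `@sentryagent/idp-sdk` API, whose methods return Promises:

```typescript
interface Agent { id: string; name: string; }
interface Credential { id: string; secret: string; }

// Stub illustrating register agent -> generate credential -> issue token.
class AgentIdPClientStub {
  private seq = 0;
  registerAgent(name: string): Agent {
    return { id: `agent-${++this.seq}`, name };
  }
  generateCredential(agentId: string): Credential {
    return { id: `cred-${agentId}`, secret: "generated-secret" };
  }
  issueToken(credential: Credential): string {
    // The real client exchanges the credential for a JWT over HTTPS.
    return `jwt-for-${credential.id}`;
  }
}

const client = new AgentIdPClientStub();
const agent = client.registerAgent("billing-bot");
const credential = client.generateCredential(agent.id);
const token = client.issueToken(credential);
```

Each step's output feeds the next call, which is why the guide's "complete working example" should show the three calls together rather than in isolation.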
#### Scenario: Python SDK documented
- **WHEN** a new engineer reads the Python SDK section
- **THEN** they SHALL find: installation (`pip install sentryagent-idp`), both sync (AgentIdPClient) and async (AsyncAgentIdPClient) variants, TokenManager and AsyncTokenManager auto-refresh, AgentIdPError, and a complete working example for sync and async usage

#### Scenario: Go SDK documented
- **WHEN** a new engineer reads the Go SDK section
- **THEN** they SHALL find: installation (`go get github.com/sentryagent/idp-sdk-go`), AgentIdPClient construction, goroutine-safe TokenManager, the context.Context usage pattern, AgentIdPError with Code/HTTPStatus/Details, and a complete working example

#### Scenario: Java SDK documented
- **WHEN** a new engineer reads the Java SDK section
- **THEN** they SHALL find: the Maven/Gradle dependency snippet, AgentIdPClient construction with the builder pattern, sync methods and CompletableFuture async counterparts, thread-safe TokenManager, AgentIdPException, and a complete working example

#### Scenario: SDK contribution guide included
- **WHEN** a new engineer needs to add a new endpoint to all SDKs
- **THEN** the guide SHALL provide a step-by-step checklist for adding a new method to all four SDKs consistently: where to add the method, what the signature pattern is, how to write the corresponding test, and how to verify it compiles/passes in each language
@@ -0,0 +1,40 @@
## ADDED Requirements

### Requirement: Service deep-dive documents

The system SHALL include a document (`docs/engineering/05-services.md`) providing a deep-dive reference for every core service and component, following a consistent template: Purpose → Responsibility boundary → Public interface → Key methods → Database schema (if applicable) → Error types → Configuration.

#### Scenario: AgentService documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AgentService section covering: responsibility (agent CRUD only), public methods (createAgent, getAgent, listAgents, updateAgent, deleteAgent), the `agents` table schema, AgentNotFoundError and AgentAlreadyExistsError, and what AgentService does NOT do (no auth, no credentials — Single Responsibility)

#### Scenario: OAuth2Service documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OAuth2Service section covering: responsibility (token issuance and revocation only), public methods (issueToken, validateToken, revokeToken), Redis token storage schema, JWT payload structure, token TTL configuration, and the Vault credential verification path vs bcrypt path
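The JWT payload structure referenced here can be sketched end to end with Node's built-in crypto. The claim names, TTL, and signing key below are illustrative assumptions; the real payload is whatever OAuth2Service defines.

```typescript
import { createHmac } from "node:crypto";

// Illustrative HS256 JWT assembly. Claim names and the signing key are
// assumptions for the sketch, not the OAuth2Service implementation.
const b64url = (s: string): string => Buffer.from(s).toString("base64url");

const now = Math.floor(Date.now() / 1000);
const payload = {
  sub: "agent-uuid",    // the agent the token was issued to
  scope: "agents:read", // granted OAuth scopes
  iat: now,
  exp: now + 3600,      // TTL comes from configuration
};

const header = { alg: "HS256", typ: "JWT" };
const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
const signature = createHmac("sha256", "dev-signing-key")
  .update(signingInput)
  .digest("base64url");
const jwt = `${signingInput}.${signature}`;

// A JWT is always three dot-separated base64url segments:
// header.payload.signature.
console.log(jwt.split(".").length); // → 3
```

Redis only needs to store enough to support revocation (for example, the token's id keyed with its TTL); the claims themselves travel inside the signed token.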

#### Scenario: CredentialService documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the CredentialService section covering: responsibility (credential lifecycle only), public methods (generateCredential, rotateCredential, revokeCredential, listCredentials), the `credentials` table schema, bcrypt vs Vault storage decision, and the `vault_path` column purpose

#### Scenario: AuditService documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AuditService section covering: responsibility (immutable audit logging only), public methods (logEvent, queryLogs), the `audit_logs` table schema, event types enum, 90-day retention policy, and why audit records are never updated or deleted

#### Scenario: VaultClient documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the VaultClient section covering: purpose (wraps node-vault for KV v2 operations), public methods (writeSecret, readSecret, verifySecret, deleteSecret), the opt-in configuration (VAULT_ADDR env var), and the constant-time comparison in verifySecret and why it matters (timing attack prevention)
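The constant-time comparison called out above can be illustrated with Node's built-in `timingSafeEqual`. This is a sketch of the technique, not the VaultClient source.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Why constant time matters: a naive `===` string comparison returns early
// at the first mismatching byte, so response time leaks how much of the
// secret an attacker has guessed correctly. timingSafeEqual always compares
// every byte. Hashing both sides first also guarantees equal-length buffers,
// which timingSafeEqual requires.
function verifySecret(candidate: string, stored: string): boolean {
  const a = createHash("sha256").update(candidate).digest();
  const b = createHash("sha256").update(stored).digest();
  return timingSafeEqual(a, b);
}

console.log(verifySecret("s3cret", "s3cret")); // → true
console.log(verifySecret("s3cret", "wrong"));  // → false
```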
|
||||
#### Scenario: OPA policy engine documented
|
||||
- **WHEN** a new engineer reads 05-services.md
|
||||
- **THEN** they SHALL find the OPA section covering: purpose (dynamic access control beyond static OAuth scopes), how policies are loaded, how authorization decisions are made, the policy file locations, and how to write and test a new policy
|
||||
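As a sketch of what such a policy might look like (the package name, input shape, and scope name here are assumptions, not the repository's actual policies):

```rego
package agentidp.authz

import rego.v1

# Deny by default; allow only when an explicit rule matches.
default allow := false

# Permit read access when the validated token carries the agents:read scope.
allow if {
    input.method == "GET"
    "agents:read" in input.token.scopes
}
```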

#### Scenario: Web Dashboard documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the Web Dashboard section covering: React 18 + Vite 5 + TypeScript structure, how it authenticates against the AgentIdP API, the main views (agent list, credential management, audit log viewer, policy editor), and how to run it locally

#### Scenario: Monitoring stack documented

- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the monitoring section covering: Prometheus metrics exposed by the API server (`/metrics`), the key metrics (request count, latency histograms, active tokens, agent count), Grafana dashboard structure, and how to add a new metric to the API server

#### Scenario: Consistent template enforced

- **WHEN** a new engineer looks up any service
- **THEN** every service section SHALL follow the same template so the engineer knows exactly where to find each type of information
@@ -0,0 +1,32 @@
## ADDED Requirements

### Requirement: Testing strategy document

The system SHALL include a document (`docs/engineering/09-testing.md`) that explains the test architecture, how to run tests, coverage requirements, and how to write new tests following project conventions.

#### Scenario: Test types and their purposes explained

- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL understand the distinction between: unit tests (test one service/util in isolation, mock all dependencies, no running services needed) and integration tests (test full HTTP request/response cycle with real PostgreSQL + Redis)

#### Scenario: Test framework stack documented

- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL find the test stack listed and explained: Jest 29.7 (test runner + assertions), ts-jest (TypeScript compilation), Supertest 6.3 (HTTP integration testing), and how each is configured

#### Scenario: Coverage gates documented

- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL know the mandatory gates: >80% statements, >80% branches, >80% functions, >80% lines — and that PRs below these thresholds are blocked
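Jest enforces gates like these through the `coverageThreshold` option in its configuration, so any run that falls below a threshold exits non-zero and fails CI. A minimal fragment matching the numbers above (the surrounding config is a sketch, not the repository's actual jest.config.js):

```javascript
// jest.config.js — a run below any of these thresholds fails
module.exports = {
  preset: "ts-jest",
  coverageThreshold: {
    global: {
      statements: 80,
      branches: 80,
      functions: 80,
      lines: 80,
    },
  },
};
```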

#### Scenario: How to run the test suite documented

- **WHEN** a new engineer wants to run tests
- **THEN** the guide SHALL show: `npm test` (unit tests, no services), `npm run test:coverage` (unit tests + coverage report), `npm run test:integration` (requires Docker stack), and `npx jest src/services/agentService.test.ts` (single file)

#### Scenario: Unit test writing conventions shown

- **WHEN** a new engineer writes a new unit test
- **THEN** the guide SHALL show a complete example: how to mock a repository with `jest.mock()`, how to structure `describe`/`it` blocks, how to assert on thrown errors, and how to verify mock calls — using an actual test from the codebase as the example

#### Scenario: Integration test writing conventions shown

- **WHEN** a new engineer writes a new integration test
- **THEN** the guide SHALL show a complete example using Supertest: how to boot the Express app, how to seed test data, how to make authenticated requests (including getting a JWT first), and how to clean up after the test

#### Scenario: OWASP security testing reference included

- **WHEN** a new engineer writes security-relevant code
- **THEN** the guide SHALL include a reference to the OWASP Top 10 checks that are verified in QA sign-off and what each means in the context of this codebase (SQL injection, JWT attacks, credential exposure, etc.)
120
openspec/changes/archive/2026-03-29-engineering-docs/tasks.md
Normal file
@@ -0,0 +1,120 @@
## 1. Setup

- [x] 1.1 Create `docs/engineering/` directory
- [x] 1.2 Create `docs/engineering/README.md` index with numbered document list, reading order guidance, and one-line description of each document

## 2. Company and Product Overview (01-overview.md)

- [x] 2.1 Write company mission and SentryAgent.ai product purpose section
- [x] 2.2 Write AGNTCY alignment explanation — what AGNTCY is, why it matters, what "first-class agent identity" means
- [x] 2.3 Write product features summary table covering all capabilities (agent registry, OAuth 2.0, credentials, audit, SDKs, dashboard, OPA, monitoring)
- [x] 2.4 Write Phase roadmap section (Phase 1 / Phase 2 / Phase 3 scope)
- [x] 2.5 Write Virtual Engineering Team model explanation (CTO → Architect → Developer → QA, Claude as engineering partner)
- [x] 2.6 Write free tier limits table

## 3. System Architecture (02-architecture.md)

- [x] 3.1 Write component diagram (ASCII or Mermaid) showing all major components and their connections
- [x] 3.2 Write HTTP request lifecycle section — client → router → middleware → controller → service → repository → database → response
- [x] 3.3 Write OAuth 2.0 Client Credentials data flow section
- [x] 3.4 Write multi-region deployment topology section

## 4. Technology Stack and ADRs (03-tech-stack.md)

- [x] 4.1 Write ADR entry for Node.js 18 — purpose, rationale, alternatives considered
- [x] 4.2 Write ADR entry for TypeScript 5.3 strict mode — purpose, rationale, consequences of violation
- [x] 4.3 Write ADR entry for Express 4.18 — purpose, rationale, alternatives considered
- [x] 4.4 Write ADR entry for PostgreSQL 14 — purpose, what it stores, rationale
- [x] 4.5 Write ADR entry for Redis 7 — purpose, what it stores (vs PostgreSQL), rationale
- [x] 4.6 Write ADR entry for HashiCorp Vault — purpose, opt-in model, rationale
- [x] 4.7 Write ADR entry for OPA — purpose, how it extends OAuth scopes, rationale
- [x] 4.8 Write ADR entry for React 18 + Vite 5 — purpose, rationale
- [x] 4.9 Write ADR entry for Prometheus + Grafana — purpose, rationale
- [x] 4.10 Write ADR entry for Terraform — purpose, multi-region rationale

## 5. Codebase Structure (04-codebase-structure.md)

- [x] 5.1 Write annotated directory tree covering all top-level directories
- [x] 5.2 Write `src/` subdirectory roles section (controllers, services, repositories, middleware, utils, types, routes)
- [x] 5.3 Write "where to add new code" decision guide — mapping feature type to directory
- [x] 5.4 Write key files section (app.ts, server.ts, types/index.ts, utils/errors.ts, docker-compose.yml, tsconfig.json)
- [x] 5.5 Write DRY enforcement section — mapping the single-location principle to the directory structure

## 6. Service Deep Dives (05-services.md)

- [x] 6.1 Write AgentService deep-dive (purpose, responsibility boundary, public interface, DB schema, error types)
- [x] 6.2 Write OAuth2Service deep-dive (purpose, responsibility boundary, public interface, Redis schema, JWT payload, Vault vs bcrypt path)
- [x] 6.3 Write CredentialService deep-dive (purpose, responsibility boundary, public interface, DB schema, vault_path column explanation)
- [x] 6.4 Write AuditService deep-dive (purpose, responsibility boundary, public interface, DB schema, immutability guarantee, 90-day retention)
- [x] 6.5 Write VaultClient deep-dive (purpose, public methods, opt-in configuration, constant-time comparison rationale)
- [x] 6.6 Write OPA policy engine deep-dive (purpose, policy file locations, how authorization decisions work, how to write and test a new policy)
- [x] 6.7 Write Web Dashboard deep-dive (React structure, authentication flow, main views, local dev instructions)
- [x] 6.8 Write Prometheus/Grafana monitoring deep-dive (metrics endpoint, key metrics, Grafana dashboard, how to add a new metric)

## 7. Code Walkthroughs (06-walkthroughs.md)

- [x] 7.1 Read source files to identify current line numbers for token issuance flow
- [x] 7.2 Write token issuance walkthrough — POST /oauth2/token → router → middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt → Redis → JWT sign → AuditService → 200 response, with file:line references and "why" annotations at each step
- [x] 7.3 Read source files to identify current line numbers for agent registration flow
- [x] 7.4 Write agent registration walkthrough — POST /agents → auth middleware → validation → AgentController → AgentService → validators → AgentRepository → PostgreSQL → AuditService → 201 response, with file:line references and "why" annotations
- [x] 7.5 Read source files to identify current line numbers for credential rotation flow
- [x] 7.6 Write credential rotation walkthrough — POST /agents/:id/credentials/:credId/rotate → auth → CredentialController → CredentialService → revoke old → crypto → Vault/bcrypt → CredentialRepository → token revocation → AuditService → 200 response, with file:line references and "why" annotations
- [x] 7.7 Add "last verified against commit X" header to each walkthrough

## 8. Development Environment Setup (07-dev-setup.md)

- [x] 8.1 Write prerequisites section (Node.js 18+, Docker Desktop, Git) with install links
- [x] 8.2 Write repository clone and npm install steps
- [x] 8.3 Write .env setup section (copy .env.example, document every variable)
- [x] 8.4 Write Docker Compose startup steps and service health verification
- [x] 8.5 Write database migration steps
- [x] 8.6 Write smoke test section — curl POST /oauth2/token with seed credentials, expected JWT response
- [x] 8.7 Write troubleshooting section (port conflict, migration failure, Node version mismatch, Docker not running, missing .env vars)
- [x] 8.8 Write "running tests locally" section (npm test, npm run test:coverage, npm run test:integration, single file)
- [x] 8.9 Write web dashboard local dev section (cd dashboard, npm install, npm run dev, port confirmation)

## 9. Engineering Workflow (08-workflow.md)

- [x] 9.1 Write OpenSpec spec-first workflow overview — the full sequence from CEO approval to merged code
- [x] 9.2 Write OpenSpec CLI commands reference (openspec new change, openspec status, openspec instructions, openspec list)
- [x] 9.3 Write branching strategy section (feature/* from develop, PR → develop, develop → main requires approvals)
- [x] 9.4 Write TypeScript and code standards section (strict mode, no any, DRY, SOLID, JSDoc) with enforcement consequences
- [x] 9.5 Write PR checklist section (TypeScript, ESLint, unit tests, coverage, integration tests, spec updated, docs updated)
- [x] 9.6 Write virtual engineering team roles for contributors section
- [x] 9.7 Write commit message conventions section (Conventional Commits — feat, fix, docs, test, chore, refactor with examples)

## 10. Testing Strategy (09-testing.md)

- [x] 10.1 Write test types and purposes section (unit vs integration — distinction, when to use each)
- [x] 10.2 Write test framework stack section (Jest 29.7, ts-jest, Supertest 6.3 — role of each)
- [x] 10.3 Write coverage gates section (>80% statements, branches, functions, lines — enforcement)
- [x] 10.4 Write "how to run the test suite" section (npm test, test:coverage, test:integration, single file)
- [x] 10.5 Write unit test writing conventions section with complete real example from codebase (mock setup, describe/it structure, error assertion, mock verification)
- [x] 10.6 Write integration test writing conventions section with complete real example (Supertest app boot, seed data, authenticated request, cleanup)
- [x] 10.7 Write OWASP Top 10 security testing reference section relevant to this codebase

## 11. Deployment and Operations (10-deployment.md)

- [x] 11.1 Write Docker build and run section (multi-stage build explanation, build command, run command with env vars)
- [x] 11.2 Write complete environment variables reference table (all variables, purpose, required/optional, example value)
- [x] 11.3 Write database migrations section (file location, naming convention, run command, writing new migrations)
- [x] 11.4 Write Terraform multi-region deployment section (modules, regions, terraform plan/apply, resources provisioned)
- [x] 11.5 Write Prometheus and Grafana section (/metrics endpoint, key metrics, Grafana access, adding a new metric)
- [x] 11.6 Write operational runbook section (health check, rotate JWT signing key, revoke all tokens for compromised agent, read audit logs for incident)

## 12. SDK Guide (11-sdk-guide.md)

- [x] 12.1 Write SDK architecture overview (all four SDKs, identical API surface, consistency standard)
- [x] 12.2 Write Node.js SDK section (install, AgentIdPClient, 4 service clients, TokenManager, AgentIdPError, complete working example)
- [x] 12.3 Write Python SDK section (install, sync + async clients, TokenManager variants, AgentIdPError, complete sync and async examples)
- [x] 12.4 Write Go SDK section (go get, AgentIdPClient, goroutine-safe TokenManager, context.Context usage, AgentIdPError, complete example)
- [x] 12.5 Write Java SDK section (Maven/Gradle snippet, builder pattern, sync + CompletableFuture async, TokenManager, AgentIdPException, complete example)
- [x] 12.6 Write SDK contribution guide (step-by-step checklist for adding a new endpoint to all four SDKs)

## 13. Final Review and Commit

- [x] 13.1 Review all documents for cross-link accuracy (internal links between documents resolve correctly)
- [x] 13.2 Verify all code examples are complete and runnable (no ellipses, no placeholders)
- [x] 13.3 Verify file:line references in walkthroughs against current source
- [x] 13.4 Verify README.md index links to all 11 documents correctly
- [x] 13.5 Commit `docs/engineering/` to `develop` with message `docs: engineering knowledge base for new hires`