sentryagent-idp/docs/devops/field-trial.md

# SentryAgent.ai AgentIdP — In-House Field Trial Guide

This guide is the execution playbook for in-house Docker Compose field trials of SentryAgent.ai
AgentIdP. Follow each phase in order. All commands are exact — copy and paste them directly.

Estimated time to complete all phases: 45–60 minutes.

Prerequisites must be satisfied before Section 0.

## Prerequisites

**Docker 24+ and Docker Compose 2.20+**

```bash
docker --version
# Expected: Docker version 24.x.x or higher

docker compose version
# Expected: Docker Compose version v2.20.x or higher
```

**Node.js 18+ via nvm**

```bash
export NVM_DIR="$HOME/.nvm" && source "$NVM_DIR/nvm.sh"
node --version
# Expected: v18.x.x or higher
```

**openssl**

```bash
openssl version
# Expected: any recent release (OpenSSL 1.1.x or 3.x)
```

**Git repo cloned**

```bash
git clone https://git.sentryagent.ai/vijay_admin/sentryagent-idp.git
cd sentryagent-idp
```

**Ports free**

The following ports must be free on the machine before starting:

| Port | Service |
|------|---------|
| 3000 | AgentIdP backend |
| 3001 | Next.js portal |
| 5432 | PostgreSQL |
| 6379 | Redis |

Check all ports:

```bash
lsof -i :3000 -i :3001 -i :5432 -i :6379
# Expected: no output (all ports free)
```

If any port is in use, kill the occupying process:

```bash
lsof -ti:<port> | xargs kill
```
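The later phases also rely on `curl`, `jq`, and `lsof`, which the checks above do not cover. A minimal pre-flight sketch, assuming a POSIX shell; the function name `check_tools` is illustrative and not part of the repository:

```bash
# Report any missing command-line tools; prints nothing and returns 0 when all are present.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Tools used by the commands in this guide.
check_tools docker openssl curl jq lsof || echo "Install the tools listed above before continuing."
```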

---

## Section 0 — Environment Setup

This section guides the engineer through creating a valid `.env` file for field trial use.

**Step 0.1 — Copy `.env.example`**

```bash
cp .env.example .env
```

**Step 0.2 — Generate RSA-2048 keypair**

Generate the JWT signing keys:

```bash
openssl genrsa -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem
```

Verify the keys are valid:

```bash
openssl rsa -in private.pem -check -noout
# Expected: RSA key ok

openssl rsa -in public.pem -pubin -noout -text 2>&1 | head -3
# Expected: Public-Key: (2048 bit)
```

**Step 0.3 — Write keys into `.env`**

Write the private key as a single-line PEM with `\n` separators:

```bash
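# Join the PEM lines with a doubled backslash ("\\n"). sed collapses it to the literal
# two-character sequence \n; a single "\n" would be turned into a real newline by GNU sed.
PRIVATE_KEY_LINE=$(awk 'NF {sub(/\r/, ""); printf "%s\\\\n",$0;}' private.pem)
sed -i "s|JWT_PRIVATE_KEY=.*|JWT_PRIVATE_KEY=\"${PRIVATE_KEY_LINE}\"|" .env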
```

Write the public key:

```bash
# The doubled backslash keeps sed from expanding "\n" into a real newline.
PUBLIC_KEY_LINE=$(awk 'NF {sub(/\r/, ""); printf "%s\\\\n",$0;}' public.pem)
sed -i "s|JWT_PUBLIC_KEY=.*|JWT_PUBLIC_KEY=\"${PUBLIC_KEY_LINE}\"|" .env
```

Verify both keys are present and non-empty:

```bash
# OpenSSL 3.x writes "BEGIN PRIVATE KEY" (PKCS#8); 1.1.x writes "BEGIN RSA PRIVATE KEY"
grep -cE "BEGIN (RSA )?PRIVATE KEY" .env
# Expected: 1

grep -c "BEGIN PUBLIC KEY" .env
# Expected: 1
```
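The `grep -c` checks confirm that a key header exists, but not that each key sits on one line with literal `\n` separators, which is the format the Troubleshooting section requires. A rough sketch of a stricter check; `env_key_is_single_line` is a hypothetical helper, and the pattern assumes the double-quoted format written in Step 0.3:

```bash
# Succeeds only if VAR in FILE is one double-quoted line holding a PEM
# whose lines are joined by the literal two characters backslash + n.
env_key_is_single_line() {
  local file="$1" var="$2"
  grep -Eq "^${var}=\"-----BEGIN [A-Z ]+-----\\\\n.*-----END [A-Z ]+-----(\\\\n)?\"\$" "$file"
}

# Example (commented out so the definition can be sourced safely):
# env_key_is_single_line .env JWT_PRIVATE_KEY && echo "private key format OK"
# env_key_is_single_line .env JWT_PUBLIC_KEY  && echo "public key format OK"
```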

**Step 0.4 — Configure field trial values**

Set the following values in `.env`. These are the correct values for an in-house field trial
(no real Stripe, no Kafka, no Vault):

```bash
# Disable real Stripe billing for field trial
sed -i "s|BILLING_ENABLED=.*|BILLING_ENABLED=false|" .env
sed -i "s|STRIPE_SECRET_KEY=.*|STRIPE_SECRET_KEY=sk_test_placeholder|" .env
sed -i "s|STRIPE_WEBHOOK_SECRET=.*|STRIPE_WEBHOOK_SECRET=whsec_placeholder|" .env
sed -i "s|STRIPE_PRICE_ID=.*|STRIPE_PRICE_ID=price_placeholder|" .env

# Keep feature flags at defaults
sed -i "s|ANALYTICS_ENABLED=.*|ANALYTICS_ENABLED=true|" .env
sed -i "s|TIER_ENFORCEMENT=.*|TIER_ENFORCEMENT=true|" .env
sed -i "s|COMPLIANCE_ENABLED=.*|COMPLIANCE_ENABLED=true|" .env

# Allow portal CORS
sed -i "s|CORS_ORIGIN=.*|CORS_ORIGIN=http://localhost:3001|" .env
```
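Each `sed` substitution above changes nothing if the key is missing from `.env`, and it reports no error. The following sketch sets a key whether or not it already exists. `set_env` is a hypothetical helper rather than part of the repository, and it moves the key to the end of the file:

```bash
# Set KEY=VALUE in an env file: drop any existing KEY= line, then append the new one.
set_env() {
  local key="$1" value="$2" file="${3:-.env}" tmp
  tmp=$(mktemp)
  # grep -v exits non-zero when it prints nothing, so tolerate that case.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (commented out so the definition can be sourced safely):
# set_env BILLING_ENABLED false .env
```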

**Step 0.5 — Verify final `.env`**

```bash
grep -E "^(POSTGRES_USER|POSTGRES_PASSWORD|POSTGRES_DB|DATABASE_URL|REDIS_URL|JWT_PRIVATE_KEY|JWT_PUBLIC_KEY|BILLING_ENABLED|ANALYTICS_ENABLED|TIER_ENFORCEMENT|COMPLIANCE_ENABLED|CORS_ORIGIN)=" .env
```

Expected output (values abbreviated):

```
POSTGRES_USER=sentryagent
POSTGRES_PASSWORD=sentryagent
POSTGRES_DB=sentryagent_idp
DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp
REDIS_URL=redis://localhost:6379
JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\n...
JWT_PUBLIC_KEY="-----BEGIN PUBLIC KEY-----\n...
BILLING_ENABLED=false
ANALYTICS_ENABLED=true
TIER_ENFORCEMENT=true
COMPLIANCE_ENABLED=true
CORS_ORIGIN=http://localhost:3001
```

---

## Phase A — Stack Startup

**Step A.1 — Build and start the full stack**

```bash
docker compose up --build -d
```

This builds the `app` container image and starts all three services. The `app` service waits
for `postgres` and `redis` to pass their health checks before starting.

**Step A.2 — Verify all services are healthy**

```bash
docker compose ps
```

Expected output — all three services must show `healthy`:

```
NAME                          IMAGE                        STATUS
sentryagent-idp-app-1         sentryagent-idp-app          running (healthy)
sentryagent-idp-postgres-1    postgres:14.12-alpine3.19    running (healthy)
sentryagent-idp-redis-1       redis:7.2-alpine3.19         running (healthy)
```

If any service shows `starting` or `unhealthy`, wait 15 seconds and run `docker compose ps`
again. If a service remains unhealthy after 60 seconds, see Troubleshooting.
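The wait-and-retry step can be scripted. This is a sketch only: `wait_until` is a hypothetical helper, and the commented example polls the `/health` endpoint from Step A.4, which covers the `app` service but not `postgres` or `redis`:

```bash
# Retry a command once per second until it succeeds or the attempt budget runs out.
wait_until() {
  local attempts="$1" i
  shift
  for i in $(seq 1 "$attempts"); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (commented out so the definition can be sourced safely):
# wait_until 60 curl -sf http://localhost:3000/health && echo "app is responding"
```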

**Step A.3 — Run database migrations**

```bash
docker compose exec app npm run db:migrate
```

Expected output:

```
Running database migrations...
  ✓ Applied: 001_create_agents.sql
  ✓ Applied: 002_create_credentials.sql
  ...
  ✓ Applied: 025_add_analytics_events.sql
  ✓ Applied: 026_add_tenant_tiers.sql

Migrations complete. 26 migration(s) applied.
```

All 26 migrations must apply without error before proceeding.

**Step A.4 — Verify application health**

```bash
curl -s http://localhost:3000/health | jq -c .
```

Expected response:

```json
{"status":"ok"}
```

**Step A.5 — Verify Prometheus metrics**

```bash
curl -s http://localhost:3000/metrics | head -20
```

Expected: Prometheus text output beginning with `# HELP` lines. Then list the
application-specific metrics, which use the `agentidp_` prefix:

```bash
curl -s http://localhost:3000/metrics | grep -E "^# HELP agentidp_"
```

Expected: at least 19 lines matching `# HELP agentidp_*`.
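Counting the lines by eye is error-prone, so the threshold can be checked mechanically. A minimal sketch, with `has_min_metrics` as an illustrative name; it reads the metrics text on stdin:

```bash
# Count "# HELP agentidp_" lines on stdin; succeed only if there are at least MIN.
has_min_metrics() {
  local min="$1" count
  # grep -c prints 0 but exits non-zero when nothing matches, so tolerate that.
  count=$(grep -c '^# HELP agentidp_' || true)
  echo "agentidp metric families: $count"
  [ "$count" -ge "$min" ]
}

# Example (commented out so the definition can be sourced safely):
# curl -s http://localhost:3000/metrics | has_min_metrics 19 && echo PASS
```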

---

## Phase B — Core Product Journeys

This phase tests the end-to-end agent identity lifecycle. Run each step in order. Each step
depends on the output of the previous step.

> **Note on tokens:** The steps below use shell variables to pass values between commands. Run
> all commands in the same terminal session.

**Step B.1 — Create an organisation**

```bash
ORG_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/organizations \
  -H "Content-Type: application/json" \
  -d '{"name":"Field Trial Org","slug":"field-trial"}')

echo "$ORG_RESPONSE" | jq .
ORG_ID=$(echo "$ORG_RESPONSE" | jq -r '.org_id')
echo "ORG_ID: $ORG_ID"
```

Expected: HTTP 201 response body containing an `org_id` UUID. `ORG_ID` must be a non-empty UUID.
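`jq -r` prints the string `null` when a field is absent, so a non-empty `ORG_ID` is not proof of success. A small sketch of a stricter check; `is_uuid` is a hypothetical helper and assumes the API returns UUIDs in the canonical 8-4-4-4-12 hexadecimal form:

```bash
# Succeed only if the argument looks like a canonical UUID (case-insensitive).
is_uuid() {
  printf '%s\n' "$1" | grep -Eqi '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Example (commented out so the definition can be sourced safely):
# is_uuid "$ORG_ID" || echo "ORG_ID is not a UUID: '$ORG_ID'"
```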

**Step B.2 — Register an agent**

```bash
AGENT_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/agents \
  -H "Content-Type: application/json" \
  -d "{
    \"email\": \"trial-agent@field-trial.sentryagent.ai\",
    \"agent_type\": \"classifier\",
    \"version\": \"1.0.0\",
    \"capabilities\": [\"documents:read\", \"documents:classify\"],
    \"owner\": \"field-trial-team\",
    \"deployment_env\": \"development\",
    \"organization_id\": \"$ORG_ID\"
  }")

echo "$AGENT_RESPONSE" | jq .
AGENT_ID=$(echo "$AGENT_RESPONSE" | jq -r '.agent_id')
echo "AGENT_ID: $AGENT_ID"
```

Expected: HTTP 201 response body containing an `agent_id` UUID.
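The `curl -s` calls in this phase print only the body, so the expected status code is not visible. One way to capture both is to append the status with `-w "\n%{http_code}"`, as Test C.6 does, and split the result afterwards. The helper names below are illustrative:

```bash
# Split output produced by: curl -s -w "\n%{http_code}" ...
# The status code is the last line; the body is everything before it.
response_status() { printf '%s\n' "$1" | tail -n 1; }
response_body()   { printf '%s\n' "$1" | sed '$d'; }

# Example (commented out so the definitions can be sourced safely):
# RESP=$(curl -s -w "\n%{http_code}" http://localhost:3000/health)
# echo "status: $(response_status "$RESP")"
# response_body "$RESP" | jq .
```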

**Step B.3 — Generate credentials**

```bash
CRED_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/credentials \
  -H "Content-Type: application/json" \
  -d "{\"agent_id\": \"$AGENT_ID\"}")

echo "$CRED_RESPONSE" | jq .
CLIENT_ID=$(echo "$CRED_RESPONSE" | jq -r '.client_id')
CLIENT_SECRET=$(echo "$CRED_RESPONSE" | jq -r '.client_secret')
echo "CLIENT_ID: $CLIENT_ID"
echo "CLIENT_SECRET: $CLIENT_SECRET"
```

Expected: HTTP 201 response body containing `client_id` and `client_secret`. The `client_secret`
is only returned once — save it now.

**Step B.4 — Issue an OAuth 2.0 access token**

```bash
TOKEN_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET&scope=read")

echo "$TOKEN_RESPONSE" | jq .
ACCESS_TOKEN=$(echo "$TOKEN_RESPONSE" | jq -r '.access_token')
echo "ACCESS_TOKEN obtained: ${ACCESS_TOKEN:0:30}..."
```

Expected: HTTP 200 response body with `access_token`, `token_type: "Bearer"`, `expires_in: 3600`,
`scope: "read"`.

**Step B.5 — Use the token on a protected endpoint**

```bash
curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
  http://localhost:3000/api/v1/agents | jq .
```

Expected: HTTP 200 with a JSON array of agents including the agent registered in Step B.2.

**Step B.6 — Inspect JWT claims**

Decode and inspect the access token structure (without verifying signature):

```bash
echo $ACCESS_TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq .
```

Expected claims:

```json
{
  "sub": "<client_id>",
  "iss": "https://sentryagent.ai",
  "aud": "sentryagent-api",
  "scope": "read",
  "agent_id": "<agent_id>",
  "organization_id": "<org_id>",
  "iat": "<issued-at-timestamp>",
  "exp": "<expiry-timestamp>",
  "jti": "<unique-jwt-id>"
}
```

Verify `exp - iat = 3600` (1 hour TTL).
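JWT segments are base64url-encoded without padding, so a plain `base64 -d` can fail or truncate depending on the payload length and characters. A more tolerant sketch follows; `jwt_payload` is a hypothetical helper and assumes a `base64` that accepts `-d`, as GNU coreutils does:

```bash
# Decode the payload (second segment) of the JWT passed as $1.
jwt_payload() {
  local seg
  # Map the base64url alphabet back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url omits.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Example (commented out so the definition can be sourced safely):
# jwt_payload "$ACCESS_TOKEN" | jq '.exp - .iat'   # 3600 if the TTL is one hour
```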

**Step B.7 — Rotate credentials and verify old token is rejected**

Rotate the credentials (this generates a new `client_secret` and revokes the old one):

```bash
ROTATE_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/credentials \
  -H "Content-Type: application/json" \
  -d "{\"agent_id\": \"$AGENT_ID\"}")

NEW_CLIENT_ID=$(echo "$ROTATE_RESPONSE" | jq -r '.client_id')
NEW_CLIENT_SECRET=$(echo "$ROTATE_RESPONSE" | jq -r '.client_secret')
echo "New credential: $NEW_CLIENT_ID"
```

Attempt to use the old token (must be rejected):

```bash
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  http://localhost:3000/api/v1/agents
# Expected: 401
```

Issue a new token with the new credentials:

```bash
NEW_TOKEN_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=$NEW_CLIENT_ID&client_secret=$NEW_CLIENT_SECRET&scope=read")

NEW_ACCESS_TOKEN=$(echo "$NEW_TOKEN_RESPONSE" | jq -r '.access_token')
echo "New token obtained."
```

Verify the new token works:

```bash
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
  http://localhost:3000/api/v1/agents
# Expected: 200
```

**Step B.8 — Check audit log**

```bash
curl -s -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
  "http://localhost:3000/api/v1/audit?limit=10" | jq .
```

Expected: JSON array of audit events. Verify these action types are present from Steps B.1–B.7:
`agent.created`, `credential.generated`, `token.issued`, `credential.rotated`, `token.revoked`.
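The presence check can be automated without knowing the exact shape of the audit records. The sketch below only looks for each action name as a quoted string anywhere in the response, which is an assumption about how actions are serialized; `missing_actions` is a hypothetical helper. With `limit=10`, older events may fall outside the returned page.

```bash
# Print every expected action name that does not occur in the audit JSON read from stdin.
missing_actions() {
  local body action
  body=$(cat)
  for action in "$@"; do
    printf '%s' "$body" | grep -qF "\"$action\"" || echo "missing: $action"
  done
}

# Example (commented out so the definition can be sourced safely):
# curl -s -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
#   "http://localhost:3000/api/v1/audit?limit=10" \
#   | missing_actions agent.created credential.generated token.issued credential.rotated token.revoked
```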

---

## Phase C — Guardrails

This phase tests security boundaries. Each test case must be run with the exact command shown
and must produce the specified HTTP status code.

> **Setup:** Ensure `$NEW_ACCESS_TOKEN` is still set from Phase B. Use `export NEW_ACCESS_TOKEN`
> if switching terminals.

**Test C.1 — No Authorization header → 401**

```bash
curl -s -o /dev/null -w "%{http_code}" \
  http://localhost:3000/api/v1/agents
```

Expected HTTP status: `401`
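Because every test in this phase reduces to comparing one status code with an expected value, the comparison can be wrapped in a small helper that prints an explicit verdict. A sketch, with `expect_status` as an illustrative name:

```bash
# Compare an actual HTTP status with the expected one and print PASS or FAIL.
expect_status() {
  local name="$1" expected="$2" actual="$3"
  if [ "$actual" = "$expected" ]; then
    echo "PASS $name ($actual)"
  else
    echo "FAIL $name: expected $expected, got $actual"
    return 1
  fi
}

# Example (commented out so the definition can be sourced safely):
# STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/v1/agents)
# expect_status "C.1 no Authorization header" 401 "$STATUS"
```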

**Test C.2 — Malformed JWT → 401**

```bash
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer notavalidjwt" \
  http://localhost:3000/api/v1/agents
```

Expected HTTP status: `401`

**Test C.3 — Expired JWT → 401**

A true expiry test needs a token signed with the trial's own key and carrying a past `exp`, which
requires a test helper. For field trial purposes, use the pre-constructed token below instead. Its
payload carries `exp: 1`, but it is not signed with the trial key, so the server rejects it at
signature verification and returns 401, possibly before the expiry check is reached:

```bash
EXPIRED_TOKEN="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ0ZXN0IiwiZXhwIjoxfQ.invalid"

curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $EXPIRED_TOKEN" \
  http://localhost:3000/api/v1/agents
```

Expected HTTP status: `401`

**Test C.4 — Valid JWT, wrong scope → 403**

Issue a token with scope `read`, then attempt to access an endpoint requiring scope `write`:

```bash
# The NEW_ACCESS_TOKEN has scope "read"
# Attempt an action requiring "write" scope (create agent)
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:3000/api/v1/agents \
  -d '{"email":"scope-test@example.com","agent_type":"custom","version":"1.0.0","capabilities":[],"owner":"test","deployment_env":"development"}'
```

Expected HTTP status: `403`

**Test C.5 — Rate limit: 101 requests → 429 on the 101st**

Send 101 requests in rapid succession. The 101st must return 429.

```bash
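GOT_429=no
for i in $(seq 1 101); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
    http://localhost:3000/api/v1/agents)
  if [ "$STATUS" = "429" ]; then
    echo "Request $i returned 429 (PASS)"
    GOT_429=yes
    break
  fi
done
# Without this line a missing rate limit would produce no output at all
[ "$GOT_429" = "yes" ] || echo "No 429 after 101 requests (FAIL)"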
```

Expected: Output shows `Request 101 returned 429 (PASS)` (or earlier if previous requests in
the session have already counted toward the window).

After this test, wait 60 seconds for the rate limit window to reset, or use a fresh
`client_id` for subsequent tests.

**Test C.6 — Tier limit: exceed free-tier API call limit → 429 with `tier_limit_exceeded`**

The free tier allows 1,000 API calls per day. For the field trial, set the counter just past that
limit (1,001) to trigger the guard without making 1,000 real requests:

```bash
# Get the org_id from the token
ORG_ID=$(echo $NEW_ACCESS_TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq -r '.organization_id')

# Force the counter to the limit via Redis CLI
docker compose exec redis redis-cli SET "rate:tier:calls:$ORG_ID" 1001 EX 86400

# The next API call must be rejected
TIER_RESPONSE=$(curl -s -w "\n%{http_code}" \
  -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
  http://localhost:3000/api/v1/agents)

echo "$TIER_RESPONSE"
```

Expected: HTTP status `429`. Response body must contain `"code":"tier_limit_exceeded"`.

Reset the counter after this test:

```bash
docker compose exec redis redis-cli DEL "rate:tier:calls:$ORG_ID"
```

**Test C.7 — Tenant isolation: Org A token cannot access Org B agents → 403**

Create a second organisation and agent:

```bash
ORG_B_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/organizations \
  -H "Content-Type: application/json" \
  -d '{"name":"Org B","slug":"org-b"}')

ORG_B_ID=$(echo "$ORG_B_RESPONSE" | jq -r '.org_id')
echo "ORG_B_ID: $ORG_B_ID"

AGENT_B_RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/agents \
  -H "Content-Type: application/json" \
  -d "{
    \"email\": \"org-b-agent@org-b.sentryagent.ai\",
    \"agent_type\": \"monitor\",
    \"version\": \"1.0.0\",
    \"capabilities\": [],
    \"owner\": \"org-b\",
    \"deployment_env\": \"development\",
    \"organization_id\": \"$ORG_B_ID\"
  }")

AGENT_B_ID=$(echo "$AGENT_B_RESPONSE" | jq -r '.agent_id')
echo "AGENT_B_ID: $AGENT_B_ID"
```

Attempt to access Org B's agent using Org A's token:

```bash
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
  http://localhost:3000/api/v1/agents/$AGENT_B_ID
```

Expected HTTP status: `403`

---

## Phase D — Portal

**Step D.1 — Install portal dependencies**

```bash
cd portal && npm install && cd ..
```

**Step D.2 — Start the portal development server**

```bash
cd portal && npm run dev &
```

Wait 5 seconds for Next.js to compile, then verify it is listening:

```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:3001
# Expected: 200 or 307 (redirect to /login)
```

**Step D.3 — Verify each portal route loads**

Open a browser and navigate to each of the following URLs. Each must load without a JavaScript
error in the browser console:

| URL | Expected |
|-----|---------|
| `http://localhost:3001/login` | Login page renders |
| `http://localhost:3001/agents` | Agent list renders (may be empty or show auth redirect) |
| `http://localhost:3001/credentials` | Credentials page renders |
| `http://localhost:3001/audit` | Audit log page renders |
| `http://localhost:3001/analytics` | Analytics dashboard renders |
| `http://localhost:3001/settings/tier` | Tier status page renders |
| `http://localhost:3001/compliance` | Compliance report page renders |
| `http://localhost:3001/webhooks` | Webhooks page renders |
| `http://localhost:3001/marketplace` | Marketplace page renders |

All 9 routes must load without a blank page or unhandled error.

**Step D.4 — Verify analytics charts render**

Navigate to `http://localhost:3001/analytics`.

Verify that the chart library used by both chart components (`TokenTrendChart` and `AgentHeatmap`)
appears in the server-rendered HTML:

```bash
curl -s http://localhost:3001/analytics | grep -c "recharts"
# Expected: 1 or more (recharts is used for TokenTrendChart and AgentHeatmap)
```

**Step D.5 — Verify tier status page**

Navigate to `http://localhost:3001/settings/tier`.

The page must display the current tier (expected: `free` for a new organisation).

**Step D.6 — Stop the portal**

```bash
kill $(lsof -ti:3001)
```

---

## Phase E — AGNTCY Conformance

**Step E.1 — Activate nvm**

```bash
export NVM_DIR="$HOME/.nvm" && source "$NVM_DIR/nvm.sh"
```

**Step E.2 — Run the AGNTCY conformance suite**

```bash
npm run test:agntcy-conformance
```

**Step E.3 — Expected output**

```
AGNTCY Conformance Suite
  Agent Card Export
    ✓ exports valid AGNTCY agent card format
    ✓ agent card contains required identity fields
  Compliance Report
    ✓ generates SOC2-aligned compliance report
    ✓ compliance report includes all required control domains

4 passing (Xs)
```

All 4 tests must pass. A failure indicates a regression in AGNTCY conformance.

**What each test validates:**

| Test | What it validates |
|------|------------------|
| `exports valid AGNTCY agent card format` | The `/api/v1/compliance/agent-cards` endpoint returns an array where each card has `id`, `name`, `version`, `capabilities`, `did` fields in AGNTCY format |
| `agent card contains required identity fields` | Each agent card's `identity` block includes `agent_id`, `organization_id`, `did`, and `deployment_env` |
| `generates SOC2-aligned compliance report` | The `/api/v1/compliance/report` endpoint returns a report with `generated_at`, `controls`, `summary` top-level keys |
| `compliance report includes all required control domains` | The `controls` array in the report includes entries for `access_control`, `audit_logging`, `credential_management`, and `tenant_isolation` |

---

## Phase F — Performance Baseline

> **Prerequisite:** Apache Bench (`ab`) must be installed. On Ubuntu: `sudo apt install apache2-utils`.
> Verify: `ab -V`

**Step F.1 — Create a token payload file**

Write the form-encoded request body using `$NEW_CLIENT_ID` and `$NEW_CLIENT_SECRET` from Phase B.
`printf` is used instead of a heredoc so that the file has no trailing newline, which `ab` would
otherwise send as part of the request body:

```bash
printf '%s' "grant_type=client_credentials&client_id=${NEW_CLIENT_ID}&client_secret=${NEW_CLIENT_SECRET}&scope=read" \
  > /tmp/token_payload.txt
```

**Step F.2 — Benchmark token endpoint**

```bash
ab -n 100 -c 10 \
  -p /tmp/token_payload.txt \
  -T "application/x-www-form-urlencoded" \
  http://localhost:3000/api/v1/token
```

**Pass criteria for token endpoint:**

- `Requests per second` > 10
- `Time per request (mean)` < 100 ms
- p95 (95th percentile, shown as `95%` in the `Percentage of requests` table) < 100 ms
- Zero non-2xx responses
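
The figures needed for these criteria can be pulled out of the `ab` report instead of being read by eye. A sketch, assuming the standard `ab` text layout (`Time per request:` lines, an optional `Non-2xx responses:` line, and a `95%` row in the percentile table); `ab_summary` is a hypothetical helper:

```bash
# Extract mean time per request, p95, and the non-2xx count from `ab` output on stdin.
ab_summary() {
  awk '
    # The first "Time per request" line is the per-request mean.
    /^Time per request:/ && !seen_mean { mean = $4; seen_mean = 1 }
    # ab prints this line only when at least one response was not 2xx.
    /^Non-2xx responses:/             { non2xx = $3 }
    /^ *95%/                          { p95 = $2 }
    END { printf "mean_ms=%s p95_ms=%s non_2xx=%d\n", mean, p95, non2xx }
  '
}

# Example (commented out so the definition can be sourced safely):
# ab -n 100 -c 10 -p /tmp/token_payload.txt -T "application/x-www-form-urlencoded" \
#   http://localhost:3000/api/v1/token | ab_summary
```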

**Step F.3 — Benchmark agent list endpoint**

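Ensure `$NEW_ACCESS_TOKEN` is still set and valid, and that at least 60 seconds have passed since
the last burst of requests so that the rate limit from Test C.5 does not produce 429 responses.
Issue a fresh token if needed: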

```bash
NEW_ACCESS_TOKEN=$(curl -s -X POST http://localhost:3000/api/v1/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=${NEW_CLIENT_ID}&client_secret=${NEW_CLIENT_SECRET}&scope=read" \
  | jq -r '.access_token')
```

Run the benchmark:

```bash
ab -n 100 -c 10 \
  -H "Authorization: Bearer $NEW_ACCESS_TOKEN" \
  http://localhost:3000/api/v1/agents
```

**Pass criteria for agent list endpoint:**

- `Time per request (mean)` < 200 ms
- p95 (`95%` row in the `Percentage of requests` table) < 200 ms
- Zero non-2xx responses

**Step F.4 — Record results**

Record the following values from each `ab` output for the field trial report:

| Endpoint | Metric | Value |
|----------|--------|-------|
| `/api/v1/token` | Requests per second | |
| `/api/v1/token` | Mean time per request (ms) | |
| `/api/v1/token` | p95 (ms) | |
| `/api/v1/agents` | Requests per second | |
| `/api/v1/agents` | Mean time per request (ms) | |
| `/api/v1/agents` | p95 (ms) | |

A field trial passes Phase F only if every pass criterion listed above is met, including both p95 thresholds.

---

## Troubleshooting

Each entry follows the pattern: **Symptom** → **Cause** → **Fix** with exact commands.

---

**Port already in use**

Symptom:

```
Error response from daemon: driver failed programming external connectivity on endpoint
sentryagent-idp-app-1: Bind for 0.0.0.0:3000 failed: port is already allocated
```

Cause: Another process on the host is already bound to one of the published ports.

Fix: Kill the process occupying the port, then restart:

```bash
# -r skips the kill when lsof finds no process on that port
lsof -ti:3000 | xargs -r kill
lsof -ti:5432 | xargs -r kill
lsof -ti:6379 | xargs -r kill
docker compose up --build -d
```

---

**Container shows `unhealthy`**

Symptom: `docker compose ps` shows `unhealthy` for a service.

Fix: Check logs for the unhealthy service:

```bash
docker compose logs postgres
docker compose logs redis
docker compose logs app
```

Common causes:

| Service | Cause | Fix |
|---------|-------|-----|
| `postgres` | Wrong database credentials | Verify `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB` in `.env` match values in `compose.yaml` |
| `redis` | Port conflict | Check `lsof -ti:6379` and kill occupying process |
| `app` | Missing env var | Check `docker compose logs app` for `Failed to start server` message |

---

**Migration fails — connection refused**

Symptom:

```
Migration failed: Error: connect ECONNREFUSED 127.0.0.1:5432
```

Cause: Running `npm run db:migrate` directly on the host (not inside the container) while
PostgreSQL is running inside Docker.

Fix: Always run migrations inside the container during a field trial:

```bash
docker compose exec app npm run db:migrate
```

---

**Migration fails — relation already exists**

Symptom:

```
Migration failed: Error: relation "agents" already exists
```

Cause: A previous partial migration run left the database in an inconsistent state.

Fix: Check which migrations have been applied:

```bash
docker compose exec postgres psql -U sentryagent -d sentryagent_idp \
  -c "SELECT name FROM schema_migrations ORDER BY name;"
```

If the database state cannot be repaired, reset it:

```bash
docker compose down -v
docker compose up --build -d
docker compose exec app npm run db:migrate
```

> `docker compose down -v` destroys all data. Use only when a clean slate is acceptable.

---

**JWT error — invalid signature or key format**

Symptom:

```
Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required
```

Or: All tokens return `401 Token signature is invalid`.

Cause: JWT keys in `.env` have incorrect PEM format — literal newlines instead of `\n`
sequences, or trailing whitespace.

Fix: Regenerate the keys and rewrite them using the exact commands from Steps 0.2 and 0.3.

Verify the key format in `.env`:

```bash
grep "JWT_PRIVATE_KEY" .env | head -c 100
# Expected: JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\nMII...
# NOT:      JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----
#           MII...
```

The entire key must be on a single line with `\n` as literal backslash-n characters, not
actual newlines.

---

**Portal CORS error**

Symptom: Browser console shows:

```
Access to XMLHttpRequest at 'http://localhost:3000/api/v1/...' from origin 'http://localhost:3001'
has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present
```

Cause: `CORS_ORIGIN` in `.env` does not include `http://localhost:3001`, or is set to a
different value.

Fix:

```bash
sed -i "s|CORS_ORIGIN=.*|CORS_ORIGIN=http://localhost:3001|" .env
docker compose up --build -d
```

Wait for the `app` container to become healthy before retrying.

---

**Tier counter not resetting**

Symptom: All API calls return 429 `tier_limit_exceeded` even after waiting.

Cause: The Redis tier counter was manually set in Test C.6 and not deleted.

Fix:

```bash
# Get your org_id from the token
ORG_ID=$(echo $NEW_ACCESS_TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq -r '.organization_id')

docker compose exec redis redis-cli DEL "rate:tier:calls:$ORG_ID"
docker compose exec redis redis-cli DEL "rate:tier:tokens:$ORG_ID"
```

---

**`ab` not found**

Symptom: `ab: command not found`

Fix:

```bash
sudo apt-get update && sudo apt-get install -y apache2-utils
# or on macOS:
brew install httpd
```

---

**AGNTCY conformance test fails**

Symptom: One or more tests in `npm run test:agntcy-conformance` fail.

Diagnosis steps:

1. Ensure the backend is running and healthy: `curl -s http://localhost:3000/health`
2. Ensure `COMPLIANCE_ENABLED=true` in `.env` (check with `grep COMPLIANCE_ENABLED .env`)
3. Ensure at least one agent has been registered (Phase B must have been completed)
4. Check the test output for the specific assertion that failed
5. Check `docker compose logs app` for errors around compliance report generation

If the issue is a Redis cache hit returning stale data:

```bash
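# -T disables TTY allocation so the key list pipes cleanly; -r skips DEL when no keys match
docker compose exec -T redis redis-cli --scan --pattern "compliance:*" | xargs -r docker compose exec -T redis redis-cli DEL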
```

Then re-run the conformance suite.