Files
sentryagent-idp/docs/devops/operations.md
SentryAgent.ai Developer d94a8cedc0 docs: DevOps documentation — complete docs/devops/ set
Adds the full devops-documentation OpenSpec change implementation.
Separate from docs/developers/ — serves a different audience (operators,
not API consumers).

docs/devops/:
- README.md          — index and system overview
- architecture.md    — components, ports, data flow, Redis key patterns
- environment-variables.md — all 7 env vars (required + optional, formats, .env example)
- database.md        — 4-table schema, indexes, constraints, migration runner
- local-development.md — docker-compose setup, health checks, startup, Dockerfile gap noted
- security.md        — RSA key generation/rotation, CORS, bcrypt, secret storage guidance
- operations.md      — startup order, graceful shutdown, log reference, troubleshooting

QA gates: 48/48 tasks complete. All env vars verified against source.
All table names verified against migrations. All ports verified against
docker-compose.yml. All internal links resolve.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 14:28:55 +00:00

6.8 KiB

Operations

Startup, shutdown, log interpretation, and troubleshooting for AgentIdP.


Startup Order

Always start services in this order. Starting the application before PostgreSQL or Redis is ready will cause connection errors on first request.

1. PostgreSQL   (must be healthy)
2. Redis        (must be healthy)
3. Migrations   (must complete successfully)
4. Application  (start last)

Startup checklist

# 1. Start PostgreSQL and Redis
docker-compose up -d postgres redis

# 2. Wait for healthy status
docker-compose ps
# Both postgres and redis must show "healthy" before proceeding

# 3. Run migrations
npm run db:migrate
# Must complete with 0 errors before starting the app

# 4. Start the application
npm run dev    # development
# or
npm start      # production (requires prior npm run build)

Graceful Shutdown

The application handles SIGTERM and SIGINT gracefully:

  1. Stops accepting new connections
  2. Waits for in-flight requests to complete
  3. Exits with code 0

Sending SIGTERM

# Find the PID
ps aux | grep "node.*server"

# Send SIGTERM
kill -SIGTERM <pid>

Expected log output:

Shutting down gracefully...

The process exits cleanly. No requests are dropped if they were already in-flight.

Docker stop

docker stop sends SIGTERM by default with a 10-second timeout before SIGKILL. This is sufficient for graceful shutdown.

docker stop sentryagent-idp-app-1

Log Reference

AgentIdP logs to stdout. In development (NODE_ENV=development), Morgan HTTP request logs are included. In test (NODE_ENV=test), Morgan is suppressed.

Startup logs

Log line Meaning
SentryAgent.ai AgentIdP listening on port 3000 Server bound successfully — ready to accept requests
Shutting down gracefully... SIGTERM/SIGINT received — draining connections

Error logs

Log line Meaning
Failed to start server: Error: DATABASE_URL environment variable is required DATABASE_URL is not set in the environment
Failed to start server: Error: REDIS_URL environment variable is required REDIS_URL is not set
Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required One or both JWT keys are missing
Unexpected pg pool error <err> PostgreSQL connection dropped after startup — check DB availability
Redis client error <err> Redis connection error after startup — check Redis availability

Morgan HTTP request format (development)

::1 - - [28/Mar/2026:09:01:00 +0000] "POST /api/v1/token HTTP/1.1" 200 312 "-" "curl/7.88.1"

Format: <ip> - - [<timestamp>] "<method> <path> <protocol>" <status> <bytes> "<referrer>" "<user-agent>"


Redis Key Patterns

Three key patterns are used in Redis. Useful for debugging and manual inspection.

# Connect to Redis CLI
docker-compose exec redis redis-cli
Key pattern Example Purpose TTL
revoked:<jti> revoked:f1e2d3c4-b5a6-... Revoked token JTI Remaining token lifetime
rate:<client_id>:<window> rate:a1b2c3...:29086156 Request count per minute window 60 seconds
monthly:<client_id>:<year>:<month> monthly:a1b2c3...:2026:3 Token issuance count for free tier End of month

Inspect keys:

# List all revoked tokens
redis-cli KEYS "revoked:*"

# Check rate limit counter for a specific client
redis-cli GET "rate:<client_id>:<window_key>"

# Check monthly token count for a specific client
redis-cli GET "monthly:<client_id>:2026:3"

Where <window_key> is floor(unix_ms / 60000). For the current window:

node -e "console.log(Math.floor(Date.now() / 60000))"

Troubleshooting

Application fails to start — missing environment variable

Symptom:

Failed to start server: Error: DATABASE_URL environment variable is required

Fix: Ensure your .env file exists in the project root and contains all required variables. Verify:

grep -E "^(DATABASE_URL|REDIS_URL|JWT_PRIVATE_KEY|JWT_PUBLIC_KEY)=" .env

Application fails to start — JWT key error

Symptom:

Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required

Fix: Generate RSA keys and add them to .env. See security.md.


PostgreSQL connection refused on first request

Symptom:

Error: connect ECONNREFUSED 127.0.0.1:5432

Causes and fixes:

Cause Fix
PostgreSQL container not started Run docker-compose up -d postgres
PostgreSQL container not yet healthy Wait and run docker-compose ps — wait for healthy
Wrong DATABASE_URL host/port Check DATABASE_URL matches the PostgreSQL port (5432)
PostgreSQL container exited Run docker-compose logs postgres to see why it exited

Redis connection error on first request

Symptom:

Redis client error Error: connect ECONNREFUSED 127.0.0.1:6379

Causes and fixes:

Cause Fix
Redis container not started Run docker-compose up -d redis
Redis container not yet healthy Run docker-compose ps — wait for healthy
Wrong REDIS_URL Check REDIS_URL matches the Redis port (6379)

Migration fails

Symptom:

Migration failed: Error: connect ECONNREFUSED 127.0.0.1:5432

Fix: PostgreSQL is not running or not reachable. Start it and verify health before running migrations.

Symptom:

Migration failed: Error: relation "agents" already exists

Fix: The migration has already been applied partially. Check schema_migrations:

psql "$DATABASE_URL" -c "SELECT name FROM schema_migrations ORDER BY name;"

If a migration is listed there but the table is inconsistent, manually inspect and repair the database state before re-running.


All requests return 401 after key rotation

Symptom: Every API call returns 401 UNAUTHORIZED with Token signature is invalid.

Cause: JWT keys were rotated. All previously issued tokens were signed with the old private key and are now invalid.

Fix: Clients must re-authenticate using POST /token with their client_id and client_secret to obtain a new token signed with the new key. This is expected behaviour after key rotation.


Rate limit hit unexpectedly — 429 responses

Symptom: API returns 429 RATE_LIMIT_EXCEEDED with X-RateLimit-Reset header.

Check current rate limit state:

# Find the current window key
WINDOW=$(node -e "console.log(Math.floor(Date.now() / 60000))")
# Check count for a specific client
docker-compose exec redis redis-cli GET "rate:<client_id>:$WINDOW"

Fix: Wait until X-RateLimit-Reset (Unix timestamp in the response header) before retrying. The window resets every 60 seconds.