Adds the full devops-documentation OpenSpec change implementation. Separate from docs/developers/ — serves a different audience (operators, not API consumers). docs/devops/: - README.md — index and system overview - architecture.md — components, ports, data flow, Redis key patterns - environment-variables.md — all 7 env vars (required + optional, formats, .env example) - database.md — 4-table schema, indexes, constraints, migration runner - local-development.md — docker-compose setup, health checks, startup, Dockerfile gap noted - security.md — RSA key generation/rotation, CORS, bcrypt, secret storage guidance - operations.md — startup order, graceful shutdown, log reference, troubleshooting QA gates: 48/48 tasks complete. All env vars verified against source. All table names verified against migrations. All ports verified against docker-compose.yml. All internal links resolve. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6.8 KiB
Operations
Startup, shutdown, log interpretation, and troubleshooting for AgentIdP.
Startup Order
Always start services in this order. Starting the application before PostgreSQL or Redis is ready will cause connection errors on first request.
1. PostgreSQL (must be healthy)
2. Redis (must be healthy)
3. Migrations (must complete successfully)
4. Application (start last)
Startup checklist
# 1. Start PostgreSQL and Redis
docker-compose up -d postgres redis
# 2. Wait for healthy status
docker-compose ps
# Both postgres and redis must show "healthy" before proceeding
# 3. Run migrations
npm run db:migrate
# Must complete with 0 errors before starting the app
# 4. Start the application
npm run dev # development
# or
npm start # production (requires prior npm run build)
Graceful Shutdown
The application handles SIGTERM and SIGINT gracefully:
- Stops accepting new connections
- Waits for in-flight requests to complete
- Exits with code
0
Sending SIGTERM
# Find the PID
ps aux | grep "node.*server"
# Send SIGTERM
kill -SIGTERM <pid>
Expected log output:
Shutting down gracefully...
The process exits cleanly. No requests are dropped if they were already in-flight.
Docker stop
docker stop sends SIGTERM by default with a 10-second timeout before SIGKILL. This is sufficient for graceful shutdown.
docker stop sentryagent-idp-app-1
Log Reference
AgentIdP logs to stdout. In development (NODE_ENV=development), Morgan HTTP request logs are included. In test (NODE_ENV=test), Morgan is suppressed.
Startup logs
| Log line | Meaning |
|---|---|
SentryAgent.ai AgentIdP listening on port 3000 |
Server bound successfully — ready to accept requests |
Shutting down gracefully... |
SIGTERM/SIGINT received — draining connections |
Error logs
| Log line | Meaning |
|---|---|
Failed to start server: Error: DATABASE_URL environment variable is required |
DATABASE_URL is not set in the environment |
Failed to start server: Error: REDIS_URL environment variable is required |
REDIS_URL is not set |
Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required |
One or both JWT keys are missing |
Unexpected pg pool error <err> |
PostgreSQL connection dropped after startup — check DB availability |
Redis client error <err> |
Redis connection error after startup — check Redis availability |
Morgan HTTP request format (development)
::1 - - [28/Mar/2026:09:01:00 +0000] "POST /api/v1/token HTTP/1.1" 200 312 "-" "curl/7.88.1"
Format: <ip> - - [<timestamp>] "<method> <path> <protocol>" <status> <bytes> "<referrer>" "<user-agent>"
Redis Key Patterns
Three key patterns are used in Redis. Useful for debugging and manual inspection.
# Connect to Redis CLI
docker-compose exec redis redis-cli
| Key pattern | Example | Purpose | TTL |
|---|---|---|---|
revoked:<jti> |
revoked:f1e2d3c4-b5a6-... |
Revoked token JTI | Remaining token lifetime |
rate:<client_id>:<window> |
rate:a1b2c3...:29086156 |
Request count per minute window | 60 seconds |
monthly:<client_id>:<year>:<month> |
monthly:a1b2c3...:2026:3 |
Token issuance count for free tier | End of month |
Inspect keys:
# List all revoked tokens
redis-cli KEYS "revoked:*"
# Check rate limit counter for a specific client
redis-cli GET "rate:<client_id>:<window_key>"
# Check monthly token count for a specific client
redis-cli GET "monthly:<client_id>:2026:3"
Where <window_key> is floor(unix_ms / 60000). For the current window:
node -e "console.log(Math.floor(Date.now() / 60000))"
Troubleshooting
Application fails to start — missing environment variable
Symptom:
Failed to start server: Error: DATABASE_URL environment variable is required
Fix: Ensure your .env file exists in the project root and contains all required variables. Verify:
grep -E "^(DATABASE_URL|REDIS_URL|JWT_PRIVATE_KEY|JWT_PUBLIC_KEY)=" .env
Application fails to start — JWT key error
Symptom:
Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required
Fix: Generate RSA keys and add them to .env. See security.md.
PostgreSQL connection refused on first request
Symptom:
Error: connect ECONNREFUSED 127.0.0.1:5432
Causes and fixes:
| Cause | Fix |
|---|---|
| PostgreSQL container not started | Run docker-compose up -d postgres |
| PostgreSQL container not yet healthy | Wait and run docker-compose ps — wait for healthy |
Wrong DATABASE_URL host/port |
Check DATABASE_URL matches the PostgreSQL port (5432) |
| PostgreSQL container exited | Run docker-compose logs postgres to see why it exited |
Redis connection error on first request
Symptom:
Redis client error Error: connect ECONNREFUSED 127.0.0.1:6379
Causes and fixes:
| Cause | Fix |
|---|---|
| Redis container not started | Run docker-compose up -d redis |
| Redis container not yet healthy | Run docker-compose ps — wait for healthy |
Wrong REDIS_URL |
Check REDIS_URL matches the Redis port (6379) |
Migration fails
Symptom:
Migration failed: Error: connect ECONNREFUSED 127.0.0.1:5432
Fix: PostgreSQL is not running or not reachable. Start it and verify health before running migrations.
Symptom:
Migration failed: Error: relation "agents" already exists
Fix: The migration has already been applied partially. Check schema_migrations:
psql "$DATABASE_URL" -c "SELECT name FROM schema_migrations ORDER BY name;"
If a migration is listed there but the table is inconsistent, manually inspect and repair the database state before re-running.
All requests return 401 after key rotation
Symptom: Every API call returns 401 UNAUTHORIZED with Token signature is invalid.
Cause: JWT keys were rotated. All previously issued tokens were signed with the old private key and are now invalid.
Fix: Clients must re-authenticate using POST /token with their client_id and client_secret to obtain a new token signed with the new key. This is expected behaviour after key rotation.
Rate limit hit unexpectedly — 429 responses
Symptom: API returns 429 RATE_LIMIT_EXCEEDED with X-RateLimit-Reset header.
Check current rate limit state:
# Find the current window key
WINDOW=$(node -e "console.log(Math.floor(Date.now() / 60000))")
# Check count for a specific client
docker-compose exec redis redis-cli GET "rate:<client_id>:$WINDOW"
Fix: Wait until X-RateLimit-Reset (Unix timestamp in the response header) before retrying. The window resets every 60 seconds.