docs: DevOps documentation — complete docs/devops/ set
Adds the full devops-documentation OpenSpec change implementation. Separate from docs/developers/ - serves a different audience (operators, not API consumers).

docs/devops/:
- README.md - index and system overview
- architecture.md - components, ports, data flow, Redis key patterns
- environment-variables.md - all 7 env vars (required + optional, formats, .env example)
- database.md - 4-table schema, indexes, constraints, migration runner
- local-development.md - docker-compose setup, health checks, startup, Dockerfile gap noted
- security.md - RSA key generation/rotation, CORS, bcrypt, secret storage guidance
- operations.md - startup order, graceful shutdown, log reference, troubleshooting

QA gates: 48/48 tasks complete. All env vars verified against source. All table names verified against migrations. All ports verified against docker-compose.yml. All internal links resolve.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

249
docs/devops/operations.md
Normal file
@@ -0,0 +1,249 @@
# Operations

Startup, shutdown, log interpretation, and troubleshooting for AgentIdP.

---

## Startup Order

Always start services in this order. Starting the application before PostgreSQL or Redis is ready will cause connection errors on the first request.

```
1. PostgreSQL  (must be healthy)
2. Redis       (must be healthy)
3. Migrations  (must complete successfully)
4. Application (start last)
```

### Startup checklist

```bash
# 1. Start PostgreSQL and Redis
docker-compose up -d postgres redis

# 2. Wait for healthy status
docker-compose ps
# Both postgres and redis must show "healthy" before proceeding

# 3. Run migrations
npm run db:migrate
# Must complete with 0 errors before starting the app

# 4. Start the application
npm run dev   # development
# or
npm start     # production (requires prior npm run build)
```

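
The gating in the checklist (each step must succeed before the next one begins) can be expressed as a small sequential runner. The sketch below is illustrative only: `runStartupSteps` and the step objects are hypothetical names, and each `run` function stands in for a real check such as a container health probe or `npm run db:migrate`.

```javascript
// Hypothetical orchestration sketch, not part of AgentIdP.
// Runs the steps in order and stops at the first one that fails.
function runStartupSteps(steps) {
  const completed = [];
  for (const { name, run } of steps) {
    // `run` is assumed to return true on success and false on failure.
    if (!run()) {
      return { ok: false, failedAt: name, completed };
    }
    completed.push(name);
  }
  return { ok: true, failedAt: null, completed };
}
```

If the Redis step fails, for example, the runner reports `failedAt: 'redis'` and never reaches the migration or application steps, which matches the intent of the checklist.
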

---

## Graceful Shutdown

The application handles `SIGTERM` and `SIGINT` gracefully:

1. Stops accepting new connections
2. Waits for in-flight requests to complete
3. Exits with code `0`

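
The three steps above can be sketched in Node.js as follows. This is a minimal illustration, not AgentIdP's actual handler: the function name `installGracefulShutdown` and the injectable `proc` parameter are assumptions, introduced so the logic can be exercised without a real server or real signals.

```javascript
// Hypothetical sketch of a SIGTERM/SIGINT handler; not taken from the AgentIdP source.
function installGracefulShutdown(server, proc = process) {
  let shuttingDown = false;

  const shutdown = () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log('Shutting down gracefully...');
    // For a Node http.Server, close() stops accepting new connections and
    // runs the callback once the remaining connections have ended.
    server.close(() => proc.exit(0));
  };

  proc.on('SIGTERM', shutdown);
  proc.on('SIGINT', shutdown);
  return shutdown;
}
```

With a real server this would be called once at startup, passing the object returned by `listen()`. The `shuttingDown` flag keeps a second signal from calling `close()` twice.
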
### Sending SIGTERM

```bash
# Find the PID (pgrep avoids matching the grep process itself)
pgrep -f "node.*server"

# Send SIGTERM
kill -TERM <pid>
```

Expected log output:

```
Shutting down gracefully...
```

The process then exits cleanly. Requests that were already in flight when the signal arrived are allowed to complete.

### Docker stop

`docker stop` sends `SIGTERM` by default and waits 10 seconds before sending `SIGKILL`. This is sufficient for graceful shutdown as long as in-flight requests finish within that window; pass `-t <seconds>` to allow more time.

```bash
docker stop sentryagent-idp-app-1
```


---

## Log Reference

AgentIdP logs to stdout. In development (`NODE_ENV=development`), Morgan HTTP request logs are included. In test (`NODE_ENV=test`), Morgan is suppressed.

### Startup logs

| Log line | Meaning |
|----------|---------|
| `SentryAgent.ai AgentIdP listening on port 3000` | Server bound successfully; ready to accept requests |
| `Shutting down gracefully...` | SIGTERM/SIGINT received; draining connections |

### Error logs

| Log line | Meaning |
|----------|---------|
| `Failed to start server: Error: DATABASE_URL environment variable is required` | `DATABASE_URL` is not set in the environment |
| `Failed to start server: Error: REDIS_URL environment variable is required` | `REDIS_URL` is not set |
| `Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required` | One or both JWT keys are missing |
| `Unexpected pg pool error <err>` | PostgreSQL connection dropped after startup; check DB availability |
| `Redis client error <err>` | Redis connection error after startup; check Redis availability |

### Morgan HTTP request format (development)

```
::1 - - [28/Mar/2026:09:01:00 +0000] "POST /api/v1/token HTTP/1.1" 200 312 "-" "curl/7.88.1"
```

Format: `<ip> - - [<timestamp>] "<method> <path> <protocol>" <status> <bytes> "<referrer>" "<user-agent>"`

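
When grepping is not enough, a log line in this format can be split into fields with a single regular expression. The sketch below is one way to do it; `parseAccessLogLine` is a hypothetical helper, not part of Morgan or AgentIdP, and it assumes every line follows the format shown above exactly.

```javascript
// Hypothetical parser for the access-log format documented above.
const ACCESS_LOG_PATTERN =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

function parseAccessLogLine(line) {
  const m = ACCESS_LOG_PATTERN.exec(line);
  if (!m) return null; // line is not in the expected format
  return {
    ip: m[1],
    timestamp: m[4],
    method: m[5],
    path: m[6],
    protocol: m[7],
    status: Number(m[8]),
    bytes: m[9] === '-' ? null : Number(m[9]), // "-" means no body size was logged
    referrer: m[10],
    userAgent: m[11],
  };
}

const sampleLine =
  '::1 - - [28/Mar/2026:09:01:00 +0000] "POST /api/v1/token HTTP/1.1" 200 312 "-" "curl/7.88.1"';
console.log(parseAccessLogLine(sampleLine));
```

Lines that do not match (for example, application error logs mixed into the same stream) return `null`, so they can be filtered out before aggregating by status or path.
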

---

## Redis Key Patterns

AgentIdP uses three key patterns in Redis. Knowing them helps with debugging and manual inspection.

```bash
# Connect to Redis CLI
docker-compose exec redis redis-cli
```

| Key pattern | Example | Purpose | TTL |
|-------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-b5a6-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per minute window | 60 seconds |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Token issuance count for free tier | End of month |

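
The TTL column can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration rather than the application's code: it assumes the token's `exp` claim is in seconds since the Unix epoch (as in standard JWTs) and that "end of month" means 00:00 UTC on the first day of the next month. Both function names are hypothetical.

```javascript
// Hypothetical TTL helpers; the real implementation may differ.
function revokedTtlSeconds(expSeconds, nowMs = Date.now()) {
  // Remaining token lifetime; never negative for an already-expired token.
  return Math.max(0, expSeconds - Math.floor(nowMs / 1000));
}

function secondsUntilEndOfMonth(now = new Date()) {
  // ASSUMPTION: the monthly counter resets at 00:00 UTC on the 1st.
  // Date.UTC rolls month 12 over into January of the following year.
  const nextMonthMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
  return Math.ceil((nextMonthMs - now.getTime()) / 1000);
}
```

Comparing these values with `TTL <key>` in `redis-cli` is a quick way to check whether a key's expiry looks plausible.
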
Inspect keys:

```bash
# List all revoked tokens (uses SCAN, which unlike KEYS does not block Redis on large keyspaces)
redis-cli --scan --pattern "revoked:*"

# Check rate limit counter for a specific client
redis-cli GET "rate:<client_id>:<window_key>"

# Check monthly token count for a specific client
redis-cli GET "monthly:<client_id>:2026:3"
```

Where `<window_key>` is `floor(unix_ms / 60000)`: the current Unix time in milliseconds divided by 60,000 and rounded down. For the current window:

```bash
node -e "console.log(Math.floor(Date.now() / 60000))"
```

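
To avoid typing key names by hand, the patterns from the table can be rebuilt with a few helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the monthly key is assumed to use the UTC year and an unpadded, 1-based month, which is consistent with the `2026:3` example but not confirmed by it.

```javascript
// Hypothetical helpers that rebuild the documented Redis key names.
function rateWindowKey(nowMs = Date.now()) {
  return Math.floor(nowMs / 60000); // one window per minute
}

function rateLimitKey(clientId, nowMs = Date.now()) {
  return `rate:${clientId}:${rateWindowKey(nowMs)}`;
}

function monthlyKey(clientId, now = new Date()) {
  // ASSUMPTION: UTC calendar, month written without zero padding.
  return `monthly:${clientId}:${now.getUTCFullYear()}:${now.getUTCMonth() + 1}`;
}

function revokedKey(jti) {
  return `revoked:${jti}`;
}
```

Any timestamp inside the same minute maps to the same window key, which is why a counter read a few seconds after a request still lands on the right key.
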

---

## Troubleshooting

### Application fails to start: missing environment variable

**Symptom:**
```
Failed to start server: Error: DATABASE_URL environment variable is required
```

**Fix:** Ensure your `.env` file exists in the project root and contains all required variables. Verify:
```bash
grep -E "^(DATABASE_URL|REDIS_URL|JWT_PRIVATE_KEY|JWT_PUBLIC_KEY)=" .env
```

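
The `grep` above checks the `.env` file; the same check can be run against the environment a Node process actually sees, which catches cases where the file exists but was never loaded. The sketch below is a hypothetical pre-flight helper (`missingEnvVars` is not part of AgentIdP) that covers the four variables named in the error messages.

```javascript
// Hypothetical pre-flight check; reads the process environment, not the .env file.
const REQUIRED_VARS = ['DATABASE_URL', 'REDIS_URL', 'JWT_PRIVATE_KEY', 'JWT_PUBLIC_KEY'];

function missingEnvVars(env = process.env) {
  // Treat unset, empty, and whitespace-only values as missing.
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

const missingNow = missingEnvVars();
console.log(
  missingNow.length === 0
    ? 'All required variables are set'
    : `Missing: ${missingNow.join(', ')}`
);
```
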

---

### Application fails to start: JWT key error

**Symptom:**
```
Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required
```

**Fix:** Generate RSA keys and add them to `.env`. See [security.md](security.md).

---

### PostgreSQL connection refused on first request

**Symptom:**
```
Error: connect ECONNREFUSED 127.0.0.1:5432
```

**Causes and fixes:**

| Cause | Fix |
|-------|-----|
| PostgreSQL container not started | Run `docker-compose up -d postgres` |
| PostgreSQL container not yet healthy | Run `docker-compose ps` and wait for `healthy` |
| Wrong `DATABASE_URL` host/port | Check that the host and port in `DATABASE_URL` match the PostgreSQL instance (port 5432) |
| PostgreSQL container exited | Run `docker-compose logs postgres` to see why it exited |


---

### Redis connection error on first request

**Symptom:**
```
Redis client error Error: connect ECONNREFUSED 127.0.0.1:6379
```

**Causes and fixes:**

| Cause | Fix |
|-------|-----|
| Redis container not started | Run `docker-compose up -d redis` |
| Redis container not yet healthy | Run `docker-compose ps` and wait for `healthy` |
| Wrong `REDIS_URL` | Check that `REDIS_URL` matches the Redis port (6379) |


---

### Migration fails

**Symptom:**
```
Migration failed: Error: connect ECONNREFUSED 127.0.0.1:5432
```

**Fix:** PostgreSQL is not running or not reachable. Start it and verify health before running migrations.

**Symptom:**
```
Migration failed: Error: relation "agents" already exists
```

**Fix:** The migration was partially applied on an earlier run. Check `schema_migrations`:
```bash
psql "$DATABASE_URL" -c "SELECT name FROM schema_migrations ORDER BY name;"
```
If a migration is listed there but the corresponding tables are in an inconsistent state, inspect and repair the database manually before re-running.


---

### All requests return 401 after key rotation

**Symptom:** Every API call returns `401 UNAUTHORIZED` with `Token signature is invalid.`

**Cause:** JWT keys were rotated. Tokens issued before the rotation were signed with the old private key, so they fail signature verification against the new public key.

**Fix:** Clients must re-authenticate using `POST /token` with their `client_id` and `client_secret` to obtain a new token signed with the new key. This is expected behaviour after key rotation.


---

### Rate limit hit unexpectedly: 429 responses

**Symptom:** API returns `429 RATE_LIMIT_EXCEEDED` with an `X-RateLimit-Reset` header.

**Check current rate limit state:**
```bash
# Find the current window key
WINDOW=$(node -e "console.log(Math.floor(Date.now() / 60000))")
# Check count for a specific client
docker-compose exec redis redis-cli GET "rate:<client_id>:$WINDOW"
```

**Fix:** Wait until the time given in `X-RateLimit-Reset` (a Unix timestamp in the response header) before retrying. The window resets every 60 seconds.

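
On the client side, the wait can be computed directly from the header. The sketch below is hypothetical (`msUntilReset` is not part of any AgentIdP SDK) and assumes `X-RateLimit-Reset` is expressed in seconds since the Unix epoch; if the header is missing or unparseable it falls back to one full 60-second window.

```javascript
// Hypothetical client helper: milliseconds to wait before retrying after a 429.
function msUntilReset(resetHeader, nowMs = Date.now()) {
  const resetSeconds = Number(resetHeader);
  const unusable =
    resetHeader == null || resetHeader === '' || !Number.isFinite(resetSeconds);
  if (unusable) return 60000; // ASSUMPTION: fall back to one full window
  // ASSUMPTION: the header value is in seconds, not milliseconds.
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```

A caller would typically pass the result to `setTimeout` before reissuing the request, rather than retrying in a tight loop that keeps incrementing the counter.
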