# Operations

Startup, shutdown, log interpretation, and troubleshooting for AgentIdP.

---

## Startup Order

Always start services in this order. Starting the application before PostgreSQL or Redis is ready will cause connection errors on first request.

```
1. PostgreSQL (must be healthy)
2. Redis (must be healthy)
3. Migrations (must complete successfully)
4. Application (start last)
```

### Startup checklist

```bash
# 1. Start the full stack
docker compose up --build -d

# 2. Verify all three services are healthy
docker compose ps
# app, postgres, and redis must all show "healthy"

# 3. Run migrations
docker compose exec app npm run db:migrate

# 4. Verify application health
curl http://localhost:3000/health
# Expected: {"status":"ok"}

# 5. (Optional) Start the portal for local dev
cd portal && npm run dev
```

---

## Graceful Shutdown

The application handles `SIGTERM` and `SIGINT` gracefully:

1. Stops accepting new connections
2. Waits for in-flight requests to complete
3. Exits with code `0`

### Sending SIGTERM

```bash
# Find the PID
ps aux | grep "node.*server"

# Send SIGTERM
kill -SIGTERM <PID>
```

Expected log output:

```
Shutting down gracefully...
```

The process exits cleanly. Requests that were already in-flight are not dropped.

### Docker stop

`docker stop` sends `SIGTERM` by default with a 10-second timeout before `SIGKILL`. This is sufficient for graceful shutdown as long as in-flight requests finish within that window; pass a longer timeout with `docker stop -t <seconds>` if they do not.

```bash
docker stop sentryagent-idp-app-1
```

---

## Log Reference

AgentIdP logs to stdout. In development (`NODE_ENV=development`), Morgan HTTP request logs are included. In test (`NODE_ENV=test`), Morgan is suppressed.

### Startup logs

| Log line | Meaning |
|----------|---------|
| `SentryAgent.ai AgentIdP listening on port 3000` | Server bound successfully; ready to accept requests |
| `Shutting down gracefully...` | SIGTERM/SIGINT received; draining connections |

### Error logs

| Log line | Meaning |
|----------|---------|
| `Failed to start server: Error: DATABASE_URL environment variable is required` | `DATABASE_URL` is not set in the environment |
| `Failed to start server: Error: REDIS_URL environment variable is required` | `REDIS_URL` is not set |
| `Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required` | One or both JWT keys are missing |
| `Unexpected pg pool error <error>` | PostgreSQL connection dropped after startup; check DB availability |
| `Redis client error <error>` | Redis connection error after startup; check Redis availability |

### Morgan HTTP request format (development)

```
::1 - - [28/Mar/2026:09:01:00 +0000] "POST /api/v1/token HTTP/1.1" 200 312 "-" "curl/7.88.1"
```

Format: `<remote-addr> - - [<date>] "<method> <url> <protocol>" <status> <content-length> "<referrer>" "<user-agent>"`

---

## Redis Key Patterns

The following key patterns are used in Redis. They are useful for debugging and manual inspection.

```bash
# Connect to Redis CLI
docker compose exec redis redis-cli
```

| Key pattern | Example | Purpose | TTL |
|------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<org_id>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<org_id>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<org_id>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |

Inspect keys:

```bash
# List all revoked tokens
# (KEYS walks the whole keyspace; on a busy instance prefer
#  redis-cli --scan --pattern "revoked:*")
redis-cli KEYS "revoked:*"

# Check rate limit counter for a specific client
redis-cli GET "rate:<client_id>:<window>"

# Check monthly token count for a specific client
redis-cli GET "monthly:<client_id>:2026:3"

# Check tier API call counter for a tenant
redis-cli GET "rate:tier:calls:<org_id>"

# Check tier token counter for a tenant
redis-cli GET "rate:tier:tokens:<org_id>"

# Check cached compliance report for a tenant
redis-cli GET "compliance:report:<org_id>"
redis-cli TTL "compliance:report:<org_id>"
```

Where `<window>` is `floor(unix_ms / 60000)`. For the current window:

```bash
node -e "console.log(Math.floor(Date.now() / 60000))"
```

---

## Troubleshooting

### Application fails to start: missing environment variable

**Symptom:**

```
Failed to start server: Error: DATABASE_URL environment variable is required
```

**Fix:** Ensure your `.env` file exists in the project root and contains all required variables. Verify:

```bash
grep -E "^(DATABASE_URL|REDIS_URL|JWT_PRIVATE_KEY|JWT_PUBLIC_KEY)=" .env
```

---

### Application fails to start: JWT key error

**Symptom:**

```
Failed to start server: Error: JWT_PRIVATE_KEY and JWT_PUBLIC_KEY environment variables are required
```

**Fix:** Generate RSA keys and add them to `.env`.
See [security.md](security.md).

---

### PostgreSQL connection refused on first request

**Symptom:**

```
Error: connect ECONNREFUSED 127.0.0.1:5432
```

**Causes and fixes:**

| Cause | Fix |
|-------|-----|
| PostgreSQL container not started | Run `docker compose up -d postgres` |
| PostgreSQL container not yet healthy | Run `docker compose ps` and wait for `healthy` |
| Wrong `DATABASE_URL` host/port | Check `DATABASE_URL` matches the PostgreSQL port (5432) |
| PostgreSQL container exited | Run `docker compose logs postgres` to see why it exited |

---

### Redis connection error on first request

**Symptom:**

```
Redis client error Error: connect ECONNREFUSED 127.0.0.1:6379
```

**Causes and fixes:**

| Cause | Fix |
|-------|-----|
| Redis container not started | Run `docker compose up -d redis` |
| Redis container not yet healthy | Run `docker compose ps` and wait for `healthy` |
| Wrong `REDIS_URL` | Check `REDIS_URL` matches the Redis port (6379) |

---

### Migration fails

**Symptom:**

```
Migration failed: Error: connect ECONNREFUSED 127.0.0.1:5432
```

**Fix:** PostgreSQL is not running or not reachable. Start it and verify health before running migrations.

**Symptom:**

```
Migration failed: Error: relation "agents" already exists
```

**Fix:** The migration has already been partially applied. Check `schema_migrations`:

```bash
psql "$DATABASE_URL" -c "SELECT name FROM schema_migrations ORDER BY name;"
```

If a migration is listed there but the table is inconsistent, manually inspect and repair the database state before re-running.

---

### All requests return 401 after key rotation

**Symptom:** Every API call returns `401 UNAUTHORIZED` with `Token signature is invalid.`

**Cause:** JWT keys were rotated. All previously issued tokens were signed with the old private key and are now invalid.

**Fix:** Clients must re-authenticate using `POST /token` with their `client_id` and `client_secret` to obtain a new token signed with the new key.
This is expected behaviour after key rotation.

---

### Rate limit hit unexpectedly: 429 responses

**Symptom:** API returns `429 RATE_LIMIT_EXCEEDED` with an `X-RateLimit-Reset` header.

**Check current rate limit state:**

```bash
# Find the current window key
WINDOW=$(node -e "console.log(Math.floor(Date.now() / 60000))")

# Check count for a specific client
docker compose exec redis redis-cli GET "rate:<client_id>:$WINDOW"
```

**Fix:** Wait until `X-RateLimit-Reset` (Unix timestamp in the response header) before retrying. The window resets every 60 seconds.

---

## Monitoring

AgentIdP exposes a Prometheus metrics endpoint at `GET /metrics` (unauthenticated, plain text).

### Metrics Exposed

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP latency |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |
| `agentidp_webhook_dead_letters_total` | Counter | `event_type` | Webhook deliveries moved to dead-letter queue |
| `agentidp_credentials_expiring_soon_total` | Gauge | (none) | Credentials expiring within 7 days |
| `agentidp_audit_chain_integrity` | Gauge | (none) | `1` if audit chain is intact, `0` if broken |
| `agentidp_rate_limit_hits_total` | Counter | `client_id` | Rate limit rejections |
| `agentidp_db_pool_active_connections` | Gauge | (none) | Active PostgreSQL connections |
| `agentidp_db_pool_waiting_requests` | Gauge | (none) | Requests waiting for a pool connection |
| `agentidp_tenant_api_calls_total` | Counter | `org_id`, `tier` | API calls per tenant per tier |
| `agentidp_billing_limit_rejections_total` | Counter | `org_id`, `limit_type` | Tier limit enforcement rejections |
| `agentidp_did_documents_generated_total` | Counter | (none) | DID documents generated |
| `agentidp_oidc_tokens_issued_total` | Counter | (none) | OIDC ID tokens issued |
| `agentidp_federation_events_total` | Counter | `event_type` | Federation partner events |
| `agentidp_delegation_chains_created_total` | Counter | (none) | A2A delegation chains created |
| `agentidp_compliance_reports_generated_total` | Counter | (none) | Compliance reports generated |

### Starting the Monitoring Stack

```bash
# Start the full stack with monitoring
docker compose -f compose.yaml -f compose.monitoring.yaml up -d

# Prometheus: http://localhost:9090
# Grafana: http://localhost:3001 (admin / <password>)
```

The Grafana dashboard auto-provisions on first start. Navigate to **Dashboards → AgentIdP → SentryAgent.ai — AgentIdP**.

### Security Note

`GET /metrics` is unauthenticated. In production, ensure this endpoint is:

- Only accessible from your internal network (firewall rule or reverse proxy restriction)
- Not exposed on a public-facing port

---

### Tier limit rejected: 429 with `tier_limit_exceeded` code

**Symptom:** `429 TOO_MANY_REQUESTS` with body `{"code":"tier_limit_exceeded","message":"..."}`

Check the tenant's current tier counter:

```bash
# Check API call counter
docker compose exec redis redis-cli GET "rate:tier:calls:<org_id>"

# Check the tenant's tier
psql "$DATABASE_URL" -c "SELECT org_id, tier FROM tenant_tiers WHERE org_id = '<org_id>';"
```

**Fix:** If the org is on the `free` tier and has hit 1,000 calls/day, upgrade the tier or wait until midnight UTC for the counter to reset.

---

### Analytics endpoints return 404

**Cause:** `ANALYTICS_ENABLED` is set to `false` in `.env`.

**Fix:** Set `ANALYTICS_ENABLED=true` and restart the application.

---

### Compliance report returns 404

**Cause:** `COMPLIANCE_ENABLED` is set to `false` in `.env`.

**Fix:** Set `COMPLIANCE_ENABLED=true` and restart the application.

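
Both 404 cases above come down to the value of a flag in `.env`, so a quick lookup can rule them in or out before digging further. The following is a minimal sketch; `flag_value` is a hypothetical helper, not part of AgentIdP, and it assumes plain unquoted `NAME=value` lines in the env file.

```bash
# flag_value FLAG [ENV_FILE]: print the value assigned to FLAG in ENV_FILE
# (default: .env), or "(unset)" if the flag is missing or empty.
# Hypothetical helper for illustration; assumes unquoted NAME=value lines.
flag_value() {
  local flag="$1" env_file="${2:-.env}" value
  # If the flag is assigned more than once, use the last assignment.
  value=$(grep -E "^${flag}=" "$env_file" 2>/dev/null | tail -n 1 | cut -d= -f2- || true)
  echo "${value:-(unset)}"
}

# Example, run from the project root:
#   flag_value ANALYTICS_ENABLED
#   flag_value COMPLIANCE_ENABLED
```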

---

### Portal CORS error

**Symptom:** Browser console shows an `Access-Control-Allow-Origin` error on requests to `http://localhost:3000`.

**Fix:** Ensure `CORS_ORIGIN` in `.env` includes `http://localhost:3001`:

```
CORS_ORIGIN=http://localhost:3001
```

Restart the application after changing this variable.
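
To confirm the setting before restarting, the check can be scripted. This is a sketch only: `cors_origin_allows` is a hypothetical helper, and it assumes `CORS_ORIGIN` holds either a single origin or a comma-separated list of unquoted origins, which the text above does not confirm.

```bash
# cors_origin_allows ORIGIN [ENV_FILE]: succeed if CORS_ORIGIN in ENV_FILE
# (default: .env) lists ORIGIN as a whole entry.
# Hypothetical helper; the comma-separated list format is an assumption.
cors_origin_allows() {
  local origin="$1" env_file="${2:-.env}" value
  value=$(grep -E '^CORS_ORIGIN=' "$env_file" 2>/dev/null | tail -n 1 | cut -d= -f2- || true)
  # Strip spaces and wrap both sides in commas so only whole entries match.
  case ",${value// /}," in
    *",${origin},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `cors_origin_allows http://localhost:3001 && echo "portal origin allowed"` prints the message only when the portal origin is present.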