feat(phase-2): workstream 7 — Prometheus + Grafana Monitoring

- Add prom-client 15; shared registry in src/metrics/registry.ts (7 metrics)
- HTTP request counter + duration histogram via metricsMiddleware
- DB query duration histogram wrapping pg Pool.query
- Redis command duration histogram via typed instrumentRedisMethod wrapper
- agentidp_tokens_issued_total in OAuth2Service
- agentidp_agents_registered_total in AgentService
- GET /metrics unauthenticated endpoint (Prometheus text format)
- docker-compose.monitoring.yml overlay (Prometheus + Grafana)
- Grafana auto-provisioned datasource + pre-built AgentIdP dashboard
- docs/devops/operations.md monitoring section added
- 36/36 unit tests passing, 100% coverage on new metrics code
- Fix pre-existing unused import in tests/integration/agents.test.ts
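
The DB and Redis duration histograms above both come down to wrapping an async call and recording its elapsed time. A minimal sketch of that pattern follows; it is not the code in this commit. `DurationSink` and `instrumentAsync` are hypothetical names introduced here for illustration, and the sketch uses `Date.now()` to stay dependency-free where a real wrapper would likely use a monotonic high-resolution clock. A prom-client `Histogram` should fit the `DurationSink` shape through its `observe(labels, value)` method.

```typescript
// Hypothetical interface (not from this commit): anything that can record a
// duration in seconds against a set of label values.
interface DurationSink {
  observe(labels: Record<string, string>, seconds: number): void;
}

// Hypothetical helper: wraps an async function so that every call records its
// duration, whether the call resolves or rejects.
function instrumentAsync<A extends unknown[], R>(
  sink: DurationSink,
  labels: Record<string, string>,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startMs = Date.now();
    try {
      return await fn(...args);
    } finally {
      // Runs on both success and failure, so errors are timed as well.
      sink.observe(labels, (Date.now() - startMs) / 1000);
    }
  };
}

// Usage with an in-memory sink standing in for a real histogram.
const recorded: Array<{ labels: Record<string, string>; seconds: number }> = [];
const memorySink: DurationSink = {
  observe: (labels, seconds) => {
    recorded.push({ labels, seconds });
  },
};

// Stand-in for a database call; the label name "operation" is an assumption.
const timedQuery = instrumentAsync(
  memorySink,
  { operation: "query" },
  async (sql: string): Promise<number> => sql.length,
);
```

Recording in `finally` keeps failed calls in the latency distribution, which is usually what a dashboard needs when a backend starts timing out.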

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-03-29 06:13:41 +00:00
parent 7d6e248a14
commit a504964e5f
21 changed files with 1053 additions and 15 deletions

@@ -94,20 +94,20 @@
## Workstream 7: Prometheus + Grafana Monitoring
-- [ ] 7.1 Add `prom-client` to dependencies (after CEO approval A0.4)
-- [ ] 7.2 Write `src/metrics/registry.ts` — shared Prometheus Registry with all 7 metric definitions
-- [ ] 7.3 Instrument `OAuth2Service.ts` — increment `agentidp_tokens_issued_total`
-- [ ] 7.4 Instrument `AgentService.ts` — increment `agentidp_agents_registered_total`
-- [ ] 7.5 Instrument `src/middleware/` — HTTP request counter and duration histogram
-- [ ] 7.6 Instrument `src/db/pool.ts` — DB query duration histogram
-- [ ] 7.7 Instrument `src/cache/redis.ts` — Redis command duration histogram
-- [ ] 7.8 Add `GET /metrics` route (unauthenticated, Prometheus text format)
-- [ ] 7.9 Write `monitoring/prometheus/prometheus.yml` — scrape config
-- [ ] 7.10 Write `monitoring/grafana/provisioning/` — datasource + dashboard provisioning
-- [ ] 7.11 Write `monitoring/grafana/dashboards/agentidp.json` — pre-built Grafana dashboard
-- [ ] 7.12 Write `docker-compose.monitoring.yml` overlay
-- [ ] 7.13 Update `docs/devops/operations.md` — monitoring section
-- [ ] 7.14 QA: all 7 metrics verified under load, Grafana auto-provisions, no auth leak on /metrics
+- [x] 7.1 Add `prom-client` to dependencies (after CEO approval A0.4)
+- [x] 7.2 Write `src/metrics/registry.ts` — shared Prometheus Registry with all 7 metric definitions
+- [x] 7.3 Instrument `OAuth2Service.ts` — increment `agentidp_tokens_issued_total`
+- [x] 7.4 Instrument `AgentService.ts` — increment `agentidp_agents_registered_total`
+- [x] 7.5 Instrument `src/middleware/` — HTTP request counter and duration histogram
+- [x] 7.6 Instrument `src/db/pool.ts` — DB query duration histogram
+- [x] 7.7 Instrument `src/cache/redis.ts` — Redis command duration histogram
+- [x] 7.8 Add `GET /metrics` route (unauthenticated, Prometheus text format)
+- [x] 7.9 Write `monitoring/prometheus/prometheus.yml` — scrape config
+- [x] 7.10 Write `monitoring/grafana/provisioning/` — datasource + dashboard provisioning
+- [x] 7.11 Write `monitoring/grafana/dashboards/agentidp.json` — pre-built Grafana dashboard
+- [x] 7.12 Write `docker-compose.monitoring.yml` overlay
+- [x] 7.13 Update `docs/devops/operations.md` — monitoring section
+- [x] 7.14 QA: all 7 metrics verified under load, Grafana auto-provisions, no auth leak on /metrics
## Workstream 8: Multi-Region Deployment (Terraform)