Observability
Prometheus metrics, the MetricsMiddleware, and the /health and /metrics endpoints. What is instrumented and where to scrape.
The backend exposes Prometheus metrics and a simple liveness probe. There is no tracing and no structured logging beyond the Python default logger; metrics are the primary observability surface.
Endpoints
GET /health returns 200 with a text/plain body of "ok" and nothing else. It checks no dependencies and sets no caching headers. Traefik and the Docker health check use it.
GET /metrics returns metrics in the Prometheus text exposition format. No caching headers are set; the MetricsMiddleware does not add Cache-Control.
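If the container image has no curl or wget, the Docker health check can run a small Python probe. The following is a sketch, not a file from this repository. The default URL and port 8000 are assumptions. It treats anything other than a 200 response with body "ok" as unhealthy.

```python
import http.client
import urllib.request

# Assumption: the backend listens on port 8000 inside the container.
DEFAULT_HEALTH_URL = "http://localhost:8000/health"


def probe(url: str = DEFAULT_HEALTH_URL, timeout: float = 2.0) -> int:
    """Return a Docker-style exit code: 0 if /health answers 200 "ok", else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen raises HTTPError (an OSError) for non-2xx statuses,
            # so only the exact status and body are checked here.
            healthy = resp.status == 200 and resp.read().strip() == b"ok"
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, bad response.
        healthy = False
    return 0 if healthy else 1
```

In a Dockerfile this could be wired as HEALTHCHECK CMD python -c "import sys, healthcheck; sys.exit(healthcheck.probe())", where healthcheck is a hypothetical module name. Because /health checks no dependencies, a passing probe only confirms that the process is serving HTTP.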
MetricsMiddleware
Every HTTP request is instrumented automatically by a custom middleware built on prometheus-client. The middleware records request count, request duration, and in-flight count. Each of these is labeled by method and endpoint, and the request counter also carries a status_code label. The metric names and labels are listed in the catalogue below.
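The following is a minimal sketch of how such a middleware can be written as plain ASGI. It is not the backend's code, and the class name is hypothetical. So that it runs without extra packages, it takes the three metric objects as arguments. It assumes they expose prometheus_client's labelled-metric interface, where labels(...) returns a child with inc(), dec(), or observe().

```python
import time
from typing import Any, Awaitable, Callable, Dict

Scope = Dict[str, Any]
Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class MetricsMiddlewareSketch:
    """Pure-ASGI middleware recording request count, latency and in-flight requests.

    Assumption: each metric argument follows prometheus_client's labelled-metric
    interface, i.e. metric.labels(**labels) returns a child with inc()/dec()/observe().
    """

    def __init__(self, app: ASGIApp, requests_total: Any,
                 request_duration: Any, in_progress: Any) -> None:
        self.app = app
        self.requests_total = requests_total
        self.request_duration = request_duration
        self.in_progress = in_progress

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            # Lifespan and websocket events pass through uninstrumented.
            await self.app(scope, receive, send)
            return

        method = scope["method"]
        endpoint = scope["path"]  # assumption: raw path used as the endpoint label
        status_code = 500  # reported if the app raises before starting a response

        async def send_wrapper(message: Message) -> None:
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]  # capture the real status code
            await send(message)

        in_flight = self.in_progress.labels(method=method, endpoint=endpoint)
        in_flight.inc()
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # Record even when the app raises, so failed requests are counted too.
            elapsed = time.perf_counter() - start
            self.request_duration.labels(method=method, endpoint=endpoint).observe(elapsed)
            self.requests_total.labels(
                method=method, endpoint=endpoint, status_code=str(status_code)
            ).inc()
            in_flight.dec()
```

With prometheus_client, the arguments would be metrics such as Counter("http_requests_total", "...", ["method", "endpoint", "status_code"]), Histogram("http_request_duration_seconds", "...", ["method", "endpoint"]) and Gauge("http_requests_in_progress", "...", ["method", "endpoint"]). In this sketch the timer stops only when the wrapped app returns, so a streamed chat response is measured until its last chunk is sent. Labelling by raw path is simple, but it creates one series per distinct URL. A route template keeps cardinality bounded.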
Metric Catalogue
The backend exposes the following metric families. Names follow the standard Prometheus conventions: counters end in _total, and histogram durations are in seconds.
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| http_requests_total | Counter | method, endpoint, status_code | Number of HTTP requests handled. |
| http_request_duration_seconds | Histogram | method, endpoint | End-to-end request latency. |
| http_requests_in_progress | Gauge | method, endpoint | In-flight request count. |
| processing_jobs_queued_total | Counter | (none) | Total jobs pushed to the Redis queue. |
| processing_jobs_processed_total | Counter | status (completed/failed) | Ingestion jobs by terminal status. |
| processing_characters_extracted_total | Counter | (none) | Profiles parsed from BookNLP output across all jobs. |
| processing_characters_processed_total | Counter | status | Per-character prompt generation outcomes. |
| processing_pipeline_duration_seconds | Histogram | phase | Time per ingestion phase (booknlp, extract, prompt, index). |
| processing_active_workers | Gauge | (none) | Worker threads currently processing a job. |
| processing_queue_depth | Gauge | (none) | Current length of the processing_queue Redis list. |
| indexing_chunks_generated_total | Counter | (none) | Text chunks produced from book files. |
| indexing_chunks_indexed_total | Counter | store (qdrant/typesense) | Chunks successfully upserted per store. |
| indexing_embeddings_generated_total | Counter | (none) | Embedding vectors generated. |
| indexing_duration_seconds | Histogram | pipeline | Duration of a single chunk index operation. |
The exact set of label values depends on runtime configuration. Use /metrics directly to see the live series.
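To inspect the live series without a Prometheus server, a short script can fetch /metrics and filter one family. This is a sketch under stated assumptions. The URL and port are placeholders. The parser covers the common subset of the text exposition format (comments, optional labels with escaped values, an optional timestamp) rather than every edge case.

```python
import re
import urllib.request
from typing import Dict, Iterator, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# name, optional {labels}, value, optional timestamp (ignored).
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$")
# label="value", where the value may contain \\, \" and \n escapes.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
_ESCAPE_RE = re.compile(r"\\(.)")


def _unescape(value: str) -> str:
    # \n becomes a newline; \\ and \" become the escaped character itself.
    return _ESCAPE_RE.sub(lambda m: "\n" if m.group(1) == "n" else m.group(1), value)


def parse_samples(text: str) -> Iterator[Sample]:
    """Yield (sample_name, labels, value) from Prometheus text-format output."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # outside the subset this sketch understands
        name, raw_labels, value = match.groups()
        labels = {k: _unescape(v) for k, v in _LABEL_RE.findall(raw_labels or "")}
        yield name, labels, float(value)  # float() accepts NaN, +Inf, -Inf


def series(text: str, family: str) -> List[Sample]:
    """Samples for one family, including histogram _bucket/_sum/_count lines."""
    wanted = {family + suffix for suffix in ("", "_bucket", "_sum", "_count", "_created")}
    return [s for s in parse_samples(text) if s[0] in wanted]


def fetch(url: str = "http://localhost:8000/metrics", timeout: float = 5.0) -> str:
    """Download the exposition text; the URL and port are assumptions."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, series(fetch(), "indexing_chunks_indexed_total") would list one sample for each store label value the process has recorded so far.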
Dashboards
There is no committed Grafana dashboard in this repository. Operators typically point Prometheus at the /metrics endpoint and build dashboards from the catalogue above. The two most useful series for a quick health check are processing_queue_depth (is the queue draining?) and http_request_duration_seconds (are chat streams within budget?).
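Both checks map onto standard PromQL. Latency uses histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))), and the queue trend uses deriv(processing_queue_depth[10m]). The Python below reproduces the same arithmetic offline, for example on samples pulled from /metrics, so the numbers are easier to reason about. It illustrates what those functions compute and is not part of the backend. It assumes positive bucket bounds, which holds for latency buckets.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) buckets, as PromQL does.

    buckets must be sorted by le and end with the +Inf bucket. Values are assumed
    to be spread uniformly inside each bucket, and the lowest bucket is assumed to
    start at 0 (true for latency histograms with positive bounds).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for i, (le, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(le):
                # Rank lands above the highest finite bound: report that bound.
                return buckets[i - 1][0] if i > 0 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return le
            # Linear interpolation within the bucket that holds the target rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / in_bucket
        prev_le, prev_count = le, count
    return math.nan


def queue_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (unix_time, queue_depth) readings, in jobs per second.

    A negative slope means the queue is draining. PromQL's deriv() uses the same
    simple linear regression.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0
```

With buckets [(0.1, 50), (0.5, 90), (1.0, 98), (inf, 100)], the p95 estimate is 0.8125 s. Rank 95 falls in the bucket from 0.5 s to 1.0 s, 5 of that bucket's 8 observations in. Readings of 10, 7 and 4 jobs taken a minute apart give a slope of -0.05 jobs per second, so the queue is draining.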
Logs
Logging uses the Python logging module at the root level. Ingestion and chat flows log to stdout in the standard format. There is no JSON log formatter; if structured logs are needed, configure them at the deployment layer.
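If the deployment wraps the backend's entrypoint, one way to get JSON logs without changing backend code is to replace the root handler at startup. The sketch below is hypothetical. JsonFormatter and install_json_logging are illustrative names, not part of the repository, and the code uses only the standard library.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def install_json_logging(level: int = logging.INFO, stream: TextIO = sys.stdout) -> None:
    """Replace the root logger's handlers with a single JSON handler."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]  # drop handlers installed by basicConfig() or others
    root.setLevel(level)
```

Logging is configured at the root level, so records from the ingestion and chat flows propagate to the root handler, assuming no logger disables propagation. Swapping that one handler therefore changes the output format while leaving call sites and log levels as they are.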
Failure Isolation
What happens when a character, a chunk, an embedding, retrieval, query rewrite, or a lock acquisition fails. The backend is designed so one bad component never aborts an entire book or chat.
Out of Scope
The backend is a focused ML and LLM service. This page lists what it does not do, and where that work lives in the frontend and Supabase.