Falsafa

Observability

Prometheus metrics, the MetricsMiddleware, and the /health and /metrics endpoints. What is instrumented and where to scrape.

The backend exposes Prometheus metrics and a simple liveness probe. There is no tracing and no structured logging beyond the Python default logger; metrics are the primary observability surface.

Endpoints

  • GET /health returns 200 with a text/plain body of "ok". It sets no caching headers and checks no dependencies. Traefik and Docker health checks use it.
  • GET /metrics returns all registered metrics in the Prometheus text exposition format. No caching headers are set; the MetricsMiddleware does not add Cache-Control.
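This page does not say which web framework serves these routes, so the following is a framework-neutral sketch written as a plain WSGI callable. It shows the intended behaviour of both endpoints: a dependency-free "ok" for /health and a registry dump for /metrics. The `app` function, the dedicated `registry`, and the `demo_requests_total` counter are hypothetical; only `generate_latest`, `CONTENT_TYPE_LATEST`, `CollectorRegistry` and `Counter` come from prometheus-client.

```python
from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, Counter, generate_latest

# Hypothetical dedicated registry so the sketch does not touch the global default one.
registry = CollectorRegistry()
demo_requests = Counter("demo_requests_total", "Example counter.", registry=registry)


def app(environ, start_response):
    """Minimal WSGI app serving /health and /metrics (illustrative only)."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/health":
        # Liveness only: no dependency checks and no caching headers.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    if method == "GET" and path == "/metrics":
        # Serialize every metric in the registry in the Prometheus text format.
        body = generate_latest(registry)
        start_response("200 OK", [("Content-Type", CONTENT_TYPE_LATEST)])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library serves the app so that `curl localhost:8000/metrics` returns the text format.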

MetricsMiddleware

Every HTTP request is automatically instrumented by a custom middleware built on prometheus-client. The middleware records the request count, request duration, and in-flight count, each labeled by method and endpoint; the request counter is additionally labeled by status code. The metric names are listed in the catalogue below.
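The bookkeeping such a middleware performs can be sketched without committing to a framework. The example below defines the three HTTP metric families with the names and label sets from the catalogue and wraps a plain callable instead of a framework hook. The `instrumented` function and its `handler` argument (a zero-argument callable returning a status code) are hypothetical; the real MetricsMiddleware's structure is not shown on this page.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

registry = CollectorRegistry()

# Names and label sets follow the metric catalogue on this page.
REQUESTS = Counter(
    "http_requests_total", "Number of HTTP requests handled.",
    ["method", "endpoint", "status_code"], registry=registry,
)
LATENCY = Histogram(
    "http_request_duration_seconds", "End-to-end request latency.",
    ["method", "endpoint"], registry=registry,
)
IN_PROGRESS = Gauge(
    "http_requests_in_progress", "In-flight request count.",
    ["method", "endpoint"], registry=registry,
)


def instrumented(method, endpoint, handler):
    """Run a hypothetical handler() and record count, latency and in-flight gauge."""
    in_progress = IN_PROGRESS.labels(method=method, endpoint=endpoint)
    in_progress.inc()
    start = time.perf_counter()
    status = 500  # recorded as a server error if handler() raises
    try:
        status = handler()
        return status
    finally:
        LATENCY.labels(method=method, endpoint=endpoint).observe(time.perf_counter() - start)
        REQUESTS.labels(method=method, endpoint=endpoint, status_code=str(status)).inc()
        in_progress.dec()
```

Two design points are worth checking against the real middleware. Labeling by the route template (for example `/books/{id}`) rather than the raw path keeps the number of series bounded. For streamed responses such as chat, a timer stopped when the handler returns measures time to first byte, not end-to-end latency, unless it is stopped when the body finishes.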

Metric Catalogue

The backend exposes the following metric families. Names follow the standard Prometheus conventions: counters end in _total and durations are measured in seconds.

| Metric | Type | Labels | Meaning |
|---|---|---|---|
| http_requests_total | Counter | method, endpoint, status_code | Number of HTTP requests handled. |
| http_request_duration_seconds | Histogram | method, endpoint | End-to-end request latency. |
| http_requests_in_progress | Gauge | method, endpoint | In-flight request count. |
| processing_jobs_queued_total | Counter | (none) | Total jobs pushed to the Redis queue. |
| processing_jobs_processed_total | Counter | status (completed/failed) | Ingestion jobs by terminal status. |
| processing_characters_extracted_total | Counter | (none) | Profiles parsed from BookNLP output across all jobs. |
| processing_characters_processed_total | Counter | status | Per-character prompt generation outcomes. |
| processing_pipeline_duration_seconds | Histogram | phase | Time per ingestion phase (booknlp, extract, prompt, index). |
| processing_active_workers | Gauge | (none) | Worker threads currently processing a job. |
| processing_queue_depth | Gauge | (none) | Current length of the processing_queue Redis list. |
| indexing_chunks_generated_total | Counter | (none) | Text chunks produced from book files. |
| indexing_chunks_indexed_total | Counter | store (qdrant/typesense) | Chunks successfully upserted per store. |
| indexing_embeddings_generated_total | Counter | (none) | Embedding vectors generated. |
| indexing_duration_seconds | Histogram | pipeline | Duration of a single chunk index operation. |

The exact set of label values depends on runtime configuration. Use /metrics directly to see the live series.
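As an illustration of how the processing families in the catalogue fit together, here is a sketch of a worker recording one job. The `run_job` and `refresh_queue_depth` functions, the `phases` mapping of phase names to callables, and the idea of refreshing the queue gauge from `LLEN` are assumptions for illustration; this page does not show where the backend actually updates these metrics. `llen` is the redis-py method for list length.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

registry = CollectorRegistry()

JOBS_PROCESSED = Counter(
    "processing_jobs_processed_total", "Ingestion jobs by terminal status.",
    ["status"], registry=registry,
)
PHASE_SECONDS = Histogram(
    "processing_pipeline_duration_seconds", "Time per ingestion phase.",
    ["phase"], registry=registry,
)
ACTIVE_WORKERS = Gauge(
    "processing_active_workers", "Worker threads currently processing a job.",
    registry=registry,
)
QUEUE_DEPTH = Gauge(
    "processing_queue_depth", "Current length of the processing_queue Redis list.",
    registry=registry,
)


def run_job(phases):
    """Run hypothetical phase callables in order (e.g. booknlp, extract, prompt, index)."""
    # Gauge goes up on entry and back down on exit, even if a phase raises.
    with ACTIVE_WORKERS.track_inprogress():
        try:
            for name, step in phases.items():
                # Observes the elapsed time for this phase when the block exits.
                with PHASE_SECONDS.labels(phase=name).time():
                    step()
        except Exception:
            JOBS_PROCESSED.labels(status="failed").inc()
            raise
        JOBS_PROCESSED.labels(status="completed").inc()


def refresh_queue_depth(redis_client):
    """Set the gauge from LLEN; redis_client is assumed to be a redis-py client."""
    QUEUE_DEPTH.set(redis_client.llen("processing_queue"))
```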

Dashboards

There is no committed Grafana dashboard in this repository. Operators typically point Prometheus at the /metrics endpoint and build dashboards from the catalogue above. The two most useful series for a quick health check are processing_queue_depth (is the queue draining?) and http_request_duration_seconds (are chat streams within budget?).
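Without a dashboard, the queue check can also be done ad hoc by reading /metrics and parsing it with prometheus-client's text parser. Two readings of processing_queue_depth taken a few minutes apart show whether the queue is draining. The `scrape` and `read_sample` helpers are hypothetical, and the URL you pass to `scrape` depends on your deployment.

```python
import urllib.request

from prometheus_client.parser import text_string_to_metric_families


def scrape(url):
    """Fetch the raw exposition text from a /metrics URL (deployment-specific)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def read_sample(exposition, name):
    """Return the first sample value named `name`, or None if it is absent."""
    for family in text_string_to_metric_families(exposition):
        for sample in family.samples:
            if sample.name == name:
                return sample.value
    return None
```

For example, `read_sample(scrape(url), "processing_queue_depth")` returns the current queue length as a float.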

Logs

Logging uses the Python logging module at the root level. Ingestion and chat flows log to stdout in the standard format. There is no JSON log formatter; if structured logs are needed, configure them at the deployment layer.
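A root-level setup of this kind can be sketched with `logging.basicConfig`. Named loggers propagate to the root handler by default, so modules such as ingestion or chat need no handlers of their own. The INFO level and the `ingestion` logger name are assumptions; this page does not state the backend's actual level or logger names.

```python
import logging
import sys

# Root logger writes plain-text records to stdout using the default format
# ("%(levelname)s:%(name)s:%(message)s"). The INFO level is an assumption.
logging.basicConfig(level=logging.INFO, stream=sys.stdout, force=True)  # force: Python 3.8+

# Hypothetical module logger; its records propagate to the root handler above.
log = logging.getLogger("ingestion")
log.info("job started")
```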
