High-Level Design
Detailed design of the Falsafa backend: tech stack, module layout, endpoints, ingestion and chat flows, data ownership, concurrency, failure isolation, and observability.
This high-level design documents how the backend is built, how its modules are organized, and how the two business endpoints behave end to end. It is the reference for anyone working on or debugging the backend.
Sections
Tech Stack
Python 3.11, FastAPI, Redis, Qdrant, TypeSense, BookNLP, OpenAI/Anthropic/Ollama, Jina reranker.
Module Architecture
The app.py → services/ → utils/ layering and what each module is responsible for.
Endpoints
The two business endpoints plus /health and /metrics: what each one accepts and returns.
Ingestion Flow
End-to-end walkthrough of POST /character/processing: validate, enqueue, BookNLP, character analysis, indexing.
Chat Flow
End-to-end walkthrough of POST /character/chat: lock, query rewrite, hybrid retrieval, streaming, persist.
Data Ownership
Where every piece of backend-owned data lives: Supabase, Qdrant, TypeSense, Redis, and the local filesystem.
Concurrency Model
BookNLP singleton, JobManager lock, per-session chat locks, connection pools, and per-request async clients.
Failure Isolation
What happens when a character, a chunk, an embedding, retrieval, query rewrite, or a lock acquisition fails.
Observability
Prometheus metrics, the MetricsMiddleware, and the /health and /metrics endpoints.
Out of Scope
What the backend explicitly does not handle and where that work lives instead.
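The Chat Flow, Concurrency Model, and Failure Isolation sections listed above all involve per-session chat locks and what happens when one cannot be acquired. The following is a minimal illustrative sketch, not the backend's code. Turns within one session run one at a time, different sessions run concurrently, and a turn gives up after a timeout instead of waiting forever. It assumes an in-process `asyncio.Lock` per session. The backend may instead keep these locks in Redis. `SessionLocks`, `run_exclusive`, and the 5-second default timeout are hypothetical.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable
from typing import Optional, TypeVar

T = TypeVar("T")


class SessionLocks:
    """Hypothetical per-session lock registry (illustrative, not the backend's code).

    Assumption: locks live in this process only. With several workers, a shared
    lock (for example one stored in Redis) would be needed instead.
    """

    def __init__(self, timeout_s: float = 5.0) -> None:
        # One asyncio.Lock per session id, created lazily on first use.
        # Note: entries are never evicted in this sketch.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._timeout_s = timeout_s

    async def run_exclusive(
        self, session_id: str, turn: Callable[[], Awaitable[T]]
    ) -> Optional[T]:
        """Run one chat turn while holding its session's lock.

        Returns None if the lock is not acquired within the timeout, so the
        caller can answer "session busy" rather than queue indefinitely.
        """
        lock = self._locks[session_id]
        try:
            await asyncio.wait_for(lock.acquire(), timeout=self._timeout_s)
        except asyncio.TimeoutError:
            return None  # lock acquisition failed; the turn never ran
        try:
            return await turn()
        finally:
            lock.release()  # always release, even if the turn raised
```

A caller would map a `None` result to a "session busy" response. Serializing per session, rather than globally, keeps one user's long reply from blocking everyone else's.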
Backend Overview
The Falsafa backend is a focused Python ML/LLM service. It extracts characters from books and streams in-character chat replies. It does not own auth, payments, or user-facing data.
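The chat side of that work streams a reply and then persists it, in that order. The following is a minimal sketch of that pattern under stated assumptions. `stream_reply` is a hypothetical helper. Its `tokens` argument stands in for an LLM provider's streamed output, and `persist` stands in for whatever stores the finished turn. Persisting only after the stream completes is an assumption of this sketch.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable


async def stream_reply(
    tokens: AsyncIterator[str],
    persist: Callable[[str], Awaitable[None]],
) -> AsyncIterator[str]:
    """Yield reply chunks as they arrive, then persist the assembled reply.

    Hypothetical helper: `tokens` stands in for an LLM provider's streamed
    output and `persist` for whatever stores the finished turn.
    """
    parts: list[str] = []
    async for token in tokens:
        parts.append(token)
        yield token  # forwarded to the client immediately
    # Reached only if the stream ran to completion: if the client disconnects
    # (the generator is closed) or the provider raises, nothing is persisted.
    await persist("".join(parts))


async def _demo() -> None:
    async def fake_tokens() -> AsyncIterator[str]:
        for chunk in ["To be ", "is to be ", "perceived."]:
            yield chunk

    async def print_persist(text: str) -> None:
        print("persisted:", text)

    async for chunk in stream_reply(fake_tokens(), print_persist):
        print("chunk:", chunk)


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a FastAPI handler, an async generator like this can be passed directly to `StreamingResponse`.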
Tech Stack
The runtime and infrastructure choices that shape the backend: Python 3.11, FastAPI, Redis, Qdrant, TypeSense, BookNLP, multi-provider LLM and embedding, and the Jina reranker.
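Three of these components meet in the chat flow's hybrid retrieval step. The following sketch shows one plausible way they could combine, under assumptions. Vector hits (as Qdrant would return them) and keyword hits (as TypeSense would return them) are unioned by chunk id. A reranker score, standing in for the Jina reranker, then orders the union. `Hit`, `hybrid_candidates`, and the word-overlap scorer are hypothetical, and the sketch makes no real Qdrant, TypeSense, or Jina client calls.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical retrieval hit: a chunk id plus its text."""
    chunk_id: str
    text: str


def hybrid_candidates(
    query: str,
    vector_hits: Sequence[Hit],  # assumed to come from Qdrant (dense search)
    keyword_hits: Sequence[Hit],  # assumed to come from TypeSense (keyword search)
    rerank_score: Callable[[str, str], float],  # stand-in for a reranker like Jina's
    top_k: int = 5,
) -> list[Hit]:
    # Union both result lists, keeping the first occurrence of each chunk id
    # so a chunk found by both searches is scored only once.
    merged: dict[str, Hit] = {}
    for hit in [*vector_hits, *keyword_hits]:
        merged.setdefault(hit.chunk_id, hit)
    # Score every candidate against the query; highest score first.
    ranked = sorted(
        merged.values(), key=lambda h: rerank_score(query, h.text), reverse=True
    )
    return ranked[:top_k]


def word_overlap(query: str, text: str) -> float:
    """Toy scorer for this example only: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


if __name__ == "__main__":
    vector = [Hit("a", "Virtue is knowledge"), Hit("b", "The allegory of the cave")]
    keyword = [Hit("b", "The allegory of the cave"), Hit("c", "Knowledge of the good")]
    for hit in hybrid_candidates("is virtue knowledge", vector, keyword, word_overlap):
        print(hit.chunk_id, hit.text)
```

Deduplicating before reranking avoids paying the reranker twice for a chunk that both searches return.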