High-Level Design

High-level design of the Falsafa backend: tech stack, module layout, endpoints, ingestion and chat flows, data ownership, concurrency, failure isolation, and observability.

The high-level design documents how the backend is built, how its modules are organized, and how the two business endpoints behave end to end. It is the reference for anyone working on or debugging the backend.

Sections

Tech Stack

Python 3.11, FastAPI, Redis, Qdrant, TypeSense, BookNLP, OpenAI/Anthropic/Ollama, Jina reranker.

Module Architecture

The app.py → services/ → utils/ layering and what each module is responsible for.

Endpoints

The two business endpoints (POST /character/processing and POST /character/chat) plus /health and /metrics: what each accepts and what it returns.

Ingestion Flow

End-to-end walkthrough of POST /character/processing: validate, enqueue, BookNLP, character analysis, indexing.

Chat Flow

End-to-end walkthrough of POST /character/chat: lock, query rewrite, hybrid retrieval, streaming, persist.

Data Ownership

Where every piece of backend-owned data lives: Supabase, Qdrant, TypeSense, Redis, and the local filesystem.

Concurrency Model

BookNLP singleton, JobManager lock, per-session chat locks, connection pools, and per-request async clients.
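As a rough illustration of the per-session chat lock idea, the sketch below serializes requests that share a session while letting other sessions proceed. It is a minimal in-process stand-in using asyncio.Lock; the names _session_locks, handle_chat, and demo are hypothetical, and the actual backend may implement its locks differently (for example, outside the process).

```python
import asyncio
from collections import defaultdict

# Hypothetical registry: one asyncio.Lock per chat session id.
# Assumption: an in-process lock stands in for whatever the backend really uses.
_session_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def handle_chat(session_id: str, message: str, log: list[str]) -> None:
    """Process one chat message while holding that session's lock."""
    async with _session_locks[session_id]:
        log.append(f"start {session_id}:{message}")
        await asyncio.sleep(0.01)  # placeholder for rewrite, retrieval, streaming
        log.append(f"end {session_id}:{message}")


async def demo() -> list[str]:
    """Send two messages to session s1 and one to s2 concurrently."""
    log: list[str] = []
    await asyncio.gather(
        handle_chat("s1", "a", log),
        handle_chat("s1", "b", log),  # waits until s1:a releases the lock
        handle_chat("s2", "c", log),  # different session, not blocked by s1
    )
    return log


if __name__ == "__main__":
    for entry in asyncio.run(demo()):
        print(entry)
```

In this sketch the second s1 message only starts after the first one ends, while the s2 message starts without waiting for either.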

Failure Isolation

What happens when a character, a chunk, an embedding, retrieval, query rewrite, or a lock acquisition fails.
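One common way to keep a single failing item from aborting a whole batch is to collect exceptions per item instead of letting the first one propagate. The sketch below shows that pattern for per-character analysis; analyze_character and analyze_all are hypothetical names, and the simulated failure is an assumption for illustration, not the backend's actual behavior.

```python
import asyncio


async def analyze_character(name: str) -> str:
    """Hypothetical per-character step; fails for one input to simulate an error."""
    if name == "broken":
        raise ValueError("analysis failed")
    await asyncio.sleep(0)  # placeholder for real work
    return f"profile:{name}"


async def analyze_all(names: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Run every analysis and split the outcomes into successes and failures."""
    results = await asyncio.gather(
        *(analyze_character(n) for n in names),
        return_exceptions=True,  # exceptions are returned as results, not raised
    )
    ok: dict[str, str] = {}
    failed: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            failed[name] = repr(result)
        else:
            ok[name] = result
    return ok, failed


if __name__ == "__main__":
    ok, failed = asyncio.run(analyze_all(["alice", "broken", "bob"]))
    print(ok)
    print(failed)
```

Here the failing character is recorded separately while the other two still produce results.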

Observability

Prometheus metrics, the MetricsMiddleware, and the /health and /metrics endpoints.

Out of Scope

What the backend explicitly does not handle and where that work lives instead.