Module Architecture
The app.py → services/ → utils/ layering. Services orchestrate business logic; utilities are independent, single-purpose primitives.
The backend code is organized in three layers: an entrypoint, a small set of service modules that orchestrate work, and a larger set of utility modules that do one thing each.
Layering
app.py FastAPI entrypoint, lifespan (preload BookNLP, start JobManager)
├── /health, /metrics
├── POST /character/processing → services.character_processing.request_validator.verify_request
└── POST /character/chat → services.character_chat.prepare_chat → stream_chat
services/ Orchestration layer
├── character_chat.py SSE stream, query rewrite, RAG, session lock, persist
├── character_processing/
│ ├── request_validator.py Verify book exists, download, validate extension, enqueue
│ ├── job_manager.py Redis poll loop, capacity gate, asyncio.Lock
│ ├── job_processor.py BookNLP → extract → prompt → Supabase row → index
│ └── index_pipeline.py Chunk + Qdrant + TypeSense in parallel
└── chat_sessions.py Session, character, and book context fetch (Redis-cached), per-session lock
utils/ Single-purpose primitives
├── character_processing/
│ ├── book_nlp.py BookNLP wrapper, model patching, init/run
│ ├── character_extraction.py BookNLP JSON → structured character profile
│ └── system_prompt_builder.py Two-pass LLM persona generator
├── chat/
│ ├── chat_llm.py Unified async dispatcher (provider-agnostic)
│ ├── openai_provider.py
│ ├── anthropic_provider.py
│ └── ollama_provider.py
├── embedding/
│ ├── embedding.py Unified async embed dispatcher
│ ├── openai_provider.py
│ ├── cohere_provider.py
│ └── jina_provider.py
├── retriever/
│ ├── rerank.py Jina reranker
│ └── deduplicate.py Chunk ID dedup, first-occurrence-wins
├── wrappers/
│ ├── redis.py Pool, JSON ser, queue ops
│ ├── qdrant.py Cloud + local, async
│ ├── typesense.py BM25 + vector + conversation search
│ └── prometheus.py Metrics + middleware
├── cleaner/ Vector and keyword text cleaners
├── converters/ Not used at runtime
└── chunking.py Tiktoken-based word-level chunker
config.py Central env loader, hand-rolled KEY=VALUE parser
What Each Layer Does
app.py is the FastAPI entrypoint. It mounts the two business routes, the health and metrics routes, and runs a lifespan handler that preloads the BookNLP models and starts the JobManager polling loop.
services/ contains a small number of modules, each owning one piece of business logic. A service module reads configuration, calls into utils/ to do the work, and writes back the result. Services do not call each other. They are the only place that knows the business flow.
utils/ is the largest layer. Each module does one thing (an LLM provider, an embedding provider, a Redis wrapper, the Qdrant wrapper, the reranker, the chunker, the text cleaners, and so on). Utilities are independent: they import config lazily, do not call each other, and have no shared state. This makes them easy to swap and easy to test in isolation.
config.py is the single hand-rolled KEY=VALUE env parser. Every module reads its own config from here, not from os.environ directly. This keeps the configuration surface in one place and makes it easy to log or override values for tests.
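To make the "hand-rolled KEY=VALUE parser" idea concrete, here is a minimal sketch of what such a parser could look like. The function name parse_env_text, the comment syntax, and the quote handling are assumptions for illustration; the real config.py may differ on all three.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict (hypothetical helper, not the real config.py)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, '#' comments, and lines without a separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key:
            values[key] = value
    return values
```

A module would then read its settings from the parsed mapping, for example parse_env_text(open(".env").read()).get("REDIS_URL"), where REDIS_URL is an illustrative key name rather than one taken from the project.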
Service Responsibilities
services/character_chat.py owns the chat endpoint. It prepares the session context, calls into the query rewriter and the retriever, builds the system prompt, streams the LLM response, persists messages, and releases the per-session lock.
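The retrieval step relies on the deduplication primitive listed in the tree above (chunk ID dedup, first-occurrence-wins). A minimal sketch of that rule follows, assuming each retrieved chunk is a dict with an "id" key; the function name and the chunk shape are hypothetical.

```python
def deduplicate_chunks(chunks: list[dict]) -> list[dict]:
    """Keep the first chunk seen for each ID and preserve the input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for chunk in chunks:
        chunk_id = chunk["id"]  # assumed field name
        if chunk_id in seen:
            continue  # a later duplicate loses to the first occurrence
        seen.add(chunk_id)
        unique.append(chunk)
    return unique
```

First-occurrence-wins matters when results from several retrievers are concatenated in priority order: the copy from the higher-priority source is the one that survives.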
services/character_processing/request_validator.py owns the synchronous part of the ingestion endpoint. It verifies the book exists in Supabase, downloads the file, checks the extension, and pushes a job onto the Redis queue.
services/character_processing/job_manager.py owns the background job loop. It polls Redis, enforces the worker capacity, and dispatches jobs.
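The poll loop, capacity gate, and asyncio.Lock can be sketched roughly as below. This is an illustration under assumptions, not the project's implementation: an in-memory deque stands in for the Redis queue, and the class name, max_workers, and poll_interval are hypothetical.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable


class JobManagerSketch:
    def __init__(
        self,
        queue: deque,
        handler: Callable[[str], Awaitable[None]],
        max_workers: int = 2,
        poll_interval: float = 0.01,
    ) -> None:
        self.queue = queue  # stand-in for the Redis queue
        self.handler = handler
        self.max_workers = max_workers
        self.poll_interval = poll_interval
        self._active = 0
        self._lock = asyncio.Lock()  # guards _active and the dequeue step
        self._tasks: set[asyncio.Task] = set()

    async def _run_job(self, job: str) -> None:
        try:
            await self.handler(job)
        finally:
            # Free the capacity slot even if the handler raised.
            async with self._lock:
                self._active -= 1

    async def poll_once(self) -> bool:
        """Dispatch at most one job; return True if a job was started."""
        async with self._lock:
            if self._active >= self.max_workers or not self.queue:
                return False  # capacity gate closed, or nothing to do
            job = self.queue.popleft()
            self._active += 1
        task = asyncio.create_task(self._run_job(job))
        self._tasks.add(task)  # keep a reference so the task is not collected
        task.add_done_callback(self._tasks.discard)
        return True

    async def run(self, stop: asyncio.Event) -> None:
        """Poll until asked to stop, sleeping when idle or at capacity."""
        while not stop.is_set():
            if not await self.poll_once():
                await asyncio.sleep(self.poll_interval)
```

Checking capacity and dequeuing under the same lock means a job is only removed from the queue once a slot is guaranteed for it.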
services/character_processing/job_processor.py owns the body of a single ingestion job. It runs BookNLP, extracts characters, generates system prompts, and triggers the index pipeline.
services/character_processing/index_pipeline.py owns the chunking and indexing step. It chunks the text, embeds it, and writes to Qdrant and TypeSense in parallel.
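The chunk, embed, then write-in-parallel shape can be illustrated with asyncio.gather. Everything here is a stand-in: whitespace splitting replaces the tiktoken-based chunker, the embedder returns dummy vectors, and plain lists replace the Qdrant and TypeSense clients. All function names are hypothetical.

```python
import asyncio


def chunk_words(text: str, size: int) -> list[str]:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


async def embed(chunks: list[str]) -> list[list[float]]:
    # Stub embedder: one dummy dimension per chunk.
    return [[float(len(chunk))] for chunk in chunks]


async def write_vector_store(chunks: list[str], vectors: list[list[float]], sink: list) -> int:
    sink.extend(zip(chunks, vectors))  # stand-in for a vector store upsert
    return len(chunks)


async def write_keyword_store(chunks: list[str], sink: list) -> int:
    sink.extend(chunks)  # stand-in for a keyword index import
    return len(chunks)


async def index_text(text: str, vector_sink: list, keyword_sink: list, size: int = 3) -> tuple[int, int]:
    chunks = chunk_words(text, size)
    vectors = await embed(chunks)
    # The two writes do not depend on each other, so they run concurrently.
    vector_count, keyword_count = await asyncio.gather(
        write_vector_store(chunks, vectors, vector_sink),
        write_keyword_store(chunks, keyword_sink),
    )
    return vector_count, keyword_count
```

With real network clients, gather lets the slower of the two writes bound the total latency instead of their sum. By default it also propagates the first exception raised by either write.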
services/chat_sessions.py owns session, character, and book context loading with Redis caching, and the per-session write lock.
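A per-session write lock serializes read-modify-write cycles on one session while leaving other sessions free to proceed. The sketch below uses in-process asyncio.Lock objects keyed by session ID and a plain dict as the message store. Both are assumptions for illustration; the real module may hold its lock in Redis instead.

```python
import asyncio
from collections import defaultdict


class SessionLocks:
    """Hands out one asyncio.Lock per session ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_session(self, session_id: str) -> asyncio.Lock:
        return self._locks[session_id]


async def append_message(locks: SessionLocks, store: dict, session_id: str, message: str) -> None:
    # Without the lock, two concurrent calls could both read the same
    # history and the later write would drop the earlier message.
    async with locks.for_session(session_id):
        history = list(store.get(session_id, []))
        await asyncio.sleep(0)  # simulate I/O between the read and the write
        history.append(message)
        store[session_id] = history
```

Note that an in-process lock only protects a single worker process; a deployment with several workers would need a shared lock.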
Tech Stack
The runtime and infrastructure choices that shape the backend: Python 3.11, FastAPI, Redis, Qdrant, TypeSense, BookNLP, multi-provider LLM and embedding support, and the Jina reranker.
Endpoints
The backend exposes exactly two business endpoints, POST /character/processing and POST /character/chat, plus /health and /metrics for operations.