
Ingestion Flow

This page walks through POST /character/processing end to end: synchronous validation, an async job queue, BookNLP extraction, two-pass prompt generation, and parallel indexing into Qdrant and TypeSense.

The ingestion endpoint turns a book file into indexed chunks and one system prompt per character. The synchronous part is small; the heavy work happens in a background worker.

Synchronous Phase

request_validator.verify_request runs in the request handler. It:

  1. Looks up the book in Supabase by book_id. If it does not exist, returns 422.
  2. Downloads the file from the signed URL using httpx with a 120-second timeout.
  3. Checks the extension. Anything other than .txt or .md is rejected with 422.
  4. Pushes a job dict to the Redis list processing_queue via RPUSH.
  5. Returns 202 with the job id.

The request handler keeps the downloaded file in memory only. The queued job does not depend on that copy: if the handler crashes after enqueueing, the worker downloads the file again from the signed URL.
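
The extension check and enqueue steps above can be sketched as follows. This is a minimal illustration, not the project's code: the job dict fields (job_id, book_id, file_url), the ValidationError type, and reading the extension from the signed URL's path are all assumptions. The only client method used is rpush, which redis-py provides; any object exposing it will work.

```python
import json
import uuid
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".txt", ".md"}
QUEUE_NAME = "processing_queue"


class ValidationError(Exception):
    """Hypothetical error type; the handler would map it to HTTP 422."""


def check_extension(signed_url: str) -> str:
    """Return the lowercase extension, or raise if it is not .txt or .md."""
    # Signed URLs carry query parameters, so inspect only the path part.
    suffix = PurePosixPath(urlparse(signed_url).path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValidationError(f"unsupported extension: {suffix or '(none)'}")
    return suffix


def enqueue_job(redis_client, book_id: str, signed_url: str) -> str:
    """Validate the extension, push a job dict, and return the new job id."""
    check_extension(signed_url)
    job_id = str(uuid.uuid4())
    # Field names are assumptions made for this sketch.
    job = {"job_id": job_id, "book_id": book_id, "file_url": signed_url}
    # RPUSH appends to the tail; the worker's LPOP reads from the head (FIFO).
    redis_client.rpush(QUEUE_NAME, json.dumps(job))
    return job_id
```

Because the job carries the signed URL rather than the file bytes, the worker can fetch the file independently of the request handler.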

Async Phase

The JobManager is a long-running coroutine started in the FastAPI lifespan. It:

  1. Polls Redis with LPOP every 10 seconds.
  2. Acquires an asyncio.Lock to atomically check capacity (at most MAX_PROCESSING_WORKERS concurrent jobs) and reserve a slot.
  3. Dispatches the job to process_job via asyncio.to_thread because BookNLP is synchronous.
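
A rough sketch of this loop is shown below, under several assumptions. An in-memory deque stands in for the Redis list (popleft plays the role of LPOP), MAX_PROCESSING_WORKERS is set to an arbitrary 2, and the method names are hypothetical. The sketch reserves a slot before popping so that a job is never taken off the queue without a free worker; the real JobManager may order these steps differently.

```python
import asyncio
from collections import deque

MAX_PROCESSING_WORKERS = 2   # assumed value for the sketch
POLL_INTERVAL_SECONDS = 10


class JobManager:
    def __init__(self, queue, process_job, poll_interval=POLL_INTERVAL_SECONDS):
        self._queue = queue              # needs popleft(); stands in for LPOP
        self._process_job = process_job  # blocking function, run in a thread
        self._poll_interval = poll_interval
        self._lock = asyncio.Lock()
        self._active = 0
        self._tasks = set()

    async def _try_reserve_slot(self) -> bool:
        # Check and increment under one lock so two polls cannot both
        # claim the last free slot.
        async with self._lock:
            if self._active >= MAX_PROCESSING_WORKERS:
                return False
            self._active += 1
            return True

    async def _release_slot(self) -> None:
        async with self._lock:
            self._active -= 1

    async def _run(self, job) -> None:
        try:
            # The job function is synchronous, so keep it off the event loop.
            await asyncio.to_thread(self._process_job, job)
        finally:
            await self._release_slot()

    async def poll_once(self) -> bool:
        """Dispatch at most one job; return True if one was started."""
        if not await self._try_reserve_slot():
            return False
        try:
            job = self._queue.popleft()
        except IndexError:
            await self._release_slot()   # queue was empty
            return False
        task = asyncio.create_task(self._run(job))
        self._tasks.add(task)            # keep a reference until it finishes
        task.add_done_callback(self._tasks.discard)
        return True

    async def run_forever(self) -> None:
        while True:
            await self.poll_once()
            await asyncio.sleep(self._poll_interval)
```

In a FastAPI application, run_forever would be started as a task inside the lifespan context and cancelled on shutdown.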

Per-Job Steps

process_job runs in a worker thread and operates on temp files in /tmp/{book_id}/. The steps are:

  1. BookNLP extraction. Run the entity, quote, coref, supersense, and event pipelines on the book file. The output is a set of JSON files in the temp directory.
  2. Character extraction. character_extraction.py parses the BookNLP output into a list of structured character profiles (name, aliases, descriptors, quote counts, top relations).
  3. System prompt generation. For each character, system_prompt_builder.py makes a two-pass LLM call:
    • Pass 1 at temperature=0.3 produces a structured CharacterAnalysis JSON (psychology, voice, motivations, relationships, signature themes).
    • Pass 2 at temperature=0.7 transforms that JSON into a free-text markdown persona prompt.
  4. Disk write. The final prompt is written to SYSTEM_PROMPT_STORAGE_PATH/{book_id}/{safe_name}.md.
  5. Supabase insert. A row is inserted into the characters table with the prompt, the BookNLP profile, and references back to the book.
  6. Index pipeline. The book's full text is cleaned, chunked by token count, and indexed. See below.
  7. Status update. The book row's processing_status is set to completed, or to failed if an exception was raised and caught.
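
The two-pass call in step 3 might look roughly like the sketch below. The llm callable and its (prompt, temperature) signature are hypothetical stand-ins for whatever client system_prompt_builder.py actually uses, and the prompt wording and field names are invented for illustration. Only the two temperatures and the JSON-then-markdown structure come from the description above.

```python
import json
from typing import Callable

# Hypothetical signature: llm(prompt, temperature) -> str.
LLM = Callable[[str, float], str]

# Assumed key names for the CharacterAnalysis JSON.
ANALYSIS_FIELDS = (
    "psychology",
    "voice",
    "motivations",
    "relationships",
    "signature_themes",
)


def build_system_prompt(profile: dict, llm: LLM) -> str:
    """Run the analysis pass, validate it, then run the persona pass."""
    # Pass 1: low temperature, structured JSON output.
    analysis_raw = llm(
        "Return a JSON object with keys "
        + ", ".join(ANALYSIS_FIELDS)
        + " for this character profile:\n"
        + json.dumps(profile),
        0.3,
    )
    analysis = json.loads(analysis_raw)
    missing = [field for field in ANALYSIS_FIELDS if field not in analysis]
    if missing:
        raise ValueError(f"analysis is missing fields: {missing}")

    # Pass 2: higher temperature, free-text markdown built from the JSON.
    return llm(
        "Write a markdown persona prompt from this analysis:\n"
        + json.dumps(analysis),
        0.7,
    )
```

Validating the first pass before starting the second keeps a malformed analysis from turning into a plausible-looking but ungrounded persona prompt.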

Index Pipeline

index_pipeline.py runs after the per-character steps:

  1. Chunk the cleaned book text using tiktoken with o200k_base. The chunk size is configurable; each chunk gets a stable chunk_index based on its position in the book.
  2. Create a Qdrant collection named after the book_id (or recreate it if rerunning).
  3. Create a TypeSense collection named after the book_id (or recreate it if rerunning).
  4. Embed each chunk through the configured embedding provider. Embedding calls retry up to three times on BadRequestError with delays of 30, 120, and 200 seconds; other errors propagate.
  5. Upsert into Qdrant and TypeSense in parallel.
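
As an illustration of steps 1 and 4, the sketch below splits a token sequence into fixed-size chunks with positional indices and wraps an embedding call in the 30/120/200-second retry schedule. It makes several assumptions: tokens are already encoded (in the real pipeline they would come from tiktoken), BadRequestError is a local stand-in for the provider SDK's exception, embed is a hypothetical callable, and "retry up to three times" is read as one initial attempt plus three retries.

```python
import time
from typing import Callable, Sequence

RETRY_DELAYS_SECONDS = (30, 120, 200)


class BadRequestError(Exception):
    """Stand-in for the embedding provider's BadRequestError (assumption)."""


def chunk_tokens(tokens: Sequence[int], chunk_size: int) -> list:
    """Split tokens into (chunk_index, tokens) pairs in book order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (index, list(tokens[start:start + chunk_size]))
        for index, start in enumerate(range(0, len(tokens), chunk_size))
    ]


def embed_with_retry(
    embed: Callable[[str], list],
    text: str,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call embed(text), retrying only on BadRequestError."""
    for delay in RETRY_DELAYS_SECONDS:
        try:
            return embed(text)
        except BadRequestError:
            # Any other exception type propagates immediately.
            sleep(delay)
    # Final attempt after the last delay; a failure here propagates.
    return embed(text)
```

Since chunk_index depends only on position and chunk size, rerunning the pipeline with the same settings yields the same indices, which is what makes recreating the collections safe.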

Failure Handling

  • Per-character failures are caught individually. A failed system prompt or Supabase insert for one character is logged, and processing continues with the book's remaining characters.
  • Per-chunk failures are counted. If any chunk fails to index, the book's processing_status is set to failed, but successfully indexed chunks remain in Qdrant and TypeSense.
  • Temp files. On success the entire /tmp/{book_id}/ directory is cleaned up. On failure the BookNLP output and character profiles are removed regardless (in a finally block), while the original input file and chunks directory are preserved for debugging.
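
The temp-file policy can be sketched with a try/finally block. The subdirectory names below (booknlp_output, character_profiles) and the run_with_cleanup helper are hypothetical; the source does not give the actual layout of the job directory.

```python
import shutil
from pathlib import Path

# Assumed names for the artifacts that are always removed.
ALWAYS_REMOVED = ("booknlp_output", "character_profiles")


def run_with_cleanup(job_dir: Path, work) -> bool:
    """Run work(job_dir) and apply the cleanup policy; return success."""
    succeeded = False
    try:
        work(job_dir)
        succeeded = True
    except Exception:
        # The real worker would log here and mark the book as failed.
        pass
    finally:
        # Intermediate artifacts are removed whether or not the job succeeded.
        for name in ALWAYS_REMOVED:
            shutil.rmtree(job_dir / name, ignore_errors=True)
        if succeeded:
            # On success nothing is worth keeping, so drop the whole directory.
            shutil.rmtree(job_dir, ignore_errors=True)
    return succeeded
```

On failure, everything else in the directory, including the input file and the chunks, is left in place for debugging.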

The full failure matrix is in Failure Isolation.
