Falsafa

Error Propagation

Component failure matrix and error event flows: how each service failure degrades the user experience, and which chat-specific error codes are emitted.

Cross-Component Error Propagation

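Every component in the system can fail independently. The system is designed so that a single component failure disables only the features that depend on that component rather than taking down the entire application. The matrix lists two exceptions: a frontend failure makes the app unreachable, and a Supabase DB failure leaves it effectively down.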

Component Failure Matrix

Failed component | System behavior | User experience
--- | --- | ---
Frontend | No HTTP responses. Traefik returns 502/503. | Cannot access the app at all.
Backend | Chat and ingestion unavailable. Frontend serves all other pages (browse, library, settings, payments). | Cannot send chat messages or upload books. Everything else works.
Supabase Auth | Login, registration, and session verification fail. Existing SSR sessions can still be read from cookies until they expire. | New users cannot sign up. Existing users continue until cookie expiry.
Supabase DB | Every page and API route that reads or writes fails. No books, no characters, no messages. | The app is effectively down. Static content still renders.
Supabase Storage | Book uploads fail (cover + file cannot be written). Existing book covers and files are served from CDN cache. | Cannot upload new books. Existing books display covers and download fine from cache.
Redis | Chat and ingestion fail. Session caches miss on every request (falls through to Supabase which may still be up). No per-session lock means concurrent writes could interleave. | Chat likely fails or produces garbled responses. Book uploads enqueue but cannot be dequeued.
Qdrant | Vector search returns empty results. Chat continues with BM25-only context from TypeSense. | Chat responses may be less semantically relevant but still work.
TypeSense | BM25 search returns empty results. Chat continues with vector-only context from Qdrant. | Same as Qdrant failure - reduced relevance, still functional.
Both Qdrant + TypeSense | Hybrid retrieval returns empty. Empty context string substituted. Chat continues without grounding. | Chat responses are generated entirely from the LLM's training data. May hallucinate. Still functional.
Jina Reranker | Falls back to sorting by raw score from Qdrant/TypeSense within the backend. | No visible difference to the user.
LLM provider | 502 SSE error emitted. Messages not persisted. | User sees "AI service unavailable" error. Book ingestion completes but with empty system prompts.
Embedding provider | Ingestion chunk indexing fails. Chunk is logged and skipped. Book processing_status = failed. | User sees "Processing failed" on the book page. Already-indexed chunks remain available.
Stripe | Payment intents cannot be created or confirmed. Existing webhooks still fire. | Users cannot purchase books. Already-purchased books in library remain accessible.
BookNLP | Ingestion job fails at the character extraction step. | User sees "Processing failed" on the book page.
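
The retrieval rows of the matrix (Qdrant, TypeSense, both at once, and the Jina reranker) can be read as one fallback chain. The following is a minimal sketch of that chain, not the project's actual code: `build_context`, `safe_search`, the `Hit` tuple, and `top_k` are hypothetical names, and the search and rerank callables stand in for the real clients.

```python
from typing import Callable, List, Optional, Tuple

Hit = Tuple[str, float]  # (chunk text, raw score from the search backend)
SearchFn = Callable[[str], List[Hit]]
RerankFn = Callable[[str, List[Hit]], List[Hit]]


def safe_search(search: SearchFn, query: str) -> List[Hit]:
    """Treat a failing search backend as 'no results' instead of an error."""
    try:
        return search(query)
    except Exception:
        return []


def build_context(
    query: str,
    vector_search: SearchFn,            # stand-in for the Qdrant call
    bm25_search: SearchFn,              # stand-in for the TypeSense call
    rerank: Optional[RerankFn] = None,  # stand-in for the Jina reranker
    top_k: int = 3,                     # assumed cutoff, not from the source
) -> str:
    # One backend down: the other still contributes hits.
    hits = safe_search(vector_search, query) + safe_search(bm25_search, query)
    if not hits:
        # Both backends down: substitute an empty context string and
        # let the chat continue without grounding.
        return ""

    ranked: Optional[List[Hit]] = None
    if rerank is not None:
        try:
            ranked = rerank(query, hits)
        except Exception:
            ranked = None
    if ranked is None:
        # Reranker unavailable: fall back to sorting by raw score.
        ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)

    return "\n".join(text for text, _ in ranked[:top_k])
```

Catching the failure at the level of each search call is what keeps a single retrieval outage invisible to the chat pipeline: the caller always receives a string, possibly empty.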

Error Event Flow: Chat-Specific

LLM Fails Mid-Stream:
  Backend → Frontend: SSE {"error":"LLM call failed","code":502}
  Backend → Frontend: SSE {"done":true}
  (No messages persisted to Supabase)
  Frontend → User: "AI service error. Please try again."
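
The mid-stream failure flow above could be produced by a generator along these lines. This is an illustrative sketch only: `stream_chat`, `sse`, the `persist` callback, and the `{"token": ...}` frame shape are assumptions, while the error and done payloads are the ones listed in the flow.

```python
import json
from typing import Callable, Iterable, Iterator, List


def sse(payload: dict) -> str:
    """Format one Server-Sent Events data frame with compact JSON."""
    return f"data: {json.dumps(payload, separators=(',', ':'))}\n\n"


def stream_chat(
    llm_tokens: Iterable[str],
    persist: Callable[[str], None],  # hypothetical "save to Supabase" hook
) -> Iterator[str]:
    parts: List[str] = []
    try:
        for token in llm_tokens:
            parts.append(token)
            yield sse({"token": token})  # assumed frame shape
    except Exception:
        # LLM failed mid-stream: report the error, close the stream,
        # and return before anything is persisted.
        yield sse({"error": "LLM call failed", "code": 502})
        yield sse({"done": True})
        return
    # Persist only after the full reply has streamed successfully.
    persist("".join(parts))
    yield sse({"done": True})
```

Deferring persistence until the stream completes is what makes the "no messages persisted" guarantee hold: a partial reply never reaches storage.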

Lock Contention:
  Backend → Frontend: SSE {"error":"Session busy","code":429}
  Backend → Frontend: SSE {"done":true}
  Frontend → User: "This conversation is busy with another request. Try again."
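
A per-session lock that yields the "Session busy" flow might look like the sketch below. It uses an in-memory stand-in so the example is self-contained; with Redis the same idea is commonly expressed as `SET <key> <token> NX EX <ttl>`, but whether this system does so is an assumption. `SessionLocks`, `handle_chat`, and the `{"token": ...}` frame are hypothetical names.

```python
import json
import threading
from typing import Iterator, Set


class SessionLocks:
    """In-memory stand-in for a per-session lock (assumed to live in Redis)."""

    def __init__(self) -> None:
        self._held: Set[str] = set()
        self._guard = threading.Lock()

    def try_acquire(self, session_id: str) -> bool:
        with self._guard:
            if session_id in self._held:
                return False
            self._held.add(session_id)
            return True

    def release(self, session_id: str) -> None:
        with self._guard:
            self._held.discard(session_id)


def sse(payload: dict) -> str:
    """Format one Server-Sent Events data frame with compact JSON."""
    return f"data: {json.dumps(payload, separators=(',', ':'))}\n\n"


def handle_chat(locks: SessionLocks, session_id: str, reply: str) -> Iterator[str]:
    if not locks.try_acquire(session_id):
        # Another request holds the session: reject instead of interleaving.
        yield sse({"error": "Session busy", "code": 429})
        yield sse({"done": True})
        return
    try:
        yield sse({"token": reply})  # assumed frame shape
        yield sse({"done": True})
    finally:
        # Release even if the client disconnects mid-stream.
        locks.release(session_id)
```

Failing fast with a 429 event, rather than waiting for the lock, keeps a second request from blocking behind a long-running stream.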

Session Not Found:
  Backend → Frontend: HTTP 404 (not SSE)
  Frontend → User: Redirect to /chat with "Session not found"
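
Because "Session Not Found" arrives as a plain HTTP status while the other two errors arrive as SSE events, the client has two separate decision points. The sketch below shows that dispatch logic in a language-neutral way; `UiAction`, `action_for_http_status`, and `action_for_sse_event` are hypothetical names, and falling back to the raw error text for unknown codes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UiAction:
    message: str
    redirect_to: Optional[str] = None


# User-facing messages for the SSE error codes listed in the flows above.
SSE_ERROR_MESSAGES = {
    502: "AI service error. Please try again.",
    429: "This conversation is busy with another request. Try again.",
}


def action_for_http_status(status: int) -> Optional[UiAction]:
    """Checked once, before the response body is read as an event stream."""
    if status == 404:
        return UiAction("Session not found", redirect_to="/chat")
    return None


def action_for_sse_event(event: dict) -> Optional[UiAction]:
    """Checked for every event once the stream is open."""
    if "error" not in event:
        return None
    # Assumption: unknown codes surface the backend's own error text.
    return UiAction(SSE_ERROR_MESSAGES.get(event.get("code"), event["error"]))
```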

Cache Miss (non-fatal):
  Backend → Supabase: fetch data
  Backend → Redis: SETEX with TTL
  (User sees no difference)
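
The cache-miss flow is the usual cache-aside pattern. A minimal sketch follows, assuming a client whose `get` and `setex(name, time, value)` methods mirror those of redis-py. `FakeRedis`, `load_session`, the `session:` key prefix, and the 300-second TTL are illustrative assumptions.

```python
from typing import Callable, Dict, Optional, Protocol


class Cache(Protocol):
    def get(self, name: str) -> Optional[str]: ...
    def setex(self, name: str, time: int, value: str) -> object: ...


class FakeRedis:
    """In-memory stand-in so the example runs without a Redis server."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        return self.store.get(name)

    def setex(self, name: str, time: int, value: str) -> object:
        self.store[name] = value  # TTL is ignored by this stand-in
        return True


def load_session(
    cache: Cache,
    fetch_from_supabase: Callable[[str], str],  # hypothetical DB read
    session_id: str,
    ttl_seconds: int = 300,  # assumed TTL, not from the source
) -> str:
    key = f"session:{session_id}"  # assumed key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    # Cache miss: read from Supabase, then repopulate the cache with a TTL.
    value = fetch_from_supabase(session_id)
    cache.setex(key, ttl_seconds, value)
    return value
```

The caller receives the same value on a hit or a miss, which is why the user sees no difference; only latency changes.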
