Error Propagation

A component failure matrix and chat-specific error event flows: how each service failure degrades the user experience, and which error codes the chat path emits.

Cross-Component Error Propagation

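Every component in the system can fail independently. The system is designed so that most single-component failures degrade specific features rather than taking down the entire application. The exceptions, visible in the matrix below, are the frontend and the Supabase database, whose failure makes the app effectively unavailable.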

Component Failure Matrix

Failed component	System behavior	User experience
Frontend	No HTTP responses. Traefik returns 502/503.	Cannot access the app at all.
Backend	Chat and ingestion unavailable. Frontend serves all other pages (browse, library, settings, payments).	Cannot send chat messages or upload books. Everything else works.
Supabase Auth	Login, registration, and session verification fail. Existing SSR sessions can still be read from cookies until they expire.	New users cannot sign up. Existing users continue until cookie expiry.
Supabase DB	Every page and API route that reads or writes fails. No books, no characters, no messages.	The app is effectively down. Static content still renders.
Supabase Storage	Book uploads fail (cover + file cannot be written). Existing book covers and files are served from CDN cache.	Cannot upload new books. Existing books display covers and download fine from cache.
Redis	Chat and ingestion fail. Session cache lookups miss on every request and fall through to Supabase, which may still be up. Without the per-session lock, concurrent writes could interleave.	Chat likely fails or produces garbled responses. Book uploads enqueue but cannot be dequeued.
Qdrant	Vector search returns empty results. Chat continues with BM25-only context from TypeSense.	Chat responses may be less semantically relevant but still work.
TypeSense	BM25 search returns empty results. Chat continues with vector-only context from Qdrant.	Same as a Qdrant failure: reduced relevance, still functional.
Both Qdrant + TypeSense	Hybrid retrieval returns empty. Empty context string substituted. Chat continues without grounding.	Chat responses are generated entirely from the LLM's training data. May hallucinate. Still functional.
Jina Reranker	Falls back to sorting by raw score from Qdrant/TypeSense within the backend.	No visible difference to the user.
LLM provider	502 SSE error emitted. Messages not persisted.	User sees "AI service unavailable" error. Book ingestion completes but with empty system prompts.
Embedding provider	Ingestion chunk indexing fails. Chunk is logged and skipped. Book processing_status = failed.	User sees "Processing failed" on the book page. Already-indexed chunks remain available.
Stripe	Payment intents cannot be created or confirmed. Existing webhooks still fire.	Users cannot purchase books. Already-purchased books in library remain accessible.
BookNLP	Ingestion job fails at the character extraction step.	User sees "Processing failed" on the book page.
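
The Qdrant, TypeSense, and Jina Reranker rows describe a degrade-gracefully pattern: a failed retriever contributes no hits, a failed reranker falls back to raw scores, and an empty context string is substituted when nothing is retrieved. A minimal sketch of that pattern follows. The names `safe_search` and `build_context`, the `(text, score)` hit shape, and the `top_k` parameter are all assumptions for illustration, not the backend's actual API.

```python
from typing import Callable, List, Optional, Tuple

Hit = Tuple[str, float]  # (chunk text, raw relevance score); assumed shape
SearchFn = Callable[[str], List[Hit]]
RerankFn = Callable[[str, List[Hit]], List[Hit]]


def safe_search(search: SearchFn, query: str) -> List[Hit]:
    """Run one retriever and treat any failure as an empty result set."""
    try:
        return search(query)
    except Exception:
        return []


def build_context(
    query: str,
    vector_search: SearchFn,
    bm25_search: SearchFn,
    rerank: Optional[RerankFn] = None,
    top_k: int = 3,
) -> str:
    """Merge both retrievers; fall back to raw scores if reranking fails."""
    hits = safe_search(vector_search, query) + safe_search(bm25_search, query)
    if not hits:
        return ""  # both retrievers down: chat continues without grounding

    ranked: Optional[List[Hit]] = None
    if rerank is not None:
        try:
            ranked = rerank(query, hits)
        except Exception:
            ranked = None  # reranker unavailable
    if ranked is None:
        # Fallback: sort by the raw retriever score, highest first.
        ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)

    return "\n".join(text for text, _ in ranked[:top_k])
```

One caveat with this sketch: raw scores from a vector index and a BM25 index are not on the same scale, so a real fallback would probably normalize them before sorting.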

Error Event Flow: Chat-Specific

LLM Fails Mid-Stream:
  Backend → Frontend: SSE {"error":"LLM call failed","code":502}
  Backend → Frontend: SSE {"done":true}
  (No messages persisted to Supabase)
  Frontend → User: "AI service error. Please try again."
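
The mid-stream failure flow can be sketched as a generator that emits the error frame, then the done frame, and skips persistence when the LLM call raises. This is an illustration under assumptions: `stream_chat`, `persist`, and the `"token"` field name are hypothetical, and only the `error` and `done` payloads are taken from the flow above.

```python
import json
from typing import Callable, Iterable, Iterator, List


def sse(payload: dict) -> str:
    """Format one payload as a Server-Sent Events data frame."""
    body = json.dumps(payload, separators=(",", ":"))
    return f"data: {body}\n\n"


def stream_chat(
    llm_tokens: Iterable[str],
    persist: Callable[[str], None],
) -> Iterator[str]:
    """Stream tokens; on failure emit the error frame, then the done frame."""
    parts: List[str] = []
    try:
        for token in llm_tokens:
            parts.append(token)
            yield sse({"token": token})  # "token" field name is an assumption
    except Exception:
        # LLM failed mid-stream: report it and persist nothing.
        yield sse({"error": "LLM call failed", "code": 502})
    else:
        # Only a fully successful stream is written to storage.
        persist("".join(parts))
    yield sse({"done": True})
```

Sending `done` after the error lets the frontend close the stream through the same code path it uses for a normal completion.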

Lock Contention:
  Backend → Frontend: SSE {"error":"Session busy","code":429}
  Backend → Frontend: SSE {"done":true}
  Frontend → User: "This conversation is busy with another request. Try again."
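
One common way to implement a per-session lock is Redis `SET` with the `NX` and `EX` options. The sketch below assumes that approach; the source does not state how the backend acquires its lock. `session_lock`, `SessionBusy`, the key format, and the TTL are hypothetical, and `client` is assumed to behave like a redis-py client created with `decode_responses=True`.

```python
import uuid
from contextlib import contextmanager
from typing import Iterator


class SessionBusy(Exception):
    """Raised when another request already holds the session lock."""

    code = 429


@contextmanager
def session_lock(client, session_id: str, ttl_seconds: int = 30) -> Iterator[None]:
    """Hold a per-session lock for the duration of the with-block."""
    key = f"lock:session:{session_id}"  # hypothetical key naming
    token = uuid.uuid4().hex
    # SET key token NX EX ttl: succeeds only if nobody else holds the lock.
    if not client.set(key, token, nx=True, ex=ttl_seconds):
        raise SessionBusy("Session busy")
    try:
        yield
    finally:
        # Release only our own lock. A production version would make this
        # check-and-delete atomic, for example with a Lua script.
        if client.get(key) == token:
            client.delete(key)
```

A handler would catch `SessionBusy` and emit the `{"error":"Session busy","code":429}` frame followed by `{"done":true}`. The TTL guards against a crashed request holding the lock forever.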

Session Not Found:
  Backend → Frontend: HTTP 404 (not SSE)
  Frontend → User: Redirect to /chat with "Session not found"

Cache Miss (non-fatal):
  Backend → Supabase: fetch data
  Backend → Redis: SETEX with TTL
  (User sees no difference)
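
The cache-miss flow is a standard cache-aside read: try Redis, fall through to Supabase on a miss, then repopulate with `SETEX`. A minimal sketch, assuming a redis-py style client with `get` and `setex`; `get_session`, `fetch_from_db`, the key format, and the TTL are hypothetical names and values.

```python
import json
from typing import Any, Callable


def get_session(
    client,
    session_id: str,
    fetch_from_db: Callable[[str], Any],
    ttl_seconds: int = 300,
) -> Any:
    """Return session data from cache, falling through to the database."""
    key = f"session:{session_id}"  # hypothetical key naming
    try:
        cached = client.get(key)
    except Exception:
        cached = None  # cache unreachable: treat it as a miss
    if cached is not None:
        return json.loads(cached)

    value = fetch_from_db(session_id)  # e.g. a Supabase query
    try:
        # SETEX key ttl value: store the entry with an expiry.
        client.setex(key, ttl_seconds, json.dumps(value))
    except Exception:
        pass  # failing to repopulate the cache is non-fatal
    return value
```

Swallowing cache errors on both the read and the write is what keeps a miss, or a Redis outage on this path, invisible to the user.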
