Error Propagation
Component failure matrix and error event flows: how each service failure degrades the user experience, plus the chat-specific error codes.
Cross-Component Error Propagation
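Every component in the system can fail independently. The system is designed so that most single-component failures disable specific features rather than the whole application. The exceptions are the frontend and the Supabase database: if either fails, the app is effectively unavailable.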
Component Failure Matrix
| Failed component | System behavior | User experience |
|---|---|---|
| Frontend | The frontend serves no responses; Traefik returns 502/503 instead. | Cannot access the app at all. |
| Backend | Chat and ingestion unavailable. Frontend serves all other pages (browse, library, settings, payments). | Cannot send chat messages or upload books. Everything else works. |
| Supabase Auth | Login, registration, and session verification fail. Existing SSR sessions can still be read from cookies until they expire. | New users cannot sign up and signed-out users cannot log in. Signed-in users continue until their cookies expire. |
| Supabase DB | Every page and API route that reads or writes fails. No books, no characters, no messages. | The app is effectively down. Static content still renders. |
| Supabase Storage | Book uploads fail (cover and file cannot be written). Existing book covers and files are still served from the CDN cache. | Cannot upload new books. Existing books still display covers and download, as long as they are in the CDN cache. |
| Redis | Chat and ingestion fail. Every session-cache lookup misses and falls through to Supabase, which may still be up. Without the per-session lock, concurrent writes could interleave. | Chat likely fails or produces garbled responses. Book uploads enqueue but cannot be dequeued. |
| Qdrant | Vector search returns empty results. Chat continues with BM25-only context from TypeSense. | Chat responses may be less semantically relevant but still work. |
| TypeSense | BM25 search returns empty results. Chat continues with vector-only context from Qdrant. | Same as a Qdrant failure: reduced relevance, still functional. |
| Both Qdrant + TypeSense | Hybrid retrieval returns no results, so an empty context string is substituted and chat continues without grounding. | Responses are generated entirely from the LLM's training data and are more likely to hallucinate. Chat still works. |
| Jina Reranker | The backend falls back to sorting the combined Qdrant/TypeSense results by their raw scores. | No visible error; result ordering may be somewhat less relevant. |
| LLM provider | The backend emits a 502 error event over SSE; messages are not persisted. | User sees an "AI service unavailable" error. Book ingestion still completes, but with empty system prompts. |
| Embedding provider | Chunk indexing fails during ingestion. Each failed chunk is logged and skipped, and the book's processing_status is set to failed. | User sees "Processing failed" on the book page. Chunks indexed before the failure remain available. |
| Stripe | Payment intents cannot be created or confirmed. Existing webhooks still fire. | Users cannot purchase books. Already-purchased books in library remain accessible. |
| BookNLP | Ingestion job fails at the character extraction step. | User sees "Processing failed" on the book page. |
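The Qdrant, TypeSense, and Jina rows describe one pattern: each retrieval step degrades to "no results" or "no reranking" instead of failing the request. A minimal sketch, assuming each backend is injected as a plain callable rather than a real Qdrant, TypeSense, or Jina client; `Hit`, `safe_search`, `retrieve_context`, and `top_k` are hypothetical names.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Sequence

log = logging.getLogger("retrieval")


@dataclass(frozen=True)
class Hit:
    text: str
    score: float  # raw score from the engine that produced the hit


Search = Callable[[str], Sequence[Hit]]
Rerank = Callable[[str, Sequence[Hit]], Sequence[Hit]]


def safe_search(name: str, search: Search, query: str) -> List[Hit]:
    """Run one retriever; any failure degrades to 'no results'."""
    try:
        return list(search(query))
    except Exception:
        log.warning("%s search failed; continuing without it", name, exc_info=True)
        return []


def retrieve_context(query: str, vector: Search, bm25: Search, rerank: Rerank, top_k: int = 5) -> str:
    # Hybrid retrieval: a failed engine simply contributes nothing.
    hits = safe_search("qdrant", vector, query) + safe_search("typesense", bm25, query)
    if not hits:
        return ""  # both retrievers failed: chat continues without grounding
    try:
        ranked = list(rerank(query, hits))
    except Exception:
        # Reranker down: fall back to sorting the combined hits by raw score.
        log.warning("reranker failed; sorting by raw score", exc_info=True)
        ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    return "\n\n".join(h.text for h in ranked[:top_k])
```

BM25 scores and vector similarities are on different scales, so the raw-score fallback gives only an approximate ordering of the merged list; this is one reason the reranker fallback may slightly reduce relevance.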
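The embedding-provider row combines two behaviors: individual chunks are skipped so earlier work stays searchable, yet the book as a whole is marked failed. A sketch under stated assumptions: `Book`, `index_book`, and the injected `embed`/`upsert` callables are hypothetical, and `"completed"` as the success status is an assumption (only `failed` appears above).

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

log = logging.getLogger("ingestion")


@dataclass
class Book:
    id: str
    processing_status: str = "processing"
    indexed_chunks: List[int] = field(default_factory=list)


def index_book(
    book: Book,
    chunks: Sequence[str],
    embed: Callable[[str], List[float]],
    upsert: Callable[[str, int, List[float]], None],
) -> Book:
    failed = 0
    for i, chunk in enumerate(chunks):
        try:
            vector = embed(chunk)  # call to the embedding provider
        except Exception:
            failed += 1
            log.error("embedding failed for book %s chunk %d; skipping", book.id, i, exc_info=True)
            continue  # skip this chunk, keep indexing the rest
        upsert(book.id, i, vector)  # indexed chunks remain available for retrieval
        book.indexed_chunks.append(i)
    # Any skipped chunk marks the whole book as failed ("completed" is assumed).
    book.processing_status = "failed" if failed else "completed"
    return book
```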
Error Event Flows: Chat-Specific
LLM Fails Mid-Stream:
Backend → Frontend: SSE {"error":"LLM call failed","code":502}
Backend → Frontend: SSE {"done":true}
(No messages persisted to Supabase)
Frontend → User: "AI service error. Please try again."
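The mid-stream failure flow reduces to one rule: persist only after the LLM stream finishes cleanly. A minimal sketch, assuming the LLM stream is exposed as an iterable of text tokens and persistence as a callback; `sse`, `stream_reply`, and the `{"token": ...}` event for normal output are hypothetical, not the backend's actual API.

```python
import json
from typing import Callable, Iterable, Iterator, List


def sse(payload: dict) -> str:
    """Format one Server-Sent Events frame with compact JSON."""
    return f"data: {json.dumps(payload, separators=(',', ':'))}\n\n"


def stream_reply(tokens: Iterable[str], persist: Callable[[str], None]) -> Iterator[str]:
    parts: List[str] = []
    try:
        for token in tokens:  # the LLM stream may raise at any point
            parts.append(token)
            yield sse({"token": token})
    except Exception:
        # Mid-stream failure: report it, close the stream, persist nothing.
        yield sse({"error": "LLM call failed", "code": 502})
        yield sse({"done": True})
        return
    persist("".join(parts))  # reached only when the stream completed
    yield sse({"done": True})
```

Because the error arrives after the SSE response has started, it cannot be an HTTP status; it travels as a data event followed by `done`.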
Lock Contention:
Backend → Frontend: SSE {"error":"Session busy","code":429}
Backend → Frontend: SSE {"done":true}
Frontend → User: "This conversation is busy with another request. Try again."
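The lock contention flow can be sketched with a per-session Redis lock taken via `SET key token NX EX ttl`. `InMemoryRedis` is a hypothetical stand-in for the subset of redis-py used (`set`, `get`, `delete`) so the example runs without a server; `chat_turn`, the `lock:chat:{session_id}` key format, and the 60-second TTL are assumptions.

```python
import json
import time
import uuid
from typing import Callable, Dict, Iterator, Optional, Tuple


class InMemoryRedis:
    """Hypothetical stand-in for the redis-py calls used below."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, name: str, value: str, ex: Optional[int] = None, nx: bool = False) -> Optional[bool]:
        now = time.monotonic()
        current = self._data.get(name)
        if nx and current is not None and current[1] > now:
            return None  # like redis-py: None when NX blocks the write
        self._data[name] = (value, now + ex if ex is not None else float("inf"))
        return True

    def get(self, name: str) -> Optional[str]:
        current = self._data.get(name)
        return current[0] if current is not None and current[1] > time.monotonic() else None

    def delete(self, name: str) -> int:
        return 1 if self._data.pop(name, None) is not None else 0


def sse(payload: dict) -> str:
    return f"data: {json.dumps(payload, separators=(',', ':'))}\n\n"


def chat_turn(r: InMemoryRedis, session_id: str, run_turn: Callable[[], Iterator[str]]) -> Iterator[str]:
    key = f"lock:chat:{session_id}"  # hypothetical key format
    token = str(uuid.uuid4())  # identifies this request as the lock owner
    if not r.set(key, token, ex=60, nx=True):  # SET key token NX EX 60
        yield sse({"error": "Session busy", "code": 429})
        yield sse({"done": True})
        return
    try:
        yield from run_turn()
    finally:
        # Release only a lock we still own. Against real Redis this
        # check-and-delete should be atomic (e.g. a Lua script), and get()
        # returns bytes unless the client uses decode_responses=True.
        if r.get(key) == token:
            r.delete(key)
```

The TTL bounds how long a crashed request can hold the session; the owner token keeps a request from releasing a lock that has since expired and been taken by another request.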
Session Not Found:
Backend → Frontend: HTTP 404 (not SSE)
Frontend → User: Redirect to /chat with a "Session not found" message
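On the client side, the three flows above reduce to checking the HTTP status first, then reading SSE `data:` lines until an error or `done` event. A language-agnostic sketch written in Python (the real frontend may use another language); `interpret_chat_response` and `Outcome` are hypothetical, and the body is assumed to arrive already split into lines.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional

# User-facing messages from the chat error flows, keyed by SSE error code.
SSE_ERROR_MESSAGES = {
    502: "AI service error. Please try again.",
    429: "This conversation is busy with another request. Try again.",
}


@dataclass(frozen=True)
class Outcome:
    message: Optional[str] = None      # text to show the user, if any
    redirect_to: Optional[str] = None  # route to navigate to, if any


def interpret_chat_response(status: int, body_lines: Iterable[str]) -> Outcome:
    if status == 404:
        # Plain HTTP error, presumably raised before any SSE stream opens.
        return Outcome("Session not found", "/chat")
    for line in body_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and other SSE fields
        event = json.loads(line[len("data: "):])
        if "error" in event:
            # Unknown codes fall back to the backend's own error string.
            return Outcome(SSE_ERROR_MESSAGES.get(event.get("code"), event["error"]))
        if event.get("done"):
            break
    return Outcome()
```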
Cache Miss (non-fatal):
Backend → Redis: GET (miss)
Backend → Supabase: fetch data
Backend → Redis: SETEX with TTL
(User sees no difference)
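The cache-miss flow is a cache-aside read. The sketch below also treats a Redis error as a miss, matching the Redis row of the failure matrix. `InMemoryCache` mimics redis-py's `get`/`setex` without simulating expiry; `get_session`, the `session:{id}` key, and the 300-second TTL are hypothetical.

```python
import json
import logging
from typing import Callable, Dict, Optional

log = logging.getLogger("session-cache")


class InMemoryCache:
    """Hypothetical stand-in for redis-py get/setex; expiry is not simulated."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.ttls: Dict[str, int] = {}

    def get(self, name: str) -> Optional[str]:
        return self.store.get(name)

    def setex(self, name: str, time: int, value: str) -> bool:
        self.store[name] = value
        self.ttls[name] = time
        return True


def get_session(cache, session_id: str, fetch_from_db: Callable[[str], dict], ttl: int = 300) -> dict:
    key = f"session:{session_id}"  # hypothetical key format
    try:
        cached = cache.get(key)
    except Exception:
        cached = None  # Redis down: treat as a miss and fall through to Supabase
    if cached is not None:
        return json.loads(cached)  # cache hit
    data = fetch_from_db(session_id)  # cache miss: read from Supabase
    try:
        cache.setex(key, ttl, json.dumps(data))  # SETEX key ttl value
    except Exception:
        log.warning("could not populate cache for %s", key, exc_info=True)
    return data
```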