
Chat Message Flow

End-to-end trace of a single chat message from the browser to the SSE stream, covering caching, locking, query rewrite, hybrid retrieval, reranking, and persistence.

End-to-End Flow: Chat Message

This flow traces a single user message from the chat input, through the frontend and backend, and back to the browser as an SSE stream.

Payloads at Each Hop

Browser to Frontend - chat message:

POST /api/chat
Content-Type: application/json
Cookie: sb-{ref}-auth-token=<token>

{
  "session_id": "{session_id}",
  "message": "What would you do if you were free?"
}

Frontend to Supabase - session & preference fetch:

-- Validate session ownership
SELECT id, user_id, character_id, book_id FROM chat_sessions WHERE id = '{session_id}'

-- Fetch user preferences for this character
SELECT relationship_mode, custom_relationship, speech_modifiers, behavioral_modifiers, preference_summary
FROM user_preferences WHERE user_id = '{user_id}' AND character_id = '{character_id}'

Frontend compiles preferences into a modifier block:

USER PREFERENCE MODIFIERS
- Relationship mode: partner
- Speech modifiers: be_informal
- Behavioral modifiers: be_flirty
- Preference summary: Socrates should challenge my assumptions
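The compilation step can be sketched as a small formatter. This is a hypothetical helper written in Python for illustration only; the real frontend code is not shown on this page. It assumes that speech_modifiers and behavioral_modifiers are lists of strings, that empty fields are skipped, and that custom_relationship gets its own line labelled "Custom relationship". That label is an assumption, because the example above does not show the field.

```python
from typing import Iterable, Union


def _join(value: Union[str, Iterable[str], None]) -> str:
    # Lists become "a, b"; plain strings pass through; None becomes "".
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    return ", ".join(value)


def compile_modifier_block(prefs: dict) -> str:
    """Hypothetical helper that renders a user_preferences row as the modifier block."""
    fields = [
        ("Relationship mode", prefs.get("relationship_mode")),
        ("Custom relationship", prefs.get("custom_relationship")),  # label assumed
        ("Speech modifiers", prefs.get("speech_modifiers")),
        ("Behavioral modifiers", prefs.get("behavioral_modifiers")),
        ("Preference summary", prefs.get("preference_summary")),
    ]
    lines = ["USER PREFERENCE MODIFIERS"]
    for label, value in fields:
        text = _join(value)
        if text:  # skip empty or missing preferences
            lines.append(f"- {label}: {text}")
    return "\n".join(lines)
```

Called with the example row (relationship_mode "partner", one speech modifier, one behavioral modifier, and the summary), the helper reproduces the four-line block shown above.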

Frontend to Backend - proxied chat request:

POST http://backend:8001/character/chat
Content-Type: application/json
Accept: text/event-stream
X-User-Id: {user_id}
X-API-Key: {process.env.BACKEND_API_KEY || ''}

{
  "user_id": "{user_id}",
  "character_id": "{character_id}",
  "book_id": "{book_id}",
  "session_id": "{session_id}",
  "user_message": "What would you do if you were free?"
}

Note: X-API-Key is sent by the frontend but the backend does not validate it. The X-User-Id header is advisory only - the backend reads user_id from the JSON body.

Backend to Redis - cache check + lock:

# Cache check (individual GET calls, no MGET)
GET chat:char:{character_id}
GET chat:sess:{session_id}
GET chat:book:{book_id}

# On any miss, fetch from Supabase and SETEX with 3600s TTL
SELECT id, user_id, character_id, message_count FROM chat_sessions WHERE id = '{session_id}'
SELECT id, name, system_prompt, profile_json FROM characters WHERE id = '{character_id}'
SELECT id, title, author, description FROM books WHERE id = '{book_id}'

# Acquire lock
SET chat:lock:{session_id} {hex_token} NX EX 30
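The cache check above is a cache-aside read. Each key is checked with its own GET, and on a miss the row is fetched from Supabase and written back with SETEX and a 3600s TTL. The minimal Python sketch below shows that pattern under a few assumptions:

- `client` exposes redis-py style `get(name)` and `setex(name, time, value)`.
- The fetcher callables stand in for the Supabase SELECTs and are hypothetical.
- "Not found" results are not cached.

```python
import json
from typing import Any, Callable, Dict, Optional

CACHE_TTL_SECONDS = 3600  # matches the 3600s TTL above


def get_cached(client: Any, key: str, fetch: Callable[[], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: GET the key; on a miss, call fetch() and SETEX the result."""
    raw = client.get(key)
    if raw is not None:
        return json.loads(raw)  # json.loads accepts both str and bytes
    row = fetch()  # hypothetical stand-in for the matching Supabase SELECT
    if row is not None:  # assumption: a missing row is not cached
        client.setex(key, CACHE_TTL_SECONDS, json.dumps(row))
    return row


def load_chat_context(client: Any, character_id: str, session_id: str, book_id: str,
                      fetchers: Dict[str, Callable[[], Optional[dict]]]) -> Dict[str, Optional[dict]]:
    # Three individual GETs (no MGET), one per key family.
    return {
        "character": get_cached(client, f"chat:char:{character_id}", fetchers["character"]),
        "session": get_cached(client, f"chat:sess:{session_id}", fetchers["session"]),
        "book": get_cached(client, f"chat:book:{book_id}", fetchers["book"]),
    }
```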

Backend to Supabase - session validation (if cache miss):

SELECT id, user_id FROM chat_sessions WHERE id = '{session_id}'
-- user_id must match request.user_id
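The ownership rule reduces to a single comparison between the stored session's user_id and the user_id from the JSON body. The Python sketch below is illustrative only. The error types, and the choice to treat a missing session as a separate failure, are assumptions rather than documented backend behavior.

```python
from typing import Optional


class SessionOwnershipError(PermissionError):
    """Hypothetical error for a session that belongs to a different user."""


def validate_session_owner(session_row: Optional[dict], request_user_id: str) -> dict:
    # The session must exist, and its user_id must equal request.user_id.
    if session_row is None:
        raise LookupError("session not found")
    if session_row.get("user_id") != request_user_id:
        raise SessionOwnershipError("session does not belong to this user")
    return session_row
```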

Backend to LLM - query rewrite:

POST https://gateway.truefoundry.ai/chat/completions
Authorization: Bearer <openai_api_key>

{
  "model": "falsafa-temp/chat",
  "temperature": 0,
  "max_tokens": 200,
  "messages": [
    {"role": "system", "content": "Rewrite the user's question as a narrative query and a keyword query. Character psychology: \"Socratic questioning, relentless pursuit of definitions...\""},
    {"role": "user", "content": "What would you do if you were free?\n\nLast 4 messages: [\"Justice is harmony of the soul\", \"Tell me more about the forms\", \"The cave represents ignorance\", \"Education is the art of turning the soul towards the light\"]"}
  ]
}

Response:

{
  "narrative_query": "What actions would a philosopher who values truth and virtue take if they were liberated from the constraints of Athenian society and could pursue wisdom without restriction?",
  "keyword_query": "freedom philosopher truth virtue action escape prison"
}

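Rewrite results are cached under the key chat:qr:{character_id}:{sha256hex[:16]}. The hash input is user_message concatenated with the content of the last 4 messages, and sha256hex[:16] is the first 16 hex characters of the SHA-256 digest.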
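The key derivation can be sketched in a few lines of Python. This is an illustration, not the backend's code. It assumes that the message contents are concatenated with no separator and encoded as UTF-8. A different join or encoding would produce different keys.

```python
import hashlib
from typing import Sequence


def rewrite_cache_key(character_id: str, user_message: str, recent_contents: Sequence[str]) -> str:
    # Only the last 4 message contents contribute to the hash.
    last_4 = list(recent_contents)[-4:]
    # Assumption: plain concatenation, no separator, UTF-8 encoding.
    hash_input = user_message + "".join(last_4)
    digest = hashlib.sha256(hash_input.encode("utf-8")).hexdigest()
    return f"chat:qr:{character_id}:{digest[:16]}"  # first 16 hex characters
```

Truncating to 16 hex characters keeps 64 bits of the digest. That is ample to make collisions negligible within a single character's cache.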

Backend to Qdrant - vector search:

POST https://qdrant:6333/collections/{book_id}/points/search
api-key: <qdrant_api_key>

{
  "vector": [0.002, ..., 0.412],  // embedded narrative_query
  "limit": 10,
  "with_payload": true
}

Response:

{
  "result": [
    {
      "id": 42,
      "score": 0.89,
      "payload": {
        "chunk_index": 42,
        "text": "Socrates: The unexamined life is not worth living. If I were free of these chains, I would spend every hour in the agora, questioning those who claim to know...",
        "book_id": "{book_id}"
      }
    },
    ...
  ]
}

Backend to TypeSense - keyword (lexical) search:

GET https://typesense:8108/collections/{book_id}/documents/search?q=freedom+philosopher+truth+virtue+action+escape+prison&query_by=text&per_page=10
X-TYPESENSE-API-KEY: <typesense_api_key>
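Both retrieval requests can be built from the rewrite output. In this Python sketch, the narrative_query is assumed to have already been embedded (the embedding model is not named on this page). The full keyword_query is assumed to be passed as `q`. The helper names are hypothetical.

```python
from typing import Dict, Sequence
from urllib.parse import urlencode


def qdrant_search_body(vector: Sequence[float], limit: int = 10) -> Dict:
    # Body for POST /collections/{book_id}/points/search
    return {"vector": list(vector), "limit": limit, "with_payload": True}


def typesense_search_path(book_id: str, keyword_query: str, per_page: int = 10) -> str:
    # urlencode uses quote_plus, so spaces in keyword_query become "+".
    params = urlencode({"q": keyword_query, "query_by": "text", "per_page": per_page})
    return f"/collections/{book_id}/documents/search?{params}"
```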

Response:

{
  "hits": [
    {
      "document": {
        "id": "{book_id}:42",
        "chunk_index": 42,
        "text": "Socrates: The unexamined life is not worth living...",
        "book_id": "{book_id}"
      }
    },
    ...
  ]
}
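Before reranking, the two result lists have to be combined. This page does not specify how. The Python sketch below assumes that chunk_index is the shared identity across the two stores: in the examples above, the Qdrant point 42 and the TypeSense document "{book_id}:42" both carry chunk_index 42. Qdrant hits come first, and TypeSense-only hits are appended. The `text_match` field read from each TypeSense hit is part of Typesense's search response, but the trimmed example above does not show it.

```python
from typing import Any, Dict, List


def merge_hits(qdrant_resp: Dict[str, Any], typesense_resp: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Union of vector and keyword hits, de-duplicated on chunk_index (assumption)."""
    merged: Dict[int, Dict[str, Any]] = {}  # dicts keep insertion order
    for point in qdrant_resp.get("result", []):
        p = point["payload"]
        merged[p["chunk_index"]] = {
            "chunk_index": p["chunk_index"],
            "text": p["text"],
            "vector_score": point.get("score"),
        }
    for hit in typesense_resp.get("hits", []):
        d = hit["document"]
        entry = merged.setdefault(d["chunk_index"], {"chunk_index": d["chunk_index"], "text": d["text"]})
        entry["keyword_score"] = hit.get("text_match")  # None if absent
    return list(merged.values())
```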

Backend to Jina Reranker - rerank:

POST https://api.jina.ai/v1/rerank
Authorization: Bearer <reranker_api_key>

{
  "model": "jina-reranker-v3",
  "query": "What would you do if you were free?",
  "documents": [
    "Socrates: The unexamined life is not worth living...",
    "Glaucon: Would you not escape the cave...",
    ...
  ],
  "top_n": 5,
  "return_documents": false
}

Response:

{
  "results": [
    {"index": 0, "relevance_score": 0.97},
    {"index": 1, "relevance_score": 0.82}
  ]
}

The backend uses the index field (0-based position in the input) to look up the original chunk, then attaches the relevance_score to it. If RERANKER_API_KEY is empty, the backend falls back to sorting chunks by their raw similarity scores from Qdrant and TypeSense.
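The index lookup and the fallback can be sketched in Python as follows. This is a sketch, not the backend's code:

- Qdrant cosine scores and TypeSense text-match scores are on different scales, and this page does not say how the fallback reconciles them. The sketch therefore sorts on a single "score" field that the caller is assumed to have filled in.
- Results are re-sorted by relevance_score defensively, in case the response order is not already descending.

```python
from typing import Any, Dict, List, Optional


def build_rerank_payload(user_message: str, chunks: List[Dict[str, Any]], top_n: int = 5) -> Dict[str, Any]:
    # Document order defines the 0-based `index` the reranker returns.
    return {
        "model": "jina-reranker-v3",
        "query": user_message,
        "documents": [c["text"] for c in chunks],
        "top_n": top_n,
        "return_documents": False,
    }


def apply_rerank(chunks: List[Dict[str, Any]], rerank_resp: Optional[Dict[str, Any]],
                 top_n: int = 5) -> List[Dict[str, Any]]:
    if rerank_resp is None:  # e.g. RERANKER_API_KEY is empty
        # Fallback: sort on a caller-provided raw "score" field (assumed).
        return sorted(chunks, key=lambda c: c.get("score", 0.0), reverse=True)[:top_n]
    ranked = []
    for r in rerank_resp["results"]:
        chunk = dict(chunks[r["index"]])  # copy so the input list is untouched
        chunk["relevance_score"] = r["relevance_score"]
        ranked.append(chunk)
    ranked.sort(key=lambda c: c["relevance_score"], reverse=True)
    return ranked[:top_n]
```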

Backend to LLM - stream completion:

POST https://gateway.truefoundry.ai/chat/completions
Authorization: Bearer <openai_api_key>
Accept: text/event-stream

{
  "model": "falsafa-temp/chat",
  "stream": true,
  "messages": [
    {"role": "system", "content": "I am Socrates...\n\nRelevant Passages:\n[1] Socrates: The unexamined life is not worth living...\n[2] Glaucon: Would you not escape the cave...\n\nConversation Summary: The user and Socrates have discussed justice, the forms, and the allegory of the cave."},
    {"role": "user", "content": "What would you do if you were free?"}
  ]
}
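The system message above follows a fixed layout: persona prompt, then a numbered "Relevant Passages" list, then "Conversation Summary". The Python sketch below assembles a request in that layout. The exact separators are copied from the example, and the helper names are hypothetical. Like the example, it sends only the system message and the current user message.

```python
from typing import Dict, List, Sequence


def build_chat_messages(system_prompt: str, passages: Sequence[str], summary: str,
                        user_message: str) -> List[Dict[str, str]]:
    # Number passages from 1, matching "[1] ...", "[2] ..." in the example.
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    system = f"{system_prompt}\n\nRelevant Passages:\n{numbered}\n\nConversation Summary: {summary}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]


def build_stream_request(messages: List[Dict[str, str]]) -> Dict:
    return {"model": "falsafa-temp/chat", "stream": True, "messages": messages}
```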

Backend to Frontend - SSE stream:

data: {"token":"If "}
data: {"token":"I "}
data: {"token":"were "}
data: {"token":"free "}
data: {"token":"from "}
data: {"token":"the "}
data: {"token":"chains "}
data: {"token":"of "}
data: {"token":"this "}
data: {"token":"body"}
data: {"token":","}
data: {"token":" I "}
data: {"token":"would "}
data: {"token":"devote "}
data: {"token":"every "}
data: {"token":"day "}
data: {"token":"to "}
data: {"token":"the "}
data: {"token":"pursuit "}
data: {"token":"of "}
data: {"token":"wisdom"}
data: {"token":"."}
data: {"done":true,"full_response":"If I were free from the chains of this body, I would devote every day to the pursuit of wisdom."}
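The relay from the LLM stream to this SSE stream can be sketched as a generator. The sketch rests on these assumptions:

- The gateway emits OpenAI-compatible lines of the form `data: {...}`, carrying the next token in choices[0].delta.content and ending with `data: [DONE]`.
- Each outgoing event ends with a blank line, as the SSE format requires.
- Compact JSON separators are used so the frames match the ones shown above.

```python
import json
from typing import Iterable, Iterator


def _frame(obj: dict) -> str:
    # Compact JSON plus the blank line that terminates an SSE event.
    return f"data: {json.dumps(obj, separators=(',', ':'))}\n\n"


def relay_stream(upstream_lines: Iterable[str]) -> Iterator[str]:
    """Translate OpenAI-style stream lines into {"token": ...} / {"done": ...} events."""
    parts = []
    for line in upstream_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        token = (choices[0].get("delta") or {}).get("content") if choices else None
        if token:  # role-only and empty deltas carry no text
            parts.append(token)
            yield _frame({"token": token})
    yield _frame({"done": True, "full_response": "".join(parts)})
```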

Backend to Supabase - persist after stream:

INSERT INTO messages(session_id, role, content)
VALUES ('{session_id}', 'user', 'What would you do if you were free?');

INSERT INTO messages(session_id, role, content)
VALUES ('{session_id}', 'assistant', 'If I were free from the chains of this body, I would devote every day to the pursuit of wisdom.');

UPDATE chat_sessions
SET message_count = message_count + 1,
    preview = 'If I were free from the chains of this body...',
    updated_at = NOW()
WHERE id = '{session_id}';
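The three statements can be sent as parameterized queries rather than interpolated strings. In this runnable sketch, Python's built-in sqlite3 stands in for Postgres/Supabase, and CURRENT_TIMESTAMP replaces NOW(). Wrapping the writes in one transaction is an assumption rather than documented behavior. The preview text is passed in because this page does not say how it is truncated.

```python
import sqlite3


def persist_exchange(conn: sqlite3.Connection, session_id: str, user_msg: str,
                     reply: str, preview: str) -> None:
    """Insert both messages and bump the session counters atomically (assumed)."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO messages(session_id, role, content) VALUES (?, 'user', ?)",
            (session_id, user_msg),
        )
        conn.execute(
            "INSERT INTO messages(session_id, role, content) VALUES (?, 'assistant', ?)",
            (session_id, reply),
        )
        conn.execute(
            "UPDATE chat_sessions SET message_count = message_count + 1, "
            "preview = ?, updated_at = CURRENT_TIMESTAMP WHERE id = ?",
            (preview, session_id),
        )
```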

Backend to Redis - session cache update:

SETEX chat:sess:{session_id} 3600 "<updated session JSON with last 10 messages + summary>"

# Release lock (Lua compare-and-delete: the script deletes the key only if it still holds this request's token)
EVAL "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end" 1 chat:lock:{session_id} {hex_token}
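The lock lifecycle and the session cache write can be sketched in Python against a redis-py style client, using `set(name, value, nx=True, ex=...)`, `eval(script, numkeys, *keys_and_args)` and `setex(name, time, value)`. The token format (32 hex characters) and the JSON field names "messages" and "summary" are assumptions. Comparing the token before deleting means that if the 30s lock expires and another request acquires it, this request cannot delete the other request's lock.

```python
import json
import secrets
from typing import Any, List, Optional

LOCK_TTL_SECONDS = 30
SESSION_TTL_SECONDS = 3600
RELEASE_SCRIPT = (
    "if redis.call('get', KEYS[1]) == ARGV[1] then "
    "return redis.call('del', KEYS[1]) else return 0 end"
)


def acquire_lock(client: Any, session_id: str) -> Optional[str]:
    # SET chat:lock:{session_id} <token> NX EX 30; the token proves ownership later.
    token = secrets.token_hex(16)
    if client.set(f"chat:lock:{session_id}", token, nx=True, ex=LOCK_TTL_SECONDS):
        return token
    return None  # another request holds the lock


def release_lock(client: Any, session_id: str, token: str) -> bool:
    # The Lua script runs atomically, so GET and DEL cannot interleave with other clients.
    return client.eval(RELEASE_SCRIPT, 1, f"chat:lock:{session_id}", token) == 1


def update_session_cache(client: Any, session_id: str, session: dict,
                         messages: List[dict], summary: str) -> None:
    # Keep only the last 10 messages plus the running summary (field names assumed).
    payload = dict(session, messages=messages[-10:], summary=summary)
    client.setex(f"chat:sess:{session_id}", SESSION_TTL_SECONDS, json.dumps(payload))
```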
