Endpoints
The backend is intentionally small at the HTTP surface. It exposes exactly two business endpoints, POST /character/processing and POST /character/chat, plus two operational endpoints, GET /health and GET /metrics. Anything more complex is composed from these.
Summary
| Method | Path | Purpose | Response |
|---|---|---|---|
| GET | /health | Liveness probe | 200 text/plain "ok" |
| GET | /metrics | Prometheus scrape | Prometheus text format |
| POST | /character/processing | Submit a book for ingestion | 202 with job accepted, or 400/422/500 |
| POST | /character/chat | Stream a chat reply | SSE stream of token, done, or error events |
POST /character/processing
Submit a book for ingestion. The request is validated synchronously and a job is enqueued; the response returns 202 as soon as the job is queued, without waiting for processing to finish.
Request body
{
  "book_id": "uuid",
  "file_url": "https://<supabase>/storage/...",
  "user_id": "uuid"
}
Behavior
- Confirm the book exists in Supabase.
- Download the file via httpx with a 120-second timeout.
- Reject anything that is not .txt or .md (returns 422).
- Push a job dict to the Redis list processing_queue.
- Return 202 with {"status": "accepted", "job_id": "..."}.
Error responses
- 400 (malformed body)
- 422 (invalid file extension or missing book)
- 500 (internal failure, e.g., Supabase unreachable)
The actual BookNLP extraction, prompt generation, and indexing happen asynchronously in the JobManager. The frontend observes completion through the Supabase books.processing_status field.
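The synchronous part of this endpoint (extension check, job construction, enqueue) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the helper names `has_allowed_extension`, `build_job`, and `enqueue`, the case-insensitive extension match, the exact fields of the job dict, and the use of `rpush` rather than `lpush` are all hypothetical. The source only states that `.txt` and `.md` are accepted and that a job dict is pushed to the Redis list `processing_queue`.

```python
import json
import uuid
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".txt", ".md"}  # the two extensions named in the Behavior list
QUEUE_NAME = "processing_queue"       # Redis list named in the docs


def has_allowed_extension(file_url: str) -> bool:
    """Return True if the URL path ends in .txt or .md.

    The query string is ignored; matching case-insensitively is an assumption.
    """
    path = urlparse(file_url).path
    return PurePosixPath(path).suffix.lower() in ALLOWED_EXTENSIONS


def build_job(book_id: str, file_url: str, user_id: str) -> dict:
    """Build the job dict. The field layout is hypothetical."""
    return {
        "job_id": str(uuid.uuid4()),
        "book_id": book_id,
        "file_url": file_url,
        "user_id": user_id,
    }


def enqueue(queue_client, job: dict) -> None:
    """Push the serialized job onto the queue.

    `queue_client` is any object with an `rpush(name, value)` method,
    for example a redis-py client. Using rpush is an assumption.
    """
    queue_client.rpush(QUEUE_NAME, json.dumps(job))
```

A handler built on these helpers would return 422 when `has_allowed_extension` is false and 202 with the `job_id` once `enqueue` succeeds.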
POST /character/chat
Stream a chat reply for one message. Returns an SSE stream.
Request body
{
  "user_id": "uuid",
  "character_id": "uuid",
  "book_id": "uuid",
  "session_id": "uuid",
  "user_message": "What would you do if you were free?"
}
Behavior
The endpoint validates the session (it must exist in Supabase, and the user_id must match), acquires a per-session write lock in Redis, and then begins streaming. See Chat Flow for the full sequence.
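One common way to implement a per-session lock in Redis is `SET key value NX EX ttl`, which succeeds only when the key does not already exist. The sketch below illustrates that pattern; it is not confirmed to be what this backend does. The key format `chat_lock:<session_id>`, the 60-second TTL, and the function names are assumptions, and `FakeRedis` is an in-memory stand-in included only so the example runs without a server.

```python
import uuid

LOCK_TTL_SECONDS = 60  # assumed TTL so a crashed stream cannot hold the lock forever


def lock_key(session_id: str) -> str:
    return f"chat_lock:{session_id}"  # hypothetical key format


def acquire_session_lock(client, session_id: str, ttl: int = LOCK_TTL_SECONDS):
    """Try to take the lock; return an owner token, or None if it is already held."""
    token = str(uuid.uuid4())
    # With nx=True the SET only succeeds if the key does not exist yet.
    acquired = client.set(lock_key(session_id), token, nx=True, ex=ttl)
    return token if acquired else None


def release_session_lock(client, session_id: str, token: str) -> None:
    """Release the lock only if we still own it.

    This check-then-delete is not atomic; a Lua script would be safer.
    With a real redis-py client it also assumes decode_responses=True,
    so that get() returns str rather than bytes.
    """
    if client.get(lock_key(session_id)) == token:
        client.delete(lock_key(session_id))


class FakeRedis:
    """In-memory stand-in for a Redis client (ignores the TTL)."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True

    def get(self, name):
        return self._data.get(name)

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0
```

When `acquire_session_lock` returns `None`, another request holds the session, which corresponds to the case the endpoint reports with code 429.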
Success events
- data: {"token":"..."}, one per LLM token
- data: {"done":true,"full_response":"..."}, the final event
SSE error events
- data: {"error":"...","code":429} (another request is already streaming for this session)
- data: {"error":"...","code":502} (LLM call failed mid-stream)
Session-not-found and user-mismatch are returned as HTTP 404 before the SSE stream starts. Once the stream begins, errors are emitted as SSE events.
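On the consuming side, the three event shapes described above (token, done, error) can be handled with a small line parser. The following is a sketch of a hypothetical client helper, not part of the backend; the names `iter_sse_events`, `collect_reply`, and `ChatStreamError` are made up for illustration, and it assumes each event arrives as a single `data:` line carrying a JSON object.

```python
import json
from typing import Iterable, Iterator


class ChatStreamError(Exception):
    """Raised when the stream carries an error event (hypothetical type)."""

    def __init__(self, message: str, code=None):
        super().__init__(message)
        self.code = code


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, skipping everything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        yield json.loads(line[len("data:"):].strip())


def collect_reply(lines: Iterable[str]) -> str:
    """Consume a stream and return the full reply text."""
    tokens = []
    for event in iter_sse_events(lines):
        if "error" in event:
            raise ChatStreamError(event["error"], event.get("code"))
        if event.get("done"):
            # Prefer the server's full_response; fall back to joined tokens.
            return event.get("full_response", "".join(tokens))
        if "token" in event:
            tokens.append(event["token"])
    return "".join(tokens)  # stream ended without a done event
```

With httpx, the lines could come from `response.iter_lines()` inside a `client.stream("POST", ...)` block. An HTTP 404 would still need to be checked on the response status before reading the stream.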
GET /health
Liveness probe. Returns 200 text/plain with the body "ok". No caching headers are set and no dependencies are checked. Used by Traefik and Docker health checks.
GET /metrics
Prometheus scrape endpoint. Returns metrics in the Prometheus text format. No caching headers. The full list of metrics is in Observability.
Why So Few Endpoints
The backend is a focused service. The frontend owns auth, library, payments, comments, notifications, and the admin panel. Pushing more endpoints into the backend would mean duplicating the auth and ownership logic that already lives in the frontend. Keeping the surface to two business endpoints keeps the trust boundary narrow: the backend trusts the user_id and book_id from the frontend, and that is enough.