Streaming (SSE)¶
POST /threads/{thread_id}/runs/stream executes a run and streams its progress
back as Server-Sent Events (text/event-stream). This is how UIs render
token-by-token output and live state updates.
The response¶
The endpoint returns a streaming response with:
Content-Type: text/event-streamCache-Control: no-cacheConnection: keep-aliveContent-Location: /threads/{thread_id}/runs/{run_id}— so the client knows the run id immediately.
Each event is encoded in the standard SSE framing, with a monotonically
increasing id:
id: 1
event: metadata
data: {"run_id":"...","thread_id":"...","run":{...}}
id: 2
event: values
data: {"messages":[...]}
id: 3
event: end
data: {"run_id":"...","status":"success","usage":{"total_tokens":1234}}
JSON payloads are serialized compactly (no extra whitespace).
Event sequence¶
A successful stream always looks like:
metadata(first event) — the run is starting. Payload:{"run_id", "thread_id", "run": <RunModel>}.- Zero or more data events — the graph's output, in a shape determined by
the requested
stream_mode(see below). end(terminal, on success) —{"run_id", "status": "success", "usage": {"total_tokens": <int>}}. The total is measured per run by a usage callback attached to the run's config (see Token usage).
If the graph raises, the terminal event is instead:
error—{"detail": "<message>", "run_id": "..."}.
If the client disconnects mid-stream, the run is marked interrupted.
Stream modes¶
The stream_mode field on the run request (a single mode or a list) selects how
graph output is emitted. skeino dispatches three ways:
Incremental value streaming. skeino subscribes to both LangGraph's
messages and values streams and emits values events whose messages
list grows token-by-token as agent nodes produce output. This is what powers
live "typing" UIs.
- Only nodes you declare as agent nodes (
SkeinoSettings.agent_nodes) contribute streamed message chunks. - If you configure a
status_field, new entries appended to that state-list field are emitted as separatestatusevents ({"message": "..."}), useful for surfacing pipeline progress.
Passes LangGraph's astream_events (v2) output straight through as events
events — the full event firehose, serialized as-is.
updates, messages, messages-tuple, tasks, checkpoints, debug,
custom — skeino calls graph.astream(stream_mode=...) and forwards each
chunk under an event name matching the mode.
The recognised modes are values, messages, messages-tuple, tasks,
checkpoints, updates, events, debug, and custom.
Output filtering¶
If your graph declares an output_schema, skeino only emits the fields that
schema declares, so internal state never leaks onto the wire.
Resilience¶
Streaming runs are hardened against transient backend failures:
- Retry with backoff. If the graph stream fails before any event has been delivered with a retriable error (timeouts, SSL/connection/syscall errors, "could not receive data from server"), skeino retries with exponential backoff — up to a small number of attempts.
- No replay after delivery. Once any event has reached the client, skeino does not retry — retrying would duplicate already-streamed output. The error surfaces instead.
- Permanent errors fail fast. Programming errors (
ValueError,KeyError, …) are never retried. - Disconnect handling. A client disconnect (
CancelledError) marks the runinterruptedand releases the thread lock in afinallyblock, so a dropped connection never wedges the thread.
Serialization on the wire¶
skeino normalises data in both directions:
Inbound — request input is converted to LangGraph-native objects. In
particular, an input.messages list is converted to LangChain message objects,
and run config is merged with the thread/checkpoint/run identifiers LangGraph
needs.
Outbound — graph state is serialized to JSON-safe values. LangChain messages get a stable shape:
- AI/human/system:
{"id", "type", "content", "tool_calls"?, "additional_kwargs"?} - tool:
{"id", "type": "tool", "tool_call_id", "name", "content"}
Multi-block message content is flattened to a single string, UUIDs and datetimes are stringified, and arbitrary objects fall back to their public attributes.
Summary of event types¶
| Event | When | Payload |
|---|---|---|
metadata |
first, always | {run_id, thread_id, run} |
values |
values mode |
{messages: [...], ...} (grows incrementally) |
status |
values mode, if status_field set |
{message} |
events |
events mode |
raw LangGraph v2 event |
updates / messages / tasks / checkpoints / debug / custom |
matching mode | LangGraph chunk for that mode |
end |
terminal, success | {run_id, status: "success", usage: {total_tokens}} |
error |
terminal, failure | {detail, run_id} |
See Threads & runs for run lifecycle and the HTTP reference for the request schema.