Streaming (SSE)¶

POST /threads/{thread_id}/runs/stream executes a run and streams its progress back as Server-Sent Events (text/event-stream). This is how UIs render token-by-token output and live state updates.

The response¶

The endpoint returns a streaming response with:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Content-Location: /threads/{thread_id}/runs/{run_id} — so the client knows the run id immediately.

Each event is encoded in the standard SSE framing, with a monotonically increasing id:

id: 1
event: metadata
data: {"run_id":"...","thread_id":"...","run":{...}}

id: 2
event: values
data: {"messages":[...]}

id: 3
event: end
data: {"run_id":"...","status":"success","usage":{"total_tokens":1234}}

JSON payloads are serialized compactly (no extra whitespace).

Event sequence¶

A successful stream always looks like:

metadata (first event) — the run is starting. Payload: {"run_id", "thread_id", "run": <RunModel>}.
Zero or more data events — the graph's output, in a shape determined by the requested stream_mode (see below).
end (terminal, on success) — {"run_id", "status": "success", "usage": {"total_tokens": <int>}}. The total is measured per run by a usage callback attached to the run's config (see Token usage).

If the graph raises, the terminal event is instead:

error — {"detail": "<message>", "run_id": "..."}.

If the client disconnects mid-stream, the run is marked interrupted.

Stream modes¶

The stream_mode field on the run request (a single mode or a list) selects how graph output is emitted. skeino dispatches three ways:

values (default)eventsother modes

Incremental value streaming. skeino subscribes to both LangGraph's messages and values streams and emits values events whose messages list grows token-by-token as agent nodes produce output. This is what powers live "typing" UIs.

Only nodes you declare as agent nodes (SkeinoSettings.agent_nodes) contribute streamed message chunks.
If you configure a status_field, new entries appended to that state-list field are emitted as separate status events ({"message": "..."}), useful for surfacing pipeline progress.

Passes LangGraph's astream_events (v2) output straight through as events events — the full event firehose, serialized as-is.

updates, messages, messages-tuple, tasks, checkpoints, debug, custom — skeino calls graph.astream(stream_mode=...) and forwards each chunk under an event name matching the mode.

The recognised modes are values, messages, messages-tuple, tasks, checkpoints, updates, events, debug, and custom.

Output filtering¶

If your graph declares an output_schema, skeino only emits the fields that schema declares, so internal state never leaks onto the wire.

Resilience¶

Streaming runs are hardened against transient backend failures:

Retry with backoff. If the graph stream fails before any event has been delivered with a retriable error (timeouts, SSL/connection/syscall errors, "could not receive data from server"), skeino retries with exponential backoff — up to a small number of attempts.
No replay after delivery. Once any event has reached the client, skeino does not retry — retrying would duplicate already-streamed output. The error surfaces instead.
Permanent errors fail fast. Programming errors (ValueError, KeyError, …) are never retried.
Disconnect handling. A client disconnect (CancelledError) marks the run interrupted and releases the thread lock in a finally block, so a dropped connection never wedges the thread.

Serialization on the wire¶

skeino normalises data in both directions:

Inbound — request input is converted to LangGraph-native objects. In particular, an input.messages list is converted to LangChain message objects, and run config is merged with the thread/checkpoint/run identifiers LangGraph needs.

Outbound — graph state is serialized to JSON-safe values. LangChain messages get a stable shape:

AI/human/system: {"id", "type", "content", "tool_calls"?, "additional_kwargs"?}
tool: {"id", "type": "tool", "tool_call_id", "name", "content"}

Multi-block message content is flattened to a single string, UUIDs and datetimes are stringified, and arbitrary objects fall back to their public attributes.

Summary of event types¶

Event	When	Payload
`metadata`	first, always	`{run_id, thread_id, run}`
`values`	`values` mode	`{messages: [...], ...}` (grows incrementally)
`status`	`values` mode, if `status_field` set	`{message}`
`events`	`events` mode	raw LangGraph v2 event
`updates` / `messages` / `tasks` / `checkpoints` / `debug` / `custom`	matching mode	LangGraph chunk for that mode
`end`	terminal, success	`{run_id, status: "success", usage: {total_tokens}}`
`error`	terminal, failure	`{detail, run_id}`

See Threads & runs for run lifecycle and the HTTP reference for the request schema.