Skip to content

Threads & runs

skeino models conversations the same way the LangGraph Platform does: a thread holds the evolving state, a run is one execution against that thread, and a checkpoint is an immutable snapshot of the graph state at a point in time.

The data model

Thread  (thread_id)
├── metadata, config, status, ttl        ← stored in the metadata store
└── checkpoint stream                     ← stored in the checkpointer
        ├── checkpoint (stamped run_id)
        ├── checkpoint (stamped run_id)
        └── ...
Run     (run_id, belongs to a thread)
└── assistant_id, status, kwargs, error  ← stored in the metadata store
  • A thread owns a single checkpoint namespace. Every run against the thread reads and appends to the same checkpoint stream, so the thread's "current state" is simply its most recent checkpoint — regardless of which run wrote it.
  • A run is one invocation of the graph against a thread. Multiple runs execute sequentially against the same thread (see Concurrency below). Each run records the parameters it was invoked with and its terminal status.
  • A checkpoint is a LangGraph checkpoint tuple (config, values, metadata). skeino stamps each checkpoint's metadata with the run_id that produced it, so clients (including LangGraph Studio) can group checkpoints by run.

Two stores, two responsibilities

State is deliberately split across two backends:

Concern Stored in Backends
Thread & run rows (metadata, status, config, kwargs) metadata store Postgres (app_threads, app_runs tables) or in-memory
Graph state / checkpoints checkpointer Postgres saver or in-memory MemorySaver

A single checkpointer_scheme selects both halves (e.g. postgres uses Postgres for both); the default memory keeps both in-memory (ephemeral). See Persistence & checkpointers for the details.

Thread lifecycle

A thread is created with POST /threads. The request can:

  • supply an explicit thread_id (otherwise one is generated),
  • attach metadata,
  • choose if_exists behaviour ("raise" — the default — or "do_nothing"),
  • configure a ttl, and
  • seed initial state via supersteps (a list of node updates applied before any run executes).

Thread status is one of idle, busy, interrupted, or error. You can read a thread's metadata-plus-latest-values with GET /threads/{id}, its full latest checkpoint with GET /threads/{id}/state, and walk its checkpoint history with GET/POST /threads/{id}/history. To read state at a specific point in time, use GET /threads/{id}/state/{checkpoint_id} (or the POST .../state/checkpoint variant with a full config body).

You can update a thread's metadata with PATCH /threads/{id}, delete it (along with its runs and checkpoints) with DELETE /threads/{id}, and edit its state directly with POST /threads/{id}/state — a human-in-the-loop write that applies values (optionally as_node, and from a specific checkpoint) and returns the new checkpoint config.

Threads are searchable with POST /threads/search, which filters by ids, metadata, state values, and status, with pagination and field selection.

You can fork a thread with POST /threads/{id}/copy. This creates a new, independent thread seeded with the source's latest state (its metadata is copied and stamped with forked_from), so you can branch and explore — a what-if continuation, an isolated debug replay — without mutating the original. The copy is shallow: the current state carries over, not the full checkpoint history.

Run lifecycle

A run is created against an existing thread:

  • POST /threads/{id}/runs — execute to completion and return the final RunModel.
  • POST /threads/{id}/runs/stream — execute and stream events over SSE (see Streaming).

A run progresses through these statuses:

pendingrunningsuccess · error · interrupted (and timeout)

The run row records the assistant_id, the serialized invocation kwargs (input, config, checkpoint selection, stream options), the multitask_strategy, and — on failure — an error message. List a thread's runs with GET /threads/{id}/runs and fetch one with GET /threads/{id}/runs/{run_id}.

Input vs. command

A run is driven by either an input payload (new state to merge in) or a command (update / resume / goto) used to resume an interrupted graph. input.messages is converted to LangChain message objects automatically; see Streaming → serialization.

if_not_exists

By default a run against a missing thread is rejected. Set if_not_exists: "create" on the run request to have skeino create the thread on demand.

Concurrency: one run at a time

skeino enforces at most one in-flight run per thread. Each thread has its own lock, acquired before the run row is created (and, for streaming runs, before the SSE generator is returned — so a queued caller can't observe a stale "free" lock).

The multitask_strategy on the run request decides what happens when a thread is already busy:

Strategy Behaviour when the thread is busy
enqueue (default) Wait for the active run to finish, then proceed.
reject Fail immediately with 409 Conflict.
rollback Fail immediately with 409 Conflict.
interrupt Fail immediately with 409 Conflict.

Single-process scope

Locks are in-process asyncio locks, which is correct for a single-process deployment. A clustered, multi-process deployment would need a shared lock service; that is out of scope for v1.

Token usage

skeino measures each run's token usage with a LangChain UsageMetadataCallbackHandler attached to the run's config. The handler records every LLM call made during the run — including calls whose responses never reach checkpoint state — and is scoped to that run, so multi-turn threads report per-run totals. The total is surfaced:

  • on POST /threads/{id}/runs via the X-Tokens-Used response header, and
  • on the streaming endpoint inside the terminal end event's usage field.

When the handler records nothing (providers that don't populate usage_metadata plus model_name on their messages), skeino falls back to summing the total tokens across all AI messages in the final state, normalising the different provider formats (LangChain usage_metadata, Gemini, OpenAI/Groq/Bedrock). The fallback covers the thread's whole message history, so on multi-turn threads it reports cumulative totals rather than the run's own.