Server-side conversation state
What you will achieve
Send turn 1 ('Remember the number 42.'), capture the server-state id, then send
turn 2 ('What number did I ask you to remember?') with no prior messages on the wire, and confirm the model recalls 42.
When and why you need this
With standard client-side history you resend the full conversation on every turn. For long conversations this means:
- Growing cost — input tokens increase each turn, even for context the model has already processed.
- Growing latency — more tokens to transmit and process on each request.
- Bandwidth — the transcript travels over the wire every single turn.
OpenAI’s Responses API and xAI’s Interactions API both support server-side state: the
provider stores the conversation on their servers and you send only a previous_response_id
on subsequent turns. The provider reconstructs context from its server cache and combines
it with just the new user message. You transmit only the new tokens; how the stored context is billed depends on the provider, so check its pricing documentation.
Anthropic and Google do not offer this feature — they always require the full history.
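To make the growth described above concrete, here is a minimal sketch that compares how many input tokens travel over the wire on each turn when the full history is resent versus when only the new message is sent. The helper name `transmittedPerTurn` and the per-turn token counts are illustrative assumptions, not measurements, and the sketch counts transmitted tokens only, not what a provider bills.

```ts
// Hypothetical model of one turn: the new user message and the model's reply.
interface Turn {
  userTokens: number;
  replyTokens: number;
}

// Returns the input tokens transmitted on each turn under both strategies.
function transmittedPerTurn(turns: Turn[]): {
  fullHistory: number[];
  serverState: number[];
} {
  const fullHistory: number[] = [];
  const serverState: number[] = [];
  let prior = 0; // tokens in all earlier user messages and replies
  for (const turn of turns) {
    // Client-side history: everything so far plus the new user message.
    fullHistory.push(prior + turn.userTokens);
    // Server-side state: only the new user message is transmitted.
    serverState.push(turn.userTokens);
    prior += turn.userTokens + turn.replyTokens;
  }
  return { fullHistory, serverState };
}

// Four identical turns: 50-token question, 150-token answer (made-up sizes).
const turns: Turn[] = Array.from({ length: 4 }, () => ({
  userTokens: 50,
  replyTokens: 150,
}));
const sent = transmittedPerTurn(turns);
console.log(sent.fullHistory); // [ 50, 250, 450, 650 ]
console.log(sent.serverState); // [ 50, 50, 50, 50 ]
```

With full history the per-turn payload grows linearly, so the cumulative total grows roughly quadratically with the number of turns; with server-side state the payload stays flat.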
Step by step
Step 1 — Create an LLMClient for a stateful provider
```ts
import { createLLM, type Message } from '@combycode/llm-sdk';

const llm = createLLM({
  model: process.env.LLM_MODEL!, // e.g. 'openai/gpt-4o' or 'xai/grok-3'
  apiKey: process.env.LLM_API_KEY,
});
```

createLLM() automatically detects which API type the model uses. OpenAI models use
the Responses API (api: 'responses'); xAI models use the Interactions API
(api: 'interactions'). You do not configure this manually.
Step 2 — Send the first turn and capture the assistant message
```ts
const messages: Message[] = [
  { role: 'user', content: 'Remember the number 42.' },
];

const r1 = await llm.complete(messages);

// assistantMessage() stamps the server-state id (response_id / interaction_id)
// into the message's `origin.serverStateId` field.
messages.push(llm.assistantMessage(r1));
```

llm.assistantMessage(r1) does two things:
- Creates a role: 'assistant' message with the model’s text.
- When the client is on a stateful API (Responses or Interactions), embeds r1.id into origin.serverStateId on the message.

Without this step the next turn does not have the id needed to continue server-side.
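The two behaviours listed above can be pictured with a small standalone sketch. It is not the SDK's implementation: the `Origin`, `Message`, and `CompletionResult` shapes, the `provider` field, the `stateful` flag on the client descriptor, and the function name `toAssistantMessage` are all assumptions made for illustration. Only `origin.serverStateId` and `origin.model` are names taken from this page.

```ts
// Hypothetical shapes; the real SDK types may differ.
interface Origin {
  provider: string;
  model: string;
  serverStateId?: string;
}
interface Message {
  role: 'user' | 'assistant';
  content: string;
  origin?: Origin;
}
interface CompletionResult {
  id: string;
  text: string;
}

// Builds the assistant message and attaches the response id only when the
// API in use keeps conversation state on the server.
function toAssistantMessage(
  result: CompletionResult,
  client: { provider: string; model: string; stateful: boolean },
): Message {
  const origin: Origin = { provider: client.provider, model: client.model };
  if (client.stateful) {
    origin.serverStateId = result.id;
  }
  return { role: 'assistant', content: result.text, origin };
}

const stamped = toAssistantMessage(
  { id: 'resp_123', text: 'Got it: 42.' },
  { provider: 'openai', model: 'gpt-4o', stateful: true },
);
console.log(stamped.origin?.serverStateId); // 'resp_123'
```

A bare `{ role: 'assistant', content: ... }` object built by hand would have no `origin` at all, which is why the next turn could not locate the id.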
Step 3 — Send the second turn — only the new message
```ts
messages.push({ role: 'user', content: 'What number did I ask you to remember?' });

// The SDK detects origin.serverStateId in the last assistant message,
// extracts it as previousResponseId, and sends only the new user message.
const r2 = await llm.complete(messages);

console.log(r2.text); // e.g. 'You asked me to remember 42.'
```

You pass the full messages array but the SDK decides what to actually send.
When it finds a usable serverStateId in the most recent assistant message
(same provider, same model, within the TTL window), it sends only previousResponseId plus
the new user message. The provider reconstructs the rest from its cache.
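The decision described above can be sketched as a pure function. This is an illustration under stated assumptions, not the SDK's actual resolution logic: the `createdAt` timestamp, the `ttlMs` setting, the `Client` and `Payload` shapes, and the name `resolvePayload` are hypothetical.

```ts
// Hypothetical shapes; only origin.serverStateId and origin.model come from this page.
interface Origin {
  provider: string;
  model: string;
  serverStateId?: string;
  createdAt?: number; // assumed: epoch ms when the id was issued
}
interface Message {
  role: 'user' | 'assistant';
  content: string;
  origin?: Origin;
}
interface Client {
  provider: string;
  model: string;
  ttlMs: number; // assumed: how long a server-state id stays usable
}
interface Payload {
  previousResponseId?: string;
  messages: Message[];
}

function resolvePayload(
  history: Message[],
  client: Client,
  now: number = Date.now(),
): Payload {
  const full: Payload = { messages: history };

  // Locate the most recent assistant message.
  let idx = -1;
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i]?.role === 'assistant') {
      idx = i;
      break;
    }
  }
  const origin = history[idx]?.origin;
  const stateId = origin?.serverStateId;
  if (!origin || !stateId) return full;

  // Same provider, same model, and still inside the TTL window.
  const sameTarget =
    origin.provider === client.provider && origin.model === client.model;
  const fresh =
    origin.createdAt !== undefined && now - origin.createdAt < client.ttlMs;
  if (!sameTarget || !fresh) return full;

  // Send only what came after the stamped assistant message.
  return { previousResponseId: stateId, messages: history.slice(idx + 1) };
}

const history: Message[] = [
  { role: 'user', content: 'Remember the number 42.' },
  {
    role: 'assistant',
    content: 'Noted.',
    origin: { provider: 'openai', model: 'gpt-4o', serverStateId: 'resp_1', createdAt: 1_000 },
  },
  { role: 'user', content: 'What number did I ask you to remember?' },
];
const payload = resolvePayload(
  history,
  { provider: 'openai', model: 'gpt-4o', ttlMs: 60_000 },
  2_000,
);
console.log(payload.previousResponseId, payload.messages.length); // resp_1 1
```

Any failed check returns the full history unchanged, which mirrors the fallback behaviour this page describes for non-stateful providers.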
Step 4 — Inspect what was actually sent
The decision is automatic but observable. On the response object:

```ts
console.log(r2.id);    // server-side response id for the next turn
console.log(r2.usage); // input tokens will be much lower on turn 2+
```

On a non-stateful provider (Anthropic, Google) the same code still works: the SDK transparently falls back to sending the full history. No code change is needed when you run the same application against a different provider.
Step 5 — Opt out of server-state
To always send full history regardless of provider:

```ts
const r2 = await llm.complete(messages, { stateful: false });
```

stateful: false disables the server-state optimisation for this call. Use it when:
- You are debugging and want to confirm what history the model is actually using.
- Your provider has a server-state bug and you need a workaround.
- You are doing a capability test that requires full-history semantics.
Step 6 — Pass an explicit previousResponseId
You can also manage the id yourself:

```ts
const r1 = await llm.complete([{ role: 'user', content: 'Set x = 7.' }]);
const stateId = r1.id;

// Later: just the new message + explicit id, no history array needed.
const r2 = await llm.complete(
  [{ role: 'user', content: 'What is x?' }],
  { previousResponseId: stateId },
);
```

When you set previousResponseId manually the SDK uses it verbatim and skips the
automatic detection logic. This is useful when you persist state ids to a database
and restore them across sessions.
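As a rough illustration of persisting ids across sessions, the sketch below keeps one state id per session and refuses to return ids older than a configured TTL. The `StateIdStore` class, its method names, and the in-memory `Map` standing in for a database table are all hypothetical; the TTL value is whatever your provider documents.

```ts
// One stored row per conversation session (hypothetical schema).
interface StoredState {
  stateId: string;
  savedAt: number; // epoch ms
}

class StateIdStore {
  // A Map stands in for a real database table in this sketch.
  private readonly rows = new Map<string, StoredState>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Record the latest server-state id for a session.
  save(sessionId: string, stateId: string, now: number = Date.now()): void {
    this.rows.set(sessionId, { stateId, savedAt: now });
  }

  // Return the id if it is still inside the TTL window, otherwise undefined.
  load(sessionId: string, now: number = Date.now()): string | undefined {
    const row = this.rows.get(sessionId);
    if (!row) return undefined;
    if (now - row.savedAt >= this.ttlMs) {
      this.rows.delete(sessionId); // stale: the caller should resend full history
      return undefined;
    }
    return row.stateId;
  }
}

const store = new StateIdStore(60_000); // 60 s TTL, chosen only for the demo
store.save('session-a', 'resp_abc', 0);
console.log(store.load('session-a', 30_000)); // 'resp_abc'
console.log(store.load('session-a', 90_000)); // undefined
```

When `load` returns an id, pass it as previousResponseId; when it returns undefined, send the full history instead.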
Your options
| Option / field | Where | Behaviour |
|---|---|---|
| stateful: true | ExecuteOptions (default) | SDK auto-detects the server-state id in the last assistant message and optimises the send. |
| stateful: false | ExecuteOptions | Always send full history. No server-state optimisation. Works on all providers. |
| previousResponseId | ExecuteOptions | Manual: pass the id explicitly. SDK uses it verbatim; skips auto-detection. |
| llm.assistantMessage(r) | LLMClient method | Creates the assistant Message with origin.serverStateId embedded. Required for auto-detection to work on the next turn. |
Server-state availability:
| Provider / API | Server-state support | Id field |
|---|---|---|
| OpenAI Responses API | Yes | response_id |
| xAI Interactions API | Yes | interaction_id |
| Anthropic Messages API | No — full history always | — |
| Google Generative AI | No — full history always | — |
The SDK’s fallback (full history on non-stateful providers) means your code is
portable: point model at a provider without server-state support and the conversation still works,
just without the bandwidth/cost savings.
TTL and id expiry: server-side state ids expire (24 hours on OpenAI at the time of writing). If you store an id and replay it after the TTL the provider returns an error. The SDK does not retry automatically — it propagates the provider error so you can handle it (e.g. by resending full history).
Compare the SDKs
The structural difference: OpenAI’s Responses API exposes previous_response_id as
a request field and returns the id in the response. Vanilla OpenAI SDK code must
extract response.id, store it, and pass it back manually. There is no equivalent
feature in the Anthropic or Google SDKs. ORXA automates the extraction and re-injection
via llm.assistantMessage() + the stateful resolution logic in complete(), and
provides the same code path with a transparent fallback for providers that do not
support server state.
Gotchas and next steps
assistantMessage() is required for auto-detection. If you push a bare
{ role: 'assistant', content: r1.text } the message carries no origin and the
SDK cannot find the server-state id. Always use llm.assistantMessage(r) to stamp
assistant turns in stateful conversations.
Expired ids throw. OpenAI and xAI return a 4xx error when a state id has expired. Wrap the second-turn call in a try/catch and fall back to resending full history if you receive this error in long-running or persisted sessions.
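A minimal sketch of that fallback follows. It takes the completion function as a parameter so it stays self-contained; `completeWithFallback`, the `CompleteFn` type, and the caller-supplied `isExpiredError` predicate are hypothetical, because the exact error shape returned for an expired id is not specified here. The `previousResponseId` and `stateful` options are the ones described on this page.

```ts
type Msg = { role: 'user' | 'assistant'; content: string };
type Result = { id: string; text: string };
type CompleteFn = (
  messages: Msg[],
  options?: { previousResponseId?: string; stateful?: boolean },
) => Promise<Result>;

// Try the cheap path first (new message + stored id). If the provider rejects
// the id as expired, resend the whole history without server state.
async function completeWithFallback(
  complete: CompleteFn,
  history: Msg[],
  stateId: string,
  isExpiredError: (err: unknown) => boolean,
): Promise<Result> {
  const newest = history.slice(-1); // only the latest user message
  try {
    return await complete(newest, { previousResponseId: stateId });
  } catch (err) {
    // Anything other than an expired id is not ours to handle.
    if (!isExpiredError(err)) throw err;
    return complete(history, { stateful: false });
  }
}
```

In application code you would pass something like `(m, o) => llm.complete(m, o)` as `complete`, and a predicate that inspects the provider error for the expired-id case.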
Model pinning for server state. The SDK checks that the origin.model in the
assistant message matches the current client’s model before sending the server-state
id. If you switch model mid-conversation (e.g. upgrade from gpt-4o-mini to gpt-4o)
the id is silently dropped and full history is sent instead.
Next steps:
- Multi-turn conversation — client-side history baseline
- Prompt caching — reduce cost on repeated large prefixes (complementary to server state)
- Layered context — manage dynamic system prompt layers across turns