Multi-turn conversation


What you will achieve

Tell the model your name in turn 1, then ask it to recall the name in turn 2. The model answers correctly because the full history is included in the second request. The same code works on every provider.

When and why you need this

A single-shot complete() call has no memory. Each request is stateless. To build a chatbot, a coding assistant, or any back-and-forth experience you must pass the prior conversation on each new request.

The challenge with raw provider SDKs is that message shapes differ:

OpenAI — { role: 'user' | 'assistant' | 'system', content: string }.
Anthropic — same shape for user/assistant, but system is a top-level field, not a messages-array entry.
Google — { role: 'user' | 'model', parts: [{ text: string }] } (note: 'model' not 'assistant').

If you keep one portable history array and call the raw SDKs directly, you have to branch on the provider and convert that array before every send.
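To make the cost of that branching concrete, here is a minimal sketch of the hand-written converters the three shapes above would require. The `ChatMessage` type and the `toOpenAI`, `toAnthropic` and `toGoogle` helpers are hypothetical names invented for this illustration; they are not part of any SDK, and real request bodies carry more fields than shown.

```typescript
// Provider-neutral message used only in this sketch (not the SDK's Message type).
type Role = 'system' | 'user' | 'assistant';
interface ChatMessage { role: Role; content: string; }

// OpenAI: every role, including system, is an entry in the messages array.
function toOpenAI(history: ChatMessage[]) {
  return { messages: history.map(({ role, content }) => ({ role, content })) };
}

// Anthropic: user/assistant keep the same shape, system moves to a top-level field.
function toAnthropic(history: ChatMessage[]) {
  const system = history
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n');
  const messages = history
    .filter((m) => m.role !== 'system')
    .map(({ role, content }) => ({ role, content }));
  return { system: system || undefined, messages };
}

// Google: 'assistant' becomes 'model' and the text is wrapped in parts.
// System messages are skipped here; their placement is not covered by the shapes listed above.
function toGoogle(history: ChatMessage[]) {
  return history
    .filter((m) => m.role !== 'system')
    .map((m) => ({
      role: m.role === 'assistant' ? 'model' : 'user',
      parts: [{ text: m.content }],
    }));
}
```

Every send path in a multi-provider application would need to pick one of these converters, which is the conditional logic a normalising layer is meant to remove.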

Step by step

Step 1 — Send a message and record the reply

import { complete, type Message } from '@combycode/llm-sdk';

const history: Message[] = [];

// Turn 1: user introduces themselves
history.push({ role: 'user', content: 'My name is Alex.' });

const r1 = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: history,
  maxTokens: 64,
});

// Append the assistant reply so the model sees it in turn 2
history.push({ role: 'assistant', content: r1.text });

The key step is appending the assistant’s reply to history after each turn. Without this the model has no context for the next question.
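One way to make sure the reply is never forgotten is to wrap "push user, send, push assistant" in a single helper. The sketch below is an illustration under assumptions: `runTurn`, `SendFn` and the local `ChatMessage` type are hypothetical names, and `send` stands in for whatever call actually reaches the model (for example a thin wrapper around complete()).

```typescript
// Local types for this sketch only.
interface ChatMessage { role: 'user' | 'assistant'; content: string; }
type SendFn = (history: ChatMessage[]) => Promise<string>;

// Appends the user message, sends the whole history, then appends the reply.
async function runTurn(
  history: ChatMessage[],
  userText: string,
  send: SendFn,
): Promise<string> {
  history.push({ role: 'user', content: userText });
  try {
    // Pass a copy so the sender cannot mutate the caller's history.
    const reply = await send([...history]);
    history.push({ role: 'assistant', content: reply });
    return reply;
  } catch (err) {
    // Roll back the user message so a failed request does not
    // leave two user messages in a row on the next attempt.
    history.pop();
    throw err;
  }
}
```

With a helper like this, each turn becomes a single call, and a failed request leaves the history exactly as it was before the turn started.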

Step 2 — Send the follow-up turn

// Turn 2: ask it to recall the name
history.push({ role: 'user', content: 'What is my name?' });

const r2 = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: history,
  maxTokens: 32,
});

console.log(r2.text); // e.g. 'Your name is Alex.' (exact wording varies by model)

prompt accepts either a string (single user message) or a Message[] (full history). When you pass an array it becomes the entire messages list for the request.
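The string-or-array behaviour described above can be pictured as a small normalisation step. This is only a sketch of the described behaviour, not the SDK's actual implementation; `normalizePrompt` and the local `ChatMessage` type are hypothetical.

```typescript
// Local type for this sketch only.
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string; }

// A bare string is treated as a single user message;
// an array is used as the full messages list, unchanged.
function normalizePrompt(prompt: string | ChatMessage[]): ChatMessage[] {
  return typeof prompt === 'string'
    ? [{ role: 'user', content: prompt }]
    : prompt;
}
```

Seen this way, passing a string is just shorthand for a one-element history, which is why a string prompt carries no memory of earlier turns.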

Step 3 — Add a system prompt

The system prompt applies to all turns. Pass it once via system; do not include it in the history array:

const SYSTEM = 'You are a concise assistant. Reply in at most one sentence.';

const history: Message[] = [
  { role: 'user',      content: 'My name is Alex.' },
  { role: 'assistant', content: 'Nice to meet you, Alex.' },
  { role: 'user',      content: 'What is my name?' },
];

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  system: SYSTEM,
  prompt: history,
  maxTokens: 32,
});

Step 4 — Use `ConversationHistory` for automatic tracking

For long-running conversations, ConversationHistory manages the array for you and adds token estimation and layered system-prompt context:

import { ConversationHistory, complete } from '@combycode/llm-sdk';

const conv = new ConversationHistory();

// Write the agent role to the registry (the preferred way for agents)
conv.registry.set('agent.role', 'You are a helpful assistant.', {
  priority: 10,
  tags: ['system'],
});

async function chat(userMessage: string): Promise<string> {
  conv.append({ role: 'user', content: userMessage });

  const { text, usage } = await complete({
    model: process.env.LLM_MODEL!,
    apiKey: process.env.LLM_API_KEY,
    system: conv.registry.flat({ tag: 'system' }),
    prompt: conv.messages(),
    maxTokens: 256,
  });

  conv.append({ role: 'assistant', content: text }, { usage });
  return text;
}

console.log(await chat('My name is Alex.'));
console.log(await chat('What is my name?'));
console.log(`Total tokens so far: ${conv.estimatedTokens()}`);

ConversationHistory also tracks usage, supports snapshots (for persistence), and exposes the layered registry for multi-contributor system prompts. See Layered context for the full registry API.

Step 5 — Handle a growing conversation

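Conversations grow without bound. After many turns you will approach the model’s context window. The simplest strategy is to drop the oldest messages:

// Keep only the last 20 messages (10 user/assistant pairs).
// The system prompt is passed separately via `system`, so it is not affected.
// Trim right after appending an assistant reply so the kept window still starts with a user message.
const MAX_MESSAGES = 20;
if (history.length > MAX_MESSAGES) {
  history.splice(0, history.length - MAX_MESSAGES);
}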

For more sophisticated strategies (sliding window, summarisation) see the Context Guard guide.
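A fixed-size trim can leave the history opening on an assistant message if it runs at the wrong moment, which some providers reject. The sketch below is one possible variant that always returns a window starting with a user message. `trimHistory` and the local `ChatMessage` type are hypothetical names for this illustration, and it assumes the history holds only user and assistant messages.

```typescript
// Local type for this sketch only.
interface ChatMessage { role: 'user' | 'assistant'; content: string; }

// Returns at most maxMessages of the newest messages,
// shrinking the window further if needed so it opens on a user message.
function trimHistory(history: ChatMessage[], maxMessages: number): ChatMessage[] {
  if (history.length <= maxMessages) return history.slice();
  let start = history.length - maxMessages;
  // Skip forward past any leading assistant message.
  while (start < history.length && history[start]?.role !== 'user') {
    start++;
  }
  return history.slice(start);
}
```

The function returns a new array instead of mutating its input, so the full transcript can still be kept elsewhere (for logging or summarisation) while only the trimmed window is sent.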

Your options

Message shapes (Message type):

Field	Type	Notes
`role`	`'user' | 'assistant' | 'system' | 'tool'`	Always use `'assistant'`; the SDK rewrites it to `'model'` for Google internally.
`content`	`string | ContentPart[]`	A plain string for most cases. Use `ContentPart[]` for multi-modal content (images, audio, tool results).
`id`	`string`	Optional. Universal message id for dedup and referencing.
`createdAt`	`number`	Optional. Epoch ms. Used for server-state TTL checks.
`origin`	`MessageOrigin`	Set automatically by `llm.assistantMessage(r)`. Carries server-state id for stateful continuation.
`cache`	`boolean`	Mark this message’s content for prompt-cache pinning (Anthropic cache_control).

When to use a plain Message[] vs ConversationHistory:

Approach	When to use
`Message[]`	Simple scripts, short conversations, single-function call chains. You manage the array.
`ConversationHistory`	Multi-turn chatbots, agents, anything that needs token tracking, export/import, or registry-based system prompts.

Multi-modal turns:

Images and documents are ContentPart[] entries in a message’s content. The structure is the same across all providers — the SDK maps them to the provider’s native format:

history.push({
  role: 'user',
  content: [
    { type: 'text',  text: 'What is in this image?' },
    { type: 'image', source: { type: 'url', url: 'https://example.com/photo.jpg' } },
  ],
});
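Once content can be either a string or an array of parts, code that reads messages (logging, rough size estimates, search) has to handle both forms. The sketch below shows one way to pull the plain text out of a message. The `ContentPart` union here is a simplified local stand-in modelled on the example above, not the SDK's full type, and `textOf` is a hypothetical helper.

```typescript
// Simplified local stand-ins for this sketch only.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; source: { type: 'url'; url: string } };
type ContentPart = TextPart | ImagePart;
interface ChatMessage { role: 'user' | 'assistant'; content: string | ContentPart[]; }

// Returns the text of a message, ignoring non-text parts such as images.
function textOf(message: ChatMessage): string {
  if (typeof message.content === 'string') return message.content;
  return message.content
    .filter((part): part is TextPart => part.type === 'text')
    .map((part) => part.text)
    .join('\n');
}
```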

Compare the SDKs

import { complete } from '@combycode/llm-sdk';

// Plain multi-turn: pass the full messages array. (Server-state continuation is
// a separate opt-in via createLLM + assistantMessage — see scenario 24.)
const t0 = performance.now();
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: [
    { role: 'user', content: 'My name is Alex.' },
    { role: 'assistant', content: 'Nice to meet you, Alex.' },
    { role: 'user', content: 'What is my name?' },
  ],
  maxTokens: 32,
});

console.log(JSON.stringify({ result: text.trim(), ms: Math.round(performance.now() - t0) }));

The structural difference: official SDKs require knowing and handling role name differences ('assistant' vs 'model'), content format differences (string vs parts), and system prompt placement (top-level vs messages array). In a multi-provider application this is conditional logic on every turn. ORXA normalises role: 'assistant' to role: 'model' for Google, moves role: 'system' messages to the correct provider field, and wraps string content in parts for Google — your history array stays in one canonical format throughout.

Gotchas and next steps

Always append both sides. A common mistake is appending only the user message and omitting the assistant reply. The model then has no memory of what it said and may contradict itself.

Do not modify past messages by hand. Some providers (Anthropic, for example) reject requests whose message roles do not alternate (user/assistant/user/…), so deleting or inserting a single message can invalidate the whole history. If you need to edit history, use ConversationHistory.spliceRange(), which also handles token-count re-anchoring.
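If you manage a plain array yourself, a cheap pre-flight check can catch broken alternation before the provider does. This is a minimal sketch, assuming the history contains only user and assistant messages and starts with a user message; `firstAlternationError` and the local `ChatMessage` type are hypothetical names.

```typescript
// Local type for this sketch only.
interface ChatMessage { role: 'user' | 'assistant'; content: string; }

// Returns the index of the first message that breaks
// user/assistant alternation, or -1 if the history is valid.
function firstAlternationError(history: ChatMessage[]): number {
  for (let i = 0; i < history.length; i++) {
    const expected = i % 2 === 0 ? 'user' : 'assistant';
    if (history[i]?.role !== expected) return i;
  }
  return -1;
}
```

Running a check like this after any manual edit gives a precise index to inspect instead of an opaque provider error.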

Context window vs token budget. The sum of all tokens in the history plus your maxTokens must not exceed the model’s context window. Use estimatedTokens() on a ConversationHistory (conv.estimatedTokens() in Step 4) or countTokens() to check before sending.
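The budget rule itself is simple arithmetic: history tokens plus maxTokens must fit in the context window. The sketch below shows the check with a deliberately crude estimator. The four-characters-per-token ratio is a rough assumption for illustration only, not how any provider tokenises, and `roughTokens` and `fitsContext` are hypothetical helpers; prefer a real token count where one is available.

```typescript
// Local type for this sketch only.
interface ChatMessage { role: 'user' | 'assistant'; content: string; }

// Crude estimate: assume about 4 characters per token (an assumption, not a tokenizer).
function roughTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// True when the estimated history size plus the reply budget fits in the window.
function fitsContext(
  history: ChatMessage[],
  maxTokens: number,
  contextWindow: number,
): boolean {
  const used = history.reduce((sum, m) => sum + roughTokens(m.content), 0);
  return used + maxTokens <= contextWindow;
}
```

For example, a history estimated at 3 tokens with maxTokens of 5 fits a window of 8 exactly, while raising maxTokens to 6 does not.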

Next steps:

Conversation state — let the server hold history (OpenAI Responses)
Layered context: dynamic, multi-contributor system prompts via the ConversationHistory registry (conv.registry in Step 4)
Streaming — stream replies inside a conversation loop
Token counting — measure history size before sending