LLM Client
title: LLM Client
Section titled “title: LLM Client”LLM Client
Section titled “LLM Client”Layer 2. Format adapter only. Source: src/llm/.
Purpose and responsibilities
Section titled “Purpose and responsibilities”- Hold a fixed
(provider, model, apiKey, system)binding, immutable after construction. - Accept universal
(string | ContentPart[] | Message[])input and normalize it into a provider-specific HTTP body via aProviderAdapter. - Route the resulting request through the injected
EngineFetch/EngineFetchStream— never callsfetchdirectly. - Parse the provider’s raw HTTP response body into a normalized
CompletionResponseorAsyncIterable<StreamEvent>. - Emit lifecycle hooks:
onClientCreate,onMessageResolve,onBeforeSubmit,onCompletion,onClientDestroy.
Does NOT own: a queue, retry policy, cache, or rate limiter. Those belong to
NetworkEngine (queue/retry) and the Cache plugin (onBeforeSubmit intercept).
Key types
Section titled “Key types”src/llm/types/provider.ts
Section titled “src/llm/types/provider.ts”type ProviderName = 'anthropic' | 'openai' | 'google' | 'xai' | 'openrouter';type ApiType = 'completions' | 'responses' | 'messages' | 'interactions' | 'generate';
interface ProviderAdapter { readonly name: ProviderName; buildRequest(req: NormalizedRequest): ProviderHttpRequest; parseResponse(raw: unknown, latencyMs: number): CompletionResponse; parseStreamEvent(event: SSEEvent): StreamEvent[]; authHeaders(): Record<string, string>; baseURL(): string; completionPath(): string; enableStreaming?(providerReq: ProviderHttpRequest, req: NormalizedRequest): void;}
interface ProviderHttpRequest { body: Record<string, unknown>; headers?: Record<string, string>; // extra request headers (e.g. Anthropic beta flags) path?: string; // override default completionPath()}src/llm/types/request.ts
Section titled “src/llm/types/request.ts”interface NormalizedRequest { model: string; messages: Message[]; system?: string; maxTokens?: number; temperature?: number; topP?: number; stop?: string[]; tools?: Tool[]; toolChoice?: ToolChoice; structured?: { schema: Record<string, unknown>; name?: string; strict?: boolean }; thinking?: ThinkingConfig; // { mode: 'auto'|'on'|'off'; effort?: 'low'|'medium'|'high'|'max' } cache?: CacheConfig; // 'auto' | 'off' | { system?, tools?, ttl? } serviceTier?: ServiceTier; providerOptions?: Record<string, unknown>; audio?: AudioOptions; outputModalities?: Array<'text' | 'audio'>; previousResponseId?: string; // Responses/Interactions API chain continuation timeout?: number; signal?: AbortSignal;}src/llm/types/response.ts
Section titled “src/llm/types/response.ts”interface CompletionResponse { id: string; model: string; content: ContentPart[]; finishReason: FinishReason; // 'stop' | 'tool_use' | 'length' | 'content_filter' | 'error' usage: Usage; text: string; // convenience: joined text parts toolCalls: ToolCallPart[]; thinking: string | null; media: MediaOutputPart[]; latencyMs: number; raw: unknown; // provider's raw HTTP response body}
interface Usage { inputTokens: number; outputTokens: number; totalTokens: number; cachedTokens: number; cacheWriteTokens: number; reasoningTokens: number; audioInputTokens?: number; audioOutputTokens?: number; serviceTier?: string; // raw provider tier name (e.g. 'batch', 'priority') pricingTier?: string; // adapter-normalized tier key → for cost lookup}src/llm/types/stream.ts
Section titled “src/llm/types/stream.ts”type StreamEvent = | { type: 'text'; text: string } | { type: 'thinking'; text: string } | { type: 'tool_call_start'; id: string; name: string; _meta?: Record<string, unknown> } | { type: 'tool_call_delta'; id: string; arguments: string } | { type: 'tool_call_end'; id: string } | { type: 'usage'; usage: Usage } | { type: 'done'; finishReason: string } | { type: 'error'; error: Error } | { type: 'media_start'; mediaType: 'image'|'audio'|'video'; mimeType: string } | { type: 'media_chunk'; data: string; progress?: number } | { type: 'media_end'; mediaId?: string };LLMClient class (src/llm/client.ts)
Section titled “LLMClient class (src/llm/client.ts)”Immutable fields set at construction: id (UUID), sessionId, provider,
model, system, hooks, api (ApiType), mode ('foreground'|'background'),
batchable, adapter (ProviderAdapter), fetchFn, fetchStreamFn,
priority, queueName, configName, cacheName, cacheKeyFn, catalog.
resolveApi (src/llm/client-internal.ts)
Section titled “resolveApi (src/llm/client-internal.ts)”Provider defaults when api is omitted or 'auto':
anthropic→'messages'openai→'responses'google→'generate'xai→'responses'openrouter→'completions'
resolveAdapter
Section titled “resolveAdapter”If config.adapter is a function (factory), calls adapter(provider, apiKey, api, baseURL) to get the concrete ProviderAdapter. Otherwise uses the object directly.
Input normalization (src/llm/client-internal.ts)
Section titled “Input normalization (src/llm/client-internal.ts)”normalizeInput(input):
string→[{ role: 'user', content: string }]ContentPart[](first element lacksrole) →[{ role: 'user', content: parts }]Message[]→ used as-is
extractSystem(messages): lifts every role: 'system' message out of the
messages array and concatenates their text content. Adapters never see
role: 'system'; they receive only system as a top-level field in
NormalizedRequest. This makes per-call system text work on all providers
(Anthropic rejects role: 'system' in the messages array).
System composition in complete():
composedSystem = [options.system, systemFromMessages, this.system] .filter(truthy) .join('\n\n')complete() flow (src/llm/client.ts:147)
Section titled “complete() flow (src/llm/client.ts:147)”normalizeInput(input)→rawMessages.extractSystem(rawMessages)→{ system: systemFromMessages, messages }.- Compose
composedSystem(three-way join above). buildContext(this, options)→RequestContextwith mintedrequestId(req_<12-char UUID>),callId(call_<8-char UUID>),sessionId,queueName,configName,cacheName,cacheKey.- Build
NormalizedRequestfrom config + per-call options. - Emit
onMessageResolve(async). Handlers (FilesRegistry, ContextGuard) may:- Mutate
resolveCtx.messagesandresolveCtx.systemin place. - Set
resolveCtx.abort = true(+ optionalabortReason) to cancel.
- Mutate
- Re-anchor
normalized.messagesandnormalized.systemfrom the (possibly mutated)resolveCtx. - Server-state resolution: unless
previousResponseIdis explicitly set oroptions.stateful === false, callresolveServerState(...)to determine whether to sendprevious_response_idand trim the messages array to just the new turn. adapter.buildRequest(normalized)→ProviderHttpRequest.- Compute
url = adapter.baseURL() + (providerReq.path ?? adapter.completionPath()). cacheKeyFnapplied if configured.- Emit
onBeforeSubmit(async). Cache plugin may setintercepted = true+resultPromiseto bypass the network entirely. - If intercepted:
rawResult = await submitCtx.resultPromise, wrap inHttpResponse. - Else: build
HttpRequest(mergingadapter.authHeaders()+providerReq.headers), callfetchFn(httpReq, { queueName, priority, estimatedTokens }). adapter.parseResponse(response.body, latencyMs)→CompletionResponse.- Emit
onCompletion(async) with full request + response metadata. - Return
CompletionResponse.
estimatedInputTokens (passed to the queue for rate-limit token accounting):
Math.ceil(JSON.stringify(normalized.messages).length / 4).
stream() flow (src/llm/client.ts:310)
Section titled “stream() flow (src/llm/client.ts:310)”Steps 1–7 identical to complete(). No server-state resolution. No
onBeforeSubmit (caching streaming is out of scope). Then:
adapter.buildRequest(normalized)→providerReq.adapter.enableStreaming?.(providerReq, normalized)— adapter mutates the body (e.g. setsstream: true).- Build
HttpRequestwithstream: true. - Accumulate as the SSE stream flows:
for await (const sseEvent of fetchStreamFn(httpReq, ...)) {const events = adapter.parseStreamEvent(sseEvent);for (const event of events) {if (event.type === 'text') text += event.text;if (event.type === 'thinking') thinking += event.text;if (event.type === 'usage') usage = event.usage;if (event.type === 'done') finishReason = event.finishReason;yield event;}}
- Synthesize a
CompletionResponsefrom accumulatedtext,usage,finishReason. - Emit
onCompletion(async) — same hook ascomplete(), soCostCollectorandContextMeasurerprice and measure streamed calls identically.
structuredComplete<T>() wraps complete() with structured: { schema } and
then calls parseStructured<T>(res.text) (from client-internal.ts): strips
leading/trailing markdown fences (```), then JSON.parse.
Provider adapters
Section titled “Provider adapters”Five provider directories: src/llm/providers/{anthropic,openai,google,xai,openrouter}/.
Anthropic — src/llm/providers/anthropic/messages.ts
Section titled “Anthropic — src/llm/providers/anthropic/messages.ts”AnthropicAdapter implements the Messages API (/v1/messages).
buildRequest:
- Maps
NormalizedRequest.messagesto Anthropic message blocks viabuildMessage(msg, req, forceCache). cache: 'auto'addscache_control: { type: 'ephemeral' }to the last message’s last block (conversation prefix) and to the system and tools arrays.thinking→{ type: 'enabled', budget_tokens: N }whereNis mapped fromeffortviaANTHROPIC_THINKING_BUDGETS. Liftsmax_tokensabovebudget_tokens.structured→output_config.format.type = 'json_schema'.- File refs (source type
provider_reforfile) in content parts trigger theanthropic-beta: files-api-2025-04-14header. - Tool role
'tool'is remapped to'user'(Anthropic’s wire format). web_searchbuiltin maps to{ type: 'web_search_20250305', name: 'web_search' }.- Service tier:
'auto'→'auto';'standard'→'standard_only';'priority'→'auto';'flex'/'scale'→'standard_only'or'auto'.
parseResponse: reads content[] blocks; type: 'text' → TextPart,
type: 'thinking' → sets thinking, type: 'tool_use' → ToolCallPart.
Usage: input_tokens, output_tokens, cache_read_input_tokens,
cache_creation_input_tokens.
parseStreamEvent dispatches on data.type:
content_block_delta+text_delta→{ type: 'text' }content_block_delta+thinking_delta→{ type: 'thinking' }content_block_delta+input_json_delta→{ type: 'tool_call_delta' }content_block_start+tool_useblock →{ type: 'tool_call_start' }message_delta→{ type: 'usage' }+{ type: 'done' }message_start→{ type: 'usage' }(initial usage with input tokens)
In-browser: adds anthropic-dangerous-direct-browser-access: true header
(detected via isBrowser() from src/runtime/runtime.ts).
OpenAI Responses — src/llm/providers/openai/responses.ts
Section titled “OpenAI Responses — src/llm/providers/openai/responses.ts”OpenAIResponsesAdapter targets the Responses API (/v1/responses).
buildRequest:
- Converts
messages→ flatinput[]array of typed items:- User/system text →
{ role, content: string }or content-part array withinput_text,input_image,input_filetypes. - Assistant →
{ type: 'message', role: 'assistant', content: [{ type: 'output_text' }] }and tool calls as{ type: 'function_call', id: 'fc_<id>', call_id, name, arguments }. - Tool results →
{ type: 'function_call_output', call_id, output }.
- User/system text →
system→instructions.previousResponseId→previous_response_id(server-state chain).- Function tools →
{ type: 'function', name, description, parameters, strict }. - Builtin tools pass through with
...t.params;code_interpreterdefaultscontainer = { type: 'auto' }when absent. structured→text.format = { type: 'json_schema', name, schema, strict }.thinking→reasoning = { effort, summary: 'auto' }.
parseResponse: iterates output[] items; type: 'message' → text content;
type: 'reasoning' → extracts summary text as thinking; type: 'function_call'
→ ToolCallPart; type: 'image_generation_call' → ImageOutputPart.
parseStreamEvent dispatches on SSE event types prefixed response.:
response.output_text.delta→{ type: 'text' }response.function_call_arguments.delta→{ type: 'tool_call_delta' }response.output_item.addedwithfunction_call→{ type: 'tool_call_start' }response.output_item.done→tool_call_end,media_end, orthinkingresponse.image_generation_call.partial_image→{ type: 'media_chunk' }response.completed→usage+done
Inline file data requires a filename field with the correct extension
(filenameForMime helper at responses.ts:34).
OpenAI Completions — src/llm/providers/openai/completions.ts
Section titled “OpenAI Completions — src/llm/providers/openai/completions.ts”Legacy Chat Completions (/v1/chat/completions). Maps messages directly to
OpenAI’s role: user|assistant|tool format. Tool calls serialized as
{ id, type: 'function', function: { name, arguments } }.
Google — src/llm/providers/google/generate.ts
Section titled “Google — src/llm/providers/google/generate.ts”Targets generateContent (/v1beta/models/{model}:generateContent). Maps
messages to contents[] with role: user|model. system → systemInstruction.
Tools → tools[].functionDeclarations[].
xAI and OpenRouter
Section titled “xAI and OpenRouter”xAI (src/llm/providers/xai/) supports both responses.ts and
completions.ts (same Responses API shape as OpenAI). OpenRouter
(src/llm/providers/openrouter/) uses Chat Completions with provider-passthrough.
Shared utilities (src/llm/providers/_shared/)
Section titled “Shared utilities (src/llm/providers/_shared/)”response-utils.ts:extractFinishReason(hasToolCalls, stopReason, statusMap)— maps provider stop reason strings toFinishReason. IfhasToolCallsis true, always returns'tool_use'.constants.ts:DEFAULT_MAX_TOKENS = 4096.
parseUsage is NOT shared — each provider’s usage schema differs too much.
Queue routing and priority
Section titled “Queue routing and priority”queueName defaults to "${provider}/${model}". Overridable via:
LLMClientConfig.queueName(construction time)RequestContext.queueName(per-call viaoptions.ctx.queueName)
Priority: mode: 'background' → PRIORITY_BACKGROUND = 2;
foreground → PRIORITY_INTERACTIVE = 1. Retries use Priority.RETRY = 0.
RequestContext propagation (src/types/request-context.ts)
Section titled “RequestContext propagation (src/types/request-context.ts)”buildContext(client, options) in client-internal.ts mints:
sessionIdfromclient.sessionId(or override fromoptions.ctx)requestIdminted asreq_<12-char>if not already setcallIdminted ascall_<8-char>if not already setqueueName,configName,cacheName,cacheKey
The trace object attached to every HttpRequest is
{ sessionId, requestId, callId } from this context.
Server-state (Responses API chain)
Section titled “Server-state (Responses API chain)”resolveServerState (src/llm/server-state.ts) checks:
- Last assistant message in history has
origin.serverStateIdandorigin.provider === this.provider. catalog.supportsPreviousResponseId(provider, model)returns true.- State retention period has not elapsed (uses
catalog.getStateRetention). catalog.isStateModelBound— if true, the model must match across turns.
When all pass: sets normalized.previousResponseId = origin.serverStateId and
trims normalized.messages to only the new turn (the provider reconstructs
context from its stored state).
Extension points — adding a provider
Section titled “Extension points — adding a provider”- Create
src/llm/providers/<name>/with a class implementingProviderAdapter. - Add a
catalog.jsonin that directory;ModelCatalog.loadProviderDefaults()auto-loads it. - Register the new
ProviderNameinsrc/llm/types/provider.ts. - Add a default
ApiTypeinresolveApi(client-internal.ts). - Export an adapter factory from the provider directory’s index.
Key invariants
Section titled “Key invariants”LLMClientis immutable after construction:provider,model,apiKey,system, andadapterare all fixed.- It never calls
fetchdirectly; it always calls the injectedEngineFetch. onMessageResolveis the only hook where handlers may mutate the request in place. All other hooks are observational.onBeforeSubmitis the only interception point for the Cache plugin to short-circuit a network call.role: 'system'messages are extracted byextractSystembefore any adapter sees the messages array — adapters never receiverole: 'system'.onCompletionfires for bothcomplete()andstream()(with a synthesized response for the streaming case).CostCollectorsubscribes to this single event.