Async batch job

Submit multiple prompts as an async batch job, poll until the provider completes all of them (up to 24 hours), and retrieve per-item results with success flags, text, and full usage data. Cost-collector hooks fire per completed item so the budget meter stays accurate. Supported on OpenAI, Anthropic, and Google.

Batch APIs are designed for workloads where latency does not matter but cost does:

  • Bulk classification or extraction — process thousands of documents overnight.
  • Offline evaluation — run a prompt benchmark suite without paying synchronous API prices.
  • Large-scale enrichment — annotate a dataset with model outputs without blocking.

Provider batch discounts:

  • OpenAI: ~50% off synchronous pricing on supported models.
  • Anthropic: 50% off on Claude models.
  • Google: pricing varies (Gemini batch is experimental; check current pricing).

Without the SDK, the raw OpenAI flow is: build a JSONL file, upload it with files.create(), call batches.create(), poll batches.retrieve() until status === 'completed', read the output file id from the batch object, download the file content, split it on newlines, JSON-parse each line, and handle per-line errors. For Anthropic and Google the flow differs. Total: 50-100 lines per provider.
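
To make the two JSONL ends of that raw flow concrete, here is a minimal sketch of building the input file and parsing the output file. The line shapes follow OpenAI's documented batch format; the helper names buildJsonl and parseOutputJsonl and the RawItem and ParsedLine types are hypothetical and are not part of the SDK. Upload, batch creation, and polling are left out.

```typescript
// Hypothetical shapes for illustration; not part of @combycode/llm-sdk.
interface RawItem {
  customId: string;
  prompt: string;
  maxTokens: number;
}

interface ParsedLine {
  customId: string | null;
  ok: boolean;
  body: unknown;
  error: string | null;
}

// Input file: one JSON object per line with custom_id, method, url, and body.
function buildJsonl(model: string, items: RawItem[]): string {
  return items
    .map((item) =>
      JSON.stringify({
        custom_id: item.customId,
        method: 'POST',
        url: '/v1/responses',
        body: { model, input: item.prompt, max_output_tokens: item.maxTokens },
      }),
    )
    .join('\n');
}

// Output file: parse each line independently so that one malformed line
// does not discard the rest of the batch.
function parseOutputJsonl(text: string): ParsedLine[] {
  const parsed: ParsedLine[] = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank and trailing lines
    try {
      const row = JSON.parse(line);
      const status = row.response?.status_code;
      const ok = status === 200 && row.error == null;
      parsed.push({
        customId: row.custom_id ?? null,
        ok,
        body: row.response?.body ?? null,
        error: ok ? null : JSON.stringify(row.error ?? { status }),
      });
    } catch (err) {
      parsed.push({
        customId: null,
        ok: false,
        body: null,
        error: `unparseable line: ${(err as Error).message}`,
      });
    }
  }
  return parsed;
}
```

Even this reduced version needs two helpers and a per-line try/catch, which is the work batch() and submitBatch() are meant to absorb.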

Step 1 — Auto mode: submit and wait in one call

import { batch } from '@combycode/llm-sdk';

const results = await batch({
  model: 'openai/gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
  requests: [
    { customId: 'classify-1', prompt: 'Classify as positive or negative: "I love this product"', maxTokens: 8 },
    { customId: 'classify-2', prompt: 'Classify as positive or negative: "Terrible experience"', maxTokens: 8 },
    { customId: 'classify-3', prompt: 'Classify as positive or negative: "It was okay"', maxTokens: 8 },
  ],
});

for (const r of results) {
  console.log(r.customId, r.success ? r.text.trim() : `ERROR: ${r.error}`);
}
// classify-1 positive
// classify-2 negative
// classify-3 neutral

batch() blocks until all items complete (or until timeoutMs expires). The default poll interval is 5 seconds; the default timeout is 24 hours (the provider batch window).
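
The interaction between the poll interval and the timeout can be sketched as a plain loop. This is not the SDK's implementation: pollUntilDone is a hypothetical helper, the status source is injected as a function, and only the two defaults (5 seconds, 24 hours) and the status names are taken from this page.

```typescript
type Status = 'in_progress' | 'completed' | 'failed' | 'expired' | 'cancelled';

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper, not an SDK export: polls until a terminal status
// is seen or the deadline would be exceeded by the next wait.
async function pollUntilDone(
  getStatus: () => Promise<Status>,
  pollIntervalMs = 5_000,
  timeoutMs = 24 * 60 * 60 * 1000,
): Promise<Status> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (status !== 'in_progress') return status; // any other status is terminal
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`batch not finished within ${timeoutMs} ms`);
    }
    await sleep(pollIntervalMs);
  }
}
```

A shorter interval finishes sooner after the provider completes but issues more status requests; for batches expected to take hours, a longer interval is usually the cheaper choice.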

Step 2 — Manual mode: submit, persist, resume

For long-running batches where your process may restart:

import { submitBatch, batchJob } from '@combycode/llm-sdk';

// Submit and immediately save the job id
const job = await submitBatch({
  model: 'anthropic/claude-haiku-4-5',
  apiKey: process.env.ANTHROPIC_API_KEY,
  requests: [
    { customId: 'doc-1', prompt: 'Summarise: ...', maxTokens: 128 },
    { customId: 'doc-2', prompt: 'Summarise: ...', maxTokens: 128 },
  ],
});
console.log(job.id); // 'batch_abc123...'
console.log(job.provider); // 'anthropic'
// Save job.id + job.provider to your database, queue, or env var.
// Your process can restart here -- the batch continues running on the provider.

// Later (different process, different server):
const restored = batchJob({
  id: 'batch_abc123...',
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY,
});
const status = await restored.status();
console.log(status.status); // 'in_progress' | 'completed' | 'failed' | 'expired' | 'cancelled'
if (status.status === 'completed') {
  const results = await restored.results();
  console.log(results.length); // 2
}
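
One simple way to persist the job reference between processes is a small JSON file. The sketch below assumes a local file is acceptable storage; saveJobRef, loadJobRef, and the JobRef shape are hypothetical names introduced for illustration, and a database row or queue message would work the same way.

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

// Hypothetical record shape: the two fields batchJob() needs besides the API key.
interface JobRef {
  id: string;
  provider: string;
}

// Write the reference right after submitBatch() returns.
function saveJobRef(path: string, ref: JobRef): void {
  writeFileSync(path, JSON.stringify(ref), 'utf8');
}

// Read it back in the resuming process, rejecting malformed files early.
function loadJobRef(path: string): JobRef {
  const parsed = JSON.parse(readFileSync(path, 'utf8'));
  if (typeof parsed?.id !== 'string' || typeof parsed?.provider !== 'string') {
    throw new Error(`invalid job reference in ${path}`);
  }
  return { id: parsed.id, provider: parsed.provider };
}
```

The loaded id and provider can then be passed to batchJob() together with the API key, which is deliberately kept out of the file.
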

To control polling and observe progress, pass options to job.wait():

import { submitBatch } from '@combycode/llm-sdk';

const job = await submitBatch({
  model: 'google/gemini-2.0-flash',
  apiKey: process.env.GOOGLE_API_KEY,
  requests: Array.from({ length: 50 }, (_, i) => ({
    customId: `item-${i}`,
    prompt: `Translate to French: item number ${i}`,
    maxTokens: 32,
  })),
});

const results = await job.wait({
  pollIntervalMs: 10_000, // check every 10 seconds
  timeoutMs: 2 * 60 * 60 * 1000, // 2 hour limit
  onProgress: (status) => {
    console.log(`[${new Date().toISOString()}] status: ${status.status}`);
  },
});
console.log(results.filter((r) => r.success).length, '/ 50 succeeded');

To cancel a submitted batch:

const job = await submitBatch({ model: 'openai/gpt-4o-mini', requests: [...] });
// Change of plans:
await job.cancel();

Cancellation is best-effort — items already processed by the provider may still appear in results.

The SDK emits one onCostEntry hook per successfully-parsed batch item (cost accrues when results are downloaded, not at submit time). Listen on the engine:

import { batch, createEngine } from '@combycode/llm-sdk';

const engine = createEngine({
  apiKeys: { openai: process.env.OPENAI_API_KEY! },
});

// Fires once per successfully parsed item, when results are downloaded.
engine.hooks.on('onCostEntry', ({ entry }) => {
  console.log(entry.tags.customId, entry.cost?.totalUsd?.toFixed(6));
});

const results = await batch({
  model: 'openai/gpt-4o-mini',
  engine,
  requests: [{ customId: 'q1', prompt: 'Hello', maxTokens: 16 }],
});
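
A budget meter fed by that hook might look like the sketch below. It relies only on the two fields the example above reads (entry.tags.customId and entry.cost?.totalUsd); CostEntryLike and BudgetMeter are hypothetical names, and the real entry type may carry more fields.

```typescript
// Minimal local mirror of the fields used above; an assumption, not the SDK type.
interface CostEntryLike {
  tags: { customId?: string };
  cost?: { totalUsd?: number };
}

// Hypothetical accumulator: tracks total spend and spend per customId.
class BudgetMeter {
  private spentUsd = 0;
  private readonly perItem = new Map<string, number>();
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Entries without a cost are counted as zero rather than rejected.
  record(entry: CostEntryLike): void {
    const usd = entry.cost?.totalUsd ?? 0;
    const id = entry.tags.customId ?? 'unknown';
    this.spentUsd += usd;
    this.perItem.set(id, (this.perItem.get(id) ?? 0) + usd);
  }

  get total(): number {
    return this.spentUsd;
  }

  get overBudget(): boolean {
    return this.spentUsd > this.limitUsd;
  }

  costOf(customId: string): number {
    return this.perItem.get(customId) ?? 0;
  }
}
```

Wiring it up would be a single line, for example engine.hooks.on('onCostEntry', ({ entry }) => meter.record(entry)). Because entries only arrive at download time, the meter stays at zero while the batch is still running on the provider.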

BatchRequestInput — one item per request:

| Field | Type | Notes |
| --- | --- | --- |
| customId | string | Correlation id returned in results. Defaults to req-0, req-1, … when omitted. |
| prompt | string \| ContentPart[] \| Message[] | Same shapes as complete(). |
| system | string | Per-item system prompt override. |
| maxTokens | number | Per-item max output tokens. |
| temperature | number | Per-item temperature. |
| structured | { schema, name? } | Per-item JSON schema output. |

SubmitBatchOptions — shared settings:

| Option | Type | Notes |
| --- | --- | --- |
| model | string | Namespaced (openai/gpt-4o-mini) or bare with provider. |
| provider | ProviderName | Required when model is bare. |
| apiKey | string | Falls back to engine.apiKeys[provider]. |
| requests | BatchRequestInput[] | The items to submit. |
| engine | EngineHandle | Override global engine. |

WaitOptions — for batch() and job.wait():

| Option | Type | Default | Notes |
| --- | --- | --- | --- |
| pollIntervalMs | number | 5000 (5s) | How often to call job.status(). |
| timeoutMs | number | 86400000 (24h) | Throws an Error if the batch is not done by this deadline. |
| onProgress | (status: BatchStatus) => void | undefined | Called after each status poll. |

BatchJob handle — manual mode:

| Member | Returns | Notes |
| --- | --- | --- |
| job.id | string | Provider batch id. Persist this for resume. |
| job.provider | ProviderName | Provider name. Needed for batchJob() resume. |
| job.status() | Promise<BatchStatus> | Current status without blocking. |
| job.results() | Promise<BatchItemResult[]> | Throws if not yet terminal. |
| job.wait(opts?) | Promise<BatchItemResult[]> | Poll + block until complete. |
| job.cancel() | Promise<void> | Request cancellation. |

BatchItemResult — per-item output:

| Field | Type | Notes |
| --- | --- | --- |
| customId | string | Matches the input customId (or auto-generated req-N). |
| success | boolean | true when the provider confirms success and the response parsed correctly. |
| text | string | Parsed reply text. Empty string on failure. |
| response | CompletionResponse \| null | Full normalised response including usage and raw. null on failure. |
| error | string \| null | Error message when success is false. |

Provider support:

| Provider | Batch mechanism | Notes |
| --- | --- | --- |
| OpenAI | /v1/batches + JSONL | Built on Responses API (/v1/responses). ~50% discount on supported models. |
| Anthropic | /v1/messages/batches | Native batch API. 50% discount. |
| Google | Batch prediction API | Uses GoogleBatchAdapter. Experimental; pricing varies. |
| xAI | Not supported | batch() throws for xai provider. |
| OpenRouter | Not supported | OpenRouter does not expose a batch endpoint. |

Google custom id mapping: Google keys results by index, not by the submitted customId. The SDK maps results back to the original ids using submission order. Results are returned in the same order as the requests array.
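
The order-based mapping can be illustrated with a short sketch. The IndexedResult shape is an assumption made for the example and does not reflect Google's actual response format or the SDK's internal code; remapByOrder is a hypothetical helper.

```typescript
// Assumed shape: a result that only knows its position in the submission.
interface IndexedResult {
  index: number;
  text: string;
}

interface MappedResult {
  customId: string;
  text: string;
}

// Hypothetical helper: restores customIds from submission order and returns
// results sorted to match the original requests array.
function remapByOrder(customIds: string[], results: IndexedResult[]): MappedResult[] {
  return results
    .slice() // do not mutate the caller's array
    .sort((a, b) => a.index - b.index)
    .map((r) => {
      const customId = customIds[r.index];
      if (customId === undefined) {
        throw new Error(`no request was submitted at index ${r.index}`);
      }
      return { customId, text: r.text };
    });
}
```

The practical consequence is that the ids are only as reliable as the ordering: anything that reorders or drops requests between submission and download would break the correlation.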

import { batch } from '@combycode/llm-sdk';

// Unified explicit batch: describe each request like a complete() call, await all.
// (Official samples hand-build provider JSONL/inline bodies, upload a file, create
// the batch, and poll the status loop by hand.)
const t0 = performance.now();
const results = await batch({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  requests: [
    { customId: 'a', prompt: 'Say apple.', maxTokens: 16 },
    { customId: 'b', prompt: 'Say banana.', maxTokens: 16 },
  ],
});

// Individual results are correlatable by customId.
const count = results.filter((r) => r.success).length;
console.log(JSON.stringify({ result: String(count), ms: Math.round(performance.now() - t0) }));

OpenAI’s official SDK exposes the raw multi-step flow: upload JSONL file, create batch, poll, download output file, parse JSONL. Anthropic’s is similar but with different endpoint shapes. Google’s batch is a separate prediction API with different semantics. ORXA wraps all three behind the same submitBatch() / batch() / batchJob() API. The BatchJobImpl class handles per-provider result parsing, customId remapping (Google), and onCostEntry hook emission — cost tracking works the same for batch and synchronous calls.

Batch turnaround is typically minutes to hours, not seconds. Providers process batches on spare capacity. A batch of 1000 items may complete in 10 minutes or 12 hours. Design accordingly — batch() auto mode will block your process for the full duration. Prefer manual mode for anything longer than a few minutes.

results() throws if the batch is not terminal. Always check status() first or use wait() to block until done. Terminal statuses are completed, failed, expired, and cancelled.
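
A small guard keeps callers from hitting that exception. The sketch below is written against a minimal JobLike interface that mirrors the status() and results() members listed in the BatchJob table; resultsIfReady and JobLike are hypothetical names, not SDK exports.

```typescript
type BatchState = 'in_progress' | 'completed' | 'failed' | 'expired' | 'cancelled';

// The four terminal statuses named above.
const TERMINAL: ReadonlySet<BatchState> = new Set<BatchState>([
  'completed',
  'failed',
  'expired',
  'cancelled',
]);

// Minimal stand-in for the job handle; only the two members used here.
interface JobLike<T> {
  status(): Promise<{ status: BatchState }>;
  results(): Promise<T[]>;
}

// Returns the results when the batch is terminal, or null when it is still running.
async function resultsIfReady<T>(job: JobLike<T>): Promise<T[] | null> {
  const { status } = await job.status();
  return TERMINAL.has(status) ? job.results() : null;
}
```

Returning null instead of throwing suits a cron-style checker that runs periodically and simply tries again on the next tick.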

Failed items don’t throw. A partially-failed batch returns a BatchItemResult[] where some items have success: false. Inspect each item’s error field. The batch itself is considered completed by the provider even if some items failed.
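
Handling a partially failed batch usually starts by splitting the results. The sketch below uses a local ItemResult type that copies four fields from the BatchItemResult table; partitionResults is a hypothetical helper.

```typescript
// Local copy of the BatchItemResult fields needed here.
interface ItemResult {
  customId: string;
  success: boolean;
  text: string;
  error: string | null;
}

interface Partitioned {
  succeeded: ItemResult[];
  failed: ItemResult[];
  retryIds: string[]; // customIds to resubmit in a follow-up batch
}

// Hypothetical helper: separates successes from failures without throwing.
function partitionResults(results: ItemResult[]): Partitioned {
  const succeeded = results.filter((r) => r.success);
  const failed = results.filter((r) => !r.success);
  return { succeeded, failed, retryIds: failed.map((r) => r.customId) };
}
```

The retryIds list can be matched against the original requests array to build a smaller follow-up batch containing only the failed items.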

Cost fires at download time, not submit time. The onCostEntry hook emits when results() or wait() parses each item. If your process restarts between submit and download, the cost for items already processed by the provider fires only when you call results() on the restored batchJob() handle.

OpenAI batch uses the Responses API internally. OpenAIBatchAdapter builds requests with OpenAIResponsesAdapter and parses responses with the same adapter. The batch JSONL body contains Responses API format, not Chat Completions format.
