Async batch job

What you will achieve

Submit multiple prompts as an async batch job, poll until the provider completes all of them (up to 24 hours), and retrieve per-item results with success flags, text, and full usage data. Cost-collector hooks fire per completed item so the budget meter stays accurate. Supported on OpenAI, Anthropic, and Google.

When and why

Batch APIs are designed for workloads where latency does not matter but cost does:

Bulk classification or extraction — process thousands of documents overnight.
Offline evaluation — run a prompt benchmark suite without paying synchronous API prices.
Large-scale enrichment — annotate a dataset with model outputs without blocking.

Provider batch discounts:

OpenAI: ~50% off synchronous pricing on supported models.
Anthropic: 50% off on Claude models.
Google: pricing varies (Gemini batch support is experimental; check current pricing).
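
As a quick illustration of what such a discount means for a budget, the arithmetic can be sketched as follows. The helper name `batchCostUsd` and the 12 USD workload are hypothetical; only the "50% off" figure comes from the list above.

```typescript
// discount is the fraction taken off the synchronous price, e.g. 0.5 for "50% off".
function batchCostUsd(syncCostUsd: number, discount: number): number {
  if (discount < 0 || discount > 1) {
    throw new RangeError('discount must be between 0 and 1');
  }
  return syncCostUsd * (1 - discount);
}

// Hypothetical workload that would cost 12 USD at synchronous prices.
const syncCostUsd = 12;
console.log(batchCostUsd(syncCostUsd, 0.5)); // 6
```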

The raw work without the SDK, taking the OpenAI flow as the example: build a JSONL file, upload it with files.create(), call batches.create(), poll batches.retrieve() until status === 'completed', read the output file id from the batch object, download that file's content, split it on newlines, JSON-parse each line, and handle per-line errors. For Anthropic and Google the flow differs. Total: 50-100 lines per provider.
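
The last three steps of that manual flow (split on newlines, JSON-parse each line, handle per-line errors) might look roughly like the sketch below. The line shape (`custom_id`, `response`, `error`) and the function name `parseBatchOutput` are assumptions for illustration; each provider's real output format differs, so check its documentation.

```typescript
// Hypothetical shape of one output line; real provider formats differ.
interface RawOutputLine {
  custom_id: string;
  response?: unknown;
  error?: { message: string } | null;
}

interface ParsedLine {
  customId: string | null;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Parse each JSONL line independently so that one malformed line
// does not discard the rest of the batch.
function parseBatchOutput(jsonl: string): ParsedLine[] {
  const parsed: ParsedLine[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines, e.g. a trailing newline
    try {
      const raw = JSON.parse(line) as RawOutputLine;
      if (raw.error) {
        parsed.push({ customId: raw.custom_id, ok: false, error: raw.error.message });
      } else {
        parsed.push({ customId: raw.custom_id, ok: true, value: raw.response });
      }
    } catch (err) {
      parsed.push({ customId: null, ok: false, error: `unparseable line: ${(err as Error).message}` });
    }
  }
  return parsed;
}

// Made-up sample payload: one success, one item error, one corrupt line.
const sample = [
  '{"custom_id":"a","response":{"text":"positive"}}',
  '{"custom_id":"b","error":{"message":"rate limited"}}',
  'not json',
  '',
].join('\n');

const parsedSample = parseBatchOutput(sample);
console.log(parsedSample.map((p) => `${p.customId ?? '?'}:${p.ok}`).join(' '));
// a:true b:false ?:false
```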

Step by step

Step 1 — Auto mode: submit and wait in one call

import { batch } from '@combycode/llm-sdk';

const results = await batch({
  model: 'openai/gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
  requests: [
    { customId: 'classify-1', prompt: 'Classify as positive or negative: "I love this product"', maxTokens: 8 },
    { customId: 'classify-2', prompt: 'Classify as positive or negative: "Terrible experience"', maxTokens: 8 },
    { customId: 'classify-3', prompt: 'Classify as positive or negative: "It was okay"', maxTokens: 8 },
  ],
});

for (const r of results) {
  console.log(r.customId, r.success ? r.text.trim() : `ERROR: ${r.error}`);
}
// Example output (model-dependent):
// classify-1  positive
// classify-2  negative
// classify-3  neutral

batch() blocks until all items complete (or until timeoutMs expires). The default poll interval is 5 seconds; the default timeout is 24 hours (the provider batch window).
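
That blocking behaviour amounts to a poll loop with a deadline. The following is a minimal sketch of the pattern, not the SDK's actual implementation: `pollUntilTerminal`, the `getStatus` callback, and the fake status source are hypothetical names. Only the status names and the 5 second / 24 hour defaults are taken from this guide.

```typescript
type BatchState = 'in_progress' | 'completed' | 'failed' | 'expired' | 'cancelled';

const TERMINAL: ReadonlySet<BatchState> = new Set<BatchState>([
  'completed', 'failed', 'expired', 'cancelled',
]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll getStatus() until it reports a terminal state or the deadline would be exceeded.
async function pollUntilTerminal(
  getStatus: () => Promise<BatchState>,
  pollIntervalMs = 5_000,
  timeoutMs = 24 * 60 * 60 * 1000,
): Promise<BatchState> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const state = await getStatus();
    if (TERMINAL.has(state)) return state;
    // Give up if the next poll would land past the deadline.
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`batch not finished after ${timeoutMs} ms`);
    }
    await sleep(pollIntervalMs);
  }
}

// Fake status source for demonstration: 'in_progress' twice, then 'completed'.
function fakeStatusSource(): () => Promise<BatchState> {
  let calls = 0;
  return async () => (++calls < 3 ? 'in_progress' : 'completed');
}

pollUntilTerminal(fakeStatusSource(), 10, 1_000).then((state) => {
  console.log(state); // completed
});
```

Note that a terminal state is not necessarily a successful one; the caller still has to distinguish 'completed' from 'failed', 'expired', and 'cancelled'.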

Step 2 — Manual mode: submit, persist, resume

For long-running batches where your process may restart:

import { submitBatch, batchJob } from '@combycode/llm-sdk';

// Submit and immediately save the job id
const job = await submitBatch({
  model: 'anthropic/claude-haiku-4-5',
  apiKey: process.env.ANTHROPIC_API_KEY,
  requests: [
    { customId: 'doc-1', prompt: 'Summarise: ...', maxTokens: 128 },
    { customId: 'doc-2', prompt: 'Summarise: ...', maxTokens: 128 },
  ],
});

console.log(job.id);       // 'batch_abc123...'
console.log(job.provider); // 'anthropic'

// Save job.id + job.provider to your database, queue, or env var.
// Your process can restart here -- the batch continues running on the provider.

// Later (different process, different server):
const restored = batchJob({
  id: 'batch_abc123...',
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const status = await restored.status();
console.log(status.status); // 'in_progress' | 'completed' | 'failed' | 'expired' | 'cancelled'

if (status.status === 'completed') {
  const results = await restored.results();
  console.log(results.length); // 2
}
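
Persisting the job between processes only requires the two values named above. A minimal sketch of a serialise/restore pair follows; `JobRef`, `serializeJobRef`, and `parseJobRef` are hypothetical helpers (not SDK exports), the id is a placeholder, and the storage medium is left to you. The restored object would then be passed to batchJob() together with an API key.

```typescript
// The three providers this guide lists as supporting batch.
const BATCH_PROVIDERS = ['openai', 'anthropic', 'google'] as const;
type BatchProvider = (typeof BATCH_PROVIDERS)[number];

interface JobRef {
  id: string;
  provider: BatchProvider;
}

function serializeJobRef(job: JobRef): string {
  return JSON.stringify({ id: job.id, provider: job.provider });
}

// Validate on the way back in: stored data may be stale or hand-edited.
function parseJobRef(stored: string): JobRef {
  const data: unknown = JSON.parse(stored);
  if (typeof data !== 'object' || data === null) {
    throw new Error('job ref must be an object');
  }
  const { id, provider } = data as Record<string, unknown>;
  if (typeof id !== 'string' || id.length === 0) {
    throw new Error('job ref is missing id');
  }
  if (BATCH_PROVIDERS.indexOf(provider as BatchProvider) === -1) {
    throw new Error(`unsupported batch provider: ${String(provider)}`);
  }
  return { id, provider: provider as BatchProvider };
}

// Placeholder id for illustration only.
const storedRef = serializeJobRef({ id: 'batch_abc123', provider: 'anthropic' });
const restoredRef = parseJobRef(storedRef);
console.log(restoredRef.provider, restoredRef.id); // anthropic batch_abc123
```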

Step 3 — Poll with progress reporting

import { submitBatch } from '@combycode/llm-sdk';

const job = await submitBatch({
  model: 'google/gemini-2.0-flash',
  apiKey: process.env.GOOGLE_API_KEY,
  requests: Array.from({ length: 50 }, (_, i) => ({
    customId: `item-${i}`,
    prompt: `Translate to French: item number ${i}`,
    maxTokens: 32,
  })),
});

const results = await job.wait({
  pollIntervalMs: 10_000,  // check every 10 seconds
  timeoutMs: 2 * 60 * 60 * 1000, // 2 hour limit
  onProgress: (status) => {
    console.log(`[${new Date().toISOString()}] status: ${status.status}`);
  },
});

console.log(results.filter(r => r.success).length, '/ 50 succeeded');

Step 4 — Cancel a batch

import { submitBatch } from '@combycode/llm-sdk';

const job = await submitBatch({ model: 'openai/gpt-4o-mini', requests: [/* ... */] });

// Change of plans:
await job.cancel();

Cancellation is best-effort — items already processed by the provider may still appear in results.

Step 5 — Read per-item cost

The SDK emits one onCostEntry hook per successfully-parsed batch item (cost accrues when results are downloaded, not at submit time). Listen on the engine:

import { batch, createEngine } from '@combycode/llm-sdk';

const engine = createEngine({
  apiKeys: { openai: process.env.OPENAI_API_KEY! },
});
engine.hooks.on('onCostEntry', ({ entry }) => {
  console.log(entry.tags.customId, entry.cost?.totalUsd?.toFixed(6));
});

const results = await batch({
  model: 'openai/gpt-4o-mini',
  engine,
  requests: [{ customId: 'q1', prompt: 'Hello', maxTokens: 16 }],
});
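
To turn those per-item hook events into a running total, a small accumulator is enough. The sketch below rests on assumptions: the `CostEntryEvent` type models only the two fields the listener above reads (`entry.tags.customId` and `entry.cost?.totalUsd`), `createCostMeter` is a hypothetical helper, and the dollar amounts are made up. Its `record` function could be registered with engine.hooks.on('onCostEntry', ...).

```typescript
// Models only the fields read by the listener in this step; the real entry may have more.
interface CostEntryEvent {
  entry: {
    tags: { customId?: string };
    cost?: { totalUsd?: number };
  };
}

function createCostMeter() {
  const perItem = new Map<string, number>();
  let totalUsd = 0;
  return {
    // Intended for use as the onCostEntry listener.
    record({ entry }: CostEntryEvent): void {
      const usd = entry.cost?.totalUsd ?? 0; // entries without cost data count as 0
      const key = entry.tags.customId ?? '(untagged)';
      perItem.set(key, (perItem.get(key) ?? 0) + usd);
      totalUsd += usd;
    },
    total: (): number => totalUsd,
    forItem: (customId: string): number => perItem.get(customId) ?? 0,
  };
}

const meter = createCostMeter();
meter.record({ entry: { tags: { customId: 'q1' }, cost: { totalUsd: 0.25 } } });
meter.record({ entry: { tags: { customId: 'q2' }, cost: { totalUsd: 0.5 } } });
meter.record({ entry: { tags: { customId: 'q3' } } }); // no cost data
console.log(meter.total().toFixed(2)); // 0.75
```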

Your options

BatchRequestInput — one item per request:

Field	Type	Notes
`customId`	`string`	Correlation id returned in results. Defaults to `req-0`, `req-1`, … when omitted.
`prompt`	`string | ContentPart[] | Message[]`	Same shapes as `complete()`.
`system`	`string`	Per-item system prompt override.
`maxTokens`	`number`	Per-item max output tokens.
`temperature`	`number`	Per-item temperature.
`structured`	`{ schema, name? }`	Per-item JSON schema output.
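
If you need to know the ids before submitting (for example to pre-register them in a database), the `req-N` default can be reproduced by hand. A sketch, assuming N is the zero-based position in the requests array as the `req-0`, `req-1` pattern suggests; `RequestLike` and `withDefaultIds` are hypothetical.

```typescript
interface RequestLike {
  customId?: string;
  prompt: string;
}

// Fill in a positional id wherever the caller did not supply one.
function withDefaultIds(requests: RequestLike[]): Array<RequestLike & { customId: string }> {
  return requests.map((r, i) => ({ ...r, customId: r.customId ?? `req-${i}` }));
}

const unnamed: RequestLike[] = [
  { prompt: 'first' },
  { customId: 'mine', prompt: 'second' },
  { prompt: 'third' },
];
const named = withDefaultIds(unnamed);
console.log(named.map((r) => r.customId).join(', ')); // req-0, mine, req-2
```

Supplying your own ids everywhere avoids depending on this positional convention at all.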

SubmitBatchOptions — shared settings:

Option	Type	Notes
`model`	`string`	Namespaced (`openai/gpt-4o-mini`) or bare with `provider`.
`provider`	`ProviderName`	Required when model is bare.
`apiKey`	`string`	Falls back to `engine.apiKeys[provider]`.
`requests`	`BatchRequestInput[]`	The items to submit.
`engine`	`EngineHandle`	Override global engine.
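
The `model` / `provider` rule can be read as: a namespaced string carries its own provider, while a bare one needs `provider` alongside it. A sketch of that resolution, assuming the namespace is everything before the first `/`; `resolveModel` is a hypothetical helper and the SDK's own parsing may differ.

```typescript
interface ResolvedModel {
  provider: string;
  model: string;
}

function resolveModel(model: string, provider?: string): ResolvedModel {
  const slash = model.indexOf('/');
  if (slash > 0) {
    // Namespaced form, e.g. 'openai/gpt-4o-mini'.
    return { provider: model.slice(0, slash), model: model.slice(slash + 1) };
  }
  if (!provider) {
    throw new Error(`bare model "${model}" requires a provider`);
  }
  return { provider, model };
}

const namespaced = resolveModel('openai/gpt-4o-mini');
const bare = resolveModel('gpt-4o-mini', 'openai');
console.log(namespaced.provider === bare.provider && namespaced.model === bare.model); // true
```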

WaitOptions — for batch() and job.wait():

Option	Type	Default	Notes
`pollIntervalMs`	`number`	`5000` (5s)	How often to call `job.status()`.
`timeoutMs`	`number`	`86400000` (24h)	Throws an `Error` if the batch is not done by this deadline.
`onProgress`	`(status: BatchStatus) => void`	`undefined`	Called after each status poll.

BatchJob handle — manual mode:

Member	Type / returns	Notes
`job.id`	`string`	Provider batch id. Persist this for resume.
`job.provider`	`ProviderName`	Provider name. Needed for `batchJob()` resume.
`job.status()`	`Promise<BatchStatus>`	Current status without blocking.
`job.results()`	`Promise<BatchItemResult[]>`	Throws if not yet terminal.
`job.wait(opts?)`	`Promise<BatchItemResult[]>`	Poll + block until complete.
`job.cancel()`	`Promise<void>`	Request cancellation.

BatchItemResult — per-item output:

Field	Type	Notes
`customId`	`string`	Matches the input `customId` (or auto-generated `req-N`).
`success`	`boolean`	`true` when the provider confirms success and the response parsed correctly.
`text`	`string`	Parsed reply text. Empty string on failure.
`response`	`CompletionResponse | null`	Full normalised response including usage and `raw`. `null` on failure.
`error`	`string | null`	Error message when `success` is `false`.

Provider support:

Provider	Batch mechanism	Notes
OpenAI	`/v1/batches` + JSONL	Built on Responses API (`/v1/responses`). ~50% discount on supported models.
Anthropic	`/v1/messages/batches`	Native batch API. 50% discount.
Google	Batch prediction API	Uses `GoogleBatchAdapter`. Experimental; pricing varies.
xAI	Not supported	`batch()` throws for `xai` provider.
OpenRouter	Not supported	OpenRouter does not expose a batch endpoint.

Google custom id mapping: Google keys results by index, not by the submitted customId. The SDK maps results back to the original ids using submission order. Results are returned in the same order as the requests array.
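
Mapping index-keyed results back to the submitted ids is a matter of keeping the submission order around. A sketch of the idea, not the SDK's code: `IndexedResult` and `remapByIndex` are hypothetical, and the assumption is that each provider result carries the zero-based index of its request.

```typescript
interface IndexedResult {
  index: number;
  text: string;
}

interface KeyedResult {
  customId: string;
  text: string;
}

function remapByIndex(customIds: string[], results: IndexedResult[]): KeyedResult[] {
  const out: KeyedResult[] = [];
  // Sort a copy by index so the output follows the order of the requests array.
  const ordered = results.slice().sort((a, b) => a.index - b.index);
  for (const r of ordered) {
    const customId = customIds[r.index];
    if (customId === undefined) {
      throw new Error(`no request was submitted at index ${r.index}`);
    }
    out.push({ customId, text: r.text });
  }
  return out;
}

// Made-up results arriving out of order.
const remapped = remapByIndex(
  ['item-0', 'item-1', 'item-2'],
  [
    { index: 2, text: 'deux' },
    { index: 0, text: 'zero' },
    { index: 1, text: 'un' },
  ],
);
console.log(remapped.map((r) => `${r.customId}=${r.text}`).join(' '));
// item-0=zero item-1=un item-2=deux
```

The practical consequence is that the original requests array (or at least its list of ids, in order) has to be available when results are read.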

Compare the SDKs

import { batch } from '@combycode/llm-sdk';

// Unified explicit batch: describe each request like a complete() call, await all.
// (Official samples hand-build provider JSONL/inline bodies, upload a file, create
// the batch, and poll the status loop by hand.)
const t0 = performance.now();
const results = await batch({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  requests: [
    { customId: 'a', prompt: 'Say apple.', maxTokens: 16 },
    { customId: 'b', prompt: 'Say banana.', maxTokens: 16 },
  ],
});

// Individual results are correlatable by customId.
const count = results.filter((r) => r.success).length;
console.log(JSON.stringify({ result: String(count), ms: Math.round(performance.now() - t0) }));

OpenAI’s official SDK exposes the raw multi-step flow: upload JSONL file, create batch, poll, download output file, parse JSONL. Anthropic’s is similar but with different endpoint shapes. Google’s batch is a separate prediction API with different semantics. ORXA wraps all three behind the same submitBatch() / batch() / batchJob() API. The BatchJobImpl class handles per-provider result parsing, customId remapping (Google), and onCostEntry hook emission — cost tracking works the same for batch and synchronous calls.

Gotchas and next steps

Batch turnaround is typically minutes to hours, not seconds. Providers process batches on spare capacity, so a batch of 1000 items may complete in 10 minutes or in 12 hours. Design accordingly: batch() auto mode keeps its awaiting caller suspended, and your process alive, until the batch finishes or timeoutMs expires. Prefer manual mode for anything longer than a few minutes.

results() throws if the batch is not terminal. Always check status() first or use wait() to block until done. Terminal statuses are completed, failed, expired, and cancelled.

Failed items don’t throw. A partially-failed batch returns a BatchItemResult[] where some items have success: false. Inspect each item’s error field. The batch itself is considered completed by the provider even if some items failed.
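
Because failures arrive as data rather than exceptions, a common follow-up is to split the result array and resubmit only the failed ids. A sketch using just the `customId`, `success`, `text`, and `error` fields from the BatchItemResult table; `ItemResult` is a local stand-in type, `partitionResults` is hypothetical, and the sample error message is made up.

```typescript
// Local stand-in for the fields of BatchItemResult used here.
interface ItemResult {
  customId: string;
  success: boolean;
  text: string;
  error: string | null;
}

function partitionResults(results: ItemResult[]) {
  const succeeded = new Map<string, string>();
  const failed: Array<{ customId: string; error: string }> = [];
  for (const r of results) {
    if (r.success) {
      succeeded.set(r.customId, r.text.trim());
    } else {
      failed.push({ customId: r.customId, error: r.error ?? 'unknown error' });
    }
  }
  return { succeeded, failed };
}

const { succeeded, failed } = partitionResults([
  { customId: 'doc-1', success: true, text: ' summary one ', error: null },
  { customId: 'doc-2', success: false, text: '', error: 'content filtered' },
]);
console.log(`retry: ${failed.map((f) => f.customId).join(', ')}`); // retry: doc-2
```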

Cost fires at download time, not submit time. The onCostEntry hook emits when results() or wait() parses each item. If your process restarts between submit and download, the cost for items already processed by the provider fires only when you call results() on the restored batchJob() handle.

OpenAI batch uses the Responses API internally. OpenAIBatchAdapter builds requests with OpenAIResponsesAdapter and parses responses with the same adapter. The batch JSONL body contains Responses API format, not Chat Completions format.
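
For orientation, one line of such a JSONL file might be built as in the sketch below. It follows OpenAI's public batch input format (`custom_id`, `method`, `url`, `body`) with a Responses-style body (`model`, `input`, `max_output_tokens`). The helper name `toBatchLine` is hypothetical, and the exact fields OpenAIBatchAdapter emits are not shown in this guide, so treat the body as an assumption.

```typescript
interface SimpleRequest {
  customId: string;
  prompt: string;
  maxTokens?: number;
}

// Build one JSONL line targeting the Responses endpoint rather than Chat Completions.
function toBatchLine(model: string, req: SimpleRequest): string {
  return JSON.stringify({
    custom_id: req.customId,
    method: 'POST',
    url: '/v1/responses',
    body: {
      model,
      input: req.prompt,
      // Only include the limit when the caller set one.
      ...(req.maxTokens !== undefined ? { max_output_tokens: req.maxTokens } : {}),
    },
  });
}

const sampleRequests: SimpleRequest[] = [
  { customId: 'a', prompt: 'Say apple.', maxTokens: 16 },
  { customId: 'b', prompt: 'Say banana.' },
];
const jsonl = sampleRequests.map((r) => toBatchLine('gpt-4o-mini', r)).join('\n');
console.log(jsonl);
```

A Chat Completions line would instead use a `messages` array in the body and a different `url`, which is why output written for one format cannot be parsed as the other.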

Next steps:

File upload — upload a file once and reference it in batch requests
Provider routing — route individual synchronous requests across providers
Cost tracking guide — how onCostEntry accumulates batch costs