Image generation

What you will achieve

Send the prompt 'a red circle on white background' and confirm that non-empty image bytes are written to disk, using the same generateImage() call for OpenAI and Google (Anthropic has no image generation API).

When and why you need this

Image generation turns a text prompt into pixel data you can display, embed in a document, or feed back into another model call as a vision input. Use cases: product mock-ups, icon sets, illustration pipelines, data augmentation.

The challenge with raw provider SDKs:

OpenAI calls client.images.generate({ model, prompt, size, quality, n }) and returns a list of b64_json or url items. gpt-image-1 always returns b64_json and rejects the response_format parameter; dall-e-3 requires it. Two models, two different call shapes inside the same provider.
Google Imagen uses a :predict endpoint with instances and parameters; predictions contain bytesBase64Encoded fields. Google Gemini image models instead use generateContent with responseModalities: ['IMAGE'] and return inlineData parts. Two endpoints within the same provider.

createMediaOutput() routes each provider+model combination to the correct endpoint, saves the raw bytes to disk, and returns a uniform MediaResult.
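To make the difference between the raw response shapes concrete, here is a minimal sketch that pulls base64 image payloads out of the three shapes described above. The type names and the extractBase64Images helper are hypothetical and are not part of the SDK; only the field names (b64_json, bytesBase64Encoded, inlineData) come from the provider descriptions above.

```typescript
// Minimal response shapes, limited to the fields named in the text above.
type OpenAIImageResponse = { data: { b64_json?: string; url?: string }[] };
type ImagenResponse = { predictions: { bytesBase64Encoded: string }[] };
type GeminiResponse = {
  candidates: {
    content: { parts: { text?: string; inlineData?: { mimeType: string; data: string } }[] };
  }[];
};

type RawResponse =
  | { kind: 'openai'; body: OpenAIImageResponse }
  | { kind: 'imagen'; body: ImagenResponse }
  | { kind: 'gemini'; body: GeminiResponse };

// Hypothetical helper: collect base64 image payloads from any of the three shapes.
function extractBase64Images(res: RawResponse): string[] {
  const out: string[] = [];
  if (res.kind === 'openai') {
    for (const item of res.body.data) {
      // Items that only carry a url have no inline bytes and are skipped.
      if (item.b64_json) out.push(item.b64_json);
    }
  } else if (res.kind === 'imagen') {
    for (const prediction of res.body.predictions) out.push(prediction.bytesBase64Encoded);
  } else {
    for (const candidate of res.body.candidates) {
      for (const part of candidate.content.parts) {
        // Gemini can interleave text parts with image parts; keep only images.
        if (part.inlineData) out.push(part.inlineData.data);
      }
    }
  }
  return out;
}
```

This is roughly the per-provider extraction code that a unified MediaResult is meant to keep out of application code.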

Step by step

Step 1 — Create a media handle

import { createMediaOutput } from '@combycode/llm-sdk';

const media = createMediaOutput({
  model: 'openai/gpt-image-1',
  apiKey: process.env.OPENAI_API_KEY,
  dir: './.media-out',
});

createMediaOutput() requires either dir (Node/Bun, for FileMediaStore) or store (any environment, e.g. new MemoryMediaStore() in the browser). The model string is either 'provider/model-id' or a bare model id accompanied by a separate provider field. The API key falls back to engine.apiKeys[provider] when not passed explicitly.
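The two accepted model forms can be illustrated with a small parser. This is a sketch only: parseModelRef and ModelRef are hypothetical names, and splitting on the first slash (and letting a namespaced string win over a separately passed provider) are assumptions, not documented SDK behaviour.

```typescript
type ModelRef = { provider: string; model: string };

// Hypothetical helper mirroring the two accepted forms:
// 'provider/model-id', or a bare model id plus a separate provider.
function parseModelRef(model: string, provider?: string): ModelRef {
  const slash = model.indexOf('/');
  if (slash > 0) {
    // Assumption: split on the first slash only, so model ids that
    // themselves contain slashes stay intact.
    return { provider: model.slice(0, slash), model: model.slice(slash + 1) };
  }
  if (!provider) {
    throw new Error(`bare model '${model}' needs a provider`);
  }
  return { provider, model };
}
```

For example, parseModelRef('openai/gpt-image-1') and parseModelRef('gpt-image-1', 'openai') describe the same provider and model.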

Step 2 — Generate an image

const [img] = await media.generateImage({
  prompt: 'a red circle on white background',
  params: { size: '1024x1024' },
});

console.log(`saved ${img.meta.size} bytes, id: ${img.id}`);
// img.mimeType       -> 'image/png'
// img.meta.provider  -> 'openai'
// img.meta.model     -> 'gpt-image-1'

generateImage() returns a MediaResult[]. Each item has { id, type: 'image', mimeType, meta }. The bytes are saved to dir under the id. Load them back with media.raw.mediaStore.load(id) when needed.
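Checking meta.size only confirms that something was written. A slightly stronger check is to look at the first bytes of the stored file. The sketch below tests for the standard 8-byte PNG signature; looksLikePng is a hypothetical helper, and it assumes you already hold the stored bytes as a Uint8Array (the exact return type of the store's load call is not specified here).

```typescript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: true when the buffer starts with the PNG signature.
function looksLikePng(bytes: Uint8Array): boolean {
  if (bytes.length < PNG_SIGNATURE.length) return false;
  return PNG_SIGNATURE.every((expected, i) => bytes[i] === expected);
}
```

This only applies when the output is PNG; JPEG or WebP output would need its own signature check.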

Step 3 — Generate multiple images (n)

const images = await media.generateImage({
  prompt: 'a watercolor painting of a mountain',
  params: { n: 4, size: '1024x1024' },
});

for (const img of images) {
  console.log(`${img.id}: ${img.meta.size} bytes`);
}

params.n requests multiple images in one API call. OpenAI DALL-E 2 and gpt-image-1 accept up to 10 per call; DALL-E 3 accepts only n = 1. Google Imagen supports up to 4 (sent as sampleCount). Images are stored individually, and the returned array length matches n.
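When you need more images than a model allows in one call, the request has to be split. The sketch below is one way to plan that. Both helpers are hypothetical, and the per-call limits are assumptions based on provider documentation at the time of writing, so verify them before relying on them.

```typescript
// Assumed per-call limits; check current provider documentation.
function maxImagesPerCall(model: string): number {
  if (model === 'dall-e-3') return 1;
  if (model === 'dall-e-2' || model === 'gpt-image-1') return 10;
  if (model.startsWith('imagen')) return 4;
  return 1; // conservative default for models not listed here
}

// Hypothetical helper: split a total image count into per-call n values.
function splitIntoCalls(total: number, model: string): number[] {
  const max = maxImagesPerCall(model);
  const calls: number[] = [];
  for (let left = total; left > 0; left -= max) {
    calls.push(Math.min(left, max));
  }
  return calls;
}
```

Each entry in the returned array would then become the params.n of one generateImage() call.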

Step 4 — Edit an existing image

import { readFileSync } from 'fs';

const sourceBytes = new Uint8Array(readFileSync('./original.png'));

const [edited] = await media.editImage({
  prompt: 'replace the background with a sunset',
  sourceImage: { type: 'buffer', mimeType: 'image/png', data: sourceBytes },
  params: { size: '1024x1024' },
});

editImage() is available on OpenAI (gpt-image-1 via /v1/images/edits) and Google (Gemini via generateContent with the image attached as an extra part). Pass mask as a second DataSource for inpainting (OpenAI only).

Step 5 — Switch to Google Imagen

const googleMedia = createMediaOutput({
  model: 'google/imagen-4.0-generate-001',
  apiKey: process.env.GOOGLE_API_KEY,
  dir: './.media-out',
});

const [img] = await googleMedia.generateImage({
  prompt: 'a photorealistic red apple on a white table',
  params: {
    n: 1,
    aspectRatio: '1:1',
  },
});

The Google Imagen path calls :predict on the Imagen model. Google Gemini image models (e.g. gemini-2.0-flash-exp) call generateContent with responseModalities: ['IMAGE']. The SDK routes automatically based on whether the model name starts with 'imagen'.
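The prefix-based routing rule can be sketched in a few lines. googleImageRoute and GoogleRoute are hypothetical names, and the base URL shown is the public Gemini API host, used here as an assumption; the SDK's adapter may build its URLs differently.

```typescript
// Assumption: the public Gemini API base URL; the SDK may use a different one.
const GOOGLE_BASE = 'https://generativelanguage.googleapis.com/v1beta/models';

type GoogleRoute = { method: 'predict' | 'generateContent'; url: string };

// Hypothetical helper: pick the endpoint from the model-name prefix.
function googleImageRoute(model: string): GoogleRoute {
  const method = model.startsWith('imagen') ? 'predict' : 'generateContent';
  return { method, url: `${GOOGLE_BASE}/${model}:${method}` };
}
```

Because the rule keys on the model name alone, a typo such as 'imagn-4.0-generate-001' would silently route to generateContent, which is one reason to copy model ids exactly.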

Your options

createMediaOutput() options:

Option	Type	Description
`model`	`string`	Namespaced (`'openai/gpt-image-1'`) or bare. Required unless `provider` is set and the adapter uses a default.
`provider`	`ProviderName`	Required when `model` is bare.
`apiKey`	`string`	Optional; falls back to `engine.apiKeys[provider]`.
`dir`	`string`	Directory for `FileMediaStore` (Node/Bun).
`store`	`MediaStore`	Custom store. Use `new MemoryMediaStore()` in the browser.
`providers`	`Record<string, MediaProviderAdapter>`	Override or extend auto-registered adapters (custom baseURL, shared instance).
`engine`	`EngineHandle`	Share an existing engine (hooks, catalog, fetch queue).
`config`	`MediaOutputConfig`	`pollIntervalMs` (default 5000) and `maxPollWaitMs` (default 600000) for async video.

ImageGenRequest.params — full option set:

Param	Type	Providers	Description
`n`	`number`	OpenAI, Google Imagen	Number of images to generate. OpenAI default 1; Google Imagen max 4.
`size`	`string`	OpenAI	Pixel dimensions string: `'1024x1024'`, `'1792x1024'`, `'1024x1792'` (DALL-E 3); `'1024x1024'`, `'1536x1024'`, `'1024x1536'`, `'auto'` (gpt-image-1); `'256x256'`, `'512x512'`, `'1024x1024'` (DALL-E 2).
`aspectRatio`	`string`	Google	Aspect ratio string: `'1:1'`, `'3:4'`, `'4:3'`, `'9:16'`, `'16:9'`.
`imageSize`	`string`	Google	Sample image size (`'1K'`, `'2K'`). Maps to `sampleImageSize` for Imagen, `imageSize` for Gemini.
`quality`	`string`	OpenAI	`'standard'` or `'hd'` (DALL-E 3); `'low'`, `'medium'`, `'high'`, `'auto'` (gpt-image-1).
`style`	`string`	OpenAI DALL-E 3	`'vivid'` or `'natural'`. Ignored by gpt-image-1 and Google.
`background`	`string`	OpenAI gpt-image-1	`'transparent'`, `'opaque'`, or `'auto'`. Requires PNG output.
`outputFormat`	`string`	OpenAI gpt-image-1	`'png'`, `'jpeg'`, or `'webp'`. Default `'png'`.
`responseFormat`	`'b64_json' \| 'url'`	OpenAI DALL-E 2/3 only	`gpt-image-1` always returns `b64_json` and rejects this parameter, so the adapter omits it automatically.
`strength`	`number`	OpenRouter	Image-to-image strength (0-1); lower = closer to source.

MediaResult fields:

Field	Type	Description
`id`	`string`	Generated media id (`img_<uuid>`). Use to load bytes from the store.
`type`	`'image'`	Media type discriminator.
`mimeType`	`string`	e.g. `'image/png'`. From provider response or `outputFormat`.
`meta.size`	`number`	Byte count of the stored file.
`meta.provider`	`string`	Provider that generated it.
`meta.model`	`string`	Model id used.
`meta.prompt`	`string`	The prompt sent.
`meta.revisedPrompt`	`string \| undefined`	OpenAI may return a revised prompt when it rewrites your input.
`meta.width` / `meta.height`	`number \| undefined`	Pixel dimensions when reported by the provider.

Cost note: Image generation is priced per image (DALL-E 2/3) or per output token (gpt-image-1, Gemini). gpt-image-1 output tokens are billed at $0.04/1K by default; higher quality settings produce more output tokens per image and therefore cost more. DALL-E 3 standard 1024x1024 is $0.04/image. Google Imagen pricing varies by model and region. Check provider pricing pages before running large batches.
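A rough budget check before a large batch can be done with simple arithmetic. The helpers below are hypothetical, and the default rates are just the figures quoted in the cost note above; treat them as placeholders and substitute current prices.

```typescript
// Token-priced models: cost = (output tokens / 1000) * rate per 1K tokens.
function estimateTokenCostUsd(outputTokens: number, usdPer1kTokens = 0.04): number {
  return (outputTokens / 1000) * usdPer1kTokens;
}

// Per-image-priced models: cost = image count * rate per image.
function estimatePerImageCostUsd(images: number, usdPerImage = 0.04): number {
  return images * usdPerImage;
}

// Hypothetical batch: 250 images on a per-image-priced model.
const batchEstimate = estimatePerImageCostUsd(250);
console.log(`estimated batch cost: $${batchEstimate.toFixed(2)}`);
```

For token-priced models the output token count per image depends on size and quality, so take it from a real response's usage data rather than guessing.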

Provider and model reference:

Provider	Models	Endpoint
OpenAI	`gpt-image-1`, `dall-e-3`, `dall-e-2`	`/v1/images/generations` (generate); `/v1/images/edits` (edit)
Google	`imagen-4.0-generate-001`, `gemini-2.0-flash-exp` (image)	`:predict` for Imagen; `generateContent` for Gemini image
xAI	Aurora models via OpenRouter	`/v1/images/generations`

Compare the SDKs

import { createMediaOutput } from '@combycode/llm-sdk';

// Unified media output: one handle, generateImage() defaults provider+model from
// the configured model id. (Official samples call provider-specific endpoints:
// openai images.generate vs google generateContent responseModalities:['IMAGE'].)
const media = createMediaOutput({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  dir: './.media-out',
});

const t0 = performance.now();
const [img] = await media.generateImage({
  prompt: 'a red circle on white background',
  params: { size: '1024x1024' },
});
console.log(JSON.stringify({ result: String(img?.meta.size ?? 0), ms: Math.round(performance.now() - t0) }));

OpenAI’s SDK calls client.images.generate() and returns response.data[] with b64_json or url fields — you decode base64 and save to disk yourself. Google has no official image-generation method in the Node SDK; you call client.models.generateContent() with responseModalities: ['IMAGE'] and extract inlineData.data from the response parts manually. ORXA calls generateImage() once and returns typed MediaResult[] with bytes already saved to dir — no provider-specific extraction code in your app.

Gotchas and next steps

gpt-image-1 always returns base64-encoded bytes (b64_json), PNG unless outputFormat says otherwise. The adapter silently omits the responseFormat parameter for gpt-image-1 because the API rejects it. DALL-E 3 and DALL-E 2 still need it set to 'b64_json' internally; the adapter handles this as well.
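The response_format handling described above could look roughly like this inside an adapter. This is a sketch under stated assumptions, not the SDK's actual code: buildOpenAIImageBody and OpenAIImageBody are hypothetical names, and only a subset of request fields is shown.

```typescript
type OpenAIImageBody = {
  model: string;
  prompt: string;
  n?: number;
  size?: string;
  response_format?: 'b64_json' | 'url';
};

// Hypothetical helper: build the request body, adding response_format
// only for models that accept it.
function buildOpenAIImageBody(
  model: string,
  prompt: string,
  params: { n?: number; size?: string } = {},
): OpenAIImageBody {
  const body: OpenAIImageBody = { model, prompt, ...params };
  if (model !== 'gpt-image-1') {
    // DALL-E 2/3: ask for inline base64 so bytes can be saved directly.
    body.response_format = 'b64_json';
  }
  return body;
}
```

Leaving the key out entirely for gpt-image-1, instead of sending it as undefined or null, keeps the serialized JSON free of the rejected parameter.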

Revised prompts. OpenAI’s API may rewrite your prompt for safety or quality. The original prompt is stored in meta.prompt; the rewritten version (if any) is in meta.revisedPrompt. Log revisedPrompt when debugging unexpected output.

Google Imagen and Gemini image models use different endpoints. Model names starting with 'imagen' go to :predict; all other Google models go to generateContent. Set model in createMediaOutput to the correct string; the adapter routes automatically.

Bytes are saved on every generateImage() call. If generation fails mid-call (network error after the image is returned but before mediaStore.save()), no file is written. The promise rejects cleanly — retry safely.

Edit requires sourceImage as a DataSource, not a path string. editImage() does not accept raw file paths. Read the file into a buffer DataSource first (see Step 4 above).
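If you edit images from several files, a small wrapper keeps the DataSource construction in one place. The sketch below assumes the buffer shape used in Step 4 ({ type: 'buffer', mimeType, data }); toBufferSource, BufferDataSource, and the extension table are hypothetical, and reading the file into bytes stays with the caller.

```typescript
// Shape taken from the Step 4 example; the type name is an assumption.
type BufferDataSource = { type: 'buffer'; mimeType: string; data: Uint8Array };

// Extension-to-MIME lookup for common image formats.
const MIME_BY_EXT = new Map<string, string>([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['webp', 'image/webp'],
]);

// Hypothetical helper: wrap already-read bytes, inferring the MIME type
// from the file name's extension.
function toBufferSource(filename: string, data: Uint8Array): BufferDataSource {
  const ext = filename.split('.').pop()?.toLowerCase() ?? '';
  const mimeType = MIME_BY_EXT.get(ext);
  if (!mimeType) {
    throw new Error(`unsupported image extension: '${ext}'`);
  }
  return { type: 'buffer', mimeType, data };
}
```

With this in place, the sourceImage in Step 4 could be written as toBufferSource('./original.png', sourceBytes).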

Next steps:

Vision input — feed generated images back into a vision model
TTS — audio generation counterpart
File upload — upload a source image to the Files API for use in edits