PDF document input

Send a one-page PDF containing the word BANANA and prompt 'What word is in this document?'. Assert the response matches /banana/i on OpenAI, Anthropic, and Google — with one call shape.

Use PDF document input whenever the model must reason over structured document content: extracting data from a report, answering questions about a contract, summarising a research paper, or cross-referencing sections of a manual.

Raw provider SDKs diverge sharply for PDF input:

  • OpenAI accepts PDFs as a base64 content block (type input_file) in the Responses API, or as a file block in Chat Completions. The Files API upload-then-reference path is also supported.
  • Anthropic uses { type: 'document', source: { type: 'base64', media_type: 'application/pdf', data } } inside a content array.
  • Google treats a PDF as any other inlineData part with mimeType: 'application/pdf'.

Each path requires different code. attachments handles all three with the same call.
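To make the divergence concrete, the sketch below hand-builds the three content blocks listed above from a single base64 string. The function name pdfBlock and the default filename are illustrative and not part of any SDK. The OpenAI branch assumes the Responses API convention of passing file_data as a base64 data URL next to a filename.

```typescript
// Illustrative only: hand-built PDF content blocks, one per provider.
type Provider = 'openai' | 'anthropic' | 'google';

function pdfBlock(
  provider: Provider,
  base64: string,
  filename = 'document.pdf', // hypothetical default, only used by the OpenAI shape
): Record<string, unknown> {
  switch (provider) {
    case 'openai':
      // Responses API: input_file carrying the bytes as a base64 data URL.
      return {
        type: 'input_file',
        filename,
        file_data: `data:application/pdf;base64,${base64}`,
      };
    case 'anthropic':
      // Messages API: document block with a base64 source and snake_case media_type.
      return {
        type: 'document',
        source: { type: 'base64', media_type: 'application/pdf', data: base64 },
      };
    case 'google':
      // Gemini: a generic inlineData part, with no document-specific type.
      return { inlineData: { mimeType: 'application/pdf', data: base64 } };
  }
}
```

Three branches for one attachment is the per-provider code that a shared attachments option is meant to hide.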

import { complete } from '@combycode/llm-sdk';
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What word is in this document? Reply with just the word.',
  attachments: ['./banana.pdf'],
  maxTokens: 64,
});
console.log(text); // "BANANA"

loadContent() reads the file, detects application/pdf from the .pdf extension (or from the %PDF magic bytes), base64-encodes the content, and returns a DocumentPart with a base64 DataSource. The provider adapter translates that into the correct wire format.
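The encoding step can be pictured roughly as follows. This is a minimal sketch and not the SDK's actual loadContent() implementation: encodeDocument and the two local interfaces are hypothetical names that mirror the DocumentPart shape described in this guide, and MIME detection is left out (the caller supplies the type).

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical local types mirroring the DocumentPart shape in this guide.
interface Base64SourceSketch {
  type: 'base64';
  mimeType: string;
  data: string;
}
interface DocumentPartSketch {
  type: 'document';
  source: Base64SourceSketch;
}

// Wrap raw bytes in a document part with a base64 source.
function encodeDocument(
  bytes: Uint8Array,
  mimeType: 'application/pdf' | 'text/plain',
): DocumentPartSketch {
  return {
    type: 'document',
    source: {
      type: 'base64',
      mimeType,
      data: Buffer.from(bytes).toString('base64'),
    },
  };
}

const part = encodeDocument(new TextEncoder().encode('%PDF-1.7'), 'application/pdf');
console.log(part.source.mimeType, part.source.data.length);
```

A provider adapter would then only need to rename fields on this one shape rather than re-read the file.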

import { complete } from '@combycode/llm-sdk';
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'Summarise this document in one sentence.',
  attachments: ['./report.txt'],
  maxTokens: 128,
});

Files with a text/plain MIME type (.txt extension or detected MIME) also resolve to a DocumentPart. This is distinct from an image attachment: the provider receives the file through the document/file path rather than the image path.

Step 3 — Send a PDF from bytes in memory

import { complete } from '@combycode/llm-sdk';
import { readFileSync } from 'fs';
// Copy the file contents into a plain Uint8Array.
const pdfBytes = new Uint8Array(readFileSync('./contract.pdf'));
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'List the parties named in this contract.',
  attachments: [pdfBytes],
  maxTokens: 256,
});

Pass Uint8Array when the bytes are already in memory. loadContent() detects application/pdf from the %PDF magic bytes (bytes 0-3: 25 50 44 46).
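The magic-byte check described above is small enough to sketch directly. The helper name startsWithPdfMagic is an assumption for illustration; the SDK's internal check may differ in detail.

```typescript
// ASCII "%PDF" as bytes: 0x25 0x50 0x44 0x46.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46] as const;

// True when the first four bytes match the PDF signature.
function startsWithPdfMagic(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((expected, i) => bytes[i] === expected)
  );
}

const pdfSample = new TextEncoder().encode('%PDF-1.7\n');
const pngSample = new Uint8Array([0x89, 0x50, 0x4e, 0x47]); // PNG signature prefix

console.log(startsWithPdfMagic(pdfSample)); // true
console.log(startsWithPdfMagic(pngSample)); // false
```

Because the check needs only the first four bytes, it works on in-memory buffers that have no filename or extension to inspect.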

Step 4 — Build a DocumentPart manually for citations

When you want to enable Anthropic’s citation feature, build the part directly:

import { complete } from '@combycode/llm-sdk';
import type { DocumentPart } from '@combycode/llm-sdk';
import { readFileSync } from 'fs';
import { Buffer } from 'buffer';
const raw = new Uint8Array(readFileSync('./paper.pdf'));
const b64 = Buffer.from(raw).toString('base64');
const docPart: DocumentPart = {
  type: 'document',
  source: { type: 'base64', mimeType: 'application/pdf', data: b64 },
  citations: true,
};
const { text } = await complete({
  model: 'anthropic/claude-opus-4-5',
  apiKey: process.env.ANTHROPIC_API_KEY,
  messages: [
    {
      role: 'user',
      content: [
        docPart,
        { type: 'text', text: 'What does section 3 conclude?' },
      ],
    },
  ],
  maxTokens: 512,
});

citations: true is passed through to Anthropic’s API when the adapter supports it. It has no effect on OpenAI or Google adapters.
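As a rough picture of that pass-through, the sketch below maps a part with citations: true onto the block Anthropic's Messages API expects, where citations is an object of the form { enabled: true } rather than a bare boolean. The function toAnthropicBlock and the local interface are hypothetical; the real adapter is not shown in this guide.

```typescript
// Hypothetical local copy of the DocumentPart fields used here.
interface CitableDocumentPart {
  type: 'document';
  source: { type: 'base64'; mimeType: string; data: string };
  citations?: boolean;
}

// Translate the SDK-style part into an Anthropic document block.
function toAnthropicBlock(part: CitableDocumentPart) {
  return {
    type: 'document' as const,
    source: {
      type: 'base64' as const,
      media_type: part.source.mimeType, // camelCase in the part, snake_case on the wire
      data: part.source.data,
    },
    // Only emit the citations key when the caller asked for it.
    ...(part.citations ? { citations: { enabled: true } } : {}),
  };
}

const block = toAnthropicBlock({
  type: 'document',
  source: { type: 'base64', mimeType: 'application/pdf', data: 'JVBERi0xLjc=' },
  citations: true,
});
console.log(JSON.stringify(block.citations)); // {"enabled":true}
```

An adapter for a provider without a citations feature would simply skip the field, which matches the "no effect" behaviour described above.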

DocumentPart shape (ContentPart of type 'document'):

| Field | Type | Description |
| --- | --- | --- |
| type | 'document' | Discriminator, set by loadContent() when MIME is application/pdf or text/plain. |
| source | DataSource | Where the document bytes come from (see below). |
| citations | boolean | Optional. When true, Anthropic returns inline citations. Ignored by other providers. |

DataSource variants for documents:

| type | Required fields | When to use |
| --- | --- | --- |
| 'base64' | mimeType: string, data: string | Raw base64-encoded document bytes. Output of loadContent(). |
| 'buffer' | mimeType: string, data: Uint8Array | Raw bytes in memory. mimeType must be 'application/pdf' or 'text/plain'. |
| 'path' | mimeType: string, path: string | Local file path (Node/Bun). SDK reads and encodes. Use attachments instead for simpler calls. |
| 'file' | fileId: string | A file previously uploaded via the Files API. |
| 'provider_ref' | mimeType: string, refId: string | Provider-specific file reference (e.g. Google Files API URI). |
| 'url' | url: string | Not auto-detected as document by loadContent(). Use file-path or bytes instead. |
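Read as a TypeScript type, the variants above form a discriminated union. The sketch below is a reconstruction from the table and not the SDK's published typings, and the helper carriesBytes is a hypothetical illustration of narrowing on the type field.

```typescript
// Reconstructed from the table; the SDK's real DataSource type may differ.
type DataSourceSketch =
  | { type: 'base64'; mimeType: string; data: string }
  | { type: 'buffer'; mimeType: string; data: Uint8Array }
  | { type: 'path'; mimeType: string; path: string }
  | { type: 'file'; fileId: string }
  | { type: 'provider_ref'; mimeType: string; refId: string }
  | { type: 'url'; url: string };

// True when the document bytes travel with the request itself,
// false when the request only carries a reference to them.
function carriesBytes(source: DataSourceSketch): boolean {
  switch (source.type) {
    case 'base64':
    case 'buffer':
    case 'path': // the SDK reads and encodes the file, so bytes are sent
      return true;
    case 'file':
    case 'provider_ref':
    case 'url':
      return false;
  }
}

console.log(carriesBytes({ type: 'file', fileId: 'file_123' })); // false
```

The split matters for request size: the first three variants resend the document on every call, the last three do not.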

MIME type detection for documents:

| Extension / Magic | MIME type | Part type |
| --- | --- | --- |
| .pdf / %PDF magic bytes | application/pdf | DocumentPart |
| .txt | text/plain | DocumentPart |
| .png, .jpg, .gif, .webp | image/* | ImagePart (not a document) |
| .wav, .mp3, .m4a, etc. | audio/* | AudioPart (not a document) |
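The routing in this table can be approximated with a plain extension lookup. This is a sketch under the assumption that only the extensions listed above matter; partKindForFile is a hypothetical helper, and the SDK's real detection also considers magic bytes.

```typescript
type PartKind = 'document' | 'image' | 'audio' | 'unknown';

// Extensions taken from the table above; anything else is reported as unknown.
const EXTENSION_KINDS = new Map<string, PartKind>([
  ['pdf', 'document'],
  ['txt', 'document'],
  ['png', 'image'],
  ['jpg', 'image'],
  ['gif', 'image'],
  ['webp', 'image'],
  ['wav', 'audio'],
  ['mp3', 'audio'],
  ['m4a', 'audio'],
]);

function partKindForFile(filename: string): PartKind {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return 'unknown'; // no extension to inspect
  const ext = filename.slice(dot + 1).toLowerCase();
  return EXTENSION_KINDS.get(ext) ?? 'unknown';
}

console.log(partKindForFile('./report.PDF')); // document
console.log(partKindForFile('./scan.png')); // image
```

A lookup like this is why a binary file saved with a .txt extension ends up on the document path: the extension alone decides.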

Provider support and constraints:

| Provider | PDF support | Native text/plain | Approximate size limit |
| --- | --- | --- | --- |
| OpenAI | Yes (Responses API: input_file block; Chat Completions: file block) | Yes | 512 MB per file (Files API); ~32 pages inline |
| Anthropic | Yes (document block with base64 source) | Yes | 32 MB per document; 100 pages per request |
| Google | Yes (inlineData with application/pdf) | Yes (as inlineData) | 20 MB inline; use Files API for larger |
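A pre-flight check against these limits might look like the sketch below. The numbers are copied from the table above and should be treated as approximate; exceedsInlineLimit is a hypothetical helper, and it assumes the caller already knows the page count, since counting pages requires parsing the PDF.

```typescript
type Provider = 'openai' | 'anthropic' | 'google';

const MB = 1024 * 1024;

// Approximate inline limits from the table above; verify against provider docs.
const INLINE_LIMITS: Record<Provider, { maxBytes?: number; maxPages?: number }> = {
  openai: { maxPages: 32 },
  anthropic: { maxBytes: 32 * MB, maxPages: 100 },
  google: { maxBytes: 20 * MB },
};

// True when the document is too large to send inline for this provider.
function exceedsInlineLimit(provider: Provider, sizeBytes: number, pages?: number): boolean {
  const { maxBytes, maxPages } = INLINE_LIMITS[provider];
  if (maxBytes !== undefined && sizeBytes > maxBytes) return true;
  if (maxPages !== undefined && pages !== undefined && pages > maxPages) return true;
  return false;
}

console.log(exceedsInlineLimit('google', 25 * MB)); // true
console.log(exceedsInlineLimit('anthropic', 5 * MB, 40)); // false
```

Base64 encoding inflates the payload by roughly a third, so a file just under a byte limit on disk may still be rejected once encoded.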

When to use Files API instead of inline:

For large documents or documents shared across many requests, upload once with the Files upload helper and reference by file DataSource. The inline path (attachments) re-encodes and re-sends the document on every call.
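Once a file has been uploaded, referencing it is a matter of swapping the source. The sketch below builds a user message with a file DataSource, following the message shape from Step 4 and the 'file' row of the DataSource table. The upload helper itself is not shown in this guide, so fileId is assumed to come from a prior upload, and fileDocumentMessage is a hypothetical name.

```typescript
// Build a user message that references an already-uploaded document by ID.
// fileId is assumed to be returned by an earlier Files API upload (not shown).
function fileDocumentMessage(fileId: string, question: string) {
  return {
    role: 'user' as const,
    content: [
      { type: 'document' as const, source: { type: 'file' as const, fileId } },
      { type: 'text' as const, text: question },
    ],
  };
}

const message = fileDocumentMessage('file_abc123', 'Summarise section 2.');
console.log(JSON.stringify(message.content[0]));
```

The message carries only the identifier, so request size stays constant however large the document is.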

import { complete } from '@combycode/llm-sdk';

// `attachments` now loads PDFs (and audio/video) by MIME, not just images —
// the SDK builds the right document/image part per provider.
const t0 = performance.now();
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What word is in this document?',
  attachments: ['../../official-samples/_fixtures/banana.pdf'],
  maxTokens: 64,
});

console.log(JSON.stringify({ result: text.trim(), ms: Math.round(performance.now() - t0) }));

Official SDKs each require a different document block shape. OpenAI’s Responses API uses input_file with a base64 file_data field. Anthropic’s SDK needs a document content block whose source carries a media_type. Google inlines the PDF bytes as inlineData with no special document type. ORXA resolves a single DocumentPart with a base64 DataSource into the right shape per provider, so the same attachments: ['./file.pdf'] call works for all three.

Page count limits apply to inline uploads. PDFs longer than ~32 pages (OpenAI) or 100 pages (Anthropic) require the Files API. attachments always inlines, so use the Files API path for large documents.

Scanned PDFs without selectable text may not work well. Providers read the text layer of a PDF; image-only scans (rasterised documents) are processed as image content, not text. Quality varies: Google Gemini handles image-only PDFs better than older OpenAI models.

text/plain vs image/* detection. If a .txt file contains binary content, the MIME sniff still returns text/plain (there is no magic-byte check for plain text). Binary content sent as a document will be corrupted. Always use the correct extension.
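Since the SDK does not verify that a .txt file is really text, a caller can guard against this before attaching. The helper below is a hypothetical pre-check and not part of the SDK: it rejects buffers that contain NUL bytes or that fail strict UTF-8 decoding.

```typescript
// Heuristic: treat the buffer as text only if it has no NUL bytes
// and decodes cleanly as UTF-8.
function looksLikeUtf8Text(bytes: Uint8Array): boolean {
  if (bytes.includes(0)) return false; // NUL bytes are a strong sign of binary data
  try {
    // fatal: true makes decode() throw on invalid byte sequences.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(looksLikeUtf8Text(new TextEncoder().encode('Quarterly report'))); // true
console.log(looksLikeUtf8Text(new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x00]))); // false
```

This is a heuristic only: files in other text encodings such as UTF-16 would be rejected, so adjust the check to the inputs you expect.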

Anthropic citations: true increases response size. With citations enabled, the response includes document excerpts alongside the answer. Account for this in maxTokens.

Next steps:

  • Image input — same attachments API for image files
  • File upload — persist a file server-side and reference it across calls
  • Batch — process many documents in parallel at provider-batch rates