PDF document input


What you will achieve

Send a one-page PDF containing the word BANANA and prompt 'What word is in this document?'. Assert the response matches /banana/i on OpenAI, Anthropic, and Google — with one call shape.

When and why you need this

Use this whenever the model must reason over structured document content: extracting data from a report, answering questions about a contract, summarising a research paper, or cross-referencing sections of a manual.

Raw provider SDKs diverge sharply for PDF input:

OpenAI accepts PDFs as a base64 content block (type input_file) in the Responses API, or as a file block in Chat Completions. The Files API upload-then-reference path is also supported.
Anthropic uses { type: 'document', source: { type: 'base64', media_type: 'application/pdf', data } } inside a content array.
Google treats a PDF as any other inlineData part with mimeType: 'application/pdf'.

Each path requires different code. The attachments option handles all three with the same call.
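To make the divergence concrete, the three inline shapes listed above can be sketched as plain object builders. The function names are hypothetical and exist only for illustration. The field names follow the shapes named in the text, and the data-URL form of file_data reflects OpenAI's public Responses API rather than anything this SDK documents.

```typescript
// Illustrative builders for the provider wire formats (hypothetical names).
// Each takes an already base64-encoded PDF.

// OpenAI Responses API: an input_file block whose file_data is a data URL.
function openAiResponsesBlock(b64: string, filename: string) {
  return {
    type: 'input_file' as const,
    filename,
    file_data: `data:application/pdf;base64,${b64}`,
  };
}

// Anthropic: a document block with a base64 source and snake_case media_type.
function anthropicBlock(b64: string) {
  return {
    type: 'document' as const,
    source: { type: 'base64' as const, media_type: 'application/pdf', data: b64 },
  };
}

// Google: a generic inlineData part with a camelCase mimeType.
function googlePart(b64: string) {
  return { inlineData: { mimeType: 'application/pdf', data: b64 } };
}
```

Three builders for one document is the duplication that a single attachments entry is meant to remove.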

Step by step

Step 1 — Send a PDF by file path

import { complete } from '@combycode/llm-sdk';

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What word is in this document? Reply with just the word.',
  attachments: ['./banana.pdf'],
  maxTokens: 64,
});

console.log(text); // "BANANA"

loadContent() reads the file, detects application/pdf from the .pdf extension (or from the %PDF magic bytes), base64-encodes the content, and returns a DocumentPart with a base64 DataSource. The provider adapter translates that into the correct wire format.
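The extension lookup and base64 step described above might look roughly like the sketch below. It is not the SDK's implementation. SketchDocumentPart, DOCUMENT_MIME_BY_EXT, toBase64, and documentPartFromFile are hypothetical names, and the caller supplies the file bytes so the example stays free of file-system access.

```typescript
// Hypothetical stand-in for the part shape that loadContent() is described as returning.
interface SketchDocumentPart {
  type: 'document';
  source: { type: 'base64'; mimeType: string; data: string };
}

// Only the two document extensions this page mentions.
const DOCUMENT_MIME_BY_EXT: Record<string, string> = {
  '.pdf': 'application/pdf',
  '.txt': 'text/plain',
};

// Base64-encode raw bytes using the standard btoa global.
function toBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Map the extension to a MIME type, then wrap the encoded bytes.
function documentPartFromFile(path: string, bytes: Uint8Array): SketchDocumentPart {
  const dot = path.lastIndexOf('.');
  const ext = dot >= 0 ? path.slice(dot).toLowerCase() : '';
  const mimeType = DOCUMENT_MIME_BY_EXT[ext];
  if (!mimeType) {
    throw new Error(`Not a document extension: ${ext || '(none)'}`);
  }
  return { type: 'document', source: { type: 'base64', mimeType, data: toBase64(bytes) } };
}
```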

Step 2 — Send a plain text document

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'Summarise this document in one sentence.',
  attachments: ['./report.txt'],
  maxTokens: 128,
});

Files with text/plain MIME type (.txt extension or detected MIME) also resolve to a DocumentPart. This is distinct from an image attachment — the provider receives it through the document/file path rather than the image path.

Step 3 — Send a PDF from bytes in memory

import { complete } from '@combycode/llm-sdk';
import { readFileSync } from 'fs';

const pdfBytes = new Uint8Array(readFileSync('./contract.pdf'));

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'List the parties named in this contract.',
  attachments: [pdfBytes],
  maxTokens: 256,
});

Pass a Uint8Array when the bytes are already in memory. With no file extension to go on, loadContent() detects application/pdf from the %PDF magic bytes (the first four bytes, 25 50 44 46 in hex).
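A magic-byte check of this kind is small enough to sketch in full. The helper name looksLikePdf is hypothetical, and the SDK's own sniffing may differ in detail.

```typescript
// "%PDF" as bytes: 0x25 0x50 0x44 0x46.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46];

// Returns true when the buffer starts with the PDF signature.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  for (let i = 0; i < PDF_MAGIC.length; i++) {
    if (bytes[i] !== PDF_MAGIC[i]) return false;
  }
  return true;
}
```

A check like this is useful as a guard before calling complete(), for example to reject an upload that claims to be a PDF but is not.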

Step 4 — Build a DocumentPart manually for citations

When you want to enable Anthropic’s citation feature, build the part directly:

import { complete } from '@combycode/llm-sdk';
import type { DocumentPart } from '@combycode/llm-sdk';
import { readFileSync } from 'fs';
import { Buffer } from 'buffer';

const raw = new Uint8Array(readFileSync('./paper.pdf'));
const b64 = Buffer.from(raw).toString('base64');

const docPart: DocumentPart = {
  type: 'document',
  source: { type: 'base64', mimeType: 'application/pdf', data: b64 },
  citations: true,
};

const { text } = await complete({
  model: 'anthropic/claude-opus-4-5',
  apiKey: process.env.ANTHROPIC_API_KEY,
  messages: [
    {
      role: 'user',
      content: [
        docPart,
        { type: 'text', text: 'What does section 3 conclude?' },
      ],
    },
  ],
  maxTokens: 512,
});

citations: true is passed through to Anthropic’s API when the adapter supports it. It has no effect on OpenAI or Google adapters.
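As a rough picture of that pass-through, an adapter could translate the boolean into the object form that Anthropic's Messages API expects on a document block (citations: { enabled: true }). The type and function names below are hypothetical. This is a sketch of the mapping, not the adapter's actual code.

```typescript
// Hypothetical SDK-side part, matching the shape built in Step 4.
interface CitableDocumentPart {
  type: 'document';
  source: { type: 'base64'; mimeType: string; data: string };
  citations?: boolean;
}

// Anthropic-side document block.
interface AnthropicDocumentBlock {
  type: 'document';
  source: { type: 'base64'; media_type: string; data: string };
  citations?: { enabled: boolean };
}

// Rename mimeType to media_type and expand the citations flag.
function toAnthropicDocument(part: CitableDocumentPart): AnthropicDocumentBlock {
  const block: AnthropicDocumentBlock = {
    type: 'document',
    source: {
      type: 'base64',
      media_type: part.source.mimeType,
      data: part.source.data,
    },
  };
  // Only attach the field when citations were requested.
  if (part.citations) {
    block.citations = { enabled: true };
  }
  return block;
}
```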

Your options

DocumentPart shape (ContentPart of type 'document'):

Field	Type	Description
`type`	`'document'`	Discriminator — set by `loadContent()` when MIME is `application/pdf` or `text/plain`.
`source`	`DataSource`	Where the document bytes come from (see below).
`citations`	`boolean`	Optional. When `true`, Anthropic returns inline citations. Ignored by other providers.

DataSource variants for documents:

`type`	Required fields	When to use
`'base64'`	`mimeType: string`, `data: string`	Raw base64-encoded document bytes. Output of `loadContent()`.
`'buffer'`	`mimeType: string`, `data: Uint8Array`	Raw bytes in memory. `mimeType` must be `'application/pdf'` or `'text/plain'`.
`'path'`	`mimeType: string`, `path: string`	Local file path (Node/Bun). SDK reads and encodes. Use `attachments` instead for simpler calls.
`'file'`	`fileId: string`	A file previously uploaded via the Files API.
`'provider_ref'`	`mimeType: string`, `refId: string`	Provider-specific file reference (e.g. Google Files API URI).
`'url'`	`url: string`	Not auto-detected as document by `loadContent()`. Use file-path or bytes instead.
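Read as a type, the table above corresponds to a discriminated union along these lines. SketchDataSource and describeSource are hypothetical names, and the SDK's exported DataSource type may carry additional fields.

```typescript
// One variant per table row, discriminated on `type`.
type SketchDataSource =
  | { type: 'base64'; mimeType: string; data: string }
  | { type: 'buffer'; mimeType: string; data: Uint8Array }
  | { type: 'path'; mimeType: string; path: string }
  | { type: 'file'; fileId: string }
  | { type: 'provider_ref'; mimeType: string; refId: string }
  | { type: 'url'; url: string };

// Narrowing on `type` gives access to each variant's own fields.
function describeSource(src: SketchDataSource): string {
  switch (src.type) {
    case 'base64':
      return `inline base64, ${src.mimeType}, ${src.data.length} chars`;
    case 'buffer':
      return `in-memory bytes, ${src.mimeType}, ${src.data.length} bytes`;
    case 'path':
      return `local file ${src.path}`;
    case 'file':
      return `uploaded file ${src.fileId}`;
    case 'provider_ref':
      return `provider reference ${src.refId}`;
    case 'url':
      return `remote URL ${src.url}`;
  }
}
```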

MIME type detection for documents:

Extension / Magic	MIME type	Part type
`.pdf` / `%PDF` magic bytes	`application/pdf`	`DocumentPart`
`.txt`	`text/plain`	`DocumentPart`
`.png`, `.jpg`, `.gif`, `.webp`	`image/*`	`ImagePart` — NOT a document
`.wav`, `.mp3`, `.m4a`, etc.	`audio/*`	`AudioPart` — NOT a document

Provider support and constraints:

Provider	PDF support	Native text/plain	Approximate size limit
OpenAI	Yes (Responses API: `input_file` block; Chat Completions: `file` block)	Yes	512 MB per file (Files API); ~32 pages inline
Anthropic	Yes (`document` block with `base64` source)	Yes	32 MB per document; 100 pages per request
Google	Yes (`inlineData` with `application/pdf`)	Yes (as `inlineData`)	20 MB inline; use Files API for larger
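A pre-flight size check can catch oversized files before any bytes are sent. The sketch below copies the inline byte limits from the table above, which are approximate and may change. OpenAI is left out because its inline limit is given in pages. The names are hypothetical, and whether a provider measures raw or base64-encoded size is an assumption the table does not settle, so the check uses raw size.

```typescript
const MB = 1024 * 1024;

// Inline byte limits taken from the table above (approximate).
const INLINE_BYTE_LIMIT: Record<string, number> = {
  anthropic: 32 * MB,
  google: 20 * MB,
};

// true: fits inline; false: too large; undefined: no byte limit known here.
function fitsInline(provider: string, byteLength: number): boolean | undefined {
  const limit = INLINE_BYTE_LIMIT[provider];
  if (limit === undefined) return undefined;
  return byteLength <= limit;
}
```

A result of false is the signal to switch from attachments to the Files API path.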

When to use Files API instead of inline:

For large documents, or documents shared across many requests, upload once with the Files upload helper and reference the result through a DataSource of type 'file'. The inline path (attachments) re-encodes and re-sends the document on every call.
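The upload-once pattern can be sketched as a small cache around whatever upload function you use. Everything here is hypothetical: UploadFn stands in for the Files upload helper, whose real name and signature are not shown on this page, and makeUploadCache is an illustrative wrapper rather than part of the SDK.

```typescript
// Hypothetical upload function: takes a local path, resolves to a file id.
type UploadFn = (path: string) => Promise<string>;

// Returns a function that uploads each path at most once and yields
// a 'file' DataSource pointing at the stored copy.
function makeUploadCache(upload: UploadFn) {
  const pending = new Map<string, Promise<string>>();
  return function fileSourceFor(
    path: string,
  ): Promise<{ type: 'file'; fileId: string }> {
    let idPromise = pending.get(path);
    if (!idPromise) {
      // First request for this path: start the upload and remember it.
      idPromise = upload(path);
      pending.set(path, idPromise);
    }
    return idPromise.then((fileId) => ({ type: 'file' as const, fileId }));
  };
}
```

Caching the promise rather than the resolved id means concurrent callers for the same path share one upload.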

Compare the SDKs

import { complete } from '@combycode/llm-sdk';

// `attachments` now loads PDFs (and audio/video) by MIME, not just images —
// the SDK builds the right document/image part per provider.
const t0 = performance.now();
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What word is in this document?',
  attachments: ['../../official-samples/_fixtures/banana.pdf'],
  maxTokens: 64,
});

console.log(JSON.stringify({ result: text.trim(), ms: Math.round(performance.now() - t0) }));

The official SDKs each require a different document block shape. OpenAI's Responses API uses input_file with a base64 file_data field. Anthropic's SDK needs a document content block whose source carries a media_type. Google inlines the PDF bytes as inlineData with no special document type. ORXA resolves a single DocumentPart with a base64 DataSource into the right shape per provider, so the same attachments: ['./file.pdf'] call works for all three.

Gotchas and next steps

Page-count limits apply to inline uploads. Beyond roughly 32 pages (OpenAI) or 100 pages (Anthropic), a PDF has to go through the Files API. The attachments option always inlines, so use the Files API path for large documents.

Scanned PDFs without selectable text may not work well. Providers read the text layer of a PDF; image-only scans (rasterised documents) are processed as image content, not text. Quality varies: Google Gemini handles image-only PDFs better than older OpenAI models.

text/plain detection trusts the extension. If a .txt file contains binary content, the MIME sniff still returns text/plain, because plain text has no magic-byte check. Binary content sent as a document reaches the provider as garbage. Always use the correct extension.
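If you cannot trust the extension, a cheap guard is to confirm the bytes decode as UTF-8 text before attaching them. The helper below is a heuristic sketch under that assumption. looksLikeUtf8Text is a hypothetical name, and the check relies on the standard TextDecoder with fatal: true, which throws on invalid byte sequences.

```typescript
// Heuristic: reject buffers with NUL bytes or invalid UTF-8 sequences.
function looksLikeUtf8Text(bytes: Uint8Array): boolean {
  // NUL is valid UTF-8 but rarely appears in real text files.
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0) return false;
  }
  try {
    // fatal: true makes decode() throw instead of inserting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}
```

Text in other encodings (for example Latin-1) will usually fail this check too, so treat a false result as a prompt to inspect the file, not as proof that it is binary.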

Anthropic citations: true increases response size. With citations enabled, the response includes document excerpts alongside the answer. Account for this in maxTokens.

Next steps:

Image input — same attachments API for image files
File upload — persist a file server-side and reference it across calls
Batch — process many documents in parallel at provider-batch rates