PDF document input
What you will achieve
Send a one-page PDF containing the word BANANA with the prompt 'What word is in this document?'. Assert that the response matches `/banana/i` on OpenAI, Anthropic, and Google, all with one call shape.
When and why you need this
Use this whenever the model must reason over structured document content: extracting data from a report, answering questions about a contract, summarising a research paper, or cross-referencing sections of a manual.
Raw provider SDKs diverge sharply for PDF input:

- OpenAI accepts PDFs as a base64 content block (type `input_file`) in the Responses API, or as a `file` block in Chat Completions. The Files API upload-then-reference path is also supported.
- Anthropic uses `{ type: 'document', source: { type: 'base64', media_type: 'application/pdf', data } }` inside a `content` array.
- Google treats a PDF like any other `inlineData` part, with `mimeType: 'application/pdf'`.

Each path requires different code. `attachments` handles all three with the same call.
Step by step
Step 1: Send a PDF by file path

```typescript
import { complete } from '@combycode/llm-sdk';

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What word is in this document? Reply with just the word.',
  attachments: ['./banana.pdf'],
  maxTokens: 64,
});

console.log(text); // "BANANA"
```

`loadContent()` reads the file, detects `application/pdf` from the `.pdf` extension (or from the `%PDF` magic bytes), base64-encodes the content, and returns a `DocumentPart` with a `base64` `DataSource`. The provider adapter translates that into the correct wire format.
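The encode-and-wrap part of that pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the SDK's `loadContent()` source: `bytesToBase64`, `toDocumentPart`, and `SketchDocumentPart` are hypothetical names, and the sketch takes bytes directly rather than reading a file.

```typescript
// Hypothetical sketch: NOT the SDK's loadContent() implementation.
type SketchDocumentPart = {
  type: 'document';
  source: { type: 'base64'; mimeType: string; data: string };
};

// Base64-encode raw bytes using only the standard btoa global.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Wrap already-detected bytes in a document part with a base64 source.
function toDocumentPart(bytes: Uint8Array, mimeType: string): SketchDocumentPart {
  return {
    type: 'document',
    source: { type: 'base64', mimeType, data: bytesToBase64(bytes) },
  };
}

// The four bytes "%PDF" stand in for a real file here.
const part = toDocumentPart(new Uint8Array([0x25, 0x50, 0x44, 0x46]), 'application/pdf');
console.log(part.source.data); // "JVBERg=="
```

Building the binary string one character at a time is fine for a sketch; a real loader would more likely use a streaming or chunked encoder for large files.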
Step 2: Send a plain text document

```typescript
import { complete } from '@combycode/llm-sdk';

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'Summarise this document in one sentence.',
  attachments: ['./report.txt'],
  maxTokens: 128,
});
```

Files with a `text/plain` MIME type (`.txt` extension or detected MIME) also resolve to a `DocumentPart`. This is distinct from an image attachment: the provider receives it through the document/file path rather than the image path.
Step 3: Send a PDF from bytes in memory

```typescript
import { complete } from '@combycode/llm-sdk';
import { readFileSync } from 'fs';

// readFileSync returns a Buffer; copy it into a plain Uint8Array.
const pdfBytes = new Uint8Array(readFileSync('./contract.pdf'));

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'List the parties named in this contract.',
  attachments: [pdfBytes],
  maxTokens: 256,
});
```

Pass a `Uint8Array` when the bytes are already in memory. `loadContent()` detects `application/pdf` from the `%PDF` magic bytes (bytes 0-3, hex `25 50 44 46`).
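A magic-byte check of this kind might look like the sketch below. `startsWithPdfMagic` is a hypothetical helper written for illustration; how the SDK performs its own detection is not shown in this page.

```typescript
// Hypothetical helper: checks for the "%PDF" signature at the start of a buffer.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46]; // "%", "P", "D", "F"

function startsWithPdfMagic(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((value, index) => bytes[index] === value);
}

const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]); // "%PDF-1.7"
const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47]); // start of a PNG signature

console.log(startsWithPdfMagic(pdfHeader)); // true
console.log(startsWithPdfMagic(pngHeader)); // false
```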
Step 4: Build a DocumentPart manually for citations

To enable Anthropic's citation feature, build the part directly:

```typescript
import { complete } from '@combycode/llm-sdk';
import type { DocumentPart } from '@combycode/llm-sdk';
import { readFileSync } from 'fs';
import { Buffer } from 'buffer';

// Read the PDF and base64-encode it by hand.
const raw = new Uint8Array(readFileSync('./paper.pdf'));
const b64 = Buffer.from(raw).toString('base64');

const docPart: DocumentPart = {
  type: 'document',
  source: { type: 'base64', mimeType: 'application/pdf', data: b64 },
  citations: true,
};

const { text } = await complete({
  model: 'anthropic/claude-opus-4-5',
  apiKey: process.env.ANTHROPIC_API_KEY,
  messages: [
    {
      role: 'user',
      content: [
        docPart,
        { type: 'text', text: 'What does section 3 conclude?' },
      ],
    },
  ],
  maxTokens: 512,
});
```

`citations: true` is passed through to Anthropic's API when the adapter supports it. It has no effect on the OpenAI or Google adapters.
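For orientation, Anthropic's Messages API takes the citation switch on a document block as an object of the form `{ enabled: true }` rather than a bare boolean. The sketch below shows how a pass-through could map the flag; `toAnthropicDocumentBlock` and both type names are hypothetical, and the SDK's real adapter may be structured differently.

```typescript
// Hypothetical adapter step: illustrates the mapping, not the SDK's actual code.
type SketchPart = {
  type: 'document';
  source: { type: 'base64'; mimeType: string; data: string };
  citations?: boolean;
};

type AnthropicDocumentBlock = {
  type: 'document';
  source: { type: 'base64'; media_type: string; data: string };
  citations?: { enabled: boolean };
};

function toAnthropicDocumentBlock(part: SketchPart): AnthropicDocumentBlock {
  const block: AnthropicDocumentBlock = {
    type: 'document',
    // Note the rename from camelCase mimeType to snake_case media_type.
    source: { type: 'base64', media_type: part.source.mimeType, data: part.source.data },
  };
  // Only emit the citations field when the caller asked for it.
  if (part.citations) {
    block.citations = { enabled: true };
  }
  return block;
}

const block = toAnthropicDocumentBlock({
  type: 'document',
  source: { type: 'base64', mimeType: 'application/pdf', data: 'JVBERg==' },
  citations: true,
});
console.log(JSON.stringify(block.citations)); // {"enabled":true}
```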
Your options
`DocumentPart` shape (`ContentPart` of type `'document'`):

| Field | Type | Description |
|---|---|---|
| `type` | `'document'` | Discriminator. Set by `loadContent()` when the MIME type is `application/pdf` or `text/plain`. |
| `source` | `DataSource` | Where the document bytes come from (see below). |
| `citations` | `boolean` | Optional. When `true`, Anthropic returns inline citations. Ignored by other providers. |
DataSource variants for documents:
| `type` | Required fields | When to use |
|---|---|---|
| `'base64'` | `mimeType: string`, `data: string` | Raw base64-encoded document bytes. Output of `loadContent()`. |
| `'buffer'` | `mimeType: string`, `data: Uint8Array` | Raw bytes in memory. `mimeType` must be `'application/pdf'` or `'text/plain'`. |
| `'path'` | `mimeType: string`, `path: string` | Local file path (Node/Bun). The SDK reads and encodes the file. Use `attachments` instead for simpler calls. |
| `'file'` | `fileId: string` | A file previously uploaded via the Files API. |
| `'provider_ref'` | `mimeType: string`, `refId: string` | Provider-specific file reference (e.g. Google Files API URI). |
| `'url'` | `url: string` | Not auto-detected as a document by `loadContent()`. Use a file path or bytes instead. |
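Read as a type, the table above suggests a discriminated union along these lines. The declaration is reconstructed from the table only; `SketchDataSource` and `carriesBytes` are hypothetical names, and the SDK's actual `DataSource` declaration may differ.

```typescript
// Reconstructed from the table above; the SDK's real declarations may differ.
type SketchDataSource =
  | { type: 'base64'; mimeType: string; data: string }
  | { type: 'buffer'; mimeType: string; data: Uint8Array }
  | { type: 'path'; mimeType: string; path: string }
  | { type: 'file'; fileId: string }
  | { type: 'provider_ref'; mimeType: string; refId: string }
  | { type: 'url'; url: string };

// True when the variant carries the document bytes itself,
// false when it only points at a path, uploaded file, provider reference, or URL.
function carriesBytes(source: SketchDataSource): boolean {
  switch (source.type) {
    case 'base64':
    case 'buffer':
      return true;
    case 'path':
    case 'file':
    case 'provider_ref':
    case 'url':
      return false;
  }
}

console.log(carriesBytes({ type: 'base64', mimeType: 'application/pdf', data: 'JVBERg==' })); // true
console.log(carriesBytes({ type: 'file', fileId: 'file-123' })); // false
```

Switching on the `type` field lets the compiler narrow each branch to the fields listed in the matching table row.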
MIME type detection for documents:
| Extension / Magic | MIME type | Part type |
|---|---|---|
| `.pdf` / `%PDF` magic bytes | `application/pdf` | `DocumentPart` |
| `.txt` | `text/plain` | `DocumentPart` |
| `.png`, `.jpg`, `.gif`, `.webp` | `image/*` | `ImagePart` (not a document) |
| `.wav`, `.mp3`, `.m4a`, etc. | `audio/*` | `AudioPart` (not a document) |
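The extension half of that table can be expressed as a small lookup. This is a sketch under assumptions: `partKindForPath` and `EXTENSION_KINDS` are hypothetical names, and the list covers only the extensions named in the table, so the SDK may well recognise more.

```typescript
type PartKind = 'document' | 'image' | 'audio' | 'unknown';

// Extensions taken from the table above; the real SDK may recognise more.
const EXTENSION_KINDS: Record<string, PartKind> = {
  '.pdf': 'document',
  '.txt': 'document',
  '.png': 'image',
  '.jpg': 'image',
  '.gif': 'image',
  '.webp': 'image',
  '.wav': 'audio',
  '.mp3': 'audio',
  '.m4a': 'audio',
};

function partKindForPath(path: string): PartKind {
  const dot = path.lastIndexOf('.');
  if (dot === -1) return 'unknown';
  // Compare case-insensitively so "SCAN.PDF" matches ".pdf".
  const ext = path.slice(dot).toLowerCase();
  return EXTENSION_KINDS[ext] ?? 'unknown';
}

console.log(partKindForPath('./banana.pdf')); // "document"
console.log(partKindForPath('photo.PNG')); // "image"
console.log(partKindForPath('archive.zip')); // "unknown"
```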
Provider support and constraints:
| Provider | PDF support | Native text/plain | Approximate size limit |
|---|---|---|---|
| OpenAI | Yes (Responses API: input_file block; Chat Completions: file block) | Yes | 512 MB per file (Files API); ~32 pages inline |
| Anthropic | Yes (document block with base64 source) | Yes | 32 MB per document; 100 pages per request |
| Google | Yes (inlineData with application/pdf) | Yes (as inlineData) | 20 MB inline; use Files API for larger |
When to use Files API instead of inline:
For large documents or documents shared across many requests, upload once with the Files upload helper and reference by file DataSource. The inline path (attachments) re-encodes and re-sends the document on every call.
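One way to make that decision in code is a pre-flight check against the approximate limits from the table above. Everything here is an assumption for illustration: `shouldUseFilesApi` and `INLINE_LIMITS` are hypothetical, MB is taken as 1024 × 1024 bytes, and the numbers should be checked against each provider's current documentation.

```typescript
type Provider = 'openai' | 'anthropic' | 'google';

// Approximate inline limits copied from the table above; verify before relying on them.
const MB = 1024 * 1024;
const INLINE_LIMITS: Record<Provider, { maxBytes?: number; maxPages?: number }> = {
  openai: { maxPages: 32 },
  anthropic: { maxBytes: 32 * MB, maxPages: 100 },
  google: { maxBytes: 20 * MB },
};

// Returns true when the document exceeds a known inline limit for the provider.
function shouldUseFilesApi(provider: Provider, byteLength: number, pageCount: number): boolean {
  const { maxBytes, maxPages } = INLINE_LIMITS[provider];
  const tooBig = maxBytes !== undefined && byteLength > maxBytes;
  const tooLong = maxPages !== undefined && pageCount > maxPages;
  return tooBig || tooLong;
}

console.log(shouldUseFilesApi('google', 25 * MB, 10)); // true
console.log(shouldUseFilesApi('anthropic', 5 * MB, 40)); // false
console.log(shouldUseFilesApi('openai', 1 * MB, 40)); // true
```

Base64 encoding inflates a payload by roughly a third, so a conservative caller might compare the encoded size rather than the raw byte length.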
Compare the SDKs
Official SDKs each require a different document block shape. OpenAI's Responses API uses `input_file` with a base64 `file_data` field. Anthropic's SDK needs a `document` content block with a `source` containing `media_type`. Google inlines the PDF bytes as `inlineData` with no special document type. ORXA resolves a single `DocumentPart` with a `base64` `DataSource` into the right shape per provider, so the same `attachments: ['./file.pdf']` call works for all three.
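To make the divergence concrete, the three provider shapes can be sketched side by side. `toWireBlock` is a hypothetical function and not the SDK's adapter code; the default `filename` is likewise an assumption. The OpenAI branch assumes the Responses API form in which `file_data` is a data URL string.

```typescript
type Provider = 'openai' | 'anthropic' | 'google';

// Hypothetical translation step; block shapes follow the descriptions above.
function toWireBlock(
  provider: Provider,
  base64: string,
  filename = 'document.pdf',
): Record<string, unknown> {
  switch (provider) {
    case 'openai':
      // Responses API: file_data carries the bytes as a data URL string.
      return { type: 'input_file', filename, file_data: `data:application/pdf;base64,${base64}` };
    case 'anthropic':
      // Messages API: a document block with a snake_case media_type.
      return { type: 'document', source: { type: 'base64', media_type: 'application/pdf', data: base64 } };
    case 'google':
      // Gemini: a generic inlineData part, no document-specific type.
      return { inlineData: { mimeType: 'application/pdf', data: base64 } };
  }
}

console.log(JSON.stringify(toWireBlock('google', 'JVBERg==')));
// {"inlineData":{"mimeType":"application/pdf","data":"JVBERg=="}}
```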
Gotchas and next steps
Page count limits apply to inline uploads. Above ~32 pages (OpenAI) or 100 pages (Anthropic), PDFs require the Files API. `attachments` always inlines, so use the Files API path for large documents.
Scanned PDFs without selectable text may not work well. Providers read the text layer of a PDF; image-only scans (rasterised documents) are processed as image content, not text. Quality varies: Google Gemini handles image-PDF better than older OpenAI models.
`text/plain` vs `image/*` detection. If a `.txt` file contains binary content, the MIME sniff still returns `text/plain` because there is no magic-byte check for plain text. Binary content sent as a text document will be corrupted. Always use the correct extension.
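Because plain text gets no magic-byte check, a caller who cannot trust file extensions could run a guard of their own before attaching a file. The sketch below uses a common heuristic (a NUL byte in the first few kilobytes); `looksBinary` is a hypothetical helper, not part of the SDK, and the heuristic will misclassify UTF-16 text.

```typescript
// Hypothetical pre-flight check; not part of the SDK.
// Flags a buffer as binary if a NUL byte appears in the sampled prefix,
// which ASCII and UTF-8 text files almost never contain.
function looksBinary(bytes: Uint8Array, sampleSize = 8192): boolean {
  const end = Math.min(bytes.length, sampleSize);
  for (let i = 0; i < end; i++) {
    if (bytes[i] === 0x00) return true;
  }
  return false;
}

const textBytes = new Uint8Array([0x48, 0x69, 0x0a]); // "Hi\n"
const zipLikeBytes = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x00, 0x00]); // ZIP-style header with NUL bytes

console.log(looksBinary(textBytes)); // false
console.log(looksBinary(zipLikeBytes)); // true
```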
Anthropic `citations: true` increases response size. With citations enabled, the response includes document excerpts alongside the answer. Account for this in `maxTokens`.
Next steps:
- Image input — same attachments API for image files
- File upload — persist a file server-side and reference it across calls
- Batch — process many documents in parallel at provider-batch rates