Text embeddings
What you will achieve
Call embed() with a string or an array of strings and receive a vector (or matrix) back: embeddings[0] is the first vector, dimensions is its length, and token usage is reported when the provider returns it. One call shape works across OpenAI, Google, and OpenRouter. Anthropic and xAI have no first-party embeddings endpoint and are not supported.
When and why
Embeddings convert text into a fixed-length numeric vector that captures semantic meaning. Two sentences with similar meaning land close together in vector space; unrelated sentences land far apart. You need them for:
- Semantic search: retrieve the most relevant chunks from a document corpus without exact keyword matches.
- RAG (retrieval-augmented generation): feed retrieved chunks as context into a subsequent complete() call.
- Clustering / classification: group texts by topic without labelled training data.
- Deduplication: detect near-duplicate records faster than string diff.
The raw problem: OpenAI’s call is client.embeddings.create({ model, input }) and returns data[0].embedding. Google’s call is ai.models.embedContent({ model, contents }) and returns embeddings[0].values. Different endpoints, different auth headers, different extraction paths. embed() normalises all of this.
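To make the difference concrete, here is a minimal sketch of the extraction step each provider needs. The type names and both helper functions are hypothetical and model only the response fields named above; this is not the SDK's internal code.

```typescript
// Hypothetical types: only the response fields mentioned in the text.
type OpenAIStyleResponse = { data: { embedding: number[] }[] };
type GoogleStyleResponse = { embeddings: { values: number[] }[] };

// Pull one vector per input out of an OpenAI-shaped response.
function vectorsFromOpenAI(res: OpenAIStyleResponse): number[][] {
  return res.data.map((item) => item.embedding);
}

// Pull one vector per input out of a Google-shaped response.
function vectorsFromGoogle(res: GoogleStyleResponse): number[][] {
  return res.embeddings.map((item) => item.values);
}

// Two different extraction paths, one common number[][] result.
const fromOpenAI = vectorsFromOpenAI({ data: [{ embedding: [0.1, 0.2] }] });
const fromGoogle = vectorsFromGoogle({ embeddings: [{ values: [0.1, 0.2] }] });
console.log(fromOpenAI, fromGoogle);
```

Once both paths produce number[][], everything downstream (similarity, indexing) can ignore which provider was used.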
Step by step
Step 1 — Embed a single string

    import { embed } from '@combycode/llm-sdk';

    const result = await embed({
      model: 'openai/text-embedding-3-small',
      apiKey: process.env.OPENAI_API_KEY,
      input: 'The quick brown fox',
    });

    console.log(result.dimensions);           // e.g. 1536
    console.log(result.embeddings[0].length); // same as dimensions
    console.log(result.embeddings[0][0]);     // first component, e.g. 0.012...

result.embeddings is always a number[][], with one row per input string. For a single string, result.embeddings[0] is your vector.
Step 2 — Embed multiple strings in one call
OpenAI and OpenRouter accept a batch in one HTTP round-trip. The Google adapter does not batch; it issues one embedContent call per string and loops automatically:

    const result = await embed({
      model: 'openai/text-embedding-3-small',
      apiKey: process.env.OPENAI_API_KEY,
      input: [
        'The quick brown fox',
        'A lazy dog rests here',
        'TypeScript is a typed superset of JavaScript',
      ],
    });

    // result.embeddings[0] -- vector for first string
    // result.embeddings[1] -- vector for second string
    // result.embeddings[2] -- vector for third string
    console.log(result.embeddings.length); // 3

All vectors returned by one model have the same number of dimensions, so they can be compared with cosine similarity.
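Because the rows come back in input order, it is often convenient to pair each string with its vector straight away. The helper below is a hypothetical sketch, not part of the SDK; it assumes only that embeddings has one row per input, as described above.

```typescript
// Hypothetical helper: pair each input string with its vector by position.
function pairInputsWithVectors(
  inputs: string[],
  embeddings: number[][],
): { text: string; vector: number[] }[] {
  // A length mismatch means the positional pairing cannot be trusted.
  if (inputs.length !== embeddings.length) {
    throw new Error(`expected ${inputs.length} vectors, got ${embeddings.length}`);
  }
  return inputs.map((text, i) => ({ text, vector: embeddings[i] ?? [] }));
}
```

A call such as pairInputsWithVectors(texts, result.embeddings) would then yield records that are ready to store in an index.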
Step 3 — Check token usage (cost hook)
embed() emits an onCompletion hook after each call so the cost collector can account for embedding tokens. You can also read usage directly:

    const result = await embed({
      model: 'openai/text-embedding-3-small',
      input: 'Hello world',
    });

    console.log(result.usage?.inputTokens); // e.g. 2

usage is { inputTokens: number } | undefined. Google’s adapter currently returns no usage object (the Gemini embedContent response does not carry token counts); for OpenAI and OpenRouter, inputTokens is filled from prompt_tokens in the response body.
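Since usage can be undefined, code that totals tokens across several calls has to handle the missing case explicitly. The following is a minimal sketch under that assumption; summarizeUsage and the EmbedUsage alias are hypothetical names, not SDK exports.

```typescript
// Hypothetical alias mirroring the usage shape described above.
type EmbedUsage = { inputTokens: number } | undefined;

// Sum reported tokens and count the calls that reported nothing.
function summarizeUsage(usages: EmbedUsage[]): { inputTokens: number; missing: number } {
  let inputTokens = 0;
  let missing = 0;
  for (const usage of usages) {
    if (usage === undefined) {
      missing += 1; // e.g. a provider that returns no token count
    } else {
      inputTokens += usage.inputTokens;
    }
  }
  return { inputTokens, missing };
}
```

Tracking the missing count separately keeps an unreported call from being mistaken for a call that cost zero tokens.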
Step 4 — Switch to Google
The call shape is identical; only the model string changes:

    const result = await embed({
      model: 'google/gemini-embedding-exp-03-07',
      apiKey: process.env.GOOGLE_API_KEY,
      input: 'The quick brown fox',
    });

Google routes to POST /v1beta/models/{model}:embedContent. The adapter adds the x-goog-api-key header and unwraps embedding.values from the response body.
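For readers curious what that routing amounts to on the wire, here is a rough sketch of the request and the unwrap step. It is not the adapter's code: the function names are hypothetical, and the host name and the content.parts request body are assumptions based on the public Gemini REST API rather than anything stated in this guide.

```typescript
type PreparedRequest = {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
};

// Build (but do not send) an embedContent request for one string.
// `model` is the bare model id, without the "google/" prefix.
function buildGoogleEmbedRequest(model: string, apiKey: string, text: string): PreparedRequest {
  return {
    // Assumed host; the path and header come from the text above.
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:embedContent`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
    // Assumed body shape: a single content object with one text part.
    body: JSON.stringify({ content: { parts: [{ text }] } }),
  };
}

// Unwrap the vector from a parsed embedContent response body.
function unwrapGoogleEmbedding(responseBody: { embedding: { values: number[] } }): number[] {
  return responseBody.embedding.values;
}
```

Keeping the request builder separate from the network call makes the shape easy to inspect in a unit test without an API key.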
Step 5 — Plug vectors into a similarity search
    import { embed } from '@combycode/llm-sdk';

    // Cosine similarity: dot product divided by the product of the magnitudes.
    function cosineSimilarity(a: number[], b: number[]): number {
      const dot = a.reduce((s, v, i) => s + v * (b[i] ?? 0), 0);
      const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
      const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
      return dot / (magA * magB);
    }

    const corpus = ['OpenAI makes GPT models', 'Google makes Gemini', 'Rust is memory safe'];

    // Embed the whole corpus in one call.
    const { embeddings: corpusVecs } = await embed({
      model: 'openai/text-embedding-3-small',
      input: corpus,
    });

    // Embed the query with the same model.
    const { embeddings: [queryVec] } = await embed({
      model: 'openai/text-embedding-3-small',
      input: 'Who built GPT?',
    });

    // Score every corpus entry against the query, best match first.
    const ranked = corpus
      .map((text, i) => ({ text, score: cosineSimilarity(queryVec, corpusVecs[i]) }))
      .sort((a, b) => b.score - a.score);

    console.log(ranked[0].text); // 'OpenAI makes GPT models'

For a full retrieval pipeline with chunking, indexing, and RAG, see the Retrieval (RAG) guide.
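When the same corpus is queried many times, one common variation is to scale every vector to unit length once, so that each query needs only a dot product. The sketch below illustrates that idea with hypothetical helper names (normalize, buildIndex, topK); it is an alternative to the cosine function above, not something the SDK provides.

```typescript
type IndexedEntry = { text: string; unit: number[] };

// Scale a vector to unit length; a zero vector is returned as a copy.
function normalize(v: number[]): number[] {
  const mag = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return mag === 0 ? v.slice() : v.map((x) => x / mag);
}

// Dot product; for two unit-length vectors this equals cosine similarity.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  return a.reduce((s, x, i) => s + x * (b[i] ?? 0), 0);
}

// Normalise the corpus vectors once, up front.
function buildIndex(texts: string[], vectors: number[][]): IndexedEntry[] {
  return texts.map((text, i) => ({ text, unit: normalize(vectors[i] ?? []) }));
}

// Return the k highest-scoring entries for a query vector.
function topK(index: IndexedEntry[], query: number[], k: number): { text: string; score: number }[] {
  const q = normalize(query);
  return index
    .map(({ text, unit }) => ({ text, score: dot(q, unit) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The dimension check in dot also surfaces the case where a query was embedded with a different model than the corpus.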
Your options
embed() accepts an EmbedOptions object:
| Option | Type | Required | Description |
|---|---|---|---|
| `model` | `string` | Yes | Namespaced (`openai/text-embedding-3-small`) or bare with `provider`. Determines which adapter is used. |
| `input` | `string \| string[]` | Yes | One string or a batch. Batch is sent in a single request for OpenAI/OpenRouter; looped for Google. |
| `provider` | `ProviderName` | No | Required when `model` is bare (e.g. `model: 'text-embedding-3-small', provider: 'openai'`). |
| `apiKey` | `string` | No | Falls back to `engine.apiKeys[provider]` from the global engine config. |
| `adapter` | `EmbeddingProviderAdapter` | No | Override the auto-selected adapter with a custom one. Useful for testing or self-hosted endpoints. |
| `engine` | `EngineHandle` | No | Override the global engine (rate-limit queue, hooks, keys). Defaults to `coreRegistry.get()`. |
Return value (EmbedResult):
| Field | Type | Notes |
|---|---|---|
| `embeddings` | `number[][]` | One vector per input string, in order. Length equals `input.length` (or 1 for a single string). |
| `dimensions` | `number` | Length of each vector (`embeddings[0].length`). 0 if the response was empty. |
| `model` | `string` | The model id echoed from the request. |
| `usage` | `{ inputTokens: number } \| undefined` | Present for OpenAI and OpenRouter; absent for Google (Gemini embedContent carries no token count). |
Supported providers:
| Provider | Endpoint | dimensions note |
|---|---|---|
| `openai` | `POST /v1/embeddings` | From `data[0].embedding.length`. OpenAI text-embedding-3-* models support `dimensions` truncation natively; pass it via params on a custom adapter if needed. |
| `openrouter` | `POST /api/v1/embeddings` (OpenAI-compat) | Routed to any OpenAI-compatible embedding model available on OpenRouter. |
| `google` | `POST /v1beta/models/{model}:embedContent` | Looped per input (the adapter sends one request per string). No usage returned. |
| `anthropic` | n/a | No first-party endpoint. Not supported. |
| `xai` | n/a | No first-party endpoint. Not supported. |
When to use a custom adapter:
Pass adapter when you need a self-hosted or custom-base-URL endpoint. For example, an Azure OpenAI deployment:
    import { embed } from '@combycode/llm-sdk';
    import { OpenAIEmbeddingAdapter } from '@combycode/llm-sdk/providers/openai';

    // Point the OpenAI adapter at an Azure OpenAI deployment.
    const azureAdapter = new OpenAIEmbeddingAdapter({
      apiKey: process.env.AZURE_KEY!,
      baseURL: 'https://my-resource.openai.azure.com/openai/deployments/my-embed-deployment',
    });

    // The explicit adapter overrides provider auto-selection.
    const result = await embed({
      model: 'text-embedding-3-small',
      input: 'hello',
      adapter: azureAdapter,
    });

Compare the SDKs
OpenAI’s official SDK calls client.embeddings.create() and returns data[0].embedding; Google’s returns embeddings[0].values; OpenRouter mirrors the OpenAI shape. ORXA unifies these under one embed() signature with a consistent EmbedResult. All HTTP flows through engine.fetch, which means rate-limit queuing, retry, and onCompletion hooks (including cost tracking) apply to embedding calls exactly as they do to complete() calls.
Gotchas and next steps
Google loops requests. For an array of three strings, the Google adapter makes three HTTP calls. The calls share the NetworkEngine queue (rate-limit aware), but latency is additive. For large batches, prefer OpenAI or OpenRouter.
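Dimensions vary by model. text-embedding-3-small returns 1536 dimensions by default; text-embedding-3-large returns 3072; gemini-embedding-exp-03-07 returns 3072. Do not mix models within one corpus: every vector, including the query vector, must come from the same model. Cosine similarity is only meaningful between vectors of equal length that live in the same embedding space, so matching dimensions alone (for example, two different 3072-dimension models) is not enough.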
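A cheap guard against accidentally mixed vectors is to check lengths before indexing. The helper below is a hypothetical sketch, not an SDK function; note that it catches only a length mismatch and cannot detect two different models that happen to share a dimension.

```typescript
// Hypothetical guard: return the shared dimension, or throw on a mismatch.
// Returns 0 for an empty list, mirroring the "0 if empty" convention above.
function assertSameDimensions(vectors: number[][]): number {
  const first = vectors[0];
  if (first === undefined) return 0;
  const expected = first.length;
  vectors.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(`vector ${i} has ${vector.length} dimensions, expected ${expected}`);
    }
  });
  return expected;
}
```

Running this over corpus and query vectors together before computing similarities turns a silent scoring error into an immediate failure.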
Usage is absent for Google. If you are using cost tracking via the onCompletion hook, Google embedding calls will emit a hook but with inputTokens: 0 (because the response carries no count). OpenAI and OpenRouter fill this correctly.
embed() is a one-shot helper. It creates its own adapter each call. For repeated embedding inside a tight loop, pass a pre-built adapter to skip the constructor overhead.
Next steps:
- Retrieval (RAG) guide: full pipeline (chunk, embed, index, retrieve, complete)
- Web search (grounded search): provider-side real-time search without a corpus
- Cost tracking guide: how onCompletion accounts for embedding tokens