Cost & Estimation
Source: src/plugins/cost-collector/collector.ts,
src/plugins/cost-collector/cost-collector-internal.ts,
src/helpers/estimate.ts, src/helpers/estimator.ts,
src/helpers/calibration-store.ts, src/helpers/calibration-types.ts,
src/plugins/model-catalog/catalog.ts.
Purpose and responsibilities
The cost subsystem has three parts:
- ModelCatalog: static registry of model metadata (pricing, capabilities, API preferences, state-retention rules). Loaded once; queried by all other layers.
- CostCollector: runtime accumulator. Subscribes to onCompletion and onMediaGenerated; computes actual cost per call; enforces budgets.
- estimate() / Estimator: pre-flight cost estimation. estimate() is a pure function (no network, no state). Estimator wraps it with EWMA-calibrated output-token bounds derived from observed completions.
ModelCatalog (src/plugins/model-catalog/catalog.ts)
Data model

    interface ModelInfo {
      provider: string;
      model: string; // canonical slug, e.g. "claude-opus-4.8"
      pricing: ModelPricing;
      preferredApi: ApiType;
      supportedApis: ApiType[];
      contextWindow?: number;
      maxOutput?: number;
      capabilities: ModelCapabilities;
      reasoning: ModelReasoning;
      tokenizer?: TokenizerInfo;
      aliases?: string[];
      supportsPreviousResponseId?: boolean;
    }

    interface ModelPricing {
      inputPerMTok?: number;  // USD per 1M input tokens
      outputPerMTok?: number; // USD per 1M output tokens
      cacheReadPerMTok?: number;
      cacheWritePerMTok?: number;
      audioInputPerMTok?: number;
      audioOutputPerMTok?: number;
      tiers?: Record<string, TierRates>; // keyed by provider's OWN tier name
    }

    type TierRates = Omit<ModelPricing, 'tiers'>;

    interface ModelCapabilities {
      toolUse: boolean;
      streaming: boolean;
      structuredOutput: boolean;
      vision: boolean;
      audio: boolean;
      video: boolean;
      imageGeneration: boolean;
      audioGeneration: boolean;
      videoGeneration: boolean;
    }

    interface ModelReasoning {
      supported: boolean;
      automatic: boolean;
      effortControl: boolean;
      effortValues?: string[];
      encryptedContent: boolean;
      summaryAvailable: boolean;
    }

Service-tier pricing: when a completion returns usage.pricingTier (set by
provider adapters from usage.serviceTier), calculateCost overlays
tiers[pricingTier] on top of flat rates. Fields in the tier entry override
flat rates; missing fields fall back to flat rates. Example:
tiers['flex'] = { inputPerMTok: 0.5 } gives a discounted input rate on the
flex tier while the output rate stays at the flat rate.
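The overlay rule can be illustrated with a small sketch. The overlayTier helper and the trimmed-down types below are hypothetical, written only to show "tier fields win, missing fields fall back"; they are not the catalog's actual implementation.

```typescript
// Hypothetical, trimmed-down versions of the pricing types described above.
type TierRates = {
  inputPerMTok?: number;
  outputPerMTok?: number;
  cacheReadPerMTok?: number;
  cacheWritePerMTok?: number;
};
type ModelPricing = TierRates & { tiers?: Record<string, TierRates> };

// Returns the effective rates for a completion billed under `tier`.
// Fields present in the tier entry win; everything else keeps the flat rate.
function overlayTier(pricing: ModelPricing, tier?: string): TierRates {
  const { tiers, ...flat } = pricing;
  const override = tier ? tiers?.[tier] : undefined;
  if (!override) return flat;
  const merged: TierRates = { ...flat };
  for (const [key, value] of Object.entries(override)) {
    // Skip explicit undefined so it cannot erase a flat rate.
    if (value !== undefined) merged[key as keyof TierRates] = value;
  }
  return merged;
}

const pricing: ModelPricing = {
  inputPerMTok: 1.0,
  outputPerMTok: 4.0,
  tiers: { flex: { inputPerMTok: 0.5 } },
};
const flexRates = overlayTier(pricing, "flex");         // input 0.5, output 4.0
const unknownTierRates = overlayTier(pricing, "other"); // flat rates unchanged
```

An unrecognized tier name simply yields the flat rates, which matches the "missing fields fall back" behavior described above.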
Internal storage

    class ModelCatalog {
      private models: Map<string, ModelInfo>    // key: "provider/model" (canonical slug)
      private aliasIndex: Map<string, string>   // "provider/alias" -> "provider/canonical-slug"
    }

set(info: ModelInfo) registers a model: inserts into models under the
canonical key, then iterates info.aliases[] to populate aliasIndex.
Aliases are stored as "provider/alias" -> "provider/modelId".
get(provider, modelId): checks models directly (canonical lookup), then
aliasIndex for an alias, then returns undefined. Does NOT throw — callers
check for undefined.
resolveModelId(provider, slug): follows the alias chain and returns the provider's
canonical model ID (e.g. "claude-3-5-sonnet-latest" resolves to its canonical slug).
Returns slug unchanged if no alias is found.
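A minimal sketch of the two-map lookup follows. MiniCatalog and MiniModelInfo are hypothetical stand-ins with most fields omitted, and pairing "claude-3-5-sonnet-latest" with "claude-3-5-sonnet-20241022" is an assumption made for the example.

```typescript
// Hypothetical minimal shape; the real ModelInfo carries many more fields.
interface MiniModelInfo {
  provider: string;
  model: string;
  aliases?: string[];
}

class MiniCatalog {
  private models = new Map<string, MiniModelInfo>();  // "provider/model"
  private aliasIndex = new Map<string, string>();     // "provider/alias" -> canonical key

  set(info: MiniModelInfo): void {
    const canonicalKey = `${info.provider}/${info.model}`;
    this.models.set(canonicalKey, info); // overwrites an existing entry
    for (const alias of info.aliases ?? []) {
      this.aliasIndex.set(`${info.provider}/${alias}`, canonicalKey);
    }
  }

  // Canonical lookup first, then alias lookup; never throws.
  get(provider: string, modelId: string): MiniModelInfo | undefined {
    const key = `${provider}/${modelId}`;
    const direct = this.models.get(key);
    if (direct) return direct;
    const canonicalKey = this.aliasIndex.get(key);
    return canonicalKey ? this.models.get(canonicalKey) : undefined;
  }

  // Returns the canonical model ID, or the slug unchanged when unknown.
  resolveModelId(provider: string, slug: string): string {
    return this.get(provider, slug)?.model ?? slug;
  }
}

const miniCatalog = new MiniCatalog();
miniCatalog.set({
  provider: "anthropic",
  model: "claude-3-5-sonnet-20241022",
  aliases: ["claude-3-5-sonnet-latest"],
});
const resolved = miniCatalog.resolveModelId("anthropic", "claude-3-5-sonnet-latest");
const passthrough = miniCatalog.resolveModelId("anthropic", "not-registered");
```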
Provider defaults
loadProviderDefaults() loads the five built-in catalog JSON files via static
imports resolved at bundle time. Provider files live at:
src/llm/providers/{provider}/catalog.json.
The PROVIDER_STATE constant holds per-provider defaults for stateRetentionDuration
(a duration string such as "30d" or "72h", or null) and modelBound.
These defaults apply when a model entry omits those fields.
Query API

    catalog.getPricing(provider, model): ModelPricing | null
    catalog.getPreferredApi(provider, model): ApiType | null
    catalog.supportsApi(provider, model, apiType): boolean
    catalog.supportsPreviousResponseId(provider, model): boolean
    catalog.getStateRetention(provider, model): string | null // duration string e.g. "30d", or null
    catalog.isStateModelBound(provider, model): boolean

CostCollector (src/plugins/cost-collector/collector.ts)
Construction and subscription

    class CostCollector {
      constructor(hooks: HookBus, catalog: ModelCatalog, opts?: CostCollectorOptions)
    }

    interface CostCollectorOptions {
      budgets?: BudgetSpec[];
      sessionBudget?: number; // convenience: total USD limit for this instance
    }

Calls hooks.on('onCompletion', ...) and hooks.on('onMediaGenerated', ...).
Uses hooks.emitSync (NOT emit) for cost events — onCostEntry,
onBudgetWarning, onBudgetExceeded are all sync.
handleCompletion flow
1. Extract provider-reported cost via extractProviderCost(response).
2. If no provider cost: compute via calculateCost(catalog, provider, modelId, usage).
3. Build CostEntry { provider, modelId, inputTokens, outputTokens, costUsd, tags?, timestamp }.
4. Push to this.entries[].
5. Accumulate into this.byProvider, this.byModel, this.byTag maps.
6. hooks.emitSync('onCostEntry', { entry }).
7. Call checkBudgets(entry.costUsd).
Provider cost extraction (cost-collector-internal.ts:extractProviderCost)
Two special-case providers:
- openrouter: response.raw?.usage?.cost. OpenRouter injects total USD directly into the usage object.
- xai (Grok): response.raw?.usage?.cost_in_usd_ticks / 1e10. xAI's API returns integer "USD ticks"; one tick is 10^-10 USD, so dividing by 1e10 converts to USD.
All other providers: undefined (compute from token counts).
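The extraction rule can be sketched as below. The MiniResponse shape, including a provider field on the response, is an assumption for illustration; only the raw.usage.cost and raw.usage.cost_in_usd_ticks paths come from the description above.

```typescript
// Hypothetical response shape: only the fields this sketch reads.
interface RawUsage {
  cost?: number;              // OpenRouter: total USD
  cost_in_usd_ticks?: number; // xAI: integer ticks, 1e10 ticks per USD
}
interface MiniResponse {
  provider: string; // assumed to be available on the response
  raw?: { usage?: RawUsage };
}

const XAI_TICKS_PER_USD = 1e10;

// Returns the provider-reported USD cost, or undefined when the provider
// does not report one (the caller then computes cost from token counts).
function extractProviderCostSketch(response: MiniResponse): number | undefined {
  const usage = response.raw?.usage;
  switch (response.provider) {
    case "openrouter":
      return typeof usage?.cost === "number" ? usage.cost : undefined;
    case "xai": {
      const ticks = usage?.cost_in_usd_ticks;
      return typeof ticks === "number" ? ticks / XAI_TICKS_PER_USD : undefined;
    }
    default:
      return undefined;
  }
}

const xaiCost = extractProviderCostSketch({
  provider: "xai",
  raw: { usage: { cost_in_usd_ticks: 5_000_000_000 } },
});
const openrouterCost = extractProviderCostSketch({
  provider: "openrouter",
  raw: { usage: { cost: 0 } },
});
const otherCost = extractProviderCostSketch({ provider: "anthropic", raw: { usage: {} } });
```

Note that a reported cost of 0 is returned as 0 rather than undefined, so a free model still counts as a provider-reported value.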
handleMediaGenerated flow
Extracts { provider, mediaType, count?, durationSeconds? } from the event.
Calls mediaUnitCost(catalog, provider, mediaType, count, durationSeconds).
The mediaUnitCost helper looks up per-unit or per-second rates from the
model’s pricing entry.
Budget enforcement (checkBudgets)

    interface BudgetSpec {
      scope?: { provider?, model?, tag? }; // undefined -> global
      limitUsd: number;
      thresholds?: number[];               // fraction of limitUsd, e.g. [0.7, 0.9]
      action?: 'warn' | 'stop';            // 'warn' default; 'stop' pauses watched agents
      watchedAgents?: AgentLoop[];
    }

After each CostEntry, checkBudgets runs all budgets:
1. matchesScope(entry, budget.scope): true if all scope fields match.
2. getProviderTotal(this, budget.scope): sum of all matching entries.
3. For each threshold in budget.thresholds[]: if total >= limitUsd * threshold and not already notified -> hooks.emitSync('onBudgetWarning', { budget, total }).
4. If total >= budget.limitUsd:
   - hooks.emitSync('onBudgetExceeded', { budget, total }).
   - If budget.action === 'stop': calls .stop() on each AgentLoop in budget.watchedAgents. Fire-and-forget (no await).
   - Budget is removed from the list to prevent repeated stop signals.
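The threshold and limit checks above might look roughly like this. The sketch assumes a single global budget scope (no matchesScope step), tracks already-notified thresholds in a hypothetical notified set, and takes an emit callback in place of hooks.emitSync.

```typescript
type BudgetEventType = "onBudgetWarning" | "onBudgetExceeded";
interface StoppableAgent { stop(): void }
interface MiniBudget {
  limitUsd: number;
  thresholds?: number[];
  action?: "warn" | "stop";
  watchedAgents?: StoppableAgent[];
  notified?: Set<number>; // hypothetical bookkeeping for fired thresholds
}

// Evaluates every budget against the running total and returns the budgets
// that remain active (exceeded budgets are dropped).
function checkBudgetsSketch(
  budgets: MiniBudget[],
  total: number,
  emit: (type: BudgetEventType, budget: MiniBudget, total: number) => void,
): MiniBudget[] {
  const remaining: MiniBudget[] = [];
  for (const budget of budgets) {
    if (!budget.notified) budget.notified = new Set<number>();
    for (const threshold of budget.thresholds ?? []) {
      if (total >= budget.limitUsd * threshold && !budget.notified.has(threshold)) {
        budget.notified.add(threshold); // warn once per threshold
        emit("onBudgetWarning", budget, total);
      }
    }
    if (total >= budget.limitUsd) {
      emit("onBudgetExceeded", budget, total);
      if (budget.action === "stop") {
        for (const agent of budget.watchedAgents ?? []) agent.stop(); // not awaited
      }
      continue; // drop the budget so it cannot fire again
    }
    remaining.push(budget);
  }
  return remaining;
}

const budgetEvents: string[] = [];
const demoAgent = { stopped: false, stop() { this.stopped = true; } };
let activeBudgets: MiniBudget[] = [
  { limitUsd: 10, thresholds: [0.7, 0.9], action: "stop", watchedAgents: [demoAgent] },
];
for (const runningTotal of [7.5, 7.5, 9.5, 10]) {
  activeBudgets = checkBudgetsSketch(activeBudgets, runningTotal, (type) => budgetEvents.push(type));
}
```

In this run the 0.7 threshold warns once even though the total is seen twice at 7.5, and the budget disappears from the active list after the limit is reached.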
Query API

    collector.total: number // all-time USD
    collector.byProvider: Map<string, number>
    collector.byModel: Map<string, number>
    collector.byTag: Map<string, number>
    collector.entries: CostEntry[]
    collector.query(filter): { totalUsd, breakdown }
    collector.export(): SerializedCostState
    collector.import(state: SerializedCostState): void // merge into current

Cost computation ladder (cost-collector-internal.ts:computeCost)

    computeCost(catalog, input: CostComputeInput)
    // CostComputeInput: { provider, model, tokens?, media?, providerEvidence?, tier? }

Four steps in priority order:
1. Provider-reported total: if extractProviderCost returned a value, use it. Skip steps 2-4.
2. Token cost: if pricing.inputPerMTok or pricing.outputPerMTok exist, compute via calculateCost().
3. Media unit cost: if this is a media generation (no tokens), compute via mediaUnitCost().
4. Unknown: return $0.00 (honest zero, not null).
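A compact sketch of the ladder's priority order is shown below. LadderInput is a hypothetical flattened input: providerCostUsd stands in for the result of extractProviderCost and mediaCostUsd for the result of mediaUnitCost(), and the token step is reduced to input and output rates only.

```typescript
type CostSource = "provider" | "tokens" | "media" | "unknown";

// Hypothetical flattened input for illustration.
interface LadderInput {
  providerCostUsd?: number;
  pricing?: { inputPerMTok?: number; outputPerMTok?: number };
  tokens?: { input: number; output: number };
  mediaCostUsd?: number;
}

function computeCostSketch(input: LadderInput): { costUsd: number; source: CostSource } {
  // Step 1: a provider-reported value (including 0) is authoritative.
  if (input.providerCostUsd !== undefined) {
    return { costUsd: input.providerCostUsd, source: "provider" };
  }
  // Step 2: token cost when at least one token rate is known.
  const p = input.pricing;
  if (input.tokens && p && (p.inputPerMTok !== undefined || p.outputPerMTok !== undefined)) {
    const costUsd =
      (input.tokens.input / 1_000_000) * (p.inputPerMTok ?? 0) +
      (input.tokens.output / 1_000_000) * (p.outputPerMTok ?? 0);
    return { costUsd, source: "tokens" };
  }
  // Step 3: media unit cost.
  if (input.mediaCostUsd !== undefined) {
    return { costUsd: input.mediaCostUsd, source: "media" };
  }
  // Step 4: honest zero rather than null.
  return { costUsd: 0, source: "unknown" };
}

const freeModel = computeCostSketch({
  providerCostUsd: 0,
  pricing: { inputPerMTok: 3 },
  tokens: { input: 1_000_000, output: 0 },
});
const tokenPriced = computeCostSketch({
  pricing: { inputPerMTok: 3 },
  tokens: { input: 1_000_000, output: 0 },
});
const unpriced = computeCostSketch({});
```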
calculateCost(catalog, provider, model, tokens, providerEvidence, tier?)
1. Fetch pricing = catalog.getPricing(provider, model).
2. If tier is set AND pricing.tiers?.[tier] exists: merge tiers[tier] over flat rates (tier fields win; absent fields fall back to flat rates).
3. Compute:

       inputCost      = (tokens.input / 1_000_000) * inputPerMTok
       outputCost     = (tokens.output / 1_000_000) * outputPerMTok
       cacheReadCost  = (tokens.cached / 1_000_000) * cacheReadPerMTok
       cacheWriteCost = (tokens.cacheWrite / 1_000_000) * cacheWritePerMTok
       reasoningCost  = (tokens.reasoning / 1_000_000) * outputPerMTok

   Also supports audioInputPerMTok / audioOutputPerMTok for audio tokens.
4. Cache-read tokens are billed at cacheReadPerMTok (defaults to inputPerMTok * 0.1). Cache-write at cacheWritePerMTok (defaults to inputPerMTok * 1.25).
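As a worked example of the formulas and cache defaults above, the following sketch sums the five token components. It assumes the token counts are disjoint (as the formulas read), omits audio tokens, and uses hypothetical names throughout.

```typescript
interface TokenRates {
  inputPerMTok?: number;
  outputPerMTok?: number;
  cacheReadPerMTok?: number;
  cacheWritePerMTok?: number;
}
interface TokenCounts {
  input: number;
  output: number;
  cached?: number;
  cacheWrite?: number;
  reasoning?: number;
}

// USD for `count` tokens at `ratePerMTok` USD per million tokens.
const usdFor = (count: number | undefined, ratePerMTok: number): number =>
  ((count ?? 0) / 1_000_000) * ratePerMTok;

function tokenCostSketch(rates: TokenRates, tokens: TokenCounts): number {
  const inputRate = rates.inputPerMTok ?? 0;
  const outputRate = rates.outputPerMTok ?? 0;
  // Defaults from the text: cache read = 10% of input, cache write = 125% of input.
  const cacheReadRate = rates.cacheReadPerMTok ?? inputRate * 0.1;
  const cacheWriteRate = rates.cacheWritePerMTok ?? inputRate * 1.25;
  return (
    usdFor(tokens.input, inputRate) +
    usdFor(tokens.output, outputRate) +
    usdFor(tokens.cached, cacheReadRate) +
    usdFor(tokens.cacheWrite, cacheWriteRate) +
    usdFor(tokens.reasoning, outputRate) // reasoning billed at the output rate
  );
}

// 1M input at $2, 0.5M output at $8, 1M cache reads at the default $0.20:
// 2 + 4 + 0.2 = 6.2 USD (up to floating-point rounding).
const exampleCost = tokenCostSketch(
  { inputPerMTok: 2, outputPerMTok: 8 },
  { input: 1_000_000, output: 500_000, cached: 1_000_000 },
);
```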
estimate() (src/helpers/estimate.ts)

    async function estimate(request: EstimateRequest, opts?: EstimateOptions): Promise<EstimateResult>

Async, pure (no network). Throws UnknownModelError when the model is absent
from the catalog. Model is specified as "provider/model" in request.model
or as separate request.model + request.provider.

    interface EstimateRequest {
      model: string; // "provider/model" or bare slug (needs provider field)
      provider?: ProviderName;
      prompt: string | ContentPart[] | Message[];
      system?: string;
      maxTokens?: number;
    }

    interface EstimateOptions {
      model?: string;
      expectedOutputTokens?: number;
      engine?: EngineHandle;
    }

    interface EstimateResult {
      model: string;
      inputTokens: number;
      estOutputTokens: number;
      cost: { low: number; expected: number; high: number };
      breakdown: { inputUsd: number; outputUsd: number; imageUsd?: number; audioUsd?: number };
      currency: 'USD';
      assumptions: string[];
    }

Three-bound system
- low: 0 output tokens. Minimum possible cost.
- expected: resolveExpectedOutput() uses opts.expectedOutputTokens if provided, else DEFAULT_EXPECTED_OUTPUT_TOKENS = 512.
- high: resolveHighOutput() uses request.maxTokens if set, else catalog.get().maxOutput, else FALLBACK_MAX_OUTPUT_TOKENS = 4096.
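The fallback chain for the three bounds can be sketched as follows. The outputTokenBounds helper and its argument names are hypothetical; only the two constants and the order of fallbacks come from the description above.

```typescript
const DEFAULT_EXPECTED_OUTPUT_TOKENS = 512;
const FALLBACK_MAX_OUTPUT_TOKENS = 4096;

interface BoundInputs {
  expectedOutputTokens?: number; // from opts
  maxTokens?: number;            // from the request
  catalogMaxOutput?: number;     // from the model's catalog entry
}

// Resolves the output-token count assumed for each of the three bounds.
function outputTokenBounds(inputs: BoundInputs): { low: number; expected: number; high: number } {
  return {
    low: 0,
    expected: inputs.expectedOutputTokens ?? DEFAULT_EXPECTED_OUTPUT_TOKENS,
    high: inputs.maxTokens ?? inputs.catalogMaxOutput ?? FALLBACK_MAX_OUTPUT_TOKENS,
  };
}

const defaultBounds = outputTokenBounds({});
const catalogBounds = outputTokenBounds({ catalogMaxOutput: 8192 });
const requestBounds = outputTokenBounds({ expectedOutputTokens: 200, maxTokens: 1000, catalogMaxOutput: 8192 });
```

With no hints at all the bounds are 0, 512, and 4096 output tokens; a request-level maxTokens takes precedence over the catalog value.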
Input token counting
countInputTokens(counter, ctx, request, assumptions): uses HybridTokenCounter
for accurate per-model estimation (tiktoken / count-api / heuristic, per catalog).
priceMediaParts(prompt, pricing, provider, model, assumptions): scans content
parts for images and audio, computing flat-rate media costs added to each bound.
Media cost is additive, independent of token cost.
assumptions[]: collects human-readable strings explaining which fallbacks
were applied. Examples: "expected output tokens defaulted to DEFAULT_EXPECTED_OUTPUT_TOKENS=512",
"max output from model catalog: 4096", "image tokens approximated".
Estimator (src/helpers/estimator.ts)
Stateful wrapper that calibrates estimate()'s expected and high bounds using
an EWMA mean and a p90 histogram learned from observed completions.

    class Estimator {
      constructor(opts?: EstimatorOptions)
      async estimate(request: EstimateRequest, opts?: EstimateOptions): Promise<EstimateResult>
      async record(obs: CalibrationObservation): Promise<void>
      subscribeToEngine(engine: EngineHandle): () => void
      subscribeToHooks(hooks: HookBus): () => void
    }

subscribeToEngine and subscribeToHooks are alternative wiring points.
Both feed onCompletion events into record() automatically.
applyCalibratedBounds (private)
1. Load the calibration entry for (provider, model, inputBucket).
2. If entry is absent or has < 5 samples: return base unchanged.
3. Calibrated expected = Math.round(entry.ewmaMean).
4. Calibrated high = Math.min(Math.max(p90, ewmaMean), hardCeiling), where hardCeiling = request.maxTokens ?? catalog.maxOutput ?? FALLBACK_MAX_OUTPUT_TOKENS.
5. Recompute cost.expected and cost.high from the calibrated token counts.

The ceiling prevents calibrated high from exceeding the actual model maximum.
expected and high are always >= low because low assumes zero output tokens; the
Math.max guard additionally keeps high at or above the EWMA mean before the ceiling is applied.
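The calibration rule reduces to a few lines. In this sketch the entry carries a precomputed p90 value and the function works on token counts only; both are simplifications, and the names are hypothetical.

```typescript
const MIN_CALIBRATION_SAMPLES = 5; // from the "< 5 samples" rule above

interface TokenBounds { expected: number; high: number }
interface CalibrationSnapshot { ewmaMean: number; count: number; p90: number }

// Returns calibrated output-token bounds, or `base` unchanged when there is
// not enough data to calibrate.
function calibrateBounds(
  base: TokenBounds,
  entry: CalibrationSnapshot | null,
  hardCeiling: number,
): TokenBounds {
  if (!entry || entry.count < MIN_CALIBRATION_SAMPLES) return base;
  return {
    expected: Math.round(entry.ewmaMean),
    // Math.max keeps high at or above the mean; Math.min applies the ceiling.
    high: Math.min(Math.max(entry.p90, entry.ewmaMean), hardCeiling),
  };
}

const baseBounds: TokenBounds = { expected: 512, high: 4096 };
const tooFewSamples = calibrateBounds(baseBounds, { ewmaMean: 300.4, count: 3, p90: 896 }, 4096);
const calibrated = calibrateBounds(baseBounds, { ewmaMean: 300.4, count: 20, p90: 896 }, 4096);
const clamped = calibrateBounds(baseBounds, { ewmaMean: 300.4, count: 20, p90: 896 }, 512);
```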
OutputCalibrationStore (src/helpers/calibration-store.ts)
Backed by the Persistence interface from src/plugins/persistence/types.ts
(same get/set/list interface used by the cache and context subsystems).
Key structure

    OUTPUT_CALIBRATION_KEY_PREFIX + provider/model#bucket
    // e.g. "output-calibration:anthropic/claude-3-5-sonnet-20241022#2000-8000"

inputBucketLabel(inputTokens): maps to one of five labels.
INPUT_SIZE_BUCKET_EDGES = [500, 2000, 8000, 32000]:
"0-500", "500-2000", "2000-8000", "8000-32000", "32000+"
Histogram structure

    interface OutputCalibrationEntry {
      key: string;
      ewmaMean: number;    // EWMA-smoothed mean of observed output tokens
      histogram: number[]; // P90_HISTOGRAM_BIN_COUNT = 32 bins
      count: number;       // total observations recorded
      lastUpdated: number;
    }

    P90_HISTOGRAM_BIN_WIDTH = 256 // tokens per bin

Bin index: Math.min(Math.floor(outputTokens / 256), 31), clamped to 31
(captures all output >= 7936 tokens in the last bin).
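Taken together, the key structure and the bin rule place an observation as sketched below. The prefix value "output-calibration:" is inferred from the example key, and treating each bucket's upper edge as exclusive is an assumption; the source does not say which bucket an exact edge value falls into.

```typescript
const OUTPUT_CALIBRATION_KEY_PREFIX = "output-calibration:"; // inferred from the example key
const INPUT_SIZE_BUCKET_EDGES = [500, 2000, 8000, 32000];
const P90_HISTOGRAM_BIN_COUNT = 32;
const P90_HISTOGRAM_BIN_WIDTH = 256;

// Maps an input-token count to one of the five bucket labels.
// Assumption: upper edges are exclusive, so 500 lands in "500-2000".
function inputBucketLabel(inputTokens: number): string {
  let lower = 0;
  for (const edge of INPUT_SIZE_BUCKET_EDGES) {
    if (inputTokens < edge) return `${lower}-${edge}`;
    lower = edge;
  }
  return `${lower}+`;
}

function calibrationKey(provider: string, model: string, inputTokens: number): string {
  return `${OUTPUT_CALIBRATION_KEY_PREFIX}${provider}/${model}#${inputBucketLabel(inputTokens)}`;
}

// Histogram bin for an observed output-token count, clamped to the last bin.
function histogramBinIndex(outputTokens: number): number {
  return Math.min(
    Math.floor(outputTokens / P90_HISTOGRAM_BIN_WIDTH),
    P90_HISTOGRAM_BIN_COUNT - 1,
  );
}

const exampleKey = calibrationKey("anthropic", "claude-3-5-sonnet-20241022", 3000);
const exampleBin = histogramBinIndex(1000); // floor(1000 / 256) = 3
```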
record(obs: CalibrationObservation)
1. Resolve calibration key and load or initialize the entry.
2. EWMA update: ewmaMean = alpha * outputTokens + (1 - alpha) * existing.ewmaMean. CALIBRATION_EWMA_ALPHA = 0.15.
3. histogram[binIndex] += 1; count += 1.
4. Persist via persistence.set(key, entry).
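The in-memory part of record() (steps 2 and 3) can be sketched as a pure update function. Seeding a new entry's ewmaMean with the first observation is an assumption, since the source only says the entry is "initialized"; persistence is left out.

```typescript
const CALIBRATION_EWMA_ALPHA = 0.15;
const BIN_COUNT = 32;
const BIN_WIDTH = 256;

interface CalibrationEntrySketch {
  ewmaMean: number;
  histogram: number[];
  count: number;
  lastUpdated: number;
}

// Returns a new entry with one more observation folded in.
function applyObservation(
  existing: CalibrationEntrySketch | undefined,
  outputTokens: number,
  now: number = Date.now(),
): CalibrationEntrySketch {
  const histogram = existing ? [...existing.histogram] : new Array<number>(BIN_COUNT).fill(0);
  const binIndex = Math.min(Math.floor(outputTokens / BIN_WIDTH), BIN_COUNT - 1);
  histogram[binIndex] += 1;
  const ewmaMean = existing
    ? CALIBRATION_EWMA_ALPHA * outputTokens + (1 - CALIBRATION_EWMA_ALPHA) * existing.ewmaMean
    : outputTokens; // assumption: the first observation seeds the mean
  return { ewmaMean, histogram, count: (existing?.count ?? 0) + 1, lastUpdated: now };
}

const firstEntry = applyObservation(undefined, 1000, 0);
// 0.15 * 2000 + 0.85 * 1000 = 1150
const secondEntry = applyObservation(firstEntry, 2000, 1);
```

With alpha at 0.15, a single outlier moves the mean by only 15% of its distance from the current value, which is why the histogram is kept separately for the p90 bound.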
histogramQuantile(histogram, quantile) — p90
1. Sum all bin counts (total).
2. Target = Math.ceil(quantile * total), where CALIBRATION_HIGH_QUANTILE = 0.9.
3. Walk bins from 0 upward, accumulating until cumulative >= target.
4. Return center of target bin: (binIndex + 0.5) * 256.
p90(entry): returns histogramQuantile(entry.histogram, 0.9).
get(provider, model, inputTokens): returns null if entry is missing;
callers in Estimator check entry.count < 5 before using calibration.
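The quantile walk described in this section can be sketched as below. Returning undefined for an empty histogram is an assumption of this sketch; the source does not state what happens when no observations exist.

```typescript
const QUANTILE_BIN_WIDTH = 256;

// Returns the center of the bin containing the requested quantile,
// or undefined for an empty histogram (assumption of this sketch).
function histogramQuantileSketch(histogram: number[], quantile: number): number | undefined {
  const total = histogram.reduce((sum, binCount) => sum + binCount, 0);
  if (total === 0) return undefined;
  const target = Math.ceil(quantile * total);
  let cumulative = 0;
  for (let binIndex = 0; binIndex < histogram.length; binIndex++) {
    cumulative += histogram[binIndex] ?? 0;
    if (cumulative >= target) return (binIndex + 0.5) * QUANTILE_BIN_WIDTH;
  }
  return undefined;
}

// 10 observations: 8 in bin 1, 1 in bin 3, 1 in bin 10.
// target = ceil(0.9 * 10) = 9, reached in bin 3, so p90 = 3.5 * 256 = 896.
const demoHistogram = new Array<number>(32).fill(0);
demoHistogram[1] = 8;
demoHistogram[3] = 1;
demoHistogram[10] = 1;
const demoP90 = histogramQuantileSketch(demoHistogram, 0.9);
```

Because the result is a bin center, the p90 estimate has a resolution of 256 tokens and a single large outlier in the top 10% does not move it.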
Key invariants and gotchas
- computeCost step 1 is exclusive: if extractProviderCost returns a value (even $0 for a free model), steps 2-4 are skipped entirely. The provider cost is authoritative.
- pricingTier vs serviceTier: usage.serviceTier is the provider's raw tier name in the response. usage.pricingTier is what the SDK maps it to for catalog lookup. LLMClient does the mapping in parseResponse.
- Budget action: 'stop' is not immediate: AgentLoop.stop() sets a flag; the loop checks it at the top of the next iteration. Mid-step execution completes first.
- Calibration requires >= 5 observations: the store itself does not enforce this; Estimator.applyCalibratedBounds checks entry.count and, below the threshold, skips calibration and returns the raw (uncalibrated) estimate unchanged.
- loadProviderDefaults() is idempotent: repeated calls re-register the same models; set() overwrites if the model already exists.
- estimate() throws on unknown models: callers must either handle UnknownModelError or ensure the model is registered before calling.
- Alias resolution in the cost path: CostCollector receives modelId from CompletionResponse.modelId, which provider adapters set to the canonical ID. catalog.get() with the canonical ID finds it directly; no alias lookup needed in the cost path.
- Honest-zero applies in calculateCost, not computeCost: the fallback to estimatedInputTokens happens inside calculateCost (step 2). If step 1 short-circuits, estimatedInputTokens is never consulted.