LLM Token Counter — tiktoken, Context & Cost
LLM token counter & prompt budget
This tool estimates how many tokens a prompt consumes for the model you select, together with word counts and character statistics. Commercial APIs meter input and output in tokens and cap prompts by a context window, so a tokenizer-grounded preview helps you plan retrieval chunks, few-shot examples, tool definitions, and system prompts before you spend quota or hit length errors.
Pick a provider filter and a model row from the catalog. When the row maps to a tiktoken model name, counting uses CoreBPE in WebAssembly—the same family of encodings as the tiktoken library for that name. Other catalog rows may fall back to a documented base encoding (proxy BPE) or to a coarse characters-per-token heuristic; the results card states which path ran.
Below the interactive area, a separate sortable table lists indicative U.S.-style list prices per million tokens (input, optional cached input, output), published context limits, and links to official pricing pages. That table never inspects your paste; it is a static reference snapshot bundled with the app.
Workbench: prompt area, model options, and multiple outputs
The main tool area is split into two settings panels: Prompt input (single vs. split paste, and whether total characters include newlines) and Model & options (reference catalog model or custom context and $/M rates, output reserve, Add Output, and Jump to reference). Your prompt lives in the Prompt text or Prompt sections box below those panels; Clear prompt in that box header clears only the pasted content, using the same compact control style as Remove on each result card.
Add Output snapshots the current model options into another result block so you can compare the same prompt under different providers, models, or custom rates side by side. You can keep up to 5 comparison blocks; if you reach that limit, the tool will not add another and will tell you to remove a block first. Each block has its own Reference vs. Custom source toggle, metrics, Copy summary, and Remove.
Jump to reference scrolls this page to the sortable model pricing table. The app may persist UI preferences—including filters, sort order, and your list of output blocks and their per-block settings—in localStorage; it does not store your prompt body there.
Tokenization: BPE, tiktoken, proxy encodings, and heuristics
Large language models do not read characters directly; a tokenizer maps Unicode text to a sequence of integer tokens via a vocabulary and merge rules (commonly byte-level BPE variants). Two strings that look similar in an editor can tokenize differently after normalization, which is why byte length or word count alone is a poor proxy for invoiceable tokens.
For OpenAI-compatible names, this build resolves the same model keys that tiktoken exposes. The counter runs ordinary encoding with no chat-template injection, so your paste is measured as raw UTF-8 prompt bytes, matching typical developer expectations for pasted prompts.
When the catalog cannot load a named BPE, it may still load a fallback tokenizer such as cl100k_base or o200k_base; the UI labels that as proxy BPE because the provider might ship a custom vocabulary even if pricing is grouped with a public tokenizer family. The last resort is a heuristic that divides character count by four and rounds up—use it only for rough magnitude.
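The fallback chain above can be sketched in Python. This is a minimal sketch, not the tool's implementation: the real counter runs CoreBPE in WebAssembly, and here the optional tiktoken package stands in for that path, while `heuristic_tokens` mirrors the documented characters-divided-by-four fallback:

```python
import math

def heuristic_tokens(text: str) -> int:
    """Coarsest fallback: roughly four characters per token, rounded up."""
    return math.ceil(len(text) / 4)

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> tuple[int, str]:
    """Return (token_count, mode). Try an exact BPE first, then the heuristic."""
    try:
        import tiktoken  # byte-level BPE; same encoding family as the WASM path
        enc = tiktoken.get_encoding(encoding_name)
        return len(enc.encode(text)), "bpe"
    except Exception:
        # No BPE available: fall back to the coarse magnitude estimate.
        return heuristic_tokens(text), "heuristic"
```

The mode string is why the results card can state which path ran: an exact BPE count and a heuristic count are not interchangeable for billing purposes.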
Context window, output reserve, and usage percentage
The context window is the maximum number of tokens the model accepts in a single forward pass for the catalog entry; providers sometimes report a single token figure that mixes input with output and tool overhead. The calculator compares your counted prompt tokens, plus an integer reserve you type (typically headroom reserved for model output, such as max_tokens), against that published window and shows a usage percentage.
If the reserve is zero, the percentage reflects prompt-only pressure on the window. If you set the reserve near your expected completion length, the percentage approximates total conversational pressure including a generation budget. Percentages are undefined when the catalog context is zero or missing.
Indicative cost lines from reference $/M rates
The results grid multiplies your token count by the catalog input price per million tokens for an estimated input charge, optionally multiplies the same count by the cached-input rate to show a hypothetical cached scenario, and multiplies the reserve by the output rate for a completion estimate. Missing rates show an em dash. These are back-of-the-envelope figures for planning; real invoices add discounts, tax, regional pricing, and token classifications you cannot see from paste alone.
Privacy and where computation runs
Token counting executes entirely in your browser via WebAssembly. The tool persists only UI preferences (selected model, filters, sort order) in localStorage—never your prompt body. In normal use, nothing in this client sends your pasted text to CompuTools servers for tokenization; treat network policy the same as any static site you load from the open web.
Accuracy, billing, and catalog freshness
- Provider dashboards may attach hidden system instructions, reformat JSON, prepend retrieval blocks, or count tool calls differently, so billed tokens can diverge from a raw paste.
- Cached-input discounts apply only to repeated prefixes the provider recognizes; the cached line in the UI assumes the optimistic all-cached case when a rate exists.
- Reference data as of: 2026-04-08. Sourced from LiteLLM model_prices_and_context_window.json (see scripts/update_llm_models.py). Reference $/M; verify on provider dashboards.
Model pricing reference
Indicative $/M rates (input, cached input, output) and links to external pricing pages, drawn from bundled reference data, not a product catalog. Sort columns or filter by provider or substring; nothing here is estimated from your prompt text.
Reference data as of: 2026-04-08
Click column headers to sort; toggle again to reverse. Figures are for reference only (not billing quotes).
Frequently Asked Questions
What is an LLM token, and why does token count matter for API pricing and context limits?
An LLM token is a unit of text after the model tokenizer splits input into subwords or symbols. APIs bill and enforce limits in tokens rather than raw characters, and each model tokenizer can split the same Unicode differently. This tool counts tokens locally so you can estimate prompt size, remaining context window, and rough cost before sending a request.
How does this token counter compare to provider usage dashboards?
When the UI shows tiktoken-compatible counting, the WASM build uses the same CoreBPE encodings as tiktoken for the named model. Official dashboards may still differ slightly due to hidden system prefixes, formatting, or tool calls. Rows labeled proxy BPE or heuristic are approximations; use the provider's metering for authoritative billing.
What do tiktoken, proxy BPE, and heuristic mean in the token note?
Tiktoken means the count uses CoreBPE for the catalog model name. Proxy BPE loads a documented base vocabulary such as cl100k_base or o200k_base when the catalog maps a provider model to that encoding. Heuristic divides character count by four and rounds up; it is the coarsest mode and appears only when no BPE can be loaded.
Why can the same pasted text get different token counts per model?
Each model uses its own tokenizer vocabulary and merge table. GPT-4-class and GPT-5-class encodings may differ, and multilingual or code-heavy text changes the average bytes per token. Changing the model dropdown switches the tokenizer path so you can compare counts side by side.
How does context window percentage work with the reserve field?
The context window is the maximum input tokens the model accepts for the entry in the catalog. The tool adds your counted tokens to the integer reserve (typically headroom for model output such as max_tokens). The percentage equals (tokens plus reserve) divided by the published context window. If the context is zero or unknown, the percentage is not meaningful.
How are estimated input, cached-input, and output costs computed?
Estimated input multiplies your token count by the catalog input rate per million tokens. The cached estimate multiplies the same token count by the optional cached-input rate and assumes all tokens qualify for that tier; real billing may split cached versus uncached prefixes. The output estimate multiplies the reserve by the output rate per million tokens. Missing rates display as an em dash.
Is the reference table a guarantee of charges?
No. Figures are bundled reference data with an as-of date and a pricing note. Providers change tiers, discounts, and regional pricing. External links point to official pages for verification. The table supports comparison and planning, not a binding quote.
Is prompt text uploaded to a server?
No. Tokenization runs in WebAssembly in your browser. The implementation stores only UI preferences in localStorage. As the client is built, your pasted prompt text never leaves the page.
How does split mode combine system, user, and assistant fields?
Non-empty sections are trimmed and joined with blank lines between sections. The combined string is tokenized. Assistant is optional for simulating prior turns; the tool does not call any API.
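The join described above can be sketched in a few lines; a minimal sketch, assuming the function name is illustrative and that "trimmed" means stripping leading and trailing whitespace:

```python
def combine_sections(system: str, user: str, assistant: str = "") -> str:
    """Trim each section, drop empty ones, and join with blank lines."""
    parts = [s.strip() for s in (system, user, assistant)]
    return "\n\n".join(p for p in parts if p)
```

The combined string is what gets tokenized, so the blank-line separators themselves contribute tokens.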
What does Add Output do, and how many blocks can I add?
Add Output captures the model and pricing options from Model & options at that moment and appends a new result card for the same combined prompt. Use it to compare token counts, usage percentage, and indicative costs across different catalog rows or custom rates. You can add up to 5 blocks; remove a block with Remove if you need a slot for another configuration.
What does including or excluding newlines in total characters change?
It only affects the character statistics cards, not the token count. Newlines remain in the combined text used for tokenization. Use it when you want editor-style character totals that include or exclude line breaks.
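The toggle's effect amounts to two counts of the same string; a minimal sketch, assuming (as an illustration, not a statement about the tool's exact rule) that line breaks mean `\n` and `\r` characters:

```python
def char_totals(text: str) -> tuple[int, int]:
    """Return (chars_including_newlines, chars_excluding_newlines)."""
    including = len(text)
    excluding = sum(1 for ch in text if ch not in "\r\n")
    return including, excluding
```

Either way, the full text including newlines is what the tokenizer sees; only the displayed character statistic changes.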
How often is the reference catalog updated?
The catalog JSON is shipped as /data/llm_models.json and loaded in the browser; it includes a catalog_as_of field in the reference section. Regenerate that file when upstream pricing changes (see scripts/generate_llm_models.py). Always verify production budgets against the provider.
Can I export the reference table?
Yes. Use Copy table (TSV) to copy tab-separated values for spreadsheets. Sorting and filters apply to the copied rows.
Why might billed tokens differ from this counter in production?
Providers may add system or tool prefixes, format JSON or XML, re-tokenize after tool results, or classify cached tokens differently. Streaming endpoints and batch jobs may round differently. Treat this tool as planning; use API metering for invoices.
Which OpenAI-style encodings does tiktoken-rs resolve in this WASM build?
The catalog maps model names and fallback keys such as o200k_base or cl100k_base to CoreBPE instances that match OpenAI tiktoken selections for compatible models.
Related tools
- Robot Arm FK/IK Calculator — Robot arm kinematics for learning and practice: FK/IK, DH tables, and Jacobians for 2–6 DOF arms, SCARA, and UR/KUKA-style presets, with 3D visualization and CSV export for robot programming. Runs in your browser.
- Bezier Curve Editor — Interactive cubic Bezier curve editor with real-time animation preview, 35 easing presets, side-by-side comparison, and code export for CSS, Unity C#, C++, and JavaScript.
- Drone Calculator — Calculate drone thrust, TWR, hover throttle, flight time, and battery C-rating safety. Compare up to 4 motor/battery/propeller configurations side by side.