LLM API Cost Calculator

Estimate what your AI API will cost across GPT, Claude, Gemini and more. Enter your tokens, see the price per call, monthly projection, and which model is cheapest.

Pick a model and enter your input and output tokens per request. See the cost per call and total, project it monthly or yearly, and compare every model for the same workload.

Inputs: Model, Input Tokens (per request: your prompt, system message, context, documents), Output Tokens (per request: the model's response, often billed higher than input), Number of Requests, Frequency.

Results: Total Cost, Cost / Request, Input Cost, Output Cost, Per Month, Per Year.

💰 Same Workload — Every Model (cheapest first)

Model	In $/1M	Out $/1M	Total

Tap any row to switch to that model. Prices are standard pay-as-you-go rates per 1M tokens, verified June 2026 — batch, cached-input, and free-tier discounts are not applied. Always confirm current prices on the provider's official pricing page before relying on them.

What is an LLM API Cost Calculator?

An LLM API cost calculator estimates how much you'll pay to use a large language model through its API — for models like OpenAI's GPT, Anthropic's Claude, and Google's Gemini. These APIs bill per token, with separate rates for the tokens you send (input) and the tokens the model generates (output). This tool turns your token usage and request volume into a clear dollar figure, projects it over a month or year, and compares every model side by side so you can see which is cheapest for your workload.

Whether you're budgeting a new AI feature, comparing providers before committing, or trying to cut an existing API bill, you need to know both the real cost per call and what it adds up to at scale. A model that looks cheap per token can become expensive at volume, and the cheapest model isn't always obvious until you run the numbers.

How is LLM API Cost Calculated?

The cost of a single request is the input tokens times the input price plus the output tokens times the output price, with prices quoted per million tokens. Multiply by your number of requests for the total.

Cost per request =
(input tokens ÷ 1,000,000 × input price)
+ (output tokens ÷ 1,000,000 × output price)

Total = cost per request × number of requests

Example: 1,000 in + 500 out on GPT-4o
= (1000/1M × $2.50) + (500/1M × $10)
= $0.0025 + $0.005 = $0.0075 per call
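
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function names `cost_per_request` and `total_cost` are hypothetical, and the rates are the GPT-4o figures from the example.

```python
TOKENS_PER_MILLION = 1_000_000

def cost_per_request(input_tokens, output_tokens, input_price, output_price):
    """Return the dollar cost of one request; prices are in $ per 1M tokens."""
    input_cost = input_tokens / TOKENS_PER_MILLION * input_price
    output_cost = output_tokens / TOKENS_PER_MILLION * output_price
    return input_cost + output_cost

def total_cost(per_request, requests):
    """Return the dollar cost of `requests` identical requests."""
    return per_request * requests

# 1,000 input + 500 output tokens at $2.50 in / $10 out per 1M tokens.
per_call = cost_per_request(1_000, 500, input_price=2.50, output_price=10.00)
print(f"${per_call:.4f} per call")                       # $0.0075 per call
print(f"${total_cost(per_call, 10_000):.2f} for 10,000 calls")  # $75.00
```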

How to Use This Calculator

Choose your model, then enter the average input tokens and output tokens for a single request — if you're not sure, use a token counter on a sample prompt and response. Enter how many requests you expect, and pick whether that's a one-off total, a daily figure, or a monthly figure. You'll instantly see the cost per request, the total, the input/output split, and monthly and yearly projections. The comparison table shows the same workload priced on every model, cheapest first.
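
The monthly and yearly projections can be sketched as below. The page does not state which month or year length the calculator uses, so the 30-day month and 365-day year here are assumptions, as are the function name `project` and the choice to reject a one-off total rather than project it.

```python
DAYS_PER_MONTH = 30    # assumption: the page does not say which month length it uses
DAYS_PER_YEAR = 365    # assumption
MONTHS_PER_YEAR = 12

def project(cost_per_request, requests, frequency):
    """Return (monthly, yearly) dollar projections.

    frequency is "day" or "month". A one-off total has no recurring
    projection, so this sketch rejects anything else.
    """
    period_cost = cost_per_request * requests
    if frequency == "day":
        return period_cost * DAYS_PER_MONTH, period_cost * DAYS_PER_YEAR
    if frequency == "month":
        return period_cost, period_cost * MONTHS_PER_YEAR
    raise ValueError(f"unsupported frequency: {frequency!r}")

# 2,000 requests per day at $0.0075 per request.
monthly, yearly = project(0.0075, requests=2_000, frequency="day")
print(f"monthly ${monthly:,.2f}, yearly ${yearly:,.2f}")
```

With a different month length (for example 365 / 12 days), the monthly figure shifts slightly, so small differences from the calculator's own output are expected.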

Why Output Tokens Cost More

Across almost every provider, output tokens are priced higher than input tokens, commonly three to five times more; among the rates quoted on this page the ratio runs from 2x (DeepSeek) to 6x (GPT-5.5). This is because generating text is more computationally expensive than reading it: the model processes a prompt in parallel but produces output one token at a time. The practical consequence is that the length of the model's responses often drives your bill more than the length of your prompts. If your costs are high, capping the maximum output length and asking for concise responses is frequently the biggest lever you have.

💡 The input/output split in the breakdown above shows where your money goes. If output cost dominates, shorten responses or set a max-tokens limit. If input cost dominates, trim your prompts and context, or use prompt caching (offered by most providers) to avoid paying full price for repeated context.
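
To see how the split described above moves when responses get shorter, here is a small sketch. The function name `cost_split` and the 200-token cap are hypothetical; the rates are the GPT-4o figures quoted on this page.

```python
def cost_split(input_tokens, output_tokens, input_price, output_price):
    """Return (input_cost, output_cost, output_share) for one request.

    Prices are in $ per 1M tokens; output_share is a fraction of the total.
    """
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    total = input_cost + output_cost
    output_share = output_cost / total if total else 0.0
    return input_cost, output_cost, output_share

# GPT-4o rates quoted on this page: $2.50 in, $10 out per 1M tokens.
_, _, share = cost_split(1_000, 500, 2.50, 10.00)
print(f"output share at 500 output tokens: {share:.0%}")

# Capping the response at 200 tokens (a hypothetical limit) shrinks that share.
_, _, capped_share = cost_split(1_000, 200, 2.50, 10.00)
print(f"output share at 200 output tokens: {capped_share:.0%}")
```

In this example the output accounts for about two-thirds of the cost even though it is half the length of the input, which is why shortening responses tends to pay off first.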

Comparing GPT vs Claude vs Gemini Costs

Pricing varies widely across providers and tiers. Budget models like Gemini Flash-Lite or GPT-4o mini cost a fraction of flagship models like GPT-5.5 or Claude Opus. The comparison table makes the trade-off visible: for the same workload, the cheapest and most expensive models can differ by 50x or more. The key insight is that you rarely need the most powerful (and priciest) model for every task. Routing simple work to a cheap model and reserving the flagship for hard cases is one of the most effective ways to control costs.
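
A cheapest-first comparison like the one in the table can be sketched as follows. The rates are the ones quoted in the FAQ on this page (the Gemini Flash-Lite figure is given there as approximate), the function name `compare` is hypothetical, and this is not the calculator's actual code.

```python
# (input $/1M, output $/1M) as quoted in the FAQ on this page.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Gemini Flash-Lite": (0.10, 0.40),  # quoted as "around" these rates
    "DeepSeek": (0.14, 0.28),
}

def compare(input_tokens, output_tokens, requests):
    """Return [(model, total_cost)] for the same workload, cheapest first."""
    rows = []
    for model, (in_price, out_price) in PRICES.items():
        per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        rows.append((model, per_call * requests))
    return sorted(rows, key=lambda row: row[1])

table = compare(input_tokens=1_000, output_tokens=500, requests=100_000)
for model, total in table:
    print(f"{model:<18} ${total:>9,.2f}")
```

The ranking depends on the input/output mix. With these rates, DeepSeek comes out cheapest for 1,000 input and 500 output tokens, while an input-heavy workload such as 10,000 input and 100 output tokens puts Gemini Flash-Lite first.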

What is a Token?

A token is the unit LLMs read and bill in — roughly four characters or three-quarters of a word in English. Both your prompt and the model's response are measured in tokens. Because billing is per token, accurately estimating your token counts is the foundation of cost estimation. To get exact token counts for your actual prompts, use a token counter tool, then plug those numbers into this calculator for a precise cost.
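
The rule of thumb above can be turned into a rough estimator. This is only a heuristic for English text under the stated ratios, not a tokenizer; the function names are hypothetical, and real counts vary by model and content, so use a token counter when precision matters.

```python
CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text

def estimate_tokens_from_chars(text):
    """Rough token estimate from character count."""
    return round(len(text) / CHARS_PER_TOKEN)

def estimate_tokens_from_words(word_count):
    """Rough token estimate from word count."""
    return round(word_count / WORDS_PER_TOKEN)

print(estimate_tokens_from_words(1_000))          # 1333
print(estimate_tokens_from_chars("a" * 400))      # 100
```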

Ways to Reduce Your API Costs

Use cheaper models for simple tasks: send classification, extraction, and routing work to budget models; reserve flagships for complex reasoning.
Cap output length: set a max-tokens limit so responses don't run longer (and pricier) than needed.
Use prompt caching: most providers offer up to 90% off repeated input context.
Batch non-urgent work: batch APIs typically give 50% off when you can wait.
Trim prompts: remove redundant instructions and unnecessary context.
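
As an illustration of the first item in the list above, the sketch below estimates the bill when part of the traffic goes to a budget model. The 80% split is hypothetical, the function name `blended_cost` is an assumption, and the per-call costs are derived from the GPT-4o mini and GPT-5.5 rates quoted on this page for 1,000 input and 500 output tokens.

```python
def blended_cost(per_call_cheap, per_call_flagship, requests, cheap_fraction):
    """Total cost when cheap_fraction of requests go to the budget model."""
    if not 0.0 <= cheap_fraction <= 1.0:
        raise ValueError("cheap_fraction must be between 0 and 1")
    cheap_requests = requests * cheap_fraction
    flagship_requests = requests - cheap_requests
    return cheap_requests * per_call_cheap + flagship_requests * per_call_flagship

# Per-call cost for 1,000 input + 500 output tokens:
MINI_PER_CALL = (1_000 * 0.15 + 500 * 0.60) / 1_000_000       # GPT-4o mini
FLAGSHIP_PER_CALL = (1_000 * 5.00 + 500 * 30.00) / 1_000_000  # GPT-5.5

all_flagship = blended_cost(MINI_PER_CALL, FLAGSHIP_PER_CALL, 100_000, 0.0)
routed = blended_cost(MINI_PER_CALL, FLAGSHIP_PER_CALL, 100_000, 0.8)
print(f"all flagship: ${all_flagship:,.2f}, 80% routed: ${routed:,.2f}")
```

Whether 80% of a workload can really be handled by a budget model depends on the tasks involved; the fraction is the number to measure before relying on an estimate like this.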

Understanding Pricing Tiers and Discounts

The prices in this calculator are standard, real-time pay-as-you-go rates. Providers also offer discounts this tool doesn't apply: batch processing (around 50% off for non-urgent work completed within a day), prompt caching (up to 90% off input tokens that repeat across requests), and free tiers (limited free usage, common on Google's Flash models). Some flagship models also charge more for very long prompts above a token threshold. For precise budgeting at scale, factor in whichever discounts apply to your usage pattern.
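
To factor these discounts into an estimate by hand, a sketch like the following may help. It treats the quoted 90% and 50% figures as best-case defaults, ignores any cache-write surcharge or long-prompt premium a provider may charge, and does not stack the two discounts, since whether they combine varies by provider. The function names are hypothetical.

```python
def cost_with_caching(input_tokens, output_tokens, input_price, output_price,
                      cached_fraction, cache_discount=0.90):
    """Per-request cost when cached_fraction of the input is billed at a discount.

    cache_discount=0.90 is the "up to 90% off" best case, not a guaranteed rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1 - cache_discount)
    input_cost = billable_input / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return input_cost + output_cost

def cost_with_batch(standard_cost, batch_discount=0.50):
    """Cost after a batch discount; 0.50 reflects the "around 50% off" figure."""
    return standard_cost * (1 - batch_discount)

# GPT-4o rates from this page, 1,000 input + 500 output tokens.
standard = cost_with_caching(1_000, 500, 2.50, 10.00, cached_fraction=0.0)
cached = cost_with_caching(1_000, 500, 2.50, 10.00, cached_fraction=0.8)
batched = cost_with_batch(standard)
print(f"standard ${standard:.5f}, cached ${cached:.5f}, batched ${batched:.5f}")
```

Because caching only discounts input tokens, it helps most on input-heavy workloads; in this output-heavy example the batch discount saves more.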

Frequently Asked Questions

How much does the OpenAI / ChatGPT API cost?

It depends on the model. As of mid-2026, GPT-4o costs about $2.50 per million input tokens and $10 per million output tokens, while GPT-4o mini is far cheaper at $0.15/$0.60, and flagship GPT-5.5 is $5/$30. A typical request of 1,000 input and 500 output tokens on GPT-4o costs $0.0075, under one cent. Enter your token usage above for an exact figure and a monthly projection.

Which LLM API is cheapest?

Among major providers, budget models are cheapest: Gemini Flash-Lite (around $0.10/$0.40 per million tokens), GPT-4o mini ($0.15/$0.60), and DeepSeek ($0.14/$0.28) are among the lowest. The comparison table above ranks every model for your specific workload, so you can see the cheapest option for your exact input/output mix rather than relying on headline rates.

How do I estimate my token usage?

Use a token counter on a representative sample of your prompts and the responses you expect, then enter those averages here. As a rough guide, one token is about four characters or 0.75 words, so 1,000 words is roughly 1,300–1,400 tokens. For accurate budgeting, measure a few real examples rather than guessing, since token counts vary by content.

Why is the output price higher than the input price?

Generating tokens is more computationally demanding than processing input, so providers charge more for output — typically three to five times the input rate. This means the length of the model's responses often matters more for your bill than your prompt length. Limiting response length is usually the most effective way to cut costs.

Are these prices up to date?

The rates here were verified in June 2026 from provider pricing pages, but AI pricing changes regularly as new models launch. Always confirm the current price on OpenAI's, Anthropic's, or Google's official pricing page before relying on it for contracts or production budgets. This calculator is for estimation and comparison.

Do these prices include batch or caching discounts?

No — the calculator uses standard pay-as-you-go rates. Providers offer additional savings: batch processing (around 50% off for non-urgent work), prompt caching (up to 90% off repeated input), and free tiers on some models. If your usage qualifies, your real cost could be significantly lower than shown. Factor those in for precise large-scale budgeting.

How do I calculate monthly API costs?

Enter your per-request token usage, set the number of requests to your daily volume, and choose "Per day" — the calculator projects your monthly and yearly costs automatically. Alternatively, enter your monthly request count directly and select "Per month." This makes it easy to budget an AI feature before you build it.

Is my data private?

Yes. The calculator runs entirely in your browser — no token counts, usage figures, or anything else is uploaded or stored. All calculations happen on your device, instantly and privately, and it works offline once the page has loaded.