LLM API Cost Calculator
Estimate what your AI API usage will cost across GPT, Claude, Gemini and more. Enter your token counts to see the price per call, a monthly projection, and which model is cheapest.
Pick a model and enter your input and output tokens per request. See the cost per call and total, project it monthly or yearly, and compare every model for the same workload.
What is an LLM API Cost Calculator?
An LLM API cost calculator estimates how much you'll pay to use a large language model through its API, for models such as OpenAI's GPT, Anthropic's Claude, and Google's Gemini. These APIs bill per token, with separate rates for the tokens you send (input) and the tokens the model generates (output). This tool turns your token usage and request volume into a clear dollar figure, projects it over a month or year, and compares every model side by side so you can see which is cheapest for your workload.
Whether you're budgeting a new AI feature, comparing providers before committing, or trying to cut an existing API bill, knowing the real cost per call and at scale is essential. A model that looks cheap per token can become expensive at volume, and the cheapest model isn't always obvious until you run the numbers.
How is LLM API Cost Calculated?
The cost of a single request is the input tokens times the input price plus the output tokens times the output price, with prices quoted per million tokens. Multiply by your number of requests for the total.
Cost per request = (input tokens ÷ 1,000,000 × input price)
+ (output tokens ÷ 1,000,000 × output price)
Total = cost per request × number of requests
Example: 1,000 in + 500 out on GPT-4o
= (1000/1M × $2.50) + (500/1M × $10)
= $0.0025 + $0.005 = $0.0075 per call
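The same formula can be written as a short Python sketch. The function names `cost_per_request` and `total_cost` are hypothetical (they are not part of this calculator), and the example uses only the GPT-4o rates quoted above.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request; prices are quoted per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


def total_cost(per_request: float, num_requests: int) -> float:
    """Total = cost per request x number of requests."""
    return per_request * num_requests


# The GPT-4o example: 1,000 input + 500 output tokens at $2.50 / $10 per 1M.
per_call = cost_per_request(1_000, 500, 2.50, 10.00)
print(f"${per_call:.4f} per call")             # $0.0075 per call
print(f"${total_cost(per_call, 10_000):.2f}")  # $75.00
```

Keeping the per-request cost separate from the request count makes it easy to reuse the same figure for totals, projections, and model comparisons.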
How to Use This Calculator
Choose your model, then enter the average input tokens and output tokens for a single request; if you're not sure, use a token counter on a sample prompt and response. Enter how many requests you expect, and pick whether that's a one-off total, a daily figure, or a monthly figure. You'll instantly see the cost per request, the total, the input/output split, and monthly and yearly projections. The comparison table shows the same workload priced on every model, cheapest first.
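A monthly and yearly projection could be computed roughly as follows. This is an assumed approach, not this tool's actual code: the hypothetical `project` function treats a month as 365 ÷ 12 days and returns no projection for a one-off total.

```python
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def project(per_request: float, num_requests: int, period: str):
    """Return (total, monthly, yearly) for a request count given per 'total', 'day', or 'month'.

    Assumptions: a month is 365 / 12 days, and a one-off 'total' is not projected.
    """
    total = per_request * num_requests
    if period == "total":
        return total, None, None
    if period == "day":
        yearly = total * DAYS_PER_YEAR
    elif period == "month":
        yearly = total * MONTHS_PER_YEAR
    else:
        raise ValueError(f"unknown period: {period!r}")
    return total, yearly / MONTHS_PER_YEAR, yearly


# 1,000 requests per day at $0.0075 each.
per_day, per_month, per_year = project(0.0075, 1_000, "day")
print(f"${per_day:.2f}/day, ${per_month:.2f}/month, ${per_year:.2f}/year")
```

At this volume the example works out to about $7.50 a day, roughly $228 a month, and about $2,740 a year.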
Why Output Tokens Cost More
Across almost every provider, output tokens are priced higher than input tokens, commonly three to five times more. This is because generating text is more computationally expensive than reading it. The practical consequence is that the length of the model's responses often drives your bill more than the length of your prompts. If your costs are high, capping the maximum output length and asking for concise responses is frequently the biggest lever you have.
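To see the effect, here is a small sketch reusing the GPT-4o example (1,000 input and 500 output tokens at $2.50 / $10 per 1M). The 250-token cap is a hypothetical figure, and the sketch assumes the model actually uses its full output budget in both cases.

```python
def split(input_tokens: int, output_tokens: int, in_price: float, out_price: float):
    """Return (input_cost, output_cost) in dollars for one request; prices per 1M tokens."""
    return input_tokens / 1e6 * in_price, output_tokens / 1e6 * out_price


in_cost, out_cost = split(1_000, 500, 2.50, 10.00)
share = out_cost / (in_cost + out_cost)
print(f"output share of cost: {share:.0%}")  # 67%

# Hypothetical cap: halve the response length from 500 to 250 tokens.
capped_in, capped_out = split(1_000, 250, 2.50, 10.00)
saving = 1 - (capped_in + capped_out) / (in_cost + out_cost)
print(f"saving from halving output: {saving:.0%}")  # 33%
```

Output is only a third of the tokens here but two-thirds of the cost, so halving it cuts the bill by a third, while halving the prompt would save only a sixth.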
Comparing GPT vs Claude vs Gemini Costs
Pricing varies widely across providers and tiers. Budget models like Gemini Flash-Lite or GPT-4o mini cost a fraction of flagship models like GPT-5.5 or Claude Opus. The comparison table makes the trade-off visible: for the same workload, the cheapest and most expensive models can differ by 50x or more. The key insight is that you rarely need the most powerful (and priciest) model for every task: routing simple work to a cheap model and reserving the flagship for hard cases is one of the most effective ways to control costs.
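The comparison and the routing idea can be sketched in Python. The model names and prices below are placeholders invented for illustration, not real quotes from any provider, and the 10% share of "hard" requests sent to the flagship is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    in_price: float   # $ per 1M input tokens
    out_price: float  # $ per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return input_tokens / 1e6 * self.in_price + output_tokens / 1e6 * self.out_price


# PLACEHOLDER prices for illustration only; these are not real quotes.
MODELS = [
    Model("flagship-model", 5.00, 25.00),
    Model("mid-model", 1.00, 5.00),
    Model("budget-model", 0.10, 0.40),
]


def compare(models, input_tokens: int, output_tokens: int):
    """Price one workload on every model, cheapest first."""
    rows = [(m.name, m.cost(input_tokens, output_tokens)) for m in models]
    return sorted(rows, key=lambda row: row[1])


def routed_cost(cheap: Model, flagship: Model, input_tokens: int,
                output_tokens: int, hard_fraction: float) -> float:
    """Blended per-request cost when only `hard_fraction` of traffic goes to the flagship."""
    return ((1 - hard_fraction) * cheap.cost(input_tokens, output_tokens)
            + hard_fraction * flagship.cost(input_tokens, output_tokens))


for name, c in compare(MODELS, 1_000, 500):
    print(f"{name:15s} ${c:.5f}")

budget, flagship = MODELS[2], MODELS[0]
print(f"all flagship: ${flagship.cost(1_000, 500):.5f}")
print(f"10% routed:   ${routed_cost(budget, flagship, 1_000, 500, 0.10):.5f}")
```

With these placeholder rates the flagship costs about 58 times as much as the budget model per request, and routing 90% of traffic to the budget model cuts the blended cost by close to 90%.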
What is a Token?
A token is the unit LLMs read and bill in โ roughly four characters or three-quarters of a word in English. Both your prompt and the model's response are measured in tokens. Because billing is per token, accurately estimating your token counts is the foundation of cost estimation. To get exact token counts for your actual prompts, use a token counter tool, then plug those numbers into this calculator for a precise cost.
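For a quick first estimate before reaching for a real token counter, the four-characters-per-token rule of thumb can be applied directly. This hypothetical `estimate_tokens` helper is only a heuristic for English text; actual counts depend on each model's tokenizer and can differ noticeably.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough English token estimate using the ~4 characters per token rule of thumb.

    This is a heuristic, not a tokenizer: real counts depend on each model's tokenizer.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the following customer review in one sentence."
print(len(prompt), "characters, about", estimate_tokens(prompt), "tokens")
```

Rounding up with `math.ceil` keeps the estimate slightly conservative, which is usually the safer direction when budgeting.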
Ways to Reduce Your API Costs
- Use cheaper models for simple tasks: send classification, extraction, and routing tasks to budget models; reserve flagships for complex reasoning.
- Cap output length: set a max-tokens limit so responses don't run longer (and pricier) than needed.
- Use prompt caching: most providers offer up to 90% off repeated input context.
- Batch non-urgent work: batch APIs typically give 50% off when you can wait.
- Trim prompts: remove redundant instructions and unnecessary context.
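The caching and batching levers above can be folded into the per-request formula. This is a sketch under stated assumptions, not any provider's billing rules: cached input is billed at (1 − `cache_discount`) of the input rate, cache-write surcharges are ignored, and the batch discount is assumed to stack on top of caching. The default 90% and 50% reflect the "up to" and "typically" figures above; check your provider's actual rates. The 80% cached share in the example is hypothetical.

```python
def discounted_cost(input_tokens: int, output_tokens: int,
                    in_price: float, out_price: float,
                    cached_fraction: float = 0.0, cache_discount: float = 0.90,
                    batch: bool = False, batch_discount: float = 0.50) -> float:
    """Per-request cost after prompt-caching and batch discounts (prices per 1M tokens).

    Assumptions (vary by provider): cached input is billed at (1 - cache_discount)
    of the input rate, cache-write surcharges are ignored, and the batch discount
    applies on top of caching.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - cache_discount)) / 1e6 * in_price
    output_cost = output_tokens / 1e6 * out_price
    cost = input_cost + output_cost
    if batch:
        cost *= 1 - batch_discount
    return cost


base = discounted_cost(1_000, 500, 2.50, 10.00)
# Hypothetical: 80% of the prompt is a cached system prompt, and the job runs as a batch.
cheap = discounted_cost(1_000, 500, 2.50, 10.00, cached_fraction=0.8, batch=True)
print(f"${base:.5f} -> ${cheap:.5f} per request")
```

Note that caching only discounts input, so it helps most when long, repeated prompts dominate the bill; batching halves everything, including output.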
Understanding Pricing Tiers and Discounts
The prices in this calculator are standard, real-time pay-as-you-go rates. Providers also offer discounts this tool doesn't apply by default: batch processing (around 50% off for non-urgent work completed within a day), prompt caching (up to 90% off input tokens that repeat across requests), and free tiers (limited free usage, common on Google's Flash models). Some flagship models also charge more for very long prompts above a token threshold. For precise budgeting at scale, factor in whichever discounts apply to your usage pattern.
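Long-prompt surcharges can be modeled as a price that switches above a token threshold. In this sketch the threshold and both rates are placeholder values, and it assumes the whole prompt is billed at the higher rate once the threshold is exceeded; some providers work this way, but tier rules (and whether output is also repriced) differ, so check your provider's documentation.

```python
def tiered_input_cost(input_tokens: int, base_price: float,
                      long_price: float, threshold: int) -> float:
    """Input cost when prompts above `threshold` tokens are billed at a higher rate.

    Assumption: once a prompt exceeds the threshold, the whole prompt is billed
    at `long_price` (per 1M tokens); only input pricing is modeled here.
    """
    price = long_price if input_tokens > threshold else base_price
    return input_tokens / 1e6 * price


# PLACEHOLDER tier: $2.00 per 1M up to 200,000 tokens, $4.00 per 1M above it.
for n in (150_000, 250_000):
    print(n, f"${tiered_input_cost(n, 2.00, 4.00, 200_000):.2f}")
```

Under these placeholder rates, a prompt that is two-thirds longer costs more than three times as much, which is why very long contexts deserve their own line in a budget.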
