Understanding Tokens and How They Affect AI Costs

Author: Marc Logemann

Understanding Tokens in AI

When you're using AI APIs (like those powering ChatGPT, Claude, or Gemini), pricing is often based on tokens—not on words, seconds, or API calls. This can be confusing at first, so here’s a simple breakdown:


What Is a Token?

A token is a chunk of text. It could be:

  • A word (e.g., "cat")
  • Part of a word (e.g., "un" in "understand")
  • Or punctuation (e.g., "." or "!")

In English, 1 token is roughly 4 characters or ¾ of a word on average. So:

  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words
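
If you want to check these numbers yourself, you can count tokens locally before sending a request. Here is a minimal sketch using the tiktoken library (the tokenizer OpenAI publishes for its models; other providers use different tokenizers, so counts will vary):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4 / GPT-3.5-Turbo era models
encoding = tiktoken.get_encoding("cl100k_base")

text = "Understanding tokens helps you estimate AI costs."
tokens = encoding.encode(text)

print(f"Characters: {len(text)}")
print(f"Tokens:     {len(tokens)}")
# Rough rule of thumb for English text: tokens ≈ characters / 4
```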

Be aware that most LLM providers price input tokens differently than output tokens; output tokens are usually the more expensive of the two.


How Are Tokens Used?

When you send input to an AI model, it gets broken down into tokens. The model processes those tokens and generates output tokens in response.

You pay for both:

  • Input tokens (your prompt)
  • Output tokens (the model’s response)
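
Most APIs report both numbers back to you with every response. As a sketch, here is how the usage block looks with the OpenAI Python SDK; other providers (Anthropic, Google) return similar usage metadata under slightly different field names:

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Explain tokens in one sentence."}],
)

usage = response.usage
print(f"Input tokens:  {usage.prompt_tokens}")      # your prompt
print(f"Output tokens: {usage.completion_tokens}")  # the model's response
print(f"Total tokens:  {usage.total_tokens}")
```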


A Real Example of Input and Output Tokens

In the following screenshot, we created a request and told ChatGPT to also show the tokens used.

[Screenshot 1: ChatGPT request with the token usage shown]


Then we asked ChatGPT to calculate the cost of the tokens used. It based the calculation on the GPT-4 Turbo model, the model we were using on the official ChatGPT website.


[Screenshot 2: ChatGPT calculating the cost of the tokens used]


Why Does This Matter for Budgeting?

If you're building AI-based software that interacts with a model frequently—like a chatbot, summarizer, or recommendation system—token usage can add up fast.

Example:

  • A single user query might use 200 input tokens
  • The model responds with 300 output tokens
  • That’s 500 tokens total per interaction

Now imagine thousands of users each day. Multiply by the price per 1,000 tokens (e.g., $0.002 for some models, or much more for larger ones), and you can see how costs scale.
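
To make that concrete, here is a small back-of-the-envelope calculation. The prices below are placeholder assumptions, not current list prices; check your provider's price list before budgeting:

```python
# Assumed example prices per 1,000 tokens -- placeholders, not current list prices
PRICE_PER_1K_INPUT = 0.002   # USD
PRICE_PER_1K_OUTPUT = 0.006  # USD; output tokens are typically pricier

input_tokens_per_query = 200
output_tokens_per_query = 300
queries_per_day = 10_000

cost_per_query = (
    input_tokens_per_query / 1000 * PRICE_PER_1K_INPUT
    + output_tokens_per_query / 1000 * PRICE_PER_1K_OUTPUT
)

print(f"Cost per query: ${cost_per_query:.4f}")                       # $0.0022
print(f"Cost per day:   ${cost_per_query * queries_per_day:.2f}")     # $22.00
print(f"Cost per month: ${cost_per_query * queries_per_day * 30:.2f}")# $660.00
```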


Key Takeaways for Budgeting

  • Token usage drives variable cost — it's like bandwidth for AI.
  • Bigger models are more expensive per token — and may use more tokens per response.
  • Fine-tuning, embeddings, and context windows (how much info the model remembers) also affect token usage.

You’ll need to:

  • Track usage per user/session (see the tracking sketch below)
  • Cap token limits where possible
  • Optimize prompts to reduce input/output length without degrading performance
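
As a starting point for the first two items, here is a minimal sketch of a per-session tracker with a hard token budget. All names here are hypothetical and not part of any SDK:

```python
# Hypothetical per-session token budget tracker -- a sketch, not a library API
class SessionBudget:
    def __init__(self, session_id: str, max_tokens: int = 50_000):
        self.session_id = session_id
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one interaction's usage to the running total."""
        self.used += input_tokens + output_tokens

    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)

    def allows(self, estimated_tokens: int) -> bool:
        """Check before calling the API whether the session can afford a request."""
        return self.used + estimated_tokens <= self.max_tokens


budget = SessionBudget("user-123", max_tokens=10_000)
budget.record(input_tokens=200, output_tokens=300)
print(budget.remaining())     # 9500
print(budget.allows(20_000))  # False -- reject or truncate the request
```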