The best way to manage AI spend isn't to use AI less. It's to stay in control of how and when you use it.
TL;DR — Credit Optimization Best Practices
Default to Auto Lite. It's the most credit-efficient starting point for everyday work. Escalate to Performance or Turbo when the task genuinely warrants it — then drop back down.
Check the multiplier. Hover over a model in the model picker, or open the Multipliers tab, to see how much more or less credit-intensive it is compared to the flagship model at 1x.
Enable subagents. They route specialized tasks to lighter, purpose-built models so the primary model isn't doing everything.
Start new chats for new topics. Long threads carry full history as input on every message. A fresh chat resets the cost.
Turn recurring prompts into Agents or Workflows. If you're typing the same thing every week, automate it.
Keep context tight. Upload only the relevant pages, paste only the relevant excerpt. Input size is part of the cost.
Think ROI, not just spend. High-value usage at higher cost beats low-value usage at any cost. Ask whether the credits are earning their keep.
Admins: use the controls available to you. Role-based credit limits, model restrictions, Auto Mode policies, and the Adoption Tab exist so you stay in control at scale.
Why Credits Vary More Than You Might Expect
Credits are Hatz's unit of AI consumption — each message you send draws credits based on the amount of text processed (input and output) and the model handling the request. Every model carries a multiplier, so the same message sent to a Turbo-tier model costs meaningfully more than one sent to a Lite-tier model. Your plan includes a monthly credit allotment; how far that allotment stretches depends almost entirely on which models you use and how much context you carry into each conversation.
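As a rough mental model, the relationship above can be sketched as "tokens processed times the model's multiplier." This is a minimal illustration, not Hatz's actual billing formula. The base rate, the function name, and the example multipliers are hypothetical, and real pricing may weight input and output tokens differently.

```python
# Hypothetical sketch: NOT Hatz's billing formula.
# Assumes one flat base rate for a 1x (flagship) model, scaled linearly by the multiplier.
BASE_CREDITS_PER_1K_TOKENS = 1.0  # hypothetical rate for a 1x model


def estimate_credits(input_tokens: int, output_tokens: int, multiplier: float) -> float:
    """Estimate credits for one message: (input + output tokens) x base rate x multiplier."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * BASE_CREDITS_PER_1K_TOKENS * multiplier


# The same message (2,000 tokens in, 500 out) on a Lite-tier vs a Turbo-tier model.
lite = estimate_credits(2000, 500, multiplier=0.25)  # hypothetical Lite multiplier
turbo = estimate_credits(2000, 500, multiplier=3.0)  # hypothetical Turbo multiplier
print(f"Lite: {lite:.3f} credits, Turbo: {turbo:.3f} credits")
# prints "Lite: 0.625 credits, Turbo: 7.500 credits"
```

Because the multiplier scales the whole exchange, the identical 2,500-token message costs twelve times more on the assumed 3x model than on the assumed 0.25x model.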
Credits aren't a flat per-message fee. Every conversation draws on several variables at once: which model is handling the request, how much context has accumulated in the thread, whether tools or files are involved, and how your workflows, agents, apps, or even chats are structured. These factors compound — a long thread running on a high-multiplier model with a large file attached costs meaningfully more per message than a fresh, focused chat on Auto Lite.
That variability isn't a problem to avoid; it's a set of dials you can turn to control your AI spend. Understanding these dials is the first step to getting real value out of every credit you spend.
Related: Credits & Usage · Model Multipliers
Start with Auto, Default to Lite
The single biggest lever available to most users is model selection — specifically, defaulting to Auto on the Lite tier rather than leaving a high-multiplier model selected out of habit.
Auto Model Selection routes each message to the most capable model that the task actually requires. On Lite, that means efficient, fast responses for the majority of everyday work: drafting, summarizing, answering questions, light analysis. Recent data from Hatz suggests that nearly 80% of business tasks given to AI can be handled sufficiently by models available in Lite Mode.
But when a task genuinely needs more — complex reasoning, nuanced writing, multi-step analysis — you can escalate to Performance or Turbo consciously, for that task, and drop back down when it's done.
One important note: Auto LLM (model routing) works per message, so you get the best model for each prompt without needing to start a new chat or branch off to use a different model.
The habit to build: start on Auto Lite, escalate intentionally, don't leave Turbo running for simple follow-ups.
Users who default to Turbo for everything and never change it often spend 3–5x more per message than they would on Performance, and 10–12x more than on Lite, for tasks that don't require Turbo.
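To make the "escalate intentionally" habit concrete, here is a small sketch comparing a 20-message session left on Turbo with one that starts on Lite and escalates only for the two messages that need it. The tier multipliers are hypothetical values picked only to match the ratios quoted above, and every message is assumed to process the same number of tokens so that tier choice is the only variable.

```python
# Hypothetical tier multipliers, chosen only to match the ratios quoted above
# (Turbo about 3x Performance and 12x Lite). Real multipliers vary by model.
TIER_MULTIPLIER = {"lite": 0.25, "performance": 1.0, "turbo": 3.0}


def session_cost(tiers: list[str], tokens_per_message: int = 2000) -> float:
    """Sum a multiplier-weighted token cost (relative credits) over a session.

    Assumes every message processes the same number of tokens, so only the
    tier choice differs between scenarios.
    """
    return sum(TIER_MULTIPLIER[t] * tokens_per_message / 1000 for t in tiers)


# 20 messages: 2 genuinely need Turbo, the other 18 are simple follow-ups.
always_turbo = ["turbo"] * 20
intentional = ["turbo"] * 2 + ["lite"] * 18

print(session_cost(always_turbo))  # 120.0
print(session_cost(intentional))   # 21.0
```

Under these assumed numbers, escalating only when it matters costs less than a fifth of leaving Turbo running for the whole session.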
Read more in our blog post here: https://hatz.ai/articles/don%E2%80%99t-take-the-bentley-to-the-bodega-a-smarter-approach-to-ai-model-cost-optimization
When You Want To Choose A Specific LLM: Understanding Model Multipliers
Every model in Hatz carries a multiplier — a number that tells you how much more or less credit-intensive that model is compared to the industry's flagship model, which sits at 1x.
A model at 0.5x uses roughly half as many credits per message as the flagship
A model at 1x matches the flagship
A model at 2x uses roughly double
The multiplier is a directional comparison, not an exact prediction — your actual spend on any given message still depends on how many tokens are sent and received. But it gives you a fast, reliable way to compare models before you choose one, without doing token math.
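The point that the multiplier is directional, with tokens still mattering, can be shown in a few lines. This sketch assumes cost scales linearly with both the multiplier and the token count; the function name and numbers are hypothetical. With identical token counts, the multiplier ratio is the cost ratio. With very different token counts, a lower multiplier doesn't guarantee a cheaper message.

```python
def relative_cost(multiplier: float, tokens: int) -> float:
    """Relative cost of one message: multiplier x tokens processed (hypothetical units)."""
    return multiplier * tokens


# Identical token counts: the multiplier ratio is the cost ratio.
same_tokens = 5000
print(relative_cost(2.0, same_tokens) / relative_cost(0.5, same_tokens))  # 4.0

# Different token counts: a lower multiplier does not guarantee a cheaper message.
bulky_half_x_message = relative_cost(0.5, 40_000)  # large file pasted into a 0.5x model
lean_2x_message = relative_cost(2.0, 3_000)        # short, focused prompt on a 2x model
print(bulky_half_x_message > lean_2x_message)  # True (20,000 vs 6,000)
```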
Where to find multipliers:
Hover over any model in the model picker — the tooltip shows its multiplier as "Nx credits per message"
Open "View all models" and switch to the Multipliers tab for a sortable, filterable table of every available model
Keep an eye out for the Promo badge — Hatz occasionally runs promotional pricing on specific models, temporarily lowering their multiplier. The displayed number always reflects the current rate.
Before you switch away from Auto Lite for a task, check the multiplier first. Escalating to a higher-tier model is often the right call — but it's worth a quick look at what that decision costs per message before you commit. A model at 3x or 4x used across a long session adds up fast. The Multipliers tab makes that comparison instant.
Related: Model Multipliers
Enable Subagents
Subagents are purpose-built specialists that operate alongside the primary model rather than loading everything onto it. When subagents are enabled, the AI can delegate specialized work to lighter, more efficient models instead of handling every step itself.
The practical result: tasks that would otherwise require the primary model to stretch outside its core strengths get routed to a tool designed for that job. That's better output and more efficient credit use.
If subagents aren't enabled in your workspace, it's worth turning them on. The efficiency gains are passive — you don't have to manage the routing yourself.
Start Fresh Chats For New Topics
This is one of the most underappreciated credit drivers, and one of the easiest habits to change.
Every message you send in an existing conversation carries the full history of that thread as input context. The longer the chat, the more tokens are processed with every new message — even for simple follow-ups. A conversation that started as a strategy brainstorm and has since drifted into scheduling questions is burning credits on irrelevant context every single time.
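Because each message resends everything before it, total input grows much faster than the message count. The sketch below assumes uniform, hypothetical sizes (300-token prompts, 500-token replies) and no history truncation or summarization. It compares one 30-message thread with the same 30 messages split across three topic-focused chats.

```python
def thread_input_tokens(num_messages: int, prompt_tokens: int = 300, reply_tokens: int = 500) -> int:
    """Total input tokens across a thread where every message resends full history.

    Message k (1-based) sends the k-1 earlier exchanges plus the new prompt.
    Assumes uniform prompt and reply sizes, which real chats won't have.
    """
    per_exchange = prompt_tokens + reply_tokens
    total = 0
    for k in range(1, num_messages + 1):
        history = (k - 1) * per_exchange
        total += history + prompt_tokens
    return total


one_long_thread = thread_input_tokens(30)
three_fresh_chats = 3 * thread_input_tokens(10)
print(one_long_thread, three_fresh_chats)  # 357000 117000
```

With these assumptions, the single long thread processes roughly three times as many input tokens as three fresh chats covering the same 30 messages.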
The rule of thumb: when you're starting a genuinely new topic, start a new chat. The model has a cleaner signal to work from, and usage is easier to reason about because the chat is no longer carrying unrelated history.
Long threads are appropriate for tasks where continuity actually matters — iterating on a document, working through a multi-step problem, or building on prior outputs. For everything else, fresh is cheaper and often sharper.
Turn Repetitive Chats Into Automations
If you or someone on your team is opening a new chat and typing a similar prompt on a regular basis — a weekly status summary, a recurring report, a standard client email format — that's a Workflow, App, or Agent waiting to be built.
Manual, prompt-from-scratch conversations are the least efficient way to use AI for work that repeats. Every time someone reconstructs context, re-explains the format, or re-types the same instructions, that's both their time and your credits being spent on setup rather than output.
Agents and Workflows handle repeatable work in a structured, focused way. They're faster, more consistent, and more credit-efficient because the context is baked in rather than rebuilt each time.
This is also the right place to think about ROI. The goal isn't to use fewer credits — it's to get more done per credit. A well-built workflow that runs 20 times a month at a predictable cost is a better investment than 20 ad-hoc conversations that cost more and produce less consistent results.
Not sure where to start? The Workshop Assistant builds it for you.
You don't need to know how to configure a workflow or structure an agent from scratch. The Hatz Workshop Assistant lets you describe what you want in plain language and handles the technical setup for you. Tell it what the automation should do, and it will generate the structure, prompts, and steps.
"Take our existing chat and turn it into an Agent or Workflow."
"Create a workflow that takes a weekly report and writes an executive summary."
"Build an agent that drafts client follow-up emails based on meeting notes."
"Add a step to my existing workflow that translates the output into Spanish."
It can create, modify, run, and explain Workshop items, all from a single conversation. Find it in the Tool Selector; no additional setup is required.
Be Intentional About Files And Context
What you put into a conversation is part of the cost. Uploading a 40-page document when the relevant content is on two pages, or pasting an entire email thread when only the last message matters, adds input tokens that the model has to process regardless of whether they're useful.
A few habits that make a difference:
Trim file uploads to the sections that are actually relevant to your question
Paste targeted excerpts rather than full documents when you're asking about something specific
Avoid re-uploading the same file across multiple conversations — if it's being referenced repeatedly, that's a signal it belongs in an Agent or Workflow with the context built in
Keep questions focused — one clear question per message typically costs less and gets a better answer than a multi-part prompt that requires the model to hold many threads at once
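To get a feel for how much input a trim can save, here is a rough sketch using the common rule of thumb of about four characters per token for English text. The heuristic, the 3,000-characters-per-page figure, and the helper names are assumptions for illustration only. Actual token counts depend on each model's tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def relevant_excerpt(pages: list[str], keep: list[int]) -> str:
    """Join only the pages (0-based indices) that matter for the question."""
    return "\n\n".join(pages[i] for i in keep)


# Stand-in for a 40-page document: each page is 3,000 characters of filler text.
document = ["x" * 3000 for _ in range(40)]
full_upload = "\n\n".join(document)
excerpt = relevant_excerpt(document, keep=[12, 13])

print(estimate_tokens(full_upload), estimate_tokens(excerpt))  # 30020 1501
```

In this example, sending the two relevant pages instead of the full document cuts estimated input by about 95%.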
Think in ROI, Not Just Spend
Credits are a resource, not a tax. A 50-credit conversation that saves two hours of research, catches an error before it becomes a problem, or produces a draft that would have taken an afternoon is an excellent trade — regardless of what it costs in absolute terms.
Instead of asking "How do I use fewer credits?", ask "Am I getting real value from the credits I'm using?"
High usage driven by AI doing meaningful work is the system working as intended.
High usage driven by runaway context, default Turbo selection, or repetitive manual prompting is worth addressing — not because spend is inherently bad, but because those patterns are usually producing less value per credit than they could be.
The tips in this article aren't about cutting AI use. They're about making sure that when you spend, it counts.
For Admins: Controls That Help At Scale
If you manage Hatz for a team or organization, there are additional levers available at the admin level that compound the habits above.
Set role-based credit limits to cap spending per user or per role before overuse becomes a problem
Restrict high-multiplier models for roles that don't need them — a support team running on Lite by policy costs a fraction of what an unrestricted deployment does
Use Auto Mode policies to keep teams defaulting to Lite or Performance without requiring individual users to manage it themselves
Monitor usage patterns in the Adoption Tab to identify where credits are going, who the heaviest users are, and whether usage patterns are producing the outcomes you expect
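To illustrate how a spending cap and a model restriction work together, here is a hypothetical sketch of a per-role policy check. It is not Hatz's configuration format or admin API. The `RolePolicy` class, the tier names, and the limits are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RolePolicy:
    """Hypothetical per-role policy; not Hatz's actual configuration format."""
    monthly_credit_limit: float
    allowed_tiers: set[str] = field(default_factory=lambda: {"lite"})


def can_send(policy: RolePolicy, credits_used: float, tier: str, estimated_cost: float) -> bool:
    """Allow a message only if the tier is permitted and the cap would not be exceeded."""
    if tier not in policy.allowed_tiers:
        return False
    return credits_used + estimated_cost <= policy.monthly_credit_limit


support = RolePolicy(monthly_credit_limit=500, allowed_tiers={"lite"})
analyst = RolePolicy(monthly_credit_limit=2000, allowed_tiers={"lite", "performance", "turbo"})

print(can_send(support, credits_used=100, tier="turbo", estimated_cost=5))  # False: tier restricted
print(can_send(support, credits_used=498, tier="lite", estimated_cost=5))   # False: would exceed cap
print(can_send(analyst, credits_used=100, tier="turbo", estimated_cost=5))  # True
```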
Visibility and policy together are how you stay in control at scale.
Related: Custom Roles — Credit Limits · Adoption Tab
