The best way to manage AI spend isn't to use AI less. It's to stay in control of how and when you use it.

TL;DR — Credit Optimization Best Practices

Default to Auto Lite. It's the most credit-efficient starting point for everyday work. Escalate to Performance or Turbo when the task genuinely warrants it — then drop back down.

Check the multiplier. Hover over an LLM in the selector, or use the Multipliers tab, to see how much more or less credit-efficient a model is relative to a 'flagship model' at 1x.

Enable subagents. They route specialized tasks to lighter, purpose-built models so the primary model isn't doing everything.

Start new chats for new topics. Long threads carry full history as input on every message. A fresh chat resets the cost.

Turn recurring prompts into Agents or Workflows. If you're typing the same thing every week, automate it.

Keep context tight. Upload only the relevant pages, paste only the relevant excerpt. Input size is part of the cost.

Think ROI, not just spend. High-value usage at higher cost beats low-value usage at any cost. Ask whether the credits are earning their keep.

Admins: use the controls available to you. Role-based credit limits, model restrictions, Auto Mode policies, and the Adoption Tab exist so you stay in control at scale.

[Image: Hatz credit optimization summary showing model tier recommendations and credit usage best practices]

Why Credits Vary More Than You Might Expect

Credits are Hatz's unit of AI consumption — each message you send draws credits based on the amount of text processed (input and output) and the model handling the request. Every model carries a multiplier, so the same message sent to a Turbo-tier model costs meaningfully more than one sent to a Lite-tier model. Your plan includes a monthly credit allotment; how far that allotment stretches depends almost entirely on which models you use and how much context you carry into each conversation.

Credits aren't a flat per-message fee. Every conversation draws on several variables at once: which model is handling the request, how much context has accumulated in the thread, whether tools or files are involved, and how your workflows, agents, apps, or even chats are structured. These factors compound — a long thread running on a high-multiplier model with a large file attached costs meaningfully more per message than a fresh, focused chat on Auto Lite.
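To make the compounding concrete, here is a rough sketch in Python. It assumes, purely for illustration, that relative cost scales with the tokens processed times the model multiplier. The exact Hatz credit formula is not given in this article, and the function name, token counts, and multipliers below are all hypothetical.

```python
def relative_cost(input_tokens: int, output_tokens: int, multiplier: float) -> float:
    """Unitless cost estimate: tokens processed times the model multiplier.

    Illustrative assumption only; this is not the actual Hatz credit formula.
    """
    return (input_tokens + output_tokens) * multiplier


# Hypothetical fresh, focused chat on a light model (0.5x).
fresh_chat = relative_cost(input_tokens=500, output_tokens=500, multiplier=0.5)

# Hypothetical long thread with a large file attached, on a 3x model.
long_thread = relative_cost(input_tokens=19_500, output_tokens=500, multiplier=3.0)

ratio = long_thread / fresh_chat
print(f"fresh chat: {fresh_chat:.0f}, long thread: {long_thread:.0f}, ratio: {ratio:.0f}x")
```

With these made-up numbers, the context is 20 times larger and the multiplier is 6 times higher, so the per-message estimate is 120 times higher. The factors multiply rather than add.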

This variability isn't a problem to avoid. It's a set of dials you can turn to control how AI is used. Understanding these dials is the first step to getting real value out of every credit you spend.

Related: Credits & Usage · Model Multipliers

Start with Auto, Default to Lite

The single biggest lever available to most users is model selection — specifically, defaulting to Auto on the Lite tier rather than leaving a high-multiplier model selected out of habit.

Auto Model Selection routes each message to the most capable model that the task actually requires. On Lite, that means efficient, fast responses for the majority of everyday work: drafting, summarizing, answering questions, light analysis. Recent data from Hatz suggests that nearly 80% of business tasks given to AI can be handled sufficiently by models available in Lite Mode.

But when a task genuinely needs more — complex reasoning, nuanced writing, multi-step analysis — you can escalate to Performance or Turbo consciously, for that task, and drop back down when it's done.

One important note: Auto LLM (model routing) works per message, so you get the best model for each prompt without needing to start a new chat or branch off to use a different model.

The habit to build: start on Auto Lite, escalate intentionally, don't leave Turbo running for simple follow-ups.

Users who default to Turbo for everything and never change it are often spending 3–5x more per message than they would on Performance, and 10–12x more than they would on Lite, which is far more than most tasks require.

Why Auto Lite can still use more credits than you expect

Auto Lite keeps the model efficient, but the model is only one part of the cost. On a single request the assistant may also call tools, hand work to subagents, read uploaded files, or take several steps to finish a task — and each of those draws additional credits on top of the base message. So a request that stays on Auto Lite can still run up usage if it triggers a lot of tool or subagent activity, works through a large file, or runs for many steps.

This is usually working as intended, but if Auto Lite usage looks higher than the task warrants, check whether the request is pulling in large files or long history, and whether it needed the tools it used. The troubleshooting steps below help you confirm what drove the usage.

When You Want To Choose A Specific LLM: Understanding Model Multipliers

Every model in Hatz carries a multiplier — a number that tells you how much more or less credit-intensive that model is compared to the industry's flagship model, which sits at 1x.

A model at 0.5x uses roughly half as many credits per message as the flagship

A model at 1x matches the flagship

A model at 2x uses roughly double

The multiplier is a directional comparison, not an exact prediction — your actual spend on any given message still depends on how many tokens are sent and received. But it gives you a fast, reliable way to compare models before you choose one, without doing token math.
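As a quick illustration of that directional comparison, the sketch below expresses a session's cost as an equivalent number of flagship (1x) messages. It assumes message sizes are similar across models, and the model names and multipliers are hypothetical placeholders rather than real Hatz values.

```python
def flagship_equivalent_messages(multiplier: float, messages: int) -> float:
    """Cost of a session, expressed as the number of 1x messages it equals.

    Assumes every message processes a similar number of tokens.
    """
    return multiplier * messages


# Hypothetical models; look up real multipliers in the product.
models = {"light-model": 0.5, "flagship": 1.0, "heavy-model": 2.0, "heavier-model": 4.0}

session_length = 40  # messages in one working session
for name, multiplier in sorted(models.items(), key=lambda item: item[1]):
    equivalent = flagship_equivalent_messages(multiplier, session_length)
    print(f"{name:14s} {multiplier:>4}x  ~ {equivalent:.0f} flagship messages")
```

Under these assumptions, a 40-message session costs about as much as 20 flagship messages at 0.5x and about 160 at 4x, which is why the multiplier matters most on long sessions.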

Where to find multipliers:

Hover over any model in the model picker — the tooltip shows its multiplier as "Nx credits per message"

Open "View all models" and switch to the Multipliers tab for a sortable, filterable table of every available model

Keep an eye out for the Promo badge — Hatz occasionally runs promotional pricing on specific models, temporarily lowering their multiplier. The displayed number always reflects the current rate.

Before you switch away from Auto Lite for a task, check the multiplier first. Escalating to a higher-tier model is often the right call — but it's worth a quick look at what that decision costs per message before you commit. A model at 3x or 4x used across a long session adds up fast. The Multipliers tab makes that comparison instant.

Related: Model Multipliers

Enable Subagents

Subagents are purpose-built specialists that operate alongside the primary model rather than loading everything onto it. When subagents are enabled, the AI can delegate specialized work to lighter, more efficient models instead of handling every step itself.

The practical result: tasks that would otherwise require the primary model to stretch outside its core strengths get routed to a tool designed for that job. That's better output and more efficient credit use.

If subagents aren't enabled in your workspace, it's worth turning them on. The efficiency gains are passive — you don't have to manage the routing yourself.

Start Fresh Chats For New Topics

This is one of the most underappreciated credit drivers, and one of the easiest habits to change.

Every message you send in an existing conversation carries the full history of that thread as input context. The longer the chat, the more tokens are processed with every new message — even for simple follow-ups. A conversation that started as a strategy brainstorm and has since drifted into scheduling questions is burning credits on irrelevant context every single time.
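The growth described above can be sketched with a simplified model. It assumes every prompt and every reply has a fixed size and that the full prior history is resent as input on each message. The function name and token sizes are hypothetical, and real threads will vary.

```python
def thread_input_tokens(num_messages: int, prompt_tokens: int, reply_tokens: int) -> int:
    """Total input tokens processed over a thread that resends its history.

    Simplification: each prompt and reply has a fixed size, and all prior
    turns are included as input with every new message.
    """
    total_input = 0
    history = 0
    for _ in range(num_messages):
        total_input += history + prompt_tokens   # prior turns plus the new prompt
        history += prompt_tokens + reply_tokens  # this turn joins the history
    return total_input


# One 20-message thread versus the same 20 messages split into four fresh chats.
one_long_thread = thread_input_tokens(20, prompt_tokens=100, reply_tokens=300)
four_fresh_chats = 4 * thread_input_tokens(5, prompt_tokens=100, reply_tokens=300)

print(f"one long thread: {one_long_thread} input tokens")
print(f"four fresh chats: {four_fresh_chats} input tokens")
```

With these made-up sizes, the single thread processes 78,000 input tokens against 18,000 for the four shorter chats. Input grows roughly with the square of the thread length, so splitting unrelated topics into separate chats reduces it substantially.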

The rule of thumb: when you're starting a genuinely new topic, start a new chat. The model has a cleaner signal to work from, and usage is easier to reason about because the chat is no longer carrying unrelated history.

Long threads are appropriate for tasks where continuity actually matters — iterating on a document, working through a multi-step problem, or building on prior outputs. For everything else, fresh is cheaper and often sharper.

Turn Repetitive Chats Into Automations

If you or someone on your team is opening a new chat and typing a similar prompt on a regular basis — a weekly status summary, a recurring report, a standard client email format — that's a Workflow, App, or Agent waiting to be built.

Manual, prompt-from-scratch conversations are the least efficient way to use AI for work that repeats. Every time someone reconstructs context, re-explains the format, or re-types the same instructions, that's both their time and your credits being spent on setup rather than output.

Agents and Workflows handle repeatable work in a structured, focused way. They're faster, more consistent, and more credit-efficient because the context is baked in rather than rebuilt each time.

This is also the right place to think about ROI. The goal isn't to use fewer credits — it's to get more done per credit. A well-built workflow that runs 20 times a month at a predictable cost is a better investment than 20 ad-hoc conversations that cost more and produce less consistent results.

Not sure where to start? The Workshop Assistant builds it for you.

You don't need to know how to configure a workflow or structure an agent from scratch. The Hatz Workshop Assistant lets you describe what you want in plain language and handles the technical setup for you. Tell it what the automation should do, and it will generate the structure, prompts, and steps.

"Take our existing chat and turn it into an Agent or Workflow."

"Create a workflow that takes a weekly report and writes an executive summary."

"Build an agent that drafts client follow-up emails based on meeting notes."

"Add a step to my existing workflow that translates the output into Spanish."

It can create, modify, run, and explain Workshop items — all from a single conversation. Find it in the Tool Selector, no additional setup required.

Be Intentional About Files And Context

What you put into a conversation is part of the cost. Uploading a 40-page document when the relevant content is on two pages, or pasting an entire email thread when only the last message matters, adds input tokens that the model has to process regardless of whether they're useful.

A few habits that make a difference:

Trim file uploads to the sections that are actually relevant to your question

Paste targeted excerpts rather than full documents when you're asking about something specific

Avoid re-uploading the same file across multiple conversations — if it's being referenced repeatedly, that's a signal it belongs in an Agent or Workflow with the context built in

Keep questions focused — one clear question per message typically costs less and gets a better answer than a multi-part prompt that requires the model to hold many threads at once

Think in ROI, Not Just Spend

Credits are a resource, not a tax. A 50-credit conversation that saves two hours of research, catches an error before it becomes a problem, or produces a draft that would have taken an afternoon is an excellent trade — regardless of what it costs in absolute terms.

Instead of asking "How do I use fewer credits?", ask "Am I getting real value from the credits I'm using?"

High usage driven by AI doing meaningful work is the system working as intended.

High usage driven by runaway context, default Turbo selection, or repetitive manual prompting is worth addressing — not because spend is inherently bad, but because those patterns are usually producing less value per credit than they could be.

The tips in this article aren't about cutting AI use. They're about making sure that when you spend, it counts.

Troubleshooting Unexpected Or Runaway Credit Burn

Most high usage comes from the dials above — model choice, long threads, large files, and tool or subagent activity. Occasionally a single chat or run consumes far more than the work seems to justify. Use these steps to figure out what happened.

A chat spins with no response but usage still climbs

A request keeps processing in the background even if the answer is slow to appear, so credits can be consumed before you see any output. If a chat appears stuck:

Stop or close the request rather than resending the same prompt repeatedly — each resend starts new work.
Start a fresh chat for your next attempt so it isn't carrying the stalled thread's history.
If the request was long-running, the assistant may stop the work on its own once it reaches a usage boundary and return a partial or wrapped-up result.

Repetitive or “junk” output loops

Sometimes the assistant gets stuck repeating itself or calling the same tool over and over without making progress, which burns credits while producing little of value. The system is designed to detect this kind of loop and wrap the run up gracefully rather than let it run indefinitely, but a loop can still consume credits before it stops. If you see repetitive or low-value output:

Stop the run and start a new, tightly scoped chat.
Narrow the request — one clear task, only the files and context that matter.
Note the time and the chat so support can review what the run was doing.

Auditing unexpectedly high usage for a specific model

If usage for one model looks higher than expected, work through it in order:

Confirm the model actually in use. Auto Mode routes per message, so a thread may have escalated to a higher-tier model for some messages. Hover over a model in the picker, or open "View all models" and switch to the Multipliers tab, to see its credit multiplier.
Check the multiplier. A higher-multiplier model used across a long session adds up quickly — the same conversation on a lower-tier model would cost less.
Look at context size. Long threads and large or re-uploaded files are processed as input on every message and are a common cause of high per-message cost.
Check whether tools, subagents, or many steps were involved for that model, since each adds credits beyond the base message.
Review who and where. Admins can use the per-user usage views and the Adoption Tab to see which users and features are driving usage for that model.

If usage still looks wrong after these checks, contact support with the tenant, the affected user, the date, time, and timezone, the model and Auto Mode state, and a link or ID for the chat, workflow run, or request so it can be investigated.

For Admins: Controls That Help At Scale

If you manage Hatz for a team or organization, there are additional levers available at the admin level that compound the habits above.

Set role-based credit limits to establish per-user or per-role spending thresholds before they become a problem

Restrict high-multiplier models for roles that don't need them — a support team running on Lite by policy costs a fraction of what an unrestricted deployment does

Use Auto Mode policies to keep teams defaulting to Lite or Performance without requiring individual users to manage it themselves

Monitor usage patterns in the Adoption Tab to identify where credits are going, who the heaviest users are, and whether usage patterns are producing the outcomes you expect

Visibility and policy together are how you stay in control at scale.

Getting More From Every Credit