Models, Limits, Tokens, and Response Quality
Use this guide when a model is missing, Auto selected a different model than expected, a response feels slow or low quality, a request hit a limit, or usage looks higher than expected.
Why a model may be missing or unavailable
Your admin, MSP, tenant, or role may restrict which models or Auto modes are available.
Some models are available only in certain product surfaces, features, or API compatibility paths.
Some models support tools, images, files, or long context better than others.
A model can be temporarily unavailable because of provider availability, product rollout, or usage limits.
The live model selector or API model list is the source of truth for what your account can use right now.
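For API users, one way to act on this is to check a preferred model against the live list before sending a request. The following is a minimal sketch, assuming you have already fetched the model identifiers your account can use; how that list is fetched depends on the API compatibility path and is not shown. The function name and the model IDs are hypothetical.

```python
def pick_available_model(preferred, available):
    """Return the first preferred model ID found in the live list, else None."""
    available_set = set(available)
    for model_id in preferred:
        if model_id in available_set:
            return model_id
    return None


# Hypothetical IDs for illustration only; use the IDs from your own model list.
live_models = ["model-a", "model-c"]
choice = pick_available_model(["model-b", "model-c"], live_models)
```

Here `choice` falls through to the second preference because the first is not in the live list. A `None` result means none of the preferred models is currently available to the account.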
How Auto model selection works
Auto chooses a model for each message based on the request, available models, your selected Auto mode, and your organization's policy. Auto does not bypass admin settings. If a preferred model or mode is restricted, Auto uses an available route within the allowed set.
Lite: optimized for faster, lighter, lower-credit work.
Performance: balanced for most day-to-day work.
Turbo: allows more capability for harder or higher-stakes work, which can use more credits or take longer.
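The idea that Auto stays inside the allowed set can be pictured with a toy model. This is an illustrative sketch only, not how Hatz implements Auto: the capability scores, the per-mode ceilings, the multipliers, and the `choose_route` function are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int    # hypothetical score: higher means more capable
    multiplier: float  # hypothetical relative credit intensity


# Hypothetical capability ceiling for each Auto mode.
MODE_CEILING = {"lite": 1, "performance": 2, "turbo": 3}


def choose_route(mode, needed, models, allowed_models, allowed_modes):
    """Pick a model inside the allowed set; never step outside policy."""
    if not allowed_modes:
        return None
    # Policy caps the ceiling even if the user asked for a higher mode.
    policy_ceiling = max(MODE_CEILING[m] for m in allowed_modes)
    ceiling = min(MODE_CEILING[mode], policy_ceiling)
    candidates = [
        m for m in models
        if m.name in allowed_models and m.capability <= ceiling
    ]
    if not candidates:
        return None
    # Prefer the least credit-intensive model that is capable enough.
    good_enough = [m for m in candidates if m.capability >= needed]
    if good_enough:
        return min(good_enough, key=lambda m: m.multiplier)
    # Otherwise fall back to the most capable model that policy permits.
    return max(candidates, key=lambda m: m.capability)
```

In this toy model, a Turbo request whose ideal model is restricted still resolves to the most capable permitted model rather than failing or bypassing policy.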
How credits, tokens, and context relate
Tokens: the text and structured content the model processes as input and output.
Context: the conversation history, files, instructions, tool results, and other information included with a request.
Model multiplier: a directional comparison of how credit-intensive a model is relative to a reference model. It helps compare models, but it is not an exact per-message quote.
Credits: Hatz Credits are not a raw token invoice. Usage can vary by model, multiplier, input size, output size, files, tools, API calls, generated files, and workflow complexity.
Workflows and agents: a single run can draw credits several times, because multi-step runs, tool calls, files, retries, and generated outputs each add work.
If usage is higher than expected, check the model, Auto mode, prompt length, files, tools, workflow step count, retries, generated files, and whether a long chat is carrying more context than intended. For more detail, see Credits & Usage, Getting More From Every Credit, Extra Usage, and Rate Limiting and Credit Overage.
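When you assemble requests yourself, a quick way to see whether a long chat or a large file is dominating the context is to rank the parts by size. The sketch below uses character counts as a crude stand-in for tokens. It does not estimate credits, and the function name and part labels are hypothetical.

```python
def context_breakdown(parts):
    """Return (name, chars, share) tuples sorted from largest to smallest."""
    sizes = {name: len(text) for name, text in parts.items()}
    total = sum(sizes.values())
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    # share is each part's fraction of the total character count
    return [(name, size, size / total if total else 0.0) for name, size in ranked]


# Hypothetical request parts for illustration only.
breakdown = context_breakdown({
    "chat_history": "x" * 900,
    "new_prompt": "y" * 100,
})
```

If one part accounts for most of the total, trimming it or starting a fresh chat is usually the first thing to try.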
Limits and common error classes
Credit or quota limit: the tenant, role, user, model, workflow, API key, or package may have reached a usage limit.
Rate or concurrency limit: too many requests may be running at once, or a provider may be throttling requests.
Context limit: the prompt, files, conversation history, tool schemas, or prior outputs may be too large for the selected route. For long-running chats, starting a fresh chat and restating the key context can help.
Timeout or retry failure: the request may have taken too long, failed upstream, or failed during a tool, file, or workflow step.
Unsupported feature: the selected model or API compatibility route may not support the requested tool, file, image, streaming, cache controls, or response shape.
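For API integrations, the error classes above suggest different handling: rate limits and timeouts are often worth retrying after a pause, while quota, context, and unsupported-feature errors will not succeed on retry without a change to the request or account. The following is a minimal sketch of that split using exponential backoff. The exception classes and the `send` callable are hypothetical stand-ins; map them to whatever errors your client actually raises.

```python
import time


# Hypothetical exception types standing in for the error classes above.
class RateLimited(Exception): pass
class TimedOut(Exception): pass
class QuotaExceeded(Exception): pass
class ContextTooLarge(Exception): pass
class UnsupportedFeature(Exception): pass


RETRYABLE = (RateLimited, TimedOut)


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(); retry only retryable errors, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RETRYABLE:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    # Non-retryable errors (quota, context, unsupported) propagate immediately.
```

Because retries are one of the things that can raise usage, keeping `max_attempts` small is a reasonable default.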
Why responses can feel slow or lower quality
Large prompts, long chats, large files, tool calls, web search, document generation, and multi-step workflows can take longer.
Auto mode can choose a different model for different messages, so behavior may vary inside one conversation.
AI output quality depends on task clarity, model fit, instructions, files, context, and tool results. A weaker answer is not automatically a product bug.
For critical work, use clear instructions, provide only relevant context, choose the right model or Auto mode, and verify important facts before relying on the output.
When support can help
Support can help investigate when you have evidence of a product behavior problem, unexpected limit, missing model, API mismatch, repeatable timeout, generated-file issue, or response-quality regression.
Include the product surface, model or Auto mode, timestamp and timezone, tenant or workspace, expected result, actual result, error text, prompt shape with sensitive content removed, files/tools used, retry result, and whether production work is blocked.
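Teams that file reports often may find it useful to check a report for gaps before sending it. The sketch below turns the list above into a simple completeness check. The field names are hypothetical labels chosen for this example, not a format that support requires.

```python
# Hypothetical field names mirroring the details listed above.
REQUIRED_FIELDS = [
    "product_surface",
    "model_or_auto_mode",
    "timestamp_with_timezone",
    "tenant_or_workspace",
    "expected_result",
    "actual_result",
    "error_text",
    "prompt_shape_redacted",
    "files_and_tools_used",
    "retry_result",
    "production_blocked",
]


def missing_fields(report):
    """Return the required fields that are absent or empty in the report dict."""
    # False is a valid answer (e.g. production_blocked=False), so only
    # None and the empty string count as missing.
    return [
        field for field in REQUIRED_FIELDS
        if report.get(field) is None or report.get(field) == ""
    ]
```

An empty list from `missing_fields` means every listed detail has been filled in.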
Support cannot guarantee that a model will always answer perfectly, match another AI product, use a specific provider path, or remain available indefinitely. Model and provider behavior changes over time, so use the current in-product model selector or API model list for live availability.
Related model articles: LLMs Available on Hatz, AI Model Selector, Auto Model Selection, and LLM Settings.
