Documentation Index
Fetch the complete documentation index at: https://docs.tesslate.com/llms.txt
Use this file to discover all available pages before exploring further.

One agent, every model
OpenSail is model-agnostic. Every AI call routes through a LiteLLM proxy, which exposes a unified OpenAI-compatible interface to any supported provider. Switch from Claude to GPT-4 to Qwen to a local Llama without changing a line of agent code.
LiteLLM gateway
Unified proxy across all providers and local inference
Dynamic pricing
Per-token cost fetched live from the proxy and cached 5 minutes
BYOK
Attach your own keys; platform credits are not charged
Self-hosted
Point LiteLLM at Ollama, vLLM, or any OpenAI-compatible endpoint
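The model-agnostic surface described above can be sketched as a single request builder; the function and model names below are illustrative placeholders, not OpenSail source:

```python
def chat_request(model: str, prompt: str) -> dict:
    """Same OpenAI-style request shape for every provider; only the model string differs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # tokens stream back through the proxy to the chat UI
    }

# Claude, GPT-4o, or a local Llama: the calling agent code does not change.
requests = [chat_request(m, "Fix the failing test")
            for m in ("claude-sonnet-4", "gpt-4o", "llama3-local")]
```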
Supported providers
OpenSail ships with first-party support for every major model provider. Any OpenAI-compatible endpoint also works out of the box.
| Provider | Flagship models |
|---|---|
| Anthropic | Claude 3.5/4.x family (Opus, Sonnet, Haiku) |
| OpenAI | GPT-4, GPT-4 Turbo, GPT-4o, o-series |
| DeepSeek | DeepSeek V3, R1 |
| Meta | Llama 3.1, Llama 3.2, Llama 3.3 |
| Mistral | Mistral Large, Codestral, Mixtral |
| Qwen | Qwen 2.5, Qwen-Coder, QwQ |
| Google | Gemini 1.5/2.x Pro, Flash |
| Moonshot | Kimi K2 family |
| MiniMax | MiniMax M-series |
| Z.AI (ChatGLM) | GLM-4 family |
| xAI | Grok |
How routing works
Agent sends a request
The tesslate-agent’s LLM adapter calls the LiteLLM proxy with an OpenAI-style chat completion request.
Credit check
The check_credits(user, model_name) function runs before the upstream request. If the model is BYOK for this user, the check always passes. Otherwise it verifies sufficient platform credits across all pools.
Provider dispatch
LiteLLM looks up the model name, resolves the provider, and dispatches to the correct upstream API using either the platform’s key, your BYOK key (passthrough mode), or a local endpoint.
Stream back
The response streams back through the proxy to the agent, token by token, into the chat UI.
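The credit-check step can be sketched as follows; the function body, the provider-prefix convention, and the user fields are assumptions based on the description above, not the actual implementation:

```python
def check_credits(user: dict, model_name: str) -> bool:
    provider = model_name.split("/", 1)[0]      # assumed provider-prefix naming
    if provider in user.get("byok_keys", {}):   # BYOK models always pass
        return True
    # Otherwise require headroom across all platform credit pools.
    return sum(user.get("credit_pools", {}).values()) > 0

user = {"byok_keys": {"anthropic": "sk-ant-..."},
        "credit_pools": {"bundled": 0, "purchased": 0}}
byok_ok = check_credits(user, "anthropic/claude-sonnet-4")   # passes via BYOK
platform_blocked = check_credits(user, "openai/gpt-4o")      # blocked: no credits
```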
BYOK (Bring Your Own Key)
BYOK lets you use your own API keys instead of platform credits. Available on Pro and Ultra tiers.
What you get:
- Cost is zero in platform credits
- No deduction from any credit pool
- You pay the upstream provider directly on your own account
- A UsageLog is still created for your analytics, with is_byok=True and billed_status="exempt"
- No platform margin on top
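An illustrative shape for that analytics record; field names other than is_byok and billed_status are assumptions:

```python
usage_log = {
    "model": "claude-sonnet-4",   # placeholder model name
    "input_tokens": 1200,
    "output_tokens": 340,
    "is_byok": True,              # routed with the user's own key
    "billed_status": "exempt",    # no deduction from any credit pool
}
```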
BYOK detection is provider-prefix based. The is_byok_model(model_name) check looks up the model’s provider in BUILTIN_PROVIDERS. If you have a key stored for that provider, the call runs BYOK automatically.
Self-hosted models
For air-gapped deployments or latency-sensitive workloads, point LiteLLM at a local inference server.
- Ollama
- vLLM
- Any OpenAI-compatible
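Wiring up a local backend is a config entry. A sketch for an Ollama-hosted model, using LiteLLM's model_list format (host, port, and upstream model name are placeholders):

```yaml
model_list:
  - model_name: llama3-local          # name shown in OpenSail's model selector
    litellm_params:
      model: ollama/llama3            # provider-prefixed upstream model
      api_base: http://ollama-host:11434
```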
Ollama runs open-weight models on your hardware with a simple HTTP API. Add an entry to the LiteLLM config pointing at your Ollama host. The model llama3-local then appears in OpenSail’s model selector and routes to your Ollama instance.
Model selector in chat
Every chat session has a model selector at the top. The menu shows:
- Models available to your subscription tier
- BYOK-enabled models (if you have a matching key stored)
- Self-hosted models (if your LiteLLM config includes them)
- Cost badges: platform credits, BYOK, or self-hosted
Agents can pin a default model (see /guides/customizing-agents). Users can override per session unless the agent locks the model.
Default models per tier
| Tier | Typical default | Bundled credits (USD equivalent) |
|---|---|---|
| Free | Fast, low-cost (e.g., Haiku-class, GPT-3.5-class, Qwen-7B-class) | 0 bundled, 5 daily |
| Basic | Mid-tier (Sonnet-class, GPT-4-class) | 500 |
| Pro | Top-tier plus BYOK (Opus-class, GPT-4-class, or your own key) | 2,000 |
| Ultra | Everything unlocked | 8,000 |
See /guides/billing for tier pricing details.
Model parameters
All LiteLLM-routed calls accept the standard OpenAI-compatible parameters: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, and stop sequences. Agents can override per call; the platform does not impose hard caps beyond what upstream providers enforce.
For extended-thinking capable models (Claude 3.7+, o1, o3), agents can pass a thinking budget. The default thinking effort is set via default_thinking_effort in the platform config.
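A per-call override through that OpenAI-compatible surface might look like this sketch (model name and values are placeholders):

```python
payload = {
    "model": "claude-sonnet-4",   # placeholder model name
    "messages": [{"role": "user", "content": "Summarize the diff"}],
    "temperature": 0.2,           # lower = more deterministic output
    "max_tokens": 1024,           # caps output length, and therefore cost
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stop": ["###"],              # stop sequences end generation early
}
```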
Cost management
Match model to task
Use Haiku-class or Qwen-Coder-7B for UI tweaks and simple edits. Save Opus or GPT-4 for reasoning-heavy debugging and architecture. The cost delta is often 10x.
Watch context pressure
The agent auto-compacts at 80% of the context window using a cheap summarization model (compaction_summary_model). Long runs accrue compaction cost; clearing chat between unrelated tasks saves more.
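The 80% trigger can be sketched as a threshold check (the helper name is hypothetical):

```python
def should_compact(used_tokens: int, context_window: int,
                   threshold: float = 0.8) -> bool:
    # Hypothetical helper: trigger compaction at 80% of the window.
    return used_tokens >= threshold * context_window

hit = should_compact(130_000, 160_000)   # True: 81% of the window used
miss = should_compact(100_000, 160_000)  # False: only 62.5% used
```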
BYOK for heavy users
If you run many hours per day, BYOK pays for itself quickly. No platform margin; you buy at provider list price.
Self-host for scale
On Pro/Ultra with self-hosted Qwen-Coder or Llama 3.3, you can drive agents at near-zero marginal cost after GPU depreciation. Great for batch workflows and scheduled agents.
Troubleshooting
Model not available
Your tier may not include that model. Check the model selector; greyed-out entries show the minimum tier required. For self-hosted, confirm the model is in the LiteLLM config and the inference server is reachable.
Budget exceeded
The LiteLLM per-key budget is a $10,000 runaway ceiling, not the real gate. If you hit it unexpectedly, your admin may need to run ensure_budget_headroom(). The real usage gate is your OpenSail credit balance.
Slow responses
Switch to a faster model (Haiku, Flash, or a hosted Qwen-7B). Reduce max_tokens. Clear chat history to shrink input. If self-hosting, check GPU utilization; batched throughput dominates.
BYOK key rejected
Verify the key in the provider’s console. Check it has sufficient balance. Some providers scope keys to specific models; confirm access. Rotate the key in OpenSail Settings.
Next steps
API Keys
Set up BYOK model keys and external API keys for agent invocation
Billing
Credit pools, tiers, and how BYOK changes the math
Self-hosting
Run OpenSail on your own infrastructure with local models
Using Agents
Pick the right model mid-session