
Tesslate OpenSail

One agent, every model

OpenSail is model-agnostic. Every AI call routes through a LiteLLM proxy, which exposes a unified OpenAI-compatible interface to any supported provider. Switch from Claude to GPT-4 to Qwen to a local Llama without changing a line of agent code.
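As a minimal sketch of what that looks like from agent code, the standard OpenAI Python client is pointed at the proxy; the proxy URL, key, and model names below are placeholders for whatever your deployment exposes:

from openai import OpenAI

# Placeholder proxy address, key, and model names; adjust to your deployment.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

def ask(model: str, prompt: str) -> str:
    # The same OpenAI-style request, regardless of which provider serves the model.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Only the model string changes; the agent code stays identical.
print(ask("claude-sonnet-4", "Summarize this diff."))
print(ask("gpt-4o", "Summarize this diff."))
print(ask("llama3-local", "Summarize this diff."))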

LiteLLM gateway

Unified proxy across all providers and local inference

Dynamic pricing

Per-token cost fetched live from the proxy and cached for 5 minutes

BYOK

Attach your own keys; platform credits are not charged

Self-hosted

Point LiteLLM at Ollama, vLLM, or any OpenAI-compatible endpoint

Supported providers

OpenSail ships with first-party support for every major model provider. Any OpenAI-compatible endpoint also works out of the box.
Provider         | Flagship models
Anthropic        | Claude 3.5/4.x family (Opus, Sonnet, Haiku)
OpenAI           | GPT-4, GPT-4 Turbo, GPT-4o, o-series
DeepSeek         | DeepSeek V3, R1
Meta             | Llama 3.1, Llama 3.2, Llama 3.3
Mistral          | Mistral Large, Codestral, Mixtral
Qwen             | Qwen 2.5, Qwen-Coder, QwQ
Google           | Gemini 1.5/2.x Pro, Flash
Moonshot         | Kimi K2 family
MiniMax          | MiniMax M-series
Z.AI (ChatGLM)   | GLM-4 family
xAI              | Grok
Additional gateways: OpenRouter (100+ models through a single key), Groq (ultra-low latency), Together, Fireworks, Perplexity, Cohere, Bedrock. Any gateway LiteLLM can reach is routable.

How routing works

1. Agent sends a request
   The tesslate-agent's LLM adapter calls the LiteLLM proxy with an OpenAI-style chat completion request.

2. Credit check
   The check_credits(user, model_name) function runs before the upstream request. If the model is BYOK for this user, the check always passes. Otherwise it verifies sufficient platform credits across all pools.

3. Provider dispatch
   LiteLLM looks up the model name, resolves the provider, and dispatches to the correct upstream API using either the platform's key, your BYOK key (passthrough mode), or a local endpoint.

4. Stream back
   The response streams back through the proxy to the agent, token by token, into the chat UI.

5. Cost and usage logged
   Post-request, calculate_cost_cents(model, tokens_in, tokens_out) computes the cost using current pricing. deduct_credits subtracts from credit pools in priority order (daily, bundled, signup bonus, purchased). BYOK requests log with is_byok=True and cost zero. (A sketch of this lifecycle follows the list.)
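Put together, the lifecycle looks roughly like the sketch below. The function names check_credits, calculate_cost_cents, deduct_credits, and is_byok_model are the ones documented above; their bodies, the pool list, and the log_usage helper are illustrative assumptions, not the actual OpenSail implementation.

CREDIT_POOL_PRIORITY = ["daily", "bundled", "signup_bonus", "purchased"]  # deduction order

def handle_chat_request(user, model_name, messages, proxy):
    # Step 2: credit check. BYOK models always pass; others need platform credits.
    if not check_credits(user, model_name):
        raise RuntimeError("insufficient credits")

    # Steps 3-4: the LiteLLM proxy resolves the provider and streams the response.
    response = proxy.chat(model=model_name, messages=messages)

    # Step 5: post-request accounting.
    usage = response.usage
    if is_byok_model(model_name):
        log_usage(user, model_name, usage, is_byok=True, cost_cents=0)  # hypothetical helper
    else:
        cost = calculate_cost_cents(model_name, usage.prompt_tokens, usage.completion_tokens)
        deduct_credits(user, cost, pools=CREDIT_POOL_PRIORITY)
        log_usage(user, model_name, usage, is_byok=False, cost_cents=cost)
    return response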

BYOK (Bring Your Own Key)

BYOK lets you use your own API keys instead of platform credits; it is available on the Pro and Ultra tiers. When a call runs BYOK:
  • Cost is zero in platform credits
  • No deduction from any credit pool
  • You pay the upstream provider directly on your own account
  • UsageLog is still created for your analytics, with is_byok=True and billed_status="exempt"
  • No platform margin on top
BYOK detection is provider-prefix based. The is_byok_model(model_name) check looks up the model’s provider in BUILTIN_PROVIDERS. If you have a key stored for that provider, the call runs BYOK automatically.
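As a rough illustration of that check, here is a prefix-based sketch. The BUILTIN_PROVIDERS contents and the key-lookup shape are assumptions; only the is_byok_model name and the provider-prefix idea come from the platform.

# Assumed provider-to-prefix mapping; the real BUILTIN_PROVIDERS table is larger.
BUILTIN_PROVIDERS = {
    "anthropic": ("claude-",),
    "openai": ("gpt-", "o1", "o3"),
    "qwen": ("qwen", "qwq"),
}

def is_byok_model(model_name: str, user_keys: dict[str, str]) -> bool:
    # The platform version takes only model_name and reads the current user's stored
    # keys; they are passed explicitly here to keep the sketch self-contained.
    name = model_name.lower()
    for provider, prefixes in BUILTIN_PROVIDERS.items():
        if name.startswith(prefixes):
            return provider in user_keys
    return False

print(is_byok_model("claude-sonnet-4", {"anthropic": "sk-ant-..."}))  # True
print(is_byok_model("gpt-4o", {"anthropic": "sk-ant-..."}))           # False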

Self-hosted models

For air-gapped deployments or latency-sensitive workloads, point LiteLLM at a local inference server.
Ollama runs open-weight models on your hardware with a simple HTTP API. Add an entry to the LiteLLM config pointing at your Ollama host:
model_list:
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://ollama:11434
The model llama3-local now appears in OpenSail’s model selector and routes to your Ollama instance.
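A quick way to verify the route, assuming the same OpenAI-compatible proxy endpoint and placeholder key as above:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

# The new entry should be listed alongside the hosted models.
print([m.id for m in client.models.list().data])

# One-shot completion routed to the local Ollama instance.
resp = client.chat.completions.create(
    model="llama3-local",
    messages=[{"role": "user", "content": "Say hello from local inference."}],
)
print(resp.choices[0].message.content)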
Self-hosted inference requires enough VRAM for the model plus concurrent sessions. Plan for batch size and max sequence length. Ollama and vLLM both handle quantization for smaller GPUs.

Model selector in chat

Every chat session has a model selector at the top. The menu shows:
  • Models available to your subscription tier
  • BYOK-enabled models (if you have a matching key stored)
  • Self-hosted models (if your LiteLLM config includes them)
  • Cost badges indicating whether a model runs on platform credits, BYOK, or self-hosted
Agents can also declare a default model as part of their configuration (see /guides/customizing-agents). Users can override per session unless the agent locks the model.

Default models per tier

Tier   | Typical default                                                   | Bundled credits (USD equivalent)
Free   | Fast, low-cost (e.g., Haiku-class, GPT-3.5-class, Qwen-7B-class)  | 0 bundled, 5 daily
Basic  | Mid-tier (Sonnet-class, GPT-4-class)                              | 500
Pro    | Top-tier plus BYOK (Opus-class, GPT-4-class, or your own key)     | 2,000
Ultra  | Everything unlocked                                               | 8,000
Exact model availability is configured in LiteLLM’s team settings. Your admin (or you, if self-hosting) can expose any subset of models to any tier. See /guides/billing for tier pricing details.

Model parameters

All LiteLLM-routed calls accept the standard OpenAI-compatible parameters: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, and stop sequences. Agents can override per call; the platform does not impose hard caps beyond what upstream providers enforce. For extended-thinking capable models (Claude 3.7+, o1, o3), agents can pass a thinking budget. The default thinking effort is set via default_thinking_effort in the platform config.
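For example (placeholder proxy address, key, model name, and values; the extended-thinking parameter shape varies by provider and proxy configuration, so treat that part as an assumption):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

resp = client.chat.completions.create(
    model="claude-sonnet-4",   # placeholder model name
    messages=[{"role": "user", "content": "Refactor this function."}],
    temperature=0.2,
    max_tokens=2048,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    stop=["</patch>"],
    # Extended-thinking models may also accept a thinking budget; the exact field
    # name depends on the provider and your proxy config (assumption):
    # extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}},
)
print(resp.choices[0].message.content)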

Cost management

Use Haiku-class or Qwen-Coder-7B for UI tweaks and simple edits. Save Opus or GPT-4 for reasoning-heavy debugging and architecture. The cost delta is often 10x.
The agent auto-compacts at 80% of the context window using a cheap summarization model (compaction_summary_model). Long runs accrue compaction cost; clearing chat between unrelated tasks saves more.
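A minimal sketch of that trigger, assuming a hypothetical token counter and summarize helper; only the 80% threshold and the use of the configured compaction_summary_model come from the platform:

COMPACTION_THRESHOLD = 0.8  # fraction of the context window that triggers compaction

def maybe_compact(messages, context_window, count_tokens, summarize):
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < COMPACTION_THRESHOLD * context_window:
        return messages
    head, tail = messages[:-4], messages[-4:]   # keep the most recent turns verbatim
    summary = summarize(head)                   # runs on the cheap compaction_summary_model
    return [{"role": "system", "content": f"Earlier work, summarized: {summary}"}] + tail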
If you run many hours per day, BYOK pays for itself quickly. No platform margin; you buy at provider list price.
On Pro/Ultra with self-hosted Qwen-Coder or Llama 3.3, you can drive agents at near-zero marginal cost after GPU depreciation. Great for batch workflows and scheduled agents.

Troubleshooting

Model not showing in the selector? Your tier may not include that model. Check the model selector; greyed-out entries show the minimum tier required. For self-hosted models, confirm the model is in the LiteLLM config and the inference server is reachable.
Hit the LiteLLM per-key budget? That budget is a $10,000 runaway ceiling, not the real gate. If you hit it unexpectedly, your admin may need to run ensure_budget_headroom(). The real usage gate is your OpenSail credit balance.
Responses feel slow? Switch to a faster model (Haiku, Flash, or a hosted Qwen-7B). Reduce max_tokens. Clear chat history to shrink the input. If self-hosting, check GPU utilization; batched throughput dominates latency.
BYOK key not working? Verify the key in the provider's console and check that it has sufficient balance. Some providers scope keys to specific models; confirm the key has access. Rotate the key in OpenSail Settings if needed.

Next steps

API Keys

Set up BYOK model keys and external API keys for agent invocation

Billing

Credit pools, tiers, and how BYOK changes the math

Self-hosting

Run OpenSail on your own infrastructure with local models

Using Agents

Pick the right model mid-session