Overview
Tesslate Studio routes all AI requests through a LiteLLM proxy gateway that provides unified access to multiple AI providers. This means you can use OpenAI, Anthropic, Google, and other models through a single interface, with automatic budget tracking, usage monitoring, and dynamic pricing.

- LiteLLM Gateway: Unified proxy routing requests to OpenAI, Anthropic, Google, and more
- Dynamic Pricing: Per-token cost calculated in real time from provider rates
- Budget Controls: Per-user budget limits and team-based access tiers
- BYOK Support: Bring Your Own Key to bypass platform credits entirely
How the LiteLLM Gateway Works
LiteLLM acts as a reverse proxy between Tesslate and AI providers. When you send a message to an agent, the request flows through this pipeline:

1. Credit Check: The `check_credits(user, model_name)` function verifies you have sufficient credits (or a BYOK key) before proceeding.
2. LiteLLM Routing: The request is forwarded to the LiteLLM proxy (running on port 4000), which routes it to the correct AI provider based on the model name.
3. Provider Response: The AI provider processes the request and streams the response back through LiteLLM to the Tesslate backend.
4. Cost Calculation: After the response, `calculate_cost_cents(model, tokens_in, tokens_out)` computes the cost using the current pricing data.

Supported Models and Providers
- OpenAI
- Anthropic
- Google
- OpenRouter
GPT-4 Family
Requires: OpenAI API key (BYOK) or platform credits
| Model | Strengths | Speed | Cost |
|---|---|---|---|
| GPT-4 Turbo | Best reasoning, most capable | Moderate | High |
| GPT-4 | Strong general purpose | Slower | High |
| GPT-3.5 Turbo | Fast, affordable | Very fast | Low |
Model Pricing System
Tesslate uses dynamic pricing fetched directly from LiteLLM's `/model/info` endpoint. Prices are cached for 5 minutes and calculated with `Decimal` arithmetic to avoid floating-point errors.
How Pricing Works
| Step | Description |
|---|---|
| Fetch | Calls LiteLLM /model/info to get per-token costs for all configured models |
| Cache | Stores the pricing map in distributed cache (Redis or in-memory) for 5 minutes |
| Lookup | On each AI request, looks up the model’s pricing (exact match, then partial match, then fallback) |
| Calculate | Converts token counts to cost in cents using Decimal arithmetic |
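The lookup order in the table above (exact match, then partial match, then fallback) can be sketched as follows. This is an illustrative outline, not Tesslate's actual implementation; the function name and pricing-map shape are assumptions.

```python
# Fallback rates from the "Fallback Pricing" section: USD per 1M tokens.
FALLBACK = {"input": 1.00, "output": 3.00}

def lookup_pricing(model: str, pricing_map: dict) -> dict:
    """Resolve per-token pricing: exact match, then partial match, then fallback."""
    if model in pricing_map:                        # 1. exact match
        return pricing_map[model]
    for name, price in pricing_map.items():         # 2. partial (substring) match
        if name in model or model in name:
            return price
    return FALLBACK                                 # 3. hard-coded fallback rates
```

A partial match lets a provider-prefixed name like `openai/gpt-4` resolve to the `gpt-4` entry without duplicating configuration.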
Cost Calculation Formula
- Prices are in USD per 1 million tokens
- Costs are calculated in cents (1 credit = $0.01)
- Ceiling rounding ensures any non-zero usage costs at least 1 cent
- All arithmetic uses Python's `decimal.Decimal` for financial precision
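Putting the four rules above together, the calculation looks roughly like this. The per-1M prices passed in are placeholders; the real `calculate_cost_cents` takes a model name and resolves pricing internally.

```python
from decimal import Decimal, ROUND_CEILING

def calculate_cost_cents(price_in_per_1m, price_out_per_1m, tokens_in, tokens_out):
    """Convert token counts to a cost in cents (1 credit = $0.01).

    Prices are USD per 1 million tokens; all arithmetic uses Decimal
    to avoid floating-point errors in financial math.
    """
    cost_usd = (
        Decimal(str(price_in_per_1m)) * tokens_in
        + Decimal(str(price_out_per_1m)) * tokens_out
    ) / Decimal(1_000_000)
    cents = cost_usd * 100
    # Ceiling rounding: any non-zero usage costs at least 1 cent.
    return int(cents.to_integral_value(rounding=ROUND_CEILING))
```

For example, at $10/1M input and $30/1M output, a request with 1,000 input and 500 output tokens costs $0.025, which rounds up to 3 cents.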
Fallback Pricing
If a model is not found in LiteLLM's pricing data, default rates apply:

| Token Type | Fallback Rate |
|---|---|
| Input | $1.00 per 1M tokens |
| Output | $3.00 per 1M tokens |
Exact pricing varies by provider and changes over time. The dynamic pricing system ensures you always pay current rates without manual configuration updates.
Model Access Tiers
LiteLLM organizes users into teams with different model access levels:

| Team Tier | Available Models | Max Budget |
|---|---|---|
| Free | GPT-3.5 Turbo, Claude 3 Haiku | $5.00 |
| Internal (Basic) | GPT-4, Claude 3 Sonnet, GPT-3.5 Turbo | $100.00 |
| Premium (Pro/Ultra) | GPT-4, Claude 3 Opus, Claude 3 Sonnet | $1,000.00 |
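A tier gate based on this table could be sketched as below; the tier keys and model identifiers are illustrative, and the real mapping lives in the LiteLLM team configuration.

```python
# Tier-to-model mapping mirroring the access table above (illustrative).
TIER_MODELS = {
    "free": {"gpt-3.5-turbo", "claude-3-haiku"},
    "internal": {"gpt-4", "claude-3-sonnet", "gpt-3.5-turbo"},
    "premium": {"gpt-4", "claude-3-opus", "claude-3-sonnet"},
}

def can_use_model(tier: str, model: str) -> bool:
    """Return True if the given subscription tier may call the model."""
    return model in TIER_MODELS.get(tier, set())
```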
BYOK (Bring Your Own Key)
BYOK lets you use your own API keys instead of platform credits. Available on Pro and Ultra tiers.

How It Works: Passthrough Mode

When a model is identified as BYOK (you have provided your own API key for that provider):
- The cost is $0 in platform credits
- No credits are deducted from any pool
- A `UsageLog` entry is still created with `is_byok=True` and `billed_status="exempt"` for your analytics
- You pay the provider directly through your own API key billing
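In code, the passthrough accounting reduces to a branch like the following. The `UsageLog` shape here is a minimal sketch built only from the fields named above; the real model has more columns.

```python
from dataclasses import dataclass

@dataclass
class UsageLog:
    model: str
    cost_cents: int
    is_byok: bool
    billed_status: str

def record_usage(model: str, cost_cents: int, is_byok: bool) -> UsageLog:
    """Log a request; BYOK requests are recorded but never billed."""
    if is_byok:
        # Passthrough mode: $0 platform cost, kept for analytics only.
        return UsageLog(model, 0, True, "exempt")
    return UsageLog(model, cost_cents, False, "billed")
```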
Model Configuration
Setting Default Models
Choose Diagram Model
Set the model used for architecture diagram generation via the `diagram_model` preference.
Configure Agent Models
Each agent can use a different model. When creating or editing an agent, select the model from the dropdown.
Model Parameters
Temperature
Controls randomness and creativity in AI responses:
- Low (0.0 to 0.3): Deterministic and focused. More predictable output, consistent code style, repeatable results.
- Medium (0.4 to 0.7): Balanced mix of consistency and variety.
- High (0.8 to 1.0): More creative and varied output, less predictable.
Max Tokens
Limits the length of AI responses:
- Stream Agents: Typically 4,000 to 8,000 tokens
- Iterative Agents: Typically 2,000 to 4,000 tokens per step
- Higher values produce more complete responses at higher cost
- Lower values are faster and cheaper but may truncate output
1 token is approximately 4 characters. A React component with 100 lines is roughly 400 to 800 tokens.
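The rule of thumb above gives a quick back-of-the-envelope estimator. This is a rough heuristic only; real tokenizers vary by model and content.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return (len(text) + 3) // 4  # round up so partial tokens count
```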
Other Parameters
Top P (Nucleus Sampling)
Alternative to temperature for controlling randomness:
- 0.1: Very focused, narrow range of tokens considered
- 0.5: Balanced
- 0.9: Diverse
- 1.0: All token possibilities considered (default)
Frequency Penalty
Reduces repetition in generated code:
- 0.0: No penalty (default)
- 0.5: Moderate reduction in repeated patterns
- 2.0: Maximum reduction
Presence Penalty
Encourages exploration of new topics:
- 0.0: No penalty (default)
- 0.5: Moderate encouragement for new topics
- 2.0: Maximum encouragement
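A full parameter set for a focused code-generation agent might look like this. The field names follow the OpenAI-style API that LiteLLM proxies; the specific values are illustrative defaults, not Tesslate's.

```python
# Illustrative parameter payload for a predictable, focused agent.
params = {
    "model": "gpt-4-turbo",
    "temperature": 0.2,        # low: deterministic, repeatable output
    "max_tokens": 4000,        # stream agents: typically 4,000 to 8,000
    "top_p": 1.0,              # default: all token possibilities considered
    "frequency_penalty": 0.0,  # no repetition penalty
    "presence_penalty": 0.0,   # no extra push toward new topics
}
```

Tune temperature or top_p, but generally not both at once, since they control overlapping aspects of randomness.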
Cost Management
Understanding Your Costs
AI costs are based on two factors: the model you choose and the number of tokens processed.

Input Tokens: what you send to the model
- Your prompts and messages
- System prompts and agent instructions
- Chat history and context
- Code being analyzed or edited

Output Tokens: what the model generates in response
Reducing Costs
Match Model to Task Complexity
- Use GPT-3.5 or Claude Haiku for simple UI work and quick iterations
- Use GPT-4 or Claude Opus for complex logic, architecture decisions, and debugging
- Do not use premium models for tasks that smaller models handle well
Optimize Your Prompts
- Be specific and concise in your requests
- Avoid sending unnecessary context
- Clear chat history when changing topics to reduce input tokens
Use BYOK for Heavy Usage
If you use AI models heavily, adding your own API key (BYOK) can be more cost-effective than purchasing platform credits, since you pay provider rates directly.
Adjust Agent Parameters
- Lower `max_tokens` when you need shorter responses
- Use temperature 0.2 to 0.3 for predictable, focused output
- Set per-run cost limits to prevent runaway agent loops (default limit: $5 per run)
Usage Monitoring
In Tesslate Studio
Track your AI usage directly from the billing dashboard:

View Usage Summary
The usage section shows total cost, token counts, and request counts for the current billing period, broken down by model and by agent.
View Detailed Logs
Click into usage logs to see individual requests with model, tokens, cost, and timestamp.
Usage API Endpoints
| Endpoint | Description |
|---|---|
| `GET /api/billing/usage` | Usage summary for a date range, broken down by model and agent |
| `GET /api/billing/usage/logs` | Detailed per-request usage logs with pagination |
| `POST /api/billing/usage/sync` | Manually trigger usage sync from LiteLLM |
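Calling the summary endpoint might look like the sketch below. The base URL, query-parameter names, and bearer-token auth scheme are assumptions here; check your deployment's API reference for the exact shape.

```python
import json
import urllib.request

def build_usage_url(base_url: str, start: str, end: str) -> str:
    """Construct the usage-summary URL (date-range parameter names assumed)."""
    return f"{base_url}/api/billing/usage?start={start}&end={end}"

def get_usage_summary(base_url: str, token: str, start: str, end: str) -> dict:
    """Fetch the usage summary for a date range, authenticating with a bearer token."""
    req = urllib.request.Request(
        build_usage_url(base_url, start, end),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```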
Model Selection Strategy
Simple UI Work
Use: GPT-3.5, Claude Haiku, Qwen
Fast, affordable, great for visual components and Tailwind CSS
Complex Logic
Use: GPT-4, Claude Opus
Better reasoning, handles complexity, fewer errors in business logic
API Integration
Use: GPT-4 Turbo, Claude Sonnet
Strong understanding of APIs, good error handling, async patterns
Debugging
Use: GPT-4, Claude Opus
Deep code analysis, root cause identification, comprehensive fixes
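This strategy can be encoded as a simple preference table; agents fall back to a cheap default when no preferred model is available to the user's tier. The task keys and model identifiers below are illustrative, not a Tesslate API.

```python
# Task-to-model preferences mirroring the strategy above (illustrative).
MODEL_BY_TASK = {
    "simple_ui": ["gpt-3.5-turbo", "claude-3-haiku"],
    "complex_logic": ["gpt-4", "claude-3-opus"],
    "api_integration": ["gpt-4-turbo", "claude-3-sonnet"],
    "debugging": ["gpt-4", "claude-3-opus"],
}

def pick_model(task: str, available: set) -> str:
    """Return the first preferred model the user can access, else a cheap default."""
    for model in MODEL_BY_TASK.get(task, []):
        if model in available:
            return model
    return "gpt-3.5-turbo"
```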
Troubleshooting
Invalid API Key Error
Solutions:
- Verify `LITELLM_MASTER_KEY` is set correctly in the environment
- Confirm the LiteLLM proxy is running (test: `curl http://localhost:4000/health`)
- Check that your BYOK key is valid and has sufficient credits with the provider
Model Not Available
Solutions:
- Your subscription tier may not include that model; check the access tier table above
- Verify the model is configured in the LiteLLM config
- Use `GET /api/billing/config` to see which models are available for your tier
Budget Exceeded
Solutions:
- Purchase additional credits from Settings > Billing
- Upgrade your subscription tier for more bundled credits
- Add your own API key (BYOK) to bypass platform credit costs
Slow Model Responses
Solutions:
- Switch to a faster model (e.g., Claude Haiku or GPT-3.5 Turbo)
- Reduce the `max_tokens` parameter
- Clear chat history to reduce input context size
Next Steps
API Keys
Set up your AI provider keys for BYOK
Billing
Understand credits, tiers, and cost management
Using Agents
Get the most from your AI agents
Customizing Agents
Create custom agents with specific models and parameters