
Overview

Tesslate Studio routes all AI requests through a LiteLLM proxy gateway that provides unified access to multiple AI providers. This means you can use OpenAI, Anthropic, Google, and other models through a single interface, with automatic budget tracking, usage monitoring, and dynamic pricing.

LiteLLM Gateway

Unified proxy routing requests to OpenAI, Anthropic, Google, and more

Dynamic Pricing

Per-token cost calculated in real time from provider rates

Budget Controls

Per-user budget limits and team-based access tiers

BYOK Support

Bring Your Own Key to bypass platform credits entirely

How the LiteLLM Gateway Works

LiteLLM acts as a reverse proxy between Tesslate and AI providers. When you send a message to an agent, the request flows through this pipeline:
1. User Sends Message: Your message is sent from the chat UI to the FastAPI backend.
2. Credit Check: The check_credits(user, model_name) function verifies you have sufficient credits (or a BYOK key) before proceeding.
3. LiteLLM Routing: The request is forwarded to the LiteLLM proxy (running on port 4000), which routes it to the correct AI provider based on the model name.
4. Provider Response: The AI provider processes the request and streams the response back through LiteLLM to the Tesslate backend.
5. Cost Calculation: After the response, calculate_cost_cents(model, tokens_in, tokens_out) computes the cost using the current pricing data.
6. Credit Deduction: The deduct_credits() function subtracts the cost from your credit pools in priority order: daily, then bundled, then signup bonus, then purchased.
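The deduction step can be sketched as a walk over the credit pools in priority order. This is an illustrative model, not the actual Tesslate implementation: the pool names match the documented order, but the dict-based interface is assumed.

```python
# Priority order documented above: daily -> bundled -> signup bonus -> purchased.
POOL_PRIORITY = ["daily", "bundled", "signup_bonus", "purchased"]

def deduct_credits(pools: dict[str, int], cost_cents: int) -> dict[str, int]:
    """Subtract cost_cents from pools in priority order; raise if short."""
    remaining = cost_cents
    for name in POOL_PRIORITY:
        take = min(pools.get(name, 0), remaining)  # drain this pool first
        pools[name] = pools.get(name, 0) - take
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("insufficient credits")
    return pools
```

A cost of 6 cents against a user holding 2 daily and 5 bundled credits empties the daily pool and takes the remaining 4 cents from the bundled pool, leaving later pools untouched.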

Supported Models and Providers

GPT-4 Family
Model           Strengths                      Speed       Cost
GPT-4 Turbo     Best reasoning, most capable   Moderate    High
GPT-4           Strong general purpose         Slower      High
GPT-3.5 Turbo   Fast, affordable               Very fast   Low

Requires: OpenAI API key (BYOK) or platform credits

Model Pricing System

Tesslate uses dynamic pricing fetched directly from LiteLLM’s /model/info endpoint. Prices are cached for 5 minutes and calculated with Decimal arithmetic to avoid floating-point errors.

How Pricing Works

Step        Description
Fetch       Calls LiteLLM /model/info to get per-token costs for all configured models
Cache       Stores the pricing map in a distributed cache (Redis or in-memory) for 5 minutes
Lookup      On each AI request, looks up the model's pricing (exact match, then partial match, then fallback)
Calculate   Converts token counts to cost in cents using Decimal arithmetic
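The lookup step (exact match, then partial match, then fallback) can be sketched as below. The shape of the pricing map and the function name are assumptions for illustration; only the matching order and the fallback rates come from this page.

```python
# Documented fallback rates: $1.00/1M input, $3.00/1M output.
FALLBACK = {"input_per_1m": 1.00, "output_per_1m": 3.00}

def lookup_pricing(model: str, pricing_map: dict) -> dict:
    if model in pricing_map:                 # 1. exact match
        return pricing_map[model]
    for name, price in pricing_map.items():  # 2. partial match
        if name in model or model in name:
            return price
    return FALLBACK                          # 3. fallback (logged as a warning in practice)
```

A partial match lets a provider-prefixed name such as "openai/gpt-4-turbo" resolve to the "gpt-4-turbo" entry.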

Cost Calculation Formula

cost_input  = ceil((tokens_in / 1,000,000) * price_per_1M_input * 100)
cost_output = ceil((tokens_out / 1,000,000) * price_per_1M_output * 100)
cost_total  = cost_input + cost_output
Key details:
  • Prices are in USD per 1 million tokens
  • Costs are calculated in cents (1 credit = $0.01)
  • Ceiling rounding ensures any non-zero usage costs at least 1 cent
  • All arithmetic uses Python’s decimal.Decimal for financial precision
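The formula and the key details above can be combined into a short Decimal sketch. This is illustrative: the real calculate_cost_cents takes a model name and looks prices up itself, whereas here the per-1M prices are passed in directly.

```python
from decimal import Decimal, ROUND_CEILING

def calculate_cost_cents(tokens_in: int, tokens_out: int,
                         price_in_per_1m: str, price_out_per_1m: str) -> int:
    """Cost in whole cents; prices are USD per 1M tokens, rounded up per side."""
    million = Decimal(1_000_000)
    cents_in = Decimal(tokens_in) / million * Decimal(price_in_per_1m) * 100
    cents_out = Decimal(tokens_out) / million * Decimal(price_out_per_1m) * 100
    ceil = lambda d: int(d.to_integral_value(rounding=ROUND_CEILING))
    return ceil(cents_in) + ceil(cents_out)
```

With the fallback rates, a request of 1,000 input and 500 output tokens costs a fraction of a cent on each side, and ceiling rounding bills it as 2 cents total.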

Fallback Pricing

If a model is not found in LiteLLM’s pricing data, default rates apply:
Token Type   Fallback Rate
Input        $1.00 per 1M tokens
Output       $3.00 per 1M tokens
A warning is logged when fallback pricing is used, so administrators can add proper pricing data.
Exact pricing varies by provider and changes over time. The dynamic pricing system ensures you always pay current rates without manual configuration updates.

Model Access Tiers

LiteLLM organizes users into teams with different model access levels:
Team Tier             Available Models                         Max Budget
Free                  GPT-3.5 Turbo, Claude 3 Haiku            $5.00
Internal (Basic)      GPT-4, Claude 3 Sonnet, GPT-3.5 Turbo    $100.00
Premium (Pro/Ultra)   GPT-4, Claude 3 Opus, Claude 3 Sonnet    $1,000.00
When you subscribe to a tier, your LiteLLM team assignment updates automatically to grant access to the appropriate models.
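The automatic team assignment amounts to a mapping from subscription tier to a LiteLLM team definition. The structure below mirrors the access-tier table; the dict shape and tier keys are illustrative, not the actual configuration.

```python
# Hypothetical tier -> team mapping, mirroring the table above.
TIER_TEAMS = {
    "free":     {"models": ["gpt-3.5-turbo", "claude-3-haiku"], "max_budget": 5.00},
    "internal": {"models": ["gpt-4", "claude-3-sonnet", "gpt-3.5-turbo"], "max_budget": 100.00},
    "premium":  {"models": ["gpt-4", "claude-3-opus", "claude-3-sonnet"], "max_budget": 1000.00},
}

def team_for_tier(tier: str) -> dict:
    # Unknown tiers fall back to the free team.
    return TIER_TEAMS.get(tier, TIER_TEAMS["free"])
```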

BYOK (Bring Your Own Key)

BYOK lets you use your own API keys instead of platform credits. Available on Pro and Ultra tiers.
When a model is identified as BYOK (you have provided your own API key for that provider):
  1. The cost is $0 in platform credits
  2. No credits are deducted from any pool
  3. A UsageLog entry is still created with is_byok=True and billed_status="exempt" for your analytics
  4. You pay the provider directly through your own API key billing
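The BYOK branch above can be sketched as a single billing decision. The UsageLog is modeled as a dict here, and the "billed" status string for the non-BYOK path is an assumption; the is_byok and "exempt" values come from this page.

```python
def bill_request(cost_cents: int, is_byok: bool) -> dict:
    """Return the usage-log fields for one request."""
    if is_byok:
        # BYOK: $0 in platform credits, nothing deducted, logged for analytics only.
        return {"cost_cents": 0, "is_byok": True, "billed_status": "exempt"}
    # Platform-credit path ("billed" status assumed for illustration).
    return {"cost_cents": cost_cents, "is_byok": False, "billed_status": "billed"}
```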

Model Configuration

Setting Default Models

1. Open Settings: Navigate to Settings > Preferences.
2. Choose Diagram Model: Set the model used for architecture diagram generation via the diagram_model preference.
3. Configure Agent Models: Each agent can use a different model. When creating or editing an agent, select the model from the dropdown.
4. Disable Models: Use the disabled_models preference to hide specific models from the chat model selector. This is stored as a JSON array in your user profile.

Per-Agent Model Selection

1. Open Library: Go to Library > Agents.
2. Edit Agent: Click Edit on an open-source agent you own.
3. Select Model: Choose from the available models based on your tier.
4. Save and Test: Save changes. The agent immediately uses the new model for all future requests.

You can only change models for open-source agents you own. Marketplace agents with closed-source configurations have fixed models set by the creator.

Model Parameters

Temperature

Controls randomness and creativity in AI responses. Low values (around 0.1 to 0.3) keep output deterministic and focused:
  • More predictable output
  • Consistent code style
  • Repeatable results
Best for: Production code, bug fixes, refactoring, documentation

Max Tokens

Limits the length of AI responses:
  • Stream Agents: Typically 4,000 to 8,000 tokens
  • Iterative Agents: Typically 2,000 to 4,000 tokens per step
  • Higher values produce more complete responses at higher cost
  • Lower values are faster and cheaper but may truncate output
1 token is approximately 4 characters. A React component with 100 lines is roughly 400 to 800 tokens.
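The 4-characters-per-token rule of thumb above gives a quick way to estimate how much of your max_tokens budget a piece of text will consume. The helper name is illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic."""
    if not text:
        return 0
    return max(1, len(text) // 4)
```

A 400-character snippet estimates to about 100 tokens; treat this only as a ballpark, since real tokenizers vary by model.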

Other Parameters

Top P

Alternative to temperature for controlling randomness:
  • 0.1: Very focused, narrow range of tokens considered
  • 0.5: Balanced
  • 0.9: Diverse
  • 1.0: All token possibilities considered (default)

Frequency Penalty

Reduces repetition in generated code:
  • 0.0: No penalty (default)
  • 0.5: Moderate reduction in repeated patterns
  • 2.0: Maximum reduction

Presence Penalty

Encourages exploration of new topics:
  • 0.0: No penalty (default)
  • 0.5: Moderate encouragement for new topics
  • 2.0: Maximum encouragement
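Taken together, the parameters above form a request payload in the OpenAI-compatible schema that LiteLLM accepts. The values here are example choices for focused code generation, not defaults from this page.

```python
# Example chat-completions parameters (values chosen for focused code generation).
params = {
    "model": "gpt-4-turbo",
    "temperature": 0.2,        # deterministic, focused output
    "max_tokens": 4000,        # cap response length (and cost)
    "top_p": 1.0,              # leave at default when steering with temperature
    "frequency_penalty": 0.5,  # moderate reduction of repeated patterns
    "presence_penalty": 0.0,   # no extra push toward new topics
}
```

Adjusting temperature or top_p, but not both at once, is the usual recommendation.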

Cost Management

Understanding Your Costs

AI costs are based on two factors: the model you choose and the number of tokens processed.
Input tokens are everything you send to the model:
  • Your prompts and messages
  • System prompts and agent instructions
  • Chat history and context
  • Code being analyzed or edited
Input tokens are typically cheaper than output tokens.

Reducing Costs

  • Use GPT-3.5 or Claude Haiku for simple UI work and quick iterations
  • Use GPT-4 or Claude Opus for complex logic, architecture decisions, and debugging
  • Do not use premium models for tasks that smaller models handle well
  • Be specific and concise in your requests
  • Avoid sending unnecessary context
  • Clear chat history when changing topics to reduce input tokens
If you use AI models heavily, adding your own API key (BYOK) can be more cost-effective than purchasing platform credits, since you pay provider rates directly.
  • Lower max_tokens when you need shorter responses
  • Use temperature 0.2 to 0.3 for predictable, focused output
  • Set per-run cost limits to prevent runaway agent loops (default limit: $5 per run)
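The per-run cost limit mentioned above amounts to a guard around the agent loop: stop before a step would push accumulated cost past the limit. This sketch assumes a simple list of per-step costs; the real loop interface will differ.

```python
def run_with_limit(step_costs_cents, limit_cents: int = 500) -> tuple[int, int]:
    """Run steps until the next one would exceed the limit (default $5 = 500 cents).

    Returns (steps_completed, total_cost_cents).
    """
    total = 0
    for i, cost in enumerate(step_costs_cents):
        if total + cost > limit_cents:
            return i, total  # stop before breaching the limit
        total += cost
    return len(step_costs_cents), total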

Usage Monitoring

In Tesslate Studio

Track your AI usage directly from the billing dashboard:
1. Open Billing: Navigate to Settings > Billing.
2. View Usage Summary: The usage section shows total cost, token counts, and request counts for the current billing period, broken down by model and by agent.
3. View Detailed Logs: Click into usage logs to see individual requests with model, tokens, cost, and timestamp.
4. Check Credit Status: The credit status indicator warns you when your balance drops below 20% of your monthly allowance.

Usage API Endpoints

Endpoint                        Description
GET /api/billing/usage          Usage summary for a date range, broken down by model and agent
GET /api/billing/usage/logs     Detailed per-request usage logs with pagination
POST /api/billing/usage/sync    Manually trigger usage sync from LiteLLM
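The per-model breakdown returned by the usage endpoint can be sketched as an aggregation over individual usage-log entries. The log-entry field names here are assumptions for illustration.

```python
from collections import defaultdict

def summarize_by_model(logs: list[dict]) -> dict:
    """Aggregate usage logs into per-model request, token, and cost totals."""
    summary = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost_cents": 0})
    for log in logs:
        row = summary[log["model"]]
        row["requests"] += 1
        row["tokens"] += log["tokens_in"] + log["tokens_out"]
        row["cost_cents"] += log["cost_cents"]
    return dict(summary)
```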

Model Selection Strategy

Simple UI Work

Use: GPT-3.5, Claude Haiku, Qwen. Fast, affordable, great for visual components and Tailwind CSS.

Complex Logic

Use: GPT-4, Claude Opus. Better reasoning, handles complexity, fewer errors in business logic.

API Integration

Use: GPT-4 Turbo, Claude Sonnet. Strong understanding of APIs, good error handling, async patterns.

Debugging

Use: GPT-4, Claude Opus. Deep code analysis, root cause identification, comprehensive fixes.

Troubleshooting

Requests fail or return errors

Solutions:
  • Verify LITELLM_MASTER_KEY is set correctly in the environment
  • Confirm the LiteLLM proxy is running (test: curl http://localhost:4000/health)
  • Check that your BYOK key is valid and has sufficient credits with the provider

A model is not available

Solutions:
  • Your subscription tier may not include that model; check the access tier table above
  • Verify the model is configured in the LiteLLM config
  • Use GET /api/billing/config to see which models are available for your tier

You run out of credits

Solutions:
  • Purchase additional credits from Settings > Billing
  • Upgrade your subscription tier for more bundled credits
  • Add your own API key (BYOK) to bypass platform credit costs

Responses are slow

Solutions:
  • Switch to a faster model (e.g., Claude Haiku or GPT-3.5 Turbo)
  • Reduce the max_tokens parameter
  • Clear chat history to reduce input context size

Next Steps

API Keys

Set up your AI provider keys for BYOK

Billing

Understand credits, tiers, and cost management

Using Agents

Get the most from your AI agents

Customizing Agents

Create custom agents with specific models and parameters