## Overview

Model Management allows you to configure which AI models power your agents, adjust model parameters, and control costs.

- **Choose Models**: Select GPT-4, Claude, Qwen, or custom models
- **Control Costs**: See pricing and manage API usage
- **Tune Parameters**: Adjust temperature, tokens, and more
- **Per-Agent**: Use different models for different agents
## Available Models

Supported providers:
- OpenAI
- Anthropic
- Qwen
- OpenRouter
### GPT-4 Family

- **GPT-4 Turbo**: Best reasoning, most capable
  - Cost: $$$ (Highest)
  - Speed: Moderate
  - Best for: Complex logic, architecture
- **GPT-4**: Slightly less capable than Turbo
  - Cost: $$$
  - Speed: Slower
  - Best for: General development
- **GPT-3.5 Turbo**: Fast and affordable
  - Cost: $ (Low)
  - Speed: Very fast
  - Best for: Simple UI, quick iterations
## Model Configuration

### Global Default Models

Set default models for all new agents:

1. **Open Settings**: Profile → Settings → Model Management
2. **Stream Agent Default**: Choose the default model for Stream agents
3. **Iterative Agent Default**: Choose the default model for Iterative agents
4. **Save**: New agents will use these defaults
Existing custom agents keep their configured models; only newly created agents are affected.
### Per-Agent Models

Customize models for individual agents:

1. **Open Library**: Go to Library → Agents
2. **Edit Agent**: Click Edit on an open-source agent
3. **Select Model**: Choose from available models
4. **Save**: The agent now uses the new model
5. **Test**: Try the agent with the new model
You can only change models for open-source agents you own. Closed-source agents have fixed models.
## Model Parameters

### Temperature

Controls randomness and creativity:

- **Low (0.0 - 0.3)**: Deterministic and focused
  - More predictable output
  - Consistent code style
  - Less creative
  - Repeatable results
  - Best for: production code, bug fixes, refactoring, documentation
- **Medium (0.4 - 0.7)**: Balanced creativity and consistency
- **High (0.8 - 1.0)**: More varied, creative output
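These ranges can be sketched as a small helper mapping task types to temperatures. The task names and exact values below are illustrative, not part of any Tesslate API:

```python
# Illustrative mapping from task type to temperature, following the
# ranges above. Task names and values are hypothetical examples.
TEMPERATURE_BY_TASK = {
    "production_code": 0.2,  # low: deterministic, repeatable
    "bug_fix": 0.2,
    "refactoring": 0.3,
    "documentation": 0.3,
    "general": 0.5,          # medium: balanced
    "brainstorming": 0.9,    # high: more creative
}

def temperature_for(task: str) -> float:
    """Return a suggested temperature, defaulting to a medium setting."""
    return TEMPERATURE_BY_TASK.get(task, 0.5)
```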
### Max Tokens

Limits response length:

- Stream Agents: 4000-8000 tokens typical
- Iterative Agents: 2000-4000 per step
- Higher: more complete responses, higher cost
- Lower: faster and cheaper, but may truncate
1 token ≈ 4 characters. A component with 100 lines ≈ 400-800 tokens.
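The rule of thumb above can be turned into a quick estimator. This is only a sketch; a real tokenizer (such as tiktoken) gives exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

# A ~100-line component at ~30 characters per line is ~3000 characters:
print(estimate_tokens("x" * 3000))  # → 750, within the 400-800 range above
```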
### Other Parameters

**Top P (Nucleus Sampling)**

An alternative to temperature:

- 0.1: Very focused
- 0.5: Balanced
- 0.9: Diverse
- 1.0: All possibilities

**Frequency Penalty**

Reduces repetition:

- 0.0: No penalty (default)
- 0.5: Some reduction
- 2.0: Maximum reduction

**Presence Penalty**

Encourages new topics:

- 0.0: No penalty (default)
- 0.5: Moderate encouragement
- 2.0: Maximum encouragement
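Put together, these parameters typically travel in the request body of an OpenAI-compatible chat completion call. A hypothetical payload, with an example model name and example values:

```python
# Hypothetical request payload for an OpenAI-compatible chat API.
request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Build a pricing card"}],
    "temperature": 0.3,        # prefer tuning this OR top_p, not both
    "top_p": 1.0,
    "max_tokens": 4000,        # cap on response length
    "frequency_penalty": 0.5,  # some reduction in repetition
    "presence_penalty": 0.0,   # default: no push toward new topics
}
```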
## Cost Management

### Understanding Pricing

AI model costs are based on tokens:

- **Input Tokens**: What you send
  - Your prompts
  - System messages
  - Context/history
  - Code being edited
- **Output Tokens**: What the model generates in response
### Model Cost Comparison
| Model | Input | Output | Relative Cost |
|---|---|---|---|
| GPT-4 Turbo | $$$ | $$$ | High |
| Claude Opus | $$$ | $$$ | High |
| GPT-3.5 | $ | $ | Low |
| Claude Haiku | $ | $ | Low |
| Qwen 32B | $ | $ | Very Low |
Exact pricing varies - check provider websites for current rates. Tesslate doesn’t mark up model costs.
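As an illustration of how per-token pricing adds up, here is the arithmetic as a small function. The per-million-token prices below are placeholders, not real rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars, given prices per million tokens."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# 2,000 input + 1,000 output tokens at $10/$30 per million tokens:
print(round(request_cost(2000, 1000, 10.0, 30.0), 4))  # → 0.05
```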
### Reducing Costs

**Use Appropriate Models**

- GPT-3.5/Haiku for UI
- GPT-4/Opus for complex logic
- Match model to task complexity
- Don’t use premium models for simple tasks

**Optimize Prompts**

- Be specific and concise
- Avoid unnecessary context
- Make clear, direct requests
- Reduce back-and-forth

**Clear Context**

- Clear chat history when changing topics
- Don’t carry unnecessary context
- Start fresh for new features
- Reduces token usage

**Adjust Parameters**

- Lower max_tokens when appropriate
- Use temperature 0.3 for predictable tasks
- Disable verbose explanations
- Request concise responses
## Usage Monitoring

### Tracking API Usage

1. **Check Provider Dashboard**
   - OpenAI: platform.openai.com/usage
   - Anthropic: console.anthropic.com/settings/billing
   - OpenRouter: openrouter.ai/activity
2. **Monitor in Tesslate**: Coming soon: a built-in usage dashboard
3. **Set Budgets**: Configure spending limits in provider dashboards
4. **Review Regularly**: Check usage weekly or monthly
### Usage Alerts

Set up alerts in provider dashboards:

- Email notifications when hitting thresholds
- Warnings at 50%, 75%, 90%
- Hard limits to prevent overspending
- Monthly budget caps
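Provider dashboards handle this for you, but the threshold logic is simple enough to sketch locally:

```python
def crossed_thresholds(spent: float, budget: float,
                       thresholds=(0.5, 0.75, 0.9)) -> list:
    """Return the alert thresholds the current spend has crossed."""
    fraction = spent / budget
    return [t for t in thresholds if fraction >= t]

print(crossed_thresholds(80.0, 100.0))  # → [0.5, 0.75]
```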
## Model Selection Strategy

### By Task Type

- **Simple UI**: GPT-3.5, Claude Haiku, Qwen
  - Fast and cheap
  - Good for visual components
  - Tailwind CSS
- **Complex Logic**: GPT-4, Claude Opus
  - Better reasoning
  - Handles complexity
  - Fewer errors
- **API Integration**: GPT-4 Turbo, Claude Sonnet
  - Understands APIs
  - Good error handling
  - Async patterns
- **Debugging**: GPT-4, Claude Opus
  - Analyzes code deeply
  - Finds root causes
  - Suggests fixes
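This guidance can be captured as a simple task-to-model lookup. The model identifiers here are illustrative; use whatever names your provider exposes:

```python
# Hypothetical task-to-model mapping based on the guidance above.
MODEL_BY_TASK = {
    "simple_ui": "gpt-3.5-turbo",
    "complex_logic": "gpt-4",
    "api_integration": "gpt-4-turbo",
    "debugging": "gpt-4",
}

def pick_model(task: str) -> str:
    """Choose a model for a task, defaulting to a cheap one."""
    return MODEL_BY_TASK.get(task, "gpt-3.5-turbo")
```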
### By Project Stage

- **Prototyping**: Fast and cheap
  - GPT-3.5 Turbo
  - Claude Haiku
  - Qwen models
  - Quick iterations
- **Development**: Balance speed and quality as features mature
- **Production**: Favor the most capable models for reliability
## Switching Models

You can switch models anytime:

1. **Library → Agents**: Find the agent to update
2. **Edit Agent**: Click Edit (open-source only)
3. **Change Model**: Select the new model from the dropdown
4. **Save**: Changes apply immediately
5. **Test**: Try the agent with the new model
Each agent can use a different model. Mix and match based on needs.
## Best Practices

**Match Model to Task**

- Don’t use GPT-4 for simple UI: a waste of money
- Don’t use GPT-3.5 for complex logic: poor results
- Choose appropriately for each task

**Test Different Models**

- The same prompt produces different results on different models
- Test to find the best model for your use case
- Balance cost vs. quality

**Monitor Costs**

- Check usage regularly
- Set budget alerts
- Optimize expensive operations
- Review monthly spending

**Update Models**

- New models are released regularly
- Performance improves over time
- Costs fall as prices drop
- Stay current with updates