Overview
This guide covers two primary deployment paths for Tesslate Studio:
- Docker Compose for single-server deployments (development teams, small organizations).
- Kubernetes for production-scale deployments with auto-scaling, per-project namespace isolation, and EBS VolumeSnapshot persistence.
Choosing a Deployment Path
| Factor | Docker Compose | Kubernetes (Minikube) | Kubernetes (AWS EKS) |
|---|---|---|---|
| Best for | Single server, small teams | Local K8s testing | Production, multi-user |
| Scaling | Vertical only (bigger server) | Not designed for scale | Horizontal (node autoscaling) |
| Project isolation | Docker networks | Namespace per project | Namespace + NetworkPolicy |
| Storage | Local filesystem | PVC (hostpath) | EBS VolumeSnapshots |
| SSL/TLS | Traefik + Let’s Encrypt | HTTP only | cert-manager + Cloudflare |
| Complexity | Low | Medium | High |
| Monthly cost | $120 (server) | Free (local) | ~$320 (AWS baseline) |
Path 1: Docker Compose (Single Server)
This path deploys all services on a single Linux server with Docker Compose, Traefik for reverse proxying, and optional Let’s Encrypt for SSL.
Prerequisites
Server
- Cloud VM or dedicated server
- 16 GB RAM recommended
- 50 GB+ disk space
- Ubuntu 22.04 LTS (or similar)
Domain
- Custom domain (e.g., studio.yourcompany.com)
- DNS access to create A records
- Wildcard DNS support (*.studio.yourcompany.com)
Step-by-Step Setup
Configure DNS
| Type | Name | Value | TTL |
|---|---|---|---|
| A | studio | your-server-ip | 300 |
| A | *.studio | your-server-ip | 300 |
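Before requesting certificates, it can help to confirm that both records resolve to the server. The following is a minimal sketch, assuming `dig` is installed; the function names and the `dns-check` probe label are illustrative and not part of Tesslate Studio.

```shell
# Sketch: verify that the base and wildcard A records point at the server.
# Assumes `dig` is available; all names here are illustrative.

# Print the first answer line for a hostname's A lookup (empty if none).
resolve_a() {
  dig +short A "$1" | head -n 1
}

# Succeed only if the base host and an arbitrary subdomain both
# resolve to the expected IP address.
check_dns() {
  local domain="$1" expected_ip="$2" host actual
  for host in "$domain" "dns-check.$domain"; do
    actual="$(resolve_a "$host")"
    if [ "$actual" != "$expected_ip" ]; then
      echo "MISMATCH $host -> ${actual:-none} (expected $expected_ip)"
      return 1
    fi
    echo "OK $host -> $actual"
  done
}
```

For example, `check_dns studio.yourcompany.com your-server-ip` would report each host it checked. Any subdomain label works for the wildcard probe, since `*.studio` should match all of them.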
Configure SSL with Let's Encrypt (Traefik)
docker-compose.yml Traefik service for production SSL:
Database Backups
Create an automated backup script at /home/tesslate/backup.sh:
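A minimal sketch of such a script follows. Several details are assumptions rather than values from this guide: the Compose service name `postgres`, the database user and name `tesslate`, the backup directory, and the 7-day retention window. Adjust them to match your deployment.

```shell
#!/usr/bin/env bash
# Sketch of a PostgreSQL backup script for the Docker Compose deployment.
# Assumed (hypothetical) values: service "postgres", user/db "tesslate",
# backup directory /home/tesslate/backups, 7-day retention.

BACKUP_DIR="${BACKUP_DIR:-/home/tesslate/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"
DB_USER="${DB_USER:-tesslate}"
DB_NAME="${DB_NAME:-tesslate}"

# Build the backup file name from a timestamp.
backup_file_name() {
  echo "tesslate-$1.sql.gz"
}

# Dump the database to an uncompressed file first, then compress it,
# so a failed pg_dump is not hidden behind a successful gzip.
dump_database() {
  local raw="${1%.gz}"
  docker compose exec -T postgres pg_dump -U "$DB_USER" "$DB_NAME" > "$raw" &&
    gzip -f "$raw"
}

# Delete compressed dumps older than the retention window.
prune_old_backups() {
  find "$1" -type f -name '*.sql.gz' -mtime +"$2" -delete
}

# Only act when invoked as: backup.sh run
if [ "${1:-}" = "run" ]; then
  mkdir -p "$BACKUP_DIR"
  out="$BACKUP_DIR/$(backup_file_name "$(date +%Y%m%d-%H%M%S)")"
  dump_database "$out" && prune_old_backups "$BACKUP_DIR" "$RETENTION_DAYS"
  echo "Backup written to $out"
fi
```

One way to automate it is a cron entry such as `0 2 * * * /home/tesslate/backup.sh run`, run from the directory that contains docker-compose.yml. The schedule is only an example.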
Performance Tuning
PostgreSQL optimization
docker-compose.yml postgres service:
Docker resource limits
Log rotation
Path 2: Kubernetes (Production)
Kubernetes mode provides per-project namespace isolation, EBS VolumeSnapshot-based persistence, NetworkPolicy security, automatic SSL, and horizontal scaling. This section covers both Minikube (local testing) and AWS EKS (production).
Kubernetes Architecture
Each project gets its own namespace (proj-{uuid}) with:
- A dedicated PVC for block storage
- A file manager pod (always running, handles file operations)
- Dev container pods (frontend, backend, database as needed)
- An Ingress resource for subdomain routing
- A NetworkPolicy enforcing zero cross-project communication
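The per-project resources listed above can be inspected with a single `kubectl get` call. This is a small sketch, assuming `kubectl` is configured for the cluster; the helper names are illustrative.

```shell
# Sketch: list the resources that make up one project namespace.
# Helper names are illustrative; the proj-{uuid} pattern is from this guide.

# Map a project UUID to its namespace name.
project_namespace() {
  echo "proj-$1"
}

# Show the PVC, pods, Ingress, and NetworkPolicy for a project.
inspect_project() {
  local ns
  ns="$(project_namespace "$1")"
  kubectl get pvc,pods,ingress,networkpolicy -n "$ns"
}
```

Calling `inspect_project <uuid>` should show one PVC, the file manager pod, any dev container pods, an Ingress, and a NetworkPolicy if the project is running.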
Option A: Minikube (Local Kubernetes Testing)
Use Minikube to test the full Kubernetes deployment locally before going to production.
Install prerequisites
| Software | Install Command |
|---|---|
| Docker Desktop | docker.com/products/docker-desktop |
| Minikube | brew install minikube (macOS) or choco install minikube (Windows) |
| kubectl | brew install kubernetes-cli (macOS) or choco install kubernetes-cli (Windows) |
Build and load all images
Configure secrets
- SECRET_KEY (random string for JWT)
- DATABASE_URL (default works for in-cluster PostgreSQL)
- LITELLM_API_BASE and LITELLM_MASTER_KEY
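As an illustration of how these values might be supplied, the sketch below generates a random SECRET_KEY and creates a Kubernetes Secret from literals. The namespace and secret name are passed as arguments because this guide does not state them, and the project may instead expect a secrets manifest, so treat this as one possible approach.

```shell
# Sketch: generate a random key and create a Secret from literal values.
# The namespace ($1) and secret name ($2) are caller-supplied assumptions.

# Print 32 random bytes as 64 lowercase hex characters.
generate_secret_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Create the secret; LITELLM_API_BASE and LITELLM_MASTER_KEY are read
# from the environment. DATABASE_URL is omitted because the default
# works for in-cluster PostgreSQL.
create_backend_secret() {
  kubectl create secret generic "$2" \
    --namespace "$1" \
    --from-literal=SECRET_KEY="$(generate_secret_key)" \
    --from-literal=LITELLM_API_BASE="$LITELLM_API_BASE" \
    --from-literal=LITELLM_MASTER_KEY="$LITELLM_MASTER_KEY"
}
```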
Minikube limitations compared to production
| Feature | Minikube | AWS EKS |
|---|---|---|
| VolumeSnapshots / Timeline | Not supported | EBS snapshots |
| Hibernation with snapshots | Projects stop; no snapshot | Snapshot created, namespace deleted |
| SSL/TLS | HTTP only | Automatic via cert-manager |
| Data persistence | PVC survives pod restarts; lost if cluster deleted | Snapshot-based, survives cluster changes |
Option B: AWS EKS (Production)
For production deployments, use AWS EKS with Terraform for Infrastructure as Code.
Infrastructure Provisioning with Terraform
Terraform provisions all required AWS resources: VPC, EKS cluster, ECR repositories, S3 bucket, IAM roles, and Helm charts.
Provision infrastructure
| Resource | Details |
|---|---|
| VPC | Public and private subnets across 2 AZs, NAT gateway |
| EKS Cluster | Managed control plane with IRSA enabled |
| Node Groups | Primary (on-demand t3.large) and optional Spot group |
| EKS Add-ons | CoreDNS, kube-proxy, VPC CNI, EBS CSI Driver |
| ECR Repositories | tesslate-backend, tesslate-frontend, tesslate-devserver |
| S3 Bucket | Versioned, encrypted, lifecycle-managed project storage |
| IAM Roles | IRSA roles for S3 access and EBS provisioning |
| Storage Class | tesslate-block-storage (EBS gp3, encrypted) |
Build and push images to ECR
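A minimal sketch of the build-and-push flow, assuming the AWS CLI and Docker are installed and authenticated. The account ID, region, tag, and build context are caller-supplied placeholders. The repository names (tesslate-backend, tesslate-frontend, tesslate-devserver) come from the Terraform table above, and `--no-cache` is applied so stale layers are not reused.

```shell
# Sketch: log in to ECR, then build and push one image without cache.
# Account ID, region, tag, and build context are placeholders.

# Build the registry hostname for an account and region.
ecr_registry() {
  echo "$1.dkr.ecr.$2.amazonaws.com"
}

# Authenticate Docker against the account's ECR registry.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")"
}

# Arguments: account, region, repository, tag, build context directory.
build_and_push() {
  local image
  image="$(ecr_registry "$1" "$2")/$3:$4"
  docker build --no-cache -t "$image" "$5" && docker push "$image"
}
```

For example, `ecr_login 123456789012 us-east-1` followed by `build_and_push 123456789012 us-east-1 tesslate-backend latest ./backend`, where the account ID, region, tag, and `./backend` path are hypothetical.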
Use --no-cache when building images for deployment. This ensures all code changes are included in the image.
EBS VolumeSnapshot Persistence
In production Kubernetes mode, user projects are persisted using EBS VolumeSnapshots (not S3). Here is how the lifecycle works:
Project opens (restore)
Opening a project creates a namespace (proj-{uuid}) and a PVC. If a VolumeSnapshot exists from a previous session, the PVC is created with a dataSource pointing to the snapshot. EBS lazy-loads data on first access, so startup is near-instant (under 10 seconds).
Runtime (fast local I/O)
Hibernation (snapshot and cleanup)
After a project has been idle for K8S_HIBERNATION_IDLE_MINUTES (default: 10), a cleanup CronJob triggers:
- Creates a VolumeSnapshot from the PVC (takes under 5 seconds to initiate)
- Waits for the snapshot to be ready
- Deletes the namespace (cascading to all resources)
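The snapshot and restore halves of this lifecycle map onto two standard Kubernetes objects. The sketch below renders them as manifests and waits for snapshot readiness. Resource names, the VolumeSnapshotClass name, and the volume size are caller-supplied assumptions; only the tesslate-block-storage storage class comes from this guide. How Tesslate Studio keeps the snapshot available after the namespace is deleted is not described here, so that part is left out.

```shell
# Sketch: manifests for the hibernate (snapshot) and restore (PVC) steps.
# Names, snapshot class, and size are assumptions supplied by the caller.

# Arguments: namespace, PVC name, snapshot name, VolumeSnapshotClass name.
render_snapshot() {
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $3
  namespace: $1
spec:
  volumeSnapshotClassName: $4
  source:
    persistentVolumeClaimName: $2
EOF
}

# Arguments: namespace, PVC name, snapshot name, requested size (e.g. 5Gi).
render_restored_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $2
  namespace: $1
spec:
  storageClassName: tesslate-block-storage
  accessModes: ["ReadWriteOnce"]
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: $3
  resources:
    requests:
      storage: $4
EOF
}

# Block until the snapshot reports readyToUse (kubectl 1.23 or newer).
wait_for_snapshot() {
  kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
    "volumesnapshot/$2" -n "$1" --timeout=300s
}
```

Either manifest can be piped to `kubectl apply -f -`. A PVC can only use a VolumeSnapshot from its own namespace as a dataSource.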
AWS Cost Estimation
| Resource | Monthly Cost |
|---|---|
| EKS Cluster (control plane) | ~$73 |
| EC2 Nodes (2 x t3.large on-demand) | ~$121 |
| EBS Volumes (~100 GB total) | ~$10 |
| S3 Storage (~10 GB) | ~$0.23 |
| NAT Gateway (2 AZs) | ~$66 |
| Data Transfer | ~$50 |
| Total baseline | ~$320 |
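The baseline total can be checked by summing the line items. The sketch below does that with `awk`, treating every figure as US dollars per month; the row labels are illustrative.

```shell
# Sketch: sum the monthly line items from the cost table.

# Read "label amount" lines on stdin and print the total of the last column.
sum_costs() {
  awk '{ total += $NF } END { printf "%.2f\n", total }'
}

# Line items copied from the table (USD per month).
baseline_monthly_cost() {
  sum_costs <<EOF
eks_control_plane 73
ec2_nodes 121
ebs_volumes 10
s3_storage 0.23
nat_gateway 66
data_transfer 50
EOF
}
```

`baseline_monthly_cost` prints 320.23, which matches the rounded baseline of about $320.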
Cost optimization tips
- Spot instances for user containers: Add a Spot node group (up to 90% savings on compute for dev containers).
- Cluster autoscaling: Scale down during off-hours.
- S3 lifecycle policies: Move old versions to Infrequent Access (30 days) then Glacier (90 days).
- EBS gp3: Already used by default; 20% cheaper than gp2.
- Right-size nodes: Use t3.medium if resource usage is low.
Kustomize Manifest Structure
Tesslate Studio uses Kustomize for Kubernetes manifest management:
Updating a Running Deployment
- Minikube
- AWS EKS
Security Hardening
Troubleshooting
SSL certificate not issued (Docker Compose / Traefik)
- DNS records not propagated yet (wait 10 to 15 minutes)
- Port 80 blocked by firewall
- Rate limit hit (Let’s Encrypt allows 5 duplicate certificates per week for the same set of hostnames)
SSL certificate not issued (Kubernetes / cert-manager)
Image not updating after rebuild (Minikube)
ImagePullBackOff on user project pods (AWS EKS)
503 error on user project URL
Namespace stuck in Terminating state
Site not loading after backend restart (AWS)
Monitoring and Maintenance
Health Checks
| Endpoint | Purpose |
|---|---|
| GET /health | Backend liveness check (returns 200 if alive) |
| GET /api/config | Public configuration (deployment mode, app domain) |
Kubernetes probes (defined in backend-deployment.yaml):
- Startup probe: /health, 10s initial delay, allows 2 minutes for boot
- Liveness probe: /health every 10 seconds
- Readiness probe: /health every 5 seconds
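The same endpoint can be polled from outside the cluster, for example after a deployment. This is a minimal sketch, assuming `curl` is installed; the function name and retry parameters are illustrative.

```shell
# Sketch: poll GET /health until it succeeds or the attempts run out.
# Arguments: base URL, maximum attempts, seconds between attempts.
wait_for_health() {
  local i=1
  while [ "$i" -le "$2" ]; do
    # -f makes curl fail on HTTP errors, so only a 2xx response counts.
    if curl -fsS -o /dev/null "$1/health"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$3"
    i=$((i + 1))
  done
  echo "not healthy after $2 attempts" >&2
  return 1
}
```

For example, `wait_for_health https://studio.yourcompany.com 12 10` waits up to about 2 minutes, roughly mirroring the startup probe's boot allowance.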