Overview
This page collects the most common issues encountered when developing, deploying, and self-hosting Tesslate Studio. Each section includes symptoms, diagnosis commands, root causes, and solutions. If you are new to the codebase, scan the section headers to find the category that matches your problem.
Container Issues
Devserver Image Missing
Symptoms: User project containers fail to start. Pods stuck in ImagePullBackOff or ErrImagePull.
Diagnosis: The tesslate-devserver image was never built or loaded into the cluster.
Solution:
- Docker Compose: rebuild the devserver image with docker build and restart the stack
- Minikube: build the image, then load it into the cluster with minikube -p tesslate image load
- AWS EKS: push the image to ECR and ensure K8S_DEVSERVER_IMAGE points at the full registry path
ImagePullBackOff
Symptoms: Pod stuck in ImagePullBackOff state.
Diagnosis:
- Image not loaded into cluster (Minikube): Run minikube -p tesslate image load <image>:latest
- ECR credentials expired (AWS): Re-authenticate: aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
- Wrong image name in config: Verify K8S_DEVSERVER_IMAGE in the backend environment matches the actual image name
Pod Stuck in CrashLoopBackOff
Symptoms: Pod repeatedly crashes and restarts.
Diagnosis:
- Missing environment variables: Verify secrets are properly mounted: kubectl exec -n tesslate deployment/tesslate-backend -- env | grep DATABASE
- Database connection failure: Check DATABASE_URL and ensure the database pod is running
- Missing Python dependencies: Rebuild the image with --no-cache
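A quick way to confirm the first item is a small helper that reports unset variables. This is a sketch: the variable names shown in the example are the ones referenced on this page, and your deployment may require others.

```shell
# check_env: print each variable from the argument list that is unset or
# empty, and return non-zero if any are missing.
check_env() {
  local missing=0 var
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "MISSING: $var"
      missing=1
    fi
  done
  return $missing
}

# Example: check_env DATABASE_URL LITELLM_API_BASE LITELLM_MASTER_KEY
```

Run it inside the pod (e.g. via kubectl exec ... -- bash) so you check what the container actually sees, not what the host exports.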
Namespace Stuck in Terminating
Symptoms: A project namespace stays in Terminating state and never completes deletion.
Diagnosis: Lingering finalizers usually block deletion. Inspect them with kubectl get namespace <name> -o json and check the finalizers list.
Database Issues
Connection Refused
Symptoms: Backend logs show Connection refused or timeout errors for PostgreSQL.
Diagnosis:
- Database not running: Restart it: docker compose up -d postgres or kubectl rollout restart deployment/tesslate-postgres -n tesslate
- Wrong DATABASE_URL: Verify the format: postgresql+asyncpg://user:pass@host:5432/dbname. Check with: kubectl exec -n tesslate deployment/tesslate-backend -- env | grep DATABASE_URL
- Network policy blocking: Ensure the NetworkPolicy allows backend-to-database traffic
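A quick shape-check for the URL format above can be scripted. This sketch validates only the structure (scheme, credentials, host, port, database name); it does not handle URL-encoded passwords or query parameters.

```shell
# validate_db_url: check that a URL has the shape
# postgresql+asyncpg://user:pass@host:port/dbname
validate_db_url() {
  local url="$1"
  local re='^postgresql\+asyncpg://[^:/@]+:[^@]+@[^:/@]+:[0-9]+/[^/]+$'
  if [[ "$url" =~ $re ]]; then
    echo "ok"
  else
    echo "bad"
  fi
}
```

A common failure it catches is a URL that uses the plain postgresql:// scheme: SQLAlchemy's async engine needs the +asyncpg driver suffix.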
Migration Errors
Symptoms: alembic upgrade head fails.
Diagnosis:
Multiple heads detected
Two migration branches exist. Create a merge revision with alembic merge heads -m "merge", then run alembic upgrade head.
Relation does not exist
Usually the migrations have not been applied to this database; run alembic upgrade head.
Migration partially applied
If a migration fails midway, check the current state and fix manually:
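Useful Alembic commands for that inspection (run them inside the backend container; <revision> is a placeholder):

```shell
alembic current            # the revision the database is currently stamped at
alembic history --verbose  # list all known revisions
alembic stamp <revision>   # mark the database as at <revision> without running migrations
```

Use stamp with care: it only updates Alembic's bookkeeping table and does not change your schema.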
Autogenerate misses changes
Ensure all model files are imported in alembic/env.py so that every table is registered on the metadata that autogenerate compares against.
Database Seeding Failures
Symptoms: Seed scripts fail or produce no data.
Diagnosis:
- Migrations not applied: Run alembic upgrade head first
- Script not copied into container: Verify the docker cp step completed successfully
- PYTHONPATH not set: Always include -e PYTHONPATH=/app when running scripts inside the container
Agent Issues
LLM Timeout or No Response
Symptoms: Chat messages do not get responses. The UI spins indefinitely.
Diagnosis:
- Missing API key: Verify LITELLM_API_BASE and LITELLM_MASTER_KEY are set
- Rate limiting: Check logs for rate limit errors; implement exponential backoff
- Model not available: Verify the model name in LITELLM_DEFAULT_MODELS is correct and accessible
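For the rate-limiting item, retries with exponential backoff can be sketched at the shell level like this. It is a minimal illustration, not the backend's actual retry logic:

```shell
# retry_backoff: run a command up to max_attempts times, doubling the delay
# between attempts. Usage: retry_backoff <max_attempts> <initial_delay> <cmd...>
retry_backoff() {
  local max="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0            # success: stop retrying
    if [ "$attempt" -ge "$max" ]; then
      return 1                  # exhausted all attempts
    fi
    sleep "$delay"
    delay=$((delay * 2))        # exponential backoff
    attempt=$((attempt + 1))
  done
}
```

For example, retry_backoff 5 2 curl -sf "$LITELLM_API_BASE/health" retries a health probe at 2 s, 4 s, 8 s, 16 s intervals before giving up.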
Tool Execution Failures
Symptoms: Agent tool calls fail. Logs show tool execution errors.
Diagnosis:
- Container not running: The user project container must be started before the agent can execute file or shell operations
- File path issues: Tool file paths are relative to the project root; verify the expected file exists
- Permission denied: Check that the container user has write access to the target directory
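Since tool file paths are resolved relative to the project root, a containment check like the following shows how to verify that a relative path does not escape the root. This is a sketch that assumes GNU realpath (-m resolves paths that do not exist yet):

```shell
# within_root: succeed only if <root>/<relative-path> resolves to a location
# inside <root>; catches "../" escapes in tool-supplied paths.
within_root() {
  local root target
  root="$(realpath -m "$1")"
  target="$(realpath -m "$1/$2")"
  [ "$target" = "$root" ] || [[ "$target" == "$root"/* ]]
}
```

For example, within_root /srv/project "../etc/passwd" fails, while within_root /srv/project "src/App.jsx" succeeds.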
Streaming Errors
Symptoms: Agent responses cut off mid-stream or the SSE connection drops.
Common causes:
- Proxy timeout: NGINX Ingress default timeouts may be too short for long agent runs. Ingress annotations should set proxy-read-timeout and proxy-send-timeout to 3600
- Client-side EventSource disconnect: Ensure the frontend properly handles reconnection
- Backend exception during streaming: Check backend logs for tracebacks during the stream
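The proxy-timeout fix translates to annotations like these. The resource name and namespace are illustrative; the annotation keys are the standard NGINX Ingress ones:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tesslate-backend   # illustrative name
  namespace: tesslate
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```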
Deployment Issues (External Providers)
SSL Certificate Not Valid
Symptoms: Browser shows a certificate warning when accessing the application.
Diagnosis:
- DNS not propagated: Wait up to 48 hours for DNS propagation
- Cloudflare API token invalid: The token needs Zone:Zone:Read and Zone:DNS:Edit permissions
- Wildcard cert subdomain limitation: *.domain.com only covers one level; foo.bar.domain.com requires a separate cert or Cloudflare proxy
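The one-level rule can be expressed as a small check; this sketch mirrors how TLS wildcard matching treats DNS labels:

```shell
# covered_by_wildcard: succeed if <host> would be covered by a *.<domain>
# certificate. A wildcard matches exactly one DNS label, so nested
# subdomains (foo.bar.domain.com) and the bare apex (domain.com) are excluded.
covered_by_wildcard() {
  local host="$1" domain="$2"
  [[ "$host" == *".$domain" ]] || return 1   # must end in .<domain>
  local prefix="${host%".$domain"}"
  [[ "$prefix" != *.* ]]                     # exactly one extra label
}
```

This is why a user project served at project.user.domain.com needs either its own certificate or the Cloudflare proxy in front of it.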
Domain Routing (503 Service Unavailable)
Symptoms: Browser shows a 503 error when accessing the application or a user project.
Diagnosis:
- Pod not ready: Wait for the pod to pass readiness checks, or check why it is failing
- Service endpoint stale: Restart the ingress controller: kubectl rollout restart deployment/ingress-nginx-controller -n ingress-nginx
- Ingress misconfigured: Inspect with kubectl describe ingress -n tesslate
CORS Errors
Symptoms: Browser console shows Access to fetch has been blocked by CORS policy.
Solutions:
- Verify APP_DOMAIN in backend config matches your frontend origin
- Check that DynamicCORSMiddleware in main.py includes the correct URL patterns
- Ensure both HTTP and HTTPS origins are allowed if your setup uses both
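To reason about which origins pass, a glob-based allowlist check is a useful mental model. This is illustrative only; the real DynamicCORSMiddleware matching logic lives in main.py and may differ:

```shell
# origin_allowed: succeed if <origin> matches any of the glob patterns that
# follow it. Note that shell globs let * cross dots, which is looser than
# TLS wildcard matching.
origin_allowed() {
  local origin="$1" pattern
  shift
  for pattern in "$@"; do
    # pattern intentionally unquoted so it is treated as a glob
    [[ "$origin" == $pattern ]] && return 0
  done
  return 1
}
```

Remember that http://localhost:5173 and https://localhost:5173 are different origins, which is why the third bullet matters for mixed setups.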
Docker Issues
Image Not Updating After Rebuild
Symptoms: Code changes do not appear after rebuilding and redeploying.
Root cause: Docker (and Minikube) caches images and does not overwrite existing images with the same tag.
Solution (Minikube): Remove the cached image from the cluster before loading the rebuilt one (see Minikube Image Caching below).
Volume Permission Errors
Symptoms: Container fails to read or write files. Logs show “Permission denied.”
Common causes:
- Wrong user inside container: Ensure the container user (1000:1000) owns the project files
- Host filesystem permissions: On Linux, Docker volumes may inherit restrictive host permissions
- Windows line endings: Files created on Windows may cause script execution failures inside Linux containers
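For the line-endings item, CRLF files can be found and fixed from the shell. This sketch assumes GNU grep and GNU sed (on macOS, sed -i needs an empty-string argument):

```shell
# find_crlf: list text files under a directory that contain CR (\r) bytes,
# i.e. Windows line endings. -I skips binary files.
find_crlf() {
  grep -rlI $'\r' "$1" || true
}

# strip_crlf: convert one file to Unix (LF) line endings in place.
strip_crlf() {
  sed -i 's/\r$//' "$1"
}
```

A shebang script with CRLF endings typically fails inside a Linux container with a confusing "bad interpreter" error, because the kernel sees the carriage return as part of the interpreter path.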
Network Conflicts
Symptoms: Containers cannot communicate. Port conflicts on the host.
Diagnosis:
- Stop conflicting services on the host that use the same ports (5432, 8000, 5173)
- Ensure the project network is connected to Traefik: check the Compose file for network configuration
Kubernetes Issues
Minikube Image Caching
Problem: minikube image load does not overwrite existing images with the same tag.
Solution: Always delete the old image before loading the new one:
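For example, assuming the image is tagged tesslate-devserver:latest (substitute the value of K8S_DEVSERVER_IMAGE for your deployment):

```shell
minikube -p tesslate image rm tesslate-devserver:latest
minikube -p tesslate image load tesslate-devserver:latest
```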
NGINX Ingress Configuration
Problem: Ingress returns 503 or the wrong backend.
Diagnosis:
- Restart the ingress controller after backend deployments: kubectl rollout restart deployment/ingress-nginx-controller -n ingress-nginx
- Verify service selectors match pod labels
- Check that the ingress class annotation matches your controller
PVC Not Bound
Symptoms: Pod stuck in Pending with event “unbound PersistentVolumeClaims.”
Diagnosis:
- StorageClass not found: Verify K8S_STORAGE_CLASS matches an available StorageClass: kubectl get sc
- No available PersistentVolumes: The dynamic provisioner may not be configured
- Pod affinity violation: All pods sharing a RWO PVC must be on the same node. Check for affinity constraint failures
VolumeSnapshot Hibernation Failures
Symptoms: Hibernation fails with “snapshot not ready” or “snapshot creation failed.”
Diagnosis:
- VolumeSnapshotClass not configured: Ensure tesslate-ebs-snapshots exists: kubectl get volumesnapshotclass
- EBS CSI driver not installed: The snapshot feature requires the AWS EBS CSI driver with snapshot support
- PVC does not exist or is not bound: Verify the PVC is in Bound state before attempting a snapshot
Quick Diagnostic Commands
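A starting set, assembled from the commands used throughout this page (namespace and deployment names as used above):

```shell
kubectl get pods -n tesslate                                     # overall pod health
kubectl describe pod <pod> -n tesslate                           # events for a failing pod
kubectl logs deployment/tesslate-backend -n tesslate --tail=100  # recent backend logs
kubectl exec -n tesslate deployment/tesslate-backend -- env | grep DATABASE  # env wiring
kubectl get ingress -n tesslate                                  # ingress status
kubectl get pvc -n tesslate                                      # PVC binding state
```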
Getting Help
If you cannot resolve an issue:
Search the codebase
Search for the error message in the source code. Many errors have comments explaining the root cause and fix.