
1. What you will deploy
- EKS cluster
- ECR: tesslate-backend, tesslate-frontend, tesslate-devserver, tesslate-ast, tesslate-btrfs-csi, and seeded app images.
- S3 + IAM (IRSA)
- NGINX Ingress + NLB
- cert-manager + Cloudflare: apex and *.domain records on the NLB hostname.
- btrfs CSI + Volume Hub

The per-environment stack (k8s/terraform/aws/) also provisions the following:
- A VPC (10.0.0.0/16, three public plus three private subnets across three AZs).
- A NAT gateway.
- LiteLLM, with an optional RDS backend.
- The application workloads in the tesslate namespace: backend, frontend, worker, Redis, Postgres or external RDS, and cleanup CronJobs.
The shared stack (k8s/terraform/shared/) provisions cross-environment resources: ECR repos, a small tesslate-platform-eks cluster for internal tools (Headscale VPN and friends), and platform-level NGINX Ingress plus cert-manager plus Cloudflare DNS.
2. Prerequisites
- AWS account + CLI: aws sts get-caller-identity must succeed against the target account.
- Terraform >= 1.5 (see main.tf). State is stored in S3, keyed per environment.
- kubectl + Helm
- Docker with buildx, for linux/amd64 builds. Apple Silicon still works because buildx cross-compiles.
- Cloudflare API token with Zone:DNS:Edit and Zone:Zone:Read on the target zone.
- ECR push access: an IAM identity listed in eks_admin_iam_arns, or a named team role with push permissions.

IAM permissions:
- Ability to create VPC, EKS, IAM, S3, and ECR resources (directly or via an assumed role).
- An IAM user listed in eks_admin_iam_arns for the target environment, or membership in a team IAM group. See section 6 (EKS access) for details.
- AWS Secrets Manager access to tesslate/terraform/{production,beta,shared} for pulling tfvars.

On Windows with Git Bash, run kubectl and docker exec with MSYS_NO_PATHCONV=1 so Git Bash does not rewrite paths.

3. First-time provisioning
- Apply the shared stack.
- Apply beta first, before production.

The aws-deploy.sh helper auto-detects backend drift. If your local .terraform/terraform.tfstate points at the wrong environment, the helper reinitializes with the correct backend HCL before running plan or apply.
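The drift check can be sketched in a few lines of Python. This sketch is not the aws-deploy.sh implementation. It makes two assumptions:
- Terraform records the configured S3 backend in .terraform/terraform.tfstate under backend.config.key.
- Every environment uses the <env>/terraform.tfstate key pattern that section 4 shows for beta.

The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def current_backend_key(tfstate_path: str = ".terraform/terraform.tfstate") -> Optional[str]:
    """Return the S3 state key recorded by the last `terraform init`, or None if absent."""
    path = Path(tfstate_path)
    if not path.exists():
        return None  # the working directory was never initialized
    data = json.loads(path.read_text())
    backend = data.get("backend") or {}
    return (backend.get("config") or {}).get("key")


def needs_reinit(env: str, tfstate_path: str = ".terraform/terraform.tfstate") -> bool:
    """True when the local backend does not point at this environment's state key."""
    # Assumption: every environment follows the <env>/terraform.tfstate key used by beta.
    expected = f"{env}/terraform.tfstate"
    return current_backend_key(tfstate_path) != expected
```

When needs_reinit("beta") is true, the fix is a reinit against the matching backend file, for example terraform init -reconfigure -backend-config=backend-beta.hcl.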
4. Environments: beta vs production
Beta:
| Field | Value |
|---|---|
| Terraform state key | beta/terraform.tfstate |
| Backend config | backend-beta.hcl |
| tfvars file | terraform.beta.tfvars |
| Secrets Manager entry | tesslate/terraform/beta |
| Kustomize overlay | k8s/overlays/aws-beta/ |
| ECR tag | :beta |
| kubectl context | tesslate-beta-eks |
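The beta values follow an <env>-based naming pattern, which this Python sketch derives.

For production, some values are confirmed elsewhere in this guide:
- Secrets Manager entry: tesslate/terraform/production
- Kustomize overlay: k8s/overlays/aws-production/
- ECR tag: :production
- kubectl context: tesslate-production-eks

The production state key, backend file and tfvars file are assumptions extrapolated from beta. EnvSettings and settings_for are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    state_key: str
    backend_config: str
    tfvars_file: str
    secrets_entry: str
    kustomize_overlay: str
    ecr_tag: str
    kubectl_context: str


def settings_for(env: str) -> EnvSettings:
    """Derive per-environment names from the beta naming pattern (assumed to generalize)."""
    if env not in ("beta", "production"):
        raise ValueError(f"unknown environment: {env}")
    return EnvSettings(
        state_key=f"{env}/terraform.tfstate",
        backend_config=f"backend-{env}.hcl",
        tfvars_file=f"terraform.{env}.tfvars",
        secrets_entry=f"tesslate/terraform/{env}",
        kustomize_overlay=f"k8s/overlays/aws-{env}/",
        ecr_tag=f":{env}",
        kubectl_context=f"tesslate-{env}-eks",
    )
```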
5. Secrets management (envFrom auto-sync)
Three Kubernetes secrets in the tesslate namespace are fully terraform-managed from k8s/terraform/aws/kubernetes.tf:
- tesslate-app-secrets: app-level config (APP_DOMAIN, LITELLM_MASTER_KEY, OAuth client secrets, Stripe keys, SMTP, PostHog, etc.)
- postgres-secret: Postgres credentials
- s3-credentials: S3 bucket config (the backend pod uses IRSA for auth, so no static AWS keys land in the secret)

Pods load these secrets with envFrom. This is the auto-sync half of the pattern: every key added to a terraform-managed secret becomes a pod env var on the next rollout, with no kustomize edit required.
The other half is explicit env entries in k8s/overlays/aws-base/backend-patch.yaml. Those entries live under a $patch: replace directive so the base manifest’s env array is wiped and only static values plus one alias (K8S_INGRESS_DOMAIN to APP_DOMAIN) remain. Without $patch: replace, stale base entries would merge back in.
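This Python sketch is a simplified model of why $patch: replace matters. Kubernetes strategic merge treats a container's env list as merged by name, so base-only entries survive a normal patch. A $patch: replace element in the list discards the base list instead.

Real kustomize handles more cases (ordering rules, delete directives), so treat this as an illustration only. The secretKeyRef form of the K8S_INGRESS_DOMAIN alias in the usage below is an assumed example, not a copy of backend-patch.yaml.

```python
from typing import List


def merge_env(base: List[dict], patch: List[dict]) -> List[dict]:
    """Simplified strategic merge of a container env list (merge key: name)."""
    if any(item.get("$patch") == "replace" for item in patch):
        # $patch: replace wipes the base list; only the patch's own entries remain.
        return [item for item in patch if "$patch" not in item]
    merged = {item["name"]: item for item in base}
    for item in patch:
        merged[item["name"]] = item  # patch wins on a name collision; base-only entries stay
    return list(merged.values())
```

Usage: merging an alias entry into a base list that contains a stale OLD_FLAG keeps OLD_FLAG. Adding {"$patch": "replace"} to the patch removes it.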
Rotating a secret
Env vars sourced from a secret are read only when a container starts. After a secret value changes, the reload step rolls pods so they pick up the new values.
6. EKS access
EKS uses a role-based model. Regular humans assume one of four team roles. Terraform, CI, and a small list of named admins (tesslate-terraform, tesslate-bigboss) assume the eks-deployer role.
| I want to… | Role | ARN pattern |
|---|---|---|
| kubectl logs, get, describe, read CloudWatch logs, browse ECR | team-observer | arn:aws:iam::859561299901:role/tesslate-{env}-eks-team-observer |
| Above, plus kubectl rollout, restart/patch deployments, push to ECR | team-deployer | arn:aws:iam::859561299901:role/tesslate-{env}-eks-team-deployer |
| Above, plus kubectl exec, shell into pods, run debug containers | team-debugger | arn:aws:iam::859561299901:role/tesslate-{env}-eks-team-debugger |
| Above, plus Secrets Manager, RBAC, namespace mgmt, IAM team users | team-admin | arn:aws:iam::859561299901:role/tesslate-{env}-eks-team-admin |
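The roles are cumulative, so the right choice is the lowest row that covers what you need. This Python sketch encodes that rule and builds the matching ARN from the pattern in the table. The capability labels (such as "exec" or "ecr-push") are hypothetical shorthand for the table rows, not AWS or Kubernetes permission names.

```python
ACCOUNT_ID = "859561299901"

# Ordered least to most privileged; each role includes everything granted above it.
TEAM_ROLES = [
    ("team-observer", {"logs", "get", "describe", "cloudwatch-logs", "ecr-browse"}),
    ("team-deployer", {"rollout", "restart", "patch-deployment", "ecr-push"}),
    ("team-debugger", {"exec", "shell", "debug-container"}),
    ("team-admin", {"secrets-manager", "rbac", "namespace-mgmt", "iam-team-users"}),
]


def least_privileged_role(env: str, needed: set) -> str:
    """Return the ARN of the lowest team role whose cumulative grants cover `needed`."""
    granted = set()
    for role, adds in TEAM_ROLES:
        granted |= adds
        if needed <= granted:
            return f"arn:aws:iam::{ACCOUNT_ID}:role/tesslate-{env}-eks-{role}"
    raise ValueError(f"no team role covers: {sorted(needed - granted)}")
```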
Configure kubectl with the deployer role
Passing --role-arn to aws eks update-kubeconfig bakes role assumption into the resulting kubeconfig, so every later kubectl call uses it.

Engineers who are not listed in eks_admin_iam_arns should use named AWS CLI profiles with role_arn entries for the team role they need. The full onboarding flow (IAM groups, ~/.aws/config snippets, assume-role one-liner) lives in the EKS Cluster Access guide.

Every time aws-deploy.sh touches the cluster, it runs aws eks update-kubeconfig with --role-arn arn:aws:iam::859561299901:role/tesslate-{env}-eks-eks-deployer. You therefore do not need to configure kubectl manually before running its subcommands.
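This Python sketch renders a named-profile section for ~/.aws/config that assumes a team role. role_arn, source_profile and region are standard AWS CLI config keys. The profile naming scheme and the default source profile are assumptions, and the onboarding guide's own snippets take precedence.

```python
import configparser
import io


def team_profile_config(env: str, role: str, source_profile: str = "default") -> str:
    """Render an ~/.aws/config section that assumes a Tesslate EKS team role."""
    config = configparser.ConfigParser()
    section = f"profile tesslate-{env}-{role}"  # hypothetical profile naming scheme
    config[section] = {
        "role_arn": f"arn:aws:iam::859561299901:role/tesslate-{env}-eks-{role}",
        "source_profile": source_profile,  # profile holding your own IAM user credentials
        "region": "us-east-1",
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()
```

After appending the output to ~/.aws/config, aws eks update-kubeconfig --name tesslate-beta-eks --profile tesslate-beta-team-observer writes a kubeconfig entry that uses that profile. This assumes the cluster name matches the /aws/eks/tesslate-{env}-eks log group naming.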
7. Build and push images
Six images live in ECR under account 859561299901 in us-east-1:
| Repository | Dockerfile | Purpose |
|---|---|---|
| tesslate-backend | orchestrator/Dockerfile | FastAPI + ARQ worker |
| tesslate-frontend | app/Dockerfile.prod | React + Vite SPA behind NGINX |
| tesslate-devserver | orchestrator/Dockerfile.devserver | User project container base |
| tesslate-ast | services/ast/Dockerfile | AST parser sidecar of the backend pod |
| tesslate-btrfs-csi | services/btrfs-csi/Dockerfile | CSI driver + Volume Hub |
| tesslate-markitdown, tesslate-deerflow | seeds/apps/.../Dockerfile | Seeded Tesslate Apps |
Tags are :production or :beta for first-class images and :latest for seeded app images.

The aws-deploy.sh build subcommand performs these steps:
- git submodule update --init --recursive (the agent runner in packages/tesslate-agent is copied into the backend image).
- aws ecr get-login-password | docker login against 859561299901.dkr.ecr.us-east-1.amazonaws.com.
- docker buildx build --platform linux/amd64 --push in parallel across the selected images.
- aws eks update-kubeconfig with the eks-deployer role for the target environment.
- kubectl apply -k k8s/overlays/aws-{env} to pick up any manifest changes.
- Rolling restart of the impacted Deployments plus a parallel kubectl rollout status --timeout=300s.
- If the backend was rebuilt, python -m scripts.seed_apps runs inside the backend pod to upsert the Tesslate Apps registry.
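The parallel build step can be sketched in Python as follows. The sketch is not the script's actual code, and it makes these assumptions:
- The image references follow the registry and tag scheme above.
- The build context is the repository root (".").
- The seeded app images are left out because their Dockerfile paths are elided.
- The submodule update and docker login steps have already run.

buildx_command and build_all are hypothetical names. The run callable is injected so the plan can be dry-run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

REGISTRY = "859561299901.dkr.ecr.us-east-1.amazonaws.com"

# First-class images and their Dockerfiles, taken from the table above.
DOCKERFILES = {
    "tesslate-backend": "orchestrator/Dockerfile",
    "tesslate-frontend": "app/Dockerfile.prod",
    "tesslate-devserver": "orchestrator/Dockerfile.devserver",
    "tesslate-ast": "services/ast/Dockerfile",
    "tesslate-btrfs-csi": "services/btrfs-csi/Dockerfile",
}


def buildx_command(repo: str, env: str, context: str = ".") -> List[str]:
    """argv for one linux/amd64 build-and-push; the build context is an assumption."""
    return [
        "docker", "buildx", "build",
        "--platform", "linux/amd64",
        "-f", DOCKERFILES[repo],
        "-t", f"{REGISTRY}/{repo}:{env}",
        "--push",
        context,
    ]


def build_all(repos: List[str], env: str, run: Callable[[List[str]], int]) -> Dict[str, int]:
    """Run the selected builds concurrently; `run` executes one argv and returns its exit code."""
    with ThreadPoolExecutor(max_workers=len(repos) or 1) as pool:
        futures = {repo: pool.submit(run, buildx_command(repo, env)) for repo in repos}
        return {repo: future.result() for repo, future in futures.items()}
```

For a real run, pass lambda argv: subprocess.run(argv).returncode as run. For a dry run, pass a function that only prints the argv.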
8. Deploy
9. Verify
Open https://<your-domain>/ and confirm that the dashboard loads and you can create a project.

10. DNS and TLS
DNS and certificates are fully managed by terraform plus in-cluster controllers:
- Cloudflare DNS records (dns.tf) create CNAMEs for the apex domain and *.domain pointing at the NLB hostname, proxied through Cloudflare.
- external-dns reconciles per-project subdomain records from Ingress annotations when users deploy preview projects.
- cert-manager runs a ClusterIssuer that uses the Cloudflare API token for DNS01 challenges. It mints a wildcard Let’s Encrypt cert and stores it in the tesslate-wildcard-tls Secret, which Ingress resources reference via K8S_WILDCARD_TLS_SECRET.
- Cloudflare SSL mode should be Full (strict) so that both hops (browser to edge, edge to NLB) are encrypted.

If the Certificate stays Ready=False for longer than ten minutes, check kubectl describe certificaterequest and the cert-manager logs for Cloudflare API errors.
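cert-manager handles the DNS01 exchange for you, but knowing what it writes to Cloudflare helps when debugging a stuck certificate. This Python sketch follows RFC 8555 section 8.4:
- The TXT record lives at _acme-challenge.<domain>, and a wildcard name is validated at its base domain.
- The record value is the unpadded base64url SHA-256 digest of "<token>.<account key thumbprint>".

The token and thumbprint values you would pass in are hypothetical. The real values come from the ACME server and cert-manager's account key.

```python
import base64
import hashlib


def dns01_record_name(domain: str) -> str:
    """TXT record name for a DNS-01 challenge; a wildcard is validated at its base domain."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """base64url(SHA-256(key authorization)) with padding stripped, per RFC 8555 section 8.4."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```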
11. Seed the production database
Run this once after the initial terraform apply. Seeds upsert by slug, so running them twice is safe but wasteful. Schema migrations are applied with alembic upgrade head inside the backend pod.
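Upserting by slug makes seeding idempotent: a second run matches every existing row and changes nothing. This Python sketch models the registry as a dict keyed by slug. The real seeder writes to Postgres through scripts.seed_apps, and the example slugs in the test are hypothetical.

```python
from typing import Dict, List


def upsert_by_slug(registry: Dict[str, dict], seeds: List[dict]) -> int:
    """Insert or update each seed keyed by its slug; return how many entries changed."""
    changed = 0
    for seed in seeds:
        if registry.get(seed["slug"]) != seed:
            registry[seed["slug"]] = dict(seed)  # insert new slug or overwrite changed fields
            changed += 1
    return changed
```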
12. Scaling
Scaling has three independent layers.

Pod replicas
k8s/overlays/aws-production/replicas-patch.yaml sets the replica counts for the backend, frontend, worker, and ingress controller. For a hotfix scale, run kubectl --context=tesslate-production-eks scale deploy/tesslate-backend -n tesslate --replicas=4.

HPA
metrics-server is enabled in terraform (enable_metrics_server = true). Add a HorizontalPodAutoscaler per Deployment as needed.

Cluster autoscaler
Nodes scale between eks_node_min_size and eks_node_max_size, and spot nodes scale up to eks_spot_max_size. User project workloads prefer the spot group. To add a node group, define it under additional_node_groups in tfvars and apply. The schema lives in variables.tf.
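When you add an HPA, Kubernetes computes the target replica count with its documented rule:

desired = ceil(current * currentMetric / targetMetric)

It skips changes when the ratio is within a tolerance (0.1 by default) and clamps the result to the HPA's min and max replicas. This Python sketch reproduces only that core rule. It omits stabilization windows, missing-metric handling and per-pod averaging details.

```python
import math


def hpa_desired_replicas(current: int, current_metric: float, target_metric: float,
                         min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current * current_metric / target_metric), clamped to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: leave the replica count alone
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 backend pods averaging 90% CPU against a 60% target scale to 6.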
13. Observability
- Control plane logs stream to CloudWatch at /aws/eks/tesslate-{env}-eks/cluster.
- Workload logs: kubectl --context=tesslate-production-eks logs -n tesslate deploy/tesslate-backend -f.
- Metrics: kubectl --context=tesslate-production-eks top pods -n tesslate and kubectl top nodes. For historical data, install kube-prometheus-stack via Helm or route metrics from the OpenTelemetry Collector to CloudWatch.
- Structured logs + OpenTelemetry: see the Enterprise observability guide for deploying the OTel Collector, wiring exporters, and enabling the audit log stream.
14. Updates and migrations
Trigger deploy
Run the Deploy Production workflow (.github/workflows/deploy-production.yml) in GitHub Actions. The workflow:
- downloads tesslate/terraform/production from Secrets Manager;
- runs terraform plan -detailed-exitcode and applies;
- runs ./scripts/aws-deploy.sh deploy-k8s production, followed by ./scripts/aws-deploy.sh build production.

Alternatively, run the same aws-deploy.sh subcommands from your workstation.

Deployments are updated with a rolling rollout restart, so the Deployment is always serving traffic. For a hard cutover or a very large schema migration, drain traffic first using the safe shutdown procedure.
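terraform plan -detailed-exitcode returns one of three exit codes:
- 0: success with no changes
- 1: error
- 2: success with a pending diff

This Python sketch shows how a pipeline can branch on those codes. The guide does not say whether deploy-production.yml skips the apply step on exit code 0, so that behaviour is an assumption. The function names are hypothetical, and the working directory and var-file are inputs you supply (for example k8s/terraform/aws with the production tfvars).

```python
import subprocess


def plan_action(exit_code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` result to the next pipeline step."""
    if exit_code == 0:
        return "skip-apply"  # plan succeeded and found no changes
    if exit_code == 2:
        return "apply"  # plan succeeded and found a diff
    raise RuntimeError(f"terraform plan failed with exit code {exit_code}")


def plan(workdir: str, var_file: str) -> str:
    """Write a plan to ./tfplan in `workdir` and decide whether an apply is needed."""
    proc = subprocess.run(
        ["terraform", "plan", "-input=false", "-detailed-exitcode",
         f"-var-file={var_file}", "-out=tfplan"],
        cwd=workdir,
    )
    return plan_action(proc.returncode)
```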
15. Rollback and safe shutdown
Rollback a single Deployment. To return to a known-good image, either point :production back at that build, or bump newTag in k8s/overlays/aws-production/kustomization.yaml to a specific SHA and reapply.
For planned downtime, maintenance windows, or draining user pods cleanly, follow the safe shutdown procedure in docs/guides/safe-shutdown-procedure.md on GitHub: stop user containers, scale the worker and backend to zero, pause the task queue, and only then apply the risky change.
16. Troubleshooting
AccessDenied: eks:DescribeCluster
Your IAM user is not listed in eks_admin_iam_arns. Assume a team role first (a one-shot aws sts assume-role, or a named AWS profile with role_arn) before running any aws eks or kubectl command.

error: You must be logged in to the server (Unauthorized)

ErrImagePull / unauthorized on a tesslate-* image

no match for platform in manifest
The image was built for the wrong architecture. Rebuild with docker buildx build --platform linux/amd64 --push. The build subcommand does this by default.

Ingress returns 503

Certificate stuck Ready=False
Confirm that the Cloudflare API token has Zone:Zone:Read and Zone:DNS:Edit on the correct zone, and check the cert-manager logs.

Frontend calls go to /api/api/...
The api-url ConfigMap includes a trailing /api. Set frontend_api_url = "https://opensail.tesslate.com" in tfvars (no /api) and reapply.

Backend CrashLoopBackOff immediately
Usually tesslate-app-secrets is missing a key consumed via envFrom. Add the key in k8s/terraform/aws/kubernetes.tf and reapply.

No module named 'tesslate_agent' in the backend
The packages/tesslate-agent submodule was not initialized before docker build. Run git submodule update --init --recursive, then rebuild. The build script handles this automatically.

Volume Hub pods stuck Terminating
Wait for the tesslate-btrfs-csi-node rollout to stabilize. ./scripts/aws-deploy.sh build compute and reload volume-hub sequence these correctly.

Orphaned proj-* namespaces

terraform apply times out on Helm resources
Apply with -target=module.eks first, then rerun the full apply once the cluster is Ready.

Next steps
Local Kubernetes
Docker Setup
Publishing Apps
Billing
Getting help
Discord
GitHub
[email protected].