Introduction
This tutorial guides you through setting up a private, production-ready Kubernetes cluster using Talos OS on Hetzner Cloud with proper NAT gateway configuration.
Architecture Overview
The setup consists of four building blocks:

- Load Balancer (managed by Kubernetes): entry point for inbound traffic
- Control plane (3 Talos VMs, no public IP): etcd cluster, kube-apiserver, kube-scheduler
- Worker nodes (Talos VMs, no public IP): kubelet, kube-proxy, ingress-nginx, Longhorn storage, NodePorts 30000/1
- Egress VM (management node, public IP 192.0.2.254): public IP for SSH access, private IP acting as NAT gateway, kubectl/talosctl tools, routes outbound traffic to the internet
Traffic Flow:
- Inbound: Internet → Load Balancer → Private Network Gateway → Kubernetes Worker NodePorts → ingress-nginx → Services
- Outbound: Kubernetes Nodes → Private Network Gateway → Egress VM (NAT) → Internet
- Management: SSH to Egress VM → kubectl/talosctl to Kubernetes cluster
Prerequisites
- Hetzner Console account and API token
- Basic knowledge of Kubernetes and networking
- Understanding of Talos OS concepts
Step 1 - Set Up NAT Gateway and Private Network
This step explains how to set up the egress VM from the architecture overview above as a NAT gateway.
Before creating the Kubernetes cluster, you need to set up a private Network with a NAT gateway for internet access. Follow the official Hetzner tutorial:
How to set up NAT for Cloud Networks
Do the following:

You do NOT need the client servers. You should only follow the steps for creating and setting up the private Network and the NAT server. Use `10.21.0.0/16` for the Network.
| Steps to follow | Description |
|---|---|
| Step 1 | Create a new Network as explained in that step. You can set it to 10.21.0.0/16 |
| Step 6 | Create a new server. This server will act as the NAT gateway (the egress VM). |
| Step 2 | Add the route to the Network as explained in that step. Set the server you just created (or the existing server you just configured) as the Gateway. |
| Conclusion | Configure the firewall |
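For orientation, the core of what that tutorial configures on the NAT server is IPv4 forwarding plus a masquerade rule; a rough sketch is shown below (the public interface name is an assumption, and the linked tutorial remains the authoritative reference):

```bash
# Enable IPv4 forwarding (persist it via /etc/sysctl.d/ on the egress VM)
sysctl -w net.ipv4.ip_forward=1

# Masquerade traffic from the private Network out of the public interface
# NOTE: "eth0" as the public interface and the CIDR are assumptions -- adjust to your setup
iptables -t nat -A POSTROUTING -s 10.21.0.0/16 -o eth0 -j MASQUERADE
```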
In the steps below, you will need the Network ID for hcloud_network_id. When you select your Network in Hetzner Console, the URL in the address bar of the browser contains the Network ID:
```
https://console.hetzner.com/projects/<project-id>/networks/<network-id>/resources
```

Step 2 - Install prerequisites
In Step 1, you created and configured an egress VM to act as a NAT gateway for the Kubernetes nodes.
On that egress VM, install the following tools as explained in their official documentation:

- Terraform
- Packer (used to build the Talos images)
- kubectl
- talosctl

kubectl and talosctl are needed to access the Kubernetes cluster from the egress VM.
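As a minimal sketch for a Linux/amd64 egress VM, the two CLIs can be installed with the projects' published install methods (run as root or with sudo; see the official documentation for current, distribution-specific instructions):

```bash
# talosctl -- official install script from the Talos project
curl -sL https://talos.dev/install | sh

# kubectl -- latest stable release from the Kubernetes project
curl -LO "https://dl.k8s.io/release/$(curl -Ls https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -m 0755 kubectl /usr/local/bin/kubectl
```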
Step 3 - Create Project Directory
Now that the egress VM works as a NAT gateway and has all the prerequisites installed, you can create a new directory for the project files:
```bash
mkdir k8s-cluster && cd k8s-cluster
```

Step 4 - Configure Terraform Variables
In the new directory k8s-cluster on the egress VM, add the following files:
- Create `variables.tf`:

```hcl
variable "hcloud_token" {
  description = "Hetzner Cloud API token. Prefer to supply via TF_VAR_hcloud_token or terraform.tfvars, or rely on provider using HCLOUD_TOKEN env."
  type        = string
  sensitive   = true
}
```

Optional: S3 Backup Variables

If you want to enable S3 backup for Talos configuration, add these additional variables to your `variables.tf`:

```hcl
variable "talos_backup_s3_access_key" {
  description = "S3 Access Key for Talos Backup."
  type        = string
  sensitive   = true
  default     = ""
}

variable "talos_backup_s3_secret_key" {
  description = "S3 Secret Access Key for Talos Backup."
  type        = string
  sensitive   = true
  default     = ""
}

variable "talos_backup_s3_bucket" {
  description = "S3 bucket name for Talos backups."
  type        = string
  default     = ""
}

variable "talos_backup_s3_endpoint" {
  description = "S3 endpoint hostname for Talos backups."
  type        = string
  default     = ""
}

variable "talos_backup_s3_region" {
  description = "S3 region for Talos backups."
  type        = string
  default     = ""
}
```
- Create `terraform.tfvars` or set your token via an environment variable:

```bash
# Option 1: Environment variable
export TF_VAR_hcloud_token="your-hetzner-cloud-api-token"

# Option 2: Create terraform.tfvars file
echo 'hcloud_token = "your-hetzner-cloud-api-token"' > terraform.tfvars
```

Optional: S3 Backup Variables

If you want to enable S3 backup for Talos configuration, add these additional variables to your `terraform.tfvars` (note the `>>` to append instead of overwriting the file):

```bash
echo 'talos_backup_s3_access_key = "your-access-key"' >> terraform.tfvars
echo 'talos_backup_s3_secret_key = "your-secret-key"' >> terraform.tfvars
echo 'talos_backup_s3_bucket = "your-bucket-name"' >> terraform.tfvars
echo 'talos_backup_s3_endpoint = "your-endpoint"' >> terraform.tfvars
echo 'talos_backup_s3_region = "your-region"' >> terraform.tfvars
```
- Create `kubernetes.tf` with the main cluster configuration. Replace `YOUR_NETWORK_ID_HERE` with your actual Network ID.

```hcl
module "kubernetes" {
  source  = "hcloud-k8s/kubernetes/hcloud"
  version = "3.20.1"

  cluster_name = "k8s"
  hcloud_token = var.hcloud_token

  # Export configs for Talos and Kube API access
  cluster_kubeconfig_path  = "kubeconfig"
  cluster_talosconfig_path = "talosconfig"

  # Optional Ingress Controller, Cert Manager and Storage
  cert_manager_enabled  = true
  ingress_nginx_enabled = true
  longhorn_enabled      = true

  network_ipv4_cidr = "10.21.0.0/16"

  # Private nodes, egress via your own gateway VM
  talos_public_ipv4_enabled = false
  talos_public_ipv6_enabled = false

  control_plane_nodepools = [
    { name = "control", type = "cx23", location = "hel1", count = 3 }
  ]

  worker_nodepools = [
    # placement_group = true ensures VMs are distributed across different physical servers
    { name = "worker-hel-ccx", type = "ccx23", location = "hel1", count = 3, placement_group = true },
  ]

  cluster_healthcheck_enabled = true

  firewall_use_current_ipv4 = false
  firewall_use_current_ipv6 = false

  cluster_access                   = "private"
  talos_extra_routes               = ["0.0.0.0/0"]
  network_native_routing_ipv4_cidr = "10.0.0.0/8"

  # Use your existing Network ID from the NAT gateway setup (Step 1)
  # You can find this in Hetzner Cloud Console -> Networks or via: hcloud network list
  hcloud_network_id = YOUR_NETWORK_ID_HERE

  control_plane_private_vip_ipv4_enabled = true

  ingress_nginx_kind                            = "DaemonSet"
  ingress_nginx_service_external_traffic_policy = "Local"

  ingress_load_balancer_pools = [
    {
      name     = "regional-lb-hel"
      location = "hel1"
    }
  ]

  cluster_autoscaler_nodepools = [
    {
      name     = "autoscaler"
      type     = "ccx23"
      location = "hel1"
      min      = 0
      max      = 6
      labels   = { "autoscaler-node" = "true" }
      taints   = [
        "autoscaler-node=true:NoExecute"
      ]
    }
  ]

  cluster_delete_protection = true
}
```

Optional: S3 Backup Configuration

If you added the S3 backup variables to your `variables.tf`, include these lines in your `kubernetes.tf` module configuration:

```hcl
# Add these lines inside the module "kubernetes" block above
talos_backup_s3_endpoint   = var.talos_backup_s3_endpoint
talos_backup_s3_region     = var.talos_backup_s3_region
talos_backup_s3_bucket     = var.talos_backup_s3_bucket
talos_backup_s3_access_key = var.talos_backup_s3_access_key
talos_backup_s3_secret_key = var.talos_backup_s3_secret_key
```
You'll need these files in your working directory:
| File | Description |
|---|---|
| variables.tf | Variable definitions (with optional S3 variables) |
| kubernetes.tf | Main cluster configuration (with optional S3 configuration) |
| terraform.tfvars | Your actual values (or use environment variables) |
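Optionally, you can run a quick local syntax check of these files before deploying anything; this sketch only validates the configuration and creates no Hetzner resources:

```bash
# Local checks only -- no resources are created
terraform init -backend=false   # downloads providers/modules without configuring a backend
terraform validate
terraform fmt -check
```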
Step 5 - Deploy the Cluster
Note that terraform apply will create chargeable resources in Hetzner Console. Terraform will create several chargeable cloud servers, a chargeable Load Balancer, chargeable Snapshots, a Firewall, Placement Groups, and more.
The number of control plane nodes and worker nodes is defined via count = # in kubernetes.tf.
Initialize and apply the Terraform configuration:
```bash
terraform init -upgrade
terraform apply
```

Review the planned changes and confirm the deployment. This process will:
- Create Talos images using Packer
- Deploy control plane and worker nodes
- Configure the Kubernetes cluster
- Set up ingress controllers and cert-manager
- Configure Longhorn for persistent storage
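Depending on the node count, the apply can take a while. Once the module has written the kubeconfig file to the working directory (the path configured via `cluster_kubeconfig_path` above), you can follow the nodes joining from a second shell; a small convenience sketch:

```bash
# Assumes terraform apply has already written the kubeconfig file to this directory
watch kubectl --kubeconfig kubeconfig get nodes -o wide
```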
Step 6 - Access Your Cluster
After successful deployment, you'll find the configuration files in your current directory:
```bash
export TALOSCONFIG=talosconfig
export KUBECONFIG=kubeconfig
```

Verify your cluster is running:

```bash
# Check Talos cluster members
talosctl get member
# Check Kubernetes nodes
kubectl get nodes -o wide
# Check all pods across namespaces
kubectl get pods -A
```

Step 7 - Configuration summary and enhancement suggestions
Configuration Highlights
- Private Network Setup
  - Network CIDR: `10.21.0.0/16`
  - No public IPs: All Kubernetes nodes are private
  - Egress routing: Traffic flows through your gateway VM
- High Availability Features
  - Control plane: 3 nodes for HA
  - Worker nodes: Distributed across Placement Groups (ensures VMs run on different physical hardware for better reliability)
  - Ingress: Load Balancer with DaemonSet configuration
  - Storage: Longhorn for distributed persistent storage
- Security Features
  - Private cluster access: No direct internet access to nodes
  - Firewall: Controlled access through Hetzner Cloud Firewall rules
  - Backup: Optional S3 backup for Talos configuration
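If the hcloud CLI is set up on the egress VM (an assumption; it is not installed as part of this tutorial), you can quickly cross-check these highlights against the live project:

```bash
# Assumes an hcloud CLI context configured with your project's API token
hcloud server list            # Kubernetes nodes should show no public IPs
hcloud placement-group list   # placement groups for the worker pools
hcloud firewall list          # the firewall created by the module
```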
Scaling to Full High Availability
For a production-ready, fully high-available setup, consider these enhancements:
- Multiple Egress VMs

  Deploy additional egress VMs in different locations with failover configuration (see the sketch after this list item):
  - Set up multiple gateway VMs across different zones
  - Configure VRRP (Virtual Router Redundancy Protocol) for automatic failover
  - Use BGP routing for advanced traffic management
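As a rough illustration of the VRRP idea, below is a minimal keepalived sketch for two gateway VMs sharing a virtual gateway IP; the interface name, router ID, and IP are assumptions, and on Hetzner Cloud Networks you may additionally need to switch the Network route's gateway via the API on failover:

```bash
# Hypothetical sketch: run on both gateway VMs (adjust state/priority on the backup node)
apt-get install -y keepalived
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance GATEWAY {
    state MASTER              # set to BACKUP on the second VM
    interface enp7s0          # private network interface (assumption)
    virtual_router_id 51
    priority 150              # use a lower value on the BACKUP node
    virtual_ipaddress {
        10.21.0.100/16        # shared gateway IP used as the route target (assumption)
    }
}
EOF
systemctl enable --now keepalived
```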
- Multi-Region Load Balancers

```hcl
ingress_load_balancer_pools = [
  {
    name          = "regional-lb-fsn"
    location      = "fsn1"
    local_traffic = true
  },
  {
    name          = "regional-lb-nbg"
    location      = "nbg1"
    local_traffic = true
  },
  {
    name          = "regional-lb-hel"
    location      = "hel1"
    local_traffic = true
  }
]
```
- Cross-Zone Worker Distribution

```hcl
worker_nodepools = [
  # Each placement_group = true ensures nodes within each location are on different physical servers
  { name = "worker-fsn", type = "cpx42", location = "fsn1", count = 2, placement_group = true },
  { name = "worker-nbg", type = "cpx42", location = "nbg1", count = 2, placement_group = true },
  { name = "worker-hel", type = "cpx42", location = "hel1", count = 2, placement_group = true },
]
```
- Additional HA Components
  - External DNS: Automatic DNS management for services
  - Monitoring: Prometheus and Grafana for observability
  - Backup strategies: Regular etcd and persistent volume backups (see the etcd snapshot example below)
  - Disaster recovery: Cross-region backup and restore procedures
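For the etcd part of such a backup strategy, Talos can take an etcd snapshot directly from a control plane node; a minimal sketch, with the node IP left as a placeholder:

```bash
# Take an etcd snapshot from a control plane node and store it outside the cluster
talosctl -n <control-plane-ip> etcd snapshot etcd-backup.snapshot
```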
Troubleshooting
- Common Issues
  - Network connectivity: Ensure your egress VM's NAT rules are correctly configured
  - DNS resolution: Verify that private nodes can resolve external DNS through the gateway (see the debug-pod sketch below)
  - Load Balancer access: Check that ingress controllers are properly configured for private network access
- Useful Commands

```bash
# Check Talos node status
talosctl -n <node-ip> get nodeready

# Check network connectivity from nodes
talosctl -n <node-ip> get links

# Restart the kubelet service if needed (Talos does not run systemd)
talosctl -n <node-ip> service kubelet restart
```
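To test DNS resolution and basic outbound connectivity from inside the cluster (the "DNS resolution" issue above), a throwaway busybox pod is usually the quickest check; a minimal sketch:

```bash
# Starts a temporary pod, resolves an external name through CoreDNS and the NAT gateway, then cleans up
kubectl run nat-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.io
```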
Conclusion
With this configuration, you have a fully private, production-ready Kubernetes cluster running on Hetzner Cloud that can scale to meet your needs while maintaining high security and availability standards.