
Private Kubernetes cluster via Terraform and Talos

Author: Tamas Mihalik
Published: 2026-01-19
Reading time: 13 minutes

Introduction

This tutorial guides you through setting up a private, production-ready Kubernetes cluster on Hetzner Cloud using Talos Linux, with a NAT gateway providing outbound internet access for the otherwise isolated nodes.

Architecture Overview

The private Network 10.21.0.0/16 (gateway 10.21.0.1) contains the subnets 10.21.64.0/25, 10.21.65.0/25, and 10.21.0.0/24, plus a route for 0.0.0.0/0 via the egress VM. The components are:

  • Hetzner Load Balancer (managed by Kubernetes)
    Public IP: 203.0.113.1, private IP: 10.21.64.251
    Forwards port 80 » NodePort 30000 and port 443 » NodePort 30001

  • Control plane nodes (3 Talos VMs)
    Private IP: 10.21.64.X, no public IP
    Run etcd, kube-apiserver, and kube-scheduler

  • Worker nodes (Talos VMs)
    Private IP: 10.21.64.X, no public IP
    Run kubelet, kube-proxy, ingress-nginx, and Longhorn storage; expose NodePorts 30000/30001

  • Egress VM (management node)
    Private IP: 10.21.0.2, public IP: 192.0.2.254
    Reachable via SSH on its public IP, acts as NAT gateway routing cluster traffic to the internet, and hosts the kubectl/talosctl tools

Traffic Flow:

  • Inbound: Internet → Load Balancer → Private Network Gateway → Kubernetes Worker NodePorts → ingress-nginx → Services
  • Outbound: Kubernetes Nodes → Private Network Gateway → Egress VM (NAT) → Internet
  • Management: SSH to Egress VM → kubectl/talosctl to Kubernetes cluster

Prerequisites

  • A Hetzner Cloud account and a project to deploy into
  • A Hetzner Cloud API token with read/write permissions for that project

Step 1 - Set Up NAT Gateway and Private Network

This step explains how to set up the egress VM from the architecture overview above as a NAT gateway.

Before creating the Kubernetes cluster, you need to set up a private Network with a NAT gateway for internet access. Follow the official Hetzner tutorial:

How to set up NAT for Cloud Networks

Follow only the steps for creating and setting up the private Network and the NAT server; you do NOT need the client servers. Use 10.21.0.0/16 for the Network.

Steps of that tutorial to follow:

Step 1: Create a new Network as explained in that step. You can set it to 10.21.0.0/16.

Step 6: Create a new server.
  • Networking: Select the Network you just created.
  • Cloud config: Copy and paste the NAT server script provided in that step. Remember to change 10.0.0.0/16 to 10.21.0.0/16.
  If you already have an existing server that should serve as the NAT gateway, you can instead follow the instructions for the NAT server in Step 3 and Step 4 to set everything up manually.

Step 2: Add the route to the Network as explained in that step. Set the server you just created (or the existing server you just configured) as the Gateway.

Conclusion: Configure the firewall.
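
For reference, the core of what the NAT server script in that tutorial configures is IPv4 forwarding plus an iptables masquerade rule. A minimal sketch, assuming eth0 is the public interface (interface names on your server may differ):

# Enable IPv4 forwarding (persist it via /etc/sysctl.conf to survive reboots)
sysctl -w net.ipv4.ip_forward=1

# Masquerade traffic from the private Network behind the public interface
iptables -t nat -A POSTROUTING -s 10.21.0.0/16 -o eth0 -j MASQUERADE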

In the steps below, you will need the Network ID for hcloud_network_id. When you select your Network in Hetzner Console, the URL in the address bar of the browser contains the Network ID:

https://console.hetzner.com/projects/<project-id>/networks/<network-id>/resources
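
If you have the hcloud CLI installed and configured for your project, you can also list the Network ID directly:

# Shows the ID and name of every Network in the current project
hcloud network list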

Step 2 - Install Prerequisites

In Step 1, you created and configured an egress VM to act as a NAT gateway for the Kubernetes nodes.

On that egress VM, install the following tools as explained in their official documentation:

  • Terraform: used to deploy the cluster
  • Packer: used by the Terraform module to build the Talos images
  • kubectl and talosctl: needed to access the Kubernetes cluster and the Talos nodes from the egress VM
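
To confirm the tools are installed, you can print their versions:

terraform version
packer version
kubectl version --client
talosctl version --client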

Step 3 - Create Project Directory

Now that the egress VM works as a NAT gateway and has all the prerequisites installed, you can create a new directory for the project files:

mkdir k8s-cluster && cd k8s-cluster

Step 4 - Configure Terraform Variables

In the new directory k8s-cluster on the egress VM, add the following files:

  • Create variables.tf:

    variable "hcloud_token" {
      description = "Hetzner Cloud API token. Prefer to supply via TF_VAR_hcloud_token or terraform.tfvars, or rely on provider using HCLOUD_TOKEN env."
      type        = string
      sensitive   = true
    }
    Optional: S3 Backup Variables

    If you want to enable S3 backup for Talos configuration, add these additional variables to your variables.tf:

    variable "talos_backup_s3_access_key" {
      description = "S3 Access Key for Talos Backup."
      type        = string
      sensitive   = true
      default     = ""
    }
    
    variable "talos_backup_s3_secret_key" {
      description = "S3 Secret Access Key for Talos Backup."
      type        = string
      sensitive   = true
      default     = ""
    }
    
    variable "talos_backup_s3_bucket" {
      description = "S3 bucket name for Talos backups."
      type        = string
      default     = ""
    }
    
    variable "talos_backup_s3_endpoint" {
      description = "S3 endpoint hostname for Talos backups."
      type        = string
      default     = ""
    }
    
    variable "talos_backup_s3_region" {
      description = "S3 region for Talos backups."
      type        = string
      default     = ""
    }



  • Create terraform.tfvars or set your token via an environment variable

    # Option 1: Environment variable
    export TF_VAR_hcloud_token="your-hetzner-cloud-api-token"
    
    # Option 2: Create terraform.tfvars file
    echo 'hcloud_token = "your-hetzner-cloud-api-token"' > terraform.tfvars
    Optional: S3 Backup Variables

    If you want to enable S3 backup for Talos configuration, add these additional variables to your terraform.tfvars:

    echo 'talos_backup_s3_access_key = "your-access-key"' >> terraform.tfvars
    echo 'talos_backup_s3_secret_key = "your-secret-key"' >> terraform.tfvars
    echo 'talos_backup_s3_bucket = "your-bucket-name"' >> terraform.tfvars
    echo 'talos_backup_s3_endpoint = "your-endpoint"' >> terraform.tfvars
    echo 'talos_backup_s3_region = "your-region"' >> terraform.tfvars
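
    Since terraform.tfvars now contains credentials, it's a good idea to restrict its file permissions:

    chmod 600 terraform.tfvars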



  • Create kubernetes.tf with the main cluster configuration

    This configuration uses the hcloud-k8s/kubernetes module; see registry.terraform.io for its full documentation and all available options.

    Replace YOUR_NETWORK_ID_HERE with your actual Network ID.

    module "kubernetes" {
      source  = "hcloud-k8s/kubernetes/hcloud"
      version = "3.20.1"
    
      cluster_name = "k8s"
      hcloud_token = var.hcloud_token
    
      # Export configs for Talos and Kube API access
      cluster_kubeconfig_path  = "kubeconfig"
      cluster_talosconfig_path = "talosconfig"
    
      # Optional Ingress Controller, Cert Manager and Storage
      cert_manager_enabled  = true
      ingress_nginx_enabled = true
      longhorn_enabled      = true
    
      network_ipv4_cidr = "10.21.0.0/16"
    
      # Private nodes, egress via your own gateway VM
      talos_public_ipv4_enabled = false
      talos_public_ipv6_enabled = false
    
      control_plane_nodepools = [
        { name = "control", type = "cx23", location = "hel1", count = 3 }
      ]
      
      worker_nodepools = [
        # placement_group = true ensures VMs are distributed across different physical servers
        { name = "worker-hel-ccx", type = "ccx23", location = "hel1", count = 3, placement_group = true },
      ]
    
      cluster_healthcheck_enabled = true
      firewall_use_current_ipv4 = false
      firewall_use_current_ipv6 = false
      cluster_access = "private"
      talos_extra_routes = ["0.0.0.0/0"]
      network_native_routing_ipv4_cidr = "10.0.0.0/8"
    
      # Use your existing Network ID from the NAT gateway setup (Step 1)
      # You can find this in Hetzner Cloud Console -> Networks or via: hcloud network list
      hcloud_network_id = YOUR_NETWORK_ID_HERE
    
      control_plane_private_vip_ipv4_enabled = true
      ingress_nginx_kind = "DaemonSet"
      ingress_nginx_service_external_traffic_policy = "Local"
    
      ingress_load_balancer_pools = [
        {
          name     = "regional-lb-hel"
          location = "hel1"
        }
      ]
    
      cluster_autoscaler_nodepools = [
        {
          name     = "autoscaler"
          type     = "ccx23"
          location = "hel1"
          min      = 0
          max      = 6
          labels   = { "autoscaler-node" = "true" }
          taints   = [ "autoscaler-node=true:NoExecute" ]
        }
      ]
    
      cluster_delete_protection = true
    }
    Optional: S3 Backup Configuration

    If you added the S3 backup variables to your variables.tf, include these lines in your kubernetes.tf module configuration:

    # Add these lines inside the module "kubernetes" block above
      talos_backup_s3_endpoint   = var.talos_backup_s3_endpoint
      talos_backup_s3_region     = var.talos_backup_s3_region
      talos_backup_s3_bucket     = var.talos_backup_s3_bucket
      talos_backup_s3_access_key = var.talos_backup_s3_access_key
      talos_backup_s3_secret_key = var.talos_backup_s3_secret_key

You'll need these files in your working directory:

  • variables.tf: Variable definitions (with optional S3 variables)
  • kubernetes.tf: Main cluster configuration (with optional S3 configuration)
  • terraform.tfvars: Your actual values (or use environment variables)
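
Before applying anything, you can check the files for formatting and configuration errors (terraform validate requires terraform init, the first command of the next step, to have been run):

# Check canonical formatting and validate syntax/consistency
terraform fmt -check
terraform validate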

Step 5 - Deploy the Cluster

Note that terraform apply creates resources in your Hetzner Cloud project that incur charges: several cloud servers, a Load Balancer, and Snapshots. It also creates a Firewall, Placement Groups, and more.

The number of control plane and worker nodes is defined via the count attribute of each nodepool in kubernetes.tf.

Initialize and apply the Terraform configuration:

terraform init -upgrade
terraform apply

Review the planned changes and confirm the deployment. This process will:

  1. Create Talos images using Packer
  2. Deploy control plane and worker nodes
  3. Configure the Kubernetes cluster
  4. Set up ingress controllers and cert-manager
  5. Configure Longhorn for persistent storage
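
If you want to review the plan in detail before anything is created, you can save and inspect it first:

# Save the plan, inspect it, then apply exactly that plan
terraform plan -out=tfplan
terraform show tfplan
terraform apply tfplan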

Step 6 - Access Your Cluster

After successful deployment, you'll find the configuration files in your current directory:

export TALOSCONFIG=talosconfig
export KUBECONFIG=kubeconfig

Verify your cluster is running:

# Check Talos cluster members
talosctl get member

# Check Kubernetes nodes
kubectl get nodes -o wide

# Check all pods across namespaces
kubectl get pods -A
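
You can also check the optional components individually. The namespaces below assume the usual chart defaults; the module may place them elsewhere:

kubectl get pods -n cert-manager
kubectl get pods -n ingress-nginx
kubectl get pods -n longhorn-system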

Step 7 - Configuration Summary and Enhancement Suggestions

Configuration Highlights

  • Private Network Setup

    • Network CIDR: 10.21.0.0/16
    • No public IPs: All Kubernetes nodes are private
    • Egress routing: Traffic flows through your gateway VM (a quick way to verify this is shown after this overview)
  • High Availability Features

    • Control plane: 3 nodes for HA
    • Worker nodes: Distributed across Placement Groups (ensures VMs run on different physical hardware for better reliability)
    • Ingress: Load Balancer with DaemonSet configuration
    • Storage: Longhorn for distributed persistent storage
  • Security Features

    • Private cluster access: No direct internet access to nodes
    • Firewall: Controlled access through security groups
    • Backup: Optional S3 backup for Talos configuration
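
To confirm that outbound traffic really leaves through the egress VM, you can run a short-lived pod and fetch your public IP. A minimal sketch, assuming the public curlimages/curl image and the external ifconfig.me service are reachable:

# Should print the egress VM's public IP (192.0.2.254 in the overview above)
kubectl run nat-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- curl -s https://ifconfig.me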


Scaling to Full High Availability

For a production-ready, fully highly available setup, consider these enhancements:

  • Multiple Egress VMs

    Deploy additional egress VMs in different locations with failover configuration:

    • Set up multiple gateway VMs across different zones
    • Configure VRRP (Virtual Router Redundancy Protocol) for automatic failover
    • Use BGP routing for advanced traffic management

  • Multi-Region Load Balancers
    ingress_load_balancer_pools = [
      {
        name          = "regional-lb-fsn"
        location      = "fsn1"
        local_traffic = true
      },
      {
        name          = "regional-lb-nbg"
        location      = "nbg1"
        local_traffic = true
      },
      {
        name          = "regional-lb-hel"
        location      = "hel1"
        local_traffic = true
      }
    ]

  • Cross-Zone Worker Distribution
    worker_nodepools = [
      # Each placement_group = true ensures nodes within each location are on different physical servers
      { name = "worker-fsn", type = "cpx42", location = "fsn1", count = 2, placement_group = true },
      { name = "worker-nbg", type = "cpx42", location = "nbg1", count = 2, placement_group = true },
      { name = "worker-hel", type = "cpx42", location = "hel1", count = 2, placement_group = true },
    ]

  • Additional HA Components
    • External DNS: Automatic DNS management for services
    • Monitoring: Prometheus and Grafana for observability
    • Backup strategies: Regular etcd and persistent volume backups
    • Disaster recovery: Cross-region backup and restore procedures

This setup provides a robust, private Kubernetes cluster that can handle production workloads while maintaining security and high availability standards.


Troubleshooting

  • Common Issues

    1. Network connectivity: Ensure your egress VM's NAT rules are correctly configured
    2. DNS resolution: Verify that private nodes can resolve external DNS through the gateway
    3. Load Balancer access: Check that ingress controllers are properly configured for private network access
  • Useful Commands

    # Check Talos node status (stage and readiness)
    talosctl -n <node-ip> get machinestatus
    
    # Check network links on a node
    talosctl -n <node-ip> get links
    
    # Restart the kubelet service on a node if needed
    # (Talos has no systemd; services are managed via talosctl)
    talosctl -n <node-ip> service kubelet restart
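
    If nodes cannot reach the internet, also verify the NAT configuration on the egress VM itself:

    # On the egress VM: IPv4 forwarding must be enabled (should print 1)
    sysctl -n net.ipv4.ip_forward
    
    # The POSTROUTING chain should contain the MASQUERADE rule for 10.21.0.0/16
    iptables -t nat -S POSTROUTING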

Conclusion

With this configuration, you have a fully private, production-ready Kubernetes cluster running on Hetzner Cloud that can scale to meet your needs while maintaining high security and availability standards.

License: MIT