Introduction
With the demand for online services growing at an exceptional pace each year, businesses face great opportunities but also challenges as they expand their operations to cope with this demand. One such challenge is scaling the IT infrastructure as the "load" on the underlying systems increases. These days, offering a performant and always available service can be paramount to businesses of any size. No customer enjoys downtime or slow websites and web applications, and when these issues occur they can significantly hurt the performance and reputation of the business.
When discussing the subject of scaling infrastructure, there are various tools and techniques in an IT team's arsenal that can help address the growing demand. One such technique is load balancing.
In the context of web applications, load balancing is a technique that involves distributing traffic across two or more servers, with the goal of achieving high availability while keeping the applications fast and responsive as users and traffic grow. It is important to note that true high availability is only achieved when one or more servers are ready to receive traffic at all times. In its simplest form, a highly available load balancer consists of a master server and a backup server; the system is designed in such a way that the master server normally handles and distributes the traffic to the backends (that is, the servers that run the actual applications), but the backup server automatically takes over in case the master server experiences downtime, either for planned maintenance or for unforeseen reasons. This process, called failover, typically takes a couple of seconds at most, so the applications remain available with minimal to no downtime, as if nothing had happened.
In this tutorial, we'll see how to distribute traffic to your application servers by provisioning a highly available load balancer in Hetzner Cloud, using tools such as haproxy, keepalived, and Ansible:
- haproxy will do the heavy lifting of actually distributing the traffic across multiple backends; it is very performant and efficient, and allows us to distribute traffic for multiple distinct applications at the same time; haproxy will also continuously check the health of the backend servers and ensure that traffic is sent to the healthy backends only;
- keepalived will take care of ensuring that one load balancer server (either the master or the backup) is always ready to receive traffic. To achieve this, we'll use the Hetzner Cloud CLI, a handy tool that interacts with the Hetzner Cloud API, to assign a floating IP address to either the master server or the backup server, depending on the status of each. A floating IP address is simply an IP address that can be "moved" from one server to another as needed, and this can be easily automated. This is especially important for a load balancer because we can use this IP in the DNS configuration for our domains, so as to avoid issues such as caching and propagation of DNS records when a failover occurs. By "status of the server" we mean whether haproxy is running or not: should the master server go down, haproxy will be seen as unavailable, and keepalived - through constant communication between master and backup - will instruct the backup server to take over and assign the floating IPs to itself;
- Ansible is a very popular configuration management tool that we can use to automate the whole process rather than configuring all of this manually. This allows for quick and repeatable provisioning of multiple load balancers.
Prerequisites
In order to follow along with this tutorial, you will need:
- a Hetzner Cloud account and basic understanding of how to create resources in a project, such as servers and floating IPs; we will be using the aforementioned CLI tool to create these resources, but you can use the Hetzner Cloud web console if you prefer;
- a project which will house your servers and the floating IPs;
- a token, which is required for the CLI to manage resources such as the floating IPs; you can create one in the Hetzner Cloud console within the project, under Access > API tokens. Store this token somewhere safe, such as a password manager, because you will only see it once;
- Ansible installed on your computer - please refer to the official installation instructions for your operating system;
- python installed (Ansible is written in Python);
- the hcloud-python Python module. Ansible has built-in support for Hetzner Cloud, but it requires this module in order to function. We will be using this to implement a dynamic inventory so that Ansible can query Hetzner Cloud directly to find the hosts (see the install command right after this list).
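At the time of writing, the hcloud-python module is published on PyPI simply as hcloud, so - assuming you use pip to manage Python packages - installing it looks like this:
pip3 install hcloud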
Step 1 - Provisioning the resources
To get started, create a project and a token in Hetzner Cloud if you haven't already. Next, install the Hetzner Cloud CLI - refer to the official installation instructions for your platform. Once the CLI is installed, you need to create a context, which allows the CLI to interact with a project in Hetzner Cloud:
hcloud context create lb
Here I am naming the context lb for "load balancer". You will be prompted to enter the token for your Hetzner Cloud project. The command above will create and activate the context, so you can manage resources in the project right away.
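If you later work with multiple projects, you can check which context is currently active by listing the contexts the CLI knows about; the active one is marked in the output:
hcloud context list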
Note that while it is possible to create servers directly with Ansible, it is not possible to manage floating IPs this way yet, so we can just use the Hetzner Cloud CLI to manage the cloud resources, and Ansible to configure them.
Before creating the servers, let's add our SSH key to our Hetzner Cloud project. We will add this SSH key to the servers so that Ansible can easily access them:
hcloud ssh-key create --public-key-from-file ~/.ssh/id_rsa.pub --name lb
Change the path of your public key if needed.
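If you don't have an SSH key pair yet, you can generate one first; this assumes you are fine with the default RSA key at ~/.ssh/id_rsa used in the command above:
ssh-keygen -t rsa -b 4096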
As mentioned earlier, we'll need two servers for the load balancers. To create the master and backup servers, run:
hcloud server create --location nbg1 --ssh-key lb --type cx11-ceph --image ubuntu-18.04 --name lb-master
hcloud server create --location nbg1 --ssh-key lb --type cx11-ceph --image ubuntu-18.04 --name lb-backup
Take note of the IP addresses of these servers because you will need them soon.
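You can look the IP addresses up at any time with the CLI, which lists the servers in the active project along with their public IPs:
hcloud server list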
You can change the location to Falkenstein (fsn1) or Helsinki (hel1) if you prefer; however, it is recommended that the load balancer servers be in the same location as the backend servers of your applications for better performance. I find that Nuremberg (nbg1) has better latency for users in the USA, so that's the location I typically choose. For the server type I use the CX11 model, which is the entry-level cloud server; load balancing is typically a lightweight task, so even the smallest servers will do just fine in most cases. I am choosing the "Ceph" variant for the servers, which ensures that the servers use network storage instead of local storage. Local storage is faster, but with network storage the cloud servers can be booted automatically on other physical hosts in case of hardware failure, with no data loss. The OS image we are going to use is Ubuntu 18.04. The OS used for a load balancer isn't really important; Ubuntu is a very popular choice, so the Ansible code we'll see soon is tested with Ubuntu, although it can be easily adapted to other Linux distributions. The naming convention for the load balancer servers is load_balancer_name-role, where role is either "master" or "backup".
Finally, we need to create one or more floating IPs, depending on how many applications you want to configure the load balancer for. In the example illustrated in this tutorial we are going to configure load balancing for a single application, but we could define multiple load balancing groups; each group requires its own floating IP, so that we can bind the same ports for different applications at the same time. To create the floating IP, run:
hcloud floating-ip create --home-location nbg1 --type ipv4 --name lb-http
Make sure the location matches that of the servers. The naming convention here is load_balancer_name-group_name.
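As with the servers, you can review the floating IPs you have created at any point:
hcloud floating-ip list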
Step 2 - Ansible: shared configuration
Ansible works with playbooks, which are simple YAML files. A playbook defines which hosts will be affected, as well as the tasks that Ansible will perform on those hosts. One great feature of tools like Ansible is that you can run a playbook against the servers as many times as you wish and the outcome will always be the same - provided you follow idempotency best practices. If you stick to Ansible's built-in modules, this is taken care of for you in most cases.
Before we dive into the playbook, we need to write a basic configuration file. Create a directory for your Ansible project somewhere, and in the root of this directory create the file ansible.cfg with the following content:
[defaults]
vault_password_file = ~/.secrets/ansible-vault-pass
[ssh_connection]
pipelining = True
[inventory]
enable_plugins = hcloud
vault_password_file
We'll need to store two secrets somewhere: the API token for Hetzner Cloud, and the password that keepalived will use on both servers to communicate with each other. Ansible has built-in support for vaults, encrypted files that you can safely check into a version control system without risk of leaking secrets. Ansible requires a password to access the vault, so rather than typing the password each time Ansible needs it, you can tell Ansible to read it from a simple text file containing just the password. Go ahead and create this file at ~/.secrets/ansible-vault-pass, or wherever you prefer, as long as the file stays outside of the repository if you are using a version control system.
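For example, one way to create the password file - assuming openssl is available and you are happy with a randomly generated password - is:
mkdir -p ~/.secrets
openssl rand -base64 32 > ~/.secrets/ansible-vault-pass
chmod 600 ~/.secrets/ansible-vault-pass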
pipelining = True
We set this option to true because it can significantly improve the performance of the SSH connections from Ansible on your computer to the servers. See the Ansible documentation on pipelining for more information.
enable_plugins = hcloud
Here we instruct Ansible to enable the Hetzner Cloud plugin, which allows Ansible to query Hetzner Cloud for the information on the existing servers. Alternatively we could hardcode this information in a plain .ini or .yml file, but a dynamic inventory is easier.
The inventory file is a YAML file that simply tells Ansible to use the Hetzner Cloud plugin. Create the file /inventories/load_balancer/hosts.hcloud.yml with the following line:
plugin: hcloud
The .hcloud part in the name of the file is important.
Step 3 - Ansible: load balancer configuration and secrets
With Ansible we can manage the configuration for our servers in a YAML file at /inventories/load_balancer/group_vars/all/vars.yml, and secrets in a vault at /inventories/load_balancer/group_vars/all/vault.yml. To create the vault, run:
ansible-vault create /inventories/load_balancer/group_vars/all/vault.yml
We don't need to enter a password, since Ansible will read it from the file we specified in ansible.cfg earlier. The command above will open the vault file in the default text editor and allow you to enter the secrets with the same syntax you would use in a regular YAML file. Type the following secrets:
---
keepalived_password: <8-character password>
hetzner_cloud_token: <your Hetzner Cloud token>
The keepalived password is required to ensure that the instances of keepalived on the servers can communicate with each other with some simple authentication. This traffic is not encrypted, but no sensitive information is exchanged - just the status of the monitored process, which in this case is haproxy as we'll see in a bit. The other secret we need is the Hetzner Cloud token; we need this token in a script that keepalived uses to assign the floating IPs. We'll see this script in a moment.
Once you save the secrets file, Ansible will automatically encrypt it with your password (you can confirm by opening the file with a regular text editor).
Next, we need to create some configuration for our load balancer. Create the file /inventories/load_balancer/group_vars/all/vars.yml with the following content:
---
load_balancer:
  name: lb
  max_connections: 10000
  master_server_ip: "203.0.113.1"
  backup_server_ip: "203.0.113.2"
  groups:
    - name: http
      floating_ip: "203.0.113.3"
      balancers:
        - mode: tcp
          algorithm: roundrobin
          frontend_port: "80"
          backend_servers:
            - host: "10.0.0.1"
              port: "30080"
              options: "check send-proxy-v2"
            - host: "10.0.0.2"
              port: "30080"
              options: "check send-proxy-v2"
        - mode: tcp
          algorithm: roundrobin
          frontend_port: "443"
          backend_servers:
            - host: "10.0.0.1"
              port: "30443"
              options: "check send-proxy-v2"
            - host: "10.0.0.2"
              port: "30443"
              options: "check send-proxy-v2"
So we are giving the load balancer a name, lb, and specifying that we want it to accept at most 10K concurrent connections; we also specify the IP addresses of the master and backup servers (make sure you replace the IP addresses in the example with yours). We then have a load balancing "group" with the relevant floating IP, as well as one or more balancers. This is arbitrary terminology, but hopefully it makes sense. Each balancer distributes the traffic arriving at a specific frontend port on the group's floating IP to the backend servers; each backend server is identified by its IP and the port the service or application is listening on, as well as one or more options specific to that backend server. mode can be either http or tcp; use tcp if you need to terminate TLS connections at the backend servers, since this mode simply forwards raw traffic to the backends. As for the options, check is required for haproxy to periodically check the backend servers and find out which ones are healthy and can handle traffic; send-proxy-v2 is recommended if you need to preserve the original IP address of the user visiting your application - without it, your application will "see" the IP address of the load balancer instead, breaking any authorisation system or other feature that may depend on the user's IP address. Please keep in mind that if you enable send-proxy-v2 the backends must be configured to accept the proxy protocol header that haproxy will send them.
Step 4 - Ansible: the playbook
Now that we have the configuration out of the way, we can create the Ansible playbook. Create the file /load_balancer.yml with the following content:
---
- name: Load balancer provisioning
  hosts: all
  remote_user: root
  roles:
    - role: load_balancer
Here we tell Ansible to execute this playbook on all the servers in the inventory (in our case, the master and backup servers), and specifically to run the load_balancer role. Ansible roles are a simple way of organizing code for easier maintainability.
In our role, we'll define tasks to be performed on the servers, as well as handlers and templates. Handlers are triggered when we use the notify keyword in a task, and are typically used to restart services when configuration files change. Templates, as the name suggests, are blueprints for configuration files that Ansible builds by interpolating variables and secrets into text.
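For reference, by the end of this tutorial the Ansible project directory (the root that the file paths in this tutorial are relative to) will contain roughly the following files:
ansible.cfg
load_balancer.yml
inventories/
  load_balancer/
    hosts.hcloud.yml
    group_vars/
      all/
        vars.yml
        vault.yml
roles/
  load_balancer/
    defaults/
      main.yml
    handlers/
      main.yml
    tasks/
      main.yml
      floating_ips.yml
      haproxy.yml
      keepalived.yml
    templates/
      60-my-floating-ips.cfg.j2
      haproxy.cfg.j2
      keepalived.conf.j2
      keepalived.service.j2
      master.sh.j2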
Create the file /roles/load_balancer/tasks/main.yml with the following content:
---
- include_tasks: floating_ips.yml
Rather than having all the tasks in a single file, we'll split them into separate files so they are easier to manage (we'll add a couple more later).
Step 4.1 - Floating IPs
In order for a server to respond to connections using a floating IP, two things are required:
- the floating IP must be assigned to that server;
- the server must have a network interface configured with the floating IP.
We'll see how to automate the assignment of the IP with keepalived later. For now, create the file /roles/load_balancer/tasks/floating_ips.yml with the following content:
- name: Configure floating IPs
  template:
    src: "60-my-floating-ips.cfg.j2"
    dest: "/etc/network/interfaces.d/60-my-floating-ips.cfg"
  notify: Restart network
We need only one task, which simply creates the file /etc/network/interfaces.d/60-my-floating-ips.cfg on both servers from the template file 60-my-floating-ips.cfg.j2 - the .j2 extension indicates the Jinja2 templating format. Whenever the contents of this file change, the handler is triggered and restarts the networking service to apply the changes. The handler is defined in the file /roles/load_balancer/handlers/main.yml as follows:
---
- name: Restart network
  service:
    name: networking
    state: restarted
Next, let's create the template in /roles/load_balancer/templates/60-my-floating-ips.cfg.j2 with the following content:
{% for load_balancer_group in load_balancer.groups %}
auto eth0:{{ loop.index }}
iface eth0:{{ loop.index }} inet static
    address {{ load_balancer_group.floating_ip }}
    netmask 32
{% endfor %}
Here we are using the information we added to the vars.yml file to configure the floating IP addresses with a loop.
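To make the loop more concrete, with the example vars.yml above (a single group whose floating IP is 203.0.113.3) the rendered file on the servers should look roughly like this:
auto eth0:1
iface eth0:1 inet static
    address 203.0.113.3
    netmask 32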
Before we proceed with more code, let's test that what we have done so far works. In order to use the dynamic inventory with Ansible, we need to set the HCLOUD_TOKEN environment variable to the Hetzner Cloud token. Let's export it so it applies to the subsequent commands:
export HCLOUD_TOKEN=<your Hetzner Cloud token>
Then, to test the first tasks of our playbook, run:
ansible-playbook -i inventories/load_balancer load_balancer.yml
The ansible-playbook command will perform the tasks defined in the playbook load_balancer.yml against the hosts defined in the inventory specified. Because we are passing a directory, Ansible will use any files in that directory that are of a recognised format as inventory. In our case it's /inventories/load_balancer/hosts.hcloud.yml. Ansible will see that the Hetzner Cloud plugin is used and will query Hetzner Cloud to find out which servers exist in the project identified by our token.
If all went well, Ansible will run the first tasks we have defined against both the master and the backup servers, configuring the primary network interface with the floating IP addresses. You can verify this by SSH'ing into the servers and running ifconfig.
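If ifconfig is not available on your servers (on minimal Ubuntu images it may not be installed by default), the ip tool shows the same information; the aliased interfaces with the floating IPs should appear in its output:
ip -4 addr show eth0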
Step 4.2 - haproxy
To install and configure haproxy, add the following line to /roles/load_balancer/tasks/main.yml:
- include_tasks: haproxy.yml
Then create /roles/load_balancer/tasks/haproxy.yml with the following content:
---
- name: Install haproxy
  apt:
    name: "haproxy"
    state: present
    update_cache: yes

- name: Configure haproxy
  template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg
  notify: Restart haproxy
Again, a couple of very simple tasks. First, we update the apt cache and install the haproxy package, then we update the haproxy configuration using a template. Whenever the contents of this configuration file change, haproxy will be restarted. For this to happen, we need to add a handler to /roles/load_balancer/handlers/main.yml:
- name: Restart haproxy
  service:
    name: haproxy
    state: restarted
Next, create the template in /roles/load_balancer/templates/haproxy.cfg.j2:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 10s
    user haproxy
    group haproxy
    daemon
    maxconn {{ load_balancer.max_connections }}

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 10000
    timeout server 10000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

{% for group in load_balancer.groups %}
{% for balancer in group.balancers %}
frontend {{ group.name }}-{{ balancer.frontend_port }}
    mode {{ balancer.mode }}
    bind {{ group.floating_ip }}:{{ balancer.frontend_port }}
    option tcplog
    default_backend {{ group.name }}-{{ balancer.frontend_port }}

backend {{ group.name }}-{{ balancer.frontend_port }}
    balance {{ balancer.algorithm }}
    mode {{ balancer.mode }}
{% for server in balancer.backend_servers %}
    server {{ group.name }}-{{ balancer.frontend_port }}-{{ loop.index }} {{ server.host }}:{{ server.port }} {{ server.options }}
{% endfor %}
{% endfor %}
{% endfor %}
You can mostly ignore the first half of this configuration file, apart from the maxconn setting, which is set to the maximum number of concurrent connections we specified in the vars.yml file. What's interesting is the second half of the file, where we use loops to dynamically create a section for each balancer configured in vars.yml. For each balancer, we define a frontend section, which binds to the floating IP of the relevant load balancing group on the balancer's frontend port, as well as a backend section with the backend servers. As you can see, we also use the configuration settings for the load balancing mode and algorithm.
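As an illustration - again using the example vars.yml from earlier - the rendered configuration for the first balancer (port 80 of the http group) should come out roughly like this:
frontend http-80
    mode tcp
    bind 203.0.113.3:80
    option tcplog
    default_backend http-80

backend http-80
    balance roundrobin
    mode tcp
    server http-80-1 10.0.0.1:30080 check send-proxy-v2
    server http-80-2 10.0.0.2:30080 check send-proxy-v2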
The haproxy part of our playbook is ready, so let's run the playbook again to install and configure haproxy:
ansible-playbook -i inventories/load_balancer load_balancer.yml
You will notice that Ansible shows the tasks for the floating IPs configuration in green, meaning that these tasks are no-ops since they have already been performed and the actual state of the servers matches the desired state defined in the tasks file. SSH into the servers and check that the haproxy process is running; also check /var/log/haproxy.log to see if there are any errors.
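A quick way to check the result on each server - assuming systemd, which Ubuntu 18.04 uses - is to verify the service status and ask haproxy to validate the generated configuration file:
systemctl status haproxy
haproxy -c -f /etc/haproxy/haproxy.cfg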
Step 4.3 - keepalived
The final piece of the puzzle is keepalived. This process will monitor the haproxy process on both master and backup servers, and keep the floating IPs assigned to the master as long as the master is up and its haproxy instance is running. In case haproxy stops working on the master, or the master goes down, keepalived will reassign the floating IPs to the backup server very quickly.
First, add the following handler to /roles/load_balancer/handlers/main.yml so that keepalived can be restarted if required:
- name: Restart keepalived
  service:
    name: keepalived
    state: restarted
Next, add the following line to /roles/load_balancer/tasks/main.yml:
- include_tasks: keepalived.yml
Then create the file /roles/load_balancer/tasks/keepalived.yml. This file is more complex than the ones for the floating IPs and haproxy, so let's tackle it a few steps at a time. Add the following to that file:
---
- name: Install packages
  apt:
    name: ['libssl-dev', 'build-essential']
    state: present
    update_cache: yes

- name: Check if keepalived has already been installed
  stat:
    path: /etc/_keepalived_installed
  register: "_keepalived_installed"

- name: Download keepalived
  get_url:
    url: "{{ keepalived_source_url }}"
    dest: "/tmp/keepalived.tar.gz"
  register: keepalived_source
  when:
    - not _keepalived_installed.stat.exists

- name: Unpack keepalived
  unarchive:
    copy: no
    dest: /tmp/
    src: "{{ keepalived_source.dest }}"
  when:
    - not _keepalived_installed.stat.exists
    - keepalived_source.changed
  register: keepalived_source_unpack
Here we are installing a couple of dependencies required to build keepalived from source. While the version of haproxy in the Ubuntu 18.04 repository is reasonably recent, the version of keepalived is fairly old, so we are going to compile and install keepalived from source instead. Remember that one benefit of Ansible is that a playbook can be executed many times and the outcome will always be the same. In this case, because we are compiling from source, we need to make the relevant tasks idempotent as a group. To do this, we'll use an empty marker file at /etc/_keepalived_installed: if this file exists, we'll assume that keepalived has already been installed, and these steps will be skipped altogether. So we first check whether the file exists and "register" the result of this check in a variable named _keepalived_installed; then we download and unpack the archive with the keepalived source only if that file doesn't exist. As you can see, we are using some variables concerning keepalived. These won't change often, so we can just add them as defaults in /roles/load_balancer/defaults/main.yml:
---
keepalived_version: 2.0.20
keepalived_source_url: "http://www.keepalived.org/software/keepalived-{{ keepalived_version }}.tar.gz"
keepalived_install_dir: "/tmp/keepalived-{{ keepalived_version }}"
keepalived_conf_path: /etc/keepalived/keepalived.conf
keepalived_ip_switch_script_path: /etc/keepalived/master.sh
keepalived_network_interface: eth0
hetzner_cloud_cli_version: "v1.16.1"
hetzner_cloud_cli_url: "https://github.com/hetznercloud/cli/releases/download/{{ hetzner_cloud_cli_version }}/hcloud-linux-amd64.tar.gz"
Here we are also setting a couple of variables for the Hetzner Cloud CLI - we'll see why in a moment.
Next, add the following steps to keepalived.yml:
- name: Configure keepalived source
  command: "./configure"
  args:
    chdir: "{{ keepalived_install_dir }}"
  when:
    - not _keepalived_installed.stat.exists
    - keepalived_source_unpack.changed
  register: keepalived_configure

- name: Install keepalived
  become: yes
  shell: make && make install
  args:
    chdir: "{{ keepalived_install_dir }}"
  when:
    - not _keepalived_installed.stat.exists
    - keepalived_configure.changed
These steps simply configure and install keepalived from source as it's typically done on *nix systems. The next steps we need to add to keepalived.yml are as follows:
- name: Set keepalived variables for master server
  set_fact:
    keepalive_unicast_src_ip: "{{ load_balancer.master_server_ip }}"
    keepalive_unicast_peer: "{{ load_balancer.backup_server_ip }}"
    keepalive_role: "MASTER"
    keepalive_priority: "200"
  when: "ansible_host == load_balancer.master_server_ip"

- name: Set keepalived variables for backup server
  set_fact:
    keepalive_unicast_src_ip: "{{ load_balancer.backup_server_ip }}"
    keepalive_unicast_peer: "{{ load_balancer.master_server_ip }}"
    keepalive_role: "BACKUP"
    keepalive_priority: "100"
  when: "ansible_host == load_balancer.backup_server_ip"

- name: Create config directory
  file:
    path: /etc/keepalived
    state: directory

- name: Create keepalived conf file
  become: yes
  template:
    src: keepalived.conf.j2
    dest: "{{ keepalived_conf_path }}"
  notify: Restart keepalived
Here we are setting some variables whose values depend on whether the server is the master or the backup. When using a dynamic inventory, the built-in variable ansible_host holds the IP address of the server, so we can compare this IP with the IPs specified in vars.yml for the master and backup servers to determine which server the tasks are running on. If it's the master, keepalived will run in MASTER mode with a higher priority, otherwise it will run in BACKUP mode with a lower priority. The different priorities ensure that the floating IPs are assigned to the backup server only when the master is down. The remaining tasks create the configuration file for keepalived from the template at /roles/load_balancer/templates/keepalived.conf.j2, so let's create this file:
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "/usr/bin/pgrep haproxy"
    interval 2
}

vrrp_instance VI_1 {
    interface eth0
    state {{ keepalive_role }}
    priority {{ keepalive_priority }}

    virtual_router_id 33

    unicast_src_ip {{ keepalive_unicast_src_ip }}
    unicast_peer {
        {{ keepalive_unicast_peer }}
    }

    authentication {
        auth_type PASS
        auth_pass {{ keepalived_password }}
    }

    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/master.sh
}
The important bit to note in this configuration file is the /etc/keepalived/master.sh script, which keepalived will execute when it needs to assign the floating IPs to the server it is running on. Let's create a template for this script in templates/master.sh.j2 with the following content:
#!/bin/bash
export HCLOUD_TOKEN='{{ hetzner_cloud_token }}'

ME=`hcloud server describe $(hostname) | head -n 1 | sed 's/[^0-9]*//g'`

{% for group in load_balancer.groups %}
HTTP_IP_CURRENT_SERVER_ID=`hcloud floating-ip describe {{ load_balancer.name }}-{{ group.name }} | grep 'Server:' -A 1 | tail -n 1 | sed 's/[^0-9]*//g'`

if [ "$HTTP_IP_CURRENT_SERVER_ID" != "$ME" ] ; then
  n=0
  while [ $n -lt 10 ]
  do
    hcloud floating-ip unassign {{ load_balancer.name }}-{{ group.name }}
    hcloud floating-ip assign {{ load_balancer.name }}-{{ group.name }} $ME && break
    n=$((n+1))
    sleep 3
  done
fi
{% endfor %}
Here we are using the Hetzner Cloud CLI to check whether the floating IPs are assigned to the other server. If that's the case, the script will use the Hetzner Cloud CLI to first unassign the IPs and then reassign them to the current server.
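If you want a feel for what the script does, you can run the same CLI commands manually from your own machine, where the CLI already has an active context; the numeric server ID shown by the describe command is what the script extracts into $ME:
hcloud server describe lb-master
hcloud floating-ip describe lb-http
hcloud floating-ip assign lb-http <server id>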
Back to the tasks file keepalived.yml, let's add the task that creates this script on the servers:
- name: Create floating IP assignment script
  become: yes
  template:
    src: master.sh.j2
    dest: "{{ keepalived_ip_switch_script_path }}"
    mode: +x
The next tasks are required to download and install the Hetzner Cloud CLI:
- name: Check if Hetzner Cloud CLI has already been downloaded
  stat:
    path: /etc/_hcloud_cli_installed
  register: "_hcloud_cli_installed"

- name: Download Hetzner Cloud CLI
  get_url:
    url: "{{ hetzner_cloud_cli_url }}"
    dest: "/tmp/hcloud_cli.tar.gz"
  register: keepalived_files
  when:
    - not _hcloud_cli_installed.stat.exists

- name: Unpack Hetzner Cloud CLI
  unarchive:
    copy: no
    dest: /tmp/
    src: "{{ keepalived_files.dest }}"
  when:
    - not _hcloud_cli_installed.stat.exists
    - keepalived_files.changed
  register: keepalived_files_unpack

- name: Move Hetzner Cloud CLI
  command: mv /tmp/hcloud /usr/local/bin/
  when:
    - not _hcloud_cli_installed.stat.exists
    - keepalived_files_unpack.changed

- name: Make Hetzner Cloud CLI executable
  file:
    path: /usr/local/bin/hcloud
    mode: +x

- name: Mark Hetzner Cloud CLI as installed
  file:
    path: /etc/_hcloud_cli_installed
    state: touch
We determine whether the CLI has already been installed by checking if a marker file exists. If it doesn't, we download the CLI and make it executable; finally, we create the file that prevents Ansible from installing the CLI again in subsequent runs.
The final steps we need to add to keepalived.yml are the following:
- name: Install keepalived service
  become: yes
  template:
    src: keepalived.service.j2
    dest: /etc/systemd/system/keepalived.service
  notify: Restart keepalived

- name: Start keepalived
  become: yes
  service:
    name: keepalived
    state: started
    enabled: yes

- name: Mark keepalived as installed
  file:
    path: /etc/_keepalived_installed
    state: touch
The tasks above install the unit file required for the keepalived service to start at boot, ensure that keepalived is running and enabled, and finally create the marker file that prevents keepalived from being compiled and installed again when we re-run the playbook.
Phew, this tasks file was quite long, but as you can see each task is pretty simple, and now that you know how it works you can mostly copy and paste it. Before running the playbook again, we need to create a final template for the service in templates/keepalived.service.j2:
#
# keepalived control files for systemd
#
# Incorporates fixes from RedHat bug #769726.
[Unit]
Description=LVS and VRRP High Availability monitor
After=network.target
ConditionFileNotEmpty=/etc/keepalived/keepalived.conf
[Service]
Type=simple
# Ubuntu/Debian convention:
EnvironmentFile=-/etc/default/keepalived
ExecStart=/usr/local/sbin/keepalived --dont-fork
ExecReload=/bin/kill -s HUP $MAINPID
# keepalived needs to be in charge of killing its own children.
KillMode=process
[Install]
WantedBy=multi-user.target
We can now run the playbook again:
ansible-playbook -i inventories/load_balancer load_balancer.yml
If all went well, as soon as keepalived starts it will check whether haproxy is running on the master, and if that's the case it will automatically assign the floating IPs to the master. You can verify this either by using the CLI or by checking the Hetzner Cloud web console.
The master should now respond to pings with the floating IPs. To test the failover, simply stop haproxy on the master with
service haproxy stop
Within a couple of seconds, you will see the floating IPs automagically reassigned to the backup server. Start haproxy on the master again, and the floating IPs will be quickly moved back to the master server.
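If you want to watch the failover as it happens, one option - assuming the describe output lists the assigned server, as used by the master.sh script above - is to poll the floating IP from your own machine:
watch -n 2 "hcloud floating-ip describe lb-http | grep -A 1 Server:"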
Conclusion
If you followed along, you will have learnt the basics of configuration management with Ansible while provisioning a highly available load balancer in Hetzner Cloud. Furthermore, the playbook we wrote can be configured to provision any number of totally distinct load balancers and configure each of them with as many rules as we want, all by reusing the same code. Happy load balancing!