
Hosting an AI chatbot with Ollama and Open WebUI

Author: Svenja Michal
Published: 2024-08-19
Time to read: 12 minutes

Introduction

This tutorial explains how to install Ollama to run language models on a server with Ubuntu or Debian. It also shows how to set up a chat interface with Open WebUI, and how to use a custom language model.

(Screenshot: preview of the Open WebUI chat interface)

Prerequisites

  • One server with Ubuntu/Debian

    • You need access to the root user or a user with sudo permissions.
    • Before you start, you should complete some basic configuration, including a firewall.
  • Minimum hardware requirements

    • CPU: Ideally an Intel/AMD CPU that supports AVX512 or DDR5. To check, you can run:
      lscpu | grep -o 'avx512' && sudo dmidecode -t memory | grep -i "DDR5"
    • RAM: 16GB
    • Disk space: about 50GB
    • GPU: Not mandatory
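
To check how much RAM and disk space are available on your server, you can run, for example:

free -h
df -h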

You can choose to either install Ollama and the web UI on the same server or on two separate servers. If you install them on separate servers, you will need two servers with Ubuntu/Debian.

The tutorial was tested with Ubuntu 24.04 and Debian 12.

Step 1 - Install Ollama

The following step explains how to install Ollama manually. For a quick start, you can use the installation script and continue with "Step 2 - Install Ollama WebUI".

Installation script: curl -fsSL https://ollama.com/install.sh | sh
If you get a connection error, check if your server has a public IPv4 address.
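
A quick way to check is to list the IPv4 addresses of your network interfaces and to test an outgoing IPv4 connection, for example:

ip -4 addr show
curl -4 -sI https://ollama.com | head -n 1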

To install Ollama yourself, follow these steps:

  • If your server has an Nvidia GPU, make sure the CUDA drivers are installed
    nvidia-smi
    If the CUDA drivers are not installed, do this now. In the NVIDIA CUDA download configurator, you can select your operating system and choose an installer type to view the commands you need to run for your setup.
    sudo apt update
    sudo apt install -y nvidia-kernel-open-545
    sudo apt install -y cuda-drivers-545

  • Download the Ollama binary and create an Ollama user
    sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
    sudo chmod +x /usr/bin/ollama
    sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama

  • Create a service file.

    By default, you can access the Ollama API via 127.0.0.1 port 11434. This means the API is only available on localhost.

    • If you need to access Ollama externally, you can uncomment Environment and set an IP address to access the Ollama API. 0.0.0.0 will allow you to access the API via the public IP of the server. If you do use Environment, make sure the firewall on your server allows access to the port you set, here 11434.

    • If you only have one server, you don't need to modify the command below.

    Copy and paste the entire content of the code block below. This will create the new file /etc/systemd/system/ollama.service and write the content between the two EOF markers to it.

    sudo bash -c 'cat <<'EOF' >> /etc/systemd/system/ollama.service
    [Unit]
    Description=Ollama Service
    After=network-online.target
    
    [Service]
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3
    #Environment="OLLAMA_HOST=0.0.0.0:11434"
    
    [Install]
    WantedBy=default.target
    
    EOF'

  • Reload the systemd daemon and enable the Ollama service

    sudo systemctl daemon-reload
    sudo systemctl enable ollama
    sudo systemctl start ollama

    You can use ollama --version to double-check if the installation was successful.

    Use systemctl status ollama to check the status. If Ollama is not active and running, make sure you ran systemctl start ollama.
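
    Once the service is active, you can also check that the API responds. By default, it listens on 127.0.0.1:11434 and replies with a short status message:

    curl http://127.0.0.1:11434

    The response should be Ollama is running.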

In the terminal, you can now start language models and ask questions. For example:

ollama run llama2

You can use ollama rm <model-name> to delete a model.
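
You can also send a prompt to the API directly with curl, for example via the /api/generate endpoint. This assumes the llama2 model from the example above has already been downloaded:

curl http://127.0.0.1:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false}'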

The next step explains how to install a web UI so that you can ask questions in a nice user interface via a web browser.

Step 2 - Install Open WebUI

In the Ollama documentation on GitHub, you can find a list of different web and terminal integrations. This example explains how to install Open WebUI.


You can choose to either install Open WebUI on the same server as Ollama, or install Ollama and Open WebUI on two separate servers. If you install Open WebUI on a separate server, make sure the Ollama API is exposed to your network. To double-check, view /etc/systemd/system/ollama.service on the server that has Ollama installed and verify the value of OLLAMA_HOST.
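
To check this from the command line, you can display the configured value on the server with Ollama and test the connection from the server that will run Open WebUI. Replace <ip-address> with the address of the server with Ollama; 11434 is the default API port:

# On the server with Ollama
grep OLLAMA_HOST /etc/systemd/system/ollama.service

# On the server with Open WebUI
curl http://<ip-address>:11434/api/tags

The /api/tags endpoint returns the models that are currently available.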

The steps below explain how to install the user interface:

Install Open WebUI manually

  • Install npm and pip, clone the WebUI repository, and create a copy of the example environment file:

    sudo apt update && sudo apt install npm python3-pip python3-venv git -y
    git clone https://github.com/open-webui/open-webui.git
    cd open-webui
    cp -RPp .env.example .env

    In .env, the address to connect to the Ollama API is set to localhost:11434 by default. If you installed the Ollama API on the same server as Open WebUI, you can leave this setting as is. If you installed Open WebUI on a different server than the Ollama API, edit .env and replace the default value with the address of the server that has Ollama installed.

  • Install the dependencies listed in package.json and run the script named build:

    npm i && npm run build

    If you get an error like Not compatible with your version, run the following commands to use the latest version:

    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
    source ~/.bashrc
    nvm install node && nvm use node
    npm install -g npm@latest

    Now try and run npm i && npm run build again.

  • Install required python packages:

    cd backend
    python3 -m venv venv && source venv/bin/activate
    pip install -r requirements.txt -U
  • Start the web UI with open-webui/backend/start.sh.

    bash start.sh

    If you installed Ollama on a different server, make sure the firewall on the server with Ollama allows access to the port of the API before you start the web UI. The default port for the API is 11434.
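
    If ufw is active on the server with Ollama, a rule like the following would allow the web UI server to reach the API. <webui-server-ip> is a placeholder for the IP address of the server that runs Open WebUI:

    sudo ufw allow from <webui-server-ip> to any port 11434 proto tcp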

In start.sh, the port is set to 8080. This means you can access Open WebUI at http://<ip-address>:8080. If you have an active firewall on your server, you need to allow the port before you can access the chat UI. For this, you can now go to "Step 3 - Allow the port to the web UI". If you don't have a firewall, which is not recommended, you can now skip to "Step 4 - Add models".

Install Open WebUI with Docker

For this step, you need Docker installed. If you haven't installed Docker yet, you can do this now using this tutorial.

For a quick installation of Docker, you can run:
sudo apt update && sudo apt install docker.io apparmor -y

As mentioned before, you can choose to either install Open WebUI on the same server as Ollama, or install Ollama and Open WebUI on two separate servers.

  • Install Open WebUI on the same server as Ollama

    sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
  • Install Open WebUI on a different server than Ollama

    Before you start the Open WebUI, make sure the firewall on the server with Ollama allows access to the port of the API. The default port for the API is 11434.

    sudo docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://<ip-address>:11434 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

    Replace <ip-address>:11434 with the address of the server that has Ollama installed.

In the Docker command above, the port is set to 3000. This means you can access Open WebUI at http://<ip-address>:3000. If you have an active firewall on your server, you need to allow the port before you can access the chat UI. This is explained in the next step.
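
To confirm that the container started successfully, you can check its status and logs. The container name open-webui comes from the docker run commands above:

sudo docker ps --filter name=open-webui
sudo docker logs open-webui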

Step 3 - Allow the port to the web UI

If you have a firewall, make sure it allows access to the Open WebUI port. If you installed it manually, you need to allow port 8080 TCP. If you installed it with Docker, you need to allow port 3000 TCP.

To double-check, you can use netstat to view which ports are in use.

holu@<your-server>:~$ netstat -tulpn | grep LISTEN
Proto    Recv-Q    Send-Q    Local Address       Foreign Address     State
tcp           0         0    127.0.0.1:11434     0.0.0.0:*           LISTEN
tcp           0         0    0.0.0.0:8080        0.0.0.0:*           LISTEN

Port 11434 is used for the Ollama API. Port 8080 is used for the web interface.

There are several different firewall tools. This tutorial covers ufw, the default firewall configuration tool for Ubuntu. If you're using a different firewall, make sure it allows incoming traffic to port 8080 or 3000 TCP.

Managing ufw firewall rules:

  • View current firewall settings
    To check if the ufw firewall is active and if you already have any rules, you can use:

    sudo ufw status

    Example output:

    holu@remote-server:~# ufw status
    Status: active
    
    To                         Action      From
    --                         ------      ----
    OpenSSH                    ALLOW       Anywhere
    OpenSSH (v6)               ALLOW       Anywhere (v6)
  • Allow port 8080 or 3000 TCP
    If the firewall is active, run this command to allow incoming traffic to port 8080 or 3000 TCP:

    sudo ufw allow proto tcp to any port 8080

    If you used Docker, replace 8080 with 3000.

  • View new firewall settings
    The new rules should now be added. To check, use:

    sudo ufw status
If you need to delete any rules, you can use the following commands:

sudo ufw status numbered   # List all rules with numbers
sudo ufw delete <number>   # Delete a rule by specifying the number

Step 4 - Add models

After you access the web UI for the first time, you need to create the first user account. This user will have admin privileges. To start your first chat, you need to select a model. You can browse a list of models on the official Ollama website. In this example, we will add "llama2".

  • In the bottom left corner, click on the user and select "Admin Panel".
  • Go to "Settings" » "Models", enter a model name and select the download button. Wait until this message pops up:
    Model 'llama2' has been successfully downloaded.

    If you get an error such as WebUI could not connect to Ollama, it can mean that Ollama is not active and running. On the server, use systemctl status ollama to check the status and make sure you ran sudo systemctl start ollama.

  • Go back to the chat, click on "Select a model" at the top and add your model.
  • If you want to add several models, you can use the + sign at the top.
  • Once you have added the models you want to use, you can start asking your questions. If you added several models, you will get several answers.
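
As an alternative to the download button in the admin panel, you can also pull models in the terminal on the server that runs Ollama. The web UI lists every model that shows up in ollama list:

ollama pull llama2
ollama list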

Step 5 - Add your own model

If you want to add new models via the user interface, you can do so via http://<ip-address>:8080/modelfiles/create/. Replace 8080 with 3000 if needed.

The following will focus on adding a new model via the terminal. First, connect to the server that has Ollama installed. Use ollama list to list the models that are available so far.

  • Create a model file
    You can find the requirements for a model file in the Ollama documentation on GitHub. In the first line of the model file FROM <model>, you can specify which model you want to use. In this example, we will modify the existing model llama2. If you want to add a completely new model, you need to specify the path to the model file (e.g. FROM ./my-model.gguf).

    nano new-model

    Save this content:

    FROM llama2
    
    # The higher the number, the more creative the answers are
    PARAMETER temperature 1
    
    # If set to "0", the model will not consider any previous context or conversation history when generating responses. Each input is treated independently.
    # If you set a high number such as "4096", the model will consider previous context or conversation history when generating responses. "4096" is the number of tokens that will be considered.
    PARAMETER num_ctx 4096
    
    # Set what "personality" the chat assistant should have in the responses. You can set "who" the chat assistant should respond as and in which style.
    SYSTEM You are a moody lama that only talks about its own fluffy wool.

    A list of available parameters is available on GitHub.

  • Create a model from the model file

    ollama create moody-lama -f ./new-model

    Replace moody-lama with a unique name for your new model.

  • Check if the new model is available
    Use the ollama command to list all models. moody-lama should also be listed.

    ollama list
  • Use your model in the WebUI
    When you go back to the web UI, the model should now also be in the list of Select a model. If it doesn't show up yet, you might need to refresh the page.
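
Before you switch to the web UI, you can also give the new model a quick test directly in the terminal. This assumes you kept the example name moody-lama:

ollama run moody-lama "Tell me about yourself."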

Conclusion

In this tutorial, you learned how to host an AI chat on your own server and how to add your own models.

License: MIT