Introduction
paperless-ng is a fork of Paperless, an application that can be used to index your scanned documents. At the time of writing this tutorial, paperless-ng is not 100% done, so it is likely you will need to regularly update your docker image for the latest features.
Please note that this setup should not be exposed to the public internet. Either limit access to your own origin IP or work around this with a VPN (there is already a great pfSense tutorial) and hcloud private networks.
Prerequisites
This tutorial will assume there is already a VPN connection available to the server and it can be reached via its private hcloud network IP.
** paperless-ng host **
- Hostname: dms.yourdomain.com
- OS: Debian 10
- Size: CPX11
- IP address: 10.0.1.1
** Storage box **
- Hostname: uXXXXX.your-storagebox.de
- Username: uXXXXX
Step 1 - Preparation of paperless-ng host
Create the paperless-ng host in the datacenter closest to your location and provide the following cloud-init script to kick-start the installation.
It upgrades the base system and installs Docker, docker-compose and the libraries and tools required to talk to the scanner and the storage box.
#cloud-config
package_upgrade: true
packages:
- cifs-utils
- sane-utils
- imagemagick
runcmd:
- curl -fsSL https://get.docker.com -o get-docker.sh
- sudo sh get-docker.sh
- sudo curl -L "https://github.com/docker/compose/releases/download/1.28.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
- sudo chmod +x /usr/local/bin/docker-compose
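cloud-init runs in the background after the first boot, so it can be worth checking that everything it was asked to install actually arrived before continuing. The following is a minimal sketch; the function name missing_tools is made up for this example, and the tool list in the usage comment is an assumption based on the packages named in the script above.

```shell
#!/bin/sh
# Print every command from the argument list that is not found in PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (run on the host once cloud-init has finished,
# e.g. after `cloud-init status --wait` returns):
#   missing_tools docker docker-compose scanimage convert mount.cifs
```

An empty result means all tools are on the PATH; any name that is printed still needs to be installed.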
Step 1.1 - Mount storagebox
Create a credential file /root/.storageboxcred:
username=uXXXXX
password=<STORAGEBOX-PASSWORD>
To auto-mount the storage box on boot, add the following line at the end of /etc/fstab:
//uXXXXX.your-storagebox.de/backup /mnt/storagebox cifs credentials=/root/.storageboxcred,uid=root,forceuid,gid=root,forcegid,file_mode=0777,dir_mode=0777 0 0
Create the mount folder with mkdir /mnt/storagebox, mount the storage box with mount -a and create a folder called dms inside it using mkdir /mnt/storagebox/dms.
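Because the credential file holds the storage box password in plain text, it is sensible to make it readable by root only. The sketch below is one way to check that; the function name cred_file_is_private is hypothetical, and it relies on GNU stat as shipped with Debian.

```shell
#!/bin/sh
# Succeed only if the file exists and its mode is exactly 600.
# Uses GNU stat (-c), which is what Debian ships.
cred_file_is_private() {
    [ -f "$1" ] && [ "$(stat -c '%a' "$1")" = "600" ]
}

# Example:
#   chmod 600 /root/.storageboxcred
#   cred_file_is_private /root/.storageboxcred || echo "credential file is too open"
```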
Step 1.2 - Prepare paperless-ng Docker
Create a folder in your root directory to hold the docker-compose.yml and docker-compose.env files, and navigate to it. This tutorial will use /root/paperless-ng.
Create docker-compose.env and insert the following lines:
COMPOSE_PROJECT_NAME=paperless
PAPERLESS_OCR_LANGUAGE=deu
PAPERLESS_OCR_CLEAN=clean
If you plan on using paperless-ng only for English documents you may remove PAPERLESS_OCR_LANGUAGE. If you want to use other languages, specify their three-letter ISO 639-2 codes joined with a plus sign (for example deu+eng).
PAPERLESS_OCR_CLEAN will clean up images to yield nicer results. It is not required, but it works fine on CPX11 machines, so why not?
Create docker-compose.yml accordingly:
version: "3.4"
services:
  broker:
    image: redis:6.0
    restart: unless-stopped
  db:
    image: postgres:13
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: jonaswinkler/paperless-ng:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - 80:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - /mnt/storagebox/dms/data:/usr/src/paperless/data
      - /mnt/storagebox/dms/media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - /var/lib/paperless-ng/consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
volumes:
  pgdata:
Consumption directory
For security reasons, create a service user with useradd -r brscan-skey and add it to the scanner group via gpasswd -a brscan-skey scanner. To map the user and group into the containers properly, the docker-compose.env file needs the user ID and group ID of the brscan-skey user. To accomplish this we run this one-liner:
printf "USERMAP_UID=$(id -u brscan-skey)\nUSERMAP_GID=$(id -g brscan-skey)" >> docker-compose.env
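Note that the one-liner appends to the file, so running it a second time leaves duplicate entries. If that is a concern, a small helper that replaces an existing key instead of appending could look like the sketch below. The function name set_env_var is made up for this example; it is not part of docker-compose or paperless-ng.

```shell
#!/bin/sh
# set_env_var FILE KEY VALUE
# Replace an existing KEY=... line or append a new one, so that running
# the step twice does not leave duplicate entries in FILE.
set_env_var() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of KEY.
    if [ -f "$file" ]; then
        grep -v "^${key}=" "$file" > "$tmp" || true
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example:
#   set_env_var docker-compose.env USERMAP_UID "$(id -u brscan-skey)"
#   set_env_var docker-compose.env USERMAP_GID "$(id -g brscan-skey)"
```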
Next we create the consume folder for the paperless-ng instance to monitor: mkdir /var/lib/paperless-ng && mkdir /var/lib/paperless-ng/consume.
The folder is still owned by the root user, so we have to change the owner and group to the brscan-skey user/group and fix the permissions.
chown -R $(id -u brscan-skey):$(id -g brscan-skey) /var/lib/paperless-ng/
chmod -R 700 /var/lib/paperless-ng/
Step 1.3 - Prepare Brother environment
Brother Linux drivers are available as .deb packages for most of the devices. Just visit their homepage and find your driver.
As paperless-ng does not print we only need the scan drivers and the brscan-skey utility tool. This utility enables initiating the scan directly at the printer device (if your model supports this).
Required files:
- Scanner driver 64bit (deb package)
- Scan-key-tool 64bit (deb package)
Download them and upload them to the machine, or copy the exact download URL from the browser's download list (CTRL+J in Chrome) and fetch it with wget on the machine.
First install the scanner driver, for example brscan4, via dpkg -i --force-all brscan4-0.4.10-1.amd64.deb and then install brscan-skey via dpkg -i --force-all brscan-skey-0.3.1-2.amd64.deb. Resolve any missing dependencies via apt install -f.
Next we open the Brother udev rules, which no longer work, with nano /etc/udev/rules.d/60-brother-libsane-type1.rules, delete all entries and insert:
ATTR{idVendor}=="04f9", MODE="0666", GROUP="scanner", ENV{libsane_matched}="yes", SYMLINK+="scanner-%k"
The scanner prerequisites are now satisfied and we can configure the SANE service via brsaneconfig4 -a name=FRIENDLY-NAME model=MODEL-NAME ip=xx.xx.xx.xx. For MODEL-NAME choose the exact model name, e.g. MFC-L2710DW.
Note that the printer needs to have a static IP configured, either in its own settings or via a static DHCP lease.
For example, this can look like: brsaneconfig4 -a name=Scanner model=MFC-L2710DW ip=10.1.1.40. To verify the installation run brscan-skey && brscan-skey -l and scanimage -L. This should yield something like:
root@dms:~/paperless-ng# scanimage -L
device `brother4:net1;dev0' is a Brother Scanner MFC-L2710DW
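The device name between the backtick and the closing quote is needed again later in the scan script. As a sketch, assuming the output format shown above, it can be extracted like this (the function name first_scanner_device is hypothetical):

```shell
#!/bin/sh
# Read `scanimage -L` output on stdin and print the first device name,
# i.e. the text between the backtick and the closing single quote.
first_scanner_device() {
    sed -n "s/^device \`\([^']*\)'.*/\1/p" | head -n 1
}

# Example:
#   DEVICE=$(scanimage -L | first_scanner_device)
#   echo "$DEVICE"
```

If no scanner is listed, the function prints nothing, which is easy to test for before starting a scan.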
Next we create a custom service to autostart brscan-skey on boot. Create the file with nano /etc/systemd/system/brscan-skey.service and paste:
[Unit]
Description=Brother Scan SKey Service
[Service]
Type=forking
Restart=always
User=brscan-skey
ExecStart=/opt/brother/scanner/brscan-skey/brscan-skey
ExecStop=/opt/brother/scanner/brscan-skey/brscan-skey --terminate
[Install]
WantedBy=multi-user.target
Reload the systemd daemon with systemctl daemon-reload. Then enable our newly created service with systemctl enable brscan-skey.service and start it with systemctl start brscan-skey.service.
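To confirm that the unit is both enabled for boot and currently running, a short check along these lines may help. This is only a sketch; service_ready is a name invented for this example.

```shell
#!/bin/sh
# Succeed only if the unit is both enabled at boot and currently active.
service_ready() {
    systemctl is-enabled --quiet "$1" && systemctl is-active --quiet "$1"
}

# Example:
#   service_ready brscan-skey.service && echo "brscan-skey is up"
```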
Next we have to specify the interface through which the printer is reachable, in my case enp7s0 (this can vary depending on the hcloud machine type you are running on; check via ip a).
Open the config with nano /opt/brother/scanner/brscan-skey/brscan-skey.config and append the following line: eth=enp7s0.
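Instead of reading the interface name off the ip a output by hand, it can be looked up from the host's private IP. The sketch below assumes that the wanted interface is the one carrying the private address from the prerequisites (10.0.1.1) and that the input has the one-line-per-address format printed by ip -o -4 addr show; the function name iface_for_ip is hypothetical.

```shell
#!/bin/sh
# Read `ip -o -4 addr show` output on stdin and print the name of the
# interface that carries the given IPv4 address (field 4 is ADDRESS/PREFIX).
iface_for_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Example:
#   ip -o -4 addr show | iface_for_ip 10.0.1.1
```

The printed name is the value to use in the eth= line of brscan-skey.config.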
After this we create our own scan script to merge multiple pages into one multi-page .tiff document: nano /opt/brother/scanner/brscan-skey/script/scantofile_v2.sh
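#!/bin/sh
# Scan every page in the document feeder and merge them into one multi-page TIFF.
TMP=$(mktemp -d)
DSTDIR='/var/lib/paperless-ng/consume'
NAME=scan_$(date "+%d.%m.%Y-%H:%M:%S").tiff
scanimage -L
scanimage --device-name="brother4:net1;dev0" --resolution 200 --format=png --batch="$TMP/out-%d.png" -x 210 -y 297
cd "$TMP" || exit 1
# Sort numerically so that page 10 follows page 9 rather than page 1.
convert $(ls out-*.png | sort -t- -k2 -n) "$NAME"
mv "$NAME" "$DSTDIR"
rm -rf "$TMP"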
NOTE: Replace brother4:net1;dev0 with the appropriate device name displayed when scanimage -L is called. If you scan something other than A4, change the -x and -y parameters accordingly.
A resolution of 200 dpi is more than enough for Tesseract OCR.
This script scans all pages in the document feeder (the automatic thingy that takes a stack of pages and scans them one by one) as .png files into a temporary directory under /tmp, converts them to a multi-page .tiff, moves that to the consume directory and deletes the temporary directory.
In order for brscan-skey to call our script we need to change the config again. Open it with nano /opt/brother/scanner/brscan-skey/brscan-skey.config and replace FILE="bash /opt/brother/scanner/brscan-skey/script/scantofile.sh" with FILE="bash /opt/brother/scanner/brscan-skey/script/scantofile_v2.sh".
Step 2 - Initialize paperless-ng
Everything is prepared now. First reboot the machine with reboot and log in again, so that the udev rules are applied properly and we can validate our new service.
systemctl status brscan-skey.service should yield:
root@dms:~/paperless-ng# systemctl status brscan-skey.service
● brscan-skey.service - Brother Scan SKey Service
Loaded: loaded (/etc/systemd/system/brscan-skey.service; enabled; vendor preset: enabled)
Active: active (running) since XXXX
Process: 558 ExecStart=/opt/brother/scanner/brscan-skey/brscan-skey (code=exited, status=0/SUCCESS)
Main PID: 567 (brscan-skey-exe)
Tasks: 4 (limit: 2296)
Memory: 20.7M
CGroup: /system.slice/brscan-skey.service
└─567 /opt/brother/scanner/brscan-skey/brscan-skey-exe
Mar 05 20:40:54 dms systemd[1]: Starting Brother Scan SKey Service...
Mar 05 20:40:54 dms systemd[1]: Started Brother Scan SKey Service.
Change into the paperless-ng directory with cd ~/paperless-ng and pull the docker images with docker-compose pull. Next we will initialize the postgres database and create a paperless-ng superuser with docker-compose run --rm webserver createsuperuser.
Follow the prompts within the console. You may use a made-up e-mail address, e.g. admin@admin.invalid (always use the reserved .invalid domain for this). Complete the initialization with docker-compose up -d.
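The web interface can take a little while to answer after the containers have started. A minimal sketch for waiting until it responds is shown below; the function name wait_for_http and the default of 30 tries with a 5 second pause are assumptions chosen for this example, and the URL should be the private IP of your host.

```shell
#!/bin/sh
# wait_for_http URL [TRIES] [DELAY_SECONDS]
# Poll URL until curl gets a successful response or the tries run out.
wait_for_http() {
    url=$1 tries=${2:-30} delay=${3:-5}
    while [ "$tries" -gt 0 ]; do
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_http http://10.0.1.1/ && echo "paperless-ng is reachable"
```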
You can now log in at http://IP/. As this tutorial assumes you use a secure network (i.e. a secured VPN connection) and the machine is completely unreachable from the internet, HTTPS is not set up.
Conclusion
Documents scanned directly from the device will now be placed in /var/lib/paperless-ng/consume by our scanner script /opt/brother/scanner/brscan-skey/script/scantofile_v2.sh, which is called from our custom service /etc/systemd/system/brscan-skey.service.