1. Present situation
I currently have 4 different Mayan installations running: 1 on a physical Ubuntu 22.04 LTS server and 3 others on Ubuntu 20.04 servers, each of the 3 being a virtual machine. Each of these installations serves a different client project (or my own document administration).
1.1 Downside of running all client projects under a single Mayan EDMS instance with ACLs
Theoretically, Mayan has a very granular access control model that would allow putting everything into one single Mayan instance, as I have shown in a previous post on this blog. In practice, though, this approach is quite cumbersome, as it burdens every document upload with heavy ACL administration work.
On top of that, clients often insist that their documents be kept in a completely separate environment for reasons of confidentiality. They do have a point when they demand that more than a simple access control list entry stand between an outsider and their documents.
Furthermore, running one single Mayan instance for multiple clients would mean that backup jobs could only run under a “one size fits all” setup, which would effectively result in backing up the entire system at the highest frequency required by any customer.
1.2 Downside of running separate Mayan installations
Given the downsides of the single-instance approach in 1.1, I opted for putting each project into its own separate Mayan EDMS installation.
While that spared me the ACL administration work, the customer doubts about confidentiality, and the backup issues, this solution comes with its own drawbacks:
Each machine (physical or virtual) hosting a server must be updated separately. Whenever Ubuntu publishes security upgrades or Mayan releases a new version, the work multiplies compared to a single Mayan instance.
The solution therefore has the typical drawbacks of redundancy – both in terms of hours spent on system maintenance and in terms of resource consumption.
Additionally, a classic dependency problem popped up: I first ran into trouble because the new Ubuntu 22.04 LTS server ships with Python 3.10 as the default version, while the direct deployment of Mayan until recently expected Python 3.8, so I had to work around this with the deadsnakes repository to get a Python 3.8 that I could put into the virtual environment. Now I have two Ubuntu 20.04 LTS instances with Python 3.8 as default, but since Mayan 4.3, Python 3.10 is required.
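For reference, the deadsnakes workaround on 22.04 looked roughly like this – a sketch from memory; the PPA and package names are real, the venv target path just follows the direct-deployment convention:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.8 python3.8-venv
# create the virtual environment for the direct deployment with Python 3.8
python3.8 -m venv /opt/mayan-edms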
1.3 Hoping to get the best of both worlds with Docker
So instead of running multiple servers with multiple Mayan installations, fiddling around with dependencies and so on, I will give Docker a try. Ideally, there will be one virtual machine that hosts several docker-compose Mayan clusters, one per Mayan installation. But before I can try out whether this really makes maintenance less tedious, there is some migration work to do. Allons-y!
2. Backing up the existing Mayan Server
According to the Mayan EDMS documentation, there are two components to back up:
- The content of the PostgreSQL database
- The /opt/mayan-edms/media directory, which holds all the files stuffed into the document management system
Prior to running the backup operations, we stop Mayan:
sudo service supervisor stop
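To be on the safe side, you can verify that all Mayan processes are really down before dumping anything – a quick check of my own, not from the Mayan docs:
ilek@mayan2:~$ sudo supervisorctl status
# all mayan-edms entries should now show STOPPED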
2.1 Backup PostgreSQL
We will follow the PostgreSQL documentation to dump the mayan database. Initially I thought it would be safer to dump the whole PostgreSQL server with all users, roles and passwords using pg_dumpall instead of pg_dump. It turned out the opposite is true: the PostgreSQL database set up by the Docker compose only has a single user mayan, which is both superuser and owner of the mayan database. Therefore, trying to restore a complete server dump with the default superuser postgres will only cause errors during the restore operation. Go for the simple dump of the mayan database only and everything will be fine:
ilek@mayan2:~$ pg_dump -U mayan -d mayan -f mayanli_dump_20220906.backup
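Side note: pg_dump can alternatively write a compressed custom-format archive, which is restored with pg_restore instead of psql. I did not use this here; it is sketched only as an option:
ilek@mayan2:~$ pg_dump -U mayan -d mayan -Fc -f mayanli_dump_20220906.dump
# restore counterpart on the target (instead of piping into psql):
# pg_restore -U mayan -h <container-ip> -d mayan mayanli_dump_20220906.dump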
2.2 Backup media folder
By default, the Mayan EDMS media folder, which holds all the files checked into the system in their original format (not under their original names, though, but labelled by UUIDs), resides in /opt/mayan-edms/media.
After a quick check to make sure that there is still enough space left on our hard drive…
ilek@mayan2:/opt/mayan-edms$ df -h
Filesystem      Size  Used Avail Use% Mounted on
[...output shortened...]
/dev/sda2        41G   14G   26G  35% /
[...output shortened...]
tmpfs           394M     0  394M   0% /run/user/1000
… we proceed with the command suggested by the Mayan EDMS online documentation (we leave out the v flag originally given for the tar command, as this would result in an endless list of files churning through the terminal window and would slow down the backup):
ilek@mayan2:~$ sudo tar -zcf mayanli_media_20220906.tar.gz /opt/mayan-edms/media/
tar: Removing leading `/' from member names
Finally, make sure that our backup files have really been created on the source machine:
ilek@mayan2:~$ ls -alh | grep mayan
-rw-rw-r-- 1 ilek ilek  45M Sep  6 14:19 mayanli_dump_20220906.backup
-rw-r--r-- 1 root root 1.6G Sep  6 15:14 mayanli_media_20220906.tar.gz
ilek@mayan2:~$
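The two backup files now have to travel to the machine that will host the Docker setup. Assuming the new host is called x220, as in the sections below, and is reachable via SSH, something like this will do (hostnames and paths are specific to my setup):
ilek@mayan2:~$ scp mayanli_dump_20220906.backup mayanli_media_20220906.tar.gz ilek@x220:~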
3. Preparing the new Mayan EDMS instance
3.1 Prepare working directory structure
As I would like to have all non-container data relating to Docker in a single place, I first created a docker directory directly under my home directory.
Inside the docker directory I created a directory mayanli that should hold all external data for a single Mayan instance (the commands for creating this layout are sketched after the list):
- the shell script that holds all the commands to spin up the mayan-edms containers
- the docker-compose.yml and the .env file that we download in step 3 of the Mayan instructions
- the directories which will be needed for data backup and restore – in particular the media and the postgres volumes of our dockerized Mayan installation
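A minimal sketch of creating this layout – the directory names are simply my own choice, nothing here is mandated by Mayan:
ilek@x220:~$ mkdir -p ~/docker/mayanli/media ~/docker/mayanli/postgres
ilek@x220:~$ cd ~/docker/mayanli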
3.2 Prepare docker-compose files
Step 3 of the Mayan EDMS instructions downloads the files needed by Docker to create all necessary containers and their surrounding infrastructure (networks, volumes). Download the docker-compose.yml and the .env file into the mayanli directory.
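At the time of writing, the Mayan documentation pointed to the project's GitLab repository for these two files, so the download looks like this (the URLs may change with new releases, so better double-check against the current docs):
ilek@x220:~/docker/mayanli$ curl https://gitlab.com/mayan-edms/mayan-edms/-/raw/master/docker/docker-compose.yml -O
ilek@x220:~/docker/mayanli$ curl https://gitlab.com/mayan-edms/mayan-edms/-/raw/master/docker/.env -O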
Edit the .env file as follows:
[...]
COMPOSE_PROJECT_NAME=mayanli
[...]
MAYAN_APT_INSTALLS="tesseract-ocr-deu tesseract-ocr-fra"
[...]
MAYAN_DATABASE_NAME=mayan
MAYAN_DATABASE_PASSWORD=yourPostgreSQLUserPassword
MAYAN_DATABASE_USER=mayan
[...]
Setting MAYAN_APP_VOLUME and MAYAN_POSTGRES_VOLUME in the .env file did not work for me: while the custom postgres directory is populated during the install, the custom media directory remains completely empty. Furthermore, two additional volumes are still created under the default /var/lib/docker/volumes path.
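To see where Docker actually keeps the data, you can list and inspect the named volumes. The volume names derive from COMPOSE_PROJECT_NAME=mayanli, so the exact names on your system may differ:
ilek@x220:~/docker/mayanli$ docker volume ls | grep mayanli
ilek@x220:~/docker/mayanli$ docker volume inspect --format '{{ .Mountpoint }}' mayanli_app
/var/lib/docker/volumes/mayanli_app/_data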
4. Restoring our data
4.1 PostgreSQL restore
We start with the trickier part of the operation: restoring the PostgreSQL data. We need the PostgreSQL container running to pick up the dump.
- To have a running PostgreSQL installation, we launch the Docker cluster:
ilek@x220:~/docker/mayanli$ ./mayanli-up.sh
As Docker creates the skeleton infrastructure with lots of disk and database operations, it is recommended to wait a minute before the next step. Check the mayanli_app_1 Docker log to see whether the process has finished.
- As we only work on the PostgreSQL container, we immediately stop all other containers to prevent interference while restoring our database snapshot:
ilek@x220:~/docker/mayanli$ docker stop mayanli_app_1
ilek@x220:~/docker/mayanli$ docker stop mayanli_redis_1
ilek@x220:~/docker/mayanli$ docker stop mayanli_rabbitmq_1
- From the host machine, find out the IP address of the PostgreSQL container by issuing:
docker inspect mayanli_postgresql_1
In my case the output shows, among other things: "IPAddress": "172.30.0.4"
- The PostgreSQL server installed by Docker has only a single user, mayan, which is the superuser at the same time. The pg_hba.conf file has by default been equipped with an entry to accept outside connections that are identified by an md5-hashed password. We can therefore connect from the host machine with psql:
ilek@x220:~/docker/mayanli$ psql -U mayan -d postgres -h 172.30.0.4
- In order to prevent errors related to unique keys and similar things, we want a completely empty mayan database. Unfortunately, upon spinning up our mayanli cluster, some mayan database content has already been created that would conflict with our restore content. We therefore drop the mayan database…
postgres=# DROP DATABASE mayan;
… recreate an empty database…
postgres=# CREATE DATABASE mayan;
… and quit with \q.
- Now we are ready for the main part: pushing our restore into the database. From the host machine, issue the following command (a sanity check for this step is sketched after this list):
ilek@x220:~/docker/mayanli$ psql -U mayan -h 172.30.0.4 -d mayan < mayanli_dump_20220906.backup
- Restart the stopped containers of our Mayan cluster:
ilek@x220:~/docker/mayanli$ docker start mayanli_rabbitmq_1
ilek@x220:~/docker/mayanli$ docker start mayanli_redis_1
ilek@x220:~/docker/mayanli$ docker start mayanli_app_1
- Your browser should now respond to http://localhost:80 with the Mayan login screen. Depending on your machine's resources, it might take a minute until all services have started and you get a neat Mayan EDMS login screen; you can issue docker logs mayanli_app_1 to check in the logs whether Docker has completed the startup – the final startup messages should come from celery.
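As a quick sanity check right after the import in the sixth step, you can count the rows of one of the restored tables. documents_document is the table name I would expect for Mayan's document model under Django's default naming – treat it as an assumption and adapt it to your schema:
ilek@x220:~/docker/mayanli$ psql -U mayan -h 172.30.0.4 -d mayan -c 'SELECT COUNT(*) FROM documents_document;'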
4.2 Media folder restore
- If still running, shut down the Mayan Docker cluster by executing the ./mayanli_down.sh shell script.
- Change to root with sudo su and copy your .tar.gz backup file into /var/lib/docker/volumes/mayanli_app/_data.
Note that although the official Docker backup guide uses the term “media folder”, there is no explicit media folder in the Docker installation. The /opt/mayan-edms/media folder known from the direct deployment is equivalent to the /var/lib/mayan folder inside the Mayan EDMS Docker container, and docker-compose mounts this internal container path to /var/lib/docker/volumes/mayanli_app/_data on the Docker host machine.
- The Mayan documentation simply suggests backing up the old media folder and restoring it into the new one using tar. This is not entirely accurate, though, as the config.yml and probably the system/SECRET_KEY files that come with the new installation have to be kept instead of the files from the old installation's media folder. Therefore proceed as follows: in /var/lib/docker/volumes/mayanli_app/_data rename
  - config.yml and config_backup.yml to config.yml.keep and config_backup.yml.keep
  - system/SECRET_KEY to system/SECRET_KEY.keep (not sure about this one, but let's see if it runs smoothly)
  - the whoosh directory to whoosh.keep (again not sure about this one, but let's see if it works)
- Now, still as root, clean out everything that is not one of the kept files and directories in /var/lib/docker/volumes/mayanli_app/_data. You will have to use the rm -rf command manually.
- Still as root on the host machine and inside the same directory, untar your media archive:
root@x220:/var/lib/docker/volumes/mayanli_app/_data# tar --strip-components=3 -xzf mayanli_media_20220906.tar.gz
(The --strip-components flag is needed because tar would otherwise carry over the full path from the source system, which was /opt/mayan-edms/media. We need to strip those path levels from each extracted path because we are already inside the directory which is equivalent to the media directory of the direct deployment installation.)
- Still as root, rename all files / directories that have a .keep duplicate to <name>.old and then remove the .keep extensions. Finally, change the ownership of the whole data directory content to what it is at the _data level: as tar has to be run with root privileges, it creates all new content with root:root ownership, while the original Docker installation expects (in my case) ilek:ilek as ownership.
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config.yml config.yml.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config.yml.keep config.yml
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config_backup.yml config_backup.yml.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config_backup.yml.keep config_backup.yml
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv whoosh whoosh.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv whoosh.keep/ whoosh
root@x220:/var/lib/docker/volumes/mayanli_app/_data# cd system/
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# ls
SECRET_KEY  SECRET_KEY.keep  VERSION
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# mv SECRET_KEY SECRET_KEY.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# mv SECRET_KEY.keep SECRET_KEY
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# mv VERSION VERSION.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# cd ../..
root@x220:/var/lib/docker/volumes/mayanli_app# chown -R ilek:ilek _data/
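With both the database and the media contents in place, the cluster can be brought up again; watching the app container's log until the celery startup messages appear is the easiest way to see whether the restored installation comes up cleanly:
ilek@x220:~/docker/mayanli$ ./mayanli-up.sh
ilek@x220:~/docker/mayanli$ docker logs -f mayanli_app_1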
5. The Mayan Search Backend
The abysmal performance of the Mayan EDMS search backend is a never-ending story, and there is enough drama in it to fill a separate blog post. As the Docker installation ships with the new Whoosh search backend, my hope was (and still is) that this would change things. Unfortunately, so far it has not; according to my preliminary (!) post-migration testing, the performance is even worse.
While the DjangoSearchBackend was at least capable of finding documents based on their UUID, the new Whoosh search fails even at such basic queries.
For the time being, my recommendation is to switch back to the previous DjangoSearchBackend, so that at least a reliable search by UUID can be performed. To achieve that, go to System -> Settings, click on the Settings button for the identifier Search, click on Edit for SEARCH_BACKEND, and replace the value mayan.apps.dynamic_search.backends.whoosh.WhooshSearchBackend with mayan.apps.dynamic_search.backends.django.DjangoSearchBackend.
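If you prefer to keep such configuration out of the web UI, Mayan settings can generally also be supplied as environment variables with a MAYAN_ prefix. The same switch should therefore be possible from the .env file – a hypothetical sketch, assuming your docker-compose setup passes MAYAN_* variables through to the app container; the containers need to be recreated afterwards:
MAYAN_SEARCH_BACKEND=mayan.apps.dynamic_search.backends.django.DjangoSearchBackend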