Moving Mayan EDMS from Direct Deployment to Docker

1. Present situation

I currently have 4 different Mayan installations running: 1 on a physical Ubuntu 22.04 LTS server and 3 others on Ubuntu 20.04 servers, each of the latter being a virtual machine. Each of these installations is for a different client project (or for my own document administration).

1.1 Downside of running all client projects under a single Mayan EDMS instance with ACLs

Theoretically, Mayan has a very granular access control model that would allow putting everything into one single Mayan instance, as I have shown in a previous post on this blog. In practice, though, this approach is quite cumbersome, as it would involve heavy ACL administration work for each uploaded document.

On top of that, clients often insist that their documents must be kept in a completely separate environment for reasons of confidentiality. They do indeed have a point when they insist that there should be more than a simple access control list entry preventing an outsider from getting access to their documents.

Furthermore, running one single Mayan instance for multiple clients would mean that backup jobs could only run under a “one size fits all” setup, which would effectively result in backing up the entire system at the highest frequency required by any customer.

1.2 Downside of running separate Mayan installations

Given the downsides of the single instance approach in 1.1, I opted for putting each project into its own separate Mayan EDMS installation.

While that spared me the ACL administration work, the customer doubts about confidentiality and the backup issues, this solution comes with its own drawbacks:

Each machine (physical or virtual) hosting a server must be updated separately. Whenever there are security upgrades from Ubuntu or new releases from Mayan, the work multiplies compared to the single Mayan instance.

The solution therefore has the typical drawbacks of redundancy – both with regard to hours spent on system maintenance and to resource consumption.

Additionally, a classic dependency problem popped up: I first ran into trouble because the new Ubuntu 22.04 LTS server ships with Python 3.10 as the default version, while the direct deployment of Mayan until recently expected Python 3.8, so I had to work around this with the deadsnakes repository to get a Python 3.8 that I could put into the virtual environment. Now I have two Ubuntu 20.04 LTS instances with Python 3.8 as the default, but since Mayan 4.3, Python 3.10 is required.

1.3 Hoping to get the best of both worlds with Docker

So instead of running multiple servers with multiple Mayan installations, fiddling around with dependencies etc., I will give Docker a try. Ideally, there will be one virtual machine that holds several docker-compose Mayan clusters, one per Mayan installation. But before I can try out whether this really makes maintenance less tedious, I have some migration work to do. Allons-y!

2. Backing up the existing Mayan Server

According to the Mayan EDMS documentation, there are two components to back up:

  • The content of the PostgreSQL database
  • The /opt/mayan-edms/media directory which holds all the files stuffed into the document management system

Prior to running the backup operations, we stop Mayan: sudo service supervisor stop

2.1 Backup PostgreSQL

We will follow the PostgreSQL documentation to dump the mayan database. Initially I thought it would be safer to dump the whole PostgreSQL server with all users, roles and passwords with pg_dumpall instead of pg_dump.

It turned out the opposite is true: the PostgreSQL database set up by the Docker compose file only has a single user, mayan, which is both superuser and owner of the mayan database. Trying to restore a complete server dump built around the default superuser postgres will therefore only cause errors during the restore operation. Go for the simple dump of the mayan database only and everything will be fine:
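On the source machine, this boils down to something like the following (run here as the postgres system user – adjust the -U / -h options to whatever credentials your direct deployment uses; the file name mayanli_pg_20220906.backup is the one we will feed into the restore later):

    sudo -u postgres pg_dump mayan > mayanli_pg_20220906.backup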

2.2 Backup media folder

By default, the Mayan EDMS media folder, which holds all the files checked into the system in their original format (though not under their original names – they are labelled by UUIDs), resides in /opt/mayan-edms/media.

After a quick check to make sure that there is still enough space on our hard drive left…
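Any quick look at the disk will do, for example comparing the free space with the size of the media folder we are about to pack:

    df -h
    du -sh /opt/mayan-edms/media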

… we proceed with the command suggested by the Mayan EDMS online documentation (we leave out the v flag originally given for the tar command as this would result in an endless list of files churning through the terminal window and would slow down the backup):
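The command then looks roughly like this – the archive name mayanli_media_20220906.tar.gz is simply my own naming choice:

    sudo tar -czf mayanli_media_20220906.tar.gz /opt/mayan-edms/media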

Finally, make sure that our backup files have really been created on the source machine:
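A simple listing of the two file names from above is enough:

    ls -lh mayanli_pg_20220906.backup mayanli_media_20220906.tar.gz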

3. Preparing the new Mayan EDMS instance

3.1 Prepare working directory structure

As I would like to have all non-container data relating to Docker in a single place, I first created a docker directory directly under my home directory.

Inside the docker directory I created a directory mayanli that holds all external data for a single Mayan instance:

  • the shell script that holds all the commands to spin up the mayan-edms containers
  • the docker-compose.yml and the .env file that we download in step 3 of the Mayan instructions.
  • the directories which will be needed for data backup and restore – in particular the media and the postgres volumes of our dockerized Mayan installation.
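In shell terms, the skeleton is nothing more than this (the names of the backup / restore directories are my own choice):

    mkdir -p ~/docker/mayanli
    cd ~/docker/mayanli
    # directories for the data we back up and restore (media archive, PostgreSQL dump)
    mkdir media postgres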

3.2 Prepare docker-compose files

Step 3 of the Mayan EDMS instructions downloads the files needed by Docker to create all necessary containers and their surrounding infrastructure (networks, volumes). Download the docker-compose.yml and the .env file into the mayanli directory.
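At the time of writing, this amounts to fetching the two files from the Mayan GitLab repository, roughly like this (check the documentation for the current URLs):

    cd ~/docker/mayanli
    curl -O https://gitlab.com/mayan-edms/mayan-edms/-/raw/master/docker/docker-compose.yml
    curl -O https://gitlab.com/mayan-edms/mayan-edms/-/raw/master/docker/.env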

Edit the .env file as follows:
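What exactly you change is up to you; in my case the essential bit is the compose project name, which is what later produces container names such as mayanli_app_1 and the volume mayanli_app (variable names as found in the .env shipped at the time – double-check against your copy, and pick your own passwords):

    # .env (excerpt)
    COMPOSE_PROJECT_NAME=mayanli
    MAYAN_DATABASE_PASSWORD=change-me-to-something-secret

The mayanli-up.sh wrapper mentioned in 3.1 is then merely a convenience around docker-compose, along the lines of:

    #!/bin/bash
    # mayanli-up.sh – bring up the whole Mayan cluster in the background
    cd "$(dirname "$0")"
    docker-compose up -d

with an analogous script calling docker-compose down (or stop) for shutting the cluster down.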

4. Restoring our data

4.1 PostgreSQL restore

We start with the trickier part of the operation: restoring the PostgreSQL data. We need the PostgreSQL container running to pick up the dump.

  1. To have a running PostgreSQL installation we launch the docker cluster:
    ilek@x220:~/docker/mayanli$ ./mayanli-up.sh
    As docker creates the skeleton infrastructure with lots of disk and database operations, it is recommended to wait a minute before proceeding to the next step. Check the mayanli_app_1 docker log to see whether the process has finished.
  2. As we only work on the PostgreSQL container, we immediately stop all other containers to prevent interferences while restoring our database snapshot:
    ilek@x220:~/docker/mayanli$ docker stop mayanli_app_1
    ilek@x220:~/docker/mayanli$ docker stop mayanli_redis_1
    ilek@x220:~/docker/mayanli$ docker stop mayanli_rabbitmq_1
  3. From the host machine find out the IP address of the PostgreSQL container by issuing:
    docker inspect mayanli_postgresql_1
    In my case the output shows among other things: "IPAddress": "172.30.0.4" (a one-liner that prints just the address is shown after this list).
  4. The PostgreSQL server installed by Docker has only a single user, which is at the same time the superuser: mayan. By default, the pg_hba.conf file has been equipped with an entry that accepts outside connections authenticated by an md5-hashed password. We can therefore connect from the host machine with psql:
    ilek@x220:~/docker/mayanli$ psql -U mayan -d postgres -h 172.30.0.4
  5. In order to prevent errors related to unique keys and similar things, we want a completely empty mayan database. Unfortunately, upon spinning up our mayanli cluster, some mayan database content has also been created that would conflict with our restore content. We therefore drop the mayan database…
    postgres# DROP DATABASE mayan;
    … recreate an empty database:
    postgres# CREATE DATABASE mayan;
    and quit psql with \q.
  6. Now we are ready for the main part: Pushing our restore into the database. From the host machine issue the following command:
    ilek@x220:~/docker/mayanli$ psql -U mayan -h 172.30.0.4 -d mayan < mayanli_pg_20220906.backup
  7. Restart the stopped containers of our Mayan cluster:
    ilek@x220:~/docker/mayanli$ docker start mayanli_rabbitmq_1
    ilek@x220:~/docker/mayanli$ docker start mayanli_redis_1
    ilek@x220:~/docker/mayanli$ docker start mayanli_app_1
  8. Your browser should now respond at http://localhost:80 with the Mayan login screen. Depending on your machine's resources, it might take a minute until all services have started and you get a neat Mayan EDMS login screen; you can issue docker logs mayanli_app_1 to check in the logs whether the startup has completed – the final startup messages should come from celery.
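As a small convenience for step 3 above, docker inspect can also be asked to print just the address instead of the full JSON output:

    docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mayanli_postgresql_1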

4.2 media folder

  1. If still running, shut down the Mayan docker cluster by executing the ./mayanli_down.sh shell script.
  2. Change to root with sudo su and copy your .tar.gz backup file into /var/lib/docker/volumes/mayanli_app/_data.
    Note that although the official docker backup guide uses the term “media folder”, there is no explicit media folder in the docker installation. The /opt/mayan-edms/media folder known from the direct deployment is equivalent to the /var/lib/mayan folder inside the Mayan EDMS docker container, and docker-compose mounts this internal container path to /var/lib/docker/volumes/mayanli_app/_data on the docker host machine.
  3. The Mayan documentation simply states that the content of the old media folder is backed up and restored into the new one using tar. This is not quite true, though, as the config.yml and probably the system/SECRET_KEY files that come with the new installation have to be kept instead of the files from the old installation's media folder. Therefore proceed as follows:
    • In /var/lib/docker/volumes/mayanli_app/_data rename…
    • config.yml and config_backup.yml to config.yml.keep and config_backup.yml.keep
    • system/SECRET_KEY to system/SECRET_KEY.keep (not sure about this one, but let’s see if it runs smoothly)
    • whoosh directory to whoosh.keep (again not sure about this one, but let’s see if it works.)
  4. Now, still as root, clean out everything in /var/lib/docker/volumes/mayanli_app/_data that is not one of the kept files and directories. You will have to do this manually with the rm -rf command.
  5. Still as root on the host machine and inside the same directory, untar your media archive (the full command is part of the sketch after this list):

    (The --strip-components flag is needed because tar would otherwise carry over the full path from the source system, which was /opt/mayan-edms/media. We need to strip those path levels from each extracted path because we are already inside the directory that is equivalent to the media directory of the direct deployment installation.)
  6. Still as root, rename all files / directories that have a keep duplicate to <name>.old and then remove the keep extensions. Finally, change the ownership of the whole data directory content to match what it is at the _data level: since tar has to be run with root privileges, it creates all new content with root:root ownership, while the original docker installation expects (in my case) ilek:ilek as ownership.
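Put together, steps 2 to 6 look roughly like this on my machine. The archive name is the hypothetical one from section 2.2, /home/ilek is an assumed location for the transferred archive, and ilek:ilek is simply the ownership found at the _data level in my case; the cleanup of step 4 is left as a comment rather than a ready-made one-liner:

    # become root first (sudo su), then:
    cd /var/lib/docker/volumes/mayanli_app/_data
    cp /home/ilek/mayanli_media_20220906.tar.gz .

    # step 3: park the files created by the fresh docker installation
    mv config.yml config.yml.keep
    mv config_backup.yml config_backup.yml.keep
    mv system/SECRET_KEY system/SECRET_KEY.keep
    mv whoosh whoosh.keep

    # step 4: remove everything that is neither a *.keep entry, nor the system
    # directory holding SECRET_KEY.keep, nor the archive itself (rm -rf by hand)

    # step 5: unpack the archive, stripping the leading opt/mayan-edms/media
    # components that tar recorded on the source system
    tar -xzf mayanli_media_20220906.tar.gz --strip-components=3

    # step 6: move the archive's own copies aside (where they exist) and put the
    # kept files back in place
    for f in config.yml config_backup.yml system/SECRET_KEY whoosh; do
        [ -e "$f" ] && mv "$f" "$f.old"
        mv "$f.keep" "$f"
    done

    # hand ownership back to the user/group found at the _data level
    chown -R ilek:ilek .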

5. The Mayan Search Backend

The abysmal performance of the Mayan EDMS search backend is a never ending story and there is enough drama to fill a separate blog. As the Docker installation ships with the new Whoosh search backend, my hope was (and still is) that this would change things. Unfortunately, so far it has not and according to my preliminary (!) post-migration testing, the performance is even worse.

While the DjangoSearchBackend was at least capable of finding documents based on their UUID, the new Whoosh search fails even at such basic queries.

For the time being, my recommendation is to switch back to the previous DjangoSearchBackend, so that at least a reliable search by UUID can be performed. To achieve that, go to System -> Settings, click on the Settings button for the identifier Search, click on Edit for SEARCH_BACKEND and replace the value mayan.apps.dynamic_search.backends.whoosh.WhooshSearchBackend with mayan.apps.dynamic_search.backends.django.DjangoSearchBackend.
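If you would rather not click through the UI on every fresh installation, the same switch should also be possible via the .env file of the compose project, since Mayan accepts its settings as MAYAN_-prefixed environment variables – provided your docker-compose.yml actually passes the .env entries through to the app container (check for an env_file entry). Restart the cluster afterwards:

    # .env of the mayanli project – assumption: MAYAN_SEARCH_BACKEND reaches the
    # app container and overrides the SEARCH_BACKEND setting
    MAYAN_SEARCH_BACKEND=mayan.apps.dynamic_search.backends.django.DjangoSearchBackend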