1. Present situation
I currently have four different Mayan installations running: one on a physical Ubuntu 22.04 LTS server and three on Ubuntu 20.04 servers, each of the latter being a virtual machine. Each of these installations serves a different client project (or my own document administration).
1.1 Downside of running all client project under a single Mayan EDMS instance with ACLs
Theoretically, Mayan has a very granular access control model that would allow me to put everything into one single Mayan instance, as I have shown in a previous post on this blog. In practice, though, this approach is quite cumbersome, as it involves heavy ACL administration work with each document uploaded.
On top of that, clients often insist that their documents must be kept in a completely separate environment for reasons of confidentiality. They do indeed have a point when they insist that there should be more than a simple access control list entry preventing an outsider from getting access to their documents.
Furthermore, running one single Mayan instance for multiple clients would mean that backup jobs could only run under a “one size fits all” setup, which would effectively result in backing up the entire system at the highest of all frequencies required by customers.
1.2 Downside of running separate Mayan installations
Given the downsides of the single-instance approach in 1.1, I opted for putting each project into its own separate Mayan EDMS installation.
While that spared me the ACL administration work, the customer doubts about confidentiality and the backup issues, this solution comes with its own drawbacks:
Each machine (physical or virtual) hosting a server must be updated separately. Whenever there are security upgrades from Ubuntu or new releases from Mayan, the work multiplies compared to the single Mayan instance.
The solution therefore has the typical drawbacks of redundancy, both with regard to hours spent on system maintenance and to resource consumption.
Additionally, a classical dependency problem popped up. I first ran into trouble because the new Ubuntu 22.04 LTS server ships with Python 3.10 as the default version, while the direct deployment of Mayan until recently expected Python 3.8, so I had to work around this with the deadsnakes repository to get a Python 3.8 that I could put into the virtual environment. Now I have two Ubuntu 20.04 LTS instances with Python 3.8 as default, but since Mayan 4.3, Python 3.10 is required.
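For reference, the deadsnakes workaround on Ubuntu 22.04 looked roughly like this. This is a sketch from memory; the `mayan` user and the `/opt/mayan-edms` virtual environment path are assumptions based on the direct deployment layout, so adapt them to your setup:

```shell
# Sketch: installing Python 3.8 from the deadsnakes PPA on Ubuntu 22.04
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.8 python3.8-venv python3.8-dev

# Recreate the virtual environment for the direct Mayan deployment with it
# (assumed user and path, as in the standard direct deployment instructions)
sudo -u mayan python3.8 -m venv /opt/mayan-edms
```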
1.3 Hoping to get the best of both worlds with Docker
So instead of running multiple servers with multiple Mayan installations, fiddling around with dependencies etc., I will give Docker a try. Ideally, there will be a virtual machine that holds several docker-compose Mayan clusters, one for each Mayan installation. But before I can try out whether this allows a less tedious maintenance, I have some migration work to do. Allons-y!
2. Backing up the existing Mayan Server
According to the Mayan EDMS documentation, there are two components to back up:
- The content of the PostgreSQL database
- The `/opt/mayan-edms/media` directory, which holds all the files stuffed into the document management system
Prior to running the backup operations, we stop Mayan:

```shell
sudo service supervisor stop
```
2.1 Backup PostgreSQL
We will follow the PostgreSQL documentation to dump the mayan database. Initially I thought it would be safer to dump the whole PostgreSQL server with all users, roles and passwords with pg_dumpall instead of pg_dump.
It turned out the opposite is true: the PostgreSQL database set up by the Docker compose only has a single user mayan, which is both superuser and owner of the mayan database. Therefore, trying to restore a complete server with the default superuser postgres will only cause errors during the restore operation. Go for the simple dump of the mayan database only and everything will be fine (note that without a -F flag, pg_dump writes a plain SQL script, which is why we will later restore it by feeding it to psql rather than pg_restore):
```shell
ilek@mayan2:~$ pg_dump -U mayan -d mayan -f mayanli_dump_20220906.backup
```
2.2 Backup media folder
By default, the Mayan EDMS media folder, which holds all the files checked into the system in their original format (though not under their original names; they are labelled by UUIDs), resides in `/opt/mayan-edms/media`.
After a quick check to make sure that there is still enough space on our hard drive left…
```shell
ilek@mayan2:/opt/mayan-edms$ df -h
Filesystem      Size  Used Avail Use% Mounted on
[...output shortened...]
/dev/sda2        41G   14G   26G  35% /
[...output shortened...]
tmpfs           394M     0  394M   0% /run/user/1000
```
… we proceed with the command suggested by the Mayan EDMS online documentation (we leave out the v flag originally given for the tar command, as it would churn an endless list of files through the terminal window and slow down the backup):
```shell
ilek@mayan2:~$ sudo tar -zcf mayanli_media_20220906.tar.gz /opt/mayan-edms/media/
tar: Removing leading `/' from member names
```
Finally make sure that our backup files have really been created on the source machine:
```shell
ilek@mayan2:~$ ls -alh | grep mayan
-rw-rw-r-- 1 ilek ilek  45M Sep  6 14:19 mayanli_dump_20220906.backup
-rw-r--r-- 1 root root 1.6G Sep  6 15:14 mayanli_media_20220906.tar.gz
ilek@mayan2:~$
```
3. Preparing the new Mayan EDMS instance
3.1 Prepare working directory structure
As I would like to have all non-container data relating to Docker in a single place, I first created a docker directory directly under my home directory.
Inside the docker directory I created a directory mayanli that holds all external data for a single Mayan instance:
- the shell script with all the commands to spin up the mayan-edms containers
- the `docker-compose.yml` and the `.env` file that we download in step 3 of the Mayan instructions
- the directories which will be needed for data backup and restore, in particular the `media` and the `postgres` volumes of our dockerized Mayan installation
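The layout described above can be created in one go. The directory names are just this post's conventions, not something Mayan requires:

```shell
# Create the per-instance working directory plus the planned
# backup/restore subdirectories for the media and postgres volumes
mkdir -p ~/docker/mayanli/media ~/docker/mayanli/postgres

# Show the resulting tree
ls -R ~/docker
```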
3.2 Prepare docker-compose files
Step 3 of the Mayan EDMS instructions downloads the files needed by Docker to create all necessary containers and their surrounding infrastructure (networks, volumes). Download the docker-compose.yml and the .env file into the mayanli directory.
Edit the .env file as follows:
```shell
[...]
COMPOSE_PROJECT_NAME=mayanli
[...]
MAYAN_APT_INSTALLS="tesseract-ocr-deu tesseract-ocr-fra"
[...]
MAYAN_DATABASE_NAME=mayan
MAYAN_DATABASE_PASSWORD=yourPostgreSQLUserPassword
MAYAN_DATABASE_USER=mayan
[...]
```
Setting MAYAN_APP_VOLUME and MAYAN_POSTGRES_VOLUME in the .env file did not work for me: while the custom postgres directory is populated during the install, the custom media directory remains completely empty. Furthermore, two additional volumes are still created under the default /var/lib/docker/volumes path.
4. Restoring our data
4.1 PostgreSQL restore
We start with the trickier part of the operation: restoring the PostgreSQL data. We need the PostgreSQL container running to pick up the dump.
- To have a running PostgreSQL installation, we launch the Docker cluster:

  ```shell
  ilek@x220:~/docker/mayanli$ ./mayanli-up.sh
  ```

  As Docker creates the skeleton infrastructure with lots of disk and database operations, it is recommended to wait a minute before the next step. Check the `mayanli_app_1` Docker log to see whether the process has finished.
- As we only work on the PostgreSQL container, we immediately stop all other containers to prevent interference while restoring our database snapshot:

  ```shell
  ilek@x220:~/docker/mayanli$ docker stop mayanli_app_1
  ilek@x220:~/docker/mayanli$ docker stop mayanli_redis_1
  ilek@x220:~/docker/mayanli$ docker stop mayanli_rabbitmq_1
  ```

- From the host machine, find out the IP address of the PostgreSQL container by issuing:

  ```shell
  ilek@x220:~/docker/mayanli$ docker inspect mayanli_postgresql_1
  ```

  In my case the output shows, among other things: `"IPAddress": "172.30.0.4"`
- The PostgreSQL server installed by Docker has only a single user, `mayan`, which is the superuser at the same time. The `pg_hba.conf` file has by default been equipped with an entry to accept outside connections that are identified by an md5-hashed password. We can therefore connect from the host machine with `psql`:

  ```shell
  ilek@x220:~/docker/mayanli$ psql -U mayan -d postgres -h 172.30.0.4
  ```

- In order to prevent errors related to unique keys and similar things, we want a completely empty mayan database. Unfortunately, upon spinning up our mayanli cluster, some mayan database content has already been created that would conflict with our restore content. We therefore drop the mayan database, recreate it empty, and quit the `psql` session with `\q`:

  ```shell
  postgres=# DROP DATABASE mayan;
  postgres=# CREATE DATABASE mayan;
  ```

- Now we are ready for the main part: pushing our restore into the database. From the host machine, issue the following command:

  ```shell
  ilek@x220:~/docker/mayanli$ psql -U mayan -h 172.30.0.4 -d mayan < mayanli_dump_20220906.backup
  ```

- Restart the stopped containers of our Mayan cluster:

  ```shell
  ilek@x220:~/docker/mayanli$ docker start mayanli_rabbitmq_1
  ilek@x220:~/docker/mayanli$ docker start mayanli_redis_1
  ilek@x220:~/docker/mayanli$ docker start mayanli_app_1
  ```

- Your browser should now respond at `http://localhost:80` with the Mayan login screen. Depending on your machine resources, it might take a minute until all services have started and you get a neat Mayan EDMS login screen; you can issue a `docker logs mayanli_app_1` to check in the logs whether Docker has completed the startup. The final startup messages should be from `celery`.
4.2 Media folder restore
- If still running, shut down the Mayan docker cluster by executing the `./mayanli_down.sh` shell script.
- Change to `root` with `sudo su` and copy your `.tar.gz` backup file into `/var/lib/docker/volumes/mayanli_app/_data`.
  Note that although the official docker backup guide uses the term "media folder", there is no explicit `media` folder in the docker installation. The `/opt/mayan-edms/media` folder known from the direct deployment is equivalent to the `/var/lib/mayan` folder inside the Mayan EDMS docker container, and `docker-compose` mounts this internal container path to `/var/lib/docker/volumes/mayanli_app/_data` on the docker host machine.
- The Mayan documentation simply states that the content of the old `media` folder is backed up and restored into the new one using `tar`. This is not quite true, though, as the `config.yml` and probably the `system/SECRET_KEY` files that come with the new installation have to be kept instead of the files from the old installation's `media` folder. Therefore, in `/var/lib/docker/volumes/mayanli_app/_data`, rename:
  - `config.yml` and `config_backup.yml` to `config.yml.keep` and `config_backup.yml.keep`
  - `system/SECRET_KEY` to `system/SECRET_KEY.keep` (not sure about this one, but let's see if it runs smoothly)
  - the `whoosh` directory to `whoosh.keep` (again not sure about this one, but let's see if it works)
- Now, still as `root`, clean out everything that is not one of the kept files and directories in `/var/lib/docker/volumes/mayanli_app/_data`. You will have to use the `rm -rf` command manually.
- Still as `root` on the host machine and inside the same directory, untar your media archive:

  ```shell
  root@x220:/var/lib/docker/volumes/mayanli_app/_data# tar --strip-components=3 -xzf mayanli_media_20220906.tar.gz
  ```

  The `--strip-components` flag is needed because tar would otherwise carry over the full path from the source system, which was `/opt/mayan-edms/media`. We need to strip those path levels from each extracted entry because we are already inside the directory that is equivalent to the `media` directory of the direct deployment installation.
- Still as `root`, rename all files and directories that have a `.keep` duplicate to `<name>.old` and then remove the `.keep` extensions. Finally, change the ownership of the whole data directory content to match the `_data` level: as `tar` has to be run with `root` privileges, it creates all new content with `root:root` ownership, while the original docker installation expects (in my case) `ilek:ilek` ownership.
```shell
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config.yml config.yml.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config.yml.keep config.yml
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config_backup.yml config_backup.yml.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv config_backup.yml.keep config_backup.yml
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv whoosh whoosh.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data# mv whoosh.keep/ whoosh
root@x220:/var/lib/docker/volumes/mayanli_app/_data# cd system/
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# ls
SECRET_KEY  SECRET_KEY.keep  VERSION
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# mv SECRET_KEY SECRET_KEY.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# mv SECRET_KEY.keep SECRET_KEY
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# mv VERSION VERSION.old
root@x220:/var/lib/docker/volumes/mayanli_app/_data/system# cd ../..
root@x220:/var/lib/docker/volumes/mayanli_app# chown -R ilek:ilek _data/
```
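The effect of `--strip-components=3` used above can be demonstrated with a throwaway archive (hypothetical `/tmp` paths, purely for illustration):

```shell
# Build a tiny archive that mimics the backup's internal path structure
mkdir -p /tmp/stripdemo/opt/mayan-edms/media
echo "sample" > /tmp/stripdemo/opt/mayan-edms/media/file.txt
tar -czf /tmp/stripdemo/media.tar.gz -C /tmp/stripdemo opt/mayan-edms/media

# Extracting with --strip-components=3 drops opt/mayan-edms/media from
# each member name, so the files land directly in the target directory
mkdir -p /tmp/stripdemo/extract
tar --strip-components=3 -xzf /tmp/stripdemo/media.tar.gz -C /tmp/stripdemo/extract
ls /tmp/stripdemo/extract
```

Without the flag, the extraction would recreate the full `opt/mayan-edms/media/` tree under the current directory instead of placing the files at the top level.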
5. The Mayan Search Backend
The abysmal performance of the Mayan EDMS search backend is a never-ending story, and there is enough drama to fill a separate blog post. As the Docker installation ships with the new Whoosh search backend, my hope was (and still is) that this would change things. Unfortunately, so far it has not; according to my preliminary (!) post-migration testing, the performance is even worse.
While the DjangoSearchBackend was at least capable of finding documents based on their UUID, the new Whoosh search fails even at such basic queries.
For the time being, my recommendation is to switch back to the previous DjangoSearchBackend, so that at least a reliable search by UUID can be performed. To achieve that, go to System -> Settings, click on the Settings button for the identifier Search, click on Edit for SEARCH_BACKEND, and replace the value `mayan.apps.dynamic_search.backends.whoosh.WhooshSearchBackend` with `mayan.apps.dynamic_search.backends.django.DjangoSearchBackend`.
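For the dockerized setup, the same switch can presumably also be pinned in the `.env` file instead of the web UI. This rests on the assumption that Mayan picks up settings from `MAYAN_`-prefixed environment variables, so double-check against the settings documentation before relying on it:

```shell
# Hypothetical .env entry selecting the Django search backend at container start
MAYAN_SEARCH_BACKEND=mayan.apps.dynamic_search.backends.django.DjangoSearchBackend
```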