{"id":1587,"date":"2022-09-08T17:36:28","date_gmt":"2022-09-08T15:36:28","guid":{"rendered":"https:\/\/hobbykeller.spdns.de\/?p=1587"},"modified":"2022-09-10T00:52:12","modified_gmt":"2022-09-09T22:52:12","slug":"moving-mayan-edms-from-direct-deployment-to-docker","status":"publish","type":"post","link":"https:\/\/hobbykeller.spdns.de\/?p=1587","title":{"rendered":"Moving Mayan EDMS from Direct Deployment to Docker"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">1. Present situation<\/h2>\n\n\n\n<p>I currently have 4 different Mayan installations running: 1 on a physical Ubuntu 22.04 LTS server and 3 others on Ubuntu 20.04 servers, each of the 3 servers being a virtual machine. Each of these installation is for different client projects (or for my own documents administration).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.1 Downside of running all client project under a single Mayan EDMS instance with ACLs<\/h3>\n\n\n\n<p>Theoretically, Mayan has a very granular access control model that would allow to put everything into one single Mayan instance as I have shown a <a href=\"https:\/\/hobbykeller.spdns.de\/?p=1511\" data-type=\"post\" data-id=\"1511\">previous post<\/a> on this blog. In practice, though, this approach is quite <strong>cumbersome<\/strong> as it would involve heavy ACL administration issues with each document uploaded. <\/p>\n\n\n\n<p>On top of that clients often insist that their documents must be kept in a completely separated environment for reasons of <strong>confidentiality<\/strong>. 
They do indeed have a point when they insist that there should be more than a simple access control list entry that prevents an outsider from getting access to their documents.<\/p>\n\n\n\n<p>Furthermore, running one single Mayan instance for multiple clients would mean that <strong>backup<\/strong> jobs could only run under a &#8220;one size fits all&#8221; setup, which would effectively result in backing up the entire system at the highest of all frequencies required by customers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.2 Downside of running separate Mayan installations<\/h3>\n\n\n\n<p>Given the downsides of the single instance approach in 1.1, I opted for putting each project into its own separate Mayan EDMS installation. <\/p>\n\n\n\n<p>While that spared me the ACL administration work, the customer doubts about confidentiality, and the backup issues, the solution comes with its own drawbacks: <\/p>\n\n\n\n<p>Each machine (physical or virtual) hosting the server must be updated separately. Whenever there are security upgrades from Ubuntu or new releases from Mayan, the <strong>work multiplies<\/strong> as compared to the single Mayan instance.<\/p>\n\n\n\n<p>The solution therefore has the typical drawbacks in terms of <strong>redundancy<\/strong> &#8211; both with regard to hours spent on <strong>system maintenance<\/strong> and to <strong>resource consumption<\/strong>.<\/p>\n\n\n\n<p>Additionally, a classical <strong>dependency problem<\/strong> popped up: I first ran into problems because the new Ubuntu 22.04 LTS Server ships with Python 3.10 as the default version, while the direct deployment of Mayan until recently expected Python 3.8, so I had to trick around with the deadsnakes repository to get a Python 3.8 that I could put into the virtual environment. 
Now, I have two Ubuntu 20.04 LTS instances with default Python 3.8, but since Mayan 4.3, Python 3.10 is required.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.3 Hoping to get the best of both worlds with Docker<\/h3>\n\n\n\n<p>So instead of running multiple servers with multiple Mayan installations, fiddling around with dependencies etc., I will give Docker a try. Ideally, there will be one virtual machine that hosts several docker-compose Mayan clusters, one per Mayan installation. But before I will be able to try out whether this allows for less tedious maintenance, I will have some migration work to do. Allons-y!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. Backing up the existing Mayan Server<\/h2>\n\n\n\n<p>According to the Mayan EDMS documentation, there are two components to back up:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The content of the PostgreSQL database<\/li><li>The <code>\/opt\/mayan-edms\/media<\/code> directory which holds all the files stuffed into the document management system<\/li><\/ul>\n\n\n\n<p>Prior to running the backup operations, we stop Mayan: <code>sudo service supervisor stop<\/code><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.1 Backup PostgreSQL<\/h3>\n\n\n\n<p>We will follow the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.postgresql.org\/docs\/12\/app-pg-dumpall.html\" target=\"_blank\">PostgreSQL documentation<\/a> to dump the mayan database. Initially, I thought it would be safer to dump the whole PostgreSQL server with all users, roles and passwords with <code>pg_dumpall<\/code> instead of <code>pg_dump<\/code>. <\/p>\n\n\n\n<p>It turned out the opposite is true: The PostgreSQL database which is set up by Docker Compose only has a single user <code>mayan<\/code>, which is both superuser and owner of the <code>mayan<\/code> database. Therefore, trying to restore a complete server with the default superuser <code>postgres<\/code> will only cause errors during the restore operation. 
Go for the simple dump of the <code>mayan<\/code> database only and everything will be fine:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"font:ubuntu-mono font-size:14 lang:sh decode:true \" title=\"Dumping the Mayan PostgreSQL database\">ilek@mayan2:~$ pg_dump -U mayan -d mayan -f mayanli_dump_20220906.backup\n<\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2.2 Backup media folder<\/h3>\n\n\n\n<p>By default, the Mayan EDMS <code>media<\/code> folder, which holds all the files checked into the system in their original format (not under their original names, though, but labelled by UUIDs), resides in <code>\/opt\/mayan-edms\/media<\/code>.<\/p>\n\n\n\n<p>After a quick check to make sure that there is still enough space on our hard drive left&#8230;<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"font:ubuntu-mono lang:sh decode:true \" title=\"Check space on source machine's drive\">ilek@mayan2:\/opt\/mayan-edms$ df -h\nFilesystem      Size  Used Avail Use% Mounted on\n[...output shortened...]\n\/dev\/sda2        41G   14G   26G  35% \/\n[...output shortened...]\ntmpfs           394M     0  394M   0% \/run\/user\/1000\n<\/pre><\/div>\n\n\n\n<p>&#8230; we proceed with the command suggested by the <a rel=\"noreferrer noopener\" href=\"https:\/\/docs.mayan-edms.com\/chapters\/backups.html\" target=\"_blank\" class=\"broken_link\">Mayan EDMS online documentation<\/a> (we leave out the <code>v<\/code> flag originally given for the <code>tar<\/code> command as this would result in an endless list of files churning through the terminal window and would slow down the backup):<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"font:ubuntu-mono font-size:14 lang:default decode:true \" title=\"Backing up and compressing the \/opt\/mayan-edms\/media folder\">ilek@mayan2:~$ sudo tar -zcf mayanli_media_20220906.tar.gz \/opt\/mayan-edms\/media\/\ntar: 
Removing leading `\/' from member names\n<\/pre><\/div>\n\n\n\n<p>Finally, make sure that our backup files have really been created on the source machine:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"font:ubuntu-mono font-size:14 lang:default decode:true \">ilek@mayan2:~$ ls -alh | grep mayan\n-rw-rw-r-- 1 ilek ilek  45M Sep  6 14:19 mayanli_dump_20220906.backup\n-rw-r--r-- 1 root root 1.6G Sep  6 15:14 mayanli_media_20220906.tar.gz\nilek@mayan2:~$ \n<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">3. Preparing the new Mayan EDMS instance<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">3.1 Prepare working directory structure<\/h3>\n\n\n\n<p>As I would like to have all non-container data relating to Docker in a single place, I first created a <code>docker<\/code> directory directly under my <code>home<\/code> directory.<\/p>\n\n\n\n<p>Inside the <code>docker<\/code> directory I created a directory <code>mayanli<\/code> that should hold all external data for a single Mayan instance:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>the shell script that holds all the commands to spin up the mayan-edms containers<\/li><li>the <code>docker-compose.yml<\/code> and the <code>.env<\/code> file that we download in <a rel=\"noreferrer noopener\" href=\"https:\/\/docs.mayan-edms.com\/chapters\/docker\/install_docker_compose.html#docker-compose-install\" target=\"_blank\">step 3 of the Mayan instructions<\/a>.<\/li><li>the directories which will be needed for data backup and restore &#8211; in particular the <code>media<\/code> and the <code>postgres<\/code> volumes of our dockerized Mayan installation.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3.2 Prepare docker-compose files<\/h3>\n\n\n\n<p>Step 3 of the Mayan EDMS instructions downloads the files needed by Docker to create all necessary containers and their surrounding infrastructure (networks, volumes). 
Download the <code>docker-compose.yml<\/code> and the <code>.env<\/code> file into the <code>mayanli<\/code> directory.<\/p>\n\n\n\n<p>Edit the <code>.env<\/code> file as follows:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"font:ubuntu-mono font-size:14 lang:default decode:true \" title=\"Preparing the .env file for docker-compose\" >[...]\nCOMPOSE_PROJECT_NAME=mayanli\n[...]\nMAYAN_APT_INSTALLS=\"tesseract-ocr-deu tesseract-ocr-fra\"\n[...]\nMAYAN_DATABASE_NAME=mayan\nMAYAN_DATABASE_PASSWORD=yourPostgreSQLUserPassword\nMAYAN_DATABASE_USER=mayan\n[...]<\/pre><\/div>\n\n\n\n<div class=\"wp-block-simple-alerts-for-gutenberg-alert-boxes sab-alert sab-alert-primary\" role=\"alert\">I still have not found out how to set a custom path for the media and the postgres directory. Setting <code>MAYAN_APP_VOLUME<\/code> and <code>MAYAN_POSTGRES_VOLUME<\/code> in the <code>.env<\/code> file did not work for me (while the custom <code>postgres<\/code> directory is populated during the install, the custom <code>media<\/code> directory remains completely empty). Furthermore, two additional volumes are still created under the default <code>\/var\/lib\/docker\/volumes<\/code> path.<button type=\"button\" class=\"close\" data-dismiss=\"alert\" aria-label=\"Close\"><span aria-hidden=\"true\">\u00d7<\/span><\/button><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">4. Restoring our data<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">4.1 PostgreSQL restore<\/h3>\n\n\n\n<p>We start with the trickier part of the operation: restoring the PostgreSQL data. We need the PostgreSQL container running to pick up the dump. 
<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>To have a running PostgreSQL installation we launch the docker cluster:<br><code>ilek@x220:~\/docker\/mayanli$ .\/mayanli-up.sh<\/code><br><strong>As docker creates the skeleton infrastructure with lots of disk and database operations, it is recommended to wait a minute before proceeding to the next step.<\/strong> Check the <code>mayanli_app_1<\/code> docker log to see if the process has finished.<\/li><li>As we only work on the PostgreSQL container, we immediately stop all other containers to prevent interference while restoring our database snapshot:<br><code>ilek@x220:~\/docker\/mayanli$ docker stop mayanli_app_1<\/code><br><code>ilek@x220:~\/docker\/mayanli$ docker stop mayanli_redis_1<\/code><br><code>ilek@x220:~\/docker\/mayanli$ docker stop mayanli_rabbitmq_1<\/code><\/li><li>From the host machine, find out the IP address of the PostgreSQL container by issuing:<br><code>docker inspect mayanli_postgresql_1<\/code><br>In my case the output shows among other things: <code>\"IPAddress\": \"172.30.0.4\"<\/code><\/li><li>The PostgreSQL server installed by Docker has only a single user, which is the superuser at the same time: <code>mayan<\/code>. The <code>pg_hba.conf<\/code> file by default has been equipped with an entry to accept outside connections that are identified by an md5-hashed password. We can therefore connect from the host machine with <code>psql<\/code>:<br><code>ilek@x220:~\/docker\/mayanli$ psql -U mayan -d postgres -h 172.30.0.4<\/code><\/li><li>In order to prevent errors related to unique keys and similar things, we want to have a completely empty mayan database. Unfortunately, upon spinning up our mayanli cluster, some mayan database content has also been created that would conflict with our restore content. 
We therefore drop the mayan database&#8230;<br><code>postgres# DROP DATABASE mayan;<\/code><br>&#8230; recreate an empty database:<br><code>postgres# CREATE DATABASE mayan;<\/code><br>and quit with <code>\\q<\/code> and <code>exit<\/code> the container.<\/li><li>Now we are ready for the main part: pushing our restore into the database. From the host machine, issue the following command:<br><code>ilek@x220:~\/docker\/mayanli$ psql -U mayan -h 172.30.0.4 -d mayan &lt; mayanli_dump_20220906.backup<\/code><\/li><li>Restart the stopped containers of our Mayan cluster:<br><code>ilek@x220:~\/docker\/mayanli$ docker start mayanli_rabbitmq_1<\/code><br><code>ilek@x220:~\/docker\/mayanli$ docker start mayanli_redis_1<\/code><br><code>ilek@x220:~\/docker\/mayanli$ docker start mayanli_app_1<\/code><\/li><li>Your browser should now respond to <code>http:\/\/localhost:80<\/code> with the Mayan login screen. Depending on your machine's resources, it might take a minute until all services have started and you get a neat Mayan EDMS login screen; you can issue a <code>docker logs mayanli_app_1<\/code> to check in the logs whether Docker has completed the startup &#8211; the final startup messages should be from <code>celery<\/code>.<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">4.2 <code>media<\/code> folder<\/h3>\n\n\n\n<ol class=\"wp-block-list\"><li>If still running, shut down the Mayan docker cluster by executing the <code>.\/mayanli_down.sh<\/code> shell script.<\/li><li>Change to <code>root<\/code> with <code>sudo su<\/code> and copy your <code>.tar.gz<\/code> backup file into <code>\/var\/lib\/docker\/volumes\/mayanli_app\/_data<\/code>. <br>Note that although the <a rel=\"noreferrer noopener\" href=\"https:\/\/docs.mayan-edms.com\/chapters\/backups.html\" target=\"_blank\" class=\"broken_link\">official docker backup guide<\/a> uses the term &#8220;<code>media<\/code> folder&#8221;, there is no explicit <code>media<\/code> folder in the docker installation. 
The <code>\/opt\/mayan-edms\/media<\/code> folder known from the direct deployment is equivalent to the <code>\/var\/lib\/mayan<\/code> folder inside the Mayan EDMS docker container, and <code>docker-compose<\/code> mounts this internal container path to <code>\/var\/lib\/docker\/volumes\/mayanli_app\/_data<\/code> on the docker host machine.<\/li><li>The Mayan documentation simply suggests backing up and restoring the content of the old \/ new <code>media<\/code> folder using <code>tar<\/code>. This is not the whole story, though, as the <code>config.yml<\/code> and probably the <code>system\/SECRET_KEY<\/code> files that come with the new installation have to be kept instead of the files in the old installation&#8217;s <code>media<\/code> folder. Therefore proceed as follows:<ul><li>In <code>\/var\/lib\/docker\/volumes\/mayanli_app\/_data<\/code> rename&#8230;<\/li><li><code>config.yml<\/code> and <code>config_backup.yml<\/code> to <code>config.yml.keep<\/code> and <code>config_backup.yml.keep<\/code><\/li><li><code>system\/SECRET_KEY<\/code> to <code>system\/SECRET_KEY.keep<\/code> (not sure about this one, but let&#8217;s see if it runs smoothly)<\/li><li>the <code>whoosh<\/code> directory to <code>whoosh.keep<\/code> (again not sure about this one, but let&#8217;s see if it works)<\/li><\/ul><\/li><li>Now, still as <code>root<\/code>, clean out everything that&#8217;s not one of the kept files and directories in <code>\/var\/lib\/docker\/volumes\/mayanli_app\/_data<\/code>. 
You will have to use the <code>rm -rf<\/code> command manually.<\/li><li>Still as <code>root<\/code> on the host machine and inside the same directory, untar your media archive: <br><span class=\"crayon-inline font:ubuntu-mono font-size:14 lang:default decode:true\">root@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# tar --strip-components=3 -xzf mayanli_media_20220906.tar.gz<\/span><br>(The <code>--strip-components<\/code> flag is needed because tar would otherwise carry over the full path from the source system, which was <code>\/opt\/mayan-edms\/media<\/code>. We need to strip those path levels from each extracted path because we are already inside the directory which is equivalent to the <code>media<\/code> directory in the direct deployment installation.)<\/li><li>Still as root, rename all files \/ directories that have a keep duplicate to <code>&lt;name&gt;.old<\/code> and then remove the keep extensions. Finally, change the ownership of the whole data directory content to what it is at the <code>_data<\/code> level: as <code>tar<\/code> has to be run under <code>root<\/code> privileges, it creates all new content with <code>root:root<\/code> ownership, while the original docker installation expects (in my case) <code>ilek:ilek<\/code> as ownership.<\/li><\/ol>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"font:ubuntu-mono font-size:14 lang:sh decode:true \">root@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# mv config.yml config.yml.old\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# mv config.yml.keep config.yml\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# mv config_backup.yml config_backup.yml.old\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# mv config_backup.yml.keep config_backup.yml\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# mv whoosh whoosh.old\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# mv whoosh.keep\/ 
whoosh\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data# cd system\/\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data\/system# ls\nSECRET_KEY  SECRET_KEY.keep  VERSION\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data\/system# mv SECRET_KEY SECRET_KEY.old\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data\/system# mv SECRET_KEY.keep SECRET_KEY\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data\/system# mv VERSION VERSION.old\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app\/_data\/system# cd ..\/..\nroot@x220:\/var\/lib\/docker\/volumes\/mayanli_app# chown -R ilek:ilek _data\/\n\n\n<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">5. The Mayan Search Backend<\/h2>\n\n\n\n<p>The abysmal performance of the Mayan EDMS search backend is a never-ending story and there is enough drama to fill a separate blog. As the Docker installation ships with the new Whoosh search backend, my hope was (and still is) that this would change things. Unfortunately, so far it has not, and according to my preliminary (!) post-migration testing, the performance is even worse.<\/p>\n\n\n\n<p>While the <code>DjangoSearchBackend<\/code> was at least capable of finding documents based on their UUID, the new Whoosh search even fails at such basic queries.<\/p>\n\n\n\n<p>For the time being, my recommendation is to switch back to the previous <code>DjangoSearchBackend<\/code>, so that at least a reliable search by UUID can be performed. To achieve that, go to <code>System<\/code> -&gt; <code>Settings<\/code>, click on the <code>Settings<\/code> button for identifier <code>Search<\/code>, click on <code>Edit<\/code> for <code>SEARCH_BACKEND<\/code> and replace the value <code>mayan.apps.dynamic_search.backends.whoosh.WhooshSearchBackend<\/code> with <code>mayan.apps.dynamic_search.backends.django.DjangoSearchBackend<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Present situation I currently have 4 different Mayan installations running: 1 on a physical Ubuntu 22.04 LTS server and 3 others on Ubuntu 20.04 servers, each of the 3<span class=\"more-button\"><a href=\"https:\/\/hobbykeller.spdns.de\/?p=1587\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\">Moving Mayan EDMS from Direct Deployment to Docker<\/span><\/a><\/span><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1587","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=\/wp\/v2\/posts\/1587","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1587"}],"version-history":[{"count":22,"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=\/wp\/v2\/posts\/1587\/revisions"}],"predecessor-version":[{"id":1620,"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=\/wp\/v2\/posts\/1587\/revisions\/1620"}],"wp:attachment":[{"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1587"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1587"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hobbykeller.spdns.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1587"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","tem
plated":true}]}}