I wrote before about why I decided to put an application into a Docker container. Today I’m going to cover a bit more of the how. I originally wrote the application fast and dirty. It was my first serious use of Python and my first distributed system application. Like many coders, I used what I knew rather than losing time learning lots of new things at the same time.
The application is just a few dozen lines of Python code, although at times it has imported a bunch of libraries. It reads some configuration text files from one folder and then creates an email, attaching a document from another folder to the email. The folders are mounted on each workload server from an NFS server, so the same files are available on multiple workload machines. Progress is logged to a file and sent to Syslog on each workload server. The workload servers then forward log messages to the master machine. I needed the Docker container version to work the same way.
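To make the workload concrete, here is a minimal sketch of the "read some text files, build an email, attach a document" step. It is not the original script: it is written for Python 3, and the file names (from.txt, to.txt, subject.txt, body.txt) and the function name are assumptions made up for illustration.

```python
from email.message import EmailMessage
from pathlib import Path
import mimetypes


def build_message(config_dir, attachment_path):
    """Build an email from text files in config_dir plus one attachment."""
    config_dir = Path(config_dir)
    attachment_path = Path(attachment_path)

    msg = EmailMessage()
    # Hypothetical layout: one value per text file, whitespace stripped.
    msg["From"] = (config_dir / "from.txt").read_text().strip()
    msg["To"] = (config_dir / "to.txt").read_text().strip()
    msg["Subject"] = (config_dir / "subject.txt").read_text().strip()
    msg.set_content((config_dir / "body.txt").read_text())

    # Guess the attachment's MIME type, falling back to a generic binary type.
    ctype, _ = mimetypes.guess_type(attachment_path.name)
    maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
    msg.add_attachment(
        attachment_path.read_bytes(),
        maintype=maintype,
        subtype=subtype,
        filename=attachment_path.name,
    )
    return msg
```

The returned message could then be handed to a mail server with something like smtplib.SMTP("localhost").send_message(msg), assuming an SMTP server is listening on that host.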
A Docker container is launched from a Docker image. A Docker image is built from a Dockerfile. The Dockerfile is essentially a set of build instructions that create a file system for the Docker image. Since my workload machines are Ubuntu, I use the latest Ubuntu image from the Docker Hub as the basis of my container application. Then I have the Dockerfile run Ubuntu commands to install applications. In the end, this was just installing Python. Then the Dockerfile copies in the application’s Python script. Now the image has the full file system required to run the workload. Finally, the Dockerfile sets the script to run when the container is launched. Here is what my Dockerfile looks like:
FROM ubuntu:latest
RUN apt-get update
RUN apt-get -y install python
COPY workload-email.py workload-email.py
CMD python workload-email.py
The image file system doesn’t include the NFS mounts and I didn’t want to install an NFS client in the container. So I needed to pass the folders into the container when it is launched. Docker has a feature called volumes for passing persistent data into a container. I started by sending the two folders in as volumes and could run the original, unmodified script. I found that my logging was not working, so I needed to add the log folder as another volume. The container did not have Syslog, and I wasn’t very happy with volumes as a way of accessing shared data. It was simply not the modern way to pull data into an application. It also required that every worker machine have these NFS mounts. It wasn’t a huge change to use HTTP instead of NFS.
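As a rough illustration of that first volume-based approach, the sketch below assembles a docker run command line with one -v bind mount per folder. The host paths, container paths, and image name are hypothetical; only docker run and its -v host:container option are real Docker syntax.

```python
def docker_run_command(image, volumes):
    """Return the argv for `docker run` with one -v bind mount per entry.

    volumes maps a host path to the path it should appear at in the container.
    """
    argv = ["docker", "run"]
    for host_path, container_path in volumes.items():
        argv += ["-v", f"{host_path}:{container_path}"]
    argv.append(image)
    return argv


# Hypothetical paths and image name, for illustration only.
command = docker_run_command(
    "workload-email",
    {
        "/mnt/config": "/mnt/config",
        "/mnt/attachments": "/mnt/attachments",
        "/var/log/workload": "/var/log/workload",
    },
)
print(" ".join(command))
```

Mounting each folder at the same path inside the container is what lets an unmodified script keep using its existing paths.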
Moving the shared folders into the default web site on the master was a couple of lines of Ansible in its build playbook. Then the Python script could get the configuration files and the attachment file over HTTP. Using HTTP from the container also allowed me to simplify the workload Ubuntu build: there was no need to install NFS and set up mounts on the workload machines. I could also remove the NFS exports on the master machine and move the whole build source under /usr/local, which is a little more usual as an application location.
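Pulling a file from the master over HTTP can be sketched with nothing but the Python 3 standard library. This is an assumption-laden example rather than the original code: the function name, the base URL, and the file name in the usage note are all hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(base_url, name, dest_dir):
    """Download base_url/name into dest_dir and return the local path."""
    url = base_url.rstrip("/") + "/" + name
    dest = Path(dest_dir) / name
    # Stream the response to disk rather than holding it all in memory.
    with urllib.request.urlopen(url, timeout=30) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

A call might look like fetch("http://master/config", "subject.txt", "/tmp"), where the host name and paths are placeholders for wherever the web site on the master serves the files.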
Logging from containers is fairly flexible. I wanted to get rid of the container volume for logging and to use Syslog to centralize the logs on the master machine. Docker has a built-in logging system where the container console can be redirected to a log provider. I changed the script’s logging from writing into a file to simply writing to the console. Then I told Docker to send these messages to Syslog on the master and tag the log entries with some text to make Syslog configuration simpler. On the master machine, Syslog writes the tagged entries to a separate log file.
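Both halves of that change can be sketched briefly. The function below points Python's logging module at the console instead of a file, and the list shows the kind of docker run options that select Docker's syslog logging driver. The logger name, the master host name, the port, and the tag are assumptions; the --log-driver and --log-opt options themselves are standard Docker flags.

```python
import logging
import sys


def configure_logging(stream=None):
    """Send log records to the console so Docker's log driver can collect them."""
    logger = logging.getLogger("workload-email")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Hypothetical host name, port, and tag; passed to `docker run` before the image name.
DOCKER_LOG_ARGS = [
    "--log-driver", "syslog",
    "--log-opt", "syslog-address=udp://master:514",
    "--log-opt", "tag=workload-email",
]
```

With this split, the script knows nothing about Syslog. Where the messages end up is decided entirely by how the container is launched.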
The objectives here were to minimize the configuration on each workload host, and to centralize configuration on the master machine. Using HTTP allowed me to remove all the NFS configuration. Using Docker Syslog also allowed me to remove the custom Syslog configuration from the workload servers.
I did find a bit of a wrinkle. Each container has its own writable file system, since the image is shared and read-only. Initially, I wasn’t cleaning up the downloaded attachment files. This writable file system was growing fairly fast, and it only took a few hours to fill up the 40GB partition. A simple fix was to delete the downloaded file after the message was sent. Now the file system barely grows over a day’s run.
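One way to express that cleanup is a small wrapper with a try/finally, sketched below. The function name and the send callback are hypothetical. Note that this version deletes the file even when sending fails, which is a slightly stronger choice than deleting only after a successful send.

```python
import os


def send_and_clean_up(path, send):
    """Call send(path), then delete the downloaded file even if send fails."""
    try:
        return send(path)
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up
```

Deleting in the finally block keeps a run of failed sends from quietly filling the container's writable file system.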
For testing, I have been using a four-node Intel NUC VSAN cluster, each node with 16GB of RAM. I test with the master and the ten worker VMs kept small (1GB RAM and 1 vCPU). Postfix still delivers around 20K email messages per hour. I suspect that if I gave the master 10GB of RAM and 2 vCPUs I would get a few more emails, but that storage performance is really the limiting factor.
© 2017, Alastair. All rights reserved.