Granite in Containers

In my last post, I outlined the Granite demonstration application and how I use it to create a production-like data centre workload across a group of VMs. For a new project, I am migrating that workload to containers, Kubernetes, and the public cloud. Rather than pushing straight from on-premises VMs to cloud-based Kubernetes, I am taking a phased approach, which keeps the changes manageable and shows which design considerations belong to each technology: the VM-to-container decisions are separate from the Kubernetes and public-cloud Kubernetes decisions. This post covers the migration activities and challenges in moving the various elements of Granite from VMs to containers.

Granite Containers

The Granite application consists of a database, a web interface, and a series of worker tasks. Each element needs to be packaged into a container image for deployment. Considering that containers are a way of packaging software and that Kubernetes is a way to run a population of containers, this should be a simple packaging job. However, I made design decisions for building Granite in VMs that will need revisiting as the platform changes. Data persistence and networking are the two places where this repackaging can get complicated. The first build of Granite in containers will use only a single container host, simplifying networking and storage — for the moment.

Networking and name resolution

One of the tricks I use in Granite is to put the database server name in each VM’s /etc/hosts file, so the web and worker VMs don’t need DNS to find the database, and I can even have multiple copies of Granite running in the same lab without isolated networks. I could have built DNS into Granite, but only the database server’s name needs resolution, so /etc/hosts entries are sufficient. The database container’s MySQL port is mapped to the (single) container host machine’s IP address, and the other containers use this IP address to reach the database. Container images don’t carry those custom /etc/hosts entries, but the command-line option “--add-host hostname:ip” provides a simple way to create the one entry the worker and web containers need. Speaking of web containers, in this single-host deployment I will run only one web server container and map its HTTP port to the container host machine’s IP address, too. A single container host won’t be an option on the public cloud, so I will need to solve these problems differently later.
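
As a rough sketch of how that looks with the Docker CLI (the image names, container names, and the 192.168.10.20 host address are placeholders, not my actual lab values), the database port is published on the container host and the other containers get a single hosts entry pointing at it:

    # Publish the MySQL port on the container host so other containers
    # (and other copies of Granite) can reach the database.
    docker run -d --name granite-db -p 3306:3306 granite/db

    # Web and worker containers get one hosts entry for the database,
    # replacing the /etc/hosts line the VMs carried.
    docker run -d --name granite-web -p 80:80 \
      --add-host granite-db:192.168.10.20 granite/web
    docker run --rm --add-host granite-db:192.168.10.20 granite/worker-invoice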

Repackage plus persistence

The Granite application has persistent data in the MySQL database and a file system that stores invoice PDF documents. Happily, containers can have persistent storage mapped into their non-persistent file systems using container volumes. The result is that data can persist through updates to container images. Again, the single container host design allows me to use the host’s file system for these volumes, and just like networking, I will need to use different options on the public cloud.
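
A minimal sketch of the mechanism, with placeholder names and paths: a directory on the container host is mapped into the container’s otherwise ephemeral file system, so the data outlives the container and any image updates.

    # Host directory /srv/granite/data appears inside the container at /data
    # and persists after the container is replaced.
    docker run -d --name granite-example -v /srv/granite/data:/data granite/example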

The Docker Hub image for MySQL is well-documented, with guidance for attaching a volume to allow data persistence, environment variables for authentication settings, and the ability to include SQL scripts for first-run configuration. I already had a set of Python commands to create and populate the default database for Granite, so reworking these as SQL commands wasn’t a big issue. The resulting SQL command file is injected into the container image at build time, although it could also be stored in a volume to reduce the image size.
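
Roughly what that looks like, based on the documented behaviour of the official mysql image (the script name, credentials, and paths below are placeholders): the Dockerfile copies the first-run SQL script into the image, and the run command supplies the authentication environment variables and the data volume.

    # Dockerfile: bake the first-run SQL script into the image. Scripts in
    # /docker-entrypoint-initdb.d run only when the database is first initialised.
    FROM mysql:8.0
    COPY granite-init.sql /docker-entrypoint-initdb.d/

    # Build, then run with authentication settings from environment variables
    # and the data directory on the container host so it survives image updates.
    docker build -t granite/db .
    docker run -d --name granite-db \
      -e MYSQL_ROOT_PASSWORD=change-me \
      -e MYSQL_DATABASE=granite \
      -v /srv/granite/mysql:/var/lib/mysql \
      -p 3306:3306 granite/db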

The file system for invoice PDFs was a little different. In the VM deployment, a worker script copied newly created PDF files to each web server; it had to be configured with every operational web server and didn’t copy existing PDFs to a newly deployed web server. For the single-host container deployment, I mount a single volume into both the invoice worker container and the web server container. Mounting the same volume into multiple containers gives them a shared filesystem, so I only store one copy of the PDFs and new web servers see all existing PDFs automatically.
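
In the single-host sketch these are just extra -v flags on the worker and web containers’ run commands (paths are placeholders): the same host directory is mounted read-write into the invoice worker that creates the PDFs and read-only into the web server that serves them.

    # Invoice worker writes new PDFs into the shared host directory.
    docker run --rm -v /srv/granite/pdfs:/var/granite/pdfs granite/worker-invoice

    # Web server mounts the same directory read-only to serve the PDFs.
    docker run -d --name granite-web \
      -v /srv/granite/pdfs:/var/www/granite/pdfs:ro \
      -p 80:80 granite/web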

Simple Repackage

I repackaged the five task scripts into their own Docker images. Most worker tasks only need access to the database; only the invoice worker needs persistent file storage. Starting with the generic Python image from Docker Hub, each Dockerfile simply installs the pip package dependencies and copies in the task script. I didn’t include the scheduler in the images; these containers execute their task script and exit. I will need an external scheduler to replace the scheduler Python script that ran on each VM.
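
A minimal sketch of one of those Dockerfiles, assuming a requirements.txt alongside the task script (the file names are placeholders for my actual scripts):

    # Start from the generic Python image on Docker Hub.
    FROM python:3.12-slim
    WORKDIR /app

    # Install the pip dependencies, then copy in the task script.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY invoice_worker.py .

    # Run the task once and exit; scheduling happens outside the container.
    CMD ["python", "invoice_worker.py"]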

Granite in containers

At this point, I have containers for each element of Granite and can manually launch containers to perform all its actions. The database and web server containers run continuously while the worker script containers execute and exit. Adding a scheduler to the worker script containers would allow the workload to operate as it did in VMs, at the cost of worker containers running continuously. Having many containers running all the time might be acceptable for an on-premises deployment, but Granite is heading for public cloud, so these idle containers aren’t desirable. The next phase will use Kubernetes to manage containers before moving the Kubernetes deployment to the cloud.
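
Until Kubernetes takes over the scheduling, one stop-gap is an external scheduler on the container host itself, for example a cron entry per task that launches the run-and-exit container on a timer (the schedule, address, and paths below are illustrative only):

    # /etc/cron.d/granite - run the invoice worker every ten minutes; the
    # container is removed when the task completes, so nothing sits idle.
    */10 * * * * root docker run --rm --add-host granite-db:192.168.10.20 -v /srv/granite/pdfs:/var/granite/pdfs granite/worker-invoice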
