Building and tagging container images in CI


Been thinking a lot recently about how to manage versioning and deployment using Docker for a small-scale containerised solution. It’s different from a traditional release pipeline, as the build artifacts are the container images with the latest code and configuration, instead of the CI server holding a zip of the built application.

In a completely ideal containerised microservice solution all containers are loosely coupled and can be tested and built independently. Their CI configuration can be kept independent as well, with the CI and testing setup for the entire orchestrated solution taking the latest safe versions of the containers and performing integration/smoke tests against test/staging environments.

If your solution is smaller scale and its containers are linked together, this is my proposed setup.

Build

Images should be built consistently, so dependencies should be resolved and fixed at the point of build. For node this is done with npm shrinkwrap, which generates a file pinning the npm install to specific dependency versions. This should be done as part of development each time package.json is updated, to ensure all developers, as well as the images, use the exact same versions of packages.
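As a rough sketch of that flow (assuming a standard node project):

# install/update dependencies after changing package.json
npm install
# generate npm-shrinkwrap.json, pinning the resolved dependency versions
npm shrinkwrap
# commit the shrinkwrap file so developers and image builds use the same versions
git add package.json npm-shrinkwrap.json
git commit -m "Update dependencies"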

On each commit to develop the image is built and tagged twice: once with “develop”, marking it as the latest version of the develop branch code, and once with the version number held in the git repo’s VERSION.md (“1.0.1”). You cannot currently build with multiple tags, but building images with the same content/instructions does not duplicate image storage, due to Docker image layers.
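The CI build step might look something like this (the image name and registry are placeholders):

# read the release version from VERSION.md (assuming it contains just the version number)
VERSION=$(cat VERSION.md)
# build the image twice with different tags; identical layers are shared so storage is not duplicated
docker build -t example/my-service:develop .
docker build -t example/my-service:$VERSION .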

Tagging

The “develop” tagged image is used as the latest current version of the image, deployed as part of automated builds to the Development environment; in the develop branch docker-compose.yml all referenced images use that tag.

The version number tagged image, “1.0.1”, is used as a historic fixed version for traceability, so for specific releases the tagged master docker-compose.yml will reference specific versioned images. This means we have a store of built versioned images which can be deployed on demand to re-create and trace issues.
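As a sketch, using a made-up image name, the develop and master compose files differ only in the tags they reference:

# develop branch docker-compose.yml
people-api:
  image: example/my-service:develop

# master branch docker-compose.yml for the 1.0.1 release
people-api:
  image: example/my-service:1.0.1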

On each significant release, the latest version image will be pushed to the image repository with the tag “latest” (corresponding to the code in the master branch).
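A significant release would then be pushed along these lines:

# tag the released version as latest and push both tags to the image repository
docker tag example/my-service:1.0.1 example/my-service:latest
docker push example/my-service:1.0.1
docker push example/my-service:latest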


Managing data store changes in containers


When creating microservices it’s common to keep their persistence stores loosely coupled so that changes to one service do not affect others. Each service should manage its own concerns, be the owner of retrieving/updating its data and define how and where it gets it from.

When using a relational database as a store there is an additional problem: each release may require schema/data updates, the well-known problem of database migration (schema migration/database change management).

There are a large number of tools for doing this: Flyway, Liquibase, DbUp. They allow you to define the schema/data for your service as a series of ordered migration scripts, which can be applied to your database regardless of its state, whether a fresh DB or an existing one with production data.

When your container service needs a relational database with a specific schema and you are performing continuous delivery, you will need to handle this problem. Traditionally this is handled separately from the service by CI, where a Jenkins/TeamCity task runs the database migration tool before the task that deploys the updated release code for the service. You will have similar problems with containers that require config changes to non-relational stores (redis/mongo etc.).

This is still possible in a containerised deployment, but has disadvantages. Your CI will need knowledge of, and a connection to, each container’s data store, and will have to run the migration task for each container with a store. As the number of containers increases this adds more and more complexity to your CI, which now needs to be aware of all their needs and release changes.

To prevent this, the responsibility for updating a persistence store should lie with the developers of the container itself, as part of the container’s definition, code and orchestration details. This allows the developers to define what their persistence store is and how it should be updated each release, leaving CI responsible only for deploying the latest version of the containers.

node_modules/.bin/knex migrate:latest --env development

As an example of this I created a simple People API node application and container, which has a dependency on a mysql database with people data. Using Knex for database migration, the source defines the scripts necessary to set up the database or upgrade it to the latest version. The Dockerfile startup command waits for the database to be available and then runs the migration before starting the Node application. The containers necessary for the solution and the dependency on mysql are defined and configured in the docker-compose.yml.
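The startup command is roughly equivalent to a script like this (a sketch; the hostname, environment name and entry file are assumptions rather than the actual source):

#!/bin/bash
# wait until the mysql container is accepting connections
until nc -z mysql 3306; do
  echo "Waiting for mysql..."
  sleep 1
done
# apply any outstanding Knex migrations, then start the Node application
node_modules/.bin/knex migrate:latest --env production
node server.js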

docker-compose up

For a web application example I created the People Web node application, which wraps the API and displays the results as HTML. It has a docker-compose.yml that spins up containers for mysql, node-people-api (using the latest image pushed to Docker Hub) and itself. As node-people-api manages its own store inside the container, node-people-web doesn’t need any knowledge of the migration scripts used to set up the mysql database.
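A minimal sketch of that docker-compose.yml (image names, credentials and ports are illustrative, not the actual published images):

mysql:
  image: mysql:5.6
  environment:
    MYSQL_ROOT_PASSWORD: password
node-people-api:
  image: example/node-people-api:latest
  links:
    - mysql
node-people-web:
  build: .
  links:
    - node-people-api
  ports:
    - "8080:8080"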


Infrastructure as code, containers as artifacts

As a developer, one of the things I love about containers is how fast they are to create, spin up and then destroy. For rapid testing of my code in production-like environments this is invaluable.

But containers offer another advantage: stability. Containers can be versioned, defined with static copies of the build artifacts for the custom components they will host and explicitly versioned dependencies (e.g. the nginx version). This allows for greater control in release management, knowing exactly what version you have on an environment of not just the custom code but also the infrastructure running it. Your containers become an artifact of your build process.
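For example, a container definition can pin both the infrastructure version and the build artifact it hosts (a sketch, not any specific project’s Dockerfile):

# explicitly versioned infrastructure dependency
FROM nginx:1.9.6
# static copy of the custom component produced by the build
COPY dist/ /usr/share/nginx/html/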

While managing versions of software has long been standard practice, this isn’t commonly extended to the use of infrastructure as code (environment creation/update by scripts and tools like Puppet). Environments are commonly moving targets: the separation of development and operations teams means software and environment releases are done independently, with environment dependencies and products being patched independently of functionality releases (security patching, version updates etc.). This can cause major regression issues which often can’t be anticipated until they hit pre-production (if you are lucky).

By using containerisation with versioning you can control the release of environmental changes with precise control, something that is very important when dealing with a complex distributed architecture. You can release and test changes to individual servers, then trace back issues to the changes introduced. The containers that make up your infrastructure become build artifacts, which can be identified and updated like any other.

Here’s a sequence diagram showing how this can be introduced into your build process:

[Sequence diagram: containers as artifacts]

At the end of this process you have a fixed release deployed into production, with traceable changes to both custom code and infrastructure. Following this pattern allows upfront testing of infrastructure changes (including developer level) and makes it very difficult to accidentally cause any differences between your test and production environments.

Using Docker to construct a Selenium Grid

When thinking about possible uses for containerisation, one which jumped out at me was web testing. This normally involves large numbers of big, slow and fragile VMs.

I started with making a simple PhantomJS docker container, which removes the necessity of installing PhantomJS on your machine (which can be annoyingly complex). The image is available on Docker Hub here.

The dream would be using containers to construct a large Selenium grid of different browsers and versions, taking advantage of the performance/reduced size benefits of containers to be able to run a bigger/faster grid than you normally would with VMs. As the containers would be disposable, you could easily start/stop Selenium nodes for browsers when needed (i.e. a set for quick smoke tests and one for more extensive overnight tests). Developers would be able to use the same set of containers and hub scripts, so wouldn’t have to mess around with VMs and scripts to run tests to validate and reproduce issues, making it easier to find issues early.

The main problem is that containerisation is currently limited to Linux containers, which means no testing on OS X or Windows, so no Safari or Internet Explorer (who needs to test that, right?). These holes can be filled with VMs spun up by Vagrant and added to the Selenium hub, but it’s not an ideal solution. Still, I believe this is a potential solution for small projects which can’t afford Browserstack or Saucelabs licences for testing.

Below are the Docker commands and Selenium tests to call various browsers:

# Run hub
docker run -d -p 4444:4444 --name selenium-hub selenium/hub:2.46.0

# Run browser containers
docker run -d -P --link selenium-hub:hub selenium/node-chrome:2.46.0
docker run -d -P --link selenium-hub:hub selenium/node-firefox:2.46.0
docker run -d --link selenium-hub:hub stevenalexander/node-phantomjs:2.46.0
# Add vagrant VM browser nodes here

With these commands you now have a Selenium Grid running with a number of browser nodes registered, and can run tests against the hub using the RemoteWebDriver, targeting individual browsers/versions. You can modify the containers used to create nodes running different browser versions, or add VMs to register IE/Safari/Mobile nodes.
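As a sketch of running a test against the grid with the Node selenium-webdriver bindings (the browser and URL are just examples):

var webdriver = require('selenium-webdriver');

// connect to the Selenium hub and request a Firefox node
var driver = new webdriver.Builder()
  .usingServer('http://localhost:4444/wd/hub')
  .withCapabilities(webdriver.Capabilities.firefox())
  .build();

driver.get('http://www.example.com');
driver.getTitle().then(function (title) {
  console.log('Page title: ' + title);
});
driver.quit();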


Using Docker for Dev environments

I had put some thought into using Docker locally as part of development, but a talk about maintaining boxen scripts (used to provision Macs according to a common set of requirements for projects) got me thinking about what you could really do with containers for development environments and flow.

My experience on different projects with development environments has been mostly bad over the years.

Commonly there’s nothing to help with setup: developers just get some (sketchy) documentation about what applications and dependencies they need to run and change the solution. This leads to problems when working in teams, increasing the time it takes to get developers started and causing lots of “works on my machine” environmental differences. Because developers are running everything locally, the environment they are building and testing on is nothing like the intended production environment (different dependencies/OS/architecture etc.), so developers have no visibility of problems that occur when their code is deployed. This causes the “throwing problems over the wall” gap between developers and operations.

Sometimes there’s a standard development VM image, which is better than manual configuration but also introduces new problems. It’s normally a beast: large in file size and poorly performing (to the point that developers work to avoid using it). It’s also hard to maintain, as no one wants to re-install a multi-gigabyte image, so it tends to become a base image needing manual corrections for updates rather than a plug-and-play environment. It also shares a lot of the same problems with differences from production, as it’s not practical to use virtual machines replicating the production environment/architecture running on a normal laptop/desktop.

Containers offer new possibilities for development environments. When running directly on the host (not inside a virtual machine like boot2docker) it offers near native performance. Containers are small in size, so can be created/destroyed quickly and are very fast to start, meaning you can rebuild your environment per change. Container image definitions and scripts for rebuilding/starting them can be stored in source control, using the same definitions in development, build, test and production environments, reducing the differences between them. Because of the size/performance benefits we can be more ambitious on how far we go in simulating production conditions, introducing load balancing and networking concerns early in development, allowing developers to see these early rather than leaving it all to operations staff.
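In practice the whole environment can then be rebuilt and thrown away in a couple of commands, for example:

# rebuild images after a change and start the environment in the background
docker-compose build
docker-compose up -d
# tear the environment down again when finished
docker-compose stop
docker-compose rm -f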

Below is a diagram showing an example of this setup:

[Diagram: container development environment]

Containers aren’t just limited to headless processes either: you can run GUI applications too, like IDEs. This allows pre-configured development tools to be tailored specifically for the project, quickly allowing new developers to start with the same toolset as the rest of the team. Isolating the entire development stack inside containers means developers don’t have to alter their own machine setup to work on a project, reducing startup time and making it easier to switch between projects without rebuilding their machine.
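On a Linux host a GUI application can be run from a container by sharing the X11 socket, along these lines (the image name is hypothetical):

# share the host display and X11 socket so the containerised IDE can draw windows
docker run -it \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  example/project-ide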

Containerisation

I’ve recently been looking at containerisation and how the adoption of this technology would change development and deployment.

What are the differences between containers and Virtual Machines

Linux containers are a type of virtualisation done at the OS level rather than at the hardware level (see here[1]). Containerisation is not new, it has been around for over 15 years (see here[5]), but required a recent change in the Linux kernel (process namespaces and cgroups, see here[6]) to make containers easy to secure.

As containers do not require a guest OS or a hypervisor running on the host, they are smaller in size, much quicker to start and can be hosted with greater density and performance than Virtual Machines. The benefits of this have been recognised by large service providers like Google and Facebook, who now host practically all their services in containers (see here[2] and the video here[3] at 35 mins). Other services which use dynamically created instances, such as Travis, Saucelabs and Heroku, also use containers as they are much faster to create/destroy.

While containers have been quickly adopted by large scale projects and organisations, they have been largely ignored by smaller scale projects who continue to use IaaS with Virtual Machines. This is beginning to change with the rise of containerisation tools and frameworks like Docker (here[4]) which make it easier to host smaller scale containerised solutions.

How containers can be used in development

We commonly use Virtual Machine images to provide developers with a common development environment, but due to performance and hosting limitations this environment is normally nothing like the production environment, leading to a gap in understanding how the software they are developing will run in reality. Also Virtual Machines are slow to start, massive in size and difficult to configure/recreate, leading to development delays for new starts or after changes.

Containers offer a way for developers to create and run the applications that make up the solution similar to production, while keeping the overhead low and extremely responsive. Where a Virtual Machine hosting an application may take 10 minutes to copy/download and over a minute to startup and run, a container can be created and started in seconds and can be easily destroyed/recreated. In a large project team dealing with multiple applications, having your development environment defined in containers can offer a large time saving even if you do not intend to run in containers on production.

Containers also allow developers to easily define and test what a container requires (dependencies/configuration etc.) and guarantee isolation, so the way a container runs on a development host should be identical to a production host (excluding networking/performance etc.). This reduces the gap between development and operations, allowing developers to locally test their application running in the same container that would be used in production and understand what dependencies/ports/configuration that application needs.

Docker has a good explanation of how containers can be used to develop and ship software reliably here ([7]).

By reducing the overhead in creating environments, both in development and production, containers make certain types of architecture and scaling easier to create. For example, once you have created a container to host a microservice you can run multiple copies of it load balanced across multiple container hosts, with container instances created/destroyed dynamically on demand. See here[8] for an example of a microservice authentication and authorisation solution implemented with Docker containers.
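As a simple illustration, duplicate instances of a containerised microservice can be started and put behind a load balancer (names and ports are made up):

# run three instances of the same microservice image on different host ports
docker run -d --name auth-1 -p 8081:3000 example/auth-service
docker run -d --name auth-2 -p 8082:3000 example/auth-service
docker run -d --name auth-3 -p 8083:3000 example/auth-service
# an nginx/haproxy container (or the container host's scheduler) then balances across them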

How containers can be used in deployment and operations

Containerisation brings new opportunities and challenges in deploying solutions. Used incorrectly, it could become another layer of virtualisation that operations have to worry about. Used correctly, it reduces the amount of work necessary to deploy complex applications by standardising how components are deployed and wired together.

During the initial hype of cloud services it was assumed that with IaaS we would be able to spin up VM instances quickly and easily on demand. In practice on real projects this has not been the case. Costs, restrictions on the providers available to projects, differences between providers’ APIs and tricky networking configuration have meant that actually creating new instances to make an environment is expensive in time and effort. This has made it difficult to create new environments on demand using VMs and has locked projects into providers.

A containerised solution allows more flexible hosting: instead of thinking in terms of VMs provisioned for specific environments and roles (i.e. a web server, an application server) you can treat each container host as a pool of available resources for running different containers for different purposes, including duplicate containers spread across hosts for redundancy and rolling updates. There are a number of frameworks and tools for managing the containers hosted on each machine, such as Fleet ([9]) and Kubernetes ([10]); these ensure that each host starts the required containers and that hosts coordinate with each other for starting/stopping containers and load balancing. Hosts themselves would be provisioned the same way as current Virtual Machines, using tools like Ansible and Puppet.
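For example, with Fleet a container is described as a unit file and the cluster decides which host runs it (a sketch with made-up names):

# people-api.service
[Unit]
Description=People API container

[Service]
ExecStartPre=-/usr/bin/docker rm -f people-api
ExecStart=/usr/bin/docker run --name people-api -p 3000:3000 example/people-api
ExecStop=/usr/bin/docker stop people-api

Submitting it with fleetctl start people-api.service schedules the container somewhere in the cluster.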

This allows projects to create clusters of VM container hosts for development/testing/production and create containers to meet the current hosting needs without having to change the VM instances available. For example, if a new User Test satellite environment is needed, the configuration of the test cluster is updated and pushed, causing the cluster to start new container instances on node VMs with available CPU/memory resources, link them together and make the new environment available externally. When the environment is not needed the configuration is updated again and the container instances are destroyed. The only time new VMs need to be created is when a new cluster is needed or a cluster’s capacity needs to be increased.

When should you not use containers

If you are considering using containers you should think about the situations where you would not want to use them, so you know whether they belong in your solution.

  • If your application is small, in terms of layers or hosting, as the cost of implementing and managing containers will outweigh the benefits. E.g. a simple web application hosted on one or two servers.
  • When your container becomes as complex to provision and configure as it would be on a virtual machine. If you cannot easily isolate the functionality into small manageable containers then you are essentially using containers the same way as a virtual machine, plus the added overhead. E.g. trying to containerise a monolithic solution.
  • For bundling large third party dependencies which do not officially support your container framework. E.g. trying to containerise an Oracle DB instance.
  • When your hosting architecture has a fixed structure or security limitations that restrict how and where certain components can be placed. This affects where you can put your containers and makes it difficult to get the benefit of containers’ flexible deployment if most of your components’ deployment is fixed and predefined. E.g. security restrictions mean any change in networking needs external review, so new containers cannot be created without pre-authorisation, making deployments essentially fixed.

A lot of the benefits of containers are similar to the benefits of splitting your application across multiple small VMs; if you wouldn’t do that then you should reconsider and either not use containers or only containerise the parts which would benefit.

Conclusion

Using containers requires a change in thinking, both for developing solutions and deploying them.

The speed and responsiveness of containers makes it much easier for developers to run and test their components in production like environments, letting them be more involved with operational tasks and reduce the gap between development and operations.

For deployment, instead of deploying components to static servers in static locations you are defining components that need to be run, linking them together and relying on the container hosts to create and start instances. This brings new challenges in how to handle persistence and monitoring of these containers which are designed to be disposable, but allows solutions to be made from many simple and scalable parts without massive overhead.

Containers are not going to replace VMs right away, but the performance/speed/adaptability advantages are only going to push more providers and services to use them over time, making them impossible to ignore (see the video[11], conclusion at 40 mins). Already Joyent ([12]), a hosting provider, is offering bare-metal container hosting: IaaS with the performance/responsiveness benefits of containers over VMs. This trend will likely continue; as container hosting gets cheaper more providers will pick it up and the skills to use it will spread.

References

CoreOS Rkt compared to Docker

I’ve been looking at CoreOS Rkt to get a comparison with Docker, as quite a few of the experienced operations developers I’ve spoken to about containers have raised concerns with Docker and prefer the pure, standardised containers approach that Rkt (Rocket) is aiming for, compared to Docker’s proprietary format and platform.

Comparison with Docker

Rkt is purely concerned with creating and running containers in isolation securely; it is not attempting to become a wider containerisation platform like Docker. It’s an implementation of the App Container (appc) specification and uses ACI container images from that specification. Rkt can also run other types of containers, such as Docker images. As of writing it is at v0.5.5 and is not recommended for production use.

Docker (as of v1.5.0) does not attempt to implement the appc specification and does not support ACI images. This makes Docker images an open but proprietary format, so users require third-party tools to generate ACI images from Docker images to allow their use in other container tools. Docker offers a large amount of tooling that Rkt cannot, such as a public repository for images and tools for integrating with hosting providers. For developers Docker is easier to use, as there are tools (boot2docker on Mac etc.) for using Docker directly from the command line on the host machine, while with Rkt you need to ssh onto a VM before running any Rkt commands. Docker’s documentation is better, as is the range of images available.

Rkt is thought to be more secure, as it runs daemonless, while Docker needs a daemon running as root to manage containers and allocate resources. Rkt also uses a trusted key model for verifying that an image you have downloaded is what you expect and has not been altered.

Conclusion

In practice creating and using Rkt containers is very similar to Docker: a simple command line interface using a JSON container definition file that is similar to a Dockerfile. The lack of documentation is a problem, as it’s hard to find examples of using Rkt or guides on best practices. Kubernetes has recently announced support for Rkt for managing containers instead of Docker, which is great as it gives a choice of what container format/tools you can use.
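For example, fetching and running an image with Rkt looks something like this (using the etcd image from the CoreOS getting-started examples):

# trust the signing key for the image prefix, then fetch and run the container
rkt trust --prefix=coreos.com/etcd
rkt fetch coreos.com/etcd:v2.0.9
rkt run coreos.com/etcd:v2.0.9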

Rkt is quite immature compared to Docker; right now it would be irresponsible to use it in production. When it reaches v1 and has been used in serious production environments it could well be a better choice than Docker due to its strong focus on standards and security. This may be a deciding factor between the two when containerisation becomes more widespread and hosting providers begin offering virtualisation via containers rather than Virtual Machines, to take advantage of the cost/density benefits.

Docker currently has a big advantage in ease of use, documentation and the platform it has built. This makes Docker a much better option for learning about containerisation and trying it out. Also, many cloud hosting providers are falling over themselves to offer containerisation and right now Docker is the only option; this gives it a huge lead over the alternatives. As there are tools to convert your Docker images into ACI format, it’s not a huge lock-in risk to start developing and deploying containerised applications in Docker, as it will be possible to change your mind later.

For developers, I’d recommend you start with Docker to get your head around the containerisation concepts, as the documentation is great. But you should be aware that Docker is not the only option. Containerisation has been around for ages, and Docker seems to be trying to make itself a way to bridge the gap between IaaS and PaaS, abstracting away important details that as a developer you need to be aware of to make production-ready code.
