1. Overview

1.1. Abstract

Container technology is a popular packaging method for developers and system administrators to build, ship and run distributed applications. Production use of image-based container technology requires a disciplined approach to development, and writing Dockerfiles and defining containerized applications can become rather complex. Therefore, we’re passing on our experience in this document, which aims to provide guidance and recommendations for the creation, deployment and usage of containerized applications.

This guide assumes the reader has at least basic knowledge of containerization technologies. The document is also not a reference guide for all Dockerfile instructions. Should you require that kind of content, the documentation is available on the Docker website.

1.2. Contributing

This is a continuously evolving work with the source files available on GitHub. If you find any inconsistencies, mistakes or typos, please use the issue tracker to report them. If you wish to participate, feel free to open a pull request.

2. Terminology

2.1. Dictionary

When discussing containerization, it’s important to have a solid grasp of the related vocabulary. One of the challenges people have is that many of the following terms are used interchangeably, which can be confusing, especially for newcomers.

The goal of this section is to clarify these terms, so that we can speak the same language.

2.1.1. Container Image

A container image is a filesystem tree that includes all of the requirements for running a container, as well as metadata describing the content. You can think of it as a packaging technology.

2.1.2. Container

A container is composed of two things: a writable filesystem layer on top of a container image, and a traditional Linux process. Multiple containers can run on the same machine and share the OS kernel with other containers, each running as an isolated process in user space. Containers take up less space than VMs (application container images are typically tens of MBs in size) and start almost instantly.

2.1.3. Repository

When using the docker command, a repository is what is specified on the command line, not an image. In the following command, “fedora” is the repository.

docker pull fedora

This is actually expanded automatically to:

docker pull docker.io/library/fedora:latest

This can be confusing, and many people refer to this as an image or a container image. In fact, the docker images sub-command is what is used to list the locally available repositories. Conceptually, these repositories can be thought about as container images, but it’s important to realize that these repositories are actually made up of layers.

When you specify the repository on the command line, the Docker daemon does some extra work for you. The Docker daemon (not the client tool) is configured with a list of servers to search. In our example above, the daemon will search for the “fedora” repository on each of the configured servers.

In the above command, only the repository name was specified, but it’s also possible to specify a full URL with the Docker client. To highlight this, let’s start by dissecting a full address.

REGISTRY[:PORT]/NAMESPACE/REPOSITORY[:TAG]

The full URL is made up of a standard server name, a namespace, a repository name, and optionally a tag. There are actually many permutations of how to specify a URL and, as you explore the Docker ecosystem, you will find that many pieces are optional. The following commands are all valid and all pull some permutation of the same repository:

docker pull docker.io/library/fedora:latest
docker pull docker.io/library/fedora
docker pull library/fedora
docker pull fedora

2.1.4. Image Layer

Repositories are often referred to as images or container images, but actually they are made up of one or more layers. Image layers in a repository are connected together in a parent-child relationship. Each image layer represents some pieces of the final container image.

2.1.5. Registry

A registry server is essentially a fancy file server that is used to store Docker repositories. Typically, the registry server is specified as a normal DNS name and, optionally, a port number to connect to. Much of the value in the Docker ecosystem comes from the ability to push and pull repositories from registry servers.

When a Docker daemon does not have a locally cached copy of a repository, it will automatically pull it from a registry server. Usually the default registry is set to docker.io (Docker Hub). It is important to stress that there is implicit trust in the registry server.

You must determine how much you trust the content provided by the registry and you may want to allow or block certain registries. In addition to security, there are other concerns such as users having access to licensed software and compliance issues. The simplicity with which Docker allows users to pull software makes it critical that you trust upstream content.

2.1.6. Namespace

A namespace is a tool for separating groups of repositories. On the public Docker Hub, the namespace is typically the username of the person sharing the image, but it can also be a group name or a logical name.

2.1.7. Tag

When an image builder creates a new repository, they will typically label the best image layers to use. These are called tags and typically map to versions of software contained in the repository. In other words, tags are how various images in a repository are distinguished from each other.

2.2. Container Use Cases

Many types of container design patterns are forming. Since containers are the runtime version of a container image, the way a container is built is tightly coupled to how it is run.

Some Container Images are designed to be run without privileges while others are more specialized and require root-like privileges. There are many dimensions in which patterns can be evaluated and often users will see multiple patterns or use cases tackled together in one container image/container.

This section will delve into some of the common use cases that users are tackling with containers.

2.2.1. Application Containers

Application containers are the most popular form of containers. These are what developers and application owners care about. Application containers contain the code that developers work on. These include, for example, MySQL, Apache, MongoDB, and Node.js.

2.2.2. Cattle vs Pet Containers

Containers are usually perceived as a technology for deploying applications that are immutable and can therefore be redeployed or killed at any time without severe consequences. As an analogy, these are often referred to as "cattle". Containers in this deployment model don’t have an "identity": the user doesn’t need to care where the containers live in the cluster, the containers are automatically recovered after failures, and they can be scaled up or down as needed. "Pet" containers are the opposite: when a pet container fails, the running application will be directly affected and might fail as well. Like pets, pet containers require closer attention and management from the user and are usually accompanied by regular health checks. A typical example would be a containerized database.

2.2.3. Super Privileged Containers

When building container infrastructure on dedicated container hosts such as Atomic Host, system administrators still need to perform administrative tasks. Whether used with distributed systems, such as Kubernetes or OpenShift, or with standalone container hosts, Super Privileged Containers (SPCs) are a powerful tool. SPCs can even do things like load specialized kernel modules, such as with systemtap. In an infrastructure that is built to run containers, administrators will most likely need SPCs to do things like management, monitoring, backups, etc. It’s important to realize that there is typically a tighter coupling between SPCs and the host kernel, so administrators need to choose a rock-solid container host and standardize on it, especially in a large clustered/distributed environment where things are more difficult to troubleshoot. They then need to select a user space in the SPC that is compatible with the host kernel.

2.3. Image Types

2.3.1. Base Images

A base image is one of the simplest types of images, but you will find a lot of definitions. Sometimes users will refer to an application image as a “base image.” However, technically this is not a base image; it is an intermediate image.

Simply put, a base image is an image that has no parent layer. Typically, a base image contains a fresh copy of an operating system. Base images normally include core system tools, such as bash or coreutils, and the tools necessary to install packages and make updates to the image over time (yum, rpm, apt-get, dnf, microdnf…). While base images can be “hand crafted”, in practice they are typically produced and published by open source projects (like Debian, Fedora or CentOS) and vendors (like Red Hat). The provenance of base images is critical for security. In short, the sole purpose of a base image is to provide a starting place for creating your derivative images. When using a Dockerfile, the choice of which base image you are using is explicit:

FROM registry.fedoraproject.org/fedora

2.3.2. Builder Images

These are a specialized form of container images which produce application container images as offspring. They include everything but a developer’s source code. Builder images include operating system libraries, language runtimes, middleware, and the source-to-image tooling.

When a builder image is run, it injects the developer’s source code and produces a ready-to-run offspring application container image. This newly created application container image can then be run in development or production.

For example, if a developer has PHP code and they want to run it in a container, they can use a PHP builder image to produce a ready to run application container image. The developer passes the GitHub URL where the code is stored and the builder image does the rest of the work for them. The output of a Builder container is an Application container image which includes Red Hat Enterprise Linux, PHP from Software Collections, and the developer’s code, all together, ready to run. Builder images provide a powerful way to go from code to container quickly and easily, building off of trusted components.

2.3.3. Intermediate Images

An Intermediate image is any container image which relies on a base image. Typically, core builds, middleware and language runtimes are built as layers on “top of” a base image. These images are then referenced in the FROM directive of another image. These images are not used on their own, they are typically used as a building block to build a standalone image.

It is common to have different teams of specialists own different layers of an image. Systems administrators may own the core build layer, while “developer experience” may own the middleware layer. Intermediate images are built to be consumed by other teams building images, but they can sometimes be run standalone too, especially for testing.

2.3.4. Intermodal Images

Intermodal container images are images that have hybrid architectures. For example, many Red Hat Software Collections images can be used in two ways.

First, they can be used as simple Application Containers running a fully contained Ruby on Rails and Apache server.

Second, they can be used as Builder Images inside of OpenShift Container Platform. In this case, they output child images which contain Ruby on Rails, Apache, and the application code that the source-to-image process was pointed towards during the build phase.

The intermodal pattern is becoming more and more common to solve two business problems with one container image.

2.3.5. Deployer Images

A deployer image is a specialized kind of container which, when run, deploys or manages other containers. This pattern enables sophisticated deployment techniques such as mandating the start order of containers, or first run logic such as populating schema or data.

2.3.6. Containerized Components

A containerized component is a container that is meant to be deployed as part of a larger software system, not on its own. Two major trends are driving this.

First, microservices are driving the use of best-of-breed components - this is also driving the use of more components combined together to build a single application. Containerized components are meeting the need to deploy an expanding quantity of complex software more quickly and easily.

Second, not all pieces of software are easy to deploy as containers. Sometimes, it makes sense to containerize only certain components which are easier to move to containers or provide more value to the overall project. With a multi-service application, some services may be deployed as containers, while others may be deployed through a traditional methodology such as an RPM or installer script.

It’s important to understand that containerized components are not designed to function on their own. They provide value to a larger piece of software, but provide very little value on their own.

3. Application Planning

As you begin to contemplate the containerization of your application, there are a number of factors that should be considered prior to authoring a Dockerfile. You will want to plan out everything from how to start the application, to network considerations, to making sure your image is architected in a way that can run in multiple environments like Atomic Host or OpenShift.

The very act of containerizing any application presents a few hurdles that are perhaps taken for granted in a traditional Linux environment. The following sections highlight these hurdles and offer solutions which would be typical in a containerized environment.

3.1. Persistent Storage: Simple Database Server

Although transience is one of the main benefits of containers, being able to preserve data after a container terminates is often an essential requirement for production environments. Common complications the user can encounter, and possible solutions, are outlined below.

3.1.1. Traditional database server

One of the simpler environments in the IT world is a database that serves one or more client nodes. A corporate directory is an example most of us can identify with. Consider the figure below.

Figure 1. Simple database topology

In this scenario, we have a single server running a Linux distribution. The server functions largely as a database server (perhaps postgres) for other clients that can connect to it over the network. Clients connect to the database using the standard TCP/IP network stack, typically a combination of a TCP socket and port. In the case of postgres, the default port is 5432.

More importantly, many database implementations store the database files on reliable, enterprise storage such as SANs or robust RAID arrays. This is done to protect the database from data loss. By default, a container’s storage is ephemeral; therefore, if the container is deleted, your data will be lost. As a developer, you will need to understand and design your containerization in a way that will allow data to persist regardless of the state of the container.

3.1.2. Containerized Environment

In the case of a database server, retaining your data can be critical. The default storage for containers themselves is not persistent, but it can be with a little planning. The most common way to allow for data persistence is to use one of the several methods already available to docker or your chosen deployment platform. The following figure is a simplified containerized topology of the same database from above.

Figure 2. Containerized database

Note how the container host, like the traditional Linux deployment, has enterprise storage associated with it. Through the use of docker volumes, we can assign storage to containers, and those volumes will persist regardless of the state of the container.
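
As a minimal sketch of this approach, a named volume can be created and attached when the database container is started. The image name and data directory below are assumptions based on the upstream postgres image and may differ in your environment.

Running a containerized database with a named volume
$ docker volume create pgdata
$ docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres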

For more information about planning for persistent storage, check out the Storage Options section.

3.2. Container Interconnection: Database Server with Local and Distributed Clients

By definition, distributed application components need to communicate with one another. Container technologies encourage developers to make these interconnection points explicit and provide a number of mechanisms to coordinate or otherwise enable communication between containers.

3.2.1. Traditional Database Server/Environment

Consider the database example in the previous section. Once we have established persistent storage for the database server, we also need to consider how database clients will connect to it. In nearly all cases these connections will occur through a socket, either over the network or locally via a UNIX domain socket special file.

Simple non-distributed applications may assume that a database is co-located on the same server and use an established port number, or UNIX domain socket path, as their default access mechanism.

True multi-node distributed applications may host the database as a distinct node in which case communication must occur via the network. Clients that wish to use the database must be made aware of its location, either via explicit configuration or a service discovery mechanism.

Figure 3. Traditional DB environment using both socket and TCP/IP connections

3.2.2. Container Environment - Docker

The previous example shows a traditional database where a single node allows both socket and port (TCP/IP) connections. If we were to "containerize" the database server and the database client into separate containers, this would present a slight challenge in the architecture due to the container namespacing. Consider the following image:

Figure 4. Single Node Database with server and client in separate containers

In this setup, there are actually two clients. One is containerized and the other is executing from the container host directly. The database is also containerized and isolated by namespacing as well. The database client executing on the host can still communicate with the containerized database server via TCP/IP because Docker has an internal network that allows containers to communicate with each other and the host. Once an interconnection mechanism has been established, a container developer must ensure that service containers are properly configured to allow access to these connections.

Some container coordination frameworks, such as Kubernetes, attempt to simplify this use case for containers co-located on a single node by sharing the network port space between node-local containers.

Further details and examples of networking interconnect options for various container frameworks and scenarios can be found in the network considerations section of this document.

For non-network connections between containers on a single node, shared filesystem locations, either for domain sockets or actual filesystem content, must be set up at the time the containers are launched. Docker, for example, allows mapping a host directory to a container directory by adding the following argument to the run command:

-v <host_directory>:<container_directory>

In our DB server example, assuming the database maintains a listening UNIX domain socket in /var/run/postgres we could launch both our server and client with the following argument included:

-v /var/run/postgres:/var/run/postgres

This will ensure that both the server and client see the same directory content, exported from the host, allowing the client to connect using the expected/default location.
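
For illustration, both containers could be launched with the shared socket directory bind mounted from the host. The image names my-postgres and my-postgres-client are hypothetical placeholders.

Sharing a UNIX domain socket directory between server and client containers
$ docker run -d --name db -v /var/run/postgres:/var/run/postgres my-postgres
$ docker run -it --rm --name client -v /var/run/postgres:/var/run/postgres my-postgres-client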

Further details and examples can be found in the storage considerations section of this document.

Another iteration on this scenario would be where the database server and clients are on different nodes and require network access to communicate. In this case, you must ensure that Docker not only exposes a port for the database container but that a port is also exposed to the network so other clients can communicate with it.

Figure 5. Multiple node deployment where server and client are separated

Notice how in this scenario, the database server is still containerized but the client resides on a different node. For network connections, Docker provides a simple directive in the Dockerfile to expose a port from the running container. For example, to create a Postgres DB server container that listens on the default Postgres port, you would add the following line:

EXPOSE 5432

You then also need to ensure that you perform the port mapping when the container runs using either the -P or -p flags.
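
As a hedged example, the following command publishes the exposed Postgres port on the host so that clients on other nodes can reach it; my-postgres is a placeholder image name. Using -P instead would map all exposed ports to randomly chosen host ports.

Publishing the database port on the host
$ docker run -d --name db -p 5432:5432 my-postgres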

3.3. Data Initialization

A truly stateless application component acquires all configuration information via a combination of discovery and injection by a cloud, container or configuration management framework. It assumes that all local storage is ephemeral and that any data that requires persistence beyond a shutdown, reboot or termination must be stored elsewhere.

In practice, many applications are not truly stateless in the sense defined above. Instead, they require at least some element of persistent state or storage to be made available or "attached" to the component at the time it is launched. Frequently, this storage must be initialized the first time the application is run.

3.3.1. Examples

  • Creation of schema/tables and initial population of a relational database

  • Initialization of configuration data, such as the location of a central server or executive to which the application component should connect.

  • Initialization of unique identifying information such as a UUID, key pair or shared secrets

3.3.2. Key Issues

  • Tracking whether initialization has occurred to ensure it only happens once or, at the very least, only occurs when the user wants it to

  • Selecting persistence between restarts of a component versus persistence beyond termination/removal

  • If data is persistent beyond termination of the component, re-associating the persistent storage with a freshly launched instance of the component (be it a VM or a container)

  • If data is persistent across restarts and updates to an underlying container image, ensuring that the “old” persistent data is still available. Users might expect behavior similar to RPMs in this area.

3.3.3. General Approaches and Patterns - Containers

Two common patterns have emerged to address components that require one time data initialization.

  • Automatic Initialization - In this pattern, any component that requires data initialization incorporates a check into its initial start up. If the check determines that persistent data is already present, it continues as normal. If persistent data is not present, it performs the required data initialization before moving on to normal start up. (A minimal shell sketch of this pattern follows this list.)

  • Explicit Initialization - In this pattern users must explicitly execute an initialization step prior to running an application component. Details may differ depending on the specific container tool or framework being used.
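
The following is a minimal shell sketch of the automatic initialization pattern. The data path /var/lib/app/data, the init_db helper and the application binary are hypothetical placeholders.

#!/bin/sh
# Initialize persistent data only on the very first start.
if [ ! -f /var/lib/app/data/.initialized ]; then
    init_db                                  # one-time schema/data population
    touch /var/lib/app/data/.initialized     # marker so later starts skip this step
fi
exec /usr/bin/app                            # continue with normal start up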

3.3.4. Persistent Storage in Docker

Docker containers provide persistence in a few different ways. Changes to the local file system of a running container persist across starts and stops but are lost if the container is ever removed. If a user requires persistence beyond removal, Docker provides the concept of "Volumes", which are available in two flavors.

  • "Data volumes" are directories that exist outside of the container file system and whose contents persist even after a container is terminated.

  • "Bind mounts" are host directories that can also be directly mounted into a running container.

For more details on Docker storage configurations see the storage considerations section of this guide.

3.3.5. Framework Support

This is an area of active development within the various container management frameworks and there is no silver bullet.

Generally speaking, if an application component does not provide some mechanism for automatic initialization it falls to the user to identify and perform any expected explicit storage initialization. It is also the user’s responsibility to track the resulting persistent storage objects during the removal/termination/restart of a container or an update to the underlying container image.

Explicit Initialization - Atomic CLI

The one exception is the Atomic CLI (aka "atomic run") which provides support within its metadata format for encoding any required explicit initialization steps.

3.4. Security and user requirements

3.5. Host and image synchronization

Some applications that run in containers require the host and container to be more or less synchronized on certain attributes so that their behaviors are also similar. One common attribute is time. The following sections discuss best practices for keeping those attributes similar or the same.

3.5.1. Time

Consider a case where multiple containers (and the host) are running applications and logging to something like a log server. The log timestamps and information would be almost entirely useless if each container reported a different time than the host.

The best way to synchronize the time between a container and its host is through the use of bind mounts. You simply need to bind mount the host’s /etc/localtime onto the container’s /etc/localtime. We use the ro flag to ensure the container can’t modify the host’s time zone by default.

In your Dockerfile, this can be accomplished by adding the following to your RUN label:

Synchronizing the timezone of the host to the container.
-v /etc/localtime:/etc/localtime:ro

3.5.2. Machine ID

The /etc/machine-id file is often used as an identifier for things like applications and logging. Depending on your application, it might be beneficial to also bind-mount the machine ID of the host into the container. For example, in many cases journald relies on the machine ID for identification. The sosreport application also uses it. To bind-mount the machine ID of the host to the container, specify the following when running your container or add the following line to your atomic RUN label so atomic can utilize it:

Synchronizing the host machine ID with a container
-v /etc/machine-id:/etc/machine-id:ro
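
Putting both together, a hedged example of a run command (or atomic RUN label) that keeps the time zone and the machine ID in sync might look as follows; <image_name> is a placeholder:

Synchronizing time zone and machine ID in one run command
docker run -d -v /etc/localtime:/etc/localtime:ro -v /etc/machine-id:/etc/machine-id:ro <image_name>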

3.6. Starting your applications within a container

At some point in the design of your Dockerfile and image, you will need to determine how to start your application. There are three prevalent methods for starting applications:

  • Call the application binary directly

  • Call a script that results in your binary starting

  • Use systemd to start the application

For the most part, there is no single right answer on which method should be used; however, there are some softer decision points that might help you decide which would be easiest for you as the Dockerfile owner.

3.6.1. Calling the Binary Directly

If your application is not service-oriented, calling the binary directly might be the simplest and most straightforward method to start a container. There is no memory overhead and no additional packages are needed (like systemd and its dependencies). However, it is more difficult to deal with setting environment variables.

3.6.2. Using a Script

Using a special script to start your application in a container can be a handy way to deal with slightly more complex applications. One upside is that it is generally trivial to set environment variables. This method is also good when you need to call more than a single binary to start the application correctly. One downside is that you now have to maintain the script and ensure it is always present in the image.

3.6.3. Use systemd

Using systemd to start your application is great if your application is service-oriented (like httpd). It can benefit from leveraging the well-tested unit files generally delivered with the applications themselves, and therefore it can make complex applications that require environment variables easy to work with. One disadvantage is that systemd will increase the size of your image, and there is a small amount of memory used by systemd itself.

Note
As of docker-1.10, the docker run parameter of --privileged is no longer needed to use systemd within a container.

You can implement using systemd fairly simply in your Dockerfile.
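
A minimal sketch of such a Dockerfile is shown below, assuming a Fedora base image and the httpd service; depending on your base image and Docker version, additional mounts such as /sys/fs/cgroup may still be required.

Running httpd under systemd in a container
FROM fedora
RUN dnf -y install httpd && \
    systemctl enable httpd && \
    dnf clean all
EXPOSE 80
CMD [ "/sbin/init" ]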

3.7. Network Considerations

3.7.1. Single Host

3.7.2. Multi Host

3.7.3. Networking in OpenShift

Establishing a network connection between containers in OpenShift is different from the standard Docker container linking approach. OpenShift uses a built-in DNS so that services can be reached by the service DNS name and service IP address. In other words, an application running in a container can connect to another container using the service name. For example, if a container running the MySQL database is the endpoint of a service named database, then database will be used as the hostname to connect to it from another container. In addition to the DNS record, you can also use the environment variables with the IP address of the service, which are provided for every container running in the same project as the service. However, if the IP address (environment variable) changes, you will need to redeploy the container. Using service names is therefore recommended.
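
For illustration, assuming the database service from the example above, a client container in the same project could connect using either the service DNS name or the injected environment variables (the user and database names are placeholders):

# using the service DNS name
mysql -h database -u user -p db

# using the environment variables provided for the service
mysql -h "$DATABASE_SERVICE_HOST" -P "$DATABASE_SERVICE_PORT" -u user -p db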

For details, see OpenShift Documentation.

3.8. Storage Considerations

When you architect your container image, storage can certainly be a critical consideration. The power of containers is that they can mimic, replicate, or replace all kinds of applications, which is exactly why you must carefully consider how you deal with storage needs.

By nature, the storage for containers is ephemeral. This makes sense because one of the attractions of containers is that they can be easily created, deleted, replicated, and so on. If no consideration is given to storage, the container will only have access to its own filesystem. This means that if the container is deleted, any information, whether logs or data, will be lost. For some applications, this is perfectly acceptable, if not preferred.

However, if your application generates important data that should be retained or perhaps could be shared amongst multiple containers, you will need to ensure that this storage is set up for the user.

3.8.1. Persistent Storage for Containers: Data Volumes

Docker defines persistent storage in two ways.

  1. Data volumes

  2. Data volume containers

However, at present, the use of data volumes is emerging as the preferred storage option for users of Docker. The Docker website defines a data volume as "a specially-designated directory within one or more containers that bypasses the Union File System." Data volumes have the distinct advantage that they can be shared and reused by one or more containers. Moreover, a data volume will persist even if the associated container is deleted.

Data volumes must be explicitly created and should preferably be given a meaningful name. You can manually create a data volume with the docker volume create command.

Creating a data volume for persistent storage
$ docker volume create <image_name>

Note
You can also specify a driver name with the -d option.

Using Data Volumes in a Dockerfile

For developers whose applications require persistent storage, the trick will be instantiating the data volume prior to running the image. This, however, can be achieved by leveraging the LABEL metadata and applications like atomic.

We recommend that the data volume be created through the use of the INSTALL label. The INSTALL label is meant to identify a script that should be run prior to ever running the image. In that install script, adding something like the following can be used to create the data volume.

Creating a data volume in your install script
chroot /host /usr/bin/docker volume create <image_name>

To then use the data volume, the RUN label would need to use the bind mount feature. Adding the following to your RUN label would bind mount the data volume by name:

Adding a data volume by name into your RUN label
-v <data_volume_name>:/<mount_path_inside_container>
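
A hedged sketch of how the two labels might fit together in a Dockerfile is shown below; install.sh is a hypothetical install script containing the docker volume create command from above, and IMAGE is the substitution token atomic replaces with the image name.

Sketch of INSTALL and RUN labels using a data volume
LABEL INSTALL="docker run --rm --privileged -v /:/host IMAGE /usr/bin/install.sh"
LABEL RUN="docker run -d -v <data_volume_name>:/<mount_path_inside_container> IMAGE"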

3.8.2. Persistent Storage for Containers: Mounting a Directory from the Host

You can also leverage the host filesystem for persistent storage through the use of bind mounts. The basic idea is to use a directory on the host filesystem that will be bind mounted into the container at runtime. This can be done simply by adding a bind mount to your RUN label:

Bind mounting a directory from the rootfs to a running container
-v /<path_on_the_rootfs>:/<mount_path_inside_container>

One downside to this approach is that anyone with privileges to that directory on the host will be able to view and possibly alter the content.

3.8.3. OpenShift Persistent Storage

3.8.4. Storage Backends for Persistent Storage

3.9. Logging

If your application logs actions, errors, and warnings to some sort of log mechanism, you will want to consider how to allow users to obtain, review, and possibly retain those logs. The flexibility of a container environment can, however, present some challenges when it comes to logging, because containers are typically separated by namespace and cannot leverage the system logging without some explicit action by the users. There are several solutions for logging containers, such as:

  • using a logging service like rsyslog or fluentd

  • setting the docker daemon’s log driver

  • logging to a file shared with the host (bind mounting)

As a developer, if your application uses logging of some manner, you should be thinking about how you will handle your log files. Each of the aforementioned solutions has its pros and cons.

3.9.1. Using a Logging Service

Most traditional Linux systems use a logging service like rsyslog to collect and store log files. Often the logging service will coordinate logging with journald, but it too is a service and will accept log input.

If your application uses a logger and you want to take advantage of the host’s logger, you can bind mount /dev/log between the host and container as part of the RUN label like so:

-v /dev/log:/dev/log

Depending on the host distribution, log messages will now be in the host’s journald and subsequently in /var/log/messages, assuming the host is using something like rsyslog.
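
A quick way to verify the bind mount, assuming an image that contains the logger utility (the fedora base image does):

$ docker run --rm -v /dev/log:/dev/log fedora logger "Hello from a container"
$ journalctl -n 5        # run on the host; the test message should appear here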

3.9.2. Setting the Log Driver for docker Daemon

Docker has the ability to configure a logging driver for the daemon. When set, it impacts all containers on the system, so this method is only useful when you can ensure that the host will only be running your application and no other containers will be affected. Its usefulness is therefore limited unless you control the final runtime environment.
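
For reference, the driver can be set either daemon-wide or per container; journald is used here only as an example driver.

# daemon-wide default, e.g. in /etc/docker/daemon.json
{ "log-driver": "journald" }

# or per container at run time
docker run -d --log-driver=journald <image_name>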

3.9.3. Using Shared Storage with the Host

The use of persistent storage can be another effective way to deal with log files, whether you choose a simple bind mount with the host or data volumes. Like using a logging service, it has the advantage that the logs can be preserved regardless of the state of the container. Shared storage also reduces the potential to chew up filesystem space assigned to the container itself. You can bind mount either a file or directory between host and container using the -v flag in your RUN label.

-v <host_dir|file>:<image_dir|file>

3.9.4. Logging in OpenShift

OpenShift automatically collects logs of image builds and of processes running inside containers. The recommended way to log for containers running in OpenShift is to send the logs to standard output or standard error rather than storing them in a file. This way, OpenShift can collect the logs and output them directly in the console, as seen in the picture below, or on the command line (oc logs).

[Figure: container logs displayed in the OpenShift web console]

For details on log aggregation, see the OpenShift documentation.

3.10. Security and User Considerations

3.10.1. Passing Credentials and Secrets

Storing sensitive data like credentials is a hot topic, especially because Docker does not provide a designated option for storing and passing secret values to containers.

Currently, a very popular way to pass credentials and secrets to a container is to specify them as environment variables at container runtime. This way, you as a consumer of such an image don’t expose any of your sensitive data publicly.

However, this approach also has caveats:

  • If you commit such a container and push your changes to a registry, the final image will also contain all your sensitive data, exposed publicly.

  • Processes inside your container and other containers linked to your container might be able to access this information. Similarly, everything you pass as an environment variable is accessible from the host machine using docker inspect, as seen in the mysql example below.

# docker run -d --name mysql_database -e MYSQL_USER=user -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=db -p 3306:3306 openshift/mysql-56-centos
# docker inspect mysql_database

<snip>

"Env": [
            "MYSQL_USER=user",
            "MYSQL_PASSWORD=password",
            "MYSQL_DATABASE=db",

<snip>

There are other ways to store secrets, and although using environment variables might lead to leaking private data in certain corner cases, it still remains one of the safest workarounds available.

There are a couple of things to keep in mind when operating with secrets:

  • For obvious reasons, you should avoid using default passwords - users tend to forget to change the default configuration, and if a known default password leaks, it can easily be misused.

  • Although squashing removes intermediate layers from the final image, secrets from those layers will still be present in the build cache.

Handling Secrets in Kubernetes

Containers running through Kubernetes can take advantage of the secret resource type to store sensitive data such as passwords or tokens. Kubernetes uses tmpfs volumes for storing secrets. To learn how to create and access these, refer to the Kubernetes User Guide.
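
As a brief illustration, a generic secret can be created from the command line and then referenced from a pod definition; the secret name and value below are placeholders.

$ kubectl create secret generic db-credentials --from-literal=password=mysecretpassword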

Other Projects Facilitating Secret Management

Custodia

Custodia is an open-source project that defines an API for storing and sharing secrets such as passwords and certificates in a way that keeps data secure, manageable and auditable. Custodia uses the HTTP protocol and a RESTful API as an IPC mechanism over a local Unix socket. Custodia is fully modular and users can control how authentication, authorization and API plugins are combined and exposed. You can learn more details in the project’s GitHub repository or wiki page.

Vault

Another open-source project that aims to handle secure accessing and storing of secrets is Vault. Detailed information about the tool and use cases can be found on the Vault project website.

3.10.2. User NameSpace Mapping (docker-1.10 feature)

3.11. Preventing Your Image Users from Filling the Filesystem

Most default docker deployments only set aside about 20GB of storage for each container. For many applications, that amount of storage is more than enough. But if your containerized application produces significant log output, or your deployment scenario restarts containers infrequently, file system space can become a concern.

The first step to preventing the container filesystem from filling up is to make sure your images are small and concise. This obviously reduces how much filesystem space your image consumes right away. However, as a developer, you can use the following techniques to manage the container filesystem size.

3.11.1. Ask for a Larger Storage Space on Run

One solution to dealing with filesystem space could be to increase the amount of storage allocated to the container when your image is run. This can be achieved with the following switch to docker run.

Increase the container storage space to 60GB
--storage-opt size=60G

If you are using a defined RUN label in your Dockerfile, you could add the switch to that label. However, abuse of this switch could irk users, so be prudent in using it.

3.11.2. Space Considerations for Logging

Logging can sometimes unknowingly consume disk space, particularly when a service or daemon has failed. If your application performs logging, or more specifically verbose logging, consider the following approaches to help keep filesystem usage down:

3.12. Deployment Considerations

Preparing applications for production distribution and deployment must carefully consider the supported deployment platforms. Production services require high uptime, injection of private or sensitive data, storage integration and configuration control. The deployment platform determines methods for load balancing, scheduling and upgrading. A platform that does not provide these services requires additional work when developing the container packaging.

3.12.1. Platform

3.12.2. Lifecycle

3.12.3. Maintenance

3.12.4. Build infrastructure

4. Creating Images

The base unit of creating an image is the Dockerfile itself. This section focuses on the instructions that make up a Dockerfile.

This chapter will not cover every Dockerfile instruction available but instead will focus on specific ones that we want to reinforce for those who develop Dockerfiles. Docker has published a reference guide already covering each of the Dockerfile instructions. In addition, upstream docker has a nice description of best practices for Dockerfiles. It describes the various instructions that can be used to compose a Dockerfile and their best usage. Familiarize yourself with these recommendations.

4.1. Creating Base Images

4.1.1. Choosing Base Image

Images that have no parent are called base images. Docker images usually have their own root filesystem with an operating system installed. So when you want to create a new image, it either has to be based on an image that actually provides an operating system, or you will need to create this layer in your image. The only exception is super-minimal images that, instead of an operating system, provide only a single binary, as described later in the text. There is a wide variety of base images already available on Docker Hub, so the simplest solution is to use one from there. Here are a few things that should help you determine which base image will fit your needs:

  • Linux distribution - Your personal preference and experience will play an important role in choosing one distribution over another. However, you should consider whether your containerized application requires specific libraries or tools from a specific system.

  • Image size - Base images usually contain a minimal operating system with a set of tools needed for basic operations. To keep your environment small and efficient, size should also be taken into account when picking the right base image. The size varies; you can take advantage of super small base images, such as the 2MB busybox, or use a standard minimal operating system, such as Fedora or CentOS, which are up to 200MB in size.

  • Updates - Not all community images are necessarily rebuilt on a regular basis or when security vulnerabilities are addressed. You should therefore consider using base images from "official repositories" on Docker Hub, and confirm their update policy in advance.

4.1.2. Creating Base Image

Once you’ve considered all options and decided to create your own base image, the process will mostly depend on the distribution you chose. Note that the major distributions have their source files available on GitHub so you still might want to consider creating an issue or opening a pull request to suggest a change in the feature set or any adjustment. Docker documentation suggests two approaches to creating a base image: using tar and building an image "FROM scratch".

Using tar

Using the tar tool is a simple way to build a base image. As a prerequisite, you will need to set up a directory structure for chroot with all items that you wish to be part of the base image. There are various tools that might help you with this, for example debootstrap for Debian systems or supermin for RPM-based systems.

Once you have your chroot directory ready, it is as simple as running:

# tar -C <chroot_dir> -c . | docker import - <new_image_name>

Note that docker provides a set of scripts for base image creation that take advantage of tar: https://github.com/docker/docker/tree/master/contrib. Popular distributions then use their own build systems that usually also utilize tar, for example Fedora’s koji.
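
As a hedged example using debootstrap, a minimal Debian tree can be built into a local directory and then imported; the suite, mirror and image name are placeholders.

# debootstrap populates ./rootfs with a minimal Debian system
debootstrap --variant=minbase stable ./rootfs http://deb.debian.org/debian
# import the tree as a new base image
tar -C ./rootfs -c . | docker import - my-debian-base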

FROM scratch

"scratch" is a special repository in the Docker Hub registry, created using an empty tarball. It is not meant to be pulled or run, and at any such an attempt you will most likely encounter this message: 'scratch' is a reserved name. Using scratch is ideal for creating extremely minimal images, for example for containerizing single binaries. An example is available from Docker documentation. scratch is also very handy for creating standard distribution base images. But as with tar, you’ll first need to prepare a directory structure for chroot. After that, just add the directory in your Dockerfile as follows:

FROM scratch
ADD <chroot_dir> /
CMD ["/bin/bash"]

4.2. Creating System Images

A system container addresses the need for a container service to start before the Docker service itself is running. The open-vm-tools container is a system container utilizing runc as the runtime engine.

A system container normally starts as a regular Docker container:

FROM rhel7:7.4-ondeck

LABEL summary="The open-vm-tools guest agent" \
      io.k8s.description="The open-vm-tools agent is providing information about the virtual machine and allows to restart / shutdown the machine via VMware products. This image is intended to be used with virtual machines running Red Hat Enterprise Linux Atomic Host." \
      name="rhel/open-vm-tools" \
      version="7.4" \
      com.redhat.component="open-vm-tools-docker" \
      maintainer="davis phillips <dphillip@redhat.com>"

ENV SYSTEMD_IGNORE_CHROOT=1

RUN yum-config-manager --enable rhel-7-server-rpms || :
RUN yum -y --setopt=tsflags=nodocs install file open-vm-tools perl net-tools iproute systemd
RUN yum clean all

COPY tmpfiles.template service.template config.json.template /exports/
COPY init.sh /usr/bin/

LABEL run="docker run  --privileged -v /proc/:/hostproc/ -v /sys/fs/cgroup:/sys/fs/cgroup  -v /var/log:/var/log -v /run/systemd:/run/systemd -v /sysroot:/sysroot -v=/var/lib/sss/pipes/:/var/lib/sss/pipes/:rw -v /etc/passwd:/etc/passwd -v /etc/shadow:/etc/shadow -v /tmp:/tmp:rw -v /etc/sysconfig:/etc/sysconfig:rw -v /etc/resolv.conf:/etc/resolv.conf:rw -v /etc/nsswitch.conf:/etc/nsswitch.conf:rw -v /etc/hosts:/etc/hosts:rw -v /etc/hostname:/etc/hostname:rw -v /etc/localtime:/etc/localtime:rw -v /etc/adjtime:/etc/adjtime --env container=docker --net=host  --pid=host IMAGE"

CMD /usr/bin/vmtoolsd

Note, the following line:

COPY tmpfiles.template service.template config.json.template /exports/

This line stages the content needed for the system container in /exports/. The important components here are service.template and config.json.template. The service.template file is the systemd unit template used when the system container is installed via atomic install.

[Unit]
Description=Service for virtual machines hosted on VMware
Documentation=http://github.com/vmware/open-vm-tools
ConditionVirtualization=vmware

[Service]
ExecStartPre=/bin/bash -c 'systemctl import-environment'
ExecStartPre=/bin/bash -c 'export -p > /tmp/open-vm-tools-bash-env'
ExecStart=$EXEC_START
ExecStop=$EXEC_STOP
WorkingDirectory=$DESTDIR

[Install]
WantedBy=multi-user.target

The config.json.template is similar to the Docker run label or line. The excerpt below shows the environment variables, bind mounts, and the command to execute, /usr/bin/init.sh.

{
    "ociVersion": "1.0.0",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "process": {
        "terminal": false,
        "user": {},
        "args": [
            "/usr/bin/init.sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm",
            "NAME=open-vm-tools",
            "SYSTEMD_IGNORE_CHROOT=1"
        ],
...
omitted
    "mounts": [
            {
            "destination": "/run/systemd",
            "type": "bind",
            "source": "/run/systemd",
            "options": [
                "rw",
                "rbind",
                "rprivate"
            ]
        },
...
omitted

Oftentimes, executing a single command via the container is not enough. The init.sh script above stages the container environment and ensures both VGAuthService and vmtoolsd are executed inside the container.

#!/bin/sh
source /tmp/open-vm-tools-bash-env

COMMAND=/usr/local/bin/vmware-toolbox-cmd
if [ ! -e $COMMAND ]
  then
    echo 'runc exec -t open-vm-tools vmware-toolbox-cmd "$@"' > /usr/local/bin/vmware-toolbox-cmd
    chmod +x /usr/local/bin/vmware-toolbox-cmd
fi
exec /usr/bin/VGAuthService -s &
exec /usr/bin/vmtoolsd

Here are the commands to execute via the atomic CLI to install and convert a system container, provided we have already built the open-vm-tools container from the Dockerfile listed above.

atomic pull --storage=ostree docker:open-vm-tools
atomic install --system open-vm-tools
systemctl start open-vm-tools

Similarly, we can pull this container from the Red Hat registry and install it in the same fashion.

atomic pull --storage ostree registry.access.redhat.com/rhel7/open-vm-tools
atomic install --system registry.access.redhat.com/rhel7/open-vm-tools
systemctl start open-vm-tools

The atomic install command installs the systemd unit file from the container’s /exports/ directory and enables the service. The systemctl command that follows starts the service immediately instead of waiting for a reboot.

More examples of system containers can be found here. This includes the open-vm-tools example for CentOS.

4.3. Small and Concise Images

It is preferable to create small and concise images whenever possible. This can be highly dependent on the application you are containerizing, but there are techniques to help you accomplish this. The following sections cover these techniques.

4.3.1. Chaining Commands

In general, having fewer layers improves readability. Commands that are chained together become a part of the same layer. To reduce the number of layers, chain commands together. Find a balance, though, between a large number of layers (and a great many commands), and a small number of layers (and obscurity caused by brevity).

A new layer is created for every new instruction defined. This does not necessarily mean that one instruction should be associated with only one command or definition.

Ensure transparency and provide a good overview of the content of each layer by grouping related operations together so that they together constitute a single layer. Consider this snippet:

Chained Dockerfile instruction
RUN dnf install -y --setopt=tsflags=nodocs \
    httpd vim && \
    systemctl enable httpd && \
    dnf clean all

Each command that is related to the installation and configuration of httpd is grouped together as a part of the same layer. This meaningful grouping of operations keeps the number of layers low while keeping the easy legibility of the layers high.

Using Double Ampersands (&&) vs Semi-colons (;)

In the RUN instruction of Dockerfiles, it is common to string together multiple commands for efficiency. Stringing commands together in RUN instructions is typically done with ampersands or semi-colons. However, you should consider the implications of each and their usage. The following examples illustrate the difference.

Using semi-colons as instruction conjunctions
RUN do_1; do_2

This sort of conjunction will be evaluated as do_1 and then do_2, regardless of whether do_1 succeeds. However, using the double ampersands results in a different evaluation.

Using double ampersands as conjunctions
RUN do_1 && do_2

The ampersands change the resulting evaluation into do_1 and then do_2 only if do_1 was successful.

The use of the double ampersands as conjunctions is a better practice in Dockerfiles because it ensures that your instructions are completed or the build will fail. If the build were to continue and you had not closely monitored the build (or its results), then the image may not be exactly as you desired. This is particularly true with automated build systems where you will want any failure to result in the failure of the build itself.

There are certainly use cases where semi-colons might be preferred and possibly should be used. Nevertheless, the possible result of an incomplete image should be carefully considered.

4.3.2. Clearing Packaging Caches and Temporary Package Downloads

Package managers can typically generate lots of metadata and also store downloaded content into a cache of sorts. To keep images and layers as small as possible, you should consider clearing out these caches of downloaded content. Note how the following example ends with yum -y clean all which removes deletable yum content.

A singular RUN instruction performing multiple commands
RUN yum install -y epel-release && \
    rpmkeys --import file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 && \
    yum install -y --setopt=tsflags=nodocs bind-utils gettext iproute \
    v8314 mongodb24-mongodb mongodb24 && \
    yum -y clean all

There are several package managers beyond yum that should be of note: dnf, rvm, gems, cpan, pip. Most of these managers have some form of a clean-up command that will handle excess cache created while performing their package management duties.

Below are examples pictured for dnf and rvm:

dnf cleanup example
RUN rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && \
    dnf -y install nodejs tar sudo git-all memcached postgresql-devel postgresql-server \
    libxml2-devel libxslt-devel patch gcc-c++ openssl-devel gnupg curl which && \
    dnf clean all

Ruby (rvm) cleanup example
RUN /usr/bin/curl -sSL https://rvm.io/mpapis.asc | gpg2 --import - && \
    /usr/bin/curl -sSL https://get.rvm.io | rvm_tar_command=tar bash -s stable && \
    source /etc/profile.d/rvm.sh && \
    echo "gem: --no-ri --no-rdoc --no-document" > ~/.gemrc && \
    /bin/bash -l -c "rvm requirements && rvm install ruby 2.2.4 && rvm use 2.2.4 --default && \
    gem install bundler rake && \
    gem install nokogiri --use-system-libraries && \
    rvm cleanup all && yum clean all && rvm disk-usage all"

In the above example, notice the yum clean all command called after rvm; this is because some package managers like rvm rely on others (like yum in this case) to help perform their duties. Make sure to examine your container’s layers sizes to help determine where you can eliminate excess size and keep its footprint size to a minimum.

Here is a listing of some package managers and the applicable cleanup commands:

Table 1. Package Managers

Package Manager    Cleanup Command
yum                yum clean all
dnf                dnf clean all
rvm                rvm cleanup all
gem                gem cleanup
cpan               rm -rf ~/.cpan/{build,sources}/*
pip                rm -rf ~/.cache/pip/*
apt-get            apt-get clean

Clearing Package Cache and Squashing

If you squash your images after a manual build or as part of an automated build process, it is not necessary to clean the cache in every single relevant instruction/layer, because squashing collapses the intermediate layers and any files removed in a later layer do not end up in the final image.

Simple example Dockerfiles below would both produce the same image if they were squashed:

Cache cleanup in a separate instruction
FROM fedora
RUN dnf install -y mariadb
RUN dnf install -y wordpress
RUN dnf clean all

Cache cleanup chained with the install command
FROM fedora
RUN dnf install -y mariadb wordpress && dnf clean all

However, without squashing, the first image would contain additional files and would be bigger than the second one:

Size comparison
# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
example             separate            54870d73715f        21 seconds ago       537.7 MB
example             chained             6a6156547888        About a minute ago   377.9 MB

Therefore, it is good practice to write Dockerfiles in a way that others can use as a valid reference and can always reproduce the build. To ensure this, you should clean the cache in every layer where applicable. In general, you should always aim to create images that are small and concise, regardless of whether the final image is squashed or not.

Read more about squashing and its repercussions in the Squashing layers section.

4.3.3. Removing Unnecessary Packages

In some cases, your image can end up with several packages that are not necessary to support the runtime of your application. A good example is when you actually build your application from source during the build of the image itself. Typically, when you build an application, you will pull in development (-devel) packages as well as toolchain-based packages like make and gcc. Once your application is built, you may no longer need these packages for runtime depending on how your application links to libraries.

Depending on your application and which packages you added to your image, you might need to iteratively attempt to remove packages, checking each time that your application still works. One suggestion is to remove big parts of the toolchain and then use your package manager’s command to clean up unused packages. In the case of dnf, you can remove unneeded packages like so:

Removing unnecessary packages with dnf
# dnf autoremove

You should run this command in an interactive shell (docker run -it --rm <image> /bin/bash) initially so you can get a feel for which packages will be removed. One upside to doing so is that you can then test run your application from the interactive shell to make sure it still works.
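
For example, if the toolchain was needed only to compile the application during the image build, a follow-up instruction along these lines can drop it again; this is just a sketch, and the exact package list depends on how your application links to libraries:

RUN dnf remove -y gcc gcc-c++ make && \
    dnf autoremove -y && \
    dnf clean all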

4.3.4. Removing Documentation

Another technique for reducing the image size is limiting the documentation being installed. If your package manager supports such a thing and you have no expectations for users to use a shell to interact with your image, this might significantly reduce the size of your image.

Yum has an optional flag to not install documentation. The following example shows how to set the flag.

RUN yum install -y mysql --setopt=tsflags=nodocs

Note that the nodocs flag is used in some base images, for example CentOS and Fedora, and this setting gets inherited by the child layers. This can cause problems in case you decide to include the documentation in one of your layered images.

In this case, if you wish to have the documentation installed for packages from your single layer only, you have to empty the tsflags option as follows:

RUN yum -y install docker --setopt=tsflags=''

If you wish to have the documentation installed for packages from your single layer and the parent layers, you need to reinstall the packages with the empty tsflags option as follows:

RUN yum -y reinstall "*" --setopt=tsflags='' && yum -y install docker --setopt=tsflags=''

In case you need to have documentation included for every package from every single parent or child layer, the /etc/yum.conf file needs to be edited as follows:

RUN [ -e /etc/yum.conf ] && sed -i '/tsflags=nodocs/d' /etc/yum.conf || true
RUN yum -y reinstall "*"
RUN yum -y install <package>

4.3.5. Squashing Layers

Each instruction you create in your Dockerfile results in a new image layer being created. Each layer brings additional data that is not always part of the resulting image. For example, if you add a file in one layer but remove it in a later layer, the final image’s size will still include the size of the added file, hidden behind a special "whiteout" file. In addition, every layer contains separate metadata that adds to the overall image size as well. So what are the benefits of squashing?

  • Performance - Since all layers are copy-on-write file systems, it will take longer to build the final container from many layers. Squashing helps reduce the build time.

  • Image size - Similarly, since an image is actually a collection of other images, the final image size is the sum of the sizes of component images. With squashing, you can prevent these unwanted size additions.

  • Organization - Squashing also helps you control the structure of an image, reduce the number of layers and organize images logically.

However, Docker does not yet support squashing natively, so you will have to work around it by using alternative approaches, some of which are listed below.

Experimental --squash Flag

As of version 1.13, Docker supports the --squash flag that enables the squashing functionality. This is currently only available in experimental mode.
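
With a daemon running in experimental mode, the flag is passed directly to the build; a minimal sketch, with the image name as a placeholder:

docker build --squash -t myimage .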

docker save

You can use docker save to squash all the layers of your image into a single layer. The save command was not intended for this use, so squashing is rather a side effect of the process. This approach, however, is not very practical for sharing, as the user will only be able to download the whole content and cannot take advantage of layer caching. Note that the base image layer will be included as well and might be several hundred megabytes in size.
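
A closely related way to flatten an image on the command line, not prescribed by this document but commonly used, is to export a container created from the image and re-import its filesystem as a single layer; the names below are placeholders:

docker create --name flat myimage
docker export flat | docker import - myimage:flat
docker rm flat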

Custom Tools

You will surely find a lot of utilities on the internet that facilitate layer squashing. We recommend taking advantage of Marek Goldmann’s docker-squash, which automates layer squashing, is kept up-to-date and has been tested by the community.
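
A minimal sketch of how the tool is typically installed and invoked follows; the tag names are placeholders, so check the project’s documentation for the authoritative usage:

pip install docker-squash
docker-squash -t myimage:squashed myimage:latest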

Repercussions of Squashing
  • When you squash an image, you will lose the history together with the metadata accompanying the layers.

  • Without the metadata, users building on top of a layered image that has been squashed lose the information that squashing took place.

  • Similarly, if you decide to include the parent layer from which your image is built into the resulting squashed image, you ultimately prevent others from seeing that this happened.

Look at the mongodb example:

# docker images openshift/mongodb-24-centos7
REPOSITORY                               TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
docker.io/openshift/mongodb-24-centos7   latest              d7c0c18b0ae4        16 hours ago        593.3 MB

Without squashing, you can see the complete history and how each of the layers occupies space.

# docker history docker.io/openshift/mongodb-24-centos7:latest
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
d7c0c18b0ae4        About an hour ago   /bin/sh -c #(nop) CMD ["run-mongod"]            0 B
63e2ba112add        About an hour ago   /bin/sh -c #(nop) ENTRYPOINT &{["container-en   0 B
ca996db9c281        About an hour ago   /bin/sh -c #(nop) USER [184]                    0 B
8593b9473058        About an hour ago   /bin/sh -c #(nop) VOLUME [/var/lib/mongodb/da   0 B
5eca88b7872d        About an hour ago   /bin/sh -c touch /etc/mongod.conf && chown mo   0 B
9439db8f40ad        About an hour ago   /bin/sh -c #(nop) ADD dir:f38635e83f0e6943cd3   17.29 kB
12c60945cbac        About an hour ago   /bin/sh -c #(nop) ENV BASH_ENV=/usr/share/con   0 B
e6073f9a949f        About an hour ago   /bin/sh -c #(nop) ENV CONTAINER_SCRIPTS_PATH=   0 B
619bf2ae5ed8        About an hour ago   /bin/sh -c yum install -y centos-release-scl    342.6 MB
ab5deeccfe21        About an hour ago   /bin/sh -c #(nop) EXPOSE 27017/tcp              0 B
584ded9dcbca        About an hour ago   /bin/sh -c #(nop) LABEL io.k8s.description=Mo   0 B
17e3bcd28e07        About an hour ago   /bin/sh -c #(nop) ENV MONGODB_VERSION=2.6 HOM   0 B
807a1e9c5a7b        16 hours ago        /bin/sh -c #(nop) MAINTAINER SoftwareCollecti   0 B
28e524afdd05        10 days ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
044c0f15c4d9        10 days ago         /bin/sh -c #(nop) LABEL name=CentOS Base Imag   0 B
2ebc6e0c744d        10 days ago         /bin/sh -c #(nop) ADD file:6dd89087d4d418ca0c   196.7 MB
fa5be2806d4c        7 months ago        /bin/sh -c #(nop) MAINTAINER The CentOS Proje   0 B

See how the history and size change after squashing all layers into a single one (using one of the squashing approaches described above):

# docker history docker.io/openshift/mongodb-24-centos7:squashed
IMAGE               CREATED             CREATED BY          SIZE                COMMENT
90036ed9bd1d        58 minutes ago                          522.1 MB
  • One of the biggest benefits of using layers is the possibility to reuse them. Images are usually squashed into a single big layer, which does not allow for pushing partial updates in individual layers; instead, the whole image needs to be pushed to the registry upon a change. The same applies to pulling the image from the registry.

  • Some users might rely on squashing when it comes to sensitive data. Be cautious, because squashing is not meant to "hide" content. Even though squashing removes intermediate layers from the final image, information about secrets used in those layers will stay in the build cache.

4.4. Labels

Labels in Dockerfiles serve as a useful way to organize and document metadata used to describe an image. Some labels are purely descriptive by nature, like name, whereas others, like RUN, can be used to describe action-oriented metadata. Labels are often leveraged by applications, like atomic or OpenShift, to help the image run as the author intended. Labels are primarily intended for descriptive purposes and can be viewed manually with the docker inspect <image_name> command.
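
For instance, to list only the labels of an image, the inspect output can be narrowed down with a Go template; a minimal sketch, with the image name as a placeholder:

docker inspect --format '{{json .Config.Labels}}' <image_name>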

4.4.1. When are Labels Required?

Labels are never required per se unless your build system or lifecycle management process requires them. However, the use of labels is highly recommended for a number of reasons:

  • As mentioned above, many container related tools can use the label metadata in meaningful ways often contributing to a better user experience.

  • The label metadata is always visible when inspecting the image, so users can at least see it even if their tooling does not make specific use of it. For example, the RUN label essentially documents how you, as the author of the Dockerfile, expect this image to be run.

4.4.2. Descriptive Labels

Descriptive labels are usually alpha-numeric strings used to describe some aspect of the image itself. Examples might be the version and release labels, which could theoretically just be integer based. The following table describes labels that are meant to be purely descriptive in nature.

Table 2. Descriptive labels

Label | Description | Example
changelog-url | URL of a page containing release notes for the image | TBD
name | Name of the image | "rhel7/rsyslog"
version | Version of the image | "7.2"
release | Release number of the image | "12"
architecture | Architecture of the image | "x86_64"
build-date | Date/Time the image was built, as an RFC 3339 date-time | "2015-12-03T10:00:44.038585Z"
vendor | Owner of the image | "Red Hat, Inc."
url | URL with more information about the image | TBD
summary | Brief description of the image | TBD
description | Longer description of the image | TBD
vcs-type | Type of version control used by the container source; generally one of git, hg, svn, bzr, cvs | "git"
vcs-url | URL of the version control repository | TBD
vcs-ref | A reference within the version control repository, e.g. a git commit or a subversion branch | "364a…92a"
authoritative-source-url | The authoritative location in which the image is published | TBD
distribution-scope | Intended scope of distribution for the image: private, authoritative-source-only, restricted, or public | "private"

4.4.3. Action-Oriented Labels

Most action-oriented labels will be used in the context of a docker command in order for the container to behave in the desired way. The following table describes the defined action-oriented labels.

Table 3. Action-oriented labels

Label | Description | Example
debug | Command to run the image with debugging turned on | tbd
help | Command to run the help command of the image | tbd
run | Command to run the image | "docker run -d --privileged --name NAME --net=host --pid=host -v /etc/pki/rsyslog:/etc/pki/rsyslog -v /etc/rsyslog.conf:/etc/rsyslog.conf -v /etc/sysconfig/rsyslog:/etc/sysconfig/rsyslog -v /etc/rsyslog.d:/etc/rsyslog.d -v /var/log:/var/log -v /var/lib/rsyslog:/var/lib/rsyslog -v /run:/run -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -e IMAGE=IMAGE -e NAME=NAME --restart=always IMAGE /bin/rsyslog.sh"
uninstall | Command to uninstall the image | "docker run --rm --privileged -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /bin/uninstall.sh"
install | Command to install the image | "docker run --rm --privileged -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /bin/install.sh"
stop | Command to execute before stopping the container | tbd

Labels are critical to properly identifying your image and influencing how it runs. For the purposes of identification, we recommend that you at least use the following labels:

  • name

  • version

  • release

  • architecture

  • vendor

And for actionable labels, we recommend you use at least the following:

  • RUN

  • INSTALL

  • UNINSTALL

These three are the most critical for ensuring that users run the image in the manner you wish. Furthermore, tools developed to read and act upon this metadata will work correctly.

If you provide a help file that does not follow the standard of a man page, then the HELP label would also be prudent.

Images that are meant to be run in OpenShift are recommended to contain a set of labels as seen in the OpenShift Origin documentation. The labels are namespaced in compliance with the Docker format; that is io.openshift for OpenShift and io.k8s for Kubernetes.

See the following example snippet from the s2i-ruby image:

LABEL io.k8s.description="Platform for building and running Ruby 2.2 applications" \
      io.k8s.display-name="Ruby 2.2" \
      io.openshift.expose-services="8080:http" \
      io.openshift.tags="builder,ruby,ruby22"

For a comprehensive list of recommended labels you might want to consider for your projects, see the Container Application Generic Labels git repository.

4.5. Template

4.6. Starting your Application

Generally the CMD instruction in the Dockerfile is used by docker to start your application when the image or container is started. In the planning section, we provided some reasoning for choosing how to start your application. The following subsections will show how to implement each choice in your Dockerfile.

4.6.1. Calling the Binary Directly

This is the simplest of the choices: you simply call the binary using the CMD instruction or define an ENTRYPOINT in your Dockerfile.

CMD ["/usr/bin/some_binary"]
Using the CMD Instruction

With CMD, you can identify the default command to run from the image, along with options you want to pass to it. If there is no ENTRYPOINT in the Dockerfile, the value of CMD is the command run by default when you start the container image. If there is an ENTRYPOINT in the Dockerfile, the ENTRYPOINT value is run as the command instead, with the value of CMD used as options to the ENTRYPOINT command.

The CMD instruction can be overridden when you run the image. Any time you add an argument to the end of a docker run command, the CMD instruction inside the container is ignored. For example, when running docker run -it myimage bash, whatever command is set in the CMD instruction in your Dockerfile is overridden and bash is run instead.
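
A minimal sketch illustrating this behavior; the image name and the commands are placeholders:

FROM fedora
CMD ["echo", "Hello from CMD"]

docker run --rm myimage            # prints "Hello from CMD"
docker run --rm myimage ls /etc    # the trailing command replaces CMD, so the greeting is not printed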

Using the ENTRYPOINT Instruction

Like CMD, the ENTRYPOINT instruction lets you define the command executed when you run the container image, but it cannot be overridden by arguments you put at the end of the docker run command. If your Dockerfile includes an ENTRYPOINT instruction and there is also a CMD instruction, any arguments on the CMD instruction line are passed to the command defined in the ENTRYPOINT line.

This is the distinct advantage of the ENTRYPOINT instruction over the CMD instruction, because the command being run is not overridden but can be supplemented. Suppose you have an ENTRYPOINT instruction that displays two files. You could easily add an additional file to be displayed by appending it to the docker run command.
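
A minimal sketch of that scenario; the file names and the image name are placeholders:

FROM fedora
ENTRYPOINT ["cat", "/etc/hostname", "/etc/os-release"]

docker run --rm myimage              # displays the two files named in ENTRYPOINT
docker run --rm myimage /etc/hosts   # appends a third file to the same command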

You can override the ENTRYPOINT command by defining a new entrypoint with the --entrypoint="" option on the docker command line.

4.6.2. Using a Script

Using a script to start an application is very similar to calling the binary directly. Again, you use the CMD instruction, but instead of pointing at the binary you point at the script that was injected into the image. The registry.access.redhat.com/rhel7/rsyslog image uses a script to start the rsyslogd application. Let's look at the two relevant instructions in its Dockerfile that make this happen.

The following instruction injects our script (rsyslog.sh) into the image in the bin dir.

ADD rsyslog.sh /bin/rsyslog.sh

The contents of the script are as follows:

#!/bin/sh
# Wrapper to start rsyslog.d with appropriate sysconfig options

echo $$ > /var/run/syslogd.pid

source /etc/sysconfig/rsyslog
exec /usr/sbin/rsyslogd -n $SYSLOGD_OPTIONS

Notice how the script does in fact handle environment variables by sourcing the /etc/sysconfig/rsyslog file. And the CMD instruction simply calls the script.

CMD [ "/bin/rsyslog.sh" ]

4.6.3. Displaying Usage Information

It might not always be desirable to start an application right away. In such a case, we can display usage information with a script instead. A good example is builder images, which are used, as the name suggests, for building applications rather than being run standalone. Let’s take s2i-python as an example:

$ docker run docker.io/centos/python-35-centos7
This is a S2I python-3.5 centos base image:
To use it, install S2I: https://github.com/openshift/source-to-image

Sample invocation:

s2i build https://github.com/sclorg/s2i-python-container.git --context-dir=3.5/test/setup-test-app/ centos/python-35-centos7 python-sample-app


You can then run the resulting image via:
docker run -p 8080:8080 python-sample-app

The s2i-python image leverages the source-to-image tool to build Python applications from the user’s source code. Because the image usage is rather specific and requires the user’s input, providing the instructions by default is very convenient.

So, just like the example in the previous section uses a script as CMD input to run the application, the script in this example outputs valuable information about the image usage.

If you would like to provide more information about your image or don’t want to pass the usage information as the default command, consider including a help file instead.

4.6.4. Using systemd Inside the Container

Extending our example of starting an application with a script, we could just as easily use systemd to start the rsyslog application. To use systemd to start a service that has a unit file, we need to tell systemd to enable the service and then let the init process handle the rest. So instead of the ADD instruction used earlier, we would use a RUN instruction to enable the service.

RUN systemctl enable rsyslog

And then we need to change the CMD instruction to call /usr/sbin/init to let systemd take over.

CMD ["/usr/sbin/init"]
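
Put together, a minimal sketch of such a Dockerfile might look like the following. Note that running systemd inside a container typically also requires extra privileges and cgroup access at run time, which are not shown here:

FROM rhel7
RUN yum -y install rsyslog && yum clean all && \
    systemctl enable rsyslog
CMD ["/usr/sbin/init"]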

4.6.5. Using Systemd to Control Containers

The control mechanism for most docker functions is done via the docker commands or something like the atomic application which simplifies the management of containers and images for users. But in a non-development environment, you may wish to treat your containers more like traditional services or applications. For example, you may wish to have your containers start in a specific order on boot-up. Or perhaps you wish to be able to restart (or recycle) a container because you have changed its configuration file.

There are several approaches to these sorts of functions. You can make sure a specific container always starts on boot-up using the --restart switch with the docker command line when you initially run the image. There are also orchestration platforms like Kubernetes that will allow you to determine the start-up order of containers even when they are distributed. But in the case where all the containers reside on a single node, systemd might be exactly the right solution. As with traditional services, systemd is capable of making sure that services start, and that they start in the order specified. Moreover, any issues with startup or the container are logged like those of any other system service.

When using systemd to manage your containers, you are really using systemd to call docker commands (and subsequently the docker daemon) to perform the actions. Therefore, once you commit to using systemd to control a container, you need to make sure that all start, stop, and restart actions are conducted through systemd. Failure to do so essentially decouples the docker daemon from systemd, causing the two to fall out of sync.

In review, systemd is a good solution for:

  • host system services such as agents and long-running services

  • logging via journald

  • service dependency management

  • traditional service management via systemctl

  • multi-container applications with dependencies on the same node

The configuration file below is a sample service file that can be used and edited to control your image or container. In the [Unit] section, you can declare other services needed by your image including the cases where those services are also images.

Sample template for a systemd service file
[Unit]
After=docker.service
Requires=docker.service
PartOf=docker.service
After=[cite another service]
Wants=[cite another service]

[Service]
EnvironmentFile=[path to configuration file]
ExecStartPre=-[command to execute prior to starting]
ExecStart=[command to execute for start]
ExecStartPost=/usr/bin/sleep 10
ExecStop=[command to execute for stop]
Restart=always

[Install]
WantedBy=docker.service

In the [Service] section, you can also declare the actual commands that should be run prior to start, on start, and on stop. These commands can be plain commands or docker run (or docker stop) commands. Finally, if you are using a well-made image that contains labels like STOP or RUN, you can also use the atomic command. For example, a start command could simply be:

atomic run <image_name>

This works because the actual docker command to run that image is part of the image’s metadata and atomic is capable of extracting it.

The [Service] section also has an option for EnvironmentFile. In a traditional, non-containerized systemd service, this configuration file resides in /etc/sysconfig/<service_name>. In the case of a containerized application, these configuration files are not always configurable and therefore do not necessarily reside on the host’s filesystem. Where they are configurable, the EnvironmentFile is usually more relevant to how the containerized application itself is started. If you are starting the application within an image with systemd, then systemd will use /etc/sysconfig/<service_name> within the image itself.
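
As an illustration, a minimal filled-in unit file for a hypothetical container named myapp might look like the following; all names and paths are placeholders:

[Unit]
Description=myapp container
After=docker.service
Requires=docker.service

[Service]
# remove any leftover container with the same name before starting
ExecStartPre=-/usr/bin/docker rm -f myapp
ExecStart=/usr/bin/docker run --name myapp myimage
ExecStop=/usr/bin/docker stop myapp
Restart=always

[Install]
WantedBy=multi-user.target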

For more information on writing unit files see Managing Systemd unit files.

4.7. Creating a Help File

You can provide a man-like help file with your images that allows users to gain a deeper understanding of your image. A help file lets you provide a:

  • more verbose description of what the image is and does

  • understanding of how the image should be run

  • description of the security implications inherent in running the image

  • note on whether the image needs to be installed

You can use the atomic application to display this help file trivially like so:

# atomic help <image or container name>

4.7.1. Location

For the atomic tool to be able to parse your help file, it must be located in the image as /help.1 and in the 'man' format.

4.7.2. Required headings

The following headings are strongly encouraged in the help file for an image.

NAME

Image name with short description.

DESCRIPTION

Describe in greater detail the role or purpose of the image (application, service, base image, builder image, etc.).

USAGE

Describe how to run the image as a container and what factors might influence the behavior of the image itself. Provide specific command lines that are appropriate for how the container should be run.

ENVIRONMENT VARIABLES

Explain all environment variables available to run the image in different ways without the need of rebuilding the image.

HISTORY

Similar to a Changelog of sorts which can be as detailed as the maintainer desires.

4.7.3. Optional Headings

Use the following sections in your help file when applicable.

LABELS

Describe LABEL settings (from the Dockerfile that created the image) that contain pertinent information. For containers run by atomic, that could include INSTALL, RUN, UNINSTALL, and UPDATE LABELS. For more information see Container Application Generic Labels.

SECURITY IMPLICATIONS

If your image uses any privileges that you want to make the user aware of, be sure to document which ones are used and optionally why.

4.7.4. Sample Template

We recommend writing the help file in the markdown language and then converting it to the man format. This is handy because GitHub can natively display markdown, so the help file can be used in multiple ways. For a template, see the example help file below and the template markdown file that can also be obtained here.

Sample help template in markdown format
% IMAGE_NAME (1) Container Image Pages
% MAINTAINER
% DATE

# NAME
Image_name - short description

# DESCRIPTION
Describe how image is used (user app, service, base image, builder image, etc.), the services or features it provides, and environment it is intended to run in (stand-alone docker, atomic super-privileged, oc multi-container app, etc.).

# USAGE
Describe how to run the image as a container and what factors might influence the behavior of the image itself. Provide specific command lines that are appropriate for how the container should be run. Here is an example for a container image meant to be run by the atomic command:

To pull the container and set up the host system for use by the XYZ container, run:

    # atomic install XYZimage

To run the XYZ container (after it is installed), run:

    # atomic run XYZimage

To remove the XYZ container (not the image) from your system, run:

    # atomic uninstall XYZimage

Also, describe the default configuration options (when defined): default user, exposed ports, volumes, working directory, default command, etc.

# ENVIRONMENT VARIABLES
Explain all the environment variables available to run the image in different ways without the need of rebuilding the image. Change variables on the docker command line with -e option. For example:

MYSQL_PASSWORD=mypass
                The password set for the current MySQL user.

# LABELS
Describe LABEL settings (from the Dockerfile that created the image) that contain pertinent information.
For containers run by atomic, that could include INSTALL, RUN, UNINSTALL, and UPDATE LABELS.


# SECURITY IMPLICATIONS
If you expose ports or run with privileges, note those and provide an explanation. For example:

Root privileges
    Container is running as root. Explain why is it necessary.

-p 3306:3306
    Opens container port 3306 and maps it to the same port on the Host.

--net=host --cap-add=net_admin
     Network devices of the host are visible inside the container and can be configured.

# HISTORY
Similar to a Changelog of sorts which can be as detailed as the maintainer wishes.

# SEE ALSO
References to documentation or other sources. For example:

Does Red Hat provide MariaDB technical support on RHEL 7? https://access.redhat.com/solutions/1247193
Install and Deploy a Mariadb Container Image https://access.redhat.com/documentation/en/red-hat-enterprise-linux-atomic-host/7/single/getting-started-guide/#install_and_deploy_a_mariadb_container

4.7.5. Example Help File for the rsyslog Container

% RSYSLOG (1) Container Image Pages
% Stephen Tweedie
% January 27, 2016

# NAME
rsyslog \- rsyslog container image

# DESCRIPTION
The rsyslog image provides a containerized packaging of the rsyslogd daemon. The rsyslogd daemon is a
utility that supports system message logging. With the rsyslog container installed and running, you
can configure the rsyslogd service directly on the host computer as you would if the daemon were
not containerized.

You can find more information on the rsyslog project from the project Web site (http://www.rsyslog.com/doc).

The rsyslog image is designed to be run by the atomic command with one of these options:

`install`

Sets up the container to access directories and files from the host system to use for rsyslogd configuration,
logging, log rotation, and credentials.

`run`

Starts the installed container with selected privileges to the host and with logging-related files and
directories bind mounted inside the container. If the container stops, it is set to always restart.

`uninstall`

Removes the container from the system. This removes the syslog logrotate file, leaving all other files
and directories associated with rsyslogd on the host system.

Because privileges are opened to the host system, the running rsyslog container can gather log messages
from the host and save them to the filesystem on the host.

The container itself consists of:
    - rhel7/rhel base image
    - rsyslog RPM package

Files added to the container during docker build include: /bin/install.sh, /bin/rsyslog.sh, and /bin/uninstall.sh.

# USAGE
To use the rsyslog container, you can run the atomic command with install, run, or uninstall options:

To set up the host system for use by the rsyslog container, run:

  atomic install rhel7/rsyslog

To run the rsyslog container (after it is installed), run:

  atomic run rhel7/rsyslog

To remove the rsyslog container (not the image) from your system, run:

  atomic uninstall rhel7/rsyslog

# LABELS
The rsyslog container includes the following LABEL settings:

That atomic command runs the docker command set in this label:

`INSTALL=`

  LABEL INSTALL="docker run --rm --privileged -v /:/host \
  -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME \
  IMAGE /bin/install.sh"

  The contents of the INSTALL label tells an `atomic install rhel7/rsyslog` command to remove the container
  after it exits (--rm), run with root privileges open to the host, mount the root directory (/) from the host on
  the /host directory within the container, set the location of the host file system to /host, set the name of
  the image and run the install.sh script.

`RUN=`

  LABEL RUN="docker run -d --privileged --name NAME \
  --net=host --pid=host \
  -v /etc/pki/rsyslog:/etc/pki/rsyslog \
  -v /etc/rsyslog.conf:/etc/rsyslog.conf \
  -v /etc/sysconfig/rsyslog:/etc/sysconfig/rsyslog \
  -v /etc/rsyslog.d:/etc/rsyslog.d \
  -v /var/log:/var/log \
  -v /var/lib/rsyslog:/var/lib/rsyslog \
  -v /run:/run \
  -v /etc/machine-id:/etc/machine-id:ro \
  -v /etc/localtime:/etc/localtime:ro \
  -e IMAGE=IMAGE -e NAME=NAME \
  --restart=always IMAGE /bin/rsyslog.sh"

  The contents of the RUN label tells an `atomic run rhel7/rsyslog` command to open various privileges to the host
  (described later), mount a variety of host files and directories into the container, set the name of the container,
  set the container to restart automatically if it stops, and run the rsyslog.sh script.

`UNINSTALL=`

  LABEL UNINSTALL="docker run --rm --privileged -v /:/host \
  -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME \
  IMAGE /bin/uninstall.sh"

  The contents of the UNINSTALL label tells an `atomic uninstall rhel7/rsyslog` command to uninstall the rsyslog
  container. Stopping the container in this way removes the container, but not the rsyslog image from your system.
  Also, uninstalling leaves all rsyslog configuration files and log files intact on the host (only removing the
  syslog logrotate file).

`BZComponent=`

The Bugzilla component for this container. For example, BZComponent="rsyslog-docker".

`Name=`

The registry location and name of the image. For example, Name="rhel7/rsyslog".

`Version=`

The Red Hat Enterprise Linux version from which the container was built. For example, Version="7.2".

`Release=`

The specific release number of the container. For example, Release="12.1.a".

`Architecture=`

The machine architecture associated with the Red Hat Enterprise Linux release. For example, Architecture="x86_64".

When the atomic command runs the rsyslog container, it reads the command line associated with the selected option
from a LABEL set within the Docker container itself. It then runs that command. The following sections detail
each option and associated LABEL:

# SECURITY IMPLICATIONS
The rsyslog container is what is referred to as a super-privileged container. It is designed to have almost complete
access to the host system as root user. The following docker command options open selected privileges to the host:

`-d`

Runs continuously as a daemon process in the background

`--privileged`

Turns off security separation, so a process running as root in the container would have the same access to the
host as it would if it were run directly on the host.

`--net=host`

Allows processes run inside the container to directly access host network interfaces

`--pid=host`

Allows processes run inside the container to see and work with all processes in the host process table

`--restart=always`

If the container should fail or otherwise stop, it would be restarted

# HISTORY
Similar to a Changelog of sorts which can be as detailed as the maintainer wishes.

# AUTHORS
Stephen Tweedie

4.7.6. Converting Markdown to man Format

There are several methods for converting markdown format to man format. One prevalent method is to use go-md2man, supplied by the golang-github-cpuguy83-go-md2man package. To convert from markdown to man using this utility, run:

go-md2man -in path_to_markdown_file -out output_file
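
For example, assuming the help text is maintained in a file named help.md (the file name is just an illustration), the conversion and the corresponding Dockerfile instruction might look like this:

go-md2man -in help.md -out help.1

ADD help.1 /help.1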

4.8. The Dockerfile Linter

4.8.1. What Is the Linter?

dockerfile_lint is a rule-based 'linter' for verifying Dockerfiles. The rules are used to check file syntax and best-practice options for things such as:

  • Was a yum cache cleanup invoked after a package installation?

  • In a RUN instruction, did the writer chain commands with semicolons or double ampersands (&&)?

These are determined by the rules author and are typically defined by best practices and writer requirements. The input rules are defined via a set of yaml files.

At the time of this writing, there are a number of templates from base to automated build configurations.

4.8.2. Where do I Get the Linter and How do I Install it?

There are two iterations of the linter.

For the next section, we will assume that the CLI version of the linter will be installed manually via npm using the following commands:

git clone https://github.com/projectatomic/dockerfile_lint/
cd dockerfile_lint
npm install

4.8.3. Where do I Get the Templates?

Built-in Templates

In the config directory, there are two base ruleset files. If the dockerfile_lint is executed without -r these are the base rules used.

  • dockerfile_lint/config/default_rules.yaml

  • dockerfile_lint/config/base_rules.yaml

Additional Types of Templates

In the sample_rules directory there are some included templates for OpenShift and an example base template.

  • Basic rules - dockerfile_lint/sample_rules/basic_rules.yaml

This set of rules is a basic catchall for a typical Dockerfile. Things such as yum cache cleanup and command execution etiquette are checked. These are the rules we will be referencing below.

  • OpenShift Template - dockerfile_lint/sample_rules/openshift.yaml

In addition to testing the semantics of the basic template above, the OpenShift template checks for some required OpenShift labels specific to its use.

4.8.4. How do I Read and Customize the Templates?

The filename of the basic template is sample_rules/basic_rules.yaml, as listed above.

The rules are implemented using regular expressions matched against instructions of the Dockerfile. The rule file has three sections: a profile section, a line rule section and a required instructions section.

Profile Section

The profile section gives information about the rule file. This is the name identifier and description for the profile. This information should help users to identify an applicable template.

profile:
  name: "Default"
  description: "Default Profile. Checks basic syntax."
  includes:
    - recommended_label_rules.yaml

An excerpt from the rules shows how includes are defined:

includes:
  - recommended_label_rules.yaml

The includes section allows for chaining rulesets from multiple sources. In the above example, recommended_label_rules.yaml is processed in addition to the rules in the current file.

Line Rule Section

This section contains rules that match on a given instruction in the Dockerfile. The line rules do the bulk of the Dockerfile parsing.

The example below shows rules to run against the 'FROM' instruction.

  FROM registry.access.redhat.com/rhel7:latest

The excerpt below checks for the latest flag in the 'FROM' line.

line_rules:
  FROM:
    paramSyntaxRegex: /^[a-z0-9./-]+(:[a-z0-9.]+)?$/
    rules:
      -
        label: "is_latest_tag"
        regex: /latest/
        level: "error"
        message: "base image uses 'latest' tag"
        description: "using the 'latest' tag may cause unpredictable builds. It is recommended that a specific tag is used in the FROM line or *-released which is the latest supported release."
        reference_url:
          - "https://docs.docker.com/reference/builder/"
          - "#from"

Here is another example that parses the 'RUN' line.

  RUN yum -y --disablerepo=\* --enablerepo=rhel-7-server-rpms install yum-utils && \
    yum-config-manager --disable \* && \
    yum-config-manager --enable rhel-7-server-rpms && \
    yum clean all

  RUN yum -y install file open-vm-tools perl open-vm-tools-deploypkg net-tools && \
    yum clean all

The regex below checks whether the yum command has been issued. If it has, it checks whether yum clean all has been run as well.

  RUN:
    paramSyntaxRegex: /.+/
    rules:
      -
        label: "no_yum_clean_all"  # a short description of the rule
        regex: /yum(?!.+clean all|.+\.repo)/g  # the regex the linter attempts to match
        level: "warn"  # warn, error or info: these results define how the linter exits
        message: "yum clean all is not used"
        description: "the yum cache will remain in this layer making the layer unnecessarily large"
        # lastly, any best practice documentation that may be pertinent to the rule
        reference_url:
          - "http://docs.projectatomic.io/container-best-practices/#"
          - "_clear_packaging_caches_and_temporary_package_downloads"

Required Instructions Section

While the line rules section uses regular expressions, the required instructions section simply checks that a given instruction is present in the Dockerfile.

required_instructions:
  -
    instruction: "EXPOSE"
    count: 1
    level: "info"
    message: "There is no 'EXPOSE' instruction"
    description: "Without exposed ports how will the service of the container be accessed?"
    reference_url:
      - "https://docs.docker.com/reference/builder/"
      - "#expose"

4.8.5. How do I use the Linter?

Execution of the CLI version of the linter may look like this:

dockerfile_lint -f /path/to/dockerfile -r sample_rules/basic_rules.yaml

Here is some sample output from the command above:

--------ERRORS---------

ERROR: Maintainer is not defined. The MAINTAINER line is useful for identifying the author in the form of MAINTAINER Joe Smith <joe.smith@example.com>.
Reference -> https://docs.docker.com/reference/builder/#maintainer

--------INFO---------

INFO: There is no 'ENTRYPOINT' instruction. None.
Reference -> https://docs.docker.com/reference/builder/#entrypoint

By default, the linter runs in strict mode (errors and/or warnings result in a non-zero return code). Run the command with '-p' or '--permissive' to run in permissive mode:

dockerfile_lint  -p -f /path/to/dockerfile

This allows for quick and automated testing of what is informational and what needs to be addressed immediately.
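
For example, in a simple build script the linter's exit code can be used to gate the image build; the paths and image name below are placeholders:

dockerfile_lint -f Dockerfile -r sample_rules/basic_rules.yaml && docker build -t myimage .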

5. Building Images

5.1. Simple Build

For building docker images, we must first have the docker daemon installed and running:

#> dnf install -y docker
#> systemctl start docker

Then we can download an image that we’ll use as a base of our image. Let’s use something we trust, for example Red Hat Enterprise Linux 7:

#> docker pull rhel7

Then, one way to create an image, is to simply layer your content on top of the running container. Run the container, make your changes and use docker commit to "commit" your changes:

#> docker run -ti --name mycont rhel7 bash
[root@a1eefecdacfa /]# echo Hello Dojo > /root/greeting
[root@a1eefecdacfa /]# exit
#> docker commit mycont
0bdcfc5ba0602197e2ac4609b8101dc8eaa0d8ab114f542ab6b2f15220d0ab22

However, this approach is not easily reproducible and is not ideal for more complicated scenarios. To ensure the build can be reproduced, use a Dockerfile instead.

The following example produces the same result as the example before, except that we can repeat it as many times as we want and always get the same output. It also helps in understanding Docker itself more as a packaging format than just a virtualization technology:

#> cat Dockerfile
FROM rhel7
RUN echo Hello Dojo > /root/greeting

#> docker build .
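
To give the resulting image a name that can be referred to later, it is common to tag it during the build; a minimal sketch with an arbitrary tag:

#> docker build -t mygreeting:1.0 .
#> docker run --rm mygreeting:1.0 cat /root/greeting
Hello Dojo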

5.2. Build Environment

A build environment should have the following characteristics:

  • is secure by limiting direct access to the build environment

  • limits access to configure and trigger builds

  • limits access to build sources

  • limits access to base images, those images referenced in the FROM line of a Dockerfile

  • provides access to build logs

  • provides some type of a pipeline or workflow, integrating with external services to trigger builds, report results, etc.

  • provides a way to test built images

  • provides a way to reproduce builds

  • provides a secure registry to store builds

  • provides a mechanism to promote tested builds

  • shares the same kernel as the target production runtime environment

A build environment that meets these requirements is difficult to create from scratch. An automation engine like Jenkins is essential to managing a complex pipeline. While a virtual machine-based solution could be created, it is recommended that a dedicated, purpose-built platform such as OpenShift be used.

6. Testing

6.1. What Should be Tested

Container images generally consist of distribution packages and some scripts that help start the container properly. For example, a container with a MariaDB database typically consists of a set of RPMs, such as mariadb-server, that provides the main functionality, and scripts that handle initialization, setup, etc. A simplified example of a MariaDB Dockerfile may look like this:

FROM fedora:24

RUN dnf install -y mariadb-server && \
    dnf clean all && \
    /usr/libexec/container-setup

ADD run-mysqld /usr/bin/
ADD container-setup /usr/libexec/

VOLUME ["/var/lib/mysql/data"]
USER 27

CMD ["run-mysqld"]

If we want to test the basic functionality of the container, we do not have to test the RPM functionality. The benefit of using distribution packaging is that we know testing has already been done during the RPM development process. Instead, we need to focus on testing the added scripts and the API of the container. The goal is to determine whether it works as described. For example, for the MariaDB database container, we do not run the MariaDB unit tests in the container. Instead, we focus on whether the database is initialized, configured, and responds to commands properly.

6.2. Conventions for Test Scripts

It is good practice to keep the basic sanity tests for the image together with the image sources. For example, test/run might be a script that tests the image specified by the IMAGE_NAME environment variable, so that users can specify which image should be tested.
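
With that convention, running the tests against a candidate image becomes a one-liner; the image name is a placeholder:

$ IMAGE_NAME=myorg/mariadb-candidate ./test/run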

6.3. Examples of Test Scripts

These examples of test scripts can be found in the container images for Software Collections:

A minimal script that verifies a container image by running it as a daemon and then running a script that checks the proper functionality is shown below. It stores the IDs of the containers created during the test in a temporary directory, which makes it easy to clean up those containers after the test finishes.

#!/bin/bash
#
# General test of the image.
#
# IMAGE_NAME specifies the name of the candidate image used for testing.
# The image has to be available before this script is executed.
#

set -exo nounset
shopt -s nullglob

IMAGE_NAME=${IMAGE_NAME-default-image-name}
CIDFILE_DIR=$(mktemp --suffix=test_cidfiles -d)

# clears containers run during the test
function cleanup() {
  for cidfile in $CIDFILE_DIR/* ; do
    CONTAINER=$(cat $cidfile)

    echo "Stopping and removing container $CONTAINER..."
    docker stop $CONTAINER
    exit_status=$(docker inspect -f '{{.State.ExitCode}}' $CONTAINER)
    if [ "$exit_status" != "0" ]; then
      echo "Dumping logs for $CONTAINER"
      docker logs $CONTAINER
    fi
    docker rm $CONTAINER
    rm $cidfile
    echo "Done."
  done
  rmdir $CIDFILE_DIR
}
trap cleanup EXIT

# returns ID of specified named container
function get_cid() {
  local id="$1" ; shift || return 1
  echo $(cat "$CIDFILE_DIR/$id")
}

# returns IP of specified named container
function get_container_ip() {
  local id="$1" ; shift
  docker inspect --format='{{.NetworkSettings.IPAddress}}' $(get_cid "$id")
}

# runs command to test running container
function test_image() {
  local name=$1 ; shift
  echo "  Testing Image"
  docker run --rm $IMAGE_NAME get_status `get_container_ip $name`
  echo "  Success!"
}

# start a new container
function create_container() {
  local name=$1 ; shift
  cidfile="$CIDFILE_DIR/$name"
  # create container with a cidfile in a directory for cleanup
  docker run --cidfile $cidfile -d $IMAGE_NAME
  echo "Created container $(cat $cidfile)"
}


# Tests.

create_container test1
test_image test1

7. Delivering Images

7.1. Image Naming

This section describes the image naming standards for Docker images.

Docker URLs are similar to GitHub repository names. Their structure is:

REGISTRY:PORT/USER/REPO:TAG

The implicit default registry is docker.io. This means that relative URLs, such as redhat/rhel, resolve to docker.io/redhat/rhel. The "registry" and "repository" elements must be present in the names of images. "Port", "user", and "tag" are not always required, and are not always present.

Here is an example showing search results for a query targeting Fedora images:

$ sudo docker search fedora
INDEX       NAME                                             DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
docker.io   docker.io/fedora                                 Official Fedora 21 base image and semi-off...   172       [OK]
docker.io   docker.io/fedora/apache                                                                          30                   [OK]
docker.io   docker.io/fedora/couchdb                                                                         30                   [OK]
docker.io   docker.io/fedora/mariadb                                                                         22                   [OK]
docker.io   docker.io/fedora/ssh                                                                             19                   [OK]

Here we see a search that shows an image name in the REGISTRY/USER/REPO format in the NAME column:

$ sudo docker search zdover23
INDEX       NAME                               DESCRIPTION   STARS     OFFICIAL   AUTOMATED
docker.io   docker.io/zdover23/fed20publican                 0

For more information on naming recommendations, see this document.