TL;DR: Reduce the Docker image size of your Go application by compiling it in a Go image and copying only the compiled binary into a plain Alpine image to be executed.


Docker multistage builds are a great way to optimise your Docker images by leaving out packages that are needed only during the build process. This allows us to create significantly smaller images, with Dockerfiles that are easier to read and maintain. As an added benefit, the container’s attack surface should be smaller if fewer packages are installed.

To demonstrate the impact of a multistage build, we will create a Hello World application written in Go, build a multistage image for it, and compare its size with the golang base images that a single-stage build would start from.

How does a multistage build work?

Let’s take the example of building a Docker image for a Go application. A traditional build process would probably use a standard golang image pulled from Docker Hub as a base image, copy in the Go source code, run a build command, and execute the compiled binary when the container is started. There is nothing inherently wrong with this approach: the golang base image has the Go programming language preinstalled, along with all commonly required dependencies for the build process, and provides an environment in which the compiled application can run.

However, once the application has been built into an executable binary, the source code, Go modules and other dependencies, and even the Go programming language itself become redundant. All we need now is the executable file and the operating system on which to execute it. We can inspect the unmodified base images pulled from Docker Hub to see how much space can be saved by running your application on a stripped-down image containing only an operating system.

### Go installed on Debian Buster OS
$ docker pull golang:1.14
$ docker image ls | grep golang | grep 1.14
golang                                        1.14                 8a195c689057   4 months ago    810MB

### Go installed on Alpine OS
$ docker pull golang:alpine
$ docker image ls | grep golang | grep alpine
golang                                        alpine               1463476d8605   3 weeks ago     299MB

### Alpine OS without Go
$ docker pull alpine:latest
$ docker image ls | grep alpine | grep latest
alpine                                        latest               389fef711851   4 weeks ago     5.58MB

As you can see, there is a huge difference in the sizes of the images: just installing Go on Alpine makes the image more than 50 times bigger (299MB compared with 5.58MB). None of this extra bulk is used after the build stage.
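To put numbers on this, the short Go program below divides the sizes from the listings above. It is only a sketch: the constants are copied by hand from this particular docker image ls output (your tags and sizes will differ), and they are the uncompressed local sizes rather than what a registry actually stores.

```go
package main

import "fmt"

// Sizes in MB, copied from the docker image ls listings above.
// Assumption: these are uncompressed local sizes; registries store compressed layers.
const (
	golangDebianMB = 810.0  // golang:1.14
	golangAlpineMB = 299.0  // golang:alpine
	alpineMB       = 5.58   // alpine:latest
)

// ratio reports how many times larger image a is than image b.
func ratio(a, b float64) float64 {
	return a / b
}

func main() {
	fmt.Printf("golang:alpine vs alpine: %.1fx\n", ratio(golangAlpineMB, alpineMB))
	fmt.Printf("golang:1.14 vs alpine:   %.1fx\n", ratio(golangDebianMB, alpineMB))
}
```

It prints 53.6x for golang:alpine and 145.2x for golang:1.14, consistent with the "more than 50 times" figure for Go on Alpine.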

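An optimised, single-stage Dockerfile might try to deal with this by uninstalling these unnecessary dependencies in the final steps of the Docker build. However, files removed in a later step still take up space in the earlier image layers that added them, and we shouldn’t need to do this anyway!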

This is where multistage builds are really helpful. A multistage build uses two or more images, or stages, within a single Dockerfile. Each stage can be given a name, which later stages can refer to in order to copy files out of it. So, for a Go application, we can use a golang image to follow the traditional build process described above, but stop before telling the image to execute the application when it runs. At this point, we move to a basic Alpine image, which will be used as the final output. Since the application has already been compiled in the Go image, the Alpine image simply needs to copy in the compiled executable and run it.

How to write a multistage Dockerfile

The Docker docs have an excellent page on using multistage builds, but we will run through a simple build process for a “Hello World” Go application below.

Step 1: Write the Go code

First, we need to write some source code to create an application - a very simple “Hello World”.

In a new folder, create a new file called hello-world.go and add the following code:

package main

import "fmt"

func main() {
	fmt.Println("Hello world!")
}

Step 2: Write the Dockerfile

In the same folder as your hello-world.go file, create a new file called Dockerfile.

We’ll start with some steps that should look familiar if you’re used to writing single-stage Dockerfiles for Go, but note the addition of a name ("build") on the FROM line. This will allow us to refer to the Go image by name later in the build process:

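FROM golang:alpine AS build

COPY ./hello-world.go /app/
WORKDIR /app

# Naming the source file lets go build run without a go.mod file,
# which module mode (the default since Go 1.16) would otherwise require
RUN go build -o /app/hello-world hello-world.go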

So far, we have copied the source code into the /app directory of the build image, made /app the working directory, and compiled the application. After these steps, there should be an executable file called hello-world within /app.

We don’t want to execute the application in the build image, so we move on to the next stage and create an image based on alpine:latest. Add the following to the end of the Dockerfile:

FROM alpine:latest

COPY --from=build /app/hello-world /app/

CMD ["/app/hello-world"]

And that’s it! Note that we didn’t need to give the Alpine image a name on its FROM line: since this stage produces the final image, no later stage will ever need to refer to it.

Step 3: Build the Docker image

The image is now ready to build. From the directory containing your Dockerfile and hello-world.go:

$ docker build -t multistage-hello-world .

It should only take a couple of seconds to build the image, and then you can run it:

$ docker run multistage-hello-world
Hello world!

Now that we know the build was successful, we can look at the image details to see how small the image is:

$ docker image ls
REPOSITORY                                    TAG                  IMAGE ID       CREATED          SIZE
multistage-hello-world                        latest               2997402f69ba   2 minutes ago    7.61MB

Clearly, the application image is significantly smaller than would be possible with either golang image, both of which are at least ~300MB before an application is even added.
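Most of those 7.61MB are the 5.58MB alpine:latest base image, so the remaining ~2MB is roughly the compiled binary. To check that from inside the container, one option is a hypothetical variant of hello-world.go that reports its own file size using os.Executable and os.Stat from the standard library (executableSize is our own helper name, not part of the original example):

```go
package main

import (
	"fmt"
	"os"
)

// executableSize returns the size in bytes of the binary that is currently running.
func executableSize() (int64, error) {
	path, err := os.Executable() // absolute path of the running binary
	if err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	fmt.Println("Hello world!")
	size, err := executableSize()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not stat executable:", err)
		os.Exit(1)
	}
	fmt.Printf("binary size: %.2f MB\n", float64(size)/1e6)
}
```

Built into the same multistage image, it should print a figure in the region of that ~2MB difference, though the exact size depends on the Go version used to compile it.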

Why?

You might be wondering why you would bother doing this. After all, Docker is already a relatively lightweight solution… Adding extra stages will only add complexity and time to your Docker builds… Does the size of a container really matter on a server…?

Although you are unlikely to even come close to filling a server’s storage capacity with Docker containers for your application, there are several reasons why you might want to minimise the size anyway:

  • Registry charges - many cloud container registry services charge per GB, for both data transfer bandwidth and storage. For example, AWS ECR allows up to 500MB to be stored within its free tier. Using a full golang:alpine base image, you wouldn’t even be able to fit 2 images into the free tier, whereas around 65 images of the size created by our multistage build could be stored for free.
  • Registry storage - even if you are using a self-hosted container registry where pricing is not an issue, you may find storage capacity becomes one if you are storing different versions of your image (perhaps one per application release). You will quickly notice the storage filling up if you are building large images on a regular release cycle.
  • Bandwidth - as well as avoiding registry bandwidth charges from cloud providers, smaller images are simply quicker to transfer between your servers and registries during builds and deployments. Quicker transfers mean less potential downtime.
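The free-tier arithmetic in the first point can be checked with a few lines of Go. This sketch makes simplifying assumptions, also noted in the comments: every image counts in full against the 500MB quota, using the uncompressed sizes from docker image ls. A real registry stores compressed layers and can share identical layers between images, so actual usage would be lower.

```go
package main

import "fmt"

// imagesThatFit returns how many whole images of imageMB fit within quotaMB.
// Assumption: each image is stored in full at its uncompressed size, ignoring
// layer compression and layers shared between images.
func imagesThatFit(quotaMB, imageMB float64) int {
	return int(quotaMB / imageMB) // truncate: only whole images count
}

func main() {
	const freeTierMB = 500.0 // free-tier storage figure quoted above
	fmt.Printf("golang:alpine-sized images (299MB): %d\n", imagesThatFit(freeTierMB, 299))
	fmt.Printf("multistage images (7.61MB):         %d\n", imagesThatFit(freeTierMB, 7.61))
}
```

It prints 1 for images the size of golang:alpine and 65 for images the size of our multistage build.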

With regard to complexity and build time, multistage builds are more of a reorganisation of your existing Dockerfiles than an addition. You will often find you don’t need to add more than 2 lines per stage: an additional FROM, and a COPY --from to get the build artifacts from a previous stage.