Using Docker Multistage Builds to Optimise Go Images
TL;DR Reduce Docker image size for your Go application by building it in a Go container and copying the compiled file to a plain Alpine container to be executed.
Docker multistage builds are a great way to optimise your Docker images by leaving out packages that are only needed during the build process. This allows us to create significantly smaller images from Dockerfiles that remain easy to read and maintain. As an added benefit, the container’s attack surface should be smaller if fewer packages are installed.
To demonstrate the impact of implementing a multistage build, we will create a Hello World application written in Go, and build images for it with and without multistage builds.
How does a multistage build work?
Let’s take the example of building a Docker image for a Go application. A traditional build process would probably use a standard golang image pulled from Docker Hub as a base image, copy in the Go source code, run a build command, and execute the compiled binary when the container is started.
There is nothing inherently wrong with this approach - the golang base image has the Go programming language preinstalled, along with all commonly required dependencies for the build process, and has an environment in which the compiled application can run.
However, after building the application and creating the executable binary, the source code, Go modules and other dependencies, and even the Go toolchain itself, become redundant. All we need now is the executable file and an operating system on which to execute it. We can inspect the unmodified base images pulled from Docker Hub to see how much space can be saved by running the application on a stripped-down image containing only an operating system.
### Go installed on Debian Buster OS
$ docker pull golang:1.14
$ docker image ls | grep golang | grep 1.14
golang 1.14 8a195c689057 4 months ago 810MB
### Go installed on Alpine OS
$ docker pull golang:alpine
$ docker image ls | grep golang | grep alpine
golang alpine 1463476d8605 3 weeks ago 299MB
### Alpine OS without Go
$ docker pull alpine:latest
$ docker image ls | grep alpine | grep latest
alpine latest 389fef711851 4 weeks ago 5.58MB
As you can see, there is a huge difference in the sizes of the images - just installing Go on Alpine makes the image around 50 times bigger. And all this extra bulk will never be used after the build stage.
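To make the comparison concrete, the ratios can be worked out from the sizes in the listing above. This is only a small sketch: the sizeRatio helper is a name made up for this example, and the figures are the ones reported by docker image ls at the time of writing, so they will differ for newer image versions.

```go
package main

import "fmt"

// sizeRatio reports how many times larger an image is than a baseline image.
// Both sizes are in MB. (Hypothetical helper, for illustration only.)
func sizeRatio(imageMB, baselineMB float64) float64 {
	return imageMB / baselineMB
}

func main() {
	// Sizes copied from the `docker image ls` output above.
	const alpineMB = 5.58 // alpine:latest
	images := []struct {
		name   string
		sizeMB float64
	}{
		{"golang:1.14", 810},
		{"golang:alpine", 299},
	}

	for _, img := range images {
		fmt.Printf("%s is about %.0fx the size of alpine:latest\n",
			img.name, sizeRatio(img.sizeMB, alpineMB))
	}
}
```

With the sizes listed above, this prints roughly 145x for golang:1.14 and 54x for golang:alpine.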
An optimised, single-stage Dockerfile might deal with this by uninstalling these unnecessary dependencies in the final steps of the Docker build (which only shrinks the image if the clean-up happens in the same layer as the installation) - but we shouldn’t need to do this!
This is where multistage builds are really helpful. A multistage build uses two or more images in the build process within a single Dockerfile. Each image can be given a name which you can refer to in subsequent stages of the build to copy files between images. So, for a Go application, we can use a golang image to follow all of the traditional build process described above, but stop before we tell the image to execute the application when it runs. At this point, we move to a basic alpine image, which will be used as the final output. Since the application has already been compiled in the Go image, the Alpine image simply needs to copy the compiled executable file and execute it.
How to write a multistage Dockerfile
The Docker docs have an excellent page on using multistage builds, but we will run through a simple build process for a “Hello World” Go application below.
Step 1: Write the Go code
First, we need to write some source code to create an application - a very simple “Hello World”.
In a new folder, create a new file called hello-world.go and add the following code:
package main
import "fmt"
func main() {
	fmt.Println("Hello world!")
}
Step 2: Write the Dockerfile
In the same folder as your hello-world.go file, create a new file called Dockerfile. We’ll start with some steps that should look familiar if you’re used to writing single-stage Dockerfiles for Go, but note the addition of a name ("build") on the FROM line. This will allow us to refer to the Go image by name later in the build process:
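# Stage 1: compile the application in an image that has the Go toolchain
FROM golang:alpine AS build
COPY ./hello-world.go /app/
WORKDIR /app
# Build the file directly, since this folder has no go.mod
RUN go build -o /app/hello-world hello-world.go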
So far, we have copied the source code into the build container and stored it in the /app directory, then changed to the /app directory and completed the build. After these steps, there should now be an executable application file called hello-world within /app.
Now, we don’t want to execute the application in the build image, so we move on to the next stage and create an image based on alpine:latest. Add the following to Dockerfile:
FROM alpine:latest
COPY --from=build /app/hello-world /app/
CMD ["/app/hello-world"]
And that’s it! Note that we didn’t need to give the Alpine image a name in the FROM line in this stage - since it is producing the final image, it will never need to be referred to by another image.
Step 3: Build the Docker image
The image is now ready to build. From the directory containing your Dockerfile and hello-world.go:
$ docker build . -t multistage-hello-world
It should only take a couple of seconds to build the image, and then you can run it:
$ docker run multistage-hello-world
Hello world!
Now that we know the build was successful, we can look at the image details to see how small the image is:
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
multistage-hello-world latest 2997402f69ba 2 minutes ago 7.61MB
Clearly, the application image is significantly smaller than would be possible with the golang images, which are at least ~300MB before an application is even added. At 7.61MB, it is only about 2MB larger than the bare alpine:latest image.
Why?
You might be wondering why you would bother doing this. After all, Docker is already a relatively lightweight solution… Adding extra stages will only add complexity and time to your Docker builds… Does the size of a container really matter on a server…?
Although you are unlikely to even come close to filling a server’s storage capacity with Docker containers for your application, there are several reasons why you might want to minimise the size anyway:
- Registry charges - many cloud container registry services charge per GB, both in terms of data transfer bandwidth and storage. For example, AWS ECR allows up to 500MB to be stored within their free tier. Using a full golang:alpine base image, you wouldn’t even be able to fit 2 images into the free tier, whereas around 65 images of the size created by our multistage build could be stored for free.
- Registry storage - even if you are using a self-hosted container registry where pricing is not an issue, you may find storage capacity becomes an issue if you are storing different versions (perhaps a version per application release) of your container. You will quickly notice the storage filling up if you are building large images on a regular release cycle.
- Bandwidth - as well as registry bandwidth charges from cloud providers, it is simply quicker for your build processes to only need to transfer small images between your servers and registries. Quicker transfers = less potential downtime.
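The free-tier comparison in the first point is simple arithmetic, sketched below. The imagesPerQuota helper is a hypothetical name for this example, the 500MB quota is the figure quoted above, and the image sizes are the uncompressed sizes reported by docker image ls. Registries typically store compressed layers and share common layers between images, so treat the results as a rough, pessimistic estimate rather than a billing calculation.

```go
package main

import "fmt"

// imagesPerQuota returns how many whole images of sizeMB fit into quotaMB.
// (Hypothetical helper, for illustration only.)
func imagesPerQuota(quotaMB, sizeMB float64) int {
	if sizeMB <= 0 {
		return 0 // guard against division by zero or nonsensical sizes
	}
	return int(quotaMB / sizeMB) // truncate: partial images do not count
}

func main() {
	const quotaMB = 500 // free-tier storage figure quoted above (assumption)

	// Sizes as reported by `docker image ls` earlier in this article.
	fmt.Println("golang:alpine base alone:", imagesPerQuota(quotaMB, 299))
	fmt.Println("multistage image:", imagesPerQuota(quotaMB, 7.61))
}
```

With these figures, only one image the size of the golang:alpine base fits in the quota, compared with 65 multistage images.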
With regard to complexity and build time, multistage builds are more of a reorganisation of your existing Dockerfiles than an addition. You will often find you don’t need to add any more than 2 lines per stage - an additional FROM, and a COPY to get the build artifacts from a previous stage.