Optimizing your build
The initial cookbooks in this series seemed to automatically cache build steps and perform predictably - make a change to the Dockerfile and the build will pick up from the point of the change.
In such cases, when steps can be executed in any order, it is a good idea to place the steps that take longest or are least likely to change before quick steps that may change more frequently.
This works well until you get to the point where steps must run in a particular order.
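As a minimal sketch of that ordering advice, assuming a ruby:slim base image and a package and environment setting chosen purely for illustration, the slow and rarely changed step comes first and the quick, frequently edited settings come last:

```dockerfile
# Illustrative only: the base image, package, and ENV value are assumptions.
FROM ruby:slim

# Slow and rarely changed: placed early so its cached layer is reused.
RUN apt-get update && apt-get install -y build-essential

# Quick and more likely to change: placed late so edits here
# do not invalidate the expensive layer above.
WORKDIR /demo
ENV RAILS_ENV=production
```

Editing the ENV line reruns only that step and anything after it; the package installation stays cached.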
Now that you are copying sources from outside of the Dockerfile, a change to any source will trigger a rebuild from the point of the COPY statement on, including time-consuming steps like bundle install.
The relevant portion of the Dockerfile looks something like the following:
    COPY . .
    RUN bundle install
Splitting that into two copy statements can make a dramatic improvement in build times:
    COPY Gemfile* .
    RUN bundle install
    COPY . .
As the first statement will only copy Gemfile and Gemfile.lock, the time-consuming bundle install step will only be run if these two specific files have changed.
This can result in a dramatic reduction in build times for cases where the Gemfile did not change.
Many builds run both bundle install and yarn install. Both can be time-consuming, and they can be run in either order. If run sequentially, you will be faced with a choice: should a change to the Gemfile result in an unnecessary reinstall of node modules, or should a change to package.json result in an unnecessary reinstall of gems?
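To make that choice concrete, here is a sketch of one of the two sequential orderings, with gems first. It is a fragment rather than a complete Dockerfile, and the file globs simply mirror the ones used elsewhere in this article:

```dockerfile
# Fragment: sequential installs, where each step depends on the cache
# of every step before it.
COPY Gemfile* .
RUN bundle install

COPY package*.json .
# A change to the Gemfile invalidates every layer from the first COPY on,
# so this step reruns even when package.json is untouched.
RUN yarn install

COPY . .
```

Swapping the two pairs moves the problem rather than removing it: a change to package.json would then force a reinstall of gems.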
We've seen how multi-stage builds can reduce image size. They also can be used to reduce build times. An example to illustrate:
    FROM ruby:slim as base
    RUN apt-get update && \
        apt-get install -y build-essential && \
        volta install node@lts yarn@latest
    WORKDIR /demo

    FROM base as gems
    COPY Gemfile* .
    RUN bundle install

    FROM base as node
    COPY package*.json .
    RUN yarn install

    FROM base
    RUN apt-get install -y postgresql-client
    COPY . .
    COPY --from=gems /usr/local/bundle /usr/local/bundle
    COPY --from=node /demo/node_modules /demo/node_modules
Such a Dockerfile will only run bundle install if the Gemfile has changed, and only run yarn install if package.json has changed.
Even better, if both changed, they will be run concurrently. In fact, on the first run they will be run concurrently with the installation of postgresql-client.
We've seen how splitting COPY statements can reduce the number of steps that are rerun, but it still remains the case that any change to the Gemfile will result in reinstalling all gems.
This can be improved by using a dedicated RUN cache mount.
Applied to bundle installs, the resulting build instructions would look something like the following:
    RUN --mount=type=cache,id=dev-gem-cache,sharing=locked,target=/srv/vendor \
        bundle config set app_config .bundle && \
        bundle config set without 'development test' && \
        bundle config set path /srv/vendor && \
        bundle install && \
        bundle clean && \
        mkdir -p vendor && \
        bundle config set path vendor && \
        cp -ar /srv/vendor .
That's a lot to unpack. Statement by statement:
- the dev-gem-cache cache directory is mounted on /srv/vendor.
- the bundle config directory is set to be a .bundle subdirectory of the current application.
- gems marked as development or test in the Gemfile are not to be installed.
- the bundle directory is set to the mounted cache, /srv/vendor.
- the install is performed.
- unused gems are removed.
- a vendor subdirectory is created.
- the bundle directory is changed to be vendor.
- the contents of the /srv/vendor cache are copied to the vendor subdirectory.
The final build stage can copy the entire app directory from the build stage that included the above RUN statement to pick up the configuration as well as the gems.
With this in place, adding a single gem to your Gemfile will result in only the installation of that one gem.
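A sketch of what the final stage described above might look like, reusing the stage names base and gems and the /demo working directory from the earlier multi-stage example. These names are assumptions here: the stage that contains the cached RUN statement may be named differently in your Dockerfile.

```dockerfile
# Fragment: assumes earlier stages named "base" and "gems", where "gems"
# has WORKDIR /demo and contains the cached bundle install RUN statement.
FROM base
COPY . .
# Copies the app directory from the gems stage, bringing along both the
# .bundle configuration and the vendor directory holding the installed gems.
COPY --from=gems /demo /demo
```

Copying the whole directory, rather than only the gems, is what carries the bundle path setting into the final image.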
Starting from a simple sequence of two statements (potentially three if yarn install is added), we have explored a number of techniques involving ordering, splitting, staging, and caching of statements and results.
We started with something that was simple and slow and ended with a solution that is considerably less simple but decidedly faster.
This results in a trade-off. For small projects it may make sense to only adopt some of these techniques to keep the Dockerfile maintainable. For other projects it may make sense to incorporate more.