How to optimize bundles with webpack

Webpack is an asset bundler for web applications. It consolidates source files and outputs various assets, like JavaScript and CSS files, which are then uploaded to a website hosting service or CDN.

In this guide, we'll be going through how webpack understands and bundles assets so you can optimize and lower your own application's bundle size.

Why bundle?

In the era of HTTP/1.1, this step was needed for performance because browsers usually limited a site to six concurrent HTTP connections per host, which limited how many files could be downloaded at once. With HTTP/2, requests are multiplexed over a single connection, so that limit no longer applies, but keeping the number of files low has still been found to be better for performance.

What is optimal bundling?

When we bundle, we need to be aware of all the factors involved. In general, we want to reduce:

  • number of files
  • file sizes
  • cache misses
  • code duplication

To add to this, since JavaScript can import other JavaScript files and bundling involves consolidating these files, we are dealing with a graph optimization problem. As such, bundling is a complex topic and the optimal configuration depends on the architecture of your web application.

Bundling in webpack

When webpack starts the bundling process, everything starts as a chunk - a unit of code that can be split and joined as needed. Specifically, we have two types of chunks: entry chunks and async chunks.

Entry Chunks
Entry chunks are chunks that are required synchronously. They are created via webpack's `entry` configuration. For example, if you have an entry with the key `main`, a `main` chunk is generated.
Async Chunks
Async chunks are chunks that are lazily imported. They are created with the dynamic import syntax (e.g. `import('filePath.js')`), which gives you an auto-generated chunk name. To provide a name yourself, use the magic comment syntax: `import(/* webpackChunkName: "chunk-name" */ 'filePath.js')`. With all these chunks in hand, webpack then decides how to split and group them optimally.
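To make the two chunk types concrete, here is a minimal sketch. The file paths, the `admin` entry, and the `loadPageA` helper are hypothetical names chosen for illustration; only the `entry` option and the `webpackChunkName` magic comment come from webpack itself.

```javascript
// webpack.config.js (sketch): each key under `entry` becomes an entry chunk.
const config = {
  entry: {
    main: './src/index.js',  // hypothetical path -> "main" entry chunk
    admin: './src/admin.js', // hypothetical path -> "admin" entry chunk
  },
};

// Application code (sketch): a dynamic import creates an async chunk.
// The magic comment names the chunk "page-a" instead of leaving it auto-generated.
function loadPageA() {
  return import(/* webpackChunkName: "page-a" */ './pages/PageA.js');
}

module.exports = config;
```

Calling `loadPageA()` in the browser is what triggers the download of the async chunk, so nothing is fetched until the page is actually needed.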

SplitChunks Configuration

Webpack's power and flexibility in bundling lie in the optimization.splitChunks configuration. The configuration works by considering all chunks and grouping them into cacheGroups. Webpack starts with a pretty good default configuration that differs depending on the version. As of the time of writing, webpack 5's configuration only splits chunks from node_modules into the vendors group and shared initial chunks into a separate group.

The documentation for optimization.splitChunks is pretty good. What is confusing, though, is how the following six knobs and toggles interact with each other.

splitChunks.chunks
Specifies which chunks should be selected for splitting. The default is 'async'; 'initial' and 'all' are the other possible options. A function can also be passed to select chunks individually.
splitChunks.minSize
Specifies the minimum size, in bytes, that a new chunk must reach before webpack splits it out into the group.
splitChunks.maxInitialRequests
Specifies the maximum number of parallel requests at an entry point, which caps how many chunks an entry chunk can be split into.
splitChunks.maxAsyncRequests
Specifies the maximum number of parallel requests when loading on demand, which caps how many chunks an async chunk can be split into.
splitChunks.[cacheGroup].enforce
This option tells webpack to ignore `minSize`, `minChunks`, `maxInitialRequests`, and `maxAsyncRequests`, and to always create chunks for this cacheGroup.
splitChunks.[cacheGroup].priority
Specifies the priority of a cacheGroup. When a module qualifies for more than one cacheGroup, the group with the higher priority wins.
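To see where each of these options lives, here is a sketch of a configuration that sets all of them. The numeric values and the `shared` group name are illustrative assumptions, not webpack's defaults, so treat this as a map of the option locations rather than a recommended setup.

```javascript
// webpack.config.js (sketch): values are illustrative, not webpack defaults.
const config = {
  optimization: {
    splitChunks: {
      chunks: 'all',         // consider both entry and async chunks
      minSize: 30000,        // bytes a new chunk must reach before it is split out
      maxInitialRequests: 5, // cap on parallel requests at an entry point
      maxAsyncRequests: 5,   // cap on parallel requests for on-demand loading
      cacheGroups: {
        vendors: {
          test: /[\\/]node_modules[\\/]/,
          priority: -10,     // higher than "shared", so vendors wins a tie
        },
        shared: {            // hypothetical group for code used by 2+ chunks
          minChunks: 2,
          priority: -20,
          enforce: true,     // ignore minSize and the request caps for this group
        },
      },
    },
  },
};

module.exports = config;
```

A module under node_modules that is also imported by two chunks qualifies for both groups here; since -10 is greater than -20, it lands in `vendors`.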

Recipes

Here are some quick recipes to just get that configuration working.

I want to prevent this chunk from getting included

The chunks function works just like a filter function. Returning false rejects a chunk from the cacheGroup, and returning true lets it in, so you can allow only specific chunks into a cacheGroup.

[cacheGroup]: {
    // Return false for the chunk you want to keep out of this cacheGroup.
    chunks: (chunk) => chunk.name !== chunkName,
},

I want to force chunks to always split into this cacheGroup

The enforce option overrides various settings relating to the size and number of chunks that can be generated or considered. Use this carefully as it may lead to unexpected results.

[cacheGroup]: {
    enforce: true
},

I want to manually control chunks

The name function is the best way to manually control everything. Whenever the function returns the same name, the matching modules are forced into the same output chunk, regardless of webpack's other limits.

[cacheGroup]: {
    // isTheseChunk is your own predicate over the chunks that contain the module.
    name: (module, chunks, cacheGroupKey) => {
        if (isTheseChunk(chunks)) {
            return 'these-chunks';
        } else {
            return 'those-chunks';
        }
    }
},

Example

Let's go through an example to examine what we can do to optimize its bundled output. In this example, we have the following import structure:

Chunk      Dependencies
main       react, dynamically imported Pages A-E & polyfill
polyfill   core-js
PageA      SharedByA, SharedByAB, SharedByABC
PageB      SharedByA, SharedByAB, SharedByABC
PageC      SharedByAll, SharedByABC
PageD      SharedByAll, d3
PageE      SharedByAll, d3
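Since bundling is a graph problem, it can help to look at this structure as data. The sketch below encodes the table as a plain object and lists the modules imported by more than one chunk, which are the candidates webpack could split into shared chunks. The `findSharedModules` helper is a hypothetical illustration and says nothing about how webpack implements this internally.

```javascript
// Import structure from the table: chunk name -> modules it imports statically.
// (main's dynamic imports of the pages and the polyfill are left out here.)
const chunkDependencies = {
  main: ['react'],
  polyfill: ['core-js'],
  PageA: ['SharedByA', 'SharedByAB', 'SharedByABC'],
  PageB: ['SharedByA', 'SharedByAB', 'SharedByABC'],
  PageC: ['SharedByAll', 'SharedByABC'],
  PageD: ['SharedByAll', 'd3'],
  PageE: ['SharedByAll', 'd3'],
};

// Invert the mapping (module -> chunks that import it) and keep only
// the modules that are imported by more than one chunk.
function findSharedModules(dependencies) {
  const usedBy = new Map();
  for (const [chunk, modules] of Object.entries(dependencies)) {
    for (const mod of modules) {
      if (!usedBy.has(mod)) usedBy.set(mod, []);
      usedBy.get(mod).push(chunk);
    }
  }
  return new Map([...usedBy].filter(([, chunks]) => chunks.length > 1));
}

const shared = findSharedModules(chunkDependencies);
for (const [mod, chunks] of shared) {
  console.log(`${mod} is shared by ${chunks.join(', ')}`);
}
```

Modules such as react and core-js do not show up because only one chunk imports them. They can still be split out as vendor code, which is what the rest of this example deals with.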

Using the default configuration in webpack 4, we get the following structure:

While the polyfill's core-js dependency and the d3 dependency of PageD and PageE are split out into their own vendor bundles, our main bundle unfortunately still contains node_modules code. Unless you automate dependency upgrades, your dependencies change less often than your application code, so we should separate node_modules into its own vendor bundle that browsers can keep cached across releases.

Webpack splits out vendors~polyfill.js and vendors~PageD~PageE.js but not vendors~main.js because the splitChunks.chunks setting is 'async' by default, which means our entry chunks are not considered for optimization here.

We can fix this by changing the splitChunks.chunks setting to 'all', giving us this structure:
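The configuration change itself is small. A minimal sketch, assuming the rest of the configuration is left at its defaults:

```javascript
// webpack.config.js (sketch)
const config = {
  optimization: {
    splitChunks: {
      // Consider entry (initial) chunks as well as async chunks for splitting.
      chunks: 'all',
    },
  },
};

module.exports = config;
```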

Now that we have a vendors~main.js bundle, let us look for more optimizations. One area where we can do better is splitting out dependencies like lodash in Pages A, B, and C. These aren't getting split because the node_modules code in these bundles is smaller than the 30 KB limit set by splitChunks.minSize. We can work around this by either setting minSize to a smaller number or setting splitChunks.[cacheGroup].enforce to true.
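The two options can be sketched side by side. The `minSize: 0` value is an illustrative assumption, and the `vendors` key follows this example's webpack 4 naming (webpack 5 calls its default group `defaultVendors`), so adapt the names to your version.

```javascript
// Option 1 (sketch): lower the size threshold so small vendor groups still split.
const lowerMinSize = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      minSize: 0, // illustrative; any threshold below the group's size would do
    },
  },
};

// Option 2 (sketch): keep minSize, but let the vendors cacheGroup ignore it.
const enforceVendors = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        vendors: {
          // Matches node_modules in both POSIX and Windows style paths.
          test: /[\\/]node_modules[\\/]/,
          enforce: true,
          priority: -10,
        },
      },
    },
  },
};

module.exports = enforceVendors;
```

Option 1 affects every cacheGroup, while option 2 only lifts the limits for vendor code, which is why it is the narrower change.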

Here, let us enforce the vendors chunk since we are assuming our dependencies rarely update, and by splitting them out, we can get a higher cache hit rate:

Now, this setup looks about as optimized as we can get. What stands out above, though, is a tiny polyfill bundle at just 168 bytes. This is because most of the polyfill's size comes from core-js, a dependency, so core-js is separated into its own chunk. This is webpack's blind spot - it naively splits out a chunk once it hits the minSize threshold, but doesn't consider the size of the chunk that remains.

To keep both the polyfill code and its dependencies in one bundle, we reject the polyfill chunk in the vendors cacheGroup like so:

vendors: {
    test: /[\\/]node_modules[\\/]/,
    enforce: true,
    chunks: (chunk) => chunk.name !== 'polyfill',
    priority: -10,
},

Now that the bundles are split into the most efficient configuration, let's check that our hashes are stable to avoid unnecessary cache misses. If we change PageA's content now, the hashes for both the main.js and PageA.js bundles change. This is because the main bundle contains webpack's runtime code, which is how it knows which bundles to load in production. Because PageA's hash has changed, so has the runtime code, and thus the main.js bundle as well.
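One way to run this kind of check is to record each bundle's hash before and after a change and compare the two. The sketch below uses made-up hash values and a hypothetical `changedBundles` helper. In a real project you would collect the hashes from your build output, for example from the emitted filenames.

```javascript
// Hypothetical content hashes per bundle, before and after editing PageA.
// The values are made up; only which entries differ matters.
const before = { main: 'a1f3', PageA: '77c0', PageB: '9be2', 'vendors~main': '04dd' };
const after = { main: 'c8e1', PageA: '5d2a', PageB: '9be2', 'vendors~main': '04dd' };

// Return the names of bundles whose hash differs between two builds.
// Bundles that only exist in the newer build also count as changed.
function changedBundles(previous, next) {
  return Object.keys(next).filter((name) => previous[name] !== next[name]);
}

console.log(changedBundles(before, after)); // [ 'main', 'PageA' ]
```

Every name in the result is a bundle that returning visitors have to download again, so the shorter this list is for a small change, the better.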

We can split the runtime code into a separate chunk by setting optimization.runtimeChunk to 'single', which creates a new bundle called runtime.js.
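As a sketch, the setting sits next to splitChunks under optimization. The `output.filename` pattern is an assumption about how hashes end up in the bundle names. It is not taken from the example project, so adjust it to match your setup.

```javascript
// webpack.config.js (sketch)
const config = {
  output: {
    // Assumption: bundle names carry a content hash for long-term caching.
    filename: '[name].[contenthash].js',
  },
  optimization: {
    // Move webpack's runtime code into one shared "runtime" bundle.
    runtimeChunk: 'single',
  },
};

module.exports = config;
```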

Now when we change PageA's contents, only the hashes of the runtime.js and PageA.js bundles are affected. We have finally achieved the optimal setup for this web application!

Conclusion

While understanding webpack's bundling configuration is hard, knowing how to tune the knobs and levers can be one of the highest-leverage actions you can take to improve your web application's performance. I encourage you to set up a demo project to experiment and find the best way to optimize your web application. You may be surprised at just how much you can shave off!