Cutting our CI test load by 70% using pants

TL;DR:

  • As codebases grow, so do the number of tests and the duration of your CI pipeline.
  • However, not every test needs to run for every small change.
  • pants is a build system that thrives at automatic dependency inference.
  • Because it infers dependencies, you can use it to only run the set of affected unit tests.
  • With pants’ built-in optimizations, we cut our CI test load by 30%.
  • On top of pants’ built-in optimizations, I cut our CI test load by another 40% using pants’ dependency introspection tools, achieving a 70% test load reduction in total.

How do tests get in the way of moving fast?

You decide as a team that tests are important. You don’t want broken code. So you make it a habit to add unit tests whenever you add new code. Small pull requests are also important, as they get faster and better reviews, improve code quality, and are less likely to break things (take a look at this article). Over time you develop a codebase that is robust and beautiful. But merging also takes longer because the suite of tests that need to be run before you can merge your code is much bigger! This can be frustrating as in many cases, you know what you wrote doesn’t affect anything else in the codebase.

“Come on, I just added a simple print statement. Why do I need to wait 30 minutes to merge this code?” - you, probably.

Well, pants is here to save you.

Let’s put some pants on your CI pipeline

Beyond what you are hopefully wearing right now at work, pants also refers to a build system. As the docs mention, it wraps many underlying tools - “compilers, code generators, dependency resolvers, test runners, linters, formatters, packagers, REPLs and more.” You can read more about it, but the main thing we’re going to focus on is its dependency inference capabilities and how that affects testing.

pants’ dependency inference lets us run only affected tests. On a very high level, pants knows which files are connected. It automatically scans through your imports and says “Hey, it looks like test_foo.py depends on foo.py”.

Using information like this, pants can run only the tests that are affected by changes you make in a new branch. The command looks like the following:

pants --changed-since=origin/main test

If you make changes to foo.py it might say: “Hey, I can see that in this new branch, the one file that changed compared to the main branch is foo.py. Since test_foo.py is the one test that depends on this, let’s just run that test.”

Now, consider the scenario where bar.py also depends on foo.py. It also has its own unit test test_bar.py that only imports from bar.py. When you change foo.py you want to ensure that bar.py also still works as expected i.e. you want test_bar.py to also run! test_bar.py is a transitive dependent. You can run the following command:

pants --changed-since=origin/main --changed-dependents=transitive test

Here, pants says “Hey, you changed foo.py, and I can see that test_foo.py as well as test_bar.py depend on foo.py directly and transitively, respectively. Let me run both unit tests.”

The speed up that you can get from this depends on how many tests you have, and how long each one takes. As a toy example, let’s say you have 100 independent modules, each with their own test that takes 1 second to run. In a new feature branch, you only change one of the modules. In this case, testing without pants would take 100 seconds, while testing with pants would take only 1 second.

Reducing our CI test load by 70%

Using pants’ automatic dependency inference, we trigger 30% fewer tests for each file that we change on average.

However, I noticed that small changes in certain parts of the codebase were still triggering a whole swarm of tests. With some further debugging, we reduce the test load by 70% in total.

The approach outlined in the previous section is only as effective as your software design: for example, if all unit tests depend on a module Foo, and Foo depends on all other parts of the codebase, you’re back at square one: any change triggers all unit tests, so you don’t get as much of a speed up as you would ideally have.

Thankfully, pants’ dependents (documentation) and paths (documentation) commands are perfect for this scenario.

The dependents command shows you the list of files that depend on the file that you’re inspecting. Since our testing is done with the --transitive flag, we use the same flag for this command as well.

pants dependents --transitive <file_name>

I picked a set of files that I knew were triggering unexpected tests, then filtered the outputs to only include test files.

pants dependents --transitive <file_name> | grep tests

This gave me a set of tests that I could inspect. Given these tests, I used the paths command to figure out the dependency path:

pants paths --from=<test_file> --to=<file_name>

In our case we got outputs like the following:

[
  [
    <test_file>,
    ...,
    /.../conftest.py,
    ...,
    <file_name>,
  ]
]

Aha! A conftest.py was importing a whole range of dependencies, and since all tests in the suite depend on conftest.py although they don’t explicitly import from this file, pants was triggering a whole bunch of unrelated tests!

To solve this, I refactored our test modules to break apart this dependency. I figured out what fixtures from the conftest.py were needed in which tests, and put them in their respective tests. Interestingly enough, it was very rare for a fixture in confest.py to actually be needed across multiple tests.

In total, we reduce the average number of tests a given file triggers by 70%, a substantial improvement for us!

Conclusion

I’ve thoroughly enjoyed using pants as it allows us to do what is only necessary. We also use it for linting, formatting, and type checking only the files that are affected by any changes since the main branch.

I think the commands that I’ve found the most useful for me are dependents and dependencies, as they can help you to understand the structure of your code a lot better, and possibly fix things if the structure doesn’t meet your expectations e.g. “this module shouldn’t depend on this!”

You can read more about some of the history of pants here, and how to use it from the documentation here.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Presenting in industry as an academic
  • Training foundation models up to 10x more efficiently with Memory-Mapped Datasets
  • Going from nothing to a first model
  • No more SQL: using ibis as a machine learning researcher