I have split my tests by timing, but why is the split not even?

Overview

Test splitting by timing tries to distribute the tests across N nodes (where N is the parallelism number) as evenly as possible. However, how even the split turns out depends on several factors.

We will explore some of these factors below.

New test files introduced

In your latest runs, there may be new test files introduced (e.g., a pull request that implements a new feature would likely add new tests).

In such a scenario, CircleCI test splitting will not have past results for these new test files.
As such, CircleCI will spread these new test files across the N nodes randomly.
As a result, the first run that includes them may be split unevenly overall.
However, subsequent runs will improve the split, since updated test results will have been saved by then.

To confirm whether any test files had no recorded past durations, look for output like the following (example) in the step's output:

Autodetected filename timings.
No timing found for "spec/controllers/my_foobar_feature_spec.rb"
...
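
If the step output is long, a short script can list these files for you. The following is a minimal Python sketch, not part of CircleCI's tooling: the function name `files_without_timing` is hypothetical, and it assumes the message format is exactly as shown in the sample above.

```python
import re

# Matches lines such as: No timing found for "spec/foo_spec.rb"
# (format assumed from the sample step output above)
NO_TIMING = re.compile(r'^No timing found for "([^"]+)"$')


def files_without_timing(step_output):
    """Return the test files reported as having no recorded timing."""
    found = []
    for line in step_output.splitlines():
        match = NO_TIMING.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


sample_output = '''Autodetected filename timings.
No timing found for "spec/controllers/my_foobar_feature_spec.rb"
'''
print(files_without_timing(sample_output))
```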

Changing duration of a test case

Test splitting by timing data uses historical data (past test results) to split the current job’s tests.
Importantly, this means the system assumes that existing test cases will take about the same time to run as before.

However, this assumption does not always hold: a subsequent run of the same test case can be slower or faster than before.

Possible scenarios include:

Test case requires connecting to an external service (e.g., remote server for an acceptance test)
Test case is updated to include more steps (e.g., additional test fixtures, or longer test setup / teardown)

Partitioning Problem

The evenness of a split also depends on how well CircleCI can partition the test cases across N nodes.

When the test cases have outliers in terms of duration, it can make partitioning challenging.

Example

Let's say we are grouping test cases by filename here.
Suppose the time taken to run the tests in each file is as follows:

a_test.py : 1 minute
b_test.py : 1 minute
c_test.py : 1 minute
d_test.py : 2 minutes
e_test.py : 5 minutes

Let's say we currently have `parallelism: 2` set.
Assuming a greedy number partitioning approach, we can split the files evenly across 2 nodes:

Node 1: e_test.py ( = 5 minutes )
Node 2: d_test.py, c_test.py, b_test.py, a_test.py ( = 5 minutes )

Now, what if we set `parallelism: 3` instead?

This would likely lead to the following situation:

Node 1: e_test.py ( = 5 minutes )
Node 2: d_test.py ( = 2 minutes )
Node 3: c_test.py, b_test.py, a_test.py ( = 3 minutes )

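As we can see, node 1 here will take considerably longer than nodes 2 and 3.
However, this is arguably the best possible split of the files across the 3 nodes based on their durations: since we are grouping by filename, e_test.py cannot be divided further, so one node will always need at least 5 minutes.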

The outlier here is e_test.py: it takes far longer than the other files, which can make an even split difficult, depending on the parallelism number.
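
The example above can be reproduced with a short Python sketch of greedy number partitioning (longest file first, each file assigned to the currently least-loaded node). This is an illustration under that assumption only: the function name `greedy_partition` is hypothetical, and CircleCI's actual splitting algorithm may differ.

```python
def greedy_partition(durations, nodes):
    """Assign files to nodes, longest first, always to the least-loaded node."""
    buckets = [{"files": [], "total": 0} for _ in range(nodes)]
    # Sort by duration, slowest first (ties keep their original order).
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, minutes in ordered:
        # Pick the node with the smallest running total so far.
        target = min(buckets, key=lambda bucket: bucket["total"])
        target["files"].append(name)
        target["total"] += minutes
    return buckets


# Durations in minutes, taken from the example above.
durations = {
    "a_test.py": 1,
    "b_test.py": 1,
    "c_test.py": 1,
    "d_test.py": 2,
    "e_test.py": 5,
}

for nodes in (2, 3):
    split = greedy_partition(durations, nodes)
    print(f"parallelism: {nodes}")
    for index, bucket in enumerate(split, start=1):
        print(f"  node {index}: {bucket['files']} = {bucket['total']} minutes")
```

With 2 nodes, both totals come to 5 minutes. With 3 nodes, the grouping of the shorter files can differ from the listing above, but the slowest node still takes 5 minutes because of e_test.py, so the extra node does not shorten the job.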

To investigate, we can use `jq` to aggregate the test cases' durations.

This gives a quick idea of whether our test splitting is affected by outliers like the one above.

First, download the test metadata JSON from the CircleCI API:

# NOTE: replace the project slug and job number as required
curl -H "Circle-Token: ${CIRCLE_TOKEN}" \
  "https://circleci.com/api/v2/project/github/foobar/repo/123/tests" > metadata.json

Next, group the test cases by filename, sum their durations, and sort with the slowest first:

jq ".items | group_by(.file) | map({key: .[0].file, value: map(.run_time) | add}) | sort_by(.value) | reverse | from_entries" metadata.json

If you are grouping by classname, run the following instead:

jq ".items | group_by(.classname) | map({key: .[0].classname, value: map(.run_time) | add}) | sort_by(.value) | reverse | from_entries" metadata.json

The output will look something like this:

# "filename or classname" : duration in seconds

{
  "spec/foobar/foobar_spec.rb": 92.912708,
  "spec/foobar/hoge_spec.rb": 49.292962,
  "spec/fizzbuzz/controller_spec.rb": 2.842457,
  "spec/uploader/uploader_spec.rb": 0.145463,
  "spec/analyzer/analyzer_spec.rb": 0.046291
}
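
The same aggregation can be done in Python, with an extra check for outliers. The sketch below is an illustration only: the function names `load_items`, `total_by_key` and `find_outliers` are hypothetical, the item fields (`file`, `classname`, `run_time`) are assumed from the `jq` filters above, and the rule of thumb used here (a file is an outlier when it alone takes longer than the total duration divided by the number of nodes) is an assumption, not CircleCI's definition.

```python
import json
from collections import defaultdict


def load_items(path):
    """Read the "items" list from a downloaded test metadata JSON file."""
    with open(path) as handle:
        return json.load(handle)["items"]


def total_by_key(items, key="file"):
    """Sum run_time per file (or per classname), slowest first."""
    totals = defaultdict(float)
    for item in items:
        totals[item[key]] += item["run_time"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def find_outliers(totals, nodes):
    """Return entries that alone exceed the ideal per-node share."""
    ideal = sum(totals.values()) / nodes
    return [name for name, seconds in totals.items() if seconds > ideal]


# Totals copied from the sample output above (seconds).
sample_totals = {
    "spec/foobar/foobar_spec.rb": 92.912708,
    "spec/foobar/hoge_spec.rb": 49.292962,
    "spec/fizzbuzz/controller_spec.rb": 2.842457,
    "spec/uploader/uploader_spec.rb": 0.145463,
    "spec/analyzer/analyzer_spec.rb": 0.046291,
}
print(find_outliers(sample_totals, nodes=2))
```

To run this against real data, pass the result of `total_by_key(load_items("metadata.json"))` to `find_outliers` together with your parallelism number. Any file it returns will keep one node busy longer than an even split would allow, however the remaining files are distributed.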
