I have split my tests by timing, but why is the split not even?


Test splitting by timing tries to split the tests across N nodes (based on the parallelism number) as evenly as possible. However, the evenness of the split can depend on many reasons.

We will explore some of these reasons below.


New test files introduced

In your latest runs, there may be new test files introduced (e.g., a pull request that implements a new feature would likely add new tests).

In such a scenario, CircleCI test-splitting will not have past results for these new test files.
As such, CircleCI will spread these new test files across the N nodes randomly.
This can cause this first run to possibly have uneven splits overall.
However, subsequent runs will improve the split, since we would now save updated test results.

To confirm if there were no test files with no recorded past durations, you will notice the (example) following in the step's output:

Autodetected filename timings.
No timing found for "spec/controllers/my_foobar_feature_spec.rb"


Changing duration of a test case

Firstly, test-splitting by timing data uses historical data (past test results) to split the current job’s tests.
Importantly, this means the system assumes that existing test cases should run in the same time as before.

However, there can be reasons why this assumption may no longer be true.
There can be some reasons why a subsequent run of the same test case is slower or faster than before.

Possible scenarios include:

  • Test case requires connecting to an external service (e.g., remote server for an acceptance test)
  • Test case is updated to include more steps (e.g., additional test fixtures, or longer test setup / teardown)

Partitioning Problem

The evenness of a split also depends on how well CircleCI can partition these test cases into N nodes.

When the test cases have outliers in terms of duration, it can make partitioning challenging.


Let's say we are grouping test cases by filenames here.
The duration taken to run the tests for all files were as follow:

  • a_test.py : 1 minute
  • b_test.py : 1 minute 
  • c_test.py : 1 minute
  • d_test.py : 2 minutes
  • e_test.py : 5 minutes

Let's say we have `parallelism: 2` set currently.
Assuming a greedy number partitioning approach, we can split the files to 2 nodes nicely:

  1. e_test.py ( = 5 minutes ) 
  2. d_test.py, c_test.py, b_test.py, a_test.py ( = 5 minutes )

Now, what if we want `parallelism: 3` then?

This would likely lead to the following situation:

  1. e_test.py ( = 5 minutes )
  2. d_test.py ( = 2 minutes )
  3. c_test.py, b_test.py, a_test.py ( = 3 minutes )

As we can see, node 1 here will take considerably longer than nodes 2 and 3.
However, you can say this is the best way we can split or partition the files across the 3 nodes based on their duration.

The outlier here is e_test.py, since its drastically long duration can make splitting evenly a challenge, depending on the parallelism number.

In this case, we can use a little bit of `jq` to aggregate the test cases' duration, for investigation.

This can give us a quick idea if our test-splitting may be affected by cases like the above.

First, download the test metadata JSON from the CircleCI API:

# NOTE: replace the project slug and job number as required
curl -H "Circle-Token: ${CIRCLE_TOKEN}" \
"https://circleci.com/api/v2/project/github/foobar/repo/123/tests" > metadata.json

Next, we can group by filename, and sum the duration, before sorting by slowest first.

jq ".items | group_by(.file) | map({key: .[0].file, value: map(.run_time) | add}) | sort_by(.value) | reverse | from_entries" metadata.json

If you are grouping by classname, run the following instead:

jq ".items | group_by(.classname) | map({key: .[0].classname, value: map(.run_time) | add}) | sort_by(.value) | reverse | from_entries" metadata.json

You can get an output like this, as an example:

# "filename or classname" : duration in seconds

"spec/foobar/foobar_spec.rb": 92.912708,
"spec/foobar/hoge_spec.rb": 49.292962,
"spec/fizzbuzz/controller_spec.rb": 2.842457,
"spec/uploader/uploader_spec.rb": 0.145463,
"spec/analyzer/analyzer_spec.rb": 0.046291
Was this article helpful?
1 out of 4 found this helpful



Article is closed for comments.