Overview
Test splitting by timing tries to split the tests across N nodes (based on the job's `parallelism` value) as evenly as possible. However, how even the split turns out depends on several factors.
We will explore some of these factors below.
Changing duration of a test case
Firstly, test splitting by timing uses historical data (past test results) to split the current job's tests.
Importantly, this means the system assumes that existing test cases will take roughly the same time to run as they did before.
However, this assumption does not always hold: a subsequent run of the same test case can be slower or faster than previous runs.
Possible scenarios include:
- The test case connects to an external service (e.g., a remote server in an acceptance test), so its duration varies with network and service conditions
- The test case is updated to include more steps (e.g., additional test fixtures, or a longer test setup / teardown)
Partitioning Problem
The evenness of a split also depends on how well CircleCI can partition the test cases across N nodes.
When some test cases are outliers in terms of duration, partitioning evenly becomes challenging.
Example
Let's say we are grouping test cases by filename here, and the time taken to run each file's tests is as follows:
- a_test.py : 1 minute
- b_test.py : 1 minute
- c_test.py : 1 minute
- d_test.py : 2 minutes
- e_test.py : 5 minutes
Let's say we have `parallelism: 2` set currently.
Assuming a greedy number partitioning approach, the files can be split evenly across the 2 nodes:
- Node 1: e_test.py ( = 5 minutes )
- Node 2: d_test.py, c_test.py, b_test.py, a_test.py ( = 5 minutes )
Now, what if we set `parallelism: 3` instead?
This would likely lead to the following split:
- Node 1: e_test.py ( = 5 minutes )
- Node 2: d_test.py ( = 2 minutes )
- Node 3: c_test.py, b_test.py, a_test.py ( = 3 minutes )
As we can see, node 1 will take considerably longer than nodes 2 and 3.
Even so, this is the best possible way to partition the files across the 3 nodes based on their durations.
The outlier here is e_test.py: it alone takes longer than the ideal per-node share of 10 / 3 ≈ 3.3 minutes, so its drastically long duration makes splitting evenly a challenge, depending on the parallelism number.
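CircleCI does not publish its exact partitioning algorithm, but a minimal greedy sketch (an illustration only, not CircleCI's actual implementation) reproduces the behaviour above by always assigning the longest remaining file to the least-loaded node:

```python
# Minimal greedy (longest-first) partitioning sketch, for illustration
# only; CircleCI's actual algorithm may differ.
def greedy_partition(durations, nodes):
    buckets = [{"files": [], "total": 0} for _ in range(nodes)]
    # Take files longest-first, always placing the next file
    # on the node with the smallest running total.
    for name, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        target = min(buckets, key=lambda b: b["total"])
        target["files"].append(name)
        target["total"] += minutes
    return buckets

durations = {
    "a_test.py": 1, "b_test.py": 1, "c_test.py": 1,
    "d_test.py": 2, "e_test.py": 5,
}

for n in (2, 3):
    print(f"parallelism: {n}")
    for i, bucket in enumerate(greedy_partition(durations, n), start=1):
        print(f"  node {i}: {bucket['files']} = {bucket['total']} min")
```

The exact grouping of the 1-minute files may differ from the listing above depending on tie-breaking, but the per-node totals are the same: an even 5 / 5 split at `parallelism: 2`, and an uneven 5 / 3 / 2 split at `parallelism: 3`.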
In this case, we can use a little bit of `jq` to aggregate the test cases' durations for investigation.
This gives us a quick idea of whether our test splitting may be affected by outliers like the above.
First, download the test metadata JSON from the CircleCI API:
```shell
# NOTE: replace the project slug and job number as required
curl -H "Circle-Token: ${CIRCLE_TOKEN}" \
  "https://circleci.com/api/v2/project/github/foobar/repo/123/tests" > metadata.json
```
Next, we can group the test cases by filename, sum their durations, and sort by slowest first:
jq ".items | group_by(.file) | map({key: .[0].file, value: map(.run_time) | add}) | sort_by(.value) | reverse | from_entries" metadata.json
If you are grouping by classname, run the following instead:
jq ".items | group_by(.classname) | map({key: .[0].classname, value: map(.run_time) | add}) | sort_by(.value) | reverse | from_entries" metadata.json
As an example, the output may look like this:
# "filename or classname" : duration in seconds
{
"spec/foobar/foobar_spec.rb": 92.912708,
"spec/foobar/hoge_spec.rb": 49.292962,
"spec/fizzbuzz/controller_spec.rb": 2.842457,
"spec/uploader/uploader_spec.rb": 0.145463,
"spec/analyzer/analyzer_spec.rb": 0.046291
}
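From this aggregated output, one quick (unofficial) rule of thumb is to compare each group's duration against the ideal per-node share, i.e., the total duration divided by the parallelism: any single group exceeding that share caps how even the split can be. Below is a minimal Python sketch of this check; it assumes you have saved the `jq` output above to a file named `durations.json` (a hypothetical filename), and that `PARALLELISM` is set to your job's value:

```python
# Rule-of-thumb outlier check; this threshold is an assumption,
# not an official CircleCI calculation.
import json

PARALLELISM = 2  # set to your job's parallelism value

# durations.json is the output of the jq command above,
# e.g. `jq '...' metadata.json > durations.json`
with open("durations.json") as f:
    durations = json.load(f)

total = sum(durations.values())
ideal_share = total / PARALLELISM
print(f"ideal per-node duration: {ideal_share:.2f}s")

# Any single group longer than the ideal share caps how even
# the split can be, regardless of how the rest are partitioned.
for name, seconds in durations.items():
    if seconds > ideal_share:
        print(f"possible outlier: {name} ({seconds:.2f}s)")
```

With the example output above and `parallelism: 2`, the total is about 145 seconds, the ideal per-node share is about 73 seconds, and spec/foobar/foobar_spec.rb (about 93 seconds) would be flagged as an outlier.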