Test splitting across parallel containers can behave unexpectedly for several reasons. Before troubleshooting, make sure you have read Running Tests in Parallel thoroughly.
The store_test_results step saves your test timing data, but it does not make debugging easy. You may also want to upload your test results with store_artifacts so you can visually verify the number of tests being run.
Echo Test Results
Like saving artifacts, echoing the test data lets you see how your tests were split. The example below runs a clean test split command and prints the files assigned to the current container before running them:
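TESTFILES=$(circleci tests glob "test/**/*_test.rb" | circleci tests split --split-by=timings)
# Print the test files assigned to this container so the split is visible in the job output
echo "$TESTFILES"
# $TESTFILES is left unquoted so each file is passed to rspec as a separate argument
bundle exec rspec --format progress \
  --format RspecJunitFormatter \
  -o test/reports/rspec.xml \
  $TESTFILES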
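To keep a record of the split next to the job's other artifacts, each container can also write its assigned file list to a file that a store_artifacts step uploads. The sketch below is an assumption, not part of CircleCI: record_split is a hypothetical helper and test/artifacts is a hypothetical directory. CIRCLE_NODE_INDEX is the container index CircleCI sets in parallel jobs; the helper falls back to 0 when it is unset, for example when run locally.

```shell
# Hypothetical helper: save the test files assigned to this container so the
# split can be inspected later, e.g. by uploading the directory with store_artifacts.
# Reads whitespace- or newline-separated file names on stdin.
record_split() {
  local out_dir="$1"
  local node="${CIRCLE_NODE_INDEX:-0}"   # set by CircleCI; defaults to 0 locally
  local out_file="$out_dir/test-split-node-$node.txt"
  mkdir -p "$out_dir"
  # Write one file name per line, with blank lines removed
  tr -s ' \t' '\n\n' | sed '/^$/d' > "$out_file"
  # Summarize in the job output how many files this container received
  echo "node $node: $(wc -l < "$out_file" | tr -d ' ') test files"
}
```

In a job, you could pipe the split into it with echo "$TESTFILES" | record_split test/artifacts and then point a store_artifacts step at test/artifacts, giving you one file per container to compare.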
Find the Slowest Tests
Some testing libraries can report the slowest tests in a run; RSpec, for example, lists its slowest examples when you pass the --profile option.
Investigate these tests to see whether they take considerably longer than your other test files, and look for ways to make them smaller or more efficient.
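If your runner writes JUnit XML, as the RspecJunitFormatter command in Echo Test Results does, you can also rank test files by total time straight from the report. This is a sketch: slowest_files is a hypothetical helper, and it assumes each testcase element sits on a single line and carries file and time attributes. That may not hold for every formatter, so check your own report first.

```shell
# Hypothetical helper: sum testcase times per file from a JUnit XML report
# and print the slowest files first, as "<seconds> <file>".
# Assumes one <testcase ...> element per line with file="..." and time="...".
slowest_files() {
  local report="$1" count="${2:-10}"
  awk '
    /<testcase / {
      file = ""; t = 0
      # Extract the attribute values between the quotes
      if (match($0, /file="[^"]*"/)) file = substr($0, RSTART + 6, RLENGTH - 7)
      if (match($0, /time="[^"]*"/)) t = substr($0, RSTART + 6, RLENGTH - 7) + 0
      if (file != "") total[file] += t
    }
    END { for (f in total) printf "%.3f %s\n", total[f], f }
  ' "$report" | sort -rn | head -n "$count"
}
```

For example, slowest_files test/reports/rspec.xml 5 would print the five test files with the largest total time in that report.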
Vary Your Parallelism
Changing the amount of parallelism can have a large effect on your test times, depending on how your tests are written. Trying different values may also help you identify particular tests that repeatedly cause problems.
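To get a feel for how parallelism interacts with uneven test files, you can model a timing-based split offline. The sketch below uses a simplified greedy model (longest file first, each assigned to the least-loaded container); it is an illustration, not CircleCI's actual splitting algorithm. max_container_time is a hypothetical helper that reads lines of the form "<seconds> <file>" and prints the total time of the slowest container.

```shell
# Hypothetical helper: a simplified greedy model of splitting by timings.
# This is NOT CircleCI's algorithm; it only illustrates how parallelism
# and uneven file durations interact.
# Input lines: "<seconds> <file>". Output: the slowest container's total time.
max_container_time() {
  local n="$1"   # number of parallel containers to simulate
  sort -rn | awk -v n="$n" '
    BEGIN { for (i = 0; i < n; i++) load[i] = 0 }
    {
      # Assign the next-longest file to the least-loaded container
      best = 0
      for (i = 1; i < n; i++) if (load[i] < load[best]) best = i
      load[best] += $1
    }
    END {
      max = 0
      for (i = 0; i < n; i++) if (load[i] > max) max = load[i]
      printf "%.2f\n", max
    }'
}
```

Running it for several values of n shows where extra containers stop helping: because whole files are assigned to containers, no amount of parallelism can bring the slowest container below the duration of your single longest file.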
Timing data is saved only after a successful ("green") build, and the results from the previous several builds are analyzed when splitting future runs. If these results are skewed or otherwise incorrect, they will continue to affect your build timing even after you have corrected the underlying issue. Once the issue is resolved, the timing data will normalize over the next several builds and produce the expected results, so plan to push several commits to flush the old data. If you need to inspect the timing data, it is stored in $CIRCLE_INTERNAL_TASK_DATA/circle-test-results.
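If you want to check what timing data is present in a job, a small step can list the contents of that directory. This is a sketch: show_timing_data is a hypothetical name, and it only lists the files because their exact format is not described here.

```shell
# Hypothetical helper: list the stored timing data files for this job.
show_timing_data() {
  if [ -z "${CIRCLE_INTERNAL_TASK_DATA:-}" ]; then
    echo "CIRCLE_INTERNAL_TASK_DATA is not set" >&2
    return 1
  fi
  local dir="$CIRCLE_INTERNAL_TASK_DATA/circle-test-results"
  if [ ! -d "$dir" ]; then
    echo "no timing data directory at $dir" >&2
    return 1
  fi
  # Print every file path in sorted order; inspect individual files with cat or less
  find "$dir" -type f | sort
}
```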
Variable Length Tests
You may have a test, or a suite of tests, whose completion time naturally varies greatly, whether by design or for other reasons. This is common with UI or unit testing. If a test's timing changes substantially from run to run, the CircleCI splitting system cannot derive reliable timing data for it.
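One way to spot variable-length tests is to compare a single test file's duration across several runs. The hypothetical timing_spread helper below reads one duration per line and prints the mean and the coefficient of variation (standard deviation divided by the mean). How you collect those durations, for example from the JUnit reports of several builds, is left as an assumption.

```shell
# Hypothetical helper: summarize how much one test file's duration varies.
# Input: one duration (seconds) per line. Output: mean and coefficient of variation.
timing_spread() {
  awk '
    { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0) { print "no data"; exit 1 }
      mean = sum / n
      var = sumsq / n - mean * mean   # population variance
      if (var < 0) var = 0            # guard against rounding error
      cv = (mean > 0) ? sqrt(var) / mean : 0
      printf "mean=%.2f cv=%.2f\n", mean, cv
    }'
}
```

A file whose coefficient of variation is much higher than that of your other files is a likely source of unstable timing data.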
Consider A Balancing Library