Understanding Network Charges and Workspaces
Within CircleCI, network egress charges can apply when making use of self-hosted runners or the IP ranges feature.
This means that restoring or attaching any caches, artifacts, or workspaces will incur a charge.
Of these, users are most often caught out by the cost of attaching workspaces multiple times across a large workflow.
This article aims to explain this pitfall and suggest some optimisations to reduce egress costs.
A Brief Recap on Workspaces
Workspaces are a tool that allows you to persist data between jobs in a given workflow.
Each workflow can contain up to one associated workspace, which can only be used within the workflow it was created in.
As such, whenever a workspace is attached, the job downloads all data previously persisted to the workspace across the whole workflow. For example, if two parallel jobs each persist 20MB of unique data, any subsequent job that attaches the workspace will receive both, resulting in 40MB of data being attached.
How This Affects Egress Charges
When using self-hosted runners or IP ranges, the network egress charges for a workspace apply only when the workspace is attached.
Because an attached workspace contains everything previously persisted, a poorly optimised workspace can grow rapidly as the workflow progresses, and every later attach downloads the full accumulated size. This in turn can lead to higher than expected network charges.
The following diagram outlines an example workflow with different jobs.
Each job persists a certain amount of data, shown as the number above each box, and attaches a certain amount, shown as the number on the arrows.
The total amount of data attached is shown as the larger numbers below.
Although none of the individual persists is particularly large, the total network egress associated with the workflow grows quickly.
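The accumulation described above can be sketched with a small model. This is only an illustration, not how CircleCI calculates billing: the job names, sizes, and the `WORKFLOW` structure are hypothetical, and it assumes that a job attaching the workspace downloads everything persisted by the jobs upstream of it.

```python
# Hypothetical workflow: job name -> (MB persisted, required upstream jobs, attaches workspace?)
WORKFLOW = {
    "build":  (20, [], False),
    "test_a": (20, ["build"], True),
    "test_b": (20, ["build"], True),
    "deploy": (0, ["test_a", "test_b"], True),
}


def upstream_jobs(workflow, job):
    """Return every job that must finish before `job` (its transitive requirements)."""
    seen = set()
    stack = list(workflow[job][1])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(workflow[current][1])
    return seen


def attach_size_mb(workflow, job):
    """MB downloaded when `job` attaches the workspace (0 if it never attaches)."""
    _persisted, _requires, attaches = workflow[job]
    if not attaches:
        return 0
    # Assumption: an attach pulls in everything persisted by upstream jobs.
    return sum(workflow[up][0] for up in upstream_jobs(workflow, job))


def total_egress_mb(workflow):
    """Sum of all attach downloads across the workflow."""
    return sum(attach_size_mb(workflow, job) for job in workflow)


for name in WORKFLOW:
    print(f"{name}: attaches {attach_size_mb(WORKFLOW, name)} MB")
print(f"total attached: {total_egress_mb(WORKFLOW)} MB")
```

In this made-up workflow only 60 MB is ever persisted, yet the attaches add up to 100 MB, because `deploy` downloads the data from all three earlier jobs.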
Optimisation Techniques
To reduce this effect, there are a few techniques that can help:
- Reducing the scope of workspace persists
  - Persist only what is strictly necessary for your workflow
  - Avoid making use of wildcards, and explicitly state the paths to be persisted
- Making use of caches
  - For specific data that may not be needed in every job, making use of caches can allow more granular control of what data is restored
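To judge how much a wildcard adds compared with explicit paths, one option is to measure the candidate paths locally before persisting them. The helper below is a hypothetical sketch (`persisted_size_bytes` is not part of any CircleCI tooling), and Python's glob rules are only an approximation of how CircleCI resolves the paths in a persist step. It assumes that a matched directory is persisted together with all of its contents.

```python
from pathlib import Path


def persisted_size_bytes(root, patterns):
    """Total size in bytes of the distinct files under `root` matched by any pattern."""
    root = Path(root)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                matched.add(path)
            elif path.is_dir():
                # Assumption: a matched directory is persisted with everything inside it.
                matched.update(p for p in path.rglob("*") if p.is_file())
    # Files matched by more than one pattern are counted once.
    return sum(path.stat().st_size for path in matched)
```

Comparing, for example, `persisted_size_bytes(".", ["*"])` with `persisted_size_bytes(".", ["dist/app.bin"])` (both paths are placeholders) gives a rough idea of how much data each later attach would download in either case.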