If your GHA Windows jobs are I/O heavy, double check whether the I/O work is happening on the C: drive (OS) or D: drive (working space). If the C: drive is being used, you will likely benefit by moving that work to the D: drive. If you’re feeling more adventurous, you should also consider trying a ReFS/Dev Drive.
A few weeks ago, I was waiting for pip’s CI to turn π so I could merge a PR. I’d just spent an hour on review and had pushed a few final tweaks. Waiting an additional 20 minutes was the last thing I wanted to do.1
I was sufficiently annoyed with checking and re-checking the CI status, so I enabled auto-merging on the repository so I could at least focus my attention on something entirely different while waiting for CI to pass. That smoothed over the pain, but it got me wondering, can pip’s tests (and CI in particular) be faster?
It turns out for the Windows CI jobs, the answer was: yes, they could be much faster.
How slow was pip’s Windows CI?
At the time of writing2, pip’s Github Actions CI workflow consists of 28 jobs. 23 run the test suite across Ubuntu, macOS, Windows, and CPython 3.8 through 3.13.
- 7 on the
ubuntu-latest
runners - 6 on the
macos-latest
(ARM) runners - 6 on the
macos-13
(Intel) runners - 4 on the
windows-latest
runners
The extra Ubuntu job runs the functional test suite against a zipapp version of pip.
On the other hand, the Windows jobs have historically been so slow that we don’t even test pip across the entire supported Python version matrix. We only test the oldest and newest version, 3.8 and 3.13, on Windows. Plus, the test suite is sharded over two jobs per version becauseβas you can probably guessβthe Windows jobs are that much slower π.
I wrote a script that queries the GitHub API and calculates the average job duration over the last X successful runs. As a baseline, this is what pip’s CI looked like:
Last 50 CI runs on main
Job Mean Minimum Stdev
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
tests / 3.10 / macos-latest 6:25 5:44 0:34 (8%)
tests / 3.11 / macos-latest 6:40 6:03 0:39 (9%)
tests / 3.9 / macos-latest 6:57 6:03 0:55 (13%)
tests / 3.13 / macos-latest 6:59 6:15 0:37 (8%)
tests / 3.12 / macos-latest 7:09 6:14 1:01 (14%)
tests / 3.8 / macos-latest 7:11 6:32 0:31 (7%)
tests / 3.11 / ubuntu-latest 9:33 9:08 0:10 (1%)
tests / 3.10 / ubuntu-latest 10:04 9:43 0:14 (2%)
tests / 3.12 / ubuntu-latest 10:34 10:16 0:14 (2%)
tests / 3.13 / ubuntu-latest 10:35 10:10 0:12 (2%)
tests / 3.13 / Windows / 2 10:39 9:55 0:20 (3%)
tests / 3.8 / ubuntu-latest 11:02 10:41 0:14 (2%)
tests / 3.13 / Windows / 1 11:13 10:29 0:24 (3%)
tests / 3.9 / ubuntu-latest 11:44 11:20 0:19 (2%)
tests / 3.10 / macos-13 12:10 8:53 2:16 (18%)
tests / 3.11 / macos-13 12:59 9:09 2:34 (19%)
tests / 3.13 / macos-13 12:59 9:54 2:41 (20%)
tests / 3.9 / macos-13 13:10 10:09 2:42 (20%)
tests / 3.8 / macos-13 13:34 10:09 2:30 (18%)
tests / 3.12 / macos-13 13:55 9:54 3:23 (24%)
tests / 3.8 / Windows / 2 14:58 14:04 0:26 (2%)
tests / zipapp 16:53 16:27 0:15 (1%)
tests / 3.8 / Windows / 1 17:37 16:40 0:30 (2%)
Among the slowest three jobs, the Windows 3.8 jobs hold the 3rd and 1st places for duration. Yikes!
Windows file I/O is poor on GitHub Actions
It’s a known issue that file I/O on the Windows runners is noticeably slower than the Ubuntu and macOS runners.
As a package installer, pip’s test suite is especially I/O heavy. Most functional tests require at least setting a scratch environment and installing something into it. Countless files and directories are created, opened, and deleted during a run.
Moving TEMP to the D: drive
According to Steve Dower, the GHA Windows workers have a slow (but read-optimized) C: system drive and a faster D: drive for working space.
Only the standard Windows runners have a D: drive. The larger (paid) Window runners do NOT have a D: drive. For such situations, ReFS/Dev Drives should be considered.
For many projects, this is probably fine. The repository is still checked out on the D:
drive. Any files you create in the checkout will be on the D: drive. For pip’s test suite,
however, all of the temporary files are created under the system temporary directory.
Where is this path set to by default? On the C: drive at
C:/Users/runneradmin/AppData/Local/Temp/
.3
Thus, pip’s CI was being slowed down by reading and writing primarily to the non-performant disk.
The good news is that this can be overridden by setting the TEMP
environment variable
which Python’s tempfile
will respect.
- name: Set TEMP to D:/Temp
run: |
mkdir "D:\\Temp"
echo "TEMP=D:\\Temp" >> $env:GITHUB_ENV
This simple switch to D:/Temp
translated into the Windows CI time decreasing
by up to 30%. π
Job | Before | After | % decrease |
---|---|---|---|
Python 3.8 (1) | 17m 37s | 12m 41s | 28% |
Python 3.8 (2) | 14m 58s | 10m 44s | 28% |
Python 3.13 (1) | 11m 13s | 9m 53s | 11% |
Python 3.13 (2) | 10m 39s | 9m 32s | 10% |
Or in graph form (note: the data shown here is slightly more recent than the statistics).
What about ReFS or Dev Drives?
When I was investigating ways to improve file I/O performance on Windows, I’d discovered Dev Drives. Now, I am no Windows expert. I stopped using the OS on a daily basis many years ago, but the TL;DR is they are a new kind of storage volume, built on the modern ReFS (Resilient File System) filesystem and designed with developer workloads in mind.
The
uv project has had good success with using ReFS to speed up their Windows GHA jobs,
thus I actually started with a ReFS drive4 while optimising pip’s Windows CI.
Unfortunately, using a ReFS drive
was noticeably slower than simply moving TEMP
to the D: drive. ReFS did save ~a
minute for Python 3.8 (1)
but that pales in comparison with the nearly five minutes
improvement from TEMP=D:/Temp
.
The performance improvements of ReFS/Dev Drives are primarily realized by deferring malware scans (Dev Drive only) and the inherent optimizations of the newer ReFS filesystem. Presumably the malware scans are already disabled on GitHub Actions, which would explain why pip’s CI did not see as big of an improvement as I was expecting.
Given uv’s success with ReFS, I can’t universally recommend using the D: drive over
ReFS/Dev Drives, and vice versa. Which one will be faster is likely workload
dependent. If you would like to experiment with ReFS/Dev Drives, my Powershell script to
create such a drive can be found here. To invoke it, simply pass the desired
drive letter and volume size: devdrive.ps1 -Drive R -Size 4GB
.
Happy experimenting!
Before you kill me for complaining, let me provide some context… When I used to maintain Black, I never really noticed how CI took. Most of the jobs would finish in 3-6 minutes. Short enough that I could look over my changes one last time, add notes as comments, and then come back to merge.
I did notice that Black’s test suite was slow, taking a good few minutes on the slow laptop I had at that time. I took a look at the test logic, trimmed redundant work, and added pytest-xdist for parallelized testing. Afterwards, local test suite runs took less a minute!
I was spoiled by Black’s fast tests and CI, admittedly. ↩︎I wrote the majority of this post in January… ↩︎
We’ve actually set it to
C:/Temp
to work around filename too long errors. Also, the exact default system temporary directory may vary from worker to worker. I don’t feel like double checking this. ↩︎When I did my experiments, the Windows GHA runners did not support Dev Drives. Only ReFS was supported. ↩︎