ci: cache the sccache directory across C++ test builds #765
54357f1 to 73b91d4
Thanks for working on this. I was just about to file an issue to track enabling sccache on all platforms after finishing […]. Building third-party dependencies is quite expensive, so if this works well, it can help reduce GitHub Actions resource consumption. +1 for this direction.
  shell: bash
  run: bash ci/scripts/start_minio.sh
- name: Restore sccache cache
  uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
It looks like the Windows workflow was using mozilla-actions/sccache-action together with SCCACHE_GHA_ENABLED, while this PR changes it to use explicit actions/cache restore/save steps plus mozilla-actions/sccache-action.
I'm not sure which approach is better, but in case you missed it, the mozilla-actions/sccache-action README documents the SCCACHE_GHA_ENABLED setup here:
https://github.com/mozilla-actions/sccache-action#cc-code
Hey @zhjwpku, thank you for the review!
With SCCACHE_GHA_ENABLED, sccache uploads each compiled object as its own cache entry, which adds up to hundreds per build. On the larger Linux/macOS builds this hits GitHub's upload rate-limit throttle and surfaces as cache write errors. The build still passes when we hit the throttle, but the affected objects silently don't get cached. Pointing sccache at a local directory with a single actions/cache save/restore turns it into one archive per leg, so there's nothing to throttle. In my fork that brought write errors down to zero, and it was a bit faster overall since it uploads once instead of hundreds of small objects that each take a network round trip.
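For readers following along, the single-archive approach described above might look roughly like the workflow fragment below. This is a sketch under stated assumptions, not the PR's actual diff: the `.sccache` path, the key layout, the `2G` size cap, the `mozilla-actions/sccache-action` version tag, and the CMake invocation are all illustrative. Only the pinned `actions/cache` commit is taken from the diff under review.

```yaml
# Sketch only: path, key layout, size cap, and build commands are assumptions.
steps:
  - name: Restore sccache cache
    uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
    with:
      path: .sccache
      # Cache entries are immutable per key, so each save gets a unique key
      # and restores fall back to the newest entry matching the prefix.
      key: sccache-${{ runner.os }}-${{ github.sha }}
      restore-keys: |
        sccache-${{ runner.os }}-

  - name: Install sccache
    uses: mozilla-actions/sccache-action@v0.0.9 # assumed version tag

  - name: Build
    shell: bash
    env:
      # SCCACHE_GHA_ENABLED is deliberately left unset, so sccache falls back
      # to its local disk cache in SCCACHE_DIR.
      SCCACHE_DIR: ${{ github.workspace }}/.sccache
      SCCACHE_CACHE_SIZE: 2G
    run: |
      cmake -S . -B build \
        -DCMAKE_C_COMPILER_LAUNCHER=sccache \
        -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
      cmake --build build
      sccache --show-stats

  - name: Save sccache cache
    uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
    with:
      path: .sccache
      key: sccache-${{ runner.os }}-${{ github.sha }}
```

With this layout the whole directory is uploaded once per job as a single archive, rather than one cache entry per compiled object.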
Great, in that case, you may want to make the same change to the Meson Windows build as well. I believe it was missed in this PR, unless I'm overlooking something.
Good call out - I did leave it out of this PR. I was worried there might be too many changes in one PR for review as it already has a sizeable diff on the CI files. If you think it'd be fine here I can add it, or if you prefer I can raise it as a follow-up, wdyt?
I think you can add that here in a separate commit. It will trigger another CI run, and we'll be able to see how effective the cache is in practice.
What
Turn on compiler caching (sccache) for the Linux and macOS builds in test, aws_test, sanitizer_test, and sql_catalog_test, and switch the Windows test build to the same setup. main builds once and saves the cache; pull requests reuse it without writing back.
Why
Right now only the Windows builds reuse compiled output — every Linux and macOS build recompiles the whole bundled Arrow/Parquet/Avro/Boost stack from scratch, even though it never changes between PRs. Building it once and reusing it removes most of that repeated work. Saving the cache as a single file (instead of one upload per compiled file) also avoids the upload rate limit that causes "cache write error" spam.
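The "main saves, pull requests only read" policy could be expressed by gating the save step on the branch, as in the hypothetical fragment below. The `if:` condition, the `.sccache` path, and the key layout are assumptions for illustration and may differ from what the PR actually does; the pinned `actions/cache` commit is the one shown in the diff.

```yaml
# Sketch only: the branch condition, path, and key layout are assumptions.
- name: Restore sccache cache
  # Runs on every build, including pull requests (read-only use of the cache).
  uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
  with:
    path: .sccache
    key: sccache-${{ runner.os }}-${{ github.sha }}
    restore-keys: |
      sccache-${{ runner.os }}-

- name: Save sccache cache
  # Only pushes to main write the cache back. On pull requests github.ref is
  # refs/pull/<number>/merge, so this step is skipped.
  if: github.ref == 'refs/heads/main'
  uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
  with:
    path: .sccache
    key: sccache-${{ runner.os }}-${{ github.sha }}
```

Keeping pull requests read-only means the cache always reflects main, and PR runs cannot fill the repository's cache quota with one-off entries.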
Validation
On a warm pull-request run, every build reused the cache: 99.6–99.9% of files came from cache, zero write errors. The heavy builds drop from ~10–27 min to ~1.5–5 min.