tart pull: 284% faster pulls with default concurrency setting #970
This makes `tart pull` times comparable to skopeo, see benchmarks below. Will post the profiling results tomorrow.
Related to #963.
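The change itself lives in Tart's Swift codebase; as a rough, language-agnostic illustration of the idea (fetching image layers with a bounded number of concurrent downloads instead of one at a time), here is a minimal Python sketch. The layer digests and the fetch stub are hypothetical, not Tart's actual API:

```python
# Sketch of bounded-concurrency layer pulls; fetch_layer is a stand-in
# for an HTTP blob download from the registry.
from concurrent.futures import ThreadPoolExecutor

def fetch_layer(digest: str) -> bytes:
    # Hypothetical stub: pretend each layer's content derives from its digest.
    return digest.encode() * 1024

def pull(layers: list[str], concurrency: int = 4) -> int:
    """Download all layers with at most `concurrency` fetches in flight,
    returning the total number of bytes fetched."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        blobs = pool.map(fetch_layer, layers)
        return sum(len(b) for b in blobs)

total = pull([f"sha256:{i:02x}" for i in range(8)], concurrency=4)
```

With `concurrency=1` this degenerates to the old serial behavior; raising it overlaps network latency across layers, which is where the speedup in the benchmarks below comes from.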
Benchmarks
Setup
A Docker Registry on the local machine was used with the following `config.yml`:

To start it:
```shell
git clone https://github.com/distribution/distribution.git
cd distribution
go run cmd/registry/main.go serve config.yml
```

Then, push the image to be later pulled in benchmarks:
Codebase modifications
The following patch was applied to `scripts/run-signed.sh` to produce a `release` binary instead of a `debug` one:

Results
skopeo
Note that skopeo:
- `--image-parallel-copies=1`
- `--image-parallel-copies=2`
- `--image-parallel-copies=4`
- `--image-parallel-copies=8`

This PR
- `--concurrency 1`
- `--concurrency 2`
- `--concurrency 4`
- `--concurrency 8`

main branch of Tart

- `--concurrency 1`
- `--concurrency 2`
- `--concurrency 4`
- `--concurrency 8`

main branch of Tart with 4 MiB hole granularity size

This benchmark allows us to compare the performance boost that the rest of the changes (unrelated to b1b973f) provide.
- `--concurrency 1`
- `--concurrency 2`
- `--concurrency 4`
- `--concurrency 8`

More than 453 seconds:
Relation of zero chunk size to the amount of zero bytes removed/deduplicated
In b1b973f, the zero chunk size was increased from 64 KiB to 4 MiB.
This increased performance by avoiding I/O thrashing. Moreover, according to my tests, the change in the amount of zero bytes removed/deduplicated is less than a percent (0.4%-0.5%).
Previously, when we had no concurrency, 64 KiB chunks were fine because there was no I/O contention from multiple tasks. But now, the choice of zero chunk size matters more, according to the benchmarks above.
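To make the tradeoff concrete, here is a hypothetical Python model (not Tart's actual Swift implementation) of zero-chunk detection: scanning at a coarser granularity issues far fewer I/O operations, at the cost of detecting slightly fewer zero bytes, because any non-zero byte "poisons" its whole chunk:

```python
# Hypothetical model of zero-chunk detection during a pull; not
# Tart's actual (Swift) implementation.
import os
import tempfile

def zero_bytes_skipped(path: str, chunk_size: int) -> int:
    """Bytes a pull could avoid writing because an entire
    chunk_size-sized chunk is zero."""
    skipped = 0
    zero = bytes(chunk_size)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            if chunk == zero[: len(chunk)]:
                skipped += len(chunk)
    return skipped

# Synthetic disk image: two 4 MiB zero runs split by 4 bytes of data.
path = os.path.join(tempfile.mkdtemp(), "disk.img")
with open(path, "wb") as f:
    f.write(bytes(4 * 1024 * 1024))
    f.write(b"data")
    f.write(bytes(4 * 1024 * 1024))

small = zero_bytes_skipped(path, 64 * 1024)        # 64 KiB granularity
large = zero_bytes_skipped(path, 4 * 1024 * 1024)  # 4 MiB granularity
# Coarser chunks detect slightly fewer zero bytes (the 4 data bytes
# disqualify a whole 4 MiB chunk) but need ~64x fewer read operations.
```

On real VM disk images the zero runs are much longer relative to the data, which is why the measured loss in deduplicated bytes stays under a percent while the per-chunk I/O overhead drops dramatically.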
ghcr.io/cirruslabs/macos-sonoma-vanilla:latest@sha256:c2f45c38060134bf22b32e842ff09b4876cc62bfb378cf2c8fb5fb14481e0551
ghcr.io/cirruslabs/macos-sequoia-vanilla:latest@sha256:5db1b4479d188b0db4e372fdac1b5dab1e5ebcf54f7bfa2dfe6868d9c1e29bb4
ghcr.io/cirruslabs/macos-runner:sonoma@sha256:3d427d5f948a0c1dc366d541ad984e52d8b08202395f08845c4f83dc930720dd
ghcr.io/cirruslabs/macos-runner:sequoia@sha256:565cbf64c464b165371b39e304477bd75e1c545329350e7997a6cf45faa3fdc8