@edigaryev edigaryev commented Dec 10, 2024

This makes tart pull times comparable to skopeo's; see the benchmarks below.

Will post the profiling results tomorrow.

Related to #963.

Benchmarks

Setup

A Docker Registry running on the local machine was used, with the following config.yml:

version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /tmp/registry
http:
  addr: :8080

To start it:

git clone https://github.com/distribution/distribution.git
cd distribution
go run cmd/registry/main.go serve config.yml

Then push the image that the benchmarks will later pull:

tart push --insecure macos-sequoia-base 127.0.0.1:8080/a/b:latest

Codebase modifications

The following patch was applied to scripts/run-signed.sh to produce a release binary instead of a debug one:

diff --git a/scripts/run-signed.sh b/scripts/run-signed.sh
index 63a0c22..893144c 100755
--- a/scripts/run-signed.sh
+++ b/scripts/run-signed.sh
@@ -5,14 +5,12 @@
 
 set -e
 
-swift build --product tart
-codesign --sign - --entitlements Resources/tart-dev.entitlements --force .build/debug/tart
+swift build -c release --product tart
+codesign --sign - --entitlements Resources/tart-dev.entitlements --force .build/release/tart
 
 rm -Rf .build/tart.app/
 mkdir -p .build/tart.app/Contents/MacOS .build/tart.app/Contents/Resources
-cp -c .build/debug/tart .build/tart.app/Contents/MacOS/tart
+cp -c .build/release/tart .build/tart.app/Contents/MacOS/tart
 cp -c Resources/embedded.provisionprofile .build/tart.app/Contents/embedded.provisionprofile
 cp -c Resources/Info.plist .build/tart.app/Contents/Info.plist
 cp -c Resources/AppIcon.png .build/tart.app/Contents/Resources
-
-.build/tart.app/Contents/MacOS/tart "$@"

Results

skopeo

Note that skopeo:

  • doesn't decompress any data
  • doesn't sift through the decompressed data (which is on average 2-3 times larger than the compressed data)
  • doesn't analyze decompressed data for zero-chunks
  • doesn't write the decompressed data to disk (which is on average 2-3 times larger than the compressed data)
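To make the extra work in the list above concrete, here is a minimal sketch of zero-chunk analysis while writing decompressed data: all-zero chunks are skipped so they become holes in the output file. This is not Tart's actual implementation; `write_sparse` and `CHUNK_SIZE` are hypothetical names, and the 4 MiB default is only borrowed from the hole granularity discussed in this PR.

```python
import os
import tempfile

# Assumed granularity, borrowed from the 4 MiB value discussed in this PR.
CHUNK_SIZE = 4 * 1024 * 1024


def write_sparse(data: bytes, path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Write data to path, skipping all-zero chunks. Returns the bytes skipped."""
    skipped = 0
    with open(path, "wb") as f:
        # Pre-size the file so that skipped regions read back as zeros.
        f.truncate(len(data))
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            if chunk.count(0) == len(chunk):
                # Zero chunk: leave a hole instead of issuing a write.
                skipped += len(chunk)
                continue
            f.seek(offset)
            f.write(chunk)
    return skipped


if __name__ == "__main__":
    payload = b"\x01" * 1024 + bytes(8192) + b"\x02" * 1024
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "disk.img")
        print("bytes skipped:", write_sparse(payload, target, chunk_size=1024))
```

A tool that only copies compressed layers, as skopeo does here, performs none of this scanning, which is worth keeping in mind when comparing the timings.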

--image-parallel-copies=1

% hyperfine --warmup 0 --runs 3 --prepare 'rm -rf skopeo && sudo purge && sync' -- 'skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=1 docker://127.0.0.1:8080/a/b:latest dir:./skopeo'
Benchmark 1: skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=1 docker://127.0.0.1:8080/a/b:latest dir:./skopeo
  Time (mean ± σ):     24.047 s ±  0.149 s    [User: 13.335 s, System: 14.634 s]
  Range (min … max):   23.878 s … 24.159 s    3 runs

--image-parallel-copies=2

% hyperfine --warmup 0 --runs 3 --prepare 'rm -rf skopeo && sudo purge && sync' -- 'skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=2 docker://127.0.0.1:8080/a/b:latest dir:./skopeo'
Benchmark 1: skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=2 docker://127.0.0.1:8080/a/b:latest dir:./skopeo
  Time (mean ± σ):     18.818 s ±  0.671 s    [User: 13.818 s, System: 17.813 s]
  Range (min … max):   18.356 s … 19.588 s    3 runs

--image-parallel-copies=4

% hyperfine --warmup 0 --runs 3 --prepare 'rm -rf skopeo && sudo purge && sync' -- 'skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=4 docker://127.0.0.1:8080/a/b:latest dir:./skopeo'
Benchmark 1: skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=4 docker://127.0.0.1:8080/a/b:latest dir:./skopeo
  Time (mean ± σ):     15.227 s ±  2.067 s    [User: 13.761 s, System: 21.875 s]
  Range (min … max):   13.134 s … 17.267 s    3 runs

--image-parallel-copies=8

% hyperfine --warmup 0 --runs 3 --prepare 'rm -rf skopeo && sudo purge && sync' -- 'skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=8 docker://127.0.0.1:8080/a/b:latest dir:./skopeo'
Benchmark 1: skopeo --insecure-policy copy --src-tls-verify=false --image-parallel-copies=8 docker://127.0.0.1:8080/a/b:latest dir:./skopeo
  Time (mean ± σ):     14.857 s ±  1.905 s    [User: 12.883 s, System: 19.622 s]
  Range (min … max):   12.803 s … 16.567 s    3 runs

This PR

--concurrency 1

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 1'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 1
  Time (mean ± σ):     18.543 s ±  0.126 s    [User: 7.974 s, System: 17.964 s]
  Range (min … max):   18.408 s … 18.657 s    3 runs

--concurrency 2

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 2'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 2
  Time (mean ± σ):     26.944 s ±  0.184 s    [User: 8.363 s, System: 37.398 s]
  Range (min … max):   26.738 s … 27.092 s    3 runs

--concurrency 4

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 4'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 4
  Time (mean ± σ):     25.068 s ±  0.537 s    [User: 8.745 s, System: 38.266 s]
  Range (min … max):   24.462 s … 25.484 s    3 runs

--concurrency 8

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 8'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 8
  Time (mean ± σ):     25.073 s ±  0.566 s    [User: 8.886 s, System: 38.448 s]
  Range (min … max):   24.419 s … 25.411 s    3 runs

main branch of Tart

--concurrency 1

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 1'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 1
  Time (mean ± σ):     28.174 s ±  1.532 s    [User: 8.426 s, System: 21.014 s]
  Range (min … max):   27.095 s … 29.927 s    3 runs

--concurrency 2

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 2'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 2
  Time (mean ± σ):     57.805 s ±  1.059 s    [User: 8.936 s, System: 65.709 s]
  Range (min … max):   56.656 s … 58.744 s    3 runs

--concurrency 4

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 4'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 4
  Time (mean ± σ):     71.114 s ±  1.532 s    [User: 9.225 s, System: 102.556 s]
  Range (min … max):   69.374 s … 72.258 s    3 runs

--concurrency 8

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 8'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 8
  Time (mean ± σ):     503.348 s ± 397.469 s    [User: 9.680 s, System: 3580.003 s]
  Range (min … max):   71.936 s … 854.680 s    3 runs

main branch of Tart with 4 MiB hole granularity size

This benchmark isolates the performance boost provided by the rest of the changes (those unrelated to b1b973f).

--concurrency 1

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 1'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 1
  Time (mean ± σ):     31.905 s ±  2.162 s    [User: 9.221 s, System: 22.799 s]
  Range (min … max):   29.646 s … 33.954 s    3 runs

--concurrency 2

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 2'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 2
  Time (mean ± σ):     36.394 s ±  1.611 s    [User: 9.750 s, System: 44.494 s]
  Range (min … max):   34.682 s … 37.880 s    3 runs

--concurrency 4

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 4'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 4
  Time (mean ± σ):     36.252 s ±  0.445 s    [User: 10.509 s, System: 54.732 s]
  Range (min … max):   35.925 s … 36.759 s    3 runs

--concurrency 8

More than 453 seconds:

% ./scripts/run-signed.sh >/dev/null 2>/dev/null && hyperfine --warmup 0 --runs 3 --prepare 'tart delete 127.0.0.1:8080/a/b:latest ; sudo purge && sync' -- '.build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 8'
Benchmark 1: .build/tart.app/Contents/MacOS/tart pull --insecure 127.0.0.1:8080/a/b:latest --concurrency 8
[...]
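The mean times reported above can be turned into speedup ratios with a few lines of arithmetic. The sketch below only copies the hyperfine means from this comment; the `MEANS` table and `speedup` helper are illustrative names. Each figure comes from 3 runs, and the main-branch --concurrency 8 result has a very large σ, so the ratios are rough indications only.

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above.
# The --concurrency 8 run for "main + 4 MiB holes" has no reported mean,
# so it is left out rather than guessed.
MEANS = {
    "main": {1: 28.174, 2: 57.805, 4: 71.114, 8: 503.348},
    "main + 4 MiB holes": {1: 31.905, 2: 36.394, 4: 36.252},
    "this PR": {1: 18.543, 2: 26.944, 4: 25.068, 8: 25.073},
}


def speedup(baseline: str, candidate: str, concurrency: int) -> float:
    """Return how many times faster candidate is than baseline."""
    return MEANS[baseline][concurrency] / MEANS[candidate][concurrency]


if __name__ == "__main__":
    for baseline in ("main", "main + 4 MiB holes"):
        for concurrency in sorted(MEANS[baseline]):
            ratio = speedup(baseline, "this PR", concurrency)
            print(f"{baseline} vs this PR, --concurrency {concurrency}: {ratio:.2f}x")
```

With these numbers, this PR is roughly 1.5x faster than main at --concurrency 1 and roughly 2.8x faster at --concurrency 4.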

Relation of zero chunk size to the amount of zero bytes removed/deduplicated

In b1b973f, the zero chunk size was increased from 64 KiB to 4 MiB.

This increased performance by not thrashing the I/O. Moreover, according to my tests, going from 64 KiB to 4 MiB reduces the share of zero bytes removed/deduplicated by less than one percentage point (0.4-0.5 percentage points).

Previously, when we had no concurrency, 64 KiB chunks were fine because there was no I/O contention from multiple tasks. But now, the choice of zero chunk size matters more, according to the benchmarks above.
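The trade-off can be illustrated on synthetic data: a chunk counts as removable only when every byte in it is zero, so larger chunks miss zero runs that share a chunk with non-zero data. The sketch below is not the tool used for the measurements that follow; `zero_bytes_removed` is a hypothetical helper and the buffer is made up.

```python
def zero_bytes_removed(data: bytes, chunk_size: int) -> int:
    """Count the bytes that fall into chunks consisting entirely of zeros."""
    removed = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        if chunk.count(0) == len(chunk):
            removed += len(chunk)
    return removed


if __name__ == "__main__":
    # Synthetic 8 KiB buffer: all zeros except a single byte at offset 3000.
    data = bytes(3000) + b"\x01" + bytes(5191)
    for chunk_size in (1024, 4096, 8192):
        removed = zero_bytes_removed(data, chunk_size)
        print(f"chunk size {chunk_size}: {removed} of {len(data)} bytes removable")
```

Real disk images tend to contain long zero runs, which would explain why the measured loss in the tables is so much smaller than in this deliberately adversarial buffer.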

ghcr.io/cirruslabs/macos-sonoma-vanilla:latest@sha256:c2f45c38060134bf22b32e842ff09b4876cc62bfb378cf2c8fb5fb14481e0551

| Size (total) | Zero chunk size | Zero bytes removed (amount) | Zero bytes removed (%) |
|---|---|---|---|
| 50 GB | 64 KiB | 32 GB | 64.07% |
| 50 GB | 256 KiB | 32 GB | 63.98% |
| 50 GB | 1 MiB | 32 GB | 63.84% |
| 50 GB | 4 MiB | 32 GB | 63.66% |
| 50 GB | 16 MiB | 32 GB | 63.12% |
| 50 GB | 64 MiB | 31 GB | 61.21% |

ghcr.io/cirruslabs/macos-sequoia-vanilla:latest@sha256:5db1b4479d188b0db4e372fdac1b5dab1e5ebcf54f7bfa2dfe6868d9c1e29bb4

| Size (total) | Zero chunk size | Zero bytes removed (amount) | Zero bytes removed (%) |
|---|---|---|---|
| 50 GB | 64 KiB | 30 GB | 60.19% |
| 50 GB | 256 KiB | 30 GB | 60.10% |
| 50 GB | 1 MiB | 30 GB | 59.97% |
| 50 GB | 4 MiB | 30 GB | 59.73% |
| 50 GB | 16 MiB | 30 GB | 59.03% |
| 50 GB | 64 MiB | 29 GB | 57.18% |

ghcr.io/cirruslabs/macos-runner:sonoma@sha256:3d427d5f948a0c1dc366d541ad984e52d8b08202395f08845c4f83dc930720dd

| Size (total) | Zero chunk size | Zero bytes removed (amount) | Zero bytes removed (%) |
|---|---|---|---|
| 340 GB | 64 KiB | 120 GB | 35.26% |
| 340 GB | 256 KiB | 120 GB | 35.14% |
| 340 GB | 1 MiB | 119 GB | 34.99% |
| 340 GB | 4 MiB | 118 GB | 34.87% |
| 340 GB | 16 MiB | 118 GB | 34.68% |
| 340 GB | 64 MiB | 117 GB | 34.42% |

ghcr.io/cirruslabs/macos-runner:sequoia@sha256:565cbf64c464b165371b39e304477bd75e1c545329350e7997a6cf45faa3fdc8

| Size (total) | Zero chunk size | Zero bytes removed (amount) | Zero bytes removed (%) |
|---|---|---|---|
| 320 GB | 64 KiB | 118 GB | 36.85% |
| 320 GB | 256 KiB | 118 GB | 36.71% |
| 320 GB | 1 MiB | 117 GB | 36.54% |
| 320 GB | 4 MiB | 116 GB | 36.37% |
| 320 GB | 16 MiB | 115 GB | 36.05% |
| 320 GB | 64 MiB | 114 GB | 35.78% |
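As a cross-check of the "less than a percent" claim, the drop in the share of zero bytes removed can be computed directly from the four tables. The percentages below are copied from the tables; the `PERCENT_REMOVED` layout and the `drop_vs_64k` helper are illustrative only.

```python
CHUNK_SIZES = ["64 KiB", "256 KiB", "1 MiB", "4 MiB", "16 MiB", "64 MiB"]

# "Zero bytes removed (%)" columns, copied from the tables above,
# in the same order as CHUNK_SIZES.
PERCENT_REMOVED = {
    "macos-sonoma-vanilla": [64.07, 63.98, 63.84, 63.66, 63.12, 61.21],
    "macos-sequoia-vanilla": [60.19, 60.10, 59.97, 59.73, 59.03, 57.18],
    "macos-runner:sonoma": [35.26, 35.14, 34.99, 34.87, 34.68, 34.42],
    "macos-runner:sequoia": [36.85, 36.71, 36.54, 36.37, 36.05, 35.78],
}


def drop_vs_64k(image: str, chunk_size: str) -> float:
    """Percentage points lost relative to the 64 KiB chunk size."""
    row = PERCENT_REMOVED[image]
    return round(row[0] - row[CHUNK_SIZES.index(chunk_size)], 2)


if __name__ == "__main__":
    for image in PERCENT_REMOVED:
        print(f"{image}: {drop_vs_64k(image, '4 MiB')} percentage points at 4 MiB")
```

For all four images the drop at 4 MiB stays between 0.39 and 0.48 percentage points, in line with the 0.4%-0.5% figure quoted earlier.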

@edigaryev edigaryev requested a review from fkorotkov as a code owner December 10, 2024 21:49
@edigaryev commented:

> Will post the profiling results tomorrow.

Just to clarify, the main goal I see here in doing profiling is to avoid regressions, namely excessive allocations.

The goal is not to measure the performance, as this is already done in the benchmarks above.

I've also excluded --concurrency 8 from profiling because it's extremely slow and because --concurrency 4 should already highlight any issues.

main branch of Tart

--concurrency 1

[Image: profiler-allocations-target-tart-concurrency-1]

--concurrency 2

[Image: profiler-allocations-target-tart-concurrency-2]

--concurrency 4

[Image: profiler-allocations-target-tart-concurrency-4]

main branch of Tart with 4 MiB hole granularity size

--concurrency 1

[Image: profiler-allocations-target-tart-main-branch-with-4mib-holes-concurrency-1]

--concurrency 2

[Image: profiler-allocations-target-tart-main-branch-with-4mib-holes-concurrency-2]

--concurrency 4

[Image: profiler-allocations-target-tart-main-branch-with-4mib-holes-concurrency-4]

This PR

--concurrency 1

[Image: profiler-allocations-target-this-pr-concurrency-1]

--concurrency 2

[Image: profiler-allocations-target-this-pr-concurrency-2]

--concurrency 4

[Image: profiler-allocations-target-this-pr-concurrency-4]

@edigaryev edigaryev merged commit 31ab421 into main Dec 11, 2024
7 checks passed
@edigaryev edigaryev deleted the tart-pull-speedup branch December 11, 2024 17:49