f41gh7 commented Aug 5, 2025

This commit changes the order of the bucket structure fields and of mu access in order to improve CPU cache locality. Previously, mu and the atomic counters were accessed on different CPU cache lines, which significantly hurt performance on older CPUs.

The following benchmarks show up to a ~30% performance increase for Get and Has; the Set and SetGet differences are within noise:

go test -run=^$ -bench=BenchmarkCache
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/fastcache
cpu: AMD EPYC 7B12
              │    master    │               opt_v3                │
              │    sec/op    │   sec/op     vs base                │
CacheSet-8      4.666m ±  2%   4.674m ± 2%        ~ (p=0.739 n=10)
CacheGet-8      2.364m ±  4%   1.732m ± 7%  -26.74% (p=0.000 n=10)
CacheHas-8      2.171m ±  6%   1.652m ± 2%  -23.89% (p=0.000 n=10)
CacheSetGet-8   11.65m ± 11%   12.03m ± 3%        ~ (p=0.481 n=10)
geomean         4.087m         3.562m       -12.84%

              │    master     │                opt_v3                │
              │      B/s      │     B/s       vs base                │
CacheSet-8      13.39Mi ±  2%   13.37Mi ± 2%        ~ (p=0.755 n=10)
CacheGet-8      26.44Mi ±  5%   36.09Mi ± 7%  +36.50% (p=0.000 n=10)
CacheHas-8      28.80Mi ±  6%   37.83Mi ± 2%  +31.38% (p=0.000 n=10)
CacheSetGet-8   10.73Mi ± 10%   10.39Mi ± 3%        ~ (p=0.469 n=10)
geomean         18.19Mi         20.87Mi       +14.73%

              │    master     │                opt_v3                 │
              │     B/op      │     B/op       vs base                │
CacheSet-8      19.04Ki ± 10%   19.11Ki ±  7%        ~ (p=0.542 n=10)
CacheGet-8      9.598Ki ± 20%   7.006Ki ±  5%  -27.00% (p=0.000 n=10)
CacheHas-8      8.700Ki ± 23%   6.499Ki ±  3%  -25.31% (p=0.000 n=10)
CacheSetGet-8   41.59Ki ±  9%   40.87Ki ± 10%        ~ (p=0.362 n=10)
geomean         16.04Ki         13.73Ki        -14.36%

              │   master    │               opt_v3                │
              │  allocs/op  │  allocs/op   vs base                │
CacheSet-8      32.00 ±  9%   32.50 ±  8%        ~ (p=0.418 n=10)
CacheGet-8      16.00 ± 19%   12.00 ±  8%  -25.00% (p=0.000 n=10)
CacheHas-8      15.00 ± 20%   11.00 ±  9%  -26.67% (p=0.000 n=10)
CacheSetGet-8   71.00 ± 10%   70.00 ± 10%        ~ (p=0.396 n=10)
geomean         27.17         23.41        -13.85%
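The sec/op and B/s tables describe the same change from two directions. Assuming each benchmark reports a fixed number of bytes per operation (as it would when calling b.SetBytes with a constant size), throughput is inversely proportional to time per operation, so a fractional time change Δt corresponds to a throughput change of 1/(1 + Δt) − 1. Checking the CacheGet row:

```math
\frac{\mathrm{B/s}_{\text{opt\_v3}}}{\mathrm{B/s}_{\text{master}}}
  = \frac{t_{\text{master}}}{t_{\text{opt\_v3}}}
  = \frac{1}{1 + \Delta t}
  = \frac{1}{1 - 0.2674}
  \approx 1.365
```

That is +36.5%, matching the reported +36.50%. The same relation gives about +31.4% for CacheHas (Δt = −23.89%) and +14.73% for the geomean (Δt = −12.84%), which is why the roughly 24 to 27% reduction in time per operation appears as a larger percentage in the throughput table.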

Signed-off-by: f41gh7 <nik@victoriametrics.com>
f41gh7 requested review from Copilot, makasim and rtm0 August 5, 2025 09:51

Copilot AI left a comment

Pull Request Overview

This PR optimizes CPU cache locality by reordering fields in the bucket struct and adjusting the mutex acquisition order to improve the performance of bucket.Get requests. Based on the benchmark results, this change delivers up to a 30% performance improvement for cache Get and Has operations.

Key changes:

  • Reordered atomic counter fields (getCalls, setCalls, misses) to be adjacent to the mutex
  • Moved mutex acquisition before accessing the chunks field in the Set and Get methods

codecov bot commented Aug 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.68%. Comparing base (66aca6e) to head (c311ee8).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #92   +/-   ##
=======================================
  Coverage   76.68%   76.68%           
=======================================
  Files           4        4           
  Lines         549      549           
=======================================
  Hits          421      421           
  Misses         73       73           
  Partials       55       55           

f41gh7 merged commit 2693e48 into master Aug 5, 2025
7 checks passed
f41gh7 deleted the get-perf branch August 5, 2025 12:45