improve bucket.Get requests performance #92
This commit changes the order of the bucket struct fields and the point
at which mu is acquired in order to improve CPU cache locality.
Previously, mu and the atomic counters sat on different CPU cache lines,
which noticeably hurt performance on older CPUs.
The following benchmark shows Get and Has running about 24-27% faster (31-37% higher throughput), while Set and SetGet stay unchanged within noise:
go test -run=^$ -bench=BenchmarkCache
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/fastcache
cpu: AMD EPYC 7B12
              │ master       │ opt_v3                                 │
              │ sec/op       │ sec/op         vs base                 │
CacheSet-8      4.666m ± 2%    4.674m ± 2%          ~ (p=0.739 n=10)
CacheGet-8      2.364m ± 4%    1.732m ± 7%    -26.74% (p=0.000 n=10)
CacheHas-8      2.171m ± 6%    1.652m ± 2%    -23.89% (p=0.000 n=10)
CacheSetGet-8   11.65m ± 11%   12.03m ± 3%          ~ (p=0.481 n=10)
geomean         4.087m         3.562m         -12.84%

              │ master       │ opt_v3                                 │
              │ B/s          │ B/s            vs base                 │
CacheSet-8      13.39Mi ± 2%   13.37Mi ± 2%         ~ (p=0.755 n=10)
CacheGet-8      26.44Mi ± 5%   36.09Mi ± 7%   +36.50% (p=0.000 n=10)
CacheHas-8      28.80Mi ± 6%   37.83Mi ± 2%   +31.38% (p=0.000 n=10)
CacheSetGet-8   10.73Mi ± 10%  10.39Mi ± 3%         ~ (p=0.469 n=10)
geomean         18.19Mi        20.87Mi        +14.73%

              │ master       │ opt_v3                                 │
              │ B/op         │ B/op           vs base                 │
CacheSet-8      19.04Ki ± 10%  19.11Ki ± 7%         ~ (p=0.542 n=10)
CacheGet-8      9.598Ki ± 20%  7.006Ki ± 5%   -27.00% (p=0.000 n=10)
CacheHas-8      8.700Ki ± 23%  6.499Ki ± 3%   -25.31% (p=0.000 n=10)
CacheSetGet-8   41.59Ki ± 9%   40.87Ki ± 10%        ~ (p=0.362 n=10)
geomean         16.04Ki        13.73Ki        -14.36%

              │ master       │ opt_v3                                 │
              │ allocs/op    │ allocs/op      vs base                 │
CacheSet-8      32.00 ± 9%     32.50 ± 8%           ~ (p=0.418 n=10)
CacheGet-8      16.00 ± 19%    12.00 ± 8%     -25.00% (p=0.000 n=10)
CacheHas-8      15.00 ± 20%    11.00 ± 9%     -26.67% (p=0.000 n=10)
CacheSetGet-8   71.00 ± 10%    70.00 ± 10%          ~ (p=0.396 n=10)
geomean         27.17          23.41          -13.85%
Signed-off-by: f41gh7 <nik@victoriametrics.com>
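The cache-locality argument in the commit message can be checked roughly with unsafe.Offsetof. The sketch below is not the fastcache code: spreadBucket and packedBucket are hypothetical layouts that only borrow the field names mentioned in this PR, and the 64-byte cache line size is an assumption that holds for typical amd64 CPUs. Offsets are relative to the start of the struct, and Go does not guarantee that a struct starts on a cache line boundary, so the result only indicates how far apart the fields are.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// cacheLineSize is an assumption: 64 bytes is typical for amd64 CPUs.
const cacheLineSize = 64

// spreadBucket is a hypothetical layout in which the counters are
// declared after the larger fields, far away from mu.
type spreadBucket struct {
	mu       sync.RWMutex
	chunks   [][]byte
	m        map[uint64]uint64
	idx      uint64
	gen      uint64
	getCalls uint64
	setCalls uint64
	misses   uint64
}

// packedBucket is a hypothetical layout in which the counters sit
// directly next to mu.
type packedBucket struct {
	getCalls uint64
	setCalls uint64
	misses   uint64
	mu       sync.RWMutex
	chunks   [][]byte
	m        map[uint64]uint64
	idx      uint64
	gen      uint64
}

// lineOf returns the index of the cache line that holds the byte at the
// given offset, counted from the start of the struct.
func lineOf(offset uintptr) uintptr {
	return offset / cacheLineSize
}

// spreadOffsets reports the offsets of mu and getCalls in spreadBucket.
func spreadOffsets() (mu, getCalls uintptr) {
	var b spreadBucket
	return unsafe.Offsetof(b.mu), unsafe.Offsetof(b.getCalls)
}

// packedOffsets reports the offsets of mu and getCalls in packedBucket.
func packedOffsets() (mu, getCalls uintptr) {
	var b packedBucket
	return unsafe.Offsetof(b.mu), unsafe.Offsetof(b.getCalls)
}

func main() {
	sm, sg := spreadOffsets()
	pm, pg := packedOffsets()
	fmt.Printf("spread: mu at %d (line %d), getCalls at %d (line %d)\n",
		sm, lineOf(sm), sg, lineOf(sg))
	fmt.Printf("packed: mu at %d (line %d), getCalls at %d (line %d)\n",
		pm, lineOf(pm), pg, lineOf(pg))
}
```

When the two line indices differ, a call that bumps a counter and then takes the lock touches two cache lines instead of one, which is the effect the commit message describes.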
Pull Request Overview
This PR optimizes CPU cache locality by reordering fields in the bucket struct and adjusting mutex acquisition order to improve performance of bucket.Get requests. Based on benchmark results, this change delivers up to 30% performance improvement for cache operations.
Key changes:
- Reordered atomic counter fields (getCalls, setCalls, misses) to be adjacent to the mutex
- Moved mutex acquisition before accessing chunks field in Set and Get methods
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master      #92   +/-   ##
=======================================
  Coverage   76.68%   76.68%
=======================================
  Files           4        4
  Lines         549      549
=======================================
  Hits          421      421
  Misses         73       73
  Partials       55       55