tweaks to modular effort tradeoffs #4236
Conversation
Force-pushed from 908c5f3 to f875283.
This reminds me: I was going to try re-enabling P15 at effort 9. Instead of that, though, we might explore a wider predictor overhaul, adding new options, like P14, that try a subset of the most commonly used predictors. We could possibly even replace P14, given how slow Weighted is for encoding/decoding with only marginal improvement over Gradient in most cases, but that will need to be tested and discussed.
Force-pushed from f875283 to 0f01a23.
I did some testing recently, and I think it should match well with the higher MA percentage in this PR; it also uses another feature which is disabled by default.
Force-pushed from 6cd38e6 to 3e87bf6.
That could make sense, yes. Let's do it in another PR, though.
Rebased this. Now the performance impact is as follows:
Before: [benchmark table omitted]
After: [benchmark table omitted]
The 'before' is now better than the 'after' was before (other improvements have been made in the meantime), but it looks like this is still an improvement, Pareto-wise: at every effort setting, compression slightly improves, and encode speed either improves or remains similar.
Is there any intent/reason behind the following line?
libjxl/lib/jxl/modular/encoding/enc_ma.cc, line 979 (at 7cac2ac)
Also related to #4232 and #4154
Chipping away at the Pareto front, these tweaks aim to (slightly) improve the effort/density trade-offs.
Changes:
- Fix `max_property_values` to actually get respected; with that in place, we can bump up the number of property value quantization buckets at all effort settings (which improves density at the cost of some speed, though overall this has little speed impact).
- The `nb_repeats` parameter (which can be configured via the API but by default is just 0.5 at all efforts) is now modulated by effort too, i.e. lower effort also uses fewer samples for MA tree learning. This speeds up lower efforts and slows down higher efforts, remaining neutral at default effort. (A sketch of the idea appears after the TL;DR.)
- Simplify the WP handling in split selection: `adds_wp` could be false even though the candidate split does use the WP (when the parent node already used the WP), which could lead to selecting a suboptimal split because of the `fast_decode_multiplier` preferring a slightly worse split with `adds_wp == false` to a better split with `adds_wp == true` (which doesn't make sense if in both options the WP is used anyway). The simpler logic is slightly faster and denser (the difference is small, though). (A minimal sketch follows this list.)
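To make the `adds_wp` point concrete, here is a minimal sketch of the kind of scoring logic involved. All names (`SplitCandidate`, `OldScore`, `NewScore`) and the exact cost formula are hypothetical, for illustration only; the real logic lives in `enc_ma.cc`.

```cpp
// Hypothetical sketch of the split-selection issue described above.
// Assumption: fast_decode_multiplier (< 1) is used to favor splits that
// avoid the Weighted Predictor (WP), since WP-free trees decode faster.
struct SplitCandidate {
  float estimated_cost;  // estimated bits after this split; lower is better
  bool uses_wp;          // this split's predictor is the WP
};

// Old behavior: only splits that *newly introduce* the WP were penalized,
// so under a parent node that already used the WP, a worse split could win
// merely because its own adds_wp flag happened to be false.
float OldScore(const SplitCandidate& c, bool parent_uses_wp,
               float fast_decode_multiplier) {
  bool adds_wp = c.uses_wp && !parent_uses_wp;
  return adds_wp ? c.estimated_cost / fast_decode_multiplier
                 : c.estimated_cost;
}

// Simpler logic: if the WP is used anywhere on the path, decoding needs it
// anyway, so penalize based on actual WP use. When the parent already uses
// the WP, both candidates are penalized equally and the choice is undistorted.
float NewScore(const SplitCandidate& c, bool parent_uses_wp,
               float fast_decode_multiplier) {
  bool uses_wp = c.uses_wp || parent_uses_wp;
  return uses_wp ? c.estimated_cost / fast_decode_multiplier
                 : c.estimated_cost;
}
```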
Before (jyrki31 corpus): [benchmark table omitted]
After: [benchmark table omitted]
TL;DR: e4-e6 become faster and slightly denser (so just better), e7 stays about the same (a tiny bit denser and slower, maybe), e8+ become slightly denser and slower.
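For the `nb_repeats` change, a rough sketch of the idea under stated assumptions: the function name and the linear ramp pivoting at the default effort (7) are made up for illustration; only the behavior described in this PR (fewer MA-tree-learning samples at lower effort, neutral at the default) is taken from the source.

```cpp
#include <algorithm>

// Hypothetical: scale the MA-tree-learning sampling fraction with effort,
// leaving the default effort (7) unchanged. The linear ramp is an assumed
// stand-in for whatever modulation the PR actually uses.
float EffectiveNbRepeats(float nb_repeats /* API value, default 0.5 */,
                         int effort /* 1..10 */) {
  float scale = static_cast<float>(effort) / 7.0f;  // neutral at e7
  return std::min(1.0f, nb_repeats * scale);
}
```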