
Conversation

@jonsneyers
Member

Also related to #4232 and #4154

Chipping away at the Pareto front, these tweaks aim to (slightly) improve the effort/density trade-offs.

Changes:

  • After the bugfix in #4154 (Fix inefficiency in quantize histogram), which made max_property_values actually be respected, we can bump up the number of property value quantization buckets at all effort settings. This improves density at the cost of some speed, though the overall speed impact is small.
  • The nb_repeats parameter (which can be configured via the API but by default is 0.5 at all efforts) is now modulated by effort, i.e. lower efforts also use fewer samples for MA tree learning (a rough sketch follows this list). This speeds up lower efforts and slows down higher efforts, remaining neutral at the default effort.
  • Simplified/improved the tree learning heuristics a little, since the logic was a bit wonky: adds_wp could be false even though the candidate split does use the WP (when the parent node already used the WP). This could lead to selecting a suboptimal split, because the fast_decode_multiplier prefers a slightly worse split with adds_wp == false over a better split with adds_wp == true, which doesn't make sense if the WP is used in both options anyway. The simpler logic is slightly faster and denser (though the difference is small).
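
To make the last two bullets concrete, here is a minimal sketch of both ideas; the function names, the effort-based scaling curve, and the constants are assumptions for illustration, not the actual libjxl code:

#include <algorithm>

// Sketch: scale the MA-tree sampling fraction with effort, staying neutral
// at the default effort (7) and never sampling more than all pixels.
float EffortScaledNbRepeats(float nb_repeats, int effort) {
  return std::min(1.0f, nb_repeats * (effort / 7.0f));
}

// Sketch: a candidate split only counts as "adding the WP" when the parent
// node does not already use the weighted predictor; otherwise the
// fast_decode_multiplier bias could prefer a worse split even though the WP
// cost is paid either way.
bool SplitAddsWP(bool candidate_uses_wp, bool parent_uses_wp) {
  return candidate_uses_wp && !parent_uses_wp;
}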

Before: (jyrki31 corpus)

31 images
Encoding      kPixels    Bytes          BPP  E MP/s  D MP/s     Max norm  SSIMULACRA2   PSNR        pnorm       BPP*pnorm   QABPP   Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4        13270 17162582   10.3463459   5.225  49.279          nan 100.00000000  99.99   0.00000000  0.000000000000  10.346      0
jxl:d0:5        13270 16971925   10.2314097   2.893  39.969          nan 100.00000000  99.99   0.00000000  0.000000000000  10.231      0
jxl:d0:6        13270 16860935   10.1645001   1.849  35.470          nan 100.00000000  99.99   0.00000000  0.000000000000  10.165      0
jxl:d0:7        13270 16638016   10.0301149   1.188  31.430          nan 100.00000000  99.99   0.00000000  0.000000000000  10.030      0
jxl:d0:8        13270 16534367    9.9676308   0.319  31.807          nan 100.00000000  99.99   0.00000000  0.000000000000   9.968      0
jxl:d0:9        13270 16458308    9.9217791   0.235  29.942          nan 100.00000000  99.99   0.00000000  0.000000000000   9.922      0
Aggregate:      13270 16769172   10.1091812   1.164  35.760   0.00000000 100.00000000  99.99   0.00000000  0.000000000000  10.109      0

After:

31 images
Encoding      kPixels    Bytes          BPP  E MP/s  D MP/s     Max norm  SSIMULACRA2   PSNR        pnorm       BPP*pnorm   QABPP   Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4        13270 17117248   10.3190166   6.899  44.328          nan 100.00000000  99.99   0.00000000  0.000000000000  10.319      0
jxl:d0:5        13270 16934902   10.2090906   3.290  39.130          nan 100.00000000  99.99   0.00000000  0.000000000000  10.209      0
jxl:d0:6        13270 16856572   10.1618699   2.078  35.075          nan 100.00000000  99.99   0.00000000  0.000000000000  10.162      0
jxl:d0:7        13270 16635589   10.0286518   1.167  32.788          nan 100.00000000  99.99   0.00000000  0.000000000000  10.029      0
jxl:d0:8        13270 16532724    9.9666403   0.305  32.073          nan 100.00000000  99.99   0.00000000  0.000000000000   9.967      0
jxl:d0:9        13270 16444861    9.9136727   0.215  30.934          nan 100.00000000  99.99   0.00000000  0.000000000000   9.914      0
Aggregate:      13270 16751992   10.0988244   1.238  35.433   0.00000000 100.00000000  99.99   0.00000000  0.000000000000  10.099      0

TL;DR: e4-e6 become faster and slightly denser (so just better), e7 stays about the same (perhaps a tiny bit denser and slower), and e8+ become slightly denser and slower.

@jonsneyers jonsneyers requested a review from veluca93 May 7, 2025 09:01
@jonsneyers jonsneyers mentioned this pull request May 7, 2025
@jonsneyers jonsneyers force-pushed the modular_effort_tweaks branch 2 times, most recently from 908c5f3 to f875283 on May 7, 2025 09:46
@jonnyawsom3
Collaborator

This reminds me, I was going to try re-enabling P15 at effort 9.
It was previously disabled because e9 was slower than e10, but that only applies to images under 2048 x 2048, where Local MA trees (and, effectively, multithreading) are disabled.

Instead of that, though, we might explore a wider predictor overhaul: adding new options like P14 that try a subset of the most commonly used predictors, and possibly even replacing P14, since Weighted is slow to encode and decode with only marginal improvement over Gradient in most cases. That will need to be tested and discussed, though.

@jonsneyers jonsneyers force-pushed the modular_effort_tweaks branch from f875283 to 0f01a23 on June 4, 2025 12:49
@jonnyawsom3
Collaborator

jonnyawsom3 commented Aug 13, 2025

I did some testing recently, and I think -E 1 could be enabled at effort 9, with faster-decoding level 2 defaulting it back to 0. It has a small encode/decode speed penalty, but the density improvement can be better than that of -P 15, which is enabled at effort 10.

It should match well with the higher MA percentage in this PR, and uses another feature which is disabled by default.
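
For illustration only, the defaulting rule suggested here could look roughly like this; the function and parameter names are hypothetical, not the actual libjxl API:

// Sketch: default -E to 1 at effort 9 and above, and let a faster-decoding
// level of 2 or more switch it back to 0.
int DefaultNbPrevChannels(int effort, int faster_decoding_level) {
  return (effort >= 9 && faster_decoding_level < 2) ? 1 : 0;
}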

@eustas eustas added the CI:full label (attach to a PR to run the full CI workflow and not just the regular PR workflows) Aug 21, 2025
@eustas eustas enabled auto-merge August 21, 2025 13:43
@jonsneyers jonsneyers force-pushed the modular_effort_tweaks branch from 6cd38e6 to 3e87bf6 on October 17, 2025 14:39
@jonsneyers
Member Author

> I did some testing recently, and I think -E 1 could be enabled at effort 9, with faster-decoding level 2 defaulting it back to 0. It has a small encode/decode speed penalty, but the density improvement can be better than that of -P 15, which is enabled at effort 10.
>
> It should match well with the higher MA percentage in this PR, and uses another feature which is disabled by default.

That could make sense, yes. Let's do it in another PR though.

@jonsneyers
Member Author

Rebased this.

Now the performance impact is as follows:

Before:

31 images
Encoding      kPixels    Bytes          BPP  E MP/s  D MP/s     Max norm  SSIMULACRA2   PSNR        pnorm       BPP*pnorm   QABPP   Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4        13270 17162620   10.3463688   5.390  58.479          nan 100.00000000  99.99   0.00000000  0.000000000000  10.346      0
jxl:d0:5        13270 16908996   10.1934733   3.187  46.673          nan 100.00000000  99.99   0.00000000  0.000000000000  10.193      0
jxl:d0:6        13270 16797889   10.1264932   1.897  40.528          nan 100.00000000  99.99   0.00000000  0.000000000000  10.126      0
jxl:d0:7        13270 16625029   10.0222858   1.181  34.947          nan 100.00000000  99.99   0.00000000  0.000000000000  10.022      0
jxl:d0:8        13270 16478362    9.9338686   0.380  35.581          nan 100.00000000  99.99   0.00000000  0.000000000000   9.934      0
jxl:d0:9        13270 16385839    9.8780917   0.263  33.514          nan 100.00000000  99.99   0.00000000  0.000000000000   9.878      0
Aggregate:      13270 16724387   10.0821830   1.251  40.795   0.00000000 100.00000000  99.99   0.00000000  0.000000000000  10.082      0

After:

31 images
Encoding      kPixels    Bytes          BPP  E MP/s  D MP/s     Max norm  SSIMULACRA2   PSNR        pnorm       BPP*pnorm   QABPP   Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4        13270 17117175   10.3189726   7.764  54.181          nan 100.00000000  99.99   0.00000000  0.000000000000  10.319      0
jxl:d0:5        13270 16872864   10.1716914   3.956  46.188          nan 100.00000000  99.99   0.00000000  0.000000000000  10.172      0
jxl:d0:6        13270 16793526   10.1238630   2.337  40.800          nan 100.00000000  99.99   0.00000000  0.000000000000  10.124      0
jxl:d0:7        13270 16622723   10.0208956   1.249  36.068          nan 100.00000000  99.99   0.00000000  0.000000000000  10.021      0
jxl:d0:8        13270 16474269    9.9314011   0.380  35.807          nan 100.00000000  99.99   0.00000000  0.000000000000   9.931      0
jxl:d0:9        13270 16371339    9.8693505   0.249  34.870          nan 100.00000000  99.99   0.00000000  0.000000000000   9.869      0
Aggregate:      13270 16706772   10.0715641   1.429  40.778   0.00000000 100.00000000  99.99   0.00000000  0.000000000000  10.072      0

The 'before' is now better than the 'after' was before (other improvements have been made in the meantime), but it looks like this is still an improvement, Pareto-wise. At every effort setting, compression improves slightly, and encode speed either improves or stays about the same.

@jonsneyers jonsneyers requested a review from eustas October 17, 2025 15:59
@goodusername123

Is there any intent/reason behind kSquirrel not having a value defined for nb_repeats, or was this just a simple oversight/mistake?

@jonnyawsom3
Collaborator

jonnyawsom3 commented Oct 20, 2025

nb_repeats is capped at 1 in this PR, so I'm not sure why it also has 1.1 set for Kitten and 1.3 for Glacier.
I know higher values should increase the quantization percentage, but then cparams_.options.nb_repeats = std::min(1.0f, cparams_.options.nb_repeats); should cap at 10, not 1.

// Sample 10% of the final number of samples for property quantization.
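
For what it's worth, one possible reading of that interaction, sketched with made-up names rather than the actual libjxl code:

#include <algorithm>
#include <cstdint>

// If MA learning samples nb_repeats * num_pixels values and property
// quantization then uses 10% of those samples, a std::min(1.0f, ...) cap on
// nb_repeats also caps the quantization sampling at 10% of the pixels,
// which would make values like 1.1 or 1.3 have no effect.
int64_t QuantizationSamples(float nb_repeats, int64_t num_pixels) {
  const float capped = std::min(1.0f, nb_repeats);
  return static_cast<int64_t>(0.1f * capped * static_cast<float>(num_pixels));
}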
