-
Notifications
You must be signed in to change notification settings - Fork 317
Description
Describe the bug
During noise synthesis, libjxl generates one batch of 16 floating point numbers for every partial 16-pixel wide row. After filling the row, it generates another batch of 16 floats for the stragglers, in case the the group width is not a multiple of 16 (which it isn't guaranteed to be, for example if the image dimensions are odd).
However, the group width usually is a multiple of 16, and if it is, then it generates this extra batch for the stragglers. And because there are no stragglers, it does nothing with it, so for a group width, of say, 512, it generates 33 batches where only 32 are required, and the last batch is unused.
Look at the loop on line 67 of dec_noise.cc:
for (size_t y = 0; y < ysize; ++y) {
float* JXL_RESTRICT row = rect.Row(noise, y);
size_t x = 0;
// Only entire batches (avoids exceeding the image padding).
for (; x + kFloatsPerBatch <= xsize; x += kFloatsPerBatch) { // <-- HERE, Line 72
rng->Fill(batch);
for (size_t i = 0; i < kFloatsPerBatch; i += Lanes(df)) {
BitsToFloat(reinterpret_cast<const uint32_t*>(batch) + i, row + x + i);
}
}
// Any remaining pixels, rounded up to vectors (safe due to padding).
rng->Fill(batch);
size_t batch_pos = 0; // < kFloatsPerBatch
for (; x < xsize; x += N) {
BitsToFloat(reinterpret_cast<const uint32_t*>(batch) + batch_pos,
row + x);
batch_pos += N;
}
}On line 72 (comment added), there's a non-strict inequality. I believe this should be a strict inequality.
Notice that if xsize is a multiple of kFloatsPerBatch (i.e. 16), then we have x + 16 <= xsize as the limiting condition, which is the same thing as x < xsize, so this fills the entire row. Then rng->Fill(batch); is called again, unconditionally.
Additional Information
This behavior isn't mentioned in the spec. Section K.5.2 in the spec has the following quote:
The next 16 (or the number of remaining samples in the row, whichever is less)
It doesn't explicitly state that this is also generated if the number of remaining samples is zero, and it would be very strange to interpret it that way.