+
Skip to content

Conversation

coreylowman
Copy link
Owner

This moves cpu kernels closer to what cuda kernels do for reductions:

  1. Permute the shape/strides so that all reduced elements are rightmost dims
  2. Iterate over out buffers in order (so no nd-indexing)
  3. Advance the input iter (nd indexed) by the number of elements reduced

Additionally this adds some shortcuts for full reductions (to 0d)

///
/// This makes all the reduced elements sequential when indexed using [NdIndex].
#[inline(always)]
pub(crate) fn index_for_reductions<S: Shape, Ax: Axes>(
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkoppel i'm pretty sure this does exactly what the cuda version below does. Thoughts on if we could combine them? I'd like to keep the cpu one not using vecs, which I think is the biggest difference.

Copy link
Contributor

@nkoppel nkoppel Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can factor out the shared logic from index_for_reductions and permute_for_reductions, but I'd prefer to keep the filtering of broadcasted dimensions in permute_for_reductions, to reduce overhead from get_strided_index for broadcasted tensors.

This implementation looks correct to me, and is likely faster than my implementation, so go ahead and use yours.

Comment on lines +26 to +28
new_shape[num_non_reduced_dims + i_reduced] = dims[i_src];
new_strides[num_non_reduced_dims + i_reduced] = strides[i_src];
i_reduced += 1;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optimization for this is probably to sort the strides in this section of new_shape/new_strides by decreasing size, to maximize locality when indexing.

@coreylowman coreylowman merged commit cc45a0c into main Feb 25, 2023
@coreylowman coreylowman deleted the reduction-optimizations branch February 25, 2023 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载