Replacing conv2d implementation with matmuls #237
Conversation
Did you run the benchmarks for the previous 'naive' implementation?
Just added - they are really slow lol. Didn't scale up the number of channels enough in my original benchmarks of convs, but it was a really simple algorithm to debug and test. All the unit tests made it really easy to verify that this new implementation worked! 😁
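For context, this is roughly the shape of a naive direct convolution for a single image: six nested loops with little data reuse compared to a blocked matmul. Stride 1, no padding; the names and layout here are illustrative, not the repo's actual code:

```rust
// Direct "naive" convolution sketch: O(out_ch * in_ch * out_h * out_w * k * k)
// work, with poor cache behavior on the weight accesses.
// Stride 1, no padding; illustrative only.
fn conv2d_naive(
    img: &[f32],     // in_ch * h * w, CHW layout
    weight: &[f32],  // out_ch * in_ch * k * k
    out: &mut [f32], // out_ch * out_h * out_w
    in_ch: usize,
    h: usize,
    w: usize,
    out_ch: usize,
    k: usize, // square kernel size
) {
    let (out_h, out_w) = (h - k + 1, w - k + 1);
    for oc in 0..out_ch {
        for oy in 0..out_h {
            for ox in 0..out_w {
                let mut acc = 0.0;
                for ic in 0..in_ch {
                    for ky in 0..k {
                        for kx in 0..k {
                            acc += img[ic * h * w + (oy + ky) * w + (ox + kx)]
                                * weight[((oc * in_ch + ic) * k + ky) * k + kx];
                        }
                    }
                }
                out[(oc * out_h + oy) * out_w + ox] = acc;
            }
        }
    }
}
```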
```rust
#[cfg(not(feature = "cblas"))]
unsafe {
    matrixmultiply::sgemm(
        m, k, n, 1.0, a, k as isize, 1, b, n as isize, 1, 1.0, c, n as isize, 1,
    )
}

#[cfg(feature = "cblas")]
unsafe {
    let (m, n, k) = (m as libc::c_int, n as libc::c_int, k as libc::c_int);
    sgemm(RowMajor, NoTr, NoTr, m, n, k, 1.0, a, k, b, n, 1.0, c, n)
}
```
Noting that this is copied from the matmul implementation - using Cpu::mm requires casting the inputs to 2d arrays, which in turn requires generic_const_expr bounds added to the trait, which I wanted to avoid.
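For readers unfamiliar with the strided-GEMM signature, here is a tiny self-contained usage of the non-cblas path. With row-major, contiguous buffers the row stride is the row length and the column stride is 1; this is an illustrative example, not code from the PR:

```rust
// Computes C (m x n) = 1.0 * A (m x k) * B (k x n) + 1.0 * C
// for row-major contiguous matrices. Requires the `matrixmultiply` crate.
fn main() {
    let (m, k, n) = (2, 3, 2);
    let a: Vec<f32> = vec![1., 2., 3., 4., 5., 6.]; // 2x3
    let b: Vec<f32> = vec![7., 8., 9., 10., 11., 12.]; // 3x2
    let mut c: Vec<f32> = vec![0.; m * n]; // 2x2, starts at zero
    unsafe {
        matrixmultiply::sgemm(
            m, k, n,
            1.0,
            a.as_ptr(), k as isize, 1, // A: row stride k, col stride 1
            b.as_ptr(), n as isize, 1, // B: row stride n, col stride 1
            1.0,
            c.as_mut_ptr(), n as isize, 1, // C: row stride n, col stride 1
        );
    }
    assert_eq!(c, vec![58., 64., 139., 154.]);
}
```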
Damn, I'd hoped for some LLVM intrinsics magic that kept the naive way somewhat competitive! Looks like Fortran will be with us for a lot longer 😝
Hah yeah I was hoping that too. I'm pretty sure it was auto vectorizing the forward at least, but I think all the matmul stuff is suuuuper cache optimized.
This addresses some slowness in the conv2d implementation by leveraging matrix multiplication algorithms.

Here are some numbers from my local benchmarks with `Conv2D<128, 256, 4>` on `Tensor4D<64, 128, 28, 28>`:

There is probably still more work that can be done as far as reducing allocations/copying when creating the patches buffers used below.
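As a rough illustration of the patches idea, here is a minimal im2col sketch: unfold the input into a patch matrix so that the whole convolution becomes a single GEMM, which is the call the sgemm dispatch above performs. Stride 1, no padding, single image; all names are illustrative and this is not the PR's actual code:

```rust
// Unfold a CHW image into a (in_ch*k*k) x (out_h*out_w) patch matrix,
// where each column is one flattened k x k receptive field.
fn im2col(img: &[f32], in_ch: usize, h: usize, w: usize, k: usize) -> (Vec<f32>, usize, usize) {
    let (out_h, out_w) = (h - k + 1, w - k + 1);
    let rows = in_ch * k * k;
    let cols = out_h * out_w;
    let mut patches = vec![0.0; rows * cols]; // the extra allocation/copy
    for c in 0..in_ch {
        for ky in 0..k {
            for kx in 0..k {
                let row = (c * k + ky) * k + kx;
                for oy in 0..out_h {
                    for ox in 0..out_w {
                        patches[row * cols + (oy * out_w + ox)] =
                            img[c * h * w + (oy + ky) * w + (ox + kx)];
                    }
                }
            }
        }
    }
    (patches, rows, cols)
}

// Convolution then reduces to one GEMM: reshape the kernel to
// (out_ch) x (in_ch*k*k) and multiply by the patch matrix; the result is
// already the (out_ch) x (out_h*out_w) output image.
fn conv2d_via_matmul(
    img: &[f32],
    weight: &[f32], // out_ch * in_ch * k * k
    in_ch: usize,
    h: usize,
    w: usize,
    out_ch: usize,
    k: usize,
) -> Vec<f32> {
    let (patches, rows, cols) = im2col(img, in_ch, h, w, k);
    let mut out = vec![0.0; out_ch * cols];
    unsafe {
        matrixmultiply::sgemm(
            out_ch, rows, cols,
            1.0,
            weight.as_ptr(), rows as isize, 1,
            patches.as_ptr(), cols as isize, 1,
            0.0,
            out.as_mut_ptr(), cols as isize, 1,
        );
    }
    out
}
```

The trade-off is visible in the sketch: the `patches` buffer duplicates each input pixel up to k*k times, which is the allocation/copying overhead the description above flags as remaining work.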