
Optimizing conv kernels a bit #605


Merged: 9 commits merged into main on Mar 24, 2023

Conversation

coreylowman (Owner):

Resolves #547
Related to #578

This PR does several things:

  1. Adds a workspace to the Cuda device so the conv kernels don't have to re-allocate memory for patches
  2. Removes a memset(0) on the output of the Conv operation
  3. Removes a memset(0) on the patches allocation
  4. No longer broadcasts filters in transpose_and_broadcast_filters
  5. Sets patches[i] = 0.0 inside both unfold_input & unfold_output
  6. Uses a parallel stream to parallelize conv operations a bit
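The workspace in change 1 can be sketched as a grow-only scratch allocation that is reused across conv calls. This is a minimal illustrative sketch in plain Rust, not dfdx's actual API; the `Workspace` type and `alloc` method are hypothetical names, and a `Vec<f32>` stands in for a device allocation.

```rust
// Hypothetical sketch of a grow-only workspace buffer. A Vec<f32> stands in
// for a CUDA device allocation; the point is that repeated conv calls reuse
// one allocation for `patches` instead of allocating every time.
struct Workspace {
    buf: Vec<f32>,
}

impl Workspace {
    fn new() -> Self {
        Workspace { buf: Vec::new() }
    }

    /// Return a scratch slice of at least `len` elements, growing the
    /// underlying allocation only when the request exceeds what we hold.
    fn alloc(&mut self, len: usize) -> &mut [f32] {
        if self.buf.len() < len {
            // One grow covers all later requests of this size or smaller.
            self.buf.resize(len, 0.0);
        }
        &mut self.buf[..len]
    }
}

fn main() {
    let mut ws = Workspace::new();
    let first = ws.alloc(1024).len();
    let second = ws.alloc(512).len(); // reuses the existing allocation
    println!("{} {}", first, second);
}
```

A real device workspace would also need synchronization so two kernels don't use the scratch space concurrently, which is presumably why it lives on the device handle.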

Timings of the conv2d bench on an A10:

branch    forward    backward
main      9ms        52ms
updates   5ms        21.5ms

That's roughly a 1.8x speedup on the forward pass and a 2.4x speedup on the backward pass.
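Changes 2, 3, and 5 above trade separate memset(0) passes for writing zeros inline during unfolding: every element of the patches buffer gets written exactly once, so no pre-zeroing is needed. A simplified 1-D illustration (not dfdx's kernel code; `unfold_1d` is a hypothetical name, and the real kernels are CUDA with strides, padding, and dilation):

```rust
// Illustrative sketch of skipping the memset: the unfold pass writes 0.0
// itself for out-of-bounds taps, so the patches buffer never needs to be
// zero-initialized up front. Layout is patches[i * kernel + k].
fn unfold_1d(input: &[f32], kernel: usize, patches: &mut [f32]) {
    let n = input.len();
    for i in 0..n {
        for k in 0..kernel {
            let src = i + k;
            // Every slot is written exactly once: either an input value
            // or an explicit zero for taps past the end of the input.
            patches[i * kernel + k] = if src < n { input[src] } else { 0.0 };
        }
    }
}

fn main() {
    let input = [1.0, 2.0, 3.0];
    // NaN stands in for uninitialized memory to show no slot is skipped.
    let mut patches = vec![f32::NAN; 3 * 2];
    unfold_1d(&input, 2, &mut patches);
    println!("{:?}", patches);
}
```

On the GPU this saves a full pass over the buffer, which matters when the unfold kernel already touches every element anyway.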

coreylowman (Owner, Author):

FYI @opfromthestart. I think there's still a lot more work to do with unfolding, which is why I'm going to leave the issue open.

@coreylowman coreylowman merged commit b7a6b5f into main Mar 24, 2023
@coreylowman coreylowman deleted the conv-optims-v2 branch March 24, 2023 16:29

Successfully merging this pull request may close this issue: Add workspace for Cuda & Cpu devices