
Conversation

coreylowman (Owner) commented Aug 30, 2022

Closes #158
Closes #122
Closes #170
Closes #171
Closes #172

This turned into a way bigger refactor than I was expecting at first. Summary of changes:

  1. Repeated now uses Vec instead of array (Repeated should use vec instead of array for smaller memory footprint #172); a one-struct sketch appears after this list
  2. MultiHeadAttention changes
    1. Removed N param
    2. Make K & V default to M
    3. Rename M -> EMBED_DIM, H -> NUM_HEADS, K -> K_DIM, V -> V_DIM (see the struct sketch after this list)
    4. impl SaveToNpz and LoadFromNpz
    5. Unify the same-seq-length impls and the different-seq-length impls into one. Now there are only 2 impls: 1 for batched and 1 for unbatched. Both impls support different sequence lengths, which by definition covers the same-length case. (Can MHA Module impls be combined? #170)
    6. Both Module impls now take a 3-tuple of (q, k, v) (Can MHA Module impls be combined? #170)
    7. Add appropriate calls to permute_axes (MultiHeadAttention needs to permute before reshape & after final matmul #158)
    8. Fix the attention scaling value (MultiHeadAttention dividing by wrong scalar value #171); a forward-pass sketch covering 6-8 appears after this list
  3. TransformerEncoderBlock/TransformerDecoderBlock
    1. Both of these now impl Module generically, so there is only 1 Module impl instead of 2 (a standalone sketch of this where-clause pattern follows the comment below)
    2. Updated to use new generic params
    3. Updated to use 3-tuple input to MHA
  4. TransformerEncoder/TransformerDecoder
    1. Both of these now impl Module generically, so there is only 1 Module impl instead of 2
    2. Updated to use new generic params
    3. Updated to use 3-tuple input to MHA
  5. Testing changes (for MHA, TransformerEncoderBlock, and TransformerDecoderBlock)
    1. They all randomly initialize all of their parameters and their input values
    2. Only the expected output is hard-coded, and that value is generated by passing the exact same random parameter values through the corresponding PyTorch model (see the test sketch after this list)
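
On item 1, a minimal sketch of the new storage (the `modules` field name is my assumption; the merged code may differ):

```rust
// Sketch of Repeated after #172: the N layers live in a heap-allocated Vec
// instead of an inline `[T; N]`, so the struct itself stays small.
pub struct Repeated<T, const N: usize> {
    pub modules: Vec<T>, // invariant: always holds exactly N modules
}
```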
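For item 2, roughly what the renamed generics and the K/V defaults look like (a sketch; the Linear field layout and import path are assumptions, not a verbatim copy of the merged code):

```rust
use dfdx::nn::Linear; // assumed import path

pub struct MultiHeadAttention<
    const EMBED_DIM: usize,
    const NUM_HEADS: usize,
    const K_DIM: usize = EMBED_DIM, // K defaults to the embedding dim
    const V_DIM: usize = EMBED_DIM, // V defaults to the embedding dim
> {
    pub w_q: Linear<EMBED_DIM, K_DIM>,
    pub w_k: Linear<EMBED_DIM, K_DIM>,
    pub w_v: Linear<EMBED_DIM, V_DIM>,
    pub w_o: Linear<V_DIM, EMBED_DIM>,
}
```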
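For items 6-8, a self-contained sketch of the corrected scaling from #171, with the permute-before-reshape flow from #158 noted in comments (op names come from the PR text; exact tensor signatures are assumptions):

```rust
const EMBED_DIM: usize = 8;
const NUM_HEADS: usize = 2;
const K_DIM: usize = EMBED_DIM;

fn main() {
    // Shape flow per #158: q/k/v projections give [S, K_DIM]; reshape to
    // [S, NUM_HEADS, HEAD_DIM], permute_axes to [NUM_HEADS, S, HEAD_DIM]
    // before the batched matmul, then permute back before the final
    // reshape and the output projection with w_o.
    let head_dim = K_DIM / NUM_HEADS;
    // #171 fix: scale attention logits by sqrt of the *per-head* key dim.
    let scalar = 1.0 / (head_dim as f64).sqrt();
    println!("logits scaled by 1/sqrt({head_dim}) = {scalar}");
}
```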
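And a sketch of the test pattern in item 5; `randomize`, `randn_tensor`, and `assert_close` are hypothetical stand-ins for whatever helpers the test suite actually uses:

```rust
use rand::{rngs::StdRng, SeedableRng};

#[test]
fn test_mha_matches_pytorch() {
    let mut rng = StdRng::seed_from_u64(0);
    let mut mha: MultiHeadAttention<8, 2> = Default::default();
    mha.randomize(&mut rng); // hypothetical: fill all params with random values
    let x = randn_tensor(&mut rng); // hypothetical: random input
    let y = mha.forward((x.clone(), x.clone(), x)); // 3-tuple q/k/v, self-attention
    // EXPECTED was generated offline by loading the identical weights into
    // the corresponding PyTorch module and recording its output.
    assert_close(&y, &EXPECTED); // hypothetical comparison helper
}
```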

coreylowman (Owner) commented:

FYI @jafioti: the addition of permute_axes made the transformers implementation look a lot closer to other ones I've seen, and some additions of Module<> into where clauses helped reduce the number of impls.
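
To show what that where-clause trick buys, here is a standalone sketch of the pattern with a toy Residual wrapper (not dfdx's actual encoder code):

```rust
trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}

struct Residual<L>(L);

// One generic impl covers batched and unbatched inputs alike: any input
// type the inner layer maps to itself is accepted, so no duplicate impls
// per input shape are needed.
impl<Input, L> Module<Input> for Residual<L>
where
    Input: Clone + std::ops::Add<Output = Input>,
    L: Module<Input, Output = Input>,
{
    type Output = Input;
    fn forward(&self, x: Input) -> Input {
        x.clone() + self.0.forward(x)
    }
}
```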

coreylowman merged commit b5208ce into main on Aug 31, 2022, and deleted the 158-transformers-permute branch.