fix bug where the non-linear activation is also applied on the last decoding layer #4
Hello. Thanks for the great work. I know from the paper that the last layer is supposed to be computed as an affine transformation `W*x + b`, but the code does not do this (apparently unintentionally). In the `decode()` function you check whether the current weights belong to the last layer with `kind=self._nl_type if ind != self._last else 'none'`. However, the number of weight matrices is `number of layers - 1`, and since `ind` is the index of the current weight matrix and indexing starts at 0, `self._last` should be set as `self._last = len(layer_sizes) - 2`, not as `self._last = len(layer_sizes) - 1`.

A good test case is to run the AutoEncoder unit tests with `nl_type` set to `sigmoid`: it performs badly because the last layer produces values in the range [0, 1], which does not match the range of the ratings. After applying the fix it performs much better. That also explains why the `sigmoid` and `tanh` activation functions did not perform as well as described in the paper.
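To make the off-by-one concrete, here is a minimal sketch of a decoder structured the way the description implies, with one weight matrix per pair of adjacent layer sizes. The `decode`, `_nl_type`, and `_last` names mirror the identifiers quoted above, but the surrounding class is illustrative only, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F


def activation(x, kind):
    # Apply the chosen non-linearity; 'none' leaves the affine output untouched.
    if kind == 'sigmoid':
        return torch.sigmoid(x)
    if kind == 'tanh':
        return torch.tanh(x)
    if kind == 'none':
        return x
    raise ValueError(f'unknown activation: {kind}')


class DecoderSketch:
    def __init__(self, layer_sizes, nl_type='sigmoid'):
        self._nl_type = nl_type
        # One weight matrix per pair of adjacent layer sizes,
        # i.e. len(layer_sizes) - 1 matrices indexed 0 .. len(layer_sizes) - 2.
        self.weights = [torch.randn(layer_sizes[i + 1], layer_sizes[i])
                        for i in range(len(layer_sizes) - 1)]
        self.biases = [torch.zeros(layer_sizes[i + 1])
                       for i in range(len(layer_sizes) - 1)]
        # Buggy version: self._last = len(layer_sizes) - 1 never equals any
        # weight index, so the non-linearity is applied to the output layer too.
        # Fixed version: index of the final weight matrix.
        self._last = len(layer_sizes) - 2

    def decode(self, z):
        for ind, (w, b) in enumerate(zip(self.weights, self.biases)):
            # Last layer stays a pure affine map W*x + b ('none'); all earlier
            # layers get the configured non-linearity.
            kind = self._nl_type if ind != self._last else 'none'
            z = activation(F.linear(z, w, b), kind)
        return z
```

With `self._last = len(layer_sizes) - 1`, the condition `ind != self._last` is true for every weight matrix, so even the final output is squashed by `sigmoid`/`tanh`; with `len(layer_sizes) - 2` the last decoding step is left as an unbounded affine transformation, matching the paper.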