This repository was archived by the owner on Aug 3, 2021. It is now read-only.

Conversation

@amoussawi
Contributor

Hello. Thanks for the great work. From the paper, the last layer is supposed to be computed as a plain affine transformation `W*x + b`, but that is not what the code does (apparently unintentionally). In the `decode()` function you check whether the current weight matrix is the last one with `kind = self._nl_type if ind != self._last else 'none'`. However, the number of weight matrices is `len(layer_sizes) - 1`, and since `ind` is the zero-based index of the current weight matrix, the last one has index `len(layer_sizes) - 2`. So `self._last` should be set as `self._last = len(layer_sizes) - 2`, not `self._last = len(layer_sizes) - 1`; with the off-by-one value, `ind != self._last` is always true, so the non-linearity is applied to every layer, including the last. A sketch of the fix is below.
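
A minimal, self-contained PyTorch sketch of the indexing issue and the fix. The names (`decode`, `weights`, `layer_sizes`) mirror the discussion, but the code is illustrative, not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch, not the repository's exact code. With
# L = len(layer_sizes) layers there are only L - 1 weight matrices,
# indexed 0 .. L - 2, so the last matrix has index L - 2.
layer_sizes = [16, 8, 4]                        # hypothetical sizes
weights = [torch.randn(n_out, n_in)             # one matrix per layer pair
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [torch.zeros(n) for n in layer_sizes[1:]]

last = len(layer_sizes) - 2   # fixed; the bug used len(layer_sizes) - 1

def decode(z, nl_type='sigmoid'):
    for ind, (w, b) in enumerate(zip(weights, biases)):
        z = F.linear(z, w, b)                   # affine transform W*x + b
        kind = nl_type if ind != last else 'none'
        if kind == 'sigmoid':
            z = torch.sigmoid(z)
        elif kind == 'tanh':
            z = torch.tanh(z)
        # kind == 'none': keep the raw affine output, so predicted
        # ratings are not squashed into [0, 1]
    return z

print(decode(torch.randn(2, 16)).shape)         # torch.Size([2, 4])
```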

A good test case is to run the AutoEncoder unit tests with `nl_type` set to `sigmoid`: they perform badly because the last layer produces values in the range [0, 1], which does not match the range of the ratings. After applying the fix, they perform much better.

That actually explains why the sigmoid and tanh activation functions did not perform well, as reported in the paper.

@okuchaiev self-assigned this on Oct 9, 2017
@okuchaiev
Member

Thanks @amoussawi! This is indeed a bug, and it invalidates the sigmoid and tanh results in section 3.2 of the paper. I'll need to re-run those experiments.
I'll merge your fix and will also add an option so the user can choose whether or not to skip the last layer's non-linearity.

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
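
A hypothetical sketch of the opt-in flag mentioned above; the parameter name `apply_last_layer_nl` is illustrative and not necessarily what was merged into the repository:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the opt-in flag; names are illustrative.
class TinyAutoEncoder(nn.Module):
    def __init__(self, layer_sizes, nl_type='sigmoid', apply_last_layer_nl=False):
        super().__init__()
        self._nl_type = nl_type
        self._last = len(layer_sizes) - 2     # index of the final weight matrix
        self._apply_last_layer_nl = apply_last_layer_nl
        self.layers = nn.ModuleList(
            nn.Linear(n_in, n_out)
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

    def forward(self, x):
        for ind, layer in enumerate(self.layers):
            x = layer(x)                      # affine transform W*x + b
            # Skip the non-linearity on the last layer unless the user opts in.
            if ind != self._last or self._apply_last_layer_nl:
                x = torch.sigmoid(x) if self._nl_type == 'sigmoid' else torch.tanh(x)
        return x

model = TinyAutoEncoder([16, 8, 16])          # encoder + decoder in one stack
print(model(torch.randn(2, 16)).shape)        # torch.Size([2, 16])
```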

@okuchaiev merged commit fcf3161 into NVIDIA:master on Oct 9, 2017