Hi, @gordicaleksa . Thank you for your implementation of GAT.
I'm new to GNNs, so I'm not sure whether I've understood your code correctly, but I think there is a bug in the feature aggregation in your GATLayer: the direction of aggregation appears to be target -> source rather than source -> target.
In your implementation 1, attention scores are calculated as follows:
pytorch-GAT/models/definitions/GAT.py, lines 464 to 470 at commit 32bd714:

```python
# shape = (NH, N, 1) + (NH, 1, N) -> (NH, N, N) with the magic of automatic broadcast <3
# In Implementation 3 we are much smarter and don't have to calculate all NxN scores! (only E!)
# Tip: it's conceptually easier to understand what happens here if you delete the NH dimension
all_scores = self.leakyReLU(scores_source + scores_target.transpose(1, 2))

# connectivity mask will put -inf on all locations where there are no edges, after applying the softmax
# this will result in attention scores being computed only for existing edges
all_attention_coefficients = self.softmax(all_scores + connectivity_mask)
```
The three dimensions of `all_attention_coefficients` mean (head, src, tgt), and you apply softmax on dim=-1, i.e. dim=2, making the scores sum to 1 for each attention head and each source node.
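Here is a minimal standalone check of that normalization direction (illustrative shapes and values only, not the repo's actual tensors):

```python
import torch

NH, N = 2, 4
scores = torch.randn(NH, N, N)          # dims mean (head, src, tgt)
coeffs = torch.softmax(scores, dim=-1)  # normalize over the tgt dim
print(coeffs.sum(dim=-1))               # all ones: each (head, src) slice sums to 1 over its targets
```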
And then in aggregation:
pytorch-GAT/models/definitions/GAT.py, lines 476 to 477 at commit 32bd714:

```python
# shape = (NH, N, N) * (NH, N, FOUT) -> (NH, N, FOUT)
out_nodes_features = torch.bmm(all_attention_coefficients, nodes_features_proj)
```
Let's ignore the head dimension; then this calculates:
out_nodes_features[i, :] = sum_over_j( all_attention_coefficients[i, j] * nodes_features_proj[j, :] )
The dimensions of `all_attention_coefficients` mean (head, src, tgt), and those of `nodes_features_proj` mean (node, feat), where the "node" dim corresponds to the "tgt" dim, so the two dims of `out_nodes_features` should mean (src, feat).
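To make the contraction direction concrete, here is a small sketch (shapes and values made up for illustration) verifying that `torch.bmm` sums over the last dim of the attention matrix, i.e. over the tgt index j:

```python
import torch

NH, N, FOUT = 2, 4, 3
att = torch.rand(NH, N, N)       # (head, src, tgt)
feat = torch.rand(NH, N, FOUT)   # (head, node, feat); "node" lines up with the tgt dim
out = torch.bmm(att, feat)       # (head, src, feat)

# Row i of `out` is sum_j att[:, i, j] * feat[:, j, :]:
# features are gathered FROM target nodes j INTO each source node i.
manual = (att[:, :, :, None] * feat[:, None, :, :]).sum(dim=2)
assert torch.allclose(out, manual)
```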
All of the code above therefore does the following: it computes an attention score for each node as the source of an edge, and aggregates the features of all of its neighboring target nodes. However, based on my understanding, the feature aggregation in GAT should go in the opposite direction: collecting source-node features into each target node.
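A possible fix (just a sketch of what I have in mind, not a tested patch against your code) would be to build the score matrix with the (head, tgt, src) layout, so that the softmax over dim=-1 normalizes over each node's in-neighbors and the bmm then aggregates source features into each target:

```python
# dims now mean (NH, tgt, src): row i holds the scores of node i's source neighbors
all_scores = self.leakyReLU(scores_target + scores_source.transpose(1, 2))

# assuming connectivity_mask is oriented (src, tgt) like the original scores,
# it has to be transposed consistently (for a symmetric/undirected graph this is a no-op)
all_attention_coefficients = self.softmax(all_scores + connectivity_mask.transpose(-2, -1))

# (NH, tgt, src) x (NH, src, FOUT) -> (NH, tgt, FOUT): source features aggregated into targets
out_nodes_features = torch.bmm(all_attention_coefficients, nodes_features_proj)
```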
Implementation 2 comes with the same problem. I'm still working to understand implementation 3, so I don't know whether the bug persists there.