Background

Fourier Analysis

Reference
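In brief (standard definitions): the Fourier transform expands a signal over complex exponentials,

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i \xi t}\, dt, \qquad f(t) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i \xi t}\, d\xi$$

The complex exponentials are eigenfunctions of the Laplace operator, which is the property the graph Fourier transform below generalizes.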

Graph Laplacian

Reference
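In brief (standard definitions): for an undirected graph with adjacency matrix $A$ and degree matrix $D$ ($D_{ii} = \sum_j A_{ij}$), the combinatorial and symmetrically normalized Laplacians are

$$L = D - A, \qquad L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}$$

Both are symmetric positive semi-definite, so they admit an eigendecomposition $L = U \Lambda U^\top$ with orthonormal eigenvectors and non-negative eigenvalues.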

Graph Fourier Transform

References [1], [2]
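In brief (standard construction): treating the Laplacian eigenvectors $U$ as the graph analogue of the complex exponentials, the graph Fourier transform of a signal $x \in \mathbb{R}^N$ and its inverse are

$$\hat{x} = U^\top x, \qquad x = U \hat{x}$$

and a spectral filter $\hat{g}$ acts as $g \ast x = U\, \hat{g}(\Lambda)\, U^\top x$, which is the starting point for the spectral graph networks below.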

Batch Norm
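As a reminder (standard formulation): batch norm standardizes each feature over the batch and then applies a learned affine map,

$$\hat{x} = \frac{x - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad y = \gamma\, \hat{x} + \beta$$

where $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}^2$ are the batch mean and variance and $\gamma$, $\beta$ are learnable.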

Graph Networks

Graph Convolutional Network

https://arxiv.org/abs/1609.02907

Each layer computes

$$Z = \tilde{A} X W$$

where $\tilde{A} = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, $\hat{A} = A + I_N$ is the adjacency matrix with self-loops, $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ is its degree matrix, $X$ is the node feature matrix, and $W$ is a learnable weight matrix.

An activation function should be applied to $Z$ with a bias:

$$H = \sigma(Z + b)$$
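A minimal numpy sketch of one such layer (dense matrices; the function name is mine):

```python
import numpy as np

def gcn_layer(A, X, W, b, activation=np.tanh):
    """One GCN layer: activation(D^{-1/2} (A + I) D^{-1/2} X W + b)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # D^{-1/2} diagonal
    A_tilde = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    Z = A_tilde @ X @ W                             # the Z defined above
    return activation(Z + b)
```

In practice $\tilde{A}$ is sparse and precomputed once rather than rebuilt per layer.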

Highway GCN

https://arxiv.org/abs/1804.08049

Each layer is updated with a highway gate:

$$H^{(l+1)} = \sigma\big(Z^{(l)} + b^{(l)}\big) \odot T\big(H^{(l)}\big) + H^{(l)} \odot \Big(1 - T\big(H^{(l)}\big)\Big), \qquad T\big(H^{(l)}\big) = \mathrm{sigmoid}\big(H^{(l)} W_T^{(l)} + b_T^{(l)}\big)$$

where $Z$ is the same as the $Z$ in GCN and $T(\cdot)$ is a transform gate that mixes each layer's output with its input.
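A sketch of the gated update, reusing `gcn_layer` from the GCN section (assumes the hidden width is constant across layers so the shapes match):

```python
def highway_gcn_layer(A, H, W, b, W_t, b_t):
    """Highway GCN layer: gate between the GCN output and the input H."""
    H_new = gcn_layer(A, H, W, b)                # sigma(Z + b)
    T = 1.0 / (1.0 + np.exp(-(H @ W_t + b_t)))   # transform gate in (0, 1)
    return H_new * T + H * (1.0 - T)             # carry gate = 1 - transform
```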

Simplifying Graph Convolutional Networks

https://arxiv.org/abs/1902.07153

The classifier becomes

$$\hat{Y} = \mathrm{softmax}\big(\tilde{A}^{K} X W\big)$$

where the non-linearities between layers are removed, so the $K$ propagation steps collapse into a single (precomputable) matrix power $\tilde{A}^{K}$ followed by one linear classifier.
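A numpy sketch (function name is mine; $\tilde{A}$ as in the GCN section):

```python
import numpy as np

def sgc_logits(A, X, W, K=2):
    """SGC: K hops of propagation with no non-linearity, then one linear map."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_tilde = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    S = X
    for _ in range(K):
        S = A_tilde @ S        # A_tilde^K X, can be cached once
    return S @ W               # logits; apply softmax for probabilities
```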

FastGCN

https://arxiv.org/abs/1801.10247

GCN can be treated as

$$H^{(l+1)} = \sigma\big(\hat{A}\, H^{(l)} W^{(l)}\big), \qquad H^{(0)} = X$$

where $X$ is the input feature matrix and its $i$-th row is the features of the $i$-th node. Each row of $\hat{A} H^{(l)}$ is a sum over all $N$ nodes, which FastGCN views as an expectation over nodes and estimates by Monte Carlo sampling.

If, for each layer $l$, we sample $t_l$ nodes uniformly from the $N$ nodes and denote the indices as $u_1^{(l)}, \dots, u_{t_l}^{(l)}$, then

$$H^{(l+1)}(v, :) = \sigma\left(\frac{N}{t_l} \sum_{j=1}^{t_l} \hat{A}\big(v, u_j^{(l)}\big)\, H^{(l)}\big(u_j^{(l)}, :\big)\, W^{(l)}\right)$$

Further, we can use importance sampling (sample the nodes following a distribution $q$):

$$H^{(l+1)}(v, :) = \sigma\left(\frac{1}{t_l} \sum_{j=1}^{t_l} \frac{\hat{A}\big(v, u_j^{(l)}\big)}{q\big(u_j^{(l)}\big)}\, H^{(l)}\big(u_j^{(l)}, :\big)\, W^{(l)}\right)$$

with $q(u) = \|\hat{A}(:, u)\|^{2} \big/ \sum_{u'} \|\hat{A}(:, u')\|^{2}$.

The sampling in each GCN layer is independent.
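A numpy sketch of one sampled layer (dense $\hat{A}$ for clarity; names are mine):

```python
import numpy as np

def fastgcn_layer(A_hat, H, W, t, rng):
    """Monte Carlo estimate of relu(A_hat @ H @ W) from t sampled nodes."""
    q = np.square(A_hat).sum(axis=0)        # ||A_hat(:, u)||^2 per node
    q = q / q.sum()                          # importance distribution
    idx = rng.choice(A_hat.shape[0], size=t, p=q)   # independent per layer
    scaled = A_hat[:, idx] / (t * q[idx])    # 1 / (t_l q(u_j)) weights
    return np.maximum(scaled @ H[idx] @ W, 0.0)
```

For example: `fastgcn_layer(A_hat, X, W0, t=256, rng=np.random.default_rng(0))`.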

Relational Graph Convolutional Network

https://arxiv.org/abs/1703.06103
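The paper's propagation rule keeps one weight matrix per relation $r$, plus a self-connection:

$$h_i^{(l+1)} = \sigma\Big(W_0^{(l)} h_i^{(l)} + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^{r}} \frac{1}{c_{i,r}}\, W_r^{(l)} h_j^{(l)}\Big)$$

where $\mathcal{N}_i^r$ are the neighbors of node $i$ under relation $r$ and $c_{i,r}$ is a normalization constant (e.g. $|\mathcal{N}_i^r|$). To keep the parameter count manageable, the relation matrices are shared through a basis decomposition $W_r^{(l)} = \sum_b a_{rb}^{(l)} V_b^{(l)}$.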

Graph Attention Networks

https://arxiv.org/abs/1710.10903

Each layer updates node $i$ by attending over its neighbors:

$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\Big)$$

where $\alpha_{ij} = \mathrm{softmax}_j(e_{ij})$ (normalized over $j$) and $e_{ij} = \mathrm{LeakyReLU}\big(a^\top [W h_i \,\Vert\, W h_j]\big)$ with a weight vector $a$.

The attention can be extended to multi-head attention as in the Transformer, where the $h_i'$ from multiple heads are concatenated:

$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\, W^{k} h_j\Big)$$

If using multi-head attention, the results should be averaged before activation at the final layer:

$$h_i' = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\, W^{k} h_j\Big)$$

Skip connections can also be combined with the attention mechanism.
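A single-head numpy sketch (dense neighbor mask; names are mine; assumes self-loops are included in the mask):

```python
import numpy as np

def gat_layer(adj_mask, H, W, a, slope=0.2):
    """Single-head GAT layer; adj_mask[i, j] is True iff j in N(i) (incl. i)."""
    Wh = H @ W                                     # (N, F')
    f = Wh.shape[1]
    # e[i, j] = LeakyReLU(a^T [Wh_i || Wh_j]); split a into its two halves
    e = (Wh @ a[:f])[:, None] + (Wh @ a[f:])[None, :]
    e = np.where(e > 0, e, slope * e)              # LeakyReLU
    e = np.where(adj_mask, e, -np.inf)             # attend only to neighbors
    w = np.exp(e - e.max(axis=1, keepdims=True))   # stable softmax over j
    att = w / w.sum(axis=1, keepdims=True)
    return np.tanh(att @ Wh)
```

Here `a` has length $2F'$, matching the concatenation $[W h_i \Vert W h_j]$.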

Line Graph Neural Network

https://arxiv.org/abs/1705.08415

Large-Scale Learnable Graph Convolutional Network

https://arxiv.org/abs/1808.03965

LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking in order to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolutional operations on generic graphs.
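A numpy sketch of the per-channel top-$k$ neighbor selection (my interpretation of the value ranking; names are mine):

```python
import numpy as np

def lgcl_grids(A, X, k):
    """For node i, stack its own features on top of the k largest neighbor
    values per feature channel (zero-padded), giving a (k+1, F) grid."""
    N, F = X.shape
    grids = np.zeros((N, k + 1, F))
    for i in range(N):
        grids[i, 0] = X[i]
        nbr = np.flatnonzero(A[i])             # neighbor indices
        top = -np.sort(-X[nbr], axis=0)[:k]    # per-channel value ranking
        grids[i, 1:1 + top.shape[0]] = top
    return grids                                # feed to a regular 1-D conv
```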

GraphSage (sample and aggregate)

https://arxiv.org/abs/1706.02216
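A numpy sketch of one layer with the mean aggregator (names are mine):

```python
import numpy as np

def graphsage_mean_layer(A, H, W, num_samples, rng):
    """Sample neighbors, mean-aggregate, concat with self, project, normalize."""
    agg = np.zeros_like(H)
    for i in range(A.shape[0]):
        nbr = np.flatnonzero(A[i])
        if nbr.size:
            s = rng.choice(nbr, size=min(num_samples, nbr.size), replace=False)
            agg[i] = H[s].mean(axis=0)           # aggregate sampled neighbors
    Z = np.maximum(np.concatenate([H, agg], axis=1) @ W, 0.0)  # W: (2F, F')
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
```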

Diffusion-Convolutional Neural Networks

https://arxiv.org/abs/1511.02136
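In brief (sketched from the paper): with the transition matrix $P = D^{-1} A$, node $i$'s activation at hop $j$ and channel $k$ weights the $j$-step diffused features,

$$Z_{ijk} = f\big(W_{jk} \cdot (P^{j} X)_{ik}\big)$$

so each node is represented by a grid of diffusion scales times feature channels.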

Gated Graph Sequence Neural Networks

https://arxiv.org/abs/1511.05493
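The core recurrence (sketched, biases omitted): each step aggregates messages through the adjacency structure and feeds them to a GRU cell shared across nodes,

$$a_v^{(t)} = \sum_{u} A_{vu}\, h_u^{(t-1)}, \qquad h_v^{(t)} = \mathrm{GRU}\big(h_v^{(t-1)},\, a_v^{(t)}\big)$$

which lets the number of propagation steps be chosen independently of the parameter count.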

Jumping Knowledge Networks

https://arxiv.org/abs/1806.03536

The idea is that, in the same graph, the same number of propagation steps can have very different effects for different nodes, because the range of neighboring nodes that a node's representation draws from depends strongly on the local graph structure. For a node $v$, denote its representations at the different layers as

$$h_v^{(1)}, h_v^{(2)}, \dots, h_v^{(L)}$$

The final representation is then an aggregation of these, e.g. by concatenation, element-wise max-pooling, or an attention-weighted combination from an LSTM over the layers.
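A numpy sketch of the two simplest aggregators (names are mine):

```python
import numpy as np

def jumping_knowledge(layer_states, mode="concat"):
    """Combine per-layer representations [(N, F), ...] into final features."""
    if mode == "concat":
        return np.concatenate(layer_states, axis=1)        # (N, L * F)
    if mode == "max":
        return np.stack(layer_states, axis=0).max(axis=0)  # element-wise max
    raise ValueError(f"unknown mode: {mode}")
```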