Fourier Analysis


Graph Laplacian


Graph Fourier Transform

References [1], [2]

Batch Norm

Graph Networks

Graph Convolutional Network



Highway GCN

An activation function is applied to \(Z\) with a bias, and a gating mechanism mixes the result with the previous layer's representation:

\[\tilde{X}^k=\sigma(Z+b)\] \[T=\text{sigmoid}(WX^{k-1}+c)\] \[X^k=\tilde{X}^k\odot T + X^{k-1}\odot(1-T)\]

where \(Z\) is the same as the \(Z\) in GCN.
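As a concrete illustration, here is a minimal NumPy sketch of one Highway GCN layer (the shapes and the choice of ReLU for \(\sigma\) are assumptions, not prescribed by the notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_gcn_layer(S, X_prev, Theta, b, W, c):
    """One Highway GCN layer. S is the N x N propagation matrix,
    X_prev is N x d; Theta, W are d x d; b, c are d-vectors."""
    Z = S @ X_prev @ Theta                   # GCN pre-activation
    X_tilde = np.maximum(Z + b, 0.0)         # sigma: ReLU here (assumption)
    T = sigmoid(X_prev @ W + c)              # transform gate
    return X_tilde * T + X_prev * (1.0 - T)  # gated mix with the carry path

# tiny demo on a 3-node graph with identity propagation
rng = np.random.default_rng(0)
N, d = 3, 4
X = rng.standard_normal((N, d))
out = highway_gcn_layer(np.eye(N), X, np.eye(d), np.zeros(d),
                        np.eye(d), np.zeros(d))
```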

Simplifying Graph Convolutional Networks

\[S=\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\] \[Z=\sigma(S^KX\Theta)\]

where the non-linearities between layers are removed, so the \(K\) propagation steps collapse into a single linear map \(S^K\) followed by one classifier non-linearity.
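A minimal dense-NumPy sketch of this computation (the function name and matrix shapes are illustrative):

```python
import numpy as np

def sgc_logits(A, X, Theta, K):
    """Pre-classifier SGC output S^K X Theta, with S the symmetrically
    normalized adjacency with self-loops."""
    A_tilde = A + np.eye(A.shape[0])           # \tilde{A} = A + I
    d = A_tilde.sum(axis=1)                    # \tilde{D} diagonal
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt      # symmetric normalization
    return np.linalg.matrix_power(S, K) @ X @ Theta

# two connected nodes: S = 0.5 * ones(2, 2), which is idempotent,
# so S^K = S for every K >= 1
A = np.array([[0.0, 1.0], [1.0, 0.0]])
logits = sgc_logits(A, np.eye(2), np.eye(2), K=2)
```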


FastGCN

A GCN layer can be treated as

\[S=\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\] \[Z=\sigma(SX\Theta)\]

where \(X\in\mathbb{R}^{N\times C}\) contains the input features; its \(i\)-th row is the feature vector of the \(i\)-th node.
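A single layer under this formulation, as a minimal sketch (ReLU stands in for \(\sigma\), which is an assumption):

```python
import numpy as np

def gcn_layer(S, X, Theta):
    """One GCN layer Z = sigma(S X Theta), with ReLU as sigma and S the
    precomputed normalized N x N propagation matrix."""
    return np.maximum(S @ X @ Theta, 0.0)

# with S = I the layer reduces to a per-node ReLU of X @ Theta
S = np.eye(3)
X = np.array([[1.0, -1.0], [2.0, 0.0], [-3.0, 4.0]])
H = gcn_layer(S, X, np.eye(2))
```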


If, for each layer, we sample \(T\) of the \(N\) nodes and denote the sampled indices by \(I_T\), then

\[Z_{i,:}=\frac{N}{T}\sum\limits_{j=1}^N1_{j\in I_T}S_{ij}X_{j,:}\Theta\]

Further, we can use importance sampling (sample the nodes following a distribution \(q\))

\[Z_{i,:}=\frac{1}{T}\sum\limits_{j=1}^N1_{j\in I_T}q_j^{-1}S_{ij}X_{j,:}\Theta\]

with \(q_j\propto\|S_{:,j}\|^2\).

The sampling in each GCN layer is independent.
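The importance-sampling estimator can be sketched as follows (sampling with replacement from \(q\) is one common reading of the scheme; all names are illustrative):

```python
import numpy as np

def sampled_layer(S, X, Theta, T, rng):
    """Monte Carlo estimate of S X Theta: draw T indices j with
    probability q_j proportional to ||S[:, j]||^2 and reweight each
    sampled term by 1/(T q_j)."""
    q = (S**2).sum(axis=0)
    q = q / q.sum()
    idx = rng.choice(S.shape[0], size=T, p=q)   # fresh draw per layer
    Z = sum(np.outer(S[:, j], X[j]) / q[j] for j in idx) / T
    return Z @ Theta

rng = np.random.default_rng(0)
S = rng.random((5, 5))
X = rng.random((5, 2))
Z_hat = sampled_layer(S, X, np.eye(2), T=3, rng=rng)
```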

Relational Graph Convolutional Network

Graph Attention Networks


\[h_i'=\sigma\Big(\sum\limits_{j\in n(i)}\alpha_{ij}Wh_j\Big)\]

where \(\alpha_{ij}\propto\exp(\text{LeakyReLU}(a^T[Wh_i;Wh_j]))\) (normalized over \(j\in n(i)\)) with a weight vector \(a\).

The attention can be extended to multi-head attention as in the Transformer, where the outputs \(h_i'\) from multiple heads are concatenated. At the final layer, the head outputs should instead be averaged before the activation:

\[h_i'=\sigma\Big(\frac{1}{K}\sum\limits_{k=1}^K\sum\limits_{j\in n(i)}\alpha_{ij}^kW^kh_j\Big)\]


Skip connections can also be combined with the attention mechanism.
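A dense NumPy sketch of the attention coefficients (the LeakyReLU slope of 0.2 and the boolean adjacency mask with self-loops are assumptions):

```python
import numpy as np

def gat_alpha(H, W, a, adj):
    """alpha_ij proportional to exp(LeakyReLU(a^T [W h_i ; W h_j])),
    normalized over each node's neighbors; adj is a boolean N x N mask
    assumed to include self-loops."""
    WH = H @ W.T
    N = WH.shape[0]
    e = np.full((N, N), -np.inf)              # exp(-inf) = 0 masks non-edges
    for i in range(N):
        for j in range(N):
            if adj[i, j]:
                s = a @ np.concatenate([WH[i], WH[j]])
                e[i, j] = s if s > 0 else 0.2 * s   # LeakyReLU
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    return alpha / alpha.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
H = rng.random((3, 2))
alpha = gat_alpha(H, np.eye(2), np.ones(4), np.ones((3, 3), dtype=bool))
h_out = alpha @ (H @ np.eye(2).T)             # attention-weighted aggregation
```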

Line Graph Neural Network

Large-Scale Learnable Graph Convolutional Network

LGCL automatically selects a fixed number of neighboring nodes for each feature, based on value ranking, in order to transform graph data into grid-like 1-D structures, thereby enabling regular convolution operations on generic graphs.
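The per-feature value-ranking step can be sketched as follows (the function name, zero-padding of small neighborhoods, and the grid layout are assumptions):

```python
import numpy as np

def lgcl_grid(x_v, X, neighbors, k):
    """For node v: take, independently per feature column, the k largest
    values among its neighbors, and stack them under v's own features,
    giving a (k+1) x d grid ready for 1-D convolution."""
    nb = X[neighbors]
    if nb.shape[0] < k:                        # zero-pad small neighborhoods
        nb = np.vstack([nb, np.zeros((k - nb.shape[0], X.shape[1]))])
    topk = -np.sort(-nb, axis=0)[:k]           # per-column top-k values
    return np.vstack([x_v, topk])

X = np.array([[1.0, 2.0], [3.0, 0.0], [0.0, 5.0], [2.0, 2.0]])
grid = lgcl_grid(X[0], X, [1, 2, 3], k=2)
```

Note that the top-\(k\) values in each column may come from different neighbors; that is exactly what makes the result grid-like rather than node-aligned.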

GraphSage (sample and aggregate)

Diffusion-Convolutional Neural Networks

Gated Graph Sequence Neural Networks

\[h_{n(v)}=\sum\limits_{u\in n(v)}h_u^{k-1}+b\] \[h_v^{k}=\text{GRU}(x:h_{n(v)},h:h_v^{k-1})\]

where the aggregated neighbor message \(h_{n(v)}\) serves as the GRU input \(x\) and \(h_v^{k-1}\) as its hidden state.
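One propagation step can be sketched with an explicit GRU cell (the weight names Wz, Uz, etc. are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(H, adj, b, Wz, Uz, Wr, Ur, Wh, Uh):
    """Message: m_v = sum of neighbor states + b; update: h_v = GRU with
    m_v as the input x and the previous h_v as the hidden state."""
    M = adj @ H + b                          # h_{n(v)} for all nodes at once
    z = sigmoid(M @ Wz + H @ Uz)             # update gate
    r = sigmoid(M @ Wr + H @ Ur)             # reset gate
    h_tilde = np.tanh(M @ Wh + (r * H) @ Uh) # candidate state
    return (1 - z) * H + z * h_tilde

# with all-zero weights the gates are 0.5, tanh(0) = 0, so the state halves
H = np.array([[1.0, 2.0], [3.0, 4.0]])
Z2 = np.zeros((2, 2))
H_next = ggnn_step(H, np.eye(2), 0.0, Z2, Z2, Z2, Z2, Z2, Z2)
```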

Jumping Knowledge Networks

The idea is that, within the same graph, the same number of propagation steps can have very different effects for different nodes, because the range of neighboring nodes that a node's representation draws from depends strongly on the graph structure. For a node \(v\), denote its representations at different layers as \(h_v^1,\cdots,h_v^k\).

The final representation is then an aggregation of these, e.g. by concatenation, element-wise max-pooling, or an LSTM-attention combination.
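Two simple aggregation choices can be sketched as follows (the function and mode names are illustrative):

```python
import numpy as np

def jk_aggregate(layer_reps, mode="max"):
    """Combine per-layer representations [h^1, ..., h^k] for each node:
    concatenation keeps all layers, element-wise max adapts the effective
    range per feature."""
    if mode == "concat":
        return np.concatenate(layer_reps, axis=1)       # (N, k*d)
    return np.stack(layer_reps, axis=0).max(axis=0)     # (N, d)

h1 = np.array([[1.0, 2.0]])
h2 = np.array([[3.0, 0.0]])
h_max = jk_aggregate([h1, h2])
h_cat = jk_aggregate([h1, h2], mode="concat")
```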