Graph Convolution Networks and Heat Diffusion
Let’s first look at the essence of the problem:
Every node in a graph is constantly changing its state under the influence of its neighbors and even more distant nodes, until it finally reaches equilibrium. Closer neighbors exert stronger influence.
To truly understand GCNs, the most important thing is to understand what the Laplacian matrix is doing; the rest is just different ways of solving the same problem. Fourier transforms, for example, simply shift the problem from the spatial domain to the frequency domain, but there are also methods that solve it directly in the spatial domain (e.g., GraphSage).
To make the problem easier to understand, let’s forget about time and frequency domains for now, and start with the simplest physical phenomenon—**heat conduction.**
Heat diffusion model on a graph
As we know, in the absence of external intervention, heat flows irreversibly from hotter regions to cooler regions. According to Newton’s Law of Cooling, the rate of heat transfer between two bodies in contact is proportional to their temperature difference. Intuitively, if location A is hotter than location B, and they are in contact, then heat will flow from A to B at a rate proportional to the difference in their temperatures.
1-Dimensional case
Let’s build a 1D heat diffusion model. Imagine a uniform iron rod, where different points have different temperatures. We want to describe how the temperature distribution on the rod evolves over time. For a continuous rod, we would need a partial differential equation (PDE) with respect to both time and space, but let’s keep it simple by discretizing space. Suppose the rod is now a 1D chain, where each segment has a single uniform temperature. Heat flows between adjacent segments:
The $i$-th segment exchanges heat only with segments $i-1$ and $i+1$. If its current temperature is $\phi_i$, we have:

$$\frac{d\phi_i}{dt} = k(\phi_{i+1} - \phi_i) - k(\phi_i - \phi_{i-1})$$
Here $k$ is a constant depending on material properties like specific heat. The first term is heat flowing in from the next segment, raising the temperature, and the second term is heat flowing out to the previous segment, lowering it.
The right-hand side is a difference of differences, i.e., a discrete second derivative! So for a continuous 1D rod, the heat conduction equation becomes:

$$\frac{\partial \phi}{\partial t} = k\,\frac{\partial^2 \phi}{\partial x^2}$$
In higher-dimensional Euclidean space, the first derivative generalizes to the gradient $\nabla$, and the second derivative generalizes to the Laplacian operator $\Delta = \nabla^2$:

$$\frac{\partial \phi}{\partial t} = k\,\Delta\phi$$
So we see: in Euclidean space, the rate of change of the temperature field is proportional to its Laplacian.
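To make this concrete, here is a minimal NumPy sketch (my illustration, not code from the article) that simulates the discrete chain with insulated ends; heat spreads out until the rod is uniformly warm:

```python
import numpy as np

# Simulate d(phi_i)/dt = k * (phi_{i+1} - 2*phi_i + phi_{i-1}) on a 1D chain.
# Chain length, rate constant, and step count are illustrative choices.
n, k, steps = 20, 0.1, 2000
phi = np.zeros(n)
phi[n // 2] = 100.0            # a single hot segment in the middle

for _ in range(steps):
    lap = np.zeros(n)
    # discrete second derivative in the interior
    lap[1:-1] = phi[2:] - 2 * phi[1:-1] + phi[:-2]
    # insulated ends: no heat flux across the boundary
    lap[0] = phi[1] - phi[0]
    lap[-1] = phi[-2] - phi[-1]
    phi = phi + k * lap        # explicit Euler time step

print(phi.round(2))  # ~5.0 everywhere: heat spreads out, total heat conserved
```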
Now, let’s extend this heat conduction model to a graph topology, and you’ll see that a GCN is essentially describing the same thing!
Heat diffusion on a graph
Still considering heat conduction, but now on a graph. Each node is a unit that only exchanges heat with adjacent nodes (those connected by edges).
For example, consider a graph in which node 1 is connected to nodes 0, 2, and 4: node 1 only exchanges heat with those three neighbors. More distant nodes, like 5, can only influence it indirectly via node 4.
If we assume heat flow still follows Newton’s Law of Cooling, then for any node $i$, the temperature evolves as:

$$\frac{d\phi_i}{dt} = -k\sum_{j} A_{ij}\,(\phi_i - \phi_j)$$
where $A$ is the adjacency matrix of the graph:

$$A_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
We only consider the simple case: an undirected, unweighted graph with no self-loops, so $A$ is symmetric and $A_{ii} = 0$.
Now, rearrange:

$$\frac{d\phi_i}{dt} = -k\left[\deg(i)\,\phi_i - \sum_{j} A_{ij}\,\phi_j\right]$$
where $\deg(i)$ is the degree of node $i$ (how many edges it has).
Now collect all nodes’ temperatures into a vector $\phi=[\phi_1,\dots,\phi_n]^T$:

$$\frac{d\phi}{dt} = -k(D\phi - A\phi)$$
where $D = \text{diag}(\deg(1),\dots,\deg(n))$ is the degree matrix. So we get:

$$\frac{d\phi}{dt} = -kL\phi$$
where $L=D-A$ is the graph Laplacian. Compare this with the continuous Euclidean PDE:

$$\frac{\partial \phi}{\partial t} = k\,\Delta\phi$$
They have the same form! (The sign flips only because, by convention, the graph Laplacian $L = D - A$ corresponds to the negative of the continuous operator $\Delta$.)
So we can naturally generalize continuous-space heat conduction to graphs, where the analogous operator is the graph Laplacian L.
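As a sanity check, here is a small NumPy sketch that builds $L = D - A$ for a six-node graph consistent with the example above (node 1 adjacent to nodes 0, 2, and 4; the remaining edges are my assumption) and evaluates one instant of diffusion:

```python
import numpy as np

# Hypothetical 6-node graph: node 1 is connected to 0, 2, and 4, as in the
# example; the other edges (2-3, 3-4, 4-5) are assumed for illustration.
edges = [(0, 1), (1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0    # undirected, unweighted

D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # graph Laplacian

phi = np.array([0.0, 100.0, 0.0, 0.0, 0.0, 0.0])  # node 1 starts hot
k = 0.1
dphi_dt = -k * L @ phi
print(dphi_dt)  # node 1 cools by 30; neighbors 0, 2, 4 each warm by 10;
                # nodes 3 and 5 are unaffected at this instant
```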
Extending to GCNs
Now the picture is clear: heat diffusion on a graph is governed by $\frac{d\phi}{dt}=-kL\phi$, with the graph Laplacian $L$ playing exactly the role that $\Delta$ plays in Euclidean space. A GCN keeps this propagation structure, but replaces the scalar temperature with a feature vector per node and makes the transformation learnable.
We can enrich the model with kernels, neural networks, and non-linearities to capture more complex relationships. Fourier transforms appear because, in many cases, solving in the frequency domain is easier: you transform, solve, then transform back.
So the core idea is: each node’s state is repeatedly updated by aggregating the states of its neighbors, and the Laplacian specifies exactly how that local influence propagates.
Essentially, this is Message Passing—a kind of inductive process. Convolution, Fourier transforms—those are just surface-level solution methods.
Discretizing time → iterative update (ML-friendly)
We can write the original equation:

$$\frac{d\phi}{dt} = -kL\phi$$
In ML, time is discrete, so the derivative becomes a step-wise iteration:

$$\phi_{t+1} = \phi_t - kL\phi_t$$
This looks exactly like gradient descent:

$$x_{t+1} = x_t - \eta\,\nabla f(x_t)$$
Except here the update is the Laplacian acting on the current state, rather than the gradient of a loss function. So for each node $i$:

$$\phi_i^{(t+1)} = \phi_i^{(t)} - k\Big[\deg(i)\,\phi_i^{(t)} - \sum_{j \in N(i)} \phi_j^{(t)}\Big] = \big(1 - k\deg(i)\big)\,\phi_i^{(t)} + k\sum_{j \in N(i)} \phi_j^{(t)}$$
So the next state is a weighted mix of the node’s current state and its neighbors’.
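In code, the whole iteration is one line per step. A minimal sketch on a toy three-node path graph (my illustration):

```python
import numpy as np

# Discretized diffusion: phi_{t+1} = (I - k*L) @ phi_t.
# The 3-node path graph and initial temperatures are illustrative.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
phi = np.array([100.0, 0.0, 0.0])
k = 0.2

for t in range(50):
    phi = phi - k * L @ phi    # one message-passing step

print(phi.round(2))  # converges to the uniform average [33.33, 33.33, 33.33]
```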
This is exactly what GraphSage does:

$$h_{N(i)}^{(t)} = \text{Aggregate}\big(\{\,h_j^{(t-1)} : j \in N(i)\,\}\big)$$

$$h_i^{(t)} = \sigma\Big(W^{(t)} \cdot \big[\,h_i^{(t-1)} \,\|\, h_{N(i)}^{(t)}\,\big]\Big)$$
So GraphSage is just a nonlinear, learnable generalization of heat diffusion. We can generalize Aggregate to, e.g., Mean Pooling, Sum Pooling, LSTM, Attention, etc.
In actual implementations, GraphSage also samples a fixed-size set of neighbors for each node rather than aggregating over all of them, which keeps the per-node cost bounded on large graphs; a sketch of one such layer follows below.
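Here is an illustrative sketch of one GraphSage-style layer with a mean aggregator and neighbor sampling; all names, the weight shapes, and the toy graph are my own assumptions, and real implementations differ in details:

```python
import numpy as np

rng = np.random.default_rng(0)

def graphsage_layer(H, neighbors, W, sample_size=5):
    """One GraphSage-style layer with a mean aggregator (illustrative sketch).

    H: (n, d) node features; neighbors: list of neighbor-index lists;
    W: (2d, d_out) learnable weight matrix.
    """
    n, d = H.shape
    out = np.zeros((n, W.shape[1]))
    for i in range(n):
        nbrs = neighbors[i]
        if len(nbrs) > sample_size:                 # neighbor sampling
            nbrs = rng.choice(nbrs, sample_size, replace=False)
        agg = H[nbrs].mean(axis=0) if len(nbrs) else np.zeros(d)
        concat = np.concatenate([H[i], agg])        # [self || neighborhood]
        out[i] = np.maximum(concat @ W, 0.0)        # ReLU non-linearity
    return out

# Toy usage on the 3-node path graph from the previous sketch
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [[1], [0, 2], [1]]
W = rng.normal(size=(4, 2))
print(graphsage_layer(H, neighbors, W))
```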
Fourier domain analogy
Since $L$ is real and symmetric, it has an orthonormal eigendecomposition $L = U\Lambda U^T$. So the graph Fourier transform is simply projection onto the eigenbasis of $L$: $\hat{\phi} = U^T\phi$, with inverse $\phi = U\hat{\phi}$. There are normalized Laplacians too, which just apply different normalization strategies to account for node degree. See https://en.wikipedia.org/wiki/Laplacian_matrix
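A small sketch of this fact (my illustration): compute the eigenbasis with `np.linalg.eigh`, project a signal, and recover it:

```python
import numpy as np

# Graph Fourier transform = projection onto the eigenbasis of L.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals, U = np.linalg.eigh(L)     # L is symmetric: orthonormal eigenbasis
phi = np.array([100.0, 0.0, 0.0])

phi_hat = U.T @ phi                # forward graph Fourier transform
phi_back = U @ phi_hat             # inverse transform recovers the signal
print(np.allclose(phi, phi_back))  # True

# Diffusion damps high-frequency (large-eigenvalue) components fastest:
# in the spectral domain, phi_hat(t) = exp(-k * eigvals * t) * phi_hat(0).
```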
Semi-supervised learning analogy: heat sources
In thermodynamics, for an isolated system, eventually everything reaches a uniform equilibrium (“heat death”). But if the graph has heat sources (nodes with fixed temperature), the system reaches a non-uniform steady state.
In semi-supervised learning:

- Labeled nodes act as heat sources: their values are clamped to the known labels and never cool down.
- Unlabeled nodes update by diffusion, absorbing influence from their neighbors.
- The resulting non-uniform steady state assigns every unlabeled node a value, from which its label can be read off.
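A minimal label-propagation sketch along these lines (my illustration; the graph, labels, and constants are assumptions):

```python
import numpy as np

# Labeled nodes are "heat sources" clamped to +1 / -1; unlabeled nodes
# diffuse toward a non-uniform steady state.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)   # a 5-node path graph
L = np.diag(A.sum(axis=1)) - A

phi = np.zeros(5)
labeled = {0: 1.0, 4: -1.0}                    # two known labels
k = 0.2

for _ in range(200):
    phi = phi - k * L @ phi                    # diffusion step
    for i, y in labeled.items():
        phi[i] = y                             # clamp the heat sources

print(phi.round(2))  # interior nodes interpolate: [1.0, 0.5, 0.0, -0.5, -1.0]
```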
Optimization perspective
The heat equation can be discretized as:

$$\phi_{t+1} = \phi_t - kL\phi_t = (I - kL)\,\phi_t$$
Each node updates as:

$$\phi_i^{(t+1)} = \big(1 - k\deg(i)\big)\,\phi_i^{(t)} + k\sum_{j \in N(i)} \phi_j^{(t)}$$
Compare this with GraphSage:

$$h_i^{(t)} = \sigma\Big(W^{(t)} \cdot \big[\,h_i^{(t-1)} \,\|\, \text{Aggregate}(\{h_j^{(t-1)} : j \in N(i)\})\,\big]\Big)$$
So GCNs/GraphSage are learnable, nonlinear, sampled approximations of the same message-passing process.
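One way to make the gradient-descent analogy precise (my addition, a standard fact about the Laplacian rather than something stated above): the diffusion step is exactly gradient descent on the Dirichlet energy $E(\phi) = \frac{1}{2}\phi^T L\phi$, since $\nabla E = L\phi$ for symmetric $L$. A quick numeric check:

```python
import numpy as np

# Check that the diffusion update phi - k * (L @ phi) is a gradient descent
# step on E(phi) = 0.5 * phi^T L phi, whose gradient is L @ phi.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
phi = np.array([3.0, -1.0, 2.0])

grad = L @ phi
# central finite-difference gradient of E for comparison
eps = 1e-6
fd = np.array([
    (0.5 * (phi + eps * e) @ L @ (phi + eps * e) -
     0.5 * (phi - eps * e) @ L @ (phi - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(grad, fd))  # True: diffusion = gradient descent on E
```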