Graph Convolution Networks and Heat Diffusion

Let’s first look at the essence of the problem:

Every node in a graph is constantly changing its state under the influence of its neighbors and even more distant nodes, until it finally reaches equilibrium. Closer neighbors exert stronger influence.

To truly understand GCNs, the most important thing is to understand what the Laplacian matrix is doing; the rest is just different ways of solving the same problem. Fourier transforms, for example, simply shift the problem from the spatial domain to the frequency domain, but there are also methods that solve it directly in the spatial domain (e.g., GraphSage).

To make the problem easier to understand, let’s forget about time and frequency domains for now, and start with the simplest physical phenomenon—**heat conduction.**

Heat diffusion model on a graph

As we know, in the absence of external interventions, heat flows from hotter regions to cooler regions irreversibly. According to Newton’s Law of Cooling, the rate of heat transfer is proportional to the temperature gradient. Intuitively, if location A is hotter than location B, and they are in contact, then heat will flow from A to B at a rate proportional to their temperature difference.

1-Dimensional case

Let’s build a 1D heat diffusion model. Imagine a uniform iron rod, where different points have different temperatures. We want to describe how the temperature distribution on the rod evolves over time. For a continuous rod, we would need a partial differential equation (PDE) with respect to both time and space, but let’s keep it simple by discretizing space. Suppose the rod is now a 1D chain, where each segment has a single uniform temperature. Heat flows between adjacent segments:

(Figure: the rod discretized into a 1D chain of segments, each exchanging heat only with its immediate neighbors.)

For the i-th segment, it only exchanges heat with i-1 and i+1. If its current temperature is $\phi_i$, we have:


$$\frac{d\phi_i}{dt} = k(\phi_{i+1}-\phi_i) - k(\phi_i-\phi_{i-1})$$

Here $k$ is a constant depending on material properties like specific heat. The first term is heat flowing in from the next segment, raising the temperature; the second term is heat flowing out to the previous segment, lowering it.


Rearranging:

$$\frac{d\phi_i}{dt} - k\left[(\phi_{i+1}-\phi_i) - (\phi_i-\phi_{i-1})\right] = 0$$

The second term is a difference of differences, i.e., a second derivative! So for a continuous 1D rod, the heat conduction equation becomes:


$$\frac{\partial\phi}{\partial t} - k\frac{\partial^2\phi}{\partial x^2} = 0$$

In higher-dimensional Euclidean space, the first derivative generalizes to the gradient, and the second derivative generalizes to the Laplacian operator:


$$\frac{\partial\phi}{\partial t} = k\,\Delta\phi, \qquad \Delta\phi = \sum_i \frac{\partial^2\phi}{\partial x_i^2}$$

So we see:

  1. In Euclidean space, the rate of temperature change at a point is proportional to the temperature distribution around it, measured by the Laplacian.
  2. The Laplacian operator is the generalization of the second derivative to higher-dimensional space.
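The discrete 1D picture above can be simulated directly. A minimal NumPy sketch (segment count, $k$, and holding the two ends at temperature 0 are all illustrative assumptions):

```python
import numpy as np

# A rod discretized into 10 segments with a hot spot in the middle.
phi = np.zeros(10)
phi[5] = 100.0

k = 0.1  # conduction constant (illustrative value)

def step(phi, k):
    # One explicit Euler step of d(phi_i)/dt = k * (phi_{i+1} - 2*phi_i + phi_{i-1}),
    # i.e. k times the discrete second derivative. The two end segments are never
    # updated, so they stay fixed at 0 (heat drains out there).
    new = phi.copy()
    new[1:-1] += k * (phi[2:] - 2 * phi[1:-1] + phi[:-2])
    return new

for _ in range(100):
    phi = step(phi, k)

# Heat has spread out from the hot spot toward the ends.
print(phi.round(2))
```

Note that `k * (phi[2:] - 2*phi[1:-1] + phi[:-2])` is exactly the "difference of differences" from the derivation above.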

Now, let’s extend this heat conduction model to a graph topology, and you’ll see that a GCN is essentially describing the same thing!

Heat diffusion on a graph

Still considering heat conduction, but now on a graph. Each node is a unit that only exchanges heat with adjacent nodes (those connected by edges).

For example, in the graph below, node 1 only exchanges heat with nodes 0, 2, and 4. More distant nodes, like 5, can only influence it indirectly via node 4.


(Figure: an example graph in which node 1 is connected to nodes 0, 2, and 4, while node 5 reaches node 1 only through node 4.)

If we assume heat flow still follows Newton’s Law of Cooling, then for any node $i$, the temperature evolves as:


$$\frac{d\phi_i}{dt} = -k\sum_j A_{ij}\,(\phi_i-\phi_j)$$

where $A$ is the adjacency matrix of the graph:

  • $A_{ij}=1$ if nodes i and j are neighbors, $A_{ij}=0$ otherwise.

We only consider the simple case:

  1. Undirected graph: $A_{ij}=A_{ji}$;
  2. No self-loops: diagonal entries of $A$ are 0.

Now, rearrange:


$$\frac{d\phi_i}{dt} = -k\left[\phi_i\,\mathrm{deg}(i) - \sum_j A_{ij}\phi_j\right]$$

where deg(i) is the degree of node i (how many edges it has).

Now collect all nodes’ temperatures into a vector $\phi=[\phi_1,\dots,\phi_n]^T$:


$$\frac{d\phi}{dt} = -k(D\phi - A\phi) = -k(D-A)\phi$$


where $D = \text{diag}(\text{deg}(1),\dots,\text{deg}(n))$ is the degree matrix. So we get:


$$\frac{d\phi}{dt} = -kL\phi$$

where $L=D-A$ is the graph Laplacian. Compare this with the continuous Euclidean PDE:


$$\frac{\partial\phi}{\partial t} = k\,\Delta\phi$$

They have the same form!

  • In graph topology, a finite set of nodes has a state vector $\phi$, and its change is governed by the linear operator $-L$.
  • In Euclidean space, a continuous field $\phi(x,t)$ changes under the Laplacian Δ.

So we can naturally generalize continuous-space heat conduction to graphs, where the analogous operator is the graph Laplacian L.
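To make the matrix form concrete, here is a small NumPy sketch that builds $L = D - A$ and takes one Euler step of $d\phi/dt = -kL\phi$. The edge list loosely follows the example graph above but is an illustrative assumption:

```python
import numpy as np

# Toy undirected graph on 6 nodes; node 1 touches 0, 2, and 4,
# and node 5 connects to the rest only through node 4.
edges = [(0, 1), (1, 2), (1, 4), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0  # undirected, no self-loops

D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # graph Laplacian

# Start with all the heat on node 1, then take one Euler step of
# d(phi)/dt = -k L phi:
phi = np.array([0.0, 100.0, 0.0, 0.0, 0.0, 0.0])
k = 0.1
phi_next = phi - k * (L @ phi)
print(phi_next)  # heat leaves node 1 and enters each neighbor 0, 2, 4 equally
```

Because every row of $L$ sums to zero, total heat is conserved: node 1 loses exactly what its three neighbors gain, and node 5 (two hops away) is untouched after one step.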

Extending to GCNs

Now the picture is clear:

  • Instead of heat, in a graph we’re propagating features or messages.
  • A GCN describes feature/message flow and propagation on a graph network!
  • The most primitive form of this propagation is: state change ∝ Laplacian acting on current state.

We can complicate the model with kernels, neural networks, and non-linearities to better model complex relationships. Fourier transforms appear because in many cases solving in the frequency domain is easier: transform, solve, then transform back.

So the core idea is:

  1. We have a graph; each node represents an entity. In physics, it’s a unit with a temperature. In ads/recommender systems, it’s a user/item/ad with an embedding vector as its state.
  2. Neighboring nodes are more closely related. In physics, this is spatial proximity. In ads/recs, it’s logical relations like “clicked,” “purchased,” etc.
  3. Nodes propagate heat/messages to neighbors, making neighboring states more similar.

Essentially, this is Message Passing—a kind of inductive process. Convolution, Fourier transforms—those are just surface-level solution methods.

Discretizing time → iterative update (ML-friendly)

We can write the original equation:


$$\frac{d\phi}{dt} = -kL\phi$$

In ML, time is discrete, so the derivative becomes a step-wise iteration: ${\phi}_{t+1}=\phi_{t}-kL{\phi}_{t}$

This looks exactly like gradient descent:


$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta \mathcal{L}(\theta_t)$$


Except here the update is Laplacian acting on the current state, rather than a gradient of a loss function. So for each node $i$:


$$\phi_i^{(t+1)} = \phi_i^{(t)} - k\left[\mathrm{deg}(i)\,\phi_i^{(t)} - \sum_{j\in N(i)}\phi_j^{(t)}\right] = \big(1-k\,\mathrm{deg}(i)\big)\,\phi_i^{(t)} + k\sum_{j\in N(i)}\phi_j^{(t)}$$

So the next state is a weighted mix of the node’s current state and its neighbors’.
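The matrix update and the per-node "mix self with neighbors" reading are the same computation, which is easy to check numerically. A sketch on an illustrative 3-node chain:

```python
import numpy as np

# Same update written two ways: the matrix form (I - kL) phi, and the
# per-node weighted mix of self and neighbors. They agree exactly.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)  # 3-node chain
L = np.diag(A.sum(axis=1)) - A
phi = np.array([1.0, 5.0, 2.0])
k = 0.2

matrix_form = phi - k * (L @ phi)

per_node = np.array([
    (1 - k * A[i].sum()) * phi[i] + k * (A[i] @ phi)  # (1 - k*deg(i))*self + k*sum(neighbors)
    for i in range(3)
])

assert np.allclose(matrix_form, per_node)
print(matrix_form)
```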

This is exactly what GraphSage does:

  1. Aggregate neighbors’ states.
  2. Combine (concat + neural network) neighbor-aggregate with the node’s own state → new node state.

So GraphSage is just a nonlinear, learnable generalization of heat diffusion. We can generalize Aggregate to e.g. sum pooling, LSTM, attention, etc.
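As a sketch of that Aggregate-then-Combine pattern (not the reference GraphSAGE implementation; the mean aggregator, feature sizes, and random weights are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def graphsage_layer(H, neighbors, W):
    """One mean-aggregator, GraphSAGE-style step (illustrative sketch)."""
    outs = []
    for i in range(H.shape[0]):
        agg = H[neighbors[i]].mean(axis=0)          # 1. aggregate neighbor states
        combined = np.concatenate([H[i], agg])      # 2. concat with own state
        outs.append(np.maximum(0.0, W @ combined))  # learnable map + ReLU nonlinearity
    return np.stack(outs)

# Toy 3-node chain with 4-dim features; W maps the 8-dim concat back to 4 dims.
H = rng.normal(size=(3, 4))
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = rng.normal(size=(4, 8))
H_new = graphsage_layer(H, neighbors, W)
print(H_new.shape)  # (3, 4)
```

Swap the `mean` for a sum, an LSTM over neighbors, or attention weights and you recover the other aggregator variants.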

In actual implementations,

  1. You don’t need to operate on the whole graph at once. You can batchify and update only a subset of nodes at a time, letting different parts take turns.
  2. You can restrict message passing to only the 1st- and 2nd-degree neighbors of a node, rather than the entire graph.
  3. This is loosely analogous to taking an eigen-decomposition of the Laplacian and keeping only the low-frequency eigenvectors: the spectrum typically has a long-tail distribution, where the top few eigenvalues/eigenvectors dominate the energy.
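Point 3's spectral-truncation intuition can be illustrated directly: project a smooth signal onto the few lowest-frequency eigenvectors of $L$ and observe that little is lost. The path graph and the signal here are illustrative choices:

```python
import numpy as np

# Build the Laplacian of a 20-node path graph.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

eigvals, U = np.linalg.eigh(L)  # ascending eigenvalues = low -> high frequency

phi = np.linspace(0.0, 1.0, n)  # a smooth signal on the path
k_keep = 3
phi_low = U[:, :k_keep] @ (U[:, :k_keep].T @ phi)  # keep 3 lowest-frequency modes

err = np.linalg.norm(phi - phi_low) / np.linalg.norm(phi)
print(round(err, 4))  # small: smooth signals live in the low-frequency modes
```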

Fourier domain analogy

  • In Euclidean space, the eigenfunctions of the negated Laplacian $-\Delta$ are $e^{-j\omega t}$, with eigenvalues $\omega^2$:


$$-\Delta\, e^{-j\omega t} = -\frac{\partial^2}{\partial t^2}\, e^{-j\omega t} = \omega^2\, e^{-j\omega t}$$

  • In graph space, the eigenvalues/eigenvectors of the graph Laplacian $L$ play the same role.
  • Continuous space has infinite orthogonal Fourier bases; graph space has finite orthogonal eigenvectors.

So the graph Fourier transform is simply projection onto the eigenbasis of $L$. There are normalized Laplacians too, which just apply different normalization strategies to account for node degree: https://en.wikipedia.org/wiki/Laplacian_matrix
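A minimal sketch of the graph Fourier transform as projection onto the eigenbasis of $L$ (toy path graph and signal are illustrative):

```python
import numpy as np

# Graph Fourier transform = projection onto the eigenbasis of L.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # a 4-node path graph
L = np.diag(A.sum(axis=1)) - A

# L is symmetric PSD: eigh gives real eigenvalues (the "frequencies")
# and orthonormal eigenvectors (the finite Fourier basis).
eigvals, U = np.linalg.eigh(L)

phi = np.array([1.0, 2.0, 3.0, 4.0])
phi_hat = U.T @ phi       # forward graph Fourier transform
phi_back = U @ phi_hat    # inverse transform recovers the signal

assert np.allclose(phi_back, phi)
print(eigvals.round(3))   # smallest eigenvalue is 0: the constant mode
```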

Semi-supervised learning analogy: heat sources

In thermodynamics, for an isolated system, eventually everything reaches a uniform equilibrium (“heat death”). But if the graph has heat sources (nodes with fixed temperature), the system reaches a non-uniform steady state.

In semi-supervised learning:

  • Temperature = label
  • Heat source = labeled node (fixed label)
  • Unlabeled nodes passively receive propagated features from labeled ones until they stabilize.
  • So labeled nodes influence their neighbors, and labels propagate like heat.
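The heat-source picture can be sketched as diffusion with the labeled nodes clamped to fixed values each step (the graph, labels, and step size here are illustrative):

```python
import numpy as np

# 5-node path graph; the two end nodes are "labeled" heat sources.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

phi = np.zeros(5)
sources = {0: 1.0, 4: 0.0}  # node 0 labeled "1", node 4 labeled "0"
k = 0.2

for _ in range(500):
    phi = phi - k * (L @ phi)          # diffuse
    for i, v in sources.items():
        phi[i] = v                     # clamp the heat sources every step

print(phi.round(3))  # unlabeled nodes settle between the two labels
```

The system converges to a non-uniform steady state: each unlabeled node's value interpolates between the fixed labels, closer nodes pulled harder, exactly the label-propagation picture above.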

Optimization perspective

The heat equation can be discretized as:


$$\phi_{t+1} = \phi_t - kL\phi_t = (I - kL)\,\phi_t$$

Each node updates as:


$$\phi_i^{(t+1)} = \big(1-k\,\mathrm{deg}(i)\big)\,\phi_i^{(t)} + k\sum_{j\in N(i)}\phi_j^{(t)}$$

Compare this with GraphSage:


$$h_{N(i)} = \mathrm{AGGREGATE}\big(\{h_j : j\in N(i)\}\big), \qquad h_i' = \sigma\big(W\,[\,h_i \,\|\, h_{N(i)}\,]\big)$$

  • Original heat diffusion simply sums over neighbors.
  • GraphSage uses a learnable aggregator (sum, mean, LSTM, attention).
  • Original diffusion mixes neighbors with self using a fixed linear combination. GraphSage uses concat + neural net → nonlinear fusion.

So GCNs/GraphSage are learnable, nonlinear, sampled approximations of the same message-passing process.
