Graph Convolution Networks and Heat Diffusion
Let’s first look at the essence of the problem:
Every node in a graph is constantly changing its state under the influence of its neighbors and even more distant nodes, until it finally reaches equilibrium. Closer neighbors exert stronger influence.
To truly understand GCNs, the most important thing is to understand what the Laplacian matrix is doing; the rest is just different ways of solving the same problem. Fourier transforms, for example, simply shift the problem from the spatial domain to the frequency domain, but there are also methods that solve it directly in the spatial domain (e.g., GraphSage).
To make the problem easier to understand, let’s forget about time and frequency domains for now, and start with the simplest physical phenomenon—**heat conduction.**
Heat diffusion model on a graph
As we know, in the absence of external intervention, heat flows irreversibly from hotter regions to cooler regions. According to Newton’s Law of Cooling, the rate of heat transfer between two bodies in contact is proportional to their temperature difference. Intuitively, if location A is hotter than location B, and they are in contact, then heat will flow from A to B at a rate proportional to the difference in their temperatures.
1-Dimensional case
Let’s build a 1D heat diffusion model. Imagine a uniform iron rod, where different points have different temperatures. We want to describe how the temperature distribution on the rod evolves over time. For a continuous rod, we would need a partial differential equation (PDE) with respect to both time and space, but let’s keep it simple by discretizing space. Suppose the rod is now a 1D chain, where each segment has a single uniform temperature. Heat flows between adjacent segments:
The $i$-th segment exchanges heat only with segments $i-1$ and $i+1$. If its current temperature is $\phi_i$, we have:

$$\frac{d\phi_i}{dt} = k(\phi_{i+1} - \phi_i) - k(\phi_i - \phi_{i-1})$$
Here $k$ is a constant depending on material properties like specific heat. The first term is heat flowing in from the next segment, raising the temperature, and the second term is heat flowing out to the previous segment, lowering it.
The right-hand side is a difference of differences, i.e., a discrete second derivative! So for a continuous 1D rod, the heat conduction equation becomes:

$$\frac{\partial \phi}{\partial t} = k\,\frac{\partial^2 \phi}{\partial x^2}$$
In higher-dimensional Euclidean space, the first derivative generalizes to the gradient $\nabla$, and the second derivative generalizes to the Laplacian operator $\Delta = \nabla^2$:

$$\frac{\partial \phi}{\partial t} = k\,\Delta\phi$$
So we see: in Euclidean space, the rate of change of the temperature field is proportional to its Laplacian.
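To make this concrete, here is a minimal NumPy sketch (my illustration, not code from the article) that simulates the discrete chain with insulated ends; heat spreads out until the rod is uniformly warm:

```python
import numpy as np

# Simulate d(phi_i)/dt = k * (phi_{i+1} - 2*phi_i + phi_{i-1}) on a 1D chain.
# Chain length, rate constant, and step count are illustrative choices.
n, k, steps = 20, 0.1, 2000
phi = np.zeros(n)
phi[n // 2] = 100.0            # a single hot segment in the middle

for _ in range(steps):
    lap = np.zeros(n)
    # discrete second derivative in the interior
    lap[1:-1] = phi[2:] - 2 * phi[1:-1] + phi[:-2]
    # insulated ends: no heat flux across the boundary
    lap[0] = phi[1] - phi[0]
    lap[-1] = phi[-2] - phi[-1]
    phi = phi + k * lap        # explicit Euler time step

print(phi.round(2))  # ~5.0 everywhere: heat spreads out, total heat conserved
```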
Now, let’s extend this heat conduction model to a graph topology, and you’ll see that a GCN is essentially describing the same thing!
Heat diffusion on a graph
Still considering heat conduction, but now on a graph. Each node is a unit that only exchanges heat with adjacent nodes (those connected by edges).
For example, consider a graph in which node 1 is connected to nodes 0, 2, and 4: node 1 only exchanges heat with those three neighbors. More distant nodes, like 5, can only influence it indirectly via node 4.
If we assume heat flow still follows Newton’s Law of Cooling, then for any node $i$, the temperature evolves as:

$$\frac{d\phi_i}{dt} = -k\sum_{j} A_{ij}\,(\phi_i - \phi_j)$$
where $A$ is the adjacency matrix of the graph:

$$A_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
We only consider the simple case: an undirected, unweighted graph with no self-loops, so $A$ is symmetric and $A_{ii} = 0$.
Now, rearrange:

$$\frac{d\phi_i}{dt} = -k\left[\deg(i)\,\phi_i - \sum_{j} A_{ij}\,\phi_j\right]$$
where $\deg(i)$ is the degree of node $i$ (how many edges it has).
Now collect all nodes’ temperatures into a vector $\phi=[\phi_1,\dots,\phi_n]^T$:

$$\frac{d\phi}{dt} = -k(D\phi - A\phi)$$
where $D = \text{diag}(\deg(1),\dots,\deg(n))$ is the degree matrix. So we get:

$$\frac{d\phi}{dt} = -kL\phi$$
where $L=D-A$ is the graph Laplacian. Compare this with the continuous Euclidean PDE:

$$\frac{\partial \phi}{\partial t} = k\,\Delta\phi$$
They have the same form! (The sign flips only because, by convention, the graph Laplacian $L = D - A$ corresponds to the negative of the continuous operator $\Delta$.)
So we can naturally generalize continuous-space heat conduction to graphs, where the analogous operator is the graph Laplacian L.
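As a sanity check, here is a small NumPy sketch that builds $L = D - A$ for a six-node graph consistent with the example above (node 1 adjacent to nodes 0, 2, and 4; the remaining edges are my assumption) and evaluates one instant of diffusion:

```python
import numpy as np

# Hypothetical 6-node graph: node 1 is connected to 0, 2, and 4, as in the
# example; the other edges (2-3, 3-4, 4-5) are assumed for illustration.
edges = [(0, 1), (1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0    # undirected, unweighted

D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # graph Laplacian

phi = np.array([0.0, 100.0, 0.0, 0.0, 0.0, 0.0])  # node 1 starts hot
k = 0.1
dphi_dt = -k * L @ phi
print(dphi_dt)  # node 1 cools by 30; neighbors 0, 2, 4 each warm by 10;
                # nodes 3 and 5 are unaffected at this instant
```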
Extending to GCNs
Now the picture is clear: heat diffusion on a graph is governed by $\frac{d\phi}{dt}=-kL\phi$, with the graph Laplacian $L$ playing exactly the role that $\Delta$ plays in Euclidean space. A GCN keeps this propagation structure, but replaces the scalar temperature with a feature vector per node and makes the transformation learnable.
We can enrich the model with kernels, neural networks, and non-linearities to capture more complex relationships. Fourier transforms appear because, in many cases, solving in the frequency domain is easier: you transform, solve, then transform back.
So the core idea is: each node’s state is repeatedly updated by aggregating the states of its neighbors, and the Laplacian specifies exactly how that local influence propagates.
Essentially, this is Message Passing—a kind of inductive process. Convolution, Fourier transforms—those are just surface-level solution methods.
Discretizing time → iterative update (ML-friendly)
We can write the original equation:

$$\frac{d\phi}{dt} = -kL\phi$$
In ML, time is discrete, so the derivative becomes a step-wise iteration:

$$\phi_{t+1} = \phi_t - kL\phi_t$$
This looks exactly like gradient descent:

$$x_{t+1} = x_t - \eta\,\nabla f(x_t)$$
Except here the update is the Laplacian acting on the current state, rather than the gradient of a loss function. So for each node $i$:

$$\phi_i^{(t+1)} = \phi_i^{(t)} - k\Big[\deg(i)\,\phi_i^{(t)} - \sum_{j \in N(i)} \phi_j^{(t)}\Big] = \big(1 - k\deg(i)\big)\,\phi_i^{(t)} + k\sum_{j \in N(i)} \phi_j^{(t)}$$
So the next state is a weighted mix of the node’s current state and its neighbors’.
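In code, the whole iteration is one line per step. A minimal sketch on a toy three-node path graph (my illustration):

```python
import numpy as np

# Discretized diffusion: phi_{t+1} = (I - k*L) @ phi_t.
# The 3-node path graph and initial temperatures are illustrative.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
phi = np.array([100.0, 0.0, 0.0])
k = 0.2

for t in range(50):
    phi = phi - k * L @ phi    # one message-passing step

print(phi.round(2))  # converges to the uniform average [33.33, 33.33, 33.33]
```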
This is exactly what GraphSage does:

$$h_{N(i)}^{(t)} = \text{Aggregate}\big(\{\,h_j^{(t-1)} : j \in N(i)\,\}\big)$$

$$h_i^{(t)} = \sigma\Big(W^{(t)} \cdot \big[\,h_i^{(t-1)} \,\|\, h_{N(i)}^{(t)}\,\big]\Big)$$
So GraphSage is just a nonlinear, learnable generalization of heat diffusion. We can generalize Aggregate to, e.g., Mean Pooling, Sum Pooling, LSTM, Attention, etc.
In actual implementations, GraphSage also samples a fixed-size set of neighbors for each node rather than aggregating over all of them, which keeps the per-node cost bounded on large graphs; a sketch of one such layer follows below.
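Here is an illustrative sketch of one GraphSage-style layer with a mean aggregator and neighbor sampling; all names, the weight shapes, and the toy graph are my own assumptions, and real implementations differ in details:

```python
import numpy as np

rng = np.random.default_rng(0)

def graphsage_layer(H, neighbors, W, sample_size=5):
    """One GraphSage-style layer with a mean aggregator (illustrative sketch).

    H: (n, d) node features; neighbors: list of neighbor-index lists;
    W: (2d, d_out) learnable weight matrix.
    """
    n, d = H.shape
    out = np.zeros((n, W.shape[1]))
    for i in range(n):
        nbrs = neighbors[i]
        if len(nbrs) > sample_size:                 # neighbor sampling
            nbrs = rng.choice(nbrs, sample_size, replace=False)
        agg = H[nbrs].mean(axis=0) if len(nbrs) else np.zeros(d)
        concat = np.concatenate([H[i], agg])        # [self || neighborhood]
        out[i] = np.maximum(concat @ W, 0.0)        # ReLU non-linearity
    return out

# Toy usage on the 3-node path graph from the previous sketch
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [[1], [0, 2], [1]]
W = rng.normal(size=(4, 2))
print(graphsage_layer(H, neighbors, W))
```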
Fourier domain analogy
Since $L$ is real and symmetric, it has an orthonormal eigendecomposition $L = U\Lambda U^T$. So the graph Fourier transform is simply projection onto the eigenbasis of $L$: $\hat{\phi} = U^T\phi$, with inverse $\phi = U\hat{\phi}$. There are normalized Laplacians too, which just apply different normalization strategies to account for node degree. See https://en.wikipedia.org/wiki/Laplacian_matrix
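A small sketch of this fact (my illustration): compute the eigenbasis with `np.linalg.eigh`, project a signal, and recover it:

```python
import numpy as np

# Graph Fourier transform = projection onto the eigenbasis of L.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals, U = np.linalg.eigh(L)     # L is symmetric: orthonormal eigenbasis
phi = np.array([100.0, 0.0, 0.0])

phi_hat = U.T @ phi                # forward graph Fourier transform
phi_back = U @ phi_hat             # inverse transform recovers the signal
print(np.allclose(phi, phi_back))  # True

# Diffusion damps high-frequency (large-eigenvalue) components fastest:
# in the spectral domain, phi_hat(t) = exp(-k * eigvals * t) * phi_hat(0).
```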
Semi-supervised learning analogy: heat sources
In thermodynamics, for an isolated system, eventually everything reaches a uniform equilibrium (“heat death”). But if the graph has heat sources (nodes with fixed temperature), the system reaches a non-uniform steady state.
In semi-supervised learning:

- Labeled nodes act as heat sources: their values are clamped to the known labels and never cool down.
- Unlabeled nodes update by diffusion, absorbing influence from their neighbors.
- The resulting non-uniform steady state assigns every unlabeled node a value, from which its label can be read off.
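A minimal label-propagation sketch along these lines (my illustration; the graph, labels, and constants are assumptions):

```python
import numpy as np

# Labeled nodes are "heat sources" clamped to +1 / -1; unlabeled nodes
# diffuse toward a non-uniform steady state.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)   # a 5-node path graph
L = np.diag(A.sum(axis=1)) - A

phi = np.zeros(5)
labeled = {0: 1.0, 4: -1.0}                    # two known labels
k = 0.2

for _ in range(200):
    phi = phi - k * L @ phi                    # diffusion step
    for i, y in labeled.items():
        phi[i] = y                             # clamp the heat sources

print(phi.round(2))  # interior nodes interpolate: [1.0, 0.5, 0.0, -0.5, -1.0]
```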
Optimization perspective
The heat equation can be discretized as:

$$\phi_{t+1} = \phi_t - kL\phi_t = (I - kL)\,\phi_t$$
Each node updates as:

$$\phi_i^{(t+1)} = \big(1 - k\deg(i)\big)\,\phi_i^{(t)} + k\sum_{j \in N(i)} \phi_j^{(t)}$$
Compare this with GraphSage:

$$h_i^{(t)} = \sigma\Big(W^{(t)} \cdot \big[\,h_i^{(t-1)} \,\|\, \text{Aggregate}(\{h_j^{(t-1)} : j \in N(i)\})\,\big]\Big)$$
So GCNs/GraphSage are learnable, nonlinear, sampled approximations of the same message-passing process.
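One way to make the gradient-descent analogy precise (my addition, a standard fact about the Laplacian rather than something stated above): the diffusion step is exactly gradient descent on the Dirichlet energy $E(\phi) = \frac{1}{2}\phi^T L\phi$, since $\nabla E = L\phi$ for symmetric $L$. A quick numeric check:

```python
import numpy as np

# Check that the diffusion update phi - k * (L @ phi) is a gradient descent
# step on E(phi) = 0.5 * phi^T L phi, whose gradient is L @ phi.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
phi = np.array([3.0, -1.0, 2.0])

grad = L @ phi
# central finite-difference gradient of E for comparison
eps = 1e-6
fd = np.array([
    (0.5 * (phi + eps * e) @ L @ (phi + eps * e) -
     0.5 * (phi - eps * e) @ L @ (phi - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(grad, fd))  # True: diffusion = gradient descent on E
```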