Benchmarking Topological Deep Learning

While TDA and Deep Learning rely on a wide range of mathematical structures and lifting/transforming algorithms, TopoBench simplifies the entire research cycle. It automates the design and evaluation process by providing a ready-to-use pipeline for configuring and training various graph and topological models.


What you will learn: The basics of TopoBench architecture, configuration, training and evaluation with a simple dataset.

👉 The full article, featuring design principles, detailed implementation, in-depth analysis, and Q&A, is available on the Substack article Benchmarking Topological Deep Learning



🎯  Overview

Topological Data Analysis (TDA) and Deep Learning encompass a diverse array of mathematical frameworks, higher-order structures, and sophisticated lifting and transformation algorithms. TopoBench streamlines this complexity by automating the design and assessment of topological architectures through an integrated pipeline for the configuration, training, and evaluation of Graph and Topological Neural Networks.

🎨 Modeling & Design Principles

TopoBench is a modular Python framework built to standardize benchmarks and streamline research within Topological Deep Learning (TDL). It enables the seamless training and comparative analysis of various Topological Neural Networks (TNNs) across multiple domains, including graphs, simplicial complexes, cellular complexes, and hypergraphs [ref 1].

Topological Domains

Topological Data Analysis (TDA) is a methodology that applies concepts from algebraic topology and computational geometry to analyze and extract meaningful patterns from complex datasets. It provides a geometric and topological perspective to study the shape and structure of data. 

The most common topological domains, as supported in TopoX and TopoBench, are:

  • Simplicial Complexes
  • Cellular Complexes
  • Hypergraphs
  • Combinatorial Complexes

A simplicial complex is a graph with faces. It generalizes graphs by modeling higher-order relationships among data elements: not just pairwise connections (edges), but also triplets, quadruplets, and beyond (0-simplex: node, 1-simplex: edge, 2-simplex: triangle, 3-simplex: tetrahedron) [ref 2].
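The defining property of a simplicial complex, that every face of a simplex is itself a simplex (downward closure), can be shown with a small pure-Python sketch, kept independent of TopoNetX for brevity:

```python
from itertools import combinations

def simplicial_closure(top_simplices):
    """Build a simplicial complex as the set of all faces of the given top-level simplices."""
    complex_ = set()
    for simplex in top_simplices:
        # Every non-empty subset of a simplex is a face, hence a simplex itself
        for r in range(1, len(simplex) + 1):
            for face in combinations(sorted(simplex), r):
                complex_.add(frozenset(face))
    return complex_

# A filled triangle {0, 1, 2} plus an extra edge {2, 3}
sc = simplicial_closure([(0, 1, 2), (2, 3)])
nodes = [s for s in sc if len(s) == 1]   # 0-simplices
edges = [s for s in sc if len(s) == 2]   # 1-simplices
faces = [s for s in sc if len(s) == 3]   # 2-simplices
```

Here the single 2-simplex (0, 1, 2) automatically contributes its three edges and three vertices, which is exactly the closure property that distinguishes a simplicial complex from an arbitrary hypergraph.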

Cell complexes (or CW complexes) represent objects of flexible shape which are built out of basic ball-shaped building blocks (cells) of arbitrary dimension. Cells of different dimensions are rigidly related. For example, an area is enclosed by lines, which in turn are enclosed by points. This rigid structure describes the underlying topology. These complexes can be used as an alternative to Graph Neural Networks when data modeling requires high-order relationships [ref 3].

A hypergraph is a generalization of a graph in which a hyperedge can connect any number of nodes. As with cell and simplicial complexes, a node and a hyperedge are said to be incident if the node is a member of the hyperedge. Hypergraphs are ideal for encoding relationships that are not strictly pairwise, such as a chemical reaction involving multiple molecules or a conference call with many participants [ref 4].
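The incidence relation just described is conveniently captured by a binary node-by-hyperedge matrix. Libraries such as TopoNetX provide this natively; the pure-Python helper below is only illustrative:

```python
def incidence_matrix(num_nodes, hyperedges):
    """Binary incidence matrix B where B[v][e] = 1 if node v belongs to hyperedge e."""
    B = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            B[v][e] = 1
    return B

# Three hyperedges over 5 nodes, e.g. reactions each involving several molecules
hyperedges = [{0, 1, 2}, {1, 3}, {2, 3, 4}]
B = incidence_matrix(5, hyperedges)

node_degree = [sum(row) for row in B]        # hyperedges incident to each node
edge_size = [sum(col) for col in zip(*B)]    # nodes per hyperedge: can exceed 2
```

The fact that a column sum (hyperedge size) can be any positive integer, not just 2, is precisely what separates hypergraphs from ordinary graphs.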

Combinatorial complexes are structures used to represent topological spaces—like surfaces or multi-dimensional shapes—by breaking them down into discrete pieces like points, line segments, triangles, and their higher-dimensional counterparts. They are more flexible than Simplicial and Cell complexes.

Table 1 Comparative summary of topological domains: simplicial complexes, cell complexes, combinatorial complexes, and hypergraphs

📌 This section emphasizes complexes, assuming they are less familiar to the general reader. Note, however, that TopoBench also natively supports traditional topological domains, including graphs and point clouds. Graphs (nodes and edges) and point clouds (nodes only) are, after all, the simplest forms of complexes.


Topological Lifting

Lifting is the mathematical and architectural process of transforming a graph (composed of nodes and edges) into a higher-order topological structure supporting cells, faces, volumes or hyperedges.

The most common scenario is the topological lifting of a graph to a simplicial complex, as described and illustrated in a previous article [ref 5]. The resulting components are:

  • 0-Simplices (Nodes): The original vertices.
  • 1-Simplices (Edges): The original connections.
  • 2-Simplices (Faces): Created by identifying cliques (like a triangle of three connected nodes) and filling them in as a solid surface.
  • 3-Simplices (Tetrahedra): Created by identifying fully connected groups of four nodes.

There are many lifting techniques; the most common among them are:

  • Clique Complex Lifting: This is the most common method. Every k-clique (a set of k nodes where every node is connected to every other node) in the graph is mapped to a (k-1) simplex. For example, every 3-node cycle becomes a triangular face.
  • Ring/Cycle Lifting: Often used in chemistry (for molecules like Benzene), this method identifies cycles of a certain length and lifts them to 2-cells. This is particularly useful for Cellular Neural Networks.
  • Curvature-Based Lifting: Using measures like Ollivier-Ricci curvature, you can lift regions of high connectivity or specific geometric properties into higher-order structures to better represent the manifold’s “flow.”
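The first technique, clique-complex lifting, can be sketched in a few lines of plain Python: enumerate candidate k-node subsets and keep those whose nodes are pairwise connected. This is a brute-force illustration, not a scalable implementation:

```python
from itertools import combinations

def clique_lift(edges, k=3):
    """Lift a graph to (k-1)-simplices: every k-clique becomes a filled simplex."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    simplices = []
    for cand in combinations(nodes, k):
        # A clique requires every pair of candidate nodes to be connected
        if all(b in adj[a] for a, b in combinations(cand, 2)):
            simplices.append(cand)
    return simplices

# A triangle (0, 1, 2) plus a pendant edge (2, 3): only the triangle is lifted
triangles = clique_lift([(0, 1), (1, 2), (0, 2), (2, 3)])
```

With k=3 every 3-node cycle becomes a triangular face, matching the description above; larger k yields tetrahedra and beyond.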

While graph-to-topological-complex transformations are the standard entry point for TDL, the lifting process is not limited to graphs; one topological complex can be further lifted into another, more sophisticated higher-order structure.

Fig. 1 Illustration of the hierarchy of lifting techniques from lower to higher order topological structures in TopoBench

📌 This article focuses on graphs and topological complexes. However, a simpler topological structure, a point set or point cloud, can also be lifted into a graph or any of the complexes.



TopoBench

TopoBench provides a unified benchmarking infrastructure for Topological Deep Learning (TDL) by integrating and expanding upon current software tools. It combines NetworkX for graph processing with the TopoX suite—specifically TopoNetX [ref 6] for building complex structures and TopoModelX for model implementation. Additionally, it supports out-of-the-box PyTorch Geometric (PyG) [ref 7] models and original research code.

At its core, TopoBench is a unified and flexible workflow that supports a variety of datasets, data transformations, and preprocessing methods, along with deep learning models (e.g., Graph Neural Networks) and customizable metrics.

The key components are:

  • Customizable loader: Extends the capability of PyG’s InMemoryDataset.
  • Data loader: Provides interface to data batch for graphs, hypergraphs, simplicial, cell and combinatorial complexes.
  • Pre-processor: Defines a pipeline of sequential transforms that processes dataset only once.
  • Transforms: Inherit from PyG’s BaseTransform; categorized as data manipulation, topology lifting, or feature lifting.
  • Models: PyTorch Lightning modules; neural networks defined as a backbone Topological Neural Network can be imported from either PyG or TopoX.

A standout capability of TopoBench is its support for 'lifting,' which allows users to transform basic graph data into higher-order topological structures. This process enables the elevation of both raw features and the underlying connectivity, with the primary transformation routes detailed in the diagram below.

Fig. 2 Illustration of various topological domains and lifting methods supported in TopoBench


⚙️ Hands‑on with Python

Environment

# 1. Setup uv if not installed
wget -qO- https://astral.sh/uv/install.sh | sh
      or
pip install uv
      or
brew install uv     # MacOS

# 2. Load the source code
git clone git@github.com:geometric-intelligence/topobench.git
       or
git clone https://github.com/geometric-intelligence/topobench.git
cd topobench

# 3. Setup the virtual environment
uv venv --python 3.11
source .venv/bin/activate

# 4. Sync dependencies with suitable version (torch ...)
uv sync --all-extras        


Wrapper

The configuration of TopoBench for evaluating graphs or topological structures such as Simplicial Complexes can be tedious. Let’s automate and componentize the TopoBench functionality by wrapping it into a class, TopoBenchWrapper.

The default constructor takes 2 arguments:

  • graph_network: A simple graph model
  • topo_bench_descriptors: A tuple of several JSON-formatted descriptors for loading, transforming, splitting, training/optimizing and evaluating the model

Example of descriptors

transform_desc = {
   "khop_lifting": {
       "transform_type": "lifting",
       "transform_name": "HypergraphKHopLifting",
       "k_value": k_value
   }
}

evaluator_desc = {
   "task": "classification",
   "num_classes": 2,
   "metrics": ["accuracy", "precision", "recall", "f1"]
}        


🔎 The class TopoBenchConfig described in the appendix automates the configuration of TopoBench from the list of descriptors.
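To build intuition for the HypergraphKHopLifting descriptor above: a k-hop hypergraph lifting groups each node with its k-hop neighborhood into one hyperedge. The plain-Python sketch below illustrates the idea; it is not TopoBench's actual implementation:

```python
from collections import deque

def k_hop_hyperedges(edges, k):
    """One hyperedge per node: the node together with all vertices within k hops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    hyperedges = {}
    for src in adj:
        # Breadth-first search bounded at depth k
        seen = {src}
        frontier = deque([(src, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == k:
                continue
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, depth + 1))
        hyperedges[src] = frozenset(seen)
    return hyperedges

# Path graph 0-1-2-3 with k=1: each hyperedge is a node plus its direct neighbors
he = k_hop_hyperedges([(0, 1), (1, 2), (2, 3)], k=1)
```

Increasing k_value grows each hyperedge toward the full connected component, which is why it is the key tuning parameter of this lifting.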

The alternative constructor, build, loads a predefined, parameterized configuration of TopoBench defined in __get_config_descriptors. This implementation illustrates parameterizing the TopoBench configuration with k_value for k-hop lifting, the dataset data_name, and the learning rate lr used by the optimizer.
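The wrapper's overall shape might look like the following skeleton. The identifiers simply mirror the description above; the actual delegation to TopoBench's configuration and training machinery is elided, and only a subset of the descriptors is shown:

```python
class TopoBenchWrapper:
    """Illustrative skeleton: assembles a TopoBench run from JSON-style descriptors."""

    def __init__(self, graph_network, topo_bench_descriptors):
        # graph_network: a simple graph model
        # topo_bench_descriptors: tuple of descriptors (loader, transform, optimizer, ...)
        self.graph_network = graph_network
        self.topo_bench_descriptors = topo_bench_descriptors

    @classmethod
    def build(cls, graph_network, k_value, data_name, lr):
        # Alternative constructor: predefined, parameterized configuration
        descriptors = cls.__get_config_descriptors(k_value, data_name, lr)
        return cls(graph_network, descriptors)

    @staticmethod
    def __get_config_descriptors(k_value, data_name, lr):
        transform_desc = {
            "khop_lifting": {
                "transform_type": "lifting",
                "transform_name": "HypergraphKHopLifting",
                "k_value": k_value,
            }
        }
        loader_desc = {"data_name": data_name}
        optimizer_desc = {"lr": lr}
        return loader_desc, transform_desc, optimizer_desc

wrapper = TopoBenchWrapper.build(None, k_value=2, data_name="MUTAG", lr=1e-3)
```

The benefit of the classmethod constructor is that experiment variants differ only in three scalar arguments, not in hand-edited descriptor dictionaries.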

🔎 Training Implementation

With the necessary components assembled, we can now proceed to train the lightning_graph_model. This model is derived from the original PyTorch module via a conversion process detailed in the Appendix. The training configuration is defined by three primary hyperparameters:

  • max_epochs: The upper limit for training iterations.
  • float_precision: The bit-depth for weights and input data (16, 32, or 64-bit).
  • device_name: The specific hardware accelerator (CPU/GPU/TPU) used for computation.
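These three hyperparameters map naturally onto PyTorch Lightning's Trainer arguments (max_epochs, precision, accelerator). Below is a small, hypothetical validation helper, written in plain Python so it runs without Lightning installed:

```python
def trainer_kwargs(max_epochs, float_precision, device_name):
    """Map the three training hyperparameters to PyTorch-Lightning-style Trainer arguments."""
    if float_precision not in (16, 32, 64):
        raise ValueError(f"Unsupported precision: {float_precision}")
    if device_name not in ("cpu", "gpu", "tpu"):
        raise ValueError(f"Unsupported accelerator: {device_name}")
    return {
        "max_epochs": max_epochs,        # upper limit for training iterations
        "precision": float_precision,    # bit-depth of weights and input data
        "accelerator": device_name,      # hardware accelerator: cpu / gpu / tpu
    }

kwargs = trainer_kwargs(max_epochs=60, float_precision=32, device_name="cpu")
# Then, e.g.: trainer = lightning.Trainer(**kwargs)
```

Validating the values up front yields an immediate, readable error instead of a failure deep inside the training loop.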


Note: In PyTorch Lightning, metrics are efficiently collected through the Trainer’s callback_metrics attribute.





📈 Evaluation

Datasets

The PyTorch Geometric library contains a rich set of graph datasets covering node classification, edge prediction, and graph classification for graph structures with various degrees of homophily [ref 8].

TUDataset is a collection of over 120 datasets of varying sizes, spanning a wide range of applications in graph classification and regression. My evaluation uses two of these datasets: MUTAG and PROTEINS [ref 9].

MUTAG is a collection of 188 nitroaromatic chemical compounds. The primary goal is a binary classification task: predicting whether a given molecule has a mutagenic effect (specifically on the Salmonella typhimurium bacterium).

  • Positive Class: Mutagenic (harmful/toxic potential).
  • Negative Class: Non-mutagenic.
  • 188 graphs (training & evaluation)
  • 7 discrete node features
  • 4 discrete edge labels

The PROTEINS dataset deals with much larger macromolecular structures, making it a more rigorous test of a model’s ability to handle scale and complexity. The task is to determine whether a protein is an enzyme or a non-enzyme.

  • Positive Class: Enzyme (catalyzes biochemical reactions).
  • Negative Class: Non-Enzyme (structural or signaling proteins)
  • 1113 graphs
  • 3 node features
  • No edge features


📌 I purposely selected the MUTAG dataset used in one of the tutorials of TopoBench so the reader can validate the results.


🔎 Training & Testing

The first step is to select a simple multi-layer perceptron model with two sets of linear layers with their associated activation functions.

  • hypernodes_linear: Module processing node features of the hypergraph, paired with the ReLU activation hypernodes_relu
  • hyperedges_linear: Module processing edge features of the hypergraph, paired with the ReLU activation hyperedges_relu
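A minimal PyTorch sketch of such a two-branch perceptron follows; the dimensions are illustrative, and the actual model wiring is listed in the appendix:

```python
import torch
import torch.nn as nn

class HypergraphMLP(nn.Module):
    """Two linear branches with ReLU activations: one for hypernode features, one for hyperedge features."""

    def __init__(self, node_dim, edge_dim, hidden_dim):
        super().__init__()
        self.hypernodes_linear = nn.Linear(node_dim, hidden_dim)
        self.hypernodes_relu = nn.ReLU()
        self.hyperedges_linear = nn.Linear(edge_dim, hidden_dim)
        self.hyperedges_relu = nn.ReLU()

    def forward(self, x_nodes, x_edges):
        # Each branch independently embeds its own feature set
        h_nodes = self.hypernodes_relu(self.hypernodes_linear(x_nodes))
        h_edges = self.hyperedges_relu(self.hyperedges_linear(x_edges))
        return h_nodes, h_edges

# Illustrative dimensions: 7 input features (as in MUTAG nodes), 32 hidden units
model = HypergraphMLP(node_dim=7, edge_dim=7, hidden_dim=32)
h_n, h_e = model(torch.randn(188, 7), torch.randn(188, 7))
```

Keeping the node and hyperedge branches separate mirrors the incidence structure of the lifted hypergraph: the two feature sets live on different topological ranks.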

Finally, let’s train and evaluate on these two datasets with slightly different training parameters.

Loader descriptor for the MUTAG dataset:
"data_domain": "graph",
"data_type": "TUDataset",
"data_name": "MUTAG",
"data_dir": "./data/MUTAG/"        
Fig. 3 Performance metrics (accuracy, precision, recall, F1) and training and test losses for k-hop lifting to a hypergraph on the TUDataset/MUTAG dataset

💎 Evaluation of TopoBench model on Proteins dataset is available at Benchmarking Topological Deep Learning - Training & Test


 

📘 References

  1. TopoBench: A Framework for Benchmarking Topological Deep Learning - L. Telyatnikov et al., 2025
  2. Exploring Simplicial Complexes for Deep Learning: Concepts to Code - Hands-on Geometric Deep Learning, 2025
  3. Graphs Reimagined: The Power of Cell Complexes - Hands-on Geometric Deep Learning, 2025
  4. Exploring Hypergraphs with TopoX Library - Hands-on Geometric Deep Learning, 2025
  5. Topological Lifting of Graph Neural Networks - Hands-on Geometric Deep Learning, 2025
  6. TopoX: A Suite of Python Packages for Machine Learning on Topological Domains - M. Hajij et al., 2024
  7. Taming PyTorch Geometric for Graph Neural Networks - Hands-on Geometric Deep Learning, 2025
  8. PyTorch Geometric - Dataset Cheatsheet
  9. TUDataset: A collection of benchmark datasets for learning with graphs - C. Morris, N. Kriege, F. Bause, K. Kersting, P. Mutzel, M. Neumann, 2020


💎 Key takeaways, exercises and paper review is available in the original Substack article Benchmarking Topological Deep Learning 



⏭️ Share in the comments the next topic you’d like me to tackle.


Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning", Packt Publishing ISBN 978-1-78712-238-3 and Hands-on Geometric Deep Learning Newsletter.
