Benchmarking Topological Deep Learning
While TDA and Deep Learning rely on a wide range of mathematical structures and lifting/transforming algorithms, TopoBench simplifies the entire research cycle. It automates the design and evaluation process by providing a ready-to-use pipeline for configuring and training various graph and topological models.
What you will learn: The basics of TopoBench architecture, configuration, training and evaluation with a simple dataset.
👉 The full article, featuring design principles, detailed implementation, in-depth analysis, and Q&A, is available on the Substack article Benchmarking Topological Deep Learning
🎯 Overview
Topological Data Analysis (TDA) and Deep Learning encompass a diverse array of mathematical frameworks, higher-order structures, and sophisticated lifting and transformation algorithms. TopoBench streamlines this complexity by automating the design and assessment of topological architectures through an integrated pipeline for the configuration, training, and evaluation of Graph and Topological Neural Networks.
🎨 Modeling & Design Principles
TopoBench is a modular Python framework built to standardize benchmarks and streamline research within Topological Deep Learning (TDL). It enables the seamless training and comparative analysis of various Topological Neural Networks (TNNs) across multiple domains, including graphs, simplicial complexes, cellular complexes, and hypergraphs [ref 1].
Topological Domains
Topological Data Analysis (TDA) is a methodology that applies concepts from algebraic topology and computational geometry to analyze and extract meaningful patterns from complex datasets. It provides a geometric and topological perspective to study the shape and structure of data.
The most common topological domains supported in TopoX and TopoBench are the following.
A simplicial complex is, informally, a graph with faces. It generalizes graphs by modeling higher-order relationships among data elements: not just pairwise edges, but also triplets, quadruplets, and beyond (0-simplex: node, 1-simplex: edge, 2-simplex: triangle, 3-simplex: tetrahedron) [ref 2].
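To make this concrete, here is a minimal plain-Python sketch (not the TopoNetX or TopoBench API): a simplicial complex can be stored as the downward closure of its maximal simplices, since every subset of a simplex must itself be a simplex.

```python
from itertools import combinations

def downward_closure(maximal_simplices):
    """Build every simplex (as a frozenset) from the maximal ones."""
    simplices = set()
    for simplex in maximal_simplices:
        for k in range(1, len(simplex) + 1):
            for face in combinations(sorted(simplex), k):
                simplices.add(frozenset(face))
    return simplices

# Two triangles (2-simplices) sharing the edge {1, 2}
complex_ = downward_closure([(0, 1, 2), (1, 2, 3)])
nodes = [s for s in complex_ if len(s) == 1]      # 0-simplices
edges = [s for s in complex_ if len(s) == 2]      # 1-simplices
triangles = [s for s in complex_ if len(s) == 3]  # 2-simplices
print(len(nodes), len(edges), len(triangles))     # 4 5 2
```

Note how the shared edge {1, 2} appears only once: the complex is a set of simplices, not a multiset of faces.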
Cell complexes (or CW complexes) represent objects of flexible shape which are built out of basic ball-shaped building blocks (cells) of arbitrary dimension. Cells of different dimensions are rigidly related. For example, an area is enclosed by lines, which in turn are enclosed by points. This rigid structure describes the underlying topology. These complexes can be used as an alternative to Graph Neural Networks when data modeling requires high-order relationships [ref 3].
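The rigid dimensional hierarchy described above can be sketched in plain Python (an illustrative data layout, not the TopoNetX CellComplex API): each cell records its boundary, and every boundary cell lives exactly one dimension lower.

```python
# A square: one 2-cell (face) bounded by four 1-cells (edges),
# each bounded by 0-cells (vertices). Keys are dimensions.
cell_complex = {
    0: {"v0": [], "v1": [], "v2": [], "v3": []},
    1: {"e0": ["v0", "v1"], "e1": ["v1", "v2"],
        "e2": ["v2", "v3"], "e3": ["v3", "v0"]},
    2: {"f0": ["e0", "e1", "e2", "e3"]},
}

def is_well_formed(cc):
    """Every cell's boundary must consist of cells one dimension lower."""
    return all(
        all(b in cc[dim - 1] for b in boundary)
        for dim in cc if dim > 0
        for boundary in cc[dim].values()
    )

print(is_well_formed(cell_complex))  # True
```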
A hypergraph is a generalization of a graph in which a hyperedge can connect any number of nodes or vertices. As with cell and simplicial complexes, a node and a hyperedge are said to be incident if the vertex is a member of the hyperedge. Hypergraphs are well suited to encoding relationships that are not strictly pairwise, such as a chemical reaction involving multiple molecules or a conference call with many participants [ref 4].
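The incidence relation just described is often materialized as a node-by-hyperedge matrix. A minimal stdlib sketch (illustrative only, not the TopoBench representation):

```python
def incidence_matrix(num_nodes, hyperedges):
    """Rows index nodes, columns index hyperedges;
    entry is 1 if the node is incident to the hyperedge."""
    return [[1 if v in he else 0 for he in hyperedges]
            for v in range(num_nodes)]

# A 'conference call' hyperedge {0, 1, 2, 3} and a pairwise edge {2, 4}
H = incidence_matrix(5, [{0, 1, 2, 3}, {2, 4}])
print(H[2])  # [1, 1] -> node 2 is incident to both hyperedges
```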
Combinatorial complexes are structures used to represent topological spaces—like surfaces or multi-dimensional shapes—by breaking them down into discrete pieces like points, line segments, triangles, and their higher-dimensional counterparts. They are more flexible than Simplicial and Cell complexes.
📌 This section emphasizes complexes, assuming they are less familiar to the general reader. Note, however, that TopoBench also natively supports traditional topological domains, including graphs and point clouds. Graphs (nodes and edges) and point clouds (nodes only) are, after all, the simplest forms of complexes.
Topological Lifting
Lifting is the mathematical and architectural process of transforming a graph (composed of nodes and edges) into a higher-order topological structure supporting cells, faces, volumes or hyperedges.
The most common scenario is the topological lifting of a graph to a simplicial complex, as described and illustrated, along with the resulting components, in a previous article [ref 5].
There are many lifting techniques; the most common include clique-based and neighborhood-based (k-hop) constructions.
While graph-to-topological-complex transformations are the standard entry point for TDL, the lifting process is not limited to graphs; one topological complex can be further lifted into another, more sophisticated higher-order structure.
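As a concrete example of a graph-to-hypergraph lifting, here is a stdlib sketch of the idea behind k-hop lifting: each node generates one hyperedge containing itself and every vertex within k hops. This is an illustration of the concept, not TopoBench's HypergraphKHopLifting implementation.

```python
from collections import deque

def k_hop_lifting(adjacency, k):
    """One hyperedge per node: the node plus all vertices within k hops."""
    def k_hop_ball(start):
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search up to depth k
            u = queue.popleft()
            if dist[u] == k:
                continue
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return frozenset(dist)
    return {v: k_hop_ball(v) for v in adjacency}

# Path graph 0-1-2-3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hyperedges = k_hop_lifting(adj, 1)
print(sorted(hyperedges[1]))  # [0, 1, 2]
```

Increasing k grows each hyperedge: with k=2, node 0's hyperedge already covers {0, 1, 2}.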
📌 This article focuses on graphs and topological complexes. However, a simpler topological structure, such as a point set or point cloud, can also be lifted into a graph or any of the complexes.
TopoBench
TopoBench provides a unified benchmarking infrastructure for Topological Deep Learning (TDL) by integrating and expanding upon current software tools. It combines NetworkX for graph processing with the TopoX suite: TopoNetX [ref 6] for building complex structures and TopoModelX for model implementation. Additionally, it supports PyTorch Geometric (PyG) models [ref 7] and original research code out of the box.
At its core, TopoBench is a unified and flexible workflow that supports a variety of datasets, data transformations, and preprocessing methods, along with deep learning models (e.g., Graph Neural Networks) and customizable metrics.
The key components of this pipeline are illustrated below.
A standout capability of TopoBench is its support for 'lifting,' which allows users to transform basic graph data into higher-order topological structures. This process enables the elevation of both raw features and the underlying connectivity, with the primary transformation routes detailed in the diagram below.
⚙️ Hands‑on with Python
Environment
# 1. Setup uv if not installed
wget -qO- https://astral.sh/uv/install.sh | sh
or
pip install uv
or
brew install uv # MacOS
# 2. Load the source code
git clone git@github.com:geometric-intelligence/TopoBench.git
or
git clone https://github.com/geometric-intelligence/TopoBench.git
cd TopoBench
# 3. Setup the virtual environment
uv venv --python 3.11
source .venv/bin/activate
# 4. Sync dependencies with suitable version (torch ...)
uv sync --all-extras
Wrapper
The configuration of TopoBench for evaluating graphs or topological structures such as Simplicial Complexes can be tedious. Let’s automate and componentize the TopoBench functionality by wrapping it into a class, TopoBenchWrapper.
The default constructor takes two arguments, a transform descriptor and an evaluator descriptor. Example of descriptors:
transform_desc = {
    "khop_lifting": {
        "transform_type": "lifting",
        "transform_name": "HypergraphKHopLifting",
        "k_value": k_value
    }
}
evaluator_desc = {
    "task": "classification",
    "num_classes": 2,
    "metrics": ["accuracy", "precision", "recall", "f1"]
}
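The evaluator descriptor above requests four standard classification metrics. For binary classification they reduce to simple ratios over the confusion-matrix counts; a stdlib sketch (illustrative only, not the TopoBench evaluator, which relies on torchmetrics-style components):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(scores["accuracy"])  # 0.6
```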
🔎 The class TopoBenchConfig described in the appendix automates the configuration of TopoBench from the list of descriptors.
The alternative constructor, build, loads a predefined, parameterized configuration of TopoBench defined in __get_config_descriptors. This implementation is an example of parameterizing the TopoBench configuration with k_value for k-hop lifting, the dataset name data_name, and the learning rate lr used by the optimizer.
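A hypothetical skeleton of such a wrapper is sketched below. The class and method names mirror the article (TopoBenchWrapper, build, __get_config_descriptors), but the body is an assumption for illustration, not the author's actual implementation; where data_name and lr are stored is likewise a guess.

```python
class TopoBenchWrapper:
    """Hypothetical skeleton; the real class wraps TopoBench configuration."""

    def __init__(self, transform_desc: dict, evaluator_desc: dict):
        self.transform_desc = transform_desc
        self.evaluator_desc = evaluator_desc

    @classmethod
    def build(cls, k_value: int, data_name: str, lr: float):
        transform_desc, evaluator_desc = cls.__get_config_descriptors(k_value)
        wrapper = cls(transform_desc, evaluator_desc)
        wrapper.data_name = data_name  # assumption: kept for later use
        wrapper.lr = lr                # assumption: passed to the optimizer
        return wrapper

    @staticmethod
    def __get_config_descriptors(k_value: int):
        transform_desc = {
            "khop_lifting": {
                "transform_type": "lifting",
                "transform_name": "HypergraphKHopLifting",
                "k_value": k_value,
            }
        }
        evaluator_desc = {
            "task": "classification",
            "num_classes": 2,
            "metrics": ["accuracy", "precision", "recall", "f1"],
        }
        return transform_desc, evaluator_desc

wrapper = TopoBenchWrapper.build(k_value=2, data_name="MUTAG", lr=1e-3)
```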
🔎 Training Implementation
With the necessary components assembled, we can now proceed to train the lightning_graph_model. This model is derived from the original PyTorch module via a conversion process detailed in the Appendix. The training configuration is defined by three primary hyperparameters:
Note: In PyTorch Lightning, metrics are efficiently collected through the trainer's callback_metrics attribute.
📈 Evaluation
Datasets
The PyTorch Geometric library contains a rich set of graph datasets covering node classification, edge prediction, and graph classification for graph structures with varying degrees of homophily [ref 8].
TUDataset is a collection of over 120 datasets of varying sizes, drawn from a wide range of applications related to graph classification and regression. My evaluation uses two of these datasets: MUTAG and PROTEINS [ref 9].
MUTAG is a collection of 188 nitroaromatic chemical compounds. The primary goal is a binary classification task: predicting whether a given molecule has a mutagenic effect (specifically on the Salmonella typhimurium bacterium).
The PROTEINS dataset deals with much larger macromolecular structures, making it a more rigorous test of a model's ability to handle scale and complexity. The task is to determine whether a protein is an enzyme or a non-enzyme.
📌 I purposely selected the MUTAG dataset used in one of the tutorials of TopoBench so the reader can validate the results.
🔎 Training & Testing
The first step is to select a simple multi-layer perceptron model with two linear layers and their associated activation functions.
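To show what such a model computes, here is a stdlib sketch of the forward pass of a two-layer perceptron with a ReLU activation. In practice this would be built from torch.nn modules; the weights below are made up for the example.

```python
def linear(x, W, b):
    """y = W x + b for a single input vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def mlp_forward(x, W1, b1, W2, b2):
    """Two linear layers with a ReLU activation in between."""
    return linear(relu(linear(x, W1, b1)), W2, b2)

# Tiny deterministic example: 2 inputs -> 2 hidden units -> 1 output
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 2.0]], [0.1]
y = mlp_forward([2.0, 1.0], W1, b1, W2, b2)
print(y)  # [4.1]
```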
Finally, let’s train and evaluate on these two datasets with slightly different training parameters.
"data_domain": "graph",
"data_type": "TUDataset",
"data_name": "MUTAG",
"data_dir": "./data/MUTAG/"
💎 Evaluation of TopoBench model on Proteins dataset is available at Benchmarking Topological Deep Learning - Training & Test
📘 References
💎 Key takeaways, exercises, and a paper review are available in the original Substack article Benchmarking Topological Deep Learning
⏭️ Share in the comments the next topic you’d like me to tackle.
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design, and end-to-end deployment and support, with extensive knowledge in machine learning. He has been director of data engineering at Aideo Technologies since 2017, and he is the author of "Scala for Machine Learning" (Packt Publishing, ISBN 978-1-78712-238-3) and of the Hands-on Geometric Deep Learning newsletter.
Original substack article: https://tinyurl.com/2nrrb2hc