Introducing sequifier: training foundation models for anything

When I started working on my startup almost exactly one year ago, I expected to go to market with a v0 after six months at most. Instead, after eight months, I was sidetracked by a side project, and that side project is what I want to introduce to you today: sequifier.

Sequifier is a library for creating foundation models for any domain other than language. And today, v1 has officially been released.

The gap in the market

We have all been talking pretty much non-stop about AI for the better part of three years, and the tooling for training causal transformer models and optimizing their inference has exploded in both quality and quantity.

Curiously neglected, however, is the tooling for applying causal transformer models to other kinds of sequential data: user sessions, financial data, IoT data, log events, biological data, and much more.

Sequifier closes this gap.

How does it work?

Sequifier is a CLI with three distinct stages: preprocessing, training, and inference.

The user transforms their data into the (very intuitive) format that sequifier preprocess expects; everything after that, from training a multivariate causal transformer model on that data to running inference with it, is controlled via three configuration files, one per stage.
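
To make the data preparation step concrete, here is a minimal sketch in Python, assuming a long-format table with one row per event. The column names (sequenceId, itemPosition, itemId), the output file name, and the CLI invocations in the trailing comments are illustrative assumptions, not the verified schema; the README documents the authoritative format and commands.

import pandas as pd

# Hypothetical raw event log: one row per user action.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "timestamp": [10, 20, 30, 5, 15],
    "item_id": ["a", "b", "a", "c", "b"],
})

# Reshape into a long, per-sequence format. The column names below are
# assumptions for illustration; check the README for the exact schema
# that sequifier preprocess expects.
long_format = (
    events.sort_values(["user_id", "timestamp"])
    .assign(itemPosition=lambda df: df.groupby("user_id").cumcount())
    .rename(columns={"user_id": "sequenceId", "item_id": "itemId"})
    [["sequenceId", "itemPosition", "itemId"]]
)
long_format.to_csv("input_data.csv", index=False)

# The three stages then run on this file plus one config file each
# (subcommand and flag names here are assumptions, not verified):
#   sequifier preprocess --config-path configs/preprocess.yaml
#   sequifier train --config-path configs/train.yaml
#   sequifier infer --config-path configs/infer.yaml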

Why use it?

If you have sequence data that you have always wanted a generative model for, whether to extrapolate it into the future, predict future states of the system, or create embeddings, sequifier will cut the time to a v0 from weeks or months down to days or hours.

The standardized tooling included in sequifier enables easy monitoring of training progress, validated configuration of the specific architecture you want to use, reproducible training and inference runs, checkpointing with training resumption, and hyperparameter search. You can have confidence in the implementation and the results it produces, because each step has been thoroughly tested. And if you have a lot of data and want to train on multiple GPUs (multi-node training is on the roadmap), that is a simple change in the training config.

Experiences so far

Over the last six months, I have been collaborating with a neuroscience startup on training generative models on neural and behavioral data, with the aim of creating synthetic traces of brain or behavioral activity. While examining the generative models we have trained, we found that even small models learn the dynamics of the underlying system quite faithfully and reproduce many of the characteristics of the real distributions.

An earlier project with a friend aimed to create a generative model for sperm whale language, tokenizing clicks with a dictionary of so-called "codas": whale GPT. Sperm whales often communicate in tandem, with overlapping vocalizations, so we found that a representation of the communicative dyad captured their communication more faithfully. With sequifier, such more complex sequence representations can easily be incorporated into the model.

I am currently evaluating the promise of causal transformer models on the ESA predictive maintenance challenge on Kaggle. The aim is to discover which patterns in the multivariate sensor data indicate an anomaly that might require some kind of human intervention. With sequifier, transforming the data into the right format, training the first models, and running inference took only a few days.

Applications

There are many potentially fruitful applications of causal transformers, but the expense of creating a viable prototype has prevented an extensive evaluation of how useful they are. Until now.

Here is a short list of applications:

  • Novel recommender systems: Conventional recommender systems typically take into account the overall consumption history of a specific customer, but do not account for the order of purchases and completely ignore current browsing behavior. A transformer model of sequential purchasing decisions can model the trajectory of a customer through a product catalogue and sample the next purchase from the resulting probability distribution over products (see the sampling sketch below the list). Session-based recommenders, conversely, ignore the purchase history of customers completely and try to lead them to the product that best fits their purchase intent, based on their browsing behavior on the website itself. Both approaches are an excellent fit for causal transformer models, and for sequifier.
  • Imitation learning in games: A transformer model trained on the actions of a player, conditioned on events in the game, can be a powerful approach to creating AI opponents. The stochasticity enabled by sequifier allows for non-deterministic behavior, and models trained on player actions at different skill levels can be used to modulate the difficulty of the AI player.
  • Financial modelling: Most financial data takes the form of multivariate time series and is an easy fit for transformer modelling. While it is probably pointless to model security prices directly, many higher-level attributes of financial markets, such as factor correlations or implied volatility, could show interesting temporal patterns amenable to transformer modelling.
  • Biological and medical data: Many biological markers can be measured continuously through smart devices, and specific but complex patterns in these measurements can indicate various disorders or risks. Several modelling approaches are possible: a model can be trained with the measured variables as targets and then used to generate embeddings, which in turn serve as inputs to predictive models of the target variables of interest. Alternatively, the model can generatively extrapolate the multivariate time series to estimate the probability of breaching specific thresholds. A third option is to train a model on the ultimate target variables directly, if these also extend through time.
  • Predictive maintenance: Analogous to the ESA challenge described above, many industrial systems continuously measure various aspects of their operation to monitor their internal state and warn operators when thresholds are breached. As with biological systems, several modelling approaches are feasible: generating embeddings to augment other inputs to a supervised model, extrapolating the measured variables into the future, and training on the ultimate target variables directly. More sophisticated approaches, such as the "Telemanom" architecture, can also incorporate generative transformers.

...and many more
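
To illustrate the sampling step mentioned in the recommender and imitation-learning items above, here is a minimal sketch of temperature-based sampling from next-item logits. It assumes that inference yields a plain logit vector over the catalogue; sequifier's actual inference output format may differ, so treat this purely as an illustration of the idea.

import numpy as np

def sample_next_item(logits: np.ndarray, temperature: float = 1.0) -> int:
    # Softmax with temperature; subtracting the max keeps exp() stable.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Draw one item id according to the resulting distribution.
    return int(np.random.default_rng().choice(len(probs), p=probs))

# Hypothetical logits over a five-product catalogue.
logits = np.array([2.0, 0.5, 1.0, -1.0, 0.0])
print(sample_next_item(logits, temperature=0.8))

Lower temperatures concentrate probability mass on the most likely items, which is one way to trade relevance against variety in a recommender, or to tune the unpredictability of an AI opponent.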

I honestly believe that there are thousands of interesting problems to model with this approach, and there will be many that are outside my imagination. I'd love to see what you want to use it for!

How to move forward

If this sounds interesting to you, start by installing sequifier:

pip install sequifier

and then follow the instructions on the README.

Tip: If you are uncertain how to configure your steps, you can always pass the explanations of the relevant configs in documentation/configs, along with a description of your data, to an LLM of your choice, and it will help you configure things correctly.

If you have gone through this stage and want my input on what you are doing, please reach out! I am happy to help.

