AI can now create virtual tumor models in minutes

For decades, turning published papers into realistic computational models has been nearly impossible, particularly for complex biological processes such as tumor evolution. In my previous post exploring what AGI for biomedical research might look like, I highlighted the critical role of modeling complex systems (https://lnkd.in/gkSJKAcE). I had wondered whether recent AI advances could dramatically change our ability to build these sophisticated models. Realistic computational models are essential because they enable rapid hypothesis testing and deeper exploration of complex biological mechanisms.

In two recent studies from our group and colleagues (paper 1: https://lnkd.in/gUj4_d57, paper 2: https://lnkd.in/ekh_HkH4), we characterized early lung cancer evolution at single-cell resolution, uncovering immune cell dynamics and tumor microenvironment interactions. Curious whether these findings could rapidly become detailed virtual models, I gave Sonnet 3.7 a simple request: "Based on the content of the paper, create a comprehensive hybrid, multi-scale agent-based model in Python (using Mesa or similar) to recapitulate our results."

Remarkably, Sonnet 3.7 immediately generated ~600 lines of robust Python code, requiring only modest refinement with Sonnet-assisted Cursor AI. The resulting hybrid agent-based model, built using Mesa (a Python framework for modeling complex adaptive systems), includes tumor cells, immune cells (cytotoxic T cells, regulatory T cells, polarized macrophages), endothelial cells, and environmental signaling molecules (VEGFA, TREM2, CXCL13). Agents follow biologically informed rules derived directly from experimental observations.
Remarkably, and despite many parameter assumptions, the virtual tumor faithfully reproduced key experimental observations:

🔸 Stepwise progression from preinvasive to invasive adenocarcinoma
🔸 Immune shifts: fewer cytotoxic cells, more suppressive populations
🔸 Realistic spatial signaling patterns (angiogenesis, immune polarization)

As the statistician George Box famously said, "All models are wrong, but some are useful." While no model is perfect, this AI-enabled approach rapidly bridges scientific papers to highly useful virtual experiments. The ability to create virtual tumor models in minutes could profoundly accelerate discovery, enabling entirely new ways of exploring and answering some of cancer's most complex and pressing questions.
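The paper's actual ~600-line Mesa model isn't reproduced here, but the core agent-based idea can be sketched in plain Python without any framework. Everything below is an invented placeholder for illustration: the grid size, the division and kill probabilities, and the stationary T-cell positions are not parameters from the studies.

```python
# Toy agent-based tumor sketch using only the standard library (not Mesa);
# all rules and rates are illustrative placeholders, not the paper's values.
import random

GRID = 25          # side length of a square lattice
P_DIVIDE = 0.3     # tumor-cell division probability per step (assumed)
P_KILL = 0.8       # chance a cytotoxic T cell kills an adjacent tumor cell

def neighbors(x, y):
    """4-connected lattice neighbors, clipped to the grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID and 0 <= ny < GRID:
            yield nx, ny

def step(tumor, tcells):
    """One synchronous update: T cells kill, then tumor cells divide."""
    for tx, ty in list(tcells):
        for n in neighbors(tx, ty):
            if n in tumor and random.random() < P_KILL:
                tumor.discard(n)
    for cx, cy in list(tumor):
        free = [n for n in neighbors(cx, cy) if n not in tumor]
        if free and random.random() < P_DIVIDE:
            tumor.add(random.choice(free))

random.seed(0)
tumor = {(12, 12)}                 # single initiating cell at the center
tcells = {(0, 0), (24, 24)}        # two stationary cytotoxic T cells
for _ in range(40):
    step(tumor, tcells)
print(len(tumor))                  # lesion size after 40 steps
```

Adding diffusing chemokine fields (VEGFA, CXCL13), motile immune agents, and data collection is exactly where a framework like Mesa starts to pay off over hand-rolled loops.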
Computational Bioengineering Models
Summary
Computational bioengineering models use advanced computer simulations to replicate and study complex biological systems, helping researchers understand how the body works and how diseases develop. These models blend biology, engineering, and computer science—enabling scientists to run virtual experiments that can reveal insights much faster than traditional lab methods.
- Apply AI-driven modeling: Use artificial intelligence tools to quickly create and test virtual representations of diseases or biological processes, uncovering patterns and testing new ideas without waiting for lengthy experiments.
- Explore dynamic simulations: Try models that reveal how proteins and cells change over time, offering a more realistic view for drug discovery, therapy development, or understanding disease progression.
- Prioritize model validation: Always compare model predictions with real-world data to ensure your simulations accurately reflect biological reality, especially given the variability in living systems.
-
Computational modeling of gene regulatory networks has become increasingly important for understanding the biological complexity underlying disease progression in diverse cell types and for identifying potential therapeutic targets in drug discovery. To address this inherent complexity, a team led by Christina Theodoris and Patrick Ellinor developed Geneformer, a context-aware deep learning model pretrained on large-scale transcriptomic data to enable predictions in network biology with limited data. The link to the full paper and a brief summary are below.

Transfer learning enables predictions in network biology: https://lnkd.in/dXiKxTga
The model and training data are available on Hugging Face: https://lnkd.in/d3Esd6eK, https://lnkd.in/dGxB2QaZ

Methods overview: The authors assembled a large-scale pretraining corpus called Genecorpus-30M, comprising 29.9 million human single-cell transcriptomes from various tissues. They developed a rank value encoding method to represent the transcriptome of each single cell, ranking genes by their expression within that cell normalized by their expression across the entire corpus. Geneformer's architecture consists of six transformer encoder units, each composed of a self-attention layer and a feed-forward neural network layer. Pretraining used a masked learning objective, in which 15% of genes within each transcriptome were masked and the model was trained to predict them. The pretraining process was optimized with dynamic length-grouped padding and distributed GPU training to handle the large-scale dataset efficiently.

Results overview: The authors showed that Geneformer boosted cell-type predictions compared to alternative methods, especially in complex multiclass prediction applications.
They fine-tuned Geneformer to predict gene dosage sensitivity, achieving high accuracy with limited data and generalizing well to newly reported disease genes, and applied it to predict chromatin dynamics, including bivalent domains and transcription factor regulatory range, outperforming alternative methods. They also used Geneformer to predict network hierarchy and distinguish central from peripheral factors within gene networks, and developed an in silico deletion approach to model gene network connections and identify dosage-sensitive genes. Finally, they applied Geneformer to disease modeling of cardiomyopathy, identifying candidate therapeutic targets that were experimentally validated in an iPSC-based model of the disease.
-
BioEmu-1, developed by Microsoft Research, is a deep learning model that predicts dynamic structural ensembles of proteins, addressing the limitations of static models like AlphaFold and of computationally intensive molecular dynamics (MD) simulations. Unlike traditional MD simulation, which struggles with scalability, BioEmu-1 combines data from AlphaFold, MD trajectories, and experimental stability metrics to generate thousands of conformations rapidly (10,000–100,000x faster) on a single GPU.

It employs a diffusion-based generative approach to explore free-energy landscapes, revealing intermediate states and transient binding pockets critical for drug design. Validated against MD benchmarks, it accurately predicts folding free energies (R²=0.85) and allosteric pathways, aiding applications like kinase inhibitor development. Current limitations include handling novel folds and large multi-domain proteins, but future updates aim to integrate cryo-EM/NMR data and expand to RNA dynamics.

Open-sourced to the community, a generous contribution to biology, BioEmu-1 accelerates research in drug discovery and protein engineering by bridging static structure analysis with dynamic functional insights.

#ProteinDynamics #StructuralEnsembles #DeepLearning #AIInBiology #DrugDiscovery #Bioinformatics #MicrosoftResearch #OpenScience #MolecularDynamicsComparison #Allostery #ConformationalChanges #GenerativeAI #FreeEnergyLandscapes #CryoEM #KinaseInhibitors #ComputationalBiology #TherapeuticDesign
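To make the "folding free energies" claim concrete: once a model can emit a large conformational ensemble, the folded/unfolded population ratio gives a free energy via the textbook relation dG = -RT ln(p_folded / p_unfolded). The sample counts below are made up for illustration; they are not BioEmu-1 outputs.

```python
# Back-of-envelope link between an ensemble and thermodynamics; the 90/10
# split below is an invented example, not a BioEmu-1 result.
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 300.0      # temperature, K

def folding_dG(n_folded, n_unfolded):
    """Folding free energy from ensemble populations."""
    return -R * T * math.log(n_folded / n_unfolded)

dG = folding_dG(9000, 1000)   # 90% folded in a 10,000-sample ensemble
print(round(dG, 2))           # negative => folding is favorable
```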
-
Biomechanical Simulation: A Pure CAE Perspective

How close can our simulations get to reality when the system itself is biologically complex? Biomechanical simulation is not merely about meshing geometry or running a solver. It is about capturing highly nonlinear, anisotropic, and time-dependent behavior within a numerically stable and physically consistent framework.

From a computational standpoint, these models typically integrate:
✔ Finite Element Analysis (FEA) for soft tissues and structural response
✔ Computational Fluid Dynamics (CFD) for airflow and hemodynamics
✔ Fluid–Structure Interaction (FSI) for coupled fluid–tissue mechanics
✔ Multibody dynamics for kinematic systems
✔ Advanced constitutive laws (hyperelasticity, viscoelasticity, anisotropy)

The real challenge is not model construction, it's model credibility:
- Strong variability in biological material properties
- Uncertain and idealized boundary conditions
- Large deformation, contact, and nonlinear convergence issues
- Limited experimental datasets for validation
- High numerical sensitivity to parameters and mesh density

Unlike conventional mechanical components, biological systems do not come with standardized material datasheets. Correlation, parameter identification, and stability control become critical steps in the simulation workflow.

The GIF from Oklahoma State University beautifully illustrates how numerical modeling can reveal transient airflow patterns in an elastic lung model, phenomena that are impossible to observe directly in vivo: https://lnkd.in/d_b3DTvT

Biomechanical simulation is where advanced computational mechanics meets the complexity of life itself.

#BiomechanicalSimulation #CAE #FiniteElementAnalysis #CFD #FSI #NonlinearAnalysis #ComputationalMechanics #EngineeringSimulation
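As a tiny taste of the constitutive-law side mentioned above, here is the Cauchy stress of an incompressible neo-Hookean solid under uniaxial stretch, sigma = mu * (lambda^2 - 1/lambda). The shear modulus is a placeholder value on a soft-tissue-like scale, not a measured tissue property.

```python
# Neo-Hookean uniaxial response: the simplest hyperelastic law used for
# soft tissue. mu is an assumed placeholder, not calibrated material data.
def neo_hookean_uniaxial(lam, mu=10.0):
    """Cauchy stress (kPa) at stretch ratio lam, incompressible material."""
    return mu * (lam**2 - 1.0 / lam)

for lam in (1.0, 1.1, 1.3):
    print(lam, round(neo_hookean_uniaxial(lam), 3))
# Stress is zero at lam = 1 and grows faster than linearly with stretch,
# the hallmark nonlinearity that makes convergence hard in tissue models.
```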
-
I still remember 2015 in the lab. Three months. One recombinant protein. Dozens of different host systems, vectors, strains, temperatures - everything you could possibly tweak. Most attempts? Only inclusion bodies. Or worse: NO EXPRESSION. If we hit a 10% success rate, we celebrated like it was a publication.

Fast-forward to 2025: the biggest shift in protein production since recombinant DNA technology isn't new expression systems or bioreactors.
👉 It's AI finally understanding what we couldn't.

When AlphaFold2 (2020) arrived, it didn't just predict structures, it changed how we think about folding, stability, and function. And what came next has transformed expression strategy and design more than any textbook update ever did.

Here are a few AI tools that changed the game for me:
🧬 SignalP 6.0: Helps you decide between periplasmic, secretory, or eukaryotic targeting.
🧫 DeepTMHMM: Predicts α-helical and β-barrel transmembrane topologies.
🧩 ProteinMPNN: Designs sequences from backbone structures in seconds.
💫 RFdiffusion: Generates new protein backbones nature never imagined.
⚡ ESMFold: 60x faster than AlphaFold2 and ideal for high-throughput screening.
🧠 AlphaFold 3: Predicts protein-ligand and complex assemblies.
🔡 CodonTransformer: AI-driven codon optimization considering tRNA abundance, mRNA folding, and ribosome kinetics.

AI is no longer just a "support tool." It's rewriting the way we approach protein design and expression.

💬 Which AI tools have changed your workflow? 👇 Drop your favorites below. I'd love to compare notes.

If you like insights that blend bench reality with AI-powered innovation, follow me (Reinhold Horlacher) for more biotech deep dives.

#ProteinEngineering #AlphaFold #AIinBiology #ComputationalBiology #ProteinProduction #ExpressionScreening #SyntheticBiology #trenzyme
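For context on what tools like CodonTransformer improve on, here is the simplest possible baseline: pick each amino acid's single most frequent codon from a usage table. The table covers only a few residues and the frequencies are invented for the example, not real E. coli statistics.

```python
# Naive "most frequent codon" baseline; real tools additionally model tRNA
# abundance, mRNA folding, and ribosome kinetics. Frequencies are invented.
USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "TTA": 0.13, "CTT": 0.10},
    "*": {"TAA": 0.61, "TGA": 0.30},
}

def naive_optimize(protein):
    """One codon per residue, chosen by raw frequency."""
    return "".join(max(USAGE[aa], key=USAGE[aa].get) for aa in protein)

print(naive_optimize("MKL*"))  # ATGAAACTGTAA
```

The weakness is obvious: repeating one codon per residue ignores local mRNA structure and tRNA depletion, which is precisely the context a learned model brings in.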
-
Excited to share our new paper in Nature Communications Biology: Distance-AF, a method that makes AlphaFold2 predictions more accurate by adding user-specified distance constraints. Yuanyuan Zhang, Zicong Zhang, Yuki Kagaya, Genki Terashi, Bowen Zhao, Yi Xiong & Daisuke Kihara.

✨ Why it matters
AlphaFold2 transformed structural biology, but it still struggles with multi-domain proteins, flexible conformations, and ambiguous orientations. Distance-AF lets researchers guide AF2 with experimental or hypothetical distance information, unlocking models that were previously out of reach.

🛠️ What's novel
Unlike earlier methods, Distance-AF does not rely on retraining or heavy pretraining. Instead, it modifies the loss function during prediction, combining a distance-constraint loss with the existing AF2 terms in a dynamic reweighting scheme. Users can input experimental or hypothetical distance constraints (e.g. from cross-linking, NMR, or EM) to steer structural models, and the method is robust even when constraints are imperfect (perturbed) or sparse.

💡 Implications & opportunities
Distance-AF opens new avenues in structural biology and computational modeling:
- Hybrid workflows combining experimental data and AI-based structure prediction
- Structure refinement in ambiguous or flexible systems
- Drug discovery: more accurate models of binding pockets and conformational states
- Protein engineering: exploring alternative conformations with controlled constraints

Big picture: Distance-AF opens a path to hybrid AI + experimental workflows for protein structure prediction, refinement, and drug discovery.

👉 Check out the full article here: https://lnkd.in/gtYwzPBF
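The general shape of a distance-constraint loss is easy to illustrate: a squared violation term on user-specified residue-pair distances, to be added to the model's own loss with a weight. The function name, shapes, and numbers below are illustrative only; the actual Distance-AF loss and its dynamic reweighting live in the paper and its code.

```python
# Sketch of a distance-constraint penalty of the kind described above.
# Everything here is a simplified stand-in, not the Distance-AF implementation.
import math

def distance_loss(coords, constraints):
    """coords: residue -> (x, y, z); constraints: list of (i, j, target)."""
    total = 0.0
    for i, j, target in constraints:
        d = math.dist(coords[i], coords[j])   # predicted pair distance
        total += (d - target) ** 2            # squared violation
    return total / len(constraints)

coords = {1: (0.0, 0.0, 0.0), 50: (12.0, 0.0, 0.0)}
constraints = [(1, 50, 8.0)]                  # e.g. a cross-linking distance
penalty = distance_loss(coords, constraints)
print(penalty)  # (12 - 8)^2 = 16.0
```

During prediction, minimizing this term alongside the network's native losses pulls the two residues toward the specified 8 Å separation.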
-
🧬 When AI meets biology: designing life from the inside out

It's incredible to see how far we've come in reimagining what's possible in synthetic biology. A new paper in Cell Systems by Gherman et al. (University of Bristol, 2025) shows how combining a whole-cell model (WCM) of E. coli with a machine learning surrogate can drastically speed up in silico genome design: the team achieved a 95% reduction in computational time while accurately predicting which gene deletions a cell can survive. The result is a reduced E. coli genome, EMine-737, that retains only ~60% of modeled genes yet still divides successfully.

Minimal E. coli cells like EMine-737 are the biological equivalent of a stripped-down operating system: lean, efficient, and ready to be used as a
- Simplified chassis for synthetic biology
- Platform for biomanufacturing and metabolic engineering
- Model for fundamental biology
- Training ground for AI-driven biological design
- Foundation for next-generation engineered microbes

This study highlights how AI could reshape the future of biology, accelerating discovery by enhancing, not replacing, experimental work.

Publication: Gherman et al., "Accelerated design of Escherichia coli reduced genomes using a whole-cell model and machine learning." Cell Systems (2025) | DOI: 10.1016/j.cels.2025.101392
https://lnkd.in/e3K-Rd2u

#SyntheticBiology #AIinBiology #GenomeEngineering #SystemsBiology #Ecoli #WholeCellModel #CellSystems
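The surrogate-guided search can be caricatured in a few lines: greedily delete genes as long as a viability predictor stays above a threshold. The real work uses a whole-cell model and a trained ML surrogate; here the "surrogate" is a hand-written function over an invented essential-gene set, purely to show the loop's shape.

```python
# Caricature of surrogate-guided genome reduction. The essential set and
# the viability function are stand-ins, not the paper's model.
ESSENTIAL = {"dnaA", "ftsZ", "rpoB"}          # invented minimal core

def surrogate_viability(genome):
    """Stand-in for the ML surrogate: 1.0 if all essentials are present."""
    return 1.0 if ESSENTIAL <= genome else 0.0

def reduce_genome(genome, threshold=0.5):
    genome = set(genome)
    for gene in sorted(genome):               # deterministic deletion order
        trial = genome - {gene}
        if surrogate_viability(trial) >= threshold:
            genome = trial                    # deletion predicted survivable
    return genome

full = {"dnaA", "ftsZ", "rpoB", "lacZ", "araC", "malE"}
print(sorted(reduce_genome(full)))  # only the essential core remains
```

The 95% speedup comes from the fact that each surrogate call is cheap, so thousands of candidate deletions can be screened before the expensive whole-cell model is consulted.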
-
Introducing **Evo 2**, a new foundation model for biology.

- Evo 2 is the largest-scale, fully open-source AI model ever released: 40 billion parameters, over 9 trillion tokens, and a 1-million-token context length. All the details are public: weights, data, training infrastructure, and inference infrastructure.
- Evo 2 is built on a new model architecture: convolutional multi-hybrids (StripedHyena 2). StripedHyena 2 excels at modeling byte-tokenized data, providing faster training and lower perplexity compared to both Transformers and previous-generation hybrids based on state-space models.

I am grateful for the team behind Evo 2, working with you was one of the proudest moments of my career (the core pretraining team was fewer than five people; you can just do things). Today, we release two papers (yes, plural), as well as weights, data, training, and inference codebases. Enjoy!

There is so much more to this project: Evo 2 demonstrates, for the first time, test-time scaling laws in biology; Evo 2 achieves state-of-the-art performance on classifying BRCA1 variants of unknown significance in breast cancer; Evo 2 generations can be steered with features identified via sparse autoencoders…

Learned a lot working with the team: Garyk B., Eric Nguyen, Chris Re, Stefano Ermon, Brian Hie, Patrick Hsu, Hani Goodarzi, Dave Burke, Patrick Collison, Brandon Yang, Amy Lu, Anthony Costa, Matthew Durrant, Stefano Massaroli, David W. Romero, Greg Brockman and many more. Thank you to Arc Institute, Stanford University, Liquid AI, and to Ming-Yu Liu, Kimberly Powell, Jensen Huang, and NVIDIA for believing in the team and sponsoring with compute.

- Evo 2 code: https://lnkd.in/gHinyWvF
- OpenGenome 2 (dataset): https://lnkd.in/gxk7cXgM
- Savanna (training): https://lnkd.in/gHKTQJeR
- Vortex (inference): https://lnkd.in/gpXV9aNV
- Evo 2 paper: https://lnkd.in/gfxHjZh9
- StripedHyena 2 paper: https://lnkd.in/gt3nqxNz
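Likelihood-based variant scoring, the idea behind tasks like BRCA1 variant classification, can be shown with a deliberately trivial stand-in: compare sequence log-probability under a background nucleotide model for reference versus variant. Evo 2 does this with a deep genomic language model; the frequencies below are assumed placeholders.

```python
# Toy variant-effect score: log-likelihood difference between variant and
# reference sequence under an assumed background model (not Evo 2).
import math

FREQ = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # placeholder background

def log_likelihood(seq):
    return sum(math.log(FREQ[b]) for b in seq)

def variant_score(ref, pos, alt):
    """Negative score: the variant sequence is less likely than reference."""
    var = ref[:pos] + alt + ref[pos + 1:]
    return log_likelihood(var) - log_likelihood(ref)

score = variant_score("ACGTACGT", 3, "C")  # T -> C at position 3
print(round(score, 3))
```

A real genomic language model replaces the independent-base background with context-dependent probabilities over a 1-million-token window, which is what makes the scores biologically informative.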
-
🧬 What do 𝗲𝗺𝘂𝘀 (yes, the Australian birds!) have to do with protein research? More than you'd expect!

𝗣𝗿𝗼𝘁𝗲𝗶𝗻𝘀 are the engines of life. And they don't just fold: they shift, open, close, bind, unbind! These conformational states define function, but they're hard to capture:
— Experiments? Precise, but slow and expensive
— MD simulations? Accurate, but compute-heavy
— ML? Promising, but emulators haven't scaled well. Until now!

This paper introduces 𝗕𝗶𝗼𝗘𝗺𝘂, a generative model that can sample conformational landscapes of proteins at 𝗠𝗗-𝗹𝗶𝗸𝗲 𝗽𝗿𝗲𝗰𝗶𝘀𝗶𝗼𝗻, 10,000 times faster! 🌀

BioEmu skips the physics and directly learns distributions of protein structures. Trained on a blend of:
— AlphaFold structures
— 200 ms of MD data
— 500k experimental stability measurements
It generates thousands of diverse, functional conformations in minutes to hours, on a single GPU.

What can it do?
— Recover domain motions (84%)
— Predict local unfolding & cryptic pocket formation
— Emulate free-energy landscapes of fast folders
— Handle disordered regions
— Predict effects of mutations on protein stability
All while preserving structural realism!

It doesn't replace MD, it complements it: you use BioEmu to start exploring the landscape, and MD takes it from there.

Are there limitations? Sure:
— No physics awareness, so no generalization
— No confidence metric
— Only trained at 300 K
But that just leaves room for future versions! Lots of promise in merging simulations with machine learning!

Read the full write-up here: https://lnkd.in/empShP4W

#biotech #AI #machinelearning #computationalbiology #drugdiscovery
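"Emulating free-energy landscapes" has a concrete meaning: given many sampled conformations projected onto a reaction coordinate, the landscape is F(x) = -kT ln p(x) up to an additive constant. The samples below are synthetic, not model output.

```python
# Free-energy profile from a conformational ensemble via F = -kT ln p(x).
# The 80/20 two-state ensemble is synthetic data for illustration.
import math
from collections import Counter

kT = 0.596  # kcal/mol at ~300 K

def free_energy_profile(samples, bin_width=1.0):
    """Histogram samples along a coordinate, convert populations to energies."""
    bins = Counter(int(x // bin_width) for x in samples)
    total = sum(bins.values())
    return {b: -kT * math.log(n / total) for b, n in bins.items()}

# 80/20 split between two states: the minor state sits ~0.8 kcal/mol higher.
samples = [0.5] * 80 + [3.5] * 20
profile = free_energy_profile(samples)
print({b: round(f, 2) for b, f in sorted(profile.items())})
```

This is why sampling speed matters: the quality of the landscape is limited by how many independent conformations you can afford to generate.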
-
🧬 When AI Starts Speaking DNA: The Rise of Genome Language Models

What if ChatGPT could read your DNA the way it reads your text? That's not science fiction anymore — it's the new frontier of Genome Language Models (gLMs).

💡 Just as Large Language Models learn the grammar of words, gLMs learn the grammar of life — predicting, generating, and even designing genetic code. Here's how it works: instead of words, these models learn from billions of A, C, G, and T sequences. They detect patterns in how genes are structured, regulated, and evolved. And with enough data, they can predict mutations, design synthetic DNA, or rewrite biological instructions faster than traditional bioengineering ever could.

🚨 But here's where it gets truly provocative: recently, scientists used AI models to generate entirely new viruses — not human pathogens, but bacteriophages, viruses that attack bacteria. Sixteen of these AI-designed phages were synthesized in the lab. They worked. They killed antibiotic-resistant E. coli. In other words: AI didn't just understand biology — it created life's building blocks. That's both thrilling and chilling.

🤔 What happens when machines start designing organisms beyond our imagination? Who governs a world where code and DNA blur — where innovation and bio-risk walk hand in hand?

The future of biotech won't just belong to biologists. It'll belong to those who can speak both languages — data and DNA.

#AI #Genomics #BioTech #Innovation #Ethics #ArtificialIntelligence #LifeSciences
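The "grammar of life" idea has a stripped-down ancestor: a first-order Markov model over nucleotides, which learns dinucleotide transition statistics from sequence and samples new sequence from them. Today's gLMs replace this with billion-parameter transformers and million-token contexts, but the learn-then-generate loop is the same.

```python
# First-order Markov model over DNA: the simplest possible "language model"
# of sequence. The training string is a toy example.
import random

def train(seq):
    """Collect observed successors for each nucleotide."""
    counts = {}
    for a, b in zip(seq, seq[1:]):
        counts.setdefault(a, []).append(b)
    return counts

def generate(model, start, n, seed=0):
    """Sample an n-base sequence by walking the transition table."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        out.append(rng.choice(model[out[-1]]))
    return "".join(out)

model = train("ATGCGCGATATGCGCATATGC")
dna = generate(model, "A", 12)
print(dna)  # a 12-base sequence with the training string's dinucleotide bias
```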