Pharmaceutical Superintelligence (PSI) represents a paradigm shift in drug discovery where generative AI, multimodal foundation models, and automated labs merge into an end‑to‑end “prompt‑to‑drug” system. A scientist can issue a natural‑language prompt (e.g., “Design a drug for idiopathic pulmonary fibrosis”), and a reasoning AI autonomously coordinates specialized agents to identify targets, design molecules, execute robotic synthesis, and outline clinical strategies.

1. Protenix‑v1 and the Open‑Science Revolution
Predicting accurate 3D protein–ligand structures forms the foundation of rational drug design. Protenix‑v1 is the first open‑source biomolecular prediction model to equal or exceed proprietary systems like AlphaFold 3 under identical data and compute budgets. It introduces inference‑time scaling—boosting accuracy by increasing sampling depth (up to 100 random seeds per complex)—providing a controllable accuracy–cost tradeoff. Its transparent architecture, alongside Boltz‑2 and Chai‑1, allows researchers to build fully inspectable biological pipelines. Together, these tools turn biology into a software‑defined stack—integratable, reproducible, and free from vendor lock‑in.

2. Open‑Source Engines for Autonomous Agents
For PSI to function, reasoning agents must invoke specialized biophysical models to test hypotheses. Open‑source engines supply this validation layer, enabling AI systems to move from summarizing biology to inventing it. Multi‑agent environments (e.g., “Virtual Biotech”) can explore millions of datasets and interactions autonomously, collapsing the time and cost of discovering viable targets and candidate compounds.

3. TITO Strategy: Target Identification & Prompt‑to‑Drug Orchestration
Drawing from Target Identification Pro (TID‑Pro) and the Text‑In/Text‑Out pipeline, PSI integrates target selection and molecule generation into a single reasoning hierarchy. TID‑Pro uses a positive–unlabeled framework combining 22 omics and text features to rank targets by clinical viability, outperforming general LLMs. The Prompt‑to‑Drug orchestrator decomposes user requests into structured tasks: target analysis (TID‑Pro), structure prediction (Protenix‑v1), molecule generation, and robotic synthesis/testing. This multi‑layered feedback loop—where each decision is cross‑checked by physics‑grounded predictors—ensures automation without hallucination. PSI thus transforms drug discovery into a continuous, autonomous feedback cycle capable of producing safe, synthesizable medicines directly from text prompts, marking a decisive step toward self‑driving pharmaceutical R&D.

https://lnkd.in/dK7FU74U https://lnkd.in/exJSkinE Listen to the podcast: https://lnkd.in/dcKwQGhH
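As a purely illustrative sketch of the orchestration pattern described above, the snippet below wires toy stand-in agents for target analysis, structure prediction, and molecule generation into one sequential plan. Every agent name and return value here is a hypothetical placeholder, not part of the actual PSI system.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Orchestrator:
    """Routes each task in a plan to a registered specialist agent."""
    agents: dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, task: str, agent: Callable[[dict], dict]) -> None:
        self.agents[task] = agent

    def run(self, prompt: str, plan: list[str]) -> dict:
        # Execute the plan in order, threading a shared context so each
        # agent can cross-check what upstream agents produced.
        context = {"prompt": prompt}
        for task in plan:
            context.update(self.agents[task](context))
        return context

# Toy stand-ins for the real components (TID-Pro, Protenix-v1, a generator).
orc = Orchestrator()
orc.register("target_id", lambda ctx: {"target": "hypothetical-kinase"})
orc.register("structure", lambda ctx: {"structure_ok": ctx["target"] is not None})
orc.register("generate", lambda ctx: {"candidates": 3 if ctx["structure_ok"] else 0})

result = orc.run("Design a drug for idiopathic pulmonary fibrosis",
                 ["target_id", "structure", "generate"])
```

The shared-context design is what lets downstream agents validate upstream decisions, the "cross-checked feedback loop" the post describes.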
Machine Learning in Drug Discovery
Explore top LinkedIn content from expert professionals.
Summary
Machine learning in drug discovery refers to using advanced computer algorithms to analyze huge biological datasets and help scientists find, design, and test new medicines more quickly and accurately. By learning from patterns in genetic, chemical, and medical information, these systems can predict promising drug targets, generate novel molecules, and speed up the entire drug development process.
- Adopt AI-powered workflows: Consider integrating automated systems that can suggest drug candidates or analyze experimental results to reduce manual labor and accelerate research timelines.
- Combine human expertise: Pair machine-generated predictions with expert review to ensure promising compounds are selected, filtered, and validated before moving forward in development.
- Use diverse data sources: Tap into genetic, chemical, and clinical data to train more accurate models that can uncover subtle relationships and identify new treatment possibilities.
We are proud to present our latest paper on physics-informed AI for drug design, appearing in the PNAS special issue on machine learning in chemistry. Standard data-driven AI does not work well on examples that are significantly different from its training data, which can result in unphysical predictions that are clearly wrong. To limit this kind of failure in drug design, we introduced a new machine learning model called NucleusDiff, which incorporates a simple physical idea into its training, greatly improving the algorithm's performance. NucleusDiff ensures that atoms stay at an appropriate distance from one another, accounting for physical effects such as repulsive forces that prevent atoms from overlapping or colliding. Rather than tracking the distance between every pair of atoms in a molecule, which would be expensive, NucleusDiff estimates a manifold and establishes main anchoring points on it to watch, making sure that atoms never get too close to one another. We predicted binding affinities for a system absent from the training dataset: the COVID-19 therapeutic target 3CL protease. NucleusDiff showed increased accuracy and reduced atomic collisions by up to two-thirds compared with other leading models.
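A minimal numerical sketch of the anchor-point idea, our reading of the description above, not the paper's actual implementation: rather than penalizing every atom pair, sample a few anchors and penalize only anchor pairs closer than a minimum distance. The random sampling rule and the 1.2 Å cutoff are illustrative assumptions.

```python
import numpy as np

def collision_penalty(coords: np.ndarray, n_anchors: int = 4, d_min: float = 1.2) -> float:
    """Sum of squared violations of a minimum-distance constraint over anchors."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(coords), size=min(n_anchors, len(coords)), replace=False)
    anchors = coords[idx]
    # Pairwise distances between anchors only (cheap vs. all atom pairs).
    diffs = anchors[:, None, :] - anchors[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    iu = np.triu_indices(len(anchors), k=1)
    # Penalize only pairs that violate the minimum separation.
    violations = np.clip(d_min - dists[iu], 0.0, None)
    return float((violations ** 2).sum())

# Two overlapping atoms are penalized; well-separated atoms are not.
clashed = collision_penalty(np.array([[0.0, 0, 0], [0.1, 0, 0]]))
spread  = collision_penalty(np.array([[0.0, 0, 0], [5.0, 0, 0]]))
```

In training, a term like this would be added to the diffusion loss so the model learns to keep atoms apart rather than being corrected after the fact.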
-
When I launched insitro, the goal was never to make one drug. If that had been the case, we could have used existing tools. Our goal was to change how drugs are made – and to do that, we built an AI-enabled causal biology engine from the ground up. We architected platforms for repeatable, scalable, predictable drug discovery — with urgency for patients who are running out of time, but never for quick wins. This week in Nature Communications, we published work that validates this approach, and I couldn't be prouder of the team that made it possible. The study details CellPaint-POSH, which integrates pooled CRISPR screening, high-content imaging, and self-supervised machine learning to map gene function at scale – without predefined biomarkers or human-engineered hypotheses. This is the power of a platform. For decades, drug discovery has forced a trade-off researchers accepted as inevitable: screen thousands of genes with crude readouts, or deeply characterize a handful with rich phenotypic data. Biology doesn't organize itself around what we've learned to measure. POSH upends this by providing a visual language that reveals answers to questions we didn't know to ask. Using DINO-ViT models trained directly on raw cellular images, we constructed gene-gene interaction maps from scratch – no hypotheses or biases limiting what the models can discover. When we let ML have an unfettered look at the data, it identifies functional relationships that conventional methods systematically miss. Our models distinguish genes producing complex phenotypes – mitochondrial dysfunction intertwined with signaling dysregulation – from those perturbing a single pathway alone. These are subtle, layered signatures no human could uncover manually. And critically, the models grow more capable as training data diversity increases. This matters because insitro is built on causal biology – understanding what drives disease. 
POSH lets us interrogate the genome at scale while preserving the phenotypic complexity where disease mechanisms actually live. We're not just calling hits; we're reconstructing gene function and causal relationships, then following that signal to uncover novel therapeutic targets. In this study alone, the platform surfaced new regulators of mTORC1 signaling – validated through orthogonal experiments. POSH is one component of a larger architecture that brings us closer to a future where drug discovery operates with the speed and predictability patients deserve. My thanks to Max Salick, Srinivasan Sivanandan, Bobby Leitmann, Ajamete Kaykas, and colleagues across insitro who made POSH possible. Paper: https://lnkd.in/gJD8XPmr Code: https://lnkd.in/gkVQhYuQ #MachineLearning #AI #DrugDiscovery
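To make the map-building step concrete, here is a hedged sketch: given one embedding per gene (in the paper these come from DINO-ViT features of perturbed cells; here they are hand-made stand-ins), rank gene pairs by cosine similarity of their embeddings. The gene names and vectors are invented for illustration.

```python
import numpy as np

def interaction_map(embeddings: dict[str, np.ndarray], top_k: int = 2):
    """Return the top_k most similar gene pairs by cosine similarity."""
    genes = list(embeddings)
    mat = np.stack([embeddings[g] for g in genes])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)  # unit-normalize
    sim = mat @ mat.T
    np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
    pairs = []
    for _ in range(top_k):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        pairs.append((genes[i], genes[j], float(sim[i, j])))
        sim[i, j] = sim[j, i] = -np.inf  # don't report the same pair twice
    return pairs

emb = {
    "GENE_A": np.array([1.0, 0.0, 0.0]),
    "GENE_B": np.array([0.9, 0.1, 0.0]),  # near-duplicate phenotype of A
    "GENE_C": np.array([0.0, 0.0, 1.0]),
}
top = interaction_map(emb, top_k=1)
```

Genes whose perturbations produce similar image phenotypes end up nearby in embedding space, which is the basis for inferring shared function.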
-
⌬⌬⌬ In the early, exploratory stages of drug discovery, machine learning is increasingly used to generate and explore large numbers of novel molecules, benefiting from large datasets and relatively few constraints. 🧪 In contrast, during lead optimization, machine learning is used to refine existing compounds by simultaneously optimizing multiple properties, making it a challenging multi-objective problem. 📊 One of the main challenges in applying ML to lead optimization is data scarcity: only a limited number of experimentally tested compounds are available, making it difficult to train models effectively. 🧩 Researchers deal with this in a few ways, such as data augmentation, transfer learning, and multi-task learning. ➜ A recent ACS Bio & Med Chem Au paper is a nice example of how to tackle this issue in practice using transfer-learning approaches such as fine-tuning. The authors used a pretrained transformer (trained on ~1M ChEMBL molecules) and fine-tuned it on a small set of METTL3 inhibitors to support lead optimization. 🧬 METTL3 is an RNA methyltransferase that catalyzes the formation of N6-methyladenosine (m6A), one of the most abundant RNA modifications in eukaryotic cells. This modification regulates mRNA processing, stability, and translation, and its dysregulation has been associated with cancers such as acute myeloid leukemia and solid tumors. ➥ The goal was to improve metabolic stability within the UZH2 inhibitor series while maintaining strong potency. Using a two-step fine-tuning approach (first on potency, then on potency + stability), the model generated thousands of candidates. ➥ After filtering and synthesis, 5 compounds were tested. Two showed low-nanomolar potency together with improved metabolic stability in mouse liver microsomes compared to compounds in the original series. A few important nuances: ➤ The model operates within a narrow chemical space: the study was conducted within a single, closely related chemical series, so its ability to generalize beyond that space remains unclear. ➤ Stability optimization was MLM-based, with limited cross-species evaluation (HLM and RLM for selected compounds only). ➤ Significant human filtering was required: extensive human filtering and prioritization were needed to select final candidates, emphasizing that the workflow is not fully automated. ⇨ The last point is important because, beyond being a strong example of applying AI to lead optimization, this paper demonstrates a very realistic “human-in-the-loop” workflow. While ML generates molecules, humans guide, filter, and validate them, showing “human-AI synergy” in action. Link to the paper: https://lnkd.in/ei7XChWB #DrugDiscovery #MachineLearning #MedicinalChemistry #AIinHealthcare #TransferLearning #HumanInTheLoop #Pharma
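The staged fine-tuning recipe can be sketched in miniature with a linear model standing in for the transformer: "pretrain" on a large generic dataset, then continue training the same weights on a dozen task-specific points. Everything here (model, data sizes, learning rates) is a toy assumption meant only to show the mechanics of transfer learning, not the paper's setup.

```python
import numpy as np

def sgd(w, X, y, lr, steps):
    """Plain full-batch gradient descent on squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(1)

# "Pretraining": a large corpus from a related task.
X_big = rng.normal(size=(500, 3))
y_big = X_big @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=500)
w = sgd(np.zeros(3), X_big, y_big, lr=0.1, steps=200)

# "Fine-tuning": only a dozen labelled task-specific compounds, whose
# objective is shifted relative to the pretraining task.
X_small = rng.normal(size=(12, 3))
y_small = X_small @ np.array([1.2, -2.0, 0.5])
loss_before = float(np.mean((X_small @ w - y_small) ** 2))
w = sgd(w, X_small, y_small, lr=0.05, steps=100)
loss_after = float(np.mean((X_small @ w - y_small) ** 2))
```

The point is that starting from pretrained weights lets a handful of examples steer the model, which is exactly why the approach works when only a small inhibitor series is available.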
-
🧬 𝐀𝐈 𝐢𝐬 𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐢𝐧𝐠 𝐨𝐧𝐜𝐨𝐥𝐨𝐠𝐲 𝐝𝐫𝐮𝐠 𝐝𝐢𝐬𝐜𝐨𝐯𝐞𝐫𝐲 The integration of artificial intelligence into oncology pipelines is reshaping the landscape—from target identification to clinical validation. This comprehensive review underscores the pivotal role of AI across all stages of the drug development continuum. 🔍 𝐅𝐫𝐨𝐦 𝐓𝐚𝐫𝐠𝐞𝐭 𝐭𝐨 𝐓𝐡𝐞𝐫𝐚𝐩𝐲 AI accelerates the discovery of novel anti-tumor agents by: 🔹 Mining biomedical data for precise target identification 🔹 Validating targets using integrated in vitro/in vivo strategies 🔹 Designing compounds through CADD, generative AI, and molecular simulations 🔹 Predicting pharmacokinetics and toxicity profiles 🔹 Streamlining clinical trial design and patient stratification 🧠 𝐂𝐨𝐦𝐩𝐮𝐭𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 𝐢𝐧 𝐀𝐜𝐭𝐢𝐨𝐧 💡 In one example, a machine learning framework identified 307 synergistic drug combinations for pancreatic cancer—validated in vitro with >80% accuracy. 💡 Generative AI (GAI) tools like ReLeaSE and GAN-based models propose entirely new molecular scaffolds with optimized bioactivity. 💡 DL models have successfully predicted selective ligands for key oncology targets like RXR and STK33. 📉 𝐀𝐈 𝐫𝐞𝐝𝐮𝐜𝐞𝐬 𝐜𝐨𝐬𝐭𝐬 𝐚𝐧𝐝 𝐭𝐢𝐦𝐞𝐥𝐢𝐧𝐞𝐬 By minimizing dependency on high-throughput trial-and-error approaches, AI enables: ⚙️ Shorter discovery-to-clinic cycles ⚙️ Reduced attrition rates ⚙️ Enhanced precision in compound optimization 🧪 𝐏𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐨𝐧𝐜𝐨𝐥𝐨𝐠𝐲 𝐢𝐬 𝐧𝐨𝐰 𝐀𝐈-𝐩𝐨𝐰𝐞𝐫𝐞𝐝 Deep learning networks integrate genomic, transcriptomic, and phenotypic data to support individualized treatment strategies—ushering in the era of data-driven precision medicine. 
🎯 𝐊𝐞𝐲 𝐓𝐚𝐤𝐞-𝐀𝐰𝐚𝐲𝐬: • AI drives efficiency in oncology pipelines: from target ID to clinical trials • Generative AI enables de novo molecule design for anti-tumor therapies • ML enhances toxicity prediction, drug synergy modeling, and ADME profiling • AI-assisted clinical trials optimize recruitment, dosing, and monitoring • Hybrid workflows—AI + experimental methods—are key to future innovation #AIinPharma #DrugDiscovery #OncologyResearch #PrecisionMedicine #GenerativeAI #ClinicalTrials #MachineLearning #CancerTherapeutics #BiotechInnovation Fatimah Albani, Sahar S. Alghamdi, MS, PhD, FHEA, Mohammed Almutairi, MSc., PhD. & Tariq Alqahtani
-
A new report, “Beyond Legacy Tools: Defining Modern AI Drug Discovery for 2025 and Beyond,” is out! (link in the comments) The report by BiopharmaTrend (Disclaimer: I am a co-founder of the company) analyzes the AI platforms behind companies like Recursion, Insilico Medicine, Iambic Therapeutics, Schrödinger, Verge Genomics, NOETIK and several others — and shows that despite their different architectures and areas of focus, they share a set of category-defining traits: ✔️ Modeling biology holistically, not just focusing on single targets or pathways ✔️ Building scalable, software-first platforms that integrate wet-lab and in silico workflows ✔️ Owning or generating massive, multimodal datasets (e.g. omics, imaging, patient data, and proprietary perturbation experiments) ✔️ Embedding AI at every stage of the pipeline, connecting the dots via gen AI. 👉 The report also introduces the concept of Holistic Drug Development (HDD), a vision where AI platforms integrate real-world patient data, systems biology, and generative chemistry into a continuous, learning-driven loop. Here is my take: “We’ve been using machine learning in biology for decades” is a common argument meant to downplay the idea that AI drug discovery (AIDD) is a new category. But IMO, this argument falls short. Yes, machine learning (ML) has been used in biology and chemistry for decades: QSAR models, clustering, PCA, support vector machines, basic neural nets, etc. But those were point solutions: tools applied to narrow tasks (e.g., predicting solubility, docking ligands, or clustering gene expression data). What’s different now is that ML, particularly deep learning, generative modeling, and transformer architectures, is being used to rebuild entire discovery workflows. Next, earlier ML approaches required handcrafted features (e.g., molecular descriptors).
Today’s models can learn rich, abstract representations directly from raw data — from sequences, graphs, images, and text — and use them across tasks. That shift is foundational and category-defining. Also, traditional ML tools were modular and disconnected. Today’s AIDD platforms integrate multimodal data (omics, imaging, EHRs, chemical structures, etc.). Modern AI drug discovery platforms operate in closed feedback loops with wet-lab systems and offer full-stack software products with APIs, dashboards, and orchestration layers. That level of scope and systems integration is categorically different, IMO. Classical ML mostly focused on prediction and classification. AIDD platforms now generate novel chemistry, hypotheses, even trial designs, shifting from prediction tools to creative engines within the discovery process. Finally, the important aspect is production-grade software platforms, not just scripts and models. Using ML as a helper tool ≠ building AI-native, data-driven engines. I am pretty certain. Disagree? Image credit: BiopharmaTrend
-
One of the most interesting shifts in pharma R&D right now is the emergence of continuous learning loops between computational models and lab experiments. Strategic investments are reinforcing this direction. A recent example is the $1B collaboration between NVIDIA and Eli Lilly and Company, aimed at building an AI factory for drug discovery, leveraging large-scale models trained on the language of biology and chemistry. At the core of this approach is a tight feedback loop between the wet lab and the dry lab, where experimental results continuously update computational models. Instead of the traditional discovery cycle (hypothesis → experiment → analysis → new hypothesis), AI enables something closer to: model prediction → experiment → real-time data → updated model → next experiment. This continuous loop allows research teams to iterate far more quickly. Industry analyses suggest that embedding AI directly into experimental workflows could reduce discovery timelines by as much as 40% in some cases. For pharma organizations, the implications are significant: • accelerating target validation • prioritizing experiments more effectively • reducing failed experimental cycles The companies that succeed may not simply use AI tools. They will build AI-native discovery systems in which computation and experimentation continuously inform one another. Article: https://lnkd.in/gzXqtj2Y #AI #DrugDiscovery #Pharma #Biotech #PrecisionMedicine #PharmaR&D
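The loop above can be rendered as a toy closed-loop experiment: a surrogate model is fit to all results so far, proposes the next condition to test, receives the "wet-lab" measurement (here a hidden dose-response function), and is refit before the next round. The quadratic surrogate and greedy acquisition rule are deliberate simplifications, not any company's actual method.

```python
import numpy as np

def wet_lab(x: float) -> float:
    # Hidden dose-response curve standing in for a real experiment; peak at x = 2.
    return -(x - 2.0) ** 2 + 5.0

candidates = np.linspace(0.0, 4.0, 21)
tested, results = [0.0, 4.0], [wet_lab(0.0), wet_lab(4.0)]  # two seed experiments

for _ in range(6):  # model -> experiment -> real-time data -> updated model -> next
    coeffs = np.polyfit(tested, results, deg=min(2, len(tested) - 1))
    untested = [x for x in candidates if x not in tested]
    preds = np.polyval(coeffs, untested)
    x_next = untested[int(np.argmax(preds))]   # greedy: test the predicted best
    tested.append(x_next)
    results.append(wet_lab(x_next))            # "wet lab" returns a measurement

best = tested[int(np.argmax(results))]
```

After a few rounds the loop homes in on the true optimum, illustrating why interleaving modeling and experimentation beats running a fixed batch of experiments up front.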
-
Artificial intelligence in drug development Drug discovery has long struggled with high costs and low success rates, hindered by the complexities of disease mechanisms and the massive chemical space to explore. Recent advances in artificial intelligence (AI)—from large language models (LLMs) to sophisticated machine learning frameworks—aim to overcome these bottlenecks, accelerating the identification of viable drug targets, expediting virtual screening, and optimizing clinical trials. By sifting through extensive multi-omics and real-world datasets, AI-driven methods can decipher intricate biological pathways, predict toxicity, and streamline regulatory compliance in ways that human-led, trial-and-error paradigms cannot match. Zhang and colleagues provide a comprehensive overview of how AI is applied throughout drug development, spanning from target identification and molecular design to preclinical evaluation and clinical monitoring. They discuss how new machine learning tools, such as generative models and specialized LLMs, enable deeper insights into disease biology and chemical diversity, thereby reducing the time and cost of discovering active compounds. Drawing on examples of AI-based activity prediction and drug repurposing, the authors show that even real-world, noisy datasets—such as electronic health records and insurance claims—can yield actionable results for complex conditions, paving the way toward personalized treatments. Their review also highlights remaining hurdles to AI-augmented drug development, including sparse data, interpretability challenges, and the need for more robust cross-industry collaboration. Yet there is confidence that with continued innovation, AI will increasingly function as a co-pilot rather than a mere assistant in modern pharmaceutical research, helping deliver better, safer medications faster. 
By integrating knowledge-driven approaches and improved data stewardship, the field stands at the threshold of an era in which AI significantly boosts both efficiency and innovation in drug development. Paper: https://lnkd.in/dGmvh6am #DrugDiscovery #AIinPharma #MachineLearning #GenerativeAI #Biotech #DrugDevelopment #LLMs #PrecisionMedicine #ClinicalTrials #AIforScience #Pharmaceuticals #MultiOmics #RealWorldData #HealthcareInnovation #BiomedicalResearch
-
🚀 Pushing the boundaries of AI in drug discovery! 🧬 “Large Language Model Agent for Modular Task Execution in Drug Discovery” — now on bioRxiv! In this work, we introduce AgentD, an LLM-powered agent that integrates language reasoning with domain-specific tools to automate and streamline the early-stage drug discovery pipeline. ✨ What can AgentD do?
- Retrieve biomedical data (FASTA sequences, SMILES, literature) from the web and structured databases.
- Answer tough, domain-specific scientific questions grounded in real literature (via RAG).
- Generate diverse seed molecules (using REINVENT & Mol2Mol).
- Predict critical ADMET properties and binding affinities.
- Iteratively refine molecules to improve drug-likeness and safety.
- Generate 3D protein–ligand complex structures for deeper analysis.
🚀 Why is this exciting? Drug discovery typically takes 10–15 years and billions of dollars. AgentD tackles these bottlenecks by integrating all the pieces into one modular, flexible, LLM-driven framework — enabling rapid screening, prioritization, and structural evaluation of drug candidates. In our case study on BCL-2 for lymphocytic leukemia:
- ✅ Increased drug-likeness (QED > 0.6) from 34 to 55 molecules after just two refinement rounds.
- ✅ Boosted compounds satisfying empirical drug-likeness rules from 29 to 52.
- ✅ Generated 3D structures to prepare for docking and MD — all starting from a single query.
The modular design means AgentD can easily incorporate new generative models, property predictors, and simulation tools, making it a robust foundation for next-generation AI-driven therapeutic discovery. 📖 Check out the preprint here: https://lnkd.in/eysCq2_A #DrugDiscovery #AI #LargeLanguageModels #ComputationalBiology #GenerativeAI #MachineLearning #PharmaTech #LLM #Bioinformatics #CMU
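A rough sketch of the modular pattern described above — tool names and the trivial keyword router are invented placeholders, not AgentD's actual interface: domain tools register with the agent, and each request is dispatched to whichever tool matches, so new tools can be added without touching the core.

```python
from typing import Callable

class ModularAgent:
    """Dispatches requests to pluggable domain tools."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def run(self, request: str) -> str:
        # Keyword matching stands in for LLM-based task selection.
        for name, fn in self.tools.items():
            if name in request.lower():
                return fn(request)
        return "no tool matched"

agent = ModularAgent()
agent.register("retrieve", lambda q: "SMILES: C1=CC=CC=C1")      # stub retriever
agent.register("admet", lambda q: "predicted logP: 2.1 (stub)")  # stub predictor

answer = agent.run("retrieve the ligand structure")
```

Swapping a stub for a real generative model or property predictor only requires one `register` call, which is the sense in which such a framework is "modular".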
-
💊 + 💊 ≠ 💊💊 Not all drug combinations add up. Some synergise, some cancel each other, and some only work at very specific doses. ❓Can we predict which combinations will work and know when to trust the prediction? That is the problem #AlgoraeOS was built to solve. 🧬💊 An uncertainty-aware, dose-sensitive AI platform for combination therapy discovery. Combination therapies are central in cancer, infectious disease, and immunology. Yet discovering effective pairs is slow, expensive, and often guided by trial-and-error. AlgoraeOS changes this. It learns how drugs interact inside different cellular environments, and crucially, it estimates its own confidence in every prediction. 🔹 What makes AlgoraeOS different? • Dose matters — and the model understands dose–response, not just binary “works/doesn’t work.” • Confidence matters — it quantifies aleatoric and epistemic uncertainty, signalling when predictions are reliable. • Generalisation matters — trained on millions of harmonised synergy assays across 3,000+ drugs, it performs few-shot and zero-shot inference on unseen compounds. • Performance matters — it outperforms specialised drug synergy models and large #LLM-based systems such as #TxGemma–27B (Google DeepMind) — while being far more lightweight and deployable. This platform is the result of a deep #academic–industry partnership between UNSW Sydney and Algorae Pharmaceuticals (ASX: 1AI), supported by UNSW ResTech and high-performance computing on National Computational Infrastructure (NCI Gadi) and CSIRO. We particularly acknowledge Muhammad Javad Heydari, first author, whose technical contribution shaped the core of this work. 
🔗 Full article / preprint link: https://lnkd.in/gP5AWA4H #AIinDrugDiscovery #CombinationTherapy #PrecisionOncology #Biopharma #MachineLearning #DrugSynergy #DeepLearning #BiotechInnovation David Hainsworth Brad Dilkes James Mckenna Muhammad Javad Heydari David Burt Vafaee Lab UNSW AI Institute Tom Melville Tom Marsland Parvin Mansouri Bryan Lye John Lock Shane T. Grey
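One way to build intuition for the epistemic side of the uncertainty estimates described above — this is a generic bootstrap-ensemble sketch, not AlgoraeOS's actual model: the spread of predictions across an ensemble of refits grows as a query moves away from the training data, signalling when a prediction should not be trusted.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(X, y, x_query, n_models=30):
    """Mean prediction and ensemble spread (a proxy for epistemic uncertainty)."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
        coeffs = np.polyfit(X[idx], y[idx], deg=1)   # refit a simple model
        preds.append(np.polyval(coeffs, x_query))
    preds = np.array(preds)
    return preds.mean(), preds.std()

X = rng.uniform(0.0, 1.0, size=40)        # training data only covers [0, 1]
y = 2.0 * X + 0.1 * rng.normal(size=40)

_, std_in = ensemble_predict(X, y, 0.5)   # query inside the data range
_, std_out = ensemble_predict(X, y, 5.0)  # query far outside it
```

Aleatoric (measurement) noise would be modelled separately, e.g. by predicting a variance term; the ensemble spread here captures only model uncertainty, which is the component that flags unseen compounds.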