Drug Discovery Computational Tools


Summary

Drug discovery computational tools use advanced software and artificial intelligence to identify, design, and test new medicines by simulating biological interactions, helping scientists explore millions of possibilities without needing to run every experiment in a lab. These tools make it possible to search, predict, and validate drug candidates much faster and more thoroughly than traditional methods.

Explore large datasets: Use AI-powered platforms to virtually screen millions of compounds against thousands of protein targets in a fraction of the time, speeding up the search for new medicines.
Simulate biology: Deploy computational models to predict how drugs and genes interact with cells, allowing researchers to prioritize experiments and uncover unexpected drug effects.
Automate workflows: Integrate open-source engines and automated lab systems to streamline everything from target identification to molecule design, making pharmaceutical research more accessible and efficient.

Summarized by AI based on LinkedIn member posts

Jorge Bravo Abad

AI/ML for Science & DeepTech | Prof. of Physics at UAM | Author of “IA y Física” & “Ciencia 5.0”

28,988 followers 3mo
AI-powered virtual screening that scores 10 trillion protein-ligand pairs in a single day

Of ~20,000 human protein-coding genes, only about 10% have been successfully targeted by FDA-approved drugs or have documented small-molecule binders. The bottleneck isn't biology; it's computational scale. Traditional molecular docking takes seconds to minutes per protein-ligand pair, making genome-wide screening essentially impossible with current resources.

Yinjun Jia and coauthors tackle this head-on with DrugCLIP, a contrastive learning framework that reframes virtual screening as a dense retrieval problem, similar to how modern search engines work. The key innovation: encode protein pockets and small molecules into a shared latent space using separate neural networks, then use cosine similarity for ultrafast ranking. The model is pretrained on 5.5 million synthetic pocket-ligand pairs extracted from protein structures, then fine-tuned on 40,000 experimentally determined complexes.

The speed gains are staggering: up to 10 million times faster than docking. Combined with GenPack, a generative module that refines pocket detection on AlphaFold2-predicted structures, DrugCLIP enables screening at a scale previously unthinkable: 500 million compounds against ~10,000 human proteins, scoring more than 10 trillion pairs in under 24 hours on just 8 GPUs.

The wet-lab validations are equally compelling. For norepinephrine transporter (NET), a 15% hit rate with two inhibitors structurally confirmed by cryo-EM. For TRIP12, a challenging E3 ubiquitin ligase with no known inhibitors or holo structures, a 17.5% hit rate using only AlphaFold2 predictions, with functional enzymatic inhibition confirmed.

The resulting database, GenomeScreenDB, covers ~20,000 pockets from 10,000 proteins, roughly half of all human protein-coding genes, and is freely available at drugclip.com.

The message is clear: by combining contrastive representation learning with generative pocket refinement and AlphaFold structures, we've entered an era where genome-wide drug discovery becomes computationally tractable, opening systematic exploration of the vast undrugged proteome.

Paper: https://lnkd.in/e7aGUvAX

#DrugDiscovery #ArtificialIntelligence #MachineLearning #DeepLearning #VirtualScreening #ComputationalBiology #AlphaFold #ProteinScience #Biotech #AIforScience #StructuralBiology #Bioinformatics #Pharmaceuticals #ComputationalChemistry #PrecisionMedicine
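The retrieval idea in the post above (pockets and molecules in one embedding space, ranked by cosine similarity) can be sketched in a few lines. This is only an illustration: the random vectors stand in for encoder outputs, and the function names, the 128-dimensional size, and the 1,000-molecule library are assumptions, not anything taken from DrugCLIP itself.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k_molecules(pocket_vec: np.ndarray, mol_matrix: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k molecules most similar to one pocket."""
    mols = l2_normalize(mol_matrix)
    pocket = l2_normalize(pocket_vec[None, :])[0]
    scores = mols @ pocket               # one cosine similarity per molecule
    order = np.argsort(-scores)[:k]      # highest similarity first
    return order, scores[order]


# Placeholder embeddings (assumption): random vectors replace real encoder outputs.
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 128))               # 1,000 "molecules", 128-dim
pocket = library[42] + 0.01 * rng.normal(size=128)   # a "pocket" placed near molecule 42

idx, sims = top_k_molecules(pocket, library, k=3)
print(idx, sims)
```

Because the library matrix can be normalized and stored once, scoring a new pocket reduces to a single matrix-vector product, which is why this formulation scales so differently from per-pair docking.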

Olivier Elemento

Director, Englander Institute for Precision Medicine & Associate Director, Institute for Computational Biomedicine

10,454 followers 3mo
💊 AI just made drug discovery searchable

Virtual screening at genome scale has been computationally prohibitive. Traditional molecular docking works well for one target at a time, but screening large compound libraries against thousands of human proteins simultaneously would take years, even on modern GPU clusters.

A team at Tsinghua University just changed this. They screened 500 million compounds against 10,000 human proteins, scoring 10 trillion protein-ligand pairs in under 24 hours using just 8 GPUs (!). Their new paper in Science (https://lnkd.in/ek5-d9F7) introduces DrugCLIP, a contrastive learning approach that's 10 million times faster than traditional docking.

🔬 How it works

Two neural networks encode protein pockets and drug molecules into a shared embedding space, trained so that binders cluster near their targets while non-binders are pushed apart. Both encoders are built on UniMol, a 3D transformer that processes atomic coordinates directly rather than chemical formulas. The training is clever: pretrained on 5.5 million synthetic protein-fragment pairs, then fine-tuned on 44,000 real crystal structures using random conformations rather than exact poses - forcing the model to learn chemical features, not memorize geometry. Once trained, screening becomes nearest-neighbor search.

🚀 Why it's so fast

The speed comes from pre-computation. You encode your 500 million molecules once and store the vectors offline. Screening a new protein target then becomes vector similarity - no physics simulations, no pose sampling, no energy minimization per molecule.

📊 The validation

The team validated hits in wet-lab experiments. Traditional virtual screens typically yield 1-5% hit rates. DrugCLIP achieved:
→ 15% hit rate for norepinephrine transporter (NET), with structurally novel inhibitors distinct from existing drugs - two confirmed by cryo-EM
→ 17.5% hit rate for TRIP12, a target with no previously known ligands, using only AlphaFold-predicted structures

That second result is remarkable - they found the first functional inhibitors for an unexplored target implicated in cancer and Parkinson's.

🌐 The resource

The team released GenomeScreenDB (https://drugclip.com), an open-access database containing candidate molecules for ~20,000 pockets across ~10,000 human proteins - more targets than have any known ligands in ChEMBL.

I think this represents a shift in how drug discovery will work. When screening becomes this fast and cheap, the bottleneck moves from computation to ideas: which targets matter, which patient populations to prioritize, how to validate hits efficiently.

Congratulations to co-first authors Yinjun Jia, Bowen Gao, Jiaxin Tan, Jiqing Zheng, Xin Hong and senior authors Yanyan Lan, Wei Zhang, Chuangye Yan, and Lei Liu.
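As a rough sanity check on the scale quoted in this post, the arithmetic can be worked through directly. Two assumptions are made here and are not stated in the post: pairs are counted per pocket (~20,000) rather than per protein, which is what makes 500 million compounds come out at about 10 trillion pairs, and the docking comparison uses a hypothetical 1 second per pair on a single worker.

```python
compounds = 500_000_000        # library size quoted in the post
pockets = 20_000               # approximate pocket count quoted for GenomeScreenDB
gpus = 8
seconds_per_day = 24 * 60 * 60

# Assumption: one score per (compound, pocket) pair.
total_pairs = compounds * pockets
pairs_per_second = total_pairs / seconds_per_day
pairs_per_gpu_second = pairs_per_second / gpus

# Hypothetical docking cost, for comparison only: 1 second per pair, one worker.
docking_seconds_per_pair = 1.0
docking_years = total_pairs * docking_seconds_per_pair / (seconds_per_day * 365)

print(f"total pairs:            {total_pairs:.2e}")
print(f"pairs per GPU-second:   {pairs_per_gpu_second:.2e}")
print(f"docking, single worker: {docking_years:,.0f} years")
```

Under these assumptions each GPU has to sustain on the order of ten million similarity scores per second, which is plausible for precomputed vectors and out of reach for any per-pair physics calculation.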

Raya Khanin PhD

7,270 followers 6mo
📊 Can we discover new therapeutics entirely in silico?

A new study in Nature Computational Science, "In silico biological discovery with large perturbation models" (https://lnkd.in/eitcTVBs), introduces the Large Perturbation Model (LPM), a deep learning framework that learns from thousands of perturbation experiments across CRISPR, drug, and transcriptomic datasets to predict unseen biological outcomes.

Every perturbation experiment captures how a cell changes when a gene is silenced or a compound is applied. But integrating these diverse data types has long been a bottleneck. The same gene perturbed in two different cell types or time points often leads to incomparable results. LPM solves this by separating three essential dimensions, Perturbation (P), Readout (R), and Context (C), and then learning how they interact. The result: a model that doesn’t just interpolate data but learns causal rules connecting interventions to outcomes.

🔍 Key findings:
→ LPM consistently outperforms leading models (CPA, GEARS, scGPT, Geneformer) in predicting post-perturbation gene expression.
→ By placing drugs and gene knockouts in a shared latent space, it links compounds to their molecular targets and flags off-target effects. For example, pravastatin grouped with anti-inflammatory agents, aligning with known biology.
→ In a virtual screen for autosomal dominant polycystic kidney disease (ADPKD), LPM predicted that simvastatin could increase PKD1 expression, a prediction later validated in real-world patient data showing slower disease progression.
→ When its “virtual experiments” were added to real datasets, causal gene-gene network inference became more accurate.

💡 Why it matters: LPM shows that it’s possible to simulate biology before running an experiment. By training across heterogeneous datasets, it builds a foundation model for biological discovery, one that can generalize across cell types, perturbations, and modalities.

🧠 Potential applications:
• Virtual drug discovery and mechanism-of-action prediction
• Detecting off-target or synergistic drug effects
• Filling in missing perturbation data to strengthen causal network mapping
• Prioritizing compounds or pathways for wet-lab validation
• Personalizing therapeutic hypotheses based on molecular context

This isn’t just about prediction; it’s about transforming how experimental biology is done. Models like LPM turn large-scale perturbation data into a continuous, learnable system. Instead of testing one gene or one compound at a time, researchers can now explore millions of possibilities computationally, then focus resources where the model predicts meaningful biology. As perturbation datasets grow, from CRISPR screens to chemical libraries and multi-omic assays, this approach will redefine discovery pipelines in pharma, functional genomics, and personalized medicine alike.

🧬 Digital models are becoming the new laboratories of discovery.
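One way to picture the Perturbation / Readout / Context separation described in this post is a model that looks up one embedding per dimension and combines them in a small network. The sketch below is a toy under loud assumptions: it is not the published LPM architecture, the weights are random and untrained (so the numbers it returns are meaningless), and the context labels and layer sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # embedding size (arbitrary choice for this sketch)


def make_table(names):
    """One random embedding per symbol; a trained model would learn these."""
    return {name: rng.normal(size=DIM) for name in names}


# Hypothetical vocabularies for the three dimensions.
perturbations = make_table(["simvastatin", "PKD1_knockout"])
readouts = make_table(["PKD1_expression", "cell_viability"])
contexts = make_table(["kidney_cell_line", "liver_cell_line"])

# Untrained two-layer network that mixes the three embeddings.
W1 = rng.normal(size=(3 * DIM, 32)) * 0.1
b1 = np.zeros(32)
W2 = rng.normal(size=32) * 0.1


def predict(p: str, r: str, c: str) -> float:
    """Predict a scalar readout r for perturbation p applied in context c."""
    x = np.concatenate([perturbations[p], readouts[r], contexts[c]])
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU layer models the P-R-C interaction
    return float(hidden @ W2)


print(predict("simvastatin", "PKD1_expression", "kidney_cell_line"))
```

The point of the layout is that any (P, R, C) triple can be queried, including combinations never measured together, as long as each symbol appeared somewhere in training.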

Ken Wasserman

Assistant Professor at Georgetown University School of Medicine

4,549 followers 2w
Perplexity: Pharmaceutical Superintelligence (PSI) represents a paradigm shift in drug discovery where generative AI, multimodal foundation models, and automated labs merge into an end-to-end “prompt-to-drug” system. A scientist can issue a natural-language prompt (e.g., “Design a drug for idiopathic pulmonary fibrosis”), and a reasoning AI autonomously coordinates specialized agents to identify targets, design molecules, execute robotic synthesis, and outline clinical strategies.

1. Protenix-v1 and the Open-Science Revolution
Predicting accurate 3D protein-ligand structures forms the foundation of rational drug design. Protenix-v1 is the first open-source biomolecular prediction model to equal or exceed proprietary systems like AlphaFold 3 under identical data and compute budgets. It introduces inference-time scaling, boosting accuracy by increasing sampling depth (up to 100 random seeds per complex) and providing a controllable accuracy-cost tradeoff. Its transparent architecture, alongside Boltz-2 and Chai-1, allows researchers to build fully inspectable biological pipelines. Together, these tools turn biology into a software-defined stack: integratable, reproducible, and free from vendor lock-in.

2. Open-Source Engines for Autonomous Agents
For PSI to function, reasoning agents must invoke specialized biophysical models to test hypotheses. Open-source engines supply this validation layer, enabling AI systems to move from summarizing biology to inventing it. Multi-agent environments (e.g., “Virtual Biotech”) can explore millions of datasets and interactions autonomously, collapsing the time and cost of discovering viable targets and candidate compounds.

3. TITO Strategy: Target Identification & Prompt-to-Drug Orchestration
Drawing from Target Identification Pro (TID-Pro) and the Text-In/Text-Out pipeline, PSI integrates target selection and molecule generation into a single reasoning hierarchy. TID-Pro uses a positive-unlabeled framework combining 22 omics and text features to rank targets by clinical viability, outperforming general LLMs. The Prompt-to-Drug orchestrator decomposes user requests into structured tasks: target analysis (TID-Pro), structure prediction (Protenix-v1), molecule generation, and robotic synthesis/testing. This multi-layered feedback loop, where each decision is cross-checked by physics-grounded predictors, ensures automation without hallucination.

PSI thus transforms drug discovery into a continuous, autonomous feedback cycle capable of producing safe, synthesizable medicines directly from text prompts, marking a decisive step toward self-driving pharmaceutical R&D.

https://lnkd.in/dK7FU74U
https://lnkd.in/exJSkinE
Listen to the podcast: https://lnkd.in/dcKwQGhH
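The inference-time scaling idea mentioned above (more random seeds per complex in exchange for more compute) can be illustrated with a generic best-of-N loop. Everything here is hypothetical: `predict_structure` is a stand-in that returns a fake confidence score rather than calling any real model, and picking the sample with the highest confidence is an assumed selection rule, not something the post specifies.

```python
import random


def predict_structure(complex_id: str, seed: int) -> dict:
    """Hypothetical stand-in for a structure predictor; returns a fake confidence."""
    rng = random.Random(f"{complex_id}-{seed}")
    return {"seed": seed, "confidence": rng.random()}


def best_of_n(complex_id: str, n_seeds: int) -> dict:
    """Run n_seeds independent samples and keep the most confident one.

    Cost grows linearly with n_seeds; the best score can only stay equal or
    improve, because each larger run contains all the smaller run's samples.
    """
    samples = [predict_structure(complex_id, seed) for seed in range(n_seeds)]
    return max(samples, key=lambda sample: sample["confidence"])


for n in (1, 10, 100):
    best = best_of_n("example_complex", n)
    print(n, best["seed"], round(best["confidence"], 3))
```

The accuracy-cost dial is simply `n_seeds`: a quick triage pass might use a handful of seeds, while a final candidate could justify the full budget.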

Scientific Intelligence AI and Pharmaceutical Superintelligence via MultiAgent Stratification Ken Wasserman on LinkedIn
Dakshinamurthy Sivakumar

Turning 19 years of computational chemistry into AI tools that design better drugs | Director (AI & DD), BioCogniz | Director (R & I), Prognica Labs (Dubai) | Ex Discovery Scientist, Cresset (UK) | Professor | Mentor

6,181 followers 3mo
Before you can design a drug, you need to find where it binds. Binding site prediction is often overlooked, but it's critical.

Challenges:
→ Cryptic pockets only visible in certain conformations
→ Allosteric sites far from active sites
→ Protein-protein interaction interfaces (flat, featureless)
→ Transient pockets that open/close dynamically

Tools I've found useful:
P2Rank: ML-based, fast, good accuracy
fpocket: Geometry-based, well-validated
SiteMap: Comprehensive druggability assessment
DoGSiteScorer: Good for comparing multiple pockets

Pro tip: Run pocket detection on MULTIPLE conformations from MD. That cryptic site might only appear in 10% of frames. But it could be your most druggable option.

What's your go-to tool for binding site prediction?

#DrugDiscovery #BindingSite #P2Rank #Druggability #CADD #StructuralBiology #BioCogniz #UAE #Pharmacy #Bioinformatics
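The pro tip above comes down to running a pocket detector on every MD frame and then asking how often each pocket shows up. A minimal sketch of that bookkeeping follows. It assumes the detector output has already been reduced to one set of pocket labels per frame and that pockets have been matched across frames, which is the hard part in practice; the labels and frame counts are made up.

```python
from collections import Counter


def pocket_occupancy(frames):
    """Fraction of frames in which each pocket label was detected.

    frames: list of sets, one per MD frame, holding the pocket labels found
    in that frame (assumed to be already matched across frames).
    """
    if not frames:
        return {}
    counts = Counter(label for detected in frames for label in detected)
    return {label: count / len(frames) for label, count in counts.items()}


# Hypothetical trajectory: the cryptic site opens in 1 of 10 frames.
frames = [{"active_site"}] * 9 + [{"active_site", "cryptic_site"}]
occupancy = pocket_occupancy(frames)
print(occupancy)
```

A pocket with low occupancy would be missed entirely by a single-structure run, so the frames where it does open are the ones worth passing on to docking.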

Centre of Bioinformatics Research and Technology (CBIRT)

Democratizing Bioinformatics for a Smarter, Healthier Future!

50,383 followers 1y
Scientists at Tsinghua University and Westlake University introduced #Dynaformer, a revolutionary graph-based Deep Learning model for predicting protein-ligand binding affinities. Unlike previous methods, Dynaformer leverages molecular dynamics simulations to capture the dynamic nature of protein-ligand interactions.

🎯 Dynaformer demonstrates state-of-the-art performance on the CASF-2016 benchmark, outperforming existing methods. The model learns from a curated dataset of 3,218 protein-ligand complexes, offering unprecedented accuracy in binding affinity prediction.

🔬 In a real-world test, Dynaformer identified 12 hit compounds (including 2 submicromolar hits) for HSP90 through virtual screening. This success, coupled with novel scaffold discoveries, showcases Dynaformer's potential to accelerate early-stage drug discovery.

Quick Read: https://lnkd.in/gRWKt-V8

#Bioinformatics #MolecularDynamics #DeepLearning #StructuralBiology #AIinDrugDiscovery #ComputationalChemistry #DrugDesign #ScienceNews

Pritam Kumar Panda, Ph.D.

Bioinformatician @ Stanford | AI Research Scientist in Drug Discovery & Protein Modeling | Foundation Models, LLMs, Multi-Omics, Deep Learning | Open-Source Developer | Nextflow Ambassador | Digital Biology

17,862 followers 4mo
Contemporary applications and advances of LLMs in bioinformatics.

1. DNA/RNA sequence analysis, functional and structure prediction: includes tools for sequence analysis (e.g. HyenaDNA, DNAGPT), functional prediction (e.g. BERT-enhancer, DNABERT), and structure-focused methods (e.g. RNABERT, GeoBoost2).
2. Protein sequence analysis, functional and structure prediction: covers protein sequence modeling (e.g. ESM, ProtGPT), post-translational modification prediction (e.g. EpiBERTope, TransPPMP), and structural analysis (e.g. ProteinBERT, MSA transformer).
3. Multi-omics data analysis: features tools for genomics (e.g. scGPT, iDNA-ABT), epigenomics (e.g. scELMo, MuLan-Methyl), and integrative omics approaches (e.g. DeepGene transformer, POOE).
4. Computational drug discovery and design: includes models for molecular design (e.g. MolGPT, ChemBERTa), drug-target interaction (e.g. DTI-BERT, TransDTI), and pharmaceutical applications (e.g. PharmBERT).
5. Biomedical literature mining: lists NLP models for biomedical text analysis (e.g. BioBERT, ClinicalBERT, Galactica).

Paper Link: https://lnkd.in/gPHSyKRe

Nadia Harhen

AI Leader | Regulatory Affairs Expert | Driving Discovery and Innovation in Life Sciences, Chemistry, and Materials Science

3,373 followers 3mo
Generative AI has the potential to transform drug discovery, but many existing models struggle to satisfy the full set of molecular requirements needed for viable drug candidates. IDOLpro addresses this challenge by combining diffusion-based generative chemistry with multi-objective optimization, guided by differentiable scoring functions. In benchmark tests, IDOLpro generated ligands with 10 to 20 percent higher binding affinity and improved synthetic accessibility compared to state-of-the-art methods. The result is faster and more cost-effective discovery of drug-like molecules than exhaustive virtual screening approaches. This platform enables faster hit identification, more efficient lead optimization, and a new path forward for next-generation drug discovery. Interested in learning more? Download the full paper here: https://lnkd.in/ePrRwNAh

Guided Multi-Objective Generative AI to Enhance Structure-Based Drug Design | Publications Library | SandboxAQ pub.sandboxaq.com
John Carpenter

Professor Emeritus at Univ. of Colorado Anschutz Medical Campus | Biopharma Consultant when not fishing

21,976 followers 3w
This excellent new paper by Arsiccio et al. from Coriolis Pharma describes studies on a novel in silico platform combining data-driven and physics-based models for protein formulation developability assessment. Quoting from the abstract:

"The development of novel therapeutic proteins is accompanied by huge financial and time investments. Advancing molecules through drug product development without proper scrutiny often leads to costly failures. In some cases, such failures are related to the inherent instability of the candidate molecule, and the difficulty of minimizing these instabilities by selecting a suitable formulation. Early characterization of a protein drug candidate's formulation developability is therefore crucial to reduce the risk. A comprehensive in vitro assessment is often constrained by the scarcity of drug substance material available in early stages of development, raising the interest in simulations and computational models. Currently, only very few in silico methods focusing on formulation development and assessing a protein drug candidate’s formulation developability are available. To address this problem, we present a novel in silico platform based on data driven and physics-based models that combines computational techniques spanning different areas, including structure prediction, bioinformatics, machine learning and molecular dynamics calculations. The platform only requires the primary sequence of a therapeutic protein as input, thus eliminating the need for physical material. We describe the computational tools that are part of the platform, and show how they can be used to identify liabilities and outlying properties of the protein drug candidate, evaluate high concentration suitability, screen the effect of formulation conditions on self-interaction propensity, and suggest formulation corridors. We additionally present two case studies that illustrate potential applications of the developed platform to real-world scenarios and prove its experimental validation."

Moustafa Gabr

Associate Professor at Weill Cornell Medicine

5,881 followers 5mo
🚀 Python-powered drug discovery 🐍💊 from the Gabr Lab is now published in Computational and Structural Biotechnology Journal.

Proud to share our new paper where we didn’t just run virtual screening; we built the entire computational pipeline from scratch in Python to target the CAPON-NOS axis that was considered “undruggable.”

Our Python workflow:
⚙️ preprocesses 4.6 million molecules with RDKit
🎯 performs ensemble docking across NMR conformations
🔎 automates hit ranking using consensus + Pareto optimization
🔄 validates via 200-ns molecular dynamics (MD)

Results:
✅ 9 high-confidence hits
✅ 2 compounds that stabilize CAPON-NOS across conformational states

The future of drug discovery? Code first. Experiments second.

🔗 Read the full paper: https://lnkd.in/eXw-SJ_R
👏 Huge credit to Hossam Nada for leading this amazing study. Another fantastic collaboration with Gerhard Wolber.

#Python #DrugDiscovery #ComputationalChemistry #VirtualScreening #MolecularDynamics #MedChem
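The Pareto step in the hit-ranking stage above can be shown with a small, generic example. This is a sketch of Pareto filtering in general, not the paper's code: the compound names, the two objectives (a docking score and an MD-derived instability measure, both treated as lower-is-better), and the numbers are all invented.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly
    better somewhere (lower is better for every objective)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return names of candidates that no other candidate dominates.

    candidates: list of (name, scores) pairs, scores being a tuple of objectives.
    """
    return [
        name
        for name, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]


# Hypothetical hits: (docking score in kcal/mol, ligand RMSD in angstroms).
hits = [
    ("cpd_A", (-9.5, 2.0)),
    ("cpd_B", (-8.0, 1.0)),
    ("cpd_C", (-7.5, 2.5)),   # worse than cpd_A on both objectives
    ("cpd_D", (-9.0, 3.0)),   # worse than cpd_A on both objectives
]

front = pareto_front(hits)
print(front)
```

Unlike a single weighted score, the Pareto front keeps every compound that represents a distinct trade-off, which leaves the final weighting of objectives to the chemist.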

