Top LinkedIn Content on Computational Biology Resources

7,563 followers 6mo

AI in Epidemiology: The 2025 Skill Map
I spent the last 3 weeks building this no-nonsense roadmap so public health professionals, researchers, and data scientists can break into AI-powered epidemiology in 2025. Save it. Share it. Apply it.
➦ 1. AI for Disease Surveillance
AI is changing how we detect, track, and forecast outbreaks in real time. To add real value here, learn to:
→ Build predictive pipelines with epidemiological time series (ARIMA, LSTM, Prophet)
→ Integrate mobility, search, and social media data for early-warning systems
→ Use anomaly detection for unusual patterns in case counts or symptoms
→ Combine mechanistic (SEIR) and machine-learning models for hybrid forecasting
Tools: Python, PyTorch, Prophet, GLEAMviz, Google Health Trends API
➦ 2. NLP for Epidemiologic Intelligence
Natural language processing is revolutionizing outbreak reports, social listening, and misinformation detection. Key skills:
→ Text cleaning and entity recognition (diseases, symptoms, drugs, places)
→ Topic modeling and trend analysis for digital surveillance
→ Sentiment and misinformation classification
→ Fine-tuning domain-specific LLMs on health text (BioBERT, PubMedBERT)
Tools: spaCy, Transformers, scikit-learn, LlamaIndex, LangChain
➦ 3. Explainable AI (XAI)
Health data isn’t useful if policymakers can’t trust it. Epidemiologists need to master transparent AI methods. Learn to:
→ Interpret models with SHAP and LIME
→ Identify feature importance and bias in prediction models
→ Build dashboards that visualize explainability for policy stakeholders
→ Audit algorithms for fairness and equity
Tools: SHAP, LIME, ELI5, Plotly Dash, Streamlit, Tableau
➦ 4. Spatiotemporal Modeling
Most public health questions are “where and when.” AI brings precision to those answers. Skills to focus on:
→ Build spatial regressions and GWR models
→ Detect hot-spots using LISA or Moran’s I
→ Combine satellite, environmental, and social data for risk prediction
→ Train spatial ML models (XGBoost + spatial lags, graph neural networks)
Tools: GeoPandas, PySAL, XGBoost, Kepler.gl, ArcGIS Pro, QGIS
➦ 5. Causal AI for Policy Impact
Prediction shows what; causality shows why. You’ll need to know:
→ Difference-in-Differences and ATT(g,t) estimation
→ Structural causal models and DAGs
→ Counterfactual prediction using machine learning (causal forests, DoWhy)
→ Policy simulation with synthetic controls
Tools: DoWhy, EconML, PyMC, CausalImpact, R (did, fixest, causalforest)
If you work in public health, medicine, or data science, this is your AI decade. Don’t wait for “AI experts” to explain your field.
💡 Key takeaway: "Epidemiologists who understand both data and disease will define the next generation of health intelligence."
If this roadmap helps you see where AI meets epidemiology, hit save, share, or tag someone building the future of public health.
#DigitalEpidemiology #PublicHealthAI #Epidemiology #DataScience #AIEthics #HealthInnovation #AI
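The hot-spot item in the roadmap above mentions Moran’s I. As a rough illustration only, here is a minimal pure-Python sketch of the global statistic, I = (n / W) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)². The district case counts and the neighbor matrix are invented for the example, and this is not PySAL’s implementation (no significance test, no row standardization).

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighboring pairs.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical data: five districts in a row; 1 marks districts sharing a border.
neighbors = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
clustered = morans_i([10, 12, 11, 30, 32], neighbors)    # similar values adjacent
alternating = morans_i([10, 30, 10, 30, 10], neighbors)  # dissimilar values adjacent
print(round(clustered, 3), round(alternating, 3))
```

Positive values suggest that similar counts sit next to each other (clustering); negative values suggest a checkerboard pattern. In practice a permutation test would be needed before calling anything a hot-spot.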

15 Comments

Kermen Bolaeva

Area Sales Rep Middle East & CIS@ New England Biolabs | Molecular Biology

2,330 followers 1y

❓ ONT, Illumina & MGI – What’s the Difference?
🔬 Next-Generation Sequencing (NGS) allows scientists to read genetic code by sequencing millions (or billions) of DNA fragments in parallel. Let’s explore some key platforms:
1️⃣ Illumina
1) Sample & Library Preparation: DNA/RNA is purified, fragmented, and ligated with adapters containing cluster recognition sites (which bind to specific spots on the flow cell), index sequences (which identify the sample), and primer binding sites. NEBNext UltraExpress® FS DNA Library Prep Kit https://lnkd.in/dMjcZphg is widely used for high-quality library preparation.
2) Cluster Generation: The flow cell carries oligonucleotides complementary to the adapters, allowing fragments to bind. A PCR-like process (bridge amplification) forms clusters. The multiple copies of each strand ensure that the fluorescent signal during sequencing is strong enough to detect.
3) Sequencing: Fluorescently labeled nucleotides (G, C, A, T) with terminators are added, and one nucleotide per cycle is incorporated into every strand of a cluster. Because the strands in a cluster are identical copies, the cluster emits a single, base-specific color in each cycle. A camera records the fluorescence to identify the nucleotide. Terminator groups are then cleaved to allow the next cycle.
4) Reverse Strand Sequencing: Index sequences are read, the reverse strand is synthesized, and sequencing is repeated.
5) Data Analysis: Low-quality reads are filtered, and sequences are aligned.
2️⃣ MGI
1) Sample & Library Preparation: DNA is fragmented, ligated with adapters, and circularized into ssCirDNA. NEBNext® FS DNA Library Prep Kit for MGI® https://lnkd.in/dQMtgbNd provides a reliable solution for generating high-complexity libraries with an optimized workflow.
2) DNB Generation by Rolling Circle Amplification: ssCirDNA acts as a template for continuous amplification, forming dense DNA Nanoballs (DNBs) that contain multiple copies of the sequence.
3) Loading DNBs: DNBs bind to specific spots on the flow cell.
4) Sequencing: Fluorescently labeled nucleotides with terminators are incorporated one per cycle into all copies within a DNB simultaneously. A camera records the fluorescence to identify each nucleotide. Terminators are cleaved to allow the next cycle.
3️⃣ Oxford Nanopore Technologies
1) Sample & Library Preparation: DNA/RNA is extracted, purified, and ligated with motor protein adapters.
2) Loading the Flow Cell: The library is added to a flow cell containing thousands of nanopores.
3) Sequencing: The motor protein unzips the DNA and feeds a single strand through the nanopore one base at a time. The bases passing through the pore disrupt the ionic current in a characteristic way, producing a signal used to determine the sequence.
4) Base Calling & Data Analysis: Signals are converted into nucleotide sequences, followed by read alignment and error correction.
#NGS #Sequencing #Genomics #Bioinformatics #Illumina #Nanopore #MGI #Biotech
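The post above notes that index sequences in the adapters identify the sample. A minimal sketch of how that is used after sequencing (demultiplexing) might look like the following. The sample names, index sequences, and the one-mismatch tolerance are assumptions made for the example; this is not any vendor’s demultiplexing software.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the sample whose index matches the read, or None if no
    sample (or more than one sample) is within the mismatch tolerance."""
    hits = [name for name, index in sample_indexes.items()
            if len(index) == len(index_read)
            and hamming(index, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet: sample name -> index sequence.
sample_indexes = {"sample_A": "ACGTACGT", "sample_B": "TTGCAAGC"}

print(assign_sample("ACGTACGT", sample_indexes))  # exact match
print(assign_sample("ACGTACGA", sample_indexes))  # one sequencing error
print(assign_sample("GGGGGGGG", sample_indexes))  # unassigned
```

Allowing a small number of mismatches recovers reads with a sequencing error in the index, at the cost of requiring indexes that differ from each other by more than that tolerance.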

29 Comments

🎯 Ming "Tommy" Tang

65,027 followers 10mo

Ever wondered why analyzing RNA-seq data feels like walking through a fog with 20,000 dimensions? Let’s talk about the curse of dimensionality in bioinformatics, and why it’s not just a math problem but a biological one. 🧵
1/ In bulk RNA-seq, you often have 20,000 genes and maybe 10 samples. That’s the definition of p >> n: more features than samples. Sounds powerful? It’s a trap. Each sample is a vector of 20,000 values, which is high-dimensional data.
2/ This high-dimensional setup leads to:
Overfitting
Noise overload
Spurious correlations
Useless distances (if you use all 20k genes, the distances among samples are all similar)
Your model starts seeing patterns where none exist.
3/ How do we fix it? We reduce dimensions. You can either select fewer genes or transform the data into fewer axes. Let’s unpack both.
4/ Feature selection:
Keep top genes by variance
Choose DE genes via DESeq2/edgeR
Use pathway gene sets
You lose some data, but keep interpretability.
5/ Dimension reduction:
PCA: projects into fewer axes capturing most variance
t-SNE/UMAP: non-linear reduction, great for visualizing clusters
Autoencoders: deep learning compression
You lose interpretability, but gain structure.
6/ What does PCA really do? Take 20,000 genes and build 10 orthogonal “super-genes” (PCs). They explain variance, not necessarily biology. Still, it helps you see patterns you’d never spot in raw data.
7/ This works in single-cell RNA-seq too, except now you’ve got 20,000 genes AND 50,000 cells. Welcome to p ≈ n. But there’s a twist: the matrix is full of zeroes. That’s dropout.
8/ scRNA-seq dimension reduction steps:
Pick highly variable genes
Use PCA to drop noise
UMAP to visualize clusters
Harmony/scVI to align across batches (Harmony corrects the PC coordinates)
Autoencoders to denoise
9/ Key challenges in scRNA-seq:
Sparsity
Batch effects
Scalability
You need tools that know how to swim in noisy waters: Seurat, Scanpy, scVI.
10/ Bottom line: Dimensionality is a curse only if you treat all data as equal. The solution is knowing when to cut, compress, or interpret.
11/ Takeaways:
High dimensions can lie
Use PCA/UMAP wisely
Interpret results with biology in mind
Don’t trust t-SNE for distances
Feature selection is your first defense
12/ Bioinformatics isn’t about blindly analyzing big matrices. It’s about asking: “What’s the signal? What’s the noise? And what can I ignore to see the truth?” That’s the real art of dimension reduction.
I hope you've found this post helpful. Follow me for more. Subscribe to my FREE newsletter chatomics to learn bioinformatics https://lnkd.in/erw83Svn
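The “useless distances” point in the thread above can be checked with a small simulation. The sketch below draws 10 samples of pure Gaussian noise (an assumption; real expression data has structure) and compares the largest and smallest pairwise Euclidean distance in 2 versus 5,000 dimensions. The function name `distance_spread` is made up for the example.

```python
import math
import random


def distance_spread(n_samples, n_features, seed=0):
    """Ratio of the largest to the smallest pairwise Euclidean distance
    among random Gaussian samples; a ratio near 1 means all samples look
    about equally far apart."""
    rng = random.Random(seed)
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
               for _ in range(n_samples)]
    distances = [math.dist(samples[a], samples[b])
                 for a in range(n_samples)
                 for b in range(a + 1, n_samples)]
    return max(distances) / min(distances)


low_dim = distance_spread(10, 2)      # few features: distances vary a lot
high_dim = distance_spread(10, 5000)  # many noisy features: distances converge
print(f"max/min distance ratio, 2 features:    {low_dim:.2f}")
print(f"max/min distance ratio, 5000 features: {high_dim:.2f}")
```

With thousands of uninformative features the ratio approaches 1, which is why selecting variable genes or projecting onto a few principal components before computing distances tends to help.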

11 Comments

Adedolapo Bakare

Research Officer | Molecular Biologist | Biotechnologist | Backend Developer.

2,012 followers 7mo

I still remember the first time I heard the term PCR-RFLP (Polymerase Chain Reaction – Restriction Fragment Length Polymorphism). It sounded so complex that I thought it belonged only in advanced textbooks. But once I saw how it worked, it became one of the most fascinating techniques I’ve learned in molecular biology. PCR is like making endless photocopies of a specific page from a giant book (your DNA). Once you have enough copies, RFLP steps in. It uses molecular ‘scissors’ (restriction enzymes) that cut the DNA at specific recognition sites. If a mutation is present, even a single base change, the scissors cut differently (or sometimes not at all). That small change creates a whole new pattern of DNA fragments. Practical example: Imagine two chickens that look identical, but we want to know which one carries a growth-related gene variant. Using PCR-RFLP, we amplify the gene region, digest it with an enzyme, and run it on a gel. One bird’s DNA might show two bands, the other three—clear evidence of a genetic difference we can’t see with our eyes. That’s the beauty of PCR-RFLP: it transforms hidden genetic variations into clear patterns we can read, whether in plant breeding, human disease research, or even forensic science.
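To make the two-bands-versus-three example concrete, here is a minimal sketch of an in-silico digest. It assumes the enzyme is EcoRI (recognition site GAATTC, cutting after the first G) and uses invented 60 bp amplicons in which a single base change destroys the site; neither the enzyme nor the sequences come from the post.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every
    occurrence of `site`; the cut falls `cut_offset` bases into the site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical 60 bp PCR products differing by one base inside the site.
allele_cut = "A" * 30 + "GAATTC" + "T" * 24    # site present
allele_uncut = "A" * 30 + "GAGTTC" + "T" * 24  # A>G change removes the site

cut_fragments = digest(allele_cut, "GAATTC", 1)
uncut_fragments = digest(allele_uncut, "GAATTC", 1)

# Expected gel bands (fragment sizes in bp) per genotype.
homozygous_cut = sorted(set(cut_fragments), reverse=True)
heterozygous = sorted(set(cut_fragments + uncut_fragments), reverse=True)
print(homozygous_cut, heterozygous)
```

Under these assumptions, a bird homozygous for the cut allele shows two bands, while a heterozygote shows three: the uncut product plus the two digestion fragments.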

25 Comments

Lucas Barreira

PhD student in Tropical Ecology | GIS & Spatial Analysis Specialist | Conservation of Threatened Flora

7,533 followers 1y

Species Distribution with QGIS and MaxEnt
Species distribution modeling is a powerful tool in ecology, helping us understand and predict the habitats of various species.
Preparing Your Datasets
Define Your Study Area: Load the shapefile.shp (for your region of interest) in QGIS to display boundaries. Select the specific region using expressions in the attribute table. Save the selected region as a new shapefile.
Create a Study Area Polygon: Use the Polygon from Layer Extent tool to generate a polygon shapefile, box.shp, covering your study area.
Clip Environmental Rasters: Clip your raster layers from WorldClim using the Clip raster with polygon function, ensuring the output files are in .ASC format for MaxEnt compatibility.
Extracting Species Occurrence Points
Use tools like R or Excel to filter species occurrence data. For instance, model the distribution of Apuleia leiocarpa observed in the Northeast region.
Modeling with MaxEnt
Load Data in MaxEnt: Load your species occurrence CSV and environmental rasters. Ensure all settings are configured correctly (e.g., feature types, output format).
Visualize in QGIS: Load the resulting ASC file in QGIS. Style the layer to visualize the predicted distribution.
By following these steps, you can create a robust species distribution model that helps in conservation planning and ecological research. 🌳
📚 References
Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978.
Ramos, L.T., Torres, A.M., Pulhin, F.B. & Lasco, R.D. (2011) Developing a georeferenced database of selected threatened forest tree species in the Philippines. Philippine Journal of Science, 141, 165–177.
Stattersfield, A.J., Crosby, M., Long, A.J. & Wege, D.C. (1998) Endemic Bird Areas of the World: Priorities for Biodiversity Conservation. The Burlington Press, Ltd., Cambridge, United Kingdom.
#QGIS #MaxEnt #SpeciesDistribution #Ecology #GIS #Conservation #Biodiversity #RemoteSensing #EnvironmentalScience #GeospatialAnalysis #BiodiversityConservation #GeographicInformationSystems #EcologicalModeling #GISMapping #EnvironmentalData #SpatialAnalysis #Sustainability #WildlifeConservation #NatureConservation #HabitatModeling #DataScience #EarthScience
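The occurrence-filtering step in the post above is described with R or Excel; as an alternative illustration, here is a minimal Python sketch using only the standard library. The input column names, the coordinates, and the bounding box are invented for the example. The output uses the species, longitude, latitude column order that MaxEnt’s samples file is generally documented to expect; check this against the MaxEnt version in use.

```python
import csv
import io


def filter_occurrences(rows, species, bbox):
    """Keep unique records of `species` that fall inside
    bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    seen = set()
    kept = []
    for row in rows:
        try:
            name = row["species"].strip()
            lon = float(row["longitude"])
            lat = float(row["latitude"])
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # skip records with missing or malformed fields
        inside = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        if name == species and inside and (lon, lat) not in seen:
            seen.add((lon, lat))
            kept.append((name, lon, lat))
    return kept


def to_maxent_csv(records):
    """Serialize records as CSV text with a species,longitude,latitude header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])
    writer.writerows(records)
    return out.getvalue()


# Hypothetical raw download: one duplicate, one point outside the study
# area, one record with a missing coordinate, and one other species.
raw = io.StringIO(
    "species,longitude,latitude\n"
    "Apuleia leiocarpa,-38.5,-7.2\n"
    "Apuleia leiocarpa,-38.5,-7.2\n"
    "Apuleia leiocarpa,-52.0,-25.0\n"
    "Apuleia leiocarpa,,-8.0\n"
    "Other species,-39.0,-8.0\n"
)
study_area = (-48.0, -18.0, -34.0, -1.0)  # assumed bounding box
records = filter_occurrences(csv.DictReader(raw), "Apuleia leiocarpa", study_area)
print(to_maxent_csv(records))
```

A rectangular bounding box is a simplification; clipping against the actual study-area polygon (box.shp or the selected region) would need a GIS library or QGIS itself.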

Chrysanthos Lymperopoulos

MSc Student in Statistics | Agriculture Graduate | Interested in Data Science, Ecological Modelling, Medical Entomology, Biostatistics & Epidemiology.

5,063 followers 3mo

🚀 Learning R in 2026 and working in epidemiology?
📘 The Epidemiologist R Handbook is a free, task-oriented R reference designed for applied epidemiology and public health practice. Developed by practicing epidemiologists, it provides practical guidance across the full analytic workflow, including data management, analysis, visualization, and reporting. The handbook has been used by over 850,000 learners worldwide and supports training programs at organizations such as the Centers for Disease Control and Prevention, the World Health Organization, and numerous public health agencies.
🟢 If you’re new to R
R Basics
Transition to R
Suggested packages
R projects
Import and export
🟡 If you already use R a bit
Cleaning data and core functions
Working with dates
Pivoting data
Grouping & joining data
De-duplication
🔵 If you’re doing applied analysis
Use the Analysis section as a toolbox:
Descriptive tables
Regression (uni & multivariable)
Missing data
Time series & outbreak detection
Survey & survival analysis
GIS basics
🔗 https://lnkd.in/dDsM-2na Applied Epi
💡 Bonus
📘 Handbook highlight: Chapter 24, Epidemic Modeling
Chapter 24 of The Epidemiologist R Handbook provides a practical introduction to epidemic modelling for applied epidemiology, with a focus on implementation rather than theory. This chapter demonstrates how to:
• Estimate the effective reproduction number (Rₜ)
• Account for reporting delays and uncertainty
• Compare commonly used tools (EpiNow2 and EpiEstim)
• Produce short-term incidence projections
• Interpret outputs responsibly for public health decision-making
Rather than emphasizing methodological derivations, the chapter guides readers through realistic workflows using linelist data, reproducible code, and clearly explained assumptions, reflecting how modelling is used in routine outbreak response.
🔗 Explore Chapter 24 and the full handbook at appliedepi.org #Epidemiology #EpidemicModeling #RStats #AppliedEpi #OutbreakResponse Lucca Nielsen Shazia Ruybal-Pesántez, PhD Berhe E. Tesfay Santiago Rayment Gomez
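For intuition about what estimating Rₜ involves, here is a deliberately simplified sketch of the renewal-equation ratio Rₜ ≈ Iₜ / Σₛ Iₜ₋ₛ wₛ, where wₛ is the serial-interval distribution. It is written in Python for illustration only; it is not how EpiNow2 or EpiEstim work internally (no smoothing window, no uncertainty, no reporting delays), and the incidence series and serial-interval values are invented.

```python
def instantaneous_r(incidence, serial_interval_pmf):
    """Naive Rt per time step: new cases divided by the infection pressure
    from past cases. serial_interval_pmf[0] is the weight for a 1-day lag.
    Returns None where no past cases contribute."""
    estimates = []
    for t in range(len(incidence)):
        pressure = sum(
            incidence[t - lag] * weight
            for lag, weight in enumerate(serial_interval_pmf, start=1)
            if t - lag >= 0
        )
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Hypothetical daily case counts.
flat = instantaneous_r([10, 10, 10, 10, 10, 10], [0.5, 0.5])
doubling = instantaneous_r([1, 2, 4, 8, 16], [1.0])
print(flat)
print(doubling)
```

The first few estimates are unreliable because the history is truncated, which is one reason the real tools account for delays and report credible intervals instead of a single ratio.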

20 Comments

Andrew Gonzalez

3,433 followers 6mo

🌍 Predicting the way forward for the Global Biodiversity Framework
A new paper led by Damaris Zurell and many fantastic colleagues GEO BON https://lnkd.in/euzPSQjd
The world has rallied behind the UN Biodiversity Kunming–Montreal Global Biodiversity Framework (GBF), a landmark plan to halt biodiversity loss by 2030. We point out that the GBF's indicators track what’s already happened (that is, they are lagging indicators) yet offer little insight into whether today’s actions will actually secure nature’s future. The risk is that we are looking only in the rear-view mirror.
🔮 The solution? Bring prediction to the heart of biodiversity policy. Just as climate science uses models to forecast global temperatures and extreme events and to guide emissions targets, conservation needs predictive models to test strategies before they fail. These models can:
Link actions to outcomes, showing which conservation measures will work where.
Balance ecological goals with economic and social realities.
Anticipate time lags and cross-border impacts.
Highlight data gaps and guide smarter monitoring.
Provide the basis for leading indicators.
To make this shift, we propose a World Biodiversity Research Programme (WBRP): a coordinated global effort, akin to the World Climate Research Programme, to standardize and advance biodiversity modeling. Without such foresight, the GBF could end up documenting decline instead of preventing it. With it, we could turn from "writing nature’s obituary to crafting its recovery."
#Biodiversity #Conservation #SciencePolicy #PredictiveModeling #Sustainability #GBF #NaturePositive

Predicting the way forward for the Global Biodiversity Framework | PNAS pnas.org

1 Comment

Francesco Rugolo, PhD

Molecular Biologist & Biochemist | Oncology & Immunology Research | Bioinformatics & Data-Driven Scientist | Published Scientist & Mentor

5,431 followers 7mo

One of the most underrated tools for bioinformatics beginners (and honestly, even for seasoned researchers) is Galaxy (https://usegalaxy.org/). It’s a web-based platform that allows you to run full NGS pipelines without installing anything locally. I’ve been using Galaxy mainly for RNA-seq analysis, and here’s a simple roadmap if you want to try it:
1. Upload your data: FASTQ files go in first. Galaxy lets you upload them directly or fetch them from online repositories like GEO/SRA.
2. Quality check: Start with FastQC. You’ll quickly spot adapter contamination, low-quality bases, or overrepresented sequences.
3. Trimming: Use Trimmomatic (or similar) to clean your reads. This step is crucial to avoid biases in downstream analysis.
4. Alignment: Map your reads to the reference genome with HISAT2 or STAR. Galaxy handles the heavy lifting, and you’ll get BAM files as output.
5. Read counting: Quantify gene expression with featureCounts or HTSeq-count. Now you have your count matrix, the backbone of differential expression analysis.
6. Differential expression: Galaxy integrates DESeq2 and edgeR. You can run them directly, generate MA plots and volcano plots, and identify up/downregulated genes.
7. Pathway / functional analysis: Push your gene lists into enrichment tools (Galaxy has plugins, or you can export to Enrichr, DAVID, GSEA). That’s where biological meaning emerges.
Why Galaxy?
🔺 No command line needed.
🔺 Transparent workflows you can save, share, and reproduce.
Perfect for teaching, learning, or quick analyses without a powerful computer. Excel is very powerful but limiting; R and Python are the gold standards but hard to begin with. If you want to get hands-on with real NGS data quickly, Galaxy is one of the most accessible entry points (imho).
👉 Have you ever tried Galaxy for RNA-seq? Do you think web-based pipelines can replace command line workflows in research?
-- #Bioinformatics #RNAseq #NGS #GalaxyProject #Transcriptomics #Genomics #DataAnalysis #LifeScience #Postdoc #PhDLife
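For readers curious what a quality check is measuring, here is a small sketch that decodes the quality line of FASTQ records into Phred scores and reports the mean per read. It assumes the common Phred+33 encoding and the plain four-line record layout; the two reads are invented, and FastQC itself reports far more (per-base distributions, adapter content, overrepresented sequences).

```python
import io


def mean_read_qualities(fastq_handle, phred_offset=33):
    """Yield (read_id, mean Phred score) for each 4-line FASTQ record."""
    while True:
        header = fastq_handle.readline().rstrip()
        if not header:
            break
        fastq_handle.readline()                      # sequence line (unused here)
        fastq_handle.readline()                      # '+' separator line
        quality = fastq_handle.readline().rstrip()
        scores = [ord(char) - phred_offset for char in quality]
        mean = sum(scores) / len(scores) if scores else 0.0
        yield header[1:], mean                       # drop the leading '@'


# Hypothetical reads: 'I' encodes Phred 40, '5' encodes Phred 20.
fastq_text = io.StringIO(
    "@read_1\nACGT\n+\nIIII\n"
    "@read_2\nACGT\n+\n5555\n"
)
qualities = list(mean_read_qualities(fastq_text))
low_quality = [read_id for read_id, mean in qualities if mean < 30]
print(qualities, low_quality)
```

Averaging Phred scores is a simplification, since the scale is logarithmic, but it is enough to show why some reads are flagged for trimming.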

18 Comments

John Drake

Regents’ Professor of Ecology and Director of the Center for the Ecology of Infectious Diseases

2,089 followers 4mo

Forecasting infectious diseases remains challenging because purely data-driven models often overfit, while traditional compartmental models struggle to adapt when transmission conditions change. Our newly published study in Journal of the Royal Society Interface examines whether physics-informed neural networks (PINNs) can bridge this gap by embedding a full epidemiological ODE system directly into a deep learning model. Using COVID-19 data from California, we evaluate how well this hybrid approach predicts cases, hospitalizations, and deaths across 1–4 week horizons. We find that PINNs produce more stable and more accurate forecasts than naïve baselines and common sequence models (RNNs, LSTMs, GRUs, Transformers), while remaining simpler to implement than large state-space models. The work suggests a viable path toward forecasting frameworks that learn from data while staying anchored to disease dynamics. 🔗 https://lnkd.in/eFkW_egm #Forecasting #COVID19 #AI #MachineLearning #ScientificMachineLearning #PhysicsInformedNeuralNetworks #ComputationalEpidemiology #EpidemiologicalModeling #InfectiousDiseaseModeling #TimeSeriesForecasting #DynamicalSystems
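The idea of embedding an ODE system into the training objective can be sketched without any deep learning library. The example below assumes a plain SIR model (the post does not say which compartments the study uses) and computes a "physics residual": the mean squared amount by which a candidate trajectory violates the SIR equations. In a PINN, a term like this is typically added to the data-fit loss and evaluated on the network's outputs with automatic differentiation; here finite differences on simulated lists stand in for that. All parameter values are invented.

```python
def simulate_sir(beta, gamma, s0, i0, r0, steps, dt):
    """Forward-Euler integration of dS/dt = -beta*S*I/N,
    dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I."""
    s, i, r = [s0], [i0], [r0]
    n = s0 + i0 + r0
    for _ in range(steps):
        new_infections = beta * s[-1] * i[-1] / n
        new_recoveries = gamma * i[-1]
        s.append(s[-1] - dt * new_infections)
        i.append(i[-1] + dt * (new_infections - new_recoveries))
        r.append(r[-1] + dt * new_recoveries)
    return s, i, r


def physics_residual(s, i, r, beta, gamma, dt):
    """Mean squared violation of the SIR equations along a trajectory,
    using forward differences for the time derivatives."""
    n = s[0] + i[0] + r[0]
    total = 0.0
    for k in range(len(s) - 1):
        ds = (s[k + 1] - s[k]) / dt
        di = (i[k + 1] - i[k]) / dt
        dr = (r[k + 1] - r[k]) / dt
        force = beta * s[k] * i[k] / n
        total += ((ds + force) ** 2
                  + (di - force + gamma * i[k]) ** 2
                  + (dr - gamma * i[k]) ** 2)
    return total / (len(s) - 1)


# Hypothetical epidemic: 1,000 people, 10 initially infectious.
s, i, r = simulate_sir(beta=0.3, gamma=0.1, s0=990.0, i0=10.0, r0=0.0,
                       steps=100, dt=0.1)
consistent = physics_residual(s, i, r, beta=0.3, gamma=0.1, dt=0.1)
inconsistent = physics_residual(s, i, r, beta=0.6, gamma=0.1, dt=0.1)
print(consistent, inconsistent)
```

The residual is essentially zero when the trajectory and parameters agree and grows when they do not, which is what anchors a data-driven forecaster to plausible disease dynamics.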

6 Comments

Zulqarnain Yousaf

Dedicated Biotechnologist with expertise in Molecular Biology (PCR),Microbiology and Serology specialising in advanced research, diagnostics, and QC/QA to drive scientific innovation and excellence.

8,067 followers 11mo

Common Experimental Methods in Molecular Biology Molecular biology focuses on the structure and function of molecules essential to life, particularly DNA, RNA, and proteins. It provides the foundation for understanding gene regulation, cellular signaling, and disease mechanisms. A wide array of experimental methods is used to analyze these biomolecules and their interactions. One of the core techniques is nucleic acid extraction, which isolates high-purity DNA or RNA from cells or tissues for downstream applications. Once extracted, Polymerase Chain Reaction (PCR) is used to amplify specific DNA sequences, while quantitative PCR (qPCR) enables real-time quantification of gene expression. Gel electrophoresis is employed to separate DNA, RNA, or proteins based on size, providing a simple but powerful method for assessing molecular integrity and fragment length. Agarose gels are used for nucleic acids, whereas SDS-PAGE is applied to proteins. Reverse transcription (RT) techniques convert RNA into complementary DNA (cDNA), which can then be analyzed by PCR or sequencing to study gene expression. Northern blotting and Southern blotting remain classic tools for detecting specific RNA and DNA sequences, respectively. Western blotting is commonly used to detect specific proteins using antibodies, revealing information about protein expression levels and modifications. In addition, cloning and transformation techniques allow genes to be inserted into plasmids and expressed in host cells, facilitating functional studies or protein production. Advanced tools like CRISPR-Cas9 genome editing, RNA interference (RNAi), and next-generation sequencing (NGS) are now integral to molecular biology, enabling precise gene manipulation and large-scale genomic analysis.
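As a worked example of how qPCR output becomes a gene-expression estimate, here is a minimal sketch of the widely used 2^(−ΔΔCt) calculation. The method is not named in the post, it assumes roughly 100% amplification efficiency for both genes, and the Ct values and function name are invented for illustration.

```python
def fold_change_ddct(ct_target_treated, ct_reference_treated,
                     ct_target_control, ct_reference_control):
    """Relative expression of a target gene (treated vs control),
    normalized to a reference gene, via 2 ** -(delta delta Ct)."""
    delta_treated = ct_target_treated - ct_reference_treated
    delta_control = ct_target_control - ct_reference_control
    return 2 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses the threshold 3 cycles
# earlier in treated cells, while the reference gene is unchanged.
fold = fold_change_ddct(ct_target_treated=22.0, ct_reference_treated=18.0,
                        ct_target_control=25.0, ct_reference_control=18.0)
print(fold)  # each cycle earlier corresponds to a doubling
```

Because each PCR cycle ideally doubles the product, a 3-cycle shift corresponds to an 8-fold difference in starting template under these assumptions.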

6 Comments
