Big Data Utilization in Epidemiological Research

Summary

Big data utilization in epidemiological research means using large, complex datasets—such as online search trends, social media posts, clinical records, and sensor data—to study patterns, causes, and impacts of diseases in populations. This approach helps researchers track outbreaks faster, analyze public health trends, and predict how diseases might spread in real time.

Integrate multiple sources: Combine hospital data, digital signals, and environmental information to gain a fuller picture of population health and spot outbreaks early.
Use machine learning: Apply advanced algorithms to sift through vast data, identify important patterns, and improve prediction accuracy for disease trends and risk groups.
Consider data biases: Be mindful that digital data can miss certain groups and reflect public attention as much as actual disease, so interpret signals carefully and supplement with traditional records when possible.

Summarized by AI based on LinkedIn member posts

Oliver Morgan

Global Health Executive | WHO Director | Strategic Innovator | Public Health Intelligence Leader | Executive Coach | Author | Speaker

7,937 followers 9mo
This new paper by Sergio Consoli et al. explores how generative AI can transform unstructured outbreak data into structured, searchable knowledge. The team developed an epidemiological knowledge graph (eKG) using WHO Disease Outbreak News (DONs), applying an ensemble of large language models to extract details such as disease name, country, date, and number of cases or deaths.

The researchers used open-source models including Mistral, Zephyr, and Meta-Llama to extract information from over 3,000 outbreak reports. They structured this data into a FAIR-compliant knowledge graph, linking it with biomedical and geographic ontologies. The resulting resource, comprising nearly 3,000 outbreak events, is now publicly accessible via SPARQL endpoints and visualization tools.

This matters because many official outbreak reports remain locked in prose, making them difficult to analyze at scale. With eKG, public health professionals can conduct detailed, structured queries across decades of global outbreak data. This significantly improves our ability to track, analyze, and respond to emerging health threats.

The big takeaway: AI can unlock the full value of legacy outbreak data by transforming it into structured, interoperable formats that support real-time analysis and response. This approach opens new possibilities for integrating informal sources like news and social media into formal disease surveillance systems, advancing global preparedness and early warning capabilities.

https://lnkd.in/ePc54yvQ

#GlobalHealth #PathogenSurveillance #HealthInnovation #PublicHealth
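The post mentions an ensemble of language models extracting fields such as disease, country, and case counts, but it does not say how the models' answers are combined. One plausible way is a per-field majority vote, sketched below. The function name `majority_vote`, the `min_agree` threshold, and the sample records are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_vote(extractions, min_agree=2):
    """Merge per-model field extractions, keeping a value only if enough models agree."""
    fields = {key for record in extractions for key in record}
    merged = {}
    for field in sorted(fields):
        # Collect the non-missing answers each model gave for this field.
        values = [r[field] for r in extractions if r.get(field) is not None]
        if not values:
            merged[field] = None
            continue
        value, count = Counter(values).most_common(1)[0]
        # Leave the field empty when the models do not agree often enough.
        merged[field] = value if count >= min_agree else None
    return merged

# Invented outputs from three models for a single report (illustration only).
model_outputs = [
    {"disease": "cholera", "country": "Country A", "cases": 120},
    {"disease": "cholera", "country": "Country A", "cases": 112},
    {"disease": "cholera", "country": "Country A", "cases": 120},
]
consensus = majority_vote(model_outputs)
print(consensus)
```

Leaving a field empty when the models disagree trades coverage for precision, which seems a reasonable default when the output feeds a curated knowledge graph.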

An epidemiological knowledge graph extracted from the World Health Organization’s Disease Outbreak News - Scientific Data nature.com

Akshaya Bhagavathula

Professor of Epidemiology, NDSU | Digital Epidemiologist & AI | PharmacoEpi | Legal Epi | IHME GBD Lead | ACE Fellow | Spatial Informatics

7,563 followers 6mo
Digital Epidemiology: Listening to the Public’s Pulse Through Data

My recent research examined how digital signals reveal shifts in public concern in real time. Using hourly Google Trends data, AI, and interrupted time series modeling, I tracked how searches about #Tylenol and #autism changed after a national announcement.

Key findings:
👉 Search interest in Tylenol increased more than four-fold within an hour, a clear “digital shockwave.”
👉 Related queries quickly turned to #pregnancy, #autism, and #safety, showing how fast public narratives evolve.
👉 Attention decayed within two days, with a half-life of about 46 hours, underscoring how quickly #interest fades #online.

Why this matters
⭐ Digital epidemiology helps me understand how people react to health information at scale.
⭐ By combining explainable #AI with real-time #search data, I can identify information gaps, anticipate #misinformation, and support better risk communication.

Public health #surveillance must evolve beyond counting cases. It must listen to conversations, because the future of population health is biological, #behavioral, #digital, and human.

#DigitalEpidemiology #PublicHealth #AIinHealth #HealthCommunication #Misinformation #PopulationHealth #ScienceCommunication #infodemiology
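To make the reported half-life concrete, here is a small sketch of what a 46-hour half-life implies if the excess search interest decays exponentially. The exponential form is an assumption made for illustration. The post does not state the functional form used in the interrupted time series model.

```python
import math

HALF_LIFE_HOURS = 46.0  # half-life reported in the post

def decay_rate(half_life_hours: float) -> float:
    """Exponential decay constant (per hour) implied by a half-life."""
    return math.log(2) / half_life_hours

def remaining_fraction(hours: float, half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Fraction of the initial excess search interest left after `hours`."""
    return math.exp(-decay_rate(half_life_hours) * hours)

for hours in (24, 46, 48, 92):
    print(f"{hours:>3} h: {remaining_fraction(hours):.2f} of the initial excess remains")
```

Under this assumption, half of the excess interest remains at 46 hours by definition, slightly less than half at two days, and a quarter at 92 hours.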

Oke Ikpekpe

Research Associate | Medical Writer | Epidemiology & Public Health

3,156 followers 3mo
If thousands of people in one city post about fever, sore throat, and loss of smell within the same hour, what does that mean? An outbreak? Or a trending topic?

Traditional epidemiology relies on clinical records, laboratory confirmation, and verified case reports. These data are reliable, but slow. By the time a signal reaches formal surveillance systems, transmission may already be widespread. This gap is where digital epidemiology comes in.

Search queries, social media posts, and data from wearable devices offer something traditional systems cannot: speed. These digital traces can capture shifts in population health in near real time. Signals related to influenza, fever, mental distress, or even medication side effects may appear online before they reach a clinic.

But digital epidemiology comes with its own biases. One is selection bias: platforms like X or Google are not random samples. They skew younger, urban, and digitally connected. Entire groups, often the most vulnerable, remain digitally invisible. An outbreak affecting older adults or rural communities may barely register online, leading to a dangerous blind spot in the models.

Another is information bias: panic, not just pathogens, can drive search trends. During a crisis, public anxiety is often indistinguishable from an infection signal in the data. Google Flu Trends is a classic example. It eventually overestimated influenza activity, in part because people searched for symptoms when they were scared, not only when they were sick. A spike in searches for shortness of breath can reflect a frightening news cycle as much as viral transmission.

This leads to a critical epidemiological question: Is the signal biological, or is it social? Digital data often measure public attention as much as disease dynamics. During COVID-19, searches for anosmia (loss of smell) appeared days before confirmed cases in some regions, but they also rose sharply alongside media coverage. The signal was real, but it was amplified by awareness.

There is also a denominator problem. In digital epidemiology, the denominator is vast but poorly defined. What is actually being measured? Disease occurrence, symptom perception, health anxiety, or collective attention?

So... Is the real value of digital epidemiology in tracking disease, or in tracking the infodemic that shapes how outbreaks unfold?

#DigitalEpidemiology #Epidemiology #PublicHealth #Surveillance #DataScience #HealthTech #Bias #Infodemiology

Jack (Jie) Huang MD, PhD

Chief Scientist I Founder and CEO I President at AASE I Vice President at ABDA I Visit Professor I Editors

35,116 followers 8mo
This newsletter highlights how multi-source data fusion is transforming real-time infectious disease surveillance. By integrating diverse datasets—from hospital records, laboratory test results, wearable devices, environmental sensors, mobile data, and even social media signals—next-generation platforms are enabling earlier detection of outbreaks and more accurate predictions of disease spread. Furthermore, this newsletter explains how machine learning can filter out noise, identify anomalies, and predict the trajectory of outbreaks, while interactive dashboards provide public health officials with real-time hotspot maps and intervention plans. Crucially, ensuring that sensitive health data can be analyzed securely builds public trust. In short, this shift from retrospective reporting to proactive and predictive surveillance represents a new paradigm for precision public health, enhancing global preparedness and recovery capabilities. #InfectiousDiseaseMonitoring #RealTimeSurveillance #MultiSourceData #PublicHealthAI #DigitalHealth #EpidemicForecasting #DataFusion #PrecisionPublicHealth #AIinHealthcare #HealthTech #CSTEAMBiotech

🟥 Real-time Infectious Disease Surveillance Platform Based on Multi-Source Data Fusion Jack (Jie) Huang MD, PhD on LinkedIn

Zhaohui Su

VP, Strategic Consulting @ Veristat | Scientific Leader with 25+ Years in Biostatistics

5,276 followers 9mo
The Cox proportional hazards model is commonly used to analyze censored survival data, but high-dimensional covariates in some real-world evidence (#RWE) studies require robust #feature_selection techniques. With the expansion of “big data,” advanced #machine_learning models like random survival forests (RSF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost) have become essential for survival analysis. RSF leverages decision tree ensembles for non-linear patterns, while GBM and XGBoost iteratively improve prediction accuracy through ensemble learning. These techniques not only enhance computational efficiency and predictive power but also contribute to #personalized_medicine by uncovering patient subgroups suitable for targeted treatments. For more information, please refer to the recent paper below by Cai and colleagues.
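As background for the post above, the following is a minimal sketch of the quantity a Cox model maximizes, the partial log-likelihood, written in plain Python to show how censored subjects enter the calculation. It is not taken from the paper by Cai and colleagues. The function name and the toy data are hypothetical, and tied event times are handled with the simple Breslow convention.

```python
import math

def cox_partial_log_likelihood(times, events, risk_scores):
    """Cox partial log-likelihood (Breslow convention for ties).

    times:       follow-up time for each subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: linear predictor (beta . x) for each subject
    """
    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(s) for t, s in zip(times, risk_scores) if t >= t_i)
        total += risk_scores[i] - math.log(risk_sum)
    return total

# Toy data: three subjects, the last one censored, all with equal risk scores.
log_lik = cox_partial_log_likelihood([1, 2, 3], [1, 1, 0], [0.0, 0.0, 0.0])
print(log_lik)
```

With many covariates the risk score has many coefficients to estimate, which is where the feature selection and tree-ensemble methods named in the post come in.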
AIMMLab - Artificial Intelligence and Mathematical Modelling Lab

Home to CRC in Community-Oriented Disease Modeling, AI4PEP and ACADIC | Expert in AI, Data Analytics & Mathematical Modelling| Design & Deploy AI and Mathematical solutions for decision-makers in various sectors.

2,217 followers 9mo
🎉 Preprint Update! 🎉

AIMMLab - Artificial Intelligence and Mathematical Modelling Lab is excited to share that our latest work, "Integrating Deep Learning Methods and Web-Based Data Sources for Surveillance, Forecasting, and Early Warning of Avian Influenza", has consistently been the top downloaded article for several weeks now. Thanks, everyone, for reading it.

📄 Read the full paper here: https://lnkd.in/e9VSjV5j

💡 Why it matters: Avian Influenza (HPAI-H5N1) poses a growing public health and economic threat. Our study uses Canada as a case study to show how integrating web-based digital signals (like Google Trends, Reddit, Facebook, and news data) with deep learning models can improve outbreak forecasting and early warning systems.

🧠 What we did: We evaluated 7 public web-based and environmental data sources alongside historical outbreak data. We tested multiple models, including:
- GRU, LSTM, CNN-GRU, CNN-LSTM
- Random Forest, SVM, and Naïve Bayes

📊 What we found:
- GRU outperformed all other models.
- Historical HPAI case counts, Facebook post volume, and minimum temperature were the strongest predictors.
- Our findings highlight the promise of digital epidemiology in strengthening animal and public health surveillance.

👏 Shout-out to our amazing Postdoc Dr Zahra Movahedi Nia for this excellent research.

🙏 Funding Acknowledgement: This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN-2022-04559), and additional grants from SSHRC-CRSH and the International Development Research Centre (IDRC).

🤝 Interested in disease forecasting, digital epidemiology, or collaborations on zoonotic disease surveillance? Let’s talk.

📧 Contact: aimmlab.dlsph@utoronto.ca

Abbas Yazdinejad|Gelan Ayana Zewdie, Ph.D.|Yang Xu|Cynthia Luo|Yiyang LIU|Qing Han|Lavneet Singh|Jude Kong|Aseel Magzoub