Analyzing Hospital Data with SQL
Picture Credits: Canva

Analyzing Hospital Data with SQL

Introduction

As a physical therapist, I’ve spent years at patients’ bedsides—helping them recover, regain function, and navigate their hospital journeys. But I always had a quiet curiosity: What drives the decisions behind the care plans? How are hospital resources managed? Is care really equitable across patient populations?

When I stepped into this new hypothetical role as a healthcare data analyst, I was handed a long list of tasks by my overwhelmed boss. At first, it felt like stepping into a different world—spreadsheets instead of stretchers, SQL instead of SOAP notes. But soon, that initial pressure turned into a sense of excitement. This was a chance to look behind the curtain and explore the systems I’d been working within for years.

Why THIS Project?

This project became my way of connecting the dots between clinical care and data-driven decisions. By analyzing hospital stay lengths, identifying which specialties performed the most procedures, and examining patterns in lab testing and medication use, I could start to see how data tells a story—one that impacts real patients, real workflows, and real outcomes.

I was especially drawn to the questions about equitable care. As someone who has treated patients from all backgrounds, it was encouraging to see that the data showed minimal disparity in lab procedures across racial groups. That insight, along with others, reaffirmed why this work matters.

In short, this project wasn’t just about running queries—it was about blending my clinical lens with analytical tools to uncover insights that can genuinely improve patient care and help hospitals function more effectively.

Information on dataset:

The data set can be found on Kaggle and represents 10 years (1999-2008) of clinical care data from 130 US hospitals and integrated delivery networks. This extensive data provided a solid foundation for my analysis. There are 2 main tables:

  • Demographics: Patient race, gender, and age (71,000+ records)
  • Health: Length of hospital stay, medical specialty, procedures performed, and medications (101,000+ records).

Key Takeaways

  • Most patients leave the hospital within 7 days, which is crucial for optimizing bed availability.
  • Thoracic Surgery, Cardiovascular Surgery, and Radiology are the busiest specialties.
  • There are minimal differences in lab procedures among racial groups, indicating equitable treatment.
  • More lab procedures are linked to longer hospital stays.
  • The analysis of medication usage offers insights into age-specific treatment strategies.

The Data Analysis

Hospital Stay Length

Time in the hospital is a BIG part of hospital operations. The hospital has one limiting resource, their size. Our health care data analyst boss wants to know what the distribution of time spent in the hospital looks like. They're also curious to know if the majority stay less than 7 days. Once patients stay over 7 days, the hospital wants to ensure these patients are very acute. For this, I first created a bucket for the number of days patients stay in the hospital, and then created the bar histogram using the RPAD function.

This gave me the answer that most of the patients stay less than 7 days in the hospital.

Article content
Bucketing and RPAD-Based Visualization of Time in Hospital


Article content
Histogram of Time in Hospital Using RPAD for Visual Frequency Representation

Medical Specialties with the Highest Procedure Averages

Next, there is a hypothetical brand new Hospital Director who wants to know what medical specialties are doing the most number of procedures on average, since procedures make up the significant cost of the hospital. They've asked to get a list of all the specialties and the average total of the number of procedures.

I used a HAVING clause, limiting the focus only on specialties with over 50 patients and an average of more than 2.5 procedures. This helped highlight the departments driving the bulk of procedural activity. I also used a ROUND function, to round up my average procedures up to 1 decimal unit.


Article content
Identifying High-Procedure Medical Specialties Using AVG and GROUP BY

The findings indicate that specialties such as Thoracic Surgery, Cardiovascular Surgery, Radiology, Cardiology, and Vascular Surgery perform the highest average number of procedures. This suggests their critical role in the hospital, likely driven by high patient demand for both surgical interventions and diagnostic services.


Article content
Ranking Medical Specialties by Average Procedure Volume

Checking for Racial Disparities in Lab Procedure Access

Next, the Chief of Nursing wants to know if the hospital seems to be treating patients of different races differently, specifically with the number of lab procedures done.

To find this, I used a JOIN function, to join the demographic and health tables, using patient_num as a foreign key relationship. Then, I calculated the average number of lab procedures per racial group by grouping the data by race and sorting the results in descending order to identify any noticeable differences.


Article content
Average Number of Lab Procedures by Race Using JOIN and GROUP BY

The analysis reveals that the average volume of lab procedures among different racial groups varies only slightly. This small difference indicates there is no significant disparity in how lab services are provided across races. Therefore, the data does not support the claim of unequal treatment based on race.


Article content
Analyzing Lab Procedure Averages Across Racial Groups

Exploring the Relationship Between Lab Procedures and Hospital Stay

Next, the boss asked me to give some insight on how the number of lab procedures might correlate to the number of days in hospital. They want to know do those who get a lot of lab procedures, stay in the hospital longer?

For this, I used a CASE WHEN function, that created a column called procedure_frequency that is either "few", "average", or "many". Next, by using GROUP BY and AVG functions I could evaluate how the average length of stay changes based on what procedure frequency group.


Article content
Categorizing Lab Procedure Volume Using CASE Statement

I found that there is a correlation between these; the longer someone is at the hospital the more lab procedures they have.


Article content
Lab Procedure Bucketing into 'Few', 'Average', and 'Many'

Patient ID List for Targeted Medical Testing

The research team issued an urgent request to identify patients with patient ID for upcoming medical tests—focusing on African American individuals and those with an increased metformin dosage.

To fulfill this, I used the UNION function to combine relevant results from the demographics and health tables, producing a unified list of patient IDs meeting either criterion.


Article content
Identifying Patients Who Are African American or Have Increased Metformin Dosage Using UNION


Article content

Highlighting Emergency Patient Success Stories

The hospital administrator sought to identify notable cases involving patients admitted through the emergency department (admission type ID = 1) who had hospital stays shorter than the overall average.

To uncover these cases, I wrote a query that retrieved all patient records, applied a filter for emergency admissions, and further refined the results to include only those with a length of stay below the calculated overall average, using a subquery for comparison.


Article content
Emergency Admissions with Hospital Stay Below Average (Subquery Approach)

Article content
Identifying Short-Stay Emergency Admissions Using Subquery

Top 50 Patients by Medication Count and Lab Procedures

Our boss has asked us to write a summary for the top 50 medication patients, make any ties with the number of lab procedures the sort the highest at the top. The output should be in the following format "Patient [patient_id] was [race] and was [readmitted/not readmitted]. They had [x] medications and [x] lab procedures."

For this, I used a query that combines patient demographic and medical data into a single summary string using SELECT and CONCAT, labels readmission status with a CASE statement, and joins the necessary tables. It ranks patients by medication count (breaking ties with lab procedures) using ORDER BY, and returns the top 50 results with LIMIT.


Article content
Concatenated Patient Summary Sorted by Medications and Lab Procedures


Article content

When I reviewed this, I realized there is a large gap in number of procedures. Further investigating, I realized that number of lab procedures were stored in VARCHAR, which I ALTER to INT (integer) to give me following result.


Article content
Using ALT to clean the data


Article content

Ranking Common Diabetes Medications by Patient Age

Lastly, the Hospital manager asked me, "How do the top 3 medications are ranked per age group?" the top 3 medications were insulin, metformin, and glipizide. Now these 3 medications were in 3 different columns in the health table.

To get the answer, I used UNION to stack medication columns into a long format (transposing the table), then wrapped it in a CTE. I joined it with demographics to include age, grouped and counted records by age and medication, and used a window function to rank them.


Article content
Ranking Top Diabetes Medications by Age Group Using UNION and Window Functions


Article content
Analyzing and Ranking Metformin, Glipizide, and Insulin Use by Age

Main Takeaways

Through this project, I learned that targeted data analysis can provide actionable insights that improve hospital operations and patient care. Here are some important points:

  • Understanding bed utilization and patient length of stay helps optimize resource allocation. This insight can guide hospitals to improve patient flow and reduce bottlenecks, enhancing overall efficiency.
  • Identifying high-impact specialties can guide administrative focus and budgeting. Targeted investments in these specialties may yield better patient outcomes and more cost-effective care delivery.
  • Evidence-based evaluations of care delivery equity are essential. Regular monitoring ensures all patient groups receive fair treatment and helps address potential disparities proactively.
  • Recognizing correlations between procedures and length of stay aids in discharge planning. Clinicians and administrators can use this information to anticipate patient needs and streamline care transitions.
  • Understanding medication usage patterns informs treatment strategies tailored to patients' needs. Personalizing medication protocols based on age and other factors can improve effectiveness and patient adherence.

Conclusion and Personal Reflections

This project taught me a lot about the importance of data quality and the impact of analysis on real-world healthcare decisions. Cleaning the data was challenging, but it reinforced the idea that accuracy is vital for trust in healthcare analytics. I also gained insight into how data can drive equity and efficiency in patient care, shaping my perspective as I move forward in my career.

Call to Action

If you found this article insightful, I invite you to connect with me on LinkedIn! Let’s share thoughts and explore opportunities together. If you or someone you know is looking to hire a data/healthcare analyst, let's chat. I would love to hear your thoughts or answer any questions you have!

Enlighten! I like how you clearly presented the insight.

You’ve done a fantastic job with this project. Your personal voice adds real depth.

Like
Reply

To view or add a comment, sign in

More articles by Asmi Dave, PT, DPT, CI, MSHI

  • From Ore to Insight: Quality Forecasting with Python

    Data tells a deeper story than we might expect at first glance. For this project, I dove into a mining process dataset…

    1 Comment
  • Analyzing NBA Data with Tableau (2022)

    Working on projects at opposite ends of my comfort zone has been a unique learning experience. My previous project in…

    12 Comments
  • Excel On Delivery: DoorDash Data Unboxed

    DoorDash is a familiar name—we’ve all seen it, and most of us have used it at some point. With so many orders and…

    5 Comments

Others also viewed

Explore content categories