Analyzing Hospital Data with SQL

Analyzing Hospital Data with SQL

Ever wondered what hospital stays can tell us about how efficiently care is delivered?

For this project, I stepped into the role of a healthcare data analyst, using SQL. I wanted to understand what’s happening behind the scenes, from how long patients stay, to how many procedures they go through, and whether care is being delivered equitably.

The dataset had over 170,000 records spread across two tables: one with patient demographics and another with medical data. I looked at everything from age and race to lab procedures and length of stay to spot patterns and opportunities for improvement.

I learned how to write cleaner queries and pull real insights out of raw data, all while getting a deeper look at how hospitals run day to day.


Key Questions

  • What is the distribution of patient hospital stays in days?
  • Which medical specialties perform the most procedures on average?
  • Are lab procedures being distributed equitably across different racial groups?
  • Is there a connection between how many procedures a patient has and how long they stay?
  • Can SQL help answer custom requests, like identifying successful emergency cases or filtering for specific types of patients?


The Data

This dataset comes from a Kaggle Project on Hospital Readmissions and contains 2 main tables;

  • Demographics: Patient race, gender, and age (71,000+ records)
  • Health: Length of hospital stay, medical specialty, procedures performed, and medications (101,000+ records)


Syntax Used

Using SQL functions, including JOIN, UNION, CASE WHEN, CONCAT, ROUND, RPAD, HAVING, and other functions to answer real-world questions from leadership within a hospital.


Hospital Stay Breakdown (in days)

Tracking how long patients stay in the hospital is essential for understanding both the quality and efficiency of care. SQL isn't built to create visual charts like histograms, but I was able to query a workaround to create a visual.

The query groups patients by how many days they are staying and rounds them into day-based buckets. I used the RPAD() function to generate a visual bar made out of asterisks, where each *represents 100 patients.

Article content

The histogram shows how many patients stayed between 1 and 14 days, with a strong trend: most patients are staying in the hospital for under 7 days. The largest number of patients are staying in the hospital for 1 to 4 days. This indicated that the hospital does a good job of taking care of its patients and keeping beds available for incoming patients who need care.

Article content

Which medical specialties perform the most procedures on average?

Understanding which medical specialties perform the most procedures on average is crucial for efficient hospital operations and patient care. It helps healthcare administrators allocate resources like staff, equipment, and budget where they’re needed most. To answer this, I wrote a query that used an aggregate function to calculate the average number of procedures per medical specialty. I added a HAVING clause to focus on only specialties with over 50 patients and an average of more than 2.5 procedures.

Article content

The results show that specialties like Thoracic Surgery, Cardiovascular Surgery, Radiology, Cardiology, and Vascular Surgery have the highest average number of procedures.

Article content

Are lab procedures being distributed equitably across different racial groups?

This analysis was prompted by a concern from the Chief of Nursing, who had heard a rumor suggesting that access to lab procedures might vary across racial groups.

To analyze, I joined the demographics and health tables using the patient ID (Patient_NBR) as the unique identifier. I then grouped the data by race and calculated the average number of lab procedures for each group, sorting the results in descending order to highlight any significant disparities.

Article content

The results show that the average number of lab procedures across racial groups ranges from 40.86 to 44.08. This variation in the data is minimal, indicating that there is no major difference in how lab procedures are administered by race.

Article content

Is there a connection between how many procedures a patient has and how long they stay?

The hospital director wanted to understand whether there is a connection between the number of lab procedures a patient receives and their length of stay. I first used SQL aggregate functions MIN, AVG, and MAX to summarize the overall distribution of lab procedures. This initial step provided a baseline for identifying trends and patterns.

Article content
Article content

Now that I have the minimum, average, and maximum number of lab procedures identified. I created a detailed CASE statement to categorize the patients into 3 distinct groups

  • Few: Less than 25 procedures
  • Average: Between 25 and 55 procedures
  • Many: More than 55 lab procedures

Article content

The data shows an upward trend: patients who receive more procedures are in the hospital on average longer. This is a strong positive correlation between time in hospital and how many of procedures a patient undertakes.

Article content

Patient IDs for Focused Diagnostic Testing

The research team submitted an urgent request to identify patients eligible for upcoming medical tests, focusing on African American individuals and those with a recent increase in their metformin dosage. To fulfill this, I used the UNION operator to combine relevant results from the demographics and health tables, generating a single list of patient IDs.

Article content


Article content
Preview of IDs for patients flagged by either condition

Top 100 Patients by Medication Count and Lab Procedures

To conclude the analysis, I was tasked with creating a summary of the top 100 patients based on the total number of medications they received, using the number of lab procedures as a tiebreaker. The final output was structured according to a specific format:

"Patient [patient_id] was [race] and was [readmitted/not readmitted]. They had [x] medications and [x] lab procedures.
Article content

  • SELECT + CONCAT: Combined key patient details into a single summary sentence.
  • CASE Statement: Evaluated each patient's readmission status and labeled them accordingly.
  • JOIN: Merged the demographics and health tables to include both personal and clinical information.
  • ORDER BY: Ranked patients based on the number of medications, using lab procedures to resolve any ties.
  • LIMIT: Returned only the top 100 patients, as specified in the request.

Article content

Main Takeaways

  • The majority of patients were discharged within 1–4 days, suggesting efficient patient flow and resource management.
  • Thoracic and cardiovascular surgery specialties had the highest average procedure counts, reflecting the intensive nature of those departments.
  • The average number of lab procedures showed minimal variation between racial groups, suggesting equitable lab testing practices in this dataset.
  • Patients who received more lab procedures tended to stay in the hospital longer, indicating a possible connection between treatment complexity and length of stay.


CALL FOR ACTION

I hope you enjoyed my analysis and would love to hear your thoughts! If you have any questions or insights, please leave a comment down below. I am currently looking for a Data Analyst job. If you are a recruiter, hiring manager, or your company is hiring, please reach out to me on LinkedIn or send me an email at serio.aaron@yahoo.com. Thank you for reading, and let's connect!

Email - Serio.aaron@yahoo.com

LinkedIn - Aaron Serio

To view or add a comment, sign in

More articles by Aaron Serio

Others also viewed

Explore content categories