Analyzing Hospital Data with SQL
Ever wondered what hospital stays can tell us about how efficiently care is delivered?
For this project, I stepped into the role of a healthcare data analyst, using SQL. I wanted to understand what’s happening behind the scenes, from how long patients stay, to how many procedures they go through, and whether care is being delivered equitably.
The dataset had over 170,000 records spread across two tables: one with patient demographics and another with medical data. I looked at everything from age and race to lab procedures and length of stay to spot patterns and opportunities for improvement.
I learned how to write cleaner queries and pull real insights out of raw data, all while getting a deeper look at how hospitals run day to day.
Key Questions
The Data
This dataset comes from a Kaggle Project on Hospital Readmissions and contains 2 main tables;
Syntax Used
Using SQL functions, including JOIN, UNION, CASE WHEN, CONCAT, ROUND, RPAD, HAVING, and other functions to answer real-world questions from leadership within a hospital.
Hospital Stay Breakdown (in days)
Tracking how long patients stay in the hospital is essential for understanding both the quality and efficiency of care. SQL isn't built to create visual charts like histograms, but I was able to query a workaround to create a visual.
The query groups patients by how many days they are staying and rounds them into day-based buckets. I used the RPAD() function to generate a visual bar made out of asterisks, where each *represents 100 patients.
The histogram shows how many patients stayed between 1 and 14 days, with a strong trend: most patients are staying in the hospital for under 7 days. The largest number of patients are staying in the hospital for 1 to 4 days. This indicated that the hospital does a good job of taking care of its patients and keeping beds available for incoming patients who need care.
Which medical specialties perform the most procedures on average?
Understanding which medical specialties perform the most procedures on average is crucial for efficient hospital operations and patient care. It helps healthcare administrators allocate resources like staff, equipment, and budget where they’re needed most. To answer this, I wrote a query that used an aggregate function to calculate the average number of procedures per medical specialty. I added a HAVING clause to focus on only specialties with over 50 patients and an average of more than 2.5 procedures.
The results show that specialties like Thoracic Surgery, Cardiovascular Surgery, Radiology, Cardiology, and Vascular Surgery have the highest average number of procedures.
Are lab procedures being distributed equitably across different racial groups?
This analysis was prompted by a concern from the Chief of Nursing, who had heard a rumor suggesting that access to lab procedures might vary across racial groups.
To analyze, I joined the demographics and health tables using the patient ID (Patient_NBR) as the unique identifier. I then grouped the data by race and calculated the average number of lab procedures for each group, sorting the results in descending order to highlight any significant disparities.
The results show that the average number of lab procedures across racial groups ranges from 40.86 to 44.08. This variation in the data is minimal, indicating that there is no major difference in how lab procedures are administered by race.
Recommended by LinkedIn
Is there a connection between how many procedures a patient has and how long they stay?
The hospital director wanted to understand whether there is a connection between the number of lab procedures a patient receives and their length of stay. I first used SQL aggregate functions MIN, AVG, and MAX to summarize the overall distribution of lab procedures. This initial step provided a baseline for identifying trends and patterns.
Now that I have the minimum, average, and maximum number of lab procedures identified. I created a detailed CASE statement to categorize the patients into 3 distinct groups
The data shows an upward trend: patients who receive more procedures are in the hospital on average longer. This is a strong positive correlation between time in hospital and how many of procedures a patient undertakes.
Patient IDs for Focused Diagnostic Testing
The research team submitted an urgent request to identify patients eligible for upcoming medical tests, focusing on African American individuals and those with a recent increase in their metformin dosage. To fulfill this, I used the UNION operator to combine relevant results from the demographics and health tables, generating a single list of patient IDs.
Top 100 Patients by Medication Count and Lab Procedures
To conclude the analysis, I was tasked with creating a summary of the top 100 patients based on the total number of medications they received, using the number of lab procedures as a tiebreaker. The final output was structured according to a specific format:
"Patient [patient_id] was [race] and was [readmitted/not readmitted]. They had [x] medications and [x] lab procedures.
Main Takeaways
CALL FOR ACTION
I hope you enjoyed my analysis and would love to hear your thoughts! If you have any questions or insights, please leave a comment down below. I am currently looking for a Data Analyst job. If you are a recruiter, hiring manager, or your company is hiring, please reach out to me on LinkedIn or send me an email at serio.aaron@yahoo.com. Thank you for reading, and let's connect!
Email - Serio.aaron@yahoo.com
LinkedIn - Aaron Serio