CDC Data 2021: What Affects Weight?

In 2021, heart disease claimed the lives of about 695,000 people in the United States, roughly 1 in 5 deaths that year. Weight and smoking are well-established contributors to heart disease. This analysis, conducted using CDC Chronic Disease Indicators, was an exercise in using Python and pandas to draw insights from the data.

Statistics like these are only useful if they reach the general public. Knowing which factors matter helps people make informed decisions and adopt healthier habits, which may reduce the incidence of heart disease.

The first step was to import the Python packages and get familiar with the dataset by examining its columns and their data types. The dataset has 17 columns of object and float types, which offers a wide range of variables to analyze. This analysis focuses on gender, age, weight, family history of obesity, and smoking status.
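
As a rough illustration of that first look at the data, here is a minimal pandas sketch. The column names and the four rows are hypothetical stand-ins, not the real dataset; with the actual file, the DataFrame would come from something like pd.read_csv instead.

```python
import pandas as pd

# Tiny stand-in for the real dataset. Column names and values are
# hypothetical; the real file has 17 columns.
df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Male", "Female"],
        "Age": [21.0, 23.0, 27.0, 19.0],
        "Weight": [64.0, 77.0, 87.0, 56.0],
        "family_history": ["yes", "yes", "no", "yes"],
        "SMOKE": ["no", "no", "yes", "no"],
    }
)

# Size of the table as (rows, columns).
print(df.shape)

# Data type of each column. Text columns show up as object
# (or as a string dtype in newer pandas versions), numeric ones as float64.
print(df.dtypes)

# Names of the numeric columns only.
numeric_cols = df.select_dtypes(include="number").columns

# Keep only the columns this analysis focuses on.
focus = df[["Gender", "Age", "Weight", "family_history", "SMOKE"]]
```

Checking the data types first shows which columns can go straight into numeric summaries and which are categories to group by.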

A histogram was used to check the gender distribution, since biological differences between genders may affect weight outcomes. The distribution was fairly even, with over 1,000 participants of each gender, so gender comparisons are not skewed by one group being much smaller than the other. The age distributions for both genders were then examined; both peaked between ages 15 and 30, with a mean age of 24. The rest of the analysis therefore focuses on ages 15 to 30.
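
The group counts and the age filter described above could be sketched in pandas as follows. The data and column names are made up for illustration, and treating the 15 to 30 range as inclusive at both ends is an assumption.

```python
import pandas as pd

# Hypothetical sample; the real dataset has over 1,000 rows per gender.
df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
        "Age": [21.0, 23.0, 41.0, 19.0, 14.0, 30.0],
    }
)

# Number of participants per gender (the numbers behind the gender chart).
gender_counts = df["Gender"].value_counts()

# Keep rows with ages from 15 to 30. Series.between is inclusive at
# both ends by default; whether the original analysis included the
# endpoints is an assumption here.
in_range = df[df["Age"].between(15, 30)]

# Mean age of the retained rows.
mean_age = in_range["Age"].mean()

print(gender_counts)
print(len(in_range), mean_age)
```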

Scatterplots of weight (dependent variable) against age (independent variable) were generated for each gender. Males showed a modest correlation between age and weight, while females showed almost none. One possible explanation is that factors other than age, such as lifestyle, play a larger role in weight for women than for men, although the scatterplots alone cannot confirm this.
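
To put a number on what the scatterplots show, one option is the Pearson correlation between age and weight within each gender. The sketch below uses invented values chosen only to contrast a strong and a weak relationship; they are not the article's results, and the column names are hypothetical.

```python
import pandas as pd

# Invented data: weight rises with age for one group and not the other.
df = pd.DataFrame(
    {
        "Gender": ["Male"] * 5 + ["Female"] * 5,
        "Age": [16, 19, 22, 25, 28, 16, 19, 22, 25, 28],
        "Weight": [60, 66, 71, 79, 84, 58, 52, 61, 55, 57],
    }
)

# Pearson correlation (the default for Series.corr) of age and weight,
# computed separately for each gender.
corr_by_gender = {
    gender: group["Age"].corr(group["Weight"])
    for gender, group in df.groupby("Gender")
}

for gender, r in corr_by_gender.items():
    print(f"{gender}: r = {r:.2f}")
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none.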

To test for a gender difference, an ANOVA test was conducted. It yielded an F-statistic of nearly 57 and a p-value close to 0. The small p-value indicates a statistically significant difference between genders. The F-statistic and the p-value come from the same test, so the large F is another expression of that result rather than independent confirmation, and neither number by itself says how large the difference is.
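
For readers unfamiliar with the F-statistic, the following is a minimal pure-Python sketch of a one-way ANOVA: it compares the variation between group means to the variation within groups. The sample weights are invented, and this is not necessarily how the original analysis computed its result. In practice a library routine such as scipy.stats.f_oneway would also return the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented weights (kg) for two groups.
male = [70.0, 82.0, 77.0, 90.0, 85.0]
female = [58.0, 64.0, 61.0, 70.0, 66.0]

f_stat, df_b, df_w = one_way_anova_f(male, female)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

With exactly two groups, a one-way ANOVA is equivalent to a pooled two-sample t-test (F equals t squared); ANOVA becomes the natural choice when comparing three or more groups.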

Similar histograms and ANOVA tests were produced for family history of obesity and smoking status. Far more individuals had a family history of obesity than did not, and there were substantially more non-smokers than smokers in the dataset. Because the groups are so unbalanced, conclusions from these comparisons should be drawn with caution.

The ANOVA tests for both family history of obesity and smoking status produced very low F-statistics and p-values approaching 1.0. In other words, the tests found no statistically significant difference between those with and without a family history of obesity, or between smokers and non-smokers. Given the unbalanced groups, this is a lack of evidence for a difference rather than proof that none exists.

In conclusion, this analysis explored the relationships between gender, age, weight, family history of obesity, and smoking status, all of which bear on risk factors for heart disease. Sharing findings like these with the public can support informed decision-making and healthier lifestyles. They can also guide future research and interventions aimed at reducing the prevalence of heart disease.

Very interesting topic for analysis! I read about the ANOVA test somewhere. I have only done a t-test thus far on continuous data. When do you use the ANOVA test and when do you use a t-test?
