CDC Data 2021: What Affects Weight?
In 2021, heart disease claimed the lives of 695,000 people in the United States, accounting for about 1 in 5 deaths that year. Excess weight and smoking are well-established contributors to heart disease. The purpose of this analysis, conducted using the CDC Chronic Disease Indicators, was to practice Python and Pandas programming.
Given these alarming statistics, disseminating this information to the general public is clearly important.
The initial phase of the analysis involved importing Python packages and getting acquainted with the dataset by examining the available columns and their data types. The dataset contained 17 columns of object (text) and float data types, offering a wide range of variables for potential analysis. This investigation focused on gender, age, weight, family history of obesity, and smoking status.
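The inspection step can be sketched in pandas as follows. This is a minimal sketch, not the author's code: the column names (`gender`, `age`, `weight`, `family_history_obesity`, `smoker`) and the helper `inspect_and_select` are hypothetical, and the tiny frame is made-up stand-in data for the real CDC file, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical names for the five columns used in this analysis;
# the real dataset's labels may differ.
COLUMNS_OF_INTEREST = ["gender", "age", "weight", "family_history_obesity", "smoker"]


def inspect_and_select(df: pd.DataFrame) -> pd.DataFrame:
    """Report column names and dtypes, then keep only the analysis columns."""
    df.info()  # prints the row count, each column's dtype, and non-null counts
    # Text columns appear as object dtype (newer pandas may report a string dtype);
    # numeric columns such as age appear as float64.
    print(df.dtypes.value_counts())
    return df[COLUMNS_OF_INTEREST].copy()


# Made-up stand-in for the real file, normally loaded with pd.read_csv(...)
raw = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "age": [21.0, 23.0, 27.0],
    "weight": [64.0, 77.0, 80.0],
    "family_history_obesity": ["yes", "yes", "no"],
    "smoker": ["no", "no", "yes"],
    "unused_column": ["a", "b", "c"],
})
subset = inspect_and_select(raw)
```

Selecting the columns up front keeps later steps from accidentally depending on the other fields.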
A histogram was used to assess the gender distribution within the dataset, given the recognized biological differences between genders and their potential impact on weight. The results showed a relatively even distribution, with over 1,000 participants of each gender, which makes gender-based conclusions from this analysis more robust. Next, the age distribution for each gender was examined; both peaked between ages 15 and 30, with a mean age of 24. Therefore, ages 15 to 30 were selected for further investigation.
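A minimal pandas sketch of the gender count, the age summary, and the 15 to 30 age filter. The helper name `summarize_and_filter`, the column names, and the toy rows are hypothetical; only the 15 and 30 bounds come from the text.

```python
import pandas as pd


def summarize_and_filter(df: pd.DataFrame, low: float = 15, high: float = 30) -> pd.DataFrame:
    """Print gender counts and age statistics, then keep rows with low <= age <= high."""
    # Participants per gender: the same counts a bar chart/histogram of gender displays
    print(df["gender"].value_counts())
    # Mean, spread, and quartiles of age within each gender
    print(df.groupby("gender")["age"].describe())
    # Series.between includes both endpoints by default
    return df[df["age"].between(low, high)]


people = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "age": [14.0, 15.0, 22.0, 30.0, 31.0, 45.0],
})
young = summarize_and_filter(people)
```

Because `between` is inclusive, participants aged exactly 15 or 30 are kept.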
Scatterplots were generated using age as the independent variable and weight as the dependent variable.
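One way to draw such a plot with Matplotlib, splitting the points by gender so the groups can be compared. This is a sketch under assumptions: the column names and the helper `plot_weight_vs_age` are hypothetical, and weight is left unitless because the original does not state its unit.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_weight_vs_age(df: pd.DataFrame):
    """Scatter weight (dependent) against age (independent), one color per gender."""
    fig, ax = plt.subplots()
    # One scatter series per gender so the two groups are visually distinct
    for gender, group in df.groupby("gender"):
        ax.scatter(group["age"], group["weight"], label=gender, alpha=0.6)
    ax.set_xlabel("Age")
    ax.set_ylabel("Weight")
    ax.legend(title="Gender")
    return fig, ax
```

A semi-transparent `alpha` helps when many participants share the same age, as they would in a dataset concentrated between 15 and 30.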
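To validate these findings, a one-way ANOVA comparing mean weight between genders was conducted. It yielded an F-statistic of nearly 57 and a p-value approaching 0. A p-value this small means a difference of this size would be very unlikely if both genders had the same mean weight, so the gender-based difference in weight is statistically significant. (The p-value is computed from the F-statistic, so the two express the same result rather than independently corroborating each other.)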
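A one-way ANOVA of this kind can be run with `scipy.stats.f_oneway`, passing one array of weights per group. The sketch below assumes hypothetical column names and a hypothetical helper `anova_by_group`; the toy weights are invented and do not reproduce the article's F of about 57.

```python
import pandas as pd
from scipy import stats


def anova_by_group(df: pd.DataFrame, group_col: str, value_col: str = "weight"):
    """One-way ANOVA: does the mean of value_col differ across levels of group_col?"""
    # One array of values per group (e.g. one per gender)
    samples = [group[value_col].to_numpy() for _, group in df.groupby(group_col)]
    result = stats.f_oneway(*samples)
    return result.statistic, result.pvalue


toy = pd.DataFrame({
    "gender": ["Female"] * 4 + ["Male"] * 4,
    "weight": [55.0, 60.0, 62.0, 58.0, 75.0, 80.0, 78.0, 82.0],
})
f_stat, p_value = anova_by_group(toy, "gender")
```

For this toy data the between-group sum of squares is 800 (1 degree of freedom) and the within-group sum of squares is 53.5 (6 degrees of freedom), so F = 800 / (53.5 / 6), roughly 89.7.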
Similar histograms and ANOVA tests were run for family history of obesity and smoking status. These groups were far from balanced: many more participants had a family history of obesity than did not, while non-smokers greatly outnumbered smokers. Because of these imbalances, conclusions from these comparisons should be drawn with caution.
The ANOVA results for both family history of obesity and smoking status showed very low F-statistics and p-values approaching 1.0. This indicates no statistically significant difference in mean weight between those with and without a family history of obesity, or between smokers and non-smokers. A high p-value means the data provide no evidence of a difference; it does not prove that no difference exists.
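Both remaining factors have exactly two levels, so the same comparison can be looped over them while also reporting group sizes, which makes the imbalance visible. The sketch below is an assumption-laden illustration: `compare_binary_factors` and the column names are hypothetical. It also runs `scipy.stats.ttest_ind` alongside `f_oneway`, because with two groups a one-way ANOVA and Student's pooled-variance t-test are the same test (F equals t squared and the p-values match).

```python
import pandas as pd
from scipy import stats


def compare_binary_factors(df: pd.DataFrame, factors, value_col: str = "weight") -> pd.DataFrame:
    """Run a one-way ANOVA and a pooled-variance t-test for each two-level factor."""
    rows = []
    for factor in factors:
        # Unpacking into exactly two groups raises ValueError if the factor has more levels
        (_, a), (_, b) = df.groupby(factor)[value_col]
        anova = stats.f_oneway(a, b)
        # Student's t-test (equal_var=True is the default); for two groups F == t**2
        t_test = stats.ttest_ind(a, b)
        rows.append({
            "factor": factor,
            "n_per_group": (len(a), len(b)),  # exposes imbalanced group sizes
            "F": anova.statistic,
            "t_squared": t_test.statistic ** 2,
            "anova_p": anova.pvalue,
            "t_test_p": t_test.pvalue,
        })
    return pd.DataFrame(rows)
```

With more than two groups only the ANOVA applies; with two, either test gives the same answer.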
In conclusion, this analysis explored how gender, age, family history of obesity, and smoking status relate to weight, a key risk factor for heart disease. In this sample, weight differed significantly by gender, but not by family history of obesity or smoking status. The results underscore the importance of sharing such findings with the public to support informed decision-making and healthier lifestyles, and they can guide future research and interventions aimed at reducing the prevalence of heart disease.