Data Visualization - IPL Dataset
Hey people, we've already run a few basic EDA (exploratory data analysis) commands on the IPL dataset. Now we are going to learn how to visualize that data using the Seaborn and Matplotlib libraries.
Why do we need to visualize data?
Humans tend to remember more of what they see than what they hear.
A picture is worth a thousand words. Instead of analyzing each value one by one, we can visualize the data so that patterns across the whole dataset become visible at once. People can quickly understand what a picture represents and the data behind it.
There are several ways to represent data, based on its shape or on the relationships within it. We will try to cover as many of these shapes and relationships as possible to help you understand visualization.
Bar Graph
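A bar graph compares one number across categories, such as the number of matches each team has won. Here is a minimal sketch, assuming a hypothetical winner column that stores the name of each match's winning team; the small DataFrame is made-up toy data, not the real IPL dataset. sns.countplot counts the rows in each category and draws one bar per team, and passing order= sorts the bars from most to fewest wins.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: one row per match; 'winner' is an assumed column name
matches = pd.DataFrame({
    "winner": ["CSK", "MI", "MI", "RCB", "CSK", "MI"],
})

# Number of wins per team, most wins first (also used to order the bars)
wins = matches["winner"].value_counts()

fig, ax = plt.subplots(figsize=(8, 5))
# countplot counts rows per category and draws one bar per team
sns.countplot(x="winner", data=matches, order=wins.index, ax=ax)
ax.set_xlabel("Team")
ax.set_ylabel("Matches Won")
ax.set_title("Matches Won per Team")
plt.show()
```

If your data already holds one pre-computed number per category, sns.barplot with explicit x and y columns is the usual alternative to countplot.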
Line Chart
Histogram
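A histogram groups values into bins and shows how many values fall into each bin. The code below overlays the distributions of runs scored by home and away teams:
import matplotlib.pyplot as plt
import seaborn as sns
# bins=20 splits the range of runs into 20 intervals; kde=True adds a smoothed density curve
sns.histplot(df['home_runs'], bins=20, kde=True, color='blue', label='Home Team Runs')
sns.histplot(df['away_runs'], bins=20, kde=True, color='red', label='Away Team Runs')
plt.xlabel('Runs')
plt.ylabel('Frequency')
plt.title('Distribution of Runs Scored by Home and Away Teams')
plt.legend()  # uses the label= values passed to histplot
plt.show()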
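To see what the bins argument does, here is a small sketch using NumPy's np.histogram, which performs the same kind of equal-width binning: it splits the range from the smallest to the largest value into intervals of equal width and counts how many values land in each. The run totals are made-up toy numbers, and bins=4 is used instead of 20 so the output stays readable. Seaborn's histplot does its own binning internally; NumPy is used here only to expose the counts.

```python
import numpy as np

# Hypothetical toy data: runs scored by a team in eight matches
runs = np.array([120, 125, 135, 150, 165, 170, 178, 200])

# Split the range 120..200 into 4 equal-width bins and count the values in each
counts, edges = np.histogram(runs, bins=4)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {count}")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which also includes the maximum value (200 here).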
Scatter Plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='home_score', y='away_score', data=df)
plt.xlabel('Home Team Score')
plt.ylabel('Away Team Score')
plt.title('Scatter Plot of Home Team vs. Away Team Scores')
plt.show()
The first line, plt.figure(figsize=(10, 6)), creates a new figure 10 inches wide and 6 inches tall; this is the size of the plot that will be displayed.
sns.scatterplot(x='home_score', y='away_score', data=df) uses the Seaborn library to draw each row of df as one point, with the home team score on the x-axis and the away team score on the y-axis. The x and y parameters name the columns of the df DataFrame that contain the home and away team scores, respectively.
plt.xlabel(), plt.ylabel() and plt.title() set the labels for the x-axis, the y-axis and the title of the plot, and plt.show() displays the plot.
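For a version that runs on its own, here is a sketch using a small made-up DataFrame with the same assumed home_score and away_score columns. It uses Matplotlib's object-oriented interface (plt.subplots returns a figure and an Axes, which is handed to Seaborn through ax=), which is handy when one notebook cell draws several plots. The dashed y = x line is an optional addition, not part of the original example: points above it are matches where the away team scored more.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: final scores of five matches
df = pd.DataFrame({
    "home_score": [165, 182, 140, 201, 158],
    "away_score": [170, 160, 145, 190, 150],
})

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x="home_score", y="away_score", data=df, ax=ax)

# Optional reference line y = x: points above it are matches the away side outscored
low = min(df["home_score"].min(), df["away_score"].min())
high = max(df["home_score"].max(), df["away_score"].max())
ax.plot([low, high], [low, high], linestyle="--", color="gray")

ax.set_xlabel("Home Team Score")
ax.set_ylabel("Away Team Score")
ax.set_title("Scatter Plot of Home Team vs. Away Team Scores")
plt.show()

# Count how many points lie above the diagonal
away_higher = int((df["away_score"] > df["home_score"]).sum())
print(f"Away team scored more in {away_higher} of {len(df)} matches")
```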
Pie Chart
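Seaborn has no pie-chart function, so pie charts are usually drawn with Matplotlib's plt.pie (or ax.pie). A minimal sketch, assuming a hypothetical toss_decision column that records whether the toss winner chose to bat or field; the values are made-up toy data. Each wedge's angle is proportional to its share of the total, and autopct prints that share as a percentage.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: what each toss winner chose to do
matches = pd.DataFrame({
    "toss_decision": ["field", "bat", "field", "field", "bat", "field", "field", "bat"],
})

decisions = matches["toss_decision"].value_counts()

fig, ax = plt.subplots(figsize=(6, 6))
# autopct writes each wedge's percentage share on the wedge itself
wedges, texts, autotexts = ax.pie(
    decisions.values,
    labels=decisions.index,
    autopct="%1.1f%%",
    startangle=90,
)
ax.set_title("Toss Decisions")
ax.axis("equal")  # keep the pie circular
plt.show()
```

Pie charts work best with only a few categories; with many teams or seasons, a bar graph is usually easier to read.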
Heatmap
A heatmap is a graphical representation of data where values are depicted by color.
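Heatmaps can show many kinds of tabular data; here we use one to show how strongly the numeric columns of the dataset are correlated with each other.
correlation_matrix = df.corr(numeric_only=True)  # assumed: correlations of all numeric columns
sns.heatmap(correlation_matrix, cmap='coolwarm', annot=True, fmt=".2f")
Here correlation_matrix is assumed to hold the pairwise correlations of all numeric columns in df, computed with df.corr(numeric_only=True) (the numeric_only argument requires pandas 1.5 or later). sns.heatmap() then uses the Seaborn library to draw that matrix as a heatmap. The cmap parameter sets the colormap, annot=True writes the correlation coefficient inside each cell, and fmt=".2f" formats those annotations as floats with two decimal places.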
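A self-contained sketch of the same heatmap, built from a small made-up DataFrame; the column names team, runs, fours and sixes are hypothetical. numeric_only=True makes DataFrame.corr skip the text column team, which would otherwise raise an error in pandas 2.x. Passing vmin=-1 and vmax=1 is an optional addition that pins the color scale to the full range a correlation can take.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: batting figures for five innings
df = pd.DataFrame({
    "team": ["CSK", "MI", "RCB", "KKR", "SRH"],
    "runs": [180, 150, 200, 140, 165],
    "fours": [16, 12, 18, 10, 15],
    "sixes": [8, 6, 11, 4, 7],
})

# Pairwise Pearson correlation of the numeric columns only ('team' is skipped)
correlation_matrix = df.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(correlation_matrix, cmap="coolwarm", annot=True, fmt=".2f",
            vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Between Batting Columns")
plt.show()
```

The diagonal is always 1.00, because every column is perfectly correlated with itself.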
Box Plot
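A box plot summarizes a distribution with its quartiles: the box spans the 25th to the 75th percentile, the line inside it marks the median, and by default the whiskers reach the most extreme points within 1.5 times the box height (the interquartile range); anything beyond that is drawn as an individual outlier point. A minimal sketch, assuming hypothetical season and total_runs columns with made-up values. The pandas quantile call computes the same three numbers the box shows, so you can check the plot against them.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: team totals in five matches from each of two seasons
df = pd.DataFrame({
    "season": [2019] * 5 + [2020] * 5,
    "total_runs": [150, 160, 170, 180, 190, 140, 165, 175, 200, 210],
})

# Quartiles per season: box edges (25%, 75%) and the median line (50%)
quartiles = df.groupby("season")["total_runs"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)

fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(x="season", y="total_runs", data=df, ax=ax)
ax.set_xlabel("Season")
ax.set_ylabel("Total Runs")
ax.set_title("Distribution of Team Totals per Season")
plt.show()
```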
Bubble Chart
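A bubble chart is a scatter plot in which a third variable controls the size of each marker.
sns.scatterplot(x='home_runs', y='away_runs', data=df, hue='season', size='season', sizes=(50, 200))
plt.show()
Here season sets both the color (hue) and the size of each point, and sizes=(50, 200) gives the smallest and largest marker sizes.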
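A bubble chart often conveys more when size encodes a separate numeric variable instead of repeating the color. A sketch under that assumption, using a hypothetical margin column (winning margin in runs) and made-up values: with sizes=(50, 200), Seaborn scales the marker areas linearly, so the smallest margin gets an area of 50 and the largest gets 200 (points squared, as in Matplotlib's scatter).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: scores plus the winning margin of each match
df = pd.DataFrame({
    "home_runs": [165, 182, 140, 201],
    "away_runs": [160, 172, 120, 166],
    "margin": [5, 10, 20, 35],
})

fig, ax = plt.subplots(figsize=(10, 6))
# size='margin' turns the scatter plot into a bubble chart;
# sizes=(50, 200) sets the smallest and largest marker areas
sns.scatterplot(x="home_runs", y="away_runs", data=df,
                size="margin", sizes=(50, 200), ax=ax)
ax.set_xlabel("Home Team Runs")
ax.set_ylabel("Away Team Runs")
ax.set_title("Home vs. Away Runs, Bubble Size = Winning Margin")
plt.show()
```

You can still add hue='season' on top of this if you want color to encode the season.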
That's all I've learned about data visualization so far. I hope you learned something about visualizing data too.
In the next article, we are going to implement basic Machine Learning algorithms on this dataset and perform a few more actions on it.
To avoid missing future articles, please follow me, and leave a comment if you have any questions about data visualization.