Data Visualization - IPL Dataset
Data Visualization on IPL Dataset

Data Visualization - IPL Dataset

Hey people, we've performed a few basic EDA commands on the IPL dataset. Now we are going to learn how to visualize the data using the Seaborn and Matplotlib libraries.

Why do we need to Visualize data?

Fact: Humans remember more than they hear.

A picture can say a thousand words. Instead of analyzing each term, we can visualize them so that the maximum amount of data is analyzed. People can easily understand what a picture represents and the data in it.

There are several types of representation of data in terms of shape or relationship. We will try to cover a maximum of shapes and relations to make you understand the visualization.

Bar Graph

  • A bar graph enables you to visualize the data between two variables in bars.
  • Here, we've used the Matplotlib and Seaborn libraries to make it visual. (Which is imported as plt and sns)
  • Here, xticks refers to the labels that we used to represent each bar in the graph.
  • xlabel and ylabels are used to indicate on which axis we need to plot which column's data.
  • title enables us to give a title to the bar graph, as its name suggests. The same goes for the show

Article content
Article content

Line Chart

  • figure function enables us to give the size of the image in which we get our graph visualized.
  • we need to include the column's name in the lineplot function as mentioned.

Article content

Histogram

  • A histogram is similar to the bar graph, but here all the bars stick together so that we can analyze the data easily without strain.
  • The first line of code plt.figure(figsize=(10, 6))creates a new figure with a width of 10 inches and a height of 6 inches.
  • This is the size of the plot that will be displayed.

sns.histplot(df['home_runs'], bins=20, kde=True, color='blue', label='Home Team Runs') 
sns.histplot(df['away_runs'], bins=20, kde=True, color='red', label='Away Team Runs')        

  • Use the Seaborn library to create two histograms, one for the home team runs and one for the away team runs.
  • The bins The parameter controls the number of bins in the histogram, and the kde parameter controls whether to show a smoothed version of the distribution (called a kernel density estimate).
  • The color The parameter controls the color of the bars, and the label parameter adds a label to the legend.

plt.xlabel('Runs') plt.ylabel('Frequency') 
plt.title('Distribution of Runs Scored by Home and Away Teams')        

  • set the labels for the x-axis, y-axis, and title of the plot.

plt.legend() 
plt.show()        
Article content
Histogram Code Implementation
Article content
Histogram Visualization

Scatter Plot

The first line of code, plt.figure(figsize=(10, 6)), creates a new figure with a width of 10 inches and a height of 6 inches. This is the size of the plot that will be displayed.

sns.scatterplot(x='home_score', y='away_score', data=df)
        

uses the Seaborn library to create a scatter plot of the home team score vs. the away team score. The x and y parameters specify the columns in the df DataFrame that contain the home team score and away team score, respectively.

plt.xlabel('Home Team Score')
plt.ylabel('Away Team Score')
plt.title('Scatter Plot of Home Team vs. Away Team Scores')
        

set the labels for the x-axis, y-axis, and title of the plot.

plt.show()
#displays the plot.
        
Article content
Scatterplot Code Implementation
Article content
Scatterplot Visualization

Pie Chart

  • The df['decision'].value_counts() The function counts the number of loans in each category of decision (approved or denied) and stores the results in a data frame.
  • The plot(kind='pie', autopct='%1.1f%%') function then creates a pie chart of the data frame, where each slice represents the percentage of loans in each category.
  • The autopct parameter specifies the format of the labels on the pie chart. In this case, the labels are formatted as percentages with one decimal place.


Article content
Pie Chart Code Implementation
Article content
Pie Chart Visualization

HeatMap

A heatmap is a graphical representation of data where values are depicted by color.

Heatmaps can be used to show a variety of things, such as:

  • The distribution of data: A heatmap can be used to show the distribution of data in a dataset. For example, a heatmap of the temperatures in a city can show how the temperatures vary across the city.
  • The relationship between two variables: A heatmap can be used to show the relationship between two variables. For example, a heatmap of the correlation between customer satisfaction and product price can show how customer satisfaction changes with product price.
  • The importance of features: A heatmap can be used to show the importance of features in a machine learning model. For example, a heatmap of the feature importance in a decision tree can show which features are most important for the model's predictions.

sns.heatmap(correlation_matrix, cmap='coolwarm', annot=True, fmt=".2f")        

uses the Seaborn library to create a heatmap of the correlation matrix. The cmap parameter specifies the colormap to use, and the annot parameter specifies whether to annotate the heatmap with the correlation coefficients. The fmt parameter specifies the format of the annotations. In this case, the annotations are formatted as two-decimal-place floats.

Article content
Heatmap Code Implementation
Article content
Heatmap Visualization

Box Plot

  • The sns.set(font_scale=1.2) function is used to set the font scale for all subsequent plots.
  • This means that the font size of all the text in the plot will be 1.2 times the default size. This can be useful for making the text in plots easier to read.
  • The boxplot() function is used to create a box plot. A box plot is a graphical representation of the distribution of data.
  • It shows the median, interquartile range, and outliers for each group of data. The median is the middle value in the distribution, and the interquartile range is the range of values between the first and third quartiles.
  • Outliers are values that are much higher or lower than the rest of the data.

Article content
Box Plot Implementation Code
Article content
Box Plot Graph

Bubble Chart

sns.scatterplot(x='home_runs', y='away_runs', data=df, hue='season', size='season', sizes=(50, 200))
        

  • This uses the Seaborn library to create a bubble chart of the home_runs and away_runs columns in the DataFrame df.

  • The x and y parameters specify the columns to plot on the x-axis and y-axis, respectively.
  • The hue parameter specifies the column to use for coloring the bubbles, and the size parameter specifies the column to use for sizing the bubbles.
  • The sizes parameter specifies the range of sizes for the bubbles.

Article content
Bubble Chart Implementation Code
Article content
Bubble Chart - Visualization


This is all I've learned about Data Visualization, I hope you guys learned something about visualizing data.

In the next article, we are going to implement basic Machine Learning algorithms on this dataset and perform a few actions like

  • Regression
  • And going to create a model in which we predict the best batsman and blower in the IPL.

To not miss further articles, please follow me and comment if you've any doubts regarding Data Visualization.

To view or add a comment, sign in

More articles by Manjunath Irukulla

  • Guide to Observability using SigNoz - Part 2

    Hey, Welcome Back to the Series of Observability using SigNoz! Let’s focus on installing SigNoz on our system. Make…

  • A Comprehensive Guide to Observability using SigNoz

    In the ever-evolving world of DevOps, observability stands as a pillar for maintaining robust and reliable systems…

    2 Comments
  • Kicking Off an Exciting New Project: Intelligent CI/CD Pipeline with AWS and Generative AI! 🚀

    I'm thrilled to announce that we're embarking on a groundbreaking project to build a robust and intelligent CI/CD…

  • Terraform Series - Part 1

    Terraform, it is an Infrastructure as Code tool. As a DevOps or Cloud Engineer, we may work with multiple cloud…

  • EDA Analysis on IPL Dataset

    In this new series, I'm going to share my learnings in Machine Learning. In this article, we are going to learn what…

  • AWS for DevOps - EC2 Instance

    Hey everyone, this is the third article in the series on AWS for DevOps. In this article, we are going to learn about…

  • Introduction to Containerization - Docker

    Containerization is a powerful process that simplifies application deployment and management across various…

  • AWS for DevOps - Part 02

    Prerequisites: To continue this blog you must have created an account on AWS. if you haven't made one, then try this…

  • Source Control Management

    Have you heard the term Source Control Management before? It's okay if not, because in this blog we are going to cover…

    1 Comment
  • AWS for DevOps

    Cloud computing is considered one of the most required skills these days. For those who generally don't know what cloud…

Explore content categories