DATA VISUALIZATION: DISTRIBUTION PLOTS
This article takes a comprehensive look at drawing distribution plots in Python with the matplotlib and seaborn libraries.
Histograms
A great way to get started exploring a single variable is with the histogram. A histogram divides the variable into bins, counts the data points in each bin, and shows the bins on the x-axis and the counts on the y-axis.
To make a basic histogram in Python, we can use either matplotlib or seaborn. The code below shows function calls in both libraries that create equivalent figures. In both calls, we set the bin width indirectly through the number of bins. For this plot, I will use bins that are 5 minutes wide, which means that the number of bins is the range of the data (from -60 to 120 minutes, a span of 180 minutes) divided by the bin width of 5 minutes (bins = int(180/5), or 36 bins).
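# Assumes `flight` is a pandas DataFrame with an 'arr_delay' column (minutes)
import matplotlib.pyplot as plt
import seaborn as sns

# matplotlib histogram: 36 bins of 5 minutes each over -60 to 120
plt.hist(flight['arr_delay'].dropna(), bins=int(180/5), range=(-60, 120),
         color='blue', edgecolor='black')
plt.show()

# seaborn histogram (histplot replaces distplot, deprecated since seaborn 0.11)
sns.histplot(flight['arr_delay'].dropna(), bins=int(180/5), binrange=(-60, 120),
             color='blue', edgecolor='black')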
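To make the binning arithmetic concrete, here is a minimal sketch of what a histogram counts before anything is drawn. It uses only the standard library; the helper name bin_counts and the sample delays are made up for illustration and are not part of matplotlib, seaborn, or the flight data.

```python
import math

def bin_counts(values, low, high, binwidth):
    """Count values into equal-width bins covering [low, high]."""
    n_bins = int((high - low) / binwidth)
    counts = [0] * n_bins
    for v in values:
        if math.isnan(v) or v < low or v > high:
            continue  # skip missing or out-of-range values
        # A value exactly on the upper limit is placed in the last bin
        idx = min(int((v - low) // binwidth), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical arrival delays in minutes (not the real flight data)
delays = [-12.0, -3.0, 0.0, 4.0, 4.5, 17.0, 120.0, float("nan"), 250.0]
counts = bin_counts(delays, low=-60, high=120, binwidth=5)

print(len(counts))   # number of bins: 180 / 5
print(counts[12])    # observations in the bin covering 0 to 5 minutes
```

The bar heights in the plotted histogram are these counts, one bar per bin. The missing value and the 250-minute delay fall outside the -60 to 120 window and are not counted.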
Kernel density estimation
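The kernel density estimate (KDE) may be less familiar, but it is a useful tool for plotting the shape of a distribution. Instead of counting observations in bins, it replaces each observation with a small smooth bump (the kernel) and adds the bumps together into a continuous curve. Like the histogram, the KDE plot encodes the density of observations on one axis with height along the other axis: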
# seaborn KDE (kdeplot replaces distplot with hist=False, deprecated since seaborn 0.11)
sns.kdeplot(flight['arr_delay'].dropna(), color='blue')
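As a rough illustration of the idea behind the curve, the sketch below builds a Gaussian kernel density estimate by hand with the standard library. The helper make_kde, the sample values, and the fixed bandwidth of 4 are assumptions chosen for the example; seaborn picks its bandwidth automatically and its implementation differs in detail.

```python
import math

def make_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per observation, each centred on that observation
        bumps = (math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
        return sum(bumps) / norm

    return density

# Hypothetical delays in minutes (not the real flight data)
sample = [-5.0, -2.0, 0.0, 1.0, 3.0, 20.0]
density = make_kde(sample, bandwidth=4.0)

# Evaluate the estimate on a grid from -40 to 60 in steps of 0.5
step = 0.5
grid = [x * step for x in range(-80, 121)]
curve = [density(x) for x in grid]

# The area under a density curve should be close to 1
area = sum(curve) * step
print(round(area, 3))
```

The curve is highest near 0, where five of the six observations cluster, and has a smaller separate bump around 20. A larger bandwidth would merge the two bumps; a smaller one would make the curve spikier.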
Scatterplots
The most familiar way to visualize a bivariate distribution is a scatterplot, where each observation is shown as a point at its x and y values. This is analogous to a rug plot in two dimensions. You can draw a scatterplot with the matplotlib plt.scatter function, and it is also the default kind of plot shown by the seaborn jointplot() function:
#JOINTPLOT
#seaborn
sns.jointplot(x='arr_time', y='arr_delay', data=flight, kind='scatter')
#matplotlib
plt.scatter(x='arr_time', y='arr_delay', data=flight)
Pairplot
To plot multiple pairwise bivariate distributions in a dataset, you can use the pairplot() function. This creates a matrix of axes and shows the relationship for each pair of columns in a DataFrame. By default, it also draws the univariate distribution of each variable on the diagonal:
#seaborn
sns.pairplot(data=flight)
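# Signature as documented in older seaborn releases, for reference only (not a runnable call).
# Newer releases rename size to height, and some defaults may differ.
# sns.pairplot(data, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind='scatter', diag_kind='hist', markers=None, size=2.5, aspect=1, dropna=True, plot_kws=None, diag_kws=None, grid_kws=None)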
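To picture the matrix of axes described above, the following sketch lists what each panel of a pairwise grid would show for three columns. The column names and the helper pair_grid_layout are hypothetical and only describe the layout; nothing is plotted and seaborn is not called.

```python
# Hypothetical column names, chosen only for illustration
columns = ["dep_delay", "arr_delay", "air_time"]

def pair_grid_layout(columns):
    """Describe each panel of a pairwise grid: rows share y, columns share x."""
    layout = []
    for y_var in columns:
        row = []
        for x_var in columns:
            if x_var == y_var:
                # Diagonal panel: distribution of a single variable
                row.append(("univariate", x_var))
            else:
                # Off-diagonal panel: scatter of x_var against y_var
                row.append(("scatter", x_var, y_var))
        layout.append(row)
    return layout

layout = pair_grid_layout(columns)
for row in layout:
    print(row)
```

With n numeric columns the grid has n × n panels: n univariate plots on the diagonal and n × (n - 1) scatterplots off it, so the figure grows quickly as columns are added. Passing a short list of column names through the vars parameter keeps it readable.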
Rugplot
A rugplot is a graph that draws a small tick along an axis for each observation in a dataset. Where values are concentrated, the ticks crowd together; where values are sparse, only occasional ticks appear. This is the essence of a rugplot.
#seaborn
sns.rugplot(flight['arr_delay'])