Unlocking HR Insights with Graphs Using Python (with Example Codes)
Data, data, data! In the ever-evolving world of human resources, as in any other sector, data-driven decisions are becoming more critical than ever. To harness the power of HR data, it’s essential to visualize it effectively. In this article, you’ll explore five types of graphs using Python that are well suited to HR data analysis. Each graph has its own advantages and offers insight into a different aspect of HR.
The data I generated as a source for examples is a simple CSV file that contains the following information:
Example code:
#Import libraries
import csv
import os
# Define the data
data = [
['EmployeeID', 'Gender', 'Department', 'RecruitmentSource', 'HiringOutcome', 'Team', 'EngagementScore', 'Productivity', 'EmployeeSatisfaction', 'Tenure', 'PerformanceRating'],
[1, 'Male', 'HR', 'LinkedIn', 'Accepted', 'Team A', 75, 80, 4.5, 3, 'Excellent'],
[2, 'Female', 'Finance', 'Indeed', 'Accepted', 'Team B', 80, 85, 4.8, 4, 'Outstanding'],
[3, 'Male', 'Engineering', 'LinkedIn', 'Rejected', 'Team C', 90, 92, 4.2, 5, 'Outstanding'],
[4, 'Female', 'Marketing', 'Indeed', 'Accepted', 'Team A', 72, 78, 4.0, 2, 'Average'],
[5, 'Male', 'Engineering', 'LinkedIn', 'Accepted', 'Team D', 88, 90, 4.9, 6, 'Outstanding'],
[6, 'Male', 'HR', 'Referral', 'Accepted', 'Team B', 79, 82, 4.6, 2, 'Excellent'],
[7, 'Female', 'Finance', 'LinkedIn', 'Accepted', 'Team C', 82, 88, 4.7, 3, 'Excellent'],
[8, 'Female', 'Engineering', 'Indeed', 'Rejected', 'Team D', 91, 94, 4.4, 7, 'Outstanding'],
[9, 'Male', 'Marketing', 'Referral', 'Accepted', 'Team A', 70, 75, 4.1, 2, 'Good'],
[10, 'Female', 'Engineering', 'LinkedIn', 'Accepted', 'Team B', 86, 89, 4.7, 4, 'Excellent']
]
# Get the desktop directory path
desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
# Specify the file path on the desktop
file_path = os.path.join(desktop_path, 'employee_data.csv')
# Create and open the CSV file in write mode
with open(file_path, 'w', newline='') as csv_file:
    # Create a CSV writer
    csv_writer = csv.writer(csv_file)
    # Write the data to the CSV file
    csv_writer.writerows(data)
print(f'CSV file "{file_path}" has been created on your desktop successfully.')
This code generates a CSV file and saves it to your desktop. You can generate different data or use existing data if you have it, but for this article I’ll use the data generated above. So, if you’re all set, let’s dive in!
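One thing to watch: the script above saves the file to the Desktop, while the examples that follow read 'employee_data.csv' from the current working directory. Here is a minimal sketch of a loader that bridges the two, assuming the Desktop location used above. The function name load_employee_data is hypothetical and only for illustration.

```python
from pathlib import Path

import pandas as pd


def load_employee_data(csv_path=None):
    """Load the employee CSV; defaults to the Desktop location used above."""
    if csv_path is None:
        # Assumption: the file was saved by the generation script above
        csv_path = Path.home() / "Desktop" / "employee_data.csv"
    return pd.read_csv(csv_path)
```

Calling load_employee_data() with no argument reads the Desktop copy. Passing a path lets you point it at your own data instead.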
1. Mosaic Plot (Marimekko Chart):
- Why it’s used: Mosaic plots are ideal for visualizing categorical data, especially when you want to explore the relationship between two or more categorical variables.
- HR data needed: Use it to analyze employee demographics, such as gender and department, or compare recruitment sources and hiring outcomes.
- Benefits: Mosaic plots provide a clear representation of how categories within different variables intersect, making it easy to identify patterns and trends.
Here’s the example code:
# Example code for creating a mosaic plot using Python
#Import libraries
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic
import matplotlib.pyplot as plt
# Load the dataset (replace 'employee_data.csv' with your actual file path)
employee_data = pd.read_csv('employee_data.csv')
# Create the mosaic plot (pass the axes explicitly so the figure size is applied)
fig, ax = plt.subplots(figsize=(10, 6))
mosaic(employee_data, ['Gender', 'Department'], ax=ax, title='Mosaic Plot: Gender vs. Department')
plt.show()
This code will create a mosaic plot that visualizes the relationship between gender and department in your employee dataset. You can adjust the column names in the mosaic function to explore other relationships or variables as needed.
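Before swapping in other columns, it can help to look at the counts a mosaic plot would draw. The sketch below tabulates recruitment source against hiring outcome with pd.crosstab. The small DataFrame is copied from the two relevant columns of the sample data, so treat it as an illustration rather than a required step.

```python
import pandas as pd

# RecruitmentSource and HiringOutcome copied from the sample data above
employee_data = pd.DataFrame({
    'RecruitmentSource': ['LinkedIn', 'Indeed', 'LinkedIn', 'Indeed', 'LinkedIn',
                          'Referral', 'LinkedIn', 'Indeed', 'Referral', 'LinkedIn'],
    'HiringOutcome': ['Accepted', 'Accepted', 'Rejected', 'Accepted', 'Accepted',
                      'Accepted', 'Accepted', 'Rejected', 'Accepted', 'Accepted'],
})

# Count employees for every source/outcome combination
source_outcome = pd.crosstab(employee_data['RecruitmentSource'],
                             employee_data['HiringOutcome'])
print(source_outcome)
```

Each cell of this table corresponds to one tile of the mosaic plot you would get by using ['RecruitmentSource', 'HiringOutcome'] as the column list.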
2. Treemap:
- Why it’s used: Treemaps are excellent for displaying hierarchical data structures, making them suitable for visualizing HR organizational hierarchies and reporting structures.
- HR data needed: Explore the breakdown of employees by department, teams, or hierarchical levels.
- Benefits: Treemaps provide a hierarchical view of data, allowing HR professionals to understand the distribution and relationships within the organization.
Here’s the example code:
# Example code for creating a treemap using Python
#Import libraries
import pandas as pd
import squarify
import matplotlib.pyplot as plt
# Load the dataset (replace 'employee_data.csv' with your actual file path)
employee_data = pd.read_csv('employee_data.csv')
# Calculate the department sizes
department_sizes = employee_data['Department'].value_counts()
# Create labels for each department
labels = department_sizes.index
# Create treemap
plt.figure(figsize=(10, 6))
ax = squarify.plot(sizes=department_sizes, label=labels, alpha=0.7)
# Annotate each square with the number of employees
for i, label in enumerate(labels):
    x, y, dx, dy = ax.patches[i].get_bbox().bounds
    # Use .iloc for positional access, since the Series is indexed by department name
    plt.text(x + dx / 2, y + dy / 3, f'{department_sizes.iloc[i]}', va='center', ha='center', fontsize=12, fontweight='bold')
plt.axis('off')
plt.title('Employee Breakdown by Department (Treemap)')
plt.show()
This code creates a treemap in which each rectangle shows the department name and, just below it, the number of employees in that department.
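An alternative to annotating each rectangle by hand is to build the count into the label text before plotting. The sketch below only prepares the sizes and labels with pandas. The DataFrame is a small copy of the Department column from the sample data, and passing the results to squarify.plot(sizes=..., label=...) is left as the final step.

```python
import pandas as pd

# Department column copied from the sample data above
employee_data = pd.DataFrame({
    'Department': ['HR', 'Finance', 'Engineering', 'Marketing', 'Engineering',
                   'HR', 'Finance', 'Engineering', 'Marketing', 'Engineering'],
})

# Count employees per department (largest first)
department_sizes = employee_data['Department'].value_counts()

# Put the department name and its headcount on separate lines of one label
labels = [f'{dept}\n{count}' for dept, count in department_sizes.items()]
sizes = department_sizes.tolist()
print(labels)
```

With labels built this way, the separate plt.text loop is no longer needed, at the cost of less control over where the number sits.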
3. Heatmap:
- Why it’s used: Heatmaps are effective for visualizing correlations between variables, making them valuable for identifying relationships within HR data.
- HR data needed: Analyze correlations between employee performance metrics, such as engagement scores and productivity.
- Benefits: Heatmaps make it easy to spot trends, outliers, and areas of concern in HR data, facilitating data-driven decision-making.
Here’s the example code:
# Example code for creating a heatmap using Python
#Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset (replace 'employee_data.csv' with your actual file path)
employee_data = pd.read_csv('employee_data.csv')
# Select the columns for analysis
columns_to_analyze = ['EmployeeSatisfaction', 'Tenure']
# Create a correlation matrix
correlation_data = employee_data[columns_to_analyze].corr()
# Create a heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_data, annot=True, cmap='YlGnBu', fmt='.2f', cbar=True)
plt.title('Correlation Heatmap of Employee Metrics')
plt.show()
This heatmap will show the correlations between the selected variables, allowing you to analyze how they relate to each other in your employee dataset.
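The same approach works for the engagement and productivity pairing mentioned above. As a rough sketch, the snippet below computes that correlation matrix with pandas, using the EngagementScore and Productivity values copied from the sample data. The resulting table is what you would pass to sns.heatmap.

```python
import pandas as pd

# EngagementScore and Productivity copied from the sample data above
employee_data = pd.DataFrame({
    'EngagementScore': [75, 80, 90, 72, 88, 79, 82, 91, 70, 86],
    'Productivity': [80, 85, 92, 78, 90, 82, 88, 94, 75, 89],
})

# Pearson correlation is the default method for DataFrame.corr()
engagement_corr = employee_data[['EngagementScore', 'Productivity']].corr()
print(engagement_corr.round(2))
```

With only ten rows, any correlation here is illustrative at best, so treat the number as a demonstration of the workflow rather than a finding.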
4. Box Plot (Box-and-Whisker Plot):
- Why it’s used: Box plots are a valuable tool for visualizing and summarizing the distribution of a single continuous or numerical variable. They are especially useful when you need to understand the spread, central tendency, and presence of outliers within the data.
- HR data needed: Box plots can be applied to HR data to gain insights into various aspects, such as employee salary distributions, performance rating variations, or tenure across different departments or teams.
- Benefits: Box plots provide a clear summary of key statistics, including the median (central tendency), quartiles (spread), and potential outliers, making it easy to grasp the overall distribution of the variable.
Here’s the example code:
# Example code for creating a box plot using Python
#Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset (replace 'employee_data.csv' with your actual file path)
employee_data = pd.read_csv('employee_data.csv')
# Create a box plot of employee engagement scores by department
plt.figure(figsize=(12, 6))
sns.boxplot(data=employee_data, x='Department', y='EngagementScore', palette='Set2')
plt.xlabel('Department')
plt.ylabel('Engagement Score')
plt.title('Engagement Score by Department (Box Plot)')
plt.xticks(rotation=45)
plt.show()
This plot allows you to detect outliers and understand the central tendency and spread of engagement scores in your HR data.
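To see the numbers behind the boxes, you can compute the quartiles directly. The sketch below does this per department with pandas, using the Department and EngagementScore values copied from the sample data. The 1.5 × IQR fences are the common convention for flagging outliers and are an assumption here, not something taken from the plot itself.

```python
import pandas as pd

# Department and EngagementScore copied from the sample data above
employee_data = pd.DataFrame({
    'Department': ['HR', 'Finance', 'Engineering', 'Marketing', 'Engineering',
                   'HR', 'Finance', 'Engineering', 'Marketing', 'Engineering'],
    'EngagementScore': [75, 80, 90, 72, 88, 79, 82, 91, 70, 86],
})

# Quartiles per department, one row per department
summary = (employee_data.groupby('Department')['EngagementScore']
           .quantile([0.25, 0.5, 0.75])
           .unstack())
summary.columns = ['Q1', 'Median', 'Q3']

# Interquartile range and the conventional 1.5 * IQR outlier fences
summary['IQR'] = summary['Q3'] - summary['Q1']
summary['LowerFence'] = summary['Q1'] - 1.5 * summary['IQR']
summary['UpperFence'] = summary['Q3'] + 1.5 * summary['IQR']
print(summary)
```

Any score outside the fences for its department would be a candidate outlier. With two to four employees per department in the sample data, the summary mainly shows the mechanics.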
5. Stacked Bar Chart:
- Why it’s used: Stacked bar charts are effective for visualizing categorical data and comparing the composition of categories within a variable across different groups or categories.
- HR data needed: Use stacked bar charts to represent how different categories, such as employee genders or recruitment sources, are distributed within various departments or teams.
- Benefits: Stacked bar charts provide a clear visual comparison of category distributions across different groups, allowing HR professionals to easily identify disparities or trends.
Here’s the example code:
# Example code for creating a stacked bar chart using Python
#Import libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset (replace 'employee_data.csv' with your actual file path)
employee_data = pd.read_csv('employee_data.csv')
# Create a stacked bar chart to visualize recruitment source distribution by department
recruitment_data = employee_data.groupby(['Department', 'RecruitmentSource']).size().unstack(fill_value=0)
recruitment_data.plot(kind='bar', stacked=True, figsize=(10, 6), colormap='Paired')
plt.xlabel('Department')
plt.ylabel('Count')
plt.title('Recruitment Source Distribution by Department (Stacked Bar Chart)')
plt.xticks(rotation=45)
plt.legend(title='Recruitment Source', loc='upper right')
plt.show()
This chart helps you understand where the organization’s talent is primarily sourced from in each department, which can be valuable for recruitment and hiring strategy decisions.
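Raw counts can be hard to compare when departments differ in size. One possible variation, sketched below, converts each department's counts to percentages so every bar would add up to 100. The DataFrame is copied from the Department and RecruitmentSource columns of the sample data.

```python
import pandas as pd

# Department and RecruitmentSource copied from the sample data above
employee_data = pd.DataFrame({
    'Department': ['HR', 'Finance', 'Engineering', 'Marketing', 'Engineering',
                   'HR', 'Finance', 'Engineering', 'Marketing', 'Engineering'],
    'RecruitmentSource': ['LinkedIn', 'Indeed', 'LinkedIn', 'Indeed', 'LinkedIn',
                          'Referral', 'LinkedIn', 'Indeed', 'Referral', 'LinkedIn'],
})

# Counts per department and source, then each row scaled to percentages
counts = pd.crosstab(employee_data['Department'], employee_data['RecruitmentSource'])
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

Plotting shares with shares.plot(kind='bar', stacked=True) in place of the count table gives a 100% stacked bar chart. The y-axis label should then read "Percentage" rather than "Count".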
So there you have it! I hope these examples help you on your analysis journey. I’ll continue to share more graphs, code snippets, and deeper analysis to enhance our understanding of HR data.
Cheers,
Gayane