War on Talent: Identifying talent from your talent pool with Python

Brendan Lys (MAppPsy)

Published May 14, 2019

In my experience, organizational approaches to mining talent pools are often found wanting. Organizations amass collections of CVs from hopeful candidates, yet once the advertised position is filled, these CVs are very often ignored. Those same dormant CVs may contain candidates with the very skill set you are now searching for.

Admittedly, manually reading through each CV is time-consuming; often it's cheaper and easier to pay for advertising and then read through the CVs submitted specifically for that role. What if, however, we could read through our collection of CVs in mere moments, mining them for specific keywords that relate to the role we are currently hiring for? While there are commercial products that do this, we can simply use Python, an open-source language that is one of the most commonly used in data science.

Not everyone is familiar with Python. However, if you're in a medium-sized organization, chances are very good that one of your HR analysts, or certainly some of your IT crew, will know it. While the completed code is below, what I really want to highlight is that you can mine that stack of CVs your organization has collected today and find candidates who match your skill requirements. One of the advantages of Python is that it is readable by humans. Just over halfway down the code you'll see an object called Skills_sought: this is a list containing the skills that relate to the vacancy you are looking to fill. Simply insert the skills you are looking for into this list. The code reads through a CV (identified in the command pdfFileObj = open('brendan resume example.pdf', 'rb')), looks for those keywords, and then provides you with the following output:

['machine learning', 'reporting', 'analysis', 'Excel', 'organizational psychology', 'workforce strategy', 'HR', 'Python']

Match to skills sought 88.88 percent match
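To make that output concrete, here is a minimal sketch of the same idea run on a short made-up sentence instead of a PDF. The function name match_skills and the sample text are illustrative assumptions and are not part of the code further down; the sketch only shows how a list of found skills turns into a percentage.

```python
def match_skills(cv_text, skills_sought):
    """Return the skills found in cv_text and the percentage of skills matched."""
    cv_lower = cv_text.lower()
    found = []
    for skill in skills_sought:
        # Skills written in lower case are matched regardless of case in the CV;
        # skills containing capitals (e.g. "HR") must appear exactly as written.
        if skill in cv_lower or skill in cv_text:
            found.append(skill)
    percent = len(found) / len(skills_sought) * 100
    return found, percent


# Hypothetical sample text standing in for the text extracted from a CV
sample_cv = "Experienced HR analyst skilled in Reporting, analysis and Python."
skills = ["reporting", "analysis", "HR", "Python", "Excel"]

found, percent = match_skills(sample_cv, skills)
print(found)
print("Match to skills sought", round(percent, 2), "percent match")
```

In this made-up example four of the five skills appear in the text, so the match is 80 percent. The output shown above works the same way: eight of the nine skills sought were found.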

The full code is below. I built it to read through a single CV in PDF format. Useful additions would include iterating through a list of CVs and outputting the results to an Excel spreadsheet. Your friendly HR analyst or IT person will be able to do this for you; my goal was to share the main part of the code so that interested parties can read through it and implement it quickly.

import PyPDF2

# Open the CV (a PDF file) in binary read mode
pdfFileObj = open('brendan resume example.pdf', 'rb')

# PdfReader is the PyPDF2 3.x name; releases before 3.0 called it PdfFileReader
pdfReader = PyPDF2.PdfReader(pdfFileObj)

# The number of pages in the document, used as the upper bound for the loop below
pages = len(pdfReader.pages)

# PyPDF2 starts its page numbering at zero, hence count also starts at zero
count = 0
text = ""

# The following while statement iterates through the PDF, reading each page
while count < pages:
    pageObj = pdfReader.pages[count]
    count += 1
    # extract_text() was called extractText() before PyPDF2 3.0
    text += pageObj.extract_text()

pdfFileObj.close()

# Remove the line breaks within the document
text = text.replace("\n", "")

# Having read the text with the above code, we can now search it for keywords

keyword_found = []  # initialise an empty list to store the keywords found in the CV

Skills_sought = ["machine learning", "reporting", "analysis", "HR", "organisational psychology", "Python", "Excel", "workforce strategy", "employment relations"]

CVlower = text.lower()
CV = text

# First pass: search the lower-cased CV, so lower-case skills match whatever their capitalisation in the CV
for i in Skills_sought:
    locate_keyword = CVlower.find(i)
    if locate_keyword > -1:
        keyword_found.append(i)

# Second pass: search the original text, so skills written with capitals (e.g. "HR") match exactly
for i in Skills_sought:
    locate_keyword = CV.find(i)
    if locate_keyword > -1:
        keyword_found.append(i)

# Remove duplicates while keeping the order in which keywords were found
keyword_found = list(dict.fromkeys(keyword_found))

print(keyword_found)
print("Match to skills sought", len(keyword_found) / len(Skills_sought) * 100, "percent match")
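The two additions suggested earlier, iterating through several CVs and writing the results somewhere Excel can read them, could be sketched roughly as follows. This is only one possible approach under stated assumptions: the function names extract_text, find_skills and screen_folder are hypothetical, the PyPDF2 calls use the 3.x names (PdfReader, pages, extract_text), and the results go to a CSV file, which Excel opens directly, rather than a native .xlsx workbook, so that no extra library is needed.

```python
import csv
from pathlib import Path


def extract_text(pdf_path):
    """Read every page of a PDF and return its text with line breaks removed."""
    import PyPDF2  # imported here so the rest of the sketch works without it

    with open(pdf_path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        text = "".join(page.extract_text() or "" for page in reader.pages)
    return text.replace("\n", "")


def find_skills(text, skills_sought):
    """Return the skills from skills_sought that appear in text."""
    lower = text.lower()
    # Lower-case skills match in any case; capitalised skills must match exactly
    return [s for s in skills_sought if s in lower or s in text]


def screen_folder(folder, skills_sought, out_csv, reader=extract_text):
    """Score every PDF in folder and write one row per CV to out_csv."""
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        found = find_skills(reader(pdf_path), skills_sought)
        percent = round(len(found) / len(skills_sought) * 100, 2)
        rows.append([pdf_path.name, percent, "; ".join(found)])
    # Best matches first
    rows.sort(key=lambda row: row[1], reverse=True)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["cv_file", "percent_match", "skills_found"])
        writer.writerows(rows)
    return rows
```

A call such as screen_folder("cvs", Skills_sought, "cv_matches.csv") would then score every PDF in a folder named cvs (both names are placeholders) and list the best-matching CVs at the top of the spreadsheet. The reader parameter exists only so the scoring can be tried with a stand-in function when no PDFs are at hand.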

