Manufacturing Mining Process
BACKGROUND:
In my latest project, I wanted to get my feet wet with another data tool: Python. I will be focusing on a mining company called Metals R’ Us and analyzing their flotation plant. In this plant, Metals R’ Us extracts iron from clumps of dirt so that it can be filtered and sold. Their equipment cleans the iron of its impurities by putting it through a process in which the metals rise to the top of a liquid mixture while the other minerals remain at the bottom.
THE DATA
The data that I am using is from Kaggle: Quality Prediction in a Mining Process. This dataset focuses on predicting how much impurity is in the ore concentrate. The impurity is measured either every 20 seconds or every hour, and the goal is to see whether we can predict how much silica (the impurity) is in the ore concentrate.
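In this project, the questions that I will be answering are:
What is the count, median, min, and max for every column?
Was there an unusual occurrence that happened on June 1, 2017?
How do the most important variables correlate with each other?
How does the % Iron Concentrate change throughout the day on June 1st?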
KEY FINDINGS
THE ANALYSIS
What is the count, median, min, and max for every column?
To get these summary stats, I used a single, very simple Python command that shows them all at once.
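The original command is not shown here, so the following is only a minimal sketch of how such a summary could be produced, assuming the data sits in a pandas data frame named df. The values below are made-up stand-ins, not numbers from the Kaggle file.

```python
import numpy as np
import pandas as pd

# Stand-in data so the example runs on its own (assumption: the real project
# would load the Kaggle CSV into df instead of generating random numbers).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "% Iron Concentrate": rng.uniform(62.0, 68.0, size=100),
    "% Silica Concentrate": rng.uniform(0.6, 5.5, size=100),
})

# describe() reports count, mean, std, min, the quartiles, and max for each
# numeric column; the "50%" row is the median.
summary = df.describe()
print(summary)
```

If only the four stats from the question are needed, `summary.loc[["count", "50%", "min", "max"]]` narrows the table down to them.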
Was there an unusual occurrence that happened on June 1, 2017?
First, I had to look at the dates and return the earliest and the latest date in the data.
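One way this date check might look is sketched below. It assumes the timestamp column is called date and arrives as text, so it is converted to datetimes first. The three rows are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data frame has many more readings.
df = pd.DataFrame({
    "date": ["2017-05-30 00:00:00", "2017-06-01 12:00:00", "2017-06-03 18:00:00"],
})

# Convert text to real datetimes so min/max compare chronologically
df["date"] = pd.to_datetime(df["date"])

earliest = df["date"].min()
latest = df["date"].max()
print("Earliest:", earliest)
print("Latest:", latest)
```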
Now I want just the data from June 1st. I do this by filtering the rows with a boolean mask and creating a new data frame called df_june.
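A boolean mask of this kind could be written as follows. This is a sketch on made-up rows, assuming a datetime column named date. The exact condition used in the original project is not shown, so the two-sided comparison here is one reasonable choice.

```python
import numpy as np
import pandas as pd

# Stand-in data: 12 readings, 6 hours apart, starting May 31 (invented values)
df = pd.DataFrame({
    "date": pd.date_range("2017-05-31", periods=12, freq=pd.Timedelta(hours=6)),
    "% Iron Concentrate": np.linspace(64.0, 66.0, 12),
})

# The mask is True for rows on or after June 1 and strictly before June 2
start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")
mask = (df["date"] >= start) & (df["date"] < end)

# Indexing with the mask keeps only the rows where it is True
df_june = df[mask]
print(df_june)
```

Using a half-open range (start included, end excluded) keeps every reading from June 1st, whatever its time of day, without picking up midnight on June 2nd.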
After this, I created a data frame with just the important columns.
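Selecting a subset of columns might look like the sketch below. The column list follows the variables named in the correlation question. Keeping date alongside them, and the name df_june_important, are assumptions made for this example, and the rows are invented.

```python
import pandas as pd

# Two made-up rows standing in for the June 1st data
df_june = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-01 00:00", "2017-06-01 01:00"]),
    "% Iron Feed": [55.2, 55.2],
    "% Silica Concentrate": [1.31, 1.45],
    "Ore Pulp pH": [10.1, 10.0],
    "Flotation Column 05 Level": [457.4, 451.9],
    "% Iron Concentrate": [66.9, 66.7],
})

# Passing a list of column names returns a data frame with only those columns
important_cols = [
    "date",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
    "% Iron Concentrate",
]
df_june_important = df_june[important_cols]
print(df_june_important.head())
```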
How do the most important variables, % Silica Concentrate, Ore Pulp pH, Flotation Column 05 Level, and % Iron Concentrate, correlate with each other?
To determine this correlation, I used the Seaborn library and called pairplot with the important-columns data frame as the argument.
Looking at these data plots, there does not seem to be any correlation between the variables.
How does the % Iron Concentrate change throughout the day on June 1st?
To dig deeper into this information, I created a line graph to show how the percentage of iron concentrate changed throughout the day.
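A line graph like this could be drawn with matplotlib as sketched below. The hourly values are random stand-ins, and the column names date and % Iron Concentrate are assumed to match the data frame used earlier.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data: one invented reading per hour on June 1, 2017
rng = np.random.default_rng(2)
df_june = pd.DataFrame({
    "date": pd.date_range("2017-06-01", periods=24, freq=pd.Timedelta(hours=1)),
    "% Iron Concentrate": rng.uniform(62.0, 68.0, size=24),
})

# Time goes on the x-axis and the iron percentage on the y-axis
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df_june["date"], df_june["% Iron Concentrate"])
ax.set_title("% Iron Concentrate on June 1, 2017")
ax.set_xlabel("Time")
ax.set_ylabel("% Iron Concentrate")
fig.autofmt_xdate()  # tilt the time labels so they do not overlap

# In a notebook the chart renders inline; in a script, call plt.show() or
# fig.savefig(...) to view it.
```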
There are a few spikes during the day for the % Iron Concentrate, but these spikes do fall between the minimum and maximum numbers for this variable.
I then created line charts for the other variables, using a for loop, to compare their changes over the same timeframe.
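The loop might be structured as in the sketch below, with one chart per variable stacked on a shared time axis. The stacked layout and the list name columns_to_plot are choices made for this example, and the data is randomly generated.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data: one invented reading per hour on June 1, 2017
rng = np.random.default_rng(3)
hours = 24
df_june = pd.DataFrame({
    "date": pd.date_range("2017-06-01", periods=hours, freq=pd.Timedelta(hours=1)),
    "% Silica Concentrate": rng.uniform(0.6, 5.5, size=hours),
    "Ore Pulp pH": rng.uniform(8.7, 10.8, size=hours),
    "Flotation Column 05 Level": rng.uniform(150.0, 650.0, size=hours),
})

columns_to_plot = ["% Silica Concentrate", "Ore Pulp pH", "Flotation Column 05 Level"]

# One row of axes per variable; sharex=True lines the charts up in time
fig, axes = plt.subplots(len(columns_to_plot), 1, figsize=(10, 9), sharex=True)

# The loop draws the same kind of line chart for each column in turn
for ax, col in zip(axes, columns_to_plot):
    ax.plot(df_june["date"], df_june[col])
    ax.set_title(col)

fig.autofmt_xdate()
fig.tight_layout()
```

Stacking the charts on one shared x-axis makes it easier to see whether the variables rise and fall at the same times of day.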
As we look at all the graphs, we can see that they all have similar changes during the day.
RECOMMENDATIONS/ INSIGHTS