Manufacturing Mining Process

BACKGROUND:

In my latest project, I wanted to get my feet wet with another data tool, Python. I will be focusing on a mining company called Metals R’ Us and analyzing their flotation plant. In this plant, Metals R’ Us extracts iron from raw ore so that it can be purified and sold. Their equipment separates the iron from its impurities by running the ore through a process in which the metals rise to the top of a liquid mixture and the other minerals remain at the bottom.


THE DATA

The data that I am using is from Kaggle: Quality Prediction in a Mining Process. The goal of this dataset is to predict how much silica (the impurity) is in the ore concentrate, using measurements that are recorded either every 20 seconds or every hour.


In this project, the questions that I will be answering are:

  • What is the count, median, min, and max for every column?
  • Was there an unusual occurrence that happened on June 1, 2017?
  • How do the most important variables, % Silica Concentrate, Ore Pulp pH, Flotation Column 05 Level, and % Iron Concentrate, correlate with each other?
  • How does the % Iron Concentrate change throughout the day on June 1st?


KEY FINDINGS

  • The count, median, min, and max for % Iron Concentrate are: Count = 737453, Median = 65.05006, Min = 62.05, Max = 68.01.
  • Nothing unusual happened on June 1, 2017 that would cause concern.
  • I found no apparent correlation between the four variables I examined.
  • The % Iron Concentrate fluctuated throughout the day on June 1st, as did the other variables I analyzed.


THE ANALYSIS

What is the count, median, min, and max for every column?

To get these summary statistics, I used a single, very simple Python command that shows them all at once.
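The original screenshots are not available, so here is a minimal sketch of how such a summary could be produced, assuming the data is loaded into a pandas data frame. The tiny data frame and its values are made up for illustration; only the column names come from the article.

```python
import pandas as pd

# Tiny stand-in for the Kaggle data; these values are made up for illustration.
df = pd.DataFrame({
    "% Iron Concentrate": [64.2, 65.1, 66.0, 65.4],
    "% Silica Concentrate": [2.9, 2.1, 1.6, 2.0],
})

# describe() reports count, mean, std, min, quartiles, and max for each numeric
# column. The "50%" row is the median; the "mean" row is a different statistic.
summary = df.describe()
print(summary)

# Pull out only the four statistics named in the question.
stats = df.agg(["count", "median", "min", "max"])
print(stats)
```

Note that describe() lists the mean and the median (50%) on separate rows, so it is worth checking which one is being quoted.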


Was there an unusual occurrence that happened on June 1, 2017?

First, I looked at the dates and returned the earliest and latest dates in the data.
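A minimal sketch of this step, assuming the timestamps live in a column named `date` (the column name is an assumption, and the rows below are made up):

```python
import pandas as pd

# Made-up rows; the "date" column name is an assumption about the dataset.
df = pd.DataFrame({
    "date": [
        "2017-05-31 23:00:00",
        "2017-06-01 00:00:00",
        "2017-06-01 12:00:00",
        "2017-06-02 01:00:00",
    ],
    "% Iron Concentrate": [64.2, 65.1, 66.0, 65.4],
})

# Convert the text timestamps to real datetimes so min/max compare by time,
# not alphabetically.
df["date"] = pd.to_datetime(df["date"])

earliest = df["date"].min()
latest = df["date"].max()
print("Earliest:", earliest)
print("Latest:  ", latest)
```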

Now I want just the data from June 1st. I get it by filtering the rows with a boolean mask and storing the result in a new data frame called df_june.
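One way the boolean mask could look is sketched below. The `date` column name and the sample rows are assumptions; `df_june` is the name used in the article.

```python
import pandas as pd

# Made-up rows spanning three days; the "date" column name is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00:00",
        "2017-06-01 00:00:00",
        "2017-06-01 12:00:00",
        "2017-06-02 01:00:00",
    ]),
    "% Iron Concentrate": [64.2, 65.1, 66.0, 65.4],
})

# Build a True/False value per row: True only for timestamps on June 1, 2017.
start = pd.Timestamp("2017-06-01")
end = start + pd.Timedelta(days=1)
mask = (df["date"] >= start) & (df["date"] < end)

# Indexing with the mask keeps only the rows where it is True.
df_june = df[mask]
print(df_june)
```

Using a half-open range (start inclusive, end exclusive) keeps every reading from June 1st without picking up midnight of June 2nd.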

After this, I created a dataframe for just the important columns.
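A sketch of selecting the important columns, assuming they are named exactly as listed in the questions above. The name `df_june_important`, the extra column, and all values are hypothetical.

```python
import pandas as pd

# Made-up June 1st rows; "Some Other Column" stands in for the many columns
# that are not needed for this part of the analysis.
df_june = pd.DataFrame({
    "% Silica Concentrate": [2.9, 2.1, 1.6],
    "Ore Pulp pH": [9.8, 10.0, 9.9],
    "Flotation Column 05 Level": [450.0, 470.5, 460.2],
    "% Iron Concentrate": [64.2, 65.1, 66.0],
    "Some Other Column": [1, 2, 3],
})

important_cols = [
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
    "% Iron Concentrate",
]

# Passing a list of names selects those columns, in that order.
df_june_important = df_june[important_cols]
print(df_june_important.head())
```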


How do the most important variables, % Silica Concentrate, Ore Pulp pH, Flotation Column 05 Level, and % Iron Concentrate, correlate with each other?


To examine this correlation, I used the Seaborn library and called pairplot with the important-columns data frame as the argument.

Looking at these data plots, there does not seem to be any correlation between the variables.
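A visual impression like this can be cross-checked with numbers. The sketch below is not part of the original analysis; it assumes pandas and uses made-up values, so the coefficients it prints say nothing about the real plant.

```python
import pandas as pd

# Made-up values for the four columns named in the article.
df_important = pd.DataFrame({
    "% Silica Concentrate": [2.9, 2.1, 1.6, 2.0, 2.4, 1.9],
    "Ore Pulp pH": [9.8, 10.0, 9.9, 9.7, 10.1, 9.9],
    "Flotation Column 05 Level": [450.0, 470.5, 460.2, 455.1, 465.3, 458.8],
    "% Iron Concentrate": [64.2, 65.1, 66.0, 65.4, 64.8, 65.6],
})

# corr() computes pairwise Pearson correlation by default.
# Values near 0 suggest little linear relationship; values near +1 or -1
# suggest a strong one.
corr = df_important.corr()
print(corr.round(2))
```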


How does the % Iron Concentrate change throughout the day on June 1st?

To dig deeper into this information, I created a line graph to show how the percentage of iron concentrate changed throughout the day.
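A sketch of such a line graph using matplotlib, assuming a `date` column holds the timestamps. The column name, the values, and the output file name are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up hourly readings for June 1st; the "date" column name is an assumption.
df_june = pd.DataFrame({
    "date": pd.date_range("2017-06-01 00:00", periods=6, freq="h"),
    "% Iron Concentrate": [64.2, 65.1, 66.0, 65.4, 64.8, 65.6],
})

fig, ax = plt.subplots(figsize=(10, 4))
# Time goes on the x-axis, the measured value on the y-axis.
ax.plot(df_june["date"], df_june["% Iron Concentrate"])
ax.set_xlabel("Time on June 1, 2017")
ax.set_ylabel("% Iron Concentrate")
ax.set_title("% Iron Concentrate throughout June 1st")
fig.savefig("iron_concentrate_june1.png")  # hypothetical output file name
```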

There are a few spikes during the day for the % Iron Concentrate, but these spikes do fall between the minimum and maximum numbers for this variable.

I then created line charts for the other variables using a for loop, so I could compare their changes over the same timeframe.
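The loop could look roughly like the sketch below, which draws one chart per remaining variable on a shared time axis. The `date` column name, the values, and the output file name are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up hourly readings for June 1st; the "date" column name is an assumption.
df_june = pd.DataFrame({
    "date": pd.date_range("2017-06-01 00:00", periods=6, freq="h"),
    "% Silica Concentrate": [2.9, 2.1, 1.6, 2.0, 2.4, 1.9],
    "Ore Pulp pH": [9.8, 10.0, 9.9, 9.7, 10.1, 9.9],
    "Flotation Column 05 Level": [450.0, 470.5, 460.2, 455.1, 465.3, 458.8],
})

other_cols = ["% Silica Concentrate", "Ore Pulp pH", "Flotation Column 05 Level"]

# One stacked panel per variable; sharex lines the charts up in time.
fig, axes = plt.subplots(len(other_cols), 1, figsize=(10, 9), sharex=True)
for ax, col in zip(axes, other_cols):
    ax.plot(df_june["date"], df_june[col])
    ax.set_title(col)
    ax.set_ylabel(col)

fig.tight_layout()
fig.savefig("other_variables_june1.png")  # hypothetical output file name
```

Sharing the x-axis makes it easier to see whether the variables move at the same times of day.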

As we look at all the graphs, we can see that they all have similar changes during the day.  


RECOMMENDATIONS/ INSIGHTS

  1. I found the count, median, min, and max for the dataset.
  2. There was no unusual occurrence that happened on June 1, 2017.
  3. I did not find any correlation between the variables of the different columns I investigated.
  4. The line charts showed similar changes between intervals throughout the day.


Thank you for reading my latest data analysis project!  Please connect with me and check out my portfolio!

By Kimberly Saylor