Python Processing Plant
Iron Flotation Process

Who likes running experiments, testing hypotheses, or even solving problems? If you are like me, the answer is a big fat YES!! So I guess it makes sense that in previous jobs I was a technician and an engineer for a manufacturing company. I loved getting my hands dirty (metaphorically and literally) in the lab and on the assembly line. Oftentimes I was responsible for designing, running and analyzing experiments to see whether our processes were as efficient as they could be.

For this data project, I have recently been “hired” as a data analyst for a manufacturing company. Let's call them Metals R' Us. I have been given data from their froth flotation processing plant. My goal was to use Python to analyze the key variables in the data and look for any irregularities that might have happened. The "plant manager" wanted an investigation done to see if there were any problems that needed to be addressed. First, let me give you an idea of what the froth flotation process involves.

Froth Flotation Process Background

The froth flotation process is widely used in mineral processing to isolate specific minerals or compounds. It filters out impurities like dirt, sand and silica while keeping only the target mineral, which in our case is iron. Basically, the company digs a big hole and collects big clumps of dirt. Those clumps contain iron, which is what the company is after so it can eventually sell it. The clumps are put through a froth flotation process to come up with cleaner iron (see figure below).

The company turns these clumps into a pulp or slurry (a mixture of water and ore), mixes it with Starch and Amina (which strip the dirt away from the iron), and then shoots air or nitrogen bubbles through the liquid mixture to get the iron to rise to the top while the impurities remain at the bottom. The iron is brought to the surface in a froth-like state and then transferred into a separate area that holds only the pure iron particles. And then you have the final pure iron product.

Froth Flotation Process

This process is also used in wastewater treatment plants, where water is separated from solids or oils. For more information on this frothy process, check out this website or watch this video on YouTube for a clearer explanation.

DATA

The data can be found here. This is real data, collected from March 2017 to September 2017, that is used to predict quality in the froth flotation process. The sampling is uneven: some columns were sampled every 20 seconds while others were sampled every hour. So a few steps were needed to clean the data set here.

Three Python libraries were needed: Pandas, Seaborn and Matplotlib. Pandas was used for data manipulation, while Seaborn and Matplotlib were used for data visualization.

Python Libraries
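
The screenshot is not reproduced here, but the import step usually looks something like the sketch below. The aliases pd, sns and plt are the common community conventions and are an assumption, not something confirmed by the original image.

```python
# Conventional aliases (assumed; the original screenshot is not available).
import pandas as pd               # data loading and manipulation
import seaborn as sns             # statistical plots built on Matplotlib
import matplotlib.pyplot as plt   # lower-level plotting and figure control

# Quick sanity check that the libraries imported correctly.
print(pd.__version__, sns.__version__)
```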

To get an idea of what the dataset contained, I used df.head() to show the first five rows with all of the columns, and df.shape to get the exact number of rows and columns. There were 737,453 rows and 24 columns in this dataset.

df.head() & df.shape
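
As a rough illustration of these two calls, here is a minimal sketch on a tiny made-up frame. The values are invented for the example and only stand in for the real 737,453 x 24 dataset.

```python
import pandas as pd

# Tiny made-up frame (illustrative values only, not the plant data).
df = pd.DataFrame({
    "date": ["2017-03-10 01:00:00"] * 6,
    "% Iron Feed": [55.2, 55.2, 55.2, 55.2, 55.2, 55.2],
    "% Silica Feed": [16.98, 16.98, 16.98, 16.98, 16.98, 16.98],
})

first_rows = df.head()      # first five rows, all columns
n_rows, n_cols = df.shape   # shape is an attribute, not a method

print(first_rows)
print(f"{n_rows} rows, {n_cols} columns")
```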

The dates in this dataset were stored as text (strings), so they had to be converted to a datetime column before they could be aggregated. Below is the check I started with to see what type the column was, followed by the conversion from string to datetime: df['date'] = pd.to_datetime(df['date']).

String to Date Format
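
A minimal sketch of this check-then-convert step, using two made-up date strings in place of the real column, could look like this:

```python
import pandas as pd

# Made-up sample: dates stored as plain strings, as in the raw file.
df = pd.DataFrame({"date": ["2017-03-10 01:00:00", "2017-06-16 12:00:00"]})

print(df["date"].dtype)                  # a text dtype before conversion

df["date"] = pd.to_datetime(df["date"])  # parse the strings into timestamps

print(df["date"].dtype)                  # a datetime64 dtype after conversion
print(df["date"].dt.month.tolist())      # datetime accessors now work
```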

Another oddity with the dataset was that commas were used as the decimal separator in the numerical data. I told pandas to treat those commas as decimal points when reading the file, so the numbers would be parsed as numbers rather than text. The call used for this was df = pd.read_csv('MiningProcess_Flotation_Plant_Database.csv', decimal=",")

Format Periods & Commas
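
To show what the decimal argument does without needing the real file, the sketch below reads a small in-memory CSV. The two data rows are invented, and the quoting of the numbers is an assumption about how a comma-decimal value sits inside a comma-separated file.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file (made-up rows).
# Numbers use a comma as the decimal mark, so they are quoted.
raw = io.StringIO(
    'date,% Iron Feed,% Silica Feed\n'
    '2017-03-10 01:00:00,"55,2","16,98"\n'
    '2017-03-10 02:00:00,"57,1","15,30"\n'
)

# decimal="," tells pandas to read "55,2" as the float 55.2.
df = pd.read_csv(raw, decimal=",")

print(df.dtypes)
print(df)
```

Without decimal=",", these columns would be read as text and none of the later statistics or plots would work on them.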

The data dictionary shown below describes the columns used in the dataset. In the column names, flow is how fast something is moving and level is the height the froth reaches from all the bubbles.

Data Dictionary

The second to last column is "% Iron Concentrate", which is one of the main items to focus on. This tells us how pure the iron is at the end of the flotation process. To access a single column, the expression df["% Iron Concentrate"] returns the values of that specific column.

Index Specific Column

To access certain rows instead, df.iloc[176:183, :] selects rows by position (here positions 176 through 182, since the end of the slice is excluded) along with all of their columns.

Index Specific Rows
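
Both kinds of indexing can be tried on a small made-up frame. The slice 1:4 below is a scaled-down stand-in for 176:183, and the numbers are invented.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "% Iron Concentrate": [66.9, 67.1, 65.4, 64.8, 66.0],
    "% Silica Concentrate": [1.3, 1.1, 2.6, 3.1, 2.0],
})

iron = df["% Iron Concentrate"]   # one column, returned as a Series
rows = df.iloc[1:4, :]            # rows at positions 1, 2, 3; all columns

print(iron)
print(rows)
```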

Analysis

Now onto the analysis portion. The boss asked to get some summary statistics for each of the columns. They want to know the average and median, as well as the min & max for every column.

Summary Stats

By using df.describe() we can see the requested information in an easy-to-read format. Each column gets its own summary stats.

Summary Stats per Column
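
Here is a minimal sketch of describe() on made-up numbers. Note that the median appears in the output under the "50%" label.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "% Iron Concentrate": [64.0, 65.0, 66.0, 67.0, 68.0],
    "Ore Pulp pH": [9.5, 9.7, 9.9, 10.1, 10.3],
})

stats = df.describe()   # count, mean, std, min, quartiles, max per column

# Pull out just the rows the boss asked about; "50%" is the median.
print(stats.loc[["mean", "50%", "min", "max"]])
```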

And if we want Python to tell us the date range we are working with, meaning the start, midpoint and end date, we can use the max, min and median functions on the date column. See below for details.

Max, Min, Median Date
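
A sketch of this step on five made-up timestamps is shown below. The dates are invented for the example and are not the real boundaries of the dataset.

```python
import pandas as pd

# Five made-up timestamps (illustrative only).
dates = pd.to_datetime(pd.Series([
    "2017-03-10 01:00:00",
    "2017-04-20 08:00:00",
    "2017-06-16 12:00:00",
    "2017-08-01 00:00:00",
    "2017-09-09 23:00:00",
]))

start = dates.min()        # earliest timestamp
end = dates.max()          # latest timestamp
midpoint = dates.median()  # middle value of the sorted timestamps

print(start, midpoint, end)
```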

June 16 Irregularity?

The boss was going through some reports and wanted me to investigate anything unusual that might have happened on June 16, 2017 (our halfway point of the gathered dataset).

First, I needed to filter the rows with a boolean mask and create a new dataframe df_june:

16 June 2017 All data

In plain words, this says: create a new dataframe called df_june that is just the old dataframe, but only the rows where the date is later than June 15, 2017 at midnight and earlier than June 17, 2017. The & sign requires both conditions to be met. This reduces the number of rows, but we still have all the columns.
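
A boolean mask of this kind can be sketched as follows. The rows are made up, and the exact boundary comparisons (>= the start of June 16 and < the start of June 17) are my assumption about how to capture a single day; the original screenshot may have used different cutoffs.

```python
import pandas as pd

# Made-up rows around the day of interest.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-15 23:00:00",
        "2017-06-16 00:00:00",
        "2017-06-16 12:00:00",
        "2017-06-16 23:00:00",
        "2017-06-17 00:00:00",
    ]),
    "% Iron Concentrate": [65.1, 64.9, 66.3, 66.8, 66.7],
})

# Each comparison yields a True/False Series; & combines them row by row.
# The parentheses are required because & binds tighter than >= and <.
mask = (df["date"] >= "2017-06-16") & (df["date"] < "2017-06-17")
df_june = df[mask]

print(df_june)
```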

I then created a variable holding a list of all the important columns we want to focus on. The % Iron Concentrate is the most important column, along with the % Silica Concentrate, Ore Pulp pH, Flotation Column 05 Level and Date. I called this variable important_cols. After setting that variable, I simply created a new dataframe called df_june_important and set it equal to the older dataframe (df_june) indexed by important_cols.

Important Columns 16 June 2017
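
Selecting a list of columns can be sketched like this. The two rows are made up, and "Starch Flow" is included only to show a column being left out.

```python
import pandas as pd

# Made-up stand-in for df_june with one extra column.
df_june = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-16 00:00:00", "2017-06-16 01:00:00"]),
    "Starch Flow": [3019.5, 3024.4],
    "Ore Pulp pH": [10.0, 10.1],
    "Flotation Column 05 Level": [457.4, 432.9],
    "% Iron Concentrate": [66.9, 67.1],
    "% Silica Concentrate": [1.3, 1.1],
})

# Date and % Iron Concentrate come first, then the other three.
important_cols = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]

# Passing a list of names returns a DataFrame with only those columns.
df_june_important = df_june[important_cols]
print(df_june_important)
```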

HOW DO THEY RELATE?

Now we have isolated our variables and have graphs to look at, but what does it all mean? It's hard to tell based on those small graphs, and the boss wanted to know how these variables all relate to one another. Looking at all these variables and their relationships would require 6 different scatter plots. The data visualization library Seaborn allows us to do this easily with just one line of code, the pairplot.

Relational Scatter Plots

This just seems like even more confusing visuals. So I decided to overlay linear regression fits on the graphs.

Regression Models
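
I can't reproduce the exact call from the screenshot, so the sketch below uses sns.regplot on a single pair of columns as one way to overlay a linear fit, and adds a correlation coefficient to put a number on the relationship. The data is synthetic and built to have a negative slope.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with a deliberately negative relationship (invented).
hours = np.arange(24)
iron = np.linspace(64.0, 67.0, 24)
df = pd.DataFrame({
    "% Iron Concentrate": iron,
    "% Silica Concentrate": 4.0 - 0.8 * (iron - 64.0) + 0.05 * np.sin(hours),
})

# Scatter plot with a fitted regression line drawn on top.
ax = sns.regplot(data=df, x="% Iron Concentrate", y="% Silica Concentrate")

# Pearson correlation: values near -1 mean a strong negative relationship.
r = df["% Iron Concentrate"].corr(df["% Silica Concentrate"])
print(f"Pearson r = {r:.3f}")
```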

Now this was a lot more helpful. It shows us a strong negative correlation between % Iron Concentrate and % Silica Concentrate. The other relationships are weak and don't tell us much about how the variables are related. But it would appear that, on this particular day, a higher % Iron Concentrate went along with a lower % Silica Concentrate.

More Detail

The boss was still a bit confused and wanted to see how the % Iron Concentrate changed throughout that day. So I made a line plot with Seaborn to show this graph, where the date is expressed in hours.

16 June 2017 % Iron Concentrate Fluctuations

This graph clearly shows us that there was a major increase in % Iron Concentrate between 6:00 am and 9:00 am, and again between 9:00 am and 12:00 pm. This lines up with the sharp decrease in % Silica Concentrate seen in the graph before.

The boss found this graph to be very useful. They wanted to see the other variables (important columns) across the same time frame. I thought about putting them all on the same graph, but the units of measure are actually very different from one another. The percentages for Iron and Silica will always be between 0 and 100, but the pH has much lower values whereas the flotation level has much higher values.

So instead, I made a few separate graphs to show the boss the fluctuation over the course of the day by using a for loop in Python. I skipped past the first 2 variables because we didn't need a date-to-date comparison or a repeat of the graph we just saw.

Loop Function
% Silica & Ore Pulp pH
Flotation

CONCLUSION

As was mentioned earlier, it is clear that when the % Iron Concentrate went up, the % Silica Concentrate went down at the same time. There is nothing significant about the fluctuation of pH. But it is noteworthy to see the sudden jump in flotation level at the end of the day on June 16. This could be explained in part by closing down the plant and stopping the processes, leading to an overfill of material. This data can be useful for future comparisons after giving it over to the boss.

Python and its libraries are able to produce charts fairly easily to show the information you are trying to analyze and can keep businesses running smoothly.

Thank you for reading all of this. If you have any questions, feel free to comment below or connect with me, Brock Johnson, here on LinkedIn.

I am looking for new opportunities and roles in the data world, so if you hear of any or are in the market please reach out, thanks!
