Python project: This is Not Another Iron Mining Analysis
Background
For this week's project, we will step into the role of a data analyst hired by a mining company called Metals R' Us. Our first task in this new role will be analyzing data from their flotation plant. The results of this analysis will help determine the purity of the iron concentrate.
The Dataset
The dataset we will analyze in this project was taken from Kaggle and contains real data collected between March 2017 and September 2017. It has 24 columns and 737,453 rows.
Some columns were sampled every 20 seconds, while others were sampled hourly. The date stamp includes the day, month, year, and hour, but does not show the minutes. If you are interested in reviewing this dataset, you can find it at the following link
Please note that this report is for educational purposes only and is part of a project for Data Career Jumpstart. For this project, we will use #Excel and the browser-based notebook #DeepNote, which will allow us to use #Python for this analysis.
Data Cleaning
Before starting our analysis, we need to import the libraries Pandas (for data manipulation) and Seaborn and Matplotlib (for data visualization). Since time will be one of the key measures in our visualizations, we also need to redefine the date column, which is currently being read as a string. Using the Pandas function to_datetime(), we will convert the date column to a datetime type.
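A minimal sketch of this step is shown below. The two-row DataFrame stands in for the flotation-plant CSV, and the column names ("date", "% Iron Concentrate") are assumed to match the Kaggle dataset's headers:

```python
import pandas as pd

# Small synthetic sample standing in for the flotation-plant data;
# the column names mirror the Kaggle dataset's headers.
df = pd.DataFrame({
    "date": ["2017-07-16 01:00:00", "2017-07-16 02:00:00"],
    "% Iron Concentrate": [65.2, 65.4],
})

# "date" is currently a string column; convert it to datetime64
# so we can filter and plot by time later.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)  # datetime64[ns]
```

When loading the real file, the same to_datetime() call applies to the full "date" column after pd.read_csv().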
The Data Analysis
The first step in this analysis will be to provide summary statistics for each column. We will use the Pandas method df.describe(), which reports the count, mean, standard deviation, minimum, quartiles (including the median), and maximum for each numeric column.
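As a quick sketch, here is describe() applied to a tiny synthetic stand-in for the plant data (the column name is illustrative):

```python
import pandas as pd

# Synthetic stand-in for the plant data
df = pd.DataFrame({"% Iron Concentrate": [64.5, 65.0, 66.5, 65.5]})

# describe() summarizes each numeric column: count, mean, std,
# min, 25%/50%/75% quartiles, and max
summary = df.describe()
print(summary)
```

The 50% row of the output is the median, so one call covers all the statistics mentioned above.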
Given the nature of this business, we have a series of relevant variables to monitor.
Our engineering team asked us to investigate the readings from July 16, 2017, as it seems something unusual happened. To do this, we will use a date filter with a boolean mask to create a new DataFrame, df_july.
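The boolean-mask filter can be sketched as follows, using a small synthetic DataFrame in place of the full dataset:

```python
import pandas as pd

# Synthetic stand-in; real data would span March-September 2017
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-07-15 23:00",
                            "2017-07-16 03:00",
                            "2017-07-17 01:00"]),
    "% Iron Concentrate": [65.1, 65.8, 64.9],
})

# Boolean mask: True only for rows that fall on July 16, 2017
mask = (df["date"] >= "2017-07-16") & (df["date"] < "2017-07-17")
df_july = df[mask]
print(len(df_july))  # 1 row survives the filter
```

Because the "date" column is already a datetime type, pandas compares it directly against the date strings in the mask.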
To focus on specific columns in our existing DataFrame (df_july), we can create a list called important_cols that contains the column names we want to highlight. Then, in one step, we will create a new DataFrame, df_july_important, that contains only the columns from df_july that match the names in important_cols. This will filter df_july to keep only the information we're interested in.
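A sketch of this column selection is below; the entries in important_cols are illustrative and should be replaced with the dataset's actual headers:

```python
import pandas as pd

# Synthetic stand-in for df_july with a few example columns
df_july = pd.DataFrame({
    "date": pd.to_datetime(["2017-07-16 01:00"]),
    "% Iron Concentrate": [65.8],
    "% Silica Concentrate": [2.1],
})

# List the columns we want to keep (names are illustrative)
important_cols = ["date", "% Iron Concentrate"]

# Indexing with a list of names returns only those columns
df_july_important = df_july[important_cols]
print(list(df_july_important.columns))
```

This one-step selection keeps df_july intact while giving us a narrower DataFrame to work with.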
Correlations
Now, the head of engineering has reached out to ask whether these variables are correlated. Using the data visualization library Seaborn, we created scatterplots of the four most relevant variables.
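One such scatterplot can be sketched like this; the data is synthetic and the column names are assumed from the Kaggle dataset:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen (no display needed)
import pandas as pd
import seaborn as sns

# Synthetic stand-in for df_july; column names follow the Kaggle dataset
df_july = pd.DataFrame({
    "Flotation Column 05 Level": [480, 500, 510, 495],
    "% Iron Concentrate": [65.0, 65.3, 65.1, 65.4],
})

# One variable against % Iron Concentrate; repeat for each pair of interest
ax = sns.scatterplot(data=df_july,
                     x="Flotation Column 05 Level",
                     y="% Iron Concentrate")
```

Repeating the call with different x columns produces one scatterplot per variable pair.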
Since it is not easy to identify any correlations in the scatterplots, a correlation matrix will provide a clearer perspective on any insights.
The matrix confirmed that all the correlation values are low. The highest correlation is between % Iron Concentrate and Flotation Column 05 Level, with a value of 0.09.
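The correlation matrix itself is one line with pandas, and Seaborn's heatmap makes it readable at a glance. This sketch uses synthetic data, so its correlation values will not match the 0.09 reported above:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the variables of interest
df = pd.DataFrame({
    "% Iron Concentrate": [65.0, 65.3, 65.1, 65.4],
    "Flotation Column 05 Level": [480, 500, 510, 495],
})

corr = df.corr()  # pairwise Pearson correlations between numeric columns

# annot=True prints each correlation value inside its cell
ax = sns.heatmap(corr, annot=True, cmap="coolwarm")
```

Values near 0 (like the 0.09 found here) indicate little to no linear relationship between the variables.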
% Iron Concentrate
The last step in this study is analyzing how % Iron Concentrate changed throughout July 16. Once again using the Seaborn library, we will visualize this information.
The highest and lowest percentages of iron concentrate both occur near the end of the day, with the highest around 66.5% and the lowest around 64.5%. Given that roughly two-percentage-point range, it is safe to say the concentration levels remained stable throughout the day, with no significant changes.
Conclusion
Throughout the development of this project, it was easy to understand the importance of data for every industry. Mining is no exception, where the right analysis of information can save a company millions of dollars and ensure optimal resource allocation. Although there were no major findings when analyzing the top five most relevant variables for the company, this project demonstrated how using Python together with different libraries can help make informed business decisions.
Thank you for taking the time to read my new article. I would love to hear your thoughts and comments! It means a lot to me. This project was part of Avery Smith's Data Career Jumpstart Bootcamp, where I am diving deeper into the data world and always learning. Please follow me at Andres Cordero and stay connected on this data journey!