Python project: This is Not Another Iron Mining Analysis

Background

For this week's project, we will step into the shoes of a data analyst hired by a mining company called Metals R' Us. Our first task in this new role is to analyze data from their flotation plant. The results of this analysis will help determine the purity of the iron concentrate.

The Dataset

The dataset we will analyze in this project was taken from Kaggle and contains real data collected between March 2017 and September 2017. This dataset has 24 columns and 737453 rows.


Some columns were sampled every 20 seconds and others every hour. The date column records the day, month, year, and hour, but not the minutes. If you are interested in reviewing this dataset, you can find it on Kaggle.

Please note that this report is for educational purposes only and is part of a project for Data Career Jumpstart. For this project, we will use #Excel and the browser-based notebook #DeepNote, which will allow us to use #Python for this analysis.

Data Cleaning

Before starting our analysis, we need to import the libraries Pandas (for data manipulation) and Seaborn and Matplotlib (for data visualization) into our notebook. Because time will be one of the key dimensions in our visualizations, we also need to convert the date column, which is currently read as a string. The Pandas function to_datetime() converts it to datetime.
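As a rough illustration of that conversion, here is a minimal sketch. The tiny DataFrame stands in for the real file, and the column name "date", the timestamp format, and the sample values are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the real dataset.
# The column name "date" and the values below are assumptions.
df = pd.DataFrame({
    "date": ["2017-07-16 00:00:00", "2017-07-16 01:00:00", "2017-07-16 02:00:00"],
    "% Iron Concentrate": [65.1, 65.4, 64.9],
})

# Before conversion the column holds plain strings.
print(df["date"].dtype)

# Convert the strings to pandas datetime values.
df["date"] = pd.to_datetime(df["date"])

# After conversion, datetime accessors such as .dt.hour become available.
print(df["date"].dtype)
print(df["date"].dt.hour.tolist())
```

Once the column is a datetime type, date comparisons and time-based plots behave as expected instead of sorting or comparing text.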


The Data Analysis

The first step in this analysis will be to provide a summary of statistics for each column. We will use the Pandas function df.describe() to calculate the average, median, as well as minimum and maximum values.
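To show what that summary looks like, here is a small sketch on made-up numbers (the values are hypothetical, not taken from the dataset). Note that describe() reports the median in the row labeled "50%".

```python
import pandas as pd

# Hypothetical values used only to illustrate the shape of the output.
df = pd.DataFrame({
    "% Iron Concentrate": [64.5, 65.0, 65.5, 66.5],
    "Ore Pulp pH": [9.5, 9.7, 9.9, 10.1],
})

# describe() returns count, mean, std, min, quartiles, and max per numeric column.
summary = df.describe()

# Keep only the rows discussed in the article; "50%" is the median.
print(summary.loc[["mean", "50%", "min", "max"]])
```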


Given the nature of this business, we have a series of relevant variables to monitor:

  • % Iron Concentrate
  • Silica Concentrate
  • Ore Pulp pH
  • Flotation Column 05 Level

Our engineering team asked us to investigate the readings from July 16, 2017, as it seems something unusual happened. To do this, we will use a date filter with a boolean mask to create a new DataFrame, df_july.
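One way such a filter could look is sketched below. The sample rows are hypothetical, and the half-open interval (from midnight on July 16 up to, but not including, midnight on July 17) is one reasonable choice rather than necessarily the exact mask used in the notebook.

```python
import pandas as pd

# Hypothetical rows around the day of interest; "date" is an assumed column name.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-07-15 23:00:00",
        "2017-07-16 00:00:00",
        "2017-07-16 23:00:00",
        "2017-07-17 00:00:00",
    ]),
    "% Iron Concentrate": [65.0, 65.2, 66.1, 64.8],
})

# Boolean mask: True for rows that fall on July 16, 2017.
start = pd.Timestamp("2017-07-16")
end = pd.Timestamp("2017-07-17")
mask = (df["date"] >= start) & (df["date"] < end)

# Indexing with the mask keeps only the rows where it is True.
df_july = df[mask]
print(df_july)
```

Using a strict "less than" on the next day's midnight avoids having to spell out the last timestamp of July 16.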


To focus on specific columns in our existing DataFrame (df_july), we can create a list called important_cols that contains the column names we want to highlight. Then, in one step, we will create a new DataFrame, df_july_important, that contains only the columns from df_july that match the names in important_cols. This will filter df_july to keep only the information we're interested in.
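A minimal sketch of that column selection follows. The data is hypothetical, the column names follow the list given earlier (the real file may spell them differently), "Starch Flow" is an invented extra column, and keeping "date" in important_cols is an assumption made so the time information is still available later.

```python
import pandas as pd

# Hypothetical July 16 data with one extra column that we do not want to keep.
df_july = pd.DataFrame({
    "date": pd.to_datetime(["2017-07-16 00:00:00", "2017-07-16 01:00:00"]),
    "% Iron Concentrate": [65.2, 65.4],
    "Silica Concentrate": [2.1, 2.0],
    "Ore Pulp pH": [9.8, 9.9],
    "Flotation Column 05 Level": [410.0, 395.0],
    "Starch Flow": [3000.0, 3100.0],  # invented column, dropped below
})

# Names of the columns to keep; including "date" is an assumption.
important_cols = [
    "date",
    "% Iron Concentrate",
    "Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]

# Indexing with a list of names returns a DataFrame with only those columns.
df_july_important = df_july[important_cols]
print(df_july_important.columns.tolist())
```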


Correlations

Next, the head of engineering asked whether these variables are correlated. Using the data visualization library Seaborn, we create scatterplots of the four most relevant variables.


Since it is not easy to identify any correlations in the scatterplots, a correlation matrix gives a clearer picture.
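As a sketch of how such a matrix can be computed, the Pandas method corr() returns the pairwise Pearson correlation between numeric columns. The values below are hypothetical and chosen so the result is easy to check by hand: pH rises in step with iron concentrate (correlation 1), while the column level has no linear relationship with it (correlation 0).

```python
import pandas as pd

# Hypothetical values picked so the correlations are easy to verify by hand.
df_july_important = pd.DataFrame({
    "% Iron Concentrate": [64.5, 65.0, 65.5, 66.0],
    "Ore Pulp pH": [9.6, 9.8, 10.0, 10.2],
    "Flotation Column 05 Level": [400.0, 380.0, 380.0, 400.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df_july_important.corr()
print(corr.round(2))
```

Passing corr to Seaborn's heatmap() function with annot=True is a common way to display the matrix as a colored grid with the values written in each cell.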


The matrix confirmed that all the correlation values are low. The highest correlation is between % Iron Concentrate and Flotation Column 05 Level, with a value of 0.09.

% Iron Concentrate

The last step in this study is to analyze how % Iron Concentrate changes throughout July 16. Once again, we will use the Seaborn library to visualize this information.


Both the highest and lowest percentages of iron concentrate occur near the end of the day, with the highest around 66.5% and the lowest around 64.5%. With a spread of only about two percentage points, the concentration levels appear to have remained stable throughout the day, with no significant changes.

Conclusion

Working through this project made clear how important data is in every industry. Mining is no exception: the right analysis can save a company millions of dollars and ensure optimal resource allocation. Although there were no major findings when analyzing the four most relevant variables for the company, this project demonstrated how Python, together with its libraries, can support informed business decisions.

Thank you for taking the time to read my new article. I would love to hear your thoughts and comments! It means a lot to me. This project was part of Avery Smith's Data Career Jumpstart Bootcamp, which I am currently taking. I am diving deeper into the data world and always learning. Please follow me at Andres Cordero and stay connected on this data journey!




