Pythonizing Business Efficiency: Bioenergy Supply Chain Optimization
Quest for Sustainable Energy
So, I recently participated in my very first Shell.ai Hackathon (Waste to Energy — Shell.ai Hackathon 2023). This year's problem is one that most, if not all, businesses intrinsically face today: supply chain optimization. Irrespective of the industry, the million-dollar question is: how can we increase efficiency while reducing cost? In this case study, we are dealing with the bioenergy industry, something I am excited about as the world cranks down on fossil fuels.
I will now provide an overview of the problem. As promising as bioenergy is as an alternative fuel source, it comes with caveats that limit its widespread adoption.
Now, let us envision the setup of a biorefinery in a geographic region. To do this effectively, one needs a deep understanding of the region’s current and future biomass production. The collection and transportation of this biomass to intermediate depots become crucial steps in the process. It is a complex puzzle that requires careful planning and coordination.
In a nutshell, while the potential for cleaner alternatives is immense, there are hurdles to jump. However, with determination, collaboration, and innovation, we can clear these hurdles, leading us towards a more sustainable and eco-friendly future for all.
Data is Everything
You may refer to the original competition's problem statement here, but I encourage you to read along to better understand it in "natural language".
If you look over the state of Gujarat, India, on Google Maps, you’ll find agricultural fields spread all over it. These are the fields where the biomass is generated and harvested in variable quantities, i.e., each site may produce a different amount of biomass, which could vary over time (in this case, years). This fluctuation could be due to several factors, such as the size of the agricultural field, its productivity, type of crop, and climatic conditions.
Furthermore, the fields are interconnected by road networks. Because of curves, corners, and one-way stretches, the road leading from field A to field B is, in some cases, not the same road that leads from field B back to field A. Thus, the distance between two fields can differ depending on the direction of travel. Since the biomass is sparsely distributed across the region, it makes sense to have intermediate locations, like depots, where we can gather biomass from nearby fields to be preprocessed (i.e., dehydrated and densified into pellets) before it is shipped to the nearest refinery.
Since we do not have infinite resources, we would have to consider some constraints in solving this problem.
To solve this problem, we need data. Thankfully, we don't have to go scavenging for it. Here is the dataset we'll be working with:
Biomass_History.csv: Here, we have a time series of biomass production in Gujarat from 2010 to 2017. The arable land is depicted as a map comprising 2418 uniformly sized grid blocks, each representing a distinct harvesting site. Each site comes with a location index, latitude, and longitude.
Distance_Matrix.csv: The travel distance between the source grid block and the destination grid block is encapsulated within a matrix of dimensions 2418 by 2418. As hinted above, it’s important to note that this matrix is not symmetric owing to factors such as U-turns and one-way routes, which contribute to variations in distances for trips from source to destination as opposed to trips from destination to source.
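A quick way to confirm this asymmetry is to compare the matrix with its transpose. The sketch below uses a tiny hypothetical 3×3 matrix in place of the real 2418×2418 file (the values are made up purely for illustration), but the same check applies to `dm_dataset` once loaded:

```python
import numpy as np
import pandas as pd

# Toy 3x3 stand-in for the real 2418x2418 distance matrix
# (hypothetical values; site 0 -> 1 differs from 1 -> 0, e.g. a one-way route).
dm = pd.DataFrame(
    [[0.0, 12.5, 7.1],
     [14.2, 0.0, 9.8],
     [7.1, 9.8, 0.0]]
)

# A symmetric matrix equals its transpose; a distance matrix with
# one-way routes and U-turns should fail this check.
is_symmetric = np.allclose(dm.values, dm.values.T)
print(is_symmetric)
```

Here the 12.5 km trip from site 0 to site 1 takes 14.2 km on the return leg, so the check reports the matrix as asymmetric.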
sample_submission.csv: Contains sample format for the solution submissions.
You can find the full dataset here.
Given the prerequisites above, our task is to:
First, forecast the volume of biomass for each biomass site for both the year 2018 and 2019.
Second, determine the optimal number of depots and refineries needed to process that amount of forecasted biomass. It should be capable of processing at least 80% of the total forecasted biomass each year.
Third, determine the optimal locations to build the depots, considering the amount of biomass produced at each site and the distances between sites. Depots must be within the distance matrix of the biomass sites. Indeed, my first thought was to locate the depots at the harvest sites that produce the most biomass, but that idea considers only the amount, not the distances the biomass has to travel. As you'll soon see on the map, some of the most productive biomass sites cluster in a few areas. We could build the 25 depots there, but that would not be optimal: the remaining sites would have to haul their biomass over longer distances for preprocessing, defeating the reason we decided to have these depots in the first place. The same goes for the refineries.
Fourth, we must allocate the sites (location) and the amount (quantity) of biomass to haul to each depot. Subsequently, we must allocate the site and the amount of pellets to haul from each depot to the refineries.
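To make the first task concrete, here is one simple baseline for the forecasting step: fit a straight line through each site's yearly history and extrapolate to 2018 and 2019. This is a minimal sketch on a toy two-site frame (the column layout mimics Biomass_History.csv, which is an assumption), not the final method I will present in later posts:

```python
import numpy as np
import pandas as pd

# Toy stand-in for Biomass_History.csv: yearly production per site.
# Two hypothetical sites with perfectly linear trends, for illustration.
years = np.arange(2010, 2018)
history = pd.DataFrame(
    {str(y): [100 + 5 * (y - 2010), 200 - 3 * (y - 2010)] for y in years},
    index=["site_0", "site_1"],
)

def linear_forecast(row, target_year):
    """Fit a straight line through a site's history and extrapolate."""
    slope, intercept = np.polyfit(years, row.values, deg=1)
    return slope * target_year + intercept

# Add forecast columns for 2018 and 2019, one site (row) at a time.
for y in (2018, 2019):
    history[str(y)] = history[list(map(str, years))].apply(
        linear_forecast, axis=1, target_year=y
    )

print(history[["2018", "2019"]])
```

A per-site linear trend is only a starting point; real harvests depend on crop type, field productivity, and climate, so a richer model may well do better.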
Without further ado, let’s jump right in.
Step 1: Exploratory Data Analysis (EDA)
We begin by importing several Python packages.
import os
import pandas as pd
import geopandas as gpd
pd.set_option('display.max_columns', 200)
Next, we load our datasets.
DATA_PATH = "/kaggle/input/shell-ai-waste-to-energy-dataset"
bh_dataset = pd.read_csv(os.path.join(DATA_PATH, "Biomass_History.csv"))
dm_dataset = pd.read_csv(os.path.join(DATA_PATH, "Distance_Matrix.csv"))
ss_dataset = pd.read_csv(os.path.join(DATA_PATH, "sample_submission.csv"))
bh_dataset.head()
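A quick sanity check at this point is to sum each year column to see the total biomass produced across all sites per year. Since the competition CSVs aren't bundled here, the sketch below runs on a tiny hypothetical frame with the same column layout I described above; on the real data you would apply the same two lines to `bh_dataset`:

```python
import pandas as pd

# Toy frame mirroring Biomass_History.csv's assumed layout:
# a location index, latitude, longitude, then one column per year.
bh = pd.DataFrame({
    "Index": [0, 1, 2],
    "Latitude": [22.47, 22.51, 22.55],
    "Longitude": [71.68, 71.72, 71.76],
    "2016": [11.2, 0.0, 48.9],
    "2017": [13.5, 2.1, 50.4],
})

# Year columns are the ones whose names are purely digits.
year_cols = [c for c in bh.columns if c.isdigit()]
yearly_totals = bh[year_cols].sum()
print(yearly_totals)
```

These totals matter later because the depots and refineries must together handle at least 80% of the forecasted biomass each year.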
Next, we combine the longitudes and latitudes to create point geometries for plotting.
# Create point geometries
geometry = gpd.points_from_xy(bh_dataset.Longitude, bh_dataset.Latitude)
geo_df = gpd.GeoDataFrame(bh_dataset, geometry=geometry)
# Plot biomass production distribution for the year 2017
geo_df.plot("2017", legend=True)
To keep the length of the post in check, I will be publishing the solution in chunks over several manageable posts. Please leave a like (clap) if you enjoyed this post. See you soon!