Exploring Inventory Optimization with Machine Learning - Deep Q Network: A Beginner's Journey

Meet Mr. Frost (our fictional character), the proud owner of "Fresh Ice Cream," an ice cream shop known for its delicious, freshly churned treats.

Recently, Mr. Frost wanted to improve his demand forecasting and inventory management to maximize his profits.

He started studying machine learning, a branch of artificial intelligence that develops algorithms that let computers learn from data and make predictions or decisions without being explicitly programmed.

He decided to start with reinforcement learning after reading Mr. Xie's article (https://towardsdatascience.com/a-reinforcement-learning-based-inventory-control-policy-for-retailers-ac35bc592278). In reinforcement learning, an agent learns to interact with an environment by taking actions and receiving feedback in the form of rewards or penalties. Deep reinforcement learning extends these techniques with deep neural networks that learn complex representations of the environment, which can improve performance on problems with many possible states.
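
To make the action/reward loop concrete, here is a minimal sketch of the classic tabular Q-learning update. It is not the Deep Q Network used in this article: a DQN replaces the table with a neural network, but the update target has the same shape. The function names and the values of alpha, gamma, and epsilon are illustrative assumptions.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the TD error.

    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Target: immediate reward plus discounted value of the best next action.
    td_target = reward + gamma * np.max(q_table[next_state])
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error


def epsilon_greedy(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))


# Tiny demo: 2 states, 2 actions, all Q-values start at zero.
q = np.zeros((2, 2))
rng = np.random.default_rng(0)
q_update(q, state=0, action=1, reward=1.0, next_state=1)
best_action = epsilon_greedy(q, state=0, epsilon=0.0, rng=rng)
```

With epsilon set to 0 the agent always exploits what it has learned; during training a larger epsilon keeps it exploring other order decisions.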

Mr. Frost meticulously analyzed the demand patterns for his ice cream. From Monday to Thursday, the demand follows a normal distribution N(3, 1.5), while Friday sees a surge following N(6, 1). The weekend brings even higher demand, with Saturday and Sunday following N(12, 2) distributions. He was amazed that the patterns were the same as the ones in Mr. Xie's article (I used the same Deep Q Network parameters in the Python code as Mr. Xie, but I added the shelf-life challenge and used a time-based ordering system instead of one based on stock amount).
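
The demand pattern above could be simulated roughly as follows. This is a sketch under assumptions: the second number in N(mean, x) is treated as a standard deviation, samples are rounded to whole units and clipped at zero, and day index 0 is taken to be Sunday. The names DEMAND_PARAMS and sample_demand are hypothetical. Note that the daily means add up to 42 units per week.

```python
import numpy as np

# (mean, std) per day of week; index 0 = Sunday ... 6 = Saturday (assumed).
DEMAND_PARAMS = [
    (12, 2),   # Sunday
    (3, 1.5),  # Monday
    (3, 1.5),  # Tuesday
    (3, 1.5),  # Wednesday
    (3, 1.5),  # Thursday
    (6, 1),    # Friday
    (12, 2),   # Saturday
]


def sample_demand(n_days, start_day=1, seed=None):
    """Draw integer daily demand for n_days, starting on start_day."""
    rng = np.random.default_rng(seed)
    days = (start_day + np.arange(n_days)) % 7
    means = np.array([DEMAND_PARAMS[d][0] for d in days], dtype=float)
    stds = np.array([DEMAND_PARAMS[d][1] for d in days], dtype=float)
    raw = rng.normal(means, stds)
    # Demand cannot be negative or fractional in this sketch.
    return np.maximum(0, np.rint(raw)).astype(int)


# Sum of the daily means: 4 * 3 + 6 + 2 * 12 = 42 units per week.
expected_weekly_demand = sum(mean for mean, _ in DEMAND_PARAMS)
quarter_demand = sample_demand(91, start_day=1, seed=0)  # 13 weeks
```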

Mr. Frost's Current Strategy: After conducting numerous simulations using Excel, Mr. Frost concluded that ordering ice cream once a week, specifically 42 units per week, aligned with the average weekly demand. To prevent waste from the ice cream's short 7-day shelf life, he set out to optimize his inventory management.

Key Parameters:

  • Freezer capacity: 60 units
  • Holding costs: $0.5 per unit
  • Sales price: $5 per unit
  • Order cost for 42 units: $50
  • Delivery lead time: 2 days
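
One way to read these parameters as a daily profit is sketched below. It assumes the holding cost is charged per unit in stock per day and that the $50 is a fixed fee per order, which matches how the reward is computed in the step function shared later in this article. ShopParams and daily_profit are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShopParams:
    capacity: int = 60              # freezer capacity in units
    holding_cost: float = 0.5       # assumed: per unit in stock, per day
    unit_price: float = 5.0         # sales price per unit
    fixed_order_cost: float = 50.0  # assumed: flat fee per order placed
    lead_time: int = 2              # delivery lead time in days


def daily_profit(params, inv_level, demand, order_placed):
    """Revenue minus holding cost minus order fee for a single day."""
    units_sold = min(demand, inv_level)  # cannot sell more than is in stock
    revenue = units_sold * params.unit_price
    holding = params.holding_cost * inv_level
    ordering = params.fixed_order_cost if order_placed else 0.0
    return revenue - holding - ordering


params = ShopParams()
# Day with a full weekly delivery on hand and an order placed:
profit_order_day = daily_profit(params, inv_level=42, demand=12, order_placed=True)
# Day with low stock, where demand exceeds inventory:
profit_stockout_day = daily_profit(params, inv_level=10, demand=12, order_placed=False)
```

Under these assumptions the order day loses money (60 in revenue against 21 in holding cost and 50 in order fees), which is why the timing and size of orders matter so much.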

This strategy was yielding around $1,100 per quarter for Mr. Frost. Could a machine learning algorithm improve his profits?


The Python Code: Mr. Frost's journey led him to write Python code implementing the Deep Q Network to calculate profits for each interaction. His code incorporated considerations for shelf life, ensuring efficient utilization of inventory while minimizing waste.

Results and Insights: Training the model over 1000 iterations yielded promising results. The Reinforcement Learning model demonstrated an average profit of $2500, a significant improvement over Mr. Frost's original method, which yielded $1200 in profits.

Looking Ahead: Impressed by the outcomes, Mr. Frost is now considering further exploration into Machine Learning to refine his inventory policies. He envisions extending this project to benefit not only his ice cream parlor but also other ice cream shops seeking optimization.


As described above, I changed the initial parameters and created a new time-based inventory replenishment technique. I believe the most difficult part to rewrite would be the shelf life logic, so I am adding it below:

    def step(self, action):
        # Place an order if requested; y flags whether the fixed order cost applies.
        if action > 0:
            y = 1
            self.order_arrival_list.append([self.current_period + self.lead_time, action])
        else:
            y = 0
        # Receive the oldest outstanding order if it arrives today, capped at capacity.
        if len(self.order_arrival_list) > 0:
            if self.current_period == self.order_arrival_list[0][0]:
                self.inv_level = min(self.capacity, self.inv_level + self.order_arrival_list[0][1])
                self.order_arrival_list.pop(0)
        # Serve today's demand from the stock on hand.
        demand = self.demand_list[self.current_period - 1]
        units_sold = demand if demand <= self.inv_level else self.inv_level
        # Reward = revenue - holding cost - fixed order cost.
        reward = units_sold * self.unit_price - self.holding_cost * self.inv_level - y * self.fixed_order_cost
        self.inv_level = max(0, self.inv_level - demand)
        # Inventory position = stock on hand plus all orders still in transit.
        self.inv_pos = self.inv_level
        if len(self.order_arrival_list) > 0:
            for i in range(len(self.order_arrival_list)):
                self.inv_pos += self.order_arrival_list[i][1]
        self.day_of_week = (self.day_of_week + 1) % 7
        self.state = np.array([self.inv_pos] + self.convert_day_of_week(self.day_of_week))
        self.current_period += 1
        self.state_list.append(self.state)
        self.action_list.append(action)
        self.reward_list.append(reward)

        if self.day_of_week == 0 and self.count_days < 7:
            self.sun_stock = self.inv_level
            self.inv_w_scrap = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales += units_sold  # add today's sales to the weekly total
            self.count_days += 1
        elif self.day_of_week == 1 and self.count_days < 7:
            self.mon_stock = self.inv_level
            self.inv_w_scrap = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales = self.week_sales + units_sold
            self.count_days += 1
            #print(self.units_sold_day[self.day_of_week])
            #print(self.week_sales)
        elif self.day_of_week == 2 and self.count_days < 7:
            self.tue_stock = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales = self.week_sales + units_sold
            self.count_days += 1
            #print(self.units_sold_day[self.day_of_week])
            #print(self.week_sales)
        elif self.day_of_week == 3 and self.count_days < 7:
            self.wed_stock = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales += units_sold
            self.count_days += 1
            #print(self.units_sold_day[self.day_of_week])
            #print(self.week_sales)
        elif self.day_of_week == 4 and self.count_days < 7:
            self.thu_stock = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales += units_sold
            self.count_days += 1
        elif self.day_of_week == 5 and self.count_days < 7:
            self.fri_stock = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales += units_sold
            self.count_days += 1
        elif self.day_of_week == 6 and self.count_days < 7:
            self.sat_stock = self.inv_level
            self.units_sold_day[self.day_of_week] = units_sold
            self.week_sales += units_sold
            self.count_days += 1

        #After 1 week start calculating scrap
        if self.day_of_week == 0 and self.count_days >= 7 and self.sun_stock > self.week_sales:
            self.scrap_qty = self.sun_stock - self.week_sales  #Amount scrapped
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week] #last 7 days sales updated
            self.inv_pos = max(0, self.inv_pos-self.scrap_qty)#remove scrap from inventory
            self.sun_stock = self.inv_pos
        elif self.day_of_week == 0 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.sun_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        if self.day_of_week == 1 and self.count_days >= 7 and self.mon_stock > self.week_sales:
            self.scrap_qty = self.mon_stock - self.week_sales  # Amount scrapped
            #print(self.scrap_qty)
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.inv_pos = max(0, self.inv_pos - self.scrap_qty)
            self.mon_stock = self.inv_pos
        elif self.day_of_week == 1 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.mon_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        if self.day_of_week == 2 and self.count_days >= 7 and self.tue_stock > self.week_sales:
            self.scrap_qty = self.tue_stock - self.week_sales  # Amount scrapped
            #print(self.scrap_qty)
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales update
            self.inv_pos = max(0, self.inv_pos - self.scrap_qty)
            self.tue_stock = self.inv_pos
        elif self.day_of_week == 2 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.tue_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        if self.day_of_week == 3 and self.count_days >= 7 and self.wed_stock > self.week_sales:
            self.scrap_qty = self.wed_stock - self.week_sales  # Amount scrapped
            #print(self.scrap_qty)
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales update
            self.inv_pos = max(0, self.inv_pos - self.scrap_qty)
            self.wed_stock = self.inv_pos
        elif self.day_of_week == 3 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.wed_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        if self.day_of_week == 4 and self.count_days >= 7 and self.thu_stock > self.week_sales:
            self.scrap_qty = self.thu_stock - self.week_sales  # Amount scrapped
            #print(self.scrap_qty)
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.inv_pos = max(0, self.inv_pos - self.scrap_qty)
            self.thu_stock = self.inv_pos
        elif self.day_of_week == 4 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.thu_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        if self.day_of_week == 5 and self.count_days >= 7 and self.fri_stock > self.week_sales:
            self.scrap_qty = self.fri_stock - self.week_sales  # Amount scrapped
            # print(self.scrap_qty)
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.inv_pos = max(0, self.inv_pos - self.scrap_qty)
            self.fri_stock = self.inv_pos
        elif self.day_of_week == 5 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.fri_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        if self.day_of_week == 6 and self.count_days >= 7 and self.sat_stock > self.week_sales:
            self.scrap_qty = self.sat_stock - self.week_sales  # Amount scrapped
            # print(self.scrap_qty)
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.inv_pos = max(0, self.inv_pos - self.scrap_qty)
            self.sat_stock = self.inv_pos
        elif self.day_of_week == 6 and self.count_days >= 7:
            self.week_sales = self.week_sales + units_sold - self.units_sold_day[self.day_of_week]  # last 7 days sales updated
            self.sat_stock = self.inv_level  # New day stock / TODO: fix this later, or compute inv_level with the scrap removed

        # The episode ends once every period has been simulated.
        terminate = self.current_period > self.n_period
        return self.state, reward, terminate        
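
For readers who want to experiment, here is a different way to model shelf life, shown only as a sketch and not as the method used in the code above. Instead of comparing each weekday's stock against the last 7 days of sales, it tracks every delivery as a batch with its own remaining shelf life and sells the oldest batch first (FIFO). The class name PerishableStock and its methods are hypothetical.

```python
from collections import deque


class PerishableStock:
    """FIFO stock where each batch carries its remaining shelf life in days."""

    def __init__(self, shelf_life=7, capacity=60):
        self.shelf_life = shelf_life
        self.capacity = capacity
        self.batches = deque()  # items: [days_left, quantity], oldest first

    @property
    def level(self):
        return sum(qty for _, qty in self.batches)

    def receive(self, quantity):
        """Add a delivery, capped at remaining freezer capacity."""
        accepted = min(quantity, self.capacity - self.level)
        if accepted > 0:
            self.batches.append([self.shelf_life, accepted])
        return accepted

    def sell(self, demand):
        """Serve demand from the oldest batches first; return units sold."""
        remaining = demand
        while remaining > 0 and self.batches:
            batch = self.batches[0]
            taken = min(batch[1], remaining)
            batch[1] -= taken
            remaining -= taken
            if batch[1] == 0:
                self.batches.popleft()
        return demand - remaining

    def end_of_day(self):
        """Age every batch by one day and scrap whatever has expired."""
        for batch in self.batches:
            batch[0] -= 1
        scrapped = 0
        while self.batches and self.batches[0][0] <= 0:
            scrapped += self.batches.popleft()[1]
        return scrapped


stock = PerishableStock(shelf_life=2, capacity=60)
stock.receive(10)
stock.sell(4)
scrap_day_1 = stock.end_of_day()  # nothing expires yet
stock.receive(5)
scrap_day_2 = stock.end_of_day()  # the 6 unsold units of the first batch expire
```

The scrapped quantity returned by end_of_day could then be subtracted from both the inventory level and the inventory position, which is the part the comments in the original code flag as still needing work.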

Currently I'm very into machine learning and the power of data... discovering a new way of predicting through patterns and creating algorithms.

More articles by Bruno Fink
