Simplistic Automation of Waterflood Forecasting
Background
This Python script was initially written for a friend who was facing the challenge of generating basic production forecasts for dozens of oil-producing wells under waterflood, using very limited data. Tackling this manually with "your tool of choice" would have been time-consuming and tedious. To address this, I suggested automating part of the process by writing a simple Python script. I’m now sharing this with the broader community, as I believe others may face similar challenges.
Disclaimer
This script was originally developed in a Jupyter Notebook on a quiet Sunday afternoon. No effort was made to optimize the code or follow Pythonic conventions, and the comments are minimal. The functionality, as well as the reservoir engineering principles applied, are basic; this was intentional, given the scope of the task.
Problem Context and Assumptions
From the above, it's clear why I've referred to this as a "simplistic" solution.
Input
The script reads data from a .csv file, which I’ve simply named "input.csv." Below is a template for the file:
I believe the column names are self-explanatory, but for clarity, here’s a brief description of each:
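As a rough illustration, the sketch below writes a one-row template in the column order the script expects. Because the script calls `pd.read_csv` with `skiprows=1` and its own `names=` list, the header text in the file is ignored and only the column order matters. The meanings given in the comments are inferred from how the script uses each column. The well name and every number are made-up placeholders, and `write_template` is a hypothetical helper that is not part of the script.

```python
import csv

# Column order taken from the names= list in the script's read_csv call.
COLUMNS = ['Well', 'StartDate', 'EndDate', 'Qoil', 'Qoim', 'Qoih',
           'Dil', 'Dim', 'Dih', 'Qliq', 'Qcut', 'WORcut']

# Hypothetical example row: the well name and all values are placeholders.
EXAMPLE_ROW = {
    'Well': 'WELL-01',
    'StartDate': '01/01/2025',  # day first, matching dayfirst=True in the script
    'EndDate': '01/12/2030',    # read as 1 December 2030
    'Qoil': 80,                 # initial oil rate, Low case (STB/D)
    'Qoim': 100,                # initial oil rate, Mid case (STB/D)
    'Qoih': 120,                # initial oil rate, High case (STB/D)
    'Dil': 0.0008,              # decline rate, Low case (1/day)
    'Dim': 0.0005,              # decline rate, Mid case (1/day)
    'Dih': 0.0003,              # decline rate, High case (1/day)
    'Qliq': 500,                # total liquid rate, held constant (STB/D)
    'Qcut': 10,                 # oil rate below which the forecast is cut off
    'WORcut': 30,               # WOR above which the forecast is cut off
}


def write_template(path='input.csv'):
    """Write a header row plus one placeholder well to `path`."""
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerow(EXAMPLE_ROW)
```

Calling `write_template('input.csv')` in the working folder gives a file you can overwrite with real well data, one row per well.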
Output
The output of the script is a .csv file generated for each well, containing the relevant forecast data such as oil rates, water rates, WOR, and cumulative production for the Low, Mid, and High cases.
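For reference, the relationships the script applies to each case can be summarized as follows. The symbols are chosen here for illustration and do not appear in the input file: $q_{oi}$ is the initial oil rate, $D$ the decline rate per day, $t$ the number of days since the start date, $q_{liq}$ the constant liquid rate, and $\Delta t_k$ a month length in days.

```latex
\begin{aligned}
q_o(t) &= q_{oi}\, e^{-D t} \\
q_w(t) &= q_{liq} - q_o(t) \\
\mathrm{WOR}(t) &= \frac{q_w(t)}{q_o(t)} \\
N_p(t_n) &= \sum_{k \le n} q_o(t_k)\, \Delta t_k
\end{aligned}
```

Rates and WOR are blanked out in any month where $q_o$ falls below the oil rate cutoff or WOR exceeds the WOR cutoff.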
The script also saves two plots for each well, in case anyone finds them useful: WOR versus cumulative oil production (Np), and oil rate versus date, each showing the Low, Mid, and High cases.
Python Code
Below is the script itself. Feel free to use or modify it as needed. I haven’t uploaded it to GitHub yet for proper cloning, so for now, you can use the good old copy-and-paste method from here.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')
# Enter the actual path to the working folder here!!!
# Out folder will be created there with all output data.
working_dir = r'C:\Users\molok\Scripts'
outdir = os.path.join(working_dir, 'Out')
os.makedirs(outdir, exist_ok=True)  # create the output folder if it is missing

# The header row is skipped; columns are assigned by position using names=.
df = pd.read_csv(os.path.join(working_dir, 'input.csv'), skiprows=1,
                 names=['Well', 'StartDate', 'EndDate', 'Qoil', 'Qoim', 'Qoih',
                        'Dil', 'Dim', 'Dih', 'Qliq', 'Qcut', 'WORcut'],
                 parse_dates=['StartDate', 'EndDate'], dayfirst=True, engine='python')
for idx, row in df.iterrows():
    # Monthly time axis: one row per month start between StartDate and EndDate
    buffer = pd.DataFrame({'Date': pd.date_range(start=row['StartDate'],
                                                 end=row['EndDate'], freq='MS')})
    buffer['DaysMonth'] = buffer['Date'].dt.daysinmonth
    # Days elapsed before the start of each month
    buffer['CumProdDays'] = buffer['Date'].dt.daysinmonth.cumsum().shift().fillna(0)
    buffer['Well'] = row['Well']

    # Exponential decline of the oil rate for the Low, Mid and High cases
    buffer['Qol'] = row['Qoil']*np.exp(-row['Dil']*buffer['CumProdDays'])
    buffer['Qom'] = row['Qoim']*np.exp(-row['Dim']*buffer['CumProdDays'])
    buffer['Qoh'] = row['Qoih']*np.exp(-row['Dih']*buffer['CumProdDays'])

    # Monthly oil volume = rate at the start of the month x days in that month
    buffer['MonthProdLow'] = buffer['Qol']*buffer['DaysMonth']
    buffer['MonthProdMid'] = buffer['Qom']*buffer['DaysMonth']
    buffer['MonthProdHigh'] = buffer['Qoh']*buffer['DaysMonth']

    # Water rate = constant liquid rate minus oil rate
    buffer['Qwl'] = row['Qliq'] - buffer['Qol']
    buffer['Qwm'] = row['Qliq'] - buffer['Qom']
    buffer['Qwh'] = row['Qliq'] - buffer['Qoh']
    buffer['MonthWatProdLow'] = buffer['Qwl']*buffer['DaysMonth']
    buffer['MonthWatProdMid'] = buffer['Qwm']*buffer['DaysMonth']
    buffer['MonthWatProdHigh'] = buffer['Qwh']*buffer['DaysMonth']

    # Water-oil ratio
    buffer['WOR_low'] = buffer['Qwl']/buffer['Qol']
    buffer['WOR_mid'] = buffer['Qwm']/buffer['Qom']
    buffer['WOR_high'] = buffer['Qwh']/buffer['Qoh']

    # Blank out rates and WOR once the oil rate or WOR cutoff is violated
    buffer.loc[buffer['Qol'] < row['Qcut'], ['Qol', 'Qwl', 'WOR_low']] = np.nan
    buffer.loc[buffer['WOR_low'] > row['WORcut'], ['Qol', 'Qwl', 'WOR_low']] = np.nan
    buffer.loc[buffer['Qom'] < row['Qcut'], ['Qom', 'Qwm', 'WOR_mid']] = np.nan
    buffer.loc[buffer['WOR_mid'] > row['WORcut'], ['Qom', 'Qwm', 'WOR_mid']] = np.nan
    buffer.loc[buffer['Qoh'] < row['Qcut'], ['Qoh', 'Qwh', 'WOR_high']] = np.nan
    buffer.loc[buffer['WOR_high'] > row['WORcut'], ['Qoh', 'Qwh', 'WOR_high']] = np.nan

    # Cumulative oil production. Note: the monthly volumes are not masked by
    # the cutoffs, so Np keeps growing in the CSV after the cutoff month.
    buffer['Np_low'] = buffer['MonthProdLow'].cumsum()
    buffer['Np_mid'] = buffer['MonthProdMid'].cumsum()
    buffer['Np_high'] = buffer['MonthProdHigh'].cumsum()

    # One CSV per well, named after the well
    fullname = os.path.join(outdir, str(row['Well']))
    buffer.to_csv(fullname + '.csv')

    # Plot 1: WOR versus cumulative oil production
    plt.figure(figsize=(14, 6))
    plt.scatter(buffer['Np_low'], buffer['WOR_low'], color='red')
    plt.scatter(buffer['Np_mid'], buffer['WOR_mid'], color='green')
    plt.scatter(buffer['Np_high'], buffer['WOR_high'], color='blue')
    plt.title(row['Well'])
    plt.xlabel('Np, STB', fontsize=10)
    plt.ylabel('WOR', fontsize=10)
    plt.savefig(fullname + '.png', dpi=300, bbox_inches='tight')
    plt.close()  # free the figure so dozens of wells do not pile up in memory

    # Plot 2: oil rate versus date
    plt.figure(figsize=(14, 6))
    plt.plot(buffer['Date'], buffer['Qol'], color='red')
    plt.plot(buffer['Date'], buffer['Qom'], color='green')
    plt.plot(buffer['Date'], buffer['Qoh'], color='blue')
    plt.title(row['Well'])
    plt.xlabel('Date', fontsize=10)
    plt.ylabel('Qoil, STB/D', fontsize=10)
    plt.savefig(fullname + '_rate.png', dpi=300, bbox_inches='tight')
    plt.close()
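If you want to spot-check a few numbers from an output file by hand, a dependency-free sketch of the single-case calculation may help. It is not part of the script, and `month_starts` and `forecast_case` are hypothetical names. It makes a few simplifying assumptions of its own: the start date falls on the first day of a month, each monthly volume is the rate at the start of the month times the days in that month, and the forecast simply stops at the first month that violates a cutoff instead of writing empty values.

```python
import calendar
import math
from datetime import date


def month_starts(start, end):
    """Yield the first day of each month from start to end, inclusive.

    Assumes `start` is itself the first day of a month.
    """
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def forecast_case(qi, decline, qliq, qcut, worcut, start, end):
    """Return one dict per forecast month for a single case.

    qi      initial oil rate (STB/D)
    decline exponential decline rate (1/day)
    qliq    constant liquid rate (STB/D)
    qcut    minimum oil rate; worcut: maximum water-oil ratio
    """
    rows = []
    elapsed_days = 0  # days produced before the current month starts
    cum_oil = 0.0
    for first_day in month_starts(start, end):
        days = calendar.monthrange(first_day.year, first_day.month)[1]
        qo = qi * math.exp(-decline * elapsed_days)  # exponential decline
        qw = qliq - qo                               # water is the remainder
        wor = qw / qo
        if qo < qcut or wor > worcut:
            break  # stop at the first month that violates a cutoff
        cum_oil += qo * days
        rows.append({'Date': first_day, 'Qo': qo, 'Qw': qw,
                     'WOR': wor, 'Np': cum_oil})
        elapsed_days += days
    return rows
```

For example, `forecast_case(100.0, 0.001, 500.0, 50.0, 20.0, date(2024, 1, 1), date(2024, 3, 1))` returns three monthly rows whose `Qo`, `WOR` and `Np` values can be compared against the matching case in a well's output file.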