DATA VISUALIZATION AND ANALYSIS USING PYTHON

Inspired by my interest in data science and a desire to improve my coding skills, I recently applied Python to a project of mine. In this post I want to share how I used Python in my work and highlight the benefits of working in the Python environment as opposed to the more conventional use of Excel for data management and analysis. I hope this post helps others who are interested in learning Python to see how it can be applied in the context of their daily work and projects.

By way of background, I am a petroleum engineer, and the following example comes from a recent project of mine involving the performance analysis of a group of wells in a producing oilfield.

The project entailed processing and analyzing large amounts of time-series data from wells producing with the aid of electric submersible pumps (ESPs), along with water injection rate and pressure data from a water injection well. The client sent me the data on a regular basis; my job was to develop performance dashboards, provide insight on well performance, and make recommendations to optimize production based on these observations.

The data was transmitted in comma-separated values (CSV) format and consisted of surface and downhole gauge sensor data for pressures, temperatures, injection rates, motor frequency, motor voltage and current. Because of the large amounts of data involved and the inefficiency of importing, graphing and analyzing the data in Excel, I decided this was a good opportunity to deploy the data analysis capabilities of Python: in particular the Pandas library, the charting capabilities of matplotlib, and Jupyter notebooks as the environment for writing and running the code.

The first step before doing any analysis was to read the CSV data. I scanned the data to see how it was formatted and saw that it consisted of a combination of dates, times and floating-point numbers (sometimes formatted as strings with quotation marks and commas as thousands separators), along with null values appearing either as missing entries or as a dash (-). Figure 1 is a screenshot of the data in one of the CSV files:

Figure 1

One of the challenges of learning to code in Python is discovering the various methods and functions available to execute a particular task. In this case, a number of methods are needed to parse the data and convert it to a consistent format for analysis; the code for reading the CSV file itself, however, boils down to one simple line, as shown in Figure 3.

First, import the relevant Python libraries. The line ‘%matplotlib nbagg’ enables interactive charts directly in the Jupyter notebook, so each chart can be viewed in the same notebook where the code is written:

Figure 2
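As a rough sketch of the import cell shown in Figure 2, assuming pandas and matplotlib are the libraries in question, the code would look something like this:

```python
# Sketch of a typical import cell for this workflow. In a Jupyter
# notebook the first line would be the magic command below (commented
# out here so the file also runs as a plain Python script):
# %matplotlib nbagg
import pandas as pd
import matplotlib.pyplot as plt
```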

Next, identify where the data file is saved and read in the data using the CSV reader in Pandas:

Figure 3

The line of code that reads the CSV file includes instructions on how to handle the different data formats and which columns to import. The parsed and cleaned data is stored in a Pandas DataFrame (analogous to a spreadsheet in Excel). The data is now in a consistent format, as shown in Figure 4:

Figure 4
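To illustrate the kind of read-and-clean step described above, here is a minimal sketch. The column names and the two sample rows are invented for illustration, and the actual parsing options in Figure 3 may differ:

```python
import io

import pandas as pd

# Invented sample mimicking the format described: quoted numbers with
# thousands separators, and dashes for missing values.
sample = io.StringIO(
    "Date,Time,Intake Pressure,Motor Temp\n"
    '01/02/2020,00:00,"1,234.5",212.0\n'
    "01/02/2020,00:01,-,213.1\n"
)
df = pd.read_csv(
    sample,
    thousands=",",    # strip thousands separators from quoted numbers
    na_values=["-"],  # treat dashes as missing values
)
# Combine the date and time columns into a single datetime index
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.set_index("Timestamp").drop(columns=["Date", "Time"])
```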


The next step in building a data visualization dashboard was to write the code to generate the charts. I wanted a compact dashboard that would let me visualize all of the relevant data in one place without making the charts too cumbersome. I also wanted to use the interactive features of matplotlib to easily pan and zoom in and out of the charts as desired. This is where the advantages of using Python become apparent: generating these charts in Excel is very time-consuming, and Excel offers nothing comparable to matplotlib's interactive panning and zooming.

While the chart-formatting code in Figure 5 appears extensive, it is only formatting, and once set up it can be reused to generate charts from each new data set.

Figure 5
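A minimal sketch of such a dashboard, using synthetic data and assumed column names in place of the actual sensor channels, might look like this:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real dashboard plots the parsed CSV.
idx = pd.date_range("2020-02-01", periods=200, freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Intake Pressure": 1200 + rng.normal(0, 5, 200).cumsum(),
    "Discharge Pressure": 2400 + rng.normal(0, 5, 200).cumsum(),
    "Motor Temp": 210 + rng.normal(0, 0.5, 200).cumsum(),
    "Frequency": 55 + rng.normal(0, 0.1, 200).cumsum(),
}, index=idx)

# sharex=True links the x-axes so zooming one chart zooms the others
fig, axes = plt.subplots(nrows=4, ncols=1, sharex=True, figsize=(8, 9))
for ax, col in zip(axes, df.columns):
    ax.plot(df.index, df[col], lw=0.8)
    ax.set_ylabel(col)
    ax.grid(True, alpha=0.3)
axes[-1].set_xlabel("Time")
fig.tight_layout()
```

Once the formatting loop is written, pointing it at a new DataFrame regenerates the whole dashboard.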


The resulting dashboard looks like this:

Figure 6

With the interactive features of matplotlib, I could zoom and pan across specific data ranges to get a better idea of how the pump in this well was behaving, and save an image of the dashboard for inclusion in a PowerPoint presentation. Figure 7 below is a screenshot of the same dashboard, zoomed in to a subset of the data. The x-axes of the charts in each column are linked, so zooming into a specific time frame in one chart causes the charts above and below it to zoom to the same time scale. I have included a full screenshot (Figure 7) of the dashboard as it appears in the Jupyter notebook to show the buttons for panning, zooming, and saving the dashboard as an image file.

Figure 7

In addition to the pump data analysis, I was also able to look for communication between the nearest water injection well and the surrounding producing wells by comparing changes in water injection rates with the pump intake pressures in those wells.

The first step in this process was to read in and parse the data from the csv files for the producer and injector wells – first identifying the path to where the data is stored, then reading and parsing the data:

Figure 8

The next step was to generate a data visualization dashboard to compare water injection rates in one well with downhole pressure data in the nearest offset oil producer. First, though, I had to convert the injection rate from m³/hr to barrels/day:

Figure 9
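The conversion itself is a single multiplication (1 m³ ≈ 6.2898 US barrels, 24 hours per day); a small helper function along these lines would do it (the function name is mine, not the one in Figure 9):

```python
# Conversion factor: 1 m³ ≈ 6.2898 US barrels, 24 hours per day
M3_PER_HR_TO_BBL_PER_DAY = 6.2898 * 24  # ≈ 150.96

def m3hr_to_bbl_day(rate_m3hr):
    """Convert an injection rate from m³/hr to barrels/day."""
    return rate_m3hr * M3_PER_HR_TO_BBL_PER_DAY
```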

Then, generate the dashboard to display the data:

Figure 10
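A hedged sketch of such a comparison dashboard, with synthetic injection and intake-pressure data standing in for the real wells, could look like this:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: a stepped injection rate and an intake pressure that
# loosely tracks it (illustration only, not the actual field data).
idx = pd.date_range("2020-02-01", periods=500, freq="h")
rng = np.random.default_rng(2)
inj = pd.Series(np.repeat([1500.0, 2000.0, 1750.0, 2250.0, 1900.0], 100),
                index=idx)
pip = 1200 + 0.1 * inj + rng.normal(0, 5, 500)

# Two stacked charts with linked x-axes for direct comparison
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
ax1.plot(idx, inj)
ax1.set_ylabel("Injection rate (bbl/day)")
ax2.plot(idx, pip, color="tab:red")
ax2.set_ylabel("Pump intake pressure (psi)")
ax2.set_xlabel("Time")
fig.tight_layout()
```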

The resulting dashboard looks like this:

Figure 11

The multi-colored arrows in Figure 11 indicate changes in injection rate in the injection well and the corresponding pressure responses seen in the nearest oil producer.

I also wanted to look at long-term trends, so I used Python to concatenate multiple weekly datasets (each consisting of around 12,000 lines and over 300,000 data points) into a single data file. I then re-sampled the concatenated data from minute-by-minute to daily values to visualize the long-term trends of individual wells and compare them against one another. The re-sampling operations take only a fraction of a second for over one million data points.

To concatenate the weekly datasets, I wrote a function which I could re-run every week to update the concatenated data file with new data. The code uses a loop to iterate through each weekly data file, read its data, and append it to a new dataframe (analogous to a worksheet in Excel). The new dataframe is then saved to a CSV file called Well_1Concat.csv:

Figure 12

The output of this function is a CSV file with data concatenated from multiple weekly files. Figure 13 shows how to call the function, simply by typing its name (concatenateWell_1()), to concatenate data from 14 CSV files and save the output to a new file:

Figure 13

Repeating this process for the other wells in my dataset, I generated a concatenated dataset for each well. I could then re-sample each well's data to a daily basis and plot the trends in pump intake and discharge pressures to assess each pump's performance and its ability to generate lift:

Figure 14
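The re-sampling step can be sketched as follows, with synthetic minute data standing in for the concatenated well file:

```python
import numpy as np
import pandas as pd

# Synthetic minute data standing in for the concatenated well file
idx = pd.date_range("2020-02-01", periods=7 * 24 * 60, freq="min")
rng = np.random.default_rng(1)
minute = pd.DataFrame({
    "Intake Pressure": 1200 + rng.normal(0, 2, len(idx)),
    "Discharge Pressure": 2400 + rng.normal(0, 2, len(idx)),
}, index=idx)

# Re-sample from minute-by-minute to daily averages
daily = minute.resample("D").mean()
# Differential pressure across the pump as a simple lift indicator
daily["Pump dP"] = daily["Discharge Pressure"] - daily["Intake Pressure"]
```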

In summary, working with the data in Python gave me the freedom to experiment with different visualizations and ultimately facilitated a better understanding of well and field performance. It freed me from the time-intensive tasks of importing and filtering data and building charts in Excel, letting me focus instead on well performance analysis. The insight gained from the dashboards then led to other questions about well and pump performance which I could analyze and evaluate with other tools.

One other key benefit of working with Python is speed. I often had no more than one or two days to process the data, analyze the performance of the wells and pumps, and compile the dashboards, analysis and recommendations into a PowerPoint presentation and a weekly report. With Python, I was able to allocate up to 90% of my time to analysis and recommendations for the client, limiting data processing to no more than 10%.

If you have any questions about how to integrate Python in your projects, please don’t hesitate to contact me. I am happy to share my knowledge and am always looking for new and innovative ways to work with data.
