Power of Python
After all these years of geotechnical engineering and software development you really begin to appreciate a language like Python. So little code and so much result.. see for yourself!
This post is about reading and plotting a GEF-CPT file (Geotechnical Exchange Format) that represents a CPT (Cone Penetration Test) using Python and nothing but Python. After reading this post I bet you are eager to start learning Python yourself ;-)
CPT?
So what is a CPT? As stated before, this stands for Cone Penetration Test and in plain English it is a test where a fixed-size cone is pushed into the ground. The resisting force as well as the friction of the soil are measured, which gives a great indication of the type of soil layers. This can be interesting for all kinds of geotechnical purposes. Let's imagine that you want to build a house.. it would be nice if the foundation is placed on a tough soil layer (high resistance) or else your nice house would just collapse. So if you ever happen to see a truck like this..
..you can be quite sure that someone wants to build something at that location!
GEF-CPT
Someday, somewhere someone decided that it would be nice to have some kind of uniform(ish!) format to exchange the results of CPT tests. So after a lot of meetings the GEF-CPT format was agreed upon. It is a nice format with a header containing all metadata and a block of measurement data. It looks a bit like this;
#GEFID= 1, 1, 0
#PROJECTID= 12345-11
#COLUMN= 8
#COLUMNINFO= 1, m, penetration length, 1
#COLUMNINFO= 2, MPa, qc, 2
#COLUMNINFO= 3, MPa, fs, 3
#COLUMNINFO= 4, degrees, i_x, 9
#COLUMNINFO= 5, degrees, i_y, 10
#COLUMNINFO= 6, Sec, SampleTime, 12
#COLUMNINFO= 7, degrees, i_res, 8
#COLUMNINFO= 8, %, Rf, 4
#DATAFORMAT= ASCII
#DATATYPE= REAL8
#LASTSCAN= 1701
#XYID= 31000, 123123, 484343, 0, 0
#ZID= 31000, 0.3200, 0.0001
#PROCEDURECODE= GEF-CPT-Report, 1, 1, 0, -
#PROJECTNAME= Ringdijk-Amsterdam
#REPORTCODE= GEF-CPT-Report, 1, 1, 0, -
#STARTTIME= 8, 37, 0.000000
#OS= DOS
#EOH=
0.0000e+000 2.2240e-001 0.0000e+000 7.7970e-001 5.1130e-001 1.0000e+001 9.3240e-001 0.0000e+000
1.0000e-002 2.2650e-001 0.0000e+000 4.0940e-001 7.8660e-001 1.0910e+001 8.8680e-001 0.0000e+000
2.0000e-002 3.8760e-001 0.0000e+000 9.7470e-001 5.3100e-001 1.1410e+001 1.1099e+000 0.0000e+000
3.0000e-002 3.8760e-001 0.0000e+000 9.7470e-001 5.3100e-001 1.1830e+001 1.1099e+000 0.0000e+000
4.0000e-002 4.3300e-001 0.0000e+000 4.8730e-001 9.4390e-001 1.2240e+001 1.0623e+000 0.0000e+000
5.0000e-002 4.7020e-001 0.0000e+000 -1.5460e-001 1.8092e+000 1.2660e+001 1.8158e+000 0.0000e+000
6.0000e-002 4.9090e-001 3.0000e-003 -2.7050e-001 1.3569e+000 1.3060e+001 1.3836e+000 7.2590e-001
7.0000e-002 4.6190e-001 5.9000e-003 -2.8990e-001 2.2812e+000 1.3480e+001 2.2995e+000 1.4110e+000
8.0000e-002 4.5160e-001 7.6000e-003 -2.8990e-001 2.2812e+000 1.3890e+001 2.2995e+000 1.7781e+000
9.0000e-002 4.4650e-001 8.0000e-003 -7.7290e-001 1.3569e+000 1.4290e+001 1.5616e+000 1.8788e+000
1.0000e-001 4.4130e-001 8.1000e-003 -7.7290e-001 1.3569e+000 1.4710e+001 1.5616e+000 1.9398e+000
1.1000e-001 4.4540e-001 8.2000e-003 1.0916e+000 4.1300e-001 1.5130e+001 1.1671e+000 2.0007e+000
1.2000e-001 4.2480e-001 8.2000e-003 -3.8650e-001 1.3176e+000 1.5530e+001 1.3731e+000 2.0689e+000
1.3000e-001 3.7520e-001 7.3000e-003 -3.6720e-001 2.3600e-001 1.5940e+001 4.3650e-001 1.9030e+000
1.4000e-001 3.7520e-001 7.3000e-003 -3.6720e-001 2.3600e-001 1.6340e+001 4.3650e-001 1.9557e+000
The header consists of keywords (like #GEFID, #XYID) and some parameters, and after #EOH we find the data separated by spaces.
Unfortunately this format is not as uniform as you would like it to be. Some companies use tabs, others use commas or semicolons etc. Some companies measure depth as a negative value, others as length from #ZID. It is a mess.. believe me.. I've been there. But for the sake of clarity let's assume that we have a GEF-CPT file in excellent shape. Then things are really easy.
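If you do want to survive the separator mess, one defensive trick (not part of the script below, just an illustration) is to normalize every plausible separator to whitespace before splitting:

```python
import re

def split_data_line(line):
    # Normalize tabs, semicolons and commas to a single space,
    # then split on any whitespace; this tolerates most of the
    # separator variations mentioned above.
    normalized = re.sub(r'[;,\t]', ' ', line)
    return normalized.split()

print(split_data_line("1.0;2.0;3.0"))    # ['1.0', '2.0', '3.0']
print(split_data_line("1.0\t2.0  3.0"))  # ['1.0', '2.0', '3.0']
```

Note that this would break on files that use a comma as the decimal separator, so treat it as a starting point only.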
In this article you will find the basic code to read and plot a GEF-CPT file. Note that this code is open for improvement because we only tested it on a nice, valid GEF-CPT; some files are not so nice and will crash the script. More GEF files and tests are needed to make it a solid script but hey.. it's a great start!
The script
So what do we do.. let me lead you through the script step by step;
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
One of the great things about Python is the number of very complex and powerful libraries. Here we use Numpy (scientific computing package), Pandas (excellent for data science) and Matplotlib (for plotting). On this very, very unusual occasion I used Windows to develop the script (yeah, yeah.. Linux addict and such) which is why I have to tell matplotlib that I do not have a backend to show the plots; that is what the 'agg' line does. Normally the last matplotlib import line alone will do just fine.
Next up is code that creates a class which is a kind of an object that represents a GEF-CPT file.
class GEF:
    def __init__(self):
        self._data_seperator = ' '
        self._columns = {}
        self.x = 0.
        self.y = 0.
        self.z = 0.
        self.dz = []
        self.qc = []
        self.pw = []
        self.wg = []
The __init__ function is called when the object is created. In this code we set some parameters to default values (like the x, y, z coordinates) and we initialize some empty lists so we can fill them later. The empty columns dictionary is a nice Python type used to make a key-value relation between the meaning of the data and the position in the data columns.
    def readFile(self, filename):
        lines = open(filename, 'r').readlines()
        reading_header = True
        for line in lines:
            if reading_header:
                self._parseHeaderLine(line)
            else:
                self._parseDataLine(line)
            if line.find('#EOH') > -1:
                if self._check_header():
                    reading_header = False
                else:
                    return
And here is the function that reads the GEF file. It is quite easy actually. We open the file and read all the lines into memory. This is possible because GEF files are not that big, so on modern computers this will not be a problem. The next step is to iterate over all lines and parse each line as part of the header or of the data. We know that the #EOH text marks the end of the header and the start of the data, so we can use it to decide whether a line needs to be parsed as header or as data.
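A small side note: the same read is nowadays usually written with a `with` block so the file is closed automatically, even if parsing raises an exception later. A sketch (not the article's code):

```python
def read_lines(filename):
    # 'with' guarantees the file handle is closed, even on errors.
    with open(filename, 'r') as f:
        return f.readlines()
```

The behaviour is identical to `open(filename, 'r').readlines()`, it is just safer housekeeping.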
    def _check_header(self):
        if not 1 in self._columns:
            print("Fatal error > This GEF file is missing a depth column")
            return False
        if not 2 in self._columns:
            print("Fatal error > This GEF file is missing a qc (cone resistance) column")
            return False
        if not 3 in self._columns:
            print("Fatal error > This GEF file is missing an fs (local friction) column")
            return False
        return True
Now this is code that doesn't look really great.. avoid repetition, but on the other hand don't optimize unless you have to. What this code does is check whether we are dealing with a full-blown CPT including depth, resistance and friction measurements. If that is the case we return True, which means we will start reading the data. If data is missing we stop the reading process.
But how do we know if the data is available? That is where the columns dictionary comes into play. Looking at the GEF file we see lines like this;
#COLUMNINFO= 1, m, penetration length, 1
The meaning of this line is that the data in column 1 is in meters and stands for penetration length. The last 1 refers to definitions in the GEF-CPT format where 1 stands for depth, 2 for cone resistance etc. We use this info in the _parseHeaderLine function to fill a dictionary like this;
column[type of data] = column number in the data
So in the case of the previous #COLUMNINFO line this means that we get an entry like column[1] = 0. Note that the first entry in a list starts at 0, so 0 stands for the first column of data. Now we can use this info to move the data into the right list. This columninfo information is essential for reading a GEF file.
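As a tiny stand-alone illustration of that mapping (using the #COLUMNINFO line shown above):

```python
# Parse a single #COLUMNINFO header line into the columns dictionary.
columns = {}
line = "#COLUMNINFO= 1, m, penetration length, 1"
args = line.split('=')[1].split(',')
column = int(args[0])           # 1 -> first data column
dtype = int(args[-1].strip())   # trailing 1 -> GEF code for depth
columns[dtype] = column - 1     # store a zero-based column index
print(columns)                  # {1: 0}
```

So columns[1] (depth) now points at data column 0, exactly the entry described above.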
    def _parseHeaderLine(self, line):
        keyword, argline = line.split('=')
        keyword = keyword.strip()
        argline = argline.strip()
        args = argline.split(',')
        if keyword == '#XYID':
            self.x = float(args[1].strip())
            self.y = float(args[2].strip())
        elif keyword == '#ZID':
            self.z = float(args[1].strip())
        elif keyword == '#COLUMNINFO':
            column = int(args[0])
            dtype = int(args[-1].strip())
            if dtype == 11:  # override depth with corrected depth, TODO > check whether 11 is the correct code
                dtype = 1
            self._columns[dtype] = column - 1
Here we have the code to parse a header line. Again, really easy.. we split the line into a keyword and an arguments part since we know for certain that it follows the KEYWORD= PARAM1, PARAM2 convention. After that we check for interesting keywords like #XYID for the coordinates or the #COLUMNINFO data. If we find such a keyword we use the args list, which is simply a list with all the individual parameters.
Note that this code would actually benefit from some error checking because there are a lot of possibilities to crash.. For instance, what if my #XYID did not contain information in the second argument, or worse, not a number but some text? That would stop the script, so a try - except construction would be a good idea!
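A minimal sketch of such defensive parsing (the helper name and the fallback value are my own choices, not part of the script above):

```python
def safe_float(s, default=0.0):
    # Return the parsed float, or a fallback value when the field
    # is empty, missing or contains unexpected text.
    try:
        return float(s)
    except (ValueError, TypeError):
        return default

print(safe_float("1.25"))  # 1.25
print(safe_float("abc"))   # 0.0
print(safe_float(None))    # 0.0
```

Replacing the bare float() calls with something like this keeps one malformed header line from killing the whole read.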
One other thing: GEF-CPT files can be equipped with a corrected depth column (which takes the x, y angles into account), so if we find this column (and note that I was too lazy to check if 11 is the code for this.. sorry ;-) we override the depth column with it.
    def _parseDataLine(self, line):
        args = line.split(self._data_seperator)
        dz = self.z - float(args[self._columns[1]])
        qc = float(args[self._columns[2]])
        pw = float(args[self._columns[3]])
        self.dz.append(dz)
        self.qc.append(qc)
        self.pw.append(pw)
        if qc <= 0.:
            self.wg.append(10.)
        else:
            wg = (pw / qc) * 100.
            if wg > 10.:
                wg = 10.
            self.wg.append(wg)
Here we have the code that reads the actual data. Again we split the line into a list of arguments, and note that we know which column to use thanks to the columns dictionary based on the #COLUMNINFO keywords in the header. We simply convert the values to floats and do some basic calculations like getting the actual depth and making sure that we do not divide by zero. We also use the given data to calculate a value wg (the friction ratio) that is a great indication of the type of soil.
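The wg logic from _parseDataLine, pulled out as a stand-alone function to make the guards visible:

```python
def friction_ratio(qc, pw):
    # Guard against division by zero or negative cone resistance,
    # and clamp the ratio at 10% just like the class code does.
    if qc <= 0.:
        return 10.
    wg = (pw / qc) * 100.
    return min(wg, 10.)

print(friction_ratio(2.0, 0.05))  # 2.5
print(friction_ratio(0.0, 0.05))  # 10.0 (zero resistance guard)
print(friction_ratio(0.1, 0.05))  # 10.0 (clamped from 50)
```

The clamping means any ratio above 10% is treated as "very high friction", which keeps the later plot readable.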
    def asNumpy(self):
        return np.transpose(np.array([self.dz, self.qc, self.pw, self.wg]))

    def asDataFrame(self):
        a = self.asNumpy()
        return pd.DataFrame(data=a, columns=['depth', 'qc', 'fs', 'wg'])
These two functions are not really necessary but they are very helpful for further analysis of the data. We use them to convert the lists of data like qc, depth etc. into a matrix (Numpy) or a DataFrame (Pandas). Numpy is great for matrix calculations and is widely used in engineering scripts. The DataFrame is like Calc / Excel (but better ;-) and has some very interesting and powerful functions, like the one we will see next!
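To give an idea of why the DataFrame is worth the conversion, here is a quick example on invented toy values (standing in for the result of asDataFrame):

```python
import pandas as pd

# Toy CPT data; in practice this comes from gef.asDataFrame().
df = pd.DataFrame({
    'depth': [0.0, -0.5, -1.0],
    'qc':    [0.4, 1.2, 8.5],
    'fs':    [0.01, 0.02, 0.05],
    'wg':    [2.5, 1.7, 0.6],
})
print(df['qc'].max())                       # peak cone resistance: 8.5
print(df.loc[df['qc'].idxmax(), 'depth'])   # depth of that peak: -1.0
```

Questions like "where is the strongest layer" become one-liners instead of hand-written loops.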
    def plot(self, filename):
        df = self.asDataFrame()
        fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10, 12), gridspec_kw={'width_ratios': [3, 1, 1]})
        df.plot(x='qc', y='depth', ax=axes[0], sharey=True, label='qc [MPa]')
        df.plot(x='fs', y='depth', ax=axes[1], sharey=True, label='fs [MPa]')
        df.plot(x='wg', y='depth', ax=axes[2], sharey=True, label='wg [%]')
        for i in range(3):
            axes[i].grid()
        plt.savefig(filename)
Wow.. this is what I really like about Python. I have written GEF readers and viewers before, in C++, Go and Object Pascal, but displaying a GEF is some work. It took me at least half a day to write code that was able to show the loaded data in a plot. With Python it took about 10 minutes! Really, really powerful stuff for engineers!
Now the code... as you can see I first convert the data to a Pandas DataFrame like I mentioned before. After that I use matplotlib code to create the basis of the plot: I state that I want a plot containing 1 row of 3 columns with a size of 10x12 and width ratios of 3:1:1. Next we use the Pandas DataFrame plot function to draw the data in the right part of the figure, share the y-axis, add a label and enable the gridlines. Finally we save the plot to the given filename and that's it.
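One possible refinement (my suggestion, not the article's code): instead of passing sharey per plot call, you can ask plt.subplots for a shared y-axis up front, which keeps the three panels aligned by construction. A sketch of just the figure setup:

```python
import matplotlib
matplotlib.use('agg')  # headless backend, as in the article
import matplotlib.pyplot as plt

# Build the three-panel frame with a shared y-axis and 3:1:1 widths.
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10, 12),
                         sharey=True,
                         gridspec_kw={'width_ratios': [3, 1, 1]})
for ax, label in zip(axes, ['qc [MPa]', 'fs [MPa]', 'wg [%]']):
    ax.set_xlabel(label)
    ax.grid()
fig.savefig('empty_frame.png')  # axes only; draw the data as before
```

The df.plot calls then stay exactly the same, minus the sharey keyword.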
Putting it all together
So we have defined our object / class.. how do we use it? Well, here's the code to read a GEF-CPT file;
if __name__ == "__main__":
    g = GEF()
    g.readFile('E06-1441.GEF')
    g.plot("E06-1441.png")
We create a GEF object, we read a file (sorry, I am not allowed to add that file) and we use the plot function to plot it to an image. Here you see the result;
Beautiful.. almost ready for production ;-)
What now?
Not much actually.. I hope I made you curious about other possibilities with Python and there are many. Just to name a few;
- convert the GEF-CPT to soillayers using correlations
- log the top of the first sand layer
- create a pile foundation algorithm based on the data
- make a 2D cross-section out of multiple GEF-CPTs
- etc
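To give a flavour of the first two ideas, here is a very crude sketch on invented values (the 5 MPa threshold is my own arbitrary choice, not a validated correlation):

```python
import pandas as pd

# Toy profile standing in for gef.asDataFrame(); values invented.
df = pd.DataFrame({
    'depth': [0.0, -0.5, -1.0, -1.5, -2.0],
    'qc':    [0.4, 0.9, 2.1, 6.5, 11.0],
})
# Naive rule of thumb: call it sand once qc exceeds 5 MPa.
sand = df[df['qc'] > 5.0]
top_of_sand = sand['depth'].iloc[0] if not sand.empty else None
print(top_of_sand)  # -1.5
```

A real implementation would use a published correlation (qc and friction ratio together), but the DataFrame filtering pattern stays the same.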
I have already had a lot of fun writing code to do those things and highly recommend that any engineer pick up Python and start having fun!
Just a few notes
The code is free to use but it would be nice if you refer to me when you use it (Rob van Putten | breinbaasnl@gmail.com). The code needs improvement because of the way GEF files are used in practice. It is a mess and you will have to deal with `not so nice` GEF files. Better freshen up your try - except skills!
If you are looking for a nice Python introduction I can recommend https://www.udemy.com/complete-python-bootcamp/ - I have used this course a few times during my time as a Python teacher and it is great!
Unfortunately I cannot attach a GEF file to my code due to legal issues so you will have to look up some GEF files to test your code yourself.
Update 2018-04-06
It's a few days and some programming hours later and I have used the previous code to make a nice QGis plugin. Here it is..
I have added the option to import GEF-CPT files in QGis. This will put the information into a Postgres Database and show the locations on a QGis map. Nice..
Retrieving the CPT from the database is easy. Just select a location in QGis and the plugin will show the CPT (left on the screenshot). I use a method described in a publication to correlate the measurements to the possible soil type. Since this will generate a soil layer at every interval (sometimes every cm) it might lead to a huge number of soil layers, so I added an option to define a minimal interval (like 0.5m). It is possible to change the correlation method as well as the minimal interval on the fly and the preview will update immediately.
The final step was to write code to export the generated soil layers to geotechnical software (DGeoStability in this case) and auto start the software.
See what happens if you start programming.. you just can't stop. I still have a lot of ideas but it is time for a break so.. see you later!
This is super helpful Rob, thanks! Will definitely be keeping an eye out for any more articles.
This brings back memories of the time when I, together with Jos Maccabiani and Henk den Adel, wrote a module in Matlab to read and plot GEF files. Python resembles Matlab somewhat in the sense that it works with arrays. The GEF standard is, however, moving to XML.
Really a good way to appreciate python