Machine learning in geoscience with scikit-learn. Part 1: checking, tidying, and analyzing the dataset

The idea behind this series of articles is to show how to predict P-wave velocity, as measured by a geophysical well log (the sonic), from a suite of other logs (density, gamma ray, and neutron) plus depth, using Machine Learning.

The log suite is from the same well that Alessandro Amato del Monte used in the Seismic Petrophysics notebook accompanying his Geophysical Tutorial article in The Leading Edge.

I will explore different Machine Learning methods from the scikit-learn Python library and compare their performance.

To whet your appetite, here’s an example of P-wave velocity, Vp, predicted using a cross-validated linear model, which will serve as the benchmark for the performance of other models, such as SVM and Random Forest:

[Figure: Vp predicted with the cross-validated multilinear model]
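As a taste of what such a benchmark might look like in code, here is a minimal scikit-learn sketch; the file name and log column names below are placeholders, not the actual ones from the notebook:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# load the well logs; 'well_logs.csv' and the column names are placeholders
logs = pd.read_csv('well_logs.csv')
X = logs[['DEPTH', 'RHOB', 'GR', 'NPHI']]  # depth, density, gamma ray, neutron
y = logs['VP']                             # P-wave velocity from the sonic

# each sample is predicted by a model trained on the remaining folds
vp_pred = cross_val_predict(LinearRegression(), X, y, cv=5)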

In the first notebook, which is already available on GitHub here, I show how to use the Pandas and Seaborn Python libraries to import the data, check it, clean it up, and visualize it to explore relationships between the variables. For example, shown below is a heatmap of the pairwise Spearman correlation coefficients between the variables (logs):
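A minimal sketch of such a heatmap with Pandas and Seaborn, assuming the same placeholder file as above:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

logs = pd.read_csv('well_logs.csv')   # placeholder file name
corr = logs.corr(method='spearman')   # pairwise Spearman rank correlations

# annotated heatmap of the correlation matrix
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='RdBu_r', square=True)
plt.show()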

Stay tuned for the next post / notebook!

PS: I am very excited by the kick-off of the Geophysical Tutorial (The Leading Edge) Machine Learning Contest 2016. Check it out here!

Machine learning in geoscience and planetary science with scikit-learn: series outline

  • Machine learning in geoscience with scikit-learn. Part 3: the SEG ML contest
  • Machine learning in geoscience with scikit-learn. Part 4: TBE

sketch2model

This guest post (first published here) is by Elwyn Galloway, author of Scibbatical on WordPress. It is the first in our series of collaborative articles about sketch2model, a project from the 2015 Calgary Geoscience Hackathon organized by Agile Geoscience. Happy reading.

Collaboration in action. Evan, Matteo, and Elwyn (foreground, L to R) work on sketch2model at the 2015 Calgary Geoscience Hackathon. Photo courtesy of Penny Colton.

Welcome to an epic blog crossover event: two authors collaborating to tell a single story over the course of several articles.

We’ve each mentioned the sketch2model project on our respective blogs, MyCarta and scibbatical, without giving much detail about it. Apologies if you’ve been waiting anxiously for more. Over the next while, you’ll get to know sketch2model as well as we do.

The sketch2model team came together at the 2015 Geoscience Hackathon (Calgary), hosted by Agile Geoscience. Elwyn and Evan Saltman (epsalt on twitter and GitHub) knew each other from a previous employer, but neither had met Matteo before. All were intrigued by the project idea, and the individual skill sets were diverse enough to combine into a well-rounded group. Ben Bougher, part of the Agile Geoscience team, assisted with the original web interface at the hackathon. Agile’s take on this hackathon can be found on their blog.

Conception

The idea behind sketch2model is that a user should be able to easily create forward seismic models. Modelling at the speed of imagination, allowing seamless transition from idea to synthetic seismic section. It should happen quickly enough to be incorporated into a conversation. It should happen where collaboration happens.

The sketch2model concept: modelling at the speed of imagination. Take a sketch (a), turn it into an earth model (b), create a forward seismic model (c). Our hack takes you from a to b.

Geophysicists like to model wedges, and for good reasons. However, wedge logic can get lost on colleagues. It may not effectively demonstrate the capability of seismic data in a given situation. The idea is not to supplant that kind of modeling, but to enable a new, lighter kind of modeling. Modeling that can easily produce results for twelve different depositional scenarios as quickly as they can be sketched on a whiteboard.

The Hack

Building something mobile to turn a sketch into a synthetic seismic section is a pretty tall order for a weekend. We decided to take a shortcut by leveraging an existing project: Agile’s online seismic modelling package, modelr. The fact that modelr works through any web browser (including a smartphone) kept things mobile. In addition, modelr’s existing functionality allows a user to upload a png image and use it as a rock property model. We chose to use a web API to interface our code with the web application (as a bonus, our approach conveniently fit with the hackathon’s theme of Web). Using modelr’s capabilities, our hack was left with the task of turning a photo of a sketched geologic section into a png image where each geologic body is identified as a different color. An image processing project!

Agile is a strong proponent of Python in geophysics (for reasons nicely articulated in their blog post), and the team was familiar with the language to one extent or another. There was no question that it was the language of choice for this project. And no regrets!

We aimed to create an algorithm robust enough to handle any image of anything a user might sketch while accurately reproducing their intent. Marker on whiteboard presents different challenges than pencil on paper. Light conditions can be highly variable. Sketches can be simple or complex, tidy or messy. When a user leaves a small gap between two lines of the sketch, should the algorithm take the sketch as-is and interpret a single body? Or fill the small gap and interpret two separate bodies?

Our algorithm needs to be robust enough to handle a variety of source images: simple, complex, pencil, marker, paper, white board (check out the glare on the bottom left image). These are some of the test images we used.

Matteo has used image processing for geoscience before, so he landed on an approach for our hack almost instantly: binarize the image to distinguish sketch from background (turn the color image into a binary image via thresholding); identify and segregate the geobodies; create an output image with each body colored uniquely.

Taking the image of the original sketch (left) and creating a binary image (right) is an integral part of the sketch2model process.

Python libraries offer functions to binarize a color image, but for our application the results were very inconsistent. We needed a tool that would work for a variety of media in various lighting conditions. Fortunately, Matteo had some tricks up his sleeve to precondition the images before binarization. We landed on a robust flow that can binarize whatever we throw at it. Matteo will be crafting a blog post on this topic to explain what we’ve implemented.
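As an illustration of the basic thresholding step (not the actual preconditioning flow, which Matteo will describe), here is a minimal sketch with scikit-image, assuming a photo called sketch.png:

from skimage import io, color, filters

# read the photo of the sketch and convert it to grayscale
image = color.rgb2gray(io.imread('sketch.png'))

# global Otsu threshold: pixels darker than the threshold are sketch lines
thresh = filters.threshold_otsu(image)
binary = image < thresh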

Once the image is binarized, each geological body must be automatically identified as a closed polygon. If the sketch were reproduced exactly as imagined, a segmentation function would do a good job. The trouble is that the sketch captured is rarely the same as the one intended — an artist may accidentally leave small gaps between sketch lines, or the sketch medium can cause unintentional effects (for example, whiteboard markers can erase a little when sketch lines cross, see example below). We applied some morphological filtering to compensate for the sketch imperfections. If applied too liberally, this type of filtering causes unwanted side effects. Elwyn will explore how we struck a balance between filling unintentional gaps and accurate sketch reproduction in an upcoming blog post.

Morphological filtering can compensate for imperfections in a sketch, as demonstrated in this example. The original sketch (left) was done with a marker on white board. Notice how the vertical stroke erased a small part of the horizontal one. The binarized version of the sketch (middle) shows an unintentional gap between the strokes, but morphological filtering successfully closes the small gap (right).
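A minimal sketch of this step with scikit-image, reusing the binary image from the previous snippet; the disk radius is an arbitrary choice, and picking it is exactly the balance discussed above:

from skimage.morphology import binary_closing, disk

# close small unintentional gaps in the sketch lines; the radius of the
# structuring element controls how large a gap gets filled
closed = binary_closing(binary, disk(3))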

Compared to the binarization and segmentation, generating the output is a snap. With this final step, we’ve transformed a sketch into a png image where each geologic body is a different color. It’s ready to become a synthetic seismic section in modelr.
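A sketch of this last step, again with scikit-image and the closed binary image from above (the output file name is a placeholder):

import numpy as np
from skimage import io
from skimage.measure import label
from skimage.color import label2rgb

# label connected regions between the sketch lines: each enclosed
# geobody gets its own integer id
bodies = label(~closed)

# map each label to a unique color and save the result as a png
rgb = label2rgb(bodies, bg_label=0)
io.imsave('geobodies.png', (rgb * 255).astype(np.uint8))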

Into the Wild

Sketch2model was a working prototype by the end of the hackathon. It wasn’t the most robust algorithm, but it worked on a good proportion of our test images. The results were promising enough to continue development after the hackathon. Evidently, we weren’t the only ones interested in further development because sketch2model came up on the February 17th episode of Undersampled Radio. Host Matt Hall: “This is so cool. Draw something on a whiteboard and have a synthetic seismogram right on your iPad five seconds later. I mean, that’s magical.”

Since the hackathon, the algorithm and web interface have progressed to the point that you can use it on your own images at sketch2model.com. To integrate this functionality directly into the forward modelling process, sketch2model will become an option in modelr. The team has made this an open-source project, so you’ll also find it on GitHub. Check out the sketch2model repository if you’re interested in the nuts and bolts of the algorithm. Information posted on these sites is scant right now, but we are working to add more information and documentation.

Sketch2model is designed to enable a new kind of collaboration and creativity in subsurface modelling. By applying image processing techniques, our team built a path to an unconventional kind of forward seismic modelling. Development has progressed to the point that we’ve released it into the wild to see how you’ll use it.

Geology in pictures – meanders and oxbow lakes

I love meandering rivers. This is a short post with some (I think) great images of meanders and oxbow lakes.

In my previous post, Google Earth and a 5 minute book review: Geology illustrated, I reviewed one of my favorite coffee table books: Geology Illustrated by John S. Shelton.

The last view in that post was as close a perspective replica of Figure 137 in the book as I could get using Google Earth, showing the meander belt of the Animas River a few miles from Durango, Colorado.

What a great opportunity to create a short time lapse: repeat snapshots of the same landscape nearly 50 years apart. This is priceless: 50 years is in many cases next to nothing in geological terms, and yet there are already some significant differences in the meanders between the two images, which I have included together below.

Meander belt and floodplain of the Animas River a few miles from Durango, Colorado. From Geology Illustrated, John S. Shelton, 1966. Copyright of the University of Washington Libraries, Special Collections (John Shelton Collection, Shelton 2679).

Meander belt and floodplain of the Animas River as above, as viewed on Google Earth in 2013.

For example, it looks like the meander cutoff in the lower left portion of the image had likely ‘just’ happened in the 60s, whereas by the time the imagery used by Google Earth was acquired, the remnant oxbow lake had become more clearly defined. Another oxbow lake in the center has nearly disappeared, perhaps in part due to human activity.

Below are two pictures of the Fraser River in the Robson Valley that I took in the summer of 2013 during a hike to McBride Peak. This is an awesome example of an oxbow lake: we can still see where the river used to run before the cutoff.

And this is a great illustration (from Infographic illustrations) showing how these oxbow lakes are created. I love the hands-on feel…

The last is an image of a tight meander loop from a previous post: Some photos of Northern British Columbia wildlife and geology.


3D Sketchfab model of the ternary system quartz – nepheline – kalsilite

Another old model brought back to life. Click on the image below to see the model in action. This is the ternary system quartz – nepheline – kalsilite, also called petrogeny’s residua system, which is used to describe the composition of many cooled residual magmas and the genesis of very undersaturated alkaline rocks.

More on this classification, used for cooled residual magmas, can be found on Alessandro Da Mommio’s website, for example his page on trachyte.

3D Sketchfab model of the Yoder-Tilley tetrahedron for basalt classification

I just tried Sketchfab today at lunch. In less than five minutes I was able to bring back from the dead my defunct AutoCAD 3D model of the Yoder-Tilley tetrahedron for basalt classification. Click on the image below to see the model in action. If you want to learn more about basalt classification, check out Alessandro Da Mommio’s awesome website.

My top two favorite 3D geology models from today’s browsing:

Fault Propagation Fold – Ryan Shackleton

Logarithmic spiral, nautilus, and rainbow

The other day I stumbled upon an interesting article on The Guardian online: The medieval bishop who helped to unweave the rainbow. In the article I learned for the first time of Robert Grosseteste, a 13th-century British scholar (with an interesting Italian last name: Grosse teste = big heads) who was also the Bishop of Lincoln.

The Bishop’s interests and investigations covered diverse topics, making him a pre-Renaissance polymath; however, it is his 1225 treatise on colour, the De Colore, that is receiving much attention.

In a recent commentary in Nature Physics (All the colours of the rainbow), and the reference therein (A three-dimensional color space from the 13th century), Smithson et al. (who also recently published a new critical edition/translation of the treatise with analysis and critical commentaries) analyze the 3D colorspace devised by Grosseteste, who claimed it could generate all possible colours and describe the variations of colour among different rainbows.

As we learn from Smithson et al., Grosseteste’s colorspace had three dimensions, quantified by physical properties of the incident light and the medium: the scattering angle (which produces variation of hue within a rainbow), the purity of the scattering medium (which produces variation between different rainbows and is linked to the size of the water droplets in the rainbow), and the altitude of the sun (which produces variation in the light incident on a rainbow). The authors were able to model this colorspace and also to show that the locus of rainbow colours generated in it forms a spiral surface (a family of spiral curves, each for a specific rainbow) in the perceptual CIELab colorspace.

I found this not only fascinating – a three-dimensional, perceptual colorspace from the 13th century!! – but also a source of renewed interest in creating perfect perceptual colormaps by spiralling through CIELab.

My first attempt at colormap spiralling in CIELab, CubicYF, came to life from colours hand-picked on CIELab colour charts at fixed lightness values (found in this document by Gernot Hoffmann). The process was described in this post, and you can see an animation of the spiral curve in CIELab space (created with the 3D color inspector plugin in ImageJ) in the video below:

Some time later, after reading this post by Rob Simmon (in particular the section on the NASA Ames Color Tool), and after an email exchange with Rob, I started tinkering with the idea of creating perceptual rainbow colormaps in CIELab programmatically, by using a helix curve or an Archimedean spiral, but reading Smithson et al. got me to try the logarithmic spiral.

So I started my experiments with a warm-up: replicating a Nautilus using a logarithmic spiral with a growth ratio equal to 0.1759. You may have read that the rate at which a Nautilus shell grows can be described by the golden ratio phi, but in fact the golden spiral constructed from a golden rectangle is not a Nautilus spiral. As an aside, while playing with the code I recalled reading, some time ago, Golden spiral, a nice blog post (with lots of code) by Cleve Moler, creator of the first version of Matlab, who simulated a golden spiral using a continuously expanding sequence of golden rectangles and inscribed quarter circles.

My nautilus-like spiral, plotted in Figure 1, has a growth ratio of 0.1759 instead of the golden ratio of 1.618.

Figure 1: nautilus-like spiral with growth ratio = 0.1759
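For reference, here is a minimal Python/Matplotlib version of such a spiral (my experiments were done in Matlab, so treat this as an equivalent sketch):

import numpy as np
import matplotlib.pyplot as plt

b = 0.1759                     # growth ratio (nautilus-like, not the golden ratio)
theta = np.linspace(0, 6 * np.pi, 1000)
r = np.exp(b * theta)          # logarithmic spiral: r = a*exp(b*theta), with a = 1

ax = plt.subplot(111, projection='polar')
ax.plot(theta, r)
plt.show()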

And here’s the colormap (I called it logspiral) I came up with after a couple of hours of hacking: as hue cycles from 360 to 90 degrees, chroma spirals outward (I used a logarithmic spiral with polar equation c1*exp(c2*h), with a growth ratio c2 of 0.3 and a constant c1 of 20), and lightness increases linearly from 30 to 90.
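The actual Matlab code follows in the next post; meanwhile, here is a rough Python approximation of the idea (the angular parameterization of the chroma spiral is my assumption, not the original code):

import numpy as np
import matplotlib.pyplot as plt
from skimage.color import lab2rgb

n = 256
L = np.linspace(30, 90, n)               # lightness increases linearly from 30 to 90
h = np.deg2rad(np.linspace(360, 90, n))  # hue cycles from 360 down to 90 degrees
c = 20 * np.exp(0.3 * np.linspace(0, np.pi / 2, n))  # chroma spirals outward

# convert Lightness-Chroma-Hue to CIELab a-b coordinates, then to RGB
a = c * np.cos(h)
b = c * np.sin(h)
lab = np.stack([L, a, b], axis=-1).reshape(1, n, 3)
rgb = lab2rgb(lab)                       # out-of-gamut colours are clipped

plt.imshow(rgb, aspect='auto')
plt.axis('off')
plt.show()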

Figure 2 shows the trajectory in the 2D CIELab a-b plane; the colours shown are the final RGB colours. In Figure 3 the trajectory is shown in 3D CIELab space. The coloured lightness profiles were made using the Colormapline submission from the Matlab File Exchange.

Figure 2: logspiral colormap trajectory in CIELab a-b plane

Figure 3: logspiral colormap in CIELab 3D space


N.B. In creating logspiral, I was inspired by Figure 2 in the Nature Physics paper, but there are important differences in terms of colorspace, lightness profile, and perception: I am not certain their polar coordinates are equivalent to Lightness, Chroma, and Hue, although they could be; and, more importantly, the three-dimensional spirals based on Grosseteste’s colorspace go from low lightness at low scattering angles to much higher values at mid scattering angles, and then drop again at high scattering angles (remember that these spirals describe real-world rainbows), whereas lightness in logspiral is strictly monotonically increasing.

In my next post I will share the Matlab code to generate a full set of logspiral colormaps sweeping the hue circle from different starting colours (and end colours), and also the slower-growing logarithmic spirals to make a set of monochromatic colormaps (similar to those in Figure 2 in the Nature Physics paper).


Visualize Mt St Helens with Python and a custom color palette

Evan Bianco of Agile Geoscience wrote a wonderful post on how to use Python to import, manipulate, and display digital elevation data for Mt St Helens before and after the infamous 1980 eruption. He also calculated the difference between the two surfaces to estimate the volume lost in the eruption, further showcasing Python’s capabilities. I encourage readers to go through the extended version of the exercise by downloading his iPython Notebook and the two data files here and here.

I particularly like Evan’s final visualization (consisting of stacked before-eruption, difference, and after-eruption surfaces), which he created in Mayavi, a 3D data visualization module for Python. So much so that I am going to piggyback on his work and show how to import a custom palette into Mayavi and use it to color one of the surfaces.

Python Code

This first code block imports the linear Lightness palette. Please refer to my last post for instructions on where to download the file from.

import numpy as np
# load 256 RGB triplets in 0-1 range:
LinL = np.loadtxt('Linear_L_0-1.txt')
# create r, g, and b 1D arrays:
r = LinL[:, 0]
g = LinL[:, 1]
b = LinL[:, 2]
# scale to 0-255 integers and add a fully opaque ALPHA channel:
r255 = np.floor(255 * r).astype(int)
g255 = np.floor(255 * g).astype(int)
b255 = np.floor(255 * b).astype(int)
a255 = np.full(256, 255, dtype=int)
# create the 256*4 RGBA array for Mayavi's lookup table:
RGBA255 = np.column_stack((r255, g255, b255, a255))

This code block imports the palette into Mayavi and uses it to color the post-eruption Mt St Helens surface. You will need to have run part of Evan’s code to get the data into the variable after.

from mayavi import mlab
# create a figure with a white background
mlab.figure(bgcolor=(1, 1, 1))
# create the surface and assign it to the variable surf
surf = mlab.surf(after, warp_scale=0.2)
# replace the surface's lookup table with the custom RGBA palette
surf.module_manager.scalar_lut_manager.lut.table = RGBA255
# push the updates to the figure
mlab.draw()
mlab.show()

Reference

Mayavi custom colormap example


Related Posts (MyCarta)

The rainbow is dead…long live the rainbow! – Part 5 – CIE Lab linear L* rainbow

Convert color palettes to Python Matplotlib colormaps

Related Posts (external)

How much rock was erupted from Mt St Helens