Geoscience Machine Learning bits and bobs – data completeness

2016 Machine learning contest – Society of Exploration Geophysicists

In a previous post I showed how to use pandas.isnull to find out, for each well individually, whether a column has any null values, and sum to get how many there are, for each column. Here is one of the examples (with more modern, idiomatic pandas syntax than in the previous post):

for well, data in training_data.groupby('Well Name'):
    print(data.isnull().values.any())
    print(data.isnull().sum(), '\n')

Simple and quick, the output showed me that – for example – the well ALEXANDER D is missing 466 samples from the PE log:

Facies         0
Formation      0
Well Name      0
Depth          0
GR             0
ILD_log10      0
DeltaPHI       0
PHIND          0
PE           466
NM_M           0
RELPOS         0
dtype: int64

A more appealing and versatile alternative, which I discovered after the contest, comes with the matrix function from the missingno library. With the code below I can turn each well into a Pandas DataFrame on the fly, then into a missingno matrix plot.

import numpy as np
import matplotlib.pyplot as plt
import missingno as msno

for well, data in training_data.groupby('Well Name'):
    msno.matrix(data, color=(0., 0., 0.45))
    fig = plt.gcf()
    fig.set_size_inches(20, np.round(len(data)/100))  # height of the plot for each well reflects well length
    fig.axes[0].set_title(well, color=(0., 0.8, 0.), fontsize=14, ha='center')

I find that looking at these two plots provides a very compelling and informative way to inspect data completeness, and I am wondering if they couldn’t be used to guide the strategy to deal with missing data, together with domain knowledge from petrophysics.
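For completeness, the dendrogram discussed below can be generated with one more missingno call (a minimal sketch, reusing the same training_data DataFrame and msno import as above):

msno.dendrogram(training_data)   # hierarchical clustering of the logs by nullity correlation
msno.heatmap(training_data)      # optional: pairwise nullity correlation heatmap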

Interpreting the dendrogram in a top-down fashion, as suggested in the library documentation, my first thought is that it may suggest predicting missing values sequentially rather than for all logs at once. For example, looking at the largest cluster on the left, and starting from the top right, I am thinking of testing the use of GR to first predict missing values in RDEP, then both to predict missing values in RMED, then DTC. Then add CALI and use all logs completed so far to predict RHOB, and so on.
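To make the idea concrete, here is a minimal sketch of that sequential completion. It assumes a DataFrame with the log names mentioned above, and uses a random forest regressor as an arbitrary model choice; this is not the code I used in the contest.

from sklearn.ensemble import RandomForestRegressor

df = training_data.copy()
predictors = ['GR']                        # start from a (nearly) complete log
for target in ['RDEP', 'RMED', 'DTC']:     # complete the logs in sequence
    have_preds = df[predictors].notnull().all(axis=1)
    known = have_preds & df[target].notnull()
    to_fill = have_preds & df[target].isnull()
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(df.loc[known, predictors], df.loc[known, target])
    df.loc[to_fill, target] = model.predict(df.loc[to_fill, predictors])
    predictors.append(target)              # each completed log becomes a predictor for the next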

Naturally, this strategy will need to be tested against alternatives using lithology prediction accuracy. I would do that in the context of learning curves: I imagine comparing the training and cross-validation error first using only non-NaN rows, then replacing all NaNs with the mean, and finally comparing this sequential log-completion strategy with an all-in-one strategy.
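A sketch of the comparison I have in mind, using scikit-learn's learning_curve (the feature matrix X and facies labels y would be rebuilt once per imputation strategy; the classifier choice here is arbitrary):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

def facies_learning_curve(X, y):
    sizes, train_scores, cv_scores = learning_curve(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring='accuracy')
    return sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)

# e.g. compare facies_learning_curve(X_dropna, y_dropna),
# facies_learning_curve(X_mean_filled, y), and facies_learning_curve(X_sequential, y)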

Mild or wild: robustness through morphological filtering

This guest post (first published here) is by Elwyn Galloway, author of Scibbatical on WordPress. It is the fourth in our series of collaborative articles about sketch2model, a project from the 2015 Calgary Geoscience Hackathon organized by Agile Geoscience. Happy reading.


We’re highlighting a key issue that came up in our project, and describing how we tackled it. Matteo’s post on Morphological Filtering does a great job of explaining what we implemented in sketch2model. I’ll build on his post to explain the why and how. In case you need a refresher on sketch2model, look back at sketch2model, Sketch Image Enhancement, Linking Edges with Geomorphological Filtering.

Morphological Filtering

As Matteo demonstrated by example, sketch2model’s ability to segment a sketch properly depends on the fidelity of the sketch.


An image of a whiteboard sketch (left) divides an area into three sections. Without morphological filtering, sketch2model segments the original image into two sections (identified as orange, purple) (centre). The algorithm correctly segments the area into three sections (orange, purple, green) when morphological filtering is applied (right).

To compensate for sketch imperfections, Matteo suggested morphological filtering on binarized images. Morphological filtering is a set of image-processing operations that modify the shape of elements in an image. He suggested using the closing tool for our purposes. Have a look at Matteo’s post for insight into this and other morphological filters.

One of the best aspects of this approach is that it is simple to apply. There is essentially one parameter to define: a structuring element. Since you’ve already read Matteo’s post, you recall his onion analogy explaining the morphological filtering processes of erosion and dilation – erosion is akin to removing an onion layer, dilation is adding a layer on. You’ll also recall that the size of the structuring element is the thickness of the layer added to, or removed from, the onion. Essentially, the parameterization of this process comes down to choosing the thickness of the onion layers.

Sketch2model uses dilation followed by erosion to fill gaps left between sketch lines (morphological dilation followed by erosion is closing). Matteo created this really great widget to illustrate closing using an interactive animation.


Matteo’s animation was created using this interactive Jupyter notebook. Closing connects the lines of the sketch.
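If you want to experiment outside that notebook, here is a minimal closing example using scikit-image on a toy binarized sketch (this is a sketch of the concept, not sketch2model’s actual code):

import numpy as np
from skimage.morphology import binary_closing, disk

# toy binarized sketch: a 3-pixel-thick line with a 2-pixel gap
sketch = np.zeros((21, 41), dtype=bool)
sketch[9:12, 2:18] = True
sketch[9:12, 20:39] = True

footprint = disk(2)                           # the "onion layer" thickness, in pixels
closed = binary_closing(sketch, footprint)    # dilation followed by erosion

print(sketch[10, 18:20])                      # [False False] -> the gap
print(closed[10, 18:20])                      # [ True  True] -> the gap is bridged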

Some is Good, More is Better?

Matteo showed that closing fails if the structuring element used is too small. So just make it really big, right? Well, there can be too much of a good thing. Compare what happens when you use an appropriately sized structuring element (mild) to the results from an excessively large structuring element (wild).


Comparing the results of mild and wild structuring elements: if the structuring element is too large, the filter compromises the quality of the reproduction.

Using a morphological filter with a structuring element that is too small doesn’t fix the sketches, but using a structuring element that is too large compromises the sketch too. We’re left to find an element that’s just right. Since one of the priorities for sketch2model was to robustly handle a variety of sketches with as little user input as possible — marker on whiteboard, pencil on paper, ink on napkin — we were motivated to find a way to do this without requiring the user to select the size of the structuring element.

Is there a universal solution? Consider this: a sketch captured in two images, each with their own resolution. In one image, the lines of the sketch appear to be approximately 16 pixels wide. The same lines appear to be 32 pixels wide in the other image. Since the size of the structuring element is defined in terms of pixels, it becomes apparent the ideal structuring element cannot be “one size fits all”.


High-resolution (left) versus low-resolution (right) image of the same portion of a sketch. Closing the gap between the lines would require a different size structuring element for each image: about 5 pixels for high-resolution or 1 pixel for low-resolution.

Thinking Like a Human

Still motivated to avoid user parameterization for the structuring element, we explored ways to make the algorithm intelligent enough to select an appropriate structuring element on its own. Ultimately, we had to realize a few things before we came up with something that would work:

  1. When capturing an image of a sketch, users compose very similar images (compose in the photographic sense of the word): the sketch is centered and nearly fills the captured image.
  2. The image of a sketch is not the same as a user’s perception of a sketch: a camera may record imperfections (gaps) in a sketch that a user does not perceive.
  3. The insignificance of camera resolution: a sketched feature captured at two different resolutions would have two different lengths (in pixels), but identical lengths when defined as a percentage of image size.

With these insights, we deduced that the gaps we were trying to fill with morphological filtering would be those that escaped the notice of the sketch artist.

Recognizing the importance of accurate sketch reproduction, our solution applies the smallest structuring element possible that will still fill any unintentional gaps in a sketch. It does so in a way that is adaptable.

A discussion about the definition of “unintentional gap” allowed us to create a mandate for the closing portion of our algorithm. Sketch2model should fill gaps the user doesn’t notice. Detail below the limit of the user’s perception should not affect the output model. A quick “literature” (i.e. Google) search revealed that a person’s visual perception is affected by many factors beyond the eye’s optic limits. Without a simple formula to define a limit, we did what any hacker would do… defined it empirically. We used a bunch of test images to tweak the structuring element of the closing filter so that it leaves the perceptible gaps and fills in the imperceptible ones. In the sketch2model algorithm, the size of the structuring element is defined as a fraction of the image size, so it was the fraction that we tuned empirically.
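A minimal sketch of that idea, again using scikit-image (the 1% fraction below is a placeholder, not the empirically tuned value used in sketch2model):

from skimage.morphology import binary_closing, disk

def close_gaps(binary_sketch, fraction=0.01):
    # structuring element sized as a fraction of the larger image dimension,
    # so the behaviour does not depend on the image resolution
    radius = max(1, int(round(fraction * max(binary_sketch.shape))))
    return binary_closing(binary_sketch, disk(radius))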

Producing Usable Results

Implicit in the implementation is sketch2model’s expectation that the user’s sketch, and their image of the sketch, are crafted with some care. The expectations are reasonable: connect lines you’d like connected; get a clear image of your sketch. Like so much else in life, better input gives better results.


Input (left) and result (right) of sketch2model.

To produce an adaptable algorithm requiring as little user input as possible, the sketch2model team had to mix a little image processing wizardry with some non-technical insight.

Have you tried it? You can find it on GitHub.

Previous posts in the sketch2model series: sketch2model, Sketch Image Enhancement, Linking Edges with Geomorphological Filtering.

Machine Learning in Geoscience with Scikit-learn. Part 2: inferential statistics and domain knowledge to select features for oil prediction

In the first post of this series I showed how to use Pandas, Seaborn, and Matplotlib to:

  • load a dataset
  • test, clean up, and summarize the data
  • start looking for relationships between variables using scatterplots and correlation coefficients

In this second post, I will expand on the latter point by introducing some tests and visualizations that will help highlight the possible criteria for choosing some variables, and dropping others. All in Python.

I will use a different dataset than that in the previous post. This one is from the paper “Many correlation coefficients, null hypotheses, and high value” (Lee Hunt, CSEG Recorder, December 2013).

The target to be predicted is oil production from a marine barrier sand. We have measured production (in tens of barrels per day) and 7 initially unknown predictors at 21 wells.

Hang on tight, and read along, because it will be a wild ride!

I will show how to:

1) automatically flag linearly correlated predictors, so we can decide which might be dropped. In the example below (a matrix of pair-wise correlation coefficients between variables) we see that X2 and X7, the second and third best individual predictors of production (shown in the bottom row), are also highly correlated to X1, the best overall predictor. A sketch of this flagging, together with the critical r test of the next item, is shown right after this list.

2) automatically flag predictors that fail a critical r test

3) create a table to assess the probability that a certain correlation is spurious, in other words the probability of getting a correlation coefficient at least as high as the one we got with our sample purely by chance.
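As a preview, here is a minimal sketch of the first two items (the column names, the 0.8 threshold, and the 0.05 significance level are placeholders, not the values used in the notebook linked at the end of the post):

import numpy as np
from scipy import stats

# data is a DataFrame holding the predictors X1..X7 and Production for the 21 wells
corr = data.corr()

# item 1: flag pairs of predictors with |r| above a threshold
threshold = 0.8
flagged = [(a, b, corr.loc[a, b])
           for i, a in enumerate(corr.columns)
           for b in corr.columns[i + 1:]
           if abs(corr.loc[a, b]) >= threshold]

# item 2: critical r for a two-tailed test at alpha = 0.05 with N = 21 samples
N, alpha = 21, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, N - 2)
r_crit = t_crit / np.sqrt(t_crit**2 + N - 2)
weak = corr['Production'].drop('Production').abs() < r_crit   # predictors failing the test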

I do not recommend running these tests and applying the criteria blindly. Rather, I will suggest how to use them to learn more about the data and, in conjunction with domain knowledge about the problem at hand (in this case oil production), make more informed choices about which variables should, and which should not, be used.

And, of course, I will show how to make the prediction.

Have fun reading: get the Jupyter notebook on GitHub.

Reinventing the color wheel – part 1

In New Matlab isoluminant colormap for azimuth data I showcased a Matlab colormap that I believe is perceptually superior to the conventional, HSV-based colormaps for azimuth data, in that it does not superimpose on the data the color artifacts that plague all rainbows. However, it still has a limitation, which is that the main colours do not correspond exactly to the four compass directions N, E, W, and S.

My intention with this series is to go back to square one, deconstruct the conventional colormaps for azimuth, and build a new one that has all the desired properties of both perceptual linearity, and correct location of the main colors. All reproducible in Python.

If we wanted to build from scratch a colormap for azimuth (or phase) data the main tasks would be to generate a sequence of distinguishable colours at opposite quadrants, or compass directions (like 0 and 180 degrees, or N and S), and to wrap around the sequence with the same colour at the two ends.

But to do that, we should avoid interpolating linearly between fully saturated hues in RGB or HSL space.

To illustrate why, it is useful to look at the figure below. On the left is a hue circle with primary, secondary, and tertiary colours in a counter-clockwise sequence: red, rose, magenta, violet, blue, azure, cyan, aquamarine, electric green, chartreuse, yellow, and orange. The colour chips are placed at evenly spaced angular distances according to their hue (in radians).


Left, primary, secondary, and tertiary colour chips arranged using hue for angular distance; right, the same colour chips arranged using intensity for angular distance.

This looks familiar and seems like a natural ordering of colors, so, in building a colormap, we may be tempted to just take that sequence, wrap it around at the red (or the magenta), and linearly interpolate to 256 colours to get a continuous colormap [1], and use it for azimuth data; this is how the conventional azimuth colormaps are usually built.

On the right side in the figure the chips have been rearranged according to their intensity on a counter-clockwise sequence from 0 to 255, with 0 at the 3 o’clock position; so, for example, blue, which is the darkest colour with an intensity of 29, is close to the beginning of the sequence, and yellow, the brightest with an intensity of 225, is close to the end. Notice that the chips are no longer equidistant.

The most striking difference is that the blue and the yellow chips are more separated than the other chips, and for this reason blue and yellow features seem to stand out a lot more in a map when using this color sequence, which can be both distracting and confusing. A good example is Figure 3 in New Matlab isoluminant colormap for azimuth data.

Also, yellow and red, being two chips apart in the left circle in the figure above, are used to colour azimuths 60 degrees apart, and so are cyan and green. However, if we look at the right circle, we realize that the yellow and red chips are much further apart than the cyan and green chips [2] in the perceptual dimension of intensity; therefore, features colored in yellow and red could be perceived as much further apart (in azimuth) than cyan and green.

These differences may be subtle, but in my opinion they become important when dip azimuth is combined with other attributes, perhaps using a 3D colormap, and the resulting map is used for detailed structural interpretation. There is a really good example of this type of 3D colormap in Chopra and Marfurt (2007), where dip azimuth is rendered with hue modulation, dip magnitude with saturation modulation, and coherence with lightness modulation.

A code snippet with the main Python commands to generate the two polar scatterplots in the figure is listed, and explained, below. The full code can be found in this Jupyter Notebook.

01 import matplotlib.colors as clr
02 keys=['red', '#FF007F', 'magenta', '#7F00FF', 'blue', '#0080FF','cyan', '#00FF80',
   '#00FF00', '#7FFF00', 'yellow', '#FF7F00']
03 my_cmap = clr.ListedColormap(keys)
04 x = np.arange(12)
05 color = my_cmap(x)
06 n = 12
07 theta = 2*np.pi*np.linspace(0, 1, 12, endpoint=False)
08 r = np.ones(12)*2.5
09 area = 200*r**2 # size of color chips
10 c = plt.scatter(theta, r, c=color, s=area)
11 theta_i = 2*np.pi*(sorted_intensity/255.0)
12 colors = my_sorted_cmap(np.arange(12))
13 c = plt.scatter(theta_i, r, c=colors, s=area)

In line 01 we import the colors module from the Matplotlib library, then line 02 creates the desired sequence of colours (red, rose, magenta, violet, blue, azure, cyan, aquamarine, electric green, chartreuse, yellow, and orange) using either the name or the hex code, and line 03 generates the colormap. Then we use lines 04 and 05 to assign colours to the chips in the first scatterplot (left), and lines 06, 07, 08, and 09 to specify the number of chips, their angular distances, radial positions, and areas, respectively. Line 10 generates the plot. The modifications in lines 11-13 result in the scatterplot on the right side of the figure (the sorted intensity is calculated in much the same way as in my Geophysical tutorial – How to evaluate and compare colormaps in Python).
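The snippet references sorted_intensity and my_sorted_cmap, which are built earlier in the notebook; one plausible reconstruction (my assumption, using standard luma weights, which give values close to the 29 for blue and 225 for yellow quoted above) is:

intensity = np.dot(color[:, :3] * 255, [0.299, 0.587, 0.114])   # grey value of each chip
order = np.argsort(intensity)                                   # darkest to brightest
sorted_intensity = intensity[order]
my_sorted_cmap = clr.ListedColormap(np.array(keys)[order])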


[1] Or, perhaps, just create 12 discrete colour classes to group azimuth values in bins of pi/6 (30 degrees) each, and wrap around again at the magenta, to generate a discrete colormap.

[2] The green chip is almost completely covered by the orange chip.

Logarithmic spiral, nautilus, and rainbow

The other day I stumbled into an interesting article on The Guardian online: The medieval bishop who helped to unweave the rainbow. In the article I learned for the first time of Robert Grosseteste, a 13th century British scholar (with an interesting Italian last name: Grosse teste = big heads) who was also the Bishop of Lincoln.

The Bishop’s interests and investigations covered diverse topics, making him a pre-Renaissance polymath; however, it is his 1225 treatise on colour, the De Colore, that is receiving much attention.

In a recent commentary in Nature Physics (All the colours of the rainbow), and in the paper referenced therein (A three-dimensional color space from the 13th century), Smithson et al. (who also recently published a new critical edition/translation of the treatise with analysis and critical commentaries) analyze the 3D colorspace devised by Grosseteste, who claimed that it allows the generation of all possible colours and describes the variations of colours among different rainbows.

As we learn from Smithson et al., Grosseteste’s colorspace had three dimensions, quantified by physical properties of the incident light and the medium: the scattering angle (which produces variation of hue within a rainbow), the purity of the scattering medium (which produces variation between different rainbows and is linked to the size of the water droplets in the rainbow), and the altitude of the sun (which produces variation in the light incident on a rainbow). The authors were able to model this colorspace and also to show that the locus of rainbow colours generated in it forms a spiral surface (a family of spiral curves, each for a specific rainbow) in the perceptual CIELab colorspace.

I found this not only fascinating – a three-dimensional, perceptual colorspace from the 13th century!! – but also a source of renewed interest in creating the perfect perceptual colormaps by spiralling through CIELab.

My first attempt at colormap spiralling in CIELab, CubicYF, came to life by selecting hand-picked colours on CIELab colour charts at fixed lightness values (found in this document by Gernot Hoffmann). The process was described in this post, and you can see an animation of the spiral curve in CIELab space (created with the 3D color inspector plugin in ImageJ) in the video below:

Some time later, after reading this post by Rob Simmon (in particular the section on the NASA Ames Color Tool), and after an email exchange with Rob, I started tinkering with the idea of creating perceptual rainbow colormaps in CIELab programmatically, by using a helix curve or an Archimedean spiral, but reading Smithson et al. got me to try the logarithmic spiral.

So I started my experiments with a warm-up and tried to replicate a Nautilus using a logarithmic spiral with a growth ratio equal to 0.1759. You may have read that the rate at which a Nautilus shell grows can be described by the golden ratio phi, but in fact the golden spiral constructed from a golden rectangle is not a Nautilus spiral (as an aside, as I was playing with the code I recalled reading, some time ago, Golden spiral, a nice blog post (with lots of code) by Cleve Moler, creator of the first version of Matlab, who simulated a golden spiral using a continuously expanding sequence of golden rectangles and inscribed quarter circles).

My nautilus-like spiral, plotted in Figure 1, has a growth ratio of 0.1759 instead of the golden ratio of 1.618.

Figure 1: nautilus-like spiral with growth ratio = 0.1759
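A quick way to reproduce a spiral like the one in Figure 1 in plain numpy/matplotlib (the growth ratio is the value quoted above; everything else is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 6 * np.pi, 1000)   # three full turns
r = np.exp(0.1759 * theta)                # logarithmic spiral r = a*exp(b*theta), with a=1, b=0.1759

ax = plt.subplot(111, projection='polar')
ax.plot(theta, r)
plt.show()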

And here’s the colormap (I called it logspiral) I came up with after a couple of hours of hacking: as hue cycles from 360 to 90 degrees, chroma spirals outwardly (I used a logarithmic spiral with polar equation c1*exp(c2*h) with a growth ratio c2 of 0.3 and a constant c1 of 20), and lightness increases linearly from 30 to 90.
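Below is a minimal sketch of that recipe; it is my interpretation of the parameters, not the exact code behind Figures 2 and 3. I assume the angle driving the spiral is the hue sweep converted to radians, and I use scikit-image’s lab2rgb for the conversion back to RGB.

import numpy as np
from skimage.color import lab2rgb

n = 256
hue = np.linspace(360, 90, n)                  # hue cycles from 360 down to 90 degrees
L = np.linspace(30, 90, n)                     # lightness increases linearly
theta = np.deg2rad(360 - hue)                  # angle swept by the spiral so far
C = 20.0 * np.exp(0.3 * theta)                 # chroma: c1*exp(c2*theta), c1=20, c2=0.3

a = C * np.cos(np.deg2rad(hue))
b = C * np.sin(np.deg2rad(hue))
lab = np.dstack([L, a, b])                     # shape (1, n, 3) for lab2rgb
logspiral = np.clip(lab2rgb(lab), 0, 1)[0]     # 256 RGB rows; out-of-gamut colours are clipped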

Figure 2 shows the trajectory in the 2D CIELab a-b plane; the colours shown are the final RGB colours. In Figure 3 the trajectory is shown in 3D CIELab space. The coloured lightness profiles were made using the Colormapline submission from the Matlab File Exchange.

Figure 2: logspiral colormap trajectory in CIELab a-b plane

Figure 3: logspiral colormap in CIELab 3D space


N.B. In creating logspiral, I was inspired by Figure 2 in the Nature Physics paper, but there are important differences in terms of colorspace, lightness profile and perception: I am not certain their polar coordinates are equivalent to Lightness, Chroma, and Hue, although they could be; and, more importantly, the three-dimensional spirals based on Grosseteste’s colorspace go from low lightness at low scattering angles to much higher values at mid scattering angles, and then drop again at high scattering angles (remember that these spirals describe real-world rainbows), whereas lightness in logspiral is strictly monotonically increasing.

In my next post I will share the Matlab code to generate a full set of logspiral colormaps sweeping the hue circle from different starting colours (and end colours), and also the slower-growing logarithmic spirals to make a set of monochromatic colormaps (similar to those in Figure 2 in the Nature Physics paper).


New rainbow colormap: sawtooth-shaped lightness profile

Why another rainbow

In the comment section of my last post, Steve Eddins from Mathworks reported that some Matlab users prefer Jet to Parula, the new default perceptual colormap in Matlab, because within certain ranges Jet affords greater contrast, understood as the rate of change in lightness.

My counter-argument to that is that yes, some data may benefit from being displayed using Jet (in terms of contrast, and hence the power to resolve smaller anomalies) because of those areas of very steep rate of change of lightness, like the blue to cyan and yellow to red portions (see Figure 1). But the price one has to pay is that there is an area of very low gradient (a greenish band between cyan and yellow) with nearly no contrast, which would obfuscate subtle anomalies in the data. On top of that, there’s no control over where each of those areas is located, so a lot of effort has to go into trying to fit those regions of artificially high contrast to the portion of the data of interest.


Figure 1

Because of their high lightness, the yellow and cyan artificial edges also cause problems. In his latest blog post Steve uses a test pattern to demonstrate how they make the interpretation of trivial structures more difficult. He also explains why they occur in some locations and not others in the first place. I wonder if the resulting regions of high lightness juxtaposed to regions of low lightness could be chromatic Mach bands.

Additionally, as Steve points out, the low-contrast juxtaposition of dark red and dark blue bands creates the visual illusion of depth (Chromostereopsis) in other positions of the test pattern, creating further confusion.

But I have some good news for the hardcore fans of Jet, and of rainbow colormaps in general. I created a rainbow with a sawtooth-shaped lightness profile made up of 5 ramps, each with the same rate of change in lightness and a total lightness change of 60, with alternating negative and positive signs. This is shown in Figure 2, and replaces the lightness profile of a basic 6-color rainbow (magenta-blue-cyan-green-yellow-red) shown in Figure 3.

Figure 2

Figure 3

With this rainbow users have the ability to apply greater contrast to their data to boost small anomalies, but in a more controlled way. The colormap is available with my File Exchange function, Perceptually improved colormaps. Below is the Matlab code I used to generate the new rainbow.

Matlab code

To run this code you will need Colorspace, a free function from Matlab File Exchange, for the color space transformations.

%% basic 6-colour rainbow
% create RGB components
m = [1, 0, 1]; % magenta
b = [0, 0, 1]; % blue
c = [0, 1, 1]; % cyan
g = [0, 1, 0]; % green
y = [1, 1, 0]; % yellow
r = [1, 0, 0]; % red
% concatenate components
rgb = vertcat(m,b,c,g,y,r);
% interpolate to 256 colours
rainbow=interp1(linspace(1, 256, 6),rgb,[1:1:256]);
%% calculate Lab components
% convert from RGB to Lab colour space
% requires this function: Colorspace transformations
lab = colorspace('RGB->Lab',rainbow);
%% replace random lightness profile with sawtooth-shaped profile
% contrast (magnitude of lightness change) between
% each pair of adjacent colors set to 60
L1 = [90, 30, 90, 30, 90, 30];
% interpolate to 256 lightness values
L1int = interp1(linspace(1, 256, 6),L1,[1:1:256])';
% replace
lab1 = horzcat(L1int,lab(:,2),lab(:,3));
%% new rainbow
% convert back from Lab to RGB colour space
swtth = colorspace('RGB<-Lab',lab1);

Test results

Figures 4, 5, and 6 show the three colormaps used with my Pyramid test surface (notice in Figure 5 that the green band artifact with this rainbow is even more pronounced than with jet). I welcome feedback.

Figure 4

Figure 5

Figure 6


The coloured lightness profiles were made using the Colormapline submission from the Matlab File Exchange.


Visualizing colormap artifacts

In Evaluate and compare colormaps, I have shown how to extract and display the lightness profile of a colormap using Python. I do this routinely with colormaps, but I realize it takes an effort, and not all users may feel comfortable using code to test whether a colormap is perceptual or not.

This got me thinking that there is perhaps a need for a user-friendly, interactive tool to help identify colormap artifacts, and wondering what it would look like.

In a previous post, Comparing color palettes, I plotted the elevation for the South American continent from the Global Land One-km Base Elevation Project using four different color palettes. In Figure 1 below I plot again 3 of those: rainbow, linear lightness rainbow, and grayscale, respectively, from left to right. In maps like these some artifacts are very evident. For example there’s a classic film negative effect in the map on the left, where the Guiana Highlands and the Brazilian Highlands, both in blue, seem to stand lower than the Amazon basin, in violet. This is due to the much lower lightness (or alternatively intensity) of the colour blue compared to the violet.


Figure 1


However, other artifacts are more subtle, like the inversion of the highest peaks in the Andes, which are coloured in red, relative to their surroundings, in particular the Altiplano, an endorheic basin that includes Lake Titicaca.

My idea for this tool is simple, and consists of two windows. The first is a basemap window which can display either a demo dataset or user data loaded from an ASCII grid file. In this window the user would interactively select a profile by building a polyline with point-and-click, like the one in Figure 2 in white.


Figure 2

The second window would show the elevation profile with the colour fill assigned based on the colormap, like in Figure 3 at the bottom (with colormap to the right), and with a profile of the corresponding colour intensities (on a scale 1-255) at the top.

In this view it is immediately evident that, for example, the two highest peaks near the center, coloured in red, are relative intensity lows. Another anomaly is the absolute intensity low on the right side, corresponding to the colour blue, where the elevation profile varies smoothly.
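A rough sketch of how the second window’s top panel could be computed (hypothetical file and variable names; the luma weights below are one common definition of intensity):

import numpy as np
import matplotlib.pyplot as plt

elevation = np.loadtxt('profile_elevation.txt')            # 1-D elevation samples along the polyline
norm = (elevation - elevation.min()) / np.ptp(elevation)
rgba = plt.get_cmap('jet')(norm)                           # or any colormap under test
intensity = 255 * np.dot(rgba[:, :3], [0.299, 0.587, 0.114])

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
ax_top.plot(intensity, 'k')                                # intensity profile (top panel)
ax_bottom.scatter(np.arange(elevation.size), elevation, c=rgba, s=6)  # coloured elevation (bottom)
plt.show()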

Figure 3

I created this concept prototype using a combination of Matlab, Python, and Surfer. I welcome suggestions for possible additional features, and would like to hear from folks interested in collaborating on a web app (ideally in Python).

What your brain does with colours when you are not “looking” – part 2

In What your brain does with colours when you are not “looking”, part 1, I displayed some audio spectrogram data (courtesy of Giuliano Bernardi at the University of Leuven) using 5 different colormaps to render the amplitude values: Jet (until recently Matlab’s standard colormap), grayscale, linear lightness rainbow, modified heated body, and cube lightness rainbow. I then asked readers to cast a vote for what they thought was the best colormap to visualize this dataset.

I was curious to see how all these colormaps fared, but my expectation was that Jet would sink to the bottom. I was really surprised to see it come out on top, one vote ahead of the linear lightness rainbow (21 and 20 votes out of 62, respectively). The modified heated body followed with 11 votes.

My surprise comes from the fact that Jet carries perceptual artifacts within the progression of colours (see, for example, this post). One way to demonstrate these artifacts is to convert the 2D map into a 3D surface where again we use Jet to colour amplitude values, but we use the intensities from the 2D map for the elevation. This can be done, for example, using the Interactive 3D Surface Plot plugin for ImageJ (as in my previous post). The resulting surface is shown in Figure 1. This is almost exactly what your brain would do when you look at the 2D map colored with Jet in the previous post.


Figure 1
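This is not the ImageJ plugin, but a similar display can be sketched in matplotlib (the spectrogram file name is a placeholder; intensity is again approximated with luma weights):

import numpy as np
import matplotlib.pyplot as plt

amplitude = np.load('spectrogram.npy')                     # 2-D array of amplitude values
norm = (amplitude - amplitude.min()) / np.ptp(amplitude)
rgba = plt.get_cmap('jet')(norm)                           # colours as they appear in the 2-D map
intensity = np.dot(rgba[..., :3], [0.299, 0.587, 0.114])   # elevation = intensity of those colours

Y, X = np.mgrid[0:amplitude.shape[0], 0:amplitude.shape[1]]
ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X, Y, intensity, facecolors=rgba, rstride=2, cstride=2, linewidth=0)
plt.show()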

In Figure 2 the same data is now displayed as a surface where amplitude values were used for the elevation, with very light sun shading to help a bit with the perception of relief, but no colormap at all. When comparing Figure 1 with Figure 2, one of the artifacts is immediately recognized: the highest values in Figure 2 (which honours the data) become a relative low in Figure 1. This is because red has lower intensity than yellow, and therefore data colored in red in 2D are plotted at a lower elevation than data colored in yellow, even though the amplitudes of the latter were lowest.


Figure 2

For these reasons, I did not expect Jet to be the top pick. On the other hand, I think Jet is perhaps favoured because, with consistent use, our brain learns in part to accommodate these artifacts in 2D maps, and because it has at least two regions of higher contrast (higher magnitude gradient) than other colormaps. Unfortunately, as I wrote in a recently published tutorial, these regions are randomly placed in the colormap, and the gradients are variable, so we gain on contrast but lose on faithfulness in representing the data structure.

Matt Hall wrote a great comment following the previous post, making a real argument for switching between multiple colormaps in the interpretation stage to explore and highlight features in both the signal and the noise in the data, and suggesting that perhaps no single colormap is best overall. I agree 100% with almost everything Matt said, except perhaps on the best overall: looking at the 2D maps, at least with this dataset, I feel the heated body could be the best overall colormap, even if marginally. In Figure 3, Figure 4, Figure 5, and Figure 6 I show the 3D displays obtained by converting the 2D grayscale, linear lightness rainbow, modified heated body, and cube lightness rainbow, respectively. Looking at the 3D displays all together confirms that feeling.

What do you think?


Figure 3


Figure 4


Figure 5


Figure 6