Upscaling geophysical logs with Python using Pandas and Bruges

With a few hours of work last weekend, I finished putting together a Jupyter notebook tutorial, started at the Geophysics Python sprint 2018, demonstrating how to:

    • Use Agile Scientific’s Welly to load two wells with several geophysical logs
    • Use PandasWelly, and NumPy to: remove all logs except for compressional wave velocity (Vp), shear wave velocity (Vs), and density (RHOB); store the wells in individual DataFrames; make the sampling rate common to both wells; check for null values; convert units from imperial to metric; convert slowness to velocity; add a well name column
    • Split the DataFrame by well using unique values in the well name column
    • For each group/well use Agile Scientific’s Bruges ‘s Backus average to upscale all curves individually
    • Add the upscaled curves back to the DataFrame

Matt Hall, (organizer), told me during a breakfast chat on the first day of the sprint that this tutorial would be a very good to have since it is one of the most requested examples by neophyte users of the Bruges library; I was happy to oblige.

The code for the most important bit, the last two items in the above list, is included below:

# Define parameters for the Backus filter
lb = 40   # Backus length in meters
dz = 1.0  # Log sampling interval in meters

# Do the upscaling work
wells_bk = pd.DataFrame()
grouped = wells['well'].unique()  
for well in grouped:
    new_df = pd.DataFrame()
    Vp = np.array(wells.loc[wells['well'] == well, 'Vp'])
    Vs = np.array(wells.loc[wells['well'] == well, 'Vs'])
    rhob = np.array(wells.loc[wells['well'] == well, 'RHOB'])
    Vp_bks, Vs_bks, rhob_bks = br.rockphysics.backus(Vp, Vs, rhob, lb, dz)
    new_df['Vp_bk'] = Vp_bks
    new_df['Vs_bk'] = Vs_bks
    new_df['rhob_bk'] = rhob_bks
    wells_bk = pd.concat([wells_bk, new_df])

# Add to the input DataFrame
wells_final = (np.concatenate((wells.values, wells_bk.values), axis=1)) 
cols = list(wells) + list(wells_bk) 
wells_final_df = pd.DataFrame(wells_final, columns=cols)

And here is a plot comparing the raw and upscaled Vp and Vs logs for one of the wells:

backus

Please check the notebook if you want to try the full example.

Geophysics Python sprint 2018 – day 2 and beyond, part II

In the last post I wrote about what Volodymyr and I worked on during a good portion of day two of the sprint in October, and continued to work on upon our return to Calgary.

In addition to that I also continued to work on a notebook example, started in day one, demonstrating on how to upscale sonic and density logs from more than one log at a time using Bruges ‘ backusand Panda’s groupby. This will be the focus of a future post.

The final thing I did was to write, and test an error_flag function for Bruges. The function calculates the difference between a predicted and a real curve; it flags errors in prediction if the difference between the curves exceeds a user-defined distance (in standard deviation units) from the mean difference. Another option available is to check whether the curves have opposite slopes (for example one increasing, the other decreasing within a specific interval). The result is a binary error log that can then be used to generate QC plots, to evaluate the performance of the prediction processes in a more (it is my hope) insightful way.

The inspiration for this stems from a discussion over coffee I had 5 or 6 years ago with Glenn Larson, a Geophysicist at Devon Energy, about the limitations of (and alternatives to) using a single global score when evaluating the result of seismic inversion against wireline well logs (the ground truth). I’d been holding that in the back of my mind for years, then finally got to it last Fall.

flag_full

Summary statistics can also be calculated by stratigraphic unit, as demonstrated in the accompanying Jupyter Notebook.

Having fun learning Python with your kids

When I was a kid I loved video games. Now … we’re talking about 70s and 80s games: my first home computer was a Texas Instruments TI-99/4A, my second one a Commodore 64. I loved all games in those days, no matter how primitive they might seem like now; you can get a glimpse of WHAT we are talking about in this video. And funny enough, I still really do love them; below is one of my favorites, H.E.R.O. which I can play these days thanks to a C64 emulator.

HERO

Unfortunately for me, I did not catch (not then, at least) the programming bug. What really turned me down was having tried to type thousands of lines of BASIC to run what looked like an awesome adventure game from a command list in a magazine, only to never be able to play the game because the list was full of typos.

So, I want to make it better for both my daughter, AND myself,  because I plan to make more games. First of all, we are working with Python. There are a lot of resources for making games in Python; a really good one is Al Sweigart‘s Invent Your Own Computer Games with Python (as an aside: I have been using a chapter here, a chapter there from many of his books, it is awesome that he made them available online, but it was time to give back so I bought the 4th edition of Invent Your Own Computer Games with Python).

To set the stage for our first game I reviewed the first few chapters of the book, particularly Chapter 5: Dragon Realm, to get some ideas on how to use Python to make interactive games, but then decided to start with our own game: a calculator that would talk to the player, via the terminal, and ask for numbers and for their favorite algebraic operation. However, we added a twist because it sounded a lot more fun: IF her mom were to play, we’d want the results to be occasionally wrong. My daughter worked out some of the logic, after suggesting the twist, and I did the heavy lifting with Python; in the remainder of the post I will work through how we did it. Please note this game is written for Python 3.

In the first block, Jester, the calculator, introduces itself and then asks the player for their name; the name is stored to a variable via the input function and used for greeting and throughout the game. The capitalize() method turns the first letter of the name string to upper case, if needed.

print ('Hello, I am a talking calculator and my name is Jester.') 
name = input('What is your name? ') 
print ('Hello ' + name.capitalize() + ', nice to meet you!')

The real game starts in the big block below:

while True:
  operation = input(name.capitalize() + 
', would you like to add or subtract? (enter "a" or "s"; "q" to quit) ')
  if operation.lower() =='q':
    print('Goodbye ' + name.capitalize())
    break
  elif operation.lower() =='s' or operation.lower() =='a':
    print ('OK, ready.')
    input1 = input('Can I have the first number? ')
    try:
        num1=int(input1)
    except ValueError:
        print ("That is not a number!")
        input1 = input('Can I have the first number? ')
    input2 = input('Can I have the second number? ')
    try:
       num2=int(input2)
    except ValueError:
        print("That is not a number!")
        input2 = input('Can I have the second number? ')
    num1=int(input1)
    num2=int(input2)

Let’s break it up. First, the player chooses an operation (addition or subtraction) again via the input function. The while loop allows the player to continue playing after each operation.

while True:
  operation = input(name.capitalize() + 
', would you like to add or subtract? (enter "a" or "s"; "q" to quit) ')

The following if statement is used to exit the while loop (using break) if the player decides to quit.

  if operation.lower() =='q':
    print('Goodbye ' + name.capitalize())
    break

An elif statement comes next, allowing the player to chose the operation; at the same time if neither ‘a’ nor ‘s’ are chosen, the program will go back to the beginning of the while loop.

  elif operation.lower() =='s' or operation.lower() =='a':
    print ('OK, ready.')

Finally, Jester asks for the first number to be added or subtracted. We want to be sure that the string passed by the user in here is a number, so we use a try clause to handle a possible exception (for example a letter is entered instead of a number). Here is the logic: if when prompted for the first number we typed a letter by accident (for example the string ‘w’, instead of the string ‘3’ , when input1  is passed to int we’d get this fatal error and the program would be interrupted:

>>> num1=int(input1)
>>> ValueError: invalid literal for int() with base 10: 'w'

With the try clause we instead handle the exception by printing the custom error message “That is not a number!“, followed by a new request for a number:

    input1 = input('Can I have the first number? ')
    try:
        num1=int(input1)
    except ValueError:
        print ("That is not a number!")
        input1 = input('Can I have the first number? ')
    num1=int(input1)

The above cycle is repeated for the second number. After Jester gets two legitimate integers to work with, the operation part begins, which includes Jester’s prank. Indeed, depending on the name of the player, the result will be either always correct, or sometimes correct, sometimes a bit off (which is done using random). We picked “Olivia” to represent the “lucky” object of the prank. This is the only hard-coded part of the program. It needs to be changed with the name of the intended “victim”.

There are two nested if statements below that allow for the following to happen: if the player’s name is Olivia, a random number between 1 and 3, rand1 is drawn.  Then the  value of rand1 is checked: if if it is 3 then a non-zero number, num3, of value drawn between 1 and 5 (rand2) will be added to the result of the operation to make it wrong; otherwise the operation will execute correctly.

from random import randint
num3 = 0 # initializes random error variable
  if (name.lower() == 'olivia'): # for this player ....
    rnd1 = randint(1,3)
    rnd2 = randint(1,5)
    if rnd1 == 3: # .... if a 3 is drawn the correct result 
      num3 = rnd2 # .... will be modified by this random amount

Finally, this is the addition and subtraction block: in both cases unless “Olivia” is the player num3 will be zero and the result will be correct.

  if operation.lower() == 'a':
    result = num1 + num2 + num3
    print (str(num1) + ' + '  + str(num2) + ' = ' + str(result))
  elif operation.lower() == 's':
    result = (max(num1, num2) - min(num1, num2)) - num3
    print (str(max(num1, num2)) + ' - ' 
         + str(min(num1, num2))  + ' = ' + str(result))

If you want to play the game, please copy the full program below and paste it in a .py file and give it a try. Enjoy!

# NB optimized for Python 3
from random import randint

print ('Hello, I am a talking calculator and my name is Jester.')
name = input('What is your name? ')
print ('Hello ' + name.capitalize() + ', nice to meet you!')

while True:
  operation = input(name.capitalize() +
  ', would you like to add or subtract? (enter "a" or "s"; "q" to quit) ')

  if operation.lower() =='q':
    print('Goodbye ' + name.capitalize())
    break

  elif operation.lower() =='s' or operation.lower() =='a':
    print ('OK, ready.')
    input1 = input('Can I have the first number? ')
    try:
        num1=int(input1)
    except ValueError:
       print ("That is not a number!")
       input1 = input('Can I have the first number? ')

    input2 = input('Can I have the second number? ')
    try:
       num2=int(input2)
    except ValueError:
        print("That is not a number!")
        input2 = input('Can I have the second number? ')

    num1=int(input1)
    num2=int(input2)

  num3 = 0
  if (name.lower() == 'olivia'):
    rnd1 = randint(1,3)
    rnd2 = randint(1,5)
    if rnd1 == 3:
      num3 = rnd2

  if operation.lower() == 'a':
    result = num1 + num2 + num3
    print (str(num1) + ' + '  + str(num2) + ' = ' + str(result))

  elif operation.lower() == 's':
    result = (max(num1, num2) - min(num1, num2)) - num3
    print (str(max(num1, num2)) + ' - '
           + str(min(num1, num2))  + ' = ' + str(result))

 

Geophysics Python sprint 2018 – day 2 and beyond, part I

In my last post I wrote about what I did on day one of the Geophysics sprint run by Agile Scientific in Santa Ana two weeks ago.

This post and the next one are about the project Volodymyr and I worked on during day two of the sprint, and continued to work on upon our return to Calgary.

We had read a great notebook by Alessandro Amato del Monte (I recommend browsing his Geophysical notes repo) showing how to reconstruct a velocity log from density with optimized alpha and beta parameters for the Inverse Gardner function, found via scipy.curve_fit.

Inspired by that, we set out with a dual goal:

  • First, we wanted to adapt Alessandro’s optimization idea so that it would work with Bruges‘ Inverse Gardner
  • Second, we wanted to adapt a function from some old work of mine to flag sections of the output velocity log with poor prediction; this would be useful to learn where alpha and beta may need to be tweaked because of changes in the rock lithology or fluid content

I’ll walk you through some of our work. Below are the two functions:

# Alessandro's simple inverse Gardner
def inv_gardner(rho, alpha, beta):
    return (rho/alpha)**(1/beta)

# Bruges' inverse Gardner
def inverse_gardner(rho, alpha=310, beta=0.25, fps=False):
    """
    Computes Gardner's density prediction from P-wave velocity.
    Args:
        rho (ndarray): Density in kg/m^3.
        alpha (float): The factor, 310 for m/s and 230 for fps.
        beta (float): The exponent, usually 0.25.
        fps (bool): Set to true for FPS and the equation will use the typical
            value for alpha. Overrides value for alpha, so if you want to use
            your own alpha, regardless of units, set this to False.
        Returns:
            ndarray: Vp estimate in m/s.
    """
    alpha = 230 if fps else alpha
    exponent = 1 / beta
    factor = 1 / alpha**exponent
    return factor * rho**exponent

They look similarly structured, and take the same arguments. We can test them by passing a single density value and alpha/beta pair.

inv_gardner(2000, 0.39, 0.23)
>>> 1.349846231542594e+16

inverse_gardner(2000, 0.39, 0.23)
>>> 1.3498462315425942e+16

Good. So the next logical step would be to define some model density and velocity data (shamelessly taken from Alessandro’s notebook, except we now use Bruges’ Gardner with S.I. units) and pass the data, and Bruges’ inverse Gardner toscipy.curve_fit to see if it does just work; could it be that simple?

# Make up random velocity and density with Bruges' direct Gardner
vp_test = numpy.linspace(1500, 5500)
rho_test = gardner(vp_test, 310, 0.25)
noise = numpy.random.uniform(0.1, 0.3, vp_test.shape)*1000
rho_test = rho_test + noise

The next block is only slightly different from Alessandro’s notebook. Instead of using all data, we splits both density and velocity into two pairs of arrays: a rho12 and vp2 to optimize foralpha and beta,  a rho1 for calculating “unknown” velocities vp_calc1 further down; the last one, v1, will be used just to show where the real data might have been had we not had to calculate it.

idx = np.arange(len(vp_test))
np.random.seed(3)
spl1 = np.random.randint(0, len(vp_test), 15)
spl2 = np.setxor1d(idx,spl1)
rho1 = rho_test[spl1]
rho2 = rho_test[spl2]
vp1= vp_test[spl1] # this we pretend we do not have 
vp2= vp_test[spl2]

Now, as in Alessandro’s notebook, we pass simple inverse Gardner function to scipy.curve_fit to find optimal alpha and beta parameters, and we printalpha and beta.

popt_synt2, pcov2 = scipy.curve_fit(inv_gardner,rho2, vp2)
print (popt_synt2)
>>> [3.31376056e+02 2.51257203e-01]

Those values seem reasonable, but just to be sure let’s calculate vp_calc1 from rho1 and plot everything to be sure.

vp_calc1 = inv_gardner(rho1, *popt_synt2)

# this is to show the fit line
rho_synt_fit=np.linspace(1, 3000, 50)
vp_synt_fit=inv_gardner(rho_synt_fit, *popt_synt2)

plt.figure(figsize=(10, 10))
plt.plot(rho2,vp2,'or', markersize = 10, label = "fitted points")
plt.plot(rho1,vp1,'ob', markersize = 10, alpha = 0.4, label = "calculated points")
plt.plot(rho1,vp1,'ok', markersize = 10, label = "withheld points")
plt.plot(rho_synt_fit, vp_synt_fit, '-r', lw=2, 
         label='Fit' r'$ V_p=(%.2f / \rho)^{1/%.2f}$' %(popt_synt2[0], 
                                                     popt_synt2[1]))
plt.xlabel('Density rho [kg/m^3]'), plt.xlim(1800, 3000)
plt.ylabel('Velocty Vp [m/s]'), plt.ylim(1000, 6000)
plt.grid()
plt.legend(loc='upper left')
plt.show()

plot

That looks great. Let’s now try the same using Bruges’ Inverse Gardner.

popt_synt2, pcov2 = curve_fit(inverse_gardner, rho2, vp2)
print (popt_synt2)
>>> [1.         0.29525991 1.        ]

That is odd, we do not get the same parameters; additionally, there’s this error message:

../scipy/optimize/minpack.py:794:
OptimizeWarning: Covariance of the parameters could not be estimated 
category=OptimizeWarning)

One possible explanation is that although both inv_gardner and inverse_gardner take three parameters, perhaps scipy.curve_fit does not know to expect it because in the latter alpha and betaare pre-assigned.

The workaround for this was to write a wrapper function to ‘map’ between the call signature of scipy.curve_fit and that of inverse_gardner so that it would be ‘communicated’ to the former explicitly.

def optimize_inverse_gardner(rho, alpha, beta):
    return inverse_gardner(rho, alpha=alpha, beta=beta)

popt_synt2, pcov2 = scipy.curve_fit(optimize_inverse_gardner, 
                                    rho2, vp2) 
print (popt_synt2)
>>> [3.31376060e+02 2.51257202e-01]

Which is the result we wanted.

In the next post we will apply this to some real data and show how to flag areas of poorer results.

Geophysics Python sprint 2018 – day 1

Last weekend I went to California to attend my first ever Python sprint, which was organized at MAZ Café con leche (Santa Ana) by Agile Scientific.

For me this event was a success in many respects. First of all, I wanted to spend some dedicated time working on an open source project, rather than chipping away at it once in a while. Also, participating in a project that was not my own seemed like a good way to challenge myself, by pushing me out of a zone of comfort. Finally, this was an opportunity to engage with other members of the Software Underground Slack team, some of which (for example Jesper Dramsch and Brendon Hall) I’ve known for some time but actually never met in person.

Please read about the Sprint in general on Matt Hall‘s blog post, Café con leche. My post is a short summary of what I did on the first day.

After a tasty breakfast, and at least a good hour of socializing, I sat at a table with three other people interested in working on Bruges (Agile’s Python library for Geophysics) : Jesper Dramsch, Adriana Gordon and Volodymyr Vragov.

As I tweeted that evening, we had a light-hearted start, but then we set to work.Screen Shot 2018-10-21 at 11.11.52 AM

While Adriana and Jesper tackled Bruges’ documentation, which was sorely needed, Volodymyr spent some hours on example notebooks from in-Bruges (a tour of Bruges), which needed fixing, and also on setting up our joint project for day 2 (more in the next post). For my part, I  put together a tutorial notebooks on how to use Bruges’ functions on wireline logs stored in a Pandas DataFrame. According to Matt, this is requested quite often, so it seemed like a good choice.

Let’s say that a number of wells are stored in a DataFrame with both a depth column, and a well name column, in addition to log curves.

The logic for operating on logs individually is this:
Split the wells in the DataFrame using groupby, then
for each well
for each of the logs of interest
do something using one of Bruges’ functions (for example apply a rolling mean)

The code to do that is surprisingly simple, once you’ve figure it out (I myself struggle often, and not little with Pandas at the outset of new projects).

One has to first create a list with the logs of interest, like so:

logs = ['GR', 'RHOB']

then define the length of the window for the rolling operation:

window = 9

finally, the logic above is applied as:

wells_sm=pd.DataFrame()

grouped=wells['well'].unique()

for well in grouped:    
  new_df=pd.DataFrame()   
    for log in logs:
      sm=br.filters.mean(np.array(wells[log][wells['well']==well]),
                         window)
        new_df[str(log) + '_sm']=sm 
    wells_sm=pd.concat([wells_sm, new_df])

where wells_sm is a temporary DataFrame for the filtered logs, which can be added back to the original DataFrame with:

wells_filtered = (np.concatenate((wells.values, 
                  wells_sm.values), axis=1))
cols = list(wells) + list(wells_sm)
wells_filtered_df = pd.DataFrame(wells_filtered, columns=cols)

You can work through the full example in the notebook.

Machine Learning in Python: classification using Support Vector Machines and Scikit-learn

This post is a short extract, with minor modifications,  from my recently released article on the check the CSEG Recorder Machine Learning in Geoscience V: Introduction to Classification with SVMs.

Understanding classification with Support Vector Machines

Support Vector Machines are a popular type of algorithm used in classification, which is the process of  “…identifying to which of a set of categories (sub-populations) a new observation belongs (source: Wikipedia).

In classification, the output variable is a category, for example ‘sand’, or ‘shale’, and the main task of the process is the creation of a dividing boundary between the classes. This boundary will be a line in a bi-dimensional space (only two features used to classify), a surface in a three dimensional space (three features), and a hyper-plane in a higher- dimensional space. In this article I will use interchangeably the terms hyper-plane, boundary, and decision surface.

Defining the boundary may sound like a simple task, especially with two features (a bidimensional scatterplot), but it underlines the important concept of generalization, as pointed out by Jake VanderPlas in his Introduction to Scikit-Learn, because ”… in drawing this separating line, we have learned a model which can generalize to new data: if you were to drop a new point onto the plane which is unlabeled, this algorithm could now predict…” the class it belongs to.

Let’s use a toy classification problem to understand in more detail how in practice SVMs achieve the class separation and find the hyperplane. In the figure below I show an idealized version (with far fewer points) of a Vp/Vs ratio versus P-impedance crossplot from Amato del Monte (2017, Seismic rock physics, tutorial on The Leading Edge).  I’ve added three possible boundaries (dashed lines) separating the two classes.

Each boundary is valid, but are they equally good? Well, for the SVM classifier, they are not because the classifier looks for the boundary with the largest distance from the nearest point in either of the classes.

These points, called Support Vectors, are the most representative of each class, and typically the most difficult to classify. They are also the only ones that matter; if a Support Vector is moved, the boundary will also move. However, if any other point is moved, provided that it is not moved into the margin or across the boundary, it would have no effect on the boundary. This makes SVM classifiers insensitive to outliers (points very far away from the rest of the points in their class and from the boundary) and also less memory intensive than other classifiers (for example, the perceptron). The process of finding this boundary is referred to as “maximizing the margin”, where the margin is a corridor with no data points between the boundary and the support vectors. The larger this buffer, the lower the generalization error; conversely, small margins are almost invariably associated with over-fitting. We will see more on this in a subsequent section.

So, to go back to the question, which of the three proposed boundaries is the best one (and by “best” I am referring to the one that will generalize better to unseen data)? Based on what we’ve learned so far, it would have to be the green boundary. Indeed, the orange one is so close to its support vectors (the two points circled with orange) that it leaves virtually no margin; the purple boundary is slightly better (the support vectors are the points circled with purple) but its margin is still quite small compared to the green boundary.

Maximizing the margin is the goal of the SVM classifier, and it is a constrained optimization problem. I refer interested readers to Hearst (1998, Support Vector Machines, IEEE Intelligent Systems); however, I will quote a definition from that paper (with reference to Figure 1 and accompanying text) as it yields further understanding: “… the optimal hyper-plane is orthogonal to the shortest line connecting the convex hulls of the two classes, and intersects it half way”.

In the inset in the figure, I zoomed closer to the 4 points near the green boundary; I’ve also drawn the convex hulls for the classes, the margin, and the shortest orthogonal line, which is bisected by the hyper-plane. I have selected (by hand) the best hyper-plane already (the green one), but if you can imagine rotating a line to span all possible orientations in the empty space close to the two classes without intersecting either of the hulls and find the one with the largest margin, you’ve just done quadratic optimization in your head. Moreover, you’ve turned a crossplot into a decision surface (quoted from Sebastian Thrun,  Intro to Machine Learning, Udacity 120 course).

If you are interested in learning more about Support Vector Machines in an intuitive way, and then how to try classification in practice (using Python and the Scikit-learn library), read the full article here, check the GitHub repo, then read How good is what? (blog post by Evan Bianco of Agile Scientific) for an example and DIY evaluation of  classifier performance.

How to fix rainbows and other bad colormaps using Python

Yep, colormaps again!

In my 2014 tutorial on The Leading Edge I showed how to Evaluate and compare colormaps (Jupyter notebook here). The article followed an extended series of posts (The rainbow is dead…long live the rainbow!) and then some more articles on rainbow-like colormap artifacts (for example here and here).

Last year, in a post titled Unweaving the rainbow, Matt Hall described our joint attempt to make a Python tool for recovering digital data from scientific images (and seismic sections in particular), without any prior knowledge of the colormap. Please check our GitHub repository for the code and slides, and watch Matt’s talk (very insightful and very entertaining) from the 2017 Calgary Geoconvention below:

One way to use the app is to get an image with unknown, possibly awful colormap, get the data, and re-plot it with a good one.

Matt followed up on colormaps with a more recent post titled No more rainbows! where he relentlessly demonstrates the superiority of perceptual colormaps for subsurface data. Check his wonderful Jupyter notebook.

So it might come as a surprise to some, but this post is a lifesaver for those that really do like rainbow-like colormaps. I discuss a Python method to equalize colormaps so as to render them perceptual.  The method is based in part on ideas from Peter Kovesi’s must-read paper – Good Colour Maps: How to Design Them – and the Matlab function equalisecolormap, and in part on ideas from some old experiments of mine, described here, and a Matlab prototype code (more details in the notebook for this post).

Let’s get started. Below is a time structure map for a horizon in the Penobscot 3D survey (offshore Nova Scotia, licensed CC-BY-SA by dGB Earth Sciences and The Government of Nova Scotia). Can you clearly identify the discontinuities in the southern portion of the map? No?

equalization_before_horizon

OK, let me help you. Below I am showing the map resulting from running a Sobel filter on the horizon. Penobscop_sobel

This is much better, right? But the truth is that the discontinuities are right there in the original data; some, however, are very hard to see because of the colormap used (nipy spectral, one of the many Matplotlib cmaps),  which introduces perceptual artifacts, most notably in the green-to-cyan portion.

In the figure below, in the first panel (from the top) I show a plot of the colormap’s Lightness value (obtained converting a 256-sample nipy spectral colormap from RGB to Lab) for each sample; the line is coloured by the original RGB colour. This erratic Lightness profile highlights the issue with this colormap: the curve gradient changes magnitude several times, indicating a nonuniform perceptual distance between samples.

In the second panel, I show a plot of the cumulative sample-to-sample Lightness contrast differences, again coloured by the original RGB colours in the colormap. This is the best plot to look at because flat spots in the cumulative curve correspond to perceptual flat spots in the map, which is where the discontinuities become hard to see. Notice how the green-to-cyan portion of this curve is virtually horizontal!

That’s it, it is simply a matter of very low, artificially induced perceptual contrast.

Solutions to this problem: the obvious one is to Other NOT use this type of colormaps (you can learn much about which are good perceptually, and which are not, in here); a possible alternative is to fix them. This can be done by re-sampling the cumulative curve so as to give it constant slope (or constant perceptual contrast). The irregularly spaced dots at the bottom (in the same second panel) show the re-sampling locations, which are much farther apart in the perceptually flat areas and much closer in the more dipping areas.

The third panel shows the resulting constant (and regularly sampled) cumulative Lightness contrast differences, and the forth and last the final Lightness profile which is now composed of segments with equal Lightness gradient (in absolute value).

equalization_pictorialHere is the structure map for the Penobscot horizon using the nipy spectum before and after equalization on top of each other, to facilitate comparison. I think this method works rather well, and it will allow continued use of their favourite rainbow and rainbow-like colormaps by hard core aficionados.

 

equalize

If you want the code to try the equalization, get the noteboook on GitHub.

Using Python to calculate northern hemisphere’s surface land coverage

Yesterday during my lunch break I was rather bored; it is unseasonably cold for the fall, even in Calgary, and a bit foggy too.
For something to do I browsed the Earth Science beta on Stack Exchange looking for interesting questions (as an aside, I encourage readers to look at the unanswered questions).
There was one that piqued my curiosity, “In the northern hemisphere only, what percentage of the surface is land?”.
It occurred to me that I could get together an answer using an equal area projection map and a few lines of Python code; and indeed in 15 minutes I whipped-up this workflow:

  • Invert and import this B/W image of equal area projection (Peters) for the Northern hemisphere (land = white pixels).
Peters_projection,_black_north

Source of original image (full globe): Wikimedia Commons

  • Store the image as a Numpy array.
  • Calculate the total number of pixels in the image array (black + white).
  • Calculate the total number of white pixels (1s) by summing the entire array. Black pixels (0s) will not contribute.
  • Calculate percentage of white pixels.

The result I got is 40.44%. Here’s the code:

# import libraries
import numpy as np
from skimage import io
from matplotlib import pyplot as plt

# import image
url = 'https://mycartablog.com/wp-content/uploads/2018/09/peters_projection_black_north.png'
north_equal_area = io.imread(url, as_grey=True)

# check the image
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(1, 1, 1)
ax.set_xticks([])
ax.set_yticks([])
plt.imshow(north_equal_area, cmap = 'gray');

# Do the calculations
r, c = np.shape(north_equal_area)
sz =  r*c
s = np.sum(north_equal_area)
print(np.round(s/sz*100, decimals=2))
>>> 40.44

As suggested in a comment to my initial answer, I run the same Python script for the entire globe and got the expected 30% land coverage:

# import image 
url = 'https://mycartablog.com/wp-content/uploads/2018/09/peters_projection_black_full.png'
equal_area = io.imread(url1, as_grey=True)

# Do the calculations 
r1, c1= np.shape(equal_area)
sz1 =  r1*c1
s1 = np.sum(equal_area)
print(np.round(s1/sz1*100, decimals=2))
>>> 30.08