Geophysics Python sprint 2018 – day 1

Last weekend I went to California to attend my first ever Python sprint, which was organized at MAZ Café con leche (Santa Ana) by Agile Scientific.

For me this event was a success in many respects. First of all, I wanted to spend some dedicated time working on an open source project, rather than chipping away at it once in a while. Also, participating in a project that was not my own seemed like a good way to challenge myself, by pushing me out of a zone of comfort. Finally, this was an opportunity to engage with other members of the Software Underground Slack team, some of which (for example Jesper Dramsch and Brendon Hall) I’ve known for some time but actually never met in person.

Please read about the Sprint in general on Matt Hall‘s blog post, Café con leche. My post is a short summary of what I did on the first day.

After a tasty breakfast, and at least a good hour of socializing, I sat at a table with three other people interested in working on Bruges (Agile’s Python library for Geophysics) : Jesper Dramsch, Adriana Gordon and Volodymyr Vragov.

As I tweeted that evening, we had a light-hearted start, but then we set to work.Screen Shot 2018-10-21 at 11.11.52 AM

While Adriana and Jesper tackled Bruges’ documentation, which was sorely needed, Volodymyr spent some hours on example notebooks from in-Bruges (a tour of Bruges), which needed fixing, and also on setting up our joint project for day 2 (more in the next post). For my part, I  put together a tutorial notebooks on how to use Bruges’ functions on wireline logs stored in a Pandas DataFrame. According to Matt, this is requested quite often, so it seemed like a good choice.

Let’s say that a number of wells are stored in a DataFrame with both a depth column, and a well name column, in addition to log curves.

The logic for operating on logs individually is this:
Split the wells in the DataFrame using groupby, then
for each well
for each of the logs of interest
do something using one of Bruges’ functions (for example apply a rolling mean)

The code to do that is surprisingly simple, once you’ve figure it out (I myself struggle often, and not little with Pandas at the outset of new projects).

One has to first create a list with the logs of interest, like so:

logs = ['GR', 'RHOB']

then define the length of the window for the rolling operation:

window = 9

finally, the logic above is applied as:

wells_sm=pd.DataFrame()

grouped=wells['well'].unique()

for well in grouped:    
  new_df=pd.DataFrame()   
    for log in logs:
      sm=br.filters.mean(np.array(wells[log][wells['well']==well]),
                         window)
        new_df[str(log) + '_sm']=sm 
    wells_sm=pd.concat([wells_sm, new_df])

where wells_sm is a temporary DataFrame for the filtered logs, which can be added back to the original DataFrame with:

wells_filtered = (np.concatenate((wells.values, 
                  wells_sm.values), axis=1))
cols = list(wells) + list(wells_sm)
wells_filtered_df = pd.DataFrame(wells_filtered, columns=cols)

You can work through the full example in the notebook.