Modernizing Python Code in the AI Era: A Different Kind of Learning


A few years ago I wrote about advancing my Python coding skills after working through a couple of chapters from Daniel Chen’s excellent book Pandas for Everyone. In that post I showed how I improved code I’d written in 2018 for the SEG Machine Learning contest. The original code used unique() to get lists of well names, then looped through with list comprehensions to calculate flagged samples and proportions. The 2020 version replaced all that with groupby() and apply(), making it much more compact and Pythonic. For example, where I’d written a list comprehension like [result_a.loc[result_a.zone==z,'flag'].sum() for z in zones_a], I could now write simply result_a.groupby('zone', sort=False).flag.sum().values. The runtime also improved – from 86ms down to 52ms. I remember being quite happy with how much cleaner and more readable the code turned out, and how the learning from those two chapters made an immediate practical difference.
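To illustrate the pattern, here is a minimal sketch with made-up data (not the contest code itself, but the same result_a / zones_a names from that post):

import pandas as pd

# Hypothetical miniature of the contest data: one flag per sample, by zone
result_a = pd.DataFrame({
    'zone': ['A', 'A', 'B', 'B', 'B', 'C'],
    'flag': [1, 0, 1, 1, 0, 0],
})
zones_a = result_a.zone.unique()

# 2018 style: loop over zones with a list comprehension
flagged_2018 = [result_a.loc[result_a.zone == z, 'flag'].sum() for z in zones_a]

# 2020 style: let groupby do the looping
flagged_2020 = result_a.groupby('zone', sort=False).flag.sum().values

print(flagged_2018)  # [1, 2, 0]
print(flagged_2020)  # [1 2 0]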

Recently, I had to modernize the Busting bad colormaps Panel app, which I built back in 2020 to demonstrate colormap distortion artifacts (something that – as you know – I care a lot about). The app had been deliberately frozen in time – I’d pinned specific library versions in the environment file because I knew things would eventually become obsolete, and I wanted it to stay functional for as long as possible without having to constantly fix compatibility issues.

But some of those issues had finally caught up with me, and the app had been down for some time. Last fall, working with GitHub Copilot, I fixed some matplotlib 3.7+ compatibility problems – replacing the deprecated cm.register_cmap() with plt.colormaps.register(), fixing an rgb2gray error, and resolving a ValueError in the plotting functions.
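The registration change looks roughly like this (a sketch, not the app’s actual code; my_cmap stands in for one of the app’s custom colormaps):

import matplotlib
from matplotlib.colors import LinearSegmentedColormap

# my_cmap is a hypothetical stand-in for one of the app's colormaps
my_cmap = LinearSegmentedColormap.from_list('my_cmap', ['black', 'red', 'yellow'])

# Before (deprecated in matplotlib 3.7, later removed):
# matplotlib.cm.register_cmap(name='my_cmap', cmap=my_cmap)

# After: the modern colormap registry (also reachable via pyplot)
matplotlib.colormaps.register(my_cmap, name='my_cmap')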

But the deployment was also broken. In 2021, mybinder.org had switched to JupyterLab as the default interface, changing how apps needed to be deployed. Panel developers had to adapt their code to work with this new setup. The old Panel server URL pattern no longer worked. I tried to figure out the new URL pattern by browsing through the Binder documentation, but I couldn’t make sense of it and failed miserably. It was a short-lived effort that pushed me toward trying something different: full-on coding with Claude Opus 4.5 using Copilot in VSCode.

That’s what allowed me, this month, to complete the modernization process (though honestly, we still haven’t fully sorted out a Binder timeout issue).

A step back to 2020: Building the app from scratch

When I originally built the colormap app, I coded everything myself, experimenting with Panel features I’d never used before, figuring out the supporting functions and visualizations. I also got very good advice from the Panel Discourse channel when I got stuck.

One issue I worked on was getting the colormap collection switching to behave properly. After the first collection switch, the Colormaps dropdown would update correctly, but the Collections dropdown would become non-responsive. With help from experts on the Discourse channel, I figured out how to fix it using Panel’s param.Parameterized class structure.

2026: Working with Claude

The second and hardest part of the modernization was done almost entirely by Claude Opus. Here’s what that looked like in practice:

Binder deployment: Claude independently figured out the new JupyterLab URL pattern (?urlpath=lab/tree/NotebookName.ipynb instead of the old ?urlpath=%2Fpanel%2FNotebookName). Only later, when fact-checking for this post, did we discover the history of Binder’s 2021 switch to JupyterLab and how Panel had to adapt. This helped, though we’re still working through some timeout issues.
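Spelled out, the two deployment URL patterns look roughly like this (user, repo, branch, and NotebookName are placeholders):

# Old (Panel server interface, pre-2021):
https://mybinder.org/v2/gh/<user>/<repo>/<branch>?urlpath=%2Fpanel%2FNotebookName

# New (JupyterLab interface, 2021+):
https://mybinder.org/v2/gh/<user>/<repo>/<branch>?urlpath=lab/tree/NotebookName.ipynb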

Environment upgrade: Claude upgraded to Python 3.12 and Panel 1.8.5, bringing everything up to modern versions. The key packages are now Panel 1.8.5, param 2.3.1, and bokeh 3.8.1.
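As a sketch, the relevant part of an environment file with those versions pinned might look like this (file name and channel are assumptions, the versions are the ones listed above):

name: colormap-app
channels:
  - conda-forge
dependencies:
  - python=3.12
  - panel=1.8.5
  - param=2.3.1
  - bokeh=3.8.1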

Code modernization: Claude spotted and fixed deprecated API calls – the style parameter for Panel widgets became styles.
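In other words (a schematic example, not the app’s code):

import panel as pn

# Panel 0.x (deprecated):
# text = pn.widgets.StaticText(value='Hello', style={'color': 'gray'})

# Panel 1.x: the parameter is now called `styles`
text = pn.widgets.StaticText(value='Hello', styles={'color': 'gray'})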

Collection switching – Claude’s breakthrough: This was Claude’s biggest solo contribution. The collection switching broke during the update, and Claude independently diagnosed that the class-based param.Parameterized approach that had worked in Panel 0.x wasn’t reliable in Panel 1.x. Without me having to guide the solution, Claude figured out how to rewrite it using explicit widgets with param.watch callbacks.

The comparison shows the change:

The new approach uses explicit widget objects with callback functions, which works more reliably in Panel 1.x than the class-based parameterized approach.
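Here is a sketch of the two patterns, assuming a hypothetical COLLECTIONS dict that maps collection names to lists of colormap names (schematic, not the app’s actual code):

import panel as pn
import param

# Hypothetical data: collection name -> list of colormap names
COLLECTIONS = {
    'matplotlib': ['viridis', 'plasma'],
    'cmocean': ['deep', 'thermal'],
}

# Old (Panel 0.x): class-based param.Parameterized approach
class ColormapExplorer(param.Parameterized):
    collection = param.Selector(objects=list(COLLECTIONS))
    colormap = param.Selector(objects=COLLECTIONS['matplotlib'])

    @param.depends('collection', watch=True)
    def _update_colormaps(self):
        cmaps = COLLECTIONS[self.collection]
        self.param['colormap'].objects = cmaps
        self.colormap = cmaps[0]

# New (Panel 1.x): explicit widgets with param.watch callbacks
collection = pn.widgets.Select(name='Collections', options=list(COLLECTIONS))
colormap = pn.widgets.Select(name='Colormaps', options=COLLECTIONS[collection.value])

def _on_collection_change(event):
    # Repopulate the colormap dropdown whenever the collection changes
    colormap.options = COLLECTIONS[event.new]
    colormap.value = COLLECTIONS[event.new][0]

collection.param.watch(_on_collection_change, 'value')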

New features: Claude integrated two new colormap collections I’d been wanting to add for years – Fabio Crameri’s scientific colormaps (cmcrameri) and Kristen Thyng’s cmocean colormaps. That brought the total from 3 to 5 colormap collections.
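Both collections are easy to get at in Python (a minimal sketch with random data; the app’s integration is more involved):

import numpy as np
import matplotlib.pyplot as plt
import cmocean            # Kristen Thyng's oceanography colormaps
from cmcrameri import cm  # Fabio Crameri's scientific colormaps

data = np.random.rand(20, 20)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].imshow(data, cmap=cmocean.cm.deep)  # cmocean's 'deep'
axes[0].set_title("cmocean 'deep'")
axes[1].imshow(data, cmap=cm.batlow)        # Crameri's 'batlow'
axes[1].set_title("Crameri's 'batlow'")
plt.show()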

Here are examples of the app showing each of the new collections:

The app testing cmocean’s ‘deep’ colormap
The app testing Crameri’s ‘batlow’ colormap

Documentation: Claude updated the README with detailed step-by-step Binder instructions, added a troubleshooting section, and created a table documenting all five colormap collections.

I provided the requirements and guidance throughout, but I almost never looked at the implementation details – what I’ve taken to calling the “bits and bobs” of the code. I focused on what I needed to happen; Claude figured out how to make it happen.

What changed (and what didn’t)

I still understand what the code does conceptually. I can read it, review it, check that it’s correct. I know why we needed to move from Parameterized classes to explicit widgets, and I understand the reactive programming model. But I didn’t write those lines myself.

The work happens at a different level now. I bring the domain expertise (what makes a good colormap visualization), the requirements (needs to deploy on Binder, needs these specific colormap collections), and the quality judgment (that widget behavior isn’t quite right). Claude brings the implementation knowledge, awareness of modern best practices, and the ability to quickly adapt code patterns to new frameworks.

This is really different from my 2020 experience. Back then, working through those Pandas patterns taught me techniques I could apply to other projects. Now, I’m learning what becomes possible when you can clearly articulate requirements and delegate the implementation.

The honest trade-off

There’s a trade-off here, and I’m trying to be honest about it. In 2020, working through the Panel widget patterns taught me things that stuck. In 2026, I got working, modernized code in a fraction of the time, but with less hands-on knowledge of Panel 1.x internals.

For this particular project, that trade-off made sense. I needed a working app deployed and accessible, not deep expertise in Panel migration patterns. But I’m conscious that I’m optimizing for different outcomes now: shipping features fast versus building deep technical understanding through hands-on work.

What this means going forward

After years of writing code line by line, this new way of working feels both efficient and different. I got more done in a couple of hours than I might have accomplished in several weeks working solo. The app is modernized, deployed, working better than ever, and even has new features I’d been wanting to add for years.

This has been a game-changer for how I work. I still do the work that matters most to me: seeing the tool gap, coming up with the vision, iteratively prototyping to flesh out what I actually need. That’s substantial work, and it’s mine. But after that initial phase? A lot of the implementation will be done with Claude. The app is done and it’s great, and I know this is the path forward for me.

References

Chen, D.Y. (2018). Pandas for Everyone: Python Data Analysis. Addison-Wesley Professional.

Crameri, F. (2018). Geodynamic diagnostics, scientific visualisation and StagLab 3.0. Geoscientific Model Development, 11, 2541-2562. https://www.fabiocrameri.ch/colourmaps/

Niccoli, M. (2020). Keep advancing your Python coding skills. MyCarta Blog. https://mycartablog.com/2020/10/22/keep-advancing-your-python-coding-skills/

Thyng, K.M., Greene, C.A., Hetland, R.D., Zimmerle, H.M., and DiMarco, S.F. (2016). True colors of oceanography: Guidelines for effective and accurate colormap selection. Oceanography, 29(3), 9-13. https://matplotlib.org/cmocean/


Try the app yourself: The modernized colormap distortion app is available on GitHub and you can run it in Binder without installing anything.

Geophysics Python sprint 2018 – day 2 and beyond, part II

In the last post I wrote about what Volodymyr and I worked on during a good portion of day two of the sprint in October, and continued to work on upon our return to Calgary.

In addition to that I also continued to work on a notebook example, started on day one, demonstrating how to upscale sonic and density logs, more than one log at a time, using Bruges’ backus and Pandas’ groupby. This will be the focus of a future post.

The final thing I did was to write and test an error_flag function for Bruges. The function calculates the difference between a predicted and a real curve; it flags errors in prediction where the difference between the curves exceeds a user-defined distance (in standard deviation units) from the mean difference. Another option is to check whether the curves have opposite slopes (for example, one increasing while the other decreases within a specific interval). The result is a binary error log that can then be used to generate QC plots, to evaluate the performance of the prediction process in a (it is my hope) more insightful way.
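The core logic is simple enough to sketch (this is a schematic of the idea described above, not the actual Bruges code):

import numpy as np

def error_flag(pred, actual, dist=1.0, check_slope=False):
    """Flag samples where the prediction error is anomalous (schematic)."""
    # Difference between the real and the predicted curve
    diff = actual - pred
    # Flag samples whose difference deviates more than `dist` standard
    # deviations from the mean difference
    flag = np.abs(diff - diff.mean()) > dist * diff.std()
    if check_slope:
        # Optionally, also flag samples where the two curves have opposite slopes
        flag = flag | (np.sign(np.gradient(pred)) != np.sign(np.gradient(actual)))
    return flag.astype(int)  # binary error log, ready for QC plots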

The inspiration for this stems from a discussion over coffee I had 5 or 6 years ago with Glenn Larson, a geophysicist at Devon Energy, about the limitations of (and alternatives to) using a single global score when evaluating the result of seismic inversion against wireline well logs (the ground truth). I’d been holding that in the back of my mind for years, then finally got to it last fall.

Summary statistics can also be calculated by stratigraphic unit, as demonstrated in the accompanying Jupyter Notebook.
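For example (a sketch, assuming a hypothetical DataFrame df with a ‘unit’ column for the stratigraphic unit and the binary ‘flag’ log):

# Count and proportion of flagged samples per stratigraphic unit
df.groupby('unit')['flag'].agg(['sum', 'mean'])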

Geophysics Python sprint 2018 – day 2 and beyond, part I

In my last post I wrote about what I did on day one of the Geophysics sprint run by Agile Scientific in Santa Ana two weeks ago.

This post and the next one are about the project Volodymyr and I worked on during day two of the sprint, and continued to work on upon our return to Calgary.

We had read a great notebook by Alessandro Amato del Monte (I recommend browsing his Geophysical notes repo) showing how to reconstruct a velocity log from density with optimized alpha and beta parameters for the inverse Gardner function, found via scipy.optimize.curve_fit.

Inspired by that, we set out with a dual goal:

  • First, we wanted to adapt Alessandro’s optimization idea so that it would work with Bruges’ inverse Gardner
  • Second, we wanted to adapt a function from some old work of mine to flag sections of the output velocity log with poor prediction; this would be useful for learning where alpha and beta may need to be tweaked because of changes in rock lithology or fluid content

I’ll walk you through some of our work. Below are the two functions:

# Alessandro's simple inverse Gardner
def inv_gardner(rho, alpha, beta):
    return (rho/alpha)**(1/beta)

# Bruges' inverse Gardner
def inverse_gardner(rho, alpha=310, beta=0.25, fps=False):
    """
    Computes Gardner's density prediction from P-wave velocity.
    Args:
        rho (ndarray): Density in kg/m^3.
        alpha (float): The factor, 310 for m/s and 230 for fps.
        beta (float): The exponent, usually 0.25.
        fps (bool): Set to true for FPS and the equation will use the typical
            value for alpha. Overrides value for alpha, so if you want to use
            your own alpha, regardless of units, set this to False.
        Returns:
            ndarray: Vp estimate in m/s.
    """
    alpha = 230 if fps else alpha
    exponent = 1 / beta
    factor = 1 / alpha**exponent
    return factor * rho**exponent

They look similarly structured, and take the same arguments. We can test them by passing a single density value and alpha/beta pair.

inv_gardner(2000, 0.39, 0.23)
>>> 1.349846231542594e+16

inverse_gardner(2000, 0.39, 0.23)
>>> 1.3498462315425942e+16

Good. So the next logical step would be to define some model density and velocity data (shamelessly taken from Alessandro’s notebook, except we now use Bruges’ Gardner with S.I. units) and pass the data and Bruges’ inverse Gardner to scipy’s curve_fit to see if it just works; could it be that simple?

# Make up random velocity and density with Bruges' direct Gardner
import numpy as np
from bruges.petrophysics import gardner  # module path may vary across Bruges versions

vp_test = np.linspace(1500, 5500)
rho_test = gardner(vp_test, 310, 0.25)
noise = np.random.uniform(0.1, 0.3, vp_test.shape) * 1000
rho_test = rho_test + noise

The next block is only slightly different from Alessandro’s notebook. Instead of using all the data, we split both density and velocity into two pairs of arrays: rho2 and vp2 to optimize for alpha and beta; rho1 for calculating the “unknown” velocities vp_calc1 further down; and vp1, which will be used just to show where the real data would have been had we not had to calculate it.

idx = np.arange(len(vp_test))
np.random.seed(3)
spl1 = np.random.randint(0, len(vp_test), 15)
spl2 = np.setxor1d(idx, spl1)
rho1 = rho_test[spl1]
rho2 = rho_test[spl2]
vp1 = vp_test[spl1]  # this we pretend we do not have
vp2 = vp_test[spl2]

Now, as in Alessandro’s notebook, we pass the simple inverse Gardner function to scipy’s curve_fit to find the optimal alpha and beta parameters, and print them.

from scipy.optimize import curve_fit

popt_synt2, pcov2 = curve_fit(inv_gardner, rho2, vp2)
print(popt_synt2)
>>> [3.31376056e+02 2.51257203e-01]

Those values seem reasonable; to be sure, let’s calculate vp_calc1 from rho1 and plot everything.

import matplotlib.pyplot as plt

vp_calc1 = inv_gardner(rho1, *popt_synt2)

# this is to show the fit line
rho_synt_fit = np.linspace(1, 3000, 50)
vp_synt_fit = inv_gardner(rho_synt_fit, *popt_synt2)

plt.figure(figsize=(10, 10))
plt.plot(rho2, vp2, 'or', markersize=10, label="fitted points")
plt.plot(rho1, vp_calc1, 'ob', markersize=10, alpha=0.4, label="calculated points")
plt.plot(rho1, vp1, 'ok', markersize=10, label="withheld points")
plt.plot(rho_synt_fit, vp_synt_fit, '-r', lw=2,
         label='Fit' r'$ V_p=(%.2f / \rho)^{1/%.2f}$' % (popt_synt2[0],
                                                         popt_synt2[1]))
plt.xlabel('Density rho [kg/m^3]'), plt.xlim(1800, 3000)
plt.ylabel('Velocity Vp [m/s]'), plt.ylim(1000, 6000)
plt.grid()
plt.legend(loc='upper left')
plt.show()

That looks great. Let’s now try the same using Bruges’ Inverse Gardner.

popt_synt2, pcov2 = curve_fit(inverse_gardner, rho2, vp2)
print(popt_synt2)
>>> [1.         0.29525991 1.        ]

That is odd: we do not get the same parameters. Additionally, there’s this error message:

../scipy/optimize/minpack.py:794:
OptimizeWarning: Covariance of the parameters could not be estimated 
category=OptimizeWarning)

One possible explanation is that scipy’s curve_fit inspects a function’s call signature to decide which parameters to fit. For inv_gardner that means exactly alpha and beta, but for inverse_gardner it also picks up the boolean fps flag – notice the three fitted values in the output above. And because any non-zero fps overrides alpha inside the function, alpha has no effect on the output, which would also explain why the covariance could not be estimated.

The workaround for this was to write a wrapper function to ‘map’ between the call signature that scipy’s curve_fit expects and that of inverse_gardner, so that alpha and beta are communicated to the former explicitly.

def optimize_inverse_gardner(rho, alpha, beta):
    return inverse_gardner(rho, alpha=alpha, beta=beta)

popt_synt2, pcov2 = curve_fit(optimize_inverse_gardner,
                              rho2, vp2)
print(popt_synt2)
>>> [3.31376060e+02 2.51257202e-01]

Which is the result we wanted.

In the next post we will apply this to some real data and show how to flag areas of poorer results.

Geophysics Python sprint 2018 – day 1

Last weekend I went to California to attend my first ever Python sprint, which was organized at MAZ Café con leche (Santa Ana) by Agile Scientific.

For me this event was a success in many respects. First of all, I wanted to spend some dedicated time working on an open source project, rather than chipping away at it once in a while. Also, participating in a project that was not my own seemed like a good way to challenge myself, by pushing me out of my comfort zone. Finally, this was an opportunity to engage with other members of the Software Underground Slack team, some of whom (for example Jesper Dramsch and Brendon Hall) I’d known for some time but had never actually met in person.

Please read about the Sprint in general on Matt Hall‘s blog post, Café con leche. My post is a short summary of what I did on the first day.

After a tasty breakfast, and at least a good hour of socializing, I sat at a table with three other people interested in working on Bruges (Agile’s Python library for geophysics): Jesper Dramsch, Adriana Gordon, and Volodymyr Vragov.

As I tweeted that evening, we had a light-hearted start, but then we set to work.

While Adriana and Jesper tackled Bruges’ documentation, which was sorely needed, Volodymyr spent some hours on example notebooks from in-Bruges (a tour of Bruges), which needed fixing, and also on setting up our joint project for day 2 (more in the next post). For my part, I put together a tutorial notebook on how to use Bruges’ functions on wireline logs stored in a Pandas DataFrame. According to Matt, this is requested quite often, so it seemed like a good choice.

Let’s say that a number of wells are stored in a DataFrame with both a depth column, and a well name column, in addition to log curves.

The logic for operating on logs individually is this: split the wells in the DataFrame using groupby, then
  • for each well
    • for each of the logs of interest
      • do something using one of Bruges’ functions (for example, apply a rolling mean)

The code to do that is surprisingly simple, once you’ve figured it out (I myself often struggle, and not a little, with Pandas at the outset of new projects).

One has to first create a list with the logs of interest, like so:

logs = ['GR', 'RHOB']

then define the length of the window for the rolling operation:

window = 9

finally, the logic above is applied as:

import numpy as np
import pandas as pd
import bruges as br

# wells is the DataFrame of well logs described above
wells_sm = pd.DataFrame()

grouped = wells['well'].unique()

for well in grouped:
    new_df = pd.DataFrame()
    for log in logs:
        sm = br.filters.mean(np.array(wells[log][wells['well'] == well]),
                             window)
        new_df[str(log) + '_sm'] = sm
    wells_sm = pd.concat([wells_sm, new_df])

where wells_sm is a temporary DataFrame for the filtered logs, which can be added back to the original DataFrame with:

wells_filtered = (np.concatenate((wells.values, 
                  wells_sm.values), axis=1))
cols = list(wells) + list(wells_sm)
wells_filtered_df = pd.DataFrame(wells_filtered, columns=cols)
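As an aside, the same logic can be written more compactly with groupby and transform – a sketch, using the same wells, logs, and window as above, and assuming br.filters.mean returns an array of the same length as its input (as the loop above relies on):

# Smooth every log of interest, well by well, in one shot
smoothed = (wells.groupby('well', sort=False)[logs]
                 .transform(lambda s: br.filters.mean(np.array(s), window))
                 .add_suffix('_sm'))
wells_filtered_df = pd.concat([wells, smoothed], axis=1)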

You can work through the full example in the notebook.