Why Wind Power Scales as v³: An Intuition Built from First Principles


and a regulator’s motivation for caring

The Three Methods: A Regulator’s Ladder for Evaluating Energy Claims

Suppose a developer submits a proposal for the Middle Bank area on the Scotian Shelf: 926 turbines, each rated at 15 MW, at 4.2D spacing (where D is the rotor diameter—240 m for this turbine, so 4.2D ≈ 1,000 m between towers), claiming annual energy production (AEP) of 60 TWh. Is that plausible?

You have three increasingly sophisticated ways to check.

METHOD 1 Nameplate (30 seconds, back of envelope)

The simplest possible estimate:

AEP = N × Prated × 8760 hours × CF

Where N is the number of turbines, Prated is each turbine’s maximum output (15 MW here), 8760 is the hours in a year, and CF is the capacity factor—the fraction of rated output the turbine actually produces over a year. For offshore wind, CF typically falls between 0.40 and 0.55.

Check the units: N is dimensionless (a count), Prated is in MW, 8760 is in hours, and CF is dimensionless (a fraction). So AEP comes out in MW × hours = MWh—or equivalently, dividing by 10⁶, TWh. It’s just (number of turbines) × (power per turbine) × (hours per year) × (fraction of time at full output).

For our developer’s claim:

Assumption                      AEP
CF = 0.40 (conservative)        926 × 15 MW × 8760 h × 0.40 = 48.7 TWh
CF = 0.50 (typical offshore)    926 × 15 MW × 8760 h × 0.50 = 60.8 TWh
CF = 0.55 (optimistic)          926 × 15 MW × 8760 h × 0.55 = 66.9 TWh
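A minimal sketch of this check in Python (the figures are the proposal’s; the capacity factors are the assumed industry range above, not site-derived values):

# Method 1: nameplate AEP check with assumed capacity factors
N = 926          # turbines
P_RATED = 15.0   # MW per turbine
HOURS = 8760     # hours per year

for cf in (0.40, 0.50, 0.55):
    aep_twh = N * P_RATED * HOURS * cf / 1e6   # MWh -> TWh
    print(f"CF = {cf:.2f}: AEP ≈ {aep_twh:.1f} TWh")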

The developer’s 60 TWh falls in range—right at a typical offshore CF. Not obviously wrong. But this tells you nothing about whether CF = 0.40 or 0.55 is appropriate for this site. The capacity factor is doing all the work, and you borrowed it from industry averages rather than deriving it from the actual wind resource.

What the nameplate method hides

It treats CF as an input. But CF is an output—it’s determined by the wind speed distribution, the turbine’s power curve, wake interactions, and availability. It’s the answer, not the question.

METHOD 2 Ginsberg Swept Area (5 minutes, needs mean wind speed)

If you know the site’s mean wind speed, you can estimate power from first principles:

Pavailable = ½ × ρ × A × vavg³

Where ρ is air density (~1.225 kg/m³ at sea level), A is the rotor’s swept area (π × D² / 4), and vavg is the mean wind speed at hub height.

The derivation. Consider a cylinder of air passing through the rotor in time t. Its length is v × t, so its volume is A × v × t, and its mass is ρ × A × v × t. The kinetic energy of that air is:

KE = ½ × m × v² = ½ × (ρ × A × v × t) × v² = ½ × ρ × A × v³ × t

Divide both sides by t to get power (energy per unit time):

P = KE / t = ½ × ρ × A × v³

That’s where the v³ comes from: v once from the mass flow rate (how fast air arrives), v² from the kinetic energy per unit mass (how much energy it carries). Ginsberg (2019) walks through this same derivation; the full physical reasoning for why this matters is developed in The Starting Point below.

But there’s a catch. Wind speed varies, and because power scales as v³, the average of the cubes is not the cube of the average. A site with vavg = 9 m/s but gusty conditions produces more energy than a site with a steady 9 m/s, because the high-wind moments contribute disproportionately (v³ is convex).

Ginsberg handles this with the Energy Pattern Factor (EPF)—a multiplier that corrects the mean-cubed estimate for the actual shape of the wind speed distribution:

Mean Power Density = ½ × ρ × EPF × vavg³

For Rayleigh-distributed winds (shape factor k = 2), EPF ≈ 1.91. This corrects for the distribution without requiring the full wind record. Then to get AEP:

AEP = Mean Power Density × A × 8760 × ηturbine × ηavailability

Where ηturbine accounts for the turbine’s conversion efficiency (Cp, the power coefficient—capped at 59.3% by the Betz limit, which is the theoretical maximum any turbine can extract from the wind) and ηavailability for downtime.

This is more physical—you’re deriving CF from the wind resource rather than assuming it. For the Scotian Shelf, with mean winter wind of 9.3 m/s and summer 7.1 m/s at hub height, the swept area method produces a site-specific estimate rather than borrowing a generic CF from global averages.
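As a sketch, here is that arithmetic in Python. The conversion efficiency and availability below are illustrative assumptions (not values from Ma et al.), and the annual mean wind speed is taken as the midpoint of the winter and summer means quoted above:

import numpy as np

# Method 2: swept-area AEP for a single turbine, Rayleigh winds + EPF
rho = 1.225                  # kg/m^3, sea-level air density
D = 240.0                    # m, rotor diameter
A = np.pi * D**2 / 4         # m^2, swept area
v_avg = 8.2                  # m/s, assumed annual mean (midpoint of 9.3 and 7.1)
EPF = 1.91                   # energy pattern factor, Rayleigh (k = 2)
eta_turbine = 0.45           # assumed Cp, below the 0.593 Betz limit
eta_avail = 0.95             # assumed availability

mpd = 0.5 * rho * EPF * v_avg**3                           # W/m^2
aep_gwh = mpd * A * 8760 * eta_turbine * eta_avail / 1e9
print(f"AEP ≈ {aep_gwh:.0f} GWh per turbine")

# Caveat: this uses the raw v^3 resource with no rated-power cap, so it can
# imply a capacity factor above what a real 15 MW machine delivers; a cap
# (applied in the binned AEP calculation later in this document) would pull
# the estimate down toward the nameplate CF range.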

What the swept area method hides

It treats each turbine as if it sees the undisturbed wind. In reality, downstream turbines sit in the wakes of upstream ones. A 926-turbine farm at 4.2D spacing will have interior turbines seeing 70–80% of the freestream velocity. Since power scales as v³, that 20–30% velocity deficit translates to 50–65% power loss for those turbines.

METHOD 3 Wake Modeling (hours to days, needs wind distribution + layout)

This is PyWake territory—PyWake is an open-source wind farm simulation tool (developed by DTU Wind Energy) that models how upstream turbines reduce wind speed for downstream ones. You specify the turbine layout, the wind climatology (direction + speed distribution), and a wake deficit model. The simulation propagates wakes through the farm, computing the actual wind speed each turbine sees, and integrates over all wind conditions to produce AEP.
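In outline, the workflow looks like this: a minimal sketch following PyWake’s documented quick-start pattern, using the library’s bundled Horns Rev 1 demo site and V80 turbine rather than the Scotian Shelf case.

from py_wake.examples.data.hornsrev1 import Hornsrev1Site, V80, wt_x, wt_y
from py_wake import BastankhahGaussian

site = Hornsrev1Site()                         # wind speed/direction climatology
turbines = V80()                               # power and thrust curves
wf_model = BastankhahGaussian(site, turbines)  # Gaussian wake deficit model

sim_res = wf_model(wt_x, wt_y)                 # propagate wakes through the layout
print(f"AEP ≈ {float(sim_res.aep().sum()):.0f} GWh")  # integrated over speeds and directions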

Here’s where v³ bites hardest. Consider a turbine sitting 5D downstream of another in a 9.3 m/s winter wind. The Bastankhah–Porté-Agel Gaussian (bell-curve shaped) deficit model—used in Ma et al. (2025)—predicts the centerline velocity deficit from the wake expansion rate (k* = 0.04, typical for offshore low-turbulence conditions) and the upstream turbine’s thrust coefficient (CT—a measure of how hard the rotor pushes back against the wind; CT ≈ 0.78 for the IEA 15 MW reference turbine—the benchmark design used in the Ma et al. study—at 9.3 m/s, which is below rated speed). At 5D downstream, the model gives a 28% velocity deficit.

Your first instinct might be: 28% less wind, 28% less power. But the cubic says otherwise:

  • Freestream turbine sees 9.3 m/s → P ∝ (9.3)³ = 804
  • Wake-affected turbine sees 6.7 m/s → P ∝ (6.7)³ = 301

That’s 63% less power, not 28%. The cubic more than doubles the impact of the velocity deficit. And in a dense 926-turbine farm, most interior turbines are wake-affected.
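Here is that centerline calculation as a short script, implementing the Bastankhah–Porté-Agel (2014) centerline deficit with the parameter values quoted above:

import numpy as np

def bpa_centerline_deficit(ct, k_star, x_over_d):
    """Fractional centerline velocity deficit x_over_d rotor diameters
    downstream (Bastankhah & Porte-Agel, 2014, Gaussian wake model)."""
    beta = 0.5 * (1 + np.sqrt(1 - ct)) / np.sqrt(1 - ct)
    eps = 0.2 * np.sqrt(beta)                # initial wake width
    sigma_over_d = k_star * x_over_d + eps   # wake width grows linearly with distance
    return 1 - np.sqrt(1 - ct / (8 * sigma_over_d**2))

v_free = 9.3                                 # m/s, freestream
deficit = bpa_centerline_deficit(ct=0.78, k_star=0.04, x_over_d=5)
v_wake = v_free * (1 - deficit)
power_ratio = (v_wake / v_free)**3           # the cubic amplifies the deficit
print(f"deficit: {deficit:.0%}, power loss: {1 - power_ratio:.0%}")  # 28%, 63%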

Wake losses for the Scotian Shelf scenarios range from 19% (sparse layout, winter) to 46% (dense layout, summer), according to the Ma et al. (2025) simulations. For Middle Bank specifically, the losses are 22% in winter and 41% in summer. At the high end, nearly half the energy you’d expect from nameplate calculations never materializes—a correction too large for any regulator to wave through on trust.

This is what PyWake computes: the v³-amplified impact of every upstream turbine on every downstream one, integrated over all wind directions and speeds across the full year.

The Ladder

MethodInputWhat it capturesWhat it misses
NameplateN, Prated, assumed CFQuick plausibility checkEverything about the site
Ginsbergvavg, A, EPFWind resource physics, v³Wake interactions, layout effects
PyWakev(t,θ) (speed × direction), layout, turbine curvesWake losses, spacing trade-offs(This is the target capability)

Each method reveals a limitation that motivates the next. And the single thread connecting all three is why power scales as v³—because understanding the cubic relationship tells you why the nameplate method hides so much, why the EPF correction exists, and why wake-induced velocity deficits are so devastating.

That’s what this document builds.


The Starting Point

The power available in wind passing through a turbine’s swept area is:

P = ½ × ρ × A × v³

Where:

  • ρ = air density (kg/m³)

  • A = swept area (m²)

  • v = wind velocity (m/s)

The formula is easy to derive—v appears in the mass flow rate (ρ × A × v) and v² appears in kinetic energy (½ × m × v²), so power scales as v³. The math is straightforward.

What’s less obvious is why we work with power at all. Why not go directly from energy density (½ × ρ × v²) to annual energy production? Why the detour through instantaneous power?

This document develops an intuition for that question.


The Cylinder Mental Model

Imagine standing at a wind turbine and watching air flow through the rotor over an entire year. You could visualize this as an impossibly long cylinder:

  • Cross-section = the swept area (π × D² / 4)

  • Length = the total distance air has traveled past the rotor over the year

    If the wind blew at a constant 10 m/s for a year, your cylinder would be about 315 million meters long (10 m/s × 31.5 million seconds).

To find the total energy, you might try:

Energy = (energy density) × (volume)

The energy density of moving air is ½ × ρ × v² (joules per cubic meter). The volume is A × L, where L is the cylinder length. Multiply and done?

Not quite. Here’s where it gets awkward.


The Awkwardness: A Cylinder That Won’t Cooperate

The wind doesn’t blow at a constant speed. Your cylinder is made of “slices”—some added during high-wind moments, some during calm. Each slice has its own energy density depending on what v was when that slice passed through.

You might still try to salvage the simple approach:

Energy = (average energy density) × (total volume)

But you can’t cleanly separate these terms.

When v is high:

  • The cylinder extends faster (more meters of air arriving per second)

  • Those slices are energy-rich (½ × ρ × v² is large)


    When v is low:
  • The cylinder extends slowly

  • Those slices are energy-poor

    The high-v slices are both thicker (more length added per unit time) and richer (more joules per cubic meter). The low-v slices are both thinner and poorer.

This coupling wrecks any attempt at simple averaging. If you average energy density across time, you underweight the thick, juicy slices. If you try to average across volume, you need v for both terms—energy density (½ × ρ × v²) AND slice thickness (v × dt). Both depend on v, and v is different for every slice. You’re back to needing the full wind record anyway.

Total energy ≟ ½ρ · v̄² · A · v̄ · t = ½ρA · v̄³ · t    ← WRONG

Why wrong? Because the cube amplifies differences. A gust at 12 m/s contributes (12)³ = 1,728 to the energy integral, while a lull at 6 m/s contributes only (6)³ = 216. The gust is worth 8× the lull, not 2×. Averaging the wind speed before cubing it buries this asymmetry.

Energy = Σ  ½ρA · v(t)³ · Δt    ← sum over each hour

The regulator’s takeaway

When a developer reports “mean wind speed 9.3 m/s,” that single number is not enough to evaluate their AEP claim. Two sites with identical means but different variability will produce different amounts of energy—and the gustier site wins, thanks to the v³ amplification.
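A toy illustration with synthetic numbers (not site data): two wind records with the same 9 m/s mean, one steady and one alternating between lulls and gusts.

import numpy as np

steady = np.full(8760, 9.0)            # 9 m/s every hour of the year
gusty = np.tile([4.0, 14.0], 4380)     # alternating 4 and 14 m/s; mean is still 9

print(steady.mean(), gusty.mean())     # both 9.0 m/s
print(np.mean(steady**3))              # 729
print(np.mean(gusty**3))               # 1404, ~1.9x the energy of the steady site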

A Geophysics Parallel: Degrees of Entanglement

To see why this is so stubborn, consider a spectrum of cases from reservoir geophysics:

Core data (you can measure each property independently):

In a layered reservoir, each bed has a permeability (k) and a thickness (h). From core samples, you measure them separately—ruler for thickness, core plug for permeability. A thick layer can have low permeability; a thin layer can have high permeability. They’re independent. Averaging works (arithmetic, harmonic, or geometric depending on flow geometry).

Seismic inversion (the properties are independent, but the measurement tangles them):

Now try to estimate k and h from seismic reflection data. You don’t see them separately anymore. The seismic response convolves them—a thick low-k layer might look like a thin high-k layer. They’re physically independent, but entangled in the measurement. You can try to untangle them, but it’s hard.

Wind (the two properties are the same variable):

Energy density is ½ × ρ × v². Slice thickness is v × dt. Both ARE v. There’s no underlying separation to recover. It’s not that the measurement convolves them—they’re the same variable wearing two hats.

Case                Property vs. Weight                             Separable?
Core data           k and h independent, measured separately        Yes
Seismic inversion   k and h independent, convolved in measurement   Hard
Wind                ½ρv² and v×dt are both v                        Impossible—nothing to untangle

Wind sits at the extreme end: the entanglement isn’t observational, it’s definitional.


The Root Cause: The Carrier IS the Cargo

Most energy delivery systems have a carrier and a cargo that are independent.

The Truck and Coal Analogy

Imagine you’re receiving coal deliveries by truck. Two things determine how much energy arrives per hour:

  1. How fast the trucks arrive (delivery rate)

  2. How much energy is in each truckload (energy content)

    These are independent. You could:
  • Speed up the trucks without changing the coal quality

  • Switch to higher-grade coal without changing the delivery schedule

  • Double one while halving the other

The truck’s velocity has nothing to do with the coal’s BTU content. Two separate knobs, two separate decisions.

Concrete examples of this independence:

  • Slow trucks, high-grade coal: One delivery per week, but it’s anthracite. Few arrivals, lots of BTUs per ton.

  • Fast trucks, low-grade coal: Ten deliveries per day, but it’s lignite. Frequent arrivals, few BTUs per ton.

Both are perfectly coherent. You could even tune them to deliver the same total energy per month. The truck schedule and the coal grade are set by different people making different decisions—the dispatcher and the mine, say.

This independence is typical of energy delivery systems:

System         Carrier                        Cargo
Coal truck     Truck (speed adjustable)       Coal (energy content independent of truck speed)
Power line     Wire (current adjustable)      Electrons (voltage adjustable independently)
Gas pipeline   Pipe flow (rate adjustable)    Gas (BTU content independent of flow rate)

You can speed up delivery without changing what’s being delivered. Two knobs.

Wind Breaks This Independence

Wind is different. There are no trucks. The air’s motion delivers it to you, and the air’s motion is the energy. There is no “air truck” bringing “energy cargo.” The velocity that transports air to your rotor is the same velocity that determines how much kinetic energy that air contains.

Think about what would need to be true for wind to behave like coal trucks: you’d need slow-moving air that somehow contained lots of kinetic energy, or fast-moving air with little energy. That’s a contradiction. The air’s kinetic energy is ½ × m × v², where v is the same velocity that’s bringing it to you.

The impossible wind analogues would be:

  • Slow breeze carrying “anthracite air” (high energy density)

  • Fast wind carrying “lignite air” (low energy density)

These don’t exist. There’s no mine selecting the air’s energy grade independently of the velocity that delivers it. The energy grade is v². The dispatcher and the mine are the same person, turning the same knob.

Coal trucks have two degrees of freedom. Wind has one.

One phenomenon, two consequences. One knob.

A Bridge Analogy: The Bullet Conveyor Belt

Imagine a conveyor belt covered with bullets, all pointing at a target. The bullets are arranged in rows across the belt. When they reach the end, they fly off and hit the target.

You have two ways to increase the damage:

Add more bullets per row (wider rows):

Each meter of belt carries more bullets. More bullets hit the target per second. But each bullet hits just as hard as before. Double the bullets per row, double the damage. Simple.

Speed up the belt:

Here’s where it gets strange. Speeding up the belt does two things at once:

  • Bullets arrive faster (more hits per second)

  • Each bullet is moving faster when it flies off, so it hits harder (damage per bullet goes up)

You can’t get one without the other. There’s no way to make bullets arrive faster while keeping them gentle, or make them hit harder while keeping arrivals slow. One dial, two consequences.

That’s wind.

Air density and rotor size are like bullets per row—you can adjust them separately. But wind speed is like belt speed. When v goes up:

  • More air arrives per second (delivery rate, proportional to v)

  • Each parcel of air carries more punch (energy density, proportional to v²)

Multiply them together: v × v² = v³.

The belt speed controls both how often bullets arrive and how hard they hit. Wind speed controls both how fast air arrives and how much energy it carries. One knob. Two consequences. That’s where the cubic comes from.

This is why v appears twice in the power equation:

  • Delivery rate (volume flow): A × v

  • Energy content (energy density): ½ × ρ × v²

Multiply them: ½ × ρ × A × v³

The v² and the v aren’t two separate variables that happen to move together. They’re two aspects of a single physical reality — one velocity, showing up twice in the equation for two different physical reasons. You cannot crank up the delivery rate while holding energy content fixed. The air delivers itself.


The Firehose Intuition

You’re standing in front of a firehose. Someone doubles the water velocity.

You don’t get hit by faster water AND more water as if those were two separate decisions. There’s one dial: velocity. Turning it up necessarily does both:

  • Each drop hits harder (v²)—because it’s moving faster

  • More drops arrive per second (v)—because they’re moving faster

Same cause, two consequences.

Total punishment: 4× per drop × 2× the drops = 8×

That’s the v³. Not two correlated effects, but one effect with two faces.


Why Integration Solves the Problem

Given the coupling, how do we actually calculate annual energy production?

Integration refuses to average.

Instead of trying to summarize the whole year with bulk quantities, integration says:

“Fine. I’ll go moment by moment. At this instant, v = 7 m/s. What’s the power? Good. Now the next instant, v = 7.2 m/s. What’s the power? Good. Next…”

At each infinitesimal moment, v is just one number. The coupling is trivially resolved—the same v goes into both the “how fast is the cylinder growing” calculation and the “how rich is this slice” calculation.

Power right now = ½ × ρ × A × v³ right now

No averaging. No untangling. Just one v, doing its two jobs, at this instant.

Then add up all the instants:

Energy = integral of P dt = integral of ½ × ρ × A × v³ dt

The Insight

Integration doesn’t untangle the coupling. It shrinks to a scale where the coupling doesn’t matter—because at an instant, there’s nothing to correlate. There’s just one v, with its two consequences, right now.

The sum of countless “right nows” is your answer.


When Would Averaging Work? A Thought Experiment

To sharpen the intuition, ask: what would need to be true for simple averaging to work?

The Bubble Cylinder

Return to the cylinder mental model, but change one thing. Imagine the cylinder always advances at constant speed—say, 10 m/s, all year. The energy isn’t carried by the air’s motion anymore. Instead, imagine energy as “bubbles” suspended in the air, and what varies moment to moment is the bubble density.

Now you can average:

Energy = (average bubble density) × (fixed volume)

The cylinder grows at a constant rate. Some hours have dense bubbles, some have sparse bubbles, but each hour contributes the same thickness of cylinder. The two terms—total volume and average energy density—are decoupled. Multiply at the end, done.

This is mathematically identical to the coal truck. The carrier (cylinder advancing at constant speed) is independent of the cargo (bubble density). Two knobs.

A Physical Example: Hot Water in a Pipe

What’s a real system with varying carrier speed but constant cargo density?

A pipe delivering hot water. The pump speed varies—sometimes fast, sometimes slow. But the thermal energy per liter is set by the water temperature, say 60 deg C. That’s independent of flow rate.

  • Flow fast → more liters per second, each at 60 deg C

  • Flow slow → fewer liters per second, each still at 60 deg C

The energy density (joules per liter, set by temperature) is decoupled from the delivery rate (liters per second, set by pump speed). Two knobs.

You can work with averages:

Energy delivered = (energy per liter) × (total liters delivered)

Or: (constant energy density) × (average flow rate) × (time)

The varying pump speed affects how much volume arrives, but each parcel’s richness is the same regardless of how fast it traveled.

Why Wind Doesn’t Give You This Escape

For wind to behave like hot water, you’d need the air to carry something whose concentration doesn’t depend on wind speed—say, a constant pollen count per cubic meter. Wind speed varies, but pollen density stays fixed. Now the cylinder’s “cargo” is independent of how fast it’s growing. Average pollen density, multiply by total volume, done.

But wind’s kinetic energy doesn’t work this way. The “temperature” of the air—its energy density, ½ × ρ × v²—is its velocity. There’s no separate thermostat. The air’s motion is both the carrier and the cargo.

This is why integration isn’t optional. The coupling between delivery rate and energy content is fundamental to what kinetic energy is. You can’t engineer around it. You can only shrink to instants where there’s nothing to decouple.


Two Paths to the Integral: Measurement vs. Prediction

The integration solution demands that we know v at each instant. In practice, there are two ways to get this:

Path 1: Measure the Wind Record Directly

Deploy instruments and record v(t) over time. For offshore wind, this typically means floating LIDAR (Flidar)—a buoy-mounted remote sensing system that measures wind speed at hub height. A 1-3 year measurement campaign gives you a detailed wind speed record.

With this record, you can:

  • Bin the data by wind speed (how many hours at 4 m/s, 5 m/s, 6 m/s…)

  • Calculate power for each bin

  • Sum to get annual energy production

This is the integral computed directly from measurements.

Path 2: Predict from a Probability Distribution

The Ladder’s Method 2 already used the EPF shortcut. Here we see where it comes from — why the correction factor exists at all. What if you only have the average wind speed at a site? You might know v_avg = 9 m/s from regional data or a short measurement campaign, but not the full distribution.

Here’s the problem: you can’t just compute P = ½ × ρ × A × (v_avg)³.

Because of the v³ nonlinearity, mean(v³) ≠ mean(v)³ — the average of the cubes always exceeds the cube of the average.

The solution: assume a probability distribution for wind speeds. The most common choice is the Rayleigh distribution (a special case of Weibull with shape parameter k=2), which fits many sites reasonably well.

For a Rayleigh distribution, the ratio mean(v³) / mean(v)³ works out to approximately 1.91. This is the Energy Pattern Factor (EPF)—the same EPF we used in the Ladder’s Method 2, now derived from the distribution.
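You can verify that factor numerically (a quick Monte Carlo check; the analytic value for Rayleigh winds is 6/π ≈ 1.9099):

import numpy as np

rng = np.random.default_rng(0)
v = rng.rayleigh(scale=7.0, size=1_000_000)   # scale is arbitrary; the ratio is scale-free
epf = np.mean(v**3) / np.mean(v)**3
print(f"EPF ≈ {epf:.3f}")                     # ~1.91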

The tradeoff:

  • Flidar measurement → accurate, site-specific, expensive, time-consuming

  • EPF prediction → quick, cheap, approximate, assumes Rayleigh distribution holds

For preliminary screening (“Is this site worth investigating?”), the EPF approach is often sufficient. For detailed project assessment and financing, you need the full wind speed distribution — either from a measurement campaign or from validated reanalysis data. The next section shows how that distribution is used.


From Power to Annual Energy Production

In practice, this integral is evaluated using wind speed statistics:

  1. Measure (or model) the distribution of wind speeds at a site—how many hours per year at 4 m/s, at 5 m/s, at 6 m/s, etc.

  2. For each wind speed bin, calculate power using P = ½ × Cp × ρ × A × v³ (where Cp is the turbine’s efficiency, limited by the Betz limit of 59.3%)

  3. Multiply each power by the hours at that wind speed

  4. Sum across all bins

The result is Annual Energy Production (AEP), typically in MWh or GWh per year.

This is the integral in discrete form: breaking the year into bins where v is approximately constant, computing power for each bin, multiplying by time, summing.
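In code, the discrete integral might look like the sketch below. The hours per bin follow from an assumed Rayleigh distribution; the cut-in, rated, and cut-out speeds are illustrative assumptions, not the specification of any particular turbine:

import numpy as np

rho, D, cp = 1.225, 240.0, 0.45        # cp assumed, below the Betz limit
A = np.pi * D**2 / 4
v = np.arange(1.0, 31.0)               # 1 m/s wide bins, centers 1..30 m/s

# Hours per year in each bin, assumed Rayleigh distribution with mean 9 m/s
scale = 9.0 / np.sqrt(np.pi / 2)
hours = (v / scale**2) * np.exp(-v**2 / (2 * scale**2)) * 8760

# Idealized power curve: cubic between cut-in and rated, flat to cut-out
p = 0.5 * cp * rho * A * v**3          # W
p[v < 3] = 0.0                         # assumed cut-in: 3 m/s
p = np.minimum(p, 15e6)                # rated-power cap: 15 MW
p[v > 25] = 0.0                        # assumed cut-out: 25 m/s

aep_gwh = np.sum(p * hours) / 1e9
print(f"AEP ≈ {aep_gwh:.0f} GWh, implied CF ≈ {aep_gwh * 1e3 / (15 * 8760):.2f}")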


The Scaling Relationships (Summary)

Change           Power scales as   Doubling gives you
Wind speed       v³                8× power
Rotor diameter   D²                4× power
Swept area       A                 2× power

Why These Matter

The v³ dominates everything. A mediocre turbine at a windy site beats an excellent turbine at a calm site.

Error propagation is brutal. A 10% error in wind speed estimates becomes a ~33% error in power predictions (1.1³ ~ 1.33). This is why wind resource assessment demands years of careful measurement.

Power vs. Energy: Power (watts) is the instantaneous rate—what the physics gives you. Energy (watt-hours) is the accumulated total—what you sell. The bridge between them is integration over time.


The Swept Area Method: The Engineer’s Lever

So v³ dominates the physics. Why do wind energy textbooks make such a fuss about the “swept area method”?

Because you can’t control the wind. You can control the rotor.

The Knobs You Actually Have

When designing or selecting a turbine, you don’t get to dial up v. The wind is what it is at your site. What you can choose is rotor diameter—and through it, swept area.

This makes the D² relationship the engineer’s primary lever:

Rotor diameter   Swept area    Relative power
50 m             ~2,000 m²     1×
100 m            ~7,900 m²     4×
150 m            ~17,700 m²    9×
200 m            ~31,400 m²    16×

Going from a 50m rotor to a 200m rotor—a 4x increase in diameter—gives you 16x the power. That’s a big deal.

Why Turbines Keep Getting Bigger

In the 1980s, rotor diameter was about 15 meters. Today’s largest offshore rotors exceed 230 meters. That’s roughly a 15x increase in diameter, which means:

  • (15)² ~ 225x more swept area

  • 225x more power per turbine (at the same wind speed)

This is why the industry relentlessly pursues larger rotors despite the engineering challenges. The scaling reward is enormous—even though it’s “only” quadratic.

The Terminology Trap

Ginsberg (2019) writes:

“Power increases exponentially with swept area”

This is wrong — the relationship is quadratic, not exponential. But the impulse is understandable: Ginsberg is trying to emphasize that doubling the diameter does far more than double the output.

Better ways to convey the same idea:

  • “Power scales with the square of rotor diameter—double the diameter, quadruple the output”

  • “Going from an 80m to a 160m rotor doesn’t double production—it quadruples it”

  • “The swept area method matters because area is the one variable you actually control”

  • “Larger rotors capture dramatically more energy” (vague but not wrong)

What to avoid:

  • “Exponential” (mathematically incorrect—different growth class entirely)

  • “Increases rapidly” without quantifying (invites misinterpretation)

The Full Picture

The v³ relationship tells you what physics allows. The D² relationship tells you what engineering can capture. Together:

P = ½ × ρ × A × v³ = ½ × ρ × (π × D² / 4) × v³

You can’t change ρ (air density is what it is). You can’t change v (the wind blows as it will). You can change D—and every doubling of diameter buys you a factor of four.

That’s why swept area deserves its own “method” in the textbooks. Not because the scaling is exponential—it isn’t. But because it’s the lever you actually get to pull.


Terminology Note

These relationships are:

  • Linear in area (P ~ A)

  • Quadratic in diameter (P ~ D²)

  • Cubic in velocity (P ~ v³)

None of them are exponential. True exponential growth (P ~ e^x or P ~ 2^x) means the exponent contains the variable. These are polynomial relationships—the variable is in the base, not the exponent.

The distinction matters: exponential functions eventually outgrow any polynomial. Saying “exponential” when you mean “cubic” or “quadratic” isn’t just imprecise—it’s a different class of mathematical behavior.
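The difference is easy to see numerically: the cubic starts out ahead, but the exponential always overtakes it.

for x in (2, 10, 30, 100):
    print(x, x**3, 2**x)
# x = 2:   8 vs 4            (cubic ahead)
# x = 10:  1,000 vs 1,024    (neck and neck)
# x = 30:  27,000 vs ~1.1e9  (exponential has left the building)
# x = 100: 1e6 vs ~1.3e30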


Key Takeaways

  1. Wind power scales as v³ because velocity does double duty: it determines both how fast air arrives and how much energy that air contains.

  2. The carrier is the cargo. Unlike most energy systems, you can’t decouple delivery rate from energy content. One knob, two consequences.

  3. The cylinder model helps visualize annual energy as a long tube of variable-density air—but the coupling between slice thickness and slice richness prevents simple averaging.

  4. Integration solves this by shrinking to moments where there’s only one v, then summing. It doesn’t untangle the coupling; it sidesteps it.

  5. Power is the physics; energy is the economics. The cubic relationship governs instantaneous extraction. Integration over real wind distributions gives you what the turbine actually produces—and what investors actually care about.

  6. The methods ladder follows from v³. The nameplate method hides the cubic sensitivity inside an assumed capacity factor. The Ginsberg method exposes it through the EPF correction. Wake modeling confronts it directly: a 25% velocity deficit in a wake means (0.75)³ = 42% of undisturbed power. Each method up the ladder gives you more honest engagement with the cubic.



Closing the Loop: Why This Path?

A natural question: why do we go through energy density and power at all? Why not calculate energy directly?

Here’s the logic chain:

Step 1: Energy Density is the Fundamental Physics

The kinetic energy per cubic meter of moving air is:

Energy density = ½ × ρ × v²

This is bedrock—it falls straight out of KE = ½ × m × v².

Step 2: But Energy Density Alone is Stuck

You might want to say:

Total energy = (energy density) × (volume)

But what volume? The air isn’t sitting still. It’s a flow, not a parcel. And worse: when v changes, the energy density changes AND the rate at which volume passes through changes. The carrier-is-the-cargo coupling makes any direct calculation treacherous.

Step 3: Multiply by Flow Rate to Get Power

Introduce the volume flow rate (A × v) and multiply:

Power = (energy density) × (volume flow rate) = ½ × ρ × v² × A × v = ½ × ρ × A × v³

Power is the natural quantity for a continuous flow. It answers: “Right now, at this instant, how much energy per second is passing through?”

Step 4: Power Lets You Work Instant by Instant

This is the key move. At each instant, v is just one number. The coupling that wrecked the cylinder averaging is trivially resolved—there’s nothing to correlate. One v, doing its two jobs (setting energy density AND delivery rate), right now.

No averaging required. No untangling. Just: what’s v? Compute power. Done.

Step 5: Integrate Power Over Time to Get Energy

Sum up the instants:

Energy = integral of P dt = integral of ½ × ρ × A × v³ dt

Each moment contributes its power × its duration. The integral handles the fact that v changes from moment to moment. The result is total energy—MWh, GWh, what you actually sell.

The Path

Energy density (½ × ρ × v²)
        ↓  × flow rate (A × v)
Power (½ × ρ × A × v³)    ← work instant by instant here
        ↓  × time (integrate)
Energy (MWh, GWh/year)

We don’t go through power because it’s convenient. We go through power because it’s the only clean waypoint when the carrier is the cargo and v won’t hold still.

This is exactly what PyWake does at industrial scale: for each turbine in a 926-unit farm, at each hourly wind condition, it computes the local wind speed (accounting for upstream wakes), evaluates v³, and sums the result. The physics in this document is the physics inside that software.


References

Bastankhah, M. and Porté-Agel, F. (2014). A new analytical model for wind-turbine wakes. Renewable Energy, 70, 116–123. doi:10.1016/j.renene.2014.01.002

Gaertner, E., Rinker, J., Sethuraman, L., Zahle, F., Anderson, B., Barter, G., Abbas, N., Meng, F., Bortolotti, P., Skrzypinski, W., Scott, G., Feil, R., Bredmose, H., Dykes, K., Shields, M., Allen, C., and Viselli, A. (2020). Definition of the IEA 15-Megawatt Offshore Reference Wind Turbine. NREL/TP-5000-75698.

Ginsberg, M. (2019). Harness It: Renewable Energy Technologies and Project Development Models Transforming the Grid. Business Expert Press. ISBN: 978-1-63157-931-8.

Ma, Y., Zhai, L., Nickerson, E. C., Bhatt, U. S., Bhatt, M. P., and Lin, H. (2025). Wind data assessment and energy estimation on the Scotian Shelf. Wind Energy Science, 10, 2965–2999. doi:10.5194/wes-10-2965-2025

Pedersen, M. M., van der Laan, P., Friis-Møller, M., Rinker, J., and Réthoré, P.-E. (2019). DTUWindEnergy/PyWake. Zenodo. doi:10.5281/zenodo.2562662

Geoscience Machine Learning bits and bobs – data completeness

2016 Machine learning contest – Society of Exploration Geophysicists

In a previous post I showed how to use pandas.isnull to find out, for each well individually, if a column has any null values, and sum to get how many, for each column. Here is one of the examples (with more modern, pandas-ish syntax compared to the example in the previous post):

for well, data in training_data.groupby('Well Name'):
    print(well)
    print(data.isnull().values.any())
    print(data.isnull().sum(), '\n')

Simple and quick, the output showed me that – for example – the well ALEXANDER D is missing 466 samples from the PE log:

ALEXANDER D
True
Facies         0
Formation      0
Well Name      0
Depth          0
GR             0
ILD_log10      0
DeltaPHI       0
PHIND          0
PE           466
NM_M           0
RELPOS         0
dtype: int64

A more appealing and versatile alternative, which I discovered after the contest, comes with the matrix function from the missingno library. With the code below I can turn each well into a Pandas DataFrame on the fly, then make a missingno matrix plot.

import numpy as np
import matplotlib.pyplot as plt
import missingno as msno

for well, data in training_data.groupby('Well Name'):
    msno.matrix(data, color=(0., 0., 0.45))
    fig = plt.gcf()
    fig.set_size_inches(20, np.round(len(data)/100))  # height of the plot for each well reflects well length
    axes = fig.get_axes()
    axes[0].set_title(well, color=(0., 0.8, 0.), fontsize=14, ha='center')

I find that looking at these two plots provides a very compelling and informative way to inspect data completeness, and I am wondering if they couldn’t be used to guide the strategy to deal with missing data, together with domain knowledge from petrophysics.

Interpreting the dendrogram in a top-down fashion, as suggested in the library documentation, my first thoughts are that this may suggest trying to predict missing values in a sequential fashion rather than for all logs at once. For example, looking at the largest cluster on the left, and starting from top right, I am thinking of testing use of GR to first predict missing values in RDEP, then both to predict missing values in RMED, then DTC. Then add CALI and use all logs completed so far to predict RHOB, and so on.

Naturally, this strategy will need to be tested against alternative strategies using lithology prediction accuracy. I would do that in the context of learning curves: I am imagining comparing the training and cross-validation error first using only non-NaN rows, then replacing all NaNs with the mean, then comparing separately this sequential log-completing strategy with an all-in-one strategy.

Be a geoscience and data science detective


September 16, 2020

Introduction

These days everyone talks about data science. But here’s a question: if you are a geoscientist, and like me you have some interest in data science (that is, doing more quantitative and statistical analyses with data), why choose between the two? Do both… always! Your domain knowledge is an indispensable condition, and so is an attitude of active curiosity (for me, an even more important condition). So, my advice for aspiring geoscientists and data scientists is: “yes, do some course work, read some articles, maybe a book, get the basics of Python or R, if you do not know a programming language”, but then jump right ahead into doing! But – please! – skip the Titanic, Iris, and Cars datasets, or any other data from your MOOC or that many have already looked at! Get some data that is interesting to you as a geoscientist, or put it together yourself.

Today I will walk you through a couple of examples; the first one, I presented as a lightning talk at the Transform 2020 virtual conference organized by Software Underground. This project had in fact begun at the 2018 Geophysics Sprint (organized by Agile Scientific ahead of the Annual SEG conference) culminating with  this error flag demo notebook. Later on, inspired by watching the Stanford University webinar How to be a statistical detective, I decided to resume it, to focus on a more rigorous approach. And that leads to my last bit of advice, before getting into the details of the statistical analysis: keep on improving your geo-computing projects (if you want more advice on this aspect, take a look at my chapter from the upcoming Software Underground book, 52 things you should know about geocomputing): it will help you not only showcase your skills, but also your passion and motivation for deeper and broader understanding.

First example: evaluate the quality of seismic inversion from a published article

In this first example, the one presented at Transform 2020, I wanted to evaluate the quality of seismic inversion from a published article in the November 2009 CSEG Recorder, Inversion Driven Processing. In the article, the evaluation was done by the authors at a blind well location, but only qualitatively, as illustrated in Figure 5, shown below for reference. In the top panel (a) the evaluation is for the inversion without additional processing (SPNA, Signal Protected Noise Attenuation); in the bottom panel (b) the evaluation is for the inversion with SPNA. On the right side of each panel the inverted seismic trace is plotted against the upscaled impedance well log (calculated by multiplying the well density log and the well velocity log from compressional sonic); on the left, the upscaled impedance log is inserted in a seismic impedance section as a colored trace (at the location of the well), using the same color scale and range as the impedance section.

Figure 5 caption: Acoustic impedance results at the blind well for data without (a) and with (b) SPNA. The figure shows a 200 ms window of inverted seismic data with well B, the blind well, in the middle on the left, along with acoustic impedance curves for the well (red) and inverted seismic (blue) on the right. The data with SPNA shows a better fit to the well, particularly over the low frequencies.

What the authors reported in the figure caption is the extent to which the evaluation was discussed in the paper; unfortunately it is not backed up in any quantitative way, for example by comparing a score, such as R^2, for the two methods. Please notice that I am not picking on this paper in particular (in fact, I rather liked it), but I am critical of the lack of supporting statistics, and wanted to supplement the paper with my own.

In order to do that, I hand-digitized the logs and inversion traces from the figure above, then interpolated them to regularly-sampled time intervals (by the way: if you are interested in a free tool to digitize plots, check out WebPlotDigitizer).

My plan was to split my evaluation into an upper and a lower zone, but rather than using the seismically-picked horizon, I decided to add a fake top at 1.715 seconds, where I see a sharp increase in impedance in Figure 5. This was an arbitrary choice on my part of a more geological horizon, separating the yellow-red band from the green-blue band in the impedance sections. The figure below shows all the data in a Matplotlib figure:

The first thing I did, then, was to look at the Root Mean Square Error (RMSE) in the upper and lower zones obtained using the fake top. The values are summarized in the table below:

Based on the RMSE, it looks like case (b), the inversion with Signal Protected Noise Attenuation applied to the data, is a better result for the Upper zone, but not for the Lower one. This result is in agreement with my visual comparison of the two methods.

But let’s dig a bit deeper. After looking at RMSE, I used the updated version of the error_flag function, which I first wrote at the 2018 Geophysics Sprint, listed below:
 
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats

def error_flag(pred, actual, stat, dev = 1.0, method = 1):
    """Calculate the difference between a predicted and an actual curve
    and return a curve flagging large differences based on a user-defined distance
    (in deviation units) from either the mean difference or the median difference

    Matteo Niccoli, October 2018. Updated in May 2020.

    Parameters:
        pred : array
            predicted log array
        actual : array
            original log array
        stat : {'mean', 'median'}
            The statistic to use. The following options are available:
                - mean: uses numpy.mean for the statistic,
                and np.std for dev calculation
                - median: uses numpy.median for the statistic,
                and scipy.stats.median_absolute_deviation (MAD) for dev calculation
        dev : float, optional
            the number of deviations (standard deviations or MADs) to use.
            The default is 1.0
        method : int {1, 2, 3}, optional
            The error method to use. The following options are available
            (default is 1):
                1: difference between curves larger than mean difference plus dev
                2: curve slopes have opposite sign (after a 3-sample window smoothing)
                3: curve slopes of opposite sign OR difference larger than mean plus dev
    Returns:
        flag : array
        The error flag array
    """
    
    flag = np.zeros(len(pred))
    err = np.abs(pred-actual)
    
    if stat == 'mean':
        err_stat = np.mean(err)
        err_dev = np.std(err)
    elif stat == 'median':
        err_stat = np.median(err)
        err_dev = sp.stats.median_absolute_deviation(err)
        
    pred_sm = pd.Series(np.convolve(pred, np.ones(3), 'same'))
    actual_sm = pd.Series(np.convolve(actual, np.ones(3), 'same'))
    ss = np.sign(pred_sm.diff().fillna(pred_sm))
    ls = np.sign(actual_sm.diff().fillna(actual_sm))
                  
    if method == 1:
        flag[np.where(err>(err_stat + (dev*err_dev)))] = 1
    elif method == 2:      
        flag[np.where((ss + ls)==0 )]= 1
    elif method == 3:
        flag[np.where(np.logical_or(err>(err_stat + (dev*err_dev)), (ss+ls)==0 ))]= 1
    return flag

I believe this new version is greatly improved because:

  • Users now can choose between mean/standard deviation and median/median absolute deviation as a statistic for the error. The latter is more robust in the presence of outliers
  • I added a convolutional smoother prior to the slope calculation, so as to make it less sensitive to noisy samples
  • I expanded and improved the docstring

The figure below uses the flag returned by the function to highlight areas of poorer inversion results, which I flagged by passing very restrictive parameters to the function:

  • using median and a median absolute deviation of 0.5 to trigger flag
  • combining the above with checking for the slope sign
 

I also wrote short routines to count the number and percentage of samples that have been flagged, for each result, which are summarized in the table below:

The error flag method is in agreement with the RMSE result: case (b), the inversion on Signal Protected Noise Attenuated data, is the better result for the Upper zone, but for the Lower zone the inversion without SPNA is the better one. Very cool!

But I was not satisfied yet. I was inspired to probe even deeper after a number of conversations with my friend Thomas Speidel, and reading the chapter on Estimation from Computational and Inferential Thinking (UC Berkeley). Specifically, I was left with this question in mind: “Can we be equally confident in those two ‘is better’ conclusions?”

This question can be answered with a bootstrapped confidence interval for the proportions of flagged samples, which I do below using the code in the book, with some modifications and some other tools from the datascience library. The results are shown below. The two plots, one for the Upper and one for the Lower zone, show the distribution of bootstrap flagged proportions for the two inversion results, with SPNA in yellow and without SPNA in blue, and the confidence intervals in cyan and brown, respectively (the CI upper and lower bounds are also added to the table, for convenience).

By comparing the amount (or paucity) of overlap between the distributions (and between the confidence intervals) in the two plots, I believe I can be more confident in the conclusion drawn for the Lower zone, which is that the inversion on data without SPNA is better (a lower proportion of flagged errors), as there is far less overlap.
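For anyone wanting to reproduce this kind of check on their own data, here is a minimal bootstrap sketch in plain numpy (the notebook itself uses the datascience library; flags stands in for the 0/1 array returned by error_flag):

import numpy as np

def bootstrap_proportion_ci(flags, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for the proportion of flagged samples."""
    rng = np.random.default_rng(seed)
    props = np.array([rng.choice(flags, size=len(flags), replace=True).mean()
                      for _ in range(n_boot)])
    return np.percentile(props, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Synthetic example: 100 samples, 20 flagged
flags = np.r_[np.ones(20), np.zeros(80)]
print(bootstrap_proportion_ci(flags))   # approximately [0.12, 0.28]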

I am very pleased with these results. Of course, there are some caveats to keep in mind, mainly that:
  • I may have introduced small, but perhaps significant errors with hand digitizing
  • I chose a statistical measure (the median and median absolute deviation) over a number of possible ones
  • I chose an arbitrary geologic reference horizon without a better understanding of the reservoir and the data, just the knowledge from the figure in the paper

However, I am satisfied with this work, because it points to a methodology that I can use in the future. And I am happy to share it! The Jupyter notebook is available on GitHub.

Second example: evaluate regression results from a published article

My second, more recent example, is shorter but no less interesting, in my opinion. It is a case study correlating inversion data to petrophysical estimates of porosity-height in the Doig and Montney Formations in Western Canada (from the paper Tight gas geophysics: AVO inversion for reservoir characterization, Close et al. CSEG Recorder, May 2010, an article which I enjoyed very much reading).

The authors indicated that Vp/Vs and/or Poisson’s ratio maps from seismic inversion are good indicators of porosity in the Lower Doig and Upper Montney reservoirs in the wells used in their study, so it was reasonable to try to predict Phi-H from Vp/Vs via regression. The figure below, from the article, shows one such Vp/Vs ratio map and the Vp/Vs vs. Phi-H cross-plot for 8 wells.

Figure 7 caption: a) Map of median Vp/Vs ratio value and porosity-height from 8 wells through the Lower Doig and Upper Montney. The red arrows highlight wells with very small porosity-height values and correspond in general to areas of higher Vp/Vs ratio. The blue arrow highlights a well at the edge of the seismic data where the inversion is adversely affected by decreased fold. The yellow line is the approximate location of a horizontal well where micro-seismic data were recorded during stimulation. b) Cross-plot of porosity-height values as shown in (a) against Vp/Vs extracted from the map. The correlation co-efficient of all data points (blue line) of 0.7 is improved to 0.9 (red line) by removing the data point marked by the blue arrow, which again corresponds to the well near the edge of the survey (a).

They also show in the figure that by removing one of the data points, corresponding to a well near the edge of the survey (where the seismic inversion result is presumably not as reliable, due to lower offset and azimuth coverage), the correlation co-efficient is improved from 0.7 to 0.9 (red line).

So, the first thing I set out to do was to reproduce the crossplot. I hand-digitized the porosity-height and Vp/Vs pairs in the cross-plot, again using WebPlotDigitizer. However, I switched the axes around, which seems more natural to me, since the objective of any regression effort would be to predict Phi-H, as the dependent variable, at the location of yet-to-be-drilled wells, given Vp/Vs from seismic inversion. I also labelled the wells using their row index, after having loaded them in a Pandas DataFrame. And I used Ordinary Least Squares regression from the statsmodels library twice: once with all data points, and a second time after removal of the well labelled as 5 in my plot above.

So, I was able to reproduce the analysis from the figure. I think removing an outlier with insight from domain knowledge (the observation that a poorer inversion result at this location is a reasonable explanation for the deviation from trend) is a legitimate choice. However, I would like to dig a bit deeper, to back up the decision with other analyses and tests, and to show how one might do it with their own data.

The first thing to look at is an Influence plot, which is a plot of the residuals, scaled by their standard deviation, against the leverage, for each observation. Influence plots are useful to distinguish high-leverage observations from outliers, and they are one of statsmodels’ standard regression plots, so we get the next figure almost for free (with minor modifications to the default example). Here it is below, together with the OLS regression result, with all data points.

From the Influence plot it is very obvious that the point labelled as zero has high leverage (but not a high normalized residual). This is not a concern, because points with high leverage are important but do not much alter the regression model. On the other hand, the point labelled as 5 has a very high normalized residual. This point is an outlier and it will influence the regression line, reducing the R^2 and correlation coefficient. This analysis is a robust way to legitimize removing that data point.

Next I run some inferential tests. As I’ve written in an earlier notebook on Data loading, visualization, significance testing, I find the critical r very useful in the context of bivariate analysis. The critical r is the value of the correlation coefficient at which you can rule out chance as an explanation for the relationship between variables observed in the sample, and I look at it in combination with the confidence interval of the correlation coefficient.

The two plots below display, in dark blue and light blue respectively, the upper and lower confidence interval bounds, as the correlation coefficient r varies between 0 and 1 (x axis). These two bounds will change with the number of wells (they will get closer with more wells, and farther apart with fewer wells). The lower bound intersects the x axis (y = 0) at a value equal to the critical r (white dot). The green dots highlight the actual confidence interval for a specific correlation coefficient chosen, in this case 0.73 with 9 wells, and 0.9 with 8 wells.

By the way: these plots are screen captures from the interactive tool I built taking advantage of the Jupyter interactive functionality (ipywidgets). You can try the tool by running the Jupyter notebook.

With 9 wells and cc = 0.73, the resulting critical r = 0.67 tells us that for a 95% confidence level (0.05 alpha) we need at least a correlation coefficient of 0.67 in the sample (the 9 wells drilled) to be able to confidently say that there is correlation in the population (e.g. any well, future wells). However, the confidence interval is quite broad, ranging between 0.13 and 0.94 (you can get these numbers by running confInt(0.73, 9) in a cell).

With 8 wells (having removed the outlier) and CC = 0.9, the critical r is now 0.71, meaning that the requirement for rejecting the null hypothesis (that there is no association between Vp/Vs and Phi-H) is now a bit higher. However, with the CC increased to 0.9, and only one fewer well, the confidence interval now ranges from 0.53 to 0.98, hence our confidence is greatly increased.

This second analysis also corroborates the choice of removing the outlier data point. One thing worth mentioning before moving on to the next test is that these confidence interval bounds are the expected population ones, based on the sample correlation coefficient and the number of observations; the data itself was not used. Of course, I could also have calculated, and shown you, the OLS regression confidence intervals, as I have done in this notebook (with a different dataset).
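For reference, here is one possible implementation of that confInt helper, together with the critical r (Fisher z-transform for the interval, inverted t-test for the critical value; this is my reconstruction, not necessarily the notebook’s exact code). It reproduces the bounds quoted above:

import numpy as np
from scipy import stats

def confInt(r, n, alpha=0.05):
    """Confidence interval for a correlation coefficient (Fisher z-transform)."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    zc = stats.norm.ppf(1 - alpha / 2)
    return np.tanh([z - zc * se, z + zc * se])

def critical_r(n, alpha=0.05):
    """Smallest correlation significant at the given alpha for n samples."""
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t / np.sqrt(n - 2 + t**2)

print(confInt(0.73, 9), critical_r(9))   # ~[0.13, 0.94], ~0.67
print(confInt(0.90, 8), critical_r(8))   # ~[0.53, 0.98], ~0.71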

My final test involved using the distance correlation (dcor.distance_correlation) and p-value (dcor.independence.distance_covariance_test) from the dcor library. I have written before in a blog post how much I like the distance correlation, because it does not assume a linear relationship between variables, and because a distance correlation of zero does mean that there is no dependence between those two variables (contrary to Pearson and Spearman). In this case the relationship seems to be linear, but it is still a valuable test to compare DC and p-value before and after removing the outlier. Below is a summary of the test:
All data points:
D.C. =  0.745
p-value =  0.03939
Data without outlier:
D.C. =  0.917
p-value =  0.0012

The distance correlation values are very similar to the correlation coefficients from OLS, again going up once the outlier is removed. But, more interestingly, with all data points, a p-value of 0.039 is very close to the alpha of 0.05, whereas once we remove the outlier, the p-value goes down by a factor of about 30, to 0.0012. This again backs up the decision to remove the outlier data point.
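The dcor calls themselves are one-liners. A sketch with synthetic stand-ins for the digitized Vp/Vs and Phi-H arrays (the num_resamples value is an arbitrary choice for the permutation test):

import numpy as np
import dcor

rng = np.random.default_rng(0)
x = rng.normal(size=50)                # stand-in for the Vp/Vs values
y = 2 * x + rng.normal(size=50)        # stand-in for the Phi-H values

print(dcor.distance_correlation(x, y))
test = dcor.independence.distance_covariance_test(x, y, num_resamples=2000)
print(test.pvalue)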

Geoscience Machine Learning bits and bobs – data inspection

If you have not read Geoscience Machine Learning bits and bobs – introduction, please do so first, as there I go through the objective and outline of this series, as well as a review of the dataset I will be using, which is from the 2016 SEG Machine Learning contest.

*** September 2020 UPDATE ***

Although I have more limited time now, compared to 2016,  I am very excited to be participating in the 2020 FORCE Machine Predicted Lithology challenge. Most new work and blog posts will be about this new contest instead of the 2016 one.

***************************

OK, let’s begin!

With each post, I will add a new notebook to the GitHub repo here. The notebook that goes with this post is  called 01 – Data inspection.

Data inspection

The first step after loading the dataset is to create a Pandas DataFrame. With the describe method I get a lot of information for free:

Indeed, from the first row in the summary I learn that about 20% of samples in the photoelectric effect column PE are missing.

I can use pandas.isnull to tell me, for each well, if a column has any null values, and sum to get the number of null values, again for each column.

for well in training_data['Well Name'].unique():
    print(well)
    w = training_data.loc[training_data['Well Name'] == well] 
    print (w.isnull().values.any())
    print (w.isnull().sum(), '\n')

Simple and quick, the output tells me, for example, that the well ALEXANDER D is missing 466 PE samples, and Recruit F9 is missing 12.

However,  the printout is neither easy, nor pleasant to read, as it is a long list like this:

SHRIMPLIN
False
Facies       0
Formation    0
Well Name    0
Depth        0
GR           0
ILD_log10    0
DeltaPHI     0
PHIND        0
PE           0
NM_M         0
RELPOS       0
dtype: int64 

ALEXANDER D
True
Facies         0
Formation      0
Well Name      0
Depth          0
GR             0
ILD_log10      0
DeltaPHI       0
PHIND          0
PE           466
NM_M           0
RELPOS         0
dtype: int64 

Recruit F9
True
Facies        0
Formation     0
Well Name     0
Depth         0
GR            0
ILD_log10     0
DeltaPHI      0
PHIND         0
PE           12
NM_M          0
RELPOS        0
dtype: int64
...
...

 

From those I can see that, apart from the issues with the PE log, GR has some high values in SHRIMPLIN, and so on…

All of the above is critical to determine the data imputation strategy, which is the topic of one of the next posts; but first in the next post I will use a number of visualizations of  the data, to examine its distribution by well and by facies, and to explore relationships among variables.

Geoscience Machine Learning bits and bobs – introduction

Bits and what?

After whetting (hopefully) your appetite with the Machine Learning quiz / teaser, I am now moving on to a series of posts that I decided to title “Geoscience Machine Learning bits and bobs”.

OK, but first of all, what does ‘bits and bobs‘ mean? It is a (mostly) British English expression that means “a lot of small things”.

Is it a commonly used expression? If you are curious enough you can read this post about it on the Not one-off British-isms blog. Or you can just look at the two Google Ngram plots below: the first is my updated version of the one in the post, comparing the usage of the expression in British vs. US English; the second is a comparison of its British English to that of the more familiar “bits and pieces” (not exactly the same according to the author of the blog, but the Cambridge Dictionary seems to contradict the claim).

I’ve chosen this title because I wanted to distill, in one spot, some of the best collective bits of Machine Learning that came out during, and in the wake of the 2016 SEG Machine Learning contest, including:

  • The best methods and insights from the submissions, particularly the top 4 teams
  • Things that I learned myself, during and after the contest
  • Things that I learned from blog posts and papers published after the contest

I will touch on a lot of topics, but I hope that – in spite of the title’s pointing to a random assortment of things – what I will have created in the end is a cohesive blog narrative and a complete, mature Machine Learning pipeline in a Python notebook.

*** September 2020 UPDATE ***

Although I have more limited time these days compared to 2016, I am very excited to be participating in the 2020 FORCE Machine Predicted Lithology challenge. Most new work and blog posts will be about this new contest instead of the 2016 one.

***************************

Some background on the 2016 ML contest

The goal of the SEG contest was for teams to train a machine learning algorithm to predict rock facies from well log data. Below is the (slightly modified) description of the data from the original notebook by Brendon Hall:

The data is originally from a class exercise from The University of Kansas on Neural Networks and Fuzzy Systems. This exercise is based on a consortium project to use machine learning techniques to create a reservoir model of the largest gas fields in North America, the Hugoton and Panoma Fields. For more info on the origin of the data, see Bohling and Dubois (2003) and Dubois et al. (2007).

This dataset is from nine wells (with 4149 examples), consisting of a set of seven predictor variables and a rock facies (class) for each example vector and validation (test) data (830 examples from two wells) having the same seven predictor variables in the feature vector. Facies are based on examination of cores from nine wells taken vertically at half-foot intervals. Predictor variables include five from wireline log measurements and two geologic constraining variables that are derived from geologic knowledge. These are essentially continuous variables sampled at a half-foot sample rate.

The seven predictor variables are:

  • Five wireline log measurements: GR (gamma ray), ILD_log10 (logarithm of deep resistivity), DeltaPHI (neutron-density porosity difference), PHIND (average neutron-density porosity), and PE (photoelectric effect)
  • Two geologic constraining variables: NM_M (nonmarine/marine indicator) and RELPOS (relative position)

The nine discrete facies (classes of rocks) are:

  • SS: Nonmarine sandstone
  • CSiS: Nonmarine coarse siltstone
  • FSiS: Nonmarine fine siltstone
  • SiSh: Marine siltstone and shale
  • MS: Mudstone (limestone)
  • WS: Wackestone (limestone)
  • D: Dolomite
  • PS: Packstone-grainstone (limestone)
  • BS: Phylloid-algal bafflestone (limestone)

Tentative topics for this series

  • List of previous works (in this post)
  • Data inspection
  • Data visualization
  • Data sufficiency
  • Data imputation
  • Feature augmentation
  • Model training and evaluation
  • Connecting the bits: a full pipeline

List of previous works (comprehensive, to the best of my knowledge)

In each post I will make a point to explicitly reference whether a particular bit (or a bob) comes from a submitted notebook by a team, a previously unpublished notebook of mine, a blog post, or a paper.

However, I’ve also compiled below a list of all the published works, for those who may be interested.

  • The contest’s original article published by Brendon Hall in The Leading Edge, and the accompanying notebook
  • The GitHub repo with all teams’ submissions
  • Two blog posts by Matt Hall of Agile Scientific, here and here
  • The published summary of the contest by Brendon Hall and Matt Hall in The Leading Edge
  • An SEG extended abstract on using gradient boosting on the contest dataset
  • An arXiv e-print on using a ConvNet on the contest dataset
  • The abstract for a talk at the 2019 CSEG / CSPG Calgary Geoconvention


Machine Learning quiz – part 3 of 3

In my last two posts I published part 1 and part 2 of this Machine Learning quiz. If you have not read them, please do (and cast your votes) before you read part 3 below.

QUIZ, part 3: vote responses and (some) answers

In part 1 I asked which predictions looked “better”: those from model A or those from model B (Figure 1)?

Figure 1

As a reminder, both model A and model B were trained to predict the same labeled facies picked by a geologist on core, shown in the left column (they are identical) of the respective model panels. The right column in each panel shows the predictions.

The question was asked out of context, with no information given about the training process, or about any differences in data manipulation and/or the model algorithm used. Very unfair, I know! And yet, ~78% of 54 respondents clearly indicated their preference for model A. My sense is that this is because model A looks smoother overall and has fewer of the extra misclassified thin layers.

Response 1

In part 2, I presented the two predictions again, this time accompanied by the confusion matrix for each model (Figure 2).

Figure 2

I asked again which model would be considered better [1] and this was the result:

Response 2a

Although there were far fewer votes (not as robust a statistical sample), the proportion is very similar to that of the previous response, and decidedly in favor of model A again. However, the really interesting, and to me surprising, lesson came from the next answer (Response 2b): about 82% of the 11 respondents believe the performance scores in the confusion matrices to be realistic.

Response 2b

Why was it a surprise? It is now time to reveal the trick…

…which is that the scores in part 2, shown in the confusion matrices of Figure 2, were calculated on the whole well, for training and testing together!!

A few more details:

  • I used default parameters for both models
  • I used a single 70/30 train/test split (the same random split for both models) with no cross-validation

which is, in essence, how NOT to do Machine Learning!
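
To make the trap concrete, here is a minimal sketch contrasting the misleading and the proper evaluation. The names training_data and feature_cols, and the stand-in classifier, are my assumptions; the actual quiz models are not disclosed:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Assumed names: training_data (the contest DataFrame) and feature_cols (log curves)
data = training_data.dropna()                 # drop rows with missing PE for this sketch
X, y = data[feature_cols], data['Facies']

# A single 70/30 split with no cross-validation, as in the quiz
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Misleading: scoring on training and test samples together rewards memorization
print(f1_score(y, model.predict(X), average='micro'))

# Realistic: scoring on the held-out test set only
print(f1_score(y_test, model.predict(X_test), average='micro'))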

In Figure 3, I added a new column on the right of each prediction showing in red which part of the result is merely memorized, and in black which part is interpreted (noise?). Notice that for this particular well (the random 70/30 split was done on all wells together) the percentages are 72.5% and 27.5%, respectively.

I’ve also added the proper confusion matrix for each model, computed on the test set only. These are more realistic (and poorer) results.

Figure 3
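
The memorized vs. interpreted split shown in Figure 3 can be computed by flagging which of a well’s samples fell into the training set. A minimal sketch, reusing the assumed names from the snippet above (the well name here is just an example, not necessarily the well in the figure):

# Which samples of this well did the model see during training?
well_rows = data['Well Name'] == 'SHRIMPLIN'   # example well
memorized = data.index[well_rows].isin(X_train.index)
print(f'memorized: {memorized.mean():.1%}, interpreted: {(~memorized).mean():.1%}')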

So, going back to that last response: again, with 11 votes I don’t have solid statistics, but with that caveat in mind one might argue that this is a way you could be ‘sold’ unrealistic (as in over-optimistic) ML results.

At least, you could sell them by being vague about the details to those not familiar with the task of machine classification of rock facies and its difficulties (see for example this paper for a great discussion of the resolution limitations inherent in using logs (machine) as opposed to core (human geologist)).

Acknowledgments

A big thank you goes to Jesper (Way of the Geophysicist) for his encouragement and feedback, and for brainstorming with me on how to deliver this post series.


[1] Notice that, as pointed out in part 2, the model predictions were slightly different from those in part 1 because I’d forgotten to set the same random seed in the two pipelines; but only slightly, and the overall ‘look’ was very much the same.

Machine Learning quiz – part 2 of 3

In my previous post I posted part 1 (of 3) of a Machine Learning quiz. If you have not read that post, please do, cast your vote, then come back and try part 2 below.

QUIZ, part 2

Just as a quick reminder, the image below shows the rock facies predicted by two models, which I just called A and B. Both were trained to predict the same labeled rock facies, picked by a geologist on core, which are shown in the left column (they are identical) of the respective model panels. The right column in each panel shows the predictions.

*** Please notice that the models in this figure are (very slightly) different from those in part 1 because I’d forgotten to set the same random seed in the two pipelines (yes, it happens; my apologies). But they are not so different, so I left the image in part 1 unchanged and just updated this one.

Please answer the first question: which model predicts the labeled facies “better” (visually)?

Now study the performance as summarized in the confusion matrices for each model (the purple arrows indicate which matrix belongs to which model; I’ve highlighted in green the columns where each model does better, based on F1 – you don’t have to agree with my choice), then answer the second question. Notice that the differences are often only a few hundredths, or just one.
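
For readers who want to reproduce this kind of comparison, here is a minimal sketch; y_true, y_pred_a, and y_pred_b are assumed names for the geologist’s facies and the two models’ predictions:

from sklearn.metrics import confusion_matrix, f1_score

# Assumed arrays: y_true (labeled facies), y_pred_a and y_pred_b (model outputs)
cm_a = confusion_matrix(y_true, y_pred_a)
cm_b = confusion_matrix(y_true, y_pred_b)

# One F1 score per facies class, to compare the two models column by column
f1_a = f1_score(y_true, y_pred_a, average=None)
f1_b = f1_score(y_true, y_pred_b, average=None)
print(f1_a - f1_b)   # positive where model A does better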


Machine Learning quiz – part 1 of 3

Introduction

I’ve been meaning to write about the 2016 SEG Machine Learning Contest for some time. I am thinking of a short and not very structured series (i.e. I’ll jump all over the place) of 2, possibly 3 posts (with the exclusion of this quiz). It will mostly be a revisiting – and extension – of some work that team MandMs (Mark Dahl and I) did, but did not necessarily post. I will most certainly touch on cross-validation, learning curves, data imputation, and maybe a few other topics.

Background on the 2016 ML contest

The goal of the SEG contest was for teams to train a machine learning algorithm to predict rock facies from well log data. Below is the (slightly modified) description of the data from the original notebook by Brendon Hall:

The data is originally from a class exercise from The University of Kansas on Neural Networks and Fuzzy Systems. This exercise is based on a consortium project to use machine learning techniques to create a reservoir model of the largest gas fields in North America, the Hugoton and Panoma Fields. For more info on the origin of the data, see Bohling and Dubois (2003) and Dubois et al. (2007).

This dataset is from nine wells (with 4149 examples), consisting of a set of seven predictor variables and a rock facies (class) for each example vector and validation (test) data (830 examples from two wells) having the same seven predictor variables in the feature vector. Facies are based on examination of cores from nine wells taken vertically at half-foot intervals. Predictor variables include five from wireline log measurements and two geologic constraining variables that are derived from geologic knowledge. These are essentially continuous variables sampled at a half-foot sample rate.

The seven predictor variables are:

  • Five wireline log measurements: GR (gamma ray), ILD_log10 (logarithm of deep resistivity), DeltaPHI (neutron-density porosity difference), PHIND (average neutron-density porosity), and PE (photoelectric effect)
  • Two geologic constraining variables: NM_M (nonmarine/marine indicator) and RELPOS (relative position)

The nine discrete facies (classes of rocks) are:

  • SS: Nonmarine sandstone
  • CSiS: Nonmarine coarse siltstone
  • FSiS: Nonmarine fine siltstone
  • SiSh: Marine siltstone and shale
  • MS: Mudstone (limestone)
  • WS: Wackestone (limestone)
  • D: Dolomite
  • PS: Packstone-grainstone (limestone)
  • BS: Phylloid-algal bafflestone (limestone)

For some examples of the work during the contest, you can take a look at the original notebook; one of the submissions by my team, where we used Support Vector Classification to predict the facies; or a submission by one of the top 4 teams, all of whom achieved the highest scores on the validation data with different combinations of Boosted Trees trained on augmented features alongside the original features.
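
To give a flavor of that kind of baseline, here is a minimal sketch in the spirit of (not identical to) our Support Vector Classification submission. X_train, y_train, X_test, and y_test are the assumed names from the earlier snippet, and the parameters are illustrative rather than our tuned values:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Feature scaling matters for SVC; C and gamma here are illustrative only
svc = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10, gamma='scale'))
svc.fit(X_train, y_train)
print(svc.score(X_test, y_test))   # mean accuracy on the held-out test set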

QUIZ

Just before last Christmas, I ran a fun little experiment to resume work with this dataset. I decided to turn the outcome into a quiz.

Below I present the predicted rock facies from two distinct models, which I call A and B. Both were trained to predict the same labeled facies picked by the geologist, which are shown in the left column (they are identical) of the respective model panels. The right column in each panel shows the predictions. Which predictions are “better”?

Please be warned: the question is a trick one. As you can see, I am gently leading you to make a visual, qualitative assessment of “better-ness”, while being absolutely vague about the models and giving no information about the training process. This is intentional and – yes! – not very fair. But that’s the whole point of this quiz, which is really a teaser for the series.