4 min read

PUBLISHED

Nov 3, 2025

Science

What You Group is What You See

Every data story begins with a choice: what counts as together?

AUTHOR

Frank Corrigan

Group by is everywhere.
Every spreadsheet. Every dashboard. Every script.

In Excel, it’s a pivot table.
In SQL, an aggregation.
In Python:

df.groupby("category")["value"].mean()

It hides in plain sight.
Too familiar to notice.

But when used intentionally, it’s a superpower.

That power comes from picking the right groups.

Group sales by region: East is thriving.
Group by customer: one whale accounts for 60% of East’s revenue. Same data, opposite stories.

The frame decides the truth you see.
Grouping is the act of choosing that frame.
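The region-versus-customer contrast can be sketched in a few lines of pandas. The rows below are synthetic and the customer names are made up for illustration; only the 60% share mirrors the example above.

```python
import pandas as pd

# Synthetic sales rows, invented for illustration only.
sales = pd.DataFrame({
    "region":   ["East", "East", "East", "East", "West", "West", "West"],
    "customer": ["Acme", "Birch", "Cedar", "Dune", "Elm", "Fir", "Gale"],
    "revenue":  [600, 150, 150, 100, 250, 250, 200],
})

# Frame 1: group by region. East looks like the stronger region.
by_region = sales.groupby("region")["revenue"].sum()

# Frame 2: group East by customer, as a share of East's revenue.
east = sales[sales["region"] == "East"]
east_share = east.groupby("customer")["revenue"].sum() / east["revenue"].sum()

print(by_region)
print(east_share.sort_values(ascending=False))
```

Same table, two `groupby` keys. The first says East is ahead; the second says most of that lead sits with one customer.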

Find the right groups, reveal the insight.

Anyone can run the code.
The skill is judgement.
Judgement is choosing what belongs together.

In operations, my teams used facet grids all the time.
Multiple panels of the same chart type over different groupings.

Each carrier, overall → everyone hits 90% on-time.
Break by carrier × destination → FedEx bleeds in BOS.
Nothing fancy.
Just correct framing.
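Here is a minimal sketch of that carrier versus carrier × destination comparison, without the plotting. The shipment counts are synthetic, and "Carrier B" is a placeholder name, chosen so both carriers land on 90% overall.

```python
import pandas as pd

# Synthetic shipment counts per lane, invented for illustration only.
shipments = pd.DataFrame({
    "carrier":     ["FedEx"] * 3 + ["Carrier B"] * 3,
    "destination": ["BOS", "ORD", "ATL"] * 2,
    "shipments":   [100, 200, 200, 100, 200, 200],
    "on_time":     [70, 195, 185, 90, 180, 180],
})

# Coarse frame: one on-time rate per carrier.
totals = shipments.groupby("carrier")[["on_time", "shipments"]].sum()
overall = totals["on_time"] / totals["shipments"]

# Finer frame: one rate per carrier x destination, pivoted into a grid.
cells = shipments.groupby(["carrier", "destination"])[["on_time", "shipments"]].sum()
by_lane = (cells["on_time"] / cells["shipments"]).unstack("destination")

print(overall)
print(by_lane)
```

Summing counts before dividing keeps the rates weighted by volume. Averaging per-lane rates directly would give a different, unweighted number.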

This doesn’t stop at descriptive stats. In machine learning, groupby is a hero.

NVIDIA puts it plainly: feature engineering is still one of the most effective ways to improve tabular ML, and the most powerful feature engineering technique is groupby aggregations.

Translation: even modern models need you to decide the meaningful buckets.

Concrete example (with synthetic data):

Data columns: carrier, origin, destination, recent on-time rate.
Baseline model: random forest with default features ≈ 88% accuracy.
Add one feature: 30-day mean on-time for carrier × destination.
Same model, same params. Accuracy jumps to ≈ 89%. You handed the algorithm the right grouping. Signal unlocked.
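One way such a feature could be built is sketched below. It covers only the feature construction, not the model or the accuracy figures. The column names (`ship_date`, `lane_on_time_30d`), the cutoff date and the rows are all assumptions for illustration.

```python
import pandas as pd

# Synthetic shipment history, invented for illustration only.
history = pd.DataFrame({
    "ship_date": pd.to_datetime([
        "2025-09-01", "2025-10-10", "2025-10-20",
        "2025-10-25", "2025-10-12", "2025-10-28",
    ]),
    "carrier":     ["FedEx", "FedEx", "FedEx", "FedEx", "Carrier B", "Carrier B"],
    "destination": ["BOS", "BOS", "BOS", "ORD", "BOS", "BOS"],
    "on_time":     [1, 0, 1, 1, 1, 1],
})

# Keep only the 30 days before the (assumed) scoring date, so the
# feature never sees outcomes from the period being predicted.
cutoff = pd.Timestamp("2025-11-01")
in_window = (history["ship_date"] >= cutoff - pd.Timedelta(days=30)) & (
    history["ship_date"] < cutoff
)

# The groupby feature: mean on-time per carrier x destination.
lane_rate = (
    history[in_window]
    .groupby(["carrier", "destination"])["on_time"]
    .mean()
    .rename("lane_on_time_30d")
    .reset_index()
)

# Attach the feature to the rows the model will score.
to_score = pd.DataFrame({"carrier": ["FedEx", "Carrier B"],
                         "destination": ["BOS", "BOS"]})
features = to_score.merge(lane_rate, on=["carrier", "destination"], how="left")
print(features)
```

The left merge leaves a missing value for any lane with no shipments in the window, which the model or a fill step would need to handle.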

Only 1 percentage point?

1% better on $100M of inventory is $1M of potential savings. That is either $1M of reduced out-of-stocks (OOS), which increases revenue, or lower carrying cost on obsolete or slow-moving inventory that never needed to come in.

You didn’t tune the model.
You didn’t collect new data.
All you did was give the machine the right world view.
Nearly free ROI.

But how? There are at least two paths to the right groups:

Brute force: when you have no intuition or have compute to burn.
Domain knowledge: when you understand the structure of the world.
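As a rough illustration of the brute-force path, the sketch below tries every combination of candidate columns and ranks them by how far apart the group means are. The scoring rule (max minus min of group means), the helper name `rank_groupings` and the data are assumptions for this sketch, not a method described in this post.

```python
from itertools import combinations

import pandas as pd

# Synthetic rows, invented for illustration only.
df = pd.DataFrame({
    "carrier":     ["FedEx"] * 4 + ["Carrier B"] * 4,
    "destination": ["BOS", "BOS", "ORD", "ORD"] * 2,
    "origin":      ["MEM", "IND"] * 4,
    "on_time":     [0, 0, 1, 1, 1, 1, 1, 1],
})


def rank_groupings(frame, candidates, target, max_keys=2):
    """Score each combination of candidate columns by the spread of group means."""
    rows = []
    for k in range(1, max_keys + 1):
        for keys in combinations(candidates, k):
            group_means = frame.groupby(list(keys))[target].mean()
            rows.append({"keys": keys,
                         "spread": group_means.max() - group_means.min()})
    return pd.DataFrame(rows).sort_values(
        "spread", ascending=False, ignore_index=True
    )


ranking = rank_groupings(df, ["carrier", "destination", "origin"], "on_time")
print(ranking)
```

Spread is a crude score. Finer groupings have fewer rows per group, so they can look dramatic by chance, and a real search would want a minimum group size or a holdout check. This is where domain knowledge earns its keep.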
