Methodology · May 15, 2026

We Just Recalibrated the Humor Index Display Scale. Here's Why and What Changed.

On May 15, 2026 we made a methodology change to the Humor Index: we recalibrated the display formula. Show-level scores shifted down by between 1.0 and 4.3 points. The rank order didn't change.

If you saw scores before today and you're seeing different numbers now, this post explains why.

What changed

The raw Humor Index — the underlying score we compute from per-joke craft, impact, density, and consistency — is unchanged. Every per-joke and per-episode raw score in our database is exactly what it was yesterday.

What we changed is the formula that maps that raw 0–10 score to the 0–100 display value you see on the site.

Old display formula:

`display = 75 + (raw − 6.5) ÷ 0.55 × 10`

New display formula:

`display = 75 + (raw − 6.5) ÷ 0.80 × 10`

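The only change is the spread parameter: 0.55 → 0.80. That flattens the slope of the scale by about 31% (0.55 ÷ 0.80 = 0.6875), from roughly 18.2 display points per raw point to 12.5. The center stays at 75: a raw score of 6.5, the median episode, still maps to display 75. What changes is how aggressively a high or low raw score gets stretched into the display range.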
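For readers who want to check the numbers, here is a minimal Python sketch of the two formulas. The function and constant names are illustrative assumptions, not taken from the scoring codebase, and the clamp at 100 is deliberately left out.

```python
CENTER_RAW = 6.5       # raw score that maps to the center of the display scale
CENTER_DISPLAY = 75.0  # display value at the center
OLD_SPREAD = 0.55
NEW_SPREAD = 0.80


def to_display(raw: float, spread: float) -> float:
    """Map a raw 0-10 score to the display scale (unclamped)."""
    return CENTER_DISPLAY + (raw - CENTER_RAW) / spread * 10


# Display points gained per raw point under each calibration.
old_slope = 10 / OLD_SPREAD              # about 18.18
new_slope = 10 / NEW_SPREAD              # 12.5
flattening = 1 - new_slope / old_slope   # 0.3125, i.e. about 31%

# The center is untouched by the change; only the stretch differs.
print(to_display(6.5, OLD_SPREAD), to_display(6.5, NEW_SPREAD))
print(round(to_display(8.34, OLD_SPREAD), 2))  # 108.45
print(round(to_display(8.34, NEW_SPREAD), 2))  # 98.0
```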

Why we had to do this

The previous calibration was set in April 2026 when we had six shows in the dataset. The implicit assumption was that "raw 8.0 ≈ display 100" — i.e., a raw score of 8 would represent an "essentially flawless" episode, calibrated against the absolute best episode in the dataset at that time.

When we added 30 Rock on May 14, that assumption broke.

The all-time top episode in the dataset — The Office, "Dinner Party" (S04E13) — has a raw HI of 8.34. Under the old display formula, that converts to display 108.45. Above 100. That's not what 100 is supposed to mean.

It got worse with 30 Rock. Seven 30 Rock episodes scored above raw 7.875, which translated to display > 100 under the old formula. The clamp in the code caught them at exactly 100.0, but that masked real differences: "Game Over" (raw 8.08) and "Reaganing" (raw 7.91) were both displayed as the same "100" even though their scores are meaningfully different.

When the ceiling stops being a ceiling, the scale stops being useful.
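The masking effect is easy to reproduce. The sketch below assumes the clamp is a plain upper bound at 100 (the exact clamp code isn't shown in this post) and uses a hypothetical `display` helper with the two 30 Rock episodes mentioned above.

```python
CEILING = 100.0  # display ceiling applied by the clamp


def display(raw: float, spread: float, clamp: bool = True) -> float:
    """Raw-to-display mapping; the upper clamp is modeled as a plain min()."""
    value = 75 + (raw - 6.5) / spread * 10
    return min(value, CEILING) if clamp else value


episodes = {"Game Over": 8.08, "Reaganing": 7.91}

for title, raw in episodes.items():
    old = display(raw, spread=0.55)  # both hit the ceiling and read 100.0
    new = display(raw, spread=0.80)  # both stay below it, about 2 points apart
    print(f"{title:<10} raw={raw:.2f} old={old:.2f} new={new:.2f}")
```

Under the old spread the two episodes are indistinguishable after clamping; under the new spread their 0.17 raw-point gap survives into the display value.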

The recalibration

We picked the new spread (0.80) so that raw 8.5 = display 100 exactly. Why 8.5 instead of 8.34 (the current max)?

  • It gives us a small buffer above the current dataset max. The next few shows we score (Brooklyn Nine-Nine, It's Always Sunny, Big Bang Theory, Two and a Half Men) could surface a higher-scored episode. We don't want to re-recalibrate every time a new show ships.
  • 8.5 is a clean number. The methodology page is easier to defend with "raw 8.5 = ceiling" than with "raw 8.34 = ceiling, locked to whatever the current Office score is."

Under the new formula, the all-time top episode (Dinner Party) lands at display 98.0. Nothing is at exactly 100 yet. There's room above the current ceiling for future episodes that we might score higher.
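The spread follows directly from the anchor: set display to 100 in the formula and solve for the spread. A small sketch, with `spread_for_anchor` as a hypothetical helper name:

```python
def spread_for_anchor(anchor_raw: float,
                      center_raw: float = 6.5,
                      center_display: float = 75.0,
                      ceiling: float = 100.0) -> float:
    """Spread that makes `anchor_raw` land exactly on `ceiling`.

    Solves ceiling = center_display + (anchor_raw - center_raw) / spread * 10
    for spread.
    """
    return (anchor_raw - center_raw) * 10 / (ceiling - center_display)


print(spread_for_anchor(8.5))    # the chosen anchor gives a spread of 0.8
print(spread_for_anchor(8.34))   # anchoring on the current max: about 0.736
print(spread_for_anchor(7.875))  # the old spread of 0.55 put the ceiling here
```

Anchoring on 8.34 would have produced a spread of about 0.736, which leaves no headroom and ties the scale to a single episode.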

What every show shifted to

| Show | Old | New | Δ |
|---|---:|---:|---:|
| 30 Rock | 88.6 | 84.3 | −4.3 |
| Arrested Development | 85.2 | 82.0 | −3.2 |
| Parks and Recreation | 80.55 | 78.8 | −1.75 |
| The Office | 80.22 | 78.6 | −1.6 |
| Seinfeld | 79.10 | 77.8 | −1.3 |
| Friends | 78.66 | 77.5 | −1.2 |
| Schitt's Creek | 78.30 | 77.3 | −1.0 |

Shows that scored higher under the old scale shifted down more. That's the entire point of the recalibration: the old scale was over-rewarding the top end of the distribution. Scores near the center of the scale (around 75) barely moved.
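The table can be approximately reproduced from the Old column alone. This sketch assumes show-level display scores rescale linearly around 75, which matches the published New column to within rounding; `rescale` is a hypothetical helper, not a function from the site's code.

```python
OLD_SCORES = {
    "30 Rock": 88.6,
    "Arrested Development": 85.2,
    "Parks and Recreation": 80.55,
    "The Office": 80.22,
    "Seinfeld": 79.10,
    "Friends": 78.66,
    "Schitt's Creek": 78.30,
}


def rescale(old_display: float,
            old_spread: float = 0.55,
            new_spread: float = 0.80,
            center: float = 75.0) -> float:
    """Convert an unclamped old-scale display value to the new scale."""
    return center + (old_display - center) * old_spread / new_spread


for show, old in OLD_SCORES.items():
    new = rescale(old)
    # The shift grows with distance from 75: top shows move the most.
    print(f"{show:<22} {old:>6.2f} {new:>6.2f} {new - old:>+6.2f}")
```

Because the shift equals 31.25% of the distance from 75, a show sitting at 88.6 drops about four times as far as one sitting at 78.3.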

What did NOT change

To be very explicit about this:

  • Every raw HI is unchanged. If you query our database directly, the per-joke craft scores, per-joke impact scores, per-episode raw HI, and per-season raw HI are all exactly what they were before this update.
  • The rank order is identical. 30 Rock is still #1. AD is still #2. The mid-cluster (Parks, Office, Seinfeld, Friends, Schitt's) is in the same relative order.
  • The methodology that produces the raw scores is unchanged. Same 9-dimension craft rubric. Same 3-run consensus. Same Bayesian shrinkage for per-character WAR. Same format-coefficient-deprecated approach.

What this means for prior content

A handful of our prior blog posts and social posts cite specific display scores ("Seinfeld at 79.1", "Parks at 80.55", etc.). Those references are now slightly off from the live site. They're not wrong about the conclusions they drew — the rank order and the relative gaps are intact — but the absolute display numbers in those posts are from the old scale.

We're not going back to edit historical posts. The "as of when" date on each post documents what scale was live at the time. If you want to compare a number from an old post to a number on the current site, shrink the old score's distance from 75 by a factor of 0.6875 (about 0.69): `new = 75 + (old − 75) × 0.6875`. Note that this scales the gap from 75, not the score itself.
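The 0.6875 factor comes from eliminating the raw score between the two display formulas. A short derivation, where $r$ is the raw score and $d_{\text{old}}$, $d_{\text{new}}$ are shorthand introduced here for the old and new display values:

```latex
\begin{aligned}
d_{\text{old}} &= 75 + \frac{r - 6.5}{0.55} \times 10
  \;\Longrightarrow\;
  r - 6.5 = \frac{0.55\,(d_{\text{old}} - 75)}{10} \\[4pt]
d_{\text{new}} &= 75 + \frac{r - 6.5}{0.80} \times 10
  = 75 + \frac{0.55}{0.80}\,(d_{\text{old}} - 75)
  = 75 + 0.6875\,(d_{\text{old}} - 75)
\end{aligned}
```

For example, Seinfeld's old 79.1 becomes 75 + 4.1 × 0.6875 ≈ 77.8. The conversion only works for old values that were not clamped: an episode that displayed as exactly 100 under the old scale has lost the information needed to recover its new score.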

The 30 Rock launch post from yesterday has been updated to reflect the new scale.

What we'd do differently next time

The honest read on this incident: we should have recalibrated the display scale when we set up the methodology page in April, not waited for 30 Rock to expose the problem. The display formula was set in February 2026 against a much smaller dataset and was never re-anchored as the dataset grew. The clamp at 100 in the code papered over the issue for a while but eventually a high-scoring show was going to break it visibly.

Going forward:

  • The display scale will be re-anchored whenever a new show pushes the raw maximum above 8.5. We'll document each recalibration in a methodology post like this one.
  • The methodology page (/methodology) now states the current display formula and the anchor (raw 8.5 = display 100) explicitly.
  • We'll publish a "what shifted" comparison post like this one alongside any future recalibration. Public methodology, public revisions.

Why we're telling you

We could have done this silently. Most analytics products quietly tune their scales.

The reason we're publishing this instead is that the brand is built around methodology transparency. We publish the ICC noise floor. We publish confidence intervals on show rankings. We publish the fact that the Office/Seinfeld/Friends/Parks/Schitt's cluster is statistically a wash. The Humor Index isn't useful if we hide what's actually happening inside it.

A scoring system that adjusts itself without saying so isn't really a scoring system. It's a vibes engine with numbers.

---

The full display formula is documented at our [methodology page](/methodology). The technical write-up of why 0.55 was wrong and 0.80 is right is at [our scorer-noise-floor post](/blog/scorer-noise-floor) under the calibration section. Questions: hello@thehumorindex.com.
