Why nine-boxes drift up
The nine-box performance grid drifts top-heavy because every manager has a strong incentive to rate their reports highly: it makes promotion cases easier, reduces conflict, and signals competence. Without calibration, top-heavy drift is the default state.
Companies that ignore drift end up with eighty per cent of the workforce rated 'meets or exceeds' and a compensation bill that does not match the underlying performance distribution. The mismatch shows up in talent-density metrics two years later.
The agenda that works
Calibration sessions work best when they are short, structured, and run at skip level. Each manager presents three reports they have rated above average and three they have rated below. The skip-level manager chairs the discussion. Other managers can challenge a rating, but only with specific examples, not opinions.
Ratings are not finalised in the room. The chair takes the discussion offline and circulates the final distribution within forty-eight hours. Our performance product hosts the rating data; the ritual is what makes calibration real.
The disagreement-resolution rule
When two managers disagree on a rating, the rule we ship is: the manager closer to the work decides, the chair documents the alternative view, and the rating is revisited in the next cycle. This avoids the political dynamic of managers arguing over ratings in real time.
Pair this with structured interview kits at the front of the funnel and you have a coherent talent system.
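For teams that track calibration outcomes in their own tooling, the disagreement-resolution rule above can be sketched in a few lines of Python. This is an illustration only, not how our product implements it: the names Position, Resolution and resolve are hypothetical, and "closer to the work" is approximated by an is_direct_manager flag, which is an assumption the rule itself does not make.

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    manager: str
    rating: int
    # Assumption: the direct manager counts as "closer to the work".
    is_direct_manager: bool


@dataclass
class Resolution:
    employee: str
    final_rating: int
    decided_by: str
    dissent: list = field(default_factory=list)  # what the chair documents
    revisit_next_cycle: bool = False


def resolve(employee, a, b):
    """Apply the rule: closer manager decides, dissent is recorded, next cycle revisits."""
    if a.is_direct_manager == b.is_direct_manager:
        # The rule needs exactly one manager who is closer to the work.
        raise ValueError("cannot tell which manager is closer to the work")
    decider, other = (a, b) if a.is_direct_manager else (b, a)
    disagree = decider.rating != other.rating
    dissent = [f"{other.manager} argued for {other.rating}"] if disagree else []
    return Resolution(employee, decider.rating, decider.manager, dissent, disagree)
```

Calling resolve("e1", Position("ana", 4, True), Position("ben", 3, False)) keeps ana's rating of 4, records ben's view as dissent, and flags the rating for the next cycle.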
What you measure quarterly
Three numbers: rating distribution per manager, rating-to-promotion conversion, and self-vs-manager delta. The dashboards live in our manager view. If a manager's distribution diverges sharply from the org's, that goes on the agenda for the next calibration cycle.
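If you want to sanity-check these numbers outside the dashboards, here is a minimal Python sketch of how they could be computed from an export of review records. Everything in it is an assumption for illustration: the Review fields, the integer rating scale, and the use of total variation distance as the measure of how far a manager's distribution sits from the org's. It is not a description of how the manager view calculates them.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Review:
    # Hypothetical record shape; ratings assumed to be integers (e.g. 1 to 5).
    manager: str
    employee: str
    manager_rating: int
    self_rating: int
    promoted: bool


def distribution_per_manager(reviews):
    """Share of each rating value within each manager's team."""
    counts = defaultdict(Counter)
    for r in reviews:
        counts[r.manager][r.manager_rating] += 1
    return {
        m: {rating: n / sum(c.values()) for rating, n in sorted(c.items())}
        for m, c in counts.items()
    }


def promotion_conversion(reviews):
    """Fraction of people at each rating who were promoted."""
    total, promoted = Counter(), Counter()
    for r in reviews:
        total[r.manager_rating] += 1
        promoted[r.manager_rating] += r.promoted  # bool counts as 0 or 1
    return {rating: promoted[rating] / total[rating] for rating in sorted(total)}


def self_vs_manager_delta(reviews):
    """Mean of (self rating - manager rating) per manager."""
    deltas = defaultdict(list)
    for r in reviews:
        deltas[r.manager].append(r.self_rating - r.manager_rating)
    return {m: mean(d) for m, d in deltas.items()}


def divergence_from_org(reviews):
    """Total variation distance between each manager's distribution and the org's.

    0.0 means identical to the org; 1.0 means no overlap at all.
    """
    org = Counter(r.manager_rating for r in reviews)
    org_total = sum(org.values())
    org_share = {k: v / org_total for k, v in org.items()}
    result = {}
    for m, dist in distribution_per_manager(reviews).items():
        keys = set(org_share) | set(dist)
        result[m] = 0.5 * sum(
            abs(dist.get(k, 0.0) - org_share.get(k, 0.0)) for k in keys
        )
    return result
```

A positive self-vs-manager delta means reports rate themselves higher than their manager does. What counts as a "sharp" divergence is a judgment call for each org; the sketch deliberately leaves the threshold out.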