Maintain a scored prediction log
Record predictions with explicit probabilities and score them when they resolve.
Why it works
Calibration requires a feedback loop: you state a probability, the outcome happens, and you learn whether your stated confidence matched how often predictions made at that confidence level actually came true. Without recording and scoring, confidence remains a feeling rather than a measurable quantity. The Brier score is the standard metric: lower is better, and because the error is squared, confident wrong predictions are penalized far more heavily than cautious ones, while confident correct predictions score close to zero.
How to do it
- Set up a simple log: date, question, probability estimate, outcome (when resolved).
- Record at least five to ten predictions per month to generate enough data for calibration analysis.
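- Score each resolved prediction using the Brier score formula: (probability − outcome)², where outcome is 1 if the event happened and 0 if it did not. Scores range from 0 (perfect) to 1 (fully confident and wrong); average them across predictions to get your overall score.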
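- Plot your average Brier score over time, for example month by month. A sustained trend toward lower scores suggests real improvement, provided the questions have not simply become easier.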
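The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Prediction` class, its field names, and the sample entries are all hypothetical, and a spreadsheet with the same four columns would work just as well.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Prediction:
    made_on: date
    question: str
    probability: float             # stated chance the event happens, 0.0 to 1.0
    outcome: Optional[int] = None  # 1 = happened, 0 = did not, None = unresolved


def brier(p: Prediction) -> float:
    """Squared error between the stated probability and the 0/1 outcome."""
    if p.outcome is None:
        raise ValueError("prediction is not resolved yet")
    return (p.probability - p.outcome) ** 2


def mean_brier(log: list[Prediction]) -> Optional[float]:
    """Average Brier score over resolved predictions; None if none are resolved."""
    scores = [brier(p) for p in log if p.outcome is not None]
    return sum(scores) / len(scores) if scores else None


# Illustrative entries only, not real data.
log = [
    Prediction(date(2024, 1, 5), "Project ships by end of March", 0.8, 1),
    Prediction(date(2024, 1, 9), "Candidate accepts the offer", 0.9, 0),
    Prediction(date(2024, 2, 2), "Quarterly target is met", 0.6, None),
]

print(mean_brier(log))  # unresolved entries are skipped
```

A useful reference point when reading the result: always answering 50% scores exactly 0.25 on every question, so an average above 0.25 means the stated probabilities did worse than saying "no idea" each time.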
Evidence
Tetlock’s forecasting tournaments used Brier scoring throughout; tracked, scored forecasters improved significantly more than those who forecast without feedback. The scoring mechanism is the operationalization of calibration feedback and is central to the good-judgment research program. (observational)
Personal prediction logs require discipline to maintain and only become statistically meaningful after sufficient volume — typically hundreds of predictions for fine-grained calibration.
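To see why volume matters, it helps to look at how a calibration check is usually done: forecasts are grouped into probability buckets and the average stated probability in each bucket is compared with how often those events actually happened. The sketch below assumes ten-point buckets and uses made-up data; the function name `calibration_table` and its parameters are illustrative choices, not a standard API.

```python
from collections import defaultdict


def calibration_table(forecasts, bucket_pct=10):
    """Group (probability, outcome) pairs into percentage buckets and return
    (mean stated probability, observed frequency, count) for each bucket."""
    n_buckets = 100 // bucket_pct
    buckets = defaultdict(list)
    for prob, outcome in forecasts:
        # Work in whole percentages to avoid floating-point bucket errors,
        # and clamp so that a probability of 1.0 lands in the top bucket.
        idx = min(round(prob * 100) // bucket_pct, n_buckets - 1)
        buckets[idx].append((prob, outcome))

    table = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_prob = sum(p for p, _ in rows) / len(rows)
        hit_rate = sum(o for _, o in rows) / len(rows)
        table.append((mean_prob, hit_rate, len(rows)))
    return table


# Illustrative (probability, outcome) pairs only, not real data.
forecasts = [
    (0.9, 1), (0.9, 1), (0.9, 0), (0.9, 1),
    (0.7, 1), (0.7, 0),
    (0.2, 0), (0.2, 0),
]

for mean_prob, hit_rate, n in calibration_table(forecasts):
    print(f"stated {mean_prob:.0%}  observed {hit_rate:.0%}  n={n}")
```

With only a handful of forecasts per bucket, a single outcome swings the observed frequency by tens of percentage points, which is why a fine-grained calibration picture needs a large log.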
Sources
- Tetlock (2005), Expert Political Judgment
- Tetlock & Gardner (2015), Superforecasting
Common mistake
Recording predictions in vague language ("I think it’ll work out") rather than numerical probabilities, which makes scoring impossible and calibration unmeasurable.
Practice this with IX Coach
IX Coach maintains your prediction log automatically across sessions, scoring resolved forecasts and computing your running calibration so you can see improvement rather than just feel it.