The Academy Awards are a week away, and I’m sharing my machine-learning-based predictions for Best Picture as well as some insights I took away from the process (particularly XGBoost’s sparsity-aware split finding). Oppenheimer is a heavy favorite with a 97% chance of winning, but major surprises are not uncommon, as we’ll see.
I pulled data from three sources. First, industry awards. Most unions and guilds for filmmakers (producers, directors, actors, cinematographers, editors, production designers) have their own awards. Second, critical awards. I collected as many as I could, from the Golden Globes to the Georgia Film Critics Association. More or less: If an organization had a Wikipedia page showing a historical list of nominees and/or winners, I scraped it. Third, miscellaneous information like Metacritic score and keywords taken from synopses, which indicate whether a film was adapted from a book, what genre it is, the topics it covers, and so on. Combining all of these was a pain, especially for films that have bonkers names like BİRDMAN or (The Unexpected Virtue of Ignorance).
The source data generally aligns with what FiveThirtyEight used to do, except that I cast a far wider net in collecting awards. Another difference is that FiveThirtyEight chose a closed-form solution for weighting the importance of awards and then rated films by the “points” they accrued (out of the potential pool of points) throughout the season. I chose to build a machine learning model, which was tricky.
To make merging the data feasible (e.g., different tables had different spellings of a film’s title or different years associated with it), I only looked at films that received a nomination for Best Picture, making for a tiny dataset of 591 rows across the first 95 ceremonies. The wildly small N presents a challenge for building a machine learning model, as do sparsity and missing data.
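A minimal sketch of how titles from different sources could be matched before merging, in plain Python. This is not the code behind this model: the normalization rules (strip accents, fold case, drop punctuation) and the one-year tolerance for sources that date a film differently are assumptions, and `normalize_title` and `find_match` are hypothetical helpers.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a film title to a comparison key (assumed rules, for illustration)."""
    # NFKD splits characters like "İ" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = stripped.casefold().replace("&", " and ")
    # Remove punctuation such as straight/curly apostrophes, colons, and parentheses.
    no_punct = re.sub(r"[^\w\s]", "", folded)
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punct.split())


def find_match(title, year, candidates, year_tolerance=1):
    """Return the first (title, year) candidate with the same key and a nearby year.

    year_tolerance is a hypothetical allowance for sources that date a film
    by a different year (e.g. release year versus ceremony year).
    """
    key = normalize_title(title)
    for cand_title, cand_year in candidates:
        if normalize_title(cand_title) == key and abs(cand_year - year) <= year_tolerance:
            return cand_title, cand_year
    return None


print(normalize_title("BİRDMAN or (The Unexpected Virtue of Ignorance)"))
# birdman or the unexpected virtue of ignorance
print(find_match("The King's Speech", 2010, [("127 hours", 2010), ("the king’s speech", 2011)]))
# ('the king’s speech', 2011)
```

Normalizing both sides to the same key keeps the join logic simple; anything the rules cannot reconcile (such as a short title in one source and a full subtitle in another) would still need a manual lookup table.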
Sparsity and Missing Data
There are a ton of zeroes in the data, creating sparsity. Every variable (save for the Metacritic score) is binary. Nomination variables (i.e., was the film nominated for the award?) may have multiple films for a given year with a 1, but winning variables (i.e., did the film win the award?) only have a single 1 each year.
There is also the challenge of missing data. Not every award in the model goes back to the late 1920s, so a film gets an `NA` for any award that did not yet exist in the year it was released. For example, I only included Metacritic scores for contemporaneous releases, and the site launched in 2001, while the Screen Actors Guild started its awards in 1995.
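One way to encode this, sketched in plain Python: nomination and win columns hold 0 or 1, a column becomes NaN (rather than 0) for films released before that source existed, and a quick check confirms no winning column has more than one 1 in a year. The column names and the `START_YEAR` table are hypothetical; only the 1995 and 2001 start years come from the text, and treating them as release years is an assumption. NaN is used because XGBoost’s Python interface treats NaN as missing by default.

```python
import math
from collections import defaultdict

# First year each source is available, per the text. Column names are
# hypothetical, and treating these as release years is an assumption.
START_YEAR = {"sag_nominated": 1995, "sag_won": 1995, "metacritic_score": 2001}


def encode_row(release_year, raw):
    """Encode one film: NaN where a source did not exist yet, else a float."""
    row = {"year": release_year}
    for name, value in raw.items():
        start = START_YEAR.get(name)
        if start is not None and release_year < start:
            row[name] = math.nan  # missing, not zero: the award had not been given yet
        else:
            row[name] = float(value)  # binary 0/1 flags, or the Metacritic score
    return row


def check_single_winner(rows, column):
    """Raise if a winning column has more than one 1 in any year."""
    winners_per_year = defaultdict(int)
    for row in rows:
        if row[column] == 1.0:  # NaN never equals 1.0, so missing rows are skipped
            winners_per_year[row["year"]] += 1
    repeats = sorted(y for y, n in winners_per_year.items() if n > 1)
    if repeats:
        raise ValueError(f"{column} has more than one winner in {repeats}")


# Illustrative values, not real films.
old = encode_row(1990, {"sag_won": 0, "metacritic_score": 0})
new = encode_row(2009, {"sag_won": 1, "metacritic_score": 94})
print(old)  # {'year': 1990, 'sag_won': nan, 'metacritic_score': nan}
print(new)  # {'year': 2009, 'sag_won': 1.0, 'metacritic_score': 94.0}
```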
My first thought was an ensemble model: Segment the awards into groups based on their start dates, fit a separate model to each group, get predicted probabilities from these, and combine them, weighted by the inverse of each model’s out-of-sample error. After experimenting a bit, I came to the conclusion so many of us do when building models: Use XGBoost. With so little data to use for tuning, I simply stuck with the default hyperparameters.
Beyond its reputation for being accurate out of the box, XGBoost handles missing data. The docs simply state: “XGBoost supports missing values by default. In tree algorithms, branch directions for missing values are learned during training.” This is discussed in deeper detail in the “sparsity-aware split finding” section of the paper introducing XGBoost. The full algorithm is shown in that paper, but the general idea is that each split in a tree learns a default direction from the data: While searching for the split, the algorithm tries sending all missing values to the left branch and then to the right, keeps whichever yields the larger gain, and at prediction time missing values follow that default.
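To make the idea concrete, here is a simplified, plain-Python sketch of sparsity-aware split finding for a single feature. For each candidate threshold it computes the gain from the XGBoost paper’s split formula (with the γ penalty omitted) twice, once sending all missing rows left and once sending them right, and keeps the better direction as the learned default. It is an illustration only: XGBoost itself also applies regularization such as `min_child_weight` and `gamma`, and can use approximate or histogram-based split candidates instead of an exact scan. The toy data and λ = 1 are assumptions.

```python
import math


def structure_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Loss reduction from a split: 1/2 [GL²/(HL+λ) + GR²/(HR+λ) - (GL+GR)²/(HL+HR+λ)]."""
    def score(g, h):
        return g * g / (h + lam)

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_split_with_default(x, grad, hess, lam=1.0):
    """Exact search over one feature; returns (gain, threshold, default_direction).

    Rows with x < threshold go left, x >= threshold go right, and NaN rows
    go to whichever side (the learned default) gives the larger gain.
    """
    present = [(xi, g, h) for xi, g, h in zip(x, grad, hess) if not math.isnan(xi)]
    g_miss = sum(g for xi, g in zip(x, grad) if math.isnan(xi))
    h_miss = sum(h for xi, h in zip(x, hess) if math.isnan(xi))
    values = sorted({xi for xi, _, _ in present})

    best = (-math.inf, None, None)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighboring observed values
        g_l = sum(g for xi, g, _ in present if xi < threshold)
        h_l = sum(h for xi, _, h in present if xi < threshold)
        g_r = sum(g for xi, g, _ in present if xi >= threshold)
        h_r = sum(h for xi, _, h in present if xi >= threshold)
        # Try both default directions for the missing rows at this threshold.
        candidates = {
            "left": (g_l + g_miss, h_l + h_miss, g_r, h_r),
            "right": (g_l, h_l, g_r + g_miss, h_r + h_miss),
        }
        for direction, sums in candidates.items():
            gain = structure_gain(*sums, lam=lam)
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best


if __name__ == "__main__":
    nan = math.nan
    # Toy binary feature (e.g. "won award X"), NaN where the award did not exist yet.
    x = [1.0, 1.0, 0.0, 0.0, nan, nan]
    y = [1, 1, 0, 0, 1, 1]
    # Logistic-loss gradients and hessians around a starting prediction of 0.5.
    p = 0.5
    grad = [p - yi for yi in y]
    hess = [p * (1 - p) for _ in y]
    print(best_split_with_default(x, grad, hess))  # threshold 0.5, default "right"
```

Because the missing rows in the toy data behave like the rows with a 1, the search sends them right, alongside those rows.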
Backtesting
To assess performance, I backtested on the last thirty years of Academy Awards. In scikit-learn terms, this is leave-one-group-out cross-validation, with each year as a group: I removed a given year from the dataset, fit the model on the remaining years, and then made predictions on the held-out year. The last hiccup is that the model does not know that if Movie A from Year X wins Best Picture, Movies B through E from Year X cannot. Nor does it know that one of the films from Year X must win. My workaround is to rescale the predicted probabilities within each year so that they sum to one.
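The backtesting loop and the rescaling step can be sketched in plain Python. scikit-learn’s `LeaveOneGroupOut`, with the year passed as `groups`, produces the same train/test folds; the sketch spells them out so the rescaling is visible. The `fit_and_predict` hook stands in for fitting the model and predicting win probabilities, and its name and the row layout (dicts with `year` and `title`) are hypothetical.

```python
from collections import defaultdict


def leave_one_year_out(rows, test_years):
    """Yield (year, train_rows, test_rows): train on every other year, test on one."""
    for year in test_years:
        train = [r for r in rows if r["year"] != year]
        test = [r for r in rows if r["year"] == year]
        yield year, train, test


def rescale_by_year(years, probs):
    """Divide each probability by its year's total so each year's nominees sum to 1."""
    totals = defaultdict(float)
    for year, p in zip(years, probs):
        totals[year] += p
    return [p / totals[year] for year, p in zip(years, probs)]


def backtest(rows, test_years, fit_and_predict):
    """fit_and_predict(train_rows, test_rows) -> raw win probabilities (hypothetical hook)."""
    results = {}
    for year, train, test in leave_one_year_out(rows, test_years):
        raw = fit_and_predict(train, test)
        scaled = rescale_by_year([year] * len(test), raw)
        # Highest rescaled probability first: the first entry is the predicted winner.
        results[year] = sorted(zip(scaled, (r["title"] for r in test)), reverse=True)
    return results


print(rescale_by_year([2016] * 4, [0.75, 0.5, 0.25, 0.5]))  # [0.375, 0.25, 0.125, 0.25]
```

Dividing by the year’s total enforces “exactly one winner” only in aggregate; it does not make the model aware of the constraint while fitting.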
The predictions for the last thirty years:
Year | Predicted Winner | Modeled Win Probability | Won Best Picture? | Actual Winner |
---|---|---|---|---|
1993 | schindler’s list | 0.996 | 1 | schindler’s list |
1994 | forrest gump | 0.990 | 1 | forrest gump |
1995 | apollo 13 | 0.987 | 0 | braveheart |
1996 | the english patient | 0.923 | 1 | the english patient |
1997 | titanic | 0.980 | 1 | titanic |
1998 | saving private ryan | 0.938 | 0 | shakespeare in love |
1999 | american beauty | 0.995 | 1 | american beauty |
2000 | gladiator | 0.586 | 1 | gladiator |
2001 | a beautiful mind | 0.554 | 1 | a beautiful mind |
2002 | chicago | 0.963 | 1 | chicago |
2003 | the lord of the rings: the return of the king | 0.986 | 1 | the lord of the rings: the return of the king |
2004 | the aviator | 0.713 | 0 | million dollar baby |
2005 | brokeback mountain | 0.681 | 0 | crash |
2006 | the departed | 0.680 | 1 | the departed |
2007 | no country for old men | 0.997 | 1 | no country for old men |
2008 | slumdog millionaire | 0.886 | 1 | slumdog millionaire |
2009 | the hurt locker | 0.988 | 1 | the hurt locker |
2010 | the king’s speech | 0.730 | 1 | the king’s speech |
2011 | the artist | 0.909 | 1 | the artist |
2012 | argo | 0.984 | 1 | argo |
2013 | 12 years a slave | 0.551 | 1 | 12 years a slave |
2014 | birdman | 0.929 | 1 | birdman |
2015 | spotlight | 0.502 | 1 | spotlight |
2016 | la la land | 0.984 | 0 | moonlight |
2017 | the shape of water | 0.783 | 1 | the shape of water |
2018 | roma | 0.928 | 0 | green book |
2019 | parasite | 0.576 | 1 | parasite |
2020 | nomadland | 0.878 | 1 | nomadland |
2021 | the power of the dog | 0.981 | 0 | coda |
2022 | everything everywhere all at once | 0.959 | 1 | everything everywhere all at once |
Of the last 30 years, 23 predicted winners actually won and 7 lost, for an accuracy of about 77%. Not terrible. (And, paradoxically, many of the misses are predictable ones to those familiar with Best Picture history.) However, the mean predicted probability of winning across these 30 cases is about 85%, which means the model is overconfident by roughly 8 percentage points. Recent years do seem more prone to upsets. Is that due to a larger pool of nominees? Or something else, like a change in the Academy’s makeup or voting procedures? At any rate, some ideas I am going to play with before next year are weighting more recent years higher (as rules, the voting body, voting trends, etc., change over time), finding additional awards, and pulling in other metadata on films. It might just be, though, that the Academy likes to swerve away from everyone else sometimes in a way that is not readily predictable from outside data sources. (Hence the fun of watching and speculating and modeling in the first place.)
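The accuracy and overconfidence figures can be checked directly against the backtest table: count how often the predicted winner won, average the modeled probabilities, and take the difference. The sketch below does this in plain Python with the 30 rows copied from the table; `summarize` is a hypothetical helper name.

```python
# (modeled win probability of the predicted winner, 1 if it won Best Picture),
# 1993 through 2022, copied from the backtest table.
BACKTEST = [
    (0.996, 1), (0.990, 1), (0.987, 0), (0.923, 1), (0.980, 1),
    (0.938, 0), (0.995, 1), (0.586, 1), (0.554, 1), (0.963, 1),
    (0.986, 1), (0.713, 0), (0.681, 0), (0.680, 1), (0.997, 1),
    (0.886, 1), (0.988, 1), (0.730, 1), (0.909, 1), (0.984, 1),
    (0.551, 1), (0.929, 1), (0.502, 1), (0.984, 0), (0.783, 1),
    (0.928, 0), (0.576, 1), (0.878, 1), (0.981, 0), (0.959, 1),
]


def summarize(results):
    """Return (accuracy, mean predicted probability, overconfidence gap)."""
    accuracy = sum(won for _, won in results) / len(results)
    mean_confidence = sum(p for p, _ in results) / len(results)
    return accuracy, mean_confidence, mean_confidence - accuracy


accuracy, confidence, gap = summarize(BACKTEST)
print(f"accuracy {accuracy:.1%}, mean confidence {confidence:.1%}, gap {gap * 100:.1f} points")
# accuracy 76.7%, mean confidence 85.1%, gap 8.5 points
```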
This Year
I wanted to include a chart showing probabilities over time, but the story has largely remained the same. The major inflection point was the Directors Guild of America (DGA) Awards.
Using the data available on the day the nominees were announced (January 23rd), the predictions were:
Film | Predicted Probability |
---|---|
Killers of the Flower Moon | 0.549 |
The Zone of Interest | 0.160 |
Oppenheimer | 0.147 |
American Fiction | 0.061 |
Barbie | 0.039 |
Poor Things | 0.023 |
The Holdovers | 0.012 |
Past Lives | 0.005 |
Anatomy of a Fall | 0.005 |
Maestro | 0.001 |
I was shocked to see Oppenheimer lagging in third and The Zone of Interest so high. The reason is that, while backtesting, I saw that the variable importance for winning the DGA award for Outstanding Directing - Feature Film was the highest by about a factor of ten, and that winner had not been announced yet. Since XGBoost handles missing values nicely, we can rely on sparsity-aware split finding to get a little more information from these data. If we know the nominees for an award but not yet the winner, we can still infer something: Anyone who was nominated is left `NA`, while anyone who was not nominated is set to zero, since they cannot win. That allows us to partially use this DGA variable (and the other awards where we knew the nominees on January 23rd, but not the winners). When we do that, the predicted probabilities as of the announcement of the Best Picture nominees were:
Film | Predicted Probability |
---|---|
Killers of the Flower Moon | 0.380 |
Poor Things | 0.313 |
Oppenheimer | 0.160 |
The Zone of Interest | 0.116 |
American Fiction | 0.012 |
Barbie | 0.007 |
Past Lives | 0.007 |
Maestro | 0.003 |
Anatomy of a Fall | 0.002 |
The Holdovers | 0.001 |
The Zone of Interest drops and Poor Things climbs, since the latter was nominated for the DGA award while the former was not. I was still puzzled, but I knew the model wouldn’t become confident until the DGA winner was announced, because those top three films were nominated for many of the same awards. Then Christopher Nolan won the DGA award for Oppenheimer, and the film hasn’t dropped below a 95% chance of winning Best Picture since.
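The nominee-based encoding for awards whose winners are still pending reduces to a small rule, sketched here in plain Python. `encode_win` is a hypothetical helper, and the example uses only the DGA facts stated in this post (Poor Things and Oppenheimer were nominated, The Zone of Interest was not, and Oppenheimer won).

```python
import math


def encode_win(nominated, winner_announced, won=False):
    """Win feature for one film and one award, given what is known so far."""
    if not nominated:
        return 0.0  # a film cannot win an award it was not nominated for
    if not winner_announced:
        return math.nan  # still in contention: missing, so the learned default applies
    return 1.0 if won else 0.0


# DGA feature before and after the winner was announced.
nominated = {"Oppenheimer": True, "Poor Things": True, "The Zone of Interest": False}
before = {film: encode_win(nom, winner_announced=False) for film, nom in nominated.items()}
after = {
    film: encode_win(nom, winner_announced=True, won=(film == "Oppenheimer"))
    for film, nom in nominated.items()
}
print(before)  # {'Oppenheimer': nan, 'Poor Things': nan, 'The Zone of Interest': 0.0}
print(after)   # {'Oppenheimer': 1.0, 'Poor Things': 0.0, 'The Zone of Interest': 0.0}
```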
Final Predictions
The probabilities as they stand today, a week before the ceremony, have Oppenheimer as the presumptive winner at a 97% chance of winning.
Film | Predicted Probability |
---|---|
Oppenheimer | 0.973 |
Poor Things | 0.010 |
Killers of the Flower Moon | 0.005 |
The Zone of Interest | 0.004 |
Anatomy of a Fall | 0.003 |
American Fiction | 0.002 |
Past Lives | 0.001 |
Barbie | 0.001 |
The Holdovers | 0.001 |
Maestro | 0.000 |
There are a few awards being announced tonight (the Satellite Awards and the awards from the cinematographers guild and the editors guild), but they should not move the model much. So we are in for either a year with a predictable winner or another shocking year where a CODA or a Moonlight takes home film’s biggest award. (If you’ve read this far and enjoyed Cillian Murphy in Oppenheimer… go check out his leading performance in Sunshine, directed by Danny Boyle and written by Alex Garland.)