Fun with Numbers
Don’t Guess with SOS
A Collaboration by Dave Bartoo and Keith Garrett
In this new era of sports data, fans often confuse stats with predictive analytics. Stats are simply the numbers you look back at to find the trends that build predictive analytics. ‘Moneyball’ is the standard go-to for fans entering this discussion, and the spray chart is a great example of stats versus predictive analytics. Someone took all the ‘stats’ of where each individual player hit the ball in play and built a ‘predictive model’ of the best odds of where a ball will be hit in any given at-bat, in order to put the defense in the best possible position.
The Playoff Committee now faces a dilemma: a mountain of stats anyone can access, but no predictive analytics to help them assess the numbers and improve their odds of picking the four best teams. Without help assessing their ideas, they are simply guessing, or worse, having someone tell them the best stats to use. The great part about having access to all the public information the Playoff Committee has is that we can all help them by running the numbers on any given assumption. Let’s start with Strength of Schedule.
Imagine Retired Lt. General Michael Gould asking you the question: “When I look at the top available SOS ranking systems, I see that Florida State had an average SOS of 65.8 and won the national title over a team with a top ten SOS. It was the same in the 2013 title game, when Alabama, at no. 30, won their title over a team with a much stronger SOS. With the final no. 1 ranked team over the last five years having an average SOS of no. 29, is using SOS rankings a good predictive model for choosing the better team?” While those numbers are correct and a great start to a bigger question, they do not establish a trend, nor should they be used as an indicator for predicting the better team.
The idea of SOS has been a hot topic with fans for years, as they believed it was not being used by the voters and computers in the BCS formulas. Fans want SOS to be real and to be used because it directly benefits them: make SOS relevant and you get better games to watch, TV makes more money, and everyone is happy. That is why any negative view of SOS will be met with the Backfire Effect.
When fans look to SOS rankings, there are five sources that are commonly accepted and have data going back five seasons: Sagarin, GEB, Football Outsiders/F+, the NCAA and Phil Steele. With all the rankings and all the bowl results, we can easily look at the predictive power of these SOS models. The immediate fan argument is, “Not every team plays all out in a bowl.” I agree, so our first pass looked at ‘classic’ bowl games and national title games.
BIG BOWLS PROFILED
Everyone would likely agree that each team brought its ‘A’ game to these 26 bowls. We used bowl ‘rankings’ from the ESPN, Bleacher Report, Yahoo and Fox Sports websites to find consensus on the ‘classic’ bowl games.
Of these 26 solid bowl games, only 38% were won by the team with the better Strength of Schedule. It is a small sample, but 10 out of 26 does not support the idea that SOS is a predictive metric for the Playoff Committee members.
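The 38% figure comes from a straightforward tally. A minimal sketch of that tally follows; the game data below is hypothetical (the real list covers the 26 ‘classic’ bowls), and the function name is ours, purely for illustration.

```python
# Sketch of the tally behind the 38% figure. Each game is recorded as
# (winner's SOS rank, loser's SOS rank); a lower rank means a tougher
# schedule. The data here is made up, not the article's actual bowl list.
def better_sos_win_rate(games):
    """Fraction of games won by the team with the better (lower) SOS rank."""
    wins = sum(1 for winner_rank, loser_rank in games if winner_rank < loser_rank)
    return wins / len(games)

# Toy data: in 5 games, the better-SOS team wins only 2.
toy_games = [(3, 10), (20, 4), (15, 2), (1, 30), (40, 12)]
rate = better_sos_win_rate(toy_games)
print(f"{rate:.0%}")  # prints "40%"
```

With the article's real data, the same loop over 26 games yields 10 wins, or roughly 38%.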
ALL BOWLS PROFILED
The committee has to pick the top 12 teams for the bowls, so a narrow look at a handful of big bowls is not enough information to confirm or deny the use of SOS systems. The numbers do not get any better when looking at all the bowls over the last five seasons.
None of the five systems beat coin-flip odds in picking the winning team in more than two seasons. Only five times in the twenty-five spots in this matrix did a system go above 55% for a bowl season. Football Outsiders was the best over the last five years at 53.5%, and the NCAA rankings, widely regarded as the worst model, were not far behind, though still the lowest at 49.4%.
A model finished under .500 for a bowl season an amazing eight times, with Phil Steele, Sagarin and FO each recording it twice. That makes 16 of these 25 individual results at or below 50% accuracy, which is no better than flipping a coin.
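The matrix arithmetic above is easy to reproduce. Here is a sketch of that bookkeeping; the accuracy values in this matrix are invented placeholders, not the article's actual season-by-season numbers.

```python
# Hypothetical per-season accuracy matrix: five systems x five bowl
# seasons. These values are illustrative only; the real numbers come
# from the five SOS sources named above.
matrix = {
    "Sagarin":     [0.52, 0.48, 0.55, 0.47, 0.51],
    "GEB":         [0.50, 0.53, 0.49, 0.52, 0.50],
    "FO":          [0.56, 0.49, 0.54, 0.48, 0.55],
    "NCAA":        [0.47, 0.50, 0.51, 0.49, 0.50],
    "Phil Steele": [0.49, 0.52, 0.48, 0.53, 0.50],
}

# Five-year average accuracy for each system.
averages = {system: sum(seasons) / len(seasons) for system, seasons in matrix.items()}

# Season-level counts like the ones discussed above.
under_500 = sum(1 for seasons in matrix.values() for acc in seasons if acc < 0.5)
at_or_below_500 = sum(1 for seasons in matrix.values() for acc in seasons if acc <= 0.5)

print(averages["FO"], under_500, at_or_below_500)
```

Swapping in the real matrix, this is exactly the counting that produces the 53.5% and 49.4% averages and the eight under-.500 seasons cited above.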
It is important to understand that four of these models use extensive stats and metrics, while the NCAA model, whose results are not dissimilar to the others, uses raw data to rank its SOS: it takes a team's schedule and adds up its opponents' wins from the previous year. There is absolutely no opinion, no projection, and no use of stats from the current season.
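That raw win-count approach is simple enough to sketch in a few lines. The team names and win totals below are hypothetical, chosen only to show the mechanics of summing opponents' prior-season wins.

```python
# Sketch of the raw win-count SOS described above: for each opponent on a
# team's schedule, add that opponent's win total from the previous season.
# All names and numbers here are hypothetical.
last_year_wins = {"Team A": 10, "Team B": 7, "Team C": 4, "Team D": 11}

def raw_sos(schedule, prior_wins):
    """Raw SOS score: total prior-season wins across a team's opponents."""
    return sum(prior_wins[opponent] for opponent in schedule)

schedule = ["Team A", "Team B", "Team C", "Team D"]
print(raw_sos(schedule, last_year_wins))  # prints 32 (10 + 7 + 4 + 11)
```

No current-season stats, opinions, or projections enter the calculation, which is what distinguishes this model from the four metric-based ones.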
While fans and the people covering and producing the games desperately want SOS to be used to force tougher schedules and better games, it does not appear to be a fair tool for evaluating teams. As a reward tool, yes; as a tool to separate better teams, no. Without the ability to analyze the predictive value of any stat or ranking, the committee is simply guessing, or being told which stats to use, in picking the top 12 teams.
While the current SOS rankings are not predictive enough for the Playoff Committee to use with any certainty, the answers for predictive analytics, including SOS rankings, are out there; someone just needs to ask how to find them.