NFL Predictions: Winner by Record
Michael Lehan (Getty Images)

What’s This About 🤔

Welcome to NFL Predictions, a series where we dive into what it takes to make NFL predictions. In this first part, Winner by Record, we’ll explore how the team seasonal record impacts the likelihood of winning games.

As the series unfolds, we’ll develop tools and eventually create a predictive model to answer the ultimate question: Who’s going to win? Let’s get started!

Disclaimers

First, I’m relatively new to American football, with just three years of watching under my belt. While I still have plenty to learn, this fresh perspective allows me to focus solely on the numbers, free from bias.

Second, while I mention betting, I discourage gambling for profit. A small bet for fun or tracking predictions is fine, but gambling to earn money is addictive and harmful.

Summary

In this article, we explored whether a team's season win and loss record can predict game outcomes. In the end, we built a very simple ML model that predicts winners with 80% accuracy, covering about 40 cases per year.

The Data

For this series, we'll use the play-by-play data provided by nflfastR (Big shoutout to nflverse). The dataset contains 372 columns of play-by-play data, spanning from 09/12/1999 to 12/02/2024, with a total of 26 seasons, 6,898 games, and 1,154,352 plays.

Below is a quick sneak peek at the data.

Table 1: nflfastR Play-by-Play Data

play_id game_id old_game_id home_team away_team season_type week posteam posteam_type defteam side_of_field yardline_100 game_date quarter_seconds_remaining half_seconds_remaining game_seconds_remaining game_half quarter_end drive sp qtr down goal_to_go time yrdln ydstogo ydsnet desc play_type yards_gained shotgun no_huddle qb_dropback qb_kneel qb_spike qb_scramble pass_length pass_location air_yards yards_after_catch run_location run_gap field_goal_result kick_distance extra_point_result two_point_conv_result home_timeouts_remaining away_timeouts_remaining timeout timeout_team td_team td_player_name td_player_id posteam_timeouts_remaining defteam_timeouts_remaining total_home_score total_away_score posteam_score defteam_score score_differential posteam_score_post defteam_score_post score_differential_post no_score_prob opp_fg_prob opp_safety_prob opp_td_prob fg_prob safety_prob td_prob extra_point_prob two_point_conversion_prob ep epa total_home_epa total_away_epa total_home_rush_epa total_away_rush_epa total_home_pass_epa total_away_pass_epa air_epa yac_epa comp_air_epa comp_yac_epa total_home_comp_air_epa total_away_comp_air_epa total_home_comp_yac_epa total_away_comp_yac_epa total_home_raw_air_epa total_away_raw_air_epa total_home_raw_yac_epa total_away_raw_yac_epa wp def_wp home_wp away_wp wpa vegas_wpa vegas_home_wpa home_wp_post away_wp_post vegas_wp vegas_home_wp total_home_rush_wpa total_away_rush_wpa total_home_pass_wpa total_away_pass_wpa air_wpa yac_wpa comp_air_wpa comp_yac_wpa total_home_comp_air_wpa total_away_comp_air_wpa total_home_comp_yac_wpa total_away_comp_yac_wpa total_home_raw_air_wpa total_away_raw_air_wpa total_home_raw_yac_wpa total_away_raw_yac_wpa punt_blocked first_down_rush first_down_pass first_down_penalty third_down_converted third_down_failed fourth_down_converted fourth_down_failed incomplete_pass touchback interception punt_inside_twenty punt_in_endzone punt_out_of_bounds punt_downed punt_fair_catch kickoff_inside_twenty 
kickoff_in_endzone kickoff_out_of_bounds kickoff_downed kickoff_fair_catch fumble_forced fumble_not_forced fumble_out_of_bounds solo_tackle safety penalty tackled_for_loss fumble_lost own_kickoff_recovery own_kickoff_recovery_td qb_hit rush_attempt pass_attempt sack touchdown pass_touchdown rush_touchdown return_touchdown extra_point_attempt two_point_attempt field_goal_attempt kickoff_attempt punt_attempt fumble complete_pass assist_tackle lateral_reception lateral_rush lateral_return lateral_recovery passer_player_id passer_player_name passing_yards receiver_player_id receiver_player_name receiving_yards rusher_player_id rusher_player_name rushing_yards lateral_receiver_player_id lateral_receiver_player_name lateral_receiving_yards lateral_rusher_player_id lateral_rusher_player_name lateral_rushing_yards lateral_sack_player_id lateral_sack_player_name interception_player_id interception_player_name lateral_interception_player_id lateral_interception_player_name punt_returner_player_id punt_returner_player_name lateral_punt_returner_player_id lateral_punt_returner_player_name kickoff_returner_player_name kickoff_returner_player_id lateral_kickoff_returner_player_id lateral_kickoff_returner_player_name punter_player_id blocked_player_name tackle_for_loss_1_player_id tackle_for_loss_1_player_name tackle_for_loss_2_player_id tackle_for_loss_2_player_name qb_hit_1_player_id qb_hit_1_player_name qb_hit_2_player_id qb_hit_2_player_name forced_fumble_player_1_team forced_fumble_player_1_player_id forced_fumble_player_1_player_name forced_fumble_player_2_team forced_fumble_player_2_player_id forced_fumble_player_2_player_name solo_tackle_1_team solo_tackle_2_team solo_tackle_1_player_id solo_tackle_2_player_id solo_tackle_1_player_name solo_tackle_2_player_name assist_tackle_1_player_id assist_tackle_1_player_name assist_tackle_1_team assist_tackle_2_player_id assist_tackle_2_player_name assist_tackle_2_team assist_tackle_3_player_id assist_tackle_3_player_name 
assist_tackle_3_team assist_tackle_4_player_id assist_tackle_4_player_name assist_tackle_4_team tackle_with_assist tackle_with_assist_1_player_id tackle_with_assist_1_player_name tackle_with_assist_1_team tackle_with_assist_2_player_id tackle_with_assist_2_player_name tackle_with_assist_2_team pass_defense_1_player_id pass_defense_1_player_name pass_defense_2_player_id pass_defense_2_player_name fumbled_1_team fumbled_1_player_id fumbled_1_player_name fumbled_2_player_id fumbled_2_player_name fumbled_2_team fumble_recovery_1_team fumble_recovery_1_yards fumble_recovery_1_player_id fumble_recovery_1_player_name fumble_recovery_2_team fumble_recovery_2_yards fumble_recovery_2_player_id fumble_recovery_2_player_name sack_player_id sack_player_name half_sack_1_player_id half_sack_1_player_name half_sack_2_player_id half_sack_2_player_name return_team return_yards penalty_team penalty_player_id penalty_player_name penalty_yards replay_or_challenge replay_or_challenge_result penalty_type defensive_two_point_attempt defensive_two_point_conv defensive_extra_point_attempt defensive_extra_point_conv safety_player_name safety_player_id season cp cpoe series series_success series_result order_sequence start_time time_of_day stadium weather nfl_api_id play_clock play_deleted play_type_nfl special_teams_play st_play_type end_clock_time end_yard_line fixed_drive fixed_drive_result drive_real_start_time drive_play_count drive_time_of_possession drive_first_downs drive_inside20 drive_ended_with_score drive_quarter_start drive_quarter_end drive_yards_penalized drive_start_transition drive_end_transition drive_game_clock_start drive_game_clock_end drive_start_yard_line drive_end_yard_line drive_play_id_started drive_play_id_ended away_score home_score location result total spread_line total_line div_game roof surface temp wind home_coach away_coach stadium_id game_stadium aborted_play success passer passer_jersey_number rusher rusher_jersey_number receiver receiver_jersey_number pass 
rush first_down special play passer_id rusher_id receiver_id name jersey_number id fantasy_player_name fantasy_player_id fantasy fantasy_id out_of_bounds home_opening_kickoff qb_epa xyac_epa xyac_mean_yardage xyac_median_yardage xyac_success xyac_fd xpass pass_oe old_game_id_x nflverse_game_id old_game_id_y possession_team offense_formation offense_personnel defenders_in_box defense_personnel number_of_pass_rushers players_on_play offense_players defense_players n_offense n_defense ngs_air_yards time_to_throw was_pressure route defense_man_zone_type defense_coverage_type extra_point_count two_point_conv_count field_goal_count posteam_score_diff defteam_score_diff winning_team_type vegas_away_wp total_minutes winning_team_score losing_team_score total_minutes_rounded winning_team_score_total losing_team_score_total score_diff_total

As you can see, there is a lot of information ranging from passing yards per play to tackles with assistance.

For this first part of the series, we will focus solely on how records impact the current game results. To simplify the dataset, we will narrow it down to Vegas information and the current season record.

Below is the reduced table we will use:

Table 2: Game Result Data by Team

season week game_id team team_type opponent_team team_score_final opponent_team_score_final is_winner_final team_score_diff_final team_vegas_wp team_vegas_spread team_is_winner_vegas_spread opponent_team_vegas_wp
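As a rough illustration of how a game-level table can be turned into the team-level layout of Table 2, here is a minimal sketch on made-up data. It assumes game rows with `home_team`, `away_team`, `home_score`, `away_score`, and `spread_line` columns, and that a positive `spread_line` means the home team is favored. The helper `to_team_rows` and the meaning given to `team_is_winner_vegas_spread` (Vegas picks this team to win) are assumptions for illustration, not the article's actual code.

```python
import pandas as pd

# Made-up games with placeholder team codes.
games = pd.DataFrame({
    "season": [2023, 2023],
    "week": [14, 14],
    "game_id": ["G1", "G2"],
    "home_team": ["AAA", "CCC"],
    "away_team": ["BBB", "DDD"],
    "home_score": [27, 24],
    "away_score": [28, 17],
    "spread_line": [14.0, -3.0],  # assumption: positive = home team favored
})

def to_team_rows(games: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: emit two rows per game, one per team."""
    shared = {"season": games["season"], "week": games["week"], "game_id": games["game_id"]}
    home = pd.DataFrame({
        **shared,
        "team": games["home_team"],
        "team_type": "home",
        "opponent_team": games["away_team"],
        "team_score_final": games["home_score"],
        "opponent_team_score_final": games["away_score"],
        "team_vegas_spread": games["spread_line"],
    })
    away = pd.DataFrame({
        **shared,
        "team": games["away_team"],
        "team_type": "away",
        "opponent_team": games["home_team"],
        "team_score_final": games["away_score"],
        "opponent_team_score_final": games["home_score"],
        "team_vegas_spread": -games["spread_line"],  # flip the sign for the away side
    })
    out = pd.concat([home, away], ignore_index=True)
    out["team_score_diff_final"] = out["team_score_final"] - out["opponent_team_score_final"]
    # A tie counts as a non-win for both teams in this sketch.
    out["is_winner_final"] = (out["team_score_diff_final"] > 0).astype(int)
    # Assumed meaning: 1 when Vegas favors this team.
    out["team_is_winner_vegas_spread"] = (out["team_vegas_spread"] > 0).astype(int)
    return out.sort_values(["game_id", "team_type"]).reset_index(drop=True)

team_rows = to_team_rows(games)
print(team_rows[["game_id", "team", "team_type", "is_winner_final", "team_vegas_spread"]])
```

Keeping one row per team per game makes it easy to attach each team's season record later, at the cost of every game appearing twice.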

Vegas Odds

When it comes to predictions, most ML models are built to compete against Vegas lines. But why fight it when we can embrace the wealth of information Vegas provides? Sure, there are tons of bookies offering different odds, and public betting probably skews the lines. But for this analysis, the spread line from Pro-Football-Reference should do just fine. It gives us a solid baseline for how accurate Vegas can be and sets the stage for some exciting comparisons.

Scores
Aaron M. Sprecher (Getty Images)

As shown in Fig. 1, Vegas' accuracy in predicting winners using the spread line was an impressive 66.3%.

Fig. 1: Vegas accuracy at predicting winners from the spread line

Next, let’s see how that accuracy changes with the size of the spread. It’s important to note that in this part of the series, we are not focusing on the point spread itself. This means we’re not evaluating whether the spread was covered; we’re only predicting winners regardless of the points difference.

Fig. 2: Vegas accuracy at predicting winners, by size of the spread

As expected, Fig. 2 shows that the larger the projected spread, the more accurate Vegas is at predicting the winner. However, over 50% of games have a spread line under 4 points, and for those the accuracy is only between 50% and 60%, which is not exactly impressive.
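The breakdown described above can be sketched as follows. This is a toy example on made-up numbers, assuming a `spread_line` column where positive means the home team is favored and a `result` column equal to home score minus away score. The bucket edges are arbitrary choices for illustration and are not the ones used for the figure.

```python
import pandas as pd

# Made-up games: spread_line > 0 means the home team is favored (assumption),
# result = home score minus away score.
games = pd.DataFrame({
    "spread_line": [1.5, -2.5, 3.0, -6.5, 7.0, 10.0, -13.5, 16.0],
    "result": [-3, -7, 4, -10, -1, 14, -21, 20],
})

# Drop pick'em games and ties, where "the favorite won" is undefined.
games = games[(games["spread_line"] != 0) & (games["result"] != 0)].copy()

# The favorite won when the spread and the final margin have the same sign.
games["favorite_won"] = (games["spread_line"] * games["result"]) > 0
overall_accuracy = games["favorite_won"].mean()

# Bucket games by the absolute size of the spread (edges chosen arbitrarily).
games["spread_bucket"] = pd.cut(
    games["spread_line"].abs(),
    bins=[0, 4, 8, 12, 100],
    labels=["0-4", "4-8", "8-12", "12+"],
)
by_bucket = games.groupby("spread_bucket", observed=True)["favorite_won"].agg(["mean", "size"])

print(f"overall accuracy: {overall_accuracy:.3f}")
print(by_bucket)
```

Note that this measures straight-up winners only; whether the favorite covered the spread is a different question and is not evaluated here.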

An interesting stat: in 94 instances where Vegas offered a spread of 15 or more, only 3 times did the underdog actually win. Those games were:

Another game that sticks out—and one I remember all too well—was Week 14 of the 2023 season: my home team, the Miami Dolphins, faced the Tennessee Titans. Miami was riding high at 9-3, while the Titans were struggling at 4-8. Vegas gave Miami a 14-point spread, but we ended up losing 28-27. To make it worse, I had Tyreek Hill in my fantasy lineup, and he barely played in that game 😩.


The Record

Let’s dive into the goal of this article and find out whether a team’s record influences their chances of winning.

Scores
Chris Graythen (Getty Images)

First, let's calculate the Pearson correlation coefficient of winning and winning margin against:

  • Total Games (record_total_games)
  • Season Games Won (season_winning_record)
  • Season Games Lost (season_losing_record)
  • Season Winning Ratio (season_winning_ratio)
  • Opponent Season Games Won (opponent_season_winning_record)
  • Opponent Season Games Lost (opponent_season_losing_record)
  • Opponent Season Winning Ratio (opponent_season_winning_ratio)
  • Winning Ratio Difference (Season Winning Ratio - Opponent Season Winning Ratio) (winning_ratio_diff)
  • Whether the team is playing at home or not (is_home)

We'll also include the Vegas Spread Line for comparison.
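To make the record features concrete, here is a minimal sketch on made-up results. It assumes the record is counted going into each game (the current game's result is excluded), that ties are ignored, and that the winning ratio is 0 before a team's first game. The article does not spell these details out, so treat them as assumptions. The opponent features would be built the same way and joined on the game.

```python
import pandas as pd

# Made-up team-level rows, one row per team per game.
df = pd.DataFrame({
    "season": [2024] * 8,
    "week": [1, 1, 2, 2, 3, 3, 4, 4],
    "team": ["AAA", "BBB"] * 4,
    "is_winner_final": [1, 0, 1, 0, 0, 1, 1, 0],
    "team_score_diff_final": [7, -7, 3, -3, -10, 10, 14, -14],
})

keys = ["season", "team"]
df = df.sort_values(keys + ["week"]).reset_index(drop=True)

# Games played and games won BEFORE kickoff of each game.
df["record_total_games"] = df.groupby(keys).cumcount()
df["season_winning_record"] = (
    df.groupby(keys)["is_winner_final"].cumsum() - df["is_winner_final"]
)
df["season_losing_record"] = df["record_total_games"] - df["season_winning_record"]

# Assumption: ratio is 0 before the first game (0/0 would otherwise be NaN).
df["season_winning_ratio"] = (
    df["season_winning_record"] / df["record_total_games"]
).fillna(0.0)

features = ["season_winning_record", "season_losing_record", "season_winning_ratio"]
targets = ["is_winner_final", "team_score_diff_final"]

# Pearson correlation of every feature against both targets.
corr = df[features + targets].corr(method="pearson").loc[features, targets]
print(corr.round(3))
```

Excluding the current game from the record matters: including it would leak the outcome into the feature and inflate the correlation.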

Table 3: Is Winner/Score Diff Correlations

is_winner_final team_score_diff_final

As expected, Vegas knows what it's doing: the Vegas spread line is far more strongly correlated with winning and winning margin than the season winning ratio. Additionally, winning ratios are more strongly correlated with the outcome than the raw win and loss counts.

But what about winning streaks? Do they have an impact in the NFL?
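One way to turn streaks into a feature is a signed counter of the run a team carries into each game. The sketch below is an assumption about how such a feature could be defined (positive for consecutive wins, negative for consecutive losses); the function name is hypothetical, and the article does not say whether streaks reset between seasons.

```python
def streak_before_each_game(results):
    """Signed streak carried INTO each game.

    results: chronological list of 1 (win) / 0 (loss).
    Returns +k after k straight wins, -k after k straight losses,
    and 0 before the first game.
    """
    streaks = []
    current = 0
    for won in results:
        streaks.append(current)  # streak as it stands before this game
        if won:
            current = current + 1 if current > 0 else 1
        else:
            current = current - 1 if current < 0 else -1
    return streaks


example = streak_before_each_game([1, 1, 0, 0, 0, 1])
print(example)  # [0, 1, 2, -1, -2, -3]
```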

Table 4: Is Winner/Score Diff Correlations

is_winner_final team_score_diff_final

As shown in the table above, the opponent’s winning ratio has minimal correlation with the likelihood of winning. However, longer streaks—such as 12, 13, or 14 games—seem to have a more significant influence on determining the winner.

In other words, having a winning or losing record doesn’t directly impact the outcome in a noticeable way. But let’s dig deeper into the data to see if we can uncover any hidden patterns.

Fig. 3: Game outcomes by winning ratio difference

Now we’re getting somewhere! As we can see, when the winning ratio difference is small, anything can happen. But as we move toward the extremes, where the difference exceeds 0.25 in either direction and a strong team faces a struggling one, a clear trend emerges.

One interesting observation is that the pattern shifts when winning_ratio_diff = 1. Inspecting the data shows that this happens early in the season, when many teams are still unbeaten or winless. So, let’s filter the data to include only games after Week 3.

Fig. 4: Game outcomes by winning ratio difference, games after Week 3 only

Well, this is something. Let's check correlations after Week 3 and when the winning ratio difference is greater than 0.25 or less than -0.25.
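The filter described above can be sketched in a couple of lines. The data here is made up, and reading "after Week 3" as `week > 3` with a strict `> 0.25` threshold is an assumption about the exact boundaries.

```python
import pandas as pd

# Made-up rows to show which games survive the filter.
df = pd.DataFrame({
    "week": [1, 2, 4, 5, 9, 12],
    "winning_ratio_diff": [1.0, -1.0, 0.10, 0.30, -0.50, 0.25],
    "is_winner_final": [1, 0, 1, 1, 0, 0],
})

# Keep games after Week 3 where one team's ratio clearly exceeds the other's.
mask = (df["week"] > 3) & (df["winning_ratio_diff"].abs() > 0.25)
filtered = df[mask]

print(filtered)
print(f"kept {len(filtered)} of {len(df)} games")
```

Using the absolute value keeps both sides of each lopsided matchup: the strong team (positive difference) and the struggling one (negative difference).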

Table 4: Is Winner/Score Diff Correlations

is_winner_final team_score_diff_final

Now our data aligns closely with the Vegas spread, which is motivating enough to try building a simple ML model. And yes, we’ve significantly reduced our dataset—from 6,898 games to just 2,254 games—but hey, you can’t always win, right? 😅

Machine Learning

Now let's build a simple ML model. We will use a decision tree, mainly because it is simple and easy to interpret, and because our dataset is small, with fewer than 10,000 data points.

ML
Chat GPT

Decision trees are a type of supervised machine learning algorithm used for both classification and regression tasks. They work by recursively splitting the data into subsets based on the features that provide the most significant separation according to a chosen metric. This method is interpretable, as we can visualize how the tree splits the data and the decision-making process it follows. However, Decision Trees can be prone to overfitting, especially with complex datasets, so we’ll use techniques like limiting tree depth to ensure better generalization.

First, we will split the data. We will use data from 1999 to 2022 to train our Decision Tree, and then test performance with 2023 and 2024 data.

Fig. 5: Decision tree

Fig. 5 shows the decision tree. We optimized the parameters using GridSearchCV, which means we tested several tree configurations to find the one that performed best. As shown, the tree has 3 depth levels. It first checks whether winning_ratio_diff > -0.029, then evaluates if winning_ratio_diff > 0.216, and finally considers whether the team is playing at home.
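The workflow above (a split by season, then a grid search over tree settings) can be sketched with scikit-learn. The data below is synthetic, and the parameter grid, the 5-fold cross-validation, and the accuracy scoring are assumptions, since the article does not list the settings it actually searched.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the real dataset (NOT the article's data).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "season": rng.integers(1999, 2025, size=n),  # seasons 1999..2024
    "winning_ratio_diff": rng.uniform(-1, 1, size=n),
    "is_home": rng.integers(0, 2, size=n),
})
# Synthetic outcome: a better record and home field raise the win probability.
logit = 3 * df["winning_ratio_diff"] + 0.4 * (df["is_home"] - 0.5)
df["is_winner_final"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

features = ["winning_ratio_diff", "is_home"]

# Split by season so the test games are strictly later than the training games.
train = df[df["season"] <= 2022]
test = df[df["season"] >= 2023]

# Hypothetical grid: shallow trees with reasonably large leaves to limit overfitting.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4], "min_samples_leaf": [10, 25, 50]},
    cv=5,
    scoring="accuracy",
)
search.fit(train[features], train["is_winner_final"])
best_tree = search.best_estimator_

print(search.best_params_)
print(export_text(best_tree, feature_names=features))
print("held-out accuracy:", best_tree.score(test[features], test["is_winner_final"]))
```

Splitting by season rather than at random avoids training on games that happened after the ones being predicted, which would make the held-out score look better than it should.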

From 1999 to 2022, home teams with a winning_ratio_diff above 0.216 played 990 games and won 763 of them, a win rate of about 77%, approaching 80%.

Now, let's test how this simple rule would have done in 2023 and 2024.

As we see above, out of 64 instances where this condition was met, the 80% win ratio was maintained. In 51 of those cases, the winning team celebrated at home 🎉🎉🎉🎉.
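Applying the rule by hand is straightforward. The sketch below uses made-up rows and a hypothetical helper `apply_rule`; only the 0.216 threshold and the 51-of-64 figure come from the text above.

```python
import pandas as pd

def apply_rule(df: pd.DataFrame, threshold: float = 0.216) -> pd.DataFrame:
    """Hypothetical helper: keep home teams whose ratio edge exceeds the threshold."""
    return df[(df["is_home"] == 1) & (df["winning_ratio_diff"] > threshold)]

# Made-up held-out games (not the real 2023-2024 data).
held_out = pd.DataFrame({
    "is_home": [1, 1, 1, 0, 1],
    "winning_ratio_diff": [0.30, 0.25, 0.10, 0.50, 0.40],
    "is_winner_final": [1, 0, 1, 1, 1],
})

picks = apply_rule(held_out)
win_rate = picks["is_winner_final"].mean()
print(f"{len(picks)} picks, win rate {win_rate:.3f}")

# The article's reported figures: 51 wins out of 64 qualifying games.
reported_rate = 51 / 64
print(f"reported win rate: {reported_rate:.3f}")
```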

Now, let’s move on and display the table of predictions.

Table 5: Prediction Results

season week team opponent_team win_prediction is_winner team_vegas_wp team_vegas_spread team_score_final opponent_team_score_final is_home season_winning_ratio opponent_season_winning_ratio

Actually, in Week 13 of 2024, we could have hit a 7-leg parlay 😂.

Conclusions & Credits

We’ve concluded the first part of NFL Predictions. We achieved a solid result with a simple decision tree providing an 80% success rate in cases where it applied—around 35-45 cases per year. In the next part, we’ll dive into Points, Spreads, and Totals.

Shoutouts:

  • OpenAI: Without ChatGPT, this article wouldn’t have been possible—the author’s native language is Spanish (which they barely speak now 😂).
  • nflfastR: For providing the data to analyze the NFL.
  • My wife ❤️.