What’s This About 🤔

Welcome to NFL Predictions, a series where we dive into what it takes to make NFL predictions. In this first part, Winners by Record, we’ll explore how a team's season record affects its likelihood of winning games.

As the series unfolds, we’ll develop tools and eventually create a predictive model to answer the ultimate question: Who’s going to win? Let’s get started!

Disclaimers

First, I’m relatively new to American football, with just three years of watching under my belt. While I still have plenty to learn, this fresh perspective allows me to focus solely on the numbers, free from bias.

Second, while I mention betting, I discourage gambling for profit. A small bet for fun or tracking predictions is fine, but gambling to earn money is addictive and harmful.

Summary

In this article, we explored whether a team's season win and loss record can predict game outcomes. In the end, we built a very simple ML model that predicts winners with 80% accuracy, covering about 40 cases per year.

The Data

For this series, we'll use the play-by-play data provided by nflfastR (Big shoutout to nflverse). The dataset contains 372 columns of play-by-play data, spanning from 09/12/1999 to 12/02/2024, with a total of 26 seasons, 6,898 games, and 1,154,352 plays.

Below is a quick sneak peek at the data.

Table.1: NFL Fast R Play by Play Data

	play_id	game_id	old_game_id	home_team	away_team	season_type	week	posteam	posteam_type	defteam	side_of_field	yardline_100	game_date	quarter_seconds_remaining	half_seconds_remaining	game_seconds_remaining	game_half	quarter_end	drive	sp	qtr	down	goal_to_go	time	yrdln	ydstogo	ydsnet	desc	play_type	yards_gained	shotgun	no_huddle	qb_dropback	qb_kneel	qb_spike	qb_scramble	pass_length	pass_location	air_yards	yards_after_catch	run_location	run_gap	field_goal_result	kick_distance	extra_point_result	two_point_conv_result	home_timeouts_remaining	away_timeouts_remaining	timeout	timeout_team	td_team	td_player_name	td_player_id	posteam_timeouts_remaining	defteam_timeouts_remaining	total_home_score	total_away_score	posteam_score	defteam_score	score_differential	posteam_score_post	defteam_score_post	score_differential_post	no_score_prob	opp_fg_prob	opp_safety_prob	opp_td_prob	fg_prob	safety_prob	td_prob	extra_point_prob	two_point_conversion_prob	ep	epa	total_home_epa	total_away_epa	total_home_rush_epa	total_away_rush_epa	total_home_pass_epa	total_away_pass_epa	air_epa	yac_epa	comp_air_epa	comp_yac_epa	total_home_comp_air_epa	total_away_comp_air_epa	total_home_comp_yac_epa	total_away_comp_yac_epa	total_home_raw_air_epa	total_away_raw_air_epa	total_home_raw_yac_epa	total_away_raw_yac_epa	wp	def_wp	home_wp	away_wp	wpa	vegas_wpa	vegas_home_wpa	home_wp_post	away_wp_post	vegas_wp	vegas_home_wp	total_home_rush_wpa	total_away_rush_wpa	total_home_pass_wpa	total_away_pass_wpa	air_wpa	yac_wpa	comp_air_wpa	comp_yac_wpa	total_home_comp_air_wpa	total_away_comp_air_wpa	total_home_comp_yac_wpa	total_away_comp_yac_wpa	total_home_raw_air_wpa	total_away_raw_air_wpa	total_home_raw_yac_wpa	total_away_raw_yac_wpa	punt_blocked	first_down_rush	first_down_pass	first_down_penalty	third_down_converted	third_down_failed	fourth_down_converted	fourth_down_failed	incomplete_pass	touchback	interception	punt_inside_twenty	punt_in_endzone	punt_out_of_bounds	punt_downed	punt_fair_catch	kickoff_inside_twenty	
kickoff_in_endzone	kickoff_out_of_bounds	kickoff_downed	kickoff_fair_catch	fumble_forced	fumble_not_forced	fumble_out_of_bounds	solo_tackle	safety	penalty	tackled_for_loss	fumble_lost	own_kickoff_recovery	own_kickoff_recovery_td	qb_hit	rush_attempt	pass_attempt	sack	touchdown	pass_touchdown	rush_touchdown	return_touchdown	extra_point_attempt	two_point_attempt	field_goal_attempt	kickoff_attempt	punt_attempt	fumble	complete_pass	assist_tackle	lateral_reception	lateral_rush	lateral_return	lateral_recovery	passer_player_id	passer_player_name	passing_yards	receiver_player_id	receiver_player_name	receiving_yards	rusher_player_id	rusher_player_name	rushing_yards	lateral_receiver_player_id	lateral_receiver_player_name	lateral_receiving_yards	lateral_rusher_player_id	lateral_rusher_player_name	lateral_rushing_yards	lateral_sack_player_id	lateral_sack_player_name	interception_player_id	interception_player_name	lateral_interception_player_id	lateral_interception_player_name	punt_returner_player_id	punt_returner_player_name	lateral_punt_returner_player_id	lateral_punt_returner_player_name	kickoff_returner_player_name	kickoff_returner_player_id	lateral_kickoff_returner_player_id	lateral_kickoff_returner_player_name	punter_player_id	blocked_player_name	tackle_for_loss_1_player_id	tackle_for_loss_1_player_name	tackle_for_loss_2_player_id	tackle_for_loss_2_player_name	qb_hit_1_player_id	qb_hit_1_player_name	qb_hit_2_player_id	qb_hit_2_player_name	forced_fumble_player_1_team	forced_fumble_player_1_player_id	forced_fumble_player_1_player_name	forced_fumble_player_2_team	forced_fumble_player_2_player_id	forced_fumble_player_2_player_name	solo_tackle_1_team	solo_tackle_2_team	solo_tackle_1_player_id	solo_tackle_2_player_id	solo_tackle_1_player_name	solo_tackle_2_player_name	assist_tackle_1_player_id	assist_tackle_1_player_name	assist_tackle_1_team	assist_tackle_2_player_id	assist_tackle_2_player_name	assist_tackle_2_team	assist_tackle_3_player_id	assist_tackle_3_player_name	
assist_tackle_3_team	assist_tackle_4_player_id	assist_tackle_4_player_name	assist_tackle_4_team	tackle_with_assist	tackle_with_assist_1_player_id	tackle_with_assist_1_player_name	tackle_with_assist_1_team	tackle_with_assist_2_player_id	tackle_with_assist_2_player_name	tackle_with_assist_2_team	pass_defense_1_player_id	pass_defense_1_player_name	pass_defense_2_player_id	pass_defense_2_player_name	fumbled_1_team	fumbled_1_player_id	fumbled_1_player_name	fumbled_2_player_id	fumbled_2_player_name	fumbled_2_team	fumble_recovery_1_team	fumble_recovery_1_yards	fumble_recovery_1_player_id	fumble_recovery_1_player_name	fumble_recovery_2_team	fumble_recovery_2_yards	fumble_recovery_2_player_id	fumble_recovery_2_player_name	sack_player_id	sack_player_name	half_sack_1_player_id	half_sack_1_player_name	half_sack_2_player_id	half_sack_2_player_name	return_team	return_yards	penalty_team	penalty_player_id	penalty_player_name	penalty_yards	replay_or_challenge	replay_or_challenge_result	penalty_type	defensive_two_point_attempt	defensive_two_point_conv	defensive_extra_point_attempt	defensive_extra_point_conv	safety_player_name	safety_player_id	season	cp	cpoe	series	series_success	series_result	order_sequence	start_time	time_of_day	stadium	weather	nfl_api_id	play_clock	play_deleted	play_type_nfl	special_teams_play	st_play_type	end_clock_time	end_yard_line	fixed_drive	fixed_drive_result	drive_real_start_time	drive_play_count	drive_time_of_possession	drive_first_downs	drive_inside20	drive_ended_with_score	drive_quarter_start	drive_quarter_end	drive_yards_penalized	drive_start_transition	drive_end_transition	drive_game_clock_start	drive_game_clock_end	drive_start_yard_line	drive_end_yard_line	drive_play_id_started	drive_play_id_ended	away_score	home_score	location	result	total	spread_line	total_line	div_game	roof	surface	temp	wind	home_coach	away_coach	stadium_id	game_stadium	aborted_play	success	passer	passer_jersey_number	rusher	rusher_jersey_number	receiver	receiver_jersey_number	pass	
rush	first_down	special	play	passer_id	rusher_id	receiver_id	name	jersey_number	id	fantasy_player_name	fantasy_player_id	fantasy	fantasy_id	out_of_bounds	home_opening_kickoff	qb_epa	xyac_epa	xyac_mean_yardage	xyac_median_yardage	xyac_success	xyac_fd	xpass	pass_oe	old_game_id_x	nflverse_game_id	old_game_id_y	possession_team	offense_formation	offense_personnel	defenders_in_box	defense_personnel	number_of_pass_rushers	players_on_play	offense_players	defense_players	n_offense	n_defense	ngs_air_yards	time_to_throw	was_pressure	route	defense_man_zone_type	defense_coverage_type	extra_point_count	two_point_conv_count	field_goal_count	posteam_score_diff	defteam_score_diff	winning_team_type	vegas_away_wp	total_minutes	winning_team_score	losing_team_score	total_minutes_rounded	winning_team_score_total	losing_team_score_total	score_diff_total

As you can see, there is a lot of information, ranging from passing yards per play to assisted tackles.

For this first part of the series, we will focus solely on how records impact the current game results. To simplify the dataset, we will narrow it down to Vegas information and the current season record.

Below is the reduced table we will use:

Table.2: Game Result Data by Team

season	week	game_id	team	team_type	opponent_team	team_score_final	opponent_team_score_final	is_winner_final	team_score_diff_final	team_vegas_wp	team_vegas_spread	team_is_winner_vegas_spread	opponent_team_vegas_wp
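As a rough illustration of this reduction, the sketch below collapses play-by-play rows into one row per team per game. It relies only on columns that appear in Table.1 (game_id, season, week, home_team, away_team, home_score, away_score, spread_line) and on output names from Table.2. The helper name to_team_games, the toy rows, and the sign convention for spread_line (positive meaning the home team is favored) are assumptions for illustration, not the exact pipeline behind the article.

```python
import pandas as pd

# Toy stand-in for the play-by-play frame: several plays per game, with the
# game-level columns repeated on every play. Scores come from games mentioned
# in the article; the 16.5 spread is an illustrative value.
pbp = pd.DataFrame({
    "game_id": ["2023_14_TEN_MIA"] * 3 + ["2018_03_BUF_MIN"] * 2,
    "season": [2023] * 3 + [2018] * 2,
    "week": [14] * 3 + [3] * 2,
    "home_team": ["MIA"] * 3 + ["MIN"] * 2,
    "away_team": ["TEN"] * 3 + ["BUF"] * 2,
    "home_score": [27] * 3 + [6] * 2,
    "away_score": [28] * 3 + [27] * 2,
    # Assumption: positive spread_line means the home team is favored.
    "spread_line": [14.0] * 3 + [16.5] * 2,
})


def to_team_games(pbp: pd.DataFrame) -> pd.DataFrame:
    """Return one row per team per game (hypothetical helper)."""
    games = pbp.drop_duplicates("game_id")
    frames = []
    for side, other, sign in (("home", "away", 1), ("away", "home", -1)):
        frames.append(pd.DataFrame({
            "season": games["season"],
            "week": games["week"],
            "game_id": games["game_id"],
            "team": games[f"{side}_team"],
            "team_type": side,
            "opponent_team": games[f"{other}_team"],
            "team_score_final": games[f"{side}_score"],
            "opponent_team_score_final": games[f"{other}_score"],
            # Flip the sign so the spread is always from the team's own view.
            "team_vegas_spread": sign * games["spread_line"],
        }))
    out = pd.concat(frames, ignore_index=True)
    out["team_score_diff_final"] = (
        out["team_score_final"] - out["opponent_team_score_final"]
    )
    # A tie counts as "not a win" for both teams in this sketch.
    out["is_winner_final"] = (out["team_score_diff_final"] > 0).astype(int)
    return out.sort_values(["season", "week", "game_id", "team_type"]).reset_index(drop=True)


team_games = to_team_games(pbp)
print(team_games[["team", "team_type", "team_vegas_spread", "is_winner_final"]])
```

Keeping two rows per game (one from each team's point of view) makes every later feature, such as a team's record or whether it plays at home, a simple per-row column.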

Vegas Odds

When it comes to predictions, most ML models are built to compete against Vegas lines. But why fight it when we can embrace the wealth of information Vegas provides? Sure, there are tons of bookies offering different odds, and public betting probably skews the lines. But for this analysis, the spread line from Pro-Football-Reference should do just fine. It gives us a solid baseline for how accurate Vegas can be and sets the stage for some exciting comparisons.

Scores — Aaron M. Sprecher (Getty Images)

As shown in Fig. 1, Vegas' accuracy in predicting winners using the spread line was an impressive 66.3%.


Now, let’s take a look at how that accuracy changes with the size of the spread line. It’s important to note that in this part of the series, we are not focusing on the point spread itself. This means we’re not evaluating whether the spread was covered; we’re only predicting winners, regardless of the point difference.

As expected, Fig. 2 shows that the larger the projected spread, the more accurate Vegas is at predicting the winner. However, over 50% of games have a spread line under 4 points, and for those the accuracy is only between 50% and 60%, which is not exactly impressive.
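To make the measurement concrete, here is a minimal sketch of how accuracy overall and per spread bucket could be computed. The rows are made up, the column names favorite_spread and favorite_won are hypothetical, and the bucket edges are an assumption chosen to mirror the "under 4" and "15 or more" ranges discussed here. It does not reproduce the article's figures.

```python
import pandas as pd

# Made-up rows, one per game, seen from the favorite's side.
games = pd.DataFrame({
    "favorite_spread": [1.5, 2.5, 3.0, 3.5, 6.5, 7.0, 10.0, 14.0, 16.5, 17.0],
    "favorite_won":    [1,   0,   1,   0,   1,   1,   1,    0,    0,    1],
})

# Overall accuracy: how often the favorite (the "Vegas pick") actually won.
overall_accuracy = games["favorite_won"].mean()

# Bucket the spread; right=False makes each bucket [low, high).
bins = [0, 4, 8, 15, float("inf")]
labels = ["under 4", "4 to 7.5", "8 to 14.5", "15 or more"]
games["spread_bucket"] = pd.cut(
    games["favorite_spread"], bins=bins, labels=labels, right=False
)

by_bucket = games.groupby("spread_bucket", observed=True)["favorite_won"].agg(
    games="size", accuracy="mean"
)

print(f"overall accuracy: {overall_accuracy:.3f}")
print(by_bucket)
```

Reporting the number of games next to each accuracy matters because the large-spread buckets are small, so their rates are noisier than the headline figure.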

An interesting stat: in 94 instances where Vegas offered a spread of 15 or more, only 3 times did the underdog actually win. Those games were:

Bills 27 vs 6 Vikings (September 23, 2018, Week 3)
Dolphins 27 vs 24 Patriots (December 29, 2019, Week 17)
Jets 23 vs 20 Rams (December 20, 2020, Week 15)

Another game that sticks out—and one I remember all too well—was Week 14 of the 2023 season: my home team, the Miami Dolphins, faced the Tennessee Titans. Miami was riding high at 9-3, while the Titans were struggling at 4-8. Vegas gave Miami a 14-point spread, but we ended up losing 28-27. To make it worse, I had Tyreek Hill in my fantasy lineup, and he barely played in that game 😩.

The last time the @Titans played Miami on MNF, they mounted a 14-point comeback in the last 3 minutes of the game 😳

📺: #TENvsMIA – Tonight 7:30pm ET on ESPN
📱: Stream on #NFLPlus pic.twitter.com/rBKrieWrzN
— NFL (@NFL) September 30, 2024

The Record

Let’s dive into the goal of this article and find out whether a team’s record influences their chances of winning.

First, let's calculate the Pearson correlation coefficient of winning and winning margin against:

Total Games (record_total_games)
Season Games Won (season_winning_record)
Season Games Lost (season_losing_record)
Season Winning Ratio (season_winning_ratio)
Opponent Season Games Won (opponent_season_winning_record)
Opponent Season Games Lost (opponent_season_losing_record)
Opponent Season Winning Ratio (opponent_season_winning_ratio)
Winning Ratio Difference (Season Winning Ratio - Opponent Season Winning Ratio) (winning_ratio_diff)
Whether the team is playing at home or not (is_home)

We'll also include the Vegas Spread Line for comparison.
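The record features listed above can be derived from per-game results. The sketch below is one possible way to do it, under a few assumptions: the toy schedule is invented, the record is the one a team holds entering the game (so the current result never leaks into its own feature), a team with no games played gets a winning ratio of 0, and ties are ignored.

```python
import pandas as pd

# Invented three-week schedule, one row per team per game.
tg = pd.DataFrame({
    "season": [2024] * 12,
    "week": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "game_id": ["g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4", "g5", "g5", "g6", "g6"],
    "team": ["A", "B", "C", "D", "A", "C", "B", "D", "D", "A", "C", "B"],
    "is_home": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "is_winner_final": [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
    "team_score_diff_final": [7, -7, -3, 3, 10, -10, -14, 14, -3, 3, 7, -7],
})

tg = tg.sort_values(["season", "team", "week"]).reset_index(drop=True)
grp = tg.groupby(["season", "team"])

# Record entering each game: cumulative totals minus the current game.
games_before = grp.cumcount()
wins_before = grp["is_winner_final"].cumsum() - tg["is_winner_final"]

tg["record_total_games"] = games_before
tg["season_winning_record"] = wins_before
tg["season_losing_record"] = games_before - wins_before  # assumes no ties
# 0/0 in Week 1 gives NaN; this sketch treats that as a ratio of 0.
tg["season_winning_ratio"] = (wins_before / games_before).fillna(0.0)

# Attach the opponent's ratio by joining each game to itself.
opp = tg[["game_id", "team", "season_winning_ratio"]].rename(columns={
    "team": "opponent_team",
    "season_winning_ratio": "opponent_season_winning_ratio",
})
tg = tg.merge(opp, on="game_id")
tg = tg[tg["team"] != tg["opponent_team"]].reset_index(drop=True)
tg["winning_ratio_diff"] = (
    tg["season_winning_ratio"] - tg["opponent_season_winning_ratio"]
)

features = [
    "record_total_games", "season_winning_ratio",
    "opponent_season_winning_ratio", "winning_ratio_diff", "is_home",
]
targets = ["is_winner_final", "team_score_diff_final"]

# Pearson correlation of every feature against both targets.
correlations = tg[features + targets].corr(method="pearson").loc[features, targets]
print(correlations.round(3))
```

Because is_winner_final is binary, its Pearson coefficient is a point-biserial correlation; it ranks features by linear association but says nothing about thresholds, which is where the decision tree later helps.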

Table.3: Is Winner/Score Diff Correlations

	is_winner_final	team_score_diff_final

As expected, Vegas knows what it's doing: the Vegas spread line is far more strongly correlated with winning and winning margin than the winning ratio is. Additionally, the winning ratios correlate more strongly with the outcome than the raw win and loss counts do.

But what about winning streaks? Do they have an impact in the NFL?
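For reference, a streak feature can be computed with a shift and a grouped cumulative sum. This is a minimal sketch, assuming the streak is the number of consecutive wins a team carries into a game and that it resets to 0 after any loss; the function name streak_entering_game and the toy results are hypothetical.

```python
import pandas as pd


def streak_entering_game(results: pd.Series) -> pd.Series:
    """Consecutive wins immediately before each game (0 right after a loss)."""
    # Result of the previous game; the first game has no history.
    prior = results.shift(1, fill_value=0)
    # Every prior loss opens a new block, then wins are counted inside it.
    block = (prior == 0).cumsum()
    return prior.groupby(block).cumsum()


# Toy results in chronological order for two teams (1 = win, 0 = loss).
tg = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "week": [1, 2, 3, 4, 1, 2, 3, 4],
    "is_winner_final": [1, 1, 0, 1, 0, 1, 1, 1],
})

tg = tg.sort_values(["team", "week"]).reset_index(drop=True)
tg["winning_streak"] = tg.groupby("team")["is_winner_final"].transform(
    streak_entering_game
)
print(tg)
```

Shifting by one game before counting keeps the current result out of its own feature, which would otherwise inflate the correlation with is_winner_final.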

Table.4: Is Winner/Score Diff Correlations

	is_winner_final	team_score_diff_final

As shown in the table above, the opponent’s winning ratio has minimal correlation with the likelihood of winning. However, longer streaks—such as 12, 13, or 14 games—seem to have a more significant influence on determining the winner.

In other words, having a winning or losing record doesn’t directly impact the outcome in a noticeable way. But let’s dig deeper into the data to see if we can uncover any hidden patterns.

Now we’re getting somewhere! As we can see, when the winning ratio difference is small, anything can happen. But as we move toward the extremes, where the difference exceeds 0.25 in either direction and a great team faces one that’s struggling, we start to see a clear trend emerge.

One interesting observation is that the pattern shifts when winning_ratio_diff = 1. Inspecting the data shows that this happens early in the season, when many teams are still unbeaten or winless. So, let’s try filtering the data to include only games after Week 3.

Well, this is something. Let's check correlations after Week 3 and when the winning ratio difference is greater than 0.25 or less than -0.25.
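The filter described here is a two-condition boolean mask. The sketch below uses made-up rows and assumes the column names week and winning_ratio_diff, and that "greater than 0.25" is a strict comparison.

```python
import pandas as pd

# Made-up rows; only the two columns needed for the filter.
tg = pd.DataFrame({
    "week": [1, 2, 4, 5, 8, 12],
    "winning_ratio_diff": [1.0, -0.5, 0.10, 0.30, -0.40, 0.25],
})

# Keep games after Week 3 where the records differ clearly in either direction.
mask = (tg["week"] > 3) & (tg["winning_ratio_diff"].abs() > 0.25)
subset = tg[mask]
print(subset)
```

The same correlation call used earlier can then be run on subset instead of the full table.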

Table.4: Is Winner/Score Diff Correlations

	is_winner_final	team_score_diff_final

Now our data aligns closely with the Vegas spread, which is motivating enough to try building a simple ML model. And yes, we’ve significantly reduced our dataset—from 6,898 games to just 2,254 games—but hey, you can’t always win, right? 😅

Machine Learning

Now let's build a simple ML model. We will use a decision tree, mainly because it is simple and easy to illustrate, and because our dataset is small, with fewer than 10,000 data points.

Decision trees are a type of supervised machine learning algorithm used for both classification and regression tasks. They work by recursively splitting the data into subsets based on the features that provide the most significant separation according to a chosen metric. This method is interpretable, as we can visualize how the tree splits the data and the decision-making process it follows. However, Decision Trees can be prone to overfitting, especially with complex datasets, so we’ll use techniques like limiting tree depth to ensure better generalization.

First, we will split the data. We will use data from 1999 to 2022 to train our Decision Tree, and then test performance with 2023 and 2024 data.
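A season-based split like this one can be sketched as follows. The feature list is an assumption based on the two variables the tree ends up using (winning_ratio_diff and is_home), and the helper name split_by_season and the toy rows are hypothetical.

```python
import pandas as pd

FEATURES = ["winning_ratio_diff", "is_home"]  # assumed feature set
TARGET = "is_winner_final"


def split_by_season(df: pd.DataFrame, last_train_season: int = 2022):
    """Train on seasons up to last_train_season, test on later seasons."""
    train = df[df["season"] <= last_train_season]
    test = df[df["season"] > last_train_season]
    return train[FEATURES], train[TARGET], test[FEATURES], test[TARGET]


# Toy rows spanning the split boundary.
games = pd.DataFrame({
    "season": [2021, 2022, 2022, 2023, 2024],
    "winning_ratio_diff": [0.30, -0.10, 0.50, 0.25, -0.40],
    "is_home": [1, 0, 1, 1, 0],
    "is_winner_final": [1, 0, 1, 1, 0],
})

X_train, y_train, X_test, y_test = split_by_season(games)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

Splitting by season rather than at random keeps future games out of the training set, which is closer to how the model would be used in practice.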

Fig. 5 shows the decision tree. We optimized the parameters using GridSearchCV, which means we tested several tree configurations to find the one that performed best. As shown, the tree has 3 depth levels. It first checks whether winning_ratio_diff > -0.029, then evaluates if winning_ratio_diff > 0.216, and finally considers whether the team is playing at home.
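For readers who want to try a similar search, here is a minimal sketch with scikit-learn. The data is synthetic (generated so that a larger winning_ratio_diff and playing at home both raise the win probability), and the parameter grid, the 5-fold cross-validation, and the accuracy scoring are assumptions; the article does not state which settings were searched.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data, not the NFL dataset.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "winning_ratio_diff": rng.uniform(-1, 1, n),
    "is_home": rng.integers(0, 2, n),
})
logit = 4 * X["winning_ratio_diff"] + 0.4 * (2 * X["is_home"] - 1)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Assumed grid: shallow trees only, to limit overfitting.
param_grid = {"max_depth": [1, 2, 3, 4], "min_samples_leaf": [5, 20, 50]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
# Text rendering of the fitted tree's split rules.
print(export_text(search.best_estimator_, feature_names=list(X.columns)))
```

The export_text output lists each split threshold, which is the kind of readout that yields rules such as winning_ratio_diff > 0.216.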

From 1999 to 2022, home teams with a winning_ratio_diff greater than 0.216 played 990 games and won 763 of them, a win rate of about 77%.

Now, let's test how this simple rule would have done in 2023 and 2024.

As we see above, this condition was met in 64 instances, and in 51 of those cases the winning team celebrated at home 🎉🎉🎉🎉, a win rate of roughly 80%.
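The rule itself is simple enough to check by hand. The sketch below applies it to a few made-up holdout rows and then recomputes the two win rates from the counts quoted in the text (763 of 990 in training, 51 of 64 in 2023 and 2024). The function name evaluate_rule and the toy rows are hypothetical.

```python
import pandas as pd


def evaluate_rule(df: pd.DataFrame, threshold: float = 0.216):
    """Pick home teams whose winning_ratio_diff exceeds the threshold."""
    picks = df[(df["winning_ratio_diff"] > threshold) & (df["is_home"] == 1)]
    hit_rate = picks["is_winner_final"].mean() if len(picks) else float("nan")
    return len(picks), hit_rate


# Made-up holdout rows.
holdout = pd.DataFrame({
    "winning_ratio_diff": [0.40, 0.30, 0.10, 0.50, 0.25, -0.30],
    "is_home":            [1,    1,    1,    0,    1,    1],
    "is_winner_final":    [1,    0,    1,    1,    1,    0],
})

n_picks, hit_rate = evaluate_rule(holdout)
print(f"{n_picks} picks, hit rate {hit_rate:.3f}")

# Win rates implied by the counts quoted in the article.
train_rate = 763 / 990
test_rate = 51 / 64
print(f"train: {train_rate:.3f}, test: {test_rate:.3f}")
```

Tracking the number of picks alongside the hit rate shows the trade-off in this rule: it is accurate, but it only speaks up for a small share of games.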

Now, let’s move on and display the table of predictions.

Table.5: Prediction Results

season	week	team	opponent_team	win_prediction	is_winner	team_vegas_wp	team_vegas_spread	team_score_final	opponent_team_score_final	is_home	season_winning_ratio	opponent_season_winning_ratio

Actually, in Week 13 of 2024, we could have hit a 7-leg parlay 😂.

Conclusions & Credits

We’ve concluded the first part of NFL Predictions. We achieved a solid result with a simple decision tree providing an 80% success rate in cases where it applied—around 35-45 cases per year. In the next part, we’ll dive into Points, Spreads, and Totals.

Shoutouts:

OpenAI: Without ChatGPT, this article wouldn’t have been possible—the author’s native language is Spanish (which they barely speak now 😂).
NFLFastR: For providing the data to analyze the NFL.
My wife ❤️.

Sergio Fernandez

Coder & Sport Enthusiast

NFL Predictions: "Winners by Record"
