# Using Explosive Plays to Predict Points Per Game

Woody Hayes, the legendary football coach at Ohio State, is credited with coining the term “three yards and a cloud of dust” to describe an offensive philosophy of grinding out yards on the way to scoring a touchdown. In theory, this sounds like a great strategy. For teams that can physically dominate their opponent for 60 minutes or out-scheme them on each and every drive, this style of play can be very successful. However, those teams are the outliers; it's the teams that generate the most "explosive plays" that are lighting up the scoreboard.

In coaching circles, the term **"explosive play"** generally refers to an **offensive play that generates 20 yards or more**. Many offensive coordinators have a special section on their call sheets dedicated to generating explosive plays. These plays are usually designed around getting the ball in the hands of the best skill players and exploiting a weak spot in the defense at the right time.

The purpose of this analysis is first to show the direct relationship between **"explosive plays" and Points Per Game, and then to predict what a team's Points Per Game will be based on the number of +20 yard plays it generates during the season.**

__Why Are Explosive Plays Important?__

Before we get into predictions, let's take a look at the **Correlation Plot** that I created below using data from the 2018-19 college football season. In statistics, a **correlation** refers to the **degree to which a pair of variables are related in a linear fashion.** The relationship can be **positive**, where as one variable goes up the other goes up (think Yards Per Game & Points Per Game), or it can be **negative**, where one variable goes up and the other goes down (think Wins & Defensive Points Per Game). As Defensive Points Per Game gets lower, wins get higher! That is a strong negative correlation.

Correlations are also useful because they can **indicate a predictive relationship**, so we start with this correlation plot to see how a number of offensive categories are related. Correlations are measured on a scale of -1 to 1: the closer the value is to 1 (or to -1 for a negative relationship), the stronger the correlation, with **1 being perfectly correlated.**
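For readers who want to see how a correlation is actually calculated, here is a minimal Python sketch of the Pearson correlation coefficient. The function name `pearson_r` and all of the numbers are hypothetical and for illustration only; they are not the 2018-19 team data behind the plot.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy numbers for five made-up teams (NOT real data)
yards_per_game = [300, 350, 400, 450, 500]
points_per_game = [18, 24, 27, 33, 38]
def_points_allowed = [35, 30, 28, 21, 17]
wins = [3, 5, 6, 9, 11]

# Positive relationship: both go up together
print(round(pearson_r(yards_per_game, points_per_game), 3))
# Negative relationship: one goes down as the other goes up
print(round(pearson_r(def_points_allowed, wins), 3))
```

A value near 1 means the two columns rise together, and a value near -1 means one rises as the other falls.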

**Key takeaways from the correlation plot**

- Points Per Game & Wins are highly correlated (72%)
- Yards Per Game & Points Per Game are highly correlated (89%)
- Yards Per Game, Points Per Game & Wins are all highly correlated with **Offensive Plays of +20 yards**, at 89%, 83% & 59% respectively

So as you can see, explosive plays of 20 yards or more are directly related to **more yards, more points and more wins!** If you need more proof, just look at the visuals below.

**Explosive Play Ratio**

Using data from the 2018-19 college football season, I created an **Explosive Play Ratio** for each of the 130 FBS teams; you can see this number in the "Ratio" column below. The ratio is the number of offensive snaps per +20 yard play, so a lower number is better. The first group shown below is the **Top 10 teams in the country for +20 Explosive Play Ratio**. Oklahoma at #1 had an amazing 8.3 ratio, meaning they had a +20 yard gain just about every 8 snaps!
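The article describes the ratio as one +20 yard gain "about every 8 snaps", which suggests it is simply total offensive snaps divided by the number of +20 yard plays. The sketch below works under that assumption; the function name and the season totals are hypothetical and are not any team's real numbers.

```python
def explosive_play_ratio(total_snaps, explosive_plays):
    """Offensive snaps per +20 yard play (assumed definition; lower is better)."""
    if explosive_plays <= 0:
        raise ValueError("a team needs at least one explosive play to have a ratio")
    return total_snaps / explosive_plays

# Hypothetical season totals chosen only to illustrate the arithmetic
print(round(explosive_play_ratio(830, 100), 1))  # 8.3 snaps per explosive play
print(round(explosive_play_ratio(825, 25), 1))   # 33.0 snaps per explosive play
```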

__The top ten Explosive Play teams__ by Ratio had an average of **9.5 Wins** and averaged **41.49 Points Per Game.** Compare that to the **bottom 10 Explosive Play teams** below, who had an average of only **5 wins** and **19.07 Points Per Game**.

**Central Michigan** had the distinction of being the worst team in the country at generating explosive plays, with a play of 20 or more yards roughly every 33 snaps.

**Relationship between Explosive Plays & Points Per Game**

To better understand and visualize the relationship between +20 yard explosive plays and Points Per Game, we can plot the output of **all 130 FBS teams**. As you can see, there is a clear **linear relationship between Offensive Plays of +20 yards and Points Per Game**: as the total number of offensive +20 yard plays goes up, Points Per Game goes up.

__Creating the Predictive Model (Linear Regression)__

Just because two variables have a strong correlation, it does not always mean that one variable predicts the value or outcome of the other. In order to run a predictive model we must start with a **regression analysis.** A regression analysis is a statistical technique for estimating the change in a dependent variable, **(Points Per Game)** in this case, due to a change in one or more independent variables **(+20 yard Plays). A regression** is a powerful and flexible tool used to estimate or forecast outcomes on the basis of past or present data. The purpose of our linear regression is to create a model that will **predict the expected Points Per Game based on differing totals of +20 yard plays.**
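To make the idea concrete, here is a minimal sketch of how a one-variable linear regression finds its line using ordinary least squares. This is not the code used for the article; the function name `fit_line` and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical season totals for five made-up teams (NOT real data)
plays_20_plus = [30, 45, 60, 75, 90]
points_per_game = [17.0, 23.5, 28.0, 35.5, 40.0]

intercept, slope = fit_line(plays_20_plus, points_per_game)
print(f"Points Per Game = {intercept:.2f} + {slope:.3f} * (+20 yard plays)")
```

The slope is the estimated change in Points Per Game for each additional +20 yard play, and the intercept is the model's estimate for a team with zero such plays.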

__Linear Model (Points Per Game ~ +20 Yard Plays)__

Using the data from the 2018-19 college football season in the model, I've included some of the output from the regression analysis.

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| **(Intercept)** | 5.49122 | 1.44262 | 3.806 | 0.000218 | \*\*\* |
| **O.20.** | 0.38789 | 0.02275 | 17.048 | < 2e-16 | \*\*\* |

Significance codes: 0 ‘\*\*\*’ 0.001 ‘\*\*’ 0.01 ‘\*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Multiple R-squared: 0.6942, **Adjusted R-squared: 0.6918**

F-statistic: 290.6 on 1 and 128 DF, p-value: < 2.2e-16

The output of the model and the mathematics involved are outside the scope of this article, but the **Adjusted R-squared is really the only thing you need to be aware of.** The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model. Think of it as a **Predictive Strength score**. R-squared is always between **0 and 100%:** 0% indicates that the model explains none of the variability of the response data around its mean, and 100% indicates that the model explains all of it. In general, the higher the R-squared, the better the model fits your data. So in this case the **R-squared is 69%, which is very strong!**
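As a quick sanity check on those numbers, the sketch below shows how R-squared and Adjusted R-squared are calculated and plugs in the figures reported above. The helper function names are my own, and the observation count of 130 is taken from the 130 FBS teams (128 degrees of freedom plus the two estimated coefficients).

```python
import math

def r_squared(observed, predicted):
    """Share of the variation in `observed` that the predictions explain."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Figures reported in the model output: R-squared 0.6942, 130 teams, 1 predictor
adj = adjusted_r_squared(0.6942, n_obs=130, n_predictors=1)
print(round(adj, 4))  # 0.6918, matching the reported Adjusted R-squared

# With a single predictor, R-squared is the squared correlation, so its
# square root should be close to the 83% correlation listed earlier.
print(round(math.sqrt(0.6942), 2))  # 0.83
```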

__Predicting Points Per Game__

**Now comes the fun part!** Using the output from the model, we can now predict the Points Per Game total for different values of a team's total number of +20 yard plays.

We start by creating a new data frame containing 13 new +20 yard play totals for the season.

**New +20 Yard Play Totals** (20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 95, 100, 120)

Now that we have our new +20 yard play totals, we can run the prediction. You can see the new predicted Points Per Game totals below. **Rows 1 - 13** correspond to the **new +20 yard play totals that we created**. Because of the uncertainty in any prediction, each **row has 3 different totals** (fit, lwr & upr); see the Confidence Interval section below. The fit is the predicted Points Per Game total, and lwr and upr are the lower and upper bounds of the 95% confidence interval.

**New +20 Yard Play Totals** 1) 20, 2) 25, 3) 30, 4) 35, 5) 40, 6) 45, 7) 50, 8) 60, 9) 70, 10) 80, 11) 95, 12) 100, 13) 120
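The fit value for each row can be reproduced by hand from the two coefficients in the regression output (intercept 5.49122, slope 0.38789). Here is a minimal sketch; the function name `predict_ppg` is my own, and it only reproduces the fit column, since the lwr and upr bounds require the full team-level data.

```python
# Coefficients taken from the regression output in this article
INTERCEPT = 5.49122
SLOPE = 0.38789  # the O.20. term: Points Per Game added per +20 yard play

def predict_ppg(plays_20_plus):
    """Point prediction (the 'fit') for a season total of +20 yard plays."""
    return INTERCEPT + SLOPE * plays_20_plus

new_totals = [20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 95, 100, 120]

# Print row number, +20 yard play total, and predicted Points Per Game
for row, total in enumerate(new_totals, start=1):
    print(f"{row:>2}  {total:>3}  {predict_ppg(total):.2f}")
```

For example, 70 plays gives 5.49122 + 0.38789 x 70 = 32.64 Points Per Game.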

__Confidence Interval__

As previously stated, any prediction comes with a **level of uncertainty.** The confidence interval reflects the uncertainty around the mean predictions. To display the 95% confidence intervals around the mean predictions, we can add them to the prediction output.

The output contains the following columns:

- **fit:** the predicted Points Per Game values for the 13 new +20 play totals, same as above
- **lwr** and **upr:** the lower and the upper confidence limits for the expected values, respectively

By default the function produces the 95% confidence limits.

For example, let's take a look at **Row #9** above, which is the predicted Points Per Game based on 70 plays of 20 or more yards in a season. The predicted Points Per Game total for that team is 32.64 (fit), and the 95% confidence interval associated with a +20 yard play total of **70** is (**31.88, 33.40**) (lwr, upr). This means that, according to our model, teams with 70 total plays of +20 yards or more during the season will score on average between 31.88 and 33.40 Points Per Game. **Pretty cool, right?**

One more example: take a look at **Row #12**, which represents a team with **100 total plays** of 20 yards or more. According to our model, the predicted Points Per Game for this team is **44.28**, with a 95% confidence interval of **42.43 - 46.12 Points Per Game.**
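Two properties of these intervals are worth checking. In a simple linear regression, the confidence interval for the mean prediction is centered on the fit, and it gets wider as the input moves further from the average of the data. The sketch below uses a hypothetical helper, `interval_summary`, to confirm both using the two intervals quoted above.

```python
def interval_summary(lwr, upr):
    """Return the midpoint and half-width of a confidence interval."""
    midpoint = (lwr + upr) / 2
    half_width = (upr - lwr) / 2
    return midpoint, half_width

# Intervals quoted in the article for 70 and 100 plays of +20 yards
mid_70, hw_70 = interval_summary(31.88, 33.40)
mid_100, hw_100 = interval_summary(42.43, 46.12)

# Midpoints should line up with the fits (32.64 and 44.28, up to rounding)
print(f"70 plays:  midpoint {mid_70:.2f}, plus or minus {hw_70:.2f}")
print(f"100 plays: midpoint {mid_100:.2f}, plus or minus {hw_100:.2f}")
```

The wider interval at 100 plays is consistent with 100 sitting further from the typical team's total than 70 does, so the model has less data nearby to pin down the average.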

So that is an overview of my model for predicting Points Per Game based on the number of +20 yard plays your team generates. Hopefully you enjoyed this analysis and can see just how powerful predictive analytics can be.

**As always, I am available to speak one on one with coaches and radio & TV personalities who are interested in further analysis and insights. Feel free to reach out to me at any time at info@acefootballanalytics.com**