The Trouble Under Center

WFU Sports Analytics Club president Jonathan Bell has written a new academic article on predicting the NFL success of college quarterbacks. Check out the article here!


Are the 2018 Red Sox the Best Regular Season Team in History?

MLB: Tampa Bay Rays at Boston Red Sox
Apr 7, 2018; Boston, MA, USA; Boston Red Sox right fielder J.D. Martinez (28) watches the ball after hitting a home run against the Tampa Bay Rays in the seventh inning at Fenway Park. Mandatory Credit: Brian Fluharty-USA TODAY Sports


The Boston Red Sox have seemingly had just about everything a team could ask for so far in the 2018 season.  Offseason acquisition J.D. Martinez is making a serious bid for the triple crown and the MVP award, with the only other serious contender being teammate Mookie Betts, who leads the AL with a preposterous 8.3 WAR.  Through 146 innings, Chris Sale has been mowing down opponents to the tune of a 1.97 ERA, 219 strikeouts and a 0.849 WHIP. Six other members of the team have a WAR above 2. There is no question that the 2018 edition of the Red Sox has been a historically great regular season team thus far, but will they be the greatest ever?

The current record for wins in an MLB season is 116, held jointly by the 2001 Seattle Mariners and the 1906 Chicago Cubs.  The Red Sox head into Monday night's matchup against the Cleveland Indians with an 88-37 record. With 37 games remaining on the schedule, the Red Sox need an exceptional finish to eclipse the 116-win mark: a 29-8 run to go down in the record books.  I was curious just how much of a shot the Sox had at 117 wins, so I wrote some code to help me answer this question.

I decided the most efficient way of solving this problem was to simulate the rest of the Red Sox's 2018 season 10,000 times and then look at how many of the simulations ended with more than 116 wins.  So I first headed over to FiveThirtyEight to collect some data. FiveThirtyEight lists win probabilities for every game left in the major league season; since we are only interested in those that involve the Red Sox, we can see all of these data points here.

Saving these data points into a CSV file, I was then able to read in and store these values.  To simulate each "game" left in the season, the code generates a random number between 0 and 100.  If the number is less than the corresponding win probability value, a win gets stored; if it is more than the value, the game goes down as a loss.  For example, FiveThirtyEight gives Boston a 55% chance of winning against Cleveland tonight, so any random number less than 55 results in a win and anything more results in a loss.  At the end of each "season", the program checks whether more than 28 wins were recorded. After 10,000 such simulations, we can look at how many times the Red Sox entered the history books.

Relatively surprisingly (and, to my Red Sox fan self, sadly), the Red Sox only have about a 3% chance to break the MLB wins record according to my model (314 out of 10,000 simulations). When looking at the chances to break or tie the record, the percentage roughly doubles to around 7% (703 out of 10,000).

With ace Chris Sale still on the DL and series against the Astros, Indians, Yankees and Braves coming up, it seems that Boston faces an uphill battle in its attempt to rewrite history.


Code for my simulation model below:

//program to solve the 116 win question

#include &lt;cstdlib&gt;
#include &lt;fstream&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

using namespace std; //imports and namespace statements

#define DEFAULT_RSEED 1299709 //defining constants

int main() {
    vector&lt;double&gt; probs; //vector to store game probabilities from 538
    ifstream inFS;        //input file stream
    string fileName;

    cout &lt;&lt; "Which database would you like to use? ";
    cin &gt;&gt; fileName; //reading in the file with game probabilities from 538

    inFS.open(fileName);
    if (!inFS.is_open()) {
        cout &lt;&lt; "Could not open file " &lt;&lt; fileName &lt;&lt; endl;
        return 1; //if the file does not open, print an error message and exit
    }

    double winProb;
    while (inFS &gt;&gt; winProb) {
        probs.push_back(winProb); //push each game probability onto the vector
    }

    srand(DEFAULT_RSEED); //seed the random number generator

    int winc = 0;   //win count for the current simulated season
    int sim = 10000; //number of seasons to simulate
    int totalc = 0; //total seasons simulated so far
    int probn = 0;  //seasons in which the record is broken

    while (totalc &lt; sim) { //while the total count is less than the desired number of sims
        winc = 0;
        for (size_t i = 0; i &lt; probs.size(); ++i) { //iterate over the probs vector
            int random = rand() % 100; //generate a random number between 0 and 99
            if (random &lt; probs.at(i)) {
                winc++; //a draw below the win probability counts as a win
            }
        }
        if (winc &gt; 28) {
            probn++; //more than 28 additional wins means the Red Sox break the record
        }
        totalc++; //add one to the total count
    }

    cout &lt;&lt; probn &lt;&lt; endl; //number of simulated seasons with 117+ wins
    return 0;
}



Predicting The NFL Draft: RB Round Selection


By Mason Kirchner

In honor of this weekend’s NFL Draft, I decided to make a spiritual successor to my recent article that tried to predict how well incoming NFL rookies would perform in fantasy football. Feel free to read that to get an idea of the dataset used in this project, but note that the method is slightly different this time around: instead of trying to predict how well an incoming rookie will perform in the NFL, I decided to try to predict which round the player will be selected in.


In order to get more out of my data, I introduced squared terms and Principal Component Analysis. The hope is that by squaring each of the individual statistics, such as yards, some sort of inflection point can be more easily identified in the classification problem. Principal Component Analysis is then used to transform the data into a smaller set of uncorrelated components, reducing redundancy among the different features.
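As a rough sketch of that preprocessing step (my own illustration, not the article's actual notebook — the random matrix stands in for the real combine/production statistics), the squared terms and PCA can be chained like this with Sk-Learn:

```python
# Sketch: append a squared copy of every statistic, then reduce with PCA.
# X here is synthetic stand-in data, not the real player dataset.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))       # 40 players, 5 raw statistics (e.g. yards, carries)

X_sq = np.hstack([X, X ** 2])      # add a squared term for every feature
X_scaled = StandardScaler().fit_transform(X_sq)

pca = PCA(n_components=0.95)       # keep components explaining 95% of the variance
X_pca = pca.fit_transform(X_scaled)
print(X_pca.shape)                 # (n_players, n_components retained)
```

Scaling before PCA matters here, since otherwise the squared columns would dominate the components purely by magnitude.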

This time around, instead of using purely random forests, I decided to use a voting ensemble method. This video sums up ensembles nicely, but the strength in using this over just random forests is that it allows me to combine multiple models to get better predictions.  In addition to voting, I used bagging, which is explained in this video, in order to additionally strengthen each of the individual models. The three methods used in the voting ensemble are K Nearest Neighbors, Random Forests, and Extra Random Forests, all of which were bagged. Other methods, such as SVM and AdaBoost were tested, but didn’t perform as well as the three that I settled on.
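For readers who want to see the shape of that setup in code, here is a minimal sketch of a soft-voting ensemble of bagged KNN, Random Forests, and Extra Random Forests, shown on a synthetic eight-class problem (the dataset, sizes, and hyperparameters are placeholders, not the article's actual configuration):

```python
# Sketch of a voting ensemble over bagged models, on synthetic 8-class data
# (one class per round plus undrafted status).
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=8, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("knn", BaggingClassifier(KNeighborsClassifier(), random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),   # trees are bagged internally
        ("etc", ExtraTreesClassifier(random_state=0)),
    ],
    voting="soft",  # average class probabilities -- what yields round probabilities
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:1]).shape)  # one probability per class
```

Soft voting is what makes the per-round probability table below possible, since the ensemble averages each model's predicted class probabilities rather than just tallying votes.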

It’s worth noting that the best individual model (a bagged Random Forest) only had .516 mean accuracy in cross-validation, and that the voting ensemble performed at .506 mean accuracy in cross-validation. This can be attributed to there being 8 different classifications – one for each round, as well as undrafted status. With this many possible classifications, correctly predicting an average of around 50% of the samples is alright in my book.


Without further ado, here’s the complete table of round probabilities for players that went to the 2017 Combine (and Joe Mixon):

Name Undrafted Round1 Round2 Round3 Round4 Round5 Round6 Round7
McCaffrey, Christian 0.092 0.049 0.175 0.159 0.193 0.211 0.080 0.042
Jones, Aaron 0.104 0.112 0.192 0.140 0.195 0.149 0.066 0.043
Pumphrey, DJ 0.106 0.091 0.196 0.178 0.185 0.137 0.069 0.040
McNichols, Jeremy 0.136 0.108 0.164 0.126 0.202 0.163 0.094 0.007
Cook, Dalvin 0.148 0.101 0.150 0.107 0.268 0.148 0.057 0.022
Hill, Brian 0.150 0.079 0.212 0.132 0.131 0.193 0.096 0.007
Hunt, Kareem 0.188 0.038 0.157 0.076 0.140 0.283 0.083 0.036
Foreman, D’onta 0.189 0.129 0.195 0.112 0.105 0.189 0.055 0.028
Williams, Joe 0.213 0.038 0.159 0.097 0.188 0.163 0.101 0.041
Mixon, Joe 0.215 0.067 0.203 0.059 0.138 0.135 0.126 0.058
Williams, Jamaal 0.228 0.042 0.148 0.093 0.181 0.194 0.091 0.024
Mack, Marlon 0.241 0.045 0.105 0.135 0.100 0.150 0.145 0.079
Dayes, Matt 0.316 0.027 0.076 0.038 0.084 0.195 0.168 0.096
Clement, Corey 0.341 0.046 0.108 0.068 0.086 0.197 0.115 0.039
Williams, Stanley 0.359 0.028 0.098 0.056 0.107 0.152 0.117 0.084
McGuire, Elijah 0.380 0.009 0.094 0.074 0.110 0.165 0.120 0.048
Perine, Samaje 0.380 0.017 0.080 0.069 0.118 0.083 0.206 0.047
Gallman, Wayne 0.381 0.011 0.084 0.050 0.087 0.180 0.148 0.059
Conner, James 0.384 0.036 0.093 0.075 0.092 0.119 0.148 0.053
Redding, Devine 0.446 0.012 0.051 0.028 0.101 0.147 0.146 0.068
Thomas, Jahad 0.455 0.016 0.082 0.042 0.118 0.071 0.142 0.075
Hood, Elijah 0.466 0.055 0.049 0.057 0.084 0.130 0.074 0.084
Fournette, Leonard 0.468 0.059 0.052 0.078 0.094 0.112 0.065 0.072
Smith, De’Veon 0.538 0.025 0.039 0.054 0.052 0.116 0.060 0.118
Stevenson, Freddie 0.559 0.014 0.037 0.075 0.069 0.045 0.129 0.073
Logan, TJ 0.561 0.006 0.034 0.048 0.073 0.074 0.150 0.054
Kamara, Alvin 0.561 0.009 0.045 0.074 0.100 0.072 0.111 0.028
Cohen, Tarik 0.571 0.007 0.028 0.071 0.106 0.065 0.064 0.088
Rogers, Sam 0.582 0.003 0.028 0.073 0.043 0.033 0.133 0.107
Davis, Justin 0.608 0.009 0.026 0.045 0.063 0.081 0.084 0.085
Henderson, De’Angelo 0.620 0.007 0.024 0.084 0.096 0.054 0.060 0.056
Carson, Christopher 0.652 0.010 0.063 0.042 0.052 0.041 0.102 0.038
Ogunbowale, Dare 0.709 0.002 0.056 0.047 0.061 0.028 0.067 0.029
Shell, Rushel 0.773 0.010 0.018 0.035 0.029 0.030 0.058 0.048

The table itself is sorted from least to greatest on the probability that the player goes undrafted. You can download the Excel file if you wish to make it sortable yourself.


We need to be careful in our analysis of these numbers not to assume that the highest predicted probability for each player is what is most likely to occur.

For example, Samaje Perine is given a .38 predicted probability of being undrafted, or equivalently a .62 probability of being drafted. These predicted probabilities are more useful to us if we treat them as a probability distribution over where a player might be drafted. Adding up his predicted probabilities for rounds four through six, we find that Perine has a .407 predicted probability of going somewhere in that range.
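That arithmetic can be checked directly against Perine's row in the table above:

```python
# Samaje Perine's predicted round probabilities, copied from the table above
perine = {"Undrafted": 0.380, "R1": 0.017, "R2": 0.080, "R3": 0.069,
          "R4": 0.118, "R5": 0.083, "R6": 0.206, "R7": 0.047}

p_drafted = 1 - perine["Undrafted"]                       # any round at all
p_rounds_4_to_6 = perine["R4"] + perine["R5"] + perine["R6"]
print(round(p_drafted, 3), round(p_rounds_4_to_6, 3))     # 0.62 0.407
```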

The model still could use plenty of help. As mentioned previously, it only averaged .506 accuracy during cross-validation. There are some notable players that the model doesn’t like, such as Leonard Fournette, who will likely be the first RB taken Thursday night. Alvin Kamara is another potential first- or second-rounder that is getting snubbed by the model. Their poor scores likely stem from the fact that both Fournette and Kamara missed games due to injury this past season.

Hopefully you enjoyed this exercise, and enjoy the NFL Draft as well.

Feel free to check out the code here.



Identifying Breakout Rookie Running Backs in Fantasy Football


By Mason Kirchner

Everybody is looking for the edge to winning their fantasy football leagues, and one of the easiest ways to win is by landing a breakout running back on one’s roster. With every expert proclaiming that “their guy” is the next Jordan Howard or Ezekiel Elliott, it’s easy to get lost in a sea of unfounded speculation. The goal of this article is simple: use a machine learning model to identify the RBs in the 2017 rookie class that look poised to have a breakout season at some point in their careers.

The Data

The data acquired for the analysis is a combination of combine metrics and on-field performance statistics. The combine metrics include each player’s Height, Weight, BMI, Arm Length, Hand Size, 40 Yard Dash time, Bench Press repetitions, Vertical Leap height, Broad Jump distance, Shuttle time, and Three-Cone time. If a player was not invited to the combine, or did not run at the combine, their unofficial Pro Day numbers were used if available.

The on-field performance statistics are divided into three different categories (note that all of these statistics are drawn from the player’s final year of college football).

The first group of statistics are the elementary rushes, yards, and yards per carry numbers.

The second group of statistics are more advanced and are provided by Player Profiler. These statistics include Breakout Age, College Dominator, and College Target Share. The idea behind College Dominator and College Target Share is that they give a measurement of how much of their college team’s offense that they were responsible for, and Breakout Age provides the age in which the player first became responsible for over 30% of their team’s offense. There has been previous research done using these metrics by the writers over at RotoViz (note: RotoViz has a light pay-wall).

The third group of statistics is a group of statistics developed by Bill Connelly over at College Football Studyhall. Connelly developed these statistics as an attempt to try to mitigate the running back’s offensive line play, and figure out how well a running back performs when they’re on their own in the open field. You can read more about these statistics here. The Connelly statistics that are used include Highlight Yards, Highlight Yards per Carry, Highlight Opportunities, Highlight Yards per Opportunity, and Opportunity Rate.

Note: Both Player Profiler and Connelly provide the Target Share statistic.

The goal is to predict whether or not a player will finish the season in the top 12 highest scoring RBs in a Points Per Reception (PPR) league. To get an idea about which features are correlated with an RB1 season in PPR, the following correlation plot is made:

Correlation Plot Between Variables

The area of interest in this plot is the final row and column — the statistics correlated with having a PPR RB1 season. Most of the data are, at most, weakly correlated with an RB1 season, but it is worth noting that the combine drills are the least correlated of all. The Highlight Yards statistic is the most correlated with an RB1 season.


Random Forest is the machine learning method used in this experiment. The model was trained using 80 percent of the samples in the data, with the final 20 percent held out for validation. Given the nature of Random Forests, over-fitting is mitigated, so all features described earlier were used in the model. The other nice thing about Random Forests is that we can easily work around the issue of missing data, which arises frequently in this exercise due to many players not participating in certain combine drills, or the absence of the Connelly statistics, which only exist at the FBS level of college football.

Using Sk-Learn’s Random Forest Classifier, a model was created that was able to predict the RB1 status in the validation set with perfect accuracy. It’s worth noting that most of the players in the sample are not classified as having an RB1 season, so the model only had to accurately predict the one player in the validation set that was an RB1 (Melvin Gordon). Below is the importance of each feature in the model (greater values indicate higher importance):
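As a sketch of that pipeline (synthetic data and a simple median imputation standing in for the article's actual missing-data handling, which isn't specified), the Sk-Learn workflow looks roughly like this:

```python
# Sketch: impute missing values, 80/20 split, fit a Random Forest,
# and read out feature importances. Data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
X[rng.random(X.shape) < 0.1] = np.nan      # simulate missing combine/Connelly stats
y = (rng.random(100) < 0.15).astype(int)   # rare RB1 label

X_imp = SimpleImputer(strategy="median").fit_transform(X)  # one missing-data workaround
X_tr, X_te, y_tr, y_te = train_test_split(X_imp, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(model.feature_importances_)          # larger values = more important features
```

`feature_importances_` is what produces the importance chart discussed next.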

Importance of Each Model Feature

Interestingly, hand size is the most important feature for predicting RB PPR success in this model. This can likely be attributed to a few things. The top three hand sizes in the data set all turned out to be RB1s, so the model is trained to favor large hand measurements. Large hands also help an RB catch the ball more easily and prevent fumbling — attributes that will increase a player’s PPR score.

Not as surprisingly, College Dominator, BMI, and College Target Share round out the top four most important features. The previous research from RotoViz showed that College Dominator and Target Share proved to be valuable, and our model further backs up their claim. BMI is intuitively valuable as well, because an ideal RB will have a high weight-to-height ratio that makes them difficult to tackle.

Predicting the 2017 Class

Using the model, and the available data from the 2017 RB Class, here are the predicted probabilities that each RB will have an RB1 season in PPR:

The most notable entry on this list is located at the top: Aaron Jones out of UTEP. Predicted by most experts to be a day three selection, this model gives him the highest probability in his draft class to be an RB1 at 61 percent. Digging further into his numbers, we find a player that tested near the top of his position group in multiple combine drills, notably the Broad Jump, 3-Cone Drill, and Vertical Jump — all drills that measure an athlete’s explosiveness.

Aaron Jones’ Combine Measurables. Courtesy of


Additionally, Jones put up staggering production numbers during his 2016 campaign, accounting for 47% of his team’s offense on his way to running for 1,773 yards. My favorite statistic of his is that he ran for nine Highlight Yards per Opportunity in his final season. In other words, for every rush of his that went over five yards, on average he rushed for an additional nine. With his high level of athleticism and great on-field performance, it makes sense that the model loves him.

Meanwhile, the (nearly) consensus number one prospect in this class, Leonard Fournette, ranks out as the 12th most likely RB to post an RB1 season in PPR at a probability of 0.2. What’s up with this? I can offer a few explanations.

The first is that Fournette missed five games last season, which undoubtedly hurt his season-long numbers. On top of that, Fournette only participated in two drills at the NFL Combine: the 40 Yard Dash and the Vertical Jump. His absence from drills, along with his historically bad 28.5″ Vertical Jump, further adds to the equation.

He also has kind of small hands.

Put this all together, and the model ends up ranking him 12th in his class. That being said, a 20 percent chance of being an RB1 in PPR isn’t really that bad, especially considering this is who we’re talking about:


Goes without saying that this model is best paired with an eye test.

Further Work

A notable missing component from the model at this point is draft position. I didn’t want to include it in this model because the 2017 Draft has not yet occurred, but I still wanted to try my hand at playing GM for the upcoming draft. There has been prior research showing that draft position is heavily correlated with NFL success, so Fournette should see a nice bump in his ranking if he gets drafted at the top of the first round, and Jones will likely see a decrease if he is taken in a later round.

There’s an argument to be made for implementing some feature reduction as well. While Random Forests shouldn’t run into an issue with over-fitting, some of the less important variables may be unnecessary to include in the model, as they add minimal value while introducing noise.

Ethan Young also used some interesting variables in his recent QB analysis article in order to factor in how the player’s situation affects performance. Introducing variables that account for level of competition, era, surrounding talent, and scheme into this model would hopefully be just as worthwhile for RBs as they appear to be for QBs. Look for these variables to be in future models.

For those curious to see how this analysis was done, check out the Python notebook here.



The Dilemma of the Aging Quarterback



By: Jonathan Bell

In the NFL, the first step to achieving and maintaining long-term success for a franchise is finding a quarterback that you can build your team around.  All of the greatest dynasties in NFL history are synonymous with passers known by one name: Bradshaw, Montana, Aikman, and more recently Brady, Rodgers and Manning.  Yet the annual promise of contention that comes with a franchise quarterback carries one inevitable drawback: the unavoidable reality that their star player's production will decline, perhaps drastically.

Often, we see general managers holding onto their beloved quarterbacks for just one season too long and the results (both on the individual and team level) can be shocking from year to year.  An understanding of when and why players regress is an essential skill for general managers, which is why Bill Belichick has been so incredibly successful.  In this article, I will use three case studies (Carson Palmer, Peyton Manning and Brett Favre) to attempt to solve the mystery of quarterback regression.

Peyton Manning

Peyton Manning’s late-career move to the Denver Broncos breathed new life into a career that many thought was over.  After missing almost all of 2011 due to a debilitating neck injury, Peyton made the most of his fresh start in Denver.  From 2012-2014, Manning posted an unprecedented stretch of numbers during the regular season, leaps and bounds ahead of the rest of the league.  To better visualize just how good Manning’s numbers were over this three year period, I made graphs using 3 key statistics: DVOA, DYAR, QBR.  

These are three advanced statistics utilized by Football Outsiders to judge quarterback performance better than more traditional statistics such as yards, touchdowns, interceptions, etc.  These statistics are fully explained on the website, but essentially a higher DYAR means a quarterback with more total value and a higher DVOA means a quarterback with more value per play; QBR is simply a rating of everything a quarterback does in the run of play (not just passing) and is normalized on a scale ranging from 0-100. Manning’s face represents his data points on the charts.

[Four charts: DYAR vs. QBR (left) and DYAR vs. DVOA (right), for Manning's 2012-2014 peak (top) and his 2015 season (bottom)]





These four charts depict two different things.  The charts on the left graph DYAR on the Y axis against QBR on the X axis, while the ones on the right graph DYAR against DVOA on the X axis.  The top charts are from Manning’s three-year peak and the bottom charts are from Manning’s final season.  The extreme nature of Manning’s collapse is astounding.  Manning lapped the field during his three-year peak; no other quarterback was really even close.  He led the field in all three statistics and was the only quarterback with an average DYAR over 1500 (1,897) and a QBR over 80 (81).  However, in 2015, Manning fell to dead last in DVOA and DYAR and was in the bottom three for QBR.  So what caused this remarkable decline?  While one might point to Manning’s mounting list of injuries and age-related decline as the major factors accounting for the dreadful season, I believe there was something else at play.

As most of us are only casual observers of the game of football, we only really get to see the game as it’s presented to us on television.  We see the quarterback drop back to pass and then the ball fly in the air towards the receiver.  However, we don’t get to see how these plays develop, how each player performs in their individual role and how this performance affects the outcome of the game as a whole.  Football Outsiders does an excellent job of resolving this problem, ranking individual players and offensive/defensive line units to determine overall effectiveness.  

Quarterback performance begins with protection, and Peyton Manning’s offensive line ranked near the top during his first three years with the Broncos. From 2012 to 2014, the Broncos unit finished no worse than second in Football Outsiders’ rankings (which are based on adjusted sack rate) and was top of the league in Manning’s final effective season behind center.  Over those three years, the Broncos experienced little turnover on their offensive line.  They lost only two starters: OT Ryan Clady, whose absence was only temporary, as he missed 14 games in 2014 due to injury, and T/G Zane Beadles, who left as a free agent after 2013.  Otherwise, they started some combination of the same 7 linemen for 3 straight seasons, with a few exceptions.

In the modern NFL, it is nearly impossible to start essentially the same combination of 5 to 7 core guys for almost three seasons and retain that level of performance. In combing through the available data on Football Outsiders, only one other team retained a top two position for three consecutive years in the pass blocking category (Indianapolis from 1999-2001) and none in the following 16 years.  

The offensive line also benefitted enormously from offensive coordinator Adam Gase’s offensive scheme.  Gase’s spread formation, often featuring 4 or more wideouts, would ensure that the defense could rush no more than 4 defenders, fearing Manning’s deep-threat ability.  Combining the spread offense with Manning in the shotgun for the vast majority of his snaps made the job significantly easier for an already talented offensive line.

Perhaps even more astonishing than the elite level performance of Manning’s line over this period was that of his receiving corps.  Blessed with Emmanuel Sanders, Demaryius Thomas, Wes Welker and Eric Decker over the three years, the Broncos were never short of talent at the wideout position.  While advanced wide receiver statistics aren’t perfect, (Football Outsiders even warns, “We cannot yet fully separate the performance of a receiver from the performance of his quarterback. Be aware that one will affect the other.”) they still tell us a good deal about wide receiver performance compared to the rest of the field.  

What the Broncos receivers do in this respect is what amazes.  The Broncos had the number one ranked receiver two out of three years, two receivers in the top five two out of three years, and two in the top ten all three years during Manning’s tenure.  The Broncos were basically an offensive Pro Bowl team in front of Manning.  This historic run stands alone in the 29-year database provided by Football Outsiders.

The Broncos’ offensive statistics plummeted in 2015.  Their once-elite pass protection fell from 1st in the league to 13th, and their highest-ranked wideout was Emmanuel Sanders at 49th (49th!), followed by Demaryius Thomas at 60th.  The problems in 2015 began before the season even started, with the hiring of Gary Kubiak as head coach.  Previously the head coach of the Texans for 7 years and coming off a single season as offensive coordinator for the Baltimore Ravens, Kubiak was a respected NFL coach who had enjoyed some success; however, Kubiak installed his own offense in Denver, involving a new zone blocking scheme with many plays run from the pistol formation (a shortened shotgun).  Kubiak was changing an offense that had been very successful over the past few years.  His zone blocking disrupted the chemistry of an offensive line that had already lost two starters in Orlando Franklin and Will Montgomery, and his pistol formation gave less time and comfort in the pocket to the 39-year-old Manning, who had been using the same playbook for the past 3 seasons.  Kubiak was convinced that the scheme he had been running for years, having had success with Schaub and Foster and then with Flacco and Forsett, would work with the Broncos.

While I was originally going to go into the same amount of detail in the cases of Brett Favre and Carson Palmer, I decided there was no need: the exact same trends were observable in both cases.  In 2015 (Palmer’s excellent year before his 2016 regression), Palmer’s top 2 targets, Larry Fitzgerald and John Brown, were both in the top 5 of Football Outsiders’ receiver rankings.  Their offensive line was ranked 5th in the NFL in pass protection.  In 2016, Arizona’s offensive line plummeted to 21st in pass blocking, and they failed to have a receiver crack the top 50.

Brett Favre, for his part, enjoyed the number one rated receiver during his resurgent 2009 campaign…Sidney Rice (Rice never posted another top-50 year and was out of the league by 2013), along with the 14th-rated offensive line.  You can guess what happened in 2010.  Favre’s offensive line fell into the bottom third of the league and the Vikings failed to have a receiver finish in the top 20.

As quarterbacks get older, they become progressively more injury-prone.  At the end of his career, Peyton Manning was battling two different nagging injuries: plantar fasciitis in his foot as well as nerve issues in his neck.  Quarterbacks also lose some of the natural physical ability they had when they were younger.


Peyton Manning completing an “out” route for a touchdown to Emmanuel Sanders in 2014


Peyton Manning throwing the same out route in 2015, this time returned by Dre Kirkpatrick for a pick six

However, these simple age-related problems remained hidden for at least one year in all three of the cases I studied.  Age and injuries alone were not the most important determinants of regression.  Why?  Because the physical problems that these players faced were hidden by incredible (and potentially anomalous) play from their surrounding teammates.

The extraordinary performances by the Broncos and Cardinals offensive line and receiving corps (and to a slightly lesser degree, Favre’s Vikings) were able to mask the declining natural ability of the quarterbacks.  In fact, they masked it so well that these quarterbacks appeared to be at the top of their games.  So what’s the point of this information and data I’ve collected?  Aware of these deadly trends, NFL front offices can use this data to determine if their quarterback is “at-risk” for regression.  

By analyzing the tell-tale warning signs (aging quarterback, incredible and unsustainable wide receiver/offensive line play) they can make better football decisions.  By either trading your player to an unsuspecting team or adding a new quarterback via free agency or the draft, a GM can deftly avoid the dilemma of playing (and paying) an aging quarterback who does nothing but hurt your team both on the field and on the cap sheet.


A Brief Investigation into the 2016 MLB Season


by Jonathan Bell

As the 2016 baseball playoffs begin to ramp up, I find myself taking a look at the teams still remaining and wondering how they got here.  How did the teams still left stack up against the ones that people assumed would be there at the beginning of the season? How reliable were regular season performances in forecasting playoff readiness? Let’s take a look.

Every baseball nerd needs his own tools, so I created a model based on ZiPS and Steamer projections from FanGraphs which attempted to predict regular season wins from the collected data. To get as close to team wins as possible, I chose 4 statistical measures that I believed would most concisely and accurately correlate with wins: total runs created per 27 outs (RC/27), total team pitcher wins above replacement and total team hitter wins above replacement (combined to form total wins above replacement), and mean OPS+.  I then compiled all of this data and put it into R to form a linear model.  After running the model, R produced this table:


This is the excel spreadsheet of my raw data and the projected wins from the formula R created:


Essentially, this means that the relationship between Wins and total WAR, OPS+ and WRC/27 is as follows:

Wins= 26.19295 + 1.11996(total WAR) + 0.04309(OPS+) + 0.09539(WRC/27).  

The model had a standard error of just over 6 wins, meaning that it was, on average, accurate to within about +/- 6 wins.  Using the model and the outputs it produced, I was able to do some further exploration into the 2016 season.
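The fitted equation can be applied directly; the team inputs in this quick sketch are hypothetical, not values from my spreadsheet:

```python
# Applying the regression line fitted in R to a hypothetical team's stats
def predicted_wins(total_war, ops_plus, wrc27):
    return 26.19295 + 1.11996 * total_war + 0.04309 * ops_plus + 0.09539 * wrc27

# e.g. a team with 40 total WAR, a 100 mean OPS+, and 5.0 runs created per 27 outs
print(round(predicted_wins(40, 100, 5.0), 1))  # 75.8
```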


The Model Was Right…

The model was most accurate in predicting the Marlins’ win total at 79.1. The boys from Miami remarkably managed to remain in playoff contention until September despite yet another injury to All-Star slugger Giancarlo Stanton.  Stanton seemed poised to finally have the season that all of MLB was waiting for until he hit the worst slump of his career in June, collecting only 13 hits against 31 strikeouts.  Right as he was breaking out of the decline, he strained his groin, missing games when the Marlins so desperately needed him.  On August 21, the Marlins were 65-59 and in the thick of the race for the second and final wild card spot in the National League.  During their final 15 games without Stanton, the Marlins missed his offense more than ever, going 3-12 while scoring just 2.7 runs per game, effectively knocking them out of the playoff race.  If the Marlins can stay healthy next season, the postseason is by no means out of reach.  Christian Yelich is a rising star and Justin Bour was having a breakout year before he too hit the disabled list. The unheralded combination of JT Realmuto and Marcell Ozuna gave the Marlins almost 6 wins above replacement.

Analytics aside, Dee Gordon’s home run in the first game after the tragic loss of Jose Fernandez was one of the most incredible moments that I have ever witnessed in sports.  My friend who was lucky enough to be at the game described it as “Hands down one of the most surreal experiences of my life, let alone at a sporting event. I think everyone in that ballpark, and everyone watching on TV, just really wanted Jose to be there. You could feel his energy in the tears of the players and fans. When that ball flew over the fence, there was no doubt that Jose was a part of it. I don’t think I’ll ever see something like that again.” Fernandez was truly one of the kindest people in baseball and will not only be missed by his friends and family, but by the entire baseball community.  Despite the loss of one of baseball’s shining stars, the future still looks bright in Miami.

And Wrong…

The model’s largest gap between predicted (78) and actual wins (95) belonged to the Texas Rangers, whose projection missed the actual result by about 6 wins more than any other team in the model.  It’s no secret that the Rangers got very lucky for the first four months of the season.  The Rangers were also the largest outlier in FanGraphs’ Pythagorean and BaseRuns models, so clearly something was up.  So what happened?  It turns out the Rangers’ record in one-run games was absolutely insane.  The Rangers went 36-11 (thirty-six and eleven!!!) in games decided by one run.  That’s the best such record in baseball’s modern era, spanning more than 110 years.  One explanation could be that the Rangers had an incredible bullpen that shut down opponents late in games.  Sam Dyson investigated this claim and found that not only was their bullpen merely average (in terms of WAR), but that a team with their bullpen’s skill level would match that record less than once in ten thousand simulated seasons.  Not so shockingly, the Rangers were swept by the Blue Jays in the ALDS.  How did the Blue Jays close out the Rangers?  A walk-off in Toronto in extra innings…7-6.  The Rangers’ luck finally ran out.
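The once-in-ten-thousand-seasons claim is easy to sanity-check. If a merely average bullpen makes each one-run game roughly a coin flip (my simplifying assumption, not the exact skill model referenced above), then the chance of going 36-11 or better in 47 such games is just the upper tail of a binomial distribution:

```python
from math import comb

def binom_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k wins in n games."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Rangers: 36 wins in 47 one-run games, each treated as a 50/50 toss-up.
prob = binom_tail(47, 36)
print(f"P(36+ wins in 47 games) = {prob:.6f}")  # on the order of 1-2 in 10,000
```

Even this crude coin-flip version lands in the same ballpark as the simulation result: a record that extreme is vanishingly rare without a substantial dose of luck.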

LA Living

Behind the Cubs, the model predicted the Dodgers would have the second-highest win total in baseball.  While the Dodgers had a fine season, winning 91 games, they didn’t quite live up to their lofty 96-win projection.  A major reason for this prediction was that the Dodgers had the second-highest projected pitcher WAR of any team in the league.  Los Angeles actually met these expectations, finishing with 20.1 pWAR against a projected total of 20.3.  Clayton Kershaw nearly reaching his predicted value of 7.0 WAR despite pitching only 149.1 innings was no doubt a major factor in the Dodgers’ superb pitching performance.  In fact, Kershaw finished third in all of MLB in WAR despite pitching between 37 and 73 fewer innings than his closest competitors.  One can only wonder what type of season we would have witnessed if Kershaw hadn’t succumbed to a back injury, missing almost half the season.  So if not pWAR, what caused the discrepancy in wins?  It’s not exactly a scientific answer, but after the Dodgers clinched the NL West on September 25, they proceeded to lose five of their final six games, including a couple against the lowly Padres.  Those meaningless games might have been the difference.

What happened to Pittsburgh?

After making the playoffs for three consecutive seasons, and one year removed from a 98-win campaign in which they were widely regarded as the second-best team in baseball, the Pirates not only failed to make the playoffs but also finished under .500.  According to my model, the Pirates underperformed by about 7.5 wins.  One can hardly blame teams in the NL Central for struggling considering how the Cubs set the league on fire this season.  In fact, the Cubs took out quite a bit of their 108 years of misery on the Pirates, going 14-4 against Pittsburgh in 2016.  But I suspected there was a deeper problem in the Steel City.  Pittsburgh was projected to have an elite offense, with 25.1 WAR attributed to the bats; instead, the offense underperformed by nearly five wins. But this wasn’t the most disturbing part of the Pirates’ season.  Nearly all of those lost five wins (4.9) can be attributed to one player: 2013 NL MVP Andrew McCutchen.  McCutchen posted the lowest WAR of his career (0.7), 3.1 lower than his rookie season and six wins below his average over the past five years (6.7).  The Pirates go as McCutchen goes, and if he continues to perform closer to the level of Phil Gosselin than to the 2013 version of himself, the Pirates are going to have major struggles for as long as he remains their franchise player.


The 2016 season was full of surprises, and these teams certainly had an interesting go of it, with plenty of unpredictable performances. After tonight, baseball begins its most thrilling stretch as the few remaining teams are whittled down to one champion. Unfortunately, due to the small sample size of the postseason, my model isn’t of much use, so I’m looking forward to putting the stats aside and just enjoying the baseball. While models like these can be useful over the course of the regular season, October is where anomalies take center stage. One swing can make or break an entire season. Here we go.


A Brief Investigation into the 2016 MLB Season

It’s Time for NBA General Managers to Shift Their Second Round Strategy


By Jonathan Bell

If you’re an NBA fan in mid-June and your team isn’t vying for the title, there’s only one thing on your mind: the upcoming NBA summer.  The oft-wacky, wild and drama-filled months of June and July start with the NBA draft, to be held this year on June 23 in New York.  The draft is a symbol of hope for many of the lesser teams in the league, who believe they will finally acquire the player needed to reverse their fortunes and propel them into playoff and title contention.  Teams with the first several picks pore over statistics, highlight tapes and individual player workouts to determine which player they will select as the new savior of their franchise.  However, underneath the awful suits, awkward hugs and Bill Simmons fist pumps prevalent in the first round, there is a side of the draft as shrouded in mystery as the Sixers’ long-term strategy: the second round.

A couple of times a year, either to settle an argument or to satisfy personal curiosity, I go on an NBA draft binge on Basketball-Reference.  Scrolling through the picks, I rarely make it as far as the second round.  But every time I do, the wasteland I find there confuses me.  Multiple players picked in a row never play a single NBA game, and more often than not I can barely recognize the names I see.  In fact, it seemed to me that the second round was useless. The whole point of the draft (and the very specific order in which it proceeds) is that the worst teams get the “best” players in order to help them improve and become competitive in the quest for basketball supremacy.  Yet what I observed in my casual glances at past second rounds was a barren basketball desert where you were just as likely to get a contributing player with the first pick of the round as with the last.  Knowing this could drastically shift a team’s strategy come draft day.  What value does a second-round pick have if, unlike in the first round, where you select has little to no correlation with the quality of player you can expect in return?

I wanted to compare how valuable a player is expected to be based on his numerical selection, separating the data by round, in order to come up with some form of “value” for draft picks.  I decided to quantify “expected value” using two advanced statistics, Box Plus/Minus (BPM) and Value over Replacement Player (VORP).  I compiled data from all drafts between 2001 and 2014 in order to get a sample size large enough to test.

The graphs above show data collected from the 2001-2014 NBA drafts, plotting draft position against VORP and BPM.  Clear correlations between selection number and BPM/VORP appear in the first-round graphs, whereas no observable correlation exists in the graphs depicting the second round.

NBA general managers should heed this uncertainty and think about shifting their strategy in the second round.  If the value of a second-rounder is impossible to predict, it might be more valuable for a team to draft a player who will never play in the NBA, or at least won’t until the team can determine his value.  Teams are starting to realize that the traditional second-rounder, a player who often wastes away on the bench and usually contributes nothing to the team except excessive towel waving, is not the ideal.

The second round has no rookie salary scale, meaning that teams can sign players at their own discretion (almost always for the league minimum).  Further, unlike first-rounders, second-round picks do not count against the salary cap while unsigned.  This provision of the CBA is a loophole that general managers have long been exploiting via the “draft and stash”, i.e., drafting a foreign player and not paying him while he continues to play for his overseas team.   It’s also possible to accomplish this domestically, as evidenced by players such as Andrew Harrison, Dakari Johnson and other members of the 2015 draft class who didn’t sign with the team that drafted them but instead played for its D-League affiliate.  These players do not count against the cap but remain under the control of the team that selected them.  This bit of front-office ingenuity not only saves teams money but also lets them keep a closer eye on their developing prospects than they could with a foreign stashed player.  It’s an ideal scenario for teams because it takes all the (admittedly minimal) risk of a second-round pick out of the equation while keeping the player under close watch in case he develops into a replacement-level or better NBA player.

Second-round picks remain a mystery.  Many different factors could contribute to the extreme uncertainty found there.  It could be that players not projected to go in the first round are not scouted and analyzed as thoroughly as first-round picks.  It could also be that teams take risks more frequently with whom they draft in the second round.  Second-round picks are also common trade fodder, passed around the league as sweeteners in larger deals; in 2016, 22 of the 30 second-round picks changed hands.  While the format of the NBA draft doesn’t appear to be changing anytime soon, the mindsets of general managers certainly should.  If this trend in the second round continues, general managers need to change their concept of what a successful second-round pick is.  It has become detrimental for the modern NBA franchise to go the standard route in the second round; it makes far more sense for general managers to either trade or stash their picks.  The traditional NBA second-round pick has become outdated, and GMs need to catch up with the times.
