Predicting The NFL Draft: RB Round Selection

[Photo: Christian McCaffrey]

By Mason Kirchner

In honor of this weekend's NFL Draft, I decided to make a spiritual successor to my recent article that tried to predict how well incoming NFL rookies would perform in fantasy football. Feel free to read that to get an idea of the dataset used in this project, but note that the method is slightly different here. Instead of trying to predict how well each incoming rookie will perform in the NFL, I decided to try to predict which round the player will be selected in.

Method

In order to get more out of my data, I introduced squared terms and Principal Component Analysis. The hope is that by squaring each of the individual statistics, such as yards, some sort of inflection point can be more easily identified in the classification problem. Principal Component Analysis is then used to transform the expanded feature set into a smaller number of uncorrelated components while retaining most of the variance in the data.
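As a rough illustration of that preprocessing, here is a minimal sketch in scikit-learn; the column names and values below are hypothetical stand-ins, not the actual dataset used in the project:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real feature table described in the earlier article.
stats = pd.DataFrame({
    "yards": [1773, 945, 1262, 843],
    "ypc":   [7.7, 4.8, 6.1, 5.2],
    "forty": [4.56, 4.62, 4.48, 4.59],
})

# Add a squared copy of every feature (e.g. yards -> yards_sq).
squared = stats ** 2
squared.columns = [f"{c}_sq" for c in stats.columns]
expanded = pd.concat([stats, squared], axis=1)

# Standardize, then keep enough principal components to explain 95% of the variance.
scaled = StandardScaler().fit_transform(expanded)
components = PCA(n_components=0.95).fit_transform(scaled)
print(components.shape)
```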

This time, instead of using purely random forests, I decided to use a voting ensemble method. This video sums up ensembles nicely, but the strength of using one over just random forests is that it allows me to combine multiple models to get better predictions. In addition to voting, I used bagging, which is explained in this video, to further strengthen each of the individual models. The three methods used in the voting ensemble are K Nearest Neighbors, Random Forests, and Extremely Randomized Trees (Extra Trees), all of which were bagged. Other methods, such as SVM and AdaBoost, were tested, but didn't perform as well as the three I settled on.
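As a sketch of how such an ensemble might be wired together in scikit-learn (the hyperparameters here are placeholders rather than the ones actually used):

```python
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Wrap each base model in a bagging ensemble, then combine them with soft voting,
# which averages the predicted round probabilities across the three models.
bagged_knn = BaggingClassifier(KNeighborsClassifier(), n_estimators=10, random_state=0)
bagged_rf = BaggingClassifier(RandomForestClassifier(random_state=0), n_estimators=10, random_state=0)
bagged_extra = BaggingClassifier(ExtraTreesClassifier(random_state=0), n_estimators=10, random_state=0)

ensemble = VotingClassifier(
    estimators=[("knn", bagged_knn), ("rf", bagged_rf), ("extra", bagged_extra)],
    voting="soft",
)

# X, y would be the PCA-transformed features and the labels (0 = undrafted, 1-7 = round).
# print(cross_val_score(ensemble, X, y, cv=5).mean())
```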

It's worth noting that the best individual model (the bagged Random Forest) only had .516 mean accuracy in cross-validation, and that the voting ensemble performed at .506 mean accuracy in cross-validation. This can be attributed to there being eight different classes: one for each of the seven rounds, plus undrafted status. With this many possible classes, getting an average of around 50% of the samples correctly predicted is alright in my book.

Results

Without further ado, here's the complete table of round probabilities for players who went to the 2017 Combine (and Joe Mixon):

Name Undrafted Round1 Round2 Round3 Round4 Round5 Round6 Round7
McCaffrey, Christian 0.092 0.049 0.175 0.159 0.193 0.211 0.080 0.042
Jones, Aaron 0.104 0.112 0.192 0.140 0.195 0.149 0.066 0.043
Pumphrey, DJ 0.106 0.091 0.196 0.178 0.185 0.137 0.069 0.040
McNichols, Jeremy 0.136 0.108 0.164 0.126 0.202 0.163 0.094 0.007
Cook, Dalvin 0.148 0.101 0.150 0.107 0.268 0.148 0.057 0.022
Hill, Brian 0.150 0.079 0.212 0.132 0.131 0.193 0.096 0.007
Hunt, Kareem 0.188 0.038 0.157 0.076 0.140 0.283 0.083 0.036
Foreman, D’onta 0.189 0.129 0.195 0.112 0.105 0.189 0.055 0.028
Williams, Joe 0.213 0.038 0.159 0.097 0.188 0.163 0.101 0.041
Mixon, Joe 0.215 0.067 0.203 0.059 0.138 0.135 0.126 0.058
Williams, Jamaal 0.228 0.042 0.148 0.093 0.181 0.194 0.091 0.024
Mack, Marlon 0.241 0.045 0.105 0.135 0.100 0.150 0.145 0.079
Dayes, Matt 0.316 0.027 0.076 0.038 0.084 0.195 0.168 0.096
Clement, Corey 0.341 0.046 0.108 0.068 0.086 0.197 0.115 0.039
Williams, Stanley 0.359 0.028 0.098 0.056 0.107 0.152 0.117 0.084
McGuire, Elijah 0.380 0.009 0.094 0.074 0.110 0.165 0.120 0.048
Perine, Samaje 0.380 0.017 0.080 0.069 0.118 0.083 0.206 0.047
Gallman, Wayne 0.381 0.011 0.084 0.050 0.087 0.180 0.148 0.059
Conner, James 0.384 0.036 0.093 0.075 0.092 0.119 0.148 0.053
Redding, Devine 0.446 0.012 0.051 0.028 0.101 0.147 0.146 0.068
Thomas, Jahad 0.455 0.016 0.082 0.042 0.118 0.071 0.142 0.075
Hood, Elijah 0.466 0.055 0.049 0.057 0.084 0.130 0.074 0.084
Fournette, Leonard 0.468 0.059 0.052 0.078 0.094 0.112 0.065 0.072
Smith, De’Veon 0.538 0.025 0.039 0.054 0.052 0.116 0.060 0.118
Stevenson, Freddie 0.559 0.014 0.037 0.075 0.069 0.045 0.129 0.073
Logan, TJ 0.561 0.006 0.034 0.048 0.073 0.074 0.150 0.054
Kamara, Alvin 0.561 0.009 0.045 0.074 0.100 0.072 0.111 0.028
Cohen, Tarik 0.571 0.007 0.028 0.071 0.106 0.065 0.064 0.088
Rogers, Sam 0.582 0.003 0.028 0.073 0.043 0.033 0.133 0.107
Davis, Justin 0.608 0.009 0.026 0.045 0.063 0.081 0.084 0.085
Henderson, De’Angelo 0.620 0.007 0.024 0.084 0.096 0.054 0.060 0.056
Carson, Christopher 0.652 0.010 0.063 0.042 0.052 0.041 0.102 0.038
Ogunbowale, Dare 0.709 0.002 0.056 0.047 0.061 0.028 0.067 0.029
Shell, Rushel 0.773 0.010 0.018 0.035 0.029 0.030 0.058 0.048

The table itself is sorted from least to greatest by the probability that the player goes undrafted. You can download the Excel file if you wish to sort it yourself.

Analysis

We need to be careful in our analysis of these numbers not to assume that the highest predicted probability for each player is what is most likely to occur.

For example, Samaje Perine is given a .38 predicted probability of being undrafted, or, equivalently, a .62 probability of being drafted. These predicted probabilities are more useful to us if we treat them as a probability distribution over where a player might be drafted. Adding up his predicted probabilities for rounds four through six, we find that Perine has a .407 predicted probability of going somewhere in that range.
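That figure comes straight from Perine's row in the table above:

```python
# Perine's row from the table above: P(round 4) + P(round 5) + P(round 6).
perine = {"Round4": 0.118, "Round5": 0.083, "Round6": 0.206}
print(round(sum(perine.values()), 3))  # 0.407
```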

The model could still use plenty of help. As mentioned previously, it only achieved a mean accuracy of .506 during cross-validation. There are some notable players that the model doesn't like, such as Leonard Fournette, who will likely be the first RB whose name is called Thursday night. Alvin Kamara is another potential first- or second-rounder who is getting snubbed by the model. Their poor scores likely stem from the fact that both Fournette and Kamara missed games due to injury this past season.

Hopefully you enjoyed this exercise, and enjoy the NFL Draft as well.

Feel free to check out the code here.

 


Identifying Breakout Rookie Running Backs in Fantasy Football

[Photo: Ezekiel Elliott]

By Mason Kirchner

Everybody is looking for an edge to win their fantasy football leagues, and one of the easiest ways to win is by landing a breakout running back on one's roster. With every expert proclaiming that "their guy" is the next Jordan Howard or Ezekiel Elliott, it's easy to get lost in a sea of unfounded speculation. The goal of this article is simple: use a machine learning model to identify the RBs in the 2017 rookie class who look poised to have a breakout season at some point in their careers.

The Data

The data acquired for the analysis is a combination of combine metrics and on-field performance statistics. The combine metrics include each player’s Height, Weight, BMI, Arm Length, Hand Size, 40 Yard Dash time, Bench Press repetitions, Vertical Leap height, Broad Jump distance, Shuttle time, and Three-Cone time. If a player was not invited to the combine, or did not run at the combine, their unofficial Pro Day numbers were used if available.

The on-field performance statistics are divided into three different categories (note that all of these statistics are drawn from the player’s final year of college football).

The first group of statistics consists of the elementary numbers: rushes, yards, and yards per carry.

The second group of statistics is more advanced and is provided by Player Profiler. These statistics include Breakout Age, College Dominator, and College Target Share. College Dominator and College Target Share measure how much of the player's college team's offense they were responsible for, and Breakout Age is the age at which the player first became responsible for over 30% of their team's offense. The writers over at RotoViz have done previous research using these metrics (note: RotoViz has a light pay-wall).

The third group of statistics was developed by Bill Connelly over at College Football Studyhall. Connelly designed these statistics as an attempt to strip out the effect of the running back's offensive line play and figure out how well a running back performs when they're on their own in the open field. You can read more about these statistics here. The Connelly statistics used are Highlight Yards, Highlight Yards per Carry, Highlight Opportunities, Highlight Yards per Opportunity, and Opportunity Rate.

Note: Both Player Profiler and Connelly provide the Target Share statistic.

The goal is to predict whether or not a player will finish a season among the top 12 highest-scoring RBs in a Points Per Reception (PPR) league. To get an idea of which features are correlated with an RB1 season in PPR, the following correlation plot was made:

[Figure: Correlation plot between the variables]

The area of interest in this plot is the final row and column: the correlations between each statistic and having a PPR RB1 season. Most of the features are, at most, weakly correlated with an RB1 season, and it is worth noting that the combine drills are the least correlated of all. The Highlight Yards statistic is the most strongly correlated with an RB1 season.
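A correlation plot like the one above can be produced with a few lines of pandas and seaborn; this is a minimal sketch that assumes the features and the RB1 label live together in a single numeric DataFrame (the file name is hypothetical):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_feature_correlations(df: pd.DataFrame) -> None:
    """Heatmap of pairwise correlations; assumes a numeric DataFrame whose
    last column is the binary RB1 label."""
    corr = df.corr()
    sns.heatmap(corr, cmap="coolwarm", center=0)
    plt.title("Correlation Plot Between Variables")
    plt.tight_layout()
    plt.show()

# plot_feature_correlations(pd.read_csv("rb_features.csv"))  # hypothetical file
```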

Modeling

Random Forest is the machine learning method used in this experiment. The model was trained on 80 percent of the samples in the data, with the final 20 percent held out for validation. Given the nature of Random Forests, over-fitting is mitigated, so all of the features described earlier were used in the model. The other nice thing about Random Forests is that we can easily work around the issue of missing data, which arises frequently in this exercise because many players skip certain combine drills and because the Connelly statistics only exist at the FBS level of college football.
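A minimal sketch of that setup follows, assuming the data sit in a hypothetical rb_history.csv with a binary rb1 label, and with missing values filled by a sentinel (the article does not specify how missing values were actually handled):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; `rb1` is the binary top-12 PPR label.
df = pd.read_csv("rb_history.csv")
X = df.drop(columns=["name", "rb1"]).fillna(-1)   # sentinel for skipped drills / missing stats
y = df["rb1"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))

# Feature importances, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```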

Using Sk-Learn’s Random Forest Classifier, a model was created that was able to predict the RB1 status in the validation set with perfect accuracy. It’s worth noting that most of the players in the sample are not classified as having an RB1 season, so the model only had to accurately predict the one player in the validation set that was an RB1 (Melvin Gordon). Below is the importance of each feature in the model (greater values indicate higher importance):

[Figure: Importance of each model feature]

Interestingly, hand size is the most important feature for RB PPR success in this model. This can likely be attributed to a few things. The players with the top three hand sizes in the data set all turned out to be RB1s, so the model is trained to favor large hand measurements. Large hands also help an RB catch the ball more easily and prevent fumbling, attributes that will increase a player's PPR score.

Not as surprisingly, College Dominator, BMI, and College Target Share round out the top four most important features. The previous research from RotoViz showed that College Dominator and Target Share proved to be valuable, and our model further backs up their claim. BMI is intuitively valuable as well, because an ideal RB will have a high weight-to-height ratio that makes them difficult to tackle.

Predicting the 2017 Class

Using the model, and the available data from the 2017 RB Class, here are the predicted probabilities that each RB will have an RB1 season in PPR:

The most notable entry on this list is located at the top: Aaron Jones out of UTEP. Though most experts predict him to be a day-three selection, this model gives him the highest probability in his draft class of being an RB1, at 61 percent. Digging further into his numbers, we find a player who tested near the top of his position group in multiple combine drills, notably the Broad Jump, 3-Cone Drill, and Vertical Jump, all drills that measure an athlete's explosiveness.

[Figure: Aaron Jones' combine measurables, courtesy of MockDraftable.com]

 

Additionally, Jones put up staggering production numbers during his 2016 campaign, accounting for 47% of his team's offense on his way to running for 1,773 yards. My favorite statistic of his is that he ran for nine Highlight Yards per Opportunity in his final season. In other words, for every rush he had that went over five yards, he gained an additional nine yards on average. With his high level of athleticism and great on-field performance, it makes sense that the model loves him.
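Using that simplified description (the full Connelly definition differs in its details), the calculation looks like this for a hypothetical rushing log:

```python
# Hypothetical single-game rushing log (yards gained on each carry).
carries = [2, 7, 23, 4, 11, 45, 3]

# An "opportunity" is a carry longer than five yards; highlight yards are the
# yards gained beyond the fifth on those carries.
opportunities = [c for c in carries if c > 5]
highlight_yards = sum(c - 5 for c in opportunities)
print(highlight_yards / len(opportunities))  # highlight yards per opportunity
```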

Meanwhile, the (nearly) consensus number one prospect in this class, Leonard Fournette, ranks as just the 12th most likely RB in the class to post an RB1 season in PPR, at a probability of 0.2. What's up with this? I can offer a few explanations.

The first is that Fournette missed five games last season, which undoubtedly hurt his season-long numbers. On top of that, Fournette only participated in two drills at the NFL Combine: the 40 Yard Dash and the Vertical Jump. His absence from the other drills, along with his historically bad 28.5″ Vertical Jump, further adds to the equation.

He also has kind of small hands.

Put this all together, and the model ends up ranking him 12th in his class. That being said, a 20 percent chance of being an RB1 in PPR isn’t really that bad, especially considering this is who we’re talking about:

[GIF: Leonard Fournette highlight]

Goes without saying that this model is best paired with an eye test.

Further Work

A notable component missing from the model at this point is draft position. I didn't include it in this model because the 2017 Draft has not yet occurred, and I still wanted to try my hand at playing GM for the upcoming class. Prior research shows that draft position is heavily correlated with NFL success, so Fournette should see a nice bump in his ranking if he gets drafted at the top of the first round, and Jones will likely see a decrease if he is taken in a later round.

There's an argument to be made for implementing some feature reduction as well. While Random Forests shouldn't run into issues with over-fitting, some of the less important variables may be unnecessary, as they add minimal value and introduce noise into the model.
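One possible route, offered here only as a sketch, would be importance-based selection with scikit-learn's SelectFromModel:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

def drop_low_importance_features(X, y):
    """Keep only the features whose importance exceeds the mean importance."""
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=500, random_state=0),
        threshold="mean",
    )
    return selector.fit_transform(X, y)
```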

Ethan Young also used some interesting variables in his recent QB analysis article in order to factor in how the player’s situation affects performance. Introducing variables that account for level of competition, era, surrounding talent, and scheme into this model would hopefully be just as worthwhile for RBs as they appear to be for QBs. Look for these variables to be in future models.

For those curious to see how this analysis was done, check out the Python notebook here.

 


The Dilemma of the Aging Quarterback

 

[Photo: Peyton Manning]

By: Jonathan Bell

In the NFL, the first step to achieving and maintaining long-term success for a franchise is finding a quarterback you can build your team around.  All of the greatest dynasties in NFL history are synonymous with passers known by one name: Bradshaw, Montana, Aikman, and more recently Brady, Rodgers, and Manning.  Yet the annual promise of contention that comes with a franchise quarterback has one inevitable drawback: the unavoidable reality that the production of their star player will decline, sometimes drastically.

Often, we see general managers holding onto their beloved quarterbacks for just one season too long and the results (both on the individual and team level) can be shocking from year to year.  An understanding of when and why players regress is an essential skill for general managers, which is why Bill Belichick has been so incredibly successful.  In this article, I will use three case studies (Carson Palmer, Peyton Manning and Brett Favre) to attempt to solve the mystery of quarterback regression.

Peyton Manning

Peyton Manning’s late-career move to the Denver Broncos breathed new life into a career that many thought was over.  After missing almost all of 2011 due to a debilitating neck injury, Peyton made the most of his fresh start in Denver.  From 2012-2014, Manning posted an unprecedented stretch of numbers during the regular season, leaps and bounds ahead of the rest of the league.  To better visualize just how good Manning’s numbers were over this three year period, I made graphs using 3 key statistics: DVOA, DYAR, QBR.  

These are three advanced statistics utilized by http://www.footballoutsiders.com/ to judge quarterback performance better than more traditional statistics such as yards, touchdowns, and interceptions.  These statistics are fully explained on the website, but essentially a better DYAR means a quarterback with more total value and a higher DVOA means a quarterback with more value per play; QBR is simply a rating of everything a quarterback does in the run of play (not just passing) and is normalized on a scale ranging from 0-100. Manning's face represents his data points on the charts.

[Four charts: DYAR vs. QBR (left) and DYAR vs. DVOA (right), for Manning's 2012-2014 peak (top) and his 2015 season (bottom)]

These four charts depict two different things.  The charts on the left graph DYAR on the Y axis against QBR on the X axis, while the ones on the right graph DYAR against DVOA on the X axis.  The top charts are from Manning's three-year peak and the bottom charts are from Manning's final season.  The extreme nature of Manning's collapse is astounding.  Manning lapped the field during his three-year peak, and no other quarterback was really even close.  Manning led the field in all three statistics and was the only quarterback with an average DYAR over 1500 (1,897) and QBR over 80 (81).  However, in 2015, Manning fell to dead last in DVOA and DYAR and was in the bottom three for QBR.  So what caused this remarkable decline?  While one might point to Manning's mounting list of injuries and age-related decline as the major factors accounting for the dreadful season, I believe there was something else at play.

As most of us are only casual observers of the game of football, we only really get to see the game as it’s presented to us on television.  We see the quarterback drop back to pass and then the ball fly in the air towards the receiver.  However, we don’t get to see how these plays develop, how each player performs in their individual role and how this performance affects the outcome of the game as a whole.  Football Outsiders does an excellent job of resolving this problem, ranking individual players and offensive/defensive line units to determine overall effectiveness.  

Quarterback performance begins with protection, and Peyton Manning's offensive line ranked near the top of the league during his first three years with the Broncos.  From 2012 to 2014, the Broncos unit finished no worse than second in Football Outsiders' rankings (which are based on adjusted sack rate) and were top of the league in Manning's final effective season behind center.  Over those three years, the Broncos experienced little turnover on their offensive line.  They lost only two starters: OT Ryan Clady, whose absence was only temporary (he missed 14 games in 2014 due to injury), and T/G Zane Beadles, who left as a free agent after 2013.  Otherwise, they started some combination of seven different linemen for three straight seasons, with a few exceptions.

In the modern NFL, it is nearly impossible to start essentially the same combination of 5 to 7 core guys for almost three seasons and retain that level of performance. In combing through the available data on Football Outsiders, only one other team retained a top two position for three consecutive years in the pass blocking category (Indianapolis from 1999-2001) and none in the following 16 years.  

The offensive line also benefitted enormously from offensive coordinator Adam Gase’s offensive scheme.  Gase’s spread formation, often featuring 4 or more wideouts, would ensure that the defense could rush no more than 4 defenders, fearing Manning’s deep-threat ability.  Combining the spread offense with Manning in the shotgun for the vast majority of his snaps made the job significantly easier for an already talented offensive line.

Perhaps even more astonishing than the elite-level performance of Manning's line over this period was that of his receiving corps.  Blessed with Emmanuel Sanders, Demaryius Thomas, Wes Welker, and Eric Decker over the three years, the Broncos were never short of talent at the wideout position.  While advanced wide receiver statistics aren't perfect (Football Outsiders even warns, "We cannot yet fully separate the performance of a receiver from the performance of his quarterback. Be aware that one will affect the other."), they still tell us a good deal about wide receiver performance compared to the rest of the field.

What the Broncos receivers do in this respect is what amazes.  The Broncos had the number one ranked receiver two out of three years, two receivers in the top five two out of three years, and two in the top ten all three years during Manning’s tenure.  The Broncos were basically an offensive Pro Bowl team in front of Manning.  This historic run stands alone in the 29-year database provided by Football Outsiders.

The Broncos' offensive statistics plummeted in 2015.  Their once-elite pass protection fell from 1st in the league to 13th, and their highest-ranked wideout was Emmanuel Sanders at 49th (49th!), followed by Demaryius Thomas at 60th.  The problems in 2015 began before the season even started with the hiring of Gary Kubiak as head coach.  Previously the head coach of the Texans for seven years and coming off a single season as offensive coordinator for the Baltimore Ravens, Kubiak was a respected NFL coach who had enjoyed some success; however, Kubiak installed his own offense in Denver, involving a new zone blocking scheme with many plays in the pistol formation (a shortened shotgun).  Kubiak was changing an offense that had been very successful over the past few years.  Kubiak's zone blocking disrupted the chemistry of an offensive line that had already lost two starters in Orlando Franklin and Will Montgomery, and his pistol formation gave less time and comfort in the pocket to the 39-year-old Manning, who had been using the same playbook for the past three seasons.  Kubiak was convinced that the scheme he had been running for years, having had success with Schaub and Foster and then with Flacco and Forsett, would work with the Broncos.

While I was originally going to go into the same amount of detail in the cases of Brett Favre and Carson Palmer, I decided there was no need: the exact same trends were observable in both cases.  In 2015 (Palmer's excellent year before his 2016 regression), Palmer's top two targets, Larry Fitzgerald and John Brown, were both in the top five of Football Outsiders' receiver rankings.  Their offensive line was ranked 5th in the NFL in pass protection.  In 2016, Arizona's offensive line plummeted to 21st in pass blocking, and they failed to have a receiver crack the top 50.

Brett Favre, meanwhile, enjoyed the number one rated receiver during his resurgent 2009 campaign in Sidney Rice (Rice never posted another top-50 season and was out of the league by 2013), along with the 14th-rated offensive line.  You can guess what happened in 2010.  Favre's offensive line fell into the bottom third of the league, and the Vikings failed to have a receiver finish in the top 20.

As quarterbacks get older, they get progressively more injury-prone.  At the end of his career, Peyton Manning was battling two different nagging injuries: plantar fasciitis in his foot as well as nerve issues in his neck.  Quarterbacks also lose some of the natural physical ability they had when they were younger.

[GIF: Peyton Manning completing an out route to Emmanuel Sanders for a touchdown in 2014]

[GIF: Peyton Manning throwing the same out route in 2015, this time returned by Dre Kirkpatrick for a pick-six]

However, these simple age-related problems remained hidden for at least one year in all three of the cases I studied.  Age and injuries alone were not the most important determinants of regression.  Why?  Because the physical problems these players faced were hidden by incredible (and potentially anomalous) play from their surrounding teammates.

The extraordinary performances by the Broncos' and Cardinals' offensive lines and receiving corps (and, to a slightly lesser degree, Favre's Vikings) were able to mask the declining natural ability of the quarterbacks.  In fact, they masked it so well that these quarterbacks appeared to be at the top of their games.  So what's the point of this information and data I've collected?  Aware of these deadly trends, NFL front offices can use this data to determine if their quarterback is at risk of regression.

By analyzing the tell-tale warning signs (an aging quarterback, incredible and unsustainable wide receiver and offensive line play), they can make better football decisions.  By either trading the player to an unsuspecting team or adding a new quarterback via free agency or the draft, a GM can deftly avoid the dilemma of playing (and paying) an aging quarterback who does nothing but hurt the team, both on the field and on the cap sheet.


A Brief Investigation into the 2016 MLB Season

[Photo: Corey Seager]

by Jonathan Bell

As the 2016 baseball playoffs begin to ramp up, I find myself taking a look at the teams still remaining and wondering how they got here.  How did the teams still left stack up against the ones that people assumed would be there at the beginning of the season? How reliable were regular season performances in forecasting playoff readiness? Let’s take a look.

Every baseball nerd needs his own tools, so I created a model based on ZiPS and Steamer projections from FanGraphs that attempted to predict regular-season wins from the collected data.  To get as close to team wins as possible, I chose four statistical measures that I believed would most concisely and correctly correlate with wins: total runs created per 27 outs (RC/27), total team pitcher wins above replacement and total team hitter wins above replacement (combined to form total wins above replacement), and mean OPS+.  I then compiled all of this data and put it into R to form a linear model.  After running the model, R produced this table:

[Table: R linear model summary output]

This is the Excel spreadsheet of my raw data and the projected wins from the formula R produced:

[Table: raw data and projected wins]

Essentially, this means that the relationship between Wins and total WAR, OPS+ and WRC/27 is as follows:

Wins = 26.19295 + 1.11996 (total WAR) + 0.04309 (OPS+) + 0.09539 (WRC/27)

The model had a standard error of just over 6 wins, meaning that its predictions were, on average, accurate to within roughly +/- 6 wins.  Using the model and the outputs it produced, I was able to do some further exploration of the 2016 season.
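The fitted equation can be applied directly; the model itself was built in R, but here is a quick Python evaluation of the published coefficients (the inputs below are made up, purely to show the formula in action):

```python
def projected_wins(total_war: float, ops_plus: float, wrc_27: float) -> float:
    """Projected wins from the fitted linear model above (coefficients from the R output)."""
    return 26.19295 + 1.11996 * total_war + 0.04309 * ops_plus + 0.09539 * wrc_27

# Made-up inputs, purely illustrative.
print(round(projected_wins(total_war=45.0, ops_plus=102, wrc_27=5.1), 1))  # ~81.5
```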

 

The Model Was Right…

The model was most accurate in predicting the Marlins' win total, at 79.1.  The boys from Miami remarkably managed to remain in playoff contention until September despite yet another injury to All-Star slugger Giancarlo Stanton.  Stanton seemed poised to finally have the season that all of MLB was waiting for until he had the worst slump of his career in June, with only 13 hits against 31 strikeouts.  Right as he was breaking out of the decline, he strained his groin, missing games when the Marlins so desperately needed him.  On August 21, the Marlins were 65-59 and in the thick of the race for the second and final wild card spot in the National League.  During their final 15 games without Stanton, the Marlins missed his offense more than ever, going 3-12 while scoring 2.7 runs per game, effectively knocking them out of the playoff race.  If the Marlins can stay healthy next season, the postseason is by no means out of reach.  Christian Yelich is a rising star, and Justin Bour was having a breakout year before he too hit the disabled list.  The unheralded combination of JT Realmuto and Marcell Ozuna gave the Marlins almost 6 wins above replacement.

Analytics aside, Dee Gordon’s home run in the first game after the tragic loss of Jose Fernandez was one of the most incredible moments that I have ever witnessed in sports.  My friend who was lucky enough to be at the game described it as “Hands down one of the most surreal experiences of my life, let alone at a sporting event. I think everyone in that ballpark, and everyone watching on TV, just really wanted Jose to be there. You could feel his energy in the tears of the players and fans. When that ball flew over the fence, there was no doubt that Jose was a part of it. I don’t think I’ll ever see something like that again.” Fernandez was truly one of the kindest people in baseball and will not only be missed by his friends and family, but by the entire baseball community.  Despite the loss of one of baseball’s shining stars, the future still looks bright in Miami.

And Wrong…

The model's largest gap between predicted (78) and actual wins (95) belonged to the Texas Rangers, whose projection missed the actual result by about 6 wins more than any other team in the model.  It's no secret that the Rangers got very lucky for the first four months of the season.  In FanGraphs' Pythagorean and BaseRuns models, the Rangers were also the largest outlier, so clearly something was up.  So what happened?  It turns out the Rangers' record in one-run games is absolutely insane.  The Rangers went 36-11 (Thirty Six and Eleven!!!) in games decided by one run or less.  That's the best record in the modern era of baseball and the best such record in over 110 years.  One explanation of this record could be that the Rangers had an incredible bullpen that shut down opponents late in games.  A piece at fivethirtyeight.com investigated this claim and found that not only was their bullpen merely average (in terms of WAR), but that a record like the Rangers', given their bullpen's skill level, would happen less than once in ten thousand simulated seasons.  Not so shockingly, the Rangers were swept by the Blue Jays in the ALDS.  How did the Blue Jays close out the Rangers?  They won on a walk-off in Toronto in extra innings…7-6.  The Rangers' luck finally ran out.

LA Living

Behind the Cubs, the model predicted the Dodgers would have the second-highest win total in baseball.  While the Dodgers had a fine season, winning 91 games, they didn't quite live up to their lofty 96-win projection.  A major reason for this prediction was that the Dodgers had the second-highest projected pitcher WAR of any team in the league.  Los Angeles actually met these expectations, finishing with 20.1 pWAR compared to their projected total of 20.3.  Clayton Kershaw nearly reaching his predicted value of 7.0 despite pitching only 149.1 innings was no doubt a major factor in the Dodgers' superb pitching performance.  In fact, Kershaw finished third in all of MLB in WAR despite pitching between 37 and 73 fewer innings than his closest competitors.  One can only wonder what type of season we would have witnessed if Kershaw hadn't succumbed to a back injury, missing almost half the season.  So if not pWAR, what caused the discrepancy in wins?  It's not exactly a scientific answer, but after the Dodgers clinched the NL West on September 25th, they proceeded to lose five of their final six games, including a couple against the lowly Padres.  Those meaningless games might have been the difference.

What happened to Pittsburgh?

After making the playoffs for three consecutive seasons, and one year removed from a 98-win season in which they were widely regarded as the second-best team in baseball, the Pirates not only failed to make the playoffs but also finished under .500.  According to my model, the Pirates underperformed by about 7.5 wins.  One can hardly blame teams in the NL Central for struggling considering how the Cubs set the league on fire this season.  In fact, the Cubs took out quite a bit of their 108-year misery on the Pirates, going 14-4 against Pittsburgh in 2016.  But I suspected there was a deeper problem in the Steel City.  Pittsburgh was projected to have an elite offense, with 25.1 WAR attributed to the bats.  However, Pittsburgh underperformed that projection by nearly five wins.  But this wasn't the most disturbing part of the Pirates' season.  Nearly all of those lost five wins (4.9) can be attributed to one player: 2013 NL MVP Andrew McCutchen.  McCutchen had the lowest WAR of his career (0.7), 3.1 lower than his rookie season and nearly six wins lower than his average over the past five years (6.7).  The Pirates go as McCutchen goes, and if he continues to perform closer to the level of Phil Gosselin than to the 2013 version of himself, the Pirates are going to have major struggles for as long as he remains their franchise player.

 

The 2016 season was full of surprises, and these teams certainly had an interesting go of it with plenty of unpredictable performances. After tonight, baseball begins its most thrilling stretch as the few remaining teams are whittled down to one champion. Unfortunately, due to the small sample size of the postseason, my model isn’t of much use so I’m looking forward to putting stats aside and just enjoying the baseball. While models like these can be useful over the course of the regular season, October is where anomalies take center stage. One swing can make or break an entire season. Here we go.

 


It’s Time for NBA General Managers to Shift Their Second Round Strategy

[Image: NBA Draft]

By Jonathan Bell

If you're an NBA fan in mid-June and your team isn't vying for the title, there's only one thing on your mind: the upcoming NBA summer.  The oft-wacky, wild, and drama-filled months of June and July start with the NBA draft, to be held this year on June 23 in New York.  The draft is a symbol of hope for many of the lesser teams in the league, who believe they will finally acquire the player needed to reverse their fortunes and propel them into playoff and title contention.  Teams with the first several picks pore over statistics, highlight tapes, and individual player workouts to try to determine which player they will select to be the new savior of their franchise.  However, underneath the awful suits, awkward hugs, and Bill Simmons's fist pumps prevalent in the first round, there is a side to the draft as shrouded in mystery as the Sixers' long-term strategy: the second round.

A couple of times a year, either to settle an argument or to satisfy personal curiosity, I go on an NBA draft binge on Basketball Reference.  Scrolling through the picks, I rarely make it as far as the second round.  But every time I do, the wasteland I find there confuses me.  Multiple players picked in a row never play a single NBA game, and more often than not I can barely recognize the names I see.  In fact, it seemed to me that the second round was useless. The whole point of the draft (and the very specific order in which it proceeds) is that the worst teams get the "best" players in order to help them improve and become competitive in the quest for basketball supremacy.  Yet what I seemed to observe in my casual glances at past second rounds was a barren basketball desert where you were just as likely to get a contributing player with the first pick of the round as with the 30th.  Knowing this could drastically shift a team's strategy come draft day.  What value does a second-round pick have if, unlike in the first round, where you select has little to no correlation with the quality of player you can expect to get in return?

I wanted to compare how valuable a player is expected to be based on his numerical selection, separating the data by round in order to come up with some form of "value" for draft picks.  I decided to quantify expected value using two different advanced statistics: Box Score Plus/Minus (BPM) and Value over Replacement Player (VORP).  I compiled data from all drafts between 2001 and 2014 in order to get a sample size large enough to test.

The graphs above show data collected from the 2001-2014 NBA drafts, plotting draft position against VORP and BPM.  Clear correlations between selection number and BPM/VORP are observed in the first-round graphs, whereas no observable correlation appears in the graphs depicting the second round.
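A sketch of how that per-round comparison could be computed, assuming a table of picks with hypothetical column names (pick, round, vorp):

```python
import pandas as pd

def correlation_by_round(draft: pd.DataFrame) -> pd.Series:
    """Correlation between pick number and VORP, computed separately for each round."""
    return draft.groupby("round").apply(lambda g: g["pick"].corr(g["vorp"]))

# Toy frame just to show the shape of the calculation; real data would come
# from Basketball Reference's draft and advanced-stats pages.
toy = pd.DataFrame({
    "round": [1, 1, 1, 2, 2, 2],
    "pick":  [1, 15, 30, 31, 45, 60],
    "vorp":  [25.0, 8.0, 3.0, 0.5, 2.0, 0.0],
})
print(correlation_by_round(toy))
```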

NBA general managers should heed this uncertainty and think about shifting their strategy in the second round.  If the value of a second-rounder is impossible to predict, it might be more valuable for a team to draft a player who will not play in the NBA right away, or at least not until the team has had enough time to determine his value.  Teams are starting to realize that the traditional second-rounder (a player who often wastes away on the bench, usually contributing nothing to the team except excessive towel waving) is not the ideal.

The second round has no rookie salary scale, meaning that teams can sign players at their own discretion (almost always for the league minimum).  Further, unlike first-rounders, second-round picks do not count against the cap while unsigned.  This feature of the CBA is a loophole that general managers have been exploiting for a long time via the "draft and stash," i.e., drafting a foreign player and not paying them while they continue to play for their overseas team.  It's also possible to accomplish this domestically, as evidenced by players such as Andrew Harrison, Dakari Johnson, and other members of the 2015 draft class who didn't sign with the team that drafted them but instead played for its D-League affiliate.  These players do not count against the cap but remain under the control of the team that selected them.  This bit of front office ingenuity not only saves teams money but also enables them to keep a closer eye on their developing prospects than they would be able to with a foreign stashed player.  This is an ideal scenario for teams because it takes all the (admittedly minimal) risk of a second-round pick out of the equation while keeping the player under close watch in case they develop into a replacement-level or better NBA player.

Second-round picks remain a mystery.  Many different factors could contribute to the extreme uncertainty found there.  It could have to do with the fact that many players not projected to be picked in the first round are not scouted and analyzed as thoroughly as first-round picks.  It could also be that teams take risks more frequently with whom they draft in the second round.  Second-round picks are also often trade fodder, passed around the league packaged with players involved in trades.  In 2016, 22 of the 30 second-round picks had changed hands.  While the format of the NBA draft doesn't appear to be changing anytime soon, the mindsets of general managers certainly should.  If this trend in the second round continues, general managers should change their concept of what a successful second-round pick is.  It is becoming detrimental for the modern NBA franchise to go the standard route in the second round; it makes far more sense for general managers to either trade or somehow stash their picks.  The traditional NBA second-round pick has become outdated, and GMs need to catch up with the times.


The Masterson Effect

By Jonathan Bell

This is the start of a series where we will be examining players in all sports who showcase elite level talent, yet never break through.  They may even have spectacular seasons but then fade away into obscurity as their level of play never reaches their potential.  In each case, we will attempt to figure out what went wrong and why they couldn’t succeed despite their obvious talent.

[Photo: Boston Red Sox starting pitcher Justin Masterson pitches during the first inning against the Philadelphia Phillies at Citizens Bank Park, April 9, 2015. Credit: Bill Streicher, USA TODAY Sports]

As math becomes increasingly prevalent in the sports world, the "eye test" grows ever more obsolete.  Yet sometimes there are moments that make you forget all about analytics.  For me, one of those moments came last year while watching Justin Masterson pitch in a meaningless mid-August game between the Red Sox and the Indians.  Masterson's specialty is his slider.

To righties, his slider starts perfectly on the inner half of the plate before darting down and away with exceptional late movement.  When he's in command, his slider looks borderline unhittable, fooling even the best hitters in the league.  After watching him strike out the final batter of the inning on a particularly nasty pitch, I turned to one of my friends and muttered, "That's got to be one of the best sliders I've ever seen."  Masterson had passed the eye test with flying colors.

I decided to dig deeper on Masterson’s slider, attempting to find data on how this pitch actually performed compared to what I had observed.  The results were astounding.  Masterson’s slider consistently ranked among the best in all of baseball.  His slider has saved him an astonishing 52.0 runs over his career (via pitch f/x pitch value data) and an average of 1.22 runs per 100 pitches (approaching the best possible score of 1.5).  In 2013, Masterson’s slider ranked second in all of baseball with 18.7 runs saved over the season.  Masterson was well ahead of perennial Cy Young Award contenders Fernandez, Scherzer, Sale, Kershaw, Bumgarner and Sabathia.

Masterson doesn't suffer from small sample size bias either; he threw his slider 26.9% of the time in 2013 and 21.2% of the time over his career.  Masterson's slider also boasts an exceptional K%, over 40% for his career.  By comparison, Clayton Kershaw (the current king of K% in MLB) has a K% of 52.8% on his curveball (widely regarded as one of the best strikeout pitches in baseball), and his slider comes in at 44.1%, slightly above Masterson's.

Masterson's slider generates an impressive number of swings and misses as well.  When he hits his sweet spot (down and away to righties), opposing batters whiff at a 27% clip, considerably above league average. So at this point you're probably wondering why I am writing about this. Justin Masterson has a great slider, so what?

Well, that's the whole point. Despite having an elite pitch in his repertoire (a slider nearly as valuable as Clayton Kershaw's curveball by pitch value, 56.3 vs. 52.1 over their respective careers), Justin Masterson continually underperformed, faltering in the face of pressure and displaying constant inconsistency.


Apart from two sparkling seasons in which he showcased flashes of brilliance (2011 and 2013), Masterson pitched poorly and erratically.  His ERA in those other seasons was well over 4.00, approaching 5.00 in most years and a ghastly 6.00 over his last two years in the league (2014-2015).

The first piece of evidence against Masterson is his fastball.  Easily his worst pitch, it comes in at -49.6 runs saved for his career and has a K% of only 18.9%.  The fastball also began losing velocity after 2013, like a damaged engine losing fuel, falling five miles per hour (from 93 to 88) in just two years.  Yet even after Masterson slashed the usage of his fastball from 45% down to 19.8% in 2012 and kept it low for the rest of his career, he continued to struggle.

His sinker (around league average at -0.14 runs saved per 100 pitches) became his new primary pitch, but it didn't really help.  Interestingly, in the first year after the switch Masterson struggled immensely and his numbers suffered accordingly, until the best year of his career came out of nowhere in 2013, two years after the change.  Yet the success didn't last, as Masterson's two worst (and final) seasons came three years after he shifted from the fastball to the more effective sinker.

What went truly wrong with Masterson is hard to know.  Why his fastball was so terrible when his velocity wasn’t that bad until the end of his career is still a mystery.  Why his numbers didn’t improve that much after he shelved the fastball for the much better sinker is as well.  Masterson did struggle with control over his pitches in general throughout his career with a career average BB/9 rate of 3.7 (league average is 2.9, awful is 4.0).  This high walk rate led to an equally high career WHIP of 1.398.  Masterson also struggled with injuries and bounced around the league quite a bit.  

Maybe it was mental, maybe it was physical, but for whatever reason, despite having a Hall of Fame-caliber pitch at his disposal for nearly his whole career, Masterson never made the leap.  He performed at an average or below-average level for the majority of his largely unimpressive career, wasting away along with one of the most dominant pitches in the league.  Other than his lone All-Star appearance, Masterson became just a small blip in baseball history.

The beginning of the end for Masterson came in 2014, when he was traded to the Cardinals, who, due to organizational beliefs, made him throw his all-world slider as little as possible.  Masterson, in turn, was shelled, and after an equally awful stint with the Red Sox in 2015, he has been out of the league ever since.


New Academic Articles


Yesterday three new academic articles were uploaded to our main website.

The first is "The Queen of Aces: An Examination of the Strategy behind Serena William's Serve" by Camille Wixon. At 5'9", with 69 singles titles, Serena Williams is the number one female tennis player in the world, and arguably the best female athlete of all time (ESPN). With 452 aces over 60 matches (102 in the 2012 Wimbledon Championships alone), S. Williams dominates with her powerful and effective serve. However, it is not just her power driving her success, but also a brilliantly executed strategy at a blistering pace. The paper examines Serena's unique spinning toss and her close adherence to a mixed-strategy equilibrium, made possible by her immunity from the timing variable, as factors in her success. Read the full paper here.

The second article is "Put the Driver Away: An Analysis of Determinants in Annual Earnings of PGA Tour Players" by CJ Dickman. This paper examines the determinants of the earnings of PGA Tour golfers over the period 2012-2015. The goal of the paper is to determine the relative importance of each statistical category relevant to a golfer's average round. Golf analysts and economists have argued for well over two decades that putting is what wins championships. Recently, however, that theory has come under fire, with many leaning more toward distance than toward accuracy and putting. With the addition of two key variables to the model (Strokes Gained Tee to Green and Strokes Gained Putting), courtesy of the PGA Tour, the model is able to support the original claim that accuracy and putting are the largest determinants of PGA Tour earnings. Read the full paper here.

The final article uploaded is "Predicting Swing and Miss Percentage for Pitchers Using Pitch f/x Data" by Adam Yudelman. For an MLB franchise, making the most of a limited payroll is what allows teams without the financial resources of the Los Angeles Dodgers to compete. In an effort to do this, teams are committed to the idea of arbitrage, attempting to identify undervalued players. In an approach popularized by the movie Moneyball, teams use data analysis to put quantitative values on players. About a decade ago, the amount of data available exploded with the introduction of Pitch F/X, which uses two cameras to track pitches and record statistics like speed, ball rotation, release point, and movement. More recent technology now simultaneously tracks the movement of all players on the field. This project uses Pitch F/X data to try to find undervalued pitchers by building a model that predicts how well a pitcher can generate swings and misses. Read the full paper here.

-WFUSA
