The Arsenal Score is a metric that examines how effective a pitch currently is, or how effective it could be. It is compiled from z-scores (a statistical measure of how far above or below the mean a specific value is) of ground ball and swinging strike rates (Sarris, 2016). Eno Sarris put this metric together to see which players might be on the verge of a breakout, should they resolve their control issues, improve their fitness, and last longer in games. Sarris has used the Arsenal Score to rank pitchers from the 2015 season, proposing that pitchers like Chad Bettis, Rich Hill, and Raisel Iglesias are on the verge of a breakout.
My colleague Dan and I built the Stuff metric for two reasons. The first, which is yet to be completed, was to look at how a pitcher’s stuff could influence their risk of injury. The second was similar to the motivation behind the Arsenal Score: finding players who have electric “stuff” yet are a mere tweak away from major league success. The Stuff metric is developed in a similar fashion to the Arsenal Score: we look at the z-scores of a pitcher’s velocity, change of velocity, velocity of breaking pitches, and amount of break (Sonne & Mulla, 2015). However, unlike the Arsenal Score, the Stuff metric gives no indication of how these pitches influence the hitter, that is, whether they cause swings and misses or induce ground balls. In this sense, the Stuff metric is weaker than the Arsenal Score, but it could be put to use sooner, as minor league parks install PITCHf/x systems and other tools for measuring pitch movement and velocity. Using the Stuff metric, we’ve proposed possible 2016 breakout pitchers like Chris Bassitt and Mike Foltynewicz.
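Both metrics are described as being built from z-scores. As a rough illustration only, the sketch below standardizes two rate statistics and adds them together. The equal-weight sum, the use of a population standard deviation, and all of the numbers are assumptions made for the example; neither Sarris (2016) nor Sonne & Mulla (2015) is being quoted for the exact weighting.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return how many standard deviations each value sits above or below the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (assumed); a sample SD is also defensible
    if sigma == 0:
        # Every value is identical, so nothing is above or below the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical rates for four pitches of the same type (not real data).
swinging_strike_rate = [0.08, 0.10, 0.12, 0.14]
ground_ball_rate = [0.40, 0.50, 0.45, 0.55]

# Assumed composite: an equal-weight sum of the two z-scores for each pitch.
composite = [
    z_swing + z_ground
    for z_swing, z_ground in zip(
        z_scores(swinging_strike_rate), z_scores(ground_ball_rate)
    )
]
```

A pitch that is above average on both rates ends up with a positive composite, and one that is below average on both ends up negative.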
These two metrics try to get at similar answers, but go about it in different ways. For this analysis, I wanted to see how the two metrics could be combined to predict pitcher success.
I used the Stuff metric calculated for 2015 pitchers (found here: http://www.mikesonne.ca/baseball/22/) and the Arsenal Scores for pitchers in 2015 (found here: http://www.fangraphs.com/fantasy/the-change-arsenal-scores/). In both evaluations, a pitch had to be thrown 100 times to be eligible for further analysis. In total, 138 pitchers were included in this analysis. To see how both new pitching metrics performed (Arsenal Scores and Stuff), I calculated the R2 between each metric and ERA, xFIP, K/9, and WAR. These result values were obtained from FanGraphs. To see how well the combined metrics predicted pitcher performance, I used a multiple regression analysis and developed a separate equation for each of the FanGraphs result values, using the sum of Arsenal Scores and the Stuff value as inputs.
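For readers who want to see the mechanics, here is a minimal sketch of a two-input multiple regression and the R2 of its predictions. It is not the code used for this analysis: the function names are hypothetical, the five data points are invented, and ordinary least squares is assumed as the fitting method.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted` (1 - SSE/SST)."""
    mu = mean(actual)
    sst = sum((a - mu) ** 2 for a in actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - sse / sst


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 (centred normal equations)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero when the two inputs are perfectly collinear
    if det == 0:
        raise ValueError("predictors are collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical inputs for five pitchers (not the study's data).
arsenal = [-2.0, 0.0, 1.0, 3.0, 6.0]
stuff = [0.5, -1.0, 0.0, 1.0, 1.5]
xfip = [4.6, 4.3, 3.9, 3.2, 2.4]

b0, b1, b2 = fit_two_predictors(arsenal, stuff, xfip)
predicted = [b0 + b1 * a + b2 * s for a, s in zip(arsenal, stuff)]
combined_r2 = r_squared(xfip, predicted)
```

The same `r_squared` helper works for a single-input fit; in that case the value equals the squared Pearson correlation between the metric and the outcome.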
For further analysis of the combined metric model, the difference between predicted values and actual values was calculated for ERA, xFIP, and K/9. This analysis did not include WAR, to allow for equal comparison between players who played different numbers of games.
In general, the Arsenal Score was a better predictor of pitcher performance than Stuff. Arsenal scores had higher R2 values when predicting xFIP, WAR and K/9, with Stuff having a slightly higher R2 value for ERA (Table 1). The new combined model was a better predictor than either metric alone, with the greatest improvement seen for WAR (an 11% increase in explained variance compared to a single input variable).
The combined Arsenal-Stuff model performed best when predicting xFIP (accounting for 46% of the variance in xFIP). Predicted vs. actual values can be found in Figure 1 for all result variables.
Table 1. R2 values between the input variables of Stuff / Arsenal Score, and result values of ERA, K/9, WAR, and xFIP. R2 values are also presented for the combined model, which uses both Arsenal Score and Stuff as an input.
Figure 1. Relationships between predicted K/9, ERA, WAR, and xFIP and actual values. All predicted values are determined from a model that uses both Arsenal Scores and the Stuff Metric.
As a post-hoc analysis, I calculated the difference between predicted values and actual values. For ERA and xFIP, a lower value indicates that the player’s predicted ERA or xFIP was lower than their actual results, which could indicate that the player may perform better in 2016. A higher value may indicate that the pitcher will not have results as favourable in 2016. The analysis is the opposite for K/9, with higher values indicating that the pitcher should be expected to strike out more batters in 2016.
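The exact formula for the difference is not spelled out in the text. The values listed in Tables 2 to 4 appear roughly consistent with a relative difference, (predicted - actual) / predicted, so the sketch below uses that as an assumption and compares it with a few tabulated rows. Small mismatches of about 0.01 are to be expected, since the tabulated predicted and actual values are themselves rounded.

```python
def relative_difference(predicted, actual):
    """(predicted - actual) / predicted: an assumed formula inferred from the tables."""
    return (predicted - actual) / predicted


# (label, predicted, actual, tabulated difference), copied from Tables 2 to 4.
rows = [
    ("Chris Capuano, ERA", 4.44, 7.97, -0.80),
    ("Jerad Eickhoff, ERA", 3.76, 2.65, 0.29),
    ("Allen Webster, xFIP", 4.30, 6.02, -0.40),
    ("Tyler Wilson, K/9", 6.76, 3.25, 0.52),
    ("Stephen Strasburg, K/9", 9.10, 10.96, -0.20),
]

for label, predicted, actual, tabulated in rows:
    computed = relative_difference(predicted, actual)
    print(f"{label}: computed {computed:+.3f}, tabulated {tabulated:+.2f}")
```

Under this reading, a negative value for ERA or xFIP means the model predicted a lower (better) number than the pitcher actually posted, which matches the "Room for Improvement" grouping.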
Table 2. The top 10 and bottom 10 predicted ERA errors. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.
|Group||Rank||Pitcher||ERA Difference||Predicted ERA||ERA||Arsenal Score||Stuff|
|Room for Improvement||1||Chris Capuano||-0.80||4.44||7.97||0.19||-0.62|
|Due for Regression||121||Jerad Eickhoff||0.29||3.76||2.65||2.05||0.85|
Table 3. The top 10 and bottom 10 predicted xFIP errors. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.
|Group||Rank||Pitcher||xFIP Difference||Predicted xFIP||xFIP||Arsenal Score||Stuff|
|Room for Improvement||1||Allen Webster||-0.40||4.30||6.02||-0.95||-0.95|
|Room for Improvement||10||Chi Chi Gonzalez||-0.21||4.36||5.26||-1.98||0.00|
|Due for Regression||121||Chris Sale||0.15||3.08||2.60||6.49||1.49|
Table 4. The top 10 and bottom 10 predicted K/9 errors. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.
|Group||Rank||Pitcher||K/9 Difference||Predicted K/9||K/9||Arsenal Score||Stuff|
|Room for Improvement||1||Tyler Wilson||0.52||6.76||3.25||-0.76||-0.55|
|Room for Improvement||2||Chi Chi Gonzalez||0.39||6.61||4.03||-1.98||0.00|
|Due for Regression||121||Stephen Strasburg||-0.20||9.10||10.96||4.40||1.61|
This new model, which incorporates both the Stuff metric and the Arsenal Score, improves predictions of ERA, xFIP, K/9 and WAR. By combining these metrics, the new model captures both the action of a pitch and the ability of a pitcher to induce swings and misses and ground balls.
Examining the player rankings to determine which pitchers are underperforming and overperforming relative to the new model’s predictions turns up some interesting names. Carlos Carrasco appears to be due for improvement based on ERA and xFIP. Matt Moore is slowly returning from injury, but could see improvements in 2016 based on his Stuff and Arsenal Scores.
While pitchers like Zack Greinke, David Price, and Dallas Keuchel appear on the list of pitchers who could see regression in 2016, this is due more to the fact that they had otherworldly, perhaps outlier, seasons than it is a commentary on them pitching above their ability. Zack Greinke has gone on the record saying that his 2015 season was an outlier and that he may not actually be that good (Rogers, 2016). For Blue Jays fans, it is exciting to see that Aaron Sanchez’s stuff predicts he will have a better K/9 next season, though it remains to be seen whether he will pitch as a starter or reliever.
This model, much like the previous evaluations of Stuff and Arsenal Scores, does not factor in control, deception or pitch sequencing. While model performance is strong, more than 50% of the variance remains unexplained, even for xFIP, the best-predicted outcome. Pitching is complicated, and to achieve better predictions, models will need to grow increasingly complicated.
The combined Stuff/Arsenal score model improves predictions of ERA, xFIP, K/9 and WAR over the individual metrics on their own. This model was used to identify possible candidates for improvement and regression in the 2016 season. Future work should include a variety of more complicated measures to account for control, deception and additional game factors.
Rogers, J., 2016. Zack Greinke on furthering his 2015 domination: ‘I’m probably not that good’. Retrieved from:
Sarris, E., 2016. The Change: Arsenal Scores. Retrieved from: http://www.fangraphs.com/fantasy/the-change-arsenal-scores/, on February 2, 2016.
Sonne, M.W., and Mulla, D., 2015. Revisiting the “Stuff” Metric. Retrieved from: http://www.mikesonne.ca/baseball/22/, on December 21, 2016.
Difference between predicted and actual values – all pitchers included in the analysis.