Combining Arsenal Scores and Stuff to evaluate pitcher performance

Introduction

The Arsenal Score is a metric that examines how effective a pitch currently is, or how effective it could be. It is compiled from z-scores (a statistical measure of how far above or below the mean a specific value is) of ground ball and swinging strike rates (Sarris, 2016). Eno Sarris put this metric together to see which players might be on the verge of a breakout, should they figure out control issues, improve their fitness, and last longer in games. Eno has used the Arsenal Score to rank pitchers from the 2015 season, proposing that pitchers like Chad Bettis, Rich Hill, and Raisel Iglesias are on the verge of a breakout.
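To make the z-score idea concrete, here is a minimal sketch of how two rates could be standardized and added together for one pitch type. The rates are made up, and the equal weighting of the two z-scores is an assumption for illustration, not a description of how Sarris actually builds the Arsenal Score.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to the whole list."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical swinging-strike and ground-ball rates for one pitch type,
# thrown by four made-up pitchers (illustrative numbers, not real data).
swstr_rate = [0.08, 0.10, 0.12, 0.14]
gb_rate = [0.50, 0.40, 0.45, 0.45]

# Assumption: the two z-scores are simply added with equal weight.
pitch_scores = [
    z_swstr + z_gb
    for z_swstr, z_gb in zip(z_scores(swstr_rate), z_scores(gb_rate))
]

for i, score in enumerate(pitch_scores):
    print(f"pitcher {i}: {score:+.2f}")
```

A pitcher-level score could then be formed by summing these per-pitch values across every pitch type that meets the minimum usage threshold.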

My colleague Dan and I built the Stuff metric for a couple of different reasons. The first, which is yet to be completed, was to look at how a pitcher’s stuff could influence their risk of injury. The second was similar to the motivation behind the Arsenal Score: how can we find players who have electric “stuff”, yet are a mere tweak away from major league success? The Stuff metric is developed in a similar fashion to the Arsenal Score: we look at the z-scores of a pitcher’s velocity, change of velocity, velocity of breaking pitches, and amount of break (Sonne & Mulla, 2015). However, unlike the Arsenal Score, Stuff gives no indication of how these pitchers are influencing the hitter, that is, whether they are causing swings and misses or inducing ground balls. In a sense, this is a weakness of the Stuff metric compared to the Arsenal Score, but it could possibly be used sooner than the Arsenal Score, as minor league parks install PITCHf/x systems and other tools for measuring pitch movement and velocity. Using the Stuff metric, we’ve proposed possible 2016 breakout pitchers like Chris Bassitt and Mike Foltynewicz.

These two metrics try to get at similar answers, but go about it in different ways. For this analysis, I wanted to see how the two could be combined to predict pitcher success.

Methods

I used the Stuff metric calculated for 2015 pitchers (found here: http://www.mikesonne.ca/baseball/22/) and the Arsenal Scores for pitchers in 2015 (found here: http://www.fangraphs.com/fantasy/the-change-arsenal-scores/). In both evaluations, a pitch had to be thrown 100 times to be eligible for further analysis. In total, 138 different pitchers were included in this analysis. To see how both new pitching metrics (Arsenal Scores and Stuff) performed, I calculated the R² between each metric and ERA, xFIP, K/9, and WAR. These result values were obtained from FanGraphs. To see how the combined metrics worked to predict pitcher performance, I used a multiple regression analysis and developed a separate equation for each of the FanGraphs result values, using the sum of Arsenal Scores and the Stuff value as inputs.
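As an illustration of the approach described above, the sketch below fits a two-input linear regression and compares its R² with the single-input R² values. The data are synthetic stand-ins generated in the code (not the 2015 pitcher data), and the function names and coefficients are hypothetical; it shows one way such a comparison could be set up, not the exact procedure used for this post.

```python
import random
from statistics import mean


def cov_sum(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def r2_single(x, y):
    """R^2 of a one-predictor least-squares fit (squared Pearson correlation)."""
    return cov_sum(x, y) ** 2 / (cov_sum(x, x) * cov_sum(y, y))


def fit_two(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11, s22, s12 = cov_sum(x1, x1), cov_sum(x2, x2), cov_sum(x1, x2)
    s1y, s2y = cov_sum(x1, y), cov_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


def r2_from_predictions(y, y_hat):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    my = mean(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Synthetic stand-in data: 138 made-up pitchers with invented coefficients.
random.seed(0)
n = 138
arsenal = [random.gauss(1.0, 3.0) for _ in range(n)]
stuff = [random.gauss(0.5, 0.8) for _ in range(n)]
xfip = [
    4.2 - 0.12 * a - 0.15 * s + random.gauss(0, 0.5)
    for a, s in zip(arsenal, stuff)
]

b0, b1, b2 = fit_two(arsenal, stuff, xfip)
predicted = [b0 + b1 * a + b2 * s for a, s in zip(arsenal, stuff)]

r2_arsenal = r2_single(arsenal, xfip)
r2_stuff = r2_single(stuff, xfip)
r2_combined = r2_from_predictions(xfip, predicted)

print(f"Arsenal only: {r2_arsenal:.2f}")
print(f"Stuff only:   {r2_stuff:.2f}")
print(f"Combined:     {r2_combined:.2f}")
```

Because the single-input models are special cases of the two-input model, the combined R² can never be lower than either single-input R² on the same data; the interesting question is how much higher it is.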

For further analysis of the combined metric model, the difference between predicted values and actual values was calculated for ERA, xFIP, and K/9. This analysis did not include WAR, to allow for equal comparison between players who played different numbers of games.

Results

Model Performance

In general, the Arsenal Score was a better predictor of pitcher performance than Stuff. Arsenal Scores had higher R² values when predicting xFIP, WAR and K/9, with Stuff having a slightly higher R² value for ERA (Table 1). The combined model was a better predictor than either metric alone, with the greatest improvement seen for WAR (an increase of 11 percentage points in explained variance over the better single input variable).

The combined Arsenal-Stuff model performed best when predicting xFIP (accounting for 46% of the variance in xFIP). Predicted vs. actual values for all result variables can be found in Figure 1.

Table 1. R² values between the input variables of Stuff / Arsenal Score, and result values of ERA, K/9, WAR, and xFIP. R² values are also presented for the combined model, which uses both Arsenal Score and Stuff as an input.

	ERA	K/9	WAR	xFIP
Stuff	0.14	0.17	0.27	0.13
Sum Arsenal	0.12	0.37	0.33	0.44
Combined Model	0.19	0.41	0.44	0.46
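The gains from combining the two metrics can be read straight off Table 1. The short sketch below does that arithmetic: for each result value it subtracts the better single-input R² from the combined R², and also reports the share of variance the combined model leaves unexplained. The dictionary layout and variable names are just an illustrative choice.

```python
# R^2 values copied from Table 1.
r2 = {
    "ERA": {"stuff": 0.14, "arsenal": 0.12, "combined": 0.19},
    "K/9": {"stuff": 0.17, "arsenal": 0.37, "combined": 0.41},
    "WAR": {"stuff": 0.27, "arsenal": 0.33, "combined": 0.44},
    "xFIP": {"stuff": 0.13, "arsenal": 0.44, "combined": 0.46},
}

gain = {}         # combined R^2 minus the better single-input R^2
unexplained = {}  # share of variance the combined model leaves unexplained

for outcome, row in r2.items():
    best_single = max(row["stuff"], row["arsenal"])
    gain[outcome] = round(row["combined"] - best_single, 2)
    unexplained[outcome] = round(1 - row["combined"], 2)

for outcome in r2:
    print(f"{outcome}: gain {gain[outcome]:.2f}, unexplained {unexplained[outcome]:.2f}")
```

WAR shows the largest gain (0.11), while xFIP has the highest combined R² but the smallest gain (0.02), since the Arsenal Score alone already accounts for most of it.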

Figure 1. Relationships between predicted K/9, ERA, WAR, and xFIP and actual values. All predicted values are determined from a model that uses both Arsenal Scores and the Stuff Metric.

Player Identification

As a post-hoc analysis, I calculated the difference between predicted values and actual values. For ERA and xFIP, a negative difference indicates that the player’s predicted ERA or xFIP was lower than their actual result, which could indicate that the player may perform better in 2016. A positive difference may indicate that the pitcher will not have results as favourable in 2016. The interpretation is reversed for K/9, with higher values indicating that the pitcher should be expected to strike out more batters in 2016.
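The text does not spell out the exact formula for the difference, but the values reported in Tables 2 to 4 appear consistent with (predicted − actual) / predicted, that is, the gap expressed as a fraction of the predicted value. The sketch below works under that assumption, using a few rows taken from the tables; the function name is hypothetical.

```python
def relative_difference(predicted, actual):
    """Gap between prediction and result, as a fraction of the prediction.

    Assumption: this form is inferred from the tabled values, not stated in the text.
    """
    return (predicted - actual) / predicted


# (pitcher, result variable, predicted value, actual value) from Tables 2 and 4.
rows = [
    ("Chris Capuano", "ERA", 4.44, 7.97),
    ("Zack Greinke", "ERA", 3.45, 1.66),
    ("Tyler Wilson", "K/9", 6.76, 3.25),
]

diffs = {
    name: round(relative_difference(predicted, actual), 2)
    for name, _, predicted, actual in rows
}

for name, stat, predicted, actual in rows:
    print(f"{name} ({stat}): predicted {predicted}, actual {actual}, difference {diffs[name]:+.2f}")
```

Under this reading, a difference of -0.80 for Capuano means his actual ERA sat about 80% of the predicted value above the prediction, which is why the ERA and xFIP tables flag negative values as room for improvement.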

Table 2. The top 10 and bottom 10 predicted ERA differences. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.

	Rank	Pitcher	ERA Difference	Predicted ERA	ERA	Arsenal Score	Stuff
Room for Improvement	1	Chris Capuano	-0.80	4.44	7.97	0.19	-0.62
	2	Bud Norris	-0.74	3.85	6.72	1.15	0.81
	3	Keyvius Sampson	-0.67	3.92	6.54	0.11	0.89
	4	Hector Noesi	-0.61	4.28	6.89	-2.06	0.41
	5	Carlos Carrasco	-0.48	2.45	3.63	14.33	1.43
	6	David Hale	-0.47	4.15	6.09	2.36	-0.35
	7	Archie Bradley	-0.46	3.97	5.80	1.51	0.38
	8	Matt Garza	-0.45	3.88	5.63	-0.92	1.25
	9	Matt Moore	-0.38	3.92	5.43	0.90	0.66
	10	Michael Lorenzen	-0.38	3.90	5.40	-0.59	1.10
Due for Regression	121	Jerad Eickhoff	0.29	3.76	2.65	2.05	0.85
	122	Josh Tomlin	0.31	4.36	3.02	0.90	-0.58
	123	Jake Arrieta	0.31	2.56	1.77	7.22	2.95
	124	Jaime Garcia	0.33	3.63	2.43	4.14	0.67
	125	David Price	0.34	3.70	2.45	1.61	1.11
	126	Dallas Keuchel	0.34	3.76	2.48	6.04	-0.19
	127	Brandon Morrow	0.36	4.28	2.73	-1.89	0.37
	128	John Lackey	0.38	4.46	2.77	-2.30	-0.04
	129	Steven Matz	0.44	4.02	2.27	1.02	0.36
	130	Zack Greinke	0.52	3.45	1.66	3.04	1.48

Table 3. The top 10 and bottom 10 predicted xFIP differences. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.

	Rank	Pitcher	xFIP Difference	Predicted xFIP	xFIP	Arsenal Score	Stuff
Room for Improvement	1	Allen Webster	-0.40	4.30	6.02	-0.95	-0.95
	2	Archie Bradley	-0.34	3.85	5.15	1.51	0.38
	3	Henry Owens	-0.33	3.77	5.01	1.93	0.62
	4	Carlos Carrasco	-0.32	2.02	2.66	14.33	1.43
	5	Hector Noesi	-0.30	4.33	5.61	-2.06	0.41
	6	Jarred Cosart	-0.25	3.57	4.46	3.15	0.99
	7	Keyvius Sampson	-0.24	3.99	4.97	0.11	0.89
	8	Garrett Richards	-0.24	3.06	3.80	6.44	1.69
	9	Matt Moore	-0.23	3.91	4.81	0.90	0.66
	10	Chi Chi Gonzalez	-0.21	4.36	5.26	-1.98	0.00
Due for Regression	121	Chris Sale	0.15	3.08	2.60	6.49	1.49
	122	Joe Blanton	0.16	3.56	3.01	3.99	-0.15
	123	Jose Quintana	0.16	4.18	3.51	-0.91	0.33
	124	Dallas Keuchel	0.16	3.29	2.75	6.04	-0.19
	125	Tyler Duffey	0.16	4.35	3.64	-2.35	0.56
	126	Clay Buchholz	0.17	3.98	3.30	0.40	0.57
	127	Brett Anderson	0.18	4.29	3.51	-2.10	0.92
	128	Jose Fernandez	0.19	3.24	2.62	5.38	1.33
	129	Michael Pineda	0.19	3.65	2.95	3.07	0.26
	130	Stephen Strasburg	0.20	3.35	2.69	4.40	1.61

Table 4. The top 10 and bottom 10 predicted K/9 differences. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.

	Rank	Pitcher	K/9 Difference	Predicted K/9	K/9	Arsenal Score	Stuff
Room for Improvement	1	Tyler Wilson	0.52	6.76	3.25	-0.76	-0.55
	2	Chi Chi Gonzalez	0.39	6.61	4.03	-1.98	0.00
	3	Jose Urena	0.39	6.70	4.09	-1.99	0.24
	4	Cody Anderson	0.38	7.01	4.34	-0.47	-0.12
	5	Scott Feldman	0.36	7.91	5.07	1.52	0.71
	6	Jarred Cosart	0.29	8.49	6.07	3.15	0.99
	7	Aaron Sanchez	0.26	8.09	5.95	1.25	1.37
	8	Archie Bradley	0.25	7.78	5.80	1.51	0.38
	9	Kyle Ryan	0.25	6.39	4.79	-0.85	-1.42
	10	Allen Webster	0.25	6.54	4.94	-0.95	-0.95
Due for Regression	121	Stephen Strasburg	-0.20	9.10	10.96	4.40	1.61
	122	Chris Archer	-0.21	8.83	10.70	3.77	1.39
	123	Tyler Duffey	-0.22	6.72	8.22	-2.35	0.56
	124	Chris Sale	-0.22	9.66	11.82	6.49	1.49
	125	Ian Kennedy	-0.23	7.55	9.30	0.18	0.79
	126	Vincent Velasquez	-0.24	7.55	9.38	-0.11	1.00
	127	Nate Karns	-0.27	7.01	8.88	-1.35	0.54
	128	Lance Lynn	-0.28	6.70	8.57	-2.27	0.45
	129	Drew Smyly	-0.34	7.75	10.40	2.16	-0.17
	130	John Lamb	-0.62	6.49	10.51	-2.09	-0.24

Discussion

This new model, which incorporates both the Stuff metric and the Arsenal Score, improves predictions of ERA, xFIP, K/9 and WAR. By combining the two metrics, the new model captures both the action of a pitch and the ability of a pitcher to induce swings and misses and ground balls.

Examining the player rankings to determine which pitchers are underperforming and overperforming relative to the new model’s predictions, some interesting names show up. Carlos Carrasco appears to be due for improvement based on ERA and xFIP. Matt Moore is slowly returning from injury, but could see improvements in 2016 based on his Stuff and Arsenal Scores.

While pitchers like Zack Greinke, David Price, and Dallas Keuchel appear on the list of pitchers who could see regression in 2016, this is due more to the fact that they had otherworldly, perhaps outlier, seasons than it is a commentary on them pitching above their ability. Zack Greinke has gone on the record saying that his 2015 season was an outlier, and “that he may not actually be that good” (Rodgers, 2016). For Blue Jays fans, it is exciting to see that Aaron Sanchez’s stuff predicts he will have a better K/9 next season, though it remains to be seen whether he will pitch as a starter or reliever.

This model, much like the previous evaluations of Stuff and Arsenal Scores, does not factor in control, deception or pitch sequencing. While model performance is strong, even the best fit (xFIP) leaves more than 50% of the variance unexplained. Pitching is complicated, and to achieve better predictions, models will need to grow increasingly complicated.

Conclusion

The combined Stuff/Arsenal score model improves predictions of ERA, xFIP, K/9 and WAR over the individual metrics on their own. This model was used to identify possible candidates for improvement and regression in the 2016 season. Future work should include a variety of more complicated measures to account for control, deception and additional game factors.

References

Rodgers, J., 2016. Zack Greinke on furthering his 2015 domination: ‘I’m probably not that good’. Retrieved from: http://www.sportingnews.com/mlb-news/4695603-zack-greinke-stats-diamondbacks-projection-cy-young-chances, on February 21, 2016.

Sarris, E., 2016. The Change: Arsenal Scores. Retrieved from: http://www.fangraphs.com/fantasy/the-change-arsenal-scores/, on February 2, 2016.

Sonne, M.W., and Mulla, D., 2015. Revisiting the “Stuff” Metric. Retrieved from http://www.mikesonne.ca/baseball/22/, on December 21, 2016.

Additional Information

Difference between predicted and actual values – all pitchers included in the analysis.

http://bit.ly/1TyKbxt
