Cy Young Award in the Sabermetrics Era: A Study of Who Will Win in 2011, Part 2

This is Part 2 of a two-part series analyzing a current Cy Young Predictor formula. I offer a replacement formula that accounts for the change in philosophy among Cy Young voters brought on by the growing influence of new-age statistics (sabermetrics), and I use this new formula to project the Cy Young race in 2011 and beyond.

In Part 1, I looked into a widely accepted Cy Young Predictor formula and explained the flaws in it. (You can check out Part 1 here).

In Part 2, I will project pitching statistics for the top 12 National League pitchers.

NOTE: All of the Tables contain a lot of information and, as such, have been uploaded elsewhere and linked in here for better clarity.

PART 2: A Method to Predict the Cy Young Award Winner in Any Given Year

We have already found a more accurate way to predict who will win the Cy Young Award based upon season statistics (see Part 1), but now let’s look into an accurate way to predict who will win the Cy Young before the season even starts.

The 2011 MLB season is quickly approaching, and there has been a ton of hype surrounding the Phillies’ pitching staff, but is it warranted?

They have all proven to be dominant pitchers in the past, but how likely is it that one of the four main Phillies' starters will take home the Cy Young this year?

Who are the most likely candidates to challenge the Phillies' aces for the crown in 2011?

To answer all of these questions, I will need to project 2011 pitching statistics based on prior years’ data.

There’s no simple way to do this. Every year is different, and you don’t know who has made improvements and who has struggled through the offseason.

The age factor is always a question too. Some young pitchers come in with high expectations and never break through, while others come out of nowhere and have dominant seasons.

Older pitchers have a lot more experience, yet their arm strength usually suffers late in their career.

Pitchers moving to a new team and significant defensive improvements made in the offseason are both obstacles to projecting accurate pitching statistics.

But in general, barring any unforeseen injury, a pitcher’s statistics will be closely related to his statistics from past seasons.

There is a limit to how far back you can look, though.

Obviously, Cliff Lee’s rookie or sophomore season isn’t a fair comparison to his later years. It takes time for a pitcher to show his true colors and for him to either develop into a star, or fade into obscurity.

Most pitchers have bumps along the way, but looking at any three- or four-year period tells us a lot. A pitcher’s average statistics over a three-year period will often provide clues as to how his next year will go.

Why three years?

Well, three years seems to be a good middle ground. If we look only at the previous season, we won’t give ourselves enough information.

As an example, Zack Greinke won the Cy Young in 2009 with a 2.16 ERA and 1.07 WHIP. His numbers in 2008 were decent, but not Cy Young-type numbers.

In 2010, his numbers again were really good, but not enough for the Cy Young. If we were to gauge a pitcher’s performance only on the previous year, we would surely have expected Greinke to post Cy Young-caliber numbers in 2010.

However, it is actually very rare for a pitcher to have similar seasons back to back.

On the flip side, if we were to look at Greinke’s entire career, we’d be pulling in information from when he was just starting out in MLB and hadn’t yet developed into the dominant pitcher he turned out to be. Again, that wouldn’t be a fair analysis.

Greinke is just one example, but the trend holds true for the majority of cases.

The 2011 Cy Young Predictor

I've compiled a list of the top 12 starting pitchers over the past three seasons. Their statistics are shown in Table 2a here:

TABLE 2a

The average number of games started over the three-year time period is boxed for each pitcher. The adjusted wins/losses columns will be discussed shortly.

First, let’s use these statistics from past years to find an accurate projection for the 2011 season in Cy Young Points (CYP).

TABLE 2b

The CYP (adjusted) column shows what that pitcher’s statistics in that year would have given him under the ADJUSTED CYP formula found in PART 1:

Cy Young Points (CYP) = ((5*IP/9)-ER) + (K’s/5) + (SV*1.5) + (Shutouts*2) + ((W*3)-(L*2)) + (VB*5) + ((0.5*IP)-(IP*WHIP/3))

The total is found by adding the CYP from each individual category shown in the previous columns.
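As a rough illustration, the adjusted formula can be written as a small Python function that adds up the category terms one at a time. The function name, parameter names and the sample season line are assumptions made for this sketch; they are not taken from the tables, and VB is simply passed in with whatever value Part 1 assigns it.

```python
def adjusted_cyp(ip, er, k, sv, sho, w, l, vb, whip):
    """Adjusted Cy Young Points, term by term as the formula is written."""
    run_prevention = (5 * ip / 9) - er           # ((5*IP/9)-ER)
    strikeouts = k / 5                           # (K's/5)
    saves = sv * 1.5                             # (SV*1.5)
    shutouts = sho * 2                           # (Shutouts*2)
    record = (w * 3) - (l * 2)                   # ((W*3)-(L*2))
    bonus = vb * 5                               # (VB*5), VB as defined in Part 1
    baserunners = (0.5 * ip) - (ip * whip / 3)   # ((0.5*IP)-(IP*WHIP/3))
    return (run_prevention + strikeouts + saves + shutouts
            + record + bonus + baserunners)


# Hypothetical season line, made up for illustration only.
example = adjusted_cyp(ip=225, er=60, k=200, sv=0, sho=2,
                       w=18, l=8, vb=0, whip=1.10)
print(round(example, 1))  # prints 177.0
```

Keeping each category as its own variable mirrors the per-category columns described for the table, so any single term can be inspected on its own.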

The W/L ADJ is a simple adjustment to account for a pitcher changing teams between 2008 and 2011.

Take, for instance, Zack Greinke.

Basically, if Greinke had been on the Brewers for the past three years, how many wins would he have had compared to how many he got with the Royals?

Clearly, the Brewers had a better offense, so Greinke would have most likely had a few more wins. This adjustment compares the team wins in that given year and, using that, shows what the CYP would have been.

Then the CYP(W/L adj) column shows the total CYP using the W/L adjusted values.
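The exact arithmetic behind the W/L adjustment is not spelled out here, so the following is only one plausible sketch of it: scale the pitcher's wins by the ratio of the two teams' win totals and his losses by the ratio of their loss totals. The function names, the 162-game default and the sample numbers are all assumptions for illustration.

```python
def adjust_record(wins, losses, old_team_wins, new_team_wins, games=162):
    """Scale a pitcher's record by the ratio of team wins and team losses.

    This is an assumed form of the adjustment, not a confirmed formula.
    """
    win_ratio = new_team_wins / old_team_wins
    loss_ratio = (games - new_team_wins) / (games - old_team_wins)
    return wins * win_ratio, losses * loss_ratio


def record_points(wins, losses):
    """The ((W*3)-(L*2)) term of the CYP formula."""
    return (wins * 3) - (losses * 2)


# Hypothetical case: a 15-8 pitcher moving from a 75-win team to a 90-win team.
adj_w, adj_l = adjust_record(15, 8, old_team_wins=75, new_team_wins=90)
change = record_points(adj_w, adj_l) - record_points(15, 8)
print(round(adj_w, 1), round(adj_l, 1), round(change, 1))
```

Only the win/loss term of the CYP changes under this adjustment; every other category is left as the pitcher actually earned it.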

The CYP(#G adj) is a simple adjustment to equalize the number of games started for all pitchers.

Some pitchers have started more games than others over the past three seasons. Some of that is due to the pitcher’s team being in a playoff race, and some is due to the pitcher himself having a dominant season.

In any given year, any pitcher shown could throw more innings, but it depends on whether he is called upon to do so. It’s out of the pitcher’s control, so I assume each pitcher starts the SAME number of games to equalize each pitcher’s chances.

If you look at Table 2a, the boxed numbers show the average number of games started for each pitcher. Normalizing to the largest average will eliminate the advantage that some pitchers have had by simply starting more games.

NOTE: This does not equalize number of innings pitched, but only equalizes the number of starts. If a pitcher typically goes deep into games, he will still have a significant advantage over a pitcher who only tosses a few innings per start.
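One simple way to picture the games-started adjustment is to scale each pitcher's CYP by the ratio of the largest average number of starts to his own average. This is a sketch under that assumption; the pitcher labels and values below are made up and do not come from Table 2a.

```python
def normalize_to_starts(cyp, avg_starts, target_starts):
    """Scale CYP as if the pitcher had made target_starts starts.

    Per-start production is unchanged, so a pitcher who goes deep
    into games keeps his advantage.
    """
    return cyp * target_starts / avg_starts


# Hypothetical three-year averages for two pitchers.
avg_starts = {"Pitcher A": 33.0, "Pitcher B": 30.0}
cyp = {"Pitcher A": 165.0, "Pitcher B": 150.0}

target = max(avg_starts.values())  # normalize to the largest average
adjusted = {
    name: normalize_to_starts(cyp[name], avg_starts[name], target)
    for name in cyp
}
print(adjusted)
```

In this made-up case both pitchers earn 5 CYP per start, so they come out even once the number of starts is equalized.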

After those two simple adjustments, we can find the average CYP over the past three years, as shown in the far right column.

As you can see, I’ve taken a straight average over the past three years to find the CYP (ave). I haven’t put any extra emphasis on more recent years for the reason I mentioned earlier: it’s very rare for a pitcher to have similar seasons back-to-back.

An average over a three-year period is a much better indicator.

Now comes the fun part.

Using all this data, we can find the probability of a pitcher having a dominant season, and thus infer his likelihood of taking home the Cy Young Award.

We have to find the probability that a pitcher will reach a high number of CYP in 2011. Oswalt may average 129 CYP, but will he be able to get enough CYP in 2011 to win the Cy Young Award? He won’t do that with only 129 CYP.

We can analyze this using cumulative probability.

Cumulative probability is the sum of probabilities up to a given value. Treating each pitcher’s CYP as a normally distributed random variable, we can use it to find the probability that his score will be greater than or equal to a specific threshold.

If we set that threshold to a CYP value of, say, 190, the result is the probability of a given pitcher reaching 190 CYP in 2011. This number is used because, based on the past data, a pitcher who scores 190 CYP should take home the award.
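The probability of reaching 190 CYP can be sketched with Python's standard library, assuming a pitcher's CYP follows a normal distribution whose mean and standard deviation come from his three adjusted seasons. Whether the sample or the population standard deviation was used in the tables is not stated, so the sample version here is an assumption, and the three season values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def prob_reaching(threshold, season_cyp):
    """P(CYP >= threshold) under a normal model fitted to past seasons."""
    mu = mean(season_cyp)
    sigma = stdev(season_cyp)  # sample standard deviation (an assumption)
    # The cumulative probability gives P(CYP <= threshold);
    # its complement is the chance of reaching the threshold.
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)


# Hypothetical adjusted CYP for three seasons: mean 170, standard deviation 20.
seasons = [150.0, 190.0, 170.0]
p = prob_reaching(190, seasons)
print(f"{p:.1%}")  # prints 15.9%
```

A pitcher whose average sits well below 190 can still show a real chance if his seasons vary widely, which is why the standard deviation matters as much as the average.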

Table 2c below shows the standard deviation, then the cumulative probability of each of the top 12 pitchers achieving 190 CYP in 2011.

Then, in the rightmost column, each pitcher's chance of winning the award is broken down into a percentage of the sum.

TABLE 2c

A safety factor of 20 percent was left in to account for relief pitchers and for other starting pitchers not mentioned.
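Turning the individual probabilities into a share of the award can be sketched as follows. How the safety factor was applied is not described in detail, so this version simply holds back a fixed reserve and splits the remainder in proportion to each pitcher's probability; the function name, the reserve handling and the sample inputs are all assumptions.

```python
def award_shares(tail_probs, reserve=0.20):
    """Split (1 - reserve) among pitchers in proportion to their probabilities."""
    total = sum(tail_probs.values())
    return {
        name: (1.0 - reserve) * p / total
        for name, p in tail_probs.items()
    }


# Hypothetical probabilities of reaching the CYP threshold.
probs = {"Pitcher A": 0.30, "Pitcher B": 0.10}
shares = award_shares(probs)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The listed pitchers' shares add up to one minus the reserve, with the held-back portion standing in for everyone not on the list.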

As you can see, Roy Halladay has the best chance of achieving 190 CYP this year. Cliff Lee, Chris Carpenter and Tim Lincecum also have good odds, but their odds are significantly lower than Halladay’s.

A breakdown by team is shown in Table 2d below.

TABLE 2d: Probability By Team
Team	Probability
Phillies	49.64%
Cardinals	11.32%
Giants	10.80%
Brewers	7.30%
Rockies	2.14%
Padres	1.58%
Dodgers	0.93%
Marlins	0.67%
TOTAL (teams listed above)	84.38%
All Others	15.63%

The Phillies’ top four starters combine for a predictable advantage in winning the Cy Young Award this year at 49.64 percent.

The Cardinals have the next best odds at 11.32 percent, but it’s worth noting that the analysis was also done with Wainwright in the rotation for the Cards.

The Cardinals’ probability dropped from around 15 percent with him to 11.32 percent without him.

Conclusion

The calculations presented in this study use several assumptions, and every year presents a new opportunity for ANY pitcher.

A new ace may come out of nowhere, much like Mat Latos did last year. A steady pitcher over the past few years could completely fall apart, a pitcher could get traded, or there could be significant injuries to anyone.

You never know what will happen, and that’s why we watch sports, right? If everything went as predicted every year, where’s the fun in cheering for the underdog?

This study is a snapshot of predictable statistics for the 2011 season and how the Cy Young voting will turn out come season’s end, based on things we know NOW.

The Phillies may have good odds to win the Cy Young, but will they? Or will a young stud come out of nowhere and challenge one of the perennials in the N.L.?

The odds indicate otherwise but, with so many variables, it’s impossible to really know.

When we sharpen our pencils and get down deep into the stats, we can know a little more and be more prepared to answer those questions.

Hopefully, this study puts you a little bit ahead of the other guy.

Written By Todd Drager
This article was originally published in clean and simple PDF form here.

Follow Todd on Twitter @7thandPattison

Read more MLB news on BleacherReport.com
