
Advanced Analytics: BABIP - A Quality Statistic Measuring Quality Contact

Batting Average on Balls in Play (“BABIP”) is a statistic that measures success on batted balls that remain in the field of play. BABIP can also be used by coaches at the High School level to measure their teams’ and players’ ability to consistently make quality contact when they do put the bat on the ball. The equation for BABIP is as follows:1
 
BABIP = ( H - HR ) ÷ ( AB - HR - K + SF ), or Hits in Play ÷ Balls in Play
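
For coaches who keep their own stat sheets, the formula above can be sketched as a small Python function. This is only an illustration: the function name, argument names, and the sample stat line are hypothetical and are not taken from GameChanger's data.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting Average on Balls in Play: (H - HR) / (AB - HR - K + SF)."""
    balls_in_play = at_bats - home_runs - strikeouts + sac_flies
    if balls_in_play <= 0:
        return None  # undefined when no ball was put in play
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line (made-up numbers, for illustration only):
# 30 H, 2 HR, 100 AB, 20 K, 2 SF -> 28 hits in play on 80 balls in play.
print(round(babip(30, 2, 100, 20, 2), 3))  # 0.35
```

Returning None when there are no balls in play is a design choice for this sketch; a batter who has only struck out has no meaningful BABIP.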

There are three main factors that affect BABIP in the short run, and understanding each of them individually goes a long way toward helping fans interpret the statistic as a whole.

  • Defense: A batter hitting against an above average defense will likely struggle more to reach base safely. Fielders that can cover more ground or potentially dive and stop a hard hit ball will hurt a player’s BABIP in the short run.
  • Luck: A lucky player who gets bloop singles and finds the hole on slow rollers could potentially have a higher BABIP than a player who constantly hits the ball hard down the line, but right to the third baseman.
  • Talent: A talented batter who makes better contact will hit the ball harder and through the infield more often than less talented hitters. Although in the short run a talented batter may be hurt by strong defense or benefit from luck, with a large enough sample size the batter’s talent will prevail, and the player’s BABIP will likely reflect that talent much more strongly than it reflects defense or luck.

Many baseball fans today doubt the predictive ability of BABIP due to its apparent unpredictability at the professional level. The correlation of BABIP vs Runs Scored for every MLB team over the last fifteen seasons can be seen on the graph below.2 Each point on the graph represents a single team from a single season (e.g. the 2007 New York Yankees).

Major League Baseball BABIP vs MLBS Runs Scored.png

According to the data, one can somewhat predict a Major League team’s offensive success solely by looking at its BABIP for a season. The R² value of a graph (its coefficient of determination) measures how closely the data fall to the regression line (the blue line). The R² value of 0.19 for this graph means that about 19% of the variation in teams’ scoring output can be explained by their BABIP. This is evident from the loose scatter of points around the regression line (an R² value of 1 would put every point exactly on the line). So, while a higher BABIP usually goes with more runs scored, this is not always the case in the MLB.
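
To make the R² value concrete, here is a minimal Python sketch of how it can be computed for a straight-line fit of runs on BABIP. The function name and the five sample teams are made up for illustration; they are not the MLB data behind the graph, and this is not necessarily how the graph's value was produced.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-variable linear fit, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Made-up team BABIPs and season run totals, for illustration only.
team_babip = [0.285, 0.290, 0.300, 0.305, 0.310]
team_runs = [700, 650, 760, 690, 740]
print(round(r_squared(team_babip, team_runs), 2))
```

A value near 1 means the points hug the regression line; a value near 0 means BABIP says little about runs in that sample.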

However, BABIP is a much stronger indicator of offensive performance at the amateur level. The graph below represents the correlation between BABIP and Runs for more than 9,000 High School-level teams that scored a minimum of twenty games on GameChanger during the Spring 2017 season.3 The teams in red were 2017 High School State Champions (according to MaxPreps):

HSBABIP-1.png 

Although the relationship between BABIP and Runs in the MLB is positive, there is a much stronger correlation between BABIP and Runs per Plate Appearance at the amateur level.4 This can be seen in the tight cluster of points around the regression line, which rises toward the top right along with the line. The R² value of this graph is 0.491, reflecting a much tighter cluster than in the previous graph. This means that BABIP explains almost half of the variation in scoring output among teams at the High School level, more than double its explanatory power at the Major League level.

The stronger predictive ability at the amateur level is likely due to the fact that 13- to 18-year-olds hit Home Runs at a much lower rate than the pros do. MLB players hit one Home Run every 30.3 at-bats, accounting for nearly 45% of all runs scored.5 By comparison, players at the High School level hit only one Home Run every 212.6 at-bats. Since BABIP only accounts for balls in the field of play, it predicts offensive output much better when fewer runs are scored via Home Runs, as is the case at the High School level.

Below is a table displaying the top five GameChanger teams in BABIP and their corresponding percentile ranks in Runs Scored:

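BABIP Rank | Team Location (Town, State) | Team Nickname | BABIP | National Percentile in Runs Scored
-----------|-----------------------------|---------------|-------|-----------------------------------
1          | Castro Valley, CA           | Dodgers       | 0.675 | 91%
2          | Blue Springs, MO            | Shockers      | 0.633 | 98%
3          | Baton Rouge, LA             | Cardinals     | 0.633 | 98%
4          | Spreckels Park, CA          | Tigers        | 0.620 | 99%
5          | Blount, AL                  | Leopards      | 0.619 | 89%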

According to the table, each of the top-five teams in BABIP also had more success scoring runs than the average High School level team (three of them were in the top two percentiles for Runs Scored). In addition, all five of these teams scored more than 0.26 Runs per Plate Appearance, well above the median of 0.19 Runs per Plate Appearance. Since these teams average about 30 Plate Appearances per game, the top-five teams would score more than two additional runs per game compared with a team at the median.6
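
The runs-per-game estimate in the paragraph above is simple arithmetic, sketched here in Python. The variable names are illustrative; the three numbers are the ones quoted in the text.

```python
PLATE_APPEARANCES_PER_GAME = 30   # normalization used in the article
top_five_floor = 0.26             # runs per PA exceeded by all five teams
median_rate = 0.19                # median runs per PA

# Gap in runs per plate appearance, scaled up to a full game.
extra_runs_per_game = (top_five_floor - median_rate) * PLATE_APPEARANCES_PER_GAME
print(round(extra_runs_per_game, 1))  # 2.1
```

Because 0.26 is a floor for the top-five teams, 2.1 is a lower bound on the gap rather than an average.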

The prevailing theory for what BABIP truly measures is the ability of the batter to make strong contact. The graph below plots BABIP against a statistic called Isolated Power (ISO), which measures raw power by calculating the number of extra bases gained per At-Bat.7 Teams with higher ISO hit for extra bases at a higher rate because they make better contact with the ball more often. The formula and graph are shown below:

ISO = ( 2B + 2*3B + 3*HR ) ÷ AB

HSISO-2.png
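
As with BABIP, the ISO formula above is easy to sketch in Python. The function name and the sample stat line are hypothetical and serve only to show the arithmetic.

```python
def isolated_power(doubles, triples, home_runs, at_bats):
    """ISO = (2B + 2*3B + 3*HR) / AB: extra bases gained per at-bat."""
    if at_bats <= 0:
        return None  # undefined without at-bats
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


# Hypothetical stat line (made-up numbers): 10 doubles, 2 triples and
# 1 home run in 200 at-bats -> (10 + 4 + 3) / 200.
print(isolated_power(10, 2, 1, 200))  # 0.085
```

Singles contribute nothing to ISO, which is why it isolates power from plain batting average.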

According to the graph, BABIP is also a strong predictor of a team’s ability to hit for extra bases. An R² value of 0.26 indicates that BABIP explains more than 25% of the variation in teams’ Isolated Power. In comparison, BABIP explains less than 2% of ISO in the MLB. This strong linear trend at the High School level also suggests that there is a lot more to BABIP than luck. The majority of lucky hits in baseball are bloop singles, but these high-BABIP teams are also getting their fair share of extra-base hits, which points to an increased rate of quality contact.

BABIP can also tell a lot about an individual player’s ability to make strong contact. The following graph displays the correlation between BABIP and Runs Created per Plate Appearance for individual players with at least 20 Plate Appearances:8

PlayerBABs.png

According to the graph, BABIP is a great predictor of offensive success for hitters at the High School level, even with as few as 20 Plate Appearances (the R² value is 0.583). Teams and players with BABIPs above the median of 0.367 are consistently making quality contact with the ball and will likely continue to do so. On the other hand, those struggling with BABIP will need to work on swing mechanics and focus on squaring up the ball in order to improve BABIP and scoring output for their teams.
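
For readers who want to compute the player-level measure themselves, the Runs Created formula given in the notes, RC = (H+BB)*TB/(AB+BB), can be sketched as follows. Dividing by Plate Appearances to get a per-PA rate is an assumption about how the graph's axis was built, and the function names and sample stat line are hypothetical.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: RC = (H + BB) * TB / (AB + BB)."""
    opportunities = at_bats + walks
    if opportunities <= 0:
        return None  # undefined without at-bats or walks
    return (hits + walks) * total_bases / opportunities


def runs_created_per_pa(hits, walks, total_bases, at_bats, plate_appearances):
    """Runs Created divided by plate appearances (assumed per-PA scaling)."""
    rc = runs_created(hits, walks, total_bases, at_bats)
    if rc is None or plate_appearances <= 0:
        return None
    return rc / plate_appearances


# Hypothetical player (made-up numbers): 10 H, 5 BB, 14 TB, 30 AB, 36 PA.
# RC = 15 * 14 / 35 = 6.0, so RC per PA = 6.0 / 36.
print(round(runs_created_per_pa(10, 5, 14, 30, 36), 3))  # 0.167
```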

Overall, Batting Average on Balls in Play or BABIP should be used as a go-to metric for your team in the seasons to come. BABIP is a strong guideline to measure quality, consistent contact with the ball, resulting in maximized run production for the team.

From Michael Model of GameChanger

1 Note that Strikeouts are not included in the denominator because the batter did not make contact. Home Runs are not included either because the ball left the field of play.

2 There are a total of 450 data points, as all 30 teams have a point for each of the last fifteen seasons.

3 Data is from teams composed of players between the ages of 13 and 18, on High School and Travel/Select teams. Data was calculated through May 22, 2017.

4 Runs per Plate Appearance was used because season lengths vary greatly across the country, and this scale allows us to keep the best teams across the country in the data set.

5 Calculated using data from https://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2017

6 Normalized at 30 plate appearances per game.

7 Formula inspired by Weighted On Base Average from FanGraphs.com

8 Runs Created was created by Bill James as a way to put all hitters on an equal playing field. Any hitter can create runs at any time in any situation. The formula is RC = (H+BB)*TB/(AB+BB).
