The probability of winning and the effect of home-field advantage: the case of Major League Baseball.
Levernier, William ; Barilla, Anthony G.
ABSTRACT
This paper examines the factors that affect the probability of a
major league baseball team winning a game. The basic hypotheses of the
study are that home teams are more likely to win a game than visiting
teams, that teams that travel to arrive at a game are less likely to win
the game than teams that don't, and that teams having a strong
batting performance are more likely to win a game than teams having a
weak batting performance. To examine these issues, we estimate five
logit regressions from data for all 2,428 regular season games played during the 2004 season. We find that while the strength of a team's
batting performance does affect its probability of winning, travel does
not affect the likelihood of either the home team or visiting team
winning a game. The major finding of the paper, however, is that
contrary to the commonly held belief that a home-field advantage exists
in major league baseball games, home teams only have an advantage over
visiting teams in very close games. In games that are won by more than
one run, the likelihood of winning is roughly equal for home teams and
visiting teams.
INTRODUCTION
In major league baseball, like most other professional sports, the
conventional wisdom is that a home-field advantage exists. Birnbaum (2004, p. 972) reports that home teams have historically won about 54
percent of their games. The difference between a 54 percent winning
percentage and a 46 percent winning percentage is substantial since,
during a standard 162-game season, a team that wins 54 percent of its
games will accumulate about 13 more victories (0.08 × 162 = 12.96) than a team that wins 46 percent
of its games. A swing of that size during the course of a season often
makes the difference between a team going to the post-season playoffs
and not going to the playoffs. In the two most recent seasons, 2003 and
2004, the first place team won fewer than twelve more games than the
second place team in five of the six Major League Baseball divisions.
(1)
One reason the home team has the advantage in baseball is that it
bats last, which becomes a factor in one-run victories. If a
game enters the top of the last inning with the score tied, for example,
the manager of the visiting team doesn't know whether his strategy
should involve trying to score a single run, since he doesn't know
whether or not one run will ultimately be enough to win the game. If the
score is tied entering the bottom of the last inning, however, the
manager of the home team knows that a single run will be enough to win
the game, and he can therefore employ a strategy designed to score just
one run. Another possible reason that a home team has an advantage is
that the visiting team experiences travel-induced stress and fatigue.
Since the visiting team must travel to arrive at a game, it incurs the
inconveniences associated with travel, in terms of both the physical act
of traveling and the act of staying in an unfamiliar city. In some cases
the home team also incurs the inconvenience of travel. (2) If the home
team does travel, it is subjected to the same travel-induced
fatigue as the visiting team, but it does not experience the
discomfort of being away from the familiar surroundings of home. As
such, when both teams travel to a game the visiting team is more likely
than the home team to be adversely affected by the travel.
The primary purpose of this paper is to determine the effect that
home-field advantage has on the probability of a team winning a major
league baseball game played during the 2004 season. We also determine
the effect that team batting performance and travel have on the
probability of a team winning a game. Specifically, we will determine
whether a home-field advantage exists and, if so, whether it exists
generally or only in limited situations. To examine these issues we
develop and estimate a series of binary logit regressions where the
outcome of the game (i.e., win or lose) is the dependent variable.
In the next section we review the literature pertaining to the
home-field advantage in major league baseball and to the analysis of
factors affecting the run production of baseball teams. In the third
section we discuss the data and descriptive statistics and report the
probability of victory in various situations. In the fourth section we
describe the logit regressions. In the fifth section we report and
discuss the regression results. Finally, in the last section we present
a summary of our major findings and offer some concluding remarks,
including a suggestion for potential directions that future research on
the subject of the home-field advantage might take.
REVIEW OF THE LITERATURE
A relatively new and popular field that applies statistical models
and methodologies to baseball data is sabermetrics, which derives its
name from the Society for American Baseball Research (SABR), an
organization devoted to furthering the study of baseball. Birnbaum
(2004, p. 963) defines sabermetrics as "the science of answering
questions about baseball through the analysis of the statistical
evidence." It has also been defined by Bill James, the man who
popularized sabermetrics in the early 1980s in the initial versions of
the annual The Bill James Baseball Abstract, as "the search for
objective knowledge about baseball" (Grabiner).
The scholarly literature has examined several baseball related
issues. Lindsey (1963), in one of the earliest academic studies
pertaining to baseball performance, derives a formula that explains the
number of runs a team scores based on the various components of its
hitting production. Albert (1994) employs a Bayesian hierarchical model to
determine which game situations affect players' batting averages and finds
that several do: the pitch
count faced by the batter, facing a pitcher of the opposite arm, facing
a groundball pitcher, and playing in a home game. Albright (1993)
conducts a statistical analysis of hitting streaks among major league
batters during the 1987-1990 seasons and concludes that hitting streaks
happen at about the same rate as what would occur in a random model.
Gius and Hylan (1996), in a statistical study of the determinants of
baseball player salaries, use a fixed-effects multivariate regression
model to estimate player salaries during the 1965 to 1992 period. They
conclude that the bargaining power derived from free agency and salary
arbitration is a major determinant of a baseball player's salary.
In a study of the home-field advantage, Morong (2004) analyzes the
home-field advantage for each season from 1901-2003. He finds that, on
average, a home-field advantage exists but that the advantage has
gradually and slightly decreased over the century. The average yearly
difference between the proportion of games won by the home team and the
proportion of games won by the visiting team during the 1901 to 1950
period was .091, while the average yearly difference during the 1951 to
2002 period was .076.
Birnbaum (2004, p. 973) notes that the question of why the
home-field advantage exists is one of the largest unresolved issues in
sabermetric research. He postulates that there are several possible
reasons for the existence of the advantage: 1) the stress of travel
makes the visiting teams worse; 2) the support of the crowd lifts the
home team to perform better; 3) the home team gains an advantage from
batting last; and 4) the physical and psychological benefits associated
with players being more comfortable in their home city favor the home
team.
A team's likelihood of winning a game is positively related to
the number of runs it scores. A plethora of literature has attempted to
derive quantifiable measures that explain the run production of
particular players or particular teams. Lindsey (1963) estimates a
formula where the number of runs a team scores is a function of the four
components of its hitting production: singles, doubles, triples, and
home runs. (3) Lindsey's model is a forerunner to the modern Linear
Weights System that is often used in sabermetrics. These models estimate
runs scored as a linear function of the various aspects of a batter
getting on base and then advancing once he reaches base. In addition to
the four hit-related variables in the Lindsey model, factors such as
walks, hit by pitch, stolen bases, and caught stealing are also included
in the Linear Weights models (see Palmer and Thorn, 2004). The
underlying premise of the Linear Weights System is that teams that are
more successful at putting runners on base and advancing them will score
more runs. On average, over the course of a season, high-scoring teams
win more games than low-scoring teams, (4) and during a particular game
a team is more likely to win the game as its run production increases.
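Lindsey's formula, quoted in endnote 3, can be written as a short function. The sketch below is illustrative only: the function name and the sample hit counts are hypothetical, while the four weights are the ones given in the endnote.

```python
def lindsey_runs(singles: float, doubles: float, triples: float, home_runs: float) -> float:
    """Runs predicted by Lindsey's (1963) formula as quoted in endnote 3."""
    return 0.41 * singles + 0.82 * doubles + 1.06 * triples + 1.42 * home_runs


# Hypothetical game line: 6 singles, 2 doubles, 0 triples, 1 home run.
example_runs = lindsey_runs(6, 2, 0, 1)
```

The Linear Weights models extend the same idea by adding weighted terms for walks, hit by pitch, stolen bases, and caught stealing.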
THE DATA AND DESCRIPTIVE STATISTICS
The data used in this study are from all 2,428 games played during
the 2004 Major League Baseball (MLB) season. The data were obtained from
the box scores posted on the Major League Baseball website
(http://www.mlb.com). For each game, data on 14 hitting and base-running
related variables for each team were collected. (5) Data indicating
whether a team was the home team or the visiting team, and whether the
team traveled to arrive at the first game of a series, were also
collected.
Table 1 indicates that during the 2004 season home teams won 53.5
percent of the games played. Overall, home teams won 170 more games than
visiting teams. A detailed examination of Table 1 indicates that 26 of
the 30 teams won more games as the home team than as the visiting team;
seven teams won at least 10 more games as the home team than as the
visiting team; five teams had a winning record as the home team but a
losing record as the visiting team; and no team had a losing record as
the home team but a winning record as the visiting team.
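The home-field advantage column in Table 1 is the share of home games won minus the share of road games won. A minimal sketch of that calculation follows; the function name is hypothetical, and the records are copied from Table 1.

```python
def home_field_advantage(home_w: int, home_l: int, away_w: int, away_l: int) -> float:
    """Proportion of home games won minus proportion of road games won."""
    return home_w / (home_w + home_l) - away_w / (away_w + away_l)


# Records taken from Table 1.
boston = home_field_advantage(55, 26, 43, 38)
league = home_field_advantage(1299, 1129, 1129, 1299)

# Share of all 2,428 games won by the home team.
home_win_share = 1299 / (1299 + 1129)
```

Rounded to the precision used in the paper, these give .1481 for Boston, .0700 for all teams, and a home winning share of .535.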
Table 2 lists and defines the variables used in this study. Some
variables that aren't included in the regressions are listed
because they are used to calculate a variable that is included in the
regressions. Table 3 reports the means and standard deviations of the
variables included as explanatory variables in the regressions. They are
reported for the entire sample of 4,856 observations, as well as
separately for home teams, visiting teams, teams that traveled to the
first game of a series, and teams that did not travel to the first game
of a series. In comparing the means of the home teams to those of the
visiting teams, only five of the variables have a difference that is
statistically significant at the .05 level. Both OBP and SLG have a
larger mean for home teams than for visiting teams, while Singles, SO
and GIDP all have a larger mean for visiting teams. To the extent that
OBP and SLG promote scoring while SO and GIDP reduce scoring, these
differences suggest that home teams score more runs than visiting teams.
In comparing the means for traveling teams to those for non-traveling
teams in the first game of a series, only the means of Singles and SO
differ by an amount that is statistically significant at the .05 level.
Since Singles and strikeouts (SO) are relatively minor determinants of
runs, (6) and since the difference between the means is relatively
small, this suggests that the number of runs scored by traveling teams
is likely to be approximately equal to the number of runs scored by
non-traveling teams.
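The paper does not state which test was used to compare the means in Table 3. As one plausible reading, the sketch below computes a large-sample z statistic for a difference in means from the reported summary statistics. It treats the home and visiting samples as independent, which is an assumption, and the function name is hypothetical.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic for a difference in means, allowing unequal variances."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error


N_GAMES = 2428  # one home and one visiting observation per game

# Means and standard deviations from Table 3 (home vs. visitor).
z_strikeouts = two_sample_z(6.239, 2.713, N_GAMES, 6.869, 2.775, N_GAMES)
z_runs = two_sample_z(4.829, 3.122, N_GAMES, 4.798, 3.312, N_GAMES)

CRITICAL_05 = 1.96  # two-sided 5 percent critical value for a normal statistic
so_significant = abs(z_strikeouts) > CRITICAL_05
runs_significant = abs(z_runs) > CRITICAL_05
```

Under these assumptions the strikeout difference clears the .05 threshold while the difference in runs does not, which matches the pattern described above.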
A LOGIT MODEL TO ESTIMATE THE PROBABILITY OF A TEAM WINNING A GAME
To further examine the effect that the home-field advantage, team
batting performance, and travel have on the probability of a team
winning a game, a series of five logit regressions is estimated. The
dependent variable is a dummy variable that indicates whether a team
wins or loses a particular game. Several factors that are likely to
affect the probability of a team winning a game have already been
discussed. Additionally, an analysis of the 2004 win-loss record of home
teams reveals that home teams have a substantially higher probability of
winning a game when the run differential between the winning team and
losing team is one run than when the differential is more than one run.
(7) To account for this phenomenon, in addition to the previously
discussed factors the regressions also include as an explanatory
variable a dummy variable that indicates whether or not a game is won by
one run.
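The split described here can be checked directly from the counts in endnote 7. The snippet is a simple arithmetic sketch; the variable names are illustrative.

```python
# Counts reported in endnote 7 for the 2004 season.
one_run_games, one_run_home_wins = 639, 392
other_games, other_home_wins = 1789, 907

home_share_one_run = one_run_home_wins / one_run_games
home_share_other = other_home_wins / other_games

# The two groups should add up to the full season.
total_games = one_run_games + other_games
total_home_wins = one_run_home_wins + other_home_wins
```

The shares round to .613 and .507, and the totals recover the 2,428 games and 1,299 home wins shown in Table 1.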
The logit regressions estimated in this study are of the general
form,
(1) $\ln\left[\frac{P(\mathrm{WIN})}{1 - P(\mathrm{WIN})}\right] = \alpha + \beta X$
The logit regressions (8) are estimated using data from the 2,428
major league regular-season games that were played during the 2004
season. Since two teams participated in each game, this yields 4,856
observations. WIN is a dummy variable that takes a value of 1 if a team
wins the game and a value of 0 if it loses. X is a vector of variables
that are hypothesized to affect a team's probability of winning a
particular game. The $\alpha$ and $\beta$ terms represent the intercept
and slopes, respectively.
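The paper does not describe its estimation software. As a rough illustration of how a logit such as (1) is fit by maximum likelihood, the sketch below runs Newton-Raphson on a single regressor. The data are synthetic and made up for the example, and `fit_logit` is a hypothetical helper, not the authors' procedure.

```python
import math


def fit_logit(xs, ys, iterations=25):
    """Fit ln[p/(1-p)] = alpha + beta*x by Newton-Raphson (one regressor)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score with respect to alpha
            g1 += (y - p) * x    # score with respect to beta
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (alpha, beta) += information^{-1} * score
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# SYNTHETIC sample (not the 2004 data): 10 team-games at each run total 0..10.
wins_out_of_ten = [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
runs, win = [], []
for r, wins in enumerate(wins_out_of_ten):
    runs += [r] * 10
    win += [1] * wins + [0] * (10 - wins)

alpha_hat, beta_hat = fit_logit(runs, win)
```

Because the synthetic wins are symmetric around five runs, the fitted index is zero at five runs, so the fitted probability there is one half. With several regressors the same update applies with a larger score vector and information matrix.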
Rearranging (1), the probability of a team winning a randomly
selected game, P(WIN), is computed as,
(2) $P(\mathrm{WIN}) = \left(1 + e^{-(\alpha + \beta X)}\right)^{-1}$
Equation (2) allows one to determine the probability of a team
winning a game under various scenarios. For example, one can determine
the probability that a team will win a game if the game is won by one
run, if the team in question scores four runs, if the team is the home
team, and if the team did not travel to the game, by simply inserting
the appropriate values into the X vector. Along these lines, one can
determine the probability that a team will win a particular game for any
chosen scenario.
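A minimal sketch of this calculation follows. The coefficient values are placeholders chosen only for illustration, not the estimates reported in Table 4, and the ordering of the X vector (Runs, Home, Home*OneRun, Travel, Home*Travel) is an assumption made for the example.

```python
import math


def win_probability(alpha, betas, x):
    """Equation (2): P(WIN) = 1 / (1 + exp(-(alpha + beta . x)))."""
    index = alpha + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-index))


# PLACEHOLDER coefficients, not the paper's estimates.
# Order: Runs, Home, Home*OneRun, Travel, Home*Travel
alpha = -2.5
betas = (0.5, 0.1, 0.8, 0.0, 0.0)

# Scenario from the text: four runs, home team, one-run game, no travel.
p_home_one_run = win_probability(alpha, betas, (4, 1, 1, 0, 0))

# Same run total for a visiting team that did not travel.
p_visitor = win_probability(alpha, betas, (4, 0, 0, 0, 0))
```

Substituting the estimated coefficients from any of the five models for the placeholders would give that model's predicted probability for the chosen scenario.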
THE RESULTS OF THE LOGIT REGRESSIONS
Table 4 reports the results of five versions of equation (1). The
most basic version of equation (1), Model 1, includes only Runs and Home
as independent variables. In expanded versions of equation (1), two
interaction terms, Home*OneRun and Home*Travel are included as
explanatory variables. The Home*OneRun interaction term is included to
account for the possibility that the likelihood of the home team winning
a game is different in games won by one run than in games won by more
than one run. The Home*Travel interaction term is included to account
for the possibility that the likelihood of the home team winning a game
is different in games to which the home team traveled than in games
where it did not travel.
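To make the data layout concrete, the sketch below turns one game into the two team-level observations used in the regressions and builds the interaction terms. The class and function names are hypothetical, and which regressors enter which model should be read from Table 4 rather than from this example.

```python
from dataclasses import dataclass


@dataclass
class TeamGame:
    win: int      # 1 if the team won, 0 otherwise
    runs: int
    home: int     # 1 for the home team, 0 for the visitor
    one_run: int  # 1 if the game was decided by one run
    travel: int   # 1 if the team traveled to the first game of a series

    def regressors(self):
        """Runs, Home, Home*OneRun, Travel, Home*Travel."""
        return (self.runs, self.home, self.home * self.one_run,
                self.travel, self.home * self.travel)


def expand_game(home_runs, away_runs, home_traveled=0, away_traveled=0):
    """Turn one game into two team-level observations."""
    one_run = int(abs(home_runs - away_runs) == 1)
    home_win = int(home_runs > away_runs)
    return [
        TeamGame(home_win, home_runs, 1, one_run, home_traveled),
        TeamGame(1 - home_win, away_runs, 0, one_run, away_traveled),
    ]


# Hypothetical game: home team wins 4-3; the visitors traveled to the series opener.
home_obs, away_obs = expand_game(4, 3, home_traveled=0, away_traveled=1)
```

Both interaction terms are zero for every visiting-team observation, so they shift the index only for home teams.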
As expected, the regression results indicate that the number of
runs a team scores has a statistically significant effect on the
probability that it will win the game. Model 1 also supports the
hypothesis that a home-field advantage exists in major league baseball
games.
Beginning with Model 2, the Home*OneRun interaction term is
included in the regressions. When the Home*OneRun interaction term is
added to the model, the effect of the Home variable becomes
statistically insignificant. The interaction term, on the other hand, is
highly significant and positive, indicating that the probability of a
team winning a game is higher for the home team than the visiting team
only when the game is won by one run. This is an important finding since
it reveals that a home-field advantage exists only in games that are won
by a single run; in games that are won by more than one run there is no
home-field advantage.
The effect of travel is determined beginning with Model 3. The
regression results reveal that travel is statistically insignificant,
indicating that travel does not affect a team's probability of
winning a game. (9) The initial expectation was that traveling to the
first game of a series would adversely affect a team's likelihood
of winning the game, due to factors such as travel-induced stress and
fatigue. An interaction term between the home dummy variable and the
travel dummy variable, Home*Travel, is included to examine the
possibility that the effect of travel on the probability of winning a
game is different for home teams than for visiting teams. The regression
results reveal that this variable is also statistically insignificant,
which indicates that travel does not affect the home team's
probability of winning a game differently than the visiting team's
probability. (10)
The marginal effects on the probability of winning a game for each
of the variables included in the logit models are reported in Table 5.
The marginal effect from Model 1 indicates that a home team's
probability of winning a game is .0925 larger than that of a visiting
team when the run differential is ignored. When the run differential is
considered, though, the marginal results indicate that the probability
of a home team winning a game is about .22 higher when the game is won
by one run than when it is won by more than one run. The marginal
results also indicate that by scoring one run more than the average, a
team's probability of winning a game increases by about .14.
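The paper does not say at which values of the regressors the marginal effects in Table 5 are evaluated. The sketch below shows the two standard logit calculations under that caveat: a derivative for a continuous regressor and a discrete change for a dummy. The function names and the slope value are hypothetical.

```python
import math


def logistic(index):
    """Logistic function used in equation (2)."""
    return 1.0 / (1.0 + math.exp(-index))


def marginal_effect_continuous(beta, index):
    """dP/dx for a continuous regressor: beta * P * (1 - P)."""
    p = logistic(index)
    return beta * p * (1.0 - p)


def marginal_effect_dummy(beta, index_without):
    """Change in P when a 0/1 regressor switches from 0 to 1."""
    return logistic(index_without + beta) - logistic(index_without)


# PLACEHOLDER slope on Runs, not the paper's estimate.
beta_runs = 0.5
effect_at_even_odds = marginal_effect_continuous(beta_runs, index=0.0)
```

The derivative is largest where the predicted probability is one half and shrinks toward zero as the probability approaches 0 or 1, so a reported marginal effect always depends on the evaluation point.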
To determine the relationship between a team's probability of
winning a game and the number of runs it scores, we insert the
regression coefficients of Model 2 into equation (2) and solve. Table 6
reports the probability of a team winning a game based on the number of
runs it scores for three categories of teams: the home team in games won
by one run; the home team in games won by more than one run; and the
visiting team. Several interesting results emerge. In low scoring games
(1 or 2 runs), the probability that the home team wins the game is more
than twice as large as that of the visiting team if the game is won by
one run. In moderately low scoring games (3 or 4 runs), the probability
of the home team winning the game is at least 60 percent larger than for
the visiting team if the game is won by one run. In games where the
number of runs scored is slightly above the season average of 4.81 runs
(5 or 6 runs), the probability of the home team winning the game is at
least 25 percent larger than for the visiting team if the game is won by
one run. (11) In all cases, when the game is won by more than one run
the probability of a home team winning a game is only minimally higher
than that of the visiting team.
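The break-even run totals cited in endnote 11 follow directly from equation (2). The derivation below assumes Model 2 contains only Runs, Home, and Home*OneRun; the subscripted symbols $\beta_R$, $\beta_H$, and $\beta_{HO}$ are labels introduced here for those three slopes.

```latex
P(\mathrm{WIN}) = \tfrac{1}{2}
\;\Longleftrightarrow\;
\alpha + \beta X = 0
\;\Longrightarrow\;
\mathrm{Runs}^{*} =
-\,\frac{\alpha + \beta_{H}\,\mathrm{Home}
         + \beta_{HO}\,(\mathrm{Home}\times\mathrm{OneRun})}{\beta_{R}}
```

With $\beta_R > 0$, a positive $\beta_{HO}$ lowers $\mathrm{Runs}^{*}$ for home teams in one-run games, which is why that group reaches even odds at a smaller run total than the other two.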
The concept of the Linear Weights System (Palmer and Thorn, 2004)
is also incorporated into equation (1). The essence of the Linear
Weights System is that the number of runs a team scores in a game is
determined by its ability to get runners on base and by its ability to
advance the runners once they reach base. To incorporate this concept
into the model, two regressions, in which the Runs variable is replaced
with a set of variables that measure the ability of the team to get
runners on base and to advance the runners, are estimated.
The results of these regressions are reported in Table 4 and are
listed as Model 4 and Model 5. The variables that measure the
team's ability to get runners on base and to advance runners (i.e.,
performance variables) all have the expected effect. The results of the
variables related to the home-field advantage, the effect of travel, and
the effect of a game being won by one run are consistent with the
previous regressions. The performance variables that have a positive and
statistically significant effect on the probability of a team winning a
game are Singles, Extra, Home Runs, OBP, SLG, BBHBP, Net Steals, and SH.
The variables that have a negative and statistically significant effect
on the probability of a team winning a game are SO and GIDP. These
results suggest that teams that are more successful at getting runners
on base and then advancing the runners during a game are more likely to
win the game than teams that are less successful at doing so.
As in Models 2-3, Home is statistically insignificant in Models 4-5,
indicating that there is no home-field advantage, per se. (12) The
interactive term, Home*OneRun, is again positive and statistically
significant, indicating that the home team has an advantage over the
visiting team only in games that are won by one run; in games that are
won by more than one run there is no home-field advantage. Consistent
with the results of Model 3, the two travel-related variables are
statistically insignificant, indicating that travel does not affect the
probability of either the home team or visiting team winning a game.
The marginal effects of the variables in Models 4-5 are reported in
Table 5. The probability of a home team winning a game that is won by
one run is between .15 and .19 larger than that of the visiting team. This
result is not trivial, given that 639 of the 2,428 games (26.3% of the
games) played during the 2004 season were won by one run. A typical team
then played approximately 43 games that were won by only one run. If the
probability of the home team winning such games is between .15 and .19
higher than for the visiting team, it suggests that home teams would be
expected to win 24 or 25 of the 43 games while visiting teams would only
be expected to win 17 or 18 of the games.
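The arithmetic behind these figures can be sketched as follows. The split assumes that the home and visiting probabilities in a one-run game sum to one, which is an assumption of this example rather than a statement in the paper, and `expected_split` is a hypothetical helper.

```python
ONE_RUN_GAMES = 639
TOTAL_GAMES = 2428
TEAMS = 30

one_run_share = ONE_RUN_GAMES / TOTAL_GAMES

# Each game involves two teams, so team-level appearances are doubled.
one_run_games_per_team = 2 * ONE_RUN_GAMES / TEAMS


def expected_split(games, home_edge):
    """Expected (home, visitor) wins when P(home) - P(visitor) = home_edge."""
    p_home = (1 + home_edge) / 2  # assumes P(home) + P(visitor) = 1
    return games * p_home, games * (1 - p_home)


low_edge = expected_split(43, 0.15)
high_edge = expected_split(43, 0.19)
```

Under this assumption the expected home wins fall between roughly 24.7 and 25.6 of 43 games, in the neighborhood of the figures quoted above.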
SUMMARY AND CONCLUDING REMARKS
The primary purpose of this paper has been to expand sabermetric
knowledge by examining the effect of the home-field advantage on a
team's probability of winning a major league baseball game.
Although it is commonly believed that the home team has a substantial
advantage in major league baseball games, the home-field advantage is an
aspect of baseball that has largely been ignored in prior research.
Birnbaum (2004, p. 973) noted that although historically home teams have
won about 54 percent of their games, the question of why they enjoy such
an advantage "is one of the largest unresolved issues in
sabermetric research."
While a simple analysis of the data that focuses only on the number
of wins and losses by home teams and visiting teams supports the
contention of a home-field advantage, a more sophisticated analysis
indicates that a home-field advantage actually exists only in very close
games. In fact, the regression results in this paper indicate that there
is virtually no difference between the probability that the home team
will win a game and the probability that the visiting team will win the
game when the game is won by more than one run. Since about 26 percent
of the games played during the 2004 season were won by one run, the
results of this study imply that a home-field advantage exists in only
about one-quarter of major league baseball games. The results further
indicate that the home team advantage in games won by one run is much
larger than the eight-percentage point advantage implied by a simple
analysis of the data.
The major finding of this study is that the home-field advantage in
major league baseball is much more limited than is commonly believed.
Rather than existing across all types of games, the home-field advantage
exists only in very close games. In games that are decided by more than
one run, the home team and visiting team are equally likely to win the
game. This paper has furthered our understanding of the home-field
advantage and, as such, has begun to resolve what Birnbaum (2004, p. 973)
states is one of the largest unresolved issues in sabermetric research.
The next step in further resolving the issue should be to examine in
more detail differences in games won by one run and games won by more
than one run to see if these differences explain why the home team is so
much more successful in the games won by one run. This might involve an
inning-by-inning analysis of a sample of baseball games to determine if
some specific situation that gives the home team the advantage arises
predominantly in games won by one run. If so, then this would explain why
home teams are much more successful in games won by one run than in
games won by more than one run.
ENDNOTES
(1) There were only two cases where the first place team in a
division won at least 13 more games than the second place team. In 2004,
the first place St. Louis Cardinals won 13 more games than the second
place Houston Astros in the National League's Central division. In
2003, the first place San Francisco Giants won 15 more games than the
second place Los Angeles Dodgers in the National League's West
division (Major League Baseball website, http://www.mlb.com).
(2) In major league baseball, unlike most other professional
sports, two teams generally play several games against each other over
consecutive days. Typically, three or four games are played over a three
or four day period. Of the 2,428 games played during the season, 772
were the first scheduled game of a series. The home team traveled to 328
of these games.
(3) Lindsey's formula is Runs = .41(1B) + .82(2B) + 1.06(3B) +
1.42(HR), where Runs is the number of runs scored, 1B is the number of
one-base hits, 2B is the number of two-base hits, 3B is the number of
three-base hits, and HR is the number of home runs. The formula measures
the contribution of each type of hit to a team's run production.
(4) We ran a regression, using data from the 1990-2004 seasons on
all major league teams, where the number of games a team won during the
season was regressed on the number of runs it scored and the number of
runs it allowed during the season. The results indicate that the number
of runs a team scores during a season positively and significantly
affects the number of games it wins. The results of the regression are
not reported here.
(5) The 14 variables collected are at-bats, runs, hits, walks (BB),
strikeouts, two-base hits, three-base hits, home runs, sacrifice hits,
sacrifice flies, ground into double or triple plays, stolen bases,
caught stealing, and hit by pitch (HBP).
(6) We ran an OLS regression using the dataset utilized in this
study, with runs scored by a team as the dependent variable. We find
that an additional single in a game induces a team to score an extra .5
runs while an additional strikeout reduces the number of runs it scores
by .07. Since the differences in mean singles and strikeouts are .34 and
.56, respectively, this implies a difference of about .13 runs between a
team that travels and a team that does not travel, a relatively small
difference. The full results of the regression are not reported here.
(7) There were 639 games during the 2004 season where the run
difference between the winning and losing team was one run. The home
team won 392, or 61.3%, of these games. There were 1,789 games where the
run difference exceeded one run. The home team only won 907, or 50.7%,
of these games.
(8) Discussions of logit models are presented in Aldrich and Nelson
(1984), Greene (1997), Pindyck and Rubinfeld (1991), and Ghosh (1991).
(9) We also ran regressions where a series of categorical variables
related to the distance traveled to arrive at a game replaced the travel
dummy variable. Like the travel dummy variable, the effects of the
distance variables were statistically insignificant. The results of the
regressions are not reported here.
(10) To further examine whether or not travel affects the home
team, we ran regressions where the sample was home teams in the first
game of a series. There were 772 observations in these regressions.
These regressions correspond to Models 3-5 reported in Table 4, with the
Home, Home*OneRun, and Home*Travel variables excluded. Consistent with
the results reported in Table 4, the results indicate that travel does
not significantly affect the probability of the home team winning a
game. The results of the regression are not reported here.
(11) Based on equation (2), the results of Model 2 in Table 6
reveal that a .50 probability of winning a game occurs at 4.90 runs for
visiting teams, at 4.70 runs for home teams in a game won by more than
one run, and at 3.11 runs for home teams in a game won by one run. This
suggests that in games won by one run, home teams need fewer runs, on
average, to win than in games won by more than one run.
(12) We also ran two regressions comparable to Models 4 and 5
reported in Table 4 that included the Home variable but excluded the
Home*OneRun interaction term. The Home variable was statistically
significant and positive at the .05 level in both equations. The same
coefficients that were statistically significant in Table 4 were
statistically significant in these regressions. The results of these
regressions are not reported here.
REFERENCES
Albert, J. (1994). Exploring baseball hitting data: What about
those breakdown statistics? Journal of the American Statistical
Association, 89, 1066-1074.
Albright, S. C. (1993). A statistical analysis of hitting streaks
in baseball. Journal of the American Statistical Association, 88,
1175-1183.
Aldrich, J. H. & Nelson, F. D. (1984). Linear probability,
logit, and probit models. Beverly Hills, CA: Sage Publications.
Birnbaum, P. (2004). Sabermetrics. In J. Thorn et al. (Eds.),
Total baseball: The ultimate baseball encyclopedia (8th ed.) (pp.
963-975). Wilmington, DE: Sport Media Publishing.
Ghosh, S. K. (1991). Econometrics: Theory and applications.
Englewood Cliffs, NJ: Prentice Hall.
Gius, M. P. & Hylan, T. R. (1996). An interperiod analysis of
the salary impact of structural changes in major league baseball:
Evidence from panel data. In J. Fizel, E. Gustafson, & L. Hadley
(Eds.), Baseball Economics: Current Research. Westport, CT: Praeger.
Grabiner, D. (n.d.). The sabermetric manifesto. Retrieved on January
17, 2005 from http://www.baseball1.com/bbdata/grabiner/manifesto.html
Greene, W. H. (1997). Econometric analysis. Upper Saddle River, NJ:
Prentice Hall.
Lindsey, G. (1963). An investigation of strategies in baseball.
Operations Research, 11, 447-501.
Morong, C. (2004). Historical trends in home-field advantage. The
Baseball Research Journal, 32, 100-102.
Palmer, P. & Thorn, J. (2004). Linear weights. In J. Thorn
et al. (Eds.), Total baseball: The ultimate baseball encyclopedia (8th
ed.) (pp. 976-979). Wilmington, DE: Sport Media Publishing.
Pindyck, R. S. & Rubinfeld, D. L. (1991). Econometric models
and economic forecasts. New York, NY: McGraw-Hill.
William Levernier, Georgia Southern University
Anthony G. Barilla, Georgia Southern University
Table 1: Home Wins, Home Losses, Visiting Wins, Visiting Losses, and
Home-Field Advantage
Team             Home W   Home L   Visiting W   Visiting L   HF Adv
Anaheim              45       36           47           34   -.0247
Arizona              29       52           22           59    .0864
Atlanta              49       32           47           34    .0247
Baltimore            38       43           40           41   -.0247
Boston               55       26           43           38    .1481
Chicago (NL)         45       37           44           36   -.0012
Chicago (AL)         46       35           37           44    .1111
Cincinnati           40       41           36           45    .0494
Cleveland            44       37           36           45    .0988
Colorado             38       43           30           51    .0988
Detroit              38       43           34           47    .0494
Florida              42       38           41           41    .0250
Houston              48       33           44           37    .0494
Kansas City          33       47           25           57    .1076
Los Angeles          49       32           44           37    .0617
Milwaukee            36       45           31           49    .0569
Minnesota            49       32           43           38    .0741
Montreal             35       45           32           50    .0473
New York (NL)        38       43           33           48    .0617
New York (AL)        57       24           44           37    .1605
Oakland              52       29           39           42    .1605
Philadelphia         42       39           44           37   -.0247
Pittsburgh           39       41           33           48    .0801
St. Louis            53       28           52           29    .0123
San Diego            42       39           45           36   -.0370
San Francisco        47       35           44           36    .0232
Seattle              38       44           25           55    .1509
Tampa Bay            41       39           29           52    .1545
Texas                51       30           38           43    .1605
Toronto              40       41           27           53    .1563
All Teams          1299     1129         1129         1299    .0700
Notes: The home-field advantage is the difference between the
proportion of games won as the home team and the proportion of games
won as the visiting team.
Table 2: Variable Definitions
Variable Definition
Runs The number of runs scored by the team
At-Bats The number of times the team's hitters officially
batted.
Hits The team's number of hits
Singles The team's number of one-base hits
Extra The team's combined number of two-base hits and
three-base hits
HR The number of home runs hit by the team
BBHBP The number a times the team's batters reach base on
a walk or on a hit-by-pitch
SB The team's number of stolen bases
CS The number of times a team's runners are caught
stealing
Net Steals The team's number of stolen bases minus its number
of caught stealing
GIDP The number of times the team grounded into a double or
triple play
SH The number of times the team advanced a runner with
a sacrifice bunt
SF The number of times the team scored a run with a
sacrifice fly
OBP (1) The team's on-base-percentage
Total Bases (2) The team's number of total bases
SLG (3) The team's slugging percentage
Home A dummy variable that takes a value of 1 if the team
is the home team, 0 if not
OneRun A variable that takes a value of 1 if the run
differential in the game is 1 run, 0 if not
Home * OneRun An interactive term, Home multiplied by OneRun
Travel A variable that takes a value of 1 if the game is
the first scheduled game of a series and the team
had to travel to arrive at the game, 0 if not
Home*Travel An interactive term, Home multiplied by Travel
Notes: (1) OBP is calculated as (Hits + BBHBP)/(At-Bats + BBHBP + SF)
(2) Total Bases is calculated as (Hits + Two-base hits + (Three-base hits * 2) + (Home Runs * 3))
(3) SLG is calculated as (Total Bases)/(At-Bats + SF)
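The three derived variables in the notes to Table 2 can be sketched as follows. The formulas follow the notes as written, including the At-Bats + SF denominator for SLG. The function names and the single-game batting line are hypothetical and are used only for illustration; they are not drawn from the data set.

```python
def total_bases(hits, doubles, triples, home_runs):
    # Every hit counts one base; doubles, triples, and home runs
    # add one, two, and three further bases respectively.
    return hits + doubles + 2 * triples + 3 * home_runs


def on_base_percentage(hits, bbhbp, at_bats, sf):
    return (hits + bbhbp) / (at_bats + bbhbp + sf)


def slugging_percentage(tb, at_bats, sf):
    # Denominator as given in note (3) of Table 2.
    return tb / (at_bats + sf)


# Hypothetical single-game line (illustrative values only).
at_bats, hits, doubles, triples, home_runs, bbhbp, sf = 34, 9, 2, 0, 1, 4, 1

tb = total_bases(hits, doubles, triples, home_runs)
print("Total Bases:", tb)
print("OBP:", round(on_base_percentage(hits, bbhbp, at_bats, sf), 3))
print("SLG:", round(slugging_percentage(tb, at_bats, sf), 3))
```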
Table 3: Means and Standard Deviations of Selected Variables by Team
Classification
Variable All Home Visiting Travel Non-Travel
Runs 4.814 4.829 4.798 4.913 4.837
(3.218) (3.122) (3.312) (3.252) (3.139)
Singles (a, b) 6.024 5.925 6.124 6.124 5.780
(2.729) (2.671) (2.783) (2.746) (2.503)
Extra 2.022 2.002 2.041 2.023 1.944
(1.514) (1.503) (1.525) (1.512) (1.532)
HR 1.123 1.117 1.129 1.184 1.192
(1.133) (1.117) (1.150) (1.178) (1.178)
BBHBP 3.722 3.785 3.658 3.724 3.788
(2.275) (2.295) (2.255) (2.206) (2.461)
SO (a, b) 6.554 6.239 6.869 6.718 6.156
(2.762) (2.713) (2.775) (2.841) (2.815)
Net Steals .307 .313 .300 .308 .359
(.948) (.941) (.956) (.960) (.953)
GIDP (a) .780 .747 .813 .783 .817
(.858) (.845) (.870) (.827) (.878)
SH .356 .360 .353 .361 .323
(.613) (.615) (.611) (.623) (.579)
OBP (a) .328 .334 .322 .327 .330
(.083) (.084) (.081) (.083) (.086)
SLG (a) .419 .427 .411 .421 .429
(.156) (.159) (.153) (.158) (.163)
Total Bases 14.742 14.590 14.895 15.080 14.655
(6.442) (6.202) (6.672) (6.662) (6.357)
Observations 4856 2428 2428 1095 449
Notes: Standard deviations are shown in parentheses.
(a) indicates there is a statistically significant difference at the
.05 level between the mean value of home teams and visiting teams
(b) indicates there is a statistically significant difference at the
.05 level between the mean value of traveling teams and non-traveling
teams
The team classifications are defined as follows:
All includes all teams in all games played.
Home includes only the home team in all games played.
Visiting includes only the visiting team in all games played.
Travel includes the traveling team(s) in the first scheduled game of a
series.
Non-Travel includes the non-traveling team, if any, in the first
scheduled game of a series.
Table 4: Results of Logit Regressions with Win as the Dependent
Variable, Expanded Model
Variable Model 1 Model 2 Model 3
Runs .5436 (a) .5538 (a) .5538 (a)
(32.862) (33.060) (33.055)
Singles
Extra
Home Runs
OBP
SLG
BBHBP
SO
Net Steals
GIDP
SH
Home .3715 (a) .1115 .1205
(5.258) (1.428) (1.376)
Home * OneRun .8816 (a) .8811 (a)
(7.994) (7.988)
Travel .0072
(.066)
Home*Travel -.0575
(.313)
Observations 4856 4856 4856
Log Likelihood -2426.30 -2393.55 -2393.49
Num. Correct (b) 3645 3668 3668
Variable Model 4 Model 5
Runs
Singles .2047 (a)
(15.180)
Extra .3365 (a)
(14.087)
Home Runs .6358 (a)
(18.839)
OBP 11.4498 (a)
(16.523)
SLG 4.5812 (a)
(13.247)
BBHBP .1999 (a)
(12.406)
SO -.1068 (a)
(8.314)
Net Steals .1874 (a) .2033 (a)
(5.195) (5.488)
GIDP -.3019 (a) -.3867 (a)
(7.309) (9.102)
SH .3498 (a) .2664 (a)
(6.060) (4.469)
Home .1454 -.0586
(1.756) (.681)
Home * OneRun .5872 (a) .7768 (a)
(5.459) (6.906)
Travel .0005 .0106
(.005) (.101)
Home*Travel -.0214 -.0175
(.124) (.096)
Observations 4856 4856
Log Likelihood -2635.29 -2472.04
Num. Correct (b) 3520 3612
Notes
The absolute values of the t-statistics are shown in parentheses.
The regressions were run with a constant term, the results of which are
not reported here.
(a) denotes statistically significant at the .05 level or higher
(b) denotes the number of correct predictions. If the predicted
probability for a team exceeds .5, the team is the predicted winner of
the game.
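Dividing each model's number of correct predictions by the 4,856 team-game observations gives the share of outcomes that the .5 cutoff rule classifies correctly. The sketch below only restates figures from Table 4; the variable names are hypothetical.

```python
OBSERVATIONS = 4856  # team-game observations in every model (Table 4)

# "Num. Correct" row of Table 4.
num_correct = {
    "Model 1": 3645,
    "Model 2": 3668,
    "Model 3": 3668,
    "Model 4": 3520,
    "Model 5": 3612,
}

# Share of team-games whose outcome each model predicts correctly.
share_correct = {m: n / OBSERVATIONS for m, n in num_correct.items()}

for model, share in share_correct.items():
    print(f"{model}: {share:.3f}")
```

By this measure the models that include Runs classify roughly three of every four team-games correctly, and adding the travel variables in Model 3 leaves the count unchanged from Model 2.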
Table 5: Marginal Effects on the Probability of Win=1
Variable Model 1 Model 2 Model 3
Runs .1353 (a) .1379 (a) .1379 (a)
Singles
Extra
Home Runs
OBP * 100
SLG * 100
BBHBP
SO
Net Steals
GIDP
SH
Home .0925 (a) .0278 .0300
Home * OneRun .2196 (a) .2194 (a)
Travel .0018
Home * Travel -.0143
Variable Model 4 Model 5
Runs
Singles .0511 (a)
Extra .0841 (a)
Home Runs .1589 (a)
OBP * 100 .0286 (a)
SLG * 100 .0115 (a)
BBHBP .0500 (a)
SO -.0267 (a)
Net Steals .0468 (a) .0508 (a)
GIDP -.0754 (a) -.0967 (a)
SH .0874 (a) .0666 (a)
Home .0363 -.0146
Home * OneRun .1467 (a) .1942 (a)
Travel .0001 .0027
Home * Travel -.0054 -.0044
Notes: (a) denotes statistically significant at the .05 level or higher
in the logit regression
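In a logit model the marginal effect of a variable is its coefficient multiplied by P(1 - P), where P is the predicted probability at the point of evaluation. The tables do not state the evaluation point, so the sketch below makes no assumption about it. It simply divides each Model 2 marginal effect in Table 5 by the matching coefficient in Table 4 to see what scale factor the reported figures imply.

```python
# Model 2 coefficients (Table 4) and marginal effects (Table 5).
coefficients = {"Runs": 0.5538, "Home": 0.1115, "Home * OneRun": 0.8816}
marginal_effects = {"Runs": 0.1379, "Home": 0.0278, "Home * OneRun": 0.2196}

# For a logit, marginal effect = coefficient * P * (1 - P), so the ratio
# recovers the implied P * (1 - P), which can never exceed 0.25.
implied_scale = {
    name: marginal_effects[name] / coefficients[name] for name in coefficients
}

for name, scale in implied_scale.items():
    print(f"{name}: implied P(1 - P) = {scale:.4f}")
```

The three ratios are all close to .249, just under the maximum of .25 that occurs at P = .5. That is what one would expect if the effects were evaluated near a win probability of one half, although this reading is an inference from the reported numbers rather than something the tables state.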
Table 6: Probability of Winning a Game, by Runs and Team Category,
Model 2
Runs Home, 1RD Home, 2RD Visiting
1 .2371 .1140 .1032
2 .3510 .1830 .1669
3 .4848 .2804 .2584
4 .6208 .4040 .3775
5 .7401 .5412 .5134
6 .8321 .6723 .6473
7 .8961 .7812 .7615
8 .9375 .8613 .8475
9 .9631 .9153 .9063
10 .9785 .9495 .9439
11 .9875 .9703 .9670
12 .9928 .9827 .9807
13 .9958 .9900 .9888
14 .9976 .9942 .9936
15 .9986 .9967 .9963
Notes:
Home, 1RD denotes the home team in a game where the run differential is
one run.
Home, 2RD denotes the home team in a game where the run differential is
two or more runs.
Visiting denotes the visiting team in a game, without respect to the
run differential.
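The entries in Table 6 can be approximately reproduced from the Model 2 coefficients in Table 4 by applying the logistic function. The constant term is not reported in the paper, so the sketch below backs it out from the first Visiting entry of Table 6 (.1032 at one run). That inferred constant is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

# Model 2 coefficients from Table 4.
B_RUNS, B_HOME, B_HOME_ONERUN = 0.5538, 0.1115, 0.8816


def log_odds(p):
    return math.log(p / (1.0 - p))


# ASSUMPTION: the constant is not reported; it is inferred here from the
# Table 6 entry for a visiting team scoring one run (probability .1032).
CONSTANT = log_odds(0.1032) - B_RUNS * 1


def win_probability(runs, home, one_run):
    """Logistic win probability; home and one_run are 0/1 indicators."""
    index = (CONSTANT + B_RUNS * runs + B_HOME * home
             + B_HOME_ONERUN * home * one_run)
    return 1.0 / (1.0 + math.exp(-index))


for runs in (1, 5, 10):
    print(runs,
          round(win_probability(runs, 1, 1), 4),   # Home, 1RD
          round(win_probability(runs, 1, 0), 4),   # Home, 2RD
          round(win_probability(runs, 0, 0), 4))   # Visiting
```

Because the constant is recovered from a probability rounded to four decimals, the computed values can differ from the published table in the fourth decimal place.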