Article information

Title: The probability of winning and the effect of home-field advantage: the case of Major League Baseball.
Authors: Levernier, William; Barilla, Anthony G.
Journal: Academy of Information and Management Sciences Journal
Print ISSN: 1524-7252
Year of publication: 2006
Issue: July
Language: English
Publisher: The DreamCatchers Group, LLC
Abstract: This paper examines the factors that affect the probability of a major league baseball team winning a game. The basic hypotheses of the study are that home teams are more likely to win a game than visiting teams, that teams that travel to arrive at a game are less likely to win the game than teams that don't, and that teams having a strong batting performance are more likely to win a game than teams having a weak batting performance. To examine these issues, we estimate five logit regressions from data for all 2,428 regular season games played during the 2004 season. We find that while the strength of a team's batting performance does affect its probability of winning, travel does not affect the likelihood of either the home team or visiting team winning a game. The major finding of the paper, however, is that contrary to the commonly held belief that a home-field advantage exists in major league baseball games, home teams only have an advantage over visiting teams in very close games. In games that are won by more than one run, the likelihood of winning is roughly equal for home teams and visiting teams.
Keywords: Baseball (Professional); Baseball teams; Professional baseball

INTRODUCTION

In major league baseball, like most other professional sports, the conventional wisdom is that a home-field advantage exists. Birnbaum (2004, p. 972) reports that home teams have historically won about 54 percent of their games. The difference between a 54 percent winning percentage and a 46 percent winning percentage is substantial since, during a standard 162-game season, a team that wins 54 percent of its games will accumulate 12 more victories than a team that wins 46 percent of its games. Twelve additional wins during the course of a season often make the difference between a team going to the post-season playoffs and not going to the playoffs. In the two most recent seasons, 2003 and 2004, the first place team won fewer than twelve more games than the second place team in five of the six Major League Baseball divisions. (1)

One reason the home team has the advantage in baseball is the fact that they bat last, which becomes a factor in one-run victories. If a game enters the top of the last inning with the score tied, for example, the manager of the visiting team doesn't know whether his strategy should involve trying to score a single run, since he doesn't know whether or not one run will ultimately be enough to win the game. If the score is tied entering the bottom of the last inning, however, the manager of the home team knows that a single run will be enough to win the game, and he can therefore employ a strategy designed to score just one run. Another possible reason that a home team has an advantage is that the visiting team experiences travel-induced stress and fatigue. Since the visiting team must travel to arrive at a game, it incurs the inconveniences associated with travel, in terms of both the physical act of traveling and the act of staying in an unfamiliar city. In some cases the home team also incurs the inconvenience of travel. (2) If the home team does travel, they would be subjected to the same travel-induced fatigue as the visiting team, but they would not experience the discomfort of being away from the familiar surroundings of home. As such, when both teams travel to a game the visiting team is more likely than the home team to be adversely affected by the travel.

The primary purpose of this paper is to determine the effect that home-field advantage has on the probability of a team winning a major league baseball game played during the 2004 season. We also determine the effect that team batting performance and travel have on the probability of a team winning a game. Specifically, we will determine whether a home-field advantage exists and, if so, whether it exists generally or only in limited situations. To examine these issues we develop and estimate a series of binary logit regressions where the outcome of the game (i.e., win or lose) is the dependent variable.

In the next section we review the literature pertaining to the home-field advantage in major league baseball and to the analysis of factors affecting the run production of baseball teams. In the third section we discuss the data and descriptive statistics and report the probability of victory in various situations. In the fourth section we describe the logit regressions. In the fifth section we report and discuss the regression results. Finally, in the last section we present a summary of our major findings and offer some concluding remarks, including a suggestion for potential directions that future research on the subject of the home-field advantage might take.

REVIEW OF THE LITERATURE

A relatively new and popular field that applies statistical models and methodologies to baseball data is sabermetrics, which derives its name from the Society for American Baseball Research (SABR), an organization devoted to furthering the study of baseball. Birnbaum (2004, p. 963) defines sabermetrics as "the science of answering questions about baseball through the analysis of the statistical evidence." It has also been defined by Bill James, the man who popularized sabermetrics in the early 1980s in the initial versions of the annual The Bill James Baseball Abstract, as "the search for objective knowledge about baseball" (Grabiner, n.d.).

The scholarly literature has examined several baseball related issues. Lindsey (1963), in one of the earliest academic studies pertaining to baseball performance, derives a formula that explains the number of runs a team scores based on the various components of its hitting production. Albert (1994) employs a Bayesian hierarchical model to determine which game-situations affect players' batting average and determines that several situations affect batting average: the pitch count faced by the batter, facing a pitcher of the opposite arm, facing a groundball pitcher, and playing in a home game. Albright (1993) conducts a statistical analysis of hitting streaks among major league batters during the 1987-1990 seasons and concludes that hitting streaks happen at about the same rate as what would occur in a random model. Gius and Hylan (1996), in a statistical study of the determinants of baseball player salaries, use a fixed-effects multivariate regression model to estimate player salaries during the 1965 to 1992 period. They conclude that the bargaining power derived from free agency and salary arbitration is a major determinant of a baseball player's salary.

In a study of the home-field advantage, Morong (2004) analyzes the home-field advantage for each season from 1901-2003. He finds that, on average, a home-field advantage exists but that the advantage has gradually and slightly decreased over the century. The average yearly difference between the proportion of games won by the home team and the proportion of games won by the visiting team during the 1901 to 1950 period was .091, while the average yearly difference during the 1951 to 2002 period was .076.

Birnbaum (2004, p. 973) notes that the question of why the home-field advantage exists is one of the largest unresolved issues in sabermetric research. He postulates that there are several possible reasons for the existence of the advantage: 1) the stress of travel makes the visiting teams worse; 2) the support of the crowd lifts the home team to perform better; 3) the home team gains an advantage from batting last; and 4) the physical and psychological benefits associated with players being more comfortable in their home city favor the home team.

A team's likelihood of winning a game is positively related to the number of runs it scores. A plethora of literature has attempted to derive quantifiable measures that explain the run production of particular players or particular teams. Lindsey (1963) estimates a formula where the number of runs a team scores is a function of the four components of its hitting production: singles, doubles, triples, and home runs. (3) Lindsey's model is a forerunner to the modern Linear Weights System that is often used in sabermetrics. These models estimate runs scored as a linear function of the various aspects of a batter getting on base and then advancing once he reaches base. In addition to the four hit-related variables in the Lindsey model, factors such as walks, hit by pitch, stolen bases, and caught stealing are also included in the Linear Weights models (see Palmer and Thorn, 2004). The underlying premise of the Linear Weights System is that teams that are more successful at putting runners on base and advancing them will score more runs. On average, over the course of a season, high-scoring teams win more games than low-scoring teams, (4) and during a particular game a team is more likely to win the game as its run production increases.
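Lindsey's formula, quoted in endnote (3), gives a concrete sense of how such run-estimation models work. The following Python sketch applies those four weights to a batting line; the function name and the example hit counts are illustrative assumptions, not data from the study.

```python
def lindsey_runs(singles, doubles, triples, home_runs):
    """Estimate runs from hit counts using the weights quoted in endnote (3)."""
    return 0.41 * singles + 0.82 * doubles + 1.06 * triples + 1.42 * home_runs


# Hypothetical game line (not from the paper): 6 singles, 2 doubles, 0 triples, 1 home run.
estimated = lindsey_runs(6, 2, 0, 1)
print(round(estimated, 2))
```

Because the model is linear, each additional hit of a given type adds its weight to the estimate regardless of what else the team did, which is the simplifying assumption that the later Linear Weights models extend with walks, steals, and similar events.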

THE DATA AND DESCRIPTIVE STATISTICS

The data used in this study are from all 2,428 games played during the 2004 Major League Baseball (MLB) season. The data were obtained from the box scores posted on the Major League Baseball website (http://www.mlb.com). For each game, data on 14 hitting and base-running related variables for each team were collected. (5) Data indicating whether a team was the home team or the visiting team, and whether the team traveled to arrive at the first game of a series, were also collected.

Table 1 indicates that during the 2004 season home teams won 53.5 percent of the games played. Overall, home teams won 170 more games than visiting teams. A detailed examination of Table 1 indicates that 26 of the 30 teams won more games as the home team than as the visiting team; seven teams won at least 10 more games as the home team than as the visiting team; five teams had a winning record as the home team but a losing record as the visiting team; and no team had a losing record as the home team but a winning record as the visiting team.
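The home-field advantage column of Table 1 can be recomputed directly from the win-loss records. The short Python sketch below does so for three rows copied from Table 1; the function and variable names are our own and are not part of the original study.

```python
def home_field_advantage(home_w, home_l, visiting_w, visiting_l):
    """Proportion of games won at home minus proportion won as the visitor."""
    home_pct = home_w / (home_w + home_l)
    visiting_pct = visiting_w / (visiting_w + visiting_l)
    return home_pct - visiting_pct


# Rows copied from Table 1: (home W, home L, visiting W, visiting L).
records = {
    "Boston": (55, 26, 43, 38),
    "Chicago (NL)": (45, 37, 44, 36),
    "All Teams": (1299, 1129, 1129, 1299),
}

for team, record in records.items():
    print(f"{team}: {home_field_advantage(*record):+.4f}")
```

The Chicago (NL) row shows why the measure uses proportions rather than raw wins: the team won one more game at home than on the road, yet its advantage is slightly negative because it played 82 home games and only 80 road games.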

Table 2 lists and defines the variables used in this study. Some variables that aren't included in the regressions are listed because they are used to calculate a variable that is included in the regressions. Table 3 reports the means and standard deviations of the variables included as explanatory variables in the regressions. They are reported for the entire sample of 4,856 observations, as well as separately for home teams, visiting teams, teams that traveled to the first game of a series, and teams that did not travel to the first game of a series. In comparing the means of the home teams to those of the visiting teams, only five of the variables have a difference that is statistically significant at the .05 level. Both OBP and SLG have a larger mean for home teams than for visiting teams, while Singles, SO and GIDP all have a larger mean for visiting teams. To the extent that OBP and SLG promote scoring while SO and GIDP reduce scoring, these differences suggest that home teams score more runs than visiting teams. In comparing the means for traveling teams to those for non-traveling teams in the first game of a series, only the means of Singles and SO differ by an amount that is statistically significant at the .05 level. Since Singles and strikeouts (SO) are relatively minor determinants of runs, (6) and since the difference between the means is relatively small, this suggests that the number of runs scored by traveling teams is likely to be approximately equal to the number of runs scored by non-traveling teams.

A LOGIT MODEL TO ESTIMATE THE PROBABILITY OF A TEAM WINNING A GAME

To further examine the effect that the home-field advantage, team batting performance, and travel have on the probability of a team winning a game, a series of five logit regressions is estimated. The dependent variable is a dummy variable that indicates whether a team wins or loses a particular game. Several factors that are likely to affect the probability of a team winning a game have already been discussed. Additionally, an analysis of the 2004 win-loss record of home teams reveals that home teams have a substantially higher probability of winning a game when the run differential between the winning team and losing team is one run than when the differential is more than one run. (7) To account for this phenomenon, in addition to the previously discussed factors the regressions also include as an explanatory variable a dummy variable that indicates whether or not a game is won by one run.

The logit regressions estimated in this study are of the general form,

(1) $\ln\left[\dfrac{P(\mathrm{WIN})}{1 - P(\mathrm{WIN})}\right] = \alpha + \beta X$

The logit regressions (8) are estimated using data from the 2,428 major league regular-season games that were played during the 2004 season. Since two teams participated in each game, this yields 4,856 observations. WIN is a dummy variable that takes a value of 1 if a team wins the game and a value of 0 if it loses. X is a vector of variables that are hypothesized to affect a team's probability of winning a particular game. The $\alpha$ and $\beta$ terms represent the intercept and slopes, respectively.

Rearranging (1), the probability of a team winning a randomly selected game, P(WIN), is computed as,

(2) $P(\mathrm{WIN}) = \left(1 + e^{-(\alpha + \beta X)}\right)^{-1}$
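For readers who want the intermediate algebra, the rearrangement from (1) to (2) takes three steps. The shorthand $P = P(\mathrm{WIN})$ and $z = \alpha + \beta X$ is introduced only for this sketch and does not appear in the original.

```latex
\begin{align*}
\ln\left[\frac{P}{1 - P}\right] &= z
  && \text{equation (1)} \\
\frac{P}{1 - P} &= e^{z}
  && \text{exponentiate both sides} \\
P &= \frac{e^{z}}{1 + e^{z}} = \left(1 + e^{-z}\right)^{-1}
  && \text{solve for } P \text{, giving equation (2)}
\end{align*}
```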

Equation (2) allows one to determine the probability of a team winning a game under various scenarios. For example, one can determine the probability that a team will win a game if the game is won by one run, if the team in question scores four runs, if the team is the home team, and if the team did not travel to the game, by simply inserting the appropriate values into the X vector. Along these lines, one can determine the probability that a team will win a particular game for any chosen scenario.
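As a minimal sketch of this procedure, the Python snippet below evaluates equation (2) for the scenario just described. The coefficient values are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 4, and the function name and dictionary layout are likewise our own assumptions.

```python
import math


def win_probability(alpha, betas, x):
    """Equation (2): P(WIN) = 1 / (1 + exp(-(alpha + beta . x)))."""
    z = alpha + sum(betas[name] * x[name] for name in betas)
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for illustration only; NOT the Table 4 estimates.
alpha = -2.7
betas = {"Runs": 0.55, "Home": 0.05, "Home*OneRun": 0.9, "Travel": 0.0}

# Scenario from the text: the home team scores four runs in a game won by
# one run, and it did not travel to the game.
scenario = {"Runs": 4, "Home": 1, "Home*OneRun": 1, "Travel": 0}
print(round(win_probability(alpha, betas, scenario), 3))
```

Changing a single entry of `scenario` (for example, setting `Home` and `Home*OneRun` to 0) gives the corresponding probability for the visiting team under the same assumed coefficients.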

THE RESULTS OF THE LOGIT REGRESSIONS

Table 4 reports the results of five versions of equation (1). The most basic version of equation (1), Model 1, includes only Runs and Home as independent variables. In expanded versions of equation (1), two interaction terms, Home*OneRun and Home*Travel, are included as explanatory variables. The Home*OneRun interaction term is included to account for the possibility that the likelihood of the home team winning a game is different in games won by one run than in games won by more than one run. The Home*Travel interaction term is included to account for the possibility that the likelihood of the home team winning a game is different in games to which the home team traveled than in games where it did not travel.

As expected, the regression results indicate that the number of runs a team scores has a statistically significant effect on the probability that it will win the game. Model 1 also supports the hypothesis that a home-field advantage exists in major league baseball games.

Beginning with Model 2, the Home*OneRun interaction term is included in the regressions. When the Home*OneRun interaction term is added to the model, the effect of the Home variable becomes statistically insignificant. The interaction term, on the other hand, is highly significant and positive, indicating that the probability of a team winning a game is higher for the home team than the visiting team only when the game is won by one run. This is an important finding since it reveals that a home-field advantage exists only in games that are won by a single run; in games that are won by more than one run there is no home-field advantage.

The effect of travel is determined beginning with Model 3. The regression results reveal that travel is statistically insignificant, indicating that travel does not affect a team's probability of winning a game. (9) The initial expectation was that traveling to the first game of a series would adversely affect a team's likelihood of winning the game, due to factors such as travel-induced stress and fatigue. An interaction term between the home dummy variable and the travel dummy variable, Home*Travel, is included to examine the possibility that the effect of travel on the probability of winning a game is different for home teams than for visiting teams. The regression results reveal that this variable is also statistically insignificant, which indicates that travel does not affect the home team's probability of winning a game differently than the visiting team's probability. (10)

The marginal effects on the probability of winning a game for each of the variables included in the logit models are reported in Table 5. The marginal effect from Model 1 indicates that a home team's probability of winning a game is .0925 larger than that of a visiting team when the run differential is ignored. When the run differential is considered, though, the marginal results indicate that the probability of a home team winning a game is about .22 higher when the game is won by one run than when it is won by more than one run. The marginal results also indicate that by scoring one run more than the average, a team's probability of winning a game increases by about .14.
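The paper does not spell out how the marginal effects in Table 5 were computed. One common convention for logit models, sketched below in Python, uses the derivative $\beta_k P(1 - P)$ for a continuous variable such as Runs and the change in predicted probability for a dummy variable such as Home. The coefficient values and the evaluation point are hypothetical and are not taken from Table 4 or Table 5.

```python
import math


def logistic(z):
    """Logistic function from equation (2)."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect_continuous(beta_k, z):
    """Derivative of P(WIN) with respect to x_k at linear index z: beta_k * P * (1 - P)."""
    p = logistic(z)
    return beta_k * p * (1.0 - p)


def marginal_effect_dummy(beta_d, z_without):
    """Change in P(WIN) when a 0/1 variable switches from 0 to 1."""
    return logistic(z_without + beta_d) - logistic(z_without)


# HYPOTHETICAL values for illustration only; NOT the paper's estimates.
beta_runs, beta_home = 0.55, 0.35
z_at_evaluation_point = 0.0  # linear index at which P(WIN) = .50

print(round(marginal_effect_continuous(beta_runs, z_at_evaluation_point), 4))
print(round(marginal_effect_dummy(beta_home, z_at_evaluation_point), 4))
```

Under either convention the marginal effect depends on where it is evaluated, since $P(1 - P)$ is largest at $P = .50$ and shrinks toward zero as the predicted probability approaches 0 or 1.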

To determine the relationship between a team's probability of winning a game and the number of runs it scores, we insert the regression coefficients of Model 2 into equation (2) and solve. Table 6 reports the probability of a team winning a game based on the number of runs it scores for three categories of teams: the home team in games won by one run; the home team in games won by more than one run; and the visiting team. Several interesting results emerge. In low scoring games (1 or 2 runs), the probability that the home team wins the game is more than twice as large as that of the visiting team if the game is won by one run. In moderately low scoring games (3 or 4 runs), the probability of the home team winning the game is at least 60 percent larger than for the visiting team if the game is won by one run. In games where the number of runs scored is slightly above the season average of 4.81 runs (5 or 6 runs), the probability of the home team winning the game is at least 25 percent larger than for the visiting team if the game is won by one run. (11) In all cases, when the game is won by more than one run the probability of a home team winning a game is only minimally higher than that of the visiting team.
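Endnote (11) reports the number of runs at which each category of team reaches a .50 probability of winning. Setting P(WIN) = .50 in equation (2) requires the linear index to equal zero, so the break-even run count is the negative of the remaining terms divided by the Runs coefficient. The Python sketch below illustrates this; its coefficients are illustrative values scaled so that the break-even points match those in endnote (11), with the Runs coefficient of 0.50 chosen arbitrarily. They are not the Model 2 estimates from Table 4.

```python
def break_even_runs(alpha, beta_runs, other_terms=0.0):
    """Runs at which P(WIN) = .50, i.e. alpha + beta_runs * runs + other_terms = 0."""
    return -(alpha + other_terms) / beta_runs


# ILLUSTRATIVE coefficients, NOT the Table 4 estimates. Only their ratios to
# B_RUNS matter for the break-even points; the scale of B_RUNS is an assumption.
ALPHA = -2.45
B_RUNS = 0.50
B_HOME = 0.10
B_HOME_ONERUN = 0.795

visiting = break_even_runs(ALPHA, B_RUNS)
home_more_than_one = break_even_runs(ALPHA, B_RUNS, B_HOME)
home_one_run = break_even_runs(ALPHA, B_RUNS, B_HOME + B_HOME_ONERUN)

print(round(visiting, 2), round(home_more_than_one, 2), round(home_one_run, 2))
```

With these values the three break-even points come out at 4.90, 4.70, and 3.11 runs, which shows how a small Home coefficient and a large Home*OneRun coefficient translate into the pattern described in endnote (11).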

The concept of the Linear Weights System (Palmer and Thorn, 2004) is also incorporated into equation (1). The essence of the Linear Weights System is that the number of runs a team scores in a game is determined by its ability to get runners on base and by its ability to advance the runners once they reach base. To incorporate this concept into the model, two regressions, in which the Runs variable is replaced with a set of variables that measure the ability of the team to get runners on base and to advance the runners, are estimated.

The results of these regressions are reported in Table 4 and are listed as Model 4 and Model 5. The variables that measure the team's ability to get runners on base and to advance runners (i.e., performance variables) all have the expected effect. The results of the variables related to the home-field advantage, the effect of travel, and the effect of a game being won by one run are consistent with the previous regressions. The performance variables that have a positive and statistically significant effect on the probability of a team winning a game are Singles, Extra, Home Runs, OBP, SLG, BBHBP, Net Steals, and SH. The variables that have a negative and statistically significant effect on the probability of a team winning a game are SO and GIDP. These results suggest that teams that are more successful at getting runners on base and then advancing the runners during a game are more likely to win the game than teams that are less successful at doing so.

As in Models 2-3, Home is statistically insignificant in Models 4-5, indicating that there is no home-field advantage, per se. (12) The interaction term, Home*OneRun, is again positive and statistically significant, indicating that the home team has an advantage over the visiting team only in games that are won by one run; in games that are won by more than one run there is no home-field advantage. Consistent with the results of Model 3, the two travel-related variables are statistically insignificant, indicating that travel does not affect the probability of either the home team or visiting team winning a game.

The marginal effects of the variables in Models 4-5 are reported in Table 5. The probability of a home team winning a game that is won by one run is between .15 and .19 larger than that of the visiting team. This result is not trivial, given that 639 of the 2,428 games (26.3% of the games) played during the 2004 season were won by one run. A typical team then played approximately 43 games that were won by only one run. If the probability of the home team winning such games is between .15 and .19 higher than for the visiting team, it suggests that home teams would be expected to win 24 or 25 of the 43 games while visiting teams would only be expected to win 17 or 18 of the games.
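The arithmetic behind these figures can be retraced with a few lines of Python. The sketch assumes that the home and visiting probabilities in a one-run game sum to one, so a gap of d implies a home probability of (1 + d) / 2; that reading, and the names used, are our assumptions rather than a method stated in the paper.

```python
ONE_RUN_GAMES = 639   # games won by one run in 2004 (from the text)
TOTAL_GAMES = 2428    # all regular-season games in 2004 (from the text)
TEAMS = 30            # number of teams listed in Table 1

share_one_run = ONE_RUN_GAMES / TOTAL_GAMES
# Each game involves two teams, so team-games are twice the number of games.
one_run_games_per_team = ONE_RUN_GAMES * 2 / TEAMS


def expected_split(n_games, gap):
    """Expected home and visiting wins when P(home) - P(visiting) = gap.

    Assumes P(home) + P(visiting) = 1, so P(home) = (1 + gap) / 2.
    """
    p_home = (1.0 + gap) / 2.0
    return n_games * p_home, n_games * (1.0 - p_home)


print(f"{share_one_run:.1%} of games, {one_run_games_per_team:.1f} per team")
for gap in (0.15, 0.19):
    home_wins, visiting_wins = expected_split(43, gap)
    print(f"gap {gap:.2f}: home {home_wins:.1f}, visiting {visiting_wins:.1f}")
```

Under this reading the expected home wins fall between roughly 24.7 and 25.6 of the 43 games, in line with the rounded figures quoted above.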

SUMMARY AND CONCLUDING REMARKS

The primary purpose of this paper has been to expand sabermetric knowledge by examining the effect of the home-field advantage on a team's probability of winning a major league baseball game. Although it is commonly believed that the home team has a substantial advantage in major league baseball games, the home-field advantage is an aspect of baseball that has largely been ignored in prior research. Birnbaum (2004, p. 973) noted that although historically home teams have won about 54 percent of their games, the question of why they enjoy such an advantage "is one of the largest unresolved issues in sabermetric research."

While a simple analysis of the data that focuses only on the number of wins and losses by home teams and visiting teams supports the contention of a home-field advantage, a more sophisticated analysis indicates that a home-field advantage actually exists only in very close games. In fact, the regression results in this paper indicate that there is virtually no difference between the probability that the home team will win a game and the probability that the visiting team will win the game when the game is won by more than one run. Since about 26 percent of the games played during the 2004 season were won by one run, the results of this study imply that a home-field advantage exists in only about one-quarter of major league baseball games. The results further indicate that the home team advantage in games won by one run is much larger than the eight-percentage-point advantage implied by a simple analysis of the data.

The major finding of this study is that the home-field advantage in major league baseball is much more limited than is commonly believed. Rather than existing across all types of games, the home-field advantage exists only in very close games. In games that are decided by more than one run, the home team and visiting team are equally likely to win the game. This paper has furthered our understanding of the home-field advantage and, as such, has begun to resolve what Birnbaum (2004, p. 973) states is one of the largest unresolved issues in sabermetric research. The next step in further resolving the issue should be to examine in more detail differences in games won by one run and games won by more than one run to see if these differences explain why the home team is so much more successful in the games won by one run. This might involve an inning-by-inning analysis of a sample of baseball games to determine if some specific situation that gives the home team the advantage arises predominantly in games won by one run. If so, then this would explain why home teams are much more successful in games won by one run than in games won by more than one run.

ENDNOTES

(1) There were only two cases where the first place team in a division won at least 13 more games than the second place team. In 2004, the first place St. Louis Cardinals won 13 more games than the second place Houston Astros in the National League's Central division. In 2003, the first place San Francisco Giants won 15 more games than the second place Los Angeles Dodgers in the National League's West division (Major League Baseball website, http://www.mlb.com).

(2) In major league baseball, unlike most other professional sports, two teams generally play several games against each other over consecutive days. Typically, three or four games are played over a three- or four-day period. Of the 2,428 games played during the season, 772 were the first scheduled game of a series. The home team traveled to 328 of these games.

(3) Lindsey's formula is Runs = .41(1B) + .82(2B) + 1.06(3B) + 1.42(HR), where Runs is the number of runs scored, 1B is the number of one-base hits, 2B is the number of two-base hits, 3B is the number of three-base hits, and HR is the number of home runs. The formula measures the contribution of each type of hit to a team's run production.

(4) We ran a regression, using data from the 1990-2004 seasons on all major league teams, where the number of games a team won during the season was regressed on the number of runs it scored and the number of runs it allowed during the season. The results indicate that the number of runs a team scores during a season positively and significantly affects the number of games it wins. The results of the regression are not reported here.

(5) The 14 variables collected are at-bats, runs, hits, walks (BB), strikeouts, two-base hits, three-base hits, home runs, sacrifice hits, sacrifice flies, ground into double or triple plays, stolen bases, caught stealing, and hit by pitch (HBP).

(6) We ran an OLS regression using the dataset utilized in this study, with runs scored by a team as the dependent variable. We find that an additional single in a game induces a team to score an extra .5 runs while an additional strikeout reduces the number of runs it scores by .07. Since the differences in mean singles and mean strikeouts are .34 and .56, respectively, this implies a difference of about .13 runs between a team that travels and a team that does not travel, a relatively small difference. The full results of the regression are not reported here.

(7) There were 639 games during the 2004 season where the run difference between the winning and losing team was one run. The home team won 392, or 61.3%, of these games. There were 1,789 games where the run difference exceeded one run. The home team only won 907, or 50.7%, of these games.

(8) Discussions of logit models are presented in Aldrich and Nelson (1984), Greene (1997), Pindyck and Rubinfeld (1991), and Ghosh (1991).

(9) We also ran regressions where a series of categorical variables related to the distance traveled to arrive at a game replaced the travel dummy variable. Like the travel dummy variable, the effects of the distance variables were statistically insignificant. The results of the regressions are not reported here.

(10) To further examine whether or not travel affects the home team, we ran regressions where the sample was home teams in the first game of a series. There were 772 observations in these regressions. These regressions correspond to Models 3-5 reported in Table 4, with the Home, Home*OneRun, and Home*Travel variables excluded. Consistent with the results reported in Table 4, the results indicate that travel does not significantly affect the probability of the home team winning a game. The results of the regression are not reported here.

(11) Based on equation (2), the results of Model 2 in Table 6 reveal that a .50 probability of winning a game occurs at 4.90 runs for visiting teams, at 4.70 runs for home teams in a game won by more than one run, and at 3.11 runs for home teams in a game won by one run. This suggests that in games won by one run, home teams need fewer runs, on average, to win than in games won by more than one run.

(12) We also ran two regressions comparable to Models 4 and 5 reported in Table 5 that included the Home variable but excluded the Home*OneRun interaction term. The Home variable was statistically significant and positive at the .05 level in both equations. The same coefficients that were statistically significant in Table 4 were statistically significant in these regressions. The results of these regressions are not reported here.

REFERENCES

Albert, J. (1994). Exploring baseball hitting data: What about those breakdown statistics? Journal of the American Statistical Association, 89, 1066-1074.

Albright, S. C. (1993). A statistical analysis of hitting streaks in baseball. Journal of the American Statistical Association, 88, 1175-1183.

Aldrich, J. H. & Nelson, F. D. (1984). Linear probability, logit, and probit models. Beverly Hills, CA: Sage Publications.

Birnbaum, P. (2004). Sabermetrics. In J. Thorn et al. (Eds.), Total baseball: The ultimate baseball encyclopedia (8th ed.) (pp. 963-975). Wilmington, DE: Sport Media Publishing.

Ghosh, S. K. (1991). Econometrics: Theory and applications. Englewood Cliffs, NJ: Prentice Hall.

Gius, M. P. & Hylan, T. R. (1996). An interperiod analysis of the salary impact of structural changes in major league baseball: Evidence from panel data. In J. Fizel, E. Gustafson, & L. Hadley (Eds.), Baseball Economics: Current Research. Westport, CT: Praeger.

Grabiner, D. (n.d.). The sabermetric manifesto. Retrieved on January 17, 2005 from http://www.baseball1.com/bbdata/grabiner/manifesto.html

Greene, W. H. (1997). Econometric analysis. Upper Saddle River, NJ: Prentice Hall.

Lindsey, G. (1963). An investigation of strategies in baseball. Operations Research, 11, 447-501.

Morong, C. (2004). Historical trends in home-field advantage. The Baseball Research Journal, 32, 100-102.

Palmer, P., & Thorn, J. (2004). Linear weights. In J. Thorn et al. (Eds.), Total baseball: The ultimate baseball encyclopedia (8th ed.) (pp. 976-979). Wilmington, DE: Sport Media Publishing, Inc.

Pindyck, R. S. & Rubinfeld, D. L. (1991). Econometric models and economic forecasts. New York, NY: McGraw-Hill.

William Levernier, Georgia Southern University

Anthony G. Barilla, Georgia Southern University

Table 1: Home Wins, Home Losses, Visiting Wins, Visiting Losses, and
Home-Field Advantage

Team            Home W  Home L  Visiting W  Visiting L  HF Adv

Anaheim             45      36          47          34  -.0247
Arizona             29      52          22          59   .0864
Atlanta             49      32          47          34   .0247
Baltimore           38      43          40          41  -.0247
Boston              55      26          43          38   .1481
Chicago (NL)        45      37          44          36  -.0012
Chicago (AL)        46      35          37          44   .1111
Cincinnati          40      41          36          45   .0494
Cleveland           44      37          36          45   .0988
Colorado            38      43          30          51   .0988
Detroit             38      43          34          47   .0494
Florida             42      38          41          41   .0250
Houston             48      33          44          37   .0494
Kansas City         33      47          25          57   .1076
Los Angeles         49      32          44          37   .0617
Milwaukee           36      45          31          49   .0569
Minnesota           49      32          43          38   .0741
Montreal            35      45          32          50   .0473
New York (NL)       38      43          33          48   .0617
New York (AL)       57      24          44          37   .1605
Oakland             52      29          39          42   .1605
Philadelphia        42      39          44          37  -.0247
Pittsburgh          39      41          33          48   .0801
St. Louis           53      28          52          29   .0123
San Diego           42      39          45          36  -.0370
San Francisco       47      35          44          36   .0232
Seattle             38      44          25          55   .1509
Tampa Bay           41      39          29          52   .1545
Texas               51      30          38          43   .1605
Toronto             40      41          27          53   .1563
All Teams         1299    1129        1129        1299   .0700

Notes: The home-field advantage is the difference between the
proportion of games won as the home team and the proportion of games
won as the visiting team.
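
The definition in the note can be checked against any row of Table 1. The following is a minimal Python sketch; the function name home_field_advantage is illustrative and does not come from the paper.

```python
def home_field_advantage(home_w, home_l, visiting_w, visiting_l):
    """Proportion of home games won minus proportion of road games won."""
    home_pct = home_w / (home_w + home_l)
    visiting_pct = visiting_w / (visiting_w + visiting_l)
    return home_pct - visiting_pct


# Win-loss records copied from the Boston and All Teams rows of Table 1.
boston = home_field_advantage(55, 26, 43, 38)
all_teams = home_field_advantage(1299, 1129, 1129, 1299)

print(f"Boston:    {boston:.4f}")
print(f"All Teams: {all_teams:.4f}")
```

Because teams did not all play the same number of home and road games, the two proportions in a row can have different denominators, as in the Chicago (NL) row (82 home games, 80 road games).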

Table 2: Variable Definitions

Variable Definition

Runs The number of runs scored by the team
At-Bats The number of times the team's hitters officially
 batted.
Hits The team's number of hits
Singles The team's number of one-base hits
Extra The team's combined number of two-base hits and
 three-base hits
HR The number of home runs hit by the team
BBHBP The number of times the team's batters reach base on
 a walk or on a hit-by-pitch
SB The team's number of stolen bases
CS The number of times a team's runners are caught
 stealing
Net Steals The team's number of stolen bases minus its number
 of caught stealing
GIDP The number of times the team grounded into a double or
 triple play
SH The number of times the team advanced a runner with
 a sacrifice bunt
SF The number of times the team scored a run with a
 sacrifice fly
OBP (1) The team's on-base-percentage
Total Bases (2) The team's number of total bases
SLG (3) The team's slugging percentage
Home A dummy variable that takes a value of 1 if the team
 is the home team, 0 if not
OneRun A variable that takes a value of 1 if the run
 differential in the game is 1 run, 0 if not
Home * OneRun An interactive term, Home multiplied by OneRun
Travel A variable that takes a value of 1 if the game is
 the first scheduled game of a series and the team
 had to travel to arrive at the game, 0 if not
Home*Travel An interactive term, Home multiplied by Travel

Notes: (1) OBP is calculated as (Hits + BBHBP)/(At Bats + BBHBP + SF)

(2) Total bases is calculated as (Hits + Two-base hits + (Three-base
hits * 2) + (Home Runs * 3))

(3) SLG is calculated as (Total Bases)/(At Bats + SF)
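
The three formulas in the notes can be written out directly. The sketch below is illustrative only: the function names and the single-game batting line are hypothetical and are not taken from the paper's data. Total bases counts every hit once and then adds one extra base per double, two per triple, and three per home run.

```python
def on_base_pct(hits, bbhbp, at_bats, sf):
    """Note (1): (Hits + BBHBP) / (At Bats + BBHBP + SF)."""
    return (hits + bbhbp) / (at_bats + bbhbp + sf)


def total_bases(hits, doubles, triples, home_runs):
    """Note (2): Hits + doubles + 2 * triples + 3 * home runs."""
    return hits + doubles + 2 * triples + 3 * home_runs


def slugging_pct(tb, at_bats, sf):
    """Note (3): the paper divides total bases by at-bats plus sacrifice flies."""
    return tb / (at_bats + sf)


# Hypothetical team batting line for one game (not from the paper).
line = {"at_bats": 34, "hits": 9, "doubles": 2, "triples": 0,
        "home_runs": 1, "bbhbp": 4, "sf": 1}

tb = total_bases(line["hits"], line["doubles"], line["triples"], line["home_runs"])
obp = on_base_pct(line["hits"], line["bbhbp"], line["at_bats"], line["sf"])
slg = slugging_pct(tb, line["at_bats"], line["sf"])

print(tb, round(obp, 3), round(slg, 3))
```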

Table 3: Means and Standard Deviations of Selected Variables by Team
Classification

Variable               All      Home  Visiting    Travel  Non-Travel

Runs                 4.814     4.829     4.798     4.913       4.837
                   (3.218)   (3.122)   (3.312)   (3.252)     (3.139)
Singles (a, b)       6.024     5.925     6.124     6.124       5.780
                   (2.729)   (2.671)   (2.783)   (2.746)     (2.503)
Extra                2.022     2.002     2.041     2.023       1.944
                   (1.514)   (1.503)   (1.525)   (1.512)     (1.532)
HR                   1.123     1.117     1.129     1.184       1.192
                   (1.133)   (1.117)   (1.150)   (1.178)     (1.178)
BBHBP                3.722     3.785     3.658     3.724       3.788
                   (2.275)   (2.295)   (2.255)   (2.206)     (2.461)
SO (a, b)            6.554     6.239     6.869     6.718       6.156
                   (2.762)   (2.713)   (2.775)   (2.841)     (2.815)
Net Steals            .307      .313      .300      .308        .359
                    (.948)    (.941)    (.956)    (.960)      (.953)
GIDP (a)              .780      .747      .813      .783        .817
                    (.858)    (.845)    (.870)    (.827)      (.878)
SH                    .356      .360      .353      .361        .323
                    (.613)    (.615)    (.611)    (.623)      (.579)
OBP (a)               .328      .334      .322      .327        .330
                    (.083)    (.084)    (.081)    (.083)      (.086)
SLG (a)               .419      .427      .411      .421        .429
                    (.156)    (.159)    (.153)    (.158)      (.163)
Total Bases         14.742    14.590    14.895    15.080      14.655
                   (6.442)   (6.202)   (6.672)   (6.662)     (6.357)
Observations          4856      2428      2428      1095         449

Notes: Standard deviations are shown in parentheses.

(a) indicates there is a statistically significant difference at the
.05 level between the mean value of home teams and visiting teams

(b) indicates there is a statistically significant difference at the
.05 level between the mean value of traveling teams and non-traveling
teams

The team classifications are defined as follows:

All includes all teams in all games played.

Home includes only the home team in all games played.

Visiting includes only the visiting team in all games played.

Travel includes the traveling team(s) in the first scheduled game of a
series.

Non-Travel includes the non-traveling team, if any, in the first
scheduled game of a series.
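
The significance markers (a) and (b) can be roughly checked from the summary statistics alone. This section does not state which test the authors used, so the sketch below assumes a large-sample z-test for a difference in means with unequal variances, treating the two groups as independent samples. That is a simplification, since the home and visiting teams in a game are paired. The function name is hypothetical.

```python
import math


def mean_diff_z(mean1, sd1, n1, mean2, sd2, n2):
    """z-statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


CRITICAL_Z = 1.96  # two-sided test at the .05 level

# Home vs. visiting means and standard deviations from Table 3.
z_so = mean_diff_z(6.239, 2.713, 2428, 6.869, 2.775, 2428)    # SO, marked (a)
z_runs = mean_diff_z(4.829, 3.122, 2428, 4.798, 3.312, 2428)  # Runs, unmarked

significant_so = abs(z_so) > CRITICAL_Z
significant_runs = abs(z_runs) > CRITICAL_Z

print(f"SO:   z = {z_so:.2f}, significant = {significant_so}")
print(f"Runs: z = {z_runs:.2f}, significant = {significant_runs}")
```

Under these assumptions the home-visiting difference in strikeouts is significant and the difference in runs is not, which agrees with the markers in the table.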

Table 4: Results of Logit Regressions with Win as the Dependent
Variable, Expanded Model

Variable Model 1 Model 2 Model 3

Runs .5436 (a) .5538 (a) .5538 (a)
 (32.862) (33.060) (33.055)
Singles

Extra

Home Runs

OBP

SLG

BBHBP

SO

Net Steals

GIDP

SH

Home .3715 (a) .1115 .1205
 (5.258) (1.428) (1.376)
Home * OneRun .8816 (a) .8811 (a)
 (7.994) (7.988)
Travel .0072
 (.066)
Home*Travel -.0575
 (.313)
Observations 4856 4856 4856
Log Likelihood -2426.30 -2393.55 -2393.49
Num. Correct (b) 3645 3668 3668

Variable Model 4 Model 5

Runs

Singles .2047 (a)
 (15.180)
Extra .3365 (a)
 (14.087)
Home Runs .6358 (a)
 (18.839)
OBP 11.4498 (a)
 (16.523)
SLG 4.5812 (a)
 (13.247)
BBHBP .1999 (a)
 (12.406)
SO -.1068 (a)
 (8.314)
Net Steals .1874 (a) .2033 (a)
 (5.195) (5.488)
GIDP -.3019 (a) -.3867 (a)
 (7.309) (9.102)
SH .3498 (a) .2664 (a)
 (6.060) (4.469)
Home .1454 -.0586
 (1.756) (.681)
Home * OneRun .5872 (a) .7768 (a)
 (5.459) (6.906)
Travel .0005 .0106
 (.005) (.101)
Home*Travel -.0214 -.0175
 (.124) (.096)
Observations 4856 4856
Log Likelihood -2635.29 -2472.04
Num. Correct (b) 3520 3612

Notes

The absolute values of the t-statistics are shown in parentheses.

The regressions were run with a constant term, the results of which are
not reported here.

(a) denotes statistically significant at the .05 level or higher

(b) denotes the number of correct predictions. If the predicted
probability for a team exceeds .5, the team is the predicted winner of
the game.
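
The prediction rule in note (b) amounts to a simple classification count. The sketch below assumes that an observation whose predicted probability does not exceed .5 counts as a predicted loss; the fitted probabilities and outcomes are hypothetical, and only the final ratio uses figures from Table 4.

```python
def count_correct(predicted_probs, wins):
    """Count observations where the .5 cutoff rule matches the actual result."""
    correct = 0
    for prob, won in zip(predicted_probs, wins):
        predicted_win = prob > 0.5
        if predicted_win == bool(won):
            correct += 1
    return correct


# Hypothetical fitted probabilities and actual outcomes (1 = win, 0 = loss).
probs = [0.74, 0.51, 0.38, 0.62]
wins = [1, 0, 0, 1]
n_correct = count_correct(probs, wins)

# Share of correct predictions for Model 1, from the counts in Table 4.
hit_rate_model1 = 3645 / 4856

print(n_correct, round(hit_rate_model1, 4))
```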

Table 5: Marginal Effects on the Probability of Win=1

Variable Model 1 Model 2 Model 3

Runs .1353 (a) .1379 (a) .1379 (a)
Singles
Extra
Home Runs
OBP * 100
SLG * 100
BBHBP
SO
Net Steals
GIDP
SH
Home .0925 (a) .0278 .0300
Home * OneRun .2196 (a) .2194 (a)
Travel .0018
Home * Travel -.0143

Variable Model 4 Model 5

Runs
Singles .0511 (a)
Extra .0841 (a)
Home Runs .1589 (a)
OBP * 100 .0286 (a)
SLG * 100 .0115 (a)
BBHBP .0500 (a)
SO -.0267 (a)
Net Steals .0468 (a) .0508 (a)
GIDP -.0754 (a) -.0967 (a)
SH .0874 (a) .0666 (a)
Home .0363 -.0146
Home * OneRun .1467 (a) .1942 (a)
Travel .0001 .0027
Home * Travel -.0054 -.0044

Notes: (a) denotes statistically significant at the .05 level or higher
in the logit regression
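
For a logit model, the marginal effect of a continuous variable on the probability of winning is the coefficient multiplied by p(1 - p), where p is the predicted probability at the evaluation point. This section does not state the evaluation point the authors used, so the sketch below only checks that the reported marginal effects are consistent with that standard formula by dividing each Table 5 entry by the matching Table 4 coefficient. The function name is illustrative.

```python
def logit_marginal_effect(beta, p):
    """Derivative of the logistic probability with respect to x: beta * p * (1 - p)."""
    return beta * p * (1.0 - p)


# (coefficient from Table 4, marginal effect from Table 5)
pairs = {
    "Runs, Model 1": (0.5436, 0.1353),
    "Home, Model 1": (0.3715, 0.0925),
    "Runs, Model 2": (0.5538, 0.1379),
}

# Each ratio is the implied value of p * (1 - p) at the evaluation point.
ratios = {name: effect / beta for name, (beta, effect) in pairs.items()}

for name, ratio in ratios.items():
    print(f"{name}: implied p(1 - p) = {ratio:.4f}")

# Upper bound for comparison: p * (1 - p) peaks at .25 when p = .5.
upper_bound_runs = logit_marginal_effect(0.5436, 0.5)
```

The implied scale factor is close to .249 in each case, slightly below the maximum of .25, which suggests an evaluation point near p = .5. Whether the authors treated the dummy variables with this derivative or with a discrete change is not stated here.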

Table 6: Probability of Winning a Game, by Runs and Team Category,
Model 2

Runs  Home, 1RD  Home, 2RD   Visiting

1         .2371      .1140      .1032
2         .3510      .1830      .1669
3         .4848      .2804      .2584
4         .6208      .4040      .3775
5         .7401      .5412      .5134
6         .8321      .6723      .6473
7         .8961      .7812      .7615
8         .9375      .8613      .8475
9         .9631      .9153      .9063
10        .9785      .9495      .9439
11        .9875      .9703      .9670
12        .9928      .9827      .9807
13        .9958      .9900      .9888
14        .9976      .9942      .9936
15        .9986      .9967      .9963

Notes:

Home, 1RD denotes the home team in a game where the run differential is
one run.

Home, 2RD denotes the home team in a game where the run differential is
two or more runs.

Visiting denotes the visiting team in a game, without respect to the
run differential.
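
The entries in Table 6 can be approximately reproduced from the Model 2 coefficients in Table 4 with the logistic function. The constant term is not reported in the paper, so the sketch below infers it from a single cell of Table 6 (Visiting, Runs = 1); that inferred constant is an assumption, and the function name is hypothetical.

```python
import math

# Model 2 coefficients from Table 4.
B_RUNS = 0.5538
B_HOME = 0.1115
B_HOME_ONERUN = 0.8816

# Assumption: the unreported constant is backed out from the Table 6
# probability for a visiting team scoring one run (.1032).
CONSTANT = math.log(0.1032 / (1.0 - 0.1032)) - B_RUNS


def win_probability(runs, home=0, one_run=0):
    """Predicted probability of winning under Model 2."""
    index = (CONSTANT + B_RUNS * runs + B_HOME * home
             + B_HOME_ONERUN * home * one_run)
    return 1.0 / (1.0 + math.exp(-index))


print("Runs  Home, 1RD  Home, 2RD   Visiting")
for runs in (1, 4, 5, 10):
    home_1rd = win_probability(runs, home=1, one_run=1)
    home_2rd = win_probability(runs, home=1, one_run=0)
    visiting = win_probability(runs)
    print(f"{runs:<4}{home_1rd:>11.4f}{home_2rd:>11.4f}{visiting:>11.4f}")
```

Model 2 contains OneRun only through the Home * OneRun interaction, which is why the visiting team's probability does not vary with the run differential. Small differences from the table in the fourth decimal place are expected because the coefficients are rounded.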