GROUP LENDING WITH HETEROGENEOUS TYPES.
Gan, Li; Hernandez, Manuel A.; Liu, Yanyan; et al.
I. INTRODUCTION
Group lending is a common practice in many microfinance programs in
developing countries. Given that the poor often lack appropriate
financial collateral, group lending programs are intended to provide a
feasible way of extending credit to poor people who are usually kept out
of traditional banking systems. Group lending allows lending
institutions to rely on information advantages among group members,
rather than on financial collateral, to mitigate information asymmetries
between lenders and borrowers. There is a debate, however, regarding
whether these programs are able to achieve and maintain sound repayment
performance while simultaneously serving poor borrowers (Armendariz de
Aghion and Morduch 2005). It is also frequently argued that the high
transaction costs faced by microfinance institutions in screening their
clients, processing applications, and collecting repayments keep
interest rates high and prevent the programs from further expanding
their operations (Armendariz de Aghion and Morduch 2004; Field and Pande
2008; Shankar 2006).
In this context, a large empirical literature explores the
different factors (including group characteristics) that determine
repayment performance in group lending programs (e.g., Ahlin and
Townsend 2007; Cull, Demirguc-Kunt, and Morduch 2007; Hermes, Lensink,
and Mehrteab 2005; Paxton, Graham, and Thraen 2000; Sharma and Zeller
1997; Wydick 1999; Zeller 1998). However, studies using observational
data are often subject to an endogeneity problem (Hermes and Lensink
2007; Karlan 2007). The groups are typically formed voluntarily based on
a set of common characteristics such as risk type, entrepreneurial
spirit, solidarity, and trust among group members; these characteristics
are generally observed by peers but not by lenders (or econometricians).
The unobserved group heterogeneity resulting from this peer screening
(or peer selection) process affects repayment performance and
potentially correlates with the observed member demographics and proxies
for social ties generally used in single-agent models to account for
group heterogeneity. (1)
Similarly, groups may also differ in the effectiveness of peer
monitoring (and enforcement) among members; this is also unobserved by
lenders and can have direct implications for the repayment performance of
group members. (2) The effectiveness of peer monitoring across groups
may be further correlated with peer screening because individuals who
team up with safe borrowers may exert different effort levels than those
who team up with risky borrowers. Ultimately, decisions made within a
group, including potential coordinated behavior, also depend on the
level of group cohesion. (3)
To control for the unobserved group heterogeneity, some recent
studies resort to a (quasi-) randomization of the group formation
process in particular settings. For example, Karlan (2007) exploited a
unique quasi-random group formation process in a rural town in Peru to
identify social connections and found evidence of successful peer
monitoring and enforcement of joint liability loans, particularly among
individuals with stronger social ties. Gine et al. (2010) also examined
the impact of a variety of group lending schemes on default and
investment decisions using a controlled laboratory environment in an
urban market in Peru. However, in the majority of group lending
programs, only observational data are available. It is thus desirable to
develop methods that can better control for the endogeneity issue in
observational studies.
This paper intends to fill this gap by proposing and implementing a
finite mixture structure to model group members' repayment behavior
in the presence of unobserved group heterogeneity. In the proposed
mixture structure, the group type summarizes all unobserved individual
and group characteristics within the group. Individuals make repayment
decisions based on their unobserved group type as well as on observable
individual and loan characteristics. Average member characteristics and
other group and village characteristics are used, in turn, to identify
the group types. The proposed method can help to better account for the
endogeneity problem inherent in observational studies, relative to using
traditional probabilistic models, although the endogeneity bias is not
necessarily fully eliminated. Similarly, the model is more informative
because the effects of the factors explaining repayment behavior are
allowed to differ by group type and because the model can help to better
screen between likely defaulters and nondefaulters. This is critical in
the context of micro-lending because better identifying potential group
types and offering them differentiated contracts can help reduce
information asymmetries and increase microlending. We discuss the model
identification and provide evidence supporting the robustness of the
model.
Other studies that use mixture specifications to uncover
heterogeneous behaviors include the work of Keane and Wolpin (1997) to
model heterogeneous endowment ability in career decisions; the work of
Knittel and Stango (2003) to assess whether state-mandated price
ceilings serve as focal points for tacit collusion among credit card
companies; the work of Gan and Hernandez (2013) to examine whether
agglomerated hotels have a higher probability of following collusive
regimes; and the work of Dong, Gan, and Wang (2015) to evaluate varying
neighborhood effects on educational attainment. Most of these studies,
however, do not test the identification of the mixture model proposed.
Our model is applied using a rich dataset of 1,110 group loans,
which were allocated to a total of 12,833 members from a group lending
program in Andhra Pradesh, India. The results provide strong evidence
supporting the existence of two group types. We identify a first group
whose members are more inclined to fulfill their credit obligations
(i.e., a "responsible" group) and a second group whose members
are more inclined to default (i.e., an "irresponsible" group).
We also find important differences in the marginal effects of the
different individual and loan characteristics included in the repayment
equation, suggesting that the underlying factors driving default
behavior are likely to differ across types. For example, the existence
of more stringent repayment schedules and shorter loan terms and the
encouragement of a larger group size seem to be more relevant factors
among "irresponsible" groups. Finally, the type-varying model
shows higher predictive performance than standard probabilistic
models, particularly in the identification of potential
"defaulters."
Overall, this paper makes three contributions to the literature on
group lending and the literature on mixture models. First, we provide a
new approach to better control for the potential endogeneity issue in
observational studies that look at performance of group lending
programs, relative to standard probabilistic models. Second, we test the
identification of the mixture model proposed, which has generally been
overlooked in the related empirical literature using mixture structures.
Third, we highlight the usefulness of applying a mixture model to
screening "better" versus "worse" groups, which
helps mitigate information asymmetries faced by lenders. In this regard,
our study highlights the suitability of implementing a mixture model in
other related settings such as other group-based programs, personal
loans, insurance markets, and filing decisions.
The remainder of the paper is organized as follows. Section II
presents and discusses in more detail the proposed model to evaluate
repayment decisions with unobserved group heterogeneity. Section III
describes the group lending data used in the application analysis and
reports and discusses the estimation results. Section IV concludes.
II. MODEL
Let the default behavior of individual i in group j be given by
$$D_{ij} = \mathbf{1}\left(\alpha + X_{ij}\beta_1 + C_j\beta_2 + T^*_j + u_{ij} > 0\right) \tag{1}$$
where $D_{ij}$ is the observed binary outcome (i.e., $D_{ij}$ equals one
if the individual defaults [i.e., does not fully repay her loan] and
equals zero otherwise), $\alpha$ is a constant, $X_{ij}$ is a vector of
observable individual characteristics, $C_j$ is a vector of loan
characteristics, $T^*_j$ is the unobserved group type, which is likely
correlated with $X_{ij}$ and $C_j$, and $u_{ij}$ is an error term. The
correlation between $X_{ij}$ and $T^*_j$ may result, for example, from a
proxy for an individual's social ties that is included in $X_{ij}$ and
potentially correlated with the social ties of her peers (who generally
live in the same neighborhood), which partly describe $T^*_j$; the loan
terms $C_j$ may also be correlated with the group features that describe
$T^*_j$.
If group heterogeneity were based solely on observables, the observed
group characteristics $W^o_j$, such as average member characteristics
and other group controls (including social ties), would be sufficient to
identify the group types, and $W^o_j$ could be used as a proxy for
$T^*_j$ to estimate Equation (1) with a standard probabilistic regression
(e.g., probit or logit). However, the unobserved group type is more
accurately characterized by both observable and unobservable factors,
such that $T^*_j = W^o_j\delta + W^u_j + \varepsilon_j$, where $W^u_j$ is
unobserved, $W^o_j$ and $W^u_j$ are potentially correlated, and
$\varepsilon_j$ is an error term.
Following the previous example, a proxy for social ties or connections
of a group, included in $W^o_j$, is likely correlated with the
unobserved entrepreneurial spirit or economic opportunities of group
members, which are contained in $W^u_j$ and further affect repayment.
Hence, a standard probabilistic regression of Equation (1) with
only $W^o_j$ on the right-hand side results in an omitted variable bias,
as $W^u_j$ is embedded in the error term. Another option is to
incorporate the unobserved group component or type as fixed effects in a
conditional logit model. However, a fixed-effect logistic regression
mainly exploits within-group variation and will drop all groups without
intragroup differences in default behavior. (4) Furthermore, the
observed factors affecting repayment performance may vary by group type.
We alternatively propose a finite mixture structure in which the
unobserved group heterogeneity is captured by allowing groups to be of a
certain type. We could assume the existence of N different group types,
but in practice we select the number of types that best fits our data
based on different selection criteria. In particular, we consider models
with two and three types (i.e., N = 2, 3) and find that a two-type
specification is preferable to a three-type specification. (5) The
two-type model exhibits a lower Schwarz Bayesian Information Criterion
(SBIC), while a likelihood ratio test shows that the three-type model
does not provide a better fit than the two-type model. (6)
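The model-selection step can be sketched as follows. This is a minimal illustration, not the authors' code: the log-likelihoods and parameter counts are hypothetical placeholders, `sbic` and `lr_statistic` are illustrative helper names, and using the number of members as the sample size in the SBIC penalty is an assumption. One caveat: when comparing mixtures with different numbers of types, the likelihood ratio statistic does not follow the usual chi-square distribution, because the restricted model sits on the boundary of the parameter space, so standard critical values are only a rough guide.

```python
import math


def sbic(loglik: float, n_params: int, n_obs: int) -> float:
    """Schwarz Bayesian Information Criterion; lower values indicate a better fit."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def lr_statistic(loglik_restricted: float, loglik_unrestricted: float) -> float:
    """Likelihood ratio statistic 2 * (LL_unrestricted - LL_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Hypothetical fits (NOT the paper's estimates): log-likelihood and number of
# estimated parameters for the two-type and three-type models.
fits = {2: {"loglik": -5200.0, "k": 40}, 3: {"loglik": -5190.0, "k": 60}}
n_obs = 12833  # assumption: members, not groups, as the SBIC sample size

for n_types, fit in fits.items():
    print(f"{n_types} types: SBIC = {sbic(fit['loglik'], fit['k'], n_obs):.1f}")
print("LR statistic (3 vs. 2 types):", lr_statistic(fits[2]["loglik"], fits[3]["loglik"]))
```

With these placeholder numbers the extra 20 parameters of the three-type model buy too little likelihood to offset the SBIC penalty, which is the kind of comparison the text describes.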
We assume, then, that $T^*_j$ can take two possible values:
$T^*_j = T^H_j$ or $T^*_j = T^L_j$. We can think of $T^H_j$ as type-H or
"responsible" groups and $T^L_j$ as type-L or "irresponsible" groups.
The repayment behavior of individual i in group j is given by
$$D_{ij} = \begin{cases} \mathbf{1}\left(\alpha_H + X_{ij}\beta^H_1 + C_j\beta^H_2 + u_{ij,H} > 0\right) & \text{if } T^*_j = T^H_j, \\ \mathbf{1}\left(\alpha_L + X_{ij}\beta^L_1 + C_j\beta^L_2 + u_{ij,L} > 0\right) & \text{if } T^*_j = T^L_j. \end{cases} \tag{2}$$
By assuming $T^*_j$ to be categorical (in this case, taking two possible
values), the effect of group heterogeneity is absorbed by the constant
terms $\alpha_H$ and $\alpha_L$, while the covariance between $X_{ij}$,
$C_j$, and $u_{ij}$ is zero. A direct implication of this specification
is that the constant terms differ across types; specifically,
$\alpha_H < \alpha_L$, as the type-H group is regarded as the
"responsible" group. The coefficients of the control variables are also
allowed to differ across types, which permits us to capture varying
effects of different factors on repayment behavior by type. (7)
Since the type is unobserved, it can only be determined with a
probability. We can further assume that the probability of being of a
certain group type varies with some observable characteristics $W^o_j$.
That is, we can correlate the apparent group types with specific
observable characteristics, which can be useful for screening purposes
among credit institutions. For example, if $W^o_j = (\bar{X}_j, G_j)$,
the probability of being in a type-H group can be modeled as
$$\Pr(T^*_j = T^H_j) = \Pr\left(\bar{X}_j\delta_1 + G_j\delta_2 + v_j > 0\right) \tag{3}$$
where $\bar{X}_j$ is a vector of average (leave-me-out) characteristics
of group members, $G_j$ is a vector of group and village controls, and
$v_j$ is an error term. The probability of being in a type-L group is,
in turn, given by $\Pr(T^*_j = T^L_j) = 1 - \Pr(T^*_j = T^H_j)$.
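The leave-me-out averages in $\bar{X}_j$ can be built with a few array operations. The sketch below is an assumption about how such averages might be computed (the function name is hypothetical): for each member it averages a characteristic over the other members of her group, and it returns NaN for a member who has no peers.

```python
import numpy as np


def leave_one_out_group_means(values, group_ids):
    """Mean of `values` over the OTHER members of each member's group.

    Members without peers (single-member groups) get NaN.
    """
    values = np.asarray(values, dtype=float)
    _, inverse = np.unique(np.asarray(group_ids), return_inverse=True)
    inverse = inverse.ravel()  # keep 1-D group indices across NumPy versions
    group_sums = np.bincount(inverse, weights=values)
    group_sizes = np.bincount(inverse)
    peers = group_sizes[inverse] - 1  # number of other members in the group
    with np.errstate(divide="ignore", invalid="ignore"):
        return (group_sums[inverse] - values) / peers


# Toy example: literacy indicators for five members in two groups.
literate = [1, 0, 1, 1, 0]
groups = ["a", "a", "a", "b", "b"]
print(leave_one_out_group_means(literate, groups))
```

Here the first member of group "a" gets 0.5, the average literacy of her two peers, rather than the full group mean of 2/3, so her own characteristic does not enter her group-type shifter.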
Overall, in the proposed specification, the probability of default is
conditional on the unobserved group type ($T^*_j$) and depends on
observable individual and loan characteristics ($X_{ij}$ and $C_j$),
while average member characteristics and other group and village
characteristics ($\bar{X}_j$ and $G_j$, observed by lenders prior to
giving a loan) can help to identify the group type to which individuals
belong. Member characteristics may include, for instance, education,
asset ownership, housing condition, and occupation, information that is
generally disclosed during credit application processes. Standard loan
characteristics include loan amount, interest rate, length of loan, and
repayment frequency. The other group and village controls used to
identify the group type may include group age, number of members,
location, and access to programs and services.
Some factors are then directly included in the repayment Equation
(2), while other factors indirectly affect the likelihood of repayment
through the modeled group type Equation (3). We can still quantify the
(indirect) effect of the variables included in the type equation on the
probability of repaying. More specifically, we can recover unconditional
marginal effects of the variables included in the type equation on the
likelihood of repaying, as shown below. Certainly, there can be some
discussion regarding which variables should be included in the modeled
Equations (2) and (3), which is similar to the discussion when
estimating a selection model. In our application, the specification
above provides the best fit for the data. (8)
The resulting unconditional probability of default is equal to
$$\Pr(D_{ij} = 1) = \sum_{K=H,L} \Pr(D_{ij} = 1 \mid T^*_j = T^K_j) \times \Pr(T^*_j = T^K_j). \tag{4a}$$
Similarly,
$$\Pr(D_{ij} = 0) = \sum_{K=H,L} \Pr(D_{ij} = 0 \mid T^*_j = T^K_j) \times \Pr(T^*_j = T^K_j). \tag{4b}$$
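Equations (3), (4a), and (4b) can be evaluated directly once parameter values are available. The sketch below assumes logistic CDFs (the choice described in the estimation step), stacks $(\bar{X}_j, G_j)$ into one matrix `w`, and uses hypothetical parameter values chosen only so that $\alpha_H < \alpha_L$; `default_probabilities` is an illustrative name.

```python
import numpy as np


def logistic_cdf(z):
    """Logistic CDF, standing in for both F(.) and J(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def default_probabilities(x, c, w, p):
    """Return (Pr(D=1), Pr(D=0)) per Equations (4a) and (4b).

    x: (n, kx) member characteristics X_ij; c: (n, kc) loan characteristics C_j;
    w: (n, kw) type shifters (X-bar_j, G_j) stacked; p: dict of parameters.
    """
    pi_h = logistic_cdf(w @ p["delta"])  # Pr(T*_j = T^H_j), Equation (3)
    pd_h = logistic_cdf(p["alpha_h"] + x @ p["beta1_h"] + c @ p["beta2_h"])  # Pr(D=1 | H)
    pd_l = logistic_cdf(p["alpha_l"] + x @ p["beta1_l"] + c @ p["beta2_l"])  # Pr(D=1 | L)
    prob_default = pi_h * pd_h + (1.0 - pi_h) * pd_l  # Equation (4a)
    prob_repay = pi_h * (1.0 - pd_h) + (1.0 - pi_h) * (1.0 - pd_l)  # Equation (4b)
    return prob_default, prob_repay


# Hypothetical parameter values, chosen only so that alpha_H < alpha_L.
params = {
    "alpha_h": -2.0, "beta1_h": np.array([0.1]), "beta2_h": np.array([0.05]),
    "alpha_l": 0.5, "beta1_l": np.array([0.8]), "beta2_l": np.array([0.4]),
    "delta": np.array([1.2, 0.3]),
}
x = np.array([[0.0], [1.0]])
c = np.array([[1.0], [1.0]])
w = np.array([[1.0, 0.0], [1.0, 1.0]])
p1, p0 = default_probabilities(x, c, w, params)
print(p1, p0)
```

By construction the two outputs sum to one for every member, and each unconditional default probability lies between the type-H and type-L conditional probabilities.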
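If the error terms in Equations (2) and (3) have cumulative distribution
functions (CDFs) $F(\cdot)$ and $J(\cdot)$, respectively, the estimated
log likelihood for individual i in group j is given by
$$\ln \ell_{ij} = D_{ij} \ln\left[\pi_j p^H_{ij} + (1 - \pi_j) p^L_{ij}\right] + (1 - D_{ij}) \ln\left[\pi_j \left(1 - p^H_{ij}\right) + (1 - \pi_j)\left(1 - p^L_{ij}\right)\right] \tag{5}$$
where, for symmetric $F(\cdot)$ and $J(\cdot)$,
$\pi_j = \Pr(T^*_j = T^H_j) = J(\bar{X}_j\delta_1 + G_j\delta_2)$ and
$p^K_{ij} = \Pr(D_{ij} = 1 \mid T^*_j = T^K_j) = F(\alpha_K + X_{ij}\beta^K_1 + C_j\beta^K_2)$
for K = H, L. We approximate $F(\cdot)$ and $J(\cdot)$ with logistic CDFs
and follow an iterative procedure for parameter estimation. (9)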
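The text does not specify which iterative procedure is used. One common choice for finite mixtures is the expectation-maximization (EM) algorithm, sketched below under stated assumptions: logistic $F$ and $J$, one covariate in the default equation, one type shifter that enters only the type equation (mirroring the exclusion restriction discussed in this section), a constant added to the type equation, and simulated rather than real data. As in Equation (5), each member contributes her own mixture term; a variant that multiplies member likelihoods within group j before mixing would force all members of a group to share one type. The helper names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic CDF, standing in for both F(.) and J(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def weighted_logit(design, target, weights, n_iter=25):
    """Newton-Raphson maximizer of sum_i w_i [y_i ln p_i + (1 - y_i) ln(1 - p_i)].

    `target` may be fractional, as needed for the type equation in the M-step.
    """
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = sigmoid(design @ coef)
        grad = design.T @ (weights * (target - p))
        hess = (design * (weights * p * (1.0 - p))[:, None]).T @ design
        coef = coef + np.linalg.solve(hess + 1e-9 * np.eye(coef.size), grad)
    return coef


def joint_terms(d, xr, w, coef_h, coef_l, delta):
    """Pr(T = K) * Pr(D_ij | T = K) for K = H and K = L, member by member."""
    pi_h = sigmoid(w @ delta)
    p_h, p_l = sigmoid(xr @ coef_h), sigmoid(xr @ coef_l)
    lik_h = pi_h * np.where(d == 1, p_h, 1.0 - p_h)
    lik_l = (1.0 - pi_h) * np.where(d == 1, p_l, 1.0 - p_l)
    return lik_h, lik_l


def fit_em(d, xr, w, coef_h, coef_l, delta, n_iter=100):
    """EM for the two-type mixture; returns estimates and the log-likelihood path."""
    history = []
    for _ in range(n_iter):
        lik_h, lik_l = joint_terms(d, xr, w, coef_h, coef_l, delta)
        history.append(float(np.sum(np.log(lik_h + lik_l))))  # sum of Equation (5)
        resp_h = lik_h / (lik_h + lik_l)  # E-step: posterior Pr(type H | D_ij)
        coef_h = weighted_logit(xr, d, resp_h)  # M-step: type-H default equation
        coef_l = weighted_logit(xr, d, 1.0 - resp_h)  # M-step: type-L default equation
        delta = weighted_logit(w, resp_h, np.ones_like(resp_h))  # M-step: type equation
    lik_h, lik_l = joint_terms(d, xr, w, coef_h, coef_l, delta)
    history.append(float(np.sum(np.log(lik_h + lik_l))))
    return coef_h, coef_l, delta, history


# Simulated data (hypothetical, not the paper's): x enters the default
# equation; z shifts only the type probability (an exclusion restriction).
rng = np.random.default_rng(0)
n = 4000
x, z = rng.normal(size=n), rng.normal(size=n)
xr = np.column_stack([np.ones(n), x])  # first coefficient plays the role of alpha_K
w = np.column_stack([np.ones(n), z])  # constant added to the type equation (assumption)
is_h = rng.random(n) < sigmoid(w @ np.array([1.4, 1.0]))
index = np.where(is_h, -2.5 + 0.3 * x, 0.5 + 1.0 * x)
d = (rng.random(n) < sigmoid(index)).astype(float)

coef_h, coef_l, delta, history = fit_em(
    d, xr, w, coef_h=np.array([-1.0, 0.0]), coef_l=np.zeros(2), delta=np.zeros(2)
)
print("alpha_H =", round(coef_h[0], 2), "alpha_L =", round(coef_l[0], 2))
```

Starting type H with the lower intercept is a simple way to impose the labeling $\alpha_H < \alpha_L$; because each M-step solves a concave weighted logit, the log-likelihood path should not decrease across iterations.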
The proposed model belongs to the class of finite mixture density
models. The identification of these models has been extensively studied
in recent years (see Fox and Gandhi 2008; Gan, Huang, and Mayer 2015;
Henry, Kitamura, and Salanie 2014; Hu 2008; Lewbel 2007; Mahajan 2006).
In particular, Henry, Kitamura, and Salanie (2014) showed that under the
following assumptions, the mixture density model with unobserved
heterogeneity, such as the one defined above, is nonparametrically
identified.
ASSUMPTION 1. (Mixture). The probability of belonging to a certain
group type depends on a set of characteristics, which are not all
necessarily observable; that is, it depends on $W_j = (W^o_j, W^u_j)$.
ASSUMPTION 2. (Exclusion Restriction). Conditional on the group
type, both observable and unobservable factors that characterize
$T^*_j$ are not related to the probability of defaulting; that is,
$\Pr(D_{ij} = 1 \mid T^*_j = T^K_j, W^o_j, W^u_j) = \Pr(D_{ij} = 1 \mid T^*_j = T^K_j)$
for K = H, L.
The second assumption is the key identifying assumption, which
implies that $W^o_j = (\bar{X}_j, G_j)$ and $W^u_j$ are conditionally
independent of the errors in Equation (2); that is,
$(\bar{X}_j, G_j, W^u_j) \perp u_{ij,K} \mid X_{ij}, C_j, T^*_j = T^K_j$
for K = H, L. Any association between $W_j$ and the probability of
default is driven solely by the association between these variables and
the probability of being of a certain group type. Mahajan (2006)
referred to $W_j$ as instrumental-like variables (ILV). (10)
Intuitively, the identification is similar to the requirement of
instrumental variables in a two-stage least squares (2SLS) procedure, in
which the instrumental variable is supposed to be correlated with the
unobserved type variable but not correlated with the error term. (11)
Assumption 2 further implies that group heterogeneity in the
proposed mixture structure can be fully controlled for by using only a
partial set of the variables in $W_j$. Hence, we require some, but not
all, information about the factors describing group heterogeneity
($T^*_j$) to identify the parameters in repayment Equation (2).
Following Henry, Kitamura, and Salanie (2014) and Gan, Huang, and Mayer
(2015), using either the full set of $W^o_j$ or a subset of $W^o_j$
should produce consistent estimates of the parameters in repayment
Equation (2). A Hausman-type specification test can then be implemented
by comparing the estimated coefficients in Equation (2) obtained with
the full set of $W^o_j$ against those obtained with a subset of $W^o_j$.
This test is similar to an over-identification test in an instrumental
variables approach. Failing to reject the null hypothesis of no
systematic differences between the estimated coefficients provides
supporting evidence for the model identification.
Henry, Kitamura, and Salanie (2014) argued that under Assumptions 1
and 2, we can obtain sharp bounds for both the probability of being of a
certain group type (i.e., the mixture weights) and the probability of
defaulting conditional on the type (i.e., the mixture components).
Furthermore, point identification can be achieved for the two-type case
under Assumptions 1 and 2 when one type dominates in the left tail of
the default distribution and the other type dominates in the right tail.
(12) This is satisfied in our case by the restriction that the error
terms $u_{ij,H}$ and $u_{ij,L}$ in Equation (2) follow the same
distribution but $\alpha_H < \alpha_L$. We can then formulate the
following argument.
ARGUMENT 1. Under Assumptions 1 and 2 and $\alpha_H < \alpha_L$, the
two-type mixture structure summarized in Equations (4) and (5) is
uniquely identified.
Appendix A in Appendix S1 (Supporting Information) presents a
simple simulation exercise to better illustrate the advantages of using
a mixture structure such as the one described above when evaluating
default behavior with heterogeneous agents, compared to a standard logit
model. The exercise shows that even in the absence of heterogeneity, a
mixture structure can provide both a higher predictive performance and
more accurate marginal effects (i.e., the effect of changes in a
covariate on the probability of defaulting) than a logit model, although
the bias in the marginal effects is not fully eliminated.
III. AN APPLICATION TO A GROUP LENDING PROGRAM IN INDIA
Next, we implement the proposed two-type model using data from a
group lending program in India. We first describe the dataset and then
present the estimation results.
A. Data
The groups under study are located in the state of Andhra Pradesh
in India. (13) They are organized following a recent self-help group
(SHG) model promoted by the World Bank, which targets poor women in
rural areas and combines savings generation and microlending with social
mobilization. In this program, women who generally live in the same
village or habitat voluntarily form SHGs. A typical SHG consists of
10-20 members who meet regularly to discuss social issues and
activities. During the group meetings, each member also deposits a small
thrift payment into a joint bank account. Once enough savings have been
accumulated, group members can apply for internal loans that draw from
the accumulated savings at an interest rate to be determined by the
group. After the group establishes a record of internal savings and
repayment, it becomes eligible for loans through a commercial bank or
program funds. (14)
The group as a whole, then, borrows from a commercial bank or
program funds; all group members are held jointly liable for the debts
of the others. The group generally allocates the loan to its members on
an equal basis, and the group is not eligible for further loans unless
it has made full repayment. (15) In this study, we focus on the first
"expired" loan borrowed from commercial banks by each group.
An "expired" loan refers to a loan that had passed its due
date by the time the survey was conducted.
The working sample includes 1,110 different group loans which were
allocated to a total of 12,833 members. The data are from an SHG survey
conducted between August and October 2006 in eight districts in Andhra
Pradesh, which were chosen to represent the state's three
macro-regions (Rayalaseema, Telangana, and Coastal Andhra Pradesh). The
SHG survey contains socioeconomic characteristics of group members
(households) such as education background, housing condition, land and
livestock ownership, occupation, and caste. It also includes group
characteristics such as age, meeting frequency of members, and programs
and services available within the group. In addition, the survey
directly recorded from SHG account books the information regarding all
group loans that were taken between June 2003 and June 2006. The
information includes the terms of each loan, the group members to whom
the loan was allocated, and how much of the loan had been repaid by each
member at the time of the survey. (16)
The SHG survey was complemented with a previous village survey that
covered all the villages from which the SHGs were sampled. We use this
database to construct four indicators to account for the economic
environment at the village level, including availability of a financial
institution, public bus, telephone, and post office.
Table 1 presents descriptive statistics of the full sample. (17)
The top panel (Panel 1) reports member characteristics based on 12,833
observations, while the bottom panel (Panel 2) reports group and loan
characteristics based on 1,110 observations. The group characteristics
are determined prior to the start of the loan. Approximately 23% of the
group members are literate, 31% belong to a scheduled tribe or scheduled
caste, and about 65% own some land. About 61% are agricultural laborers
who do not own land or own such a small amount of land that they have to
provide agricultural labor for others, 20% are self-employed
agricultural workers, and the rest have other occupations. We observe
that 80% of the group members in our sample fully repaid their loan by
its due date (i.e., did not default).
Turning to the group and loan characteristics, the groups range
from 7 to 20 members and have close to 13 members on average. In roughly
nine of every ten groups, the members meet on a regular basis (at least
monthly). About 28% of the groups have a food credit program (in-kind
credit for subsidized rice), 15% have a marketing program, and 25% have
an insurance program. The average loan size received by a group member
is 3,338 rupees (about 67 USD). The annual rate of interest is about
12.8%, which is much lower than the prevailing rate of moneylenders in
India. The average duration of a loan is roughly 1 year, and the vast
majority of loans required the groups to make repayments at least
monthly.
Preliminary Analysis. A first look at the data is indicative of a
bimodal repayment distribution. Table 2 shows that in more than nine out
of every ten groups in our sample, either none of the members defaults
or all of them default. In particular, in 76% of the groups (848 out of
1,110 groups), all of the group members fully repaid their loans or
never defaulted; in another 17% of the groups (188 groups), all of the
members defaulted. As discussed earlier, this repayment behavior may
result from a combination of unobservable group factors. We can think,
then, of two apparent group types: "responsible" and "irresponsible"
groups. (18)
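The classification behind Table 2 amounts to checking, group by group, whether no member, some members, or all members defaulted. A minimal sketch with toy records follows; the function name and data are illustrative.

```python
from collections import defaultdict


def classify_groups(member_records):
    """Count groups where no member, some members, or all members defaulted.

    member_records: iterable of (group_id, defaulted) pairs with defaulted in {0, 1}.
    """
    outcomes = defaultdict(list)
    for group_id, defaulted in member_records:
        outcomes[group_id].append(defaulted)
    counts = {"none_default": 0, "some_default": 0, "all_default": 0}
    for flags in outcomes.values():
        if not any(flags):
            counts["none_default"] += 1
        elif all(flags):
            counts["all_default"] += 1
        else:
            counts["some_default"] += 1
    return counts


# Toy records, not the survey data.
records = [("g1", 0), ("g1", 0), ("g2", 1), ("g2", 1), ("g3", 0), ("g3", 1)]
counts = classify_groups(records)
print(counts)
```

Groups in the first and last categories have no within-group variation in default, which is why, as noted in Section II, a fixed-effect logit would drop them; by the Table 2 counts, that would be 848 + 188 = 1,036 of the 1,110 groups.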
To further examine the possibility of homogeneous sorting among
groups, Table B2 of Appendix B in Appendix S1 reports the number of
groups in which the intragroup variance is less than or equal to the
total variance, considering all groups in the same village and mandal
for different borrower characteristics. (19) The characteristics include
literacy, household characteristics, land ownership, occupation, and
caste. The results show that individuals with similar observable
characteristics appear to group together. On average, in 70-72% of the
cases, the intragroup variance for a given characteristic is smaller
than the intravillage or intramandal variance. There is a relatively
higher degree of homogeneity among group members in terms of belonging
to a scheduled tribe or caste and being a self-employed agricultural
worker.
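The homogeneity comparison reported in Table B2 can be sketched for a single characteristic as below. This is an assumption about the computation, not the authors' code: it uses population variances, assumes each group lies within one area (village or mandal), counts ties as "less than or equal," and runs on toy records. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def share_homogeneous_groups(records):
    """Share of groups whose within-group variance of a characteristic is less
    than or equal to the variance across all members of the group's area.

    records: iterable of (area_id, group_id, value); area = village or mandal.
    Assumes every group lies within a single area.
    """
    by_group, by_area, group_area = defaultdict(list), defaultdict(list), {}
    for area_id, group_id, value in records:
        by_group[group_id].append(value)
        by_area[area_id].append(value)
        group_area[group_id] = area_id
    n_homogeneous = sum(
        pvariance(vals) <= pvariance(by_area[group_area[g]])
        for g, vals in by_group.items()
    )
    return n_homogeneous / len(by_group)


# Toy data for one binary characteristic (e.g., scheduled caste membership).
records = (
    [("v1", "a", 1), ("v1", "a", 1), ("v1", "b", 0), ("v1", "b", 0)]
    + [("v2", "e", 0), ("v2", "e", 1)]
    + [("v2", "f", 0)] * 6
)
print(share_homogeneous_groups(records))
```

In this toy example, group "e" mixes both values in a village where the characteristic is rare, so its within-group variance exceeds the village variance, and three of the four groups count as more homogeneous than their village.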
B. Estimation Results
Table 3 shows the estimation results of the mixture model proposed.
The model allows for two group types (type H and type L), and the
repayment decision is conditional on the unobserved type, where the
marginal effects of the member and loan characteristics may vary by
type. The average member characteristics and other group and village
controls help, in turn, to identify the group types. (20)
Several important patterns emerge from this table. First, the
conditional probability of default is considerably different between the
two group types, as reported at the bottom of the table. More
specifically, the estimated probability of default conditional on being
in a group of type-H individuals is 9.5% versus 62.8% in a group of
type-L individuals. Hence, the model clearly distinguishes two group
types: one "responsible" type (type H) and another
"irresponsible" type (type L). The former group is likely
composed of "low-risk" individuals with a strong social
cohesion and/or effective enforcement, while the latter group is likely
composed of "high-risk" individuals with a weak social
cohesion and ineffective enforcement.
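Equation (4a) gives a quick consistency check on these estimates. Weighting the two conditional default rates by the sample-average probability of being a type-H group (roughly 80%, as reported in the next paragraph) yields an implied default rate close to the roughly 20% of members who did not fully repay by the due date (Table 1). The check is only approximate, because the average of a product is not, in general, the product of the averages.

$$\overline{\Pr}(D_{ij} = 1) \approx 0.80 \times 0.095 + 0.20 \times 0.628 = 0.076 + 0.1256 \approx 0.20$$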
Similarly, the average probability of being a type-H group is
roughly 80% in our sample; interestingly, groups in which all members
pay back their loan exhibit a higher probability of being a type-H group
than other groups. (21) In particular, in groups in which none of the
members defaulted, the likelihood of being a type-H group is 82.9%; this
is compared to 76.4% in groups in which some members defaulted and 66.9%
in groups in which all members defaulted. These results further support
the model's identification of seemingly "responsible" and
"irresponsible" groups.
An analysis of the factors used to describe the probability of
being in a type-H group also indicates that "responsible"
groups are more likely characterized by women who are literate, own some
portion of land, live in semi-pucca houses, participate in agricultural
activities, and belong to a scheduled tribe (but not necessarily to a
leading caste). (22) Similarly, "responsible" groups are more
prone to hold frequent meetings, have a marketing and insurance program
but not a food credit program, and have access to additional services in
the village such as a financial institution and telephone. This suggests
that lenders may want to look for these characteristics when trying to
identify potential "responsible" groups and areas in which to
operate or expand.
Holding frequent meetings appears to be particularly important. This
is in line with Rai and Sjostrom (2004), who emphasized the importance
of information sharing to sustain repayment in group lending. It is also
in line with other studies that suggest that frequent meetings, in
addition to helping peer monitoring and enforcement, may directly
increase social contact and reduce lending risks. Feigenberg, Field, and
Pande (2013) showed, for instance, that repeated interactions can
facilitate cooperation by allowing individuals to sustain reciprocal
economic ties; Gine and Karlan (2014) found that groups with stronger
social networks are less likely to experience default problems after
removing joint liability. (23) The existence of other programs in the
group (like marketing and insurance programs) could also stimulate
social cooperation and strengthen social ties, in addition to providing
additional services to members, thereby increasing risk-sharing among
members. (24)
Figure B2 of Appendix B in Appendix S1 provides additional support
for the correct identification of "responsible" and
"irresponsible" groups, based on observed behavior patterns in
the data. For example, the probability of being a type-H
("responsible") group is positively correlated with the
proportion of literate women in the group; a closer look at the data
shows that among groups in which more than half of the women are
literate, there is a higher proportion of groups with no members
defaulting (82%) and a lower proportion of groups with all members
defaulting (13%), compared to groups in which less than half of the
women are literate (76% and 17%, respectively). The differences are more
pronounced when comparing the distribution of intragroup default
behavior between groups with high- and low-meeting frequencies. Among
groups that hold at least monthly meetings, which is also distinctive of
type-H groups, the proportions of groups with no members defaulting and
all members defaulting are 80% and 14%; among groups that hold less than
monthly meetings, the corresponding proportions are 48% and 41%. (25)
These findings suggest that several of the factors included in the
type-probability equation help to identify potential group types and, in
particular, that the types in the model are not purely identified by
functional form. We further discuss the model identification below.
Conditional Marginal Effects. Another important pattern that
emerges from Table 3 is the difference in direction and statistical
significance of several of the parameter estimates in the default
equation between the two group types. This suggests that the factors
driving individual repayment behavior may vary by type. Table 4 shows
the conditional marginal effects (evaluated at the sample means) for the
different individual and loan characteristics included in the repayment
equation after accounting for the group type; that is, the estimated
effect of a change in each covariate on the probability of defaulting,
conditional on being of a certain group type and keeping all else equal.
(26)
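For a logistic $F$, the conditional marginal effects at the sample means take a simple form: $\beta^K_k F(z)(1 - F(z))$ for a continuous covariate, with $z = \alpha_K + \bar{x}\beta^K$, and a discrete change $F(z \mid x_k = 1) - F(z \mid x_k = 0)$ for a binary one. The text does not state how binary covariates were handled, so the discrete-change treatment below is an assumption; the function name and the coefficients in the usage example are hypothetical.

```python
import numpy as np


def logistic_cdf(z):
    """Logistic CDF F(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def conditional_marginal_effects(coef, x_mean, binary):
    """Marginal effects on Pr(default | type K) at the sample means (logit link).

    coef: [alpha_K, beta_K...]; x_mean: covariate means (no constant);
    binary: flags; binary covariates use the discrete change from 0 to 1 (assumption).
    """
    coef = np.asarray(coef, float)
    x_mean = np.asarray(x_mean, float)
    alpha, beta = coef[0], coef[1:]
    p = logistic_cdf(alpha + x_mean @ beta)
    effects = np.empty_like(beta)
    for k, is_binary in enumerate(binary):
        if is_binary:
            x1, x0 = x_mean.copy(), x_mean.copy()
            x1[k], x0[k] = 1.0, 0.0
            effects[k] = logistic_cdf(alpha + x1 @ beta) - logistic_cdf(alpha + x0 @ beta)
        else:
            effects[k] = beta[k] * p * (1.0 - p)  # derivative of F at the means
    return effects


# Hypothetical type-K coefficients [alpha_K, beta_continuous, beta_binary].
me = conditional_marginal_effects([0.0, 1.0, 2.0], [0.0, 0.5], [False, True])
print(me)
```

Running the same function with the type-H and the type-L coefficients is what produces the type-specific effects discussed next.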
We do not observe major changes in the probability of default among
type-H group members after a change in most of the individual
covariates; being a self-employed agricultural worker and living in a
pucca house both decrease the probability of default by roughly 3 and 1
percentage points, respectively, while owning some portion of land
increases the likelihood of defaulting by less than 1 percentage point.
Among type-L group members, in contrast, being a self-employed
agricultural worker increases the probability of default by 14
percentage points; being an agricultural laborer also substantially
increases the likelihood of defaulting (29 percentage points), as does
belonging to a scheduled caste (31 percentage points). Owning some
portion of land or living in either pucca or kutcha houses (relative to
semi-pucca houses), in turn, decreases the probability of default by
8-16 percentage points.
Regarding the loan covariates, monthly (or higher) repayment
frequencies and an additional member receiving a loan decrease the
likelihood of defaulting by 3 and 0.2 percentage points, respectively,
among type-H group members; among type-L group members, the
corresponding decreases are 26 and 5 percentage points, respectively.
An increase in the loan amount, interest rate, and loan duration also
results in a much higher increase in the probability of default among
type-L group members than among type-H group members.
These varying effects by type can help lenders to better assess
their clients and understand the factors driving their behavior. Land
ownership, housing conditions, labor activities, and membership in a
scheduled caste seem to matter among type-L groups, in contrast to
type-H groups, for which the effects of these factors (if any) are much
more limited. The loan characteristics are also more relevant for type-L
groups than for type-H groups. These differences can help lending
institutions to reduce their transaction costs by offering
differentiated contracts based on group types.
Field and Pande (2008), for example, point out the trade-off
between higher repayment frequencies (a standard practice among
microfinance institutions to encourage fiscal discipline and reduce
default risk) and a substantial increase in transaction costs of
installment collection. The authors find that switching to lower
frequency repayment schedules could allow lenders to significantly
reduce their transaction costs with virtually no increase in client
default, particularly among first-time borrowers. Our results suggest
that the fiscal discipline imposed by frequent repayment is critical
among groups suspected (or with a higher probability) of being type-L
groups, but is not important for type-H groups, for which less costly
repayment schedules could be implemented; the cost savings are likely
higher than the (marginal) increase in the default rate in this type of
group. Promoting longer-term investments through longer loan terms also
seems more reasonable among type-H groups, which could improve the
borrowers' repayment capacity in the long run (similarly to a more
flexible repayment schedule).
Encouraging additional members to receive a loan also seems to be
more relevant among groups suspected of being type-L groups. As
indicated by Armendariz de Aghion (1999), a larger group size tends to
increase peer monitoring and pressure efforts due to joint
responsibility, cost-sharing, and commitment effects for debt repayment,
although this positive effect could be offset by the increase in the
scope of free riding and higher coordination costs in considerably large
groups. The results by group type suggest that among type-L groups, the
stronger peer monitoring and pressure effects could outweigh the higher
coordination costs of having additional members in the group.
Unconditional Marginal Effects. We can also compare the parameter
estimates of the type-varying model with those obtained from a standard
probit regression. The two models are expected to produce
different results, as the mixture model permits us to better account for
the inherent (unobserved) group heterogeneity and reduce (but not
eliminate) the endogeneity bias. Table 5 reports the unconditional
marginal effects (evaluated at the sample means) on the probability of
default resulting from the probit, two-type, and three-type models. (27)
We include the results of the three-type model for comparison with the
two-type model. Note that in the type-varying models, the average member
characteristics and other group and village characteristics affect the
likelihood of default through the probability of being in a particular
group type.
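A minimal sketch of how unconditional marginal effects can be assembled in a two-type mixture follows. It assumes, for illustration only, logistic links in both the repayment and the type equations and hypothetical coefficients. The unconditional default probability is $\pi_H F(x'\beta_H) + (1-\pi_H) F(x'\beta_L)$: individual and loan covariates act through both type-specific probabilities, while group and village covariates act only through the type probability $\pi_H$.

```python
import math


def logistic(z):
    """Assumed logistic CDF, used for both equations in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def unconditional_effects(x_bar, z_bar, beta_h, beta_l, gamma):
    """Two-type mixture: Pr(D=1) = pi_H F(x'b_H) + (1 - pi_H) F(x'b_L).

    x_bar:          means of individual/loan covariates (x_bar[0] = 1.0)
    z_bar:          means of group/village covariates in the type equation (z_bar[0] = 1.0)
    beta_h, beta_l: type-specific repayment coefficients
    gamma:          type-equation coefficients, with pi_H = logistic(z'gamma)
    """
    pi_h = logistic(dot(z_bar, gamma))
    p_h = logistic(dot(x_bar, beta_h))
    p_l = logistic(dot(x_bar, beta_l))
    prob = pi_h * p_h + (1.0 - pi_h) * p_l
    # Individual/loan covariates: type-probability-weighted conditional effects.
    dx = [pi_h * p_h * (1.0 - p_h) * bh + (1.0 - pi_h) * p_l * (1.0 - p_l) * bl
          for bh, bl in zip(beta_h[1:], beta_l[1:])]
    # Group/village covariates: act only through the probability of being type H.
    dz = [pi_h * (1.0 - pi_h) * g * (p_h - p_l) for g in gamma[1:]]
    return prob, dx, dz
```

The second list makes explicit why a group or village characteristic can shift the unconditional default probability even though it does not enter the repayment equation: its effect is scaled by the gap $F(x'\beta_H) - F(x'\beta_L)$ between the two types.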
Two patterns are worth noting. First, the resulting marginal
effects of the two- and three-type models are relatively similar. While
this may indicate stability in the estimates when moving to a mixture
setup, it can also result from the fact that the predicted probability
of the third group type is very close to zero in the three-type
specification, which is consistent with the finding that a two-type
model provides a better fit. Second, the probit and type-varying
models produce different marginal effects. For example,
being an agricultural laborer or belonging to a scheduled caste
increases the overall probability of default by roughly 4 percentage
points in the two-type model (all else equal), while in the probit
model, the change in probability is not significant; a similar pattern
is observed for the condition of living in pucca houses or being
self-employed agricultural workers, both of which decrease the overall
probability of default by 3 and 1 percentage points, respectively, in
the type-varying model and are not significant in the probit model.
Similarly, monthly (or higher) repayment frequencies decrease the
likelihood of defaulting by 6 percentage points in the two-type model
and by 7 percentage points in the probit model, while an additional year
in the length of the loan increases the likelihood of defaulting by
4 percentage points in the two-type model and by more than 8 percentage
points in the probit model.
From all models, however, the importance of holding frequent
meetings among group members to improve individuals' performance on
loan repayments becomes clear. In groups in which members meet at least
monthly, the individual probability of default is 30 percentage points
lower in the probit model and 45 percentage points lower in the
type-varying model than in groups in which members meet less often.
Frequent meetings may promote higher social interactions and result in
stronger peer monitoring and pressure. Both models also suggest that
defaulting is negatively correlated with promoting marketing and
insurance programs among group members and positively correlated with
subsidized food credit programs, which are also distinctive of poorer
groups. (28)
In sum, the results show the importance of having a flexible,
type-varying model, which further allows for varying effects by type and
provides better insight about the possible factors affecting the
members' repayment behavior.
Predictive Performance. We now analyze whether allowing for
different group types yields better out-of-sample predictions for the
probability of default. We want to examine whether the proposed
type-varying model has a higher predictive power than standard
probabilistic methods, which can further help to reduce information
asymmetries in microlending, especially in the absence of experimental
or quasi-experimental settings. To conduct the performance assessment,
we follow a standard cross-validation procedure and randomly partition
our dataset into a design sample for model estimation (60% of the
observations) and a test sample for further analysis (40% of the
observations). This exercise permits us to better approximate how the
models will perform in practice when using new information sets. The
partition is conducted at the group level and both samples maintain the
population proportions of default and non-default cases.
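The group-level partition can be sketched as follows. The stratification rule is an assumption for illustration: groups are classified by their observed default pattern (no member, all members, or some members defaulting) and 60% of the groups in each stratum are drawn into the design sample, so that whole groups stay together and the default mix is approximately preserved in both samples. The function name and data layout are hypothetical.

```python
import random
from collections import defaultdict


def group_stratified_split(groups, design_share=0.6, seed=0):
    """Split groups (not individuals) into design and test samples.

    groups: dict mapping a group id to the list of its members' default
            indicators (1 = default, 0 = full repayment)
    Returns (design_ids, test_ids) as sets of group ids.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for gid, defaults in groups.items():
        share = sum(defaults) / len(defaults)
        key = "none" if share == 0 else "all" if share == 1 else "mixed"
        strata[key].append(gid)
    design, test = set(), set()
    for key in sorted(strata):
        ids = sorted(strata[key])  # sort first so the seed fully determines the draw
        rng.shuffle(ids)
        n_design = round(design_share * len(ids))
        design.update(ids[:n_design])
        test.update(ids[n_design:])
    return design, test
```

Calling the function with different seeds yields repeated random partitions; with `design_share=0.7` or `0.5` it produces the alternative splits.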
Table 6 provides performance indicators for the different models
estimated. (29) The indicators include the mean squared prediction error
and several performance indicators based on the conversion of the
estimated default probabilities to a binary regime prediction using the
standard 0.5 rule. (30) For the two-type model, the performance
assessment is based on two alternative calculations of the probability
of default. Generally speaking, a lender could evaluate a potential loan
based on the estimated unconditional probability of default or based on
the conditional probability of default, depending on the likelihood of
being in a certain group type. Hence, different mixtures for estimating
the probability of default could be used.
The two approaches considered are:
1. a "naive" approach that only uses the unconditional
probability of default, such that
[mathematical expression not reproducible].
2. a "conservative" approach which takes into account the
likelihood of being in a type-H group. In particular,
[mathematical expression not reproducible]
where $\widehat{\Pr}(T^*_j = T^H_j)$ is the estimated
probability of being in a type-H group.
As shown in the table, the "naive" and
"conservative" approaches report a lower mean squared prediction
error than the probit model (0.145 and 0.156 vs. 0.159). The two-type
approaches also show a higher overall predictive performance based on
McFadden, Puig, and Kirschner's (1977) standard measure. (31) The
"naive" approach has a predictive performance of 76.4% and the
"conservative" approach has a predictive performance of 76%,
compared with 74.7% for the probit model. The poorer performance of the
probit model is largely explained by its lower correct default
classification rate (i.e., identification of "bad" borrowers):
17.2% versus 21.9% for the "naive" approach and 31.3% for the
"conservative" approach. Regarding the correct nondefault
classification rate (i.e., identification of "good"
borrowers), the probit model performs better than the
"conservative" approach but worse than the "naive"
approach.
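The indicators compared above can be computed from observed outcomes and predicted default probabilities as sketched below. It applies the 0.5 rule (predict default when the estimated probability is at least 0.5) and McFadden, Puig, and Kirschner's measure $p_{11} + p_{22} - p_{12}^2 - p_{21}^2$, where $p_{ij}$ is the share of observations in cell $ij$ of the 2 x 2 matrix of actual versus predicted outcomes. The function name and dictionary keys are illustrative, and the sketch assumes both outcomes occur in the test sample.

```python
def performance(y, p, cutoff=0.5):
    """Out-of-sample indicators for predicted default probabilities.

    y: observed outcomes (1 = default, 0 = full repayment)
    p: predicted probabilities of default for the same borrowers
    """
    n = len(y)
    mspe = sum((yi - pr) ** 2 for yi, pr in zip(y, p)) / n
    # Binary prediction: default when the probability is >= cutoff.
    pred = [1 if pr >= cutoff else 0 for pr in p]
    counts = [[0, 0], [0, 0]]  # counts[actual][predicted]
    for a, b in zip(y, pred):
        counts[a][b] += 1
    s = [[c / n for c in row] for row in counts]  # shares of all entries
    mcfadden = s[0][0] + s[1][1] - s[0][1] ** 2 - s[1][0] ** 2
    return {
        "mspe": mspe,
        "mcfadden": mcfadden,
        # Share of actual defaulters predicted to default ("bad" borrowers found).
        "correct_default": counts[1][1] / (counts[1][0] + counts[1][1]),
        # Share of actual nondefaulters predicted to repay ("good" borrowers found).
        "correct_nondefault": counts[0][0] / (counts[0][0] + counts[0][1]),
    }
```

For the two-type model, the same function can be fed either the "naive" or the "conservative" default probabilities, which is how the two approaches can be compared with the probit model on a common footing.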
An alternative way to evaluate the out-of-sample performance
consists of examining the number of "good" clients that the
model rates as "bad" (Type I error) and the number of
"bad" clients that the model rates as "good" (Type
II error) for varying cut-off values of the probability of default. In
Table 6, we used the standard 0.5 rule for the performance assessment,
but a lender may consider alternative threshold rules. Figures 1 and 2
compare the percentage of "good" borrowers rejected and the
percentage of "bad" borrowers accepted across the probit,
"naive", and "conservative" approaches for different
cut-off values.
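The two error rates plotted against the cut-off value can be traced as in the sketch below, under the convention that a borrower is rejected (rated "bad") when the predicted default probability is at or above the cut-off. The function name and the output layout are illustrative.

```python
def error_curves(y, p, cutoffs):
    """Percentage of "good" borrowers rejected (Type I) and of "bad"
    borrowers accepted (Type II) for each cut-off value.

    y: observed outcomes (1 = default, 0 = full repayment)
    p: predicted probabilities of default
    """
    good = [pr for yi, pr in zip(y, p) if yi == 0]
    bad = [pr for yi, pr in zip(y, p) if yi == 1]
    rows = []
    for c in cutoffs:
        type_1 = 100.0 * sum(pr >= c for pr in good) / len(good)  # good rated bad
        type_2 = 100.0 * sum(pr < c for pr in bad) / len(bad)     # bad rated good
        rows.append((c, type_1, type_2))
    return rows


# Usage: sweep cut-offs from 0.05 to 0.95 for one model's predictions.
grid = [round(0.05 * k, 2) for k in range(1, 20)]
```

Raising the cut-off makes the acceptance rule more lenient: fewer "good" borrowers are rejected, but more "bad" borrowers are accepted, which is the trade-off the two figures display.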
In the case of Type I errors, the "naive" approach and
the probit model outperform the "conservative" approach for
most of the cut-off values. More specifically, for cut-off values above
0.1, the lending institution will do better in identifying
"good" clients by relying on the "naive" approach or
the probit model. In the case of Type II errors, however, both the
"naive" and the "conservative" approach outperform
the probit model for basically the entire range of cut-off values; for
values above 0.3, the "conservative" approach has a
considerably higher (and increasing) performance than the
"naive" approach. For sufficiently lenient acceptance rules
(cut-off values above 0.5), the differences in the percentage of
"bad" borrowers accepted between the "conservative"
approach and the other models are on the order of 10-23 percentage
points.
Overall, we generally attain a higher predictive power by
allowing for unobserved group types when modeling the probability of
default of group members. The proposed model can thus aid lenders to
allocate their resources more efficiently by better identifying and
selecting current and future clients (groups). If the lending
institution is more interested in reducing its default rates (i.e., by
minimizing the number of "bad" clients classified as
"good"), the lender should probably follow a
"conservative" approach. In contrast, if the lender is more
interested in increasing its pool of "good" borrowers (i.e.,
by minimizing the number of "good" clients classified as "bad"),
it should follow a "naive" approach, although the probit model
will also perform well in this case. However, for more lenient
acceptance rules, using a "naive" approach or probit model
will also result in a much higher acceptance rate of "bad"
clients relative to the "conservative" approach. (32)
Model Identification. Finally, we formally evaluate the
identification of the model. As noted previously, a direct implication
of the type-varying model is that we require some but not all
information about the factors describing group heterogeneity
($T^*_j$) to identify the parameters in the main repayment equation.
If the model is correctly identified, a partial set of the observable
characteristics ($\bar{X}_j$, $G_j$) used in the type Equation
(3) should produce estimated coefficients in the repayment Equation (2)
similar to those produced by a full set of these variables.
Table 7 reports the corresponding Hausman test results when
comparing our baseline model that includes the full set of variables in
$\bar{X}_j$ and $G_j$ versus alternative specifications that
exclude some of these variables. We use a Hausman-type specification test
because it is a standard test, although we acknowledge that its
statistical power may be low in some cases. (33) To make the test more
rigorous, we exclude different sets of variables instead of individual
variables. The coefficients of both the individual and the loan
characteristics, included in the repayment equation, are generally not
too sensitive to the exclusion of different sets of variables in the
group-type equation. In all cases, there are no major systematic
differences (at a 5% level of significance) between the estimated
coefficients in the repayment equation across the different models. (34)
This exercise supports the robustness of the estimated mixture model.
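For reference, a textbook Hausman-type statistic comparing the repayment-equation coefficients from two specifications is $H = (b_{alt} - b_{base})'(V_{alt} - V_{base})^{-1}(b_{alt} - b_{base})$, asymptotically chi-square with as many degrees of freedom as compared coefficients. The sketch below assumes, as in the standard test, that the baseline estimator is efficient under the null so that $V_{alt} - V_{base}$ is positive semidefinite; it is an illustration, not the authors' implementation, and uses a pseudo-inverse in case the covariance difference is singular.

```python
import numpy as np


def hausman_stat(b_base, v_base, b_alt, v_alt):
    """Hausman-type statistic for the coefficients of the repayment equation.

    b_base, v_base: coefficients and covariance from the baseline model
                    (assumed efficient under the null)
    b_alt, v_alt:   the same coefficients and covariance from a specification
                    that drops some variables from the type equation
    Returns (statistic, degrees of freedom).
    """
    d = np.asarray(b_alt, dtype=float) - np.asarray(b_base, dtype=float)
    v_diff = np.asarray(v_alt, dtype=float) - np.asarray(v_base, dtype=float)
    # The pseudo-inverse guards against a singular covariance difference;
    # the degrees of freedom are then the rank of that difference.
    stat = float(d @ np.linalg.pinv(v_diff) @ d)
    dof = int(np.linalg.matrix_rank(v_diff))
    return stat, dof


# Hypothetical example with two coefficients.
stat, dof = hausman_stat([1.0, 2.0], [[0.01, 0.0], [0.0, 0.02]],
                         [1.1, 1.9], [[0.02, 0.0], [0.0, 0.04]])
```

In this made-up example the statistic is 1.5 with 2 degrees of freedom, below the 5% chi-square critical value of about 5.99, so the hypothesis of no systematic difference would not be rejected.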
IV. CONCLUDING REMARKS
This paper proposes and implements a mixture structure to model
repayment behavior in group lending with unobserved group heterogeneity.
Group-level unobservables may result from a combination of factors,
including peer selection and pressure as well as other elements such as
social cohesion. In the model, individuals make repayment decisions
based on their unobserved group type and observable individual and loan
characteristics. Average member characteristics and other group and
village characteristics help, in turn, to identify the group types. We
also allow the marginal effects in the repayment equation to vary across
types. We discuss the model properties and identification and provide
evidence supporting the robustness of the model.
We implement the model using data from a group lending program in
India. The estimation results support the model specification and show
the advantages of relying on a type-varying method when examining the
probability of default of group members. First, the model clearly
distinguishes two group types: an apparent "responsible" group
with a low probability of default and an apparent
"irresponsible" group with a high probability of default.
Frequent group interactions seem to be the foremost characteristic of
"responsible" groups. Second, we find important differences
across types in the marginal effects of the different characteristics
included in the repayment equation. For example, imposing high-frequency
repayment schedules and shorter loan terms and promoting a larger group
size appear more appropriate for seemingly "irresponsible"
groups. Third, the type-varying model generally shows a higher
predictive performance than standard probabilistic models, particularly
in the identification of potential "defaulters."
The proposed model can attenuate information asymmetries in
microlending by helping lenders to better classify their potential
clients. In particular, the model can help microfinance institutions to
decrease the default rates they face by reducing the inclusion of
potential "bad" borrowers and, to a minor extent, by
increasing the inclusion of "good" borrowers who are left out
in sensitive microcredit markets. In addition, the model can help
lenders to better understand the potential factors driving the repayment
behavior of different group members. Understanding these different
factors can aid lenders in the design of loan contracts for different
"types" of clients. By doing so, microfinance providers can
allocate resources more efficiently and reduce the high transaction
costs they face.
It is worth noting that the analysis has focused on a two-type
model, given the nature of the data used in the application. Certainly,
there can be a wider set of types in other contexts; the proposed model
can be easily adapted to allow for additional types. Considerably
increasing the number of types may, however, require the imposition of
restrictions on the value of the coefficients in the repayment equation
(e.g., not necessarily allowing for different marginal effects across
all types) in order to avoid a highly parameterized model, which could
be difficult to estimate in practice. Our analysis also follows a
discrete treatment of the repayment decision, given the observed
behavior of most of the borrowers in the sample (either full repayment
or no payment). However, the model can be modified to examine instead
the share of the loan repaid by members. Last, as opposed to several
other studies on group repayment, we take advantage of member-level
data, which are often difficult to obtain. The proposed model can also be
used in a similar manner to model group repayment using group-level
data.
ABBREVIATIONS
2SLS: Two-Stage Least Squares
CDF: Cumulative Distribution Function
ILV: Instrumental-Like Variables
NDVI: Normalized Difference Vegetation Index
SBIC: Schwarz Bayesian Information Criterion
SHG: Self-Help Group
SQP: Sequential Quadratic Programming
doi: 10.1111/ecin.12541
REFERENCES
Ahlin, C. "Matching for Credit: Risk and Diversification in
Thai Microcredit Groups." BREAD Working Paper No. 251, December,
2009.
Ahlin, C., and R. M. Townsend. "Using Repayment Data to Test
across Models of Joint Liability Lending." Economic Journal,
117(517), 2007, F11-51.
Armendariz de Aghion, B. "On the Design of a Credit Agreement
with Peer Monitoring." Journal of Development Economics, 60(1),
1999, 79-104.
Armendariz de Aghion, B., and J. Morduch. "Microfinance: Where
Do We Stand?" in Financial Development and Economic Growth:
Explaining the Links, edited by C. Goodhart. Basingstoke: Palgrave
Macmillan, 2004.
--. The Economics of Microfinance. Cambridge, MA: MIT Press, 2005.
Banerjee, A., T. Besley, and T. Guinnane. "The Neighbor's
Keeper: The Design of a Credit Cooperative with Theory and a Test."
Quarterly Journal of Economics, 109(2), 1994, 491-515.
Chowdury, P. R. "Group Lending: Sequential Financing, Lender
Monitoring and Joint Liability." Journal of Development Economics,
77(2), 2005, 415-39.
Cull, R., A. Demirguc-Kunt, and J. Morduch. "Financial
Performance and Outreach: A Global Analysis of Leading Microbanks."
Economic Journal, 117(517), 2007, F107-33.
Dong, Y., L. Gan, and Y. Wang. "Residential Mobility,
Neighborhood Effects, and Educational Attainment of Blacks and
Whites." Econometric Reviews, 34(6-10), 2015, 763-98.
Fearon, J. D., M. Humphreys, and J. M. Weinstein. "Can
Development Aid Contribute to Social Cohesion after Civil War? Evidence
from a Field Experiment in Post-Conflict Liberia." American
Economic Review, 99(2), 2009, 287-91.
Feigenberg, B., E. Field, and R. Pande. "The Economic Returns
to Social Interaction: Experimental Evidence from Microfinance."
Review of Economic Studies, 80(4), 2013, 1459-83.
Field, E., and R. Pande. "Repayment Frequency and Default in
Microfinance: Evidence from India." Journal of the European
Economic Association, 6(2-3), 2008, 501-9.
Fox, J. T., and A. Gandhi. "Identifying Heterogeneity in
Economic Choice and Selection Models using Mixture Models." Mimeo,
University of Chicago, 2008.
Gan, L., and M. A. Hernandez. "Making Friends with Your
Neighbors? Agglomeration and Tacit Collusion in the Lodging
Industry." Review of Economics and Statistics, 95(3), 2013,
1002-17.
Gan, L., F. Huang, and A. Mayer. "A Simple Test of Private
Information in the Insurance Markets with Heterogeneous Insurance
Demand." Economics Letters, 136, 2015, 197-200.
Ghatak, M. "Group Lending, Local Information, and Peer
Selection." Journal of Development Economics, 60(1), 1999, 27-50.
--. "Screening by the Company You Keep: Joint Liability
Lending and the Peer Selection Effect." Economic Journal, 110(465),
2000, 601-31.
Gine, X., and D. Karlan. "Group versus Individual Liability:
Short and Long Term Evidence from Philippine Microcredit Lending
Groups." Journal of Development Economics, 107, 2014, 65-83.
Gine, X., P. Jakiela, D. Karlan, and J. Morduch. "Microfinance
Games." American Economic Journal: Applied Economics, 2(3), 2010,
60-95.
Henry, M., Y. Kitamura, and B. Salanie. "Partial
Identification of Finite Mixtures in Econometric Models."
Quantitative Economics, 5(1), 2014, 123-44.
Hermes, N., and R. Lensink. "The Empirics of Microfinance:
What Do We Know?" Economic Journal, 117, 2007, 1-10.
Hermes, N., R. Lensink, and H. Mehrteab. "Peer Monitoring,
Social Ties and Moral Hazard in Group Lending Programmes: Evidence from
Eritrea." World Development, 33(1), 2005, 149-69.
Hu, Y. "Identification and Estimation of Nonlinear Models with
Misclassification Error Using Instrumental Variables: A General
Solution." Journal of Econometrics, 144(1), 2008, 27-61.
Karlan, D. "Social Connections and Group Banking."
Economic Journal, 117, 2007, 52-84.
Keane, M., and K. Wolpin. "The Career Decisions of Young
Men." Journal of Political Economy, 105(3), 1997, 473-522.
Knittel, C., and V. Stango. "Price Ceilings as Focal Points
for Tacit Collusion: Evidence from Credit Cards." American Economic
Review, 93(5), 2003, 1703-29.
Lewbel, A. "Estimation of Average Treatment Effects with
Misclassification." Econometrica, 75(2), 2007, 537-51.
Li, S., Y. Liu, and K. Deininger. "How Important Are
Endogenous Peer Effects in Group Lending? Estimating a Static Game of
Incomplete Information." Journal of Applied Econometrics, 28(5),
2013, 864-82.
Mahajan, A. "Identification and Estimation of Regression
Models with Misclassification." Econometrica, 74(3), 2006, 631-65.
McFadden, D., C. Puig, and D. Kirschner. "Determinants of the
Long-Run Demand for Electricity." Proceedings of the American
Statistical Association (Business and Economics Statistics Section, Part
2), 1977, 109-17.
Paxton, J., D. Graham, and C. Thraen. "Modeling Group Loan
Repayment Behavior: New Insights from Burkina Faso." Economic
Development and Cultural Change, 48(3), 2000, 639-55.
de Quidt, J., T. Fetzer, and M. Ghatak. "Group Lending without
Joint Liability." Journal of Development Economics,
121, 2012, 217-36.
Rai, A., and T. Sjostrom. "Is Grameen Lending Efficient?
Repayment Incentives and Insurance in Village Economies." Review of
Economic Studies, 71(1), 2004, 217-34.
Shankar, S. "Transaction Costs in Group Micro Credit in India:
Case Studies of Three Microfinance Institutions." Centre for
Microfinance, Institute for Financial and Management Research Working
Paper, August, 2006.
Sharma, M., and M. Zeller. "Repayment Performance in
Group-Based Credit Programs in Bangladesh: An Empirical Analysis."
World Development, 25(10), 1997, 1731-42.
Stiglitz, J. "Peer Monitoring and Credit Markets." World
Bank Economic Review, 4(3), 1990, 351-66.
van Tassel, E. "Group Lending under Asymmetric
Information." Journal of Development Economics, 60(1), 1999, 3-25.
Varian, H. "Monitoring Agents with Other Agents." Journal
of Institutional and Theoretical Economics, 146, 1990, 153-74.
Wydick, B. "Can Social Cohesion Be Harnessed to Repair Market
Failure? Evidence from Group Lending in Guatemala." Economic
Journal, 109(457), 1999, 463-75.
Zeller, M. "Determinants of Repayment Performance in Credit
Groups: The Role of Program Design, Intragroup Risk Pooling, and Social
Cohesion." Economic Development and Cultural Change, 46(3), 1998,
599-620.
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online
version of this article:
Appendix A. Exercise using simulated data
Table A1. Model performance using simulated data
Appendix B. Supplementary Tables and Figures
Table B1. Data description
Table B2. Sorting based on observables
Figure B1. Location of villages in Andhra Pradesh and group default
behavior
Figure B2. Distribution of intra-group default behavior by
different group characteristics
(1.) Ghatak (1999, 2000) and van Tassel (1999) showed, for example,
that in a context of individuals with heterogeneous risk types and
asymmetric information (where borrowers know each other's type but
lenders do not), group lending with joint liability will lead to the
formation of relatively homogeneous groups of either safe or risky
borrowers (i.e., positive assortative matching or homogeneous sorting).
The rationale is that while a borrower of any type prefers a safe
partner because of lower expected joint liability payments, safe
borrowers value safe partners more than risky borrowers do because they
repay more often. Ahlin (2009) also found that borrowers will
antidiversify risk within groups in order to lower their chances of
facing liability for group members. Even in the absence of a joint
liability scheme, the unobserved informal risk sharing and social
cohesion among members may result in heterogeneous group types with
different repayment rates (see de Quidt, Fetzer, and Ghatak 2012;
Feigenberg, Field, and Pande 2013; Gine and Karlan 2014).
(2.) Besides mitigating adverse selection through peer screening,
group lending helps alleviate moral hazard behavior and enforce
repayment because members can more closely monitor each other's use
of loans and exert pressure to prevent deliberate default. See Stiglitz
(1990), Varian (1990), Banerjee, Besley, and Guinnane (1994), Armendariz
de Aghion (1999), and Chowdury (2005).
(3.) For instance, we would expect more correlated defaults in
groups with low trust levels among members (i.e., if a partner falls
behind in her payments or defaults, it may induce others to do so in a
context of low trust), whereas we would expect more nondefaults or full
repayments in groups with high trust levels and important peer effects
like peer monitoring.
(4.) In our application, this implies dropping more than 90% of the
observations.
(5.) We did not consider additional type specifications, as the
estimation of models with more than three types presents convergence
issues with our working sample.
(6.) The estimated probability of a third group type is also very
close to zero, as opposed to the other two types (0.00002 vs. 0.80686
and 0.19312). Further details are available upon request.
(7.) This flexibility is similar to Gan and Hernandez (2013), who
allow for varying coefficients across potential collusive and
noncollusive regimes when modeling the pricing and occupancy rate
behavior of hotels under a switching regression framework.
(8.) Additional details regarding the different model
specifications considered are available upon request. We tried including
the standard deviation of the members' characteristics (as proxies
of group bonding) in the type equation, but the model using only average
characteristics provides a better fit. We prefer to limit the number of
regressors for convergence purposes.
(9.) We obtain qualitatively similar results when using a normal
CDF. For the optimization process, we use the sequential quadratic
programming (SQP) iterative method, which is a medium-scale algorithm.
(10.) Mahajan (2006) studies the identification of regression
models with a misclassified binary regressor in a mixture density
context; the existence of ILV is one of the key assumptions of his
study. ILV are assumed to be independent of the misclassified binary
regressor conditional on a set of observed covariates and the true type.
A direct implication of this conditional independence is that ILV only
affect the modeled outcome through the true type.
(11.) Note also that the parameters in Equation (3) may not be
consistently estimated, as $T^*_j$ is determined by both observable
and unobservable factors; however, this does not prevent us from
obtaining consistent estimates of the parameters in the repayment
Equation (2).
(12.) For the case of more than two types, additional assumptions
are required for point identification.
(13.) The data were collected in 2006, before the split of Andhra
Pradesh. The "Andhra Pradesh" referred to throughout this
study includes the two current states of Andhra Pradesh and Telangana.
During the study period, Andhra Pradesh was the fourth largest
state in India by area and the fifth largest by population.
(14.) The process of internal savings and repayments promotes
social interaction among members and also helps to further screen
individuals as some may leave the group prior to obtaining a formal
loan. Groups may also implement nonlending programs such as in-kind
credit for subsidized rice, marketing, and insurance programs.
(15.) If some members fail to repay some installments, the other
members still have the incentive to repay on time, in hope that the
delinquent borrowers will repay their installments on a future date.
Naturally, a woman who maintains a good record and ends up in a group in
which not all members fulfilled their loan obligations may join another
group in the future.
(16.) See Li, Liu, and Deininger (2013) for further details on the
survey instrument and data collection.
(17.) A detailed description of the variables used in the analysis
is provided in Table B1 of Appendix B in Appendix S1.
(18.) The fact that groups in which all members defaulted in our
sample are not concentrated at particular locations also reduces the
possibility of specific weather shocks or other contextual factors in
specific areas explaining the observed default behavior. Figure B1 of
Appendix B in Appendix S1 shows that villages with a high proportion of
groups in which all members default are well dispersed across the eight
districts analyzed. For areas with available weather data (rainfall) and
vegetation information (normalized difference vegetation index or NDVI),
we also did not find any significant correlation between these measures
and default behavior. In particular, we included these variables as
additional regressors in the empirical model for robustness check and
found that they are jointly insignificant.
(19.) The comparisons exclude all villages (150 out of 457) and
mandals (3 out of 97) where there is only one group in the village or
mandal. A mandal is the equivalent of a subdistrict in India and
comprises several villages.
(20.) As noted earlier, the village and group controls are
predetermined before the start of the loan; these variables, however,
are still not required to be fully exogenous to identify the group
types.
(21.) Recall that in our raw data, we observe full repayment by all
members in 76% of the groups, while in another 17% of the groups, all
members default.
(22.) A semi-pucca house is characterized by a combination of
materials generally found in both pucca and kutcha houses (the other two
house types in the sample). A pucca house has walls and roofs made of
burnt bricks, stones, cement concrete, and timber, while a kutcha house
uses hay, bamboo, mud, and grass.
(23.) Ultimately, frequent meetings may proxy for female
empowerment, which could also affect whether the borrowers or group can
command household resources for repayment.
(24.) Fearon, Humphreys, and Weinstein (2009) and Feigenberg,
Field, and Pande (2013) showed the importance of community development
programs in different settings to encourage social cohesion.
(25.) Similar patterns are observed when comparing groups with and
without marketing programs and a financial institution in the village,
which are also correlated with the likelihood of being a type-H group in
the model.
(26.) The normal-based confidence intervals reported for the
estimated marginal effects are based on 200 bootstrap replications and
are bias-corrected. Although not reported, the bootstrap means are
very similar to the estimated marginal effects, which supports the
bootstrap procedure implemented.
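Note 26 does not spell out how the bias correction is applied. The sketch below assumes one common construction, which is not taken from the paper. It subtracts the estimated bootstrap bias from the point estimate. It then centers a normal interval on the corrected value, using the bootstrap standard error. The function name is hypothetical, and the default of 200 replications matches the note. The sketch resamples individual observations; with grouped data, resampling whole groups may be preferable.

```python
import random
import statistics


def bias_corrected_normal_ci(data, statistic, reps=200, z=1.96, seed=0):
    """Normal-based, bias-corrected bootstrap interval for statistic(data).

    Returns (bias-corrected estimate, (lower, upper)). Resamples individual
    observations with replacement (an assumption; see the lead-in).
    """
    rng = random.Random(seed)
    theta_hat = statistic(data)
    boot = [statistic(rng.choices(data, k=len(data))) for _ in range(reps)]
    bias = statistics.mean(boot) - theta_hat  # estimated bootstrap bias
    theta_bc = theta_hat - bias               # bias-corrected point estimate
    se = statistics.stdev(boot)               # bootstrap standard error
    return theta_bc, (theta_bc - z * se, theta_bc + z * se)
```

When the bootstrap mean is close to the point estimate, as the note reports, the estimated bias is small. The corrected interval is then nearly centered on the original estimate.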
(27.) We use a probit model because it provides a better fit and
performance than a logit model or a linear probability model. We consider
both a probit model that only accounts for member and loan
characteristics (simple probit model) and a second model that also adds
average member characteristics and other group and village controls
(full probit model). For comparison purposes, the confidence intervals
of the marginal effects for all models were derived using 200 bootstrap
replications.
(28.) The existence of a financial institution and a telephone in
the village is also highly correlated with positive repayment behavior
under both models.
(29.) The results are based on 200 repeated 60%-40% data partitions.
The results are not sensitive to alternative partitions (70%-30% and
50%-50%).
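The repeated partitioning in note 29 can be sketched as 200 independent random splits. Each model is fitted on the training share and evaluated on the held-out share. This section does not state whether the paper splits individuals or whole groups. The sketch assumes splits over group identifiers, which keeps all members of a group on the same side; this is an assumption, motivated by members sharing a group type. The function name is hypothetical.

```python
import random


def repeated_partitions(unit_ids, reps=200, train_share=0.6, seed=0):
    """Yield `reps` random (train, test) splits of `unit_ids`.

    `unit_ids` would be group identifiers under the group-level assumption.
    """
    rng = random.Random(seed)
    ids = list(unit_ids)
    n_train = round(train_share * len(ids))
    for _ in range(reps):
        shuffled = ids[:]
        rng.shuffle(shuffled)  # fresh random order for each partition
        yield shuffled[:n_train], shuffled[n_train:]
```

Averaging each out-of-sample indicator across the test sets gives figures of the kind reported in Table 6 ("averages reported").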
(30.) If the estimated default probability is greater than or equal
to 0.5, the individual is predicted to default; otherwise, the
individual is predicted not to default.
(31.) The overall performance measure of McFadden, Puig, and
Kirschner (1977) is equal to $p_{11} + p_{22} - p_{12}^2 - p_{21}^2$,
where $p_{ij}$ is the $ij$th entry (expressed as a fraction of the sum
of all entries) in the 2 x 2 confusion matrix of actual versus predicted
(0,1) outcomes using the 0.5 rule.
(32.) For example, for a cut-off value of 0.4, the
"naive" approach outperforms the "conservative"
approach by 3 percentage points in terms of the rejection rate of
"good" clients, while the "conservative" approach
outperforms the "naive" approach by a similar degree in terms
of the acceptance rate of "bad" clients. However, for a
cut-off value of 0.6, the "naive" approach outperforms the
"conservative" approach by 4 percentage points when
identifying "good" clients, while the "conservative"
approach outperforms the "naive" approach by 14 percentage
points when identifying "bad" clients.
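The two rates compared in note 32 can be computed for any cut-off. In this sketch, a client is rejected when the predicted default probability is at or above the cut-off. This mirrors the weak inequality of the 0.5 rule in note 30. "Good" clients are those who did not default, and "bad" clients are those who did. The function name is hypothetical.

```python
def error_rates_at_cutoff(prob_default, defaulted, cutoff):
    """Return (rejection rate of good clients, acceptance rate of bad clients).

    A client is rejected when the predicted default probability is at or
    above `cutoff`; `defaulted` holds the observed 0/1 outcomes.
    """
    good = [p for p, y in zip(prob_default, defaulted) if y == 0]
    bad = [p for p, y in zip(prob_default, defaulted) if y == 1]
    rejected_good = sum(p >= cutoff for p in good) / len(good)
    accepted_bad = sum(p < cutoff for p in bad) / len(bad)
    return rejected_good, accepted_bad
```

Raising the cut-off rejects fewer good clients but accepts more bad ones, which is the trade-off the note describes between the "naive" and "conservative" approaches.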
(33.) We evaluated the statistical power of the Hausman test in a
context similar to ours using simulated data and found a power of 53%-73%
for sample sizes between 1,000 and 15,000 observations. Additional
details are available upon request.
(34.) Naturally, the coefficients are less sensitive when excluding
individual variables from the type equation. Details are available upon
request.
FIGURE 1 Comparison of Type I Errors
FIGURE 2 Comparison of Type II Errors
LI GAN, MANUEL A. HERNANDEZ AND YANYAN LIU*
*We thank Alan de Brauw, Arun Chandrasekhar, Hanming Fang, Dean
Karlan, Thierry Magnac, Carlos Martins-Filho, Eduardo Nakasone, Salvador
Navarro, Petra Todd, Annabel Vanroose, Ruth Vargas-Hill, and seminar
participants at the Winter Meetings of the Econometric Society, Latin
American Meeting of the Econometric Society, China Meeting of the
Econometric Society, Pacific Development Conference, Experimental
Methods in Policy Conference, Peruvian Economic Association Annual
Conference, IFPRI, GRADE, and Universidad de Piura for their helpful
comments. We also thank the staff of the Center for Economics and Social
Studies, particularly Prof. S. Galab, for their support and
collaboration in making the data available. Similarly, we thank Zhe Guo
for his valuable research assistance. Last, we would like to thank
Dietrich Vollrath and two anonymous referees for their many useful
comments. We gratefully acknowledge financial support from the CGIAR
Research Program on Policies, Institutions and Markets and the Private
Enterprise Research Center (PERC) of Texas A&M University.
Gan: Professor, Department of Economics, Texas A&M University
and NBER, College Station, TX 77843. Phone 979-862-1667, Fax
979-847-8757.
Hernandez: Research Fellow, Markets, Trade and Institutions
Division, International Food Policy Research Institute (IFPRI),
Washington, DC 20006. Phone 202-862-5645, Fax 202-467-4439.
Liu: Senior Research Fellow, Markets, Trade and Institutions
Division, International Food Policy Research Institute (IFPRI),
Washington, DC 20006. Phone 202-862-4649, Fax 202-467-4439.
TABLE 1 Summary Statistics
Variable Mean Std. Dev. Min Max
Panel 1: Individual characteristics (12,883 observations)
If defaulted 0.20 0.40 0.00 1.00
If literate 0.23 0.42 0.00 1.00
If disabled member in household 0.06 0.24 0.00 1.00
If owns land 0.65 0.48 0.00 1.00
If lives in pucca house 0.33 0.47 0.00 1.00
If lives in kutcha house 0.22 0.42 0.00 1.00
If self-employed agricultural worker 0.20 0.40 0.00 1.00
If agricultural laborer 0.61 0.49 0.00 1.00
If belongs to scheduled tribe/caste 0.31 0.46 0.00 1.00
If belongs to leading caste 0.92 0.27 0.00 1.00
Panel 2: Group and loan characteristics (1,110 groups)
Average member characteristics
% literate 0.22 0.21 0.00 0.94
% disabled member in household 0.05 0.10 0.00 0.94
% own land 0.59 0.31 0.00 0.95
% live in pucca house 0.32 0.31 0.00 0.95
% live in kutcha house 0.21 0.26 0.00 0.95
% self-employed agricultural worker 0.18 0.30 0.00 0.95
% agricultural laborer 0.56 0.36 0.00 0.95
% belong to scheduled tribe/caste 0.31 0.43 0.00 1.00
% belong to leading caste 0.91 0.14 0.36 1.00
Other group and village characteristics
Age of group (years) 6.44 2.49 1.00 25.00
If group has food credit program 0.28 0.45 0.00 1.00
If group has marketing program 0.15 0.35 0.00 1.00
If group has insurance program 0.25 0.43 0.00 1.00
If group meets at least monthly 0.89 0.31 0.00 1.00
If located in Telangana 0.45 0.50 0.00 1.00
If located in Rayalaseema 0.26 0.44 0.00 1.00
If located in Coastal Andhra Pradesh 0.29 0.45 0.00 1.00
Number of group members 12.52 2.37 7.00 20.00
If financial institution in village 0.34 0.47 0.00 1.00
If public bus in village 0.66 0.48 0.00 1.00
If telephone in village 0.75 0.43 0.00 1.00
If post office in village 0.63 0.48 0.00 1.00
Loan characteristics
Amount of loan (rupees) 3,338 2,685 400 25,000
Number of members with loan 11.61 3.24 2.00 20.00
Annual interest rate (%) 12.83 3.10 6.00 25.00
Length of loan (years) 1.11 0.46 0.17 5.00
If repayment at least monthly 0.96 0.19 0.00 1.00
If loan due in 2004 0.11 0.31 0.00 1.00
If loan due in 2005 0.49 0.50 0.00 1.00
If loan due in 2006 0.40 0.49 0.00 1.00
TABLE 2 Intragroup Default Behavior
Default Behavior  Groups (#)  Groups (%)
If none of the members defaulted 848 76.4
If all of the members defaulted 188 16.9
If some of the members defaulted 74 6.7
Total 1,110 100.0
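The classification behind Table 2 depends only on how many members of each group defaulted. The sketch below assumes each group is given as a list of its members' 0/1 default indicators; this layout and the function names are hypothetical. It recomputes the percentage column from the group counts.

```python
from collections import Counter


def classify_group(member_defaults):
    """Label a group 'none', 'all', or 'some' by how many members defaulted."""
    n_default = sum(member_defaults)
    if n_default == 0:
        return "none"
    if n_default == len(member_defaults):
        return "all"
    return "some"


def tabulate(groups):
    """Map each label to (number of groups, percent of all groups, 1 decimal)."""
    counts = Counter(classify_group(g) for g in groups)
    total = len(groups)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
```

For instance, feeding in 848, 188, and 74 groups of the three kinds returns 76.4%, 16.9%, and 6.7%, the shares in Table 2.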
TABLE 3 Probability of Default, Two-Type Model
Variable  Type H Coefficient (Std. Error)  Type L Coefficient (Std. Error)
Dependent variable: If default
Constant -3.399 (0.629) 7.775 (28.740)
If literate 0.160 (0.105) 0.540 (0.206)
If disabled member in household 0.258 (0.163) -0.263 (0.383)
If owns land 0.180 (0.119) -0.556 (0.181)
If lives in pucca house -0.198 (0.122) -0.997 (0.186)
If lives in kutcha house 0.022 (0.124) -0.844 (0.209)
If self-employed agricultural worker -0.593 (0.184) 1.173 (0.266)
If agricultural laborer 0.120 (0.140) 1.748 (0.155)
If belongs to scheduled tribe/caste 0.082 (0.110) 2.736 (0.279)
If belongs to leading caste -0.092 (0.163) 0.260 (0.383)
Amount of loan (1,000 rupees) 0.068 (0.016) 0.462 (0.049)
Number of members with loan -0.062 (0.090) -0.338 (0.151)
Number of members with loan squared 0.001 (0.004) 0.003 (0.007)
Annual interest rate (%) 0.083 (0.013) 0.277 (0.034)
Length of loan (years) 0.508 (0.081) 0.963 (0.193)
If repayment at least monthly -0.497 (0.244) -10.989 (5.515)
If loan due in 2005 -1.267 (0.435) -0.128 (0.287)
If loan due in 2006 1.052 (0.189) 1.229 (0.286)
Probability of type-H group
Constant -2.901 (2.501)
% literate 1.921 (0.409)
% disabled member in household 1.630 (0.777)
% own land 0.707 (0.212)
% live in pucca house -1.124 (0.276)
% live in kutcha house -1.052 (0.228)
% self-employed agricultural worker 0.697 (0.323)
% agricultural laborer 1.902 (0.318)
% belong to scheduled tribe/caste 0.623 (0.167)
% belong to leading caste -1.020 (0.496)
Age of group (years) 0.025 (0.066)
Age of group squared -0.004 (0.004)
If group has food credit program -0.951 (0.115)
If group has marketing program 1.688 (0.277)
If group has insurance program 0.443 (0.139)
If group meets at least monthly 3.105 (0.223)
If located in Telangana 2.320 (0.255)
If located in Rayalaseema 0.652 (0.211)
Number of group members 0.132 (0.360)
Number of group members squared -0.014 (0.014)
If financial institution in village 0.979 (0.139)
If public bus in village 0.139 (0.117)
If telephone in village 1.076 (0.168)
If post office in village -0.684 (0.130)
Predicted probability of being type-H group
Average 79.8%
Groups, no members defaulting 82.9%
Groups, all members defaulting 66.9%
Groups, some members defaulting 76.4%
Predicted individual default probability
Average 19.6%
Conditional on being in type-H group 9.5%
Conditional on being in type-L group 62.8%
# observations 12,883
Log likelihood -5,111.6
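The two-type structure in Table 3 can be read as a finite mixture at the group level. The sketch below makes three assumptions that the table's layout suggests but this section does not fully specify. First, all members of a group share one latent type. Second, default within each type follows a probit. Third, the probability of being type H is supplied as a number; in Table 3 it is itself a function of group and village characteristics, whose link function is not shown here. All function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def default_prob(x, beta):
    """Probit default probability Phi(x'beta) for one member (assumed link)."""
    return norm_cdf(sum(xi * bi for xi, bi in zip(x, beta)))


def type_likelihoods(X, y, beta_h, beta_l):
    """Joint likelihood of a group's 0/1 outcomes under each type.

    Members are treated as independent conditional on the group's type.
    """
    def joint(beta):
        return math.prod(
            default_prob(x, beta) if yi == 1 else 1.0 - default_prob(x, beta)
            for x, yi in zip(X, y)
        )
    return joint(beta_h), joint(beta_l)


def group_likelihood(pi_h, X, y, beta_h, beta_l):
    """Mixture likelihood: the whole group is type H with probability pi_h."""
    lik_h, lik_l = type_likelihoods(X, y, beta_h, beta_l)
    return pi_h * lik_h + (1.0 - pi_h) * lik_l


def posterior_type_h(pi_h, X, y, beta_h, beta_l):
    """Bayes update of the type-H probability after observing repayment."""
    lik_h, lik_l = type_likelihoods(X, y, beta_h, beta_l)
    return pi_h * lik_h / (pi_h * lik_h + (1.0 - pi_h) * lik_l)


def unconditional_default_prob(pi_h, x, beta_h, beta_l):
    """Default probability averaged over the two types."""
    return pi_h * default_prob(x, beta_h) + (1.0 - pi_h) * default_prob(x, beta_l)
```

`unconditional_default_prob` weights the two type-specific default probabilities by the type probability. This is one natural reading of the "unconditional probability of default" used by the "naive" approach in Table 6. `posterior_type_h` shows how a group's observed repayment shifts its type probability.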
TABLE 4 Conditional Marginal Effects (Percentage Points)
Type H
Variable  Marginal Effect [95% Confidence Interval]
Individual characteristics
If literate 0.84 [-0.14 to 1.81]
If disabled member in household 1.44 [-0.54 to 3.53]
If owns land 0.89 [0.23 to 1.69]
If lives in pucca house -0.97 [-1.91 to -0.06]
If lives in kutcha house 0.11 [-0.78 to 1.19]
If self-employed agricultural worker -2.57 [-3.91 to -1.19]
If agricultural laborer 0.60 [-0.72 to 1.82]
If belongs to scheduled tribe/caste 0.42 [-0.18 to 1.14]
If belongs to leading caste -0.48 [-2.48 to 1.18]
Loan characteristics
1,000 rupees increase in loan 0.36 [0.22 to 0.50]
One more member with loan -0.23 [-0.32 to -0.13]
1 % increase interest rate 0.44 [0.32 to 0.52]
One more year in length of loan 3.23 [2.27 to 3.95]
If repayment at least monthly -3.08 [-5.08 to -1.11]
If loan due in 2005 -6.60 [-8.33 to -4.97]
If loan due in 2006 6.03 [4.10 to 7.43]
Type L
Variable  Marginal Effect [95% Confidence Interval]
Individual characteristics
If literate 7.33 [2.39 to 11.57]
If disabled member in household -4.21 [-24.12 to 11.92]
If owns land -7.87 [-13.13 to -2.19]
If lives in pucca house -16.44 [-21.08 to -9.58]
If lives in kutcha house -14.47 [-21.46 to -8.02]
If self-employed agricultural worker 13.95 [7.65 to 18.10]
If agricultural laborer 29.16 [19.65 to 36.86]
If belongs to scheduled tribe/caste 31.20 [24.78 to 36.05]
If belongs to leading caste 4.15 [-8.23 to 14.55]
Loan characteristics
1,000 rupees increase in loan 5.92 [4.08 to 6.88]
One more member with loan -4.77 [-7.24 to -1.04]
1 % increase interest rate 3.77 [2.39 to 4.68]
One more year in length of loan 10.39 [6.79 to 12.36]
If repayment at least monthly -26.28 [-35.23 to -13.69]
If loan due in 2005 -1.91 [-6.85 to 4.88]
If loan due in 2006 17.05 [12.08 to 20.68]
Note: The marginal effects are calculated at the means of the
covariates. For continuous variables, the corresponding change
is indicated in the table. For discrete variables, the change is
from 0 to 1. The confidence intervals reported are normal-based and
bias-corrected using 200 bootstrap replications.
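The note's recipe can be sketched for a single probit index. The sketch rests on three assumptions. First, the type-specific default equation is a probit. Second, a continuous effect is the derivative at the means multiplied by the change listed in the table; a finite difference would be an equally plausible reading. Third, a dummy switches from 0 to 1 with the other covariates held at their means. Results are in probability units, so multiply by 100 for percentage points. The function names are hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_index(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))


def me_continuous(xbar, beta, k, change=1.0):
    """Derivative of Phi(x'beta) w.r.t. variable k at the means, times `change`."""
    return norm_pdf(probit_index(xbar, beta)) * beta[k] * change


def me_dummy(xbar, beta, k):
    """Phi(x1'beta) - Phi(x0'beta): dummy k set to 1 vs 0, others at means."""
    x0 = list(xbar)
    x1 = list(xbar)
    x0[k], x1[k] = 0.0, 1.0
    return norm_cdf(probit_index(x1, beta)) - norm_cdf(probit_index(x0, beta))
```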
TABLE 5 Unconditional Marginal Effects (Percentage Points)
Variable  Probit Model: Marginal Effect [95% Confidence Interval]  Full Probit Model: Marginal Effect [95% Confidence Interval]
Individual characteristics
If literate -0.81 [-2.01 to 0.51] -0.18 [-1.84 to 1.58]
If disabled member in household -1.62 [-4.01 to 0.72] -0.04 [-3.15 to 3.21]
If owns land -0.84 [-1.71 to 0.27] 0.18 [-1.37 to 2.18]
If lives in pucca house -0.37 [-1.48 to 0.64] -0.73 [-2.86 to 1.22]
If lives in kutcha house 2.82 [1.43 to 4.26] -0.11 [-2.30 to 2.19]
If self-employed agricultural worker -0.37 [-2.09 to 1.02] 0.04 [-3.25 to 2.76]
If agricultural laborer 0.76 [-0.67 to 2.02] 0.59 [-2.12 to 3.16]
If belongs to scheduled tribe/caste 6.10 [5.40 to 6.83] -1.98 [-5.35 to 1.17]
If belongs to leading caste 3.12 [1.06 to 4.76] -0.23 [-3.37 to 2.05]
Loan characteristics
1,000 rupees increase in loan 1.60 [1.46 to 1.76] 1.45 [1.30 to 1.63]
One more member with loan 0.01 [-0.14 to 0.16] 0.15 [-0.06 to 0.34]
1% increase interest rate 1.19 [1.13 to 1.26] 1.37 [1.30 to 1.45]
One more year in length of loan 7.90 [7.47 to 8.26] 8.31 [7.90 to 8.69]
If repayment at least monthly -14.03 [-15.83 to -12.55] -6.78 [-8.28 to -5.51]
If loan due in 2005 -6.01 [-6.59 to -5.36] -5.84 [-6.44 to -5.14]
If loan due in 2006 9.52 [8.90 to 10.18] 10.64 [9.97 to 11.35]
Average member characteristics (full probit model only)
10% increase literate 0.00 [-0.21 to 0.21]
10% increase disabled member -0.94 [-1.35 to -0.56]
10% increase own land -0.51 [-0.74 to -0.33]
10% increase pucca house -0.12 [-0.33 to 0.12]
10% increase kutcha house 0.45 [0.20 to 0.68]
10% increase self-employed agricultural worker 0.12 [-0.19 to 0.48]
10% increase agricultural laborer 0.18 [-0.11 to 0.47]
10% increase scheduled tribe/caste 0.75 [0.42 to 1.11]
10% increase leading caste 0.49 [0.24 to 0.85]
Other group and village characteristics (full probit model only)
One more year of age of group 1.19 [1.03 to 1.36]
If group has food credit program 8.08 [7.67 to 8.57]
If group has marketing program -6.12 [-6.49 to -5.76]
If group has insurance program -5.29 [-5.75 to -4.88]
If group meets at least monthly -30.11 [-30.88 to -29.49]
If located in Telangana -9.58 [-10.03 to -9.13]
If located in Rayalaseema -2.79 [-3.32 to -2.28]
One more member in group -1.41 [-1.63 to -1.15]
If financial institution in village -6.01 [-6.39 to -5.65]
If public bus in village 1.19 [0.83 to 1.59]
If telephone in village -3.43 [-3.83 to -3.01]
If post office in village 0.97 [0.66 to 1.34]
Variable  Two-Type Model: Marginal Effect [95% Confidence Interval]  Three-Type Model: Marginal Effect [95% Confidence Interval]
Individual characteristics
If literate 1.56 [0.54 to 2.50] 1.52 [0.51 to 2.46]
If disabled member in household 0.82 [-1.45 to 2.76] 0.65 [-1.62 to 2.60]
If owns land -0.08 [-0.80 to 0.76] -0.02 [-0.74 to 0.82]
If lives in pucca house -2.68 [-3.67 to -1.50] -2.59 [-3.57 to -1.40]
If lives in kutcha house -1.50 [-2.74 to -0.20] -1.47 [-2.70 to -0.16]
If self-employed agricultural worker -0.74 [-2.13 to 0.51] -0.81 [-2.20 to 0.44]
If agricultural laborer 3.76 [2.30 to 5.03] 3.66 [2.20 to 4.93]
If belongs to scheduled tribe/caste 3.83 [2.38 to 5.33] 3.86 [2.41 to 5.36]
If belongs to leading caste 0.03 [-1.94 to 1.51] -0.21 [-2.21 to 1.24]
Loan characteristics
1,000 rupees increase in loan 0.97 [0.77 to 1.11] 0.94 [0.73 to 1.07]
One more member with loan -0.74 [-0.95 to -0.37] -0.66 [-0.87 to -0.29]
1% increase interest rate 0.81 [0.65 to 0.89] 0.77 [0.62 to 0.86]
One more year in length of loan 4.02 [3.21 to 4.48] 3.84 [3.01 to 4.29]
If repayment at least monthly -5.65 [-7.60 to -3.39] -5.49 [-7.43 to -3.22]
If loan due in 2005 -6.08 [-7.17 to -4.85] -6.01 [-7.10 to -4.78]
If loan due in 2006 7.25 [5.55 to 8.39] 6.97 [5.27 to 8.11]
Average member characteristics
10% increase literate -1.34 [-1.66 to -1.04] -1.35 [-1.67 to -1.05]
10% increase disabled member -1.15 [-1.64 to -0.56] -1.10 [-1.59 to -0.50]
10% increase own land -0.52 [-0.80 to -0.28] -0.54 [-0.83 to -0.31]
10% increase pucca house 0.88 [0.60 to 1.13] 0.87 [0.59 to 1.12]
10% increase kutcha house 0.82 [0.46 to 1.25] 0.85 [0.49 to 1.28]
10% increase self-employed agricultural worker -0.51 [-0.92 to -0.05] -0.49 [-0.90 to -0.03]
10% increase agricultural laborer -1.33 [-1.64 to -1.01] -1.32 [-1.63 to -1.00]
10% increase scheduled tribe/caste -0.46 [-0.72 to -0.28] -0.50 [-0.76 to -0.33]
10% increase leading caste 0.80 [0.29 to 1.53] 0.91 [0.40 to 1.64]
Other group and village characteristics
One more year of age of group 0.06 [-0.21 to 0.37] 0.08 [-0.20 to 0.38]
If group has food credit program 8.46 [4.94 to 13.33] 9.14 [5.64 to 14.02]
If group has marketing program -8.36 [-9.43 to -7.51] -8.47 [-9.55 to -7.62]
If group has insurance program -3.07 [-4.50 to -2.20] -3.35 [-4.79 to -2.49]
If group meets at least monthly -44.59 [-47.40 to -42.51] -44.95 [-47.78 to -42.89]
If located in Telangana -18.01 [-22.78 to -13.68] -18.23 [-23.03 to -13.93]
If located in Rayalaseema -4.27 [-5.33 to -3.02] -4.18 [-5.24 to -2.93]
One more member in group 1.27 [0.60 to 1.73] 1.17 [0.49 to 1.62]
If financial institution in village -6.59 [-8.22 to -5.45] -6.84 [-8.48 to -5.70]
If public bus in village -1.06 [-1.72 to -0.12] -0.92 [-1.56 to 0.04]
If telephone in village -9.96 [-11.56 to -8.18] -9.87 [-11.47 to -8.09]
If post office in village 4.85 [3.89 to 6.31] 5.10 [4.14 to 6.56]
Note: The marginal effects are calculated at the means of
the covariates. For continuous variables, the corresponding
change is indicated in the table. For discrete variables, the
change is from 0 to 1. The confidence intervals reported are
normal-based and bias-corrected using 200 bootstrap replications.
TABLE 6 Out-of-Sample Performance of Alternative Models
Indicator  Probit Model  Two-Type "Naive"  Two-Type "Conservative"
Out-of-sample predictive performance (5,068 obs.)
Mean square predicted error 0.159 0.145 0.156
Predictive performance 74.7% 76.4% 76.0%
Correct default/nondefault classification 77.9% 79.2% 78.6%
Correct default classification (sensitivity), 1,062 defaults 17.2% 21.9% 31.3%
Correct nondefault classification (specificity), 4,006 nondefaults 94.0% 94.4% 91.2%
Note: The "naive" approach is based on the unconditional
probability of default of each individual. The "conservative"
approach uses the probability of default based on the probability
of an individual being in a particular group type. The performance
and classification rates are based on converting the estimated
default probabilities to a binary regime prediction using the
standard 0.5 rule. The predictive performance measure is based on
McFadden, Puig, and Kirschner (1977); the measure is equal to
$p_{11} + p_{22} - p_{12}^2 - p_{21}^2$, where
$p_{ij}$ is the $ij$th entry in the standard 2 x 2 confusion matrix
of actual versus predicted (0,1) outcomes in which the entries are
expressed as a fraction of the sum of all entries. Sensitivity
accounts for the percentage of cases in which individuals
defaulting are also predicted to default, while specificity
measures the percentage of cases in which individuals not
defaulting are also predicted to not default. The results are based
on 200 repeated 60%-40% data partitions (averages reported).
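The indicators in Table 6 follow directly from the 2 x 2 confusion matrix built with the 0.5 rule. Below is a minimal sketch, with rows indexing actual outcomes and columns indexing predicted outcomes. The McFadden, Puig, and Kirschner measure is coded exactly as written in the note, with the diagonal shares counted as correct predictions. The function names are hypothetical.

```python
def classify(prob_default, cutoff=0.5):
    """Predict default (1) when the estimated probability is at or above the cut-off."""
    return [1 if p >= cutoff else 0 for p in prob_default]


def table6_indicators(actual, predicted):
    """Confusion-matrix indicators; counts[i][j] = actual outcome i, predicted j."""
    n = len(actual)
    counts = [[0, 0], [0, 0]]
    for a, b in zip(actual, predicted):
        counts[a][b] += 1
    p = [[c / n for c in row] for row in counts]  # shares of all entries
    return {
        # p_11 + p_22 - p_12^2 - p_21^2, with the diagonal = correct predictions
        "mcfadden": p[0][0] + p[1][1] - p[0][1] ** 2 - p[1][0] ** 2,
        "correct": p[0][0] + p[1][1],
        "sensitivity": counts[1][1] / (counts[1][0] + counts[1][1]),
        "specificity": counts[0][0] / (counts[0][0] + counts[0][1]),
    }
```

The following counts are an illustration only, not the paper's data; they were chosen to be close to the Probit column. Take 3,766 correctly predicted nondefaults, 240 nondefaulters predicted to default, 879 missed defaults, and 183 detected defaults, for 5,068 observations in total. These give a measure of about 0.747, a correct classification rate of about 77.9%, a sensitivity of about 17.2%, and a specificity of about 94.0%.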
TABLE 7 Hausman Tests: Baseline Model versus Alternative Specifications
Variables Excluded  $H_0$: Difference in Coefficients of Repayment Equation between Baseline Model and Alternative Specifications Not Systematic
Average member characteristics 16.610 (0.165)
Group programs 12.402 (0.574)
Frequency of group meetings 32.087 (0.076)
Group location 11.307 (0.662)
Note: Hausman chi-squared statistics reported, with p-values in
parentheses.
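The statistics in Table 7 are Hausman contrasts between the repayment-equation coefficients of two specifications. Below is a minimal NumPy sketch; the function name is hypothetical. This section does not state which specification plays the efficient role under the null. The sketch therefore takes both coefficient vectors and covariance matrices as inputs.

```python
import numpy as np


def hausman(b_consistent, b_efficient, v_consistent, v_efficient):
    """Hausman statistic H = d' (V_c - V_e)^{-1} d with d = b_c - b_e.

    Under H0 (differences not systematic), H is asymptotically chi-squared;
    here the degrees of freedom are taken as the number of compared coefficients.
    """
    d = np.asarray(b_consistent, dtype=float) - np.asarray(b_efficient, dtype=float)
    v_diff = np.asarray(v_consistent, dtype=float) - np.asarray(v_efficient, dtype=float)
    stat = float(d @ np.linalg.solve(v_diff, d))
    return stat, d.size
```

The p-value is the upper tail of a chi-squared distribution, for example `scipy.stats.chi2.sf(stat, df)`. In finite samples, the covariance difference need not be positive definite. Implementations then often use a generalized inverse and set the degrees of freedom to its rank.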
COPYRIGHT 2018 Western Economic Association International