Comparing apples to apples: ensuring consistency of measurement with the balanced scorecard.
Sale, Martha Lair
ABSTRACT
Performance evaluation is one of the most complex measurement issues in management accounting. Despite the assertion that measurement of the performance of the individual should be divorced from measurement of the performance of the business unit, complex interrelations make such a measure extremely difficult. The task of evaluation is further complicated by questions and concerns about who should do the evaluating and what should be evaluated: outcomes, behaviors, or competency levels. The combination results in an overwhelmingly complex set of measurement criteria. The Balanced Scorecard (Kaplan & Norton, 1992) has introduced a systematic approach to the measurement of qualitative dimensions of business performance, but actually producing an internally consistent quantified score that is suitable for comparisons between periods and between subjects continues to be extremely difficult. This paper illustrates the use of the Analytic Hierarchy Process (Saaty, 1994) as a mechanism for dealing with this highly complex scoring problem.
INTRODUCTION
Performance evaluation is perhaps the most complex measurement
issue in management accounting. Despite the assertion that measurement of the performance of the individual should be divorced from measurement of the performance of the business unit, complex interrelations make such a measure extremely difficult. Add to this an environment in which teams are increasingly common. Personnel performance evaluations should focus
on individual contributions sufficiently to prevent social loafing, but
not to an extent that ignores the synergistic properties that make
groups work.
The task of evaluation is further complicated by various questions
and concerns about who should do the evaluating: supervisors, peers, customers, or subordinates. Add to this questions regarding what should be evaluated: outcomes, behaviors, or competency levels. All these
dimensions result in an amazingly complex performance evaluation milieu.
Although the Balanced Scorecard, or BSC (Kaplan & Norton, 1992), has introduced a systematic approach to the measurement of qualitative dimensions of performance, actually producing an internally consistent
quantified score that is suitable for comparisons between periods and
between subjects continues to be extremely difficult.
Fortunately, a mechanism for dealing with this type of highly
complex scoring problem already exists in a decision science technique
known as the Analytic Hierarchy Process or AHP (Saaty, 1994). AHP is a
widely acclaimed multicriteria decision-making technique that not only allows analysts to grasp a problem of this magnitude but also, through a mathematically rigorous process, offers a measure of internal consistency (Saaty, 1996). The AHP has become one of the most popular aids to decision-making and has been made widely available through "Expert Choice" software (Expert Choice, Inc., 2000; Forman et al., 1983). This paper presents a format for using
a Balanced Scorecard approach and AHP to produce an internally
consistent, comprehensive measure of personnel performance.
The first question that must be addressed in the development of an effective performance evaluation system is its purpose.
Evaluation systems are used to give employees useful feedback, coaching,
and guidance to help them improve their performance and develop job-related skills (Peiperl, 2001). Evaluations can be used as part of a
formal goal setting system (Scott & Einstein, 2001). They can also
be used for making staffing decisions such as which employees will
receive raises, or conversely, they may provide legal documentation
protecting an organization from suit in the event that an employee must
be dismissed. Depending on how the evaluation system is designed, it
will work better for some of these purposes than for others (Goldstein,
1998).
Increasingly, companies are using the Balanced Scorecard (BSC) as a
mechanism to link the performance of individuals in the company to
accomplishment of the strategic goals of the company (Kaplan &
Norton, 1992). The BSC is developed to provide an integrated set of performance criteria that measures how well each individual and sub-unit within an organization supports the goals associated with the Critical Success Factors (CSF) of the company. The
strength of the BSC is the use of a variety of measures to link the
performance of all the elements of the organization to the strategic
performance of the organization. The BSC incorporates both financial and
non-financial measures that may be either qualitative or quantitative and provide both outcome measures (lagging indicators) and performance drivers (leading indicators) (Kaplan & Norton, 1992).
Some version of the Balanced Scorecard is now in use in
approximately half the Fortune 1,000 companies in the United States and
about forty percent of their European counterparts (Gambus & Lyons,
2002). According to Gambus and Lyons (2002) Philips Electronics has
committed to the BSC as a tool that is both effective and enduring.
Philips has involved its 250,000 employees in more than 150 countries
around the world in developing a BSC with three levels. The current levels, the strategy review card, the operations review card, and the business unit card, are expected to be enhanced soon by a fourth level, the individual employee card. At Philips, the corporate quality department provided comprehensive guidelines for metric linkage between these different levels to assure coordination. These guidelines support the robustness of the measurement system by requiring that goals set at a lower level support the goals of the next higher level, to assure that it is possible to meet or exceed all Critical Success Factor goals (Gambus & Lyons, 2002).
Add to this emphasis on a variety of strategic measures applied consistently through different levels of the organization the difficulty of accurately measuring the performance of individuals in an environment that is turning increasingly to group activities, and the problem becomes almost intractable. For the BSC to assume its place as
an effective measure of the strategic performance of all elements of the
organization it must be practical to administer and the results must be
accepted and relied upon by decision makers. Research shows that
managers often have difficulty incorporating the results of subjective
measures into the strategic decision-making process because they
perceive such measures to be unreliable (Liberatore & Miller, 1998).
Some adopters of the BSC have already abandoned its use because managers
could not deal with the complexity and ambiguity inherent in the
measures (Gambus & Lyons, 2002). Despite the adoption of BSC by
increasing numbers of companies, little has been published about how
managers deal with the issues of complexity and ambiguity in the
measures.
To develop a Balanced Scorecard that effectively links the performance of all levels of the organization to the organizational strategy, a number of evaluation problems must be addressed. The
organization must develop an evaluation that consistently measures the
contribution of subjects who work in a variety of situations involving
various teams and individual efforts. In addition, this evaluation must
consider the degree to which the evaluator is able to judge the
performance of the subject and the weight that should be given to different evaluators' opinions of the subject's performance. Whether the
subject of evaluation is an individual, a team or an entire business
unit, the mechanics of the evaluation process may be the same; however, the choice of evaluators and the weights given the contributions of individuals would differ. When evaluating teams, such as work or service
teams, whose members all perform the same or similar tasks, the
contributions of each individual would receive roughly the same weight.
In network teams, where each individual is an expert in a different field, each expert's contributions would be given a higher weight for the criteria that fall within that expert's field. For
example, individuals who participate in multiple teams, or work alone and in teams, first would have their evaluators grouped into various "evaluation groups" based on their connection to the evaluation subject. Thus, three separate evaluation groups would assess an individual evaluation subject who belonged to Team A and Team B and who also performed as an individual. One group would consist of constituents of each team, and the third group would be made up of constituents of the subject's individual work. These evaluation groups would then be
subdivided into "evaluator classifications" based on their
relationship to the subject, such as manager, team leader, team member,
and coworker and the relative importance of the opinion of each
classification determined. Finally, each individual evaluator would
complete an evaluation, and these evaluations would be combined using weighting factors that recognize the relative importance of the individual evaluator's input. For example, perhaps the team
leader's opinion is more important than the opinion of a coworker.
Incorporation of these complex relationships results in a single
evaluation that can be looked at in terms of individuals or groups.
The same outcome, behavior, and competency level measures can be
used to evaluate various subjects, whether individuals or teams. However,
it should be pointed out that although what is good for two subjects may
be the same, what is important for the same two subjects might be very
different. For example, creative problem solving skills would probably
be considered a good thing for any person or team to possess without
regard to the type of team or individual job function. However, creative
problem solving is probably much more important to research and
development teams than it is to work or service teams that deal with
repetitive, routine tasks. Obviously, the use of a generic, one-size-fits-all performance evaluation that ignores important
differences between various individuals and teams is inappropriate, but
that does not mean that the same format cannot be used for all of the
evaluation systems (Scott & Einstein, 2001).
Use of a formal multiattribute decision analysis process makes it
possible to incorporate the myriad elements of a multi-level Balanced
Scorecard into a decision process that is capable of handling the
complexity of the interaction between the elements and provides the degree of internal validity requisite to assure that performance measures are consistent from one evaluation period to the next and from
one subject to the next. One such multiattribute decision analysis
process is the Analytic Hierarchy Process, or AHP (Saaty, 1994; Saaty,
1996).
AHP is a widely used method for analyzing complex decision-making
problems. It breaks the decision problem down into small, easily
understandable parts, organizes these component parts into a hierarchy
of levels, and provides a mechanism for evaluating the
interrelationships among the components of the hierarchy (Saaty, 1994).
The AHP is a process designed to facilitate the formalization of
multicriteria decision-making. It allows the decision maker to
incorporate both "hard data" and less quantifiable elements
such as judgments, feelings, and experiences. AHP has been widely used
in a variety of decision-making applications (Saaty, 1996).
Using AHP, the decision problem is structured hierarchically from
criteria to lower level subcriteria. The resulting model is called a
value tree or a hierarchy of criteria and objectives. Users of AHP make
a series of pair-wise comparisons of these criteria. If the criteria
being compared are objective, the numeric values for the criteria are
compared. If, however, the criteria are wholly or partially subjective,
then the comparisons are made on the basis of relative preference between the two on a scale of one to nine, where one indicates no preference and nine indicates an overwhelming preference. Once these
comparisons have been established for each criterion, an n x n matrix of comparisons, where n equals the number of criteria, is constructed. In this matrix, the elements are arrayed so that the Aij element is always the reciprocal of the Aji element. That is, if the first criterion is preferred over the second criterion by a factor of four, then the A12 element of the matrix is four and the A21 element is 0.25. The principal eigenvector of the matrix is then calculated and normalized.
This eigenvector represents the complete set of the relative importance
of the criteria. This results in a dependable, mathematically rigorous,
quantitative approach that overcomes the complexities and difficulties
inherent in measuring unlike elements and delivers a system that can be
trusted and relied upon by managers (Saaty, 1996). For a more complete
discussion of the process of AHP see Saaty (1994). Harker and Vargas
(1987) provide a discussion of the inherent theoretical strengths and
weaknesses of AHP.
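To make the mechanics concrete, the following minimal sketch in Python (using NumPy) builds a hypothetical three-criterion comparison matrix, extracts the normalized principal eigenvector as the weight vector, and computes Saaty's consistency index; the matrix values are illustrative assumptions, not data from the paper.

import numpy as np

# Hypothetical pairwise comparison matrix for three criteria.
# A[i, j] holds how many times criterion i is preferred to criterion j,
# so A[j, i] must be its reciprocal (a 4 here implies a 0.25 there).
A = np.array([
    [1.0, 4.0, 2.0],
    [1/4, 1.0, 1/2],
    [1/2, 2.0, 1.0],
])

# The principal (largest-eigenvalue) eigenvector, normalized to sum to
# one, gives the relative weights of the criteria.
eigenvalues, eigenvectors = np.linalg.eig(A)
k = np.argmax(eigenvalues.real)
weights = np.abs(eigenvectors[:, k].real)
weights /= weights.sum()

# Saaty's consistency index: lambda_max approaches n as the judgments
# approach perfect internal consistency.
n = A.shape[0]
consistency_index = (eigenvalues.real[k] - n) / (n - 1)

print(weights.round(3))             # [0.571 0.143 0.286]
print(round(consistency_index, 3))  # 0.0: perfectly consistent judgments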
METHODS
Using the AHP to structure the Balanced Scorecard system into a
single measure requires that the decision maker first structure the problem
as a hierarchy. The elements of that hierarchy are then prioritized by
responses to questions about the dominance, or importance, of one
element over another (Liberatore & Miller, 1998). The first, and
perhaps most creative, step in the AHP process is structuring the
problem as a hierarchy. A useful approach is to start with the goal and
decompose it into the most general and easily controlled factors at the
simplest or most basic level possible. The decision maker then works
back up through the hierarchy starting with the simplest sub-criteria
that must be met and combining the sub-criteria into generic higher
level criteria until the various measurements are linked in such a way
that comparisons between unlike elements are possible (Liberatore &
Miller, 1998).
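Before any weighting is done, the decomposition itself can be recorded very simply. The sketch below, with purely illustrative labels, shows one way to capture a goal-to-sub-criteria hierarchy of the kind described; every leaf is something that can be scored directly.

# Hypothetical decomposition of the goal into criteria and
# sub-criteria; labels are illustrative only, and every leaf is a
# directly scorable measure.
hierarchy = {
    "Overall performance": {
        "Outcomes": ["Work quality", "Work quantity"],
        "Behaviors": ["Conscientiousness", "Flexibility"],
        "Competencies": ["Collaboration", "Conflict resolution"],
    }
}

# The leaves are the elements that will later be compared pair-wise.
leaves = [leaf for subs in hierarchy["Overall performance"].values() for leaf in subs]
print(leaves)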
Many possible criteria can be measured or subjectively judged in
association with any position. Abernathy (2001) suggests the measurement
design process start with ideal criteria and then compromise from this
ideal to develop criteria that can be defined and rated. An example of
the types of criteria that may be used is provided in Table 1; it has
been divided according to outcomes, behaviors, and competencies.
Although the number of possibilities that may be easily considered
dramatically increases with use of an AHP software package, it is
suggested that the number of performance measures be limited for other
reasons. When the number of measures expands beyond those few essential
to measure the most important criteria, the added complexity tends to
cause the focus of the process to be misdirected from the importance of
the criteria being measured to the measurement process itself. The number of measures is best limited to fewer than twenty-five (Horngren, Foster & Datar, 2000, pp. 446-453). The criteria used should mirror
the real world as closely as possible, and should measure "as
though the participants are franchised or in business for
themselves." (Abernathy, 2001) Also, care should be taken only to
measure controllable outputs. This means that broad measures that are
affected by events out of the control of the subject should be avoided
(Abernathy, 2001). Many of the measurements listed in Table 1 can relate to inter-group interaction, but perhaps this is not enough. Care must be
taken so that subjects do not optimize their own output at the cost of
the organization as a whole. The addition of specific "linked" measures may aid cooperation. If the performances of two subjects
affect each other greatly, cooperation can be increased by including one
or more of one group's criteria as a measurement of the other
group. For instance, if Group A supplies parts to Group B, one of Group
A's criteria could be the percentage of the time Group B has enough
available stock on hand (Abernathy, 2001).
Not all subjects should be rated in every way by everyone who
could rate them. For instance, the outcome criteria of a work or service
team can be measured, but due to the "tight interdependencies among
team tasks" the individual members should only be measured
according to their behaviors and competencies (Scott & Einstein,
2001). In addition, undue focus on the individual can undermine the
performance of the group by encouraging finger pointing and discouraging
cooperation (Abernathy, 2001).
Although every subject could be measured for the same criteria, the
weight assigned to these criteria will be different between the
individuals and groups participating. If it is determined that the weight of a particular criterion for a given subject is near zero, that criterion should be left off the evaluation. A group of key stakeholders,
including managers, members of the personnel department, financial and
operational experts, information technology professionals, and employee
representatives must be involved in the design of the evaluation system.
Their expertise is needed to refine the hierarchy to match the job
requirements for each subject (Oliveira, 2001). Although an agreement must
be reached as to the exact weights for each subject's criteria,
some general guidelines are available in the existing literature.
Both subjective (qualitative) and objective (quantitative) criteria must be present and weighted appropriately to achieve balance. In other words, one
should not concentrate entirely on outcomes or behaviors; both
quantitative and qualitative criteria should be used where appropriate
(Abernathy, 2001). Remember that not all effects are under the control
of the subject. The weaker the link between effort and performance, the
more weight should be placed on qualitative criteria. Qualitative
criteria should also be more heavily weighted when there is a great need
for organizational citizenship and teamwork. If a team works primarily
independently of the rest of the organization, quantitative criteria that are more results oriented would be weighted more heavily (Peiperl,
2001).
The balance of this paper provides an example of the performance
evaluation of one hypothetical individual who functions as a member of
two teams, and works independently. The performance evaluation is
developed using AHP. What follows is a detailed example of use of AHP to
link the performance evaluation based on the Balanced Scorecard to the
overall mission and objectives of the organization. A panel brought
together for this purpose and consisting of various key stakeholders in
the organization would normally make the decisions described in this section. Throughout the example, these decisions concern one specific hypothetical subject and were made by the author for illustrative purposes only. The reader should keep in mind that although
one particular software program was used in this example, the same
process might be accomplished using other software, and that the
illustration would be equally valid.
At this point, several words of caution are in order. As the number of objects to be compared increases, the number of pair-wise comparisons necessary to rank them rapidly becomes unwieldy. In addition, it is not enough that the comparisons rate one of each pair of choices as more important than the other; a determination must also be made as to what degree one is more important. This may be accomplished in a number of ways,
including calculation with a hand-held calculator, spreadsheet software,
or math software. However, AHP software is available to facilitate the
comparison and ranking operations of the AHP as well as providing the
numeric solution. Calculations for this example were done using
Web-HIPRE, a publicly available Web-based software package provided by the Systems Analysis Laboratory of Helsinki University of Technology (Web-HIPRE, 1998). Other software packages are available that do the
same calculations. One such package is Expert Choice, a commercially available package that is used by many major companies (Expert Choice, Inc., 2000). These software packages make it possible to easily manipulate a large number of variables, keeping track
of comparisons, rankings, and weights. They also provide measures of the
consistency of the judgments and allow complete sensitivity analysis.
Using Web-HIPRE is relatively simple, and help is available for
most topics of concern. First, the various elements of the hierarchy are
placed in the working area with the leftmost column corresponding to the
first level of the hierarchy (Goal of the AHP) and the next column
consisting of the elements from the second level of the hierarchy
(Subject Functions), and the third consisting of the classifications of
evaluator and so on (Figure 2: Hierarchical Relationships). Once the
elements are located in the correct columns, they must be connected to
depict their relationships. All of the elements in the second column
representing functional areas for which the subject is being evaluated
should be connected to the element in the first column. Then the
elements in the third column representing the various evaluator
classifications should be connected to the appropriate elements in the
second column. For the individual subject in this example, the
evaluation by customers is only applicable to the subject's
function in Team A. In fact, all the evaluator groups in the third column are deemed important in varying degrees to the individual's responsibilities as a member of Team A. Management's evaluation of
the subject is considered important in all three of the subject's
functional areas, but the evaluation provided by team leaders is applicable to the subject's functions in Team A and Team B, not to the subject's function as an individual. The process continues for each
classification of evaluator. Next, the various elements in column four
that represent the individual evaluations will be attached to the
appropriate element in column three. Note that these elements might represent individuals who will fill out evaluations for the subject, or they might represent scores on an examination or a survey. In this example there
are two individual customer scores by which the subject will be rated.
These could represent actual evaluations performed by individual
customers, or they could represent the results of data collected by
observing the subject's interaction with customers. One manager score
will provide the evaluation for management. Again, this could represent
a single evaluation by the subject's supervisor, or it could be a
composite score derived from several sources. Leaders of each team
provide input into the team leader evaluation of the subject. Thus, all
the individual evaluations are linked to a class of evaluator. When this
process is complete, the three elements in column five representing
outcomes, behaviors, and competencies, will all be attached to the
elements in column four to which they are considered important. Not every individual evaluator will evaluate the subject on all three of outcomes, behaviors, and competencies. For instance, in this example,
the subject is evaluated by two co-workers; however, the co-worker evaluations cover only behaviors and competencies, not outcomes. Similarly, the leader of the dependent team does not evaluate
the individual on behaviors or competencies, only on outcomes. This recognizes that not all three types of criteria are important to all evaluators. Next, the elements in column six representing each
performance measure will be attached to the appropriate element in
column five.
[FIGURE 2 OMITTED]
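To make the connection step concrete, the following sketch records the function-to-evaluator-class links described above as a simple mapping; the Team A list follows the text, while the exact membership of the Team B and Individual lists is an assumption for illustration (the text states only that four classes feed Team B). The sketch also counts the pair-wise comparisons each grouping requires, which is the source of the unwieldiness cautioned about earlier.

# Function-to-evaluator-class connections from the example. Team A's
# six classes follow the text; the Team B and Individual memberships
# are assumptions for illustration.
connections = {
    "Team A": ["Manager", "Team Leader", "Team Leader, Other Team",
               "Team Member", "Co-worker", "Customer"],
    "Team B": ["Manager", "Team Leader", "Team Member", "Co-worker"],
    "Individual": ["Manager", "Co-worker"],
}

# Weighting the classes under a function takes n * (n - 1) / 2
# pair-wise comparisons, which grows quickly as classes are added.
for function, classes in connections.items():
    n = len(classes)
    print(function, n * (n - 1) // 2)  # Team A 15, Team B 6, Individual 1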
Once these relationships are established, the weighting procedure
is performed to compare the importance of each element in its relationship to the other elements to which it is attached. Beginning in the first column, the user is prompted by the menu to answer a series
of questions as to "how many times more important" one element
is as compared to another. This individual subject is being evaluated as
a member of two teams and as an individual. This subject's function
as a member of each of the teams and as an individual was compared in a
pair-wise comparison of the type, "Team A is how much more or less important than Team B?" Using a mechanical indicator to weigh the relative importance of each of the functions against the others, the software program leads the user through the process of comparing the three functions. The user is prompted to complete this comparison
process for each set of elements or criteria. When this process is
complete, the program automatically generates the matrices and
eigenvector needed to accomplish AHP. The weighting factors for each of
the sets of elements or criteria as generated by WebHIPRE are displayed
with the element or criterion in Figure 2. In this example, note that
the weight for the subject's function as a member of the two teams
and as an individual are 0.55, 0.24, and 0.21, respectively. This means that 55% of the evaluation of the subject is related to being a member of
Team A, 24% is related to being a member of Team B, and 21% is related
to functioning as an individual. Users of the program may choose to
enter these percentages directly instead of going through the pair-wise
comparison process, if they so desire. Continuing the illustration, the
55% of the subject's evaluation that is related to Team A is composed of evaluation material from all the evaluator classes in column three, but only select evaluator classes provide information for the portion of the evaluation that is related to Team B and to the subject's
work as an individual. At this level, the user could not simply enter
percentages because the importance of each class of evaluator is weighed
in its relationship to each function.
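The paper reports only the resulting weights, not the judgments behind them. The sketch below shows one hypothetical set of pair-wise judgments that reproduces the 0.55, 0.24, and 0.21 of Figure 2, using the geometric-mean row approximation of the principal eigenvector that is common for small AHP matrices.

import numpy as np

# Hypothetical judgments: Team A is twice as important as Team B and
# three times as important as Individual; Team B and Individual are
# judged equally important.
A = np.array([
    [1.0, 2.0, 3.0],   # Team A compared to (Team A, Team B, Individual)
    [1/2, 1.0, 1.0],   # Team B
    [1/3, 1.0, 1.0],   # Individual
])

# The geometric mean of each row, normalized to sum to one,
# approximates the principal eigenvector for a matrix this small.
w = np.prod(A, axis=1) ** (1 / A.shape[0])
w /= w.sum()
print(w.round(2))  # [0.55 0.24 0.21], the function weights of Figure 2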
The software program leads the user to compare individual elements
representing evaluator classes in relationship to the other evaluator
classes that provide input for each function. For instance, note that
each of the six classes of evaluator provides input to the performance
evaluation of the subject's function as a member of Team A. The
user is led to determine the importance of each evaluator class's contribution relative to every other. The program automatically offers each
pair-wise comparison for consideration. The user might begin by
indicating that the opinions of the customer and the manager are equally important and that the customer's input should be considered twice as important as that of the team leader, and so on until comparisons are made between each possible pair of the
six elements or criteria. The program will then calculate and assign
weights to each of the six as they relate to only Team A. These six
weights total to one and represent the percentage of weight given to
each evaluator class in the evaluation of the subject's performance as a
member of Team A. The user then moves on to the next element or
criterion, in this case Team B, and is led through the process of
comparing the importance of each of the four evaluator classes that
provide input into the evaluation of the subject's performance as a
member of Team B. Again, the user may eschew this comparison process and
enter the percentages directly if so desired. Using the matrix algebra process described above, the software program considers that each evaluator class provides different levels of input to multiple elements at the preceding level and assigns the relative weight that should be given the opinion of that evaluator class. The resulting score, such as .096 for Team Member, means that 9.6% of the weight of all the evaluator
classes should be assigned to the evaluation of team members. This
determination is made considering the relative importance of team member
evaluation to the subject's function as a member of Team A and Team
B and the relative importance of the subject's function in each
team to the subject's overall performance evaluation. This process
is repeated for each level down to the lowest level to which the
evaluation is being decomposed. In this case, the lowest level consists
of the eleven criteria in the final column of Figure 2. When the process
is complete, the software package provides the user with the weights for
this lowest level, which will then be used in scoring of the individual
subject's performance.
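The arithmetic behind a composite weight such as the 0.096 for Team Member is simply a weighted sum of the class's local weights under each function it serves; in the sketch below, the local weights are hypothetical values chosen to reproduce the published figure.

# Function weights from Figure 2.
function_weights = {"Team A": 0.55, "Team B": 0.24, "Individual": 0.21}

# Hypothetical local weights of the Team Member class under each
# function it feeds (it provides no input to the Individual function).
team_member_local = {"Team A": 0.12, "Team B": 0.125}

# Global weight = sum over parent functions of
# (function weight x local weight within that function).
global_weight = sum(function_weights[f] * w for f, w in team_member_local.items())
print(round(global_weight, 3))  # 0.096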
This process is, admittedly, time consuming. However, many parts of
the hierarchy will be similar for classes of individuals and could be
duplicated. Once this process is complete, a performance evaluation template can be constructed for each individual, and the process would not need to be repeated unless there were substantive changes in the subject's job description.
The weights as generated for each of the individual job performance
criteria (Figure 1 and Figure 2) have been incorporated into the weight
column in Table 2. After the weights have been determined, goal values
must be entered as a standard of measure. For instance, if the individual
evaluator is rating the subject on a scale of 1 to 10 where 10 is the
highest evaluation for this criterion, then the goal value would be 10.
On the other hand, for another criterion that has a quantitative
observable outcome, such as a defect rate or success rate, the goal
value might be 0 (number of defects) or 100 (percentage of good parts).
This goal would be compared with the actual results for that criterion.
In order to determine the index for each criterion, the difference between the achieved value and the goal value is determined. This is then divided by the goal value and one is added. This number times 100 is the index (the Raw Score in Table 2); multiplying the index by the criterion's weight gives the Weighted Score. The formula to accomplish this calculation is:
Index = 100 * ((actual - goal) / goal + 1)
In situations where the objective is a low goal value rather than a high goal value, the following alteration is necessary:
Index = 100 * (((100 - actual) - (100 - goal)) / (100 - goal) + 1)
The Weighted Scores for all criteria are then summed to provide the total score for the subject. If the subject accomplishes 100% of all set
goals, the score is 100.
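As a check on the formulas, the sketch below applies them to three rows of Table 2 and reproduces the published Raw and Weighted Scores; the low-goal variant and an optional cap at 100 (discussed below) are included for completeness.

# Each criterion maps to (goal, actual, weight); the weights are those
# produced by the AHP step and shown in Table 2.
criteria = {
    "Customer":          (100.0, 94.0, 0.073),
    "Work Quality":      (99.0, 97.0, 0.105),
    "Conscientiousness": (10.0, 8.5, 0.120),
}

def index(actual, goal, low_is_good=False, cap=False):
    # Raw score for one criterion; 100 means the goal was met exactly.
    if low_is_good:
        # Variant for criteria where a low number (e.g. a defect
        # percentage) is the objective; values are out of 100.
        raw = 100 * (((100 - actual) - (100 - goal)) / (100 - goal) + 1)
    else:
        raw = 100 * ((actual - goal) / goal + 1)
    return min(raw, 100.0) if cap else raw  # optional cap at 100%

for name, (goal, actual, weight) in criteria.items():
    raw = index(actual, goal)
    print(name, round(raw, 2), round(weight * raw, 2))
# Customer 94.0 6.86 / Work Quality 97.98 10.29 / Conscientiousness 85.0 10.2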
The labor-intensive task of constructing this goal template must be
done only one time for each subject. A spreadsheet program was used to
construct the template reproduced in Table 2 and subsequent evaluations
would require only changing the numbers that represent the actual
results.
In this example, the subject performed at a level slightly lower
than the goal, but perhaps more important than the overall ranking are
the individual indices. A glance at the various indices shows which areas need work and which areas are being performed at or above goal
levels. However, using the calculations described here, note that if scores above the goal level were entered, the result would be a score representing over 100% on those criteria. Users who wished to limit the
performance rating to 100% would limit the score to the goal level for
any criterion on which the subject performed at a level above goal.
RESULTS AND DISCUSSION
Although this example illustrated the performance evaluation for an individual, the same process can easily provide performance ratings for
groups or even the business unit as a whole. Use of AHP in the
evaluation process provides the requisite level of theoretical rigor and
internal consistency to assure reliability of the measure. In addition,
even though the process is somewhat involved, it is doubtful that it
would require more time or consideration than using a less meaningful
calculation.
Although this example was performed using the publicly available
Web-HIPRE, this particular software package would be appropriate for
testing the process only. An organization that implemented this type of
evaluation system would be better served purchasing AHP software.
Because of the way Web-HIPRE is administered, all calculations are
performed and the resulting data is stored on computers maintained by
the Systems Analysis Laboratory of Helsinki University of Technology.
Much of the information that is collected during the evaluation process is sensitive and should not be transmitted outside of the organization.
REFERENCES
Abernathy. (2001). Secrets to success with balanced scorecards. HR
Focus 78(10), October: 3.
Expert Choice, Inc. (2000). <http://www.expertchoice.com/>. Accessed 140203.
Forman, E., Saaty, T., Selly, M. & Waldron, R. (1983). Expert Choice, McLean, VA: Decision Support Software, Inc.
Gambus, A. & Lyons, B. (2002). The balanced scorecard at Philips Electronics, Strategic Finance, 84(5), 45-49.
Goldstein, J. (1998). The case for learning styles. Training and Development, 52(9), September: 36.
Harker, P. & Vargas, L. (1987). The theory of ratio scale
estimation: Saaty's analytic hierarchy process, Management Science
33(11), 1383-1403.
Horngren, C., Foster, G. & Datar, M. (2000). Cost Accounting: A
Managerial Emphasis, Upper Saddle River, NJ: Prentice Hall.
Kaplan, R. & Norton, D. (1992). The balanced
scorecard--Measures that drive performance. Harvard Business Review, 70,
January-February, 71-79.
Liberatore, M. & Miller T. (1998). A framework for integrating
activity-based costing and the balanced scorecard into the logistics
strategy development and monitoring process, Journal of Business
Logistics, 19(2), 131-55.
Oliveira, J. (2001). The balanced scorecard: an integrative approach to performance evaluation, Healthcare Financial Management, 55(5), May:
42-47.
Peiperl, M. (2001). Getting 360 degree feedback right, Harvard
Business Review, 79(1), January: 142-47.
Saaty, T. (1994). How to make a decision: the analytic hierarchy
process, Interfaces 24(6), November-December: 19-43.
Saaty, T. (1996). The analytic hierarchy process, Pittsburgh, PA:
RWS Publications.
Scott, S. & Einstein, W. (2001). Strategic performance
appraisal in team-based organizations: one size does not fit all. IEEE Engineering Management Review, 4th Quarter: 7-15.
Web-HIPRE. (1998). <http://www.hipre.hut.fi/>. Accessed
140203.
Martha Lair Sale, University of South Alabama
Table 1: Types of Criteria
OUTCOMES                  BEHAVIORS              COMPETENCIES
Accurate                  Accepts criticism      Adaptable
Continuous learning       Adaptable              Appraisal
Customer service          Altruistic             Collaboration
Job skill development     Big picture oriented   Communication
Neatness                  Conscientious          Conflict resolution
Process improvement       Cooperative            Coordination
Professional appearance   Courteous              Delegation
Punctuality               Cynical                Goal setting
Rework required           Dependable             Identify needs of others
Work quality              Detail oriented        Knows when to seek help
Work quantity             Good sport             Leadership
Work timeliness           Helpful                Long range planning
                          Listens                Organizational
                          Team player            Planning
                                                 Problem solving
                                                 Time management
Table 2: Individual Subject Performance Score
Criterion                  Measure                        Goal   Actual   Raw Score   Weight   Weighted Score
Customer                   Rating on Customer Survey      100%   94%      94.00       0.073    6.86
Work Quality               Percentage of Good Output      99%    97%      97.98       0.105    10.29
Work Quantity              Percent of On-time Completion  95%    92%      96.84       0.186    18.01
Job Skill                  Hours of training              50     50       100.00      0.151    15.10
Professional               Rating (1 to 10)               9      8        88.89       0.032    2.84
Conscientiousness          Rating (1 to 10)               10     8.5      85.00       0.120    10.20
Flexibility                Rating (1 to 10)               9      7        77.78       0.069    5.37
Consideration of Others    Rating (1 to 10)               9      9        100.00      0.053    5.30
Planning and Goal Setting  Rating (1 to 10)               9      9        100.00      0.067    6.70
Conflict Resolution        Rating (1 to 10)               9      8        88.89       0.081    7.20
Collaboration              Rating (1 to 10)               10     9.5      95.00       0.065    6.18
Total Score                                                                                    94.05
Figure 1: Hierarchical Levels
Hierarchical Level 1: Goal of the AHP
Determine the relative importance of various measures of the Critical
Success Factors in following the strategy of this organization.

Hierarchical Level 2: Subject Functions
Team A | Team B | Individual

Hierarchical Level 3: Evaluator Class-Groups
Manager | Team Leader | Team Leader, Other Team | Team Member |
Co-worker | Customer

Hierarchical Level 4: Individual Evaluator
Appraiser 1 | Appraiser 2 | Appraiser 3 | Et cetera

Hierarchical Level 5: Classifications of Measurement Criteria
Outcomes | Behaviors | Competencies

Hierarchical Level 6: Performance Measurement Criteria
Customer Orientation | Work Quality | Work Quantity |
Job Skill Development | Professional Development | Conscientiousness |
Flexibility | Consideration of Others | Planning and Goal Setting |
Conflict Resolution | Collaboration