The Journal of Social Psychology, 1981, 113, 85-93.


University of New South Wales, Australia



An attempt is made to construct an achievement motivation inventory with the use of single words or phrases rather than sentences as items. A set of 54 such items was given to a random doorstep sample of 145 Sydney adults. Ten of the 54 items were found to form a new scale with a reliability of .80, which correlated .59 with a conventional scale. The new scale was, however, all-positive (no anti-achievement items). To overcome this, a new set of negative items was administered with the new scale to a sample of 87 Sydney people (representative on age, sex, and education). Ten negative items were found which overall gave a correlation of -.46 with the all-positive scale. The reliability of the combined 20-item scale was .87, and it correlated .405 with the respondents' rated actual life achievement, compared with .170 for a conventional scale.


Perhaps one of the saddest episodes in social psychology so far has been the pronounced disagreement over the usefulness of projective tests in measuring achievement motivation. If mainstream opinion is any guide (e.g., 4, 7, 15), the unreliability of these tests probably means that the great body of work done with them has gone for nought.

Projective tests do however still have their defenders (e.g., 6, 12) and they also have the one great attraction that from the beginning made them preferable to self-report tests: their lack of transparency, the fact that a person can potentially be tested without knowing what we want to know about him. Although self-report tests of achievement motivation go back at least as far as Murray (9), they were always held to be of less validity because of their openness to faking.

In spite of the transparency problem, however, it would seem at this stage that self-report tests might have a better claim to use because of their generally more satisfactory reliability. The transparency problem is, after all, common to most fields of personality and attitude measurement. It could be that self-report tests are all we have left [but see Ray (12)].

One possibility that might have enabled us to escape the horns of this dilemma to some extent was a new test format pioneered by Alpert and Sargent (1) and forgotten soon thereafter but which has recently been revived with great success by Wilson (17). In this format, the S replies not to long sentences as test items but rather to single words or phrases in isolation. Thus, instead of indicating agreement or disagreement with an item, such as "The death sentence is necessary in some cases and should be retained for certain criminals," the S simply says "Yes" or "No" to the single phrase: "Death sentence." Unclear though such an item might seem to be as a question, Ss do nonetheless seem to find the task highly meaningful and give consistent and easily validated replies. This format is referred to by Wilson as the "catch-phrase" format and by Alpert and Sargent as a "test of immediate emotional reactions." Both Wilson and Alpert and Sargent used it to construct scales of conservatism, and one incomplete bibliography (2) lists no less than 76 recent published papers using the Wilson scale.

So far, however, since the format has been used mainly for the measurement of conservatism, a very cogent objection to its wider use might be that it is in fact adaptable only to the measurement of such a highly general concept. Wilson (17) himself has presented a variety of evidence -- not all using his scale -- that conservatism is in fact the general factor in social attitudes (as intelligence is the general factor in abilities). One might feel that the vagueness of the items has to be matched by a variety of items for the scale to generate any discrimination at all. In some confirmation of this, the Wilson scale does have an unusually large number of items (50 items), suggesting that reliabilities might be unsatisfactory with fewer. This poses the question of whether there are enough words and phrases in the language to provide the number of items needed to measure less general concepts than conservatism.

If this difficulty can be overcome, however, the new format has obvious attractions for the measurement of achievement motivation. Like projective tests, it may draw on potentially deeper levels of the person than do normal self-report tests. Unlike projective tests, it does have satisfactory reliability. It is something of a half-way house that has to some extent the advantages of both projective and self-report tests with the disadvantages of neither.

Below is reported an attempt to construct such a test.


Study I

1. Method

Producing a set of candidate items for the new scale proved easier than expected. A total of 84 items were written to measure pro- and anti-achievement reactions. Because they were to be tested on a doorstep sample, however, the questionnaire had to be of limited length; hence, only 54 of the 84 items were selected for final inclusion. Also included was a self-report measure of achievement motivation in conventional behavior-inventory format, which was to provide concurrent validation for the new scale. The scale selected for this purpose was a short form of the Ray-Lynn "AO" scale, the only published scale validated for use in Australia and one with an extensive range of validity confirmations (13). A short form of the Australian revision of the Wilson scale (10) was additionally included to check on possible effects of common format. As other findings in both Australia and the U. K. had suggested no relationship between achievement motivation and conservatism (14), any relationship observed in the present study would be due to common format rather than to an intrinsic relationship between the things being measured. As further checks on validity, a range of demographic information was also secured from each respondent. It was felt that higher occupational status, higher education, immigrant origin, and the person's past or present leadership would indicate (among other things) higher achievement motivation.

The scale was administered by trained and supervised student interviewers to a randomly selected cluster sample of people reached at their homes in the Sydney metropolitan area. N was 145.

2. Results

The pool of candidate items was subjected to the automatic item comparison and deletion procedures of program ITRA (11). This program deletes, one by one, those items having least correlation with all other items. Reliability was calculated by Cronbach's (3) coefficient "alpha," which is the mean of all possible split-half reliabilities. The initial reliability of the unshortened item-pool was .70. The maximum reliability (.83) occurred at the 20-item length. This was, however, an all-positive scale, since all anti-achievement items had been deleted. To correct this, the best 10 positive items and the best 10 negative items were selected separately and combined to form a new balanced scale. This scale had a reliability of only .71 and a correlation between its two halves of only -.128. More encouragingly, the 10 positive items treated as a short scale in their own right showed a reliability of .80.
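The item-analysis procedure described above can be illustrated with a short Python sketch. It is not the ITRA program: the function names (`pearson`, `cronbach_alpha`, `prune_items`) are hypothetical, and the deletion criterion used here (lowest mean correlation with the items still retained) is only an assumption about how "least correlation with all other items" might be put into practice. Alpha is computed from the usual variance formula, and all items are assumed to have been scored in the same direction beforehand.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def cronbach_alpha(columns):
    """columns: one list of scores per item, respondents in the same order."""
    k = len(columns)
    totals = [sum(scores) for scores in zip(*columns)]
    item_var = sum(pvariance(col) for col in columns)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def prune_items(columns, min_items=2):
    """Drop the weakest item one at a time; return (best alpha, indices kept)."""
    kept = list(range(len(columns)))
    best = (cronbach_alpha(columns), list(kept))
    while len(kept) > min_items:
        # Assumed criterion: mean correlation with the other retained items.
        def mean_r(i):
            return mean(pearson(columns[i], columns[j]) for j in kept if j != i)

        kept.remove(min(kept, key=mean_r))
        alpha = cronbach_alpha([columns[i] for i in kept])
        if alpha > best[0]:
            best = (alpha, list(kept))
    return best
```

Called on a pool of item columns, `prune_items` returns the highest alpha met during deletion together with the items retained at that point, which corresponds to reading off the scale length at which reliability peaks.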

The reliability of the 10-item short form of the Wilson C-scale was .66. The reliability of the 14-item short form of the Ray-Lynn "AO" scale was .79.

The 10-item all-positive scale was chosen as the one most promising for further use. It correlated .59 with the conventional format scale, .21 with occupation, .22 with education, and .26 with leadership background (p < .05). It did not, however, significantly predict immigrant origin (r = -.05) or conservatism (r = -.01). Higher scorers, then, were better educated, had higher status jobs, and had more often been leaders. On a number of criteria, the new scale appeared to have demonstrated predictive validity.


Study II

Although acquiescent response set has not generally been raised as a problem in the achievement motivation area and although concern with acquiescence is now normally limited to particularly ambiguous scales, it seemed desirable to preclude any possibility of such a problem with the new scale from the start (8). A new set of negative items to go with the best 10 positive items was thus called for.

Additionally, the validation so far available was of only a rather preliminary sort. Further validity studies seemed necessary. Of all the validation methods normally used, the one with intrinsically greatest generalizability seemed to be peer-ratings. Performance on laboratory tasks is always of unknown generalizability, but peer-ratings treat the rater as an accumulating data-bank about the person's everyday behavior over a long period.

1. Method

Sampling for validation studies is inherently more difficult. One has to find not only the respondents, but also the people who know them and are willing to rate them. The recourse adopted in this situation on the present occasion was a fairly usual one: Students were co-opted to give a questionnaire to people they knew and whom they could rate on the attribute in question. The single quota restraint of asking them not to use fellow students was imposed. It was also suggested that where alternatives were available they use people in the more humble occupations.

The key question that had been used for the new items in the previous study was, "Say Yes if the thing described is important to you and No if it is unimportant to you or you dislike it." This represented something of a departure from Wilson's (17) key question which read, "Which of the following do you favor or believe in?" It was felt that the original Wilson question might be more suitable for negative items than the one previously used in Study I. For this reason half of the new negative items were given under one such question and half under the other.

The new negative items themselves were obtained by seeking antonyms to the existing successful positive items. Altogether 43 negative and 10 positive items were administered.

The 15-item Williams (16) n-Achievement scale was also administered. This was the only other scale in conventional format which could be found that had been developed for Australian conditions. Although unpublished, it had been validated and seemed generally well-conceived. It was included, then, to extend the concurrent validity already given in Study I by the Ray-Lynn "AO" scale.

The students were asked to rate each person on eight attributes, all of which were carefully explained in advance. These were: "Task-oriented," "Success oriented," "Achievement motivated," "Fears failure," "Is actually successful," "Tends to boss others around," "Tends to accept direction from others," and "Needs to achieve." The first two correspond to a popular breakdown of the achievement motivation construct into intrinsic motivation versus extrinsic motivation; the third was a global measure of conscious motivation; the fourth taken with the second corresponds to another popular breakdown of the achievement motivation concept into positive and negative motivation; the fifth is perhaps the most important validity criterion of all and measures motivation by its outcome; the sixth and seventh correspond to the two main components of authoritarian behavior and were included on the grounds that there seemed some conceptual relationship between achievement motivation and authoritarianism; the eighth was a measure of motivation whether consciously acknowledged or not. It was a measure of motivation as inferred from behavior rather than as inferred from what the person openly acknowledged.

2. Results

A total of 87 people were interviewed and rated. All ratings were done unknown to the interviewee. The mean scores for the sample on age, sex, and education did not differ significantly from those of the previous sample. Occupation, however, showed a predictable bias towards being more non-manual.

Again subjecting the trial items to program ITRA, a balanced 20-item scale was automatically produced with a reliability of .87. The correlation between its negative and positive halves was a quite satisfactory -.461. All the new negative items came from those given under the key question most similar to Wilson's original one.

The Williams scale showed disappointing reliability. Even after five weak items were deleted, the reliability for the remaining 10 was only .55. The 10-item form was, however, the one used for subsequent analyses. As such, it correlated .529 with the new balanced scale and .545 with the 10 positive items alone. It did thus serve its purpose of demonstrating concurrent validity.

The new balanced 20-item scale consisted, of course, of the 10 positive items selected in Study I plus 10 new negative items (see Appendix). It did, as such, show generally superior validity characteristics. Its prediction of rated actual life achievement was particularly good -- a correlation of .405 compared to .243 for the all-positive scale and .170 for the Williams scale. Other correlations observed with the 20-item scale were as follows: .315 with occupation, .394 with rated success-orientation, .094 with rated task-orientation, .342 with rated achievement orientation, .254 with rated fear of failure, and .295 with rated need for achievement. With the use of a contrast-wise error-rate approach, correlations above .180 were significant.

Some of the correlations may appear low in absolute terms even though they are highly significant. They are nonetheless fairly characteristic of the correlations normally found in validity studies of this sort (5, 13). Their low magnitude presumably stems from the fact that motivation is intrinsically difficult to observe and rate; it is much harder to be certain of than is, for instance, actual achievement. This being so, it is encouraging that the scale correlated so much more strongly than did the other scales with rated actual achievement, particularly when one considers that actual achievement is itself only partly determined by motivation.

It was found that the new scale did not correlate at all with sex, indicating that it would be equally suitable for use with respondents of either sex.

The 10 positive items treated as a scale in their own right showed a reliability on this occasion (.86) that was actually higher than that observed in Study I (.80). The means and standard deviations obtained with the scale of 10 positive items were 25.30 (4.39) in Study I and 26.05 (4.70) in Study II. In Study II the 20-item balanced scale had a mean of 50.13 and an SD of 7.78.


With remarkable uniformity, new psychological tests seem to be constructed, provided with norms, and validated on groups of students. The respondents are seldom even a sample of students. Major personality inventories, such as the Jackson (5) "PRF," do not appear to this day to have ever been administered to a general population sample that even attempted to be random. Thus, although the sampling procedures used in the present studies were extremely modest, they are certainly not worse than those generally employed. They offer, in fact, some grounds for believing that the new test might be generally applicable and that it is reliable and valid.

Scales constructed in one English-speaking country generally show some loss of reliability when deployed in another country, but on the present occasion this effect could well be quite small. Given the extremely basic English of the new format, it is in fact difficult to imagine just what could be non-transferable to (say) U. S. or British culture. Nonetheless, other users would probably be advised to check its split-half reliability on the first such occasion of its use.
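For users who wish to make the reliability check suggested above, one common way of doing so is sketched below in Python. The odd-even split and the Spearman-Brown step-up are assumptions made for illustration (the text does not specify how the halves should be formed), and the function names are hypothetical. Each respondent's answers are assumed to be already scored in the achievement direction, with the negative items reversed.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def split_half_reliability(rows):
    """rows: one list of item scores per respondent, items in a fixed order."""
    # Assumed split: items in odd positions versus items in even positions.
    odd_half = [sum(row[0::2]) for row in rows]
    even_half = [sum(row[1::2]) for row in rows]
    r = pearson(odd_half, even_half)
    # Spearman-Brown correction: estimate reliability at full scale length.
    return 2 * r / (1 + r)
```

A value well below the reliabilities reported here would suggest that some items do not transfer to the new population.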

As a third alternative to the two existing measures, the new scale could well be routinely included in future studies of achievement motivation. Since its all-positive short form of 10 items takes only about one minute to administer, the extra check so provided requires little effort and might also make possible its use in situations which might otherwise have been difficult (e.g., in street surveys).

There can of course be no claim that the new scale should forthwith supplant existing methods of achievement motivation measurement. It is put forward merely as a third way that may have promise. Like all compromises, it offers the prospect of ascertaining a little of everything. Like some compromises, it could turn out to be worse than either of the alternatives. The validation reported above does clearly limit the risks involved in use of the new scale but only further research and more extensive experience will be able to tell whether it in fact represents an improvement over existing methods.


The New "Catchphrase" Achievement Motivation Scale

In the following list of words and phrases answer "Yes" or "No" by circling the number under that heading. We want to know whether you agree with or approve of the thing concerned.


 1. Weakness                              3   2   1
 2. Aimlessness                           3   2   1
 3. Doodling                              3   2   1
 4. Living from day to day                3   2   1
 5. Failure                               3   2   1
 6. Dabbling                              3   2   1
 7. Incompetence                          3   2   1
 8. Muddling through                      3   2   1
 9. Putting things off until tomorrow     3   2   1
10. Being unprepared for things           3   2   1

The next list of words and phrases is a bit different. Here we want to know whether or not the thing concerned is important to you personally.


11. Incentive                             3   2   1
12. Goals                                 3   2   1
13. Ambition                              3   2   1
14. A career                              3   2   1
15. Success in life                       3   2   1
16. Achievement                           3   2   1
17. Promotion                             3   2   1
18. Getting results                       3   2   1
19. "Getting on"                          3   2   1
20. Competition                           3   2   1

The scale score is obtained by addition of the circled numbers. For items 1 to 10, however, a "3" must be counted as a "1" and vice versa.
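The scoring rule can be written out as a short Python sketch. The function name `score_scale` and the dictionary input format are hypothetical choices made for illustration; only the rule itself (sum the circled numbers, with 3 and 1 exchanged on items 1 to 10) comes from the instructions above.

```python
# Items 1 to 10 are the negative items and are reverse-scored.
REVERSED_ITEMS = range(1, 11)


def score_scale(circled):
    """circled: dict mapping item number (1..20) to the circled number (1, 2, or 3)."""
    total = 0
    for item in range(1, 21):
        value = circled[item]
        if value not in (1, 2, 3):
            raise ValueError(f"item {item}: circled number must be 1, 2, or 3")
        if item in REVERSED_ITEMS:
            value = 4 - value  # a "3" counts as a "1" and vice versa; "2" is unchanged
        total += value
    return total
```

Under this rule, scores on the 20-item scale range from 20 to 60, and a respondent who circles "2" throughout scores 40.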


