Australian Psychologist Vol. 9 No. 1 March 1974, pp. 44 - 49.


John J. Ray

University of New South Wales

Given that there is a certain commitment to radicalism and to the questioning of conventional assumptions among behavioural scientists, it should come as no surprise when assumptions and practices in psychology itself come under critical scrutiny. One very strong recent trend (Holmes, 1971; Gorsuch & McFarland, 1972; Taylor, Ptacek, Carithers, Griffin & Coyne, 1972), represented in this journal by Stanton (1972), has been to question the use of multi-item personality or attitude scales in favour of simply asking the subject to rate himself on the trait concerned. It has been shown that such self-ratings can be made reliably and, because they are much simpler and quicker, it is contended that they are to be preferred to conventional scales.

To date, however, the question of comparative validity appears to have been slighted. It is true that the multi-item format has as one of its justifications that it offers enhanced prospects of test-retest reliability. This is nonetheless not the sole point in favour of multi-item scales. An important reason for their use is to improve the validity of the measurements made. There are several reasons why greater validity is to be expected of multi-item scales:

(a) The trait concerned may be differently defined or understood in common usage from the way psychologists conceptualize it (e.g. "Neuroticism"). In fact it is very often the case that a psychologist's definition of the construct is precisely "That which underlies (or is general to) the particular group of symptoms (items) listed in this particular personality inventory." This would be true of most factorially derived scales (such as Eysenck's N and E scales). Where this is so, a single self-rating logically cannot replace an entire personality inventory.

(b) The trait concerned is often one with clear social desirability or undesirability status. Again "neuroticism" is a good example of an attribute that very few people would wish to ascribe to themselves. The people who might be loath to describe themselves as "neurotic", however, are much less likely to be reticent about reporting that they "often worry about their health" or "suffer from sleeplessness". Multi-item scales are, then, less transparent measures of the attributes concerned.

(c) It is often simply harder to rate oneself in terms of a global trait such as "extraversion" - much harder in fact than reporting whether one "likes lively parties" or "likes going out". Personality inventories enable one to be more concrete and precise when answering. A single self rating requires us to abstract and generalize in circumstances where we must often wonder just what would count as an instance of the trait concerned.

(d) Multi-item scales represent a repeated estimate technique - with less reliance on one occasion of measurement. With the "one-hit" trait self rating, there is no possibility of controlling for random error effects.

For all the above reasons, then, we should not rush to abandon our multi-item scales. But what about empirical practice? Are the theoretical claims just made for the greater validity of multi-item scales supported in fact? An examination of this question is reported here which uses need for achievement as the key variable. This variable has not previously been examined in studies of this nature and it does offer the advantage of being a readily intelligible trait name - one that can be introduced without need for complicated technical definitions.


The study was carried out as part of a teaching project by thirteen students in a second year Sociology course. Each student administered a questionnaire to subjects of the student's own choosing. The questionnaire contained both attitude scales designed to measure achievement motivation and a request to the respondent (positioned without preamble immediately after the attitude scales) worded simply as follows:

Finally, we would like to get you to rate yourself on your own need for achievement. Would you say that your need for achievement was:

1. Very weak
2. Weak
3. Average
4. Strong
5. Very strong

Note that this question was hence answered in the context of all the preceding scale items - which formed, among other things, an operational definition of the construct concerned.

Additionally, the student in each case completed ratings of the person filling out the questionnaire in terms of various attributes associated with the need for achievement construct. Because it was intended to use these ratings as the prime criterion for the comparative validity of the indices contained in the questionnaire, the student-experimenters were instructed to choose their subjects from among those people they knew (preferably non-students) whom they felt able to rate in terms of achievement motivation. They were also instructed to choose as far as possible a group of subjects among whom there were very large contrasts in apparent achievement motivation - from very low to very high. The actual n achieved in this way was 75 - very close to the planned 78. The student raters did of course have the benefit of extensive discussion of what was meant by achievement motivation, but no single arbitrary formula was laid down as the definition.

The use of peer ratings as a criterion of validity is of course a practice widespread in industrial and other applied psychology fields and has been championed generally by Hollander (1957) and Titus (1968). It has the effect of treating the rater as an accumulating data bank on the person being rated which can be tapped to provide a picture of how that person behaves in general (as distinct from how he might behave in one or two isolated experimental situations).

Anonymity was guaranteed to the respondents by providing each student with university envelopes in which the completed questionnaire was placed and sealed as soon as it was handed back. To match up the rating sheets with the questionnaires however, students either covertly kept track of which envelope was which or pre-inserted the completed rating sheet in the envelope.

The scales used to measure achievement motivation were the 30 item Ray-Lynn "AO" scale (Lynn, 1969; Ray, 1970 & 1975) and the 10 item Costello (1967) Task orientation scale.

The ratings completed by the students that are of relevance to the present work were of the following eight attributes: task oriented, lackadaisical, need for achievement, achievement oriented, competitive, hard worker, leisure oriented, success oriented. The peer rating of primary importance was of course "Need for achievement" - since it provides the most direct comparison with the self-rating. The other ratings were made, however, out of consideration for the important, but frequently ignored, fact that the reliability of the criterion is at least as important as the reliability of the predictor. Comparisons of reliable predictors with a criterion that is itself unreliable are pointless indeed. One way to estimate the reliability of our criterion is to do precisely what we do with attitude or personality scales - gather repeated estimates. When this is done, the normal reliability coefficients (such as Cronbach's (1951) "alpha") can be computed (see also Ray, 1972). All eight ratings therefore were summed to provide a criterion called ORAM (Overall Rated Achievement Motivation).
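The summing of repeated ratings and the "alpha" estimate of their reliability can be illustrated with a minimal sketch. It uses the standard coefficient alpha formula (Cronbach, 1951); the function name and the small data set are hypothetical and are not the study's data. It also assumes that any negatively worded attributes (e.g. "lackadaisical") have already been reverse-scored before summing, which the text does not spell out.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Coefficient alpha for k items.

    items: list of k lists, each holding one score per respondent,
    with respondents in the same order in every list.
    """
    k = len(items)
    # Total score per respondent (the summed composite, as with ORAM).
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances.
    item_var_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Hypothetical ratings: 3 attributes rated for 5 people on a 1-5 scale.
ratings = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 4],
    [1, 3, 3, 4, 5],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

The more the separate ratings agree with one another, the closer alpha comes to 1.00; ratings that share nothing in common yield an alpha near zero.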


The correlations of the Ray-Lynn and Costello scales with peer rated need for achievement were respectively .323 (p < .01) and .243 (p < .025). By comparison, the Self-rating correlated only .190 with the peer rating - which is on the very borderline of significance at the .05 level (one-tailed). The reliabilities of the two scales (as estimated by the "alpha" formula) were .74 and .75 respectively. The reliability of the ORAM criterion similarly estimated was .85. The correlations of the two scales with ORAM were .390 and .392 respectively. In comparison, the self-rating correlated .325. All three of these correlations are significant at the .01 level. See Table 1.
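The significance statements for the peer-rating correlations can be checked with the usual t transformation of a correlation coefficient. The sketch below assumes n = 75 for every correlation and uses an approximate tabled critical value of t for 73 degrees of freedom; both the function name and the critical value are assumptions introduced here rather than figures given in the text.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N = 75  # sample size reported in the text
# Approximate tabled one-tailed .05 critical value of t for df = 73
# (an assumption; consult a t table for the exact figure).
T_CRIT_05 = 1.666

correlations = {"Ray-Lynn": 0.323, "Costello": 0.243, "Self-rating": 0.190}
for name, r in correlations.items():
    t = t_from_r(r, N)
    print(f"{name}: r = {r:.3f}, t = {t:.2f}, exceeds critical value: {t > T_CRIT_05}")
```

On these assumptions the self-rating yields a t of about 1.65, a shade under the critical value, which is consistent with the description of that correlation as being on the very borderline of significance.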


Table 1. Correlations between three measures of achievement motivation and criterion

                                  1      2      3      4      5      6      7
1. Ray-Lynn                    1.00    .64    .58    .32    .39    .33    .35
2. Costello                           1.00    .42    .24    .39    .26    .34
3. Self-rating                               1.00    .19    .33    .40    .24
4. Peer-rating                                      1.00    .66    .31    .40
5. ORAM                                                    1.00    .23    .20
6. Status (Congalton)                                             1.00    .62
7. Manual/non-manual Occup.                                              1.00


Clearly, the theoretical expectation has been borne out. Multi-item scales do consistently provide an increment in validity. Against the most direct criterion this increment was very substantial indeed. Against the more general criterion, although there was an increment, it was perhaps surprisingly small in magnitude.

Of perhaps equally great interest is the prediction afforded of theoretically related variables - in this case occupational status of the respondent. Although not really a criterion of achievement motivation, occupational status attained is something that we would expect to be affected by achievement motivation.

In the present study, occupational status of all respondents was scored using the Congalton (1969) scale and the correlations found were: Ray-Lynn scale .330, Costello scale .262 and Self-rating .395. It should be noted however that previous studies have suggested (Ray, 1971) that the Congalton scale is of only poor validity and that the simple dichotomization of occupation into manual versus non-manual is a much more powerful predictor of theoretically class-related variables. Scoring occupation then as manual = 1 and non-manual = 2, the correlations observed became .347, .336 and .242 respectively. This reinstates the multi-item scales as being also in this case more valid.
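The dichotomous scoring described above amounts to an ordinary product-moment correlation computed with occupation coded 1 or 2. A minimal sketch follows; the helper names and the four-person data set are hypothetical and serve only to show the coding step, not to reproduce the correlations reported.

```python
import math

def pearson_r(x, y):
    """Plain product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def code_occupation(kind):
    """Score occupation as manual = 1, non-manual = 2."""
    return 1 if kind == "manual" else 2

# Hypothetical respondents: occupation type and a scale score.
occupations = ["manual", "manual", "non-manual", "non-manual"]
scale_scores = [10, 12, 15, 17]

coded = [code_occupation(o) for o in occupations]
print(round(pearson_r(coded, scale_scores), 3))
```

With a two-valued variable this is the point-biserial correlation, so a positive coefficient simply means that non-manual respondents scored higher on average.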

Comparison aside, it may be noted that all the correlations reported were in absolute terms quite small. This should not be taken as representing the maximum validity of the measures. There is error in both the criterion and the predictor variables and it may in fact be particularly difficult for raters to assess as private an attribute as need for achievement. There is also the problem that not all respondents are rated by the same rater and different raters may disagree on what is an instance of a person with high need for achievement. For all these reasons it must be said that low correlations in absolute terms do not necessarily indicate seriously deficient validity. The important thing is to show that a variety of measures purportedly of the same thing do show convergence.
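The effect of error in both predictor and criterion can be made concrete with the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. This correction is not applied in the study itself; the sketch below is an illustration only, using the alpha values and ORAM correlations reported earlier, and the function name is hypothetical.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation of an observed correlation."""
    return r_xy / math.sqrt(rel_x * rel_y)

ORAM_RELIABILITY = 0.85  # alpha reported for the ORAM criterion

# (observed r with ORAM, alpha of the scale), as reported in the text.
scales = {"Ray-Lynn": (0.390, 0.74), "Costello": (0.392, 0.75)}
for name, (r, alpha) in scales.items():
    corrected = disattenuate(r, alpha, ORAM_RELIABILITY)
    print(f"{name}: observed r = {r:.3f}, corrected r = {corrected:.3f}")
```

Both scale correlations rise to roughly .49 on this reckoning. No comparable figure can be computed for the single self-rating, since a one-item measure offers no internal-consistency estimate of its own reliability.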

In conclusion, then, it must be said that the single trait self rating has fared surprisingly well in terms of validity. As a quick form of measurement it will no doubt make possible many studies where the use of multi-item scales would have been difficult. It is equally clear, however, that greater validity can be obtained with multi-item scales and these consequently should continue to be used where possible.

The present conclusion cannot of course be strictly generalized beyond the trait studied here - need for achievement. It is notable that in the present study, neither the scales nor the single self rating correlated significantly with a short form of the Marlowe-Crowne Social Desirability Scale (see Greenwald & Satow, 1970) also included in the questionnaire. Where the trait is Social Desirability loaded (e.g. "Neuroticism"), the single self rating might have to be totally precluded.


CONGALTON, A.A. Status and Prestige in Australia. Melbourne: Cheshire, 1969.

COSTELLO, C.A. Two scales to measure achievement motivation. Journal of Psychology 1967, 66, 231-235.

CRONBACH, L.J. Coefficient alpha and the internal structure of tests. Psychometrika 1951, 16, 297-334.

GORSUCH, R.L. & McFARLAND, S.G. Single vs. multiple item scales for measuring religious values. Journal for the Scientific Study of Religion 1972, 11, 53-64.

GREENWALD, H.J. & SATOW, Y. A short social desirability scale. Psychological Reports 1970, 27, 131-135.

HOLLANDER, D.S. The reliability of peer ratings under various conditions of administration. Journal of Applied Psychology 1957, 41, 85-90.

HOLMES, D.S. Conscious self-appraisal of achievement motivation: The self peer rank method revisited. Journal of Consulting & Clinical Psychology 1971, 36, 23-26.

LYNN, R. An achievement motivation questionnaire. British Journal of Psychology 1969, 60, 529-534.

RAY, J.J. Christianism.... The Protestant ethic among unbelievers. Journal of Christian Education 1970, 13, 169-176.

RAY, J.J. The questionnaire measurement of social class. Australian & New Zealand Journal of Sociology 1971, 7(April), 58-64.

RAY, J.J. A new reliability maximization procedure for Likert scales. Australian Psychologist 1972, 7, 40-46.

RAY, J.J. A behavior inventory to measure achievement motivation. Journal of Social Psychology 1975, 95, 135-136.

STANTON, H.E. Rating or inventory: A comparison of two approaches to personality measurement. Australian Psychologist 1972, 7, 33-39.

TAYLOR, J.B., PTACEK, M., CARITHERS, M., GRIFFIN, C. & COYNE, L. Rating scales as measures of clinical judgment III: Judgments of the self on personality inventory scales and direct ratings. Educational & Psychological Measurement 1972, 32, 543-557.

TITUS, H.E. F scale validity considered against peer nomination criteria. Psychological Record 1968, 18, 395-403.

