Personality Study and Group Behaviour, Vol. 7, No. 1, January, 1987. Pp. 68-70.


J. J. Ray

University of New South Wales

ABSTRACT: Although it is a widely held notion that "Yes-No" questions in survey research yield information of very dubious accuracy and usefulness, such questions are nonetheless widely used. It is held that such questions can yield highly valid information if used in aggregated form. An aggregate of questions designed to measure 'Directiveness' was given to a community sample of Australians and the same people were assessed on a variety of criteria by people who knew them. Aggregate self-ratings and aggregate peer-ratings were found to correlate .79, indicating a very high degree of validity for the self-reports. In the context of previous such results it is concluded that the usefulness of self-reports can be great if certain simple precautions are taken.

It seems to be a common view among psychologists that there are several limitations to questionnaire research in general and to multiple-choice questions in particular. It is thought that questions requiring 'Yes-No' answers are inherently ambiguous and incapable of portraying the true complexity of what any one interviewee might want to express. It is held that the answers to such questions must inevitably have very little validity, i.e., the information so obtained will be highly dubious.

On the other hand, psychologists have always made extensive use of such questions in their research, so one must suspect that such questions can have some usefulness. But how much? Standard psychological personality inventories such as the Jackson (1967) 'PRF' have had extensive tests done to assess their validity, and many of the validity coefficients mentioned in the manual for the inventory are as low as .3. In other words, only 9% overlap (the square of .3) has been found between the view of reality provided by the inventory and the view of reality obtained from independent sources. This does suggest highly unsatisfactory validity for the inventory.

This is, however, a very superficial view of what is found in validity studies of self-report scales. Psychologists very seldom use single questions to assess any given attribute of a person, on the grounds that variables as complex as (say) conservatism need at least as many questions to survey them as there are aspects to the variables. A complex concept needs to be measured in complex ways. Many questions therefore will ordinarily be needed in order to get an accurate picture of where any one individual stands on some particular issue. In other words, psychologists normally take little interest in the answers to individual questions and instead look at the overall standing of a person on an aggregate of questions (such aggregates normally being referred to as "scales"). The same logic, however, should surely guide our assessments of the validity of such scales. A scale measuring a general trait cannot be expected to be a strong predictor of just one particular sort of behaviour. Rather we should expect it to be a strong predictor of a wide range of behaviours. In other words, complex criteria are needed to assess the validity of complex scales. If we want to make a judgment of the overall validity of a scale we must use methods that enable us to make overall judgments. To observe that scale Y predicts individual behaviour item X only to a small degree could be quite consistent with a further finding that scale Y predicts behaviours of the class X quite well. A demonstration of this is offered below.
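The aggregation argument can also be illustrated with a small simulation, quite separate from the empirical data reported in this paper. The sketch below is hypothetical throughout: the latent trait, the noise levels, the loading of 0.4, and the function names `pearson` and `simulate` are all assumptions chosen for illustration. It generates a self-report scale and six behaviours that each reflect the same trait only weakly, then compares the scale's correlation with each single behaviour against its correlation with the sum of the behaviours.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_people=2000, n_behaviours=6, seed=1987):
    """Return (single-behaviour correlations, aggregate correlation).

    All numeric settings here are illustrative assumptions, not values
    taken from the study.
    """
    rng = random.Random(seed)
    # Hypothetical latent trait, e.g. "directiveness".
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    # Multi-item self-report scale: the trait plus modest measurement error.
    scale = [t + rng.gauss(0, 0.5) for t in trait]
    # Each single behaviour reflects the trait weakly and is mostly noise.
    behaviours = [
        [0.4 * t + rng.gauss(0, 1) for t in trait]
        for _ in range(n_behaviours)
    ]
    # Aggregate criterion: simple additive sum across the behaviours.
    aggregate = [sum(items) for items in zip(*behaviours)]
    single_rs = [pearson(scale, b) for b in behaviours]
    aggregate_r = pearson(scale, aggregate)
    return single_rs, aggregate_r


if __name__ == "__main__":
    single_rs, aggregate_r = simulate()
    print("single-behaviour correlations:", [round(r, 2) for r in single_rs])
    print("aggregate correlation:", round(aggregate_r, 2))
```

Under these assumptions the scale correlates only modestly with any one behaviour but noticeably more strongly with their sum, because the unrelated noise in the individual behaviours tends to cancel when they are added together.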


A recent paper by Ray & Lovejoy (1986) is one of those that would appear at first sight to show poor validity for self-reports. Correlations between the Ray Directiveness scale and peer ratings of around .5 are reported, suggesting only 25% shared variance between scale and criterion. In fact, however, such an impression would be misleading.

If we combine the six most predictive criteria (ratings) into an additive scale, we find that we now have a criterion with a reliability (alpha) of .82. We now have a multi-item criterion to match the complexity of our multi-item scale. Only now, therefore, can we answer a query about the overall validity of our self-report scale. The answer is that the self-report and rating scales correlate .64. After correction for attenuation due to the imperfect reliability of both scales (Guilford, 1954, p. 400), this correlation rises to .79. We have thus shown a nearly two-thirds overlap between self-reports and the validity criterion.
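The arithmetic of the correction can be sketched as follows, using the standard correction-for-attenuation formula (observed correlation divided by the square root of the product of the two reliabilities). One caveat: the text gives the criterion reliability (.82) but not the reliability of the Directiveness scale itself. The value of about .80 used below is therefore an assumption, back-calculated from the reported .64 and .79 rather than taken from the paper, and the function names are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between two measures if both were
    perfectly reliable: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


def implied_reliability(r_xy, r_corrected, rel_known):
    """Back-solve the one reliability that is not reported, given the
    observed and corrected correlations and the other reliability."""
    return (r_xy / r_corrected) ** 2 / rel_known


def shared_variance(r):
    """Proportion of variance two measures share: the squared correlation."""
    return r ** 2


if __name__ == "__main__":
    # Reported in the text: observed r = .64, criterion alpha = .82,
    # corrected r = .79. The scale reliability is NOT reported; it is
    # inferred here purely for illustration.
    rel_scale = implied_reliability(0.64, 0.79, 0.82)
    print("implied scale reliability:", round(rel_scale, 2))
    print("corrected r:", round(correct_for_attenuation(0.64, 0.80, 0.82), 2))
    print("shared variance at r = .79:", round(shared_variance(0.79), 2))
```

With an assumed scale reliability of .80, the corrected coefficient rounds to the reported .79, and squaring .79 gives roughly .62, which is the "nearly two-thirds" overlap referred to above.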

Even this result could presumably have been improved (via partial correlation) if control for social desirability responding had been attempted. Be that as it may, however, it is evident that self-reports can be very much more valid than they at first appear. The main thing required is that both our self-report scales and our validity criteria match in complexity the concepts they purport to measure. Single, isolated self-report items may indeed be very problematical but carefully constructed and reliable multi-item scales do substantially overcome such problems.

Further evidence in support of that view is to be found in Epstein (1979, 1980), Fishbein & Ajzen (1974), Green (1978), Guilford (1954), Jaccard (1974), Moskowitz (1982) and Cheek (1982).


