The Journal of Social Psychology, Volume 123, Second Half, August 1984. Pp. 279-280.

The Effect of Collapsing Response Categories on the Balanced F Scale


School of Sociology, University of New South Wales

THERE IS LITTLE UNIFORMITY of practice regarding the number of response categories offered to respondents when attitude scale items are administered. The authors of the California F scale (1) used seven (Strongly agree, Agree, Agree some, "?," Disagree some, Disagree, Strongly disagree), but five categories (omitting "Agree some" and its negative) are also common. Particularly with behavior inventories, a simple "Yes," "?," "No" format is also often used. It has been argued that the more numerous options may confer slight advantages in validity and reliability (2), but their effect on mean scores is an open question. A common way of comparing scale means from two populations that received different response options is to "collapse" all responses to their lowest common denominator. Thus all types of "Agree" response might be scored 3 and all types of "Disagree" response scored 1, so that different degrees of agreement (or disagreement) are treated as equivalent. The "?" (not sure) response in the above example would be scored 2. Such a collapsing procedure is clearly capable of considerable distortion: information is lost, and responses that are known to be different are treated as the same. It is, therefore, merely an alternative to consider when the only other option would be to conclude that no comparison at all is possible.
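A minimal Python sketch of the lowest-common-denominator scoring described above may make the loss of information concrete. The category labels follow the seven-option F scale format. The names `COLLAPSED_SCORE`, `collapsed_item_score`, `collapsed_total` and `KEYING`, and the alternating keying of the 14 balanced items, are hypothetical illustrations; the actual keying of the BF items is not given here. Under this scheme, a respondent who answers every item at the extreme and one who answers every item mildly in the same direction receive identical totals.

```python
# Lowest-common-denominator ("collapsed") scoring: every agree-type response
# scores 3, "?" scores 2, every disagree-type response scores 1.
COLLAPSED_SCORE = {
    "Strongly agree": 3, "Agree": 3, "Agree some": 3, "Yes": 3,
    "?": 2,
    "Disagree some": 1, "Disagree": 1, "Strongly disagree": 1, "No": 1,
}

def collapsed_item_score(response: str, reverse_keyed: bool = False) -> int:
    """Score one item; reverse_keyed flips 3 <-> 1 for negatively worded items."""
    score = COLLAPSED_SCORE[response]
    return 4 - score if reverse_keyed else score

def collapsed_total(responses, keying):
    """Sum of collapsed item scores over a whole scale."""
    return sum(collapsed_item_score(r, k) for r, k in zip(responses, keying))

# Hypothetical balanced keying: every second item of 14 is reverse-keyed.
KEYING = [i % 2 == 1 for i in range(14)]

# Two respondents who differ on every item but lean the same way.
strong = ["Strongly disagree" if k else "Strongly agree" for k in KEYING]
mild = ["Disagree some" if k else "Agree some" for k in KEYING]

print(collapsed_total(strong, KEYING), collapsed_total(mild, KEYING))
```

Both respondents score 42, the maximum for 14 items under 3, 2, 1 scoring, even though their answers differ on every item.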

It is necessary to consider whether or not collapsing is empirically misleading. Psychological data are usually analyzed with parametric statistics even though they seldom meet the assumptions of such statistics perfectly; studies such as those by Norton (3) have shown that these statistics are little affected by a wide range of violations of their assumptions. A similar question might therefore fruitfully be asked about "collapsed" or "lowest common denominator" scoring of attitude scale items. How much, in fact, does collapsing affect comparisons of means?

Initially, two administrations of the 14-item short form of the California F scale in balanced format (the "BF" scale (4)) were compared. On both occasions the sample was a random doorstep sample of the Australian city of Sydney. The first sample (N = 95) received the scale with five response options and the second (N = 206) with three (5). The mean and SD of the second sample were 26.49 and 5.44, and the "equivalent" mean of the first sample (i.e., after collapsed scoring of "agrees" and "disagrees") was 24.17 (SD = 5.83). The shorter response format thus produced more "Rightist" scores by roughly half an SD.
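The size of this difference in SD units can be checked from the reported summary statistics. The sketch below assumes a pooled SD as the standardizer, since the text does not say which SD was used; the function names are illustrative only.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent samples."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference (m1 - m2) / pooled SD."""
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# Sydney three-option sample vs. collapsed five-option sample, as reported.
d_city = cohens_d(26.49, 5.44, 206, 24.17, 5.83, 95)
print(f"d = {d_city:.2f}")
```

With the pooled SD this gives d of about 0.42; dividing the 2.32-point difference by either sample's SD alone gives a value between about 0.40 and 0.43.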

As such a finding could be adventitious, it was retested with a different sampling method: postal (mail-out) sampling. In the first mail-out sample (N = 201), respondents throughout Australia received questionnaires. Rural respondents were therefore now included, so more conservative responding than in the first two (city) samples was to be expected. Five response options were offered. The mean (in 3, 2, 1 scoring) was 27.57 (SD = 5.71), more than half an SD more Rightist than that of the five-option city sample. When, however, a second mail-out sample also including rural areas (N = 172) was given seven response options, an even higher collapsed BF mean was observed: 29.25 (SD = 5.29).
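Whether differences of this size could plausibly be adventitious can be roughly gauged with a Welch two-sample t test computed from the reported means, SDs, and Ns. This is a sketch, not an analysis reported in the article. It uses a normal approximation for the p value, which is adequate when the degrees of freedom run into the hundreds as here, and it addresses only sampling error, not the other differences between samples (region, sampling method) discussed in the text. The sample labels and function names are illustrative.

```python
import math
from statistics import NormalDist

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = s1 ** 2 / n1  # squared standard error of the first mean
    v2 = s2 ** 2 / n2  # squared standard error of the second mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def approx_two_sided_p(t):
    """Normal approximation to the two-sided p value (large df only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# (label, mean, SD, N) as reported, all in collapsed 3, 2, 1 scoring.
CITY_3 = ("Sydney, 3 options", 26.49, 5.44, 206)
CITY_5 = ("Sydney, 5 options", 24.17, 5.83, 95)
NAT_5 = ("Australia-wide mail, 5 options", 27.57, 5.71, 201)
NAT_7 = ("Australia-wide mail, 7 options", 29.25, 5.29, 172)

COMPARISONS = [(CITY_3, CITY_5), (NAT_5, CITY_5), (NAT_7, NAT_5)]

results = {}
for a, b in COMPARISONS:
    t, df = welch_t(*a[1:], *b[1:])
    results[(a[0], b[0])] = (t, df)
    p = approx_two_sided_p(t)
    print(f"{a[0]} vs {b[0]}: t = {t:.2f}, df = {df:.0f}, p (normal approx.) = {p:.4f}")
```

All three t values come out between roughly 2.9 and 4.7, so sampling error alone seems an unlikely explanation for any of the three differences.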

Thus samples that should give quite similar results in fact give very different results depending on the number of response options originally offered. "Collapsing" scores to enable comparisons between groups receiving different numbers of response options therefore appears to be a potentially seriously misleading procedure, at least with the F scale.



