This article was written in 1989 for the academic journals but was not accepted for publication.


J.J. Ray

University of New South Wales


The many severe problems inherent in the use of forced-choice scales are briefly summarized. Such problems mean that prudent researchers should avoid use of forced-choice scales. To do this they need alternatives to popular forced-choice scales. Two new scales for measuring aspects of sensation seeking are described and reliability, validity and general population norms for them are reported -- based on a random sample (N = 200) of the Australian population.


The recent paper by Johnson, Wood & Blinkhorn (1988) is merely one of many papers from psychometricians (e.g. Ray, 1973; Tenopyr, 1988) that have pointed out in increasingly dismayed tones that the use of forced-choice scales results in highly problematical measurements. Forced-choice scales have, however, long been popular among those doing psychological research so it is surely sad if results derived from them are as "spurious" as Johnson et al say they are. It may be, however, that Johnson et al understate the case. Let us in passing look briefly at some of the problems mentioned in the literature that Johnson et al did not touch on or touched on only lightly.

Before we do so, however, one small point of clarification seems needed. There are two major types of scale that are generally referred to as "forced-choice". The first is where a choice is made between items of two different scales and the second is where the items chosen between purport to be at opposite poles of the same concept. It is scales of the latter type that seem most common (e.g. Rotter, 1966; Bass, 1967; Berkowitz & Wolkon, 1964; Christie & Geis, 1970; Zuckerman, Eysenck & Eysenck, 1978) and they are therefore the principal subject of this paper. They are probably more correctly referred to as "polar choice" scales.

The superficial attraction of the forced-choice format seems to be that it enables us to "force" from people information that they would not otherwise express. Yet is that not chimerical? In our research we usually rely totally on the voluntary co-operation of our subjects. The idea that we can "force" people to do anything under such circumstances is therefore inherently rather implausible. Even where participation in research is compulsory (as with Army conscripts) subjects very readily find ways of not giving information -- by using zig-zag response sets and the like. Ray (1974, Ch. 43 Study I) gives some idea of the extent of this problem. "Forcing" people to make a choice may just lead to evasion, so that no information is gained where some otherwise might have been. In a word, validity will be reduced -- just the opposite of what was intended.

As some confirmation that this actually happens, it might be noted that Ray (1973) details an instance where converting a scale of task-orientation from forced-choice to Likert format gave greatly improved validity and Ray (1980c) reported that a Likert scale was more valid as a measure of achievement motivation than was a well-developed alternative instrument in forced-choice format.

There is also evidence that popular forced-choice scales can have negligible internal consistency. This has been shown for at least the Zuckerman et al (1978) Sensation-Seeking scale (Ridgeway & Russell, 1980), the Shostrom (1964) Personal Orientation Inventory (Ray, 1984b & 1986; Hattie, 1986) and the Christie & Geis (1970) Machiavellianism scale (Shea & Beatty, 1983). If the items of a scale show little or no tendency to correlate with one another it is difficult to see that the scale can be valid in any sense.

Ray (1973) also sets out at some length details of other problems with forced-choice format. Briefly, these are the difficulty of equating the alternative choices for social desirability and the impossibility of routinely examining how "opposite" the alternatives in fact are.

This latter problem might not matter much if it were easy to write items that were "opposite" in import. In fact it is far from easy (Christie, Havel & Seidenberg, 1956). As has repeatedly been shown, the supposedly "opposite" items of popular forced-choice scales turn out on examination to be almost entirely orthogonal (e.g. Ray, 1973; Gatz & Good, 1978; Kleiber, Veldman & Menaker, 1973; Ray, 1983b). Such scales are, then, greatly lacking in construct validity. They do not meet the most basic assumption inherent in their use. The items demonstrably mean something other than what they purport to mean.

With a Likert scale, by contrast, such problems can at least be routinely examined. One can find the correlation between the positively worded and negatively worded subscales on any occasion when the scale is used. As Ray (1983a) has shown at some length, this is important. Items that are perceived as of opposite import by one sample on one occasion can very often be perceived as completely unrelated by other samples on other occasions. Perceived meanings at the item level seem to be quite unstable and any assumption that two statements will routinely be seen as opposed is quite untenable. We cannot reasonably make such an assumption on any occasion of a scale's use. We must be able to examine the degree of opposition between statements. We cannot assume it. Forced-choice scales do assume it, however.
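The routine check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the item indices and the toy responses are hypothetical and are not taken from any of the studies cited. The subscale totals are deliberately left unreversed, so that items which really are seen as opposed should yield a strongly negative correlation, while a correlation near zero would indicate that the two sets of items are being treated as unrelated.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson product-moment correlation between two lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subscale_opposition(responses, positive_idx, negative_idx):
    """Correlate the raw (unreversed) positive and negative subscale totals."""
    pos = [sum(row[i] for i in positive_idx) for row in responses]
    neg = [sum(row[i] for i in negative_idx) for row in responses]
    return pearson_r(pos, neg)


# Hypothetical toy data: five respondents, two positively worded items
# (columns 0 and 1) and two negatively worded items (columns 2 and 3),
# each answered on a 1-5 Likert scale.
responses = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [2, 1, 4, 5],
    [1, 2, 5, 4],
    [3, 3, 3, 3],
]

r = subscale_opposition(responses, positive_idx=[0, 1], negative_idx=[2, 3])
print(round(r, 2))
```

In this artificial case the two subscales are perfect mirror images of one another. With real data the point is simply that the figure can be recomputed on every occasion the scale is used, which is not possible with the forced-choice format.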

The remaining major problem with forced-choice scales also stems from between-sample variability. An obvious precaution taken by most constructors of forced-choice scales is to equate the choices in terms of social desirability. As has been shown elsewhere (Orvik, 1972; Ray, 1973), however, items that are equally desirable for one group on one occasion are often not equally desirable for another group on another occasion. As further confirmation of this, Kestenbaum (1976) has shown that the paired items of the forced-choice Rotter (1966) Locus of control scale are poorly matched for social desirability. Under such circumstances, the scale will become at least in part a measure of social desirability rather than a measure of what it purports to measure. Ray (1973) documents the great distortions in validity that this can cause.

Sensation seeking

The review given above of the literature on the problems inherent in forced-choice scales is only cursory and incomplete, but it surely suffices to strengthen yet further the Johnson et al (1988) contention that such scales are fundamentally problematical. Why, then, are they still widely used? One reason why many researchers use demonstrably faulty and essentially uninterpretable forced-choice scales is probably that good alternatives to them do not exist. One instance of this would seem to be the measurement of sensation seeking -- where the forced-choice scale by Zuckerman, Eysenck & Eysenck (1978) is widely used. Most issues of the journal Personality and Individual Differences seem to contain at least one study that used this scale. As would be expected from the general considerations presented above, however, this scale has been found to have most unfortunate psychometric characteristics. Ridgeway & Russell (1980) reported quite unacceptably low reliabilities for the various sub-scales (alphas as low as .44) and the supposedly related sub-scales were found to have very little relationship at all. Given a scale that is at once widely used and quite unsuitable for use, the provision of alternatives to it must be seen as at least worthwhile.

Fortunately the provision of an alternative scale in Likert format is rather easy. Ray (1984a) reported a study wherein two scales in Likert format were used to measure experience-seeking. In Zuckerman's schema (Zuckerman, Eysenck & Eysenck, 1978), experience-seeking is one of four components of sensation seeking but on any analysis it should be at least a major part of sensation seeking.

Of these two scales, only one has had any details given in print, and those details did not include a full item listing. The publication concerned (Wilson, 1973), however, has little or no circulation outside Australia, and the version of the scale used in Ray (1984a) was in any case a revised version. It would now seem opportune, therefore, to give full and previously unavailable details of both scales.


The first scale was initially constructed by taking the positive items of the Zuckerman experience-seeking sub-scale and adding to them a further set of items of hopefully opposite import. The two types of item were then administered as a single Likert scale. This scale was then analyzed to find items which correlated poorly with the scale total and such items were deleted from further consideration. The remaining items formed a new scale with a reliability (alpha) of .78.
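The item-analysis procedure just described can be illustrated with a short Python sketch. Several things here are assumptions rather than details of the original analysis: the use of the corrected item-total correlation (each item against the sum of the remaining items), the rule of simply dropping the single weakest item, the function names and the toy data are all hypothetical. Responses are assumed to have been reverse-scored already where necessary.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Plain Pearson product-moment correlation between two lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of keyed scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def item_rest_correlations(rows):
    """Correlation of each item with the total of all the other items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson_r(item, rest))
    return result


# Hypothetical toy data: six respondents, four items, 1-5 Likert scores.
rows = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [3, 3, 4, 1],
    [2, 2, 1, 4],
    [1, 2, 2, 3],
    [4, 5, 5, 3],
]

alpha_before = cronbach_alpha(rows)
correlations = item_rest_correlations(rows)

# Illustrative rule only: drop the one item that correlates worst.
weakest = min(range(len(correlations)), key=correlations.__getitem__)
trimmed = [[v for j, v in enumerate(row) if j != weakest] for row in rows]
alpha_after = cronbach_alpha(trimmed)

print(weakest, round(alpha_before, 2), round(alpha_after, 2))
```

In the toy data the last item runs against the others, so removing it raises alpha. With real data the cut-off for a "poor" correlation is a judgment call, and any deletions would ideally be cross-validated on a fresh sample.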

As the sample used for the study was a random sample (N = 200) of the Australian population (See Ray, 1980a & b and Ray, 1984a for fuller particulars) it is also possible to present some norms for the new scale. The mean observed was 26.40 and the S.D. was 8.07.

Also included in the study was the Wilson (1973) Experience-seeking scale. This was included as a thematic balance to the modified Zuckerman scale. The Zuckerman scale tends to focus on experiences of a rather counter-cultural sort (such as marijuana use) whereas the Wilson scale focuses on experiences offered by the consumer society. This scale was also item-analyzed and had weak items removed.

The resultant scale had a reliability of .73 and its mean and S.D. were 36.07 and 6.82. The items of both scales are given in the Appendix.


It would probably be generally most informative if the two scales were always used in conjunction with one another but were scored separately. Different sub-sets of experience-seeking could well have slightly different correlates. As the two scales were, however, found to correlate .35, it would be perfectly reasonable to treat all the items from both scales as forming one single scale of experience-seeking in general. The greater length of such a scale would also give considerably enhanced reliability (alpha).
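A rough idea of how much reliability the combined scale might gain can be had from the figures reported above. The sketch below is an estimate only, not a result from the study: it assumes the standard formula for the reliability of an unweighted sum of two components with uncorrelated errors (sometimes called stratified alpha), and it treats the published alphas, standard deviations and inter-scale correlation as exact. The function name is hypothetical.

```python
def composite_reliability(sd_a, sd_b, alpha_a, alpha_b, r_ab):
    """Reliability of the unweighted sum of two scales.

    Assumes the error components of the two scales are uncorrelated, so
    that error variance simply adds while true-score covariance
    contributes to the variance of the total.
    """
    var_a = sd_a ** 2
    var_b = sd_b ** 2
    covariance = r_ab * sd_a * sd_b
    total_var = var_a + var_b + 2 * covariance
    error_var = var_a * (1 - alpha_a) + var_b * (1 - alpha_b)
    return 1 - error_var / total_var


# Values as reported in the text for the two revised scales.
estimate = composite_reliability(
    sd_a=8.07, sd_b=6.82,        # standard deviations
    alpha_a=0.78, alpha_b=0.73,  # reliabilities (alpha)
    r_ab=0.35,                   # correlation between the two scales
)
print(round(estimate, 2))
```

Under these assumptions the estimate comes to roughly .82, a little above either scale taken alone. An alpha computed directly from the item-level data for all 22 items could differ somewhat from this figure.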


As both scales presented here are attempts to improve on existing scales that already had some claims to validity, the expectation should be that the improved scales would indeed have improved validity. This can be examined in a fairly preliminary way by looking at their correlation with other scales to which they should theoretically have some relationship. Also included in the test battery with the two modified scales were early forms (See Ray, 1980a & b for details) of the Eysenck Impulsiveness and Sociability scales. These scales originated as the two main factors of extraversion and Eysenck appears to claim (see Eysenck & Zuckerman, 1977) that extroverts are sensation seekers. The present work would seem to vindicate that claim. The modified Zuckerman scale was found to correlate .32 with the Impulsiveness scale. This is against the background that the Impulsiveness scale and the Sociability scale themselves correlated only .40. Even the Sociability scale showed some correlation (.22) with the modified Zuckerman scale. Both correlations are higher than the negligible correlations generally found (Eysenck & Zuckerman, 1977; Corulla, 1988) when the forced-choice Zuckerman scale has been used.

Although the present scales offer a distinct improvement over their forced-choice predecessors, they are, of course, a beginning rather than an endpoint. They need to be joined by other Likert scales that tap other types of sensation-seeking (e.g. thrill and adventure seeking) before really comprehensive research into sensation-seeking can begin.


Appendix

Adapted Zuckerman Experience-seeking scale:

1. If you have a happy home there is not much more you need in life. R

2. Happily married people should never feel lonely. R

3. If I can choose I always prefer to stay in a motel or hotel when on holidays rather than camp out. R

4. I wouldn't change my job unless it is absolutely necessary. R

5. I would like to hitchhike across the country.

6. I have tried marijuana or would like to.

7. I would like to try some of the new drugs that produce hallucinations.

8. I would like to make friends in some of the "far out" groups like artists and hippies.

9. I would like to meet some persons who are homosexual (men or women).

10. I would like to have new and exciting experiences even if they are a little frightening, unconventional or illegal.

11. I often enjoy flouting irrational authority.

12. I sometimes like to do crazy things just to see the effect they have on others.

Revised Wilson Experience-seeking scale:

1. I prefer the look of stainless steel and plastic to the look of leather and woven fabrics. R

2. Trying out new products is usually a waste of time. R

3. I enjoy many types of foreign food.

4. I find it more pleasant to watch TV in the evenings than to make the effort and go out to a show. R

5. I prefer friends who are exciting and unpredictable.

6. One way I like to entertain friends is by having dinner parties.

7. I like to eat at new and strange restaurants.

8. I like to drink wine with my meals.

9. I like modern Swedish-style furniture.

10. I am always ready to try new and different products that come on to the market.


Each item is responded to on a five-point scale scored from 5 to 1. Answers are given by circling numbers, as follows: "Yes --- for sure" 5; "Yes with reservations" 4; "Can't say either way" 3; "No, not really" 2; "Definitely No" 1. The number circled is the item score except in the case of the seven items above marked "R". To get the item score for these, the number circled must be subtracted from 6.
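For anyone scoring the scales by computer, the rule above can be expressed as the following Python sketch. The item numbers marked as reversed are taken from the "R" markings in the two lists above; the function and constant names are hypothetical, and the validity checks are an added convenience rather than part of the original procedure.

```python
# Item numbers (1-based) marked "R" in the two item lists above.
ZUCKERMAN_REVERSED = {1, 2, 3, 4}   # adapted Zuckerman scale, 12 items
WILSON_REVERSED = {1, 2, 4}         # revised Wilson scale, 10 items


def score_scale(circled, reversed_items, n_items):
    """Return the scale total from the numbers a respondent circled.

    circled        -- list of circled numbers (1 to 5) in item order
    reversed_items -- set of 1-based item numbers that are reverse-scored
    n_items        -- expected number of items in the scale
    """
    if len(circled) != n_items:
        raise ValueError(f"expected {n_items} answers, got {len(circled)}")
    total = 0
    for number, answer in enumerate(circled, start=1):
        if answer not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {number}: answer must be 1 to 5")
        # Reversed items are scored as 6 minus the number circled.
        total += (6 - answer) if number in reversed_items else answer
    return total


# A respondent who circles 3 ("Can't say either way") throughout
# lands on the midpoint of each scale.
neutral_zuckerman = score_scale([3] * 12, ZUCKERMAN_REVERSED, 12)
neutral_wilson = score_scale([3] * 10, WILSON_REVERSED, 10)

# A respondent who circles 5 throughout gains no credit on reversed items.
yes_zuckerman = score_scale([5] * 12, ZUCKERMAN_REVERSED, 12)
yes_wilson = score_scale([5] * 10, WILSON_REVERSED, 10)

print(neutral_zuckerman, neutral_wilson, yes_zuckerman, yes_wilson)
```

Possible totals therefore run from 12 to 60 on the adapted Zuckerman scale and from 10 to 50 on the revised Wilson scale. Note that indiscriminate "Yes" answering is pulled back toward the midpoint by the reversed items.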

Replication is one of the cornerstones of science. A new research result will normally require replication by later researchers before the truth and accuracy of the observation concerned is generally accepted. If a result is to be replicated, however, careful specification of the original research procedure is important.

In questionnaire research it has been my observation that the results are fairly robust as to questionnaire format. It is the content of the question that matters rather than how the question is presented (though exceptions have been noted). It is nonetheless obviously desirable for an attempted replication to follow the original procedure as closely as possible, so I have given here samples of how I presented my questionnaires in most of the research I did. On all occasions, respondents were asked to circle a number to indicate their response.

