Australian Journal of Psychology Vol. 26, No. 1, 1974, pp. 9-14

(With a post-publication addendum following the original article)



University of New South Wales

Balancing of one-way worded scales such as the D scale has been sought by several authors. So far only the Ray (1970) BD scale has appeared successful. It is here examined whether the 1970 success can be replicated and also whether a new scale can be produced which will be suitable for use with community samples -- not only with students. The suitability of the 1970 "Mark I" version of the BD scale for use with students was confirmed and a new "Mark II" BD scale was produced which was suitable for general use. An attempt to construct an eclectic scale using the best items of Mark I and Mark II was not successful.

The problem of acquiescent set has inspired many attempts to produce balanced versions of the F (Adorno et al., 1950) and 'D' (Rokeach, 1960) scales. See, for instance, Peabody (1961), Stanley & Martin (1964) and Haiman (1964). The poor success that for long attended such efforts inspired some writers to conclude that the scales concerned were "irreversible" (Christie, Havel & Seidenberg, 1956). Ray (1970 & 1972b) has, however, shown that this is not so -- by producing balanced Dogmatism (BD) and balanced F (BF) scales that showed both satisfactory reliability and high correlations between their oppositely-worded halves.

Unlike the BF scale mentioned above, however, the BD scale was normed entirely on a university student sample. This does raise the question of its suitability for use on non-university samples. Ray (1970) does report one application of the scale to a non-university sample where the reliability obtained did drop substantially and where the correlation between the two halves of the scale dropped to a level little better than that found by previous authors.

This raises the possibility that the results obtained with the university sample reported on in Ray (1970) were in some sense "freak" results and, as such, not likely to be replicated. There is in any case a clear need to see whether it is at all possible to construct a BD scale which is suitable for general, as well as student, use. It is proposed to examine both these questions in the present paper.

Initially, it must be reiterated that there is almost always a drop in reliability when a new scale is administered to samples other than its own norming sample. This is because an item analysis has no way of distinguishing "error" correlation (due to sampling variability) from "true" correlation and must in fact maximize the inter-item correlation in the group of items chosen, regardless of the origin of that correlation. In a subsequent administration of the scale so constructed, however, the error component of the correlation has of course changed, and correlation from this source is no longer maximized by the set of items used. The practical consequence of this phenomenon, then, is that one always seeks some replication before considering a freshly constructed scale suitable for regular use -- the hope always being that although some drop in reliability is to be expected, it will not be great.


In this study, the 1970 BD scale in its original format was administered by Dr J. Martin of Macquarie University to students in first, second and third year psychology courses. The results were also collated and prepared for computer analysis by Dr Martin. Having someone other than the original author test the scale was intended to reduce any "Rosenthal effects" due to experimenter expectations. In spite of the 1970 results, Dr Martin was quite sceptical about the possibility of a balanced 'D' scale.

A total of 177 first, second and third year students were sampled, roughly in proportion to their numbers in the overall university population. They filled out the scale in class time.

Analysis of these results revealed a correlation of .40 between the two halves of the scale and a reliability ("alpha") of .81. As expected, these figures were lower than the original (1970) findings of .71 and .91 respectively, but they still indicate a quite satisfactory scale. Some of this drop could also be due to the fact that the second sample differed from the first in being much more widely based. It may therefore be regarded as confirmed that the Ray (1970) BD scale is suitable for use with students: the original results were not entirely "freak" findings. This gives some encouragement to the search for a balanced Dogmatism scale that will be applicable to the general population.


One characteristic of the above-mentioned BD scale that might be unsatisfactory to some users was that its negative items were entirely new constructions (new in both direction of wording and in content) -- not merely reversals of Rokeach originals. Since Ray (1972b) has shown that a balanced F scale using items that were merely reversals of the originals could be produced, it seems also possible that a BD scale of this sort might be produced. The scale reported on below is of this type. More importantly, however, the present study is an attempt to build a new BD scale from scratch using the responses of a community (as distinct from a student) sample. The second study of Ray (1970) had shown that the original BD scale was unsuitable for such a sample.

The data for this study was in fact gathered at the same time as the data used for the production of the balanced F scale. It is presented here separately for the sake of thematic unity. Full details of the sampling etc. are therefore to be found in Ray (1972b), but it may briefly be said that the sample was a heterogeneous one obtained partly by house-to-house calls and partly by enlisting evening student volunteers. The evening students are of course an older and more varied group than the day students used on previous occasions. Total n was 120. The corpus of proposed negative items was placed before the 40 Rokeach original items in the questionnaire. For most positive items there were at least two candidate reversals, many of which were drawn from the work of Peabody (1961) and Stanley & Martin (1964). A lengthy preamble was used (see Ray, 1972b for the wording) to forestall criticism by respondents of the rather strange-sounding questionnaire that resulted.

Analysis of the data was carried out by correlating each negative item with the Rokeach scale. The 40 highest correlating items were then selected and combined with the Rokeach items to produce a new 80 item balanced scale. Each of these items was then correlated with the total score on the same scale, and the 20 strongest negative and the 20 strongest positive items were selected. The resulting 40 items did, however, show some content overlap, in that on some occasions two reversals of the same original item had survived the selection process. Where repetitiousness of this sort was obvious, it was eliminated by deleting the lower correlating item of each such pair and replacing it with another negative item from the 80 item scale (i.e. by the 41st highest correlating item and so on).

The 40 items finally selected then formed a perfectly balanced scale with a reliability of .83 and a correlation between the oppositely-worded halves of .32. For the purposes of replication, this scale was then administered, in its new reduced format, to the Introductory Psychology class at Macquarie University (n = 180). A reliability of .84 and a correlation between the halves of .34 were observed -- a finding representing extremely close replication of the results obtained on the norming sample. This new BD scale was then named the "BD scale, Mark II". See Table 1 for the items.


Table 1. The BD scale, Mark II. The first 20 items are reverse scored. Each item has five response options (SA, A, ?, D, SD).

1. If we are going to have free speech we must defend the right to be heard of even those we disagree with.
2. Man is master of his own fate and captain of his destiny.
3. If people in one's own group are always disagreeing among themselves that is probably a rather healthy sign.
4. There is no such thing as "the Truth".
5. We must find happiness in the present because no one can predict what the future will be like.
6. No one has a "mission in life" that he must accomplish no matter what.
7. Eat, drink and be merry -- for tomorrow we may die.
8. The "one true faith" is a myth.
9. The way to happiness is to get involved in the things going on about you.
10. There is never one right answer for any question.
11. Man has within himself the power to control his destiny.
12. In general most people show consideration for others.
13. It is not worth sacrificing your life to become a hero.
14. It's possible to really live without believing in any great cause.
15. Life can be meaningful without devotion to ideals or causes.
16. All of the philosophies which exist in this world have some truth in them and probably not one is totally correct.
17. In these present days everyone should look to their own happiness.
18. It is never necessary to be on guard against ideas no matter where they may originate.
19. Truth is so elusive that no one can say when he has it.
20. I think none the worse of a person for being concerned chiefly with pleasure.
21. Man on his own is a helpless and miserable creature.
22. Fundamentally, the world we live in is a pretty lonesome place.
23. Most people just don't give a "damn" for others.
24. I'd like it if I could find someone who would tell me how to solve my personal problems.
25. It is only natural for a person to be rather fearful of the future.
26. A person who thinks primarily of his own happiness is beneath contempt.
27. Unfortunately, a good many people with whom I have discussed important social and moral problems don't really understand what's going on.
28. Most people just don't know what's good for them.
29. The main thing in life is for a person to want to do something important.
30. If given the chance I would do something of great benefit to the world.
31. A man who does not believe in some great cause has not really lived.
32. It is only when a person devotes himself to an ideal or cause that life becomes meaningful.
33. Of all the different philosophies which exist in this world there is probably only one which is correct.
34. A person who gets enthusiastic about too many causes is likely to be a pretty "wishy-washy" sort of person.
35. To compromise with our political opponents is dangerous because it usually leads to the betrayal of our own side.
36. When it comes to differences of opinion in religion we must be careful not to compromise with those who believe differently from the way we do.
37. In times like these, a person must be pretty selfish if he considers primarily his own happiness.
38. The worst crime a person could commit is to attack publicly the people who believe in the same thing he does.
39. It is annoying to listen to a speaker or teacher who seems unable to make up his mind about what he really believes.
40. For most questions there is only one right answer once a person is able to get all the facts.

On neither sample, however, did this Mark II scale show a correlation between its halves that could be described as high. It did seem at least theoretically possible that better results could be obtained. For this reason a yet further attempt was made to achieve such results.


In this study, item selection reverted to a more eclectic approach. The best (highest correlating) negative items from both previous scales (based on analyses of the data obtained from the community samples) were combined with the 20 positive items of the Mark II scale. Whatever its outcome, this study seemed necessary in order to examine the possibility that the somewhat disappointing results obtained with the Mark II scale were due to the restriction of its negative items to reversals of Rokeach originals.

The sample used to test this third BD scale is again more fully described in Ray (1972b) but briefly it was obtained by sampling blocks in the Sydney metropolitan area and then attempting to interview one person from each house in the block. The final n was 118. With this data, the correlation of each item with the total score on the combined scale was found and the highest correlating negative items selected. A 34 item scale (17 negative and 17 positive items) was found to be the most reliable -- with an alpha of .82. The correlation between the two halves was only .27.

This Mark III scale is therefore no advance over the Mark II scale. Its construction was nonetheless important in showing that reversals of Rokeach items are not inferior to negative items of entirely new construction. In fact, at least with general population samples, they appear to be superior to entirely new items.

The Mark II scale is then the scale most suitable for general use. Even with student samples, it is little inferior to the Mark I scale reported in Ray (1970). The correlation between its two halves (.32) is not ideal but does at least offer something to the research worker who wants to use general population sampling and, while so doing, eliminate the effects of response set. People who intend to do research only with students, however, would be best advised to use the existing Mark I scale.

In evaluating whether the correlation between the halves of the Mark II scale is satisfactory, the findings reported in Ray (1972a) are highly relevant. That paper shows that scales outside the authoritarianism/dogmatism area also show correlations between their halves which can be quite low: low correlations between the halves of balanced scales are a general problem. Set in the context of the findings reported in Ray (1972a), the Mark II BD scale in fact shows up as quite satisfactory, well within the normally expected range.

The reason for this generally low correlation is not, of course, far to seek. Acquiescent set alone would be sufficient to explain it. Acquiescent set causes different items to be responded to as if they were similar. Items are agreed to regardless of their content. Thus acquiescent set will be causing a positive and a negative item to be responded to as if they were similar while their explicit content will be causing them to be responded to as if they were opposed. The outcome of the two opposing forces is a tendency towards orthogonality -- which is in fact what we usually observe.

Balanced scales then may reduce acquiescence but they do not of course eliminate it. What they do is to ensure that acquiescers are classified along with the indifferent or the genuinely non-polarized. They are not classified or confounded with genuine high scorers on the attribute purportedly being measured.
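The classification point can be checked with simple arithmetic. In the sketch below, a hypothetical respondent who answers "strongly agree" (coded 5, an assumed coding) to all 40 items is scored both on a one-way version of a 40-item scale, with every item worded in the dogmatic direction, and on a balanced layout with 20 reverse-scored items; alongside is a respondent who answers "?" (coded 3) throughout. `total_score` is a hypothetical helper.

```python
def total_score(raw, reverse_scored, points=5):
    """Sum of 1..points responses, reverse scoring the given 0-based item indices."""
    return sum(points + 1 - x if i in reverse_scored else x for i, x in enumerate(raw))


ONE_WAY = set()            # hypothetical: all 40 items worded in the dogmatic direction
BALANCED = set(range(20))  # Mark II layout: items 1-20 (indices 0-19) reverse scored

acquiescer = [5] * 40      # "strongly agree" to every statement, regardless of content
undecided = [3] * 40       # "?" to every statement

for name, keyed in [("one-way", ONE_WAY), ("balanced", BALANCED)]:
    print(name, total_score(acquiescer, keyed), total_score(undecided, keyed))
# one-way 200 120
# balanced 120 120
```

On the one-way scale the acquiescer obtains the highest possible score, whereas on the balanced scale he lands exactly at the midpoint, alongside the undecided respondent.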

One could, of course, argue that the present results merely confirm that balanced scales in any field are not really possible. Correlations of .3 to .4 are just "too low" in some absolute sense. In evaluating such a claim, one must be clear about the purpose for which such a correlation is "too low". It is most certainly not "too low" to justify a claim that acquiescence artifact has been eliminated. In fact, any balanced scale, no matter what the correlation between its halves, eliminates acquiescence artifact completely! Any balanced scale ensures that acquiescers do not automatically get high scores regardless of any other property of that scale.

Why then might one require high correlations between the two halves of such a scale? As an indication of validity. We design all our items to measure one thing. If two subsets of our items fail to correlate, how can we have any confidence that either measures what we think it measures? The two halves of the Mark II BD scale do, however, correlate highly significantly, so some demonstration of validity in this sense has been accomplished. Since we can also assume that these results are affected by acquiescence, this validity demonstration must also be regarded as representing only some sort of minimum. Because of the masking effect of acquiescence, not all the validity of the scale can be demonstrated simply by using a single correlation coefficient.
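Whether correlations of this size are "highly significant" with the sample sizes reported for the Mark II scale (.32 with n = 120; .34 with n = 180) can be checked with the standard t test for a Pearson correlation, t = r x sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below also gives an approximate two-tailed p from Fisher's z transformation so that only the standard library is needed. Both assume approximately bivariate-normal scores, which is an assumption rather than something reported here, and the function names are hypothetical.

```python
import math


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing the hypothesis that the population correlation is 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def p_fisher_z(r, n):
    """Approximate two-tailed p using Fisher's z: atanh(r) * sqrt(n - 3) ~ N(0, 1) under H0."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


for label, r, n in [("norming sample", 0.32, 120), ("Macquarie replication", 0.34, 180)]:
    print(f"{label}: r = {r}, n = {n}, t = {t_for_r(r, n):.2f}, p ~ {p_fisher_z(r, n):.1e}")
```

Both correlations come out significant well beyond the .001 level, which supports the claim of a highly significant, though modest, relationship between the halves.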

One remaining question is the extent to which the Mark II scale covers the same content area as the Rokeach original. Does its content cover the same aspects that Rokeach originally had in mind (differentiation, time perspective etc.)? Obviously not. Several of Rokeach's items did not survive either in their original or in a reversed form. Nonetheless, the correlation observed between the Mark II scale and Rokeach's 40 item original unbalanced scale was .694 on the second sample mentioned above. Considering that scores on the Rokeach original were confounded with acquiescence, this is unequivocal evidence that there is a strong common component in what the two scales measure.


It has been confirmed that the Mark I BD scale is suitable and satisfactory for use with students and shown that the Mark II BD scale is suitable for use with general population samples. If an all-round acquiescence-free scale is required, the Mark II scale may be used.


Subsequent articles germane to the matters discussed above are:

SCALE FORMAT: Replication is one of the cornerstones of science. A new research result will normally require replication by later researchers before the truth and accuracy of the observation concerned is generally accepted. If a result is to be replicated, however, careful specification of the original research procedure is important.

In questionnaire research it has been my observation that results are fairly robust to questionnaire format: it is the content of the question that matters rather than how the question is presented. It is nonetheless obviously desirable for an attempted replication to follow the original procedure as closely as possible, so I have given here samples of how I presented my questionnaires in most of the research I did. On all occasions, respondents were asked to circle a number to indicate their response.

