Journal of Personality Assessment, 1979, 43, 638-643.



University of New South Wales


It is shown that Rorer's exoneration of the F scale from acquiescent response style contamination is dependent on the finding that various acquiescence measures fail to intercorrelate. When acquiescence is measured as the total score on adequate balanced scales scored without reversals, significant internal reliability is found. It is found, in fact, even with scales that are not particularly ambiguous. It is concluded that some scales are not responded to meaningfully by some people and if these people are not to be confounded with real high scorers, balancing against acquiescence is still needed.

Although it had a respectable prior history, the problem of one-way worded attitude and personality scales first rose to prominence in connection with the California F scale (Adorno, Frenkel-Brunswik, Levinson, & Sanford, 1950). As Christie, Havel, & Seidenberg (1956) found, it was difficult to produce versions of F scale items that correlated negatively with the originals. It seemed, therefore, that high F scale scorers might not be genuine authoritarians but merely "acquiescers". This led to a massive literature devoted to exploring the "acquiescent response set" problem.

In this situation, therefore, Rorer's (1965) paper claiming that the problem of "acquiescent response style" was a "myth" came as something of a bombshell. Pointing out that the various measures of "acquiescence" generally fail to intercorrelate, he concluded that although such a thing was possible, there had in fact been no demonstration that acquiescent responding to questionnaires independent of meaning had ever been of anything but the most trivial importance.

Although a bombshell, it was in fact a bombshell of a rather welcome kind. It enabled researchers to abandon worry about the effects of one-way wording in their questionnaires. It both rehabilitated the F scale and made future scale construction tasks less difficult. No doubt for this reason, research into acquiescence reported in the literature fell off drastically thereafter.

An important paper that may therefore have been somewhat overlooked was the rejoinder to Rorer by Peabody (1966). Peabody conceded the general point that acquiescence was in general not a great problem but also made a case that "ambiguously worded" questionnaires such as the F scale were very much an exception. Peabody did not, however, produce any new evidence to support the relationship between ambiguity and acquiescence, relying instead on existing research that had been criticized by Rorer. Since Rorer himself had accepted that such a relationship existed, Peabody concentrated on showing that the F scale was in fact ambiguous. This he did largely by quotes from the original authors showing ambiguity to be part of their intentions.

A serious consequence of Peabody's success in making a case that an ambiguously-worded scale could be an exception to the generalization proposed by Rorer was that one could never be sure whether the scale one was using was or was not seriously ambiguous. The carte blanche offered by Rorer could only be utilized if it could also be shown that one's scale was not an ambiguous one on each particular occasion of its use. The acquiescence genie was not to be so easily banished.

This was particularly so because no one offered any standard way of assessing ambiguity. It seemed something as difficult to examine as acquiescence itself. In these circumstances, almost all subsequent authors have simply turned a blind eye to the problem, perhaps reasoning that another Rorer would probably come along to banish that problem too.

So far, however, this has not eventuated. What has eventuated is a pair of apparently successful attempts to overcome the problem that started the furore in the first place: two "balanced F scales" have been produced (Lee & Warr, 1969; Ray, 1972a). Thus it is now possible at last to examine F scale acquiescence and content independently. Peabody's claims can, then, be directly tested. If the balanced F scale (BF scale) is in fact a measure of acquiescence as well as content, it should show some reliability when scored for acquiescence only (i.e., without reversals). If when so scored it shows no acquiescence variance, Peabody's claim that ambiguous scales are an exception to Rorer's generalization may be safely disregarded.

This approach, then, moves the question on to a more micro level than that considered by Rorer (1965). Instead of looking at scales, we are looking at items. Instead of asking whether scales of acquiescence intercorrelate, we are asking whether there is within any given test a consistent tendency to acquiesce. With a successfully balanced scale such a question can usefully be asked and meaningfully tested.

This is so because we have with such a scale a warrant that the item meanings of the positive and negative items are in fact genuinely opposed. Responding to them as similar must then be almost entirely an outcome of non-meaningful acquiescence. The number of "yeses" given in answer to such a scale must measure acquiescence as such. Note that acquiescence so measured need have no relation to the substantive score on the scale. Both maximally high and maximally low substantive scorers would get the same (middling) acquiescence score. Similarly, both a maximally high and a maximally low acquiescence scorer would get the same (middling) substantive score.
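The two scoring schemes described above can be sketched as follows. This is a minimal illustration, not the author's scoring procedure: the 5-point response format, the six-item toy scale, and the function names are assumptions made for the example.

```python
# Toy illustration of content scoring vs. acquiescence scoring on a balanced scale.
# Assumptions (not from the source): responses are coded 1 (strongly disagree)
# to 5 (strongly agree); keys are +1 for positive items and -1 for negative items.

def content_score(responses, keys, scale_max=5):
    """Sum of item scores after reverse-scoring the negatively keyed items."""
    return sum(
        r if k > 0 else (scale_max + 1 - r)
        for r, k in zip(responses, keys)
    )

def acquiescence_score(responses):
    """Sum of raw item scores with no reversals: higher means more agreement."""
    return sum(responses)

keys = [+1, +1, +1, -1, -1, -1]      # hypothetical balanced scale, three items each way

high_content = [5, 5, 5, 1, 1, 1]    # agrees with positives, rejects negatives
low_content = [1, 1, 1, 5, 5, 5]     # the opposite pattern
yea_sayer = [5, 5, 5, 5, 5, 5]       # agrees with everything
nay_sayer = [1, 1, 1, 1, 1, 1]       # disagrees with everything

for name, resp in [("high content", high_content), ("low content", low_content),
                   ("yea-sayer", yea_sayer), ("nay-sayer", nay_sayer)]:
    print(name, content_score(resp, keys), acquiescence_score(resp))
```

On this toy scale the possible range is 6 to 30, so 18 is the midpoint. Both extreme content scorers obtain an acquiescence score of 18, and both extreme acquiescence scorers obtain a content score of 18, which is the independence property the paragraph above describes.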

While acquiescence scores of this nature have been used on previous occasions (e.g., Martin, 1964), there are two important differences in their proposed use on the present occasion. The first, of course, is that they are derived from a successfully balanced scale as opposed to the unsuccessfully balanced scales previously available. The second is that it is proposed to test the internal reliability of the acquiescence score. It is in fact a little odd that although we normally require reliability evidence for any scale score we use, acquiescence scores have been used in the past without such evidence.

The means proposed for evaluating reliability is Cronbach's (1951) coefficient "alpha". This is the most general of the group of formulas known otherwise as KR 20 and 21 or the Spearman-Brown correction. It is equivalent to the mean of all possible split-half reliabilities.
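For readers who want to see the coefficient computed, the following is a minimal sketch of the usual variance form of alpha, k/(k-1) times (1 minus the sum of item variances over the variance of total scores). The function name and the toy data are assumptions for illustration; the same function would be applied once to the reverse-scored item matrix (content reliability) and once to the raw item matrix (acquiescence reliability).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of scores.

    rows: list of equal-length lists, one list of item scores per respondent.
    """
    k = len(rows[0])                                   # number of items
    items = list(zip(*rows))                           # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])   # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Two perfectly consistent items give alpha = 1.
consistent = [[1, 1], [2, 2], [3, 3]]
# Two uncorrelated items give alpha = 0.
unrelated = [[1, 2], [2, 1], [1, 1], [2, 2]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```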

The alpha of the acquiescence-scored scale will then be used to answer the question: "Even if acquiescence is not in general a significant element in response to attitude and personality scales, is it a significant element in the response to this particular scale?" Given Peabody's (1966) characterization of the F scale as one particularly likely to be affected by acquiescence, failure to find an acquiescence influence on responses to the items of a balanced F scale would provide the carte blanche for ignoring acquiescence effects in future that even Rorer's (1965) work was not in fact able to give.


In this study the Ray (1972a) balanced F (BF) scale was administered as part of a longer questionnaire to a random doorstep sample of 95 people in the Sydney metropolitan area, Australia.

The success of the balancing in the BF scale was confirmed. The positive and negative halves correlated -.651. Scored for content, the internal reliability ("alpha") of the scale was .87.

When scored for acquiescence (i.e., without reversals) the internal reliability was .32. While this is properly much less than the substantively scored reliability, it does show that some acquiescence was in fact present. Using Hoyt's (1941) approach to testing the coefficient, the value of alpha required for significance at the .05 level on the given occasion was .21. The observed value clearly exceeds this and is therefore statistically significant.
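The text does not spell out the test. One standard reading of the analysis-of-variance approach, offered here as an assumption rather than as the author's exact computation, treats the reliability as a ratio of mean squares from a persons-by-items layout, with n respondents and k items (symbols introduced for this sketch):

```latex
\alpha = 1 - \frac{MS_{\text{residual}}}{MS_{\text{persons}}}
\qquad\Longrightarrow\qquad
F = \frac{MS_{\text{persons}}}{MS_{\text{residual}}} = \frac{1}{1-\alpha},
\qquad
df = \bigl(n-1,\ (n-1)(k-1)\bigr)

% Arithmetic on the values reported in the text:
F_{\text{observed}} = \frac{1}{1-.32} \approx 1.47,
\qquad
F_{\text{threshold}} = \frac{1}{1-.21} \approx 1.27
```

On this reading, an alpha is significant when 1/(1 - alpha) exceeds the critical F, so the reported threshold of .21 and the observed .32 correspond to F values of about 1.27 and 1.47 respectively.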

Clearly, then, Peabody (1966) was right and little comfort can be drawn from Rorer's (1965) generalizations as far as the F scale is concerned. There is a consistent tendency for F scale type items to attract acquiescent responding.

An interesting finding that emerged from an item-analysis of the acquiescence-scored scale was that ten BF scale items correlated negatively (after correction for overlap) with the total acquiescence score on that scale. Of these, nine were negative items. Clearly again then, the original F items do generate a disproportionate amount of acquiescence variance. Thirteen original items correlate positively with acquiescence but only five reversed items do. Peabody's inferences again stand supported. Original F items do attract a disproportionate amount of acquiescence.

A further interesting finding arises from the fact that the questionnaire also contained three other balanced personality scales, measuring respectively dominance, achievement motivation and social desirability set (see Ray, 1976, 1979b; Greenwald & Satow, 1970 for their items). When also scored for acquiescence, two of these scales correlated significantly with BF scale acquiescence. Social desirability items correlated .209 and achievement motivation items .262. This is contrary to the findings relied on by Rorer (1965) and suggests that there is a weak tendency for acquiescence to generalize even between scales.

One suggestion of Peabody's that was not supported was his suggestion that acquiescers were really the ignorant, the uneducated, and the ill-informed. There were no significant correlations between BF scale acquiescence and the four demographic variables of age, sex, occupation and education. Who in fact the acquiescers are must then be a question left to future research.


In this study it was desired to place the above findings in a larger context and to provide a partial replication of the previous results. To do this the data from Study II of Ray (1976) were re-analyzed. In that study a different balanced authoritarianism scale (the Ray [1971] "Attitude to authority" scale) was applied to a community sample of 282 subjects. They also received the same dominance scale as that used above.

When scored for content the reliability of the two scales was .74 for dominance and .86 for authoritarianism. They correlated .058. The correlations between the positive and negative halves of each were respectively -.388 and -.546 (before reversals).

When scored for acquiescence, the two scales showed reliabilities of .36 for dominance and .43 for authoritarianism. They correlated .132. While the correlation is actually significant at the .05 level, it is of course very low. Again, then, both scales generated substantial acquiescence variance but among largely different groups of subjects. Clearly, finding one sort of item ambiguous (or not meaningfully answerable) does not necessarily mean that different items will be so perceived. When acquiescence will appear is substantially unpredictable.


In this study it was desired to extend the generalizability of the findings so far by subjecting the BF scale to a cross-cultural test. It was therefore included in a questionnaire administered to a random doorstep sample of 100 people in the Johannesburg greater metropolitan area of the Republic of South Africa. The scale was, however, used in a 14-item short form. A previous test of the same short form in Australia had shown reliability of .80 and a correlation between the positive and negative halves (rPN) of -.50 before reversals (Ray, 1979a).

The reliability observed was .65 and the rPN was -.213. The reversals did, then, substantially break down on the South African sample. It was notable that the seven original F items included were almost universally assented to while the negative items distributed respondents much more evenly.

When scored for acquiescence, the reliability of the BF scale was .42. Because this was only a slight rise over the Study I results, it would appear that some of the collapse of the BF scale on this sample must be attributed to a genuine lack of meaning opposition between the two types of item.

Whatever else the results show, however, they do show that the BF scale once again generates significant amounts of acquiescence variance.

It may be important, however, to consider what the standard is against which the rPN obtained in this study is judged inadequate. Generally agreed standards do not of course exist. The obtained correlation was in fact significant but similar correlations obtained by Christie, Havel, & Seidenberg (1956) were judged to indicate that the F scale was "irreversible."

A more rational approach to obtaining an expected value for rPN may be to "de-correct" alpha by applying the Spearman-Brown formula in reverse. We would thus obtain an estimated average split-half correlation by the simple formula: r = alpha / (2 - alpha). Thus the particular split represented by rPN can be compared with an average split and the deviation caused by acquiescence alone can thus be displayed. This is in fact even a slightly conservative test because alpha is slightly deflated where acquiescence is present. This can be seen if it is realized that pos.-pos. and neg.-neg. correlations are inflated by acquiescence while pos.-neg. correlations are deflated. The number of pos.-neg. correlations is m squared (where m is half the number of items in the test), while the number of correlations between similarly scored items is m(m - 1). The latter will therefore always be the smaller (m - 1 having been substituted for m), so the deflated correlations slightly outnumber the inflated ones. In a test of usual length, however, this effect will be small.
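The algebra behind the reversed formula and the two item-pair counts can be written out as follows. This is a reconstruction from the standard Spearman-Brown correction for a test of doubled length, not a derivation given in the source; r denotes the average split-half correlation and m half the number of items, as in the text.

```latex
% Spearman-Brown correction for doubled length, then solved for r
\alpha = \frac{2r}{1+r}
\;\Longrightarrow\; \alpha(1+r) = 2r
\;\Longrightarrow\; \alpha = r(2-\alpha)
\;\Longrightarrow\; r = \frac{\alpha}{2-\alpha}

% Item pairs in a balanced test with m positive and m negative items
n_{\text{pos-neg}} = m \cdot m = m^{2},
\qquad
n_{\text{similar}} = 2\binom{m}{2} = m(m-1)
```

For a 14-item balanced form (m = 7), for example, this gives 49 pos.-neg. pairs against 42 similarly scored pairs.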

Applying the reversed Spearman-Brown formula on the present occasion, then, gives an expected value for rPN of .48. The observed value is clearly less than this -- indicating substantial acquiescence effect.

Applying the same formula to the data of Study I gives an expected value of rPN of .77. Since the obtained value (after reversals) was .651, this indicates that some acquiescence was present even on that occasion. Thus the two methods of testing for the presence of acquiescence (obtaining the alpha for the acquiescence-scored scale, or examining the gap between r and rPN) give convergent results. Both methods indicate a less dramatic change in the amount of acquiescence present than the simple gap between rPN on the two occasions would suggest.
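The arithmetic for the two samples can be checked with a short sketch. The function and variable names are assumptions for illustration; the alphas and rPN values are those reported in the text, with rPN taken after reversals (i.e., as a positive magnitude).

```python
def expected_split_half_r(alpha):
    """Reverse the Spearman-Brown correction: r = alpha / (2 - alpha)."""
    return alpha / (2 - alpha)

# (content-scored alpha, observed rPN after reversals), as reported in the text
samples = {
    "Study I (Australia)": (0.87, 0.651),
    "Study III (South Africa)": (0.65, 0.213),
}

shortfalls = {}
for name, (alpha, observed_rpn) in samples.items():
    expected = expected_split_half_r(alpha)
    shortfalls[name] = expected - observed_rpn   # gap between average split and rPN
    print(f"{name}: expected r = {expected:.2f}, "
          f"observed rPN = {observed_rpn:.3f}, shortfall = {shortfalls[name]:.2f}")
```

The shortfall is about .12 in Study I and about .27 in Study III, a difference of roughly .15, whereas the raw rPN values differ by about .44. This is the sense in which the change in acquiescence between the two occasions is less dramatic than the raw rPN gap suggests.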

Clearly, however, although acquiescence was present on both occasions of the BF scale's administration, the Australian sample gave much more satisfactory results on any criterion.

Finally, it may be of some interest to note that when the BF scale was scored for content, the difference between the mean in the present study and the mean for the same 14 items in Study I was significant. South Africans were more authoritarian than Australians. The difference was not, however, great (mean and SD of 42.14 and 5.50 in the present study versus 37.42 and 8.20 in Study I).


The "within scales" approach employed in the present paper (as opposed to the "between scales" approach by Rorer [1965]) has shown that the F scale does in fact generate significant acquiescence variance. Study II above also showed that this is not peculiar to the F scale. Two other scales explicitly designed from the beginning to be unambiguous rather than ambiguous also showed substantial acquiescence content.

This being so, it must now be held that no scale can in future be assumed a priori to be unaffected by response-set. The apparent carte blanche offered by Rorer (1965) must be regarded as substantially withdrawn.

This withdrawal does not mean that authoritarianism research must return to the quagmire of confounded measurements that it was in before Rorer's (1965) paper. Since then, empirical methods have been demonstrated for solving the problem of selecting successful negative items so that future research can preclude acquiescence confounding by experimental controls, i.e., by the use of balanced scales.

The fact that the negative items which worked well in Australia did not also work well in South Africa may simply mean that a different set of negative items will have to be selected for each country in which the F scale is to be used. Doing this is, however, a largely mechanical task (see Ray, 1972a). If, on the other hand, South Africa is a very atypical society (as seems likely), the collapse of the BF scale there may not have serious implications for its general usefulness, and the reversed items it embodies may still be quite suitable for use in North America or Britain.

A possible alternative to the Ray (1972a) BF scale is the Lee & Warr (1969) balanced F scale, a scale which has been successfully tested in Britain and the USA. It must, however, be noted that, unlike the BF scale, the Lee and Warr scale does not use solely original F items for its positive items, and in fact only five out of its 10 items seem directly traceable to the F original.

What, then, in non-technical terms, are the broad implications of the above results for prospective users of authoritarianism measures?

The first implication, of course, is that one-way-worded scales must be avoided. Results obtained with such scales are uninterpretable. All three studies above showed that there is a consistent tendency for some people to say "Yes" in response to authoritarianism items -- regardless of the meaning or direction of the item concerned. Unbalanced scales will therefore confound such meaningless agreement with meaningful agreement.

The low correlation between the three measures of acquiescence shown in Study I confirms, however, what was Rorer's most basic point: that the acquiescer is not a "type." There are very few people who just go around acquiescing to anything. It is therefore impossible to claim that the tendency to acquiesce is a sub-trait or part of authoritarianism. There is no general tendency to acquiesce. Acquiescence is a highly specific response to particular stimuli. Different things about different scales make different people acquiesce. What the present studies have further shown, however, is that acquiescence is still general enough to affect all the items of one scale in a consistent manner. Acquiescence is general enough to be a nuisance but not general enough to be useful.

As previous work had suggested (Ray, 1972b), Study II also confirmed that the F scale is not peculiar in being affected by acquiescence. Even scales written from the beginning to be unambiguous are affected by it. The confirmation given in Study I that the F scale can successfully be balanced does, however, show that at least most F scale items are not in fact totally meaningless. The F scale does measure something other than acquiescence. What that may be is a question beyond the scope of this paper, but it may be noted that Ray (1973) has advanced extensive evidence of both a historical and current psychometric kind in support of the view that the F scale measures nothing more than general social conservatism of a quite traditional kind. It is certainly not valid as a predictor of authoritarian behaviour (Ray, 1976). Valid scales are, however, available which were from the beginning constructed to control for the influence of acquiescence (Ray, 1976).

For both technical and substantive reasons, therefore, one may draw the perhaps depressing conclusion that valid research into authoritarianism has only just begun.


