The Journal of Social Psychology, 1984, 122, 105-119.


The University of New South Wales, Australia



Because of the widely-acknowledged validity problems with the California F scale, it seems important to explore what alternative methods are available for the measurement of authoritarianism. Thirty-seven scales are reviewed which offer alternative conceptions to the F scale. Although uneven in quality, some seem to offer considerable potential for future research use. An account of recent attempts to produce a balanced version of the F scale is also given which concludes that this problem has now been substantially solved.


A recent book by Altemeyer (3) has given a very comprehensive review of the literature on authoritarianism which uses as its measuring instrument the California F scale (1). Like others before him (29, 41, 59, 82, 83), Altemeyer concludes that the F scale is of highly dubious validity and that the whole Adorno et al. account of authoritarianism is "Not proven." Given the great volume of extant work with the F scale, this is surely the strongest possible argument for the view that an entirely new approach to the measurement of authoritarianism is needed if knowledge in the area is to advance.

Unfortunately, Altemeyer's own coverage of alternatives to the F scale is, even by his own admission, very sketchy. He offers a new measuring instrument of his own, but it is in many ways even less sophisticated than the F scale it is designed to replace. To fill the gap in his literature review (which also has the defect of being comprehensive only up to about 1972), a full listing of all available published scales said to measure authoritarianism or closely related constructs will be attempted here. There are many scales (e.g., 47) designed simply as alternative (often shorter) forms of the F scale. No attempt will be made to review these except for the single case of "balanced" F scales. These are versions of the original F scale wherein half of the items earn the respondent a low score on authoritarianism if he agrees with them; the importance of such scales in controlling against meaningless acquiescence is well-known (3, 11, 52, 61, 62).


The attraction of balanced scales is that they do not, in aggregate, "lead the witness." To get a maximum score on such a scale the respondent must answer very discriminatingly, agreeing with some items and disagreeing with others. The person who carelessly agrees with almost everything put before him, by contrast, will receive an intermediate score on such a scale -- indicating that he cannot be placed as either high or low on the attribute in question. With one-way-worded scales, by contrast, the same careless responder would be shown artificially as high on all attributes whatsoever. For a balanced scale to have construct validity, however, an elementary requirement is that the items of supposedly opposite meaning should also in general be responded to oppositely. There should be some evidence that a "yes" to a "pro" item is equivalent to a "no" to an "anti" item.
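The scoring logic just described can be illustrated with a minimal sketch. The six-item scale, the five-point agreement format, and the function name `score_balanced` are hypothetical and are not taken from any of the scales reviewed here.

```python
def score_balanced(responses, reversed_items, low=1, high=5):
    """Sum item scores, reverse-keying the 'anti' items.

    responses      -- one respondent's raw answers (low = disagree, high = agree)
    reversed_items -- set of item positions whose wording is reversed
    """
    total = 0
    for i, r in enumerate(responses):
        # Agreement with a reversed item earns a LOW score on the attribute.
        total += (low + high - r) if i in reversed_items else r
    return total


# Hypothetical scale: items 0-2 are "pro" items, items 3-5 are "anti" items.
reversed_items = {3, 4, 5}

yea_sayer = [5, 5, 5, 5, 5, 5]        # agrees strongly with everything
discriminating = [5, 5, 5, 1, 1, 1]   # agrees with "pro", rejects "anti"

balanced_yea = score_balanced(yea_sayer, reversed_items)
balanced_disc = score_balanced(discriminating, reversed_items)
# The same careless answers scored as if every item were worded one way.
one_way_yea = score_balanced(yea_sayer, set())

print(balanced_yea, balanced_disc, one_way_yea)
```

On this hypothetical scale the possible range is 6 to 30, so the indiscriminate agreer lands on the midpoint (18) when the scale is balanced but at the maximum (30) when every item is worded in the same direction.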

Early attempts to produce a balanced version of the originally one-way-worded F scale failed this test. The revised (reversed-meaning) items failed to correlate as expected with the original, unaltered items. Instead of correlating highly negatively, they tended to correlate hardly at all (11). Faced with this impasse, several authors (e.g., 6) turned to scales in "forced-choice" format. Positive and reversed F scale items were presented in pairs and the respondent had to indicate with which he agreed more. Such scales, however, embodied the assumption that both items in the pair measured authoritarianism. The respondent earned a low score on authoritarianism by agreeing with one and a high score on authoritarianism by agreeing with the other. Yet it was surely precisely the fact that the two items in such pairs did not usually have anything in common that constituted the problem. Use of forced-choice scales did not make the problem go away; it just made it unexaminable.

Given the widespread failure of a-priori balanced F scales, several authors conducted an extensive empirical search for reversed F scale items. Byrne and Bounds (9) seem to have been first in the field and their final scale is as recorded by Cherry and Byrne (10). Their approach was to test a large number of possible reversed forms of any one F scale item and accept for final use only that form correlating most highly with the original items. Regrettably, however, Cherry and Byrne failed to report the overall correlation between the positive and negative halves of their scale (henceforth, rPN). In response to a personal request, however, Byrne supplied some raw data derived from applying the scale to 57 male students and 112 female students. The rPN in these data turned out to be -.26 for the males and -.32 for the females. Such rs are not much better than the result which led Christie, Havel, and Seidenberg (11) to conclude that the F scale is "irreversible."
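As a rough illustration of how an rPN might be computed, one can sum each respondent's raw (not yet reverse-scored) answers to the positively worded items and to the negatively worded items, and then take the Pearson correlation between the two sums. The data and function names below are invented for illustration; the exact computational procedure used in the studies cited is an assumption here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_pn(data, pos_items, neg_items):
    """Correlate raw sums on the positive half with raw sums on the negative half."""
    pos = [sum(row[i] for i in pos_items) for row in data]
    neg = [sum(row[i] for i in neg_items) for row in data]
    return pearson(pos, neg)


# Hypothetical respondents; columns 0-1 are "pro" items, columns 2-3 "anti" items.
consistent = [
    [5, 4, 1, 2],
    [4, 4, 2, 1],
    [2, 1, 4, 5],
    [1, 2, 5, 4],
    [3, 3, 3, 3],
]
# Respondents who give the same answer to every item regardless of wording.
acquiescent = [
    [5, 5, 5, 5],
    [1, 1, 1, 1],
    [3, 3, 3, 3],
]

r_consistent = r_pn(consistent, [0, 1], [2, 3])
r_acquiescent = r_pn(acquiescent, [0, 1], [2, 3])
print(round(r_consistent, 2), round(r_acquiescent, 2))
```

In the first data set the two halves are answered oppositely and the rPN is strongly negative, as a valid balanced scale requires; in the second, wording is ignored and the rPN is positive.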

The next attempt in the same direction was that by Altemeyer (3), begun in 1968. He insisted that each original F item be paired with a reversed item that was opposite in meaning but identical to it in other psychometric characteristics. This procedure enabled him to make a very exact estimate of how large the role of acquiescence was in F scale scores, but it unfortunately failed to produce a balanced F scale with the required high negative rPN. The rPN for the scale he finally produced was, in fact, nonsignificant.

Lee and Warr (33) also used empirical methods to produce what they called a balanced F scale, but as their final product had only five out of 30 items with content traceable to the original F scale, this is a rather tendentious claim.

The next attempt was by Ray (52). A very large number of candidate reversed items were correlated with the F original; those with the most negative correlations were selected and combined with the original F scale to make a new balanced scale. This was analyzed by item-total correlations and the 14 original items and the 14 reversed items showing the highest correlations with this total were selected to form the new scale. The result was a scale with all items traceable to the F original; but some original F items did not survive in any form, either original or reversed. Instead, some items appeared in two forms, both original and reversed. The rPN for this scale has been consistently high, the lowest so far found in its country of origin (Australia) being -.47 for a shortened 14-item form of the scale (64). It should be noted, however, that the scale collapsed completely when used in a non-Anglo-Saxon culture (66). Only the short form of the scale has so far been used in the United States, where an rPN of -.45 and a reliability (alpha) of .79 were observed on a general population sample (63). Those who must use the F scale, then, (perhaps for comparison purposes) could at least use it in this balanced form. Its items are listed in Ray (59).
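The final selection step described above (pooling original and reversed items, then keeping the items of each type that correlate best with the total) might be sketched as follows. This is only an illustration under stated assumptions: the data are invented, the function names are hypothetical, and an uncorrected item-total correlation (the item is included in its own total) is assumed because the text does not say which form was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_items(data, original, reversed_, k, low=1, high=5):
    """Return the k original and k reversed items with the highest item-total r."""
    rev = set(reversed_)
    # Key every item in the same (authoritarian) direction before totalling.
    keyed = [
        [(low + high - v) if i in rev else v for i, v in enumerate(row)]
        for row in data
    ]
    totals = [sum(row) for row in keyed]

    def item_total(i):
        return pearson([row[i] for row in keyed], totals)

    best_orig = sorted(original, key=item_total, reverse=True)[:k]
    best_rev = sorted(reversed_, key=item_total, reverse=True)[:k]
    return sorted(best_orig), sorted(best_rev)


# Hypothetical pool: items 0-2 are originals, items 3-5 are reversed candidates.
# Items 2 and 5 are deliberately unrelated to the rest.
data = [
    [5, 5, 3, 1, 1, 3],
    [4, 5, 1, 2, 1, 4],
    [4, 3, 5, 2, 3, 2],
    [2, 3, 2, 4, 3, 5],
    [2, 1, 4, 4, 5, 1],
    [1, 1, 3, 5, 5, 3],
]

kept_original, kept_reversed = select_items(data, [0, 1, 2], [3, 4, 5], k=2)
print(kept_original, kept_reversed)
```

Selecting the same number of items from each pool keeps the resulting scale balanced, whatever the relative quality of the original and reversed candidates.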


The Fascism and Force scales of Stagner (79, 80) are examples of predecessors of The Authoritarian Personality which, nevertheless, has been given the status of a seminal work. Good descriptions of both Stagner scales are to be found in Eysenck (19). As their content refers primarily to prewar issues, however, they are now too dated for further use.

The Tough-mindedness Scale of Eysenck (18, 19) is also an early scale which at least partly predates the F scale. It has, however, repeatedly been attacked as seriously lacking in face validity and unidimensionality (3).

The Traditional Family Ideology Scale of Levinson and Huffman (34) was essentially an extension of the California (1) work by one of the original California authors and appears to have been widely used. The scale assesses ideals of the family along a continuum ranging from the "democratic" family featuring maximum equality between members on the one hand to the autocratic family featuring a rigid hierarchy on the other. It assesses, in other words, authoritarian attitudes towards the family. It showed reliabilities as high as .84 but had the same one-way-wording problem as the F scale. It was shown to correlate .65 with ethnocentrism and .73 with the F scale. It is probably best seen as a special purpose version of the F scale.

The Military Ideology Scale contains items which are all concerned with service in the Air Force (22) and hence are of very limited applicability.

The Authority Acceptance Scale is reported to devote 56 of its 75 items to "the direct military situation of the recruit" (43); such items were designed solely for use in a military setting.

The Authoritarianism vs Equalitarianism Scale (43) has 16 forced-choice items and is also loaded with allusions to a specifically military life.

The Stereopathy Scale (81) has 100 items similar to the F scale but with a wider range of content. It has correlated highly with the California F, PEC, and E scales but is balanced against acquiescence. Oskamp and Thompson (46), however, have reported that its supposedly oppositely-worded items show a significant positive correlation. It would seem, therefore, to be completely lacking in construct validity. The short Stereopathy Scale by Lee and Warr (33), however, appears to have a satisfactory rPN, but it features direct political and religious items, making its distinction from a conservatism scale rather tenuous and clearly introducing an unmistakable artifact into any study involving political variables. The alternative short Stereopathy scale by Athanasiou (4) has shown an rPN as low as -.18.

The Pensacola Z Scale (27) appears to be the first approach to the measurement of authoritarianism that used a behavior inventory rather than an attitude scale. The 66 items are in forced-choice format and fall into four subscales of anxiety, hostility, rigidity, and dependency. Each item, however, was selected only after it had been shown to correlate with the F scale; therefore, any biases in the F scale should also be in the Z scale. The Z scale is then perhaps best regarded as a version of the F scale in personality-scale format. A reliability of .87 was reported, but the forced-choice format precludes any examination of the relationship between the "oppositely worded" items.

The Rational Authoritarianism Scale by Rudin (75) follows Fromm in dividing authoritarianism into acceptance of rational authority and acceptance of irrational authority. The two are typified by the teacher-student and master-slave relationships, respectively. Martin and Ray (42), however, found that Rudin's positive and negative items in fact correlated positively, so only more work on what is obviously an interesting concept could make the scale usable.

The Militarism Scale by Eckhardt et al. (17) seems rather mislabeled. It might as well be called a political conservatism scale. Its items tend to concern national military preparedness in the face of world Communism. Its reliability was only .62. Another scale of the same name by Ray (55) was balanced and highly reliable, but was designed for use with Army personnel only. It measures approval/disapproval of the Army and military life.

The Deference to Organizational Authority Scale is a Guttman scale by Denhardt (14) oriented towards use in industrial psychology. It measures authoritarianism in job attitudes and hence could be of wide interest. With six items, a reproducibility of .905 was found, but validation appears not to have been sought. The main question about the scale might be whether the reproducibility would be as high on other samples.

The Wilson C Scale (84) was originally designed as a measure of conservatism but it is also referred to as measuring the same construct as that measured by the F scale. Its content is certainly reminiscent of the F scale, though the items are cast into adjective check-list format. No validation of the scale's measuring authoritarianism as distinct from conservatism is given, but it is at least highly reliable and balanced. On some occasions, however, the balancing seems to break down (62).

Hogan's Symbolic Authoritarianism Scale (24) is a descendant of earlier "intolerance of ambiguity" scales. Its items focus on preference for symmetry in comparing pairs of graphic stimuli. It is quite reliable, but has been shown not to predict authoritarianism of behavior or even of attitudes (71).

Gordon's Bureaucratic Orientation Scale (23) is very similar in conceptualization to the F scale and predicts staying in the ROTC vs dropping out. It is highly reliable but appears to be one-way worded.

The Classical Authoritarianism Scale by Ray (51) was an attempt to get back to the meaning of the word "authoritarian" before Adorno et al. (1) loaded it with psychodynamic connotations. It is a balanced 24-item scale with a high rPN and reliability (alpha) among students. With items about Mussolini, etc., it has high face validity and a range of predictive validity demonstrations, but it does not predict ethnocentrism. Its major problems are its high correlation with measures of conservatism and its limited internal consistency on nonstudent samples.

The Attitude to Authority Scale by Ray (16, 50) has many items in common with the scale immediately above but differs in that it was designed as a simple pro or con measure of attitude to authority without any assumption of an "authoritarianism syndrome." It has high reliability, a good range of predictive validity demonstrations, a high rPN, and is usable with nonstudent samples. Its major defect is that its items concentrate very heavily on attitude towards the Army.

The Humanistic Radicalism Scale (53, 57) consists of 15 "humanist" items originally intended to measure the opposite of the F scale. On a sample of Army conscripts, however, they showed a high positive correlation with both the F and D (Dogmatism) scales. It was concluded that a love of sweeping generalizations was the common factor in both this scale and the F scale. It is one-way-worded but controls for acquiescence have shown that the correlations cannot be explained as entirely artifactual. Insofar as the items can be accepted as "Leftist," it could be regarded as the long-lost scale of "authoritarianism of the Left."

The Dogmatism Scale by Rokeach (74) is almost as well-known as the F scale itself; hence we will concentrate here on attempts to cure it of its one-way-wording, a problem which has, in fact, proved even more difficult than with the F scale. Several early attempts are reviewed by Ray (49), where a new balanced Dogmatism scale is also presented, which did not, however, survive replication (61) and was, in any case, designed for students only. A further balanced scale for general population use was produced (58) but showed an rPN of only -.32. An attempt to improve on this (61) resulted in an rPN that was even lower. Clearly, the D scale is even less meaningful than the F scale. The methods that did finally succeed in producing a balanced F scale had no such success with the D scale. It has proven truly "irreversible" and, as such, may measure nothing but acquiescent tendency.

The Baker et al. Tolerance for Bureaucratic Structure Scale (5) is very similar to the work of Gordon (23), and the authors do mention Gordon's work. Its four subscales include Attitude towards (a) rules and regulations, (b) authority, (c) monotonous tasks, and (d) delay of personal gratification. The reliability seems generally high and behavioral validity has been demonstrated. The scale is roughly balanced against acquiescence, but the rPN is not given. If this is satisfactory, it may be a very good attitude scale of authoritarianism, despite not purporting to be such.

The Hierarchical Control Scale by Cochran (12) was designed for use with police officers. "The 34 item forced-choice scale samples a broad range of legal, moral and interpersonal situations" (12, p. 642). Reliability was satisfactory and some validity was demonstrated. The only reservation one must entertain about this scale, therefore, centers on its forced-choice format, which embodies unexaminable and often false assumptions (56).

The Directiveness Scale by Ray (59) appears to be only the second authoritarianism scale that asks the respondent direct questions about his own behavior rather than his opinions on great issues in the world. It is balanced against acquiescence, has generally satisfactory levels of rPN, and has shown on several occasions a strong ability to predict authoritarian behavior (59, 65, 71). It is also ideology-free (60, 67) and does not predict ethnocentrism (59, 63, 64). It has been used on random population samples and has worked in a wide variety of cross-cultural applications (68). Its major problem is a just barely adequate reliability (.74 in its Australian norming sample and .73 for a 14-item short form in a U. S. general population sample). A "pirate" (and possibly improved) version of the scale appeared in the American edition of Penthouse magazine (15)! Other extant scales of the same name are by Lorr and Youniss (37) and by Saklofske, Black, and Schulz (76). See also Lorr and More (36).

The General Attitude Towards Institutional Authority Scale by Rigby and Rump (72, 73) is now available in 32-item and 16-item short forms and has shown satisfactory reliability and validity on students. It is balanced and also has satisfactory levels of rPN. It has four equal-length subscales in its short forms covering attitudes to the Army, the Police, the law, and teachers. All four subscales intercorrelate highly. A recent test of the scale in its 32-item form on a community sample yielded a reliability of .93 -- which is exceptionally good. In the same study (71), however, the scale was shown to predict authoritarian attitudes only, not authoritarian behavior. Since the relationship between attitude and behavior can never be assumed, however, this may not necessarily be a fatal flaw. The same study also showed that the best of all predictors of authoritarian behavior was a scale of achievement motivation! As an attitude scale, the Rigby and Rump instrument would seem to be one of the best now available.

The Right-wing Authoritarianism Scale by Altemeyer (3) is balanced, with a high rPN and reliability, but is in content indistinguishable from a conservatism scale. There is nothing to show that it measures anything other than conservatism and the only firm conclusion to which Altemeyer comes as a result of his research with it is that high scorers also have parents who are high scorers. It is, then, sadly lacking in discriminant validity.

"Dominance" scales, according to Ray (65), often seem to be measuring much the same thing as authoritarianism and assertiveness scales. Insofar as this is so, a very wide range of scales is opened up for use. Examples include the Jackson n-Dominance scale (25), the Schutz (77) FIRO-B, the Allport and Allport A-S scale (2), and scales by Mehrabian and Russell (44) and by Nowlis (45). There is also a high-reliability general population Dominance scale developed from the Directiveness scale (65). An extensive review of dominance scales is to be found in Butt and Fiske (8).

"Deference" is a concept well-known in the political science literature where it is used to explain the phenomenon of working-class conservative voting. The theory is that such voters are choosing as their political ruler someone who is in some sense "superior" to them. The literature is reviewed in Ray (54) where a new scale is also presented. This is a 20-item scale with a reliability on a community sample of .77. Its correlation with Ray's Attitude to Authority Scale (see above) was only .17. Some validation for the scale was given but rPN was not calculated. If this can be shown to be satisfactory, this may be the best scale yet of purely political authoritarianism. Its low correlation with attitude to authority in other fields may also convey an important caution against inferring political authoritarianism from authoritarianism of other sorts.

The Psychoticism Scale has been proposed by Eysenck and Wilson (21) as the personality dimension underlying political authoritarianism. It is also referred to as a measure of "tough-mindedness" (20), but has never been validated as such. Its reliability on a community sample was found to be only .68, quite poor for a 25-item scale. Its rPN was, however, -.47 which is quite good (69).

There are two scales of "Adolescent Attitude to Authority": one by Rigby and Rump (73) and one by Ray and Jones (70). Rigby and Rump used four subscales -- (a) a children's form of their attitude to authority scale (see above), (b) attitude towards your own father, (c) attitude towards your own mother, and (d) attitude towards parents in general -- all of which showed high reliabilities and rPNs. Some validity demonstrations were presented and the most interesting finding was that the relationship between attitude to authority and attitude to parents was nonsignificant for older adolescents -- another blow against the theory of Adorno et al. (1). In the Ray and Jones work (70), two types of scale were used: attitude scale and behavior inventory. Each was made up of equal numbers of items dealing with teachers and parents. Reliabilities and rPNs were high. Unlike the situation with adults (59), the attitude scale and the behavior inventory were found to be highly correlated. In a later study (26), validation was offered for both scales, and it was suggested that the 20-item behavior inventory alone would make the best measure of overall authoritarianism among schoolchildren.

The Authoritarianism in Middle Childhood Scale by Phillips (48) has 47 items that read very similarly to the F scale. It was constructed on over 2,000 fifth- and sixth-grade pupils in Australia. Roughly 90% of the items are positively worded. The scale would, then, seem to be potentially heir to the problems of the F scale. It is more a children's version of the F scale than an alternative to it.

The Authoritarian Childrearing Attitudes Scale is a revision of PARI (the Parental Attitude Research Instrument) by Cross and Kawash (13). It is a 30-item scale with 10 anti-authoritarian items. It measures adult attitudes towards how children should be brought up. No reliability or rPN was reported, but the scale was found to show some relationship with the Stereopathy Scale (see above).

The Spautz Scale (78) is a forced-choice instrument designed to measure preference between the following two theories: "Theory X," a term invented by McGregor (40) to describe the traditional autocratic style of business management, and "Theory Y," the democratic style. Reliabilities of .73 and .71 were found. A variety of correlations with other scales (including the F and D scales) were used to demonstrate validity. The usual doubts about forced-choice scales apply (56).

The Authoritarian vs Nonconforming Scale by Lorr, Suziedelis, and Tonesk (38) has 23 items which ask how highly people value certain things. Ten items are negatively coded. No reliability or validity data are given, nor is rPN reported.

The Attitude to Authority Sentence Completion Test by Lindgren (35) is a projective test requiring subjects to complete sentence "stems." There are three coding categories of response to authority: hostile, anxious, and accepting. Interjudge and split-half reliabilities tend to be low by usual standards. No validity data are presented, nor is there any discussion of "positive" vs "negative" responses.

The Anti-authoritarianism Scale by Kreml (31) is somewhat reminiscent of the F scale but with at least some balance against acquiescence. He discusses this 32-item scale only in terms of his factors, however, and gives no overall reliability, validity, or rPN.

The New General Authoritarianism Scale by Lederer (32) has 18 one-way-worded items which include some F and D scale items and which generally read very similarly to the F scale. The scale seems to have been produced by item analysis of the Kagitcibasi (28) questionnaire, but details of the analysis or scale reliability are not given. Some validation, however, is offered and the reliability of the unshortened Kagitcibasi questionnaire is shown to be high. It seems rather amazing, however, that in 1982 completely unbalanced scales were still being produced in this field.

The Adolescent Social Attitudes Scale consists of eight items, with three negative items, and was used originally just after World War II by McGranahan (39) in Germany and was recently re-used by Lederer (32). Half of the items, however, reflect ethnocentrism. Lederer shows that responses to the scale have changed vastly since McGranahan's work, but she gives no reliability data or rPN.


If the above survey does nothing else, it serves to highlight the variety of ways in which authoritarianism can be conceptualized. No longer can anyone reasonably insist that there is just one particular meaning for the word "authoritarian," which must be pre-eminent above all others. To some, "authoritarianism" is a cognitive style, to others it is an interpersonal style, while to yet others it is simply an attitude. Unfortunately, these various "types" of authoritarianism tend not to go together. A large part of the attractiveness of the Adorno et al. (1) theory was surely that they purported to show many different things going together to form a single authoritarianism syndrome. Variations in one thing could be used to explain variations in another. Unfortunately, we now know that almost none of the proposed covariation actually exists (3, 7, 51, 59). In future studies, there is, then, a clear need to specify in advance just what precisely one means by authoritarianism and what one wants to measure. Any old measure will not do as it could turn out to be totally unrelated to the concept the researcher actually wanted to measure.

This need imposes some burden on the researcher. It would be so much easier if there were one generally acceptable measure of a construct. Operationalizing any concept in social psychology does, however, appear to present difficulties. It is part of the task of doing research. While we cannot hope for uniformity of practice in reported research, we should be able to expect careful specification of what is meant and use of measuring instruments that are demonstrably relevant to what is meant. As the above catalog shows, however, validity demonstrations are rare. That an instrument measures what it is said to measure all too often has to be taken on faith.

Scales originally presented with little or no validation often turn out to be completely invalid as predictors of behavior when the matter is finally examined (71). Given the multiplicity of scales available in the field, therefore, further work on such scales may have little point unless the conceptualization they embody has particularly strong theoretical attractions. With their potential for acquiescence artifact, one-way-worded scales should also normally be avoided, given the existence of a variety of scales that do not have this problem. Where a one-way-worded scale has sufficient interest for a researcher to contemplate using a "balanced" revision of it, Section B above represents both a warning about the difficulty of such an enterprise and an outline of how the enterprise could finally be successfully prosecuted.

Finally, the present catalog can be no more than an introduction to the literature. Much detail had to be omitted. For instance, both the Classical Authoritarianism attitude scale and the Directiveness behavior inventory were said to be good predictors of behavior, yet the behaviors they predict are quite different (submissive, conforming behavior on the one hand and domineering, aggressive behavior on the other). This catalog should, then, only be used to narrow down and direct one's choice of reading when a scale is being chosen rather than being used to substitute for such reading.


