Australian Psychologist Volume 7 No. 3 November 1972. Pp. 155-166.

A NEW BALANCED F SCALE, and Its Relation to Social Class

J. J. RAY, University of New South Wales

None of the attempts to produce a balanced F scale has so far produced a version in which there is a large negative correlation between the original and the reversed items. Faced with this problem, some psychologists have turned to a scale format in which this question cannot be examined at all (Berkowitz and Wolkon, 1964; Hughes, 1968), while others have abandoned the original F scale in favour of entirely new items (Lee and Warr, 1969). Since only three of Lee and Warr's positive items are drawn from the F scale, and only two of their fifteen negative items are reversals of F scale items, their claim to have produced a "balanced F scale" seems rather overstated. What is true is that they have produced a scale with a close conceptual affinity to the F scale. Even so, religious and directly political items are far more prominent in it than in the F scale. This makes its distinction from a conservatism scale rather tenuous and would certainly introduce an undesirable artifact into any study involving political variables. In this paper the Lee and Warr scale will be referred to as the "LW" scale.

How do we explain the failure so far to produce a balanced F scale? It is suggested here that this failure is due not to the F scale's being inherently irreversible (Christie, Havel & Seidenberg, 1956) but to the omission of certain test-construction precautions in the past. The fundamental dictum that seems to have been overlooked is that one can never know in advance whether an item actually measures what it seems to measure. The psychologist cannot know on a priori grounds whether his subjects are responding to the same features of an item that he himself sees as most important. It is precisely because one must seek evidence that an item measures what it purports to measure that item analysis is carried out at all. The elementary form of such evidence is a demonstration that the item correlates with some criterion (usually the total score on the proto-scale). What this paper proposes is that the version of a reversed item finally used be selected on such an empirical basis: in this case, by its correlation with the F scale total score.
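As a minimal sketch of the selection rule proposed above, assuming raw responses are coded 1 (Strongly Disagree) to 5 (Strongly Agree), the Python below reverse-scores each candidate wording of a reversed item and keeps the one whose scores correlate most highly with the F scale total. The helper names (`pearson_r`, `pick_best_reversal`), the version labels and the toy data are hypothetical illustrations, not the paper's own materials.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pick_best_reversal(versions, f_totals):
    """Choose among alternative reversed wordings of one F scale item.

    versions: dict of hypothetical version label -> raw responses coded
              1 (Strongly Disagree) ... 5 (Strongly Agree)
    f_totals: F scale total score for the same respondents, in the same order
    Each version is reverse-scored (6 - response) so that a high score means an
    authoritarian answer; the version whose reverse-scored responses correlate
    most highly with the F total is returned.
    """
    reverse_scored = {label: [6 - r for r in resp] for label, resp in versions.items()}
    return max(reverse_scored, key=lambda label: pearson_r(reverse_scored[label], f_totals))


# Toy data for five respondents (invented for illustration only).
f_totals = [20, 35, 50, 65, 80]
versions = {
    "item4_v1": [5, 4, 3, 2, 1],  # agreement falls as F rises: behaves like a reversal
    "item4_v2": [3, 3, 2, 4, 3],  # unrelated to F: a poor reversal
}
print(pick_best_reversal(versions, f_totals))  # -> item4_v1
```

In the study itself the criterion was the total score on a proto-scale containing both original and reversed items, so this sketch shows only the principle of choosing among wordings empirically.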


Data Gathering

A pool of reversed versions of original F scale items was constructed, some being original constructions and some being drawn from previous work, notably Christie, Havel and Seidenberg (1956). The dictum of these authors that an item should be a psychological as well as a grammatical reversal was closely followed. For every original item there were three to five reversed versions. These items were included in a questionnaire, preceding the original F scale and the LW scale; after these came some other items not of interest here. The preamble of the questionnaire was as follows:

"This is a survey for research purposes only and no attempt will be made to trace back answers. Just an honest expression of your opinion is sought. All details will be strictly confidential.

This questionnaire is in fact the preliminary version of a survey that will later be made to find out what people's opinions and preferences are. Because it is a trial version we have in many cases used several different but similar questions to find out one thing we would like to know. You will therefore find the questionnaire very repetitious. This is definitely not a trick to find out how consistent you are. No two questions are in fact exactly the same.

On the basis of your responses we will be cutting down the questionnaire to about a quarter of its present length, so what we would like you to do is work carefully through and indicate your agreement or disagreement with each statement in its own right -- without looking back to see how you responded before. There is a space under each item where you can write any comments you would like to make.

For each statement you can give one of five answers -- from "Strongly Agree" to "Strongly Disagree". You indicate which answer you want to give by circling a number."

It was desired that the norming sample be more heterogeneous than the usual sample of 18-20 year old university students, because scales normed on students often do not work as well when applied elsewhere (Ray, 1970). The sample finally used was therefore obtained partly by house-to-house calls and partly by enlisting evening student volunteers, who are an older and more varied group than the day students. The total was 120 persons, of whom the students comprised approximately half. The students completed the questionnaire for course credit in Introductory Psychology at Macquarie University. The other half were obtained by attempting to interview as many people as possible in several randomly selected blocks in the Sydney suburb of Balmain. This suburb was selected for the wide range in socio-economic status of its inhabitants, who range from university lecturers to day labourers. A questionnaire was left with anyone who would accept it and collected one week later. Several back-calls were made on all not-at-homes. It was felt that for the purposes of scale construction a heterogeneous sample was at least as appropriate as a representative one.

The preamble to the questionnaire appeared to work well in that there were no protests or challenges recorded from any of the interviewees -- in spite of the necessarily repetitious nature of the questionnaire.

For the students the questionnaire took approximately 30 minutes to complete. Administration for both groups was anonymous. For people accepting the questionnaire at their doors only the address was recorded for the purposes of making the back-call.

Data Analysis

The 74 reversed items were divided, in the order of their occurrence, into three sets of 25, 25 and 24 items. Each of these sets was in turn combined with the 28 original F items, with the reversed items reverse-scored. An item analysis of each of the three proto-scales was then carried out. This consisted of obtaining the correlation between each item score (all items were scored from 1 to 5) and the score on the total scale. The 28 of the 74 reversed items with the highest item-total correlations were then selected and combined once more with the original 28 items to make a fourth proto-scale. The purpose of using three proto-scales rather than one was to prevent the reversed items (74 in all) from having a disproportionately greater influence on the total score than the original items (28 in all). The final proto-scale was then refined by dropping items (positive or negative) with low item-total correlations. After each small group of items was dropped, a new set of correlations with the total score was computed and the whole process repeated. In this way a balanced 38-item scale with a reliability of .88 was produced. One problem with this scale, however, was that several of the successful items were merely versions of one another; in some cases there were three versions of one item: the original positive form plus two alternative negative forms. To eliminate this as far as possible, another 10 items were dropped to give the scale presented in Table 1. This scale showed a reliability (Cronbach's [1951] coefficient "alpha") of .87 and a correlation between positive and negative halves of -.71. It is named the "BF" scale. Fourteen items are original F scale items and the other 14 are reversed versions of original F scale items.
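The iterative item analysis just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the original computation: scores are assumed already keyed so that a high score is authoritarian, each item-total correlation includes the item in the total (as the text describes), and the cut-off `threshold` and group size `max_drop` are hypothetical, since the paper reports neither. Coefficient alpha follows Cronbach's (1951) formula; all function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def totals(items):
    """Total score per respondent; items maps item name -> list of keyed scores."""
    return [sum(scores) for scores in zip(*items.values())]


def cronbach_alpha(items):
    """Cronbach's (1951) coefficient alpha for keyed item scores."""
    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    k = len(items)
    item_var = sum(var(scores) for scores in items.values())
    return k / (k - 1) * (1 - item_var / var(totals(items)))


def purify(items, threshold=0.2, max_drop=2):
    """Iteratively drop small groups of items with low item-total correlations.

    Each round recomputes the totals, drops up to max_drop items correlating
    below threshold (weakest first), and stops when none remain below it.
    threshold and max_drop are illustrative choices, not values from the paper.
    """
    items = dict(items)
    while True:
        t = totals(items)
        ranked = sorted((pearson_r(s, t), name) for name, s in items.items())
        weak = [name for r, name in ranked if r < threshold][:max_drop]
        if not weak or len(items) - len(weak) < 2:
            return items
        for name in weak:
            del items[name]


# Toy keyed scores for six respondents (invented for illustration only).
data = {
    "A": [1, 2, 3, 4, 5, 5],
    "B": [1, 2, 2, 4, 5, 4],
    "C": [2, 1, 3, 5, 4, 5],
    "D": [3, 5, 1, 3, 2, 4],  # unrelated to the other three items
}
kept = purify(data)
print(sorted(kept))  # -> ['A', 'B', 'C']
print(round(cronbach_alpha(kept), 2))
```

On the toy data, item D correlates only weakly with the total and is dropped in the first round; A, B and C then all clear the cut-off and the loop stops.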


Table 1. The final BF scale

The first 14 items are scored 1 to 5 for "Strongly Agree" to "Strongly Disagree". The last 14 are scored 5 to 1.

1. The rebellious ideas of young people are often a constructive source of change for the better.
2. Homosexuality between consenting adults may be distasteful but it should not be regarded as a crime.
3. Many of the radical ideas of today will become the accepted beliefs and practices of tomorrow.
4. Familiarity does not breed contempt.
5. People who want to whip or imprison sex criminals are themselves sick.
6. What a youth needs most is to be free to make up his own mind, to be flexible and to work and fight for what he considers right personally, even though it might not be best for his family and country.
7. Many good people honestly could never bring themselves round to feeling much love, gratitude or respect for their parents.
8. You need to get out and rub elbows with all kinds of people to get new ideas and broaden your understanding of life.
9. There are times when it is necessary to probe into even the most personal and private matters.
10. It's all right for people to raise questions about even the most personal and private matters.
11. Insults to our honour are not always important enough to bother about.
12. Sex crimes such as rape and attacks on children, are signs of mental illness; such people belong in hospitals rather than in prison.
13. Most honest people admit to themselves that they have sometimes hated their parents.
14. Disobedience to the government is sometimes justified.
15. If people would talk less and work more, everybody would be better off.
16. The business man and the manufacturer are much more important to society than the artist and the professor.
17. Every person should have complete faith in some supernatural power whose decisions he obeys without question.
18. No sane, normal, decent person would ever think of hurting a close friend or relative.
19. Young people sometimes get rebellious ideas but as they grow up they ought to get over them and settle down.
20. An insult to our honour should always be punished.
21. Sex crimes, such as rape and attacks on children deserve more than mere imprisonment; such criminals ought to be publicly whipped or worse.
22. Obedience and respect for authority are the most important virtues children should learn.
23. Most of our social problems would be solved if we could somehow get rid of the immoral, crooked and feeble-minded people.
24. People can divide into two distinct classes: the weak and the strong.
25. What the youth needs most is strict discipline, rugged determination, and the will to work and fight for family and country.
26. Familiarity breeds contempt.
27. There is hardly anything lower than a person who does not feel great love, gratitude and respect for his parents.
28. Homosexuals are hardly better than criminals and ought to be severely punished.
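The scoring key for Table 1 can be expressed as a short function. The response labels ("SA" to "SD", with "N" standing for the unnamed middle category), the function name and the split into halves are assumptions for illustration; the key itself follows the text: items 1 to 14 score 1 to 5 from Strongly Agree to Strongly Disagree and items 15 to 28 the reverse, so totals run from 28 to 140, with high scores authoritarian.

```python
# Hypothetical labels for the five response categories, from "Strongly Agree"
# to "Strongly Disagree"; the middle label "N" is an assumption.
CATEGORIES = ["SA", "A", "N", "D", "SD"]


def score_bf(responses):
    """Score one respondent's 28 BF answers, item 1 first.

    Items 1-14 (reversed wording) score 1..5 for SA..SD;
    items 15-28 (original F wording) score 5..1 for SA..SD,
    so a high score always means a more authoritarian answer.
    Returns (total, positive_half, negative_half).
    """
    if len(responses) != 28:
        raise ValueError("expected 28 responses")
    positive = negative = 0
    for item, answer in enumerate(responses, start=1):
        code = CATEGORIES.index(answer) + 1  # SA=1 ... SD=5
        if item <= 14:
            negative += code       # reversed item: disagreement scores high
        else:
            positive += 6 - code   # original item: agreement scores high
    return positive + negative, positive, negative


# A maximally authoritarian pattern: reject every reversal, endorse every original.
print(score_bf(["SD"] * 14 + ["SA"] * 14))  # -> (140, 70, 70)
```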

Another analytical approach that might have been possible would have been to work from the intercorrelation matrix of the items. Ray (1972) has however shown that this method both theoretically and empirically produces closely similar results to the one adopted above.

The correlation between the LW scale and the BF scale was .72. Scores on the LW scale correlated .62 with scores on the original F scale, whereas scores on the BF scale and the original F scale correlated .86. The BF scale is thus superior to the LW scale in this respect (.86 vs .62). Part of the reason for this difference is, of course, the much greater item overlap of the BF scale with the F scale. Thus, although the LW scale is a well constructed instrument, it is at an inherent disadvantage in achieving the desired maximal correlation with the original F scale.

At this point the question arose: "The present test construction procedures maximize internal consistency; was not however the original F scale intended to be multi-factorial?" The short answer is that the original F scale was supposed to be both internally consistent and multi-factorial. The California authors used item-analysis procedures to maximize internal consistency and at the same time subdivided their items under several conceptual headings.

These headings do not, however, correspond to the factor structure of the scale (Camillieri, 1959; Krug, 1961). It is therefore not clear whether we should expect the new scale to be more or less multi-factorial than the original. We can, however, run at least a preliminary empirical check to find out what is in fact the case. The best way of doing this would seem to be to apply the same structure-analytic method to both the original and the new scale, so that effects specific to the analytic method are controlled for. In the present case we can also control for effects due to the sample, since the same sample completed both scales.

The analytical method adopted, then, was McQuitty's (1961) "elementary factor analysis". This method analyses latent structure into clusters: an item is not shown as loading on a factor but is simply placed in one cluster and no other.
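The paper gives no computational detail for McQuitty's (1961) method. Purely to illustrate the "one cluster and no other" property, the Python below implements a linkage-style clustering in the spirit of McQuitty's simpler elementary linkage analysis, a stand-in and not necessarily the exact procedure used: each item is linked to the item it correlates most highly with, and items joined by such links form one cluster. The function name and example matrices are hypothetical.

```python
def elementary_linkage(r):
    """Cluster items from a symmetric correlation matrix r (list of lists).

    Each item is linked to its highest correlate; items connected by these
    links (in either direction) form one cluster, so every item ends up in
    exactly one cluster.
    """
    n = len(r)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        best = max((j for j in range(n) if j != i), key=lambda j: r[i][j])
        parent[find(i)] = find(best)  # merge i's cluster with its best link's

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two clearly separated pairs of items (hypothetical correlations).
r = [[1.0, 0.6, 0.1, 0.1],
     [0.6, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.5],
     [0.1, 0.1, 0.5, 1.0]]
print(elementary_linkage(r))  # -> [[0, 1], [2, 3]]
```

Applying the same routine to correlations among cluster scores would give a rough analogue of the second-order clusters mentioned below, again only as an illustration.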

There were 4 first order clusters for the original scale and 6 for the balanced scale. Thus this analysis implies that the new scale was factorially slightly more diverse than the original.

The composition of the 6 clusters for the BF scale was as follows: Cluster 1 -- items 4 and 26; cluster 2 -- items 2, 28, 14, and 17; cluster 3 -- items 5, 21, 12 and 8; cluster 4 -- items 22, 25, 3, 6, 19, 27, 1, 7, 13, 18, and 15; cluster 5 -- items 20, 24, 11, 16, and 23; cluster 6 -- items 9 and 10. Any attempt to assign names to these clusters would be unduly speculative. It is at least evident that the original and the modified items do not cluster separately: only cluster 6 does not contain items of both types. When second order clusters were extracted, the entire BF scale fell into one cluster. The BF scale is thus multi-factorial at one level and unifactorial at a higher level. This is perhaps reminiscent of the superficially contradictory findings reported by Krug (1961) and Camillieri (1959) on the one hand and Eysenck (1954) on the other. The two American authors found several small factors (small in terms of the number of items loading highly) in the original F scale, whereas Eysenck, reporting Coulter's analysis of the original California data, said that there was in the F scale "A strong general factor throughout".
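The statement that only cluster 6 is confined to one type of item can be checked mechanically from the memberships listed above, given that items 1 to 14 of Table 1 are the reversed versions and items 15 to 28 the original F items. The variable and function names in this short Python check are illustrative.

```python
# Cluster membership as reported in the text; items 1-14 are reversed
# versions and items 15-28 are original F scale items.
clusters = {
    1: [4, 26],
    2: [2, 28, 14, 17],
    3: [5, 21, 12, 8],
    4: [22, 25, 3, 6, 19, 27, 1, 7, 13, 18, 15],
    5: [20, 24, 11, 16, 23],
    6: [9, 10],
}


def is_mixed(items):
    """True if a cluster holds both reversed (1-14) and original (15-28) items."""
    return any(i <= 14 for i in items) and any(i >= 15 for i in items)


all_items = sorted(i for items in clusters.values() for i in items)
assert all_items == list(range(1, 29))  # every BF item appears exactly once

single_type = [c for c, items in clusters.items() if not is_mixed(items)]
print(single_type)  # -> [6]
```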


To see whether the scale would stand up under replication, the final version was given, as part of a much larger questionnaire, to the entire group of students in the Introductory Psychology course at Macquarie University. The final form of the scale did not, of course, require the lengthy preamble used with the unreduced version, so a simple and conventional "confidential -- research purposes only" introduction was used. The n for analysis was 180. The correlation between the halves of the BF scale dropped somewhat, to -.53, but this is still of much greater magnitude than any such correlation reported heretofore. The reliability of the BF scale was .83.

Some mention of validity is also perhaps in order here. It was felt that the near identity of content in the new and old items would be sufficient guarantee of this. To give some empirical check, however, the correlations between BF scores and other scales in this sample were found. The new scale correlated .66 with an "Attitude to Authority" scale (Ray, 1971a), .46 with a "Political Conservatism" scale (see also Ray, 1971a) and .26 with the conservatism of actual political party choice. All these are significant at the .01 level. Some concurrent and predictive validity has therefore been demonstrated: the first by the correlations with other scales and the second by the correlation with political choice.
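The paper does not say which significance test was used. As an assumption, the Python sketch below applies Fisher's z approximation with n = 180 to the three validity coefficients; the exact t test for a correlation leads to the same conclusion for values this large. The helper name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-tailed p for H0: rho = 0, via Fisher's z transformation."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


for r in (0.66, 0.46, 0.26):
    print(r, correlation_p_value(r, 180) < 0.01)  # True for all three
```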


One final question of great interest that does arise is: "It is all very nice to have the opportunities for control offered by the new scale but what difference does it make? What effects are revealed by the new possibilities of control?" Obviously the number of possible topics one could examine in seeking to answer this question is very large. A good starting point however seemed to be Lipset's (1960) thesis relating authoritarianism to social class.

A third study was therefore carried out, with sampling designed explicitly to meet the requirements of a test of Lipset's hypothesis. Ideally these requirements would be met by a random sample of the upper and middle classes combined and a random sample of the working class. As an approximation to this very difficult goal, clearly upper-class and clearly lower-class areas of the Sydney metropolitan area were selected with the guidance of Congalton's (1969) prestige ranking of Sydney suburbs. In each of the selected areas, blocks were chosen at random from the map with a pin and blindfold. An attempt was then made to interview one person from each household in the block. In this way 118 people were contacted who filled out the questionnaire (which also contained other material not of interest here).

The correlation observed between the two halves of the BF scale was on this occasion -.56. Measures were obtained on an additive "subjective index" of class self-assignment and on occupation (scored manual [1] vs. non-manual [2]). For details and the rationale of both of these class measures, see Ray (1971b). The results are given in Table 2. As expected, the negatively worded and the unaltered items gave somewhat different results. The positive items were more strongly related to occupation, while subjective class was predicted only by the positive items. Since both the positive and the negative halves of the scale were, however, significantly negatively related to occupation, we can say that Lipset's hypothesis (that the "pre-Fascist" type of authoritarianism measured by the F scale is higher among people in manual occupations) stands confirmed. The coefficient "alpha" reliability of the BF scale on this sample was .86. That the BF scale has stood up well on two replication samples of quite different composition testifies to the suitability of the norming procedure for producing a generally usable scale.


Table 2. Correlations of the BF scale with class indices

In this table, all scales are scored so that a high score represents pro-authoritarian attitudes, and a high score on the class indices represents upper-class identity. The bottom row (Q. 11) gives the result obtained with a single seven-point class self-assignment question. If |r| > .180, then p < .05 (two-tailed).

                    BF scale   Positive items   Negative items
Subjective index       -.140            -.212            -.014
Q. 11                  -.090            -.168             .031
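The critical value quoted in the note to Table 2 can be approximately recovered from the third study's sample of 118. The Python sketch assumes Fisher's z approximation rather than the exact t distribution (the two agree closely here); `critical_r` is a hypothetical helper name.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05, tails=2):
    """Smallest |r| significant at the given level, via Fisher's z approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    return tanh(z_crit / sqrt(n - 3))


n = 118  # respondents in the third study
print(round(critical_r(n), 3))           # about .181, in line with the .180 quoted
print(round(critical_r(n, tails=1), 3))  # about .152 for a one-tailed test

table_values = [-0.140, -0.212, -0.014, -0.090, -0.168, 0.031]
print([v for v in table_values if abs(v) > critical_r(n)])  # -> [-0.212]
```

Only the positive-items correlation with the subjective index exceeds the two-tailed value, consistent with the text. A coefficient such as the partial of -.173 reported for occupation lies between the one-tailed and two-tailed values, which is why it counts as significant only on a one-tailed test.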

Also included in this study was a modified version of Ray's (1970) balanced Dogmatism (BD) scale. The original BD scale was normed on university students only; the modifications were designed to make it more suitable for general population use and are being reported elsewhere. The correlations of the BD scale with other variables were as follows: BF scale .617; BF positive items .669; BF negative items .393; occupation -.283; subjective class -.204. If the BF score is partialled out of the correlation between BD and occupation, the coefficient drops from -.283 to -.128, which is non-significant. If we do the converse and partial BD score out of the correlation of the BF scale with occupation, the coefficient drops from -.308 to -.173, which is still significant, though only on a one-tailed test. This means that the effect most strongly represented in the BF scale is the one that also accounts for the correlation between the BD scale and occupation. Put another way, there is one factor common to both scales which accounts for the correlation with occupation.
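The two partial correlations follow the standard first-order partial correlation formula. A minimal Python sketch using the rounded coefficients reported above (the function and variable names are illustrative):

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_bd_bf, r_bd_occ, r_bf_occ = 0.617, -0.283, -0.308

bd_occ_given_bf = partial_r(r_bd_occ, r_bd_bf, r_bf_occ)  # BD with occupation, BF held constant
bf_occ_given_bd = partial_r(r_bf_occ, r_bd_bf, r_bd_occ)  # BF with occupation, BD held constant
print(round(bd_occ_given_bf, 3), round(bf_occ_given_bd, 3))
```

With these inputs the sketch gives about -.12 and -.18, close to but not exactly the published -.128 and -.173. The small differences may reflect rounding of the inputs or the cases used for each coefficient, which the paper does not report.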

It is perhaps of some interest to note that the correlation between the D and F scales is not solely attributable to their common one-way wording. When, as at present, we use balanced scales to measure both variables, the correlation between the two still remains very high.


It does appear that a balanced F scale has now been produced. There is still, however, scope for further work. Better general population norms are obviously needed, and the scale's suitability for international use cannot be guaranteed. If this study were replicated using, for example, American subjects, attention might be focused on producing a large number of possible reversals of those items for which this study did not succeed in finding a suitable reversal. If this were done, the remaining content overlap in the BF scale might be eliminated. As it stands, however, the overlap does not seem to disturb respondents if the items are distributed in blocks throughout a larger questionnaire.

The BF scale has several advantages over the LW scale as a balanced version of the F scale. It has (necessarily) a much higher correlation with the original. Given that there are effects due to acquiescence and to details of wording, the correlation of .86 seems as high as one could reasonably expect. We must then have come fairly near to the goal of having BF scores that differ from F scores only in that the former have the influence of acquiescence removed. By contrast, the common variance between the F scale and the LW scale is only 38%, against 74% for the BF scale. Obviously what the LW scale measures is in many respects different from what the F scale measures. Even aside from this, however, the BF scale has further advantages over the LW scale: a clearer conceptual identity and no overt political or religious polarization. It can thus be used in studies of political variables without introducing new artifacts.
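The common-variance figures are simply the squared correlations with the original F scale (.62 for the LW scale, .86 for the BF scale), as this small Python check shows; the helper name is illustrative.

```python
def common_variance(r):
    """Percentage of variance shared by two measures correlating r."""
    return 100 * r ** 2


for label, r in (("LW with F", 0.62), ("BF with F", 0.86)):
    print(f"{label}: r = {r:.2f}, common variance = {common_variance(r):.0f}%")
```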

The .86 correlation between BF and F scale scores is also an important sort of validation: to the extent that the one is valid, the other is also. The question of validity must, however, be put in context. The significant correlation with Right-wing political preference is taken as sufficient demonstration of predictive validity for this scale. Titus (1968) shows that the original F scale itself had practically no relation to actual behaviour, and it would seem that this must also be true of any version of the original scale. Authoritarianism might be seen as having greatest interest in the study of ideology per se rather than as a predictor of detailed behaviour.

The fact that the balanced F scale (in Study 3) predicts the class variables less well than the positive items alone is in line with the purpose of constructing the new scale, and in fact justifies the effort. The task was to test a theory, not to maximize a prediction; hence the desirability of removing that part of the prediction contributed by acquiescence.

The support for Lipset's thesis given in the third study above is of particular interest because of the many attacks that have been made on the thesis and the considerable body of contrary evidence that has been produced (e.g. Hamilton, 1968; Miller and Riessman, 1961). So extensive were the attacks, in fact, that Lipset (1961) himself felt obliged to hedge his theory extensively. The present results would seem to indicate that this may have been an overly hasty retreat. Peabody's (1961, 1966) belief that the correlation of class with F scale scores was caused by acquiescent set alone has also received support, but only as far as subjective class is concerned. For the hard economic differentia that Lipset was concerned with, more than mere acquiescence was involved.


Summary

A balanced F scale using only items which are versions of the original items has been produced. It was normed on a heterogeneous sample, and the correlations between its positive and negative halves range from -.71 to -.53. Reliabilities range from .87 to .83. Validity coefficients range from .65 to .26. Using this scale, a test of Lipset's "working class authoritarianism" thesis shows that high authoritarianism is related to membership in manual occupations but is not related to subjective social class.


References

Berkowitz, N. H., and Wolkon, G. H. A forced choice form of the F scale -- free of acquiescent response set. Sociometry, 1964, 27, 54-65.

Camillieri, S. F. A factor analysis of the F scale. Social Forces, 1959, 37, 316-323.

Christie, R., Havel, Joan, and Seidenberg, B. Is the F scale irreversible? Journal of Abnormal and Social Psychology, 1956, 56, 141-158.

Congalton, A. A. Status and prestige in Australia. Melbourne: Cheshire, 1969.

Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika, 1951, 16, 297-334.

Eysenck, H. J. The psychology of politics. London: Routledge, 1954.

Hamilton, R. A research note on the mass support for "tough" military initiatives. American Sociological Review, 1968, 33, 439-445.

Hughes, A. H. Problems and solutions in measuring psychological dispositions. Paper delivered at Australian UNESCO seminar on mathematics in the social sciences, 1968.

Krug, P. An analysis of the F scale: I. Item factor analysis. Journal of Social Psychology, 1961, 53, 285-291.

Lee, R. E., and Warr, P. B. The development and standardisation of a balanced F scale. Journal of General Psychology, 1969, 81, 109-129.

Lipset, S. M. Political man. New York: Doubleday, 1960.

Lipset, S. M. 'Working class authoritarianism' -- A reply to Riessman. British Journal of Sociology, 1961, 12, 277-281.

McQuitty, L. C. Elementary factor analysis. Psychological Reports, 1961, 9, 71-78.

Miller, S. M., and Riessman, F. 'Working class authoritarianism': A critique of Lipset. British Journal of Sociology, 1961, 12, 263-276.

Peabody, D. Attitude content and agreement set in scales of authoritarianism, dogmatism, anti-Semitism and economic conservatism. Journal of Abnormal and Social Psychology, 1961, 63, 1-11.

Peabody, D. Authoritarianism scales and response bias. Psychological Bulletin, 1966, 65, 11-23.

Ray, J. J. The development and validation of a balanced Dogmatism scale. Australian Journal of Psychology, 1970, 22, 253-260.

Ray, J. J. An "Attitude to Authority" scale. Australian Psychologist, 1971a, 6, 31-50.

Ray, J. J. The questionnaire measurement of social class. Australian and New Zealand Journal of Sociology, 1971b, 7 (April), 58-64.

Ray, J. J. A new reliability maximization procedure for Likert scales. Australian Psychologist, 1972, 7, 40-46.

Titus, H. E. F scale validity considered against peer nomination criteria. Psychological Record, 1968, 18, 395-403.


Replication is one of the cornerstones of science. A new research result will normally require replication by later researchers before the truth and accuracy of the observation concerned is generally accepted. If a result is to be replicated, however, careful specification of the original research procedure is important.

In questionnaire research it has been my observation that the results are fairly robust as to questionnaire format. It is the content of the question that matters rather than how the question is presented (But see here and here). It is nonetheless obviously desirable for an attempted replication to follow the original procedure as closely as possible so I have given here samples of how I presented my questionnaires in most of the research I did. On all occasions, respondents were asked to circle a number to indicate their response.

