The Journal of Social Psychology, 1982, 118, 141-142.



University of New South Wales, Australia

Construct validity is generally held to be best shown when all the items of a scale are found to load on a single factor in a factor analysis. With balanced scales, however, another useful test becomes available: whether the two supposedly opposite halves of the scale in fact correlate significantly negatively. Some scales have notably failed this test (1). When supposedly opposite items are shown not to be responded to in opposite ways, this seems a clear disproof of any belief that the scale measures what it was thought to measure.
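The halves test can be sketched in a few lines of Python. This is a minimal illustration, not a procedure given in the source. It assumes a hypothetical data layout of one list of item scores per respondent, with the positively and negatively worded items identified by hypothetical index lists `pro_items` and `con_items`. It correlates the raw (not reverse-scored) totals of the two halves, so a scale that works as intended should give a clearly negative coefficient. Testing that coefficient for significance is a separate step not shown here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def halves_correlation(responses, pro_items, con_items):
    """Correlate the raw (unreversed) totals of the two oppositely worded halves.

    responses: one list of item scores per respondent (hypothetical layout).
    pro_items, con_items: hypothetical column indices of the positively and
    negatively worded items. A balanced scale that measures one construct
    should yield a significantly negative value.
    """
    pro_totals = [sum(r[i] for i in pro_items) for r in responses]
    con_totals = [sum(r[i] for i in con_items) for r in responses]
    return pearson(pro_totals, con_totals)


# Toy responses on a 1-5 agree/disagree scale (illustrative only).
toy = [[5, 4, 1, 2], [4, 4, 2, 2], [2, 1, 4, 5], [1, 2, 5, 4]]
print(halves_correlation(toy, pro_items=[0, 1], con_items=[2, 3]))
```

In the toy data the two halves mirror each other exactly, so the coefficient is -1. A scale of the kind criticized in (1) would instead show a value near zero or even positive.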

Another indication of construct validity that could usefully be reported more often is the correlation between the initial and final forms of a scale. Scale construction often starts with a pool of items deliberately chosen to cover the range of the construct as comprehensively as possible. The total score on this initial form is therefore an important criterion for determining whether the final form of the scale still covers the full range of the construct. It shows how far the answers given to the items of our final (generally much abbreviated) form of the scale can be generalized.
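One way to compute this criterion is sketched below in Python, under stated assumptions: items are scored on a 1-5 scale, negatively worded items are reverse-scored before summing, and `initial_items`, `final_items` and `reversed_items` are hypothetical names for index sets. Because the final form's items are also part of the initial form, some of the resulting correlation reflects that item overlap.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def keyed_total(response, items, reversed_items, low=1, high=5):
    """Sum the chosen items, reverse-scoring negatively worded ones (assumed 1-5 scale)."""
    return sum(low + high - response[i] if i in reversed_items else response[i]
               for i in items)


def form_correlation(responses, initial_items, final_items, reversed_items):
    """Correlate total scores on the initial (long) and final (short) forms.

    initial_items, final_items, reversed_items are hypothetical index sets;
    final_items is expected to be a subset of initial_items.
    """
    long_form = [keyed_total(r, initial_items, reversed_items) for r in responses]
    short_form = [keyed_total(r, final_items, reversed_items) for r in responses]
    return pearson(long_form, short_form)
```

For the study reported below, `initial_items` would be all 77 items and `final_items` one of the 20- or 12-item subsets.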

Normal item analysis procedures, however, seem to offer little guarantee that the initial and final forms of a scale will correlate highly. Items are usually deleted step by step. A total scale score is calculated and the small group of items that correlate least with it is deleted. A new total score (without the deleted items) is then calculated, and further items are deleted because they correlate poorly with it. This cycle is commonly repeated many times before the maximally reliable form of the scale is reached (2). What effect this procedure has on validity, however, is unclear. Since the criterion for item selection and deletion changes with every cycle, there seems little guarantee that the items eventually chosen will bear much relation to what the original scale focused upon.
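The step-by-step deletion cycle can be sketched as follows. This is a generic Python illustration under stated assumptions, not a reproduction of the procedure in (2): responses are assumed to be already reverse-scored, each cycle drops a fixed number of items (the hypothetical parameter `drop_per_cycle`) with the lowest correlations with the current total, and the loop stops when coefficient alpha would no longer rise or when a hypothetical floor `min_items` would be breached. The total used as the selection criterion is recomputed on every cycle, which is the moving criterion questioned above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def variance(values):
    """Sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(data, items):
    """Coefficient alpha of the listed item columns (needs at least 2 items)."""
    k = len(items)
    item_var = sum(variance([row[i] for row in data]) for i in items)
    total_var = variance([sum(row[i] for i in items) for row in data])
    return k / (k - 1) * (1 - item_var / total_var)


def reduce_scale(data, drop_per_cycle=2, min_items=4):
    """Iteratively delete the items that correlate least with the current total.

    data: respondents x items, already reverse-scored.
    drop_per_cycle, min_items: hypothetical tuning parameters (min_items >= 2).
    Returns the indices of the retained items.
    """
    items = list(range(len(data[0])))
    alpha = cronbach_alpha(data, items)
    while len(items) - drop_per_cycle >= min_items:
        # The criterion is recomputed from the items still in the scale.
        total = [sum(row[i] for i in items) for row in data]
        weakest = sorted(items, key=lambda i: pearson([row[i] for row in data], total))
        candidate = [i for i in items if i not in weakest[:drop_per_cycle]]
        new_alpha = cronbach_alpha(data, candidate)
        if new_alpha <= alpha:  # stop once deletion no longer raises alpha
            break
        items, alpha = candidate, new_alpha
    return items
```

As written, the loop stops at the first cycle that fails to raise alpha, so the "maximally reliable form" it finds is only a local maximum along one deletion path.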

When the final form of the scale must be balanced between positively and negatively worded items, the problem would appear to be compounded. Often stronger items are rejected and weaker ones are retained in order to ensure balance. We would thus seem to get even further away from what the original scale measured.
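The cost of forcing balance can be seen in a single selection step. In the Python sketch below, which is illustrative only, the item-total correlations are hypothetical values supplied as a dictionary, `pro_items` is an assumed set marking the positively worded items, and the balanced rule simply keeps the best half from each wording direction.

```python
def select_unforced(item_total_r, n):
    """Keep the n items with the highest item-total correlations, whatever their wording."""
    ranked = sorted(item_total_r, key=item_total_r.get, reverse=True)
    return sorted(ranked[:n])


def select_balanced(item_total_r, pro_items, n):
    """Keep the best n // 2 positively worded and n // 2 negatively worded items."""
    def best(group):
        return sorted(group, key=item_total_r.get, reverse=True)[: n // 2]

    pro = [i for i in item_total_r if i in pro_items]
    con = [i for i in item_total_r if i not in pro_items]
    return sorted(best(pro) + best(con))


# Hypothetical item-total correlations; items a-d are positively worded.
r = {"a": 0.70, "b": 0.65, "c": 0.60, "d": 0.50, "e": 0.30, "f": 0.20}
print(select_unforced(r, 4))                        # ['a', 'b', 'c', 'd']
print(select_balanced(r, {"a", "b", "c", "d"}, 4))  # ['a', 'b', 'e', 'f']
```

With these made-up values the balanced rule rejects items c and d in favour of the weaker e and f, which is the trade-off described above.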

In constructing the Ray (3) scale of environmentalist attitudes, the procedures described above were used. In addition, however, two kinds of short form were constructed: one with no attempt to force a balance and one deliberately made to balance the two item types. The initial 77-item scale, whose reliability (alpha) without any deletions was .87, was thus reduced to two 20-item forms and two 12-item forms. The unforced 20- and 12-item scales contained some balancing items (five and three, respectively) and had reliabilities of .87 and .84. The 20- and 12-item scales with forced balance had reliabilities of .85 and .78. Forced balancing therefore did reduce reliability. With validity, however, the situation was reversed: the unforced 20- and 12-item scales correlated .86 and .80 with the 77-item scale, whereas the balanced 20- and 12-item scales correlated .90 and .87, respectively.
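For reference, the reliability coefficient quoted here is conventionally defined as below, where k is the number of items, the sum runs over the k item variances, and the denominator is the variance of the total score. The formula is the standard definition of coefficient alpha rather than one given in this note.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```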

These findings thus offer some reassurance that, in this instance at least, normal scale reduction procedures sacrificed little construct validity (in the sense of generalizability), even when forced balancing was employed. There is even a suggestion that balanced scales may be more valid than semi-balanced ones.


1. Christie, R., Havel, J., & Seidenberg, B. (1956). Is the F scale irreversible? J. Abnormal & Social Psychology, 56, 141-158.

2. Ray, J.J. (1972). A new reliability maximization procedure for Likert scales. Australian Psychologist, 7, 40-46.

3. Ray, J.J. (1975). Measuring environmentalist attitudes. Australian & New Zealand J. Sociology, 11(2), 70-71.

