Academic paper written in 1992 but published on the internet only


J.J. Ray

University of N.S.W., Australia


Boyle has disinterred Cattell's concern over "bloated specifics" and used it to condemn all scales that have high internal consistencies. It is shown that his argument depends on an unwarranted belief that "breadth of coverage" is a uniformly desirable attribute of a scale.

Although not titled as such, Boyle's (1991) paper is essentially a reply to some criticisms I made (Ray, 1988) of earlier claims by him to the effect that attitude and personality scales can have "too much" internal consistency. It seems appropriate, therefore, for me to offer some rejoinder to Boyle's (1991) arguments.

Essentially, what Boyle is relying on is the old Cattellian fear of "bloated specifics": i.e. if you put into a factor analysis a set of items which includes a sub-set of items that are all essentially re-phrasings of one another, you will undoubtedly get out a factor featuring the set of "re-phrased" items. Cattell (1978) regards such factors as being in some sense "spurious". Certainly, they do highlight the importance of asking whether the items analysed represent a sampling from any known domain -- something that seldom seems to be done.

Boyle has, however, overgeneralized Cattell's concerns in this matter. He extends suspicion of high internal consistency to all scales -- including scales which are not factor-analytic products and which hence embody no claim that they measure one factor of some given item domain.

What Boyle seems to want is for scales to have "breadth" of coverage. He appears to see this as such a self-evidently desirable goal that he is prepared to sacrifice reliability and internal consistency in order to attain it. I submit that Boyle's desideratum here is in fact probably a generally undesirable goal.

Take as an example one scale which should really find favour in Boyle's eyes -- the 'A-B' scale of Jenkins, Rosenman & Zyzanski (1974). This scale contains items that cover a broad field indeed -- items that might normally form parts of scales measuring achievement motivation, dominance, hostility, impulsiveness etc. (Ray & Bozek, 1980; Hansson, Hogan, Johnson & Schroeder, 1983). The aim of those who constructed this scale was, of course, to predict which people would contract coronary heart disease -- and the scale did seem to have some initial success at that. In describing what their scale measured, its authors stressed the extent to which it picked out individuals who were always in a hurry, people who tried to cram a lot of activity into their day. They therefore concluded that it was such people who were at high risk of heart disease. Subsequent research, however (e.g. Diamond, 1982; Linden, 1987), found that it was in fact aggression/hostility which predicted heart disease and not time-urgency. It was only to the extent that the 'A-B' scale included some element of hostility that it occasionally gave the prediction required.

So the "broad" coverage of the 'A-B' scale led to its being either uninformative (in the hands of a careful user) or misleading (in the hands of less careful users) and the publicity given to the claims of its authors probably caused many busy people needless worry.

This episode does not conclusively show that "narrow" scales are generally desirable but it does clearly show that "broad" scales are not necessarily desirable. "Breadth" as such is just not a goal to be generally aimed at. There may be some occasions where "breadth" of coverage is useful but that is surely a matter to be studied and determined empirically -- not mandated in advance. Boyle's attack on high internal consistency is therefore built on sand.

No doubt Boyle would say that the 'A-B' scale was "too" broad and that what he is criticizing are scales that are "too narrow" in their coverage. But who is to make such judgments? Might not a scale be "too narrow" for one purpose but just fine for other purposes? To take a light-hearted example, say that I constructed an "Attitude to Boyle" scale that contained items such as "Gregory Boyle is muddled", "Gregory Boyle is no psychometrician", "Gregory Boyle needs to think more" etc. Would not this be just the sort of scale Boyle is condemning? Are not all the items essentially re-phrasings of one another? But who is to say that such a scale could not be useful?

If I administered the scale to all the members of the Australian Psychological Society, might it not correlate with how many courses in psychometrics people had taken? Might it not thus be a useful pointer to who needed re-training? It would be sheer dogmatism to deny the possibility.

But why construct such a scale when only one item describing Boyle might work equally well? Because we all know that reliability is important and reliability is related to (not identical with) internal consistency (Cronbach, 1950; Ray, 1988). As Boyle himself proclaims, scales where all the items are essentially re-phrasings of one another do tend to have high internal consistency, so I can make my scale as reliable as I like just by adding in more and more re-phrasings of the one basic item.
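The effect of adding re-phrased items can be illustrated with the standardized form of coefficient alpha (which is equivalent to the Spearman-Brown formula). The sketch below rests on a simplifying assumption: every pair of items is taken to share the same average inter-item correlation. The function name and the value of 0.3 are hypothetical choices made for illustration only and are not drawn from any of the studies cited.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized coefficient alpha for a scale of n_items items
    whose average inter-item correlation is mean_r.

    This is the Spearman-Brown form: k*r / (1 + (k - 1)*r).
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)


# Illustrative assumption: the re-phrased items correlate 0.3 on average.
# Alpha rises steadily toward 1 as more such items are added.
for k in (1, 5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.3), 3))
```

Under these assumptions the coefficient increases with every added item and approaches, without reaching, 1.0, so any desired level can in principle be attained simply by lengthening the scale.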

In short, high reliability and internal consistency can be obtained in fairly trivial ways (as in the example above) or in less trivial ways. In neither case is it bad and in both cases it increases our confidence in our measures. It should therefore always be sought. I have already shown above that the reason Boyle advances for not seeking it (that it impedes breadth of coverage) identifies no real disadvantage at all.


