Current Affairs Bulletin, 1975, 52, 24-30.
PUBLIC OPINION POLLS AND ATTITUDE MEASUREMENT: Social science or a form of journalism?
By John Ray
Psephology - or in lay terms public opinion polling - has been described as falling between social science and journalism. R.H.S. Crossman once wrote that in his experience politicians suffered from a mild kind of schizophrenia about polls. They were read omnivorously in parliamentary circles yet politicians, especially when the polls made an adverse prediction, tended to express a rather lofty disdain for their accuracy. On the other hand, if election results confirmed pollsters' predictions, the politicians would talk darkly of the malignant influence of the polls' findings on the minds of the electorate.
Do public opinion polls in fact influence an electorate? Was, for example, the dismissal of Mr Snedden from the leadership of the Liberal Party largely the result of his poor showing as expressed by the pollsters?
Do poll results influence government thinking? A poll, even if not wanted for an assessment of an election result, can indicate the shift of public opinion over a given period, and pinpoint obstacles which could have to be faced in selling a party policy.
Of course it is argued, and rightly, that polls serve to indicate public opinion at the time of their being taken. How accurate are they in terms of prediction? In this area there have been notable instances of error: for example, on the outcome of the Whitlam government's price and wage control referendum; in America, on Truman's defeat of Dewey; and in Great Britain, on the 1970 defeat of Wilson by Heath, when Labour supporters were lulled by the polls into such a sense of victory that large numbers stayed away from the polling booths.
That polls must inevitably affect morale and the behaviour of parliamentary leaders there can scarcely be very much doubt. They probably have an effect, too, on the reporting and discussion of politics in the media, and may strengthen or weaken the position of a party in power or its opposition. They may influence the way in which parties handle particular issues, and at election time they must in some measure influence party strategy and tactics.
But one of the disconcerting aspects of the public opinion polls is that very few people appear to know very much about the methods employed in polling, about the techniques of sampling, and the numbers of views canvassed. How serious is sampling error and are question-framing and interviewer bias important? These are among the issues examined by this article.
To a significant degree, that all-important political entity "public opinion" has been created for us by public opinion polls. How well such polls do their job is therefore very important. In what follows we will consider what they should be doing from a social-scientific point of view and then ask how well they do it. It will not be possible to examine separately the various practices of each individual polling organisation so generalisations will have to be resorted to which are, of course, often much more true of one poll than of another.
Perhaps the field in which commercial polls are least open to criticism is in their sampling of people to be interviewed. The polls in Australia use different techniques to arrive at randomness but all seem to work very well. A.N.O.P. use a particularly careful procedure. They divide up the urban Commonwealth electoral subdivisions in terms of both socio-economic status and vote at the last elections so that there are equal numbers of (for instance) high- and low-status subdivisions used and equal numbers of (for instance) "safe Liberal" and "marginal Liberal" subdivisions used.
This is called "stratification". From each subdivision thus selected for study, an appropriate number of people to be interviewed is chosen completely at random (using a table of random numbers) from that subdivision's electoral roll.
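The stratified draw just described can be sketched in Python. This is only an illustration under stated assumptions: the rolls, the stratum labels and the function name `stratified_sample` are hypothetical, each stratum is treated as a single roll for simplicity, and a seeded random number generator stands in for the table of random numbers. It does not claim to reproduce A.N.O.P.'s actual procedure.

```python
import random

def stratified_sample(rolls_by_stratum, per_stratum, seed=0):
    """Draw the same number of names at random from each stratum's roll.

    rolls_by_stratum: dict mapping a stratum label to a list of names
    (a hypothetical stand-in for a subdivision's electoral roll).
    """
    rng = random.Random(seed)  # stands in for a table of random numbers
    sample = {}
    for stratum, roll in rolls_by_stratum.items():
        # rng.sample draws without replacement, so nobody is picked twice
        sample[stratum] = rng.sample(roll, per_stratum)
    return sample

# Hypothetical rolls for four strata (status x margin at the last election)
rolls = {
    ("high status", "safe Liberal"): [f"HS-{i}" for i in range(500)],
    ("high status", "marginal Liberal"): [f"HM-{i}" for i in range(500)],
    ("low status", "safe Liberal"): [f"LS-{i}" for i in range(500)],
    ("low status", "marginal Liberal"): [f"LM-{i}" for i in range(500)],
}
chosen = stratified_sample(rolls, per_stratum=100)
```

Each stratum contributes exactly 100 names, so no type of subdivision can be over- or under-represented by the luck of the draw.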
Other Australian polls (such as Morgan-Gallup and McNair) and all British polls (including A.N.O.P.'s parent) differ in that they use a short-cut procedure known as "clustering". This means that if 100 people were needed from each subdivision, Morgan-Gallup would draw from the electoral roll only ten addresses and these addresses would be used as "starting points". From each starting point the interviewer then goes along the street until he has found ten people. Those interviewed are then always located in geographical "clusters" of ten. A.N.O.P., by contrast, would select from the beginning 100 entirely unrelated addresses. The advantage of clustering is that a bigger sample can be obtained for the same cost while the advantage of the A.N.O.P. method is that the sample is exactly random, rather than approximately random, at the time it is drawn. In actual fact, of course, no sample is ever collected as it is drawn as there are always some 15 per cent of the people from whom, for one reason or another, no response is obtainable.
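The contrast between the two draws can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not in the article: the roll is a hypothetical list already in street order, and starting points are chosen as non-overlapping blocks so that nobody is interviewed twice. The function names are illustrative only.

```python
import random

def simple_random_sample(roll, n, rng):
    """n entirely unrelated addresses drawn directly from the roll."""
    return rng.sample(roll, n)

def cluster_sample(roll, n_clusters, cluster_size, rng):
    """Draw n_clusters starting points, then take cluster_size consecutive
    addresses from each (the interviewer going along the street).

    Assumes the roll is listed in street order, which is a simplification.
    """
    # Choose non-overlapping blocks so that no address is selected twice
    n_blocks = len(roll) // cluster_size
    blocks = rng.sample(range(n_blocks), n_clusters)
    sample = []
    for block in blocks:
        first = block * cluster_size
        sample.extend(roll[first:first + cluster_size])
    return sample

rng = random.Random(1975)
roll = [f"address-{i}" for i in range(5000)]  # hypothetical subdivision roll
unclustered = simple_random_sample(roll, 100, rng)
clustered = cluster_sample(roll, n_clusters=10, cluster_size=10, rng=rng)
```

Both calls return 100 addresses, but the clustered sample requires the interviewer to travel to only ten locations instead of up to a hundred.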
Both the clustering and the non-clustering methods do, then, have strengths that aid accurate prediction. And in fact, regardless of method, all polls tend to give very similar and highly accurate predictions of election outcomes. The interviewing for each poll is normally done on two successive weekends. All polls take extensive precautions to see that their interviewers do not turn in "made up" results.
Methods of questioning
Finding out what people really think is a problem as old as the human race. Social scientists have not solved it. In their years of working on it, however, they have come up with some ways of going about the task that offer some improvement on the simple formula: "Just ask people".
A strategy that seldom seems to pay off is any attempt to trick information out of people. One "trick", however, that often does appear useful is to include in a questionnaire a "lie" scale - a list of unlikely-to-be-true statements such as "I never tell lies". Anybody who agrees with a large number of such statements is obviously "faking good" and making no attempt to represent himself honestly.
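Scoring a "lie" scale amounts to counting agreements. The Python sketch below is illustrative only: apart from "I never tell lies", the statements are invented for the example, and the cut-off of three agreements is an arbitrary assumption rather than an established standard.

```python
# Only the first statement comes from the article; the rest are hypothetical.
LIE_ITEMS = [
    "I never tell lies",
    "I have never been late for anything",
    "I have never felt angry with anyone",
    "I always enjoy doing what I am told",
]

def lie_score(answers, lie_items=LIE_ITEMS):
    """Count how many lie-scale statements the respondent agreed with."""
    return sum(1 for item in lie_items if answers.get(item) == "agree")

def is_faking_good(answers, threshold=3):
    """Flag a respondent who agrees with `threshold` or more lie items."""
    return lie_score(answers) >= threshold

candid = {item: "disagree" for item in LIE_ITEMS}
candid["I always enjoy doing what I am told"] = "agree"
flatterer = {item: "agree" for item in LIE_ITEMS}
```

A flagged respondent's other answers can then be discounted or examined separately, rather than being taken at face value.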
Thus, although the only really informative way of finding out what people think is to ask them directly, we are not totally reliant on their honesty. Methods do exist to help us detect and make some allowance for lying. Commercial polls, however, seldom take full advantage of this - relying usually on the "repeated question" technique only.
In asking people directly about their attitudes, however, there are still two major alternative ways of going about it: the closed-ended and the open-ended methods. In the open-ended method we simply ask a person (for example): "What do you think about Aborigines?" and take down whatever he says. In the closed-ended method we ask: "Do you think Aborigines are in general hard workers?" and the person is given the option only of replying "Yes", "No" or "Not sure".
Open or closed?
Opinion research organisations use both closed- and open-ended methods. The closed-ended is the method most used for political polls and the like, whereas the open-ended method tends to be used in some market-research applications. The closed-ended method is sometimes criticised as imposing an artificial choice upon the respondent, while the open-ended method enables people to put things in their own words. The catch, however, is that once you enable people to put things in their own words, you have no objective or immediately quantifiable way of comparing people. In many cases you cannot even set up such simple categories as "percentage for" and "percentage against". To overcome this, open-ended surveys are almost always followed by a stage of "coding" the answers given - i.e., putting them into categories. This involves a lot of work and contains an inevitable element of arbitrariness. For this reason many researchers feel that it is better to give the respondents themselves the categories to start with and let them make their own choice between them.
It follows then that the best procedure in opinion measurement is an initial pilot survey of the open-ended type to decide the response categories to be used, followed by the major survey in closed-ended format. This comes near to getting the best of all possible worlds, in that the categories of opinion people are given to choose between are not artificial constructions but rather the type of words that people have been shown empirically to use most. At the same time the results are completely objective and beyond dispute - to the point where they can even be machine-scored.
Public opinion polls, however, often fail to carry out this full procedure. In so many surveys the statements given to people to respond to are composed not by the people themselves but by self-styled "experts" in a field where expertise is very much a matter of opinion. This is not, of course, true of routine political surveys where the questions to be asked are well-tried and the product of long sifting.
Summated measurement or single questions?
To the man in the street, if you want to find out whether a person likes ice-cream or not, you have only to ask: "Do you like ice-cream - Yes or No?" and that is all there is to it. Perhaps because the man in the street or someone not far removed from him is their most important customer, commercial polls often do that sort of thing.
For simple clear-cut choices that the person has thought adequately about, and about which he has had adequate experience, the method does work reasonably well. For polls about voting intention, for instance, it is often extremely accurate. Where it falls down, however, is in measuring complex or highly general attitudes. A good instance of this is attitude to Aborigines or racial prejudice generally. Such attitudes are very much a matter of concern in the modern world and polling organisations may often be asked to assist in studies of their epidemiology and correlates. An obvious first step in reducing racism is to gain an understanding of why it develops, and surveys can provide the basic information.
It can well be imagined that if a study of racism were to proceed by asking people: "Do you like Aborigines - Yes or No?", the results might not be very informative. Because racial attitudes are so complex, people will often object to such broad generalisations to the point of total non-co-operation. Usually, what people will say is that they object to (or like) some things about Aborigines but not others. They may, for instance, like their "easy-going-ness" but dislike their "drunkenness". In this case, then, we have to define racism in relative terms. A man is more or less racist according to how many senses there are in which he dislikes Aborigines, and this makes it necessary for us to ask several questions rather than only one about the topic. A closely related reason why we may have to ask several questions just to pin down one thing is that the concept we are concerned with may be inherently general. The measurement of conservatism would be a good example here. Most of us are conservative in some things but not in others. Again one can speak only of degrees of conservatism - with the most conservative people being those who take a conservative stance in the greatest number of areas. But in spite of these fairly obvious considerations, many polling research organisations do make inferences and statements about general traits in people on the basis of only one or two questions.
Even where a list of questions has been asked about one content-area, however, it is not always adequate simply to add them up without further ado. You need some guarantee that the questions you have asked do in fact all tap aspects of the one thing. You need some guarantee that the people answering see the relationship between the questions in roughly the same way as you do. In other words, you need to find out if answers to the set of questions do go together empirically; you need to calculate whether people who tend to get high scores on one half of your set of questions also tend to get high scores on the other half. If they do not, you are attempting to add up dissimilar things. It would be like adding up apples and oranges and getting as your answer grapefruit. In the technical terms of the psychometrician, you have to examine your "split-half reliability". Proper reliability checking is extremely rare in public opinion polls.
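A split-half check of this kind can be sketched in a few lines of Python. The data are invented for illustration (six respondents answering four items scored 1 to 5), and splitting the items into odd- and even-numbered halves is one common convention, assumed here rather than prescribed by the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_correlation(responses):
    """Correlate each person's total on odd-numbered items with
    their total on even-numbered items."""
    odd_totals = [sum(r[0::2]) for r in responses]
    even_totals = [sum(r[1::2]) for r in responses]
    return pearson(odd_totals, even_totals)

# Hypothetical answers: one row per respondent, one column per item
responses = [
    [5, 4, 5, 4],
    [4, 4, 5, 5],
    [3, 3, 2, 3],
    [2, 1, 2, 2],
    [1, 2, 1, 1],
    [4, 5, 4, 4],
]
r = split_half_correlation(responses)
```

With these made-up answers the two halves correlate at a little over 0.9, which would suggest the items tap one thing. A correlation near zero would be the apples-and-oranges case.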
An example of the false inferences that can result from relying on answers to only one question rather than on "scales" (summated lists of questions) occurs in the work of the sociologist Lipset. Lipset was interested in whether working class people were authoritarian and used poll data to support his contention that they were. He found that the workers were much more likely to answer "one" when asked in polls: "How many political parties should there be?" This Lipset took as clear evidence that the workers preferred a one-party totalitarian state. He assumed that the desire for a one-party state arose from a desire to do away with dissent. There is however another possibility that made itself apparent to me only by accident. I was trying out in an authoritarianism "scale" (on a rationale similar to Lipset's) the similar statement "It would be much better if we could do without politics altogether." This I expected to be a strong pro-authoritarian item. I therefore made up my questionnaire including this and many other pro- and anti-authoritarianism items and asked a sample of people from the general population whether they agreed or disagreed with each statement.
What I found was that it was the people who generally disagreed with pro-authoritarianism items who agreed with this statement. The statement proved to express anti-authoritarian rather than pro-authoritarian sentiments. What happened was evidently that people saw the item as expressing anarchism to be the political ideal - not totalitarianism. In a similar vein, one might interpret Lipset's question by saying that people who express a desire for only one political party could thereby be choosing an ideal of brotherly and egalitarian consensus, rather than an authoritarian dictatorship.
One does, then, need evidence that a survey question is interpreted the way one wants and expects it to be. The best way of ensuring and examining this would again seem to be by using several questions to probe the one matter. Polls do this sometimes but they need to do it both regularly and in the systematic way I have described.
Balance against direction of wording
All questionnaire research seems to be plagued by the "acquiescence" problem. This is the tendency of people when answering survey questions to reply "yes" without really considering the questions at all. The phenomenon arises for a number of reasons. The most common is probably indifference. People reply "yes" believing that to be the easiest way to get rid of the interviewer. Another reason is that the question may be in some way obscure or complex and, rather than think it out, the person again simply says "yes" as the easiest answer. Thus if a poll puts the question: "Should Australia recognise Communist China?" and 50 per cent reply "yes", we can infer that in fact fewer than 50 per cent really favour this policy. The 50 per cent in favour might be made up of 30 per cent genuinely in favour and 20 per cent indifferent.
There is, of course, no way in which we can force people to stop being indifferent. One thing we can do, however, is ensure that the effect of indifference is evenly spread over all possible answers, so that one answer does not have the numbers choosing it artificially inflated at the expense of the others.
The way we do this is by the technique of "split plots". We divide our sample of people into two halves. To one half we give a negative form of the question and to the other half we give a positive form. On the China question, to give an oversimplified example, we could change the question to a statement in the form: "Australia should not recognise Communist China" and ask people whether they agreed or disagreed. This would be a negative form of the item to be answered by half the sample only. The other half would be given the positive statement: "Australia should recognise Communist China". The effect of this would be that half the indifferents were put into the "Pro" camp and half into the "Anti" camp. There would be no systematic inflation of one side versus the other and the results would give a truer reflection of the division of opinion in the community.
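The arithmetic behind this can be worked through in Python using the figures from the China example: 30 per cent genuinely in favour and 20 per cent indifferent. Two things are assumed for the sketch and are not stated in the article: the remaining 50 per cent are genuinely opposed, and the indifferent always agree with whatever statement is put to them.

```python
def single_form(pro, anti, indifferent):
    """Positive wording only: every indifferent respondent says "yes"."""
    return {"pro": pro + indifferent, "anti": anti}

def split_form(pro, anti, indifferent):
    """Half the sample gets the positive wording, half the negative."""
    # Positive half: the indifferent agree, so they are counted as "pro"
    positive_half = {"pro": pro + indifferent, "anti": anti}
    # Negative half: the indifferent agree, so they are counted as "anti"
    negative_half = {"pro": pro, "anti": anti + indifferent}
    # The two halves are equal in size, so the overall figure is their average
    return {side: (positive_half[side] + negative_half[side]) / 2
            for side in ("pro", "anti")}

one_wording = single_form(pro=30, anti=50, indifferent=20)
two_wordings = split_form(pro=30, anti=50, indifferent=20)
```

Under these assumptions the single wording reports 50 per cent in favour, while the split design reports 40 per cent in favour and 60 per cent against: the 20 per cent of indifferents are divided evenly between the two camps.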
Pollsters do not of course ignore this problem altogether. Questions are often worded to require a more substantive answer than either "yes" or "no" - for instance by giving a list of options (such as political parties) from which to choose. This however may simply make the problem more intractable. One exchanges the acquiescence problem for the "donkey vote" problem - i.e., people simply number their preferences straight down the page as 1, 2, 3, 4. The proper "split plots" control technique is seen as too laborious and is seldom used. Other available control techniques, such as "balancing", are usable only on multi-item "scales" - which, as we have seen, polls seldom use.
Reliability and validity
Psychometricians use the words "reliability" and "validity" in a technical sense to refer to two indispensable characteristics that any form of attitude measurement must have if it is to be regarded as accurate: it must be repeatable and still give the same answer (reliability) and the answer given must be one that does in fact reflect what it purports to reflect (validity). There are standard procedures for ascertaining both of these but one could be forgiven the impression that all these procedures are totally unknown to most pollsters.
The simple test for reliability is to give a question or scale to a group of people twice and see how highly the answers on the two occasions correlate. As a short-cut, however, one can (only where scales are used) use the "split-half" procedure mentioned earlier. Use the correlation between any two halves of the scale to estimate the correlation between two occasions of administering the whole scale. Many scales and many single questions prove not to be reliable when a test is done and there generally in fact needs to be considerable trial and error item selection before reliable measurement can be achieved. When pollsters omit such considerations, therefore, they seriously reduce the value of their results.
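The step from a half-scale correlation to an estimate for the whole scale is conventionally made with the Spearman-Brown formula. The article does not name it, so treating it as the intended method is an assumption. A minimal Python sketch:

```python
def spearman_brown(r_half):
    """Estimate whole-scale reliability from the correlation
    between two halves of the scale."""
    return 2 * r_half / (1 + r_half)

# A hypothetical correlation of 0.6 between the two halves
estimated = spearman_brown(0.6)
```

A half-scale correlation of 0.6 steps up to an estimated 0.75 for the full scale, reflecting the fact that a longer scale is generally more reliable than either of its halves.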
Validity is a much more involved question. When we examine the validity of a conservatism index or scale, for instance, we ask ourselves: "Do the people who give the conservative answer on these questions actually act in the conservative way?" As has often been shown, attitude indexes very often do not predict behaviour. Perhaps the most generally satisfactory way of assessing such a correlation is to get independent ratings of each person's behaviour from others who know him and see whether what others say about him and what he says about himself actually do go together. Some sets of questions do produce a high correlation in such circumstances while others do not. Pollsters often seem to assume a high correlation without any prior proof whatsoever.
The way in which polling results are described in the press represents a fairly low level of informativeness. To know that "40 per cent" of the people favoured something sounds simple but does in fact disguise many ambiguities. Perhaps the basic error is in assuming that people can be lumped simply into "for" and "against" categories. When social scientists carry out surveys, they normally allow for a range of responses - a simple example of which would be: "Strongly Agree", "Agree", "Not sure", "Disagree", "Strongly Disagree". Such responses would be transformed into numerical scores such as 5, 4, 3, 2, 1 respectively. When this is done, one can then obtain an average "score" for each question which tells how much people tend to agree. Additionally, one can get a second statistic (known as a "standard deviation") telling us how much the scores tend to scatter around that average. It is obviously of some interest to know if the average we have was obtained by scores being all bunched closely to it or if it was obtained by scores being scattered fairly uniformly around the full range of possible responses.
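The point about bunched versus scattered scores can be illustrated in Python with two invented groups of ten respondents. The responses are hypothetical, and the population form of the standard deviation (`statistics.pstdev`) is used as an assumption, since the text does not say which form is meant.

```python
from statistics import mean, pstdev

SCORES = {
    "Strongly Agree": 5,
    "Agree": 4,
    "Not sure": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}

def describe(responses):
    """Return (average score, standard deviation) for one question."""
    scores = [SCORES[r] for r in responses]
    return mean(scores), pstdev(scores)

# Hypothetical group whose answers are bunched around the middle
bunched = ["Not sure"] * 8 + ["Agree", "Disagree"]
# Hypothetical group spread evenly over the full range
scattered = [label for label in SCORES for _ in range(2)]

bunched_mean, bunched_sd = describe(bunched)
scattered_mean, scattered_sd = describe(scattered)
```

Both groups have the same average score of 3, yet the standard deviation is roughly 0.45 for the first and roughly 1.41 for the second. The average alone would hide the fact that the second group is sharply divided.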
Another factor that should be looked for in sample surveys is some statement about how likely it is that the result given will actually represent what the population as a whole would say. The results obtained from a sample do differ in variable ways from the results one would find by surveying the population as a whole. In a properly drawn sample, however, these variations are both small and their likely magnitude something that can be estimated. This estimate (or "standard error") is a simple and routine computation carried out by almost any computer program that polling organisations use to process their results. Practically never however does this statistic find its way into press reports of polling results. It is therefore of some importance not to take seriously reports that two groups (of, say, voters) show small differences in their preferences for various things. A very small difference may in fact lie within the range of variation that the "standard error" would tell us could occur by chance sampling variations alone. When there is doubt about whether this explanation for any difference should be entertained, there is no alternative but to seek from the polling organisation directly fuller details than they release to the press. Unless one is either lucky or persuasive, this can cost money. Until their results have become outdated, polling organisations must - in order to support themselves - charge for what information they provide.
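A reader given the sample sizes can make a rough check of this kind unaided. The Python sketch below uses the textbook standard error of a proportion for a simple random sample, which is an assumption: clustered samples generally have somewhat larger errors. The function names, the figures and the cut-off of 1.96 standard errors (approximately the 95 per cent level) are illustrative choices, not anything the polling organisations are known to use.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a proportion p from a simple random sample of n."""
    return sqrt(p * (1 - p) / n)

def difference_within_chance(p1, n1, p2, n2, z=1.96):
    """True if the gap between two independent groups is no larger than
    sampling variation alone could plausibly produce."""
    se_diff = sqrt(standard_error(p1, n1) ** 2 + standard_error(p2, n2) ** 2)
    return abs(p1 - p2) <= z * se_diff

# "40 per cent" from a hypothetical sample of 1,000 people
se_40 = standard_error(0.40, 1000)

# Two hypothetical groups of 500 voters each
small_gap = difference_within_chance(0.42, 500, 0.40, 500)
large_gap = difference_within_chance(0.55, 500, 0.40, 500)
```

Here the standard error on the 40 per cent figure is about 1.5 percentage points. The two-point gap between the groups lies well within chance variation, while the fifteen-point gap does not.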
There have been some notable errors in poll predictions. The worst one so far by Australian polls was the prediction of the outcome of the Whitlam government's price and wage control referendum. From the polls that were published (particularly McNair's "Gallup" poll), no one expected the price control proposal to be defeated. It seems that what went wrong on that occasion was a last-minute mood of caution on the part of voters when actually confronted with the voting decision. They were not sure enough that they understood all the complicated arguments and divisions of opinion surrounding referendum proposals to give the government the go-ahead it sought. Last-minute changes of mood in the voters are of course something that no poll can allow for. The best one can ask is that they do truly report opinion that is consistently held.
A more important sort of polling error is the so-called "seldep" (self-defeating prophecy) and "selfup" (self-fulfilling prophecy). The most notable example of a "seldep" was probably the unexpected triumph of Mr Edward Heath in the 1970 British elections. What appears to have happened on that occasion was that the widespread predictions of Mr Heath's defeat led far too many Labour voters to stay at home and not bother to vote - confident that their man was in. Unlike Australia, Britain does not have compulsory voting. The Heath episode, then, leads to the ironical conclusion that polls will predict best when people have no confidence in them. This is at least a self-correcting system - melancholy a prospect though it may be to the pollster himself.
Another inference in this connection is that, whatever other merits it may or may not have, Australia's almost unique system of compulsory voting does have at least two advantages for pollsters. The first is that the population that will vote can be easily defined. It becomes the same as the total population. In the UK and the US this is not so. There, because a large proportion of the electorate fails to vote, polls may be more accurate in reflecting public preferences than elections are. One therefore cannot be used to predict the other. The second advantage is that the poll itself cannot influence the vote in the way it appeared to do in the 1970 British elections. Turnout is almost universal regardless of the polls. This should mean that Australian polls are more accurate than those in the US and the UK. Certainly, American polling, like British polling, has had its major defeats. The example of failure most quoted in the US is, of course, Truman's defeat of Dewey. Until recently, Australian polls had no comparable skeletons in the cupboard but the price control referendum did appear to dent this record. In fact, however, A.N.O.P. was prudent enough to stay right out of referendum predictions while Morgan-Gallup, in a poll taken just before the referendum, did actually make the right prediction. As Morgan-Gallup do not now have a daily-paper outlet this prediction appeared in print only after the outcome of the referendum was known.
A final, rather paradoxical, point about polling error is that a poll can be accurate even when it appears to fail to predict the right winner. To the general public, a poll that picks the right political party to win an election but gets the margin of victory quite wrong is good enough. Logically, however, such results indicate good luck more than scientifically accurate polling. A more commendable achievement is to get the percentage vote very close to the actual one, even if the slight margin of error does lead to picking the wrong candidate in a close contest. Nowadays polls almost always do this. Whether they pick the right winner or not, their margin of error is normally very small. This is a strong indication of the adequacy of their sampling methods.
Finally, it seems appropriate to revert now to the question raised at the beginning of this article: what is the role of polls in creating opinions? As well as influencing voter turnout do they also influence opinion? It would certainly seem that they do influence politicians' opinions. The upheavals over Mr Snedden's leadership of the Liberal party seemed to turn on his poor "image" and polls are sources of information on this that are both primary and hard to dispute. In the absence of polls, it would take an election to find out for certain whether a leader had a poor image. Polls can thus make democracy work better by making the politicians more aware of the people's opinions. They thus "create" a "knowable" public opinion at all times. Otherwise we would have a public opinion that effectively existed only at election times.
The influence of polling on public opinion as such however is much more contentious. There is a possible "bandwagon" effect of people wanting to be on what they are told is the winning side and there is also the possibility of an opposite "underdog" effect of people wanting to give support to someone whom they feel sorry for because he is said to have little chance of winning. This latter effect could be especially strong in Australia - where sympathy for the underdog seems to be almost a national religion. In general, however, it should be fairly obvious that which of the two effects (if any) is dominant on any particular occasion must be highly dependent on the personalities of the particular political candidates involved and a whole host of other situational factors. Any secure generalisation as to which effect is "generally" dominant would therefore be very difficult. Perhaps the generalisation most likely to be true is that over a long period no systematic effect either way could be expected.
A more serious charge against polls as serving a political cause is that the pollster can word his questions in such a way as to produce a particular desired result. One instance of this is the large number of Australians who are shown in ordinary commercial polls to oppose "Abortion on demand". Polls taken by women's organisations show that the proportion of support for abortion obtained with this question is much smaller than the proportion of support obtained with other wordings of the same question - such as: "Do you think women should have the right to an abortion if they want one?" Does this mean that polls are "meaningless"? Not necessarily, but it does serve as a warning against reliance on one question only, or questions that are worded in one direction only. "Abortion on demand" tends to be a phrase used by opponents of liberalised abortion and phraseology used by people in favour of it should have been given equal usage by the commercial polls.
Also illustrated by these surveys on abortion is the possibility of interviewer bias - a well-known source of distorted results. Since the women were very definitely out to prove their case, this could have had a quite substantial effect in this instance. It must also be said that the sampling in these amateur surveys left much to be desired.
To sum up, polls are in general no better than their market demands. This means a generally fairly low standard except so far as the sampling itself is concerned. More use of "lie" scales, less arbitrary judgment about the wording of questions, less reliance on single questions, balance against acquiescence, proof of reliability and validity and fuller descriptive statistics are all areas in considerable need of improvement. Greater sophistication in the academic discipline known as "psychometrics" would appear to be what most pollsters need. This however is less of a criticism than it seems. Many sociologists and other survey-taking academics show equal lack of such sophistication and into the bargain are often much less careful in their sampling.