Margin of error

This page discusses the use of the term in opinion polling.

Figure: The top portion of this graphic depicts probability densities that show the relative likelihood that the "true" percentage is in a particular area given a reported percentage of 50 percent. The bottom portion of this graphic shows the margin of error, the corresponding zone of 99 percent confidence. In other words, one is 99 percent sure that the "true" percentage is in this region given a poll with the sample size shown to the right. The larger the sample, the smaller the margin of error. If lower standards of confidence (95 or 90 percent) are used, the margins of error will be smaller (by 24 or 36 percent, respectively) for the same sample sizes.

The margin of error can be calculated directly from the sample size (the number of poll respondents) and is commonly reported at one of three levels of confidence. The 99 percent level is the most conservative, the 95 percent level is the most widespread, and the 90 percent level is rarely used.

Formally, if the level of confidence is 99 percent, one is 99 percent certain that the "true" percentage in a population is within a margin of error of a poll's reported percentage, for a reported percentage of 50 percent. Equivalently, the margin of error is the radius of the 99 percent confidence interval for a reported percentage of 50 percent.

Note that the margin of error takes into account only sampling error. It does not take into account other potential sources of error, such as bias in the questions, bias due to excluding groups who could not be contacted or who refuse to respond (selection bias), respondents lying, or miscounts and miscalculations.

Calculations and caveats

The margin of error is a simple re-expression of the sample size, N. The numerators of the following equations are rounded to two decimal places.

$$\text{Margin of error (99\%)} = \frac{1.29}{\sqrt{N}}$$

$$\text{Margin of error (95\%)} = \frac{0.98}{\sqrt{N}}$$

$$\text{Margin of error (90\%)} = \frac{0.82}{\sqrt{N}}$$
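The relationship between sample size and margin of error described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function name `margin_of_error` is hypothetical, and the critical values come from the standard normal distribution in Python's `statistics` module rather than from the rounded numerators quoted in the text.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.99) -> float:
    """Margin of error (as a proportion) for a reported percentage of 50 percent.

    Hypothetical helper for illustration; `n` is the number of respondents.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    # Two-sided critical value, e.g. roughly 2.576 for 99 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # At p = 0.5 the standard error is sqrt(0.5 * 0.5 / n) = 0.5 / sqrt(n).
    return z * 0.5 / sqrt(n)


if __name__ == "__main__":
    # Sample size of the Newsweek poll used as the running example.
    for level in (0.99, 0.95, 0.90):
        print(f"{level:.0%} confidence: +/-{margin_of_error(1013, level):.1%}")
```

For the 1,013 respondents of the Newsweek poll, the 99 percent value comes out at roughly 4 percent, in line with the reported margin of error.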
The margin of error is a poll-level statistic that should not be used to evaluate or compare reported percentages. However, due to its unfortunate name (it neither establishes a "margin" nor is the whole of "error"), it has become one of the most widely overinterpreted statistics in general use by the media. It is frequently misused to judge whether one percentage is "significantly" higher than another, or to specify the error associated with reported percentages other than 50 percent.

Understanding the margin of error

A running example

This running example from the 2004 U.S. presidential campaign will be used to illustrate concepts throughout this article. The choice of poll, and of who is leading, is irrelevant to the presentation of the concepts. According to an October 2 survey by Newsweek, 47 percent of registered voters would vote for John Kerry/John Edwards if the election were held today. Forty-five percent would vote for George W. Bush/Dick Cheney, and 2 percent would vote for Ralph Nader/Peter Camejo. The size of the sample is 1,013, and the reported margin of error is ±4 percent. The 99 percent level of confidence will be used for the remainder of this article.

The basic concept

Polls require taking samples from populations. In the case of the Newsweek poll, the population of interest is the population of people who will vote. Since it is impractical to poll everyone who will vote, pollsters take smaller samples that are intended to be representative, that is, a random sample of the population. It is possible that pollsters happen to sample 1,013 voters who all happen to vote for Bush when in fact the population is split, but this is extremely unlikely given that the sample is representative. Given the size of the sample (1,013), probability theory allows the calculation of the probability that the poll reports 47 percent for Kerry when the "true" percentage is in fact 50 percent, or 42 percent, or zero percent.
This theory and some Bayesian assumptions suggest that the "true" percentage will probably be very close to 47 percent. The more people that are sampled, the more confident pollsters can be that the "true" percentage is close to the observed percentage. The margin of error is a rough, poll-wide expression of that confidence.

Statistical terms and calculations

The margin of error is just a specific 99 percent confidence interval, which is in turn a simple manipulation of the standard error. This section will briefly discuss the standard error of a percentage, briefly discuss the confidence interval, and connect these two concepts to the margin of error.

The standard error can be estimated simply, given a proportion or percentage, p, and the number of polled respondents, N. In the case of the Newsweek poll, Kerry's percentage is p = 0.47 and N = 1,013. Given some statistical theory outlined below, the following holds:

$$\text{Standard error} = \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{0.47 \times 0.53}{1013}} \approx 0.0157$$
Plus or minus 1 standard error is approximately a 68 percent confidence interval, plus or minus 2 standard errors is approximately a 95 percent confidence interval, and a 99 percent confidence interval is 2.58 standard errors on either side of the estimate. The margin of error is the radius (half) of the 99 percent confidence interval, or 2.58 standard errors, when p = 50 percent. As such, it can be calculated directly from the number of poll respondents:

$$\text{Margin of error (99\%)} = 2.58 \times \sqrt{\frac{0.5 \times (1 - 0.5)}{N}} = \frac{1.29}{\sqrt{N}}$$
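The standard error and the 99 percent confidence interval for Kerry's reported 47 percent can be checked numerically. The sketch below assumes the normal approximation used throughout the article; the function names `standard_error` and `confidence_interval` are hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(p: float, n: int) -> float:
    """Estimated standard error of a reported proportion p from n respondents."""
    return sqrt(p * (1 - p) / n)


def confidence_interval(p: float, n: int, confidence: float = 0.99):
    """Two-sided confidence interval for p under the normal approximation."""
    # Number of standard errors on either side, about 2.58 for 99 percent.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * standard_error(p, n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # Kerry's share and the sample size from the Newsweek running example.
    se = standard_error(0.47, 1013)
    low, high = confidence_interval(0.47, 1013)
    print(f"standard error: {se:.4f}")
    print(f"99% interval: {low:.3f} to {high:.3f}")
```

Because 47 percent is close to 50 percent, the half-width of this interval is almost the same as the poll-wide margin of error; for percentages far from 50 percent the interval is noticeably narrower.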
The use and abuse of the margin of error

The margin of error grew out of a well-intentioned need to compare the accuracy of different polls. However, its widespread use in high-stakes polling has degraded from comparing polls to comparing reported percentages, a use that is not supported by theory. A web search of news articles using the terms "statistical tie" or "statistical dead heat" returns many articles that use these terms to describe reported percentages that differ by less than a margin of error. These terms are misleading; if one observed percentage is greater than another, the true percentages in the entire population are probably ordered in the same way.

In addition, the margin of error as generally calculated is applicable to an *individual percentage* and not to the difference between percentages. The margin of error applicable directly to the "lead" is approximately twice the generally stated margin of error; it is exactly twice only for a two-choice poll with a result of 50% for each choice.

The margin of error is often interpreted as if the poll gives either no information (a difference within a margin of error) or perfect information (a difference larger than a margin of error) about the ranking of two percentages in the population. As the margin of error continues to be inappropriately applied, simpler alternatives (sample size) or more complex alternatives (standard error, probability of leading) may be warranted.

Incorrect interpretations of the margin of error

Here are some INCORRECT interpretations of the margin of error based on the Newsweek poll.
... 100,000,000 people. This may seem counter-intuitive at first; after all, each person in the population has a unique personality and opinion, and in a very large population only a very small fraction of people would actually be polled, so it would seem that the poll is not capturing enough information. However, because a poll involves only a very specific question, there is only one relevant attribute in the population that needs to be considered. This means that an individual's opinion is effectively equivalent to those of many other members of the population, some fraction of whom will be polled. For instance, in the running example, the only relevant attribute of a population member is whether he or she is a Bush voter, a Kerry voter, or a Nader voter; all other characteristics of a population member are irrelevant. Thus, for instance, if there are 100,000,000 registered voters, and 48,000,000 of them were Kerry voters, then for the purposes of this statistical ...

To give an analogy, suppose that one is trying to estimate the percentage of salt in an ocean. This can be easily accomplished by taking a glass of seawater and chemically analyzing the proportion of salt in that sample. The amount of salt and water in this glass is far smaller than the amount of salt and water in the ocean under study. Nevertheless, the sample is likely to give a very accurate measurement of the ocean's salinity, provided of course that the salt is evenly distributed across the ocean (this hypothesis is the analogue of the hypothesis that the poll sample is randomly chosen). In fact, one could already obtain a crude but reasonable estimate of salinity by testing just a single drop of seawater, though the larger sample in the glass would provide a more accurate measurement. This analogy may help explain why it is the sample size, rather than the population size, that determines the margin of error in a poll.
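The claim that sample size, not population size, drives the margin of error can be illustrated numerically. The sketch below relies on an assumption the article does not state: the standard finite population correction for simple random sampling without replacement, sqrt((M − n)/(M − 1)) for a population of size M. The function name `standard_error_finite` is hypothetical.

```python
from math import sqrt


def standard_error_finite(p: float, n: int, population: int) -> float:
    """Standard error of a sample proportion drawn without replacement.

    Assumes simple random sampling from a finite population and applies the
    finite population correction; this formula is not taken from the article.
    """
    if population < 2 or not 0 < n <= population:
        raise ValueError("need 0 < n <= population and population >= 2")
    correction = sqrt((population - n) / (population - 1))
    return sqrt(p * (1 - p) / n) * correction


if __name__ == "__main__":
    # Same sample of 1,000 drawn from populations of very different sizes.
    for population in (100_000, 1_000_000, 100_000_000):
        se = standard_error_finite(0.5, 1000, population)
        print(f"population {population:>11,}: standard error {se:.5f}")
```

Under this assumption, a sample of 1,000 gives nearly the same standard error whether it is drawn from 100,000 people or from 100,000,000; the correction matters only when the sample is a substantial fraction of the population.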
Comparing percentages: the probability of leading

Tables

The margin of error is frequently misused to determine whether one percentage is higher than another. The statistic that should be used is simply the probability that one percentage is higher than another. This can tentatively be called the Probability of Leading. Here is a table that gives the percentage probability of leading for two candidates, in the absence of any other candidates, assuming 95% confidence levels are used:
For example, the probability that Kerry is leading Bush given the data from the Newsweek poll (a 2% difference and a 4% margin of error) is about 68.8%, provided the poll used a 95% confidence level. Note that the 100% entries in the table are actually slightly less than 100%. Here is the same table for the 99% confidence level:
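If the Newsweek poll used a 99% confidence level, the probability that Kerry is leading Bush would be about 74.1%. It is evident that the confidence level has a significant impact on the probability of leading.

Derivation

The rest of this section shows how the Newsweek percentage might be calculated. This probability can be derived with 1) the standard error calculation introduced earlier, 2) the formula for the variance of the difference of two random variables, and 3) an assumption that anyone who does not choose Kerry will choose Bush, and vice versa, i.e. that the two percentages are perfectly negatively correlated. This assumption may not be tenable given that a voter could be undecided or vote for Nader, but the results will still be illustrative. The standard error of the difference of percentages p for Kerry and q for Bush, assuming that they are perfectly negatively correlated, follows:

$$\text{Standard error of difference} = \sqrt{\frac{p(1-p) + q(1-q) + 2\sqrt{p(1-p)\,q(1-q)}}{N}} \approx 0.0313, \quad \text{given } p = 0.47,\ q = 0.45,\ N = 1013$$

Treating the observed difference p − q = 0.02 as normally distributed with this standard error, the probability that the true difference is greater than zero is $\Phi(0.02 / 0.0313) \approx \Phi(0.64)$, where $\Phi$ is the standard normal cumulative distribution function.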
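The probability-of-leading figures discussed in this section can be reproduced with a short Python sketch. It follows the article's stated assumptions (normal approximation, two candidates whose percentages are perfectly negatively correlated); the function names are hypothetical, and the first function assumes the reported margin of error was computed at p = 50 percent.

```python
from math import sqrt
from statistics import NormalDist


def prob_leading_from_margin(difference: float, margin: float,
                             confidence: float) -> float:
    """Probability of leading from a reported lead and margin of error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se_single = margin / z         # standard error of one percentage at p = 0.5
    se_difference = 2 * se_single  # perfect negative correlation doubles it
    return NormalDist().cdf(difference / se_difference)


def prob_leading(p: float, q: float, n: int) -> float:
    """Probability that p truly exceeds q, given n respondents."""
    # With correlation -1, the standard deviations of the two shares add.
    se_difference = (sqrt(p * (1 - p)) + sqrt(q * (1 - q))) / sqrt(n)
    return NormalDist().cdf((p - q) / se_difference)


if __name__ == "__main__":
    # Newsweek example: a 2-point lead with a reported 4-point margin of error.
    print(f"{prob_leading_from_margin(0.02, 0.04, 0.95):.3f}")
    print(f"{prob_leading_from_margin(0.02, 0.04, 0.99):.3f}")
    print(f"{prob_leading(0.47, 0.45, 1013):.3f}")
```

The first two calls correspond to the table lookups at the 95 and 99 percent confidence levels, and the third works directly from the reported percentages and sample size. Small differences from the quoted figures arise from rounding of the critical values.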
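These calculations suggest that the probability that Kerry is "truly" leading is about 74 percent.

More advanced calculations behind the margin of error

Let N be the number of voters in the sample. Suppose them to have been drawn randomly and independently from the whole population of voters. This is perhaps optimistic, but with care it can be at least approximated in reality. Let p be the proportion of voters in the whole population who will vote "yes". Then the number X of voters in the sample who will vote "yes" is a random variable with a binomial distribution with parameters N and p. If N is large enough, then X is approximately normally distributed with expected value Np and variance Np(1 − p). Therefore

$$Z = \frac{X - Np}{\sqrt{Np(1-p)}} = \frac{X/N - p}{\sqrt{p(1-p)/N}}$$

is approximately normally distributed with expected value 0 and variance 1. For a standard normal variable, $P(-2.576 < Z < 2.576) = 0.99$, so

$$P\left(-2.576 < \frac{X/N - p}{\sqrt{p(1-p)/N}} < 2.576\right) = 0.99.$$

This is equivalent to

$$P\left(\frac{X}{N} - 2.576\sqrt{\frac{p(1-p)}{N}} < p < \frac{X}{N} + 2.576\sqrt{\frac{p(1-p)}{N}}\right) = 0.99.$$

Replacing p in the first and third members of this inequality by the estimated value X/N seldom results in large errors if N is large enough. This operation yields

$$P\left(\frac{X}{N} - 2.576\sqrt{\frac{(X/N)(1 - X/N)}{N}} < p < \frac{X}{N} + 2.576\sqrt{\frac{(X/N)(1 - X/N)}{N}}\right) \approx 0.99.$$

The first and third members of this inequality depend on the observable X/N and not on the unobservable p, and are the endpoints of the confidence interval.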
This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.