January 1995 (vol. 11, #2)
1601 N Tucson Blvd #9, Tucson AZ 85716
© 1995 Physicians for Civil Defense
Mark Twain, crediting the remark to Disraeli, once wrote that ``there are lies, damned lies, and statistics.'' What he should have said is that there are liars, damned liars, and those who distort and misuse statistics.
The science of statistics is actually a powerful weapon in the search for truth. Those who take the trouble to learn the basic mathematics and to look at the data themselves can use statistics to shred the arguments of manipulators and scaremongers.
One of the best teachers was Petr Beckmann, author of The Elements of Applied Probability Theory. Back issues of Access to Energy provide useful and entertaining examples, explained in layman's terms. (Complete sets of back issues are now available on paper from Irene Beckmann for $145, Box 1342, Boulder, CO 80306, or on CD-ROM from the present editor Arthur Robinson, $95, Box 1250, Cave Junction, OR 97523). Additionally, there is an excellent discussion in the February issue of AtE. If you have trouble with these, you may need to borrow a Saxon math book from one of your children. (If they attend public school, they are unlikely to have one; books from elementary texts through calculus can be ordered from the Thompson School Book Depository, PO Box 60160, Oklahoma City, OK 73146.)
Statistics are very useful for determining whether an unusual (perhaps disastrous) event is occurring, and also for evaluating the possible causes.
Here are a few questions to test your understanding:
1. What percentage of the population is average?
2. What is the significance of a value that is twice the average?
3. If 20 independent tests are performed on a perfectly healthy individual, what is the probability that at least one will be outside the normal range?
4. Is an agent that doubles the incidence of a certain cancer of greater public health concern than one that increases the incidence by only 50%?
5. If two entities are always (or frequently) found in association, can we conclude that one causes the other?
Radical reformers, who wish to impose disastrous changes on society, have to convince people that there is a catastrophic threat. Instead of stating actual numbers, which might not sound impressive, they may make a comparison with the average.
For a continuously varying quantity, it can be rigorously proved that exactly 0% of the population is precisely ``average.'' If the distribution is symmetric, like the familiar bell-shaped curve, half the population is above average and half is below. We need to know just how far above or below average a certain value is. And to say that it is ``twice'' the average is in itself meaningless, unless we know the shape of the probability distribution function. In other words, is the bell-shaped curve fat or skinny? Normal variability may be so great that 30% of the population has a value at least twice as high as the mean.
For example, consider Disease X, which normally affects 1 in 10,000 people. If we randomly select 10,000 people from the population, the probability that two or more of them will have Disease X (giving a prevalence at least twice the expected value) is 26%, by chance alone. (This is calculated from the binomial probability distribution function, described in any standard statistics text; see also ``Medical Poll-bearers and Statistical Malpractice'' by Jane M. Orient, J Med Assoc GA, Aug 1993, reprints available on request.)
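Readers with a computer can check the Disease X figure for themselves. The short Python sketch below sums the binomial probabilities of seeing zero or one case and subtracts from one; the function name `prob_at_least` is our own invention for illustration, and the only inputs are the two numbers given in the text.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - below

# Figures from the text: prevalence of 1 in 10,000, sample of 10,000 people.
n = 10_000
p = 1 / 10_000

chance_of_doubling = prob_at_least(2, n, p)
print(f"P(two or more cases) = {chance_of_doubling:.1%}")
```

The result should round to the 26% quoted above.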
Besides the normal range, one must know the error of the measurement. You cannot be sure that a temperature is 1 degree higher than the mean if you are using a thermometer with an accuracy of ± 0.5 degrees. Furthermore, you need to know the error of the mean, which depends on the number of measurements. For example, if you determine the IQ of only a dozen individuals, the mean may be far different from the true mean of the entire population. If you test 10,000 persons, you are far more confident of having a representative sample.
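To see how strongly the error of the mean depends on sample size, here is a rough Python sketch using the standard formula (standard deviation divided by the square root of the number of measurements). The IQ standard deviation of 15 points is our assumption for illustration, not a figure from this article, and the formula presumes independent, randomly selected individuals.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for n independent measurements."""
    return sd / sqrt(n)

ASSUMED_IQ_SD = 15.0  # assumption for illustration only

se_small = standard_error(ASSUMED_IQ_SD, 12)      # a dozen individuals
se_large = standard_error(ASSUMED_IQ_SD, 10_000)  # ten thousand individuals

print(f"n = 12:     standard error = {se_small:.2f} points")
print(f"n = 10,000: standard error = {se_large:.2f} points")
```

Under these assumptions the mean of a dozen scores is uncertain by several points, while the mean of 10,000 is pinned down to a small fraction of a point.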
The pitfalls of a small sample size should be obvious. Yet many highly paid ``decisionmakers'' easily forget this fact. They may, for example, decide to deny insurance coverage to patients of a physician who has a ``higher than average'' rate of Caesarian sections, based on a small and highly unusual group of patients.
The third question relates to the fallacy of ``fishing expeditions.'' If a plaintiff's lawyer wants to build a case against an occupational exposure, he can always find a disease that is present in ``statistically significant'' excess, provided that he looks at enough diseases. He will also find just as many with lower than normal incidence, but you can bet no one will claim that the exposure is protective!
The precise answer to question 3 (again from the binomial distribution function) is a startling 64%, if ``abnormal'' is defined, as is conventional, to be a value outside the range that contains 95% of healthy individuals (roughly two standard deviations on either side of the mean). Thus, the presence of a certain amount of abnormality is actually quite normal. (Because of this, large batteries of screening tests can be very lucrative: patients often need still more tests to check on the first set of results.)
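The arithmetic behind question 3 takes only a few lines of Python. This sketch assumes that each test's normal range is set to include 95% of healthy people (the usual convention, roughly two standard deviations either side of the mean) and that the 20 tests are truly independent; the variable names are ours.

```python
# Assumption: each "normal range" covers 95% of healthy individuals.
P_NORMAL_PER_TEST = 0.95
NUM_TESTS = 20

# All 20 independent tests come back normal with probability 0.95**20,
# so at least one is flagged abnormal with the complementary probability.
p_all_normal = P_NORMAL_PER_TEST ** NUM_TESTS
p_at_least_one_abnormal = 1 - p_all_normal

print(f"P(at least one abnormal result) = {p_at_least_one_abnormal:.0%}")
```

Changing `NUM_TESTS` shows how quickly the chance of a false alarm grows as the screening battery gets larger.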
If multiple comparisons are made in an experiment, the criterion for statistical significance must be more rigorous. And if an association ``turns up'' that was not part of the original hypothesis, it must be checked out in an experiment specifically designed for that purpose.
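One common way to make the criterion more rigorous is the Bonferroni correction, which divides the overall significance level by the number of comparisons. The article does not say which method should be used, so take the Python sketch below as one illustrative choice; the 5% level and the 20 comparisons are assumed values, and independence of the comparisons is assumed when computing the overall false-alarm rate.

```python
def bonferroni_threshold(alpha, num_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / num_comparisons

def familywise_error(per_test_alpha, num_comparisons):
    """Chance of at least one false positive among independent comparisons."""
    return 1 - (1 - per_test_alpha) ** num_comparisons

ALPHA = 0.05          # assumed overall significance level
NUM_COMPARISONS = 20  # assumed number of diseases examined

uncorrected = familywise_error(ALPHA, NUM_COMPARISONS)
per_test = bonferroni_threshold(ALPHA, NUM_COMPARISONS)
corrected = familywise_error(per_test, NUM_COMPARISONS)

print(f"Uncorrected chance of a false alarm: {uncorrected:.1%}")
print(f"Bonferroni per-comparison level:     {per_test:.4f}")
print(f"Corrected chance of a false alarm:   {corrected:.1%}")
```

With the stricter per-comparison level, the overall chance of a spurious ``significant'' finding stays just under the intended 5%.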
Another common error is to place more emphasis on a purported ``doubling'' of a rare disease such as brain cancer or leukemia and to downplay a smaller percentage increase in a common tumor such as breast cancer, especially if the proposed etiologies are more or less politically correct. A student research project: estimate the amount of ink, in column-inches per life at risk, devoted to cancers of various proposed causes.
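A little arithmetic shows why the percentage increase alone is misleading. In the Python sketch below, the baseline incidences are hypothetical round numbers chosen only for illustration (they are not actual cancer statistics); what matters is the number of extra cases per 100,000 people.

```python
def excess_cases(baseline_per_100k, relative_risk):
    """Extra cases per 100,000 people attributable to the exposure."""
    return baseline_per_100k * (relative_risk - 1)

# Hypothetical baseline incidences per 100,000 people per year.
RARE_BASELINE = 5.0      # a rare tumor (illustrative value)
COMMON_BASELINE = 100.0  # a common tumor (illustrative value)

excess_rare = excess_cases(RARE_BASELINE, 2.0)      # incidence "doubles"
excess_common = excess_cases(COMMON_BASELINE, 1.5)  # incidence rises 50%

print(f"Doubling the rare tumor:      {excess_rare:.0f} extra cases per 100,000")
print(f"50% more of the common tumor: {excess_common:.0f} extra cases per 100,000")
```

With these made-up baselines, the ``mere'' 50% increase accounts for ten times as many additional cases as the doubling.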
It is impossible to overemphasize the importance of the answer to question 5. Even if all the statistics are done right, a correlation does not prove cause and effect. A perfect correlation may result from two variables having a common cause. On the other hand, lack of correlation can disprove causation.
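A small simulation illustrates how a common cause can produce a strong correlation between two variables that have no effect on each other. The Python sketch below is entirely hypothetical: a hidden factor drives both `x` and `y`, each with its own independent noise, and the noise level of 0.5 and the sample size of 5,000 are arbitrary choices of ours.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# The hidden common cause; x and y never influence each other.
common = [rng.gauss(0, 1) for _ in range(5000)]
x = [c + rng.gauss(0, 0.5) for c in common]
y = [c + rng.gauss(0, 0.5) for c in common]

r = pearson_r(x, y)
print(f"Correlation between x and y: {r:.2f}")
```

The two variables come out strongly correlated even though neither one appears anywhere in the recipe for the other.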
How can the literate nonscientist acquire immunity to statistical disastermongering? When evaluating a scientific report, start with the following steps: (1) Look at the raw data, not just the ``averages'' and the conclusions. If there are few data, be wary. (2) Check the sample size, the method of selection (was it random, or biased?), and estimates of the error of measurement. (3) Consider possible uncontrolled factors that could influence the results. (If it's a cancer study, were the effects of smoking and age considered?) (4) Were multiple comparisons made? Was the significance criterion changed accordingly? (5) Is the result merely a correlation, or is there a plausible mechanism for causality? Are quantitative changes in the proposed cause (e.g. ozone level) related in the expected way to the effect (e.g. UV level, see p. 2)?