Maths encyclopedia and lessons  

Misuse of statistics

"The pure and simple truth is rarely pure and never simple." Oscar Wilde

"First get your facts; then you can distort them at your leisure." Mark Twain

"There are three kinds of lies: lies, damn lies, and statistics." Benjamin Disraeli

"Then there was the man who drowned crossing a stream with an average depth of six inches." W. I. E. Gates

Methods of statistics misuse

Statistics can be misused in many ways other than by deliberate deception.

Even professional scientists, including mathematicians and statisticians themselves, can be fooled by some simple pitfalls, even when they are careful to check everything.

Some scientists have accidentally fooled themselves with statistics through a lack of knowledge of probability and a lack of standardisation in their tests. The false-statistics trap is one of the most damaging to the quest for knowledge, especially medical knowledge, where correcting the damage done by a bad statistic can take decades.

Misuse of statistics by being selective

In marketing terms, all you have to do to promote a neutral (useless) product is to find, say, 20 studies conducted at a 95% confidence level: by chance alone, about one of them can be expected to come out favorable to the product or idea being promoted. Ignore the 19 results that are not favorable and promote endlessly the one study that says your product or idea is good.

If you are lucky enough to be in a field where 300 studies exist, you can cite the 10 most positive and ignore the rest; you will be able to fool customers into thinking the product is widely accepted as safe and effective. You can even claim that peer review has taken place and was successful!
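
As a rough check on the "one in 20" figure, here is a minimal Python sketch. It assumes the 20 studies are independent, that the product has no real effect, and that a study counts as "favourable" when it passes a one-sided test at the 5% level (the flip side of 95% confidence). The constants and function names are illustrative only.

```python
import random

ALPHA = 0.05    # false-positive rate implied by a 95% confidence level
N_STUDIES = 20  # illustrative number of independent studies


def p_at_least_one_favourable(alpha: float, n: int) -> float:
    """Exact probability that at least one of n independent studies of a
    product with no real effect comes out favourable by chance alone."""
    return 1 - (1 - alpha) ** n


def simulate(alpha: float, n: int, trials: int, seed: int = 1) -> float:
    """Monte Carlo check: rerun the batch of n null studies many times and
    report how often at least one of them is a false positive."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each null study is "favourable" with probability alpha.
        if any(rng.random() < alpha for _ in range(n)):
            hits += 1
    return hits / trials


print(f"expected favourable studies: {ALPHA * N_STUDIES:.1f}")  # 1.0
print(f"P(at least one favourable):  {p_at_least_one_favourable(ALPHA, N_STUDIES):.3f}")  # 0.642
print(f"simulated estimate:          {simulate(ALPHA, N_STUDIES, 100_000):.3f}")
```

On average only one study in 20 comes out favourable, but the chance that at least one does is about 64%, which is all the marketer needs.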
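
The effect of citing only the most positive results can be simulated. In this Python sketch (the model and constants are assumptions for illustration) each of 300 studies of a product with no real effect reports an estimate that is pure sampling noise, measured in standard-error units; the 10 largest estimates are then "cited".

```python
import random
import statistics

N_STUDIES = 300    # hypothetical size of the literature
N_CITED = 10       # studies the marketer chooses to cite
TRUE_EFFECT = 0.0  # the product does nothing


def study_estimates(n: int, seed: int = 2) -> list[float]:
    """Assumed model: each study reports the true effect plus sampling noise,
    expressed in standard-error units (so 1.96 is the 95% threshold)."""
    rng = random.Random(seed)
    return [TRUE_EFFECT + rng.gauss(0.0, 1.0) for _ in range(n)]


estimates = study_estimates(N_STUDIES)
cited = sorted(estimates, reverse=True)[:N_CITED]  # keep only the most positive

mean_all = statistics.fmean(estimates)
mean_cited = statistics.fmean(cited)
significant_cited = sum(z > 1.96 for z in cited)

print(f"mean effect, all {N_STUDIES} studies: {mean_all:+.2f}")
print(f"mean effect, {N_CITED} cited studies: {mean_cited:+.2f}")
print(f"cited studies past the 95% threshold: {significant_cited}/{N_CITED}")
```

The full set of studies averages close to zero, while the hand-picked ten sit well above it, and typically several of them clear the conventional 1.96 threshold for significance.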

Organisations that do not publish every study they carry out, such as tobacco companies denying a link between smoking and cancer, or miracle pill vendors, are likely to use this tactic.

Choosing the question to get a certain answer

If you give a questionnaire on politics, you can easily influence the answers by the way you phrase the question. You can also precede the question with 10 other questions that make the respondent "accidentally" aware of 10 issues that seem unfavourable to your candidate, or simply omit the third-party view from the questionnaire. This will lead many undecided would-be third-party voters to take the side you favour.

Misapplying

If you have a statistic saying that 100% of apples are red in summer and then publish "All apples are red", you will be lying by omission, and many people will remember seeing green apples in the springtime.

You can also say on TV "All apples are red in summer", and many people asked weeks later will not remember that you said "in summer".

On a subject of which the general public has no personal knowledge, you can fool a lot of people. For example, you can say on TV "Most autistics are hopelessly incurable", and people will believe the fallacy and forget the subtitle or details that say "if raised without parents or normal education". The main part of the claim will be remembered, however false it may be by itself, and the details will be forgotten. This is especially prevalent on TV, where talk show hosts interview one individual as representative of a whole class of people.

Biased samples

See bias (statistics).

If you survey people by home phone on the subject of homelessness, you will get a homelessness rate near zero, because homeless people don't have home phones!

Biased samples like that are often used to determine statistics about incomes.

Television companies have to satisfy themselves of the truth of any claims made in the advertisements they show. A company making cat food claimed that 8 out of 10 people questioned in a supermarket said their cats preferred Catex (a fictitious brand name used here), and satisfied the TV company that its data were true. What it omitted to mention was that its researchers had questioned only customers seen taking Catex from the shelves.
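
Selecting respondents on the very behaviour being measured can turn a minority preference into an "8 out of 10" headline. The following Python sketch uses made-up numbers (30% of cats preferring Catex, plus assumed probabilities that owners take Catex from the shelf) purely to show the mechanism; none of these figures come from the example above.

```python
import random

# Hypothetical numbers, chosen only for illustration:
P_PREFER = 0.30          # share of cats that actually prefer Catex
P_TAKE_IF_PREFER = 0.90  # owner of such a cat takes Catex from the shelf
P_TAKE_IF_NOT = 0.10     # other owners take Catex anyway (offer, habit, ...)


def shopper(rng: random.Random) -> tuple[bool, bool]:
    """Return (cat_prefers_catex, took_catex) for one simulated shopper."""
    prefers = rng.random() < P_PREFER
    took = rng.random() < (P_TAKE_IF_PREFER if prefers else P_TAKE_IF_NOT)
    return prefers, took


def survey(n: int, only_catex_takers: bool, seed: int = 3) -> float:
    """Share of n questioned shoppers who say their cat prefers Catex."""
    rng = random.Random(seed)
    answers = []
    while len(answers) < n:
        prefers, took = shopper(rng)
        if only_catex_takers and not took:
            continue  # the biased researcher never approaches this shopper
        answers.append(prefers)
    return sum(answers) / n


# Exact value for the biased survey, by Bayes' rule:
# P(prefer | took) = 0.3*0.9 / (0.3*0.9 + 0.7*0.1) = 0.27 / 0.34
biased_exact = (P_PREFER * P_TAKE_IF_PREFER) / (
    P_PREFER * P_TAKE_IF_PREFER + (1 - P_PREFER) * P_TAKE_IF_NOT
)

print(f"random shoppers:    {survey(10_000, False):.2f}")
print(f"Catex takers only:  {survey(10_000, True):.2f}")
print(f"Bayes, takers only: {biased_exact:.2f}")  # 0.79
```

With these assumptions a random sample reports about 30%, while questioning only Catex takers reports about 79%, close to "8 out of 10", without anyone giving a false answer. The home-phone example is the same kind of selection effect taken to the extreme.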

Misreporting or misunderstanding of estimated error

If a research team wants to know how 300 million people feel about a certain topic, it would be impractical to ask all of them. However, if the team picks a random sample of about 1000 people, they can be fairly certain that the results given by this group are representative of what the larger group would have said if they had all been asked.

This confidence can be quantified using the central limit theorem and related results, but the measurement has two parts. It is expressed as a probability that the true result (for the larger group) lies within a certain range of the estimate (the figure for the smaller group). The range is the "plus or minus" figure often quoted for statistical surveys. The probability part, the confidence level, is usually not mentioned; when it is omitted, it is assumed to be a standard figure such as 95%.
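
For a proportion estimated from a simple random sample, the usual normal-approximation margin of error is z * sqrt(p(1 - p)/n), where z is the critical value for the chosen confidence level. A minimal Python sketch (the function name is illustrative; p = 0.5 is the conservative worst case):

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    Assumes a simple random sample; p = 0.5 gives the largest margin.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)


print(f"n = 1000, 95%: +/- {100 * margin_of_error(1000):.1f}%")  # +/- 3.1%
```

For about 1000 respondents this gives roughly +/- 3.1% at 95% confidence. The population size does not appear in the formula: as long as the population is much larger than the sample, 300 million people need no bigger sample than 3 million would.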

The two numbers are related. If a survey has an estimated error of +/- 5% at 95% confidence, it has an estimated error of about +/- 6.6% at 99% confidence. The larger the sample, the smaller the estimated error at a given confidence level.

Because the confidence figure is usually omitted, most people assume that the true result is certain to lie within the estimated error. This is not mathematically correct: at 95% confidence, roughly one survey in 20 will miss the true value by more than the stated error.
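
Under the normal approximation, the margin for a given survey is proportional to the critical value z, and at a fixed confidence level it shrinks like 1/sqrt(n). The Python sketch below (helper names hypothetical) applies both rules to a +/- 5% margin.

```python
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)  # two-sided 95%: about 1.960
z99 = NormalDist().inv_cdf(0.995)  # two-sided 99%: about 2.576


def rescale_margin(margin: float, z_from: float, z_to: float) -> float:
    """Same survey, same data: the margin scales with the z critical value."""
    return margin * z_to / z_from


def margin_for_larger_sample(margin: float, n_old: int, n_new: int) -> float:
    """At a fixed confidence level the margin shrinks like 1/sqrt(n)."""
    return margin * (n_old / n_new) ** 0.5


print(f"+/- 5% at 95% -> +/- {rescale_margin(5.0, z95, z99):.1f}% at 99%")  # 6.6
print(f"4x the sample -> +/- {margin_for_larger_sample(5.0, 1000, 4000):.1f}%")  # 2.5
```

Going from 95% to 99% confidence multiplies the margin by about 2.576/1.960, roughly 1.31, giving about +/- 6.6%; quadrupling the sample halves the margin.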
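
What "95% confidence" promises can be checked by repeating the same poll many times. This Python sketch assumes a true proportion of 0.5 and simple random samples of 1000, and counts how often the usual interval p_hat +/- 1.96 * sqrt(p_hat(1 - p_hat)/n) actually contains the true value.

```python
import random
from math import sqrt

Z95 = 1.96


def poll(n: int, true_p: float, rng: random.Random) -> float:
    """One simple-random-sample poll: the observed share of 'yes' answers."""
    return sum(rng.random() < true_p for _ in range(n)) / n


def coverage(true_p: float, n: int, repeats: int, seed: int = 4) -> float:
    """Fraction of repeated polls whose 95% interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        p_hat = poll(n, true_p, rng)
        margin = Z95 * sqrt(p_hat * (1 - p_hat) / n)
        if abs(p_hat - true_p) <= margin:
            hits += 1
    return hits / repeats


rate = coverage(true_p=0.5, n=1000, repeats=2000)
print(f"intervals containing the truth: {rate:.1%}")  # close to 95%, not 100%
```

About 95% of the intervals contain the truth, so roughly one poll in 20 misses it, even though every one of them was carried out correctly.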

Many people may not realize that the randomness of the sample is very important. In practice, many opinion polls are conducted by phone, which distorts the sample in several ways: it excludes people who do not have phones, favors people who have more than one phone, and favors people who are willing to take part in a phone survey over those who refuse. Non-random sampling makes the estimated error unreliable.
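
The margin of error only measures random sampling noise; it says nothing about a sample drawn from the wrong population. The Python sketch below uses a hypothetical population in which 20% of people have no phone and hold a very different opinion, and compares the expected result of a phone-only poll (ignoring sampling noise) with the true value.

```python
from math import sqrt

# Hypothetical population, for illustration only:
SHARE_WITH_PHONE = 0.80
SUPPORT_WITH_PHONE = 0.58     # opinion among people a phone poll can reach
SUPPORT_WITHOUT_PHONE = 0.18  # opinion among people it cannot reach

true_support = (SHARE_WITH_PHONE * SUPPORT_WITH_PHONE
                + (1 - SHARE_WITH_PHONE) * SUPPORT_WITHOUT_PHONE)

# A phone-only poll estimates the phone owners' opinion, however large n is.
n = 1000
phone_estimate = SUPPORT_WITH_PHONE
stated_margin = 1.96 * sqrt(phone_estimate * (1 - phone_estimate) / n)
bias = phone_estimate - true_support

print(f"true support:   {true_support:.1%}")  # 50.0%
print(f"phone poll:     {phone_estimate:.1%} +/- {stated_margin:.1%}")
print(f"bias vs margin: {bias:.1%} vs {stated_margin:.1%}")
```

Here the bias (8 points) is more than twice the stated margin of about 3 points, and a larger sample would shrink the margin without reducing the bias at all.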

The problems mentioned above apply to all statistical experiments, not just population surveys.

There are also many other measurement problems in population surveys.

False causality

If the number of people buying ice cream at the beach is statistically related to the number of people who drown at the beach, nobody would claim that ice cream causes drowning, because it is obvious that it does not (any more than drowning causes ice cream buying). Both are more plausibly driven by a third factor, such as hot weather bringing more people to the beach.

Replace "ice cream" with the technical name of a chemical the public is not familiar with, and "drowning" with cancer, and you can immediately have many people believe you.

A more common use of this fallacy is to replace "ice cream" with some exotic-sounding foreign plant, and "drowning" with a cure for common ailments that are vague and intermittent in nature. This makes it very hard and expensive to prove that the product doesn't cure them. Many useless or bad products are marketed this way in the alternative health industry.

See Correlation implies causation (logical fallacy).

Discarding data

If a product is shown to be inferior in all respects except for one very narrow range where it is superior, the published statistics will be about that narrow range and the one respect in which the product is superior.

This is often done by discovering an islet of superiority in a large study that is never published, followed by a few studies showing only a marginal superiority within that narrow range, on one aspect of the product.

Such follow-up studies are replicable and contain no outright lie (though occasionally the islet of superiority itself arose by sheer luck). Discarding the unfavorable data from the first study, however, is lying by omission. Even the people replicating the later studies will be fooled!

Being blind to the flaws in studies

Governments are often unable to get elected without receiving enormous donations from companies, so they tend to look the other way when scientists point out pollution, carcinogenic products, price fixing and other wrongdoing detectable with the available statistics.

Such a government will delay verification and dismiss the issue as a mere technicality. The delay usually lasts close to forever.

Such tactics are often used when a country decides on its four (or five) food groups and the suggested portions of each, in order to sell the surplus food the country overproduces. Some of the food might have only marginal or dubious nutritional value, or a single product might be the only one listed as providing some specific benefit, while alternatives are left off the food-group flyers.

The hard disk failure and the secret excuse

Some people who have reached a conclusion from some data may decide to keep that data secret, or declare it "accidentally lost due to computer problems". This is done so that no revision is possible. It happens with student projects of modest or dubious value, where a passing grade is at stake, but, more importantly, it is the policy of some government entities.

Here is a review of a dubious fluoridation study [1]

Regardless of the validity of any other study or of fluoridation itself, this one study is rife with flaws, and its data will not be shared with reviewers; only some selected graphics and conclusions will be shown.

Some pro-fluoridationists will still quote that one bad study as if it were good, an obvious misuse of a deeply flawed statistic.

Some anti-fluoridationists will claim that the poor quality of this one study proves fluorides are toxic and that there is a widespread cover-up. This is a somewhat more subtle misuse of a statistic.

Applying group statistics to individuals

See Ecological fallacy.

Statistical sleight of hand

It is important not to overlook caveats.

See Statistical special pleading.

01-04-2007 01:18:14
The contents of this article are licensed from Wikipedia.org
under the GNU Free Documentation License.