This phrase is often attributed to Mark Twain, though that attribution is disputed.
Statistics is a very powerful branch of mathematics. It allows you to see things that are not always obvious at a glance. It is also sort of complicated. Because I’m no longer a math nerd, having become a computer nerd, my go-to book whenever I actually have to do statistics is The Cartoon Guide to Statistics.
My favorite story from the book, one I repeat often, is about a small postgraduate business school. As part of its recruiting drive, the school reported that the average first-year salary for its MBA graduates was well above $150k.
This sounded wonderful. What the school failed to mention was that it got this mean by adding up the first-year salaries of all its graduates and dividing by the number of graduates, including the one graduate who went into the NBA with a multimillion-dollar first-year salary. Because the graduating class was small, that one salary drove the mean (average) far above the median (the middle value, with half the salaries above it and half below).
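Here is a minimal Python sketch of the effect, using the standard library’s `statistics` module. The class size and every salary below are hypothetical, made up for illustration; they are not the book’s figures.

```python
from statistics import mean, median

# Hypothetical first-year salaries for a small graduating class of ten:
# nine ordinary MBA salaries plus one made-up multimillion-dollar NBA contract.
salaries = [85_000, 90_000, 95_000, 100_000, 105_000,
            110_000, 115_000, 120_000, 125_000, 3_000_000]

class_mean = mean(salaries)             # sum / count: dragged upward by the outlier
class_median = median(salaries)         # middle value: barely moved by the outlier
mean_without_nba = mean(salaries[:-1])  # same class with the NBA player left out

print(f"mean:   ${class_mean:,.0f}")
print(f"median: ${class_median:,.0f}")
print(f"mean without the NBA salary: ${mean_without_nba:,.0f}")
```

With these made-up numbers, the school could truthfully advertise an average of $394,500 while the typical graduate earned about $107,500, and dropping the one NBA salary brings the mean down to $105,000.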
The school didn’t tell a lie, and it didn’t tell a damn lie; it just used statistics to lie for it.
Many measurements of natural phenomena follow what is called “the bell curve” (the normal distribution). A bell curve is defined by its mean (the sum of the samples divided by the number of samples) and its standard deviation (SD), which measures how spread out the samples are around the mean. The larger the standard deviation, the wider the bell; the smaller the standard deviation, the narrower the bell and the steeper its sides.
As shooters, we can see this when we measure the muzzle velocity of a round. Collect enough samples and you will be able to calculate the mean and standard deviation. If the standard deviation is small, then you know that most of your rounds have nearly the same velocity. If the SD is larger, there is more fluctuation in the velocity, and that will affect accuracy.
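A minimal sketch of that calculation in Python, assuming two hypothetical lots of chronograph readings in feet per second (the numbers are invented). `statistics.stdev` computes the sample standard deviation, and `statistics.NormalDist` gives the share of a normal (bell-curve) distribution that falls within two SDs of the mean.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical chronograph readings in feet per second (fps) for two lots of ammo.
lot_a = [2750, 2762, 2745, 2758, 2755, 2749, 2761, 2752]
lot_b = [2720, 2790, 2735, 2780, 2750, 2765, 2710, 2782]

mean_a, sd_a = mean(lot_a), stdev(lot_a)  # stdev() is the sample SD (divides by n - 1)
mean_b, sd_b = mean(lot_b), stdev(lot_b)

print(f"lot A: mean {mean_a:.1f} fps, SD {sd_a:.1f} fps")
print(f"lot B: mean {mean_b:.1f} fps, SD {sd_b:.1f} fps")

# For a normal (bell-curve) distribution, about 95% of values fall within 2 SD of the mean.
standard_normal = NormalDist()  # mean 0, SD 1
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"share within 2 SD: {within_two_sd:.3f}")
```

Both made-up lots average 2754 fps, but lot A’s SD is about 6 fps and lot B’s is about 30 fps, so lot B’s rounds are spread much more widely around the same average.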
When you want to verify that a result is significant and not just “noise,” you want that result to be at least two standard deviations away from the mean of the group you are comparing it against.
So suppose you have a population of 200 subjects: you give 100 of them a placebo and 100 of them a new medicine, and you want to compare the results. After your tests, you calculate the mean and SD of both groups. Let’s say the value you are collecting is weight loss or gain. If the placebo group loses 10 pounds on average with an SD of 3, you have a baseline. If the other group loses an average of 13 pounds, that sounds “good,” and it is, but it is not significant. The group would need to lose an average of 16 pounds (10 + 2 × 3) for it to be considered significant.
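The rule of thumb above reduces to simple arithmetic. The sketch below encodes it in Python; the function names are hypothetical. Note that this is the post’s rule of thumb, not a formal significance test: a two-sample t-test, for example, compares the difference between the group means to its standard error, which shrinks as the groups get larger, so it can reach a different verdict.

```python
def two_sd_threshold(baseline_mean: float, baseline_sd: float, k: float = 2.0) -> float:
    """Hypothetical helper: the mean a treated group must reach under the k-SD rule of thumb."""
    return baseline_mean + k * baseline_sd


def passes_rule_of_thumb(treated_mean: float, baseline_mean: float,
                         baseline_sd: float, k: float = 2.0) -> bool:
    # Weight loss is recorded as a positive number, so "better" means larger.
    return treated_mean >= two_sd_threshold(baseline_mean, baseline_sd, k)


threshold = two_sd_threshold(10, 3)    # placebo: 10 lb average loss, SD 3
print(threshold)                        # 16.0
print(passes_rule_of_thumb(13, 10, 3))  # False: 13 lb falls short of 16
print(passes_rule_of_thumb(16, 10, 3))  # True
```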
There is another issue with samples: the outlier. The NBA salary in our first-year salary example is an outlier. What if two people in our medicated group managed to lose not 16 pounds but 24 and 26 pounds? Those extra 8 and 10 pounds, spread across a sample of 100, move the mean by about 0.2 pounds (18 / 100 = 0.18). That could easily be the difference between meeting the two-standard-deviation threshold and not.
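The arithmetic can be checked with a quick Python sketch. The group below is hypothetical: 98 people who lost 13 pounds each, plus two people who lost either 16 pounds each or 24 and 26 pounds.

```python
from statistics import mean

# Hypothetical medicated group of 100: 98 people lost 13 lb each.
typical = [13.0] * 98
group_without_outliers = typical + [16.0, 16.0]  # last two lost 16 lb each
group_with_outliers = typical + [24.0, 26.0]     # last two lost 24 and 26 lb

shift = mean(group_with_outliers) - mean(group_without_outliers)
print(f"mean moves by {shift:.2f} lb")  # (8 + 10) / 100 = 0.18
```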
So we “throw out” the outliers. This can give you much better results.
And here we start to see some of the complications: when do you decide that a value is actually an outlier?
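There is no single right answer, but one widely used convention (not one the post or the book prescribes) is Tukey’s fences: flag anything more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. Below is a minimal Python sketch, assuming hypothetical weight-loss data and Python 3.8+ for `statistics.quantiles`; the helper names are made up for illustration. The 1.5 multiplier is itself a choice, which is exactly the complication described above.

```python
from statistics import quantiles


def tukey_fences(data, k=1.5):
    """Hypothetical helper: (low, high) cutoffs at k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(data, k=1.5):
    """Split data into values inside the fences and values outside them."""
    low, high = tukey_fences(data, k)
    kept = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return kept, outliers


# Hypothetical weight-loss results (lb) for a small group.
losses = [11, 12, 12, 13, 13, 13, 14, 14, 15, 24, 26]
kept, outliers = split_outliers(losses)
print(outliers)  # [24, 26]
```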
How to lie with statistics
The first way to lie with statistics is to measure the wrong thing. In gun rights we see this all the time. When crime statistics are compared, we often hear only about violent crimes involving guns. Is this the right measurement? There is a strong argument that it is not, and that the correct measurement is all violent crime. Thus we see in the UK that the rate of violent crime involving guns has dropped since guns were banned, but the overall violent crime rate has not.
In the same way, it matters how different groups define what is being measured. A commenter once explained to me that in the UK it is only a murder if somebody is convicted of the crime of murder; if there is no conviction, then it wasn’t murder. In the United States, murder is defined so that it does not require a conviction. If you compare murders reported in the UK with murders reported in the US, you are not comparing the same thing.
The same problem arises when you compare values within a single series over time. A good example is the Consumer Price Index (CPI). From memory, there was a point in time when the CPI was going up faster than the government wanted it to. The CPI is based on a “basket of goods and services.” Included in that basket were steak and other high-quality goods that people purchased regularly. The government decided that the basket was no longer representative because the price of steak had gone up so much that people weren’t buying steak as often, so it replaced steak with hamburger “because that’s what the people are buying.”
From a statistical point of view, this means that comparing CPI values from before the change with values from after it really doesn’t work.
By changing the definition they lied with statistics.
In the gun rights world we see this with the definitions of “mass shooting” and “school shooting.” The FBI has defined both. Fortunately, it turns out that there are very few mass shootings and fewer still mass school shootings. By changing the definitions, you can increase the reported number of mass shootings.
So instead of “4 or more killed excluding the shooter,” we end up with the much more inclusive definition of “4 or more people killed or wounded including the shooter.”
The FBI excludes gang violence. The fearmonger definition does not. This is why we hear claims of “more mass shootings than days.” If there were truly a mass shooting every day, we would be hearing about it nonstop. Instead, most of the shootings counted are gang shootings, and they don’t make the news.
They lied with statistics by changing the definition of mass shooting to include many more events, events that most people would not consider mass shootings.
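To see how much a definition alone can move a count, here is a Python sketch that applies both definitions, as the post paraphrases them, to the same list of incidents. The `Incident` fields and every incident in the list are hypothetical, and neither function is the FBI’s or any tracker’s official wording.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    killed: int        # victims killed, not counting the shooter
    wounded: int       # victims wounded, not counting the shooter
    shooter_hit: bool  # shooter killed or wounded
    gang_related: bool


def narrow_definition(i: Incident) -> bool:
    # Post's paraphrase: 4 or more killed excluding the shooter; gang violence excluded.
    return i.killed >= 4 and not i.gang_related


def broad_definition(i: Incident) -> bool:
    # Post's paraphrase: 4 or more people killed or wounded, including the shooter.
    return i.killed + i.wounded + int(i.shooter_hit) >= 4


# Hypothetical incidents, invented for illustration.
incidents = [
    Incident(killed=5, wounded=3, shooter_hit=True, gang_related=False),
    Incident(killed=0, wounded=4, shooter_hit=False, gang_related=True),
    Incident(killed=1, wounded=2, shooter_hit=True, gang_related=False),
    Incident(killed=0, wounded=2, shooter_hit=False, gang_related=False),
    Incident(killed=2, wounded=5, shooter_hit=False, gang_related=True),
]

narrow_count = sum(narrow_definition(i) for i in incidents)
broad_count = sum(broad_definition(i) for i in incidents)
print(narrow_count, broad_count)  # 1 4
```

With the same five made-up incidents, the narrow definition counts one mass shooting and the broad definition counts four. Nothing about the underlying events changed, only the rule used to count them.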
The fearmongers have redefined “school shooting” in a similar way. Instead of a shooter entering a school and shooting students and staff, they use “a gun was fired on school property.” That’s how the count of school shootings ends up including a man who committed suicide in a school parking lot, drug deals gone wrong on school property after hours and after dark, and a shooting in a school bus parking lot after midnight. Yes, all of these took place on school property, but most of them did not involve students actually in school.
Another example of the definition game is defensive gun use (DGU). The article that Miguel posted on July 25, 2022, uses a definition of “killing the suspect/attacker/criminal.” This definition ignores merely wounding the animal. It ignores presenting the weapon and having that alone stop the criminal act. It ignores all the other ways a DGU happens where nobody ends up dead.
One last definition game: conflating suicide, justifiable homicide, and murder under the term “gun violence.”
Next on our list of ways to lie with statistics is sample selection. The advertisements used to say “4 out of 5 dentists recommend Crest toothpaste.” That is a cool trick. How many dentists did they sample? Did they sample five and stop at the first one who said something else? Or did they sample a few thousand?
How did they select that sample? Did they send dentists free “patient cleaning kits” with small tubes of Crest and include a survey form? Did they call dentists who had gotten samples? Or did they call a large, random sample of dentists?
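Sample size alone changes how much a “4 out of 5” result is worth. Here is a rough Python sketch using the normal-approximation margin of error for a proportion, `1.96 * sqrt(p * (1 - p) / n)`, at about 95% confidence; the helper name is hypothetical, and the approximation is crude for samples as small as five. A margin of error only covers random sampling error. It says nothing about the selection bias described above: a biased sample of thousands can still be badly wrong.

```python
from math import sqrt


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Hypothetical helper: approximate 95% margin of error for a sample proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# The same hypothetical "4 out of 5" result (80%) from a tiny sample and a large one.
moe_small = margin_of_error(0.8, 5)
moe_large = margin_of_error(0.8, 2000)

print(f"n=5:    80% +/- {moe_small * 100:.1f} percentage points")
print(f"n=2000: 80% +/- {moe_large * 100:.1f} percentage points")
```

Five dentists give roughly plus or minus 35 percentage points around 80%, while two thousand give under 2 points.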
This is how you can ask the same survey question on two different platforms and get completely different results. If you asked “Which is better: hardening schools by allowing teachers and staff to carry, or adding metal detectors at the entrances?” here and on Daily Kos, you would get very different answers. The same goes for asking Fox viewers versus CNN viewers: you get vastly different answers because your sample selection is vastly different.
In polling, which questions are asked, and how they are worded, is also an issue.
All of these lead to answers, backed by statistics, that are complete lies.
Never, ever trust a statistic without knowing what was measured, how it was measured, what the error rate is, and what the standard deviation is.
Why do they think we are so stupid that we can’t see what they are doing?
Comments
10 responses to “Lies, Damn Lies, and Statistics”
“Why do they think we are so stupid that we can’t see what they are doing?”
A significant enough portion of the population does not see what they are doing, and they believe the statistical lie.
Another twist, not necessarily a statistical lie but wordplay: altering definitions.
The gun control crowd will say a civilian has never stopped a mass shooting. Never.
They are not wrong. If a mass shooting is four or more dead, and a civilian stops the shooter before there are four dead, the civilian did not stop a mass shooting. (You cannot prove the shooter was not going to stop after killing three.) If the civilian does not stop the shooter before four are killed, the civilian did not stop a mass shooting either.
I think I’ve said that exact same thing in other places. If a good guy with a gun stops a potential mass shooting before four people (excluding the bad guy) are dead, then the good guy didn’t stop a mass shooting, because there was no mass shooting. If the good guy stops the bad guy after there are four dead, then the good guy didn’t really stop a mass shooting, because it had already taken place.
If they have to lie and cheat to support their arguments, then it isn’t a very good argument.
“Why do they think we are so stupid that we can’t see what they are doing?”
Because most don’t. They simply ‘trust the science.’
Why do they think we are so stupid that we can’t see what they are doing? Simply because they DO think we are stupid. Look at the attitude of many media “commontaters,” the condescending sneer. I saw a blurb from Fox News: 85% of Americans have zero trust in the media. And the “media” keeps right on spewing the same lies…
They KNOW the general population is f-ing stupid. Why? Because they continue to vote demoncrats into office. And, please, don’t anyone go down the “well, demoncrats cheat” BS. Yes, they cheat, but not that much for that many decades.
Romney was right in 2012. 47% of the population was not going to vote for him no matter what he said or did.
As to the cheating, it certainly goes on; there is no way the Daleys could have been elected and re-elected Chicago mayor without cheating. But nationwide? No, there are not enough cheating votes, and the cities where they would make a difference are already covered. The Democrat agenda will move forward even if the mayor of Allentown, PA, or Eugene, OR, or Enid, OK, is a Republican. As long as NYC, Chicago, Baltimore, DC, Atlanta, etc. are all solidly under Democrat control, there is no reason to cheat to get Wasilla, AK, solidly blue.
CB, the national vote doesn’t count, given the electoral college. You’re right that NYC is solid blue, but that just means NY votes D in presidential elections by a substantial margin. Similarly, there’s not much point in worrying about election integrity in Wyoming. It’s the “swing states” that matter, and that’s where you have to watch for trickery.
What CBMTTek said.
This play on definitions (leading to the lie that “civilians never stop mass shootings”) is a false Catch-22 that completely ignores the actual statistics. You’ve certainly seen the meme that says “Average mass shooting casualties when stopped by police: 22. Average mass shooting casualties when stopped by armed citizens: 1.9.” (or something similar). That is based on actual numbers and actual mass shooting attacks, but I believe it’s just as cherry-picked a sample as anything the anti-gun statistical-liars would pull. (Among other things, many of the worst mass shootings — Parkland, FL, and Newtown, CT, come to mind — weren’t stopped by either police or armed citizens — the killers stopped themselves or committed suicide — but I’d bet they were counted in the “stopped by police” bucket because police were present … and the fact they nudge that number higher doesn’t hurt. “Killer stopped himself” should be its own third bucket.) It’s just designed to tell the opposite narrative.
Speaking of, AWA, cherry-picking data points is another way to lie with statistics. Excluding data points is acceptable, but only with a good reason; the NBA accountant story could be excluded as an unrealistic outlier, or in drug trial results in which they discovered some testers have underlying conditions that affect how the drug works, those subjects can be left out of the results. Excluding data points merely because they go against your chosen narrative is NOT acceptable … but people do it anyway.
The opposite practice — including irrelevant or non-existent data points — also happens. (See above “killer stopped himself” example, which doesn’t really fit in either of the two published groups, but could be forced into one to drive the averages up.) I like to call this “ground-cherry-picking”; a ground cherry is a bush- or vine-growing fruit, not at all related to cherries (it’s more closely related to a tomatillo, I think), but an unscrupulous person could artificially inflate cherry numbers by including ground cherries.
My statistics teacher always said, “The numbers don’t lie; they just get misrepresented.” (IOW, the numbers say what they say, but the person telling you what they say isn’t always telling it correctly.) I agree, but intentional “misrepresentation” is a lie, and it happens all the time.
You’re right; I totally left out cherry-picking. The post was already long, and I totally forgot. It was on my mental list of points to bring up.
Ground cherries are yummy. We don’t intentionally grow them but we do let them grow where we find them.
We intentionally grew ground cherries once … and then couldn’t get rid of them. I like them, so I didn’t mind letting a few bushes grow; the wife, not so much. LOL!