Samples and universe: what you need to know

In the early hours of Monday, September 8, we will know "exactly" the results of the legislative elections in the province of Buenos Aires. But there are anxious people who want to know the names of the winners and losers "right now". Not trusting their hunches, they consult pollsters, who base their statements on samples. Perhaps out of ignorance, but most likely under pressure from their clients and from journalists, pollsters publish point estimates (and with decimals!), when they should publish ranges, to reflect the inevitable error implicit in any sample. How should we interpret the pollsters' findings?
In this regard, I consulted the Frenchman Augustin Louis Cauchy (1789-1857), who was persuaded by Pierre-Simon Laplace and Joseph-Louis Lagrange, friends of his father, to dedicate himself to mathematics. He studied at the École Polytechnique in Paris and at the École des Ponts et Chaussées. In 1830, he went into exile in Turin and Prague, working as a tutor. He returned to Paris eight years later and from then on taught at the Sorbonne. His complete works were published in 27 volumes. His name is attached to a condition, a conjecture, an inequality, a law, a problem, a sequence, and a theorem.
–A probability distribution is associated with your last name.
–That's right. A probability distribution is a function that associates each possible value of a given variable with its corresponding probability. The normal curve invented by Carl Friedrich Gauss, also known as the bell curve, is so popular that many people mistake the brand for the product. It's the advertisers' ideal: consumers say Geniol instead of painkiller, or Xerox instead of photocopy. There are also the binomial, uniform, Poisson (after Siméon Denis Poisson), and Laplace probability distributions, among others.
–What characterizes yours?
–It has a shape similar to the normal distribution, but with longer and thicker tails. It is primarily used to illustrate pathological situations, such as those described by Nassim Nicholas Taleb in The Black Swan.
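To put numbers on "longer and thicker tails", here is a minimal sketch comparing how much probability a standard normal and a standard Cauchy distribution leave beyond a given distance from the center. The function names are illustrative, and both distributions are taken in their standard form (center 0, scale 1), which is an assumption made only for this comparison.

```python
import math

def normal_two_sided_tail(x: float) -> float:
    """P(|Z| > x) for a standard normal variable."""
    return math.erfc(x / math.sqrt(2))

def cauchy_two_sided_tail(x: float) -> float:
    """P(|C| > x) for a standard Cauchy variable (location 0, scale 1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(x)

# Compare the probability of landing far from the center.
for x in (2, 5, 10):
    print(f"beyond {x}: normal={normal_two_sided_tail(x):.2e}, "
          f"cauchy={cauchy_two_sided_tail(x):.2e}")
```

At every distance the Cauchy tail is larger, and the gap widens quickly as the distance grows, which is why extreme values are far less surprising under a Cauchy distribution than under a normal one.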
–Back to business. What do you have to say to those who are constantly biting their nails because they can't wait for the results of the polls?
–Let's distinguish between the universe and samples. A universe is a totality: for example, all Vélez Sarsfield fans. A sample is a portion of the universe: for example, all members of the aforementioned club. I use the plural because there can be many samples of the same universe. In the case of the province of Buenos Aires, the result for the universe will only be known when all the votes are counted.
–What can anxious people do?
–Run a survey that questions every voter, and pray that they tell the pollster the truth and don't change their voting intention between the time they answer the survey and the time they cast their vote.
–This is clearly very expensive.
–That's precisely why samples were invented, a subject on which there is much theory and also much experience. In the case of voting, consulting the universe is extremely costly. In other cases, it's completely counterproductive. Imagine if, to be sure of the quality of your blood, the doctor drew not a small sample but all the blood in your body. The diagnosis would be free of sampling error, but you would die.
–Sampling error, what are you talking about?
–I mean that, even if the sample is drawn randomly and stratified, there is no absolute certainty that the value obtained in the sample matches the respective value in the universe.
–And then?
–The values obtained in the samples should not be published as single numbers, much less with decimals, but rather as ranges. For example: the pollster should not say that candidate X's voting intention is 32.8%, but rather that it is between, say, 30% and 34%, with a Y% sampling error.
–How is sampling error estimated?
–From the sample size, and also from the probability distribution believed to hold in the universe. This way of presenting the results may be less striking, but it's more appropriate. Incidentally, when, after an election, the media congratulate the pollster who got it right, he or she knows inwardly, even without saying so publicly, that a significant element of chance was involved.
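As an illustration of how sample size turns into a range, here is a minimal sketch using the textbook normal approximation for a proportion. It assumes simple random sampling, a hypothetical sample of 1,000 respondents, and the conventional multiplier z = 1.96 (roughly 95% confidence); none of these figures comes from an actual poll, and real surveys with stratification or weighting need adjusted formulas.

```python
import math

def proportion_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate range for a share estimated from a simple random sample.

    p_hat: share observed in the sample (0.328 means 32.8%)
    n:     number of respondents (hypothetical here)
    z:     normal multiplier; 1.96 corresponds to roughly 95% confidence
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    # Clamp so the range never leaves the interval from 0% to 100%.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

low, high = proportion_interval(0.328, 1000)
print(f"between {low:.1%} and {high:.1%}")
```

Under these assumptions the 32.8% figure becomes a range of about three percentage points on either side, and quadrupling the sample only halves that margin.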
–Sampling error is inevitable.
–Yes. The important thing is to understand the trade-off between the precision demanded of the sample estimate and the corresponding sampling error. Anyone who demands a precise estimate, even down to decimals (like the aforementioned 32.8%), must know that the sampling error will be extremely high. On the other hand, a pollster who says that voting intentions for a given candidate are between 0% and 100% is certainly not going to be wrong, but that result is useless.
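The tension between precision and error can be sketched numerically. The snippet below asks: for a fixed sample, how confident can one be that the true value lies within a given distance of the estimate? It assumes simple random sampling, the normal approximation, and a hypothetical sample of 1,000 respondents; the function name is illustrative.

```python
import math
from statistics import NormalDist

def confidence_for_half_width(p_hat: float, n: int, half_width: float) -> float:
    """Approximate probability that the true share lies within
    p_hat +/- half_width, under the normal approximation."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    z = half_width / standard_error
    return 2 * NormalDist().cdf(z) - 1

# Hypothetical poll: 32.8% observed among 1,000 respondents.
for half_width in (0.0005, 0.02, 0.05, 0.70):
    conf = confidence_for_half_width(0.328, 1000, half_width)
    print(f"+/- {half_width:.2%}: confidence about {conf:.1%}")
```

Claiming the figure to the nearest decimal leaves only a few percent of confidence, a range of two points either side reaches roughly four chances in five, and a range wide enough to cover almost everything is practically certain and practically useless.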
–In the electoral case, the problem is worse, because we want to know which candidate is going to beat which other.
–Indeed. The professionally responsible pollster has to say that candidate J's voting intention is between 40% and 44%, while candidate K's is between 39% and 43%; all with a sampling error of X%. And therefore, based on the survey, there is no way to plausibly predict the outcome of the election.
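The reasoning in that answer amounts to checking whether two ranges overlap. A minimal sketch, using the figures from the example (the function name is illustrative):

```python
def ranges_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed ranges (low, high) share at least one value."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high

candidate_j = (40.0, 44.0)  # range for candidate J, in percent
candidate_k = (39.0, 43.0)  # range for candidate K, in percent

if ranges_overlap(candidate_j, candidate_k):
    print("The ranges overlap: the poll cannot separate the candidates.")
else:
    print("The ranges do not overlap: the poll suggests a leader.")
```

Overlap is a quick rule of thumb rather than a formal test; a careful analysis would look at the uncertainty of the difference between the two shares.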
–It would be different if voting intentions were very different in the universe, for example, if one candidate had 85% of the votes and the other, the remaining 15%.
–Of course, because in this case, even a technically flawed poll would very likely predict the final result. This doesn't seem to be the case in the province of Buenos Aires between the candidates of the Frente La Libertad Avanza (Freedom Advances Front) and Fuerza Patria (Force Homeland).
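To see why a lopsided race is easy to call and a close one is not, here is a minimal sketch of the chance that a poll names the true leader. It assumes a race between exactly two candidates, simple random sampling, and the normal approximation; the shares and sample sizes are hypothetical, and a biased poll (as opposed to a merely small one) is not modeled.

```python
import math
from statistics import NormalDist

def prob_poll_names_true_leader(true_share: float, n: int) -> float:
    """Chance that the sample share of the real leader exceeds 50%.

    true_share: the leader's real share in the universe (above 0.5)
    n:          number of respondents
    """
    standard_error = math.sqrt(true_share * (1 - true_share) / n)
    return 1 - NormalDist().cdf((0.5 - true_share) / standard_error)

# A landslide polled with only 100 respondents versus
# a tight race polled with 1,000 respondents.
print(f"85/15 race, n=100:  {prob_poll_names_true_leader(0.85, 100):.1%}")
print(f"51/49 race, n=1000: {prob_poll_names_true_leader(0.51, 1000):.1%}")
```

Under these assumptions, even a tiny sample almost never misses the landslide, while a sample ten times larger still names the wrong leader in roughly one close race out of four.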
–It's enough to drive one to despair.
–Not at all. What we need to do is understand, so as not to be taken in. The universe cannot be falsified; it is what it is (even though, in the case of elections, voting intentions can be modified). And when voting intentions are very similar across the universe, it's very difficult for us to know anything before next September 7th.
–Despite this, until election day, some radio and TV stations will devote the bulk of their programming to anticipating the results and speculating about the possible consequences.
–Well, we have to entertain ourselves with something while we wait for the ballot boxes to “speak.” But let's not ask polling theory, or the professionals who apply it, for what they cannot provide. Because the difficulty lies in the universe, not in the samples.
–Don Augustin, thank you very much.

lanacion