Election 2016: The Battle of the Statisticians...

10 posts

Macrobius

More from this fellow: http://statisticalideas.blogspot.com/2016/10/sea-of-faulty-polls.html

His argument is correct (and saying much the same as Taleb with a different vocabulary). He is basically arguing that either the 'house bias' effect should be included in the variance (error bars) -- and if it were properly so included it would raise Trump's chances from 15% [as Nate Silver says] to 30%, which is not a small number or a certain loss -- or else we are not in Mediocristan at all with these polls. The model error is *much* greater than a theoretical normal distribution allows.

A bit of background: Statisticians (and Econometricians) know that 'not everything is a Gaussian Bell Curve' like they teach in your first stats course. A much more plausible model is a linear model (GLM), and that comes in two flavours: 'Random Effects' and 'Fixed Effects'. The point of the discussion below is that a Random Effects model is not plausible. There are formal tests for this, and the effects cannot be drawn from the same distribution. This leaves the Fixed Effects GLM as a possibility -- there is a huge partisan bias to the polls -- but then the model uncertainty inherent in an FE GLM is not being quoted by the media. This guy guesses a factor-of-two understatement in the error bars shown, which is quite plausible given these data.
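To see why that factor of two matters, here's a back-of-the-envelope sketch under a simple normal model. The margin and quoted standard error below are made-up numbers chosen to land near the 15%/30% figures, not anyone's actual poll data:

```python
from math import erf, sqrt

def win_prob(margin, se):
    """P(true margin < 0) under a normal model: the trailing
    candidate wins if the realised margin crosses zero."""
    z = -margin / se
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical numbers: a ~3.1-point Clinton lead with a quoted SE
# of ~3 points, chosen to reproduce the 15% -> 30% move.
margin, quoted_se = 3.1, 3.0
print(win_prob(margin, quoted_se))        # ~0.15 with the quoted error bars
print(win_prob(margin, 2 * quoted_se))    # ~0.30 once house bias doubles the SE
```

Doubling the error bars roughly doubles the tail probability here -- that's the whole argument in one line.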

Another thing my Statistician's eye catches is that the second graph does not show a proper bias / variance tradeoff. It is not unexpected that the polls are biased (not to a Statistician it isn't). Nor is it surprising that the data are 'heteroskedastic' (variance gets larger with the bias effect). What is surprising is (1) the heteroskedasticity is not funnel shaped but *hour glass shaped* and (2) there is no evidence that the bias is buying anything in terms of reduced variance, which is a legitimate tradeoff. These are two red flags for any competent analyst.

It looks like partisan bias is driving *more* variance, not less. Translation for the layman: these polls are pure bullshit put out in the interest of partisans from both sides. Trump's odds (in the popular vote) are a coin toss, maybe better. Nate Silver is, as Taleb points out, full of it, because he is not doing maths anymore. What a correct analysis along the lines of simulating the electoral vote would look like, no one can say, because no one has published one yet (that we've seen, anyway). Silver's stuff is Garbage In, Garbage Out.

https://en.wikipedia.org/wiki/Heteroscedasticity
https://en.wikipedia.org/wiki/Bias–variance_tradeoff

Anyway, standard long excerpt to spare you the click :D

Rest at the link.
Macrobius
Augusto Pinochet
Early Voting Points to a Tight Election

The election has already begun, voting has started, and guess what: the polls which predicted a Hillary landslide are dead wrong. I can say one thing for certain: Hillary will not win in a landslide. And I think there's a 50% chance she won't even win. The early voting results paint a mixed, competitive picture of both good news and bad news.

First, the good news. Trump is up in Iowa without any adjustments, and Republicans are doing better in early voting there than they were in 2012. Georgia is not a battleground state, with Trump up 3 in the RCP average without any adjustments. Maine CD2 is probably going to go to Trump, but it's unlikely to matter.

The bad news is out west. Trump is matching Romney's results in Nevada, and Colorado is simply a lost cause. Trump will win Arizona -- Romney won it by 9.03%, and Trump is doing well enough in early voting that he's on track to win it, but he's doing worse than Romney. Also, Virginia is probably not really a battleground state anymore. It's gone.

But the real potential good news -- and the way Trump can win the election -- is a turnout boom in Ohio, Pennsylvania, and possibly Michigan. This is where the data are exciting and going to decide the election.
Broseph
This is pretty much how I see it, but I think election day turnout will win NV for Trump. He's winning every "must win" state. He just needs NV and one other and it's a done deal. Remember that polls had Clinton beating Sanders by 21% but Sanders won by 1%. Right now, polls show her about 5% ahead of Trump, FWIW.
Macrobius
http://www.thephora.net/forum/showthread.php?t=112542

I don't have time to explain this right now, but this is important.

http://statisticalideas.blogspot.com/2016/10/pollsters-gone-wild.html

He's going down the right path here by looking at the bias/variance relation (error analysis) of the polls -- this is what real data analysts who know what they are doing think about. He also does a bootstrap, which is the right thing to do in a classical (90s-ish) analysis, and builds a 0/1 classifier (which is the aughties way to do it).
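For anyone wondering what 'does a bootstrap' means concretely, here's a minimal stdlib-only sketch. The poll margins are invented for illustration, not his data:

```python
import random

def bootstrap_win_prob(margins, n_boot=10_000, seed=0):
    """Resample the observed poll margins with replacement and ask
    how often the resampled mean favours Trump (margin > 0)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        sample = [rng.choice(margins) for _ in margins]
        if sum(sample) / len(sample) > 0:
            wins += 1
    return wins / n_boot

# Hypothetical national margins (Trump minus Clinton, in points):
polls = [-5, -3, -4, 1, -6, 2, -1, -4, -2, 0]
print(bootstrap_win_prob(polls))
```

The resampling distribution gives you an empirical error bar on the average margin without ever assuming a Gaussian.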

Bias and variance is lecture 9 of this mid-aughties set of course notes:

http://web.engr.oregonstate.edu/~tgd/classes/534/

The 'classical' part of his analysis shows Trump at 20% probability of winning the *popular* vote (it is too bad he doesn't repeat Silver's analysis of the electoral college; then we would all know).

Key finding is that the Liberal polls have a >=0.1% bias for some reason.

Now what all this means I will try to address later. I would caution the mathematical people here not to jump to rash conclusions such as 'we should just throw out the Liberal polls'. Downweighting information *all the way to zero* is seldom the optimal course, even if it is a rough heuristic. We can do better.

There are ways (*cough* Kalman filter *cough*) to recover signal from noisy data without selling yourself short.
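Here's a one-step scalar Kalman update as a sketch of that idea -- all numbers invented. The point is that a biased poll gets its variance inflated and then still contributes *something*, rather than being thrown out entirely:

```python
def kalman_update(prior_mean, prior_var, obs, obs_var):
    """One scalar Kalman step: blend a prior estimate with a noisy
    observation, weighting each by its precision (1/variance)."""
    k = prior_var / (prior_var + obs_var)   # Kalman gain
    mean = prior_mean + k * (obs - prior_mean)
    var = (1 - k) * prior_var
    return mean, var

# Hypothetical: current estimate of the margin is -2 +/- 2 points.
# A house-biased poll reads -6; instead of discarding it, inflate
# its variance (here 4x the prior's) so it only moves us a little.
mean, var = kalman_update(-2.0, 4.0, -6.0, 16.0)
print(mean, var)   # -2.8 3.2
```

With the poll's variance set four times the prior's, the gain is only 0.2, so the -6 reading drags the estimate from -2.0 to -2.8 instead of all the way over -- downweighted, not zeroed out.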

I wish he'd gone further and applied the (post-2000) analysis of Pedro Domingos (who is also the author of The Master Algorithm, which I've been recommending lately).

http://homes.cs.washington.edu/~pedrod/papers/mlc00a.pdf

Anyone who wants to get at the truth of these polls will read that paper *very* carefully and maybe apply what they learn.

Bias Variance analysis is *absolutely* the way to go -- but we need to think in terms of 'learning a binary classifier' for states, given the polls as training data. Think of '1' as votes for Trump, and '0' as anything else.

Here's a starter:

- a Bernoulli trial is a single 1-or-0 outcome (like a coin flip); predicting the electoral college means calling a whole sequence of them.

- bad predictions have a 'loss' function which is brutal -- if you get a state wrong, add "1" to your loss function, otherwise "0".

- now find a learning algo that beats everyone else's loss function, even Nate Silver's, and predict the election for us :D

This is easier than it sounds -- the amazing thing is that no one has actually done it *correctly*. They are all off chasing last year's delusions.
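A minimal sketch of that loss function, with invented state calls (these are not real predictions, just an illustration of the scoring rule):

```python
def zero_one_loss(predicted, actual):
    """0/1 loss over states: +1 for every state called wrong."""
    return sum(1 for s in predicted if predicted[s] != actual[s])

# Toy example with made-up calls; 1 = Trump, 0 = anything else.
predicted = {"OH": 1, "PA": 0, "NV": 0, "IA": 1, "CO": 0}
actual    = {"OH": 1, "PA": 1, "NV": 0, "IA": 1, "CO": 0}
print(zero_one_loss(predicted, actual))   # 1 -- only PA was miscalled
```

Whoever minimises this over all the Bernoulli trials has the best election model, full stop.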
Macrobius

*predicting the election means getting 54 or so Bernoulli trials correctly predicted.

Macrobius

Anyone wanting to do real stats should pay attention to Andrew Gelman (main popularizer of MCMC in the 90s, and hence the starter of the Bayesian conquest before Machine Learning). Atari-era, PC-based stats. Great stuff.

http://www.stat.columbia.edu/~gelman/research/unpublished/swing_voters.pdf - 'The Mythical Swing Voter'

No evidence sharks swing elections http://andrewgelman.com/2016/10/29/no-evidence-shark-attacks-swing-elections/

Mr P can solve problems with survey weighting http://andrewgelman.com/2016/10/12/31398/ (So why don't the Pollsters bother? [**])

and of course he has a book, Red State/Blue State.

[**] I divide statisticians into 3 classes -- C class statisticians have no qualifications at all. 90% of the people you meet in industry have an MBA or at best have taken one course in their STEM careers. All journalists, psychology practitioners, many economists who aren't econometricians, and some popularisers (Paulos) are C class. B class statisticians might have a Ph.D., but probably couldn't get an academic job at a good university, with prejudice. They tend to use outdated methods and not quite understand what they are doing. This makes them a bit dangerous. They are found in polling organizations, and less often in industry except as consultants. In the land of the commercial C class, the B class statistician is king. A class statisticians (and econometricians) are simply that -- competent, current, likely to have an academic post and the respect of their peers. Taleb and Gelman are popular examples. I'm not sure Nate Silver is A class -- he seems a little bit quackish to me, but he could be A class and dumbing down his explanations. He gets a lot of flak from the real A class sort.

My categories are not hard to follow. People use this classification for doctors all the time. C class is 'persons with no qualification playing at being doctor'. B class are quacks. A class are real doctors.[/PDF]

Macrobius

err could a mod fix the above poast? It ended up syntactically malformed somehow.

Broseph
it was the
Code:
[PDF]
tag.

And thank you for this thread. I am reading.
Macrobius

So let's get this done.

Analysis has to start with 'error analysis'. Why? Because modern science started sometime in the 19th century, in German laboratories. It spread to England and America, mostly by person-to-person contact (apprenticeship). German science was bombed into oblivion in 1945, but some German scientists migrated to the US, and we had science here too by then. And this is what you do. You start designing experiments with error analysis.

There are three kinds of error -- systematic error, random error (instrumental measurement error) or noise, and statistical error (sampling error, variance).

For example, if you want to measure how long a stick is, you measure it several times with a micrometer or some other calibrated setup. You will get a distribution with a central tendency (estimated by the mean) and some measure of variation around that. For lots of measurements, that distribution will tend to a Gaussian (bell curve) -- if you live in Mediocristan, and sticks being measured as to length do.
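A quick simulation of the stick experiment (all numbers invented). Each reading is the true length plus a sum of small independent errors, so the readings pile up in a bell curve around the truth:

```python
import random, statistics

rng = random.Random(42)
true_length = 100.0          # mm, the stick's "ground truth"

# 1000 micrometer readings, each perturbed by 12 small independent
# errors; by the central limit theorem their sum is near-Gaussian.
readings = [true_length + sum(rng.uniform(-0.05, 0.05) for _ in range(12))
            for _ in range(1000)]

print(round(statistics.mean(readings), 2))    # central tendency, ~100.0
print(round(statistics.stdev(readings), 3))   # spread of the measurements
```

The mean recovers the true length; the standard deviation is your honest error bar.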

So, no matter what you measure you end up:

Loss function = bias (systematic error) + variance (sampling error) + noise (instrumental measurement error).

The overall loss rate in an election is a killer -- you end up miscalling a state, and your prediction of electoral votes is off.

Salil Mehta gives the classical version of this formula (for a mean square error loss function -- which we can only use in Mediocristan):

prediction error = bias^2 + variance of errors + irreducible errors
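You can check Mehta's decomposition numerically. This sketch invents a biased, noisy estimator (all parameters made up) and confirms that the empirical mean square error comes out at bias^2 + variance + noise:

```python
import random

rng = random.Random(0)
N = 200_000
truth, bias, est_sd, noise_sd = 5.0, 0.5, 1.0, 0.3

sq_errors = []
for _ in range(N):
    y = truth + rng.gauss(0, noise_sd)           # ground truth + measurement noise
    y_hat = truth + bias + rng.gauss(0, est_sd)  # biased, noisy estimator
    sq_errors.append((y_hat - y) ** 2)

mse = sum(sq_errors) / N
print(round(mse, 2))                                  # empirical prediction error
print(round(bias**2 + est_sd**2 + noise_sd**2, 2))    # 0.25 + 1.00 + 0.09 = 1.34
```

The two numbers agree to simulation error, which is the whole content of the formula.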

what does 'irreducible errors' mean? Every measurement has what is called 'ground truth' -- the stuff you don't question but just accept. You have to start somewhere or face an infinite regress. Ground truth for an election has to be the tally of votes you count. Sure, it is possible to make innocent or not-so-innocent errors in counting (fraud). Voting machines can be programmed to flip votes, and illegal voters can be less than ideal.

Let's borrow a concept from 'evidence based medicine'. When we are considering a diagnostic test, there is a 'golden diagnostic test' -- the best available procedure for making an objective diagnosis. We say some other test has failed only if we have a Verifiable Operational Procedure for knowing the right answer (and the right answer is our golden diagnostic procedure).

So here is how we deal with fraud in elections: we use court-ordered recounts and *certify* the results. The certified election result is our first, last, and only defence against the scum of the universe. It is our golden data set. Let's remember this when we start whinging about exit polls next week.

So what is noise? Once we know the operational procedure for ground truth, we can make verifiable claims of error -- noise.

Noise is anything that causes the certified outcome of the election to differ from what the voter intended:

y = f(x) + epsilon

x is the point we are sampling (a voter). f is the function that puts a *label* on the voter. Labels are what the court says is the truth after judging all the evidence; that is ground truth. y is what the voter intended to do.

So, if there is fraud, or a miscount, or an evil Maxwell's demon in the voting machine, it causes epsilon -- a difference between y (what the voter intended to do, and likely says they did to an exit pollster) and f(x) (what the court labels the outcome of the attempted vote).

So, irreducible error means *noise*, and noise (measurement error) includes miscounting and fraud. It doesn't count things we don't like about reality not being ideal. We might wish that x, an illegal immigrant, would not be allowed to vote by showing just an ID. But we live in a universe where x voted, the court-ordered f(x) is a label for Clinton, and y is how x intended to vote. There is no noise there.

In many discussions, noise is presumed to be zero (we assume undetected fraud and miscounts approved by the courts do not in fact occur). This is likely a reasonable assumption.

Now, for Pedro Domingos's general decomposition, we need some concepts (applied to a set of data such as a poll) from which we are going to try to learn f and predict the election given x* -- the people who actually voted. We want y* = f(x*) -- the *best prediction* of the outcome is y*.

In addition, there is the 'main prediction' -- which depends on the loss function. In classical statistics, we use the loss function of Mediocristan (mean square error) and the 'main prediction' is the mean. In fat-tail territory, we use mean absolute deviation (MAD) and the 'main prediction' is the median. In 1/0 loss function land (our current analysis) we use the *mode*. Thus, in Illinois, the main prediction is the label ym, 'Clinton'. In Texas, the main prediction ym is currently 'Trump'. That is because either 1s or 0s are the mode -- whoever has the most voats.
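A tiny illustration of 'the main prediction depends on the loss function', using an invented seven-voter poll:

```python
import statistics

# Toy "poll" of seven voters in one state: 1 = Trump, 0 = anything else.
votes = [1, 1, 0, 1, 0, 1, 1]

# The main prediction depends on the loss function you charge yourself:
mean_pred   = statistics.mean(votes)    # minimises mean square error
median_pred = statistics.median(votes)  # minimises mean absolute deviation
mode_pred   = statistics.mode(votes)    # minimises 0/1 loss -- our case

print(mean_pred, median_pred, mode_pred)
```

For 0/1 data the median and mode coincide, but the mean does not -- which is exactly why quoting poll *averages* and then scoring a *call* are different games.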

Bias, for Domingos, is L(ym, y*) -- the difference between the best prediction (election outcome) and the main prediction (who got the most voats in the poll). If the polls don't predict the election, they are biased. The measure of bias is the average loss function. (Note that in Mehta's classical treatment, Bias is called Bias-squared. That is an artifact of the metric, and we call Bias^2 the Bias term. There is no square in the 1/0 version of the formula.)

Variance, similarly, is the loss between y and ym (the poll data and the main prediction). It is decomposed into two sorts -- the average variance on the biased points (where the bias is 1), and the average variance on the unbiased points (where the bias is 0).

So, setting Noise to zero,

L = <B> + Vu - Vb

(the prediction error rate will be the average Bias, in the definition of Domingos, + Vu - Vb where those terms are, respectively, the average variance for the unbiased case, and the average variance for the biased case)
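And here is the whole 0/1 decomposition checked end to end on an invented table of state predictions -- per-state bias and variance, then L = <B> + Vu - Vb exactly:

```python
from collections import Counter

# Predictions for 4 states across 5 bootstrap training sets (one row
# per state), plus the noise-free ground-truth label y* for each
# state. All made up for illustration.
preds = [
    [1, 1, 1, 0, 1],   # state A
    [0, 0, 1, 0, 0],   # state B
    [1, 0, 0, 0, 1],   # state C
    [1, 1, 1, 1, 1],   # state D
]
truth = [1, 1, 0, 1]

n_states, n_sets = len(preds), len(preds[0])

# Average 0/1 loss over every (state, training set) pair.
avg_loss = sum(p != y for row, y in zip(preds, truth)
               for p in row) / (n_states * n_sets)

mean_bias = vu = vb = 0.0
for row, y_star in zip(preds, truth):
    ym = Counter(row).most_common(1)[0][0]          # main prediction = mode
    bias = float(ym != y_star)                      # 0/1 bias for this state
    variance = sum(p != ym for p in row) / n_sets   # P(prediction != mode)
    mean_bias += bias / n_states
    vu += (1 - bias) * variance / n_states          # variance on unbiased states
    vb += bias * variance / n_states                # variance on biased states

print(avg_loss, mean_bias + vu - vb)   # the two numbers agree exactly
```

Note the sign: on a biased state, variance *helps* (it sometimes flips you onto the right answer), which is why Vb enters with a minus. That is the non-classical part of the 1/0 formula.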