So let's do a *real* analysis. There is a graph from the Gelman link above:
The media (and Gelman) are talking about a "2% bias" (shift up and down). But that's not the whole story, is it? What about that tilt?
Sometimes it is easier to think in voat counts rather than percentages.
Eyeballing the band from OR to TN we get a 2x2 linear system:
-0.005 = a*T + b*H for T=0.3 and H=0.7
0.05 = a*T + b*H for T=0.6 and H=0.4
So we get, for a line down the middle, something like a=0.123 and b=-0.06, which is a ratio of about 2 to 1.
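For anyone who wants to check the arithmetic, here is a minimal sketch that solves the two eyeballed equations with Cramer's rule. The function name `solve_2x2` is just made up for illustration, and the inputs are the eyeballed numbers from the graph, not measured data.

```python
def solve_2x2(t1, h1, y1, t2, h2, y2):
    """Solve y1 = a*t1 + b*h1 and y2 = a*t2 + b*h2 for (a, b) by Cramer's rule."""
    det = t1 * h2 - t2 * h1
    if det == 0:
        raise ValueError("the two equations are not independent")
    a = (y1 * h2 - y2 * h1) / det
    b = (t1 * y2 - t2 * y1) / det
    return a, b

# Eyeballed points from the band between OR and TN (values taken from the post).
a, b = solve_2x2(0.3, 0.7, -0.005, 0.6, 0.4, 0.05)
ratio = abs(a / b)

print(f"a = {a:.3f}, b = {b:.3f}, |a/b| = {ratio:.2f}")
```

With these inputs a comes out near 0.123 and b near -0.06, so the weights differ by roughly a factor of two. Move the eyeballed points a little and the ratio moves with them, so treat it as a rough figure.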
Well, that means Trump voats have to be counted 2:1 to predict the election, across the board!
Let's say that again: the polling models are *so biased* against Trump that for every extra Trump voat you find, you have to weight it 2x against every Clinton voat you find, just to compensate for your 'invalid demographic' model. (That's probably what the Reuters statistician was talking about, or his dumbed down explanation of it, before the Reuters journalist wrote it up to the best of his limited understanding.)
What this means is that 'oversampling Democrats' is a dandy way to rig a poll. No one pays attention to your prediction in CA or ND, where the conclusion is wrong, and you only 'shift' things by a few percent in the states that matter for the prediction. And, of course, it allows you to make predictions like 'the Democrats will win with 90% confidence' and demoralise the opposition, without appearing to be the total crook that you are.
Anyway, we can 'unscrew' the data if we rotate the whole thing around OR until TN comes down 4%. Then we get a pretty good prediction. It *still* has a 'maximal axis of confusion' from top to bottom of 6-7% and *that's* the important signal here.
Now let's look at another graph.
Romney 2012 vs Trump 2016. Notice anything? The 'errors' fall in the same places. It looks like the polls basically predicted a rematch of Romney 2012, but *something different happened*. You can unskew the election vs election data (that is, nothing to do with polls) by rotating around OR and bringing TN down 4%. There is still the same 2:1 weighting of Trump needed to unskew the polls (meaning the polls are no better than just predicting Romney 2012 without asking anyone questions at all, and they are oversampling exactly the same way).
This means, by the way, that we are dealing with reality, not polls, when we look at this chart. The polls and the 2012 election are in one box of rocks, and the Trump 2016 election is a different kettle of fish -- *turnout changed* and no one told the pollsters. New voaters registered [differentially Trump supporters], moar Republicans turned up, and fewer Dems came out for Hillary.
But now we get to the *really* interesting part. Why did Trump do so poorly in TX, and so much better in MN (even if not well enough to win)? There's a 6% differential as we walk from Texas to Minn!
TX AZ GA NC FL CO NV PA NH MN WI MI ME in the Romney vs Trump data
TX AZ GA VA CO FL MI NC NH PA ME MN WI in the Polls vs Election 2016 data
Let's call this axis 'Texicity' (distance from TX). TX is the center of bad news for the Alt Right (and Trump), and the news gets better the further you get from Texas... until it gets absolutely awesome by the time you get to the rust belt.
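One rough way to check whether the two state orderings really line up along the same axis is a Spearman rank correlation over the states that appear in both lists. This is only a sketch: the helper name `spearman_common` is made up, the first entry of the second list is read as TX (an assumption), and NV and VA are dropped because each appears in only one list.

```python
# State orderings as listed in the post (left to right along the axis).
romney_vs_trump = "TX AZ GA NC FL CO NV PA NH MN WI MI ME".split()
# Assumption: the first entry of the second list is TX.
polls_vs_result = "TX AZ GA VA CO FL MI NC NH PA ME MN WI".split()

def spearman_common(order_a, order_b):
    """Spearman rank correlation over the states present in both orderings."""
    common = [s for s in order_a if s in order_b]
    rank_a = {s: i for i, s in enumerate(common)}
    # Re-rank the second ordering after dropping states missing from the first.
    rank_b = {s: i for i, s in enumerate(x for x in order_b if x in rank_a)}
    n = len(common)
    d2 = sum((rank_a[s] - rank_b[s]) ** 2 for s in common)
    # Classic formula for rankings without ties.
    return 1 - 6 * d2 / (n * (n * n - 1)), common

rho, common = spearman_common(romney_vs_trump, polls_vs_result)
print(f"{len(common)} common states, Spearman rho = {rho:.2f}")
```

A value near 1 means the two orderings mostly agree, and a value near 0 means they are unrelated. With only a dozen states this is a sanity check on the eyeballing, nothing more.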
There are a few possible explanations:
- CA and MA (the left and right coasts) are on top of each other -- the polls are pretty accurate in coastal urban areas. They *suck* in flyover country.
- There is also a N-S gradient of liberalism, with liberals hugging the Canadian border.
- Obviously it could be Hispanics. One thing that stands out is that Rural Hispanics in the Southwest, Urban Hispanics, and Migratory Hispanics are three different populations. The first two are heavily Democratic but likely have different tendencies for turnout (and differ in how easy they are to contact by phone).
*Leaving aside the West coast* (which doesn't get pipeline oil from Texas, and has untypical taxes on gasoline), here is the price of gasoline:
https://www.gasbuddy.com/GasPriceMap?z=4
Green is (relatively) bad news for Trump and the Alt Right, and Orange is good news, east of the Rockies, except for New York and Chicago. PA especially stands out, as do IA, and upstate MI and WI. MN not so much.
(The two states that don't fit the pattern, MO and SC, have recently had extreme and extended racial conflict between blacks and whites. They may be somewhat special.)
- Of course the incidence of Hispanics and the availability of cheap gas are not independent factors. The cost of gas as you get further from Texas and Louisiana goes up about 25 cents a gallon. This is the *integrated cost* of shipping it - not unlike the cost of migrating from Texas to the edge of Maine.
- of course, gasoline costs are also correlated with turnout. In particular, Whites pursuing a k strategy and having future time orientation (conservatives) are likely to spend money, all things equal, to drive to the polls. Discouraged r selectionists (liberals) are less likely.
This does suggest a strategy for the Alt Right to suppress voating by the opposition: enact punitive and regressive gasoline taxes in your state. This gives your state revenues to be independent of the consolidated government (States Rights) and it discourages migration, and it suppresses r selectionists from voating. Awesome! We can always lower state income taxes if we want to encourage working and keep the tax revenue neutral.
Is gasoline a poll tax? Or is it just that migrants are hard to count in polls? Probably not the latter, because the Romney election is just like the polls would have predicted if he had used Trump's polls instead of his own. No 'undercounted migrants' in 2012.
- Finally, gasoline prices certainly correlate with White lower class distress, as do heating fuel costs.
Anyway, the tl;dr
- there was a yuuge oversampling of Democrats giving Trump a 2:1 disadvantage in the polls (not a mere 2% bias), that wasn't in the Romney polls.
- Part of this may be his own efforts at registering supporters, and part the 'group cohesion' in more White areas, which is stronger in the North than the South, maybe. The Midwest is a perfect storm.
- It is probably not a 'shy Hillary supporter' effect because the biggest skew is in states where she has fewer voats by far. Likewise black turnout.
- the remaining bias is ordered north to south. Trump is in trouble in the South and his greatest vulnerabilities are TX, GA, AZ. FL and NC are the tipping point. Fortunately, in this election gasoline prices are high in the right places (and TX red enough not to matter).
- 'Texicity' (distance from Texas) is a factor, and migrant Hispanics (illegals?) responding to gas prices *may* be a factor. If TX tips to Latino voaters, demographically, Democrats will have a strategy to undermine the Alt Right.
- High gasoline prices may be an effective 'Wall' containing Hispanics in places where they are already numerous.