Methods & Models

Milestone 3

Eight models. 250 counties. One question: can demographics predict where elections swing?

At a Glance

8 Models Tested
28 Demographic Features
5 County Archetypes
7.15× Diversity Threshold Discovery

Our EDA revealed that no single demographic feature explains electoral volatility. So we attacked the problem from three directions: supervised classification to predict volatility quintiles, regression to quantify continuous volatility scores, and unsupervised clustering to discover natural county archetypes from demographics alone. K-Means (k=5) revealed five distinct profiles, from aging rural communities to high-diversity urban centers, and one cluster turned out to be exclusively North Carolina counties with extreme volatility. We reference these archetypes throughout this analysis to ground the models in recognizable county types.

K-Means cluster map of 250 counties across Michigan, Pennsylvania, and North Carolina showing 5 distinct county archetypes
K-Means clustering (k=5) reveals five distinct county archetypes across our three swing states. Cluster labels are 1-indexed on the map.
Archetype | Counties | Key Traits | Avg Volatility
Suburban Middle (Cluster 1) | 47 | Mid-size (~241k pop), moderate diversity, avg income $66k | Average
Affluent Exurban (Cluster 2) | 47 | High income ($83k), 79% homeownership, low density | Low
High-Poverty Rural (Cluster 3) | 28 | 35% Black, 21% poverty, small/rural (38k pop), NC only | Highest (Q5 avg: 4.6)
Aging Rural (Cluster 4) | 115 | Oldest (median age 47), very homogeneous, <1% foreign-born | Lowest
High-Diversity Urban (Cluster 5) | 13 | Largest (748k pop), most diverse, 50% bachelor’s+ | Moderate-High

Volatility was measured per cluster by cross-referencing each county’s K-Means cluster assignment with its volatility quintile (Q1–Q5). The “Q5 avg: 4.6” for Cluster 3 means that on a 1–5 scale, these counties average 4.6, with 22 of 28 falling in the highest volatility class (Q5).
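The cross-referencing step can be sketched as follows. This is a minimal illustration on synthetic data, not our actual pipeline: the column names, the four stand-in features, and the random quintile labels are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the county table: 250 rows, 4 hypothetical demographic columns.
counties = pd.DataFrame(
    rng.normal(size=(250, 4)),
    columns=["median_age", "pct_poverty", "median_income", "pop_density"],
)
# Hypothetical quintile labels (1-5), 50 counties each, mimicking the balanced target.
counties["vol_quintile"] = rng.permutation(np.repeat(np.arange(1, 6), 50))

# Cluster on standardized demographics only; election results are not used here.
features = counties.drop(columns="vol_quintile")
X = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
counties["cluster"] = kmeans.fit_predict(X) + 1  # shift to 1-indexed labels

# Cross-reference: cluster size, mean quintile, and number of Q5 counties per cluster.
summary = counties.groupby("cluster")["vol_quintile"].agg(
    n_counties="size",
    avg_quintile="mean",
    n_q5=lambda q: int((q == 5).sum()),
)
print(summary)
```

On real data, the `avg_quintile` column is the kind of figure reported as "Q5 avg: 4.6" for Cluster 3, and `n_q5` corresponds to the "22 of 28" count.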

The Modeling Challenge

We measure electoral volatility as how much a county’s partisan balance shifts between elections, not simply who wins. Our composite volatility score (vol_z_abs_sum) aggregates the absolute z-scored margin swings across the 2016, 2020, and 2024 elections. We then binned all 250 counties into five quintiles, from Q1 (Very Stable) through Q5 (Highly Volatile), giving us a balanced 5-class target with exactly 50 counties per class.
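One plausible way to build such a score and its quintile target is sketched below. The exact formula behind vol_z_abs_sum is not spelled out on this page, so the definition here (z-score each election-to-election swing across counties, then sum the absolute values) is an assumption, and the margins are randomly generated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 250

# Hypothetical two-party margins (Dem minus Rep, percentage points) per county.
margins = pd.DataFrame({"margin_2016": rng.normal(0, 20, n)})
margins["margin_2020"] = margins["margin_2016"] + rng.normal(0, 4, n)
margins["margin_2024"] = margins["margin_2020"] + rng.normal(0, 4, n)

# Swing between consecutive elections.
swings = pd.DataFrame({
    "swing_16_20": margins["margin_2020"] - margins["margin_2016"],
    "swing_20_24": margins["margin_2024"] - margins["margin_2020"],
})

# Assumed definition: z-score each swing across counties, then sum absolute values.
z = (swings - swings.mean()) / swings.std(ddof=0)
vol_z_abs_sum = z.abs().sum(axis=1)

# Five equal-sized bins: 1 = Q1 (Very Stable) ... 5 = Q5 (Highly Volatile).
vol_quintile = pd.qcut(vol_z_abs_sum, q=5, labels=[1, 2, 3, 4, 5]).astype(int)
print(vol_quintile.value_counts().sort_index())
```

Because the bins are cut at the score's own quantiles, 250 counties with distinct scores always split into five classes of 50.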

Choropleth map showing volatility class distribution across PA, MI, and NC counties
Geographic distribution of the five volatility quintiles across our study region

Two Complementary Lenses

We approached prediction from two angles. Classification asks: "which volatility bucket does this county belong to?" This is useful for campaign triage when you need a quick stable-vs-volatile label. Regression predicts the continuous volatility score directly, answering "exactly how volatile is this county?" This is valuable for ranking counties within the same class and quantifying the magnitude of expected swing.

Classification Pipeline

28 Standardized Features: demographics, income, housing, commuting
Nested Cross-Validation: outer 5-fold × inner 3-fold
Volatility Predictions: 5-class and binary classification

All classification models were evaluated with nested cross-validation: an outer 5-fold loop for unbiased performance estimation, and an inner 3-fold loop for hyperparameter tuning. This prevents information leakage and gives honest estimates of how each model would perform on truly unseen counties.
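A minimal sketch of this nested scheme with scikit-learn is shown below. The data are synthetic, and the hyperparameter grid and tree count are illustrative assumptions rather than the settings used in our experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 250 samples, 28 features, 5 classes.
X, y = make_classification(
    n_samples=250, n_features=28, n_informative=8,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # evaluation

# Scaling sits inside the pipeline so it is re-fit on each training fold only.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0),
)
param_grid = {"randomforestclassifier__max_depth": [3, None]}  # illustrative grid

# The inner loop picks hyperparameters; the outer loop scores the tuned model
# on folds that the tuning step never saw.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="f1_macro")
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="f1_macro")
print(outer_scores.mean())
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the procedure nested: the search is repeated from scratch inside every outer training fold.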

The Regression Track: Multiple Linear Regression

Our regression approach used Multiple Linear Regression (OLS) with both forward and backward stepwise feature selection, scored by R² and MAE. From the full set of 28 standardized features, the model selected 8 predictors that best explain continuous volatility. The overall pooled model achieved an R² of 0.411 on training data and 0.236 on the test set (RMSE = 1.122): it explains roughly 41% of the variance in county-level volatility scores in-sample, but only about 24% on held-out counties.

Model Scope | R² | Key Selected Features | Top Predictor
Pooled (All States) | 0.411 | 8 features incl. rent, age, poverty, entropy | Median Gross Rent (β = 0.879)
Pennsylvania | 0.674 | Median Age, Unemployment, Racial Diversity, Rent | Racial Diversity Index
Michigan | 0.331 | % Bachelor’s Degree+, % Carpool | % Bachelor’s Degree+
North Carolina | 0.435 | % Black, % Hispanic, % Below Poverty, Rent, % Carpool | % Below Poverty Line
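Forward selection of 8 predictors from 28 can be sketched with scikit-learn's `SequentialFeatureSelector`. Note that this is a stand-in technique: it adds features greedily by cross-validated R², which may differ from the stepwise criterion used for the table above. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 250 samples, 28 features, 8 of them truly informative.
X, y = make_regression(
    n_samples=250, n_features=28, n_informative=8, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Greedy forward selection: add one feature at a time, keeping the best CV R².
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=8, direction="forward",
    scoring="r2", cv=5,
)
selector.fit(X_train_s, y_train)
selected = np.flatnonzero(selector.get_support())

# Refit OLS on the chosen columns and compare in-sample vs. held-out fit.
ols = LinearRegression().fit(X_train_s[:, selected], y_train)
r2_train = ols.score(X_train_s[:, selected], y_train)
r2_test = ols.score(X_test_s[:, selected], y_test)
print(selected, round(r2_train, 3), round(r2_test, 3))
```

Setting `direction="backward"` runs the elimination variant. Reporting both `r2_train` and `r2_test` matters because stepwise selection tends to flatter the training fit.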

A striking finding: the pooled model’s strongest coefficient belongs to Median Gross Rent (β = 0.879), a housing cost variable that captures the intersection of urbanization, economic pressure, and demographic transition. This is the only feature that generalizes across all three states in both the classification and regression pipelines, suggesting that housing affordability stress may be a universal undercurrent of electoral instability.

Pennsylvania stands out with R² = 0.674, meaning demographics explain about two-thirds of PA’s volatility variance. That is far more than Michigan (0.331) or North Carolina (0.435). This suggests PA’s electoral instability is more structurally "readable" from census data alone, while Michigan’s volatility depends on harder-to-measure factors like local political organizing and candidate-specific effects.

The Classification Arena

Three classifiers competed on the 5-class volatility prediction task. We started with Random Forest as our baseline, then tested whether gradient boosting (XGBoost) or maximum-margin (SVM) approaches could improve on it.

Baseline Random Forest

An ensemble of 300 decision trees with balanced class weights. Random Forest handles nonlinear feature interactions naturally and provides built-in feature importance. With an accuracy of 36.8% and macro F1 of 0.362, it established our baseline. That is well above the 20% random-chance rate for 5 balanced classes, though it clearly struggled with the middle quintiles.

Cluster insight: RF performed best on Q5 (Highly Volatile) counties, which are dominated by High-Poverty Rural (Cluster 3) communities. Their extreme demographic profile makes them the most distinctive signal in the data.

Best 5-Class XGBoost

Gradient boosting tuned over 100 randomized hyperparameter configurations. XGBoost earned the highest 5-class accuracy (38.8%) and macro F1 (0.387), about two percentage points above Random Forest (97 vs. 92 of 250 counties classified correctly). Its key advantage was stronger performance on the middle quintiles (Q3: 0.356 F1 vs. RF’s 0.257), suggesting that boosting captures subtler demographic gradients that the bagged Random Forest misses.

Contender SVM (Linear Kernel)

Support Vector Machine with a linear kernel and balanced class weights. Matched RF’s overall accuracy (36.8%) but showed a different error pattern: it achieved the strongest Q2 (Stable) performance of any model (F1 = 0.365), suggesting that the hyperplane-based approach finds a linear boundary for “low-volatility” counties that tree methods struggle with. We also tested an RBF kernel, which traded accuracy (35.2%) for higher ROC-AUC (0.790), indicating that it ranks counties better across classes even though its hard predictions are worse.

Confusion Matrix Comparison

The three matrices below show how each model distributes its predictions across the five volatility classes. Diagonal cells are correct predictions; off-diagonal cells are errors.

Random Forest
Accuracy 36.8% | Macro F1 0.362 | ROC-AUC 0.803 | Best Class Q5
Actual \ Pred | Q1 | Q2 | Q3 | Q4 | Q5
Q1 Very Stable | 25 | 8 | 10 | 4 | 3
Q2 Stable | 17 | 8 | 13 | 8 | 4
Q3 Moderate | 11 | 15 | 13 | 8 | 3
Q4 Volatile | 4 | 6 | 10 | 17 | 13
Q5 Highly Volatile | 2 | 3 | 5 | 11 | 29

XGBoost
Accuracy 38.8% | Macro F1 0.387 | ROC-AUC 0.783 | Best Class Q5
Actual \ Pred | Q1 | Q2 | Q3 | Q4 | Q5
Q1 Very Stable | 21 | 14 | 5 | 8 | 2
Q2 Stable | 14 | 11 | 10 | 10 | 5
Q3 Moderate | 6 | 14 | 16 | 11 | 3
Q4 Volatile | 5 | 8 | 6 | 19 | 12
Q5 Highly Volatile | 5 | 5 | 3 | 7 | 30

SVM (Linear)
Accuracy 36.8% | Macro F1 0.364 | ROC-AUC 0.765 | Best Class Q1
Actual \ Pred | Q1 | Q2 | Q3 | Q4 | Q5
Q1 Very Stable | 26 | 6 | 10 | 6 | 2
Q2 Stable | 12 | 19 | 10 | 2 | 7
Q3 Moderate | 11 | 13 | 9 | 11 | 6
Q4 Volatile | 9 | 8 | 9 | 16 | 8
Q5 Highly Volatile | 6 | 8 | 6 | 8 | 22

5-Class Model Scoreboard

Model | Accuracy | Macro F1 | ROC-AUC | Best Per-Class F1
Random Forest | 0.368 | 0.362 | 0.803 | Q5: 0.569
XGBoost | 0.388 | 0.387 | 0.783 | Q5: 0.588
SVM (Linear) | 0.368 | 0.364 | 0.765 | Q5: 0.463 (Q2: 0.365, the best Q2 score of any model)

A clear pattern emerges: all three models perform best at the extremes (Q1 and Q5) and struggle with the middle quintiles. The confusion matrices show a "smearing" effect where Q2–Q4 counties bleed into adjacent classes. This most likely reflects genuine demographic overlap between moderate-volatility counties rather than a shortcoming of any one model: the extremes are simply where demographics tell the clearest story.
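The per-class scores behind this pattern can be recomputed directly from a confusion matrix. The sketch below uses the Random Forest matrix reported above and assumes the published scores come from predictions pooled across the outer folds.

```python
import numpy as np

# Random Forest confusion matrix from the comparison above
# (rows = actual Q1..Q5, columns = predicted Q1..Q5).
rf = np.array([
    [25,  8, 10,  4,  3],
    [17,  8, 13,  8,  4],
    [11, 15, 13,  8,  3],
    [ 4,  6, 10, 17, 13],
    [ 2,  3,  5, 11, 29],
])

def per_class_f1(cm):
    """F1 per class: 2*TP / (predicted count + actual count)."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column totals
    actual = cm.sum(axis=1)     # row totals
    return 2 * tp / (predicted + actual)

f1 = per_class_f1(rf)
accuracy = np.trace(rf) / rf.sum()
macro_f1 = f1.mean()

print(np.round(f1, 3))     # [0.459 0.178 0.257 0.347 0.569]
print(round(accuracy, 3))  # 0.368
print(round(macro_f1, 3))  # 0.362
```

The Q2 score (0.178) sits far below Q1 and Q5, which is the extremes-versus-middle gap in numeric form.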

Per-class F1 heatmap across all models and quintiles
Per-class F1 scores reveal the extremes-vs-middle pattern across all model families

The Binary Breakthrough

The 5-class struggle pointed to a pragmatic question: does a campaign manager really need to distinguish Q2 from Q3? What matters is identifying high-volatility targets (Q4 + Q5) versus stable counties (Q1–Q3). This binary reframing changed everything.

0.387 → 0.718 Macro F1 improvement: 5-class XGBoost → Binary Random Forest
ROC curves comparing all models on binary volatile vs stable classification
Binary ROC curves: Random Forest achieves AUC = 0.828, outperforming all competitors
Model | Accuracy | F1 | Precision | Recall | ROC-AUC
Random Forest | 0.772 | 0.718 | 0.726 | 0.690 | 0.828
XGBoost | 0.748 | 0.660 | 0.718 | 0.610 | 0.814
SVM (RBF) | 0.756 | 0.674 | 0.724 | 0.630 | 0.789
Logistic Regression | 0.680 | 0.604 | 0.598 | 0.610 | 0.745
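To see why the binary framing is easier, one can collapse a 5-class confusion matrix into stable (Q1–Q3) versus volatile (Q4–Q5). The sketch below does this for the 5-class Random Forest matrix reported earlier. It is only an illustration of the reframing: the binary models in the table were evaluated on the binary target, so their numbers differ from this collapsed view.

```python
import numpy as np

# 5-class Random Forest confusion matrix (rows = actual, columns = predicted).
rf_5class = np.array([
    [25,  8, 10,  4,  3],
    [17,  8, 13,  8,  4],
    [11, 15, 13,  8,  3],
    [ 4,  6, 10, 17, 13],
    [ 2,  3,  5, 11, 29],
])

# Q4 and Q5 form the positive ("volatile") class; Q1-Q3 are "stable".
volatile = np.array([False, False, False, True, True])

def collapse_to_binary(cm, positive):
    """Sum the 5x5 cells into TP, FP, FN, TN for the given positive classes."""
    tp = cm[np.ix_(positive, positive)].sum()
    fn = cm[np.ix_(positive, ~positive)].sum()
    fp = cm[np.ix_(~positive, positive)].sum()
    tn = cm[np.ix_(~positive, ~positive)].sum()
    return tp, fp, fn, tn

tp, fp, fn, tn = collapse_to_binary(rf_5class, volatile)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / rf_5class.sum()

print(tp, fp, fn, tn)  # 70 30 30 120
print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
```

Most 5-class errors fall between neighboring quintiles on the same side of the Q3/Q4 boundary, so they disappear once the classes are merged.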
Cluster Insight

Aging Rural (Cluster 4) = Predictably Stable

These 115 homogeneous, older counties, spread across all three states (50 MI, 39 PA, 26 NC), were almost perfectly classified as low-volatility. Median age 47, under 1% foreign-born, and the highest homeownership rate of any cluster (79%). Their demographics tell a consistent story of partisan anchoring.

Cluster Insight

High-Poverty Rural (Cluster 3) = Volatile Signal

With 22 of 28 counties rated Highly Volatile (Q5) and an average quintile of 4.6, this cluster is the single clearest volatility signal in our dataset. Every county in this cluster is in North Carolina, concentrated in the eastern part of the state along the historic Black Belt.

Inside Cluster 3: North Carolina’s Volatility Engine

One of our most revealing discoveries is that Cluster 3 (High-Poverty Rural) is entirely composed of North Carolina counties, all 28 of them. No Pennsylvania or Michigan county shares this demographic profile. This geographic exclusivity explains a lot. It is why our cross-state models struggle, and why NC’s volatility responds to different levers than the other two states.

These counties sit at the intersection of every high-volatility signal in our data: racial diversity (avg 35% Black population), deep poverty (21% below poverty line), low education (only 17% with a bachelor’s degree), and small populations (avg 38k). The flagship example is Robeson County, the single most volatile county in our entire 250-county dataset (volatility score of 8.03). Robeson has the highest racial entropy score in the study, the lowest education level, and a poverty rate over 3 standard deviations above the mean. Unlike suburban counties where demographic shifts gradually reshape voting patterns, Robeson’s longstanding structural inequality keeps the electorate in constant flux.

The fact that K-Means, with no knowledge of election results or state boundaries, isolated these 28 counties into a single cluster using demographics alone supports the connection between socioeconomic structure and electoral instability. The algorithm found the volatility engine before we told it to look for one.

The Misfit Detector

We also applied logistic regression in a creative way. Instead of predicting volatility, we trained a model to predict partisan lean from demographics alone. The key insight is that counties where the model confidently predicts "blue" but voters choose "red" (or vice versa) represent demographic misfits, places where the electorate defies its demographic expectations. We hypothesized that this tension would correlate with volatility, and it does.
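The idea can be sketched as follows. The data are synthetic, and the misfit score used here (the absolute gap between the out-of-fold predicted probability and the observed outcome) is an assumed definition chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical demographic features and observed partisan lean (1 = Dem, 0 = Rep).
X = rng.normal(size=(250, 6))
lean_signal = X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1.0, 250)
voted_dem = (lean_signal > 0).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Out-of-fold probabilities, so no county is scored by a model that saw it.
p_dem = cross_val_predict(model, X, voted_dem, cv=5, method="predict_proba")[:, 1]

# Assumed misfit definition: distance between prediction and actual outcome (0 to 1).
misfit = np.abs(voted_dem - p_dem)

# The two misfit types: demographics point one way, voters go the other.
surprise_dem = (p_dem < 0.5) & (voted_dem == 1)
surprise_rep = (p_dem >= 0.5) & (voted_dem == 0)
print(int(surprise_dem.sum()), int(surprise_rep.sum()), misfit.max().round(3))
```

With real data, `misfit` would then be compared against the volatility score, for example with a rank correlation or a per-quintile average.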

Scatter plot of demographic misfit scores vs volatility scores showing positive correlation
The misfit scatter: counties that defy their demographic predictions tend to be more volatile
Choropleth map of misfit scores across PA, MI, and NC
Geographic distribution of demographic misfits, with notable concentration in transitional suburban belts
Key Finding

“Surprise Dem” Misfits

Counties whose demographics say "Republican" but that vote Democratic tend to be university towns and resort communities with unique electorates, such as Marquette County, MI and Centre County, PA.

Key Finding

“Surprise Rep” Misfits

Southern counties with mixed demographic profiles that nonetheless lean Republican. The demographic tension in these communities maps directly to higher electoral volatility.

Misfit scores broken down by volatility class showing increasing misfit with volatility
Misfit scores rise consistently with volatility class, showing that demographic tension predicts electoral instability

The misfit detector gives us something no standard classifier can: a measure of how surprising a county’s voting behavior is given its demographics. Counties with high misfit scores are exactly the places where campaigns should pay closest attention, because they are the ones most likely to flip.

Cross-State Strategic Insight

A natural question: can a model trained on Pennsylvania and Michigan predict volatility in North Carolina? We tested this with Leave-One-State-Out (LOSO) cross-validation, where each state is held out in turn. The answer was revealing, not because the models worked, but because of how they failed.
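A minimal LOSO loop with scikit-learn's `LeaveOneGroupOut` might look like this. The per-state county counts (PA 67, MI 83, NC 100) are the ones from this study; the features, labels, and classifier settings are synthetic or illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic binary target (volatile vs. stable) with 28 features.
X, y = make_classification(n_samples=250, n_features=28, n_informative=8, random_state=0)

# One group label per county; each group is held out exactly once.
states = np.array(["PA"] * 67 + ["MI"] * 83 + ["NC"] * 100)

results = {}
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=states):
    held_out = states[test_idx][0]
    clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
    clf.fit(X[train_idx], y[train_idx])  # train on the other two states
    results[held_out] = f1_score(y[test_idx], clf.predict(X[test_idx]))

print(results)
```

Unlike ordinary k-fold, the held-out fold here never shares a state with the training data, so the score measures transfer across states rather than interpolation within them.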

Three confusion matrices from LOSO cross-validation showing poor cross-state generalization
LOSO confusion matrices: models trained on two states struggle to predict the third
Held-Out State | Counties | 5-Class F1 | Binary F1 | Binary AUC
Pennsylvania | 67 | 0.189 | 0.591 | 0.838
Michigan | 83 | 0.182 | 0.636 | 0.778
North Carolina | 100 | 0.213 | 0.736 | 0.829
Full Dataset (Pooled) | 250 | 0.387 | 0.718 | 0.828

The performance gap is striking: pooled 5-class F1 of 0.387 drops to an average of just 0.195 under LOSO, about the level expected from random guessing across five classes (roughly 0.20). Rather than exposing a weakness, this result reveals something more interesting. Each state’s volatility is driven by a unique demographic fingerprint.

State-specific feature importance showing different top predictors per state
Each state has its own volatility fingerprint with distinct top predictors
Pennsylvania

Diversity-Driven Volatility

Racial entropy is the dominant predictor. PA’s volatile counties sit along the suburban diversity frontier between homogeneous rural areas and multicultural cities.

Michigan

Education-Driven Volatility

Bachelor’s degree attainment is Michigan’s key signal. The college-educated suburban swing, especially around Detroit and Grand Rapids, defines MI volatility.

North Carolina

Poverty-Driven Volatility

Economic distress dominates. NC’s most volatile counties are the High-Poverty Rural (Cluster 3) communities concentrated in the eastern part of the state.

Strategic Implication

The Power of State-Specific Models

While our pooled model captures broad patterns shared across the three states, state-level models unlock a deeper layer of precision. By tailoring features to each state’s unique electorate, we can build sharper predictions and more targeted campaign strategies.

Cross-Method Convergence: Classification Meets Regression

The most compelling validation of these state-specific findings comes from an unexpected place: our regression models, developed independently from the classification pipeline, converge on the same top predictors per state. Classification uses SHAP feature importance from XGBoost; regression uses stepwise feature selection with R² scoring. Two completely different methods, same answer.

State | Classification Top Feature (SHAP) | Regression Top Feature (Stepwise) | Agreement?
Pennsylvania | Racial Diversity Index | Racial Diversity Index | ✓ Match
Michigan | % with Bachelor’s Degree+ | % with Bachelor’s Degree+ | ✓ Match
North Carolina | % Below Poverty Line | % Below Poverty Line | ✓ Match

This three-for-three convergence is striking. When a nonlinear tree ensemble (XGBoost) and a linear model (OLS) independently surface the same dominant feature for each state, the signal is robust and not an artifact of a single modeling choice. Pennsylvania’s volatility genuinely hinges on racial diversity, Michigan’s on education levels, and North Carolina’s on poverty rates.

One feature bridges all three states across both methods: Median Gross Rent. It appears in the pooled regression as the single strongest coefficient (β = 0.879) and consistently ranks among the top SHAP features in the pooled classification model. Housing cost encodes a unique combination of urbanization pressure, economic vulnerability, and demographic change, making it the closest thing to a universal volatility signal in our data.

The Diversity Threshold

We used SHAP (SHapley Additive exPlanations) to understand which features drive our XGBoost model’s predictions. One result stood out clearly. The normalized racial entropy index, introduced on our EDA page as Shannon Entropy, emerged as the single most important predictor of electoral volatility.
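For readers unfamiliar with the index, a normalized Shannon entropy can be computed as sketched below. Dividing by the log of the number of groups is a common normalization and is assumed here; the number of racial groups and the example shares are hypothetical.

```python
import math

def normalized_entropy(shares):
    """Shannon entropy of group shares, scaled to the 0-1 range.

    `shares` holds one non-negative count or proportion per group.
    Returns 0 when one group holds everyone and 1 when all groups are equal.
    """
    k = len(shares)
    total = sum(shares)
    if k < 2 or total <= 0:
        return 0.0
    # Groups with a zero share contribute nothing to the entropy.
    probs = [s / total for s in shares if s > 0]
    entropy = sum(-p * math.log(p) for p in probs)
    return entropy / math.log(k)  # log(k) is the maximum possible entropy

# Hypothetical counties described by five group shares.
homogeneous = [0.97, 0.01, 0.01, 0.005, 0.005]
mixed = [0.55, 0.30, 0.10, 0.03, 0.02]
even = [0.2, 0.2, 0.2, 0.2, 0.2]

print(round(normalized_entropy(homogeneous), 3))
print(round(normalized_entropy(mixed), 3))
print(round(normalized_entropy(even), 3))
```

The index rewards evenness rather than the size of any particular group, which is why it can differ sharply from single-group variables such as pct_black.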

SHAP summary plot for XGBoost showing feature importance with race_entropy_norm at top
SHAP summary plot: each dot is one county, with color indicating feature value (red = high, blue = low)
0.552 Shannon Entropy Threshold
7.15× Odds Ratio Above Threshold

Our Racial Diversity Index (based on Shannon Entropy) measures how evenly distributed a county’s population is across racial groups, on a 0–1 scale. At a score of 0.552, we found a clear tipping point. Counties below this level show little connection between diversity and volatility. But once a county crosses 0.552, its odds of being classified as high-volatility jump by a factor of 7.15. In practical terms, a county whose population is at least moderately mixed is far more likely to swing than a very homogeneous one.
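An odds ratio of this kind comes from a 2×2 table of threshold status against volatility class. The sketch below shows the arithmetic; the counts are invented for illustration and deliberately do not reproduce the 7.15 figure.

```python
def odds_ratio(vol_above, stable_above, vol_below, stable_below):
    """Odds of being volatile above the threshold divided by the odds below it."""
    return (vol_above * stable_below) / (stable_above * vol_below)

def risk_ratio(vol_above, stable_above, vol_below, stable_below):
    """Probability of being volatile above the threshold divided by that below it."""
    p_above = vol_above / (vol_above + stable_above)
    p_below = vol_below / (vol_below + stable_below)
    return p_above / p_below

# Hypothetical counts for 250 counties split at an entropy threshold.
counts = dict(vol_above=60, stable_above=40, vol_below=40, stable_below=110)

print(odds_ratio(**counts))            # 4.125
print(round(risk_ratio(**counts), 2))  # 2.25
```

The two measures differ whenever the outcome is common: in this toy table the odds are about 4 times higher above the threshold, while the probability is only about 2.3 times higher. An odds ratio should therefore be read as a statement about odds, not about how many times "more likely" a county is to be volatile.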

Visualization of the diversity threshold at 0.552 showing sharp volatility increase
The diversity threshold at 0.552: above this line, a county’s odds of being volatile are 7.15× higher
SHAP dependence plots for top 5 features showing nonlinear effects
SHAP dependence plots reveal nonlinear relationships, with a sharp inflection in the Racial Diversity Index

Diversity Meets Poverty

The threshold effect intensifies when diversity intersects with economic distress. High-diversity and high-poverty counties, dominated by the High-Poverty Rural (Cluster 3) archetype, show the strongest volatility signal in the entire dataset. This is exactly the profile of those 28 North Carolina counties: their average racial entropy sits well above the 0.552 threshold, and their poverty rates are the highest of any cluster. When both conditions are present together, volatility rises significantly more than either factor alone would predict.

Interaction plot showing diversity x poverty amplifying volatility
The diversity–poverty interaction: when both are high, volatility rises sharply
Hypothesis Confirmed

Composite Index > Individual Demographics

Our proposal hypothesized that a composite Racial Diversity Index would outperform any single demographic variable. SHAP confirms it: race_entropy_norm has higher feature importance than pct_black, pct_hispanic, or any other individual race variable.

Top Interactions

Feature Combinations That Matter

The three strongest SHAP interactions: (1) racial entropy × population density, (2) bachelor’s degree % × racial entropy, and (3) total population × population density. Diversity is in two of the top three.

Coming Next: Results

Our methods revealed five county archetypes, a 7.15× diversity threshold, and state-specific volatility fingerprints. But what does this mean for real-world strategy? The Results page will answer our original research questions with data.

Priority County Rankings
Which of the 250 counties should campaigns target first for maximum impact?
Research Question 1

Hypothesis Testing
Do wealth, diversity, and amplification hypotheses hold up against the full evidence?
Hypotheses 1–3

State-by-State Strategy
Tailored recommendations for PA, MI, and NC based on each state’s unique volatility drivers.
Research Questions 7–8

Temporal Consistency
Does the swing map shift between 2016, 2020, and 2024, or do the same counties keep swinging?
Research Question 8