Proposal | Swing State Election Analysis

Research Topic

Analysis of swing-state county-level voting patterns in Presidential elections (2016–2024)
Focus on Pennsylvania, Michigan, and North Carolina as high-leverage Electoral College states
Integration of election results with census demographics to understand voter behavior
Identification of counties with highest electoral volatility between major parties

Project Goals

Develop a model that identifies the swing-state counties with the largest fluctuations in voting patterns
Identify demographic characteristics that correlate with electoral volatility
Uncover combinations of socio-economic traits that characterize high-volatility electorates
Provide actionable insights for political strategists and campaign managers

Scope

Geographic: Three swing states — Pennsylvania, Michigan, North Carolina
Temporal: Presidential elections from 2016 to 2024 (three election cycles)
Granularity: County-level analysis (250 counties across the three states: 67 in Pennsylvania, 83 in Michigan, 100 in North Carolina)
Data types: Election returns, demographic census data, derived volatility metrics

Data Sources

Dataset	Provider	Size Estimate	Key Variables
County Presidential Election Returns	MIT Election Data & Science Lab / Harvard Dataverse	~94,000 rows, 12 columns	county_fips, candidatevotes, totalvotes, party, year
County-Level Demographics	IPUMS NHGIS (University of Minnesota)	~3,200 counties × 16 years × 50–100 variables	Education, income, race/ethnicity, age, population density
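As a rough illustration of how the key variables in the election returns could be turned into a per-county Democratic margin, the sketch below reads rows in that shape and computes (Democratic votes - Republican votes) / total votes for each county and year. The inline sample data is invented, and the `state_po` column name and the `DEMOCRAT` / `REPUBLICAN` party labels are assumptions that should be checked against the downloaded file. It also assumes `totalvotes` holds the county-wide total on every row.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the shape of the key variables listed above.
# state_po and the party labels are assumptions, not confirmed by the proposal.
SAMPLE = """year,state_po,county_fips,party,candidatevotes,totalvotes
2016,PA,42003,DEMOCRAT,600,1000
2016,PA,42003,REPUBLICAN,380,1000
2020,PA,42003,DEMOCRAT,300,1100
2020,PA,42003,DEMOCRAT,350,1100
2020,PA,42003,REPUBLICAN,420,1100
2020,OH,39001,DEMOCRAT,100,400
"""

STATES = {"PA", "MI", "NC"}
YEARS = {2016, 2020, 2024}


def democratic_margins(csv_file):
    """Return {(county_fips, year): (dem - rep) / totalvotes}."""
    votes = defaultdict(lambda: defaultdict(int))  # key -> party -> votes
    totals = {}
    for row in csv.DictReader(csv_file):
        year = int(row["year"])
        if row["state_po"] not in STATES or year not in YEARS:
            continue  # outside the proposal's scope
        key = (row["county_fips"], year)
        # Summing covers files that spread one party's votes over several rows.
        votes[key][row["party"]] += int(row["candidatevotes"])
        totals[key] = int(row["totalvotes"])
    return {
        key: (by_party["DEMOCRAT"] - by_party["REPUBLICAN"]) / totals[key]
        for key, by_party in votes.items()
        if totals[key] > 0
    }


margins = democratic_margins(io.StringIO(SAMPLE))
for (fips, year), margin in sorted(margins.items()):
    print(fips, year, round(margin, 4))
```

Keeping `county_fips` as a string preserves leading zeros, which matters when joining to the census tables later.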

Access & Constraints

MIT Election Data: Free download, CSV format, no login required
IPUMS NHGIS: Free for academic use, requires account registration, data selected via web GUI
All datasets are publicly accessible and appropriate for academic research

Methodology

K-Means Clustering: Segment swing-state counties into distinct archetypes based on wealth, education, housing, and demographic metrics to reveal structural groupings in the electorate
Association Rule Mining: Identify frequent combinations of county-level socio-economic traits that co-occur with high electoral volatility and Democratic opportunity
Volatility Metrics: Generate a custom "Volatility Score" measuring the county-level standard deviation of the Democratic Party vote margin across the 2016, 2020, and 2024 Presidential elections
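The K-Means step might look roughly like the following. This is a minimal pure-Python sketch of Lloyd's algorithm, not the project's implementation; in practice a library such as scikit-learn would likely be used. The three features (median income, % with a bachelor's degree, % owner-occupied housing) and all values are hypothetical. Features are z-scored first so that dollar-valued columns do not dominate the distance.

```python
import math
import random


def standardize(rows):
    """Z-score each column so no single unit (dollars vs. percent) dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical counties: [median income, % bachelor's, % owner-occupied]
counties = [
    [42000, 18, 72], [45000, 20, 70], [43000, 17, 74],
    [88000, 48, 55], [92000, 52, 58], [85000, 45, 60],
]
labels = kmeans(standardize(counties), k=2)
print(labels)
```

Cluster numbering depends on the random initialization, so only the grouping (which counties share a label) is meaningful, not the label values themselves.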

Expected Outputs

Ranked county volatility scores and classifications across Pennsylvania, Michigan, and North Carolina
Cluster profiles characterizing distinct county archetypes with their demographic and economic signatures
Association rules linking specific combinations of socio-economic traits to electoral volatility patterns
Interactive visualizations and choropleth maps illustrating county-level voting shifts and cluster assignments
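To make the association-rule output concrete, the sketch below computes the three standard rule metrics (support, confidence, lift) for a single candidate rule. It assumes each county has already been reduced to a set of binned traits; the trait names, the bins, and the six example counties are all hypothetical. A full analysis would search over many candidate rules with an algorithm such as Apriori rather than score one by hand.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent, consequent = set(antecedent), set(consequent)
    n_a = sum(antecedent <= t for t in transactions)        # antecedent holds
    n_c = sum(consequent <= t for t in transactions)        # consequent holds
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical counties, each reduced to a set of binned traits.
counties = [
    {"low_income", "low_college", "high_volatility"},
    {"low_income", "low_college", "high_volatility"},
    {"low_income", "high_college"},
    {"high_income", "high_college"},
    {"high_income", "low_college", "high_volatility"},
    {"high_income", "high_college"},
]

support, confidence, lift = rule_metrics(
    counties, {"low_income", "low_college"}, {"high_volatility"}
)
print(support, confidence, lift)
```

A lift above 1 means the trait combination co-occurs with high volatility more often than it would if the two were independent.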

Research Questions Summary

Which counties combine electoral elasticity with impact (vote volume), maximizing the marginal utility of Democratic resource allocation?
Which swing-state counties had the highest electoral volatility across the 2016, 2020, and 2024 elections, measured by Democratic vote share and by number of Democratic votes?
What specific demographic and economic features distinguish “High-Volatility” county clusters from “Stable” clusters?
Hypothesis: Electoral volatility (defined as the standard deviation of vote margin, 2016–2024) is inversely correlated with wealth. Counties in the bottom quartile of median household income will exhibit significantly higher variance in partisan swing than counties in the top quartile.
Hypothesis: A composite ‘Racial Diversity Index’ will demonstrate higher feature importance than any single demographic (e.g., ‘% Black’ or ‘% Hispanic’), suggesting that racial heterogeneity is a stronger predictor of Democratic vote share than individual demographics.
Hypothesis: High-volatility counties function as macro-trend amplifiers rather than independent outliers. These counties will shift in the same direction as the statewide or national trend but with significantly greater magnitude, making them the highest-leverage targets for 2028.
To what extent do demographic predictors of electoral volatility generalize across state lines? Specifically, does a model trained on high-variance counties in Pennsylvania successfully identify high-variance counties in Michigan, or are volatility drivers geographically distinct?
Is electoral volatility temporally consistent, or does the geographic composition of the ‘Swing Map’ shift significantly between the 2016, 2020, and 2024 cycles?
How strongly is county-level variance in voter turnout correlated with vote share volatility?
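The volatility definition and the turnout question above can be sketched together. The example computes each county's volatility as the standard deviation of its Democratic margin over the three elections, computes the variance of its turnout rate, and then takes the Pearson correlation between the two across counties. All values are invented, and using the population (rather than sample) standard deviation over three elections is an assumption the proposal does not settle.

```python
import math
from statistics import pstdev, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented records: Democratic margin and turnout rate for 2016, 2020, 2024.
counties = {
    "A": {"margin": [0.10, 0.02, -0.06], "turnout": [0.55, 0.66, 0.60]},
    "B": {"margin": [0.20, 0.21, 0.19], "turnout": [0.62, 0.64, 0.63]},
    "C": {"margin": [-0.05, 0.03, -0.01], "turnout": [0.58, 0.65, 0.61]},
}

# Volatility: standard deviation of the margin across the three elections.
volatility = {name: pstdev(c["margin"]) for name, c in counties.items()}
# Turnout variance over the same three elections.
turnout_var = {name: pvariance(c["turnout"]) for name, c in counties.items()}

names = sorted(counties)
r = pearson([volatility[n] for n in names], [turnout_var[n] for n in names])

for name in sorted(volatility, key=volatility.get, reverse=True):
    print(name, round(volatility[name], 4))
print("r =", round(r, 3))
```

With only three invented counties the coefficient carries no statistical weight; the point is the shape of the calculation, which would run over all 250 counties.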

Potential Bias & Limitations

Limitation	Impact	Mitigation Strategy
Representation of individuals	Using county demographics as a proxy for individual voting behavior may identify trends that do not hold at the individual level (the ecological fallacy)	Seek additional data sources that break down voting habits by demographics; interpret findings as county-level associations only
Fluctuation of voting trends	2016 and 2020 data may be less predictive than 2024 data due to changing political landscape	Weight 2024 data more heavily in models; test temporal stability of patterns
Limited Geographic Scope	Focus on three states may miss relevant patterns in other swing states or emerging battlegrounds	Acknowledge limitations; design methodology that can be extended to other states
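One possible way to weight 2024 more heavily, as the mitigation for fluctuating trends suggests, is a weighted standard deviation of the margin. The sketch below is only one option; the weights (1, 1, 2) and the margins are hypothetical, and the proposal does not specify a weighting scheme.

```python
import math
from statistics import pstdev


def weighted_volatility(margins, weights):
    """Weighted standard deviation of margins around their weighted mean."""
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, margins)) / total
    variance = sum(w * (m - mean) ** 2 for w, m in zip(weights, margins)) / total
    return math.sqrt(variance)


margins = [0.10, 0.02, -0.06]          # hypothetical 2016, 2020, 2024 margins
recent_heavy = weighted_volatility(margins, [1, 1, 2])   # 2024 counts double
unweighted = weighted_volatility(margins, [1, 1, 1])     # same as pstdev

print(round(recent_heavy, 4), round(unweighted, 4))
```

With equal weights the function reduces to the ordinary population standard deviation, so the weighted and unweighted scores can be compared directly when testing temporal stability.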

Future Directions

Which types of candidates would appeal most in these high-volatility counties?
Could these results be applied to non-presidential elections such as gubernatorial, Senate, or House races?
Can we predict voting patterns in other states based on the trends we find in our data?

Project Timeline

Milestone	Deliverables	Due Date
Milestone 1	Project Framing & Website Launch	February 9, 2026
Milestone 2	Data Collection & Exploratory Analysis	March 6, 2026
Milestone 3	Methods & Model Development	April 3, 2026
Milestone 4	Conclusion, Results & Final Report	April 17, 2026
