Research Topic
- Analysis of swing-state county-level voting patterns in Presidential elections (2016–2024)
- Focus on Pennsylvania, Michigan, and North Carolina as high-leverage states in the Electoral College
- Integration of election results with census demographics to understand voter behavior
- Identification of counties with the highest electoral volatility between the two major parties
Project Goals
- Develop a model that identifies the swing-state counties with the largest fluctuations in voting patterns
- Identify demographic characteristics that correlate with electoral volatility
- Uncover combinations of socio-economic traits that characterize high-volatility electorates
- Provide actionable insights for political strategists and campaign managers
Scope
- Geographic: Three swing states — Pennsylvania, Michigan, North Carolina
- Temporal: Presidential elections from 2016 to 2024 (three election cycles)
- Granularity: County-level analysis (250 counties: 67 in Pennsylvania, 83 in Michigan, 100 in North Carolina)
- Data types: Election returns, demographic census data, derived volatility metrics
Data Sources
| Dataset | Provider | Size Estimate | Key Variables |
|---|---|---|---|
| County Presidential Election Returns | MIT Election Data & Science Lab / Harvard Dataverse | ~94,000 rows, 12 columns | county_fips, candidatevotes, totalvotes, party, year |
| County-Level Demographics | IPUMS NHGIS (University of Minnesota) | ~3,200 counties × 16 years × 50–100 variables | Education, income, race/ethnicity, age, population density |
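Both sources can be joined on county FIPS codes, so most of the preparation amounts to reducing the MIT returns to one Democratic margin per county and cycle. A minimal standard-library sketch follows. It assumes rows carrying the MIT column names listed above plus `state_po`, party labels `DEMOCRAT` and `REPUBLICAN`, and a margin computed against votes summed over all candidates; the function names and the `STATES`/`YEARS` constants are hypothetical.

```python
from collections import defaultdict

# Hypothetical study filters; state_po is assumed to hold two-letter postal codes.
STATES = {"PA", "MI", "NC"}
YEARS = {2016, 2020, 2024}


def normalize_fips(raw):
    """Return a zero-padded 5-digit county FIPS string, e.g. '1001.0' -> '01001'."""
    return f"{int(float(raw)):05d}"


def democratic_margins(rows):
    """Map (fips, year) -> (Democratic votes - Republican votes) / all candidate votes.

    `rows` is any iterable of dicts with MIT-style keys: year, state_po,
    county_fips, party, candidatevotes (values may be strings, as from csv).
    """
    votes = defaultdict(lambda: defaultdict(int))
    for r in rows:
        year = int(r["year"])
        if r["state_po"] not in STATES or year not in YEARS:
            continue
        try:
            fips = normalize_fips(r["county_fips"])
            n = int(float(r["candidatevotes"]))
        except (TypeError, ValueError):
            continue  # skip rows with a missing county code or vote count (e.g. "NA")
        # Summing per county-year-party collapses rows that are split by vote mode.
        votes[(fips, year)][r["party"]] += n
    margins = {}
    for key, by_party in votes.items():
        total = sum(by_party.values())
        if total > 0:
            dem = by_party.get("DEMOCRAT", 0)
            rep = by_party.get("REPUBLICAN", 0)
            margins[key] = (dem - rep) / total
    return margins
```

If a county carries both mode-level rows and a `TOTAL` row for the same year, filter to one of them before summing (an assumption worth checking against the actual file). The function can be fed directly from `csv.DictReader` over the downloaded CSV, and the NHGIS side can be keyed the same way by zero-padding its state and county codes (assumed here to be the `STATEA` and `COUNTYA` columns; confirm in the extract's codebook).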
Access & Constraints
- MIT Election Data: Free download, CSV format, no login required
- IPUMS NHGIS: Free for academic use, requires account registration, data selected via web GUI
- All datasets are publicly accessible and appropriate for academic research
Methodology
- K-Means Clustering: Segment swing-state counties into distinct archetypes based on wealth, education, housing, and demographic metrics to reveal structural groupings in the electorate
- Association Rule Mining: Identify frequent combinations of county-level socio-economic traits that co-occur with high electoral volatility and Democratic opportunity
- Volatility Metrics: Compute a custom "Volatility Score" measuring the county-level variability (standard deviation) of the Democratic vote margin across the 2016, 2020, and 2024 Presidential elections
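The K-Means step can be sketched with Lloyd's algorithm: z-score each county feature so that dollars and percentages carry comparable weight, seed the centroids with k-means++, then alternate assignment and centroid updates until assignments stop changing. This is a dependency-free illustration, not the project's implementation; in practice `sklearn.cluster.KMeans` applied to `StandardScaler` output would be the usual choice. The feature columns in the example (median household income, share with a bachelor's degree) are hypothetical.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each feature column so income, education, housing, etc. share one scale."""
    cols = list(zip(*rows))
    params = [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centroids)."""
    rng = random.Random(seed)
    # k-means++: each new seed is drawn with probability proportional to the
    # squared distance from the nearest seed already chosen.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
        centroids.append(list(pick))
    labels = None
    for _ in range(iters):
        # Assignment step: each county joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member counties.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.mean(dim) for dim in zip(*members)]
    return labels, centroids
```

The proposal does not fix the number of clusters `k`; comparing within-cluster error or silhouette scores across a few values of `k` is a common way to choose it.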
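The proposal does not name a rule-mining algorithm; Apriori is assumed here as the standard choice. Each county becomes a "transaction" of boolean trait labels, and only rules whose consequent is the volatility label are kept. The labels used below (`low_income`, `rural`, `high_volatility`, and so on) are hypothetical and would come from discretizing the demographic variables and the Volatility Score.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori: level-wise search for trait combinations with support >= min_support.

    Each transaction is the set of boolean trait labels true for one county.
    Returns {frozenset(items): support}.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(items):
        return sum(items <= b for b in baskets) / n

    level = {frozenset([i]) for b in baskets for i in b}
    level = {c for c in level if support(c) >= min_support}
    found, k = {}, 1
    while level:
        found.update({c: support(c) for c in level})
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return found


def rules_to(target, itemsets, min_confidence):
    """Rules A -> {target}, sorted by confidence, as (A, confidence, lift) tuples.

    confidence = supp(A ∪ {target}) / supp(A); lift = confidence / supp(target).
    """
    target_supp = itemsets.get(frozenset([target]))
    if not target_supp:
        return []
    rules = []
    for items, supp in itemsets.items():
        if target in items and len(items) > 1:
            antecedent = items - {target}
            confidence = supp / itemsets[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, confidence, confidence / target_supp))
    return sorted(rules, key=lambda r: r[1], reverse=True)
```

A lift above 1 means the antecedent traits co-occur with high volatility more often than their separate frequencies would predict. The support and confidence thresholds are tuning choices, not values taken from the proposal.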
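The Volatility Score can be computed directly from county-year margins. The sketch below assumes a dict keyed by `(fips, year)` and uses the population standard deviation over the three cycles, in line with the standard-deviation definition used in the hypotheses; the function names are hypothetical.

```python
import statistics


def volatility_scores(margins, years=(2016, 2020, 2024)):
    """Volatility Score per county: population standard deviation of the
    Democratic margin across `years`. Counties missing any cycle are skipped.

    `margins` maps (fips, year) -> margin (e.g. Dem share minus Rep share).
    """
    by_county = {}
    for (fips, year), margin in margins.items():
        by_county.setdefault(fips, {})[year] = margin
    scores = {}
    for fips, series in by_county.items():
        if all(y in series for y in years):
            scores[fips] = statistics.pstdev([series[y] for y in years])
    return scores


def rank_counties(scores):
    """Return (fips, score) pairs from most to least volatile."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Using the sample standard deviation instead would multiply every score by the same constant (√(3/2) for three cycles), so the county ranking would not change.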
Expected Outputs
- Ranked county volatility scores and classifications across Pennsylvania, Michigan, and North Carolina
- Cluster profiles characterizing distinct county archetypes with their demographic and economic signatures
- Association rules linking specific combinations of socio-economic traits to electoral volatility patterns
- Interactive visualizations and choropleth maps illustrating county-level voting shifts and cluster assignments
Research Questions & Hypotheses
- Which counties combine electoral elasticity with impact (vote volume), and therefore maximize the marginal utility of Democratic resource allocation?
- Which swing-state counties showed the highest electoral volatility across the 2016, 2020, and 2024 elections, measured by Democratic vote share (percentage) and raw Democratic vote volume?
- What specific demographic and economic features distinguish “High-Volatility” county clusters from “Stable” clusters?
- Hypothesis: Electoral volatility (defined as the standard deviation of the Democratic vote margin across 2016–2024) is inversely correlated with wealth. Counties in the bottom quartile of median household income will exhibit significantly higher variance in partisan swing than counties in the top quartile.
- Hypothesis: A composite ‘Racial Diversity Index’ will show higher feature importance than any single demographic share (e.g., ‘% Black’ or ‘% Hispanic’), suggesting that racial heterogeneity is a stronger predictor of Democratic vote share than any individual group's share.
- Hypothesis: High-volatility counties function as macro-trend amplifiers rather than independent outliers. They will shift in the same direction as the statewide or national trend but by a significantly greater magnitude, making them the highest-leverage targets for 2028.
- To what extent do demographic predictors of electoral volatility generalize across state lines? Specifically, does a model trained on high-variance counties in Pennsylvania successfully identify high-variance counties in Michigan, or are volatility drivers geographically distinct?
- Is electoral volatility temporally consistent? Specifically, does the geographic composition of the ‘Swing Map’ shift significantly across the 2016, 2020, and 2024 cycles?
- What is the statistical correlation between voter turnout variance and vote share volatility?
Potential Bias & Limitations
| Limitation | Impact | Mitigation Strategy |
|---|---|---|
| Representation of individuals (ecological fallacy) | Using county demographics as a proxy for individual voting behavior may surface trends that do not exist at the individual level | Seek additional data sources that break down voting habits by demographic group; interpret findings as county-level associations, not individual behavior |
| Fluctuation of voting trends | 2016 and 2020 data may be less predictive than 2024 data because of a changing political landscape | Weight 2024 data more heavily in models; test the temporal stability of patterns |
| Limited Geographic Scope | Focus on three states may miss relevant patterns in other swing states or emerging battlegrounds | Acknowledge limitations; design methodology that can be extended to other states |
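One way to apply the "weight 2024 data more heavily" mitigation is a weighted standard deviation of each county's Democratic margin. The weights below (2024 counted double) are purely illustrative assumptions, as is the function name.

```python
import math

# Hypothetical weights: 2024 counts double. Illustrative only, not calibrated.
CYCLE_WEIGHTS = {2016: 1.0, 2020: 1.0, 2024: 2.0}


def weighted_volatility(series, weights=CYCLE_WEIGHTS):
    """Weighted population standard deviation of a county's Democratic margin.

    `series` maps year -> margin. With equal weights this reduces to the
    unweighted population standard deviation.
    """
    total_w = sum(weights[y] for y in series)
    mean = sum(weights[y] * m for y, m in series.items()) / total_w
    var = sum(weights[y] * (m - mean) ** 2 for y, m in series.items()) / total_w
    return math.sqrt(var)
```

Because the weighted mean is pulled toward the 2024 margin, a county whose 2024 result departs sharply from its earlier results scores higher here than under equal weights.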
Future Directions
- Which types of candidates would appeal most in these high-volatility counties?
- Could these results be applied to non-presidential elections such as gubernatorial, Senate, or House races?
- Can we predict voting patterns in other states based on the trends we find in our data?
Project Timeline
| Milestone | Deliverables | Due Date |
|---|---|---|
| Milestone 1 | Project Framing & Website Launch | February 9, 2026 |
| Milestone 2 | Data Collection & Exploratory Analysis | March 6, 2026 |
| Milestone 3 | Methods & Model Development | April 3, 2026 |
| Milestone 4 | Conclusion, Results & Final Report | April 17, 2026 |