Proposal Overview

Quick Reference Guide

A summarized view of our research topic, scope, methodology, and key questions.

Research Topic

  • Analysis of swing-state county-level voting patterns in Presidential elections (2016–2024)
  • Focus on Pennsylvania, Michigan, and North Carolina as high-leverage Electoral College states
  • Integration of election results with census demographics to understand voter behavior
  • Identification of counties with highest electoral volatility between major parties

Project Goals

  • Develop a model that identifies swing-state counties with largest voting pattern fluctuations
  • Identify demographic characteristics that correlate with electoral volatility
  • Uncover combinations of socio-economic traits that characterize high-volatility electorates
  • Provide actionable insights for political strategists and campaign managers

Scope

  • Geographic: Three swing states — Pennsylvania, Michigan, North Carolina
  • Temporal: Presidential elections from 2016 to 2024 (three election cycles)
  • Granularity: County-level analysis (250 counties: 67 in Pennsylvania, 83 in Michigan, 100 in North Carolina)
  • Data types: Election returns, demographic census data, derived volatility metrics

Data Sources

Dataset | Provider | Size Estimate | Key Variables
County Presidential Election Returns | MIT Election Data & Science Lab / Harvard Dataverse | ~94,000 rows, 12 columns | county_fips, candidatevotes, totalvotes, party, year
County-Level Demographics | IPUMS NHGIS (University of Minnesota) | ~3,200 counties × 16 years × 50–100 variables | Education, income, race/ethnicity, age, population density
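
As a rough sketch of how the election returns could be reduced to one Democratic margin per county and year, the snippet below filters rows to the three study states by state FIPS prefix and sums votes by party. It uses only the columns listed above. The party labels "DEMOCRAT" and "REPUBLICAN", the assumption that totalvotes repeats the county total on every row, and all sample rows are illustrative assumptions, not confirmed details of the dataset.

```python
import csv
import io
from collections import defaultdict

# State FIPS prefixes for the three study states.
STATE_FIPS = {"42": "PA", "26": "MI", "37": "NC"}


def democratic_margins(rows):
    """Return {(county_fips, year): margin}, with margin = (D - R) / total votes."""
    votes = defaultdict(lambda: defaultdict(int))  # (fips, year) -> party -> votes
    totals = {}                                    # (fips, year) -> total votes
    for row in rows:
        fips = row["county_fips"].zfill(5)         # restore leading zeros if dropped
        if fips[:2] not in STATE_FIPS:
            continue                               # outside the three study states
        key = (fips, int(row["year"]))
        # Sum per party in case a county reports several rows for one party.
        votes[key][row["party"]] += int(row["candidatevotes"])
        totals[key] = int(row["totalvotes"])       # ASSUMED identical on every row
    return {
        key: (by_party["DEMOCRAT"] - by_party["REPUBLICAN"]) / totals[key]
        for key, by_party in votes.items()
        if totals[key] > 0
    }


# Hypothetical rows, for illustration only.
SAMPLE_CSV = """county_fips,year,party,candidatevotes,totalvotes
42001,2020,DEMOCRAT,400,1000
42001,2020,REPUBLICAN,550,1000
42001,2020,OTHER,50,1000
37001,2020,DEMOCRAT,300,1000
37001,2020,DEMOCRAT,200,1000
37001,2020,REPUBLICAN,450,1000
06001,2020,DEMOCRAT,900,1000
"""

margins = democratic_margins(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for (fips, year), margin in sorted(margins.items()):
    print(fips, year, round(margin, 3))
```

The real file should be checked for how county_fips is stored and whether totalvotes is reported per row or per county before relying on this aggregation.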

Access & Constraints

  • MIT Election Data: Free download, CSV format, no login required
  • IPUMS NHGIS: Free for academic use, requires account registration, data selected via web GUI
  • All datasets are publicly accessible and appropriate for academic research

Methodology

  • K-Means Clustering: Segment swing-state counties into distinct archetypes based on wealth, education, housing, and demographic metrics to reveal structural groupings in the electorate
  • Association Rule Mining: Identify frequent combinations of county-level socio-economic traits that co-occur with high electoral volatility and Democratic opportunity
  • Volatility Metrics: Generate a custom "Volatility Score" measuring the county-level spread (standard deviation) of the Democratic Party voting margin across the 2016, 2020, and 2024 Presidential elections
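
The scoring and rule-mining steps above can be sketched as follows. The sketch assumes the Volatility Score is the population standard deviation of a county's Democratic margin over the three elections. The county names, margins, trait labels, and the 0.03 cutoff for "high volatility" are all hypothetical. The rule metrics (support, confidence, lift) are the standard association-rule definitions, computed by hand here rather than with a mining library.

```python
from statistics import pstdev

# Hypothetical Democratic margins for 2016, 2020, and 2024.
margins = {
    "County A": [-0.10, 0.02, -0.08],
    "County B": [0.30, 0.31, 0.29],
    "County C": [-0.20, -0.05, -0.18],
    "County D": [0.01, 0.00, 0.02],
}

# Hypothetical binary socio-economic traits per county.
traits = {
    "County A": {"low_income", "low_college"},
    "County B": {"high_density"},
    "County C": {"low_income", "low_college", "rural"},
    "County D": {"low_income"},
}

# Volatility Score: spread of the margin across the three elections.
volatility = {county: pstdev(values) for county, values in margins.items()}
ranked = sorted(volatility, key=volatility.get, reverse=True)

HIGH_VOLATILITY_CUTOFF = 0.03  # hypothetical threshold
high = {county for county, score in volatility.items() if score > HIGH_VOLATILITY_CUTOFF}


def rule_metrics(antecedent, traits, flagged):
    """Support, confidence, and lift of the rule: antecedent -> county is flagged."""
    n = len(traits)
    matches = [c for c, t in traits.items() if antecedent <= t]
    both = [c for c in matches if c in flagged]
    support = len(both) / n
    confidence = len(both) / len(matches) if matches else 0.0
    base_rate = len(flagged) / n
    lift = confidence / base_rate if base_rate else 0.0
    return support, confidence, lift


support, confidence, lift = rule_metrics({"low_income", "low_college"}, traits, high)
print(ranked, support, confidence, lift)
```

With only three elections per county, the sample standard deviation (statistics.stdev) would give larger scores than pstdev but the same ranking, so the choice mainly matters if scores are compared against a fixed threshold.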

Expected Outputs

  • Ranked county volatility scores and classifications across Pennsylvania, Michigan, and North Carolina
  • Cluster profiles characterizing distinct county archetypes with their demographic and economic signatures
  • Association rules linking specific combinations of socio-economic traits to electoral volatility patterns
  • Interactive visualizations and choropleth maps illustrating county-level voting shifts and cluster assignments
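
To make the cluster-profile output concrete, here is a minimal K-Means sketch using only the standard library. It is not the project's implementation; in practice a library such as scikit-learn's KMeans would likely be used. The two features, their values, and k=2 are hypothetical. Features are z-scored first so that income does not dominate the distance, and each profile is the per-cluster mean of the raw features.

```python
import random
from statistics import mean, pstdev

# Hypothetical counties: (median household income in $1,000s, % with a bachelor's degree).
features = [(45, 15), (48, 18), (50, 17), (85, 42), (90, 45), (95, 40)]


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


labels = kmeans(standardize(features), k=2)

# Cluster profile: mean of each raw feature within each cluster.
profiles = {
    label: [mean(col) for col in zip(*(f for f, l in zip(features, labels) if l == label))]
    for label in set(labels)
}
print(labels, profiles)
```

Cluster numbering is arbitrary and depends on the random start, so profiles should be interpreted by their feature means rather than by label.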

Research Questions Summary

  • Which counties combine high electoral elasticity with high impact (vote volume), maximizing the marginal utility of Democratic resource allocation?
  • Which swing-state counties showed the highest electoral volatility across the 2016, 2020, and 2024 elections, measured by Democratic vote share and Democratic vote count?
  • What specific demographic and economic features distinguish “High-Volatility” county clusters from “Stable” clusters?
  • Hypothesis: Electoral volatility (defined as the standard deviation of vote margin, 2016–2024) is inversely correlated with wealth. Counties in the bottom quartile of median household income will exhibit significantly higher variance in partisan swing than counties in the top quartile.
  • Hypothesis: A composite ‘Racial Diversity Index’ will demonstrate higher feature importance than any single demographic (e.g., ‘% Black’ or ‘% Hispanic’), suggesting that racial heterogeneity is a stronger predictor of Democratic vote share than the share of any individual group.
  • Hypothesis: High-volatility counties function as macro-trend amplifiers rather than independent outliers. These counties will shift in the same direction as the statewide or national trend but with significantly greater magnitude, making them the highest-leverage targets for 2028.
  • To what extent do demographic predictors of electoral volatility generalize across state lines? Specifically, does a model trained on high-variance counties in Pennsylvania successfully identify high-variance counties in Michigan, or are volatility drivers geographically distinct?
  • Is electoral volatility temporally consistent, or does the geographic composition of the ‘Swing Map’ shift significantly between the 2016, 2020, and 2024 cycles?
  • What is the statistical correlation between voter turnout variance and vote share volatility?
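
The income comparison in the list above (bottom versus top quartile of median household income) could be checked along these lines. All income and volatility values are hypothetical, and the quartile cut points use the default "exclusive" method of statistics.quantiles. The sketch only reports the gap in mean volatility; it does not test significance, which would need something like a permutation test or a t-test.

```python
from statistics import mean, quantiles

# Hypothetical (median household income in USD, volatility score) pairs.
counties = [
    (38_000, 0.070), (41_000, 0.065), (45_000, 0.050), (52_000, 0.045),
    (58_000, 0.040), (63_000, 0.030), (71_000, 0.025), (88_000, 0.020),
]

incomes = [income for income, _ in counties]
q1, _, q3 = quantiles(incomes, n=4)  # first, second, and third quartile cut points

# Volatility scores of counties at or below Q1 and at or above Q3.
bottom = [score for income, score in counties if income <= q1]
top = [score for income, score in counties if income >= q3]

# A positive gap means lower-income counties are more volatile on average.
gap = mean(bottom) - mean(top)
print(q1, q3, round(gap, 4))
```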

Potential Bias & Limitations

Limitation | Impact | Mitigation Strategy
Representation of individuals | Using county demographics as a proxy for individual voting behavior may identify trends that do not exist at the individual level (the ecological fallacy) | Seek additional data sources that break down voting habits by demographics; interpret findings carefully
Fluctuation of voting trends | 2016 and 2020 data may be less predictive than 2024 data due to a changing political landscape | Weight 2024 data more heavily in models; test temporal stability of patterns
Limited geographic scope | Focus on three states may miss relevant patterns in other swing states or emerging battlegrounds | Acknowledge limitations; design methodology that can be extended to other states

Future Directions

  • Which types of candidates would appeal most in these high-volatility counties?
  • Could these results be applied to non-presidential elections such as gubernatorial, Senate, or House races?
  • Can we predict voting patterns in other states based on the trends we find in our data?

Project Timeline

Milestone | Deliverables | Due Date
Milestone 1 | Project Framing & Website Launch | February 9, 2026
Milestone 2 | Data Collection & Exploratory Analysis | March 6, 2026
Milestone 3 | Methods & Model Development | April 3, 2026
Milestone 4 | Conclusion, Results & Final Report | April 17, 2026