2008 Presidential Election Data from Texas: [ Harvard Election Data Archive ]
ACS 2010 at Tract Level: [ US Census ]
Voting District to Census Tract Crosswalk: [ Missouri Census Data Center ]
2010 Nonprofit Firms and Locations: [ NCCS Core Files ]
The goal of the study is to build a sample framework whereby we can compare nonprofits from super-majority voting districts in order to determine how nonprofit missions vary by the political ideology of the communities in which they are located.
Since Republican and Democratic super-majority voting districts typically look very different (suburban and white versus urban and diverse), we must first match them on demographics in order to create balanced sets of voting districts (similar poverty rates, percentage minorities, and population density in each) that differ primarily in political ideology.
The replication files here explain the process of linking voting data to census data, finding voting districts that are very similar demographically but very different in voting patterns, and then locating nonprofits within those districts to analyze differences in mission.
TRACT represents Census tracts, the geographic units for which the Census reports demographic data about the population. Voting districts are the geographic units of aggregation for voting data within states. These two datasets need to be merged in order to have both demographic and voting data.
The nonprofit firm database comes from tax records that have been compiled by the Urban Institute (NCCS Core files). The unit of analysis of the study is nonprofit missions. We filter voting districts to eliminate those that are not supermajority districts, then further eliminate voting districts that cannot be matched to doppelgangers (for each Republican supermajority district, find one Democratic supermajority district that is a demographic “twin”). Then add the nonprofits in those districts to the sample. Repeat for all districts that can be reasonably matched.
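The matching code itself is not reproduced in this section, so the following is only a generic sketch of one way to find demographic “twins”: estimate a propensity score with logistic regression, then pair each Republican district with the nearest unused Democratic district. The data are simulated, and the variable names, the caliper of 0.05, and the greedy pairing rule are all assumptions made for illustration, not the procedure used in the paper.

```r
# Simulated supermajority districts (all values are made up for illustration)
set.seed( 42 )
n <- 200
d <- data.frame( id       = 1:n,
                 poverty  = runif( n, 0, 40 ),
                 minority = runif( n, 0, 100 ),
                 density  = rlnorm( n, 6, 1 ) )

# Simulated party label that depends on demographics
p.rep <- plogis( 1 - 0.03*d$poverty - 0.02*d$minority )
d$republican <- rbinom( n, 1, p.rep )

# Step 1: propensity score = P( republican | demographics )
ps.model <- glm( republican ~ poverty + minority + log( density ),
                 data=d, family=binomial )
d$pscore <- fitted( ps.model )

# Step 2: greedy 1:1 nearest-neighbor matching without replacement;
# pairs whose scores differ by more than the caliper are discarded
caliper <- 0.05    # hypothetical tolerance
rep.d <- d[ d$republican == 1, ]
dem.d <- d[ d$republican == 0, ]
pairs <- data.frame( rep.id=integer(0), dem.id=integer(0) )

for( i in seq_len( nrow( rep.d ) ) )
{
  if( nrow( dem.d ) == 0 ) break
  gap <- abs( dem.d$pscore - rep.d$pscore[i] )
  j   <- which.min( gap )
  if( gap[j] <= caliper )
  {
    pairs <- rbind( pairs,
                    data.frame( rep.id=rep.d$id[i], dem.id=dem.d$id[j] ) )
    dem.d <- dem.d[ -j, ]    # each Democratic district is used at most once
  }
}

# Balanced sample: only districts that found a twin
matched <- d[ d$id %in% c( pairs$rep.id, pairs$dem.id ), ]
table( matched$republican )
```

Comparing group means of poverty, minority share, and density in `matched` against the same means in `d` is a quick way to check whether the pairing improved balance.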
The steps we followed are reported in Appendix A of the paper. They show the process for arriving at the 125 nonprofit mission statements that are coded for the paper.
8,400 total voting districts in Texas
3,513 census tracts in Texas
Of the 680 super-majority voting districts that can be linked to census data, the matching procedure generated 102 districts in the balanced sample (51 Republican and 51 Democratic).
Of the 22,295 nonprofits in the state, 323 are located in the matched supermajority districts and were used for analysis of NTEE codes and comparison of revenue and nonprofit age.
Of these 323 nonprofits located in the matched voting districts, we were able to find mission statements listed on websites for 125.
UPDATES
In the process of putting together replication files we discovered one minor error in reporting the sample framework:
Appendix A reports 3,513 census tracts in Texas in 2010.
There are actually 5,265 census tracts in Texas 2010, but only 3,513 in the voter-district-census-tract crosswalk obtained through the Missouri Census Data Center (see below).
We were also able to fix more voter district IDs to increase the merge rate in these replication examples. Compared to Appendix A in the paper (reported above), we now have 900 Republican and 738 Democratic supermajority voting districts in the merged data.
In total that equals 1,638 supermajority voting districts that can be linked to primary census tracts, more than the 680 super-majority voting districts originally reported in Appendix A.
These corrections do not change the results of the study, since the results represent a comparison of nonprofits from 102 matched and demographically balanced voting districts (51 Republican and 51 Democratic). Even though the matched sample was drawn from a smaller pool of districts, it retains the properties that matter for achieving high internal validity with propensity score matching: only the “twins” in the data are compared, not the full sample.
The updates are included here for anyone that wants to extend the study.
This study uses 2010 American Community Survey data from the US Census.
You can find codes for variable names at the Census API site:
https://api.census.gov/data/2010/acs/acs5/variables.html
For details on poverty measures see:
https://www.socialexplorer.com/data/ACS2013_5yr/metadata/?ds=ACS13_5yr&table=B17001
NOTE: The original study uses the acs package but I would highly recommend using Hannah Recht’s awesome censusapi package. It is much easier to use!
You will need to get a free Census API key: https://api.census.gov/data/key_signup.html
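library( censusapi )   # getCensus()
library( dplyr )       # %>%, mutate(), select(), filter()
library( stargazer )   # summary statistics table

# my.censuskey is a character string holding your Census API key
census <- getCensus( name="acs/acs5",
                     vintage=2010,
                     key=my.censuskey,
                     vars=c( "NAME",
                             "B01002A_001E",   # median age (B01002A is the White-alone table; B01002_001E covers all races)
                             "B19013_001E",    # median household income
                             "B01003_001E",    # total population
                             "B17001_002E",    # poverty
                             "B17001_001E",    # population used for pov
                             "B03003_003E",    # hispanic
                             "B02001_002E",    # race.white
                             "B02001_003E"),   # race.black
                     region="tract:*",
                     regionin="state:48")      # texas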
names( census )
## [1] "state" "county" "tract" "NAME" "B01002A_001E"
## [6] "B19013_001E" "B01003_001E" "B17001_002E" "B17001_001E" "B03003_003E"
## [11] "B02001_002E" "B02001_003E"
census$geoid <- paste0( census$state, census$county, census$tract )
names( census ) <- c("state","county","tract","NAME",
"medianage","income",
"totalpop","poverty","povbase",
"hispanic","white","black",
"geoid")
# Remove missing values
census$income[ census$income == -666666666 ] <- NA
census$medianage[ census$medianage == -666666666 ] <- NA
# Delete zero population cases so rates are finite
census$totalpop[ census$totalpop == 0 ] <- NA
census$povbase[ census$povbase == 0 ] <- NA
# Calculating rates and percentages
census <-
  census %>%
  mutate( poverty  = round( 100*(poverty/povbase), 2),
          hispanic = round( 100*(hispanic/totalpop), 2),
          white    = round( 100*(white/totalpop), 2),
          black    = round( 100*(black/totalpop), 2) )
# drop extra pop variable
census <- select( census, -povbase )
head( census )
census %>%
  select( medianage, income, totalpop, poverty,
          hispanic, white, black ) %>%
  stargazer( type = "html", digits=0 )
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
medianage | 5,215 | 36 | 8 | 11 | 30 | 42 | 80 |
income | 5,209 | 52,713 | 28,259 | 6,140 | 33,831 | 63,284 | 250,001 |
totalpop | 5,224 | 4,654 | 2,241 | 24 | 3,100 | 5,826 | 25,073 |
poverty | 5,215 | 18 | 13 | 0 | 7 | 25 | 100 |
hispanic | 5,224 | 36 | 28 | 0 | 13 | 54 | 100 |
white | 5,224 | 72 | 20 | 0 | 63 | 87 | 100 |
black | 5,224 | 12 | 17 | 0 | 1 | 15 | 100 |
Capture the study data in case the API changes:
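The code for this step is not shown here. A minimal sketch of one way to snapshot the data follows, using a small stand-in data frame in place of `census`; the file name and the temporary directory are placeholders, not the paths used in the replication files.

```r
# Stand-in for the census data frame built above (values are illustrative)
census.demo <- data.frame( geoid  = c( "48001950100", "48001950401" ),
                           income = c( 52713, NA ),
                           stringsAsFactors=FALSE )

# Hypothetical output location; replace with a folder in your project
out.file <- file.path( tempdir(), "census-acs-2010-tx.csv" )
write.csv( census.demo, out.file, row.names=FALSE )

# Read geoid back as character so the 11-digit IDs stay as text
check <- read.csv( out.file, colClasses=c( geoid="character" ) )
identical( check$geoid, census.demo$geoid )
```

Reading the ID column back as character matters for later merges, since keys are compared as strings.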
Voting districts and census tracts do not share common boundaries, so merging voting data and census data can be tricky. The Missouri Census Data Center has created a tool that maps voting districts to census tracts using geographic apportionment. You can visit the MABLE Geocorr14 Geographic Correspondence Engine here:
http://mcdc.missouri.edu/websas/geocorr14.html
A correspondence table has been created by selecting the 2010 Census Tracts and Voting Tabulation Districts and is saved as the file “crosswalk.csv”.
Note, the variable pop10 comes from the crosswalk and refers to voting district population. The variable totalpop comes from the 2010 Census ACS and refers to the census tract population.
Since the relationships are not nested it will not be a one-to-one relationship, i.e. one voting district can match to multiple census tracts. As a result, we select the census tract for each voting district that has the highest apportionment rate (geographical overlap).
The mean apportionment rate is 89% (standard deviation of 17%), with a median of 100% overlap.
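Selecting the primary tract for each voting district can be sketched as follows. The toy crosswalk and its column names are assumptions: `afact` is used here as the name of the allocation-factor column, and the actual names in crosswalk.csv may differ.

```r
# Toy crosswalk: one row per voting district / tract overlap
xw <- data.frame(
  vtd   = c( "480010001", "480010001", "480010002", "480010003", "480010003" ),
  tract = c( "48001950100", "48001950200", "48001950100",
             "48001950300", "48001950401" ),
  afact = c( 0.85, 0.15, 1.00, 0.40, 0.60 ),   # share of the district in the tract
  stringsAsFactors=FALSE )

# Sort so the largest allocation factor comes first within each district,
# then keep only the first row per district
xw      <- xw[ order( xw$vtd, -xw$afact ), ]
primary <- xw[ !duplicated( xw$vtd ), ]

primary
summary( primary$afact )   # distribution of overlap for the retained tracts
```

Ties are broken by row order after sorting, so districts split evenly between two tracts deserve a manual look.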
crosswalk <- read.csv( "../DATA/01-raw-data/crosswalk.csv", colClasses="character" )
head( crosswalk )
Save the TX crosswalk for ease of sharing replication files:
crosswalk$state <- substr( crosswalk$county, 1, 2 )
table( crosswalk$state )
crosswalk.tx <- filter( crosswalk, state == "48" )
write.csv( crosswalk.tx, "../DATA/02-processed-data/VTDtoTractCrosswalkTX.csv", row.names=FALSE )
crosswalk$tract.key <- paste( crosswalk$county,
                              gsub( "\\.","", crosswalk$tract), sep="" )
head( crosswalk$tract.key )
## [1] "01001021000" "01001020500" "01001021000" "01001020900" "01001020900"
## [6] "01001020802"
Data was obtained from the Harvard Election Data Archive project, a source for 2008 presidential election results at the voting district level for all 50 states. Texas contains 8,400 separate voting districts (VTDs). In the 2008 election of John McCain versus Barack Obama, Texas had 1,451 Democratic supermajority districts and 2,886 Republican supermajority districts, representing 51% of all voting districts in the state.
http://projects.iq.harvard.edu/eda/
The data comes as a shapefile with historic voting data embedded, so we need to load the shapefile using the rgdal package in R and extract the historic voting data frame.
Select Data Dictionary:
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\jdlecy\Dropbox\04 - PAPERS\03 - Published\18 - Republican and Democratic Nonprofits\PUBLISHED\political-ideology-of-nonprofits\DATA\01-raw-data", layer: "Texas_VTD"
## with 8400 features
## It has 21 fields
## Integer64 fields read as strings: CNTY COLOR VTDKEY CNTYKEY Gov_D_02 Gov_R_02 Pres_D_04 Pres_R_04 Gov_D_06 Gov_R_06 Pres_D_08 Pres_R_08 Gov_D_10 Gov_R_10 vap
## [1] 8400
Voter district patterns.
dem.count <- as.numeric( as.character( tx$Pres_D_08 ))
rep.count <- as.numeric( as.character( tx$Pres_R_08 ))
dem <- dem.count / ( dem.count + rep.count )
sum( dem <= 0.3, na.rm=T ) # supermajority republican districts
## [1] 2893
sum( dem >= 0.7, na.rm=T ) # supermajority democratic districts
## [1] 1456
h <- hist( dem, breaks=100, plot=FALSE )
cuts <- cut( h$breaks, c(-0.1,0.3,0.7,1.1), labels=c("red","gray","steelblue") )
plot( h, col=as.character(cuts),
main="Percentage Voting for Obama by District",
yaxt="n", ylab="", xlab="Percent of Votes for Obama by District")
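The same thresholds can be wrapped in a small helper that labels each district. This is a sketch with made-up vote counts; the function name is hypothetical, and treating the 0.7 boundary as inclusive is an assumption (only the `dem <= 0.3` rule appears explicitly above).

```r
# Label districts by Democratic two-party vote share
classify.district <- function( dem.count, rep.count, lo=0.3, hi=0.7 )
{
  share <- dem.count / ( dem.count + rep.count )   # NaN when no votes were cast
  out   <- rep( "competitive", length( share ) )
  out[ which( share <= lo ) ] <- "republican supermajority"
  out[ which( share >= hi ) ] <- "democratic supermajority"
  out[ is.na( share ) ]       <- NA                # districts with zero votes
  out
}

dem.votes <- c( 120,  30, 700, 0 )
rep.votes <- c( 100, 270,  50, 0 )
labels    <- classify.district( dem.votes, rep.votes )
table( labels, useNA="ifany" )
```

Districts with no recorded votes come back as `NA`, which is the same set of cases that `na.rm=T` drops from the counts.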
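The vtdname variable in the Census-to-VTD crosswalk file and the vtdkey variable in the voting dataset are currently incompatible.
The vtdname variable has four forms:
Each should follow the format SS-CCC-DIST: a two-digit state FIPS code, a three-digit county FIPS code, and a four-character district ID (for example, 48-005-008B).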
## [1] "Vtng Dist 0018" "Vtng Dist 0006" "Vtng Dist 0012" "Vtng Dist 0024"
## [5] "Vtng Dist 0029" "Vtng Dist 0022" "Vtng Dist 0035" "48005008B"
## [9] "48005036B" "Vtng Dist 0036" "Vtng Dist 0001" "48005010B"
## [13] "48005014B" "Vtng Dist 0038" "Vtng Dist 0016" "Vtng Dist 0027"
## [17] "Vtng Dist 0031" "48005016B" "Vtng Dist 0019" "48005017B"
## [21] "48005011B" "Vtng Dist 0032" "Vtng Dist 004A" "Vtng Dist 001A"
## [25] "Vtng Dist 0010" "Vtng Dist 0007" "Vtng Dist 0009" "Vtng Dist 0003"
## [29] "Vtng Dist 0004" "Vtng Dist 0005" "Vtng Dist 0201" "Vtng Dist 0303"
## [33] "Vtng Dist 0202" "Vtng Dist 0402" "Vtng Dist 0301" "Vtng Dist 0101"
## [37] "Vtng Dist 0404" "Vtng Dist 0302" "Vtng Dist 0403" "Vtng Dist 0401"
## [41] "Vtng Dist 0011" "Vtng Dist 0023" "Vtng Dist 0020" "Vtng Dist 0014"
## [45] "Vtng Dist 0415" "Vtng Dist 0418" "Vtng Dist 0417" "Vtng Dist 0413"
## [49] "Vtng Dist 0312" "480150319"
To standardize the VTD IDs:
# Census Data
vtdnm <- census.dat$vtdname
vtdnm <- gsub( "Vtng Dist ", "xxxxx", vtdnm )
head( vtdnm, 50 )
## [1] "xxxxx0018" "xxxxx0006" "xxxxx0012" "xxxxx0024" "xxxxx0029" "xxxxx0022"
## [7] "xxxxx0035" "48005008B" "48005036B" "xxxxx0036" "xxxxx0001" "48005010B"
## [13] "48005014B" "xxxxx0038" "xxxxx0016" "xxxxx0027" "xxxxx0031" "48005016B"
## [19] "xxxxx0019" "48005017B" "48005011B" "xxxxx0032" "xxxxx004A" "xxxxx001A"
## [25] "xxxxx0010" "xxxxx0007" "xxxxx0009" "xxxxx0003" "xxxxx0004" "xxxxx0005"
## [31] "xxxxx0201" "xxxxx0303" "xxxxx0202" "xxxxx0402" "xxxxx0301" "xxxxx0101"
## [37] "xxxxx0404" "xxxxx0302" "xxxxx0403" "xxxxx0401" "xxxxx0011" "xxxxx0023"
## [43] "xxxxx0020" "xxxxx0014" "xxxxx0415" "xxxxx0418" "xxxxx0417" "xxxxx0413"
## [49] "xxxxx0312" "480150319"
# table( nchar( vtdnm ) ) # should all be 9 characters
vtd.temp <- substr( vtdnm, 6, 9 )
vtd.key1 <- paste0( census.dat$county, vtd.temp )
census.dat$vtd.key1 <- vtd.key1
head( census.dat$vtd.key1, 50 )
## [1] "480010018" "480010006" "480010012" "480010024" "480050029" "480050022"
## [7] "480050035" "48005008B" "48005036B" "480050036" "480050001" "48005010B"
## [13] "48005014B" "480050038" "480050016" "480050027" "480050031" "48005016B"
## [19] "480050019" "48005017B" "48005011B" "480050032" "48007004A" "48007001A"
## [25] "480090010" "480090007" "480090009" "480090003" "480090004" "480090005"
## [31] "480110201" "480110303" "480110202" "480110402" "480110301" "480110101"
## [37] "480110404" "480110302" "480110403" "480110401" "480130011" "480130023"
## [43] "480130020" "480130014" "480150415" "480150418" "480150417" "480150413"
## [49] "480150312" "480150319"
# Voting Data
# TX state fips = 48
fips <- 48000 + as.numeric( as.character( tx$CNTY ) )
vtd.key2 <- paste0( fips, as.character( tx$VTD ) )
# table( nchar( vtd.key2 ) ) # should all be 9 characters
# vtd.key2[ nchar( vtd.key2 ) == 10 ] # not sure about these 126
head( vtd.key2, 50 )
## [1] "484530218" "482010712" "481210408" "484530210" "480850054" "480294180"
## [7] "480292088" "484770205" "480294093" "480291098" "480294101" "480293098"
## [13] "480294098" "480291051" "480291036" "480291005" "480292005" "480293140"
## [19] "480292040" "480293040" "480291010" "480294036" "480292125" "480293125"
## [25] "480291018" "480292018" "480293018" "480291041" "480292054" "480293016"
## [31] "480292081" "480294081" "480291072" "480292049" "480293072" "480294072"
## [37] "480292020" "480293020" "480294020" "480294092" "480292113" "480293100"
## [43] "480294113" "480293097" "480294097" "480291035" "480291108" "480293131"
## [49] "480294144" "480293129"
tx$vtd.key2 <- vtd.key2
tx <- tx[ , c( "vtd.key2", "Pres_D_08", "Pres_R_08", "Shape_area" ) ]
head( tx )
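Both keys should end up as nine-character SS-CCC-DIST strings. As a cross-check, the construction can be sketched with explicit zero-padding and a length test. The helper names and input values are hypothetical; the last example shows how a five-character district ID produces a ten-character key like the unresolved cases noted in the comments above.

```r
# Left-pad strings with zeros to a fixed width
pad.left <- function( x, width )
{
  x <- as.character( x )
  paste0( strrep( "0", pmax( width - nchar( x ), 0 ) ), x )
}

# Build a key of the form SS + CCC + DIST (Texas state FIPS = 48)
make.vtd.key <- function( county, district, state="48" )
{
  paste0( state,
          sprintf( "%03d", as.integer( county ) ),
          pad.left( district, 4 ) )
}

keys <- make.vtd.key( county   = c( 1, 5, 201, 29 ),
                      district = c( "18", "008B", "0712", "10234" ) )
keys

# Flag keys that do not have the expected nine characters
keys[ nchar( keys ) != 9 ]
```

Keeping the malformed keys in a separate vector makes it easier to inspect them by hand before deciding how to repair or drop them.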
Of the 8,400 voting districts, we can only match 3,496 to census data. The visual descriptives of this sample are as follows:
dem.count <- as.numeric( as.character( full.dat$Pres_D_08 ))
rep.count <- as.numeric( as.character( full.dat$Pres_R_08 ))
dem <- dem.count / ( dem.count + rep.count )
sum( dem <= 0.3, na.rm=T ) # supermajority republican districts
## [1] 900
sum( dem >= 0.7, na.rm=T ) # supermajority democratic districts
## [1] 738
h <- hist( dem, breaks=100, plot=FALSE )
cuts <- cut( h$breaks, c(-0.1,0.3,0.7,1.1), labels=c("red","gray","steelblue") )
plot( h, col=as.character(cuts),
main="Percentage Voting for Obama by District",
yaxt="n", ylab="", xlab="Percent of Votes for Obama by District")
Number of voting districts with both voting and census data available: 3496
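The merge that produces this count is not shown in this section. A minimal sketch with toy data, assuming the census-side key is stored in `vtd.key1` and the voting-side key in `vtd.key2` as above, could look like this; the rows and the `poverty` column are made up.

```r
# Toy voting data keyed by vtd.key2
votes <- data.frame( vtd.key2  = c( "480010018", "480010006", "48005008B", "484530218" ),
                     Pres_D_08 = c( 120,  30, 700, 410 ),
                     Pres_R_08 = c( 100, 270,  50, 390 ),
                     stringsAsFactors=FALSE )

# Toy census data (primary tract per district) keyed by vtd.key1
tracts <- data.frame( vtd.key1 = c( "480010018", "48005008B", "480050029" ),
                      poverty  = c( 12.5, 31.0, 8.2 ),
                      stringsAsFactors=FALSE )

# Inner join: keep only districts present in both sources
full.demo <- merge( tracts, votes, by.x="vtd.key1", by.y="vtd.key2" )
nrow( full.demo )

# Diagnostics: which voting districts failed to match, and the match rate
unmatched  <- setdiff( votes$vtd.key2, tracts$vtd.key1 )
match.rate <- nrow( full.demo ) / nrow( votes )
unmatched
match.rate
```

Listing the unmatched keys is usually the fastest way to spot remaining formatting differences between the two ID schemes.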
The original census data was obtained through the API using the acs package. The censusapi package above is a much more elegant method. For replication purposes the original code is included here.
Lecy, J. D., Ashley, S. R., & Santamarina, F. J. (2019). Do nonprofit missions vary by the political ideology of supporting communities? Some preliminary results. Public Performance & Management Review, 42(1), 115-141.