Data Sources
Sample Framework
- Sampling Framework
R Packages
Gathering Census Data
- Variable Transformations
Voting Districts Crosswalk
- Create a crosswalk geoid
- Add census data to voting district IDs
Voter Data
Descriptives of the Merged Sample
Save final research database
Original Census Query (deprecated):
Citation

Data Sources

2008 Presidential Election Data from Texas: [ Harvard Election Data Archive ]
ACS 2010 at Tract Level: [ US Census ]
Voting District to Census Tract Crosswalk: [ Missouri Census Data Center ]
2010 Nonprofit Firms and Locations: [ NCCS Core Files ]

Sample Framework

The goal of the study is to build a sample framework whereby we can compare nonprofits from super-majority voting districts in order to determine how nonprofit missions vary by the political ideology of the communities in which they are located.

Since Republican and Democratic super-majority voting districts typically look very different (suburban and white versus urban and diverse), we must first match them on demographics in order to create balanced sets of voting districts (similar poverty rates, percentage minorities, and population density in each) that differ primarily on political ideology.

The replication files here explain the process of linking voting data to census data, finding voting districts that are very similar demographically but very different in voting patterns, and then locating nonprofits within those districts to analyze differences in mission.

Relationship between the three data sources in our research database.

TRACT represents Census tracts that contain geographic data about the population. Voting districts are the geographic units of aggregation for voting data within states. These two datasets need to be merged in order to have both demographic and voting data.

The nonprofit firm database comes from tax records that have been compiled by the Urban Institute (NCCS Core files). The unit of analysis of the study is the nonprofit mission. We filter voting districts to eliminate those that are not supermajority districts, then further eliminate voting districts that cannot be matched to doppelgangers (for each Republican supermajority district, find one Democratic supermajority district that is a demographic “twin”). Then add the nonprofits in those districts to the sample. Repeat for all districts that can be reasonably matched.

Dataset after all merges are complete, prior to matching.

Sampling Framework

The steps we followed are reported in Appendix A of the paper. They show the process for arriving at the 125 nonprofit mission statements that are coded for the paper.

8,400 total voting districts in Texas

1,451 Democratic supermajority districts
2,886 Republican supermajority districts

3,513 census tracts in Texas

1,305 voting districts have IDs that can be matched to census tracts
216 Democratic supermajority districts remained
464 Republican supermajority districts remained

Of the 680 super-majority voting districts that can be linked to census data, the matching procedure generated 102 districts in the balanced sample:

51 Democratic supermajority districts
51 Republican supermajority districts

Of the 22,295 nonprofits in the state, 323 are located in the matched supermajority districts and were used for analysis of NTEE codes and comparison of revenue and nonprofit age.

158 nonprofits from Democratic supermajority districts
165 nonprofits from Republican supermajority districts

Of these 323 nonprofits located in the matched voting districts, we were able to find mission statements listed on websites for 125.

74 nonprofits from Democratic supermajority districts
51 nonprofits from Republican supermajority districts
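
The totals quoted at each stage can be checked with a few lines of arithmetic. The sketch below only re-enters the counts listed above; the data frame and its column names are made up for illustration.

```r
# Counts copied from the sampling framework above (Appendix A of the paper).
# The object name "funnel" and its column names are illustrative only.
funnel <- data.frame(
  stage = c( "districts linked to census tracts",
             "districts in matched sample",
             "nonprofits in matched districts",
             "nonprofits with mission statements" ),
  dem   = c( 216, 51, 158, 74 ),
  rep   = c( 464, 51, 165, 51 ),
  stringsAsFactors = FALSE
)

# Democratic plus Republican counts should reproduce the reported totals
# of 680, 102, 323 and 125.
funnel$total <- funnel$dem + funnel$rep
funnel
```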

UPDATES

In the process of putting together replication files we discovered one minor error in reporting the sample framework:

Appendix A reports 3,513 census tracts in Texas in 2010.

There are actually 5,265 census tracts in Texas in 2010, but only 3,513 appear in the voter-district-to-census-tract crosswalk obtained through the Missouri Census Data Center (see below).

We were also able to fix more voter district IDs to increase the merge rate in these replication examples. Compared to Appendix A in the paper (reported above) we now have:

3,496 voting districts have IDs that can be matched to census tracts
738 Democratic supermajority districts remained
900 Republican supermajority districts remained

In total that equals 1,638 supermajority voting districts that can be linked to primary census tracts, more than the 680 super-majority voting districts reported in Appendix A originally.

These updates do not change the results of the study, since the results represent a comparison of nonprofits from 102 matched and demographically balanced voting districts (51 Republican and 51 Democratic). Although the matched sample was drawn from a smaller pool of linked districts, it still retains the properties that are important for achieving high internal validity with propensity score matching: only the “twins” in the data are compared instead of the full sample.

The updates are included here for anyone who wants to extend the study.

R Packages

install.packages( "rgdal" )
install.packages( "acs" )
install.packages( "censusapi" )
install.packages( "rgenoud" )
install.packages( "dplyr" )
install.packages( "stargazer" )

library( rgdal )      # read GIS shapefiles
library( acs )        # get data from census
library( censusapi )  # get data from census
library( rgenoud )    # optimization
library( dplyr )      # data wrangling
library( stargazer )  # pretty tables

Gathering Census Data

This study uses 2010 American Community Survey data from the US Census.

You can find codes for variable names at the Census API site:

https://api.census.gov/data/2010/acs/acs5/variables.html

For details on poverty measures see:

https://www.socialexplorer.com/data/ACS2013_5yr/metadata/?ds=ACS13_5yr&table=B17001

NOTE: The original study uses the acs package but I would highly recommend using Hannah Recht’s awesome censusapi package. It is much easier to use!

You will need to get a free Census API key: https://api.census.gov/data/key_signup.html

api.key.install( key="your_key_here" )   # acs package: stores the key for later calls

my.censuskey <- "your_key_here"          # censusapi package: passed to getCensus() below

# library( censusapi )
census <- getCensus( name="acs/acs5", 
                     vintage=2010, 
                     key=my.censuskey, 
                     vars=c( "NAME", 
                             "B01002A_001E",  # median age (B01002A is the white-alone table)
                             "B19013_001E",   # median household income
                             "B01003_001E",   # total population
                             "B17001_002E",   # population below the poverty level
                             "B17001_001E",   # population used as the poverty base
                             "B03003_003E",   # hispanic
                             "B02001_002E",   # race.white
                             "B02001_003E"),  # race.black
                     region="tract:*", 
                     regionin="state:48")     # texas

names( census )

##  [1] "state"        "county"       "tract"        "NAME"         "B01002A_001E"
##  [6] "B19013_001E"  "B01003_001E"  "B17001_002E"  "B17001_001E"  "B03003_003E" 
## [11] "B02001_002E"  "B02001_003E"

census$geoid <- paste0( census$state, census$county, census$tract )

names( census ) <- c("state","county","tract","NAME",
                     "medianage","income",
                     "totalpop","poverty","povbase",
                     "hispanic","white","black",
                     "geoid")

Variable Transformations

# Remove missing values
census$income[ census$income == -666666666 ] <- NA
census$medianage[ census$medianage == -666666666 ] <- NA

# Set zero-population cases to NA so rates are never divided by zero
census$totalpop[ census$totalpop == 0 ] <- NA
census$povbase[ census$povbase == 0 ] <- NA

# Calculating rates and percentages
census <- 
  census %>%
  mutate( poverty = round( 100*(poverty/povbase), 2),
          hispanic = round( 100*(hispanic/totalpop), 2),
          white = round( 100*(white/totalpop), 2),
          black = round( 100*(black/totalpop), 2) )

# drop extra pop variable 
census <- select( census, - povbase )   

head( census )

census %>%
  select( medianage, income, totalpop, poverty,
          hispanic, white, black ) %>% 
          stargazer( type = "html", digits=0 )


Statistic       N    Mean  St. Dev.    Min  Pctl(25)  Pctl(75)      Max
medianage   5,215      36         8     11        30        42       80
income      5,209  52,713    28,259  6,140    33,831    63,284  250,001
totalpop    5,224   4,654     2,241     24     3,100     5,826   25,073
poverty     5,215      18        13      0         7        25      100
hispanic    5,224      36        28      0        13        54      100
white       5,224      72        20      0        63        87      100
black       5,224      12        17      0         1        15      100
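
The recoding and rate calculations in this section can be traced on a small example. This is a minimal sketch with made-up tract values; the data frame `toy` and the column `poverty.rate` are hypothetical names, and only the -666666666 code and the rounding follow the code above.

```r
# Toy tract records (made-up values) using the renamed columns from above.
toy <- data.frame(
  income   = c( 52000, -666666666, 31000 ),
  totalpop = c( 4000, 0, 2500 ),
  poverty  = c( 600, 0, 500 ),
  povbase  = c( 3900, 0, 2400 )
)

# -666666666 is treated as a missing-value code, as in the section above.
toy$income[ toy$income == -666666666 ] <- NA

# A zero denominator becomes NA so the rate is NA instead of NaN.
toy$povbase[ toy$povbase == 0 ] <- NA

# Poverty rate as a percentage, rounded to two decimals.
toy$poverty.rate <- round( 100 * ( toy$poverty / toy$povbase ), 2 )
toy
```

Recoding the denominator before dividing keeps empty tracts in the data as NA rows, so they are counted in N only for the variables where they have values.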

Capture the study data in case the API changes:

write.csv( census, "TexasCensusTractData2010.csv", row.names=F )

Voting Districts Crosswalk

Voting districts and census tracts do not always share the same boundaries, so merging voting data and census data can be tricky. The Missouri Census Data Center has created a tool that maps voting districts to census tracts using geographic apportionment. You can visit the MABLE Geocorr14 Geographic Correspondence Engine here:

http://mcdc.missouri.edu/websas/geocorr14.html

A correspondence table has been created by selecting the 2010 Census Tracts and Voting Tabulation Districts and is saved as the file “crosswalk.csv”.

Note, the variable pop10 comes from the crosswalk and refers to voting district population. The variable totalpop comes from the 2010 Census ACS and refers to the census tract population.

Since voting districts are not nested within census tracts, the relationship is not one-to-one, i.e. one voting district can match to multiple census tracts. As a result, for each voting district we select the census tract that has the highest apportionment rate (geographical overlap).

The mean apportionment rate is 89% (standard deviation of 17%), with a median of 100% overlap.
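
One way to keep only the highest-overlap tract for each voting district is sketched below. It assumes that `afact` holds the apportionment rate; the rows, the `vtd` column and the object names are made up, and this is not necessarily how the original selection was done.

```r
# Hypothetical crosswalk rows: one row per district-tract overlap.
# In the real file afact is read as text and would need as.numeric() first.
xw <- data.frame(
  vtd   = c( "480010018", "480010018", "480010006", "480050029", "480050029" ),
  tract = c( "48001950100", "48001950200", "48001950100",
             "48005000100", "48005000200" ),
  afact = c( 0.65, 0.35, 1.00, 0.20, 0.80 ),
  stringsAsFactors = FALSE
)

# Sort so the largest allocation factor comes first within each district,
# then keep the first row per district.
xw <- xw[ order( xw$vtd, -xw$afact ), ]
primary <- xw[ !duplicated( xw$vtd ), ]
primary

# Average overlap of the retained tracts.
mean( primary$afact )
```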

crosswalk <- read.csv( "../DATA/01-raw-data/crosswalk.csv", colClasses="character" )

head( crosswalk )

crosswalk <- crosswalk[ -1 , ] # drop first row of labels

Save the TX crosswalk for ease of sharing replication files:

crosswalk$state <- substr( crosswalk$county, 1, 2 )
table( crosswalk$state )
crosswalk.tx <- filter( crosswalk, state == "48" )
write.csv( crosswalk.tx, "../DATA/02-processed-data/VTDtoTractCrosswalkTX.csv", row.names=F )

Create a crosswalk geoid

crosswalk$tract.key <- paste( crosswalk$county, 
                              gsub( "\\.","", crosswalk$tract), sep="" )
head( crosswalk$tract.key )

## [1] "01001021000" "01001020500" "01001021000" "01001020900" "01001020900"
## [6] "01001020802"
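
The key construction can be traced on a few values. The inputs below are hypothetical; they assume a five-digit county code and a tract written with a decimal point, which is what the gsub() call above implies.

```r
# Hypothetical crosswalk values: county FIPS plus tract with a decimal point.
county <- c( "01001", "48001", "48201" )
tract  <- c( "0210.00", "9501.00", "3125.02" )

# Drop the decimal point and append the tract to the county code.
tract.key <- paste0( county, gsub( "\\.", "", tract ) )
tract.key

# Every key should be 11 characters long to line up with census$geoid.
table( nchar( tract.key ) )
```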

Add census data to voting district IDs

Drop duplicate variable names:

crosswalk <- select( crosswalk, tract.key, county, cntyname, tract, vtdname, pop10, afact  )
census <- select( census, - county, - tract  )

census.dat <- merge( crosswalk, census, by.x="tract.key", by.y="geoid" )
nrow( census.dat )

## [1] 3513

head( census.dat )
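
By default merge() keeps only rows whose keys appear in both tables, so tracts that are missing from the crosswalk drop out silently. A quick way to see what is lost is to compare the key sets directly. The IDs below are made up and stand in for census$geoid and crosswalk$tract.key.

```r
# Toy keys (made up) standing in for census$geoid and crosswalk$tract.key.
census.ids    <- c( "48001950100", "48001950200", "48001950300", "48005000100" )
crosswalk.ids <- c( "48001950100", "48005000100", "01001021000" )

# Tracts present in both sources survive an inner merge.
matched <- intersect( census.ids, crosswalk.ids )

# Census tracts with no crosswalk row are dropped by the merge.
unmatched <- setdiff( census.ids, crosswalk.ids )

length( matched )
length( unmatched )
unmatched
```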

Voter Data

Data was obtained from the Harvard Election Data Archive project, a source for 2008 presidential election results at the voting district level for all 50 states. Texas contains 8,400 separate voting districts (VTDs). In the 2008 election of John McCain versus Barack Obama, Texas had 1,451 Democratic supermajority districts and 2,886 Republican supermajority districts, representing 51.6% of all voting districts in the state.

http://projects.iq.harvard.edu/eda/

The data comes as a shapefile with historic voting data embedded, so we need to load the shapefile using the rgdal package in R and extract the historic voting data frame.

Select Data Dictionary:

CNTY - County FIPS ID
VTD - Voting District ID
Shape_area - Area of voting district polygon
Pres_D_08 - Number of presidential votes for Democratic candidate in 2008
Pres_R_08 - Number of presidential votes for Republican candidate in 2008

# library( rgdal )
TX <- readOGR( "../DATA/01-raw-data","Texas_VTD" )

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\jdlecy\Dropbox\04 - PAPERS\03 - Published\18 - Republican and Democratic Nonprofits\PUBLISHED\political-ideology-of-nonprofits\DATA\01-raw-data", layer: "Texas_VTD"
## with 8400 features
## It has 21 fields
## Integer64 fields read as strings:  CNTY COLOR VTDKEY CNTYKEY Gov_D_02 Gov_R_02 Pres_D_04 Pres_R_04 Gov_D_06 Gov_R_06 Pres_D_08 Pres_R_08 Gov_D_10 Gov_R_10 vap

par( mar=c(0,0,4,0) )
plot( TX, main="All Voting Districts in TX" )

Convert spatial object to a dataframe

tx <- as.data.frame( TX )
nrow( tx )

## [1] 8400

head( tx )

Voter district patterns.

dem.count <- as.numeric( as.character( tx$Pres_D_08 ))
rep.count <- as.numeric( as.character( tx$Pres_R_08 ))
dem <- dem.count / ( dem.count + rep.count )

sum( dem <= 0.3, na.rm=T ) # supermajority republican districts

## [1] 2893

sum( dem >= 0.7, na.rm=T ) # supermajority democratic districts

## [1] 1456
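
The same 0.3 and 0.7 cutoffs can be used to label each district. This is a small sketch with made-up vote counts; the label names and the `votes` data frame are hypothetical.

```r
# Made-up vote counts, stored as text the way the shapefile fields are read.
votes <- data.frame(
  Pres_D_08 = c( "120", "800", "450", "0" ),
  Pres_R_08 = c( "480", "200", "550", "0" ),
  stringsAsFactors = FALSE
)

d <- as.numeric( votes$Pres_D_08 )
r <- as.numeric( votes$Pres_R_08 )

# Democratic share of the two-party vote; 0/0 gives NaN when no votes were cast.
dem.share <- d / ( d + r )

# Same cutoffs as above: <= 0.3 is Republican, >= 0.7 is Democratic.
type <- ifelse( dem.share <= 0.3, "rep.super",
        ifelse( dem.share >= 0.7, "dem.super", "competitive" ) )

table( type, useNA = "ifany" )
```

Districts with no recorded two-party votes end up as NA, which is why the counts above use na.rm=T.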

h <- hist( dem, breaks=100, plot=FALSE )
cuts <- cut( h$breaks, c(-0.1,0.3,0.7,1.1), labels=c("red","gray","steelblue") )
plot( h, col=as.character(cuts),  
      main="Percentage Voting for Obama by District",
      yaxt="n", ylab="", xlab="Percent of Votes for Obama by District")

Supermajority Districts in Red (R) and Blue (D)

Create Compatible IDs

The vtdname in the Census to VTD Crosswalk file and the vtdkey in the Voting dataset are currently incompatible.

The vtdname variable has four forms:

480190407
48041010A, 48041010B, etc.
Vtng Dist 3111
Vtng Dist 03-3

Each follows a format of: SS-CCC-DIST

SS = state fips code (2 digits)
CCC = county fips code (3 digits)
DIST = voting district (4 characters)

head( census.dat$vtdname, 50 )

##  [1] "Vtng Dist 0018" "Vtng Dist 0006" "Vtng Dist 0012" "Vtng Dist 0024"
##  [5] "Vtng Dist 0029" "Vtng Dist 0022" "Vtng Dist 0035" "48005008B"     
##  [9] "48005036B"      "Vtng Dist 0036" "Vtng Dist 0001" "48005010B"     
## [13] "48005014B"      "Vtng Dist 0038" "Vtng Dist 0016" "Vtng Dist 0027"
## [17] "Vtng Dist 0031" "48005016B"      "Vtng Dist 0019" "48005017B"     
## [21] "48005011B"      "Vtng Dist 0032" "Vtng Dist 004A" "Vtng Dist 001A"
## [25] "Vtng Dist 0010" "Vtng Dist 0007" "Vtng Dist 0009" "Vtng Dist 0003"
## [29] "Vtng Dist 0004" "Vtng Dist 0005" "Vtng Dist 0201" "Vtng Dist 0303"
## [33] "Vtng Dist 0202" "Vtng Dist 0402" "Vtng Dist 0301" "Vtng Dist 0101"
## [37] "Vtng Dist 0404" "Vtng Dist 0302" "Vtng Dist 0403" "Vtng Dist 0401"
## [41] "Vtng Dist 0011" "Vtng Dist 0023" "Vtng Dist 0020" "Vtng Dist 0014"
## [45] "Vtng Dist 0415" "Vtng Dist 0418" "Vtng Dist 0417" "Vtng Dist 0413"
## [49] "Vtng Dist 0312" "480150319"

To standardize the VTD IDs:

# Census Data
vtdnm <-  census.dat$vtdname
vtdnm <- gsub( "Vtng Dist ", "xxxxx", vtdnm )
head( vtdnm, 50 )

##  [1] "xxxxx0018" "xxxxx0006" "xxxxx0012" "xxxxx0024" "xxxxx0029" "xxxxx0022"
##  [7] "xxxxx0035" "48005008B" "48005036B" "xxxxx0036" "xxxxx0001" "48005010B"
## [13] "48005014B" "xxxxx0038" "xxxxx0016" "xxxxx0027" "xxxxx0031" "48005016B"
## [19] "xxxxx0019" "48005017B" "48005011B" "xxxxx0032" "xxxxx004A" "xxxxx001A"
## [25] "xxxxx0010" "xxxxx0007" "xxxxx0009" "xxxxx0003" "xxxxx0004" "xxxxx0005"
## [31] "xxxxx0201" "xxxxx0303" "xxxxx0202" "xxxxx0402" "xxxxx0301" "xxxxx0101"
## [37] "xxxxx0404" "xxxxx0302" "xxxxx0403" "xxxxx0401" "xxxxx0011" "xxxxx0023"
## [43] "xxxxx0020" "xxxxx0014" "xxxxx0415" "xxxxx0418" "xxxxx0417" "xxxxx0413"
## [49] "xxxxx0312" "480150319"

# table( nchar( vtdnm ) )  # should all be 9 characters
vtd.temp <- substr( vtdnm, 6, 9 )
vtd.key1 <- paste0( census.dat$county, vtd.temp )
census.dat$vtd.key1 <- vtd.key1
head( census.dat$vtd.key1, 50 )

##  [1] "480010018" "480010006" "480010012" "480010024" "480050029" "480050022"
##  [7] "480050035" "48005008B" "48005036B" "480050036" "480050001" "48005010B"
## [13] "48005014B" "480050038" "480050016" "480050027" "480050031" "48005016B"
## [19] "480050019" "48005017B" "48005011B" "480050032" "48007004A" "48007001A"
## [25] "480090010" "480090007" "480090009" "480090003" "480090004" "480090005"
## [31] "480110201" "480110303" "480110202" "480110402" "480110301" "480110101"
## [37] "480110404" "480110302" "480110403" "480110401" "480130011" "480130023"
## [43] "480130020" "480130014" "480150415" "480150418" "480150417" "480150413"
## [49] "480150312" "480150319"
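
The steps above can be followed on four IDs taken from the output shown in this section. The vector names are made up for the sketch; the logic mirrors the gsub() and substr() calls above.

```r
# A few vtdname values in the forms listed above, with their county FIPS codes.
vtdname <- c( "Vtng Dist 0018", "48005008B", "Vtng Dist 004A", "480150319" )
county  <- c( "48001", "48005", "48007", "48015" )

# Swap the 10-character prefix for a 5-character placeholder so that
# every string is nine characters long.
padded <- gsub( "Vtng Dist ", "xxxxx", vtdname )
stopifnot( all( nchar( padded ) == 9 ) )

# The last four characters are the district code; prepend the county FIPS.
vtd.key <- paste0( county, substr( padded, 6, 9 ) )
vtd.key
```

The placeholder only exists to make the character positions line up, so that substr() can be applied to both naming styles at once.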

# Voting Data
# TX state fips = 48

fips <- 48000 + as.numeric( as.character( tx$CNTY ) )
vtd.key2 <- paste0( fips, as.character( tx$VTD ) )

# table( nchar( vtd.key2 ) )  # should all be 9 characters
# vtd.key2[ nchar( vtd.key2 ) == 10 ]  # not sure about these 126

head( vtd.key2, 50 )

##  [1] "484530218" "482010712" "481210408" "484530210" "480850054" "480294180"
##  [7] "480292088" "484770205" "480294093" "480291098" "480294101" "480293098"
## [13] "480294098" "480291051" "480291036" "480291005" "480292005" "480293140"
## [19] "480292040" "480293040" "480291010" "480294036" "480292125" "480293125"
## [25] "480291018" "480292018" "480293018" "480291041" "480292054" "480293016"
## [31] "480292081" "480294081" "480291072" "480292049" "480293072" "480294072"
## [37] "480292020" "480293020" "480294020" "480294092" "480292113" "480293100"
## [43] "480294113" "480293097" "480294097" "480291035" "480291108" "480293131"
## [49] "480294144" "480293129"
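
Adding 48000 to the county number works because Texas county codes have at most three digits. The sketch below uses made-up `cnty` and `vtd` values to show the result and an equivalent construction with explicit zero padding; sprintf() is a swapped-in alternative, not the method used above.

```r
# Made-up county codes and district IDs in the style of tx$CNTY and tx$VTD.
cnty <- c( "1", "29", "453" )
vtd  <- c( "0018", "4180", "0218" )

# 48000 + county number yields the five-digit state-plus-county FIPS code.
fips <- 48000 + as.numeric( cnty )
vtd.key <- paste0( fips, vtd )
vtd.key

# Equivalent construction with explicit zero padding of the county code.
vtd.key.alt <- paste0( "48", sprintf( "%03d", as.integer( cnty ) ), vtd )
identical( vtd.key, vtd.key.alt )

# Keys that are not nine characters long cannot match the crosswalk keys.
vtd.key[ nchar( vtd.key ) != 9 ]
```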

Fields to Merge

vtd.key2 - voter district id
Pres_D_08 - votes cast for Obama in 2008
Pres_R_08 - votes cast for McCain in 2008
Shape_area - area of the voting district used for population density measures

tx$vtd.key2 <- vtd.key2
tx <- tx[ , c( "vtd.key2", "Pres_D_08", "Pres_R_08", "Shape_area" ) ]
head( tx )

Merge Voting and Census Data

full.dat <- merge( census.dat, tx, by.x="vtd.key1", by.y="vtd.key2" )
nrow( full.dat )

## [1] 3496

head( full.dat )

Descriptives of the Merged Sample

Of the 8,400 voting districts, we can only match 3,496 to census data. The visual descriptives of this sample are as follows:

dem.count <- as.numeric( as.character( full.dat$Pres_D_08 ))
rep.count <- as.numeric( as.character( full.dat$Pres_R_08 ))
dem <- dem.count / ( dem.count + rep.count )

sum( dem <= 0.3, na.rm=T ) # supermajority republican districts

## [1] 900

sum( dem >= 0.7, na.rm=T ) # supermajority democratic districts

## [1] 738

h <- hist( dem, breaks=100, plot=FALSE )
cuts <- cut( h$breaks, c(-0.1,0.3,0.7,1.1), labels=c("red","gray","steelblue") )
plot( h, col=as.character(cuts),  
      main="Percentage Voting for Obama by District",
      yaxt="n", ylab="", xlab="Percent of Votes for Obama by District")

Supermajority Districts

Save final research database

Number of voting districts with both voting and census data available: 3496

write.csv( full.dat, "../DATA/02-processed-data/CensusPlusVotingAll.csv", row.names=F )

Original Census Query (deprecated):

The original census data was obtained through the API using the acs package.

The censusapi package above is a much more elegant method. For replication purposes the original code is included here.

Citation

Lecy, J. D., Ashley, S. R., & Santamarina, F. J. (2019). Do nonprofit missions vary by the political ideology of supporting communities? Some preliminary results. Public Performance & Management Review, 42(1), 115-141.

Building the Research Database