
There are three indicators to construct that show the relationship between types of income to overall revenue, and one final indicator that analyzes across all three of those indicators, picking the largest of them for any organization, which ultimately shows what the average degree of dependence on any one source of income is across all organizations.
\[Donation/Grant\: Dependence \:Ratio = \frac{Donation\: or \: Grant\: Revenues}{Total \: Revenues} \]
This metric shows how much of an organization’s total revenues come from contributions, government grants, and special event reveues. High levels in this indicator are undesirable since that means that an organization’s revenues are volatile insofar as it is dependent on contributions that are highly likely to contract during economic downturns. Low values in this indicator mean an organization is not dependent on donations.
### Earned Income Dependence Ratio \[Earned\: Income\: Dependence \:Ratio =
\frac{Earned\: Income}{Total \: Revenues} \]
This metric shows how much of an organization’s total revenues come from earned income (program service revenues, dues, assessments, profits from sales). High levels in this indicator are more desirable since that means that an organization is fairly self-sustaining with its own activities. Low values in this indicator mean an organization is likely dependent on donations or investments for their revenues and thus more vulnerable to the sentiments of donors or market forces.
\[Investment\: Income\: Dependence \:Ratio = \frac{Investment\: Income}{Total \: Revenues} \]
This metric shows how much of an organization’s total revenues come from investment income (interest, dividents, gains/losses on sales of securities or other assets). High levels in this metric indicate an organization is more depenent on investment income (and thus vulnerable to market downturns), and low values indicate their income comes more from donations or earned income.
\[Income \:Reliance \:Ratio = Max({Investment\: Income \:Ratio,\: Donation \: Dependence \: Ratio,\: Earned \: Income \: Ratio) \]
This metric shows on average how much organizations depend on a single source of revenue.
This metric is limited because it doesn’t indicate which source of revenue organizations are dependent on. Given that high values in the donation and investment ratios indicate vulnerability to market downturns while high values in the earned income ratio indicate financial independence, interpretation of this variable must be restricted to simply understanding business models.
Note: This data is available only for organizations that file full 990s. [Organizations with revenues <$200,000 and total assets <$500,000 have the option to not file a full 990 and file an EZ instead.]
Numerator: Donation Revenues [total
contributions and net special event revenue]
Denominator: Total Reveues
On 990: (Part VIII, Line 12A) -SOI PC EXTRACTS: totrevenue
On EZ: Part I, Line 9 -SOI PC EXTRACTS: totrevnue
Numerator: Earned income (program service
revenue, royalties, profits from sale of inventory, and other revenue)
Denominator: Total Reveues
On 990: (Part VIII, Line 12A) -SOI PC EXTRACTS: totrevenue
On EZ: Part I, Line 9 -SOI PC EXTRACTS: totrevnue
Numerator: Investment income (interest,
dividends, net rental income, and realized gains/losses on sales of
securities or other assets)
Denominator: Total Reveues
On 990: (Part VIII, Line 12A) -SOI PC EXTRACTS: totrevenue
On EZ: Part I, Line 9 -SOI PC EXTRACTS: totrevnue
# TEMPORARY VARIABLES
donation_revenues <- core$totcntrbgfts + core$netincfndrsng
totalrevenues <- ( core$totrevenue )
# can't divide by zero
totalrevenues[ totalrevenues == 0 ] <- NA
# SAVE RESULTS
core$donation_ratio <- donation_revenues / totalrevenues
# summary( core$donation_ratio )Check high and low values to see what makes sense.
x.05 <- quantile( core$donation_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$donation_ratio, 0.95, na.rm=T )
ggplot( core, aes(x = donation_ratio ) ) +
geom_density( alpha = 0.5) +
xlim( x.05, x.95 ) 
core2 <- core
# proportion of values that are negative
#mean( core2$donation_ratio < 0, na.rm=T )
#core2$donation_ratio[ core2$donation_ratio < 0 ] <- 0
# proption of values above 200%
#mean( core2$donation_ratio > 50, na.rm=T )
#core2$donation_ratio[ core2$donation_ratio > 50 ] <- 50
x.05 <- quantile( core$donation_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$donation_ratio, 0.95, na.rm=T )
core2 <- core
# proportion of values that are negative
# mean( core2$der < 0, na.rm=T )
# proption of values above 1%
# mean( core2$der > 5, na.rm=T )
# WINSORIZATION AT 5th and 95th PERCENTILES
core2$donation_ratio[ core2$donation_ratio < x.05 ] <- x.05
core2$donation_ratio[ core2$donation_ratio > x.95 ] <- x.95Tax data is available for full 990 filers only, so this metric does not describe any organizations with Gross receipts < $200,000 and Total assets < $500,000. Some organizations with receipts or assets below those thresholds may have filed a full 990, but these would be exceptions.
The data have been capped to those with values between 5% and 95% of the normal distribution to cut off outliers and exempt organizations with zero profitability (though negative values are allowed still).
Note: All monetary variables have been converted to thousands of dollars. Metric has been scaled/multiplied by 100 for readability.
core2 %>%
mutate( donation_ratio = donation_ratio * 100,
totrevenue = totrevenue / 1000,
totfuncexpns = totfuncexpns / 1000,
lndbldgsequipend = lndbldgsequipend / 1000,
totassetsend = totassetsend / 1000,
totliabend = totliabend / 1000,
totnetassetend = totnetassetend / 1000 ) %>%
select( STATE, NTEE1, NTMAJ12,
donation_ratio,
AGE,
totrevenue, totfuncexpns,
lndbldgsequipend, totassetsend,
totnetassetend, totliabend ) %>%
stargazer( type = s.type,
digits=2,
summary.stat = c("min","p25","median",
"mean","p75","max", "sd"),
covariate.labels = c("Donation/Grant Reliance Ratio", "Age",
"Revenue ($1k)", "Expenses($1k)",
"Buildings ($1k)", "Total Assets ($1k)",
"Net Assets ($1k)", "Liabiliies ($1k)"))| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max | St. Dev. |
| Donation/Grant Reliance Ratio | 0.00 | 7.33 | 56.36 | 51.41 | 90.41 | 100.00 | 38.21 |
| Age | 3 | 22 | 30 | 32.04 | 41 | 95 | 14.75 |
| Revenue (1k) | -5,376.77 | 258.90 | 909.40 | 4,521.71 | 3,672.25 | 408,932.00 | 14,285.64 |
| Expenses(1k) | 0.00 | 263.50 | 840.06 | 4,192.08 | 3,327.50 | 382,666.50 | 13,465.77 |
| Buildings (1k) | -4.48 | 79.14 | 824.25 | 3,504.47 | 2,868.50 | 513,508.80 | 13,210.06 |
| Total Assets (1k) | -7,552.11 | 777.90 | 2,446.11 | 9,261.85 | 7,477.25 | 672,021.00 | 27,038.89 |
| Net Assets (1k) | -178,869.70 | 155.67 | 1,093.86 | 4,553.27 | 4,078.70 | 531,067.70 | 15,470.31 |
| Liabiliies (1k) | -2,707.10 | 115.44 | 815.58 | 4,708.51 | 3,133.16 | 705,623.10 | 18,721.86 |
What proportion of orgs have Donation/Grant Reliance Ratios equal to zero?
prop.zero <- mean( core2$donation_ratio == 0, na.rm=T )In the sample, 17 percent of the organizations have Donation/Grant Reliance Ratios equal to zero, meaning they have no donation revenues. These organizations are dropped from subsequent graphs to keep the visualizations clean. The interpretation of the graphics should be the distributions of Donation/Grant Reliance Ratios for organizations that have positive or negative values.
###
### ADD QUANTILES
###
### function create_quantiles() defined in r-functions.R
core2$exp.q <- create_quantiles( var=core2$totfuncexpns, n.groups=5 )
core2$rev.q <- create_quantiles( var=core2$totrevenue, n.groups=5 )
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2$age.q <- create_quantiles( var=core2$AGE, n.groups=5 )
core2$land.q <- create_quantiles( var=core2$lndbldgsequipend, n.groups=5 )min.x <- min( core2$donation_ratio, na.rm=T )
max.x <- max( core2$donation_ratio, na.rm=T )
ggplot( core2, aes(x = donation_ratio )) +
geom_density( alpha = 0.5 ) +
xlim( min.x, max.x ) +
xlab( variable.label1 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core3 <- core2 %>% filter( ! is.na(NTEE1) )
table( core3$NTEE1) %>% sort(decreasing=TRUE) %>% kable()| Var1 | Freq |
|---|---|
| Housing | 2837 |
| Community Development | 1585 |
| Human Services | 1102 |
t <- table( factor(core3$NTEE1) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
NTEE1=names(t) )
ggplot( core3, aes( x=donation_ratio ) ) +
geom_density( alpha = 0.5) +
# xlim( -0.1, 1 ) +
labs( title="Nonprofit Subsectors" ) +
xlab( variable.label1 ) +
facet_wrap( ~ NTEE1, nrow=1 ) +
theme_minimal( base_size = 15 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
strip.text = element_text( face="bold") ) + # size=20
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Region) %>% kable()| Var1 | Freq |
|---|---|
| Midwest | 1444 |
| Northeast | 1368 |
| South | 1610 |
| West | 1088 |
t <- table( factor(core2$Region) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Region=names(t) )
core2 %>%
filter( ! is.na(Region) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Regions" ) +
ylab( variable.label1 ) +
facet_wrap( ~ Region, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Division ) %>% kable()| Var1 | Freq |
|---|---|
| East North Central | 1038 |
| East South Central | 289 |
| Middle Atlantic | 904 |
| Mountain | 303 |
| New England | 464 |
| Pacific | 785 |
| South Atlantic | 900 |
| West North Central | 406 |
| West South Central | 421 |
t <- table( factor(core2$Division) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Division=names(t) )
core2 %>%
filter( ! is.na(Division) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Sub-Regions (10)" ) +
ylab( variable.label1 ) +
facet_wrap( ~ Division, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 ) 
ggplot( core2, aes(x = totfuncexpns )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totfuncexpns, c(0.02,0.98), na.rm=T ) )
core2$totfuncexpns[ core2$totfuncexpns < 1 ] <- 1
# core2$totfuncexpns[ is.na(core2$totfuncexpns) ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totfuncexpns), core3$donation_ratio,
xlab="Nonprofit Size (logged Expenses)",
ylab=variable.label1,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(exp.q) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5) +
labs( title="Nonprofit Size (logged expenses)" ) +
xlab( variable.label1 ) +
facet_wrap( ~ exp.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totrevenue )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totrevenue, c(0.02,0.98), na.rm=T ) ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totrevenue[ core2$totrevenue < 1 ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totrevenue), core3$donation_ratio,
xlab="Nonprofit Size (logged Revenue)",
ylab=variable.label1,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(rev.q) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged revenues)" ) +
xlab( variable.label1 ) +
facet_wrap( ~ rev.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totnetassetend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totnetassetend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totnetassetend), core3$donation_ratio,
xlab="Nonprofit Size (logged Net Assets)",
ylab=variable.label1,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2 %>%
filter( ! is.na(asset.q) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged net assets, if assets > 0)" ) +
xlab( variable.label1 ) +
ylab( "" ) +
facet_wrap( ~ asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
Total Assets for Comparison
core2$totassetsend[ core2$totassetsend < 1 ] <- NA
core2$tot.asset.q <- create_quantiles( var=core2$totassetsend, n.groups=5 )
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totassetsend), core3$donation_ratio,
xlab="Nonprofit Size (logged Total Assets)",
ylab=variable.label1,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
ggplot( core2, aes(x = totassetsend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totassetsend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2 %>%
filter( ! is.na(tot.asset.q) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Nonprofit Size (logged total assets, if assets > 0)" ) +
ylab( variable.label1 ) +
facet_wrap( ~ tot.asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = AGE )) +
geom_density( alpha = 0.5 ) 
core2$AGE[ core2$AGE < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( core3$AGE, core3$donation_ratio,
xlab="Nonprofit Age",
ylab=variable.label1 ) 
core2 %>%
filter( ! is.na(age.q) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Age" ) +
xlab( variable.label1 ) +
ylab( "" ) +
facet_wrap( ~ age.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = lndbldgsequipend )) +
geom_density( alpha = 0.5 ) 
core2$lndbldgsequipend[ core2$lndbldgsequipend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
jplot( log10(core3$lndbldgsequipend), core3$donation_ratio,
xlab="Land and Building Value (logged)",
ylab=variable.label1,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
}
core2 %>%
filter( ! is.na(land.q) ) %>%
ggplot( aes(donation_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Land and Building Value" ) +
xlab( variable.label1 ) +
ylab( "" ) +
facet_wrap( ~ land.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
# TEMPORARY VARIABLES
earned_income <- core$totprgmrevnue +core$royaltsinc+ core$netincsales+core$miscrevtot11e
totalrevenues <- ( core$totrevenue )
# can't divide by zero
totalrevenues[ totalrevenues == 0 ] <- NA
# SAVE RESULTS
core$earned_income_ratio <- earned_income / totalrevenues
# summary( core$earned_income_ratio )Check high and low values to see what makes sense.
x.05 <- quantile( core$earned_income_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$earned_income_ratio, 0.95, na.rm=T )
ggplot( core, aes(x = earned_income_ratio ) ) +
geom_density( alpha = 0.5) +
xlim( x.05, x.95 ) 
core2 <- core
# proportion of values that are negative
#mean( core2$earned_income_ratio < 0, na.rm=T )
#core2$earned_income_ratio[ core2$earned_income_ratio < 0 ] <- 0
# proption of values above 200%
#mean( core2$earned_income_ratio > 50, na.rm=T )
#core2$earned_income_ratio[ core2$earned_income_ratio > 50 ] <- 50
x.05 <- quantile( core$earned_income_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$earned_income_ratio, 0.95, na.rm=T )
core2 <- core
# proportion of values that are negative
# mean( core2$der < 0, na.rm=T )
# proption of values above 1%
# mean( core2$der > 5, na.rm=T )
# WINSORIZATION AT 5th and 95th PERCENTILES
core2$earned_income_ratio[ core2$earned_income_ratio < x.05 ] <- x.05
core2$earned_income_ratio[ core2$earned_income_ratio > x.95 ] <- x.95Tax data is available for full 990 filers only, so this metric does not describe any organizations with Gross receipts < $200,000 and Total assets < $500,000. Some organizations with receipts or assets below those thresholds may have filed a full 990, but these would be exceptions.
The data have been capped to those with values between 5% and 95% of the normal distribution to cut off outliers and exempt organizations with zero profitability (though negative values are allowed still).
Note: All monetary variables have been converted to thousands of dollars.
core2 %>%
mutate( # earned_income_ratio = earned_income_ratio * 10000,
totrevenue = totrevenue / 1000,
totfuncexpns = totfuncexpns / 1000,
lndbldgsequipend = lndbldgsequipend / 1000,
totassetsend = totassetsend / 1000,
totliabend = totliabend / 1000,
totnetassetend = totnetassetend / 1000 ) %>%
select( STATE, NTEE1, NTMAJ12,
earned_income_ratio,
AGE,
totrevenue, totfuncexpns,
lndbldgsequipend, totassetsend,
totnetassetend, totliabend ) %>%
stargazer( type = s.type,
digits=2,
summary.stat = c("min","p25","median",
"mean","p75","max", "sd"),
covariate.labels = c("Earned Income Reliance Ratio", "Age",
"Revenue ($1k)", "Expenses($1k)",
"Buildings ($1k)", "Total Assets ($1k)",
"Net Assets ($1k)", "Liabiliies ($1k)"))| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max | St. Dev. |
| Earned Income Reliance Ratio | 0.00 | 0.05 | 0.35 | 0.43 | 0.81 | 1.00 | 0.38 |
| Age | 3 | 22 | 30 | 32.04 | 41 | 95 | 14.75 |
| Revenue (1k) | -5,376.77 | 258.90 | 909.40 | 4,521.71 | 3,672.25 | 408,932.00 | 14,285.64 |
| Expenses(1k) | 0.00 | 263.50 | 840.06 | 4,192.08 | 3,327.50 | 382,666.50 | 13,465.77 |
| Buildings (1k) | -4.48 | 79.14 | 824.25 | 3,504.47 | 2,868.50 | 513,508.80 | 13,210.06 |
| Total Assets (1k) | -7,552.11 | 777.90 | 2,446.11 | 9,261.85 | 7,477.25 | 672,021.00 | 27,038.89 |
| Net Assets (1k) | -178,869.70 | 155.67 | 1,093.86 | 4,553.27 | 4,078.70 | 531,067.70 | 15,470.31 |
| Liabiliies (1k) | -2,707.10 | 115.44 | 815.58 | 4,708.51 | 3,133.16 | 705,623.10 | 18,721.86 |
What proportion of orgs have Earned Income Reliance Ratios equal to zero?
prop.zero <- mean( core2$earned_income_ratio == 0, na.rm=T )In the sample, 10 percent of the organizations have Earned Income Reliance Ratios equal to zero, meaning they have no earned income. These organizations are dropped from subsequent graphs to keep the visualizations clean. The interpretation of the graphics should be the distributions of Earned Income Reliance Ratios for organizations that have positive or negative values.
###
### ADD QUANTILES
###
### function create_quantiles() defined in r-functions.R
core2$exp.q <- create_quantiles( var=core2$totfuncexpns, n.groups=5 )
core2$rev.q <- create_quantiles( var=core2$totrevenue, n.groups=5 )
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2$age.q <- create_quantiles( var=core2$AGE, n.groups=5 )
core2$land.q <- create_quantiles( var=core2$lndbldgsequipend, n.groups=5 )min.x <- min( core2$earned_income_ratio, na.rm=T )
max.x <- max( core2$earned_income_ratio, na.rm=T )
ggplot( core2, aes(x = earned_income_ratio )) +
geom_density( alpha = 0.5 ) +
xlim( min.x, max.x ) +
xlab( variable.label2 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core3 <- core2 %>% filter( ! is.na(NTEE1) )
table( core3$NTEE1) %>% sort(decreasing=TRUE) %>% kable()| Var1 | Freq |
|---|---|
| Housing | 2837 |
| Community Development | 1585 |
| Human Services | 1102 |
t <- table( factor(core3$NTEE1) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
NTEE1=names(t) )
ggplot( core3, aes( x=earned_income_ratio ) ) +
geom_density( alpha = 0.5) +
# xlim( -0.1, 1 ) +
labs( title="Nonprofit Subsectors" ) +
xlab( variable.label2 ) +
facet_wrap( ~ NTEE1, nrow=1 ) +
theme_minimal( base_size = 15 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
strip.text = element_text( face="bold") ) + # size=20
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Region) %>% kable()| Var1 | Freq |
|---|---|
| Midwest | 1444 |
| Northeast | 1368 |
| South | 1610 |
| West | 1088 |
t <- table( factor(core2$Region) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Region=names(t) )
core2 %>%
filter( ! is.na(Region) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Regions" ) +
ylab( variable.label2 ) +
facet_wrap( ~ Region, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Division ) %>% kable()| Var1 | Freq |
|---|---|
| East North Central | 1038 |
| East South Central | 289 |
| Middle Atlantic | 904 |
| Mountain | 303 |
| New England | 464 |
| Pacific | 785 |
| South Atlantic | 900 |
| West North Central | 406 |
| West South Central | 421 |
t <- table( factor(core2$Division) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Division=names(t) )
core2 %>%
filter( ! is.na(Division) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Sub-Regions (10)" ) +
ylab( variable.label2 ) +
facet_wrap( ~ Division, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 ) 
ggplot( core2, aes(x = totfuncexpns )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totfuncexpns, c(0.02,0.98), na.rm=T ) )
core2$totfuncexpns[ core2$totfuncexpns < 1 ] <- 1
# core2$totfuncexpns[ is.na(core2$totfuncexpns) ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totfuncexpns), core3$earned_income_ratio,
xlab="Nonprofit Size (logged Expenses)",
ylab=variable.label2,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(exp.q) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5) +
labs( title="Nonprofit Size (logged expenses)" ) +
xlab( variable.label2 ) +
facet_wrap( ~ exp.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totrevenue )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totrevenue, c(0.02,0.98), na.rm=T ) ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totrevenue[ core2$totrevenue < 1 ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totrevenue), core3$earned_income_ratio,
xlab="Nonprofit Size (logged Revenue)",
ylab=variable.label2,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(rev.q) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged revenues)" ) +
xlab( variable.label2 ) +
facet_wrap( ~ rev.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totnetassetend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totnetassetend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totnetassetend), core3$earned_income_ratio,
xlab="Nonprofit Size (logged Net Assets)",
ylab=variable.label2,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2 %>%
filter( ! is.na(asset.q) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged net assets, if assets > 0)" ) +
xlab( variable.label2 ) +
ylab( "" ) +
facet_wrap( ~ asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
Total Assets for Comparison
core2$totassetsend[ core2$totassetsend < 1 ] <- NA
core2$tot.asset.q <- create_quantiles( var=core2$totassetsend, n.groups=5 )
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totassetsend), core3$earned_income_ratio,
xlab="Nonprofit Size (logged Total Assets)",
ylab=variable.label2,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
ggplot( core2, aes(x = totassetsend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totassetsend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2 %>%
filter( ! is.na(tot.asset.q) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Nonprofit Size (logged total assets, if assets > 0)" ) +
ylab( variable.label2 ) +
facet_wrap( ~ tot.asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = AGE )) +
geom_density( alpha = 0.5 ) 
core2$AGE[ core2$AGE < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( core3$AGE, core3$earned_income_ratio,
xlab="Nonprofit Age",
ylab=variable.label2 ) 
core2 %>%
filter( ! is.na(age.q) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Age" ) +
xlab( variable.label2 ) +
ylab( "" ) +
facet_wrap( ~ age.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = lndbldgsequipend )) +
geom_density( alpha = 0.5 ) 
core2$lndbldgsequipend[ core2$lndbldgsequipend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
jplot( log10(core3$lndbldgsequipend), core3$earned_income_ratio,
xlab="Land and Building Value (logged)",
ylab=variable.label2,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
}
core2 %>%
filter( ! is.na(land.q) ) %>%
ggplot( aes(earned_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Land and Building Value" ) +
xlab( variable.label2 ) +
ylab( "" ) +
facet_wrap( ~ land.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
# TEMPORARY VARIABLES
investment_income <- core$invstmntinc + core$txexmptbndsproceeds + core$netrntlinc + core$netgnls
totalrevenues <- ( core$totrevenue )
# can't divide by zero
totalrevenues[ totalrevenues == 0 ] <- NA
# SAVE RESULTS
core$investment_income_ratio <- investment_income / totalrevenues
# summary( core$investment_income_ratio )Check high and low values to see what makes sense.
x.05 <- quantile( core$investment_income_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$investment_income_ratio, 0.95, na.rm=T )
ggplot( core, aes(x = investment_income_ratio ) ) +
geom_density( alpha = 0.5) +
xlim( x.05, x.95 ) 
core2 <- core
# proportion of values that are negative
#mean( core2$investment_income_ratio < 0, na.rm=T )
#core2$investment_income_ratio[ core2$investment_income_ratio < 0 ] <- 0
# proption of values above 200%
#mean( core2$investment_income_ratio > 50, na.rm=T )
#core2$investment_income_ratio[ core2$investment_income_ratio > 50 ] <- 50
x.05 <- quantile( core$investment_income_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$investment_income_ratio, 0.95, na.rm=T )
core2 <- core
# proportion of values that are negative
# mean( core2$der < 0, na.rm=T )
# proption of values above 1%
# mean( core2$der > 5, na.rm=T )
# WINSORIZATION AT 5th and 95th PERCENTILES
core2$investment_income_ratio[ core2$investment_income_ratio < x.05 ] <- x.05
core2$investment_income_ratio[ core2$investment_income_ratio > x.95 ] <- x.95Tax data is available for full 990 filers only, so this metric does not describe any organizations with Gross receipts < $200,000 and Total assets < $500,000. Some organizations with receipts or assets below those thresholds may have filed a full 990, but these would be exceptions.
The data have been capped to those with values between 5% and 95% of the normal distribution to cut off outliers and exempt organizations with zero profitability (though negative values are allowed still).
Note: All monetary variables have been converted to thousands of dollars.
core2 %>%
mutate( # investment_income_ratio = investment_income_ratio * 10000,
totrevenue = totrevenue / 1000,
totfuncexpns = totfuncexpns / 1000,
lndbldgsequipend = lndbldgsequipend / 1000,
totassetsend = totassetsend / 1000,
totliabend = totliabend / 1000,
totnetassetend = totnetassetend / 1000 ) %>%
select( STATE, NTEE1, NTMAJ12,
investment_income_ratio,
AGE,
totrevenue, totfuncexpns,
lndbldgsequipend, totassetsend,
totnetassetend, totliabend ) %>%
stargazer( type = s.type,
digits=2,
summary.stat = c("min","p25","median",
"mean","p75","max", "sd"),
covariate.labels = c("Investment Income Reliance Ratio", "Age",
"Revenue ($1k)", "Expenses($1k)",
"Buildings ($1k)", "Total Assets ($1k)",
"Net Assets ($1k)", "Liabiliies ($1k)"))| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max | St. Dev. |
| Investment Income Reliance Ratio | -0.01 | 0.00 | 0.001 | 0.04 | 0.02 | 0.41 | 0.10 |
| Age | 3 | 22 | 30 | 32.04 | 41 | 95 | 14.75 |
| Revenue (1k) | -5,376.77 | 258.90 | 909.40 | 4,521.71 | 3,672.25 | 408,932.00 | 14,285.64 |
| Expenses(1k) | 0.00 | 263.50 | 840.06 | 4,192.08 | 3,327.50 | 382,666.50 | 13,465.77 |
| Buildings (1k) | -4.48 | 79.14 | 824.25 | 3,504.47 | 2,868.50 | 513,508.80 | 13,210.06 |
| Total Assets (1k) | -7,552.11 | 777.90 | 2,446.11 | 9,261.85 | 7,477.25 | 672,021.00 | 27,038.89 |
| Net Assets (1k) | -178,869.70 | 155.67 | 1,093.86 | 4,553.27 | 4,078.70 | 531,067.70 | 15,470.31 |
| Liabiliies (1k) | -2,707.10 | 115.44 | 815.58 | 4,708.51 | 3,133.16 | 705,623.10 | 18,721.86 |
What proportion of orgs have Investment Income Reliance Ratios equal to zero?
prop.zero <- mean( core2$investment_income_ratio == 0, na.rm=T )In the sample, 18 percent of the organizations have Investment Income Reliance Ratios equal to zero, meaning they have no earned income. These organizations are dropped from subsequent graphs to keep the visualizations clean. The interpretation of the graphics should be the distributions of Investment Income Reliance Ratios for organizations that have positive or negative values.
###
### ADD QUANTILES
###
### function create_quantiles() defined in r-functions.R
core2$exp.q <- create_quantiles( var=core2$totfuncexpns, n.groups=5 )
core2$rev.q <- create_quantiles( var=core2$totrevenue, n.groups=5 )
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2$age.q <- create_quantiles( var=core2$AGE, n.groups=5 )
core2$land.q <- create_quantiles( var=core2$lndbldgsequipend, n.groups=5 )min.x <- min( core2$investment_income_ratio, na.rm=T )
max.x <- max( core2$investment_income_ratio, na.rm=T )
ggplot( core2, aes(x = investment_income_ratio )) +
geom_density( alpha = 0.5 ) +
xlim( min.x, max.x ) +
xlab( variable.label3 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core3 <- core2 %>% filter( ! is.na(NTEE1) )
table( core3$NTEE1) %>% sort(decreasing=TRUE) %>% kable()| Var1 | Freq |
|---|---|
| Housing | 2837 |
| Community Development | 1585 |
| Human Services | 1102 |
t <- table( factor(core3$NTEE1) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
NTEE1=names(t) )
ggplot( core3, aes( x=investment_income_ratio ) ) +
geom_density( alpha = 0.5) +
# xlim( -0.1, 1 ) +
labs( title="Nonprofit Subsectors" ) +
xlab( variable.label3 ) +
facet_wrap( ~ NTEE1, nrow=1 ) +
theme_minimal( base_size = 15 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
strip.text = element_text( face="bold") ) + # size=20
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Region) %>% kable()| Var1 | Freq |
|---|---|
| Midwest | 1444 |
| Northeast | 1368 |
| South | 1610 |
| West | 1088 |
t <- table( factor(core2$Region) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Region=names(t) )
core2 %>%
filter( ! is.na(Region) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Regions" ) +
ylab( variable.label3 ) +
facet_wrap( ~ Region, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Division ) %>% kable()| Var1 | Freq |
|---|---|
| East North Central | 1038 |
| East South Central | 289 |
| Middle Atlantic | 904 |
| Mountain | 303 |
| New England | 464 |
| Pacific | 785 |
| South Atlantic | 900 |
| West North Central | 406 |
| West South Central | 421 |
t <- table( factor(core2$Division) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Division=names(t) )
core2 %>%
filter( ! is.na(Division) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Sub-Regions (10)" ) +
ylab( variable.label3 ) +
facet_wrap( ~ Division, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 ) 
ggplot( core2, aes(x = totfuncexpns )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totfuncexpns, c(0.02,0.98), na.rm=T ) )
core2$totfuncexpns[ core2$totfuncexpns < 1 ] <- 1
# core2$totfuncexpns[ is.na(core2$totfuncexpns) ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totfuncexpns), core3$investment_income_ratio,
xlab="Nonprofit Size (logged Expenses)",
ylab=variable.label3,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(exp.q) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5) +
labs( title="Nonprofit Size (logged expenses)" ) +
xlab( variable.label3 ) +
facet_wrap( ~ exp.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totrevenue )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totrevenue, c(0.02,0.98), na.rm=T ) ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totrevenue[ core2$totrevenue < 1 ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totrevenue), core3$investment_income_ratio,
xlab="Nonprofit Size (logged Revenue)",
ylab=variable.label3,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(rev.q) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged revenues)" ) +
xlab( variable.label3 ) +
facet_wrap( ~ rev.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totnetassetend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totnetassetend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totnetassetend), core3$investment_income_ratio,
xlab="Nonprofit Size (logged Net Assets)",
ylab=variable.label3,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2 %>%
filter( ! is.na(asset.q) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged net assets, if assets > 0)" ) +
xlab( variable.label3 ) +
ylab( "" ) +
facet_wrap( ~ asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
Total Assets for Comparison
core2$totassetsend[ core2$totassetsend < 1 ] <- NA
core2$tot.asset.q <- create_quantiles( var=core2$totassetsend, n.groups=5 )
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totassetsend), core3$investment_income_ratio,
xlab="Nonprofit Size (logged Total Assets)",
ylab=variable.label3,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
ggplot( core2, aes(x = totassetsend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totassetsend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2 %>%
filter( ! is.na(tot.asset.q) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Nonprofit Size (logged total assets, if assets > 0)" ) +
ylab( variable.label3 ) +
facet_wrap( ~ tot.asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = AGE )) +
geom_density( alpha = 0.5 ) 
core2$AGE[ core2$AGE < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( core3$AGE, core3$investment_income_ratio,
xlab="Nonprofit Age",
ylab=variable.label3 ) 
core2 %>%
filter( ! is.na(age.q) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Age" ) +
xlab( variable.label3 ) +
ylab( "" ) +
facet_wrap( ~ age.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = lndbldgsequipend )) +
geom_density( alpha = 0.5 ) 
core2$lndbldgsequipend[ core2$lndbldgsequipend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
jplot( log10(core3$lndbldgsequipend), core3$investment_income_ratio,
xlab="Land and Building Value (logged)",
ylab=variable.label3,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
}
core2 %>%
filter( ! is.na(land.q) ) %>%
ggplot( aes(investment_income_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Land and Building Value" ) +
xlab( variable.label3 ) +
ylab( "" ) +
facet_wrap( ~ land.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
dat <- dplyr::select(core3, donation_ratio, investment_income_ratio, earned_income_ratio )
core$income_reliance_ratio <- apply( dat, MARGIN=1, FUN=max )
# summary( core3$income_reliance_ratio )Check high and low values to see what makes sense.
x.05 <- quantile( core$income_reliance_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$income_reliance_ratio, 0.95, na.rm=T )
ggplot( core, aes(x = income_reliance_ratio ) ) +
geom_density( alpha = 0.5) +
xlim( x.05, x.95 ) 
core2 <- core
# proportion of values that are negative
#mean( core3$income_reliance_ratio < 0, na.rm=T )
#core3$income_reliance_ratio[ core3$income_reliance_ratio < 0 ] <- 0
# proption of values above 200%
#mean( core3$income_reliance_ratio > 50, na.rm=T )
#core3$income_reliance_ratio[ core3$income_reliance_ratio > 50 ] <- 50
x.05 <- quantile( core$income_reliance_ratio, 0.05, na.rm=T )
x.95 <- quantile( core$income_reliance_ratio, 0.95, na.rm=T )
core2 <- core
# proportion of values that are negative
# mean( core2$der < 0, na.rm=T )
# proption of values above 1%
# mean( core2$der > 5, na.rm=T )
# WINSORIZATION AT 5th and 95th PERCENTILES
core2$income_reliance_ratio[ core2$income_reliance_ratio < x.05 ] <- x.05
core2$income_reliance_ratio[ core2$income_reliance_ratio > x.95 ] <- x.95Tax data is available for full 990 filers only, so this metric does not describe any organizations with Gross receipts < $200,000 and Total assets < $500,000. Some organizations with receipts or assets below those thresholds may have filed a full 990, but these would be exceptions.
The data have been capped to those with values between 5% and 95% of the normal distribution to cut off outliers and exempt organizations with zero profitability (though negative values are allowed still).
Note: All monetary variables have been converted to thousands of dollars.
core2 %>%
mutate( # income_reliance_ratio = income_reliance_ratio * 10000,
totrevenue = totrevenue / 1000,
totfuncexpns = totfuncexpns / 1000,
lndbldgsequipend = lndbldgsequipend / 1000,
totassetsend = totassetsend / 1000,
totliabend = totliabend / 1000,
totnetassetend = totnetassetend / 1000 ) %>%
select( STATE, NTEE1, NTMAJ12,
income_reliance_ratio,
AGE,
totrevenue, totfuncexpns,
lndbldgsequipend, totassetsend,
totnetassetend, totliabend ) %>%
stargazer( type = s.type,
digits=2,
summary.stat = c("min","p25","median",
"mean","p75","max", "sd"),
covariate.labels = c("Income Reliance Ratio", "Age",
"Revenue ($1k)", "Expenses($1k)",
"Buildings ($1k)", "Total Assets ($1k)",
"Net Assets ($1k)", "Liabiliies ($1k)"))| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max | St. Dev. |
| Income Reliance Ratio | 0.47 | 0.66 | 0.87 | 0.82 | 0.99 | 1.00 | 0.18 |
| Age | 3 | 22 | 30 | 32.04 | 41 | 95 | 14.75 |
| Revenue (1k) | -5,376.77 | 258.90 | 909.40 | 4,521.71 | 3,672.25 | 408,932.00 | 14,285.64 |
| Expenses(1k) | 0.00 | 263.50 | 840.06 | 4,192.08 | 3,327.50 | 382,666.50 | 13,465.77 |
| Buildings (1k) | -4.48 | 79.14 | 824.25 | 3,504.47 | 2,868.50 | 513,508.80 | 13,210.06 |
| Total Assets (1k) | -7,552.11 | 777.90 | 2,446.11 | 9,261.85 | 7,477.25 | 672,021.00 | 27,038.89 |
| Net Assets (1k) | -178,869.70 | 155.67 | 1,093.86 | 4,553.27 | 4,078.70 | 531,067.70 | 15,470.31 |
| Liabiliies (1k) | -2,707.10 | 115.44 | 815.58 | 4,708.51 | 3,133.16 | 705,623.10 | 18,721.86 |
What proportion of orgs have Income Reliance Ratios equal to zero?
prop.zero <- mean( core$income_reliance_ratio == 0, na.rm=T )In the sample, 0 percent of the organizations have Income Reliance Ratios equal to zero, meaning their highest source of income is equal to zero. These organizations are dropped from subsequent graphs to keep the visualizations clean. The interpretation of the graphics should be the distributions of Income Reliance Ratios for organizations that have positive or negative values.
###
### ADD QUANTILES
###
### function create_quantiles() defined in r-functions.R
core2$exp.q <- create_quantiles( var=core2$totfuncexpns, n.groups=5 )
core2$rev.q <- create_quantiles( var=core2$totrevenue, n.groups=5 )
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2$age.q <- create_quantiles( var=core2$AGE, n.groups=5 )
core2$land.q <- create_quantiles( var=core2$lndbldgsequipend, n.groups=5 )min.x <- min( core2$income_reliance_ratio, na.rm=T )
max.x <- max( core2$income_reliance_ratio, na.rm=T )
ggplot( core2, aes(x = income_reliance_ratio )) +
geom_density( alpha = 0.5 ) +
xlim( min.x, max.x ) +
xlab( variable.label4 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core3 <- core2 %>% filter( ! is.na(NTEE1) )
table( core3$NTEE1) %>% sort(decreasing=TRUE) %>% kable()| Var1 | Freq |
|---|---|
| Housing | 2837 |
| Community Development | 1585 |
| Human Services | 1102 |
t <- table( factor(core3$NTEE1) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
NTEE1=names(t) )
ggplot( core3, aes( x=income_reliance_ratio ) ) +
geom_density( alpha = 0.5) +
# xlim( -0.1, 1 ) +
labs( title="Nonprofit Subsectors" ) +
xlab( variable.label4 ) +
facet_wrap( ~ NTEE1, nrow=1 ) +
theme_minimal( base_size = 15 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
strip.text = element_text( face="bold") ) + # size=20
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Region) %>% kable()| Var1 | Freq |
|---|---|
| Midwest | 1444 |
| Northeast | 1368 |
| South | 1610 |
| West | 1088 |
t <- table( factor(core2$Region) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Region=names(t) )
core2 %>%
filter( ! is.na(Region) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Regions" ) +
ylab( variable.label4 ) +
facet_wrap( ~ Region, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 )
table( core2$Division ) %>% kable()| Var1 | Freq |
|---|---|
| East North Central | 1038 |
| East South Central | 289 |
| Middle Atlantic | 904 |
| Mountain | 303 |
| New England | 464 |
| Pacific | 785 |
| South Atlantic | 900 |
| West North Central | 406 |
| West South Central | 421 |
t <- table( factor(core2$Division) )
df <- data.frame( x=Inf, y=Inf,
N=paste0( "N=", as.character(t) ),
Division=names(t) )
core2 %>%
filter( ! is.na(Division) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Census Sub-Regions (10)" ) +
ylab( variable.label4 ) +
facet_wrap( ~ Division, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() ) +
geom_text( data=df,
aes(x, y, label=N ),
hjust=2, vjust=3,
color="gray60", size=6 ) 
ggplot( core2, aes(x = totfuncexpns )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totfuncexpns, c(0.02,0.98), na.rm=T ) )
core2$totfuncexpns[ core2$totfuncexpns < 1 ] <- 1
# core2$totfuncexpns[ is.na(core2$totfuncexpns) ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totfuncexpns), core3$income_reliance_ratio,
xlab="Nonprofit Size (logged Expenses)",
ylab=variable.label4,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(exp.q) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5) +
labs( title="Nonprofit Size (logged expenses)" ) +
xlab( variable.label4 ) +
facet_wrap( ~ exp.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totrevenue )) +
geom_density( alpha = 0.5 ) +
xlim( quantile(core2$totrevenue, c(0.02,0.98), na.rm=T ) ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totrevenue[ core2$totrevenue < 1 ] <- 1
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totrevenue), core3$income_reliance_ratio,
xlab="Nonprofit Size (logged Revenue)",
ylab=variable.label4,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2 %>%
filter( ! is.na(rev.q) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged revenues)" ) +
xlab( variable.label4 ) +
facet_wrap( ~ rev.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = totnetassetend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totnetassetend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totnetassetend), core3$income_reliance_ratio,
xlab="Nonprofit Size (logged Net Assets)",
ylab=variable.label4,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
core2$totnetassetend[ core2$totnetassetend < 1 ] <- NA
core2$asset.q <- create_quantiles( var=core2$totnetassetend, n.groups=5 )
core2 %>%
filter( ! is.na(asset.q) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Size (logged net assets, if assets > 0)" ) +
xlab( variable.label4 ) +
ylab( "" ) +
facet_wrap( ~ asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
Total Assets for Comparison
core2$totassetsend[ core2$totassetsend < 1 ] <- NA
core2$tot.asset.q <- create_quantiles( var=core2$totassetsend, n.groups=5 )
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( log10(core3$totassetsend), core3$income_reliance_ratio,
xlab="Nonprofit Size (logged Total Assets)",
ylab=variable.label4,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
ggplot( core2, aes(x = totassetsend )) +
geom_density( alpha = 0.5) +
xlim( quantile(core2$totassetsend, c(0.02,0.98), na.rm=T ) ) +
xlab( "Net Assets" ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core2 %>%
filter( ! is.na(tot.asset.q) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
xlab( "Nonprofit Size (logged total assets, if assets > 0)" ) +
ylab( variable.label4 ) +
facet_wrap( ~ tot.asset.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = AGE )) +
geom_density( alpha = 0.5 ) 
core2$AGE[ core2$AGE < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
}
jplot( core3$AGE, core3$income_reliance_ratio,
xlab="Nonprofit Age",
ylab=variable.label4 ) 
core2 %>%
filter( ! is.na(age.q) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Nonprofit Age" ) +
xlab( variable.label4 ) +
ylab( "" ) +
facet_wrap( ~ age.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
ggplot( core2, aes(x = lndbldgsequipend )) +
geom_density( alpha = 0.5 ) 
core2$lndbldgsequipend[ core2$lndbldgsequipend < 1 ] <- NA
if( nrow(core2) > 10000 )
{
core3 <- sample_n( core2, 10000 )
} else
{
core3 <- core2
jplot( log10(core3$lndbldgsequipend), core3$income_reliance_ratio,
xlab="Land and Building Value (logged)",
ylab=variable.label4,
xaxt="n", xlim=c(3,10) )
axis( side=1,
at=c(3,4,5,6,7,8,9,10),
labels=c("1k","10k","100k","1m","10m","100m","1b","10b") )
}
core2 %>%
filter( ! is.na(land.q) ) %>%
ggplot( aes(income_reliance_ratio) ) +
geom_density( alpha = 0.5 ) +
labs( title="Land and Building Value" ) +
xlab( variable.label4 ) +
ylab( "" ) +
facet_wrap( ~ land.q, nrow=3 ) +
theme_minimal( base_size = 22 ) +
theme( axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank() )
core.donation_ratio <- select( core, ein, tax_pd, donation_ratio )
saveRDS( core.donation_ratio, "03-data-ratios/m-14-income-reliance-ratios.rds" )
write.csv( core.donation_ratio, "03-data-ratios/m-14-income-reliance-ratios.csv" )