Open Data for Nonprofit Research

A collection of tools for creating, wrangling, and sharing data on the nonprofit sector

This project is maintained by lecy



A slightly updated list of datasets is available on the Nonprofit Open Data Collective website.

There is a new package for harvesting IRS 990 Efile data and converting XML files into a well-structured database:

https://github.com/Nonprofit-Open-Data-Collective/irs990efile

Stay tuned: several datasets and packages will be released in Fall 2022!

https://github.com/orgs/Nonprofit-Open-Data-Collective/repositories


The IRS maintains several important nonprofit databases to track the current population of exempt organizations, their annual 990 filings, and organizations that have closed. This data has been released in formats that are not always easy to use: ASCII text files, JSON files, and XML queries.

In order to make the data accessible to the research community, we have created scripts to download data from IRS websites, clean and process it, and export into familiar formats (CSV, Stata, SPSS, etc.).
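As a rough illustration of the clean-and-export step described above, here is a minimal Python sketch. The project's own scripts are written in R. The field names (`ein`, `name`, `state`) and the sample rows are hypothetical stand-ins for a downloaded IRS text file, and the cleaning rules shown (trimming whitespace, zero-padding EINs to nine digits) are assumptions about typical cleanup, not a description of the project's actual pipeline.

```python
import csv
import io


def clean_record(raw):
    """Trim whitespace, normalize case, and left-pad the EIN to nine digits."""
    return {
        "ein": raw["ein"].strip().zfill(9),
        "name": " ".join(raw["name"].split()).upper(),
        "state": raw["state"].strip().upper(),
    }


def export_csv(records, handle):
    """Write cleaned records to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["ein", "name", "state"])
    writer.writeheader()
    for rec in records:
        writer.writerow(clean_record(rec))


# Hypothetical raw rows, standing in for a downloaded IRS text file.
raw_rows = [
    {"ein": "10000001 ", "name": "  example   food bank", "state": "ny"},
    {"ein": "987654321", "name": "Sample Arts Council ", "state": " IN"},
]

buffer = io.StringIO()
export_csv(raw_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Padding matters because EINs are nine-digit identifiers, and leading zeros are easily lost when a file is read as numeric data.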

We have begun cataloging and documenting these resources, and will share them through the Nonprofit Open Data Collective portal:

The most up-to-date files are posted on Data World.

But we also have some legacy files on our Dataverse site.


Available Data

We have documented and posted the following open data assets:

  1. IRS E-Filer Database: All nonprofit 990 data that is filed electronically, about 60% of nonprofits.
  2. Index of all E-Filers from 2009 to Present: A list of all organizations that have electronically filed each year.
  3. Current Exempt Organizations: The current list of all tax-exempt organizations.
  4. IRS Business Master File: Organizational characteristics of all current exempt organizations.
  5. 990N Postcard Filers: Data on nonprofits that are small enough to file the abbreviated “postcard” version of the 990 form.
  6. IRS Automatic Revocations: Database of nonprofits that had their tax exempt status revoked for failing to file.
  7. Organizations Granted Tax Exempt Status through 1023-EZ Form: Data filed electronically on the new shorter 1023-EZ application for 501(c) status.


The National Center for Charitable Statistics at the Urban Institute has opened up their data archives!

NCCS Open Data Portal


(1) IRS E-Filer 990 Data

The IRS has released all nonprofit 990 tax data that has been e-filed through their online system, approximately 60-65% of all 990-PC and 990-EZ filers. It is available for years 2012 to the present, with a small set of returns available for 2010 and 2011. The data has been posted as XML files on Amazon Web Services (AWS). More details about the data and the push to have it made public are below.

In order to support use of this data, we have converted the XML files into a research database similar to the NCCS Core dataset.

FORM    2009     2010      2011      2012      2013      2014      2015
990     33,360   123,107   159,539   179,675   198,615   215,764   73,233
990EZ   15,500   63,253    82,066    93,769    104,425   114,822   60,967
990PF   2,352    25,275    34,597    39,936    45,870    52,617    34,387

Check out a quick guide to working with XML files in R: [ HTML ] [ PDF ]
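For readers who prefer Python, the following sketch shows the general idea of flattening one e-filed return into a single row of a research table. The XML fragment is a simplified, hypothetical stand-in for a real return. The namespace and element names (`TaxYr`, `ReturnTypeCd`, `CYTotalRevenueAmt`) are assumptions that should be checked against actual files, since IRS schemas differ across years and form versions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for IRS e-file returns; verify against a real file.
NS = {"irs": "http://www.irs.gov/efile"}

# Simplified, hypothetical fragment; real returns hold many more fields.
SAMPLE_XML = """
<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <TaxYr>2013</TaxYr>
    <ReturnTypeCd>990</ReturnTypeCd>
    <Filer>
      <EIN>010000001</EIN>
    </Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>125000</CYTotalRevenueAmt>
    </IRS990>
  </ReturnData>
</Return>
"""


def text_at(root, path):
    """Return the stripped text at a namespaced path, or None if missing."""
    node = root.find(path, NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def flatten_return(xml_text):
    """Pull a few header and financial fields into one flat record."""
    root = ET.fromstring(xml_text.strip())
    revenue = text_at(root, "irs:ReturnData/irs:IRS990/irs:CYTotalRevenueAmt")
    return {
        "ein": text_at(root, "irs:ReturnHeader/irs:Filer/irs:EIN"),
        "tax_year": text_at(root, "irs:ReturnHeader/irs:TaxYr"),
        "form_type": text_at(root, "irs:ReturnHeader/irs:ReturnTypeCd"),
        "total_revenue": int(revenue) if revenue is not None else None,
    }


row = flatten_return(SAMPLE_XML)
print(row)
```

Returning `None` for missing elements keeps the output rectangular even when a given return omits a field, which is common across schema versions.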

You can download the data in CSV and RDS formats here: [ Data Dictionary ] [ Link to Dataset ]

Liberating the 990 Data

For some background on the campaigns to open access to IRS data, see these articles and blogs:

Working With 990 Data

Example Forms:

Form 990: A Guide for Newcomers to Nonprofit Research [ LINK ]

A History of the Tax Exempt Sector: An SOI Perspective [ LINK ]

A Guided Tour of the 990 Form by GuideStar [ LINK ]

Revised Form 990: The Evolution of Governance and the Nonprofit World [ LINK ]

Wikipedia: History of the 990 [ LINK ]

Resources for the AWS Data

Charity Navigator has created an open-source 990 Toolkit that allows you to set up an Amazon EC2 instance and clone the full IRS dataset as a relational database. You can read their press release about the project here.

Chad Kruse at SmarterGiving has a script to convert 990-PF XML files into a MongoDB database on GitHub here.

You can find some useful scripts here for running queries directly within the cloud and downloading data as CSV files, for example this GitHub gist.

If you are more comfortable in Python, check out Yash Nanavati’s GitHub repo.

There are some forums on using the E-Filer data, for example this Reddit forum.


(2) Index of 990, 990-EZ and 990-PF Electronic Filers from 2009 to Present

We provide an R script that builds the INDEX file (not the full dataset) for all IRS E-Filer open data provided on Amazon Web Services. The index contains a limited number of variables such as nonprofit name, EIN, tax year, form type, and the URL of the XML version of the 990 return. This index file allows you to see what is available in the open E-Filer database.

FORM    2009     2010      2011      2012      2013      2014      2015
990     33,360   123,107   159,539   179,675   198,615   215,764   73,233
990EZ   15,500   63,253    82,066    93,769    104,425   114,822   60,967
990PF   2,352    25,275    34,597    39,936    45,870    52,617    34,387

[Data Dictionary] [Link to Dataset]
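One way to use an index like this is to tally filings by form type and year, then pull the XML URLs for a subset of interest. The sketch below is in Python rather than R. The column names (`ein`, `name`, `tax_year`, `form_type`, `url`) and the rows are hypothetical, so consult the Data Dictionary for the actual variable names.

```python
import csv
import io
from collections import Counter

# Hypothetical index rows; the real index has its own column names.
INDEX_CSV = """ein,name,tax_year,form_type,url
010000001,EXAMPLE FOOD BANK,2013,990,https://example.org/returns/1.xml
010000002,SAMPLE ARTS COUNCIL,2013,990EZ,https://example.org/returns/2.xml
010000003,DEMO FAMILY FOUNDATION,2014,990PF,https://example.org/returns/3.xml
010000001,EXAMPLE FOOD BANK,2014,990,https://example.org/returns/4.xml
"""


def load_index(text):
    """Read index rows from CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by_form_and_year(rows):
    """Tally how many returns exist for each (form type, tax year) pair."""
    return Counter((r["form_type"], r["tax_year"]) for r in rows)


def urls_for(rows, form_type, tax_year):
    """List the XML URLs for one form type in one tax year."""
    return [
        r["url"]
        for r in rows
        if r["form_type"] == form_type and r["tax_year"] == tax_year
    ]


index_rows = load_index(INDEX_CSV)
counts = count_by_form_and_year(index_rows)
for (form, year), n in sorted(counts.items()):
    print(form, year, n)
```

Filtering the index first means only the returns you need are downloaded, rather than the full set of XML files.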


(3) List of all Current Exempt Organizations (all orgs granted 501(c)(3) status)

IRS Publication 78 lists all organizations that currently have 501(c)(3) tax exempt status and are in good standing (eligible to receive tax-deductible donations) under IRS code.

[ Data Dictionary ] [ Link to Dataset ]


(4) Business Master File of All Current Exempt Orgs

The IRS Exempt Organization Business Master File Extract (EO BMF) contains information on all active nonprofits including basic information about nonprofit location, ruling date (when they were granted tax exempt status), and activities. Note that the NTEE codes are noisy and incomplete. It is recommended to use the NCCS codes instead.

[ Data Dictionary ] [ Link to Dataset ]
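A common first step with the ruling date is converting it into a usable date and an organization age. The Python sketch below assumes the ruling date arrives as a six-character `YYYYMM` string, with all zeros or blanks meaning "unknown". Check that assumption against the Data Dictionary before relying on it. The function names are illustrative only.

```python
from datetime import date


def parse_ruling(value):
    """Convert a YYYYMM string to the first day of that month, or None."""
    value = value.strip()
    if len(value) != 6 or not value.isdigit():
        return None
    year, month = int(value[:4]), int(value[4:])
    if year == 0 or not 1 <= month <= 12:
        return None
    return date(year, month, 1)


def org_age_years(ruling, as_of):
    """Whole years elapsed between the ruling date and a reference date."""
    before_anniversary = (as_of.month, as_of.day) < (ruling.month, ruling.day)
    return as_of.year - ruling.year - (1 if before_anniversary else 0)


ruling = parse_ruling("199004")
print(ruling, org_age_years(ruling, date(2016, 3, 31)))
```

Returning `None` for malformed values, instead of raising an error, lets missing ruling dates pass through as missing data.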


(5) All 990-N Postcard Filers

Most small tax-exempt organizations whose annual gross receipts are normally $50,000 or less can satisfy their annual reporting requirement by electronically submitting Form 990-N if they choose not to file Form 990 or Form 990-EZ instead. Exceptions to this requirement include:

The Postcard Filers dataset contains close to a million cases from the following years:

YEAR    2007     2008     2009     2010     2011     2012     2013     2014      2015      2016
COUNT   26,969   28,704   45,846   31,734   36,457   36,779   52,202   120,831   475,084   65,211


[ Data Dictionary ] [ Link to Dataset ]
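The yearly counts above can be checked against the "close to a million" figure with a few lines of Python. The numbers are copied from the table. The variable names are illustrative.

```python
# Postcard filer counts by year, copied from the table above.
POSTCARD_COUNTS = {
    2007: 26969,
    2008: 28704,
    2009: 45846,
    2010: 31734,
    2011: 36457,
    2012: 36779,
    2013: 52202,
    2014: 120831,
    2015: 475084,
    2016: 65211,
}

total = sum(POSTCARD_COUNTS.values())
peak_year = max(POSTCARD_COUNTS, key=POSTCARD_COUNTS.get)
shares = {year: n / total for year, n in POSTCARD_COUNTS.items()}

print(f"total cases: {total:,}")
print(f"peak year: {peak_year} ({shares[peak_year]:.1%} of all cases)")
```

The counts are heavily concentrated in 2015, so analyses that pool years should account for the uneven coverage.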


(6) All Organizations with a Revoked 501(c)(.) Status

Nonprofits that fail to file 990 returns for three consecutive years have their 501(c)(3) tax exempt status automatically revoked by the IRS. This dataset contains more than 670,000 cases for the following years:

YEAR    2010      2011     2012     2013     2014     2015     2016
COUNT   372,717   92,360   47,506   52,111   36,973   36,935   35,046


[ Data Dictionary ] [ Link to Dataset ]
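The three-year rule can be illustrated with a small Python sketch that scans an organization's filing history for a run of missed years. This is a deliberately simplified model: the actual IRS determination depends on filing due dates and on which return an organization is required to file. The function name and the example histories are hypothetical.

```python
def first_revocation_year(filed_years, start_year, end_year, limit=3):
    """Return the year of the `limit`-th consecutive missed filing, or None.

    filed_years: iterable of years in which a return was filed.
    start_year, end_year: inclusive range of years in which a filing was due.
    """
    filed = set(filed_years)
    missed = 0
    for year in range(start_year, end_year + 1):
        if year in filed:
            missed = 0  # any filing resets the streak
        else:
            missed += 1
            if missed == limit:
                return year
    return None


# Hypothetical histories: one organization stops filing, one files every other year.
print(first_revocation_year({2007, 2008}, 2007, 2012))
print(first_revocation_year({2007, 2009, 2011}, 2007, 2012))
```

Resetting the counter on each filing reflects the requirement that the missed years be consecutive, so an organization that files intermittently is not flagged by this sketch.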


(7) Organizations Granted Tax Exempt Status through 1023-EZ Form

This dataset contains information on nonprofits that have been granted tax-exempt status through the new 1023-EZ form, a more compact and simplified version of the original 1023 form. These data do not include organizations that filed for exempt status through the original 1023 form, nor those that filed via paper forms sent to the IRS through the mail. The forms and criteria for submitting a 1023-EZ can be found here:

[ 1023-EZ Documentation ]
[ 1023 Documentation ]

Current sample sizes are:

YEAR    2014     2015     2016
COUNT   15,160   42,392   47,557


[ Data Dictionary ] [ Link to Dataset ]


Additional Open Data Resources of Note

There are some additional interesting sources of nonprofit data that have the potential to be leveraged for future research:

County Level Measures of Social Capital

Religious Congregation Data

Marc Joffe’s Federal Audit Clearinghouse Harvester

Giving (and volunteering) in the Netherlands Panel Study (GINPS)

Notable APIs for Nonprofit Data

OpenCorporates Project

State of Indiana’s Audit Clearinghouse

Category Group                  Number Audited in 2015
4H-CLUB                         62
ART FOUNDATIONS/CENTERS         114
BIG BROTHERS/BIG SISTERS        6
BOYS & GIRLS CLUBS              31
CAP AGENCIES                    29
CEMETERY                        2
CHURCH                          40
CIVIC ORGANIZATIONS/CLUBS       8
COUNCIL ON AGING                43
COUNTY FAIR ORGANIZATION        21
CRISIS CENTER                   48
DAY CARE CENTER                 92
ECONOMIC DEVELOPMENT CORP.      128
EDUCATIONAL ORGANIZATION        185
ELECTRIC UTILITY                1
EMERGENCY MEDICAL SERVICES      34
EMPLOYMENT & TRAINING CORP.     30
FOR PROFIT CORPORATION          15
HANDICAPPED CENTER              37
HEALTH SERVICE ORGANIZATION     83
HISTORICAL SOCIETY              49
HOSPITALS                       50
HUMANE SOCIETY                  35
LEGAL AID                       3
LIBRARY                         5
MENTAL HEALTH ORGANIZATIONS     39
MENTALLY HANDICAPPED CTRS       40
ORCHESTRA/SYMPHONY/THEATRE      69
OTHER NOT-FOR-PROFIT            386
OTHER SPECIAL DISTRICT          1
PLANNED PARENTHOOD ASSOCIATION  1
RED CROSS                       1
REG WASTEWATER(SEWER) DISTRICT  1
SENIOR CITIZEN CENTER           33
TOURISM & PROMOTION BUREAU      56
UNITED WAY                      4
VETERANS ORGANIZATION           4
VOLUNTEER FIRE DEPARTMENT       442
WATER CORPORATION -NFP          1
YMCA/YWCA                       29
YOUTH SERVICE BUREAU            33
YOUTH SPORTS ORGANIZATION       13


Authors and Contributors

If you are interested in submitting resources or building tools to support nonprofit scholarship please contact Jesse Lecy (jdlecy@syr.edu) or Nathan Grasse (nathangrasse@cunet.carleton.ca).

Special thanks to Francisco Santamarina for his meticulous work decoding the IRS XML documents to translate the data into a useful format and creating the Data Dictionary at the heart of this project.

Open Science

This project was inspired by the R Open Science initiative, which believes in making data accessible and building tools that help a research community better utilize the data. These scripts are written in the R language because it is a freely available open-source platform that can be used by anyone.