The IRS maintains several important nonprofit databases that track the current population of exempt organizations, their annual 990 filings, and organizations that have closed. This data has been released in formats that are not always easy to use: ASCII text files, JSON files, and XML queries.
To make the data accessible to the research community, we have created scripts that download data from IRS websites, clean and process it, and export it into familiar formats (CSV, Stata, SPSS, etc.).
We have begun cataloging and documenting these resources, and will begin sharing them through the Nonprofit Open Data Collective portal:
The most up-to-date files are posted on Data World.
Some legacy files are also available on our Dataverse site.
We have documented and posted the following open data assets:
The National Center for Charitable Statistics at the Urban Institute has opened up its data archives!
The IRS has released all nonprofit 990 tax data that has been e-filed through its online system, approximately 60-65% of all 990-PC and 990-EZ filers. It is available from 2012 to the present, with a small set of returns available for 2010 and 2011. The data has been posted as XML files on Amazon Web Services (AWS) cloud servers. More details about the data and the push to have it made public are below.
In order to support use of this data, we have converted the XML files into a research database similar to the NCCS Core dataset.
You can download the data in CSV and RDS formats here: [ Data Dictionary ] [ Link to Dataset ]
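As a rough illustration of what that conversion involves, here is a minimal Python sketch (not our R code) that flattens a single e-filed return into one row of a table. The sample XML is invented, and the `http://www.irs.gov/efile` namespace and element paths such as `ReturnHeader/Filer/EIN` and `IRS990/CYTotalRevenueAmt` are assumptions based on recent IRS schema versions. Element names change across schema years, so a real converter needs a mapping for each version.

```python
import xml.etree.ElementTree as ET

# Namespace declared by IRS e-file returns (assumption: verify against the
# schema version of the returns you download).
NS = {"efile": "http://www.irs.gov/efile"}

# A tiny, hypothetical return used only for illustration.
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile" returnVersion="2015v2.1">
  <ReturnHeader>
    <TaxYr>2015</TaxYr>
    <ReturnTypeCd>990</ReturnTypeCd>
    <Filer>
      <EIN>123456789</EIN>
      <BusinessName>
        <BusinessNameLine1Txt>EXAMPLE COMMUNITY FUND</BusinessNameLine1Txt>
      </BusinessName>
    </Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>250000</CYTotalRevenueAmt>
    </IRS990>
  </ReturnData>
</Return>"""

# Output column -> path relative to the <Return> root (assumed element names).
FIELDS = {
    "ein": "efile:ReturnHeader/efile:Filer/efile:EIN",
    "name": "efile:ReturnHeader/efile:Filer/efile:BusinessName/efile:BusinessNameLine1Txt",
    "tax_year": "efile:ReturnHeader/efile:TaxYr",
    "form_type": "efile:ReturnHeader/efile:ReturnTypeCd",
    "total_revenue": "efile:ReturnData/efile:IRS990/efile:CYTotalRevenueAmt",
}


def flatten_return(xml_text: str) -> dict:
    """Flatten one XML return into a dict, i.e. one row of a research table."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, path in FIELDS.items():
        node = root.find(path, NS)
        # Missing elements become None instead of raising an error.
        row[column] = node.text.strip() if node is not None and node.text else None
    return row


if __name__ == "__main__":
    print(flatten_return(SAMPLE_XML))
```

Because missing elements come back as `None`, returns of other form types (for example a 990-EZ, which has no `IRS990` section) still produce a row with blanks rather than stopping the whole run.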
For some background on the campaigns to open access to IRS data, see these articles and blogs:
Form 990: A Guide for Newcomers to Nonprofit Research [ LINK ]
A History of the Tax Exempt Sector: An SOI Perspective [ LINK ]
A Guided Tour of the 990 Form by GuideStar [ LINK ]
Revised Form 990: The Evolution of Governance and the Nonprofit World [ LINK ]
Wikipedia: History of the 990 [ LINK ]
Charity Navigator has created an open-source 990 Toolkit that allows you to set up an Amazon EC2 instance and clone the full IRS dataset as a relational database. You can read their press release about the project here.
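The core idea of a relational 990 database is that organizations and their yearly filings live in separate tables linked by EIN. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module rather than the toolkit's own setup; the table layout and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for parsed 990 returns:
# (EIN, name, tax year, form type, total revenue)
FILINGS = [
    ("123456789", "EXAMPLE COMMUNITY FUND", 2014, "990", 240000),
    ("123456789", "EXAMPLE COMMUNITY FUND", 2015, "990", 250000),
    ("987654321", "SAMPLE YOUTH LEAGUE", 2015, "990EZ", 90000),
]


def build_database(rows):
    """Load filings into an in-memory SQLite database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organizations (
            ein  TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE filings (
            ein           TEXT NOT NULL REFERENCES organizations(ein),
            tax_year      INTEGER NOT NULL,
            form_type     TEXT NOT NULL,
            total_revenue INTEGER,
            PRIMARY KEY (ein, tax_year, form_type)
        );
        """
    )
    # One organization row per EIN, even if it filed in several years.
    conn.executemany(
        "INSERT OR IGNORE INTO organizations (ein, name) VALUES (?, ?)",
        [(ein, name) for ein, name, *_ in rows],
    )
    conn.executemany(
        "INSERT INTO filings VALUES (?, ?, ?, ?)",
        [(ein, year, form, rev) for ein, _, year, form, rev in rows],
    )
    return conn


if __name__ == "__main__":
    conn = build_database(FILINGS)
    query = """
        SELECT o.name, f.tax_year, f.total_revenue
        FROM filings AS f JOIN organizations AS o USING (ein)
        WHERE f.tax_year = 2015
        ORDER BY f.total_revenue DESC
    """
    for row in conn.execute(query):
        print(row)
```

Keeping names in their own table means a name change or typo is fixed once, while every yearly filing still joins back to the right organization.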
Chad Kruse at SmarterGiving has posted a script on GitHub (here) that converts 990-PF XML files into a MongoDB database.
For running queries directly in the cloud and downloading the results as CSV files, some useful scripts are available, for example this GitHub gist.
If you are more comfortable in Python, check out Yash Nanavati’s GitHub repo.
There are also some forums on using the E-Filer data, for example this Reddit forum.
We provide an R script that builds the INDEX file (not the full dataset) for all IRS E-Filer open data hosted on Amazon Web Services. The index contains a limited number of variables, such as nonprofit name, EIN, tax year, form type, and the URL of the XML version of each 990 return. This index file lets you see what is available in the open E-Filer database.
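The index can answer "what is available?" questions without downloading any returns. The Python sketch below (not our R script) works on a few invented rows; the column names are modeled on the IRS index CSVs and the URL pattern is an assumption to check against the files you actually use. It also converts the tax period (the YYYYMM month in which the fiscal year ends) into a tax year, assuming a 12-month accounting period labeled by the year in which it begins.

```python
import csv
import io
from collections import Counter

# Hypothetical index excerpt; column names are modeled on the IRS index CSVs.
SAMPLE_INDEX = """EIN,TAXPAYER_NAME,TAX_PERIOD,RETURN_TYPE,OBJECT_ID
123456789,EXAMPLE COMMUNITY FUND,201512,990,201600000000000001
987654321,SAMPLE YOUTH LEAGUE,201512,990EZ,201600000000000002
111111111,SAMPLE ARTS COUNCIL,201506,990,201500000000000004
555555555,EXAMPLE FAMILY FOUNDATION,201412,990PF,201500000000000003
"""

# Assumed URL pattern for individual XML returns; treat it as a placeholder.
URL_TEMPLATE = "https://s3.amazonaws.com/irs-form-990/{object_id}_public.xml"


def tax_year(tax_period: str) -> int:
    """Map a YYYYMM period end to the year in which a 12-month period began."""
    year, month = int(tax_period[:4]), int(tax_period[4:6])
    return year if month == 12 else year - 1


def load_index(text: str) -> list:
    """Parse index rows, adding a tax year and an XML URL to each."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["TAX_YEAR"] = tax_year(rec["TAX_PERIOD"])
        rec["URL"] = URL_TEMPLATE.format(object_id=rec["OBJECT_ID"])
        rows.append(rec)
    return rows


def availability(rows: list) -> Counter:
    """Count available returns by (tax year, form type)."""
    return Counter((r["TAX_YEAR"], r["RETURN_TYPE"]) for r in rows)


if __name__ == "__main__":
    for (year, form), n in sorted(availability(load_index(SAMPLE_INDEX)).items()):
        print(year, form, n)
```

Note that a fiscal year ending in June 2015 is counted under 2014 here, so grouping by the raw first four digits of the tax period would misplace non-calendar-year filers.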
[Data Dictionary] [Link to Dataset]
IRS Publication 78 lists all organizations that currently have 501(c)(3) tax-exempt status and are in good standing (eligible to receive tax-deductible donations) under the Internal Revenue Code.
The IRS Exempt Organization Business Master File Extract (EO BMF) contains information on all active nonprofits, including basic information about location, ruling date (when tax-exempt status was granted), and activities. Note that the NTEE codes in the BMF are noisy and incomplete; we recommend using the NCCS codes instead.
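Before relying on BMF fields, it helps to flag unusable values. The Python sketch below parses the ruling date (stored as YYYYMM) and marks NTEE codes that are blank or do not look like a code; rows that fail the check are candidates for substituting NCCS codes. The sample rows are invented, the column names follow the EO BMF extract, and both the code pattern and the treatment of all-zero ruling dates as missing are our assumptions, not IRS rules.

```python
import csv
import io
import re

# Hypothetical BMF rows; RULING is stored as YYYYMM.
SAMPLE_BMF = """EIN,NAME,STATE,RULING,NTEE_CD
123456789,EXAMPLE COMMUNITY FUND,NY,199807,T20
987654321,SAMPLE YOUTH LEAGUE,IN,201405,
555555555,EXAMPLE FAMILY FOUNDATION,OH,000000,t2
"""

# Rough plausibility check (assumption): a major-group letter, two digits,
# and an optional extra character. It flags suspect codes; it does not
# validate them against the official NTEE list.
NTEE_PATTERN = re.compile(r"^[A-Z][0-9]{2}[A-Z0-9]?$")


def clean_bmf(text: str) -> list:
    """Add RULING_YEAR (int or None) and NTEE_OK (bool) to each BMF row."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        ruling = rec["RULING"].strip()
        year = int(ruling[:4]) if len(ruling) == 6 and ruling.isdigit() else 0
        # Treat all-zero or malformed dates as missing (assumption).
        rec["RULING_YEAR"] = year if year > 0 else None
        rec["NTEE_OK"] = bool(NTEE_PATTERN.match(rec["NTEE_CD"].strip()))
        rows.append(rec)
    return rows


if __name__ == "__main__":
    for rec in clean_bmf(SAMPLE_BMF):
        print(rec["EIN"], rec["RULING_YEAR"], rec["NTEE_OK"])
```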
Most small tax-exempt organizations whose annual gross receipts are normally $50,000 or less can satisfy their annual reporting requirement by electronically submitting Form 990-N if they choose not to file Form 990 or Form 990-EZ instead. Exceptions to this requirement include:
The Postcard Filers dataset contains close to a million cases from the following years:
Organizations that fail to file a required annual return (Form 990, 990-EZ, 990-PF, or 990-N) for three consecutive years have their tax-exempt status automatically revoked by the IRS. This dataset contains more than 670,000 cases for the following years:
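Because revocation is triggered by three consecutive missed years, organizations at risk can be flagged from a record of the years in which they filed. The Python sketch below is a simplification: the function name is hypothetical, it assumes a return was required in every year of the window, and it reports the year of the third consecutive missed return (the IRS makes revocation effective as of the filing due date of that third return).

```python
def revocation_year(filing_years, first_required_year, last_year):
    """Return the year in which a run of three consecutive missed returns
    completes, or None if no such run occurs in the window.

    Simplified sketch: assumes a return was required every year from
    first_required_year through last_year.
    """
    filed = set(filing_years)
    missed_in_a_row = 0
    for year in range(first_required_year, last_year + 1):
        if year in filed:
            missed_in_a_row = 0  # a filed return resets the count
        else:
            missed_in_a_row += 1
            if missed_in_a_row == 3:
                return year
    return None


if __name__ == "__main__":
    # 2012, 2013 and 2014 were all missed, so this prints 2014.
    print(revocation_year([2010, 2011, 2015], 2010, 2016))
```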
This dataset contains information on nonprofits that have been granted tax-exempt status through the new Form 1023-EZ, a compact, simplified version of the original Form 1023. These data do not include organizations that applied for exempt status through the original Form 1023, nor those that submitted paper applications to the IRS by mail. The form and the criteria for submitting a 1023-EZ can be found here:
Current sample sizes:
There are some additional interesting sources of nonprofit data that have the potential to be leveraged for future research:
County Level Measures of Social Capital
Religious Congregation Data
Marc Joffe’s Federal Audit Clearinghouse Harvester
Giving (and volunteering) in the Netherlands Panel Study (GINPS)
Notable APIs for Nonprofit Data
State of Indiana’s Audit Clearinghouse
| Category Group | Number Audited in 2015 |
|---|---:|
| BIG BROTHERS/BIG SISTERS | 6 |
| BOYS & GIRLS CLUBS | 31 |
| COUNCIL ON AGING | 43 |
| COUNTY FAIR ORGANIZATION | 21 |
| DAY CARE CENTER | 92 |
| ECONOMIC DEVELOPMENT CORP. | 128 |
| EMERGENCY MEDICAL SERVICES | 34 |
| EMPLOYMENT & TRAINING CORP. | 30 |
| FOR PROFIT CORPORATION | 15 |
| HEALTH SERVICE ORGANIZATION | 83 |
| MENTAL HEALTH ORGANIZATIONS | 39 |
| MENTALLY HANDICAPPED CTRS | 40 |
| OTHER SPECIAL DISTRICT | 1 |
| PLANNED PARENTHOOD ASSOCIATION | 1 |
| REG WASTEWATER(SEWER) DISTRICT | 1 |
| SENIOR CITIZEN CENTER | 33 |
| TOURISM & PROMOTION BUREAU | 56 |
| VOLUNTEER FIRE DEPARTMENT | 442 |
| WATER CORPORATION -NFP | 1 |
| YOUTH SERVICE BUREAU | 33 |
| YOUTH SPORTS ORGANIZATION | 13 |
If you are interested in submitting resources or building tools to support nonprofit scholarship, please contact Jesse Lecy (email@example.com) or Nathan Grasse (firstname.lastname@example.org).
Special thanks to Francisco Santamarina for his meticulous work decoding the IRS XML documents to translate the data into a useful format and creating the Data Dictionary at the heart of this project.
This project was inspired by the R Open Science initiative, which promotes making data accessible and building tools that help a research community make better use of it. These scripts are written in R because it is a free, open-source platform that anyone can use.