The Data Science Toolkit

Data science is an emerging field that combines data analysis with computer programming to provide practical insights for organizations and social problems.

To accelerate and improve the analytical process, Pete Warden created a website called The Data Science Toolkit, which consisted of useful APIs and tools for working with data. We have borrowed the idea to provide some tools for using data in nonprofit scholarship.

If you wish to embrace the open science movement, which values transparent and reproducible research, and if you wish to leverage new and interesting data sources for nonprofit scholarship, then you will need to start building your own data science toolkit.




Data Programming

R is one of the fastest-growing computer languages in the data science community. Learning a little R will take you a long way.

Check out R in 60 Seconds.

You can find some useful cheat sheets here.

Try this tutorial or this free intro course.






Data-Driven Documents

R Markdown is a simple convention developed to integrate regular text with computer code, so that results are embedded directly in the document and publication-quality formatting can be applied.
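To make the idea concrete, here is a minimal sketch of an R Markdown file written out from R. The file name, title, and use of the built-in `cars` dataset are illustrative assumptions, not part of any particular project. The text line contains inline R code, and the fenced chunk holds a block of R code; both are replaced by their results when the document is rendered.

```r
# Write a tiny R Markdown document to a temporary file.
# The title, file name, and dataset are illustrative choices only.
fence <- strrep("`", 3)  # the three backticks that open and close a code chunk

rmd <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  # Inline code: the value is computed and inserted into the sentence.
  "The average speed in the built-in `cars` data is `r mean(cars$speed)` mph.",
  "",
  # Code chunk: the code and its printed output appear in the report.
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, path)

# Rendering requires the rmarkdown package and pandoc to be installed:
# rmarkdown::render(path)
```

Changing `output:` in the header is how you would switch between the different output formats.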

See how the Urban Institute is using markdown files to automate report generation: [ EXAMPLE ].

Check out some of the output format options R Markdown supports, such as publication-quality documents, HTML web pages, slide shows, or dashboards.

The R Markdown Cheat Sheet is a useful reference.






Data APIs

An Application Programming Interface (API) is a standard that allows two programs to speak to each other. Many data publishers have developed APIs that allow you to send a request for data through R and receive the data directly in R, without having to go to a website, download files, and then import them into R.

Check out this tutorial for an example of how the Census API can simplify your life by allowing you to query Census datasets and return your desired set of variables, aggregated by your desired geographic unit.
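As a rough sketch of what such a request looks like, the code below assembles a Census API query URL in base R. The helper names `build_census_url` and `rows_to_data_frame` are hypothetical, and the year, dataset, variable code, and geography are assumptions chosen for illustration; the linked tutorial may take a different approach, for example through a dedicated R package.

```r
# Build a Census API request URL from its parts (base R only).
# Year, dataset, variables, and geography are illustrative assumptions.
build_census_url <- function(year, dataset, variables, geography) {
  base  <- sprintf("https://api.census.gov/data/%s/%s", year, dataset)
  query <- sprintf("get=%s&for=%s", paste(variables, collapse = ","), geography)
  paste0(base, "?", query)
}

# The API answers with a JSON array of rows whose first row holds column names.
# Once parsed into a character matrix, promote that first row to a header.
rows_to_data_frame <- function(m) {
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- m[1, ]
  df
}

url <- build_census_url(
  year      = 2019,
  dataset   = "acs/acs5",                  # American Community Survey, 5-year
  variables = c("NAME", "B01001_001E"),    # B01001_001E: total population estimate
  geography = "state:*"                    # one row per state
)
print(url)

# Sending the request needs an internet connection and the jsonlite package:
# raw <- jsonlite::fromJSON(url)   # character matrix, header in row 1
# pop <- rows_to_data_frame(raw)
# head(pop)
```

The request itself is just a URL: the variables you want go after `get=` and the geographic unit goes after `for=`. Heavier use of the Census API may require registering for an API key.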