Finding and Using Datasets for Research

Not all researchers collect new data during the course of their investigation. Many find new uses for data previously collected by other researchers, or integrate existing data and newly collected data into research projects. Data that is openly available for redistribution and reuse is called open data. Reuse of data available through open repositories reduces the need for duplication of effort, saves time and money, and adds value. For more on open data, visit the Open Data Factsheet from SPARC.

There are plenty of places where you can find datasets to use in your research. Many funders, such as the NSF, NEH, Bill & Melinda Gates Foundation, Institute of Education Sciences, and others, require that data collected during funded research be made public after the project’s completion. The U.S. government also makes much of the data it generates publicly available. These (and other) datasets are available for reuse, whether to reproduce results or for a different purpose. Here are just a few places to find open datasets.

ICPSR – Inter-university Consortium for Political and Social Research. The University of South Carolina is a member institution, and USC-affiliated users can download datasets. For more on creating an account with ICPSR, click here.

Re3data.org – This searchable directory lists data repositories by discipline. It’s a useful way to find the archives containing the types of data needed for specific projects.

Figshare – Search or browse datasets from a number of disciplines.

Data Citation Index – A subscription database available from USC’s University Libraries, DCI provides the Web of Science search interface to find datasets from repositories around the world.

Still want more? See additional lists here or here or here.

You’ll need to find and understand the terms of use for the dataset you want to use. Open datasets should allow for reuse and distribution by anyone. Usage terms are commonly provided through a Creative Commons license. They may require that the user provide attribution and license their resulting dataset under the same terms. More information about Creative Commons licenses can be found here.

It’s important to cite the dataset you use in order to give the researchers who collected it credit for their work. Although no single formal style for data citation exists, the format recommended by DataCite is:

Creator (PublicationYear). Title. Publisher. Identifier

Or, if version and type of resource are available:

Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier

Here’s an example:

Allegrucci, Giuliana; Sbordoni, Valerio; Cesaroni, Donatella (2015): Polymorphism estimates in sampled populations of Dolichopoda cave crickets. Figshare. http://dx.doi.org/10.1371/journal.pone.0122456.t002

Many repositories provide citations for their datasets that you can copy and paste or download.
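
If a repository does not supply a ready-made citation, the recommended format above can be assembled from its parts. The following is a minimal Python sketch, not an official tool: the class name DatasetCitation, the function name format_citation, and the field names are all hypothetical, chosen to mirror the elements of the format. It assumes creators are joined with semicolons, as in the example above, and that each element ends with a period.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Elements of the recommended data citation format (illustrative names)."""
    creators: List[str]
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None        # optional element
    resource_type: Optional[str] = None  # optional element


def format_citation(c: DatasetCitation) -> str:
    """Build 'Creator (PublicationYear). Title. [Version.] Publisher. [ResourceType.] Identifier'."""
    # Assumption: multiple creators are separated by semicolons.
    creators = "; ".join(c.creators)

    # Collect the middle elements in order, skipping optional ones that are absent.
    fields = [c.title]
    if c.version:
        fields.append(c.version)
    fields.append(c.publisher)
    if c.resource_type:
        fields.append(c.resource_type)

    # End every element with exactly one period.
    body = " ".join(f"{field.rstrip('.')}." for field in fields)
    return f"{creators} ({c.publication_year}). {body} {c.identifier}"


# The cave cricket dataset cited above, rebuilt from its parts.
example = DatasetCitation(
    creators=["Allegrucci, Giuliana", "Sbordoni, Valerio", "Cesaroni, Donatella"],
    publication_year=2015,
    title="Polymorphism estimates in sampled populations of Dolichopoda cave crickets",
    publisher="Figshare",
    identifier="http://dx.doi.org/10.1371/journal.pone.0122456.t002",
)
print(format_citation(example))
```

This sketch follows the template literally, so it places a period after the year where the Figshare example uses a colon. Adjust the punctuation to match whatever style your publisher or discipline expects.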

If you would like assistance finding a dataset to use in your research, I’d love to help. Please contact me at winches2@mailbox.sc.edu, 803-777-1968, or make an appointment with me at http://libcal.library.sc.edu/appointment/31854.

-Contributed by Stacy Winchester
