eScience in Action

Identifying restricted data repositories supporting mediated access via data usage agreements


Abstract

In the modern era, the near impossibility of true anonymization means we must provide tangible recommendations for researchers who need to share de-identified, person-level data that could potentially be re-identified due to the presence of quasi-identifiers. This calls for data stewards to support researchers in depositing sensitive data in public repositories while still following institutional, ethical, and legal requirements.

While various repository aggregators like re3data and DataCite Repository Finder provide lists of data repositories, navigating these can be cumbersome when trying to locate options for depositing restricted data. These listings rarely include certain necessary details, which makes recommending third-party repositories to researchers time-consuming and can narrow the options considered, so we often end up relying on a short list of well-known repositories. An additional challenge is the difficulty of identifying repositories that mediate access via data usage agreements, where the repository handles access requests to ensure potential users meet established security and privacy requirements and have taken the necessary steps to protect confidentiality and commit to appropriate data use.

The need to provide tangible recommendations that help researchers deposit data in public repositories while still protecting individual privacy was the impetus for this project: to identify restricted data repositories with mediated access processes and compile them in a spreadsheet for researchers. This practical solution empowers data sharing while upholding essential ethical and institutional privacy requirements. The resource is currently limited to US-based social sciences repositories; in sharing it, we hope others will continue to contribute and expand this work.

Keywords: restricted data repositories, data usage agreements

How to Cite:

Oberlies, Mary K., and Megan Potterbusch. 2026. “Identifying restricted data repositories supporting mediated access via data usage agreements.” Journal of eScience Librarianship 15 (1): e1165. https://doi.org/10.7191/jeslib.1165.

Rights:

Copyright © 2026 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published on
2026-01-30

Peer Reviewed

Introduction

Responding to researcher requests for assistance with locating data repositories is an important element of research data services. To effectively respond to scenarios like, “I want to (or must) share my data, but it can’t be fully anonymized and remain valuable for research,” it is necessary to determine what repository options exist for sensitive data. A call from the Data Services Continuing Professional Education (DSCPE) program for capstone projects provided the opportunity to create a list of data sharing repositories that allow for the sharing of restricted data via data use agreements (DUAs) mediated by the repository itself. The DUA review process typically ensures users have a genuine research need, are appropriately trained in privacy/security, and have access to the appropriate infrastructure to safely manage the data.

Project Criteria

Without a DUA process implemented by a data repository, data are typically shared either with no mediation or with basic researcher mediation, in which access requests are facilitated by a fully automated process within a repository or by email. Both approaches place considerable responsibility on the corresponding author or data depositor and are largely inappropriate for data containing sensitive variables. DUAs are not perfect solutions, nor are they the only way to achieve properly mediated access to sensitive data: they are a friction point for data access and can create significant workloads that do not always align with staffing. DUAs are, however, a standard mechanism for facilitating data sharing that respects terms and conditions. They allow the review of requests to be taken on by repositories designed for this kind of data access, instead of leaving the onus on individual researchers and their institutions.

The first step in generating a spreadsheet of data repositories that accept sensitive data and offer mediated DUAs was brainstorming known repositories such as ICPSR, QDR, Figshare, and Vivli, followed by identifying lists of data repositories compiled by various agencies, institutions, and publishers. The following data repository lists were reviewed:

  • OSF Approved Protected Access Repositories

  • NIH Generalist Repositories

  • Simmons Data Repositories

  • Linguistic Data Consortium

  • Springer Nature Recommended Repositories

  • CESSDA

Repositories listed on these sites that were marked as accepting sensitive data were added to a spreadsheet for further investigation. Based on the stakeholder needs of the host institution, repositories needed to be based in the United States and have a social science or multidisciplinary focus.
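The screening rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the focus labels, and the example entries are hypothetical and are not taken from the project spreadsheet.

```python
from dataclasses import dataclass


# Hypothetical record layout; the project spreadsheet may use different columns.
@dataclass
class Candidate:
    name: str
    country: str             # assumed two-letter code, e.g. "US"
    focus: str               # assumed label, e.g. "social science"
    accepts_sensitive: bool  # marked as accepting sensitive data on a source list


# Disciplinary scopes named in the project criteria.
IN_SCOPE_FOCUS = {"social science", "multidisciplinary"}


def in_scope(repo: Candidate) -> bool:
    """Return True when a candidate meets all three stated criteria."""
    return (
        repo.accepts_sensitive
        and repo.country == "US"
        and repo.focus.lower() in IN_SCOPE_FOCUS
    )


# Made-up placeholder entries, not real repositories.
candidates = [
    Candidate("Example Repo A", "US", "social science", True),
    Candidate("Example Repo B", "DE", "social science", True),
    Candidate("Example Repo C", "US", "medical", True),
    Candidate("Example Repo D", "US", "multidisciplinary", False),
]

shortlist = [c.name for c in candidates if in_scope(c)]
print(shortlist)
```

Keeping the criteria in a single function makes it straightforward for another institution to swap in its own country or disciplinary scope when extending the list.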

Creating a Repository List

The initial review of the repositories focused on confirming the repository name, website address (or URL), and whether it accepted sensitive data. Review then moved to determining each repository’s disciplinary focus, noting whether it provided a mediated DUA, and creating a repository description. Some resources listed in these aggregators were defunct, did not accept data deposits, or were datasets rather than repositories. For others, it was not initially possible to determine the type of data protection process used by the repository. This required additional investigation into the repository documentation, and in some cases, reaching out directly to the repository administrators proved necessary.

Ultimately, 16 US-based data repositories that accept sensitive data within the social sciences and offer mediated access via DUAs were identified and organized into a spreadsheet. To improve the usability of the spreadsheet for data services librarians assisting researchers, additional details were needed. This led to including information about whether there were costs to deposit and/or access data, the file formats accepted, the curation requirements, and the process to deposit data. This resource is available on OSF at https://osf.io/k9u5x for others to expand on and use in the course of their own work.
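For readers who want to extend the resource programmatically, the details described above could be captured as a simple tabular record. The sketch below is one possible layout; the column names and the example row are assumptions for illustration and do not reproduce the actual spreadsheet on OSF.

```python
import csv
import io

# Illustrative column names based on the details described in the text;
# the real spreadsheet may label or order them differently.
COLUMNS = [
    "name", "url", "accepts_sensitive_data", "disciplinary_focus",
    "mediated_dua", "description", "deposit_cost", "access_cost",
    "file_formats", "curation_requirements", "deposit_process",
]


def to_csv(rows):
    """Serialize repository records to CSV text, leaving unknown details blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A made-up placeholder row; details still under investigation are simply omitted.
rows = [
    {
        "name": "Example Repository",
        "url": "https://repository.example",
        "accepts_sensitive_data": "yes",
        "mediated_dua": "yes",
    }
]

csv_text = to_csv(rows)
print(csv_text)
```

Leaving unknown cells blank rather than guessing mirrors the review process, where some details could only be confirmed by consulting repository documentation or contacting administrators.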

Documenting the Process

Alongside the spreadsheet of data repositories, each step of the process was documented. This documentation will help anyone who wishes to expand the resource to understand the decisions taken, the current organizational structure, and general terminology. Data repositories lack a shared vocabulary, which proved particularly challenging during this project when parsing out important details about DUAs and restricted data. To overcome this issue, a glossary of terms was included in the documentation to create a shared vocabulary for anyone using the resource.

Project Limitations

While re3data, DataCite, and FAIRsharing are all popular and extensive lists, the number of repositories listed in them exceeded what could be reviewed within the time constraints of the project. The Restricted Access Repositories spreadsheet also only provides additional information for repositories matching the project criteria, which limits its usefulness for researchers outside the social sciences. Beyond reviewing the repository lists excluded from this project, there is significant value to be added to this resource by expanding the details related to science, humanities, and medical research data sharing, which were generally out of scope for the initial iteration of the project.

Conclusion

In our repository review, ICPSR, a generalist repository with mediated data deposit, stood out as an excellent repository to consider for social science researchers looking to archive sensitive data. The other data repositories outlined in our resource are discipline-specific to varying degrees and would be great choices if a researcher’s dataset aligns with their focus.

Although many researchers never need to deposit sensitive data, for those who do, resources that support this need are pivotal for their success. Funders that require data sharing are no longer as quick to accept the impossibility of sharing sensitive data and instead are looking for researchers to make decisions that will support sharing in some way, whether by developing fully anonymous datasets or by depositing sensitive data into a repository with appropriate policies, processes, and security infrastructure to support proper management of the data.