Introduction
Repositories have become an essential part of a robust research data management strategy. Depositing research data in a good-quality repository is widely considered good practice (Briney et al. 2020; Goodman et al. 2014): it supports the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data (Wilkinson et al. 2016; Stall et al. 2019), and it amplifies the value of research by allowing others to reuse the data for new purposes (Downs 2021). Many research funders require or strongly encourage depositing research data in repositories. For example, the National Institutes of Health (NIH) in the United States “strongly encourages the use of established repositories to the extent possible for preserving and sharing scientific data” in its Policy for Data Management and Sharing (National Institutes of Health 2023), and many funders throughout the world have enacted similar policies. Research also suggests that depositing data in repositories benefits researchers in terms of citation counts and increased exposure and discoverability (Demetres et al. 2020).
In the last few decades, the number of repositories available for depositing research data has increased rapidly worldwide. The Registry of Research Data Repositories, re3data.org, one of the most widely used research data repository registries in the world, listed 400 repositories in 2013, a year after its inception (Pampel et al. 2013). At the time of this writing (August 2025), the same registry lists 3,415 repositories. For a researcher, finding the appropriate repository for their data is an important task. Repositories make data findable, provide access to the files, and ensure that the data and metadata are captured appropriately so that the data can be reused effectively and preserved long term. Discipline-specific repositories are generally recommended because they make the data more findable and because they document, describe, and organize the data using the discipline’s established standards and formats. However, researchers often need to deposit their data in generalist repositories when a repository for their discipline is not available or practical, so discipline-agnostic repositories are essential to ensure that all research can be shared, documented, and preserved appropriately. Two-thirds of the repositories in the re3data.org registry are disciplinary (Khan et al. 2024); the rest are a mix of multidisciplinary, governmental, institutional, and other repositories that are generally discipline agnostic.
Given that researchers can choose from a variety of repositories to store their data, in this article we seek to better understand how researchers make that decision and what they value about their chosen data repository, and to apply this information to our own repository development. In particular, we are interested in understanding the reasons that lead a researcher to deposit their data into an institutional repository instead of other commonly used repositories such as the GREI repositories, that is, the repositories identified by the Generalist Repository Ecosystem Initiative (GREI): Dataverse, Dryad, Figshare, Mendeley Data, Open Science Framework, Vivli, and Zenodo. These repositories all host hundreds to thousands of datasets, are well funded, are generally considered trustworthy, and offer a variety of services (Stall et al. 2023). Institutional repositories (IRs) are repositories that support a specific community linked to a particular institution (for example, a research center or university). IRs can be research data repositories, but they can also accept other types of content, such as articles or presentations. Often these repositories are smaller, with limited resources and staff, and cannot offer the same features and functionality as the GREI repositories. At the same time, their smaller size and their closer knowledge of the needs of the community they serve allow them to offer services that are more personalized than what the GREI repositories can provide (Callicott et al. 2016).
In order for institutional repositories to maximize the impact of their limited resources, it is important that they understand their own strengths, so that they can emphasize characteristics or services that make them especially valuable to users. It is also important for IRs to recognize the features that repository users appreciate when they decide where to deposit their data, so that they can make sure that these features are present and well-supported. In this article, we attempt to answer these questions for Oregon State University and its institutional repository, ScholarsArchive@OSU. Oregon State University Libraries and Press (OSULP) has actively collected datasets in its institutional repository since 2015. As stated in the Research Data Curation Policy for the IR, “The primary purpose of hosting data in ScholarsArchive@OSU is to facilitate data sharing for the purposes of data reuse and contributing to open science, and to preserve the data” (Oregon State University Libraries and Press 2023). To date, Oregon State University (OSU) students, faculty, and staff have submitted over 200 datasets to the repository. OSULP joins many university libraries in devoting crucial personnel and infrastructure to curate datasets and support researchers in understanding where and how to both preserve their data and make it openly available.
Our hypotheses are:
Researchers are making informed, deliberate decisions about where to deposit their data.
ScholarsArchive@OSU, as an institutional repository, is filling a need different from other existing discipline-agnostic repositories.
The strength of the OSU institutional repository is the services offered rather than the software.
To test these hypotheses, we identified researchers from Oregon State University who had deposited at least one dataset into a generalist data repository and, through a survey and interviews, asked them how they chose where to deposit their data. We also asked the researchers who had deposited into OSU’s repository, ScholarsArchive@OSU, about their experience with it.
Ultimately, the goal of this work is to use the information about researchers’ needs and wants regarding institutional repositories to make decisions. In order to generalize our findings, which represent the opinions of the users of only one repository, this article proposes three “personas” that can be used by others as a guide to make design choices and establish priorities for their institutional repositories.
Literature Review
Choosing a good data repository is an essential step to ensure the discoverability and reusability of research data. With differing requirements from funders and publishers, the criteria for identifying high-quality, trusted repositories can sometimes be confusing, but there have been multiple efforts to generate recommendations and guidance for evaluating the quality of repositories. We summarize here the guidance from the three repository quality guidelines and standards that have been most relevant to the development of Oregon State University’s institutional repository (Key and Llebot 2023): the FAIR principles, the CoreTrustSeal certification, and the document from the Office of Science and Technology Policy outlining desirable repository characteristics for federally funded research. Much other work has been done on this topic, however, and other existing efforts are worth mentioning, such as the TRUST principles for data repositories (Lin et al. 2020), the COAR Community Framework for Best Practices in Repositories (Confederation of Open Access Repositories 2020), and the work of Cannon et al. (2021).
The FAIR principles (Wilkinson et al. 2016) have emerged as the global guidance for sharing research data and are widely used among research data professionals. They have been applied not only to the data itself but also to guide the development of a FAIR ecosystem comprising research infrastructure and policy (Koers et al. 2020a), including defining the characteristics of good-quality repositories. A couple of years after the publication of the FAIR principles, Dunning et al. (2018) examined how existing repositories needed to be adjusted to adhere to them, and described a few concrete actions repositories can take to encourage FAIR data: having a policy for persistent identifiers; curating metadata to ensure reusability, as well as checking for metadata standards; and maintaining clear and open licensing and use cases. The FAIR principles are broad and aspirational by nature, so, almost ten years after their publication, it is difficult to determine exactly how FAIR repositories are, but a lot of work has been done to define how the principles apply specifically to repositories (Koers et al. 2020b). One of the most useful initiatives is the FAIRsharing resource (Sansone et al. 2019), an effort to document and organize the information about repositories (and also standards and policies) necessary to evaluate how well they align with the FAIR principles. For example, by searching for a repository in FAIRsharing, one can learn which organizations maintain and fund it, which standards it uses, which tools it supports, and which policies govern or affect its use.
Another way of assessing the quality of repositories is through certification schemes. The landscape of available certifications has evolved over the last two decades (Downs 2021), but at the time of writing there are two major certification mechanisms: the ISO 16363 Audit and Certification of Trustworthy Digital Repositories (International Organization for Standardization 2025) and the CoreTrustSeal (CoreTrustSeal Standards and Certification Board 2019). The first involves a full external audit of the repository, while the second is a self-evaluation that is reviewed by peers and relies on publicly available evidence. The CoreTrustSeal certification includes criteria related to the organizational infrastructure of the repository (e.g., mission and scope, governance, expertise of staff), digital object management (e.g., preservation plan, quality assurance, reuse), and information technology and security (e.g., storage and integrity) (CoreTrustSeal Standards and Certification Board 2022).
For repositories in the United States, such as ScholarsArchive@OSU, it is also useful to look at the criteria that federal funders expect repositories to meet. In 2022, the White House issued a guidance document, the “Desirable characteristics of data repositories for federally funded research” (The National Science and Technology Council 2022), to ensure that all federal funders apply consistent criteria when recommending or mandating repositories for sharing and preserving research data. The document includes 14 criteria for all repositories and seven additional considerations for repositories storing human data. The characteristics outlined in the document are used by each federal agency to guide its recommendations for researchers. For example, the National Institutes of Health refers to this document but also provides a list of NIH-supported repositories (Repositories for Sharing Scientific Data | Data Sharing, n.d.) and a list of generalist repositories considered to be of high quality (Generalist Repositories, n.d.).
This literature review shows that a lot of work has been done to define repository quality. It is not clear, however, whether researchers use these criteria to decide where to deposit their data, or whether they take different considerations into account. We know that academia rewards specific actions and workflows that can affect the choice of repository. For example, surveyed researchers (Cragin et al. 2010; Tenopir et al. 2015) express the importance of receiving acknowledgements and citations for the data they publish, which supports the emphasis that repository quality guidelines place on persistent identifiers and clear licensing. But researchers also value the ability to embargo data to ensure manuscript publication before releasing the underlying data, which is not necessarily a characteristic of a good-quality data repository as defined by the aforementioned bodies. Researchers surveyed about data repositories and sharing do not appear to be aware of metadata standards (Cragin et al. 2010), even while describing concerns about data misuse that could be mitigated by clear and extensive metadata, highlighting another difference in views between data repository certification and researchers using data repositories. Concerns about possible misuse or misinterpretation of shared data have also been captured by Tenopir et al. (2015). In a study about trust in data repositories (Yoon 2014), researchers described valuing that the repository presents data accurately and is transparent about any data limitations (validity). They also trusted the honesty of the repositories, that is, that they are not trying to intentionally mislead the user (integrity). Components that were important for these participants were organization attributes, the repository process, user communities, their own past experiences, and their perception of the role of the repository.
This study, however, focused on trust from the point of view of a data user, rather than the point of view of the depositor, so it does not necessarily represent how a researcher chooses a repository to preserve their data. Amorim et al. (2015) evaluated different data repositories’ attributes based on stakeholder needs; however, the authors concede that this study did not consider usability or specific discipline acceptance in their evaluation, likely another important facet of researcher repository choice.
Donaldson and Koepke (2022) evaluated scientists’ perceptions of repositories, asking which features they think are necessary in data repository systems and services to help researchers implement data sharing and preservation. Participants in the study identified as important: metadata control (high-quality metadata, metadata appropriate to their discipline, machine readability); data traceability (knowing how many researchers view, cite, and publish based on the data they deposit); versioning (having the repository inform users when data is updated); explanation of the uses permitted for each dataset in a repository (CC and open access licenses, but also restrictive or proprietary licenses); security; and stable infrastructure.
Given both scientists’ perceptions of repositories and published criteria for good-quality repositories, what is the role of the generalist institutional repository in the ecosystem? Callicott et al. (2016) describe institutional repositories as a backup plan for some researchers to store their data when a discipline-specific repository is not available. For other researchers, institutional repositories are a way to contribute to their campus’s open-access resources and contribute scholarship to the university (Halder and Chandra 2012). Indeed, Akers and Green (2014) posit that the personalized services that institutional repositories can offer provide an accessible way to support researchers in sharing their data that may otherwise have remained unpublished. Overall, these studies are few and far between, providing a clear opportunity for us to evaluate how researchers are served differently by institutional repositories and GREI repositories.
Methods
To better understand how researchers at OSU decided which data repository to use, we asked them to answer a survey. We later conducted in-depth interviews with some of the survey respondents.
OSU’s Institutional Review Board reviewed the survey and interview protocol and determined that it did not meet the definition of human subjects research under the regulations set forth by the Department of Health and Human Services in 45 CFR 46. The decision was influenced by our assertion that the primary goal of this study was to improve ScholarsArchive@OSU.
Survey Methodology
The survey, created in Qualtrics, had three parts. The first asked the researchers to what extent repository features, services, and other considerations affected their decision to choose a repository. The list of repository characteristics was informed by the “Desirable characteristics of data repositories for federally funded research” (NSTC 2022), which influenced the organization and structure of the survey questions. In addition, we used a rubric created by Donaldson and Koepke (2022) to validate the themes and characteristics drawn from the NSTC document and to help create additional questions. The second and third parts covered user experience and the researcher’s awareness and opinions of ScholarsArchive@OSU, respectively.
Potential participants were authors of datasets that met the following criteria:
Deposited in one of these discipline-agnostic open access repositories: Dataverse (specifically, Harvard’s shared instance), Dryad, Figshare, Mendeley Data, Zenodo, ScholarsArchive@OSU
Publicly available files and metadata
Deposited between July 2019 and September 2023
Resource type “dataset” or comparable
At least one author affiliated with OSU
We narrowed the list of discipline-agnostic repositories from the list of Generalist Repository Ecosystem Initiative (GREI) repositories (National Institutes of Health, n.d.), excluding Vivli for being specific to clinical research, and excluding OSF because its metadata structure did not allow us to identify OSU affiliated datasets.
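The screening criteria above can be expressed as a simple filter over candidate dataset records. The sketch below is illustrative only: the record fields (`repository`, `deposit_date`, `author_affiliations`, etc.) are hypothetical stand-ins, not any repository's actual API schema.

```python
from datetime import date

# Hypothetical record structure: every field name below is illustrative and
# does not correspond to any repository's actual metadata schema.
GENERALIST_REPOSITORIES = {
    "Dataverse", "Dryad", "Figshare", "Mendeley Data",
    "Zenodo", "ScholarsArchive@OSU",
}

def meets_criteria(record):
    """Screen a candidate dataset record against the study's inclusion rules."""
    return (
        record["repository"] in GENERALIST_REPOSITORIES
        and record["public"]  # files and metadata are publicly available
        and date(2019, 7, 1) <= record["deposit_date"] <= date(2023, 9, 30)
        and record["resource_type"] == "dataset"  # or a comparable type
        and any("Oregon State University" in aff
                for aff in record["author_affiliations"])
    )

example = {
    "repository": "Zenodo",
    "public": True,
    "deposit_date": date(2021, 3, 15),
    "resource_type": "dataset",
    "author_affiliations": ["Oregon State University", "Example University"],
}
print(meets_criteria(example))  # True
```

In practice each repository exposes these fields differently (which is why OSF had to be excluded), so the per-repository harvesting step is where most of the manual work lies.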
Information from OSU’s directory or the public dataset documentation was used to match names with email addresses. The researcher and respective dataset(s) were removed if we could not find a usable email address. The final number of researchers and datasets identified is shown in Table 1.
Table 1: Datasets included in the study, broken down by repository.
| Category | Count |
| --- | --- |
| Unique OSU Researchers (by name) | 533 |
| Unique OSU Research Datasets (by identifier) | 692 |
| Datasets in Dataverse | 46 |
| Datasets in Dryad | 169 |
| Datasets in Figshare | 76 |
| Datasets in Mendeley Data | 15 |
| Datasets in Zenodo | 311 |
| Datasets in ScholarsArchive@OSU | 79 |
The Data Repository Survey was sent via Qualtrics to the 533 identified OSU researchers during the Fall 2023 academic term; 71 potential participants were removed because of email delivery errors or at their own request. The survey was open for five weeks, with two reminder emails sent to encourage participation.
The full questionnaire and a more detailed look at our participant identification methodology can be consulted in our initial report on the survey responses (Key et al. 2024a) and the related dataset (Key et al. 2024b).
Interview Methodology
Survey participants were invited to opt-in to interviews at the end of the Qualtrics survey. Seventeen participants volunteered; from the set of volunteers, we scheduled ten interviews, which took place remotely via Zoom between December 2024 and February 2025.
Transcripts of live interviews were generated by Zoom's built-in transcription feature. A research team member encoded the transcripts into structured text using the “InTEIrviews” Text Encoding Initiative (TEI) customization developed by Puren and Cafiero (2024). Two team members made corrections to the automatically generated text both before and during the encoding step, using playback of audio recordings to verify correction decisions. The encoder marked sections of interview dialog reflecting single uninterrupted thoughts as “segments” and used the structured text approach to assign segment-level attributes including an interview ID, the speaker (anonymized for interviewees), and the question number being answered. The encoder deemed some segments as not relevant to the study and flagged these accordingly. The resulting set of combined, relevant, encoded and corrected text was then exported to a CSV, with one row per segment, for collaborative data coding.
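The export step described above, from encoded segments to one CSV row per relevant segment, can be sketched with the standard library. The element and attribute names below (`seg`, `interview`, `speaker`, `question`, `relevant`) are simplified stand-ins, not the actual InTEIrviews TEI customization.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative, simplified markup; not the real InTEIrviews TEI schema.
SAMPLE_TEI = """<transcript>
  <seg interview="I01" speaker="P01" question="3" relevant="true">
    I chose Zenodo because my collaborators already used it.
  </seg>
  <seg interview="I01" speaker="interviewer" question="3" relevant="false">
    Let me check the time.
  </seg>
</transcript>"""

def segments_to_csv(tei_text):
    """Export relevant encoded segments to CSV, one row per segment."""
    root = ET.fromstring(tei_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["interview", "speaker", "question", "text"])
    for seg in root.iter("seg"):
        if seg.get("relevant") == "true":  # drop segments flagged irrelevant
            writer.writerow([seg.get("interview"), seg.get("speaker"),
                             seg.get("question"), " ".join(seg.text.split())])
    return out.getvalue()

print(segments_to_csv(SAMPLE_TEI))
```

The resulting flat file is what the two coders then annotated collaboratively.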
Two research team members independently coded the ten interviews using Google Sheets, with each coder adding up to three topical codes per segment, for a total of up to six possible topic codes per segment. We based the set of codes on dimensions we asked about in the survey. Additional codes were assigned based on recurring themes in the interviews.
After the first round of independent coding, we compared codes between coders and across all interviews, clarifying the meaning of our respective codes where it was unclear. Through this process we ended up with two sets of codes: one that captured general topics and another reflecting more nuanced concepts. The tables in the results section reflect the general topics; we refer to the nuanced topics in the text where they enrich the discussion. A code dictionary for both sets of codes is provided in a related dataset (Key et al. 2026).
Next, we undertook a second round of coding, which involved a more structured approach in which we identified categories and a subset of attributes rather than considering all codes topical. Categories [and attributes] included Hypothesis [1, 2, 3, none]; Hypothesis Support [supports, does not support, partially supports, undetermined]; and Sentiment [positive, important, neutral, unimportant, negative].
Once round two coding was complete, we used Python with Pandas to analyze the coded data. The complete dataset includes categories and codes that we do not discuss in this article due to space limitations. This dataset, as well as the interview instrument, the scripts used to analyze the results, and the quotes used in this article can be found in Key et al. (2026).
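The core of the round-two analysis, tallying support assessments per hypothesis, can be sketched with Pandas roughly as follows. The DataFrame here is a toy stand-in; its column names and values are illustrative rather than the published dataset's actual schema.

```python
import pandas as pd

# Toy stand-in for the round-two coded segments; columns and values are
# illustrative only, not the published dataset's schema.
segments = pd.DataFrame({
    "hypothesis": ["1", "1", "1", "2", "2", "3"],
    "support": ["supports", "supports", "does not support",
                "supports", "undetermined", "partially supports"],
})

# Count support assessments per hypothesis, then convert to percentages,
# mirroring the structure of the per-hypothesis support tables.
counts = segments.groupby(["hypothesis", "support"]).size()
totals = counts.groupby(level="hypothesis").sum()
percentages = counts.div(totals, level="hypothesis").mul(100).round(2)
print(percentages)
```

The same grouped counts, broken out by topical code instead of support, produce the code-assignment tables in the results section.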
Results
This results section shows the overall survey and interview results for each hypothesis, as well as responses related to potential repository feature development, addressing the goals of this research. For a more detailed description of survey results see our report Key et al. (2024a). The datasets with survey and interview results can be found in Key et al. (2024b) and Key et al. (2026), respectively.
The survey received a total of 53 responses. Ten of the responses were incomplete, and three responses did not meet the initial screening criteria. Overall, we had 40 usable responses, which represents a 7.5% response rate.
The 10 interviews were broken up into a total of 412 unique relevant segments. Of these segments:
169 were interpreted as relevant to Hypothesis 1;
43 were interpreted as relevant to Hypothesis 2;
43 were interpreted as relevant to Hypothesis 3;
91 were interpreted as relevant to repository development;
92 were not interpreted as relevant to any of the above.
Note that a small number of segments were identified as relevant to more than one hypothesis.
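This double-tagging is why the per-category counts above (169 + 43 + 43 + 91 + 92 = 438) sum to more than the 412 unique segments. A minimal illustration, with invented segment IDs:

```python
# Invented segment IDs, for illustration only.
h1 = {"seg01", "seg02", "seg03"}
h2 = {"seg03", "seg04"}          # seg03 is tagged for both H1 and H2

category_total = len(h1) + len(h2)   # counts seg03 twice
unique_total = len(h1 | h2)          # counts each segment once
print(category_total, unique_total)  # 5 4
```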
While our second and third hypotheses relate to assessing the role of our institutional repository specifically, it should be noted that ScholarsArchive@OSU depositors made up a small percentage of both survey and interview participants. Just 25% of survey participants (10 of 40) responded to the survey based on a specific dataset deposited to the IR, and only one additional respondent (11 of 40) indicated they had deposited one or more datasets to the IR at any time. We saw similarly low levels of ScholarsArchive@OSU familiarity among interview participants. Of 10 participants:
Two had exclusively deposited data to the IR;
Two had deposited data to the IR at least once in the past but typically use other data repositories;
Three had only used the IR to deposit other types of scholarship;
Three were not aware of the IR.
Hypothesis 1 Results
Survey responses and interview segments associated with Hypothesis 1 (H1) speak to whether researchers are making informed, deliberate decisions about repositories, and provide evidence for the types of strategies that researchers use to select a data repository for depositing their data.
Survey
We asked survey participants what strategies they used to decide where to deposit their research data. As shown in figure 1, “convenience” was the highest-ranking factor for external depositors, while convenience and institutional trust were tied for ScholarsArchive@OSU users. Surprisingly, “repository characteristics” was selected by fewer than half of our participants, which runs counter to our hypothesis.

Figure 1: Strategies used by researchers to choose a data repository.
In addition to strategies, the survey asked how much researchers valued different features provided by data repositories. Overall, our participants valued the basics: the highest-ranking features were affordability, discoverability, DOI assignment, safety or reliability, and ease of use. The features that drew the least enthusiasm were peer-review workflows, embargoes, usage metrics, integration with other data management tools, and versioning.
Interviews
Overall we saw strong support among our interview participants for at least a moderate level of deliberate and informed decision-making regarding repositories, as shown in table 2.
Table 2: Support for Hypothesis 1 on whether researchers are making informed, deliberate decisions about repositories.
| Support for H1 | Count | Percentage |
| --- | --- | --- |
| Supports | 107 | 63.31% |
| Does not support | 19 | 11.24% |
| Partially supports | 24 | 14.20% |
| Undetermined | 19 | 11.24% |
Examining the frequency of topical codes, especially those associated with supportive segments, gives insight into the repository characteristics most salient to participants’ decision-making strategies when depositing their data. Because of the qualitative nature of interview analysis, some topical codes appear in segments associated with different support assessments regarding “informed and deliberate” choices, depending on the context of the participant’s comment. For instance, “ease of use” was assigned to one participant’s description of evaluating a potential repository but deciding not to deposit there because it was too difficult to use; this segment was coded as “ease of use” and as supporting Hypothesis 1. Other participants indicated they continue using a familiar repository because they appreciate its ease of use, suggesting they were informed enough to assess its usability but may not have considered more complex repository characteristics; these comments were coded as “ease of use” and as partially supporting Hypothesis 1. “Undetermined” was assigned to comments that offered no clear support for or against a given hypothesis.
The most frequently occurring general topical codes (total occurrences >= 10) in H1-related segments appear in table 3. Note that some topical codes, including “open_science” and “student_research,” reflect the segment’s context rather than the selection criteria. Both of these contextual codes appeared frequently as interviewees described overall strategies for data sharing. The “student_research” code was often associated with descriptions of ways that a graduate student or early-career researcher might approach data sharing differently from more experienced researchers. The “open_science” code tended to appear as participants commented on the inherent value of data sharing.
The most frequent nuanced topical codes (total occurrences >= 6) are academic_discipline (12), ease_of_use (9), cost (6), and metadata_standards (6). These codes capture participants’ comments about relying on the repositories their colleagues use as a decision factor, as well as cost considerations. Participants also discussed the rigor or time needed to complete a metadata schema for a deposit, which we interpret as a variant of ease of use; notably, they did not talk about the quality or appropriateness of a given metadata schema. These contextual nuanced codes further suggest that many participants’ selection decisions are only minimally deliberate or informed.
Table 3: Code Assignments in Hypothesis 1 Segments.
| Code | Total occurrence | Supports | Does not support | Partial support | Undetermined |
| --- | --- | --- | --- | --- | --- |
| ease_of_use | 20 | 13 | 0 | 6 | 1 |
| student_research | 13 | 3 | 4 | 3 | 3 |
| appropriateness | 12 | 9 | 0 | 1 | 2 |
| open_science | 12 | 8 | 0 | 1 | 3 |
| commonly_used | 11 | 9 | 0 | 2 | 0 |
| flexibility | 10 | 8 | 0 | 2 | 0 |
| prescribed | 10 | 1 | 6 | 2 | 1 |
| metadata | 10 | 9 | 0 | 1 | 0 |
| trust | 10 | 9 | 0 | 0 | 1 |
Hypothesis 2 Results
Survey responses and interview segments associated with Hypothesis 2 (H2) speak to whether the institutional repository is filling a different need from other generalist repositories.
Survey
There are notable differences between IR users and external repository users in the survey results. IR users were more interested in having staff available to help with deposits (50% of IR users vs. 6% of external depositors), and they were more likely to cite trust or institutional affinity as a factor in choosing a repository (80% vs. 30%). Satisfaction among IR users was also higher, with 100% of IR users recommending the repository to others versus 66% of external repository users. IR users also differed from external depositors in being uninterested in specific integrations of the repository with other tools, robust support of peer-review workflows, or versioning functionality (see table 4). We conclude that IR users are looking for a repository that provides staff guidance and a trustworthy reputation.
Table 4: Percentage of users answering 0 when asked how much they cared about each repository feature (scale of 0-3).
| Repository feature | IR users | External depositors |
| --- | --- | --- |
| Repository staff are available for questions | 0% | 62% |
| Repository staff provide curation services | 22% | 67% |
| Integration with other tools (such as GitHub) | 89% | 38% |
| Support of peer review workflows | 100% | 62% |
| Versioning | 89% | 38% |
Interviews
We saw support for Hypothesis 2 in interviews, with 72% of H2-coded segments marked as supportive (see table 5). In supportive segments, participants articulated explicit benefits to using the institutional repository, such as institutional affiliation, trust, or the ability to collocate data with related outputs such as publications, presentations, and theses. One participant expressed appreciation for institutional branding, while others were not motivated by this. Another expressed curiosity about how institutional repositories are funded. Participants also expressed appreciation for features that ScholarsArchive@OSU provides, including staff support and curation, file format and size flexibility, and zero cost for deposits; each of these features may be available in other discipline-agnostic repositories, but the IR is rare in offering them in combination. The potential to leverage the instructional aspect of the IR and related data support services as part of a broader open science or data literacy teaching opportunity surfaced as a recurring theme in multiple interview conversations as well.
In one unsupportive segment, a participant expressed that an explicit benefit of their preferred external discipline-agnostic repository was name recognition. As they phrased it, “It's not an institutional repository. [...] Not that you couldn't add somebody to something at ScholarsArchive@OSU that wasn't at OSU, but it's a bit different when it's a sort of agnostic place but still strongly connected to the research community.”
Table 5: Support for Hypothesis 2 on whether the institutional repository is filling a different need from other generalist repositories.
| Support for H2 | Count | Percentage |
| Supports | 31 | 72.09% |
| Does not support | 1 | 2.33% |
| Partially supports | 1 | 2.33% |
| Undetermined | 10 | 23.26% |
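The percentages in table 5 (and the analogous tables for the other hypotheses) follow directly from the segment counts: 31 of the 43 H2-coded segments were supportive. A minimal sketch of that calculation, using hypothetical variable names:

```python
# Counts of coded interview segments for Hypothesis 2 (table 5).
h2_counts = {
    "Supports": 31,
    "Does not support": 1,
    "Partially supports": 1,
    "Undetermined": 10,
}

total = sum(h2_counts.values())  # 43 coded segments in total

# Share of segments in each support category, rounded to two decimals
# to match the tables.
percentages = {label: round(100 * n / total, 2) for label, n in h2_counts.items()}
print(percentages)  # "Supports" comes out to 72.09
```

The same arithmetic reproduces the 74.42% supportive figure for Hypothesis 3 from the counts in table 7.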
The most frequently occurring topical codes overall (total occurrences >= 5) in H2-related segments appear in table 6. For H2, the top nuanced codes were repository_staff_support (6), brand_recognition (3), repository_funding (3), and student_research (3).
Table 6: Code Assignments in Hypothesis 2 Segments.
| Code | Total occurrence | Supports | Does not support | Partially supports | Undetermined |
| staff_support | 9 | 9 | 0 | 0 | 0 |
| student_research | 6 | 4 | 0 | 1 | 1 |
| instruction | 6 | 5 | 0 | 0 | 1 |
| curation | 5 | 5 | 0 | 0 | 0 |
| open_science | 5 | 5 | 0 | 0 | 0 |
Hypothesis 3 Results
Survey responses and interview segments associated with Hypothesis 3 (H3) speak to whether participants value the human services provided to IR depositors, as opposed to the repository's software features. We adopted a broad definition of “services” that includes the curation, consultations, and other help that repository staff provide one-on-one to researchers, as well as the written policies and guidelines that we maintain.
Survey
As we saw in the Hypothesis 2 results, survey respondents who had deposited to the IR ranked interactions with repository staff as more valuable, particularly the curation services that we offer. The interviews support and add nuance to these results.
Interviews
Given the low frequency of IR depositors among our participants, we included segments in the H3 analysis that provided evidence of the value (or not) of the kind of services offered by the IR, even if the participant was speaking about their experience using a different repository.
Again, we saw overall support for Hypothesis 3 in interviews, with 74.42% of H3-coded segments marked as supportive (table 7). In supportive segments, participants cited the benefits of working with repository staff to deposit data; useful guidance from staff on matters such as file formats; improvements to data or metadata quality as a result of curation (or, conversely, the prevalence of poor-quality data in repositories without curation); appreciation for IR deposits being free of cost; and experiences that led to other types of research data service assistance, such as help with data management plans or exposure to open science education. In segments marked as not supportive, participants expressed that data curation or review by repository staff was unnecessary, ineffective, or created too much burden for depositors in terms of time and effort. Two participants said that they find curation services valuable but have at the same time been frustrated by slower turnaround times for deposit approval.
Table 7: Support for Hypothesis 3 on whether the human services provided to IR depositors are valued by participants, as opposed to its software features.
| Support for H3 | Count | Percentage |
| Supports | 32 | 74.42% |
| Does not support | 3 | 6.98% |
| Partially supports | 7 | 16.28% |
| Undetermined | 1 | 2.33% |
The most frequently occurring general topical codes overall (total occurrences >= 5) in H3-related segments appear in table 8. The top nuanced codes (total occurrences >= 3) mirrored the general codes and their context: repository_staff_support (12), data_curation (5), and unmediated (3).
Table 8: Code Assignments in Hypothesis 3 Segments.
| Code | Total occurrence | Supports | Does not support | Partially supports | Undetermined |
| staff_support | 17 | 15 | 2 | 0 | 0 |
| curation | 15 | 12 | 1 | 2 | 0 |
| time | 6 | 1 | 0 | 5 | 0 |
| data_curation | 5 | 4 | 1 | 0 | 0 |
| metadata | 5 | 4 | 0 | 0 | 1 |
Repository Feature Development Results
Seeking to identify potential improvements we could make to data services in the IR, we dedicated a portion of each interview to asking whether participants would see value in the development of specific features, including:
Support for peer review workflows for unpublished datasets (coded as “pr_workflow” in table 9 below)
Statistics or metrics about data use, reuse, and citations (coded as “metrics”)
Support for large datasets (coded as “large_data”)
Automatic DOIs immediately upon deposit (coded as “dois”)
For participant responses in this portion, we coded segments along two distinct lines of participant sentiment — whether they generally liked the idea of the feature (positive/neutral/negative), and whether it would influence their repository choice (important/unimportant). Overall participant sentiments about proposed features are shown in table 9.
Table 9: Sentiments Regarding Proposed Features
| General topical codes | Influence repository choice | | Participant sentiment | | |
| | Important | Unimportant | Positive | Neutral | Negative |
| pr_workflow | 3 | 8 | 1 | 10 | 3 |
| metrics | 9 | 10 | 6 | 6 | 1 |
| large_data | 8 | 4 | 1 | 5 | 2 |
| dois | 4 | 8 | 7 | 8 | 1 |
Even for features where participants reacted positively to the idea, there was little clear mandate to spend resources developing any of our proposed features for ScholarsArchive@OSU; most responses could be interpreted as “X would be nice to have but isn’t necessary.” For DOIs, participants seemed to concur that receiving a DOI is a critical data repository service (the strong positive response), but that they do not need to receive a working DOI immediately upon deposit. The only feature with a clear indication of usefulness was large dataset support — which the IR already provides, though improvements are warranted. An unexpected theme during these discussions was the lack of interest in views or even citations of researchers’ deposited datasets; multiple participants stated that they had not considered using such metrics in their departmental evaluations or to support their tenure cases. Regarding inclusion of usage statistics in a promotion and tenure dossier, one interviewee said, “Yeah, it's nice, but I don't really pay that much attention to it. I think we're more concerned with citations on the article, right? And like, I mean, it's nice to see that it's been downloaded a hundred times or whatever, but yeah. It's not something that I frequently look at or, like, is part of my [...] annual reviews or anything.” Another told us, “We haven't seen anybody do it in our world yet, but that doesn't mean that it's not a good idea. I thank you for suggesting the idea to me because I would not have thought of it on my own.”
Apart from the features that we asked about specifically, other features that were mentioned as prospects for repository development include discovery/findability of datasets, ease of use for the site, and better support for researchers to build collections or otherwise collocate their materials. Adjacent to the discussion of software feature development were other potential repository improvements, including suggestions that we should increase outreach to promote awareness of the service and, again, engage in instruction related to data sharing practices.
Discussion
In general, our collected data shows moderate support for our hypotheses. Survey and interview results suggest that the aspects researchers value in research data repositories are less complex than we as repository or data service managers may imagine. Convenience, recommendations by peers, trust in the repository, ease of use, and familiarity appear to be the main criteria used for choosing a repository. Researchers rarely mentioned the criteria outlined by funders or best practices (e.g., retention policies, metadata, persistent identifiers, formats, etc.), and they never mentioned certifications like the CoreTrustSeal as a way of developing trust in a repository.
This research does not focus on the reasons why researchers share their data in repositories. However, the topic emerged in many of the interviews, and we heard researchers express philosophical support for open science; while some researchers certainly do share datasets because doing so is required for a traditional publication, many were explicit about the inherent value in making their data (or other scholarly content) available for future researchers. Some participants commented in the interviews that data management trainings had made a big difference in how they share their data, and their repository use.
We believe this work validates Hypothesis 2, which has implications for the direction and management of the repository. It is clear that an IR will not be competitive for researchers who are most concerned with existing disciplinary norms and expectations to use particular external repositories, and we would do well to continue maintaining knowledge about discipline-specific repositories so that our research data services program can make appropriate recommendations. The IR also may not be competitive for researchers who are looking for specialized features in generalist repositories, such as integrations with tools (like Zenodo's integration with GitHub). Repository staff must be ready to recognize these needs and recommend other generalist repositories to researchers.
Ultimately, we saw support for Hypothesis 3: the value of the IR lies in its services, particularly curation and the teaching opportunities it affords. We also saw a lack of any strong call for improvement, as described in the repository feature development results. Furthermore, the Hypothesis 1 results showed researchers using decision criteria such as trust, flexibility, cost, and convenience. Together, these findings suggest that the niche for the IR as a data repository is a particular combination of offerings: enough functional features to share data effectively, the flexibility to accept data in a relatively broad range of shapes and sizes, metadata to describe the data in sufficient detail, personal curation services and human support, the presumed continuity of university governance, and no cost to the researcher.
Personas
One of the main motivations of this study is to strengthen ScholarsArchive@OSU. We want to make sure that we use our limited resources to support the users who need it most, and who will get the most out of the IR, given its strengths and limitations. After hearing from 40 researchers about their needs and opinions about generalist repositories, we have a better idea about what researchers value about repositories, and who our repository best serves. To formalize these findings and to make them more actionable for OSULP and other institutions, we propose the creation of three personas that represent users whose needs fit well with the services offered by our IR.
Personas are a design tool that developers, even those without deep expertise in usability concepts, have commonly used to design computer applications that are easy to use (Cooper 2004). These imaginary, archetypal people have characteristics, goals, and needs that summarize the information obtained through this research and our experience as curators and managers of ScholarsArchive@OSU. Personas can be used to guide decisions and priorities. Even though these personas have been developed with ScholarsArchive@OSU in mind, they may be useful for other institutional repositories with similar characteristics.
The Open Data Champion
The Open Data Champion is a mid- to late-career researcher who is knowledgeable about open science and data sharing. They advocate for it and practice it. Their motivation for sharing data is philosophical. They are aware of best practices and recommendations, but they are also open to learning new open data concepts and appreciate interacting with data professionals in repositories. The Open Data Champion has used several repositories in their career, including discipline-specific and generalist repositories. They see value in institutional branding, so they appreciate the institutional repository and use it to share data and other research outputs. They recommend it to others when appropriate. They do not need bells and whistles in a repository; they expect basic (but robust) functionality.
This persona may seem like a unicorn, but we interviewed several individuals who embody their characteristics, and we have encountered them in our work as curators and repository managers. For example, one of our interviewees said, “I'm a supporter of the ScholarsArchive@OSU. And I think one of its values is that it really represents the intellectual property of the university. [...] And it's kind of the repository of [...] the research that the university does. And so that's [...] a lot of the motivation for me deciding to use the ScholarsArchive@OSU for it.” They can be allies to spread the word about repositories in general, and the IR in particular. One of the participants in the interviews said, “Well, I'm glad that these archives are available because [...] I try to publish as much as I can. Because if you don't publish your research, it's really lost to science. You're just wasting your time and your resources. So, but there's some [research outputs] you can't publish [...], but they should be available. And one of my colleagues in Washington was very thankful that she was able to use the link to my reports, annual reports to put in her paper, in her report. It's been very valuable keeping [my research community] connected.” The Open Data Champion also trains students or refers them to others for training. Three of the interviewees discussed mentoring others on how to deposit data.
The Overwhelmed Early Career Researcher
The Overwhelmed Early Career Researcher may be a graduate student, a postdoc, or new research faculty. They are inexperienced in data sharing and need support to accomplish their tasks. They are still learning about the benefits of data sharing and are mainly motivated by requirements that they need to meet. They may also be doing time-consuming data management tasks for senior researchers. Their interest in learning about different types of repositories or repository features is limited, and they are looking for practical options, not advanced features. The Overwhelmed Early Career Researcher has fewer resources than other researchers and values repositories that are free or low cost. They benefit from depositing data in an institutional repository where a data curator can give them individualized attention, introduce them to best practices, and teach them how to do a data deposit for the first time.
Because of the design of our study, in which participants must have deposited a dataset between July 2019 and September 2023, the early career researcher persona was not well represented; we did not encounter this persona directly, but we heard about them from more senior researchers. For example, an interviewed researcher said, “I feel like I'm a little further along in my career, so I have, for better or worse, stronger opinions about things now, and I think at the time, it was helpful talking to, just bouncing ideas off of data librarians who had a lot of experience with this kind of stuff. So I found it helpful when I was more junior. Now I feel more confident 10 years later so I'm not sure that I would need [...] the handholding.” Another researcher mentioned that “I still work with some students and they, yeah definitely, the more junior people oftentimes [...] could do with a little bit of a helping hand in terms of thinking about this stuff. You know, especially with somebody that has a lot of experience with this stuff [...] I think ScholarsArchive[@OSU], if the library is able to provide that kind of guidance, that's hugely helpful.”
The Repository of Last Resort
This persona represents a researcher with some experience and established workflows. They typically use discipline-specific repositories in their field and deposit datasets in them regularly. They occasionally use the IR for datasets or other scholarly outputs that differ from their regular ones. In an exceptional instance, they may have produced a multidisciplinary project that uses different data formats, or a dataset larger than the limits their usual repository allows. When their usual repository cannot accept their datasets, the Repository of Last Resort persona will use the IR because it is convenient and flexible.
This persona is well represented in our study. For example, a researcher said, “I have used ScholarsArchive@OSU to be sure in the past, but it's primarily been a place to park things like the final accepted versions of manuscripts, or student theses, or things where there isn't already another repository that's the obvious place.” Another researcher explained “I did have a project [...] where we had a very large data set on the order of like a terabyte or something, I can't remember how big, but that we basically wanted to make available because it was a very potentially useful data set for others in the field. And, you know, that size is basically too large for any other repositories that I have experience with to take. And so I had worked with some folks there to make it available via torrent from ScholarsArchive[@OSU].”
Limitations
The small sample size and low response rate are the greatest limitations of this study. Approximately one-third of the survey participants opted in to the interviews, and in the end, we had 10 interview participants. Our small sample is not representative of all the demographics of researchers who deposit data in a repository. Specifically, we are missing representation from graduate students and early-career faculty. While many participants discussed graduate students and their own experiences as early-career faculty, none of our participants identified as belonging to either category.
The survey was administered in the fall of 2023, and interviews were conducted in December 2024 through February 2025. In January 2025, due to executive orders from the Trump administration, there was wide-scale data deletion across multiple US federal agencies. While our study was primarily interested in discipline-agnostic repositories, many of our interview participants also deposited data in discipline-specific federal repositories. Interviews conducted in late January and early February touched on the data deletion and concerns for the future longevity of many federal data repositories, but such concerns were not an explicit focus of our study.
Conclusion
In the end, we discovered that many researchers choose repositories not because of the characteristics of the chosen repository, but based on their own experience and values. The participants in our study find value in institutional repositories, which appear to be especially useful for researchers who value deposit support services, institutional branding, and flexibility.
Our participants did not provide specific feedback on improving our institutional repository; however, the clearest unspoken message is the need to enhance our outreach strategy, as awareness of our institutional repository is low. In the future, we plan to develop customized outreach strategies for graduate students and early-career faculty to learn firsthand about their needs regarding data repositories. While our participants might not have ideas for improvements, their trust in the institutional repository is strong. In this climate of uncertainty, this trust is more important than ever, and we will continue to prioritize our users and high-quality services.
References
Akers, Katherine G., and Jennifer A. Green. 2014. “Towards a Symbiotic Relationship Between Academic Libraries and Disciplinary Data Repositories: A Dryad and University of Michigan Case Study.” International Journal of Digital Curation 9 (1): 119–131. https://doi.org/10.2218/ijdc.v9i1.306.
Amorim, Ricardo Carvalho, João Aguiar Castro, João Rocha Da Silva, and Cristina Ribeiro. 2015. “A Comparative Study of Platforms for Research Data Management: Interoperability, Metadata Capabilities and Integration Potential.” In New Contributions in Information Systems and Technologies, edited by Alvaro Rocha, Ana Maria Correia, Sandra Costanzo, and Luis Paulo Reis, 353: 101–111. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-16486-1_10.
Briney, Kristin, Heather Coates, and Abigail Goben. 2020. “Foundational Practices of Research Data Management.” Research Ideas and Outcomes 6 (July): e56508. https://doi.org/10.3897/rio.6.e56508.
Callicott, Burton B, David Scherer, and Andrew Wesolek, eds. 2016. Making Institutional Repositories Work. Charleston Insights in Library, Archival, and Information Sciences. West Lafayette, Indiana: Purdue University Press.
Cannon, Matthew, Chris Graf, Kiera McNeice, Wei Mun Chan, Sarah Callaghan, Ilaria Carnevale, Imogen Cranston, et al. 2021. “Repository Features to Help Researchers: An Invitation to a Dialogue.” Zenodo, April 14, 2021. https://doi.org/10.5281/zenodo.4683794.
Confederation of Open Access Repositories. 2020. “COAR Community Framework for Best Practices in Repositories.” Zenodo, October 8, 2020. https://doi.org/10.5281/zenodo.4110829.
Cooper, Alan. 2004. The Inmates Are Running the Asylum. Indianapolis, IN: Sams.
CoreTrustSeal Standards and Certification Board. 2019. “CoreTrustSeal Trustworthy Data Repositories Requirements 2020–2022.” Zenodo, November 20, 2019. https://doi.org/10.5281/zenodo.3638211.
CoreTrustSeal Standards and Certification Board. 2022. “CoreTrustSeal Requirements 2023-2025.” Zenodo, September 5, 2022. https://doi.org/10.5281/zenodo.7051012.
Cragin, Melissa H., Carole L. Palmer, Jacob R. Carlson, and Michael Witt. 2010. “Data Sharing, Small Science and Institutional Repositories.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368 (1926): 4023–4038. https://doi.org/10.1098/rsta.2010.0165.
Demetres, Michelle R., Diana Delgado, and Drew N. Wright. 2020. “The Impact of Institutional Repositories: A Systematic Review.” Journal of the Medical Library Association 108 (2). https://doi.org/10.5195/jmla.2020.856.
Donaldson, Devan Ray, and Joshua Wolfgang Koepke. 2022. “A Focus Groups Study on Data Sharing and Research Data Management.” Scientific Data 9 (1): 345. https://doi.org/10.1038/s41597-022-01428-w.
Downs, Robert R. 2021. “Improving Opportunities for New Value of Open Data: Assessing and Certifying Research Data Repositories.” Data Science Journal 20 (1): 1–11. https://doi.org/10.5334/dsj-2021-001.
Dunning, Alastair, Madeleine De Smaele, and Jasmin Böhmer. 2018. “Are the FAIR Data Principles Fair?” International Journal of Digital Curation 12 (2): 177–195. https://doi.org/10.2218/ijdc.v12i2.567.
Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, et al. 2014. “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLoS Computational Biology 10 (4): e1003542. https://doi.org/10.1371/journal.pcbi.1003542.
Halder, Sambhu Nath, and Suvra Chandra. 2012. “Users’ Attitudes towards Institutional Repository in Jadavpur University: A Critical Study.” International Journal of Management and Sustainability 1 (2): 45–52. https://doi.org/10.18488/journal.11/2012.1.2/11.2.45.52.
International Organization for Standardization. 2025. “Space Data and Information Transfer Systems — Audit and Certification of Trustworthy Digital Repositories.” ISO 16363:2025. Preprint, March 2025. https://www.iso.org/standard/87472.html.
Key, Cara, and Clara Llebot. 2023. Compliance of ScholarsArchive@OSU with Requirements for Quality Research Data Repositories. Oregon State University, October 4, 2023. https://ir.library.oregonstate.edu/concern/technical_reports/vh53x4476?locale=en.
Key, Cara, Diana Park, Jane Nichols, L. K. Borland, and Clara Llebot. 2024a. The Role of ScholarsArchive@OSU as a Data Repository: Report on Survey Data. Oregon State University, August 28, 2024. https://ir.library.oregonstate.edu/concern/technical_reports/rj430d447?locale=en.
Key, Cara, Diana Park, Jane Nichols, L. K. Borland, and Clara Llebot. 2024b. “The Role of ScholarsArchive@OSU as a Data Repository: Survey Responses.” Dataset. Version 1. Oregon State University, August 29, 2024. https://doi.org/10.7267/8p58pp34p.
Key, Cara, Jane Nichols, Diana Park, Clara Llebot, and L.K. Borland. 2026. “The Role of ScholarsArchive@OSU as a Data Repository: Interview Responses.” Dataset. Version 1. Oregon State University, January, 2026. https://doi.org/10.7267/8336h999q.
Khan, Aasif Mohammad, Fayaz Ahmad Loan, Umer Yousuf Parray, and Sozia Rashid. 2024. “Global Overview of Research Data Repositories: An Analysis of Re3data Registry.” Information Discovery and Delivery 52 (1): 53–61. https://doi.org/10.1108/IDD-07-2022-0069.
Koers, Hylke, Daniel Bangert, Emilie Hermans, René Van Horik, Maaike De Jong, and Mustapha Mokrane. 2020a. “Recommendations for Services in a FAIR Data Ecosystem.” Patterns 1 (5): 100058. https://doi.org/10.1016/j.patter.2020.100058.
Koers, Hylke, Morane Gruenpeter, Patricia Herterich, Rob Hooft, Sarah Jones, Jessica Parland-von Essen, and Christine Staiger. 2020b. “Assessment Report on ‘FAIRness of Services.’” Zenodo, February 28, 2020. https://doi.org/10.5281/zenodo.5470375.
Lin, Dawei, Jonathan Crabtree, Ingrid Dillo, Robert R. Downs, Rorie Edmunds, David Giaretta, Marisa De Giusti, et al. 2020. “The TRUST Principles for Digital Repositories.” Scientific Data 7 (1): 144. https://doi.org/10.1038/s41597-020-0486-7.
National Institutes of Health. 2023. NOT-OD-21-013: Final NIH Policy for Data Management and Sharing. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html.
National Institutes of Health. n.d. “Generalist Repository Ecosystem Initiative | Data Science at NIH.” Accessed August 14, 2025. https://datascience.nih.gov/data-ecosystem/generalist-repository-ecosystem-initiative.
National Institutes of Health. n.d. “Generalist Repositories.” Accessed June 30, 2025. https://www.nlm.nih.gov/NIHbmic/generalist_repositories.html.
National Institutes of Health. n.d. “Repositories for Sharing Scientific Data | Data Sharing.” Accessed June 30, 2025. https://sharing.nih.gov/data-management-and-sharing-policy/sharing-scientific-data/repositories-for-sharing-scientific-data.
Oregon State University Libraries and Press. 2023. ScholarsArchive@OSU Policies. Research Data Curation Policy. https://osulp.atlassian.net/wiki/spaces/RP/pages/55705612/ScholarsArchive+OSU+Policies#Research-Data-Curation-Policy.
Pampel, Heinz, Paul Vierkant, Frank Scholze, Roland Bertelmann, Maxi Kindling, Jens Klump, Hans-Jürgen Goebelbecker, Jens Gundlach, Peter Schirmbacher, and Uwe Dierolf. 2013. “Making Research Data Repositories Visible: The Re3data.Org Registry.” PLoS ONE 8 (11): e78080. https://doi.org/10.1371/journal.pone.0078080.
Puren, Marie, and Florian Cafiero. 2024. “InTEIrviews: An ODD for Qualitative Interviews in the Humanities.” Journal of the Text Encoding Initiative (15). https://doi.org/10.4000/jtei.5007.
Sansone, Susanna-Assunta, Peter McQuilton, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Massimiliano Izzo, Allyson L. Lister, Milo Thurston, and the FAIRsharing Community. 2019. “FAIRsharing as a Community Approach to Standards, Repositories and Policies.” Nature Biotechnology 37 (4): 358–367. https://doi.org/10.1038/s41587-019-0080-8.
Stall, Shelley, Maryann E. Martone, Ishwar Chandramouliswaran, Lisa Federer, Julian Gautier, Jennifer Gibson, Mark Hahnel, et al. 2023. “Generalist Repository Comparison Chart.” Version 3.0. Preprint, Zenodo, May 17, 2023. https://doi.org/10.5281/zenodo.7946938.
Stall, Shelley, Lynn Yarmey, Joel Cutcher-Gershenfeld, Brooks Hanson, Kerstin Lehnert, Brian Nosek, Mark Parsons, Erin Robinson, and Lesley Wyborn. 2019. “Make Scientific Data FAIR.” Nature 570 (7759): 27–29. https://doi.org/10.1038/d41586-019-01720-7.
Tenopir, Carol, Elizabeth D. Dalton, Suzie Allard, Mike Frame, Ivanka Pjesivac, Ben Birch, Danielle Pollock, and Kristina Dorsett. 2015. “Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide.” PLOS ONE 10 (8): e0134826. https://doi.org/10.1371/journal.pone.0134826.
The National Science and Technology Council. 2022. Desirable Characteristics of Data Repositories for Federally Funded Research. Executive Office of the President of the United States, 2022. https://doi.org/10.5479/10088/113528.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18.
Yoon, Ayoung. 2014. “End Users’ Trust in Data Repositories: Definition and Influences on Trust Development.” Archival Science 14 (1): 17–34. https://doi.org/10.1007/s10502-013-9207-8.