Full-Length Paper

Ethical Considerations in Integrating AI in Research Consultations: Assessing the Possibilities and Limits of GPT-based Chatbots

Authors
  • Yali Feng (University of Illinois Urbana-Champaign Library)
  • Jun Wang (Independent Researcher)
  • Steven G. Anderson (University of Illinois Urbana-Champaign)

Abstract

Objective: This case study sought to provide early information on the accuracy and relevance of selected GPT-based product responses to basic information queries, such as might be asked in librarian research consultations. We intended to identify positive possibilities, limitations, and ethical issues associated with using these tools in research consultations and teaching.

Methods: A case simulation examined the responses of GPT-based products to a basic set of questions on a topic relevant to social work students. The four chatbots (ChatGPT-3.5, ChatGPT-4, Bard, and Perplexity) were given identical question prompts, and responses were assessed for relevance and accuracy. The simulation was supplemented by reviewing actual user exchanges with ChatGPT-3.5 using a ShareGPT file containing conversations with early users.

Results: Each product provided relevant information in response to queries, but the nature and quality of the information and the sophistication of its formatting varied substantially. There were troubling accuracy issues with some responses, including inaccurate or non-existent references. The only paid product examined (ChatGPT-4) generally provided the highest quality information, which raises concerns about equitable access to quality technology. Examination of ShareGPT conversations also raised issues regarding the ethical use of chatbots to complete course assignments, dissertation designs, and other research products.

Conclusions: We conclude that these new tools offer significant potential to enhance learning if well-employed. However, their use is fraught with ethical challenges. Librarians must work closely with instructors, patrons, and administrators to assure that the potential is realized while ethical values are safeguarded.

Keywords: GPT-based chatbot, ChatGPT, ShareGPT, knowledge practice, disposition, Socratic questioning, research consultation, artificial intelligence, AI

How to Cite:

Feng, Yali, Jun Wang, and Steven G. Anderson. 2024. “Ethical Considerations in Integrating AI in Research Consultations: Assessing the Possibilities and Limits of GPT-based Chatbots.” Journal of eScience Librarianship 13 (1): e846. https://doi.org/10.7191/jeslib.846.

Rights: Copyright © 2024 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published on 06 Mar 2024

Peer Reviewed

Summary

This project explored some preliminary issues in using GPT-based chatbot applications as alternatives to, accessories to, or context for research consultations (RC), which are among the most important services provided in academic libraries. We developed a case example intended to simulate chatbot use by relatively novice users searching for information on basic research- and education-related questions, such as might be asked by a student or other patron who is expert neither on the topic nor in chatbot use. We focus on issues such as the ease of obtaining relevant information, its accuracy, and related ethical considerations in its use. Based on our findings, as well as the extremely rapid emergence of enhanced AI applications, we discuss ethical considerations facing librarians and their patrons as they use chatbots in developing research products and in providing related library RC.

Project details

The case study explores the capabilities of and issues associated with using different GPT-based chatbots in a research consultation (RC) context. We define RCs as scheduled, one-to-one, personalized research services provided by subject librarians for students and researchers in academic libraries. RCs may be viewed as the intersection of reference interviews and instruction; they sometimes are considered a kind of reference interview, as well as a form of information literacy instruction (Association of College & Research Libraries 2011). While we are particularly interested in RC, our exploration focuses on unguided interactions between people and chatbots. Such interactions can offer important guidance in constructing RCs and in considering ethical chatbot use more generally.

Roles, services, and infrastructures

The project involved a collaboration between a university behavioral sciences librarian, a senior professor in a field served by that librarian (social work), and an independent expert with extensive experience using large data sets and artificial intelligence (AI). The librarian and professor defined research consultation scenarios likely to occur as the librarian served social work faculty and students, and they also led the case simulation construction and analysis that was the centerpiece of the project. The AI expert led the review and selection of GPT-based products to be used, and also provided technical guidance on chatbot inquiries and related interpretations. In addition, he accessed ShareGPT data that served as important supplemental information to our case simulation.

We did not have to draw upon any formal university services or collections to execute this case study, although the librarian did review a historical record of research consultations she compiled to gain a basic understanding of the substance and range of RC in the social science areas she serves. Beyond our own efforts, the project primarily required access to leading GPT-based products, all of which are easily available both inside and outside of university libraries. Similarly, no special computing software or capabilities were required beyond those typically available on office computers. In selecting the GPT-based products to use, our technical expert sought to identify popular but diverse products to allow potentially useful comparisons. In addition, to examine possible differences associated with free versus paid products, we included ChatGPT-4 as a paid product along with the free products we used. ChatGPT-4 required a subscription fee of $20 per month when the study was conducted.

Process and methods

We developed the case study simulation using a social work/social policy scenario through which we could observe and reflect upon ethical issues in chatbot use. We included a comparison of selected GPT-based chatbot products: ChatGPT-3.5, ChatGPT-4, Perplexity AI, and Google Bard. Our intent was to explore how rapidly improving AI tools might perform differently and how performance might interact with corresponding ethical issues. We especially valued taking the user’s growth into consideration when reflecting on ethical issues, and we observed that ethical considerations vary with a user’s expertise level and purpose of use; they are not a static standard, but rather dynamic and interactive. We referred to the ACRL Framework for Information Literacy for Higher Education for guidance on information literacy, which it defines as “the set of integrated abilities encompassing the reflective discovery of information, the understanding of how information is produced and valued, and the use of information in creating new knowledge and participating ethically in communities of learning” (Association of College & Research Libraries 2016).

An initial consideration was how we could simulate a search on a basic question to facilitate our preliminary exploration of chatbot use. We decided on a strategy of having our subject area expert construct a few basic research-related questions that an undergraduate student in social work or a related field might be asked to pursue in gathering information for a term paper or other class project. After the team reviewed this initial list, we selected a question related to social work and technology on which to simulate a search. It is notable that social work students typically receive little educational training on this issue, and yet “Harnessing Technology for Social Good” has been identified as one of the 13 Grand Challenges of Social Work (Singer et al. 2022, 230). As such, we viewed this as a question of importance in the social work field, but one on which an undergraduate student typically would have limited background and hence would likely be a novice in terms of domain knowledge.

In addition, we assumed that the user would be a relative novice in terms of experience in using AI, given that our construction occurred in the early stages of GPT-based chatbot use. Hence, in our simulation we chose to limit the sophistication and scope of prompting. We recognize that this conceptualization of “novice” is crude, and it represents the least developed or “beginning” endpoint on a continuum of domain and technical competencies useful in addressing the question asked. Yet this is an important group to consider as the use of GPT-based chatbots in educational and research inquiry ramps up, and one that will be of particular importance to librarians and instructors.

To guide our search, we formulated our question in this way: “Technology is likely to have varying impacts on different groups in society. What are some of the overarching concerns with how the rapid development of technology may have negative effects on poor people and other disadvantaged groups?”

The AI expert and subject librarian then simulated a student search of this topic. We developed a series of prompts to simulate a user’s exploration process and data collection efforts, first starting with the broad research question, and then drilling down into specific areas. For example, after the broad research question was presented to the chatbots directly, the chatbots were asked follow-up questions about the “digital divide” and other aspects related to more fully exploring the basic question. These additional prompts can be viewed as corresponding to how one with slightly more subject matter expertise could refine and enhance a basic search.

We asked identical questions of each of the four AI products selected. For each question, we collected the output generated for subsequent analysis. We then evaluated selected aspects of the accuracy and relevance of the answers, and compared them across ChatGPT-3.5, ChatGPT-4, Google Bard, and Perplexity AI. This included review of the output from each chatbot by team members, as well as related follow-up checking to verify the accuracy of response information. In assessing accuracy, we focused on determining whether the chatbot output contained notable factual inaccuracies that would be obvious to a subject matter expert. Detailed reference checking then served to verify whether the sources provided in chatbot responses were accurate. With respect to relevance, we focused on the substantive quality of the output, how closely the response focused on the question, and how well response components were integrated. We ran these tests during March-May 2023, and we fully recognize the point-in-time nature of these searches and that the capabilities of these tools are changing very rapidly.
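
For products with public APIs, the question-and-collection step can also be scripted. The following is a minimal illustrative sketch rather than the procedure used in the study: it assumes the openai Python package and an OPENAI_API_KEY environment variable, and it covers only the two OpenAI models (Bard and Perplexity responses would have to be collected through their web interfaces). The output file name is our own placeholder.

```python
# Illustrative sketch: pose the study's broad research question to the two
# OpenAI models and save the raw responses for later accuracy and
# relevance review. Assumes `pip install openai` and OPENAI_API_KEY.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = (
    "Technology is likely to have varying impacts on different groups in "
    "society. What are some of the overarching concerns with how the rapid "
    "development of technology may have negative effects on poor people "
    "and other disadvantaged groups?"
)

outputs = {}
for model in ("gpt-3.5-turbo", "gpt-4"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
    )
    outputs[model] = response.choices[0].message.content

# Store the verbatim answers so they can be fact-checked afterward.
with open("chatbot_responses.json", "w") as f:  # placeholder file name
    json.dump(outputs, f, indent=2)
```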

Results and supplemental methods

All of these models were able to generate an overview with relevant information in response to our initial broad research question. However, the responses differed in important ways. First, the range of information varied significantly. ChatGPT-3.5, Bard, and Perplexity each generated five different areas of concern related to technological development, while GPT-4 provided ten. Second, while some overlapping types of concern were provided by the different chatbots, the issues selected also varied significantly. Only the well-known “digital divide” was included as a category of response by each of the chatbots. Finally, the chatbots exhibited a noticeable differentiation of focus in their responses. GPT-3.5 and GPT-4 both provided responses that focused specifically and consistently on concerns about poor and disadvantaged groups, while some Bard and Perplexity responses drifted into technology concerns of importance to broader populations (e.g., fake news, sedentary lifestyles, cyberbullying). Perplexity differed from the other three chatbots in that it offered links to related citations, which made it easy to take a first step in exploring relevant information in more depth.

In considering more detailed follow-up questions, we decided to focus on the digital divide, given that it was the one topic on which each chatbot provided initial responses and it also is among the most fundamental technology issues facing poor people. In particular, we asked the chatbots: “Can you describe the most important research and researchers in the ‘digital divide’ research area?”

Again, the chatbots provided a great deal of relevant information, but the organization and quality varied substantially. Perhaps consistent with the dual focus of the question on “most important research” and “researchers,” the data were organized differently by each chatbot and sometimes focused on one aspect (i.e., research versus researchers) more than the other. ChatGPT-4 again stood out in terms of being well-organized, by providing an initial description of key related research areas and then identifying some well-known scholars with brief but substantive information on their contributions. Bard also provided a useful basic summary of important digital divide related research areas, but its list of researchers was minimal, and it did not provide much information on their contributions. ChatGPT-3.5 provided some useful information on digital divide researchers but no integrated discussion of important research areas. Perplexity likewise offered no overview summary or integration of topics, but provided many subcategories of relevant research with brief summaries and again provided links to related citations.

We then asked a follow-up question on one of the prominent researchers mentioned in responses to the previous question: “What are the most important research publications written by Jan van Dijk on digital divide?”

All four chatbots responded to the question and listed works purportedly written by van Dijk. As might be expected, there were variations in the books and articles provided. GPT-3.5 and GPT-4 appeared to provide the highest quality information, with GPT-4 again providing more detail and presenting the information in a well-organized and easy-to-use format. In contrast, Perplexity provided relatively little information, while Bard included a short summary of van Dijk along with four purported publications by him.

Another important concern emerged when we engaged in systematic fact-checking of the publications each chatbot attributed to van Dijk. The Bard information proved most troubling, in that none of the four publications provided could be verified as real. ChatGPT-3.5 also had one non-verifiable publication among the five it provided, and both ChatGPT-3.5 and ChatGPT-4 provided other publications with partially incorrect information, such as publication dates or collaborators. In contrast, consistent with follow-ups in earlier questions on its linked sources, the Perplexity citation information was consistently accurate.
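
Much of this reference checking can be triaged programmatically before manual verification. As a hedged illustration (not the method we used), the sketch below queries the public Crossref works API for a free-text citation; the absence of a close match does not by itself prove a reference is fabricated, so manual follow-up remains essential. The example citation is a real van Dijk book, used only for illustration.

```python
# Illustrative triage of a chatbot-supplied citation against the public
# Crossref REST API. A close title match suggests the work exists; no
# match flags the citation for manual checking. Assumes `requests`.
import requests

def crossref_candidates(citation: str, rows: int = 3) -> list[str]:
    """Return the top Crossref title matches for a free-text citation."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [item["title"][0] for item in items if item.get("title")]

# Example: a real van Dijk book, phrased the way a chatbot might cite it.
cited = ("van Dijk, Jan. 2005. The Deepening Divide: "
         "Inequality in the Information Society. Sage.")
for title in crossref_candidates(cited):
    print(title)
```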

In summary, all of these large language models (LLMs) provided a reasonable introductory description of a well-known research topic, both in terms of the accuracy of the information provided and its relevance to the broader issue being queried. However, even these introductions varied substantially, and the LLMs generated divergent answers to specific questions that required additional knowledge. In addition, the credibility of the information sources provided varied greatly, resulting in major accuracy and related quality issues. The LLMs also varied widely in their organization and integration of the materials provided, ranging from highly organized and well-integrated responses, such as those provided by ChatGPT-4, to more casual summaries with some drift of focus from the questions asked.

April 2023 data set from ShareGPT

While the case simulation was our principal source of information for this project, we also decided to examine actual user case reports to gain a preliminary sense of selected ways that chatbots are being used in pursuit of research-related information. We did so by accessing user case reports available from ShareGPT, a Chrome extension that allows users to share their conversations with ChatGPT-3.5 by generating a URL that can be posted to social media or other internet sources. The shared URLs were originally open for public access, which was later discontinued after accusations that Google was using this data set to help fine-tune its own model, Bard. We were able to download 90K conversations that had entered the public domain before access was closed. We identified 53 conversations related to research consultation in this data set by using “research topic” and “research question” as search terms. In this sense, the cases diverge from our simulation exercise focusing on novice users, and many also fall outside the normal boundaries of library RC. Nonetheless, these cases provide useful broader exposure to a diverse and rich range of early uses, and thus can enhance our consideration of ethical issues.
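
The keyword screen itself is straightforward. Below is a minimal sketch of how such filtering might be done, assuming the common ShareGPT JSON layout in which each record holds a "conversations" list of turns; the input file name is a placeholder for the downloaded dump.

```python
# Minimal sketch: screen a downloaded ShareGPT dump for conversations
# that mention our two search terms. Assumes the common ShareGPT layout:
# a JSON list of records, each with a "conversations" list of
# {"from": ..., "value": ...} turns.
import json

SEARCH_TERMS = ("research topic", "research question")

with open("sharegpt_dump.json") as f:  # placeholder file name
    records = json.load(f)

matches = []
for record in records:
    text = " ".join(
        turn.get("value", "") for turn in record.get("conversations", [])
    ).lower()
    if any(term in text for term in SEARCH_TERMS):
        matches.append(record)

print(f"{len(matches)} of {len(records)} conversations mention the terms")
```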

We read the output from these cases to identify selected ways in which actual users were interacting with ChatGPT, as well as to ascertain some related ethical concerns. We emphasize that this exercise was purely exploratory and is not intended to quantify aspects like most common uses or the adequacy of responses. Each output case included a verbatim “conversation” in which the “Human” provides some background information and asks questions, and then ChatGPT responds. Many cases include fairly long interactions, in which the person seeking information continues to refine and expand questions based on the information that ChatGPT provides.

A few summary points from the ShareGPT cases merit attention. First, the range of questions and topics is striking, including but not limited to queries for help with short undergraduate papers or class project proposals; thesis writing; article background research, research question and hypothesis development, and writing; and what appear to be more formal funding proposals. It was not possible to review and verify the vast and diverse quantity of ChatGPT output provided through these cases, but in general the responsiveness and interactions appeared impressive. Second, similar to our earlier characterization of novice and more advanced users, these cases demonstrated wide ranges of sophistication among those asking questions. In more advanced conversations, the questioner was able to skillfully review the ChatGPT response and continue asking follow-ups until a desired product or set of answers was produced.

Some differences in the types of information being sought likewise are useful in considering possible ethical boundaries or instructor guidance on the acceptable use of chatbots for classes or research projects. For example, some of the queries were seeking initial help with how to do something, such as requesting the basic steps in writing a research report. Others were using ChatGPT much like an editor or an advanced form of clerical support. Uses of this nature included taking a set of materials provided and turning it into paragraph form; pulling bullet points or PowerPoint slides from written material provided; and translating from one language to another. Even some of these basic uses have ethical implications, in that it was not possible to determine whether the initial source materials presented to ChatGPT were actually written by the person presenting them.

Other conversations are more challenging from an ethical standpoint and suggest the need for rich discussion about acceptable use guidelines in various academic venues. First, many queries asked ChatGPT to develop research questions and hypotheses related to a particular topic. One could argue that this often occurs when students and researchers review existing research, and as such is just a more efficient way to identify possibilities. However, this identification stage often is considered fundamental to critical thinking, so having a single prompt yield suggested research questions and hypotheses raises interesting questions. A related concern is that ChatGPT generally did not provide citations for such research advice, so there are important questions concerning the quality of the information provided and the crediting of ideas. Second, it was clear that many of the inquiries involved using ChatGPT to formulate or actually write large portions of papers. In some cases, this was done with a single blanket prompt (e.g., write me a project proposal on a given subject). In others, it involved a long dialogue in which the initial query asked for fairly general ideas (such as a possible paper outline and sections), and then follow-ups requested more detailed drafting.

The ShareGPT data set is useful for studying actual prompting to gain understanding of the human side of human-chatbot interactions, as well as for analyzing prompt behavior to gain insights into the behavior of the chatbots. Yet, we must recognize that presenting prompts to obtain apparently ethically challenging output does not mean the output in turn would actually be used unethically. The actual use of such output in educational and research product development is a complex issue requiring further study.

Overall, the ability to capture and review actual conversations like these suggests an interesting area for further research. These case interactions point to the rich and diverse amount of information that ChatGPT is able to produce when provided with specific questions. At the same time, the more detailed back and forth between the person seeking information and ChatGPT suggests specific areas of inquiry in which ethical use guidance may be especially important. Finally, gaining a better sense of the quality of response, and how this may compare to alternative means of information access and research consultation, is a very complex subject requiring additional research.

Background

The nature of our project did not involve implementing specific AI tools either in the library or in the collaborating social work school. Rather, our intent was to learn how AI products readily available to library patrons are likely to be used, and what some of the practical and ethical impacts of such use will be on RC and teaching practices. As such, we view the benefit of our project as providing early findings and related guidance on ethical issues that are evolving very rapidly. This obviously is an area in which discussions are exploding not only in libraries but across all academic units, and universities and other entities are scrambling to develop responsible policies, guidelines, and resources to guide both research and teaching practices. We intend for our project to be one of many responding to major knowledge gaps related to the use of these transformational tools, providing evidence-based findings that raise important ethical questions related to developing effective practice guidelines.

With respect to other tools and guidance, we particularly were informed by the Framework for Information Literacy for Higher Education (Association of College & Research Libraries 2016). It provides a broad framework that should be useful in framing ongoing thinking about information use in inquiry, even as AI and other information tools fundamentally change information seeking and use strategies. The ability to look at real inquiries through the ShareGPT files also was very useful in beginning to understand the vast range of inquiries, sophistication of prompting, and possible uses and misuses of these new capabilities.

Ethical considerations

Because our study developed and executed a case simulation involving only our project team, we focused on identifying and assessing selected ethical challenges that our results suggest will be important in shaping university library and teaching policies and practices. We consulted several references in considering such ethical issues broadly, including the ALA Code of Ethics, IFLA Code of Ethics for Librarians and other Information Workers, IFLA Statement on Libraries and Artificial Intelligence, the National Association of Social Workers Code of Ethics, and the Council on Social Work Education 2022 Educational Policy and Accreditation Standards.

Our overall reflection from this project is that the general stance and/or attitude one takes toward AI technology powerfully affects which ethical issues will receive greatest priority. On the one hand, there are legitimate concerns regarding how this new technology may be misused, which may lead to a primary “resistance attitude” that emphasizes ethical considerations related to preventing harm. However, this approach may be overly restrictive, and could foster a narrow focus that does not capture the rich potential of these new applications. In contrast, an “embracing attitude” is more reluctant to limit issue considerations to merely reducing potential harms, and instead encourages thoughtful exploration of possible benefits if appropriate accompanying guidance and guardrails can be developed. Ethical considerations in such a framework appear more dynamic, multidimensional, and holistic, and at this point are largely unresolved and will require ongoing thoughtful discourse as AI applications and use continue to develop. As we consider in the next section who will be affected in university educational settings by these emerging chatbot developments, we try to balance concerns about harm with more positive possibilities.

Within this broader context, our case simulation findings suggest several more specific ethical implications. First, the findings are particularly problematic in terms of novice users, because such users are unlikely to have the prior domain knowledge helpful in distinguishing between accurate and inaccurate information generated by LLMs. The mixture of accurate and inaccurate information found in our simulation, all of which was presented in fluent human language, is extremely challenging for novice users to untangle and potentially difficult even for more advanced ones. To be effective, users thus need to build strong fact-checking and interpretive skills when collecting and using LLM-generated information.

A second related issue extends beyond assessing basic data accuracy. We found considerable variation in the nature and specificity of information provided by different chatbots, even when information was accurate. As such, the quality of learning for users is likely to be affected significantly by which chatbot they happen to employ. This suggests the need for strong guidance from instructors, librarians, or others regarding the quality of different chatbot products as well as on how best to formulate search questions.

A final emerging ethical issue pertains to equity in access to various chatbots. In our case study, the paid ChatGPT-4 model appeared to be more advanced in the focus, integration of ideas, and organization of output than the free alternatives we tested. This suggests that users with fewer resources may not only get less information, but lower quality information. The commercialization of chatbot applications in the future is likely to amplify these early findings of qualitative differences in free versus paid products. This inequality issue merits attention as universities develop guidance for chatbot use. It likewise raises another potentially troubling version of the digital divide with respect to access to higher quality chatbots.

Who is affected by this project?

The use and impact of GPT-based chatbots in research, writing, and learning by university students, faculty members, and other library patrons is unfolding in many unpredictable ways and will have as yet unknown influences on library practice. Based on this exploratory research, it is clear that ongoing chatbot developments will present significant opportunities and challenges for teaching and research, and in turn for RC and other related library services. Such developments will occur in the context of a whole university ecosystem that includes but is not limited to students, instructors, librarians, and administrators engaged in setting instructional and technology use policies. The following is a brief initial assessment of selected challenges and opportunities that our study suggests will be relevant for each of these key stakeholders.

Students

It is useful to start with students, as they are the key patrons whom other stakeholders are attempting to serve in empowering and ethical ways. Our case illustrates just how challenging guiding the use of these new technologies is likely to be, even when considering the most well-meaning and dedicated students. One important challenge is assuring that students ask chatbots questions that are most likely to result in relevant and accurate information. Doing so requires sufficient domain knowledge to formulate initial questions well, and then to follow up with thoughtful prompts based on initial responses. But even if students execute these functions reasonably well, we cannot assure that chatbots will provide relevant and accurate information. This points to the need for careful scrutiny and ongoing checking by students of the chatbot information they generate.

This is true of any search for new information. Yet it is amplified in unguided chatbot searches in which consumers are unsophisticated about issues such as how the chatbot is retrieving and assembling information, as well as about the accuracy and ethical use of the data retrieved. It is elevated in situations in which students are investigating a topic on which they have little background, because the lack of domain knowledge compromises one’s ability to assess the accuracy and relevance of the information retrieved.

Despite these challenges, well-executed student chatbot use will present important opportunities for enhancing learning. It provides the promise of quickly generating a reasonable set of basic information, and of providing both argumentation and references that can aid thinking about next steps in investigating a topic of interest. This can be the first step in an iterative process in which students conduct deeper chatbot searches building on the initial information produced. It can allow students to quickly obtain initial information on diverse topics, and then to focus at later stages on more integrative, application-oriented, and creative thinking.

Instructors

Instructors face even more complex challenges and opportunities. Like others, most instructors will experience a steep learning curve in developing chatbot use expertise, and often may be no more advanced than the average student they teach. Thus, they must navigate many of the same challenges as students in learning how to use chatbots in ways that are productive and uncompromising of core ethical and quality values. But their challenges extend much further, particularly in their central role in guiding and assessing student learning. Such issues include but are not limited to clarifying the extent to which chatbot use is acceptable for completing papers or other assignments; establishing how direct use of chatbot output should be quoted or otherwise cited; testing the accuracy of citations used in chatbot output; and developing familiarity concerning which GPT-based chatbots are most likely to meet performance standards. Instructors also are in the unenviable position of devising and enforcing strategies to assure that students use chatbots ethically in completing their assignments.

Again, however, skilled and well-trained instructors can use chatbots to enhance student learning opportunities as well as to assess student learning. For example, instructors can use chatbots to rapidly scan for basic information on a wide array of topics. Chatbots also may become a relatively easy way to check on the range and depth of information students access when completing assignments. They likewise may be used to explore information on teaching techniques and accompanying materials. As chatbots improve and instructor expertise develops, it may be that chatbots in some ways become to composition what computers are to calculation. In particular, students might be encouraged to use chatbots for basic learning searches and related compositions, thus not only instructing them on next wave information acquisition but freeing time for higher level critical and conceptual thinking. Just as instructors have long been trained on how to enhance student learning through effective questioning in classroom settings, they will be challenged on how to train students on effective question development for use with chatbots.

Librarians

Academic librarians will be key stakeholders for developing chatbot literacy strategies that help students and faculty members understand both the pitfalls and potential of chatbot access. The skill with which they develop and continually refine expertise will be critical to the relative success of chatbot use within institutions. Librarians will not only be influential in how they engage with students in RCs involving chatbot use. Perhaps more fundamentally, they can be critical partners in working with instructors to think through information gathering and use strategies that maximize the capabilities of chatbot use in student learning while safeguarding ethical standards and stimulating higher-level critical thinking.

Librarians likewise will need to work closely with faculty members and other university colleagues to determine and then disseminate information on the skills and capabilities essential to enhancing effective chatbot use. This role will include but not be limited to raising awareness of information accuracy issues, cultivating critical thinking in terms of relevance, and developing skills for translating effective questioning into productive chatbot prompts. Librarians also can work with faculty members to develop notes and guidelines on chatbot use for their courses, such as use purpose, effective prompt examples, and expectations related to crediting chatbot contributions appropriately.

Librarians will require training so they can integrate tools that are widely used by patrons into their skill repertoires, knowledge practices, and daily services, including a focus on cultivating critical thinking through question development. The prerequisite is to understand and identify related concepts and corresponding ethical considerations, such as traditional question negotiation and questioning versus chatbot-related prompt intention, prompt behavior, and output use. Such differences and the interactions between them need to be examined carefully so librarians can grasp the context and focus when they conduct RC. Question negotiation is an important part of RC through which the patron and librarian introduce the research topic and clarify the research question, which sets the foundation for developing effective search strategies. Further investigation is needed on how this process differs for interactions between users and chatbots, but here we present some initial observations.

Employing sound strategies of effective question design (EQD) should be an important focus in cultivating critical thinkers in the GPT-based chatbot era, as critical thinking and question development are interrelated and mutually reinforcing. Established questioning frameworks can serve as the threads that connect domain knowledge, language awareness, information literacy, and ethical considerations, which can enhance the development of critical thinking. GPT-based products can be effective tools during this process.

Developing thought-provoking, well-structured questions encourages individuals to engage in deeper analysis, evaluate different perspectives, and synthesize information from various sources. To facilitate this, questioning theories and frameworks have been developed in the realms of philosophy, psychology, and education. Bloom's Taxonomy and the Socratic questioning method are two such established frameworks, providing a structured approach to crafting questions (Anderson, Krathwohl, and Bloom 2001; Elder and Paul 2002). These frameworks target different levels of cognitive complexity, ranging from basic knowledge recall to higher-order analysis, synthesis, and evaluation, and they can be utilized to design a general question matrix (Knowledge Compass, n.d.) to promote effective question design.
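
To make the idea concrete, the toy sketch below pairs the six levels of the revised Bloom's taxonomy with generic question stems that can be filled in for a topic and reused as chatbot prompts. The stems and the helper function are our own illustrations rather than any published matrix.

```python
# Toy question matrix: revised Bloom's taxonomy levels mapped to generic
# question stems. Stems are illustrative; a librarian and patron would
# adapt them to a specific project before turning them into prompts.
QUESTION_MATRIX = {
    "remember":   "What are the key facts and definitions related to {topic}?",
    "understand": "How would you explain {topic} in plain language?",
    "apply":      "How does {topic} affect {group} in practice?",
    "analyze":    "What competing explanations exist for {topic}?",
    "evaluate":   "What evidence supports or undermines claims about {topic}?",
    "create":     "What unanswered research question about {topic} is worth pursuing?",
}

def to_prompt(level: str, **fields: str) -> str:
    """Fill a question stem so it can be used directly as a chatbot prompt."""
    return QUESTION_MATRIX[level].format(**fields)

print(to_prompt("apply", topic="the digital divide",
                group="low-income households"))
```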

Information literacy instruction that touches on GPT-based chatbots can give a general introduction to EQD, which can be enhanced by individual research consultations with subject librarians. Chatbots can be used as a tool to help librarians and instructors foster critical thinking in the classroom and in individual research consultations. In other words, starting from a general question matrix, subject librarians can help patrons develop individualized question matrices for particular projects to guide them in developing critical thinking skills. Combined with domain knowledge and language awareness, the question matrix can be properly transformed into ChatGPT prompts after careful trials according to the subject and purpose.

A reverse approach is to formulate questions from research articles that embody strong critical thinking, turn these questions into prompts, and then use ChatGPT as an assistant in conducting critical thinking training. Classic articles in the domain can be utilized as “training cases” for critical thinking. ChatGPT may be a useful tool in alleviating the tension that has been perceived between subject matter instruction and critical thinking.

Administrators

In addition to these dyadic interactions with students and instructors, librarians will be critical stakeholders in the development of broader chatbot use guidance for the university community. This will involve increasing interactions with selected university administrators in establishing policies on chatbot use. These stakeholders include not only higher-level library administrators, but also other university administrators charged with implementing teaching, learning, and assessment frameworks that cross university units.

We have conducted some preliminary reviews of university policies regarding chatbot use, which appear to be very diverse. While further study is needed, our reviews suggest differing tensions between resisting or highly circumscribing chatbot use versus embracing its potential in learning. Another key issue is the extent to which administrative leaders attempt to develop centralized university policy in this area, as opposed to decentralizing it for unit decision-making or leaving important determinations to instructors or other personnel.

Reviewing such policies more systematically is an interesting area for further research, and university librarians will be well-suited to contribute to related dialogues needed to create best practices. They similarly will assume major responsibilities for developing broader scale training sessions and advisories for diverse university stakeholders, including administrators. This is likely to be a daunting challenge for librarians already needing to keep pace with new information provision technologies in the rapidly changing digital space, but it likewise presents a vital opportunity for librarians to elevate their stature and impact in RCs and in related university information strategy development.

Lessons learned and future work

We began this project with limited knowledge about chatbot use in education and research, and as such we will reiterate only a few of the many lessons learned. First, our case simulation demonstrated that GPT-based products differ substantially in their capabilities. Further research that differentiates these product capabilities in more depth is important, as findings in this respect will be useful not only to librarians but more widely to consumers. A related concern is that we found the one paid ChatGPT-based product we reviewed generally provided more sophisticated responses, and it would not be surprising if additional paid products of this nature will provide higher level capabilities than free ones. This raises questions concerning whether chatbots such as these will further extend the digital divide and other digital inequities.

Second, we were impressed by the power of these applications to support research writing and research product development, but also concerned with major limitations in what was produced. Working to clarify both the strengths and limits of these products, and developing related strategies for their effective and ethical use, requires additional attention. Research related to how best to engage in fact-checking appears especially important with respect to the limitations we observed.

Third, the manner in which questions are framed, especially with regard to their specificity and related follow-ups/dialogues, is very important to minimizing drifting and improving the overall quality of responses. Examining strategies for training users on how best to interact with chatbots in seeking information consequently merits attention, and librarians can play key roles in this area. Improving questioning strategy and integrating ethical considerations in questioning frameworks may be one promising direction for improving AI information literacy. Librarians can work with instructors and students to develop prompts that represent translations from questioning frameworks with built-in critical thinking and ethical considerations.

Fourth, as described in the previous section, librarians have many important roles to play in contributing to responsible AI practice. Those who develop a clear understanding of university AI-related ethics policies and build rich expertise and experience in using chatbots can employ them selectively as a tool in conducting research consultations. For example, referring to the ACRL Framework “knowledge practices” set (Association of College & Research Libraries 2016, 7), librarians can use chatbots to help patrons develop and clarify research questions, find theories and frameworks, harvest search terms, and translate search strings between databases, while advising on individualized prompting for their projects and providing simple guidelines on ethical use of output based on university and course policies. In the “dispositions” set, librarians can contribute to ethical chatbot use by helping patrons understand their relationship with chatbots, giving students tips on how to raise awareness of their own thinking and emotions, and guiding them toward a growth mindset during chatbot interactions.

Finally, the ACRL Framework applies not only to individuals, but also to institutions and learners. When formulating policies, universities should be mindful of potential contradictions and ensure that the measures taken are balanced and do not have unintended consequences that restrict learning. We are especially concerned that the very real need to assure ethical use of chatbots in learning does not inhibit institutions from simultaneously embracing and exploring their vast potential.

Documentation

We consulted the following policies and codes of ethics.

References

American Library Association. 2008. “Code of Ethics of the American Library Association.” http://www.ala.org/advocacy/proethics/codeofethics/codeethics.

Anderson, Lorin W., David R. Krathwohl, and Benjamin S. Bloom. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. New York: Longman.

Association of College & Research Libraries (ACRL). 2011. “ACRL Guidelines for Instruction Programs in Academic Libraries.” http://www.ala.org/acrl/standards/guidelinesinstruction.

————. 2016. “Framework for Information Literacy for Higher Education.” https://www.ala.org/acrl/standards/ilframework.

————. 2021. “Social Work, Companion Document to the ACRL Framework for Information Literacy for Higher Education.” https://acrl.libguides.com/ld.php?content_id=62704385.

Association of College & Research Libraries (ACRL), Educational and Behavioral Sciences Section [EBSS], Social Work Committee. 2020. “Companion Document to the ACRL Framework for Information Literacy for Higher Education: Social Work.” Accessed March 5, 2023. https://acrl.libguides.com/sw/about.

Council on Social Work Education. 2015. “2015 Educational Policy and Accreditation Standards [EPAS].” https://www.cswe.org/getattachment/Accreditation/Accreditation-Process/2015-EPAS/2015EPAS_Web_FINAL.pdf.

————. 2022a. “2022 Educational Policy and Accreditation Standards [EPAS].” https://www.cswe.org/accreditation/policies-process/2022epas.

————. 2022b. “2022 Educational Policies and Accreditation Standards [EPAS]: Frequently Asked Questions (Version 9.2.2022).” https://www.cswe.org/getmedia/67a67f0b-839e-420d-8cea-d919f9e6ca3a/2022-EPAS-FAQS.pdf.

Elder, Linda, and Richard Paul. 2002. The Miniature Guide to the Art of Asking Essential Questions. Dillon Beach, CA: The Foundation for Critical Thinking.

International Federation of Library Associations and Institutions. 2012. “IFLA Code of Ethics for Librarians and Other Information Workers.” IFLA Publications, August 2012. https://repository.ifla.org/handle/123456789/1850.

Knowledge Compass. n.d. Accessed March 7, 2023. https://www.knowledgecompass.org.

National Association of Social Workers. 2023. “Highlighted Revisions to the Code of Ethics.” https://www.socialworkers.org/About/Ethics/Code-of-Ethics/Highlighted-Revisions-to-the-Code-of-Ethics.

ShareGPT dataset. 2023. Accessed November 14, 2023. https://huggingface.co/datasets/RyokoAI/ShareGPT52K.

Singer, Jonathan B., Melanie Sage, Stephanie Cosner Berzin, and Claudia J. Coulton. 2022. “Harness Technology for Social Good.” In Grand Challenges for Social Work and Society, 2nd ed., edited by Richard P. Barth, Jill T. Messing, Trina R. Shanks, and James H. Williams, 230-256. New York: Oxford University Press.