2. User Studies • CLIR

DLF respondents devoted the bulk of their discussion to user studies, reflecting the user-centered focus of their operations. One respondent referred to the results of user studies as “outcome” measures because, although they do not measure the impact of library use on student learning or faculty research, they do indicate the impact of library services, collections, facilities, and staff on user experiences and perceptions.

Libraries participating in the DLF survey organize, staff, and conduct user studies differently. Some take an ad hoc approach; others use a more systematic approach. Some sites have dedicated staff experts in research methodologies who conduct user studies; others train staff throughout the libraries to conduct user studies. Some libraries take both approaches. Some have consulted experts on their campuses or contracted with commercial firms to develop research instruments and analyze the results. For example, libraries participating in the DLF survey have recruited students in library science and human-computer interaction to conduct user studies or hired companies such as Websurveyor.com or Zoomerang.com to host Web-based surveys and analyze the data. Libraries that conduct user studies use spreadsheet, database, or statistical analysis software to manage and analyze the data. In the absence of standard instruments, guidelines, or best practices, institutions either adapt published efforts to local circumstances or make their own. There is clearly a flurry of activity, some of it not well organized or effective, for various reasons discussed elsewhere in this report.

Learning how to prepare research instruments, analyze and interpret the data, and use the results is a slow process. Unfortunately, however, the ability to quickly apply research results is often essential, because the environment changes quickly and results go out of date. Many DLF respondents reported instances where data languished without being analyzed or applied. They strongly cautioned against conducting research when resources and interest are insufficient to support use of the results. Nevertheless, DLF libraries are conducting many user studies employing a variety of research methods. The results of these studies run the gamut: they may reinforce librarian understanding of what users need, like, or expect; challenge librarian assumptions about what people want; or provide conflicting, ambiguous, misleading, or incomplete information that requires follow-up research to resolve or interpret. Multiple research methods may be required to understand fully and corroborate research results. This exacerbates an already complicated situation and can frustrate staff. Resources may not be available to conduct follow-up studies immediately. In other cases, new priorities emerge that make the initial study results no longer applicable; in such a case, any attempt at follow-up is worthless. Moreover, even when research data have been swiftly analyzed, interpreting the results and deciding how to apply them may be slowed if many people are involved in the process or if the results challenge long-held assumptions and preferences of librarians. Finally, even when a plan to use the results is in hand, implementation may pose a stumbling block. The longer the entire research process takes, from conception to implementing the results, the more likely the loss of momentum and conflict with other priorities, and the greater the risk that the process will break down and the effort will be wasted. The issue appears to be related to the internal organization and support for the library’s assessment effort.

To help libraries understand and address these concerns, this section of the report describes popular user study methods, when and why DLF libraries have used them, where they succeeded, and where they failed. Unless otherwise noted, all claims and examples derive from the DLF interviews. The focus is surveys, focus groups, and user protocols, which are the methods DLF libraries use most often. Heuristic evaluations, paper prototypes and scenarios, and card-sorting exercises are also described because several DLF institutions have used these methods successfully.¹

2.1. Surveys (Questionnaires)

2.1.1. What Is a Survey Questionnaire?

Survey questionnaires are self-administered interviews in which the instructions and questions are sufficiently complete and intelligible for respondents to act as their own interviewers.² The questions are simply stated and carefully articulated to accomplish the purpose for which the survey is being conducted. Survey questions typically force respondents to choose from among alternative answers provided or to rank or rate items provided. Such questions enable a simple quantitative analysis of the responses. Surveys can also ask open-ended questions to gather qualitative comments from the respondents.

Surveys are an effective way to gather information about respondents’ previous or current behaviors, attitudes, beliefs, and feelings. They are the preferred method to gather information about sensitive topics because respondents are less likely to try to please the researcher or to feel pressured to provide socially acceptable responses than they would in a face-to-face interview. Surveys are an effective method to identify problem areas and, if repeated over time, to identify trends. Surveys cannot, however, establish cause-effect relationships, and the information they gather reveals little if anything about contextual factors affecting the respondents. Additional research is usually required to gather the information needed to determine how to solve the problems identified in a survey.

The primary advantage of survey questionnaires is economy. Surveys enable researchers to collect data from large numbers of respondents in relatively short periods of time at relatively low cost. Surveys also give respondents time to think about the questions before answering and often do not require respondents to complete the survey in one sitting.

The primary disadvantage of survey questionnaires is that they must be simple, impersonal, and relatively brief. If the survey is too long or complex, respondents may get tired and hurriedly answer or skip questions. The response rate and the quality of responses decline if a survey exceeds 11 pages (Dillman 1978). Instructions and questions must be carefully worded in language meaningful to the respondents, because no interviewer is present to clarify the questions or probe respondents for additional information. Finally, it is possible that someone other than the selected respondent may complete the survey. This can skew the results from carefully selected samples. (For more about sampling, see section 4.2.1.) When necessary, survey instructions may explicitly ask that no one complete the survey other than the person for whom it is intended.

2.1.2. Why Do Libraries Conduct Surveys?

Most of the DLF respondents reported conducting surveys, primarily to identify trends, “take the temperature” of what was happening among their constituencies, or get a sense of their users’ perceptions of library resources. Occasionally they conduct surveys to compare themselves with their peers. In summary, DLF libraries have conducted surveys to assess the following:

Patterns, frequency, ease, and success of use
User needs, expectations, perspectives, priorities, and preferences for library collections, services, and systems
User satisfaction with vendor products, library collections, services, staff, and Web sites
Service quality
Shifts in user attitude and opinion
Relevance of collections or services to the curriculum

A few respondents reported conducting surveys as a way to market their collections and services; others commented that this was an inappropriate use of survey research. One respondent referred to this type of survey as “push polling” and stated that there were easier, more appropriate ways than this to market what the library offers.

The data gathered from surveys are used to inform decision making and strategic planning related to the allocation of financial and human resources and to the organization of library units. Survey data also serve political purposes. They are used in presentations to faculty senates, deans’ councils, and library advisory boards as a means to bolster support for changes in library practice. They are also used in grant proposals and other requests for funding.

2.1.3. How Do Libraries Conduct Surveys?

DLF respondents reported that they conduct some surveys routinely; these include annual surveys of general library use and user priorities and satisfaction. Other surveys are conducted sporadically; in this category might be, for example, a survey to determine user satisfaction with laptop-lending programs. The library administrator’s approval is generally required for larger, more formal, and routine surveys. Smaller, sporadic, less expensive surveys are conducted at the discretion of middle managers.

Once the decision has been made to conduct a survey, libraries convene a small group of librarians or staff to prepare the survey instructions and questionnaire, determine the format of the survey (for example, print, e-mail, Web-based), choose the sampling method, identify the demographic groups appropriate for the research purpose, determine how many participants to recruit in each group and decide how to recruit them, and plan the budget and timetable for gathering, analyzing, interpreting, and applying the data. A few DLF respondents reported using screening questionnaires to find experienced or inexperienced users, depending on the purpose of the study.

Different procedures are followed for formal surveys than for small surveys. The former require more work. Because few libraries employ survey experts, a group preparing a formal survey might consult with survey experts on campus to ensure that the questions it has drafted will gather the information needed. The group might consult with a statistician on campus to ensure that it recruits enough participants to gather statistically significant results. When a survey is deemed to be extremely important and financial resources are available, an external consulting or research firm might be hired. Alternatively, libraries with adequate budgets and sufficient interest in assessment have begun to use commercial firms such as Websurveyor.com to conduct some surveys.
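The question a campus statistician would typically help answer, how many completed surveys are enough, can be roughed out in advance. The sketch below uses the standard textbook sample-size formula for estimating a proportion, with a finite population correction. It is an illustration only and not a procedure reported by DLF respondents; the function names, the 95 percent confidence level (z = 1.96), the 5 percent margin of error, and the example population and response rate are all assumptions.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Completed responses needed to estimate a proportion.

    Uses n0 = z^2 * p * (1 - p) / margin^2, then applies the finite
    population correction n = n0 / (1 + (n0 - 1) / population).
    p = 0.5 is the most conservative (largest-sample) assumption.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

def invitations_needed(completed, expected_response_rate):
    """Surveys to distribute so that `completed` responses are likely to come back."""
    return math.ceil(completed / expected_response_rate)

# Hypothetical campus of 20,000 people and an assumed 30 percent response rate.
needed = sample_size(20000)
to_send = invitations_needed(needed, 0.30)
print(needed, to_send)
```

A calculation like this gives only a planning figure. It assumes a simple random sample and says nothing about whether the people who respond resemble those who do not.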

If the survey is to be conducted in-house, time and financial constraints and the skills of library staff influence the choice of survey format. Paper surveys are slow and expensive to conduct. Follow-up may be needed to ensure an adequate response rate. Respondents are not required to complete them in one sitting; for this reason, paper surveys may be longer than electronic surveys. E-mail surveys are less expensive than paper surveys; otherwise, their advantages are similar. Web-based surveys might be the least expensive to conduct, particularly if scripts are available to analyze the results automatically. They also offer several other advantages. For example, they can be edited up to the last minute, and the capabilities of the Web enable sophisticated branching and multimedia surveys, which are difficult, or even impossible, in other formats. Both Web and e-mail surveys are easier to ignore than are paper surveys, and they assume participants have computer access. Web surveys have the further disadvantage that they must be completed in one sitting, which means they must be relatively short. They also require HTML skills to prepare and, if results are to be analyzed automatically, programming skills. Whether Web-based surveys increase response rate is not known. One DLF library reported conducting a survey in both e-mail and Web formats. An equal number of respondents chose to complete the survey in each format.

Considerable time and effort should be spent on preparing the content and presentation of surveys. Instructions and questions must be carefully and unambiguously worded and presented in a layout that is easy to read. If not, results will be inaccurate or difficult or impossible to interpret; worse yet, participants may not complete the survey. The choice of format affects the amount of control libraries have over the presentation or appearance of the survey. Print offers the most control; with e-mail and Web-based formats, there is no way for the library to know exactly what the survey will look like when it is viewed using different e-mail programs or Web browsers. The group preparing e-mail or Web surveys might find it helpful to view the survey using e-mail programs and Web browsers available on campus to ensure that the presentation is attractive and intelligible.

Libraries pilot test survey instructions and questions with a few users and revise them on the basis of test results to solve problems with vocabulary, wording, and the layout or sequence of the questions. Pilot tests also indicate the length of time required to complete a survey. Libraries appear to have ballpark estimates for how long it should take to complete their surveys. If the time it takes participants to complete the survey in the pilot tests exceeds this figure, questions might be omitted. The survey instructions include the estimated time required to complete the survey.

DLF respondents reported using different approaches to distribute or provide access to surveys, based on the sampling method and survey format. For example, when recruiting volunteers to take Web-based surveys, the survey might automatically pop up when users display the library home page or click the exit button on the online public access catalog (OPAC). Alternatively, a button or link on the home page might provide access to the survey. Posters or flyers might advertise the URL of a Web-based survey or, if a more carefully selected sample is needed, an e-mail address to contact to indicate interest in participating. Paper surveys may be made available in trays or handed to library users. With more carefully selected sample populations, e-mail containing log-in information to do a Web-based survey, or the e-mail or paper survey itself, is sent to the targeted sample. Paper surveys can be distributed as e-mail enclosures or via campus or U.S. mail. DLF respondents indicated that all of these methods worked well.

Libraries use spreadsheet or statistical software to analyze the quantitative responses to surveys. Cross-tabulations are conducted to discover whether different user groups responded to the questions differently; for example, to discover whether the priorities of undergraduate students are different from those of graduate students or faculty. Some libraries compare the distribution of survey respondents with the demographics of the campus to determine whether the distribution of user groups in their sample is representative of the campus population. A few libraries have used content analysis software to analyze the responses to open-ended questions.
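The two analyses described above, cross-tabulating answers by user group and checking the sample against campus demographics, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the software any DLF library reported using; the function names, group labels, answers, and campus counts are invented for the example.

```python
from collections import Counter, defaultdict

def crosstab(responses):
    """Count answers per user group from (group, answer) pairs."""
    table = defaultdict(Counter)
    for group, answer in responses:
        table[group][answer] += 1
    return table

def row_percentages(table):
    """Express each group's counts as percentages of that group's responses."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {answer: 100 * n / total for answer, n in counts.items()}
    return result

def representation_gap(sample_counts, campus_counts):
    """Sample share minus campus share per group (positive = over-represented)."""
    sample_total = sum(sample_counts.values())
    campus_total = sum(campus_counts.values())
    return {
        group: sample_counts.get(group, 0) / sample_total
        - campus_counts[group] / campus_total
        for group in campus_counts
    }

# Hypothetical responses to "What is your top priority for the library?"
responses = (
    [("undergraduate", "longer hours")] * 3
    + [("undergraduate", "more e-journals")]
    + [("graduate", "longer hours")]
    + [("graduate", "more e-journals")] * 3
    + [("faculty", "more e-journals")] * 2
)
# Hypothetical campus head counts.
campus = {"undergraduate": 6000, "graduate": 3000, "faculty": 1000}

table = crosstab(responses)
percentages = row_percentages(table)
sample_counts = {group: sum(counts.values()) for group, counts in table.items()}
gaps = representation_gap(sample_counts, campus)

for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%} relative to campus share")
```

In this invented sample, undergraduates are under-represented relative to the campus, which would be a reason to read the overall totals with caution or to weight the results.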

2.1.4. Who Uses Survey Results? How Are They Used?

Libraries share survey results with the people empowered to decide how those results will be applied. The formality of the survey and the sample size also determine who will see the results and participate in interpreting them and determining how they will be used. High-profile, potentially contentious survey topics or research purposes tend to be treated more formally. They entail the use of larger samples and generate more interest. Survey results of user satisfaction with the library Web site might be presented to the library governing council, which will decide how the results will be used. Data from more informal surveys might be shared strictly within the department that conducted the survey. For example, the results of a survey of user satisfaction with the laptop-lending program might be presented to the department, whose members will then decide whether additional software applications should be provided on the laptops. Striking or significant results from a survey of any size seem to bubble up to the attention of library administrators, particularly if follow-up might have financial or operational implications or require interdepartmental cooperation. For example, results of a survey of reference service that suggest that users would be better served by longer reference desk hours or staffing with systems office personnel in addition to reference librarians should be brought to the attention of library administration. Survey data might also be shared with university administrators, faculty senates, library advisory boards, and similar groups, to win or bolster support for changing directions in library strategic planning or to support requests for additional funding. Multiyear trends are often included in annual reports. The results are also presented at conferences and published.

Although survey results often confirm expectations and validate what the library is doing, sometimes the results are surprising. In this case, they may precipitate changes in library services, user interfaces, or plans. The results of the DLF survey indicate the following applications of survey data:

Library administrators have used survey results to inform budget requests and secure funding from university administrators for electronic resources and library facilities.
Library administrators and middle managers have used survey results to guide reallocation of resources to better meet user needs and expectations. For example, low-priority services have been discontinued. More resources have been put into improving high-priority services with low satisfaction ratings or into enhancing existing services and tools or developing new ones.
Collection developers have used survey results to inform investment decisions, for example, to decide which vendor’s Modern Language Association (MLA) bibliography to license; whether to license a product after the six-month free trial period; or whether to drop journal titles, keep the titles in both print and electronic format, or add the journals in electronic format. Developers have also used survey data to inform collection-development decisions, for example, to set priorities for content to be digitized for inclusion in local collections or to decide whether to continue to create and collect analog slides rather than move entirely to digital images.
Service providers, such as reference, circulation, and resource sharing (interlibrary loan [ILL] and document delivery) departments, have used survey results to identify problem areas and formulate steps to improve service quality in a variety of ways, for example, by reducing turnaround time for ILL requests, solving problems with network ports and dynamic host assignments for loaner laptops, helping users find new materials in the library, improving staff customer service skills, assisting faculty in the transition from traditional to electronic reserves, and developing or revising instruction in the use of digital collections, online finding aids, and vendor products.
Developers have used survey results to set priorities and inform the customization or development of user interfaces for the OPAC, the library Web site, local digital collections, and online exhibits. Survey results have guided the revision of Web site vocabulary, the redesign of navigation and content of the library Web site, and the design of templates for personalized library Web pages. They have also been used to identify online exhibits that warrant upgrading.
Survey results have been used to inform or establish orientation, technical competencies, and training programs for staff, to prepare reports for funding agencies, and to inform a Request for Proposals from ILS vendors.
A multilibrary organization has conducted surveys to assess the need for original cataloging, the use of shared catalog records and vendor records, the standards for record acceptance (without local changes), and the applicability of subject classifications to library Web pages, all to inform plans for the future and ensure the appropriate allocation of cataloging resources.

DLF respondents mentioned that survey results often fueled discussion of alternative ways to solve problems identified in the survey. For example, when users report that they want around-the-clock access to library facilities, libraries examine student wages (since students provide most of the staffing in libraries during late hours) and management of late-night service hours. When users complain that use of the library on a campus with many libraries is unnecessarily complicated, libraries explore ways to reorganize collections to reduce the number of service points. When users reveal that the content of e-resources is not what they expect, libraries evaluate their aggregator and document delivery services.

2.1.5. What Are the Issues, Problems, and Challenges With Surveys?

2.1.5.1. The Costs and Benefits of Different Types of Surveys

DLF respondents agreed that general surveys are not very helpful. Broad surveys of library collections and services do provide baseline data and, if the same questions are repeated in subsequent surveys, offer longitudinal data to track changing patterns of use. However, such surveys are time-consuming and expensive to prepare, conduct, and interpret. Getting people to complete them is difficult. The results are shallow and require follow-up research. Some libraries believe the costs of such surveys exceed the benefits and that important usage trends can be tracked more cost-effectively using transaction log analysis. (See section 3.)

Point-of-use surveys that focus on a specific subject, tool, or product work as well as, or better than, general surveys. They are quicker to prepare and conduct, easier to interpret, and more cost-effective than broad surveys. However, they must be repeated periodically to assess trends, and they, too, frequently require follow-up research.

User satisfaction surveys can reveal problem areas, but they do not provide enough information to solve the problems. Service quality surveys, based on the gap model (which measures the “gap” or difference between users’ perceptions of excellent service and their perceptions of the service they received), are preferred because they provide enough information to plan service improvements. Unfortunately, service quality surveys are much more expensive to conduct than user satisfaction surveys.
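To make the gap model concrete, the sketch below computes a mean gap score for each service dimension from paired ratings. It is an illustration under stated assumptions, not the instrument used by any DLF library: the 1-to-7 rating scale, the dimension names, the sign convention (perceived minus expected, so a negative score marks a shortfall), and all of the data are invented.

```python
from statistics import mean

def gap_scores(ratings):
    """Mean of (perceived - expected) per service dimension.

    ratings maps a dimension name to a list of (expected, perceived)
    pairs, one pair per respondent. Negative scores mean the service
    received fell short of what respondents consider excellent.
    """
    return {
        dimension: mean(perceived - expected for expected, perceived in pairs)
        for dimension, pairs in ratings.items()
    }

# Hypothetical ratings on an assumed 1-7 scale: (expected, perceived).
ratings = {
    "ILL turnaround time": [(7, 4), (6, 5), (7, 3)],
    "staff courtesy": [(6, 6), (7, 6), (5, 6)],
}

gaps = gap_scores(ratings)
# List dimensions from the largest shortfall to the smallest.
by_shortfall = sorted(gaps, key=gaps.get)
for dimension in by_shortfall:
    print(f"{dimension}: {gaps[dimension]:+.2f}")
```

Ordering the dimensions by shortfall is one simple way to turn gap scores into a list of candidates for service improvement, which is the planning information that a plain satisfaction rating does not supply.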

2.1.5.2. The Frequency of Surveys

Surveys are so popular that DLF respondents expressed concern about their number and frequency. Over-surveying can decrease participation and make it more difficult to recruit participants. When the number of completed surveys is very small, the results are meaningless. Conducting surveys as a way to market library resources might exacerbate the problem.

2.1.5.3. Composing Survey Questions

The success of a survey depends on the quality and precision of the questions asked: their wording, presentation, and appropriateness to the research purpose. In the absence of in-house survey expertise, adequate training, or consultation with an expert, library surveys often contain ambiguous or inaccurate questions. In the worst cases, the survey results are meaningless and the survey must be entirely revised and conducted again the following year. More likely, the problem applies to particular questions rather than to the entire survey. For example, one DLF respondent explained that a survey conducted to determine the vocabulary to be used on the library Web site did not work well because the categories of information that users were to label were difficult to describe, particularly the category of “full-text” electronic resources. Developing appropriate and precise questions is the key reason for pilot testing survey instruments.

Composing well-worded survey questions requires a sense of what respondents know and how they are likely to respond. DLF respondents reported the following examples. A survey conducted to assess interface design based on heuristic principles did not work well, probably because the respondents lacked the knowledge and skills necessary to apply heuristic principles to interface design (see section 2.4.1.1). Surveys that ask respondents to specify the priority of each service or collection in a list yield results where everything is simply ranked either “high” or “low,” which is not particularly informative. Similarly, surveys that ask respondents how often they use a service or collection yield results of either “always use” or “never use.” Where it is desirable to compare or contrast collections or services, it is important to require users to rank the relative priority of services or collections and to rank the relative frequency of use. Otherwise, interpreting the results will be difficult.
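One way to analyze forced rankings of the kind recommended above is to average the rank each service receives across respondents. The sketch below is a minimal illustration, not a method reported by DLF respondents; the function name, the service names, and the rankings are hypothetical, and mean rank is only one of several reasonable summaries.

```python
from statistics import mean

def mean_ranks(rankings):
    """Order services by average rank (1 = highest priority).

    rankings is a list of dicts, one per respondent, mapping each
    service to a distinct rank from 1 to the number of services.
    """
    services = sorted(rankings[0])
    expected = list(range(1, len(services) + 1))
    for ranking in rankings:
        # Reject ties and omissions: a forced ranking uses every rank once.
        if sorted(ranking) != services or sorted(ranking.values()) != expected:
            raise ValueError("each respondent must rank every service exactly once")
    scores = {s: mean(r[s] for r in rankings) for s in services}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical rankings from three respondents.
rankings = [
    {"e-journals": 1, "interlibrary loan": 2, "reserves": 3},
    {"e-journals": 1, "interlibrary loan": 3, "reserves": 2},
    {"e-journals": 2, "interlibrary loan": 1, "reserves": 3},
]

ordered = mean_ranks(rankings)
for service, score in ordered:
    print(f"{service}: mean rank {score:.2f}")
```

Because every respondent must spread the ranks across all services, the averages separate the services from one another instead of clustering at “high.”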

Asking open-ended questions and soliciting comments can also be problematic. Many respondents will not take the time to write answers or comments. If they do, the information they provide can offer significant insights into user perceptions, needs, and expectations. However, analyzing the information is difficult, and the responses can be incomplete, inconsistent, or illegible. One DLF respondent reported having hundreds of pages of written responses to a large survey. Another respondent explained that he and his staff “spent lots of time figuring out how to quantify written responses.” A few DLF libraries have attempted to automate the process using content analysis software, but none of them was pleased with the results. Perhaps the problem is trying to extract quantitative results from qualitative data. The preferred approach appears to be to limit the number of open-ended questions and analyze them manually by developing conceptual categories based on the content of the comments. Ideally, the categories would be mutually exclusive and exhaustive (that is, all the data fit into one of them). After the comments are coded into the categories, the gist would be extracted and, if possible, associated with the quantitative results of the survey. For example, do the comments offer any explanations of preferences or problems revealed in the quantitative data? The point is to ask qualitative questions if and only if you have the resources to read and digest the results and if your aims in conducting the survey are at least partly subjective and indicative, as opposed to precise and predictive.
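The manual coding approach described above still leaves some bookkeeping: confirming that every comment was assigned to exactly one category, then counting comments per category. The sketch below shows that step only. The reading and categorizing remain a human task; the codebook, comment identifiers, and function name are hypothetical.

```python
from collections import Counter

# Hypothetical codebook developed by staff after reading the comments.
CODEBOOK = {"hours", "collections", "staff", "other"}

def tally_codes(comment_ids, coded, codebook):
    """Count comments per category after manual coding.

    coded is a list of (comment_id, category) pairs assigned by a reader.
    Raises ValueError if a comment has more than one category (not
    mutually exclusive), an unknown category, or no category at all
    (not exhaustive).
    """
    seen = set()
    counts = Counter()
    for comment_id, category in coded:
        if category not in codebook:
            raise ValueError(f"unknown category: {category!r}")
        if comment_id in seen:
            raise ValueError(f"comment {comment_id} has more than one category")
        seen.add(comment_id)
        counts[category] += 1
    missing = set(comment_ids) - seen
    if missing:
        raise ValueError(f"uncoded comments: {sorted(missing)}")
    return counts

# Hypothetical coding of five comments.
comment_ids = [1, 2, 3, 4, 5]
coded = [(1, "hours"), (2, "hours"), (3, "collections"), (4, "staff"), (5, "hours")]

counts = tally_codes(comment_ids, coded, CODEBOOK)
for category, n in counts.most_common():
    print(category, n)
```

The resulting counts can then be set beside the quantitative results, for example to see whether comments about hours come from the same groups that rated hours poorly.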

2.1.5.4. Lack of Analysis or Application

Theoretically, the process is clear: prepare the survey, conduct the survey, analyze and interpret the results, decide how to apply them, and implement the plan. In reality, the process frequently breaks down after the survey is conducted, regardless of how carefully it was prepared or how many hundreds of respondents completed it. Many DLF respondents reported surveys whose results were never analyzed. Others reported that survey results were analyzed and recommendations made, but nothing happened after that. No one knew, or felt comfortable enough to mention, who dropped the ball. No one claimed that changes in personnel were instrumental in the failure to analyze or apply the survey results. Instead, they focused on the impact this has on the morale of library staff and users. Conducting research creates expectations; people expect results. Faculty members in particular are not likely to participate in library research studies if they never see results. Library staff members are unlikely to want to serve on committees or task forces formed to conduct studies if the results are never applied.

The problem could be loss of momentum and commitment, but it could also be lack of skill. Just as preparing survey questions requires specific skills, so too do analysis, interpretation, and application of survey results. Libraries appear to be slow in acquiring the skills needed to use survey data. The problem is exacerbated when survey results conflict with other data. For example, a DLF respondent reported that their survey data indicate that users do not want or need reference service, even though the number of questions being asked at the reference desk is increasing. Morale takes a hit if no concrete next steps can be formulated from survey results or if the data do not match known trends or anecdotal evidence. In such cases, the smaller the sample, the more likely the results will be dismissed.

2.1.5.5. Lack of Resources or Comprehensive Plans

Paper surveys distributed to a statistically significant sample of a large university community can cost more than $10,000 to prepare, conduct, and analyze. Many libraries cannot afford or choose not to make such an investment. Alternative formats and smaller samples seem to be the preferred approach; however, even these take a considerable amount of time. Furthermore, surveys often fail to provide enough information to enable planners to solve the problems that have been identified. Libraries might not have the human and financial resources to allocate to follow-up research, or they could simply have run out of momentum. The problem could also be a matter of planning. If the research process is not planned from conception through application of the results and follow-up testing, the process is likely to halt at the point where existing plans end.

2.2. Focus Groups

2.2.1. What Is a Focus Group?

A focus group is an exploratory, guided interview or interactive conversation among seven to ten participants with common interests or characteristics.³ The purpose of a focus group is to test hypotheses; reveal what beliefs the group holds about a particular product, service, or opportunity and why; or uncover detailed information about complex issues or behaviors from the group’s perspective. Focus group studies entail several such group conversations to identify trends and patterns in perception across groups. Careful analysis of the discussions reveals insights into how each group perceives the topic of discussion.

A focus group interview is typically one to two hours long. A trained moderator guides the conversation using five to ten predetermined questions or key issues prepared as an “interview guide.” The questions are open-ended and noncommittal. They are simply stated and carefully articulated. The questions are asked in a specific sequence, but there are no predetermined response categories. The moderator clarifies anything that participants do not understand. The moderator may also ask probing follow-up questions to identify concepts important to the participants, pursue interesting leads, and develop and test hypotheses. In addition to the moderator, one or two observers take detailed notes.

Focus group discussions are audio- or videotaped. Audiotape is less obtrusive and therefore less likely to intimidate the participants. Participants who feel comfortable are likely to talk more than those who are not; for this reason, audiotape and well-trained observers are often preferred to videotape. The observers’ notes should be so complete that they can substitute if the tape recorder does not work.

Focus groups are an effective and relatively easy way to gather insight into complex behavior and experience from the participants’ perspective. Because they can reveal how groups of people think and feel about a particular topic and why they hold certain opinions, they are good for detecting changes in behavior. Participant responses can not only indicate what is new but also distinguish trends from fads. Interactive discussion among the participants creates synergy and facilitates recall and insight. A few focus groups can be conducted at relatively low cost. Focus group research can inform the planning and design of new programs or services, provide a means for evaluating existing programs or services, and facilitate the development of strategies for improvement and outreach. Focus groups are also helpful as a prelude to survey or protocol research; they may be used to identify appropriate language, questions, or tasks, and as follow-up to survey or protocol research to get clarification or explanation of factors influencing survey responses or user behaviors. (Protocol research is discussed in section 2.3.)

The quality of the responses to focus group questions depends on how clearly the questions are asked, the moderator’s skills, and the participants’ understanding of the goals of the study and what is expected of them. A skilled moderator is critical to the success of a focus group. Moderators must quickly develop rapport with the participants, remain impartial, and keep the discussion moving and focused on the research objectives. They should have background knowledge of the discussion topic and must be able to restrain domineering individuals and bring everyone into the conversation. Before the focus group begins, the moderator should observe the participants and, if necessary, strategically seat extremely shy or domineering individuals. For example, outspoken, opinionated participants should be placed to the immediate left or right of the moderator, and quiet-spoken persons at some distance from them. This enables the moderator to shut out a domineering person simply by turning his or her torso away from the individual. Moderators and observers must avoid making gestures (for example, head nodding) or comments that could bias the results of the study.

Moderators must be carefully selected, because attitude, gender, age, ethnicity, race, religion, and even clothing can trigger stereotypical perceptions in focus group participants and bias the results of the study. If participants do not trust the moderator, are uncomfortable with the other participants, or are not convinced that the study or their role is important, they can give incomplete, inaccurate, or biased information. To facilitate discussion, reduce the risk of discomfort and intimidation, and increase the likelihood that participants will give detailed, accurate responses to the focus group questions, focus groups should be organized so that participants and, in some cases, the moderator are demographically similar.

The selection of demographic participant groupings and focus group moderator should be based on the research purpose, the sensitivity of the topic, and an understanding of the target population. For example, topics related to sexual behavior or preferences suggest conducting separate focus groups for males and females in similar age groups with a moderator of the same age and gender. When the topic is not sensitive and the population is diverse, the research purpose is sufficient to determine the demographic groupings for selecting participants. For example, three focus groups (one each for undergraduate students, graduate students, and faculty) could be used to test hypotheses about needs or expectations for library resources among these groups. Mixing students and faculty could intimidate undergraduates. Although homogeneity is important, focus group participants should be sufficiently diverse to allow for contrasting opinions. Ideally, the participants do not know one another. If they do, they tend to form small groups within the focus group, which makes the discussion harder for the moderator to manage.

The primary disadvantage of focus groups is that participants may give false information to please the moderator, stray from the topic, be influenced by peer pressure, or seek a consensus rather than explore ideas. A dominating or opinionated participant can make more reserved participants hesitant to talk, which could bias the results. In addition, data gathered in focus groups can be difficult to evaluate because such information can be chaotic, qualitative, or emotional rather than objective. The findings should be interpreted at the group level. The small number of participants and frequent use of convenience sampling severely limit the ability to generalize the results of focus groups, and the results cannot be generalized to groups with different demographic characteristics. However, the results are more intelligible and accessible to lay audiences and decision makers than are complex statistical analyses of survey data.

A final disadvantage of focus groups is that they rely heavily on the observational skills of the moderator and observer(s), who will not see or hear everything that happens, and will see or hear even less when they are tired or bored. How the moderators or observers interpret what they see and hear depends on their point of reference, cultural bias, experience, and expectations. Furthermore, observers adjust to conditions. They may eventually fail to recognize language or behaviors that become commonplace in a series of focus groups. In addition, human beings cannot observe something without changing it. The Heisenberg principle is often invoked, loosely, to express the idea that any attempt to get information out of a system changes it. In the context of human subjects research, this is called the Hawthorne or “guinea pig” effect. Being a research subject changes the subject’s behavior. Having multiple observers can compensate for many of these limitations and increase the accuracy of observational studies, but it can also further influence the behaviors observed. The best strategy is to articulate the specific behaviors or aspects of behavior to be observed before conducting the study. Deciding, on the basis of the research objectives, what to observe and how to record the observations, coupled with training the observers, facilitates systematic data gathering, analysis of the research findings, and the successful completion of observational studies.

2.2.2. Why Do Libraries Conduct Focus Groups?

More than half of the DLF respondents reported conducting focus groups. They chose to conduct focus groups rather than small, targeted surveys because focus groups offer the opportunity to ask for clarification and to hear participants converse about library topics. Libraries have conducted focus groups to assess what users do or want to do and to obtain information on the use, effectiveness, and usefulness of particular library collections, services, and tools. They have also conducted focus groups to verify or clarify the results from survey or user protocol research, to discover potential solutions to problems identified in previous research, and to help decide what questions to ask in a survey. One participant reported conducting focus groups to determine how to address practical and immediate concerns in implementing a grant-funded project.

Data gathered from focus groups are used to inform decision making, strategic planning, and resource allocation. Focus groups have the added benefit of providing good quotations that are effective in public relations publications and presentations or proposals to librarians, faculty, university administrators, and funders. Several DLF respondents observed that a few well-articulated comments from users in conjunction with quantitative data from surveys or transaction log analysis can help make a persuasive case for changing library practice, receiving additional funding, or developing new services or tools.

2.2.3. How Do Libraries Conduct Focus Groups?

DLF respondents reported conducting focus groups periodically. Questions asked in focus groups, unlike those included in surveys, are not repeated; they are not expected to serve as a basis for assessing trends over time. The decision to convene a focus group appears to be influenced by the organization of the library and the significance or financial implications of the decision to be informed by the focus group data. For example, in a library with an established usability program or embedded culture of assessment (including a budget and in-house expertise), a unit head can initiate focus group research. If the library must decide whether to purchase an expensive product or undertake a major project that will require the efforts of personnel throughout the organization, a larger group of people might be involved in sanctioning and planning the research and in approving the expenditure to conduct it.

Once the decision has been made to conduct focus groups, one or more librarians or staff members prepare the interview questions, identify the demographic groups appropriate for the research purpose, determine how many focus groups to conduct, decide how to recruit participants, and plan the budget and timetable for gathering, analyzing, interpreting, and applying the data.

Focus group questions should be pilot tested with a group of users and revised on the basis of the test results to solve problems with vocabulary, wording, or the sequence of questions, and to ensure that the questions can be discussed in the allotted time. However, few DLF respondents reported testing focus group questions. More likely, the questions are simply reviewed by other librarians and staff before conducting the study. Questions are omitted or reorganized during the initial focus group session, on the basis of time constraints and the flow of the conversation. The revised list of questions is used in subsequent focus groups.

DLF libraries have used e-mail, posters, and flyers to recruit participants for focus group studies. The invitations to prospective participants briefly describe the goals and significance of the study, the participants’ role in the study, what is expected of them, how long the groups will last, and any token of appreciation that will be given to the participants. Typically, focus groups are scheduled for 60 to 90 minutes. If food is provided during the focus group, a 90-minute session is preferred. When efforts fail to recruit at least six participants for a group, some libraries have conducted individual interviews with the people they did recruit.

In addition to preparing interview questions and recruiting and scheduling participants, focus group preparation entails the following:

Recruiting, scheduling, and training a moderator and observer(s) for each focus group
Scheduling six to twelve (preferably seven to ten) participants in designated demographic groups, and sending them a reminder a week or a few days before the focus group
Scheduling an appropriate room for each focus group. DLF respondents offered the following cautions:
- Make sure that the participants can easily find the room. Put up signs if necessary.
- Beware of construction or renovation nearby, the sound of heating or air-conditioning equipment, and regularly scheduled noise makers (for example, a university marching band practice on the lawn outside).
- Ensure that there are sufficient chairs in the room to comfortably seat the participants, moderator, and observer(s) around a conference table.
- If handouts are to be distributed, for example, for participants to comment on different interface designs, be sure that the table is large enough to spread out the documents.
Ordering food if applicable
Photocopying the focus group questions for the moderator and observer(s)
Testing the audio- or videotape equipment and purchasing tapes

The focus group moderator or an observer typically arrives at the room early, adjusts the light and temperature in the room, arranges the chairs, and retests and positions the recording equipment. If audiotape is used, a towel or tablet is placed under the recording device to absorb any table vibrations. When the participants arrive, the moderator thanks them for participating, introduces and explains the roles of moderator and observer, reiterates the purpose and significance of the research, confirms that their anonymity will be preserved in any discussion or publication of the study, and briefly describes the ground rules and how the focus group will be conducted. The introductory remarks emphasize that the goal of the study is not for the participants to reach consensus, but to express their opinions and share their experiences and concerns. Disagreement and discussion are invited. Sometimes the first question is asked round-robin, so that each participant responds and gets comfortable talking. Subsequent questions are answered less formally, more conversationally. The moderator asks the prepared questions and may ask undocumented, probing questions or invite further comments to better understand what the participants are saying and test relevant hypotheses that surface during the discussion, for example, “Would you explain that further?” or “Please give me an example.” The moderator uses verbal and body language to invite comments from shy or quiet participants and to discourage domineering individuals from turning dialogue into monologue. If participants ask questions unrelated to the research purpose, the moderator indicates that the question is outside the scope of the topic under discussion, but that he or she will be happy to answer it after the focus group is completed. Observers have no speaking roles.

When the focus group is over, the moderator thanks the participants and might give them a token of appreciation for their participation. The moderator may also answer any questions the participants have about the study, the service or product that was the focus of the study, or the library in general. Observer notes and tapes are labeled immediately with the date and number of the session.

Libraries might or might not transcribe the focus group tapes. Some libraries believe the cost of transcribing exceeds the benefits of having a full transcription. One DLF respondent explained that clerical help is typically unfamiliar with the vocabulary or acronyms used by focus group participants and therefore cannot accurately transcribe the tapes. This means that a professional must also listen to the tapes and correct the transcriptions, which significantly increases the cost of the study. When the tapes are transcribed, a few libraries have used content analysis software to analyze the transcriptions, but they have not been pleased with the results, perhaps because the software attempts to conduct a quantitative analysis of qualitative data. Even when the tapes are not transcribed, at least one person listens to them carefully and annotates the notes taken by observers.

Analysis of focus group data is driven by the research purpose. Ideally, at least two people (the moderator and an observer) analyze the data, and there is high interrater reliability. With one exception, DLF respondents did not discuss the process of analyzing focus group data in detail. They talked primarily about their research purpose, what they learned, and how they applied the results. Participants who mentioned a specific method of data analysis named content analysis, but they neither described how they went about it nor specified who analyzed the data. No one offered an interrater reliability factor. Only one person provided details about the data analysis and interpretation. This person explained that the moderator analyzed the focus group data by using content analysis to cluster similar concepts, examining the context in which these concepts occurred, looking for changes in the focus group participants’ position based on the discussion, weighting responses based on the specificity of the participants’ experience, and looking for trends or ideas that cut across one or more focus group discussions. The overall impression from the DLF survey is that focus group data are somehow examined by question and user group to identify issues, problems, preferences, priorities, and concepts that surface in the data. The analyst prepares a written summary of significant findings from each focus group session, with illustrative examples or quotations from the raw data. The summaries are examined to discern significant differences among the groups or to determine whether the data support or do not support hypotheses being tested.
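
No DLF respondent reported an interrater reliability figure, so the following is only an illustration of how one might be computed. It is a minimal sketch of Cohen's kappa, one common statistic for agreement between two analysts; the function name, the category labels, and the sample codes are hypothetical and do not come from the DLF survey.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two analysts who each assigned one category
    to the same sequence of comments (hypothetical helper)."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both analysts must code the same non-empty set of comments")
    n = len(codes_a)
    # Share of comments on which the two analysts chose the same category.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each analyst's category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both analysts used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six focus group comments.
moderator = ["navigation", "vocabulary", "navigation",
             "instruction", "vocabulary", "navigation"]
observer = ["navigation", "vocabulary", "instruction",
            "instruction", "vocabulary", "navigation"]

print(f"kappa = {cohens_kappa(moderator, observer):.2f}")
```

A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance. What counts as "high" is a judgment each study has to make for itself.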

2.2.4. Who Uses Focus Group Results? How Are They Used?

Decisions as to who applies the results of focus group research and how it is applied depend on the purpose of the research, the significance of the findings, and the organization of the library. For example, the results of focus groups conducted to inform redesign of the library Web site were presented to the Web Redesign Committee. The results of focus groups conducted to assess the need for and use of electronic resources were presented to the Digital Library Initiatives Department. The larger the study, the more attention it seems to draw. Striking or significant results come to the attention of library administrators, especially if potential next steps have financial or operational implications or require interdepartmental cooperation. For example, if the focus group results indicate that customer service training is required or that facilities must be improved to increase user satisfaction, the administrator should be informed. Focus groups provide excellent quotations in support of cases being presented to university administrators, faculty senates, and deans’ councils to gain support for changing library directions or receiving additional funding. The results are also presented at conferences and published in the library literature.

The results of the DLF study indicate that focus group data have been used to

Clarify or explain factors influencing survey responses, for example, to discover reasons for undergraduate students’ declining satisfaction with the library
Determine questions to ask in survey questionnaires, tasks to be performed in protocols, and the vocabulary to use in these instruments
Identify user problems and preferences related to collection format and system design and functionality
Confirm hypotheses that user expectations and perceived needs for a library Web site differ across discipline and user status
Confirm user needs for more and better library instruction
Confirm that faculty are concerned that students cannot judge the quality of resources available on the Web and do not appreciate the role of librarians in selecting quality materials
Target areas for fundraising
Identify ways to address concerns in grant-funded projects

In addition, results from focus group research have been used to inform processes that resulted in

Canceling journal subscriptions
Providing needed information to faculty
Redesigning the library Web site, OPAC, or other user interface
Providing personalized Web pages for library users
Sending librarians and staff to customer service training
Eliminating a high-maintenance method of access to e-journals
Planning the direction and development priorities for the digital library, including the scope, design, and functionality of digital library services
Planning and allocating resources to market library collections and services continuously
Creating a Distance Education Department to integrate distance learning with library services
Renovating library facilities

2.2.5. What Are the Issues, Problems, and Challenges with Focus Groups?

2.2.5.1. Unskilled Moderators and Observers

If the moderator of a focus group is not well trained or has a vested interest in the research results, the discussion can easily go astray. Without proper facilitation, some individuals can dominate the conversation, while others may not get the opportunity to share their views. Faculty in particular can be problematic subjects. They frequently have their own agendas and will not directly answer the focus group questions. A skilled, objective moderator equipped with the rhetorical strategies and ability to keep the discussion on track, curtail domineering or rambling individuals, and bring in reticent participants is a basic requirement for a successful focus group.

Similarly, poor observer notes can hinder the success of a focus group. If observers do not know what comments or behaviors to observe and record, the data will be difficult, if not impossible, to analyze and interpret. The situation worsens if several observers attend different focus group sessions and record different kinds of things. Decisions should be made before conducting the focus groups to ensure that similar behaviors are observed and recorded during each focus group session. The following list can serve as a starting point for this discussion (Marczak and Sewell).

Characteristics of the focus group participants
Descriptive phrases or words used by participants in response to the key questions
Themes in the responses to the key questions
Subthemes held by participants with common characteristics
Indications of participant enthusiasm or lack of enthusiasm
Consistency or inconsistency between participant comments and observed behaviors
Body language
The mood of the discussion
Suggestions for revising, eliminating, or adding questions in the future

2.2.5.2. Interpreting and Using the Data

A shared system of categories for recording observations will simplify the analysis and interpretation of focus group data. No DLF respondent mentioned establishing such a system before conducting a focus group study. Imposing a system after the data have been gathered significantly complicates interpreting the findings. The difficulty of interpreting qualitative data from a focus group study can lead to disagreement about the interpretation and delay preparation of the results. The limited number of participants in a typical focus group study, and the degree to which they are perceived to be representative of the target population, exacerbate the difficulty of interpreting and applying the results. The greater the time lapse between gathering the data and developing plans to use the data, the greater the risk of loss of momentum and abandonment of the study. The results of the DLF study suggest that the problem worsens if the results are presented to a large group within the library and if the recommended next steps are unpopular with or counterintuitive to librarians.
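
As an illustration of what a shared system of categories might look like when it is fixed before the first session, the sketch below defines a small coding scheme and tallies observer notes against it. The category names loosely follow the list in section 2.2.5.1; the session labels and notes are hypothetical, and nothing here reflects a scheme actually used by a DLF library.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """Hypothetical coding scheme agreed on before any session is held."""
    DESCRIPTIVE_PHRASE = "descriptive phrase"
    THEME = "theme"
    SUBTHEME = "subtheme"
    ENTHUSIASM = "enthusiasm or lack of it"
    INCONSISTENCY = "comment/behavior inconsistency"
    BODY_LANGUAGE = "body language"
    MOOD = "mood of the discussion"
    QUESTION_REVISION = "suggested question revision"


def tally(observations):
    """Count observations per session and category.

    `observations` is an iterable of (session, Category, note) tuples.
    Notes coded with anything other than a Category member are rejected,
    which keeps every observer on the same scheme.
    """
    counts = {}
    for session, category, _note in observations:
        if not isinstance(category, Category):
            raise ValueError(f"uncoded observation in session {session!r}")
        counts.setdefault(session, Counter())[category] += 1
    return counts


# Hypothetical observer notes from two sessions.
notes = [
    ("undergrad-1", Category.THEME, "cannot tell which database to start with"),
    ("undergrad-1", Category.THEME, "prefers a single search box"),
    ("undergrad-1", Category.BODY_LANGUAGE, "nodding when remote access came up"),
    ("faculty-1", Category.THEME, "worried about quality of Web sources"),
    ("faculty-1", Category.QUESTION_REVISION, "question 3 read as two questions"),
]

for session, counter in tally(notes).items():
    print(session, {c.value: k for c, k in counter.items()})
```

Because every observer records against the same categories, notes from different sessions can be compared directly rather than reconciled after the fact.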

2.3. User Protocols

2.3.1. What Is a User Protocol?

A user protocol is a structured, exploratory observation of clearly defined aspects of the behavior of an individual performing one or more designated tasks. The purpose of the protocol is to gather in-depth insight into the behavior and experience of a person using a particular tool or product. User protocol studies include multiple research subjects to identify trends or patterns of behavior and experience. Data gathered from protocols provide insight into what different individuals do or want to do to perform specific tasks.

Protocol studies usually take 60 to 90 minutes per participant. The protocol is guided by a list of five to ten tasks (the “task script”) that individuals are expected to perform. Each participant is asked to think aloud while performing the designated tasks. The task script tells the user what tasks to accomplish (for example, “Find all the books in the library catalog published by author Walter J. Ong before 1970”), but not how to accomplish them using the particular tool or product involved in the study. Discovering whether or how participants accomplish the task is a typical goal of protocol research. A facilitator encourages the participants to think aloud if they fall silent. The facilitator may clarify what task is to be performed, but not how to perform it.

The participant’s think-aloud protocol is audio- or videotaped, and one or two observers take notes of his or her behavior. Some researchers prefer audiotape because it is less obtrusive. Experts in human-computer interaction (HCI) prefer videotape. In HCI studies, software can be used to capture participant keystrokes.

Protocols are very strict about the observational data to be collected. Before the study, the protocol author designates the specific user comments, actions, and other behaviors that observers are to record. The observers’ notes should be so complete that they can substitute for the audiotape, should the system fail. In HCI studies, observer notes should capture the participant’s body language, selections from software menus or Web pages, what the user apparently does or does not see or understand in the user interface, and, depending on the research goals, the speed and success (or failure) of task completion. Employing observers who understand heuristic principles of good design facilitates understanding the problems users encounter, and therefore the recording of what is observed and interpretation of the data.

User protocols are an effective method to identify usability problems in the design of a particular product or tool, and often the data provide sufficient information to enable the problems identified to be solved. These protocols are less useful to identify what works especially well in a design. Protocols can reveal the participant’s mental model of a task or the tool that he or she is using to perform the task. Protocols enable the behavior to be recorded as it occurs and do not rely on the participants’ memories of their behaviors, which can be faulty. Protocols provide accurate descriptions of situations and, unlike surveys, can be used to test causal hypotheses. Protocols also provide insights that can be tested with other research methods and supplementary data to qualify or help interpret data from other studies.

For protocols to be effective, participants must understand the goals of the study, appreciate their role in the study, and know what is expected of them. The selection of participants should be based on the research purpose and an understanding of the target population. Facilitators and observers must be impartial and refrain from providing assistance to struggling or frustrated participants. However, a limit can be set on how much time participants may spend trying to complete a task, and facilitators can encourage participants to move to the next task if the time limit is exceeded. Without a time limit, participants can become so frustrated trying to complete a task that they abandon the study. In HCI studies, it is essential that the participants understand it is the software that is being tested, not their skill in using it.

The primary disadvantage of user protocols is that they are expensive. Protocols require at least an hour per participant, and the results apply only to the particular product or tool being tested. In addition, protocol data can be difficult to evaluate, depending on whether the research focuses on gathering qualitative information (for example, the level of participant frustration) or quantitative metrics (for example, success rate and speed of completion). The small number of participants and frequent use of convenience sampling limit the ability to generalize the results of protocol studies to groups with different demographic characteristics or to other products or tools. Furthermore, protocols suffer from the built-in limitations of human sensory perception and language, which affect what the facilitator and observer(s) see and hear and how they interpret and record it.
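
The quantitative metrics mentioned above (success rate and speed of completion) can be derived from simple per-task records. The following is a minimal sketch under assumed conventions: the record fields, task names, and timings are hypothetical, and speed is summarized as the median time over successful attempts only, which is one possible choice among several.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    """One participant's attempt at one task in the task script (hypothetical record)."""
    participant: str
    task: str
    completed: bool
    seconds: float  # time spent, whether or not the task was completed


def summarize(attempts):
    """Return success rate and median completion time for each task."""
    by_task = {}
    for attempt in attempts:
        by_task.setdefault(attempt.task, []).append(attempt)
    summary = {}
    for task, rows in by_task.items():
        # Only successful attempts contribute to the speed figure.
        times = [r.seconds for r in rows if r.completed]
        summary[task] = {
            "attempts": len(rows),
            "success_rate": sum(r.completed for r in rows) / len(rows),
            "median_seconds": median(times) if times else None,
        }
    return summary


# Hypothetical observations; 300 seconds is an assumed per-task time limit.
attempts = [
    TaskAttempt("P1", "find-ong-books", True, 95),
    TaskAttempt("P2", "find-ong-books", False, 300),
    TaskAttempt("P3", "find-ong-books", True, 140),
    TaskAttempt("P4", "find-ong-books", True, 120),
    TaskAttempt("P1", "renew-loan", False, 300),
    TaskAttempt("P2", "renew-loan", False, 300),
]

for task, stats in summarize(attempts).items():
    print(task, stats)
```

With so few participants, these figures describe the sessions observed and should not be read as estimates for the wider user population.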

2.3.2. Why Do Libraries Conduct User Protocols?

Half of the DLF respondents reported conducting or planning to conduct user protocols. With rare exception, libraries appear to view think-aloud protocols as the premier research method for assessing the usability of OPACs, Web pages, local digital collections, and vendor products. Protocol studies are often precipitated or informed by the results of previous research. For example, focus groups, surveys, and heuristic evaluations can identify frequently performed or suspected problematic tasks to be included in protocol research. (Heuristic evaluations are discussed in section 2.4.1.1.)

Libraries participating in the DLF study have conducted think-aloud protocols to

Identify problems in the design, functionality, navigation, and vocabulary of the library Web site or user interfaces to different products or digital collections
Assess whether efforts to improve service quality were successful
Determine what information to include in a Frequently Asked Questions (FAQ) database and the design of access points for the database

One DLF respondent reported plans to conduct a protocol study of remote storage robotics.

2.3.3. How Do Libraries Conduct User Protocols?

DLF respondents reported conducting user protocols when the results of previous research or substantial anecdotal evidence indicated that there were serious problems with a user interface or when a user interface was being developed as part of a grant-funded project, in which case the protocol study is described in the grant proposal. When protocols are conducted to identify problems in a user interface, often they are repeated later, to see whether the problems were solved in the meantime. In the absence of an established usability-testing program and budget, the decision to conduct protocols can involve a large group of people because of the time and expense of conducting such research.

After the decision has been made to conduct user protocols, one or more librarians or staff members prepare the task script, choose the sampling method, identify the demographic groups appropriate for the research purpose, determine how many participants to recruit in each group, decide how to recruit them, recruit and schedule the participants, and plan the budget and timetable for gathering, analyzing, interpreting, and applying the data. Jakob Nielsen’s research has shown that four to six subjects per demographic group are sufficient to capture most of the information that could be discovered by involving more subjects. Beyond this number, the cost exceeds the benefits of conducting more protocols (Nielsen 2000). Sometimes protocols are conducted with only two or three subjects per user group because of the difficulty of recruiting research subjects.
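
The diminishing return behind the four-to-six figure can be illustrated with the problem-discovery model that Nielsen describes, in which the share of usability problems found by n participants is 1 - (1 - L)^n. The sketch below assumes L = 0.31, the average per-participant discovery rate Nielsen reports across the projects he studied; the rate for a given library interface may well differ, and the function name is hypothetical.

```python
def share_of_problems_found(participants, per_user_rate=0.31):
    """Expected share of usability problems found by `participants` users,
    assuming each user independently reveals a problem with probability
    `per_user_rate` (0.31 is Nielsen's reported average, assumed here)."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    if not 0 <= per_user_rate <= 1:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1 - (1 - per_user_rate) ** participants


# Show how little each additional participant adds after the first few.
previous = 0.0
for n in range(1, 9):
    found = share_of_problems_found(n)
    print(f"{n} participants: {found:.0%} found, +{found - previous:.0%} from the last one")
    previous = found
```

Under these assumptions the curve flattens after about five participants, which is consistent with the observation that the cost of further sessions soon exceeds the benefit.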

DLF libraries immediately follow user protocol sessions with a brief survey or interview to gather additional information from each participant. This information helps clarify the user’s behavior and provides some sense of the user’s perception of the severity of the problems encountered with the user interface. One or more people prepare the survey or interview questions. In addition, some libraries prepare a recording sheet that observers use to structure their observations and simplify data analysis. Some also prepare a written facilitator guide that outlines the entire session.

DLF libraries pilot test the research instruments with at least one user and revise them on the basis of the test results. Pilot testing can help solve problems with the vocabulary, wording, or sequencing of protocol tasks or survey questions; it also can target ways to refine the recording sheet to facilitate rapid recording of observations. Pilot testing also enables the researcher to ensure that the protocol and follow-up research can be completed in the time allotted.

DLF libraries have used e-mail, posters, and flyers to recruit participants for user protocol studies. The recruitment information briefly describes the goals and significance of the research, the participants’ role, and what is expected of them, including the time it will take to participate and any token of appreciation that will be given to the participants. Other than preparing the instruments and recruiting participants, preparation for a user protocol study closely resembles preparation for a focus group. It involves the following steps:

Recruiting, scheduling, and training a facilitator and one or more observers; in some cases, the facilitator is the sole observer
Scheduling the participants and sending them a reminder a week or a few days before the protocol
Scheduling a quiet room; protocol studies have been conducted in offices, laboratories, or library settings.
If necessary, ordering computer or videotape equipment to be delivered a half hour before the protocol is to begin
Photocopying the research instruments
Testing the audio- or videotape equipment and purchasing tapes

The facilitator or an observer arrives at the room early, adjusts the light and temperature in the room, arranges the chairs so that the facilitator and observers can see the user’s face and the computer screen, and tests and positions the recording equipment. If audiotape is used, a towel or tablet is placed under the recording device to absorb any table vibrations. The audiotape recorder is positioned close enough to the user to pick up his or her comments, but far enough away from the keyboard to avoid capturing each key click. If computer or videotape equipment must be delivered to the room, someone must arrive at the room extra early to confirm delivery, be prepared to call if it is not delivered, test the computer equipment, and allow time for replacement or software reinstallation if something is not working.

Though HCI experts recommend videotape, all but one of the DLF libraries reported using audiotape to record user protocols. The library that used videotape observed that the camera made users uncomfortable and the computer screen did not record well, so the group used audiotape instead for the follow-up protocols. Few DLF libraries have the resources or facilities to videotape their research, and the added expense of acquiring these might also be a deterrent to using videotape.

When participants arrive, the facilitator thanks them for participating, explains the roles of facilitator and observer(s), reiterates the purpose and significance of the research, confirms that anonymity will be preserved in any discussion or publication of the study, and describes the ground rules and how the protocol will be conducted. The facilitator emphasizes that the goal of the study is to test the software, not the user. The facilitator usually reminds participants multiple times to think aloud, using prompts such as “What are you thinking now?” or “Please share your thoughts.” Observers have no speaking role.

DLF libraries immediately followed protocol sessions with brief interviews or a short survey to capture additional information and give participants the opportunity to clarify what they did in the protocol, describe their experience, and articulate expectations they had about the task or the user interface that were not met. Protocol research is sometimes followed up with focus groups or surveys to confirm the findings with a larger sample of the target population.

When the protocol is over, the facilitator thanks the participant and usually gives him or her a token of appreciation. The facilitator also answers any questions the participant has. Observer notes and tapes are labeled immediately.

DLF libraries might or might not transcribe protocol tapes for the same reasons they do or do not transcribe focus group tapes. If the tapes are not transcribed, at least one person listens to them and annotates the observer notes. With two exceptions, DLF respondents did not discuss the process of analyzing, interpreting, and figuring out how to apply the protocol results, although several did mention using quantitative metrics. They simply talked about significant applications of the results. The two cases that outlined procedures for analyzing, interpreting, and applying results merit examination:

Case one: The group responsible for conducting the protocol study created a table of observations (based on the protocol data), interpretations, and accompanying recommendations for interface redesign. The recommendations were based on the protocol data and the application of Jakob Nielsen’s 10 heuristic principles of good user interface design (Nielsen, no date). The group assessed how easy or difficult it would be to implement each recommendation and plotted a continuum of recommendations based on the difficulty, cost, and benefit of implementing them. The cost-effective recommendations were implemented.
Case two: When protocol data identified many problems and yielded a high failure rate for task completion, the group responsible for the study did the following:
- Determined the severity of each problem on the basis of its frequency and distribution across users, whether it prevented users from successfully completing a task, and the user’s assessment of the severity of the problem, which was gathered in a follow-up survey.
- Formulated alternative potential solutions to the most severe problems on the basis of the protocol or follow-up survey data and heuristic principles of good design.
- Winnowed the list of possible solutions by consulting programmers and doing a quick-and-dirty cost-benefit analysis. Problems that can be fixed at the interface level are often less expensive to fix than those that require changes in the infrastructure.
- Recommended implementing the solutions believed to have the greatest benefit to users for the least amount of effort and expense.
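
The triage described in case two can be sketched in code. The Python below is illustrative only: the field names, the equal weighting of the three severity factors, the 0-to-1 user rating scale, the severity cutoff, the cost units, and the sample problems are all assumptions made for the example, not details reported by the DLF respondent.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    users_affected: int   # participants who encountered the problem
    blocked_task: bool    # did it prevent task completion?
    user_rating: float    # user-assessed severity from the follow-up survey (assumed 0-1 scale)
    fix_cost: float       # rough effort estimate from programmers (assumed units)


def severity(p: Problem, total_users: int) -> float:
    """Combine frequency, task blocking, and user rating with equal weights (an assumption)."""
    frequency = p.users_affected / total_users
    blocking = 1.0 if p.blocked_task else 0.0
    return (frequency + blocking + p.user_rating) / 3


def prioritize(problems, total_users, min_severity=0.5):
    """Keep the most severe problems, then order them by severity per unit of cost."""
    severe = [p for p in problems if severity(p, total_users) >= min_severity]
    return sorted(severe, key=lambda p: severity(p, total_users) / p.fix_cost, reverse=True)


# Hypothetical data from a ten-participant protocol study
problems = [
    Problem("Ambiguous link label", users_affected=8, blocked_task=True, user_rating=0.9, fix_cost=1),
    Problem("Search index misses titles", users_affected=6, blocked_task=True, user_rating=0.8, fix_cost=8),
    Problem("Small font in footer", users_affected=2, blocked_task=False, user_rating=0.2, fix_cost=1),
]
ranked = prioritize(problems, total_users=10)
for p in ranked:
    print(p.name, round(severity(p, 10), 2))
```

In this sketch the interface-level fix (the link label) ranks ahead of the infrastructure-level fix (the search index) because its estimated cost is lower, which mirrors the observation that interface problems are often less expensive to fix.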

The procedures in the two cases are similar, and although the other DLF respondents did not describe the process they followed, it could be that their processes resemble these. At least one other respondent reported ranking the severity of problems identified by protocol analysis to determine which problems to try to solve.

2.3.4. Who Uses Protocol Results? How Are They Used?

The results of the study suggest that who applies the results from user protocols and how the results are applied depend on the purpose of the research, the significance of the findings, and the organization of the library. The larger the study and the more striking its implications for financial and human resources, the more attention it draws in the library. Although the results of protocol studies are not always presented to university administrators, faculty senates, deans’ councils, and similar groups, they might be presented at conferences and published in the library literature.

DLF libraries have used significant findings from protocol analysis to inform processes that resulted in the following:

Customizing the OPAC interface, or redesigning the library Web site or user interfaces to local digital collections. Examples of steps taken based on protocol results include
- rearranging a hierarchy
- changing the order and presentation of search results
- changing the vocabulary, placement of links, or page layout
- providing more online help, on-screen instructions, or suggestions when searches fail
- changing the labeling of images
- changing how to select a database or start a new search
- improving navigation
- enhancing functionality
Revising the metadata classification scheme for image or text collections
Developing or revising instruction for how to find resources on the library Web site and how to use full-text e-resources and archival finding aids

The results of protocol studies have also been used to suggest revisions or enhancements to vendor products, to verify improvements in interface design and functionality, and to counter anecdotal evidence or suggestions that an interface should be changed.

2.3.5. What Are the Issues, Problems, and Challenges With User Protocols?

2.3.5.1. Librarian Assumptions and Preferences

Several DLF respondents commented that librarians can find it difficult to observe user protocols because they often have assumptions about user behavior or preferences for interface design that are challenged by what they witness. Watching struggling or frustrated participants and refraining from providing assistance run counter to the librarians’ service orientation. Participants often ask questions during the protocol about the software, the user interface, or how to use it. Facilitators and observers must resist providing answers during the protocol. Librarians who are unable to do this circumvent the purpose of the research.

Librarians can also be a problem when it comes to interpreting and applying the results of user protocols. Those trained in social science research methods often do not understand or appreciate the difference between HCI user protocols and more rigorous statistical research. They may dismiss results that challenge their own way of thinking because they believe the research method is not scientific enough or the pool of participants is too small.

2.3.5.2. Lack of Resources and Commitment

User protocols require skilled facilitators, observers, and analysts and the commitment of human and financial resources. Requisite skills might be lacking to analyze, interpret, and persuasively present the findings. Even if the skills are available, there could be a breakdown in the processes of collecting, analyzing, and interpreting the data, planning how to use the findings, and implementing the plans, which could include conducting follow-up research to gather more information. Often the process is followed to the last stage, implementation, where Web masters, programmers, systems specialists, or other personnel are needed. These people can have other priorities. Human and financial resources or momentum can be depleted before all the serious problems identified have been solved. Limited resources frequently restrict implementation to only the problems that are cheap and easy to fix, which are typically those that appear on the surface of the user interface. Problems that must be addressed in the underlying architecture often are not addressed.

2.3.5.3. Interpreting and Using the Data

Effective, efficient analysis of data gathered in user protocols depends on making key decisions ahead of time about what behaviors to observe and how to record them. For example, if quantitative usability metrics are to be used, they must be carefully defined. If the success rate is to be calculated, what constitutes success? Is it more than simply completing a task within a set time limit? What constitutes partial success, and how is it to be calculated? Similar questions should be posed and answers devised for qualitative data gathering during the protocols. Otherwise, observer notes are chaotic and data analysis may be as difficult as analyzing the responses to open-ended questions in a survey. The situation worsens if different observers attend different protocols and record different kinds of things. Such key decisions should be made prior to conducting the study. If made afterward, they can result in significant lag time between data gathering and the presentation of plans to apply the results of the data analysis. The greater the lag time, the greater the risk of loss of momentum, which can jeopardize the entire effort.
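
To illustrate what defining a metric ahead of time might look like, the following Python sketch fixes one possible definition of success, partial success, and failure before any data are gathered. The definitions are assumptions chosen for the example (success means completing the task within the time limit, partial success means completing it late, and partial success earns half credit); a real study would substitute its own rules.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


# Assumed scoring rule: a partial success counts as half a success.
PARTIAL_CREDIT = 0.5


def classify(completed: bool, seconds: float, time_limit: float) -> Outcome:
    """Hypothetical definition agreed on before the study begins."""
    if not completed:
        return Outcome.FAILURE
    if seconds <= time_limit:
        return Outcome.SUCCESS
    return Outcome.PARTIAL  # completed, but over the time limit


def success_rate(outcomes) -> float:
    """Average credit across all task attempts."""
    credit = {Outcome.SUCCESS: 1.0, Outcome.PARTIAL: PARTIAL_CREDIT, Outcome.FAILURE: 0.0}
    return sum(credit[o] for o in outcomes) / len(outcomes)


# Hypothetical observer records: (task completed?, seconds taken)
sessions = [(True, 95), (True, 200), (True, 80), (False, 300)]
outcomes = [classify(done, secs, time_limit=120) for done, secs in sessions]
rate = success_rate(outcomes)
print(f"{rate:.3f}")
```

Writing the rule down in this form forces every observer to record the same two facts for each task (completion and elapsed time), which addresses the problem of different observers recording different kinds of things.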

2.3.5.4. Recruiting Participants Who Can Think Aloud

General problems and strategies for recruiting research subjects are discussed in section 4.2.1. DLF respondents reported difficulty in getting participants to think aloud. At least one librarian is considering conducting screening tests to ensure that protocol participants can think aloud. Enhancing the skills of the facilitator (through training or experience) and including a pretest task or two for the participants to get comfortable thinking aloud would be preferable to risking biasing the results of the study by recruiting only participants who are naturally comfortable thinking aloud.

2.4. Other Effective Research Methods

2.4.1. Discount Usability Research Methods

Discount usability research can be conducted to supplement more expensive usability studies. This informal research can be done at any point in the development cycle, but is most beneficial in the early stages of designing a user interface or Web site. When done at this time, the results of discount usability research can solve many problems and increase the efficiency of more formal testing by targeting specific issues and reducing the volume of data gathered. Discount usability research methods are not replacements for formal testing with users, but they are fruitful, inexpensive ways to improve interface design. In spite of these merits, few DLF libraries reported using discount methods. Are leading digital libraries not using these research methods because they are unaware of them or because they do not have the skills to use them?

2.4.1.1. Heuristic Evaluations

Heuristic evaluation is a critical inspection of a user interface conducted by applying a set of design principles as part of an iterative design process.⁴ The principles are not a checklist, but conceptual categories or rules that describe common properties of a usable interface and guide close scrutiny of an interface to identify where it does not comply with the rules. Several DLF respondents referred to Nielsen’s heuristic principles of good design, mentioning the following:

Visibility of system status
Match between system and real world
User control and freedom
Consistency and standards
Recognition rather than recall
Flexibility and efficiency of use
Aesthetics and minimalist design
Error prevention
Assistance with recognizing, diagnosing, and recovering from errors
Help and documentation⁵

Heuristic evaluations can be conducted before or after formal usability studies involving users. They can be conducted with functioning interfaces or with paper prototypes (see section 2.4.1.2.). Applying heuristic principles to a user interface requires skilled evaluators. Nielsen recommends using three to five evaluators, including someone with design expertise and someone with expertise in the domain of the system being evaluated. According to his research, a single evaluator can identify 35 percent of the design problems in the user interface. Five evaluators can find 75 percent of the problems. Using more than five evaluators can find more problems, but at this point the cost exceeds the benefits (Nielsen 1994).

Heuristic evaluations take one to two hours per evaluator. The evaluators should work independently but share their results. An evaluator can record his or her own observations, or an observer may record the observations made by the evaluator. Evaluators follow a list of tasks that, unlike the task script in a user protocol, may indicate how to perform the tasks. The outcome from a heuristic evaluation is a compiled list of each evaluator’s observations of instances where the user interface does not comply with good design principles. To guide formulating solutions to the problems, each problem identified is accompanied by a list of the design principles that are violated in this area of the user interface.
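
As a rough illustration of the compiled list described above, the Python sketch below merges observations recorded independently by several evaluators into one entry per interface area, each accompanied by the principles violated there. The record layout, evaluator names, and sample observations are hypothetical; only the principle names come from the list above.

```python
from collections import defaultdict

# Hypothetical observations: (evaluator, interface area, violated principle)
observations = [
    ("evaluator_a", "search results page", "Consistency and standards"),
    ("evaluator_b", "search results page", "Visibility of system status"),
    ("evaluator_b", "search results page", "Consistency and standards"),
    ("evaluator_c", "login form", "Error prevention"),
]


def compile_findings(observations):
    """Merge independent evaluators' notes into one entry per interface area."""
    findings = defaultdict(lambda: {"principles": set(), "evaluators": set()})
    for evaluator, area, principle in observations:
        findings[area]["principles"].add(principle)
        findings[area]["evaluators"].add(evaluator)
    return findings


findings = compile_findings(observations)
for area, entry in sorted(findings.items()):
    print(area, sorted(entry["principles"]), len(entry["evaluators"]))
```

Counting how many evaluators flagged each area is an addition made for this sketch; it gives a crude signal of which problems were noticed independently by more than one person.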

Heuristic evaluations have several advantages over other methods for studying user interfaces. No participants need to be recruited. The method is inexpensive, and applying even a few principles can yield significant results. The results can be used to expand or clarify the list of principles. Furthermore, heuristic evaluations are more comprehensive than think-aloud protocols are, because they can examine the entire interface and because even the most talkative participant will not comment on every facet of the interface. The disadvantages of heuristic evaluations are that they require familiarity with good design principles and interpretation by an evaluator, do not provide solutions to the problems they identify, and do not identify mismatches between the user interface and user expectations. Interface developers sometimes reject the results of heuristic evaluations because no users were involved.

A few DLF libraries have conducted their own heuristic evaluations or have made arrangements with commercial firms or graduate students to do them. The evaluations were conducted to assess the user-friendliness of commercially licensed products, the library Web site, and a library OPAC. In the process, libraries have analyzed such details as the number of keystrokes and mouse movements required to accomplish tasks and the size of buttons and links that users must click. The results of these evaluations were referred to as a “wake-up call” to improve customer service. It is unclear from the survey whether multiple evaluators were used in these studies or the study was conducted in-house, and whether the libraries have the interface design expertise to apply heuristic principles or conduct a heuristic evaluation effectively. Nevertheless, several DLF libraries reported using heuristic principles to guide redesign of a user interface.

2.4.1.2. Paper Prototypes and Scenarios

Paper prototype and scenario research resembles think-aloud protocols, but instead of having users perform tasks with a functioning system, this method employs sketches, screen prints, or plain text and asks users how they would use a prototype interface to perform different tasks or how they would interpret the vocabulary. For example, where would they click to find a feature or information? What does a link label mean? Where should links be placed? Paper prototypes and scenarios can also be a basis for heuristic evaluations.

Paper prototype and scenario research is portable, inexpensive, and easy to assemble, provided that the interface is not too complicated. Paper prototypes do not intimidate users. If the method is used early in the development cycle, the problems identified can be rectified easily because the system has not been fully implemented. Paper prototypes are more effective than surveys at identifying usability, navigation, functionality, and vocabulary problems. The disadvantage is that participants interact with paper interfaces differently than they do with on-screen interfaces; that is, paper gets closer scrutiny.

A few DLF respondents reported using paper prototype research. They have used it successfully to evaluate link and button labels and to inform the design of Web sites, digital collection interfaces, and classification (metadata) schemes. One library used scenarios of horizontal paper prototypes, which provide a conceptual map of the entire surface layer of a user interface, and scenarios of vertical paper prototypes, which cover the full scope of a feature, such as searching or browsing. This site experimented with using Post-it™ notes to display menu selections in a paper prototype study, and accordion-folded papers to imitate pages that would require scrolling. The notes were effective, but the accordion folds were awkward.

2.4.2. Card-Sorting Tests

Vocabulary problems can arise in any user study, and they are often rampant in library Web sites. A few respondents reported conducting research specifically designed to target or solve vocabulary problems, including card-sorting studies to determine link labels and appropriate groupings of links on their Web sites. Card-sorting studies entail asking individual users to

Organize note cards containing service or collection descriptions into stacks of related information
Label the stacks of related information
Label the service and collection descriptions in each stack

Reverse card-sorting exercises have been used to test the labels. These exercises ask users what category (label) they would use to find which service or collection. Alternatively, the researcher can simply ask users what they would expect to find in each category, then show them what is in each category and ask them what they would call the category.
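
The DLF respondents did not describe how they analyzed card-sorting results. One simple approach, offered here only as a sketch, is to count how often each pair of cards ends up in the same stack across participants; pairs that are grouped together by most participants are candidates for the same category on the Web site. The card names and participant data in the Python below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical results: one list of stacks per participant,
# each stack being a list of card names.
sorts = [
    [["Course reserves", "Renew books"], ["E-journals", "Databases"]],
    [["Course reserves", "Renew books", "E-journals"], ["Databases"]],
    [["Renew books"], ["Course reserves"], ["E-journals", "Databases"]],
]


def co_occurrence(sorts):
    """Count how many participants placed each pair of cards in the same stack."""
    counts = Counter()
    for stacks in sorts:
        for stack in stacks:
            # Sort names so each pair has a single canonical key.
            for pair in combinations(sorted(stack), 2):
                counts[pair] += 1
    return counts


pairs = co_occurrence(sorts)
for (card_a, card_b), n in pairs.most_common():
    print(f"{card_a} + {card_b}: {n} of {len(sorts)} participants")
```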

The primary problem encountered in conducting card-sorting tests is describing the collections and services to be labeled and grouped. Describing “full-text” e-resources appears to be particularly difficult in card-sorting exercises, and the results of surveys, focus groups, and user protocols indicate that users often do not understand what “full-text” means. Unfortunately, this is the term found on many library Web sites.

FOOTNOTES

¹ To give the reader a better understanding of the care with which user studies must be designed and conducted, sample research instruments may be viewed at https://www.clir.org/wp-content/uploads/sites/6/instr.pdf.

² Much of the information in this section is taken from Chadwick, Bahr, and Albrecht 1984.

³ Much of the information in this section is taken from Chadwick, Bahr, and Albrecht 1984.

⁴ See, for example, Nielsen 1994. Other chapters in the book describe other usability inspection methods, including cognitive walk-throughs.

⁵ A brief description of these principles is available in Nielsen, no date.