By Vessela Ensberg
This is the second part of a two-part blog post on incorporating data management and data sharing plans into researcher workflows. The first post in the series, by CLIR/DLF Postdoctoral Fellow Kendall Roark, appeared last week.
Traditionally, librarians have been involved at the beginning and the end of the research process, helping researchers find information and disseminate their findings. With the changing data landscape, however, it is important for the library to participate in other stages of research as well. A librarian who can provide concise, customized training in needed areas, for example documenting data as they are collected, will be valuable to the research team.
In the life sciences, even the traditional points of collaboration between researchers and librarians are changing with the growing influence of Big Data. For instance, the literature search at the beginning of a research project may now be supplemented by a data set search. Discovering relevant data is hard, so keeping track of new databases, and of every step that goes into a search, is an important skill that librarians can teach researchers. The number of NCBI (National Center for Biotechnology Information) databases alone has exploded over the past five years, and while it is great that they are interconnected, it is very hard to keep track of how a particular piece of information was found. Introducing researchers to MyNCBI and its ability to save recent searches and results will make their lives a lot easier. One of the oldest NCBI tools, BLAST, searches for homologous (similar) DNA and protein sequences. It is a gold mine for teaching a search strategy based on filtering by organism and by experimentally proven function and, for those who want to dig deeper, for explaining the meaning of the E-value.
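For readers who want to dig into the E-value, the Karlin-Altschul expression on which BLAST's significance statistics are based is shown below as background, not as a description of any particular BLAST setting. Here m and n are the (effective) lengths of the query and the database, S is the raw alignment score, K and lambda are parameters that depend on the scoring system, and S' is the normalized bit score. E is the number of alignments scoring at least S that one would expect to find by chance in a search of that size, so smaller values indicate more significant hits.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\text{equivalently}\qquad
E = m\,n\,2^{-S'}, \quad S' = \frac{\lambda S - \ln K}{\ln 2}
```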
Another point of collaboration is developing folder hierarchies before a project starts. Once a system is in place, there is little chance that a researcher will redo it. Suggesting a naming convention suited to the specific project adds value to a researcher's documentation process. Many life scientists use Excel spreadsheets, so introducing the spreadsheet best practices of a researcher's field is another key educational point. Try to be as specific as possible: find the metadata schema used in their field, present it to them, and find out how much of it is relevant during data collection.
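Agreeing on a folder hierarchy is easier when it can be created in one step at the start of a project. The Python sketch below assumes a hypothetical layout (numbered folders for protocols, raw data, processed data, analysis, and results, plus a docs folder); the folder names and the `create_project_skeleton` helper are illustrative, not a standard, and should be adapted to the lab's field.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical layout: rename or add folders to suit the project and the field.
PROJECT_LAYOUT = [
    "01_protocols",
    "02_raw_data",        # original instrument output, kept unmodified
    "03_processed_data",
    "04_analysis",
    "05_results",
    "docs",               # README, data dictionary, metadata
]


def create_project_skeleton(root: str | Path, layout: list[str] = PROJECT_LAYOUT) -> list[Path]:
    """Create the folder hierarchy under `root` and return the folder paths."""
    root = Path(root)
    folders = []
    for name in layout:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        folders.append(folder)
    # A top-level README records the structure so newcomers can follow it.
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite a README the lab has edited
        lines = ["Folder structure:"] + [f"- {name}/" for name in layout]
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return folders
```

Because the folders are created with `exist_ok=True` and an existing README is left alone, running the helper again on an established project does not disturb its contents.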
I have found that a seminar on lab notebooks is a good way to get researchers' attention. Lab notebooks are supposed to contain all the information necessary to reproduce an experiment. However, notebook maintenance is not taught in every lab, and good record keeping is further complicated because, whereas measurements used to be recorded directly in the notebook, most data are now digital. There is little advice on how to manage the digital files that experiments produce, so librarians can help by providing best practices for file naming. Researchers understand the problem from personal experience, which makes lab notebook management an ideal seminar to offer when reaching out to scientists for the first time.
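File naming advice is easiest to follow when the convention is written down precisely enough to be checked automatically. Below is a minimal Python sketch, assuming a hypothetical convention of `YYYY-MM-DD_project_sample_vNN.ext` (ISO dates so that names sort chronologically, underscores between fields, no spaces); both the convention and the helper names are illustrative.

```python
import re
from datetime import date

# Hypothetical convention: YYYY-MM-DD_project_sample_vNN.ext
NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}"      # ISO date, so names sort chronologically
    r"_[A-Za-z0-9-]+"          # project (letters, digits, hyphens only)
    r"_[A-Za-z0-9-]+"          # sample or condition
    r"_v\d{2,}"                # zero-padded version number
    r"\.[A-Za-z0-9]+$"         # file extension
)


def _clean(field: str) -> str:
    """Replace each run of spaces or other unsafe characters with a hyphen."""
    return re.sub(r"[^A-Za-z0-9-]+", "-", field.strip()).strip("-")


def make_file_name(day: date, project: str, sample: str, version: int, ext: str) -> str:
    """Build a file name that follows the hypothetical convention above."""
    return f"{day.isoformat()}_{_clean(project)}_{_clean(sample)}_v{version:02d}.{ext.lstrip('.')}"


def follows_convention(name: str) -> bool:
    """Return True if `name` matches the convention."""
    return NAME_PATTERN.match(name) is not None
```

For example, `make_file_name(date(2014, 3, 12), "Mouse Liver", "slide 3", 1, ".tif")` returns `2014-03-12_Mouse-Liver_slide-3_v01.tif`, while a name like `final data (2).xlsx` fails the check.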
I would also recommend learning more about data management tools. Data management is time-consuming, so anything that can be automated should be. For instance, the many files produced by a microscopy experiment can be listed with Directory List & Print, which outputs the file names and paths to a document that can then be annotated with information about each file's content (e.g., what was on that microscope slide?). Electronic lab notebooks are also becoming more widely accepted. Many offer free demo periods, and I would encourage librarians to test and explore them. OneNote, Evernote, LabArchives, RSpace (the redesigned eCat), and LabGuru would all be good places to start. There are many others (the University of Utah maintains a long list), and some are more field-specific than others, so it helps to know the researchers' needs when exploring them.
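Where a desktop utility such as Directory List & Print is not available, a short script can produce a similar listing. The sketch below is a Python stand-in, not a description of how that program works: it walks a folder recursively and writes a CSV with each file's name, relative path, and size, plus an empty `description` column for notes such as what was on each slide. The function name, column names, and the `*.tif` pattern in the usage note are assumptions.

```python
import csv
from pathlib import Path


def list_files_to_csv(folder: str, out_csv: str, pattern: str = "*") -> int:
    """Write one CSV row per file under `folder` (searched recursively) whose
    name matches `pattern`, with an empty 'description' column to fill in
    later. Returns the number of files listed."""
    root = Path(folder)
    # Collect and sort first, so the listing order is stable between runs.
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_name", "relative_path", "size_bytes", "description"])
        for p in files:
            writer.writerow([p.name, p.relative_to(root).as_posix(), p.stat().st_size, ""])
    return len(files)
```

For instance, `list_files_to_csv("microscopy_run", "slides.csv", "*.tif")` would list only the TIFF images, and the resulting CSV opens in Excel for annotation. Writing the CSV outside the folder being listed keeps it out of future listings.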
Quartzy is another data management tool. It is free, very easy to learn, and used for building inventories, that is, for managing physical specimens and reagents, many of which are shared across the team. This is a perfect opportunity for librarians to suggest organizing strategies for labs. When talking to researchers about this tool, ask whether they have come across boxes of Eppendorf tubes left sitting around for years: tubes nobody dares to use and nobody dares to throw out.
Finally, I suggest that data managers explore REDCap. This tool generates data collection forms with built-in data management requirements and helps keep track of longitudinal projects. For example, certain fields can be made required, and the type of data entered can be validated. The longitudinal study forms essentially create a database. Linking separate projects is also possible (with the help of the institution's REDCap administrator). REDCap is commonly used in clinical studies, but I am currently finishing a project that uses it to build a record-tracking database for an animal study. As users become more familiar with the tool, new ideas for using it will surface.
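The idea behind required fields and data-type validation can be illustrated with a small stand-alone check. The Python sketch below is hypothetical: the `FIELDS` dictionary, its field names, and `validate_record` only mimic the concept and are not REDCap's data dictionary format or API.

```python
from datetime import date

# Hypothetical field rules; NOT REDCap's data dictionary format or API.
FIELDS = {
    "animal_id": {"required": True, "type": "text"},
    "visit_date": {"required": True, "type": "date"},     # YYYY-MM-DD
    "weight_g": {"required": False, "type": "number"},
}


def _valid(value: str, kind: str) -> bool:
    """Check that a non-empty value can be parsed as the expected type."""
    try:
        if kind == "date":
            date.fromisoformat(value)
        elif kind == "number":
            float(value)
        return True  # "text" accepts anything
    except ValueError:
        return False


def validate_record(record: dict) -> list:
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for name, rule in FIELDS.items():
        raw = record.get(name)
        value = "" if raw is None else str(raw).strip()
        if not value:
            if rule["required"]:
                errors.append(f"{name}: required field is empty")
            continue  # optional and empty: nothing to validate
        if not _valid(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type']}, got {value!r}")
    return errors
```

A record with an empty `animal_id`, a date typed as `12/03/2014`, and a weight of `heavy` returns three error messages, one per field, which is the kind of feedback a validated form gives at the moment of data entry.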
A librarian who keeps current with emerging databases and data management tools will bring added value to any research group. Such a librarian understands how to use these tools efficiently and how to apply best-practice standards to the researchers' specific situation. At the end of the day, data managed along the way are easier to find and easier to use.
Vessela Ensberg is a data curation analyst at the UCLA Biomedical Library and at the UCLA Social Sciences Data Archive. She holds a Ph.D. in Cellular and Molecular Biology from the University of Wisconsin-Madison, and is a former CLIR/DLF Fellow in Data Curation.