Once a scientist has finished her research and published a paper, where is that data stored?
Three of the panelists at Science Online New York City (SONYC)'s event last night, "Thinking Digital: Giving your research more reach (and making sure others can find it)," are working out answers to this question.
Carol Feltes, Rockefeller University’s head librarian, comes from a business background. In a for-profit setting, there are clear policies and procedures for data management, retention, and removal. But at Rockefeller, each lab head is responsible for his or her data—there is no overarching policy for data management.
The library encourages lab heads to store data on its DSpace, “a managed, digital repository designed to archive, preserve and make accessible the scholarly works authored by The Rockefeller University faculty, staff and students.” So far, only six labs have started using the open-access repository. “Unless they absolutely need it, we can’t sell it,” Feltes said.
There are also data storage solutions outside institutional settings. The new repository Figshare invites researchers to post research data, both publicly and privately, under a Creative Commons license (materials can be used with proper attribution). Figshare developer Mark Hahnel finished his Ph.D. in September, and during the process realized that most of his research would never be seen outside of his lab group: much of it was negative, or simply didn't fit into the larger research picture. He wanted to share and get credit for all his research, and found that existing online places to store and share data were too difficult to use. Thus, Figshare was born. Hahnel hopes that the visible metrics on the site, tracking page views, shares, and citations, will encourage people to post their research.
Efforts are also underway to make published literature available. One such undertaking is the Biodiversity Heritage Library (BHL), a consortium of 12 libraries digitizing all published biodiversity literature. To date, said Cathy Norton, library scholar at BHL and the Woods Hole Oceanographic Institution Library, 32 million pages, or 6.5 percent of the literature, have been scanned. Much of that material predates 1923 and is no longer under copyright. In addition, publishers have given the project materials to scan.
But once these materials are available, how can people find them? Rare and old works do not have DOIs or ISBNs, so digitizers must add metadata and discoverability tools. The next step, said Norton, is to contextualize the data, providing a story about what each collection is all about.
Clearly, research tools are changing as more information becomes available, and if these tools are used well, they could change the speed of science. As Feltes said, "People begin to realize that with greater access to a greater variety [of data], the pace of knowledge generation increases. Having data available accelerates the pace at which problems are solved and disparate disciplines are stitched together."