Reputation instead of obligation: forging new policies to motivate academic data sharing

Despite strong support from funding agencies and policy makers, academic data sharing sees hardly any adoption among researchers. Current policies that try to foster academic data sharing fail, as they try either to motivate researchers to share for the common good or to force researchers to publish their data. Instead, Sascha Friesike, Benedikt Fecher, Marcel Hebing, and Stephanie Linek argue that in order to tap into the vast potential that is attributed to academic data sharing, we need to forge new policies that follow the guiding principle of reputation instead of obligation.

What drives academic data sharing?

Benedikt Fecher, Sascha Friesike and Marcel Hebing in a PLOS article on what drives data sharing in academia:
We show that this process can be divided into six descriptive categories: Data donor, research organization, research community, norms, data infrastructure, and data recipients. Drawing from our findings, we discuss theoretical implications regarding knowledge creation and dissemination as well as research policy measures to foster academic collaboration. We conclude that research data cannot be regarded as knowledge commons, but research policies that better incentivise data sharing are needed to improve the quality of research results and foster scientific progress.

Sowing the Seed

A qualitative study that gathers evidence, examples and opinions on current and future incentives for research data sharing from the researchers’ point of view. It includes recommendations for research funders, research institutions, publishers, data centres and repositories, and other relevant actors.

System error: Open research data and publication-driven research

Blog post on data sharing in a publication-driven academic system:
Would more researchers share data if they got more for it? Possibly. The currency does not even have to change. What is missing in the academic system is the recognition for intermediaries, also for data. Those who publish well get cited. The H-index increases and thereby the chances for professional advancement. Good articles are good for the career. Good data, however, are still not as important as they should be.

Perspectives on open science and scientific data sharing: an interdisciplinary workshop

Looking at Open Science and Open Data from a broad perspective. This is the idea behind “Scientific data sharing: an interdisciplinary workshop”, an initiative designed to foster dialogue between scholars from different scientific domains which was organized by the Istituto Italiano di Antropologia in Anagni, Italy, 2-4 September 2013. We here report summaries of the presentations and discussions at the meeting. They deal with four sets of issues: (i) setting a common framework, a general discussion of open data principles, values and opportunities; (ii) insights into scientific practices, a view of the way in which the open data movement is developing in a variety of scientific domains (biology, psychology, epidemiology and archaeology); (iii) a case study of human genomics, which was a trail-blazer in data sharing, and which encapsulates the tension that can occur between large-scale data sharing and one of the boundaries of openness, the protection of individual data; (iv) open science and the public, based on a round table discussion about the public communication of science and the societal implications of open science.

What drives academic data sharing?

Check out the working paper that two of our HIIG colleagues, Sascha Friesike and Benedikt Fecher, published on the barriers of data sharing. Feedback is welcome!

Despite widespread support from policy makers, funding agencies, and scientific journals, academic researchers rarely make their research data available to others. At the same time, data sharing in research is attributed a vast potential for scientific progress. It allows the reproducibility of study results and the reuse of old data for new research questions. Based on a systematic review of 98 scholarly papers and an empirical survey among 603 secondary data users, we develop a conceptual framework that explains the process of data sharing from the primary researcher’s point of view. We show that this process can be divided into six descriptive categories: Data donor, research organization, research community, norms, data infrastructure, and data recipients. Drawing from our findings, we discuss theoretical implications regarding knowledge creation and dissemination as well as research policy measures to foster academic collaboration. We conclude that research data cannot be regarded as a knowledge commons, but research policies that better incentivise data sharing are needed to improve the quality of research results and foster scientific progress.

A note on the practical costs of data sharing

Aside from the ethics and etiquette of fully open data-sharing, there are practical issues that journals still need to address. One is the cost of sharing data. Both the Public Library of Science and the UK Royal Society recommend the storage repository Dryad, which currently charges US$15 for the first gigabyte of data over its 10-gigabyte limit, and $10 per gigabyte thereafter. However, studies in areas such as neuroscience can generate terabytes of raw data (1 terabyte is 1,000 gigabytes), a quantity that few labs could afford to upload.
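To make the scale of these fees concrete, here is a rough calculation of the overage charge using only the figures quoted above. The function name and parameters are hypothetical, and two assumptions go beyond the quoted text: partial gigabytes are rounded up, and any base submission fee is ignored.

```python
import math


def dryad_overage_cost(size_gb, free_gb=10, first_gb_fee=15, extra_gb_fee=10):
    """Estimate the overage charge in US$ for a deposit of size_gb gigabytes.

    Uses the pricing quoted in the text: US$15 for the first gigabyte over
    the 10-gigabyte limit and US$10 for each gigabyte after that.
    Assumption: partial gigabytes are billed as whole gigabytes.
    Assumption: no base fee is included, since the text mentions none.
    """
    over_gb = max(0, math.ceil(size_gb - free_gb))
    if over_gb == 0:
        return 0
    return first_gb_fee + extra_gb_fee * (over_gb - 1)


# One terabyte (1,000 gigabytes) is 990 gigabytes over the limit.
one_terabyte_cost = dryad_overage_cost(1000)
```

Under these assumptions a single terabyte comes to US$9,905 in overage charges (15 + 10 × 989), which illustrates why raw neuroscience data sets are hard to deposit at these rates.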

Ten Rules for the Care and Feeding of Scientific Data

Goodman et al. on the importance of concise data curation and annotation:
Today, most research projects are considered complete when a journal article based on the analysis has been written and published. The trouble is, unlike Galileo's report in Sidereus Nuncius, the amount of real data and data description in modern publications is almost never sufficient to repeat or even statistically verify a study being presented. Worse, researchers wishing to build upon and extend work presented in the literature often have trouble recovering data associated with an article after it has been published. More often than scientists would like to admit, they cannot even recover the data associated with their own published works.
 

So what is the difference between data, code and text?

Bjoern Brembs thinks there isn't really any. But read his interesting take on data sharing yourself:
So far, I can’t see any principal difference between our three kinds of intellectual output: software, data and texts. I admit I’m somewhat surprised that there appears to be a need to write this post in 2014. After all, this is not really the dawn of the digital age any more. Be that as it may, it is now March 6, 2014, six days since PLoS’s ‘revolutionary’ data sharing policy was revealed and only few people seem to observe the irony of avid social media participants pretending it’s still 1982. For the uninitiated, just skim Twitter’s #PLoSfail, read Edmund Hart’s post or see Terry McGlynn’s post for some examples. I’ll try to refrain from reiterating any arguments made there already.

Data-Sharing Angst: An Insight into an ongoing Research Project (Feedback appreciated)

Benedikt Fecher, one of our editors here, wrote a blog entry about a systematic review on data sharing in academia that he and his colleague Sascha Friesike conducted. It is part of his doctoral research. Particularly interesting might be the Data Sharing Angst:
Data sharing in academia is good for everyone. It allows better and more research. For the single researcher, however, these advantages are not so apparent. The category Returns indicates that sharing research data is rather related to negative than positive individual outcomes. For instance: Inhibiting returns include concerns about competitive disadvantage regarding other researchers, commercial misuse of data, the falsification of results and flawed data interpretation by others – all of which can be subsumed as Data sharing Angst. They exist, whether justified or not.
Feedback, questions and ideas are more than welcome!

Data Sharing Goes Linux

The Linux Foundation, champion of all things open-source, has just announced a new collaboration with OpenBEL, an open-source platform for sharing scientific data.
The marriage should provide a boost to OpenBEL, which is already used by many in industry and academia to enable better collaboration on big data projects, such as drug development. The hope is that the nonprofit can do for OpenBEL what it is doing for Linux, which has become lingua franca among collaborative software developers and hackers alike.

Beyond Property Rights: Thinking About Moral Definitions of Openness

Referring to the idea of Sunil Abraham, founder and executive director of the Center for Internet and Society (CIS) in India, David Eaves points out that the spectrum of openness actually extends well beyond the variants typically encountered in the West.
Conversations about open knowledge outside the supposedly settled lands of the 'rich' often stretch beyond permission-based 'fair use' and 'creative commons' approaches. There is a desire to explore potential moral rights to use 'content' in addition to just property rights that may be granted under statutes.

Improving Science through Data Management and Sharing

Here is a paper by Kathryn A. Kane on data management and sharing from the URJHS (Undergraduate Research Journal for the Human Sciences).
Scientific discovery and innovation move society into the future, and it is the responsibility of researchers to use their work to advance that purpose. By effectively managing and sharing their data with the public, researchers can facilitate collaboration with their peers, thus conserving time and resources.

Improving Science Through Data Management

An interesting undergraduate research project on the importance of data management and sharing by Kathryn A. Kane.
This is just one small example of how valuable data can be lost to both current and future researchers when there is no data management plan in place. Scientific discovery and innovation move society into the future, and it is the responsibility of researchers to use their work to advance that purpose. By effectively managing and sharing their data with the public, researchers can facilitate collaboration with their peers, thus conserving time and resources. This also leads to increased transparency and improved scientific reputations. There are some challenges facing this proposal, but with a concerted effort data management and sharing can become an integral part of the scientific culture.

Data Sharing Using Difference-on-write

From the abstract:
When a virtual machine writes to a page that is being shared across VMs, a share value is calculated to determine how different the page would be if the write command were implemented. If the share value is below a predefined threshold (meaning that the page would not be “too different”), then the page is not copied (as it would be in a standard copy-on-write operation). Instead, the difference between the contents of the pages is stored as a self-contained delta. The physical to machine memory map is updated to point to the delta, and the delta contains a pointer to the original page. When the VM needs to access the page that was stored as a delta, the delta and the page are then fetched from memory and the page is reconstructed.
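The mechanism in this abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the class and method names are hypothetical, pages are only 16 bytes long, and the share value is assumed to be the number of bytes that would differ from the original shared page (the abstract does not say how it is computed).

```python
from dataclasses import dataclass

PAGE_SIZE = 16  # tiny pages keep the example readable (assumption)


@dataclass
class Delta:
    base: int      # machine frame number of the original shared page
    changes: dict  # byte offset -> new byte value


class Memory:
    """Toy model of difference-on-write; all names are hypothetical."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.frames = {}  # machine frame number -> bytes or Delta
        self.p2m = {}     # (vm, physical page) -> machine frame number
        self._next = 0

    def _alloc(self, content):
        mfn = self._next
        self._next += 1
        self.frames[mfn] = content
        return mfn

    def map_shared(self, vms, ppn, data):
        """Map one machine page into several VMs at physical page ppn."""
        assert len(data) == PAGE_SIZE
        mfn = self._alloc(bytes(data))
        for vm in vms:
            self.p2m[(vm, ppn)] = mfn
        return mfn

    def read(self, vm, ppn):
        """Return the page contents, rebuilding them if stored as a delta."""
        entry = self.frames[self.p2m[(vm, ppn)]]
        if isinstance(entry, Delta):
            page = bytearray(self.frames[entry.base])
            for offset, value in entry.changes.items():
                page[offset] = value
            return bytes(page)
        return entry

    def write(self, vm, ppn, offset, payload):
        """Apply a write and report whether a delta or a full copy was stored."""
        assert offset + len(payload) <= PAGE_SIZE
        new = bytearray(self.read(vm, ppn))
        new[offset:offset + len(payload)] = payload
        mfn = self.p2m[(vm, ppn)]
        entry = self.frames[mfn]
        base = entry.base if isinstance(entry, Delta) else mfn
        base_page = self.frames[base]
        changes = {i: b for i, (a, b) in enumerate(zip(base_page, new)) if a != b}
        share_value = len(changes)  # assumed metric: count of differing bytes
        if share_value < self.threshold:
            # Not "too different": store a delta that points at the original.
            self.p2m[(vm, ppn)] = self._alloc(Delta(base, changes))
            return "delta"
        # Too different: fall back to a standard copy-on-write page copy.
        self.p2m[(vm, ppn)] = self._alloc(bytes(new))
        return "copy"
```

With a threshold of 4, a two-byte write to a shared page is stored as a delta, while an eight-byte write triggers an ordinary copy. The sketch leaves out share counting and freeing of superseded deltas, which a real hypervisor would need.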