Code of gratitude

Giving due credit to scientific software development.

A substantial amount of the work involved in data-intensive problem-solving relies on resources that cost the end user virtually nothing. Nada, gratis. You find it, you download it, you use it, and in some cases you can modify it to suit your own needs. You might even benefit from the experience of a large community of developers and users. It is almost everything you need to learn or advance in your work.

Nowhere is this more apparent than in scientific research, across disciplines and application domains. The positive effects of free software and other types of open resources on scientific advancement are priceless. No single indicator of “impact” and “return on investment” could do justice to the benefits generated by such tools, languages and platforms.

Although users and other stakeholders often cite sources when reporting their work, even this is not done consistently. So many people (and work hours) are invested in the noble idea of making and sharing code. So many anonymous contributors, unknown leaders and unselfish enablers make such a moral choice every day. And yet, so much of this work goes unrecognized.

The marvelous thing is that the communities that bring us all these ideas and solutions are made up of a particular kind of people. Mostly, they are people who care about societal challenges and do things just for the fun of it. They nurture projects that are fueled by unremitting passion, enjoyment and curiosity. This is not a world particularly suited to the celebrity wannabe and the bullshitter.

Here is the rub: there is always the temptation to take the labors and contributions of these communities for granted. There is almost a sense of entitlement among direct and indirect beneficiaries: the expectation that all those great tools and resources are self-made and self-perpetuating, and the presumption that somewhere, somebody will make them for you, one way or the other.

In times of increasing excitement about “all things data”, it is crucial that stakeholders continue finding concrete ways to support, motivate or give due credit to those who create and share software. Multiple reward instruments, monetary and non-monetary, will be important. Impact assessment will also be necessary. This is more than a matter of fairness for a particular group of people. This is also about sustaining scientific and societal progress.

For now, I take a step back to tell the men and women behind the screen: It is a privilege to have access to the products of your talents and hard work. To the good people who give us the languages, databases and tools that make science happen: I am grateful to you.

The perks of data sharing

Image: “Pattern 7” by F. Azuaje

Data sharing is no longer a question of ‘why’, but rather of ‘when’ and ‘how’.

Access to biomedical research data is both a critical requirement and a concern on the road to generating benefits for patients and society at large. In the scientific community there seem to be two “extreme” camps: those who firmly oppose steps to make data more accessible to all, and those who seek ways to free data of any restrictions on further use.

The first group argues that the researchers responsible for acquiring the data should be the exclusive “owners” of that data. In this group you may find scientists who see other potential data users as “research parasites”. On the other hand, in the second group you have researchers who argue that data should be made available to the community without delays and favor full data openness.

Whether we feel closer to one group or the other, we cannot overlook a fact: It is time to talk about data sharing in a more dispassionate manner. Such a conversation will need to address two central questions: When and how should data be shared?

J. Wilbanks and S.H. Friend at Sage Bionetworks (Seattle, USA) have recently made a significant contribution to this conversation by reporting their motivation and experience in health data sharing. This follows Sage Bionetworks’ decision to share data obtained from thousands of participants of the mPower project, a smartphone-enabled study in Parkinson’s disease, even before the publication of their own analyses.

Data sharing: When?

According to Wilbanks and Friend, data sharing is especially needed in research areas where the problem of transforming raw data into interpretable findings has no generalized solutions. In their area, mobile health research, there is still a need to develop computational methods for making sense of these data. They argue that rapidly sharing such data will enable researchers to develop new tools that accelerate discoveries and applications, which in the long term may result in benefits to patients.

Data sharing: How?

Scientists have a duty not only to maximize the potential value of their data, but also to enhance the conditions for their ethical use. To address these obligations, Sage Bionetworks’ approach does not rely solely on researchers or ethics committees to decide who can reuse the data. Instead, it allows the study participants themselves to decide whether other “qualified researchers” can access their coded data. In their project, more than 75% of the study participants chose to share their data widely. Participants can make this decision, or modify it at any time, by using the study’s smartphone app.

Once researchers are given data access, additional restrictions are put in place, such as those concerning commercial use or re-identification of the data. There is also the question of who can be recognized as a qualified researcher. To deal with this issue, Sage Bionetworks asks data requesters to complete various steps, including validating their identity and agreeing to a data sharing contract.
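The two-sided logic described above (participants choose how widely to share, and requesters must qualify before seeing anything) can be illustrated with a small sketch. This is only a toy model under stated assumptions, not Sage Bionetworks’ actual system: every class, field and function name here is hypothetical, and qualification is reduced to the two steps named in the text (identity validation and a signed contract), although the real process includes others.

```python
from dataclasses import dataclass
from enum import Enum


class SharingScope(Enum):
    """Hypothetical sharing choices a participant could make in the app."""
    STUDY_TEAM_ONLY = "study_team_only"
    QUALIFIED_RESEARCHERS = "qualified_researchers"


@dataclass
class Participant:
    participant_id: str  # coded identifier, not a real name
    scope: SharingScope = SharingScope.STUDY_TEAM_ONLY

    def set_scope(self, scope: SharingScope) -> None:
        # Participants may change their decision at any time.
        self.scope = scope


@dataclass
class Requester:
    name: str
    identity_validated: bool = False
    contract_signed: bool = False

    def is_qualified(self) -> bool:
        # Assumption: only the two steps named in the text are modeled.
        return self.identity_validated and self.contract_signed


def accessible_records(requester: Requester, participants: list) -> list:
    """Return coded IDs the requester may access under current consent."""
    if not requester.is_qualified():
        return []
    return [
        p.participant_id
        for p in participants
        if p.scope is SharingScope.QUALIFIED_RESEARCHERS
    ]


# Usage: consent is evaluated at request time, so a changed decision
# takes effect on the next request.
cohort = [Participant("P001"), Participant("P002")]
cohort[0].set_scope(SharingScope.QUALIFIED_RESEARCHERS)
researcher = Requester("R. Example", identity_validated=True, contract_signed=True)
print(accessible_records(researcher, cohort))
```

The design choice worth noting is that consent is checked on every request rather than copied at enrollment, which is what lets a participant’s later change of mind take effect.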

Data sharing: Patients first.

Sage Bionetworks’ data sharing approach offers insights that go beyond the question of whether data should be shared. It reframes the discussion as a question of when (and how) to share data, according to the specific context and needs of a study. Although several legal and ethical concerns remain, this approach at least aims to balance crucial requirements: the participants’ privacy and their motivation to support research, while promoting the transparent and ethical use of the data.

We should enhance the decision-making power of study participants to facilitate responsible and meaningful data applications. Decisions on when and how to share data should not be driven by the self-interest of scientists. The rationale should be grounded in the need to maximize the potential benefits to patients.


Longo, D., & Drazen, J. (2016). Data sharing. New England Journal of Medicine, 374(3), 276-277. DOI: 10.1056/NEJMe1516564
Wilbanks, J., & Friend, S. (2016). First, design for data sharing. Nature Biotechnology. DOI: 10.1038/nbt.3516
Bot, B., Suver, C., Neto, E., Kellen, M., Klein, A., Bare, C., Doerr, M., Pratap, A., Wilbanks, J., Dorsey, E., Friend, S., & Trister, A. (2016). The mPower study, Parkinson disease mobile data collected using ResearchKit. Scientific Data, 3. DOI: 10.1038/sdata.2016.11

This article was published in the United Academics Magazine.