How to Make Science More Reliable: Tips for Managing Data

Jul 01, 2016

Nearly anyone who has worked in research is familiar with the frustrating scenario: a postdoc leaves for another job, and with him go all sorts of valuable knowledge. It’s become loud and clear that results from many published scientific studies are unreliable. While ethical violations like fraud clearly contribute to the problem, so does a seemingly more benign and much more common issue: poor data management. Darell Schmick, research librarian at the Eccles Health Sciences Library at the University of Utah, describes scenarios that could happen as a result of poor data management, and ways to overcome them, ahead of the upcoming Research Reproducibility Conference.

Episode Transcript

Interviewer: What's becoming loud and clear is that most scientific studies are, well, unreliable. But there's help out there for scientists. We'll talk about that next on The Scope.

Announcer: Examining the latest research, and telling you about the latest breakthroughs. "The Science and Research Show" is on The Scope.

Interviewer: I'm talking with Darell Schmick, research librarian at the Eccles Health Sciences Library at the University of Utah. So there are a number of reasons for what some people are calling a research reproducibility crisis, including fraud. But even scientists with the best intentions are at risk for doing sloppy work and there are a lot of reasons for that as well. One of the things that you're interested in looking at is data management. I really like this example of the postdoc who leaves the lab.

Schmick: So it's an age-old issue where you do a lot of work and it happens to be on your personal computer. It happens to be in a folder with poorly named files and you produce a bunch of research on behalf of the institution, but then, once you're done, you obviously take your computer with you, right, and then head off to that next position. Now, that's seemingly innocuous; however, it has a lot of implications. The data that you produce on behalf of the university has ownership concerns. Is it the postdoc's? Is it the university's?

Interviewer: How can that lead to issues with research reproducibility?

Schmick: So if the postdoc takes all that data with him or her and it hasn't been saved into the department bio or anything like that, how can you ensure that you have records of all the work that postdoc has done? They could have taken just a little bit of it, they could have taken a substantial chunk of it. And it really leaves the PI as well as the rest of the members of the lab with potentially a significant disadvantage.

Interviewer: So there's a lot of knowledge that can be lost?

Schmick: Absolutely.

Interviewer: So what are some ways to avoid that?

Schmick: We do teach a research administration training class that talks about just the basic fundamentals of just good data management, which involves things like where to properly back up your files and how often to do that. Myself and a couple of the librarians on campus have been working on a pilot for electronic lab notebook technology.

So if, say for instance, you happen to have the perennial issue of lab members recording data on their own personal computers because it's inconvenient to share it, this sort of technology allows for a lab to share in a collaborative notebook technology, something that has all those questions that we're talking about answered, like how frequently will it be backed up? Is it going to be backed up in a trustworthy source?

Interviewer: Right. Well, and not to mention that most lab notebooks that I've seen are kind of a disaster.

Schmick: Are you saying that scientists don't uniformly have amazing handwriting?

Interviewer: Exactly. Right, we're all human. And even how the information is recorded is varied from person to person. So the idea is that this would become more standardized?

Schmick: You hit the nail on the head there.

Interviewer: Any other approaches to make their data more accessible or reliable?

Schmick: When we're talking about optimizing the mechanics of anything in the research process, we want to ensure that we're doing it in a way that is not only accessible by us because we can look at our notes, presumably, and be able to understand what we were saying, understand what we were recording, understand what we were encountering in that process. But to think about how the results that you're producing are going to be read by somebody else that's not in that same context. So if you're doing an experiment, you do it for you, but you also ensure that you're doing it in a way that if somebody wants to reproduce that experiment, they can do that.

Interviewer: You've talked about ways of preserving information within a lab group, for example, a research group. What about sharing information more broadly with the scientific community?

Schmick: That's a great question, Julie. And a lot of people think that all you are able to really produce is that end product, that finalized article. And we don't realize, many times, that when we're doing experiments, when we're producing all this data that we're talking about, that data could be good data. It could be good information and good intel for another scientist that's stumbling across that same issue.

If you're embarking on answering a research question and you come across a dataset that has already sought to ask that question, you find out that maybe those results weren't satisfactory enough to produce something into a finished article, that could potentially save you years in otherwise reinventing the wheel.

So another thing that we like to talk about is the idea of ensuring that researchers know that the data that they produce is of value and there are places that you can store that. So one example that comes to mind is figshare. And figshare is a repository that you can actually assign a DOI to the data sets that you're uploading on there. Figshare are all about open science so they say as long as you're making it public, I mean, you can upload it for free.

Interviewer: So how can sharing data with the scientific community help with research reproducibility?

Schmick: There's a lot of news as of late in the way of that openness toward science where folks on a peer review panel want to see the steps you were able to take in order to draw the conclusions that you were able to take or able to make. And if they're able to go ahead and see that data right there from the start, it answers all those questions.

It's when that data is withheld that it presents a larger problem, not only for you as author but with greater implications for science in general. When we start to withhold that data, when we start to conceal certain steps in that recipe toward what we ended up with as the final product, it leads toward a slippery slope of what is not open science, and that closed scholarly environment, I think, is something that is well worth fighting against.

Interviewer: Where can people go to learn more about best data management practices?

Schmick: There are a lot of places. If you're embarking on a data management plan there's a great tool called DMPTool. That's dmptool.org, and it'll take you through the steps of the data management process by asking you, in a 20 questions format, "What's this data going to be for? Which agencies are going to be seeing this data?" And it'll give you recommendations at the end from there. If you're at the University of Utah's campus, I'd encourage you to talk to me or one of our other fine staff at Eccles Health Sciences Library. I'd be delighted to answer any questions regarding data management plans.

Announcer: Interesting, informative and all in the name of better health. This is The Scope Health Sciences Radio.
