Imperial's Research Data Management policy

Why does Imperial have a policy on research data management?

Imperial College London is committed to promoting the highest standards of academic research, including excellence in research data management. Research data is a valuable research output in its own right and as such has to be managed appropriately. Good management of research data is beneficial to researchers and ensures that data remains available to validate claims made in published research. Research funders increasingly require data management plans and expect shareable data to be made publicly available. The policy is designed to give academics clear guidance on the principles of research data management, while allowing them flexibility to select the tools that best suit their approach.

How was the policy developed?

The policy was developed by the Research Office working closely with the College’s Research Data Management Group. Faculty research committees and the Vice Provost Advisory Group for Research were consulted and their feedback incorporated. The policy was approved by the Provost’s Board.

What are my responsibilities as PI?

Principal Investigators have overall responsibility for the effective management of research data generated within or obtained for their research, including by their research groups. This includes developing a data management plan (DMP) at the start of a project, ensuring that data is stored in a way that minimises the risk of loss and making shareable data publicly available where this is required to validate published research. See the policy for details. The Library and ICT will provide training, guidance and services to support PIs.

What is ‘research data’?

Research data is the evidence that underpins all research conclusions (except those which are purely theoretical) and includes data that has been collected, observed, generated, created or obtained from commercial, government or other sources, for subsequent analysis and synthesis to produce original research results. These results are then used to write research papers, which are submitted for publication.

Why do I have to provide information about software?

Like data, software can be a valuable research output in its own right. In many cases data cannot be fully understood without the software that was used to generate it. Where software is developed as part of a research project, PIs are required to archive the version of the software that was used to generate/analyse the data and to inform the Library of the location, using Symplectic. Software should be archived even when it cannot be made publicly available.

Project planning

What should I do when planning a research project?

You should consider what data you are likely to generate, how the data is going to be used, what data you will have to store, how you will store it and what you will do with the data when the project ends. This will inform overall project planning, minimise the risk of data loss and allow you to cover the costs of data management in your grant. The best way to do this is to develop a data management plan (DMP).
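As an informal illustration, the planning questions above can be kept as a simple checklist that shows which ones are still open. This is only a sketch: the class name, field names and example answers are hypothetical and are not part of any College or funder template.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataManagementChecklist:
    """One field per planning question (hypothetical names)."""
    data_generated: Optional[str] = None   # what data are you likely to generate?
    data_use: Optional[str] = None         # how is the data going to be used?
    data_to_store: Optional[str] = None    # what data will you have to store?
    storage_method: Optional[str] = None   # how will you store it?
    end_of_project: Optional[str] = None   # what happens to the data when the project ends?

    def unanswered(self) -> List[str]:
        """Return the names of questions that have no answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example answers are made up for illustration only.
plan = DataManagementChecklist(
    data_generated="Instrument readings and derived tables",
    storage_method="Box",
)
print(plan.unanswered())
```

A checklist like this is not a substitute for a DMP, but the open items it lists are the ones a DMP would need to address.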

What is a data management plan?

A data management plan (DMP) is a document that outlines how you will handle your data both during your research and after the project is completed. Research funders usually require you to submit a DMP with a proposal. The College provides guidance and tools for developing DMPs.

Data storage

Where can I store my research data?

The College is in the process of making the Box service available to staff. Box provides unlimited storage for research data.

Why should I use Box?

Box provides unlimited, backed-up data storage and the ability to easily share data with colleagues in the College and with external partners. You have full control over who can access what data, and what permissions to give them. Box desktop clients and mobile apps allow you to automatically sync data between different devices. Data in Box is stored securely and backed up automatically, allowing you to recover data even if it is accidentally deleted by collaborators. Box machine learning tools help you describe your data.

Is Box data storage really unlimited?

There is no limit to the amount of data you can store in Box. If you upload more than 1TB per month, full upload speeds are not guaranteed, but we do not expect there to be a significant impact on speeds.
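To put the 1TB per month figure in context, the following sketch estimates how many months an upload would take if you chose to stay at or below that threshold. It is a rough calculation only: the function name is hypothetical, and it assumes uploads are spread evenly across months. Nothing in the policy requires you to stay under the threshold.

```python
import math


def months_within_threshold(dataset_tb: float, threshold_tb: float = 1.0) -> int:
    """Minimum number of months needed to upload dataset_tb terabytes
    without exceeding threshold_tb terabytes in any single month."""
    if dataset_tb < 0 or threshold_tb <= 0:
        raise ValueError("sizes must be non-negative and the threshold positive")
    return math.ceil(dataset_tb / threshold_tb)


# Illustrative dataset sizes in terabytes (made-up values).
for size in (0.5, 2.5, 5):
    print(size, "TB:", months_within_threshold(size), "month(s)")
```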

When will Box be made available?

Box is currently being tested at Imperial. It will be made available across the College in December. You will receive a notification when your account is ready.

What operating systems and devices are supported by Box?

The Box web interface can be accessed through all online-enabled devices, including smartphones, tablets and laptops, irrespective of the operating system. Desktop clients are available for Windows and Mac, and mobile apps for Android, Apple, BlackBerry and Windows devices. Linux users can mount Box storage via WebDAV (see How to mount Box.com cloud storage on Linux). The College is working with Box to improve Linux support.

Does Box require a separate username and password?

You will be able to log in to Box with your College username and password.

Where does Box store data?

Box data centres are geographically dispersed, so the solution may not be suitable for data that for legal reasons cannot leave the EU. Box is Safe Harbor certified. If you have specific requirements for data security, please contact ICT.

I already use Box. What should I do?

If you are already using Box, your account can be rolled into the College Box subscription, giving you unlimited storage free of charge. To allow us to identify your account, please add your College email address to your Box profile.

I use another external service to store my data. What should I do?

We recommend that you migrate to Box as the service offers unlimited, backed-up storage at no cost to you. If you use Box to collaborate with a group, the data is retained even when colleagues who originally created it leave the College. The College is working with Box to make new features available to help you manage your data.

Publication

What is a data access statement?

All research publications produced by Imperial authors must include a statement on how the underlying data can be accessed (a ‘data access statement’). This is in line with RCUK and individual funder policies. The College provides guidance and templates for data access statements.

Where can I publish my data?

The most straightforward way to make data available is to deposit it in a public repository (also known as an archive or data centre). In many cases, this will also meet the requirement to archive your data for the long term. Most data repositories will give you a Digital Object Identifier (DOI) for your dataset, enabling it to be cited and tracked like a regular publication. The College provides guidance on identifying a suitable data repository.
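As a small illustration of how a DOI can be used in a data access statement, the sketch below turns a DOI into a resolvable https://doi.org/ link and places it in a sentence. The function names, the statement wording, the repository name and the DOI are all hypothetical placeholders. The College's own guidance and templates should be used for real statements.

```python
def doi_url(doi: str) -> str:
    """Normalise a DOI (with or without a prefix) to a https://doi.org/ link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def data_access_statement(doi: str, repository: str) -> str:
    """Build an example statement; the wording is illustrative, not an official template."""
    return (
        f"The data underlying this publication are available from "
        f"{repository} at {doi_url(doi)}."
    )


# Placeholder DOI and repository name, for illustration only.
print(data_access_statement("doi:10.5555/12345678", "Example Repository"))
```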

Does the College have a data repository or catalogue?

There are currently no plans for the College to develop its own data repository; instead we recommend suitable external solutions. Spiral, the College’s open access repository, is currently being enhanced to act as a data catalogue: while it won’t hold the data, it will point to the repository where the data is stored. Adding information about your datasets and software to your Symplectic profile ensures that your outputs will be featured when the Spiral upgrade work is finished in late autumn 2015.

Do I have to publish all my data?

Funders require that all data needed to validate published research is made public. This is usually only a subset of the data created by a project, and funders accept that academics will evaluate this in line with disciplinary practice. Funders, and the College, also recognise that not all data can be made publicly available, for example where patient or other sensitive data is concerned or where data cannot be shared for legal reasons.

When should I publish my data?

Data should be made available when the research that it supports is published. In some cases it may be necessary to embargo the data for a reasonable period, for example when other outputs based on the data have not yet been published. The College recognises the PI’s entitlement to be the first to publish based on data they have generated. If the PI will not be able to publish their findings by the funder’s deadline for data sharing, the PI should request an extension of the embargo period from the funder.