As a general rule of thumb, while actively working with a dataset you should use whichever file format best suits the way you work. In most cases, this will be dictated by the software that you prefer to use. If you have some flexibility, perhaps because your software supports several formats or you are writing your own software, consider using one of the archive-suitable formats described below.

When you have finished working with a particular dataset, you should convert it to a more stable, standard format for archiving. It is increasingly common to find old files that are now completely unreadable, simply because the software that created them is no longer available.
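As a rough illustration of such a conversion, the sketch below assumes the working copy of a dataset is a Python pickle file (a format only Python can read) holding a non-empty list of dictionaries with identical keys. The function name `archive_records_as_csv` and the file layout are hypothetical, chosen only for this example, and CSV is just one possible archival target.

```python
import csv
import pickle


def archive_records_as_csv(pickle_path, csv_path):
    """Convert a pickled list of dicts into a plain-text CSV file.

    Assumes the pickle holds a non-empty list of dicts that all share
    the same keys (a hypothetical working format for this example).
    Returns the number of records written.
    """
    with open(pickle_path, "rb") as f:
        # Only unpickle files you created yourself: pickle can run code.
        records = pickle.load(f)

    # Use the keys of the first record as the column headers.
    fieldnames = list(records[0].keys())
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

The resulting file can be opened in any text editor or spreadsheet, so it stays readable even if the software that produced the working copy disappears. Note that CSV stores every value as text, so type information (for example, which columns are numeric) should be documented alongside the data.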

Ideally, your archival format should be at least one of:

  • readable using free tools (ideally plain text): so it can be accessed without a potentially-expensive license
  • a well-documented standard: so a wide variety of software is available to access it
  • a de facto standard in your research area: so the majority of researchers you share it with can be expected to have access to the right software

If possible, try to choose a format that allows you to describe and document the data directly within the file.
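One lightweight way to do this with plain text is to put the documentation in comment lines at the top of the file. The sketch below assumes a `# key: value` header convention, which is not part of any CSV standard and which not every CSV reader will skip automatically. The function names and metadata keys are hypothetical. Formats such as NetCDF, HDF5 and FITS have their own built-in mechanisms for embedded metadata, which this example does not cover.

```python
import csv


def write_documented_csv(path, metadata, fieldnames, rows):
    """Write a CSV file preceded by '# key: value' metadata comment lines."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        for key, value in metadata.items():
            f.write(f"# {key}: {value}\n")
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_documented_csv(path):
    """Return (metadata, rows), separating '#' comment lines from the data."""
    metadata = {}
    data_lines = []
    with open(path, newline="", encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                # Split on the first colon only, so values may contain colons.
                key, _, value = line[1:].partition(":")
                metadata[key.strip()] = value.strip()
            else:
                data_lines.append(line)
    rows = list(csv.DictReader(data_lines))
    return metadata, rows
```

A file written this way remains readable in a text editor, with the description of the columns and units travelling in the same file as the values.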

Examples of file formats
Category        | Formats                                                       | Comments
Text            | Plain text, HTML, Rich Text Format, Markdown/RST/Textile/etc. |
                | PDF/A                                                         | Only use for scans or if page layout is critical
Tabular/numeric | Comma-/Tab-Separated Values, XML                              | Human-readable with just a text editor
                | NetCDF, HDF5, FITS                                            | Particularly good for complex or hierarchical data structures, and embedding metadata
Images          | TIFF, PNG, JPEG2000                                           | Avoid GIF and standard JPEG
Movies          | MP4, Ogg Video                                                | Prefer open codecs wherever possible
Sound           | FLAC, Ogg Audio                                               | Prefer open codecs wherever possible
See more examples from the UK Data Archive