Adding test data

  1. We really, really like test images, but

  2. We are rather conservative about the size of our code repository.

So, we have two different ways of adding test data.

  1. Small, open licensed files can go in the nibabel/tests/data directory (see below);

  2. Larger files or files with extra licensing terms can go in their own git repositories and be added as submodules to the nibabel-data directory.

Small files

Small files are around 50K or less when compressed. By “compressed”, we mean, compressed with zlib, which is what git uses when storing the file in the repository. You can check the exact length directly with Python and a script like:

import sys
import zlib

for fname in sys.argv[1:]:
    with open(fname, 'rb') as fobj:
        contents = fobj.read()
    compressed = zlib.compress(contents)
    print(fname, len(compressed) / 1024.)

One way of making files smaller when compressed is to set uninteresting values to zero or some other number so that the compression algorithm can be more effective.

Please don’t compress the file yourself before committing to a git repo unless there’s a really good reason; git will do this for you when adding to the repository, and it’s a shame to make git compress a compressed file.

Files with open licenses

We very much prefer files with completely open licenses such as the PDDL 1.0 or the CC0 license.

The files in the nibabel/tests/data will get distributed with the nibabel source code, and this can easily get installed without the user having an opportunity to review the full license. We don’t think this is compatible with extra license terms like agreeing to cite the people who provided the data or agreeing not to try and work out the identity of the person who has been scanned, because it would be too easy to miss these requirements when using nibabel. It is fine to use files with these kind of licenses, but they should go in their own repository to be used as a submodule, so they do not need to be distributed with nibabel.

Adding the file to nibabel/tests/data

If the file is less then about 50K compressed, and the license is open, then you might want to commit the file under nibabel/tests/data.

Put the license for any new files in the COPYING file at the top level of the nibabel repo. You’ll see some examples in that file already.

Adding as a submodule to nibabel-data

Make a new git repository with the data.

There are example repos at

Despite the fact that both the examples are on github, Bitbucket is good for repos like this because they don’t enforce repository size limits.

Don’t forget to include a LICENSE and README file in the repo.

When all is done, and the repository is safely on the internet and accessible, add the repo as a submodule to the nitests-data directory, with something like this:

git submodule add https://bitbucket.org/nipy/rosetta-samples.git nitests-data/rosetta-samples

You should now have a checked out copy of the rosetta-samples repository in the nibabel-data/rosetta-samples directory. Commit the submodule that is now in your git staging area.

If you are writing tests using files from this repository, you should use the needs_nibabel_data decorator to skip the tests if the data has not been checked out into the submodules. See nibabel/tests/test_parrec_data.py for an example. For our example repository above it might look something like:

from .nibabel_data import get_nibabel_data, needs_nibabel_data

ROSETTA_DATA = pjoin(get_nibabel_data(), 'rosetta-samples')

@needs_nibabel_data('rosetta-samples')
def test_something():
    # Some test using the data

Using submodules for tests

Tests run via nibabel on travis start with an automatic checkout of all submodules in the project, so all test data submodules get checked out by default.

If you are running the tests locally, you may well want to do:

git submodule update --init

from the root nibabel directory. This will checkout all the test data repositories.

How much data should go in a single submodule?

The limiting factor is how long it takes travis-ci to checkout the data for the tests. Up to a hundred megabytes in one repository should be OK. The joy of submodules is we can always drop a submodule, split the repository into two and add only one back, so you aren’t committing us to anything awful if you accidentally put some very large files into your own data repository.

If in doubt

If you are not sure, try us with a pull request to nibabel github, or on the nipy mailing list, we will try to help.