.. _biap1:

################################
BIAP1 - Towards immutable images
################################

:Author: Matthew Brett
:Status: Rejected
:Type: Standards
:Created: 2011-03-23

**********
Resolution
**********

Retired as of nibabel 2.0 in favor of the exposed ``dataobj`` property.

See:

* http://nipy.org/nibabel/nibabel_images.html#the-image-data-array
* http://nipy.org/nibabel/images_and_memory.html

See the image ``in_memory`` attribute and ``uncache`` method. We haven't
implemented an ``is_as_loaded`` attribute yet.

**********
Background
**********

Nibabel implicitly has two types of images:

* array images
* proxy images

Array images
============

Array images are the images you get from a typical constructor call::

    import numpy as np
    import nibabel as nib
    arr = np.arange(24).reshape((2,3,4))
    img = nib.Nifti1Image(arr, np.eye(4))

``img`` here is an array image, that is to say that, internally, the private
``img._data`` attribute is a reference to ``arr`` above. ``img.get_data()``
just returns ``img._data``. If you modify ``arr``, you will modify the result
of ``img.get_data()``.

Proxy images
============

Proxy images are what you get from a call to ``load``::

    px_img = nib.load('test.nii')

It's a proxy image in the sense that, internally, ``px_img._data`` is a proxy
object that does not yet contain an array, but can get an array by the
application of::

    actual_arr = np.asarray(px_img._data)

This is in fact what ``px_img.get_data()`` does. Actually,
``px_img.get_data()`` also stores the read array in ``px_img._data``, so that::

    px_img = nib.load('test.nii')
    assert not isinstance(px_img._data, np.ndarray) # it's a proxy
    actual_arr = px_img.get_data()
    assert isinstance(px_img._data, np.ndarray) # it's an array now

So, at this point, if you change ``actual_arr`` you'll also be changing
``px_img._data`` and therefore the result of ``px_img.get_data()``. In other
words, ``actual_arr = px_img.get_data()`` turns the proxy image into an array
image.

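
The two behaviours described above can be modelled without nibabel. The
following is only a sketch under stated assumptions: ``ToyProxy`` and
``ToyImage`` are hypothetical stand-ins invented for illustration, plain lists
play the role of arrays, and a tuple plays the role of the file on disk::

    class ToyProxy:
        """Hypothetical stand-in for a proxy: can read values, holds no list."""

        def __init__(self, file_values):
            self._file_values = tuple(file_values)  # pretend this is on disk

        def read(self):
            return list(self._file_values)  # a fresh list on every read


    class ToyImage:
        """Hypothetical model of the behaviour described above (not nibabel)."""

        def __init__(self, data):
            self._data = data  # either a list ("array") or a ToyProxy

        def get_data(self):
            if isinstance(self._data, ToyProxy):
                # Reading replaces the proxy with the loaded values.
                self._data = self._data.read()
            return self._data


    # Array image: the image and the caller share one object.
    arr = [0, 1, 2, 3]
    img = ToyImage(arr)
    arr[0] = 99
    assert img.get_data()[0] == 99

    # Proxy image: the first get_data() call turns it into an array image.
    px_img = ToyImage(ToyProxy([0, 1, 2, 3]))
    assert isinstance(px_img._data, ToyProxy)
    actual = px_img.get_data()
    assert isinstance(px_img._data, list)
    actual[0] = 99
    assert px_img.get_data()[0] == 99

In this toy model, as in the description above, nothing in the public
interface tells the caller which of the two states an image is in.
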

Issues for design
=================

The code at the moment is a little bit confusing because:

* there isn't an explicit API to check if you have an array image or a proxy
  image and
* there isn't anywhere in the docs that you can go and see this distinction.

Use cases
=========

Loading images, minimizing memory
---------------------------------

I want to load lots of images, or several large images. I'm going to do
something with the image data. I want to minimize memory use. This tempts me
to do something like this::

    large_img1 = nib.load('large1.nii')
    large_img2 = nib.load('large2.nii')
    li1_mean = large_img1.get_data().mean()
    li2_mean = large_img2.get_data().mean()

The problem with the current design is that, after the ``li1_mean =`` line,
``large_img1`` has been unproxied, and there is now a huge array inside it.

Loading images, maximizing speed
--------------------------------

On the other hand, I might want to do the same thing, but each call to unproxy
the data (loading off disk, applying scalefactors) will be expensive. So, when
I do ``li1_mean = large_img1.get_data().mean()`` I want any subsequent call to
``large_img1.get_data()`` to be much faster. This is the case at the moment,
at the expense of the memory hit above.

Loading images, assert not modified
-----------------------------------

In pipelines in particular, we frequently want to load images, maybe have a
look at some parameters, and then pass that image *filename* to some other
program such as SPM or FSL.

At the moment we've got a problem::

    img = nib.load('test.nii')
    # do stuff
    run_spm_thing_on(img) # is 'img' the same as test.nii?

The problem is that when the routine ``run_spm_thing_on`` receives ``img``, it
can know that ``img`` has a filename, ``test.nii``, but it can't currently know
if ``img`` is the same object that it was when it was loaded. That is, it
can't know whether ``test.nii`` still corresponds to ``img`` or not.
In practice that means that ``run_spm_thing_on`` will need to save every
``img`` to another file before passing that filename to the SPM routine, just
in case ``img`` has been modified.

So, we would like a *dirty bit* for the image, something like::

    # Not implemented yet
    if not img.is_as_loaded():
        save(img, 'some_filename.nii')

The last line, like it or not, modifies ``img`` in-place.

Array images, proxy images, copy, view
======================================

With thanks to Roberto Viviani for some clarifying thoughts on the nipy mailing
list.

At the moment, ``img.get_data()`` always returns a reference to an array. That
is, whenever you call::

    data = img.get_data()

Then, if you modify ``data`` you will modify the next result of
``img.get_data()``. In particular, the interface currently intends that there
should be no functional difference between proxied images and non-proxied
images. The proposal below exposes a functional difference between them.

When do you want a copy and when do you want a view?
----------------------------------------------------

This is a discussion of this proposal::

    img.get_data(copy=True|False)

compared to::

    img.get_data(unproxy=True|False)

Summary:

* array images - you nearly always want a view
* proxy images - you may want a copy, but you want a copy only because you want
  to leave the image as a proxy. You might want to leave the image as a proxy
  because you want to be sure the image corresponds to the file, or save
  memory.

For array images, it doesn't make sense to return a copy from
``img.get_data()``, because it buys you nothing that you would not get from
``data = img.get_data().copy()``. This is because you can't save memory (the
image already contains the whole array), and it won't help you be sure that the
image has not been modified compared to the original array, because there may
be references to the array that existed before the image was made, that can be
used to modify the data.
So, for array images, you always want a reference, or you want to do a manual
copy, as above.

For proxied images, it does make sense to get a copy, because a) you want to
preserve memory by not unproxying the image, and / or b) you want to be able to
be sure that the file associated with the image still corresponds to the data.

For the ``img.get_data(copy=False)`` proposal, on a proxied image, the
``copy=False`` call, in order to return a view, must *implicitly* unproxy the
image. Similarly, ``img.get_data(unproxy=False)`` must *implicitly* copy the
image. It seems to me (MB) that an implicit copy is familiar to a numpy user,
but the implicit unproxying may be less obvious.

My (MB's) reason, then, for preferring ``unproxy`` to ``copy=True``,
``copy=False`` or ``get_data_copy()`` is that ``unproxy`` is closer to how I
think users would think about deciding what they want to do.

The ``unproxy=False`` case covers the situation where you want to preserve
memory.

It doesn't fully cover the cases where we want to keep track of when the image
data has been modified. Here there are three cases:

* array image, instantiated with an array; the image data can be modified using
  the array reference passed into the image - we can't know whether the data
  has been modified without doing hashing or similar.
* proxy image; the array data is still in the file, so we know it corresponds
  to the file.
* proxy images that have been converted to array images, but have not passed
  out a reference to the data. Let's call these *shy unproxied* images. For
  example, with an API like this::

      img = load('test.nii')
      data = img.get_data(copy=True)

  the ``img`` is now an array image, but there's no public reference to the
  internal array object. Someone could get one by cheating with
  ``ref = img._data``, but we don't need to worry about that - following
  Python's "mess around if you like but take the consequences" philosophy.

Proposal
========

An ``is_proxy`` property::

    img.is_proxy

This is just for clarity.

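
A minimal sketch of how ``is_proxy`` might interact with the ``unproxy``
keyword discussed above. Everything here is an assumption made for
illustration: ``ToyProxy`` and ``ToyImage`` are hypothetical classes, not part
of nibabel, and plain lists stand in for arrays::

    class ToyProxy:
        """Hypothetical stand-in for a proxy: can read values, holds no list."""

        def __init__(self, file_values):
            self._file_values = tuple(file_values)  # pretend this is on disk

        def read(self):
            return list(self._file_values)  # a fresh list on every read


    class ToyImage:
        """Hypothetical model of the proposed interface (not nibabel)."""

        def __init__(self, data):
            self._data = data  # either a list ("array") or a ToyProxy

        @property
        def is_proxy(self):
            return isinstance(self._data, ToyProxy)

        def get_data(self, unproxy=True):
            if not self.is_proxy:
                return self._data  # array image: ``unproxy`` is ignored
            loaded = self._data.read()
            if unproxy:
                self._data = loaded  # keep the values: now an array image
            return loaded  # with unproxy=False, an independent copy


    img = ToyImage(ToyProxy([0, 1, 2, 3]))

    # unproxy=False: the caller gets a copy and the image stays a proxy.
    data_copy = img.get_data(unproxy=False)
    data_copy[0] = 99
    assert img.is_proxy
    assert img.get_data(unproxy=False)[0] == 0

    # Default unproxy=True: the image becomes an array image, caller has a view.
    view = img.get_data()
    view[0] = 99
    assert not img.is_proxy
    assert img.get_data()[0] == 99

In this sketch ``is_proxy`` alone is enough to tell whether the data can still
differ from the file: while it is ``True``, no reference to the image's data
has been handed out.
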

Allow the user to specify what unproxying they want with a kwarg to
``get_data()``::

    arr = large_img1.get_data(unproxy=False)

* for proxied images, ``unproxy=False`` would leave the underlying array data
  as a pointer to the file. The returned ``arr`` would therefore be a copy of
  the data as loaded from file, and ``arr[0] = 99`` would have no effect on the
  data in the image. ``unproxy=True`` would convert the proxy image into an
  array image (load the data into memory, return a reference). Here
  ``arr[0] = 99`` would affect the data in the image.
* for array images, ``unproxy`` would always be ignored.

Thus ``unproxy=True`` in fact means
``unproxy_if_this_is_a_proxy_do_nothing_otherwise``.

The default would continue to be ``unproxy=True`` so that the proxied image
would continue, by default, to behave the same way as an unproxied image
(``get_data`` returns a view).

If ``img.is_proxy`` is True, then we know that the array data has not changed.
We then need to be sure that the ``header`` and ``affine`` data haven't
changed. We might be able to do this with default ``copy`` kwargs to the
``get_header`` and ``get_affine`` methods::

    hdr = img.get_header(copy=True) # will be default
    aff = img.get_affine(copy=True) # will be default

We could also do that by caching the original header and affine, but the
header in particular can be rather large.

For the next version of nibabel, for backwards compatibility, we'll set
``copy=False`` to be the default, but warn about the upcoming change. After
that we'll set ``copy=True`` as the default.

Now we can know whether the image has been modified, because if ``get_header``
and ``get_affine`` have only been called with ``copy=True`` and
``img.is_proxy == True`` - then it must be the same as when loaded.

This leads to an ``is_as_loaded`` property::

    if img.is_as_loaded:
        fname = img.get_filename()
    else:
        fname = 'tempname.nii'
        save(img, fname)

Questions
=========

Should there also be a ``set_header`` and ``set_affine`` method?
The header may conflict with the affine. So, would we need something like::

    img.set_header(hdr, hdr_affine_from='affine')

or some other nasty syntax. Or can we avoid this and just do::

    img2 = nib.Nifti1Image(img.get_data(), new_affine, new_header)

?

How about the names in the proposal? ``is_proxy``; ``unproxy=True``?

.. vim: ft=rst