algorithms.utils.pca¶
Module: algorithms.utils.pca¶
This module provides a class for principal components analysis (PCA).
PCA is an orthonormal, linear transform (i.e., a rotation) that maps the data to a new coordinate system such that the maximal variability of the data lies on the first coordinate (or the first principal component), the second greatest variability is projected onto the second coordinate, and so on. The resulting data has unit covariance (i.e., it is decorrelated). This technique can be used to reduce the dimensionality of the data.
More specifically, the data is projected onto the eigenvectors of the covariance matrix.
Functions¶
- nipy.algorithms.utils.pca.pca(data, axis=0, mask=None, ncomp=None, standardize=True, design_keep=None, design_resid='mean', tol_ratio=0.01)¶
- Compute the SVD PCA of an array-like thing over axis. - Parameters:
- datandarray-like (float)
- The array on which to perform PCA over axis axis (below) 
- axisint, optional
- The axis over which to perform PCA (axis identifying observations). Default is 0 (first) 
- maskndarray-like (np.bool_), optional
- An optional mask, should have shape given by data axes, with axis removed, i.e.: - s = data.shape; s.pop(axis); msk_shape=s
- ncomp{None, int}, optional
- How many component basis projections to return. If ncomp is None (the default) then the number of components is given by the calculated rank of the data, after applying design_keep, design_resid and tol_ratio below. We always return all the basis vectors and percent variance for each component; ncomp refers only to the number of basis_projections returned. 
- standardizebool, optional
- If True, standardize so each time series (after application of design_keep and design_resid) has the same standard deviation, as calculated by the - np.stdfunction.
- design_keepNone or ndarray, optional
- Data is projected onto the column span of design_keep. None (default) equivalent to - np.identity(data.shape[axis])
- design_residstr or None or ndarray, optional
- After projecting onto the column span of design_keep, data is projected perpendicular to the column span of this matrix. If None, we do no such second projection. If a string ‘mean’, then the mean of the data is removed, equivalent to passing a column vector matrix of 1s. 
- tol_ratiofloat, optional
- If - XZis the vector of singular values of the projection matrix from design_keep and design_resid, and S are the singular values of- XZ, then tol_ratio is the value used to calculate the effective rank of the projection of the design, as in- rank = ((S / S.max) > tol_ratio).sum()
 
- Returns:
- resultsdict
- \(G\) is the number of non-trivial components found after applying - tol_ratio to the projections of design_keep and design_resid. - results has keys: - basis_vectors: series over axis, shape (data.shape[axis], G) -
- the eigenvectors of the PCA 
 
- pcnt_var: percent variance explained by component, shape
- (G,) 
 
- basis_projections: PCA components, with components varying
- over axis axis; thus shape given by: - s = list(data.shape); s[axis] = ncomp
 
- axis: axis over which PCA has been performed.
 
 
 - Notes - See - pca_image.mfrom- fmristatfor Keith Worsley’s code on which some of this is based.- See: http://en.wikipedia.org/wiki/Principal_component_analysis for some inspiration for naming - particularly ‘basis_vectors’ and ‘basis_projections’ - Examples - >>> arr = np.random.normal(size=(17, 10, 12, 14)) >>> msk = np.all(arr > -2, axis=0) >>> res = pca(arr, mask=msk, ncomp=9) - Basis vectors are columns. There is one column for each component. The number of components is the calculated rank of the data matrix after applying the various projections listed in the parameters. In this case we are only removing the mean, so the number of components is one less than the axis over which we do the PCA (here axis=0 by default). - >>> res['basis_vectors'].shape (17, 16) - Basis projections are arrays with components in the dimension over which we have done the PCA (axis=0 by default). Because we set ncomp above, we only retain ncomp components. - >>> res['basis_projections'].shape (9, 10, 12, 14) 
- nipy.algorithms.utils.pca.pca_image(img, axis='t', mask=None, ncomp=None, standardize=True, design_keep=None, design_resid='mean', tol_ratio=0.01)¶
- Compute the PCA of an image over a specified axis - Parameters:
- imgImage
- The image on which to perform PCA over the given axis 
- axisstr or int, optional
- Axis over which to perform PCA. Default is ‘t’. If axis is an integer, gives the index of the input (domain) axis of img. If axis is a str, can be an input (domain) name, or an output (range) name, that maps to an input (domain) name. 
- maskImage, optional
- An optional mask, should have shape == image.shape[:3] and the same coordinate map as img but with axis dropped 
- ncomp{None, int}, optional
- How many component basis projections to return. If ncomp is None (the default) then the number of components is given by the calculated rank of the data, after applying design_keep, design_resid and tol_ratio below. We always return all the basis vectors and percent variance for each component; ncomp refers only to the number of basis_projections returned. 
- standardizebool, optional
- If True, standardize so each time series (after application of design_keep and design_resid) has the same standard deviation, as calculated by the - np.stdfunction.
- design_keepNone or ndarray, optional
- Data is projected onto the column span of design_keep. None (default) equivalent to - np.identity(data.shape[axis])
- design_residstr or None or ndarray, optional
- After projecting onto the column span of design_keep, data is projected perpendicular to the column span of this matrix. If None, we do no such second projection. If a string ‘mean’, then the mean of the data is removed, equivalent to passing a column vector matrix of 1s. 
- tol_ratiofloat, optional
- If - XZis the vector of singular values of the projection matrix from design_keep and design_resid, and S are the singular values of- XZ, then tol_ratio is the value used to calculate the effective rank of the projection of the design, as in- rank = ((S / S.max) > tol_ratio).sum()
 
- Returns:
- resultsdict
- \(L\) is the number of non-trivial components found after applying - tol_ratio to the projections of design_keep and design_resid. - results has keys: * - basis_vectors: series over axis, shape (data.shape[axis], L) -- the eigenvectors of the PCA - pcnt_var: percent variance explained by component, shape
- (L,) 
 
- basis_projections: PCA components, with components varying
- over axis axis; thus shape given by: - s = list(data.shape); s[axis] = ncomp
 
- axis: axis over which PCA has been performed.
 
 
 - Examples - >>> from nipy.testing import funcfile >>> from nipy import load_image >>> func_img = load_image(funcfile) - Time is the fourth axis - >>> func_img.coordmap.function_range CoordinateSystem(coord_names=('aligned-x=L->R', 'aligned-y=P->A', 'aligned-z=I->S', 't'), name='aligned', coord_dtype=float64) >>> func_img.shape (17, 21, 3, 20) - Calculate the PCA over time, by default - >>> res = pca_image(func_img) >>> res['basis_projections'].coordmap.function_range CoordinateSystem(coord_names=('aligned-x=L->R', 'aligned-y=P->A', 'aligned-z=I->S', 'PCA components'), name='aligned', coord_dtype=float64) - The number of components is one less than the number of time points - >>> res['basis_projections'].shape (17, 21, 3, 19) 
