algorithms.statistics.empirical_pvalue

Module: algorithms.statistics.empirical_pvalue

Inheritance diagram for nipy.algorithms.statistics.empirical_pvalue:

Inheritance diagram of nipy.algorithms.statistics.empirical_pvalue

Routines to get corrected p-values estimates, based on the observations.

It implements 3 approaches:

  • Benjamini-Hochberg FDR: http://en.wikipedia.org/wiki/False_discovery_rate

  • a class that fits a Gaussian model to the central part of an histogram, following [1]

    [1] Schwartzman A, Dougherty RF, Lee J, Ghahremani D, Taylor JE. Empirical null and false discovery rate analysis in neuroimaging. Neuroimage. 2009 Jan 1;44(1):71-82. PubMed PMID: 18547821. DOI: 10.1016/j.neuroimage.2008.04.182

    This is typically necessary to estimate a FDR when one is not certain that the data behaves as a standard normal under H_0.

  • a model based on Gaussian mixture modelling ‘a la Oxford’

Author : Bertrand Thirion, Yaroslav Halchenko, 2008-2012

Class

NormalEmpiricalNull

class nipy.algorithms.statistics.empirical_pvalue.NormalEmpiricalNull(x)

Bases: object

Class to compute the empirical null normal fit to the data.

The data which is used to estimate the FDR, assuming a Gaussian null from Schwartzmann et al., NeuroImage 44 (2009) 71–82

__init__(x)

Initialize an empirical null normal object.

Parameters:

x : 1D ndarray

The data used to estimate the empirical null.

fdr(theta)

Given a threshold theta, find the estimated FDR

Parameters:

theta : float or array of shape (n_samples)

values to test

Returns:

afp : value of array of shape(n)

fdrcurve()

Returns the FDR associated with any point of self.x

learn(left=0.2, right=0.8)

Estimate the proportion, mean and variance of a Gaussian distribution for a fraction of the data

Parameters:

left: float, optional

Left cut parameter to prevent fitting non-gaussian data

right: float, optional

Right cut parameter to prevent fitting non-gaussian data

Notes

This method stores the following attributes:

  • mu = mu
  • p0 = min(1, np.exp(lp0))
  • sqsigma: variance of the estimated normal distribution
  • sigma: np.sqrt(sqsigma) : standard deviation of the estimated normal distribution
plot(efp=None, alpha=0.05, bar=1, mpaxes=None)

Plot the histogram of x

Parameters:

efp : float, optional

The empirical FDR (corresponding to x) if efp==None, the false positive rate threshold plot is not drawn.

alpha : float, optional

The chosen FDR threshold

bar=1 : bool, optional

mpaxes=None: if not None, handle to an axes where the fig

will be drawn. Avoids creating unnecessarily new figures

threshold(alpha=0.05, verbose=0)

Compute the threshold corresponding to an alpha-level FDR for x

Parameters:

alpha : float, optional

the chosen false discovery rate threshold.

verbose : boolean, optional

the verbosity level, if True a plot is generated.

Returns:

theta: float

the critical value associated with the provided FDR

uncorrected_threshold(alpha=0.001, verbose=0)

Compute the threshold corresponding to a specificity alpha for x

Parameters:

alpha : float, optional

the chosen false discovery rate (FDR) threshold.

verbose : boolean, optional

the verbosity level, if True a plot is generated.

Returns:

theta: float

the critical value associated with the provided p-value

Functions

nipy.algorithms.statistics.empirical_pvalue.check_p_values(p_values)

Basic checks on the p_values array: values should be within [0,1]

Assures also that p_values are at least in 1d array. None of the checks is performed if p_values is None.

Parameters:

p_values : array of shape (n)

The sample p-values

Returns:

p_values : array of shape (n)

The sample p-values

nipy.algorithms.statistics.empirical_pvalue.fdr(p_values=None, verbose=0)

Returns the FDR associated with each p value

Parameters:

p_values : ndarray of shape (n)

The samples p-value

Returns:

q : array of shape(n)

The corresponding fdr values

nipy.algorithms.statistics.empirical_pvalue.fdr_threshold(p_values, alpha=0.05)

Return FDR threshold given p values

Parameters:

p_values : array of shape (n), optional

The samples p-value

alpha : float, optional

The desired FDR significance

Returns:

critical_p_value: float

The p value corresponding to the FDR alpha

nipy.algorithms.statistics.empirical_pvalue.gamma_gaussian_fit(x, test=None, verbose=0, mpaxes=False, bias=1, gaussian_mix=0, return_estimator=False)

Computing some prior probabilities that the voxels of a certain map are in class disactivated, null or active using a gamma-Gaussian mixture

Parameters:

x: array of shape (nvox,)

the map to be analysed

test: array of shape (nbitems,), optional

the test values for which the p-value needs to be computed by default, test = x

verbose: 0, 1 or 2, optional

verbosity mode, 0 is quiet, and 2 calls matplotlib to display graphs.

mpaxes: matplotlib axes, optional

axes handle used to plot the figure in verbose mode if None, new axes are created if false, nothing is done

bias: float, optional

lower bound on the Gaussian variance (to avoid shrinkage)

gaussian_mix: float, optional

if nonzero, lower bound on the Gaussian mixing weight (to avoid shrinkage)

return_estimator: boolean, optional

if return_estimator is true, the estimator object is returned.

Returns:

bfp: array of shape (nbitems,3)

The probability of each component in the mixture model for each test value

estimator: nipy.labs.clustering.ggmixture.GGGM object

The estimator object, returned only if return_estimator is true.

nipy.algorithms.statistics.empirical_pvalue.gaussian_fdr(x)

Return the FDR associated with each value assuming a Gaussian distribution

nipy.algorithms.statistics.empirical_pvalue.gaussian_fdr_threshold(x, alpha=0.05)

Return FDR threshold given normal variates

Given an array x of normal variates, this function returns the critical p-value associated with alpha. x is explicitly assumed to be normal distributed under H_0

Parameters:

x: ndarray

input data

alpha: float, optional

desired significance

Returns:

threshold : float

threshold, given as a Gaussian critical value

nipy.algorithms.statistics.empirical_pvalue.smoothed_histogram_from_samples(x, bins=None, nbins=256, normalized=False)

Smooth histogram corresponding to density underlying the samples in x

Parameters:

x: array of shape(n_samples)

input data

bins: array of shape(nbins+1), optional

the bins location

nbins: int, optional

the number of bins of the resulting histogram

normalized: bool, optional

if True, the result is returned as a density value

Returns:

h: array of shape (nbins)

the histogram

bins: array of shape(nbins+1),

the bins location

nipy.algorithms.statistics.empirical_pvalue.three_classes_GMM_fit(x, test=None, alpha=0.01, prior_strength=100, verbose=0, fixed_scale=False, mpaxes=None, bias=0, theta=0, return_estimator=False)

Fit the data with a 3-classes Gaussian Mixture Model, i.e. compute some probability that the voxels of a certain map are in class disactivated, null or active

Parameters:

x: array of shape (nvox,1)

The map to be analysed

test: array of shape(nbitems,1), optional

the test values for which the p-value needs to be computed by default (if None), test=x

alpha: float, optional

the prior weights of the positive and negative classes

prior_strength: float, optional

the confidence on the prior (should be compared to size(x))

verbose: int

verbosity mode

fixed_scale: bool, optional

boolean, variance parameterization. if True, the variance is locked to 1 otherwise, it is estimated from the data

mpaxes:

axes handle used to plot the figure in verbose mode if None, new axes are created

bias: bool

allows a rescaling of the posterior probability that takes into account the threshold theta. Not rigorous.

theta: float

the threshold used to correct the posterior p-values when bias=1; normally, it is such that test>theta note that if theta = -np.inf, the method has a standard behaviour

return_estimator: boolean, optional

If return_estimator is true, the estimator object is returned.

Returns:

bfp : array of shape (nbitems,3):

the posterior probability of each test item belonging to each component in the GMM (sum to 1 across the 3 classes) if np.size(test)==0, i.e. nbitem==0, None is returned

estimator : nipy.labs.clustering.GMM object

The estimator object, returned only if return_estimator is true.

Notes

Our convention is that:

  • class 1 represents the negative class
  • class 2 represents the null class
  • class 3 represents the positive class