# algorithms.statistics.empirical_pvalue¶

## Module: algorithms.statistics.empirical_pvalue¶

Inheritance diagram for nipy.algorithms.statistics.empirical_pvalue: Routines to get corrected p-values estimates, based on the observations.

It implements 3 approaches:

• Benjamini-Hochberg FDR: http://en.wikipedia.org/wiki/False_discovery_rate

• a class that fits a Gaussian model to the central part of an histogram, following 

 Schwartzman A, Dougherty RF, Lee J, Ghahremani D, Taylor JE. Empirical null and false discovery rate analysis in neuroimaging. Neuroimage. 2009 Jan 1;44(1):71-82. PubMed PMID: 18547821. DOI: 10.1016/j.neuroimage.2008.04.182

This is typically necessary to estimate a FDR when one is not certain that the data behaves as a standard normal under H_0.

• a model based on Gaussian mixture modelling ‘a la Oxford’

Author : Bertrand Thirion, Yaroslav Halchenko, 2008-2012

## NormalEmpiricalNull¶

class nipy.algorithms.statistics.empirical_pvalue.NormalEmpiricalNull(x)

Bases: object

Class to compute the empirical null normal fit to the data.

The data which is used to estimate the FDR, assuming a Gaussian null from Schwartzmann et al., NeuroImage 44 (2009) 71–82

__init__(x)

Initialize an empirical null normal object.

Parameters: x : 1D ndarray The data used to estimate the empirical null.
fdr(theta)

Given a threshold theta, find the estimated FDR

Parameters: theta : float or array of shape (n_samples) values to test afp : value of array of shape(n)
fdrcurve()

Returns the FDR associated with any point of self.x

learn(left=0.2, right=0.8)

Estimate the proportion, mean and variance of a Gaussian distribution for a fraction of the data

Parameters: left: float, optional Left cut parameter to prevent fitting non-gaussian data right: float, optional Right cut parameter to prevent fitting non-gaussian data

Notes

This method stores the following attributes:

• mu = mu
• p0 = min(1, np.exp(lp0))
• sqsigma: variance of the estimated normal distribution
• sigma: np.sqrt(sqsigma) : standard deviation of the estimated normal distribution
plot(efp=None, alpha=0.05, bar=1, mpaxes=None)

Plot the histogram of x

Parameters: efp : float, optional The empirical FDR (corresponding to x) if efp==None, the false positive rate threshold plot is not drawn. alpha : float, optional The chosen FDR threshold bar=1 : bool, optional mpaxes=None: if not None, handle to an axes where the fig will be drawn. Avoids creating unnecessarily new figures
threshold(alpha=0.05, verbose=0)

Compute the threshold corresponding to an alpha-level FDR for x

Parameters: alpha : float, optional the chosen false discovery rate threshold. verbose : boolean, optional the verbosity level, if True a plot is generated. theta: float the critical value associated with the provided FDR
uncorrected_threshold(alpha=0.001, verbose=0)

Compute the threshold corresponding to a specificity alpha for x

Parameters: alpha : float, optional the chosen false discovery rate (FDR) threshold. verbose : boolean, optional the verbosity level, if True a plot is generated. theta: float the critical value associated with the provided p-value

## Functions¶

nipy.algorithms.statistics.empirical_pvalue.check_p_values(p_values)

Basic checks on the p_values array: values should be within [0,1]

Assures also that p_values are at least in 1d array. None of the checks is performed if p_values is None.

Parameters: p_values : array of shape (n) The sample p-values p_values : array of shape (n) The sample p-values
nipy.algorithms.statistics.empirical_pvalue.fdr(p_values=None, verbose=0)

Returns the FDR associated with each p value

Parameters: p_values : ndarray of shape (n) The samples p-value q : array of shape(n) The corresponding fdr values
nipy.algorithms.statistics.empirical_pvalue.fdr_threshold(p_values, alpha=0.05)

Return FDR threshold given p values

Parameters: p_values : array of shape (n), optional The samples p-value alpha : float, optional The desired FDR significance critical_p_value: float The p value corresponding to the FDR alpha
nipy.algorithms.statistics.empirical_pvalue.gamma_gaussian_fit(x, test=None, verbose=0, mpaxes=False, bias=1, gaussian_mix=0, return_estimator=False)

Computing some prior probabilities that the voxels of a certain map are in class disactivated, null or active using a gamma-Gaussian mixture

Parameters: x: array of shape (nvox,) the map to be analysed test: array of shape (nbitems,), optional the test values for which the p-value needs to be computed by default, test = x verbose: 0, 1 or 2, optional verbosity mode, 0 is quiet, and 2 calls matplotlib to display graphs. mpaxes: matplotlib axes, optional axes handle used to plot the figure in verbose mode if None, new axes are created if false, nothing is done bias: float, optional lower bound on the Gaussian variance (to avoid shrinkage) gaussian_mix: float, optional if nonzero, lower bound on the Gaussian mixing weight (to avoid shrinkage) return_estimator: boolean, optional if return_estimator is true, the estimator object is returned. bfp: array of shape (nbitems,3) The probability of each component in the mixture model for each test value estimator: nipy.labs.clustering.ggmixture.GGGM object The estimator object, returned only if return_estimator is true.
nipy.algorithms.statistics.empirical_pvalue.gaussian_fdr(x)

Return the FDR associated with each value assuming a Gaussian distribution

nipy.algorithms.statistics.empirical_pvalue.gaussian_fdr_threshold(x, alpha=0.05)

Return FDR threshold given normal variates

Given an array x of normal variates, this function returns the critical p-value associated with alpha. x is explicitly assumed to be normal distributed under H_0

Parameters: x: ndarray input data alpha: float, optional desired significance threshold : float threshold, given as a Gaussian critical value
nipy.algorithms.statistics.empirical_pvalue.smoothed_histogram_from_samples(x, bins=None, nbins=256, normalized=False)

Smooth histogram corresponding to density underlying the samples in x

Parameters: x: array of shape(n_samples) input data bins: array of shape(nbins+1), optional the bins location nbins: int, optional the number of bins of the resulting histogram normalized: bool, optional if True, the result is returned as a density value h: array of shape (nbins) the histogram bins: array of shape(nbins+1), the bins location
nipy.algorithms.statistics.empirical_pvalue.three_classes_GMM_fit(x, test=None, alpha=0.01, prior_strength=100, verbose=0, fixed_scale=False, mpaxes=None, bias=0, theta=0, return_estimator=False)

Fit the data with a 3-classes Gaussian Mixture Model, i.e. compute some probability that the voxels of a certain map are in class disactivated, null or active

Parameters: x: array of shape (nvox,1) The map to be analysed test: array of shape(nbitems,1), optional the test values for which the p-value needs to be computed by default (if None), test=x alpha: float, optional the prior weights of the positive and negative classes prior_strength: float, optional the confidence on the prior (should be compared to size(x)) verbose: int verbosity mode fixed_scale: bool, optional boolean, variance parameterization. if True, the variance is locked to 1 otherwise, it is estimated from the data mpaxes: axes handle used to plot the figure in verbose mode if None, new axes are created bias: bool allows a rescaling of the posterior probability that takes into account the threshold theta. Not rigorous. theta: float the threshold used to correct the posterior p-values when bias=1; normally, it is such that test>theta note that if theta = -np.inf, the method has a standard behaviour return_estimator: boolean, optional If return_estimator is true, the estimator object is returned. bfp : array of shape (nbitems,3): the posterior probability of each test item belonging to each component in the GMM (sum to 1 across the 3 classes) if np.size(test)==0, i.e. nbitem==0, None is returned estimator : nipy.labs.clustering.GMM object The estimator object, returned only if return_estimator is true.

Notes

Our convention is that:

• class 1 represents the negative class
• class 2 represents the null class
• class 3 represents the positive class