algorithms.clustering.imm

Module: algorithms.clustering.imm

Inheritance diagram of nipy.algorithms.clustering.imm

Infinite mixture model: a generalization of Bayesian mixture models with an unspecified number of classes.

Classes

IMM

class nipy.algorithms.clustering.imm.IMM(alpha=0.5, dim=1)

Bases: nipy.algorithms.clustering.bgmm.BGMM

This class implements the Infinite Gaussian Mixture model, or Dirichlet Process Mixture model. It is simply a generalization of Bayesian Gaussian Mixture models with an unknown number of classes.

__init__(alpha=0.5, dim=1)
Parameters:

alpha: float, optional,

the parameter for cluster creation

dim: int, optional,

the dimension of the data

Note: use the function set_priors() to set adapted priors
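
A minimal construction sketch (the random data and its dimension are illustrative, not prescribed by the API)::

    import numpy as np
    from nipy.algorithms.clustering.imm import IMM

    x = np.random.randn(100, 2)    # illustrative 2-dimensional data
    model = IMM(alpha=0.5, dim=2)  # alpha governs the creation of new clusters
    model.set_priors(x)            # weakly informative priors derived from the data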

cross_validated_update(x, z, plike, kfold=10)

This is a step in the sampling procedure that uses internal cross-validation.

Parameters:

x: array of shape(n_samples, dim),

the input data

z: array of shape(n_samples),

the associated membership variables

plike: array of shape(n_samples),

the likelihood under the prior

kfold: int, or array of shape(n_samples), optional,

folds in the cross-validation loop

Returns:

like: array of shape(n_samples),

the (cross-validated) likelihood of the data

likelihood(x, plike=None)

Return the likelihood of the model for the data x; the values are weighted by the component weights.

Parameters:

x: array of shape (n_samples, self.dim),

the data used in the estimation process

plike: array of shape (n_samples), optional,

the density of each point under the prior

Returns:

like: array of shape (nbitem, self.k),

component-wise likelihood

likelihood_under_the_prior(x)

Computes the likelihood of x under the prior

Parameters:

x: array of shape (n_samples, self.dim)

Returns:

w: the likelihood of x under the prior model (unweighted)

reduce(z)

Reduce the assignments by removing empty clusters and update self.k

Parameters:

z: array of shape(n),

a vector of membership variables, changed in place

Returns:

z: the remapped values
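
The remapping is equivalent in spirit to relabelling through np.unique; a standalone sketch (not the library code)::

    import numpy as np

    z = np.array([0, 3, 3, 5, 0])  # clusters 1, 2 and 4 are empty
    _, z_remapped = np.unique(z, return_inverse=True)
    # z_remapped is [0, 1, 1, 2, 0] and the cluster count becomes 3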

sample(x, niter=1, sampling_points=None, init=False, kfold=None, verbose=0)

Sample the indicator and parameters.

Parameters:

x: array of shape (n_samples, self.dim)

the data used in the estimation process

niter: int,

the number of iterations to perform

sampling_points: array of shape(nbpoints, self.dim), optional

points where the likelihood will be sampled; this defaults to x

kfold: int or array, optional,

parameter controlling the cross-validation. By default, no cross-validation is used; the procedure is then faster but less accurate.

verbose: int, optional

verbosity mode (default 0)

Returns:

likelihood: array of shape(nbpoints)

total likelihood of the model
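
A sampling sketch following the signature above (data, dimension and iteration count are illustrative)::

    import numpy as np
    from nipy.algorithms.clustering.imm import IMM

    x = np.random.randn(200, 2)
    model = IMM(alpha=0.5, dim=2)
    model.set_priors(x)
    like = model.sample(x, niter=100, init=True)  # likelihood at each point of x
    print(like.shape)                             # (200,)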

sample_indicator(like)

Sample the indicator from the likelihood

Parameters:

like: array of shape (nbitem,self.k)

component-wise likelihood

Returns:

z: array of shape(nbitem): a draw of the membership variable

Notes

The behaviour differs from the standard BGMM in that z can take arbitrary values

set_constant_densities(prior_dens=None)

Set the null and prior densities as constant (assuming a compact domain)

Parameters:

prior_dens: float, optional

constant for the prior density

set_priors(x)

Set the priors so that they are weakly informative. This follows Fraley and Raftery, Journal of Classification 24:155-181 (2007).

Parameters:

x: array of shape (n_samples, self.dim)

the data used in the estimation process

simple_update(x, z, plike)

One step in the sampling procedure (one data sweep); unlike cross_validated_update, it does not use cross-validation.

Parameters:

x: array of shape(n_samples, dim),

the input data

z: array of shape(n_samples),

the associated membership variables

plike: array of shape(n_samples),

the likelihood under the prior

Returns:

like: array of shape(n_samples),

the likelihood of the data

update(x, z)

Update function (draw a sample of the IMM parameters)

Parameters:

x: array of shape (n_samples, self.dim)

the data used in the estimation process

z: array of shape (n_samples), of integer type

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter.

Parameters:

z: array of shape (n_samples), of integer type

the allocation variable

MixedIMM

class nipy.algorithms.clustering.imm.MixedIMM(alpha=0.5, dim=1)

Bases: nipy.algorithms.clustering.imm.IMM

Particular IMM with an additional null class. The data is supplied together with a sample-related probability of being under the null.

__init__(alpha=0.5, dim=1)
Parameters:

alpha: float, optional,

the parameter for cluster creation

dim: int, optional,

the dimension of the data

Note: use the function set_priors() to set adapted priors

cross_validated_update(x, z, plike, null_class_proba, kfold=10)

This is a step in the sampling procedure that uses internal cross-validation.

Parameters:

x: array of shape(n_samples, dim),

the input data

z: array of shape(n_samples),

the associated membership variables

plike: array of shape(n_samples),

the likelihood under the prior

null_class_proba: array of shape(n_samples),

prior probability to be under the null

kfold: int or array, optional,

number of folds in the cross-validation loop, or set of indexes for the cross-validation procedure

Returns:

like: array of shape(n_samples),

the (cross-validated) likelihood of the data

z: array of shape(n_samples),

the associated membership variables

Notes

When kfold is an array, there is an internal reshuffling to randomize the order of updates

sample(x, null_class_proba, niter=1, sampling_points=None, init=False, kfold=None, co_clustering=False, verbose=0)

Sample the indicator and parameters.

Parameters:

x: array of shape (n_samples, self.dim),

the data used in the estimation process

null_class_proba: array of shape(n_samples),

the probability to be under the null

niter: int,

the number of iterations to perform

sampling_points: array of shape(nbpoints, self.dim), optional

points where the likelihood will be sampled; this defaults to x

kfold: int, optional,

parameter controlling the cross-validation. By default, no cross-validation is used; the procedure is then faster but less accurate.

co_clustering: bool, optional

if True, return a model of data co-labelling across iterations

verbose: int, optional

verbosity mode (default 0)

Returns:

likelihood: array of shape(nbpoints)

total likelihood of the model

pproba: array of shape(n_samples),

the posterior probability of being under the null (the posterior counterpart of null_class_proba)

coclust: only if co_clustering==True,

sparse matrix of shape (n_samples, n_samples), the frequency of co-labelling of each pair of samples across iterations
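
A usage sketch with a null class (the uniform null probability and the constant null density are illustrative choices)::

    import numpy as np
    from nipy.algorithms.clustering.imm import MixedIMM

    x = np.random.randn(200, 2)
    null_proba = 0.5 * np.ones(200)  # prior probability of being under the null
    model = MixedIMM(alpha=0.5, dim=2)
    model.set_priors(x)
    model.set_constant_densities(null_dens=1.0)  # illustrative constant null density
    like, pproba = model.sample(x, null_proba, niter=100, init=True)
    # pproba[i] is the posterior probability that sample i is under the null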

sample_indicator(like, null_class_proba)

Sample the indicator from the likelihood.

Parameters:

like: array of shape (nbitem,self.k)

component-wise likelihood

null_class_proba: array of shape(n_samples),

prior probability to be under the null

Returns:

z: array of shape(nbitem): a draw of the membership variable

Notes

Here z = -1 encodes the null class.
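
For instance, a draw can be split into null and active samples (a sketch, assuming z was returned by sample_indicator)::

    null_mask = (z == -1)          # samples assigned to the null class
    active_labels = z[~null_mask]  # cluster labels of the remaining samples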

set_constant_densities(null_dens=None, prior_dens=None)

Set the null and prior densities as constant (over a supposedly compact domain)

Parameters:

null_dens: float, optional

constant for the null density

prior_dens: float, optional

constant for the prior density

simple_update(x, z, plike, null_class_proba)

One step in the sampling procedure (one data sweep)

Parameters:

x: array of shape(n_samples, dim),

the input data

z: array of shape(n_samples),

the associated membership variables

plike: array of shape(n_samples),

the likelihood under the prior

null_class_proba: array of shape(n_samples),

prior probability to be under the null

Returns:

like: array of shape(n_samples),

the likelihood of the data under the H1 hypothesis

Functions

nipy.algorithms.clustering.imm.co_labelling(z, kmax=None, kmin=None)

Return a sparse co-labelling matrix given the label vector z.

Parameters:

z: array of shape(n_samples),

the input labels

kmax: int, optional,

considers only the labels in the range [0, kmax)

Returns:

colabel: a sparse coo_matrix,

yields the co-labelling of the data, i.e. colabel[i, j] = 1 if z[i] == z[j], and 0 otherwise
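
The returned matrix can be reproduced conceptually as follows (a standalone sketch, not the library implementation)::

    import numpy as np
    from scipy.sparse import coo_matrix

    z = np.array([0, 1, 0, 2])
    i, j = np.where(z[:, np.newaxis] == z[np.newaxis, :])
    colabel = coo_matrix((np.ones(len(i)), (i, j)), shape=(len(z), len(z)))
    # colabel[i, j] == 1 whenever z[i] == z[j]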

nipy.algorithms.clustering.imm.main()

Illustrative example of the behaviour of the IMM.