# algorithms.clustering.imm

## Module: algorithms.clustering.imm

Inheritance diagram for nipy.algorithms.clustering.imm.

Infinite mixture model: a generalization of Bayesian mixture models with an unspecified number of classes.

## Classes

### IMM

class nipy.algorithms.clustering.imm.IMM(alpha=0.5, dim=1)

The class implements the Infinite Gaussian Mixture model, also known as the Dirichlet Process Mixture Model. This is simply a generalization of Bayesian Gaussian Mixture Models with an unknown number of classes.

__init__(alpha=0.5, dim=1)
Parameters:

- alpha: float, optional, the parameter for cluster creation
- dim: int, optional, the dimension of the data

Note: use the function set_priors() to set adapted priors.
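A minimal usage sketch based only on the signatures documented on this page; the data and parameter values below are made up for illustration.

```python
import numpy as np
from nipy.algorithms.clustering.imm import IMM

# Made-up one-dimensional data with two apparent groups (illustration only)
x = np.concatenate((np.random.randn(50), 5 + np.random.randn(50))).reshape(-1, 1)

model = IMM(alpha=0.5, dim=1)   # alpha is the cluster-creation parameter
model.set_priors(x)             # weakly informative priors (see set_priors below)
likelihood = model.sample(x, niter=100)  # returns the total likelihood of the model
```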
cross_validated_update(x, z, plike, kfold=10)

This is a step in the sampling procedure that uses internal cross-validation.

Parameters:

- x: array of shape (n_samples, dim), the input data
- z: array of shape (n_samples), the associated membership variables
- plike: array of shape (n_samples), the likelihood under the prior
- kfold: int or array of shape (n_samples), optional, the folds in the cross-validation loop

Returns:

- like: array of shape (n_samples), the (cross-validated) likelihood of the data
likelihood(x, plike=None)

Return the likelihood of the model for the data x; the values are weighted by the component weights.

Parameters:

- x: array of shape (n_samples, self.dim), the data used in the estimation process
- plike: array of shape (n_samples), optional, the density of each point under the prior

Returns:

- like: array of shape (n_samples, self.k), the component-wise likelihood
likelihood_under_the_prior(x)

Computes the likelihood of x under the prior

Parameters:

- x: array of shape (self.n_samples, self.dim)

Returns:

- w: the likelihood of x under the prior model (unweighted)
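A hedged sketch of how the two likelihood methods combine; `model` and `x` are the hypothetical fitted model and data from the example above.

```python
# `model` is an IMM fitted as in the sketch above, `x` the data array
plike = model.likelihood_under_the_prior(x)  # unweighted density of each point under the prior
like = model.likelihood(x, plike)            # component-wise likelihood, weighted by the mixture weights
per_sample = like.sum(axis=1)                # summing over components gives one value per sample
```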
reduce(z)

Reduce the assignments by removing empty clusters and update self.k

Parameters:

- z: array of shape (n), a vector of membership variables, changed in place

Returns:

- z: the remapped values
sample(x, niter=1, sampling_points=None, init=False, kfold=None, verbose=0)

sample the indicator and parameters

Parameters:

- x: array of shape (n_samples, self.dim), the data used in the estimation process
- niter: int, the number of iterations to perform
- sampling_points: array of shape (nbpoints, self.dim), optional, points where the likelihood will be sampled; this defaults to x
- kfold: int or array, optional, parameter of the cross-validation control; by default, no cross-validation is used and the procedure is faster but less accurate
- verbose: int, optional, verbosity mode

Returns:

- likelihood: array of shape (nbpoints), the total likelihood of the model
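A sketch of the optional arguments, continuing the hypothetical example above; the evaluation grid is made up.

```python
import numpy as np

# `model` and `x` are the hypothetical IMM and data from the earlier sketch
grid = np.linspace(-4, 9, 200).reshape(-1, 1)   # made-up points where the likelihood is evaluated
likelihood = model.sample(x, niter=300, sampling_points=grid, kfold=10)
# with sampling_points given, `likelihood` has one entry per row of `grid` rather than of `x`
```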
sample_indicator(like)

Sample the indicator from the likelihood

Parameters:

- like: array of shape (nbitem, self.k), the component-wise likelihood

Returns:

- z: array of shape (nbitem), a draw of the membership variable

Notes

The behaviour is different from standard bgmm in that z can take arbitrary values
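A small hedged sketch of a single indicator draw; `model` and `like` are assumed to come from the earlier hypothetical example.

```python
# `like` is a component-wise likelihood array such as the one returned by
# model.likelihood(x, plike) in the sketch above
z = model.sample_indicator(like)   # one draw of the membership variable
# per the note above, z may contain labels for clusters that do not exist yet,
# so reduce(z) can be used afterwards to renumber the labels and update model.k
z = model.reduce(z)
```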

set_constant_densities(prior_dens=None)

Set the null and prior densities as constant (assuming a compact domain)

Parameters:

- prior_dens: float, optional, constant for the prior density
set_priors(x)

Set the priors in order to have them weakly informative; this follows Fraley and Raftery, Journal of Classification 24:155-181 (2007).

Parameters:

- x: array of shape (n_samples, self.dim), the data used in the estimation process
simple_update(x, z, plike)

This is one step in the sampling procedure (a single sweep over the data).

Parameters:

- x: array of shape (n_samples, dim), the input data
- z: array of shape (n_samples), the associated membership variables
- plike: array of shape (n_samples), the likelihood under the prior

Returns:

- like: array of shape (n_samples), the likelihood of the data
update(x, z)

Update function (draw a sample of the IMM parameters)

Parameters:

- x: array of shape (n_samples, self.dim), the data used in the estimation process
- z: array of shape (n_samples), type np.int, the corresponding classification
update_weights(z)

Given the allocation vector z, resample the weights parameter.

Parameters:

- z: array of shape (n_samples), type np.int, the allocation variable
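A hedged sketch of calling these updating methods by hand for a given labelling; `model`, `x` and `z` are hypothetical (an IMM with priors set via set_priors(x), the data, and an integer allocation vector), and whether one would normally call them in this order is an assumption rather than something stated on this page.

```python
# `model`: IMM with priors set, `x`: data, `z`: integer allocation vector
z = model.reduce(z)       # drop empty clusters and update model.k
model.update(x, z)        # draw a sample of the IMM parameters given the classification z
model.update_weights(z)   # resample the mixture weights from the allocation vector
```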

### MixedIMM

class nipy.algorithms.clustering.imm.MixedIMM(alpha=0.5, dim=1)

Particular IMM with an additional null class. The data is supplied together with a sample-related probability of being under the null.

__init__(alpha=0.5, dim=1)
Parameters:

- alpha: float, optional, the parameter for cluster creation
- dim: int, optional, the dimension of the data

Note: use the function set_priors() to set adapted priors.
cross_validated_update(x, z, plike, null_class_proba, kfold=10)

This is a step in the sampling procedure that uses internal cross-validation.

Parameters:

- x: array of shape (n_samples, dim), the input data
- z: array of shape (n_samples), the associated membership variables
- plike: array of shape (n_samples), the likelihood under the prior
- kfold: int or array, optional, the number of folds in the cross-validation loop, or a set of indexes for the cross-validation procedure
- null_class_proba: array of shape (n_samples), prior probability of being under the null

Returns:

- like: array of shape (n_samples), the (cross-validated) likelihood of the data
- z: array of shape (n_samples), the associated membership variables

Notes

When kfold is an array, there is an internal reshuffling to randomize the order of updates

sample(x, null_class_proba, niter=1, sampling_points=None, init=False, kfold=None, co_clustering=False, verbose=0)

sample the indicator and parameters

Parameters:

- x: array of shape (n_samples, self.dim), the data used in the estimation process
- null_class_proba: array of shape (n_samples), the probability of being under the null
- niter: int, the number of iterations to perform
- sampling_points: array of shape (nbpoints, self.dim), optional, points where the likelihood will be sampled; this defaults to x
- kfold: int, optional, parameter of the cross-validation control; by default, no cross-validation is used and the procedure is faster but less accurate
- co_clustering: bool, optional, if True, return a model of data co-labelling across iterations
- verbose: int, optional, verbosity mode

Returns:

- likelihood: array of shape (nbpoints), the total likelihood of the model
- pproba: array of shape (n_samples), the posterior probability of being in the null class (the posterior of null_class_proba)
- coclust: only if co_clustering==True, a sparse matrix of shape (n_samples, n_samples) giving the frequency of co-labelling of each sample pair across iterations
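A usage sketch for MixedIMM based on the signatures above; the data, the null density value, and the unpacking of the three documented return values into a tuple (when co_clustering is True) are assumptions made for illustration.

```python
import numpy as np
from nipy.algorithms.clustering.imm import MixedIMM

# Made-up one-dimensional data and a flat prior probability of being under the null
x = np.concatenate((np.random.randn(80), 4 + np.random.randn(20))).reshape(-1, 1)
null_class_proba = 0.5 * np.ones(x.shape[0])

model = MixedIMM(alpha=0.5, dim=1)
model.set_priors(x)
model.set_constant_densities(null_dens=0.1)  # constant null density; the value is made up
likelihood, pproba, coclust = model.sample(
    x, null_class_proba, niter=200, co_clustering=True)
# pproba: posterior probability of each sample being in the null class
# coclust: sparse co-labelling frequencies across iterations (co_clustering=True)
```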
sample_indicator(like, null_class_proba)

sample the indicator from the likelihood

Parameters:

- like: array of shape (nbitem, self.k), the component-wise likelihood
- null_class_proba: array of shape (n_samples), prior probability of being under the null

Returns:

- z: array of shape (nbitem), a draw of the membership variable

Notes

Here z=-1 encodes for the null class
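A short sketch exploiting this encoding; `model`, `like` and `null_class_proba` are assumed to be a fitted MixedIMM, a component-wise likelihood array and the per-sample null prior.

```python
z = model.sample_indicator(like, null_class_proba)  # one draw of the membership variable
null_samples = (z == -1)        # z == -1 encodes the null class
active_samples = ~null_samples  # samples assigned to one of the mixture components
```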

set_constant_densities(null_dens=None, prior_dens=None)

Set the null and prior densities as constant (over a supposedly compact domain)

Parameters:

- null_dens: float, optional, constant for the null density
- prior_dens: float, optional, constant for the prior density
simple_update(x, z, plike, null_class_proba)

One step in the sampling procedure (one data sweep)

Parameters:

- x: array of shape (n_samples, dim), the input data
- z: array of shape (n_samples), the associated membership variables
- plike: array of shape (n_samples), the likelihood under the prior
- null_class_proba: array of shape (n_samples), prior probability of being under the null

Returns:

- like: array of shape (n_samples), the likelihood of the data under the H1 hypothesis

## Functions

nipy.algorithms.clustering.imm.co_labelling(z, kmax=None, kmin=None)

return a sparse co-labelling matrix given the label vector z

Parameters:

- z: array of shape (n_samples), the input labels
- kmax: int, optional, considers only the labels in the range [0, kmax[

Returns:

- colabel: a sparse coo_matrix giving the co-labelling of the data, i.e. c[i, j] = 1 if z[i] == z[j], 0 otherwise
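A minimal example of co_labelling; the label vector is made up for illustration.

```python
import numpy as np
from nipy.algorithms.clustering.imm import co_labelling

z = np.array([0, 0, 1, 1, 2])
coclust = co_labelling(z)     # sparse coo_matrix of shape (5, 5)
print(coclust.toarray())      # 1 where z[i] == z[j], 0 otherwise
```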
nipy.algorithms.clustering.imm.main()

Illustrative example of the behaviour of imm