algorithms.clustering.gmm¶

Module: `algorithms.clustering.gmm`¶

Gaussian Mixture Model class: contains the basic fields and methods of GMMs. The class GMM_old uses C bindings, which are computationally and memory efficient.

Author : Bertrand Thirion, 2006-2009

Classes¶

`GMM`¶

class nipy.algorithms.clustering.gmm.GMM(k=1, dim=1, prec_type='full', means=None, precisions=None, weights=None)¶

Bases: object

Standard GMM.

This class contains the following members:

k (int): the number of components in the mixture
dim (int): the dimension of the data
prec_type = ‘full’ (string): the parameterization of the precision/covariance matrices, either ‘full’ or ‘diag’
means: array of shape (k,dim): the means (mean parameters) of the components
precisions: array of shape (k,dim,dim): the precisions (inverse covariance matrices) of the components
weights: array of shape (k): the weights of the mixture

__init__(k=1, dim=1, prec_type='full', means=None, precisions=None, weights=None)¶

Initialize the structure, at least with the dimensions of the problem

Parameters:

k (int): the number of classes of the model
dim (int): the dimension of the problem
prec_type = ‘full’: covariance/precision parameterization, diagonal (‘diag’) or full (‘full’)
means = None: array of shape (self.k,self.dim)
precisions = None: array of shape (self.k,self.dim,self.dim) or (self.k,self.dim)
weights = None: array of shape (self.k)

By default, means, precisions and weights are set to zeros(), eye() and 1/k ones() respectively, with the correct dimensions.
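The defaults described above can be pictured with a small NumPy sketch. This is not nipy's code: `default_gmm_parameters` is a hypothetical helper, and the all-ones array used for the diagonal case is an assumption about what "eye()" means for precisions of shape (k, dim).

```python
import numpy as np

def default_gmm_parameters(k=1, dim=1, prec_type="full"):
    """Hypothetical helper mirroring the documented defaults."""
    means = np.zeros((k, dim))
    if prec_type == "full":
        # One identity matrix per component, shape (k, dim, dim)
        precisions = np.tile(np.eye(dim), (k, 1, 1))
    else:
        # Assumed diagonal analogue of eye(): unit precisions, shape (k, dim)
        precisions = np.ones((k, dim))
    # Uniform mixture weights that sum to one
    weights = np.ones(k) / k
    return means, precisions, weights

means, precisions, weights = default_gmm_parameters(k=3, dim=2)
print(means.shape, precisions.shape, weights.shape)
```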

average_log_like(x, tiny=1e-15)¶

Returns the averaged log-likelihood of the model for the dataset x

Parameters:

x: array of shape (n_samples,self.dim): the data used in the estimation process
tiny = 1.e-15: a small constant to avoid numerical singularities

bic(like, tiny=1e-15)¶

Computation of bic approximation of evidence

Parameters:

like, array of shape (n_samples, self.k): component-wise likelihood
tiny=1.e-15, a small constant to avoid numerical singularities

Returns:

the bic value, float
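For orientation, here is a sketch of a BIC-style score computed from a component-wise likelihood array. It uses the textbook form (log-likelihood minus half the number of free parameters times log n) and a standard parameter count. The exact penalty used by nipy may differ, and `bic_sketch` is a hypothetical name.

```python
import numpy as np

def bic_sketch(like, dim, prec_type="full", tiny=1e-15):
    """Textbook BIC-style score; larger values indicate a better model."""
    like = np.asarray(like, dtype=float)
    n_samples, k = like.shape
    # Mixture likelihood of each sample, clipped away from zero
    mixture = np.maximum(like.sum(axis=1), tiny)
    log_like = np.log(mixture).sum()
    # Free parameters: k - 1 weights, k * dim means, plus the precisions
    if prec_type == "full":
        n_prec = k * dim * (dim + 1) // 2
    else:
        n_prec = k * dim
    n_params = (k - 1) + k * dim + n_prec
    return log_like - 0.5 * n_params * np.log(n_samples)

# Toy input: 4 samples, 2 components, weighted likelihood 0.25 everywhere
like = np.full((4, 2), 0.25)
value = bic_sketch(like, dim=1)
print(value)
```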

check()¶: Checking the shape of different matrices involved in the model

check_x(x)¶

Essentially checks that x.shape[1]==self.dim.

x is returned, possibly reshaped.
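A minimal sketch of this kind of shape check, assuming that the reshaping in question turns a 1-D input into a single-column array. `check_x_sketch` is a hypothetical stand-in, not the method itself.

```python
import numpy as np

def check_x_sketch(x, dim):
    """Return x as a 2-D array of shape (n_samples, dim) or raise ValueError."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        # Assumption: a flat vector is read as n_samples one-dimensional points
        x = x.reshape(-1, 1)
    if x.ndim != 2 or x.shape[1] != dim:
        raise ValueError("x must have shape (n_samples, %d)" % dim)
    return x

print(check_x_sketch([0.5, 1.5, 2.5], dim=1).shape)  # (3, 1)
```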

estimate(x, niter=100, delta=0.0001, verbose=0)¶

Estimation of the model given a dataset x

Parameters:

x: array of shape (n_samples,dim): the data from which the model is estimated
niter=100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
verbose=0: verbosity mode

Returns:

bic: an asymptotic approximation of model evidence
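The roles of niter and delta can be illustrated with a generic EM loop for a diagonal-precision mixture. This is a sketch under stated assumptions, not nipy's estimator: the function names, the variance floor `min_var` and the stopping rule on the average log-likelihood are all choices made for this example.

```python
import numpy as np

def diag_gaussian_like(x, means, precisions, weights):
    """Weighted component-wise likelihood, shape (n_samples, k)."""
    diff = x[:, None, :] - means[None, :, :]                    # (n, k, dim)
    quad = np.sum(diff ** 2 * precisions[None, :, :], axis=2)   # (n, k)
    log_norm = 0.5 * np.sum(np.log(precisions / (2 * np.pi)), axis=1)  # (k,)
    return weights * np.exp(log_norm - 0.5 * quad)

def em_estimate_sketch(x, means, precisions, weights,
                       niter=100, delta=1e-4, tiny=1e-15, min_var=1e-6):
    previous = -np.inf
    for _ in range(niter):
        # E step: responsibilities of each component for each sample
        like = diag_gaussian_like(x, means, precisions, weights)
        total = np.maximum(like.sum(axis=1), tiny)
        current = np.mean(np.log(total))
        resp = like / total[:, None]
        # M step: re-estimate weights, means and diagonal precisions
        pop = np.maximum(resp.sum(axis=0), tiny)
        weights = pop / pop.sum()
        means = (resp.T @ x) / pop[:, None]
        second_moment = (resp.T @ x ** 2) / pop[:, None]
        precisions = 1.0 / np.maximum(second_moment - means ** 2, min_var)
        # Stop once the average log-likelihood gain falls below delta
        if current - previous < delta:
            break
        previous = current
    return means, precisions, weights, current

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, (200, 1)), rng.normal(5, 1, (200, 1))])
means, precisions, weights, avg_ll = em_estimate_sketch(
    x, means=np.array([[-1.0], [1.0]]),
    precisions=np.ones((2, 1)), weights=np.array([0.5, 0.5]))
print(np.sort(means[:, 0]))
```

On this toy data the two estimated means should land close to -5 and 5.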

evidence(x)¶

Computation of bic approximation of evidence

Parameters:

x array of shape (n_samples,dim): the data from which bic is computed

Returns:

the bic value

guess_regularizing(x, bcheck=1)¶

Set the regularizing priors as weakly informative, according to Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x array of shape (n_samples,dim): the data used in the estimation process

initialize(x)¶

Initializes self according to a certain dataset x:

1. sets the regularizing hyper-parameters
2. initializes z using a k-means algorithm
3. updates the parameters

Parameters:

x, array of shape (n_samples,self.dim): the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)¶

Estimation of self given x

Parameters:

x: array of shape (n_samples,dim): the data from which the model is estimated
z = None: array of shape (n_samples): a prior labelling of the data to initialize the computation
niter=100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
ninit=1: number of initializations performed to reach a good solution
verbose=0: verbosity mode

Returns:

the best model is returned

likelihood(x)¶

Returns the likelihood of the model for the data x. The values are weighted by the component weights.

Parameters:

x array of shape (n_samples,self.dim): the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k): component-wise likelihood
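To make the weighted/unweighted distinction concrete, the sketch below evaluates Gaussian densities from full precision matrices and then scales each column by its mixture weight. The function names are hypothetical and the loop is written for clarity, not speed. It illustrates the quantities described here, not the library's implementation.

```python
import numpy as np

def unweighted_likelihood_sketch(x, means, precisions):
    """Density of each row of x under each component, shape (n_samples, k)."""
    n_samples, dim = x.shape
    k = means.shape[0]
    like = np.empty((n_samples, k))
    for c in range(k):
        diff = x - means[c]                                   # (n, dim)
        # Quadratic form diff^T P diff for every sample
        quad = np.einsum("ni,ij,nj->n", diff, precisions[c], diff)
        # log det(precision) = -log det(covariance)
        _, logdet = np.linalg.slogdet(precisions[c])
        like[:, c] = np.exp(0.5 * logdet
                            - 0.5 * dim * np.log(2 * np.pi)
                            - 0.5 * quad)
    return like

def likelihood_sketch(x, means, precisions, weights):
    """Component-wise likelihood weighted by the mixture weights."""
    return unweighted_likelihood_sketch(x, means, precisions) * weights

x = np.array([[0.0, 0.0], [1.0, 0.0]])
means = np.zeros((2, 2))
precisions = np.tile(np.eye(2), (2, 1, 1))
weights = np.array([0.5, 0.5])
like = likelihood_sketch(x, means, precisions, weights)
print(like.shape)  # (2, 2)
```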

map_label(x, like=None)¶

return the MAP labelling of x

Parameters:

x: array of shape (n_samples,dim): the data under study
like=None: array of shape (n_samples,self.k): component-wise likelihood; if like is None, it is recomputed

Returns:

z: array of shape (n_samples): the resulting MAP labelling of the rows of x
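Given a weighted component-wise likelihood array, a MAP labelling amounts to picking, for each row, the component with the largest value. A minimal sketch, with `map_label_sketch` as a hypothetical name:

```python
import numpy as np

def map_label_sketch(like):
    """Index of the most likely component for each sample."""
    return np.argmax(np.asarray(like), axis=1)

like = np.array([[0.10, 0.30],
                 [0.25, 0.05],
                 [0.01, 0.02]])
z = map_label_sketch(like)
print(z)  # [1 0 1]
```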

mixture_likelihood(x)¶

Returns the likelihood of the mixture for x

Parameters:

x: array of shape (n_samples,self.dim): the data used in the estimation process

plugin(means, precisions, weights)¶

Manually set the means, precisions and weights of the model

Parameters:

means: array of shape (self.k,self.dim)
precisions: array of shape (self.k,self.dim,self.dim) or (self.k,self.dim)
weights: array of shape (self.k)

pop(like, tiny=1e-15)¶

compute the population, i.e. the statistics of allocation

Parameters:

like: array of shape (n_samples,self.k): the likelihood of each item being in each class
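One plausible reading of "statistics of allocation" is the soft count of samples per component: normalize each row of like to sum to one, then sum over samples. The sketch below implements that reading as an assumption. `pop_sketch` is a hypothetical name.

```python
import numpy as np

def pop_sketch(like, tiny=1e-15):
    """Soft number of samples allocated to each component, shape (k,)."""
    like = np.asarray(like, dtype=float)
    # Per-sample normalizer, clipped to avoid division by zero
    total = np.maximum(like.sum(axis=1), tiny)
    resp = like / total[:, None]
    return resp.sum(axis=0)

like = np.array([[0.2, 0.2],
                 [0.3, 0.1]])
population = pop_sketch(like)
print(population)
```

Under this reading the entries add up to the number of samples whenever no row is clipped.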

show(x, gd, density=None, axes=None)¶

Function to plot a GMM, still in progress. Currently works only in 1D and 2D.

Parameters:

x: array of shape(n_samples, dim): the data under study
gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins)): density of the model on the discrete grid implied by gd; by default, this is recomputed

show_components(x, gd, density=None, mpaxes=None)¶

Function to plot a GMM – Currently, works only in 1D

Parameters:

x: array of shape(n_samples, dim): the data under study
gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins)): density of the model on the discrete grid implied by gd; by default, this is recomputed
mpaxes: axes handle to make the figure, optional: if None, a new figure is created

test(x, tiny=1e-15)¶

Returns the log-likelihood of the mixture for x

Parameters:

x array of shape (n_samples,self.dim): the data used in the estimation process

Returns:

ll: array of shape(n_samples): the log-likelihood of the rows of x
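The per-sample log-likelihood and the role of tiny can be sketched as follows, starting from a weighted component-wise likelihood array. The function name is hypothetical, and clipping before the logarithm is an assumption about how tiny is applied.

```python
import numpy as np

def log_mixture_likelihood_sketch(like, tiny=1e-15):
    """Log of the mixture likelihood for each sample, shape (n_samples,)."""
    mixture = np.sum(np.asarray(like, dtype=float), axis=1)
    # Clip so that samples with zero likelihood give a finite log value
    return np.log(np.maximum(mixture, tiny))

like = np.array([[0.5, 0.5],
                 [0.0, 0.0]])
ll = log_mixture_likelihood_sketch(like)
print(ll)
```

Averaging such a vector over samples gives a quantity comparable to the one described for average_log_like.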

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)¶: Same as initialize_and_estimate

unweighted_likelihood(x)¶

Returns the likelihood of each data point for each component. The values are not weighted by the component weights.

Parameters:

x: array of shape (n_samples,self.dim): the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k): unweighted component-wise likelihood

Notes

Hopefully faster

unweighted_likelihood_(x)¶

Returns the likelihood of each data point for each component. The values are not weighted by the component weights.

Parameters:

x: array of shape (n_samples,self.dim): the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k): unweighted component-wise likelihood

update(x, l)¶: Identical to self._Mstep(x,l)

`GridDescriptor`¶

class nipy.algorithms.clustering.gmm.GridDescriptor(dim=1, lim=None, n_bins=None)¶

Bases: object

A tiny class to handle cartesian grids

__init__(dim=1, lim=None, n_bins=None)¶

Parameters:

dim: int, optional: the dimension of the grid
lim: list of len(2*self.dim): the limits of the grid as (xmin, xmax, ymin, ymax, …)
n_bins: list of len(self.dim): the number of bins in each direction

make_grid()¶

Compute the grid points

Returns:

grid: array of shape (nb_nodes, self.dim): where nb_nodes is the prod of self.n_bins
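A Cartesian grid with this layout can be sketched with NumPy alone. This is an illustration, not the class itself: `make_grid_sketch` is a hypothetical function, and placing nodes with np.linspace (both limits included) is an assumption about how bins map to nodes.

```python
import numpy as np

def make_grid_sketch(lim, n_bins):
    """Return grid nodes as an array of shape (prod(n_bins), dim)."""
    dim = len(n_bins)
    # One evenly spaced axis per dimension, from lim[2j] to lim[2j + 1]
    axes = [np.linspace(lim[2 * j], lim[2 * j + 1], n_bins[j])
            for j in range(dim)]
    mesh = np.meshgrid(*axes, indexing="ij")
    # Flatten each coordinate array and stack them as columns
    return np.stack([m.ravel() for m in mesh], axis=1)

grid = make_grid_sketch(lim=[0, 1, -1, 1], n_bins=[3, 2])
print(grid.shape)  # (6, 2)
```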

set(lim, n_bins=10)¶

set the limits of the grid and the number of bins

Parameters:

lim: list of len(2*self.dim): the limits of the grid as (xmin, xmax, ymin, ymax, …)
n_bins: list of len(self.dim), optional: the number of bins in each direction

Functions¶

nipy.algorithms.clustering.gmm.best_fitting_GMM(x, krange, prec_type='full', niter=100, delta=0.0001, ninit=1, verbose=0)¶

Given a certain dataset x, find the best-fitting GMM with a number k of classes in a certain range defined by krange

Parameters:

x: array of shape (n_samples,dim): the data from which the model is estimated
krange: list of floats: the range of values to test for k
prec_type: string (to be chosen within ‘full’, ‘diag’), optional: the covariance parameterization
niter: int, optional: maximal number of iterations in the estimation process
delta: float, optional: increment of data likelihood at which convergence is declared
ninit: int: number of initializations performed
verbose=0: verbosity mode

Returns:

mg: the best-fitting GMM instance
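The selection logic can be pictured as a loop over krange that keeps the candidate with the highest score. In the sketch below, `select_best_model` and the `fit_and_score` callable are hypothetical, and "higher score is better" is an assumption in line with the BIC being described as an approximation of model evidence. The toy scorer only stands in for a real fit.

```python
def select_best_model(x, krange, fit_and_score):
    """Try every k in krange and keep the model with the highest score."""
    best_model, best_score = None, float("-inf")
    for k in krange:
        # fit_and_score is assumed to return (fitted_model, score)
        model, score = fit_and_score(x, int(k))
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score

def toy_fit_and_score(x, k):
    # Stand-in scorer for illustration only: the score peaks at k = 3
    return {"k": k}, -float((k - 3) ** 2)

best_model, best_score = select_best_model(None, [1, 2, 3, 4, 5], toy_fit_and_score)
print(best_model)  # {'k': 3}
```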

nipy.algorithms.clustering.gmm.plot2D(x, my_gmm, z=None, with_dots=True, log_scale=False, mpaxes=None, verbose=0)¶

Given a set of points in a plane and a GMM, plot them

Parameters:

x: array of shape (npoints, dim=2): sample points
my_gmm: GMM instance: whose density has to be plotted
z: array of shape (npoints), optional: a labelling of the points in x; by default, it is not taken into account
with_dots: bool, optional: whether to plot the dots or not
log_scale: bool, optional: whether to plot the likelihood in log scale or not
mpaxes=None, int, optional: if not None, axes handle for plotting
verbose: verbosity mode, optional

Returns:

gd: GridDescriptor instance that represents the grid used in the function
ax: handle to the figure axes

Notes

my_gmm is assumed to have a ‘mixture_likelihood’ method that takes an array of points of shape (np, dim) and returns an array of shape (np,my_gmm.k) that represents the likelihood component-wise
