algorithms.clustering.gmm¶
Module: algorithms.clustering.gmm¶
Inheritance diagram for nipy.algorithms.clustering.gmm:
Gaussian Mixture Model class: contains the basic fields and methods of GMMs. The class GMM_old uses C bindings, which are computationally and memory efficient.
Author : Bertrand Thirion, 2006-2009
Classes¶
GMM¶
- class nipy.algorithms.clustering.gmm.GMM(k=1, dim=1, prec_type='full', means=None, precisions=None, weights=None)¶
Bases: object
Standard GMM.
This class contains the following members:
- k (int): the number of components in the mixture
- dim (int): the dimension of the data
- prec_type = ‘full’ (string): the parameterization of the
precision/covariance matrices, either ‘full’ or ‘diag’ (diagonal)
- means: array of shape (k, dim):
all the means (mean parameters) of the components
- precisions: array of shape (k, dim, dim):
the precisions (inverse covariance matrices) of the components
- weights: array of shape (k): weights of the mixture
- __init__(k=1, dim=1, prec_type='full', means=None, precisions=None, weights=None)¶
Initialize the structure, at least with the dimensions of the problem
- Parameters:
- k (int) the number of classes of the model
- dim (int) the dimension of the problem
- prec_type = ‘full’: covariance/precision parameterization
(diagonal ‘diag’ or full ‘full’).
- means = None: array of shape (self.k,self.dim)
- precisions = None: array of shape (self.k,self.dim,self.dim)
or (self.k, self.dim)
- weights=None: array of shape (self.k)
- By default, means, precisions and weights are set to
- zeros()
- eye()
- 1/k * ones()
- with the correct dimensions (see the example below)
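For illustration, a minimal sketch of constructing a model with the default parameterization (the values of k and dim are arbitrary):
>>> from nipy.algorithms.clustering.gmm import GMM
>>> gmm = GMM(k=2, dim=3, prec_type='full')  # means, precisions and weights take their defaults
>>> gmm.check()                              # verify that all shapes are consistent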
- average_log_like(x, tiny=1e-15)¶
Returns the averaged log-likelihood of the model for the dataset x
- Parameters:
- x: array of shape (n_samples,self.dim)
the data used in the estimation process
- tiny = 1.e-15: a small constant to avoid numerical singularities
- bic(like, tiny=1e-15)¶
Computation of bic approximation of evidence
- Parameters:
- like, array of shape (n_samples, self.k)
component-wise likelihood
- tiny=1.e-15, a small constant to avoid numerical singularities
- Returns:
- the bic value, float
- check()¶
Checking the shape of different matrices involved in the model
- check_x(x)¶
Essentially checks that x.shape[1] == self.dim;
x is returned, possibly reshaped.
- estimate(x, niter=100, delta=0.0001, verbose=0)¶
Estimation of the model given a dataset x
- Parameters:
- x array of shape (n_samples,dim)
the data from which the model is estimated
- niter=100: maximal number of iterations in the estimation process
- delta = 1.e-4: increment of data likelihood at which
convergence is declared
- verbose=0: verbosity mode
- Returns:
- bic: an asymptotic approximation of model evidence
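A minimal sketch of the two-step fitting path on illustrative random data; initialize() is used first to set the hyper-parameters and a k-means start, mirroring initialize_and_estimate below:
>>> import numpy as np
>>> from nipy.algorithms.clustering.gmm import GMM
>>> x = np.random.randn(100, 2)              # illustrative data: 100 samples in 2D
>>> gmm = GMM(k=3, dim=2)
>>> gmm.initialize(x)                        # regularizing hyper-parameters + k-means start
>>> bic = gmm.estimate(x, niter=100, delta=1.e-4)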
- evidence(x)¶
Computation of bic approximation of evidence
- Parameters:
- x array of shape (n_samples,dim)
the data from which bic is computed
- Returns:
- the bic value
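A short sketch of computing the BIC-based evidence, continuing from the fitted model gmm and data x of the previous example:
>>> ev = gmm.evidence(x)                     # BIC approximation of the evidence
>>> like = gmm.likelihood(x)                 # component-wise likelihood, shape (n_samples, k)
>>> b = gmm.bic(like)                        # BIC computed from a precomputed likelihood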
- guess_regularizing(x, bcheck=1)¶
Set the regularizing priors as weakly informative, according to Fraley and Raftery, Journal of Classification 24:155-181 (2007)
- Parameters:
- x array of shape (n_samples,dim)
the data used in the estimation process
- initialize(x)¶
Initializes self according to a certain dataset x:
1. sets the regularizing hyper-parameters
2. initializes z using a k-means algorithm
3. updates the parameters
- Parameters:
- x, array of shape (n_samples,self.dim)
the data used in the estimation process
- initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)¶
Estimation of self given x
- Parameters:
- x array of shape (n_samples,dim)
the data from which the model is estimated
- z = None: array of shape (n_samples)
a prior labelling of the data to initialize the computation
- niter=100: maximal number of iterations in the estimation process
- delta = 1.e-4: increment of data likelihood at which
convergence is declared
- ninit=1: number of initializations performed
to reach a good solution
- verbose=0: verbosity mode
- Returns:
- the best model is returned
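A minimal sketch of the one-call fitting path with several restarts (data and settings are illustrative):
>>> x = np.random.randn(200, 2)
>>> gmm = GMM(k=3, dim=2)
>>> best = gmm.initialize_and_estimate(x, niter=100, delta=1.e-4, ninit=5)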
- likelihood(x)¶
Return the likelihood of the model for the data x; the values are weighted by the component weights
- Parameters:
- x array of shape (n_samples,self.dim)
the data used in the estimation process
- Returns:
- like, array of shape(n_samples,self.k)
component-wise likelihood
- map_label(x, like=None)¶
return the MAP labelling of x
- Parameters:
- x array of shape (n_samples,dim)
the data under study
- like=None: array of shape (n_samples, self.k)
component-wise likelihood; if like==None, it is recomputed
- Returns:
- z: array of shape(n_samples): the resulting MAP labelling
of the rows of x
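A short sketch of MAP labelling, continuing from a fitted model gmm and data x as above:
>>> z = gmm.map_label(x)                     # component index for each row of x
>>> like = gmm.likelihood(x)
>>> z2 = gmm.map_label(x, like)              # reuse a precomputed likelihood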
- mixture_likelihood(x)¶
Returns the likelihood of the mixture for x
- Parameters:
- x: array of shape (n_samples,self.dim)
the data used in the estimation process
- plugin(means, precisions, weights)¶
Manually set the weights, means and precisions of the model (see the sketch below)
- Parameters:
- means: array of shape (self.k,self.dim)
- precisions: array of shape (self.k,self.dim,self.dim)
or (self.k, self.dim)
- weights: array of shape (self.k)
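A short sketch of plugging in known parameters, using the diagonal parameterization so that precisions has shape (k, dim); all values are illustrative:
>>> import numpy as np
>>> from nipy.algorithms.clustering.gmm import GMM
>>> means = np.array([[0., 0.], [3., 3.]])
>>> precisions = np.ones((2, 2))             # one precision diagonal per component
>>> weights = np.array([0.5, 0.5])
>>> gm = GMM(k=2, dim=2, prec_type='diag')
>>> gm.plugin(means, precisions, weights)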
- pop(like, tiny=1e-15)¶
compute the population, i.e. the statistics of allocation
- Parameters:
- like: array of shape (n_samples,self.k):
the likelihood of each item being in each class
- show(x, gd, density=None, axes=None)¶
Function to plot a GMM (still in progress). Currently works only in 1D and 2D.
- Parameters:
- x: array of shape(n_samples, dim)
the data under study
- gd: GridDescriptor instance
- density: array of shape (prod(gd.n_bins))
density of the model on the discrete grid implied by gd; by default, this is recomputed
- show_components(x, gd, density=None, mpaxes=None)¶
Function to plot a GMM. Currently works only in 1D.
- Parameters:
- x: array of shape(n_samples, dim)
the data under study
- gd: GridDescriptor instance
- density: array of shape (prod(gd.n_bins))
density of the model on the discrete grid implied by gd; by default, this is recomputed
- mpaxes: axes handle to make the figure, optional,
if None, a new figure is created
- test(x, tiny=1e-15)¶
Returns the log-likelihood of the mixture for x
- Parameters:
- x array of shape (n_samples,self.dim)
the data used in the estimation process
- Returns:
- ll: array of shape(n_samples)
the log-likelihood of the rows of x
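A short sketch of evaluating a fitted model on data, continuing from gmm and x above:
>>> ll = gmm.test(x)                         # per-sample log-likelihood, shape (n_samples,)
>>> avg = gmm.average_log_like(x)            # its average over the dataset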
- train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)¶
Identical to initialize_and_estimate
- unweighted_likelihood(x)¶
Return the likelihood of each data point for each component; the values are not weighted by the component weights
- Parameters:
- x: array of shape (n_samples,self.dim)
the data used in the estimation process
- Returns:
- like, array of shape(n_samples,self.k)
unweighted component-wise likelihood
Notes
Hopefully faster than unweighted_likelihood_
- unweighted_likelihood_(x)¶
Return the likelihood of each data point for each component; the values are not weighted by the component weights
- Parameters:
- x: array of shape (n_samples,self.dim)
the data used in the estimation process
- Returns:
- like, array of shape(n_samples,self.k)
unweighted component-wise likelihood
- update(x, l)¶
Identical to self._Mstep(x,l)
GridDescriptor¶
- class nipy.algorithms.clustering.gmm.GridDescriptor(dim=1, lim=None, n_bins=None)¶
Bases: object
A tiny class to handle Cartesian grids
- __init__(dim=1, lim=None, n_bins=None)¶
- Parameters:
- dim: int, optional,
the dimension of the grid
- lim: list of len(2*self.dim),
the limits of the grid as (xmin, xmax, ymin, ymax, …)
- n_bins: list of len(self.dim),
the number of bins in each direction
- make_grid()¶
Compute the grid points
- Returns:
- grid: array of shape (nb_nodes, self.dim)
where nb_nodes is the product of self.n_bins
- set(lim, n_bins=10)¶
set the limits of the grid and the number of bins
- Parameters:
- lim: list of len(2*self.dim),
the limits of the grid as (xmin, xmax, ymin, ymax, …)
- n_bins: list of len(self.dim), optional
the number of bins in each direction
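A minimal sketch of building a 2D grid (limits and bin counts are illustrative):
>>> from nipy.algorithms.clustering.gmm import GridDescriptor
>>> gd = GridDescriptor(dim=2)
>>> gd.set([-3, 3, -3, 3], n_bins=[40, 40])  # (xmin, xmax, ymin, ymax) and bins per axis
>>> grid = gd.make_grid()                    # array of shape (1600, 2)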
Functions¶
- nipy.algorithms.clustering.gmm.best_fitting_GMM(x, krange, prec_type='full', niter=100, delta=0.0001, ninit=1, verbose=0)¶
Given a certain dataset x, find the best-fitting GMM with a number k of classes in a certain range defined by krange
- Parameters:
- x: array of shape (n_samples,dim)
the data from which the model is estimated
- krange: list of ints,
the range of values to test for k
- prec_type: string (to be chosen within ‘full’,’diag’), optional,
the covariance parameterization
- niter: int, optional,
maximal number of iterations in the estimation process
- delta: float, optional,
increment of data likelihood at which convergence is declared
- ninit: int
number of initializations performed
- verbose=0: verbosity mode
- Returns:
- mg: the best-fitting GMM instance
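A minimal sketch of model selection over a range of k (data and range are illustrative):
>>> import numpy as np
>>> from nipy.algorithms.clustering.gmm import best_fitting_GMM
>>> x = np.random.randn(300, 2)
>>> mg = best_fitting_GMM(x, krange=[1, 2, 3, 4], prec_type='full', ninit=3)
>>> k_best = mg.k                            # number of components of the selected model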
- nipy.algorithms.clustering.gmm.plot2D(x, my_gmm, z=None, with_dots=True, log_scale=False, mpaxes=None, verbose=0)¶
Given a set of points in a plane and a GMM, plot them
- Parameters:
- x: array of shape (npoints, dim=2),
sample points
- my_gmm: GMM instance,
whose density has to be plotted
- z: array of shape (npoints), optional
that gives a labelling of the points in x; by default, it is not taken into account
- with_dots, bool, optional
whether to plot the dots or not
- log_scale: bool, optional
whether to plot the likelihood in log scale or not
- mpaxes=None, int, optional
if not None, axes handle for plotting
- verbose: verbosity mode, optional
- Returns:
- gd, GridDescriptor instance,
that represents the grid used in the function
- ax, handle to the figure axes
Notes
my_gmm is assumed to have a ‘mixture_likelihood’ method that takes an array of points of shape (np, dim) and returns an array of shape (np, my_gmm.k) that represents the component-wise likelihood
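A short sketch of plotting a fitted 2D model together with its MAP labelling, continuing from mg and x above:
>>> from nipy.algorithms.clustering.gmm import plot2D
>>> z = mg.map_label(x)
>>> gd, ax = plot2D(x, mg, z=z, with_dots=True, log_scale=False)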