algorithms.clustering.bgmm

Module: algorithms.clustering.bgmm

[Figure: inheritance diagram for nipy.algorithms.clustering.bgmm]

Bayesian Gaussian Mixture Model classes: contains the basic fields and methods of Bayesian GMMs. The high-level functions are, or should be, bound in C.

The base class BGMM relies on an implementation that performs Gibbs sampling.

A derived class VBGMM uses Variational Bayes inference instead.

A third class is introduced to take advantage of the old C bindings, but it is limited to diagonal covariance models.

Author : Bertrand Thirion, 2008-2011

Classes

BGMM

class nipy.algorithms.clustering.bgmm.BGMM(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)

Bases: GMM

This class implements Bayesian GMMs.

This class contains the following fields:

k: int,

the number of components in the mixture

dim: int,

the dimension of the data

means: array of shape (k, dim)

all the means of the components

precisions: array of shape (k, dim, dim)

the precisions of the components

weights: array of shape (k):

weights of the mixture

shrinkage: array of shape (k):

scaling factor of the posterior precisions on the mean

dof: array of shape (k)

the degrees of freedom of the components

prior_means: array of shape (k, dim):

the prior on the components means

prior_scale: array of shape (k, dim, dim):

the prior on the components precisions

prior_dof: array of shape (k):

the prior on the dof (should be at least equal to dim)

prior_shrinkage: array of shape (k):

scaling factor of the prior precisions on the mean

prior_weights: array of shape (k)

the prior on the components weights

dof : array of shape (k): the posterior dofs

__init__(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)

Initialize the structure with the dimensions of the problem. Optionally, provide the different terms.

average_log_like(x, tiny=1e-15)

Returns the averaged log-likelihood of the model for the dataset x

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

tiny = 1.e-15: a small constant to avoid numerical singularities

bayes_factor(x, z, nperm=0, verbose=0)

Evaluate the Bayes Factor of the current model using Chib’s method

Parameters:
x: array of shape (nb_samples,dim)

the data from which the Bayes factor is computed

z: array of shape (nb_samples), type = np.int_

the corresponding classification

nperm=0: int

the number of permutations to sample to model the label switching issue in the computation of the Bayes factor. By default, exhaustive permutations are used.

verbose=0: verbosity mode
Returns:
bf (float) the computed evidence (Bayes factor)

Notes

See: Siddhartha Chib, "Marginal Likelihood from the Gibbs Output", Journal of the American Statistical Association, Vol. 90, 1995.

bic(like, tiny=1e-15)

Computation of the BIC approximation of the evidence

Parameters:
like, array of shape (n_samples, self.k)

component-wise likelihood

tiny=1.e-15, a small constant to avoid numerical singularities
Returns:
the bic value, float
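
As a rough illustration of how a BIC-style score can be obtained from the component-wise likelihood, the sketch below sums the log of the row-wise mixture likelihood and subtracts a complexity penalty. The helper name `bic_sketch`, the full-covariance parameter count and the `0.5 * log(n_samples)` scaling are assumptions made for illustration; the penalty actually used by the library may differ.

```python
import numpy as np


def bic_sketch(like, dim, tiny=1e-15):
    """BIC-style score from a (n_samples, k) component-wise likelihood.

    Hypothetical helper: assumes full covariance matrices and the
    textbook penalty 0.5 * n_params * log(n_samples).
    """
    like = np.asarray(like, dtype=float)
    n_samples, k = like.shape
    # log-likelihood of the data under the mixture
    log_like = np.sum(np.log(like.sum(axis=1) + tiny))
    # free parameters: (k - 1) weights, k * dim means,
    # k * dim * (dim + 1) / 2 covariance entries
    n_params = (k - 1) + k * dim + k * dim * (dim + 1) / 2
    return log_like - 0.5 * n_params * np.log(n_samples)


# Illustrative input: 4 samples, 2 components, each row summing to 1
score = bic_sketch(np.full((4, 2), 0.5), dim=1)
```

With this convention a larger score indicates a better trade-off between fit and model size.
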
check()

Checking the shape of the different matrices involved in the model

check_x(x)

Essentially checks that x.shape[1] == self.dim.

x is returned, possibly reshaped.

conditional_posterior_proba(x, z, perm=None)

Compute the probability of the current parameters of self given x and z

Parameters:
x: array of shape (nb_samples, dim),

the data used in the estimation process

z: array of shape (nb_samples), type = np.int_,

the corresponding classification

perm: array of shape (nperm, self.k), type = np.int_, optional

all permutations of z under which things will be recomputed. By default, no permutation is performed.

estimate(x, niter=100, delta=0.0001, verbose=0)

Estimation of the model given a dataset x

Parameters:
x array of shape (n_samples,dim)

the data from which the model is estimated

niter=100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
verbose=0: verbosity mode
Returns:
bic: an asymptotic approximation of model evidence

evidence(x, z, nperm=0, verbose=0)

See bayes_factor(self, x, z, nperm=0, verbose=0)

guess_priors(x, nocheck=0)

Set the priors so that they are weakly informative, following Fraley and Raftery, Journal of Classification 24:155-181 (2007).

Parameters:
x, array of shape (nb_samples,self.dim)

the data used in the estimation process

nocheck: boolean, optional,

if nocheck==True, check is skipped

guess_regularizing(x, bcheck=1)

Set the regularizing priors as weakly informative, according to Fraley and Raftery, Journal of Classification 24:155-181 (2007).

Parameters:
x array of shape (n_samples,dim)

the data used in the estimation process

initialize(x)

initialize z using a k-means algorithm, then update the parameters

Parameters:
x: array of shape (nb_samples,self.dim)

the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Estimation of self given x

Parameters:
x array of shape (n_samples,dim)

the data from which the model is estimated

z = None: array of shape (n_samples)

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
ninit=1: number of initializations performed to reach a good solution
verbose=0: verbosity mode
Returns:
the best model is returned

likelihood(x)

Return the likelihood of the model for the data x. The values are weighted by the component weights.

Parameters:
x array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
like, array of shape(n_samples,self.k)

component-wise likelihood

map_label(x, like=None)

return the MAP labelling of x

Parameters:
x array of shape (n_samples,dim)

the data under study

like=None: array of shape (n_samples,self.k)

component-wise likelihood; if like is None, it is recomputed

Returns:
z: array of shape (n_samples): the resulting MAP labelling of the rows of x
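
To make the MAP labelling concrete, here is a minimal sketch that assumes the label of each row is simply the index of its largest component-wise likelihood. The array values are made up for illustration and the snippet does not call the library.

```python
import numpy as np

# Hypothetical component-wise likelihood for 3 samples and k = 2 components
like = np.array([[0.9, 0.1],
                 [0.2, 0.6],
                 [0.4, 0.5]])

# MAP labelling: index of the most likely component for each row
z = np.argmax(like, axis=1)
print(z)  # [0 1 1]
```
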

mixture_likelihood(x)

Returns the likelihood of the mixture for x

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

plugin(means, precisions, weights)

Manually set the weights, means and precisions of the model

Parameters:
means: array of shape (self.k,self.dim)
precisions: array of shape (self.k,self.dim,self.dim) or (self.k, self.dim)
weights: array of shape (self.k)

pop(z)

compute the population, i.e. the statistics of allocation

Parameters:
z array of shape (nb_samples), type = np.int_

the allocation variable

Returns:
hist: array of shape (self.k), count variable
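
The population of a hard allocation can be sketched as a simple histogram of the labels. The snippet below is an illustration with made-up labels, assuming labels take values in 0..k-1; it is not the library's code.

```python
import numpy as np

k = 3                              # number of components (illustrative)
z = np.array([0, 2, 2, 1, 2, 0])   # allocation of 6 samples

# counts[j] is the number of samples allocated to component j;
# minlength keeps empty components in the result
counts = np.bincount(z, minlength=k)
print(counts)  # [2 1 3]
```
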
probability_under_prior()

Compute the probability of the current parameters of self given the priors

sample(x, niter=1, mem=0, verbose=0)

sample the indicator and parameters

Parameters:
x array of shape (nb_samples,self.dim)

the data used in the estimation process

niter=1: the number of iterations to perform
mem=0: if mem, the best values of the parameters are computed
verbose=0: verbosity mode
Returns:
best_weights: array of shape (self.k)
best_means: array of shape (self.k, self.dim)
best_precisions: array of shape (self.k, self.dim, self.dim)
possibleZ: array of shape (nb_samples, niter)

the z that gives the highest posterior to the data is returned first

sample_and_average(x, niter=1, verbose=0)

Sample the indicator and parameters. The average values for weights, means and precisions are returned.

Parameters:
x = array of shape (nb_samples,dim)

the data used in the estimation process

niter=1: number of iterations
Returns:
weights: array of shape (self.k)
means: array of shape (self.k,self.dim)
precisions: array of shape (self.k,self.dim,self.dim) or (self.k, self.dim)

these are the average parameters across samplings

Notes

All this makes sense only if no label switching has occurred, so this is wrong in general (asymptotically).

Fix: implement a permutation procedure for component identification.
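
The caveat about label switching can be seen on a toy case. The sketch below uses two made-up draws of the component means in which the labels are swapped, and shows that naive averaging collapses the components. Sorting by mean is used here only as a simple 1D relabelling rule chosen for illustration; it is not the permutation procedure the note refers to.

```python
import numpy as np

# Two hypothetical draws of the means of a k = 2, dim = 1 mixture.
# The second draw describes the same mixture with the labels swapped.
draw_a = np.array([[-2.0], [2.0]])
draw_b = np.array([[2.0], [-2.0]])

# Naive averaging across draws collapses both components onto 0
naive = (draw_a + draw_b) / 2

# Relabelling the second draw (here: sorting by mean) before averaging
# keeps the two components apart
order = np.argsort(draw_b[:, 0])
relabelled = (draw_a + draw_b[order]) / 2
```
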

sample_indicator(like)

sample the indicator from the likelihood

Parameters:
like: array of shape (nb_samples,self.k)

component-wise likelihood

Returns:
z: array of shape (nb_samples): a draw of the membership variable

set_priors(prior_means, prior_weights, prior_scale, prior_dof, prior_shrinkage)

Set the prior of the BGMM

Parameters:
prior_means: array of shape (self.k,self.dim)
prior_weights: array of shape (self.k)
prior_scale: array of shape (self.k,self.dim,self.dim)
prior_dof: array of shape (self.k)
prior_shrinkage: array of shape (self.k)

show(x, gd, density=None, axes=None)

Function to plot a GMM, still in progress. Currently works only in 1D and 2D.

Parameters:
x: array of shape(n_samples, dim)

the data under study

gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins))

density of the model on the discrete grid implied by gd; by default, this is recomputed

show_components(x, gd, density=None, mpaxes=None)

Function to plot a GMM. Currently works only in 1D.

Parameters:
x: array of shape(n_samples, dim)

the data under study

gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins))

density of the model on the discrete grid implied by gd; by default, this is recomputed

mpaxes: axes handle to make the figure, optional,

if None, a new figure is created

test(x, tiny=1e-15)

Returns the log-likelihood of the mixture for x

Parameters:
x array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
ll: array of shape(n_samples)

the log-likelihood of the rows of x
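
A per-sample log-likelihood of this kind can be sketched from a weighted component-wise likelihood array. The snippet assumes that the mixture likelihood of a row is the sum of its weighted component likelihoods and that a small constant `tiny` guards the logarithm; the numbers are made up and the library's internals may differ.

```python
import numpy as np

tiny = 1e-15
# Hypothetical weighted component-wise likelihood, shape (n_samples, k)
like = np.array([[0.30, 0.10],
                 [0.05, 0.20]])

# Mixture likelihood of each row is the sum over components;
# tiny guards against log(0)
ll = np.log(like.sum(axis=1) + tiny)

# Averaging over samples gives a single summary value for the dataset
average_ll = ll.mean()
```
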

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Same as initialize_and_estimate

unweighted_likelihood(x)

Return the likelihood of each data point for each component. The values are not weighted by the component weights.

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
like, array of shape(n_samples,self.k)

unweighted component-wise likelihood

Notes

Hopefully faster

unweighted_likelihood_(x)

Return the likelihood of each data point for each component. The values are not weighted by the component weights.

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
like, array of shape(n_samples,self.k)

unweighted component-wise likelihood

update(x, z)

update function (draw a sample of the GMM parameters)

Parameters:
x array of shape (nb_samples,self.dim)

the data used in the estimation process

z array of shape (nb_samples), type = np.int_

the corresponding classification

update_means(x, z)

Given the allocation vector z, and the corresponding data x, resample the mean

Parameters:
x: array of shape (nb_samples,self.dim)

the data used in the estimation process

z: array of shape (nb_samples), type = np.int_

the corresponding classification

update_precisions(x, z)

Given the allocation vector z, and the corresponding data x, resample the precisions

Parameters:
x array of shape (nb_samples,self.dim)

the data used in the estimation process

z array of shape (nb_samples), type = np.int_

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter

Parameters:
z array of shape (nb_samples), type = np.int_

the allocation variable

VBGMM

class nipy.algorithms.clustering.bgmm.VBGMM(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)

Bases: BGMM

Subclass of Bayesian GMMs (BGMM) that implements Variational Bayes estimation of the parameters

__init__(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)

Initialize the structure with the dimensions of the problem. Optionally, provide the different terms.

average_log_like(x, tiny=1e-15)

Returns the averaged log-likelihood of the model for the dataset x

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

tiny = 1.e-15: a small constant to avoid numerical singularities

bayes_factor(x, z, nperm=0, verbose=0)

Evaluate the Bayes Factor of the current model using Chib’s method

Parameters:
x: array of shape (nb_samples,dim)

the data from which the Bayes factor is computed

z: array of shape (nb_samples), type = np.int_

the corresponding classification

nperm=0: int

the number of permutations to sample to model the label switching issue in the computation of the Bayes factor. By default, exhaustive permutations are used.

verbose=0: verbosity mode
Returns:
bf (float) the computed evidence (Bayes factor)

Notes

See: Siddhartha Chib, "Marginal Likelihood from the Gibbs Output", Journal of the American Statistical Association, Vol. 90, 1995.

bic(like, tiny=1e-15)

Computation of the BIC approximation of the evidence

Parameters:
like, array of shape (n_samples, self.k)

component-wise likelihood

tiny=1.e-15, a small constant to avoid numerical singularities
Returns:
the bic value, float
check()

Checking the shape of the different matrices involved in the model

check_x(x)

Essentially checks that x.shape[1] == self.dim.

x is returned, possibly reshaped.

conditional_posterior_proba(x, z, perm=None)

Compute the probability of the current parameters of self given x and z

Parameters:
x: array of shape (nb_samples, dim),

the data used in the estimation process

z: array of shape (nb_samples), type = np.int_,

the corresponding classification

perm: array of shape (nperm, self.k), type = np.int_, optional

all permutations of z under which things will be recomputed. By default, no permutation is performed.

estimate(x, niter=100, delta=0.0001, verbose=0)

estimation of self given x

Parameters:
x array of shape (nb_samples,dim)

the data from which the model is estimated

z = None: array of shape (nb_samples)

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared

verbose=0:

verbosity mode

evidence(x, like=None, verbose=0)

Computation of the evidence bound, also known as the free energy

Parameters:
x array of shape (nb_samples,dim)

the data from which evidence is computed

like=None: array of shape (nb_samples, self.k), optional

component-wise likelihood If None, it is recomputed

verbose=0: verbosity mode
Returns:
ev (float) the computed evidence

guess_priors(x, nocheck=0)

Set the priors so that they are weakly informative, following Fraley and Raftery, Journal of Classification 24:155-181 (2007).

Parameters:
x, array of shape (nb_samples,self.dim)

the data used in the estimation process

nocheck: boolean, optional,

if nocheck==True, check is skipped

guess_regularizing(x, bcheck=1)

Set the regularizing priors as weakly informative, according to Fraley and Raftery, Journal of Classification 24:155-181 (2007).

Parameters:
x array of shape (n_samples,dim)

the data used in the estimation process

initialize(x)

initialize z using a k-means algorithm, then update the parameters

Parameters:
x: array of shape (nb_samples,self.dim)

the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Estimation of self given x

Parameters:
x array of shape (n_samples,dim)

the data from which the model is estimated

z = None: array of shape (n_samples)

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
ninit=1: number of initializations performed to reach a good solution
verbose=0: verbosity mode
Returns:
the best model is returned

likelihood(x)

Return the likelihood of the model for the data x. The values are weighted by the component weights.

Parameters:
x: array of shape (nb_samples, self.dim)

the data used in the estimation process

Returns:
like: array of shape(nb_samples, self.k)

component-wise likelihood

map_label(x, like=None)

return the MAP labelling of x

Parameters:
x array of shape (nb_samples,dim)

the data under study

like=None: array of shape (nb_samples,self.k)

component-wise likelihood; if like is None, it is recomputed

Returns:
z: array of shape (nb_samples): the resulting MAP labelling of the rows of x

mixture_likelihood(x)

Returns the likelihood of the mixture for x

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

plugin(means, precisions, weights)

Manually set the weights, means and precisions of the model

Parameters:
means: array of shape (self.k,self.dim)
precisions: array of shape (self.k,self.dim,self.dim) or (self.k, self.dim)
weights: array of shape (self.k)

pop(like, tiny=1e-15)

compute the population, i.e. the statistics of allocation

Parameters:
like array of shape (nb_samples, self.k):

the likelihood of each item being in each class
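
In the variational setting the population is computed from likelihoods rather than hard labels. One plausible reading, sketched below under that assumption, is a "soft" count: each row is normalised into membership probabilities and these are summed over samples. The values are made up and the library's exact computation may differ.

```python
import numpy as np

tiny = 1e-15
# Hypothetical likelihood of 3 samples under k = 2 classes
like = np.array([[0.30, 0.10],
                 [0.05, 0.20],
                 [0.10, 0.10]])

# Normalise each row into membership probabilities (responsibilities);
# tiny avoids a division by zero for rows with vanishing likelihood
resp = like / np.maximum(like.sum(axis=1, keepdims=True), tiny)

# Soft population: expected number of samples in each class
soft_counts = resp.sum(axis=0)
```
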

probability_under_prior()

Compute the probability of the current parameters of self given the priors

sample(x, niter=1, mem=0, verbose=0)

sample the indicator and parameters

Parameters:
x array of shape (nb_samples,self.dim)

the data used in the estimation process

niter=1: the number of iterations to perform
mem=0: if mem, the best values of the parameters are computed
verbose=0: verbosity mode
Returns:
best_weights: array of shape (self.k)
best_means: array of shape (self.k, self.dim)
best_precisions: array of shape (self.k, self.dim, self.dim)
possibleZ: array of shape (nb_samples, niter)

the z that gives the highest posterior to the data is returned first

sample_and_average(x, niter=1, verbose=0)

Sample the indicator and parameters. The average values for weights, means and precisions are returned.

Parameters:
x = array of shape (nb_samples,dim)

the data used in the estimation process

niter=1: number of iterations
Returns:
weights: array of shape (self.k)
means: array of shape (self.k,self.dim)
precisions: array of shape (self.k,self.dim,self.dim) or (self.k, self.dim)

these are the average parameters across samplings

Notes

All this makes sense only if no label switching has occurred, so this is wrong in general (asymptotically).

Fix: implement a permutation procedure for component identification.

sample_indicator(like)

sample the indicator from the likelihood

Parameters:
like: array of shape (nb_samples,self.k)

component-wise likelihood

Returns:
z: array of shape (nb_samples): a draw of the membership variable

set_priors(prior_means, prior_weights, prior_scale, prior_dof, prior_shrinkage)

Set the prior of the BGMM

Parameters:
prior_means: array of shape (self.k,self.dim)
prior_weights: array of shape (self.k)
prior_scale: array of shape (self.k,self.dim,self.dim)
prior_dof: array of shape (self.k)
prior_shrinkage: array of shape (self.k)

show(x, gd, density=None, axes=None)

Function to plot a GMM, still in progress. Currently works only in 1D and 2D.

Parameters:
x: array of shape(n_samples, dim)

the data under study

gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins))

density of the model on the discrete grid implied by gd; by default, this is recomputed

show_components(x, gd, density=None, mpaxes=None)

Function to plot a GMM. Currently works only in 1D.

Parameters:
x: array of shape(n_samples, dim)

the data under study

gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins))

density of the model on the discrete grid implied by gd; by default, this is recomputed

mpaxes: axes handle to make the figure, optional,

if None, a new figure is created

test(x, tiny=1e-15)

Returns the log-likelihood of the mixture for x

Parameters:
x array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
ll: array of shape(n_samples)

the log-likelihood of the rows of x

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Same as initialize_and_estimate

unweighted_likelihood(x)

Return the likelihood of each data point for each component. The values are not weighted by the component weights.

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
like, array of shape(n_samples,self.k)

unweighted component-wise likelihood

Notes

Hopefully faster

unweighted_likelihood_(x)

Return the likelihood of each data point for each component. The values are not weighted by the component weights.

Parameters:
x: array of shape (n_samples,self.dim)

the data used in the estimation process

Returns:
like, array of shape(n_samples,self.k)

unweighted component-wise likelihood

update(x, z)

update function (draw a sample of the GMM parameters)

Parameters:
x array of shape (nb_samples,self.dim)

the data used in the estimation process

z array of shape (nb_samples), type = np.int_

the corresponding classification

update_means(x, z)

Given the allocation vector z, and the corresponding data x, resample the mean

Parameters:
x: array of shape (nb_samples,self.dim)

the data used in the estimation process

z: array of shape (nb_samples), type = np.int_

the corresponding classification

update_precisions(x, z)

Given the allocation vector z, and the corresponding data x, resample the precisions

Parameters:
x array of shape (nb_samples,self.dim)

the data used in the estimation process

z array of shape (nb_samples), type = np.int_

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter

Parameters:
z array of shape (nb_samples), type = np.int_

the allocation variable

Functions

nipy.algorithms.clustering.bgmm.detsh(H)

Routine for the computation of determinants of symmetric positive matrices

Parameters:
H array of shape(n,n)

the input matrix, assumed symmetric and positive

Returns:
dh: float, the determinant
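
One common way to compute the determinant of a symmetric positive definite matrix is through its Cholesky factor. The sketch below illustrates that technique with a hypothetical helper `det_spd`; it is not claimed to be how `detsh` is implemented.

```python
import numpy as np


def det_spd(H):
    """Determinant of a symmetric positive definite matrix via Cholesky."""
    L = np.linalg.cholesky(H)          # H = L @ L.T, L lower triangular
    # det(H) = det(L) ** 2 and det(L) is the product of its diagonal
    return float(np.prod(np.diag(L)) ** 2)


H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
d = det_spd(H)  # close to 4 * 3 - 1 * 1 = 11
```

`np.linalg.cholesky` raises `LinAlgError` when the input is not positive definite, which doubles as a check on the assumption.
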
nipy.algorithms.clustering.bgmm.dirichlet_eval(w, alpha)

Evaluate the probability of a certain discrete draw w from the Dirichlet density with parameters alpha

Parameters:
w: array of shape (n)
alpha: array of shape (n)
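
For reference, the Dirichlet density can be written with the standard library alone. The helper `dirichlet_density` below is a hypothetical stand-in that assumes w has strictly positive entries summing to 1; it follows the textbook formula and is not the library's code.

```python
import math


def dirichlet_density(w, alpha):
    """Dirichlet density at w (strictly positive entries summing to 1)."""
    # log of the normalising constant Gamma(sum(alpha)) / prod(Gamma(alpha))
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of prod(w_i ** (alpha_i - 1))
    log_kernel = sum((a - 1) * math.log(wi) for wi, a in zip(w, alpha))
    return math.exp(log_norm + log_kernel)


# With alpha = (1, 1, 1) the density is flat on the simplex
# and equals Gamma(3) = 2 up to rounding
flat = dirichlet_density([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
```
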
nipy.algorithms.clustering.bgmm.dkl_dirichlet(w1, w2)

Returns the KL divergence between two Dirichlet distributions

Parameters:
w1: array of shape(n),

the parameters of the first dirichlet density

w2: array of shape(n),

the parameters of the second dirichlet density

nipy.algorithms.clustering.bgmm.dkl_gaussian(m1, P1, m2, P2)

Returns the KL divergence between two Gaussian densities

Parameters:
m1: array of shape (n),

the mean parameter of the first density

P1: array of shape(n,n),

the precision parameters of the first density

m2: array of shape (n),

the mean parameter of the second density

P2: array of shape(n,n),

the precision parameters of the second density
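
The closed-form KL divergence between two Gaussians can be expressed directly in terms of precision matrices. The sketch below assumes the divergence is taken from the first density to the second, KL(N(m1, inv(P1)) || N(m2, inv(P2))); the helper name and that direction are assumptions, since the direction is not stated above.

```python
import numpy as np


def kl_gaussian_precision(m1, P1, m2, P2):
    """KL(N(m1, inv(P1)) || N(m2, inv(P2))) from means and precisions."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    P1, P2 = np.asarray(P1, float), np.asarray(P2, float)
    n = m1.size
    d = m2 - m1
    # trace(P2 @ inv(P1)) without forming the inverse explicitly
    trace_term = np.trace(np.linalg.solve(P1, P2))
    # Mahalanobis distance between the means under the second density
    quad_term = d @ P2 @ d
    # log(det(inv(P2)) / det(inv(P1))) = log det(P1) - log det(P2)
    _, logdet_P1 = np.linalg.slogdet(P1)
    _, logdet_P2 = np.linalg.slogdet(P2)
    return float(0.5 * (trace_term + quad_term - n + logdet_P1 - logdet_P2))


# Two unit-variance 1D Gaussians whose means differ by 1: KL = 0.5
kl = kl_gaussian_precision([0.0], [[1.0]], [1.0], [[1.0]])
```
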

nipy.algorithms.clustering.bgmm.dkl_wishart(a1, B1, a2, B2)

Returns the KL divergence between two Wishart distributions of parameters (a1,B1) and (a2,B2).

Parameters:
a1: Float,

degrees of freedom of the first density

B1: array of shape(n,n),

scale matrix of the first density

a2: Float,

degrees of freedom of the second density

B2: array of shape(n,n),

scale matrix of the second density

Returns:
dkl: float, the Kullback-Leibler divergence

nipy.algorithms.clustering.bgmm.generate_Wishart(n, V)

Generate a sample from Wishart density

Parameters:
n: float,

the number of degrees of freedom of the Wishart density

V: array of shape (p,p)

the scale matrix of the Wishart density

Returns:
W: array of shape (p,p)

the draw from the Wishart density
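
For an integer number of degrees of freedom, a Wishart draw can be built as a sum of outer products of Gaussian vectors. The sketch below illustrates that construction with a hypothetical helper `sample_wishart`; it only covers integer n, whereas the function above accepts a float, and the library may rely on a different scheme.

```python
import numpy as np


def sample_wishart(n, V, rng=None):
    """Draw W ~ Wishart(n, V) for an integer number of degrees of freedom n."""
    rng = np.random.default_rng() if rng is None else rng
    V = np.asarray(V, dtype=float)
    p = V.shape[0]
    # rows of X are independent N(0, V) vectors, built from the Cholesky factor
    L = np.linalg.cholesky(V)
    X = rng.standard_normal((n, p)) @ L.T
    # the sum of the n outer products x x^T is a Wishart draw
    return X.T @ X


W = sample_wishart(5, np.eye(2), rng=np.random.default_rng(0))
```

The result is symmetric, and positive definite with probability one when n is at least the dimension of V.
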

nipy.algorithms.clustering.bgmm.generate_normals(m, P)

Generate a Gaussian sample with mean m and precision P

Parameters:
m: array of shape (n): the mean vector
P: array of shape (n,n): the precision matrix
Returns:
ng: array of shape (n): a draw from the Gaussian density

nipy.algorithms.clustering.bgmm.generate_perm(k, nperm=100)

Returns an array of shape (nperm, k) representing permutations of k elements

Parameters:
k: int, the number of elements to be permuted
nperm=100: the maximal number of permutations; if gamma(k+1) > nperm, only nperm random draws are generated
Returns:
p: array of shape (nperm,k): each row is a permutation of the k elements
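
The rule described above (enumerate everything when k! is small enough, otherwise draw at random) can be sketched with the standard library. The helper `permutations_sketch` and its `seed` argument are hypothetical; note that in this sketch the random draws may contain repeated permutations.

```python
import itertools
import math
import random


def permutations_sketch(k, nperm=100, seed=None):
    """All permutations of range(k) if k! <= nperm, else nperm random ones."""
    if math.factorial(k) <= nperm:
        # exhaustive enumeration: k! rows
        return [list(p) for p in itertools.permutations(range(k))]
    rng = random.Random(seed)
    # random draws; repeated permutations are possible
    return [rng.sample(range(k), k) for _ in range(nperm)]


all_perms = permutations_sketch(3)                      # 3! = 6 rows
some_perms = permutations_sketch(5, nperm=10, seed=0)   # 10 random rows
```
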
nipy.algorithms.clustering.bgmm.multinomial(probabilities)

Generate samples from a multinomial distribution

Parameters:
probabilities: array of shape (nelements, nclasses):

likelihood of each element belonging to each class. Each row is assumed to sum to 1. One sample is drawn from each row, resulting in the array z described under Returns.

Returns:
z: array of shape (nelements): the draws, which take values in [0..nclasses-1]
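
Drawing one class per row can be done by inverting the row-wise cumulative distribution. The following is a minimal sketch of that idea with a hypothetical helper `sample_labels`, assuming each row of the input sums to 1; it is not the library's implementation.

```python
import numpy as np


def sample_labels(probabilities, rng=None):
    """Draw one class index per row of a (nelements, nclasses) array."""
    rng = np.random.default_rng() if rng is None else rng
    probabilities = np.asarray(probabilities, dtype=float)
    # cumulative distribution of each row
    cdf = np.cumsum(probabilities, axis=1)
    # one uniform draw in [0, 1) per row
    u = rng.random((probabilities.shape[0], 1))
    # the label is the number of cdf entries that do not exceed the draw
    z = np.sum(cdf <= u, axis=1)
    # guard against round-off in the last cdf entry
    return np.minimum(z, probabilities.shape[1] - 1)


# Degenerate rows make the outcome deterministic
probs = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
z = sample_labels(probs, rng=np.random.default_rng(0))
print(z)  # [0 1 2]
```
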

nipy.algorithms.clustering.bgmm.normal_eval(mu, P, x, dP=None)

Probability of x under normal(mu, inv(P))

Parameters:
mu: array of shape (n),

the mean parameter

P: array of shape (n, n),

the precision matrix

x: array of shape (n),

the data to be evaluated

Returns:
(float) the density
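
When the Gaussian is parameterised by its precision matrix, the density can be evaluated without any matrix inversion. The helper `normal_density` below is a hypothetical sketch of the textbook formula and does not reproduce the optional `dP` argument of the function above.

```python
import numpy as np


def normal_density(mu, P, x):
    """Density of x under N(mu, inv(P)), with P the precision matrix."""
    mu, x = np.asarray(mu, float), np.asarray(x, float)
    P = np.asarray(P, float)
    n = mu.size
    d = x - mu
    # log|P| computed stably; the sign is +1 for a positive definite P
    _, logdet_P = np.linalg.slogdet(P)
    log_density = 0.5 * (logdet_P - n * np.log(2 * np.pi) - d @ P @ d)
    return float(np.exp(log_density))


# Standard normal in 1D evaluated at its mean: 1 / sqrt(2 * pi)
peak = normal_density([0.0], [[1.0]], [0.0])
```
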
nipy.algorithms.clustering.bgmm.wishart_eval(n, V, W, dV=None, dW=None, piV=None)

Evaluation of the probability of W under Wishart(n,V)

Parameters:
n: float,

the number of degrees of freedom (dofs)

V: array of shape (p,p)

the scale matrix of the Wishart density

W: array of shape (p,p)

the sample to be evaluated

dV: float, optional,

determinant of V

dW: float, optional,

determinant of W

piV: array of shape (p,p), optional

inverse of V

Returns:
(float) the density