# algorithms.clustering.hierarchical_clustering¶

## Module: algorithms.clustering.hierarchical_clustering¶

Inheritance diagram for nipy.algorithms.clustering.hierarchical_clustering:

These routines perform some hierrachical agglomerative clustering of some input data. The following alternatives are proposed: - Distance based average-link - Similarity-based average-link - Distance based maximum-link - Ward’s algorithm under graph constraints - Ward’s algorithm without graph constraints

In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy, facilitates the plot of the result etc.

For back-compatibility, *_segment versions of the algorithms have been appended, with the old API (except the qmax parameter, which now represents the number of wanted clusters)

Author : Bertrand Thirion,Pamela Guevara, 2006-2009

## WeightedForest¶

class nipy.algorithms.clustering.hierarchical_clustering.WeightedForest(V, parents=None, height=None)

This is a weighted Forest structure, i.e. a tree - each node has one parent and children (hierarchical structure) - some of the nodes can be viewed as leaves, other as roots - the edges within a tree are associated with a weight: +1 from child to parent -1 from parent to child - additionally, the nodes have a value, which is called ‘height’, especially useful from dendrograms

__init__(V, parents=None, height=None)
Parameters: V: the number of edges of the graph parents=None: array of shape (V) the parents of the graph by default, the parents are set to range(V), i.e. each node is its own parent, and each node is a tree height=None: array of shape(V) the height of the nodes
check_compatible_height()

Check that height[parents[i]]>=height[i] for all nodes

get_height()

Get the height array

list_of_subtrees()

returns the list of all non-trivial subtrees in the graph Caveat: theis function assumes that the vertices are sorted in a way such that parent[i]>i forall i Only the leaves are listeed, not the subtrees themselves

partition(threshold)

Partition the tree according to a cut criterion

plot(ax=None)

Plot the dendrogram associated with self the rank of the data in the dendogram is returned

Parameters: ax: axis handle, optional ax, the axis handle
plot_height()

Plot the height of the non-leaves nodes

set_height(height=None)

Set the height array

split(k)

idem as partition, but a number of components are supplied instead

## Functions¶

Agglomerative function based on a (hopefully sparse) similarity graph

Parameters: G the input graph t a weightForest structure that represents the dendrogram of the data

Agglomerative function based on a (hopefully sparse) similarity graph

Parameters: G the input graph stop: float the stopping criterion qmax: int, optional the number of desired clusters (in the limit of the stopping criterion) verbose : bool, optional If True, print diagnostic information u: array of shape (G.V) a labelling of the graph vertices according to the criterion cost: array of shape (G.V (?)) the cost of each merge step during the clustering procedure
nipy.algorithms.clustering.hierarchical_clustering.fusion(K, pop, i, j, k)

Modifies the graph K to merge nodes i and j into nodes k

The similarity values are weighted averaged, where pop[i] and pop[j] yield the relative weights. this is used in average_link_slow (deprecated)

nipy.algorithms.clustering.hierarchical_clustering.ward(G, feature, verbose=False)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters: G : graph the input graph (a topological graph essentially) feature : array of shape (G.V,dim_feature) vectorial information related to the graph vertices verbose : bool, optional If True, print diagnostic information t : WeightedForest instance structure that represents the dendrogram

Notes

When G has more than 1 connected component, t is no longer a tree. This case is handled cleanly now

nipy.algorithms.clustering.hierarchical_clustering.ward_field_segment(F, stop=-1, qmax=-1, verbose=False)

Agglomerative function based on a field structure

Parameters: F the input field (graph+feature) stop: float, optional the stopping crterion. if stop==-1, then no stopping criterion is used qmax: int, optional the maximum number of desired clusters (in the limit of the stopping criterion) verbose : bool, optional If True, print diagnostic information u: array of shape (F.V) labelling of the graph vertices according to the criterion cost array of shape (F.V - 1) the cost of each merge step during the clustering procedure

Notes

Caveat : only approximate

nipy.algorithms.clustering.hierarchical_clustering.ward_quick(G, feature, verbose=False)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters: G : graph instance topology-defining graph feature: array of shape (G.V,dim_feature) some vectorial information related to the graph vertices verbose : bool, optional If True, print diagnostic information t: weightForest instance, that represents the dendrogram of the data Notes Hopefully a quicker version A euclidean distance is used in the feature space Caveat : only approximate
nipy.algorithms.clustering.hierarchical_clustering.ward_quick_segment(G, feature, stop=-1, qmax=1, verbose=False)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters: G: labs.graph.WeightedGraph instance the input graph (a topological graph essentially) feature array of shape (G.V,dim_feature) vectorial information related to the graph vertices stop1 : int or float, optional the stopping crterion if stop==-1, then no stopping criterion is used qmax : int, optional the maximum number of desired clusters (in the limit of the stopping criterion) verbose : bool, optional If True, print diagnostic information u: array of shape (G.V) labelling of the graph vertices according to the criterion cost: array of shape (G.V - 1) the cost of each merge step during the clustering procedure

Notes

Hopefully a quicker version

A euclidean distance is used in the feature space

Caveat : only approximate

nipy.algorithms.clustering.hierarchical_clustering.ward_segment(G, feature, stop=-1, qmax=1, verbose=False)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters: G : graph object the input graph (a topological graph essentially) feature : array of shape (G.V,dim_feature) some vectorial information related to the graph vertices stop : int or float, optional the stopping crterion. if stop==-1, then no stopping criterion is used qmax : int, optional the maximum number of desired clusters (in the limit of the stopping criterion) verbose : bool, optional If True, print diagnostic information u: array of shape (G.V): a labelling of the graph vertices according to the criterion cost: array of shape (G.V - 1) the cost of each merge step during the clustering procedure

Notes

A euclidean distance is used in the feature space

Caveat : when the number of cc in G (nbcc) is greter than qmax, u contains nbcc values, not qmax !