algorithms.clustering.hierarchical_clustering¶
Module: algorithms.clustering.hierarchical_clustering
¶
Inheritance diagram for nipy.algorithms.clustering.hierarchical_clustering
:
These routines perform some hierrachical agglomerative clustering of some input data. The following alternatives are proposed:  Distance based averagelink  Similaritybased averagelink  Distance based maximumlink  Ward’s algorithm under graph constraints  Ward’s algorithm without graph constraints
In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy, facilitates the plot of the result etc.
For backcompatibility, *_segment versions of the algorithms have been appended, with the old API (except the qmax parameter, which now represents the number of wanted clusters)
Author : Bertrand Thirion,Pamela Guevara, 20062009
Class¶
WeightedForest
¶

class
nipy.algorithms.clustering.hierarchical_clustering.
WeightedForest
(V, parents=None, height=None)¶ Bases:
nipy.algorithms.graph.forest.Forest
This is a weighted Forest structure, i.e. a tree  each node has one parent and children (hierarchical structure)  some of the nodes can be viewed as leaves, other as roots  the edges within a tree are associated with a weight: +1 from child to parent 1 from parent to child  additionally, the nodes have a value, which is called ‘height’, especially useful from dendrograms

__init__
(V, parents=None, height=None)¶ Parameters: V: the number of edges of the graph
parents=None: array of shape (V)
the parents of the graph by default, the parents are set to range(V), i.e. each node is its own parent, and each node is a tree
height=None: array of shape(V)
the height of the nodes

check_compatible_height
()¶ Check that height[parents[i]]>=height[i] for all nodes

get_height
()¶ Get the height array

list_of_subtrees
()¶ returns the list of all nontrivial subtrees in the graph Caveat: theis function assumes that the vertices are sorted in a way such that parent[i]>i forall i Only the leaves are listeed, not the subtrees themselves

partition
(threshold)¶ Partition the tree according to a cut criterion

plot
(ax=None)¶ Plot the dendrogram associated with self the rank of the data in the dendogram is returned
Parameters: ax: axis handle, optional Returns: ax, the axis handle

plot_height
()¶ Plot the height of the nonleaves nodes

set_height
(height=None)¶ Set the height array

split
(k)¶ idem as partition, but a number of components are supplied instead

Functions¶

nipy.algorithms.clustering.hierarchical_clustering.
average_link_graph
(G)¶ Agglomerative function based on a (hopefully sparse) similarity graph
Parameters: G the input graph Returns: t a weightForest structure that represents the dendrogram of the data

nipy.algorithms.clustering.hierarchical_clustering.
average_link_graph_segment
(G, stop=0, qmax=1, verbose=False)¶ Agglomerative function based on a (hopefully sparse) similarity graph
Parameters: G the input graph
stop: float
the stopping criterion
qmax: int, optional
the number of desired clusters (in the limit of the stopping criterion)
verbose : bool, optional
If True, print diagnostic information
Returns: u: array of shape (G.V)
a labelling of the graph vertices according to the criterion
cost: array of shape (G.V (?))
the cost of each merge step during the clustering procedure

nipy.algorithms.clustering.hierarchical_clustering.
fusion
(K, pop, i, j, k)¶ Modifies the graph K to merge nodes i and j into nodes k
The similarity values are weighted averaged, where pop[i] and pop[j] yield the relative weights. this is used in average_link_slow (deprecated)

nipy.algorithms.clustering.hierarchical_clustering.
ward
(G, feature, verbose=False)¶ Agglomerative function based on a topologydefining graph and a feature matrix.
Parameters: G : graph
the input graph (a topological graph essentially)
feature : array of shape (G.V,dim_feature)
vectorial information related to the graph vertices
verbose : bool, optional
If True, print diagnostic information
Returns: t :
WeightedForest
instancestructure that represents the dendrogram
Notes
When G has more than 1 connected component, t is no longer a tree. This case is handled cleanly now

nipy.algorithms.clustering.hierarchical_clustering.
ward_field_segment
(F, stop=1, qmax=1, verbose=False)¶ Agglomerative function based on a field structure
Parameters: F the input field (graph+feature)
stop: float, optional
the stopping crterion. if stop==1, then no stopping criterion is used
qmax: int, optional
the maximum number of desired clusters (in the limit of the stopping criterion)
verbose : bool, optional
If True, print diagnostic information
Returns: u: array of shape (F.V)
labelling of the graph vertices according to the criterion
cost array of shape (F.V  1)
the cost of each merge step during the clustering procedure
Notes
See ward_quick_segment for more information
Caveat : only approximate

nipy.algorithms.clustering.hierarchical_clustering.
ward_quick
(G, feature, verbose=False)¶ Agglomerative function based on a topologydefining graph and a feature matrix.
Parameters: G : graph instance
topologydefining graph
feature: array of shape (G.V,dim_feature)
some vectorial information related to the graph vertices
verbose : bool, optional
If True, print diagnostic information
Returns: t: weightForest instance,
that represents the dendrogram of the data
Notes
Hopefully a quicker version
A euclidean distance is used in the feature space
Caveat : only approximate

nipy.algorithms.clustering.hierarchical_clustering.
ward_quick_segment
(G, feature, stop=1, qmax=1, verbose=False)¶ Agglomerative function based on a topologydefining graph and a feature matrix.
Parameters: G: labs.graph.WeightedGraph instance
the input graph (a topological graph essentially)
feature array of shape (G.V,dim_feature)
vectorial information related to the graph vertices
stop1 : int or float, optional
the stopping crterion if stop==1, then no stopping criterion is used
qmax : int, optional
the maximum number of desired clusters (in the limit of the stopping criterion)
verbose : bool, optional
If True, print diagnostic information
Returns: u: array of shape (G.V)
labelling of the graph vertices according to the criterion
cost: array of shape (G.V  1)
the cost of each merge step during the clustering procedure
Notes
Hopefully a quicker version
A euclidean distance is used in the feature space
Caveat : only approximate

nipy.algorithms.clustering.hierarchical_clustering.
ward_segment
(G, feature, stop=1, qmax=1, verbose=False)¶ Agglomerative function based on a topologydefining graph and a feature matrix.
Parameters: G : graph object
the input graph (a topological graph essentially)
feature : array of shape (G.V,dim_feature)
some vectorial information related to the graph vertices
stop : int or float, optional
the stopping crterion. if stop==1, then no stopping criterion is used
qmax : int, optional
the maximum number of desired clusters (in the limit of the stopping criterion)
verbose : bool, optional
If True, print diagnostic information
Returns: u: array of shape (G.V):
a labelling of the graph vertices according to the criterion
cost: array of shape (G.V  1)
the cost of each merge step during the clustering procedure
Notes
A euclidean distance is used in the feature space
Caveat : when the number of cc in G (nbcc) is greter than qmax, u contains nbcc values, not qmax !