[ENH] Tree (hierarchical clustering, dendrogram) of clusters #3680

mstrazar · 2019-03-15T14:25:46Z

Computing hierarchical clustering is infeasible for datasets with more than a few thousand data points. A tree over the clusters is a useful alternative: use the cluster centroids (mean points) or other linkage options from SciPy to compute the linkages.

Tree between 17 clusters:
tree.pdf
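
For illustration, the centroid-based tree proposed above could be sketched roughly as follows. This is a minimal sketch, not Orange code: it assumes the data is a numeric NumPy array `X` with one cluster label per row in `labels`, and the helper name `cluster_tree` is made up for this example. Only `scipy.cluster.hierarchy.linkage` and `dendrogram` are real SciPy calls.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def cluster_tree(X, labels, method="average"):
    """Hypothetical helper: hierarchical clustering over cluster centroids."""
    cluster_ids = np.unique(labels)
    # One centroid (mean point) per cluster, in the order of cluster_ids.
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in cluster_ids])
    # Cluster the k centroids instead of the n original points.
    Z = linkage(centroids, method=method)
    return cluster_ids, centroids, Z


# Synthetic data (assumption): four tight clusters of 50 points each.
rng = np.random.default_rng(0)
centers = [(0, 0), (0, 1), (5, 5), (5, 6)]
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in centers])
labels = np.repeat(np.arange(4), 50)

cluster_ids, centroids, Z = cluster_tree(X, labels)

# Leaf order of the dendrogram, computed without plotting.
leaf_order = dendrogram(
    Z, no_plot=True, labels=[f"C{c}" for c in cluster_ids]
)["ivl"]
```

The linkage step then works on one row per cluster rather than one row per data point, which is what keeps it cheap for large datasets.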

lanzagar · 2019-03-22T11:36:01Z

I agree with everything written: HC is infeasible for large data, a tree of clusters can be useful, and cluster centroids (provided by k-means) can be used for this.
What exactly is the issue here?

One issue I have with this approach is that when the original data contains categorical variables, the centroids have a different domain (k-means continuizes the data). But for numerical data (e.g. gene expression) this is not a problem.

mstrazar · 2019-03-22T12:42:12Z

It's more of an enhancement than an issue. Certain clustering algorithms output only categorical labels, not cluster centroids. Hierarchical clustering would not even need to operate on centroids; it could use the appropriate linkage method to aggregate distances between the provided clusters. Categorical variables can be continuized too.
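
A rough sketch of the centroid-free variant, assuming numeric data in `X` and arbitrary (possibly string) cluster labels in `labels`. The helper `between_cluster_distances` is hypothetical. It aggregates the pairwise point distances for each pair of clusters and passes the resulting cluster-by-cluster matrix to SciPy's `linkage` in condensed form.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist, squareform


def between_cluster_distances(X, labels, aggregate=np.mean):
    """Hypothetical helper: aggregated distance between each pair of clusters."""
    ids = np.unique(labels)
    k = len(ids)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # All point-to-point distances between cluster i and cluster j.
            d = cdist(X[labels == ids[i]], X[labels == ids[j]])
            D[i, j] = D[j, i] = aggregate(d)
    return ids, D


# Toy data (assumption): three predefined clusters given only as labels.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [0, 5], [1, 5]], dtype=float)
labels = np.array(["a", "a", "b", "b", "c", "c"])

ids, D = between_cluster_distances(X, labels, aggregate=np.mean)
# linkage accepts a condensed distance matrix, so no centroids are needed.
Z = linkage(squareform(D), method="average")
```

With `aggregate=np.min` and `method="single"` (or `np.max` and `method="complete"`) the resulting tree matches what point-level linkage would produce once the given clusters have formed. With the mean, later merges weight each starting cluster equally regardless of its size, so it is an approximation unless the clusters are equally large.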

lanzagar · 2019-03-22T13:04:50Z

I see - so instead of using k-means you would like to use predefined clusters (categorical var in data), which can be a result of (any) clustering or otherwise given.
And in this case we currently don't have a good option to get the centroids, which can be used in hierarchical clustering.

In that case, the solution would be to have a Pivot table, which can compute averages (or other aggregates) to obtain the centroids of selected groups.
This is becoming a recurring issue :) We ended up with a similar conclusion many times in the past few months. +1 for Pivot tables, we just need to come up with a good widget proposal.
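
Outside the GUI, the aggregation such a Pivot table would perform can be approximated with pandas. This is only a sketch of the idea, not the widget's implementation; the column names `cluster`, `x` and `y` are made up for the example.

```python
import pandas as pd

# Toy table (assumption): a categorical cluster column plus numeric features.
df = pd.DataFrame(
    {
        "cluster": ["C1", "C1", "C2", "C2", "C3"],
        "x": [0.0, 2.0, 10.0, 12.0, 5.0],
        "y": [1.0, 3.0, 4.0, 6.0, 5.0],
    }
)

# One row per group: the mean of each numeric column is the group's centroid.
centroids = df.groupby("cluster").mean()

# Other aggregates work the same way, e.g. the per-group median.
medians = df.groupby("cluster").median()
```

The rows of `centroids` can then be passed to hierarchical clustering in place of the original data.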

ajdapretnar · 2019-05-29T09:02:23Z

Ahem, #3823. 😁 Would this help? @mstrazar If so, this would be a nice thing to add to the docs.

janezd · 2019-11-29T16:55:38Z

Closed due to inactivity. Probably partially solved by Pivot widget.

janezd closed this as completed Nov 29, 2019
