[FEA] Cosine similarity for HDBSCAN #6123

Open
nivibilla opened this issue Oct 24, 2024 · 2 comments
Labels
feature request New feature or request

Comments

@nivibilla

Is your feature request related to a problem? Please describe.
I would like to use cosine similarity as a metric for HDBSCAN on high-dimensional data without dimensionality reduction, since the reduction step is quite slow on large datasets.

Describe the solution you'd like
Allow metric='cosine', e.g. clusterer = cuml.cluster.hdbscan.HDBSCAN(min_cluster_size=50, metric='cosine', prediction_data=True)

Describe alternatives you've considered
Regular DBSCAN supports the cosine metric, but it has issues with clusters of varying sizes.

@nivibilla added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels on Oct 24, 2024
@nivibilla
Author

Trying this currently raises the following error:

raise ValueError(f"metric '{self.metric}' not supported, only "

@divyegala
Member

@nivibilla for now you could L2-normalize your dataset and pass metric='euclidean' to get results equivalent to cosine distance.
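A minimal sketch of that workaround, assuming the data fits in a NumPy array on the host. The helper names `l2_normalize` and `cosine_distance` are illustrative and not part of cuML. The reason it works: for unit-length vectors, squared Euclidean distance equals twice the cosine distance, so Euclidean distance on normalized rows is a monotonic function of cosine distance and neighbor orderings agree. The distance values themselves differ, so anything derived from their magnitudes may not match exactly.

```python
import numpy as np


def l2_normalize(X, eps=1e-12):
    """Scale each row of X to unit Euclidean length (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero: all-zero rows stay all-zero.
    return X / np.maximum(norms, eps)


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two vectors (hypothetical helper)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Synthetic stand-in for the real high-dimensional dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Xn = l2_normalize(X)

# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b)).
sq_euclid = np.sum((Xn[0] - Xn[1]) ** 2)
cos_dist = cosine_distance(X[0], X[1])
identity_holds = bool(np.isclose(sq_euclid, 2.0 * cos_dist))

# Xn is what would then be passed to the clusterer from the request above,
# with the metric swapped (requires a GPU and cuML, so it is left commented):
# clusterer = cuml.cluster.hdbscan.HDBSCAN(
#     min_cluster_size=50, metric='euclidean', prediction_data=True)
# clusterer.fit(Xn)
```

Any new points used for prediction would need the same normalization before being passed to the fitted clusterer.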

@divyegala removed the "? - Needs Triage" (Need team to review and classify) label on Nov 5, 2024
2 participants