-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using pre-computed knn with preserve_neighbors #65
Comments
@jlmelville, sorry for the late response. Yes, with PyMDE, you can use pre-computed k-nearest neighbor data, with the caveat that you'll need to manually wire up the MDE problem (instead of calling This notebook has an example that shows how to make a neighborhood preserving embedding from scratch. Start reading from the section called "Embeddings, from scratch", which shows how to create the k-nearest neighbor graph: https://github.com/cvxgrp/pymde/blob/main/examples/mnist.ipynb Then, the code for creating a neighborhood-preserving embedding starts in the section called "Including dissimilar pairs". For more in-depth documentation:
Hope that helps! |
@akshayka thanks for pointing me to notebook. I think from looking at the functions called there, if I want to make my own import numpy as np
import pymde
import pynndescent
import scipy.sparse as sp
from pymde.preprocess.graph import Graph
mnist = pymde.datasets.MNIST()
# assume we use pynndescent for the purposes of this example
# we just need a precomputed n_items x n_neighbors numpy array of indices from somewhere
mnist_nnd = pynndescent.NNDescent(
mnist.data,
n_neighbors=16,
max_candidates=60,
).neighbor_graph
# ignore the "self" neighbors
idx = mnist_nnd[0][:, 1:]
# create the edge list
n_items = idx.shape[0]
n_neighbors = idx.shape[1]
items = np.arange(n_items)
items = np.repeat(items, n_neighbors)
edges = np.stack([items, idx.flatten()], axis=1)
# canonicalize edges
flip_idx = edges[:, 0] > edges[:, 1]
edges[flip_idx] = np.stack([edges[flip_idx][:, 1], edges[flip_idx][:, 0]], axis=1)
# create a sparse matrix
rows = edges[:, 0]
cols = edges[:, 1]
weights = np.ones(edges.shape[0], dtype=np.float32)
graph = sp.coo_matrix((weights, (rows, cols)), shape=(n_items, n_items))
# symmetrize
graph = graph + graph.T
graph = graph.tocsr()
graph = Graph(graph) then I can use |
Something like that does look right! Were you able to get it working? This is probably a common enough use case that we should consider including a helper function in the library to automate it. |
Yes thanks! I was able to make it work, although on top of a helper function to generate a Graph from nearest neighbors indices, |
Great! Thanks for reporting back. At the very least, I will keep this issue open so others can chime in if they'd like this functionality as well. |
Hello, is it possible to use
preserve_neighbors
with pre-computed k-nearest neighbors data in the distance-matrix-like form used by the likes of UMAP and t-SNE? In this format, the indices and distances are stored as separate matrices of shapen_objects, n_neighbors
. The element (i, j) in the index matrix is the index of thej
th nearest neighbor ofi
, and the equivalent element in the distance matrix is the distance between those two objects.I (think) I can work out how to manually convert that into the edges/weights form used by
pymde.Graph
(various acts ofmelt
ing andravel
ing), but I don't get the results I expect compared to passing the data directly. I think that's because whenpreprocess.generic.k_nearest_neighbors
is called, for the graph code path,graph.k_nearest_neighbors
hasgraph_distances=True
. So another way to put it is: is there a way to pass aGraph
so that it is interpreted as a distance matrix? This would be helpful because calculating the k-nearest neighbors is usually the slowest part of this sort of dimensionality reduction method.The text was updated successfully, but these errors were encountered: