One hot encoding #2

Open
pakiessling opened this issue Sep 23, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@pakiessling

Description of feature

Hi Darius,

It was nice meeting you in person :)

I am trying out the one-hot encoding we talked about.

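import anndata as ad
import numpy as np
import pandas as pd

# One column per cell type, one row per cell (adata is the existing AnnData object)
cell_types = adata.obs['cell_type_tmp']
one_hot = pd.get_dummies(cell_types, dtype=np.int8)
one_hot_array = one_hot.values

# Wrap the one-hot matrix in a new AnnData and carry over the spatial coordinates
adata_one_hot = ad.AnnData(one_hot_array, obs=adata.obs)
adata_one_hot.obsm["spatial"] = adata.obsm["spatial"]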

Do I now just run nichepca on this?

@pakiessling pakiessling added the enhancement New feature or request label Sep 23, 2024
@dschaub95
Contributor

dschaub95 commented Sep 23, 2024

Hi Paul,

Yes, exactly! In our experiments, the one-hot input worked better for multi-slide integration, but then you need to omit the harmony part. I would suggest just copying the relevant parts of the nichepca function and implementing it yourself, since I have not had time yet to adapt the nichepca function. You only need these lines:

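import scanpy as sc
from anndata import AnnData

# construct_multi_sample_graph, knn_graph, distance_graph and aggregate
# are the nichepca helpers used inside the nichepca function.


def run_nichepca(
    adata: AnnData,
    knn: int = None,
    radius: float = None,
    sample_key: str = None,
    n_comps: int = 30,
    **kwargs,
):
    # Build the spatial graph, per sample if a sample key is given
    if sample_key is not None:
        construct_multi_sample_graph(
            adata, sample_key=sample_key, knn=knn, radius=radius, **kwargs
        )
    else:
        if knn is not None:
            knn_graph(adata, knn, **kwargs)
        elif radius is not None:
            distance_graph(adata, radius, **kwargs)
        else:
            raise ValueError("Either knn or radius must be provided.")

    # Aggregate features over each cell's neighborhood
    aggregate(adata)

    # PCA on the aggregated features (no harmony step)
    sc.tl.pca(adata, n_comps=n_comps)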
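
For intuition, here is a minimal NumPy-only sketch of what these three steps amount to on one-hot input: build a kNN graph on the spatial coordinates, average the one-hot vectors over each neighborhood (which gives a cell-type composition per cell), then run PCA on those compositions. This is not the nichepca implementation. The helper names (`knn_indices`, `neighborhood_composition`, `pca_scores`), the toy data, and the choice of mean aggregation that includes the cell itself are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(coords, k):
    """Brute-force k nearest neighbours (self excluded) for each row of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour here
    return np.argsort(dist, axis=1)[:, :k]


def neighborhood_composition(one_hot, coords, k):
    """Mean one-hot vector over each cell plus its k neighbours (assumed aggregation)."""
    nbrs = knn_indices(coords, k)
    self_idx = np.arange(len(coords))[:, None]
    idx = np.concatenate([self_idx, nbrs], axis=1)  # shape (n_cells, k + 1)
    return one_hot[idx].mean(axis=1)                # shape (n_cells, n_types)


def pca_scores(x, n_comps):
    """Plain PCA via SVD of the centred matrix; returns the component scores."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_comps] * s[:n_comps]


# Toy data: random positions and random cell-type labels (hypothetical)
rng = np.random.default_rng(0)
n_cells, n_types = 200, 4
coords = rng.uniform(0, 100, size=(n_cells, 2))
labels = rng.integers(0, n_types, size=n_cells)
one_hot = np.eye(n_types, dtype=np.float32)[labels]

comp = neighborhood_composition(one_hot, coords, k=10)
emb = pca_scores(comp, n_comps=3)
```

Each row of `comp` sums to 1, so with one-hot input the PCA is effectively run on neighborhood cell-type fractions rather than on aggregated expression.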

@pakiessling
Author

Thanks, I will give it a shot

@pakiessling
Author

import rapids_singlecell as rsc

# construct_multi_sample_graph and aggregate are the nichepca helpers from above
construct_multi_sample_graph(adata, sample_key="sample", knn=5)
aggregate(adata)

# Move to GPU, then PCA, neighbors, and Leiden at several resolutions
rsc.get.anndata_to_GPU(adata)
rsc.tl.pca(adata, n_comps=5)
rsc.pp.neighbors(adata)
rsc.tl.leiden(adata, resolution=0.1, key_added="nichepca_0.1")
rsc.tl.leiden(adata, resolution=0.5, key_added="nichepca_0.5")
rsc.tl.leiden(adata, resolution=0.3, key_added="nichepca_0.3")
rsc.tl.leiden(adata, resolution=0.8, key_added="nichepca_0.8")

I did this and got more than 6000 clusters for all of the resolutions 😅

Do you know what could cause this?

@pakiessling pakiessling reopened this Sep 23, 2024
@dschaub95
Contributor

Hi Paul,

Sorry for the late reply. I think it might be caused by the low number of neighbors (knn=5), which might lead to many similar neighborhood compositions. What happens if you run it with, say, knn=20 and n_comps=30?

Best
Darius
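
One way to see the effect of a small knn on one-hot input is to count how many distinct neighborhood compositions can occur at all. The sketch below is a toy illustration rather than a diagnosis of the run above: the random data, the helper name `count_distinct_compositions`, and the assumption that a neighborhood is the cell plus its k nearest neighbors are all hypothetical. With 3 cell types and k=5, each neighborhood holds 6 cells, so at most 28 different count vectors exist, and many cells end up with exactly the same features.

```python
import numpy as np


def count_distinct_compositions(labels, coords, n_types, k):
    """Count distinct cell-type count vectors over neighbourhoods of size k + 1."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Take the k + 1 closest cells; for distinct coordinates this is the
    # cell itself (distance 0) plus its k nearest neighbours.
    nbrs = np.argsort(dist, axis=1)[:, : k + 1]
    counts = np.zeros((len(labels), n_types), dtype=np.int64)
    for t in range(n_types):
        counts[:, t] = (labels[nbrs] == t).sum(axis=1)
    n_distinct = len(np.unique(counts, axis=0))
    return n_distinct, counts


# Toy data: random positions and labels (hypothetical, not from the issue)
rng = np.random.default_rng(1)
n_cells, n_types = 500, 3
coords = rng.uniform(0, 100, size=(n_cells, 2))
labels = rng.integers(0, n_types, size=n_cells)

n_small, counts_small = count_distinct_compositions(labels, coords, n_types, k=5)
n_large, counts_large = count_distinct_compositions(labels, coords, n_types, k=20)
print("distinct compositions, k=5: ", n_small)
print("distinct compositions, k=20:", n_large)
```

A larger knn allows a finer range of compositions (up to 253 count vectors for 3 types and k=20), which is consistent with the suggestion to try knn=20 together with more components.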
