Processed Data Access #15

Open
Qiqing-Fu opened this issue Dec 15, 2023 · 5 comments

Comments

@Qiqing-Fu

It is difficult for us to download the pre-trained model and the pre-processed data stored on Amazon. Could you share them with me by email?

@Qiqing-Fu
Author

Also, where can I find /home/ubuntu/COVID_Data/NeuroCOVID/TrainSplitData/NeuroCOVID_preprocessed_splitted.h5ad and /home/ubuntu/scGAN_ProcessedData/MADE_BY_scGAN/20Kneurons_2KTest.h5?

@dr-aheydari
Member

dr-aheydari commented Dec 15, 2023

Hi @Seraph-009,

Thank you for your interest in ACTIVA! Given the size and number of the files, it is best to share them through a cloud service such as AWS, which is what we have done.

If you are not familiar with downloading data from AWS, I highly recommend taking a look at issue #14. I provided some guidance on different ways of downloading our data from AWS there : )

The path in the code is a local path that we used for training (on a virtual machine). You would need to replace it with the path to the data on your own machine once the data is downloaded.
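As a rough illustration of that path change, here is a minimal sketch that maps a hard-coded training path to a directory on your own machine by file name. The environment variable `ACTIVA_DATA_DIR` and both helper functions are hypothetical names made up for this example; they are not part of ACTIVA.

```python
import os
from pathlib import Path, PurePosixPath

# Hypothetical environment variable for this sketch; ACTIVA does not define it.
DATA_DIR_ENV = "ACTIVA_DATA_DIR"


def resolve_data_path(filename, default_dir="./data"):
    """Return the path to `filename` inside the local data directory.

    The directory is read from the environment variable if it is set,
    otherwise `default_dir` is used.
    """
    base = Path(os.environ.get(DATA_DIR_ENV, default_dir)).expanduser()
    return base / filename


def remap_training_path(original_path, default_dir="./data"):
    """Keep only the file name of a hard-coded path and look for it locally."""
    # The hard-coded paths are POSIX paths from the training machine,
    # so they are parsed as POSIX regardless of the local OS.
    filename = PurePosixPath(original_path).name
    return resolve_data_path(filename, default_dir)


if __name__ == "__main__":
    hard_coded = (
        "/home/ubuntu/COVID_Data/NeuroCOVID/TrainSplitData/"
        "NeuroCOVID_preprocessed_splitted.h5ad"
    )
    print(remap_training_path(hard_coded))
```

With this approach, the downloaded files can live in any directory, and only the environment variable (or `default_dir`) needs to change between machines.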

I hope this helps. Please let me know if you have any other questions : )

Best,
Ali

@Qiqing-Fu
Author

Qiqing-Fu commented Dec 16, 2023

Thank you for your kind reply.
However, registering an Amazon account requires a VISA card, which I don't have at the moment. I live in China, where we don't use VISA cards.
Could you provide the script that converts raw_68kPBMCs.h5ad into 68kPBMCs_7kTest.h5ad? That might help me more. Thank you!

@dr-aheydari
Member

Hi @Seraph-009,

Sorry to hear about the AWS registration issues. Of course, I'd be happy to point you to the notebooks for splitting single-cell files into train/test sets.

This notebook shows how one can use our SCProcessing pipeline for splitting the data into train/test/validation sets. This is how we went from <dataset>.h5ad to <dataset>_xkTrain/Test.h5ad : )
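For readers without access to the notebook, the general idea of a random hold-out split can be sketched as follows. This is not the SCProcessing pipeline itself, only a generic illustration under stated assumptions: the function name `split_indices` and its parameters are hypothetical, and the split is a plain seeded shuffle with no stratification by cell type.

```python
import random


def split_indices(n_cells, n_test, n_valid=0, seed=0):
    """Randomly partition cell indices into train, validation, and test sets.

    Hypothetical helper for illustration; not ACTIVA's or SCProcessing's API.
    Returns three sorted, disjoint lists that together cover range(n_cells).
    """
    if n_test < 0 or n_valid < 0 or n_test + n_valid > n_cells:
        raise ValueError("n_test and n_valid must be non-negative and fit in n_cells")

    rng = random.Random(seed)  # seeded so the split is reproducible
    indices = list(range(n_cells))
    rng.shuffle(indices)

    test = sorted(indices[:n_test])
    valid = sorted(indices[n_test:n_test + n_valid])
    train = sorted(indices[n_test + n_valid:])
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = split_indices(n_cells=1000, n_test=100, n_valid=50, seed=42)
    print(len(train), len(valid), len(test))
```

Assuming the data is loaded as an AnnData object named `adata` (an assumption; the thread does not show the loading code), index lists like these could be used to subset it, for example `adata[test].copy()`, and each subset could then be written out with `write_h5ad`.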

I hope this helps!

Best,
Ali

@Qiqing-Fu
Author

Qiqing-Fu commented Dec 22, 2023

Hi @dr-aheydari,
Thank you for your instructions; they worked!
However, looking at the pipeline, I see that the labels of the generated cells are subsampled from the raw labels, which puzzles me. Does this approach work?
For example, suppose I have raw data with 9000 cells and their corresponding cell types. If I want to generate 20000 cells, how should I use ACTIVA?
I appreciate your patience.

2 participants