
could you provide the documentation/instructions on how to use your code? #2

Open
quincy-125 opened this issue Mar 6, 2021 · 27 comments

Comments

@quincy-125

Hello, I plan to use your code on our own svs H&E-stained Whole Slide Images. But I did not find any documentation in your repo, could you provide more information on the input data preprocessing and code instructions? Thanks!

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Hello @quincy-125 ,

Thank you for your interest in running our code. You're right, the code is still insufficiently documented; I plan to expand the documentation substantially over the coming 1-2 weeks, once I get a few priorities out of the way. For now, here's a sample code snippet to train NuCLS using this code base.

import sys
import os
from os.path import join as opj
import argparse

parser = argparse.ArgumentParser(description='Train nucleus model.')
parser.add_argument('-f', type=int, default=[1], nargs='+', help='fold(s) to run')
parser.add_argument('-g', type=int, default=[0], nargs='+', help='gpu(s) to use')
parser.add_argument('--qcd', type=int, default=1, help='use QCd data for training?')
parser.add_argument('--train', type=int, default=1, help='train?')
parser.add_argument('--vistest', type=int, default=1, help='visualize results on testing?')
args = parser.parse_args()
args.qcd = bool(args.qcd)
args.train = bool(args.train)
args.vistest = bool(args.vistest)

# GPU allocation MUST happen before importing other modules
from GeneralUtils import save_configs, maybe_mkdir, AllocateGPU
AllocateGPU(GPUs_to_use=args.g)

from nucleus_model.MiscUtils import load_saved_otherwise_default_model_configs
from configs.nucleus_model_configs import CoreSetQC, CoreSetNoQC
from nucleus_model.NucleusWorkflows import run_one_maskrcnn_fold

# %%===========================================================================
# Configs

# NOTE: BASEPATH is not defined anywhere in this snippet; set it to the root
# of your local clone of the repository
BASEPATH = '/path/to/NuCLS'

model_name = '002_MaskRCNN_tmp'
dataset_name = CoreSetQC.dataset_name if args.qcd else CoreSetNoQC.dataset_name
all_models_root = opj(BASEPATH, f'results/tcga-nucleus/models/{dataset_name}/')
model_root = opj(all_models_root, model_name)
maybe_mkdir(model_root)

# load configs
configs_path = opj(model_root, 'nucleus_model_configs.py')
cfg = load_saved_otherwise_default_model_configs(configs_path=configs_path)

# for reproducibility, copy configs & most relevant code file to results
if not os.path.exists(configs_path):
    save_configs(
        configs_path=opj(BASEPATH, 'configs/nucleus_model_configs.py'),
        results_path=model_root)
save_configs(
    configs_path=os.path.abspath(__file__),
    results_path=model_root, warn=False)
save_configs(
    configs_path=opj(BASEPATH, 'nucleus_model/NucleusWorkflows.py'),
    results_path=model_root, warn=False)

# %%===========================================================================
# Now run

for fold in args.f:
    run_one_maskrcnn_fold(
        fold=fold, cfg=cfg, model_root=model_root, model_name=model_name,
        qcd_training=args.qcd, train=args.train, vis_test=args.vistest)

# %%===========================================================================

@quincy-125 (Author)

Thanks! Quick question here, what's the input data for your model? Is an entire svs slide or the png image patches? Do you need the binary mask image with the annotated cells? Thanks!

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Sure thing. For model training, the input is the patches together with the annotations, loaded however you prefer. In my case, I preferred to use the annotation csv files. For model inference using the trained models, you can use the trained weights on anything you'd like, including stand-alone .png images or tiles from a slide that are fetched using, say, openslide.

@quincy-125 (Author)

So that means I need to load the image patches and a csv file describing the cells in each patch? For example, the csv file should ideally have at least two columns: patch names and the corresponding number of cells. Am I interpreting what you said correctly? Thanks for your immediate response; highly appreciated!

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Yes, you are correct. Each patch RGB image has an associated csv file, which contains the coordinates and classes of the nuclei in that patch. You can read more about the data format on this page. Let me know if anything is unclear; always happy to help :)
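To make the format concrete, here is a minimal sketch of reading one such per-patch annotation csv with pandas. The column names (xmin, ymin, xmax, ymax, group) are illustrative assumptions, not the exact NuCLS schema; check the dataset documentation page linked above for the real column names.

```python
import io
import pandas as pd

# A toy per-patch annotation csv: one row per nucleus.
# Column names here are illustrative placeholders, not the exact schema.
toy_csv = io.StringIO(
    "xmin,ymin,xmax,ymax,group\n"
    "10,12,34,40,tumor\n"
    "55,60,80,91,lymphocyte\n"
)

df = pd.read_csv(toy_csv)

# Bounding boxes as an N x 4 list, plus one class label per nucleus
boxes = df[['xmin', 'ymin', 'xmax', 'ymax']].values.tolist()
labels = df['group'].tolist()

print(boxes)   # [[10, 12, 34, 40], [55, 60, 80, 91]]
print(labels)  # ['tumor', 'lymphocyte']
```

In a real pipeline you would pair each such csv with its RGB patch image of the same name before feeding both to the data loader.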

@quincy-125 (Author)

Thanks a lot. Could you provide me a link to your test sample data that I could play with? By the way, I did not see your email on the manuscript; if you don't mind, could you leave your email here? If I run into any problems, I will probably need to reach out to you again.

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Sure thing. When I prototype, I usually use the corrected single-rater dataset, the link for which you can find on this page. You can either download the full dataset or just a few images to play around with for prototyping.
Actually, anything that happens in this repository sends me an email directly (I "watch" it), so the preferred way to ask questions is here, through GitHub issues. That said, if you have any requests or questions that are not suitable for public viewing, feel free to email me at: [email protected] .

@quincy-125 (Author)

Sounds good. Thank you

@demonhawk007

Hi, I have a query regarding where to locate "configs.nucleus_style_defaults". Could you please help me? I couldn't find it in the dependencies.

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

Hi @demonhawk007 , thank you for the question. That was just a typo: I renamed the folder config to configs within this repository. You should now be able to find it.

@demonhawk007

The issue is not the typo; I had already corrected that. I am unable to find "nucleus_style_defaults": I receive "ModuleNotFoundError: No module named 'configs.nucleus_style_defaults'".

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

@demonhawk007 Ah, I see what you mean. You are right: this repo is not yet a "package", as there is no setup.py etc. For now, just make sure to add the path of the repository root at the start of your script:

import sys
sys.path.insert(0, '/path/to/this/repository')

Then all of these imports will work. The file you are looking for is here.

@demonhawk007

What I am trying to say is that "nucleus_style_defaults" is supposed to be a Python file inside the configs folder, which can be found here. Since you are using "from configs import nucleus_style_defaults", Python is unable to locate it. On the other hand, "nucleus_model_configs.py" is present, which is why we are able to import that. I hope that makes my point clear.

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

@demonhawk007 Whoops! Thank you for pointing this out. In my head, I was confusing nucleus_model_configs.py with nucleus_style_defaults.py. OK, I added it, and you should see it now here.

@demonhawk007

demonhawk007 commented Apr 25, 2021

Hi @kheffah , thank you. I have another follow-up query about the file you mentioned here: what does __file__ refer to? A sample would be extremely helpful.

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

@demonhawk007 Sure thing. __file__ here is literally the same script you are running. For example, if you run:

import os
print(os.path.abspath(__file__))

from a script file called myPythonFile.py, it would print something like /path/to/myPythonFile.py. See here for an example.

@demonhawk007

Thank you @kheffah. I am very close to getting the setup running. Currently I am facing issues with the sqlite database setup; I will follow up here if any clarification is needed.

@demonhawk007

Hi @kheffah , I would like to know if there is a separate utility/dependency file for the database, as I don't see where the tables are created.

@zunairaR

zunairaR commented May 1, 2021

Salam, I was thinking of using this dataset, but I couldn't work out what image size the model expects. The paper states the crop size is 300. Does that mean all the RGB patch images (.png format, with variable sizes) are resized to 300 pixels, or are patches extracted from the patch images using some sliding-window technique?
Secondly, why does the number of nuclei differ between the detection, segmentation, and classification tasks?
Thanks

@zunairaR

zunairaR commented May 2, 2021

I have one more query regarding the available FOVs: have they already undergone stain normalization, or do we need to do it before preparing the data for DL? Thanks

@kheffah (Collaborator)

kheffah commented May 10, 2021

@demonhawk007 Apologies for the delay. Actually, the database is just an sqlite version of the csv files that were publicly shared. If you convert the csv files to sqlite, you will have the data you need. The only reason we shared them as csv is that csv files are more platform-agnostic and widely recognized.
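As an illustration of that equivalence, here is a minimal round-trip sketch using pandas and sqlite3. The table and column names are made up for the example; the shared csv files define the real schema.

```python
import sqlite3
import pandas as pd

# A toy annotations table standing in for one of the shared csv files
df = pd.DataFrame({
    'fovname': ['TCGA-XX_001', 'TCGA-XX_001'],
    'group': ['tumor', 'stroma'],
})

# csv -> sqlite is just a to_sql call on the parsed frame
con = sqlite3.connect(':memory:')
df.to_sql('annotations', con, if_exists='replace', index=False)

# ...and the sqlite table then holds exactly the same rows
back = pd.read_sql('SELECT * FROM annotations', con)
print(back.equals(df))  # True
```

For the real dataset you would loop over the csv files and write each into its own table, as in the conversion snippet further down this thread.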

@kheffah (Collaborator)

kheffah commented May 10, 2021

@zunairaR Salam, thank you for your question. I hope my response to the other issue answered your questions about the crop size and stain normalization. As for your second question, the number of nuclei will be different for each image, and that's OK. In fact, MaskRCNN (and by extension, our modified NuCLS version) already handles this by setting a very large maximum number of detections per image, say 300 nuclei, then using non-maximum suppression to remove detections that are unrealistically close and overlapping.
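For intuition, here is a generic greedy non-maximum suppression sketch in numpy. It is not the exact MaskRCNN/NuCLS implementation, just the standard idea described above: keep the highest-scoring box, drop remaining boxes that overlap it beyond an IoU threshold, and repeat.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; returns kept indices."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with each remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Keep only boxes that do not overlap box i too much
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]; box 1 is suppressed by box 0
```

In the detector this runs over the model's raw detections, which is why a large fixed maximum number of detections per image still yields a variable number of final nuclei.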

@Amshoreline

> @demonhawk007 Apologies for the delay. Actually, the database is just an sqlite version of the csv files that were publicly shared. If you convert the csv to sqlite, you would have the data you need. The only reason we shared them as csv is because these are more platform agnostic and widely recognized.

Could you provide the sqlite version of the csv files? Thanks

@player1321

player1321 commented Jun 10, 2021

@kheffah I tried to convert the csv files to sqlite with the following code, but there are still some missing columns. Can you provide the sqlite file or a script to generate the sqlite database from the csv files?

import os
import sqlite3
import pandas as pd

def get_fov_meta(csv_file):
    # FOV names look like '<slide-name>_<fov-id>...', so split on underscores
    df = pd.read_csv(csv_file)
    df['fov_id'] = df['fovname'].apply(lambda x: x.split('_')[1])
    df['slide_name'] = df['fovname'].apply(lambda x: x.split('_')[0])
    return df

def get_annotation_elements(csv_folder):
    # One csv per FOV; derive slide/fov identifiers from the file name
    csv_file_list = [i for i in os.listdir(csv_folder) if 'TCGA' in i]
    dfs = []
    for csv_file in csv_file_list:
        df = pd.read_csv(os.path.join(csv_folder, csv_file))
        df['fov_id'] = csv_file[:-4].split('_')[1]
        df['slide_name'] = csv_file[:-4].split('_')[0]
        df['fovname'] = csv_file[:-4]
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)

con = sqlite3.connect('QC.sqlite')
df = get_fov_meta('ALL_FOV_LOCATIONS.csv')
df_all = get_annotation_elements('QC/csv')
df.to_sql('fov_meta', con, if_exists='replace', index=False)
df_all.to_sql('annotation_elements', con, if_exists='replace', index=False)

@kheffah (Collaborator)

kheffah commented Jun 10, 2021

@Amshoreline @player1321 Please use the following links to access the sqlite database:

Important note: this is RAW data! The csv files are better suited for use, but feel free to use the raw data if you have a very strong preference for sqlite.

@Amshoreline

> @Amshoreline @player1321 Please use the following links to access the sqlite database:
>
> Important note: this is RAW data! The csv files are better suited for use, but feel free to use the raw data if you have a very strong preference for sqlite.

Thanks, but I encountered a new problem:

  File "NuCLS/nucls_model/DataLoadingUtils.py", line 539, in __getitem__
    boxes=np.int32(target['boxes']))
  File "NuCLS/nucls_model/DataFormattingUtils.py", line 67, in from_dense_to_sparse_object_mask
    obj_ids = dense_mask[ys[:, 0], xs[:, 0]]
IndexError: too many indices for array

@chenqz1998

I'm still not sure how to use your code. Could you please explain that?
