
could you provide the documentation/instructions on how to use your code? #2

Open
quincy-125 opened this issue Mar 6, 2021 · 27 comments

Comments

@quincy-125

Hello, I plan to use your code on our own svs H&E-stained Whole Slide Images. But I did not find any documentation in your repo, could you provide more information on the input data preprocessing and code instructions? Thanks!

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Hello @quincy-125 ,

Thank you for your interest in running our code. You're right, the code is still insufficiently documented; I plan to expand the documentation substantially over the coming 1-2 weeks, once I get a few priorities out of the way. For now, here's a sample code snippet to train NuCLS using this code base.

import sys
import os
from os.path import join as opj
import argparse

parser = argparse.ArgumentParser(description='Train nucleus model.')
parser.add_argument('-f', type=int, default=[1], nargs='+', help='fold(s) to run')
parser.add_argument('-g', type=int, default=[0], nargs='+', help='gpu(s) to use')
parser.add_argument('--qcd', type=int, default=1, help='use QCd data for training?')
parser.add_argument('--train', type=int, default=1, help='train?')
parser.add_argument('--vistest', type=int, default=1, help='visualize results on testing?')
args = parser.parse_args()
args.qcd = bool(args.qcd)
args.train = bool(args.train)
args.vistest = bool(args.vistest)

# GPU allocation MUST happen before importing other modules
from GeneralUtils import save_configs, maybe_mkdir, AllocateGPU
AllocateGPU(GPUs_to_use=args.g)

from nucleus_model.MiscUtils import load_saved_otherwise_default_model_configs
from configs.nucleus_model_configs import CoreSetQC, CoreSetNoQC
from nucleus_model.NucleusWorkflows import run_one_maskrcnn_fold

# %%===========================================================================
# Configs

# NOTE: BASEPATH is not defined anywhere in this snippet; set it to the root
# of your local clone of the repository
BASEPATH = '/path/to/NuCLS'

model_name = '002_MaskRCNN_tmp'
dataset_name = CoreSetQC.dataset_name if args.qcd else CoreSetNoQC.dataset_name
all_models_root = opj(BASEPATH, f'results/tcga-nucleus/models/{dataset_name}/')
model_root = opj(all_models_root, model_name)
maybe_mkdir(model_root)

# load configs
configs_path = opj(model_root, 'nucleus_model_configs.py')
cfg = load_saved_otherwise_default_model_configs(configs_path=configs_path)

# for reproducibility, copy configs & most relevant code file to results
if not os.path.exists(configs_path):
    save_configs(
        configs_path=opj(BASEPATH, 'configs/nucleus_model_configs.py'),
        results_path=model_root)
save_configs(
    configs_path=os.path.abspath(__file__),
    results_path=model_root, warn=False)
save_configs(
    configs_path=opj(BASEPATH, 'nucleus_model/NucleusWorkflows.py'),
    results_path=model_root, warn=False)

# %%===========================================================================
# Now run

for fold in args.f:
    run_one_maskrcnn_fold(
        fold=fold, cfg=cfg, model_root=model_root, model_name=model_name,
        qcd_training=args.qcd, train=args.train, vis_test=args.vistest)

# %%===========================================================================

@quincy-125 (Author)

Thanks! Quick question here, what's the input data for your model? Is an entire svs slide or the png image patches? Do you need the binary mask image with the annotated cells? Thanks!

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Sure thing. For model training, the input is the patches together with the annotations, loaded however you prefer. In my case, I preferred to use the annotation csv files. For model inference using the trained models, you can use the trained weights on anything you'd like, including stand-alone .png images or tiles from a slide that are fetched using, say, openslide.

@quincy-125 (Author)

So that means I need to load the image patches and a csv file describing the cells in each patch? For example, the csv file should ideally have at least two columns: patch names and the corresponding number of cells. Am I interpreting what you said correctly? Thanks for your immediate response; highly appreciated!

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Yes, you are correct. Each patch RGB image has an associated csv file, which contains the coordinates and classes of the nuclei in that patch. You can read more about the data format on this page. Let me know if anything is unclear; always happy to help :)
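To make the format concrete, here is a minimal sketch of reading one such per-patch annotation csv with pandas. The column names (xmin, ymin, xmax, ymax, group) are illustrative assumptions, not the exact NuCLS schema; check the dataset documentation page linked above for the real column names.

```python
import io
import pandas as pd

# A toy per-patch annotation csv: one row per nucleus.
# Column names here are illustrative placeholders, not the exact schema.
toy_csv = io.StringIO(
    "xmin,ymin,xmax,ymax,group\n"
    "10,12,34,40,tumor\n"
    "55,60,80,91,lymphocyte\n"
)

df = pd.read_csv(toy_csv)

# Bounding boxes as an N x 4 list, plus one class label per nucleus
boxes = df[['xmin', 'ymin', 'xmax', 'ymax']].values.tolist()
labels = df['group'].tolist()

print(boxes)   # [[10, 12, 34, 40], [55, 60, 80, 91]]
print(labels)  # ['tumor', 'lymphocyte']
```

In a real pipeline you would pair each such csv with its RGB patch image of the same name before feeding both to the data loader.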

@quincy-125 (Author)

Thanks a lot. Could you provide me a link to your test sample data that I could play with? By the way, I did not see your email on the manuscript; if you don't mind, could you leave your email here? If I run into any problems, I will probably need to reach out to you again.

@kheffah (Collaborator)

kheffah commented Mar 8, 2021

Sure thing. When I prototype, I usually use the corrected single-rater dataset, the link for which you can find on this page. You can either download the full dataset or just a few images to play around with for prototyping.
Actually, anything that happens in this repository sends me an email directly (I "watch" it), so the preferred way to ask questions is here, through GitHub issues. That said, if you have any requests or questions that are not suitable for public viewing, feel free to email me at: [email protected] .

@quincy-125 (Author)

Sounds good. Thank you

@demonhawk007

Hi, I have a query regarding where to locate "configs.nucleus_style_defaults". Could you please help me? I couldn't find it in the dependencies.

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

Hi @demonhawk007 , thank you for the question. That was just a typo: I renamed the folder config to configs within this repository. You should now be able to find it.

@demonhawk007

The issue is not the typo; I had already corrected that. I am unable to find "nucleus_style_defaults": I receive "ModuleNotFoundError: No module named 'configs.nucleus_style_defaults'".

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

@demonhawk007 Ah, I see what you mean. You are right: this repo is not yet a "package", as there is no setup.py etc. For now, just make sure to add the path of the repository root at the start of your script:

import sys
sys.path.insert(0, '/path/to/this/repository')

Then all of these imports will work. The file you are looking for is here.

@demonhawk007

What I am trying to say is that "nucleus_style_defaults" is supposed to be a Python file inside the configs folder, which can be found here. Since you are using "from configs import nucleus_style_defaults", Python is unable to locate it. On the other hand, "nucleus_model_configs.py" is present, which is why we are able to import that. I hope that makes my point clear.

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

@demonhawk007 Whoops! Thank you for pointing this out. In my head, I was confusing nucleus_model_configs.py with nucleus_style_defaults.py. OK, I added it, and you should see it now here.

@demonhawk007

demonhawk007 commented Apr 25, 2021

Hi @kheffah , thank you. I have another follow-up query about the file you mentioned here: what does __file__ refer to? A sample would be extremely helpful.

@kheffah (Collaborator)

kheffah commented Apr 25, 2021

@demonhawk007 Sure thing. __file__ here is literally the same script you are running. For example, if you run:

import os
print(os.path.abspath(__file__))

from a script file called myPythonFile.py, it would print something like /path/to/myPythonFile.py. See here for an example.

@demonhawk007

Thank you @kheffah. I am very close to getting the setup running. Currently I am facing issues with the sqlite database setup; I will follow up here if any clarification is needed.

@demonhawk007

Hi @kheffah , I would like to know if there is a separate utility/dependency file for the database, as I don't see where the tables are created.

@zunairaR

zunairaR commented May 1, 2021

Salam, I was thinking of using this dataset, but I couldn't work out what image size the model expects. The paper states the crop size is 300. Does that mean all the RGB patch images (.png format, with variable sizes) are resized to 300 pixels, or are patches extracted from the patch images using some sliding-window technique?
Secondly, why does the number of nuclei differ between the detection, segmentation, and classification tasks?
Thanks

@zunairaR

zunairaR commented May 2, 2021

I have one more query regarding the available FOVs: have they already undergone stain normalization, or do we need to do it before preparing the data for DL? Thanks

@kheffah (Collaborator)

kheffah commented May 10, 2021

@demonhawk007 Apologies for the delay. Actually, the database is just an sqlite version of the csv files that were publicly shared. If you convert the csv files to sqlite, you will have the data you need. The only reason we shared them as csv is that csv files are more platform-agnostic and widely recognized.
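As an illustration of that equivalence, here is a minimal round-trip sketch using pandas and sqlite3. The table and column names are made up for the example; the shared csv files define the real schema.

```python
import sqlite3
import pandas as pd

# A toy annotations table standing in for one of the shared csv files
df = pd.DataFrame({
    'fovname': ['TCGA-XX_001', 'TCGA-XX_001'],
    'group': ['tumor', 'stroma'],
})

# csv -> sqlite is just a to_sql call on the parsed frame
con = sqlite3.connect(':memory:')
df.to_sql('annotations', con, if_exists='replace', index=False)

# ...and the sqlite table then holds exactly the same rows
back = pd.read_sql('SELECT * FROM annotations', con)
print(back.equals(df))  # True
```

For the real dataset you would loop over the csv files and write each into its own table, as in the conversion snippet further down this thread.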

@kheffah (Collaborator)

kheffah commented May 10, 2021

@zunairaR Salam, thank you for your question. I hope my response to the other issue answered your questions about the crop size and stain normalization. As for your second question, the number of nuclei will be different for each image, and that's OK. In fact, MaskRCNN (and by extension, our modified NuCLS version) already handles this by setting a very large maximum number of detections per image, say 300 nuclei, then using non-maximum suppression to remove detections that are unrealistically close and overlapping.
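For intuition, here is a generic greedy non-maximum suppression sketch in numpy. It is not the exact MaskRCNN/NuCLS implementation, just the standard idea described above: keep the highest-scoring box, drop remaining boxes that overlap it beyond an IoU threshold, and repeat.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; returns kept indices."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with each remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Keep only boxes that do not overlap box i too much
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]; box 1 is suppressed by box 0
```

In the detector this runs over the model's raw detections, which is why a large fixed maximum number of detections per image still yields a variable number of final nuclei.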

@Amshoreline

> @demonhawk007 Apologies for the delay. Actually, the database is just an sqlite version of the csv files that were publicly shared. If you convert the csv to sqlite, you would have the data you need. The only reason we shared them as csv is because these are more platform agnostic and widely recognized.

Could you provide the sqlite version of the csv files? Thanks

@player1321

player1321 commented Jun 10, 2021

@kheffah I tried to convert the csv files to sqlite with the following code, but there are still some missing columns. Can you provide the sqlite file or a script to generate the sqlite database from the csv files?

import os
import sqlite3
import pandas as pd

def get_fov_meta(csv_file):
    # FOV names look like '<slide-name>_<fov-id>...', so split on underscores
    df = pd.read_csv(csv_file)
    df['fov_id'] = df['fovname'].apply(lambda x: x.split('_')[1])
    df['slide_name'] = df['fovname'].apply(lambda x: x.split('_')[0])
    return df

def get_annotation_elements(csv_folder):
    # One csv per FOV; derive slide/fov identifiers from the file name
    csv_file_list = [i for i in os.listdir(csv_folder) if 'TCGA' in i]
    dfs = []
    for csv_file in csv_file_list:
        df = pd.read_csv(os.path.join(csv_folder, csv_file))
        df['fov_id'] = csv_file[:-4].split('_')[1]
        df['slide_name'] = csv_file[:-4].split('_')[0]
        df['fovname'] = csv_file[:-4]
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)

con = sqlite3.connect('QC.sqlite')
df = get_fov_meta('ALL_FOV_LOCATIONS.csv')
df_all = get_annotation_elements('QC/csv')
df.to_sql('fov_meta', con, if_exists='replace', index=False)
df_all.to_sql('annotation_elements', con, if_exists='replace', index=False)

@kheffah (Collaborator)

kheffah commented Jun 10, 2021

@Amshoreline @player1321 Please use the following links to access the sqlite database:

Important note: this is RAW data! The csv files are better suited for use, but feel free to use the raw data if you have a very strong preference for sqlite.

@Amshoreline

> @Amshoreline @player1321 Please use the following links to access the sqlite database:
>
> Important note: this is RAW data! The csv files are better suited for use, but feel free to use the raw data if you have a very strong preference for sqlite.

Thanks, but I encountered a new problem:

  File "NuCLS/nucls_model/DataLoadingUtils.py", line 539, in __getitem__
    boxes=np.int32(target['boxes']))
  File "NuCLS/nucls_model/DataFormattingUtils.py", line 67, in from_dense_to_sparse_object_mask
    obj_ids = dense_mask[ys[:, 0], xs[:, 0]]
IndexError: too many indices for array

@chenqz1998

I'm still not sure how to use your code. Could you please explain that?
