A problem when using my own data #3

githubcooperation · 2023-05-06T01:35:15Z

I ran the following command:
python BEN_DA.py -t E:\ProgramResearch\BEN\data\train -l E:\ProgramResearch\BEN\data\label -r E:\ProgramResearch\BEN\data\raw-all -prefix sen2 -check RIA
While training the model, I find that the loss is NaN.

The following are my training data and my label data:

Can you give me some suggestions for solving this problem? Thank you!

grandjeanlab · 2024-05-17T09:59:24Z

I have the same issue.

Here is the data I use:
https://surfdrive.surf.nl/files/index.php/s/0mmRiXTveveiq8C

And the command line:
python /opt/BEN/BEN_DA.py -t epi_train_small -l epi_label_small -r epi_train -weight weight/Rat-EPI-94T-CAMRI_2022_07191156/.hdf5 -prefix stdrat_func_240517

Any idea?

grandjeanlab · 2024-05-17T10:27:37Z

This is also the case when using the data and command from the tutorial here: https://www.youtube.com/watch?v=VZFNDh3MliA

yu02019 · 2024-05-17T11:33:27Z

Hi Joanes,

Thank you for reporting this issue.

I ran the data you provided, and here are the results and logs. It appears that BEN is working well with this dataset.
Uploading epi_train_predict.zip…

However, based on your questions, I discovered a bug related to the "weight path" in the inference call to BEN. Please allow me some time to fix this issue and update the source code.

Thank you for your patience.

Best regards

yu02019 · 2024-05-17T11:36:02Z

The link I provided above seems to be unavailable. Please refer to this temporary link: https://drive.google.com/file/d/1itIhlgW2dpUMwy_qmrymv2HsXwd1XUoE/view?usp=sharing

grandjeanlab · 2024-05-17T12:10:41Z

Thanks for the fast response. Might it be installation-related? I run BEN on Linux within a Singularity container (the only way I found to install TF 1.15.4). Here is the definition file used to build the container: https://github.com/grandjeanlab/apptainer/tree/main/ubuntu_ben Maybe you could consider making Docker or Singularity versions available in the future?

These are the Python packages I have installed:
absl-py 0.10.0
asn1crypto 0.24.0
astor 0.8.1
certifi 2024.2.2
cryptography 2.1.4
cycler 0.11.0
decorator 4.4.2
gast 0.2.2
google-pasta 0.2.0
grpcio 1.32.0
h5py 2.10.0
idna 2.6
imageio 2.15.0
importlib-metadata 2.0.0
Keras 2.2.4
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
keyring 10.6.0
keyrings.alt 3.0
kiwisolver 1.3.1
Markdown 3.2.2
matplotlib 3.3.2
networkx 2.5.1
nibabel 3.0.0
numpy 1.16.0
opencv-python 4.1.2.30
opt-einsum 3.3.0
pandas 0.23.4
Pillow 8.4.0
pip 20.2.3
protobuf 3.13.0
pycrypto 2.6.1
pygobject 3.26.1
pyparsing 3.1.2
python-apt 1.6.5+ubuntu0.3
python-dateutil 2.9.0.post0
pytz 2024.1
PyWavelets 1.1.1
pyxdg 0.25
PyYAML 6.0.1
scikit-image 0.16.2
scipy 1.5.4
seaborn 0.9.0
SecretStorage 2.3.1
setuptools 50.3.0
SimpleITK 2.0.0
six 1.11.0
tensorboard 1.15.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.4
termcolor 1.1.0
Werkzeug 1.0.1
wheel 0.30.0
wrapt 1.12.1
zipp 3.2.0

grandjeanlab · 2024-05-17T12:15:05Z

Is the weight path issue related to BEN expecting the weights from unet_fp32_all_BN_NoCenterScale_polyic_epoch15_bottle256_04012051 if no weights are specified in the BEN_DA interface?

I put those in my path, but that didn't work either.
Let me know if I can do something else to help troubleshoot.

yu02019 · 2024-05-17T12:25:59Z

I suppose all you have to do in the scripts is maintain the proper weight path. This issue is not caused by the installation.

In the previous code, ".hdf5" needed to be added in BEN_infer.py but not in BEN_DA.py, which could cause confusion.

To address this issue, I added a line of code to unify the path calling methods. Now, both scripts (BEN_infer.py and BEN_DA.py) require only the folder path of the weight files.

BEN/utils/inference.py

Line 75 in 9c325e5

weight += '' if weight.endswith('.hdf5') else '.hdf5'
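
For illustration, the effect of that line can be sketched as a small standalone helper. The function name `normalize_weight_path` and the example paths are hypothetical and not part of BEN; only the one-line expression inside the function comes from the snippet quoted above.

```python
def normalize_weight_path(weight: str) -> str:
    """Append '.hdf5' unless the path already ends with it.

    Hypothetical helper: it wraps the one-line expression quoted above
    so that its behavior can be checked in isolation.
    """
    weight += '' if weight.endswith('.hdf5') else '.hdf5'
    return weight


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for path in ("weight/my_model/", "weight/my_model/.hdf5"):
        print(path, "->", normalize_weight_path(path))
```

Both example inputs map to the same string, so the path is accepted with or without the suffix. This is plain string concatenation: a folder path without a trailing separator gets the suffix appended directly to the folder name.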

grandjeanlab · 2024-05-17T17:47:54Z

Cool, but that does not seem to be it. I still get empty masks when using the weights generated by BEN_DA.
The log does not show any kind of error. This is with your demo dataset.

log.txt

Would you have any ideas about what I could try?

yu02019 · 2024-05-20T14:32:39Z

I haven't encountered this bug (loss being NaN) before. Initially, I thought it might be due to the model not correctly loading the pre-trained weights. However, it now seems more likely to be an environment-related issue.

Here are my logs and the installed Python libraries. I am running this on Windows 10 with a Titan RTX GPU.
conda list.txt
pip list.txt
log-our-20240520.txt
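
One possible way to compare two package lists like these is sketched below. It assumes plain `pip list` output with one `name version` pair per line; the helper names `parse_pip_list` and `diff_versions` are hypothetical, and three-column `conda list` lines are simply skipped.

```python
from typing import Dict, Optional, Tuple


def parse_pip_list(text: str) -> Dict[str, str]:
    """Parse 'name version' lines into a {lowercased name: version} mapping."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the 'Package Version' header and the '----' rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_versions(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return packages whose versions differ or that exist on one side only."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # File names are placeholders for two saved 'pip list' outputs.
    with open("pip_list_a.txt") as fa, open("pip_list_b.txt") as fb:
        differences = diff_versions(parse_pip_list(fa.read()), parse_pip_list(fb.read()))
    for name, (left, right) in differences.items():
        print(f"{name}: {left} vs {right}")
```

A value of `None` on one side means the package is missing from that list.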

This is just my speculation, but could it be a problem with the CUDA environment? I am using a conda-managed environment with the following setup (shown by the command "conda list"):

cudatoolkit               10.0.130                      0
cudnn                     7.6.5                cuda10.0_0

I am not sure whether a pip-managed environment calls the host machine's CUDA installation directly. If it does, you might need to keep the CUDA version on the host machine compatible with the installed TensorFlow version. Alternatively, you could try installing via a conda virtual environment.
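
A minimal sketch of how the TensorFlow and CUDA setup could be inspected from Python is shown below. The function name `describe_tensorflow_environment` is hypothetical; the TensorFlow calls it uses (`tf.test.is_built_with_cuda` and `tf.test.is_gpu_available`) exist in TF 1.15, and the latter is deprecated in TF 2.x.

```python
def describe_tensorflow_environment() -> dict:
    """Collect basic facts about the TensorFlow/CUDA setup.

    Returns {'tensorflow': None} when TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None}

    return {
        "tensorflow": tf.__version__,
        # True if this TensorFlow build was compiled with CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # True if TensorFlow can actually initialize a GPU device.
        "gpu_available": tf.test.is_gpu_available(),
    }


if __name__ == "__main__":
    for key, value in describe_tensorflow_environment().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is True but `gpu_available` is False, TensorFlow could not use a GPU in that environment. That would be consistent with a CUDA mismatch, although it does not by itself explain a NaN loss.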

grandjeanlab · 2024-05-21T18:07:03Z

Hi,
I tried managing the environment with conda instead of containers, and that seems to work.

Before closing the issue, here is what worked,
using CUDA 10.2 on AlmaLinux 8.6:

conda create -y -n ben python=3.6
source activate ben
git clone --branch doc https://github.com/yu02019/BEN.git
cd BEN
pip install -r requirements.txt

Or on Google Colab, using a trick I found (but lost the link to):

%env PYTHONPATH = # /env/python

!wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
!chmod +x Miniconda3-py38_4.12.0-Linux-x86_64.sh
!./Miniconda3-py38_4.12.0-Linux-x86_64.sh -b -f -p /usr/local
!conda update conda -y

import sys
sys.path.append('/usr/local/lib/python3.8/site-packages')

!conda create -y -n ben python=3.6

%%shell
eval "$(conda shell.bash hook)"
conda activate ben
pip install ipykernel
git clone --branch doc https://github.com/yu02019/BEN.git
cd BEN
pip install -r requirements.txt

Every subsequent cell then needs to start with the following lines. Note that this only works for running Python scripts!

%%shell
eval "$(conda shell.bash hook)"
conda activate ben

yu02019 added the bug label on May 17, 2024