"Model running produces NaN values." #12

Tbabtm · 2024-04-18T00:16:39Z

Hello, and thank you for your work. I ran into a problem when training your model on my own dataset: the run produces NaN values. My dataset has the same format as weather.csv, but with different field values and a different number of fields. Interestingly, the same dataset trains without any issues on other models, such as the ICLR spotlight paper 'iTransformer'. With all of your model's parameters left unchanged, training with seq_len=96 and pred_len in [96, 192, 336] results in NaN values and failure. However, training with seq_len=96 and pred_len=336 does not result in NaN values and is successful. I believe my data is fine, so there might be a bug in your model. The full error message is as follows:
Traceback (most recent call last):
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/run.py", line 112, in <module>
    exp.train(setting)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/exp/exp_main.py", line 143, in train
    outputs, balance_loss = self.model(batch_x)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/models/PathFormer.py", line 57, in forward
    out, aux_loss = layer(out)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 103, in forward
    gates, load = self.noisy_top_k_gating(new_x, self.training)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 94, in noisy_top_k_gating
    load = (self._prob_in_top_k(clean_logits, noisy_logits, noise_stddev, top_logits)).sum(0)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 61, in _prob_in_top_k
    prob_if_in = normal.cdf((clean_values - threshold_if_in) / noise_stddev)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/normal.py", line 87, in cdf
    self._validate_sample(value)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/distribution.py", line 300, in _validate_sample
    raise ValueError(
ValueError: Expected value argument (Tensor of shape (256, 4)) to be within the support (Real()) of the distribution Normal(loc: tensor([0.], device='cuda:0'), scale: tensor([1.], device='cuda:0')), but found invalid values:
tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], device='cuda:0', grad_fn=<DivBackward0>)
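
The traceback only shows where the NaNs were detected (the division feeding `normal.cdf` in `_prob_in_top_k`), not where they first appeared. To help rule out the data side, here is a minimal sketch of a check that could be run on the CSV before training. It is not part of the repository: the function name `scan_csv_for_bad_values`, the `skip` parameter, and the assumption that the timestamp column is called `date` are all hypothetical. It flags columns with NaN, inf, or non-numeric cells, and columns that are constant. Whether a constant column can actually trigger NaNs in this model is an assumption, not something the traceback confirms.

```python
import csv
import io
import math


def scan_csv_for_bad_values(lines, skip=("date",)):
    """Scan CSV text lines and return {column: [problems]}.

    Hypothetical helper: `skip` names columns to ignore (assumed here to be
    a timestamp column called "date").
    """
    reader = csv.reader(lines)
    header = next(reader)
    bad_cells = {name: 0 for name in header}
    minima, maxima = {}, {}
    for row in reader:
        for name, cell in zip(header, row):
            if name in skip:
                continue
            try:
                value = float(cell)
            except ValueError:
                # Empty or non-numeric cell.
                bad_cells[name] += 1
                continue
            if not math.isfinite(value):
                # NaN or +/-inf parsed from the file.
                bad_cells[name] += 1
                continue
            minima[name] = min(value, minima.get(name, value))
            maxima[name] = max(value, maxima.get(name, value))

    report = {}
    for name in header:
        problems = []
        if bad_cells[name]:
            problems.append(f"{bad_cells[name]} non-finite or non-numeric cells")
        if name in minima and minima[name] == maxima[name]:
            problems.append("constant column (zero variance)")
        if problems:
            report[name] = problems
    return report


# Made-up sample data, only to show the shape of the report.
sample = io.StringIO(
    "date,temp,flag,rain\n"
    "2024-01-01 00:00:00,1.5,0,0.2\n"
    "2024-01-01 00:10:00,nan,0,0.4\n"
    "2024-01-01 00:20:00,2.5,0,inf\n"
)
report = scan_csv_for_bad_values(sample)
for column, problems in report.items():
    print(column, problems)
```

For a real file, the same function can be called with an open file handle, for example `scan_csv_for_bad_values(open(path, newline=""))`, where `path` points at the dataset. An empty report would support the claim that the data is clean and point the investigation back at the gating code.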
