----------------------------------------
Device Setting
----------------------------------------
Entity embedding place: gpu
Relation embedding place: gpu
----------------------------------------
----------------------------------------
Embedding Setting
----------------------------------------
Entity embedding dimension: 400
Relation embedding dimension: 200
----------------------------------------
2022-12-03 20:54:31,717 INFO seed :0
2022-12-03 20:54:31,718 INFO data_path :/home/aistudio/data/
2022-12-03 20:54:31,718 INFO save_path :/home/aistudio/result/Rotate/rotate_OpenBG500_d_200_g_12.0_e_gpu_r_gpu_l_Logsigmoid_lr_0.018_0.1_KGE
2022-12-03 20:54:31,718 INFO init_from_ckpt :None
2022-12-03 20:54:31,718 INFO data_name :OpenBG500
2022-12-03 20:54:31,718 INFO use_dict :False
2022-12-03 20:54:31,718 INFO kv_mode :False
2022-12-03 20:54:31,718 INFO batch_size :1
2022-12-03 20:54:31,718 INFO test_batch_size :16
2022-12-03 20:54:31,718 INFO neg_sample_size :256
2022-12-03 20:54:31,718 INFO filter_eval :True
2022-12-03 20:54:31,718 INFO model_name :rotate
2022-12-03 20:54:31,718 INFO embed_dim :200
2022-12-03 20:54:31,718 INFO reg_coef :1e-07
2022-12-03 20:54:31,718 INFO loss_type :Logsigmoid
2022-12-03 20:54:31,718 INFO max_steps :1
2022-12-03 20:54:31,718 INFO lr :0.018
2022-12-03 20:54:31,718 INFO optimizer :adagrad
2022-12-03 20:54:31,718 INFO cpu_lr :0.1
2022-12-03 20:54:31,718 INFO cpu_optimizer :adagrad
2022-12-03 20:54:31,719 INFO mix_cpu_gpu :False
2022-12-03 20:54:31,719 INFO async_update :False
2022-12-03 20:54:31,719 INFO valid :True
2022-12-03 20:54:31,719 INFO test :False
2022-12-03 20:54:31,719 INFO task_name :KGE
2022-12-03 20:54:31,719 INFO num_workers :2
2022-12-03 20:54:31,719 INFO neg_sample_type :chunk
2022-12-03 20:54:31,719 INFO neg_deg_sample :True
2022-12-03 20:54:31,719 INFO neg_adversarial_sampling:True
2022-12-03 20:54:31,719 INFO adversarial_temperature:1.0
2022-12-03 20:54:31,719 INFO filter_sample :False
2022-12-03 20:54:31,719 INFO valid_percent :1.0
2022-12-03 20:54:31,719 INFO use_feature :False
2022-12-03 20:54:31,719 INFO reg_type :norm_er
2022-12-03 20:54:31,719 INFO reg_norm :3
2022-12-03 20:54:31,719 INFO weighted_loss :False
2022-12-03 20:54:31,719 INFO margin :1.0
2022-12-03 20:54:31,719 INFO pairwise :False
2022-12-03 20:54:31,719 INFO gamma :12.0
2022-12-03 20:54:31,719 INFO ote_scale :0
2022-12-03 20:54:31,719 INFO ote_size :1
2022-12-03 20:54:31,719 INFO quate_lmbda1 :0.0
2022-12-03 20:54:31,719 INFO quate_lmbda2 :0.0
2022-12-03 20:54:31,719 INFO num_epoch :30
2022-12-03 20:54:31,719 INFO scheduler_interval :-1
2022-12-03 20:54:31,720 INFO num_process :1
2022-12-03 20:54:31,720 INFO print_on_screen :True
2022-12-03 20:54:31,720 INFO log_interval :1000
2022-12-03 20:54:31,720 INFO save_interval :-1
2022-12-03 20:54:31,720 INFO eval_interval :20000
2022-12-03 20:54:31,720 INFO ent_emb_on_cpu :False
2022-12-03 20:54:31,720 INFO rel_emb_on_cpu :False
2022-12-03 20:54:31,720 INFO use_embedding_regularization:True
2022-12-03 20:54:31,720 INFO ent_dim :400
2022-12-03 20:54:31,720 INFO rel_dim :200
2022-12-03 20:54:31,720 INFO num_chunks :1
W1203 20:54:51.171296 20466 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1203 20:54:51.174912 20466 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py:3983: DeprecationWarning: Op `adagrad` is executed through `append_op` under the dynamic mode, the corresponding API implementation needs to be upgraded to using `_C_ops` method.
DeprecationWarning,
2022-12-03 20:54:54,271 INFO [evaluation] start...
0%| | 0/313 [00:00<?, ?it/s]terminate called after throwing an instance of 'paddle::memory::allocation::BadAlloc'
what():
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 multiply_ad_func(paddle::experimental::Tensor const&, paddle::experimental::Tensor const&)
1 paddle::experimental::multiply(paddle::experimental::Tensor const&, paddle::experimental::Tensor const&)
2 void phi::MultiplyRawKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, int, phi::DenseTensor*)
3 float* phi::DeviceContext::Alloc<float>(phi::TensorBase*, unsigned long, bool) const
4 phi::DeviceContext::Impl::Alloc(phi::TensorBase*, phi::Place const&, paddle::experimental::DataType, unsigned long, bool) const
5 phi::DenseTensor::AllocateFrom(phi::Allocator*, paddle::experimental::DataType, unsigned long)
6 paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
7 paddle::memory::allocation::Allocator::Allocate(unsigned long)
8 paddle::memory::allocation::Allocator::Allocate(unsigned long)
9 paddle::memory::allocation::Allocator::Allocate(unsigned long)
10 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
11 std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
12 phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 2.977216GB memory on GPU 0, 29.256836GB memory has been allocated and available memory is only 2.491699GB.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is `export FLAGS_use_cuda_managed_memory=false`.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::pybind::ThrowExceptionToPython(std::__exception_ptr::exception_ptr)
----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
[TimeInfo: *** Aborted at 1670072104 (unix time) try "date -d @1670072104" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x3e800004ff2) received by PID 20466 (TID 0x7f2dee9e2700) from PID 20466 ***]
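I am running the RotatE model on the OpenBG500 dataset using the 32 GB V100 on AI Studio. Training runs normally, but evaluation on the test set hits OOM no matter what batch size I set.
My code:
Because the error kept occurring, I set max_steps to 1. Training itself works fine; the OOM only happens during testing.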
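As a cross-check, the size of the failed allocation can be compared against the logged settings. This is a rough sketch rather than code from the library: it assumes the failing tensor is float32 (the traceback shows `MultiplyRawKernel<float>`) with shape `[test_batch_size, num_candidates, embed_dim]`, and that "GB" in the error message means 2^30 bytes. The helper names `candidate_tensor_gib` and `implied_candidates` are hypothetical.

```python
GIB = 1 << 30  # assumption: "GB" in the Paddle message means 2**30 bytes


def candidate_tensor_gib(batch_size, num_candidates, dim, bytes_per_elem=4):
    """Size in GiB of one dense [batch_size, num_candidates, dim] tensor."""
    return batch_size * num_candidates * dim * bytes_per_elem / GIB


def implied_candidates(alloc_gib, batch_size, dim, bytes_per_elem=4):
    """Number of candidates per test triple implied by an allocation size."""
    return alloc_gib * GIB / (batch_size * dim * bytes_per_elem)


# Values taken from the log: test_batch_size=16, embed_dim=200,
# "Cannot allocate 2.977216GB".
rows = implied_candidates(2.977216, batch_size=16, dim=200)
print(f"implied candidates per test triple: {rows:.0f}")

# How one such intermediate tensor would scale with test_batch_size.
for bs in (16, 4, 1):
    size = candidate_tensor_gib(bs, round(rows), 200)
    print(f"test_batch_size={bs}: {size:.3f} GiB")
```

Under these assumptions the failed request corresponds to roughly 250 thousand candidates per test triple, which would be consistent with every test triple being scored against a very large candidate set in one dense tensor. One such intermediate shrinks linearly with `test_batch_size`, so an OOM that persists at every batch size may mean several of these tensors are alive at once or that the setting is not reaching the evaluation step. I have not verified which.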