RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:520 (errno: 13 - Permission denied). The server socket has failed to bind to ?UNKNOWN? (errno: 13 - Permission denied).
#140
Traceback (most recent call last):
File "/home/fangzhijun2/ChatGLM-Finetuning-master/train.py", line 234, in <module>
main()
File "/home/fangzhijun2/ChatGLM-Finetuning-master/train.py", line 79, in main
deepspeed.init_distributed()
File "/home/fangzhijun2/anaconda3/envs/torch/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 670, in init_distributed
cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
File "/home/fangzhijun2/anaconda3/envs/torch/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 121, in __init__
self.init_process_group(backend, timeout, init_method, rank, world_size)
File "/home/fangzhijun2/anaconda3/envs/torch/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 149, in init_process_group
torch.distributed.init_process_group(backend,
File "/home/fangzhijun2/anaconda3/envs/torch/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 900, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/home/fangzhijun2/anaconda3/envs/torch/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 245, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "/home/fangzhijun2/anaconda3/envs/torch/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 176, in _create_c10d_store
return TCPStore(
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:520 (errno: 13 - Permission denied). The server socket has failed to bind to ?UNKNOWN? (errno: 13 - Permission denied).
[2024-04-02 16:47:05,134] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 3061266
[2024-04-02 16:47:05,134] [ERROR] [launch.py:322:sigkill_handler] ['/home/fangzhijun2/anaconda3/envs/torch/bin/python', '-u', 'train.py', '--local_rank=0', '--train_path', 'data/spo_0.json', '--model_name_or_path', 'ChatGLM3-6B/', '--per_device_train_batch_size', '1', '--max_len', '1560', '--max_src_len', '1024', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm3', '--lora_dim', '16', '--lora_alpha', '64', '--lora_dropout', '0.1', '--lora_module_name', 'query_key_value,dense_h_to_4h,dense_4h_to_h,dense', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--output_dir', './output-glm3'] exits with return code = 1
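A note on the likely cause: the bind fails on port 520 with errno 13. On Linux, ports below 1024 are privileged by default, so a non-root process cannot normally listen on them. The traceback goes through `_env_rendezvous_handler`, which takes the port from the `MASTER_PORT` environment variable. The launch command is not shown above, so how 520 was chosen is an assumption here (for example via `--master_port 520` on the `deepspeed` launcher). The sketch below is not part of the repository; `is_privileged` and `find_free_port` are hypothetical helper names. It only shows one way to pick an unprivileged free port to pass as the master port.

```python
import socket


def is_privileged(port: int) -> bool:
    """Return True for ports that normally require root on Linux (1-1023)."""
    return 0 < port < 1024


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused port by binding to port 0.

    The port is released when the socket closes, so another process
    could in principle grab it before the training job starts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    # 520 is the port from the error message above.
    print("port 520 privileged:", is_privileged(520))
    # Candidate value for the master port (assumed to be passed to the launcher).
    print("candidate master port:", find_free_port())
```

The printed port could then be supplied as the master port, for example `deepspeed --master_port <port> train.py ...`. Any fixed port of 1024 or above that is not already in use should work equally well.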