assertion failed #10

tangh18 · 2019-07-22T06:52:45Z

environment: Ubuntu 18.04 LTS

command: python3 -m augmentation
log:
E0722 14:48:58.055194283 4374 sync_posix.cc:103] assertion failed: pthread_mutex_lock(mu) == 0
Aborted (core dumped)

which causes train.py to fail:
Epoch 1/1
Traceback (most recent call last):
File "train.py", line 30, in <module>
train(-1)
File "train.py", line 25, in train
nb_val_samples=100
File "/home/ryan/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/training_generator.py", line 181, in fit_generator
generator_output = next(output_generator)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 709, in get
six.reraise(*sys.exc_info())
File "/home/ryan/.local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/ryan/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 685, in get
inputs = self.queue.get(block=True).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/ryan/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 626, in next_sample
return six.next(_SHARED_SEQUENCES[uid])
File "/home/ryan/Documents/audioNet/dataGenerator.py", line 23, in DataGenerator
ret = stub.Control(CS(sign = CS.START))
File "/home/ryan/.local/lib/python3.6/site-packages/grpc/_channel.py", line 565, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/ryan/.local/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1563778152.782501702","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3528,"referenced_errors":[{"created":"@1563778152.782490411","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":399,"grpc_status":14}]}"

zhenchen3419 · 2019-07-22T07:00:29Z

Try nb_val_samples=10

tangh18 · 2019-07-22T07:02:32Z

> Try nb_val_samples=10

No change. Still the same problem: it stops at 22/3000.

zhenchen3419 · 2019-07-22T07:54:52Z

The gRPC call is failing. @tangh18
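The `StatusCode.UNAVAILABLE` / "failed to connect to all addresses" error in the traceback usually means the client could not open a connection to the server at all. A minimal sketch for checking that from the client machine, using only the standard library, might look like the following. The host and port are hypothetical (the thread does not say which address `dataGenerator.py` connects to), and `can_connect` is an illustrative helper, not part of the project.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts connections at that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address: replace with the one the gRPC channel targets.
    host, port = "localhost", 50051
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this reports the port as unreachable, the server side (here, presumably the process started with `python3 -m augmentation`, which aborted) is not listening, and changing the training parameters is unlikely to help.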

ritou11 · 2019-07-22T09:54:16Z

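This is still a problem with the protobuf package. Try checking its version; see #9.
Also, some of the errors look process-related. In that case, first try restarting the computer and see whether anything changes.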
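To compare versions as suggested, a short script can print what is installed. This is only a sketch: the package list is an assumption about which distributions matter here, and `importlib.metadata` needs Python 3.8 or later (on the Python 3.6 shown in the traceback, `pip show protobuf` gives the same information).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of packages relevant to this issue.
    names = ["protobuf", "grpcio", "tensorflow", "Keras"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```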

HaoyunHong · 2019-07-22T15:10:42Z

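From what I've seen so far, if it gets stuck at x/3000 (with x small), you should change these values in train.py; just make them smaller:

            # Number of samples run in each epoch; the more samples, the longer an epoch takes under the same conditions
            samples_per_epoch=3000,
            # Total number of epochs
            nb_epoch=1,
            validation_data=test,
            # Number of samples fed each time; it may get stuck when too much data is fed, so this value can be reduced
            nb_val_samples=100

But if it ran at a steady pace and then got stuck at 2999/3000, it is probably the "gRPC call failing" that the teacher mentioned (though I don't fully understand this yet). It seems related to the TensorFlow and Python versions, because when I run it, a lot of warnings appear at the start, mostly saying that something is deprecated and should be replaced with a newer form. I also suspected the Keras version, because some of the warnings describe the Keras 2 way of doing things while saying the code uses the Keras 1 API. Mismatched tool versions may mean the communication code between the client and the service needs to change.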
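On the Keras 1 versus Keras 2 warnings: Keras 2's `fit_generator` counts batches (`steps_per_epoch`, `validation_steps`) where Keras 1 counted samples (`samples_per_epoch`, `nb_val_samples`), and `nb_epoch` became `epochs`. A rough sketch of the conversion follows. The batch size of 32 is purely an assumption, since the thread does not show what the generator yields, and `keras2_generator_args` is a hypothetical helper. Rounding up is a choice made here so that no samples are dropped.

```python
import math


def keras2_generator_args(samples_per_epoch, nb_epoch, nb_val_samples, batch_size):
    """Translate Keras 1 sample counts into Keras 2 step (batch) counts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return {
        # Keras 2 counts batches per epoch, not samples.
        "steps_per_epoch": math.ceil(samples_per_epoch / batch_size),
        "epochs": nb_epoch,
        "validation_steps": math.ceil(nb_val_samples / batch_size),
    }


if __name__ == "__main__":
    # Values from train.py in this thread; batch_size=32 is assumed.
    print(keras2_generator_args(3000, 1, 100, batch_size=32))
```

If the old argument values are passed through unchanged as step counts, which the x/3000 progress display suggests, each epoch requests far more batches from the generator than the Keras 1 code intended.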

tangh18 · 2019-07-22T15:29:28Z

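> This is still a problem with the protobuf package. Try checking its version; see #9.
> Also, some of the errors look process-related. In that case, first try restarting the computer and see whether anything changes.

I downgraded tensorflow to 1.10.0, but the problem is still the same.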

tangh18 · 2019-07-22T15:36:01Z

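> From what I've seen so far, if it gets stuck at x/3000 (with x small), you should change these values in train.py; just make them smaller:
>             # Number of samples run in each epoch; the more samples, the longer an epoch takes under the same conditions
>             samples_per_epoch=3000,
>             # Total number of epochs
>             nb_epoch=1,
>             validation_data=test,
>             # Number of samples fed each time; it may get stuck when too much data is fed, so this value can be reduced
>             nb_val_samples=100
> But if it ran at a steady pace and then got stuck at 2999/3000, it is probably the "gRPC call failing" that the teacher mentioned (though I don't fully understand this yet). It seems related to the TensorFlow and Python versions, because when I run it, a lot of warnings appear at the start, mostly saying that something is deprecated and should be replaced with a newer form. I also suspected the Keras version, because some of the warnings describe the Keras 2 way of doing things while saying the code uses the Keras 1 API. Mismatched tool versions may mean the communication code between the client and the service needs to change.

That doesn't work...
It seems to be a gRPC problem.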

zhenchen3419 assigned zhenchen3419, ritou11 and zhengwx11 and unassigned zhenchen3419 Jul 22, 2019
