assertion failed #10

Open
tangh18 opened this issue Jul 22, 2019 · 6 comments
@tangh18

tangh18 commented Jul 22, 2019

Environment: Ubuntu 18.04 LTS

command: python3 -m augmentation
log:
E0722 14:48:58.055194283 4374 sync_posix.cc:103] assertion failed: pthread_mutex_lock(mu) == 0
Aborted (core dumped)

This causes train.py to fail with:
Epoch 1/1
Traceback (most recent call last):
File "train.py", line 30, in <module>
train(-1)
File "train.py", line 25, in train
nb_val_samples=100
File "/home/ryan/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/training_generator.py", line 181, in fit_generator
generator_output = next(output_generator)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 709, in get
six.reraise(*sys.exc_info())
File "/home/ryan/.local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/ryan/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 685, in get
inputs = self.queue.get(block=True).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/ryan/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 626, in next_sample
return six.next(_SHARED_SEQUENCES[uid])
File "/home/ryan/Documents/audioNet/dataGenerator.py", line 23, in DataGenerator
ret = stub.Control(CS(sign = CS.START))
File "/home/ryan/.local/lib/python3.6/site-packages/grpc/_channel.py", line 565, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/ryan/.local/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1563778152.782501702","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3528,"referenced_errors":[{"created":"@1563778152.782490411","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":399,"grpc_status":14}]}"

@zhenchen3419
Contributor

Try nb_val_samples=10

@tangh18
Author

tangh18 commented Jul 22, 2019

> Try nb_val_samples=10

No change. Still the same problem: it stops at 22/3000.

@zhenchen3419
Contributor

The gRPC call is failing. @tangh18
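
StatusCode.UNAVAILABLE with "failed to connect to all addresses" generally means the client could not open a connection to the server address at all. A minimal sketch for ruling that out before starting train.py, using only the standard library. The host and port are placeholders, since the thread does not say where the service behind stub.Control listens:

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: substitute the one dataGenerator.py actually uses.
    host, port = "localhost", 50051
    state = "reachable" if server_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A reachable port only shows that something is listening there. It does not prove that the gRPC service itself is healthy, but an unreachable port would explain the error above.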

@ritou11
Contributor

ritou11 commented Jul 22, 2019

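This is still a problem with the protobuf package. Try checking its version; see #9.
Also, some of the errors look process-related. In that case, first try rebooting the machine and see whether anything changes.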
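
One way to check the versions in question is to list them from Python. This is only a sketch: the package list is a guess at what matters for this project, and importlib.metadata needs Python 3.8 or newer. On the Python 3.6 shown in the traceback, `pip3 show protobuf grpcio` gives the same information.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant packages; adjust to the project's requirements.
    found = installed_versions(["protobuf", "grpcio", "tensorflow", "keras"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```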

@HaoyunHong

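For now, if it gets stuck at x/3000 (with x small), change these values in train.py; making them smaller is enough:

            # Number of samples run per epoch (3000 here); the more samples, the longer an epoch takes under the same conditions
            samples_per_epoch=3000,
            # Total number of epochs
            nb_epoch=1,
            validation_data=test,
            # Number of validation samples drawn at the end of each epoch; too large a value can make the run appear stuck, so try reducing it
            nb_val_samples=100

But if progress was steady and then stalled at 2999/3000, it is probably the "gRPC call is failing" problem the teacher mentioned (though I don't fully understand that yet). It seems related to the TensorFlow and Python versions: when I run it, many warnings appear at the start, mostly saying that something is deprecated and should be replaced with a newer form. I also suspect the Keras version, because some warnings say the code uses the Keras 1 API where Keras 2 expects something different. Mismatched tool versions may mean the client/service communication code needs to change.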
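
The deprecation warnings mentioned here fit the Keras 1 argument names used in train.py. In Keras 2, fit_generator takes steps_per_epoch, epochs, and validation_steps, and the two step arguments count batches rather than samples. A rough sketch of the conversion, assuming a fixed batch size (batch_size is not given anywhere in this thread, so 32 below is purely illustrative, and keras2_generator_args is a hypothetical helper name):

```python
import math


def keras2_generator_args(samples_per_epoch, nb_epoch, nb_val_samples, batch_size):
    """Translate Keras 1 fit_generator sample counts into Keras 2 step counts."""
    return {
        "steps_per_epoch": math.ceil(samples_per_epoch / batch_size),
        "epochs": nb_epoch,
        "validation_steps": math.ceil(nb_val_samples / batch_size),
    }


if __name__ == "__main__":
    # batch_size=32 is an assumed value; the thread does not state it.
    print(keras2_generator_args(3000, 1, 100, batch_size=32))
    # {'steps_per_epoch': 94, 'epochs': 1, 'validation_steps': 4}
```

If the old sample counts are passed through unchanged as step counts, each epoch may draw far more data from the generator than intended, which could be one reason a run looks stuck early on.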

@tangh18
Author

tangh18 commented Jul 22, 2019

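> This is still a problem with the protobuf package. Try checking its version; see #9.
> Also, some of the errors look process-related. In that case, first try rebooting the machine and see whether anything changes.

I downgraded TensorFlow to 1.10.0, but the problem is still the same.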

@tangh18
Author

tangh18 commented Jul 22, 2019

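> For now, if it gets stuck at x/3000 (with x small), change these values in train.py; making them smaller is enough:
>
>             # Number of samples run per epoch (3000 here); the more samples, the longer an epoch takes under the same conditions
>             samples_per_epoch=3000,
>             # Total number of epochs
>             nb_epoch=1,
>             validation_data=test,
>             # Number of validation samples drawn at the end of each epoch; too large a value can make the run appear stuck, so try reducing it
>             nb_val_samples=100
>
> But if progress was steady and then stalled at 2999/3000, it is probably the "gRPC call is failing" problem the teacher mentioned (though I don't fully understand that yet). It seems related to the TensorFlow and Python versions: when I run it, many warnings appear at the start, mostly saying that something is deprecated and should be replaced with a newer form. I also suspect the Keras version, because some warnings say the code uses the Keras 1 API where Keras 2 expects something different. Mismatched tool versions may mean the client/service communication code needs to change.

That doesn't work.
It seems to be a gRPC problem.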

