Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gwhisper v3.3.3 hangs after RPC #135

Open
rainerschoe opened this issue Feb 23, 2022 · 5 comments
Open

gwhisper v3.3.3 hangs after RPC #135

rainerschoe opened this issue Feb 23, 2022 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@rainerschoe
Copy link
Member

rainerschoe commented Feb 23, 2022

This seems to happen very seldomly:

(gdb) bt
#0  0x00007ff7e63d80f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007ff7e71c8874 in ?? () from /lib64/libgrpc.so.7
#2  0x00007ff7e71ec948 in ?? () from /lib64/libgrpc.so.7
#3  0x00000000005424dc in grpc::CompletionQueue::Pluck (this=0x1723270, tag=0x7ffcc2ab97e0)
    at /usr/include/grpcpp/impl/codegen/completion_queue.h:311
#4  0x0000000000547772 in grpc::ClientReaderWriter<grpc::reflection::v1alpha::ServerReflectionRequest, grpc::reflection::v1alpha::ServerReflectionResponse>::Finish (this=0x1723250) at /usr/include/grpcpp/impl/codegen/sync_stream.h:535
#5  0x000000000053fce2 in grpc::ProtoReflectionDescriptorDatabase::~ProtoReflectionDescriptorDatabase (this=0x17219b0, 
    __in_chrg=<optimized out>)
    at /usr/src/debug/opensrc_gwhisper-29-1.x86_64/third_party/gRPC_utils/proto_reflection_descriptor_database.cc:47
#6  0x0000000000528dff in __gnu_cxx::new_allocator<grpc::ProtoReflectionDescriptorDatabase>::destroy<grpc::ProtoReflectionDescriptorDatabase> (this=0x17219b0, __p=0x17219b0) at /usr/include/c++/8/ext/new_allocator.h:140
#7  0x0000000000528cf3 in std::allocator_traits<std::allocator<grpc::ProtoReflectionDescriptorDatabase> >::destroy<grpc::ProtoReflectionDescriptorDatabase> (__a=..., __p=0x17219b0) at /usr/include/c++/8/bits/alloc_traits.h:487
#8  0x0000000000528a01 in std::_Sp_counted_ptr_inplace<grpc::ProtoReflectionDescriptorDatabase, std::allocator<grpc::ProtoReflectionDescriptorDatabase>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x17219a0)
    at /usr/include/c++/8/bits/shared_ptr_base.h:554
#9  0x00000000004e4caa in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x17219a0)
    at /usr/include/c++/8/bits/shared_ptr_base.h:155
#10 0x00000000004e44d1 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x1719300, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:728
#11 0x00000000004ee5e4 in std::__shared_ptr<grpc::ProtoReflectionDescriptorDatabase, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x17192f8, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#12 0x00000000004ee600 in std::shared_ptr<grpc::ProtoReflectionDescriptorDatabase>::~shared_ptr (this=0x17192f8, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#13 0x00000000004ef1be in cli::ConnList::~ConnList (this=0x17192e8, __in_chrg=<optimized out>)
    at /usr/src/debug/opensrc_gwhisper-29-1.x86_64/src/libCli/ConnectionManager.hpp:22
#14 0x0000000000505164 in std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>::~pair (this=0x17192c8, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/stl_pair.h:208
#15 0x0000000000505190 in __gnu_cxx::new_allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>, true> >::destroy<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList> > (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>, __p=0x17192c8)
    at /usr/include/c++/8/ext/new_allocator.h:140
#16 0x0000000000502aa3 in std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>, true> > >::destroy<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList> > (__a=..., __p=0x17192c8)
    at /usr/include/c++/8/bits/alloc_traits.h:487
#17 0x00000000005009dd in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>, true> > >::_M_deallocate_node (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>, __n=0x17192c0)
    at /usr/include/c++/8/bits/hashtable_policy.h:2100
#18 0x00000000004fe10a in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>, true> > >::_M_deallocate_nodes (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>, __n=0x0)
    at /usr/include/c++/8/bits/hashtable_policy.h:2113
#19 0x00000000004fb984 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>) at /usr/include/c++/8/bits/hashtable.h:2051
#20 0x00000000004f2ea2 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>, __in_chrg=<optimized out>)
--Type <RET> for more, q to quit, c to continue without paging--
    at /usr/include/c++/8/bits/hashtable.h:1369
#21 0x00000000004ee75a in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, cli::ConnList, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cli::ConnList> > >::~unordered_map (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/unordered_map.h:102
#22 0x00000000004ee792 in cli::ConnectionManager::~ConnectionManager (
    this=0x7a1e60 <cli::ConnectionManager::getInstance()::connectionManager>, __in_chrg=<optimized out>)
    at /usr/src/debug/opensrc_gwhisper-29-1.x86_64/src/libCli/ConnectionManager.hpp:38
#23 0x00007ff7e6314b0c in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007ff7e6314c40 in exit () from /lib64/libc.so.6
#25 0x00007ff7e62fe49a in __libc_start_main () from /lib64/libc.so.6
#26 0x00000000004e2cfe in _start ()
(gdb) info threads
  Id   Target Id                                          Frame 
* 1    Thread 0x7ff7e7bd3c80 (LWP 5787) "gwhisper"        0x00007ff7e63d80f7 in epoll_wait () from /lib64/libc.so.6
  2    Thread 0x7ff7e4752700 (LWP 5790) "default-executo" 0x00007ff7e60c92fc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  3    Thread 0x7ff7e4551700 (LWP 5791) "resolver-execut" 0x00007ff7e60c92fc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  4    Thread 0x7ff7e4350700 (LWP 5792) "grpc_global_tim" 0x00007ff7e60c96e8 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  5    Thread 0x7ff7dfbe3700 (LWP 5793) "grpc_global_tim" 0x00007ff7e60c92fc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
(gdb) 
@rainerschoe rainerschoe added the bug Something isn't working label Feb 23, 2022
@rainerschoe
Copy link
Member Author

have one instance where this happened with following call:

gwhisper <host> <...>Control ActivateP....Domain slot_id=0x6508

@rainerschoe
Copy link
Member Author

can reproduce in all gwhisper versions:

  1. start test server
  2. start gwhisper call gwhisper --rpcTimeoutMilliseconds=2000 127.0.0.1 examples.StatusHandling rpcSleepForSeconds number=5
  3. within 2 seconds switch to running testserver terminal and press Ctrl+Z so stop scheduling the test server
  4. Observe that gwhisper prints RPC failed ;( Status code: 4 DEADLINE_EXCEEDED, error message: Deadline Exceeded after the 2s timeout, but then blocks forever
  5. start scheduling the test server again by typing fg into the server shell
  6. observe that then gwhisper continues and terminates.

@rainerschoe rainerschoe self-assigned this Mar 1, 2022
@rainerschoe
Copy link
Member Author

rainerschoe commented Mar 1, 2022

Tried to reproduce with modified hello_greeter example (using stubs directly with added deadline):
https://github.com/rainerschoe/grpc_deadline_test

Interestingly I cannot reproduce the problem in this context, so it seems to be somehow related to ProtoReflectionDescriptorDatabase

@rainerschoe
Copy link
Member Author

Ok, had another look at this.
Problem is that gRPC provided code in proto_reflection_descriptor_database.cc is not setting a deadline for the reflection stream RPC.
If the server then crashes or hangs, the stream also is not terminated. This is not a big problem for gWhisper,.
However in the destructor of the DescDb, the implementation calls
Status status = stream_->Finish();
Which waits for a clean shutdown of the stream.
Now if no deadline is set and server is hanging. this block indefinitely.

I successfully circumvented this by adding a deadline. No I am looking into how best to release this as a fix. (fis it upstream, or just pach it in gwhisper, as it is quite a specific usage scenario)

@rainerschoe
Copy link
Member Author

unfortunately the "circumvention" won't do it, as the deadline affects the whole reflection stream, which we have no way of knowing how long it should last. So we cannot specify a deadline here.
After some research, I think gRPC does not provide a way to check health status of a stream and/or safely close it in case of hanging remote side.
There is the TryCancel() function in ClientContext, which circomvents the problem, however then all other stream error messages are lost (e.g. when server has not implemented reflection service). So this is also no option here.
I am out of ideas here.
Only option would be to not use the reflection stream as a stream, but instead open the stream again and again for each reflection request with a set timeout. but this would be insane.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant