syncd crash in syncd::VendorSai::logSet() during docker startup #21180
Comments
@anamehra - could you please help clarify the image version and platform on which this is seen? |
@anamehra - please also upload the techsupport |
|
@kcudnik : can you help here ? |
As you can see, the crash is in std::map, on insert and tree rebalance. I'm 99% sure this is a race condition, since that set is made after the swss-common linkToDb notification is bound, and I just checked that logSet is not protected by a mutex. I will make a PR to fix this. |
Do you have a consistent repro of this? Can you show the backtraces of the other threads from this dump? |
I see you ran "thread apply all bt", but its output is empty? |
Thanks for looking into this, @kcudnik! The issue is not easily reproducible and is rarely seen. We know of only a couple of instances, on two different platforms, in the last 3 months. Here is the 'thread apply all bt' output from another instance:
|
From those threads it does not seem like another thread is actively accessing the std::map. Maybe one already did, since I don't see any other scenario in which _Rb_tree_insert_and_rebalance would crash. It's this line here: |
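I wrote this minimal reproducer:
#include <map>
#include <thread>

// Shared state, deliberately left unprotected to reproduce the race.
std::map<int,int> m;
int i = 0;

void run()
{
    while (1)
    {
        m[i] = i; // unsynchronized insert into the shared std::map
        i++;
    }
}

int main()
{
    std::thread t1(run);
    std::thread t2(run);
    t1.join();
    t2.join();
}
// g++ a.cpp -O2 -g -lpthread
It crashed immediately: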
Same crash: _Rb_tree_insert_and_rebalance. In my example, since it is simple, the first thread is doing node allocation; I was expecting something similar in the syncd code. Either way, this is evidently the cause. I will make a fix and post it. |
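For contrast with the reproducer above, here is a minimal sketch of the same two-writer pattern with the map guarded by a std::mutex. All names (g_map, g_mutex, g_next, insertMany, runTwoWriters) are hypothetical and chosen for illustration; this is not the syncd code, and the loop is bounded so the program terminates.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical names for illustration only; none of these come from syncd.
static std::map<int, int> g_map;
static std::mutex g_mutex;
static int g_next = 0;

// Inserts `count` fresh keys. The lock covers both the counter update
// and the map insertion, so only one thread mutates the tree at a time.
void insertMany(int count)
{
    for (int n = 0; n < count; ++n)
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_map[g_next] = g_next;
        ++g_next;
    }
}

// Runs two writer threads to completion and returns the final map size.
std::size_t runTwoWriters(int perThread)
{
    std::thread t1(insertMany, perThread);
    std::thread t2(insertMany, perThread);
    t1.join();
    t2.join();
    return g_map.size();
}
```

Because every insertion happens under the lock, calling runTwoWriters(n) on an empty map yields exactly 2 * n entries instead of corrupting the red-black tree. Building with g++ -fsanitize=thread is one way to check that no data race is reported.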
fix here: sonic-net/sonic-sairedis#1505 |
@bingwang-ms , this is not specific to t2. |
@kcudnik This looks like a day-1 issue? |
What does that mean? |
I mean this issue has been there since day 1; it was not caused by a recent change. Can we confirm that? |
* [syncd] Move logSet logGet under mutex to prevent race condition
Fixes: sonic-net/sonic-buildimage#21180
Mutex is added to protect m_logLevelMap when doing logSet from multiple threads
Co-authored-by: Jianquan Ye
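The idea behind the fix can be sketched roughly as follows. This is an illustrative approximation, not the code from the linked PR: the class name LogLevelStore, the Api and LogLevel enums (stand-ins for sai_api_t and sai_log_level_t), and the Notice default returned for unset entries are all assumptions. Only logSet, logGet, and m_logLevelMap are names taken from the thread.

```cpp
#include <map>
#include <mutex>

// Hypothetical stand-ins for sai_api_t and sai_log_level_t.
enum class Api { Switch, Port, Route };
enum class LogLevel { Debug, Info, Notice, Warn, Error };

// Hypothetical class: a log-level map whose accessors take a mutex,
// so concurrent logSet/logGet calls never touch the tree unsynchronized.
class LogLevelStore
{
public:
    void logSet(Api api, LogLevel level)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_logLevelMap[api] = level;
    }

    LogLevel logGet(Api api) const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_logLevelMap.find(api);
        // Assumed default for entries that were never set.
        return it == m_logLevelMap.end() ? LogLevel::Notice : it->second;
    }

private:
    mutable std::mutex m_mutex; // mutable so the const getter can lock
    std::map<Api, LogLevel> m_logLevelMap;
};
```

Guarding the read path as well as the write path matters here: a std::map lookup that runs while another thread is rebalancing the tree is also a data race.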
Description
We have observed this syncd crash once on a single-ASIC system recently, during config reload:
(gdb) bt
#0 0x00007fce07be70ca in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x000055e218d7f837 in syncd::VendorSai::logSet(_sai_api_t, _sai_log_level_t) ()
#2 0x000055e218d53487 in syncd::Syncd::saiLoglevelNotify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#3 0x000055e218d6d5d2 in std::_Function_handler<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::_Bind<void (syncd::Syncd::*(syncd::Syncd*, std::_Placeholder<1>, std::_Placeholder<2>))(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> >::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) ()
#4 0x00007fce08ccdbc9 in swss::Logger::linkToDbWithOutput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#5 0x00007fce08ccdf4a in swss::Logger::linkToDb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#6 0x000055e218d55422 in syncd::Syncd::setSaiApiLogLevel() ()
#7 0x000055e218d66382 in syncd::Syncd::Syncd(std::shared_ptr<sairedis::SaiInterface>, std::shared_ptr<syncd::CommandLineOptions>, bool) ()
#8 0x000055e218d4f82d in syncd_main(int, char**) ()
#9 0x000055e218d4d98f in main ()
(gdb) thread apply all bt
Looks like the crash happened very early in the bring-up stage. No apparent errors seen in syslogs.
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):