Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception stack support and stacktrace improvements #24

Conversation

rhoneyager-tomorrow
Copy link
Contributor

Description

This makes debugging ioda and ufo a lot easier.

Enable C++11-style nested exceptions

This PR reworks the oops applications just a bit to enable nested C++11-style exceptions in the JEDI codes. This is used extensively in ioda, where nested exceptions should provide additional information about variable and file names.

This should probably be backported to eckit in several locations, particularly in eckit/testing/Test.h to add this to unit testing, and eckit/exception for a general fix. Adding it at the oops level, however, is sufficient when diagnosing runtime issues with increasing numbers of observation types in experiments.

Improvements to stack trace code

This PR reworks the stack trace code just a bit to improve the robustness of the code and to enable traces outside of the oops sigfpe checks. Additional comments have been added regarding why Boost's stacktrace is used in favor of eckit's. eckit does not depend on Boost, otherwise I would also consider making this change over there.

Nested exceptions example:

Here's a somewhat synthetic example of this code's error output. It shows what should happen when IODA tries to access a nonexistent variable. This is a bit verbose, but nested exceptions convey explanations at different processing levels.

Exception: oops::TimeIodaIO<IODA> terminating...
Exception: level: 0
	Reason:	An exception occurred inside ioda while opening a variable.
	source_filename:	/backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp
	source_function:	ioda::Variable ioda::detail::Has_Variables_Base::open(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &) const
	source_line:	108
	stacktrace:	 
 0# util::stacktrace_current[abi:cxx11]() at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/oops/src/oops/util/Stacktrace.cc:23
 1# ioda::Exception::add_call_stack() at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Exception.cpp:46
 2# ioda::Exception::Exception(char const*, ioda::detail::compat::source_location::source_location const&, ioda::Options const&) at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Exception.cpp:25
 3# ioda::detail::Has_Variables_Base::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp:109
 4# ioda::TimeIodaIO<ioda::IodaTrait>::execute(eckit::Configuration const&, bool) const at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/build/ioda/include/ioda/mains/timeIodaIO.h:59
 5# oops::Run::execute(oops::Application const&, eckit::mpi::Comm const&) at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/oops/src/oops/runs/Run.cc:186
 6# main at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/mains/timeIodaIO.cc:20
 7# __libc_start_call_main in /lib64/libc.so.6
 8# __libc_start_main_alias_1 in /lib64/libc.so.6
 9# _start in ../../bin/time_IodaIO.x

Exception: level: 1
	Reason:	Variable not found.
	name:	test
	source_filename:	/backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Engines/ObsStore/Variables.cpp
	source_function:	std::shared_ptr<ioda::ObsStore::Variable> ioda::ObsStore::Has_Variables::open(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &) const
	source_line:	189
	stacktrace:	 
 0# util::stacktrace_current[abi:cxx11]() at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/oops/src/oops/util/Stacktrace.cc:23
 1# ioda::Exception::add_call_stack() at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Exception.cpp:46
 2# ioda::Exception::Exception(char const*, ioda::detail::compat::source_location::source_location const&, ioda::Options const&) at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Exception.cpp:25
 3# ioda::ObsStore::Has_Variables::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Engines/ObsStore/Variables.cpp:189
 4# ioda::Engines::ObsStore::ObsStore_HasVariables_Backend::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Engines/ObsStore/ObsStore-variables.cpp:261
 5# ioda::detail::Has_Variables_Base::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp:105
 6# ioda::TimeIodaIO<ioda::IodaTrait>::execute(eckit::Configuration const&, bool) const at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/build/ioda/include/ioda/mains/timeIodaIO.h:59
 7# oops::Run::execute(oops::Application const&, eckit::mpi::Comm const&) at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/oops/src/oops/runs/Run.cc:186
 8# main at /backup/ryan/src/JCSDA/jedi-bundle/stacktraces/ioda/src/mains/timeIodaIO.cc:20
 9# __libc_start_call_main in /lib64/libc.so.6
10# __libc_start_main_alias_1 in /lib64/libc.so.6
11# _start in ../../bin/time_IodaIO.x

Dependencies

None, but I might make a ioda PR soon-ish that would depend on this.

Checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have run the unit tests before creating the PR

@climbfuji
Copy link
Contributor

Can we review this PR here before I bring over the commits to JCSDA-internal?

Copy link
Contributor

@srherbener srherbener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these improvements!

For my understanding with the exception handling improvements. Do I have this right?

  • As long as the exceptions at the various function levels are based on std::exception, this should work at getting all of the messages printed.
  • At the point a non std::exception is hit, the unwinding stops as well as the messaging.
  • The eckit and ioda exception classes are both based on std::exception so they should be good with this unwinding method

Also, have one question (which is probably outside the scope of this PR). An issue we have seen is where Fortran throws the exception and it doesn't appear to get caught (or processed correctly) on the C++ side. Perhaps this change will help, but the issue could also be that the Fortran exception is not based on std::exception, correct? I'm guessing in that case, one would need to wrap an interface around the C++ std::exception and have Fortran use that (which is perhaps what fckit provides?).

@rhoneyager-tomorrow
Copy link
Contributor Author

rhoneyager-tomorrow commented Feb 29, 2024

Thanks for these improvements!

For my understanding with the exception handling improvements. Do I have this right?

  • As long as the exceptions at the various function levels are based on std::exception, this should work at getting all of the messages printed.

Yes

  • At the point a non std::exception is hit, the unwinding stops as well as the messaging.

Yes. While the C++11 standard allows exceptions that do not inherit from std::exception, all of the post-C++11 features like std::exception_ptr require this dependence.

  • The eckit and ioda exception classes are both based on std::exception so they should be good with this unwinding method

Yes

Also, have one question (which is probably outside the scope of this PR). An issue we have seen is where Fortran throws the exception and it doesn't appear to get caught (or processed correctly) on the C++ side. Perhaps this change will help, but the issue could also be that the Fortran exception is not based on std::exception, correct? I'm guessing in that case, one would need to wrap an interface around the C++ std::exception and have Fortran use that (which is perhaps what fckit provides?).

You cannot reliably throw exceptions across language boundaries. It is undefined behavior. This definitely will interfere with any type of object or resource handle initialization, finalization, or destruction. This was why ioda's C interface, for example, was written to catch all exceptions at the interface layer and terminate.

You can, however, catch and encapsulate or translate the exceptions right before you pass through different languages in the codebase. I think I had some prototype code for doing this in one or more branches of jckit. Somewhere in there you can also find a basic implementation of C-style exceptions using setjmp/longjmp. Similar examples in Fortran exist on the web, like this one.

@srherbener
Copy link
Contributor

Thank you @rhoneyager-tomorrow for the information - very helpful!

@shlyaeva shlyaeva requested review from fmahebert and ctgh March 6, 2024 14:38
Copy link
Contributor

@ctgh ctgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for adding this feature - it should make understanding exceptions much easier.

#ifndef BOOST_STACKTRACE_USE_NOOP
std::ostringstream s;
s << boost::stacktrace::stacktrace() << std::endl;
std::string sout = s.str();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be const.

@@ -0,0 +1,19 @@
/*
* (C) Copyright 2021-2023 UCAR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2024

#include <iostream>

int main(int argc, char **argv) {
std::cout << "Stacktrace:\n" << util::stacktrace_current() << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion but you could use oops::Log::info() here instead of std::cout.

@climbfuji
Copy link
Contributor

Since we have two approvals for this, I'll go ahead and

  • update the branch from develop
  • create a PR in jcsda-internal

@climbfuji
Copy link
Contributor

The PR for jcsda-internal is https://github.com/JCSDA-internal/oops/pull/2578

I will keep this PR open until https://github.com/JCSDA-internal/oops/pull/2578 is merged

@fmahebert
Copy link
Contributor

Thanks for this improvement, @rhoneyager-tomorrow !

@climbfuji
Copy link
Contributor

PR was merged in jcsda-internal and should show up on JCSDA public develop shortly. Thanks for this contribution @rhoneyager-tomorrow

@climbfuji climbfuji closed this Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants