Prevent Native EXC_BAD_ACCESS signal for NullReferenceExceptions #3909

Open
wants to merge 15 commits into
base: main

Collaborator

@jamescrosswell jamescrosswell commented Jan 21, 2025

Resolves #3776:

Analysis

The following all appear to impact error reporting behaviour for iOS applications:

  1. The runtime (determined by whether AOT compilation is enabled or not)
  2. The ObjCRuntime.MarshalManagedExceptionMode
  3. Whether or not we try to suppress EXC_BAD_ACCESS errors (as in this PR)
  4. Whether or not the exception is caught/ignored by managed code

MONO vs CLR Runtime

When AOT compilation is not used, .NET iOS apps use the Mono runtime. When AOT compilation is used, .NET iOS apps use the CLR runtime. Managed exceptions are marshalled differently by the two different runtimes and unwinding native exceptions is currently not supported by the CLR.

MarshalManagedExceptionMode

In earlier versions of .NET, the AppDomain.CurrentDomain.UnhandledException event was not raised correctly unless the marshalling mode for managed exceptions was set to MarshalManagedExceptionMode.UnwindNativeCode (see xamarin/xamarin-macios#15252 and the workaround in Sentry).

This has been fixed in net9.0 on the MONO runtime.
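
As a rough illustration of what "the Unhandled Exception Handler fires" means in the scenarios below, here is a minimal sketch of a probe that subscribes to AppDomain.CurrentDomain.UnhandledException. The `UnhandledExceptionProbe` class and its `Register` method are hypothetical names for illustration only; they are not part of the Sentry SDK or of this PR.

```csharp
using System;

public static class UnhandledExceptionProbe
{
    // Hypothetical helper: logs whenever the AppDomain-level handler fires,
    // so each scenario can be checked by hand on a device or simulator.
    public static void Register(Action<string> log)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
        {
            // ExceptionObject is typed as object; it is normally an Exception.
            var description = args.ExceptionObject is Exception ex
                ? $"{ex.GetType().Name}: {ex.Message}"
                : args.ExceptionObject.ToString() ?? "<unknown>";
            log($"UnhandledException fired (terminating={args.IsTerminating}): {description}");
        };
    }
}
```

Calling `UnhandledExceptionProbe.Register(Console.WriteLine)` at startup and then triggering an uncaught NullReferenceException should produce a log line in the configurations where the handler works, and nothing where it does not.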

Scenarios

I tested the various different permutations of the above, with the following results:

| Scenario | AOT | Marshal Mode | Suppress EXC_BAD_ACCESS | Catch Exception | Desired Behaviour | Actual Behaviour | Analysis |
|---|---|---|---|---|---|---|---|
| A | Y | Default | Y | Y | No exception (as it's caught) | As expected (for the wrong reason) | We're catching the managed exception and suppressing the native one, so we don't expect any exception. However, we can see from scenario B (where we don't catch the managed exception) that the Unhandled Exception Handler never fires anyway. See Notes 1, 2 |
| B | Y | Default | Y | N | App terminates... NRE exception (on next run) | App terminates... No exception | We don't get an EXC_BAD_ACCESS error (since we suppress this). However, we don't get the NullReferenceException since the Unhandled Exception Handler never fires. See Notes 1, 2 |
| C | Y | Default | N | Y | No exception (as it's caught) | EXC_BAD_ACCESS error | There's no way for the native library to know it's operating in the context of a managed app that has caught the exception... so we get an EXC_BAD_ACCESS error. |
| D | Y | Default | N | N | App terminates... NRE exception (on next run) | App terminates... EXC_BAD_ACCESS error | We get an EXC_BAD_ACCESS error (since we don't suppress this). We don't get the NullReferenceException since the Unhandled Exception Handler never fires. See Notes 1, 2 |
| E | Y | UnwindNative | Y | Y | No exception (as it's caught) | As expected (for the wrong reason) | We get no exception because the Unhandled Exception Handler never fires. See Notes 1, 2 |
| F | Y | UnwindNative | Y | N | App terminates... NRE exception (on next run) | App terminates... No exception | We don't get an EXC_BAD_ACCESS error (since we suppress this). We also don't get an NRE since the Unhandled Exception Handler never fires. See Notes 1, 2, 3 |
| G | Y | UnwindNative | N | Y | No exception (as it's caught) | EXC_BAD_ACCESS error | There's no way for the native library to know it's operating in the context of a managed app that has caught the exception... so we get an EXC_BAD_ACCESS error. |
| H | Y | UnwindNative | N | N | App terminates... NRE exception (on next run) | App terminates... EXC_BAD_ACCESS error | We get no exception because the Unhandled Exception Handler never fires. There's no way for the native library to know it's operating in the context of a managed app that has caught the exception... so we get an EXC_BAD_ACCESS error. |
| I | N | Default | Y | Y | No exception (as it's caught) | As expected | |
| J | N | Default | Y | N | App terminates... NRE exception (on next run) | As expected | |
| K | N | Default | N | Y | No exception (as it's caught) | EXC_BAD_ACCESS error | There's no way for the native library to know it's operating in the context of a managed app that has caught the exception... so we get an EXC_BAD_ACCESS error. |
| L | N | Default | N | N | App terminates... NRE exception (on next run) | As expected | |
| M | N | UnwindNative | Y | Y | No exception (as it's caught) | As expected | |
| N | N | UnwindNative | Y | N | App terminates... NRE exception (on next run) | As expected | |
| O | N | UnwindNative | N | Y | No exception (as it's caught) | EXC_BAD_ACCESS error | There's no way for the native library to know it's operating in the context of a managed app that has caught the exception... so we get an EXC_BAD_ACCESS error. |
| P | N | UnwindNative | N | N | App terminates... NRE exception (on next run) | App terminates... NRE exception (on next run)... EXC_BAD_ACCESS error | The native library isn't aware a managed exception has already been reported, so we get one NRE and one EXC_BAD_ACCESS error. |

Notes

  1. See Issues with unhandled exceptions xamarin/xamarin-macios#15252
  2. It's not possible to apply this fix to AOT applications
  3. MarshalManagedExceptionMode.UnwindNativeCode isn't supported in AOT / CLR apps.

Solution

AOT / CLR

  • Fix the issue of not getting NREs (unhandled exception handler not firing) in AOT Apps
  • Suppress EXC_BAD_ACCESS errors, to avoid duplication (this PR)

MONO

Notes on EXC_BAD_ACCESS suppression

We don't know, from the stack trace, whether the EXC_BAD_ACCESS comes from native or managed code. This workaround assumes it comes from managed code and will be reported separately as a managed exception.

Prior to this PR the behaviour was the opposite: all EXC_BAD_ACCESS errors were assumed to be native exceptions and were reported by the native SDK. Users would therefore often get a somewhat confusing EXC_BAD_ACCESS event in addition to the NRE, which makes more sense to .NET developers.

Potential alternative: add an option to let users decide whether to suppress these native errors or not.
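
A minimal sketch of what such an opt-in filter could look like, assuming a simplified event shape. `NativeCrashEvent`, `NativeCrashFilter` and the `suppressExcBadAccess` flag are hypothetical names used only for illustration; they are not the Sentry SDK's types and not the code in this PR.

```csharp
using System;

// Hypothetical stand-in for an event coming from the native SDK.
public sealed record NativeCrashEvent(string ExceptionType, string? Value);

public static class NativeCrashFilter
{
    // Returns true when the native event should be dropped.
    // suppressExcBadAccess is a hypothetical user-facing option: when enabled,
    // every EXC_BAD_ACCESS is assumed to be a managed NullReferenceException
    // that the managed SDK reports separately.
    public static bool ShouldDrop(NativeCrashEvent evt, bool suppressExcBadAccess)
    {
        if (!suppressExcBadAccess)
        {
            // Previous behaviour: keep every native event, duplicates included.
            return false;
        }

        return string.Equals(evt.ExceptionType, "EXC_BAD_ACCESS", StringComparison.Ordinal);
    }
}
```

With the flag enabled, a genuine native null pointer crash would also be dropped. That is the trade-off described above, and it is why leaving the choice to the user may be preferable.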

Member

@bruno-garcia bruno-garcia left a comment

This seems to only remove the events we'd like to drop. @filipnavara might have tips on how to do this better but I think it's good enough

@jamescrosswell
Collaborator Author

> This seems to only remove the events we'd like to drop. @filipnavara might have tips on how to do this better but I think it's good enough

Kind of.

If it's a SIGABRT that results from something that has already been captured and translated to a managed NullReferenceException then this will remove an event we're not interested in (we've already processed it in the managed SDK).

If the SIGABRT results from a null pointer reference in some native code that didn't get translated to a managed NullReferenceException though, we're dropping an exception here and the customer won't have any visibility of this.

Those are our two options at the moment though: either have duplicate events (one of them confusing) or risk dropping legitimate native exceptions.

Unfortunately there's nothing in the stack trace we can use for these exceptions, like we did here.

@jamescrosswell jamescrosswell marked this pull request as draft January 29, 2025 09:34
@jamescrosswell jamescrosswell marked this pull request as ready for review January 30, 2025 08:32
Successfully merging this pull request may close these issues.

Sentry reports handled NRE as EXC_BAD_ACCESS signal