Xamarin.iOS Crashing when SIGSEGV catching enabled
mzekrallah
Posted on
May 19 2015
Hi,
I'm using Xamarin.iOS and everything is running OK. However, when I wanted to catch SIGSEGV signals in order to capture native crashes, I followed this article https://raygun.io/blog/2014/10/report-native-ios-errors-new-beta-xamarin-provider/ and added this line in Main.cs:
RaygunClient.Attach ("", true, false);
The problem now is that the app always crashes at startup and I see this error in the Xamarin Studio console:
[PLCrashReport] plframecursorreadframeptr:86: Stack growing in wrong direction, terminating stack walk
[PLCrashReport] plcrashwriterwritethread:1044: Terminated stack walking early: Corrupted frame
[PLCrashReport] plframecursorreadframeptr:86: Stack growing in wrong direction, terminating stack walk
[PLCrashReport] plcrashwriterwritethread:1044: Terminated stack walking early: Corrupted frame
Please help, as I want to capture native crashes and am not interested in managed exceptions only.
Jason Fauchelle
Raygun
Posted on
May 20 2015
Hello,
Thanks for pointing this out. I have not been able to reproduce this yet but will keep trying tomorrow. If you are able to send a simple repro project, that would be a huge help in debugging this. In any case, I'll let you know what I find tomorrow.
-Jason Fauchelle
Jason Fauchelle
Raygun
Posted on
May 21 2015
Hello,
I have had another extensive look into this, but have not yet been able to reproduce this issue. The answers to these questions will help me look into this further:
Are you throwing or causing any exceptions on app startup (either in managed or native code)?
If you change both parameters in the RaygunClient.Attach call to true, does that solve the problem?
If you create a fresh new app and integrate Raygun4Net in the same way as you are doing in your app, does the problem still occur?
-Jason Fauchelle
mzekrallah
Posted on
May 21 2015
Hi Jason,
Thank you for your response. Actually, this issue is really pretty weird and hard to debug. I just tried a new empty app and it works with the (true, false) combination.
In my app, the error is still there even though I'm not throwing any errors on startup (managed or native).
Below are some notes for further debugging:
Any combination other than (true, false) works in my app and does not produce the error I described in my original post, and if I throw any errors they are reported to the portal.
I thought that other crash-reporting libraries referenced in my project might cause the error if they have the PLCrashReporter iOS lib integrated, but I removed them all and the error still appears.
Nothing special in my main or app delegate code .. normal Xamarin.iOS unified project.
I thought maybe something was stuck in my app's simulator and Raygun was trying to do something special, like sending startup errors from a previous build, but I reset the simulator and the error is still there.
I have another app with similar functionality. I integrated Raygun with (true, false), and the app ran and showed the first login screen. When I clicked login, it crashed and sent a 'SIGBUS BUS_ADRERR' to the portal. The second time I ran the app, it crashed immediately with the stack frame issue described in the original post! Now, no matter what I do, I can't get past that and have the first page open as I did before. I reset the simulator and did a clean build, but it always crashes on app loading with the stack frame issue, and I never make it to that login page anymore!
What else do you think this error could be related to? I mean, in which cases does Raygun ask PLCrashReporter to walk the stack frames? Or is that generic initialization code that runs in all cases?
I'm ready to help you debug this problem further, so please ask me for any info you may need, other than sending my project of course :P
I really need to get this working as I have tried all other libs and they all suck at reporting native errors (Xamarin Insights, Ubertesters, TestFairy, etc..) and Raygun is my last hope in my evaluation.
Best Regards
Jason Fauchelle
Raygun
Posted on
May 21 2015
Hello,
Thanks for all this additional information.
The only way PLCrashReporter should be walking the stack frames is if a native error has occurred (this can include errors occurring in the native code that Xamarin generates from your managed code). Just to clarify, (true, true) is working in your apps, but (true, false) is not? This most likely means that you have a NullReferenceException occurring within a try/catch block somewhere in your code.
In Mono, null reference exceptions start out as SIGSEGV signals in native code, which in Objective-C is bad, and so PLCrashReporter detects this, reports it, and aborts the app before the try/catch in the managed code can do anything about it. That is the reason for the second parameter: when true, Raygun hijacks the SIGSEGV and SIGBUS signals to work around the null-reference-within-try/catch issue, but this prevents native SIGSEGV and SIGBUS signals from being reported altogether.
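To illustrate the handler-ordering problem described above, here is a minimal sketch in plain C of one way such a trade-off can arise at the POSIX level. It is not Raygun's, PLCrashReporter's, or Mono's actual code: `runtime_handler`, `reporter_handler`, `reporter_attach`, and `attach_preserving_runtime_handlers` are hypothetical stand-ins. The idea is to save the runtime's SIGSEGV and SIGBUS handlers before the crash reporter installs its own, then put them back afterwards, which also means the reporter never sees those two signals.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict C modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

typedef void (*handler_fn)(int);

/* Hypothetical stand-in for the managed runtime's signal handler. */
static void runtime_handler(int sig) { (void)sig; }

/* Hypothetical stand-in for a native crash reporter's signal handler. */
static void reporter_handler(int sig) { (void)sig; }

/* Install fn as the handler for sig. Returns 0 on success, -1 on error. */
static int install_handler(int sig, handler_fn fn) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Query the handler currently installed for sig without changing it. */
static handler_fn current_handler(int sig) {
    struct sigaction sa;
    if (sigaction(sig, NULL, &sa) != 0) return SIG_ERR;
    return sa.sa_handler;
}

/* Hypothetical reporter init: overwrites the SIGSEGV and SIGBUS handlers. */
static void reporter_attach(void) {
    install_handler(SIGSEGV, reporter_handler);
    install_handler(SIGBUS, reporter_handler);
}

/* Attach the reporter, then restore whatever handlers were there before. */
static int attach_preserving_runtime_handlers(void) {
    struct sigaction old_segv, old_bus;
    if (sigaction(SIGSEGV, NULL, &old_segv) != 0) return -1;  /* save */
    if (sigaction(SIGBUS, NULL, &old_bus) != 0) return -1;
    reporter_attach();                                        /* overwrite */
    if (sigaction(SIGSEGV, &old_segv, NULL) != 0) return -1;  /* restore */
    if (sigaction(SIGBUS, &old_bus, NULL) != 0) return -1;
    return 0;
}

/* Walk through the sequence and check which handler ends up active. */
static int run_demo(void) {
    /* 1. The runtime installs its handlers first. */
    if (install_handler(SIGSEGV, runtime_handler) != 0) return -1;
    if (install_handler(SIGBUS, runtime_handler) != 0) return -1;
    /* 2. The reporter attaches, but the runtime's handlers are restored. */
    if (attach_preserving_runtime_handlers() != 0) return -1;
    /* 3. The runtime, not the reporter, still owns both signals. */
    assert(current_handler(SIGSEGV) == runtime_handler);
    assert(current_handler(SIGBUS) == runtime_handler);
    return 0;
}
```

Calling `run_demo()` from a `main` function exercises the whole sequence. Without the save-and-restore step, `reporter_handler` would be the active handler for both signals, which corresponds to the situation where the reporter sees a null-reference SIGSEGV before the managed runtime can turn it into an exception.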
The next step would be to look through any try/catch blocks in your apps to determine if this really is the case. If so, null reference exceptions are generally very easy to avoid with a null check, so I recommend fixing them where possible.
As for your other app with the login screen, I'm not sure what could be causing that, but I would again recommend looking at managed exceptions occurring in try/catch blocks. Does (true, true) fix the issue in your other app?
Hope that helps you look further into the issue. Let me know how it goes.
-Jason Fauchelle
mzekrallah
Posted on
May 21 2015
Awesome explanation Jason. I fully understand the context now. To answer your questions :
Some Info :
Yes, (true, true) works for me. However, (true, false) doesn't.
I don't have any exception handling anywhere in the app yet, as I still have to add that layer later.
Some Questions :
- Would null references occurring outside try/catch blocks have the same effect as you said ?
- I'm a bit more convinced that it may not be a null reference issue, given that I have now tried the other app and the stack frame issue appears 7 times out of 10 trials. So if I clean the simulator and try again several times, I could get lucky and see the login screen. However, when I click the login button the crash happens again, and the debugger never even reaches the click handler method.
Anyway, this is pretty weird and looks hard to debug. My concern is that even if I have null references and I make sure to avoid them, this solution is not practical, because I can't see the stack trace or the lines throwing the errors in the console or even in Raygun. It looks like this bypasses the Mono handler as well. Couldn't you guys kick in PLCrashReporter after the Mono handler is done with the signals, so we can catch the really ugly SIGSEGVs that the Mono handler can't turn into exceptions? And is this a Xamarin issue, Raygun's, or PLCrashReporter's?
Thank you
Jason Fauchelle
Raygun
Posted on
May 22 2015
Some excellent questions.
Null reference exceptions occurring outside try/catch blocks do in fact have the same effect - PLCrashReporter picks up the native SIGSEGV signal, reports it, and then aborts the app. That said, (true,true) would mean that the managed null reference exception gets reported which you would know about, so this shouldn't be related to your case.
If you are seeing native exceptions showing up in your Raygun dashboard, you can upload the dSYM that Xamarin generates when you build the app, which will symbolicate the stack trace to help with debugging. Line numbers will probably be off for any managed stack frames though. dSYMs are generated for iPhone builds (not simulator builds) and can be found in the bin folder of your project. You can upload to Raygun by clicking Application Settings -> dSYM Center in the left-hand sidebar of your Raygun application. To symbolicate the stack trace, the dSYM needs to be generated from the same build that you report the error from. Note that these steps are only applicable if you are seeing native errors in your Raygun dashboard.
Both PLCrashReporter and the managed exception handler need to be attached to get the widest exception handling coverage. I have made a note to look at resolving the issue of null references surfacing as native signals, but I'm not sure yet when I can look into it.
Who is causing this issue can't easily be answered until I'm able to fully reproduce this at my end, which is what I'll need to do in order to continue looking into this. If you can, please try to create a repro project that demonstrates this issue. You mentioned that you tried Raygun in a new empty app, so you could use that project as a starting point. Perhaps comment out bits of your application to try to narrow down where the issue could be originating from. My next best guess is that a stack overflow exception is occurring in native code - this is something that PLCrashReporter doesn't detect, so that may be the next thing to search for. Since you have two apps with the same issue, try looking into the similarities between them.
Also, in case you haven't done so already, make sure you're using the latest stable build of Xamarin Studio and Xamarin.iOS.
I hope we can get to the bottom of this. Let me know any further questions, information or observations you find.
-Jason Fauchelle