Using Hopper to investigate crashes

I'd like to share a crash investigation story where I unexpectedly ended up disassembling our own release code.

Demangling Swift symbols

A colleague recently investigated a crash in the wild which looked something like this (I've obfuscated a few details):

EXC_BAD_INSTRUCTION EXC_I386_INVOP 0x0000000000000000

Crashed: com.apple.main-thread
0  ComponentA     0x110748baa globalinit_33_D3DAB360B4CBCC82A755F3663EB4895A_func0 + 8394
1  ComponentB    0x10ff859a2 globalinit_33_87B8F2CE4CB5E64F9242EE8C482CF7A5_func13 + 15458
2  ComponentB    0x10ff87f5e globalinit_33_87B8F2CE4CB5E64F9242EE8C482CF7A5_func13 + 25118
3  ComponentB    0x10ff8a0dd globalinit_33_87B8F2CE4CB5E64F9242EE8C482CF7A5_func13 + 33693
Mysterious crash report

We didn't have any repro steps, but it was our third most frequent crash report.

I got involved when he asked me if I could remember offhand how to demangle Swift names in stack traces. I had a dig through my notes and recalled that you could use the terminal command xcrun swift-demangle.

Generated dispatch_once methods

That didn't work with those globalinit symbols. I suspected they were generated code for things like static initialisers in Swift. Here's a simple example:

class Foo {
  static let shared = Foo()
}
Swift singleton

The compiler translates this into something roughly equivalent to the following Objective-C (a pattern we also used to write by hand before Swift 3):

static Foo *shared(void) {
  static Foo *sharedInstance = nil;
  static dispatch_once_t onceToken;
    
  dispatch_once(&onceToken, ^{
    sharedInstance = [[Foo alloc] init];
  });
    
  return sharedInstance;
}
Objective-C singleton

In order to be certain, I disassembled the crashing release build in Hopper and had a look. Sure enough, the globalinit function is handed to swift_once inside the accessor for a static property:

int _$s6ComponentA19ComponentAWindowComponentC14sharedInstanceACvau() {
  if (*_globalinit_33_D3DAB360B4CBCC82A755F3663EB4895A_token0 != 0xffffffffffffffff) {
    swift_once(_globalinit_33_D3DAB360B4CBCC82A755F3663EB4895A_token0, _globalinit_33_D3DAB360B4CBCC82A755F3663EB4895A_func0);
  }
  return static ComponentA.ComponentAWindowComponent.sharedInstance : ComponentA.ComponentAWindowComponent;
}
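To see the same once-only behaviour from the Swift side, here is a minimal sketch. The counter and the class are hypothetical and exist only for illustration. It shows that the initialiser behind a static let runs lazily, on first access, and only once.

```swift
// Hypothetical counter, used only to observe when the initialiser runs.
var initCount = 0

final class Foo {
    // The compiler guards this initialiser so that it runs at most once.
    static let shared = Foo()

    private init() {
        initCount += 1
    }
}

print(initCount)   // 0: nothing has touched Foo.shared yet
_ = Foo.shared     // first access runs the initialiser
_ = Foo.shared     // later accesses reuse the stored instance
print(initCount)   // 1
```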

EXC_BAD_INSTRUCTION

Now let's think back to the type of crash in our report: EXC_BAD_INSTRUCTION. The legendary Quinn "The Eskimo!" has a few words on this type of crash:

usually caused by two things:

  1. you’ve jumped (via a function pointer, or method dispatch with a corrupt object, or by smashing the stack, or whatever) to invalid code
  2. You’ve hit an invalid opcode (typically ud2) inserted by the compiler as a trap mechanism

We had a look at all the functions in the final three frames but nothing stood out. My colleague Martin had also pointed out that the instruction byte offsets in the frames were ridiculously big - 8K, 15K, 25K and 33K. Usually you'd expect them to be in the tens or perhaps hundreds. Anyway, there were no ud2 instructions in sight. We put it aside and figured there must be some kind of corruption in the instruction pointer or a mangled stack trace - Quinn's option 1.

It kept bothering me though. I went back through different reports and those final three frames were always the same symbols, in the same frameworks and always with the same offsets (plus or minus a few bytes as the code changes from build to build). It seemed improbable that a mangled stack trace would be so consistent from crash to crash.

Trust the numbers

I had a hypothesis: what if the raw addresses were correct but the symbolication was wrong? We'd still have the crash address; it was just being reported relative to an unrelated symbol.

0  ComponentA     0x110748baa globalinit_33_D3DAB360B4CBCC82A755F3663EB4895A_func0 + 8394

So I fired up Hopper again, navigated back to the globalinit... function and this time used the 'Navigate to offset in procedure' command, putting in 8394. This time we landed on a ud2 assembly instruction.
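The arithmetic behind that frame line can be sketched as follows. The helper name is hypothetical; the two numbers are taken from frame 0 of the report. A frame gives an absolute runtime address plus an offset from whichever symbol the symbolicator chose, so subtracting one from the other gives where that symbol starts in the running process.

```swift
// Hypothetical helper: recover the start address of the symbol that the
// symbolicator picked, given a frame's absolute address and its "+ offset".
func symbolStart(frameAddress: UInt64, offset: UInt64) -> UInt64 {
    return frameAddress - offset
}

let frameAddress: UInt64 = 0x110748baa  // frame 0 address from the report
let offset: UInt64 = 8394               // "+ 8394" from the same frame

let start = symbolStart(frameAddress: frameAddress, offset: offset)
print(String(start, radix: 16))         // 110746ae0
```

The runtime address typically includes whatever slide the loader applied, so it won't necessarily match addresses in the on-disk binary. The offset from the symbol doesn't depend on the slide, which is why navigating by offset inside the procedure is the convenient route.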

Our culprit

Hmm, didn't Quinn mention that in option 2? One of the most powerful features of Hopper is being able to show the back-references to an instruction. From that I could find the real code section this instruction belonged to - calling a selector on a class. It was the kind of place where I imagine an unexpected nil value would definitely cause an EXC_BAD_INSTRUCTION.

Hopper also lets you export a PDF with a flow diagram showing the complete function you are looking at. I sent that over to Martin with the crashing instruction highlighted on it.

Hopper flow diagram

The next day Martin managed to use that information to recreate the crash:

  • When our app closes with Window A and Window B open, it crashes if Window A is frontmost
  • AppKit closes them front to back, so when Window B tries to access something in Window A (via the method we're crashing in) it unexpectedly finds nil as it's already been torn down
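As a rough illustration of that failure mode, here is a sketch with hypothetical stand-in types, not our real window classes. It assumes the reference to Window A is weak and that the crashing path behaves like a forced unwrap, which is a guess rather than something the report confirmed. It contrasts an access that would trap with one that tolerates the other window having gone away first.

```swift
// Hypothetical stand-ins for the two windows; not the app's real types.
final class WindowA {
    let title = "Window A"
}

final class WindowB {
    // Weak reference: becomes nil once Window A has been torn down.
    weak var windowA: WindowA?

    // Trapping variant (assumed, left commented out): force-unwrapping nil
    // stops the process at a trap instruction inserted by the compiler.
    // func siblingTitle() -> String { return windowA!.title }

    // Defensive variant: returns nil instead of trapping.
    func siblingTitle() -> String? {
        return windowA?.title
    }
}

var a: WindowA? = WindowA()
let b = WindowB()
b.windowA = a

print(b.siblingTitle() ?? "nil")  // Window A
a = nil                           // simulate Window A closing first
print(b.siblingTitle() ?? "nil")  // nil
```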

Key takeaways

I've learnt some valuable lessons from this one:

  • don't dismiss crash reports out of hand when they don't look like they make sense
  • don't always trust the symbol names but do trust the addresses
  • disassembling your own release code can be a very powerful tool in your crash reporting

We've now got a list of similar crashes which we previously thought were unsolvable. It's going to be a lot of fun using the same technique to see if we can get further in tracking them down.

Next steps

I still want to know why the report didn't correctly connect the crashing address with the right symbol. I'm going to have a play with dwarfdump, xcrun atos and a few other tools to see if I can get to the bottom of it. If I have any luck I'll post back here.


Any questions or insights? Message me on Twitter @earltedly. You can also sign up for emails with more posts like this one.