Monday, February 10, 2014

Making sense of backtraces with addr2line

When the X server crashes it prints a backtrace to the log file. This backtrace looks something like this:

(EE) Backtrace:
(EE) 0: /usr/bin/Xorg (OsLookupColor+0x129) [0x473759]
(EE) 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x3cd140f74f]
(EE) 2: /lib64/libc.so.6 (__select_nocancel+0xa) [0x3cd08ec78a]
(EE) 3: /usr/bin/Xorg (WaitForSomething+0x1ac) [0x46a8fc]
(EE) 4: /usr/bin/Xorg (SendErrorToClient+0x111) [0x43a091]
(EE) 5: /usr/bin/Xorg (_init+0x3b0a) [0x42c00a]
(EE) 6: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3cd0821d65]
(EE) 7: /usr/bin/Xorg (_start+0x29) [0x428c35]
(EE) 8: ? (?+0x29) [0x29]
This is a forced backtrace from the current F20 X Server package, generated by killall -11 Xorg. There is not a lot of human-readable information but you can see the call stack, and you can even recognise some internal functions. Now, in Fedora we compile with libunwind which gives us relatively good backtraces. Without libunwind, your backtrace may look like this:
(EE) 
(EE) Backtrace:
(EE) 0: /opt/xorg/bin/Xorg (xorg_backtrace+0xb5) [0x484989]
(EE) 1: /opt/xorg/bin/Xorg (0x400000+0x8d1a4) [0x48d1a4]
(EE) 2: /lib64/libpthread.so.0 (0x3cd1400000+0xf750) [0x3cd140f750]
(EE) 3: /lib64/libc.so.6 (__select+0x33) [0x3cd08ec7b3]
(EE) 4: /opt/xorg/bin/Xorg (WaitForSomething+0x3dd) [0x491a45]
(EE) 5: /opt/xorg/bin/Xorg (0x400000+0x3561b) [0x43561b]
(EE) 6: /opt/xorg/bin/Xorg (0x400000+0x43761) [0x443761]
(EE) 7: /opt/xorg/bin/Xorg (0x400000+0x9baa8) [0x49baa8]
(EE) 8: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3cd0821d65]
(EE) 9: /opt/xorg/bin/Xorg (0x400000+0x25df9) [0x425df9]
So, even less information and it certainly makes it hard to figure out where to even get started. Luckily there is a tool to get some useful info out of that: eu-addr2line. All you need is to install the debuginfo package for the crashing program. Then it's just a matter of copying addresses.
$ eu-addr2line -e /opt/xorg/bin/Xorg 0x48d1a4
/home/whot/xorg/xserver/os/osinit.c:132
Alright, this is useful now, I can download the source package and check where it actually goes wrong. But wait - it gets even better. Let's say you have a driver module in the callstack:
(EE) Backtrace:
(EE) 0: /opt/xorg/bin/Xorg (xorg_backtrace+0xb5) [0x484989]
(EE) 1: /opt/xorg/bin/Xorg (0x400000+0x8d1a4) [0x48d1a4]
(EE) 2: /lib64/libpthread.so.0 (0x3cd1400000+0xf750) [0x3cd140f750]
(EE) 3: /opt/xorg/lib/libinput.so.0 (libinput_dispatch+0x19) [0x7ffff1e51593]
(EE) 4: /opt/xorg/lib/xorg/modules/input/libinput_drv.so (0x7ffff205b000+0x2a12) [0x7ffff205da12]
(EE) 5: /opt/xorg/bin/Xorg (xf86Wakeup+0x1b1) [0x4af069]
(EE) 6: /opt/xorg/bin/Xorg (WakeupHandler+0x83) [0x444483]
(EE) 7: /opt/xorg/bin/Xorg (WaitForSomething+0x3fe) [0x491a66]
(EE) 8: /opt/xorg/bin/Xorg (0x400000+0x3561b) [0x43561b]
(EE) 9: /opt/xorg/bin/Xorg (0x400000+0x43761) [0x443761]
(EE) 10: /opt/xorg/bin/Xorg (0x400000+0x9baa8) [0x49baa8]
(EE) 11: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3cd0821d65]
(EE) 12: /opt/xorg/bin/Xorg (0x400000+0x25df9) [0x425df9]
You can see that we have a xf86libinput driver (libinput_drv.so) which in turn loadsd libinput.so. You can debug the crash the same way now, just change the addr2line argument:
$ eu-addr2line -e /opt/xorg/lib/libinput.so.0 libinput_dispatch+0x19 
/home/whot/code/libinput/src/libinput.c:603
Having this information of course doesn't mean you can fix any bug. But when you're reporting a bug it can be invaluable. If I have access to the same rpms that you're running it's possible to look up the context of the crash in the source. Or, even better, since you already have access to those you can make debugging a lot easier by attaching the required bits and pieces to a bug report. Seeing a bug report where a reporter already narrowed down where it crashes makes it a lot easier than guessing based on hex numbers what went wrong.