Thursday, March 21, 2013

Debuggin Oops using Crash Utility


Crash log snippet of binder crash,

Before you start make sure the ramdump and vmlinux matches by using strings command,
strings vmlinux | grep "Linux version"

[49081.080352] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[49081.088500] pgd = d748c000
[49081.091186] [00000030] *pgd=00000000
[49081.094787] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[49081.100067] kona_fb: die notifier invoked
[49081.104064] kona_display_crash_image:359 image_idx=4
[49081.109863] post_async:540 AXIPV has received the same buffer which is used
[49081.116790] post_async:541 It is likely that we see display tearing
[49081.123046] post_async:542 Preventing a potential lock-up by signalling
[49081.129669] Modules linked in: [last unloaded: bcmdhd_4330]
[49081.135253] CPU: 1    Tainted: G        W     (3.4.5+ #1)
[49081.140655] PC is at binder_get_node+0x8/0x34
[49081.144989] LR is at binder_thread_write+0x147c/0x239c
[49081.150115] pc : []    lr : []    psr: 20000053
[49081.150146] sp : f43a3e10  ip : f8f004fc  fp : 00000000
[49081.161560] r10: f8f004c4  r9 : 73622a85  r8 : 00000064
[49081.166778] r7 : f43a2000  r6 : f8f00460  r5 : 411e36f8  r4 : c0a69260
[49081.173278] r3 : 00000010  r2 : 411e3858  r1 : 411e36f8  r0 : f32c7a40
[49081.179779] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[49081.186981] Control: 10c53c7d  Table: 9948c04a  DAC: 00000015

Disassemble of binder_get_node,

crash> dis binder_get_node
0xc03dc4a8 :   ldr     r3, [r0, #12]
0xc03dc4ac : b       0xc03dc4cc
0xc03dc4b0 : ldr     r2, [r3, #32]
0xc03dc4b4 :        sub     r0, r3, #16
0xc03dc4b8 :        cmp     r1, r2
0xc03dc4bc :        ldrcc   r3, [r3, #8]
0xc03dc4c0 :        bcc     0xc03dc4cc
0xc03dc4c4 :        bxls    lr
0xc03dc4c8 :        ldr     r3, [r3, #4]
0xc03dc4cc :        cmp     r3, #0
0xc03dc4d0 :        bne     0xc03dc4b0
0xc03dc4d4 :        mov     r0, r3
0xc03dc4d8 :        bx      lr
crash>

ldr  r3,[r0, #12]
From crash log above, r0 = f32c7a40

To confirm read memory 0xf32c7a40 + 0xc using  crash,

crash> rd f32c7a4c
f32c7a4c:  00008000                              ....
crash>

But r3 is 0x10. Which is not matching with 0x8000. So this means r0 has changed (since ldr  r3,[r0, #12]). Proves that crash not happened in first instance. It had looped for some time

C code binder_get_node,

static struct binder_node *binder_get_node(struct binder_proc *proc,
                                           void __user *ptr)
{
        struct rb_node *n = proc->nodes.rb_node;
        struct binder_node *node;

        while (n) {
                node = rb_entry(n, struct binder_node, rb_node);
                if (ptr < node->ptr)
                        n = n->rb_left;
                else if (ptr > node->ptr)
                        n = n->rb_right;
                else
                        return node;
        }
        return NULL;
}

Back trace from crash log,

[49081.971740] [] (binder_get_node+0x8/0x34) from [] (binder_thread_write+0x147c/0x239c)
[49081.981262] [] (binder_thread_write+0x147c/0x239c) from [] (binder_ioctl+0x37c/0x974)
[49081.990814] [] (binder_ioctl+0x37c/0x974) from [] (do_vfs_ioctl+0x4d8/0x54c)
[49081.999572] [] (do_vfs_ioctl+0x4d8/0x54c) from [] (sys_ioctl+0x34/0x54)
[49082.007904] [] (sys_ioctl+0x34/0x54) from [] (ret_fast_syscall+0x0/0x48)

Since the back trace was showing a sys_ioctl we iterated on stack to pick up the struct file * associated with.
Here is the hit,

crash> struct file f37cbef0
struct file {
  f_u = {
    fu_list = {
      next = 0xf37cb550,
      prev = 0xf2a0def0
    },
    fu_rcuhead = {
      next = 0xf37cb550,
      func = 0xf2a0def0
    }
 },
  f_path = {
    mnt = 0xf5015e58,
    dentry = 0xf3cc5d08
  },
  f_op = 0xc064dc0c,
----sniped---
  f_version = 0,
  private_data = 0xf2e9ca00,
  f_ep_links = {
    next = 0xf37cbf74,
    prev = 0xf37cbf74
  },
  f_tfile_llink = {
    next = 0xf37cbf7c,
    prev = 0xf37cbf7c
  },
  f_mapping = 0xf4090ad0
}

f_op symbol says it is binder_fops,

crash> sym 0xc064dc0c
c064dc0c (r) binder_fops
crash>

From code(drivers/staging/android/binder.c +3012) we figured out that private_data is binder_proc

crash> struct binder_proc 0xf2e9ca00
struct binder_proc {
  proc_node = {
    next = 0xf3514200,
    pprev = 0xc0a69c84
  },
  threads = {
    rb_node = 0xf2a10c84
  },
  nodes = {
    rb_node = 0xf2d1a7d0
  },
  refs_by_desc = {
    rb_node = 0xf3126284
  },
  refs_by_node = {

From the analysis of the code, we understand that crash happened while traversing the rb tree.
So tried to parse the rb tree using crash,

crash> tree -t rbtree -N 0xf2d1a7d0 -p
f2d1a7d0
  position: root
f2a00ad0
  position: root/l
f2a90e50
  position: root/r
f32c7a50
  position: root/r/r
10
  position: root/r/r/l
tree: invalid kernel virtual address: 18  type: "rb_node rb_left"
crash>

This shows that rb tress has a corrupted address to next left node.

0x10 which is the same value in r3 during crash.
And crash happened because of address 0x30.
[49081.080352] Unable to handle kernel NULL pointer dereference at virtual address 00000030

This explains why crashed.
0xc03dc4b0 : ldr     r2, [r3, #32]

R3 + #32 = 0x10 + 0x20 = 0x30

Dumping the corrupted struct rb_node from memory,
crash> struct rb_node f32c7a50
struct rb_node {
  rb_parent_color = 44100,
  rb_right = 0x2,
  rb_left = 0x10
}
crash>

This shows the rb tree is corrupted. Because rb_left and rb_right should have a valid kernel address or 0.

Dumping the memory from f32c7a00, 100 elements to inspect the memory,
crash> rd f32c7a00 100
f32c7a00:  f32c79fc 00000000 00000000 00000000   .y,.............
f32c7a10:  00000000 00000000 00000000 00000000   ................
f32c7a20:  00000000 00000000 00000000 00000000   ................
f32c7a30:  00000000 00000000 00000000 00000000   ................
f32c7a40:  01bb5e6b c03eed84 f4727600 00008000   k^....>..vr.....
f32c7a50:  0000ac44 00000002 00000010 f5c10000   D...............
f32c7a60:  00010000 b7c10000 00000000 00000000   ................
f32c7a70:  411e3858 411e3840 00001fe5 00000002   X8.A@8.A........
f32c7a80:  f32c7a7c 00000001 00000000 00000000   |z,.............
f32c7a90:  00000000 00000000 00000000 00000000   ................
f32c7aa0:  00000000 00000000 00000000 00000000   ................
f32c7ab0:  00000000 00000000 00000000 00000000   ................
f32c7ac0:  00000000 00000000 f32c7ac8 f32c7ac8   .........z,..z,.
f32c7ad0:  00000000 00000000 00000010 00000001   ................
f32c7ae0:  00000001 00000000 00000001 00000001   ................
f32c7af0:  00000000 00000000 00000000 f3faf330   ............0...
f32c7b00:  f3a6c000 00000000 00000000 00000000   ................
f32c7b10:  00000000 00000000 00000000 00000000   ................
f32c7b20:  00000000 00000000 00000000 00000000   ................
f32c7b30:  00000000 00000000 00000000 00000000   ................
f32c7b40:  006aefa7 f32c7b44 f32c7b44 00000003   ..j.D{,.D{,.....
f32c7b50:  f2bf1450 f3bef7d0 00000000 00000000   P...............
f32c7b60:  f3854bdc 00000001 00000000 00000000   .K..............
f32c7b70:  54d4c670 54d3f828 00001fe5 f32c7b7c   p..T(..T....|{,.
f32c7b80:  f32c7b7c 00000000 00000000 00000000   |{,.............

Dumping with symbols,
crash> rd -s f32c7a00 100
f32c7a00:  __crc_v4l2_norm_to_name+482177 00000000 00000000 00000000
f32c7a10:  00000000 00000000 00000000 00000000
f32c7a20:  00000000 00000000 00000000 00000000
f32c7a30:  00000000 00000000 00000000 00000000
f32c7a40:  01bb5e6b AUDIO_DRIVER_InterruptPeriodCB __crc_bmp18x_disable+1706299 00008000
f32c7a50:  0000ac44 00000002 00000010 __crc_generic_segment_checks+42732
f32c7a60:  00010000 b7c10000 00000000 00000000
f32c7a70:  411e3858 411e3840 00001fe5 00000002
f32c7a80:  __crc_v4l2_norm_to_name+482305 00000001 00000000 00000000
f32c7a90:  00000000 00000000 00000000 00000000
f32c7aa0:  00000000 00000000 00000000 00000000
f32c7ab0:  00000000 00000000 00000000 00000000
f32c7ac0:  00000000 00000000 __crc_v4l2_norm_to_name+482381 __crc_v4l2_norm_to_name+482381
f32c7ad0:  00000000 00000000 00000010 00000001
f32c7ae0:  00000001 00000000 00000001 00000001
f32c7af0:  00000000 00000000 00000000 __crc_qdisc_class_hash_grow+482810
f32c7b00:  __crc_rt_mutex_trylock+115850 00000000 00000000 00000000
f32c7b10:  00000000 00000000 00000000 00000000
f32c7b20:  00000000 00000000 00000000 00000000
f32c7b30:  00000000 00000000 00000000 00000000
f32c7b40:  006aefa7 __crc_v4l2_norm_to_name+482505 __crc_v4l2_norm_to_name+482505 00000003
f32c7b50:  __crc_usb_get_descriptor+260561 __crc_bdi_set_max_ratio+149137 00000000 00000000
f32c7b60:  __crc_snd_interval_ratnum+776842 00000001 00000000 00000000
f32c7b70:  54d4c670 54d3f828 00001fe5 __crc_v4l2_norm_to_name+482561
f32c7b80:  __crc_v4l2_norm_to_name+482561 00000000 00000000 00000000

Somebody has over written from f32c7a40
This causes rb_tree corruption.
Surprisingly there is an audio function address in the corrupted memory as highlighted above.


From the kernel logs we can see audio and rpc errors just before the crash.
Eg:
[48902.141143] WARNING: at /build/REL_sdb-common-android-jb-4.2.2_428843/android_hawaii_garnet_edn010/kernel/hawaii/3.4.5/sound/caph_hawaii/caph_pcm.c:248 PcmHwFree+0x8c/0xdc()
[48902.157012] ERROR timeout waiting for STOP REQ.t=30000 ret=0
[48902.162719] Modules linked in: [last unloaded: bcmdhd_4330]
[48902.168579] [] (unwind_backtrace+0x0/0x11c) from [] (warn_slowpath_common+0x4c/0x64)
[48902.178192] [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_fmt+0x2c/0x3c)
[48902.187957] [] (warn_slowpath_fmt+0x2c/0x3c) from [] (PcmHwFree+0x8c/0xdc)
[48902.196716] [] (PcmHwFree+0x8c/0xdc) from [] (snd_pcm_release_substream+0x44/0xa4)
[48902.206512] [] (snd_pcm_release_substream+0x44/0xa4) from [] (snd_pcm_release+0x28/0x6c)
[48902.216735] [] (snd_pcm_release+0x28/0x6c) from [] (fput+0x10c/0x214)
[48902.224914] [] (fput+0x10c/0x214) from [] (filp_close+0x70/0x7c)
[48902.232879] [] (filp_close+0x70/0x7c) from [] (put_files_struct+0x90/0xec)
[48902.241668] [] (put_files_struct+0x90/0xec) from [] (do_exit+0x280/0x7e0)
[48902.250671] [] (do_exit+0x280/0x7e0) from [] (do_group_exit+0x88/0xb4)
[48902.259338] [] (do_group_exit+0x88/0xb4) from [] (get_signal_to_deliver+0x510/0x568)
[48902.269470] [] (get_signal_to_deliver+0x510/0x568) from [] (do_signal+0xc4/0x518)
[48902.278808] [] (do_signal+0xc4/0x518) from [] (do_notify_resume+0x18/0x50)
[48902.287567] [] (do_notify_resume+0x18/0x50) from [] (work_pending+0x24/0x28)

Conclusion is one possible suspect for corruption can be Audio.

Interrupts are disabled in ISR

Linux kernel Stack Memory