Linux - Developed by hackers for hackers: 2013

Wednesday, December 11, 2013

using printascii and printhex8 for debugging during very early init

Situation:
Porting one of brcm board to kernel 3.13-rc3

Problem:
kernel was hanging at one do_one_initcall
To figure out which init call, put prints in do_one_initcal function

diff --git a/init/main.c b/init/main.c
index febc511..dd9f5c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -88,6 +88,8 @@
#include
#endif

+extern void printascii(const char *);
+extern void printhex8(unsigned int);
static int kernel_init(void *);

@@ -693,6 +701,9 @@ int __init_or_module do_one_initcall(initcall_t fn)

int ret;
char msgbuf[64];

+ printascii("func:");
+ printhex8(fn);
+ printascii("\n");
if (initcall_debug)

Output:
func:c0415524

System.map shows
c0415524 t customize_machine

arch/arm/kernel/setup.c:782:arch_initcall(customize_machine);
Function was calling machine_init.

Fixed in machine_init. It was an smc call to init L2 cache controller.

Friday, November 22, 2013

SMP bit affects L1 cache operation on Cortex-A9 MPCore

Q) We have a region which innercacheble with Write back write allocate and SMP bit is set. So if we update this region with some data it's updating in cache, main memory is not updated, which is as expected.

But if we clear the SMP bit and update this region then the data is getting updated to external memory not getting updated in Cache. Please not all these operation is from core0, core1 is in WFI during this and it's SMP bit is not set.

Ans) This information is contained in the ARM Architecture Reference Manual with the TRM for the Cortex-A9 MPCore and the Cortex-A9 TRM offering additional information.

When an area of memory is declared as Normal, Write-back and shareable with the processor operating in SMP mode, the data will be cached. At this point the SCU will effectively ensure that the data shared between the processors in SMP mode is managed to maintain coherency.

However, when you leave SMP the shareable attribute tells the MMU that this data is shared amongst devices and this overrides the cache attributes and is treated as non-cacheable. This is explained in section 4.3.10, Auxiliary Control Register of the Cortex-A9 TRM. This is consistent with the behaviour that you're observing.

Thursday, November 21, 2013

IIT courseware

http://nptel.iitm.ac.in/

Wednesday, November 20, 2013

GNU assembler - Linker Scripts

GNU assembler - Linker Scripts,
http://www.bravegnu.org/gnu-eprog/index.html

Arm gcc inline cook book,
http://www.ethernut.de/en/documents/arm-inline-asm.html

Tuesday, November 19, 2013

Work queue: What will happen if same work is queued twice?

> Hi,
>
> I have a single work queue, on which I have scheduled a worker function [using queue_work(wq, fn)] in interrupt context.
>
> I get the interrupt twice before the work queue gets a chance to run, and hence the same function will get queued twice (with different private context - arguments etc) which is fine and expected.

You got it wrong here.

bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work)
{
bool ret = false;
unsigned long flags;

local_irq_save(flags);

if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
__queue_work(cpu, wq, work);
ret = true;
}

local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(queue_work_on);

If you look at function queue_work_on, the function returns
immediately if WORK_STRUCT_PENDING_BIT was already set.
So in effect your second call to queue_work will just return.

HTH

Thanks,
Arun

Friday, November 15, 2013

Ram Dump - Linux System

Ram Dump
=========
Used for forensic analysis of a crashed or freeze kernel/system.
As the name points, its the dumping of complete ram contents for further investigation.

Divided into three sections
===================
The whole design is divided into three sections based on how ramdump is entered, transferred and stored.
1) Support in kernel to enter into ramdump mode when kernel crashes or freezes.
2) Support in boot-loader to detect ramdump, wait for a host tool to initiate ramdump and send the data to host.
3) A host tool which can receive data from device and save it on host PC file system.

Boot-loader Support
===============
During booting, boot-loader has to decide whether it has to enter in to ramdump mode or to boot linux normally. This is done by monitoring a magic value in any predefined location(typically SRAM register or any other, value should persists across reboot or reset). There are two magic numbers for example 0x12345678(for crash scenario) and 0x87654321(for freeze scenario). If the SRAM register is having any of the two magic number, the booloader will enter into ramdump mode and wait for the host side application to initiate upload. And if SRAM is not having any of these magic numbers, the bootloader will proceed with normal booting.

Kernel Support
===============
Kernel has to handle panic or freeze scenario and manage the magic numbers to be written in SRAM register.

Ramdump driver init
-----------------------
Write magic number for freeze scenario(MAGIC_FREEZE) in SRAM register.
Register for reboot and panic kernel notifiers.

During a freeze
-----------------
Kernel cannot do much here as the kernel is freezed. User has to press Power ON key for 10 sec, on which device will reset by PMU. While booting back, boot-loader sees MAGIC_FREEZE in SRAM register(which was written during init) and enters to ramdump mode.

During a crash
------------------
Panic notifier registered from the ramdump driver will be called by kernel. Ramdump driver has to do few things.
a) Invoke cache sycn api to make sure all the content the ram is coherent.(flush_cache_all(), outer_flush_all())
b) save core registers to ram
c) save MMU registers
d) write MAGIC_CRASH to SRAM regiter.
Then do a reboot. Now boot-loader finds the MAGIC_CRASH in SRAM register and goes to ramdump mode.

During a normal reboot
-------------------------
Device should not enter in to ramdump mode now.
Reboot notifier registered from the ramdump driver will be called by kernel. Ramdump driver clears MAGIC_FREEZE( written during the init ) and returns.

Saving pt_reg is done in two places,
File: arch/arm/mm/fault.c call the ramdump driver with pt_reg as argument to save for crashed cpu.
File: arch/arm/kernel/smp.c call ramdump driver with pt_reg from IPI_CPU_STOP to save the context of other cpus.

HOST side tool
============
Host side tool receive the data and saves to file system. We exploited fastboot to upload the data.

Design considerations
================
After a crash, device reboots and bootloader will be loaded again to RAM, which corrupts some part of the ram. This is something which cannot be avoided, unless ram upload is implemented in kernel itself. Another approach is to load the bootloader to a location which is of not much importance. For example, load bootloader to framebuffer location or some memory used by video subsystem or some carveout memory in the system.

U-boot relocates itself to the end of the ram. This behavior causes harm to the contents of the memory. So this should be taken care off.

Analyzing ramdump
==============
Ramdump can be analysed using T32 simulator or use amazing crash utility.
You need the corresponding vmlinux.

Sparse/unsparse images

For sparse and unsparse android images,

http://forum.xda-developers.com/showthread.php?t=1081239

Monday, October 28, 2013

Notes about Virtual memory and Highmem

On a 1:3 VM split, kernel has 1GB of virtual space. This 1GB kenel VM is split accross I/O mapped registers, Interrupt vector table, VMALLOC area and your system memory(ie RAM).

For example let's say,
I/O mapped register + VMALLOC area + vector table = 200MB.
So virtual space left for System memory(DDR) = 1024MB - 200MB = 824MB.
Now consider you have 1GB of System memory(RAM). How will you use it?

Two ways
1) You can change VM split(user space:kernel space) to 2:2, or
2) Use High Mem.

If using 2:2 VM split, then your kernel space is 2GB and you have plenty of space to map system memory.

If using High mem, you will have a temporary mapping for High mem area. Virtual address for this temporary mapping depends on usage. Most of the allocation are for user space. Then virtual address will be user space virtual address. Kernel uses kmap function to map high mem area. In that case virtual address will be used from vmalloc area.

High mem, in our case will be 200MB( 1024MB - 824 MB)
where as:
1024 MB is total ram size
824 MB is already mapped to kernel virtual space as low mem.

Permanent mapping also is available from high mem. But this VM space will be limited in size(eg 2MB). This is know as pkmappings.

In general,
Low mem is the ram area which have permanent kernel address mapping.
High mem is mapped depending on the requirement, dynamically.

For more information have a look at page 253 of Linux kernel developement 3rd edition- Robert Love.

All this complexity will go into vents when 64 bit architeure comes because,
2^64 = 16ExtraByte(EB)
You have 16EB of virtual space compared to 4GB of virtual space for 32 bit architecure.

When Virtual memory is enabled, every memory your cpu generates is a virtual address. Which has to be converted to physical memory by MMU. MMU uses page tables(setup by kernel) to accomplish this task.

So when ever a kernel thread runs, it uses the mm pointer of the previous task which was preempted.
All the process has its own page tables, in which the page table entries pointing to the last 1GB address space(which is for kernel space) is same for all process. So kernel thread uses previous process mm to access information relating to kernel space memory.

Wednesday, October 9, 2013

git merge

Here is the procedure while porting kernel from 3.4.5 to 3.10 kernel.

$cd /projects/mobcom_andrwks_ext8/users/arunks/kernel-porting/aosp/

Clone the AOSP kernel,

$git clone https://android.googlesource.com/kernel/common

Now move into 3.4.5 kernel,

$git remote add porting-3.10 /projects/mobcom_andrwks_ext8/users/arunks/kernel-porting/aosp/common

$git fetch porting-3.10

/*Set the renamelimit to make merge complete. Otherwise git merge will

stop rename resolution when default value of renamelimit is reached.*/
$git config merge.renameLimit 999999

Create a local branch with name android-3.10 tracking remotes/origin/experimental/android-3.10

$git merge -s recursive android-3.10

Now resolve all merge conflicts and git add or rm according.

Then do a git commit.

Wednesday, April 17, 2013

Using I2C FIFO

FIFO will enhance the performance of i2c,
Mutli byte transfer.
If slave device is having something like 32 bit of register address and data.

Below is an example of reading a i2c slave register.
Register address and data is 32 bits wide.

When FIFO is disabled it took 2.47 ms to complete the entire transaction compared to 391 us when FIFO enabled.

Tuesday, April 16, 2013

I2C Clock Stretching

Clock stretching is a phenomenon where the I2C slave pull the SCL line low on the 9th clock of every I2C data transfer (before the ACK stage). This is done to inform the master that client is not ready for another transfer.

He is a classic example,

The exact verbatim of clk stretching from the i2c spec is as follows,

“After each acknowledge (A) or not-acknowledge bit (A) the active master disables its current-source pull-up circuit. This enables other devices to delay the serial transfer by stretching the LOW period of the SCLH signal. The active master re-enables its current-source pull-up circuit again when all devices are released and the SCLH signal reaches a HIGH level,”

Thursday, March 21, 2013

Debuggin Oops using Crash Utility

Crash log snippet of binder crash,

Before you start make sure the ramdump and vmlinux matches by using strings command,
strings vmlinux | grep "Linux version"

[49081.080352] Unable to handle kernel NULL pointer dereference at virtual address 00000030

[49081.088500] pgd = d748c000

[49081.091186] [00000030] *pgd=00000000

[49081.094787] Internal error: Oops: 5 [#1] PREEMPT SMP ARM

[49081.100067] kona_fb: die notifier invoked

[49081.104064] kona_display_crash_image:359 image_idx=4

[49081.109863] post_async:540 AXIPV has received the same buffer which is used

[49081.116790] post_async:541 It is likely that we see display tearing

[49081.123046] post_async:542 Preventing a potential lock-up by signalling

[49081.129669] Modules linked in: [last unloaded: bcmdhd_4330]

[49081.135253] CPU: 1 Tainted: G W (3.4.5+ #1)

[49081.140655] PC is at binder_get_node+0x8/0x34

[49081.144989] LR is at binder_thread_write+0x147c/0x239c

[49081.150115] pc : [] lr : [] psr: 20000053

[49081.150146] sp : f43a3e10 ip : f8f004fc fp : 00000000

[49081.161560] r10: f8f004c4 r9 : 73622a85 r8 : 00000064

[49081.166778] r7 : f43a2000 r6 : f8f00460 r5 : 411e36f8 r4 : c0a69260

[49081.173278] r3 : 00000010 r2 : 411e3858 r1 : 411e36f8 r0 : f32c7a40

[49081.179779] Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment user

[49081.186981] Control: 10c53c7d Table: 9948c04a DAC: 00000015

Disassemble of binder_get_node,

crash> dis binder_get_node

0xc03dc4a8 : ldr r3, [r0, #12]

0xc03dc4ac : b 0xc03dc4cc

0xc03dc4b0 : ldr r2, [r3, #32]

0xc03dc4b4 : sub r0, r3, #16

0xc03dc4b8 : cmp r1, r2

0xc03dc4bc : ldrcc r3, [r3, #8]

0xc03dc4c0 : bcc 0xc03dc4cc

0xc03dc4c4 : bxls lr

0xc03dc4c8 : ldr r3, [r3, #4]

0xc03dc4cc : cmp r3, #0

0xc03dc4d0 : bne 0xc03dc4b0

0xc03dc4d4 : mov r0, r3

0xc03dc4d8 : bx lr

crash>

ldr r3,[r0, #12]

From crash log above, r0 = f32c7a40

To confirm read memory 0xf32c7a40 + 0xc using crash,

crash> rd f32c7a4c

f32c7a4c: 00008000 ....

crash>

But r3 is 0x10. Which is not matching with 0x8000. So this means r0 has changed (since ldr r3,[r0, #12]). Proves that crash not happened in first instance. It had looped for some time

C code binder_get_node,

static struct binder_node *binder_get_node(struct binder_proc *proc,

void __user *ptr)

{

struct rb_node *n = proc->nodes.rb_node;

struct binder_node *node;

while (n) {

node = rb_entry(n, struct binder_node, rb_node);

if (ptr < node->ptr)

n = n->rb_left;

else if (ptr > node->ptr)

n = n->rb_right;

else

return node;

}

return NULL;

}

Back trace from crash log,

[49081.971740] [] (binder_get_node+0x8/0x34) from [] (binder_thread_write+0x147c/0x239c)

[49081.981262] [] (binder_thread_write+0x147c/0x239c) from [] (binder_ioctl+0x37c/0x974)

[49081.990814] [] (binder_ioctl+0x37c/0x974) from [] (do_vfs_ioctl+0x4d8/0x54c)

[49081.999572] [] (do_vfs_ioctl+0x4d8/0x54c) from [] (sys_ioctl+0x34/0x54)

[49082.007904] [] (sys_ioctl+0x34/0x54) from [] (ret_fast_syscall+0x0/0x48)

Since the back trace was showing a sys_ioctl we iterated on stack to pick up the struct file * associated with.

Here is the hit,

crash> struct file f37cbef0

struct file {

f_u = {

fu_list = {

next = 0xf37cb550,

prev = 0xf2a0def0

fu_rcuhead = {

next = 0xf37cb550,

func = 0xf2a0def0

}

f_path = {

mnt = 0xf5015e58,

dentry = 0xf3cc5d08

f_op = 0xc064dc0c,

----sniped---

f_version = 0,

private_data = 0xf2e9ca00,

f_ep_links = {

next = 0xf37cbf74,

prev = 0xf37cbf74

f_tfile_llink = {

next = 0xf37cbf7c,

prev = 0xf37cbf7c

f_mapping = 0xf4090ad0

}

f_op symbol says it is binder_fops,

crash> sym 0xc064dc0c

c064dc0c (r) binder_fops

crash>

From code(drivers/staging/android/binder.c +3012) we figured out that private_data is binder_proc

crash> struct binder_proc 0xf2e9ca00

struct binder_proc {

proc_node = {

next = 0xf3514200,

pprev = 0xc0a69c84

threads = {

rb_node = 0xf2a10c84

nodes = {

rb_node = 0xf2d1a7d0

refs_by_desc = {

rb_node = 0xf3126284

refs_by_node = {

From the analysis of the code, we understand that crash happened while traversing the rb tree.

So tried to parse the rb tree using crash,

crash> tree -t rbtree -N 0xf2d1a7d0 -p

f2d1a7d0

position: root

f2a00ad0

position: root/l

f2a90e50

position: root/r

f32c7a50

position: root/r/r

position: root/r/r/l

tree: invalid kernel virtual address: 18 type: "rb_node rb_left"

crash>

This shows that rb tress has a corrupted address to next left node.

0x10 which is the same value in r3 during crash.

And crash happened because of address 0x30.

[49081.080352] Unable to handle kernel NULL pointer dereference at virtual address 00000030

This explains why crashed.

0xc03dc4b0 : ldr r2, [r3, #32]

R3 + #32 = 0x10 + 0x20 = 0x30

Dumping the corrupted struct rb_node from memory,

crash> struct rb_node f32c7a50

struct rb_node {

rb_parent_color = 44100,

rb_right = 0x2,

rb_left = 0x10

}

crash>

This shows the rb tree is corrupted. Because rb_left and rb_right should have a valid kernel address or 0.

Dumping the memory from f32c7a00, 100 elements to inspect the memory,

crash> rd f32c7a00 100

f32c7a00: f32c79fc 00000000 00000000 00000000 .y,.............

f32c7a10: 00000000 00000000 00000000 00000000 ................

f32c7a20: 00000000 00000000 00000000 00000000 ................

f32c7a30: 00000000 00000000 00000000 00000000 ................

f32c7a40: 01bb5e6b c03eed84 f4727600 00008000 k^....>..vr.....

f32c7a50: 0000ac44 00000002 00000010 f5c10000 D...............

f32c7a60: 00010000 b7c10000 00000000 00000000 ................

f32c7a70: 411e3858 411e3840 00001fe5 00000002 X8.A@8.A........

f32c7a80: f32c7a7c 00000001 00000000 00000000 |z,.............

f32c7a90: 00000000 00000000 00000000 00000000 ................

f32c7aa0: 00000000 00000000 00000000 00000000 ................

f32c7ab0: 00000000 00000000 00000000 00000000 ................

f32c7ac0: 00000000 00000000 f32c7ac8 f32c7ac8 .........z,..z,.

f32c7ad0: 00000000 00000000 00000010 00000001 ................

f32c7ae0: 00000001 00000000 00000001 00000001 ................

f32c7af0: 00000000 00000000 00000000 f3faf330 ............0...

f32c7b00: f3a6c000 00000000 00000000 00000000 ................

f32c7b10: 00000000 00000000 00000000 00000000 ................

f32c7b20: 00000000 00000000 00000000 00000000 ................

f32c7b30: 00000000 00000000 00000000 00000000 ................

f32c7b40: 006aefa7 f32c7b44 f32c7b44 00000003 ..j.D{,.D{,.....

f32c7b50: f2bf1450 f3bef7d0 00000000 00000000 P...............

f32c7b60: f3854bdc 00000001 00000000 00000000 .K..............

f32c7b70: 54d4c670 54d3f828 00001fe5 f32c7b7c p..T(..T....|{,.

f32c7b80: f32c7b7c 00000000 00000000 00000000 |{,.............

Dumping with symbols,

crash> rd -s f32c7a00 100

f32c7a00: __crc_v4l2_norm_to_name+482177 00000000 00000000 00000000

f32c7a10: 00000000 00000000 00000000 00000000

f32c7a20: 00000000 00000000 00000000 00000000

f32c7a30: 00000000 00000000 00000000 00000000

f32c7a40: 01bb5e6b AUDIO_DRIVER_InterruptPeriodCB __crc_bmp18x_disable+1706299 00008000

f32c7a50: 0000ac44 00000002 00000010 __crc_generic_segment_checks+42732

f32c7a60: 00010000 b7c10000 00000000 00000000

f32c7a70: 411e3858 411e3840 00001fe5 00000002

f32c7a80: __crc_v4l2_norm_to_name+482305 00000001 00000000 00000000

f32c7a90: 00000000 00000000 00000000 00000000

f32c7aa0: 00000000 00000000 00000000 00000000

f32c7ab0: 00000000 00000000 00000000 00000000

f32c7ac0: 00000000 00000000 __crc_v4l2_norm_to_name+482381 __crc_v4l2_norm_to_name+482381

f32c7ad0: 00000000 00000000 00000010 00000001

f32c7ae0: 00000001 00000000 00000001 00000001

f32c7af0: 00000000 00000000 00000000 __crc_qdisc_class_hash_grow+482810

f32c7b00: __crc_rt_mutex_trylock+115850 00000000 00000000 00000000

f32c7b10: 00000000 00000000 00000000 00000000

f32c7b20: 00000000 00000000 00000000 00000000

f32c7b30: 00000000 00000000 00000000 00000000

f32c7b40: 006aefa7 __crc_v4l2_norm_to_name+482505 __crc_v4l2_norm_to_name+482505 00000003

f32c7b50: __crc_usb_get_descriptor+260561 __crc_bdi_set_max_ratio+149137 00000000 00000000

f32c7b60: __crc_snd_interval_ratnum+776842 00000001 00000000 00000000

f32c7b70: 54d4c670 54d3f828 00001fe5 __crc_v4l2_norm_to_name+482561

f32c7b80: __crc_v4l2_norm_to_name+482561 00000000 00000000 00000000

Somebody has over written from f32c7a40

This causes rb_tree corruption.

Surprisingly there is an audio function address in the corrupted memory as highlighted above.

From the kernel logs we can see audio and rpc errors just before the crash.

Eg:

[48902.141143] WARNING: at /build/REL_sdb-common-android-jb-4.2.2_428843/android_hawaii_garnet_edn010/kernel/hawaii/3.4.5/sound/caph_hawaii/caph_pcm.c:248 PcmHwFree+0x8c/0xdc()

[48902.157012] ERROR timeout waiting for STOP REQ.t=30000 ret=0

[48902.162719] Modules linked in: [last unloaded: bcmdhd_4330]

[48902.168579] [] (unwind_backtrace+0x0/0x11c) from [] (warn_slowpath_common+0x4c/0x64)

[48902.178192] [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_fmt+0x2c/0x3c)

[48902.187957] [] (warn_slowpath_fmt+0x2c/0x3c) from [] (PcmHwFree+0x8c/0xdc)

[48902.196716] [] (PcmHwFree+0x8c/0xdc) from [] (snd_pcm_release_substream+0x44/0xa4)

[48902.206512] [] (snd_pcm_release_substream+0x44/0xa4) from [] (snd_pcm_release+0x28/0x6c)

[48902.216735] [] (snd_pcm_release+0x28/0x6c) from [] (fput+0x10c/0x214)

[48902.224914] [] (fput+0x10c/0x214) from [] (filp_close+0x70/0x7c)

[48902.232879] [] (filp_close+0x70/0x7c) from [] (put_files_struct+0x90/0xec)

[48902.241668] [] (put_files_struct+0x90/0xec) from [] (do_exit+0x280/0x7e0)

[48902.250671] [] (do_exit+0x280/0x7e0) from [] (do_group_exit+0x88/0xb4)

[48902.259338] [] (do_group_exit+0x88/0xb4) from [] (get_signal_to_deliver+0x510/0x568)

[48902.269470] [] (get_signal_to_deliver+0x510/0x568) from [] (do_signal+0xc4/0x518)

[48902.278808] [] (do_signal+0xc4/0x518) from [] (do_notify_resume+0x18/0x50)

[48902.287567] [] (do_notify_resume+0x18/0x50) from [] (work_pending+0x24/0x28)

Conclusion is one possible suspect for corruption can be Audio.

Linux - Developed by hackers for hackers

Wednesday, December 11, 2013

using printascii and printhex8 for debugging during very early init

Friday, November 22, 2013

SMP bit affects L1 cache operation on Cortex-A9 MPCore

Thursday, November 21, 2013

IIT courseware

Wednesday, November 20, 2013

GNU assembler - Linker Scripts

Tuesday, November 19, 2013

Work queue: What will happen if same work is queued twice?

Friday, November 15, 2013

Ram Dump - Linux System

Sparse/unsparse images

Monday, October 28, 2013

Notes about Virtual memory and Highmem

Wednesday, October 9, 2013

git merge

Wednesday, April 17, 2013

Using I2C FIFO

Tuesday, April 16, 2013

I2C Clock Stretching

Thursday, March 21, 2013

Debuggin Oops using Crash Utility

About Me

Blog Archive