Efficient C for ARM processors – PART I

ARM is a 32-bit RISC architecture, which means you always have to load values into its internal registers before working with them.

It also expects 32-bit data accesses to be aligned to 32-bit (4-byte) boundaries. Unaligned accesses are not necessarily forbidden on ARM, but they add extra overhead to code execution.
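Here is a small sketch of what alignment means in C terms. The helpers is_aligned4, load_u32_aligned and load_u32_any are hypothetical names used only for illustration. The first checks whether an address sits on a 4-byte boundary. The second performs a plain 32-bit load that assumes alignment. The third copies the bytes with memcpy, which is the portable way to read a value that may be misaligned, since dereferencing a misaligned uint32_t pointer is undefined behavior in C. How the compiler lowers that memcpy depends on the target core and compiler flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p sits on a 4-byte (32-bit) boundary. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Aligned 32-bit load: the caller promises that p is 4-byte aligned.
   The assert documents (and checks) that promise. */
static uint32_t load_u32_aligned(const void *p)
{
    assert(is_aligned4(p));
    return *(const uint32_t *)p;
}

/* Load from an address that may be unaligned. memcpy lets the compiler
   choose a safe instruction sequence for the target core. */
static uint32_t load_u32_any(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```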

In this first part of the series, we'll see that int variables (or any other 32-bit type) are the best choice for local variables in functions that will run on ARM processor cores.

We assume you're familiar with the C programming language and can read ARM assembly code.

As mentioned before, ARM processors have 32-bit registers and 32-bit data processing operations. Remember that you have to load values from memory into registers before working with them, because ARM is a RISC load/store architecture: there are no arithmetic or logical instructions that operate on values in memory directly.
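To make the load/store idea concrete, here is a hypothetical C function (add_in_memory is an illustrative name) that adds to a value stored in memory, written as the three separate steps the processor has to perform. The instruction names in the comments show the kind of sequence a compiler would typically emit for each step on ARM. They are illustrative, not actual compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: `*value += delta` split into the steps a
   load/store architecture needs, since ADD only works on registers. */
static void add_in_memory(int *value, int delta)
{
    assert(value != NULL);
    int reg = *value;   /* LDR: load the operand from memory into a register */
    reg = reg + delta;  /* ADD: arithmetic happens between registers only    */
    *value = reg;       /* STR: write the result back to memory              */
}
```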

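ARMv4-based cores can load and store 8-, 16- and 32-bit data efficiently; however, most data processing operations are 32-bit only. This is why you should avoid using char and short as local variable types: keeping an 8- or 16-bit value correct inside a 32-bit register can cost extra instructions.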
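The reasoning can be sketched in C. The C standard requires an unsigned char counter to wrap from 255 back to 0, so a compiler that keeps it in a 32-bit register and cannot prove it stays below 256 has to add a masking step after each increment (on ARM, typically an AND with 0xff). The function names below are hypothetical: inc_u8_in_reg spells out those register operations, and check_equivalence verifies that they match the 8-bit C semantics for every value. The sketch uses unsigned char so that the wraparound is well defined.

```c
#include <assert.h>

/* A char counter must behave like an 8-bit value: 255 + 1 wraps to 0. */
static unsigned char inc_u8(unsigned char c)
{
    return (unsigned char)(c + 1);
}

/* The same increment as 32-bit register operations, needed when the
   compiler cannot prove the counter stays below 256. */
static unsigned int inc_u8_in_reg(unsigned int reg)
{
    reg = reg + 1;      /* add rX, rX, #1                            */
    reg = reg & 0xFFu;  /* and rX, rX, #0xff  <- the extra instruction */
    return reg;
}

/* A 32-bit counter already matches the register width: just the add. */
static unsigned int inc_u32(unsigned int reg)
{
    return reg + 1;     /* add rX, rX, #1 */
}

/* Checks that the masked 32-bit version matches C's 8-bit semantics
   for every possible unsigned char value. */
static void check_equivalence(void)
{
    for (unsigned int v = 0; v < 256; v++)
        assert(inc_u8((unsigned char)v) == inc_u8_in_reg(v));
}
```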

To illustrate this behavior, let's look at the output of arm-linux-gcc for both cases (char and int as local variables):

 
short main (int *data)
{
   char    idx;                  /* 8-bit loop counter */
   int     sum = 0;
   for (idx = 0; idx < 64; idx++) {
      sum += data[idx];
   }
   return sum;                   /* 32-bit sum narrowed to a 16-bit short */
}

This compiles to:

 
000081c0 main:
    81c0:	e1a01000 	mov	r1, r0
    81c4:	e3a00000 	mov	r0, #0	; 0x0
    81c8:	e1a02000 	mov	r2, r0
    81cc:	e7923001 	ldr	r3, [r2, r1]
    81d0:	e0800003 	add	r0, r0, r3
    81d4:	e2822004 	add	r2, r2, #4	; 0x4
    81d8:	e3520c01 	cmp	r2, #256	; 0x100
    81dc:	1afffffa 	bne	81cc <main+0xc>
    81e0:	e1a00800 	mov	r0, r0, lsl #16
    81e4:	e1a00840 	mov	r0, r0, asr #16
    81e8:	e12fff1e 	bx	lr

Now compare it with a version that uses an unsigned int counter and returns an int:

int main (short *data)
{
   unsigned int    idx;          /* 32-bit loop counter */
   int             sum = 0;
   for (idx = 0; idx < 64; idx++) {
      sum += *(data++);          /* ldrsh with post-increment */
   }
   return sum;                   /* int result, no narrowing needed */
}

This compiles to:

 
000081c0 main:
    81c0:	e1a01000 	mov	r1, r0
    81c4:	e3a00000 	mov	r0, #0	; 0x0
    81c8:	e1a02000 	mov	r2, r0
    81cc:	e0d130f2 	ldrsh	r3, [r1], #2
    81d0:	e0800003 	add	r0, r0, r3
    81d4:	e2822001 	add	r2, r2, #1	; 0x1
    81d8:	e3520040 	cmp	r2, #64	; 0x40
    81dc:	1afffffa 	bne	81cc <main+0xc>
    81e0:	e12fff1e 	bx	lr

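Just by looking at the assembly output, it's safe to say the second version is faster, if only slightly: both loops are five instructions long, but the first version needs two extra mov instructions after the loop (lsl #16 followed by asr #16) to convert the 32-bit int sum into the 16-bit short return value. Note that in this particular output the compiler didn't keep the char counter at all: it replaced idx with a 32-bit byte offset in r2 (stepping by 4 up to 256), because it could prove that idx never exceeds 64. The narrowing cost shows up on the short return type instead.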
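The two extra instructions can be expressed in C. Shifting left by 16 and then arithmetically right by 16 keeps the low 16 bits of the sum and copies bit 15 into the upper half. That is what converting an int to a 16-bit short does with GCC (converting an out-of-range value to short is implementation-defined in C; GCC reduces it modulo 2^16). The hypothetical narrow_to_short below computes the same result with portable masking arithmetic instead of relying on implementation-defined shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Reproduces the effect of
 *     mov r0, r0, lsl #16
 *     mov r0, r0, asr #16
 * on a 32-bit value: keep bits 0..15, then sign-extend bit 15. */
static int32_t narrow_to_short(int32_t x)
{
    int32_t low = x & 0xFFFF;                 /* keep bits 0..15 (0..65535)      */
    int32_t result = (low ^ 0x8000) - 0x8000; /* map 32768..65535 to -32768..-1 */
    assert(result >= INT16_MIN && result <= INT16_MAX);
    return result;
}
```

For example, a sum of 40000 comes back from the first version as -25536, another reason to keep the accumulator and the return type at 32 bits.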
