ARM is a 32-bit RISC architecture, which means you always have to load values into its internal registers before working with them.
It also expects data accesses to be 32-bit aligned. Unaligned accesses aren't forbidden on ARM, but they add overhead to code execution.
In this first part of the series, we'll see that int variables (or any other 32-bit type) are the best choice for local variables in functions that will run on ARM processor cores.
We assume you're familiar with the C programming language and can read ARM assembly code.
As mentioned above, ARM processors have 32-bit registers and 32-bit data processing operations. Remember that you have to load values from memory into registers before working with them, because ARM is a RISC load/store architecture: there are no arithmetic or logical instructions that operate on values in memory directly.
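To make the load/store idea concrete, here is a minimal sketch in C of what "increment a value in memory" turns into on such a machine. The function name increment_in_memory is made up for illustration, and the instruction names in the comments are only the typical mapping, not the output of any particular compiler.

```c
/* Hypothetical helper: incrementing a value that lives in memory.
   On a load/store architecture this is three separate steps. */
void increment_in_memory(int *p)
{
    int reg = *p;    /* load:  memory -> register (typically an LDR) */
    reg = reg + 1;   /* add:   works on the register only (ADD)      */
    *p = reg;        /* store: register -> memory (typically an STR) */
}
```

Written as `(*p)++` the source looks like one operation, but the processor still has to do the same three steps.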
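ARMv4-based cores can load and store 8-, 16- and 32-bit data efficiently; however, most data processing operations are 32-bit only. A char or short kept in a 32-bit register therefore has to be brought back into its declared range with extra instructions after arithmetic. This is why you should avoid using char and short as local variable types.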
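A rough way to picture the cost is to model the register as a 32-bit value and write out what has to happen to keep an 8-bit local correct. The two helpers below, inc_as_u8 and inc_as_u32, are hypothetical names used only for this sketch, and the masking step is what a compiler typically has to emit when it cannot prove the value stays in range.

```c
#include <stdint.h>

/* Hypothetical model of incrementing an unsigned char local that is
   held in a 32-bit register: add, then mask back down to 8 bits. */
uint32_t inc_as_u8(uint32_t reg)
{
    return (reg + 1u) & 0xFFu;   /* the extra AND an int local never needs */
}

/* With a 32-bit local, the add is the whole operation. */
uint32_t inc_as_u32(uint32_t reg)
{
    return reg + 1u;
}
```

For example, inc_as_u8(255) wraps around to 0 the way an unsigned char must, while inc_as_u32(255) simply gives 256 with no extra work.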
To illustrate this behavior, we're going to look at the output of arm-linux-gcc for two versions of a summing loop. The first one uses narrow types, a char loop index and a short return value:
short main (int *data)
{
    char idx;
    int sum = 0;

    for (idx = 0; idx < 64; idx++) {
        sum += data[idx];
    }
    return sum;
}
This compiles to:
000081c0 <main>:
81c0: e1a01000 mov r1, r0
81c4: e3a00000 mov r0, #0 ; 0x0
81c8: e1a02000 mov r2, r0
81cc: e7923001 ldr r3, [r2, r1]
81d0: e0800003 add r0, r0, r3
81d4: e2822004 add r2, r2, #4 ; 0x4
81d8: e3520c01 cmp r2, #256 ; 0x100
81dc: 1afffffa bne 81cc <main+0xc>
81e0: e1a00800 mov r0, r0, lsl #16
81e4: e1a00840 mov r0, r0, asr #16
81e8: e12fff1e bx lr
The second version uses 32-bit types for both the loop index and the return value:
int main (short *data)
{
    unsigned int idx;
    int sum = 0;

    for (idx = 0; idx < 64; idx++) {
        sum += *(data++);
    }
    return sum;
}
This compiles to:
000081c0 <main>:
81c0: e1a01000 mov r1, r0
81c4: e3a00000 mov r0, #0 ; 0x0
81c8: e1a02000 mov r2, r0
81cc: e0d130f2 ldrsh r3, [r1], #2
81d0: e0800003 add r0, r0, r3
81d4: e2822001 add r2, r2, #1 ; 0x1
81d8: e3520040 cmp r2, #64 ; 0x40
81dc: 1afffffa bne 81cc <main+0xc>
81e0: e12fff1e bx lr
Just by looking at the assembly output, it's safe to say the second version is faster. Both loops are five instructions long, but the first version ends with two extra mov instructions (lsl #16 followed by asr #16) that narrow the int sum to the short return type.
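If the shift pair looks cryptic, here is a small sketch of what it computes, written in portable C. The name narrow_to_short is hypothetical; the function just models "keep the low 16 bits, then sign-extend them back to 32 bits", which is the net effect of lsl #16 followed by asr #16.

```c
#include <stdint.h>

/* Hypothetical model of the two extra instructions:
     mov r0, r0, lsl #16   ; throw away the upper 16 bits
     mov r0, r0, asr #16   ; sign-extend the remaining 16 bits
   Written without shifts on signed values to stay portable. */
int32_t narrow_to_short(int32_t sum)
{
    uint32_t low = (uint32_t)sum & 0xFFFFu;   /* low 16 bits only */

    if (low & 0x8000u) {
        /* bit 15 set: the 16-bit value is negative */
        return (int32_t)low - 0x10000;
    }
    return (int32_t)low;
}
```

A sum that fits in 16 bits, such as 2016, comes back unchanged, while 40000 comes back as -25536. The function returning int skips this step entirely, because the sum is already in the width the register holds.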