ARM Assembler
ARM processors are RISC (Reduced Instruction Set Computer) chips used widely in mobile phones and other embedded devices. The use of an ARM processor in the Raspberry Pi will promote the study of RISC processors in schools. The ARM registers and instruction sets are different from those we have described for x86 Intel processors. This section will compare ARM assembler with Intel syntax, so you should be familiar with the material in our in-line assembler tutorial. It is our intention to write assembler code for the Raspberry Pi, but because of its restricted availability we will start by testing our code in a simulator.
binutils-2.17, gcc-4.1.1-c-c++, newlib-1.14.0, insight-6.5, setup.exe [25.1MB].
A most useful reference for the code was the Tonc Whirlwind Tour of ARM Assembly (for Gameboy Advance - GBA).
You can choose how you use the registers, but you may need to preserve the contents of certain registers by pushing them before use then popping them afterwards. By convention, registers r0 to r3 and r12 may legitimately be "corrupted" by a routine. The following table shows conventional uses of registers.
Name | Alternative name | Description |
r0 - r3 | | Used to hold arguments for procedures and as scratch registers (for temporary storage). r0 is used to return the result of a function. |
r4 - r9 | v1 - v6 | General purpose or storage of variables |
r10 | sl, v7 | Stack limit pointer, used by assemblers for stack checking when this option is selected by the user. |
r11 | fp, v8 | Frame pointer. From Jack Crenshaw's section on local variables when translating procedures, "Formal parameters are addressed as positive offsets from the frame pointer, and locals as negative offsets". |
r12 | ip | Intra-Procedure-call scratch. Used with r0 - r3 for temporary storage and original contents do not need to be preserved. |
r13 | sp | Stack pointer |
r14 | lr | Link register, holding the return address from a function |
r15 | pc | Program counter, holding the address of the next instruction |
The instruction set is summarised in a quick reference card. We tabulate below selected instructions that you are most likely to use at first, together with their equivalents in Intel syntax.
ARM Mnemonic | Intel Mnemonic | Function |
ADD | ADD | Addition |
SUB | SUB | Subtraction |
RSB | | Reverse subtraction |
MUL | IMUL | Multiplication |
AND | AND | Bitwise AND |
ORR | OR | Bitwise OR |
EOR | XOR | Bitwise exclusive OR |
MVN | NOT | Bitwise NOT |
TST | TEST | Test (performs bitwise AND and sets flags according to the result) |
CMP | CMP | Compare |
B | JMP | Unconditional jump/branch |
BEQ | JE | Jump/branch if equal |
PUSH | PUSH | Push onto stack |
POP | POP | Pop from stack |
MOV | MOV | Transfer (copy) |
MOVEQ | CMOVE | Copy if equal |
ADR | LEA | Load address |
BL | CALL | Call a subroutine (branch with link: the return address is saved in lr so that the routine can return) |
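A few of the mnemonics in the table have no exact Intel counterpart or behave slightly differently. The following C sketch models what RSB, MVN and TST compute; the function names are our own, chosen for illustration, and the code models only the arithmetic, not the processor's full flag behaviour.

```c
#include <stdint.h>

/* rsb rd, rn, op2 computes op2 - rn, the operands of sub reversed.
   For example, "rsb r0, r0, #0" negates r0. */
uint32_t rsb(uint32_t rn, uint32_t op2)
{
    return op2 - rn;
}

/* mvn rd, op2 copies the bitwise NOT of op2 into rd. */
uint32_t mvn(uint32_t op2)
{
    return ~op2;
}

/* tst rn, op2 ANDs the operands and discards the result.
   This returns the Z flag: 1 when the AND is zero, otherwise 0. */
int tst_z(uint32_t rn, uint32_t op2)
{
    return (rn & op2) == 0;
}
```

Because RSB accepts an immediate as its second operand, it lets you subtract a register from a constant in one instruction, which SUB cannot do.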
The following bullet points highlight features of ARM assembly language that are strikingly different from Intel syntax.
- The result register (the first register following the mnemonic) can be different from the two operands. For example, the instruction ADD r0, r1, r2 will add r2 to r1 and store the result in r0.
- The suffix "S" added to mnemonics such as MOV makes the operation affect flags.
- A conditional suffix such as EQ (if equal), NE (if not equal), GT (if greater than), GE (if greater than or equal) can be added to most mnemonics.
- If you need to operate on data in a memory location you must load it into a register first. (ARM processors have a load/store architecture).
- For storing (mnemonic STR) the source precedes the destination.
- When loading, you can load directly from a labelled memory location (e.g. ldr r1, num1), but when storing, you must first load the destination address into a register and then use indirect addressing with square brackets (e.g. str r0, [r3]).
- There are restrictions on the immediate values that you can load directly with MOV, but you can use the syntax ldr rx, =immediate value e.g. ldr r0, =625.
- You can apply shifts to the second operand "cheaply" as part of an operation.
- Only recent ARM processors have a hardware divide instruction. On processors without one, you need to branch to a library routine instead. You can also call other library routines, such as puts for outputting a string and printf for outputting a string with formatted parameters.
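Two of the points above can be made concrete with a small C model. In ARM state, a data-processing immediate is an 8-bit constant rotated right by an even number of bits, which is why a value such as 625 cannot be loaded with MOV and needs the ldr r0, =625 form. The sketch below tests whether a value fits that encoding and also models a shifted second operand. The function names are hypothetical, and the shift amount is assumed to be less than 32.

```c
#include <stdint.h>
#include <stdbool.h>

/* Rotate a 32-bit value left by n bits (n is reduced to 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* True if v fits the ARM-state data-processing immediate form:
   an 8-bit constant rotated right by an even amount (0, 2, ..., 30). */
bool arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; the result must fit in 8 bits. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* Model of "add r0, r1, r2, lsl #shift": the shift is applied to the
   second operand as part of the same instruction. Assumes shift < 32. */
uint32_t add_lsl(uint32_t r1, uint32_t r2, unsigned shift)
{
    return r1 + (r2 << shift);
}
```

With this model, 255 and 256 are encodable but 625 (0x271) is not, because its set bits span ten bit positions. The call add_lsl(10, 3, 2) models add r0, r1, r2, lsl #2 with r1 = 10 and r2 = 3, giving 22.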
Command at Cygwin prompt | Result |
cd C:\ARM | Changes current directory to C:\ARM |
arm-elf-gcc -o temp.elf temp.s | Assembles and links temp.s to create the executable temp.elf |
arm-elf-run temp.elf | Simulates the running of temp.elf |
arm-elf-gcc -S -o div.s div.c | Compiles div.c as far as the ARM assembler div.s (showing how compiler uses registers and routines) |
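The contents of div.c are not shown here, so the following is only a hypothetical minimal version that you could use to try the last command in the table. Compiling it with -S should reveal how the compiler passes the two arguments in registers and, on a processor without a divide instruction, which library routine it branches to (the routine's name depends on the toolchain).

```c
/* div.c: hypothetical contents, not taken from the original article. */

/* Integer quotient. The generated assembler shows how a and b are
   passed and which routine the compiler calls to divide. */
int quotient(int a, int b)
{
    return a / b;
}

/* Integer remainder, included for comparison with the quotient code. */
int remainder_of(int a, int b)
{
    return a % b;
}
```

Neither function needs a main to be compiled as far as assembler, because -S stops before linking.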
scan_format: .asciz "%d"
out_format:  .asciz "Sum: %d Difference: %d Product: %d\n"
instr1:      .asciz "Enter first integer."
instr2:      .asciz "Enter second integer."
             .align 2
num1:        .word 0
num2:        .word 0

             .global main
main:
    push {ip, lr}        @Used with pop at end of main, allowing program to end.
    ldr r0, =instr1
    bl puts
    ldr r1, =num1        @r1 = address where scanf stores the integer
    ldr r0, =scan_format
    bl scanf
    ldr r0, =instr2
    bl puts
    ldr r1, =num2
    ldr r0, =scan_format
    bl scanf
    ldr r4, num1         @load value of num1 directly (simulator only)
    ldr r5, num2
    add r1, r4, r5       @r1 = sum
    sub r2, r4, r5       @r2 = difference
    mul r3, r4, r5       @r3 = product
    ldr r0, =out_format
    bl printf
    pop {ip, pc}         @Used with push at start of main, allowing program to end.
The above code would not assemble on the Raspberry Pi. The amended code below assembles and runs on the simulator and on the Raspberry Pi. Instead of loading a variable directly with ldr r4, num1, we need to load the address of the variable into another register (ldr r6, =num1) then use indirect addressing (ldr r4, [r6]). We have added .data before the data declarations and .text to mark the start of the code section.
Commands for the Raspberry Pi
You can transfer files between a PC and the Pi without networking by using either the SD card or a USB memory stick. The boot partition (mounted at /boot on the Pi) is visible in Windows, so it is useful for transferring files between a computer running Windows and the Pi using the SD card.
- cd /usr/bin changes the current directory to the one containing gcc.
- gcc -o ~/temp /boot/temp.s assembles and links the assembler file temp.s in the boot directory to create the executable temp in your home directory.
- cd changes the current directory to your home directory.
- ./temp executes the file temp in the current directory.
The screenshot below shows a way of compiling without changing the current directory. The 80 KB program scrot captures screenshots; you can install it on your Pi with the command sudo apt-get install scrot. We saved our screenshot to a USB memory stick for cropping on a PC. (We installed usbmount with the command sudo apt-get install usbmount so that the Pi detects our flash drive, mounts it at /media/usb, then unmounts it upon removal.)

Screenshot of our first Pi assembler program in action
ARM Assembler Code for the Raspberry Pi
             .data
             .align 2
scan_format: .asciz "%d"
             .align 2
out_format:  .asciz "Sum: %d Difference: %d Product: %d\n"
             .align 2
instr1:      .asciz "Enter first integer."
instr2:      .asciz "Enter second integer."
             .align 2
num1:        .word 0
num2:        .word 0

             .text
             .global main
main:
    push {ip, lr}        @Used with pop at end of main, allowing program to end.
    ldr r0, =instr1
    bl puts
    ldr r1, =num1        @r1 = address where scanf stores the integer
    ldr r0, =scan_format
    bl scanf
    ldr r0, =instr2
    bl puts
    ldr r1, =num2
    ldr r0, =scan_format
    bl scanf
    ldr r6, =num1        @load address of num1 into r6
    ldr r4, [r6]         @load value of num1 into r4
    ldr r6, =num2
    ldr r5, [r6]
    add r1, r4, r5       @r1 = sum
    sub r2, r4, r5       @r2 = difference
    mul r3, r4, r5       @r3 = product
    ldr r0, =out_format
    bl printf
    pop {ip, pc}         @Used with push at start of main, allowing program to end.
The next version (for the simulator and Pi) shows how you can save values to memory. The ARM processor is well supplied with registers for storing variables, but if you run out of registers you can load the address of a variable into a register (r3 in the code below) and then use indirect addressing to save data. Using memory locations instead of registers slows down execution.
             .data
             .align 2
scan_format: .asciz "%d"
             .align 2
out_format:  .asciz "Sum: %d Difference: %d Product: %d\n"
             .align 2
instr1:      .asciz "Enter first integer."
instr2:      .asciz "Enter second integer."
             .align 2
num1:        .word 0
num2:        .word 0
sum:         .word 0
difference:  .word 0
product:     .word 0

             .text
             .global main
main:
    push {ip, lr}
    ldr r0, =instr1
    bl puts
    ldr r1, =num1
    ldr r0, =scan_format
    bl scanf
    ldr r0, =instr2
    bl puts
    ldr r1, =num2
    ldr r0, =scan_format
    bl scanf
    ldr r6, =num1
    ldr r1, [r6]
    ldr r6, =num2
    ldr r2, [r6]
    add r0, r1, r2
    ldr r3, =sum         @store r0 in sum
    str r0, [r3]         @The source r0 precedes the destination
    sub r0, r1, r2
    ldr r3, =difference
    str r0, [r3]
    mul r0, r1, r2
    ldr r3, =product
    str r0, [r3]
    ldr r6, =sum
    ldr r1, [r6]
    ldr r6, =difference
    ldr r2, [r6]
    ldr r6, =product
    ldr r3, [r6]
    ldr r0, =out_format
    bl printf
    pop {ip, pc}
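For comparison, the arithmetic and stores in the listing above correspond roughly to the C sketch below. It is our own approximation rather than the source from which the assembler was written, and the function names compute and report are hypothetical. The three global variables play the part of the sum, difference and product words in the .data section.

```c
#include <stdio.h>

/* Results held in memory, mirroring the sum, difference and
   product words declared in the assembler listing. */
int sum, difference, product;

/* Compute the three results and store each one to memory,
   as the str instructions do in the assembler version. */
void compute(int num1, int num2)
{
    sum = num1 + num2;
    difference = num1 - num2;
    product = num1 * num2;
}

/* Reload the results from memory and print them using the same
   format string as the assembler program. */
void report(void)
{
    printf("Sum: %d Difference: %d Product: %d\n", sum, difference, product);
}
```

Compiling a file like this with the -S option is one way to see how the compiler itself chooses between registers and memory for the same calculation.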
See how ARM assembler code for the simulator is generated from TINY source code using programs TINY11ELF and TINY14ELF. These programs generate assembler code that arm-elf-gcc assembles and links to create .elf executables that run in the arm-elf-run simulator. We need to make minor modifications to them so that their output code will assemble using the gcc translator on the Raspberry Pi.
See also examples of using ARM in-line assembler in Lazarus on the Raspberry Pi.