Learn Multi platform ARM Assembly Programming... For the Future!

We've covered a wide variety of chips in these tutorials, but now it's time to look at one of the most modern... the ARM!

Powering everything from the Gameboy Advance to the Nintendo Switch and iPhone, the ARM is NOT a dated 'Classic' CPU... it's not even a system with 8 bit 'legacy roots' like the 8086...

The ARM is a 32 bit CPU from the ground up. Designed around RISC principles and bytecode structure, it's highly optimized for low power situations - the ARM is widely believed to be the future of computing.

In this series, we'll take a look at the ARM CPU on the GBA and as always, learn about the CPU from the ground up!

If you want to learn ARM get the Cheatsheet! It covers all the ARM7 commands, options like Bitshifts and conditions, and the bytecode structure of the commands!

We'll be using the excellent VASM for our assembly in these tutorials... VASM is an assembler which supports Z80, 6502, 68000, ARM and many more, and also supports multiple syntax schemes...

You can get the source and documentation for VASM from the official VASM website.

Generations and Early uses:

CPU	Instruction set	System
ARM2	ARMv2	Acorn Archimedes
ARM60	ARMv3	3DO (12 MHz)
ARM7TDMI	ARMv4T	GBA (16.78 MHz)

Useful Documents:

ARM Developer Suite Assembler Guide Version 1.2

Early ARM manual

ARM7TDMI - Technical Reference Manual

Platforms covered in these tutorials
Gameboy Advance
Risc OS

ChibiAkumas Tutorials

What is the ARM and what are 32 'bits'?
You can skip this if you know about binary and Hex (This is a copy of the same section in the Z80 tutorial)
The ARM is a 32-Bit processor with a 32 bit Address bus!...
What's a bit... well, one 'Bit' can be 1 or 0
four bits make a Nibble (0-15)
two nibbles (8 bits) make a byte (0-255)
two bytes (16 bits) make a word (0-65535)

And what is 65535? Well, counting 0 that's 65536 different values, which is 64 kilobytes... in computers Kilo is 1024, because 2^10 = 1024

With the ARM we actually have some serious memory resources available to us, both in RAM and ROM!

If you're looking to develop serious games or software, you probably want to use C++, but looking at assembly lets us see how the hardware really works, and that's the point of these tutorials!

Numbers in Assembly can be represented in different ways.
A 'Nibble' (half a byte) can be represented as Binary (0000-1111) , Decimal (0-15) or Hexadecimal (0-F)... unfortunately, you'll need to learn all three for programming!

Also a letter can be a number... Capital 'A' is stored in the computer as number 65!

Think of Hexadecimal as being the number system invented by someone with 16 fingers, ABCDEF are just numbers above 9!
Binary is just the same, it only has 1 and 0.

In this section, Binary will be shown with a % symbol... eg %11001100 ... hexadecimal will be shown with & eg.. &FF.

Assemblers use a symbol to denote a hexadecimal number, some use $FF or #FF or even 0x. This section uses & because it was written for the Z80 tutorial, and that is how hexadecimal is represented in CPC basic... the ARM code in the rest of this guide is written for VASM, which uses 0x for hexadecimal and 0b for binary.
But remember, whatever assembler you use, while the text based source code may need to be slightly different, the compiled 'BYTES' will be the same!

Decimal	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	...	255
Binary	0000	0001	0010	0011	0100	0101	0110	0111	1000	1001	1010	1011	1100	1101	1110	1111		11111111
Hexadecimal	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F		FF

Another way to think of binary is to think what each digit is 'Worth'... each digit in a number has its own value... let's take a look at %11001100 in detail and add up its total

Bit position	7	6	5	4	3	2	1	0
Digit Value (D)	128	64	32	16	8	4	2	1
Our number (N)	1	1	0	0	1	1	0	0
D x N	128	64	0	0	8	4	0	0
128+64+8+4= 204 So %11001100 = 204 !
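
The same sum can be checked with a few lines of Python. Python is only being used as a calculator here (it isn't part of the tutorial's toolchain), and the function name bits_to_value is made up for this sketch:

```python
def bits_to_value(bits: str) -> int:
    """Add up the 'worth' of every 1 digit in a binary string."""
    total = 0
    # Walk from the rightmost digit (bit 0) to the leftmost
    for position, digit in enumerate(reversed(bits)):
        if digit == "1":
            total += 2 ** position  # bit 0 is worth 1, bit 7 is worth 128
    return total


print(bits_to_value("11001100"))  # 128+64+8+4 = 204
print(int("11001100", 2))         # Python's built-in conversion agrees
```

Short numbers work too: bits_to_value("11") gives 3, the same as %00000011.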

If a binary number is small, it may be shown as %11 ... this is the same as %00000011
Also notice in the chart above, each bit has a number, the bit on the far right is number 0, and the far left is number 7... don't worry about it now, but you will need it one day!

If you ever get confused, look at Windows Calculator, Switch to 'Programmer Mode' and it has binary and Hexadecimal view, so you can change numbers from one form to another!
If you're an Excel fan, look up the functions DEC2BIN and DEC2HEX... Excel has all the commands you need to convert one thing to the other!

But wait! I said a Byte could go from 0-255 before, well what happens if you add 1 to 255? Well it overflows, and goes back to 0!... The same happens if we add 2 to 254... if we add 2 to 255, we will end up with 1
This is actually useful, as if we want to subtract a number, we can use this to work out what number to add to get the effect we want

Negative number	-1	-2	-3	-5	-10	-20	-50	-254	-255
Equivalent Byte value	255	254	253	251	246	236	206	2	1
Equivalent Hex Byte Value	FF	FE	FD	FB	F6	EC	CE	2	1
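
The table can be reproduced by masking a number down to 8 bits. This is just a Python sketch of the wrap-around described above, and as_byte is a name invented for the example:

```python
def as_byte(value: int) -> int:
    """Wrap any integer into the 0-255 range of a single byte."""
    return value & 0xFF


# Print each negative number with its equivalent byte in decimal and hex
for negative in (-1, -2, -3, -5, -10, -20, -50, -254, -255):
    byte = as_byte(negative)
    print(negative, byte, format(byte, "02X"))

# Adding the 'equivalent byte' really does subtract: 10 + 254 wraps round to 8
print(as_byte(10 + as_byte(-2)))
```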

All these number types can be confusing, but don't worry! Your Assembler will do the work for you!
You can type %11111111 , &FF , 255 or -1 ... but the assembler knows these are all the same thing! Type whatever you prefer in your code and the assembler will work out what that means and put the right data in the compiled code!

The ARM-32 Registers

For the purposes of the normal programmer in "User Mode" the ARM has 16 registers. R0-R12 are free for us to do whatever we want, R13 is the Stack Pointer (also addressable as SP), R14 is the Link Register (LR) and R15 is the Program Counter (PC)

R14 may be surprising to those familiar with other CPUs, when we call a subroutine (With BL - Branch and Link) the return address is not pushed onto the stack, instead it's moved into R14/LR... to return from the subroutine we need to move the R14/LR register into R15/PC.

This poses a problem, as nesting subroutines will lose the return address. If nesting is needed, the best solution is to simply push R14/LR onto the stack at the start of a sub, and pop it back into PC/R15 at the end.
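
To see why the nested call loses its way back, here is a toy Python model of a main program calling 'outer', which itself calls 'inner'. It is only a sketch: the addresses 0x1004 and 0x2008 and the routine names are made up, and nothing here is real ARM code.

```python
def outer_returns_to(save_lr: bool) -> int:
    """Toy model of 'main' calling 'outer', which itself calls 'inner'."""
    stack = []
    lr = 0x1004              # BL outer: main's return address goes into LR
    if save_lr:
        stack.append(lr)     # STMFD sp!,{lr} at the start of outer
    lr = 0x2008              # BL inner: LR is overwritten with a new address
    pc = lr                  # inner ends with MOV pc,lr, back inside outer
    if save_lr:
        pc = stack.pop()     # LDMFD sp!,{pc} at the end of outer
    else:
        pc = lr              # MOV pc,lr, but LR no longer points into main
    return pc


print(hex(outer_returns_to(save_lr=True)))   # 0x1004, back in main
print(hex(outer_returns_to(save_lr=False)))  # 0x2008, stuck inside outer
```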

On ARMv2 chips like the ARM2, the flags were stored in unused bits of the PC Register (The top 6, and bottom 2 bits). From ARMv3 onwards a separate register called the CPSR is the main flags register.

The ARM technically has 27 general purpose 32 bit registers, but all but 16 are hidden from the user...

There are also SPSR registers, which hold the saved flags during interrupts. You'll not need to worry about these.

Main Registers:

	32 Bit Registers	Use cases
R0	R0
R1	R1
R2	R2
R3	R3
R4	R4
R5	R5
R6	R6
R7	R7
R8	R8
R9	R9
R10	R10
R11	R11 / FP	Frame Pointer (Optional)
R12	R12 / IP	Intra Procedural Call (Optional)
R13	SP	Stack Pointer
R14	LR / LK	Link Register / R15 Save Area
R15	PC	System Program Counter

Added in ARMv3:

	32 Bit Registers	Use Cases
CPSR	CPSR	Processor Status

Special registers for protected modes:

R13/14 have alternative versions and there is a SPSR for each of the IRQ, SVC, UNDEF and ABORT modes
FIQ mode has alternate R8-R14 and SPSR

A Frame pointer points to data areas in the Stack used by the function, allowing for relative offsets... it's entirely optional if you really use R11 for this or not.

The Intra Procedural Call register can be used as a backup of LR for subroutines

PC Flags: NZCVIF------------------------MM

CPSR Flags: NZCV--------------------IFTMMMMM

	Name	Meaning
N	Negative	Signed Less Than
Z	Zero	Zero
C	Carry	Carry / Not Borrow / Rotate Extend
V	oVerflow	Overflow
I	IRQ Disable	1=disable
F	FIQ disable	1=disable
T	Thumb mode	V4 only
M	Mode	00=user 01=FIQ 10=IRQ 11=Supervisor

Getting and Setting Flags:

	ARMv2	ARMv3+
Backup Flags to R0	MOV R0, R15	MRS R0, CPSR
Restore Flags from R0	TEQP R0, #0	MSR CPSR, R0

Because the ARM loads instructions in advance, R15 is always 2 instructions (8 bytes) ahead of the current running command

Number Representation

Decimal	#1234
Hexadecimal	#0x12EF
Binary	#0b11110000

Equivalent commands

Z80 command	Description	ARM command
CALL (no nesting)	Jump to subroutine	BL label
JP	Jump to label	B label
RET (no nesting)	Return from linked branch	MOV pc,lr
CALL - start Sub (allows nesting)	Start of sub (after BL)	STMFD sp!,{lr}
RET - end Sub (allows nesting)	End of sub (RET)	LDMFD sp!,{pc}
DEC r1	Decrement r1 and set flags	SUBS r1,r1,#1
Push r0	Put r0 onto the stack	str r0, [sp, #-4]!
Pop r0	take r0 off the stack	ldr r0, [sp], #4
Push all	Push all+ return address	STMFD sp!,{r0-r12, lr}
Pop all.. RET	Pop all + return	LDMFD sp!,{r0-r12, pc}
LDIR	r12=src r13=dest r14=bytecount+src	loop: LDMIA r12!, {r0-r11} STMIA r13!, {r0-r11} CMP r12, r14 BNE loop

Loading Registers
Unlike most systems, it is not possible to directly load a 32 bit register from an immediate value. We must either load from a relative address, or merge multiple values together.

If we're merging values together, we can specify a 16 bit Immediate (Though the assembler actually converts it to a MOV and an ORR), then use ORR with rotated 8 bit values to add the other two bytes in, Eg:
    mov r0,   #0x0000FFFF ;Can't load 32 bits directly - GRR!
    orr r0,r0,#0x00FF0000
    orr r0,r0,#0x12000000

Using rotation, we can specify 8 bits, and a rotation of 0-15 (each moves 2 bits)... allowing us to control the following bits:

Result	Bitshift
. . . . . . . . . . . . . . . . . . . . . . . . 76543210	0
10. . . . . . . . . . . . . . . . . . . . . . . . 765432	1
3210. . . . . . . . . . . . . . . . . . . . . . . . 7654	2
543210. . . . . . . . . . . . . . . . . . . . . . . . 76	3
76543210. . . . . . . . . . . . . . . . . . . . . . . .	4
. . 76543210. . . . . . . . . . . . . . . . . . . . . .	5
. . . . 76543210. . . . . . . . . . . . . . . . . . . .	6
. . . . . . 76543210. . . . . . . . . . . . . . . . . .	7
. . . . . . . . 76543210. . . . . . . . . . . . . . . .	8
. . . . . . . . . . 76543210. . . . . . . . . . . . . .	9
. . . . . . . . . . . . 76543210. . . . . . . . . . . .	10
. . . . . . . . . . . . . . 76543210. . . . . . . . . .	11
. . . . . . . . . . . . . . . . 76543210. . . . . . . .	12
. . . . . . . . . . . . . . . . . . 76543210. . . . . .	13
. . . . . . . . . . . . . . . . . . . . 76543210. . . .	14
. . . . . . . . . . . . . . . . . . . . . . 76543210. .	15
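
The chart can be turned into a quick test of whether a value fits in a single instruction. The Python below is a sketch of the rule described above (8 bits, rotated right by an even amount); the function names are invented for the example, and real assemblers may pick a different but equivalent rotation.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32 bit value right by 'amount' bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value: int):
    """Return (imm8, rot) so that ror32(imm8, rot * 2) == value, or None."""
    for rot in range(16):
        # Rotating right by 32 - rot*2 is a rotate LEFT by rot*2,
        # which undoes the rotate right the CPU will apply
        imm8 = ror32(value, 32 - rot * 2)
        if imm8 <= 0xFF:
            return imm8, rot
    return None  # needs more than one instruction


for value in (0xFF, 0xFF000000, 0x00FF0000, 0x0000FFFF, 0x12000000):
    print(hex(value), encode_immediate(value))
```

0x0000FFFF comes back as None because it needs 16 bits, which is why the example above builds the value with a MOV and extra ORR commands.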

Rotations
For normal commands, rotations are defined by 5 bits, allowing a shift from 1-31

LSL	Logical Shift Left
LSR	Logical Shift Right
ASR	Arithmetic Shift Right
ROR	Rotate Right
RRX	Rotate Right with eXtend (1 bit only - opcode is ROR #0)

Data Definitions

Bytes	Z80	68000	8086	ARM
1	DB	DC.B	DB	.BYTE
2	DW	DC.W	DW	.WORD
4		DC.L	DD	.LONG
n	DS n,x	DS n,x	n DUP (x)	.SPACE n,xx

Addressing Modes

Param	Mode	Format	Details	Example
Op2	Immediate	#n	Fixed value of n Can be any value made by an 8 bit immediate shifted by an even number of bits, eg 0xFF or 0xFF000000 are OK.	ADD R0,R0,#1
Op2	Register	Rn	value in register Rn	ADD R0,R0,R1
Op2	Register Shifted by Immediate	Rn, shft #n	Shift Register Rn by #n using shifter shft Options: LSL #n, LSR #n, ASR #n, ROR #n, RRX note: RRX can only shift 1 bit	MOV R0,R1,ROR #2
Op2	Register Shifted by Register	Rn, shft Rm	Shift Register by Rm using shifter shft Options: LSL Rm, LSR Rm, ASR Rm, ROR Rm	MOV R0,R1,ROR R2
Flex	Immediate offset Immediate pre-indexed	[Rn,#n] [Rn,#n]!	value from address in register Rn+n ! means Preindexed, set Rn=Rn+n	LDR R0,[R1] ;#n=0 LDR R0,[R1,#4] LDR R0,[R1,#-4]!
Flex	Register offset Register pre-indexed	[Rn,{-}Rm] [Rn,{-}Rm]!	value from address in register Rn+Rm ! means Preindexed, set Rn=Rn+Rm	LDR R0,[R1,-R2] LDR R0,[R1,R2]!
Flex	Scaled register offset Scaled register pre-indexed	[Rn, Rm,shft #n] [Rn, Rm,shft #n]!	value from address in register plus shifted register, if LSL then Rn+(Rm<<n) ! means Preindexed, set Rn to that address	LDR R0,[R1,R2, LSL #2] LDR R0,[R1,R2, LSL #2]!
Flex	Immediate post-indexed	[Rn],#n	value from address in register Rn... set Rn=Rn+n (No need for ! - as it's the only purpose of the command!)	LDR R0,[R1],#4
Flex	Register post-indexed	[Rn], {-}Rm	value from address in register Rn... set Rn=Rn+Rm (No need for ! - as it's the only purpose of the command!)	LDR R0,[R1],R2 LDR R0,[R1],-R2
Flex	Scaled register post-indexed	[Rn], {-}Rm, shft #n	Shift Register by #n using shifter shft Options: LSL #n, LSR #n, ASR #n, ROR #n, RRX	LDR R0,[R1],R2,LSL #2 LDR R0,[R1],-R2,RRX

All addressing modes are available for the main commands, but others are more limited.

Command format

Command Dest, Source, Param, Shifts

Command {COND}{B}{S} Dest, rd, [rs,off]{!}
B= byte transfer
!= update reg Rs
S= update conditional flags

Halfword and signed byte LDR / STR have four possible forms:
• zero offset
• pre-indexed offset
• program-relative
• post-indexed offset.
The syntax of the four forms, in the same order, are:
op{cond}type Rd, [Rn]
op{cond}type Rd, [Rn, Offset]{!}
op{cond}type Rd, label
op{cond}type Rd, [Rn], Offset
where:
op is either LDR or STR .
cond is an optional condition code
type must be one of:
SH for Signed Halfword ( LDR only)
H for unsigned Halfword
SB for Signed Byte ( LDR only).
Rd is the ARM register to load or save.
Rn is the register on which the memory address is based.
Rn must not be the same as Rd, if the instruction is either:
• pre-indexed with writeback
• post-indexed.

label is a program-relative expression. label must be within 255 bytes of the current instruction.
Offset is an offset applied to the value in Rn
! is an optional suffix. If ! is present, the address including the offset is written back into Rn . You cannot use the ! suffix if Rn is r15.

Zero offset
The value in Rn is used as the address for the transfer.

Pre-indexed offset
The offset is applied to the value in Rn before the transfer takes place. The result is used as the memory address for the transfer. If the ! suffix is used, the result is written back into Rn .

Program-relative
This is an alternative version of the pre-indexed form. The assembler calculates the offset from the PC for you, and generates a pre-indexed instruction with the PC as Rn. You cannot use the ! suffix.

Post-indexed offset
The value in Rn is used as the memory address for the transfer. The offset is applied to the value in Rn after the transfer takes place. The result is written back into Rn.

Offset syntax
Both pre-indexed and post-indexed offsets can be either of the following:
#expr {-}Rm
where:
- is an optional minus sign. If - is present, the offset is subtracted from Rn . Otherwise, the offset is added to Rn .
expr is an expression evaluating to an integer in the range -255 to +255. This is often a numeric constant.
Rm is a register containing a value to be used as the offset.
The offset syntax is the same as for LDR and STR doublewords.

Architectures
These instructions are available in ARM architecture v4 and above.
Examples
LDREQSH r11,[r6] ; (conditionally) loads r11 with a 16-bit halfword from the address in r6. Sign extends to 32 bits.
LDRH r1,[r0,#22] ; load r1 with a 16 bit halfword from 22 bytes above the address in r0. Zero extend to 32 bits.
STRH r4,[r0,r1]! ; store the least significant halfword from r4 to two bytes at an address equal to contents(r0) plus contents(r1). Write address back into r0.
LDRSB r6,constf ; load a byte located at label constf. Sign extend.
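
The difference between 'zero extend' (LDRH, LDRB) and 'sign extend' (LDRSH, LDRSB) in the examples above can be sketched in Python. This only models what ends up in the 32 bit register; the function names are made up for the example.

```python
def zero_extend(value: int, bits: int) -> int:
    """Keep the low 'bits' bits, the rest of the register becomes 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Copy the top bit of a 'bits' wide value all the way up to bit 31."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):          # top bit set = negative number
        value |= 0xFFFFFFFF & ~low_mask    # fill the upper bits with 1s
    return value


print(hex(zero_extend(0xFFFE, 16)))  # like LDRH
print(hex(sign_extend(0xFFFE, 16)))  # like LDRSH
print(hex(sign_extend(0x7F, 8)))     # like LDRSB with a positive byte
print(hex(sign_extend(0x80, 8)))     # like LDRSB with a negative byte
```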

Lesson 1 - Getting started with ARM
Let's start looking at some simple commands, and get the hang of the ARM registers!

These tutorials will use VASM to build... RPCEmu to run compiled code, and we'll use a simple monitor... you can download all the tools in the links to the right

There's also a video of this lesson.

Our Compiler and emulator

We're going to be using VASM as an assembler, it's a free assembler which works on Windows, OSX and Linux
My Devtools provide a batch file which will build the programs for you, but if you don't want to use them, the format of the build script is shown below:

-Fbin ... Specifies to create a Binary file
-Dxxx=Y ... Specifies to define a symbol xxx=y (we'll learn about symbols later).
-L ... Specifies a Listing file - this shows source code and resulting bytes... it's used for debugging if we have problems
-o ... Specifies the output file.
%BuildFile%... this would be the sourcefile you want to compile... Eg: Lesson1.asm
-m7tdmi... (or equivalent) specifies the ARM architecture we're building for.
-chklabels -nocase ... Disable case sensitivity, and check for lines where we've forgotten a tab on a command (it will be mistaken for a label)

Once we've successfully compiled our program, we can run it with VisualBoyAdvance

We'll also use RiscOS, but setting this up is more complex if you're doing it yourself.

A template program

To allow us to get started programming quickly and see the results, we'll be using a 'template program'...
This consists of 3 parts:

A Generic Header - this will set up the screen and a few parameters we'll need to start.

The Program - this is the body of our program where we do our work.

A Generic Footer - this gives us some support tools, and includes a common bitmap font.

This template program will compile on any of the systems in these tutorials (RiscOS and the GameboyAdvance!)

There's a lot of complex scary stuff in the include files - don't panic about it for now, you'll be able to understand it more later once you've covered all the lessons.

Commands, Labels and Calls

Let's take a look at a simple program!...

The first line is a command 'BL' (Branch and Link)... this is the same as CALL or JSR on other systems... it runs the subroutine labeled 'DoMonitor' - when that subroutine finishes (when register LR is transferred to PC) the program will carry on with the line after the BL call... notice the command starts indented *this is required for commands*

The next line is not indented and ends with a colon : - that makes it a label called 'infloop'... labels tell the assembler to 'name' this position in the program - the assembler will convert the label to a byte number in the executable... thanks to the assembler we don't need to worry what number that ends up being...

Finally we have the command 'B' (Branch)... this is a jump! Unlike BL (Branch and Link), it never returns... notice we're Branching to the label we just defined on the line before... this makes the program run infinitely... a crude way to end our program so we can see the result!

You'll also notice text in green starting with a Semicolon ; - this is a comment (REMark) - comments have no effect on the code

Subroutines and returns

Let's look at another subroutine.

This one starts with a label 'GetNextLine'... we know it's a label because it's not indented and ends in a colon... this is the name of the subroutine - we'll see the name with BL (Branch and Link) statements (calls on the ARM).

Then there is an ADD Command... it adds 160 to r10 (R10=R10+160)... it is indented, so it is clearly a command...

Finally there is a MOV PC,LR command - this ends a subroutine... BL transfers PC (the program counter... the current running byte) to LR - transferring LR back to PC returns to the command after the BL command

If our code has a MOV PC,LR at the end (the ARM's RET) - it's a subroutine and should probably be started with a BL (the ARM's CALL)... if we start it with a B (a jump) something bad will probably happen!

ARM calls are very weird... CALL is called BL - and rather than push the PC (Reg R15 - Program counter) onto the stack, it moves it to LR (reg R14)
We can also return by popping a previously pushed LR back into PC

Don't worry if you don't understand this yet - this info is just for those familiar with other CPUs- we'll cover it in more detail soon!

The power of ARM, and the limitations of RISC!

It's time to start loading data into 'registers'... Registers are the small bits of memory in the processor we use to store values we want to perform calculations on... The ARM has 16 registers R0-R15, but many have special purposes - when doing mathematical operations we need to limit our use to R0-R12.
We load a value into a register with a MOVe command... the destination register is on the left of the comma - the value is on the right (starting with a #). 0x defines the value as hexadecimal.
All the registers are 32 bit, but due to the limitations of the instruction set, only 4 consecutive digits of the 8 hexadecimal digits can be nonzero - we'll learn more about this later.
We can see here the two registers have been set.
Because of the RISC limitation this command will not compile - it has 5 digits that are nonzero.

It may seem weird we can't set all 8 hexadecimal digits (4 bytes) of a register in one go, but there are ways around this!

it all comes down to the way the instructions turn into bytes - each instruction is 4 bytes - and there's only enough 'space' in the MOV command to set 2 bytes of the register value

Hex,Dec,Binary and Asc Oh my!... also Adding and Subtracting.

We can load hexadecimal values (Base 16 - 0123456789ABCDEF) into registers by starting the value with #0x.... if we want to use binary (Base 2 - 1's and 0s) we start the value with #0b... Decimal values just start with #... unfortunately it seems VASM doesn't allow ascii characters as immediate values (They can be stored in BYTE string data, but not here)
Here's the result!
We've looked at loading numbers into registers, but MOV can also move one register into another.. In this example R2 will be moved into R0.... the destination is on the left of the comma, the source is on the right.
R0 will be set to the value that was in R2
Of course, we don't just have a MOVe command - we also have ADD and SUB for addition and subtraction! The destination is the first parameter, the two values to be added are the second and third... for example: add r1,r0,#0x00000001 could be thought of as: R1=R0+0x00000001
If we just want to change a value, the first and second parameters can be the same, for example: add r0,r0,#1 - or they can be different, for example: add r0,r1,#1
Before, we learned we could not load #0x12345678 directly into a register, however we can do this in two parts, loading the first 4 digits with MOV - then adding the other 4 with ADD
the changes to the registers are shown here
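
The arithmetic behind building 0x12345678 in parts can be checked in Python. This is a sketch only: split_into_bytes is a name invented here, and it splits the value one byte at a time (every byte-aligned piece fits the rotated 8 bit form), whereas the text above uses two 4 digit halves.

```python
def split_into_bytes(value: int) -> list:
    """Split a 32 bit value into up to four parts, one per nonzero byte."""
    parts = []
    for shift in (24, 16, 8, 0):
        part = value & (0xFF << shift)
        if part:
            parts.append(part)
    return parts


parts = split_into_bytes(0x12345678)
print([hex(part) for part in parts])

# One MOV for the first part, then one ADD for each remaining part
total = 0
for part in parts:
    total = (total + part) & 0xFFFFFFFF  # registers wrap at 32 bits
print(hex(total))

# The two-part version from the text: first 4 digits, then the other 4
print(hex((0x12340000 + 0x00005678) & 0xFFFFFFFF))
```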

Reading and writing 32bit values to RAM

MOV is good for setting registers from fixed values or other registers, but it's not what we need for working with RAM.
For this example we'll define a 32 bit 'long' in ram called 'TestVal'.
We'll use LDR to load from the test value... with LDR, the Destination register is on the left, and the Source address is on the right... with STR, the Source register is on the left, and the Destination address is on the right... LDR and STR load and save 32 bit values (the entire length of the register)
We loaded in 0xFEDCBA98 from RAM with LDR, added 1, and wrote it back as 0xFEDCBA99 with STR.
USER Ram is defined with a SYMBOL. Like a label, a symbol is a text name which is replaced by the assembler with a number. We use .EQU to define a symbol, in this case we're setting ramarea to 0x02000000 (this is the GBA version)
If we want to write to the address in a register, we put the register in Square Brackets (Eg [R1] ). We can load the address of a label like 'testval' with the ADR command... this will transfer a label address into a register. We can then use LDR to read in from that register. If the address is in a symbol not a label like 'userram', ADR will not work - in this case we just use MOV to transfer the address into our register. We can use STR with that address to save back to that address.
We loaded the address of TestVal into R2 with ADR. We then loaded R1 from [R2] with LDR. Next we loaded the address of 'UserRam' into R2 with MOV. We gave R0 a new value with MOV and stored R0 back to [R2] with STR.

The example here shows data is stored by the ARM in 'Little Endian' format... meaning the lowest value byte in a 32 bit register is stored first... and the highest is stored last.

This is basically always the case with the ARM - however the ARM CPU can actually also work in Big Endian mode.
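
If you want to see the byte order without an emulator, Python's struct module can lay out a 32 bit value both ways. This is just an illustration using the 0xFEDCBA98 test value from earlier; the variable names are made up.

```python
import struct

value = 0xFEDCBA98
little = struct.pack("<I", value)  # little endian: lowest byte first
big = struct.pack(">I", value)     # big endian: highest byte first

# Show the bytes in the order they would sit in RAM
print("little:", " ".join(format(b, "02X") for b in little))
print("big:   ", " ".join(format(b, "02X") for b in big))
```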

Reading and writing Byte 8bit values to RAM

The previous LDR and STR worked with 32 bit registers... but we'll often want to work with bytes. The ARM allows this with the LDRB and STRB commands - they work the same as the other commands, but just load or save a single byte
We loaded in a byte from TestVal with LDRB... Note that the 24 unused bits of the register changed to 0. We then added 255 - causing R1 to expand out of a single byte... We then saved back with STRB - because we used a byte command, only the low byte was saved
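
The same steps can be modelled in Python with a 4 byte 'RAM'. This is a sketch that assumes TestVal holds 0xFEDCBA98 stored little endian (the value on your screen may differ), with a plain variable standing in for R1.

```python
# Assumed contents of TestVal: 0xFEDCBA98, lowest byte first
memory = bytearray([0x98, 0xBA, 0xDC, 0xFE])

r1 = memory[0]                 # LDRB: one byte, upper 24 bits of R1 become 0
print(hex(r1))
r1 = (r1 + 255) & 0xFFFFFFFF   # ADD r1,r1,#255: no longer fits in a byte
print(hex(r1))
memory[0] = r1 & 0xFF          # STRB: only the low byte is written back
print(" ".join(format(b, "02X") for b in memory))
```

Only the first byte of memory changes, the other three are untouched.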

LDR and STR work with 32 bit values... LDRB and STRB work at 8 bit...But what about 16 bit? well LDRH and STRH (H=Half) will load and save 16 bit...
but these commands only exist on later processors, the Gameboy Advance uses them fine - but RiscOS can't use them!

Because the ARM is 32 bit, a WORD is 32 bits on the ARM, rather than 16 bit like on the Z80 or 68000
VASM uses the statement '.long' to define a 32 bit value - but a LONG on the ARM would typically be 64 bit.

To avoid confusion the terms WORD and LONG won't be used in these tutorials - the length will be referred to in bits instead

Lesson 2 - Addressing modes and rotation on the ARM
We learned a few simple commands last time, but now it's time to start getting serious!
The ARM has many clever ways of addressing memory - and has something called a 'Barrel Shifter' - We'll learn what that is soon...
Let's learn about each addressing mode on the ARM!

1. Immediate - direct numeric values

We've already come across this!... Immediate addressing is where the values are numbers stored directly in the code. In this example the value is transferred into the register. The size of the immediate value depends on the command, sometimes it can be 16 bit, others it can be only 8 bit... though it can be shifted - so 0x0000FF00 is a valid 8 bit option.
The results are shown - our registers now contain the requested values

2. Register - Data from other registers

Register addressing is far less exciting than it sounds... it's just where a parameter is taken from the value in a register.
Here we've set R1 to the value in R2, then R2 to the value of R1+R2. These are both examples of register addressing.

3. Register indirect - Address is in register

Register Indirect is where the register holds an address, and that address is the source of the value for the command... The register is wrapped in square brackets eg [r2]
We can load with LDR or save with STR. The value in R0 has been read and written into the address in R2

4. Register indirect with constant offset - Address is register plus a fixed value

As well as using the value of a register as the address, we can use the register plus a fixed offset.

the Offset is put in the square brackets [] after a comma

This is useful if our register points to a bank of settings, and the offsets point to individual settings in that bank.

To make things easier, we can define symbols and use those as the offset... also notice the offset can be negative

R2 points to the start of the data bank...

we read in R0 from the base+4 - R1 from the base+8 - and R2 from the base-4 (not shown in the ram dump)

5. Register indirect with register offset - Address in sum of two registers

Rather than a fixed offset from the address, we can use the value in a register... effectively the resulting address is the sum of the two registers
The registers will be loaded from their respective offsets.

6. Register indirect with Preincrement - Increase register and Get from address in register

There will be times when we want to read a sequence of bytes in a loop - we'll probably want to read in using a register - then increase the address specified by that register, so we read the following data in the next loop iteration. The ARM can do this for us... just put a ! at the end of the command, and the address register (R1) will go up by the offset #4 BEFORE each read.
The First Value was read without preincrement - each other was done with a preincrement of 4... notice how R1 goes up by 4 each time

Just like before, the 'Increment' can actually be negative - so you can read backwards sequentially as well as forwards!

Isn't the ARM great!??

7. Register indirect with Postincrement - Get from address in register and Increase register

If we want to increase the register AFTER the read, we can do this too... instead of putting the offset inside the square brackets [] and a ! - we just put the offset OUTSIDE the square brackets (no ! required)
The First Value was read without postincrement... the second used postincrement but this also read from the same address... the others also used postincrement, and these loaded from successive addresses
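
The difference between the pre and post versions is easy to lose track of, so here is a small Python model of both. It is only a sketch: the helper names are invented, and 'memory' is a made-up block of four 32 bit values standing in for RAM.

```python
def read32(memory: bytes, address: int) -> int:
    """Read a little endian 32 bit value from our pretend RAM."""
    return int.from_bytes(memory[address:address + 4], "little")


def ldr_pre(memory, rn, offset, writeback=False):
    """LDR Rd,[Rn,#offset]{!} - returns (value loaded, new Rn)."""
    address = rn + offset              # offset applied BEFORE the read
    return read32(memory, address), (address if writeback else rn)


def ldr_post(memory, rn, offset):
    """LDR Rd,[Rn],#offset - returns (value loaded, new Rn)."""
    return read32(memory, rn), rn + offset   # offset applied AFTER the read


memory = b"".join(n.to_bytes(4, "little") for n in (0x11, 0x22, 0x33, 0x44))

print(ldr_pre(memory, 0, 4, writeback=True))  # reads the second value
print(ldr_post(memory, 0, 4))                 # reads the first value
```

Both calls leave the address register at 4, but the pre version has already skipped the first value while the post version has just read it.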

8. Program Counter Relative - label relative to current code

PC Relative allows us to load directly from a label near to the current running code, we don't need to know what the PC is, the assembler works it out for us
The specified addresses are loaded into the registers.

9. Register Shifted - Value of a register bit shifted

Many CPU's have rotation and shifting commands which will perform bit shifts on a register - but the ARM is special, bit shifts can be performed on the value of a register with virtually any command!

There are no dedicated rotate commands; in this case we just rotate a register's value and move that result into another register.

LSL/LSR: We have two 'Logical shift commands'... these are designed to work with unsigned numbers. Shifting Right with LSR essentially halves the number, shifting left with LSL effectively doubles it - of course they can also be used to move bits around!
We loaded R0 with our test value and shifted 8 bits to the left into R1... the top '8' got pushed out and was lost. We then shifted 8 bits to the right into R2 - the 2 went off the right hand side, and was lost
ASR is 'Arithmetic shift Right' - this effectively shifts bits to the right like LSR - however it's designed to work with negative values, and will copy the top bit to the freed up bits to allow negative numbers to be halved
Here is the result... in R2 the top byte has changed to FF as all the bits are 1
ROR is Rotate right - unlike the other commands which push the bits out of the register, ROR will rotate them around again, so any bits pushed off the right will be moved to the left of the register.
We rotated by 8 bits right - this pushed 02 off the right side, and onto the left side... the remaining bytes 800010 moved to the right
We don't just have to use immediate values - we can use a register value as the shift amount!
Here's the result - the value in R3 was used as the shift amount
RRX is the last option - this rotates the rightmost bit into the Carry bit - and any value that was in the Carry bit is moved to the topmost bit. RRX can only rotate by 1 bit, and there is no left rotate version. We have to add an S to the mov command - making it movs - otherwise the Carry bit won't be set
The TestValue was shifted 1 bit to the right into R1... this pushed a bit 1 into the Carry... This Carry bit was then shifted into R2 - Making the top byte C0 (%11000000)
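
All five shift types can be modelled in a few lines of Python for shift amounts of 1-31. This is a sketch, not the CPU's actual circuitry: the test value 0x80001002 is inferred from the descriptions above, 0x80001003 is a made-up value for the RRX case, and the function names are invented.

```python
MASK = 0xFFFFFFFF  # keep everything to 32 bits


def lsl(value, n):
    return (value << n) & MASK            # bits pushed off the left are lost


def lsr(value, n):
    return (value & MASK) >> n            # zeros come in from the left


def asr(value, n):
    # Copies of the top bit come in from the left
    fill = (MASK << (32 - n)) & MASK if value & 0x80000000 else 0
    return ((value & MASK) >> n) | fill


def ror(value, n):
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK


def rrx(value, carry):
    """Returns (result, new carry): old carry goes in at the top."""
    return ((value >> 1) | (carry << 31)) & MASK, value & 1


test = 0x80001002
for name, shift in (("LSL", lsl), ("LSR", lsr), ("ASR", asr), ("ROR", ror)):
    print(name, format(shift(test, 8), "08X"))
print("RRX", rrx(0x80001003, 1))
```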

10. Register indirect with scaled Register offset - Address offset by a bit shifted register

Because these bit shifts can be used with many other commands, we can use them to multiply a parameter for a register indirect offset - In this example we've shifted R1 left by two bits - effectively making our formula [R2 + (R1*4) ]
As we increase and decrease R1, the address we read from will change accordingly.
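
The address calculation itself is simple enough to check in Python. This is only a sketch of the formula above; the function name is invented, and the base address 0x02000000 is borrowed from the GBA user ram symbol mentioned in Lesson 1.

```python
def scaled_address(base: int, index: int, shift: int) -> int:
    """Address used by [Rn, Rm, LSL #shift]: base + (index * 2^shift)."""
    return (base + (index << shift)) & 0xFFFFFFFF


# With LSL #2 each step of the index moves on by one 32 bit value (4 bytes)
for index in range(4):
    print(index, hex(scaled_address(0x02000000, index, 2)))
```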

Lesson 3 - Labels, Branch CMP
We've learned how to do mathematics and how to move data in and out of memory,
Next we need to learn how to add conditions and branches - these will make up our loops, and our program logic.
Unlike most systems, on ARM conditions can apply to most commands, not just branching operations!

Flags on the ARM

CPU flags are set by mathematical operations and allow us to check if the result of an operation was zero, or if any bits were pushed out of a register by a rotate command or addition.

On most CPU's the flags are set automatically, however this is not the case on the ARM.

The ARM CPU will generally only set the flags when we add an S to the end of our command - this causes the flags to be set by the command

The Add commands caused the value in R0 to roll over back to zero

The first add command did not end in S so the flags did not change

The second add command was addS - ending in S... this tells the cpu to set the flags - the Carry flag is set because the register overflowed, the Z flag was set because the current value of R0 is zero
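Here's a rough Python model of the difference between ADD and ADDS (not ARM code - add_flags is a made up helper, and the starting value 0xFFFFFFFE is an assumption that matches the description above).

```python
MASK32 = 0xFFFFFFFF

def add_flags(a, b):
    """Return (result, flags) for a 32-bit add, with flags as ADDS would set them."""
    total = (a & MASK32) + (b & MASK32)
    result = total & MASK32
    flags = {
        "N": result >> 31,                                  # top bit of the result
        "Z": int(result == 0),                              # result is zero
        "C": int(total > MASK32),                           # a bit fell out of the top
        "V": ((a ^ result) & (b ^ result) & MASK32) >> 31,  # signed overflow
    }
    return result, flags

flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
r0 = 0xFFFFFFFE
r0, _ = add_flags(r0, 1)         # add  r0,r0,#1 - no S, so the flags are thrown away
print(hex(r0), flags)
r0, flags = add_flags(r0, 1)     # adds r0,r0,#1 - the S means the flags are kept
print(hex(r0), flags)
```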

We're going to look at some examples of these flags and condition codes - but really you should try them yourselves!

You'll notice commented out code (starting ;) - these are alternative tests you can do to see the conditions in action - Ideally you should try them yourselves, but they'll all be shown on the video!

Carry: CS/CC

The Carry flag is set when a register's value exceeds the limits of 32 bit - for example when we add 1 to 0xFFFFFFFF.

It will also be set by rotate commands that push a bit out of the register like RRX

We're going to use a Branch command with a condition code to test for the carry... BCC will Branch if Carry is Clear... BCS will branch if Carry is Set

Condition Codes:
CS = Carry Set
CC = Carry Clear

The Carry flag was set, so the BCS occurred, showing a C to the screen

Zero: EQ/NE

The Zero flag is set whenever a mathematical operation results in zero - either because of a subtraction, an addition or overflows, or other operation that results in a register containing zero... it's also set when a compare operation is performed on two registers with the same value - as the difference is zero.

We'll use BEQ (Branch if Equals) and BNE (Branch if Not Equals)

EQ - Equals (Zero)
NE - Not Equals (Not Zero)

The Zero flag was set (because the difference between the two registers was zero)
This caused the jump to occur, and the = was shown.

Unsigned Numbers: CS/CC/HI/LS

Unsigned mathematics (that does not use negative numbers) uses 4 comparisons - two we've already seen!
The CMP command affects the flags like a 'subtraction' command, but does not alter registers.

There are four conditions

>= CS - Carry Set
< CC - Carry Clear
> HI - Higher (Carry set and Zero Clear)
<= LS - Lower or same (Carry Clear or Zero Set)

Because negative numbers start with a 1 as the top bit, they will be treated as very large by these conditions... we need to use other conditions to test these

The Zero and Carry flag will be set depending on the values compared
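The four unsigned conditions can be modelled in a few lines of Python (a sketch, not ARM code - all the names are made up for illustration). Note that after a CMP the Carry is set when there was NO borrow, which is why CS means 'higher or same'.

```python
MASK32 = 0xFFFFFFFF

def cmp_unsigned_flags(a, b):
    """Carry and Zero as CMP a,b would leave them (Carry set means 'no borrow')."""
    a &= MASK32
    b &= MASK32
    return {"C": int(a >= b), "Z": int(a == b)}

CONDITIONS = {
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
}

def passing(a, b):
    """List the unsigned conditions that would be true after CMP a,b."""
    flags = cmp_unsigned_flags(a, b)
    return sorted(name for name, test in CONDITIONS.items() if test(flags))

print(passing(5, 3))     # ['CS', 'HI']
print(passing(3, 3))     # ['CS', 'LS']
print(passing(3, 5))     # ['CC', 'LS']
print(passing(-1, 1))    # ['CS', 'HI'] - minus one is 0xFFFFFFFF, so it looks huge
```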

Signed Numbers: GE/LT/GT/LE

Because of the way negative numbers work in assembly, we need to use 4 different conditions for comparing signed numbers:

>= GE - Greater or Equals (N set and V set, or N clear and V clear)
< LT - Less Than (N set and V clear, or N clear and V set)
> GT - Greater Than (Z clear, and either N set and V set, or N clear and V clear)
<= LE - Less than or Equals (Z set, or N set and V clear, or N clear and V set)

The jumps will occur according to the flags... the flag-rules are pretty complex for these, but the commands are easy to use.
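The flag rules boil down to 'are N and V the same?'. Here's a Python sketch that works out the flags for a CMP and checks the four signed conditions (a model only - the function names are assumptions, not anything from the assembler).

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """N, Z and V as CMP a,b (a subtraction that throws away the result) sets them."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "V": ((a ^ b) & (a ^ result)) >> 31,   # sign changed unexpectedly
    }

def signed_conditions(a, b):
    """Which of GE/LT/GT/LE would pass after CMP a,b."""
    f = cmp_flags(a, b)
    return {
        "GE": f["N"] == f["V"],
        "LT": f["N"] != f["V"],
        "GT": f["Z"] == 0 and f["N"] == f["V"],
        "LE": f["Z"] == 1 or f["N"] != f["V"],
    }

print(signed_conditions(-1, 1))            # LT and LE are True
print(signed_conditions(0x7FFFFFFF, -1))   # GE and GT are True
```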

Positive / Negative Numbers: PL/MI

There may be times we need to simply know if a number is positive or negative, the N flag does this for us... We can use two special conditions to do this:
PL - PLus / Positive (Negative clear)
MI - MInus / Negative (Negative set)
The N flag is set according to the top bit of the register

Overflow: VS/VC

Overflow occurs when the limit of a signed number is breached and a positive number incorrectly flips to a negative (or vice versa). A signed 32 bit number cannot contain >+2147483647 or <-2147483648... when it tries to, the top bit will flip, and the value will become invalid... Overflow is designed to allow this to be detected... we have two conditions:
VS - oVerflow Set
VC - oVerflow Clear
The jump will occur according to the V flag
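To see the overflow happen, here's a short Python model of an ADDS at the top of the signed range (a sketch - to_signed and adds_overflow are made up names for illustration).

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Read a 32-bit pattern as a signed number."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value

def adds_overflow(a, b):
    """Return (result, V) for a 32-bit add - V is 1 when the sign flipped wrongly."""
    result = (a + b) & MASK32
    v = ((a ^ result) & (b ^ result) & MASK32) >> 31
    return result, v

result, v = adds_overflow(0x7FFFFFFF, 1)    # biggest positive number plus one
print(to_signed(result), v)                 # -2147483648 1
result, v = adds_overflow(1, 1)
print(to_signed(result), v)                 # 2 0
```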

Always/Never: AL/NV

These are pretty useless, but they do technically exist... one that always happens, and one that never does!

AL - jump ALways
NV - jump NeVer

Conditions everywhere!

While conditions on branch commands exist on all CPUs, the ARM has something really special! Conditions can be attached to most commands! Just add the condition code (shown as cc in the Appendix) to a command, and it will only run if the condition is met - this allows for conditional code without branching.
Here is the result
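As a rough idea of how this plays out, here's a Python model of a compare followed by a MOVEQ and a MOVNE (this is an assumed example, not the code from the screenshot). Both commands are 'passed through' in order, but only the one whose condition matches actually does anything.

```python
def compare_then_select(r0, compare_with):
    """Model of:  cmp r0,#n  /  moveq r1,#1  /  movne r1,#0  - no branches needed."""
    z = int(r0 == compare_with)    # cmp sets Z when the two values match
    r1 = None
    if z == 1:                     # moveq only runs when Z is set
        r1 = 1
    if z == 0:                     # movne only runs when Z is clear
        r1 = 0
    return r1

print(compare_then_select(5, 5))   # 1
print(compare_then_select(4, 5))   # 0
```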

Some commands work with these condition codes, and others don't! Check out the cheatsheet for the full details!

Lesson 4 - The Stack... and SWI
We've learned how to save values in memory - but what about if we want to store a value for a very short time?

We need a temporary store, and that's where the stack comes in!

'Stacks' in assembly are like an 'In tray' for temporary storage...

Imagine we have an In-Tray... we can put items in it, but only ever take the top item off... we can store lots of paper - but have to take it off in the reverse order we put it on!... this is what a stack does!

If we want to temporarily store a register - we can put its value on the top of the stack... but we have to take the values off in the reverse order...

The stack will appear in memory, and the stack pointer goes DOWN with each push on the stack... so if it starts at $2000 and we push 2 bytes, it will point to $1FFE

As the ARM is 32 bit, we'll push onto the stack 32 bits at a time.

Pushing and Popping the stack

There are no dedicated PUSH / POP commands for the stack on the ARM - and technically any register can be used as the stack pointer... though SP is defined as R13
To move an item onto the stack we use: str ??, [sp, #-4]! ... this is our PUSH command
To take an item off the stack we use: ldr ??, [sp], #4 ... this is our POP command
In this example, we'll load R0 with a value, push it onto the stack, change R0, then restore the pushed value from the stack
We'll view the registers and stack at each stage
The test value was loaded into R0 - Pushed onto the stack... then recovered into R0
We can nest pushes... The important thing to understand is that we pop off in the reverse order to the way we pushed them on... We can also push a value in R0 onto the stack, and pop it off in R1
Because the stack moves down in memory, the second value appears before the first in ram.
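Here's a small Python model of the push and pop commands described above (a sketch only - the Stack class and the start address $2000 are assumptions for illustration). It shows the stack pointer moving down 4 bytes per push, and the values coming back in reverse order.

```python
class Stack:
    """Model of a 'full descending' stack holding 32-bit values."""

    def __init__(self, top):
        self.sp = top
        self.memory = {}

    def push(self, value):         # like  str rN,[sp,#-4]!
        self.sp -= 4               # move down first...
        self.memory[self.sp] = value & 0xFFFFFFFF   # ...then store

    def pop(self):                 # like  ldr rN,[sp],#4
        value = self.memory[self.sp]   # read first...
        self.sp += 4                   # ...then move back up
        return value

stack = Stack(0x2000)
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.sp))        # 0x1ff8 - two pushes of 4 bytes each
print(hex(stack.pop()))     # 0x22222222 - the last item pushed comes off first
print(hex(stack.pop()))     # 0x11111111
```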

Pushing Multiple items with STMFD and LDMFD

We can push and pop multiple items with STMFD and LDMFD. We use a comma list eg {r1,r2,r4} and/or a range {r1-r4,r6}

The order we put the registers in the list doesn't affect the order they are pushed onto the stack.

But of course if we pop them off into different registers, things could go wrong!

The items will be pushed onto the stack and popped off in one go!

As well as the typical STMFD and LDMFD there are other options!

We can have an Ascending or Descending stack (Descending is typical)

We can also have a 'Full' stack (where stack pointer points to last pushed item) or 'Empty' stack (where pointer points to next empty item)

Direction	Type	Push	Pop
Descending	Full	STMFD	LDMFD
Ascending	Full	STMFA	LDMFA
Descending	Empty	STMED	LDMED
Ascending	Empty	STMEA	LDMEA
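Here's a Python sketch of how STMFD and LDMFD behave (a model, not ARM code - the function names and register values are made up for illustration). The lowest numbered register always ends up at the lowest address, which is why the order of the list doesn't matter.

```python
def stmfd(sp, memory, registers, reg_list):
    """Push the listed registers - lowest register number at the lowest address."""
    numbers = sorted(set(reg_list))
    sp -= 4 * len(numbers)
    for slot, number in enumerate(numbers):
        memory[sp + 4 * slot] = registers[number]
    return sp

def ldmfd(sp, memory, registers, reg_list):
    """Pop the listed registers back and return the new stack pointer."""
    numbers = sorted(set(reg_list))
    for slot, number in enumerate(numbers):
        registers[number] = memory[sp + 4 * slot]
    return sp + 4 * len(numbers)

registers = {1: 0x11, 2: 0x22, 4: 0x44}
memory = {}
sp = stmfd(0x2000, memory, registers, [4, 1, 2])    # list order is ignored
print(hex(sp), {hex(a): hex(v) for a, v in sorted(memory.items())})
registers = {1: 0, 2: 0, 4: 0}
sp = ldmfd(sp, memory, registers, [1, 2, 4])
print(hex(sp), registers)
```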

The Stack with Branch and Link (BL)

As we learned, Branch and Link moves the Program Counter (PC) into the Link Register (LR)

When we perform a RETurn, the assembler actually creates a MOV PC,LR command...

Because we need the LR to be intact to return, we need to back it up somehow if we're nesting subroutines...

The easiest solution is to push it onto the stack, and pop it back into the PC...

Alternatively, we could transfer it into another register

Here are the changes to the stack and Link Register
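The nesting problem can be walked through with a very simplified Python model (the addresses and the names Outer and Inner are assumptions, and 'pc' here just means the address of the current command). The second BL overwrites LR, so Outer can only get home because it pushed the first value.

```python
def branch_and_link(cpu, target):
    """bl target - remember where to come back to in LR, then jump."""
    cpu["lr"] = cpu["pc"] + 4      # the command after the BL (commands are 4 bytes)
    cpu["pc"] = target

cpu = {"pc": 0x100, "lr": 0, "stack": []}

branch_and_link(cpu, 0x200)        # main calls Outer           (LR = 0x104)
cpu["stack"].append(cpu["lr"])     # Outer pushes LR to keep it safe
cpu["pc"] = 0x204                  # Outer moves on to its next command
branch_and_link(cpu, 0x300)        # Outer calls Inner          (LR = 0x208, old LR lost)
cpu["pc"] = cpu["lr"]              # Inner returns with mov pc,lr    (PC = 0x208)
cpu["pc"] = cpu["stack"].pop()     # Outer returns by popping straight into PC
print(hex(cpu["pc"]))              # 0x104 - back in main
```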

System calls with SWI

SWI stands for SoftWare Interrupt... Like the RSTs of the Z80 and the TRAPs of the 68000, these are often used for OS calls... On RiscOS there are a variety of SWIs...
To use a SWI we use the command followed by a number (the SWI number)... What the SWI does and what parameters need to be passed will depend on the system, you'll need to consult the documentation of that system for details.
We called the show string function, then the end program function

If you're programming the Gameboy Advance then you'll probably never need SWI... these tutorials use the firmware as little as possible, so you won't see it much in those either...

If you're using the firmware though, you'll have to check the manual for Risc-OS, and beware! there are different versions for later Risc OS versions!

Lesson 5 - More Maths!
We're nearly done... but we need to look at operations that work at the bit level, and a few other important commands... let's take a look!

Logical Operations on bits.

We have four kinds of logical operations we can perform on bits.
AND = Return 1 where both parameters are 1 - else 0
ORR = (or) Return 1 where either parameter is 1 - else 0
EOR = (Exclusive OR) Flip bits in first parameter where second parameter is 1
BIC = (Bit CLear) Zero bits in first parameter where second parameter is 1
The results are shown here
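If you want to try some bit patterns yourself, the four operations map onto Python's bit operators like this (a sketch - the test values %1100 and %1010 are assumptions, not the ones from the screenshot).

```python
MASK32 = 0xFFFFFFFF

def logical_ops(a, b):
    """The four ARM logical operations on two 32-bit values."""
    return {
        "AND": a & b & MASK32,
        "ORR": (a | b) & MASK32,
        "EOR": (a ^ b) & MASK32,
        "BIC": a & ~b & MASK32,    # clear the bits of a that are set in b
    }

for name, value in logical_ops(0b1100, 0b1010).items():
    print(name, format(value, "04b"))
```

This prints AND 1000, ORR 1110, EOR 0110 and BIC 0100.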

Test Operations TST / TEQ

We have two commands which work like Logical operations - but they do not change the contents of the registers - they just change the flags.
TST = effectively ANDs the two parameters setting the flags accordingly
TEQ = effectively EORs the two parameters setting the flags accordingly
There are two special commands MSR and MRS - we'll look at those next!
Here is the result!
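Here's a Python sketch of what TST and TEQ work out (a model only - it just tracks the N and Z flags, and the function names are made up). TST is handy for asking 'is this bit set?', TEQ for 'are these two values identical?'.

```python
MASK32 = 0xFFFFFFFF

def tst(a, b):
    """Flags from a AND b - the result itself is thrown away."""
    result = a & b & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}

def teq(a, b):
    """Flags from a EOR b - Z is set only when every bit matches."""
    result = (a ^ b) & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}

print(tst(0b0101, 0b0100))    # Z=0 - bit 2 is set, so NE would pass
print(tst(0b0101, 0b0010))    # Z=1 - bit 1 is clear, so EQ would pass
print(teq(0x1234, 0x1234))    # Z=1 - the values are the same
```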

Backing up flags with MRS / MSR

*These commands only exist on later ARM versions*
If we want to back up the flags, we can do so with these two commands... the flags are in register 'CPSR'.... we can transfer this to or from another register!
MRS will move the flags to a register backing them up
MSR will move the flags from a register restoring them

Using Carry for 64 bits!

There may be times when even 32 bit isn't enough - when we do ADDition or SUBtraction that goes over the limit of a 32 bit register, we can use special commands to add that carry to a second register - the two registers together will give us 64 bits!

ADC adds a parameter + any carry to the top register.
SBC subtracts a parameter and any borrow from the top register (on the ARM a borrow is shown by the Carry flag being clear).

In either case, we need to do an ADDS or SUBS to the low register first - the S means the flags are set, if we don't do this, the carry will never be set

Here are the results, when the bottom register over/under flowed, the top register was altered to compensate for the carry/borrow
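Here's a Python model of the two-step 64 bit maths (a sketch - add64 and sub64 are made up names, and each value is passed as a separate high and low 32 bit half, assumed to already be in the range 0 to 0xFFFFFFFF). Note how the subtract treats a CLEAR Carry as the borrow.

```python
MASK32 = 0xFFFFFFFF

def add64(a_hi, a_lo, b_hi, b_lo):
    """adds on the low words, then adc on the high words."""
    low = a_lo + b_lo
    carry = low >> 32                  # 1 if the low word overflowed
    high = a_hi + b_hi + carry
    return high & MASK32, low & MASK32

def sub64(a_hi, a_lo, b_hi, b_lo):
    """subs on the low words, then sbc on the high words."""
    carry = int(a_lo >= b_lo)          # Carry is CLEAR when a borrow happened
    low = a_lo - b_lo
    high = a_hi - b_hi - (1 - carry)   # sbc takes off an extra 1 when Carry is clear
    return high & MASK32, low & MASK32

print([hex(x) for x in add64(0, 0xFFFFFFFF, 0, 1)])   # ['0x1', '0x0']
print([hex(x) for x in sub64(1, 0, 0, 1)])            # ['0x0', '0xffffffff']
```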

Multiplication

The ARM has two multiply commands
MUL - MUltiplies two parameters together.
MLA - MuLtiplies two parameters and Adds a third
The results of the two operations are shown here: 3*2=6... (3*2)+1=7
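In Python terms the two commands look like this (a model - mul and mla are just illustrative function names). The one thing to watch is that only the bottom 32 bits of the result are kept.

```python
MASK32 = 0xFFFFFFFF

def mul(rm, ro):
    """MUL Rn,Rm,Ro - Rn = Rm * Ro, keeping the low 32 bits."""
    return (rm * ro) & MASK32

def mla(rm, ro, rp):
    """MLA Rn,Rm,Ro,Rp - Rn = (Rm * Ro) + Rp, keeping the low 32 bits."""
    return (rm * ro + rp) & MASK32

print(mul(3, 2), mla(3, 2, 1))       # 6 7
print(hex(mul(0x10000, 0x10000)))    # 0x0 - the real answer needs 33 bits
```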

Negative and reversed commands

We have some special commands, which reverse the order of the parameters

RSB (Reverse SuB) is like SUB - except whereas SUB R0,R1,R2 will set R0=R1-R2, RSB will set R0=R2-R1... there is a carrying version called RSC

If we want to transfer a value with all its bits flipped, we can use MVN R0,R1 (MoVe Not) - This will set R0= R1 EOR 0xFFFFFFFF

If we want to compare a register to a negated register we can use CMN R0,R1... this sets the flags like ADDS, but does not change any registers.

We performed a 64bit reversed subtract, Moved a negated value, and compared a negative
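Here's a quick Python sketch of these three commands (a model - the function names are made up, and cmn_zero only reports the Z flag). A handy trick is that RSB with 0 as the second parameter negates a register.

```python
MASK32 = 0xFFFFFFFF

def rsb(rm, op2):
    """RSB Rn,Rm,Op2 - Rn = Op2 - Rm (the reverse of SUB)."""
    return (op2 - rm) & MASK32

def mvn(op2):
    """MVN Rn,Op2 - every bit flipped."""
    return (op2 ^ MASK32) & MASK32

def cmn_zero(rn, op2):
    """Z flag after CMN Rn,Op2 - set when Rn + Op2 is zero, so Rn equals minus Op2."""
    return int(((rn + op2) & MASK32) == 0)

print(hex(rsb(5, 0)))              # 0xfffffffb - this is -5
print(hex(mvn(0xFF)))              # 0xffffff00
print(cmn_zero(5, 0xFFFFFFFB))     # 1 - because 5 is the negative of -5
```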

ARM4+ only... 16 bit Move (HalfWord), Swap Ram<->Register

This tutorial primarily covers ARM2, but there's a few later commands that are really good to know...

The first are LDRH and STRH - these (like LDR/STR) are load and store commands - however these work at the HalfWord (16 bit) level... they're handy for the Gameboy Advance screen!

Another interesting command is SWP - this transfers the value at a RAM address to a register, and a register to the same RAM address... The Source/Destination registers can be the same or different.

We loaded in a Half (16 bit)... then stored the modified Half back to ram

We Swapped the ram into R0 and R1 into Ram
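Here's a Python model of the three commands working on a small block of 'ram' (a sketch - the function names and test bytes are assumptions, and the bytes are stored little endian, low byte first, as on the Gameboy Advance).

```python
def ldrh(memory, address):
    """LDRH - load 16 bits, the top half of the register becomes zero."""
    return int.from_bytes(memory[address:address + 2], "little")

def strh(memory, address, value):
    """STRH - store only the bottom 16 bits of the register."""
    memory[address:address + 2] = (value & 0xFFFF).to_bytes(2, "little")

def swp(memory, address, rm):
    """SWP Rn,Rm,[Ro] - write Rm to ram and return the old word for Rn."""
    old = int.from_bytes(memory[address:address + 4], "little")
    memory[address:address + 4] = (rm & 0xFFFFFFFF).to_bytes(4, "little")
    return old

ram = bytearray.fromhex("3412000078563412")
half = ldrh(ram, 0)
strh(ram, 0, half + 1)                  # modify the half and store it back
print(hex(half), hex(ldrh(ram, 0)))     # 0x1234 0x1235
r0 = swp(ram, 4, 0xAABBCCDD)
print(hex(r0), ram[4:8].hex())          # 0x12345678 ddccbbaa
```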

We've covered all the basic ARM2 commands - there are many more in the later revisions, but we won't be covering them at this time.

We've looked at enough to get started with RiscOS or the Nintendo Gameboy Advance!

Appendix

Mnemonic	Description	Example
ADCccS Rn, Rm, Op2	Add With Carry.	ADC R0,R0,#4
ADDccS Rn, Rm, Op2	Add Op2 to Rm and store the result in Rn.	ADD R0,R0,#4
ANDccS Rn, Rm, Op2	Logically AND Op2 with Rm and store the result in Rn.	AND R0,R0,#4
Bcc Label	Branch to a relative Label.	BEQ ConditionalJump
BICccS Rn, Rm, Op2	Logically Bit Clear Op2 with Rm and store the result in Rn.	BIC R0,R0,#4
BLcc Label	Branch and Link to a relative subroutine Label.	BL TestSub
CMNcc Rn, Op2	Compare Negative Rn to Op2. set the flags like "ADDS Rn,Op2"	CMN R0,#4
CMPcc Rn, Op2	Compare Rn to Op2. set the flags, the same as "SUBS Rn,Op2"	CMP R0,#4
EORccS Rn, Rm, Op2	Logically Exclusive OR Op2 with Rm and store the result in Rn.	EOR R0,R0,#4
LDMccadm Rn!, {Regs}	Transfer range of registers {Regs} from the address in Rn. Like POP	LDMFD sp!,{r0,r1,r2}
LDRcc Rn, Flex LDRccB Rn, Flex	Load register Rn from address Flex	LDR R0,NearLabel
LDRccH Rn, Off LDRccSH Rn, Off LDRccSB Rn, Off	HalfWord (16 bit), Signed HalfWord (16 Bit) and Signed Byte (8 Bit) load	LDRSB R0,[R1,#-255]
MLAccS Rn, Rm, Ro, Rp	32 bit Multiplication and Add. Rn=(Rm*Ro)+ Rp	MLA R0,R1,R2,R3
MOVccS Rn, Op2	Move value in Op2 into Rn.	MOV R0,#0xFF
MRScc Rn,sr	Move sr (either CPSR or SPSR) to register Rn.	MRS R0,SPSR
MSRcc sr_f,# MSRcc sr_f,Rn	Move immediate # or register into flags f of sr (either CPSR or SPSR).	MSR CPSR_F,#0
MULccS Rn, Rm, Ro	32 bit Multiplication. Rn=Rm*Ro.	MUL R0,R1,R2
MVNccS Rn, Op2	Move Not. Flip all the bits of Op2 and move result into Rn.	MVN R0,#0xFF
ORRccS Rn, Rm, Op2	Logically OR Op2 with Rm and store the result in Rn.	ORR R0,R0,#4
RSBccS Rn, Rm, Op2	Reverse Subtract. This performs the calculation Rn=Op2-Rm.	RSB R0,R0,#6
RSCccS Rn, Rm, Op2	Reverse Subtract with Carry. Rn=(Op2-Rm)-NOT(C) .	RSC R0,R0,#6
SBCccS Rn, Rm, Op2	Subtract with Carry. Rn=(Rm-Op2)-NOT(C) .	SBC R0,R0,#6
STMccadm Rn!, {Regs}	Transfer range of registers {Regs} to the address in Rn. Like PUSH	STMFD sp!,{r0,r1,r2}
STRcc Rn, Flex STRccB Rn, Flex	Store register Rn to address Flex.	STR r0,[r1,r2,asl #2]
STRccH Rn, Off	Half Word (16 bit) store (there are no signed versions of the store commands)	STRH R0,[R1,#-255]
SUBccS Rn, Rm, Op2	Subtract. This performs the calculation Rn=Rm-Op2.	SUB R0,R0,#6
SWIcc #	Software Interrupt.	SWI 3
SWPccB Rn, Rm, [Ro]	Swap a register and memory. Rn=[Ro], [Ro]=Rm.	SWPB R0,R1,[R2]
TEQcc Rn, Op2	Test for bitwise Equality. Set the flags like "EORS Rn,Op2"	TEQ R0,#6
TSTcc Rn, Op2	Test bits. Set the flags like "ANDS Rn,Op2"	TST R0,#6