Learn Multi platform ARM Assembly Programming... For the Future!

We've covered a wide variety of chips in these tutorials, but now it's time to look at one of the most modern... the ARM

Powering everything from the Gameboy Advance to the Nintendo Switch and iPhone, the ARM is NOT a dated 'Classic' CPU... it's not even a system with 8 bit 'legacy roots' like the 8086...

The ARM is a 32 bit CPU from the ground up, designed around RISC principles and a regular bytecode structure. It's highly optimized for low power situations - the ARM is widely believed to be the future of computing.

In this series, we'll take a look at the ARM CPU on the GBA and as always, learn about the CPU from the ground up!



If you want to learn ARM, get the Cheatsheet! It covers all the ARM7 commands, options like bitshifts and conditions, and the bytecode structure of the commands!
We'll be using the excellent VASM for our assembly in these tutorials... VASM is an assembler which supports Z80, 6502, 68000, ARM and many more, and also supports multiple syntax schemes...

You can get the source and documentation for VASM from the official VASM website.

Generations and Early uses:
CPU        Instruction set   System
ARM2       ARMv2             Acorn Archimedes
ARM60      ARMv3             3DO (12 MHz)
ARM7TDMI   ARMv4T            GBA (16.78 MHz)





Platforms covered in these tutorials
Gameboy Advance
Risc OS


What is the ARM and what are 32 'bits'
You can skip this if you know about binary and Hex (This is a copy of the same section in the Z80 tutorial)
The ARM is a 32-Bit processor with a 32 bit Address bus!... 
What's a bit... well, one 'Bit' can be 1 or 0
four bits make a Nibble (0-15)
two nibbles (8 bits) make a byte (0-255)
two bytes (16 bits) make a word (0-65535)

And what is 65536 (the number of values from 0 to 65535)? Well, that's 64 kilobytes... in computers Kilo is 1024, because 2^10 = 1024, and 64 x 1024 = 65536

With the ARM we actually have some serious memory resources available to us, both in RAM or ROM!

if you're looking to develop serious games or software, you probably want to use C++, but looking at assembly lets us see how the hardware really works, and that's the point of these tutorials!

Numbers in Assembly can be represented in different ways.
A 'Nibble' (half a byte) can be represented as Binary (0000-1111) , Decimal (0-15) or  Hexadecimal (0-F)... unfortunately, you'll need to learn all three for programming!

Also a letter can be a number... Capital 'A'  is stored in the computer as number 65!

Think of Hexadecimal as being the number system invented by someone with 16 fingers, ABCDEF are just numbers above 9!
Binary is just the same, it only has 1 and 0.

In this guide, Binary will be shown with a % symbol... eg %11001100 ... hexadecimal will be shown with & eg.. &FF.

Assemblers will use a symbol to denote a hexadecimal number, some use $FF or #FF or even 0x, but this guide uses & - as this is how hexadecimal is represented in CPC basic
All the code in this tutorial is designed for compiling with WinApe's assembler - if you're using something else you may need to change a few things!
But remember, whatever compiler you use, while the text based source code may need to be slightly different, the compiled 'BYTES' will be the same!
Decimal      0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   ...  255
Binary       0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 ...  11111111
Hexadecimal  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F    ...  FF

Another way to think of binary is to think what each digit is 'Worth'... each digit in a number has its own value... let's take a look at %11001100 in detail and add up its total

Bit position     7    6    5    4    3    2    1    0
Digit Value (D)  128  64   32   16   8    4    2    1
Our number (N)   1    1    0    0    1    1    0    0
D x N            128  64   0    0    8    4    0    0
128+64+8+4 = 204, so %11001100 = 204!

If a binary number is small, it may be shown as %11 ... this is the same as %00000011
Also notice in the chart above, each bit has a number, the bit on the far right is no 0, and the far left is 7... don't worry about it now, but you will need it one day!

If you ever get confused, look at Windows Calculator, Switch to 'Programmer Mode' and  it has binary and Hexadecimal view, so you can change numbers from one form to another!
If you're an Excel fan, look up the functions DEC2BIN and DEC2HEX... Excel has all the commands you need to convert one thing to the other!

But wait! I said a Byte could go from 0-255 before, well what happens if you add 1 to 255? Well it overflows, and goes back to 0!...  The same happens if we add 2 to 254... if we add 2 to 255, we will end up with 1
This is actually useful, as if we want to subtract a number, we can use this to work out what number to add to get the effect we want

Negative number             -1   -2   -3   -5   -10  -20  -50  -254 -255
Equivalent Byte value       255  254  253  251  246  236  206  2    1
Equivalent Hex Byte Value   FF   FE   FD   FB   F6   EC   CE   02   01

All these number types can be confusing, but don't worry! Your Assembler will do the work for you!
You can type %11111111 ,  &FF , 255  or  -1  ... but the assembler knows these are all the same thing! Type whatever you prefer in your code and the assembler will work out what that means and put the right data in the compiled code!
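
As a rough sketch of this in the assembler these tutorials actually use: VASM writes hexadecimal as 0x and binary as 0b (the & and % forms above come from the Z80 version of this section). Assuming VASM's standard syntax, all four lines below should assemble to the same byte, 0xFF... the label name is made up for this example.

```asm
SameByteFourWays:           ; hypothetical label for this sketch
    .byte 0b11111111        ; binary
    .byte 0xFF              ; hexadecimal
    .byte 255               ; decimal
    .byte -1                ; negative: wraps around to 255 in a byte
```

Checking the listing file should show four identical bytes.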


The ARM-32 Registers
The arm technically has 27 general purpose 32 bit registers, but all but 16 are hidden from the user...

Main Registers:

Register   Alias     Use cases
R0-R10               General purpose
R11        FP        Frame Pointer (Optional)
R12        IP        Intra Procedural Call (Optional)
R13        SP        Stack Pointer
R14        LR / LK   Link Register / R15 Save Area
R15        PC        Program Counter

Added in ARMv3:

Register   Use cases
CPSR       Processor Status

Special registers for protected modes:

R13/14  have alternative versions and there is a SPSR for each of IRQ/SVC UNDEF and ABORT modes
FIQ mode has alternate R8-R14 and SPSR

A Frame pointer points to data areas in the Stack used by the function, allowing for relative offsets... it's entirely optional whether you really use R11 for this or not.

the Intra Procedural Call register can be used as a backup of LR for subroutines
    PC Flags: NZCVIF------------------------MM

  CPSR Flags: NZCV--------------------IFTMMMMM

Flag  Name          Meaning
N     Negative      Signed Less Than
Z     Zero          Zero
C     Carry         Carry / Not Borrow / Rotate Extend
V     oVerflow      Overflow
I     IRQ Disable   1=disable
F     FIQ disable   1=disable
T     Thumb mode    V4 only
M     Mode          00=user 01=FIQ 10=IRQ 11=Supervisor

Getting and Setting Flags:


Action                  ARMv2             ARMv3+
Backup Flags to R0      MOV R0, R15       MRS R0, CPSR
Restore Flags from R0   TEQP R0, #0       MSR CPSR, R0

Because the ARM loads instructions in advance, R15 is always 2 instructions (8 bytes) ahead of the current running command

Number Representation
Decimal #1234
Hexadecimal #0x12EF
Binary #0b11110000



Equivalent commands
Z80 command                         Description                   ARM command
CALL (no nesting)                   Jump to subroutine            BL label
JP                                  Jump to label                 B label
RET (no nesting)                    Return from linked branch     MOV pc,lr
CALL - start Sub (allows nesting)   After BL (start of sub)       STMFD sp!,{lr}
RET - end Sub (allows nesting)      End of sub (RET)              LDMFD sp!,{pc}
DEC r1                              Decrement r1 and set flags    SUBS r1,r1,#1
Push r0                             Put r0 onto the stack         STR r0,[sp,#-4]!
Pop r0                              Take r0 off the stack         LDR r0,[sp],#4
Push all                            Push all + return address     STMFD sp!,{r0-r12,lr}
Pop all.. RET                       Pop all + return              LDMFD sp!,{r0-r12,pc}

LDIR equivalent (r12=src, r13=dest, r14=bytecount+dest... each pass copies 48 bytes, so bytecount needs to be a multiple of 48):
loop:
     LDMIA r12!, {r0-r11}
     STMIA r13!, {r0-r11}
     CMP r13, r14
     BNE loop

Loading Registers
Unlike most systems, it is not possible to directly load a 32 bit register from an arbitrary immediate value. We must either load from a relative address, or merge multiple values together.

If we're merging values together, we can specify a 16 bit Immediate (though the assembler actually converts it to a MOV and an ORR), then use rotated immediates to add the other two bytes in, Eg:
    mov r0,   #0x0000FFFF ;Can't load 32 bits directly - GRR!
    orr r0,r0,#0x00FF0000
    orr r0,r0,#0x12000000

Using rotation, we can specify 8 bits, and a rotation of 0-15 (each moves 2 bits)... allowing us to control the following bits:
Result Bitshift
. . . . . . . . . . . . . . . . . . . . . . . . 76543210 0
10. . . . . . . . . . . . . . . . . . . . . . . . 765432 1
3210. . . . . . . . . . . . . . . . . . . . . . . . 7654 2
543210. . . . . . . . . . . . . . . . . . . . . . . . 76 3
76543210. . . . . . . . . . . . . . . . . . . . . . . . 4
. . 76543210. . . . . . . . . . . . . . . . . . . . . . 5
. . . . 76543210. . . . . . . . . . . . . . . . . . . . 6
. . . . . . 76543210. . . . . . . . . . . . . . . . . . 7
. . . . . . . . 76543210. . . . . . . . . . . . . . . . 8
. . . . . . . . . . 76543210. . . . . . . . . . . . . . 9
. . . . . . . . . . . . 76543210. . . . . . . . . . . . 10
. . . . . . . . . . . . . . 76543210. . . . . . . . . . 11
. . . . . . . . . . . . . . . . 76543210. . . . . . . . 12
. . . . . . . . . . . . . . . . . . 76543210. . . . . . 13
. . . . . . . . . . . . . . . . . . . . 76543210. . . . 14
. . . . . . . . . . . . . . . . . . . . . . 76543210. . 15
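
Here's a minimal sketch of building a full 32 bit value using only immediates that fit the table above. Each one is a single byte sitting at an even bit position, so this doesn't rely on the assembler splitting anything for us. The value 0x12345678 is just an example.

```asm
    mov r0,   #0x12000000   ; byte 0x12 rotated into bits 31-24
    orr r0,r0,#0x00340000   ; OR in 0x34 at bits 23-16
    orr r0,r0,#0x00005600   ; OR in 0x56 at bits 15-8
    orr r0,r0,#0x00000078   ; OR in 0x78 at bits 7-0... r0 = 0x12345678
```

It costs four instructions (16 bytes), which is why loading the value from a nearby address is often the better choice.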


Rotations
For normal commands, rotations are defined by 5 bits, allowing a shift from 1-31
LSL Logical Shift Left 
LSR Logical Shift Right
ASR Arithmetic Shift Right
ROR Rotate Right
RRX Rotate Right with eXtend (1 bit only - opcode is ROR #0)

Data Definitions
Bytes   Z80      68000    8086        ARM
1       DB       DC.B     DB          .BYTE
2       DW       DC.W     DW          .WORD
4                DC.L     DD          .LONG
n       DS n,x   DS n,x   n DUP (x)   .SPACE n,xx

Addressing Modes
Mode                                            Format                 Details                                            Sample
Immediate                                       #n                     Fixed value of n                                   ADD R0,R0,#1
Register                                        Rn                     Value in register Rn                               ADD R0,R0,R1
Register Shifted                                Rn, shft #n            Register Rn shifted by #n using shifter shft (*)   MOV R0,R1,ROR #2
Register indirect                               [Rn]                   Value from address in register Rn                  LDR R0,[R1]
Register indirect with constant offset          [Rn,#n]                Value from address Rn+n                            LDR R0,[R1,#4]
Register indirect with register offset          [Rn,{-}Rm]             Value from address Rn+Rm {can be negative}         LDR R0,[R1,R2]
Register indirect with scaled Register offset   [Rn,Rm,shft #n]        Value from address Rn+(Rm shifted by #n)           LDR R0,[R1,R2,LSL #2]
Register indirect with Preincrement             [Rn,#n]! / [Rn,Rm]!    Value from address Rn+n... set Rn=Rn+n             LDR R0,[R1,#4]!
Register indirect with Postincrement            [Rn],#n / [Rn],Rm      Value from address Rn... then set Rn=Rn+n (**)     LDR R0,[R1],#4
PC Relative                                     addr                   Value from address addr                            LDR R0,addr

(*) Shifter options: LSL #n, LSR #n, ASR #n, ROR #n, RRX. Note: RRX can only shift 1 bit.
(**) No need for ! - as updating Rn is the only purpose of this form!

All addressing modes are available for the main commands, but others are more limited.



Command format

Command Dest, Source, Param, Shifts

Command {COND}{B}{S} Dest, rd, [rs,off]{!}
B= byte transfer
!= update reg Rs
S= update conditional flags

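These instructions have four possible forms: zero offset, pre-indexed offset, program-relative, and post-indexed offset.
The syntax of the four forms, in the same order, is:
zero offset
    op{cond}type Rd, [Rn]
pre-indexed offset
    op{cond}type Rd, [Rn, Offset]{!}
program-relative
    op{cond}type Rd, label
post-indexed offset
    op{cond}type Rd, [Rn], Offset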
where:
op is either  LDR or  STR .
cond is an optional condition code
type must be one of:
    SH for Signed Halfword ( LDR only)
    H for unsigned Halfword
    SB for Signed Byte ( LDR only).
Rd is the ARM register to load or save.
Rn is the register on which the memory address is based.
Rn must not be the same as Rd, if the instruction is either:
    pre-indexed with writeback
    post-indexed.

label is a program-relative expression.  label must be within 255 bytes of the current instruction.
Offset is an offset applied to the value in  Rn
! is an optional suffix. If  ! is present, the address including the offset is written back into  Rn . You cannot use the  ! suffix if  Rn is r15.

Zero offset
The value in  Rn is used as the address for the transfer.

Pre-indexed offset
The offset is applied to the value in  Rn before the transfer takes place. The result is used as the memory address for the transfer. If the  ! suffix is used, the result is written back into  Rn .

Program-relative
This is an alternative version of the pre-indexed form. The assembler calculates the offset from the PC for you, and generates a pre-indexed instruction with the PC as Rn. You cannot use the ! suffix.

Post-indexed offset
The value in Rn is used as the memory address for the transfer. The offset is applied to the value in Rn after the transfer takes place. The result is written back into Rn.

Offset syntax
Both pre-indexed and post-indexed offsets can be either of the following:
    #expr
    {-}Rm
where:
- is an optional minus sign. If - is present, the offset is subtracted from Rn. Otherwise, the offset is added to Rn.
expr is an expression evaluating to an integer in the range -255 to +255. This is often a numeric constant.
Rm is a register containing a value to be used as the offset.
The offset syntax is the same as for LDR and STR doublewords.


Architectures
These instructions are available in ARM architecture v4 and above.
Examples
LDREQSH r11,[r6] ; (conditionally) loads r11 with a 16-bit halfword from the address in r6. Sign extends to 32 bits.
LDRH r1,[r0,#22] ; load r1 with a 16 bit halfword from 22 bytes above the address in r0. Zero extend to 32 bits.
STRH r4,[r0,r1]! ; store the least significant halfword from r4 to two bytes at an address equal to contents(r0)  plus contents(r1). Write address back into r0.
LDRSB r6,constf ; load a byte located at label constf. Sign extend.


Lesson 1 - Getting started with ARM
Lets start looking at some simple commands, and get the hang of the ARM registers!

These tutorials will use VASM to build... RPCEmu to run compiled code, and we'll use a simple monitor...



Our Compiler and emulator
We're going to be using VASM as an assembler, it's a free assembler which works on Windows, OSX and Linux
My Devtools provide a batch file which will build the programs for you, but if you don't want to use them, the format of the build script is shown below:



-Fbin ... Specifies to create a Binary file
-Dxxx=Y ... Specifies to define a symbol xxx=y (we'll learn about symbols later).
-L ... Specifies a Listing file - this shows source code and resulting bytes... it's used for debugging if we have problems
-o ... Specifies the output file.
%BuildFile%... this would be the sourcefile you want to compile... Eg: Lesson1.asm
-m7tdmi... (or equivalent) specifies the ARM architecture we're building for.
-chklabels -nocase ... Disable case sensitivity, and check for lines where we've forgotten a tab on a command (it will be mistaken for a label)
Once we've successfully compiled our program, we can run it with VisualBoyAdvance

We'll also use RiscOS, but setting this up is more complex if you're doing it yourself.


A template program
To allow us to get started programming quickly and see the results, we'll be using a 'template program'...
This consists of 3 parts:

A Generic Header - this will set up the screen and a few parameters we'll need to start.

The Program - this is the body of our program where we do our work.

A Generic Footer - this gives us some support tools, and includes a common bitmap font.

This template program will compile on any of the systems in these tutorials (RiscOS and the GameboyAdvance!)
There's a lot of complex scary stuff in the include files - don't panic about it for now, you'll be able to understand it more later once you've covered all the lessons.

Commands, Labels and Calls
Lets take a look at a simple program!...

The first line is a command 'BL' (Branch and Link)... this is the same as CALL or JSR on other systems... it runs the subroutine labeled 'DoMonitor' - when that subroutine finishes (when register LR is transferred to PC) the program will carry on with the line after the BL call ... notice the command starts indented *this is required for commands*

the next line is not indented and ends with a colon : - that makes it a label called 'infloop' ... labels tell the assembler to 'name' this position in the program - the assembler will convert the label to a byte number in the executable... thanks to the assembler we don't need to worry what number that ends up being...

finally we have the command 'B' (Branch)... this is a jump! unlike BL (Branch and Link), it never returns... notice we're Branching to the label we just defined on the line before.... this makes the program run infinitely... a crude way to end our program so we can see the result!

you'll also notice text in green starting with a Semicolon ; - this is a comment (REMark) - they have no effect on the code
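
As a sketch, the three-line program described above would look something like this... it assumes the template's footer provides the 'DoMonitor' subroutine, so it won't build on its own without the template.

```asm
    bl DoMonitor        ; Branch and Link: call the 'DoMonitor' subroutine
infloop:                ; a label: not indented, and ends with a colon
    b infloop           ; Branch to the label above: loop forever
```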

Subroutines and returns
Lets look at another subroutine.

This one starts with a label 'GetNextLine'... we know it's a label because it's not indented and ends in a colon... this is the name of the subroutine - we'll see the name with BL (Branch and Link) statements (calls on the ARM).

Then there is an ADD Command... it adds 160 to r10 (R10=R10+160)... it is indented, so it is clearly a command...

Finally there is a MOV PC,LR command - this ends a subroutine... BL transfers PC (the program counter... the current running byte) to LR - transferring LR back to PC returns to the command after the BL command
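
Here's a minimal sketch of that subroutine together with a call to it. The calling code and the 'infloop' label are assumptions added so the flow is complete... only 'GetNextLine' itself is described above.

```asm
    bl GetNextLine      ; LR = address of the next command, then jump to GetNextLine
infloop:
    b infloop           ; we arrive back here after the return

GetNextLine:
    add r10,r10,#160    ; r10 = r10 + 160
    mov pc,lr           ; copy LR back into PC: return to the caller
```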

if our code has a MOV PC,LR at the end (the ARM's 'RET') - it's a subroutine and should be started with a BL (the ARM's 'CALL')... if we start it with a plain B (a jump), LR won't hold a valid return address and something bad will probably happen!

  
ARM calls are very weird... CALL is called BL - and rather than push the PC (Reg R15 - Program counter) onto the stack, it moves it to LR (reg R14)
We can also return by popping a previously pushed LR back into PC
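
As a sketch of why that matters: a subroutine that calls another subroutine has to save LR first, because the inner BL overwrites it. The labels 'Outer' and 'Inner' are made up for this example, and it assumes the stack pointer has already been set up (for example by the template header).

```asm
Outer:
    stmfd sp!,{lr}      ; push LR (our return address) onto the stack
    bl Inner            ; this BL overwrites LR
    ldmfd sp!,{pc}      ; pop the saved return address straight into PC: return

Inner:
    add r0,r0,#1        ; some work
    mov pc,lr           ; no nested calls here, so LR is still valid
```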

Don't worry if you don't understand this yet - this info is just for those familiar with other CPUs- we'll cover it in more detail soon!

The power of ARM, and the limitations of RISC!
It's time to start loading data into 'registers'...
Registers are the small bits of memory in the processor we use to store values we want to perform calculations on...

The ARM has 16 registers R0-R15 but many have special purposes - when doing mathematical operations we need to limit our use to R0-R12

we load a value into a register with a MOVe command... the destination register is on the left of the comma - the value is on the right (starting with a #)
0x defines the value as a hexadecimal

All the registers are 32 bit, but due to the limitations of the instruction set, only 4 consecutive digits of the 8 hexadecimal digits can be nonzero - we'll learn more about this later.
We can see here the two registers have been set.
Because of  the RISC limitation this command will not compile - it has 5 digits that are nonzero.

it may seem weird we can't set all 8 hexadecimal digits (4 bytes) of a register in one go, but there are ways around this!

it all comes down to the way the instructions turn into bytes - each instruction is 4 bytes, and the MOV encoding itself only has room for an 8 bit value plus a rotation - larger values have to be built up by the assembler (or by us) out of more than one instruction

Hex,Dec,Binary and Asc Oh my!... also Adding and Subtracting.

We can load hexadecimal values (Base 16 - 0123456789ABCDEF) into registers by starting the value #0x....
we can use binary (Base 2 - 1's and 0s) by starting the value #0b...

Decimal values are just started #... unfortunately it seems VASM doesn't allow ascii characters as immediate values (They can be stored in BYTE string data, but not here)
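
A small sketch of the three number formats... the value 18 is just an example, and all three registers should end up holding the same thing.

```asm
    mov r0,#0x12        ; hexadecimal: 0x12 = 18
    mov r1,#0b00010010  ; binary: %00010010 = 18
    mov r2,#18          ; decimal
```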
Here's the result!
We've looked at loading numbers into registers, but MOV can also move one register into another..

In this example R2 will be moved into R0.... the destination is on the left of the comma, the source is on the right.
R0 will be set to the value that was in R2
Of course, we don't just have a MOVe command - we also have ADD and SUB for addition and subtraction!

The destination is the first parameter , the two values to be added are the second and third... for example:
    add r1,r0,#0x00000001
could be thought of as: R1=R0+#0x00000001

if we just want to change a value, the second and third parameters can be the same, for example: add r0,r0,#1 - or they can be different, for example: add r0,r1,#1
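
To tie MOV, ADD and SUB together, here's a short sketch with the expected register contents in the comments. The starting value 5 is arbitrary.

```asm
    mov r2,#5           ; r2 = 5
    mov r0,r2           ; register to register: r0 = 5
    add r1,r0,#1        ; r1 = r0 + 1 = 6 (r0 is unchanged)
    add r0,r0,#1        ; r0 = r0 + 1 = 6
    sub r0,r0,#2        ; r0 = r0 - 2 = 4
    sub r3,r1,r2        ; r3 = r1 - r2 = 6 - 5 = 1
```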

Before we learned we could not load #0x12345678 directly into a register, however we can do this in two parts, loading the first 4 digits with MOV - then adding the other 4 with ADD
the changes to the registers are shown here
Reading and writing 32bit values to RAM
MOV is good for setting registers from fixed values or other registers, but it's not what we need for working with RAM

For this example we'll define a 32 bit 'long' in ram called 'TestVal'

We'll use LDR to load from the testvalue... with LDR, the Destination register is on the left, and the Source address is on the right...

with STR, the Source register is on the left, and the Destination address is on the right...


LDR and STR load and save 32 bit values (the entire length of the register)

We loaded in 0xFEDCBA98 from RAM with LDR

Added 1
and wrote it back as 0xFEDCBA99 with STR
USER Ram is defined with a SYMBOL
Like a label, a symbol is a text name which is replaced by the assembler for a number

we use .EQU to define a symbol, in this case we're setting ramarea to 0x02000000 (this is the GBA version)
If we want to write to the address in a register, we put the register in Square Brackets (Eg [R1] )

We can load the address of a label like 'testval' with the ADR command... this will transfer a label address into a register
We can then use LDR to read in from that register.

If the address is in a symbol not a label like 'userram', ADR will not work - in this case we just use MOV to transfer the address into our register
we can use STR with that address to save back to that address
We load the address of TestVal into R2 with ADR

We then loaded R1 from [R2] with LDR

Next we loaded the address of 'UserRam' into R2 with MOV

We gave R0 a new value with MOV and store R0 back to [R2] with STR
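
The steps above can be sketched like this. The symbol name 'UserRam' and the stored value 0x12 are assumptions for the example... the address 0x02000000 is the GBA one mentioned earlier.

```asm
    .equ UserRam,0x02000000 ; symbol: a name for a number (GBA RAM address)

    adr r2,TestVal          ; r2 = address of the label TestVal
    ldr r1,[r2]             ; r1 = the 32 bit value stored at that address
    mov r2,#UserRam         ; a symbol is just a number, so MOV can load it
    mov r0,#0x12            ; a new value for r0
    str r0,[r2]             ; write all 32 bits of r0 to the address in r2
infloop:
    b infloop

TestVal:
    .long 0xFEDCBA98        ; 32 bit test value
```

MOV works here because 0x02000000 is a single byte (0x02) at an even bit position... an awkward address would need building up in parts.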


The example here shows data is stored by the ARM in 'Little Endian' format... meaning the lowest value byte in a 32 bit register is stored first... and the highest is stored last.

This is basically always the case with the ARM - however the ARM CPU can actually also work in Big Endian mode.

Reading and writing Byte 8bit values to RAM
The previous LDR and STR worked with 32 bit registers... but we'll often want to work with bytes,

The ARM allows this with a LDRB and STRB command - they work the same as the other commands, but just load a single byte
We loaded in a byte from TestVal with LDRB... Note that the 24 unused bits of the register changed to 0

We then added 255 - causing the value in R1 to expand out of a single byte...

We then save back with STRB - because we used a byte command, only the low byte was saved
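
A sketch of those byte operations, assuming 'TestVal' holds 0xFEDCBA98 and lives in writable RAM (on the GBA a value inside the ROM can be read but not written back).

```asm
    adr r2,TestVal      ; r2 = address of TestVal
    ldrb r1,[r2]        ; load one byte: r1 = 0x00000098 (top 24 bits zeroed)
    add r1,r1,#255      ; r1 = 0x98 + 0xFF = 0x00000197 (no longer fits in a byte)
    strb r1,[r2]        ; store only the low byte: 0x97
```

The byte read is 0x98 because the ARM stores the lowest byte of a 32 bit value first.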

LDR and STR work with 32 bit values... LDRB and STRB work at 8 bit...But what about 16 bit? well LDRH and STRH (H=Half) will load and save 16 bit...
but these commands only exist on later processors, the Gameboy Advance uses them fine - but RiscOS can't use them!
Because the ARM is 32 bit, a WORD is 32 bits on arm, rather than 16 bit like on the Z80 or 68000
VASM uses the statement '.long' to define a 32 bit value - but a LONG on the ARM would typically be 64 bit.

To avoid confusion the terms WORD and LONG won't be used in these tutorials - the length will be referred to in bits instead

Lesson 2 - Addressing modes and rotation on the ARM
We learned a few simple commands last time, but now it's time to start getting serious!
The ARM has many clever ways of addressing memory - and has something called a 'Barrel Shifter' - We'll learn what that is soon...
Lets learn about each addressing mode on the arm!


1. Immediate - direct numeric values
We've already come across this!... Immediate addressing is where the values are numbers stored directly in the code
In this example The value is transferred into the register.

The size of the immediate value depends on the command, sometimes it can be 16 bit, others it can be only 8 bit... Though it can be shifted - so 0x0000FF00 is a valid 8 bit option.
The results are shown - our registers now contain the requested values
2. Register - Data from other registers
Register addressing is far less exciting than it sounds... it's just where a parameter is taken from the value in a register.
Here we've set R1 to the value in R2, then R2 to the value of R1+R2
These are both examples of register addressing
3. Register indirect - Address is in register
Register Indirect is where the register holds an address, and that address is the source of the value for the command..

The register is wrapped in square brackets eg [r2]
We can load with LDR or save with STR

The value in R0 has been read and written into the address in R2
4. Register indirect with constant offset - Direct numeric values
As well as using the value of a register as the address, we can use the register plus a fixed offset.

the Offset is put in the square brackets [] after a comma

This is useful if our register points to a bank of settings, and the offsets point to individual settings in that bank.

To make things easier, we can define symbols and use those as the offset... also notice the offset can be negative
R2 points to the start of the data bank...

we read in R0 from the base+4 - R1 from the base+8 - and R2 from the base-4 (not shown in the ram dump)
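
Here's a sketch of a 'bank of settings' read with fixed offsets. The label 'Bank', the symbol names and the data values are all made up for the example, and R3 is used for the last read so the base in R2 stays intact.

```asm
    .equ Setting1,4         ; hypothetical offsets into the bank
    .equ Setting2,8

    adr r2,Bank             ; r2 = start of the data bank
    ldr r0,[r2,#Setting1]   ; read from base+4: r0 = 0x33333333
    ldr r1,[r2,#Setting2]   ; read from base+8: r1 = 0x44444444
    ldr r3,[r2,#-4]         ; negative offset, base-4: r3 = 0x11111111
infloop:
    b infloop

    .long 0x11111111        ; the value just before the bank
Bank:
    .long 0x22222222,0x33333333,0x44444444
```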
5. Register indirect with register offset - Address in sum of two registers
Rather than a fixed offset from the address, we can use the value in a register... effectively the resulting address is the sum of the two registers
The registers will be loaded from their respective offsets.
6. Register indirect with Preincrement - Increase register and Get from address in register
There will be times when we want to read a sequence of bytes in a loop - we'll probably want to read in using a register - then increase the address specified by that register, so we read the following data in the next loop iteration

The ARM can do this for us... just put a ! at the end of the command, and the address register (R1) will go up by the offset #4 BEFORE each read.
The First Value was read without preincrement - each other was done with a preincrement of 4... notice how R1 goes up by 4 each time
Just like before, the 'Increment' can actually be negative - so you can read backwards sequentially as well as forwards!
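
A sketch of preincrement in action, with the expected values in the comments. The label 'Table' and its contents are assumptions for the example.

```asm
    adr r1,Table        ; r1 = address of the first value
    ldr r0,[r1]         ; no preincrement: r0 = 0x11, r1 unchanged
    ldr r0,[r1,#4]!     ; r1 = r1+4 FIRST, then read: r0 = 0x22
    ldr r0,[r1,#4]!     ; r1 = r1+4 again: r0 = 0x33
    ldr r0,[r1,#-4]!    ; negative 'increment': r1 = r1-4, r0 = 0x22
infloop:
    b infloop

Table:
    .long 0x11,0x22,0x33,0x44
```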

Isn't the ARM great!??

7. Register indirect with Postincrement - Get from address in register and Increase register
If we want to increase the register AFTER the read, we can do this too... instead of putting the offset inside the square brackets [] and a ! - we just put the offset OUTSIDE the square brackets (no ! required)
The First Value was read without postincrement... the second used postincrement but this also read from the same address... the others also used postincrement, and these loaded from successive addresses
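
The same idea with postincrement, following the sequence described above. Again 'Table' and its contents are made up for the example.

```asm
    adr r1,Table        ; r1 = address of the first value
    ldr r0,[r1]         ; no postincrement: r0 = 0x11
    ldr r0,[r1],#4      ; read FIRST (still 0x11), then r1 = r1+4
    ldr r0,[r1],#4      ; r0 = 0x22, then r1 = r1+4
    ldr r0,[r1],#4      ; r0 = 0x33, then r1 = r1+4
infloop:
    b infloop

Table:
    .long 0x11,0x22,0x33,0x44
```

Postincrement is usually the handier of the two for loops, as the first read needs no special treatment.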

8. Program Counter Relative - label relative to current code
PC Relative allows us to load directly from a label near to the current running code,

we don't need to know what the PC is, the assembler works it out for us
The specified addresses are loaded into the registers.

9. Register Shifted - Value of a register bit shifted
Many CPUs have rotation and shifting commands which will perform bit shifts on a register - but the ARM is special, bit shifts can be performed on the value of a register with virtually any command!
There are no dedicated rotate commands - in this case we just rotate a register's value and move that result into another register.
LSL/LSR
We have two 'Logical shift commands'... these are designed to work with unsigned numbers

Shifting Right with LSR essentially halves the number, shifting left with LSL effectively doubles - of course they can also be used to move bits around!
We loaded R0 with our test value and shifted 8 bits to the left into R1... the top '8' got pushed out and was lost

We then shifted 8 bits to the right in R2 - the 2 went off the right hand side, and was lost
ASR is 'Arithmetic shift Right' - this effectively shifts bits to the right like LSR - however it's designed to work with negative values, and will copy the top bit to the freed up bits to allow negative numbers to be halved
Here is the result... in R2 the top byte has changed to FF as all the bits are 1
ROR is Rotate right - unlike the other commands which push the bits out of the register, ROR will rotate them around again, so any bits pushed off the right will be moved to the left of the register.
We rotated by 8 bits right - this pushed 02 off the right side, and onto the left side... the remaining bytes 800010 moved to the right
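
Here's a sketch of all four shifts, assuming a test value of 0x80001002 (which matches the results described above). The test value is loaded from a nearby label because it won't fit in a single MOV... the register choices are arbitrary.

```asm
    ldr r0,TestVal          ; r0 = 0x80001002
    mov r1,r0,lsl #8        ; r1 = 0x00100200 (top '80' pushed out and lost)
    mov r2,r0,lsr #8        ; r2 = 0x00800010 (bottom '02' lost, zeros fill the top)
    mov r3,r0,asr #8        ; r3 = 0xFF800010 (top bit copied into the freed bits)
    mov r4,r0,ror #8        ; r4 = 0x02800010 (bottom '02' wraps round to the top)
    mov r6,#4               ; a shift amount held in a register
    mov r5,r0,lsl r6        ; r5 = 0x00010020 (shifted left by the value in r6)
infloop:
    b infloop

TestVal:
    .long 0x80001002
```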
We don't just have to use immediate values - we can use a register value as the shift amount!
Here's the result - the value in R3 was used as the shift amount
RRX is the last option - this rotates the rightmost bit into the Carry bit - and any value that was in the Carry bit is moved to the topmost bit.

RRX can only rotate by 1 bit, also there is no left rotate.

we have to add an S to the mov command - making it movs - otherwise the Carry bit won't be set
The TestValue was shifted 1 bit to the right into R1... this pushed a bit 1 into the Carry...
This Carry bit was then shifted into R2 - Making the top byte C0 (%11000000)
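
A sketch of RRX picking up the Carry. To keep the result predictable, the Carry is set first with a MOVS and an ordinary 1 bit shift... the test value 0x80000001 is an assumption for the example.

```asm
    mov r0,#0x80000000      ; build the test value 0x80000001...
    orr r0,r0,#1            ; ...with top and bottom bits set
    movs r1,r0,lsr #1       ; r1 = 0x40000000, bit 0 pushed out: Carry = 1
    mov r2,r0,rrx           ; r2 = Carry in top bit + (r0 >> 1) = 0xC0000000
```

The second MOV has no S, so it uses the Carry but leaves the flags alone.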

10. Register indirect with scaled Register offset - Value of a register bit shifted
Because these bit shifts can be used with many other commands, we can use them to multiply a parameter for a register indirect offset -

In this example we've shifted R1 left twice - effectively making our formula [R2 + (R1*4) ]
As we increase and decrease R1, the address we read from will change accordingly.
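
A sketch of a table lookup using a scaled offset: R1 is an entry number, and LSL #2 turns it into a byte offset for 32 bit entries. 'Table' and its contents are made up for the example.

```asm
    adr r2,Table            ; r2 = base address of the table
    mov r1,#2               ; entry number 2 (counting from 0)
    ldr r0,[r2,r1,lsl #2]   ; read from r2 + (r1*4): r0 = 0x33
infloop:
    b infloop

Table:
    .long 0x11,0x22,0x33,0x44
```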

Lesson 3 - Labels, Branch CMP
We've learned how to do mathematics and how to move data in and out of memory,
Next we need to learn how to add conditions and branches - these will make up our loops, and our program logic.
Unlike most systems, on ARM conditions can apply to most commands, not just branching operations!
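
As a small taste of conditions on a non-branch command, here's a sketch that makes R0 positive without any jump. CMP and the LT condition are covered later in this lesson... the choice of R0 is arbitrary.

```asm
    cmp r0,#0           ; set the flags from r0 - 0
    rsblt r0,r0,#0      ; ONLY if r0 was less than 0: r0 = 0 - r0
```

If R0 was already zero or positive the RSB is simply skipped.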


Flags on the ARM
CPU flags are set by mathematical operations and allow us to check if the result of an operation was zero, or if any bits were pushed out of a register by a rotate command or addition.

On most CPU's the flags are set automatically however this is not the case on the ARM

the ARM cpu will generally only set the flags when we add a S to the end of our command - this causes the flags to be set by the command
The Add commands caused the value in R0 to roll over back to zero

The first add command did not end in S so the flags did not change

The second add command was addS - ending in S... this tells the cpu to set the flags - the Carry flag is set because the register overflowed, the Z flag was set because the current value of R0 is zero
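
A sketch of the difference between ADD and ADDS. MVN (move NOT) is used here to get 0xFFFFFFFF into R0, which is an assumption about how to load it... the value can't be given to a plain MOV as a single immediate.

```asm
    mvn r0,#0           ; r0 = NOT 0 = 0xFFFFFFFF
    add r0,r0,#1        ; r0 rolls over to 0... but no S, so the flags don't change
    mvn r0,#0           ; r0 = 0xFFFFFFFF again
    adds r0,r0,#1       ; r0 = 0, and the S sets the flags: Zero=1, Carry=1
```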

We're going to look at some examples of these flags and condition codes - but really you should try them yourselves!

You'll notice commented out code (starting ;) - these are alternative tests you can do to see the conditions in action - Ideally you should try them yourselves, but they'll all be shown on the video!

Carry: CS/CC
The Carry flag is set when a register's value exceeds the limits of 32 bit - for example when we add 1 to 0xFFFFFFFF,

It will also be set by rotate commands that push a bit out of the register like RRX

We're going to use a Branch command with a condition code to test for the carry... BCC will Branch if Carry is Clear... BCS will branch if Carry is Set

Condition Codes:
CS = Carry Set
CC = Carry Clear
The Carry flag was set, so the BCS occurred, showing a C to the screen
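
A sketch of a carry test. Rather than printing a character, this version just puts 1 or 0 into R1 so it doesn't depend on any print routine... the label names are made up.

```asm
    mvn r0,#0           ; r0 = 0xFFFFFFFF
    adds r0,r0,#1       ; overflows past 32 bits: Carry is set
    bcs CarryWasSet     ; Branch if Carry Set
    mov r1,#0           ; only reached if the Carry was clear
    b Done
CarryWasSet:
    mov r1,#1           ; reached here: r1 = 1
Done:
    b Done              ; stop here
```

Changing the ADDS value to #0 should leave the Carry clear and take the other path.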

Zero: EQ/NE
The Zero flag is set whenever a mathematical operation results in zero - either because of a subtraction, an addition or overflows, or other operation that results in a register containing zero... it's also set when a compare operation is performed on two registers with the same value - as the difference is zero.

We'll use BEQ (Branch if Equals) and BNE (Branch if Not Equals)

EQ - Equals (Zero)
NE - Not Equals (Not Zero)
The Zero flag was set (because the difference between the two registers was zero)
This caused the jump to occur, and the = was shown.
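A small sketch of the same idea, with hypothetical labels and a marker value in R2 standing in for showing the = on screen:

```asm
	mov r0,#5
	mov r1,#5
	cmp r0,r1          ; 5-5=0 ... Zero flag set, registers unchanged
	beq ValuesMatch    ; Branch if EQual ... taken here
	mov r2,#0          ; we only get here if the values were different
	b CompareDone
ValuesMatch:
	mov r2,#1          ; marker: the values were equal
CompareDone:
```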

Unsigned Numbers: CS/CC/HI/LS
Unsigned mathematics (which does not use negative numbers) uses 4 comparisons - two we've already seen!
The CMP command affects the flags like a 'subtraction' command, but does not alter any registers.

There are four conditions

>= CS - Carry Set
< CC - Carry Clear
> HI - Higher (Carry set and Zero Clear)
<= LS - Lower or same (Carry Clear or Zero Set)

Because negative numbers have a 1 as the top bit, they will be treated as very large numbers by these conditions - we need to use other conditions to test signed numbers
The Zero and Carry flag will be set depending on the values compared
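To see the 'very large' effect, here is a sketch (labels are made up for the example) comparing 0xFFFFFFFF to 1 as unsigned numbers:

```asm
	mvn r0,#0          ; R0 = 0xFFFFFFFF (4294967295 if unsigned)
	mov r1,#1
	cmp r0,r1          ; no borrow needed ... Carry set, Zero clear
	bhi R0IsHigher     ; HIgher (unsigned >) ... taken here
	mov r2,#0          ; we only get here if R0 <= R1
	b UnsignedDone
R0IsHigher:
	mov r2,#1          ; marker: R0 was higher
UnsignedDone:
```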

Signed Numbers: GE/LT/GT/LE
Because of the way negative numbers work in assembly, we need to use 4 different conditions for comparing signed numbers.
There are four conditions

>= GE - Greater or Equal (N set and V set, or N clear and V clear)
< LT - Less Than (N set and V clear, or N clear and V set)
> GT - Greater Than (Z clear, and either N set and V set, or N clear and V clear)
<= LE - Less than or Equal (Z set, or N set and V clear, or N clear and V set)
The jumps will occur according to the flags... the flag-rules are pretty complex for these, but the commands are easy to use.
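As a sketch, here are the same two values treated as signed numbers, where 0xFFFFFFFF means -1 (the labels are hypothetical):

```asm
	mvn r0,#0          ; R0 = 0xFFFFFFFF = -1 when treated as signed
	mov r1,#1
	cmp r0,r1          ; -1-1=-2 ... N set, V clear
	blt R0IsLess       ; Less Than (signed <) ... taken, as -1 < 1
	mov r2,#0          ; we only get here if R0 >= R1
	b SignedDone
R0IsLess:
	mov r2,#1          ; marker: R0 was less
SignedDone:
```

With BHI in place of BLT the branch would also be taken, but for the opposite reason - unsigned, R0 is the bigger number!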

Positive / Negative Numbers: PL/MI
There may be times we need to simply know if a number is positive or negative, the N flag does this for us...

We can use two special conditions to do this

PL - PLus: Positive or zero (Negative clear)
MI - MInus: Negative (Negative set)
The N flag is set according to the top bit of the result
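A quick sketch of a sign test - note the ARM condition code for negative is MI (MInus). The labels are made up for this example:

```asm
	mov r0,#1
	subs r0,r0,#2      ; R0 = -1 (0xFFFFFFFF) ... top bit is 1, so N is set
	bmi IsNegative     ; Branch if MInus ... taken here
	mov r1,#0          ; we only get here if the result was positive or zero
	b SignDone
IsNegative:
	mov r1,#1          ; marker: the result was negative
SignDone:
```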

Overflow: VS/VC
Overflow occurs when the limit of a signed number is breached and a positive number incorrectly flips to a negative (or vice versa)

A signed 32 bit number cannot contain more than +2147483647 (0x7FFFFFFF) or less than -2147483648 (0x80000000)... when it tries to, the top bit will flip, and the value will become invalid...

Overflow is designed to allow this to be detected... we have two conditions:

VS - oVerflow Set
VC - oVerflow Clear
The jump will occur according to the V flag
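Here's a sketch that pushes a 32 bit signed number over its positive limit (hypothetical labels again):

```asm
	mvn r0,#0x80000000 ; R0 = 0x7FFFFFFF ... the biggest positive signed value
	adds r0,r0,#1      ; R0 = 0x80000000 ... the sign flipped, so V is set
	bvs Overflowed     ; Branch if oVerflow Set ... taken here
	mov r1,#0          ; we only get here if the result was still valid
	b OverflowDone
Overflowed:
	mov r1,#1          ; marker: signed overflow occurred
OverflowDone:
```

Notice the Carry flag is NOT set here - as an unsigned addition, 0x7FFFFFFF+1 fits in 32 bits just fine.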

Always/Never: AL/NV
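These are pretty useless, but they do technically exist... one that always happens, and one that never does! AL is what we get when we don't specify a condition at all.

AL - jump ALways
NV - jump NeVer (this should be avoided on later ARM versions, where its bytecode was reused for other commands)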

Conditions everywhere!
While Conditions on branch commands exist on all CPUs, the ARM has something really special!

Conditions can be attached to most commands!

Just add a condition code (like EQ or CC) to the end of a command, and it will only run if the condition is met - this allows for conditional code without branching.
Here is the result

Some commands work with these condition codes, and others don't! Check out the cheatsheet for the full details!
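As a sketch of conditional commands (the register choices are just for illustration), here's a comparison that sets a marker with no branches at all, followed by a one command 'absolute value':

```asm
	mov r0,#5
	mov r1,#5
	cmp r0,r1          ; set the flags
	moveq r2,#1        ; only runs if R0 = R1 ... so R2 = 1 here
	movne r2,#0        ; only runs if R0 <> R1 ... skipped here

	mvn r3,#4          ; R3 = -5
	cmp r3,#0
	rsblt r3,r3,#0     ; only runs if R3 < 0 ... R3 = 0-R3 = 5
```

The skipped commands still take up space and a little time, but there are no jumps to disrupt the flow of the program.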

Lesson 4 - The Stack... and SWI
We've learned how to save values in memory - but what about if we want to store a value for a very short time?

We need a temporary store, and that's where the stack comes in!


'Stacks' in assembly are like an 'In tray' for temporary storage...

Imagine we have an In-Tray... we can put items in it, but only ever take the top item off... we can store lots of paper - but have to take it off in the reverse order to the way we put it on!... this is what a stack does!

If we want to temporarily store a register - we can put its value on the top of the stack... but if we store several, we have to take them off in reverse order...


The stack will appear in memory, and the stack pointer goes DOWN with each push on the stack...

As the ARM is 32 bit, we'll push onto the stack 32 bits (4 bytes) at a time... so if the stack pointer starts at $2000 and we push one register, it will point to $1FFC


Pushing and Popping the stack
There are no dedicated PUSH / POP commands for the stack on the ARM - and technically any register can be used as the stack pointer... though by convention SP is R13

To move an item onto the stack we use: str ??, [sp, #-4]!  ... this is our PUSH command
To take an item off the stack we use: ldr ??, [sp], #4  ... this is our POP command

In this example, we'll load R0 with a value, push it onto the stack, change R0, then restore the pushed value from the stack

We'll view the registers and stack at each stage
The test value was loaded into R0 - Pushed onto the stack... then recovered into R0
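The steps above can be sketched like this, assuming SP already points to some free RAM (the test value is just an example):

```asm
	mov r0,#0x12       ; our test value
	str r0,[sp,#-4]!   ; PUSH ... SP=SP-4, then R0 is stored at the new SP
	mov r0,#0          ; change R0
	ldr r0,[sp],#4     ; POP ... R0 is loaded from SP, then SP=SP+4
	                   ; R0 = 0x12 again, and SP is back where it started
```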
We can nest pushes... The important thing to understand is that we pop off in the reverse order to the way we pushed them on...

We can also push a value in R0 onto the stack, and pop it off in R1
Because the stack moves down in memory, the second value appears before the first in ram.
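Here's a sketch of nested pushes, again assuming SP points to free RAM. It pops into different registers so we can see the order the values come back in:

```asm
	mov r0,#1
	mov r1,#2
	str r0,[sp,#-4]!   ; push the 1
	str r1,[sp,#-4]!   ; push the 2 ... it sits lower in memory than the 1
	ldr r2,[sp],#4     ; last in, first out ... R2 = 2
	ldr r3,[sp],#4     ; R3 = 1
```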

Pushing Multiple items with STMFD and LDMFD
We can push and pop multiple items with STMFD and LDMFD. We give the registers in curly brackets, as a comma list eg {r1,r2,r4} and/or a range {r1-r4,r6}

The order we put the registers in the list doesn't affect the order they are pushed onto the stack - the lowest numbered register always ends up at the lowest memory address.

But of course if we pop them off into different registers, things could go wrong!
The items will be pushed onto the stack and popped off in one go!
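A minimal sketch, assuming SP points to free RAM. The ! after SP tells the CPU to update the stack pointer as it goes:

```asm
	stmfd sp!,{r0-r2,r4}   ; push R0,R1,R2 and R4 ... SP moves down 16 bytes
	mov r0,#0              ; we can now trash the registers
	mov r1,#0
	ldmfd sp!,{r0-r2,r4}   ; pop all four back ... SP is restored too
```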
As well as the typical STMFD and LDMFD there are other options!

We can have an Ascending or Descending stack (Descending is typical)

We can also have a 'Full' stack (where stack pointer points to last pushed item) or 'Empty' stack (where pointer points to next empty item)
Direction Type Push Pop
Descending Full STMFD LDMFD
Ascending Full STMFA LDMFA
Descending Empty STMED LDMED
Ascending Empty STMEA LDMEA

The Stack with Branch and Link (BL)
As we learned, Branch and Link copies the return address (the address of the command after the BL) from the Program Counter (PC) into the Link Register (LR)

When we perform a RETurn, the assembler actually creates a MOV PC,LR command...

Because we need the LR to be intact to return, we need to back it up somehow if we're nesting subroutines - a second BL would overwrite it...

The easiest solution is to push it onto the stack, and pop it back into the PC...

Alternatively, we could transfer it into another register
Here are the changes to the stack and Link Register
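As a sketch of nesting (the subroutine names are made up, and SP is assumed to point to free RAM), the outer routine saves LR before it calls the inner one:

```asm
	bl OuterSub        ; LR = address of the next command
Halt:
	b Halt             ; stop here when OuterSub returns

OuterSub:
	str lr,[sp,#-4]!   ; push our return address
	bl InnerSub        ; this BL overwrites LR!
	ldr pc,[sp],#4     ; pop the saved address straight into PC ... we return

InnerSub:
	mov r0,#1          ; do some work
	mov pc,lr          ; no nesting here, so LR is still intact
```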

System calls with SWI
SWI stands for SoftWare Interrupt...

Like the RSTs of the Z80 and the TRAPs of the 68000 these are often used for OS calls... On RiscOS there are a variety of SWIs...

To use a SWI we use the SWI command followed by the number of the call we want (the number can be up to 24 bits)...

What the SWI does and what parameters need to be passed will depend on the system, you'll need to consult the documentation of that system for details.
We called the show string function, then the end program function
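As a rough sketch for RiscOS only: this assumes the standard SWI numbers 0x02 (OS_Write0, which shows a zero terminated string pointed to by R0) and 0x11 (OS_Exit), and assumes the assembler supports the ADR and .ascii commands. Check your system's documentation before relying on it!

```asm
	adr r0,Message     ; R0 = address of our string
	swi 0x02           ; OS_Write0 ... show the string at R0
	swi 0x11           ; OS_Exit ... end the program and return to the OS

Message:
	.ascii "Hello World!"
	.byte 0            ; the string ends at a zero byte
```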
If you're programming the Gameboy Advance then you'll probably never need SWI... these tutorials use the firmware as little as possible, so you won't see it much in those either...

If you're using the firmware though, you'll have to check the manual for Risc-OS, and beware! there are different versions for later Risc OS versions!

Lesson 5 - More Maths!
We're nearly done... but we need to look at operations that work at the bit level, and a few other important commands... lets take a look!


Logical Operations on bits.
We have four kinds of logical operations we can perform on bits.

AND = Return 1 where both parameters are 1 - else 0
ORR = (OR) Return 1 where either parameter is 1 - else 0
EOR = (Exclusive OR, often called XOR) Flip bits in first parameter where second parameter is 1
BIC = (BIt Clear) Zero bits in first parameter where second parameter is 1
The results are shown here
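Here is a small sketch with example values, so we can check the results by hand. Note that on the ARM the exclusive OR command is written EOR:

```asm
	mov r1,#0xCC       ; %11001100
	mov r2,#0xAA       ; %10101010
	and r0,r1,r2       ; R0 = 0x88 (%10001000) ... bits set in both
	orr r0,r1,r2       ; R0 = 0xEE (%11101110) ... bits set in either
	eor r0,r1,r2       ; R0 = 0x66 (%01100110) ... bits of R1 flipped where R2 is 1
	bic r0,r1,r2       ; R0 = 0x44 (%01000100) ... bits of R1 zeroed where R2 is 1
```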

Test Operations TST / TEQ

We have two commands which work like logical operations - but they do not change the contents of the registers - they just change the flags.

TST = effectively ANDs the two parameters, setting the flags accordingly
TEQ = effectively EORs (XORs) the two parameters, setting the flags accordingly

There are two special commands, MRS and MSR, for working with the flags directly - we'll look at those next!
Here is the result!
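A sketch of two typical uses, with example values: testing a single bit with TST, and checking if two registers match with TEQ:

```asm
	mov r0,#6          ; %00000110
	tst r0,#1          ; 6 AND 1 = 0 ... Z set, so bit 0 is clear (R0 is even)
	moveq r2,#1        ; runs because Z is set ... R0 is still 6

	mov r1,#6
	teq r0,r1          ; 6 EOR 6 = 0 ... Z set, so the registers are the same
	moveq r3,#1        ; runs because Z is set
```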

Backing up flags with MRS / MSR

*These commands only exist on later ARM versions (ARM v3 onwards)*
If we want to back up the flags, we can do so with these two commands... the flags are in the register 'CPSR'... we can transfer this to or from another register!
MRS will move the flags to a register, backing them up
MSR will move the flags from a register, restoring them

Using Carry for 64 bits!

There may be times when even 32 bits isn't enough - when we do an ADDition or SUBtraction that goes over the limit of a 32 bit register, we can use special commands to pass the carry on to a second register - the two registers together will give us 64 bits!

ADC adds a parameter plus any carry to the top register.
SBC subtracts a parameter and any borrow from the top register.

In either case, we need to do an ADDS or SUBS to the low register first - the S means the flags are set, if we don't do this, the carry will never be set
Here are the results, when the bottom register over/under flowed, the top register was altered to compensate for the carry/borrow
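As a sketch, here R1 and R0 together hold one 64 bit number (R1 is the top half, R0 the bottom half - the choice of registers is just for this example):

```asm
	mvn r0,#0          ; bottom half = 0xFFFFFFFF
	mov r1,#1          ; top half = 0x00000001
	adds r0,r0,#1      ; bottom half rolls over to 0 ... Carry set
	adc r1,r1,#0       ; top half = 1+0+carry = 2

	subs r0,r0,#1      ; bottom half goes back to 0xFFFFFFFF ... a borrow occurs
	sbc r1,r1,#0       ; top half = 2-0-borrow = 1
```

On the ARM a borrow is shown by the Carry flag being CLEAR after a subtraction, but SBC handles this for us.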

Multiplication

The ARM has two multiply commands

MUL - MUltiplies two parameters together.

MLA - MuLtiplies two parameters and Adds a third
The results of the two operations are shown here

3*2=6
... (3*2)+1=7
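The two calculations above could be sketched like this (the registers are chosen just for the example... on early ARMs the destination register should not be the same as the first source register):

```asm
	mov r1,#3
	mov r2,#2
	mov r3,#1
	mul r0,r1,r2       ; R0 = 3*2 = 6
	mla r0,r1,r2,r3    ; R0 = (3*2)+1 = 7
```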

Negative and reversed commands

We have some special commands, which reverse the order of the parameters

RSB (Reverse SuB)  is like SUB - except whereas SUB R0,R1,R2 will set R0=R1-R2, RSB will set R0=R2-R1... there is a carrying version called RSC

If we want to flip all the bits of a value we can use MVN R0,R1 (MoVe Not) - This will set R0 to the inverse of R1, which is -R1-1... to make a true negative (R0=-R1) we can use RSB R0,R1,#0

If we want to compare a register to a negated register we can use CMN R0,R1 (CoMpare Negative)... this is the equivalent of CMP R0,-R1
We performed a 64bit reversed subtract, Moved an inverted value, and compared a negative
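A short sketch with example values. Note that MVN flips the bits, so its result is one less than the true negative:

```asm
	mov r1,#5
	rsb r0,r1,#0       ; R0 = 0-5 = -5 (0xFFFFFFFB) ... a true negate
	mvn r0,r1          ; R0 = NOT 5 = -6 (0xFFFFFFFA)

	mvn r2,#4          ; R2 = NOT 4 = -5
	cmn r2,r1          ; like CMP R2,-R1 ... -5 and -5 match, so Z is set
	moveq r3,#1        ; runs because Z is set
```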

ARM4+ only... 16 bit Move (HalfWord), Swap Ram<->Register

This tutorial primarily covers ARM2, but there are a few later commands that are really good to know...

The first are LDRH and STRH - these (like LDR/STR) are load and store commands - however these work at the HalfWord (16 bit) level... they're handy for the Gameboy Advance screen!

Another interesting command is SWP - this transfers the value at a RAM address to a register, and a register to the same RAM address... The Source/Destination registers can be the same or different.
We loaded in a Half (16 bit)... then stored the modified Half back to ram

We Swapped the ram into R0 and R1 into Ram
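Here's a sketch of both, assuming a Gameboy Advance, where 0x02000000 is the start of a block of work RAM - on other systems use any address you know is free RAM:

```asm
	mov r2,#0x02000000 ; address of some free RAM (GBA work RAM)
	ldrh r0,[r2]       ; load a HalfWord (16 bits) ... the top 16 bits of R0 are zero
	add r0,r0,#1       ; modify it
	strh r0,[r2]       ; store the bottom 16 bits of R0 back to RAM

	mov r1,#0x77
	swp r0,r1,[r2]     ; R0 = the word that was in RAM ... and RAM now holds R1
```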
We've covered all the basic ARM2 commands - there are many more in the later revisions, but we won't be covering them at this time.

We've looked at enough to get started with RiscOS or the Nintendo Gameboy Advance!


 

If you want to support my work, please consider backing me on patreon!









