Learn Multi platform Super-H Assembly Programming... Because Why not?

The Super-H is a series of processors developed by Hitachi, and is now distributed by Renesas.

The Super-H is probably best known for its use in the Sega consoles: the 32X (SH-2), the Saturn (SH-2) and the Dreamcast (SH-4). It was also used in some Pocket PCs.
The Super-H is also known as the SH7600 series (SH-2) and the SH7700 series (SH-3).
There is also an open source implementation of the SH-2, known as the J-Core.

We'll only be covering the SH-2 in these tutorials, and we'll use the 32X emulator for our testing!

If you want to learn the SH-2, get the Cheatsheet! It has all the Super-H commands, it will help you get started with ASM programming, and it lets you quickly look up commands when you get confused!

We'll be using ASW as our assembler for these tutorials
You can get the source and documentation for ASW from the official website HERE

Useful Documents

SH7000 7600 Series Super H RISC Engine Programming Manual 1994

SH-1/SH-2/SH-DSP Software Manual

The SH-2 Registers
All SH2 registers are fully 32 bit.

There are 16 general purpose registers, plus a few special registers which only a limited set of commands can access.

General Purpose Registers:

R0	General Purpose; also the index for @(R0,Rn) and @(R0,GBR) addressing, and a fixed operand for some instructions
R1	General Purpose
R2	General Purpose
R3	General Purpose
R4	General Purpose
R5	General Purpose
R6	General Purpose
R7	General Purpose
R8	General Purpose
R9	General Purpose
R10	General Purpose
R11	General Purpose
R12	General Purpose
R13	General Purpose
R14	General Purpose
R15 / SP	Stack Pointer

Special Registers:

SR	Status Register (Flags)
GBR	Global Base Register
VBR	Vector Base Register (TRAP / Exception Processing base)
MACH	Multiply And Accumulate High value
MACL	Multiply And Accumulate Low value
PR	Procedure Register (Return address used By JSR/BSR and RTS)
PC	Program Counter (current instruction + 4)
MOD	Modulo Register (SH-DSP only)
RS	Repeat Start (SH-DSP only)
RE	Repeat End (SH-DSP only)
DSR	DSP Status Register (SH-DSP only)
A0	(SH-DSP only)
X0	(SH-DSP only)
Y0	(SH-DSP only)
X1	(SH-DSP only)
Y1	(SH-DSP only)

MACH was only 10 bit on the SH-1 CPU (SH7000)

MACH is fully 32 bit on the SH-2 CPU (SH7600)

Status Register (Flags) bits

Bit	31-10	9	8	7	6	5	4	3	2	1	0
Flag	-	M	Q	I3	I2	I1	I0	-	-	S	T

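T = True/Test bit: holds the result of compares and tests, and acts as the carry/borrow for commands like ADDC, SUBC, ROTCL and ROTCR
S = Saturation mode for the MAC (Multiply and Accumulate) commands
I3-I0 = Interrupt mask bits
M and Q = used by the division step commands (DIV0S, DIV0U and DIV1)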

Depending on the model, the Super-H can run in Big or Little endian mode!

On the 32X and Saturn it runs in Big Endian, like the 68000

Addressing Modes

Mode	Format	Notes	Example
Direct register addressing	Rn	The effective address is register Rn. (The operand is the contents of register Rn.)	mov r0,r1
Indirect register addressing	@Rn	The effective address is the content of register Rn	mov.l @r5,r0
Post-increment indirect register addressing	@Rn+	The effective address is the content of register Rn. Rn is incremented by the amount loaded (B/W/L = 1/2/4)	mov.l @r5+,r1
Pre-decrement indirect register addressing	@-Rn	First, Rn is decremented by the operand size (B/W/L = 1/2/4); the effective address is the new value of Rn. (Stores only)	mov.b r3,@-r5
Indirect register addressing with displacement	@(disp:4,Rn)	The effective address is Rn plus a 4-bit displacement (disp). The value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.	mov.l @(4,r5),r2
Indirect indexed register addressing	@(R0,Rn)	The effective address is the Rn value plus R0. (Rn + R0)	mov r0,@(r0,r5)
Indirect GBR addressing with displacement	@(disp:8,GBR)	The effective address is the GBR value plus an 8-bit displacement (disp). The value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.	mov.l @(8,gbr),r0
Indirect indexed GBR addressing	@(R0,GBR)	The effective address is the GBR value plus R0. (GBR + R0)	and.b #1,@(r0,gbr)
PC relative addressing with displacement	@(disp:8,PC)	The effective address is the PC value plus an 8-bit displacement (disp). The value of disp is zero-extended, and is doubled for a word operation, or quadrupled for a longword operation. For a longword operation, the lowest two bits of the PC are masked.	mov.l @(4,pc),r1
PC relative addressing (8 bit)	disp:8	The 8-bit displacement (disp) is sign-extended, doubled, and added to the PC. (PC + disp * 2)	bt SkipD
PC relative addressing (12 bit)	disp:12	The 12-bit displacement (disp) is sign-extended, doubled, and added to the PC. (PC + disp * 2)	bsr ShowB
PC relative addressing (Register)	Rn	The branch target is the PC plus the value in Rn. (PC + Rn)	bsrf r0
Immediate addressing	#imm:8	The immediate is zero-extended for TST, AND, OR and XOR; sign-extended for MOV, ADD and CMP/EQ; and zero-extended and quadrupled for TRAPA.	mov #-100,r0
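The displacement rules in this table are easy to mix up, so here is a small C model of the address arithmetic. This is a sketch, not SH-2 code: all function names are hypothetical, `size` is the operand size in bytes (1/2/4 for .B/.W/.L), and `instr_addr` is the address of the instruction itself, so the PC value the CPU uses is `instr_addr + 4`, as in the register list.

```c
#include <stdint.h>

/* @(disp:4,Rn): disp is zero-extended and scaled by the operand size. */
uint32_t ea_disp4_rn(uint32_t rn, unsigned disp4, unsigned size)
{
    return rn + (disp4 & 0xFu) * size;
}

/* @(disp:8,GBR): the same idea with an 8-bit displacement. */
uint32_t ea_disp8_gbr(uint32_t gbr, unsigned disp8, unsigned size)
{
    return gbr + (disp8 & 0xFFu) * size;
}

/* @(disp:8,PC) for .W and .L: PC = instr_addr + 4, and for longwords the
   low two bits of PC are masked off before adding disp * 4. */
uint32_t ea_disp8_pc(uint32_t instr_addr, unsigned disp8, unsigned size)
{
    uint32_t pc = instr_addr + 4;
    if (size == 4)
        pc &= ~(uint32_t)3;
    return pc + (disp8 & 0xFFu) * size;
}

/* Branch targets: disp (8 bits for BT/BF, 12 bits for BRA/BSR) is
   sign-extended, doubled and added to PC = instr_addr + 4. */
uint32_t branch_target(uint32_t instr_addr, uint32_t disp, unsigned bits)
{
    int32_t d = (int32_t)(disp & ((1u << bits) - 1u));
    if (d & (1 << (bits - 1)))
        d -= (int32_t)(1u << bits);        /* sign-extend */
    return instr_addr + 4 + (uint32_t)(d * 2);
}
```

Note how the same 4-bit field reaches 15 bytes for .B but 60 bytes for .L. The longword PC-relative case is also what MOVA @(disp,PC),R0 puts into R0.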

Branch Delay Slots

JMP, BRA, BRAF, JSR, BSR, BSRF, RTE and RTS all have a delay slot after them, meaning the instruction after them is executed before the jump takes effect! If that sounds annoying (which it is!), just put a NOP after these commands!

BF/S and BT/S also have a delay slot... that's what the /S means!

BF and BT do not have a delay slot.

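There are also no load delay slots on the Super-H: if an instruction uses a register loaded by the instruction just before it, the CPU simply waits until the value is ready.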

Lesson 1 - Getting Started with the SuperH
Let's start learning about the SH-2 (or SH-3)! We'll learn how to do simple maths operations, and how to transfer data to and from memory.


Lesson1.asm

A template program

To allow us to get started programming quickly and see the results, we'll be using a 'template program'. This consists of 3 parts:
A Header - this includes the hardware initialization to get things in a usable state.
The Program - this is the body of our program, where we do our work.
A Generic Footer - this includes core graphics routines, and our 'monitor' debugging tools.
The test program will show a text string, then dump all the system registers, and finally show a memory area on the screen. It will compile for either the Saturn or the 32X, as the include files have code to do the same screen drawing on both systems. These tools are designed for testing and debugging the SH-2, and we'll use them in our tutorials!

The DevTools on this website come with headers that let this program assemble for the 32X or Saturn; without them, this program won't assemble.

It takes a lot more code to get either of these machines to even turn on the screen!

Commands, Labels and jumps

Let's take a look at a simple program!

There will be times we need to jump around the code. The simplest way to do this is the command 'BRA', which will BRAnch (like Jump or Goto) to another position in the code. Notice that commands like this are indented by a tab.

Notice! There is a NOP command after the branch. NOP (No OPeration) doesn't do anything, but we need one after a BRA because of the branch 'delay slot', which we'll explain in a moment.

Notice the line which is not indented and ends with a colon : - that makes it a label called 'InfLoop'. Labels tell the assembler to 'name' this position in the program; the assembler converts the label to an address in the executable, so we don't need to worry what number that ends up being.

You'll also notice text in green starting with a semicolon ; - this is a comment (REMark). Comments have no effect on the code.

Why do we put a NOP after BRA and BSR? These commands have a 'Delay Slot': the instruction immediately after the branch is executed before the branch takes effect! So the NOP after BRA InfLoop is actually executed before the branch.
We use a NOP so we don't have to worry about it, as a NOP does nothing.
This may sound like a bug, but it's not: it makes the processor more efficient if we take advantage of it. We're not worried about speed, so for clarity and simplicity we won't use the delay slots for anything other than a NOP.

Loading values into registers

Let's start with some simple loading of registers. Registers R0-R14 are available for our general use. We can load a value into a register with the MOV command (MOVe). The source parameter is on the left of the comma, and the destination register is on the right. Here we specified an Immediate (fixed number value) by putting a hash symbol # at the start of the number.
Here are the results: we loaded R0, R1 and R2. Note, the numbers are in 'Hexadecimal', so they don't look the same as the decimal values, but we can check them with Windows Calculator and confirm they are the same: $7F=127, $FFFFFFFF=-1, $FFFFFF80=-128.
Let's try some more Immediates!
A number on its own is Decimal.
A number starting with a percent symbol % is Binary.
A number starting with a dollar symbol $ is Hexadecimal.
Characters in quotes ' are ASCII.
Here are the results - each register was loaded with the value - all shown here in HEX
Because SH-2 commands assemble to 16 bit code, there isn't much space for immediate values in commands: only -128 to +127 can be stored in an assembled MOV command. Many of the values we just specified were larger, but the assembler worked things out for us: it stored the values nearby in the code, and made the MOV command load them from there with a PC relative address. We specify the location where these values can be stored with the command 'LTORG'. The stored value has to come after the MOV and fairly close to it (a longword can be at most 1020 bytes beyond the PC). We don't need to worry too much about this; just remember to put an LTORG command near your code (there can be multiple in a file; the assembler uses the next one to store the values), especially if you get errors relating to your immediate commands!
Here's the assembled result... notice the $66606660 we used!
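The results shown earlier ($7F = 127, $FFFFFFFF = -1, $FFFFFF80 = -128) come from the 8 bit immediate being sign-extended. Here is a minimal C model of that rule, plus a simplified version of the range check that decides whether a constant can go straight into MOV #imm or has to live in the LTORG literal pool. The function names are hypothetical, and the check ignores other encodings an assembler might choose, such as a 16 bit MOV.W literal.

```c
#include <stdint.h>
#include <stdbool.h>

/* What MOV #imm,Rn leaves in Rn: the 8-bit field is sign-extended. */
uint32_t mov_imm8(uint8_t field)
{
    return (field & 0x80u) ? (0xFFFFFF00u | field) : field;
}

/* Simplified check: can this constant be encoded directly in MOV #imm,Rn
   (range -128..+127), or must it be placed in a literal pool? */
bool fits_mov_imm8(uint32_t value)
{
    return value <= 0x7Fu || value >= 0xFFFFFF80u;
}
```

A constant like $66606660 fails the check, which is why it ends up in the literal pool.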
MOV doesn't just load registers with immediates! MOV can transfer the value from one register to another. Again, the destination register is on the RIGHT, the source is on the LEFT
We copied R0 to R1 and R2, then we copied R1 to R14 (not shown).
R15 is a special register - it acts as the Stack Pointer. This has a special purpose, so we shouldn't just use it as a general register - we'll learn more later. For clarity R15 can be referred to as SP. Here we copy the value FROM R15/SP, but we don't copy TO it, as that would mess things up!
The SP register will be copied to R0 and R1 The value you will see will vary depending on the system.
Let's try loading values from memory. First we need to load a memory address. We've defined two labels with values, TestValue and TestValue2, and we load their addresses with a MOV.L command. The .L suffix defines the command as Long (32 bits).
We load values from the address in a register with the @ prefix:
mov.l @r5,r0 tells the CPU to load a 32 bit Long from the address in register R5 into R0
mov.w @r5,r1 tells the CPU to load a 16 bit Word from the address in register R5 into R1
mov.b @r4,r2 tells the CPU to load an 8 bit Byte from the address in register R4 into R2
Actually MOV defaults to .L, so mov @r5,r3 does the same as mov.l @r5,r3. Most commands default to .L, but for clarity we may wish to specify .L when loading 32 bits.
Here we loaded all the values. Notice the Byte in R2, and the Word in R1, were sign extended, meaning the 'extra bits' were filled with the top bit of the loaded value, making the 32 bit value's sign the same as the Byte/Word
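Here is a minimal C model of what the three load sizes do, assuming a big-endian memory image (as on the 32X and Saturn) held in a plain byte array. `mem`, `addr` and the function names are hypothetical stand-ins for memory and the address register.

```c
#include <stdint.h>

/* mov.b @Rn,Rm: one byte, sign-extended to 32 bits */
uint32_t load_b(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = mem[addr];
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}

/* mov.w @Rn,Rm: two bytes, most significant first, sign-extended */
uint32_t load_w(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = ((uint32_t)mem[addr] << 8) | mem[addr + 1];
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}

/* mov.l @Rn,Rm: four bytes, most significant first, no extension needed */
uint32_t load_l(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | mem[addr + 3];
}
```

With the bytes $81,$02,$03,$04 in memory, a long load gives $81020304, a word load from the start gives $FFFF8102 (top bit set, so the upper half fills with Fs), and a byte load gives $FFFFFF81.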
The SuperH can do more than just simple loads with @: @r4+ will load a value from the address in register R4, then increment R4 by the amount loaded. This is known as post-increment.
Here we loaded 4 consecutive bytes from r4. We'll learn more about addressing modes in the next lesson.

On the 32X and Saturn, the SuperH is a BIG ENDIAN CPU, meaning it stores the most significant byte of a 32 bit value first in memory.
On Super-H models that support it, the CPU can instead be set to run in LITTLE ENDIAN mode, where the least significant byte is stored first!

Word reads and writes must be aligned to even (2 byte) addresses, and Long reads and writes to addresses that are a multiple of 4. Bytes can be loaded or stored anywhere.

Warning! The Fusion 32X emulator allows incorrectly aligned W/L accesses, which would fail on real hardware. The Saturn emulator Yabause does 'correctly fail' with these misaligned addresses!!

Addition and Subtraction

We can add or subtract registers!
add r2,r0 will add R2 to R0, storing the result in R0.
sub r2,r1 will subtract R2 from R1, storing the result in R1.
We can also use immediates! add #1,r0 will add 1 to R0. There is no SUB # command, but we can use ADD with a negative immediate (ADD #-n) instead.
Here are the results. Note: $FFFFFFFF is the hexadecimal representation of -1

Branches, Jumps and Subs

There will be many times we want to call subroutines to do work for us and return, like a GOSUB in BASIC. We can use BSR to Branch to a SubRoutine; the return address is put in the special PR register (Procedure Register). Branches use relative addresses and can't branch too far (BSR reaches roughly ±4KB). If we need to call somewhere further away we can use JSR, but the destination address must be loaded into a register first. Note: Both BSR and JSR have a delay slot, meaning the command after the JSR/BSR is executed BEFORE the jump; we've put a NOP in this slot to make things simple.
Here are the results
This time we'll use the delay slot... we've put "mov #'?',r0" AFTER the first branch to Printchar
Even though the command was after the call to the subroutine, it happened before - because the BSR was delayed by one command
We finish our subroutine with an RTS command (ReTurn from Subroutine). RTS also has a delay slot. But there's a problem! We want to call other subroutines within this subroutine, but this will cause the loss of the return address in the PR register. We can back up and restore the PR register via the stack with STS (STore System register) and LDS (LoaD System register); we'll learn more about these commands later.
We may want to skip to another part of our code, without a subroutine call (Like a GOTO) we can use BRA (BRAnch) or JMP (JuMP) to do this.
Here are the results
You probably won't need these, but for completeness we'll discuss them. Branches use relative addresses, so the code can be relocated, but they can only branch short distances. BSRF and BRAF can branch any distance, but we have to calculate the offset relative to the program counter. We can get the address of the current line with the assembler's * symbol, but the program counter is always a few bytes ahead of the current line of code, so we add 6.
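The offset arithmetic for BSRF/BRAF can be sketched in C as follows. This is a model, not SH-2 code: the function names are hypothetical, and it assumes the PC seen by the branch is the BSRF/BRAF address plus 4, as listed in the register table.

```c
#include <stdint.h>

/* BRAF/BSRF Rn jump to PC + Rn, where PC = address of the BRAF/BSRF + 4. */
uint32_t bsrf_target(uint32_t bsrf_addr, uint32_t rn)
{
    return bsrf_addr + 4 + rn;
}

/* The value to load into Rn so a BSRF at bsrf_addr reaches target.
   Unsigned wrap-around gives the right bit pattern for backward jumps. */
uint32_t bsrf_offset(uint32_t bsrf_addr, uint32_t target)
{
    return target - (bsrf_addr + 4);
}

/* BSRF (like BSR) saves PC = bsrf_addr + 4 in PR; RTS returns there,
   which is just past the delay slot instruction. */
uint32_t bsrf_return_pr(uint32_t bsrf_addr)
{
    return bsrf_addr + 4;
}
```

If the instruction that loads the offset sits directly before the BSRF, at address A, then the BSRF is at A+2 and the offset works out to target - (A + 6), which is presumably where the 'add 6' comes from.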

Lesson 2 - Addressing Modes
We've done some simple stuff, but now let's take a look at all the addressing modes available.

These represent the possible source, or destination of the data as we process our commands

Lesson2.asm

Immediate addressing (#imm:8)

We've seen Immediate addressing before! This uses an 8 bit immediate value stored in the instruction itself. The immediate starts with a hash symbol #. Most commands use a signed immediate, giving a range of -128 to +127. However, logical operations (AND / OR / XOR etc.) use unsigned numbers, giving a range of 0-255. With immediates, logical operations only work with R0.
Here are the results

Direct register addressing (Rn)

Register addressing is really the simplest addressing: the source parameter is just the value in another register. Remember, the source parameter is on the left of the comma, and the destination is on the right!
Here are the results

Indirect register addressing (@Rn)

Indirect register addressing uses the address in a register as the source or destination. The register is prefixed by an at symbol @. In this example, we use indirect addressing as the source of a read, and the destination of a write.
Here are the results

Post-increment indirect register addressing (@Rn+)

We can use the address in a register, and then update that address by adding the amount loaded. We use the syntax @Rn+; the fact that the + is AFTER the register shows the register is changed after the load.

This is known as a Post Increment. It can be used for loading data from sequential addresses, and also for popping values off the stack (we'll look at that later). On the SH-2, post-increment is only used for loads.

For example, if R5=$1000 and we load a 4 byte longword, then R5 will be changed to $1004.

Here are the loaded values.

Pre-decrement indirect register addressing (@-Rn)

Related to Post Increment is Pre Decrement. BEFORE writing, we decrement the register by the amount stored. This is most useful for pushing values onto the stack (we'll look at that later). On the SH-2, pre-decrement is only used for stores. For example, if R5=$1000 and we store a 4 byte longword, then before the store R5 will be changed to $FFC.
Here are the results
We can combine Post Increment and Pre Decrement for stack operations. Here we push items onto the stack to back them up, then pop them off to restore them.
Here are the results.
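Here is a minimal C model of pushing with pre-decrement and popping with post-increment. It assumes a tiny block of longword RAM starting at address 0, with SP holding a byte address; the type and function names are hypothetical. The demo uses one ordering consistent with the lesson's description: R6, R7 and R8 are pushed, then popped back into R7, R8 and R6.

```c
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: a small block of longword RAM at address 0, and R15. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t sp;                      /* R15 / SP, a byte address */
} Sh2Stack;

/* mov.l Rm,@-sp : decrement SP by 4 first, then store */
static void push(Sh2Stack *s, uint32_t value)
{
    s->sp -= 4;
    s->mem[s->sp / 4] = value;
}

/* mov.l @sp+,Rn : load first, then increment SP by 4 */
static uint32_t pop(Sh2Stack *s)
{
    uint32_t value = s->mem[s->sp / 4];
    s->sp += 4;
    return value;
}

typedef struct { uint32_t r6, r7, r8; } Regs;

/* Push R6, R7, R8, then pop into R7, R8, R6: R7 and R8 come back swapped,
   while R6 (first in, last out) gets its own value back. */
Regs push_pop_demo(Regs in)
{
    Sh2Stack s = { .sp = STACK_BYTES };   /* empty stack: SP at the top */
    Regs out;
    push(&s, in.r6);
    push(&s, in.r7);
    push(&s, in.r8);
    out.r7 = pop(&s);                      /* last in (old R8) comes out first */
    out.r8 = pop(&s);                      /* old R7 */
    out.r6 = pop(&s);                      /* old R6 */
    return out;
}
```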

Indirect register addressing with displacement (@(disp:4,Rn))

A very powerful way of using registers is to have a base plus an offset.

For example, our register could point to player data, offset 4 could be the Xpos, offset 8 could be the Ypos, and offset 12 could be the remaining lives.

By changing the base register from player 1 to player 2, the same code could seamlessly handle both players.

Indirect register addressing with displacement does this via a base register with an immediate offset. For Example @(4,r5) will load from the address in R5 plus 4

We can work with bytes or words too, however the source or destination register must be R0 in those cases.

Here are the results

Indirect indexed register addressing (@(R0, Rn))

Rather than an immediate value, we can use the value in R0 as an offset to the base register. This is known as Indirect indexed register addressing. Of course this can be used for storing values as well as reading them.
Here are the results

PC relative addressing with displacement (@(disp:8,PC))

We can specify an address as a relative offset to the Program Counter (the current instruction's address plus 4) with PC relative addressing with displacement. Here we specified a fixed offset, but really we'll probably never use this command in this way. When we specify an immediate that's more than 8 bits, it's stored at the LTORG by the assembler, and the assembler calculates a PC relative displacement to that LTORG.
Here are the results.
If we want to calculate the resulting address, and store that Effective Address in a register we can use MOVA (MOVe Address) Here the final address of (4,pc) is loaded into R0
Here are the results

We've covered all the important addressing modes, the remaining ones are a bit obscure, or will be used by the assembler without you knowing about it!

Still, for completeness, we'll cover them here.

PC relative addressing (disp:12 / disp:8 / Register)

This isn't used by Load or Store operations, only by Branches.

BSR and BRA use a 12 bit relative offset.

Conditional branches like BT (Branch if True) use an 8 bit offset

These offsets are calculated by the assembler.

The final option (which you'll probably never use) is to use a register as a relative offset for a subroutine branch, but as the offset must be calculated relative to the program counter, it's not very useful.

Here are the results

Indirect GBR addressing with displacement (@(disp:8,GBR))
& Indirect indexed GBR addressing (@(R0,GBR))

The Global Base Register is intended for use when addressing peripheral data. It's not an addressing mode you're likely to need. We can specify an offset as an 8 bit immediate, or, with logical operations, the R0 register can be used as an offset for the destination.

Lesson 3 - Conditions, Compares, Stack and Special Regs
We've learned the basics of reading and writing data, but of course we'll want to make decisions based on the contents of that data.

We'll also take a proper look at the Stack, and learn how to access the values in special registers.

Lesson3.asm

To get the benefit from this lesson, you'll want to try downloading it and running it yourself!

You'll want to try different values and compares to see how the results change to ensure you really understand what things do and why.

False or True?

Conditions on the SuperH all come down to False or True, held in the T flag (True flag). We can alter the value of the T flag directly with SETT or CLRT, and we can read the value in the T flag into another register with MOVT.
Here are the results! In this case we set the T flag with SETT
We can branch conditionally based on the T flag. BT (Branch if True) will branch if T=1, and BF (Branch if False) will branch if T=0. Unlike BRA, BT and BF have no delay slot.
In this case the T flag was set so the BT branch occurred.
DT stands for Decrement and Test: it subtracts 1 from a register, and sets T if the result is zero. Combined with BF, we can use it to form a loop! Here we use R0 as our loop counter.
Here are the results, the loop occurred 4 times.
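A C model of a DT/BF counted loop, assuming the loop body comes before the DT and BF branches back while T is 0; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* DT Rn: decrement Rn, then set T if the result is zero. */
static bool dt(uint32_t *rn)
{
    *rn -= 1;
    return *rn == 0;
}

/* Counts the passes of a "body; DT R0; BF loop" style loop. */
unsigned count_dt_loop(uint32_t r0)
{
    unsigned passes = 0;
    bool t;
    do {
        passes++;           /* loop body */
        t = dt(&r0);        /* DT R0 */
    } while (!t);           /* BF loop: branch back while T = 0 */
    return passes;
}
```

Starting with R0 = 4 gives 4 passes. Starting with R0 = 0 would wrap to $FFFFFFFF and run about 4 billion times, so the counter must start at 1 or more.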
BT and BF do not have a delay slot... unless we want them to! bt/s and bf/s are the versions with a delay Slot! As usual, this delay slot means the command AFTER the branch will occur BEFORE the branch!
In this case, the T flag was zero, so the false branch occurred - AFTER we set R0 to 'F'

Compare with CMP/cc

To set the T flag via a comparison we need to use two values to compare, and a CMP/cc type command.

We need to ensure we use the correct condition, depending if our values are signed or unsigned.

Here we used an Unsigned comparison to see if R1 is HIgher than R0 .. "CMP/HI R0,R1"... it is, so the branch to PrintGT occurred

There are a wide variety available:

Command	Description	Condition
CMP/EQ Rm,Rn	CoMPare if EQual	if Rn = Rm then T=1 else T=0
CMP/GE Rm,Rn	CoMPare if Greater than or Equal (Signed)	if Rn >= Rm then T=1 else T=0
CMP/GT Rm,Rn	CoMPare if Greater Than (Signed)	if Rn > Rm then T=1 else T=0
CMP/HI Rm,Rn	CoMPare if HIgher (Unsigned)	if Rn > Rm then T=1 else T=0
CMP/HS Rm,Rn	CoMPare if Higher or Same (Unsigned)	if Rn >= Rm then T=1 else T=0
CMP/PL Rn	CoMPare if PLus (Signed)	if Rn > 0 then T=1 else T=0
CMP/PZ Rn	CoMPare if Plus or Zero (Signed)	if Rn >= 0 then T=1 else T=0
CMP/STR Rm,Rn	CoMPare STRing	if a byte in Rn matches the same positioned byte in Rm then T=1 else T=0
CMP/EQ #imm,R0	CoMPare if EQual (Signed)	if R0 = #imm then T=1 else T=0
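Signed versus unsigned is the part that usually trips people up. Here is a C model of the compares, written so that `rm` and `rn` follow the operand order of `CMP/xx Rm,Rn`. The function names are hypothetical, and each returns the value T would get.

```c
#include <stdint.h>
#include <stdbool.h>

/* Reinterpret a register as a signed 32-bit number. */
static int64_t as_signed(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

bool cmp_eq(uint32_t rm, uint32_t rn) { return rn == rm; }
bool cmp_hs(uint32_t rm, uint32_t rn) { return rn >= rm; }                       /* unsigned */
bool cmp_hi(uint32_t rm, uint32_t rn) { return rn > rm; }                        /* unsigned */
bool cmp_ge(uint32_t rm, uint32_t rn) { return as_signed(rn) >= as_signed(rm); } /* signed */
bool cmp_gt(uint32_t rm, uint32_t rn) { return as_signed(rn) > as_signed(rm); }  /* signed */
bool cmp_pz(uint32_t rn) { return as_signed(rn) >= 0; }
bool cmp_pl(uint32_t rn) { return as_signed(rn) > 0; }
```

With R0 = 1 and R1 = $FFFFFFFF, CMP/HI R0,R1 sets T (4294967295 is higher than 1), but CMP/GT R0,R1 clears it (-1 is not greater than 1).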

PrintGT shows a > to the screen.
Try setting different values in R0 and R1, and using different conditions!

CMP/STR is an odd case!

If any byte of the first parameter matches the byte in the same position of the second parameter, T will be set.

In this case '22' is in both strings in the same place so T is set

'11' is in both strings, but in different places, so would not set the T flag.
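CMP/STR compares the four byte positions pairwise. In this short C model (the function name is hypothetical), XOR-ing the two registers leaves a zero byte wherever the bytes in the same position match.

```c
#include <stdint.h>
#include <stdbool.h>

/* CMP/STR Rm,Rn: T = 1 if any byte of Rm equals the byte in the same
   position of Rn. */
bool cmp_str(uint32_t rm, uint32_t rn)
{
    uint32_t x = rm ^ rn;                 /* matching bytes become 0 */
    for (int i = 0; i < 4; i++)
        if (((x >> (8 * i)) & 0xFFu) == 0)
            return true;
    return false;
}
```

For example $11223344 and $55226677 share $22 in the same position, so T is set; $11223344 and $55667711 both contain $11 but in different positions, so T is clear.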

Here is the result!

The Stack

There will be many times that we need to back up and restore registers for a period of time. This is where the stack comes in! R15 is our Stack Pointer; we can use the alias SP for clarity.
To back up (PUSH) a value we use "mov ??,@-sp".
To restore (POP) a value we use "mov @sp+,??".
The Super-H stack is LIFO (Last In, First Out). It's like an in-tray: the last thing we put on the top of our in-tray is the first thing we take out.
In this example, we restored the values in R7,R8 in the reverse order we backed them up, so the values will be the same, but in different registers. R6 was backed up first, and restored last, so it's in the right place!
Subroutines are a common time we'll want to use the stack, especially if we want to call a subroutine within our subroutine (nested subs). We'll need to back up the PR register, as it holds the return address; we use STS and LDS to do this (for example STS.L PR,@-SP to push it, and LDS.L @SP+,PR to pop it).
We loaded the return address into R7 so we could confirm what happened. When we called the sub, the Return address (PR) was pushed onto the stack before R6

System Registers and Control Registers

WARNING! Control registers affect the whole system (and on later CPUs like the SH-3, most of them are privileged)... That means you probably shouldn't change them in your actual code!

Also MOD, RS, RE, DSR, A0, X0, X1, Y0 and Y1 are for the SH-DSP only, so they don't exist on the SH-2/3.

We've been forced to mess with the PR register so we can back it up during subs, so let's take a look at the commands to work with System and Control registers.
LDS and STS can load and store the values in PR (the return register) and MACH/MACL (Multiply and ACcumulate High/Low).
LDC and STC are for SR (flags), GBR (Global Base Register, for GBR addressing) and VBR (Vector Base Register, used for traps).
Here are the results Note... This example won't run on the Saturn - it doesn't like us messing with the VBR!
We can use these commands to backup these special registers onto the stack, then restore them as required.

If you want to back up or restore multiple registers, you'll need multiple commands, one per register!

You'll probably want to create some macros to make it easier, and bulk copy the registers you'll want to push and pop most.

Lesson 4 - Logical Ops, Signs and Shifts
We've covered some basic maths, but there's lots more to do! This time we'll take a look at 'Logical Operations', Bit shifting commands, and a few commands to work with signed numbers.

Lesson4.asm

Logical Operations

Logical operations work at the bit level, applying a mask parameter to the destination register. The mask can be another register, or an 8 bit unsigned immediate, but if it's an immediate the destination must be R0. Because the immediate is zero-extended, AND #imm,R0 also clears the top 24 bits of R0.
Logical AND combines the source and destination bit by bit, stores the result in the destination, and leaves the source unaltered. Where a bit is 1 in both source and destination, the result will be 1; otherwise it will be 0. It can be effectively used to clear bits in the destination. Here we use "AND #$F0,R0" and "AND R3,R1"
Here are the results. All the 0 bits in the mask were cleared in the destination.
Logical OR combines the source and destination bit by bit, stores the result in the destination, and leaves the source unaltered. Where a bit in the source is 1, the bit in the destination will be set to 1. Where a bit in the source is 0, the bit in the destination will be unchanged. It can be effectively used to set bits in the destination. Here we use "OR #$F0,R0" and "OR R3,R1"
Here are the results. All the 1 bits in the mask were set in the destination.
Logical XOR combines the source and destination bit by bit, stores the result in the destination, and leaves the source unaltered. Where a bit in the source is 1, the bit in the destination will be flipped. Where a bit in the source is 0, the bit in the destination will be unchanged. It can be effectively used to invert bits in the destination. Here we use "XOR #$F0,R0" and "XOR R3,R1"
Here are the results. All the 1 bits in the mask were flipped in the destination.
TeST performs a Logical AND of the source and destination, and if the result is zero the T flag is set to 1; otherwise T is set to 0. Unlike AND, neither the source nor the destination is altered. It can be effectively used to test bits in the destination. Here we use "TST #$F0,R0" and "TST R3,R1"
The result of "tst r3,r1" was zero, so the T flag was set
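A C model of the immediate forms of the logical operations and TST, showing the zero-extension of the 8 bit immediate. The function names are hypothetical; `r0` stands for the R0 register.

```c
#include <stdint.h>
#include <stdbool.h>

/* AND/OR/XOR #imm,R0: the 8-bit immediate is zero-extended (0-255). */
uint32_t and_imm(uint8_t imm, uint32_t r0) { return r0 & imm; }
uint32_t or_imm(uint8_t imm, uint32_t r0)  { return r0 | imm; }
uint32_t xor_imm(uint8_t imm, uint32_t r0) { return r0 ^ imm; }

/* TST Rm,Rn (or TST #imm,R0): T = 1 when (source & destination) == 0.
   Neither operand is changed. */
bool tst(uint32_t rm, uint32_t rn) { return (rm & rn) == 0; }
```

Note that AND #$F0 on $12345678 gives $00000070, not $12345670, because the mask's upper 24 bits are zero.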
We can use the address in GBR, offset by R0, as the destination if we wish (the byte versions, such as AND.B, work with @(R0,GBR)). Here we've performed all the operations on the memory at TestAddr.
Here are the results.

Shifts and Rotates

SHift Arithmetic Left (SHAL) will shift the bits in register Rn left by 1 bit. The T flag will be the old top bit, and the new bottom bit is 0. SHift Logical Left (SHLL) does the same: the T flag will be the old top bit, and the new bottom bit is 0.
Here are the results. Both SHAL and SHLL actually have the same effect on the register! Shifting left effectively doubles the value in the register.
SHift Arithmetic Right (SHAR) will shift the bits in register Rn right by 1 bit. 'Arithmetic' means the sign is maintained as the right shift occurs: the T flag will be the old bottom bit, and the new top bit is the same as the previous top bit. SHift Logical Right (SHLR) will shift the bits in register Rn right by 1 bit. 'Logical' means this is intended for unsigned numbers: the T flag will be the old bottom bit, and the new top bit is 0.
Here are the results. Unlike SHAL and SHLL, the results differ: SHAR kept the sign the same (top bit = 1), so it can effectively halve signed numbers. SHLR did not; it can effectively halve unsigned numbers.
SHAL/SHAR and SHLL/SHLR only work one bit at a time. Logical shifts also have special commands to shift left or right by 2, 8 or 16 bits (SHLL2, SHLR2, SHLL8, SHLR8, SHLL16 and SHLR16). Commands for other bit amounts, or for multi-bit arithmetic shifts, are not available.
Here are the results.
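A C model of the one-bit shifts, returning both the new register value and the T flag. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t value; bool t; } ShiftResult;

/* SHLL Rn (SHAL is identical): old bit 31 into T, 0 into bit 0. */
ShiftResult shll(uint32_t rn)
{
    return (ShiftResult){ rn << 1, (rn >> 31) != 0 };
}

/* SHLR Rn: old bit 0 into T, 0 into bit 31. Halves an unsigned value. */
ShiftResult shlr(uint32_t rn)
{
    return (ShiftResult){ rn >> 1, (rn & 1u) != 0 };
}

/* SHAR Rn: old bit 0 into T, old bit 31 kept in bit 31. Halves a signed
   value, rounding towards minus infinity (-3 becomes -2). */
ShiftResult shar(uint32_t rn)
{
    return (ShiftResult){ (rn >> 1) | (rn & 0x80000000u), (rn & 1u) != 0 };
}
```

Applied to $FFFFFFF0 (-16), SHAR gives $FFFFFFF8 (-8), while SHLR gives $7FFFFFF8, a large positive number.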
Sometimes, rather than halving or doubling, we want to move bits around a register. We can do this with the Rotate commands: ROTL will rotate bits left around a register, and ROTR will rotate bits right around a register.
Here are the results, Every 4 rotates you'll see the digits move one left or right... that's because each digit is a nibble - 4 bits!
When a shift pushes a bit out of a register, that bit goes into the T flag, but the plain shifts never put it back into a register. ROTCL and ROTCR ROTate with Carry Left or Right through the T flag, meaning the T bit is moved back into the register. This makes it possible to combine two 32 bit registers, using the T flag to pass bits between them. We use CLRT to clear the T flag, so the new bits entering R1 are zero.
Here are the results. Bits move between R0 and R1 as the rotates occur
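A C model of shifting a 64 bit value held in two registers left by one bit with CLRT followed by two ROTCLs. `hi` and `lo` are hypothetical names for the two registers, and the type and function names are also hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t hi, lo; } Pair;

/* ROTCL Rn: T goes into bit 0, old bit 31 comes out into T. */
static uint32_t rotcl(uint32_t rn, bool *t)
{
    bool out = (rn >> 31) != 0;
    rn = (rn << 1) | (*t ? 1u : 0u);
    *t = out;
    return rn;
}

/* CLRT; ROTCL lo; ROTCL hi : shift the 64-bit hi:lo pair left by one. */
Pair shift_pair_left(Pair p)
{
    bool t = false;            /* CLRT, so a 0 enters the bottom of lo */
    p.lo = rotcl(p.lo, &t);    /* top bit of lo moves into T */
    p.hi = rotcl(p.hi, &t);    /* ...and from T into the bottom of hi */
    return p;
}
```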

Signs and Stuff!

If our register contains an 8 or 16 bit value, we may need to extend it to fill the full 32 bit register. If it's unsigned, we need to fill the extra bits with 0; we can do this with EXTU. If it's signed, we need to fill the extra bits with the top bit of the byte or word; we can do this with EXTS.
Here are the results. Notice EXTS filled the extra nibbles with Fs when the top bit was 1, and 0s when it was 0
There will be times we want to convert a positive number to a negative one (or back again): NEG will do this for us. If we want to negate a 64 bit pair, NEGC will carry the negation to a second register. Note: NEG does not set the T flag, so we need to use two NEGC commands (preceded by a CLRT) to correctly negate a 64 bit pair.
Here is the result.
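A C model of negating a 64 bit hi:lo pair with CLRT and two NEGCs. The borrow rule follows our reading of the NEGC description (T is set if either subtraction borrows); `hi`, `lo` and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t hi, lo; } Pair;

/* NEGC Rm,Rn: Rn = 0 - Rm - T, T = borrow out */
static uint32_t negc(uint32_t rm, bool *t)
{
    uint32_t temp = 0u - rm;
    uint32_t rn = temp - (*t ? 1u : 0u);
    *t = (temp > 0u) || (temp < rn);   /* borrow from either subtraction */
    return rn;
}

/* CLRT; NEGC lo,lo; NEGC hi,hi : negate the 64-bit hi:lo pair */
Pair neg64(Pair p)
{
    bool t = false;
    p.lo = negc(p.lo, &t);
    p.hi = negc(p.hi, &t);
    return p;
}
```

Negating 1 ($00000000:$00000001) gives $FFFFFFFF:$FFFFFFFF, which is -1 in 64 bits.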
NEG effectively flips all the bits and adds one - if we just want to flip the bits, we can use NOT
Here is the result

Lesson 5 - More Maths
We've covered lots of commands, but there's a few last ones we need to do.
Lets finish looking at the last of the maths commands

Lesson5.asm

Add and Subtract with Carry

The normal ADD and SUB commands do not set the T flag with any Carry, but we have special ones that do. We can use these to extend two registers to add or subtract in 64 bits. Here we use ADDC and SUBC - we use CLRT to zero the T flag first.
The carry between the two commands extended the addition and subtraction.
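A C model of 64 bit addition and subtraction with CLRT plus two ADDCs or two SUBCs. The type and function names are hypothetical; each pair stands for two registers.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t hi, lo; } Pair;

/* ADDC Rm,Rn: Rn = Rn + Rm + T, T = carry out */
static uint32_t addc(uint32_t rm, uint32_t rn, bool *t)
{
    uint64_t sum = (uint64_t)rn + rm + (*t ? 1u : 0u);
    *t = (sum >> 32) != 0;
    return (uint32_t)sum;
}

/* SUBC Rm,Rn: Rn = Rn - Rm - T, T = borrow out */
static uint32_t subc(uint32_t rm, uint32_t rn, bool *t)
{
    uint64_t diff = (uint64_t)rn - rm - (*t ? 1u : 0u);
    *t = (diff >> 32) != 0;            /* wrapped below zero: borrow */
    return (uint32_t)diff;
}

/* CLRT; ADDC b.lo,a.lo; ADDC b.hi,a.hi  (a + b) */
Pair add64(Pair a, Pair b)
{
    bool t = false;
    a.lo = addc(b.lo, a.lo, &t);
    a.hi = addc(b.hi, a.hi, &t);
    return a;
}

/* CLRT; SUBC b.lo,a.lo; SUBC b.hi,a.hi  (a - b) */
Pair sub64(Pair a, Pair b)
{
    bool t = false;
    a.lo = subc(b.lo, a.lo, &t);
    a.hi = subc(b.hi, a.hi, &t);
    return a;
}
```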

Add and Subtract with Overflow

Signed numbers have a limited range, and the sign may change by accident! $7FFFFFFF is the largest positive 32 bit signed number, and $80000000 is the most negative one, but in unsigned arithmetic they are only one apart! To detect this 'accidental sign change' we can use ADDV and SUBV, which set the T flag if an overflow occurred and the sign changed incorrectly.
Here are the results. The T flag shows when the value went 'wrong'
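A C model of the overflow test ADDV and SUBV perform, using the usual sign-bit rule. The function names are hypothetical; each returns the value T would get.

```c
#include <stdint.h>
#include <stdbool.h>

/* ADDV Rm,Rn: overflow when both inputs have the same sign and the
   result's sign differs from it. */
bool addv_overflow(uint32_t rm, uint32_t rn)
{
    uint32_t r = rn + rm;
    return ((~(rn ^ rm) & (rn ^ r)) & 0x80000000u) != 0;
}

/* SUBV Rm,Rn (Rn - Rm): overflow when the inputs have different signs
   and the result's sign differs from Rn's. */
bool subv_overflow(uint32_t rm, uint32_t rn)
{
    uint32_t r = rn - rm;
    return (((rn ^ rm) & (rn ^ r)) & 0x80000000u) != 0;
}
```

Adding 1 to $7FFFFFFF overflows to $80000000, so T is set; subtracting 1 from 0 gives -1, which is fine, so T is clear.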

Swapping parts and extracting bits!

We have some weird commands that may be useful!
SWAP.B will swap the two bytes in the low word of a register (the top word is unchanged).
SWAP.W will swap the two words in a Long.
XTRCT will take the middle 32 bits of a 64 bit register pair.
Here are the results
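A C model of the three commands. The function names are hypothetical, and for XTRCT Rm,Rn the pair is taken as Rm:Rn, with Rm as the high half.

```c
#include <stdint.h>

/* SWAP.B Rm,Rn: swap the two low bytes, keep the top word */
uint32_t swap_b(uint32_t rm)
{
    return (rm & 0xFFFF0000u) | ((rm & 0xFFu) << 8) | ((rm >> 8) & 0xFFu);
}

/* SWAP.W Rm,Rn: swap the two words */
uint32_t swap_w(uint32_t rm)
{
    return (rm << 16) | (rm >> 16);
}

/* XTRCT Rm,Rn: middle 32 bits of the 64-bit value Rm:Rn */
uint32_t xtrct(uint32_t rm, uint32_t rn)
{
    return (rm << 16) | (rn >> 16);
}
```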

Multiplication

If we have two 16 bit values we want to multiply, we can use the following:
MULU (also written MULU.W) will work with Unsigned numbers.
MULS (also written MULS.W) will work with Signed numbers.
The 32 bit result is stored in the special register MACL; we can access it via "STS MACL,Rn".
Here are the results
If we want to multiply two 32 bit numbers together, and just need a 32 bit result, we can use MUL.L. This will work for signed or unsigned numbers.
Here is the result
If we need a 64 bit result we can use the following:
DMULU.L will work for Unsigned numbers.
DMULS.L will work for Signed numbers.
These store the result in the special register pair MACH:MACL.
Here are the results.
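The difference between the two 64 bit multiplies shows up as soon as the top bit of an input is set. Here is a C sketch; `MACH`, `MACL` and the function names are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MACH and MACL registers */
static uint32_t MACH, MACL;

/* Reinterpret a 32 bit register value as signed (portable two's complement) */
static int64_t as_signed(uint32_t r)
{
    return (r & 0x80000000u) ? (int64_t)r - 0x100000000LL : (int64_t)r;
}

/* DMULU.L Rm,Rn: unsigned 32 x 32 -> 64 bit result in MACH:MACL */
static void dmulu_l(uint32_t rn, uint32_t rm)
{
    uint64_t product = (uint64_t)rn * rm;
    MACH = (uint32_t)(product >> 32);
    MACL = (uint32_t)product;
}

/* DMULS.L Rm,Rn: signed 32 x 32 -> 64 bit result in MACH:MACL */
static void dmuls_l(uint32_t rn, uint32_t rm)
{
    uint64_t product = (uint64_t)(as_signed(rn) * as_signed(rm));
    MACH = (uint32_t)(product >> 32);
    MACL = (uint32_t)product;
}
```

$FFFFFFFF × $FFFFFFFF gives $FFFFFFFE:$00000001 with DMULU.L, but $00000000:$00000001 with DMULS.L, because the signed version sees -1 × -1.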
If you need to multiply multiple values and sum the total, you can use MAC (Multiply and Accumulate). MAC / MAC.W will multiply the signed 16 bit values at the addresses in two registers, add the result to MACH and MACL, and increment the address in each register by 2. MAC.L will multiply the signed 32 bit values at the addresses in the two registers, add the result to MACH and MACL, and increment the address in each register by 4. First we will want to clear MACH and MACL with CLRMAC.
Here are the results
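A dot product is the classic use for MAC. This C sketch models CLRMAC followed by repeated MAC.W, assuming the SR S bit is clear so the 64 bit MACH:MACL total does not saturate; the register variables and function names are hypothetical, and the pointers stand in for the address registers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MACH and MACL registers */
static uint32_t MACH, MACL;

/* CLRMAC: 0 -> MACH, MACL */
static void clrmac(void) { MACH = 0u; MACL = 0u; }

/* MAC.W @Rm+,@Rn+ (assuming S = 0, no saturation): the signed 16 x 16 bit
   product is added to the 64 bit MACH:MACL pair, then both pointers move on
   by one 16 bit word (2 bytes). */
static void mac_w(const int16_t **rm, const int16_t **rn)
{
    int32_t product = (int32_t)**rn * (int32_t)**rm;
    uint64_t acc = ((uint64_t)MACH << 32) | MACL;

    acc += (uint64_t)(int64_t)product;  /* sign-extend, then wrap-around add */
    MACH = (uint32_t)(acc >> 32);
    MACL = (uint32_t)acc;
    (*rm)++;
    (*rn)++;
}
```

With the tables {1, -2, 3} and {4, 5, -6}, three MAC.W steps after CLRMAC leave 4 - 10 - 18 = -24 in MACH:MACL ($FFFFFFFF:$FFFFFFE8), and both pointers end just past their tables.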

Division

Division on the Super-H is rather odd compared to other systems! We use a combination of DIV0S, DIV0U and DIV1, but even these need other commands to make them work. We'll look at some sample usages, copied from the official manuals! Here are the commands to perform R1 (32 bits) / R0 (16 bits) = R1 (16 bits)... Unsigned
Here are the results
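To see why this odd sequence works, here is a C model of the unsigned 32 / 16 bit divide. It is a sketch, not the tutorial's code: `Q`, `M` and `T` are hypothetical stand-ins for the SR bits, `div1` is a condensed form of the DIV1 step as described in the SH-2 manual (a non-restoring division step), and `divu_32_16` follows the manual's unsigned example as we understand it (SHLL16 R0, zero and overflow checks, DIV0U, sixteen DIV1 R0,R1, ROTCL R1, EXTU.W R1,R1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SR bits used by the divide instructions */
static int Q, M, T;

/* DIV0U: 0 -> M, Q and T (set up for an unsigned divide) */
static void div0u(void) { Q = 0; M = 0; T = 0; }

/* DIV1 Rm,Rn: one non-restoring division step (condensed case table) */
static void div1(uint32_t rm, uint32_t *rn)
{
    int old_q = Q;
    int carry_or_borrow;
    uint32_t before;

    Q = (*rn & 0x80000000u) != 0;    /* bit shifted out of Rn */
    *rn = (*rn << 1) | (uint32_t)T;  /* shift in the previous quotient bit */

    before = *rn;
    if (old_q == M) {                /* remainder has the divisor's sign: subtract */
        *rn -= rm;
        carry_or_borrow = *rn > before;
    } else {                         /* otherwise add the divisor back */
        *rn += rm;
        carry_or_borrow = *rn < before;
    }
    Q = Q ^ carry_or_borrow ^ M;     /* sign of the new partial remainder */
    T = (Q == M);                    /* this step's quotient bit */
}

/* R1 (32 bits) / R0 (16 bits) = R1 (16 bits), unsigned */
static uint16_t divu_32_16(uint32_t r1, uint16_t divisor)
{
    uint32_t r0 = (uint32_t)divisor << 16;  /* SHLL16 R0: divisor in the top half */
    assert(r0 != 0u);                       /* zero division check */
    assert(r1 < r0);                        /* overflow check (CMP/HS R0,R1) */
    div0u();                                /* DIV0U */
    for (int i = 0; i < 16; i++)
        div1(r0, &r1);                      /* DIV1 R0,R1, repeated 16 times */
    r1 = (r1 << 1) | (uint32_t)T;           /* ROTCL R1 shifts in the last bit */
    return (uint16_t)r1;                    /* EXTU.W R1,R1: quotient */
}
```

Each DIV1 produces one quotient bit in T, which the next DIV1 (and finally ROTCL) shifts into the bottom of R1 as the dividend bits move up, so after 16 steps the low word of R1 holds the quotient; for example 100 / 7 gives 14.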
Here are the commands to perform R1:R2 (64 bits)/R0 (32 bits) = R2 (32 bits)... Unsigned
Here are the results
Here are the commands to perform R1 (16 bits)/R0 (16 bits) = R1 (16 bits)... Signed
Here are the results
Here are the commands to perform R2 (32 bits) / R0 (32 bits) = R2 (32 bits)... Signed
Here are the results

Rare commands... you probably won't need!

One of our last commands is somewhat strange, and you probably won't actually need it! TAS is Test and Set... It will test a byte in memory and set the T flag if that byte was zero; either way, it then sets the top bit of the byte to 1. It's intended for locking operations in multi-threaded or multi-CPU systems.
Here are the results.
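The locking idea can be sketched in C. `T` and `tas_b` are hypothetical names; on the real CPU, TAS.B is intended to do its read and write as one indivisible operation, whereas this plain C function is not atomic and only shows the data flow:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SR T flag */
static int T;

/* TAS.B @Rn: T = 1 if the byte was zero, then set bit 7 of the byte */
static void tas_b(uint8_t *rn)
{
    T = (*rn == 0u);
    *rn |= 0x80u;
}
```

Used as a lock flag, the first TAS.B on a zero byte sets T (the lock was free and is now taken); a second TAS.B finds $80 and clears T (someone already holds it).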
You probably won't need it either, but SLEEP will power down the CPU until an interrupt occurs.
This won't work on the Saturn! We can execute operating system traps, and even make our own. TRAPA will execute a trap address from the vector table pointed to by the VBR; we use TRAPA #n to execute one of the addresses in the table.
Here are the results.
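According to the summary table, TRAPA pushes PC and SR onto the stack and then loads PC from the longword at VBR + imm × 4. Only that address calculation is sketched in C below; the function name is hypothetical, and the stack push and jump are not modelled:

```c
#include <assert.h>
#include <stdint.h>

/* Address of the vector table entry used by TRAPA #imm: VBR + imm * 4
   (each entry is a 4 byte address) */
static uint32_t trapa_vector_address(uint32_t vbr, uint8_t imm)
{
    return vbr + (uint32_t)imm * 4u;
}
```

So TRAPA #$20 reads its jump address from VBR + $80.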

Traps may be used to call operating system functions on your machine, it depends what you're developing for...

The Saturn certainly doesn't like us trying to use them!

Instruction Set Summary

Opcode	Instruction	Function	Example
ADD Rm,Rn	ADD Binary	Rm + Rn → Rn	ADD R0,R1
ADD #imm,Rn	ADD Binary	Rn + #imm → Rn	ADD #H'01,R2
ADDC Rm,Rn	ADD with Carry	Rn + Rm + T → Rn, carry → T	ADDC R3,R1
ADDV Rm,Rn	ADD with V Flag Overflow Check	Rn + Rm → Rn, overflow → T	ADDV R0,R1
AND Rm,Rn	AND Logical	Rn & Rm → Rn	AND R0,R1
AND #imm,R0	AND Logical	R0 & imm → R0	AND #H'0F,R0
AND.B #imm,@(R0,GBR)	AND Logical	(R0 + GBR) & imm → (R0 + GBR)	AND.B #H'80,@(R0,GBR)
BF label	Branch if False	When T = 0, disp × 2 + PC → PC; When T = 1, nop	BF TRGET_F
BF/S label	Branch if False with Delay Slot	When T = 0, disp × 2 + PC → PC; When T = 1, nop	BF/S TRGET_F
BRA label	Branch	disp × 2 + PC → PC	BRA TRGET
BRAF Rm	Branch Far	Rm + PC → PC
BSR label	Branch to Subroutine	PC → PR, disp × 2 + PC → PC	BSR TRGET
BSRF Rm	Branch to Subroutine Far	PC → PR, Rm + PC → PC	BSRF R0
BT label	Branch if True	When T = 1, disp × 2 + PC → PC; When T = 0, nop	BT TRGET_T
BT/S label	Branch if True with Delay Slot	When T = 1, disp × 2 + PC → PC; When T = 0, nop	BT/S TARGET_T
CLRMAC	Clear MAC Register	0 → MACH, MACL	CLRMAC
CLRT	Clear T Bit	0 → T	CLRT
CMP/EQ Rm,Rn	Compare Equal	When Rn = Rm, 1 → T
CMP/GE Rm,Rn	Compare Greater or Equal (Signed)	When signed and Rn ≥ Rm, 1 → T	CMP/GE R0,R1
CMP/GT Rm,Rn	Compare Greater Than (Signed)	When signed and Rn > Rm, 1 → T
CMP/HI Rm,Rn	Compare Higher (Unsigned)	When unsigned and Rn > Rm, 1 → T
CMP/HS Rm,Rn	Compare Higher or Same (Unsigned)	When unsigned and Rn ≥ Rm, 1 → T	CMP/HS R0,R1
CMP/PL Rn	Compare if Plus	When Rn > 0, 1 → T
CMP/PZ Rn	Compare if Plus or Zero	When Rn ≥ 0, 1 → T
CMP/STR Rm,Rn	Compare String	When any byte in Rn = the corresponding byte in Rm, 1 → T	CMP/STR R2,R3
CMP/EQ #imm,R0	Compare Equal Immediate	When R0 = imm, 1 → T
DIV0S Rm,Rn	Divide Step 0 as Signed	MSB of Rn → Q, MSB of Rm → M,M^Q → T	DIV0S R0,R1
DIV0U	Divide Step 0 as Unsigned	0 → M/Q/T	DIV0U
DIV1 Rm,Rn	Divide 1 Step	1 step division (Rn ÷ Rm)	DIV1 R0,R1
DMULS.L Rm,Rn	Double-Length Multiply as Signed	With sign, Rn × Rm → MACH, MACL	DMULS.L R0,R1
DMULU.L Rm,Rn	Double-Length Multiply as Unsigned	Without sign, Rn × Rm → MACH, MACL	DMULU.L R0,R1
DT Rn	Decrement and Test (DJNZ)	Rn - 1 → Rn; When Rn is 0, 1 → T; when Rn is nonzero, 0 → T	DT R5
EXTS.B Rm,Rn	Extend as Signed	Sign-extend Rm from byte → Rn	EXTS.B R0,R1
EXTS.W Rm,Rn	Extend as Signed	Sign-extend Rm from word → Rn	EXTS.W R0,R1
EXTU.B Rm,Rn	Extend as Unsigned	Zero-extend Rm from byte → Rn	EXTU.B R0,R1
EXTU.W Rm,Rn	Extend as Unsigned	Zero-extend Rm from word → Rn	EXTU.W R0,R1
JMP @Rn	Jump	Rn → PC	JMP @R0
JSR @Rn	Jump to Subroutine	PC → PR, Rn → PC	JSR @R0
LDC Rm,SR	Load to Control Register	Rm → SR	LDC R0,SR
LDC Rm,GBR	Load to Control Register	Rm → GBR
LDC Rm,VBR	Load to Control Register	Rm → VBR
LDC.L @Rm+,SR	Load to Control Register	(Rm) → SR, Rm + 4 → Rm
LDC.L @Rm+,GBR	Load to Control Register	(Rm) → GBR, Rm + 4 → Rm	LDC.L @R15+,GBR
LDC.L @Rm+,VBR	Load to Control Register	(Rm) → VBR, Rm + 4 → Rm
LDS Rm,MACH	Load to System Register	Rm → MACH
LDS Rm,MACL	Load to System Register	Rm → MACL
LDS Rm,PR	Load to System Register	Rm → PR	LDS R0,PR
LDS.L @Rm+,MACH	Load to System Register	(Rm) → MACH,Rm + 4 → Rm
LDS.L @Rm+,MACL	Load to System Register	(Rm) → MACL,Rm + 4 → Rm	LDS.L @R15+,MACL
LDS.L @Rm+,PR	Load to System Register	(Rm) → PR,Rm + 4 → Rm
MAC.L @Rm+,@Rn+	Multiply and Accumulate Calculation Long	Signed operation, (Rn) × (Rm) + MAC → MAC	MAC.L @R0+,@R1+
MAC.W @Rm+,@Rn+ MAC @Rm+,@Rn+	Multiply and Accumulate Calculation Word	With sign, (Rn) × (Rm) + MAC → MAC	MAC.W @R0+,@R1+
MOV Rm,Rn	Move Data	Rm → Rn	MOV R0,R1
MOV.B Rm,@Rn	Move Data	Rm → (Rn)
MOV.W Rm,@Rn	Move Data	Rm → (Rn)	MOV.W R0,@R1
MOV.L Rm,@Rn	Move Data	Rm → (Rn)
MOV.B @Rm,Rn	Move Data	(Rm) → sign extension → Rn
MOV.W @Rm,Rn	Move Data	(Rm) → sign extension → Rn
MOV.L @Rm,Rn	Move Data	(Rm) → Rn
MOV.B Rm,@-Rn	Move Data	Rn - 1 → Rn, Rm → (Rn)
MOV.W Rm,@-Rn	Move Data	Rn - 2 → Rn, Rm → (Rn)	MOV.W R0,@-R1
MOV.L Rm,@-Rn	Move Data	Rn - 4 → Rn, Rm → (Rn)
MOV.B @Rm+,Rn	Move Data	(Rm) → sign ext → Rn, Rm + 1 → Rm	MOV.B @R0+,R1
MOV.W @Rm+,Rn	Move Data	(Rm) → sign ext → Rn, Rm + 2 → Rm
MOV.L @Rm+,Rn	Move Data	(Rm) → Rn, Rm + 4 → Rm	MOV.L @R0+,R1
MOV.B Rm,@(R0,Rn)	Move Data	Rm → (R0 + Rn)	MOV.B R1,@(R0,R2)
MOV.W Rm,@(R0,Rn)	Move Data	Rm → (R0 + Rn)
MOV.L Rm,@(R0,Rn)	Move Data	Rm → (R0 + Rn)
MOV.B @(R0,Rm),Rn	Move Data	(R0 + Rm) → sign extension → Rn
MOV.W @(R0,Rm),Rn	Move Data	(R0 + Rm) → sign extension → Rn	MOV.W @(R0,R2),R1
MOV.L @(R0,Rm),Rn	Move Data	(R0 + Rm) → Rn
MOV #imm,Rn	Move Immediate Data	imm → sign extension → Rn	MOV #H'80,R1
MOV.W @(disp,PC),Rn	Move Immediate Data	(disp × 2 + PC) → sign ext → Rn	MOV.W IMM,R2
MOV.L @(disp,PC),Rn	Move Immediate Data	(disp × 4 + PC) → Rn	MOV.L @(4,PC),R3
MOV.B @(disp,GBR),R0	Move Peripheral Data	(disp + GBR) → sign ext → R0
MOV.W @(disp,GBR),R0	Move Peripheral Data	(disp × 2 + GBR) → sign ext → R0
MOV.L @(disp,GBR),R0	Move Peripheral Data	(disp × 4 + GBR) → R0	MOV.L @(2,GBR),R0
MOV.B R0,@(disp,GBR)	Move Peripheral Data	R0 → (disp + GBR)	MOV.B R0,@(1,GBR)
MOV.W R0,@(disp,GBR)	Move Peripheral Data	R0 → (disp × 2 + GBR)
MOV.L R0,@(disp,GBR)	Move Peripheral Data	R0 → (disp × 4 + GBR)
MOV.B R0,@(disp,Rn)	Move Structure Data	R0 → (disp + Rn)
MOV.W R0,@(disp,Rn)	Move Structure Data	R0 → (disp × 2 + Rn)
MOV.L Rm,@(disp,Rn)	Move Structure Data	Rm → (disp × 4 + Rn)	MOV.L R0,@(H'F,R1)
MOV.B @(disp,Rn),R0	Move Structure Data	(disp + Rn) → sign extension → R0
MOV.W @(disp,Rn),R0	Move Structure Data	(disp × 2 + Rn) → sign extension → R0
MOV.L @(disp,Rm),Rn	Move Structure Data	(disp × 4 + Rm) → Rn	MOV.L @(2,R0),R1
MOVA @(disp,PC),R0	Move Effective Address	disp × 4 + PC → R0	MOVA @(0,PC),R0
MOVT Rn	Move T Bit	T → Rn	MOVT R0
MUL.L Rm,Rn MUL Rm,Rn	Multiply Long	Rn × Rm → MACL	MUL.L R0,R1
MULS.W Rm,Rn MULS Rm,Rn	Multiply as Signed Word	Signed operation, Rn × Rm → MACL	MULS R0,R1
MULU.W Rm,Rn MULU Rm,Rn	Multiply as Unsigned Word	Unsigned, Rn × Rm → MACL	MULU R0,R1
NEG Rm,Rn	Negate	0 - Rm → Rn	NEG R0,R1
NEGC Rm,Rn	Negate with Carry	0 - Rm - T → Rn, Borrow → T	NEGC R1,R1
NOP	No operation	No operation	NOP
NOT Rm,Rn	NOT Logical Complement	~Rm → Rn	NOT R0,R1
OR Rm,Rn	OR Logical	Rn | Rm → Rn
OR #imm,R0	OR Logical	R0 | imm → R0
OR.B #imm,@(R0,GBR)	OR Logical	(R0 + GBR) | imm → (R0 + GBR)
ROTCL Rn	Rotate with Carry Left	T ← Rn ← T	ROTCL R0
ROTCR Rn	Rotate with Carry Right	T → Rn → T	ROTCR R0
ROTL Rn	Rotate Left	T ← Rn ← MSB	ROTL R0
ROTR Rn	Rotate Right	LSB → Rn → T	ROTR R0
RTE	Return from Exception	Delayed branch, Stack area → PC/SR	RTE
RTS	Return from Subroutine	Delayed branch, PR → PC	RTS
SETT	Set T Bit	1 → T	SETT
SHAL Rn	Shift Arithmetic Left 1 Bit with carry	T ← Rn ← 0	SHAL R0
SHAR Rn	Shift Arithmetic Right 1 Bit with carry	MSB → Rn → T	SHAR R0
SHLL Rn	Shift Logical Left 1 Bit with carry	T ← Rn ← 0	SHLL R0
SHLL2 Rn	Shift Logical Left 2 Bits	Rn << 2 → Rn	SHLL2 R0
SHLL8 Rn	Shift Logical Left 8 Bits	Rn << 8 → Rn	SHLL8 R0
SHLL16 Rn	Shift Logical Left 16 Bits	Rn << 16 → Rn	SHLL16 R0
SHLR Rn	Shift Logical Right 1 Bit with carry	0 → Rn → T	SHLR R0
SHLR2 Rn	Shift Logical Right 2 Bits	Rn>>2 → Rn	SHLR2 R0
SHLR8 Rn	Shift Logical Right 8 Bits	Rn>>8 → Rn	SHLR8 R0
SHLR16 Rn	Shift Logical Right 16 Bits	Rn>>16 → Rn	SHLR16 R0
SLEEP	Sleep	Sleep	SLEEP
STC SR,Rn	Store Control Register	SR → Rn	STC SR,R0
STC GBR,Rn	Store Control Register	GBR → Rn
STC VBR,Rn	Store Control Register	VBR → Rn
STC.L SR,@-Rn	Store Control Register	Rn - 4 → Rn, SR → (Rn)
STC.L GBR,@-Rn	Store Control Register	Rn - 4 → Rn, GBR → (Rn)	STC.L GBR,@-R15
STC.L VBR,@-Rn	Store Control Register	Rn - 4 → Rn, VBR → (Rn)
STS MACH,Rn	Store System Register	MACH → Rn	STS MACH,R0
STS MACL,Rn	Store System Register	MACL → Rn
STS PR,Rn	Store System Register	PR → Rn
STS.L MACH,@-Rn	Store System Register	Rn - 4 → Rn, MACH → (Rn)
STS.L MACL,@-Rn	Store System Register	Rn - 4 → Rn, MACL → (Rn)
STS.L PR,@-Rn	Store System Register	Rn - 4 → Rn, PR → (Rn)	STS.L PR,@-R15
SUB Rm,Rn	Subtract Binary	Rn - Rm → Rn	SUB R0,R1
SUBC Rm,Rn	Subtract with Carry	Rn - Rm - T → Rn, Borrow → T	SUBC R3,R1
SUBV Rm,Rn	Subtract with V Flag Underflow Check	Rn - Rm → Rn, underflow → T	SUBV R0,R1
SWAP.B Rm,Rn	Swap Register Halves	Rm → swap the upper and lower bytes of the lower word → Rn	SWAP.B R0,R1
SWAP.W Rm,Rn	Swap Register Halves	Rm → Swap upper and lower word → Rn	SWAP.W R0,R1
TAS.B @Rn	Test and Set	When (Rn) is 0, 1 → T, 1 → MSB of (Rn)	TAS.B @R7
TRAPA #imm	Trap Always	PC/SR → Stack area, (imm × 4 + VBR) → PC	TRAPA #H'20
TST Rm,Rn	Test Logical	Rn & Rm, when result is 0, 1 → T	TST R0,R0
TST #imm,R0	Test Logical	R0 & imm, when result is 0, 1 → T	TST #H'80,R0
TST.B #imm, @(R0,GBR)	Test Logical	(R0 + GBR) & imm, when result is 0, 1 → T	TST.B #H'A5,@(R0,GBR)
XOR Rm,Rn	Exclusive OR Logical	Rn ^ Rm → Rn	XOR R0,R1
XOR #imm,R0	Exclusive OR Logical	R0 ^ imm → R0	XOR #H'F0,R0
XOR.B #imm,@(R0,GBR)	Exclusive OR Logical	(R0 + GBR) ^ imm → (R0 + GBR)	XOR.B #H'A5,@(R0,GBR)
XTRCT Rm,Rn	Extract	Middle 32 bits of Rm:Rn → Rn	XTRCT R0,R1