| 31 | 30 | 29 | 28 | 27 | 26 | 25..2 | 1..0 |
| N | Z | C | V | I | F | 24-bit program counter (word aligned) | Mode |
With this arrangement, it was possible to have a 26-bit address space, capable of accessing up to 64MBytes of memory. At the time, this was considered a lot, but with the advent of the ARM6, memory had become a much cheaper commodity, and so a 32-bit mode was introduced, with a separate register to store the processor flags in. Existing applications could still run because the processor could be put into a special 26-bit mode, where the program counter reflected the processor status bits, as in the ARM 2.
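As a rough illustration of the layout in the table above, this Python sketch splits a 26-bit-mode R15 word into its flags, program counter and mode bits. The function name `unpack_r15` and the sample value are my own, chosen purely for illustration.

```python
def unpack_r15(r15):
    """Split a 26-bit-mode R15 word into its component fields."""
    return {
        "N": (r15 >> 31) & 1,
        "Z": (r15 >> 30) & 1,
        "C": (r15 >> 29) & 1,
        "V": (r15 >> 28) & 1,
        "I": (r15 >> 27) & 1,
        "F": (r15 >> 26) & 1,
        "pc": r15 & 0x03FFFFFC,   # bits 25..2, kept as a byte address
        "mode": r15 & 0x3,        # bits 1..0
    }

# Illustrative value only: N, I and F set, pc = 0x8000, mode 3.
fields = unpack_r15(0x8C008003)
highest_address = 0x03FFFFFC      # the last word below 64MBytes
```

Because the program counter only occupies bits 25..2, the highest word address it can hold is 0x03FFFFFC, which is where the 64MByte limit comes from.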
Since RISC OS uses the 26-bit program counter, applications also use this mode of operation, as RISC OS does not support any kind of mode switching. Only the FIQ vector operates in 32-bit mode.
As 32-bit mode is generally more desirable (you get up to 4GBytes of memory space), future processors will not have the 26-bit mode as an option. This means that RISC OS will not work on those future processors.
Consider the ARM instruction MOVS pc,lr. This becomes the hexadecimal sequence 0xE1B0F00E. If an automatic convertor saw this instruction, it would convert it into whatever 32-bit instruction performs the same function as the original did under 26-bit mode (a possible method is discussed below).
Now, if you had a block of raw data, and one of the words in the raw data
just so happened to be 0xE1B0F00E, then that instruction would also
be changed.
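To make the problem concrete, here is a minimal Python sketch of such a naive scan. The function name and the four-word image are made up for illustration; the image is assumed to be little-endian, and its last word is meant to be data that just happens to hold the same value.

```python
import struct

MOVS_PC_LR = 0xE1B0F00E

def find_movs_pc_lr(image):
    """Return the byte offset of every word equal to MOVS pc,lr."""
    count = len(image) // 4
    words = struct.unpack("<%dI" % count, image[:count * 4])
    return [i * 4 for i, word in enumerate(words) if word == MOVS_PC_LR]

# MOV r0,#0 / MOVS pc,lr / DCD 128 / DCD &E1B0F00E (data, not code)
image = struct.pack("<4I", 0xE3A00000, 0xE1B0F00E, 0x00000080, 0xE1B0F00E)
hits = find_movs_pc_lr(image)
```

The scan reports offsets 4 and 12, and nothing in the word itself says that the second hit is data.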
There is the possibility of checking around that instruction, to see if it looks like it is code, but even that can fail:
init
STMFD r13!,{r14}
MOV r0,#0
LDR r1,count
loop
BL debug_routine
SUBS r1,r1,#1
BPL loop
LDMFD r13!,{pc}^
count
DCD 128
debug_routine
[ debug=1
SWI XDebug_Code
]
MOVS pc,lr
data
DCD 0
In this example, the convertor may correctly identify the LDMFD r13!,{pc}^, but if the debug variable was not one, then the MOVS pc,lr would be surrounded by words which 'look' more like data than ARM instructions. If you were to disassemble it, it would become:
ANDEQ r0,r0,r0,LSL#1
MOVS pc,lr
ANDEQ r0,r0,r0
You could, of course, have part of the convertor that checks to see where program branches jump to. However, it would fail if the following occurs:
jump_table
DCD t_code1-t_start
DCD t_code2-t_start
init
ADR r0,jump_table ; r0 is pointer to the vector table
ADR r1,t_start ; r1 is the pointer to the code start
MOV r2,#1 ; r2 is the vector number
BL jump_vector
; ...
jump_vector
STMFD r13!,{r14}
LDR r0,[r0,r2,LSL#2] ; read the vector offset
MOV lr,pc ; 'Fake' a BL-type instruction
ADD pc,r0,r1 ; and jump to it
vector_return ; This is where the vector returns
MOVVC r0,#0 ; Clear r0 if no error
LDMFD r13!,{pc}^
; ...
t_start
; ...
data1
DCD 0
t_code2
MOVS pc,lr
data2
DCD 0
; ...
Here, the automatic convertor wouldn't know what to do with it, unless it had some form of built-in emulator to work out what is happening. This may seem like a contrived example, but there could be many more complicated ways of doing the same thing.
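The limitation can be sketched in Python. This is a deliberately simplified branch follower, not a real convertor: it only understands B/BL and data-processing instructions that write to pc, it works in word indices rather than addresses, and the six-word program is made up for illustration.

```python
def branch_target(index, word):
    """Word index that a B/BL at `index` jumps to (pc reads as index + 2)."""
    offset = word & 0xFFFFFF
    if offset & 0x800000:               # sign-extend the 24-bit word offset
        offset -= 0x1000000
    return index + 2 + offset

def trace_code(words, start):
    """Follow branches from `start`; return (reachable, unresolved) index sets."""
    reachable, unresolved, todo = set(), set(), [start]
    while todo:
        i = todo.pop()
        if i in reachable or not 0 <= i < len(words):
            continue
        reachable.add(i)
        word = words[i]
        always = (word >> 28) == 0xE
        is_branch = ((word >> 25) & 7) == 0b101
        writes_pc = ((word >> 26) & 3) == 0 and ((word >> 12) & 0xF) == 15
        if is_branch:
            todo.append(branch_target(i, word))
            if ((word >> 24) & 1) or not always:
                todo.append(i + 1)      # BL returns; a conditional B may fall through
        elif writes_pc:
            unresolved.add(i)           # destination is only known at run time
            if not always:
                todo.append(i + 1)
        else:
            todo.append(i + 1)
    return reachable, unresolved

words = [
    0xEB000001,   # 0: BL to word 3
    0xE080F001,   # 1: ADD pc,r0,r1 (computed jump)
    0x00000000,   # 2: data
    0xE1B0F00E,   # 3: MOVS pc,lr, reached by the BL
    0x00000000,   # 4: data
    0xE1B0F00E,   # 5: MOVS pc,lr, reached only through the computed jump
]
reachable, unresolved = trace_code(words, 0)
```

Word 5 never appears in `reachable`, so a convertor relying on this kind of tracing would leave that MOVS pc,lr unconverted.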
This solution is to emulate 26-bit mode from within 32-bit mode. There are two ways of doing this: one is a standard emulation, the other I'm calling the "Code Lookahead Optimal Emulator" (or CLOE), which will be discussed later.
In standard emulation, the emulator would 'pretend' to have 15 registers and one program counter. Some of these may map directly onto the actual registers (i.e. they would not be virtual registers), but others may be virtual registers, the program counter being one of those.
For each instruction, the emulator would work out what the instruction did, and would perform it on its set of registers. This would be quite slow (it would be good performance if a 20:1 ratio could be achieved), but the 26-bit programs would work. When a SWI is called, control is passed from the emulator to the OS, and when the SWI returns, the emulator resumes. However, parts of the OS may still be 26-bit and hence use the emulator...
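A very small Python sketch of this kind of interpreter is given below. It is only an illustration under heavy assumptions: it handles MOV, ADD and SUB with immediate operands, B/BL and SWI, treats every SWI as "stop" instead of calling the OS, and does not support R15 as an operand or destination. All names are my own.

```python
PC_MASK = 0x03FFFFFC                      # bits 25..2 of the virtual R15
N, Z, C, V = 1 << 31, 1 << 30, 1 << 29, 1 << 28

def cond_passes(cond, r15):
    n, z, c, v = (bool(r15 & flag) for flag in (N, Z, C, V))
    return [z, not z, c, not c, n, not n, v, not v,
            c and not z, (not c) or z, n == v, n != v,
            (not z) and n == v, z or n != v, True, False][cond]

def emulate(words, max_steps=1000):
    regs = [0] * 15                       # virtual R0-R14
    r15 = 0                               # virtual R15: flags, pc and mode in one word
    for _ in range(max_steps):
        pc = r15 & PC_MASK
        word = words[pc // 4]
        r15 = (r15 & ~PC_MASK) | ((pc + 4) & PC_MASK)
        if not cond_passes(word >> 28, r15):
            continue
        kind = (word >> 25) & 7
        if kind == 0b111 and (word >> 24) & 1:
            break                         # SWI: a real emulator would pass control to the OS
        if kind == 0b101:                 # B / BL
            offset = word & 0xFFFFFF
            if offset & 0x800000:
                offset -= 0x1000000
            if (word >> 24) & 1:
                regs[14] = r15            # BL: return address plus flags, 26-bit style
            r15 = (r15 & ~PC_MASK) | ((pc + 8 + offset * 4) & PC_MASK)
        elif kind == 0b001:               # data processing with an immediate operand
            opcode = (word >> 21) & 0xF
            rn, rd = (word >> 16) & 0xF, (word >> 12) & 0xF
            rot = ((word >> 8) & 0xF) * 2
            imm = word & 0xFF
            imm = ((imm >> rot) | (imm << (32 - rot))) & 0xFFFFFFFF
            a = regs[rn]
            if opcode == 0b0010:          # SUB
                full = a - imm
                carry = a >= imm
                overflow = ((a ^ imm) & (a ^ full)) >> 31 & 1
            elif opcode == 0b0100:        # ADD
                full = a + imm
                carry = full > 0xFFFFFFFF
                overflow = (~(a ^ imm) & (a ^ full)) >> 31 & 1
            elif opcode == 0b1101:        # MOV (carry and overflow left unchanged)
                full, carry, overflow = imm, bool(r15 & C), bool(r15 & V)
            else:
                raise NotImplementedError(hex(word))
            result = full & 0xFFFFFFFF
            regs[rd] = result
            if (word >> 20) & 1:          # S bit: write the flags back into R15
                r15 &= ~(N | Z | C | V)
                r15 |= (N if result >> 31 else 0) | (Z if result == 0 else 0)
                r15 |= (C if carry else 0) | (V if overflow else 0)
        else:
            raise NotImplementedError(hex(word))
    return regs, r15

program = [
    0xE3A07003,   # MOV  r7,#3
    0xE3A00000,   # MOV  r0,#0
    0xE2800001,   # loop: ADD r0,r0,#1
    0xE2577001,   # SUBS r7,r7,#1
    0xCAFFFFFC,   # BGT  loop
    0xEF020011,   # SWI  XOS_Exit
]
regs, r15 = emulate(program)
```

Even this toy version does a fetch, a condition check, a decode and a flag update in software for every instruction, which gives some feel for why a 20:1 slowdown would count as good.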
init
MOV r0,#0
MOV r1,#8
MOV r7,#32
loop
ADR r2,text
BL print_text
SUBS r7,r7,#1
BGT loop
SWI XOS_Exit
print_text
STMFD r13!,{r0-r2,lr}
MOV r0,r2
SWI XOS_Write0
LDMFD r13!,{r0-r2,pc}^
CLOE starts off at the first instruction. It looks at it, and decides whether or not it needs to be emulated. In this case, it doesn't, and so CLOE looks at the next instruction. This continues until it finds one of the following classes of instructions:
init
MOV r0,#0
MOV r1,#8
MOV r7,#32
loop
ADR r2,text
SWI CLOE_branch_link ; This is the new instruction
; BL print_text ; This was there
SUBS r7,r7,#1
BGT loop
SWI XOS_Exit
print_text
STMFD r13!,{r0-r2,lr}
MOV r0,r2
SWI XOS_Write0
LDMFD r13!,{r0-r2,pc}^
Since CLOE knows that the processor will always reach the SWI (because there is no opportunity for the program counter to change without CLOE's knowledge), the program will execute in standard 32-bit mode up to that point.
CLOE then looks up the original instruction, emulates it, and continues looking ahead from print_text. The next instruction that needs emulating is LDMFD r13!,{r0-r2,pc}^, so it marks that with a SWI:
init
MOV r0,#0
MOV r1,#8
MOV r7,#32
loop
ADR r2,text
SWI CLOE_branch_link
SUBS r7,r7,#1
BGT loop
SWI XOS_Exit
print_text
STMFD r13!,{r0-r2,lr}
MOV r0,r2
SWI XOS_Write0
SWI CLOE_pull_stack ; Another new instruction
; LDMFD r13!,{r0-r2,pc}^ ; This was the old one
As before, CLOE starts executing in normal mode from print_text, and after the call to SWI XOS_Write0, CLOE is called again through its own SWI. It emulates the instruction which it has stored, and finds that execution continues after the earlier SWI (the one that replaced the BL). So, it starts the emulation again, this time reaching BGT loop. The code becomes:
init
MOV r0,#0
MOV r1,#8
MOV r7,#32
loop
ADR r2,text
SWI CLOE_branch_link
SUBS r7,r7,#1
SWI CLOE_branch ; Note that it's not SWIGT
; BGT loop ; This is the old instruction
SWI XOS_Exit
print_text
STMFD r13!,{r0-r2,lr}
MOV r0,r2
SWI XOS_Write0
SWI CLOE_pull_stack
As CLOE needs to know exactly where execution continues, it has to emulate every form of branch, so that it can carry on from where the code left off. After the first pass, R7 is 31, so the branch is taken, and the loop body runs 32 times in all.
Once the final CLOE_branch is not taken, CLOE recognises SWI XOS_Exit, and this returns control to the 32-bit OS.
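The lookahead-and-patch step from this walk-through can be sketched in Python. This covers only the patching, not the emulation of the saved instructions, and it only recognises the three classes that appear in the example (B, BL and an LDM that loads pc). The SWI numbers are placeholders I have invented, not allocated SWIs, and `text` is assumed to follow the code directly.

```python
# Hypothetical SWI encodings: placeholder numbers, not real allocations.
CLOE_BRANCH      = 0xEF0C1000
CLOE_BRANCH_LINK = 0xEF0C1001
CLOE_PULL_STACK  = 0xEF0C1002

def marker_for(word):
    """Return the CLOE SWI that should replace `word`, or None to leave it alone."""
    kind = (word >> 25) & 7
    if kind == 0b101:                                   # B or BL
        return CLOE_BRANCH_LINK if (word >> 24) & 1 else CLOE_BRANCH
    if kind == 0b100 and (word >> 20) & 1 and (word >> 15) & 1:
        return CLOE_PULL_STACK                          # LDM with pc in the register list
    return None

def patch_next(words, start, saved):
    """Look ahead from `start`, patch the first instruction needing emulation."""
    for i in range(start, len(words)):
        if i in saved:                                  # already patched on an earlier pass
            return i
        marker = marker_for(words[i])
        if marker is not None:
            saved[i] = words[i]                         # keep the original for emulation
            words[i] = marker                           # always unconditional, never SWIGT
            return i
    return None

words = [
    0xE3A00000,   #  0 init:       MOV   r0,#0
    0xE3A01008,   #  1             MOV   r1,#8
    0xE3A07020,   #  2             MOV   r7,#32
    0xE28F201C,   #  3 loop:       ADR   r2,text
    0xEB000002,   #  4             BL    print_text
    0xE2577001,   #  5             SUBS  r7,r7,#1
    0xCAFFFFFB,   #  6             BGT   loop
    0xEF020011,   #  7             SWI   XOS_Exit
    0xE92D4007,   #  8 print_text: STMFD r13!,{r0-r2,lr}
    0xE1A00002,   #  9             MOV   r0,r2
    0xEF020002,   # 10             SWI   XOS_Write0
    0xE8FD8007,   # 11             LDMFD r13!,{r0-r2,pc}^
]
saved = {}
order = [patch_next(words, 0, saved),    # from init: finds the BL
         patch_next(words, 8, saved),    # from print_text: finds the LDMFD
         patch_next(words, 5, saved)]    # after the return: finds the BGT
```

The three calls patch words 4, 11 and 6, in the same order as the listings above. A fuller version would also have to catch the other instructions that write to pc, such as MOVS pc,lr.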
There is one problem with CLOE - code which checks itself against modification. This would have to be addressed...
| In order to reduce the amount of emulation, it is vital that a 32-bit kernel be in place as soon as possible, as well as an active encouragement to get developers to write 32-bit applications. |
There are four main ways ARM code can get executed:
In order to allow 32-bit operations of the OS, the kernel would have to be written in 32-bit, and switch between 32-bit and 26-bit modes on the processor. To distinguish between 26-bit versions of the above, and 32-bit versions, different file-types/SWIs could be used:
*RMLoad32 etc.
XOS_Claim32, XOS_CallAfter32 etc.