I am trying to evaluate the expression 7 * (4 + 10) + (15 / 5) in assembly language. I assume the BEDMAS principle still applies, but the code I run does not give me the correct numerical value, and I am not sure where I am going wrong. When we invoke DIV, does it not automatically divide the value in the AX register?
MOV BX,10
ADD BX,4
MOV AX,15
MOV BL,5
DIV BL
ADD AX,BX
MOV BX, 7
MUL BX
HLT
MOV BX,10
ADD BX,4
MOV AX,15
MOV BL,5 ; <<<< This overwrites the sum 10 + 4 in BX
DIV BL
ADD AX,BX ; <<<< Luckily the remainder (in AH) was zero
MOV BX, 7
MUL BX ; <<<< Needlessly clobbers DX
Apart from some other imperfections, this calculation does not even follow normal algebraic rules.
You've calculated 7 * ( (4 + 10) + (15 / 5) ) when the task asked for ( 7 * (4 + 10) ) + (15 / 5)
On 8086 both division and multiplication use the accumulator, so inevitably you'll have to move the result of whichever of these you choose to do first to an extra register.
The byte-sized division yields a quotient in AL but also a remainder in AH. This task asks you to continue with the quotient, disregarding the remainder. Your code adds the whole of AX without explicitly zeroing AH first, and that's not good enough for a generalized solution! Luckily 15 / 5 gave a remainder of 0.
Solution with division before multiplication:
mov ax, 15
mov bl, 5 ;Divisor in BL
div bl ;AX/BL -> AL=3 (remainder in AH=0)
mov bl, al ;Move to an extra register
mov al, 4
add al, 10 ;AL=14
mov ah, 7
mul ah ;AL*AH -> AX=98
add al, bl
Solution with multiplication before division:
mov al, 4
add al, 10 ;AL=14
mov ah, 7
mul ah ;AL*AH -> AX=98
mov bh, al ;Move to an extra register
mov ax, 15
mov bl, 5 ;Divisor in BL
div bl ;AX/BL -> AL=3 (remainder in AH=0)
add al, bh
Both solutions produce the same result (101) and use just 2 registers (AX and BX).
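To see why the remainder in AH matters, here is a minimal C sketch of what the byte-sized DIV does to AX. The names div8 and evaluate are made up for this illustration, and the model assumes a non-zero divisor and a quotient that fits in 8 bits (the real instruction raises a divide error otherwise).

```c
#include <stdint.h>

/* Model of 8086 "DIV r/m8": divides AX by an 8-bit divisor and returns
   the new AX, with the quotient in AL (low byte) and the remainder in
   AH (high byte). Assumes divisor != 0 and a quotient below 256. */
uint16_t div8(uint16_t ax, uint8_t divisor)
{
    uint8_t quotient  = (uint8_t)(ax / divisor);
    uint8_t remainder = (uint8_t)(ax % divisor);
    return (uint16_t)((remainder << 8) | quotient);
}

/* ( 7 * (4 + 10) ) + (dividend / divisor), continuing with AL only. */
uint16_t evaluate(uint16_t dividend, uint8_t divisor)
{
    uint8_t al = (uint8_t)(div8(dividend, divisor) & 0xFF); /* drop AH */
    return (uint16_t)(7 * (4 + 10) + al);
}
```

With 15 / 5 the modelled AX is 0003h, so adding all of AX happens to work. With 17 / 5 it would be 0203h, and only the low byte (3) is the quotient you want to add.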
In my 80x86 assembly program, I am trying to evaluate the expression
(((((2^0 + 2^1) * 2^2) + 2^3) * 2^4) + 2^5)...(2^n), where each even exponent is preceded by a multiplication and each odd exponent is preceded by a plus. I have code, but my result is consistently off from the desired result. When 5 is put in for n, the result should be 352, however I get 330.
Any and all advice will be appreciated.
.586
.model flat
include io.h
.stack 4096
.data
number dword ?
prompt byte "enter the power", 0
string byte 40 dup (?), 0
result byte 11 dup (?), 0
lbl_msg byte "answer", 0
bool dword ?
runtot dword ?
.code
_MainProc proc
input prompt, string, 40
atod string
push eax
call power
add esp, 4
dtoa result, eax
output lbl_msg, result
mov eax, 0
ret
_MainProc endp
power proc
push ebp
mov ebp, esp
push ecx
mov bool, 1 ;initial boolean value
mov eax, 1
mov runtot, 2 ;to keep a running total
mov ecx, [ebp + 8]
jecxz done
loop1:
add eax, eax ;power of 2
test bool, ecx ;test case for whether exp is odd/even
jnz oddexp ;if boolean is 1
add runtot, eax ;if boolean is 0
loop loop1
oddexp:
mov ebx, eax ;move eax to separate register for multiplication
mov eax, runtot ;move existing total for multiplication
mul ebx ;multiplication of old eax to new eax/running total
loop loop1
done:
mov eax, runtot ;move final runtotal for print
pop ecx
pop ebp
ret
power endp
end
You're overcomplicating your code with static variables and branching.
These are powers of 2, you can (and should) just left-shift by n instead of actually constructing 2^n and using a mul instruction.
add eax,eax is the best way to multiply by 2 (aka left shift by 1), but it's not clear why you're doing that to the value in EAX at that point. It's either the multiply result (which you probably should have stored back into runtot after mul), or it's that left-shifted by 1 after an even iteration.
If you were trying to make a 2^i variable (with a strength reduction optimization to shift by 1 every iteration instead of shifting by i), then your bug is that you clobber EAX with mul, and its setup, in the oddexp block.
As Jester points out, if the first loop loop1 falls through, it will fall through into oddexp:. When you're doing loop tail duplication, make sure you consider where fall-through will go from each tail if the loop does end there.
There's also no point in having a static variable called bool which holds a 1, which you only use as an operand for test. That implies to human readers that the mask sometimes needs to change; test ecx,1 is a lot clearer as a way to check the low bit for zero / non-zero.
You also don't need static storage for runtot, just use a register (like EAX where you want the result eventually anyway). 32-bit x86 has 7 general-purpose registers (not including the stack pointer).
This is how I'd do it. Untested, but I simplified a lot by unrolling by 2. Then the test for odd/even goes away because that alternating pattern is hard-coded into the loop structure.
We increment and compare/branch twice in the loop, so unrolling didn't get rid of the loop overhead, it just changed one of the loop branches into an if() break that can leave the loop from the middle.
This is not the most efficient way to write this; the increment and early-exit check in the middle of the loop could be optimized away by counting another counter down from n, and leaving the loop if there are fewer than 2 steps left. (Then sort it out in the epilogue.)
;; UNTESTED
power proc ; fastcall calling convention: arg: ECX = unsigned int n
; clobbers: ECX, EDX
; returns: EAX
push ebx ; save a call-preserved register for scratch space
mov eax, 1 ; EAX = 2^0 running total / return value
test ecx,ecx
jz done
mov edx, ecx ; EDX = n
mov ecx, 1 ; ECX = i=1..n loop counter and shift count
loop1: ; do{ // unrolled by 2
; add 2^odd power
mov ebx, 1
shl ebx, cl ; 2^i ; xor ebx, ebx; bts ebx, ecx
add eax, ebx ; total += 2^i
inc ecx
cmp ecx, edx
ja done ; if (++i > n) break;
; multiply by 2^even power
shl eax, cl ; total <<= i; // same as total *= (1<<i)
inc ecx ; ++i
cmp ecx, edx
jbe loop1 ; }while(i<=n);
done:
pop ebx
ret
power endp
I didn't check if the adding-odd-power step ever produces a carry into another bit. I think it doesn't, so it could be safe to implement it as bts eax, ecx (setting bit i). Effectively an OR instead of an ADD, but those are equivalent as long as the bit was previously cleared.
To make the asm look more like the source and avoid obscure instructions, I implemented 1<<i with shl to generate 2^i for total += 2^i, instead of a more-efficient-on-Intel xor ebx,ebx / bts ebx, ecx. (Variable-count shifts are 3 uops on Intel Sandybridge-family because of x86 flag-handling legacy baggage: flags have to be untouched if count=0). But that's worse on AMD Ryzen, where bts reg,reg is 2 uops but shl reg,cl is 1.
Update: i=3 does produce a carry when adding, so we can't OR or BTS the bit for that case. But optimizations are possible with more branching.
Using calc:
; define shiftadd_power(n) { local res=1; local i; for(i=1;i<=n;i++){ res+=1<<i; i++; if(i>n)break; res<<=i;} return res;}
shiftadd_power(n) defined
; base2(2)
; shiftadd_power(0)
1 /* 1 */
...
The first few outputs are:
n shiftadd(n) (base2)
0 1
1 11
2 1100
3 10100 ; 1100 + 1000 carries
4 101000000
5 101100000 ; 101000000 + 100000 set a bit that was previously 0
6 101100000000000
7 101100010000000 ; increasing amounts of trailing zero around the bit being flipped by ADD
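For reference, the calc function above transcribes almost directly to C. This is only a sketch: it uses a 32-bit unsigned result, so unlike calc's arbitrary-precision integers it wraps around, and it assumes n <= 31 so that every shift count is valid in C.

```c
#include <stdint.h>

/* Same structure as the calc definition: add 2^i for odd i,
   then shift left by i for the following even i. Assumes n <= 31. */
uint32_t shiftadd_power(unsigned n)
{
    uint32_t res = 1;                  /* 2^0 */
    for (unsigned i = 1; i <= n; i++) {
        res += (uint32_t)1 << i;       /* odd i: total += 2^i */
        i++;
        if (i > n)
            break;
        res <<= i;                     /* even i: total *= 2^i */
    }
    return res;
}
```

For n = 0..7 this gives 1, 3, 12, 20, 320, 352, 22528, 22656, which are the binary values in the table.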
Peeling the first 3 iterations would enable the BTS optimization, where you just set the bit instead of actually creating 2^n and adding.
Instead of just peeling them, we can just hard-code the starting point for i=3 for larger n, and optimize the code that figures out a return value for the n<3 case. I came up with a branchless formula for that based on right-shifting the 0b1100 bit-pattern by 3, 2, or 0.
Also note that for n>=18, the last shift count is strictly greater than half the width of the register, and the 2^i from odd i has no low bits. So only the last 1 or 2 iterations can affect the result. It boils down to either 1<<n for odd n, or 0 for even n. This simplifies to (n&1) << n.
For n=14..17, there are at most 2 bits set. Starting with result=0 and doing the last 3 or 4 iterations should be enough to get the correct total. In fact, for any n, we only need to do the last k iterations, where k is enough that the total shift count from even i is >= 32. Any bits set by earlier iterations are shifted out. (I didn't add a branch for this special case.)
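The two special cases can be sketched in C like this. The function names power_small and power_large are made up for the illustration; power_small is only meant for n = 0, 1, 2 and power_large only for 18 <= n <= 31 with a 32-bit result.

```c
#include <stdint.h>

/* n = 0, 1, 2  ->  1, 3, 12: right-shift 0b1100 by 3, 2 or 0.
   The shift count mirrors "setbe cl / adc cl, cl" after "cmp ecx, 1":
   2*(n <= 1) + (n < 1). */
uint32_t power_small(unsigned n)
{
    unsigned shift = (unsigned)(n <= 1) * 2u + (unsigned)(n < 1);
    return 0xCu >> shift;
}

/* 18 <= n <= 31 with 32-bit wraparound: everything from earlier
   iterations has been shifted out, leaving 1<<n for odd n, 0 for even n. */
uint32_t power_large(unsigned n)
{
    return (uint32_t)(n & 1u) << n;
}
```

So power_small(0), power_small(1) and power_small(2) give 1, 3 and 12, and power_large(18) is 0 while power_large(19) is 1<<19.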
;; UNTESTED
;; special cases for n<3, and for n>=18
;; enabling an optimization in the main loop (BTS instead of add)
;; funky overflow behaviour for n>31: large odd n gives 1<<(n%32) instead of 0
power_optimized proc
; fastcall calling convention: arg: ECX = unsigned int n <= 31
; clobbers: ECX, EDX
; returns: EAX
mov eax, 14h ; 0b10100 = power(3)
cmp ecx, 3
ja n_gt_3 ; goto main loop or fall through to hard-coded low n
je early_ret
;; n=0, 1, or 2 => 1, 3, 12 (0b1, 0b11, 0b1100)
mov eax, 0ch ; 0b1100 to be right-shifted by 3, 2, or 0
cmp ecx, 1 ; count=0,1,2 => CF,ZF,neither flag set
setbe cl ; count=0,1,2 => cl=1,1,0
adc cl, cl ; 3,2,0 (cl = cl+cl + (count<1) )
shr eax, cl
early_ret:
ret
large_n: ; odd n: result = 1<<n. even n: result = 0
mov eax, ecx
and eax, 1 ; n&1
shl eax, cl ; n>31 will wrap the shift count so this "fails"
ret ; if you need to return 0 for all n>31, add another check
n_gt_3:
;; eax = running total for i=3 already
cmp ecx, 18
jae large_n
mov edx, ecx ; EDX = n
mov ecx, 4 ; ECX = i=4..n loop counter and shift count
loop1: ; do{ // unrolled by 2
; multiply by 2^even power
shl eax, cl ; total <<= i; // same as total *= (1<<i)
inc ecx
cmp ecx, edx
ja done ; if (++i > n) break;
; add 2^odd power. i>3 so it won't already be set (thus no carry)
bts eax, ecx ; total |= 1<<i;
inc ecx ; ++i
cmp ecx, edx
jbe loop1 ; }while(i<=n);
done:
ret
power_optimized endp
Using BTS to set a bit in EAX avoids needing an extra scratch register to construct 1<<i in, so we don't have to save/restore EBX. That's a minor bonus saving.
Notice that this time the main loop is entered with i=4, which is even, instead of i=1. So I swapped the add vs. shift.
I still didn't get around to pulling the cmp/jae out of the middle of the loop. Something like lea edx, [ecx-2] instead of mov would set the loop-exit condition, but would require a check to not run the loop at all for n=4 or 5. For large-count throughput, many CPUs can sustain 1 taken + 1 not-taken branch every 2 clocks, not creating a worse bottleneck than the loop-carried dep chains (through eax and ecx). But branch-prediction will be different, and it uses more branch-order-buffer entries to record more possible roll-back / fast-recovery points.
Can someone give me an example of how recursion would be done in ARM Assembly with only the instructions listed here (for visUAL)?
I am trying to do a recursive fibonacci and factorial function for class. I know recursion is a function that calls a function, but I have no idea how to simulate that in ARM.
https://salmanarif.bitbucket.io/visual/supported_instructions.html
In case the link doesn't work, I am using visUAL and these are the only instructions I can use:
MOV
MVN
ADR
LDR
ADD
ADC
SUB
SBC
RSB
RSC
AND
EOR
BIC
ORR
LSL
LSR
ASR
ROR
RRX
CMP
CMN
TST
TEQ
LDR
LDM
STM
B
BL
FILL
END
This doesn't load an older value for R4, so R4 just doubles every time the function calls itself.
;VisUAL initializes all registers to 0 except for R13/SP, which is -16777216
MOV R4, #0
MOV R5, #1
MOV r0, #4
MOV LR, #16 ;tells program to move to 4th instruction
FIB
STMDB SP!, {R4-R6, LR} ;Stores necessary values on stack (PUSH command)
LDR R4, [SP] ;Loads older value for R4 from memory
ADD R4, R4, R5 ;Adds R5 to R4
STR R4, [SP], #8 ;stores current value for R4 to memory
MOV R5, R4 ;Makes R5 = R4
CMP R4, #144 ;If R4 >= 144:
BGE POP ;Branch to POP
MOV PC, LR ;Moves to STMDB(PUSH) statement
POP
LDMIA SP!, {R4-R6, LR} ;Pops registers off stack
END ;ends program
You need to use the stack, STMDB and LDMIA instructions. On real ARM tools with "unified" notation, they also have mnemonics PUSH and POP.
Fibonacci and factorial are not great examples as they don't "need" recursion. But let's pretend they do. I'll pick Fibonacci as you don't have a MUL instruction!? You want to do something like this:
START
MOV R0, #6
BL FIB
END ; pseudo-instruction to make your simulator terminate
FIB ; int fib(int i) {
STMDB SP!, {R4,R5,R6,LR} ; int n, tmp;
MOV R4, R0 ; n = i;
CMP R0, #2 ; if (i <= 2) {
MOV R0, #1 ; return 1;
BLE FIB_END ; }
SUB R0, R4, #2 ; i = n-2;
BL FIB ; i = fib(i);
MOV R5, R0 ; tmp = i;
SUB R0, R4, #1 ; i = n-1;
BL FIB ; i = fib(i);
ADD R0, R0, R5 ; i = i + tmp;
FIB_END ; return i;
LDMIA SP!, {R4,R5,R6,PC} ; }
It should terminate with R0 containing fib(6) == 8. Of course this code is very inefficient as it repeatedly calls FIB for the same values.
The STM is needed so you can use registers R4 and R5, because another function call can change R0-R3 and LR. Pushing LR and popping PC is like returning with MOV PC, LR. If you were calling C code you should push an even number of registers to keep SP 64-bit aligned (we don't really need to do that here; ignore R6).
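The C in the comments of the FIB routine can be written out as a small sketch. The call counter fib_calls and the helper fib_call_count are additions for illustration only (they are not part of the assembly); they show how often the naive recursion re-enters FIB.

```c
static unsigned long fib_calls;   /* illustration only: counts entries into fib */

/* Same shape as the assembly: fib(1) == fib(2) == 1. */
int fib(int i)
{
    fib_calls++;
    int n = i;                 /* MOV R4, R0 */
    if (i <= 2)
        return 1;              /* MOV R0, #1 / BLE FIB_END */
    int tmp = fib(n - 2);      /* first BL FIB, result kept in R5 */
    return fib(n - 1) + tmp;   /* second BL FIB, then ADD R0, R0, R5 */
}

/* Resets the counter, evaluates fib(i), and reports how many calls it took. */
unsigned long fib_call_count(int i)
{
    fib_calls = 0;
    (void)fib(i);
    return fib_calls;
}
```

fib(6) returns 8, matching the value expected in R0, and it takes 15 calls to get there, which is the inefficiency mentioned above.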
some other recursive function:
unsigned int so ( unsigned int x )
{
static unsigned int z=0;
z+=x;
if(x==0) return(z);
so(x-1);
return(z);
}
build/disassemble
arm-none-eabi-gcc -O2 -c Desktop/so.c -o so.o
arm-none-eabi-objdump -D so.o
00000000 <so>:
0: e92d4010 push {r4, lr}
4: e59f4034 ldr r4, [pc, #52] ; 40 <so+0x40>
8: e5943000 ldr r3, [r4]
c: e3500000 cmp r0, #0
10: e0803003 add r3, r0, r3
14: e5843000 str r3, [r4]
18: 1a000002 bne 28 <so+0x28>
1c: e1a00003 mov r0, r3
20: e8bd4010 pop {r4, lr}
24: e12fff1e bx lr
28: e2400001 sub r0, r0, #1
2c: ebfffffe bl 0 <so>
30: e5943000 ldr r3, [r4]
34: e8bd4010 pop {r4, lr}
38: e1a00003 mov r0, r3
3c: e12fff1e bx lr
40: 00000000
If you don't understand it, then is it worth it? Is it cheating to let a tool do it for you?
push is a pseudo instruction for stm, pop a pseudo instruction for ldm, so you can use those.
I used a static local which I call a local global, it lands in .data not on the stack (well .bss in this case as I made it zero)
Disassembly of section .bss:
00000000 <z.4099>:
0: 00000000
the first two loads are loading this value into r3.
the calling convention says that r0 will contain the first parameter on entry into the function (there are exceptions, but it is true in this case).
so we go and get z from memory, r0 already has the parameter x so we add x to z and save it to memory
the compiler did the compare out of order for who knows performance reasons, the add and str as written dont modify flags so that is okay,
if x is not equal to zero it branches to 28 which does the so(x-1) call
reads r3 back from memory (the calling convention says that r0-r3 are volatile: a function you call can modify them at will and doesn't have to preserve them, so our version of z in r3 might have been destroyed, but r4 is preserved by any callee, so we read z back into r3). we pop r4 and the return address off the stack, we prepare the return register r0 with z and do the return.
if x was equal to zero (bne on 18 failed we run 1c, then 20, then 24) then we copy z (r3 version) into r0 which is the register used for returning from this function per the calling convention used by this compiler (arms recommendation). and returns.
the linker is going to fill in the address of z to the offset 0x40, this is an object not a final binary...
arm-none-eabi-ld -Ttext=0x1000 -Tbss=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <so>:
1000: e92d4010 push {r4, lr}
1004: e59f4034 ldr r4, [pc, #52] ; 1040 <so+0x40>
1008: e5943000 ldr r3, [r4]
100c: e3500000 cmp r0, #0
1010: e0803003 add r3, r0, r3
1014: e5843000 str r3, [r4]
1018: 1a000002 bne 1028 <so+0x28>
101c: e1a00003 mov r0, r3
1020: e8bd4010 pop {r4, lr}
1024: e12fff1e bx lr
1028: e2400001 sub r0, r0, #1
102c: ebfffff3 bl 1000 <so>
1030: e5943000 ldr r3, [r4]
1034: e8bd4010 pop {r4, lr}
1038: e1a00003 mov r0, r3
103c: e12fff1e bx lr
1040: 00002000
Disassembly of section .bss:
00002000 <z.4099>:
2000: 00000000
the point here is not to cheat and use a compiler, the point here is there is nothing magical about a recursive function, certainly not if you follow a calling convention or whatever your favorite term is.
for example
if you have parameters r0 is first, r1 second, up to r3 (if they fit, make your code such that it does and you have four or less parameters)
the return value is in r0 if it fits
you need to push lr on the stack as you will be calling another function
r4 on up preserve if you need to modify them, if you want some local storage either use the stack by modifying the stack pointer accordingly (or doing pushes/stms). you can see that gcc instead saves what was in the register to the stack and then uses the register during the function, at least up to a few local variables worth, beyond that it would need to bang on the stack a lot, sp relative.
when you do the recursive call you do so as you would any other normal function according to the calling convention, if you need to save r0-r3 before calling then do so either in a register r4 or above or on the stack, restore after the function returns. you can see it is easier just to put the values you want to keep before and after a function call in a register r4 or above.
the compiler could have done the compare of r0 just before the branch, reads easier that way. Likewise could have done the mov to r0 of the return value before the pop
I didn't specify a target CPU, so my build of gcc here appears to default to armv4t; if I ask for something a little newer
arm-none-eabi-gcc -O2 -c -mcpu=mpcore Desktop/so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <so>:
0: e92d4010 push {r4, lr}
4: e59f402c ldr r4, [pc, #44] ; 38 <so+0x38>
8: e3500000 cmp r0, #0
c: e5943000 ldr r3, [r4]
10: e0803003 add r3, r0, r3
14: e5843000 str r3, [r4]
18: 1a000001 bne 24 <so+0x24>
1c: e1a00003 mov r0, r3
20: e8bd8010 pop {r4, pc}
24: e2400001 sub r0, r0, #1
28: ebfffffe bl 0 <so>
2c: e5943000 ldr r3, [r4]
30: e1a00003 mov r0, r3
34: e8bd8010 pop {r4, pc}
38: 00000000
You can see the returns read a little easier
although an optimization was missed: it could have done an ldr r0,[r4] and saved an instruction. or leave that tail end as is and the bne could have been a beq to 30 (mov r0,r3; pop {r4,pc}) and shared an exit.
a little more readable
so:
push {r4, lr}
# z += x
ldr r4, zptr
ldr r3, [r4]
add r3, r0, r3
str r3, [r4]
# if x==0 return z
cmp r0, #0
beq l30
# so(x - 1)
sub r0, r0, #1
bl so
ldr r3, [r4]
l30:
# return z
mov r0, r3
pop {r4, pc}
zptr: .word z
.section .bss
z: .word 0
arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <so>:
0: e92d4010 push {r4, lr} (stmdb)
4: e59f4024 ldr r4, [pc, #36] ; 30 <zptr>
8: e5943000 ldr r3, [r4]
c: e0803003 add r3, r0, r3
10: e5843000 str r3, [r4]
14: e3500000 cmp r0, #0
18: 0a000002 beq 28 <l30>
1c: e2400001 sub r0, r0, #1
20: ebfffff6 bl 0 <so>
24: e5943000 ldr r3, [r4]
00000028 <l30>:
28: e1a00003 mov r0, r3
2c: e8bd8010 pop {r4, pc} (ldmia)
00000030 <zptr>:
30: 00000000
Disassembly of section .bss:
00000000 <z>:
0: 00000000
EDIT
So lets walk through this last one.
push {r4,lr} which is a pseudo instruction for stmdb sp!,{r4,lr}
lr is r14, which holds the return address. Look at the bl instruction: branch and link, so we branch to some address but lr (link register) is set to the return address, the instruction after the bl. So when main or some other function calls so(4); lets assume so is at address 0x1000, so the program counter (r15, pc) gets 0x1000, and lr gets the address of the instruction after the call in the caller, so lets say that is 0x708. Lets also assume the stack pointer during this first call to so() from main is at 0x8000, and lets say that .bss is at 0x2000, so z lives at address 0x2000 (which also means the value at 0x1030, zptr, is 0x2000).
We enter the function for the first time with r0 (x) = 4.
When you read the arm docs for stmdb sp!,{r4,lr} it decrements before (db) so sp on entry this time is 0x8000 so it decrements for the two items to 0x7FF8, the first item in the list is written there so
0x7FF8 = r4 from main
0x7FFC = 0x708 return address to main
the ! means sp stays modified so sp=0x7FF8
then ldr r4,zptr r4 = 0x2000
ldr r3,[r4] this is an indirect load so what is at address r4 is read to
put in r3 so r3 = [0x2000] = 0x0000 at this point the z variable.
z+=x; add r3,r0,r3 r3 = r0 + r3 = 4 + 0 = 4
str r3,[r4] [r4] = r3, [0x2000] = r3 write 4 to 0x2000
cmp r0,#0 4 != 0
beq to 28 nope, not equal so no branch
sub r0,r0,#1 r0 = 4 - 1 = 3
bl so so this is so(3); pc = 0x1000 lr = 0x1024
so now we enter so for the second time with r0 = 3
stmdb sp!,{r4,lr}
0x7FF0 = r4 (saving from so(4) call but we dont care its value even though we know it)
0x7FF4 = lr from so(4) = 0x1024
sp=0x7FF0
ldr r4,zptr r4 = 0x2000
ldr r3,[r4] r3 = [0x2000] = 4
add r3,r0,r3 r3 = 3 + 4 = 7
str r3,[r4] write 7 to 0x2000
cmp r0,#0 3 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1 r0 = 3-1 = 2
bl so pc=0x1000 lr=0x1024
so(2)
stmdb sp!,{r4,lr}
0x7FE8 = r4 from caller, just save it
0x7FEC = lr from caller, 0x1024
sp=0x7FE8
ldr r4,zptr r4=0x2000
ldr r3,[r4] r3 = read 7 from 0x2000
add r3,r0,r3 r3 = 2 + 7 = 9
str r3,[r4] write 9 to 0x2000
cmp r0,#0 2 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1 r0 = 2 - 1 = 1
bl 0x1000 pc=0x1000 lr=0x1024
so(1)
stmdb sp!,{r4,lr}
0x7FE0 = save r4
0x7FE4 = lr = 0x1024
sp=0x7FE0
ldr r4,zptr r4=0x2000
ldr r3,[r4] r3 = read 9 from 0x2000
add r3,r0,r3 r3 = 1 + 9 = 10
str r3,[r4] write 10 to 0x2000
cmp r0,#0 1 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1 r0 = 1 - 1 = 0
bl 0x1000 pc=0x1000 lr=0x1024
so(0)
stmdb sp!,{r4,lr}
0x7FD8 = r4
0x7FDC = lr = 0x1024
sp = 0x7FD8
ldr r4,zptr r4 = 0x2000
ldr r3,[r4] r3 = read 10 from 0x2000
add r3,r0,r3 r3 = 0 + 10 = 10
str r3,[r4] write 10 to 0x2000
cmp r0,#0 0 = 0 so it matches
beq 0x1028 it is equal so we finally take this branch
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
increment after
r4 = [sp+0] = [0x7FD8] restore r4 from caller
pc = [sp+4] = [0x7FDC] = 0x1024
sp += 8 = 0x7FE0
(branch to 0x1024)(return from so(0) to so(1))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FE0] restore r4 from caller
pc = [sp+4] = [0x7FE4] = 0x1024
sp += 8 = 0x7FE8
(branch to 0x1024)(return from so(1) to so(2))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FE8] restore r4 from caller
pc = [sp+4] = [0x7FEC] = 0x1024
sp += 8 = 0x7FF0
(branch to 0x1024)(return from so(2) to so(3))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FF0] restore r4 from caller
pc = [sp+4] = [0x7FF4] = 0x1024
sp += 8 = 0x7FF8
(branch to 0x1024)(return from so(3) to so(4))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FF8] restore r4 from caller (main()'s r4)
pc = [sp+4] = [0x7FFC] = 0x708
sp += 8 = 0x8000
(branch to 0x708)(return from so(4) to main())
and we are done.
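If you want to check the walkthrough without a simulator, here is a rough C model of it. The struct trace and run_trace names are made up for this sketch, and the stack pointer is just a number that moves by 8 bytes per call (two 4-byte registers), using the assumed starting value 0x8000 from above.

```c
#include <stdint.h>

struct trace {
    uint32_t result;     /* value returned by so(x) */
    uint32_t lowest_sp;  /* deepest stack pointer reached */
    uint32_t final_sp;   /* stack pointer after the outermost return */
};

static uint32_t z_model;   /* stands in for z at 0x2000 */
static uint32_t sp_model;  /* stands in for sp */
static uint32_t sp_lowest;

static uint32_t so_traced(uint32_t x)
{
    sp_model -= 8;                      /* stmdb sp!,{r4,lr} */
    if (sp_model < sp_lowest)
        sp_lowest = sp_model;
    z_model += x;                       /* z += x */
    if (x != 0)
        so_traced(x - 1);               /* bl so */
    sp_model += 8;                      /* ldmia sp!,{r4,pc} */
    return z_model;                     /* return z */
}

struct trace run_trace(uint32_t x, uint32_t initial_sp)
{
    struct trace t;
    z_model = 0;
    sp_model = initial_sp;
    sp_lowest = initial_sp;
    t.result = so_traced(x);
    t.lowest_sp = sp_lowest;
    t.final_sp = sp_model;
    return t;
}
```

run_trace(4, 0x8000) gives a result of 10, a lowest stack pointer of 0x7FD8 (the so(0) call) and a final stack pointer of 0x8000, the same numbers as in the trace.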
A stack is like a dixie cup holder which might be before your time. A cup holder where you pull a cup down and the next and rest of the cups stay in the holder, well you can shove one back up in there.
So a stack is temporary storage for the function: write one data item on the cup, then shove it up into the holder (save r4 from caller), write another item and shove it up into the holder (lr, return address from caller). We only used two items per function here, so on each call of the function I push two cups up into the holder, and each call gets two NEW AND UNIQUE storage locations to store this local information. As I exit the function I pull the two cups down out of the holder and use their values (and discard them). This is to some extent the key to recursion: the stack gives you new local storage for each call, separate from prior calls to the same function. If nothing else you need a return address (although I did make an even simpler recursion example that didn't need one; when optimized, the compiler was smart enough to basically make a loop out of it).
ldr rd,[rn] think of the brackets as saying the item at that address, so read memory at the address in rn and save that value in rd.
str rd,[rn] the one messed up arm instruction: in the rest, the first parameter is the left side of the equals (add r1,r2,r3 r1 = r2 + r3, ldr r1,[r4] r1 = [r4]), but this one is backward, [rn] = rd, store the value in rd to the memory location described by the address in rn, one level of indirection.
stmdb sp!, means decrement the stack pointer, before doing anything, by 4 bytes times the number of registers in the list, then write the first, lowest numbered register to [sp+0], then the next to [sp+4] and so on; the last one will be four less than the starting value of sp. The ! means the instruction finishes with sp being that decremented value. You can use ldm/stm for things other than stack pushes and pops. Like memcpy, but that is another story...
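A small C model of those two instructions, as a sketch only: the mem array, its size, and the function names stmdb, ldmia and mem_read are all invented for illustration, and the register list is passed as an array already sorted lowest register first.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 0x4000u             /* hypothetical 64 KiB of word-addressed memory */
static uint32_t mem[MEM_WORDS];

/* stmdb sp!,{...}: decrement sp by 4 * count first, then store the
   registers at ascending addresses from the new sp. Returns the new sp. */
uint32_t stmdb(uint32_t sp, const uint32_t *regs, size_t count)
{
    sp -= 4u * (uint32_t)count;
    for (size_t i = 0; i < count; i++)
        mem[sp / 4 + i] = regs[i];
    return sp;
}

/* ldmia sp!,{...}: load the registers from ascending addresses starting
   at sp, then increment sp past them. Returns the new sp. */
uint32_t ldmia(uint32_t sp, uint32_t *regs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        regs[i] = mem[sp / 4 + i];
    return sp + 4u * (uint32_t)count;
}

/* Reads one word of the modelled memory at a byte address. */
uint32_t mem_read(uint32_t address)
{
    return mem[address / 4];
}
```

Pushing two values with sp = 0x8000 leaves sp at 0x7FF8 with the first value at 0x7FF8 and the second at 0x7FFC, and the matching ldmia brings sp back to 0x8000.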
All of this is in the arm documentation from infocenter.arm.com which you should already have (arm architectural reference manual, armv5 is the preferred first one if you have not read one).
The following is my code. The block in hex2dec works successfully for converting a single hexadecimal number to a decimal number. It would be really helpful if someone could point out where I am going wrong in my use of the array. Thanks.
DATA SEGMENT
NUM DW 1234H,9H,15H
RES DB 3*10 DUP ('$','$','$')
SIZE DB 3
DATA ENDS
CODE SEGMENT
ASSUME DS:DATA, CS:CODE
START:
MOV AX, DATA
MOV DS,AX
MOV DI,0
LOOP3:
MOV AX,NUM[DI]
LEA SI,RES[DI]
CALL HEX2DEC
LEA DX,RES[DI]
MOV AH,9
INT 21H
INC DI
CMP DI,3
JL LOOP3
MOV AH,4CH ; end program
INT 21H
CODE ENDS
HEX2DEC PROC NEAR
MOV CX,0
MOV BX,10
LOOP1:
MOV DX,0
DIV BX
ADD DL,30H
PUSH DX
INC CX
CMP AX,9
JG LOOP1
ADD AL,30H
MOV [SI],AL
LOOP2:
POP AX
INC SI
MOV [SI],AL
LOOP LOOP2
RET
HEX2DEC ENDP
END START
MOV AX,NUM[DI]
LEA SI,RES[DI]
LEA DX,RES[DI]
You are treating DI as an array index like we use in any of the high level languages. In assembly programming we only use displacements aka offsets in the array.
In your program, since the NUM array is composed of words, you need to give the DI register successively the values 0, 2, and 4.
ADD DI, 2
CMP DI, 6
JB LOOP3
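To illustrate what goes wrong with DI = 1, here is a sketch in C that lays out NUM as the bytes the assembler would emit (little-endian words) and loads a word from a byte offset. NUM_BYTES and load_word are names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* NUM DW 1234H,9H,15H as it sits in memory, low byte first. */
static const uint8_t NUM_BYTES[] = { 0x34, 0x12, 0x09, 0x00, 0x15, 0x00 };

/* Models MOV AX,NUM[DI]: a 16-bit little-endian load at byte offset DI.
   Valid for byte_offset 0..4 with this 6-byte array. */
uint16_t load_word(size_t byte_offset)
{
    return (uint16_t)(NUM_BYTES[byte_offset] |
                      (NUM_BYTES[byte_offset + 1] << 8));
}
```

Offsets 0, 2 and 4 give 1234h, 9h and 15h. Offset 1 straddles two elements and gives 0912h, which is what the original loop fetched on its second pass.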
Also it would be best to not treat the RES as an array. Just consider it a buffer and always use it from the start.
RES DB 10 DUP (0)
...
LEA SI, RES
CALL HEX2DEC
LEA DX, RES
A better version of HEX2DEC avoids the ugly prefixed "0" on the single digit numbers:
HEX2DEC PROC NEAR
XOR CX, CX ; Same as MOV CX,0
MOV BX,10
LOOP1:
XOR DX, DX ; Same as MOV DX,0
DIV BX
ADD DL, 30H
PUSH DX
INC CX
TEST AX, AX
JNZ LOOP1
LOOP2:
POP AX
MOV [SI], AL
INC SI
LOOP LOOP2
MOV AL, "$" ; Add this to use DOS function 09h
MOV [SI], AL
RET
HEX2DEC ENDP
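The same divide-by-10, push, then pop-in-reverse idea looks like this in C. It is a sketch only: u16_to_decimal is a made-up name, a small local array stands in for the CPU stack, and the output is NUL-terminated instead of "$"-terminated.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the decimal digits of value into out (room for at least 6 bytes)
   and returns the number of digits written. */
size_t u16_to_decimal(uint16_t value, char *out)
{
    char stack[5];      /* a 16-bit value has at most 5 decimal digits */
    size_t count = 0;

    do {
        stack[count++] = (char)('0' + value % 10);  /* ADD DL,30H / PUSH DX */
        value /= 10;                                /* quotient stays in AX */
    } while (value != 0);                           /* TEST AX,AX / JNZ LOOP1 */

    size_t digits = count;
    for (size_t i = 0; i < digits; i++)
        out[i] = stack[--count];                    /* POP AX / MOV [SI],AL / INC SI */
    out[digits] = '\0';
    return digits;
}
```

1234h comes out as "4660", 15h as "21", and a single digit such as 9 as just "9" with no leading zero.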