BEDMAS process with Intel 8086 assembly language [duplicate]

This question already has an answer here:
8086 assembly on DOSBox: Bug with idiv instruction?
(1 answer)
Closed 4 years ago.
I am trying to solve the equation: 7 * (4 + 10) + (15 / 5) for example in assembly language. I assume the BEDMAS principle still applies, but the code I run is not giving me the correct numerical value. I am not sure where I am going wrong. When we invoke DIV, does it not automatically divide the value from the AX register?
MOV BX,10
ADD BX,4
MOV AX,15
MOV BL,5
DIV BL
ADD AX,BX
MOV BX, 7
MUL BX
HLT

MOV BX,10
ADD BX,4
MOV AX,15
MOV BL,5 ; <<<< This overwrites the sum 10 + 4 in BX
DIV BL
ADD AX,BX ; <<<< Luckily the remainder was zero
MOV BX, 7
MUL BX ; <<<< Needlessly clobbers DX
Apart from some other imperfections, this calculation does not even follow normal algebraic rules.
You set out to calculate 7 * ( (4 + 10) + (15 / 5) ) when the task asked for ( 7 * (4 + 10) ) + (15 / 5). And because MOV BL,5 replaced the 14 in BX with 5, what actually gets computed is 7 * (5 + 3) = 56.
On 8086 both division and multiplication use the accumulator, so inevitably you'll have to move the result from whichever of these you choose to do first to an extra register.
The byte-sized division yields a quotient in AL but also a remainder in AH. This task asks you to continue with the quotient and disregard the remainder. Your code does not explicitly zero AH, and that's not good enough for a generalized solution! Luckily 15 / 5 gave a remainder of 0.
Solution with division before multiplication:
mov ax, 15
mov bl, 5 ;Divider in BL
div bl ;AX/BL -> AL=3 (remainder in AH=0)
mov bl, al ;Move to an extra register
mov al, 4
add al, 10 ;AL=14
mov ah, 7
mul ah ;AL*AH -> AX=98
add al, bl
Solution with multiplication before division:
mov al, 4
add al, 10 ;AL=14
mov ah, 7
mul ah ;AL*AH -> AX=98
mov bh, al ;Move to an extra register
mov ax, 15
mov bl, 5 ;Divider in BL
div bl ;AX/BL -> AL=3 (remainder in AH=0)
add al, bh
Both solutions produce the same result (101) and use just 2 registers (AX and BX).
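For readers who want to cross-check the arithmetic outside an emulator, here is a rough C model. It is only a sketch: div8 is a hypothetical helper that mimics the byte-sized DIV (quotient to AL, remainder to AH), and all function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the 8086 byte-sized DIV: divides the 16-bit
   dividend (AX) by an 8-bit divisor; quotient -> AL, remainder -> AH.
   Assumes divisor != 0 and a quotient that fits in 8 bits (the real
   instruction raises a divide error otherwise). */
uint16_t div8(uint16_t ax, uint8_t divisor)
{
    uint8_t al = (uint8_t)(ax / divisor);
    uint8_t ah = (uint8_t)(ax % divisor);
    return (uint16_t)((ah << 8) | al);
}

/* 7 * (4 + 10) + (15 / 5), division first, as in the first solution. */
uint16_t bedmas_example(void)
{
    uint16_t ax = div8(15, 5);          /* AL = 3, AH = 0 */
    uint8_t bl = (uint8_t)(ax & 0xFF);  /* keep only the quotient */
    uint8_t al = 4 + 10;                /* AL = 14 */
    ax = (uint16_t)(al * 7);            /* MUL AH: AX = 98 */
    return (uint16_t)(ax + bl);         /* 98 + 3 */
}

/* What the question's code computes: MOV BL,5 destroys the sum in BX. */
uint16_t question_code(void)
{
    uint16_t bx = 10 + 4;               /* BX = 14 */
    uint16_t ax = 15;
    bx = (uint16_t)((bx & 0xFF00) | 5); /* MOV BL,5 -> BX = 5 */
    ax = div8(ax, (uint8_t)bx);         /* AX = 0003h */
    ax = (uint16_t)(ax + bx);           /* 3 + 5 */
    bx = 7;
    return (uint16_t)(ax * bx);         /* low word of DX:AX */
}
```

Under these assumptions bedmas_example() returns 101 and question_code() returns 56. A call such as div8(17, 5) returns 0203h, which shows why AH has to be cleared or ignored before the quotient is used as a 16-bit value.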


I don't understand the output of my code written in GNU Assembler. What happens to the stack after I call printf? [duplicate]

This question already has answers here:
What registers are preserved through a linux x86-64 function call
(3 answers)
Optimizing a loop that prints a counter
(2 answers)
Why is imul used for multiplying unsigned numbers?
(2 answers)
Why is %eax zeroed before a call to printf?
(3 answers)
Closed 6 months ago.
I am trying to code a program which prints the squares between 1 and 5 in assembly. But the output is not as expected. The function calling conventions used are those of UNIX.
My code:
.intel_syntax noprefix
.data
msg: .asciz "%d: %d\n"
.text
.global main
.type main, @function
main: PUSH RBP
MOV RBP, RSP
MOV RSI, 1 # i = 1
square: MOV RAX, RSI
MUL RSI
MOV RDI, offset flat:msg
MOV RDX, RAX
CALL printf
INC RSI
CMP RSI, 5
JB square
MOV EAX, 0
POP RBP
RET
Output: 1: 1

In my assembly program, I am trying to calculate the equation of (((((2^0 + 2^1) * 2^2) + 2^3) * 2^4) + 2^5)

In my 80x86 assembly program, I am trying to calculate the equation of
(((((2^0 + 2^1) * 2^2) + 2^3) * 2^4) + 2^5)...(2^n), where each even exponent is preceded by a multiplication and each odd exponent is preceded by a plus. I have code, but my result is consistently off from the desired result. When 5 is put in for n, the result should be 352, however I get 330.
Any and all advice will be appreciated.
.586
.model flat
include io.h
.stack 4096
.data
number dword ?
prompt byte "enter the power", 0
string byte 40 dup (?), 0
result byte 11 dup (?), 0
lbl_msg byte "answer", 0
bool dword ?
runtot dword ?
.code
_MainProc proc
input prompt, string, 40
atod string
push eax
call power
add esp, 4
dtoa result, eax
output lbl_msg, result
mov eax, 0
ret
_MainProc endp
power proc
push ebp
mov ebp, esp
push ecx
mov bool, 1 ;initial boolean value
mov eax, 1
mov runtot, 2 ;to keep a running total
mov ecx, [ebp + 8]
jecxz done
loop1:
add eax, eax ;power of 2
test bool, ecx ;test case for whether exp is odd/even
jnz oddexp ;if boolean is 1
add runtot, eax ;if boolean is 0
loop loop1
oddexp:
mov ebx, eax ;move eax to separate register for multiplication
mov eax, runtot ;move existing total for multiplication
mul ebx ;multiplication of old eax to new eax/running total
loop loop1
done:
mov eax, runtot ;move final runtotal for print
pop ecx
pop ebp
ret
power endp
end
You're overcomplicating your code with static variables and branching.
These are powers of 2, you can (and should) just left-shift by n instead of actually constructing 2^n and using a mul instruction.
add eax,eax is the best way to multiply by 2 (aka left shift by 1), but it's not clear why you're doing that to the value in EAX at that point. It's either the multiply result (which you probably should have stored back into runtot after mul), or it's that left-shifted by 1 after an even iteration.
If you were trying to make a 2^i variable (with a strength reduction optimization to shift by 1 every iteration instead of shifting by i), then your bug is that you clobber EAX with mul, and its setup, in the oddexp block.
As Jester points out, if the first loop loop1 falls through, it will fall through into oddexp:. When you're doing loop tail duplication, make sure you consider where fall-through will go from each tail if the loop does end there.
There's also no point in having a static variable called bool which holds a 1, which you only use as an operand for test. That implies to human readers that the mask sometimes needs to change; test ecx,1 is a lot clearer as a way to check the low bit for zero / non-zero.
You also don't need static storage for runtot, just use a register (like EAX where you want the result eventually anyway). 32-bit x86 has 7 registers (not including the stack pointer).
This is how I'd do it. Untested, but I simplified a lot by unrolling by 2. Then the test for odd/even goes away because that alternating pattern is hard-coded into the loop structure.
We increment and compare/branch twice in the loop, so unrolling didn't get rid of the loop overhead, it just changed one of the loop branches into an if() break that can leave the loop from the middle.
This is not the most efficient way to write this; the increment and early-exit check in the middle of the loop could be optimized away by counting another counter down from n, and leaving the loop if there are fewer than 2 steps left. (Then sort it out in the epilogue.)
;; UNTESTED
power proc ; fastcall calling convention: arg: ECX = unsigned int n
; clobbers: ECX, EDX
; returns: EAX
push ebx ; save a call-preserved register for scratch space
mov eax, 1 ; EAX = 2^0 running total / return value
test ecx,ecx
jz done
mov edx, ecx ; EDX = n
mov ecx, 1 ; ECX = i=1..n loop counter and shift count
loop1: ; do{ // unrolled by 2
; add 2^odd power
mov ebx, 1
shl ebx, cl ; 2^i ; xor ebx, ebx; bts ebx, ecx
add eax, ebx ; total += 2^i
inc ecx
cmp ecx, edx
ja done ; if (++i > n) break;
; multiply by 2^even power
shl eax, cl ; total <<= i; // same as total *= (1<<i)
inc ecx ; ++i
cmp ecx, edx
jbe loop1 ; }while(i<=n);
done:
pop ebx
ret
I didn't check if the adding-odd-power step ever produces a carry into another bit. I think it doesn't, so it could be safe to implement it as bts eax, ecx (setting bit i). Effectively an OR instead of an ADD, but those are equivalent as long as the bit was previously cleared.
To make the asm look more like the source and avoid obscure instructions, I implemented 1<<i with shl to generate 2^i for total += 2^i, instead of a more-efficient-on-Intel xor ebx,ebx / bts ebx, ecx. (Variable-count shifts are 3 uops on Intel Sandybridge-family because of x86 flag-handling legacy baggage: flags have to be untouched if count=0). But that's worse on AMD Ryzen, where bts reg,reg is 2 uops but shl reg,cl is 1.
Update: i=3 does produce a carry when adding, so we can't OR or BTS the bit for that case. But optimizations are possible with more branching.
Using calc:
; define shiftadd_power(n) { local res=1; local i; for(i=1;i<=n;i++){ res+=1<<i; i++; if(i>n)break; res<<=i;} return res;}
shiftadd_power(n) defined
; base2(2)
; shiftadd_power(0)
1 /* 1 */
...
The first few outputs are:
n shiftadd(n) (base2)
0 1
1 11
2 1100
3 10100 ; 1100 + 1000 carries
4 101000000
5 101100000 ; 101000000 + 100000 set a bit that was previously 0
6 101100000000000
7 101100010000000 ; increasing amounts of trailing zero around the bit being flipped by ADD
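The table can be reproduced with a small C reference model. This is a sketch rather than part of the original answer: it assumes 32-bit unsigned arithmetic and n <= 31, and the function simply mirrors the calc definition above.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of the calc function above, in 32-bit unsigned
   arithmetic. Assumes n <= 31 so that every shift count stays below 32
   (larger counts are undefined behaviour in C). */
uint32_t shiftadd_power(unsigned n)
{
    assert(n <= 31);
    uint32_t res = 1;                 /* 2^0 */
    for (unsigned i = 1; i <= n; i++) {
        res += UINT32_C(1) << i;      /* odd i: add 2^i */
        i++;
        if (i > n)
            break;
        res <<= i;                    /* even i: multiply by 2^i */
    }
    return res;
}
```

For n = 0..5 this yields 1, 3, 12, 20, 320 and 352, matching the binary values in the table.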
Peeling the first 3 iterations would enable the BTS optimization, where you just set the bit instead of actually creating 2^n and adding.
Instead of just peeling them, we can just hard-code the starting point for i=3 for larger n, and optimize the code that figures out a return value for the n<3 case. I came up with a branchless formula for that based on right-shifting the 0b1100 bit-pattern by 3, 2, or 0.
Also note that for n>=18, the last shift count is strictly greater than half the width of the register, and the 2^i from odd i has no low bits. So only the last 1 or 2 iterations can affect the result. It boils down to either 1<<n for odd n, or 0 for even n. This simplifies to (n&1) << n.
For n=14..17, there are at most 2 bits set. Starting with result=0 and doing the last 3 or 4 iterations should be enough to get the correct total. In fact, for any n, we only need to do the last k iterations, where k is enough that the total shift count from even i is >= 32. Any bits set by earlier iterations are shifted out. (I didn't add a branch for this special case.)
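The large-n shortcut is easy to sanity-check against a plain loop. The following C sketch assumes a 32-bit result and 18 <= n <= 31; power_ref and power_large_n are illustrative names, not something from the code in this answer.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward loop, used here only as a reference (assumes n <= 31). */
uint32_t power_ref(unsigned n)
{
    assert(n <= 31);
    uint32_t total = 1;
    for (unsigned i = 1; i <= n; i++) {
        if (i & 1u)
            total += UINT32_C(1) << i;   /* odd exponent: add */
        else
            total <<= i;                 /* even exponent: multiply */
    }
    return total;
}

/* Shortcut for 18 <= n <= 31: everything before the last even shift is
   shifted out of the register, so only a final odd step can survive. */
uint32_t power_large_n(unsigned n)
{
    assert(n >= 18 && n <= 31);
    return (uint32_t)(n & 1u) << n;
}
```

Comparing the two for every n from 18 to 31 is a cheap way to confirm the reasoning before relying on the shortcut in the asm version.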
;; UNTESTED
;; special cases for n<3, and for n>=18
;; enabling an optimization in the main loop (BTS instead of add)
;; funky overflow behaviour for n>31: large odd n gives 1<<(n%32) instead of 0
power_optimized proc
; fastcall calling convention: arg: ECX = unsigned int n <= 31
; clobbers: ECX, EDX
; returns: EAX
mov eax, 14h ; 0b10100 = power(3)
cmp ecx, 3
ja n_gt_3 ; goto main loop or fall through to hard-coded low n
je early_ret
;; n=0, 1, or 2 => 1, 3, 12 (0b1, 0b11, 0b1100)
mov eax, 0ch ; 0b1100 to be right-shifted by 3, 2, or 0
cmp ecx, 1 ; count=0,1,2 => CF,ZF,neither flag set
setbe cl ; count=0,1,2 => cl=1,1,0
adc cl, cl ; 3,2,0 (cl = cl+cl + (count<1) )
shr eax, cl
early_ret:
ret
large_n: ; odd n: result = 1<<n. even n: result = 0
mov eax, ecx
and eax, 1 ; n&1
shl eax, cl ; n>31 will wrap the shift count so this "fails"
ret ; if you need to return 0 for all n>31, add another check
n_gt_3:
;; eax = running total for i=3 already
cmp ecx, 18
jae large_n
mov edx, ecx ; EDX = n
mov ecx, 4 ; ECX = i=4..n loop counter and shift count
loop1: ; do{ // unrolled by 2
; multiply by 2^even power
shl eax, cl ; total <<= i; // same as total *= (1<<i)
inc ecx
cmp ecx, edx
ja done ; if (++i > n) break;
; add 2^odd power. i>3 so it won't already be set (thus no carry)
bts eax, ecx ; total |= 1<<i;
inc ecx ; ++i
cmp ecx, edx
jbe loop1 ; }while(i<=n);
done:
ret
Using BTS to set a bit in EAX avoids needing an extra scratch register to construct 1<<i in, so we don't have to save/restore EBX. So that's a minor bonus saving.
Notice that this time the main loop is entered with i=4, which is even, instead of i=1. So I swapped the add vs. shift.
I still didn't get around to pulling the cmp/jae out of the middle of the loop. Something like lea edx, [ecx-2] instead of mov would set the loop-exit condition, but would require a check to not run the loop at all for i=4 or 5. For large-count throughput, many CPUs can sustain 1 taken + 1 not-taken branch every 2 clocks, not creating a worse bottleneck than the loop-carried dep chains (through eax and ecx). But branch-prediction will be different, and it uses more branch-order-buffer entries to record more possible roll-back / fast-recovery points.

How is recursion possible in AVR Assembly?

I can't seem to wrap my head around recursion in Assembly Language. I understand how it works in higher level languages, but I don't understand how it is possible in assembly when the return value cannot be passed directly to the function.
I'm trying to make a recursive factorial function in AVR, but I don't understand how the stack passes the value when factorial requires n * (n-1), requiring both n and n-1 simultaneously
I just helped another person with the small code below to calculate factorial in AVR ATmega assembly.
It computes factorials from 1! to 10!; 10! gives decimal 3628800 (hex 0x375F00).
It does what the OP wanted: if 8 is selected as the number in R2, it moves 8 into the result bytes, then multiplies by number-1 and so on until it reaches 1, and then it ends. The 24x8 multiplication is the trickiest part I could write while saving registers and clock cycles. Apart from the return address pushed by Rcall, it uses neither the stack nor RAM, only AVR registers.
; Input at R2, value 1~10, from 1! to 10!
; Result 1~3628800 (0x375F00) at: R20:R21:R22 (LSB)
; Temporary Multiplication Middle Byte: R17
ldi r16, low(RAMEND)
out SPL, r16
ldi r16, high(RAMEND)
out SPH, r16
Mov R16, R2 ; Get Value to factor
Rcall A0 ; Call Factorial
...
A0: Clr R20 ; Results = Number!
Clr R21 ;
Mov R22, R16 ;
A1: Dec R16 ; Number! - 1
Cpi R16,2 ; Stop once the multiplier drops below 2
Brsh A2 ; (also covers input 1, where Dec gives 0)
Ret
; This 24x8 multiplication is tricky, fast and saves bytes
A2: Mul R22, R16 ; Mul Result LSB x Number!-1
Mov R22, R0 ; LSB Mul to Result LSB Byte
Mov R17, R1 ; MSB Mul to Temporary Middle Byte
Mul R20, R16 ; Mul Result MSB x Number!-1
Mov R20, R0 ; LSB Mul to MSB Result Byte, ignore MSB Mul, will be zero
Mul R21, R16 ; Mul Result Middle x Number!-1
Mov R21, R0 ; LSB Mul to Result Middle Byte
Add R21, R17 ; Add Temporary Middle to Result Middle Byte
Adc R20, R1 ; Add MSB Mul with Carry to Result MSB Byte
Rjmp A1
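To make the byte juggling in the 24x8 multiplication easier to follow, here is a rough C model of it. This is a sketch under assumptions: mul24x8 and factorial24 are hypothetical names, the pointers stand in for R20 (high), R21 (middle) and R22 (low), and, like the AVR code, it is iterative rather than recursive.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply the 24-bit value hi:mid:lo by an 8-bit factor using only
   8x8 -> 16-bit products, mirroring the three MUL instructions.
   Anything that overflows past 24 bits is discarded. */
void mul24x8(uint8_t *hi, uint8_t *mid, uint8_t *lo, uint8_t factor)
{
    uint16_t p_lo  = (uint16_t)(*lo  * factor);   /* Mul R22, R16 */
    uint16_t p_hi  = (uint16_t)(*hi  * factor);   /* Mul R20, R16 */
    uint16_t p_mid = (uint16_t)(*mid * factor);   /* Mul R21, R16 */

    /* Add R21, R17: low byte of the middle product plus the carry byte
       from the low product. */
    uint16_t mid_sum = (uint16_t)((p_mid & 0xFF) + (p_lo >> 8));

    *lo  = (uint8_t)p_lo;
    *mid = (uint8_t)mid_sum;
    /* Adc R20, R1: low byte of the high product, plus the high byte of
       the middle product, plus the carry out of the middle byte. */
    *hi  = (uint8_t)((p_hi & 0xFF) + (p_mid >> 8) + (mid_sum >> 8));
}

/* Iterative factorial for 1 <= n <= 10, returned as a 24-bit value. */
uint32_t factorial24(uint8_t n)
{
    assert(n >= 1 && n <= 10);
    uint8_t hi = 0, mid = 0, lo = n;
    for (uint8_t k = (uint8_t)(n - 1); k >= 2; k--)
        mul24x8(&hi, &mid, &lo, k);
    return ((uint32_t)hi << 16) | ((uint32_t)mid << 8) | lo;
}
```

With these definitions factorial24(10) comes out as 3628800 (0x375F00), the same value the register version is meant to leave in R20:R21:R22.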
Using addition instead of multiplication
unsigned int accumulate(unsigned int n)
{
if(n) return(n+accumulate(n-1));
return(1);
}
and a different instruction set, perhaps easier to follow
00000000 <accumulate>:
0: e3500000 cmp r0, #0
4: 0a000005 beq 20 <accumulate+0x20>
8: e3a03000 mov r3, #0
c: e0833000 add r3, r3, r0
10: e2500001 subs r0, r0, #1
14: 1afffffc bne c <accumulate+0xc>
18: e2830001 add r0, r3, #1
1c: e12fff1e bx lr
20: e3a00001 mov r0, #1
24: e12fff1e bx lr
In this case the compiler didn't actually call the function; it detected what was going on and just made a loop.
Since there is nothing magic about recursion, there is no difference in whether you call the same function or some other function.
unsigned int otherfun ( unsigned int );
unsigned int accumulate(unsigned int n)
{
if(n) return(n+otherfun(n-1));
return(1);
}
00000000 <accumulate>:
0: e92d4010 push {r4, lr}
4: e2504000 subs r4, r0, #0
8: 03a00001 moveq r0, #1
c: 0a000002 beq 1c <accumulate+0x1c>
10: e2440001 sub r0, r4, #1
14: ebfffffe bl 0 <otherfun>
18: e0800004 add r0, r0, r4
1c: e8bd4010 pop {r4, lr}
20: e12fff1e bx lr
So this shows how it works. Instead of pushing n itself on the stack, the cheaper solution, if you have the registers, is to pick a non-volatile register, save that register to the stack, then use it during the function. Which approach fits depends on how many registers you have and how many local intermediate values you need to track. So r4 gets a copy of n coming in, then that is added to the return value from the call to the next function. (For factorial it is a multiply, which depending on the instruction set can produce a lot more code that gets in the way of understanding, so I used addition instead.) With recursion where the compiler didn't figure out what we were doing, this would have been a call to ourselves, and we can write this asm and make it a call to ourselves to see how it works.
Then the function returns the sum.
If we assume that otherfun is really accumulate, let's say we enter this function with a 4:
00000000 <accumulate>:
0: e92d4010 push {r4, lr}
4: e2504000 subs r4, r0, #0
8: 03a00001 moveq r0, #1
c: 0a000002 beq 1c <accumulate+0x1c>
10: e2440001 sub r0, r4, #1
14: ebxxxxxx bl accumulate
18: e0800004 add r0, r0, r4
1c: e8bd4010 pop {r4, lr}
20: e12fff1e bx lr
r4 and lr are saved on the stack (call this r4-4 and lr-4)
r4 = n (4)
r0 = n-1 (3)
call accumulate with n-1 (3)
r4 (4) and lr are saved on the stack (r4-3, lr-3); lr now points back into accumulate
r4 = n (3)
r0 = n-1 (2)
call accumulate with n-1 (2)
r4 (3) and lr are saved on the stack (r4-2, lr-2)
r4 = n (2)
r0 = n-1 (1)
call accumulate with n-1 (1)
r4 (2) and lr are saved on the stack (r4-1, lr-1)
r4 = n (1)
r0 = n-1 (0)
call accumulate with n-1 (0)
r4 (1) and lr are saved on the stack (r4-0, lr-0)
now things change...
r4 = n (0), so r0 = 1 and the add is skipped
r4 gets 1 from the stack
return to lr-0 which is into accumulate after the call to accumulate
r0 = r0 (1) + r4 (1) = 2
r4 gets 2 from the stack
return to lr-1 which is into accumulate
r0 = r0 (2) + r4 (2) = 4
r4 gets 3 from the stack
return to lr-2 which is into accumulate
r0 = r0 (4) + r4 (3) = 7
r4 gets 4 from the stack
return to lr-3 which is into accumulate
r0 = r0 (7) + r4 (4) = 11
return to lr-4 which is the function that called accumulate. r4 is restored to what it was before accumulate was first called; r4 is non-volatile, so for this instruction set you have to return r4 the way you found it (as well as others, but we didn't modify those)
so the addition in this case (multiplication in your desired case) is
result = 1 + 1 + 2 + 3 + 4 = 11, where the leading 1 is what the n = 0 case returns
How that happened is we basically pushed n on the stack then called the function with n-1. In this case we push 4, 3, 2, 1, then we start to unwind that, and each return processes 1 then 2 then 3 then 4 as it returns, essentially taking those from the stack and adding them to the 1 that the n = 0 case returned.
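The push-then-unwind idea can also be written out in C with an explicit array standing in for the hardware stack. This is only an illustrative sketch: accumulate_explicit and MAX_DEPTH are made-up names, and the fixed-size array is an assumption to keep the example short.

```c
#include <assert.h>

enum { MAX_DEPTH = 64 };  /* hypothetical limit for this sketch */

/* accumulate(n) with an explicit stack instead of calls: the "call"
   phase pushes n and descends with n-1, the "return" phase pops each
   saved n and adds it to the return value. */
unsigned accumulate_explicit(unsigned n)
{
    unsigned stack[MAX_DEPTH];
    unsigned sp = 0;

    assert(n < MAX_DEPTH);
    while (n != 0) {          /* descending: one push per call */
        stack[sp++] = n;
        n = n - 1;
    }
    unsigned ret = 1;         /* base case: accumulate(0) returns 1 */
    while (sp != 0)           /* unwinding: one pop per return */
        ret += stack[--sp];
    return ret;
}

/* The recursive original, for comparison. */
unsigned accumulate(unsigned n)
{
    if (n) return n + accumulate(n - 1);
    return 1;
}
```

Both versions return 11 for n = 4, that is 1 + 1 + 2 + 3 + 4: the base case contributes the first 1 and each unwinding step adds one saved n.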
The bottom line is you don't have to care about recursion to support recursion. Simply use an ABI that supports recursion, which is not hard to do, then hand code the instructions in assembly as if you were the compiler.
Maybe this makes it easier to see. n coming in is a parameter, but for the duration of the function it is also a local variable, and local variables go on the stack.
unsigned int accumulate(unsigned int n)
{
unsigned int m;
m = n;
if(n) return(m+accumulate(n-1));
return(1);
}
back to this
unsigned int accumulate(unsigned int n)
{
if(n) return(n+accumulate(n-1));
return(1);
}
so independent of the instruction set
accumulate:
if(n!=0) jump over
return_reg = 1
return
over:
push n on the stack
first parameter (stack or register) = n - 1
call accumulate
pop or load n from the stack
return_reg = return_reg + n
clean stack
return
And also deal with return addresses for the instruction set if required.
The ABI may pass parameters on the stack or in registers.
If I didn't follow the ARM ABI I could implement
accumulate:
cmp r0,#0
bne over
mov r0,#1
bx lr
over:
push {lr}
push {r0}
sub r0,#1
bl accumulate
pop {r1}
add r0,r0,r1
pop {lr}
bx lr
for grins an instruction set that uses the stack for most things not
registers
00000000 <_accumulate>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 10a6 mov r2, -(sp)
6: 1d42 0004 mov 4(r5), r2
a: 0206 bne 18 <_accumulate+0x18>
c: 15c0 0001 mov $1, r0
10: 1d42 fffc mov -4(r5), r2
14: 1585 mov (sp)+, r5
16: 0087 rts pc
18: 1080 mov r2, r0
1a: 0ac0 dec r0
1c: 1026 mov r0, -(sp)
1e: 09f7 ffde jsr pc, 0 <_accumulate>
22: 6080 add r2, r0
24: 65c6 0002 add $2, sp
28: 1d42 fffc mov -4(r5), r2
2c: 1585 mov (sp)+, r5
2e: 0087 rts pc
it does a stack frame thing
gets the n parameter from the stack
saves that n parameter to the stack
compares and branches if not zero
in the if zero case we set the return value to 1
clean up the stack and return
now in the if not zero case
make the first parameter n-1
call a function (ourself)
do the addition and return

Converting an array of hexadecimal to decimal numbers Intel 8086 Assembly Language

The following is my code. The block in hex2dec works successfully for converting a single hexadecimal number to decimal number. It would be really helpful if someone could point out where I was going wrong in the use of array. Thanks.
DATA SEGMENT
NUM DW 1234H,9H,15H
RES DB 3*10 DUP ('$','$','$')
SIZE DB 3
DATA ENDS
CODE SEGMENT
ASSUME DS:DATA, CS:CODE
START:
MOV AX, DATA
MOV DS,AX
MOV DI,0
LOOP3:
MOV AX,NUM[DI]
LEA SI,RES[DI]
CALL HEX2DEC
LEA DX,RES[DI]
MOV AH,9
INT 21H
INC DI
CMP DI,3
JL LOOP3
MOV AH,4CH ; end program
INT 21H
CODE ENDS
HEX2DEC PROC NEAR
MOV CX,0
MOV BX,10
LOOP1:
MOV DX,0
DIV BX
ADD DL,30H
PUSH DX
INC CX
CMP AX,9
JG LOOP1
ADD AL,30H
MOV [SI],AL
LOOP2:
POP AX
INC SI
MOV [SI],AL
LOOP LOOP2
RET
HEX2DEC ENDP
END START
MOV AX,NUM[DI]
LEA SI,RES[DI]
LEA DX,RES[DI]
You are treating DI as an array index like we use in any of the high level languages. In assembly programming we only use displacements aka offsets in the array.
In your program, since the NUM array is composed of words, you need to give the DI register successively the values 0, 2, and 4.
ADD DI, 2
CMP DI, 6
JB LOOP3
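To see why the step has to be 2, it can help to model the NUM array as raw bytes. The C sketch below is only an illustration: NUM_BYTES spells out the little-endian layout of NUM DW 1234H,9H,15H, and load_word is a hypothetical stand-in for MOV AX,NUM[DI].

```c
#include <stddef.h>
#include <stdint.h>

/* NUM DW 1234H,9H,15H as it sits in memory (8086 is little-endian). */
static const uint8_t NUM_BYTES[6] = { 0x34, 0x12, 0x09, 0x00, 0x15, 0x00 };

/* Model of MOV AX,NUM[DI]: load a 16-bit word at byte offset di.
   Assumes di + 1 is still inside the array. */
uint16_t load_word(size_t di)
{
    return (uint16_t)(NUM_BYTES[di] | (NUM_BYTES[di + 1] << 8));
}
```

With offsets 0, 2 and 4 this gives 1234h, 0009h and 0015h. With offset 1, which is what INC DI produces on the second pass, the load straddles two elements and yields 0912h.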
Also it would be best to not treat the RES as an array. Just consider it a buffer and always use it from the start.
RES DB 10 DUP (0)
...
LEA SI, RES
CALL HEX2DEC
LEA DX, RES
A better version of HEX2DEC avoids the ugly prefixed "0" on the single digit numbers:
HEX2DEC PROC NEAR
XOR CX, CX ; Same as MOV CX,0
MOV BX,10
LOOP1:
XOR DX, DX ; Same as MOV DX,0
DIV BX
ADD DL, 30H
PUSH DX
INC CX
TEST AX, AX
JNZ LOOP1
LOOP2:
POP AX
MOV [SI], AL
INC SI
LOOP LOOP2
MOV AL, "$" ; Add this to use DOS function 09h
MOV [SI], AL
RET
HEX2DEC ENDP
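The same algorithm reads naturally in C, which may help when checking the loop structure. This is a sketch, not a translation the assembler would accept: the hex2dec signature and the local stack array are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of the improved HEX2DEC: writes the decimal digits of value
   into buf, followed by the '$' terminator that DOS function 09h
   expects, and returns the number of digits. buf must hold at least
   6 bytes (up to 5 digits for a 16-bit value, plus '$'). */
size_t hex2dec(uint16_t value, char *buf)
{
    char stack[5];            /* stands in for the PUSH DX / POP AX pairs */
    size_t count = 0;         /* plays the role of CX */

    do {
        stack[count++] = (char)('0' + value % 10);  /* remainder -> digit */
        value /= 10;                                /* quotient stays in AX */
    } while (value != 0);     /* TEST AX,AX / JNZ LOOP1 */

    size_t n = count;
    for (size_t i = 0; i < n; i++)
        buf[i] = stack[--count];   /* pop: digits come back in reverse */
    buf[n] = '$';
    return n;
}
```

For the three values in NUM this produces "4660$", "9$" and "21$". Because the loop body runs at least once, a value of 0 still prints a single "0".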

ASM x86 How to get the pointer from an array properly? (16-bit TASM / DOS)

All right folks, let's hope this is an easy one: I need to access an array to achieve double-buffering (I use mode 13h) in 16-bit TASM. BUT: No matter if I use "OFFSET", "BYTE PTR [Array]", "BYTE PTR Array", or whatever I have tried already, the program reads/writes to the incorrect memory block, which is partly behind the actual start of the array.
Here's my (for now not really optimised and very messy) code:
.MODEL MEDIUM
.STACK
.DATA
XPos DW 0
YPos DB 0
Color DB 0
BoxX1 DW 0
BoxY1 DB 0
BoxX2 DW 0
BoxY2 DB 0
VPage DB 64010 DUP(0) ;TODO: Size *might* be incorrect.
PageSeg DW 0
.CODE
SetVGA13 PROC
MOV AX, 0013h ;Screen mode 13.
INT 10h ;Set screen mode to AX.
MOV AX, 0A000h ;Screen segment.
MOV ES, AX ;You can't affect segment registers
RET
ENDP
;-------DrawPixel---------------
; WORD XPos = x
; WORD YPos = y
; BYTE Color = colour
;-------------------------------
DrawPixel PROC
XOR AH, AH
MOV AL, [YPos]
MOV DX, 320
MUL DX
ADD AX, [XPos]
MOV DI, AX
MOV AL, [Color]
MOV ES, [PageSeg]
;ADD ES, DI
MOV ES:[DI],AL
;MOV ES:[DI],AL
RET
ENDP
DrawBox PROC
MOV CL, [BoxY1]
YLoop:
MOV BL, CL
PUSH CX
MOV CX, [BoxX1]
XLoop:
MOV [XPos], CX
MOV [YPos], BL
MOV [Color],CL
CALL DrawPixel
INC CX
CMP CX, [BoxX2]
JNZ XLoop
POP CX
INC CL
CMP CL, [BoxY2]
JNZ YLoop
RET
ENDP
WaitFrame PROC
PUSH DX
; Port #03DA contains VGA status
MOV DX, 03DAh
IN AL, DX
WaitRetrace:
; Bit 3 will be on if we're in retrace
TEST AL, 08h
JNZ WaitRetrace
EndRefresh:
IN AL, DX
TEST AL, 08h
JZ EndRefresh
POP DX
RET
ENDP
RestoreVideo PROC
; Return to text mode
MOV AX, 03h
INT 10h
RET
ENDP
ClearScreen PROC
XOR CX, CX
;MOV ES, [PageSeg]
ClearLoop:
MOV DI, CX
;MOV ES, [PageSeg]
MOV BX, OFFSET VPage
ADD BX, CX
MOV AL, [BX];VPage[DI];ES:[DI]
MOV [Color],AL
MOV AX, 0A000h
MOV ES, AX
MOV AL, [Color]
MOV ES:[DI],AL
INC CX
CMP CX, 64000
JNZ ClearLoop
RET
ENDP
Main:
;INITIALISE
MOV BX, OFFSET VPage
MOV [PageSeg],BX
CALL SetVGA13
;CALL MakePalette
MOV [BoxX1],33
MOV [BoxY1],33
MOV [BoxX2],99
MOV [BoxY2],99
;LOOP
GameLoop:
;DRAW
;CALL DrawBox
CALL ClearScreen
;CALL WaitFrame
;INPUT
MOV DX, 60h
IN AL, DX
CMP AL, 75
JNZ NotLeft
SUB [BoxX1],1
SUB [BoxX2],1
NotLeft:
IN AL, DX
CMP AL, 77
JNZ NotRight
ADD [BoxX1],1
ADD [BoxX2],1
NotRight:
CMP AL, 1
JNZ GameLoop
;END PROGRAM
Error:
;CALL ClearScreen
CALL RestoreVideo
MOV AH, 4Ch
INT 21h
END Main
This code shows a rainbow coloured box that you can move around with the left and right arrow keys.
;INITIALISE
MOV BX, OFFSET VPage
MOV [PageSeg],BX
That is my sad attempt to get the pointer to my buffer, but it doesn't return the correct one.
Sorry that my question was incomplete; I realised that as soon as I got out of bed, for some reason.
Although this is not how I would code things, I'll offer up a couple of suggestions that may get you closer to resolving your problems.
When your executable loads, the DS and ES registers initially point to the DOS PSP for your program. In your case you need to at least point DS to your DATA segment. The DATA segment that is filled in at run time by the DOS EXE loader can be referenced in your code by prefixing the segment name with an @ (AT) symbol. So you can replace this code:
;INITIALISE
MOV BX, OFFSET VPage
MOV [PageSeg],BX
With this:
;INITIALISE
MOV BX, @DATA ; Set up DS with our program's DATA segment
MOV DS, BX
MOV [PageSeg],BX ; VPage is in DATA segment, move segment to PageSeg
In your DrawPixel code you'll need to add the VPage offset to DI. Replace this code:
XOR AH, AH
MOV AL, [YPos]
MOV DX, 320
MUL DX
ADD AX, [XPos]
MOV DI, AX
MOV AL, [Color]
MOV ES, [PageSeg]
MOV ES:[DI],AL
with:
XOR AH, AH
MOV AL, [YPos]
MOV DX, 320
MUL DX
ADD AX, [XPos]
MOV DI, AX
MOV AL, [Color]
MOV ES, [PageSeg]
ADD DI, OFFSET VPage ; We need to add VPage's offset
; from beginning of PageSeg
MOV ES:[DI],AL
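The address arithmetic behind this change can be sketched in C. The helper names physical and pixel_offset are hypothetical, and the model assumes plain real-mode addressing with x < 320 and y < 200.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset. */
uint32_t physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Offset of pixel (x, y) inside a buffer with 320 bytes per row that
   starts at buffer_offset within its segment (OFFSET VPage above).
   Assumes x < 320, y < 200 and that the buffer fits in the segment. */
uint16_t pixel_offset(uint16_t buffer_offset, uint16_t x, uint8_t y)
{
    return (uint16_t)(buffer_offset + (uint16_t)y * 320u + x);
}
```

As an example, if VPage happened to sit at offset 000Ah, loading that value into ES as if it were a segment would make ES:0000 refer to physical address 000A0h, which has nothing to do with the buffer. Using the DATA segment in ES and adding the buffer offset to DI addresses the intended byte.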
Video mode 13h is 320x200 with 256 colors. The amount of video RAM needed is 320*200*1 = 64000 bytes. The buffer could be changed from:
VPage DB 64010 DUP(0) ;TODO: Size *might* be incorrect.
to:
VPage DB 64000 DUP(0)
With these changes the program seems to display a rainbow box and moves left and right on the screen with the arrow keys.
There may be other bugs but I don't know what you were trying to achieve. This code could be simplified greatly. My changes were the minimal ones to work with the way you coded your program and make it somewhat functional.
