Inversion of a bit - common-lisp

I haven't been able to find a common-lisp function (or macro) that simply inverts a bit. There are functions that operate on bit arrays, and logical functions that operate on numbers, but these would seem to involve extra steps for inversion of a single bit variable. A tentative solution is
(define-modify-macro invertf (&rest args)
(lambda (bit)
(if (= bit 0) 1 0)))
which works, but I was wondering if there is a more elegant or simpler solution.

There are a lot of operators for bitwise arithmetic:
For example you can use
(logxor bit 1)
This will give you 0 if the bit is 1 and 1 if the bit is 0:
(logxor 0 1) ; => 1
(logxor 1 1) ; => 0
Of course you can put the bit part as the second argument:
(logxor 1 bit)
Bonus:
Perhaps you can make a function and optimise it with a type declaration:
(defun invert (bit)
"Inverts a bit"
(declare (type bit bit))
(logxor bit 1))
After running (disassemble 'invert) I got the following results on SBCL:
WITHOUT type declaration:
; disassembly for INVERT
; Size: 56 bytes. Origin: #x10059E97FC
; 7FC: 498B4C2460 MOV RCX, [R12+96] ; thread.binding-stack-pointer
; no-arg-parsing entry point
; 801: 48894DF8 MOV [RBP-8], RCX
; 805: BF02000000 MOV EDI, 2
; 80A: 488BD3 MOV RDX, RBX
; 80D: 4883EC18 SUB RSP, 24
; 811: 48896C2408 MOV [RSP+8], RBP
; 816: 488D6C2408 LEA RBP, [RSP+8]
; 81B: B904000000 MOV ECX, 4
; 820: FF1425980B1020 CALL QWORD PTR [#x20100B98] ; TWO-ARG-XOR
; 827: 488B5DF0 MOV RBX, [RBP-16]
; 82B: 488BE5 MOV RSP, RBP
; 82E: F8 CLC
; 82F: 5D POP RBP
; 830: C3 RET
; 831: 0F0B10 BREAK 16 ; Invalid argument count trap
WITH type declaration
; disassembly for INVERT
; Size: 25 bytes. Origin: #x1005767CA9
; A9: 498B4C2460 MOV RCX, [R12+96] ; thread.binding-stack-pointer
; no-arg-parsing entry point
; AE: 48894DF8 MOV [RBP-8], RCX
; B2: 488BD3 MOV RDX, RBX
; B5: 4883F202 XOR RDX, 2
; B9: 488BE5 MOV RSP, RBP
; BC: F8 CLC
; BD: 5D POP RBP
; BE: C3 RET
; BF: 0F0B10 BREAK 16 ; Invalid argument count trap
It seems to me that the type declaration saved a few operations.
Let's measure the difference:
(defun invert (bit)
"Inverts a bit"
(declare (type bit bit))
(logxor bit 1))
(defun invert-no-dec (bit)
"Inverts a bit"
(logxor bit 1))
Now running:
(time (loop repeat 1000000 do (invert 1)))
Outputs:
Evaluation took:
0.007 seconds of real time
0.005164 seconds of total run time (0.005073 user, 0.000091 system)
71.43% CPU
14,060,029 processor cycles
0 bytes consed
While running:
(time (loop repeat 1000000 do (invert-no-dec 1)))
Outputs:
Evaluation took:
0.011 seconds of real time
0.011327 seconds of total run time (0.011279 user, 0.000048 system)
100.00% CPU
25,505,355 processor cycles
0 bytes consed
It seems the type declaration makes the function twice as fast. It must be noted that the call to invert will probably offset the gain in performance, unless you use (declaim (inline invert)). From the CLHS:
inline specifies that it is desirable for the compiler to produce inline calls to the functions named by function-names; that is, the code for a specified function-name should be integrated into the calling routine, appearing ``in line'' in place of a procedure call.

Related

How do I save the last value of my recursive function in MASM?

Essentially, I wrote a function that finds the GCD of two numbers recursively. I am trying to store the last value of the recursion call into firstVal, so I can print it to the screen. The way I have it written prints the all values of the recursion.
GCD proc firstVal: dword, secondVal: dword
mov edx, 0 ;clear edx for div
cmp secondVal, 0 ;if b is 0 (only way to exit recursion)
jz foundGCD ;then a is the gcd
mov eax, firstVal ;to find a mod b
div secondVal ;div a by b and check ah
mov ecx, secondVal ;old b in ecx
mov firstVal, ecx ;now store ecx in a
mov secondVal, edx ;store a mod b in b
invoke GCD, firstVal, secondVal ;recursion
foundGCD:
call crlf ;newline
mov eax, firstVal ;firstVal holds the gcd
call writedec ;I think the problem sits somewhere here?
call waitmsg
ret
GCD endp
How do I save the last value of the recursive proc?
The issue is that after the return from your recursive call, execution continues at the statement that follows the invoke GCD call: the foundGCD label, which will generate output.
What you should do after the invoke is handle the result of recursion, which in this case is a simple ret.
invoke GCD, firstVal, secondVal ;recursion
ret
With that, we can see that the recursion is tail recursion, so the recursive call can be replaced with an unconditional jump (left as an exercise for the reader).

Fibonacci with recursion steps shown indented / nested

This is the format that I need:
F(3) = F(2) + F(1) =
F(2) = (F1) + F(0) =
F(1) = 1
F(0) = 1
F(2) = 1
F(1) = 1
F(3) = 2
and this is my code, how am I going to do to get the format I want?
Please give me a hint or something that may help, thank you. I just start learning assembly language.
I only know how to show the first line like f()= the answer, but don't know how to show the process.
.data
fib1 BYTE "f(",0
fib2 BYTE ") + f(",0
fib3 BYTE ") = ",0
intVal DWORD ?
main PROC
mov edx, OFFSET fib1 ;show f(intVal)=
call WriteString
mov edx, intVal
call WriteDec
mov edx, OFFSET fib3
call WriteString
mov ecx, intVal-1
push intVal
call fib
add esp, 4
call WriteDec ;show result
call crlf
mov edx, OFFSET msg5 ;show goodbye msg
call WriteString
mov edx, OFFSET username
call WriteString
exit
main ENDP
fib PROC c
add ecx, 1
push ebp
mov ebp, esp
sub esp,4
mov eax, [ebp+8] ;get value
cmp eax,2 ;if ((n=1)or(n=2))
je S4
cmp eax,1
je S4
dec eax ;do fib(n-1)+fib(n-2)
push eax ;fib(n-1)
call fib
mov [ebp-4], eax ;store first result
dec dword ptr [esp] ;(n-1) -> (n-2)
call fib
add esp,4 ;clear
add eax,[ebp-4] ;add result and stored first result
jmp Quit
S4:
mov eax,1 ;start from 1,1
Quit:
mov esp,ebp ;restore esp
pop ebp ;restore ebp
ret
fib ENDP
END main
The code needs to output a "newline" at the end of each line of output, which could be a carriage return (00dh) followed by a linefeed (00ah) or just a linefeed (00ah) depending on the system (I don't know the irvine setup).
The indented lines should be printed from within the fib function, which means you have to save (push/pop stack) any registers that the print functions use.
The fib function needs to print out a variable number of spaces depending on the level of recursion, based on the text, 2 spaces per level of recursion.
The fib function needs to handle an input of 0 and return 0.
Note that the number of recursive calls to the fib function will be 2 * fib(n) - 1, assuming fib checks for fib(0), fib(1), fib(2) (more if it doesn't check for fib(2) ), which would be 5.94 billion calls for fib(47) = 2971215073, the max value for 32 bit unsigned integers. You may want to limit inputs to something like fib(10) = 55 or fib(11) = 89 or fib(12) = 144.

Attempting to implement this recursive Fibonacci sequence gives wrong outputs

I am trying to implement Fibonacci sequence in assembly by using recursion. This is my first time of trying to implement recursion in x86 Assembly.
The code compiles fine but it gives wrong outputs. The output for 1 is 1, output for 2 is 0, output for 3 is 1, output for 4 is 2, output for 5 is 3.
Only output it gives correct when you plug 5 in.
Is there something wrong with the algorithm?
.DATA
n1 DWORD ?
prompt1 BYTE "Please enter the first value", 0
prompt3 BYTE "No negative numbers!",0
string BYTE 40 DUP (?)
resultLbl BYTE "The Fib is: ", 0
fib BYTE 40 DUP (?), 0
.CODE
_MainProc PROC
input prompt1, string, 40
atod string
test eax, eax
js signed
mov n1, eax
jmp procName
signed:
output prompt3, string
jmp end1
procName:
mov eax, n1
push n1
call fib1
add esp,4
dtoa fib, eax
output resultLbl, fib
end1:
mov eax, 0
ret
_MainProc ENDP
Fib1 proc
PUSH EBP ; save previous frame pointer
MOV EBP, ESP ; set current frame pointer
MOV EAX, [EBP+8] ; get argument N
CMP EAX, 1 ; N<=1?
JA Recurse ; no, compute it recursively
MOV ECX, 1 ; yes, Fib(1)--> 1
JMP exit
Recurse:
DEC EAX ; = N-1
MOV EDX, EAX ; = N-1
PUSH EDX ; save N-1
PUSH EAX ; set argument = N-1
CALL Fib1 ; compute Fib(N-1) to ECX
POP EAX ; pop N-1
DEC EAX ; = N-2
PUSH ECX ; save Fib(N-1)
PUSH EAX ; set argument = N-2
CALL Fib1 ; compute Fib(N-2) to ECX
POP EAX ; = Fib(N-1)
ADD ECX, EAX ; = Fib(N-1)+FIB(N-2)
exit:
MOV ESP,EBP ; reset stack to value at function entry
POP EBP ; restore caller's frame pointer
RET
Fib1 endp
END

In my assembly program, I am trying to calculate the equation of (((((2^0 + 2^1) * 2^2) + 2^3) * 2^4) + 2^5)

In my 80x86 assembly program, I am trying to calculate the equation of
(((((2^0 + 2^1) * 2^2) + 2^3) * 2^4) + 2^5)...(2^n), where each even exponent is preceded by a multiplication and each odd exponent is preceded by a plus. I have code, but my result is continuously off from the desired result. When 5 is put in for n, the result should be 354, however I get 330.
Any and all advice will be appreciated.
.586
.model flat
include io.h
.stack 4096
.data
number dword ?
prompt byte "enter the power", 0
string byte 40 dup (?), 0
result byte 11 dup (?), 0
lbl_msg byte "answer", 0
bool dword ?
runtot dword ?
.code
_MainProc proc
input prompt, string, 40
atod string
push eax
call power
add esp, 4
dtoa result, eax
output lbl_msg, result
mov eax, 0
ret
_MainProc endp
power proc
push ebp
mov ebp, esp
push ecx
mov bool, 1 ;initial boolean value
mov eax, 1
mov runtot, 2 ;to keep a running total
mov ecx, [ebp + 8]
jecxz done
loop1:
add eax, eax ;power of 2
test bool, ecx ;test case for whether exp is odd/even
jnz oddexp ;if boolean is 1
add runtot, eax ;if boolean is 0
loop loop1
oddexp:
mov ebx, eax ;move eax to seperate register for multiplication
mov eax, runtot ;move existing total for multiplication
mul ebx ;multiplication of old eax to new eax/running total
loop loop1
done:
mov eax, runtot ;move final runtotal for print
pop ecx
pop ebp
ret
power endp
end
You're overcomplicating your code with static variables and branching.
These are powers of 2, you can (and should) just left-shift by n instead of actually constructing 2^n and using a mul instruction.
add eax,eax is the best way to multiply by 2 (aka left shift by 1), but it's not clear why you're doing that to the value in EAX at that point. It's either the multiply result (which you probably should have stored back into runtot after mul), or it's that left-shifted by 1 after an even iteration.
If you were trying to make a 2^i variable (with a strength reduction optimization to shift by 1 every iteration instead of shifting by i), then your bug is that you clobber EAX with mul, and its setup, in the oddexp block.
As Jester points out, if the first loop loop1 falls through, it will fall through into oddexp:. When you're doing loop tail duplication, make sure you consider where fall-through will go from each tail if the loop does end there.
There's also no point in having a static variable called bool which holds a 1, which you only use as an operand for test. That implies to human readers that the mask sometimes needs to change; test ecx,1 is a lot clearer as a way to check the low bit for zero / non-zero.
You also don't need static storage for runtot, just use a register (like EAX where you want the result eventually anyway). 32-bit x86 has 7 registers (not including the stack pointer).
This is how I'd do it. Untested, but I simplified a lot by unrolling by 2. Then the test for odd/even goes away because that alternating pattern is hard-coded into the loop structure.
We increment and compare/branch twice in the loop, so unrolling didn't get rid of the loop overhead, just changed one of the loop branches into an an if() break that can leave the loop from the middle.
This is not the most efficient way to write this; the increment and early-exit check in the middle of the loop could be optimized away by counting another counter down from n, and leaving the loop if there are less than 2 steps left. (Then sort it out in the epilogue)
;; UNTESTED
power proc ; fastcall calling convention: arg: ECX = unsigned int n
; clobbers: ECX, EDX
; returns: EAX
push ebx ; save a call-preserved register for scratch space
mov eax, 1 ; EAX = 2^0 running total / return value
test ecx,ecx
jz done
mov edx, ecx ; EDX = n
mov ecx, 1 ; ECX = i=1..n loop counter and shift count
loop1: ; do{ // unrolled by 2
; add 2^odd power
mov ebx, 1
shl ebx, cl ; 2^i ; xor ebx, ebx; bts ebx, ecx
add eax, ebx ; total += 2^i
inc ecx
cmp ecx, edx
jae done ; if (++i >= n) break;
; multiply by 2^even power
shl eax, cl ; total <<= i; // same as total *= (1<<i)
inc ecx ; ++i
cmp ecx, edx
jb loop1 ; }while(i<n);
done:
pop ebx
ret
I didn't check if the adding-odd-power step ever produces a carry into another bit. I think it doesn't, so it could be safe to implement it as bts eax, ecx (setting bit i). Effectively an OR instead of an ADD, but those are equivalent as long as the bit was previously cleared.
To make the asm look more like the source and avoid obscure instructions, I implemented 1<<i with shl to generate 2^i for total += 2^i, instead of a more-efficient-on-Intel xor ebx,ebx / bts ebx, ecx. (Variable-count shifts are 3 uops on Intel Sandybridge-family because of x86 flag-handling legacy baggage: flags have to be untouched if count=0). But that's worse on AMD Ryzen, where bts reg,reg is 2 uops but shl reg,cl is 1.
Update: i=3 does produce a carry when adding, so we can't OR or BTS the bit for that case. But optimizations are possible with more branching.
Using calc:
; define shiftadd_power(n) { local res=1; local i; for(i=1;i<=n;i++){ res+=1<<i; i++; if(i>n)break; res<<=i;} return res;}
shiftadd_power(n) defined
; base2(2)
; shiftadd_power(0)
1 /* 1 */
...
The first few outputs are:
n shiftadd(n) (base2)
0 1
1 11
2 1100
3 10100 ; 1100 + 1000 carries
4 101000000
5 101100000 ; 101000000 + 100000 set a bit that was previously 0
6 101100000000000
7 101100010000000 ; increasing amounts of trailing zero around the bit being flipped by ADD
Peeling the first 3 iterations would enable the BTS optimization, where you just set the bit instead of actually creating 2^n and adding.
Instead of just peeling them, we can just hard-code the starting point for i=3 for larger n, and optimize the code that figures out a return value for the n<3 case. I came up with a branchless formula for that based on right-shifting the 0b1100 bit-pattern by 3, 2, or 0.
Also note that for n>=18, the last shift count is strictly greater than half the width of the register, and the 2^i from odd i has no low bits. So only the last 1 or 2 iterations can affect the result. It boils down to either 1<<n for odd n, or 0 for even n. This simplifies to (n&1) << n.
For n=14..17, there are at most 2 bits set. Starting with result=0 and doing the last 3 or 4 iterations should be enough to get the correct total. In fact, for any n, we only need to do the last k iterations, where k is enough that the total shift count from even i is >= 32. Any bits set by earlier iterations are shifted out. (I didn't add a branch for this special case.)
;; UNTESTED
;; special cases for n<3, and for n>=18
;; enabling an optimization in the main loop (BTS instead of add)
;; funky overflow behaviour for n>31: large odd n gives 1<<(n%32) instead of 0
power_optimized proc
; fastcall calling convention: arg: ECX = unsigned int n <= 31
; clobbers: ECX, EDX
; returns: EAX
mov eax, 14h ; 0b10100 = power(3)
cmp ecx, 3
ja n_gt_3 ; goto main loop or fall through to hard-coded low n
je early_ret
;; n=0, 1, or 2 => 1, 3, 12 (0b1, 0b11, 0b1100)
mov eax, 0ch ; 0b1100 to be right-shifted by 3, 2, or 0
cmp ecx, 1 ; count=0,1,2 => CF,ZF,neither flag set
setbe cl ; count=0,1,2 => cl=1,1,0
adc cl, cl ; 3,2,0 (cl = cl+cl + (count<1) )
shr eax, cl
early_ret:
ret
large_n: ; odd n: result = 1<<n. even n: result = 0
mov eax, ecx
and eax, 1 ; n&1
shl eax, cl ; n>31 will wrap the shift count so this "fails"
ret ; if you need to return 0 for all n>31, add another check
n_gt_3:
;; eax = running total for i=3 already
cmp ecx, 18
jae large_n
mov edx, ecx ; EDX = n
mov ecx, 4 ; ECX = i=4..n loop counter and shift count
loop1: ; do{ // unrolled by 2
; multiply by 2^even power
shl eax, cl ; total <<= i; // same as total *= (1<<i)
inc edx
cmp ecx, edx
jae done ; if (++i >= n) break;
; add 2^odd power. i>3 so it won't already be set (thus no carry)
bts eax, edx ; total |= 1<<i;
inc ecx ; ++i
cmp ecx, edx
jb loop1 ; }while(i<n);
done:
ret
By using BTS to set a bit in EAX avoids needing an extra scratch register to construct 1<<i in, so we don't have to save/restore EBX. So that's a minor bonus saving.
Notice that this time the main loop is entered with i=4, which is even, instead of i=1. So I swapped the add vs. shift.
I still didn't get around to pulling the cmp/jae out of the middle of the loop. Something like lea edx, [ecx-2] instead of mov would set the loop-exit condition, but would require a check to not run the loop at all for i=4 or 5. For large-count throughput, many CPUs can sustain 1 taken + 1 not-taken branch every 2 clocks, not creating a worse bottleneck than the loop-carried dep chains (through eax and ecx). But branch-prediction will be different, and it uses more branch-order-buffer entries to record more possible roll-back / fast-recovery points.

Recersive proc in x8086 Assembly

;==============================================================================================
; recursive procedure:
; supersum(int x)
; returns 1*2 + 2*3 + 3*4 + ... + i*(i+1)
supersum PROC
push ebp ; start of every procedure
mov ebp, esp
push ebx
; Actual subproc calc here
mov eax, [ebp +8] ; returning eax to the original called value
cmp eax,1
je basecase
dec eax ; (n-1) ; its just a lie...
mov recnum, eax ; saving dec val for call
mul double ; 2(n-1)
mov rhs, eax ; the right hand side is finished
push recnum
call sumseries ; a_(n-1)
add esp, 4
definition:
add eax, rhs ; a_(n-1)+ 2(n-1)
jmp skp
basecase: ; a_1 = 0
mov eax, 0
mov rhs, 2
jmp definition
skp:
pop ebx
pop ebp
ret
supersum ENDP
;============================================================================
The explicit definition of the serries I'm trying to get is 1*2 + 2*3 + 3*4 ... + i(i+1).
I've got the math for it down and I found that the recursive definition for the series is a_n = a_(n-1) + 2(n-1) with a_1 = 0 as a base case. I'm trying to figure out why this code keeps giving me the even series: {2,4,6,8,10 ...} instead of the series I'm trying to calculate

Resources