I'm trying to implement functions and recursion in an ASM-like simplified language that has no procedures. Only simple jumpz, jump, push, pop, add, mul type commands.
Here are the commands:
(all variables and literals are integers)
set (sets the value of an already existing variable or declares and initializes a new variable) e.g. (set x 3)
push (pushes a value onto the stack. can be a variable or an integer) e.g. (push 3) or (push x)
pop (pops the stack into a variable) e.g. (pop x)
add (adds the second argument to the first argument) e.g. (add x 1) or (add x y)
mul (same as add but for multiplication)
jump (jumps to a specific line of code) e.g. (jump 3) would jump to line 3 or (jump x) would jump to the line # equal to the value of x
jumpz (jumps to a line number if the second argument is equal to zero) e.g. (jumpz 3 x) or (jumpz z x)
The variable 'IP' is the program counter and is equal to the line number of the current line of code being executed.
In this language, functions are blocks of code at the bottom of the program that are terminated by popping a value off the stack and jumping to that value. (using the stack as a call stack) Then the functions can be called anywhere else in the program by simply pushing the instruction pointer onto the stack and then jumping to the start of the function.
This works fine for non-recursive functions.
How could this be modified to handle recursion?
I've read that implementing recursion with a stack is a matter of pushing parameters and local variables onto the stack (and in this lower level case, also the instruction pointer I think)
I wouldn't be able to do something like x = f(n). To do this I'd have some variable y (that is also used in the body of f), set y equal to n, call f which assigns its "return value" to y and then jumps control back to where f was called from, where we then set x equal to y.
(a function that squares a number whose definition starts at line 36)
1 - set y 3
2 - set returnLine IP
3 - add returnLine 2
4 - push returnLine
5 - jump 36
6 - set x y
...
36 - mul y 2
37 - pop returnLine
38 - jump returnLine
This doesn't seem to lend itself to recursion. Arguments and intermediate values would need to go on the stack and I think multiple instances on the stack of the same address would result from recursive calls which is fine.
Next code raises the number "base" to the power "exponent" recursively in "John Smith Assembly":
1 - set base 2 ;RAISE 2 TO ...
2 - set exponent 4 ;... EXPONENT 4 (2^4=16).
3 - set result 1 ;MUST BE 1 IN ORDER TO MULTIPLY.
4 - set returnLine IP ;IP = 4.
5 - add returnLine 4 ;RETURNLINE = 4+4.
6 - push returnLine ;PUSH 8.
7 - jump 36 ;CALL FUNCTION.
.
.
.
;POWER FUNCTION.
36 - jumpz 43 exponent ;FINISH IF EXPONENT IS ZERO.
37 - mul result base ;RESULT = ( RESULT * BASE ).
38 - add exponent -1 ;RECURSIVE CONTROL VARIABLE.
39 - set returnLine IP ;IP = 39.
40 - add returnLine 4 ;RETURN LINE = 39+4.
41 - push returnLine ;PUSH 43.
42 - jump 36 ;RECURSIVE CALL.
43 - pop returnLine
44 - jump returnLine
;POWER END.
In order to test it, let's run it manually :
LINE | BASE EXPONENT RESULT RETURNLINE STACK
------|---------------------------------------
1 | 2
2 | 4
3 | 1
4 | 4
5 | 8
6 | 8
7 |
36 |
37 | 2
38 | 3
39 | 39
40 | 43
41 | 43(1)
42 |
36 |
37 | 4
38 | 2
39 | 39
40 | 43
41 | 43(2)
42 |
36 |
37 | 8
38 | 1
39 | 39
40 | 43
41 | 43(3)
42 |
36 |
37 | 16
38 | 0
39 | 39
40 | 43
41 | 43(4)
42 |
36 |
43 | 43(4)
44 |
43 | 43(3)
44 |
43 | 43(2)
44 |
43 | 43(1)
44 |
43 | 8
44 |
8 |
Edit : parameter for function now on stack (didn't run it manually) :
1 - set base 2 ;RAISE 2 TO ...
2 - set exponent 4 ;... EXPONENT 4 (2^4=16).
3 - set result 1 ;MUST BE 1 IN ORDER TO MULTIPLY.
4 - set returnLine IP ;IP = 4.
5 - add returnLine 7 ;RETURNLINE = 4+7.
6 - push returnLine ;PUSH 11.
7 - push base ;FIRST PARAMETER.
8 - push result ;SECOND PARAMETER.
9 - push exponent ;THIRD PARAMETER.
10 - jump 36 ;FUNCTION CALL.
...
;POWER FUNCTION.
36 - pop exponent ;THIRD PARAMETER.
37 - pop result ;SECOND PARAMETER.
38 - pop base ;FIRST PARAMETER.
39 - jumpz 49 exponent ;FINISH IF EXPONENT IS ZERO.
40 - mul result base ;RESULT = ( RESULT * BASE ).
41 - add exponent -1 ;RECURSIVE CONTROL VARIABLE.
42 - set returnLine IP ;IP = 42.
43 - add returnLine 7 ;RETURN LINE = 42+7.
44 - push returnLine ;PUSH 49.
45 - push base
46 - push result
47 - push exponent
48 - jump 36 ;RECURSIVE CALL.
49 - pop returnLine
50 - jump returnLine
;POWER END.
Your asm does provide enough facilities to implement the usual procedure call / return sequence. You can push a return address and jump as a call, and pop a return address (into a scratch location) and do an indirect jump to it as a ret. We can just make call and ret macros. (Except that generating the correct return address is tricky in a macro; we might need a label (push ret_addr), or something like set tmp, IP / add tmp, 4 / push tmp / jump target_function). In short, it's possible and we should wrap it up in some syntactic sugar so we don't get bogged down with that while looking at recursion.
With the right syntactic sugar, you can implement Fibonacci(n) in assembly that will actually assemble for both x86 and your toy machine.
You're thinking in terms of functions that modify static (global) variables. Recursion requires local variables so each nested call to the function has its own copy of local variables. Instead of having registers, your machine has (apparently unlimited) named static variables (like x and y). If you want to program it like MIPS or x86, and copy an existing calling convention, just use some named variables like eax, ebx, ..., or r0 .. r31 the way a register architecture uses registers.
Then you implement recursion the same way you do in a normal calling convention, where either the caller or callee use push / pop to save/restore a register on the stack so it can be reused. Function return values go in a register. Function args should go in registers. An ugly alternative would be to push them after the return address (creating a caller-cleans-the-args-from-the-stack calling convention), because you don't have a stack-relative addressing mode to access them the way x86 does (above the return address on the stack). Or you could pass return addresses in a link register, like most RISC call instructions (usually called bl or similar, for branch-and-link), instead of pushing it like x86's call. (So non-leaf callees have to push the incoming lr onto the stack themselves before making another call)
A (silly and slow) naively-implemented recursive Fibonacci might do something like:
int Fib(int n) {
if(n<=1) return n; // Fib(0) = 0; Fib(1) = 1
return Fib(n-1) + Fib(n-2);
}
## valid implementation in your toy language *and* x86 (AMD64 System V calling convention)
### Convenience macros for the toy asm implementation
# pretend that the call implementation has some way to make each return_address label unique so you can use it multiple times.
# i.e. just pretend that pushing a return address and jumping is a solved problem, however you want to solve it.
%define call(target) push return_address; jump target; return_address:
%define ret pop rettmp; jump rettmp # dedicate a whole variable just for ret, because we can
# As the first thing in your program, set eax, 0 / set ebx, 0 / ...
global Fib
Fib:
# input: n in edi.
# output: return value in eax
# if (n<=1) return n; // the asm implementation of this part isn't interesting or relevant. We know it's possible with some adds and jumps, so just pseudocode / handwave it:
... set eax, edi and ret if edi <= 1 ... # (not shown because not interesting)
add edi, -1
push edi # save n-1 for use after the recursive call
call Fib # eax = Fib(n-1)
pop edi # restore edi to *our* n-1
push eax # save the Fib(n-1) result across the call
add edi, -1
call Fib # eax = Fib(n-2)
pop edi # use edi as a scratch register to hold Fib(n-1) that we saved earlier
add eax, edi # eax = return value = Fib(n-1) + Fib(n-2)
ret
During a recursive call to Fib(n-1) (with n-1 in edi as the first argument), the n-1 arg is also saved on the stack, to be restored later. So each function's stack frame contains the state that needs to survive the recursive call, and a return address. This is exactly what recursion is all about on a machine with a stack.
Jose's example doesn't demonstrate this as well, IMO, because no state needs to survive the call for pow. So it just ends up pushing a return address and args, then popping the args, building up just some return addresses. Then at the end, follows the chain of return addresses. It could be extended to save local state across each nested call, doesn't actually illustrate it.
My implementation is a bit different from how gcc compiles the same C function for x86-64 (using the same calling convention of first arg in edi, ret value in eax). gcc6.1 with -O1 keeps it simple and actually does two recursive calls, as you can see on the Godbolt compiler explorer. (-O2 and especially -O3 do some aggressive transformations). gcc saves/restores rbx across the whole function, and keeps n in ebx so it's available after the Fib(n-1) call. (and keeps Fib(n-1) in ebx to survive the second call). The System V calling convention specifies rbx as a call-preserved register, but rbi as call-clobbered (and used for arg-passing).
Obviously you can implement Fib(n) much faster non-recursively, with O(n) time complexity and O(1) space complexity, instead of O(Fib(n)) time and space (stack usage) complexity. It makes a terrible example, but it is trivial.
Margaret's pastebin modified slightly to run in my interpreter for this language: (infinite loop problem, probably due to a transcription error on my part)
set n 3
push n
set initialCallAddress IP
add initialCallAddress 4
push initialCallAddress
jump fact
set finalValue 0
pop finalValue
print finalValue
jump 100
:fact
set rip 0
pop rip
pop n
push rip
set temp n
add n -1
jumpz end n
push n
set link IP
add link 4
push link
jump fact
pop n
mul temp n
:end
pop rip
push temp
jump rip
Successful transcription of Peter's Fibonacci calculator:
String[] x = new String[] {
//n is our input, which term of the sequence we want to calculate
"set n 5",
//temp variable for use throughout the program
"set temp 0",
//call fib
"set temp IP",
"add temp 4",
"push temp",
"jump fib",
//program is finished, prints return value and jumps to end
"print returnValue",
"jump end",
//the fib function, which gets called recursively
":fib",
//if this is the base case, then we assert that f(0) = f(1) = 1 and return from the call
"jumple base n 1",
"jump notBase",
":base",
"set returnValue n",
"pop temp",
"jump temp",
":notBase",
//we want to calculate f(n-1) and f(n-2)
//this is where we calculate f(n-1)
"add n -1",
"push n",
"set temp IP",
"add temp 4",
"push temp",
"jump fib",
//return from the call that calculated f(n-1)
"pop n",
"push returnValue",
//now we calculate f(n-2)
"add n -1",
"set temp IP",
"add temp 4",
"push temp",
"jump fib",
//return from call that calculated f(n-2)
"pop n",
"add returnValue n",
//this is where the fib function ultimately ends and returns to caller
"pop temp",
"jump temp",
//end label
":end"
};
Related
Forgive me if this is basic but I'm new to Elixir and trying to understand the language.
Say I have this code
defmodule Test do
def mult([]), do: 1
def mult([head | tail]) do
IO.puts "head is: #{head}"
IO.inspect tail
head*mult(tail)
end
end
when I run with this: Test.mult([1,5,10])
I get the following output
head is: 1
[5, 10]
head is: 5
'\n'
head is: 10
[]
50
But I'm struggling to understand what's going on because if I separately try to do this:
[h | t] = [1,5,10]
h * t
obviously I get an error, can someone explain what I'm missing?
Your function breaks down as:
mult([1 | [5, 10]])
1 * mult([5 | [10]])
1 * 5 * mult([10 | []])
1 * 5 * 10 * mult([])
1 * 5 * 10 * 1
50
The '\n' is actually [10] and is due to this: Elixir lists interpreted as char lists.
IO.inspect('\n', charlists: :as_lists)
[10]
Consider the arguments being passed to mult on each invocation.
When you do Test.mult([1,5,10]) first it checks the first function clause; that is can [1,5,10] be made to match []. Elixir will try to make the two expressions match if it can. In this case it cannot so then it tries the next function clause; can [1,5,10] be made to match [head|tail]? Yes it can so it assigns the first element (1) to head and the remaining elements [5,10] to tail. Then it recursively calls the function again but this time with the list [5,10]
Again it tries to match [5,10] to []; again this cannot be made to match so it drops down to [head|tail]. This time head is 5 and tail is 10. So again the function is called recursively with [10]
Again, can [10] be made to match []? No. So again it hits [head|tail] and assigns head = 10 and tail = [] (remember there's always an implied empty list at the end of every list).
Last go round; now [] definitely matches [] so it returns 1. Then the prior head * mult(tail) is evaluated (1 * 10) and that result is returned to the prior call on the stack. Evaluated again head (5) * mult(tail) (10) = 50. Final unwind of the stack head (1) * mult(tail) (50) = 50. Hence the overall value of the function is 50.
Remember that Elixir cannot totally evaluate any function call until it evaluates all subsequent function calls. So it hangs on to the intermediate values in order to compute the final value of the function.
Now consider your second code fragment in terms of pattern matching. [h|t] = [1,5,10] will assign h = 1 and t = [5,10]. h*t means 1 * [5,10]. Since those are fundamentally different types there's no inbuilt definition for multiplication in this case.
I found the following code that is meant to compute a^b (Cracking the Coding Interview, Ch. VI Big O).
What's the logic of return a * power(a, b - 1); ? Is it recursion
of some sort?
Is power a keyword here or just pseudocode?
int power(int a, int b)
{ if (b < 0) {
return a; // error
} else if (b == 0) {
return 1;
} else {
return a * power(a, b - 1);
}
}
Power is just the name of the function.
Ya this is RECURSION as we are representing a given problem in terms of smaller problem of similar type.
let a=2 and b=4 =calculate= power(2,4) -- large problem (original one)
Now we will represent this in terms of smaller one
i.e 2*power(2,4-1) -- smaller problem of same type power(2,3)
i.e a*power(a,b-1)
If's in the start are for controlling the base cases i.e when b goes below 1
This is a recursive function. That is, the function is defined in terms of itself, with a base case that prevents the recursion from running indefinitely.
power is the name of the function.
For example, 4^3 is equal to 4 * 4^2. That is, 4 raised to the third power can be calculated by multiplying 4 and 4 raised to the second power. And 4^2 can be calculated as 4 * 4^1, which can be simplified to 4 * 4, since the base case of the recursion specifies that 4^1 = 4. Combining this together, 4^3 = 4 * 4^2 = 4 * 4 * 4^1 = 4 * 4 * 4 = 64.
power here is just the name of the function that is defined and NOT a keyword.
Now, let consider that you want to find 2^10. You can write the same thing as 2*(2^9), as 2*2*(2^8), as 2*2*2*(2^7) and so on till 2*2*2*2*2*2*2*2*2*(2^1).
This is what a * power(a, b - 1) is doing in a recursive manner.
Here is a dry run of the code for finding 2^4:
The initial call to the function will be power(2,4), the complete stack trace is shown below
power(2,4) ---> returns a*power(2,3), i.e, 2*4=16
|
power(2,3) ---> returns a*power(2,2), i.e, 2*3=8
|
power(2,2) ---> returns a*power(2,1), i.e, 2*2=4
|
power(2,1) ---> returns a*power(2,0), i.e, 2*1=2
|
power(2,0) ---> returns 1 as b == 0
I am trying to find the smallest index containing the value i in a sorted array. If this i value is not present I want -1 to be returned. I am using a binary search recursive subroutine. The problem is that I can't really stop this recursion and I get lot of answers(one right and the rest wrong). And sometimes I get an error called "segmentation fault: 11" and I don't really get any results.
I've tried to delete this call random_number since I already have a sorted array in my main program, but it did not work.
program main
implicit none
integer, allocatable :: A(:)
real :: MAX_VALUE
integer :: i,j,n,s, low, high
real :: x
N= 10 !size of table
MAX_VALUE = 10
allocate(A(n))
s = 5 ! searched value
low = 1 ! lower limit
high = n ! highest limit
!generate random table of numbers (from 0 to 1000)
call Random_Seed
do i=1, N
call Random_Number(x) !returns random x >= 0 and <1
A(i)= anint(MAX_VALUE*x)
end do
call bubble(n,a)
print *,' '
write(*,10) (a(i),i=1,N)
10 format(10i6)
call bsearch(A,n,s,low,high)
deallocate(A)
end program main
The sort subroutine:
subroutine sort(p,q)
implicit none
integer(kind=4), intent(inout) :: p, q
integer(kind=4) :: temp
if (p>q) then
temp = p
p = q
q = temp
end if
return
end subroutine sort
The bubble subroutine:
subroutine bubble(n,arr)
implicit none
integer(kind=4), intent(inout) :: n
integer(kind=4), intent(inout) :: arr(n)
integer(kind=4) :: sorted(n)
integer :: i,j
do i=1, n
do j=n, i+1, -1
call sort(arr(j-1), arr(j))
end do
end do
return
end subroutine bubble
recursive subroutine bsearch(b,n,i,low,high)
implicit none
integer(kind=4) :: b(n)
integer(kind=4) :: low, high
integer(kind=4) :: i,j,x,idx,n
real(kind=4) :: r
idx = -1
call random_Number(r)
x = low + anint((high - low)*r)
if (b(x).lt.i) then
low = x + 1
call bsearch(b,n,i,low,high)
else if (b(x).gt.i) then
high = x - 1
call bsearch(b,n,i,low,high)
else
do j = low, high
if (b(j).eq.i) then
idx = j
exit
end if
end do
end if
! Stop if high = low
if (low.eq.high) then
return
end if
print*, i, 'found at index ', idx
return
end subroutine bsearch
The goal is to get the same results as my linear search. But I'am getting either of these answers.
Sorted table:
1 1 2 4 5 5 6 7 8 10
5 found at index 5
5 found at index -1
5 found at index -1
or if the value is not found
2 2 3 4 4 6 6 7 8 8
Segmentation fault: 11
There are a two issues causing your recursive search routine bsearch to either stop with unwanted output, or result in a segmentation fault. Simply following the execution logic of your program at the hand of the examples you provided, elucidate the matter:
1) value present and found, unwanted output
First, consider the first example where array b contains the value i=5 you are searching for (value and index pointed out with || in the first two lines of the code block below). Using the notation Rn to indicate the the n'th level of recursion, L and H for the lower- and upper bounds and x for the current index estimate, a given run of your code could look something like this:
b(x): 1 1 2 4 |5| 5 6 7 8 10
x: 1 2 3 4 |5| 6 7 8 9 10
R0: L x H
R1: Lx H
R2: L x H
5 found at index 5
5 found at index -1
5 found at index -1
In R0 and R1, the tests b(x).lt.i and b(x).gt.i in bsearch work as intended to reduce the search interval. In R2 the do-loop in the else branch is executed, idx is assigned the correct value and this is printed - as intended. However, a return statement is now encountered which will return control to the calling program unit - in this case that is first R1(!) where execution will resume after the if-else if-else block, thus printing a message to screen with the initial value of idx=-1. The same happens upon returning from R0 to the main program. This explains the (unwanted) output you see.
2) value not present, segmentation fault
Secondly, consider the example resulting in a segmentation fault. Using the same notation as before, a possible run could look like this:
b(x): 2 2 3 4 4 6 6 7 8 8
x: 1 2 3 4 5 6 7 8 9 10
R0: L x H
R1: L x H
R2: L x H
R3: LxH
R4: H xL
.
.
.
Segmentation fault: 11
In R0 to R2 the search interval is again reduced as intended. However, in R3 the logic fails. Since the search value i is not present in array b, one of the .lt. or .gt. tests will always evaluate to .true., meaning that the test for low .eq. high to terminate a search is never reached. From this point onwards, the logic is no longer valid (e.g. high can be smaller than low) and the code will continue deepening the level of recursion until the call stack gets too big and a segmentation fault occurs.
These explained the main logical flaws in the code. A possible inefficiency is the use of a do-loop to find the lowest index containing a searched for value. Consider a case where the value you are looking for is e.g. i=8, and that it appears in the last position in your array, as below. Assume further that by chance, the first guess for its position is x = high. This implies that your code will immediately branch to the do-loop, where in effect a linear search is done of very nearly the entire array, to find the final result idx=9. Although correct, the intended binary search rather becomes a linear search, which could result in reduced performance.
b(x): 2 2 3 4 4 6 6 7 |8| 8
x: 1 2 3 4 5 6 7 8 |9| 10
R0: L xH
8 found at index 9
Fixing the problems
At the very least, you should move the low .eq. high test to the start of the bsearch routine, so that recursion stops before invalid bounds can be defined (you then need an additional test to see if the search value was found or not). Also, notify about a successful search right after it occurs, i.e. after the equality test in your do-loop, or the additional test just mentioned. This still does not address the inefficiency of a possible linear search.
All taken into account, you are probably better off reading up on algorithms for finding a "leftmost" index (e.g. on Wikipedia or look at a tried and tested implementation - both examples here use iteration instead of recursion, perhaps another improvement, but the same principles apply) and adapt that to Fortran, which could look something like this (only showing new code, ...refer to existing code in your examples):
module mod_search
implicit none
contains
! Function that uses recursive binary search to look for `key` in an
! ordered `array`. Returns the array index of the leftmost occurrence
! of `key` if present in `array`, and -1 otherwise
function search_ordered (array, key) result (idx)
integer, intent(in) :: array(:)
integer, intent(in) :: key
integer :: idx
! find left most array index that could possibly hold `key`
idx = binary_search_left(1, size(array))
! if `key` is not found, return -1
if (array(idx) /= key) then
idx = -1
end if
contains
! function for recursive reduction of search interval
recursive function binary_search_left(low, high) result(idx)
integer, intent(in) :: low, high
integer :: idx
real :: r
if (high <= low ) then
! found lowest possible index where target could be
idx = low
else
! new guess
call random_number(r)
idx = low + floor((high - low)*r)
! alternative: idx = low + (high - low) / 2
if (array(idx) < key) then
! continue looking to the right of current guess
idx = binary_search_left(idx + 1, high)
else
! continue looking to the left of current guess (inclusive)
idx = binary_search_left(low, idx)
end if
end if
end function binary_search_left
end function search_ordered
! Move your routines into a module
subroutine sort(p,q)
...
end subroutine sort
subroutine bubble(n,arr)
...
end subroutine bubble
end module mod_search
! your main program
program main
use mod_search, only : search_ordered, sort, bubble ! <---- use routines from module like so
implicit none
...
! Replace your call to bsearch() with the following:
! call bsearch(A,n,s,low,high)
i = search_ordered(A, s)
if (i /= -1) then
print *, s, 'found at index ', i
else
print *, s, 'not found!'
end if
...
end program main
Finally, depending on your actual use case, you could also just consider using the Fortran intrinsic procedure minloc saving you the trouble of implementing all this functionality yourself. In this case, it can be done by making the following modification in your main program:
! i = search_ordered(a, s) ! <---- comment out this line
j = minloc(abs(a-s), dim=1) ! <---- replace with these two
i = merge(j, -1, a(j) == s)
where j returned from minloc will be the lowest index in the array a where s may be found, and merge is used to return j when a(j) == s and -1 otherwise.
While going through julia, I wanted to have a functionality similar to python's dis module.
Going through over the net, I found out that the Julia community have worked over this issue and given these (https://github.com/JuliaLang/julia/issues/218)
finfer -> code_typed
methods(function, types) -> code_lowered
disassemble(function, types, true) -> code_native
disassemble(function, types, false) -> code_llvm
I have tried these personally using the Julia REPL, but I quite seem to find it hard to understand.
In Python, I can disassemble a function like this.
>>> import dis
>>> dis.dis(lambda x: 2*x)
1 0 LOAD_CONST 1 (2)
3 LOAD_FAST 0 (x)
6 BINARY_MULTIPLY
7 RETURN_VALUE
>>>
Can anyone who has worked with these help me understand them more? Thanks.
The standard CPython implementation of Python parses source code and does some pre-processing and simplification of it – aka "lowering" – transforming it to a machine-friendly, easy-to-interpret format called "bytecode". This is what is displayed when you "disassemble" a Python function. This code is not executable by the hardware – it is "executable" by the CPython interpreter. CPython's bytecode format is fairly simple, partly because that's what interpreters tend to do well with – if the bytecode is too complex, it slows down the interpreter – and partly because the Python community tends to put a high premium on simplicity, sometimes at the cost of high performance.
Julia's implementation is not interpreted, it is just-in-time (JIT) compiled. This means that when you call a function, it is transformed to machine code which is executed directly by the native hardware. This process is quite a bit more complex than the parsing and lowering to bytecode that Python does, but in exchange for that complexity, Julia gets its hallmark speed. (The PyPy JIT for Python is also much more complex than CPython but also typically much faster – increased complexity is a fairly typical cost for speed.) The four levels of "disassembly" for Julia code give you access to the representation of a Julia method implementation for particular argument types at different stages of the transformation from source code to machine code. I'll use the following function which computes the next Fibonacci number after its argument as an example:
function nextfib(n)
a, b = one(n), one(n)
while b < n
a, b = b, a + b
end
return b
end
julia> nextfib(5)
5
julia> nextfib(6)
8
julia> nextfib(123)
144
Lowered code. The #code_lowered macro displays code in a format that is the closest to Python byte code, but rather than being intended for execution by an interpreter, it's intended for further transformation by a compiler. This format is largely internal and not intended for human consumption. The code is transformed into "single static assignment" form in which "each variable is assigned exactly once, and every variable is defined before it is used". Loops and conditionals are transformed into gotos and labels using a single unless/goto construct (this is not exposed in user-level Julia). Here's our example code in lowered form (in Julia 0.6.0-pre.beta.134, which is just what I happen to have available):
julia> #code_lowered nextfib(123)
CodeInfo(:(begin
nothing
SSAValue(0) = (Main.one)(n)
SSAValue(1) = (Main.one)(n)
a = SSAValue(0)
b = SSAValue(1) # line 3:
7:
unless b < n goto 16 # line 4:
SSAValue(2) = b
SSAValue(3) = a + b
a = SSAValue(2)
b = SSAValue(3)
14:
goto 7
16: # line 6:
return b
end))
You can see the SSAValue nodes and unless/goto constructs and label numbers. This is not that hard to read, but again, it's also not really meant to be easy for human consumption. Lowered code doesn't depend on the types of the arguments, except in as far as they determine which method body to call – as long as the same method is called, the same lowered code applies.
Typed code. The #code_typed macro presents a method implementation for a particular set of argument types after type inference and inlining. This incarnation of the code is similar to the lowered form, but with expressions annotated with type information and some generic function calls replaced with their implementations. For example, here is the type code for our example function:
julia> #code_typed nextfib(123)
CodeInfo(:(begin
a = 1
b = 1 # line 3:
4:
unless (Base.slt_int)(b, n)::Bool goto 13 # line 4:
SSAValue(2) = b
SSAValue(3) = (Base.add_int)(a, b)::Int64
a = SSAValue(2)
b = SSAValue(3)
11:
goto 4
13: # line 6:
return b
end))=>Int64
Calls to one(n) have been replaced with the literal Int64 value 1 (on my system the default integer type is Int64). The expression b < n has been replaced with its implementation in terms of the slt_int intrinsic ("signed integer less than") and the result of this has been annotated with return type Bool. The expression a + b has been also replaced with its implementation in terms of the add_int intrinsic and its result type annotated as Int64. And the return type of the entire function body has been annotated as Int64.
Unlike lowered code, which depends only on argument types to determine which method body is called, the details of typed code depend on argument types:
julia> #code_typed nextfib(Int128(123))
CodeInfo(:(begin
SSAValue(0) = (Base.sext_int)(Int128, 1)::Int128
SSAValue(1) = (Base.sext_int)(Int128, 1)::Int128
a = SSAValue(0)
b = SSAValue(1) # line 3:
6:
unless (Base.slt_int)(b, n)::Bool goto 15 # line 4:
SSAValue(2) = b
SSAValue(3) = (Base.add_int)(a, b)::Int128
a = SSAValue(2)
b = SSAValue(3)
13:
goto 6
15: # line 6:
return b
end))=>Int128
This is the typed version of the nextfib function for an Int128 argument. The literal 1 must be sign extended to Int128 and the result types of operations are of type Int128 instead of Int64. The typed code can be quite different if the implementation of a type is considerably different. For example nextfib for BigInts is significantly more involved than for simple "bits types" like Int64 and Int128:
julia> #code_typed nextfib(big(123))
CodeInfo(:(begin
$(Expr(:inbounds, false))
# meta: location number.jl one 164
# meta: location number.jl one 163
# meta: location gmp.jl convert 111
z#_5 = $(Expr(:invoke, MethodInstance for BigInt(), :(Base.GMP.BigInt))) # line 112:
$(Expr(:foreigncall, (:__gmpz_set_si, :libgmp), Void, svec(Ptr{BigInt}, Int64), :(&z#_5), :(z#_5), 1, 0))
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
$(Expr(:inbounds, false))
# meta: location number.jl one 164
# meta: location number.jl one 163
# meta: location gmp.jl convert 111
z#_6 = $(Expr(:invoke, MethodInstance for BigInt(), :(Base.GMP.BigInt))) # line 112:
$(Expr(:foreigncall, (:__gmpz_set_si, :libgmp), Void, svec(Ptr{BigInt}, Int64), :(&z#_6), :(z#_6), 1, 0))
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
a = z#_5
b = z#_6 # line 3:
26:
$(Expr(:inbounds, false))
# meta: location gmp.jl < 516
SSAValue(10) = $(Expr(:foreigncall, (:__gmpz_cmp, :libgmp), Int32, svec(Ptr{BigInt}, Ptr{BigInt}), :(&b), :(b), :(&n), :(n)))
# meta: pop location
$(Expr(:inbounds, :pop))
unless (Base.slt_int)((Base.sext_int)(Int64, SSAValue(10))::Int64, 0)::Bool goto 46 # line 4:
SSAValue(2) = b
$(Expr(:inbounds, false))
# meta: location gmp.jl + 258
z#_7 = $(Expr(:invoke, MethodInstance for BigInt(), :(Base.GMP.BigInt))) # line 259:
$(Expr(:foreigncall, ("__gmpz_add", :libgmp), Void, svec(Ptr{BigInt}, Ptr{BigInt}, Ptr{BigInt}), :(&z#_7), :(z#_7), :(&a), :(a), :(&b), :(b)))
# meta: pop location
$(Expr(:inbounds, :pop))
a = SSAValue(2)
b = z#_7
44:
goto 26
46: # line 6:
return b
end))=>BigInt
This reflects the fact that operations on BigInts are pretty complicated and involve memory allocation and calls to the external GMP library (libgmp).
LLVM IR. Julia uses the LLVM compiler framework to generate machine code. LLVM defines an assembly-like language which it uses as a shared intermediate representation (IR) between different compiler optimization passes and other tools in the framework. There are three isomorphic forms of LLVM IR:
A binary representation that is compact and machine readable.
A textual representation that is verbose and somewhat human readable.
An in-memory representation that is generated and consumed by LLVM libraries.
Julia uses LLVM's C++ API to construct LLVM IR in memory (form 3) and then call some LLVM optimization passes on that form. When you do #code_llvm you see the LLVM IR after generation and some high-level optimizations. Here's LLVM code for our ongoing example:
julia> #code_llvm nextfib(123)
define i64 #julia_nextfib_60009(i64) #0 !dbg !5 {
top:
br label %L4
L4: ; preds = %L4, %top
%storemerge1 = phi i64 [ 1, %top ], [ %storemerge, %L4 ]
%storemerge = phi i64 [ 1, %top ], [ %2, %L4 ]
%1 = icmp slt i64 %storemerge, %0
%2 = add i64 %storemerge, %storemerge1
br i1 %1, label %L4, label %L13
L13: ; preds = %L4
ret i64 %storemerge
}
This is the textual form of the in-memory LLVM IR for the nextfib(123) method implementation. LLVM is not easy to read – it's not intended to be written or read by people most of the time – but it is thoroughly specified and documented. Once you get the hang of it, it's not hard to understand. This code jumps to the label L4 and initializes the "registers" %storemerge1 and %storemerge with the i64 (LLVM's name for Int64) value 1 (their values are derived differently when jumped to from different locations – that's what the phi instruction does). It then does a icmp slt comparing %storemerge with register %0 – which holds the argument untouched for the entire method execution – and saves the comparison result into the register %1. It does an add i64 on %storemerge and %storemerge1 and saves the result into register %2. If %1 is true, it branches back to L4 and otherwise it branches to L13. When the code loops back to L4 the register %storemerge1 gets the previous values of %storemerge and %storemerge gets the previous value of %2.
Native code. Since Julia executes native code, the last form a method implementation takes is what the machine actually executes. This is just binary code in memory, which is rather hard to read, so long ago people invented various forms of "assembly language" which represent instructions and registers with names and have some amount of simple syntax to help express what instructions do. In general, assembly language remains close to one-to-one correspondence with machine code, in particular, one can always "disassemble" machine code into assembly code. Here's our example:
julia> #code_native nextfib(123)
.section __TEXT,__text,regular,pure_instructions
Filename: REPL[1]
pushq %rbp
movq %rsp, %rbp
movl $1, %ecx
movl $1, %edx
nop
L16:
movq %rdx, %rax
Source line: 4
movq %rcx, %rdx
addq %rax, %rdx
movq %rax, %rcx
Source line: 3
cmpq %rdi, %rax
jl L16
Source line: 6
popq %rbp
retq
nopw %cs:(%rax,%rax)
This is on an Intel Core i7, which is in the x86_64 CPU family. It only uses standard integer instructions, so it doesn't matter beyond that what the architecture is, but you can get different results for some code depending on the specific architecture of your machine, since JIT code can be different on different systems. The pushq and movq instructions at the beginning are a standard function preamble, saving registers to the stack; similarly, popq restores the registers and retq returns from the function; nopw is a 2-byte instruction that does nothing, included just to pad the length of the function. So the meat of the code is just this:
movl $1, %ecx
movl $1, %edx
nop
L16:
movq %rdx, %rax
Source line: 4
movq %rcx, %rdx
addq %rax, %rdx
movq %rax, %rcx
Source line: 3
cmpq %rdi, %rax
jl L16
The movl instructions at the top initialize registers with 1 values. The movq instructions move values between registers and the addq instruction adds registers. The cmpq instruction compares two registers and jl either jumps back to L16 or continues to return from the function. This handful of integer machine instructions in a tight loop is exactly what executes when your Julia function call runs, presented in slightly more pleasant human-readable form. It's easy to see why it runs fast.
If you're interested in JIT compilation in general as compared to interpreted implementations, Eli Bendersky has a great pair of blog posts where he goes from a simple interpreter implementation of a language to a (simple) optimizing JIT for the same language:
http://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-1-an-interpreter/
http://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit.html
What happens for a global variable when running in the parallel mode?
I have a global variable, "to_be_optimized_parameterIndexSet", which is a vector of indexes that should be optimized using gamultiobj and I have set its value only in the main script(nowhere else).
My code works properly in serial mode but when I switch to parallel mode (using "matlabpool open" and setting proper values for 'gaoptimset' ) the mentioned global variable becomes empty (=[]) in the fitness function and causes this error:
??? Error using ==> parallel_function at 598
Error in ==> PF_gaMultiFitness at 15 [THIS LINE: constants(to_be_optimized_parameterIndexSet) = individual;]
In an assignment A(I) = B, the number of elements in B and
I must be the same.
Error in ==> fcnvectorizer at 17
parfor (i = 1:popSize)
Error in ==> gamultiobjMakeState at 52
Score =
fcnvectorizer(state.Population(initScoreProvided+1:end,:),FitnessFcn,numObj,options.SerialUserFcn);
Error in ==> gamultiobjsolve at 11
state = gamultiobjMakeState(GenomeLength,FitnessFcn,output.problemtype,options);
E rror in ==> gamultiobj at 238
[x,fval,exitFlag,output,population,scores] = gamultiobjsolve(FitnessFcn,nvars, ...
Error in ==> PF_GA_mainScript at 136
[x, fval, exitflag, output] = gamultiobj(#(individual)PF_gaMultiFitness(individual, initialConstants), ...
Caused by:
Failure in user-supplied fitness function evaluation. GA cannot continue.
I have checked all the code to make sure I've not changed this global variable everywhere else.
I have a quad-core processor.
Where is the bug? any suggestion?
EDIT 1: The MATLAB code in the main script:
clc
clear
close all
format short g
global simulation_duration % PF_gaMultiFitness will use this variable
global to_be_optimized_parameterIndexSet % PF_gaMultiFitness will use this variable
global IC stimulusMoment % PF_gaMultiFitness will use these variables
[initialConstants IC] = oldCICR_Constants; %initialize state
to_be_optimized_parameterIndexSet = [21 22 23 24 25 26 27 28 17 20];
LB = [ 0.97667 0.38185 0.63529 0.046564 0.23207 0.87484 0.46014 0.0030636 0.46494 0.82407 ];
UB = [1.8486 0.68292 0.87129 0.87814 0.66982 1.3819 0.64562 0.15456 1.3717 1.8168];
PopulationSize = input('Population size? ') ;
GaTimeLimit = input('GA time limit? (second) ');
matlabpool open
nGenerations = inf;
options = gaoptimset('PopulationSize', PopulationSize, 'TimeLimit',GaTimeLimit, 'Generations', nGenerations, ...
'Vectorized','off', 'UseParallel','always');
[x, fval, exitflag, output] = gamultiobj(#(individual)PF_gaMultiFitness(individual, initialConstants), ...
length(to_be_optimized_parameterIndexSet),[],[],[],[],LB,UB,options);
matlabpool close
some other piece of code to show the results...
The MATLAB code of the fitness function, "PF_gaMultiFitness":
function objectives =PF_gaMultiFitness(individual, constants)
global simulation_duration IC stimulusMoment to_be_optimized_parameterIndexSet
%THIS FUNCTION RETURNS MULTI OBJECTIVES AND PUTS EACH OBJECTIVE IN A COLUMN
constants(to_be_optimized_parameterIndexSet) = individual;
[smcState , ~, Time]= oldCICR_CompCore(constants, IC, simulation_duration,2);
targetValue = 1; % [uM]desired [Ca]i peak concentration
afterStimulus = smcState(Time>stimulusMoment,14); % values of [Ca]i after stimulus
peak_Ca_value = max(afterStimulus); % smcState(:,14) is [Ca]i
if peak_Ca_value < 0.8 * targetValue
objectives(1,1) = inf;
else
objectives(1, 1) = abs(peak_Ca_value - targetValue);
end
pkIDX = peakFinder(afterStimulus);
nPeaks = sum(pkIDX);
if nPeaks > 1
peakIndexes = find(pkIDX);
period = Time(peakIndexes(2)) - Time(peakIndexes(1));
objectives(1,2) = 1e5* 1/period;
elseif nPeaks == 1 && peak_Ca_value > 0.8 * targetValue
objectives(1,2) = 0;
else
objectives(1,2) = inf;
end
end
Global variables do not get passed from the MATLAB client to the workers executing the body of the PARFOR loop. The only data that does get sent into the loop body are variables that occur in the text of the program. This blog entry might help.
it really depends on the type of variable you're putting in. i need to see more of your code to point out the flaw, but in general it is good practice to avoid assuming complicated variables will be passed to each worker. In other words anything more then a primitive may need to be reinitialized inside a parallel routine or may need have specific function calls (like using feval for function handles).
My advice: RTM