Recursion in MIPS with arrays

I'm a beginner using MARS to program in MIPS assembly. I started studying recursion and wrote a little method in Java that takes an array and an index as input and recursively sums all of the array's elements. But I don't know how to write it in MIPS. Can someone help me?
public int recursiveSum(int i, int[] array)
{
    if (i == array.length - 1)
        return array[i];
    return array[i] + recursiveSum(i + 1, array);
}

When you create a recursive function in mips, you need to save the return address (register $ra) on a stack frame you create for the function call's duration, restoring it upon exit (and removing/popping the frame).
Simple functions can get by with just the save/restore of $ra, but recursive functions typically have to save/restore some of their arguments as well.
Aside from the mips instruction set reference, one of the things you'll want to consult is the mips ABI document as it lays out which registers get used in which context for which purpose. This is particularly important when writing recursive code.
Here's a program that implements your function, with unit tests. It has two different recursive function implementations to help illustrate some of the tricks you can do in asm.
Notice in the top comment block your original function and a modified one that removes the "non-standard" early return. That is, when using a high-level language as pseudo-code for asm, have just a single return statement, because that's how the asm gets implemented. It's important to keep things simple in the HLL: the simpler it is, the more literal the translation to asm.
Also, there are some things you can do in asm [mips in particular] that can't be done/modeled in an HLL. mips has 32 registers. Imagine putting invariant values [such as the address of your array] in "global" registers that are preserved across all calls. The address is in a register and not on the stack or global memory, so it's very fast.
There is no HLL equivalent for this. BTW, if you know C, I'd do the pseudo-code in that rather than java. C has pointers [and explicit memory addresses], whereas java does not, and pointers are the lifeblood of asm.
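As an illustration of the single-return restructuring, here is a Python sketch (hypothetical, mirroring the Java; not part of the original answer) of both shapes:

```python
def recursive_sum(i, array):
    # original shape: two return statements
    if i == len(array) - 1:
        return array[i]
    return array[i] + recursive_sum(i + 1, array)

def recursive_sum_single_return(i, array):
    # restructured shape: a single return statement, which maps
    # directly onto a single function epilog in the asm
    if i == len(array) - 1:
        ret = 0
    else:
        ret = recursive_sum_single_return(i + 1, array)
    return array[i] + ret
```

Both compute the same value; only the control-flow shape differs, and the second shape is the one the asm below follows.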
Anyway, here's the code. It is well annotated, so, hopefully, it's easy to read:
# recursive sum
#
##+
# // original function:
# public int recursiveSum(int i, int[] array)
# {
#
# if (i == (array.length - 1))
# return array[i];
#
# return array[i] + recursiveSum(i + 1,array);
# }
#
# // function as it could be implemented:
# public int recursiveSum(int i, int[] array)
# {
# int ret;
#
# if (i == (array.length - 1))
# ret = 0;
# else
# ret = recursiveSum(i + 1,array);
#
# return array[i] + ret;
# }
#
# // function as it _was_ implemented:
# public int[] array; // in a global register
#
# public int recursiveSum(int i)
# {
# int ret;
#
# if (i == (array.length - 1))
# ret = 0;
# else
# ret = recursiveSum(i + 1);
#
# return array[i] + ret;
# }
##-
.data
sdata:
array:
.word 1,2,3,4,5,6,7,8,9,10
.word 11,12,13,14,15,16,17,18,19,20
.word 11,22,23,24,25,26,27,28,29,30
arrend:
msg_simple: .asciiz "simple sum is: "
msg_recur1: .asciiz "recursive sum (method 1) is: "
msg_recur2: .asciiz "recursive sum (method 2) is: "
msg_match: .asciiz "difference is: "
msg_nl: .asciiz "\n"
.text
.globl main
# main -- main program
#
# registers:
# s0 -- array count (i.e. number of integers/words)
# s1 -- array pointer
# s2 -- array count - 1
#
# s3 -- simple sum
# s4 -- recursive sum
main:
la $s1,array # get array address
la $s0,arrend # get address of array end
subu $s0,$s0,$s1 # get byte length of array
srl $s0,$s0,2 # count = len / 4
subi $s2,$s0,1 # save count - 1
# get simple sum for reference
jal sumsimple # get simple sum
move $s3,$v0 # save for compare
# show the simple sum results
la $a0,msg_simple
move $a1,$v0
jal showsum
# get recursive sum (method 1)
li $a0,0 # i = 0
jal sumrecurs1 # get recursive sum
move $s4,$v0 # save for compare
# show the recursive sum results
la $a0,msg_recur1
move $a1,$v0
jal showsum
# get recursive sum (method 2)
li $a0,0 # i = 0
jal sumrecurs2 # get recursive sum
# show the recursive sum results
la $a0,msg_recur2
move $a1,$v0
jal showsum
# show the difference in values between simple and method 1
subu $a1,$s4,$s3 # difference of values
la $a0,msg_match
jal showsum
# exit the program
li $v0,10
syscall
# sumsimple -- compute simple sum by looping through array
#
# RETURNS:
# v0 -- sum
#
# registers:
# t0 -- array count
# t1 -- array pointer
sumsimple:
move $t0,$s0 # get array count
move $t1,$s1 # get array address
li $v0,0 # sum = 0
j sumsimple_test
sumsimple_loop:
lw $t2,0($t1) # get array[i]
add $v0,$v0,$t2 # sum += array[i]
addi $t1,$t1,4 # advance pointer to array[i + 1]
subi $t0,$t0,1 # decrement count
sumsimple_test:
bgtz $t0,sumsimple_loop # are we done? if no, loop
jr $ra # return
# sumrecurs1 -- compute recursive sum
#
# RETURNS:
# v0 -- sum
#
# arguments:
# a0 -- array index (i)
# s1 -- array pointer
#
# NOTES:
# (1) in the mips ABI, the second argument is normally passed in a1 [which can
# be trashed] but we are using s1 as a "global" register
# (2) this saves an extra [and unnecessary] push/pop to the stack, as s1 _must_ be
# preserved
# (3) we do, however, preserve the array index (a0) on the stack
sumrecurs1:
# save return address and argument on a stack frame we setup
subiu $sp,$sp,8
sw $ra,0($sp)
sw $a0,4($sp)
blt $a0,$s2,sumrecurs1_call # at the end? if no, fly
li $v0,0 # yes, simulate call return of 0
j sumrecurs1_done # skip to function epilog
sumrecurs1_call:
addi $a0,$a0,1 # bump up the index
jal sumrecurs1 # recursive call
# get back the index we were called with from our stack frame
# NOTES:
# (1) while we _could_ just subtract one from a0 here because of the way
# sumrecurs is implemented, we _don't_ because, in general, under the
# standard mips ABI the called function is at liberty to _trash_ it
# (2) the index value restored in the epilog of our recursive function
# call is _not_ _our_ value "i", but "i + 1", so we need to get our
# "i" value from _our_ stack frame
# (3) see sumrecurs2 for the faster method where we "cheat" and violate
# the ABI
lw $a0,4($sp)
sumrecurs1_done:
sll $t2,$a0,2 # get byte offset for index i
add $t2,$s1,$t2 # get address of array[i]
lw $t2,0($t2) # fetch array[i]
add $v0,$t2,$v0 # sum our value and callee's
# restore return address from stack
lw $ra,0($sp)
lw $a0,4($sp)
addiu $sp,$sp,8
jr $ra
# sumrecurs2 -- compute recursive sum
#
# RETURNS:
# v0 -- sum
#
# arguments:
# a0 -- array index (i)
# s1 -- array pointer
#
# NOTES:
# (1) in the mips ABI, the second argument is normally passed in a1 [which can
# be trashed] but we are using s1 as a "global" register
# (2) this saves an extra [and unnecessary] push/pop to the stack, as s1 _must_ be
# preserved
# (3) unlike sumrecurs1, we do _not_ preserve the array index (a0) on the stack;
# we recreate it after the call by decrementing it, in violation of the ABI
sumrecurs2:
# save _only_ return address on a stack frame we setup
subiu $sp,$sp,4
sw $ra,0($sp)
blt $a0,$s2,sumrecurs2_call # at the end? if no, fly
li $v0,0 # yes, simulate call return of 0
j sumrecurs2_done # skip to function epilog
sumrecurs2_call:
addi $a0,$a0,1 # bump up the index
jal sumrecurs2 # recursive call
subi $a0,$a0,1 # bump down the index
sumrecurs2_done:
sll $t2,$a0,2 # get byte offset for index i
add $t2,$s1,$t2 # get address of array[i]
lw $t2,0($t2) # fetch array[i]
add $v0,$t2,$v0 # sum our value and callee's
# restore return address from stack
lw $ra,0($sp)
addiu $sp,$sp,4
jr $ra
# showsum -- output a message
#
# arguments:
# a0 -- message
# a1 -- number value
showsum:
li $v0,4 # puts
syscall
move $a0,$a1 # get number to print
li $v0,1 # print int
syscall
la $a0,msg_nl
li $v0,4 # puts
syscall
jr $ra
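As a sanity check of what the program above should print, the three sums can be modeled in a few lines of Python (a sketch, not part of the original program; same data as the .word directives):

```python
# same values as the .word directives in the .data section
array = list(range(1, 21)) + [11] + list(range(22, 31))

def sum_simple(arr):
    total = 0
    for v in arr:  # mirrors sumsimple's pointer walk
        total += v
    return total

def sum_recurs(i, arr):
    # mirrors sumrecurs1/2: ret is 0 at the last index, else the
    # recursive sum of the tail; the result is arr[i] + ret
    ret = 0 if i == len(arr) - 1 else sum_recurs(i + 1, arr)
    return arr[i] + ret

print(sum_simple(array))        # → 455
print(sum_recurs(0, array))     # → 455, so "difference is: 0"
```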

Related

How do I finish a recursive binary search in fortran?

I am trying to find the smallest index containing the value i in a sorted array. If this value is not present, I want -1 to be returned. I am using a recursive binary search subroutine. The problem is that I can't really stop this recursion, and I get a lot of answers (one right and the rest wrong). Sometimes I also get an error, "segmentation fault: 11", and don't get any results at all.
I've tried deleting the call to random_number, since I already have a sorted array in my main program, but it did not work.
program main
implicit none
integer, allocatable :: A(:)
real :: MAX_VALUE
integer :: i,j,n,s, low, high
real :: x
N= 10 !size of table
MAX_VALUE = 10
allocate(A(n))
s = 5 ! searched value
low = 1 ! lower limit
high = n ! highest limit
!generate random table of numbers (from 0 to 1000)
call Random_Seed
do i=1, N
call Random_Number(x) !returns random x >= 0 and <1
A(i)= anint(MAX_VALUE*x)
end do
call bubble(n,a)
print *,' '
write(*,10) (a(i),i=1,N)
10 format(10i6)
call bsearch(A,n,s,low,high)
deallocate(A)
end program main
The sort subroutine:
subroutine sort(p,q)
implicit none
integer(kind=4), intent(inout) :: p, q
integer(kind=4) :: temp
if (p>q) then
temp = p
p = q
q = temp
end if
return
end subroutine sort
The bubble subroutine:
subroutine bubble(n,arr)
implicit none
integer(kind=4), intent(inout) :: n
integer(kind=4), intent(inout) :: arr(n)
integer(kind=4) :: sorted(n)
integer :: i,j
do i=1, n
do j=n, i+1, -1
call sort(arr(j-1), arr(j))
end do
end do
return
end subroutine bubble
recursive subroutine bsearch(b,n,i,low,high)
implicit none
integer(kind=4) :: b(n)
integer(kind=4) :: low, high
integer(kind=4) :: i,j,x,idx,n
real(kind=4) :: r
idx = -1
call random_Number(r)
x = low + anint((high - low)*r)
if (b(x).lt.i) then
low = x + 1
call bsearch(b,n,i,low,high)
else if (b(x).gt.i) then
high = x - 1
call bsearch(b,n,i,low,high)
else
do j = low, high
if (b(j).eq.i) then
idx = j
exit
end if
end do
end if
! Stop if high = low
if (low.eq.high) then
return
end if
print*, i, 'found at index ', idx
return
end subroutine bsearch
The goal is to get the same results as my linear search, but I'm getting one of these outputs:
Sorted table:
1 1 2 4 5 5 6 7 8 10
5 found at index 5
5 found at index -1
5 found at index -1
or if the value is not found
2 2 3 4 4 6 6 7 8 8
Segmentation fault: 11
There are two issues causing your recursive search routine bsearch to either stop with unwanted output or result in a segmentation fault. Simply following the execution logic of your program, using the examples you provided, elucidates the matter:
1) value present and found, unwanted output
First, consider the first example, where array b contains the value i=5 you are searching for (value and index pointed out with || in the first two lines of the code block below). Using the notation Rn to indicate the n'th level of recursion, L and H for the lower and upper bounds, and x for the current index estimate, a given run of your code could look something like this:
b(x): 1 1 2 4 |5| 5 6 7 8 10
x: 1 2 3 4 |5| 6 7 8 9 10
R0: L x H
R1: Lx H
R2: L x H
5 found at index 5
5 found at index -1
5 found at index -1
In R0 and R1, the tests b(x).lt.i and b(x).gt.i in bsearch work as intended to reduce the search interval. In R2 the do-loop in the else branch is executed, idx is assigned the correct value and this is printed - as intended. However, a return statement is now encountered which will return control to the calling program unit - in this case that is first R1(!) where execution will resume after the if-else if-else block, thus printing a message to screen with the initial value of idx=-1. The same happens upon returning from R0 to the main program. This explains the (unwanted) output you see.
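This fall-through behaviour is easy to reproduce outside Fortran. Here is a Python sketch (hypothetical, with a deterministic midpoint probe instead of the random one) that logs what each recursion level would print:

```python
def bsearch_buggy(b, key, low, high, log):
    # 1-based bounds, mimicking the Fortran routine; idx is a fresh
    # local at every level, exactly like the Fortran idx = -1
    idx = -1
    mid = (low + high) // 2  # deterministic probe instead of random
    if b[mid - 1] < key:
        bsearch_buggy(b, key, mid + 1, high, log)
    elif b[mid - 1] > key:
        bsearch_buggy(b, key, low, mid - 1, log)
    else:
        for j in range(low, high + 1):  # the linear fallback
            if b[j - 1] == key:
                idx = j
                break
    # this line runs once per recursion level after the inner call
    # returns, which is why the Fortran prints "found at index -1"
    # once for every level above the one that actually found the key
    log.append(idx)

log = []
bsearch_buggy([1, 1, 2, 4, 5, 5, 6, 7, 8, 10], 8, 1, 10, log)
print(log)  # → [9, -1, -1]: one correct answer, one spurious -1 per level
```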
2) value not present, segmentation fault
Secondly, consider the example resulting in a segmentation fault. Using the same notation as before, a possible run could look like this:
b(x): 2 2 3 4 4 6 6 7 8 8
x: 1 2 3 4 5 6 7 8 9 10
R0: L x H
R1: L x H
R2: L x H
R3: LxH
R4: H xL
.
.
.
Segmentation fault: 11
In R0 to R2 the search interval is again reduced as intended. However, in R3 the logic fails. Since the search value i is not present in array b, one of the .lt. or .gt. tests will always evaluate to .true., meaning that the test for low .eq. high to terminate a search is never reached. From this point onwards, the logic is no longer valid (e.g. high can be smaller than low) and the code will continue deepening the level of recursion until the call stack gets too big and a segmentation fault occurs.
These are the main logical flaws in the code. A possible inefficiency is the use of a do-loop to find the lowest index containing a searched-for value. Consider a case where the value you are looking for is e.g. i=8, and it appears at the end of your array, as below. Assume further that, by chance, the first guess for its position is x = high. This implies that your code will immediately branch to the do-loop, where in effect a linear search of very nearly the entire array is done to find the final result idx=9. Although correct, the intended binary search becomes a linear search, which could reduce performance.
b(x): 2 2 3 4 4 6 6 7 |8| 8
x: 1 2 3 4 5 6 7 8 |9| 10
R0: L xH
8 found at index 9
Fixing the problems
At the very least, you should move the low .eq. high test to the start of the bsearch routine, so that recursion stops before invalid bounds can be defined (you then need an additional test to see if the search value was found or not). Also, notify about a successful search right after it occurs, i.e. after the equality test in your do-loop, or the additional test just mentioned. This still does not address the inefficiency of a possible linear search.
All taken into account, you are probably better off reading up on algorithms for finding a "leftmost" index (e.g. on Wikipedia) or looking at a tried-and-tested implementation (both examples use iteration instead of recursion, perhaps another improvement, but the same principles apply) and adapting that to Fortran, which could look something like this (only new code is shown; ... refers to existing code from your example):
module mod_search
implicit none
contains
! Function that uses recursive binary search to look for `key` in an
! ordered `array`. Returns the array index of the leftmost occurrence
! of `key` if present in `array`, and -1 otherwise
function search_ordered (array, key) result (idx)
integer, intent(in) :: array(:)
integer, intent(in) :: key
integer :: idx
! find left most array index that could possibly hold `key`
idx = binary_search_left(1, size(array))
! if `key` is not found, return -1
if (array(idx) /= key) then
idx = -1
end if
contains
! function for recursive reduction of search interval
recursive function binary_search_left(low, high) result(idx)
integer, intent(in) :: low, high
integer :: idx
real :: r
if (high <= low ) then
! found lowest possible index where target could be
idx = low
else
! new guess
call random_number(r)
idx = low + floor((high - low)*r)
! alternative: idx = low + (high - low) / 2
if (array(idx) < key) then
! continue looking to the right of current guess
idx = binary_search_left(idx + 1, high)
else
! continue looking to the left of current guess (inclusive)
idx = binary_search_left(low, idx)
end if
end if
end function binary_search_left
end function search_ordered
! Move your routines into a module
subroutine sort(p,q)
...
end subroutine sort
subroutine bubble(n,arr)
...
end subroutine bubble
end module mod_search
! your main program
program main
use mod_search, only : search_ordered, sort, bubble ! <---- use routines from module like so
implicit none
...
! Replace your call to bsearch() with the following:
! call bsearch(A,n,s,low,high)
i = search_ordered(A, s)
if (i /= -1) then
print *, s, 'found at index ', i
else
print *, s, 'not found!'
end if
...
end program main
Finally, depending on your actual use case, you could also just consider using the Fortran intrinsic procedure minloc saving you the trouble of implementing all this functionality yourself. In this case, it can be done by making the following modification in your main program:
! i = search_ordered(a, s) ! <---- comment out this line
j = minloc(abs(a-s), dim=1) ! <---- replace with these two
i = merge(j, -1, a(j) == s)
where j returned from minloc will be the lowest index in the array a where s may be found, and merge is used to return j when a(j) == s and -1 otherwise.
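For readers more familiar with Python, the minloc/merge idea translates almost directly (a sketch; min() keeps the first of tied values, which is what yields the leftmost index):

```python
def leftmost_index(a, s):
    # 1-based result, like the Fortran version; j is the first position
    # minimising abs(a[k] - s), which for a sorted array is the leftmost
    # match whenever s is present (min() keeps the first of tied values)
    j = min(range(len(a)), key=lambda k: abs(a[k] - s))
    return j + 1 if a[j] == s else -1

print(leftmost_index([1, 1, 2, 4, 5, 5, 6, 7, 8, 10], 5))  # → 5
print(leftmost_index([1, 1, 2, 4, 5, 5, 6, 7, 8, 10], 3))  # → -1
```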

Expressing the double summation sequence in Raku

How do you express a double-variable summation sequence in Perl 6?
For an example of a double-variable summation sequence, see this
It must be expressed as is, i.e. without mathematically reducing the double summation into a single summation. Thank you.
The X (cross operator) and the [+] (reduction metaoperator [ ] with additive operator +) make this surprisingly easy:
To represent¹ the double summation ∑_{x=1}^{3} ∑_{y=1}^{5} (2x + y), you can do the following:
[+] do for 1..3 X 1..5 -> ($x, $y) { 2 * $x + $y }
# for 1..3 X 1..5 # loop cross values
# -> ($x, $y) # plug into x/y
# { 2 * $x + $y } # calculate each iteration
# do # collect loop return vals
# [+] # sum them all
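The same cross-then-reduce shape exists in most languages; for comparison, a Python sketch where itertools.product plays the role of X and sum plays the role of [+]:

```python
from itertools import product

# loop the cross values, plug into x/y, sum the results
total = sum(2 * x + y for x, y in product(range(1, 4), range(1, 6)))
print(total)  # → 105
```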
If you wanted to create a sub for this, you could write it as the following²:
sub ΣΣ (
    Int $aₒ, Int $aₙ, # from / to for the outer
    Int $bₒ, Int $bₙ, # from / to for the inner
    &f where .arity == 2 # 'where' clause guarantees exactly two params
) {
    [+] do for $aₒ..$aₙ X $bₒ..$bₙ -> ($a, $b) { f($a, $b) }
}
say ΣΣ 1,3, 1,5, { 2 * $^x + $^y }
Or even simplify things more to
sub ΣΣ (
    Iterable \a, # outer values
    Iterable \b, # inner values
    &f where .arity == 2) { # ensure exactly two parameters
    [+] do f(|$_) for a X b
}
# All of the following are equivalent
say ΣΣ 1..3, 1..5, -> $x, $y { 2 * $x + $y }; # Anonymous block
say ΣΣ 1..3, 1..5, { 2 * $^x + $^y }; # Alphabetic args
say ΣΣ 1..3, 1..5, 2 * * + * ; # Overkill, but Whatever ;-)
Note that by typing the parameters we ensure ranges can be passed, but by typing them as Iterable rather than Range we allow more interesting summation sequences, like, say, ΣΣ (1..∞).grep(*.is-prime)[^100], 1..10, { … }, which would let us sum over the sequence of the first 100 primes.
In fact, if we really wanted to, we could go overboard, and allow for an arbitrary depth summation operator, which is made easiest by moving the function to the left:
sub ΣΣ (
    &function,
    **@ranges where # slurp in the ranges
        .all ~~ Iterable && # make sure they're Iterables
        .elems == &function.arity # one per argument in the function
) {
    [+] do function(|$_) for [X] @ranges;
};
Just like [+] sums up all the values of our f() function, [X] calculates the cross iteratively, e.g., [X] 0..1, 3..4, 5..6 first does 0..1 X 3..4 or (0,3),(0,4),(1,3),(1,4), and then does (0,3),(0,4),(1,3),(1,4) X 5..6, or (0,3,5),(0,4,5),(1,3,5),(1,4,5),(0,3,6),(0,4,6),(1,3,6),(1,4,6).
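The arbitrary-depth version also has a compact Python analog (a sketch, not a translation of the Raku semantics; the arity check stands in for the where clause):

```python
from inspect import signature
from itertools import product

def sigma_n(f, *ranges):
    # one iterable per parameter of f, like the Raku where clause
    assert len(ranges) == len(signature(f).parameters)
    # product(*ranges) does what [X] does: an n-way cross product
    return sum(f(*args) for args in product(*ranges))

# three nested sums over the ranges from the [X] example above
print(sigma_n(lambda x, y, z: x + y + z,
              range(0, 2), range(3, 5), range(5, 7)))  # → 76
```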
1. Sorry, SO doesn't let me do LaTeX, but you should get the idea.
2. Yes, I know that's a subscript letter o, not a zero; subscript numbers aren't normally valid in identifiers, but you can use Slang::Subscripts to enable them.

Parallelize two (or more) functions in julia

I am trying to solve a wave equation problem (related to my PhD) using the finite difference method. For this, I have translated (line by line) a Fortran code (link below): (https://github.com/geodynamics/seismic_cpml/blob/master/seismic_CPML_2D_anisotropic.f90)
Inside this code, within the time loop, there are four main loops that are independent. In fact, I could arrange them into four functions.
As I have to run this code about a hundred times, it would be nice to speed up the process. In this sense, I am turning my eyes toward parallelization. See below, as an example:
function main()
...some common code...
for time=1:N
function fun1() # I want this function to run in parallel with 2,3,4
function fun2() # ..this function to run in parallel with 1,3,4
function fun3() # ..this function to run in parallel with 1,2,4
function fun4() # ..this function to run in parallel with 1,2,3
end
... more code here...
return
end
So,
1) Is it possible to do what I mention before?
2) Will this approach speed up my code?
3) Is there a better way to think this problem?
A minimal working example could be like this:
function fun1(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t+(0.3)^(t-1);
end
end
return t
end
function fun2(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t;
end
end
return t
end
function fun3(r)
for i=1:1000
for j=1:1000
r = (r + rand())/r;
end
end
return r
end
function main()
a = 2;
b = 2.5;
c = 3.0;
for i=1:100
a = fun1(a);
b = fun2(b);
c = fun3(c);
end
return;
end
So, as can be seen, none of the three functions above (fun1, fun2 & fun3) depends on any other, so they can surely run in parallel. Can this be achieved? Will it boost my computational speed?
Edited:
Hi @BogumiłKamiński, I have altered the finite-difference code in order to implement a "loop" (as you suggested) over the inputs and outputs of my functions. If it is not too much trouble, I would like your opinion on the parallelization design of the code:
Key elements
1) I have packed all inputs in 4 tuples: sig_xy_in and sig_xy_cros_in (for the 2 sigma functions) and vel_vx_in and vel_vy_in (for 2 velocity functions). I then packed the 4 tuples into 2 vectors for "looping" purposes...
2) I packed the 4 functions in 2 vectors for "looping" purposes...
3) I run the first parallel loop and then unpack its output tuples...
4) I run the second parallel loop (for velocities) and then unpack its output tuples...
5) Finally, I pack the outputted elements back into the input tuples and continue the time loop until it finishes.
...code
l = Threads.SpinLock()
arg_in_sig = [sig_xy_in,sig_xy_cros_in]; # Inputs tuples x sigma funct
arg_in_vel = [vel_vx_in, vel_vy_in]; # Inputs tuples x velocity funct
func_sig = [sig_xy , sig_xy_cros]; # Vector with two sigma functions
func_vel = [vel_vx , vel_vy]; # Vector with two velocity functions
for it = 1:NSTEP # time steps
#------------------------------------------------------------
# Compute sigma functions
#------------------------------------------------------------
Threads.@threads for j in 1:2 # Start parallel of two sigma functs
Threads.lock(l);
Threads.unlock(l);
arg_in_sig[j] = func_sig[j](arg_in_sig[j]);
end
# Unpack tuples for sig_xy and sig_xy_cros
# Unpack tuples for sig_xy
sigxx = arg_in_sig[1][1]; # changed by sig_xy
sigyy = arg_in_sig[1][2]; # changed by sig_xy
m_dvx_dx = arg_in_sig[1][3]; # changed by sig_xy
m_dvy_dy = arg_in_sig[1][4]; # changed by sig_xy
vx = arg_in_sig[1][5]; # unchanged by sig_xy
vy = arg_in_sig[1][6]; # unchanged by sig_xy
delx_1 = arg_in_sig[1][7]; # unchanged by sig_xy
dely_1 = arg_in_sig[1][8]; # unchanged by sig_xy
...more unpacking...
# Unpack tuples for sig_xy_cros
sigxy = arg_in_sig[2][1]; # changed by sig_xy_cros
m_dvy_dx = arg_in_sig[2][2]; # changed by sig_xy_cros
m_dvx_dy = arg_in_sig[2][3]; # changed by sig_xy_cros
vx = arg_in_sig[2][4]; # unchanged by sig_xy_cros
vy = arg_in_sig[2][5]; # unchanged by sig_xy_cros
...more unpacking....
#--------------------------------------------------------
# velocity
#--------------------------------------------------------
Threads.@threads for j in 1:2 # Start parallel of two velocity functs
Threads.lock(l)
Threads.unlock(l)
arg_in_vel[j] = func_vel[j](arg_in_vel[j])
end
# Unpack tuples for vel_vx
vx = arg_in_vel[1][1]; # changed by vel_vx
m_dsigxx_dx = arg_in_vel[1][2]; # changed by vel_vx
m_dsigxy_dy = arg_in_vel[1][3]; # changed by vel_vx
sigxx = arg_in_vel[1][4]; # unchanged by vel_vx
sigxy = arg_in_vel[1][5];....
# Unpack tuples for vel_vy
vy = arg_in_vel[2][1]; # changed by vel_vy
m_dsigxy_dx = arg_in_vel[2][2]; # changed by vel_vy
m_dsigyy_dy = arg_in_vel[2][3]; # changed by vel_vy
sigxy = arg_in_vel[2][4]; # unchanged by vel_vy
sigyy = arg_in_vel[2][5]; # unchanged by vel_vy
.....
...more unpacking...
# ensamble new input variables
sig_xy_in = (sigxx,sigyy,
m_dvx_dx,m_dvy_dy,
vx,vy,....);
sig_xy_cros_in = (sigxy,
m_dvy_dx,m_dvx_dy,
vx,vy,....;
vel_vx_in = (vx,....
vel_vy_in = (vy,.....
end #time loop
Here is a simple way to run your code in multithreading mode:
function fun1(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t+(0.3)^(t-1);
end
end
return t
end
function fun2(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t;
end
end
return t
end
function fun3(r)
for i=1:1000
for j=1:1000
r = (r + rand())/r;
end
end
return r
end
function main()
l = Threads.SpinLock()
a = [2.0, 2.5, 3.0]
f = [fun1, fun2, fun3]
Threads.@threads for i in 1:3
for j in 1:4
Threads.lock(l)
println((thread=Threads.threadid(), iteration=j))
Threads.unlock(l)
a[i] = f[i](a[i])
end
end
return a
end
I have added locking just as an example of how you can do it (in Julia 1.3 you will not have to do this, as IO is thread-safe there).
Also note that rand() shares state among threads prior to Julia 1.3, so it would not be safe to run these functions if all of them used rand() (again, in Julia 1.3 it is safe to do so).
To run this code, first set the maximum number of threads you want to use, e.g. on Windows: set JULIA_NUM_THREADS=4 (on Linux, export it instead). Here is an example run of this code (I have reduced the number of iterations in order to shorten the output):
julia> main()
(thread = 1, iteration = 1)
(thread = 3, iteration = 1)
(thread = 2, iteration = 1)
(thread = 3, iteration = 2)
(thread = 3, iteration = 3)
(thread = 3, iteration = 4)
(thread = 2, iteration = 2)
(thread = 1, iteration = 2)
(thread = 2, iteration = 3)
(thread = 2, iteration = 4)
(thread = 1, iteration = 3)
(thread = 1, iteration = 4)
3-element Array{Float64,1}:
21.40311930108456
21.402807510451463
1.219028489573526
Now one small cautionary note: while it is relatively easy to make code multithreaded in Julia (and in Julia 1.3 it will be even simpler), you have to be careful when you do it, as you have to take care of race conditions.
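For comparison, the same fan-out of independent functions can be sketched in Python with concurrent.futures (hypothetical; note that CPython's GIL means plain threads will not speed up CPU-bound pure-Python work, so you would reach for ProcessPoolExecutor to get the effect Julia threads give you):

```python
from concurrent.futures import ThreadPoolExecutor

def fun1(t):
    # same update rule as the Julia fun1, shrunk to one loop
    for _ in range(1000):
        t += 0.5 ** t + 0.3 ** (t - 1)
    return t

def fun2(t):
    for _ in range(1000):
        t += 0.5 ** t
    return t

funcs = [fun1, fun2]
args = [2.0, 2.5]
# one task per independent function, as in the Threads.@threads loop
with ThreadPoolExecutor(max_workers=2) as ex:
    futures = [ex.submit(f, a) for f, a in zip(funcs, args)]
    parallel = [fut.result() for fut in futures]

print(parallel == [fun1(2.0), fun2(2.5)])  # → True: pure functions, same results
```

ProcessPoolExecutor exposes the same submit/result API and runs the tasks in separate processes, which is the closer analog of Julia threads for numeric work.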

prim's algorithm in julia with adjacency matrix

I'm trying to implement Prim's algorithm in Julia.
My function gets the adjacency matrix with the weights but isn't working correctly. I don't know what I have to change; I guess there is some problem with the append!() function.
Is there maybe another/better way to implement the algorithm by just passing the adjacency matrix?
Thank you.
function prims(AD)
n = size(AD)
n1 = n[1]
# choose initial vertex from graph
vertex = 1
# initialize empty edges array and empty MST
MST = []
edges = []
visited = []
minEdge = [nothing, nothing, float(Inf)]
# run prims algorithm until we create an MST
# that contains every vertex from the graph
while length(MST) != n1 - 1
# mark this vertex as visited
append!(visited, vertex)
# add each edge to list of potential edges
for r in 1:n1
if AD[vertex][r] != 0
append!(edges, [vertex, r, AD[vertex][r]])
end
end
# find edge with the smallest weight to a vertex
# that has not yet been visited
for e in 1:length(edges)
if edges[e][3] < minEdge[3] && edges[e][2] not in visited
minEdge = edges[e]
end
end
# remove min weight edge from list of edges
deleteat!(edges, minEdge)
# push min edge to MST
append!(MST, minEdge)
# start at new vertex and reset min edge
vertex = minEdge[2]
minEdge = [nothing, nothing, float(Inf)]
end
return MST
end
For example, when I try the algorithm with this adjacency matrix
C = [0 2 3 0 0 0; 2 0 5 3 4 0; 3 5 0 0 4 0; 0 3 0 0 2 3; 0 4 4 2 0 5; 0 0 0 3 5 0]
I get
ERROR: BoundsError
Stacktrace:
[1] getindex(::Int64, ::Int64) at .\number.jl:78
[2] prims(::Array{Int64,2}) at .\untitled-8b8d609f2ac8a0848a18622e46d9d721:70
[3] top-level scope at none:0
I guess I have to "reshape" my matrix C into a form like this
D = [[0,2,3,0,0,0], [2,0,5,3,4,0], [3,5,0,0,4,0], [0,3,0,0,2,3], [0,4,4,2,0,5], [0,0,0,3,5,0]]
But even with this I get the same error.
First note that LightGraphs.jl has this algorithm implemented. You can find the code here.
I have also made some notes on your algorithm to make it work and to indicate potential improvements without changing its general structure:
using DataStructures # library defining PriorityQueue type
function prims(AD::Matrix{T}) where {T<:Real} # make sure we get what we expect
# make sure the matrix is symmetric
@assert transpose(AD) == AD
n1 = size(AD, 1)
vertex = 1
# it is better to keep edge information as Tuple rather than a vector
# also note that I add type annotation for the collection to improve speed
# and make sure that the stored edge weight has a correct type
MST = Tuple{Int, Int, T}[]
# using PriorityQueue makes the lookup of the shortest edge fast
edges = PriorityQueue{Tuple{Int, Int, T}, T}()
# we will eventually visit almost all vertices so we can use indicator vector
visited = falses(n1)
while length(MST) != n1 - 1
visited[vertex] = true
for r in 1:n1
# you access a matrix by passing indices separated by a comma
dist = AD[vertex, r]
# no need to add edges to vertices that were already visited
if dist != 0 && !visited[r]
edges[(vertex, r, dist)] = dist
end
end
# we will iterate till we find an unvisited destination vertex
while true
# handle the case if the graph was not connected
isempty(edges) && return nothing
minEdge = dequeue!(edges)
if !visited[minEdge[2]]
# use push! instead of append!
push!(MST, minEdge)
vertex = minEdge[2]
break
end
end
end
return MST
end
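The same structure (a visited indicator vector plus a priority queue of candidate edges) ports directly to Python with heapq; a sketch, checked against the matrix C from the question:

```python
import heapq

def prims(AD):
    n = len(AD)
    visited = [False] * n
    mst = []
    heap = []   # (weight, from, to) tuples, ordered by weight
    vertex = 0  # Python is 0-based; this is vertex 1 in the question
    while len(mst) != n - 1:
        visited[vertex] = True
        for r in range(n):
            d = AD[vertex][r]  # nested lists index as AD[i][j]
            if d != 0 and not visited[r]:
                heapq.heappush(heap, (d, vertex, r))
        # pop until we find an edge to an unvisited vertex
        while True:
            if not heap:
                return None  # graph is not connected
            d, u, v = heapq.heappop(heap)
            if not visited[v]:
                mst.append((u, v, d))
                vertex = v
                break
    return mst

C = [[0, 2, 3, 0, 0, 0],
     [2, 0, 5, 3, 4, 0],
     [3, 5, 0, 0, 4, 0],
     [0, 3, 0, 0, 2, 3],
     [0, 4, 4, 2, 0, 5],
     [0, 0, 0, 3, 5, 0]]
mst = prims(C)
print(sum(w for _, _, w in mst))  # total MST weight → 13
```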

iterative version of recursive algorithm to make a binary tree

Given this algorithm, I would like to know if there exists an iterative version. Also, I want to know if the iterative version can be faster.
This is some kind of pseudo-Python...
The algorithm returns a reference to the root of the tree.
make_tree(array a)
    if len(a) == 0
        return None
    node = pick a random point from the array
    calculate distances of the point against the others
    calculate median of such distances
    node.left = make_tree(subset of the array such that the distance of points is lower than the median of distances)
    node.right = make_tree(subset such that the distance is greater than or equal to the median)
    return node
A recursive function with only one recursive call can usually be turned into a tail-recursive function without too much effort, and then it's trivial to convert it into an iterative function. The canonical example here is factorial:
# naïve recursion
def fac(n):
    if n <= 1:
        return 1
    else:
        return n * fac(n - 1)

# tail-recursive with accumulator
def fac(n):
    def fac_helper(m, k):
        if m <= 1:
            return k
        else:
            return fac_helper(m - 1, m * k)
    return fac_helper(n, 1)

# iterative with accumulator
def fac(n):
    k = 1
    while n > 1:
        n, k = n - 1, n * k
    return k
However, your case here involves two recursive calls, and unless you significantly rework your algorithm, you need to keep a stack. Managing your own stack may be a little faster than using Python's function call stack, but the added speed and depth will probably not be worth the complexity. The canonical example here would be the Fibonacci sequence:
# naïve recursion
def fib(n):
    if n <= 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

# tail-recursive with accumulator and stack
def fib(n):
    def fib_helper(m, k, stack):
        if m <= 1:
            if stack:
                m = stack.pop()
                return fib_helper(m, k + 1, stack)
            else:
                return k + 1
        else:
            stack.append(m - 2)
            return fib_helper(m - 1, k, stack)
    return fib_helper(n, 0, [])

# iterative with accumulator and stack
def fib(n):
    k, stack = 0, []
    while 1:
        if n <= 1:
            k = k + 1
            if stack:
                n = stack.pop()
            else:
                break
        else:
            stack.append(n - 2)
            n = n - 1
    return k
Now, your case is a lot tougher than this: a simple accumulator will have difficulties expressing a partly-built tree with a pointer to where a subtree needs to be generated. You'll want a zipper -- not easy to implement in a not-really-functional language like Python.
Making an iterative version is simply a matter of using your own stack instead of the normal language call stack. I doubt the iterative version would be faster, as the normal call stack is optimized for this purpose.
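That "own stack" approach can be sketched for the pseudo-code above. This Python version (hypothetical; 1-D points with |p - q| as the distance, median split as in the question) records in each stack entry which child slot of which parent the subtree belongs to:

```python
import random

class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def make_tree_iter(points):
    # explicit-stack version of make_tree; each entry is
    # (subset, parent, side), telling us which child pointer of
    # which parent to fill in once the subtree's root node exists
    root = None
    stack = [(points, None, None)]
    while stack:
        subset, parent, side = stack.pop()
        if not subset:
            continue
        i = random.randrange(len(subset))  # pick a random point
        node = Node(subset[i])
        if parent is None:
            root = node
        elif side == "left":
            parent.left = node
        else:
            parent.right = node
        rest = subset[:i] + subset[i + 1:]
        if rest:
            dists = [abs(p - node.point) for p in rest]
            median = sorted(dists)[len(dists) // 2]
            stack.append(([p for p, d in zip(rest, dists) if d < median], node, "left"))
            stack.append(([p for p, d in zip(rest, dists) if d >= median], node, "right"))
    return root

def count(node):
    return 0 if node is None else 1 + count(node.left) + count(node.right)

random.seed(0)  # deterministic pivots for the demo
tree = make_tree_iter(list(range(20)))
print(count(tree))  # → 20, every point ends up in exactly one node
```

Each iteration consumes one pivot, so the subsets shrink strictly and the loop terminates; the stack here plays exactly the role the call stack plays in the recursive version.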
The data you're getting is random, so the tree can be an arbitrary binary tree. For this case, you can use a threaded binary tree, which can be traversed and built without recursion and with no stack. The nodes have a flag that indicates whether a link points to another node or describes how to get to the "next node".
From http://en.wikipedia.org/wiki/Threaded_binary_tree
Depending on how you define "iterative", there is another solution not mentioned by the previous answers. If "iterative" just means "not subject to a stack overflow exception" (but "allowed to use 'let rec'"), then in a language that supports tail calls, you can write a version using continuations (rather than an "explicit stack"). The F# code below illustrates this. It is similar to your original problem, in that it builds a BST out of an array. If the array is shuffled randomly, the tree is relatively balanced and the recursive version does not create too deep a stack. But turn off shuffling, and the tree gets unbalanced, and the recursive version stack-overflows whereas the iterative-with-continuations version continues along happily.
#light
open System
let printResults = false
let MAX = 20000
let shuffleIt = true

// handy helper function
let rng = new Random(0)
let shuffle (arr : array<'a>) = // '
    let n = arr.Length
    for x in 1..n do
        let i = n-x
        let j = rng.Next(i+1)
        let tmp = arr.[i]
        arr.[i] <- arr.[j]
        arr.[j] <- tmp

// Same random array
let sampleArray = Array.init MAX (fun x -> x)
if shuffleIt then
    shuffle sampleArray
if printResults then
    printfn "Sample array is %A" sampleArray

// Tree type
type Tree =
    | Node of int * Tree * Tree
    | Leaf

// MakeTree1 is recursive
let rec MakeTree1 (arr : array<int>) lo hi =  // [lo,hi)
    if lo = hi then
        Leaf
    else
        let pivot = arr.[lo]
        // partition
        let mutable storeIndex = lo + 1
        for i in lo + 1 .. hi - 1 do
            if arr.[i] < pivot then
                let tmp = arr.[i]
                arr.[i] <- arr.[storeIndex]
                arr.[storeIndex] <- tmp
                storeIndex <- storeIndex + 1
        Node(pivot, MakeTree1 arr (lo+1) storeIndex, MakeTree1 arr storeIndex hi)

// MakeTree2 has all tail calls (uses continuations rather than a stack; see
// http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!171.entry
// for more explanation)
let MakeTree2 (arr : array<int>) lo hi =  // [lo,hi)
    let rec MakeTree2Helper (arr : array<int>) lo hi k =
        if lo = hi then
            k Leaf
        else
            let pivot = arr.[lo]
            // partition
            let storeIndex = ref (lo + 1)
            for i in lo + 1 .. hi - 1 do
                if arr.[i] < pivot then
                    let tmp = arr.[i]
                    arr.[i] <- arr.[!storeIndex]
                    arr.[!storeIndex] <- tmp
                    storeIndex := !storeIndex + 1
            MakeTree2Helper arr (lo+1) !storeIndex (fun lacc ->
                MakeTree2Helper arr !storeIndex hi (fun racc ->
                    k (Node(pivot,lacc,racc))))
    MakeTree2Helper arr lo hi (fun x -> x)

// MakeTree2 never stack overflows
printfn "calling MakeTree2..."
let tree2 = MakeTree2 sampleArray 0 MAX
if printResults then
    printfn "MakeTree2 yields"
    printfn "%A" tree2

// MakeTree1 might stack overflow
printfn "calling MakeTree1..."
let tree1 = MakeTree1 sampleArray 0 MAX
if printResults then
    printfn "MakeTree1 yields"
    printfn "%A" tree1

printfn "Trees are equal: %A" (tree1 = tree2)
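The same continuation trick can be mimicked in Python, but since Python does not optimize tail calls, continuation-passing code must additionally be driven by a trampoline that bounces thunks in a loop instead of nesting calls. A sketch (the function names are mine), shown here on factorial for brevity:

```python
def trampoline(f, *args):
    """Call f, then keep invoking any returned thunk until a value appears."""
    result = f(*args)
    while callable(result):
        result = result()
    return result

def fac_cps(n, k=lambda x: x):
    # every call and every continuation returns a zero-argument thunk,
    # so the Python call stack never grows with n
    if n <= 1:
        return lambda: k(1)
    return lambda: fac_cps(n - 1, lambda v: (lambda: k(n * v)))
```

`trampoline(fac_cps, 2000)` runs far past Python's default recursion limit, whereas a plainly recursive factorial would raise `RecursionError`.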
Yes, it is possible to make any recursive algorithm iterative. Implicitly, when you create a recursive algorithm, each call places the prior call onto the stack. What you want to do is turn that implicit call stack into an explicit one. The iterative version won't necessarily be faster, but you won't have to worry about a stack overflow. (Do I get a badge for using the name of the site in my answer?)
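To make the "explicit stack" idea concrete in Python (a sketch; the `Node` class and the stack-entry layout are my own), here is a balanced-BST builder for a sorted array in which each stack entry stands in for one pending recursive call:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def bst_from_sorted(arr):
    """Build a balanced BST from a sorted sequence, no recursion."""
    if not arr:
        return None
    mid = len(arr) // 2
    root = Node(arr[mid])
    # each entry is one deferred "recursive call": (parent, lo, hi, side)
    stack = [(root, 0, mid, "left"), (root, mid + 1, len(arr), "right")]
    while stack:
        parent, lo, hi, side = stack.pop()
        if lo >= hi:
            continue  # empty range: the base case of the recursion
        mid = (lo + hi) // 2
        child = Node(arr[mid])
        setattr(parent, side, child)  # attach as parent.left or parent.right
        stack.append((child, lo, mid, "left"))
        stack.append((child, mid + 1, hi, "right"))
    return root
```

The `(lo, hi, side)` triples are exactly the arguments the recursive version would have kept on the call stack.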
While it is true in the general sense that directly converting a recursive algorithm into an iterative one will require an explicit stack, there is a specific sub-set of algorithms which render directly in iterative form (without the need for a stack). These renderings may not have the same performance guarantees (iterating over a functional list vs recursive deconstruction), but they do often exist.
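For example (in Python, to match the examples above), any linear recursion, with exactly one recursive call per invocation, falls in that sub-set and renders as a plain loop with no stack at all:

```python
# linear recursion: exactly one self-call per invocation
def sum_rec(xs):
    if not xs:
        return 0
    return xs[0] + sum_rec(xs[1:])

# the direct iterative rendering: no stack needed
def sum_iter(xs):
    total = 0
    for x in xs:
        total += x
    return total
```

Tree-shaped recursion (two or more self-calls, as in the BST builders here) is the case that genuinely forces an explicit stack or continuations.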
Here is a stack-based iterative solution (Java):
public static Tree builtBSTFromSortedArray(int[] inputArray) {
    Stack toBeDone = new Stack("sub trees to be created under these nodes");
    // initialize start and end
    int start = 0;
    int end = inputArray.length - 1;
    // keep memory of the position (in the array) of the previously created node
    int previous_end = end;
    int previous_start = start;
    // Create the result tree
    Node root = new Node(inputArray[(start + end) / 2]);
    Tree result = new Tree(root);
    while (root != null) {
        System.out.println("Current root=" + root.data);
        // calculate last middle (last node position using the last start and last end)
        int last_mid = (previous_start + previous_end) / 2;
        // *********** add left node to the previously created node ***********
        // calculate new start and new end positions
        // end is the previous index position minus 1
        end = last_mid - 1;
        // start will not change for left node generation
        start = previous_start;
        // check if the index exists in the array and add the left node
        if (end >= start) {
            root.left = new Node(inputArray[(start + end) / 2]);
            System.out.println("\tCurrent root.left=" + root.left.data);
        } else {
            root.left = null;
        }
        // save previous_end value (to be used in right node creation)
        int previous_end_bck = previous_end;
        // update previous end
        previous_end = end;
        // *********** add right node to the previously created node ***********
        // get the initial value (inside the current iteration) of previous end
        end = previous_end_bck;
        // start is the previous index position plus one
        start = last_mid + 1;
        // check if the index exists in the array and add the right node
        if (start <= end) {
            root.right = new Node(inputArray[(start + end) / 2]);
            System.out.println("\tCurrent root.right=" + root.right.data);
            // save the created node and its index position (start & end) in the array to the toBeDone stack
            toBeDone.push(root.right);
            toBeDone.push(new Node(start));
            toBeDone.push(new Node(end));
        }
        // *********** update the value of root ***********
        if (root.left != null) {
            root = root.left;
        } else {
            if (toBeDone.top != null) previous_end = toBeDone.pop().data;
            if (toBeDone.top != null) previous_start = toBeDone.pop().data;
            root = toBeDone.pop();
        }
    }
    return result;
}