Allocating Subarrays in Mergesort - recursion

What's happening, folks.
So, I've done a fair amount of research on merge sort, and even though I get the "gist" of it, I'm still baffled by how you're supposed to store the subarrays in order to merge them back together. In other words, where do you save them so that they "know" about each other? Otherwise, in classic recursive fashion, you'd have all these independent function calls returning data that, I assume, would go out of scope.
Here's what I first thought: create a new array named "subs" to store the subarrays in upon each division (I also considered using a closure for this and would like to know whether that's advisable). But as you proceed to the next division, what are you going to do: replace each element in subs with its own subarrays? Then you'd be facing even more costly work, especially once you consider how you'd have to move things around in subs to ensure that each subarray gets its own index.
Heh, I have a bad feeling that this might be a far cry from what's actually supposed to be done. I understand that this algorithm is a classic example of the divide-and-conquer approach, but it just seems strange to me that you can't cut to the chase by splitting the array into all of its elements right off the bat (after all, that's the base case, and what would be wrong with throwing in a greedy approach to solving the problem?).
Thanks!
EDIT:
Alright, so I figured it out.
To sum it up: I used indices to track where to place elements (and obviate the need for built-in list functions that may slow down runtime).
By using nested functions and a (hidden) pointer to the new array, I kept data in scope. An auxiliary array buffers data from the subarrays.
In retrospect, what I originally had in mind (which vaguely resembled insertion sort) was, in fact, bottom-up merge sort. Having previously questioned the efficiency and purpose of top-down merge sort, I now understand that breaking the problem down cuts down the number of comparisons and moves, especially on larger lists, which insertion sort handles far less efficiently. I didn't attempt to implement my initial idea because I didn't have a clear enough picture of recursion and how data gets passed around. A rough sketch of that bottom-up variant follows my code below.
#!/bin/python

import sys

def merge_sort(arr):

    def merge(*indices): # indices = first, last, and pivot indices, respectively
        head, tail = indices[0], indices[1]
        pivot = indices[2]
        i = head
        j = pivot+1
        k = 0
        while (i <= pivot and j <= tail):
            if new[i] <= new[j]:
                aux[k] = new[i]
                i += 1
                k += 1
            else:
                aux[k] = new[j]
                j += 1
                k += 1
        while (i <= pivot):
            aux[k] = new[i]
            i += 1
            k += 1
        while (j <= tail):
            aux[k] = new[j]
            j += 1
            k += 1
        for x in xrange(head, tail+1):
            new[x] = aux[x-head]
    # end merge

    def split(a, *indices): # indices = first and last indices, respectively
        head, tail = indices[0], indices[1]
        pivot = (head+tail) / 2
        if head < tail:
            l_sub = a[head:pivot+1]
            r_sub = a[pivot+1:tail+1]
            split(l_sub, head, pivot)
            split(r_sub, pivot+1, tail)
            merge(head, tail, pivot)
    # end split

    new = arr
    aux = list(new)
    tail = len(new)-1
    split(new, 0, tail)
    return new
# end merge_sort

if __name__ == "__main__":
    loops = int(raw_input().strip())
    for _ in xrange(loops):
        arr = map(int, raw_input().strip().split(' '))
        result = merge_sort(arr)
        print result
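For comparison, here is a minimal sketch of the bottom-up variant mentioned in the edit above. It is only an illustration (the name bottom_up_merge_sort and the details are my own, not the code I submitted): runs of width 1, 2, 4, ... are merged iteratively, so no recursion is needed.

def bottom_up_merge_sort(arr):
    n = len(arr)
    src = list(arr)        # working copy of the input
    aux = [None] * n       # auxiliary buffer, same role as aux above
    width = 1
    while width < n:
        # merge adjacent runs src[head:mid] and src[mid:tail] into aux[head:tail]
        for head in range(0, n, 2 * width):
            mid = min(head + width, n)
            tail = min(head + 2 * width, n)
            i, j, k = head, mid, head
            while i < mid and j < tail:
                if src[i] <= src[j]:
                    aux[k] = src[i]
                    i += 1
                else:
                    aux[k] = src[j]
                    j += 1
                k += 1
            while i < mid:
                aux[k] = src[i]
                i += 1
                k += 1
            while j < tail:
                aux[k] = src[j]
                j += 1
                k += 1
        src, aux = aux, src  # the merged pass becomes the new source
        width *= 2
    return src

# bottom_up_merge_sort([5, 2, 9, 1, 7])  ->  [1, 2, 5, 7, 9]

The src/aux swap plays the same buffering role that new and aux play in the recursive version above.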

Related

Iteration in R for loop

I want to write a for loop that iterates over a vector or list to which I'm adding values in each iteration. I came up with the following code, but it runs for only one iteration. I don't want to use a while loop to write this program. I want to know how I can control the for loop's iterator. Thanks in advance.
steps <- 1
random_number <- c(sample(20, 1))
for (item in random_number){
  if(item < 18){
    random_number <- c(random_number, sample(20, 1))
    steps <- steps + 1
  }
}
print(paste0("It took ", steps, " steps."))
It really depends on what you want to achieve. Either way, I am afraid you cannot change the iterator on the fly. A while loop seems reasonable in this context; alternatively, if you know a plausible maximum number of iterations, you could loop over those and deal with the needless iterations via an if statement. Based on your code, something more like:
steps <- 1
for (item in 1:100){
  random_number <- c(sample(20, 1))
  if(random_number < 18){
    random_number <- c(random_number, sample(20, 1))
    steps <- steps + 1
  }
}
print(paste0("It took ", steps, " steps."))
To be honest, this is not really different from a while() loop combined with an if statement to make sure it doesn't run forever.
This can't be done. The vector used in the for loop is evaluated at the start of the loop and any changes to that vector won't affect the loop. You will have to use a while loop or some other type of iteration.

Find which sum of any numbers in an array equals amount

I have a customer who sends electronic payments but doesn't bother to specify which invoices they are for. I'm left guessing which ones, and I would rather not try every single combination manually. I need some sort of pseudo-code to do it, which I can then adapt, but I'm not sure I can come up with a good algorithm myself. I'm familiar with PHP, bash, and Python, but I can adapt.
I would need an array with the following numbers: [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]. Those are the amounts of the invoices. Then I would need to find a sum of any combination of two or more of those numbers that adds up to 596.57. Once found, the program would need to tell me exactly which numbers it used to reach the sum, so I can then know which invoices got paid.
This is very similar to the Subset Sum problem and can be solved with the same kind of brute-force approach typically used for that problem. I have to do this often enough that I keep a simple template of the algorithm handy for when I need it. What is posted below is a slightly modified version (see note 1 below).
This has no restrictions on whether the values are integers or floats. The basic idea is to iterate over the list of input values and keep a running list of every subset that sums to no more than the target value (since a later input value might still bring it up to the target). It could be modified to handle negative values as well by removing the rule that only keeps candidate subsets whose sum is at most the target; in that case, you'd keep all subsets and then search through them at the end (a sketch of that variant follows the template below).
import copy

def find_subsets(base_values, target):
    possible_matches = [[0, []]] # [[known_attainable_value, [list, of, components]], [...], ...]
    matches = [] # we'll return ALL subsets that sum to `target`
    for base_value in base_values:
        temp = copy.deepcopy(possible_matches) # Can't modify in loop, so use a copy
        for possible_match in possible_matches:
            new_val = possible_match[0] + base_value
            if new_val <= target:
                new_possible_match = [new_val, possible_match[1]]
                new_possible_match[1].append(base_value)
                temp.append(new_possible_match)
                if new_val == target:
                    matches.append(new_possible_match[1])
        possible_matches = temp
    return matches
find_subsets([list, of input, values], target_sum)
This is a very inefficient algorithm and it will blow up quickly as the size of the input grows. The Subset Sum problem is NP-Complete, so you are not likely to find a generalized solution that will work in all cases and is efficient.
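As for the negative-values modification mentioned above, a rough sketch (just an illustration; the function name is made up here) could look like this: every subset is kept, and the matches are picked out at the end.

def find_subsets_with_negatives(base_values, target):
    # Keep every subset (no pruning against the target), then filter at the end.
    subsets = [[0, []]]  # [running_sum, [list, of, components]]
    for base_value in base_values:
        # each existing subset spawns a new one that also includes base_value
        subsets += [[s + base_value, comps + [base_value]] for s, comps in subsets]
    return [comps for s, comps in subsets if s == target and comps]

# find_subsets_with_negatives([3, -1, 5], 2)  ->  [[3, -1]]

This enumerates all 2^n subsets, so it is even more sensitive to input size than the template above.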
1: The way lists are being used here is kludgy. If the goal was to simply find any match, the nested lists could be replaced with a dictionary, and we could exit right away once a match is found. But doing that will cause intermediate subsets that sum to the same value to also map to the same dictionary slot, so only one subset with that sum is kept. Since we need to report all matching subsets (because the values represent checks and are presumably not fungible even if the dollar amounts are equal), a dictionary won't work.
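For illustration, that "first match only" dictionary variant might look roughly like the following (a sketch, not part of the original template; the name find_any_subset is made up): one representative subset is stored per attainable sum, and the search stops as soon as the target is reached.

def find_any_subset(base_values, target):
    # maps an attainable sum -> one subset (list of components) that reaches it
    possible = {0: []}
    for base_value in base_values:
        # iterate over a snapshot so the dict isn't modified while looping
        for reached, components in list(possible.items()):
            new_val = reached + base_value
            if new_val > target or new_val in possible:
                continue
            possible[new_val] = components + [base_value]
            if new_val == target:
                return possible[new_val]  # stop at the first match
    return None

# find_any_subset([3, 5, 7, 11], 12)  ->  [5, 7]

As the footnote says, this only ever keeps one subset per attainable sum, which is exactly why it is not suitable when all matching subsets are needed.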
You can use itertools.combinations(t,r) to list all combinations of r elements in array t.
So we loop on the possible values of r, then on the results of itertools.combinations:
import itertools

def find_sum(t, obj):
    t = [x for x in t if x < obj] # filter out elements which are too big
    for r in range(1, len(t)+1): # loop on number of elements
        for subt in itertools.combinations(t, r): # loop on combinations of r elements
            if sum(subt) == obj:
                return subt
    return None

find_sum([1,2,3,4], 6)
# (2, 4)
find_sum([1,2,3,4], 10)
# (1, 2, 3, 4)
find_sum([1,2,3,4], 11)
# None
find_sum([35715, 22373, 10699, 8996, 31239, 12000], 59657)
# None
Rounding errors:
The code above is meant to be used with integers, rather than floats.
To use it with floats, replace the test sum(subt) == obj with a more forgiving test such as abs(sum(subt) - obj) < 0.01.
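For example, a tolerant variant of the function above might look like this (just a sketch; the one-cent tolerance is an assumption and can be adjusted):

import itertools

def find_sum_float(t, obj, tol=0.01):
    t = [x for x in t if x < obj]                  # filter out elements which are too big
    for r in range(1, len(t)+1):                   # loop on number of elements
        for subt in itertools.combinations(t, r):  # loop on combinations of r elements
            if abs(sum(subt) - obj) < tol:         # forgiving comparison for floats
                return subt
    return None

find_sum_float([1.5, 2.25, 3.1], 5.35)
# (2.25, 3.1)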
Relevant documentation:
itertools.combinations

Reactive programming: how to get a moving average using Reactive.jl

I want a signal / function that outputs the moving average of the last x periods. I have come up with two approaches, both of which work, but I believe the second is more properly reactive. Does anyone know of a better method?
Method 1
@time (
    x = Signal(100);
    col = foldp(Int[], x) do acc, val
        push!(acc, val)
    end;
    for i in rand(90:110, 100)
        push!(x, i)
    end;
)
This code executes for me in 0.102663 seconds and can easily yield an average over the last 100 signals: mean(value(col))
Method 2
@time (
    a = Signal(100);
    sar = Vector{Signal}(101);
    sar[1] = a;
    for i in 1:100
        sar[i+1] = previous(sar[i])
    end;
    for i in rand(90:110, 100)
        push!(a, i)
    end;
)
This code executes for me in 0.034911 seconds and can also easily yield an average of the last 100 signals:
sarval = map(value, sar)
mean(sarval[2:end])
Neither method above directly provides an output signal; here is Method 1 applied to create a moving average of specified length as a continuing signal:
Method 1 Applied
x = Signal(initial_value)
col = foldp(Float64[], x) do acc, elem
    push!(acc, elem)
end

macro moving_average(per, collec)
    quote
        map($collec) do arr
            length(arr) < $per ? mean(arr) : mean(arr[(end-$per+1):end])
        end
    end
end

ma_period = @moving_average(period_length, col)
This code uses the first method to generate a signal of arrays containing all past values, whose length grows linearly with the number of updates. For signals that update thousands of times, this seems unwieldy at best, and I am looking for ideas for a better approach. I am not sure how to turn Method 2 into a signal output, nor whether it is the best approach I could take.
Also, a related question: there is a nice built-in function previous(signal) that provides a signal lagging one update behind signal. Is there a way to specify a signal that lags a specified, arbitrary number of updates behind signal?

Fortran's do-loop over arbitrary indices like for-loop in R?

I have two p-times-n arrays x and missx, where x contains arbitrary numbers and missx is an array containing zeros and ones. I need to perform recursive calculations on those points where missx is zero. The obvious solution would be like this:
do i = 1, n
  do j = 1, p
    if(missx(j,i)==0) then
      z(j,i) = ... something depending on the previous computations and x(j,i)
    end if
  end do
end do
The problem with this approach is that missx is almost always 0, so there are a lot of if tests that are almost always true.
In R, I would do it like this:
for(i in 1:n)
for(j in which(xmiss[,i]==0))
z[j,i] <- ... something depending on the previous computations and x[j,i]
Is there a way to do the inner loop like that in Fortran? I did try a version like this:
do i = 1, n
  do j = 1, xlength(i) !xlength(i) gives the number of zero-elements in x(,i)
    j2 = whichx(j,i)   !whichx(1:xlength(i),i) contains the indices of zero-elements in x(,i)
    z(j2,i) = ... something depending on the previous computations and x(j,i)
  end do
end do
This seemed slightly faster than the first solution (not counting the work of building xlength and whichx), but is there a cleverer way to do this, like the R version, so I wouldn't need to store the xlength and whichx arrays?
I don't think you are going to get a dramatic speedup anyway; if you must do the iteration for most items, then storing just a list of the zero-valued positions for the whole array is not really an option. You can of course use the WHERE or FORALL constructs.
forall(i = 1: n,j = 1: p,miss(j,i)==0) z(j,i) = ...
or just
where(miss==0) z = ..
But the usual limitations of these constructs apply.

missing value where TRUE/FALSE needed error in R

I have a column with different numbers (from 1 to tt) and would like to use a loop to count how often each of these numbers occurs in R.
count = matrix(ncol=1,nrow=tt) # creating an empty matrix
for (j in 1:tt)
  {count[j] = 0} # initiate count at 0
for (j in 1:tt)
{
  for (i in 1:N) # for each observation (1 to N)
  {
    if (column[i] == j)
      {count[j] = count[j] + 1}
  }
}
Unfortunately I keep getting this error.
Error in if (column[i] == j) { :
missing value where TRUE/FALSE needed
So I tried:
for (i in 1:N) #from obs 1 to obs N
if (column[i] = 1) print("Test")
I basically got the same error.
I tried to do a bit of research on this kind of error, and a lot of what I found talks about "debugging", which I'm not familiar with.
Hopefully someone can tell me what's happening here. Thanks!
As you progress with your learning of R, one feature you should be aware of is vectorisation. Many operations that (in C, say) would have to be done in a loop can be done all at once in R. This is particularly true when you have a vector/matrix/array and a scalar, and want to perform an operation between them.
Say you want to add 2 to the vector myvector. The C/C++ way to do it in R would be to use a loop:
for ( i in 1:length(myvector) )
  myvector[i] = myvector[i] + 2
Since R has vectorisation, you can do the addition without a loop at all, that is, add a scalar to a vector:
myvector = myvector + 2
Vectorisation means the loop is done internally. This is much more efficient than writing the loop within R itself! (If you've ever done any Matlab or python/numpy it's much the same in this sense).
I know you're new to R so this is a bit confusing but just keep in mind that often loops can be eliminated in R.
With that in mind, let's look at your code:
The initialisation of count to 0 can be done at creation, so the first loop is unnecessary.
count = matrix(0,ncol=1,nrow=tt)
Secondly, because of vectorisation, you can compare a vector to a scalar.
So for your inner loop in i, instead of looping through column and doing if column[i]==j, you can do idx = (column==j). This returns a vector that is TRUE where column[i]==j and FALSE otherwise.
To find how many elements of column are equal to j, we just count how many TRUEs there are in idx. That is, we do sum(idx).
So your double-loop can be rewritten like so:
for ( j in 1:tt ) {
  idx = (column == j)
  count[j] = sum(idx) # no need to add
}
Now it's even possible to remove the outer loop in j by using the function sapply:
sapply( 1:tt, function(j) sum(column==j) )
The above line of code means "for each j in 1:tt, evaluate function(j)", and it returns a vector whose j'th element is the result of the function.
So in summary, you can reduce your entire code to:
count = sapply( 1:tt, function(j) sum(column==j) )
(Although this doesn't explain your error, which I suspect has to do with the construction or class of your column: for example, if column contains an NA, then column[i] == j evaluates to NA, and if (NA) produces exactly the "missing value where TRUE/FALSE needed" message.)
I suggest not using for loops at all; the count function from the plyr package does exactly what you want in one line of code.
