Parallel iteration over array with step size greater than 1 - .net-core

I'm working on a practice program for doing belief propagation stereo vision. The relevant aspect of that here is that I have a fairly long array representing every pixel in an image, and want to carry out an operation on every second entry in the array at each iteration of a for loop - first one half of the entries, and then at the next iteration the other half (this comes from an optimisation described by Felzenswalb & Huttenlocher in their 2006 paper 'Efficient belief propagation for early vision'.) So, you could see it as having an outer for loop which runs a number of times, and for each iteration of that loop I iterate over half of the entries in the array.
I would like to parallelise the operation of iterating over the array like this, since I believe it would be thread-safe to do so, and of course potentially faster. The operation involved updates values inside the data structures representing the neighbouring pixels, which are not themselves used in a given iteration of the outer loop. Originally I just iterated over the entire array in one go, which meant that it was fairly trivial to carry this out - all I needed to do was put .Parallel between Array and .iteri. Changing to operating on every second array entry is trickier, however.
To make the change from simply iterating over every entry, I from Array.iteri (fun i p -> ... to using for i in startIndex..2..(ArrayLength - 1) do, where startIndex is either 1 or 0 depending on which one I used last (controlled by toggling a boolean). This means though that I can't simply use the really nice .Parallel to make things run in parallel.
I haven't been able to find anything specific about how to implement a parallel for loop in .NET which has a step size greater than 1. The best I could find was a paragraph in an old MSDN document on parallel programming in .NET, but that paragraph only makes a vague statement about transforming an index inside a loop body. I do not understand what is meant there.
I looked at Parallel.For and Parallel.ForEach, as well as creating a custom partitioner, but none of those seemed to include options for changing the step size.
The other option that occurred to me was to use a sequence expression such as
let getOddOrEvenArrayEntries myarray oddOrEven =
seq {
let startingIndex =
if oddOrEven then
1
else
0
for i in startingIndex..2..(Array.length myarray- 1) do
yield (i, myarray.[i])
}
and then using PSeq.iteri from ParallelSeq, but I'm not sure whether it will work correctly with .NET Core 2.2. (Note that, currently at least, I need to know the index of the given element in the array, as it is used as the index into another array during the processing).
How can I go about iterating over every second element of an array in parallel? I.e. iterating over an array using a step size greater than 1?

You could try PSeq.mapi which provides not only a sequence item as a parameter but also the index of an item.
Here's a small example
let res = nums
|> PSeq.mapi(fun index item -> if index % 2 = 0 then item else item + 1)
You can also have a look over this sampling snippet. Just be sure to substitute Seq with PSeq

Related

Saving multiple sparse arrays in one big sparse array

I have been trying to implement some code in Julia JuMP. The idea of my code is that I have a for loop inside my while loop that runs S times. In each of these loops I solve a subproblem and get some variables as well as opt=1 if the subproblem was optimal or opt=0 if it was not optimal. Depending on the value of opt, I have two types of constraints, either optimality cuts (if opt=1) or feasibility cuts (if opt=0). So the intention with my code is that I only add all of the optimality cuts if there are no feasibility cuts for s=1:S (i.e. we get opt=1 in every iteration from 1:S).
What I am looking for is a better way to save the values of ubar, vbar and wbar. Currently I am saving them one at a time with the for-loop, which is quite expensive.
So the problem is that my values of ubar,vbar and wbar are sparse axis arrays. I have tried to save them in other ways like making a 3d sparse axis array, which I could not get to work, since I couldn't figure out how to initialize it.
The below code works (with the correct code inserted inside my <>'s of course), but does not perform as well as I wish. So if there is some way to save the values of 2d sparse axis arrays more efficiently, I would love to know it! Thank you in advance!
ubar2=zeros(nV,nV,S)
vbar2=zeros(nV,nV,S)
wbar2=zeros(nV,nV,S)
while <some condition>
opts=0
for s=1:S
<solve a subproblem, get new ubar,vbar,wbar and opt=1 if optimal or 0 if not>
opts+=opt
if opt==1
# Add opt cut Constraints
for i=1:nV
for k=1:nV
if i!=k
ubar2[i,k,s]=ubar[i,k]
end
end
for j=i:nV
if links[i,j]==1
vbar2[i,j,s]=vbar[i,j]
wbar2[i,j,s]=wbar[i,j]
end
end
end
else
# Add feas cut Constraints
#constraint(mas, <constraint from ubar,vbar,wbar> <= 0)
break
end
if opts==S
for s=1:S
#constraint(mas, <constraint from ubar2,vbar2,wbar2> <= <some variable>)
end
end
end
A SparseAxisArray is simply a thin wrapper in top of a Dict.
It was defined such that when the user creates a container in a JuMP macro, whether he gets an Array, a DenseAxisArray or a SparseAxisArray, it behaves as close as possible to one another hence the user does not need to care about what he obtained for most operations.
For this reason we could not just create a Dict as it behaves differently as an array. For instance you cannot do getindex with multiple indices as x[2, 2].
Here you can use either a Dict or a SparseAxisArray, as you prefer.
Both of them have O(1) complexity for setting and getting new elements and a sparse storage which seems to be adequate for what you need.
If you choose SparseAxisArray, you can initialize it with
ubar2 = JuMP.Containers.SparseAxisArray(Dict{Tuple{Int,Int,Int},Float64}())
and set it with
ubar2[i,k,s]=ubar[i,k]
If you choose Dict, you can initialize it with
ubar2 = Dict{Tuple{Int,Int,Int},Float64}()
and set it with
ubar2[(i,k,s)]=ubar[i,k]

Why after pressing semicolon program is back in deep recursion?

I'm trying to understand the semicolon functionality.
I have this code:
del(X,[X|Rest],Rest).
del(X,[Y|Tail],[Y|Rest]) :-
del(X,Tail,Rest).
permutation([],[]).
permutation(L,[X|P]) :- del(X,L,L1), permutation(L1,P).
It's the simple predicate to show all permutations of given list.
I used the built-in graphical debugger in SWI-Prolog because I wanted to understand how it works and I understand for the first case which returns the list given in argument. Here is the diagram which I made for better understanding.
But I don't get it for the another solution. When I press the semicolon it doesn't start in the place where it ended instead it's starting with some deep recursion where L=[] (like in step 9). I don't get it, didn't the recursion end earlier? It had to go out of the recursions to return the answer and after semicolon it's again deep in recursion.
Could someone clarify that to me? Thanks in advance.
One analogy that I find useful in demystifying Prolog is that Backtracking is like Nested Loops, and when the innermost loop's variables' values are all found, the looping is suspended, the vars' values are reported, and then the looping is resumed.
As an example, let's write down simple generate-and-test program to find all pairs of natural numbers above 0 that sum up to a prime number. Let's assume is_prime/1 is already given to us.
We write this in Prolog as
above(0, N), between(1, N, M), Sum is M+N, is_prime(Sum).
We write this in an imperative pseudocode as
for N from 1 step 1:
for M from 1 step 1 until N:
Sum := M+N
if is_prime(Sum):
report_to_user_and_ask(Sum)
Now when report_to_user_and_ask is called, it prints Sum out and asks the user whether to abort or to continue. The loops are not exited, on the contrary, they are just suspended. Thus all the loop variables values that got us this far -- and there may be more tests up the loops chain that sometimes succeed and sometimes fail -- are preserved, i.e. the computation state is preserved, and the computation is ready to be resumed from that point, if the user presses ;.
I first saw this in Peter Norvig's AI book's implementation of Prolog in Common Lisp. He used mapping (Common Lisp's mapcan which is concatMap in Haskell or flatMap in many other languages) as a looping construct though, and it took me years to see that nested loops is what it is really all about.
Goals conjunction is expressed as the nesting of the loops; goals disjunction is expressed as the alternatives to loop through.
Further twist is that the nested loops' structure isn't fixed from the outset. It is fluid, the nested loops of a given loop can be created depending on the current state of that loop, i.e. depending on the current alternative being explored there; the loops are written as we go. In (most of the) languages where such dynamic creation of nested loops is impossible, it can be encoded with nested recursion / function invocation / inside the loops. (Here's one example, with some pseudocode.)
If we keep all such loops (created for each of the alternatives) in memory even after they are finished with, what we get is the AND-OR tree (mentioned in the other answer) thus being created while the search space is being explored and the solutions are found.
(non-coincidentally this fluidity is also the essence of "monad"; nondeterminism is modeled by the list monad; and the essential operation of the list monad is the flatMap operation which we saw above. With fluid structure of loops it is "Monad"; with fixed structure it is "Applicative Functor"; simple loops with no structure (no nesting at all): simply "Functor" (the concepts used in Haskell and the like). Also helps to demystify those.)
So, the proper slogan could be Backtracking is like Nested Loops, either fixed, known from the outset, or dynamically-created as we go. It's a bit longer though. :)
Here's also a Prolog example, which "as if creates the code to be run first (N nested loops for a given value of N), and then runs it." (There's even a whole dedicated tag for it on SO, too, it turns out, recursive-backtracking.)
And here's one in Scheme ("creates nested loops with the solution being accessible in the innermost loop's body"), and a C++ example ("create n nested loops at run-time, in effect enumerating the binary encoding of 2n, and print the sums out from the innermost loop").
There is a big difference between recursion in functional/imperative programming languages and Prolog (and it really became clear to me only in the last 2 weeks or so):
In functional/imperative programming, you recurse down a call chain, then come back up, unwinding the stack, then output the result. It's over.
In Prolog, you recurse down an AND-OR tree (really, alternating AND and OR nodes), selecting a predicate to call on an OR node (the "choicepoint"), from left to right, and calling every predicate in turn on an AND node, also from left to right. An acceptable tree has exactly one predicate returning TRUE under each OR node, and all predicates returning TRUE under each AND node. Once an acceptable tree has been constructed, by the very search procedure, we are (i.e. the "search cursor" is) on a rightmost bottommost node .
Success in constructing an acceptable tree also means a solution to the query entered at the Prolog Toplevel (the REPL) has been found: The variable values are output, but the tree is kept (unless there are no choicepoints).
And this is also important: all variables are global in the sense that if a variable X as been passed all the way down the call chain from predicate to predicate to the rightmost bottommost node, then constrained at the last possible moment by unifying it with 2 for example, X = 2, then the Prolog Toplevel is aware of that without further ado: nothing needs to be passed up the call chain.
If you now press ;, search doesn't restart at the top of the tree, but at the bottom, i.e. at the current cursor position: the nearest parent OR node is asked for more solutions. This may result in much search until a new acceptable tree has been constructed, we are at a new rightmost bottommost node. The new variable values are output and you may again enter ;.
This process cycles until no acceptable tree can be constructed any longer, upon which false is output.
Note that having this AND-OR as an inspectable and modifiable data structure at runtime allows some magical tricks to be deployed.
There is bound to be a lot of power in debugging tools which record this tree to help the user who gets the dreaded sphynxian false from a Prolog program that is supposed to work. There are now Time Traveling Debuggers for functional and imperative languages, after all...

Determine if there exists a number in the array occurring k times

I want to create a divide and conquer algorithm (O(nlgn) runtime) to determine if there exists a number in an array that occurs k times. A constraint on this problem is that only a equality/inequality comparison method is defined on the objects of the array (i.e can't use <, >).
So I have tried a number of approaches including splitting the array into k pieces of equal size (approximately). The approach is similar to finding the majority item in an array, however in the majority case when you split the array, you know that one half must have a majority item if such an item exists. Any pointers or tips that one could provide to put me in the right direction ?
EDIT: To clear up a little, I am wondering whether the problem of finding the majority item by splitting the array in half and using a recursive solution can be extended to other situations where k may be n/4 or n/5 etc.
Maybe I should of phrased the question using n/k instead.
This is impossible. As a simple example of why this is impossible, consider an input with a length-n array, all elements distinct, and k=2. The only way to be sure no element appears twice is to compare every element against every other element, which takes O(n^2) time. Until you perform all possible comparisons, you cannot be sure that some pair you didn't compare isn't actually equal.

Number of movements in a dynamic array

A dynamic array is an array that doubles its size, when an element is added to an already full array, copying the existing elements to a new place more details here. It is clear that there will be ceil(log(n)) of bulk copy operations.
In a textbook I have seen the number of movements M as being computed this way:
M=sum for {i=1} to {ceil(log(n))} of i*n/{2^i} with the argument that "half the elements move once, a quarter of the elements twice"...
But I thought that for each bulk copy operation the number of copied/moved elements is actually n/2^i, as every bulk operation is triggered by reaching and exceeding the 2^i th element, so that the number of movements is
M=sum for {i=1} to {ceil(log(n))} of n/{2^i} (for n=8 it seems to be the correct formula).
Who is right and what is wrong in the another argument?
Both versions are O(n), so there is no big difference.
The textbook version counts the initial write of each element as a move operation but doesn't consider the very first element, which will move ceil(log(n)) times. Other than that they are equivalent, i.e.
(sum for {i=1} to {ceil(log(n))} of i*n/{2^i}) - (n - 1) + ceil(log(n))
== sum for {i=1} to {ceil(log(n))} of n/{2^i}
when n is a power of 2. Both are off by different amounts when n is not a power of 2.

modifying an element of a list in-place in J, can it be done?

I have been playing with an implementation of lookandsay (OEIS A005150) in J. I have made two versions, both very simple, using while. type control structures. One recurs, the other loops. Because I am compulsive, I started running comparative timing on the versions.
look and say is the sequence 1 11 21 1211 111221 that s, one one, two ones, etc.
For early elements of the list (up to around 20) the looping version wins, but only by a tiny amount. Timings around 30 cause the recursive version to win, by a large enough amount that the recursive version might be preferred if the stack space were adequate to support it. I looked at why, and I believe that it has to do with handling intermediate results. The 30th number in the sequence has 5808 digits. (32nd number, 9898 digits, 34th, 16774.)
When you are doing the problem with recursion, you can hold the intermediate results in the recursive call, and the unstacking at the end builds the results so that there is minimal handling of the results.
In the list version, you need a variable to hold the result. Every loop iteration causes you to need to add two elements to the result.
The problem, as I see it, is that I can't find any way in J to modify an extant array without completely reassigning it. So I am saying
try. o =. o,e,(0&{y) catch. o =. e,(0&{y) end.
to put an element into o where o might not have a value when we start. That may be notably slower than
o =. i.0
.
.
.
o =. (,o),e,(0&{y)
The point is that the result gets the wrong shape without the ravels, or so it seems. It is inheriting a shape from i.0 somehow.
But even functions like } amend don't modify a list, they return a list that has a modification made to it, and if you want to save the list you need to assign it. As the size of the assigned list increases (as you walk the the number from the beginning to the end making the next number) the assignment seems to take more time and more time. This assignment is really the only thing I can see that would make element 32, 9898 digits, take less time in the recursive version while element 20 (408 digits) takes less time in the loopy version.
The recursive version builds the return with:
e,(0&{y),(,lookandsay e }. y)
The above line is both the return line from the function and the recursion, so the whole return vector gets built at once as the call gets to the end of the string and everything unstacks.
In APL I thought that one could say something on the order of:
a[1+rho a] <- new element
But when I try this in NARS2000 I find that it causes an index error. I don't have access to any other APL, I might be remembering this idiom from APL Plus, I doubt it worked this way in APL\360 or APL\1130. I might be misremembering it completely.
I can find no way to do that in J. It might be that there is no way to do that, but the next thought is to pre-allocate an array that could hold results, and to change individual entries. I see no way to do that either - that is, J does not seem to support the APL idiom:
a<- iota 5
a[3] <- -1
Is this one of those side effect things that is disallowed because of language purity?
Does the interpreter recognize a=. a,foo or some of its variants as a thing that it should fastpath to a[>:#a]=.foo internally?
This is the recursive version, just for the heck of it. I have tried a bunch of different versions and I believe that the longer the program, the slower, and generally, the more complex, the slower. Generally, the program can be chained so that if you want the nth number you can do lookandsay^: n ] y. I have tried a number of optimizations, but the problem I have is that I can't tell what environment I am sending my output into. If I could tell that I was sending it to the next iteration of the program I would send it as an array of digits rather than as a big number.
I also suspect that if I could figure out how to make a tacit version of the code, it would run faster, based on my finding that when I add something to the code that should make it shorter, it runs longer.
lookandsay=: 3 : 0
if. 0 = # ,y do. return. end. NB. return on empty argument
if. 1 ~: ##$ y do. NB. convert rank 0 argument to list of digits
y =. (10&#.^:_1) x: y
f =. 1
assert. 1 = ##$ y NB. the converted argument must be rank 1
else.
NB. yw =. y
f =. 0
end.
NB. e should be a count of the digits that match the leading digit.
e=.+/*./\y=0&{y
if. f do.
o=. e,(0&{y),(,lookandsay e }. y)
assert. e = 0&{ o
10&#. x: o
return.
else.
e,(0&{y),(,lookandsay e }. y)
return.
end.
)
I was interested in the characteristics of the numbers produced. I found that if you start with a 1, the numerals never get higher than 3. If you start with a numeral higher than 3, it will survive as a singleton, and you can also get a number into the generated numbers by starting with something like 888888888 which will generate a number with one 9 in it and a single 8 at the end of the number. But other than the singletons, no digit gets higher than 3.
Edit:
I did some more measuring. I had originally written the program to accept either a vector or a scalar, the idea being that internally I'd work with a vector. I had thought about passing a vector from one layer of code to the other, and I still might using a left argument to control code. With I pass the top level a vector the code runs enormously faster, so my guess is that most of the cpu is being eaten by converting very long numbers from vectors to digits. The recursive routine always passes down a vector when it recurs which might be why it is almost as fast as the loop.
That does not change my question.
I have an answer for this which I can't post for three hours. I will post it then, please don't do a ton of research to answer it.
assignments like
arr=. 'z' 15} arr
are executed in place. (See JWiki article for other supported in-place operations)
Interpreter determines that only small portion of arr is updated and does not create entire new list to reassign.
What happens in your case is not that array is being reassigned, but that it grows many times in small increments, causing memory allocation and reallocation.
If you preallocate (by assigning it some large chunk of data), then you can modify it with } without too much penalty.
After I asked this question, to be honest, I lost track of this web site.
Yes, the answer is that the language has no form that means "update in place, but if you use two forms
x =: x , most anything
or
x =: most anything } x
then the interpreter recognizes those as special and does update in place unless it can't. There are a number of other specials recognized by the interpreter, like:
199(1000&|#^)199
That combined operation is modular exponentiation. It never calculates the whole exponentiation, as
199(1000&|^)199
would - that just ends as _ without the #.
So it is worth reading the article on specials. I will mark someone else's answer up.
The link that sverre provided above ( http://www.jsoftware.com/jwiki/Essays/In-Place%20Operations ) shows the various operations that support modifying an existing array rather than creating a new one. They include:
myarray=: myarray,'blah'
If you are interested in a tacit version of the lookandsay sequence see this submission to RosettaCode:
las=: ,#((# , {.);.1~ 1 , 2 ~:/\ ])&.(10x&#.inv)#]^:(1+i.#[)
5 las 1
11 21 1211 111221 312211

Resources