@view when both sides of an assignment are array slices in Julia

I tried to optimize my code with @view, but I don't know whether I should use @view when an array slice is on the left-hand side.
I know that @view can reduce cost when used on the right-hand side, like this:
a = @view b[:]
But if the left-hand side is also an array slice, I don't know what @view means.
For example, see the code below:
using BenchmarkTools
a = ones(50,100)
b = zeros(50,50)
@benchmark a[:,1:25] = b[:,1:25]
@benchmark a[:,1:25] = @view b[:,1:25]
The second assignment, with @view, is faster. This operation is more like a copy, because if I change an element in b, a doesn't change.
So my question is: what does @view mean in this case? Should I use @view in this situation?
Why doesn't b change when I change elements in a?
The second assignment seems faster in this case, but I found that using @view can slow down my program in larger cases.

a[:,1:25] = b[:,1:25] first creates a new array for b[:,1:25] and then copies it element by element into a. a[:,1:25] = @view b[:,1:25] skips the first step, leading to no allocations and better performance.
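A minimal sketch of the copy semantics, reusing the array sizes from the question (the specific indices and values written below are arbitrary, chosen only for illustration). Assigning into a slice of a always copies values into a's own memory, whether or not the right-hand side is a view; only a view bound to a name aliases b.

```julia
a = ones(50, 100)
b = zeros(50, 50)

# Assigning a view into a slice still copies the values into a's own memory;
# the view only avoids the temporary array for b[:, 1:25].
a[:, 1:25] = @view b[:, 1:25]
b[1, 1] = 7.0
@assert a[1, 1] == 0.0   # a is unaffected by later changes to b

# A view bound to a name, by contrast, shares memory with b.
v = @view b[:, 1:25]
v[2, 2] = 3.0
@assert b[2, 2] == 3.0   # writing through v changed b
@assert v[1, 1] == 7.0   # and v sees the earlier change to b
```

So on the right-hand side of a slice assignment, @view changes only whether a temporary is allocated, not what ends up in a.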

Related

Failure to report number that is too small

I did the following calculations in Julia
z = LinRange(-0.09025000000000001,0.19025000000000003,5)
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* (similar(z) .*0 .+1))
minimum(cdf.(d, (z[3]+z[2])/2))
The problem is that the last line sometimes gives me the correct result 4.418051841202834e-239 and sometimes reports the error DomainError with NaN: Normal: the condition σ >= zero(σ) is not satisfied. I think this is because 4.418051841202834e-239 is too small, but I was wondering why my code can give me different results.
In addition to points mentioned by others, here are a few more:
Firstly, don't use LinRange when numerical accuracy is of importance. This is what the range function is for. LinRange can be used when numerical precision is of lesser importance, since it is faster. From the docstring of range:
Special care is taken to ensure intermediate values are computed rationally. To avoid this induced overhead, see the LinRange constructor.
Example:
julia> LinRange(-0.09025000000000001,0.19025000000000003,5) .- range(-0.09025000000000001,0.19025000000000003,5)
0.0:-3.469446951953614e-18:-1.3877787807814457e-17
Secondly, this is a pretty terrible way to create a vector of a certain value:
0.0051 .* (similar(z) .*0 .+1)
Others have mentioned ones, etc., but I think it's better to use fill
fill(0.0051, size(z))
which directly fills the array with the right value. Perhaps one should use convert(eltype(z), 0.0051) inside fill.
Thirdly, don't create this vector at all! You use broadcasting, so just use the scalar value:
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051) # look! just a scalar!
This is how broadcasting works: it implicitly expands singleton dimensions to match the other arguments (without actually wasting that memory).
Much of the point of broadcasting is that you don't need to create that sort of 'dummy arrays' anymore. If you find yourself doing that, give it another think; constant-valued arrays are inherently wasteful, and you shouldn't need to create them.
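A small sketch of that point using only Base Julia. The function standardize is a hypothetical stand-in for Normal (any two-argument function behaves the same way under broadcasting), and the range endpoints are rounded versions of the ones in the question.

```julia
z = range(-0.09025, 0.19025, length=5)

# Hypothetical stand-in for Normal.(mu, sigma); any two-argument function works.
standardize(mu, sigma) = mu / sigma

sigma_vec = fill(0.0051, size(z))          # explicit constant vector
with_vector = standardize.(z, sigma_vec)   # broadcast over two collections
with_scalar = standardize.(z, 0.0051)      # the scalar is broadcast, no vector needed

@assert with_vector == with_scalar
@assert length(with_scalar) == 5
```

Both calls compute the same elements; the scalar form simply never allocates the constant vector.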
There are two problems:
Noted by @Dan Getz: similar does not initialize the values, and quite often unused areas of memory hold bit patterns corresponding to NaN. In that case multiplying by 0 does not help, since NaN * 0 is still NaN. Instead you want ones(eltype(z),size(z)).
You need to use higher precision than Float64. BigFloat is one way to go; you just need to remember to call setprecision(BigFloat, 128) so that you actually control how many bits you use. However, a much more time-efficient solution (if you run computations at scale) is to use a dedicated package such as DoubleFloats.
Sample corrected code using DoubleFloats below:
julia> z = LinRange(df64"-0.09025000000000001",df64"0.19025000000000003",5)
5-element LinRange{Double64, Int64}:
-0.09025000000000001,-0.020125,0.05000000000000001,0.12012500000000002,0.19025000000000003
julia> d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* ones(eltype(z),size(z)))
5-element Vector{Normal{Double64}}:
Normal{Double64}(μ=-0.083250505, σ=0.0051)
Normal{Double64}(μ=-0.016631754999999998, σ=0.0051)
Normal{Double64}(μ=0.049986995000000006, σ=0.0051)
Normal{Double64}(μ=0.11660574500000001, σ=0.0051)
Normal{Double64}(μ=0.18322449500000001, σ=0.0051)
julia> minimum(cdf.(d, (z[3]+z[2])/2))
4.418051841203009e-239
The problem in the code is similar(z), which produces a vector with undefined entries that is then used without initialization. Use ones(length(z)) instead.
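To see why multiplying by zero does not rescue uninitialized memory, here is a short sketch. The vector garbage is an assumed stand-in for whatever similar(z) happens to return on a given run; its values are made up for illustration.

```julia
# NaN survives multiplication by zero.
@assert isnan(NaN * 0)

# Assumed stand-in for the arbitrary contents similar(z) may return.
garbage = [NaN, 1.0e300, -3.5]

# The pattern from the question: only the non-NaN entries become 0.0051.
sigma_bad = 0.0051 .* (garbage .* 0 .+ 1)
@assert isnan(sigma_bad[1])
@assert sigma_bad[2] == 0.0051

# Initializing explicitly never depends on what was in memory before.
sigma_ok = fill(0.0051, length(garbage))
@assert all(==(0.0051), sigma_ok)
```

Because the contents of uninitialized memory vary from run to run, the original code sometimes succeeds and sometimes passes NaN as the standard deviation.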

rev_append vs (append or @)

If we have two lists l1 and l2 and want to concatenate them, we can use @ or append, which is O(n1) where n1 is the length of l1. Or we can use rev_append, which according to the doc is:
equivalent to List.rev l1 @ l2, but rev_append is tail-recursive and more efficient.
So is rev_append more efficient than @, or is it more efficient than List.rev + @? And is it better to use it instead of @ and append when we don't care about the order?
OCaml lists are immutable. The second list doesn't need to be changed, but the first list has to be copied so the copy can point to the second list. Hence you're going to have to traverse the first list somehow. Nothing you can do will change the big-O time complexity of the append.
Since you can only add new elements at the beginning of a list, you need to traverse the first list in reverse order if you want the result to preserve the order of the first list.
The most obvious way to do this is to call recursively until you're at the end of the first list, then do the prefixing as you return from each recursive call. However this isn't tail-recursive. I.e., it will consume stack space proportional to the length of the first list. When the first list is long, you can run out of stack space (aka stack overflow).
This is the way that @ works. It takes time and stack space proportional to the length of the first list.
Another idea is to give up on maintaining the order of the first list. If you prefix the first list in reverse order, you can easily make the operation tail-recursive. That's the purpose of List.rev_append. It takes constant stack space.
If you want to maintain the original list orders, but also use constant stack space, you can reverse the first list (with List.rev), then use List.rev_append.
Plain List.rev_append is faster than @ because it doesn't have to make internal function calls; it can just be a loop. It's also obviously faster than List.rev plus List.rev_append.
In summary, if you don't care about the final order, then yes, List.rev_append is faster than @. It also won't overflow the stack. It's not going to be a gigantic amount faster, because the time complexity is basically the same.

Saving multiple sparse arrays in one big sparse array

I have been trying to implement some code in Julia JuMP. The idea of my code is that I have a for loop inside my while loop that runs S times. In each of these loops I solve a subproblem and get some variables as well as opt=1 if the subproblem was optimal or opt=0 if it was not optimal. Depending on the value of opt, I have two types of constraints, either optimality cuts (if opt=1) or feasibility cuts (if opt=0). So the intention with my code is that I only add all of the optimality cuts if there are no feasibility cuts for s=1:S (i.e. we get opt=1 in every iteration from 1:S).
What I am looking for is a better way to save the values of ubar, vbar and wbar. Currently I am saving them one at a time with the for-loop, which is quite expensive.
So the problem is that my values of ubar,vbar and wbar are sparse axis arrays. I have tried to save them in other ways like making a 3d sparse axis array, which I could not get to work, since I couldn't figure out how to initialize it.
The below code works (with the correct code inserted inside my <>'s of course), but does not perform as well as I wish. So if there is some way to save the values of 2d sparse axis arrays more efficiently, I would love to know it! Thank you in advance!
ubar2=zeros(nV,nV,S)
vbar2=zeros(nV,nV,S)
wbar2=zeros(nV,nV,S)
while <some condition>
    opts=0
    for s=1:S
        <solve a subproblem, get new ubar,vbar,wbar and opt=1 if optimal or 0 if not>
        opts+=opt
        if opt==1
            # Add opt cut Constraints
            for i=1:nV
                for k=1:nV
                    if i!=k
                        ubar2[i,k,s]=ubar[i,k]
                    end
                end
                for j=i:nV
                    if links[i,j]==1
                        vbar2[i,j,s]=vbar[i,j]
                        wbar2[i,j,s]=wbar[i,j]
                    end
                end
            end
        else
            # Add feas cut Constraints
            @constraint(mas, <constraint from ubar,vbar,wbar> <= 0)
            break
        end
    end
    # Only add the optimality cuts if every subproblem was optimal
    if opts==S
        for s=1:S
            @constraint(mas, <constraint from ubar2,vbar2,wbar2> <= <some variable>)
        end
    end
end
A SparseAxisArray is simply a thin wrapper on top of a Dict.
It was defined so that when the user creates a container in a JuMP macro, whether they get an Array, a DenseAxisArray or a SparseAxisArray, the containers behave as similarly as possible; hence for most operations the user does not need to care which one they obtained.
For this reason we could not just create a Dict, as it behaves differently from an array. For instance, with a Dict you cannot do getindex with multiple indices, as in x[2, 2].
Here you can use either a Dict or a SparseAxisArray, as you prefer.
Both of them have O(1) complexity for setting and getting new elements and a sparse storage which seems to be adequate for what you need.
If you choose SparseAxisArray, you can initialize it with
ubar2 = JuMP.Containers.SparseAxisArray(Dict{Tuple{Int,Int,Int},Float64}())
and set it with
ubar2[i,k,s]=ubar[i,k]
If you choose Dict, you can initialize it with
ubar2 = Dict{Tuple{Int,Int,Int},Float64}()
and set it with
ubar2[(i,k,s)]=ubar[i,k]
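As a rough sketch of the Dict variant using only Base Julia: the small ubar dictionary below is a hypothetical stand-in for one subproblem's sparse result, and the scenario index s = 3 is arbitrary. Only the entries that actually exist are copied, so no loop over all nV×nV pairs is needed.

```julia
# Accumulator keyed by (i, k, s), as suggested above.
ubar2 = Dict{Tuple{Int,Int,Int},Float64}()

# Hypothetical stand-in for the sparse result of one subproblem.
ubar = Dict((1, 2) => 0.5, (2, 1) => 1.5)
s = 3  # arbitrary scenario index for illustration

# Copy only the stored entries, tagging each with the scenario index.
for ((i, k), value) in ubar
    ubar2[(i, k, s)] = value
end

@assert ubar2[(1, 2, 3)] == 0.5
@assert length(ubar2) == 2

# Missing entries can be read with a default instead of being stored as zeros.
@assert get(ubar2, (1, 1, 3), 0.0) == 0.0
```

Whether this is faster than the dense zeros(nV,nV,S) arrays depends on how sparse the results are, so it is worth benchmarking on the real problem.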

Parallel iteration over array with step size greater than 1

I'm working on a practice program for doing belief propagation stereo vision. The relevant aspect of that here is that I have a fairly long array representing every pixel in an image, and want to carry out an operation on every second entry in the array at each iteration of a for loop: first one half of the entries, and then at the next iteration the other half (this comes from an optimisation described by Felzenszwalb & Huttenlocher in their 2006 paper 'Efficient belief propagation for early vision'). So, you could see it as having an outer for loop which runs a number of times, and for each iteration of that loop I iterate over half of the entries in the array.
I would like to parallelise the operation of iterating over the array like this, since I believe it would be thread-safe to do so, and of course potentially faster. The operation involved updates values inside the data structures representing the neighbouring pixels, which are not themselves used in a given iteration of the outer loop. Originally I just iterated over the entire array in one go, which meant that it was fairly trivial to carry this out - all I needed to do was put .Parallel between Array and .iteri. Changing to operating on every second array entry is trickier, however.
To make the change from simply iterating over every entry, I switched from Array.iteri (fun i p -> ... to using for i in startIndex..2..(ArrayLength - 1) do, where startIndex is either 1 or 0 depending on which one I used last (controlled by toggling a boolean). This means, though, that I can't simply use the really nice .Parallel to make things run in parallel.
I haven't been able to find anything specific about how to implement a parallel for loop in .NET which has a step size greater than 1. The best I could find was a paragraph in an old MSDN document on parallel programming in .NET, but that paragraph only makes a vague statement about transforming an index inside a loop body. I do not understand what is meant there.
I looked at Parallel.For and Parallel.ForEach, as well as creating a custom partitioner, but none of those seemed to include options for changing the step size.
The other option that occurred to me was to use a sequence expression such as
let getOddOrEvenArrayEntries myarray oddOrEven =
    seq {
        let startingIndex =
            if oddOrEven then
                1
            else
                0
        for i in startingIndex..2..(Array.length myarray - 1) do
            yield (i, myarray.[i])
    }
and then using PSeq.iteri from ParallelSeq, but I'm not sure whether it will work correctly with .NET Core 2.2. (Note that, currently at least, I need to know the index of the given element in the array, as it is used as the index into another array during the processing).
How can I go about iterating over every second element of an array in parallel? I.e. iterating over an array using a step size greater than 1?
You could try PSeq.mapi which provides not only a sequence item as a parameter but also the index of an item.
Here's a small example
let res =
    nums
    |> PSeq.mapi (fun index item -> if index % 2 = 0 then item else item + 1)
You can also have a look at this sampling snippet. Just be sure to substitute Seq with PSeq.

Pushing element at back of vec in armadillo

How can I push an element onto the end of an Armadillo vec? I am adding and removing elements in a sorted list inside a loop, which is very expensive. This is how I currently remove an element, going from vec x to vec x_curr:
x_curr = x(find(x != element))
However, it's not trivial to add an element in the loop.
x_curr = x; x_curr << element; x_curr = sort(x_curr);
This is not correct, and it is also not very efficient. What would be the most efficient way to do this in Armadillo? Is there any other solution, for example using the STL? I am using this in RcppArmadillo. I could perhaps sort in every loop iteration. x_curr stores the column indices of an arma::mat, i.e. I am going to use it as mat.col(x_curr).
I don't understand your question.
Armadillo is a math library, so it operates on vectors. If you do not know your size, you could allocate a guessed N elements and resize in the common 'times two' idiom as needed, and shrink at the end. If you know the size, well then you have no problem.
The STL has the so-called generic containers and algorithms, but it does not do linear algebra. You need to figure out what you need most, and plan your implementation accordingly.
I am not sure that I understood what you want to do,
but if you want to append an element at the end of your vector,
you can do it like this:
int sz = yourvector.size();
yourvector.resize(sz+1);
yourvector(sz) = element;
