Conditional subset array between ranges - julia

I wish to filter data between a specific range.
dummy = [1,2,3,4,5,6,7,8,9,10]
This works for a single condition:
dummy[dummy .> 4]
If I try to set a range:
dummy[dummy .> 4 & dummy .< 7]
This doesn't produce the expected output, i.e. the values > 4 and < 7.
This did the trick:
dummy[(dummy .> 4) .& (dummy .< 7)]

Indexing by a boolean array, either dummy[(4 .< dummy) .& (dummy .< 7)] or dummy[4 .< dummy .< 7] should work; the parentheses in the first snippet are required due to operator precedence. For additional clarity with larger filters, the generation of the boolean array can be vectorized using the @. macro:
dummy[@. 4 < dummy < 7]
Note that filtering using boolean arrays will allocate memory for the intermediate array; thus, the filter/filter! functions may come in handy. Both of the following calls are equivalent, with the latter improving readability for larger conditions.
filter(x -> 4 < x < 7, dummy)
filter(dummy) do x
4 < x < 7
end
The filter! function may be used in place of filter if mutation of the existing array is acceptable.
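As a quick check of the approaches above, here is a minimal sketch on the question's dummy vector. The names mask, by_mask, by_macro, by_filter and work are only illustrative. It also hints at why the unparenthesized attempt fails: in Julia, & binds more tightly than the comparison operators, so dummy .> 4 & dummy .< 7 is parsed as the chained comparison dummy .> (4 & dummy) .< 7.

```julia
dummy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Boolean-mask indexing: the mask is a temporary BitVector.
mask = (dummy .> 4) .& (dummy .< 7)
by_mask = dummy[mask]

# Same mask, written as a chained comparison broadcast with @.
by_macro = dummy[@. 4 < dummy < 7]

# filter builds the result directly, without an intermediate mask.
by_filter = filter(x -> 4 < x < 7, dummy)

# filter! mutates its argument, so work on a copy to keep dummy intact.
work = copy(dummy)
filter!(x -> 4 < x < 7, work)

println(by_mask)
```

All four results should hold the same two values, and dummy itself is left unchanged because filter! was applied to the copy.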

Related

How to delete an element from a list in Julia?

v = range(1e10, -1e10, step=-1e8) # velocities [cm/s]
deleteat!(v, findall(x->x==0,v))
I want to delete the value 0 from v. Following this tutorial, I tried deleteat! but I get the error
MethodError: no method matching deleteat!(::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, ::Vector{Int64})
What am I missing here?
Notice the type that is returned by the function range.
typeof(range(1e10, -1e10, step=-1e8))
The above yields
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}
Calling the help function for deleteat! shows:
? deleteat!()
deleteat!(a::Vector, inds)
Remove the items at the indices given by inds, and return the modified a. Subsequent items are shifted to fill the resulting gap.
inds can be either an iterator or a collection of sorted and unique integer indices, or a boolean vector of the same length as a with true indicating entries to delete.
We can convert the returned type of range using collect. Try the following code.
v = collect(range(1e10, -1e10, step=-1e8))
deleteat!(v,findall(x->x==0,v))
Notice that we can shorten x->x==0 to iszero, which gives
v = collect(range(1e10, -1e10, step=-1e8))
deleteat!(v,findall(iszero,v))
Use filter! or filter:
julia> filter!(!=(0), [1,0,2,0,4])
3-element Vector{Int64}:
1
2
4
In case of a range you can collect it or use:
julia> filter(!=(0), range(2, -2, step=-1))
4-element Vector{Int64}:
2
1
-1
-2
However, for big ranges you might not want to materialize them at all, to keep the memory footprint small. In that case you could use a generator (note != rather than !==, so that a floating-point 0.0 is also excluded):
(x for x in range(2, -2, step=-1) if x != 0)
To see what is being generated you need to collect it:
julia> collect(x for x in range(2, -2, step=-1) if x != 0)
4-element Vector{Int64}:
2
1
-1
-2
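The answers above can be put side by side in a short sketch. It uses the small integer range from the second answer rather than the question's velocity range; the names r, v, kept and lazy are illustrative only.

```julia
r = range(2, -2, step=-1)          # a lazy, immutable range: deleteat! has no method for it

# Option 1: materialize the range, then delete in place.
v = collect(r)
deleteat!(v, findall(iszero, v))

# Option 2: filter accepts the range directly and returns a new Vector.
kept = filter(!=(0), r)

# Option 3: a generator, which stays lazy until it is collected.
lazy = (x for x in r if x != 0)

println(v)
println(collect(lazy))
```

The first two options allocate a full vector; the generator only produces values as they are consumed.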

Creating subset using or statement

I have a data frame with 6 columns and thousands of rows containing share transactions. I want to identify rows with bad price data. The following function gives me a subset with the rows with good price data:
function in_price_range(df)
price_good = subset(df, :UnitPrice => X-> (trough_share_price .<= X .<= peak_share_price), skipmissing=true)
return price_good
end
For a subset for bad data I tried:
function out_price_range(df)
price_discrepancy = subset(df, :UnitPrice => X-> (X .< trough_share_price || X .> peak_share_price), skipmissing=true)
return price_discrepancy
end
However, that gives the error TypeError: non-boolean (BitVector) used in boolean context
I tried .|| rather than ||, but that then gives the error: syntax: "|" is not a unary operator
How do I fix the code?
In Julia, || is the short-circuiting OR:
help?> ||
search: ||
x || y
Short-circuiting boolean OR.
The short-circuiting part means that if x is true, || will not even bother to evaluate y. In other words, this will make a branch in the code. For example:
julia> 5 < 7 || print("This is unreachable")
true
This is great if you want to write code that is efficient for a case like
if something_easy_to_evaluate || something_costly_to_evaluate
# Do something
end
In other words, this is control flow! On Julia versions before 1.7 it cannot be broadcast (1.7 and later accept .|| and .&&). What you want here is the regular or operator |, which you can broadcast with .|. So for example:
julia> a = rand(3) .< 0.5
3-element BitVector:
1
0
0
julia> b = rand(3) .< 0.5
3-element BitVector:
0
1
0
julia> a .|| b
ERROR: syntax: "|" is not a unary operator
Stacktrace:
[1] top-level scope
# none:1
julia> a .| b
3-element BitVector:
1
1
0
The same applies to && vs &; the former is only used for control-flow, the latter is normal bitwise and.
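Applied to the question, a minimal sketch on a plain vector looks like this. The prices and the two thresholds are made-up values for illustration, and a plain Vector stands in for the :UnitPrice column so that no package is needed. The parentheses matter: | binds more tightly than the comparison operators.

```julia
prices = [9.5, 10.2, 15.0, 8.1, 12.7]   # hypothetical unit prices
trough_share_price = 9.0                # hypothetical lower bound
peak_share_price = 13.0                 # hypothetical upper bound

# Elementwise OR of two broadcast comparisons gives a BitVector mask.
bad = (prices .< trough_share_price) .| (prices .> peak_share_price)

bad_prices = prices[bad]
good_prices = prices[.!bad]

println(bad_prices)

# With DataFrames.jl, the same expression would go into the question's call, e.g.
# subset(df, :UnitPrice => X -> (X .< trough_share_price) .| (X .> peak_share_price), skipmissing=true)
```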

How to convert a multidimensional array to/from vector of vector of ... vector in julia

Is there a method in Julia to convert a multidimensional array to a vector of vectors and so on, and vice versa? It is OK to define a method for a fixed number of dimensions. But how about a method for arbitrary dims?
julia> s = (1,2,3)
julia> a = reshape(1:prod(s), s)
1×2×3 Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}:
[:, :, 1] =
1 2
[:, :, 2] =
3 4
[:, :, 3] =
5 6
julia> b = [[[a[i,j,k] for i=1:s[1]] for j=1:s[2]] for k=1:s[3]]
3-element Array{Array{Array{Int64,1},1},1}:
Array{Int64,1}[[1], [2]]
Array{Int64,1}[[3], [4]]
Array{Int64,1}[[5], [6]]
julia> unstack(a) == b
ERROR: UndefVarError: unstack not defined
RecursiveArrayTools.jl can help with this kind of work.
using RecursiveArrayTools
recs = [rand(8) for i in 1:10]
A = VectorOfArray(recs)
A[i] # Returns the ith array in the vector of arrays
A[j,i] # Returns the jth component in the ith array
A[j1,...,jN,i] # Returns the (j1,...,jN) component of the ith array
So it acts like the matrix without ever building the matrix, which is a good way to save allocations if you tend to act on the columns (which are the separate arrays). It also has a fast conversion to a contiguous array via the indexing fallback (honestly, I tried to create a faster one but the fallback worked better than I could make it):
arr = convert(Array,A)
Converting back would require allocating, of course:
VA = VectorOfArray([arr[:,i] for i in 1:size(arr,2)])
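If adding a package is not an option, the conversion asked about in the question can also be sketched with Base alone for an arbitrary number of dimensions. unstack_nested and restack below are hypothetical helper names, not functions from Base or RecursiveArrayTools.jl. The sketch assumes a Julia version that provides eachslice (1.1 or later) and non-empty inputs.

```julia
# Hypothetical helper: peel off the last dimension recursively until only vectors remain.
unstack_nested(a::AbstractVector) = collect(a)
unstack_nested(a::AbstractArray) =
    [unstack_nested(s) for s in eachslice(a; dims=ndims(a))]

# Hypothetical inverse: rebuild the array by concatenating along a new trailing dimension.
# Assumes v is non-empty and all nested vectors have consistent lengths.
function restack(v::AbstractVector)
    first(v) isa AbstractArray || return collect(v)
    parts = map(restack, v)
    return cat(parts...; dims=ndims(first(parts)) + 1)
end

s = (1, 2, 3)
a = reshape(1:prod(s), s)
b = [[[a[i, j, k] for i in 1:s[1]] for j in 1:s[2]] for k in 1:s[3]]

nested = unstack_nested(a)
println(nested == b)
println(restack(nested) == a)
```

Both directions copy the data; the VectorOfArray wrapper above is the better fit when avoiding allocations matters.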

Transform nested array into new dimension

Given an array as follows:
A = Array{Array{Int}}(undef, 2, 2) # Julia 1.0+ needs undef; older versions wrote Array{Array{Int}}(2,2)
A[1,1] = [1,2]
A[1,2] = [3,4]
A[2,1] = [5,6]
A[2,2] = [7,8]
We then have that A is a 2x2 array with elements of type Array{Int}:
2×2 Array{Array{Int64,N} where N,2}:
[1, 2] [3, 4]
[5, 6] [7, 8]
It is possible to access the entries with e.g. A[1,2] but A[1,2,2] would not work since the third dimension is not present in A. However, A[1,2][2] works, since A[1,2] returns an array of length 2.
The question is then, what is a nice way to convert A into a 3-dimensional array, B, so that B[i,j,k] refers to the i,j-th array and the k-th element in that array. E.g. B[2,1,2] = 6.
There is a straightforward way to do this using 3 nested loops and reconstructing the array, element-by-element, but I'm hoping there is a nicer construction. (Some application of cat perhaps?)
You can construct a 3-d array from A using an array comprehension
julia> B = [ A[i,j][k] for i=1:2, j=1:2, k=1:2 ]
2×2×2 Array{Int64,3}:
[:, :, 1] =
1 3
5 7
[:, :, 2] =
2 4
6 8
julia> B[2,1,2]
6
However a more general solution would be to overload the getindex function for arrays with the same type of A. This is more efficient since there is no need to copy the original data.
julia> import Base.getindex
julia> getindex(A::Array{Array{Int}}, i::Int, j::Int, k::Int) = A[i,j][k]
getindex (generic function with 179 methods)
julia> A[2,1,2]
6
With thanks to Dan Getz's comments, I think the following works well and is succinct:
cat(3,(getindex.(A,i) for i=1:2)...)
where 2 is the length of the nested array. It would also work for higher dimensions.
permutedims(reshape(collect(Base.Iterators.flatten(A)), (2,2,2)), (2,3,1))
also does the job and appears to be faster than the accepted cat() answer for me.
EDIT: I'm sorry, I just saw that this has already been suggested in the comments.
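The snippets in this thread were written for an older Julia. As a sketch under that assumption, the same three constructions on Julia 1.x look as follows; the main differences are the undef argument when allocating A and the dims keyword of cat. The names B1, B2 and B3 are illustrative.

```julia
A = Matrix{Vector{Int}}(undef, 2, 2)
A[1, 1] = [1, 2]
A[1, 2] = [3, 4]
A[2, 1] = [5, 6]
A[2, 2] = [7, 8]

# Array comprehension over all three indices.
B1 = [A[i, j][k] for i in 1:2, j in 1:2, k in 1:2]

# One 2×2 slice per inner position k, concatenated along the third dimension.
B2 = cat((getindex.(A, k) for k in 1:2)...; dims=3)

# Flatten (inner index varies fastest), reshape, then move that index to the last position.
B3 = permutedims(reshape(collect(Iterators.flatten(A)), (2, 2, 2)), (2, 3, 1))

println(B1[2, 1, 2])
```

All three should agree, so the choice mostly comes down to readability and measured performance on the real data.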

List comprehensions and tuples in Julia

I am trying to do in Julia what this Python code does. (Find all pairs from the two lists whose combined value is above 7.)
#Python
def sum_is_large(a, b):
return a + b > 7
l1 = [1,2,3]
l2 = [4,5,6]
l3 = [(a,b) for a in l1 for b in l2 if sum_is_large(a, b)]
print(l3)
There is no if for list comprehensions in Julia. And if I use filter(), I'm not sure if I can pass two arguments. So my best suggestion is this:
#Julia
function sum_is_large(pair)
a, b = pair
return a + b > 7
end
l1 = [1,2,3]
l2 = [4,5,6]
l3 = filter(sum_is_large, [(i,j) for i in l1, j in l2])
print(l3)
I don't find this very appealing. So my question is, is there a better way in Julia?
Using the very popular package Iterators.jl, in Julia:
using Iterators # install using Pkg.add("Iterators")
filter(x->sum(x)>7,product(l1,l2))
is an iterator producing the pairs. So to get the same printout as the OP:
l3iter = filter(x->sum(x)>7,product(l1,l2))
for p in l3iter println(p); end
The iterator approach is potentially much more memory efficient. Of course, one could just call l3 = collect(l3iter) to get the pair vector.
@user2317519, just curious, is there an equivalent iterator form for Python?
Guards (if) are now available in Julia v0.5 (currently in the release-candidate stage):
julia> v1 = [1, 2, 3];
julia> v2 = [4, 5, 6];
julia> v3 = [(a, b) for a in v1, b in v2 if a+b > 7]
3-element Array{Tuple{Int64,Int64},1}:
(3,5)
(2,6)
(3,6)
Note that generators are also now available:
julia> g = ( (a, b) for a in v1, b in v2 if a+b > 7 )
Base.Generator{Filter{##18#20,Base.Prod2{Array{Int64,1},Array{Int64,1}}},##17#19}(#17,Filter{##18#20,Base.Prod2{Array{Int64,1},Array{Int64,1}}}(#18,Base.Prod2{Array{Int64,1},Array{Int64,1}}([1,2,3],[4,5,6])))
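For comparison, here is a minimal sketch that should run on current Julia without any package, since product and filter are available in Base.Iterators. The two-argument sum_is_large mirrors the Python version from the question; by_columns, python_order and lazy_pairs are illustrative names.

```julia
sum_is_large(a, b) = a + b > 7

l1 = [1, 2, 3]
l2 = [4, 5, 6]

# Comma form: iterates the product with the first variable varying fastest.
by_columns = [(a, b) for a in l1, b in l2 if sum_is_large(a, b)]

# Nested-for form: same loop order as the Python comprehension.
python_order = [(a, b) for a in l1 for b in l2 if sum_is_large(a, b)]

# Lazy variant: nothing is computed until the iterator is consumed.
lazy_pairs = Iterators.filter(p -> sum_is_large(p...), Iterators.product(l1, l2))

println(python_order)
println(collect(lazy_pairs))
```

The two comprehension forms return the same pairs in a different order, which only matters if later code depends on the ordering.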
Another option, similar to the one by @DanGetz and also using Iterators.jl:
function expensive_fun(a, b)
return (a + b)
end
Then, if the condition is also complicated, it can be defined as a function:
condition(x) = x > 7
And last, filter the results:
>>> using Iterators
>>> result = filter(condition, imap(expensive_fun, l1, l2))
result is an iterable that is only computed when needed (inexpensive) and can be materialized with collect(result) if required.
If the filter condition is simple enough, the one-liner would be:
>>> result = filter(x->(x > 7), imap(expensive_fun, l1, l2))
Note: imap works natively for an arbitrary number of parameters.
Perhaps something like this:
julia> filter(pair -> pair[1] + pair[2] > 7, [(i, j) for i in l1, j in l2])
3-element Array{Tuple{Any,Any},1}:
(3,5)
(2,6)
(3,6)
although I'd agree it doesn't look like it ought to be the best way...
I'm surprised nobody mentions the ternary operator to implement the conditional:
julia> l3 = [sum_is_large((i,j)) ? (i,j) : nothing for i in l1, j in l2]
3x3 Array{Tuple,2}:
nothing nothing nothing
nothing nothing (2,6)
nothing (3,5) (3,6)
or even just a normal if block within a compound statement, i.e.
[ (if sum_is_large((x,y)); (x,y); end) for x in l1, y in l2 ]
which gives the same result.
I feel this result makes a lot more sense than filter(), because in julia the a in A, b in B construct is interpreted dimensionally, and therefore the output is in fact an "array comprehension" with appropriate dimensionality, which clearly in many cases would be advantageous and presumably the desired behaviour (whether we include a conditional or not).
Whereas filter will always return a vector. Obviously, if you really want a vector result you can always collect the result; or for a conditional list comprehension like the one here, you can simply remove nothing elements from the array by doing l3 = l3[l3 .!= nothing].
Presumably this is still clearer and no less efficient than the filter() approach.
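A minimal sketch of this idea on current Julia, assuming the same l1 and l2 as in the question; grid and kept are illustrative names, and isnothing requires Julia 1.1 or later.

```julia
l1 = [1, 2, 3]
l2 = [4, 5, 6]

# The comprehension keeps its 3×3 shape; non-matching cells hold nothing.
grid = [a + b > 7 ? (a, b) : nothing for a in l1, b in l2]

# Flatten in column-major order and drop the placeholders.
kept = filter(!isnothing, vec(grid))

println(size(grid))
println(kept)
```

Keeping the grid is useful when the position of each pair matters; otherwise a guarded comprehension avoids the placeholder cells altogether.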
You can use the @vcomp (vector comprehension) macro in VectorizedRoutines.jl to do Python-like comprehensions:
using VectorizedRoutines
Python.@vcomp Int[i^2 for i in 1:10] when i % 2 == 0 # Int[4, 16, 36, 64, 100]
