Julia: Finding values smaller than 0 in a vector with missing values

I'm fairly new to Julia and, as a Matlab/R user, I find it really nice to work with for the most part.
However, I'm a little confused by missing values and how to work with them.
Let's say I have a vector:
a=[missing -1 2 3 -12] #Julia
a=[NaN -1 2 3 -12] #Matlab
In Matlab I would just do the following to find the values below 0:
a(a<0)
which gives me
-1 -12
Unfortunately, the same doesn't work in Julia. When I try
a[a.<0]
I just get the following error:
ERROR: ArgumentError: unable to check bounds for indices of type Missing
I also tried the following
a[findall(skipmissing(a).<0)]
which gives me
missing
3
This happens because skipmissing drops the missing value, so the positions returned by findall are shifted relative to the original vector. I'm pretty sure there is an easy and logical way to do this, but I can't seem to find it.
Can someone please show me the way?
Best,
Richard

Here is the simplest way to do it:
julia> a=[missing -1 2 3 -12]
1×5 Array{Union{Missing, Int64},2}:
missing -1 2 3 -12
julia> a[isless.(a, 0)]
2-element Array{Union{Missing, Int64},1}:
-1
-12
This uses the fact that missing is considered larger than any number by isless.
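A small sketch of the behaviour this relies on, using only Base. It uses a plain Vector `[missing, -1, 2, 3, -12]` instead of the 1×5 matrix above; the selected values are the same. The point is that `<` propagates missing, while `isless` always returns a Bool.

```julia
# `<` follows three-valued logic: comparing with missing gives missing, not a Bool
lt_result = missing < 0

# `isless` is a total order in which missing sorts after every number,
# so it always returns a Bool
missing_first = isless(missing, 0)   # false: missing is not "less" than 0
number_first  = isless(-1, missing)  # true: any number is less than missing

# Broadcasting isless therefore gives a Bool mask that is safe for indexing
a = [missing, -1, 2, 3, -12]
mask = isless.(a, 0)                 # false, true, false, false, true
negatives = a[mask]                  # -1 and -12
```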
Another way to write it:
julia> filter(x -> isless(x, 0), a)
2-element Array{Union{Missing, Int64},1}:
-1
-12
To avoid relying on this special property of isless, you can use coalesce instead (coalesce is a general tool for handling missing values safely):
julia> a[coalesce.(a .< 0, false)]
2-element Array{Union{Missing, Int64},1}:
-1
-12
or
julia> filter(x -> coalesce(x < 0, false), a)
2-element Array{Union{Missing, Int64},1}:
-1
-12
Finally, you can be more explicit:
julia> filter(x -> !ismissing(x) && x < 0, a)
2-element Array{Union{Missing, Int64},1}:
-1
-12
or
julia> [v for v in a if !ismissing(v) && v < 0]
2-element Array{Int64,1}:
-1
-12
(you could also use comprehension syntax in the examples above)
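For completeness, a sketch of why the findall attempt in the question went wrong and how to keep the indices aligned with the original vector. `isnegative` is a hypothetical helper name, and the Vector form of `a` is used again:

```julia
a = [missing, -1, 2, 3, -12]

# The question's attempt: positions are counted in the *skipped* sequence,
# so they do not line up with `a`
shifted = findall(x -> x < 0, collect(skipmissing(a)))   # 1 and 4

# Hypothetical helper: treat missing as "not negative" instead of propagating it
isnegative(x) = coalesce(x < 0, false)

# findall on the original vector returns indices that match `a`
idx = findall(isnegative, a)     # 2 and 5
vals = a[idx]                    # -1 and -12

# Same values as the filter-based answer above
same_as_filter = vals == filter(isnegative, a)
```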

Related

Filtering using conditional evaluation (&&)

I am having trouble filtering a simple array based on two conditions. For example, to keep only the values strictly between 3 and 5, I tried the following, but I get the error ERROR: TypeError: non-boolean (BitArray{1}) used in boolean context.
arr = Array{Int64}([1,2,3,4,5,6])
arr[(arr .> 3) && (arr.< 5)]
Any idea how to solve it?
Also, as a side note: is there a function that is the opposite of isless, i.e. one that checks whether a value is greater than a certain value?
Here are two ways to do it:
julia> arr = [1,2,3,4,5,6]
6-element Array{Int64,1}:
1
2
3
4
5
6
julia> arr[(arr .> 3) .& (arr.< 5)]
1-element Array{Int64,1}:
4
julia> filter(v -> 3 < v < 5, arr)
1-element Array{Int64,1}:
4
(I personally prefer filter).
To get the opposite of isless just reverse its arguments, or if needed define a new function:
isgreater(x, y) = isless(y, x)
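A sketch of both fixes in Julia 1.x: `&&` needs a single Bool, so array conditions are combined element-wise with `.&` (the `@.` macro adds the dots for you), and `isgreater` is the user-defined one-liner from above, not an exported Base function.

```julia
arr = [1, 2, 3, 4, 5, 6]

# Element-wise AND of two Bool masks; `&&` would throw a TypeError here
mask = (arr .> 3) .& (arr .< 5)
picked = arr[mask]                           # 4

# `@.` dots every operator, producing the same mask
picked_macro = arr[@.((arr > 3) & (arr < 5))]

# User-defined "opposite" of isless: just swap the arguments
isgreater(x, y) = isless(y, x)
above3 = filter(v -> isgreater(v, 3), arr)   # 4, 5, 6
```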
I prefer a set comparison approach because it's quite intuitive:
julia> arr = Array{Int64}([1,2,3,4,5,6])
julia> intersect( arr[ arr .> 1 ], arr[ arr .< 4 ] )
2-element Array{Int64,1}:
2
3
Or list comprehension:
[ x for x in arr if 3 < x < 5]
# or
[ x for x in arr if 3 < x && x < 5]
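One caveat with the set approach: intersect returns each common value only once, so repeated values in the input are lost, whereas filter and comprehensions keep them. A small sketch with a hypothetical input containing a duplicate:

```julia
arr = [2, 2, 3, 5]

# Set semantics: each value that passes both conditions appears once
via_sets = intersect(arr[arr .> 1], arr[arr .< 4])   # 2, 3

# Filtering and comprehensions keep duplicates and the original order
via_filter = filter(x -> 1 < x < 4, arr)            # 2, 2, 3
via_comp = [x for x in arr if 1 < x < 4]            # 2, 2, 3
```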
Also, to define an array literal with a specific element type such as Int64, there is a dedicated, simpler syntax:
arr = Int64[1,2,3,4,5,6]
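A small sketch of what the typed literal does: the values are converted to the requested element type, and a Union element type leaves room for other kinds of values later.

```julia
a = Int64[1, 2, 3]                 # same result as Array{Int64}([1, 2, 3])
b = Float64[1, 2, 3]               # integers are converted to 1.0, 2.0, 3.0
c = Union{Int, Missing}[1, 2, 3]   # element type that also admits missing

push!(c, missing)                  # allowed because of the Union eltype
```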

Create a Vector of Integers and missing Values

What a hassle...
I'm trying to create a vector of integers and missing values. This works fine:
b = [4, missing, missing, 3]
But I would like the vector to be longer, with many more missing values, so I tried repeat(), but this doesn't work:
append!([1,2,3], repeat([missing], 1000))
and this doesn't work either:
[1,2,3, repeat([missing], 1000)]
Please help me out here.
It is also worth noting that if you do not need an in-place operation with append!, it is much easier to use vertical concatenation in such cases:
julia> [[1, 2, 3]; repeat([missing], 2); 4; 5] # note ; that denotes vcat
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
julia> vcat([1,2,3], repeat([missing], 2), 4, 5) # this is the same but using a different syntax
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
The benefit of vcat is that it performs type promotion automatically (as opposed to append!, where you have to specify the correct eltype of the target container before the operation).
Note that because vcat performs automatic type promotion, in corner cases the result can have a different eltype than with append!:
julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> append!(x, [1.0, 2.0]) # conversion from Float64 to Int happens here
5-element Array{Int64,1}:
1
2
3
1
2
julia> [[1, 2, 3]; [1.0, 2.0]] # promotion of Int to Float64 happens in this case
5-element Array{Float64,1}:
1.0
2.0
3.0
1.0
2.0
See also https://docs.julialang.org/en/v1/manual/arrays/#man-array-literals.
This will work:
append!(Union{Int,Missing}[1,2,3], repeat([missing], 1000))
[1,2,3] creates a Vector{Int}, and since Julia is strongly typed, a Vector{Int} cannot hold values that cannot be converted to Int, such as missing. Hence, when you create a container that should hold values of several types, you need to state its element type explicitly; here we have defined a Vector{Union{Int,Missing}}.
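Putting the answers together, a sketch of three ways to build such a vector in Julia 1.x. It uses `fill(missing, n)`, which gives the same elements as `repeat([missing], n)`, and `n = 1000` as in the question:

```julia
n = 1000

# 1. vcat promotes the element type to Union{Missing, Int} automatically
v1 = vcat([1, 2, 3], fill(missing, n))

# 2. append! needs a target whose eltype already allows missing
v2 = append!(Union{Int, Missing}[1, 2, 3], fill(missing, n))

# 3. Preallocate a vector full of missing, then overwrite the first entries
v3 = Vector{Union{Int, Missing}}(missing, n + 3)
v3[1:3] = [1, 2, 3]
```

All three give the same contents; `isequal` (rather than `==`) is the right way to compare them, because `missing == missing` is itself missing.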

Find function in Julia 1.0.2

I am transitioning to Julia 1.0.2 and I realized that the find function is not defined. In a previous version (Julia 0.6) I could write
find(x -> x<0, my_var)
In order to get the indices of the negative elements of the array called my_var. When I run the same code in Julia 1.0.2, I get the following error:
UndefVarError: find not defined
I couldn't find whether the find function is implemented under a different name or if it has been dropped. Is there any Julia 1.0.2 function that would be equivalent to the find function in previous Julia versions?
Use filter():
julia> filter(x -> x<0, -5:5)
5-element Array{Int64,1}:
-5
-4
-3
-2
-1
Another option is to use findall() to get the indices of elements:
julia> indices = findall(x -> x<0, -5:5)
5-element Array{Int64,1}:
1
2
3
4
5
You can use getindex() to get the actual values, e.g.:
julia> getindex(-5:5,indices)
5-element Array{Int64,1}:
-5
-4
-3
-2
-1
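A sketch of how the Julia 1.x pieces fit together: indexing with the result of findall gives the same values as filter, and findall also accepts a Bool mask directly (for example one built by broadcasting).

```julia
r = -5:5

idx = findall(x -> x < 0, r)       # indices 1 through 5
vals = r[idx]                      # same as getindex(r, idx)

# findall on a Bool mask built by broadcasting
idx_from_mask = findall(r .< 0)

# Logical indexing skips the index step entirely
vals_masked = r[r .< 0]
```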

How do you select a subset of an array based on a condition in Julia

How do you simply select a subset of an array based on a condition? I know Julia doesn't use vectorization, but there must be a simple way of doing the following without an ugly multi-line for loop:
julia> map([1,2,3,4]) do x
return x % 2 == 0 ? x : nothing
end
4-element Array{Union{Nothing, Int64},1}:
nothing
2
nothing
4
Desired output:
[2, 4]
Observed output:
[nothing, 2, nothing, 4]
You are looking for filter
http://docs.julialang.org/en/release-0.4/stdlib/collections/#Base.filter
Here is an example:
filter(x->x%2==0,[1,2,3,5]) # returns [2]
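A sketch of the question's example in Julia 1.x, using the Base predicate `iseven`. The last two lines show how the original map result could be cleaned up afterwards, which is less direct than filtering in the first place; `isnothing` needs Julia 1.1 or later.

```julia
xs = [1, 2, 3, 4]

evens_filter = filter(iseven, xs)              # 2, 4
evens_comp = [x for x in xs if x % 2 == 0]     # 2, 4
evens_mask = xs[iseven.(xs)]                   # 2, 4

# Cleaning up the map attempt from the question by dropping the `nothing`s
mapped = map(x -> x % 2 == 0 ? x : nothing, xs)
evens_map = filter(!isnothing, mapped)
```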
There are element-wise operators (beginning with a "."):
julia> [1,2,3,4] .% 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
julia> x .% 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x[x .% 2 .== 0]
2-element Array{Int64,1}:
2
4
julia> x .% 2
4-element Array{Int64,1}:
1
0
1
0
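In Julia 1.x a scalar operator such as `%` no longer applies to a whole array, so every operator in the condition needs its own dot. A sketch with explicit dots and with the `@.` macro, which inserts them for you:

```julia
x = [1, 2, 3, 4]

mask = x .% 2 .== 0              # false, true, false, true
evens = x[mask]                  # 2, 4

# @. adds a dot to every operator in the expression
evens_macro = x[@.(x % 2 == 0)]
```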
You can use the findall() function (called find() before Julia 0.7) or logical indexing with the .== syntax to accomplish this. E.g.:
julia> x = collect(1:4)
4-element Array{Int64,1}:
1
2
3
4
julia> y = x[findall(x .% 2 .== 0)]
2-element Array{Int64,1}:
2
4
julia> y = x[x .% 2 .== 0] ## more concise and slightly quicker
2-element Array{Int64,1}:
2
4
Note the dotted .% and .== syntax for the element-wise operations. Also, note that findall() returns the indices that match the criteria. In this case, the matching indices happen to equal the matching array elements, because x is 1:4. In general, though, we put the findall() call inside the indexing brackets to select those positions from the original array x.
Update: Good point @Lutfullah Tomak about the filter() function. I believe, though, that find() (now findall()) can be quicker and more memory efficient (though I understand that anonymous functions are supposed to get better in version 0.5, so this might change). In my trial, on a pre-0.5 Julia version, I got:
x = collect(1:100000000);
@time y1 = filter(x->x%2==0,x);
# 9.526485 seconds (100.00 M allocations: 1.554 GB, 2.76% gc time)
@time y2 = x[findall(x .% 2 .== 0)];
# 3.187476 seconds (48.85 k allocations: 1.504 GB, 4.89% gc time)
@time y3 = x[x .% 2 .== 0];
# 2.570451 seconds (57.98 k allocations: 1.131 GB, 4.17% gc time)
Update 2: Commenters made good points that x[x .% 2 .== 0] is faster than x[findall(x .% 2 .== 0)].
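The timings above come from an old Julia version and will differ today. A sketch for re-running the comparison on a current version with Base's `@time`, on a smaller input than above (an arbitrary choice). The first run of each line includes compilation time, so run it twice, or use a dedicated package such as BenchmarkTools.jl for reliable numbers.

```julia
x = collect(1:1_000_000)

# @time prints timing and allocation info and returns the expression's value
y1 = @time filter(v -> v % 2 == 0, x)
y2 = @time x[findall(x .% 2 .== 0)]
y3 = @time x[x .% 2 .== 0]
```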
Another updated version:
v[v .% 2 .== 0]
In Julia 1.0 and later, the broadcasting dot is required before both % and ==; without it, v % 2 throws a MethodError.

Location of minimum in Julia

Does Julia have a built-in command to find the index of the minimum of a vector? R, for example, has a which.min command (and a which.max, of course).
Obviously, I could write the following myself, but it would be nice not to have to.
function whichmin( x::Vector )
i = 1
min_x=minimum(x)
while( x[i] > min_x )
i+=1
end
return i
end
Apologies if this has been asked before, but I couldn't find it. Thanks!
Since 0.7-alpha, indmin and indmax are deprecated.
Use argmin and argmax instead.
For a vector, they simply return the linear index:
julia> x = rand(1:9, 4)
4-element Array{Int64,1}:
9
5
8
5
julia> argmin(x)
2
julia> argmax(x)
1
If you are looking for both the index and the value, use findmin and findmax.
For multidimensional arrays, all these functions return a CartesianIndex.
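A sketch of how the four functions relate, using a fixed copy of the vector above instead of random numbers:

```julia
x = [9, 5, 8, 5]

i = argmin(x)                  # 2: the first position of the minimum
j = argmax(x)                  # 1

# findmin / findmax return (value, index) tuples
minval, minpos = findmin(x)    # (5, 2)
maxval, maxpos = findmax(x)    # (9, 1)
```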
I believe indmax(itr) does what you want (in Julia 0.7 and later it is called argmax). From the Julia documentation:
indmax(itr) → Integer
Returns the index of the maximum element in a collection.
And here's an example of it in use:
julia> x = [8, -4, 3.5]
julia> indmax(x)
1
There's also findmax, that returns both the maximum value and its position.
For multidimensional arrays, you have to convert between linear indices and multidimensional indices (indmin and ind2sub exist only before Julia 1.0):
x = rand(1:9, 2,3)
# 2×3 Array{Int64,2}:
# 5 1 9
# 3 3 8
indmin(x)
# 3
# => third element in the column-major ordered array (value=1)
ind2sub(size(x),indmin(x))
# (1, 2)
# => (row,col) indexes: what you are looking for.
-- Maurice
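In Julia 1.0 and later, indmin and ind2sub no longer exist. A sketch of the same example with their replacements: argmin returns a CartesianIndex directly, and LinearIndices and CartesianIndices convert between the two index styles.

```julia
x = [5 1 9;
     3 3 8]

ci = argmin(x)                     # CartesianIndex(1, 2)
row, col = Tuple(ci)               # (1, 2): the (row, col) pair

lin = LinearIndices(x)[ci]         # 3: position in column-major order
back = CartesianIndices(x)[lin]    # CartesianIndex(1, 2), as ind2sub did
```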
