select string array values based on another array - julia

I can select data that is equal to a value with
data = rand(1:3, 10)
value = 2
data .== value
or equal to a list of values with
values = [1, 2]
in.(data, (values,))
The last one is generic and also works for a scalar: in.(data, (value,)).
This works for Int values, but the generic form does not work for String values:
data = rand(["A", "B", "C"], 10)
value = "B"
data .== value
values = ["A","B"]
in.(data, (values, ))
in.(data, (value, ))
ERROR: use occursin(x, y) for string containment
Is there a generic way for Strings?
For a generic val input I'm now writing the following, but I feel there must be a better solution.
isa(val, AbstractArray) ? in.(data, (val,)) : data .== val
Background: I'm creating a function to select rows from a dataframe (and do something with them) but I want to allow for both a list of values as well as a single value.

Here is a trick that is worth knowing:
[x;]
Now - if x is an array it will remain an array. If x is a scalar it will become a 1-element array. And this is exactly what you need.
So you can write
in.(data, ([val;],))
The drawback is that it allocates a new array, but I assume val is small and this is not performance-critical code. If the code is performance-critical, I think it is better to handle scalars and arrays in separate branches.
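The separate-branch idea can also be written with multiple dispatch instead of an explicit isa check. The following is only a sketch; the name selectmask is hypothetical and is not part of Base or DataFrames.

```julia
# Hypothetical helper: returns a Boolean mask of the entries of `data` selected by `val`.
# Scalar case: plain elementwise comparison, no temporary array.
selectmask(data::AbstractVector, val) = data .== val
# Collection case: wrap `vals` in a tuple so broadcasting treats it as one argument.
selectmask(data::AbstractVector, vals::AbstractArray) = in.(data, (vals,))

data = ["A", "B", "C", "B"]
mask_scalar = selectmask(data, "B")         # dispatches to the scalar method
mask_list = selectmask(data, ["A", "B"])    # dispatches to the collection method
mask_trick = in.(data, (["B";],))           # the [x;] trick, for comparison
```

Because the scalar method never calls in, a String value is compared as a whole rather than iterated character by character.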

Related

Creating an array of `nothing` of any size in Julia

In Julia it is possible to create arrays of any size using the functions zeros(.) or ones(.). Is there a similar function to create an array that is filled with nothing at initialization but also accepts floats? I mean a function like in this example:
a = array_of_nothing(3)
# a = [nothing,nothing,nothing]
a[1] = 3.14
# a = [3.14,nothing,nothing]
I tried to find information on the internet, but without success... Sorry, I am a beginner in Julia.
The fill function can be used to create arrays of arbitrary values, but it's not so easy to use here, since you want a Vector{Union{Float64, Nothing}}. Two options come to mind:
A comprehension:
a = Union{Float64, Nothing}[nothing for _ in 1:3];
a[2] = 3.14;
julia> a
3-element Array{Union{Nothing, Float64},1}:
nothing
3.14
nothing
Or ordinary array initialization:
a = Vector{Union{Float64, Nothing}}(undef, 3)
fill!(a, nothing)
a[2] = 3.14
It seems that when you do Vector{Union{Float64, Nothing}}(undef, 3) the vector automatically contains nothing, but I wouldn't rely on that, so fill! may be necessary.
I think you are looking for the Base.fill — Function.
fill(x, dims)
This creates an array filled with value x.
println(fill("nothing", (1,3)))
You can also pass a function Foo() like fill(Foo(), dims) which will return an array filled with the result of evaluating Foo() once.
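One caveat with plain fill, shown as a small sketch: the element type is taken from the fill value, so fill(nothing, 3) produces a Vector{Nothing} that cannot hold a float afterwards. Declaring the Union element type first and then filling avoids that. The variable names here are only illustrative.

```julia
# fill infers the element type from the value, so this vector can only hold `nothing`.
only_nothing = fill(nothing, 3)

# Declare the element type up front, then fill in place; fill! returns the array.
a = fill!(Vector{Union{Float64, Nothing}}(undef, 3), nothing)
a[1] = 3.14   # allowed, because Float64 is part of the element type

# Assigning a float to the Vector{Nothing} fails with a MethodError.
failed = try
    only_nothing[1] = 3.14
    false
catch err
    err isa MethodError
end
```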

DataFrame column with nothing values in Julia?

I am trying to understand how DataFrames work in Julia and I am having a rough time.
I usually work with DataFrames in Python, adding a new column at every simulation step and populating each row with values.
For example, I have this DataFrame which contains input Data:
using DataFrames
df = DataFrame( A=Int[], B=Int[] )
push!(df, [1, 10])
push!(df, [2, 20])
push!(df, [3, 30])
Now, let's say that I do calculations based on those A and B columns that generate a third column C with DateTime objects. But DateTime objects are not generated for all rows, they could be null.
How is that use case handled in Julia?
How shall I create the new C column and assign values inside the for r in eachrow(df)?
# Pseudocode of what I intend to do
df[!, :C] .= nothing
for r in eachrow(df)
    if condition
        r.C = mySuperComplexFunctionThatReturnsDateTimeForEachRow()
    else
        r.C = nothing
    end
end
To give runnable and concrete code, let's fake the condition and the function:
using Dates
df[!, :C] .= nothing
for r in eachrow(df)
    if r.A == 2
        r.C = Dates.now()
    else
        r.C = nothing
    end
end
The efficient pattern to do this is:
df.C = f.(df.A, df.B)
where f is a function that takes scalars and calculates an output based on them (i.e. your simulation code) and you pass to it the columns you need to extract from df to perform the calculations. In this way the Julia compiler will be able to generate fast (type-stable) native code.
In your example the function f would be ifelse so you could write:
df.C = ifelse.(df.A .== 2, Dates.now(), nothing)
Also consider if you return nothing or missing (they have a different interpretation in Julia: nothing means absence of a value and missing means that the value is present but is not known; I am not sure which would be better in your case).
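To illustrate the f.(df.A, df.B) pattern without depending on DataFrames, here is a minimal sketch on plain vectors. The function f and the fixed date are made up for the example; with a data frame, the two vectors would be df.A and df.B and the result would be assigned to df.C.

```julia
using Dates

# Hypothetical stand-in for the per-row simulation code:
# returns a DateTime for some rows and `nothing` for the others.
f(a, b) = a == 2 ? DateTime(2020, 1, 1) + Day(b) : nothing

A = [1, 2, 3]
B = [10, 20, 30]

# Broadcasting calls f once per row and collects the results into a new vector
# whose element type can hold both DateTime and nothing.
C = f.(A, B)
```

Replacing nothing with missing in f gives the same pattern with missing values.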
If you initialize the column with df[!, :C] .= nothing it has the element type Nothing. When writing DateTimes to this column, Julia is attempting to convert them to Nothing and fails.
I am not sure if this is the most efficient or recommended solution, but if you initialize the column as a union of DateTime and Nothing
df[!, :C] = Vector{Union{DateTime, Nothing}}(nothing, size(df, 1))
your example should work.

In-place modification/reassignment of vector in Julia without getting copies

Here's some toy code:
mutable struct MyType  # written `type MyType` before Julia 1.0
    x::Int
end
vec = [MyType(1), MyType(2), MyType(3), MyType(4)]
ids = [2, 1, 3, 1]
vec = vec[ids]
julia> vec
4-element Array{MyType,1}:
MyType(2)
MyType(1)
MyType(3)
MyType(1)
That looks fine, except for this behavior:
julia> vec[2].x = 60
60
julia> vec
4-element Array{MyType,1}:
MyType(2)
MyType(60)
MyType(3)
MyType(60)
I want to be able to rearrange the contents of a vector, with the possibility that I eliminate some values and duplicate others. But when I duplicate values, I don't want this copy behavior. Is there an "elegant" way to do this? Something like this works, but yeesh:
vec = [deepcopy(vec[ids[i]]) for i in 1:4]
The issue is that you're creating mutable types, and your vector therefore contains references to the instantiated data - so when you create a vector based on ids, you're creating what amounts to a vector of pointers to the structures. This further means that the elements in the vector with the same id are actually pointers to the same object.
There's no good way to do this without ensuring that your references are different. That either means 1) immutable types, which means you can't reassign x, or 2) copy/deepcopy.
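A small sketch of the aliasing and of the copy-based fix, using a hypothetical Item type in place of MyType. Note the broadcasted deepcopy.(...): each element is copied in its own call, so duplicated ids end up as distinct objects. A single deepcopy of the whole indexed vector would preserve the sharing between duplicates.

```julia
# Hypothetical mutable type standing in for MyType.
mutable struct Item
    x::Int
end

items = [Item(1), Item(2), Item(3), Item(4)]
ids = [2, 1, 3, 1]

# Plain indexing copies references: positions 2 and 4 are the same object.
shared = items[ids]
aliased = shared[2] === shared[4]

# Elementwise deepcopy gives every position its own object.
independent = deepcopy.(items[ids])
independent[2].x = 60
```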

How to know the index of the iterator when using map in Julia

I have an Array of arrays, called y:
y = Vector{Vector{Int64}}(undef, 10)  # written Array(Vector{Int64}, 10) before Julia 1.0
which is basically a list of 10 one-dimensional arrays, each of length 5. Below is an example of how they are initialized:
using StatsBase  # provides sample
for i in 1:10
    y[i] = sample(1:20, 5)
end
Each 1-dimensional array includes 5 randomly sampled integers between 1 to 20.
Right now I am applying a map function that, for each of those 1-dimensional arrays in y, returns the numbers from 1 to 20 that the array does not contain:
map(x->setdiff(1:20, x), y)
However, when the function is applied to y[i], I also want i itself to be excluded from the result of setdiff(1:20, y[i]). In other words, I want a function that works like
setdiff(deleteat!(Vector(1:20),i) ,y[i])
but with map.
Mainly, my question is whether you can access the index inside the map function.
P.S. I know how to do it with comprehensions; I wanted to know if it is possible with map.
comprehension way:
[setdiff(deleteat!(Vector(1:20), index), value) for (index,value) in enumerate(y)]
Like this?
map(x -> setdiff(deleteat!(Vector(1:20), x[1]),x[2]), enumerate(y))
For your example gives this:
[2,3,4,5,7,8,9,10,11,12,13,15,17,19,20]
[1,3,5,6,7,8,9,10,11,13,16,17,18,20]
....
[1,2,4,7,8,10,11,12,13,14,15,16,17,18]
[1,2,3,5,6,8,11,12,13,14,15,16,17,19,20]
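Two variations on the same idea, as a sketch with a small hand-written y so the result is reproducible: the (index, value) pair can be destructured in the anonymous function's argument list, or map can be given the indices and the values as two separate collections. setdiff accepts several collections to remove, so the index is passed as a one-element tuple.

```julia
# Small fixed input instead of random samples, and 1:10 instead of 1:20.
y = [[1, 2, 3], [2, 5, 7], [4, 9, 3]]

# Destructure each (index, value) pair produced by enumerate.
r1 = map(((i, v),) -> setdiff(1:10, v, (i,)), enumerate(y))

# Alternatively, map over the indices and the values in parallel.
r2 = map((i, v) -> setdiff(1:10, v, (i,)), eachindex(y), y)
```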

Julia: append to an empty vector

I would like to create an empty vector and append to it an array in Julia. How do I do that?
x = Vector{Float64}
append!(x, rand(10))
results in
`append!` has no method matching append!(::Type{Array{Float64,1}}, ::Array{Float64,1})
Thanks.
Your variable x does not contain an array but a type.
x = Vector{Float64}
typeof(x) # DataType
You can create an array with Vector{Float64}(undef, n) (written Array(Float64, n) before Julia 1.0; beware, it is uninitialized and contains arbitrary values) or with zeros(Float64, n), where n is the desired size.
Since Float64 is the default, we can leave it out.
Your example becomes:
x = zeros(0)
append!( x, rand(10) )
I am somewhat new to Julia and came across this question after getting a similar error. To answer the original question for Julia version 1.2.0, all that is missing are ():
x = Vector{Float64}()
append!(x, rand(10))
This solution (unlike x=zeros(0)) works for other data types, too. For example, to create an empty vector to store dictionaries use:
d = Vector{Dict}()
push!(d, Dict("a"=>1, "b"=>2))
A note regarding use of push! and append!:
According to the Julia help, push! is used to add individual items to a collection, while append! adds a collection of items to a collection. So, the following pieces of code create the same array:
Push individual items:
a = Vector{Float64}()
push!(a, 1.0)
push!(a, 2.0)
Append items contained in an array:
a = Vector{Float64}()
append!(a, [1.0, 2.0])
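The difference matters when the argument is itself an array. The following sketch (variable names are illustrative) shows that push! always adds exactly one element: on a Vector{Any} the array is stored as a single nested item, while on a Vector{Float64} the call fails because an array cannot be converted to a Float64.

```julia
a = Float64[]
push!(a, 1.0)              # adds one item
append!(a, [2.0, 3.0])     # adds every item of the given collection

nested = Any[]
push!(nested, [1.0, 2.0])  # adds the whole vector as one element

# push! of a vector into a Vector{Float64} raises a MethodError.
threw = try
    push!(Float64[], [1.0, 2.0])
    false
catch err
    err isa MethodError
end
```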
You can initialize an empty Vector of any type by typing the type in front of []. Like:
Float64[] # Returns what you want
Array{Float64, 2}[] # Vector of Array{Float64,2}
Any[] # Can contain anything
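New answer, for Julia 1. append! is not deprecated: use push!(array, element) to add a single element to an array, and append!(array, collection) to add several. The array must be a vector whose element type can hold the new element:
my_stuff = String[]
push!(my_stuff, "new element")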
