How to change a value to missing - julia

I seem to be unable to change a value to missing in Julia version 0.6.4 (I believe it was allowed before 0.6).
Example code:
using Dataframes
x = zeros(5)
5-element Array{Float64,1}:
0.0
0.0
0.0
0.0
0.0
x[3] = missing
ERROR: MethodError: Cannot `convert` an object of type Missings.Missing to an
object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] setindex!(::Array{Float64,1}, ::Missings.Missing, ::Int64) at ./array.jl:583
In this setting I am trying to encode certain indicies as missing values for an analysis. Is there a simple workaround?

missing in Julia is of its own type:
julia> typeof(missing)
Missings.Missing
In your case, it is particularly important to note that:
julia> Missing <: Float64
false
That is, Missing is not a subtype of Float64. Now, note that:
julia> typeof(zeros(5))
Array{Float64,1}
So you construct x, an array that should only contain Float64. Since missing is not a subtype of Float64, when you try to change one of the elements of x to missing, you get an error, in the same way you would get an error if you tried x[3] = "a string".
If you want an array to contain both the type Missing and the type Float64, then you need to specify up front that the elements of the array can be of type Missing or type Float64. In Julia v0.6 (which you specify in the question), you can do this via missings, which is located in the Missings.jl package, e.g.:
julia> x = missings(Float64, 2)
2-element Array{Union{Float64, Missings.Missing},1}:
missing
missing
julia> x[1] = 0.0
0.0
julia> x
2-element Array{Union{Float64, Missings.Missing},1}:
0.0
missing
In v1.0, the core functionality related to missing was moved into Base, so instead you would need:
julia> Array{Union{Float64,Missing}}(missing, 2)
2-element Array{Union{Missing, Float64},1}:
missing
missing
which is admittedly a little cumbersome. However, the missings syntax from v0.6 is still available for v1.0 in Missings.jl. It's just that many people may choose not to bother with this since the type Missing itself has moved to Base, so you don't need Missings.jl, unlike v0.6.
If you already have a pre-existing Array{Float64} and want to mark some of the elements as missing, then (as far as I know) you will need to re-construct the array. For example, in both v0.6 and v1.0 you could use:
julia> x = randn(2)
2-element Array{Float64,1}:
-0.642867
-1.17995
julia> y = convert(Vector{Union{Missing,Float64}}, x)
2-element Array{Union{Float64, Missings.Missing},1}:
-0.642867
-1.17995
julia> y[2] = missing
missing
Note that missing is typically envisaged to be used in datatypes like DataFrames, where a lot of this stuff happens automatically for you, and so you don't have to waste time typing out so many Unions. This might be one reason why the syntax is a little verbose when working with regular arrays like you are.
One final point: you could of course explicitly construct your arrays to accept any type, e.g. x = Any[1.0, 2.0] ; x[1] = missing. The downside is that now the compiler cannot generate type-efficient code for working with x and so you will lose the speed benefits of working in Julia.

Related

How to create a ones array in Julia?

In many Machine Learning use cases, you need to create an array filled with ones, with specific dimensions. In Python, I would use np.ones((2, 1)). What is the analog version of this in Julia?
Julia has a built in ones function which can be used as follows:
julia> ones(1,2)
1×2 Matrix{Float64}:
1.0 1.0
You can read more about the ones function in the Julia docs.
The answer by Logan is excellent. You can just use the ones function.
BUT, you can also often not use it.
For instance, a common use of a vector of ones is to multiply that vector times another vector so you get a matrix where each row just has the same value as in the corresponding element of the matrix. Then you can add that matrix to something. This allows you add the values of a vector to the corresponding rows of a matrix. You get code like this:
>>> A = np.random.rand(4,3)
>>> x = np.random.rand(4)
array([0.01250529, 0.9620139 , 0.70991563, 0.99795451])
>>> A + np.reshape(np.ones(3), (1,3)) * np.reshape(x, (4,1))
array([[0.09141967, 0.83982525, 0.16960596],
[1.39104681, 1.10755182, 1.60876696],
[1.14249757, 1.68167344, 1.64738165],
[1.10653393, 1.45162139, 1.23878815]])
This is actually a lot of extra work for the computer because Python can't optimize this and a lot of extra work is going on. You could also use so called broadcasting to do this extension more simply, but you still have to get x into the right shape:
>>> A + x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,3) (4,)
>>> A + np.reshape(x, (4,1))
array([[0.09141967, 0.83982525, 0.16960596],
[1.39104681, 1.10755182, 1.60876696],
[1.14249757, 1.68167344, 1.64738165],
[1.10653393, 1.45162139, 1.23878815]])
In Julia, the extension of the vector to the same shape as the matrix you to which you want to add can be done more simply using the broadcast operator. Thus, the code above simplifies to
julia> A = rand(4,3)
4×3 Matrix{Float64}:
0.885593 0.494999 0.534039
0.915725 0.479218 0.229797
0.739122 0.670486 0.247376
0.419879 0.857314 0.652547
julia> x = rand(4)
4-element Vector{Float64}:
0.9574839624590326
0.9736140903654276
0.6051487944513263
0.3581090323172089
julia> A .+ x
4×3 Matrix{Float64}:
1.84308 1.45248 1.49152
1.88934 1.45283 1.20341
1.34427 1.27563 0.852524
0.777988 1.21542 1.01066
One reason that this works better is because there is less noise in the syntax because arrays are primitive to Julia.
Much more importantly, though, compiler sees the use of the broadcast operator and it can generate very efficient code (and can even vectorize it). In fact, x doesn't even have to be an actual vector as long as it has a few of the same methods defined for it.
In fact, if you really do need a vector or matrix of all ones (or some other constant) you can use broadcast with scalars as well
julia> A .+ 1
4×3 Matrix{Float64}:
1.88559 1.495 1.53404
1.91572 1.47922 1.2298
1.73912 1.67049 1.24738
1.41988 1.85731 1.65255

Problem with asigning non-irrational values to irrational arrays using broadcast .=

Take as example following (irrational) array
a = fill(pi, 10)
When trying to assign a different value to one element, for example
a[1] .= 0.0
Following error occurs:
ERROR: MethodError: no method matching copyto!(::Irrational{:π}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Tuple{},typeof(identity),Tuple{Int64}})
The reason for this is that the element type of a when you construct it like that is the special number typ Irrational{:π} as seen from the output:
julia> a = fill(pi, 2)
2-element Array{Irrational{:π},1}:
π
π
When you try to put another numeric type in this container (e.g. a Float64 with value 0.0 in your example) it is like trying to fit squares in circular holes -- they don't fit.
The solution is to construct the array with the desired element type to start with. For "regular" computations you probably want Float64, so you can convert pi to a float first:
julia> a = fill(float(pi), 2)
2-element Array{Float64,1}:
3.141592653589793
3.141592653589793
The two other answers suggest you to convert your pi to Float64. In Julia you do not have to do that.
v = fill!(Vector{Union{Float64,Irrational}}(undef,10), pi)
Now your vector v can store both Float64 and Irrational numbers. Note that the performance of such code will be worse than having just a Vector{Float64} but on the other hand you are not forced to loose precision (which might be desirable or not).
First of all, we use broadcast to vectorialize operation : if you want to change all the values of the array a, you write
a .= 0.0
And if you want to change only the first value, you write
a[1] = 0.0
wich gives now a different error
ERROR: MethodError: no method matching Irrational{:π}(::Float64)
The problem comes frome the type. As you can see here, https://julialang.org/blog/2017/03/piday/ irrational is some kind of weird type. Shortly, it's only used to stock some classical values ( pi, e, ...) wich can be converted to any floating type without any intermediate routine.
It's a bit strange to set an array of irrational, I think you would prefer to use Float64. If I take your original declation, write
a = fill( Float64(pi), 10 )
an then you can use
a[1] = 0.0
or
a .= 0.0

How does type inference differ between constant and non-constant global variables?

From the Julia docs on array comprehensions:
The following example computes a weighted average of the current
element and its left and right neighbor along a 1-d grid. :
julia> const x = rand(8)
8-element Array{Float64,1}:
0.843025
0.869052
0.365105
0.699456
0.977653
0.994953
0.41084
0.809411
julia> [ 0.25*x[i-1] + 0.5*x[i] + 0.25*x[i+1] for i=2:length(x)-1 ]
6-element Array{Float64,1}:
0.736559
0.57468
0.685417
0.912429
0.8446
0.656511
Note
In the above example, x is declared as constant because type inference
in Julia does not work as well on non-constant global variables.
The resulting array type is inferred from the expression; in order to
control the type explicitly, the type can be prepended to the
comprehension. For example, in the above example we could have avoided
declaring x as constant, and ensured that the result is of type
Float64 by writing:
Float64[ 0.25*x[i-1] + 0.5*x[i] + 0.25*x[i+1] for i=2:length(x)-1 ]
What does the note near the end mean? That is, how does type inference differ between constant and non-constant global variables?
I believe the problem is that if x is not declared as a const, then Julia has no idea if the type of that variable will ever change (because it never falls out of scope as a global). For this reason, Julia would need to assume x is of type Any.
If x is declared as a const, however, Julia can safely assume that its type will not change, and Julia can make optimizations based on that information.
Note that if you do not declare x as a const, then the returned type from the list comprehension will be Array{Any,1}

Constrain Vector of DataType to be of particular abstract type

Is it possible to create a function which takes a ::Vector{DataType} but constrains all members to be types which inherit from a particular abstract type?
Example:
# this code works but does not constrain the type
function foo{D<:DataType}(arr::Vector{D})
...
end
# this is kind of the syntax I'd like, but Type{Int} !<: Type{Integer}
function bar{D<:Type{Integer}}(arr::Vector{D})
...
end
Thank you
I'm not sure this is possible (cleanly) with a compile-time check. You could consider using a Val type, but this will be messy and probably slower. I would just make it a run-time check:
julia> function bar{T}(::Type{T}, arr::Vector{DataType})
if all(x->x<:T, arr)
println("ok")
else
println("bad")
end
end
bar (generic function with 1 method)
julia> bar(Integer, [Int,Int32])
ok
julia> bar(Integer, [Int,Int32,Float64])
bad
What's your use case for this? There might be an alternative that's cleaner.
just to clarify why function bar{T<:Integer}(arr::Vector{Type{T}}) = println(arr) won't work.
in a nutshell, this is because julia's type parameter is invariant.
firstly, take a look a OP's definition:
function bar{D<:Type{Integer}}(arr::Vector{D})
...
end
the problem here, as OP pointed out, is Type{Int} !<: Type{Integer}.
the reason is that Type{T} is a parametric type, even though Int <: Integer, we don't have Type{Int} <: Type{Integer}.
"bearing in mind"(yes, that's sarcasm) that the type parameter of julia's parametric type is invariant, i suggested to use this version:
function bar{T<:Integer}(arr::Vector{Type{T}})
...
end
it seems good! this time i'm using T instead of Type{T}, so i won't fall into the pit of Type{Int} !<: Type{Integer}.
however, as i wrote down that comment, i had just fallen into another pit -- Vector{} is also a parametric type. even if DataType <: Type{T}, we don't have Vector{DataType} <: Vector{Type{T}}.
as a result, a error will occur when running bar([Int64, Int32]).
julia> bar([Int64, Int32])
ERROR: MethodError: `bar` has no method matching bar(::Array{DataType,1})
julia> methods(bar)
bar{T<:Integer}(arr::Array{Type{T<:Integer},1})
julia> [Int64, Int32]
2-element Array{DataType,1}:
Int64
Int32
EDIT:
hmm, it seems that this problem is not that simple. the key point here is the mysterious relationship between DataType and Type{T}.
# we know that DataType is a subtype of Type{T},
# where T is a `TypeVar` \in [Bottom, Any].
julia> subtypes(Type)
3-element Array{Any,1}:
DataType
TypeConstructor
Union
julia> S = TypeVar(:S, Union{}, Integer, true)
S<:Integer
# but Type{S} is not a subtype of DataType
julia> Type{S} <: DataType
false
julia> Type{S} <: Type
true
i therefore conclude that it's impossible to make ::Vector{DataType} work in your case.
DataType has NO type parameters.
the below definition won't work, which seems like a bug.
julia> a = Array(Type{S}, 2)
2-element Array{Type{S<:Integer},1}:
#undef
#undef
julia> a[1] = Type{Int32} # or Int32
Type{Int32}
julia> a[2] = Type{Float32} # or Float32
Type{Float32}
julia> a
2-element Array{Type{S<:Integer},1}:
Type{Int32}
Type{Float32}
i'll post a question about this strange behavior. #Mageek

Can't declare number type in Julia

According to http://julia.readthedocs.org/en/latest/manual/integers-and-floating-point-numbers/, one should be able to do this:
julia> Float32(-1.5)
-1.5f0
Instead, I get:
julia> Float32(-1.5)
ERROR: type cannot be constructed
This happens for all other attempts to use this syntax, e.g. x = Int16(1)
I'm on 0.3.10.
You are on 0.3.10 but reading the manual for 0.5. In the manual for 0.3 http://julia.readthedocs.org/en/release-0.3/manual/integers-and-floating-point-numbers/
Values can be converted to Float32 easily:
julia> float32(-1.5)
-1.5f0
julia> typeof(ans)
Float32

Resources