How to create a ones array in Julia?

In many machine learning use cases, you need to create an array filled with ones, with specific dimensions. In Python, I would use np.ones((2, 1)). What is the equivalent in Julia?

Julia has a built-in ones function which can be used as follows:
julia> ones(1,2)
1×2 Matrix{Float64}:
1.0 1.0
You can read more about the ones function in the Julia docs.
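As a small sketch of a few common variants (assuming a recent Julia 1.x; the variable names are only for illustration): the direct counterpart of np.ones((2, 1)) is ones(2, 1), ones also accepts an element type as its first argument, and fill covers constants other than one.

```julia
# Direct analog of np.ones((2, 1)): 2 rows, 1 column, Float64 by default
col = ones(2, 1)

# An element type can be given as the first argument
int_ones = ones(Int, 2, 3)

# fill works for any constant, not just one
sevens = fill(7.0, 2, 2)

# A plain vector (1-dimensional array) of ones
v = ones(3)
```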

The answer by Logan is excellent. You can just use the ones function.
However, you can often do without it.
For instance, a common use of a vector of ones is to multiply it by another vector so that you get a matrix in which every entry of a row equals the corresponding element of that vector. You can then add that matrix to another matrix, which adds the values of the vector to the corresponding rows. You get code like this:
>>> import numpy as np
>>> A = np.random.rand(4,3)
>>> x = np.random.rand(4)
>>> x
array([0.01250529, 0.9620139 , 0.70991563, 0.99795451])
>>> A + np.reshape(np.ones(3), (1,3)) * np.reshape(x, (4,1))
array([[0.09141967, 0.83982525, 0.16960596],
[1.39104681, 1.10755182, 1.60876696],
[1.14249757, 1.68167344, 1.64738165],
[1.10653393, 1.45162139, 1.23878815]])
This is actually a lot of extra work for the computer, because Python can't optimize it away. You could also use so-called broadcasting to do this extension more simply, but you still have to get x into the right shape:
>>> A + x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,3) (4,)
>>> A + np.reshape(x, (4,1))
array([[0.09141967, 0.83982525, 0.16960596],
[1.39104681, 1.10755182, 1.60876696],
[1.14249757, 1.68167344, 1.64738165],
[1.10653393, 1.45162139, 1.23878815]])
In Julia, the extension of the vector to the same shape as the matrix to which you want to add it can be done more simply using the broadcast operator. Thus, the code above simplifies to
julia> A = rand(4,3)
4×3 Matrix{Float64}:
0.885593 0.494999 0.534039
0.915725 0.479218 0.229797
0.739122 0.670486 0.247376
0.419879 0.857314 0.652547
julia> x = rand(4)
4-element Vector{Float64}:
0.9574839624590326
0.9736140903654276
0.6051487944513263
0.3581090323172089
julia> A .+ x
4×3 Matrix{Float64}:
1.84308 1.45248 1.49152
1.88934 1.45283 1.20341
1.34427 1.27563 0.852524
0.777988 1.21542 1.01066
One reason that this works better is that there is less noise in the syntax, because arrays are primitive to Julia.
Much more importantly, though, the compiler sees the use of the broadcast operator and can generate very efficient code (and can even vectorize it). In fact, x doesn't even have to be an actual vector as long as it has a few of the same methods defined for it.
And where you would otherwise need a vector or matrix of all ones (or some other constant), you can broadcast with a scalar as well:
julia> A .+ 1
4×3 Matrix{Float64}:
1.88559 1.495 1.53404
1.91572 1.47922 1.2298
1.73912 1.67049 1.24738
1.41988 1.85731 1.65255
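To make the equivalence concrete, here is a hedged sketch comparing the explicit ones-based construction with broadcasting. The small fixed inputs and the variable names are made up for illustration (they replace the random data above). Adding a vector along the other dimension only needs a transpose.

```julia
A = [1.0 2.0 3.0;
     4.0 5.0 6.0]          # 2×3 matrix
x = [10.0, 20.0]           # one value per row of A
y = [100.0, 200.0, 300.0]  # one value per column of A

# Explicit version: x * ones(1, 3) materializes a 2×3 matrix whose row i repeats x[i]
explicit = A + x * ones(1, 3)

# Broadcast version: no temporary matrix of ones is needed
rows_added = A .+ x

# y' is a 1×3 row, so its entries are added to the columns of A
cols_added = A .+ y'

# Updating in place avoids allocating a new result array
B = copy(A)
B .+= x
```

Both explicit and rows_added hold the same values; the broadcast form simply skips building the intermediate matrix.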

Related

Problem with assigning non-irrational values to irrational arrays using broadcast .=

Take the following (irrational) array as an example:
a = fill(pi, 10)
When trying to assign a different value to one element, for example
a[1] .= 0.0
the following error occurs:
ERROR: MethodError: no method matching copyto!(::Irrational{:π}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Tuple{},typeof(identity),Tuple{Int64}})
The reason for this is that the element type of a, when you construct it like that, is the special number type Irrational{:π}, as seen from the output:
julia> a = fill(pi, 2)
2-element Array{Irrational{:π},1}:
π
π
When you try to put another numeric type in this container (e.g. a Float64 with value 0.0 in your example) it is like trying to fit squares in circular holes -- they don't fit.
The solution is to construct the array with the desired element type to start with. For "regular" computations you probably want Float64, so you can convert pi to a float first:
julia> a = fill(float(pi), 2)
2-element Array{Float64,1}:
3.141592653589793
3.141592653589793
The two other answers suggest converting your pi to Float64. In Julia you do not have to do that.
v = fill!(Vector{Union{Float64,Irrational}}(undef,10), pi)
Now your vector v can store both Float64 and Irrational numbers. Note that the performance of such code will be worse than having just a Vector{Float64}, but on the other hand you are not forced to lose precision (which might be desirable or not).
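A minimal sketch of how such a vector behaves, assuming Julia 1.x (the shorter length and the name w are only for illustration):

```julia
# Element type that admits both Float64 values and Irrational constants
v = fill!(Vector{Union{Float64,Irrational}}(undef, 3), pi)

v[1] = 0.0          # storing a Float64 now works
v[2] === pi         # the remaining entries are still the exact constant π

# Converting to plain Float64 afterwards gives a concrete element type again
w = Float64.(v)
```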
First of all, broadcasting is used to vectorize an operation: if you want to change all the values of the array a, you write
a .= 0.0
And if you want to change only the first value, you write
a[1] = 0.0
which now gives a different error
ERROR: MethodError: no method matching Irrational{:π}(::Float64)
The problem comes from the type. As you can see at https://julialang.org/blog/2017/03/piday/, Irrational is a rather unusual type. In short, it is only used to store some classical constants (pi, e, ...) which can be converted to any floating-point type without any intermediate routine.
It's a bit strange to build an array of irrationals; I think you would prefer to use Float64. Taking your original declaration, write
a = fill( Float64(pi), 10 )
and then you can use
a[1] = 0.0
or
a .= 0.0
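Putting the pieces together, a short sketch (assuming Julia 1.x; the names b, p32 and pbig are illustrative) that checks the element type and shows both forms of assignment on a Float64 array:

```julia
a = fill(pi, 3)
eltype(a)                 # Irrational{:π}: only π itself fits in this array

b = fill(Float64(pi), 3)  # concrete Float64 elements
b[1] = 0.0                # plain assignment for a single element
b .= 1.5                  # broadcast assignment overwrites every element

# The same constant can be converted to other floating-point types on demand
p32 = Float32(pi)
pbig = big(pi)            # BigFloat at the current default precision
```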

The use of map when the function that map is used on has arrays of inputs

Julia's higher-order function map looks very useful. But while it is easy to understand how it can be used with functions that take one input, it is not obvious how map can be used when the function takes multiple inputs, each of which may be an array. I would like to find out how map is used in that situation.
Suppose I have the following function:
function randomSample(items, weights)
sample(items, Weights(weights))
end
Example:
using Pkg
Pkg.add("StatsBase")
using StatsBase
randomSample([1,0],[0.5, 0.5])
How can map be used here? I have tried something like:
items = [1 0;1 0;1 0]
weights = [1 0;0.5 0.5;0.75 0.25]
map(randomSample(items,weights))
In the example above, I would expect Julia to output a 3 by 1 array of integers (from the items), each row being either 0 or 1 depending on the corresponding weights.
In your case, where items and weights are matrices, you can use the eachrow function like this:
map(randomSample, eachrow(items), eachrow(weights))
If you are on a Julia version earlier than 1.1, you can write:
map(i -> randomSample(items[i, :], weights[i, :]), axes(items, 1))
or
map(i -> randomSample(view(items,i, :), view(weights, i, :)), axes(items, 1))
(the latter avoids allocations)
However, in practice I would probably define items and weights as vectors of vectors:
items = [[1, 0],[1, 0],[1, 0]]
weights = [[1, 0], [0.5, 0.5], [0.75, 0.25]]
and then you can simply write:
map(randomSample, items, weights)
or
randomSample.(items, weights)
The reason for my preference is the following:
the structure of your data is conceptually clearer
a vector of vectors is easier to mutate (e.g. you can push! a new entry at the end)
a vector of vectors can be ragged if needed
in some cases it might be a bit faster (iterating by rows in Julia is not optimal, as arrays are stored in column-major order; of course you can fix this in your Matrix approach by storing your data columnwise, not rowwise as you currently do)
(this is not a very strong preference and you can probably choose whatever is more convenient to you)
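To see the multi-argument form of map without depending on StatsBase, here is a sketch that swaps in a deterministic stand-in function. weighted_sum is hypothetical and only plays the role of randomSample; eachrow assumes Julia 1.1 or later.

```julia
# Hypothetical two-argument function standing in for randomSample
weighted_sum(items, weights) = sum(items .* weights)

items = [1 0; 1 0; 1 0]
weights = [1 0; 0.5 0.5; 0.75 0.25]

# map walks both row iterators in lockstep: f(row 1, row 1), f(row 2, row 2), ...
by_rows = map(weighted_sum, eachrow(items), eachrow(weights))

# The same data as vectors of vectors needs no eachrow
items_vv = [[1, 0], [1, 0], [1, 0]]
weights_vv = [[1, 0], [0.5, 0.5], [0.75, 0.25]]
by_vectors = map(weighted_sum, items_vv, weights_vv)

# Broadcasting over the outer vectors is equivalent
by_broadcast = weighted_sum.(items_vv, weights_vv)
```

All three results contain one value per row, which is the shape the question asks for.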

How to change a value to missing

I seem to be unable to change a value to missing in Julia version 0.6.4 (I believe it was allowed before 0.6).
Example code:
using DataFrames
x = zeros(5)
5-element Array{Float64,1}:
0.0
0.0
0.0
0.0
0.0
x[3] = missing
ERROR: MethodError: Cannot `convert` an object of type Missings.Missing to an
object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] setindex!(::Array{Float64,1}, ::Missings.Missing, ::Int64) at ./array.jl:583
In this setting I am trying to encode certain indices as missing values for an analysis. Is there a simple workaround?
missing in Julia is of its own type:
julia> typeof(missing)
Missings.Missing
In your case, it is particularly important to note that:
julia> Missing <: Float64
false
That is, Missing is not a subtype of Float64. Now, note that:
julia> typeof(zeros(5))
Array{Float64,1}
So you construct x, an array that should only contain Float64. Since missing is not a subtype of Float64, when you try to change one of the elements of x to missing, you get an error, in the same way you would get an error if you tried x[3] = "a string".
If you want an array to contain both the type Missing and the type Float64, then you need to specify up front that the elements of the array can be of type Missing or type Float64. In Julia v0.6 (which you specify in the question), you can do this via missings, which is located in the Missings.jl package, e.g.:
julia> x = missings(Float64, 2)
2-element Array{Union{Float64, Missings.Missing},1}:
missing
missing
julia> x[1] = 0.0
0.0
julia> x
2-element Array{Union{Float64, Missings.Missing},1}:
0.0
missing
In v1.0, the core functionality related to missing was moved into Base, so instead you would need:
julia> Array{Union{Float64,Missing}}(missing, 2)
2-element Array{Union{Missing, Float64},1}:
missing
missing
which is admittedly a little cumbersome. However, the missings syntax from v0.6 is still available for v1.0 in Missings.jl. It's just that many people may choose not to bother with this since the type Missing itself has moved to Base, so you don't need Missings.jl, unlike v0.6.
If you already have a pre-existing Array{Float64} and want to mark some of the elements as missing, then (as far as I know) you will need to re-construct the array. For example, in both v0.6 and v1.0 you could use:
julia> x = randn(2)
2-element Array{Float64,1}:
-0.642867
-1.17995
julia> y = convert(Vector{Union{Missing,Float64}}, x)
2-element Array{Union{Float64, Missings.Missing},1}:
-0.642867
-1.17995
julia> y[2] = missing
missing
Note that missing is typically envisaged to be used in datatypes like DataFrames, where a lot of this stuff happens automatically for you, and so you don't have to waste time typing out so many Unions. This might be one reason why the syntax is a little verbose when working with regular arrays like you are.
One final point: you could of course explicitly construct your arrays to accept any type, e.g. x = Any[1.0, 2.0] ; x[1] = missing. The downside is that now the compiler cannot generate type-efficient code for working with x and so you will lose the speed benefits of working in Julia.
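As a hedged sketch of the whole workflow on a regular array, assuming Julia 1.x where Missing lives in Base (the input values and variable names are made up for illustration):

```julia
x = [0.5, -1.25, 2.0]

# Widen the element type so that missing is allowed
y = convert(Vector{Union{Missing,Float64}}, x)
y[2] = missing

# Skip missing values explicitly when aggregating
total = sum(skipmissing(y))
n_missing = count(ismissing, y)

# Replace missing values with a default value
filled = coalesce.(y, 0.0)
```

skipmissing and coalesce are both in Base in Julia 1.x, so nothing beyond the standard language is needed here.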

In Julia assign the diagonal values of a matrix, get "error in method definition"

I want to set the diagonal elements of a matrix to 1, so I used the diag() function, but I got an error.
aa=rand(3,3);
diag(aa)=ones(3)
error in method definition: function LinAlg.diag must be explicitly
imported to be extended
I also tried diag(aa)=[1,1,1], but that does not seem to work either.
How can I solve this problem?
First of all, diag(aa) = ones(3) is Matlab syntax and doesn't work as you would think. In Julia, it is a method definition for diag, which is why you get that error. You have to use indexing using square brackets, as in C-style languages. (And maybe read about the differences from Matlab to avoid future surprises.)
To answer the question, you can use LinearAlgebra.diagind to get the indices of the diagonal, and assign 1 to them by broadcasting:
julia> using LinearAlgebra
julia> diagind(aa)
1:4:9
julia> aa[diagind(aa)] .= 1
3-element SubArray{Float64,1,Array{Float64,1},Tuple{StepRange{Int64,Int64}},true}:
1.0
1.0
1.0
julia> aa
3×3 Array{Float64,2}:
1.0 0.726595 0.195829
0.37975 1.0 0.882588
0.604239 0.309412 1.0
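For reference, a self-contained sketch of the same idea, assuming Julia 1.x where diagind lives in the LinearAlgebra standard library. The zero matrices and the names bb and id are only illustrative; the loop and the I-based construction are alternatives rather than part of the answer above.

```julia
using LinearAlgebra

aa = zeros(3, 3)

# diagind returns the linear indices of the diagonal (1:4:9 for a 3×3 matrix)
aa[diagind(aa)] .= 1

# An explicit loop is an equivalent alternative
bb = zeros(3, 3)
for i in axes(bb, 1)
    bb[i, i] = 1
end

# When an identity matrix is wanted from scratch, it can be built directly
id = Matrix(1.0I, 3, 3)
```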

Julia: methods and DataArrays.DataArray

I would like to write a function fun1 with a DataArrays.DataArray y as its only argument. The elements of y can be either integers or floats (in vector or in matrix form).
I have tried to follow the suggestions I found on Stack Overflow (Functions that take DataArrays and Arrays as arguments in Julia) and in the official documentation (http://docs.julialang.org/en/release-0.5/manual/methods/). However, I couldn't write code flexible enough to deal with the uncertainty around y.
I would like to have something like (but capable of handling numerical DataArrays.DataArray):
function fun1(y::Number)
println(y);
end
Any suggestion?
One option is to define:
fun1{T<:Number}(yvec::DataArray{T}) = foreach(println,yvec)
Then,
using DataArrays
v = DataArray(rand(10))
w = DataArray(rand(1:10,10))
fun1(v)
#
# elements of v printed as Float64s
#
fun1(w)
#
# elements of w printed as Ints
#
A subtle but recurring point to note is the invariance of Julia's parametric types, which necessitates defining a parametric function. A look at the documentation on types should clarify this concept (http://docs.julialang.org/en/release-0.4/manual/types/#types).
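The invariance point can be checked directly. The sketch below assumes Julia 1.x, where the parametric method is written with where instead of fun1{T<:Number}(...), and it uses ordinary arrays in place of DataArray so that it runs without extra packages. The scalar method is an illustrative addition.

```julia
# Invariance: Vector{Int} is not a subtype of Vector{Number} ...
Vector{Int} <: Vector{Number}      # false
# ... but it does match the parametric type Vector{<:Number}
Vector{Int} <: Vector{<:Number}    # true

# Counterpart of fun1 for ordinary arrays, written in `where` syntax
fun1(y::AbstractArray{T}) where {T<:Number} = foreach(println, y)

# A scalar method can coexist with the array method
fun1(y::Number) = println(y)

fun1([1, 2, 3])             # vector of Ints
fun1([1.5 2.5; 3.5 4.5])    # matrix of Float64s
fun1(42)                    # single number
```

Because the array method is parametric in T, it accepts vectors and matrices of any numeric element type, while an array of strings matches neither method.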
