I want to turn an array like this
[1,2,3,4,5]
into a lagged version
[missing,1,2,3,4] # lag 1
[missing,missing,1,2,3] # lag 2
or a led version
[2,3,4,5,missing] # lead 1
[3,4,5,missing,missing] # lead 2
As Julia is designed for scientific computing, there must be something like this, right?
Add ShiftedArrays. See: https://discourse.julialang.org/t/ann-shiftedarrays-and-support-for-shiftedarrays-in-groupederrors/9162
Quoting from the above:
lag, lead functions, to shift an array and add missing (or a custom default value in the latest not yet released version) where the data is not available, or circshift for shifting circularly in a lazy (non allocating) way:
julia> v = [1.2, 2.3, 3.4]
3-element Array{Float64,1}:
1.2
2.3
3.4
julia> lag(v)
3-element ShiftedArrays.ShiftedArray{Float64,Missings.Missing,1,Array{Float64,1}}:
missing
1.2
2.3
Note the ShiftedArray version of lag keeps the array size the same. You might add a short function to make it behave the way you asked:
biglag(v, n) = lag(vcat(v, v[1:n]), n)
Related
In Julia, you can permanently append elements to an existing vector using append! or push!. For example:
julia> vec = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> push!(vec, 4,5)
5-element Vector{Int64}:
1
2
3
4
5
# or
julia> append!(vec, 4,5)
7-element Vector{Int64}:
1
2
3
4
5
But, what is the difference between append! and push!? According to the official doc it's recommended to:
"Use push! to add individual items to a collection which are not already themselves in another collection. The result
of the preceding example is equivalent to push!([1, 2, 3], 4, 5, 6)."
So this is the main difference between these two functions! But, in the example above, I appended individual elements to an existing vector using append!. So why do they recommend using push! in these cases?
append!(v, x) will iterate x, and essentially push! the elements of x to v. push!(v, x) will take x as a whole and add it at the end of v. In your example there is no difference, since in Julia you can iterate a number (it behaves like an iterator with length 1). Here is a better example illustrating the difference:
julia> v = Any[]; # Intentionally using Any just to show the difference
julia> x = [1, 2, 3]; y = [4, 5, 6];
julia> push!(v, x, y);
julia> append!(v, x, y);
julia> v
8-element Vector{Any}:
[1, 2, 3]
[4, 5, 6]
1
2
3
4
5
6
In this example, when using push!, x and y becomes elements of v, but when using append! the elements of x and y become elements of v.
Since Julia is still in its early phase it'd be best if you follow the community standards, and one of the community standards, is your code making "sense" to other developers at first sight - I should know what your "intents" are immediately I read your code.
About append!, the doc says:
"For an ordered container collection, add the elements of each
collections to the end of it. !!! compat "Julia 1.6" Specifying
multiple collections to be appended requires at least Julia 1.6."
The append! method was added and requires Julia 1.6 to use for multiple collections; so in a sense it is the method that's going to be used in the future as Julia gets adopted a lot, Python uses it too, so adopters from there would likely use it too.
About push!, the doc says:
"Insert one or more items in collection. If collection is an ordered
container, the items are inserted at the end (in the given order). If
collection is ordered, use append! to add all the elements of another
collection to it."
The doc advises you use "append!" over "push!" when your collection is ordered. So as a Julia user, if I see append! on your code, I should know the collections its making changes on is in some way "ordered". That's just it. Otherwise, push! and append! does same things(something that might change in the future), but please follow community standards, it will help.
So use append! when you care about order and use push! when order doesn't matter in your collections. This way, any Julia user reading your code, will know your intents right away; but please don't mix them up
What is the best way in Julia to vectorize a function along a specific axis? For example sum up all the rows of a matrix. Is it possible with the dot notation?
sum.(ones(4,4))
Does not yield the desired result.
Try using the dims argument on a lot of functions that deal with sets of values.
sum([1 2; 3 4], dims=2)
2×1 Matrix{Int64}:
3
7
# or
using Statistics
mean([1 2; 3 4], dims=1)
1×2 Matrix{Float64}:
2.0 3.0
There is already a standard function called mapslices, looks like exactly what you need.
julia> mapslices(sum, ones(4, 4), dims = 2)
4-element Vector{Float64}:
4.0
4.0
4.0
4.0
You can find the documentation here or by typing ? followed by mapslices in REPL.
If in your example you want to use the dot notation you should pass an array of rows, not the array itself. Otherwise, sum is applied to each element resulting in the same matrix. It can be done with eachrow and eachcol for rows and columns respectively.
julia> sum.(eachrow(ones(4, 4)))
4-element Vector{Float64}:
4.0
4.0
4.0
4.0
EDIT: I tried to suggest a more general solution, but if you have this option I would recommend using Andre's answer.
I'm trying to analyse some experimental in a matrix and I'm having some issues.
For example I'd like to scale the columns of a matrix so that the first row of each column is 1.
I'd like to do it in the neat/clean julia way that I'm now starting to learn but I'm struggling to find a good solution.
The problem comes from the fact that each column is the result of some experimental test, and they have different lengths. I've "fixed" this by creating the matrix in excel, adding a missing in the empty cells at the bottom of the column and then copy pasting it in julia. I take this is probably not the best way to deal with the issue?
Example: (normal matrices are much bigger though)
A=[1 2 3
4 5 6
missing missing 9]
After that, I'd like to do some analysis, one of which is scaling the matrix so that the first row is = [1 1 1...1]. I tried both map
map((x,y)->x./y,A[2:end,:],A[1,:])
but it seems to apply the top row the the first N elements of the first column only.
Alternatively I tried with mapslices but I'm getting the following error MethodError: Cannot `convert` an object of type Missing to an object of type Float64
I have the feeling I'm missing something and my googlefoo is failing me... any help is much appreciated!
PS: Apologies if I missed some already answered question or if I missed some guideline, I'll try to improve my question if needed. It's the first time I post here!
I'm not sure what your first question is, it seems hard to answer without knowing what the data you're processing it looks like.
Your second question if I understand correctly should be as simple as:
julia> A ./ A[1, :]'
3×3 Matrix{Union{Missing, Float64}}:
1.0 1.0 1.0
4.0 2.5 2.0
missing missing 3.0
Edit to add:
Whether a matrix is or isn't a good idea here depends on the wider context, but if you have some vectors of numbers of different lengths, you can just put them in a vector of vectors rather than a matrix, which means they don't all have to have the same length:
julia> x = rand(3); y = rand(5);
julia> A = [x, y]
2-element Vector{Vector{Float64}}:
[0.2654489138174001, 0.8598585826482341, 0.43527866751212607]
[0.4702376843007643, 0.7890927390349933, 0.6073796489306595, 0.9178238662871376, 0.5917433487576529]
julia> A ./ first.(A)
2-element Vector{Vector{Float64}}:
[1.0, 3.2392620119731323, 1.6397831931290188]
[1.0, 1.6780721013637205, 1.2916439264833126, 1.9518296745015802, 1.2583920185758093]
I have seen online in a few places the solution
a = [1 2 3; 4 5 Inf]
a[isinf(a)] = NaN
But this gives me an error on Julia 1.0.1:
ERROR: MethodError: no method matching isinf(::Array{Float64,2})
Closest candidates are:
isinf(::BigFloat) at mpfr.jl:851
isinf(::Missing) at missing.jl:79
isinf(::ForwardDiff.Dual) at <path on my local machine>
What gives?
As an additional comment. A standard function to perform this action is replace!. You can use it like this:
julia> a = [1 2 3; 4 5 Inf]
2×3 Array{Float64,2}:
1.0 2.0 3.0
4.0 5.0 Inf
julia> replace!(a, Inf=>NaN)
2×3 Array{Float64,2}:
1.0 2.0 3.0
4.0 5.0 NaN
It will perform better than broadcasting for large arrays.
If you really need speed you can write a simple function like this:
function inf2nan(x)
for i in eachindex(x)
#inbounds x[i] = ifelse(isinf(x[i]), NaN, x[i])
end
end
Now let us simply compare the performance of the three options:
julia> function bench()
x = fill(Inf, 10^8)
#time x[isinf.(x)] .= NaN
x = fill(Inf, 10^8)
#time replace!(x, Inf=>NaN)
x = fill(Inf, 10^8)
#time inf2nan(x)
end
bench (generic function with 1 method)
julia> bench()
0.980434 seconds (9 allocations: 774.865 MiB, 0.16% gc time)
0.183578 seconds
0.109929 seconds
julia> bench()
0.971408 seconds (9 allocations: 774.865 MiB, 0.03% gc time)
0.184163 seconds
0.102161 seconds
EDIT: For the most performant approaches to this problem see the excellent answer of #BogumilKaminski. This answer addresses the more general question of why isinf and related functions do not work on arrays anymore.
You are running into the more general issue that lots of functions that worked on arrays pre-v1.0 no longer work on arrays in v1.0 because you are supposed to be using broadcasting. The correct solution for v1.0 is:
a[isinf.(a)] .= NaN
I'm actually broadcasting in two places here. Firstly, we broadcast isinf over the array a, but we are also broadcasting the scalar NaN on the RHS to all indexed locations in the array on the LHS via .=. In general, the dot broadcasting notation is incredibly flexible and performant, and one of my favorite features of the latest iteration of Julia.
You are passing your entire array to isinf, it doesn't work on arrays, it works on numbers. Try this:
[isinf(i) ? NaN : i for i in a]
I was delighted to learn that Julia allows a beautifully succinct way to form inner products:
julia> x = [1;0]; y = [0;1];
julia> x'y
1-element Array{Int64,1}:
0
This alternative to dot(x,y) is nice, but it can lead to surprises:
julia> #printf "Inner product = %f\n" x'y
Inner product = ERROR: type: non-boolean (Array{Bool,1}) used in boolean context
julia> #printf "Inner product = %f\n" dot(x,y)
Inner product = 0.000000
So while i'd like to write x'y, it seems best to avoid it, since otherwise I need to be conscious of pitfalls related to scalars versus 1-by-1 matrices.
But I'm new to Julia, and probably I'm not thinking in the right way. Do others use this succinct alternative to dot, and if so, when is it safe to do so?
There is a conceptual problem here. When you do
julia> x = [1;0]; y = [0;1];
julia> x'y
0
That is actually turned into a matrix * vector product with dimensions of 2x1 and 1 respectively, resulting in a 1x1 matrix. Other languages, such as MATLAB, don't distinguish between a 1x1 matrix and a scalar quantity, but Julia does for a variety of reasons. It is thus never safe to use it as alternative to the "true" inner product function dot, which is defined to return a scalar output.
Now, if you aren't a fan of the dots, you can consider sum(x.*y) of sum(x'y). Also keep in mind that column and row vectors are different: in fact, there is no such thing as a row vector in Julia, more that there is a 1xN matrix. So you get things like
julia> x = [ 1 2 3 ]
1x3 Array{Int64,2}:
1 2 3
julia> y = [ 3 2 1]
1x3 Array{Int64,2}:
3 2 1
julia> dot(x,y)
ERROR: `dot` has no method matching dot(::Array{Int64,2}, ::Array{Int64,2})
You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.
The error message suggestion is dot(vec(x),vec(y), but sum(x.*y) also works in this case and is shorter.
julia> sum(x.*y)
10
julia> dot(vec(x),vec(y))
10
Now, you can write x⋅y instead of dot(x,y).
To write the ⋅ symbol, type \cdot followed by the TAB key.
If the first argument is complex, it is conjugated.
Now, dot() and ⋅ also work for matrices.
Since version 1.0, you need
using LinearAlgebra
before you use the dot product function or operator.