Passing NA to R using the Julia RCall package

I have a problem passing NA values in an array to R for the imputeTS package.
Assume I have this array:
a = Any[1, 2, 3, NaN, 5]
and I want to pass it to this:
R"""
b <- na_seadec($a, algorithm = "kalman", find_frequency = TRUE, maxgap = Inf)
"""
NaN is not converted to NA automatically. How can I pass an actual NA value through RCall?

NaN in Julia will be NaN in R.
If you want NA in R you should use missing in Julia:
julia> x = [1, 2, NaN]
3-element Array{Float64,1}:
1.0
2.0
NaN
julia> y = [1, 2, missing]
3-element Array{Union{Missing, Int64},1}:
1
2
missing
julia> R"$x"
RObject{RealSxp}
[1] 1 2 NaN
julia> R"$y"
RObject{IntSxp}
[1] 1 2 NA
You can find the details in this section of the Julia manual.
And here is an example session:
julia> R"library(imputeTS)"
RObject{StrSxp}
[1] "imputeTS" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "methods" "base"
julia> a = [1,2,3,missing,5]
5-element Array{Union{Missing, Int64},1}:
1
2
3
missing
5
julia> R"""
b <- na_seadec($a, algorithm = "kalman", find_frequency = TRUE, maxgap = Inf)
"""
┌ Warning: RCall.jl: Warning in na_seadec(`#JL`$a, algorithm = "kalman", find_frequency = TRUE, :
│ No seasonality information for dataset could be found, going on without decomposition.
│ Setting find_frequency=TRUE might be an option.
└ # RCall ~/.julia/packages/RCall/g7dhB/src/io.jl:113
RObject{RealSxp}
[1] 1 2 3 4 5
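If the data already contains NaN, as in the question, one option is to convert those entries to missing before interpolating the array into R. The following is a minimal sketch in plain Julia (no RCall call is made here); the helper name nan_to_missing is hypothetical and is not part of RCall.

```julia
# Hypothetical helper (not part of RCall): turn NaN entries into `missing`
# so that the array reaches R with NA rather than NaN.
nan_to_missing(v) = Union{Missing,Float64}[
    (ismissing(x) || (x isa AbstractFloat && isnan(x))) ? missing : Float64(x)
    for x in v
]

a = Any[1, 2, 3, NaN, 5]
a_clean = nan_to_missing(a)   # element type is Union{Missing, Float64}
```

The cleaned vector could then be interpolated as $a_clean in the R""" ... """ call, where its missing entry should arrive as NA in the same way as the y example above.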

Related

Pairwise distance matrix between two vectors

x1=[1,2,3]
x2=[2,3,4]
How can I find the pairwise distance matrix between x1 and x2? (The distance matrix should be a 3 x 3 matrix.)
This is not a Euclidean distance matrix (it holds squared differences), but it is 3 x 3. Is it what you want?
julia> x1 = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> x2 = [2,3,4]
3-element Vector{Int64}:
2
3
4
julia> [(a-b)^2 for a in x1, b in x2]
3×3 Matrix{Int64}:
1 4 9
0 1 4
1 0 1
With Distances.jl:
julia> using Distances
julia> pairwise(Euclidean(), x1, x2)
3×3 Matrix{Float64}:
1.0 2.0 3.0
0.0 1.0 2.0
1.0 0.0 1.0
(Although this will not return integers, as it uses BLAS stuff internally.)
Since you ask for the Euclidean distance between all combinations of vector pairs from x1 and x2, that is, the distance between [1, 2] and [2, 3], [1, 2] and [2, 4], ..., [2, 3] and [3, 4], this can be done as follows:
Using Combinatorics.jl, construct all pairs from x1 by taking 2 elements at a time. Do the same for x2. Now you have c1 and c2; just loop over the two sequences, applying the formal definition of Euclidean distance, sqrt(sum((x .- y).^2)), to get the 3-by-3 matrix of pairwise distances that you want.
using Combinatorics
x1 = [1, 2, 3]
x2 = [2, 3, 4]
c1 = combinations(x1, 2)
c2 = combinations(x2, 2)
pairwise = [sqrt(sum((i.-j).^2)) for i in c1, j in c2]
3×3 Matrix{Float64}:
1.41421 2.23607 2.82843
1.0 1.41421 2.23607
0.0 1.0 1.41421
If you like higher-order index notations similar to math books, you can use Tullio.jl like this:
using Tullio
x = collect(combinations(x1, 2))
y = collect(combinations(x2, 2))
@tullio pairwise[i,j] := sqrt(sum((x[i].-y[j]).^2))
3×3 Matrix{Float64}:
1.41421 2.23607 2.82843
1.0 1.41421 2.23607
0.0 1.0 1.41421
You can try:
abs.(x1 .- x2')
#3×3 Array{Int64,2}:
# 1 2 3
# 0 1 2
# 1 0 1
Here x2' turns x2 into a row vector, and .- and abs. apply the operations element-wise, broadcasting the column x1 against the row x2'.
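To make the shapes explicit, here is a small sketch using only Base Julia. The variable names row, d_broadcast and d_loop are chosen for illustration; the comprehension is included only to show that broadcasting a column against a row gives the same matrix as looping over both vectors.

```julia
x1 = [1, 2, 3]
x2 = [2, 3, 4]

# x2 is a 3-element (column) vector; x2' is its 1×3 adjoint, i.e. a row.
row = x2'

# Broadcasting a length-3 column against a 1×3 row expands to a 3×3 matrix.
d_broadcast = abs.(x1 .- row)

# The same matrix written as an explicit two-dimensional comprehension.
d_loop = [abs(a - b) for a in x1, b in x2]
```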
Or create the desired pairs (1,2) and (2,3), (1,2) and (2,4), ..., (2,3) and (3,4) and calculate the distance using norm.
using LinearAlgebra
#Create pairs
c1 = [(x1[i], x1[j]) for i in 1:lastindex(x1)-1 for j in i+1:lastindex(x1)]
c2 = [(x2[i], x2[j]) for i in 1:lastindex(x2)-1 for j in i+1:lastindex(x2)]
#Calc distance
[norm(i.-j) for i in c1, j in c2]
#3×3 Array{Float64,2}:
# 1.41421 2.23607 2.82843
# 1.0 1.41421 2.23607
# 0.0 1.0 1.41421
#Calc cityblock distance
[norm(i.-j, 1) for i in c1, j in c2]
#3×3 Array{Float64,2}:
# 2.0 3.0 4.0
# 1.0 2.0 3.0
# 0.0 1.0 2.0

Removing elements containing NaN in R list

I have written a function that returns such a list:
$F1
[1] NaN
$F2
[1] NaN
$F3
[1] NaN
$F4
[1] NaN
$F5
[1] NaN
$F1_a
[1] NaN
$F2_a
a b c d e
0.602060 -0.090309 -0.090309 -0.090309 -0.090309
$F3_a
[1] NaN
$F4_a
a b c d e
6.000000000 0.001259629 3.830705151 23.992442227 0.084647425
$F5_a
[1] NaN
$F1_b
[1] NaN
$F2_b
a b c d e
1.20412 0.00000 0.60206 1.20412 0.00000
$F3_b
[1] NaN
$F4_b
a b c d e
28 0 6 6 0
$F5_b
[1] NaN
Is it possible to keep only the elements containing no NaN's?
I have tried looping through it and keeping only those elements for which is.nan is FALSE for every value, but I found no way to keep the names. Have you got any suggestions?
Thanks a lot in advance.
We can use Filter. Based on the data, NaNs are only found in elements with length of 1.
Filter(function(x) !(length(x) == 1 && is.nan(x)), lst1)
If we want to remove elements containing any NaN, i.e. so that elements such as c(5, NaN) are removed as well:
Filter(function(x) !any(is.nan(x)), lst1)
data
lst1 <- list(F1 = NaN, F2 = NaN, F3 = NaN, F4 = NaN, F1_a = NaN, F2_a = c(a = 0.602,
b = -0.0903, c = -903))
You can use sapply to keep only those list elements that contain no NA values (is.na is also TRUE for NaN).
lst[sapply(lst, function(x) all(!is.na(x)))]
purrr also has keep and discard functions for this:
purrr::keep(lst, ~all(!is.na(.x)))
purrr::discard(lst, ~any(is.na(.x)))

How can I determine if an array contains some element?

How can I tell if an array contains some element?
I have been manually checking with a loop:
for x in xs
    if x == a
        return true
    end
end
return false
Is there a more idiomatic way?
The in operator will iterate over an array and check if some element exists:
julia> xs = [5, 9, 2, 3, 3, 8, 7]
julia> 8 in xs
true
julia> 1 in xs
false
It is important to remember that missing values can alter the behavior you might otherwise expect:
julia> 2 in [1, missing]
missing
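If a plain true/false answer is needed even when the collection holds missing values, two possible approaches are sketched below using only Base functions (skipmissing, any, isequal). The variable names r1 to r4 are just for illustration.

```julia
xs = [1, missing]

# `in` compares with `==`, so a comparison against `missing` propagates.
r1 = 2 in xs                 # missing

# Option 1: drop the missing entries before testing membership.
r2 = 2 in skipmissing(xs)    # false

# Option 2: compare with `isequal`, which always returns a Bool.
r3 = any(isequal(2), xs)     # false
r4 = any(isequal(1), xs)     # true
```

Which option fits depends on whether an unknown value should count as "possibly equal" (keep the missing result) or simply be ignored.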
in can be used on general collections. In particular, matrices:
julia> A = [1 4 7
2 5 8
3 6 9]
3×3 Array{Int64,2}:
1 4 7
2 5 8
3 6 9
julia> 7 in A
true
julia> 10 in A
false

newline sensitive interpretation of Arrays

When I add a newline to my array definition, the type of my array changes.
julia> a = [[1]]
1-element Array{Array{Int64,1},1}:
[1]
julia> a = [[1]
]
1-element Array{Int64,1}:
1
I thought they both should return the same result i.e. of type Array{Array{Int64,1},1}
In order to understand this see the following:
julia> :([[1]
])
:([[1];])
As you can see, the added newline is rewritten as a vcat operation.
The reason for this is to allow writing something like this:
julia> x = [1 2
3 4]
2×2 Array{Int64,2}:
1 2
3 4
and your example is hitting a corner case of this syntax.
Note, however, that without an extra empty line vcat is not called:
julia> :([[1]
])
:([[1]])
Another use case that is worth knowing is:
julia> [[1, 2]
[3, 4]]
4-element Array{Int64,1}:
1
2
3
4
and the same with variables (can improve code readability in some cases):
julia> a = [1,2]
2-element Array{Int64,1}:
1
2
julia> b = [3, 4]
2-element Array{Int64,1}:
3
4
julia> [a
b]
4-element Array{Int64,1}:
1
2
3
4
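As a side-by-side summary of the separators discussed above, the following sketch contrasts newline, semicolon and an explicit vcat call with the comma form. The variable names are illustrative only.

```julia
a = [1, 2]
b = [3, 4]

# Newline between elements: vertical concatenation.
flat_newline = [a
                b]

# Semicolon: the same vertical concatenation on one line.
flat_semicolon = [a; b]

# Explicit function call, equivalent to the two forms above.
flat_vcat = vcat(a, b)

# Comma: a vector whose two elements are the original vectors.
nested = [a, b]
```

When a vector of vectors is intended, the comma form (or an explicit type such as Vector{Int}[a, b]) avoids depending on where line breaks fall.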

Cumsum but with maximum number of datapoints

I'm looking to create a hybrid of cumsum() and TTR::runSum(), where cumsum() runs up until a pre-specified number of data points, at which point it acts more like runSum().
For example:
library(TTR)
data <- rep(1:3,2)
cumsum <- cumsum(data)
runSum <- runSum(data, n = 3)
DesiredResult <- ifelse(is.na(runSum),cumsum,runSum)
Is there a way to get to DesiredResult that doesn't require fiddling with NAs?
That is what the partial=TRUE argument to rollapplyr does. Here we show this with sum and also with sd and IQR. (Note that the sd of one value is NA and we chose IQR since it is a measure of spread that can be calculated for scalars although it is always 0 in that case.)
library(zoo)
rollapplyr(data, 3, sum, partial = TRUE)
## [1] 1 3 6 6 6 6
rollapplyr(data, 3, sd, partial = TRUE)
## [1] NA 0.7071068 1.0000000 1.0000000 1.0000000 1.0000000
rollapplyr(data, 3, IQR, partial = TRUE)
## [1] 0.0 0.5 1.0 1.0 1.0 1.0
Here are three alternatives.
n <- 3
rowSums(embed(c(rep(0, n - 1), data), n)) # base R
# [1] 1 3 6 6 6 6
library(TTR)
runSum(c(rep(0, n - 1), data), n = n)
# [1] NA NA 1 3 6 6 6 6 # na.omit fixes the beginning
library(zoo)
rollsum(c(rep(0, n - 1), data), k = 3, align = "right")
# [1] 1 3 6 6 6 6
