Error when performing an NA replacement in R 4.0

With R 3.6 I can perform the following NA replacement:
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
> d
a b
2021-03-03 1 1
With R 4.0 I get the following error:
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
Error in as.Date.default(e) :
do not know how to convert 'e' to class “Date”
Has some default behavior changed in R 4.0?
R 3.6 session info:
Microsoft Windows [Version 10.0.19041.804]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\>R --no-site-file
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Warning message:
package 'zoo' was built under R version 4.0.4
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
> d
a b
2021-03-03 1 1
R 4.0 session info:
Microsoft Windows [Version 10.0.19041.804]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\>R --no-site-file
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
Error in as.Date.default(e) :
do not know how to convert 'e' to class "Date"
Session Info (3.6):
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_3.6.1 grid_3.6.1 lattice_0.20-38
Session Info (4.0):
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_4.0.4 tools_4.0.4 grid_4.0.4 lattice_0.20-41

Thanks for raising this issue. It was a bug in the zoo package: in the [.zoo and [<-.zoo methods we checked whether the index i was a matrix via
if (all(class(i) == "matrix")) ...
This worked correctly in R 3.x.y because matrix objects had only the class "matrix". Since R 4.0.0, however, matrix objects additionally inherit from "array", so class(i) has length two and the check above fails. See: https://developer.R-project.org/Blog/public/2019/11/09/when-you-think-class.-think-again/.
In the zoo development version on R-Forge (https://R-Forge.R-project.org/R/?group_id=18) I have fixed the issue now by replacing the above code with
if (inherits(i, "matrix")) ...
So you can already install zoo 1.8-9 from R-Forge and your code will work again as intended. Alternatively, you can wait for that version to arrive on CRAN, which will hopefully happen in the next few days after the reverse dependency checks. In the meantime you can work around the issue by using
coredata(d)[is.na(d)] <- 1
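To see why the old condition stopped matching, here is a minimal base-R sketch that applies both tests to a logical matrix shaped like the result of is.na(d). The helper names old_check and new_check are made up for this illustration and are not part of zoo.

```r
# A logical matrix, similar in shape to what is.na() returns for a zoo object
i <- matrix(c(TRUE, FALSE), nrow = 1, dimnames = list(NULL, c("a", "b")))

# Hypothetical helpers that mirror the two conditions quoted above
old_check <- function(i) all(class(i) == "matrix")
new_check <- function(i) inherits(i, "matrix")

class(i)      # "matrix" in R 3.x; "matrix" "array" in R >= 4.0.0
old_check(i)  # TRUE in R 3.x, FALSE in R >= 4.0.0
new_check(i)  # TRUE in both
```

Because inherits() only asks whether "matrix" appears anywhere in the class vector, it does not depend on how many classes a matrix carries.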

I'm having this issue as well. Here is some more odd behavior, with breadcrumbs as to what might be happening. I'm still not sure why, but it seems to be an indexing issue with zoo rather than with is.na() specifically. Logical indexing works if the logical has the same rownames/indices as the zoo object:
Printing d[is.na(d)] (without assignment) returns an empty zoo object, suggesting the issue is with indexing.
Wrapping d in coredata() works:
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> coredata(d)[is.na(d)] <- 1
> d
a b
2021-03-05 1 1
The logical returned by is.na() will work if it is transformed to have the same rownames/indices as the zoo obj.
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> changes <- is.na(d) # storing logical in a variable
> d
a b
2021-03-05 NA 1
> changes #d[changes] won't work, so change rownames
a b
[1,] TRUE FALSE
> changes <- as.zoo(changes, index(d))
> changes
a b
2021-03-05 TRUE FALSE
> d[as.logical(changes)] #changing zoo back to a logical, returns something
a b
2021-03-05 NA 1
> d[as.logical(changes)] <- 1
a b
2021-03-05 NA 1
Now for the breadcrumbs: does anybody know what changes were made to R's Date class in version 4? The zoo NEWS file says that merge.zoo was changed "explicitly to work around the new behavior of c.Date() in R >= 4.1.0."
https://cran.r-project.org/web/packages/zoo/NEWS (see bullet point #2 at the top)
I've searched and searched and found no mention of those changes.
My guess is that the zoo class was changed to enforce date indexing more strictly, but I'm unsure. I also can't figure out where to post an issue for zoo.
More breadcrumbs:
Apparently this functionality was working for zoo objects back in April 2020, according to this thread: https://github.com/joshuaulrich/xts/issues/331
R 4.0.0 came out later that month and the newest zoo version, 1.8-8, was released in May 2020, so running version 1.8-7 of the package could show whether a change in R or a change in zoo is causing the different behavior.

@neilfws, you are correct that the issue is due to the change in what class() returns for matrices in R 4.0.
For now, the best option is to use either:
d <- na.fill(d, 1)
or
coredata(d)[is.na(d)] <- 1
The old idiom d[is.na(d)] <- 1 will require an update to the zoo package.

Related

Addition of NA and expression that evaluates to NaN return different results depending on order, violation of the commutative property?

I am investigating corner cases of numeric operations in R. I came across the following particular case involving zero divided by zero:
(0/0)+NA
#> [1] NaN
NA+(0/0)
#> [1] NA
Created on 2021-07-10 by the reprex package (v2.0.0)
Session info
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.27 withr_2.4.2 magrittr_2.0.1 reprex_2.0.0
#> [5] evaluate_0.14 highr_0.9 stringi_1.6.2 rlang_0.4.11
#> [9] cli_3.0.0 rstudioapi_0.13 fs_1.5.0 rmarkdown_2.9
#> [13] tools_4.1.0 stringr_1.4.0 glue_1.4.2 xfun_0.23
#> [17] yaml_2.2.1 compiler_4.1.0 htmltools_0.5.1.1 knitr_1.33
This clearly violates the commutative property of addition. I have two questions:
Is there an explanation of this behavior based on the R language definition?
Are there other examples of the violation of the commutative property of addition (including in other languages) that don't involve side effects in the addend sub-expressions?
Noting that
0/0
#[1] NaN
a more general example of the behavior of + in the question is the following:
NA + NaN
#[1] NA
NaN + NA
#[1] NaN
This was discussed in an R-devel thread, where R Core Team member Tomas Kalibera answered as follows:
Yes, the performance overhead of fixing this at R level would be too
large and it would complicate the code significantly. The result of
binary operations involving NA and NaN is hardware dependent (the
propagation of NaN payload) - on some hardware, it actually works the
way we would like - NA is returned - but on some hardware you get NaN or
sometimes NA and sometimes NaN. Also there are C compiler optimizations
re-ordering code, as mentioned in ?NaN. Then there are also external
numerical libraries that do not distinguish NA from NaN (NA is an R
concept). So I am afraid this is unfixable. The disclaimer mentioned by
Duncan is in ?NaN/?NA, which I think is ok - there are so many numerical
functions through which one might run into these problems that it would
be infeasible to document them all. Some functions in fact will preserve
NA, and we would not let NA turn into NaN unnecessarily, but the
disclaimer says it is something not to depend on.
According to ?NA, this can happen because 0/0 evaluates to NaN, so the sum involves both NA and NaN:
Numerical computations using NA will normally result in NA: a possible exception is where NaN is also involved, in which case either might result (which may depend on the R platform). However, this is not guaranteed and future CPUs and/or compilers may behave differently. Dynamic binary translation may also impact this behavior (with valgrind, computations using NA may result in NaN even when no NaN is involved).
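Since either value may come back, code that only needs to know whether a result is missing can test with is.na(), which is TRUE for both NA and NaN, instead of is.nan(). A minimal sketch, assuming the operands are doubles as in the question:

```r
x <- NA_real_ + NaN   # may be NA or NaN depending on the platform
y <- NaN + NA_real_   # may be NA or NaN depending on the platform

# is.na() is TRUE for both NA and NaN, so this test does not depend on
# operand order or hardware
is.na(x)
is.na(y)

# is.nan() distinguishes the two values, so its result on x and y is
# exactly the thing ?NA says not to rely on
is.nan(NA_real_)  # FALSE
is.nan(NaN)       # TRUE
```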

Mclust freezes with small dataset

I am trying to use the Mclust() function from the R-package mclust on a dataset with 500 observations and 2 variables, and I want to identify 2 clusters.
> head(data)
x y
1 0.9929185 -1.9662945
2 8.2259360 -0.7240049
3 3.3866952 -1.8054764
4 -0.5161490 -2.3096992
5 1.8931073 -1.8928091
6 4.0833228 -1.9045669
> Mclust(data, G = 2)
fitting ...
|=============================================================== | 67%
This should produce an output relatively quickly, but freezes at 67%.
I ran this function multiple times over different datasets, and had no problems whatsoever. It even works if I only include observations up to row 498, but fails as soon as row 499+ is included.
498 -1.710175250 -1.612248596
499 -5.666497204 5.565422240
500 -3.649579976 1.552779499
I have uploaded the whole dataset in my GitHub repository: https://github.com/fstermann/bthesis/tree/main/MclustFreeze
I would greatly appreciate it if anyone has an idea why this is happening with this specific dataset.
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mclust_5.4.7
loaded via a namespace (and not attached):
[1] compiler_4.0.5 tools_4.0.5

Knitr and data.table

I have an automated report that I produce using knitr, and I'm running into the oddest problem. I wrote a function that sums the data by month for several locations. When I run this function in R, I get the following result (which is correct):
###NAME MONTH VOL
###1: TOTAL 1 13.00872
###2: TOTAL 2 11.62527
###3: TOTAL 3 12.71313
###4: TOTAL 4 12.67269
###5: TOTAL 5 15.05127
###6: TOTAL 6 14.61002
###7: TOTAL 7 15.43827
###8: TOTAL 8 15.22400
###9: TOTAL 9 14.91259
###10: TOTAL 10 15.83505
###11: TOTAL 11 14.97242
###12: TOTAL 12 16.34950
When I run this same function (no changes) through knitr to produce the report, I get the following result:
###NAME MONTH VOL
###1: TOTAL 1 14.00872
###2: TOTAL 2 13.62527
###3: TOTAL 3 15.71313
###4: TOTAL 4 16.11338
###5: TOTAL 5 17.61269
###6: TOTAL 6 18.46945
###7: TOTAL 7 20.18851
###8: TOTAL 8 21.04382
###9: TOTAL 9 21.72287
###10: TOTAL 10 23.54272
###11: TOTAL 11 23.72971
###12: TOTAL 12 26.03293
I also have another table where knitr just prints nonsense even though the table has actual values in it.
Here is my session info:
R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] lubridate_1.3.3 xtable_1.7-4 shape_1.4.2 reshape2_1.4.1 rgdal_0.9-2 raster_2.2-12
[7] sp_1.0-17 png_0.1-7 data.table_1.9.2
loaded via a namespace (and not attached): [1] digest_0.6.8 evaluate_0.5.5 formatR_1.1 grid_3.1.2 knitr_1.9 lattice_0.20-29 memoise_0.2.1
[8] packrat_0.4.3 plyr_1.8.1 Rcpp_0.11.5 stringr_0.6.2 tools_3.1.2
UPDATE
I pinpointed the problem for at least one of the tables where this error occurs. The problem was the setnames function and the keyed merge feature of data.table.
When the merge happens, R marks duplicate column names with a ".1" suffix (i.e., if table1 and table2 both have a column named CHEM, then TABLE = table1[table2] has columns named CHEM and CHEM.1), whereas under knitr they become CHEM and i.CHEM. To fix this, I originally used setnames(TABLE, names(TABLE), c(New column names)), but names(TABLE) did not come back in the order I expected, so I was renaming the wrong columns. This error only happened when the code was run through knitr; when I ran it in R alone it worked properly. What is the disconnect between knitr and data.table?
I will work on getting example code up, but as it stands the code would need to be simplified for an example to be helpful.
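One way to make such a rename independent of column order, and of whichever duplicate-name convention is in effect, is to match on the old names rather than on position. The following is only a base-R sketch on a plain data frame: the helper rename_by_name and the new column names are invented for this example, and it does not explain why the two environments differ. data.table's setnames() also accepts old and new arguments for renaming by name.

```r
# Hypothetical helper: rename columns by old name, skipping names that are absent
rename_by_name <- function(df, old, new) {
  stopifnot(length(old) == length(new))
  pos <- match(old, names(df))      # position of each old name, NA if absent
  found <- !is.na(pos)
  names(df)[pos[found]] <- new[found]
  df
}

# Stand-in for the merged table, using the "i." naming seen under knitr
TABLE <- data.frame(ID = 1:2, CHEM = c("a", "b"), i.CHEM = c("x", "y"))

# Both naming conventions are listed; only the one actually present is renamed
TABLE <- rename_by_name(TABLE,
                        old = c("CHEM", "CHEM.1", "i.CHEM"),
                        new = c("CHEM_left", "CHEM_right", "CHEM_right"))
names(TABLE)
```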

nodesize parameter ignored in randomForest package

Does the randomForest package ignore the nodesize parameter? When I predict the terminal nodes for a dataset and check the counts, I see values that are less than the nodesize. I would submit a fix for this myself but the underlying code was written in Fortran. If someone can confirm this behavior I will reach out to the package maintainer and hopefully start a fix.
> library(randomForest)
> set.seed(1)
> rf <- randomForest(mtcars[,-1], mtcars[,1], nodesize = 5)
> nodes <- attr(predict(rf, mtcars[,-1], nodes = TRUE), 'nodes')
# node counts of first tree
> table(nodes[,1])
# first row is the terminal node ID#, second row is the count
2 6 9 10 11 14 15 16 18 19
5 3 3 6 4 2 3 1 3 2
Adding system info:
Session info----------------------------------------------------------------
setting value
version R version 3.1.1 (2014-07-10)
system x86_64, mingw32
ui RStudio (0.98.1049)
language (EN)
collate English_United States.1252
tz America/Chicago
Packages--------------------------------------------------------------------
package * version date source
randomForest * 4.6.10 2014-07-17 CRAN (R 3.1.1)
Response from package maintainer:
That parameter behaves as the way that Leo Breiman intended. The bug
is in how the parameter was described. It’s the same as minsplit in
the rpart:::rpart.control() function:
the minimum number of observations that must exist in a node in order
for a split to be attempted.
I will change the description in the help file in the next version to
resolve this confusion.
Best, Andy
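The minsplit-style reading can be illustrated with a toy calculation. This is not randomForest's algorithm: leaf_sizes is a hypothetical helper that halves a node whenever it holds at least min_split observations (assumed to be 2 or more), simply to show that terminal nodes smaller than the threshold arise naturally under that rule.

```r
# Hypothetical toy splitter: a node with n observations is split in half
# only if n >= min_split; otherwise it becomes a terminal node.
# Assumes min_split >= 2 so that both children are non-empty.
leaf_sizes <- function(n, min_split) {
  if (n < min_split) return(n)
  left <- n %/% 2
  c(leaf_sizes(left, min_split), leaf_sizes(n - left, min_split))
}

# 32 observations (the number of rows in mtcars) with a threshold of 5
sizes <- leaf_sizes(32, 5)
sizes             # eight terminal nodes of size 4
any(sizes < 5)    # TRUE: terminal nodes can be smaller than the threshold
```

Under this rule a node of 8 is still eligible for splitting, and its two children of size 4 are then below the threshold, which is the pattern seen in the table of node counts above.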

"Error in colnames" when merging xts sets

I am trying to make an irregular multivariate time series regular. I am doing this by merging the irregular time series (one measure every 7 days) with a regular "NA" filled time series (daily measures) as suggested by:
- Joshua Ulrich here.
- Dirk Eddelbuettel here.
When I try this method for multivariate time series, I get the error:
"Error in `colnames<-`(`*tmp*`, value = c("C.1", "C.2", "C.1.1", "C.2.1" : length of 'dimnames' [2] not equal to array extent"
My question is twofold:
How can I merge these two xts data sets without getting this error?
Is there a "better" way of making an irregular multivariate time series regular? I guess I was expecting to find a method in the xts package, but could not find one.
Code to Reproduce Error:
require(xts)
set.seed(42)
# make irregular index
irr_index <- seq(from=as.Date("2010-01-19"), length.out=10, by=7)
# make irregular xts
irr_xts <- xts(x = matrix(data = rnorm(20), ncol = 2,
                          dimnames = list(c(1:length(irr_index)),
                                          c("C.1", "C.2"))),
               order.by = irr_index)
# make regular index
reg_index <- seq(from=as.Date(start(irr_xts)), to=as.Date(end(irr_xts)), by=1)
empty <- xts(matrix(data = NA,
                    nrow = length(reg_index),
                    ncol = ncol(irr_xts)),
             reg_index)
reg_xts <- na.fill(merge(irr_xts, empty), fill=0)
In practice my real data are sporadic, sometimes daily, sometimes skipping several days. My approach is to normalize all data to 1 observation per day with 0 for days with missing values.
Thanks in advance.
EDIT:
Here is my sessionInfo() as requested:
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xts_0.9-7 zoo_1.7-10
loaded via a namespace (and not attached):
[1] grid_3.0.2 lattice_0.20-24 tools_3.0.2
This works fine for me; I just followed Joshua Ulrich's link:
empty <- xts(,reg_index ) ## No need to set coredata to create empty xts
merge(irr_xts, empty, fill=0)
C.1 C.2
2010-01-19 1.370958 1.30487
2010-01-20 0.000000 0.00000
2010-01-21 0.000000 0.00000
2010-01-22 0.000000 0.00000
2010-01-23 0.000000 0.00000
2010-01-24 0.000000 0.00000
.....

Resources