I am trying to make an irregular multivariate time series regular. I am doing this by merging the irregular time series (one measure every 7 days) with a regular "NA" filled time series (daily measures) as suggested by:
- Joshua Ulrich here.
- Dirk Eddelbuettel here.
When I try this method for multivariate time series, I get the error:
"Error in colnames<-(*tmp*, value = c("C.1", "C.2", "C.1.1", "C.2.1" : length of 'dimnames' [2] not equal to array extent"
My question is 2 fold:
How can I merge these two xts data sets without getting this error?
Is there a "better" way of making an irregular multivariate time series regular? I guess I was expecting to find a method in the xts package, but could not find one.
Code to Reproduce Error:
require(xts)
set.seed(42)
# make irregular index
irr_index <- seq(from=as.Date("2010-01-19"), length.out=10, by=7)
# make irregular xts
irr_xts <- xts( x= matrix( data= rnorm(20), ncol= 2,
dimnames= list(c(1:length(irr_index)),
c("C.1", "C.2"))),
order.by= irr_index)
# make regular index
reg_index <- seq(from=as.Date(start(irr_xts)), to=as.Date(end(irr_xts)), by=1)
empty <- xts(matrix(data = NA,
nrow = length(reg_index),
ncol = ncol(irr_xts)),
reg_index )
reg_xts <- na.fill(merge(irr_xts, empty), fill=0)
In practice my real data are sporadic, sometimes daily, sometimes skipping several days. My approach is to normalize all data to 1 observation per day with 0 for days with missing values.
Thanks in advance.
EDIT:
Here is my sessionInfo() as requested:
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xts_0.9-7 zoo_1.7-10
loaded via a namespace (and not attached):
[1] grid_3.0.2 lattice_0.20-24 tools_3.0.2
This works fine for me, I just follow Joshua Ulrich link :
empty <- xts(,reg_index ) ## No need to set coredata to create empty xts
merge(irr_xts, empty, fill=0)
C.1 C.2
2010-01-19 1.370958 1.30487
2010-01-20 0.000000 0.00000
2010-01-21 0.000000 0.00000
2010-01-22 0.000000 0.00000
2010-01-23 0.000000 0.00000
2010-01-24 0.000000 0.00000
.....
Related
I am trying to use the Mclust() function from the R-package mclust on a dataset with 500 observations and 2 variables, and I want to identify 2 clusters.
> head(data)
x y
1 0.9929185 -1.9662945
2 8.2259360 -0.7240049
3 3.3866952 -1.8054764
4 -0.5161490 -2.3096992
5 1.8931073 -1.8928091
6 4.0833228 -1.9045669
> Mclust(data, G = 2)
fitting ...
|=============================================================== | 67%
This should produce an output relatively quickly, but freezes at 67%.
I ran this function multiple times over different datasets, and had no problems whatsoever. It even works if I only include observations up to row 498, but fails as soon as row 499+ is included.
498 -1.710175250 -1.612248596
499 -5.666497204 5.565422240
500 -3.649579976 1.552779499
I have uploaded the whole dataset in my GitHub repository: https://github.com/fstermann/bthesis/tree/main/MclustFreeze
I would greatly appreciate if anyone has an idea why this is happing with this specific dataset.
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mclust_5.4.7
loaded via a namespace (and not attached):
[1] compiler_4.0.5 tools_4.0.5
With R 3.6 I can perform the following NA replacement
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
> d
a b
2021-03-03 1 1
With R 4.0 I get the following error:
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
Error in as.Date.default(e) :
do not know how to convert 'e' to class “Date”
Has some default behavior changed in R 4.0?
R 3.6 session info:
Microsoft Windows [Version 10.0.19041.804]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\>R --no-site-file
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Warning message:
package 'zoo' was built under R version 4.0.4
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
> d
a b
2021-03-03 1 1
R 4.0 session info:
Microsoft Windows [Version 10.0.19041.804]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\>R --no-site-file
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
> d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
> d[is.na(d)] <- 1
Error in as.Date.default(e) :
do not know how to convert 'e' to class "Date"
Session Info (3.6):
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_3.6.1 grid_3.6.1 lattice_0.20-38
Session Info (4.0):
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_4.0.4 tools_4.0.4 grid_4.0.4 lattice_0.20-41
Thanks for raising this issue, it was a bug in the zoo package. In the [.zoo and [<-.zoo methods we checked whether the index i was a matrix via
if (all(class(i) == "matrix")) ...
This worked correctly in R 3.x.y because matrix objects just had the class "matrix". However, in R 4.0.0 matrix objects started additionally inheriting from "array". See: https://developer.R-project.org/Blog/public/2019/11/09/when-you-think-class.-think-again/.
In the zoo development version on R-Forge (https://R-Forge.R-project.org/R/?group_id=18) I have fixed the issue now by replacing the above code with
if (inherits(i, "matrix")) ...
So you can already install zoo 1.8-9 from R-Forge and your code will work again as intended. Alternatively, you can wait for that version to arrive on CRAN which will hopefully come out in the next days after reverse dependency checks. In the mean time you can work around the issue by using
coredata(d)[is.na(d)] <- 1
I'm having an issue with this as well!! here's more odd behavior/breadcrumbs as to what might be happening. Still not sure WHY but seems to be an indexing issue w/ zoo rather than is.na() specifically. Logical indexing works if the structure of the logical has the same rownames/indices as the zoo obj:
Printing out d[is.na(d)] (without assignment) results in an empty zoo object, suggesting the issue is w/ indexing
wrapping d in coredata() works
d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
coredata(d)[is.na(d)] <- 1
d
a b
> 2021-03-05 1 1
The logical returned by is.na() will work if it is transformed to have the same rownames/indices as the zoo obj.
d <- zoo(data.frame(a = NA, b = 1), Sys.Date())
changes <- is.na(d) #storing logical in a variable
> d
a b
2021-03-05 NA 1
> changes #d[changes] won't work, so change rownames
a b
[1,] TRUE FALSE
> changes <- as.zoo(changes, index(d))
> changes
a b
2021-03-05 TRUE FALSE
> d[as.logical(changes)] #changing zoo back to a logical, returns something
a b
2021-03-05 NA 1
> d[as.logical(changes)] <- 1
a b
2021-03-05 NA 1
Now for the breadcrumbs...does anybody know what changes were made to R's date class in version 4? Zoo suggest it made some changes to merge.zoo "explicitly to work around the new behavior of c.Date() in R >= 4.1.0."
https://cran.r-project.org/web/packages/zoo/NEWS (see bullet point #2 at the top)
I've searched and searched and see no mention of those changes...
I'm guessing there were some changes to the zoo class to more strictly enforce date indexing...unsure...also can't quite seem to figure out to post an issue w/ zoo
more breadcrumbs....
apparently this functionality was working for zoo objects back in April of 2020 according to this thread https://github.com/joshuaulrich/xts/issues/331
R 4.0.1 came out later that month and newest zoo package, 1.8-8, was released 5/2020 so maybe running version 1.8-7 of this package could determine if it's a change w/ R or a change w/ zoo that's causing different behavior
#neilfws you are correct that the issue is due to the change in class response in R 4.0.
For now, the best option is to use either:
d <- na.fill(d, 1)
or
coredata(d)[is.na(d)] <- 1
The old use case of d[is.na(d)] <- 1 will require an update to the zoo package
I am working with time data, and I covert it to POSIXct class (read as strings). When I do this it work with all my data but no with one specific string. What I do is in essences:
Time1 <- '1900-04-01' # First Year then Month then Day
Time1_convert <- as.POSIXct( Time1, format='%Y-%m-%d')
I do this vectorized and all my data is well converted. But with the date 1920-05-01
Time1 <- '1920-05-01'
Time1_convert <- as.POSIXct( Time1, format='%Y-%m-%d' )
This return NA. I have no idea why this happens. If I add to the as.POSIXct function tz = 'GMT'; the time is well convert for all values. What I do not understand is why this happen and why this happen with this specific value when I have tried with more than 1500 different times values.
I add an image of the output:
More code added:
for( m in c(01,02,03,04,05,06,07,08,09,10,11,12)){
print(as.POSIXct(paste0('1920-',m,'-01'),format='%Y-%m-%d'))
}
and the output is:
[1] "1920-01-01 CMT"
[1] "1920-02-01 CMT"
[1] "1920-03-01 CMT"
[1] "1920-04-01 CMT"
[1] NA
[1] "1920-06-01 -04"
[1] "1920-07-01 -04"
[1] "1920-08-01 -04"
[1] "1920-09-01 -04"
[1] "1920-10-01 -04"
[1] "1920-11-01 -04"
[1] "1920-12-01 -04"
Output of sessionInfo():
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
locale:
[1] LC_CTYPE=es_AR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_AR.UTF-8 LC_COLLATE=es_AR.UTF-8
[5] LC_MONETARY=es_AR.UTF-8 LC_MESSAGES=es_AR.UTF-8
[7] LC_PAPER=es_AR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_3.3.3
Your local settings appear to be based in Argentina. As it happens, Argentina reset their time zone on that date from UTC-4:16:48 to UTC-4. I think this means that there wasn't a midnight in Argentina on May 5, 1920. When you convert that string to POSIXct, it interprets it at midnight that day in your local time zone, which by coincidence is a time that did not exist in Argentina. (This explains why it was not reproducible for others who tried the same code.)
http://www.statoids.com/tar.html
Locations in Argentina observed Local Mean Time until 1894-10-31 00:00
(as measured after the transition). At that moment, the entire country
synchronized on Córdoba's Local Mean Time, which was UTC-4:16:48. The
next transition occurred at 1920-05-01 00:00, when clocks were set
ahead sixteen minutes and forty-eight seconds to be an even UTC-4.
Argentina remained unified on UTC-4 until its first daylight saving
time was inaugurated in 1931.
If you need a POSIXct object, you might consider:
a) specifying a different time zone where midnight existed on that day.
as.POSIXct("1920-05-01", tz = "UTC")
# Or perhaps other nearby time zones didn't have that specific problem?
b) Storing the time in components, including one for date, and one for time within the day. e.g. time = hour(Time1) + minute(Time1)/60. It's a little unwieldy but it might be possible to perform the date / time calcs you need.
I would like to know if it is possible to make a heatmap with 200k rows? I have a matrix with genomic coordinates as rows and each column represents the presence or absence of that region in each patients. so I have 15 patient (columns). Now when I am trying heatmap.2 or pheatmap I get the memory allocation problem, how will I be able to use the entire matrix to generate a heatmap. The values of my matrix are jsut 0 and 1 and I want to draw some hypothesis based on the heatmap. How will I be able to use it. I tried once on my laptop but it does not help , so I tried in our linux cluster which is quite powerful. But still shows the below error. How can I resolve this problem. I also add the sessionInfo() . Any workaround is appreciated
data<-read.delim("path/H3_marks_map.txt",sep="\t", row.names=1)
pdf(file="path/map/H3_maps1.pdf")
pheatmap(data,scale="none")
Error: cannot allocate vector of size 170.5 Gb
dev.off()
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=C LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 pheatmap_1.0.7
loaded via a namespace (and not attached):
[1] colorspace_1.2-6 grid_3.1.2 gtable_0.1.2 munsell_0.4.2
[5] plyr_1.8.3 Rcpp_0.12.1 scales_0.3.0 tools_3.1.2
I have an automated report that i produce using knitr. i'm running across the oddest problem. I wrote a function that sums the data by month for several locations. when i run this function in R i get the following result (which is correct):
###NAME MONTH VOL
###1: TOTAL 1 13.00872
###2: TOTAL 2 11.62527
###3: TOTAL 3 12.71313
###4: TOTAL 4 12.67269
###5: TOTAL 5 15.05127
###6: TOTAL 6 14.61002
###7: TOTAL 7 15.43827
###8: TOTAL 8 15.22400
###9: TOTAL 9 14.91259
###10: TOTAL 10 15.83505
###11: TOTAL 11 14.97242
###12: TOTAL 12 16.34950
when i run this same function (no changes) through knitr to produce the report i get the following result:
###NAME MONTH VOL
###1: TOTAL 1 14.00872
###2: TOTAL 2 13.62527
###3: TOTAL 3 15.71313
###4: TOTAL 4 16.11338
###5: TOTAL 5 17.61269
###6: TOTAL 6 18.46945
###7: TOTAL 7 20.18851
###8: TOTAL 8 21.04382
###9: TOTAL 9 21.72287
###10: TOTAL 10 23.54272
###11: TOTAL 11 23.72971
###12: TOTAL 12 26.03293
i also have another table where knitr just prints non-sense even though the table has actual values in it.
Here is my session info:
R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] lubridate_1.3.3 xtable_1.7-4 shape_1.4.2 reshape2_1.4.1 rgdal_0.9-2 raster_2.2-12
[7] sp_1.0-17 png_0.1-7 data.table_1.9.2
loaded via a namespace (and not attached): [1] digest_0.6.8 evaluate_0.5.5 formatR_1.1 grid_3.1.2 knitr_1.9 lattice_0.20-29 memoise_0.2.1
[8] packrat_0.4.3 plyr_1.8.1 Rcpp_0.11.5 stringr_0.6.2 tools_3.1.2
UPDATE
i pinpointed the problem on at least one of the tables that this error occurs. The problem was the setnames function and the key merge feature of data.table.
when the merge happens R recognizes duplicate column names using a ".1" notation (i.e., if table1 and table 2 both have columns names CHEM then TABLE = table1[table2] has columns named CHEM and CHEM.1) whereas knitr is transforming them into CHEM and i.CHEM. to fix this, i originally used the code setnames(TABLE,names(TABLE),c(New column names)). but this didn't recognize the names(TABLE) in the correct order so i was renaming the wrong columns. but this error only happened when it was passed through knitr. when i ran this code through R alone it worked properly. What is the diconnect between knitr and data.table?
I will work on getting an example code up but as it stands the code would need to be simplified to make posting an example helpful.