I am working with time data that I convert to the POSIXct class (it is read in as strings). The conversion works for all of my data except one specific string. In essence, what I do is:
Time1 <- '1900-04-01'  # year, then month, then day
Time1_convert <- as.POSIXct(Time1, format = '%Y-%m-%d')
I run this vectorized over all of my data and everything converts correctly, except for the date 1920-05-01:
Time1 <- '1920-05-01'
Time1_convert <- as.POSIXct(Time1, format = '%Y-%m-%d')
This returns NA, and I have no idea why. If I add tz = 'GMT' to the as.POSIXct call, every value converts correctly. What I do not understand is why this happens, and why only for this specific value, when I have tried more than 1500 different time values.
More code added:
for( m in c(01,02,03,04,05,06,07,08,09,10,11,12)){
print(as.POSIXct(paste0('1920-',m,'-01'),format='%Y-%m-%d'))
}
and the output is:
[1] "1920-01-01 CMT"
[1] "1920-02-01 CMT"
[1] "1920-03-01 CMT"
[1] "1920-04-01 CMT"
[1] NA
[1] "1920-06-01 -04"
[1] "1920-07-01 -04"
[1] "1920-08-01 -04"
[1] "1920-09-01 -04"
[1] "1920-10-01 -04"
[1] "1920-11-01 -04"
[1] "1920-12-01 -04"
Output of sessionInfo():
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
locale:
[1] LC_CTYPE=es_AR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_AR.UTF-8 LC_COLLATE=es_AR.UTF-8
[5] LC_MONETARY=es_AR.UTF-8 LC_MESSAGES=es_AR.UTF-8
[7] LC_PAPER=es_AR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_3.3.3
Your locale settings suggest you are based in Argentina. As it happens, Argentina reset its time zone on exactly that date, from UTC-4:16:48 to UTC-4. This means there was no midnight in Argentina on May 1, 1920. When you convert that string to POSIXct, it is interpreted as midnight of that day in your local time zone, which happens to be a time that never existed in Argentina. (This also explains why others who tried the same code could not reproduce the problem.)
http://www.statoids.com/tar.html
Locations in Argentina observed Local Mean Time until 1894-10-31 00:00
(as measured after the transition). At that moment, the entire country
synchronized on Córdoba's Local Mean Time, which was UTC-4:16:48. The
next transition occurred at 1920-05-01 00:00, when clocks were set
ahead sixteen minutes and forty-eight seconds to be an even UTC-4.
Argentina remained unified on UTC-4 until its first daylight saving
time was inaugurated in 1931.
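A hedged demonstration of the gap (the exact behaviour depends on your system's time-zone database and on how its C library reports nonexistent wall-clock times):
as.POSIXct("1920-05-01 00:16:48", tz = "America/Argentina/Buenos_Aires")  # first instant that existed that day
as.POSIXct("1920-05-01 00:00:00", tz = "America/Argentina/Buenos_Aires")  # likely NA: the clocks skipped it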
If you need a POSIXct object, you might consider:
a) specifying a different time zone where midnight existed on that day.
as.POSIXct("1920-05-01", tz = "UTC")
# Or perhaps other nearby time zones didn't have that specific problem?
b) Storing the time in components: one for the date and one for the time within the day, e.g. time = hour(Time1) + minute(Time1)/60. It's a little unwieldy, but it may still let you perform the date/time calculations you need; a sketch follows below.
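A minimal sketch of option (b), assuming the lubridate package (hour() and minute() come from there); the timestamp used here is purely illustrative:
library(lubridate)
x   <- ymd_hms("1920-05-01 06:30:00")  # lubridate parses in UTC by default
d   <- as.Date(x)                      # date component
tod <- hour(x) + minute(x)/60          # fractional hours within the day, here 6.5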
Related
I am taking my first steps in R. I have a series of exercises, and one is especially difficult for me. I need to create a series like this:
a1, b10, c100, d1000, ..., j1000000000
Combining consecutive letters with numbers is not an issue, but how do I generate a series of numbers where each one is the previous one multiplied by 10?
I have a gut feeling that it is not hard at all, but I can't figure it out.
We can use ^.
We may have to use options(scipen = 999) to keep scientific notation out of the final character vector when the numbers are large. Note that floating-point issues arise once the numbers exceed double precision, as the last few elements below show.
options(scipen=999)
paste0(letters, 10^(0:25))
[1] "a1" "b10"
[3] "c100" "d1000"
[5] "e10000" "f100000"
[7] "g1000000" "h10000000"
[9] "i100000000" "j1000000000"
[11] "k10000000000" "l100000000000"
[13] "m1000000000000" "n10000000000000"
[15] "o100000000000000" "p1000000000000000"
[17] "q10000000000000000" "r100000000000000000"
[19] "s1000000000000000000" "t10000000000000000000"
[21] "u100000000000000000000" "v1000000000000000000000"
[23] "w10000000000000000000000" "x100000000000000008388608"
[25] "y 999999999999999983222784" "z10000000000000000905969664"
Use
10^(0:n)
For example
> 10^(0:3)
[1] 1 10 100 1000
Or, even better, use a named vector:
x<-10^(0:3)
names(x)<-paste0("a_",0:3)
> x
a_0 a_1 a_2 a_3
1 10 100 1000
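For the exact ten-term series the question asks for (a1 through j1000000000), a hedged sketch that avoids the global scipen option by formatting locally:
paste0(letters[1:10], format(10^(0:9), scientific = FALSE, trim = TRUE))
# [1] "a1" "b10" "c100" ... "j1000000000"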
I have run the exact same R script on two different machines and have gotten different results. I don't understand why...
Consider this integer as seconds since 1990-01-01 00:00:00 GMT+10
int <- 817779600
I need to convert this to a POSIXct object and to do so I run the following R script on a computer running "OSX 10.11.4":
> time <- as.POSIXct(as.POSIXlt(int, origin='1990-01-01 00:00:00',
tz='GMT'), tz='Australia/Sydney')
> time
[1] "2015-12-01 01:00:00 AEDT"
> as.integer(time)
[1] 1448892000
Now if I run the exact same code on another computer running "Ubuntu 16.04.1 LTS" I get a different result:
> time <- as.POSIXct(as.POSIXlt(int, origin='1990-01-01 00:00:00',
tz='GMT'), tz='Australia/Sydney')
> time
[1] "2015-12-01 02:00:00 AEDT"
> as.integer(time)
[1] 1448895600
I know the first value to be correct. Any idea what may be causing the discrepancy? Running date in bash on both machines gives the same result. Also, is anyone able to reproduce this?
Thanks in advance.
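A hedged diagnostic sketch: the usual suspect for this kind of machine-dependent result is that the two systems carry different versions of the tzdata (Olson) rules for Australia/Sydney, so the same UTC instant formats differently on each:
attr(OlsonNames(), "version")  # database version; this attribute exists on R >= 3.6.0 only
x <- as.POSIXct(1448892000, origin = "1970-01-01", tz = "Australia/Sydney")
format(x, "%Y-%m-%d %H:%M:%S %Z")  # which wall-clock time do the local rules assign?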
I would like to know whether it is possible to make a heatmap from a matrix with 200k rows. My matrix has genomic coordinates as rows, and each column represents the presence or absence of that region in one of 15 patients (so 15 columns). When I try heatmap.2 or pheatmap, I get a memory allocation problem; how can I use the entire matrix to generate a heatmap? The values of my matrix are just 0 and 1, and I want to draw some hypotheses based on the heatmap. I tried on my laptop without success, and then on our Linux cluster, which is quite powerful, but it still shows the error below. How can I resolve this problem? I have also added the sessionInfo(). Any workaround is appreciated.
data<-read.delim("path/H3_marks_map.txt",sep="\t", row.names=1)
pdf(file="path/map/H3_maps1.pdf")
pheatmap(data,scale="none")
Error: cannot allocate vector of size 170.5 Gb
dev.off()
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=C LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 pheatmap_1.0.7
loaded via a namespace (and not attached):
[1] colorspace_1.2-6 grid_3.1.2 gtable_0.1.2 munsell_0.4.2
[5] plyr_1.8.3 Rcpp_0.12.1 scales_0.3.0 tools_3.1.2
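The 170.5 Gb request almost certainly comes from the row-distance matrix that pheatmap's default row clustering needs (roughly 200k x 200k doubles). A hedged workaround sketch: with 15 binary columns there are at most 2^15 = 32768 distinct row patterns, so collapsing duplicate rows first makes clustering feasible; alternatively, turn row clustering off.
library(pheatmap)
m   <- as.matrix(data)                      # the 0/1 matrix from the question
key <- apply(m, 1, paste, collapse = "")    # one pattern string per row
u   <- m[!duplicated(key), , drop = FALSE]  # unique patterns only (at most 32768 rows)
pheatmap(u, scale = "none")                 # or: pheatmap(m, cluster_rows = FALSE)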
I have an automated report that I produce using knitr, and I'm running across the oddest problem. I wrote a function that sums the data by month for several locations. When I run this function in R, I get the following result (which is correct):
###NAME MONTH VOL
###1: TOTAL 1 13.00872
###2: TOTAL 2 11.62527
###3: TOTAL 3 12.71313
###4: TOTAL 4 12.67269
###5: TOTAL 5 15.05127
###6: TOTAL 6 14.61002
###7: TOTAL 7 15.43827
###8: TOTAL 8 15.22400
###9: TOTAL 9 14.91259
###10: TOTAL 10 15.83505
###11: TOTAL 11 14.97242
###12: TOTAL 12 16.34950
When I run this same function (no changes) through knitr to produce the report, I get the following result:
###NAME MONTH VOL
###1: TOTAL 1 14.00872
###2: TOTAL 2 13.62527
###3: TOTAL 3 15.71313
###4: TOTAL 4 16.11338
###5: TOTAL 5 17.61269
###6: TOTAL 6 18.46945
###7: TOTAL 7 20.18851
###8: TOTAL 8 21.04382
###9: TOTAL 9 21.72287
###10: TOTAL 10 23.54272
###11: TOTAL 11 23.72971
###12: TOTAL 12 26.03293
I also have another table where knitr prints nonsense even though the table has actual values in it.
Here is my session info:
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.3.3 xtable_1.7-4 shape_1.4.2 reshape2_1.4.1 rgdal_0.9-2 raster_2.2-12
[7] sp_1.0-17 png_0.1-7 data.table_1.9.2
loaded via a namespace (and not attached):
[1] digest_0.6.8 evaluate_0.5.5 formatR_1.1 grid_3.1.2 knitr_1.9 lattice_0.20-29 memoise_0.2.1
[8] packrat_0.4.3 plyr_1.8.1 Rcpp_0.11.5 stringr_0.6.2 tools_3.1.2
UPDATE
I pinpointed the problem in at least one of the tables where this error occurs: the setnames function combined with data.table's keyed merge.
When the merge happens, R disambiguates duplicate column names with a ".1" suffix (i.e., if table1 and table2 both have a column named CHEM, then TABLE = table1[table2] has columns named CHEM and CHEM.1), whereas under knitr they come out as CHEM and i.CHEM. To fix this, I originally used setnames(TABLE, names(TABLE), c(new column names)), but that positional renaming did not pick up names(TABLE) in the order I expected, so I was renaming the wrong columns. This error only happened when the code was passed through knitr; when I ran it in R alone, it worked properly. What is the disconnect between knitr and data.table?
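A hedged sketch (the table and column names are illustrative) of a rename that tolerates either suffix convention by matching the duplicated columns by name rather than renaming by position:
library(data.table)
table1 <- data.table(ID = 1:3, CHEM = c("a", "b", "c"), key = "ID")
table2 <- data.table(ID = 1:3, CHEM = c("x", "y", "z"), key = "ID")
TABLE  <- table1[table2]                         # yields CHEM plus i.CHEM or CHEM.1, version-dependent
old <- grep("CHEM", names(TABLE), value = TRUE)  # catches CHEM, CHEM.1, and i.CHEM alike
setnames(TABLE, old, paste0("CHEM_", seq_along(old)))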
I will work on getting example code up, but as it stands the code would need to be simplified to make posting an example helpful.
I am trying to make an irregular multivariate time series regular. I am doing this by merging the irregular series (one measurement every 7 days) with a regular, NA-filled series (daily measurements), as suggested by:
- Joshua Ulrich here.
- Dirk Eddelbuettel here.
When I try this method for multivariate time series, I get the error:
"Error in colnames<-(*tmp*, value = c("C.1", "C.2", "C.1.1", "C.2.1" : length of 'dimnames' [2] not equal to array extent"
My question is twofold:
How can I merge these two xts data sets without getting this error?
Is there a "better" way of making an irregular multivariate time series regular? I guess I was expecting to find a method in the xts package, but could not find one.
Code to Reproduce Error:
require(xts)
set.seed(42)
# make irregular index
irr_index <- seq(from=as.Date("2010-01-19"), length.out=10, by=7)
# make irregular xts
irr_xts <- xts( x= matrix( data= rnorm(20), ncol= 2,
dimnames= list(c(1:length(irr_index)),
c("C.1", "C.2"))),
order.by= irr_index)
# make regular index
reg_index <- seq(from=as.Date(start(irr_xts)), to=as.Date(end(irr_xts)), by=1)
empty <- xts(matrix(data = NA,
nrow = length(reg_index),
ncol = ncol(irr_xts)),
reg_index )
reg_xts <- na.fill(merge(irr_xts, empty), fill=0)
In practice my real data are sporadic, sometimes daily, sometimes skipping several days. My approach is to normalize all data to 1 observation per day with 0 for days with missing values.
Thanks in advance.
EDIT:
Here is my sessionInfo() as requested:
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xts_0.9-7 zoo_1.7-10
loaded via a namespace (and not attached):
[1] grid_3.0.2 lattice_0.20-24 tools_3.0.2
This works fine for me; I just followed the Joshua Ulrich link:
empty <- xts(,reg_index ) ## No need to set coredata to create empty xts
merge(irr_xts, empty, fill=0)
C.1 C.2
2010-01-19 1.370958 1.30487
2010-01-20 0.000000 0.00000
2010-01-21 0.000000 0.00000
2010-01-22 0.000000 0.00000
2010-01-23 0.000000 0.00000
2010-01-24 0.000000 0.00000
.....
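For reference, the same idea as a single hedged one-liner (using the objects defined in the question):
reg_xts <- merge(irr_xts, xts(, seq(start(irr_xts), end(irr_xts), by = "day")), fill = 0)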