as.xts from matrix - Error - Adds time and timezone info

For some reason I do not understand, when I run as.xts to convert a matrix that has dates in its row names, the resulting index is a date-time (with a time zone) rather than a date. Since this differs from the original index, merge/cbind will not work.
Can someone point out what I am doing wrong?
> class(x)
[1] "xts" "zoo"
> head(x)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 0.042255791 0.017219585 0.17841600 0.010806168 0.04960026
2005-08-31 0.034117087 0.009951766 0.18476766 0.015245222 0.03825968
2005-09-30 -0.029594066 0.008697349 0.22851906 0.009769765 0.02944754
2005-10-31 -0.015653740 0.019966664 0.09314327 -0.012705172 0.01640395
2005-11-30 -0.005593003 0.005932542 0.05437377 -0.005209811 0.03173972
2005-12-31 0.005084193 0.021293537 0.05672958 0.002592639 0.04045477
> head(index(x))
[1] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31"
> temp=t(apply(-x, 1, rank, na.last = "keep"))
> class(temp)
[1] "matrix"
> head(temp)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 3 4 1 5 2
2005-08-31 3 5 1 4 2
2005-09-30 5 4 1 3 2
2005-10-31 5 2 1 4 3
2005-11-30 5 3 1 4 2
2005-12-31 4 3 1 5 2
> head(rownames(temp))
[1] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31"
> y=as.xts(temp)
> class(y)
[1] "xts" "zoo"
> head(y)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 3 4 1 5 2
2005-08-31 3 5 1 4 2
2005-09-30 5 4 1 3 2
2005-10-31 5 2 1 4 3
2005-11-30 5 3 1 4 2
2005-12-31 4 3 1 5 2
> head(index(y))
[1] "2005-07-31 BST" "2005-08-31 BST" "2005-09-30 BST" "2005-10-31 GMT" "2005-11-30 GMT" "2005-12-31 GMT"

as.xts.matrix has a dateFormat argument that defaults to "POSIXct", so it assumes the rownames of your matrix are datetimes. If you want them to simply be dates, specify dateFormat="Date" in your as.xts call.
y <- as.xts(temp, dateFormat="Date")
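To see why the two indexes differ, here is a base-R sketch (no xts needed; the row names are taken from the question, and the "Europe/London" zone is an assumption chosen to reproduce the BST/GMT labels above). It assumes, as the answer describes, that as.xts.matrix parses the row names with the conversion named by dateFormat: as.POSIXct gives date-times at local midnight carrying a time zone, while as.Date gives plain calendar days.

```r
rn <- c("2005-07-31", "2005-10-31", "2005-12-31")

as_ct <- as.POSIXct(rn, tz = "Europe/London")  # date-times at local midnight
as_d  <- as.Date(rn)                           # calendar dates, no time or zone

class(as_ct)  # "POSIXct" "POSIXt"
class(as_d)   # "Date"

# The POSIXct values keep a clock time and a zone abbreviation (BST or GMT,
# depending on the date), which is what shows up in index(y) in the question
format(as_ct, "%Y-%m-%d %H:%M:%S %Z")
format(as_d)
```

Because an index of class Date and an index of class POSIXct are different types, objects built with one do not line up with objects built with the other, which is why passing dateFormat = "Date" fixes the merge.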

Related

The date of `v` that gives the smallest positive difference `w-v`

From these vectors of dates
v<-c("2019-12-06 01:32:30 UTC","2019-12-31 18:44:31 UTC","2020-01-29 22:18:25 UTC","2020-03-22 16:44:29 UTC")
v<-as.POSIXct(v)
w<-c("2019-12-07 00:11:46","2020-01-01 05:29:45","2019-12-08 02:54:10","2020-03-23 07:48:26","2020-02-02 16:58:16","2020-01-31 06:46:46")
w<-as.POSIXct(w)
I would like to obtain a data frame with two columns. One of them is just w. The other is built from the entries of v, so that each row holds the date of v that gives the smallest positive difference w-v. For example, the difference
w-rep(v[1],length(w))
Time differences in hours
[1] 22.65444 627.95417 49.36111 2598.26556 1407.42944 1349.23778
So, if the second column of the desired data frame is w, the first column should hold 2019-12-06 01:32:30 UTC in its first row. For that row the operation would be:
date <- w-rep(v[1],length(w))
v[date==min(date[date>0])]
Then the first row of the dataframe should be
2019-12-06 01:32:30 UTC, 2019-12-07 00:11:46
How could I build the other rows without using loops?
How about this:
o <- outer(w, v, `-`)
o
# Time differences in hours
# [,1] [,2] [,3] [,4]
# [1,] 22.65444 -594.54583 -1294.11083 -2559.54528
# [2,] 627.95417 10.75389 -688.81111 -1954.24556
# [3,] 49.36111 -567.83917 -1267.40417 -2532.83861
# [4,] 2597.26556 1980.06528 1280.50028 15.06583
# [5,] 1407.42944 790.22917 90.66417 -1174.77028
# [6,] 1349.23778 732.03750 32.47250 -1232.96194
We don't want negative values, so
o[o < 0] <- NA
o
# Time differences in hours
# [,1] [,2] [,3] [,4]
# [1,] 22.65444 NA NA NA
# [2,] 627.95417 10.75389 NA NA
# [3,] 49.36111 NA NA NA
# [4,] 2597.26556 1980.06528 1280.50028 15.06583
# [5,] 1407.42944 790.22917 90.66417 NA
# [6,] 1349.23778 732.03750 32.47250 NA
Now just apply which.min on each row, then subset v on this value:
apply(o, 1, which.min)
# [1] 1 2 1 4 3 3
v[apply(o, 1, which.min)]
# [1] "2019-12-06 01:32:30 EST" "2019-12-31 18:44:31 EST" "2019-12-06 01:32:30 EST" "2020-03-22 16:44:29 EDT"
# [5] "2020-01-29 22:18:25 EST" "2020-01-29 22:18:25 EST"
data.frame(w=w, v2=v[apply(o, 1, which.min)])
# w v2
# 1 2019-12-07 00:11:46 2019-12-06 01:32:30
# 2 2020-01-01 05:29:45 2019-12-31 18:44:31
# 3 2019-12-08 02:54:10 2019-12-06 01:32:30
# 4 2020-03-23 07:48:26 2020-03-22 16:44:29
# 5 2020-02-02 16:58:16 2020-01-29 22:18:25
# 6 2020-01-31 06:46:46 2020-01-29 22:18:25
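A possible alternative to building the full outer() matrix (a different technique from the answer above) is base R's findInterval(): for each w it returns the index of the last element of a sorted v that is less than or equal to w, i.e. the v giving the smallest non-negative difference w - v. This sketch assumes both vectors are parsed in UTC to sidestep daylight-saving shifts, and sorts v first because findInterval() requires non-decreasing breaks; an index of 0 means no v precedes that w and is mapped to NA.

```r
v <- as.POSIXct(c("2019-12-06 01:32:30", "2019-12-31 18:44:31",
                  "2020-01-29 22:18:25", "2020-03-22 16:44:29"), tz = "UTC")
w <- as.POSIXct(c("2019-12-07 00:11:46", "2020-01-01 05:29:45",
                  "2019-12-08 02:54:10", "2020-03-23 07:48:26",
                  "2020-02-02 16:58:16", "2020-01-31 06:46:46"), tz = "UTC")

vs  <- sort(v)                                      # findInterval() needs sorted breaks
idx <- findInterval(as.numeric(w), as.numeric(vs))  # last vs[i] <= w, or 0 if none
idx[idx == 0] <- NA                                 # w earlier than every v: no match

result <- data.frame(w = w, v2 = vs[idx])
idx
#> [1] 1 2 1 4 3 3
```

The indices match the which.min() result from the outer() approach, but no length(w) by length(v) matrix is built, which may matter when both vectors are long.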

Compute distance between multiple sets of coordinates

I have a dataset of coordinates that are merged by time into one dataframe, with the individual IDs in the header. For example:
> Date_time<-c("2015/03/04 01:00:00","2015/03/04 02:00:00","2015/03/04 03:00:00","2015/03/04 04:00:00")
> lat.1<-c(63.81310,63.83336,63.83250,63.82237)
> long.1<-c(-149.1176,-149.0193,-149.0249,-149.0408)
> lat.2<-c(63.85893 ,63.85885,63.86108,63.86357)
> long.2<-c(-151.1336,-151.1336,-151.1236,-151.1238)
> lat.3<-c(63.87627,63.87670, 63.85044,63.85052)
> long.3<-c(-149.5029,-149.5021,-149.5199,-149.5199)
>
> data<-data.frame(Date_time,lat.1,long.1,lat.2,long.2,lat.3,long.3)
> data
Date_time lat.1 long.1 lat.2 long.2 lat.3 long.3
1 2015/03/04 01:00:00 63.8131 -149.1176 63.85893 -151.1336 63.87627 -149.5029
2 2015/03/04 02:00:00 63.8131 -149.1176 63.85893 -151.1336 63.87627 -149.5029
3 2015/03/04 03:00:00 63.8131 -149.1176 63.85893 -151.1336 63.87627 -149.5029
4 2015/03/04 04:00:00 63.8131 -149.1176 63.85893 -151.1336 63.87627 -149.5029
I want to calculate the distance between each of the individuals, so between 1 and 2, 1 and 3, and 2 and 3. My dataframe has many more individuals than this so I am hoping to apply a loop function.
I can do them individually using
> data$distbetween12<-distHaversine(cbind(data$long.1,data$lat.1), cbind(data$long.2,data$lat.2))
> data$distbetween12
[1] 99083.48 99083.48 99083.48 99083.48
But can I calculate all the pairwise distances without typing out every pair combination?
Thank you!
Here's a solution that relies on the combn function to generate the necessary combinations. If you have more than 3 pairs of lat, long columns, just change the first number in the combn function to the correct number of pairs.
Note that this solution also requires your columns to follow the naming scheme lat.1, long.1, lat.2, long.2, etc. strictly.
combos <- combn(3, 2)  # each column of combos is one pair of individual IDs
cbind(data, as.data.frame(`colnames<-`(apply(combos, 2, function(x) {
  lats <- paste0("lat.", x)
  lons <- paste0("long.", x)
  # distHaversine() takes (longitude, latitude) matrices and returns metres
  geosphere::distHaversine(cbind(data[[lons[1]]], data[[lats[1]]]),
                           cbind(data[[lons[2]]], data[[lats[2]]]))
}), apply(combos, 2, paste, collapse = " v "))))  # names like "1 v 2"
#> Date_time lat.1 long.1 lat.2 long.2 lat.3 long.3
#> 1 2015/03/04 01:00:00 63.81310 -149.1176 63.85893 -151.1336 63.87627 -149.5029
#> 2 2015/03/04 02:00:00 63.83336 -149.0193 63.85885 -151.1336 63.87670 -149.5021
#> 3 2015/03/04 03:00:00 63.83250 -149.0249 63.86108 -151.1236 63.85044 -149.5199
#> 4 2015/03/04 04:00:00 63.82237 -149.0408 63.86357 -151.1238 63.85052 -149.5199
#> 1 v 2 1 v 3 2 v 3
#> 1 99083.48 20172.13 79974.87
#> 2 103778.13 24168.80 80014.97
#> 3 103020.61 24374.46 78669.90
#> 4 102317.93 23724.27 78680.61
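If adding the geosphere dependency is not an option, the same pairwise idea can be sketched in base R with a hand-written haversine formula. The helper `haversine()` below is hypothetical (not part of any package), and its default radius r = 6378137 m is assumed to match geosphere::distHaversine's default so the numbers should be comparable. This version also infers the number of individuals from the lat.* column names instead of hard-coding 3 in combn().

```r
# Hypothetical helper: great-circle distance in metres on a sphere of radius r
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  a <- pmin(a, 1)  # guard against rounding pushing a slightly above 1
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

data <- data.frame(
  Date_time = c("2015/03/04 01:00:00", "2015/03/04 02:00:00",
                "2015/03/04 03:00:00", "2015/03/04 04:00:00"),
  lat.1  = c(63.81310, 63.83336, 63.83250, 63.82237),
  long.1 = c(-149.1176, -149.0193, -149.0249, -149.0408),
  lat.2  = c(63.85893, 63.85885, 63.86108, 63.86357),
  long.2 = c(-151.1336, -151.1336, -151.1236, -151.1238),
  lat.3  = c(63.87627, 63.87670, 63.85044, 63.85052),
  long.3 = c(-149.5029, -149.5021, -149.5199, -149.5199)
)

# Number of individuals, assuming the lat.k / long.k naming scheme
n_ind  <- length(grep("^lat\\.", names(data)))
combos <- combn(n_ind, 2)

# One column of distances per pair, one row per time step
dists <- apply(combos, 2, function(p) {
  haversine(data[[paste0("long.", p[1])]], data[[paste0("lat.", p[1])]],
            data[[paste0("long.", p[2])]], data[[paste0("lat.", p[2])]])
})
colnames(dists) <- apply(combos, 2, paste, collapse = " v ")

result <- cbind(data, dists)
```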

Add posixlt as a new column to a dataframe

I am creating some random numbers:
data <- matrix(runif(10, 0, 1), ncol = 2)
dataframe <- data.frame(data)
> dataframe
X1 X2
1 0.7981783 0.13233858
2 0.9592338 0.05512942
3 0.1812384 0.74571334
4 0.1447498 0.96656930
5 0.1735390 0.37345575
and I want to create a corresponding timestamp column and bind that to the above data frame.
time <- as.POSIXlt(runif(10, 0, 60), origin = "2017-05-05 10:00:00")
This creates 10 values.
> time
[1] "2017-05-05 13:00:27 EEST" "2017-05-05 13:00:02 EEST" "2017-05-05 13:00:26 EEST" "2017-05-05 13:00:25 EEST" "2017-05-05 13:00:28 EEST"
[6] "2017-05-05 13:00:17 EEST" "2017-05-05 13:00:35 EEST" "2017-05-05 13:00:08 EEST" "2017-05-05 13:00:29 EEST" "2017-05-05 13:00:32 EEST"
Now I want to bind it to the data frame, so my first thought was to make it a matrix:
time <- matrix(time, nrow = 5, ncol = 2)
but this gives me:
Warning message:
In matrix(time, nrow = 5, ncol = 2) :
data length [11] is not a sub-multiple or multiple of the number of rows [5]
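The reason is that POSIXlt stores a date-time as a list of components (sec, min, hour, and so on), whereas POSIXct stores it as a single numeric vector. matrix() therefore sees a list of 11 components, hence the "data length [11]" warning. So it is better to use as.POSIXct: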
time <- as.POSIXct(runif(10, 0, 60), origin = "2017-05-05 10:00:00")
To store the values as two columns, put them in a data.frame directly:
data.frame(date1 = time[1:5], date2 = time[6:10])
without converting to a matrix first, because matrix() drops the POSIXct class and keeps only the underlying numeric storage.
If we proceed with POSIXlt instead, unclass() shows its list of components:
time1 <- as.POSIXlt(runif(10, 0, 60), origin = "2017-05-05 10:00:00")
unclass(time1)
#$sec
# [1] 13.424695 40.860449 57.756890 59.072140 24.425521 39.429729 58.309546
# [8] 6.294982 46.613436 25.444415
#$min
# [1] 30 30 30 30 30 30 30 30 30 30
#$hour
# [1] 15 15 15 15 15 15 15 15 15 15
#$mday
# [1] 5 5 5 5 5 5 5 5 5 5
#$mon
# [1] 4 4 4 4 4 4 4 4 4 4
#$year
# [1] 117 117 117 117 117 117 117 117 117 117
#$wday
# [1] 5 5 5 5 5 5 5 5 5 5
#$yday
# [1] 124 124 124 124 124 124 124 124 124 124
#$isdst
# [1] 0 0 0 0 0 0 0 0 0 0
#$zone
# [1] "IST" "IST" "IST" "IST" "IST" "IST" "IST" "IST" "IST" "IST"
#$gmtoff
# [1] 19800 19800 19800 19800 19800 19800 19800 19800 19800 19800
#attr(,"tzone")
#[1] "" "IST" "IST"
With POSIXct, unclass() shows only the numeric storage (seconds since 1970-01-01 UTC):
unclass(time)
#[1] 1493978445 1493978451 1493978432 1493978402 1493978447 1493978441
#[7] 1493978445 1493978450 1493978419 1493978425
#attr(,"tzone")
#[1] ""
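Back to the original goal of adding timestamps as a new column: a minimal base-R sketch, assuming one timestamp per row (5 values rather than 10) and UTC for a stable display. It shows a POSIXct column added with `$<-`, a POSIXlt vector converted with as.POSIXct() before it is added, and how matrix() strips the class.

```r
set.seed(42)  # only to make the random numbers reproducible
dataframe <- data.frame(matrix(runif(10, 0, 1), ncol = 2))

# One POSIXct timestamp per row
time_ct <- as.POSIXct(runif(5, 0, 60), origin = "2017-05-05 10:00:00", tz = "UTC")
dataframe$time <- time_ct

# A POSIXlt vector can be converted to POSIXct before it is added as a column
time_lt <- as.POSIXlt(time_ct)
dataframe$time_from_lt <- as.POSIXct(time_lt)

# matrix() drops the class: only the underlying double values remain
m <- matrix(time_ct, ncol = 1)

str(dataframe)
class(m)
```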

xts operations yield wrong result

Assume I have three xts objects a, m and s, indexed by the same time slots, and I want to compute abs((a*20)-m)/s. This works in the following simple case:
bla <- data.frame(c("2016-09-03 13:00", "2016-09-03 13:10", "2016-09-03 13:20"),c(1,2,3), c(4,5,6), c(7,8,9))
names(bla) <- c('ts','lin','qua','cub')
a <- as.xts(x = bla[,c('lin','qua','cub')], order.by=as.POSIXct(bla$ts))
# ... similar for m and s ...
abs((a*20)-m)/s
gives the correct results.
When I go to my real data, I see different behaviour:
> class(a)
[1] "xts" "zoo"
> class(m)
[1] "xts" "zoo"
> class(s)
[1] "xts" "zoo"
> dim(a)
[1] 1 4650
> dim(m)
[1] 1 4650
> dim(s)
[1] 1 4650
Also the column names are the same:
> setdiff(names(a),names(m))
character(0)
> setdiff(names(m),names(s))
character(0)
Now when I do n <- abs((a*20)-m)/s I get
> n[1,feature]
feature
2016-09-08 14:00:00 12687075516
but if I do the computation by hand:
> aa <- coredata((a*20)[1,feature])[1,1]
> mm <- coredata(m[1,feature])[1,1]
> ss <- coredata(s[1,feature])[1,1]
> abs(aa-mm)/ss
feature
0.0005893713
Just to give the original values:
> a[1,feature]
feature
2016-09-08 14:00:00 27955015680
> m[1,feature]
feature
2016-09-08 14:00:00 559150430034
> s[1,feature]
feature
2016-09-08 14:00:00 85033719103
Can anyone explain this discrepancy?
Thanks a lot
Norbert
Self-answering: my mistake was believing that xts is smart enough to match columns by name in a/b, which it is not.
> a
lin qua cub
2016-09-03 13:00:00 1 4 7
2016-09-03 13:10:00 2 5 8
2016-09-03 13:20:00 3 6 9
> b
qua lin cub
2016-09-03 13:00:00 2 3 4
2016-09-03 13:10:00 2 3 4
2016-09-03 13:20:00 2 3 4
> a/b
lin qua cub
2016-09-03 13:00:00 0.5 1.333333 1.75
2016-09-03 13:10:00 1.0 1.666667 2.00
2016-09-03 13:20:00 1.5 2.000000 2.25
Arithmetic is done on the underlying matrices by column position, ignoring column names. That is why the results are wrong even though the sets of column names coincide.
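One way to get name-aware arithmetic is to reorder the second operand's columns to match the first before dividing. The sketch below uses plain base-R matrices with the same values as a and b above; the same character-based column selection is assumed to carry over to the xts objects in the question, e.g. `abs(a*20 - m[, colnames(a)]) / s[, colnames(a)]`.

```r
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3,
            dimnames = list(NULL, c("lin", "qua", "cub")))
b <- matrix(c(2, 2, 2, 3, 3, 3, 4, 4, 4), ncol = 3,
            dimnames = list(NULL, c("qua", "lin", "cub")))

wrong <- a / b                 # positional: lin/qua, qua/lin, cub/cub
right <- a / b[, colnames(a)]  # align b's columns to a's names first

wrong[1, ]  # 0.5 1.333333 1.75, as in the xts output above
right[1, ]  # lin = 1/3, qua = 4/2, cub = 7/4
```

A quick guard such as `stopifnot(identical(colnames(a), colnames(b)))` before the arithmetic would also catch the mismatch, since setdiff() only checks that the sets of names agree, not their order.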

Recode a time variable into a new discrete variable

I have a time variable: "00:00:29", "00:06:39", "20:43:15", ...
and I want to recode it into a new vector of time-based work shifts:
07:00:00 - 13:00:00 - 1
13:00:00 - 20:00:00 - 2
23:00:00 - 07:00:00 - 3
Thanks for any ideas :)
Assuming the time variables are strings as shown, this seems to work:
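# Convert one "HH:MM:SS" string to seconds since midnight
secNr <- function(x) {
  sum(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))) * c(3600, 60, 1))
}
# Shift of one time string: the first threshold (checked from latest to
# earliest) that the time reaches selects the code; 20:00-23:00 gives NA
workShift <- function(x) {
  n <- which.max(secNr(x) >= c(secNr("23:00:00"), secNr("20:00:00"),
                               secNr("13:00:00"), secNr("07:00:00"),
                               secNr("00:00:00")))
  c(3, NA, 2, 1, 3)[n]
}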
"workShift" computes the work shift of one such time string. If you have a vector of time strings, use "sapply". Example:
> Time <- sprintf("%i:%02i:00", 0:23, sample(0:59,24))
> Shift <- sapply(Time,"workShift")
> Shift
0:37:00 1:17:00 2:35:00 3:09:00 4:08:00 5:28:00 6:03:00 7:43:00 8:27:00 9:38:00 10:48:00 11:50:00 12:58:00 13:32:00 14:05:00 15:39:00 16:56:00
3 3 3 3 3 3 3 1 1 1 1 1 1 2 2 2 2
17:00:00 18:22:00 19:02:00 20:42:00 21:11:00 22:15:00 23:01:00
2 2 2 NA NA NA 3
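If the times come as a long vector, a vectorised variant avoids calling workShift once per element. This is a sketch with a hypothetical helper `work_shift()`, swapping in base R's findInterval() for the which.max() lookup; it assumes well-formed "H:MM:SS" or "HH:MM:SS" strings and keeps the same rule that 20:00 to 23:00 maps to NA.

```r
# Hypothetical vectorised helper: shift code for every time string at once
work_shift <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))  # one row per time
  secs  <- as.numeric(parts[, 1]) * 3600 +
           as.numeric(parts[, 2]) * 60 +
           as.numeric(parts[, 3])
  breaks <- c(0, 7, 13, 20, 23) * 3600  # shift boundaries in seconds
  codes  <- c(3, 1, 2, NA, 3)           # code for each interval between breaks
  codes[findInterval(secs, breaks)]
}

work_shift(c("00:00:29", "06:39:00", "07:00:00", "20:43:15", "23:01:00"))
#> [1]  3  3  1 NA  3
```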
