Scatterplot of two xts time series - r

I've got two xts time series. A small sample of them:
ts1
[,1]
2009-05-06 00:00:00 38.414
2009-05-06 00:15:00 45.079
2009-05-06 00:30:00 38.878
2009-05-06 00:45:00 49.889
2009-05-06 01:00:00 41.270
2009-05-06 01:15:00 41.050
2009-05-06 01:30:00 38.951
2009-05-06 01:45:00 39.854
2009-05-06 02:00:00 37.803
2009-05-06 02:15:00 42.930
ts2
[,1]
2009-05-06 00:00:00 406.887
2009-05-06 00:15:00 413.298
2009-05-06 00:30:00 409.353
2009-05-06 00:45:00 412.312
2009-05-06 01:00:00 409.353
2009-05-06 01:15:00 415.271
2009-05-06 01:30:00 416.257
2009-05-06 01:45:00 416.257
2009-05-06 02:00:00 416.257
2009-05-06 02:15:00 419.216
Now I want to create a scatterplot of ts1 against ts2. According to the documentation on CRAN (and an example I found on Stack Overflow), it should work like this: plot(ts1, ts2). But I get an error.
plot(ts1,ts2)
# Error in plot(xycoords$x, xycoords$y, type = type, axes = FALSE, ann = FALSE, :
# object 'xycoords' not found
What's going wrong? It works great with normal ts objects and the ~ sign, but this doesn't work with xts. I also tried plot(ts1[, 1], ts2[, 1]).

The easiest thing to do is to call plot.zoo directly, instead of allowing the plot generic to dispatch to plot.xts.
ts1 <-
structure(c(38.414, 45.079, 38.878, 49.889, 41.27, 41.05, 38.951,
39.854, 37.803, 42.93), .Dim = c(10L, 1L), index = structure(c(1241586000,
1241586900, 1241587800, 1241588700, 1241589600, 1241590500, 1241591400,
1241592300, 1241593200, 1241594100), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")
ts2 <-
structure(c(406.887, 413.298, 409.353, 412.312, 409.353, 415.271,
416.257, 416.257, 416.257, 419.216), .Dim = c(10L, 1L),
index = structure(c(1241586000, 1241586900, 1241587800, 1241588700,
1241589600, 1241590500, 1241591400, 1241592300, 1241593200, 1241594100),
tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
plot.zoo(ts1, ts2)
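If you would rather not call plot.zoo directly, a rough alternative is to align the two series yourself and hand plain numeric vectors to base plot. This is only a sketch: it rebuilds the sample data with xts() on an assumed UTC index, and the names idx, both and xy are made up for illustration.

```r
library(xts)

# Rebuild the sample series on a shared 15-minute index (UTC is an assumption)
idx <- seq(as.POSIXct("2009-05-06 00:00:00", tz = "UTC"),
           by = "15 min", length.out = 10)
ts1 <- xts(c(38.414, 45.079, 38.878, 49.889, 41.270,
             41.050, 38.951, 39.854, 37.803, 42.930), order.by = idx)
ts2 <- xts(c(406.887, 413.298, 409.353, 412.312, 409.353,
             415.271, 416.257, 416.257, 416.257, 419.216), order.by = idx)

# Keep only the timestamps present in both series
both <- merge(ts1, ts2, join = "inner")

# coredata() drops the time index and returns a plain numeric matrix
xy <- coredata(both)

# Ordinary scatterplot on numeric vectors, so plot.xts is never dispatched
plot(xy[, 1], xy[, 2], xlab = "ts1", ylab = "ts2")
```

The inner join matters when the two series do not share exactly the same timestamps; unmatched rows are dropped instead of being paired with NA.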

Related

Merge date and time, find the time difference in R

I have a data set and there are some date and hour attributes. Here is the sample, then I will explain what I want to do;
date1                hour1     date2                hour2     date3                hour3
2014-03-16 00:00:00  16:20:00  2014-03-16 00:00:00  20:20:03  2014-03-16 00:00:00  22:12:34
2014-04-22 00:00:00  10:20:00  2014-04-22 00:00:00  15:20:03  2014-04-22 00:00:00  20:12:34
2015-03-12 00:00:00  16:20:00  2015-03-12 00:00:00  20:20:03  2015-03-12 00:00:00  22:12:34
We know event1 happens before event2 (event1 -> event2 -> event3).
As you can see, the time part of each date attribute is not correct, but we have a matching hour attribute for each one. I want to correct the dates using the hour attributes, then find the difference between two dates and create new attributes that give the time difference in hours.
Sample for the table above:
event2_time
4
5
4
I tried to merge the hour into the date and create a new attribute like this, but it doesn't work (my goal is actually to correct the date value and get rid of the hour attribute):
trainTable <- trainTable %>%
mutate("newParam" = as.POSIXct(paste(alert_date, alert_hour), format="%Y-%m-%d %H:%M:%S"))
I could use some help, thanks in advance.
Data
structure(list(alert_date = structure(c(1394928000, 1395014400,
1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), alert_hour = c("16:15:00", "20:53:00", "12:55:00",
"14:22:00", "12:07:00", "17:48:00"), firstInterv_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), firstInterv_hour = c("16:35:00", "21:05:00", "13:10:00",
"14:42:00", "12:07:00", "18:08:00"), extinction_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), extinction_hour = c("17:47:00", "22:46:00", "15:30:00",
"15:25:00", "13:14:00", "21:10:00")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Try this solution using mapply. It uses strsplit to split the date from the hours.
dat <- as.data.frame( dat ) # tibbles are cool but sometimes very restrictive, so changing to data.frame here
dat_new <- data.frame( setNames( mapply( function(x,y){
tmp <- sapply( strsplit( as.character(dat[,x]), " "), function(z) z[1] );
list( as.POSIXct( paste(tmp,dat[,y] ) ) ) },
grep("date", colnames(dat)), grep("hour", colnames(dat)) ),
c("a","b","c") ) )
dat_new$b - dat_new$a
Time differences in secs
[1] 1200 720 900 1200 0 1200
# convert back to a tibble if you need one (as_tibble() comes from the tibble package)
as_tibble( dat_new )
# A tibble: 6 x 3
a b c
<dttm> <dttm> <dttm>
1 2014-03-16 16:15:00 2014-03-16 16:35:00 2014-03-16 17:47:00
2 2014-03-17 20:53:00 2014-03-17 21:05:00 2014-03-17 22:46:00
3 2014-03-17 12:55:00 2014-03-17 13:10:00 2014-03-17 15:30:00
4 2014-03-19 14:22:00 2014-03-19 14:42:00 2014-03-19 15:25:00
5 2014-03-20 12:07:00 2014-03-20 12:07:00 2014-03-20 13:14:00
6 2014-03-17 17:48:00 2014-03-17 18:08:00 2014-03-17 21:10:00
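The differences above come out in seconds, while the question asked for hours. A minimal base R sketch of that last step follows; the two-row data frame is a made-up sample in the same shape as the question's data, combine() is a hypothetical helper, and UTC is assumed throughout.

```r
# Hypothetical two-row sample shaped like the question's data
dat <- data.frame(
  alert_date       = as.POSIXct(c("2014-03-16", "2014-03-17"), tz = "UTC"),
  alert_hour       = c("16:15:00", "20:53:00"),
  firstInterv_date = as.POSIXct(c("2014-03-16", "2014-03-17"), tz = "UTC"),
  firstInterv_hour = c("16:35:00", "21:05:00"),
  stringsAsFactors = FALSE
)

# Glue the calendar date to the separate time-of-day string and parse it
combine <- function(date, hour) {
  as.POSIXct(paste(format(date, "%Y-%m-%d"), hour),
             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

alert <- combine(dat$alert_date, dat$alert_hour)
first <- combine(dat$firstInterv_date, dat$firstInterv_hour)

# difftime() lets you pick the unit explicitly instead of relying on the default
hours <- as.numeric(difftime(first, alert, units = "hours"))
hours
```

Fixing the unit with difftime() avoids surprises, because plain subtraction of POSIXct values picks a unit automatically based on the size of the differences.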
Data
dat <- structure(list(alert_date = structure(c(1394928000, 1395014400,
1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), alert_hour = c("16:15:00", "20:53:00", "12:55:00",
"14:22:00", "12:07:00", "17:48:00"), firstInterv_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), firstInterv_hour = c("16:35:00", "21:05:00", "13:10:00",
"14:42:00", "12:07:00", "18:08:00"), extinction_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), extinction_hour = c("17:47:00", "22:46:00", "15:30:00",
"15:25:00", "13:14:00", "21:10:00")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))

R Group ID's with overlapping time intervals

I have a large dataset containing multiple groups of IDs, each with START and END datetimes. Within each group, I want to identify where a subgroup occurs. A subgroup is formed when two IDs overlap in their START and END datetime columns. Below is a script that creates a sample dataset in R for one group. Within each group, I want to create a column called "Grp" that labels the subgroups with overlapping START and END datetimes.
What I have...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
What I want is...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1508379300,
1508363100, 1490918400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1509031800,
1509062400, 1492247700), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Grp = c(1,2,2,1)), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END","Grp"))
I've tried using lubridate's interval, and finding an overlap that way, but no luck. Any help would be greatly appreciated.
After sorting by START, the condition for a new group is that the END of the previous row is less than the START of the next row:
head(df1$END, -1) < tail(df1$START,-1)
df1 <- structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
df1
ID START END
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00
df1a <- df1[ order(df1$START), ]
df1a$grp <- cumsum( c( 1, head(df1a$END, -1) < tail(df1a$START,-1) ))
df1a
#---------------
ID START END grp
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00 1
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00 1
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00 2
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00 2
Here's a function that answers the first part of my response to the comment below:
grp_overlaps <- function(endings, beginnings){
cumsum(c( 1, head(endings, -1) < tail(beginnings, -1) )) }
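One caveat with comparing only adjacent rows: if an early interval ends later than the intervals that follow it, the adjacent test can split a group that is still covered. A hedged variant compares each START with the running maximum of all earlier ENDs. The name grp_overlaps_running is hypothetical and not part of the original answer; the function sorts internally and returns labels in the input order.

```r
# Sketch: group intervals that overlap directly or through a chain of overlaps
grp_overlaps_running <- function(starts, ends) {
  stopifnot(length(starts) == length(ends))
  n <- length(starts)
  if (n == 0) return(integer(0))

  ord <- order(starts)              # process intervals by increasing START
  s <- as.numeric(starts[ord])
  e <- as.numeric(ends[ord])

  running_end <- cummax(e)          # latest END seen so far
  # A new group begins when a START lies beyond every earlier END
  new_grp <- c(TRUE, s[-1] > running_end[-n])

  grp <- integer(n)
  grp[ord] <- cumsum(new_grp)       # map labels back to the input order
  grp
}

# Sample data from the question (seconds since the epoch)
starts <- c(1490904000, 1490918400, 1508363100, 1508379300)
ends   <- c(1492050600, 1492247700, 1509062400, 1509031800)
grp_overlaps_running(starts, ends)

# A long first interval that covers everything after it
grp_overlaps_running(c(1, 2, 10, 20), c(100, 3, 11, 21))
```

On the question's data this gives the same grouping as the adjacent-row test; the two only differ when an earlier interval outlasts its successor, as in the second call.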

Function to return modified matrix in R

The following code adds vector XTS1$XTSSum2 to xts object XTS1:
library(xts)
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
colnames(XTS1) <- "XTS1"
XTS1$XTSSum2 <- XTS1$XTS1 + lag(XTS1$XTS1,1)
The following function performs the same operation.
addfunction <- function(x){
x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
}
addfunction(XTS1)
But the vector XTS1$XTSSum2 is not stored.
How can I get addfunction to store the vector so that after running addfunction(XTS1), XTS1 will look like this:
XTS1 XTSSum2
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 19
2014-10-22 09:15:00 7 14
2014-10-22 09:30:00 22 29
2014-10-22 09:45:00 24 46
2014-10-22 10:00:00 30 54
2014-10-22 10:15:00 26 56
2014-10-22 10:30:00 23 49
2014-10-22 10:45:00 27 50
2014-10-22 11:00:00 30 57
The reproducible example uses an xts object; I presume the same solution would apply to xts objects, matrices and data frames.
The assignment is happening within the function's environment, not the global one. You need to return the result in the function, and assign it with the function call. Try this:
addfunction <- function(x){
x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
x
}
XTS1 <- addfunction(XTS1)
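The same behaviour can be seen with a plain data frame, which is a quick way to convince yourself that the function works on a copy. This is only an illustrative sketch in base R; add_sum, df and the column names V1 and Sum2 are invented for the example.

```r
# Sketch: a function that adds a column and returns the modified copy
add_sum <- function(x) {
  # Sum of each value and the previous one; the first row has no predecessor
  x$Sum2 <- x$V1 + c(NA, head(x$V1, -1))
  x  # return the modified copy explicitly
}

df <- data.frame(V1 = c(12, 7, 7, 22))

out <- add_sum(df)

# The caller's object is untouched until the result is assigned back
"Sum2" %in% names(df)
"Sum2" %in% names(out)

# Assign the result to keep the change
df <- add_sum(df)
df$Sum2
```

Reassigning the result is the idiomatic pattern in R; modifying the caller's variable from inside the function is possible but usually discouraged.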

Date conversion as.yearmon

I have a dataframe that looks like this:
library(zoo)
head(monthly.station6)
[,1]
1995-02-28 00:00:00 2.07
1995-03-01 00:00:00 5.70
1995-04-30 01:00:00 0.65
1995-05-31 01:00:00 1.03
1995-06-30 01:00:00 0.77
1995-07-31 01:00:00 0.39
I am applying this code: monthly.station6[,0] <- as.yearmon(monthly.station6[,0]) to try to convert the dates into a year-month format, but I think the fact that the date column is [,0] is preventing it. I'm not sure where I am going wrong; any help would be appreciated!
head(monthly.station6)
[,1]
Feb 1995 2.07
Mar 1995 5.70
Apr 1995 0.65
May 1995 1.03
Jun 1995 0.77
Jul 1995 0.39
As requested, dput(head(monthly.station6)):
structure(c(2.07, 5.7, 0.65, 1.03, 0.77, 0.39), .indexCLASS = c("POSIXct",
"POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts",
"zoo"), index = structure(c(793929600, 794016000, 799200000,
801878400, 804470400, 807148800), tzone = "", tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 1L))
1) index The object's class is "xts", not "data.frame". Use index (or time) to modify monthly.station6:
library(xts)
index(monthly.station6) <- as.yearmon(index(monthly.station6))
2) aggregate.zoo Another possibility is to use aggregate.zoo. That will return a "zoo" object, so convert it back to "xts":
library(xts)
as.xts(aggregate(monthly.station6, as.yearmon))
3) fortify.zoo Since the question mentions data.frame, if what you really wanted was a data.frame then the first statement after library will create a data.frame with a first column of Index and the second will perform the conversion to "yearmon":
library(xts)
DF <- fortify.zoo(monthly.station6)
transform(DF, Index = as.yearmon(Index))
Note: If you want just year then it cannot be an xts object but you could represent it as a data.frame. Using DF from (3):
transform(DF, Index = as.numeric(format(Index, "%Y")))
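To see what the index replacement in (1) does without the original data, here is a small sketch on a made-up zoo series. The values and dates are taken from the first three rows of the question, but the UTC time zone and the object name z are assumptions chosen to keep the example simple.

```r
library(zoo)

# Made-up three-row series with a POSIXct index (UTC assumed)
idx <- as.POSIXct(c("1995-02-28", "1995-03-01", "1995-04-30"), tz = "UTC")
z <- zoo(c(2.07, 5.70, 0.65), order.by = idx)

# Replace the time index, not a data column, with year-month values
index(z) <- as.yearmon(index(z))

class(index(z))
z

# Internally a yearmon is a number: year + (month - 1) / 12
as.numeric(index(z))
```

Because "yearmon" is stored as a plain number, it sorts and compares correctly, which is why it can serve as a zoo or xts index.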

How to use .indexyear in xts package

I'm trying to use the .indexyear function in the xts package, but can't get my head around how it's supposed to be used.
Below is some code; you can see that .indexyear returns 112, 113, 114, 115 for the years 2012, 2013, 2014, 2015. I want to check whether a certain year exists in the xts object's index, so how do I make 2012 %in% .indexyear(a) equal to TRUE?
Code
Browse[1]> index(a)
[1] "2012-12-30 00:00:00 CET" "2013-12-30 00:00:00 CET" "2014-12-30 00:00:00 CET" "2015-12-30 01:00:00 CET"
Browse[1]> .indexyear(a)
[1] 112 113 114 115
Browse[1]> 2014 %in% .indexyear(a) # should actually be TRUE!
[1] FALSE
Browse[1]> 113 %in% .indexyear(a)
[1] TRUE
The .index* functions basically wrap the components of the POSIXlt class. So see the Details section of ?POSIXlt, which says:
'year' years since 1900.
So you need to add 1900 to the output of .indexyear to get what you want.
a <- structure(1:4, .Dim = c(4L, 1L), index = structure(c(1356847200, 1388383200,
1419919200, 1451458800), tzone = "", tclass = c("POSIXct", "POSIXt")),
class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"),
tclass = c("POSIXct", "POSIXt"), .indexTZ = "UTC", tzone = "UTC")
2014 %in% (.indexyear(a)+1900)
# [1] TRUE
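The offset is easy to check in base R without xts, since POSIXlt exposes the same components. A small sketch, using two made-up dates in UTC:

```r
# Two made-up timestamps (UTC assumed)
d <- as.POSIXct(c("2012-12-30", "2014-12-30"), tz = "UTC")
lt <- as.POSIXlt(d)

lt$year           # years since 1900
lt$year + 1900    # calendar years
lt$mon            # months are zero-based (0 = January)

# Membership test on calendar years
2014 %in% (lt$year + 1900)

# Alternative that avoids the offset entirely
years <- as.integer(format(d, "%Y"))
2014 %in% years
```

The format() route is sometimes easier to read, at the cost of going through a character conversion.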
