I have a data frame which looks like this
df = data.frame (time = c("2013-12-23 00:00:00", "2013-12-23 00:13:00", "2013-12-23 00:14:00", "2013-12-23 00:14:01",
"2013-12-24 00:00:00", "2013-12-24 00:12:00", "2013-12-24 00:15:00", "2013-12-24 00:16:00"),
value = c(1, 2, 3, 4, 5, 6, 7, 8))
I transform this data frame to an xts object and use the POSIXct format for the index
df = as.xts(as.numeric(as.character(df[,"value"])), order.by = as.POSIXct(df[,"time"]))
What I need now is to change every index whose time is 00:00:00 to 22:00:00 on the same day.
All other time indices must stay as they are.
The resulting object should look like this:
>df
[,1]
2013-12-23 00:13:00 2
2013-12-23 00:14:00 3
2013-12-23 00:14:01 4
2013-12-23 22:00:00 1
2013-12-24 00:12:00 6
2013-12-24 00:15:00 7
2013-12-24 00:16:00 8
2013-12-24 22:00:00 5
Thanks for your help! Pat
We could use sub to replace '00:00:00' with '22:00:00' in the original dataset and then do the xts conversion:
df$time <- as.POSIXct(sub('00:00:00', '22:00:00', df$time),
format='%Y-%m-%d %H:%M:%S')
library(xts)
xts(df$value, order.by=df$time)
# [,1]
#2013-12-23 00:13:00 2
#2013-12-23 00:14:00 3
#2013-12-23 00:14:01 4
#2013-12-23 22:00:00 1
#2013-12-24 00:12:00 6
#2013-12-24 00:15:00 7
#2013-12-24 00:16:00 8
#2013-12-24 22:00:00 5
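A minimal sketch of the same idea without string replacement, assuming the timestamps are parsed as UTC (the time zone and the variable names here are illustrative choices, not taken from the question): flag the midnight entries and add 22 hours to them directly.

```r
# Assumption: all timestamps are interpreted in UTC.
times <- as.POSIXct(c("2013-12-23 00:00:00", "2013-12-23 00:13:00",
                      "2013-12-24 00:00:00", "2013-12-24 00:12:00"),
                    tz = "UTC")

# TRUE where the clock time is exactly midnight
is_midnight <- format(times, "%H:%M:%S") == "00:00:00"

# Move only those entries forward by 22 hours (POSIXct arithmetic is in seconds)
times[is_midnight] <- times[is_midnight] + 22 * 60 * 60

shifted <- format(times, "%Y-%m-%d %H:%M:%S")
```

The shifted vector can then be passed as order.by when building the xts object, which sorts the observations by the new index.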
Here's a function that will shift the zero-hour of an xts object by n seconds.
shiftZeroHour <- function(x, n=1) {
  stopifnot(is.xts(x))
  # find zero hour
  plt <- as.POSIXlt(index(x), tz=indexTZ(x))
  isZeroHour <- plt$hour == 0 & plt$min == 0 & plt$sec == 0
  # shift zero hour index values
  .index(x)[isZeroHour] <- .index(x)[isZeroHour] + n
  # ensure index is ordered properly
  as.xts(x)
}
Here is how to use it with your sample data:
xdf <- structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Dim = c(8L, 1L),
index = structure(c(1387778400, 1387779180, 1387779240, 1387779241,
1387864800, 1387865520, 1387865700, 1387865760), tzone = "",
tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
shiftZeroHour(xdf, 60*60*22)
I have two data frames with timestamps (in as.POSIXct, format="%Y-%m-%d %H:%M:%S") as below.
df_ID1
ID DATETIME TIMEDIFF EV
A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
B 2019-04-03 08:00:00 2019-04-03 02:00:00 1
B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
df_ID0
ID DATETIME
A 2019-03-26 00:02:00
A 2019-03-26 04:55:00
A 2019-03-26 11:22:00
B 2019-04-02 20:43:00
B 2019-04-04 11:03:00
B 2019-04-06 03:12:00
For each row in df_ID1, I want to find the row in df_ID0 that has the same ID and whose DATETIME is "smaller than but closest to" the DATETIME in df_ID1.
For each matched pair, I then want to compare TIMEDIFF in df_ID1 with the matched DATETIME in df_ID0: if TIMEDIFF is greater than the matched DATETIME, EV in df_ID1 should change from 1 to 4.
My desired result is
df_ID1
ID DATETIME TIMEDIFF EV
A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
I've checked how to compare timestamps and calculate time differences, and also how to change values based on criteria.
But I cannot find anything that selects the "smaller than but closest to" timestamp, and I cannot figure out how to combine all of this logic.
Any help would be appreciated!
You can do this with a for loop. Keep in mind that if your actual data set is very large, the loop will be slow.
for (i in seq_len(nrow(df_1))) {
  same_id <- subset(df_0, ID == df_1$ID[i]) # filter on ID
  # earlier timestamps only; their max is "less than but closest to"
  prior <- same_id$DATETIME[same_id$DATETIME < df_1$DATETIME[i]]
  # final condition; rows with no earlier timestamp are left unchanged
  if (length(prior) > 0 && max(prior) < df_1$TIMEDIFF[i]) {
    df_1[i, "EV"] <- 4
  }
}
df_1
# A tibble: 3 x 4
ID DATETIME TIMEDIFF EV
<chr> <dttm> <dttm> <dbl>
1 A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
2 B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
3 B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
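The "smaller than but closest to" lookup can also be done without an inner search per row. The following is a sketch in base R, assuming UTC timestamps and rebuilding the question's data by hand; findInterval with left.open = TRUE returns, for each value, the position of the last reference timestamp that is strictly smaller (0 if there is none).

```r
# Data rebuilt from the question; tz = "UTC" is an assumption.
df_ID0 <- data.frame(
  ID = c("A", "A", "A", "B", "B", "B"),
  DATETIME = as.POSIXct(c("2019-03-26 00:02:00", "2019-03-26 04:55:00",
                          "2019-03-26 11:22:00", "2019-04-02 20:43:00",
                          "2019-04-04 11:03:00", "2019-04-06 03:12:00"),
                        tz = "UTC"),
  stringsAsFactors = FALSE)
df_ID1 <- data.frame(
  ID = c("A", "B", "B"),
  DATETIME = as.POSIXct(c("2019-03-26 06:13:00", "2019-04-03 08:00:00",
                          "2019-04-04 12:35:00"), tz = "UTC"),
  TIMEDIFF = as.POSIXct(c("2019-03-26 00:13:00", "2019-04-03 02:00:00",
                          "2019-04-04 06:35:00"), tz = "UTC"),
  EV = c(1, 1, 1),
  stringsAsFactors = FALSE)

for (id in unique(df_ID1$ID)) {
  rows <- which(df_ID1$ID == id)
  ref  <- sort(df_ID0$DATETIME[df_ID0$ID == id])
  # position of the closest strictly earlier timestamp, 0 when none exists
  pos <- findInterval(as.numeric(df_ID1$DATETIME[rows]), as.numeric(ref),
                      left.open = TRUE)
  has_prev <- pos > 0
  hit <- rep(FALSE, length(rows))
  hit[has_prev] <- df_ID1$TIMEDIFF[rows][has_prev] > ref[pos[has_prev]]
  df_ID1$EV[rows[hit]] <- 4
}
```

The loop now runs once per ID rather than once per row, and rows without any earlier timestamp keep their original EV.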
One option is a nested mapply. First split df_ID1 and df_ID0 by ID. For each group, calculate the time difference between each DATETIME in df_ID1 and every DATETIME in df_ID0 with the same ID. Take the index of the "smaller than but closest to" value (inds), and set EV to 4 where the corresponding TIMEDIFF is greater than the matched DATETIME.
df_ID1$EV[unlist(mapply(function(x, y) {
  mapply(function(p, q) {
    vals = as.numeric(difftime(p, y$DATETIME))
    inds = which(vals == min(vals[vals > 0]))
    q > y$DATETIME[inds]
  }, x$DATETIME, x$TIMEDIFF)
}, split(df_ID1, df_ID1$ID), split(df_ID0, df_ID0$ID)))] <- 4
df_ID1
# ID DATETIME TIMEDIFF EV
#1 A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
#2 B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
#3 B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
data
df_ID0 <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("A",
"B"), class = "factor"), DATETIME = structure(c(1553529720, 1553547300,
1553570520, 1554208980, 1554346980, 1554491520), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -6L), class = "data.frame")
df_ID1 <- structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), DATETIME = structure(c(1553551980, 1554249600,
1554352500), class = c("POSIXct", "POSIXt"), tzone = ""), TIMEDIFF =
structure(c(1553530380,
1554228000, 1554330900), class = c("POSIXct", "POSIXt"), tzone = ""),
EV = c(1, 1, 1)), row.names = c(NA, -3L), class = "data.frame")
Example Data:
structure(c(-0.0752423128397812, -0.00667756345500559, 0.127210629285125,
-0.139921096245914, 0.0652869973391721, -0.0426597532279215,
0.0900627738506856, 0.0181364458126518, 0.0655042896419282, 0.00433434751877004,
-0.0265985905707364, 0.0479551496911459), class = c("xts", "zoo"
), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1451606400,
1454284800, 1456790400, 1459468800, 1462060800, 1464739200, 1467331200,
1470009600, 1472688000, 1475280000, 1477958400, 1480550400), tzone = "UTC", tclass = "Date"), .Dim = c(12L,
1L), .Dimnames = list(NULL, "AAPL.Returns"))
How do I convert the index of an object, in this case, the Date column, into a new column labelled Date?
Edit:
> head(Stock1_returns)
AAPL.Returns
2007-01-01 -0.006489744
2007-02-01 -0.013064271
2007-03-01 0.098097127
2007-04-01 0.074157809
2007-05-01 0.214328635
2007-06-01 0.007013805
To turn an xts object into a data frame with a date column, you can use the following code. index() extracts the date index of the xts object, and coredata() extracts the data it contains.
# my_xts is based on data from OP
df1 <- data.frame(Date = index(my_xts), coredata(my_xts) )
# show resulting structure
str(df1)
'data.frame': 12 obs. of 2 variables:
$ Date : Date, format: "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01" ...
$ AAPL.Returns: num -0.07524 -0.00668 0.12721 -0.13992 0.06529 ...
# outcome
df1
Date AAPL.Returns
1 2016-01-01 -0.075242313
2 2016-02-01 -0.006677563
3 2016-03-01 0.127210629
4 2016-04-01 -0.139921096
5 2016-05-01 0.065286997
6 2016-06-01 -0.042659753
7 2016-07-01 0.090062774
8 2016-08-01 0.018136446
9 2016-09-01 0.065504290
10 2016-10-01 0.004334348
11 2016-11-01 -0.026598591
12 2016-12-01 0.047955150
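The same pattern on a tiny, made-up series may make the two pieces easier to see. This sketch assumes the xts package is installed; the object name x and the column name Returns are illustrative only.

```r
library(xts)

# Three made-up monthly values with a Date index
x <- xts(c(0.1, 0.2, 0.3),
         order.by = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")))
colnames(x) <- "Returns"

# index() supplies the dates, coredata() supplies the values as a matrix
df1 <- data.frame(Date = index(x), coredata(x))
```

Because coredata() keeps the column names, the value column in the data frame is still called Returns.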
I know the following problem can be solved using reduce from Bioconductor's IRanges package.
But since that function only accepts numeric input, and I am working with data.table anyway, I am wondering if the following can be achieved using data.table's foverlaps().
Sample data
structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B"
), subgroup = c(1, 1, 2, 2, 1, 1, 2, 2), start = structure(c(1514793600,
1514795400, 1514794200, 1514798100, 1514815200, 1514817000, 1514815800,
1514818800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
end = structure(c(1514794500, 1514797200, 1514794800, 1514799000,
1514816100, 1514818800, 1514817600, 1514820600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
# group subgroup start end
# 1: A 1 2018-01-01 08:00:00 2018-01-01 08:15:00
# 2: A 1 2018-01-01 08:30:00 2018-01-01 09:00:00
# 3: A 2 2018-01-01 08:10:00 2018-01-01 08:20:00
# 4: A 2 2018-01-01 09:15:00 2018-01-01 09:30:00
# 5: B 1 2018-01-01 14:00:00 2018-01-01 14:15:00
# 6: B 1 2018-01-01 14:30:00 2018-01-01 15:00:00
# 7: B 2 2018-01-01 14:10:00 2018-01-01 14:40:00
# 8: B 2 2018-01-01 15:00:00 2018-01-01 15:30:00
Question
What I would like to achieve, is to join/merge events (by group) when:
a range (start - end) overlaps (or partially overlaps) another range
the start of a range is the end of another range
Subgroups can be ignored
As mentioned above, I know this can be done using reduce from Bioconductor's IRanges, but I wonder if the same can be achieved using data.table. I can't shake the feeling that foverlaps should be able to tackle my problem, but I cannot figure out how...
Since I'm an intermediate R user but pretty much a novice in data.table, it's hard for me to 'read' some of the solutions already provided on Stack Overflow. So I'm not sure whether a similar question has already been asked and answered (if so, please be gentle ;-) )
Desired output
structure(list(group = c("A", "A", "A", "B"), start = structure(c(1514793600,
1514795400, 1514798100, 1514815200), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), end = structure(c(1514794800, 1514797200,
1514799000, 1514820600), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
# group start end
# 1: A 2018-01-01 08:00:00 2018-01-01 08:20:00
# 2: A 2018-01-01 08:30:00 2018-01-01 09:00:00
# 3: A 2018-01-01 09:15:00 2018-01-01 09:30:00
# 4: B 2018-01-01 14:00:00 2018-01-01 15:30:00
If you arrange by group and start (in that order) and drop the indx column at the end, this solution posted by David Arenburg works perfectly: How to flatten/merge overlapping time periods in R
library(dplyr)
df1 %>%
  group_by(group) %>%
  arrange(group, start) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                            cummax(as.numeric(end)))[-n()])) %>%
  group_by(group, indx) %>%
  # max(end) rather than last(end), so a range nested inside an earlier one cannot shorten the merged range
  summarise(start = first(start), end = max(end)) %>%
  select(-indx)
group start end
<chr> <dttm> <dttm>
1 A 2018-01-01 08:00:00 2018-01-01 08:20:00
2 A 2018-01-01 08:30:00 2018-01-01 09:00:00
3 A 2018-01-01 09:15:00 2018-01-01 09:30:00
4 B 2018-01-01 14:00:00 2018-01-01 15:30:00
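The cumsum/cummax trick used above can be illustrated on plain numbers. The following is a base R sketch for a single group, with times written as minutes since midnight; the function name merge_ranges is hypothetical and not part of any package.

```r
# Merge overlapping or touching ranges given as numeric start/end vectors.
merge_ranges <- function(start, end) {
  o <- order(start, end)
  s <- start[o]
  e <- end[o]
  # running maximum of all earlier ends; -Inf for the first range
  prev_max_end <- c(-Inf, head(cummax(e), -1))
  # a new block begins only when a start lies strictly after every earlier end
  block <- cumsum(s > prev_max_end)
  data.frame(start = as.numeric(tapply(s, block, min)),
             end   = as.numeric(tapply(e, block, max)))
}

# Group "A" from the question: 08:00-08:15, 08:30-09:00, 08:10-08:20, 09:15-09:30
res_A <- merge_ranges(c(480, 510, 490, 555), c(495, 540, 500, 570))

# Group "B": 14:00-14:15, 14:30-15:00, 14:10-14:40, 15:00-15:30
res_B <- merge_ranges(c(840, 870, 850, 900), c(855, 900, 880, 930))
```

Using the strict comparison means a range that starts exactly where another ends is merged with it, which matches the second condition in the question. Taking the maximum end per block keeps the result correct when one range lies entirely inside another.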
I'm looking for a function to calculate the highest value for the prior X periods on an XTS object. The function would return a vector with such values.
I believe there are multiple ways to calculate this, but surprisingly I could not find it covered in a prior SO question. I am hoping there is a package with a function already defined for this. If there is none, maybe someone knows how to tackle it.
The example below shows what the vector with the highest values of the prior 3 periods would look like for the xts object XTS1.
library('xts')
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
#DESIRED OUTPUT
[,1] GetHighest(3)
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 12
2014-10-22 09:15:00 7 12
2014-10-22 09:30:00 22 12
2014-10-22 09:45:00 24 22
2014-10-22 10:00:00 30 24
2014-10-22 10:15:00 26 30
2014-10-22 10:30:00 23 30
2014-10-22 10:45:00 27 30
2014-10-22 11:00:00 30 27
You could use rollapply from zoo.
So it would look something like this:
GetHighest_3 = rollapply(data = XTS1, width = 3, FUN = max)
Then combine it:
cbind(XTS1, GetHighest_3)
The only problem I see is that it will probably return NA for the first 2 values, not only the first value, since it has a width of 3.
I wasn't able to test the code, since I don't have access to R right now, so there might be some misspellings.
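Note that the desired output in the question looks only at the prior observations (the current one is excluded) and uses shorter windows for the first rows. A minimal base R sketch of that behaviour, working on the plain values of the series, could look like this; prior_max is a hypothetical helper name.

```r
# Highest value among the previous k observations (current one excluded).
# Early rows use however many prior values exist; the first row has none, so NA.
prior_max <- function(v, k) {
  vapply(seq_along(v), function(i) {
    if (i == 1) return(NA_real_)
    max(v[max(1, i - k):(i - 1)])
  }, numeric(1))
}

values <- c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30)  # the values stored in XTS1
highest3 <- prior_max(values, 3)

# With xts loaded, the result could be attached to the series like this:
# cbind(XTS1, xts(highest3, order.by = index(XTS1)))
```

This is a plain loop over positions, so it favours clarity over speed; for long series a rolling-window function from a package would likely be faster.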
The following code adds vector XTS1$XTSSum2 to xts object XTS1:
library(xts)
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
colnames(XTS1) <- "XTS1"
XTS1$XTSSum2 <- XTS1$XTS1 + lag(XTS1$XTS1,1)
The following function performs the same operation.
addfunction <- function(x){
x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
}
addfunction(XTS1)
But the vector XTS1$XTSSum2 is not stored.
How can I get addfunction to store the vector so that after running addfunction(XTS1), XTS1 will look like this:
XTS1 XTSSum2
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 19
2014-10-22 09:15:00 7 14
2014-10-22 09:30:00 22 29
2014-10-22 09:45:00 24 46
2014-10-22 10:00:00 30 54
2014-10-22 10:15:00 26 56
2014-10-22 10:30:00 23 49
2014-10-22 10:45:00 27 50
2014-10-22 11:00:00 30 57
The reproducible example uses an xts object, but I presume the same solution would apply
to xts objects, matrices and data frames.
The assignment is happening within the function's environment, not the global one. You need to return the result in the function, and assign it with the function call. Try this:
addfunction <- function(x){
  x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
  x
}
XTS1 <- addfunction(XTS1)
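The same behaviour can be seen with an ordinary data frame, which needs no extra packages. This is a sketch with made-up names (add_sum, d, Sum2); the lag is written by hand with c(NA, head(...)) so that it does not depend on xts.

```r
# Adds a column holding each value plus the previous one, and returns the result
add_sum <- function(x) {
  x$Sum2 <- x$A + c(NA, head(x$A, -1))
  x
}

d <- data.frame(A = c(12, 7, 7, 22))

ignored <- add_sum(d)          # the function worked on a copy
unchanged <- !("Sum2" %in% names(d))   # TRUE: d itself has no new column

d <- add_sum(d)                # reassigning stores the modified copy
```

R passes arguments by value as far as the caller can tell, so a function can only change an object in the calling environment if its return value is assigned back.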