I have a data set with some date and hour attributes. Here is a sample; below it I explain what I want to do.
date1                hour1     date2                hour2     date3                hour3
2014-03-16 00:00:00  16:20:00  2014-03-16 00:00:00  20:20:03  2014-03-16 00:00:00  22:12:34
2014-04-22 00:00:00  10:20:00  2014-04-22 00:00:00  15:20:03  2014-04-22 00:00:00  20:12:34
2015-03-12 00:00:00  16:20:00  2015-03-12 00:00:00  20:20:03  2015-03-12 00:00:00  22:12:34
We know event1 happens before event2 (event1 -> event2 -> event3).
As you can see, the time part of the date attributes is not correct, but we have an hour attribute for each date. I want to correct the dates using the hour attributes, then find the difference between two consecutive dates and create new attributes that give the time difference in hours.
Expected output for the table above:
event2_time
4
5
4
I tried to merge the hour into the date and create a new attribute like this, but it doesn't work (my goal is actually to correct the date value and get rid of the hour attribute):
trainTable <- trainTable %>%
mutate("newParam" = as.POSIXct(paste(alert_date, alert_hour), format="%Y-%m-%d %H:%M:%S"))
I could use some help, thanks in advance.
Data
structure(list(alert_date = structure(c(1394928000, 1395014400,
1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), alert_hour = c("16:15:00", "20:53:00", "12:55:00",
"14:22:00", "12:07:00", "17:48:00"), firstInterv_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), firstInterv_hour = c("16:35:00", "21:05:00", "13:10:00",
"14:42:00", "12:07:00", "18:08:00"), extinction_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), extinction_hour = c("17:47:00", "22:46:00", "15:30:00",
"15:25:00", "13:14:00", "21:10:00")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Try this solution using mapply. It uses strsplit to strip the time part from each date column, then pastes the matching hour column back on.
dat <- as.data.frame( dat ) # tibbles are cool but sometimes very restrictive, so changing to data.frame here
dat_new <- data.frame( setNames( mapply( function(x,y){
tmp <- sapply( strsplit( as.character(dat[,x]), " "), function(z) z[1] );
list( as.POSIXct( paste(tmp,dat[,y] ) ) ) },
grep("date", colnames(dat)), grep("hour", colnames(dat)) ),
c("a","b","c") ) )
dat_new$b - dat_new$a
Time differences in secs
[1] 1200 720 900 1200 0 1200
# convert back to a tibble if you need one (as_tibble comes from the tibble package)
as_tibble( dat_new )
# A tibble: 6 x 3
a b c
<dttm> <dttm> <dttm>
1 2014-03-16 16:15:00 2014-03-16 16:35:00 2014-03-16 17:47:00
2 2014-03-17 20:53:00 2014-03-17 21:05:00 2014-03-17 22:46:00
3 2014-03-17 12:55:00 2014-03-17 13:10:00 2014-03-17 15:30:00
4 2014-03-19 14:22:00 2014-03-19 14:42:00 2014-03-19 15:25:00
5 2014-03-20 12:07:00 2014-03-20 12:07:00 2014-03-20 13:14:00
6 2014-03-17 17:48:00 2014-03-17 18:08:00 2014-03-17 21:10:00
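As a base R alternative, the same idea can be written as a small helper that rebuilds each timestamp and then asks difftime for hours directly. This is only a sketch: the helper name combine_date_hour and the two-row sample are made up for illustration, and the UTC time zone is assumed from the tzone attribute in the posted data.

```r
# Hypothetical two-row sample shaped like the posted data
sample_dat <- data.frame(
  alert_date       = as.POSIXct(c("2014-03-16", "2014-03-17"), tz = "UTC"),
  alert_hour       = c("16:15:00", "20:53:00"),
  firstInterv_date = as.POSIXct(c("2014-03-16", "2014-03-17"), tz = "UTC"),
  firstInterv_hour = c("16:35:00", "21:05:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the calendar date, then attach the hour string
combine_date_hour <- function(date, hour) {
  as.POSIXct(paste(format(date, "%Y-%m-%d"), hour),
             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

alert_time  <- combine_date_hour(sample_dat$alert_date, sample_dat$alert_hour)
interv_time <- combine_date_hour(sample_dat$firstInterv_date, sample_dat$firstInterv_hour)

# Time difference in hours as a plain numeric vector
hours_to_interv <- as.numeric(difftime(interv_time, alert_time, units = "hours"))
```

Because the result is a plain numeric vector, it can be assigned to a new column, after which the separate hour columns can be dropped.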
Data
dat <- structure(list(alert_date = structure(c(1394928000, 1395014400,
1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), alert_hour = c("16:15:00", "20:53:00", "12:55:00",
"14:22:00", "12:07:00", "17:48:00"), firstInterv_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), firstInterv_hour = c("16:35:00", "21:05:00", "13:10:00",
"14:42:00", "12:07:00", "18:08:00"), extinction_date = structure(c(1394928000,
1395014400, 1395014400, 1395187200, 1395273600, 1395014400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), extinction_hour = c("17:47:00", "22:46:00", "15:30:00",
"15:25:00", "13:14:00", "21:10:00")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I have two columns
starttime endtime
2019-11-05 18:02:04 2019-11-05 00:02:04
2019-08-02 20:18:00 2019-08-02 01:10:00
2019-12-07 17:28:00 2019-12-07 18:00:00
I am trying to find the difference in time between starttime and endtime:
mutate(col = difftime(endtime, starttime, units = "hours"))
but I am getting negative hours, which makes no sense. I need it to be endtime - starttime, because the other order would mess things up in the data frame that I have. I believe that 0.533 is right. I got:
col
-18
-19
0.533
We can increment endtime by 1 day if starttime > endtime and then use difftime:
library(dplyr)
df %>%
mutate(endtime = if_else(starttime > endtime, endtime + 86400, endtime),
col = difftime(endtime,starttime,units = "hours"))
# starttime endtime col
#1 2019-11-05 18:02:04 2019-11-06 00:02:04 6.0000000 hours
#2 2019-08-02 20:18:00 2019-08-03 01:10:00 4.8666667 hours
#3 2019-12-07 17:28:00 2019-12-07 18:00:00 0.5333333 hours
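The same overnight correction can be sketched in base R without dplyr. Note that base ifelse() drops the POSIXct class, so this sketch adds the 86400 seconds through a logical multiplier instead. The two-row vectors are a made-up subset of the sample, and the one-day wrap is an assumption that only holds when no interval is longer than 24 hours.

```r
# Made-up subset of the sample: the first row crosses midnight, the second does not
starttime <- as.POSIXct(c("2019-11-05 18:02:04", "2019-12-07 17:28:00"), tz = "UTC")
endtime   <- as.POSIXct(c("2019-11-05 00:02:04", "2019-12-07 18:00:00"), tz = "UTC")

# TRUE counts as 1 and FALSE as 0, so a day is added only where end precedes start
wrapped_end <- endtime + 86400 * (starttime > endtime)

# Elapsed time in hours as a plain numeric vector
elapsed_hours <- as.numeric(difftime(wrapped_end, starttime, units = "hours"))
```

Adding a number to a POSIXct vector keeps the class and time zone, which is why no conversion back is needed here.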
data
df <- structure(list(starttime = structure(c(1572976924, 1564777080,
1575739680), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
endtime = structure(c(1572912124, 1564708200, 1575741600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -3L), class = "data.frame")
Error in seq.Date(as.Date(retail$Valid_from), as.Date(retail$Valid_to), :
'from' must be of length 1
I have tried both methods mentioned in the question:
How should I deal with 'from' must be of length 1 error?
I basically want to repeat the quantity for each day in a given date range:
HSD_RSP Valid_from Valid_to
70 1/1/2018 1/15/2018
80 1/16/2018 1/31/2018
.
.
.
Method 1 :
byDay = ddply(retail, .(HSD_RSP), transform,
day=seq(as.Date(retail$Valid_from), as.Date(retail$Valid_to), by="day"))
Method 2 :
dt <- data.table(retail)
dt <- dt[,seq(as.Date(Valid_from),as.Date(Valid_to),by="day"),
by=list(HSD_RSP)]
Expected output:
HSD_RSP final_date
70 1/1/2018
70 2/1/2018
70 3/1/2018
70 4/1/2018
.
.
.
Output of dput(head(retail)):
structure(list(HSD_RSP = c(61.68, 62.96, 63.14, 60.51, 60.34,
61.63), Valid_from = structure(c(1483315200, 1484524800, 1487116800,
1491004800, 1491523200, 1492300800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Valid_to = structure(c(1484438400, 1487030400,
1490918400, 1491436800, 1492214400, 1493510400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Convert to date, create a sequence of dates between Valid_from and Valid_to for each row, and unnest:
library(tidyverse)
df %>%
mutate_at(vars(starts_with("Valid")), as.Date, "%m/%d/%Y") %>%
mutate(Date = map2(Valid_from, Valid_to, seq, by = "1 day")) %>%
unnest(Date) %>%
select(-Valid_from, -Valid_to)
# HSD_RSP Date
# <int> <date>
# 1 70 2018-01-01
# 2 70 2018-01-02
# 3 70 2018-01-03
# 4 70 2018-01-04
# 5 70 2018-01-05
# 6 70 2018-01-06
# 7 70 2018-01-07
# 8 70 2018-01-08
# 9 70 2018-01-09
#10 70 2018-01-10
# … with 21 more rows
data
df <- structure(list(HSD_RSP = c(70L, 80L), Valid_from = structure(1:2,
.Label = c("1/1/2018", "1/16/2018"), class = "factor"), Valid_to =
structure(1:2, .Label = c("1/15/2018", "1/31/2018"), class = "factor")),
class = "data.frame", row.names = c(NA, -2L))
Using Ronak Shah's data with data.table:
library(data.table)
dt <- as.data.table(df)  # df is defined under "data:" below
dt[, .(final_date = seq(as.Date(Valid_from, "%m/%d/%Y"), as.Date(Valid_to, "%m/%d/%Y"), by = "day")),
by = HSD_RSP]
HSD_RSP final_date
1: 70 2018-01-01
2: 70 2018-01-02
3: 70 2018-01-03
4: 70 2018-01-04
....
data:
df <- structure(list(HSD_RSP = c(70L, 80L), Valid_from = structure(1:2,
.Label = c("1/1/2018", "1/16/2018"), class = "factor"), Valid_to =
structure(1:2, .Label = c("1/15/2018", "1/31/2018"), class = "factor")),
class = "data.frame", row.names = c(NA, -2L))
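For context, the original error arises because seq() needs a single start and a single end, while retail$Valid_from passes the whole column at once. A base R sketch of the per-row expansion follows. The two-row retail frame mirrors the sample in the question, the m/d/Y format is an assumption taken from the answers' data, and the names pieces and daily are illustrative only.

```r
# Two-row frame mirroring the question's sample (dates assumed to be m/d/Y strings)
retail <- data.frame(
  HSD_RSP    = c(70L, 80L),
  Valid_from = c("1/1/2018", "1/16/2018"),
  Valid_to   = c("1/15/2018", "1/31/2018"),
  stringsAsFactors = FALSE
)

from <- as.Date(retail$Valid_from, format = "%m/%d/%Y")
to   <- as.Date(retail$Valid_to,   format = "%m/%d/%Y")

# Call seq() once per row so that 'from' and 'to' each have length 1
pieces <- lapply(seq_len(nrow(retail)), function(i) {
  data.frame(HSD_RSP    = retail$HSD_RSP[i],
             final_date = seq(from[i], to[i], by = "day"))
})

# Stack the per-row pieces into one long data frame
daily <- do.call(rbind, pieces)
```

Each row of retail contributes one row of daily per calendar day in its range, with the quantity repeated.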
Example Data:
structure(c(-0.0752423128397812, -0.00667756345500559, 0.127210629285125,
-0.139921096245914, 0.0652869973391721, -0.0426597532279215,
0.0900627738506856, 0.0181364458126518, 0.0655042896419282, 0.00433434751877004,
-0.0265985905707364, 0.0479551496911459), class = c("xts", "zoo"
), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1451606400,
1454284800, 1456790400, 1459468800, 1462060800, 1464739200, 1467331200,
1470009600, 1472688000, 1475280000, 1477958400, 1480550400), tzone = "UTC", tclass = "Date"), .Dim = c(12L,
1L), .Dimnames = list(NULL, "AAPL.Returns"))
How do I convert the index of an object, in this case, the Date column, into a new column labelled Date?
Edit:
> head(Stock1_returns)
AAPL.Returns
2007-01-01 -0.006489744
2007-02-01 -0.013064271
2007-03-01 0.098097127
2007-04-01 0.074157809
2007-05-01 0.214328635
2007-06-01 0.007013805
To turn an xts object into a data frame with a date column, you can use the following code. index() returns the date index of the xts object and coredata() returns the data it contains.
library(xts)
# my_xts is based on data from OP
df1 <- data.frame(Date = index(my_xts), coredata(my_xts) )
# show resulting structure
str(df1)
'data.frame': 12 obs. of 2 variables:
$ Date : Date, format: "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01" ...
$ AAPL.Returns: num -0.07524 -0.00668 0.12721 -0.13992 0.06529 ...
# outcome
df1
Date AAPL.Returns
1 2016-01-01 -0.075242313
2 2016-02-01 -0.006677563
3 2016-03-01 0.127210629
4 2016-04-01 -0.139921096
5 2016-05-01 0.065286997
6 2016-06-01 -0.042659753
7 2016-07-01 0.090062774
8 2016-08-01 0.018136446
9 2016-09-01 0.065504290
10 2016-10-01 0.004334348
11 2016-11-01 -0.026598591
12 2016-12-01 0.047955150
I have a large dataset containing multiple groups of IDs with Start and Stop datetimes. Within each group, I am trying to identify where a subgroup occurs. A subgroup is formed when two IDs overlap in their START and END datetime columns. Below is a script that creates a sample dataset in R for one group. Within each group, I want to create a column called "Grp" that labels the subgroups with overlapping START and END datetimes.
What I have...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
What I want is...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1508379300,
1508363100, 1490918400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1509031800,
1509062400, 1492247700), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Grp = c(1,2,2,1)), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END","Grp"))
I've tried using lubridate's interval, and finding an overlap that way, but no luck. Any help would be greatly appreciated.
After sorting by START, the condition for a new group is that the END of the previous row is less than the START of the current row:
head(df1$END, -1) < tail(df1$START,-1)
df1 <- structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
df1
ID START END
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00
df1a <- df1[ order(df1$START), ]
# compare within the sorted copy (df1a), not the original df1
df1a$grp <- cumsum( c( 1, head(df1a$END, -1) < tail(df1a$START,-1) ))
df1a
#---------------
ID START END grp
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00 1
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00 1
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00 2
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00 2
Here's a function that answers the first part of my response to the comment below:
grp_overlaps <- function(endings, beginnings){
cumsum(c( 1, head(endings, -1) < tail(beginnings, -1) )) }
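One caveat with comparing only adjacent rows: if a long interval contains a short one, a later interval that overlaps the long one but not the short one starts a new group. A possible variation, not part of the answer above, compares each START against the running maximum of all earlier ENDs. The function name grp_running_max is hypothetical, and the numeric inputs in the usage lines stand in for datetimes.

```r
# Hypothetical variant: a new group starts only when START exceeds every earlier END
grp_running_max <- function(start, end) {
  ord <- order(start)              # process intervals in order of START
  s <- as.numeric(start[ord])
  e <- as.numeric(end[ord])
  reach <- cummax(e)               # furthest END seen so far
  is_new <- c(TRUE, tail(s, -1) > head(reach, -1))
  grp <- integer(length(s))
  grp[ord] <- cumsum(is_new)       # write labels back in the original row order
  grp
}

# Interval 1 (1 to 20) contains interval 2 (2 to 3) and overlaps interval 3 (5 to 6)
nested_grp  <- grp_running_max(c(1, 2, 5, 30), c(20, 3, 6, 40))
shuffle_grp <- grp_running_max(c(30, 1, 5, 2), c(40, 20, 6, 3))
```

On the first example the adjacent-row comparison would produce groups 1, 1, 2, 3, whereas the running maximum keeps the first three intervals together. The function also accepts POSIXct vectors, since as.numeric converts them to seconds.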
I've got two xts time series. A small sample of them:
ts1
[,1]
2009-05-06 00:00:00 38.414
2009-05-06 00:15:00 45.079
2009-05-06 00:30:00 38.878
2009-05-06 00:45:00 49.889
2009-05-06 01:00:00 41.270
2009-05-06 01:15:00 41.050
2009-05-06 01:30:00 38.951
2009-05-06 01:45:00 39.854
2009-05-06 02:00:00 37.803
2009-05-06 02:15:00 42.930
ts2
[,1]
2009-05-06 00:00:00 406.887
2009-05-06 00:15:00 413.298
2009-05-06 00:30:00 409.353
2009-05-06 00:45:00 412.312
2009-05-06 01:00:00 409.353
2009-05-06 01:15:00 415.271
2009-05-06 01:30:00 416.257
2009-05-06 01:45:00 416.257
2009-05-06 02:00:00 416.257
2009-05-06 02:15:00 419.216
Now I want to create a scatterplot of ts1 against ts2. According to the documentation on CRAN (and an example I found on Stack Overflow that does it the same way), it should work like this: plot(ts1, ts2). But I get an error.
plot(ts1,ts2)
# Error in plot(xycoords$x, xycoords$y, type = type, axes = FALSE, ann = FALSE, :
# object 'xycoords' not found
What's going wrong? It works great with normal ts objects and the ~ sign, but this doesn't work with xts. I also tried plot(ts1[, 1], ts2[, 1]).
The easiest thing to do is to call plot.zoo directly, instead of allowing the plot generic to dispatch to plot.xts.
library(xts)  # also attaches zoo, which provides plot.zoo
ts1 <-
structure(c(38.414, 45.079, 38.878, 49.889, 41.27, 41.05, 38.951,
39.854, 37.803, 42.93), .Dim = c(10L, 1L), index = structure(c(1241586000,
1241586900, 1241587800, 1241588700, 1241589600, 1241590500, 1241591400,
1241592300, 1241593200, 1241594100), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")
ts2 <-
structure(c(406.887, 413.298, 409.353, 412.312, 409.353, 415.271,
416.257, 416.257, 416.257, 419.216), .Dim = c(10L, 1L),
index = structure(c(1241586000, 1241586900, 1241587800, 1241588700,
1241589600, 1241590500, 1241591400, 1241592300, 1241593200, 1241594100),
tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
plot.zoo(ts1, ts2)