I'm trying to work out how to find the date (which does not exist in the table) based on the time alone.
Example (remember, I only have the time):
time = c("9:44","15:30","23:48","00:30","05:30", "15:30", "22:00", "00:45")
I know for the fact that the start date is 2014-08-28, but how do I get the date which changes after midnight.
Expected outcome would be
9:44 2014-08-28
15:30 2014-08-28
23:48 2014-08-28
00:30 2014-08-29
05:30 2014-08-29
15:30 2014-08-29
22:00 2014-08-29
00:45 2014-08-30
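For reference, the core idea used by all the R answers below: whenever a time is smaller than its predecessor, midnight was crossed, so a running day offset increases by one. A minimal base-R sketch of this (no packages):
# parse the clock times as minutes since midnight
mins <- as.numeric(as.difftime(time, format = "%H:%M", units = "mins"))
# each decrease marks a midnight crossing; the cumulative count is the day offset
as.Date("2014-08-28") + cumsum(c(0, diff(mins) < 0))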
Here's an example using the data.table package's ITime class, which lets you manipulate times (once converted to this class you can add or subtract minutes, hours, etc.):
library(data.table)
time <- as.ITime(time)
Date <- as.IDate("2014-08-28") + c(0, cumsum(diff(time) < 0))
data.table(time, Date)
# time Date
# 1: 09:44:00 2014-08-28
# 2: 15:30:00 2014-08-28
# 3: 23:48:00 2014-08-28
# 4: 00:30:00 2014-08-29
# 5: 05:30:00 2014-08-29
# 6: 15:30:00 2014-08-29
# 7: 22:00:00 2014-08-29
# 8: 00:45:00 2014-08-30
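As an aside, ITime stores times as integer seconds, so plain arithmetic works directly; a quick illustration:
as.ITime("09:44") + 3600 # add one hour
# [1] "10:44:00"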
Using the chron package, we assume that a later time is on the same day and an earlier time is on the next day:
library(chron)
date <- as.Date("2014-08-28") + cumsum(c(0, diff(times(paste0(time, ":00"))) < 0))
data.frame(time, date)
giving:
time date
1 9:44 2014-08-28
2 15:30 2014-08-28
3 23:48 2014-08-28
4 00:30 2014-08-29
5 05:30 2014-08-29
6 15:30 2014-08-29
7 22:00 2014-08-29
8 00:45 2014-08-30
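To see how the day-offset trick works, here is the same time vector taken apart step by step (illustrative):
tt <- times(paste0(time, ":00"))
diff(tt) < 0 # TRUE exactly where the clock wraps past midnight
cumsum(c(0, diff(tt) < 0)) # running count of midnights crossed
# [1] 0 0 0 1 1 1 1 2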
Here's one way to do it in base R:
time = c("9:44","15:30","23:48","00:30","05:30", "15:30", "22:00", "00:45")
times <- sapply(strsplit(time, ":", TRUE), function(x) Reduce("+", as.numeric(x) * c(3600, 60))) # seconds since midnight
as.POSIXct("2014-08-28") + times + 60*60*24*cumsum(c(0, diff(times) < 0))
# [1] "2014-08-28 09:44:00 CEST" "2014-08-28 15:30:00 CEST" "2014-08-28 23:48:00 CEST" "2014-08-29 00:30:00 CEST" "2014-08-29 05:30:00 CEST" "2014-08-29 15:30:00 CEST" "2014-08-29 22:00:00 CEST" "2014-08-30 00:45:00 CEST"
You can concatenate the system date with a time string to get the result. For example, in Oracle we can build a date with time as:
to_char(sysdate,'DD-MM-RRRR')|| ' ' || to_char(sysdate,'HH:MIAM')
This results in e.g. 12-09-2015 09:50 AM.
For your requirement, use:
to_char(sysdate,'DD-MM-RRRR')|| ' 00:45' and so on.
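The same idea translated to R, for comparison (a sketch): paste today's date onto the given clock time and let as.POSIXct parse it.
as.POSIXct(paste(Sys.Date(), "00:45"))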
Related
How do you set 0:00 as the end of day instead of 23:00 in hourly data? I'm struggling with this while using period.apply or to.period, as both return days ending at 23:00. Here is an example:
x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:00:00"), by="hour"), x = rnorm(120))
The following calls both show periods ending at 23:00:
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]
So when I am aggregating the hourly data to daily, does someone have an idea how to set the end of day at 0:00?
As already pointed out by another answer here, to.period on days computes on the data with timestamps between 00:00:00 and 23:59:59.9999999 on the day in question. So 23:00:00 is seen as the last timestamp in your data, and 00:00:00 corresponds to a value in the next day's "bin".
What you can do is shift all the timestamps back one hour, use to.period to get the daily data points from the hourly points, and then use align.time to get the timestamps aligned correctly.
(More generally, to.period is useful for generating OHLCV-type data, so if you're generating, say, hourly bars from ticks, it makes sense to look at all the ticks between 23:00:00 and 23:59:59.99999 when creating a bar; 00:00:00 to 00:59:59.9999 would then form the next hourly bar, and so on.)
Here is an example:
> tail(x1["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -1.2760349
# 2018-02-01 19:00:00 -0.1496041
# 2018-02-01 20:00:00 -0.5989614
# 2018-02-01 21:00:00 -0.9691905
# 2018-02-01 22:00:00 -0.2519618
# 2018-02-01 23:00:00 -1.6081656
> head(x1["2018-02-02"])
# [,1]
# 2018-02-02 00:00:00 -0.3373271
# 2018-02-02 01:00:00 0.8312698
# 2018-02-02 02:00:00 0.9321747
# 2018-02-02 03:00:00 0.6719425
# 2018-02-02 04:00:00 -0.5597391
# 2018-02-02 05:00:00 -0.9810128
> head(x1["2018-02-03"])
# [,1]
# 2018-02-03 00:00:00 2.3746424
# 2018-02-03 01:00:00 0.8536594
# 2018-02-03 02:00:00 -0.2467268
# 2018-02-03 03:00:00 -0.1316978
# 2018-02-03 04:00:00 0.3079848
# 2018-02-03 05:00:00 0.2445634
x2 <- x1
.index(x2) <- .index(x1) - 3600
> tail(x2["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -0.1496041
# 2018-02-01 19:00:00 -0.5989614
# 2018-02-01 20:00:00 -0.9691905
# 2018-02-01 21:00:00 -0.2519618
# 2018-02-01 22:00:00 -1.6081656
# 2018-02-01 23:00:00 -0.3373271
x.d2 <- to.period(x2, OHLC = FALSE, drop.date = FALSE, period = "days")
> x.d2
# [,1]
# 2018-01-31 23:00:00 0.12516594
# 2018-02-01 23:00:00 -0.33732710
# 2018-02-02 23:00:00 2.37464235
# 2018-02-03 23:00:00 0.51797747
# 2018-02-04 23:00:00 0.08955208
# 2018-02-05 22:00:00 0.33067734
x.d2 <- align.time(x.d2, n = 86400)
> x.d2
# [,1]
# 2018-02-01 0.12516594
# 2018-02-02 -0.33732710
# 2018-02-03 2.37464235
# 2018-02-04 0.51797747
# 2018-02-05 0.08955208
# 2018-02-06 0.33067734
Want to convince yourself? Try something like this:
x3 <- rbind(x1, xts(x = matrix(c(1,2), nrow = 2), order.by = as.POSIXct(c("2018-02-01 23:59:59.999", "2018-02-02 00:00:00"))))
x3["2018-02-01 23/2018-02-02 01"]
# [,1]
# 2018-02-01 23:00:00.000 -1.6081656
# 2018-02-01 23:59:59.999 1.0000000
# 2018-02-02 00:00:00.000 -0.3373271
# 2018-02-02 00:00:00.000 2.0000000
# 2018-02-02 01:00:00.000 0.8312698
x3.d <- to.period(x3, OHLC = FALSE, drop.date = FALSE, period = "days")
> x3.d <- align.time(x3.d, 86400)
> x3.d
[,1]
2018-02-02 1.00000000
2018-02-03 -0.09832625
2018-02-04 -0.65075506
2018-02-05 -0.09423664
2018-02-06 0.33067734
See that the value of 2 stamped at 2018-02-02 00:00:00 did not become the last observation of the day bin stamped 2018-02-02, which covers 2018-02-01 00:00:00 through 2018-02-01 23:59:59.9999; the value of 1 at 23:59:59.999 did.
Of course, if you want the daily timestamp to be the start of the day rather than the end (i.e. 2018-02-01 as the stamp of the first row in x3.d above), you can shift the index back by one day. You can do this relatively safely for most time zones when your data doesn't involve weekend dates:
index(x3.d) = index(x3.d) - 86400
I say relatively safely because there are corner cases when a time zone shifts its clocks, e.g. daylight saving time. Simply subtracting 86400 can be a problem when going from Sunday to Saturday in time zones where a daylight saving transition occurs:
# e.g. bad: a daylight saving transition occurs on this weekend for US Eastern time
z <- xts(x = 9, order.by = as.POSIXct("2018-03-12", tz = "America/New_York"))
> index(z) - 86400
[1] "2018-03-10 23:00:00 EST"
i.e. the timestamp is off by one hour when you really want the midnight timestamp (00:00:00).
You could get around this problem using something much safer like this:
library(lubridate)
# right
> index(z) - days(1)
[1] "2018-03-11 EST"
I don't think this is possible because 00:00 is the start of the day. From the manual:
These endpoints are aligned in POSIXct time to the zero second of the day at the beginning, and the 59.9999th second of the 59th minute of the 23rd hour of the final day
I think the solution here is to use minutes instead of hours. Using your example:
x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:59:00"), by="min"), x = rnorm(7200))
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]
Could you please tell me how to rearrange the datetimes of data set A to be compatible with the datetimes of data set B (which are in GMT+10)?
Thank you.
**data set A**
sitecode status start end
ANS0009 spike 11/09/2013 04:45:00 PM (GMT+11) 11/09/2013 05:00:00 PM (GMT+11)
ARM0064 spike 05/03/2014 11:00:00 AM (GMT+10) 05/03/2014 11:15:00 AM (GMT+10)
BAS0059 dry 13/01/2013 00:00:00 AM (GMT+11) 29/03/2013 11:45:00 PM (GMT+11)
BAS0059 spike 11/03/2014 10:15:00 AM (GMT+10) 11/03/2014 10:30:00 AM (GMT+10)
BLC0097 failure 12/20/2012 05:00:00 PM (GMT+11) 12/31/2012 11:45:00 PM (GMT+11)
BLC0097 spike 24/12/2015 04:59:45 PM (GMT+10) 24/12/2015 05:01:50 PM (GMT+10)
**data set B**
sitecode status start end
EUM0056 record 2012-12-01 11:00:00 2013-10-06 01:45:00
EUM0056 missing 2013-10-06 01:45:00 2013-10-06 03:00:00
EUM0056 record 2013-10-06 03:00:00 2014-03-11 20:15:00
MDL0026 record 2012-12-07 11:00:00 2013-04-04 19:45:00
MDL0026 missing 2013-04-04 19:45:00 2014-02-27 23:00:00
MDL0026 record 2014-02-27 23:00:00 2014-10-05 01:45:00
We could use lubridate to parse multiple formats after splitting each string in two to remove the (GMT+...) part.
library(lubridate)
library(stringr)
v1 <- strsplit(str1, "\\s+(?=\\()", perl = TRUE)[[1]]
parse_date_time(v1[1], c("%d/%m/%Y %I:%M:%S %p", "%m/%d/%Y %I:%M:%S %p"),
                tz = "GMT", exact = TRUE) + lubridate::hours(str_extract(v1[2], "\\d+"))
#[1] "2013-09-12 03:45:00 GMT"
Using the full dataset example:
datA[c("start", "end")] <- lapply(datA[c("start", "end")], function(x) {
  m1 <- do.call(rbind, strsplit(x, "\\s+(?=\\()", perl = TRUE))
  parse_date_time(m1[, 1], c("%d/%m/%Y %I:%M:%S %p", "%m/%d/%Y %I:%M:%S %p"),
                  tz = "GMT", exact = TRUE) +
    lubridate::hours(str_extract(m1[, 2], "\\d+"))
})
data
str1 <- "11/09/2013 04:45:00 PM (GMT+11)"
require(lubridate)
exampleA <- c("11/09/2013 04:45:00 PM (GMT+11)",
"11/09/2013 04:45:00 PM (GMT+10)")
exampleA <- as.data.frame(exampleA)
exampleA$flag <- 0
exampleA$flag[grep(" PM \\(GMT\\+11\\)", exampleA$exampleA)] <- 1
exampleA$exampleA <- gsub(" PM \\(GMT\\+11\\)","", exampleA$exampleA)
exampleA$exampleA <- gsub(" PM \\(GMT\\+10\\)","", exampleA$exampleA)
exampleA$exampleA <- mdy_hms(exampleA$exampleA)
exampleA$exampleA[exampleA$flag == 1] <- exampleA$exampleA[exampleA$flag == 1] - 3600
exampleB <- c("2013-11-09 03:45:00", "2013-11-09 04:45:00")
exampleB <- ymd_hms(exampleB)
# Proof it works
exampleA$exampleA == exampleB
[1] TRUE TRUE
If you have a mix of formats in one data set (i.e. mdy, dmy, etc.) you can deal with this by using if statements -- either in a function you can apply, or in a for loop -- and test whether a certain position has a value > 12 to determine the format, then use the appropriate lubridate function to convert it.
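A minimal sketch of that idea (the example strings are hypothetical; the position test here reads the field before the first slash):
library(lubridate)
x <- c("13/01/2013 10:00:00 AM", "12/20/2012 05:00:00 PM")
first <- as.numeric(sub("/.*", "", x)) # value before the first "/"
out <- parse_date_time(x, "%m/%d/%Y %I:%M:%S %p", exact = TRUE) # default guess: m/d/Y (non-matching elements become NA with a warning)
out[first > 12] <- parse_date_time(x[first > 12], "%d/%m/%Y %I:%M:%S %p", exact = TRUE) # a first field > 12 cannot be a month
out
# [1] "2013-01-13 10:00:00 UTC" "2012-12-20 17:00:00 UTC"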
The problem:
I have two dataframes that I would like to merge depending on the date/time of one dataframe being in the interval of the other dataframe.
traffic: Date and Time (POSIXct), Frequency
mydata: Interval, Sum of Frequency
I would now like to check whether the POSIXct time from traffic lies within an interval of mydata and, if it does, add up the frequencies in the "Sum of Frequencies" column of mydata.
The two problems, that I encountered:
1. The traffic data frame has significantly more rows than mydata. I don't know how to tell R to loop through every observation in traffic to check against one row of mydata.
2. There can be more than one observation fitting in the frequency interval of mydata. I want R to add up all frequencies of the different traffic observations to get a total score of frequencies. Also, the intervals are overlapping.
Here is the data:
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)
traffic <- data.frame(DateTime, Frequency)
library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00", "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00", "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- as.interval(DateTime1, DateTime2)
mydata$SumFrequency <- NA
The expected outcome should be something like this:
mydata$SumFrequency <- c(24, 2, 2)
head(mydata)
I tried int_overlaps from the lubridate package.
Any tips on how to solve this are highly appreciated!
A short solution with foverlaps from the data.table package (foverlaps joins on intervals, so traffic below gets a zero-width interval with start = end = DateTime):
library(data.table)
mydata <- data.table(DateTime1, DateTime2, key = c("DateTime1", "DateTime2"))
traffic <- data.table(start = DateTime, end = DateTime, Frequency, key = c("start","end"))
foverlaps(traffic, mydata, type="within", nomatch=0L)[, .(sumFreq = sum(Frequency)),
by = .(DateTime1, DateTime2)]
which gives:
DateTime1 DateTime2 sumFreq
1: 2014-11-01 04:00:00 2014-11-01 04:15:00 24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2
A data.table approach, using %between% to filter the traffic dataset on time:
setDT(traffic)
setDT(mydata)
mydata[,SumFrequency := as.numeric(SumFrequency)] # coerce logical to numeric for next step.
mydata[,SumFrequency := sum( traffic[ DateTime %between% c(DateTime1, DateTime2), Frequency] ), by=1:nrow(mydata)]
which gives:
DateTime1 DateTime2 Interval SumFrequency
1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 2
If there are a lot of rows in mydata, it could be better to create an index column and use it in the by clause:
mydata[, idx := .I]
mydata[, SumFrequency := sum( traffic[DateTime %between% c(DateTime1, DateTime2),Frequency] ),by=idx]
And this gives:
DateTime1 DateTime2 Interval SumFrequency idx
1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 24 1
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2 2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 2 3
I see two solutions:
With data.frame and plyr
You could do it using the %within% function from lubridate, either with a for loop or with plyr looping functions like dlply:
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)
traffic <- data.frame(DateTime, Frequency)
library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00", "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00", "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- as.interval(DateTime1, DateTime2)
library(plyr)
# Create a group-by variable
mydata$NumInt <- 1:nrow(mydata)
mydata$SumFrequency <- dlply(mydata, .(NumInt),
function(row){
sum(
traffic[traffic$DateTime %within% row$Interval, "Frequency"]
)
})
mydata
#> DateTime1 DateTime2
#> 1 2014-11-01 04:00:00 2014-11-01 04:15:00
#> 2 2015-08-01 04:03:00 2015-08-01 04:13:00
#> 3 2015-08-01 14:00:00 2015-08-01 14:15:00
#> Interval NumInt SumFrequency
#> 1 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 1 24
#> 2 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2 2
#> 3 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 3 2
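One caveat, not part of the original answer: dlply() returns a list, so SumFrequency above is a list column; wrapping the call in unlist() yields a plain numeric column:
mydata$SumFrequency <- unlist(dlply(mydata, .(NumInt), function(row){
  sum(traffic[traffic$DateTime %within% row$Interval, "Frequency"])
}))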
With data.table and the function foverlaps
data.table has implemented a function for overlapping joins that you can use in your case with a little trick.
This function is foverlaps (I use data.table 1.9.6 below)
(see How to perform join over date ranges using data.table? and this presentation).
Notice that you do not need to create an interval with lubridate:
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)
library(data.table)
traffic <- data.table(DateTime, Frequency)
library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00", "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00", "2015-08-01 14:15:00")
mydata <- data.table(DateTime1 = as.POSIXct(DateTime1), DateTime2 = as.POSIXct(DateTime2))
# Use function `foverlaps` for overlapping joins
# Here's the trick: create a dummy variable to artificially have an interval
traffic[, dummy:=DateTime]
setkey(mydata, DateTime1, DateTime2)
# do the join
mydata2 <- foverlaps(traffic, mydata, by.x=c("DateTime", "dummy"), type ="within", nomatch=0L)[, dummy := NULL][]
mydata2
#> DateTime1 DateTime2 DateTime Frequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 1
#> 2: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:03:00 2
#> 3: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:06:00 3
#> 4: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:08:00 5
#> 5: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:10:00 12
#> 6: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:12:00 1
#> 7: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:13:00 2
#> 8: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:15:00 1
#> 9: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:13:00 1
# summarise with a sum, grouping by each line of mydata
setkeyv(mydata2, key(mydata))
mydata2[mydata, .(SumFrequency = sum(Frequency)), by = .EACHI]
#> DateTime1 DateTime2 SumFrequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00 24
#> 2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2
#> 3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2
As far as point 2 is concerned, you can use aggregate, for instance:
aggData <- aggregate(Frequency ~ format(DateTime, "%Y-%m-%d %H:%M"), data = traffic, sum)
names(aggData) <- c("DateTime", "Frequency") # give the grouping column a mergeable name
This sums all frequencies in minute intervals.
And for point 1: wouldn't a merge work?
merge(x = mydata, y = aggData, by = "DateTime", all.x = TRUE)
The outer merge is explained here
Using a for loop, we could do something like this:
for(i in 1:nrow(mydata)) {
mydata$SumFrequency[i] <- sum(traffic$Frequency[traffic$DateTime %within% mydata$Interval[i]])
}
> mydata
# DateTime1 DateTime2 Interval SumFrequency
#1 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 24
#2 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2
#3 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 2
I was working with a time series dataset of hourly data. The data contained a few missing values, so I tried to create a dataframe (time_seq) with the correct time values and merge it with the original data, so that the missing values become NA.
> data
date value
7980 2015-03-30 20:00:00 78389
7981 2015-03-30 21:00:00 72622
7982 2015-03-30 22:00:00 65240
7983 2015-03-30 23:00:00 47795
7984 2015-03-31 08:00:00 37455
7985 2015-03-31 09:00:00 70695
7986 2015-03-31 10:00:00 68444
# converting the date in the data to POSIXct format.
> data$date <- format.POSIXct(data$date,'%Y-%m-%d %H:%M:%S')
# creating a dataframe with the correct sequence of dates.
> time_seq <- seq(from = as.POSIXct("2014-05-01 00:00:00"),
to = as.POSIXct("2015-04-30 23:00:00"), by = "hour")
> df <- data.frame(date=time_seq)
> df
date
8013 2015-03-30 20:00:00
8014 2015-03-30 21:00:00
8015 2015-03-30 22:00:00
8016 2015-03-30 23:00:00
8017 2015-03-31 00:00:00
8018 2015-03-31 01:00:00
8019 2015-03-31 02:00:00
8020 2015-03-31 03:00:00
8021 2015-03-31 04:00:00
8022 2015-03-31 05:00:00
8023 2015-03-31 06:00:00
8024 2015-03-31 07:00:00
# merging with the original data
> a <- merge(data,df, x.by = data$date, y.by = df$date ,all=TRUE)
> a
date value
4005 2014-07-23 07:00:00 37003
4006 2014-07-23 07:30:00 NA
4007 2014-07-23 08:00:00 37216
4008 2014-07-23 08:30:00 NA
The values I get after merging are incorrect and contain half-hourly entries. What would be the correct approach for solving this? Why is the merge result in 30-minute intervals when both my dataframes are hourly?
PS: I looked into this question: Fastest way for filling-in missing dates for data.table and followed the steps, but it didn't help.
You can use the padr package to solve this problem.
library(padr)
library(dplyr) #for the pipe operator
data %>%
pad() %>%
fill_by_value()
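For comparison, a base-R sketch of the same idea, assuming date is converted to real POSIXct on both sides first (note that format.POSIXct in the question returns character, not POSIXct):
data$date <- as.POSIXct(data$date, format = "%Y-%m-%d %H:%M:%S")
a <- merge(data, df, by = "date", all.y = TRUE) # all.y keeps every hour of time_seq, with NA where data is missing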
This is an example of my dataset:
> head(daily[,c(6,7)])->test
> head(test)
timeMin min
316 2013-05-02 13:45:00 3239
317 2013-05-03 12:30:00 3260
318 2013-05-04 12:30:00 3165
319 2013-05-05 12:30:00 3404
320 2013-05-06 12:30:00 3514
321 2013-05-07 13:15:00 3626
I need mean(timeMin) in order to know at what time of day (hour:minute) the event usually happens. I have tried this:
library(lubridate)
> test$hourMin<-paste(hour(test$timeMin),minute(test$timeMin),sep=":")
> test$hourMin <- hm(test$hourMin)
And I got this:
> head(test)
timeMin min hourMin
316 2013-05-02 13:45:00 3239 13H 45M 0S
317 2013-05-03 12:30:00 3260 12H 30M 0S
318 2013-05-04 12:30:00 3165 12H 30M 0S
319 2013-05-05 12:30:00 3404 12H 30M 0S
320 2013-05-06 12:30:00 3514 12H 30M 0S
321 2013-05-07 13:15:00 3626 13H 15M 0S
However, when I try to calculate the mean I get no result:
> mean(test$hourMin)
[1] 0
It should be straightforward, but I don't know how to do it, since I am a beginner. I would appreciate any help. Thanks
It's really not elegant, but the only way I found for now is to change the date component to the same day and compute the mean of the result. With lubridate:
time <- df$timeMin
time <- update(time, year=2000, month=1, mday=1)
mean(time)
# [1] "2000-01-01 12:50:00 CET"
Hopefully someone will provide something better...
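An alternative sketch that avoids touching the date at all: average the seconds past midnight directly (this assumes test$timeMin is POSIXct and, like the approach above, that the times don't straddle midnight):
secs <- as.numeric(format(test$timeMin, "%H")) * 3600 +
  as.numeric(format(test$timeMin, "%M")) * 60
sprintf("%02d:%02d", mean(secs) %/% 3600, mean(secs) %% 3600 %/% 60)
# [1] "12:50"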
I'm calculating seconds past midnight on 1st Jan 2013, taking the mean of that, and then adding it back to midnight on 1st Jan 2013.
I guess there are packages that can do this in a single command, but if you, like me, don't wish to rely too much on packages, this solution should work for you.
library(data.table)
timetable <- data.table(TimeMin = c("2013-05-02 13:45:00",
"2013-05-03 12:30:00",
"2013-05-04 12:30:00",
"2013-05-05 12:30:00",
"2013-05-06 12:30:00",
"2013-05-07 13:15:00")
)
timetable[, TimePastMin :=
            difftime(
              TimeMin,
              "2013-01-01 00:00:00",
              units = "secs"
            )
] # := adds the column by reference, so no reassignment is needed
meanTimePastMin <- mean(timetable[, TimePastMin])
meanTimeMin <- strptime("2013-01-01 00:00:00", "%Y-%m-%d %H:%M:%S") + meanTimePastMin
meanTimeMin
# "2013-05-05 00:50:00 IST"