I am subtracting dates in xts i.e.
library(xts)
# make data
x <- data.frame(x = 1:4,
BDate = c("1/1/2000 12:00","2/1/2000 12:00","3/1/2000 12:00","4/1/2000 12:00"),
CDate = c("2/1/2000 12:00","3/1/2000 12:00","4/1/2000 12:00","9/1/2000 12:00"),
ADate = c("3/1/2000","4/1/2000","5/1/2000","10/1/2000"),
stringsAsFactors = FALSE)
x$ADate <- as.POSIXct(x$ADate, format = "%d/%m/%Y")
# object we will use
xxts <- xts(x[, 1:3], order.by= x[, 4] )
#### The subtractions
# answer in days
transform(xxts, lag = as.POSIXct(BDate, format = "%d/%m/%Y %H:%M") - index(xxts))
# answer in hours
transform(xxts, lag = as.POSIXct(CDate, format = "%d/%m/%Y %H:%M") - index(xxts))
Question: How can I standardise the result so that I always get the answer in hours? Not by multiplying the days by 24, as I will not know beforehand whether the subtraction will come out in days or hours.
Unless I can somehow check whether the result is in days, perhaps using grep and a regex, and then multiply within an if clause.
I have tried to work through this and went for the grep/regex approach, but this doesn't even keep the negative sign:
p <- transform(xxts, lag = as.POSIXct(BDate, format = "%d/%m/%Y %H:%M") - index(xxts))
library(stringr)
ind <- grep("days", p$lag)
p$lag[ind] <- as.numeric( str_extract_all(p$lag[ind], "\\(?[0-9,.]+\\)?")) * 24
p$lag
#2000-01-03 2000-01-04 2000-01-05 2000-01-10
# 36 36 36 132
I am convinced there is a more elegant solution...
OK, difftime works:
transform(xxts, lag = difftime(as.POSIXct(BDate, format = "%d/%m/%Y %H:%M"), index(xxts), units = "hours"))
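The same idea can be checked without xts. This is a minimal sketch in base R, assuming a fixed `tz = "UTC"` (my addition, so the result does not depend on the local time zone) and using the first two rows of the example data. It shows that subtraction picks its own units, while `difftime()` or `as.numeric(..., units = )` pins them:

```r
# BDate and ADate values from the first two rows of the example, parsed in UTC
b <- as.POSIXct(c("1/1/2000 12:00", "2/1/2000 12:00"),
                format = "%d/%m/%Y %H:%M", tz = "UTC")
a <- as.POSIXct(c("3/1/2000", "4/1/2000"), format = "%d/%m/%Y", tz = "UTC")

# plain subtraction lets R choose the units automatically
lag_auto <- b - a
units(lag_auto)

# ask for hours explicitly, whatever units the subtraction chose
lag_difftime <- difftime(b, a, units = "hours")
lag_hours <- as.numeric(lag_auto, units = "hours")
lag_hours
```

Because the conversion is numeric, the negative sign is kept and no string parsing is needed.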
Related
I have a data frame called 'trial'. I have combined the Date and Time columns to get a field that holds the timestamp as POSIXct. I want to set this combined timestamp as the index for my data frame 'trial'. How can I do so? I have seen similar questions on this, with no success.
The code is as follows:
trial <- read.csv("2018_05_04_h093500.csv", header=TRUE, skip = 16, sep=",")
trial$Date <- with(trial, as.POSIXct(paste(as.Date(Date, format="%Y/%m/%d"), Time)))
dtPOSIXct <- as.POSIXct(trial$Date )
dtTime <- as.numeric(dtPOSIXct - trunc(dtPOSIXct, "days"))
class(dtTime) <- "POSIXct"
If you're talking about rownames which is the R equivalent to index in pandas, they can't be POSIXct datetimes, they have to be characters.
# sample data
x <- data.frame('a' = 1:3, 'b' = c('a', 'b', 'c'))
dates <- c('2018-01-01 15:51:33', '2018-01-04 11:42:31', '2018-01-07 22:04:41')
# POSIXct values assigned as rownames are coerced to character
rownames(x) <- as.POSIXct(dates, format = '%Y-%m-%d %H:%M:%S')
print(x)
#                     a b
# 2018-01-01 15:51:33 1 a
# 2018-01-04 11:42:31 2 b
# 2018-01-07 22:04:41 3 c
print(class(rownames(x)[1]))
# [1] "character"
That said, you can still coerce them to POSIXct (or any other class) at the time of evaluation, at the cost of some overhead and code clutter:
# print the rows of x whose rownames, when coerced to POSIXct, fall after d
f <- '%Y-%m-%d %H:%M:%S'
d <- as.POSIXct('2018-01-03 00:00:00', format = f)
x[as.POSIXct(rownames(x), format = f) > d, ]
# a b
#2018-01-04 11:42:31 2 b
#2018-01-07 22:04:41 3 c
However, perhaps an easier approach would be to just have an arbitrary column effectively act as a datetime index:
x$date <- as.POSIXct(dates, format = '%Y-%m-%d %H:%M:%S')
class(x$date[1])
# [1] "POSIXct" "POSIXt"
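With the timestamps in an ordinary column, filtering and sorting work directly on POSIXct values. This is a minimal sketch, assuming a fixed `tz = "UTC"` (my addition, to keep the result independent of the machine's time zone) and a hypothetical `cutoff` variable:

```r
# same sample data, with the timestamps stored in a regular column
x <- data.frame(a = 1:3, b = c("a", "b", "c"), stringsAsFactors = FALSE)
x$date <- as.POSIXct(c("2018-01-01 15:51:33",
                       "2018-01-04 11:42:31",
                       "2018-01-07 22:04:41"),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# compare POSIXct to POSIXct, with no character round trip
cutoff <- as.POSIXct("2018-01-03 00:00:00", tz = "UTC")
after_cutoff <- x[x$date > cutoff, ]

# ordering by the column works the same way
newest_first <- x[order(x$date, decreasing = TRUE), ]
```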
I want to add hours to day specified dates. And I want the output to be in date format. I wrote the below code:
day<-as.Date(c("20-01-2016", "21-01-2016", "22-01-2016", "23-01-2016"),format="%d-%m-%Y")
hour<-c("12:00:00")
date<-as.Date(paste(day,hour), format="%d-%m-%Y %h:%m:%s")
However, this code produces NAs:
> date
[1] NA NA NA NA
How can I do this in R? I will be very glad for any help. Thanks a lot.
The below code also doesn't work:
day<-as.Date(c("20-01-2016", "21-01-2016", "22-01-2016", "23-01-2016"),format="%d-%m-%Y")
time <- "12:00:00"
x <- paste(day, time)
x1<-as.POSIXct(x, format = "%d-%m-%Y %H:%M:%S")
It still produces NAs:
> x1
[1] NA NA NA NA
You can do either of these two:
# Option 1: convert to Date first; paste() then yields "%Y-%m-%d" strings
dates <- as.Date(c("20-01-2016", "21-01-2016", "22-01-2016", "23-01-2016"), format = "%d-%m-%Y")
time <- "12:00:00"
x <- paste(dates, time)
as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S")
# Option 2: keep the original strings and parse them in their own format
dates <- c("20-01-2016", "21-01-2016", "22-01-2016", "23-01-2016")
time <- "12:00:00"
x <- paste(dates, time)
as.POSIXct(x, format = "%d-%m-%Y %H:%M:%S")
I personally find the second version simpler.
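The NAs in the question come from a format mismatch: once a string has been converted to Date, `paste()` writes it back out as year-month-day, so a day-month-year format no longer matches. This is a small sketch, assuming `tz = "UTC"` (my addition, for reproducibility):

```r
day <- as.Date("20-01-2016", format = "%d-%m-%Y")

# a Date prints in ISO order, whatever format it was parsed from
pasted <- paste(day, "12:00:00")
pasted

# the format must describe the pasted string, not the original input
bad  <- as.POSIXct(pasted, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
good <- as.POSIXct(pasted, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Note also that `as.Date()` can never keep the hour, since a Date holds whole days only, and that the conversion codes are case-sensitive (`%H:%M:%S`, not `%h:%m:%s`).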
I am having problems merging / joining data for the coming daylight savings shift. My time-vector d is supposed to be the controlling time-vector, so when I join with data with missing holes I just get NA values. This normally works brilliantly. However, during the coming '2015-10-25 02:00:00' it goes horribly wrong.
Data example:
d <- seq.POSIXt(from = as.POSIXct("2015-10-25 00:00:00", "%Y-%m-%d %H:%M:%S", tz = ""),
to = as.POSIXct("2015-10-25 23:00:00", "%Y-%m-%d %H:%M:%S", tz = ""), by = "hour")
df1 <- data.frame(Date = d, value1 = 1:25)
df2 <- data.frame(Date = as.POSIXct(format(d, "%Y-%m-%d %H:%M:%S"), tz = ""), value2 = 26:50)
require(dplyr)
df <- left_join(df1, df2, by = "Date")
df <- merge(df1, df2, by = "Date", all.x = TRUE)
Both left_join and merge give wrong results, and I am not sure what goes wrong. Well, I can see R has no idea how to handle the two repeated hours, and that is completely understandable. Both time series are POSIXct, but there is clearly some information I am missing? How can you handle this? I would prefer a base R solution.
It gets exponentially worse, if you need to do even more joins from different data-sets. I need to join 7 and it just gets worse and worse.
The correct result is:
result <- data.frame(Date = d, var1 = df1[, 2], var2 = df2[, 2])
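One possible explanation, offered as a sketch and not a definitive diagnosis: `format()` without an offset writes the repeated 02:00 hour as two identical strings, so re-parsing them collapses two distinct instants into one and the join key is no longer unique. The example below assumes the time zone "Europe/Copenhagen" (the question only uses the local zone, so this name is my assumption) and builds the second key through UTC, which has no repeated hour:

```r
zone <- "Europe/Copenhagen"  # assumed zone with a DST shift on 2015-10-25
d <- seq(as.POSIXct("2015-10-25 00:00:00", tz = zone),
         as.POSIXct("2015-10-25 23:00:00", tz = zone), by = "hour")

# local-time strings carry no offset, so the two 02:00 hours look identical
naive <- as.POSIXct(format(d, "%Y-%m-%d %H:%M:%S"), tz = zone)
n_naive <- length(unique(as.numeric(naive)))

# a round trip through UTC keeps every instant distinct
utc_key <- as.POSIXct(format(d, "%Y-%m-%d %H:%M:%S", tz = "UTC"), tz = "UTC")
n_utc <- length(unique(as.numeric(utc_key)))

df1 <- data.frame(Date = d, value1 = seq_along(d))
df2 <- data.frame(Date = utc_key, value2 = seq_along(d) + 25L)
merged <- merge(df1, df2, by = "Date", all.x = TRUE)
```

The join itself compares instants, so the differing time zone attributes of the two Date columns do not matter; only the uniqueness of the key does.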
I have a data frame with daily data. I need to bind it to hourly data, but first I need to convert it to a suitable POSIXct format. It looks like this:
set.seed(42)
df <- data.frame(
Date = seq.Date(from = as.Date("2015-01-01", "%Y-%m-%d"), to = as.Date("2015-01-29", "%Y-%m-%d"), by = "day"),
var1 = runif(29, min = 5, max = 10)
)
result <- data.frame(
Date = d <- seq.POSIXt(from = as.POSIXct("2015-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", tz = ""),
to = as.POSIXct("2015-01-29 23:00:00", "%Y-%m-%d %H:%M:%S", tz = ""), by = "hour"),
var1 = rep(df$var1, each = 24) )
However, my data is not as easy to work with as the above. I have lots of missing dates, so I need to be able to take the specific df$Date vector and convert it to a POSIXct data frame with the matching daily values.
I've looked high and low but been unable to find anything on this.
The way I went about this was to find the min and max of the dataset and deem them hour 0 and hour 23.
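# paste() (not paste0()) keeps the space between the date and the time
hourly <- data.frame(Hourly = seq(min(as.POSIXct(paste(df$Date, "00:00:00"), tz = "")), max(as.POSIXct(paste(df$Date, "23:00:00"), tz = "")), by = "hour"))
# tz = "" maps each hour back to its local date; as.Date() on POSIXct defaults to UTC
hourly[, "Var1"] <- df[match(as.Date(hourly$Hourly, tz = ""), df$Date), "var1"]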
This turns the daily values into hourly ones, with each day's var1 assigned to every hour of that day. Missing daily values should not be an issue: where there is no match, NA is filled in.
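The behaviour with a missing day can be checked on a tiny example. This is a sketch under my own assumptions: a hypothetical `daily` data frame with 2015-01-02 missing, and everything kept in `tz = "UTC"` so the date lookup does not shift across midnight:

```r
# daily data with a gap: 2015-01-02 is missing
daily <- data.frame(Date = as.Date(c("2015-01-01", "2015-01-03")),
                    var1 = c(5.5, 7.25))

# full hourly grid from hour 0 of the first day to hour 23 of the last
hours <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2015-01-03 23:00:00", tz = "UTC"), by = "hour")

# look up each hour's calendar date in the daily table; no match gives NA
hourly <- data.frame(Hourly = hours,
                     var1 = daily$var1[match(as.Date(hours, tz = "UTC"), daily$Date)])
```

Each present day contributes 24 identical values, and the missing day contributes 24 NAs.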
I'm currently struggling with a beginner's issue regarding the calculation of a time difference between two events.
I want to take a column consisting of date and time (both values in one column) into consideration and calculate a time difference with the value of the previous/next row with the same ID (A or B in this example).
ID = c("A", "A", "B", "B")
time = c("08.09.2014 10:34","12.09.2014 09:33","13.08.2014 15:52","11.09.2014 02:30")
d = data.frame(ID,time)
My desired output is in the format Hours:Minutes
time difference = c("94:59","94:59","682:38","682:38")
The format Days:Hours:Minutes or anything similar would also work, as long as it could be conveniently implemented. I am flexible regarding the format of the output, the above is just an idea that crossed my mind.
For each single ID, I always have two rows (in the example 2xA and 2xB). I don't have a convincing idea how to avoid the repetition of the difference.
I've tried some examples before, which I found on stackoverflow. Most of them used POSIXt and strptime. However, I didn't manage to apply those ideas to my data set.
Here's my attempt using dplyr
library(dplyr)
d %>%
mutate(time = as.POSIXct(time, format = "%d.%m.%Y %H:%M")) %>%
group_by(ID) %>%
mutate(diff = paste0(gsub("[.].*", "", diff(time)*24), ":",
round(as.numeric(gsub(".*[.]", ".", diff(time)*24))*60)))
# Source: local data frame [4 x 3]
# Groups: ID
#
# ID time diff
# 1 A 2014-09-08 10:34:00 94:59
# 2 A 2014-09-12 09:33:00 94:59
# 3 B 2014-08-13 15:52:00 682:38
# 4 B 2014-09-11 02:30:00 682:38
A very (to me) hack-ish base solution:
ID <- c("A", "A", "B", "B")
time <- c("08.09.2014 10:34", "12.09.2014 09:33", "13.08.2014 15:52","11.09.2014 02:30")
d <- data.frame(ID, time)
d$time <- as.POSIXct(d$time, format="%d.%m.%Y %H:%M")
unlist(unname(lapply(split(d, d$ID), function(d) {
sapply(abs(diff(c(d$time[2], d$time))), function(x) {
sprintf("%s:%s", round(((x*24)%/%1)), round(((x*24)%%1 *60)))
})
})))
## [1] "94:59" "94:59" "682:38" "682:38"
I have to believe this function exists somewhere already, tho.
Similar to the attempts of David and hrmbrmstr, I found that this solution using difftime works.
I use a rowShift function I found on Stack Overflow:
rowShift <- function(x, shiftLen = 1L) {
r <- (1L + shiftLen):(length(x) + shiftLen)
r[r<1] <- NA
return(x[r])
}
d$time.c <- as.POSIXct(d$time, format = "%d.%m.%Y %H:%M")
d$time.prev <- rowShift(d$time.c,-1)
d$diff <- difftime(d$time.c,d$time.prev, units="hours")
The rows of d$diff alternate between positive and negative values. I remove all the rows with negative values, which leaves the difference between the first and the last time for every ID.
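A variant that avoids both the string surgery and the row removal is to compute the span per ID in minutes and format it with integer division. This is a sketch, not any of the answerers' own method; it assumes `tz = "UTC"` (my addition) and uses base R's `ave()` to repeat the per-ID value on every row:

```r
d <- data.frame(ID = c("A", "A", "B", "B"),
                time = c("08.09.2014 10:34", "12.09.2014 09:33",
                         "13.08.2014 15:52", "11.09.2014 02:30"),
                stringsAsFactors = FALSE)
d$time <- as.POSIXct(d$time, format = "%d.%m.%Y %H:%M", tz = "UTC")

# minutes between the first and last timestamp of each ID, repeated per row
mins <- ave(as.numeric(d$time), d$ID,
            FUN = function(t) (max(t) - min(t)) / 60)

# whole hours, then the remaining minutes zero-padded to two digits
d$diff <- sprintf("%d:%02d",
                  as.integer(mins %/% 60),
                  as.integer(round(mins %% 60)))
d$diff
```

Working in a fixed unit (seconds converted to minutes) sidesteps the automatic unit choice of `diff()` on date-times.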