Convert daily (date) format to hourly (posixct) - r

I have a data frame with daily data. I need to bind it to hourly data, but first I need to convert it to a suitable POSIXct format. The desired result looks like this:
set.seed(42)
df <- data.frame(
Date = seq.Date(from = as.Date("2015-01-01", "%Y-%m-%d"), to = as.Date("2015-01-29", "%Y-%m-%d"), by = "day"),
var1 = runif(29, min = 5, max = 10)
)
result <- data.frame(
Date = d <- seq.POSIXt(from = as.POSIXct("2015-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", tz = ""),
to = as.POSIXct("2015-01-29 23:00:00", "%Y-%m-%d %H:%M:%S", tz = ""), by = "hour"),
var1 = rep(df$var1, each = 24) )
However, my real data is not this regular. Many dates are missing, so I need to be able to take the specific df$Date vector and expand it to an hourly POSIXct data frame with the matching daily values.
I've looked high and low but been unable to find anything on this.

I approached this by taking the minimum and maximum dates of the dataset and treating them as hour 0 of the first day and hour 23 of the last day.
hourly <- data.frame(Hourly = seq(min(as.POSIXct(paste(df$Date, "00:00:00"), tz = "")), max(as.POSIXct(paste(df$Date, "23:00:00"), tz = "")), by = "hour"))
hourly[, "Var1"] <- df[match(x = as.Date(hourly$Hourly, tz = ""), df$Date), "var1"]
This expands the daily values to hourly ones, with each day's var1 assigned to every hour of that day. paste() puts a space between the date and the time, and tz = "" makes as.Date() use the local calendar date (its default for POSIXct input is UTC). Missing days are not a problem: hours with no matching date get NA.
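If the gaps should stay out of the result entirely, rather than appear as NA rows, each observed day can be expanded on its own. A minimal sketch, assuming a small hypothetical data frame `daily` with two missing days and UTC as the working time zone (neither is from the original post):

```r
# Hypothetical daily data: 2015-01-03 and 2015-01-04 are missing
daily <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-02", "2015-01-05")),
  var1 = c(5.5, 6.5, 7.5)
)

# Variant 1: full hourly range, NA on the missing days
with_gaps <- data.frame(
  Hourly = seq(as.POSIXct(paste(min(daily$Date), "00:00:00"), tz = "UTC"),
               as.POSIXct(paste(max(daily$Date), "23:00:00"), tz = "UTC"),
               by = "hour")
)
with_gaps$var1 <- daily$var1[match(as.Date(with_gaps$Hourly, tz = "UTC"), daily$Date)]

# Variant 2: only the hours of the days that are actually present
observed_only <- data.frame(
  Hourly = as.POSIXct(paste(rep(daily$Date, each = 24),
                            sprintf("%02d:00:00", 0:23)), tz = "UTC"),
  var1 = rep(daily$var1, each = 24)
)
```

`with_gaps` spans every hour between the first and last date and carries NA on the missing days; `observed_only` contains only the hours of the days present in `daily`.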

Related

Turn integer into date

When I turn the date vector into a data frame, the dates turn into integers. These integers have a very strange format, and I cannot work out how to turn them back into dates (including seconds).
vdates <- seq(c(ISOdate(2018,10,01)), by = "3 sec", length.out = 200000)
dfDates <- as.data.frame((matrix(vdates)))
dfDates$id <- 1:nrow(dfDates)
colnames(dfDates) <- c("Dates", "Id")
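The numbers are seconds since 1970-01-01 UTC: matrix() strips the POSIXct class and keeps only the underlying values. A sketch of both a repair and a way to avoid the problem, shortened to five timestamps for illustration (the name `dfDirect` is made up here):

```r
vdates <- seq(ISOdate(2018, 10, 1), by = "3 sec", length.out = 5)

# matrix() drops the POSIXct class, leaving seconds since the epoch
dfDates <- as.data.frame(matrix(vdates))
colnames(dfDates) <- "Dates"

# Repair: reattach the class; ISOdate() works in GMT, so use tz = "GMT"
dfDates$Dates <- as.POSIXct(dfDates$Dates, origin = "1970-01-01", tz = "GMT")

# Avoid the problem: build the data frame directly from the vector
dfDirect <- data.frame(Dates = vdates, Id = seq_along(vdates))
```

Building the data frame directly keeps the column as POSIXct, so no conversion is needed afterwards.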

get monthly date in time series object in R as a POSIXlt

I have a time series object with the following dates:
my_data = rnorm(155)
my_data_ts <- ts(my_data, start = c(2002, 10), frequency = 12)
How do I get the date as a POSIXlt object, and the values?
my_data_date = STHGTOCONVERTTOPOSIXLT(my_data)???
my_data_values = STHGTOGETVALUES(my_data)???
To get the first day of the month, you can use:
as.POSIXlt( paste0(floor(time(my_data_ts)),'-', round(12*(time(my_data_ts)-floor(time(my_data_ts))))+1,'-01'), tz="UTC")
For the values, just use :
as.vector(my_data_ts)
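An alternative sketch that avoids arithmetic on the fractional values of time(): build the month sequence from start() and the series length. It assumes a monthly series (frequency = 12), as in the question; `month_starts` is a name made up for this example.

```r
my_data <- rnorm(155)
my_data_ts <- ts(my_data, start = c(2002, 10), frequency = 12)

# start() returns c(year, month) for a monthly series
first <- as.Date(sprintf("%d-%02d-01", start(my_data_ts)[1], start(my_data_ts)[2]))

# One first-of-month date per observation, then convert to POSIXlt
month_starts <- as.POSIXlt(seq(first, by = "month", length.out = length(my_data_ts)),
                           tz = "UTC")
my_data_values <- as.vector(my_data_ts)
```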

Merge and join with Daylight Saving

I am having problems merging / joining data across the coming daylight saving shift. My time vector d is supposed to be the controlling time vector, so that when I join it with data that has holes, I just get NA values. This normally works brilliantly. However, around the coming '2015-10-25 02:00:00' it goes horribly wrong.
Data example:
d <- seq.POSIXt(from = as.POSIXct("2015-10-25 00:00:00", "%Y-%m-%d %H:%M:%S", tz = ""),
to = as.POSIXct("2015-10-25 23:00:00", "%Y-%m-%d %H:%M:%S", tz = ""), by = "hour")
df1 <- data.frame(Date = d, value1 = 1:25)
df2 <- data.frame(Date = as.POSIXct(format(d, "%Y-%m-%d %H:%M:%S"), tz = ""), value2 = 26:50)
require(dplyr)
df <- left_join(df1, df2, by = "Date")
df <- merge(df1, df2, by = "Date", all.x = TRUE)
Both left_join and merge give wrong results, and I am not sure what goes wrong. I can see that R has no idea how to handle the two repeated hours, and that is completely understandable. Both time series are POSIXct, but there is clearly some information I am missing. How can this be handled? I would prefer a base R solution.
It gets much worse if you need to do even more joins from different data sets. I need to join 7, and each join makes it worse.
The correct result is:
result <- data.frame(Date = d, var1 = df1[, 2], var2 = df2[, 2])
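One likely cause, sketched below in base R: format() drops the UTC offset, so the two 02:00 hours of the changeover become the same string, and converting them back yields a duplicated timestamp in df2$Date. Joining on the original instants avoids this. The zone "Europe/Berlin" is an assumption made so the example behaves the same on any machine; the code above uses the local zone (tz = "").

```r
tz <- "Europe/Berlin"  # assumed zone with a changeover on 2015-10-25
d <- seq(as.POSIXct("2015-10-25 00:00:00", tz = tz),
         as.POSIXct("2015-10-25 23:00:00", tz = tz), by = "hour")
length(d)  # 25 instants, because 02:00 occurs twice on this day

# Round trip through text: both 02:00 hours map to a single instant
roundtrip <- as.POSIXct(format(d, "%Y-%m-%d %H:%M:%S"), tz = tz)
anyDuplicated(roundtrip) > 0  # TRUE

# Keep the POSIXct instants as the key instead of re-parsing strings
df1 <- data.frame(Date = d, value1 = 1:25)
df2 <- data.frame(Date = d, value2 = 26:50)
joined <- merge(df1, df2, by = "Date", all.x = TRUE)
nrow(joined)  # 25
```

When the data only exists as text without an offset, the repeated hour cannot be told apart after the fact; storing or exchanging the timestamps in UTC sidesteps that.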

Changing time zones with POSIXct time series, R

I've run into trouble working with time series and time zones in R, and I can't quite figure out how to proceed.
I have time series data like this:
df <- data.frame(
Date = seq(as.POSIXct("2014-01-01 00:00:00"), length.out = 1000, by = "hours"),
price = runif(1000, min = -10, max = 125),
wind = runif(1000, min = 0, max = 2500),
temp = runif(1000, min = - 10, max = 25)
)
Now, the Date is in UTC-time. I would like to subset/filter the data, so for example I get the values from today (Today is 2014-05-13):
df[ as.Date(df$Date) == Sys.Date(), ]
However, when I do this, I get data that starts with:
2014-05-13 02:00:00
And not:
2014-05-13 00:00:00
This is because I'm currently in CEST, which is two hours ahead of UTC. So I try to change the data:
df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H", tz = "Europe/Berlin")
Yet this doesn't work. I've tried several variations, such as converting to character and then back again, but I keep hitting a wall, and I'm guessing there is something simple I'm missing.
To avoid time zone issues like this, use format to get the character representation of the date:
df[format(df$Date,"%Y-%m-%d") == Sys.Date(), ]
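The same idea can be written with an explicit time zone, which makes the date comparison independent of the machine's settings. A small sketch with made-up timestamps (not taken from the data above):

```r
x <- as.POSIXct("2014-05-13 00:30:00", tz = "Europe/Berlin")

# as.Date() on POSIXct defaults to tz = "UTC", where it is still the 12th
utc_date   <- as.Date(x, tz = "UTC")
local_date <- as.Date(x, tz = "Europe/Berlin")
local_text <- format(x, "%Y-%m-%d")  # format() uses the object's own zone

# Changing the display zone without changing the instant
y <- as.POSIXct("2014-05-12 22:00:00", tz = "UTC")
attr(y, "tzone") <- "Europe/Berlin"
shown <- format(y, "%Y-%m-%d %H:%M")
```

Under these assumptions, a filter such as df[as.Date(df$Date, tz = "Europe/Berlin") == Sys.Date(), ] compares local calendar dates. Setting the "tzone" attribute only changes how the instants are displayed; the underlying values stay the same.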

subtracting dates with standardised result

I am subtracting dates in xts, i.e.:
library(xts)
# make data
x <- data.frame(x = 1:4,
BDate = c("1/1/2000 12:00","2/1/2000 12:00","3/1/2000 12:00","4/1/2000 12:00"),
CDate = c("2/1/2000 12:00","3/1/2000 12:00","4/1/2000 12:00","9/1/2000 12:00"),
ADate = c("3/1/2000","4/1/2000","5/1/2000","10/1/2000"),
stringsAsFactors = FALSE)
x$ADate <- as.POSIXct(x$ADate, format = "%d/%m/%Y")
# object we will use
xxts <- xts(x[, 1:3], order.by= x[, 4] )
#### The subtractions
# answer in days
transform(xxts, lag = as.POSIXct(BDate, format = "%d/%m/%Y %H:%M") - index(xxts))
# answer in hours
transform(xxts, lag = as.POSIXct(CDate, format = "%d/%m/%Y %H:%M") - index(xxts))
Question: How can I standardise the result so that I always get the answer in hours? I don't want to multiply the days by 24, because I won't know beforehand whether the subtraction will be reported in days or hours.
Unless I can somehow check whether the result is in days, perhaps using grep and a regex, and then multiply within an if clause.
I have tried to work through this and went for the grep/regex approach, but this doesn't even keep the negative sign.
p <- transform(xxts, lag = as.POSIXct(BDate, format = "%d/%m/%Y %H:%M") - index(xxts))
library(stringr)
ind <- grep("days", p$lag)
p$lag[ind] <- as.numeric( str_extract_all(p$lag[ind], "\\(?[0-9,.]+\\)?")) * 24
p$lag
#2000-01-03 2000-01-04 2000-01-05 2000-01-10
# 36 36 36 132
I am convinced there is a more elegant solution...
OK, difftime works:
transform(xxts, lag = difftime(as.POSIXct(BDate, format = "%d/%m/%Y %H:%M"), index(xxts), units = "hours"))
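The behaviour is easier to see outside xts. A minimal base R sketch, using the first row's BDate and index and assuming UTC so the result does not depend on the local zone:

```r
a <- as.POSIXct("1/1/2000 12:00", format = "%d/%m/%Y %H:%M", tz = "UTC")
b <- as.POSIXct("3/1/2000", format = "%d/%m/%Y", tz = "UTC")

# Plain subtraction picks the unit automatically (days here)
auto <- a - b
units(auto)

# difftime() with units = "hours" always reports hours and keeps the sign
hrs <- difftime(a, b, units = "hours")
as.numeric(hrs)

# An existing difftime can also be converted after the fact
as.numeric(auto, units = "hours")
```

Both conversions keep the negative sign, which the grep/regex approach loses.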
