problem with hour interval in time series data in r


I am new to R and I am having a problem with historical hourly electric load data that I downloaded. My goal is to make a load forecast based on an ARIMA model and/or artificial neural networks.
The problem is that the data is in the following Date-time (hourly) format:
#> DateTime Day_ahead_Load Actual_Load
#> [1,] "01.01.2015 00:00 - 01.01.2015 01:00" "6552" "6100"
#> [2,] "01.01.2015 01:00 - 01.01.2015 02:00" "6140" "5713"
#> [3,] "01.01.2015 02:00 - 01.01.2015 03:00" "5950" "5553"
I have tried to make a POSIXct object but it didn't work:
as.Date.POSIXct(DateTime, format = "%d-%m-%Y %H:%M:%S", tz="EET", usetz=TRUE)
The message I get is that it is not in an unambiguous format. I would really appreciate your feedback on this.
Thank you in advance.
Best Regards,
Iro

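You have 2 major problems. First, your DateTime column contains two date-times (the start and the end of each hour), so you need to split that column into two. Second, your format argument uses - separators while your dates use . separators, and it expects seconds (%S) that your data does not have.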
We can use separate from tidyr and mutate with across to change the columns to POSIXct.
library(dplyr)
library(tidyr)
data %>%
  as.data.frame() %>%   # separate() needs a data frame, not a matrix
  separate(DateTime, c("StartDateTime", "EndDateTime"), sep = " - ") %>%
  mutate(across(c("StartDateTime", "EndDateTime"),
                ~ as.POSIXct(.x, format = "%d.%m.%Y %H:%M", tz = "EET")))
StartDateTime EndDateTime Day_ahead_Load Actual_Load
1 2015-01-01 00:00:00 2015-01-01 01:00:00 6552 6100
2 2015-01-01 01:00:00 2015-01-01 02:00:00 6140 5713
3 2015-01-01 02:00:00 2015-01-01 03:00:00 5950 5553
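If you prefer to avoid extra packages, the same split and conversion can be sketched in base R. This also turns the load columns (stored as strings in the question) into numbers and wraps the actual load in a ts object as a starting point for ARIMA-type models. The "EET" time zone is taken from the answer above; `frequency = 24` (a daily cycle for hourly data) is an assumption, and the sketch does not handle daylight-saving gaps or duplicates.

```r
# Sample rows copied from the question (a character matrix, as printed there)
data <- rbind(
  c("01.01.2015 00:00 - 01.01.2015 01:00", "6552", "6100"),
  c("01.01.2015 01:00 - 01.01.2015 02:00", "6140", "5713"),
  c("01.01.2015 02:00 - 01.01.2015 03:00", "5950", "5553")
)
colnames(data) <- c("DateTime", "Day_ahead_Load", "Actual_Load")

# Split each interval on " - " into a start string and an end string
parts <- do.call(rbind, strsplit(data[, "DateTime"], " - ", fixed = TRUE))

load_df <- data.frame(
  Start = as.POSIXct(parts[, 1], format = "%d.%m.%Y %H:%M", tz = "EET"),
  End   = as.POSIXct(parts[, 2], format = "%d.%m.%Y %H:%M", tz = "EET"),
  # The loads arrive as character strings; models need numbers
  Day_ahead_Load = as.numeric(data[, "Day_ahead_Load"]),
  Actual_Load    = as.numeric(data[, "Actual_Load"])
)

# Hourly series with an assumed daily seasonality of 24 observations
load_ts <- ts(load_df$Actual_Load, frequency = 24)
```

Keeping both Start and End lets you check that every row really spans one hour before modelling.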

Related

How to convert date and time from UTC to local time in R?

I have a dataframe (vlinder) like the following, whereby the date and the timestamp (in UTC) are in separate columns:
date time.utc variable
1/04/2020 0:00:00 12
1/04/2020 0:05:00 54
In a first step, I combined the date and time variables into one column called dateandtime using the following code:
vlinder$dateandtime <- paste(vlinder$date, vlinder$time.utc)
which resulted in an extra column in dataframe vlinder:
date time.utc variable dateandtime
1/04/2020 0:00:00 12 1/04/2020 0:00:00
1/04/2020 0:05:00 54 1/04/2020 0:05:00
I want to convert the time of UTC into local time (which is CEST, so a time difference of 2 hours).
I tried using the following code, but I get something totally different.
vlinder$dateandtime <- as.POSIXct(vlinder$dateandtime, tz = "UTC")
vlinder$dateandtime.cest <- format(vlinder$dateandtime, tz = "Europe/Brussels", usetz = TRUE)
which results in:
date time.utc variable dateandtime dateandtime.cest
1/04/2020 0:00:00 12 0001-04-20 0001-04-20 00:17:30 LMT
1/04/2020 0:05:00 54 0001-04-20 0001-04-20 00:17:30 LMT
How can I solve this?
Many thanks!

Here's a lubridate and tidyverse answer: some data tidying, a few data type changes, and then bam. Check OlsonNames() (a base R function) for valid time zones (tz). (I'm not positive I chose the correct tz.)
library(tidyverse)
library(lubridate)
df <- read.table(header = TRUE,
text = "date time.utc variable
1/04/2020 00:00:00 12
1/04/2020 00:05:00 54")
df <- df %>%
  mutate(date = dmy(date),   # the dates are day/month/year
         datetime_utc = as_datetime(paste(date, time.utc)),   # parsed as UTC by default
         datetime_cest = with_tz(datetime_utc, tzone = 'Europe/Brussels'))   # same instant, Brussels clock time
date time.utc variable datetime_utc datetime_cest
1 2020-04-01 00:00:00 12 2020-04-01 00:00:00 2020-04-01 02:00:00
2 2020-04-01 00:05:00 54 2020-04-01 00:05:00 2020-04-01 02:05:00

By default, as.POSIXct only tries formats ordered Year-Month-Day (such as "%Y/%m/%d %H:%M:%OS"). Therefore the date 1/04/2020 is read as the 20th of April of year 1.
You just need to add your time format to as.POSIXct:
vlinder$dateandtime <- as.POSIXct(vlinder$dateandtime, tz = "UTC", format = "%d/%m/%Y %H:%M:%S")
format(vlinder$dateandtime, tz = "Europe/Brussels", usetz = TRUE)
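Note that format() returns a character vector. If you would rather keep a POSIXct column (so date arithmetic still works), a base R sketch is to copy the column and change only its "tzone" attribute, which is essentially what lubridate::with_tz() does. The sample data below is rebuilt from the question.

```r
# Sample data from the question: dates are day/month/year, times are UTC
vlinder <- data.frame(
  date = c("1/04/2020", "1/04/2020"),
  time.utc = c("0:00:00", "0:05:00"),
  variable = c(12, 54)
)

# Parse with an explicit format and time zone
vlinder$dateandtime <- as.POSIXct(paste(vlinder$date, vlinder$time.utc),
                                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Same instants, only displayed in Brussels local time (CEST on this date)
vlinder$dateandtime.cest <- vlinder$dateandtime
attr(vlinder$dateandtime.cest, "tzone") <- "Europe/Brussels"
```

Both columns hold identical instants, so comparisons and differences between them behave as expected; only printing differs.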

Converting date and hour into xts R

I have this table of consumption values. I am trying to combine the first two columns into a single date-time for an xts object.
1 01.01.2016 00:00:00 26.27724
2 01.01.2016 01:00:00 24.99182
3 01.01.2016 02:00:00 23.53261
4 01.01.2016 03:00:00 22.46478
5 01.01.2016 04:00:00 22.00291
6 01.01.2016 05:00:00 21.95708
7 01.01.2016 06:00:00 22.20354
8 01.01.2016 07:00:00 21.84416
I have tried the code below and got this error:
timestamp=format(as.POSIXct(paste(datecol,hourcol)), "%d/%m/%Y %H:%M:%S")
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
The date column is character and the hour column is a double.

If you are trying to combine the date and time values into a timestamp, you can use as.POSIXct in base R with a format that matches the day.month.year layout:
df$timestamp <- as.POSIXct(paste(df$datecol,df$hourcol),
format = "%d.%m.%Y %T", tz = "UTC")
Or using lubridate
df$timestamp <- lubridate::dmy_hms(paste(df$datecol,df$hourcol))
Or using anytime
df$timestamp <- anytime::anytime(paste(df$datecol,df$hourcol))

Associate numbers to datetime/timestamp

I have a dataframe df with a certain number of columns. One of them, ts, is timestamps:
1462147403122 1462147412990 1462147388224 1462147415651 1462147397069 1462147392497
...
1463529545634 1463529558639 1463529556798 1463529558788 1463529564627 1463529557370.
I have also at my disposal the corresponding datetime in the datetime column:
"2016-05-02 02:03:23 CEST" "2016-05-02 02:03:32 CEST" "2016-05-02 02:03:08 CEST" "2016-05-02 02:03:35 CEST" "2016-05-02 02:03:17 CEST" "2016-05-02 02:03:12 CEST"
...
"2016-05-18 01:59:05 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:16 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:24 CEST" "2016-05-18 01:59:17 CEST"
As you can see, my data frame contains data across several days. Let's say there are 3. I would like to add a column containing the number 1, 2 or 3: 1 if the row belongs to the first day, 2 for the second day, and so on.
Thank you very much in advance,
Clement

One way to do this is to keep track of total days elapsed each time the date changes, as demonstrated below.
library(lubridate)  # provides tz() and `tz<-`
# Fake data
dat = data.frame(datetime = c(seq(as.POSIXct("2016-05-02 01:03:11"),
as.POSIXct("2016-05-05 01:03:11"), length.out=6),
seq(as.POSIXct("2016-05-09 01:09:11"),
as.POSIXct("2016-05-16 02:03:11"), length.out=4)))
tz(dat$datetime) = "UTC"
Note, if your datetime column is not already in a datetime format, convert it to one using as.POSIXct.
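For example, the 13-digit values in the ts column look like milliseconds since 1970-01-01 UTC (an assumption, but it matches the CEST datetimes shown in the question). A minimal base R sketch of that conversion follows; "Europe/Paris" is just one zone that uses CEST and is an arbitrary choice.

```r
# First few values of the ts column from the question (assumed epoch milliseconds)
ts_ms <- c(1462147403122, 1462147412990, 1462147388224)

# Divide by 1000 to get seconds since the Unix epoch, interpreted in UTC
datetime <- as.POSIXct(ts_ms / 1000, origin = "1970-01-01", tz = "UTC")

# Display the same instants in a CEST zone (assumed: Europe/Paris)
local_time <- format(datetime, tz = "Europe/Paris", usetz = TRUE)
```

The first value becomes 2016-05-02 00:03:23 UTC, which is 02:03:23 CEST as in the question's datetime column.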
Now, create a new column with the day number, counting the first day in the sequence as day 1.
dat$day = c(1, cumsum(as.numeric(diff(as.Date(dat$datetime, tz="UTC")))) + 1)
dat
datetime day
1 2016-05-02 01:03:11 1
2 2016-05-02 15:27:11 1
3 2016-05-03 05:51:11 2
4 2016-05-03 20:15:11 2
5 2016-05-04 10:39:11 3
6 2016-05-05 01:03:11 4
7 2016-05-09 01:09:11 8
8 2016-05-11 09:27:11 10
9 2016-05-13 17:45:11 12
10 2016-05-16 02:03:11 15
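The numbering above counts elapsed days, so days with no data leave gaps (4, then 8). If you want the distinct days numbered consecutively (1, 2, 3, ...) as described in the question, one base R sketch is to rank the unique dates with match(). The timestamps below are made up for illustration.

```r
# Hypothetical timestamps spread over three calendar days, with a gap
datetime <- as.POSIXct(c("2016-05-02 01:03:11", "2016-05-02 15:27:11",
                         "2016-05-05 01:03:11", "2016-05-09 01:09:11"), tz = "UTC")

# Calendar day of each timestamp, computed in an explicit time zone
day_date <- as.Date(datetime, tz = "UTC")

# Position of each day among the sorted distinct days: first day -> 1, and so on
day <- match(day_date, sort(unique(day_date)))
```

Unlike the cumsum(diff()) approach, this does not require the rows to be sorted by time.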
I specified the timezone in the code above to avoid getting tripped up by potential silent shifts between my local timezone and UTC. For example, note the silent shift from my default local time zone ("America/Los_Angeles") to UTC when converting a POSIXct datetime to a date:
# Fake data
datetime = seq(as.POSIXct("2016-05-02 01:03:11"), as.POSIXct("2016-05-05 01:03:11"), length.out=6)
tz(datetime)
[1] ""
date = as.Date(datetime)
tz(date)
[1] "UTC"
data.frame(datetime, date)
datetime date
1 2016-05-02 01:03:11 2016-05-02
2 2016-05-02 15:27:11 2016-05-02
3 2016-05-03 05:51:11 2016-05-03
4 2016-05-03 20:15:11 2016-05-04 # Note day is different due to timezone shift
5 2016-05-04 10:39:11 2016-05-04
6 2016-05-05 01:03:11 2016-05-05

Divide time-series data into weekday and weekend datasets using R

I have a dataset consisting of two columns (timestamp and power):
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" ...
$ power : num 124 149 118 167 130 ...
This dataset covers one entire month. I want to create two subsets of it: one containing the weekend data and the other containing the weekday (Monday to Friday) data. In other words, one dataset should contain the Saturday and Sunday data and the other should contain the data for the remaining days. Both subsets should retain both columns. How can I do this in R?
I tried to use aggregate and split, but I am not clear how to specify the division of the dataset through the FUN parameter of aggregate.

You can do this with base R functions: first use strptime to parse the dates in the first column, then use the weekdays function.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday

Building on top of @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday

Initially I tried complex approaches using extra libraries, but in the end I came up with a basic approach in base R.
# add a day-of-week column to the existing data set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# create two empty subsets with the same columns (and column types) as df2
week_data <- df2[0, ]
weekend_data <- df2[0, ]
# weekend day names (weekdays() is locale-dependent; these assume an English locale)
weekend <- c("Saturday", "Sunday")
for (i in seq_len(nrow(df2))) {
  if (df2$day[i] %in% weekend) {
    weekend_data <- rbind(weekend_data, df2[i, ])
  } else {
    week_data <- rbind(week_data, df2[i, ])
  }
}
The resulting data frames, weekend_data and week_data, are the required subsets.
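Growing data frames row by row with rbind() gets slow for larger data, and weekday names change with the locale. A vectorized base R sketch uses the numeric day of week from as.POSIXlt() (0 = Sunday, 6 = Saturday) and split(). The one-week df2 below is hypothetical stand-in data starting on Saturday 2015-08-01.

```r
# Hypothetical hourly data: one week starting on Saturday 2015-08-01
df2 <- data.frame(
  timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 24 * 7),
  power = seq_len(24 * 7)
)

# wday is 0 for Sunday and 6 for Saturday, independent of the locale
is_weekend <- as.POSIXlt(df2$timestamp)$wday %in% c(0, 6)

# split() returns a list of data frames keyed by the grouping labels
parts <- split(df2, ifelse(is_weekend, "weekend", "weekday"))
weekend_data <- parts$weekend
week_data <- parts$weekday
```

Both subsets keep the timestamp and power columns, as the question requires.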

obtain hour from DateTime vector

I have a DateTime vector within a data.frame where the data frame is made up of 8760 observations representing hourly intervals throughout the year e.g.
2010-01-01 00:00
2010-01-01 01:00
2010-01-01 02:00
2010-01-01 03:00
and so on.
I would like to create a data.frame which has the original DateTime vector as the first column and then the hourly values in the second column e.g.
2010-01-01 00:00 00:00
2010-01-01 01:00 01:00
How can this be achieved?

Use format or strptime to extract the time information.
Create a POSIXct vector:
x <- seq(as.POSIXct("2012-05-21"), by=("+1 hour"), length.out=5)
Extract the time:
data.frame(
date=x,
time=format(x, "%H:%M")
)
date time
1 2012-05-21 00:00:00 00:00
2 2012-05-21 01:00:00 01:00
3 2012-05-21 02:00:00 02:00
4 2012-05-21 03:00:00 03:00
5 2012-05-21 04:00:00 04:00
If the input vector is a character vector, then you have to convert to POSIXct first:
Create some data
dat <- data.frame(
DateTime=format(seq(as.POSIXct("2012-05-21"), by=("+1 hour"), length.out=5), format="%Y-%m-%d %H:%M")
)
dat
DateTime
1 2012-05-21 00:00
2 2012-05-21 01:00
3 2012-05-21 02:00
4 2012-05-21 03:00
5 2012-05-21 04:00
Split time out:
data.frame(
DateTime=dat$DateTime,
time=format(as.POSIXct(dat$DateTime, format="%Y-%m-%d %H:%M"), format="%H:%M")
)
DateTime time
1 2012-05-21 00:00 00:00
2 2012-05-21 01:00 01:00
3 2012-05-21 02:00 02:00
4 2012-05-21 03:00 03:00
5 2012-05-21 04:00 04:00
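Applied to the question's setting, a base R sketch could build the full year of 8760 hourly values for 2010 and keep both the "HH:MM" text and a numeric hour (handy for grouping). Using UTC here is an assumption that avoids daylight-saving jumps in the hourly sequence.

```r
# 365 days x 24 hours of 2010, in UTC to avoid daylight-saving shifts
DateTime <- seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"), by = "hour", length.out = 8760)

hourly <- data.frame(
  DateTime = DateTime,
  Time = format(DateTime, "%H:%M"),      # text such as "01:00"
  Hour = as.POSIXlt(DateTime)$hour       # integer 0-23
)
```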

Or, more generically, without treating them as dates, you can use the following, provided that the dates and times are zero-padded consistently.
library(stringr)
df <- data.frame(DateTime = c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00"))
df <- data.frame(df, Time = str_sub(df$DateTime, -5, -1))
It depends on your needs really.
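Without stringr, the same "last five characters" idea can be sketched with base R's substr() and nchar(), again assuming consistently padded strings.

```r
DateTime <- c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00")

# Equivalent of str_sub(DateTime, -5, -1): characters nchar - 4 through nchar
Time <- substr(DateTime, nchar(DateTime) - 4, nchar(DateTime))
```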

Using lubridate
library(stringr)
library(lubridate)
library(plyr)
df <- data.frame(DateTime = c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00"))
# plyr::mutate evaluates its arguments in order, so time uses the parsed DateTime
df <- mutate(df, DateTime = ymd_hm(DateTime),
             time = str_c(str_pad(hour(DateTime), 2, side = "left", pad = "0"),
                          str_pad(minute(DateTime), 2, side = "left", pad = "0"), sep = ":"))

On a more general note, for anyone who comes here from Google and wants to group by hour:
The key here is: lubridate::hour(datetime)
See the hour() entry in the lubridate reference manual on CRAN (p. 22 at the time of writing): https://cran.r-project.org/web/packages/lubridate/lubridate.pdf
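For comparison, a base R sketch of grouping by hour uses as.POSIXlt(...)$hour in place of lubridate::hour() and aggregate() to average each hour of the day. The two days of readings are made up for illustration.

```r
# Hypothetical readings: two days of hourly power values
df <- data.frame(
  timestamp = seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"), by = "hour", length.out = 48),
  power = c(1:24, 3:26)
)

# Hour of day (0-23) for each row
df$hour <- as.POSIXlt(df$timestamp)$hour

# Mean power for each hour of the day, across both days
hourly_mean <- aggregate(power ~ hour, data = df, FUN = mean)
```

With these values each hourly mean is the hour plus 2, so the result runs from 2 (hour 0) to 25 (hour 23).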
