This is my setup: I have an Excel file with hourly electricity prices, and I want to index the prices by their hourly interval. I load the data the usual way.
library(readxl)
library(tidyverse)
rm(list = ls())
DK1 <- read_excel("DK1.xlsx")
time_index <- as.POSIXct(DK1$Datetime, format="%Y/%m/%d %H:%M:%S", tz=Sys.timezone())
test <- xts(DK1[,-1], order.by = time_index)
This is just one of many ways I've tried to index it with xts, to no avail. The index looks wrong and I do not know what to do.
UPDATE 1: dput(head(DK1))
It appears that read_excel is converting your time column into a datetime, but with all the dates set to "1899-12-31". This can be seen by running:
> str(DK1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 8760 obs. of 6 variables:
$ Date : POSIXct, format: "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01"...
$ Hours : POSIXct, format: "1899-12-31 00:00:00" "1899-12-31 01:00:00" "1899-12-31 02:00:00" "1899-12-31 03:00:00" ...
$ Datetime : chr "2019-01-01 00:00:00" "2019-01-01 01:00:00" "2019-01-01 02:00:00" "2019-01-01 03:00:00" ...
$ DK1 : num 211.5 75.2 -30.5 -74 -55.3 ...
This is more of a data import problem, and the Datetime concatenation done in Excel can be performed in R instead. Generally it's simpler to have all data manipulation performed in a single place.
library(readxl)
library(xts)
DK1 <- read_excel("DK1.xlsx")
# pasting date and time together in new column name for comparison
# note the use of strftime to remove the date information discussed earlier
DK1$Datetime2 <- paste(DK1$Date, strftime(DK1$Hours, "%H:%M:%S", tz = "UTC"))
# the format string uses - rather than the / shown in Excel, because that is how R displays the values
DK1$time_index <- as.POSIXct(DK1$Datetime, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
# filtering out the NA value at 2019-03-10 02:00:00, the hour skipped when daylight saving time began
DK1 <- DK1[!is.na(DK1$time_index), ]
DK1a <- xts(DK1[, "DK1"], order.by = DK1$time_index)
> head(DK1a)
DK1
2019-01-01 00:00:00 211.48
2019-01-01 01:00:00 75.20
2019-01-01 02:00:00 -30.47
2019-01-01 03:00:00 -74.00
2019-01-01 04:00:00 -55.33
2019-01-01 05:00:00 -93.72
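The same combination can be tried without the Excel file. The sketch below is a minimal illustration in base R, assuming a small hand-built data frame (`DK1_demo`, a hypothetical stand-in for what read_excel returned) and assuming that UTC is acceptable for the index, which sidesteps the skipped daylight-saving hour:

```r
# Hypothetical stand-in for the imported sheet: Date holds the day,
# Hours holds the time of day attached to Excel's placeholder date.
DK1_demo <- data.frame(
  Date  = as.POSIXct(rep("2019-01-01", 3), tz = "UTC"),
  Hours = as.POSIXct(c("1899-12-31 00:00:00",
                       "1899-12-31 01:00:00",
                       "1899-12-31 02:00:00"), tz = "UTC"),
  DK1   = c(211.48, 75.20, -30.47)
)

# Keep only the day from Date and only the clock time from Hours.
stamp <- paste(format(DK1_demo$Date, "%Y-%m-%d"),
               format(DK1_demo$Hours, "%H:%M:%S"))

# Parsing in UTC (an assumption) means no local hour is skipped.
DK1_demo$time_index <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

stopifnot(!anyNA(DK1_demo$time_index))
print(DK1_demo[, c("time_index", "DK1")])
```

The resulting time_index column could then be passed to order.by in xts(), as in the answer above.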
We can select the numeric column and then order.by the 'Date' column, which is already of a date-time class (POSIXct):
library(xts)
xts(DK1$DK1, order.by = DK1$Date)
As the values are already in the default format, we don't have to specify a format.
I have a data frame with a Date column, without time. I would like to convert it to a date-time format, using 00:00:00 as the time stamp, and print the time as well.
From several other posts, I gather that time formatting in R might omit midnight, so I use @ACuriousCat's solution to print the time. The simplest code I have is:
data<-c(NA,"2014-03-18","2014-04-01","2014-04-15","2014-04-28","2014-05-14")
> data
[1] NA "2014-03-18" "2014-04-01" "2014-04-15" "2014-04-28" "2014-05-14"
> data1<-format(as.POSIXct(data,tz='UTC'),"%Y-%m-%d %H:%M:%S")
> data1
[1] NA "2014-03-18 00:00:00" "2014-04-01 00:00:00" "2014-04-15 00:00:00" "2014-04-28 00:00:00"
[6] "2014-05-14 00:00:00"
Which works great! However, on my real dataset, the time will be
> data1
[1] NA "2014-03-18 01:00:00" "2014-04-01 02:00:00" "2014-04-15 02:00:00" "2014-04-28 02:00:00"
[6] "2014-05-14 02:00:00"
It looks like a time zone issue combined with a daylight saving time issue in the way my data is read or coded in R. But how can I solve that? I tried different time zones, but that didn't work. All I can do so far to solve it is:
> data1<-format(as.POSIXct(as_datetime(as.double(as.POSIXct(data)+3600)-3600),tz='UTC'),"%Y-%m-%d %H:%M:%S")
> data1
[1] NA "2014-03-18 00:00:00" "2014-04-01 00:00:00" "2014-04-15 00:00:00" "2014-04-28 00:00:00"
[6] "2014-05-14 00:00:00"
Is there a less convoluted way to code this?
It seems that in your manual check and sample the dates are character strings, whereas in your real data table / frame, where it goes wrong, the dates are probably stored as a Date column (which is then displayed in another time zone).
Here it is illustrated with dates (character) and dates2 (as.Date):
library(data.table)
data <- data.table(
dates = c(NA,"2014-03-18","2014-04-01","2014-04-15","2014-04-28","2014-05-14")
)
data[, dates2 := as.Date(dates)]
data[, datetime := format(as.POSIXct(dates, tz = "UTC"), "%m-%d-%Y %H:%M:%S")]
data[, datetime2 := format(as.POSIXct(dates2, tz = "UTC"), "%m-%d-%Y %H:%M:%S")]
str(data)
# Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# $ dates : chr NA "2014-03-18" "2014-04-01" "2014-04-15" ...
# $ dates2 : Date, format: NA "2014-03-18" "2014-04-01" "2014-04-15" ...
# $ datetime : chr NA "03-18-2014 00:00:00" "04-01-2014 00:00:00" "04-15-2014 00:00:00" ...
# $ datetime2: chr NA "03-18-2014 01:00:00" "04-01-2014 02:00:00" "04-15-2014 02:00:00" ...
# - attr(*, ".internal.selfref")=<externalptr>
Edit
If you work with a character column of dates, you can use this:
data[, dates := as.character(dates)]
data[, datetime := format(as.POSIXct(dates, tz = "UTC"), "%m-%d-%Y %H:%M:%S")]
If you have converted your dates to a Date column, you can use this:
data[, dates := as.Date(dates)]
data[, datetime := format(as.POSIXct(dates), "%m-%d-%Y %H:%M:%S", tz = "UTC")]
As format returns a string anyhow, the best solution is actually this:
data[!is.na(dates), datetime := paste(dates, "00:00:00")]
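The shift can be reproduced in base R. The following is a minimal sketch, not the asker's actual setup: the zone "Europe/Paris" is only an assumed example of a zone that is one hour ahead of UTC in winter and two hours ahead in summer, which would match the 01:00 and 02:00 values in the question. A Date converts to midnight UTC, and the clock time only moves when that instant is formatted in another zone.

```r
# A Date has no time zone; converting it gives midnight UTC.
d <- as.Date(c("2014-03-18", "2014-04-01"))
p <- as.POSIXct(d)

# Formatting in UTC shows midnight for every row.
utc_view <- format(p, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Formatting the same instants in a zone ahead of UTC shifts the clock time.
# "Europe/Paris" is only an assumed example zone.
paris_view <- format(p, "%Y-%m-%d %H:%M:%S", tz = "Europe/Paris")

print(utc_view)    # "2014-03-18 00:00:00" "2014-04-01 00:00:00"
print(paris_view)  # "2014-03-18 01:00:00" "2014-04-01 02:00:00"
```

Passing tz = "UTC" to format(), rather than only to as.POSIXct(), is what keeps the printed time at midnight here.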
Good afternoon! I have share price data in which the date and the time are stored in separate columns. I need to join them into one column.
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
I tried to do that using this function:
df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })
but I got the column of NAs.
Also I tried to do that using:
data$date_time = mdy_hm(paste(data$date, data$time))
but again I got a warning:
Warning message:
All formats failed to parse. No formats found.
Please tell me what I am doing wrong.
In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.
> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
date time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame': 2 obs. of 2 variables:
$ date: chr "1999.04.08" "1999.04.08"
$ time: chr "11:00" "12:00"
Don't forget to use str to understand the data type(s) you are dealing with. That can and will greatly influence the answer to your question. Looking at the help description of function strptime, we see the following definition:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So, let's break down your code:
df1 <- within(data,
{ timestamp = strptime(paste(date, time),
"%Y/%m/%d%H:%M:%S")
})
First, the paste function:
> paste(date[1], time[1])
[1] "1999.04.08 11:00"
This generates a character vector with the format above.
Next, the strptime command.
> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA
Okay, we see an NA. First, be sure to write format = explicitly; it may feel tedious, but it makes the call easier to read and to debug. Looking at the example in the help page, we see:
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"
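Notice that the help page distinguishes upper- and lower-case letters (for example %Y versus %y), and likewise for the month and day codes. In your case, you are trying to extract something of the following form: YYYY/mm/ddHH:MM:SS, such as 2017/11/2011:28:30. Do you see the issue now? Your strings use dots rather than slashes, have a space between the date and the time, and contain no seconds.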
Using your string extraction attempt, we modify it slightly to get the format you are looking for:
> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"
Putting it all together you get:
> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame': 2 obs. of 3 variables:
$ date : chr "1999.04.08" "1999.04.08"
$ time : chr "11:00" "12:00"
$ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
date time timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
Oh yeah, and try out the dplyr package.
library(dplyr)
> df %>%
mutate(ts = as.POSIXct(paste(date,time),
format = "%Y.%m.%d %H:%M"))
date time ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
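The difference between the two format strings can be checked side by side. This is a minimal base R sketch on a single value from the question; tz = "UTC" is an assumption added here so that the printed result does not depend on the local time zone.

```r
stamp <- paste("1999.04.08", "11:00")   # "1999.04.08 11:00"

# Format from the question: slashes, no space, and seconds that the data lack.
wrong <- strptime(stamp, format = "%Y/%m/%d%H:%M:%S", tz = "UTC")

# Format matching the data: dots, a space, hours and minutes only.
right <- strptime(stamp, format = "%Y.%m.%d %H:%M", tz = "UTC")

print(is.na(wrong))                         # TRUE
print(format(right, "%Y-%m-%d %H:%M:%S"))   # "1999-04-08 11:00:00"
```

The lubridate attempt in the question fails for a related reason: mdy_hm() expects the month first, while these strings start with the year, so ymd_hm() would be the matching parser.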
This question already has an answer here:
R: strptime() and is.na () unexpected results
(1 answer)
Closed 8 years ago.
I've encountered the following error when converting a set of dates in character format to a POSIXct object.
Example Data:
t<-c("3/11/2007 1:30", "3/11/2007 2:00", "4/11/2007 2:00")
str(t)
chr [1:3] "3/11/2007 1:30" "3/11/2007 2:00" "4/11/2007 2:00"
z<-as.POSIXct(strptime(t, format ="%m/%d/%Y %H:%M"))
z
"2007-03-11 01:30:00 MST" NA "2007-04-11 02:00:00 MDT"
str(z)
POSIXct[1:3], format: "2007-03-11 01:30:00" NA "2007-04-11 02:00:00"
My question is why is the NA returned for the second date in z? I have a dataset that contains 8 years of hourly data (from which I copied the dates above), and this NA error pops up only for dates between 3/8 - 3/14 and ONLY when the hour is 02:00:00.
I do not encounter an error if the dates are converted to POSIXlt, so that is my current work around.
Any thoughts?
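The second timestamp does not exist in your local time zone: on 2007-03-11, US daylight saving time began at 02:00, so the clocks jumped straight from 01:59 to 03:00. Try using a time zone that does not use daylight saving time: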
as.POSIXct(t, format = "%m/%d/%Y %H:%M", tz = "GMT")
## [1] "2007-03-11 01:30:00 GMT" "2007-03-11 02:00:00 GMT" "2007-04-11 02:00:00 GMT"
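To find which inputs are affected in a larger data set, the two parses can be compared. The sketch below is a rough illustration: "America/Denver" is an assumed zone name chosen to match the MST/MDT labels in the question, and the vector is renamed to stamps so that it does not mask base R's t() function.

```r
stamps <- c("3/11/2007 1:30", "3/11/2007 2:00", "4/11/2007 2:00")
fmt <- "%m/%d/%Y %H:%M"

# UTC has no daylight saving time, so every wall-clock time exists.
utc <- as.POSIXct(stamps, format = fmt, tz = "UTC")
stopifnot(!anyNA(utc))

# "America/Denver" is an assumed zone matching the MST/MDT labels above.
# How a nonexistent local time is handled can differ between platforms,
# so this only reports which inputs, if any, came back as NA.
local <- as.POSIXct(stamps, format = fmt, tz = "America/Denver")
print(stamps[is.na(local)])
```

Whether parsing in UTC is appropriate depends on how the timestamps were recorded; if they are genuine local times, the skipped hour points to a problem in the data itself.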
My data contains some date fields in this format yyyy-mm-dd
id <- c(1,2,3,4,5)
d1 <- c("2001-01-01", "1999-12-01","2007-11-13", "1995-05-01", "2013-01-07")
datadd <- data.frame(id,d1)
I need to convert date field d1 to the following format mm/dd/yyyy h:mm:ss
So the data looks like:
id d1
1 1/1/2001 0:00:00
2 12/1/1999 0:00:00
3 11/13/2007 0:00:00
4 5/1/1995 0:00:00
5 1/7/2013 0:00:00
Just use strptime (or as.Date) and format:
> format(strptime(datadd$d1, format = "%Y-%m-%d"), "%m/%d/%Y %H:%M:%S")
[1] "01/01/2001 00:00:00" "12/01/1999 00:00:00" "11/13/2007 00:00:00"
[4] "05/01/1995 00:00:00" "01/07/2013 00:00:00"
## format(as.Date(datadd$d1), "%m/%d/%Y %H:%M:%S")
I suppose you can use some gsub too if you want to remove the leading zeroes for single digit days and months.
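The gsub idea could look like the sketch below. It is one possible pattern rather than the only way to do it: it drops a single leading zero at the start of the string, after a slash, or after the space before the hour, and tz = "UTC" is an assumption added so the result does not depend on the local time zone.

```r
d1 <- c("2001-01-01", "1999-12-01", "2007-11-13")

# Zero-padded result, as in the answer above.
padded <- format(strptime(d1, format = "%Y-%m-%d", tz = "UTC"),
                 "%m/%d/%Y %H:%M:%S")

# Drop one leading zero from the month, the day, and the hour,
# but only when another digit follows it.
trimmed <- gsub("(^|/| )0(?=\\d)", "\\1", padded, perl = TRUE)

print(trimmed)
# "1/1/2001 0:00:00" "12/1/1999 0:00:00" "11/13/2007 0:00:00"
```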
The lubridate package is your friend. It's really intuitive.
## install and load the {lubridate} package
> library(lubridate)
> dt <- "1/1/2001 0:10:00"
> dt2 <- mdy_hms(dt)
> dt2
[1] "2001-01-01 00:10:00 UTC"