Join date and time - r

Good afternoon! I have share-price data in which the date and the time are stored in separate columns. I need to combine them into a single column.
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
I tried to do that using this function:
df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })
but I got the column of NAs.
Also I tried to do that using:
data$date_time = mdy_hm(paste(data$date, data$time))
but again I got an error:
Warning message:
All formats failed to parse. No formats found.
Please tell me what I am doing wrong.

In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.
> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
date time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame': 2 obs. of 2 variables:
$ date: chr "1999.04.08" "1999.04.08"
$ time: chr "11:00" "12:00"
Don't forget to use str to understand the data type(s) you are dealing with. That can and will greatly influence the answer to your question. Looking at the help description of function strptime, we see the following definition:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So, let's break down your code:
df1 <- within(data,
{ timestamp = strptime(paste(date, time),
"%Y/%m/%d%H:%M:%S")
})
First, the paste function:
> paste(date[1], time[1])
[1] "1999.04.08 11:00"
This generates a character vector with the format above.
Next, the strptime command.
> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA
Okay, we see an NA. First, get into the habit of writing format = explicitly; it may feel tedious, but it makes it obvious which argument is the format string. Looking at the examples on the help page, we see:
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"
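Notice that the help page also distinguishes upper- and lower-case letters: %Y is a four-digit year and %y a two-digit one, %m is the month and %M the minute, and %d is the day. Your format string describes input of the form YYYY/mm/ddHH:MM:SS, such as 2017/11/2011:28:30, with slashes between the date parts, no space before the hour, and seconds at the end. Your data looks like 1999.04.08 11:00 instead. Do you see the issue now?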
Using your string extraction attempt, we modify it slightly to get the format you are looking for:
> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"
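The PDT in that output is just the time zone of the machine the answer was run on. If you want the result to be the same everywhere, you can pass tz explicitly. A minimal sketch, assuming UTC and America/Los_Angeles purely as illustrative zones (the question does not say which zone the prices are quoted in):

```r
stamp <- "1999.04.08 11:00"
fmt   <- "%Y.%m.%d %H:%M"

# The same wall-clock string, interpreted in two different time zones
utc <- as.POSIXct(stamp, format = fmt, tz = "UTC")
la  <- as.POSIXct(stamp, format = fmt, tz = "America/Los_Angeles")

format(utc, "%Y-%m-%d %H:%M %Z")
format(la,  "%Y-%m-%d %H:%M %Z")

# As instants the two values differ by the zone offset (PDT is UTC-7)
hours_apart <- as.numeric(difftime(la, utc, units = "hours"))
hours_apart
```

Fixing tz up front avoids surprises when the same script runs on a machine in another zone.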
Putting it all together you get:
> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame': 2 obs. of 3 variables:
$ date : chr "1999.04.08" "1999.04.08"
$ time : chr "11:00" "12:00"
$ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
date time timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
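One thing worth noticing in the str() output is that strptime gives you a POSIXlt column. POSIXlt is a list of components under the hood, and a plain POSIXct column is usually easier to work with inside a data frame. A small sketch of the conversion, using the same two rows (tz = "UTC" is my assumption, to keep the output machine-independent):

```r
df <- data.frame(date = c("1999.04.08", "1999.04.08"),
                 time = c("11:00", "12:00"),
                 stringsAsFactors = FALSE)

# strptime() returns POSIXlt
lt <- strptime(paste(df$date, df$time), format = "%Y.%m.%d %H:%M", tz = "UTC")
class(lt)

# as.POSIXct() stores each timestamp as one number (seconds since 1970)
df$timestamp <- as.POSIXct(lt)
class(df$timestamp)

# Arithmetic on the column then works directly
gap <- diff(df$timestamp)
gap
```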
Oh yeah, and try out the dplyr package.
library(dplyr)
> df %>%
mutate(ts = as.POSIXct(paste(date,time),
format = "%Y.%m.%d %H:%M"))
date time ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
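As for the lubridate attempt in the question: mdy_hm expects month, day, year order, but the data is year, month, day, which is presumably why every format failed to parse. A sketch with ymd_hm instead, assuming lubridate is installed (the requireNamespace guard is only there so the snippet still runs where it is not):

```r
stamps <- paste(c("1999.04.08", "1999.04.08"), c("11:00", "12:00"))

if (requireNamespace("lubridate", quietly = TRUE)) {
  # ymd_hm() matches the year-month-day hour:minute order of the data
  parsed <- lubridate::ymd_hm(stamps)
  print(parsed)
}
```

ymd_hm returns POSIXct in UTC unless you pass a tz argument.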

Related

Changing date formats in R [duplicate]

I have some very simple data in R that needs to have its date format changed:
date midpoint
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
4 31/05/2011 0.7970
5 30/04/2011 0.7877
6 31/03/2011 0.7411
7 28/02/2011 0.7624
8 31/01/2011 0.7665
9 31/12/2010 0.7500
10 30/11/2010 0.7734
11 31/10/2010 0.7511
12 30/09/2010 0.7263
13 31/08/2010 0.7158
14 31/07/2010 0.7110
15 30/06/2010 0.6921
16 31/05/2010 0.7005
17 30/04/2010 0.7113
18 31/03/2010 0.7027
19 28/02/2010 0.6973
20 31/01/2010 0.7260
21 31/12/2009 0.7154
22 30/11/2009 0.7287
23 31/10/2009 0.7375
Rather than %d/%m/%Y, I would like it in the standard R format of %Y-%m-%d
How can I make this change? I have tried:
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
But that just cut off the year and added zeros to the day:
[1] "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20"
[6] "0031/03/20" "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20"
[11] "0031/10/20" "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20"
[16] "0031/05/20" "0030/04/20" "0031/03/20" "0028/02/20" "0031/01/20"
[21] "0031/12/20" "0030/11/20" "0031/10/20" "0030/09/20" "0031/08/20"
[26] "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20" "0031/03/20"
[31] "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20" "0031/10/20"
[36] "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20"
Thanks!
There are two steps here:
Parse the data. Your example is not fully reproducible: is the data in a file, or is the variable a text or factor variable? Let us assume the latter; then, if your data.frame is called X, you can do
X$newdate <- strptime(as.character(X$date), "%d/%m/%Y")
Now the newdate column is a date-time of class POSIXlt (use as.Date() with the same format instead of strptime() if you want class Date).
Format the data. That is a matter of calling format() or strftime():
format(X$newdate, "%Y-%m-%d")
A more complete example:
R> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
R> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
R> nzd$newdate <- strptime(as.character(nzd$date), "%d/%m/%Y")
R> nzd$txtdate <- format(nzd$newdate, "%Y-%m-%d")
R> nzd
date mid newdate txtdate
1 31/08/2011 0.8378 2011-08-31 2011-08-31
2 31/07/2011 0.8457 2011-07-31 2011-07-31
3 30/06/2011 0.8147 2011-06-30 2011-06-30
R>
The difference between columns three and four is the type: newdate is a date-time object (class POSIXlt, because it came from strptime) whereas txtdate is character.
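If you want to check the types yourself, class() shows them. A short sketch on the same three rows (the variable names parsed, as_date and as_text are mine, and tz = "UTC" is an assumption to keep it reproducible):

```r
nzd <- data.frame(date = c("31/08/2011", "31/07/2011", "30/06/2011"),
                  mid  = c(0.8378, 0.8457, 0.8147),
                  stringsAsFactors = FALSE)

parsed  <- strptime(nzd$date, "%d/%m/%Y", tz = "UTC")  # date-time, POSIXlt
as_date <- as.Date(nzd$date, format = "%d/%m/%Y")      # calendar date, Date
as_text <- format(as_date, "%Y-%m-%d")                 # plain character

class(parsed)
class(as_date)
class(as_text)
```

Keep the Date (or POSIXlt) version for calculations and only call format() when you need text for display or export.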
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
In the above piece of code, there are two mistakes. First, when you pass nzd$date to as.Date you do not say what format the dates are in, so it tries its default formats. If you look at the help page, ?as.Date, you will see
format
A character string. If not specified, it will try "%Y-%m-%d"
then "%Y/%m/%d" on the first non-NA element, and give an error
if neither works. Otherwise, the processing is via strptime
The second mistake: even though you would like the result in %Y-%m-%d format, inside format you wrote "%Y/%m/%d".
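To see where the odd "0031/08/20" values come from, you can look at the pieces of the mis-parsed date. This is only a sketch of the mechanism: with no format given, "%Y/%m/%d" is tried, so 31 is read as the year, 08 as the month and the first two digits of 2011 as the day.

```r
# No format: "31/08/2011" is matched against "%Y/%m/%d"
bad   <- as.Date("31/08/2011")
parts <- as.POSIXlt(bad)
c(year = parts$year + 1900, month = parts$mon + 1, day = parts$mday)

# With the format stated, the parts land where they belong
good <- as.Date("31/08/2011", format = "%d/%m/%Y")
format(good, "%Y-%m-%d")
```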
Now, the correct way of doing it is:
> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
> nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
> head(nzd)
date mid
1 2011-08-31 0.8378
2 2011-07-31 0.8457
3 2011-06-30 0.8147
You could also use the parse_date_time function from the lubridate package:
library(lubridate)
day<-"31/08/2011"
as.Date(parse_date_time(day,"dmy"))
[1] "2011-08-31"
parse_date_time returns a POSIXct object, so we use as.Date to get a date object. The first argument of parse_date_time specifies a date vector, the second argument specifies the order in which your format occurs. The orders argument makes parse_date_time very flexible.
After reading your data in via a textConnection, the following seems to work:
dat <- read.table(textConnection(txt), header = TRUE)
dat$date <- strptime(dat$date, format= "%d/%m/%Y")
format(dat$date, format="%Y-%m-%d")
> format(dat$date, format="%Y-%m-%d")
[1] "2011-08-31" "2011-07-31" "2011-06-30" "2011-05-31" "2011-04-30" "2011-03-31"
[7] "2011-02-28" "2011-01-31" "2010-12-31" "2010-11-30" "2010-10-31" "2010-09-30"
[13] "2010-08-31" "2010-07-31" "2010-06-30" "2010-05-31" "2010-04-30" "2010-03-31"
[19] "2010-02-28" "2010-01-31" "2009-12-31" "2009-11-30" "2009-10-31"
> str(dat)
'data.frame': 23 obs. of 2 variables:
$ date : POSIXlt, format: "2011-08-31" "2011-07-31" "2011-06-30" ...
$ midpoint: num 0.838 0.846 0.815 0.797 0.788 ...
This is really easy using package lubridate. All you have to do is tell R what format your date is already in. It then converts it into the standard format
nzd$date <- dmy(nzd$date)
That's it.
Using one line to convert the dates to preferred format:
nzd$date <- format(as.Date(nzd$date, format="%d/%m/%Y"),"%Y/%m/%d")
I believe that
nzd$date <- as.Date(nzd$date, format = "%d/%m/%Y")
is sufficient.
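One reason to stop at a real Date column rather than a formatted string: dates sort and subtract correctly, while dd/mm/yyyy text sorts alphabetically. A small sketch with three of the values from the question (raw and dates are just illustrative names):

```r
raw <- c("31/08/2011", "30/06/2011", "31/12/2010")

# As text, the 2010 value ends up last because "31/1" sorts after "31/0"
sort(raw)

# As Date, the order is chronological and differences are in days
dates <- as.Date(raw, format = "%d/%m/%Y")
sort(dates)
gaps <- diff(sort(dates))
gaps
```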

R: xts conversion problem (Add x's at the index row)

This is my setup: I have an excel-file with hourly electricity prices. I want to index them by the hourly interval, file here: Data. I load the data the usual way.
library(readxl)
library(tidyverse)
rm(list = ls())
DK1 <- read_excel("DK1.xlsx")
time_index <- as.POSIXct(DK1$Datetime, format="%Y/%m/%d %H:%M:%S", tz=Sys.timezone())
test <- xts(DK1[,-1], order.by = time_index)
This is just one of many ways I've tried to index it in XTS to no avail. The index row looks wrong and I do not know what to do.
UPDATE 1: dput(head(DK1))
It appears that read_excel is converting your time column into a datetime, but with all the dates set to "1899-12-31". This can be seen by running:
> str(DK1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 8760 obs. of 6 variables:
$ Date : POSIXct, format: "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01"...
$ Hours : POSIXct, format: "1899-12-31 00:00:00" "1899-12-31 01:00:00" "1899-12-31 02:00:00" "1899-12-31 03:00:00" ...
$ Datetime : chr "2019-01-01 00:00:00" "2019-01-01 01:00:00" "2019-01-01 02:00:00" "2019-01-01 03:00:00" ...
$ DK1 : num 211.5 75.2 -30.5 -74 -55.3 ...
This is more of a data import problem, and the Datetime concatenation done in Excel can be performed in R instead. It is generally simpler to do all data manipulation in one place.
library(readxl)
library(xts)
DK1 <- read_excel("DK1.xlsx")
# pasting date and time together in new column name for comparison
# note the use of strftime to remove the date information discussed earlier
DK1$Datetime2 <- paste(DK1$Date, strftime(DK1$Hours, "%H:%M:%S", tz = "UTC"))
# the format / in excel need to change to - for how it's displayed in R
DK1$time_index <- as.POSIXct(DK1$Datetime, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
# drop rows whose timestamp failed to parse (NA); here that was 2019-03-10 02:00:00, an hour skipped by the daylight-saving change in the local time zone used
DK1 <- DK1[!is.na(DK1$time_index), ]
DK1a <- xts(DK1[, "DK1"], order.by = DK1$time_index)
> head(DK1a)
DK1
2019-01-01 00:00:00 211.48
2019-01-01 01:00:00 75.20
2019-01-01 02:00:00 -30.47
2019-01-01 03:00:00 -74.00
2019-01-01 04:00:00 -55.33
2019-01-01 05:00:00 -93.72
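If dropping the daylight-saving hour is not what you want, one option is to parse the timestamps in a zone without DST. The sketch below uses base R only and a made-up three-row stand-in for the imported sheet (the column names follow the str() output, the prices are the first three shown above). Treating the times as UTC is my assumption, so check whether that suits your data.

```r
# Hypothetical stand-in for the first rows read by read_excel()
DK1 <- data.frame(
  Date  = as.POSIXct(rep("2019-01-01", 3), tz = "UTC"),
  Hours = as.POSIXct(c("1899-12-31 00:00:00", "1899-12-31 01:00:00",
                       "1899-12-31 02:00:00"), tz = "UTC"),
  DK1   = c(211.48, 75.20, -30.47)
)

# Keep the day from Date and the clock time from Hours
stamp_text <- paste(format(DK1$Date, "%Y-%m-%d", tz = "UTC"),
                    format(DK1$Hours, "%H:%M:%S", tz = "UTC"))

# UTC has no daylight-saving gaps, so no hour turns into NA
DK1$time_index <- as.POSIXct(stamp_text, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# An index for xts should have no NA and, for hourly data, no duplicates
stopifnot(!anyNA(DK1$time_index), !anyDuplicated(DK1$time_index))
DK1$time_index
```

The resulting time_index column can then be passed to order.by in xts() as in the code above.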
We can select the numeric column and then order.by the 'Date' which is already a Datetime class
library(xts)
xts(DK1$DK1, order.by = DK1$Date)
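Because the column is already a date-time class, no format string is needed. Note, however, that in the str() output above Date holds only the day, so every hour of a day would share the same index value; use a column that includes the hour if you need an hourly index.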

R: Transform factor to Datetime showing time as well

I have a factor column DATE in a dataframe a that shows dates written like this:
01/01/2012 00
It shows the day, the month, the year and the hour.
On stackoverflow I found this way to transform from factor to datetime:
a$DATE <- as.POSIXct(as.character(a$DATE), format = "%d/%m/%Y %H")
However when I try to check the dataframe by View(a) I only get to see the date without the hour. All the dates appear like this:
2012-01-01
I have also tried to specify datetime by saving the dataframe in a csv and importing it again through the Rstudio button "Import Dataset". When I specify the type by clicking on the header of the DATE column I get the same error: the hour doesn't show.
Is the method I used correct?
If yes, how can I show the hour?
If it's not possible to show the hour, how can I get the hour from the POSIXct type?
I can't seem to reproduce your issue. Could you possibly provide a complete minimal reproducible example that demonstrates it?
Here's what I got.
times <- c("01/01/2012 00", "30/11/2013 11", "17/03/2014 23")
times_factor <- as.factor(times)
times_factor
#> [1] 01/01/2012 00 30/11/2013 11 17/03/2014 23
#> Levels: 01/01/2012 00 17/03/2014 23 30/11/2013 11
foo <- as.POSIXct(times_factor, format = "%d/%m/%Y %H")
foo
#> [1] "2012-01-01 00:00:00 CET" "2013-11-30 11:00:00 CET" "2014-03-17 23:00:00 CET"
bar <- format(foo,"%d/%m/%Y %H")
bar
#> [1] "01/01/2012 00" "30/11/2013 11" "17/03/2014 23"
# install.packages(c("tidyverse"), dependencies = TRUE)
library(lubridate)
dmy_h(times_factor, quiet = T, tz = "CET")
#> [1] "2012-01-01 00:00:00 CET" "2013-11-30 11:00:00 CET" "2014-03-17 23:00:00 CET"
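On the last part of the question, getting the hour back out of a POSIXct value: the time is still stored even when a viewer chooses to display only the date, and you can ask for it explicitly. A base R sketch (tz = "UTC" is my assumption; use whatever zone your data is in):

```r
times <- c("01/01/2012 00", "30/11/2013 11", "17/03/2014 23")
foo <- as.POSIXct(times, format = "%d/%m/%Y %H", tz = "UTC")

# Force the time to be shown, whatever the default display does
format(foo, "%Y-%m-%d %H:%M:%S")

# Hour as a number, two equivalent ways
hours    <- as.integer(format(foo, "%H"))
hours_lt <- as.POSIXlt(foo)$hour
hours
```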

Split dates separately

I have a date variable
date
15APR16:00:00:04
17APR16:00:06:35
18APR16:00:05:07
18APR16:00:00:56
19APR16:00:08:07
18APR16:00:00:07
22APR16:00:03:07
I want to split the variable into two, date and time separately.
When I tried
a <- strftime(date, format="%H:%M:%S"), it is showing
Error in as.POSIXlt.default(x, tz = tz) : do not know how to
convert 'x' to class “POSIXlt”
When I check the data type, it is shown as function. How can I convert this into a date and split it into two variables?
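You get that error because date does not refer to your data at that point: base R has a function called date(), which is why the type is reported as function, and strftime cannot convert a function. Your values are also still text in a non-standard format. Refer to the column in your data frame (called dat below) and first convert it to a POSIX class with strptime: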
dat$date <- strptime(dat$date, format = '%d%b%y:%H:%M:%S')
After that you can use format to extract the time from that variable:
dat$time <- format(dat$date, "%H:%M:%S")
For extracting the date, it is preferable to use as.Date:
dat$dates <- as.Date(dat$date)
Those steps will give the following result:
> dat
date time dates
1 2016-04-15 00:00:04 00:00:04 2016-04-15
2 2016-04-17 00:06:35 00:06:35 2016-04-17
3 2016-04-18 00:05:07 00:05:07 2016-04-18
4 2016-04-18 00:00:56 00:00:56 2016-04-18
5 2016-04-19 00:08:07 00:08:07 2016-04-19
6 2016-04-18 00:00:07 00:00:07 2016-04-18
7 2016-04-22 00:03:07 00:03:07 2016-04-22
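For completeness, here is the same idea as one self-contained sketch, using as.POSIXct so the parsed column is not a POSIXlt list. The column names day and time and tz = "UTC" are my choices, and %b relies on English month abbreviations in your locale.

```r
# `date` on its own is the base R function, not your data
is.function(date)

dat <- data.frame(date = c("15APR16:00:00:04", "17APR16:00:06:35"),
                  stringsAsFactors = FALSE)

# Parse once, then derive both pieces from the parsed value
stamp    <- as.POSIXct(dat$date, format = "%d%b%y:%H:%M:%S", tz = "UTC")
dat$day  <- as.Date(stamp)
dat$time <- format(stamp, "%H:%M:%S")
dat
```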
Alternatively, you could use the lubridate package (as also shown in the other answer):
library(lubridate)
dat$date <- dmy_hms(dat$date)
Used data:
dat <- read.table(text="date
15APR16:00:00:04
17APR16:00:06:35
18APR16:00:05:07
18APR16:00:00:56
19APR16:00:08:07
18APR16:00:00:07
22APR16:00:03:07", header=TRUE, stringsAsFactors=FALSE)
Package lubridate makes converting text to dates easy
library(lubridate)
x <-dmy_hms("15APR16:00:00:04")
format(x, "%H:%M:%S") # extract time
[1] "00:00:04"
format(x, "%d-%m-%Y") # extract date
[1] "15-04-2016"

Convert date from yyyy-mm-dd to mm/dd/yyyy h:mm:ss in R

My data contains some date fields in this format yyyy-mm-dd
id <- c(1,2,3,4,5)
d1 <- c("2001-01-01", "1999-12-01","2007-11-13", "1995-05-01", "2013-01-07")
datadd <- data.frame(id,d1)
I need to convert date field d1 to the following format mm/dd/yyyy h:mm:ss
So the data looks like:
id d1
1 1/1/2001 0:00:00
2 12/1/1999 0:00:00
3 11/13/2007 0:00:00
4 5/1/1995 0:00:00
5 1/7/2013 0:00:00
Just use strptime (or as.Date) and format:
> format(strptime(datadd$d1, format = "%Y-%m-%d"), "%m/%d/%Y %H:%M:%S")
[1] "01/01/2001 00:00:00" "12/01/1999 00:00:00" "11/13/2007 00:00:00"
[4] "05/01/1995 00:00:00" "01/07/2013 00:00:00"
## format(as.Date(datadd$d1), "%m/%d/%Y %H:%M:%S")
I suppose you can use some gsub too if you want to remove the leading zeroes for single digit days and months.
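A possible gsub for that, as a sketch: it drops a single zero that sits at the start of the string, after a slash, or after the space before the hour. I am using 2007-11-13 for the third value so that it matches the expected output in the question.

```r
d1 <- c("2001-01-01", "1999-12-01", "2007-11-13", "1995-05-01", "2013-01-07")

# Zero-padded version, as produced by format()
padded <- format(as.Date(d1), "%m/%d/%Y %H:%M:%S")

# Remove one leading zero from the month, the day and the hour
trimmed <- gsub("(^|/| )0", "\\1", padded)
trimmed
```

Minutes and seconds keep their zeros because they follow a colon, which the pattern does not match.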
The lubridate package is your friend. It's really intuitive.
## install and launch the {lubridate} package
> library(lubridate)
> dt <- "1/1/2001 0:10:00"
> dt2 <- mdy_hms(dt)
> dt2
[1] "2001-01-01 00:10:00 UTC"

Resources