Changing date formats in R [duplicate] - r

I have some very simple data in R that needs to have its date format changed:
date midpoint
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
4 31/05/2011 0.7970
5 30/04/2011 0.7877
6 31/03/2011 0.7411
7 28/02/2011 0.7624
8 31/01/2011 0.7665
9 31/12/2010 0.7500
10 30/11/2010 0.7734
11 31/10/2010 0.7511
12 30/09/2010 0.7263
13 31/08/2010 0.7158
14 31/07/2010 0.7110
15 30/06/2010 0.6921
16 31/05/2010 0.7005
17 30/04/2010 0.7113
18 31/03/2010 0.7027
19 28/02/2010 0.6973
20 31/01/2010 0.7260
21 31/12/2009 0.7154
22 30/11/2009 0.7287
23 31/10/2009 0.7375
Rather than %d/%m/%Y, I would like it in the standard R format of %Y-%m-%d
How can I make this change? I have tried:
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
But that just cut off the year and added zeros to the day:
[1] "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20"
[6] "0031/03/20" "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20"
[11] "0031/10/20" "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20"
[16] "0031/05/20" "0030/04/20" "0031/03/20" "0028/02/20" "0031/01/20"
[21] "0031/12/20" "0030/11/20" "0031/10/20" "0030/09/20" "0031/08/20"
[26] "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20" "0031/03/20"
[31] "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20" "0031/10/20"
[36] "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20"
Thanks!

There are two steps here:
Parse the data. Your example is not fully reproducible: is the data in a file, or is the variable a text or factor variable? Let us assume the latter; then, if your data.frame is called X, you can do
X$newdate <- strptime(as.character(X$date), "%d/%m/%Y")
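Now the newdate column should be of class POSIXlt (strptime returns a date-time, not a Date; wrap the result in as.Date() if you want class Date).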
Format the data. That is a matter of calling format() or strftime():
format(X$newdate, "%Y-%m-%d")
A more complete example:
R> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
R> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
R> nzd$newdate <- strptime(as.character(nzd$date), "%d/%m/%Y")
R> nzd$txtdate <- format(nzd$newdate, "%Y-%m-%d")
R> nzd
date mid newdate txtdate
1 31/08/2011 0.8378 2011-08-31 2011-08-31
2 31/07/2011 0.8457 2011-07-31 2011-07-31
3 30/06/2011 0.8147 2011-06-30 2011-06-30
R>
The difference between columns three and four is the type: newdate is of class POSIXlt (a date-time) whereas txtdate is character.
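A quick way to see which class each step yields is to call class() on the results. The sketch below uses a made-up vector x rather than the asker's data frame, and sets tz = "UTC" only so that the result does not depend on the local time zone:

```r
# Hypothetical sample input in day/month/year order
x <- c("31/08/2011", "31/07/2011", "30/06/2011")

# strptime() parses into a POSIXlt date-time
parsed <- strptime(x, format = "%d/%m/%Y", tz = "UTC")

# as.Date() parses into a calendar date without a time component
as_date <- as.Date(x, format = "%d/%m/%Y")

# format() turns either of them back into plain text
txt <- format(as_date, "%Y-%m-%d")

class(parsed)   # "POSIXlt" "POSIXt"
class(as_date)  # "Date"
class(txt)      # "character"
```

If the column is only ever used as a calendar date, as.Date is usually the lighter choice; POSIXlt is a list underneath and can be awkward inside a data frame.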

nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
In the above piece of code, there are two mistakes. First, when you pass nzd$date to as.Date you do not say what format the dates are in, so it falls back on its default formats. If you look at the help page, ?as.Date, you will see
format
A character string. If not specified, it will try "%Y-%m-%d"
then "%Y/%m/%d" on the first non-NA element, and give an error
if neither works. Otherwise, the processing is via strptime
The second mistake is: even though you would like the output in %Y-%m-%d format, inside format you wrote "%Y/%m/%d".
Now, the correct way of doing it is:
> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
> nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
> head(nzd)
date mid
1 2011-08-31 0.8378
2 2011-07-31 0.8457
3 2011-06-30 0.8147
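To see where the odd year in the question's output comes from, it may help to parse a single value both ways. This is a small sketch on one literal string, not the asker's full data:

```r
# With no format, as.Date() tries "%Y-%m-%d" and then "%Y/%m/%d".
# "31/08/2011" fits the second pattern as year 31, month 08, day 20;
# the trailing "11" is ignored.
wrong <- as.Date("31/08/2011")
as.POSIXlt(wrong)$year + 1900   # 31
format(wrong, "%m-%d")          # "08-20"

# Stating the input format gives the intended date
right <- as.Date("31/08/2011", format = "%d/%m/%Y")
format(right, "%Y-%m-%d")       # "2011-08-31"
```

No error is raised in the first case because the text happens to match one of the default patterns, which is why the problem only shows up in the output.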

You could also use the parse_date_time function from the lubridate package:
library(lubridate)
day<-"31/08/2011"
as.Date(parse_date_time(day,"dmy"))
[1] "2011-08-31"
parse_date_time returns a POSIXct object, so we use as.Date to get a Date object. The first argument of parse_date_time is a vector of dates; the second specifies the order in which the day, month and year occur. This orders argument makes parse_date_time very flexible.

After reading your data in via a textConnection, the following seems to work:
dat <- read.table(textConnection(txt), header = TRUE)
dat$date <- strptime(dat$date, format= "%d/%m/%Y")
format(dat$date, format="%Y-%m-%d")
> format(dat$date, format="%Y-%m-%d")
[1] "2011-08-31" "2011-07-31" "2011-06-30" "2011-05-31" "2011-04-30" "2011-03-31"
[7] "2011-02-28" "2011-01-31" "2010-12-31" "2010-11-30" "2010-10-31" "2010-09-30"
[13] "2010-08-31" "2010-07-31" "2010-06-30" "2010-05-31" "2010-04-30" "2010-03-31"
[19] "2010-02-28" "2010-01-31" "2009-12-31" "2009-11-30" "2009-10-31"
> str(dat)
'data.frame': 23 obs. of 2 variables:
$ date : POSIXlt, format: "2011-08-31" "2011-07-31" "2011-06-30" ...
$ midpoint: num 0.838 0.846 0.815 0.797 0.788 ...
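The txt object is not defined in the answer above. A minimal self-contained variant, assuming txt holds the printed data from the question (only three rows are copied here) and using read.table's text argument in place of textConnection, might look like this:

```r
# Three rows copied from the question; the header has one field fewer
# than the data rows, so read.table() uses the first column as row names.
txt <- "date midpoint
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147"

dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Convert the text column to class Date; it prints as %Y-%m-%d by default
dat$date <- as.Date(dat$date, format = "%d/%m/%Y")
str(dat)
```

Here as.Date is used instead of strptime so that the column is a plain Date rather than POSIXlt.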

This is really easy using the lubridate package. All you have to do is tell R what format your date is already in. It then converts it into the standard format:
library(lubridate)
nzd$date <- dmy(nzd$date)
That's it.

Using one line to convert the dates to the preferred format:
nzd$date <- format(as.Date(nzd$date, format="%d/%m/%Y"),"%Y-%m-%d")

I believe that
nzd$date <- as.Date(nzd$date, format = "%d/%m/%Y")
is sufficient.
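One reason to stop at as.Date, rather than formatting back to text, is that a Date column sorts and subtracts correctly. A short sketch on the three-row sample used in the earlier answers:

```r
nzd <- data.frame(date = c("31/08/2011", "31/07/2011", "30/06/2011"),
                  mid = c(0.8378, 0.8457, 0.8147))
nzd$date <- as.Date(nzd$date, format = "%d/%m/%Y")

# Dates sort chronologically, which the original dd/mm/yyyy text does not
nzd <- nzd[order(nzd$date), ]

# Differences between consecutive dates, in days
gaps <- diff(nzd$date)
as.numeric(gaps)  # 31 31
```

The column still prints as %Y-%m-%d; format() is only needed when a different text layout is wanted for output.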

Related

Error while converting to Date format in R

It should be an easy issue, but I got stuck with it. I have a data.frame with dates and values:
class(var_data)
[1] "tbl_df" "tbl" "data.frame"
var_data
A tibble: 42 x 2
date Tourists
<dttm> <dbl>
1 2006-03-01 00:00:00 55280.
2 2006-06-01 00:00:00 84392.
3 2006-09-01 00:00:00 132714.
Then I want to copy some dates and values into another data.frame:
var_list_DB$var_last[ii] <- var_data[last,"Tourists"]
var_list_DB$var_date_start[ii] <- var_data[1,"date"]
var_list_DB$var_date_last[ii] <- var_data[last,"date"]
But instead of dates I got numbers:
var_date_start var_date_last var_val_last
951868800 1496275200 10044.3162
And while trying to convert to date format, got an error:
as.Date(var_data[last,"date"], format = "%m/%d/%Y")
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
I recently updated to version 3.5.0; maybe this is the issue.
Add an as.character conversion before passing the value to the date function, and convert var_data to a plain data.frame, as in these two examples using as.Date and as.POSIXct:
var_data<-data.frame(var_data)
as.Date(as.character(var_data[,"date"]))
[1] "2006-03-01" "2006-06-01" "2006-09-01"
as.POSIXct(as.character(var_data[,"date"]))
[1] "2006-03-01 CET" "2006-06-01 CEST" "2006-09-01 CEST"
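The numbers in the question's output look like the underlying POSIXct values, that is, seconds since 1970-01-01, with the class dropped on assignment. A minimal sketch of turning them back into dates, assuming they are meant to be read in UTC:

```r
# The two values shown in the question
secs <- c(951868800, 1496275200)

# Reattach the class: seconds since 1970-01-01, interpreted here as UTC
stamps <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(stamps, "%Y-%m-%d")  # "2000-03-01" "2017-06-01"

# Drop the time of day if only the calendar date is needed
as.Date(stamps)
```

The time zone is an assumption; if the original data were recorded in a local zone, pass that zone as tz instead.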

Join date and time

Good afternoon! I have data consisting of the date and time of a share price. I need to join these two into one column.
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
I tried to do that using this function:
df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })
but I got the column of NAs.
Also I tried to do that using:
data$date_time = mdy_hm(paste(data$date, data$time))
but again I got a warning:
Warning message:
All formats failed to parse. No formats found.
Please tell me what I am doing wrong.
In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.
> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
date time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame': 2 obs. of 2 variables:
$ date: chr "1999.04.08" "1999.04.08"
$ time: chr "11:00" "12:00"
Don't forget to use str to understand the data type(s) you are dealing with. That can and will greatly influence the answer to your question. Looking at the help description of function strptime, we see the following definition:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So, let's break down your code:
df1 <- within(data,
{ timestamp = strptime(paste(date, time),
"%Y/%m/%d%H:%M:%S")
})
First, the paste function:
> paste(date[1], time[1])
[1] "1999.04.08 11:00"
This generates a character vector with the format above.
Next, the strptime command.
> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA
Okay, we see an NA. First, be sure to write format = explicitly; it may seem tedious, but it makes the code easier to read and to debug later. Looking at the examples on the help page we see:
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"
Notice the help page also distinguishes upper- and lower-case Y, and likewise the month and day codes. In your case, the format asks for something of the form YYYY/mm/ddHH:MM:SS, such as 2017/11/2011:28:30, whereas your strings look like 1999.04.08 11:00. Do you see the issue now?
Using your string extraction attempt, we modify it slightly to get the format you are looking for:
> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"
Putting it all together you get:
> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame': 2 obs. of 3 variables:
$ date : chr "1999.04.08" "1999.04.08"
$ time : chr "11:00" "12:00"
$ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
date time timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
Oh yeah, and try out the dplyr package.
library(dplyr)
> df %>%
mutate(ts = as.POSIXct(paste(date,time),
format = "%Y.%m.%d %H:%M"))
date time ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
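The contrast between the failing and the working format string can be checked side by side. This sketch reuses the two sample rows from the answer above and fixes tz = "UTC" so the result does not depend on the machine's time zone:

```r
df <- data.frame(date = c("1999.04.08", "1999.04.08"),
                 time = c("11:00", "12:00"),
                 stringsAsFactors = FALSE)
stamp_text <- paste(df$date, df$time)

# Format from the question: slashes, no space, seconds expected -> all NA
bad <- as.POSIXct(stamp_text, format = "%Y/%m/%d%H:%M:%S", tz = "UTC")

# Format that mirrors the text: dots, a space, hours and minutes only
good <- as.POSIXct(stamp_text, format = "%Y.%m.%d %H:%M", tz = "UTC")

is.na(bad)   # TRUE TRUE
diff(good)   # one hour between the two rows
```

Once the column is POSIXct, differences between rows come out as difftime values, which is usually what is wanted for price series.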

Split dates separately

I have a date variable
date
15APR16:00:00:04
17APR16:00:06:35
18APR16:00:05:07
18APR16:00:00:56
19APR16:00:08:07
18APR16:00:00:07
22APR16:00:03:07
I want to split the variable into two, date and time separately.
When I tried
a <- strftime(date, format="%H:%M:%S"), it is showing
Error in as.POSIXlt.default(x, tz = tz) : do not know how to
convert 'x' to class “POSIXlt”
When I check the data type, it is shown as function. How can I convert this into a date and split it into two variables?
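The reason you are getting that error is that date on its own refers to base R's date() function (which is why its type shows as function), and the values are not yet in a date-time class. Refer to the column in your data frame and first convert it to a POSIX class with strptime: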
dat$date <- strptime(dat$date, format = '%d%b%y:%H:%M:%S')
After that you can use format to extract the time from that variable:
dat$time <- format(dat$date, "%H:%M:%S")
For extracting the date, it is preferable to use as.Date:
dat$dates <- as.Date(dat$date)
Those steps will give the following result:
> dat
date time dates
1 2016-04-15 00:00:04 00:00:04 2016-04-15
2 2016-04-17 00:06:35 00:06:35 2016-04-17
3 2016-04-18 00:05:07 00:05:07 2016-04-18
4 2016-04-18 00:00:56 00:00:56 2016-04-18
5 2016-04-19 00:08:07 00:08:07 2016-04-19
6 2016-04-18 00:00:07 00:00:07 2016-04-18
7 2016-04-22 00:03:07 00:03:07 2016-04-22
Alternatively, you could use the lubridate package (as also shown in the other answer):
library(lubridate)
dat$date <- dmy_hms(dat$date)
Used data:
dat <- read.table(text="date
15APR16:00:00:04
17APR16:00:06:35
18APR16:00:05:07
18APR16:00:00:56
19APR16:00:08:07
18APR16:00:00:07
22APR16:00:03:07", header=TRUE, stringsAsFactors=FALSE)
Package lubridate makes converting text to dates easy
library(lubridate)
x <-dmy_hms("15APR16:00:00:04")
format(x, "%H:%M:%S") # extract time
[1] "00:00:04"
format(x, "%d-%m-%Y") # extract date
[1] "15-04-2016"
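The same split can be sketched in base R alone. The example below assumes an English locale, since %b matches month abbreviations in the current locale, and uses a made-up data frame name parts:

```r
raw <- c("15APR16:00:00:04", "17APR16:00:06:35", "18APR16:00:05:07")

# %b matches the abbreviated month name; this assumes an English locale
stamp <- as.POSIXct(raw, format = "%d%b%y:%H:%M:%S", tz = "UTC")

# One Date column and one character column holding the time of day
parts <- data.frame(day = as.Date(stamp),
                    time = format(stamp, "%H:%M:%S"),
                    stringsAsFactors = FALSE)
parts

# Without a variable of that name, `date` is base R's date() function
is.function(date)  # TRUE
```

The last line shows why the question's strftime(date, ...) call failed: it was handed a function rather than the column.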

Transform variable into a %H:%M time format in R

I have this vector representing time recorded as hours (0 to 24) and minute (0 to 59). I would like to transform it into a %H:%M time format in R such that I can use function like difftime.
str(SF5$ES_TIME)
int [1:11452] 1940 600 5 1455 1443 2248 1115 900 200 420 ...
This is what I've tried, but in both cases, I got an error:
>SF5$time1<-as.POSIXct(SF5$ES_TIME, format = "%H:%M",tz="EST")
Error in as.POSIXct.numeric(SF5$ES_TIME, format = "%H:%M", tz = "EST") :
'origin' must be supplied
SF5$time1<-as.POSIXct(as.character(SF5$ES_TIME), format="%H:%M",tz="")
> str(SF5$time1)
POSIXct[1:11452], format: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
Any help or reading suggestions would be much appreciated!
Thank you,
Aurelie
Well, the error message tells you to provide origin and a minute is 60 seconds, so:
SF5 <- list(ES_TIME=as.integer(c(1940,600,5,1455,1443,2248,1115,900,200,420)))
x <- as.POSIXct(SF5$ES_TIME*60, origin="1970-01-01")
format(x, format="%H:%M")
#[1] "08:20" "10:00" "00:05" "00:15" "00:03" "13:28" "18:35" "15:00" "03:20" "07:00"
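Note that the POSIXct date is just a number (with a class), so you need the format call to print it as you want; the default printing of x would show the full date info (year/month/day etc.). Note also that this reads 1940 as 1940 minutes, not as 19:40. If the values are HHMM codes, as the question describes, zero-pad them and parse them with %H%M instead.
...any origin date would do since you don't care about it, but 1970-01-01 is the usual origin...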
I was able to crack the code! Thank you all for your tips!
#1) as suggested by Justin : put all numbers into four digits with zero padding
SF5$ES_TIME2<-sprintf("%04d",SF5$ES_TIME)
#2) Matched these %H%M with their corresponding date %y-%m-%d
SF5$ES.datetime <- paste(SF5$ES_TIME2,SF5$ES_DATE,sep=" ")
#3) Transform into Date-Time format
SF5$ES.datetime2 <- as.POSIXct(SF5$ES.datetime,format="%H%M %y-%m-%d", tz="")
# Did the same for my other time-date of interest
SF5$SH_TIME2<-sprintf("%04d",SF5$SH_TIME)
SF5$SH.datetime <- paste(SF5$SH_TIME2,SF5$SH_DATE,sep=" ")
SF5$SH.datetime2 <- as.POSIXct(SF5$SH.datetime,format="%H%M %y-%m-%d", tz="")
# Calculate the time difference between the 2 date-time in hours
SF5$duration<-difftime(SF5$SH.datetime2,SF5$ES.datetime2,units="hours",tz="")
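The zero-padding step can be tried on a few of the values from the question. In this sketch the dates 2012-03-01 and 2012-03-02 are made up for illustration, and the dates are written with a four-digit year, unlike the %y format used above:

```r
es_time <- c(1940L, 600L, 5L, 1455L)

# Zero-pad to four characters so every value reads as HHMM
hhmm <- sprintf("%04d", es_time)   # "1940" "0600" "0005" "1455"

# Pure integer arithmetic gives the same H:M text without any date class
hm <- sprintf("%02d:%02d", es_time %/% 100L, es_time %% 100L)
hm                                 # "19:40" "06:00" "00:05" "14:55"

# Hypothetical dates paired with the first two times
start <- as.POSIXct(paste("2012-03-01", hhmm[1]),
                    format = "%Y-%m-%d %H%M", tz = "UTC")
end   <- as.POSIXct(paste("2012-03-02", hhmm[2]),
                    format = "%Y-%m-%d %H%M", tz = "UTC")

# 19:40 to 06:00 the next day is 10 hours 20 minutes
difftime(end, start, units = "hours")
```

Without the padding, a value such as 5 would be read as hour 5 rather than 00:05, which is why the sprintf step comes first.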

Change from date and hour format to numeric format

I am working in R and I need to change from a column in format
9/27/2011 3:33:00 PM
to a value format. In Excel I can use the function value() but I do not know how to do it in R.
My data looks like this:
9/27/2011 15:33 a 1 5 9
9/27/2011 15:33 v 2 6 2
9/27/2011 15:34 c 3 7 1
To convert a string into an R date-time, use as.POSIXct; then you can coerce it to a numeric value using as.numeric. Because the input uses a 12-hour clock with PM, the hour is read with %I rather than %H:
> x <- as.POSIXct("9/27/2011 3:33:00 PM", format="%m/%d/%Y %I:%M:%S %p")
> x
[1] "2011-09-27 15:33:00 BST"
> as.numeric(x)
[1] 1317133980
The value you get is the number of seconds since 1970-01-01 00:00:00 UTC. Note that this is different from Excel, where a date is stored as the number of days since an arbitrary date (1/1/1900 if my memory serves me well - I try not to use Excel any more.)
For more information, see ?DateTimeClasses
This was useful for me:
> test=as.POSIXlt("09/13/2006", format="%m/%d/%Y")
> test
[1] "2006-09-13"
> 1900+test$year
[1] 2006
> test$yday
[1] 255
> test$yday/365
[1] 0.6986301
> 1900+test$year+test$yday/366
[1] 2006.697
You can use similar approaches if you need day numbers like in Excel.
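For an Excel-style day number, one possible sketch is to divide the seconds by 86400 and shift the origin. The origin 1899-12-30 used here is the commonly quoted equivalent of Excel's default date system for dates after February 1900; it is an assumption, not something stated in the answers above, and tz = "UTC" is set so the result ignores the local time zone:

```r
# 12-hour clock input, so %I (not %H) is paired with %p; assumes an English locale
x <- as.POSIXct("9/27/2011 3:33:00 PM",
                format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

secs <- as.numeric(x)  # seconds since 1970-01-01 00:00:00 UTC

# Days between the assumed Excel origin and the R origin (25569)
offset_days <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))

# Day number with the time of day as a fraction
serial <- secs / 86400 + offset_days
round(serial, 4)  # 40813.6479
```

The integer part counts days and the fractional part is the time of day, so 0.6479 corresponds to 15:33.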
