Split dates separately - r

I have a date variable
date
15APR16:00:00:04
17APR16:00:06:35
18APR16:00:05:07
18APR16:00:00:56
19APR16:00:08:07
18APR16:00:00:07
22APR16:00:03:07
I want to split the variable into two separate variables, date and time.
When I tried
a <- strftime(date, format="%H:%M:%S")
it showed:
Error in as.POSIXlt.default(x, tz = tz) : do not know how to convert 'x' to class “POSIXlt”
When I checked the data type, it showed up as a function. How do I convert this into a date and split it into two variables?

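You get that error because your date variable is not a date-time object yet. (If you type date on its own and no variable of that name is in scope, R finds the base function date(), which is why its type shows up as a function; refer to the column as dat$date.) You should first convert the column to a POSIX class with strptime: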
dat$date <- strptime(dat$date, format = '%d%b%y:%H:%M:%S')
After that you can use format to extract the time from that variable:
dat$time <- format(dat$date, "%H:%M:%S")
To extract the date, it is preferable to use as.Date:
dat$dates <- as.Date(dat$date)
Those steps will give the following result:
> dat
date time dates
1 2016-04-15 00:00:04 00:00:04 2016-04-15
2 2016-04-17 00:06:35 00:06:35 2016-04-17
3 2016-04-18 00:05:07 00:05:07 2016-04-18
4 2016-04-18 00:00:56 00:00:56 2016-04-18
5 2016-04-19 00:08:07 00:08:07 2016-04-19
6 2016-04-18 00:00:07 00:00:07 2016-04-18
7 2016-04-22 00:03:07 00:03:07 2016-04-22
Alternatively, you could use the lubridate package (as also shown in the other answer):
library(lubridate)
dat$date <- dmy_hms(dat$date)
Data used:
dat <- read.table(text="date
15APR16:00:00:04
17APR16:00:06:35
18APR16:00:05:07
18APR16:00:00:56
19APR16:00:08:07
18APR16:00:00:07
22APR16:00:03:07", header = TRUE, stringsAsFactors = FALSE)
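For comparison, here is a minimal base-R sketch of the same split that parses straight to POSIXct instead of POSIXlt (a POSIXct column is a plain vector, which data frames handle more smoothly). It assumes an English month-name locale, so that %b matches "APR", and it treats the timestamps as UTC; the time zone is an assumption, the question does not state one.

```r
# Raw strings in the question's format: ddMONyy:HH:MM:SS
raw <- c("15APR16:00:00:04", "17APR16:00:06:35", "18APR16:00:05:07",
         "18APR16:00:00:56", "19APR16:00:08:07", "18APR16:00:00:07",
         "22APR16:00:03:07")

# Parse once into POSIXct; %b relies on English month abbreviations
stamp <- as.POSIXct(raw, format = "%d%b%y:%H:%M:%S", tz = "UTC")

dat <- data.frame(
  dates = as.Date(stamp, tz = "UTC"),   # date part, class "Date"
  time  = format(stamp, "%H:%M:%S"),    # time part, as character
  stringsAsFactors = FALSE
)
print(dat)
```

Keeping the time as character is a design choice: base R has no time-of-day class, so a formatted string is the simplest container unless you need arithmetic on it.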

The lubridate package makes converting text to dates easy:
library(lubridate)
x <- dmy_hms("15APR16:00:00:04")
format(x, "%H:%M:%S") # extract time
[1] "00:00:04"
format(x, "%d-%m-%Y") # extract date
[1] "15-04-2016"

Related

Converting a date in R returns NA

date
05-06-2016
05-07-2016
4/13/2016
4/14/2016
I want to convert the column to Date format using the code below:
td3 <- read.csv("Book2.csv")
td3$date <- as.Date(td3$date, "%m-%d-%y")
When I run the code, the last 2 rows return NA.
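Replace the slashes with dashes so every row has the same layout, and use %Y (four-digit year) instead of %y:
as.Date(gsub("/", "-", td3$date), format = "%m-%d-%Y")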
[1] "2016-05-06" "2016-05-07" "2016-04-13" "2016-04-14"
Here is a solution with parse_date_time from lubridate package:
library(lubridate)
as.Date(parse_date_time(td3$date, orders = c('mdy', 'dmy')))
[1] "2016-05-06" "2016-05-07" "2016-04-13" "2016-04-14"
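The same idea can also be sketched in base R without rewriting the strings: try each candidate format in turn and fill in only the rows that are still NA. This assumes the column holds just the two month-day-year layouts shown in the question; parse_mixed is a hypothetical helper name, and further formats can be appended to the vector if the real file contains them.

```r
# Dates as they appear in the question: two layouts mixed in one column
x <- c("05-06-2016", "05-07-2016", "4/13/2016", "4/14/2016")

# Candidate formats, tried in order; all use a four-digit year (%Y)
formats <- c("%m-%d-%Y", "%m/%d/%Y")

# Hypothetical helper: parse with the first format, then retry the NAs
parse_mixed <- function(x, formats) {
  out <- as.Date(x, format = formats[1])
  for (f in formats[-1]) {
    todo <- is.na(out)                     # rows not parsed yet
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

parse_mixed(x, formats)
```

For a string that more than one format could match, the first format in the vector wins, so order the formats from most to least trusted.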

Changing date formats in R [duplicate]

I have some very simple data in R that needs to have its date format changed:
date midpoint
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
4 31/05/2011 0.7970
5 30/04/2011 0.7877
6 31/03/2011 0.7411
7 28/02/2011 0.7624
8 31/01/2011 0.7665
9 31/12/2010 0.7500
10 30/11/2010 0.7734
11 31/10/2010 0.7511
12 30/09/2010 0.7263
13 31/08/2010 0.7158
14 31/07/2010 0.7110
15 30/06/2010 0.6921
16 31/05/2010 0.7005
17 30/04/2010 0.7113
18 31/03/2010 0.7027
19 28/02/2010 0.6973
20 31/01/2010 0.7260
21 31/12/2009 0.7154
22 30/11/2009 0.7287
23 31/10/2009 0.7375
Rather than %d/%m/%Y, I would like it in the standard R format of %Y-%m-%d
How can I make this change? I have tried:
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
But that just cut off the year and added zeros to the day:
[1] "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20"
[6] "0031/03/20" "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20"
[11] "0031/10/20" "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20"
[16] "0031/05/20" "0030/04/20" "0031/03/20" "0028/02/20" "0031/01/20"
[21] "0031/12/20" "0030/11/20" "0031/10/20" "0030/09/20" "0031/08/20"
[26] "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20" "0031/03/20"
[31] "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20" "0031/10/20"
[36] "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20"
Thanks!
There are two steps here:
Parse the data. Your example is not fully reproducible: is the data in a file, or in a text or factor variable? Let us assume the latter. If your data.frame is called X, you can do
X$newdate <- strptime(as.character(X$date), "%d/%m/%Y")
Now the newdate column is of class POSIXlt (a date-time); wrap the call in as.Date() if you want class Date.
Format the data. That is a matter of calling format() or strftime():
format(X$newdate, "%Y-%m-%d")
A more complete example:
R> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
R> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
R> nzd$newdate <- strptime(as.character(nzd$date), "%d/%m/%Y")
R> nzd$txtdate <- format(nzd$newdate, "%Y-%m-%d")
R> nzd
date mid newdate txtdate
1 31/08/2011 0.8378 2011-08-31 2011-08-31
2 31/07/2011 0.8457 2011-07-31 2011-07-31
3 30/06/2011 0.8147 2011-06-30 2011-06-30
R>
The difference between columns three and four is the type: newdate is of class POSIXlt whereas txtdate is character.
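A quick way to check which class each step produces, as a small base-R sketch on the first two dates from the question: strptime() returns POSIXlt, as.Date() returns Date, and format() always returns character, which is why only the first two support date arithmetic. The tz = "UTC" argument is an assumption added to keep the result independent of the system time zone.

```r
raw <- c("31/08/2011", "31/07/2011")

lt  <- strptime(raw, "%d/%m/%Y", tz = "UTC")  # POSIXlt date-time
d   <- as.Date(raw, format = "%d/%m/%Y")      # Date
txt <- format(d, "%Y-%m-%d")                  # character, for display only

class(lt)    # "POSIXlt" "POSIXt"
class(d)     # "Date"
class(txt)   # "character"

d[1] - d[2]  # Time difference of 31 days
```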
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
In the above piece of code, there are two mistakes. First, when you pass nzd$date to as.Date, you do not tell it what format the dates are in, so it falls back to its default formats. If you look at the help page, ?as.Date, you will see:
format: A character string. If not specified, it will try "%Y-%m-%d" then "%Y/%m/%d" on the first non-NA element, and give an error if neither works. Otherwise, the processing is via strptime.
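With no format given, "31/08/2011" matches "%Y/%m/%d" as year 31, month 08 and day 20 (the trailing "11" is ignored), which is where "0031/08/20" comes from. The second mistake is that, even though you want the result in %Y-%m-%d format, inside format you wrote "%Y/%m/%d".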
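The first mistake can be reproduced on a single value with a short base-R sketch, comparing the default parse with an explicit day/month/year format:

```r
x <- "31/08/2011"

wrong <- as.Date(x)                       # no format: falls back to %Y/%m/%d
right <- as.Date(x, format = "%d/%m/%Y")  # explicit day/month/year

wrong == as.Date("0031-08-20")  # TRUE: year 31, month 8, day 20
format(right, "%Y-%m-%d")       # "2011-08-31"
```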
Now, the correct way of doing it is:
> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
> nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
> head(nzd)
date mid
1 2011-08-31 0.8378
2 2011-07-31 0.8457
3 2011-06-30 0.8147
You could also use the parse_date_time function from the lubridate package:
library(lubridate)
day <- "31/08/2011"
as.Date(parse_date_time(day,"dmy"))
[1] "2011-08-31"
parse_date_time returns a POSIXct object, so we use as.Date to get a Date object. The first argument of parse_date_time is the vector of dates; the second specifies the order in which the date components occur. The orders argument makes parse_date_time very flexible.
After reading your data in via a textConnection (with txt holding the text of your table), the following seems to work:
dat <- read.table(textConnection(txt), header = TRUE)
dat$date <- strptime(dat$date, format= "%d/%m/%Y")
format(dat$date, format="%Y-%m-%d")
> format(dat$date, format="%Y-%m-%d")
[1] "2011-08-31" "2011-07-31" "2011-06-30" "2011-05-31" "2011-04-30" "2011-03-31"
[7] "2011-02-28" "2011-01-31" "2010-12-31" "2010-11-30" "2010-10-31" "2010-09-30"
[13] "2010-08-31" "2010-07-31" "2010-06-30" "2010-05-31" "2010-04-30" "2010-03-31"
[19] "2010-02-28" "2010-01-31" "2009-12-31" "2009-11-30" "2009-10-31"
> str(dat)
'data.frame': 23 obs. of 2 variables:
$ date : POSIXlt, format: "2011-08-31" "2011-07-31" "2011-06-30" ...
$ midpoint: num 0.838 0.846 0.815 0.797 0.788 ...
This is really easy with the lubridate package. All you have to do is tell it what format your date is already in; it then converts it into the standard format:
nzd$date <- dmy(nzd$date)
That's it.
Using one line to convert the dates to the preferred format:
nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
I believe that
nzd$date <- as.Date(nzd$date, format = "%d/%m/%Y")
is sufficient, because Date objects are printed in %Y-%m-%d format by default.

Join date and time

Good afternoon! I have share price data with the date and the time in separate columns. I need to join them into one column.
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
I tried to do that using this function:
df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })
but I got a column of NAs.
I also tried:
data$date_time = mdy_hm(paste(data$date, data$time))
but again I got a warning:
Warning message:
All formats failed to parse. No formats found.
Please tell me what I am doing wrong.
In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.
> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
date time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame': 2 obs. of 2 variables:
$ date: chr "1999.04.08" "1999.04.08"
$ time: chr "11:00" "12:00"
Don't forget to use str to understand the data type(s) you are dealing with; that can and will greatly influence the answer to your question. Looking at the help page for strptime, we see the following description:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So, let's break down your code:
df1 <- within(data,
{ timestamp = strptime(paste(date, time),
"%Y/%m/%d%H:%M:%S")
})
First, the paste function:
> paste(date[1], time[1])
[1] "1999.04.08 11:00"
This generates a character vector with the format above.
Next, the strptime command.
> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA
Okay, we see an NA. First, make a habit of writing format = explicitly; it is slightly more typing, but it makes the call unambiguous and easier to read later. Looking at the examples on the help page, we see:
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"
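Notice that the help page distinguishes upper- and lower-case codes: %Y is a four-digit year and %y a two-digit one, and case matters for the other codes too (%m is the month, %M the minute). Your format expects something of the form YYYY/mm/ddHH:MM:SS, such as 1999/04/0811:00:00, whereas your pasted strings look like 1999.04.08 11:00: dots instead of slashes, a space between date and time, and no seconds. Do you see the issue now?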
Using your string extraction attempt, we modify it slightly to get the format you are looking for:
> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"
Putting it all together you get:
> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame': 2 obs. of 3 variables:
$ date : chr "1999.04.08" "1999.04.08"
$ time : chr "11:00" "12:00"
$ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
date time timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
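The same fix as a self-contained base-R sketch. It builds a POSIXct column rather than POSIXlt and assumes the prices are recorded in UTC; substitute the exchange's time zone if that assumption is wrong. The lubridate attempt fails for the same reason: the dates are year.month.day, so ymd_hm() rather than mdy_hm() is the parser whose order matches.

```r
df <- data.frame(
  date = c("1999.04.08", "1999.04.08"),
  time = c("11:00", "12:00"),
  stringsAsFactors = FALSE
)

stamp_text <- paste(df$date, df$time)   # e.g. "1999.04.08 11:00"

# Wrong: slashes, no space, and seconds that the data does not contain
bad <- as.POSIXct(stamp_text, format = "%Y/%m/%d%H:%M:%S", tz = "UTC")

# Right: the format mirrors the text exactly (dots, a space, hours:minutes)
df$timestamp <- as.POSIXct(stamp_text, format = "%Y.%m.%d %H:%M", tz = "UTC")

bad   # NA NA
df
```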
Oh yeah, and try out the dplyr package.
library(dplyr)
> df %>%
mutate(ts = as.POSIXct(paste(date,time),
format = "%Y.%m.%d %H:%M"))
date time ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00

extract data based on datetime

I have two dataframes:
dat is a 9752x8 dataframe that contains some POSIXlt dates
trips.df is a 35772x28 dataframe that contains hourly temperature data
I would like to save the corresponding temperature for each date in dat.
I have tried:
trips.df$temperature<-lapply(trips.df$fin, function(x){
dat_meteo[dat_meteo$Date.Heure==round(x,"hours"),7]})
But I got this error, which makes me think that x is not passed as a datetime variable:
Error in round(x, "hours") :
non-numeric argument to mathematical function
I have also tried this:
merge(trips.df,dat_meteo[,c(1,7)])
But I also got an error:
Error: cannot allocate vector of size 653.8 Mb
Any advice on how to retrieve data from dat_meteo by date?
I am using R version 3.4.0 with RStudio Version 1.0.143 on Windows 10
And here is an excerpt of my data:
> head(trips.df$fin)
[1] "2013-06-25 16:34:16 EDT" "2013-06-25 16:34:16 EDT" "2013-06-26 13:00:05 EDT"
[4] "2013-06-29 12:52:21 EDT" "2013-06-29 15:34:13 EDT" "2013-06-29 17:39:29 EDT"
> dat_meteo[1870:1875,c(1,7)]
Date.Heure Temp...C.
1870 2013-03-19 18:00:00 -1,2
1871 2013-03-19 19:00:00 -1,7
1872 2013-03-19 20:00:00 -2,1
1873 2013-03-19 21:00:00 -2,8
1874 2013-03-19 22:00:00 -3,0
1875 2013-03-19 23:00:00 -3,7
You may want to take a slightly different approach and use data.table.
library(data.table)
trips.dt <- data.table(trips.df)
dat <- data.table(dat)
# Truncate each timestamp to midnight (POSIXct); each row's interval is [start, end]
trips.dt[, dates.a := as.POSIXct(trunc(as.POSIXct(fin), "days"))][, dates.b := dates.a]
dat[, dates.dat.a := as.POSIXct(trunc(as.POSIXct(Date.Heure), "days"))][, dates.dat.b := dates.dat.a]
# foverlaps() uses the last two key columns as the interval start and end
setkey(trips.dt, id, dates.a, dates.b)
setkey(dat, id, dates.dat.a, dates.dat.b)
combo <- foverlaps(trips.dt, dat, type = "within")
This creates date ranges for both trips.df and dat after converting them to data.tables, then merges trips.df with dat and stores the result as combo.
Make sure that the two time columns you want to match have the same format (POSIXct). It is more straightforward to use the POSIXct format within a dataframe, as the POSIXlt format actually corresponds to a list of named elements whereas POSIXct is in vector form.
dat_meteo$Date.Heure <- as.POSIXct(dat_meteo$Date.Heure, format = "%Y-%m-%d %H:%M:%S")
Create a column in trips.df of times rounded to the nearest hour, converting it to POSIXct too, since round converts POSIXct to POSIXlt:
trips.df$fin_r <- as.POSIXct(round(trips.df$fin, "hours"))
Then use merge:
res <- merge(trips.df, dat_meteo[, c(1, 7)], by.x = "fin_r", by.y = "Date.Heure")
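A small self-contained version of the round-then-merge approach. The two trip end times are hypothetical; the temperature rows reuse values from the question's excerpt, including its comma decimal separator, which has to be converted before the column is numeric. Everything is kept in UTC as an assumption, to sidestep daylight-saving gaps, and the column name temp is illustrative (the real data calls it Temp...C.).

```r
# Hourly weather rows, values taken from the question's excerpt
dat_meteo <- data.frame(
  Date.Heure = as.POSIXct(c("2013-03-19 18:00:00", "2013-03-19 19:00:00",
                            "2013-03-19 20:00:00", "2013-03-19 21:00:00"),
                          tz = "UTC"),
  # Comma decimals ("-1,2") become numbers only after swapping in a dot
  temp = as.numeric(sub(",", ".", c("-1,2", "-1,7", "-2,1", "-2,8"), fixed = TRUE))
)

# Hypothetical trip end times
trips.df <- data.frame(
  fin = as.POSIXct(c("2013-03-19 18:34:16", "2013-03-19 20:10:05"), tz = "UTC")
)

# round() returns POSIXlt, so convert back to POSIXct before merging
trips.df$fin_r <- as.POSIXct(round(trips.df$fin, "hours"))

res <- merge(trips.df, dat_meteo, by.x = "fin_r", by.y = "Date.Heure")
res
```

Here 18:34:16 rounds up to 19:00 and 20:10:05 rounds down to 20:00, so each trip picks up the temperature of the nearest full hour.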

Subset dataframe based on POSIXct date and time greater than datetime using dplyr

I am not sure what is going wrong with selecting date times as a POSIXct format. I have read several comments on subsetting a dataframe based on as.Date and I can get that to work without an issue. I have also read many posts suggesting that filtering POSIXct formats should work, but for some reason I cannot get it to work.
An example dataframe:
library(lubridate)
library(dplyr)
date_test <- seq(ymd_hms('2016-07-01 00:00:00'),ymd_hms('2016-08-01 00:00:00'), by = '15 min')
date_test <- data.frame(date_test)
date_test$datetime <- date_test$date_test
date_test <- select(date_test, -date_test)
I checked that it is in POSIXct format and then tried several ways to subset the dataframe greater than 2016-07-01 01:15:00. However the output never shows the date times less than 2016-07-01 01:15:00 being removed. I am sorry if this has been asked somewhere and I cannot find it but I have looked and tried to get this to work. I am using UTC as the timezone to avoid daylight savings time issues so that is not the issue here - unless the filter requires it.
class(date_test$datetime)
date_test <- date_test %>% filter(datetime > '2016-07-01 01:15:00')
date_test <- date_test %>%
filter(datetime > as.POSIXct("2016-07-01 00:15"))
date_test <- subset(date_test, datetime > as.POSIXct('2016-07-01 01:15:00'))
Now if I filter using:
date_test <- date_test %>%
filter(datetime > as.POSIXct("2016-07-10 01:15:00"))
the output is very strange with a day behind and the wrong time?
2016-07-09 13:30:00
2016-07-09 13:45:00
2016-07-09 14:00:00
2016-07-09 14:15:00
2016-07-09 14:30:00
If it helps, I am using macOS Sierra with RStudio version 1.0.143, R "You Stupid Darkness", dplyr 0.5 and lubridate 1.6.
ymd_hms returns POSIXct times in the "UTC" time zone by default, whereas as.POSIXct uses the system time zone (e.g. Australia for me). You need to either use ymd_hms consistently or set the "UTC" time zone explicitly, as per Dave's suggestion in the comments.
For example, both of these work:
date_test <- seq(ymd_hms('2016-07-01 00:30:00'),ymd_hms('2016-07-01 01:30:00'), by = '15 min')
date_test <- data.frame(datetime=date_test)
date_test
# datetime
#1 2016-07-01 00:30:00
#2 2016-07-01 00:45:00
#3 2016-07-01 01:00:00
#4 2016-07-01 01:15:00
#5 2016-07-01 01:30:00
date_test %>%
filter(datetime > as.POSIXct("2016-07-01 01:00:00", tz="UTC"))
date_test %>%
filter(datetime > ymd_hms("2016-07-01 01:00:00"))
# datetime
#1 2016-07-01 01:15:00
#2 2016-07-01 01:30:00
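To see why an unqualified as.POSIXct() cut-off lands on the wrong day, compare the same wall-clock string read in two time zones. This base-R sketch uses "Australia/Sydney" only as an example of a non-UTC system zone (the answer mentions Australia) and assumes the system time-zone database is available; in July that zone is 10 hours ahead of UTC, so the two cut-offs are 10 hours apart.

```r
cutoff_utc <- as.POSIXct("2016-07-01 01:00:00", tz = "UTC")
cutoff_syd <- as.POSIXct("2016-07-01 01:00:00", tz = "Australia/Sydney")

# Same text, different instants: 01:00 in Sydney is 15:00 UTC the day before
difftime(cutoff_utc, cutoff_syd, units = "hours")   # Time difference of 10 hours

# Base-R filter with the time zone stated explicitly on both sides
times <- seq(as.POSIXct("2016-07-01 00:30:00", tz = "UTC"),
             by = "15 min", length.out = 5)
date_test <- data.frame(datetime = times)
subset(date_test, datetime > cutoff_utc)   # the 01:15 and 01:30 rows
```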
