I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many work arounds including openxlsx::ConvertToDate, lubridate::parse_date_time, lubridate::date_decimal
openxlsx::ConvertToDateso far works best but it will only take 1 format and coerce NAs for the others
update
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two-step. Change excel dates separately and all other dates differently. If you have some more formats of dates that can be added in parse_date_time.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
You could use a helper function to normalize the dates which might be slightly faster than lubridate.
There are weird origins in MS Excel that depend on platform. So if the data are imported from different platforms, you may want to work woth dummy variables.
normDate <- Vectorize(function(x) {
if (!is.na(suppressWarnings(as.numeric(x)))) # Win excel
as.Date(as.numeric(x), origin="1899-12-30")
else if (grepl("A|P", x))
as.Date(x, format="%m/%d/%Y %I:%M %p")
else
as.Date(x, format="%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.
Related
I am trying to parse a column into two variables, "date" and "time" in R. I have installed the lubridate library.
The current csv file has the following timestamp format: yyyyMMdd hh:mm a (e.g. '20170423 12:26 AM') and imports the column as character.
I'm trying this but its not working on my current variable 'Tran_Date' (below code doesn't work):
transactions_file <- as_date('Tran_Date', "%Y%m%d %H:%M %p")
I like the base R solution like this,
Tran_Date <- as.POSIXct("20170423 12:26 AM", format = "%Y%m%d %I:%M %p")
Tran_Date
#> [1] "2017-04-23 00:26:00 CEST"
transactions_file <- data.frame(
date = format(Tran_Date,"%m/%d/%Y"),
time = format(Tran_Date,"%H:%M")) # possibly add %p if you use %I
transactions_file
#> date time
#> 1 04/23/2017 00:26
with lubridate,
# install.packages(c("tidyverse"), dependencies = TRUE)
library(lubridate)
Tran_Date <- ymd_hm("20170423 12:26 AM")
then you could recycle the above or use some combination of day(Tran_Date) cbind paste with month(Tran_Date) and similar with paste(hour(Tran_Date), minute(Tran_Date), sep = ":") or most likely something smarter.
Im have a time stamp column that I am converting into a POSIXct. The problem is that there are two different formats in the same column, so if I use the more common conversion the other gets converted into NA.
MC$Date
12/1/15 22:00
12/1/15 23:00
12/2/15
12/2/15 1:00
12/2/15 2:00
I use the following code to convert to a POSIXct:
MC$Date <- as.POSIXct(MC$Date, tz='MST', format = '%m/%d/%Y %H:%M')
The results:
MC$Date
15-12-01 22:00:00
15-12-01 23:00:00
NA
15-12-02 01:00:00
15-12-02 02:00:00
I have tried using a logic vector to identify the issue then correct it but can't find an easy solution.
The lubridate package was designed to deal with situations like this.
dt <- c(
"12/1/15 22:00",
"12/1/15 23:00",
"12/2/15",
"12/2/15 1:00",
"12/2/15 2:00"
)
dt
[1] "12/1/15 22:00" "12/1/15 23:00" "12/2/15" "12/2/15 1:00" "12/2/15 2:00"
lubridate::mdy_hm(dt, truncated = 2)
[1] "2015-12-01 22:00:00 UTC" "2015-12-01 23:00:00 UTC" "2015-12-02 00:00:00 UTC"
[4] "2015-12-02 01:00:00 UTC" "2015-12-02 02:00:00 UTC"
The truncated parameter indicates how many formats can be missing.
You may add the tz parameter to specify which time zone to parse the date with if UTC is not suitable.
I think the logic vector approach could work. Maybe in tandem with an temporary vector for holding the parsed dates without clobbering the unparsed ones. Something like this:
dates <- as.POSIXct(MC$Date, tz='MST', format = '%m/%d/%Y %H:%M')
dates[is.na(dates)] <- as.POSIXct(MC[is.na(dates),], tz='MST', format = '%m/%d/%Y')
MC$Date <- dates
Since all of your datetimes are separated with a space between date and time, you could use strsplit to extract only the date part.
extractDate <- function(x){ strsplit(x, split = " " )[[1]][1] }
MC$Date <- sapply( MC$Date, extractDate )
Then go ahead and convert any way you like, without worrying about the time part getting in the way.
Date
01/01/2013 #mm/dd/yyyy
12122015 #mmddyyyy
2014-03-21 #yyyy-mm-dd
95.10.12 #yy.mm.dd
I have a column "Date" with different formats. How can I clean this and convert them into a single date format?
Additional info:class(Date) is factor.
The easiest way to do this is to use the lubridate package.
Date=c(
"01/01/2013" #mm/dd/yyyy
,"12122015" #mmddyyyy
,"2014-03-21" #yyyy-mm-dd
,"95.10.12" #yy.mm.dd
)
library(lubridate)
# list of functions in lubridate that
# translate text to POSIXct: you only need to know the
# order of the data (e.g. mdy = month-day-year).
funs <- c("mdy","dym","ymd","dmy","myd","ydm")
# vector to store results
dates <- as.POSIXct(rep(NA,length(Date)))
# we try everything lubridate has. There will be some warnings
# e.g. because mdy cannot translate everything. You can ignore this.
for ( f in funs ){
dates[is.na(dates)] <- do.call(f,list(Date[is.na(dates)]))
}
dates
> dates
[1] "2013-01-01 01:00:00 CET" "2015-12-12 01:00:00 CET" "2013-01-01 01:00:00 CET" "2015-12-12 01:00:00 CET"
I am reading a date value from csv file.So the format will vary according to the date format of csv. How can I convert any date string to dd-mm-yyy HH:mm:ss ?
EDIT :
The input format are :
dd/mm/yyyy HH:mm:ss
dd/mm/yyyy
dd-mm-yyyy HH:mm:ss
dd-mm-yyyy
mm-dd-yyyy HH:mm:ss
mm-dd-yyyy
mm/dd/yyyy
yyyy-mm-dd HH:mm:ss
yyyy-mm-dd
I need to convert all these formats to dd-mm-yyyy HH:mm:ss
See the anytime package whose anytime function does just that -- and without requiring a format string:
> inputs <- c("12/07/2017 10:11:12", "12/07/2017", "12-07-2017 10:11:12",
+ "07-12-2017", "2017-12-07 10:11:12", "2017-12-07")
> library(anytime)
> anytime(inputs)
[1] "2017-12-07 10:11:12 CST" "2017-12-07 00:00:00 CST"
[3] "2017-12-07 10:11:12 CST" "2017-07-12 00:00:00 CDT"
[5] "2017-12-07 10:11:12 CST" "2017-12-07 00:00:00 CST"
>
However, your requirement of accepting both d-m-y and m-d-y is not satisfiable. So you need to make a choice and supply an explicit format here.
In general, I highly recommend avoiding the ambiguity and sticking to y-m-d ISO formats. As a convenience to stubborn North American habits, anytime and anydata also accept m-d-y ordering but it is dangerous.
Again, only you can tell if 3-4-5 is April 3rd or March 4th, and you need to specify that.
as.Date() converts a string into a Date object. You will need to adapt its format parameter to the specific format your csv has.
Try "lubridate" package. Here A_1.csv has both formats.
Data <-read.csv("Al_1.csv") # import data
str(Data)
a = NULL # create a null object
library(lubridate)
a$Date <- mdy_hm(Data$Date) # store dd/mm/yyyy HH:mm:ss objects here
a$Price <- Data$Price # get respective values(it could by any other column)
b = NULL
b$Date <- mdy(Data$Date)
b$Price <- Data$Price
a <- as.data.frame(a)
b <- as.data.frame(b)
a <- a[is.na(a$Date)==FALSE,] # those with NA had diffrent formats remove it
b <- b[is.na(b$Date)==FALSE,]
b$Date <- as.POSIXlt(b$Date)# change your other format also to UTC
x <- rbind(a,b)
str(x)
x <- ts(x)
I am reading a date value from csv file.So the format will vary according to the date format of csv. How can I convert any date string to dd-mm-yyy HH:mm:ss ?
EDIT :
The input format are :
dd/mm/yyyy HH:mm:ss
dd/mm/yyyy
dd-mm-yyyy HH:mm:ss
dd-mm-yyyy
mm-dd-yyyy HH:mm:ss
mm-dd-yyyy
mm/dd/yyyy
yyyy-mm-dd HH:mm:ss
yyyy-mm-dd
I need to convert all these formats to dd-mm-yyyy HH:mm:ss
See the anytime package whose anytime function does just that -- and without requiring a format string:
> inputs <- c("12/07/2017 10:11:12", "12/07/2017", "12-07-2017 10:11:12",
+ "07-12-2017", "2017-12-07 10:11:12", "2017-12-07")
> library(anytime)
> anytime(inputs)
[1] "2017-12-07 10:11:12 CST" "2017-12-07 00:00:00 CST"
[3] "2017-12-07 10:11:12 CST" "2017-07-12 00:00:00 CDT"
[5] "2017-12-07 10:11:12 CST" "2017-12-07 00:00:00 CST"
>
However, your requirement of accepting both d-m-y and m-d-y is not satisfiable. So you need to make a choice and supply an explicit format here.
In general, I highly recommend avoiding the ambiguity and sticking to y-m-d ISO formats. As a convenience to stubborn North American habits, anytime and anydata also accept m-d-y ordering but it is dangerous.
Again, only you can tell if 3-4-5 is April 3rd or March 4th, and you need to specify that.
as.Date() converts a string into a Date object. You will need to adapt its format parameter to the specific format your csv has.
Try "lubridate" package. Here A_1.csv has both formats.
Data <-read.csv("Al_1.csv") # import data
str(Data)
a = NULL # create a null object
library(lubridate)
a$Date <- mdy_hm(Data$Date) # store dd/mm/yyyy HH:mm:ss objects here
a$Price <- Data$Price # get respective values(it could by any other column)
b = NULL
b$Date <- mdy(Data$Date)
b$Price <- Data$Price
a <- as.data.frame(a)
b <- as.data.frame(b)
a <- a[is.na(a$Date)==FALSE,] # those with NA had diffrent formats remove it
b <- b[is.na(b$Date)==FALSE,]
b$Date <- as.POSIXlt(b$Date)# change your other format also to UTC
x <- rbind(a,b)
str(x)
x <- ts(x)