Parsing a character date-time format in R

I am trying to parse a column into two variables, "date" and "time" in R. I have installed the lubridate library.
The current csv file has the following timestamp format: yyyyMMdd hh:mm a (e.g. '20170423 12:26 AM') and imports the column as character.
I'm trying this, but it's not working on my current variable 'Tran_Date' (the code below doesn't work):
transactions_file <- as_date('Tran_Date', "%Y%m%d %H:%M %p")

I prefer a base R solution like this. Note %I (12-hour clock) rather than %H, since the timestamp carries an AM/PM marker:
Tran_Date <- as.POSIXct("20170423 12:26 AM", format = "%Y%m%d %I:%M %p")
Tran_Date
#> [1] "2017-04-23 00:26:00 CEST"
transactions_file <- data.frame(
date = format(Tran_Date,"%m/%d/%Y"),
time = format(Tran_Date,"%H:%M")) # possibly add %p if you use %I
transactions_file
#> date time
#> 1 04/23/2017 00:26
With lubridate:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(lubridate)
Tran_Date <- ymd_hm("20170423 12:26 AM")
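Then you could reuse the format() calls above, or build the pieces from the lubridate accessors: for example, paste day(Tran_Date), month(Tran_Date) and year(Tran_Date) together for the date, and use paste(hour(Tran_Date), minute(Tran_Date), sep = ":") for the time (note that paste does not zero-pad, so this gives "0:26" rather than "00:26"). There is most likely something smarter.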
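To apply this to a whole column rather than a single string, something along these lines might work. It is only a sketch: the two-row data frame is made up to stand in for the imported CSV, tz = "UTC" is an assumption to keep daylight-saving shifts out of the picture, and %p parsing depends on the locale (an English or C locale is assumed).

```r
# Hypothetical stand-in for the imported CSV; only the column name
# Tran_Date comes from the question.
transactions_file <- data.frame(
  Tran_Date = c("20170423 12:26 AM", "20170423 01:05 PM"),
  stringsAsFactors = FALSE
)

# Parse the character column once. %I is the 12-hour clock hour that
# pairs with %p (AM/PM). tz = "UTC" is an assumption.
parsed <- as.POSIXct(transactions_file$Tran_Date,
                     format = "%Y%m%d %I:%M %p", tz = "UTC")

# Split into a Date column and a 24-hour "HH:MM" character column.
transactions_file$date <- as.Date(parsed)
transactions_file$time <- format(parsed, "%H:%M")

transactions_file
```

The date column stays a real Date, so it can be sorted or differenced; the time column is plain text, since base R has no time-of-day class.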

Related

R - Formatting dates in dataframe - mix of decimal and character values

I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many workarounds, including openxlsx::convertToDate, lubridate::parse_date_time, lubridate::date_decimal
openxlsx::convertToDate so far works best, but it only handles one format and coerces the others to NA
update
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two steps: parse the text dates first, then convert the remaining Excel serial numbers separately. If you have more date formats, they can be added to the orders passed to parse_date_time.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
You could use a helper function to normalize the dates which might be slightly faster than lubridate.
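Excel's date origin depends on the platform: Excel on Windows defaults to the 1900 date system (origin "1899-12-30" in R), while Excel on Mac has used the 1904 date system (origin "1904-01-01"). So if the data are imported from different platforms, you may want to work with dummy variables.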
normDate <- Vectorize(function(x) {
  if (!is.na(suppressWarnings(as.numeric(x))))  # Win excel
    as.Date(as.numeric(x), origin="1899-12-30")
  else if (grepl("A|P", x))
    as.Date(x, format="%m/%d/%Y %I:%M %p")
  else
    as.Date(x, format="%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.
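If you would rather avoid Vectorize, the same idea can probably be written with plain vectorised assignments. This is a sketch under the same assumptions as above: only the three formats from the question occur, the serial numbers use the Windows Excel origin, and the names num, ampm and rest are made up for the example.

```r
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")

# Step 1: whatever converts cleanly to a number is treated as an Excel
# serial date (assumed Windows origin); floor() drops the time-of-day
# fraction. Everything else becomes NA for now.
num <- suppressWarnings(as.numeric(date))
res <- as.Date(floor(num), origin = "1899-12-30")

# Step 2: fill the remaining NAs with the 12-hour AM/PM format.
ampm <- is.na(res) & grepl("AM|PM", date)
res[ampm] <- as.Date(date[ampm], format = "%m/%d/%Y %I:%M %p")

# Step 3: whatever is still NA is tried as the 24-hour format.
rest <- is.na(res)
res[rest] <- as.Date(date[rest], format = "%m/%d/%Y %H:%M:%S")

res
#> [1] "2019-01-04" "2019-12-31" "2019-12-01"
```

Values that match none of the formats simply stay NA, which makes them easy to find afterwards with is.na(res).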

Converting date from 2019-07-04 14:01 +0000 to MM/dd/yyyy format

I am trying to convert the date format 2019-07-04 14:01 +0000 to mm/dd/yyyy format.
I am using this:
as.Date(strptime(d <- Twitter$time, "%b %d %Y %H:%M %p"))
I've also tried:
ymd_hms(Twitter$time)
However it returns NA values. Is there any way to convert this format to MM/dd/yyyy in R?
Since we are not interested in the time component, convert the column to Date class with as.Date (no format is needed here because the string starts with the default %Y-%m-%d layout, and the trailing time and offset are ignored), then use format to change the output format:
format(as.Date(str1), "%m/%d/%Y")
#[1] "07/04/2019"
data
str1 <- "2019-07-04 14:01 +0000"
There are always two steps: parse, and format.
You can use as.Date() as shown or anydate() from the anytime package (which will also work for different input formats as shown here):
R> inp <- anytime::anydate(c("2019-07-04 14:01 +0000", "04-Jul-2019 14:02"))
R> inp
[1] "2019-07-04" "2019-07-04"
R> format(inp, "%m/%d/%Y")
[1] "07/04/2019" "07/04/2019"
R>
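If the time and UTC offset should be parsed as well rather than ignored, a format string that mirrors the input would look roughly like this. It is a sketch for the single example string; tz = "UTC" is an assumption. The format in the question ("%b %d %Y %H:%M %p") describes a layout like "Jul 04 2019 02:01 PM", which is why it returns NA for this input.

```r
str1 <- "2019-07-04 14:01 +0000"

# Parse step: every part of the input gets a matching placeholder.
# %z reads the signed UTC offset; tz = "UTC" is an assumption.
parsed <- as.POSIXct(str1, format = "%Y-%m-%d %H:%M %z", tz = "UTC")

# Format step: print only the date part, as month/day/year.
out <- format(parsed, "%m/%d/%Y")
out
#> [1] "07/04/2019"
```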

convert orderdate of format m/d/yy to YYYY-MM-DD in R

My orderdate is a factor and I want to convert it from mm/dd/yy format to YYYY-MM-DD format.
orderdate : Factor w/ 155932 levels "1/1/2017 1:05:00 AM",..: 41 1 1 89 100 102 106 107 119 122 ...
I tried couple of things:
orders2017$newdate <- (factor(orders2017$orderdate))
orders2017$newdate1 <- as.Date(orders2017$newdate,format="%Y-%m-%d")
but nothing is working, and the new columns come out empty. Any help is appreciated.
If you really have values like "1/1/2017 1:05:00 AM" then those aren't dates, they are date times, and as such you have to specify formatting characters for both the date and time parts.
So, first you need to get your date times into a form R understands as such (e.g. POSIXct) by specifying all the parts of the date time:
test <- as.POSIXct("1/1/2017 1:05:00 AM", format = '%m/%d/%Y %I:%M:%S %p')
test
[1] "2017-01-01 01:05:00 CST"
See ?strftime if you are not familiar with all the formatting placeholders used above, and note the conditions for use of %I and %p.
Then you can convert the POSIXct vector into the date format you desire
format(test, format = '%Y-%m-%d')
[1] "2017-01-01"
A complication for you is that R has converted your character date times into a factor, so you need to convert them back to a character vector before converting to date times. For example (not tested as you didn't supply example data)
orders2017 <- transform(orders2017,
orderdate = as.POSIXct(as.character(orderdate),
format = '%m/%d/%Y %I:%M:%S %p'))
orders2017 <- transform(orders2017,
newdate = format(orderdate, format = '%Y-%m-%d'))
You were really close with as.Date(orders2017$newdate,format="%Y-%m-%d"), you just need to make the format string match your actual format.
Your actual format is mm/dd/YYYY, so use %m/%d/%Y as the format string:
as.Date("1/1/2017 1:05:00 AM", format = "%m/%d/%Y")
# [1] "2017-01-01"
Then the default printing of Date format objects is what you want.
So for your data,
orders2017$newdate1 <- as.Date(orders2017$newdate, format = "%m/%d/%Y")
The time part will just be ignored.
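Put together on a small made-up factor column (the two values below are invented, only the column names follow the question), that might look like the sketch below. The explicit as.character() is there to make the factor-to-text step visible.

```r
# Hypothetical two-row stand-in for orders2017.
orders2017 <- data.frame(
  orderdate = factor(c("1/1/2017 1:05:00 AM", "12/31/2017 11:59:00 PM"))
)

# Convert the factor to character first, then parse only the date part.
# Everything after the year is ignored by as.Date.
orders2017$newdate1 <- as.Date(as.character(orders2017$orderdate),
                               format = "%m/%d/%Y")

orders2017$newdate1
#> [1] "2017-01-01" "2017-12-31"
```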

NA returned while using strptime

I have this data frame which gives me Date and Time columns. I am trying to combine these two columns, but strptime is returning NA. I want to understand why this is happening.
x <- data.frame(date = "1/2/2007", time = "00:00:02")
y <- strptime(paste(x$date,x$time,sep = " "), format = "%b/%d/%y %H:%M:%S")
We need %m and %Y in place of %b and %y (%b - Abbreviated month name in the current locale on this platform. %y - Year without century (00–99)).
strptime(paste(x$date,x$time,sep = " "), "%m/%d/%Y %H:%M:%S")
#[1] "2007-01-02 00:00:02 IST"
For understanding the format, it is better to check ?strptime
Or we can use mdy_hms from lubridate
library(lubridate)
with(x, mdy_hms(paste(date, time)))
#[1] "2007-01-02 00:00:02 UTC"
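A side-by-side check of the two format strings may make the difference clearer. This is a sketch: tz = "UTC" is an assumption, and the good result is stored with as.POSIXct because strptime returns POSIXlt, which is less convenient as a data frame column.

```r
x <- data.frame(date = "1/2/2007", time = "00:00:02")
stamp <- paste(x$date, x$time)

# %b expects a month name such as "Jan", so the numeric month fails
# and the whole result is NA.
bad <- strptime(stamp, format = "%b/%d/%y %H:%M:%S", tz = "UTC")
is.na(bad)
#> [1] TRUE

# %m (numeric month) and %Y (four-digit year) match the input.
x$datetime <- as.POSIXct(stamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
format(x$datetime, "%Y-%m-%d %H:%M:%S")
#> [1] "2007-01-02 00:00:02"
```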

Change Format of Date Column

I need to turn one date format into another with RStudio, since lubridate and other date-related functions need a standard, unambiguous format for further work. I've included a few examples and some information below:
Example-Dataset:
Function,HiredDate,FiredDate
Waitress,16-06-01 12:40:02,16-06-13 11:43:12
Chef,16-04-17 15:00:59,16-04-18 15:00:59
Current Date Format (POSIXlt) of HiredDate and FiredDate:
"%y-%m-%d %H:%M:%S"
What I want the Date Format of HiredDate and FiredDate to be:
"%Y-%m-%d %H:%M:%S" / 2016-06-01 12:40:02
or
"%Y/%m/%d %H:%M:%S" / 2016/06/01 12:40:02
In principle, you can convert date and time for example using the strftime function:
d <- "2016-06-01 12:40:02"
strftime(d, format="%Y/%m/%d %H:%M:%S")
[1] "2016/06/01 12:40:02"
In your case, the year is causing trouble:
d <- "16-06-01 12:40:02"
strftime(d, format="%Y/%m/%d %H:%M:%S")
[1] "0016/06/01 12:40:02"
As Dave2e suggested, the two-digit year corresponds to %y. Note that strftime only controls the output here, so the year is still printed with two digits:
strftime(d, format="%y/%m/%d %H:%M:%S")
[1] "16/06/01 12:40:02"
Assuming that your data comes from the 20th and 21st centuries, you can paste a 19 or 20 in front of HiredDate and FiredDate:
currentYear <- 16
# compare the two-digit year numerically to pick the century prefix
prefixHire <- ifelse(as.numeric(substr(data$HiredDate, 1, 2)) <= currentYear, 20, 19)
prefixFire <- ifelse(as.numeric(substr(data$FiredDate, 1, 2)) <= currentYear, 20, 19)
data$HiredDate <- paste0(prefixHire, data$HiredDate)
data$FiredDate <- paste0(prefixFire, data$FiredDate)
The code generates a prefix by assuming that any date from a year greater than the current one ('16) is actually from the 20th century. The prefix is then pasted onto HiredDate and FiredDate.
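An alternative that avoids string surgery is to parse with %y on input and format with %Y on output. This is a sketch for a single value; tz = "UTC" is an assumption. Be aware that %y applies its own fixed rule on input (00 to 68 become 20xx, 69 to 99 become 19xx), which may or may not match your data.

```r
d <- "16-06-01 12:40:02"

# Parse step: %y reads the two-digit year (16 becomes 2016).
parsed <- as.POSIXct(d, format = "%y-%m-%d %H:%M:%S", tz = "UTC")

# Format step: %Y writes the year with its century.
format(parsed, "%Y-%m-%d %H:%M:%S")
#> [1] "2016-06-01 12:40:02"
format(parsed, "%Y/%m/%d %H:%M:%S")
#> [1] "2016/06/01 12:40:02"
```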

Resources