How to separate date and time - r

I'm giving an input as a <- "[12/Dec/2014:05:45:10]"
a is not a timestamp, so I cannot use any date and time functions on it.
Now I want the above variable to be split into 2 parts:
date --> 12/Dec/2014
time --> 05:45:10
Any help will be appreciated.

You can use gsub to insert a space between the date and the time, and then use that space to read the result into two columns with read.table:
read.table(text=gsub('^\\[([^:]+):(.*)\\]', '\\1 \\2', a),
sep="", col.names=c('Date', 'Time'))
# Date Time
# 1 12/Dec/2014 05:45:10
Or you can use lubridate to convert it to an object of class 'POSIXct':
library(lubridate)
a1 <- dmy_hms(a)
a1
#[1] "2014-12-12 05:45:10 UTC"
If we need two columns in the specified format:
d1 <- data.frame(Date=format(a1, '%d/%b/%Y'), Time=format(a1, '%H:%M:%S'))
Data:
a <- "[12/Dec/2014:05:45:10]"
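If only the two pieces are needed as plain strings, a minimal base R sketch that avoids any date parsing (and therefore any locale issues) could look like the following. The names stripped, date_part and time_part are made up for illustration and do not appear in the answers above.

```r
a <- "[12/Dec/2014:05:45:10]"

# Drop the surrounding square brackets
stripped <- gsub("^\\[|\\]$", "", a)

# Everything before the first colon is the date ...
date_part <- sub(":.*$", "", stripped)
# ... and everything after the first colon is the time
time_part <- sub("^[^:]*:", "", stripped)

date_part  # "12/Dec/2014"
time_part  # "05:45:10"
```

This relies on the date part never containing a colon, which holds for the input shown in the question.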

Code
a <- "[12/Dec/2014:05:45:10]"
Sys.setlocale("LC_TIME", "C") # needed if your locale does not use English month abbreviations
as.POSIXlt(a, format = "[%d/%b/%Y:%H:%M:%S]")
Explanation
Depending on your locale, you may need to change LC_TIME so that the abbreviated English month names (such as Dec) can be parsed. You can then use as.POSIXlt with a format string to convert the string into a date-time object.

Related

How to convert different date formats to single format in multiple columns of dataframe

I have a dataframe with dates in different formats scattered across the columns and I would like to standardize them to a single format. I can do the standardization for a single vector of heterogeneous dates, as in d, by defining the possible date formats in a vector such as formats and passing it to as.Date:
d <- c("01-02-2009","01/04/2009","15-Jan-2019", "12-12-2020")
formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")
format(as.Date(d, format = formats), "%d-%b-%Y")
[1] "01-Feb-2009" "01-Apr-2009" "15-Jan-2019" "12-Dez-2020"
But this doesn't work for the dataframe:
df <- data.frame(Transaction = c("01-Mar-2015", "31-01-2012", "15/01/1999"),
Delivery = c("01-02-2018", "01/08/2016", "17-09-2007"),
Return = c("27/11/2009", "22-Jan-2013", "20-Nov-1987"))
Here, the standardization works only partly:
df[,1:3] <- lapply(df[,1:3], function(x) format(as.Date(x, format = formats), "%d-%b-%Y"))
df
Transaction Delivery Return
1 <NA> 01-Feb-2018 <NA>
2 <NA> 01-Aug-2016 <NA>
3 <NA> <NA> 20-Nov-1987
How can the dates be standardized to the %d-%b-%Y format in the whole dataframe?
With mutate_all you can convert all character columns of your dataframe to a single date format: parse them with the parse_date_time function from lubridate, passing your vector of formats as the orders argument.
Then you can format these dates as the desired output using format:
library(lubridate)
library(dplyr)
formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")
df %>% mutate_all( ~parse_date_time(., orders = formats)) %>%
mutate_all(~format(., "%d-%b-%Y"))
Transaction Delivery Return
1 01-Mar-2015 01-Feb-2018 27-Nov-2009
2 31-Jan-2012 01-Aug-2016 22-Jan-2013
3 15-Jan-1999 17-Sep-2007 20-Nov-1987
With apply, you can do the following (note that the result is a character matrix rather than a data frame):
library(lubridate)
apply(df, 2, function(x) format(parse_date_time(x, orders = formats), "%d-%b-%Y"))
Transaction Delivery Return
[1,] "01-Mar-2015" "01-Feb-2018" "27-Nov-2009"
[2,] "31-Jan-2012" "01-Aug-2016" "22-Jan-2013"
[3,] "15-Jan-1999" "17-Sep-2007" "20-Nov-1987"
Does this answer your question?
NB: parse_date_time works with lubridate version 1.7.8. For lubridate version 1.7.4, you can use parse_date and replace orders with format.
The issue is that the formats in each column appear in a different order from the one in formats. So, for the 'Transaction' column, we need something like
as.Date(df$Transaction, format = c("%d-%b-%Y", "%d-%m-%Y", "%d/%m/%Y"))
#[1] "2015-03-01" "2012-01-31" "1999-01-15"
i.e. the formats specified by the OP are
formats
#[1] "%d-%m-%Y" "%d/%m/%Y" "%d-%b-%Y"
if we check the 'Transaction' column
df$Transaction
#[1] 01-Mar-2015 31-01-2012 15/01/1999
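Its entries are in the formats %d-%b-%Y, %d-%m-%Y and %d/%m/%Y, in that order, which does not match the order of the existing formats vector.
Also, to make this clearer: a vector passed as format is applied elementwise, i.e. the first format is tried on the first entry, the second format on the second entry, and so on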
as.Date(df$Transaction, format = c("%d-%b-%Y", "%d/%m/%Y"))
#[1] "2015-03-01" NA NA
i.e. "%d/%m/%Y" should have matched the third entry, but because the matching is elementwise, it is only checked against the second entry. The format vector is then recycled, as it is shorter than the 'Transaction' column, so the third entry is checked against "%d-%b-%Y" again.
This implies that for a dataset of 1e6 rows, as.Date effectively expects 1e6 formats, each one matching the corresponding element.
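One way to sidestep the elementwise pairing in base R is to try one format at a time on the whole vector and only fill in the entries that are still NA. The sketch below is an illustration under that assumption; parse_multi is a hypothetical helper name, not a function from base R or any package, and the %b format assumes a locale with English month abbreviations.

```r
# Hypothetical helper: try each format in turn on the entries not yet parsed
parse_multi <- function(x, formats) {
  x <- as.character(x)                 # also handles factor columns
  out <- rep(as.Date(NA), length(x))   # start with all dates missing
  for (fmt in formats) {
    todo <- is.na(out)                 # entries that no earlier format matched
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")
df <- data.frame(Transaction = c("01-Mar-2015", "31-01-2012", "15/01/1999"),
                 Delivery = c("01-02-2018", "01/08/2016", "17-09-2007"),
                 Return = c("27/11/2009", "22-Jan-2013", "20-Nov-1987"),
                 stringsAsFactors = FALSE)

# Standardize every column to %d-%b-%Y
df[] <- lapply(df, function(col) format(parse_multi(col, formats), "%d-%b-%Y"))
df
```

Because each format is applied to the whole vector, the order of the entries in a column no longer matters. Entries that match none of the formats stay NA.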
Or use anydate from the anytime package, after registering the additional formats with addFormats:
library(anytime)
addFormats(c('%d-%m-%Y', '%d/%m/%Y'))
df[] <- lapply(df, function(x) format(anydate(x), "%d-%b-%Y"))
df
# Transaction Delivery Return
#1 01-Mar-2015 01-Feb-2018 27-Nov-2009
#2 31-Jan-2012 01-Aug-2016 22-Jan-2013
#3 15-Jan-1999 17-Sep-2007 20-Nov-1987

Change column with different formats into dates

I would like to convert a column of my data.frame to the Date class in R.
The problem is that the format of the column is not consistent.
Most rows are in the format "%Y-%m-%d", and I can convert them easily with the as.Date() function.
A few rows are in the format "%Y/%d/%m"; I can't convert them with the as.Date() function and get NAs instead.
input <- c("2019-01-22", "2019-04-17", "2019/27/05", "2019/13/05", "2019/15/06", "2019-07-30")
Input:        Output:
Dates         Dates
2019-01-22    2019-01-22
2019-04-17    2019-04-17
2019/27/05    2019-27-05
2019/13/05    2019-13-05
2019/15/06    2019-15-06
2019-07-30    2019-07-30
In your case, in which you have "%Y-%m-%d" and "%Y/%d/%m", you might want to call as.Date with the format that each entry actually has. So, for example:
input <- c("2019-10-11", "2019/27/10", "2014-12-10")
If you use:
input2 <- ifelse(grepl("/",input), format(as.Date(input,"%Y/%d/%m"),"%Y-%m-%d"), input)
then:
> input2
[1] "2019-10-11" "2019-10-27" "2014-12-10"
If the dates in both formats are in year-month-day order and differ only in the separator, you can replace every / with -:
Example:
input <- c("2019-10-11", "2019/10/12", "2014-10-13")
as.Date(gsub("/", "-", input), format = "%Y-%m-%d")
# [1] "2019-10-11" "2019-10-12" "2014-10-13"
If only a few rows use the / format, the following might be an efficient way to convert them: parse everything with the default format first, and then re-parse only the entries that came back as NA
output <- as.Date(input)
output[is.na(output)] <- as.Date(input[is.na(output)], format = "%Y/%d/%m")
such that
> output
[1] "2019-01-22" "2019-04-17" "2019-05-27" "2019-05-13" "2019-06-15" "2019-07-30"
We can use anydate from anytime
library(anytime)
anydate(input)
#[1] "2019-10-11" "2019-10-12" "2014-10-13"
Or using lubridate
library(lubridate)
ymd(input)
data
input <- c("2019-10-11", "2019/10/12", "2014-10-13")

how to split numbers

I have a date in a format like this:
5170301, which means 1st March 2017, with a 5 attached in front of it.
I want to change the format of the date.
Can anyone help me split that 5 from the date?
We can use substring to read from the 2nd character onwards
v1 <- substring(df1$date, 2)
NOTE: This should work whether the column is of class numeric, character, or factor
Then we change it to Date class
v2 <- as.Date(v1, "%y%m%d")
and if needed change the format
format(v2, "%d %b %Y")
Or, as @thelatemail mentioned, the leading 5 can be included in the format string (convert to character first if the column is numeric)
as.Date(as.character(df1$date), "5%y%m%d")
You can split it quite nicely with the stringr package
Split <- stringr::str_split_fixed(string=Column_Name, pattern="5", n=2)
This will give a matrix with two columns: one blank and one with your value after the first "5" (170301)
You can then convert the second column to a date like so (the value is in year-month-day order, hence "%y%m%d"):
Date1 <- as.Date(Split[, 2], format="%y%m%d")
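For reference, here is a small end-to-end sketch of the substring approach. The data frame df1 and its two values are made up for illustration, on the assumption that every value carries a single leading 5 followed by a date in yymmdd order.

```r
# Hypothetical example data: a leading 5 followed by yymmdd
df1 <- data.frame(date = c(5170301, 5161225))

# Drop the first character (the numeric column is coerced to character)
v1 <- substring(df1$date, 2)

# Parse the remaining six digits as two-digit year, month, day
v2 <- as.Date(v1, format = "%y%m%d")

format(v2, "%Y-%m-%d")  # "2017-03-01" "2016-12-25"
```

Note that as.Date maps two-digit years 00 to 68 to the 2000s and 69 to 99 to the 1900s, so this only fits data from that range.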

How to convert a date to YYYYDDD?

I can't figure out how to turn Sys.Date() into a number in the format YYYYDDD, where DDD is the day of the year, i.e. Jan 1 would be 2016001 and Dec 31 would be 2016366 (2016 is a leap year).
Date <- Sys.Date() ## The Variable Date is created as 2016-01-01
SomeFunction(Date) ## Returns 2016001
You can just use the format function as follows:
format(Date, '%Y%j')
which gives:
[1] "2016161" "2016162" "2016163"
If you want to format it in other ways, see ?strptime for all the possible options.
Alternatively, you could use the year and yday functions from the data.table or lubridate packages and paste them together with paste0, zero-padding the day of the year to three digits with sprintf:
library(data.table) # or: library(lubridate)
paste0(year(Date), sprintf("%03d", yday(Date)))
which will give you the same result.
The values returned by both options are of class character. Wrap the above solutions in as.numeric() to get numeric values.
Used data:
> Date <- Sys.Date() + 1:3
> Date
[1] "2016-06-09" "2016-06-10" "2016-06-11"
> class(Date)
[1] "Date"
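As a quick check of the format approach at both ends of a year, the sketch below uses three fixed dates (chosen here for illustration, not taken from the question) instead of Sys.Date(), so the result does not depend on when it is run.

```r
# Fixed example dates in a leap year
d <- as.Date(c("2016-01-01", "2016-03-01", "2016-12-31"))

# %j is the day of the year, zero-padded to three digits
yyyyddd <- format(d, "%Y%j")
yyyyddd              # "2016001" "2016061" "2016366"

# Convert to integers if a numeric result is needed
as.integer(yyyyddd)  # 2016001 2016061 2016366
```

The zero-padding from %j is what keeps early January dates at seven digits.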
Here's one option with lubridate:
library(lubridate)
x <- Sys.Date()
#[1] "2016-06-08"
paste0(year(x), sprintf("%03d", yday(x))) # sprintf pads the day of the year to three digits
#[1] "2016160"
This should work for creating a new column with the specified date format (assuming df is a data frame with a Date column):
df$YearDay <- format(as.Date(df$Date), "%Y%j")
When working with larger data sets, it can be faster to add the column by reference with data.table:
library(data.table)
setDT(df)[, NewDate := format(as.Date(Date), "%Y%j")]
Hope this helps. You may have to tinker with it if you only want a single value and are not working with a data set.

Create a proper date variable from an existing date variable in R

I have a data frame 'rta' with a date variable (date of death) with data entered in multiple formats like DD/MM/YY, D/M/YY, DD/M/YY, D/MM/YY, DD/MM, D/MM, D/M, DD/M.
rta$date.of.death<-c('12/12/08' ,'1/10/08','4/3/08','24/5/08','23/4','11/11','1/12')
Luckily all the dates belong to the year 2008.
I want to convert this variable to a uniform format of DD/MM/YYYY, for example 12/12/2008. How can I do this?
You could use this quick'n'dirty way:
rta <- data.frame(date.of.death=c('12/12/08' ,'1/10/08', '4/3/08',
'24/5/08','23/4','11/11','1/12'),
stringsAsFactors=FALSE)
# append '/08' to the dates without year
noYear <- grep('.+/.+/.+',rta$date.of.death,invert=TRUE)
rta$date.of.death[noYear] <- paste(rta$date.of.death[noYear],'08',sep='/')
# convert the strings into POSIXct dates
dates <- as.POSIXct(rta$date.of.death, format='%d/%m/%y')
# turn the dates into strings having format: DD/MM/YYYY
rta$date.of.death <- format(dates,format='%d/%m/%Y')
> rta$date.of.death
[1] "12/12/2008" "01/10/2008" "04/03/2008" "24/05/2008" "23/04/2008" "11/11/2008" "01/12/2008"
Note:
this code assumes that no date has a four-digit year e.g. 01/01/2008
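Try this (as.Date ignores any trailing characters beyond the format, so appending "/08" to entries that already have a year is harmless):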
as.Date(paste0(rta$date.of.death, "/08"), "%d/%m/%y")
giving
[1] "2008-12-12" "2008-10-01" "2008-03-04" "2008-05-24" "2008-04-23"
[6] "2008-11-11" "2008-12-01"

Resources