Convert single-digit day in R

I have dates in the format Apr42016, Aug12017, Apr112018. I am trying to convert them to Y/m/d using R. I have tried the code below, but when the day has a single digit it returns NA. Could anyone help me, please?
strptime(data$date, "%b%e%Y")
as.Date(data$date, format = "%b%d%Y")
as.POSIXct(data$date, format="%b%e%Y")
Thank you

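You can modify the strings with sub (adding a 0 where necessary) before using as.Date:
myvec <- c("Apr42016", "Aug12017", "Apr112018") # the data
# insert a 0 after the month name when exactly five digits follow
# (a one-digit day plus a four-digit year)
myvec2 <- sub("(?<=[A-Za-z])(?=[0-9]{5}$)", "0", myvec, perl = TRUE)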
# [1] "Apr042016" "Aug012017" "Apr112018"
as.Date(myvec2, format = "%b%d%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"

If you can break up the numbers before as.Date, it will make things much easier. (Borrowing Sven's look-behind.)
sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2",
c("Apr42016", "Aug12017", "Apr112018"), perl=TRUE)
# [1] "Apr-4-2016" "Aug-1-2017" "Apr-11-2018"
From here, the format should be rather straightforward:
as.Date(sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", c("Apr42016", "Aug12017", "Apr112018"), perl = TRUE),
format="%b-%d-%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
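One caveat with both answers: %b matches month abbreviations in the current locale, so the same call may return NA on a non-English system. The following is only a sketch of a locale-independent variant. The name parse_compact_date is made up for illustration, and it assumes English three-letter abbreviations capitalised as in the question (they are looked up in R's built-in month.abb).

```r
# Hypothetical helper: parse strings such as "Apr42016" without relying on %b
parse_compact_date <- function(x) {
  # Split into month name, 1-2 digit day and 4 digit year
  parts <- utils::strcapture(
    "^([A-Za-z]{3})([0-9]{1,2})([0-9]{4})$", x,
    proto = data.frame(mon = character(), day = integer(), year = integer(),
                       stringsAsFactors = FALSE)
  )
  # month.abb is c("Jan", "Feb", ...); match() returns the month number
  as.Date(ISOdate(parts$year, match(parts$mon, month.abb), parts$day))
}

parse_compact_date(c("Apr42016", "Aug12017", "Apr112018"))
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
```

Strings that do not fit the pattern come back as NA rather than raising an error.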

Related

R: Regex to string detect date-time format in R

What would be a solution to detect the date-time format
14/07/2009 19:15:29
Is this a foolproof solution?
str_detect(s,regex("([0-9]{2}/[0-9]{2}/[0-9]{4}) [0-9]{2}:[0-9]{2}:[0-9]{2}"))
For example for the format
14.07.2009
I have written the regex to be
str_detect(date,regex("([0-9]{2}\\.[0-9]{2}\\.[0-9]{4})"))
I don't know much about regex in R or regex in general, just the very basic stuff, so I would appreciate an easy approach with detailed logic. Thanks in advance.
As a beginner, I sometimes found it helpful to assemble the pattern as follows:
c(
"[0-9]{2}", # day
"/",
"[0-9]{2}", # month
"/",
"[0-9]{4}", # year
" ",
"[0-9]{2}", # Hour
":",
"[0-9]{2}", # minute
":",
"[0-9]{2}" # second
) |> paste(collapse = "")
Returns the pattern:
[1] "[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}"
stringr::str_detect("14/07/2009 19:15:29",
"[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}")
# [1] TRUE
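Note that the pattern above is unanchored, so it also reports TRUE when the date-time is only part of a longer string. If the whole string has to be a date-time, a minimal sketch (using base R's grepl rather than stringr, with made-up test strings) anchors the pattern with ^ and $:

```r
# Same pattern as above, anchored to the start (^) and end ($) of the string
pattern <- "^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}$"

x <- c("14/07/2009 19:15:29",      # exact match
       "on 14/07/2009 19:15:29!",  # date-time embedded in other text
       "14/07/2009")               # date only
grepl(pattern, x)
# [1]  TRUE FALSE FALSE
```

This still only checks the shape of the string; something like 99/99/2009 99:99:99 would pass.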
Update (as per comments)
Here is how you could use the lubridate package. dmy_hms() finds datetimes in your format:
lubridate::dmy_hms("14/07/2009 19:15:29")
# [1] "2009-07-14 19:15:29 UTC"
But it will not parse invalid dates:
lubridate::dmy_hms("14/07/2009 19:15:70") # invalid seconds
# [1] NA
So to validate you could do:
(! is.na(lubridate::dmy_hms("14/07/2009 19:15:29")))
# [1] TRUE
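If you would rather not depend on lubridate, roughly the same validity check can be sketched in base R. The function name is_valid_dmy_hms is an illustration only. Keep in mind that strptime-style parsing ignores trailing characters, so this tests whether the fields form a real date-time, not the exact shape of the string.

```r
# Hypothetical helper: TRUE where the string parses as day/month/year h:m:s
is_valid_dmy_hms <- function(x) {
  parsed <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  !is.na(parsed)  # unparseable or out-of-range fields give NA
}

is_valid_dmy_hms(c("14/07/2009 19:15:29",   # fine
                   "32/07/2009 19:15:29",   # day out of range
                   "14/13/2009 19:15:29"))  # month out of range
# [1]  TRUE FALSE FALSE
```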

How to modify date values?

How could I modify raw date values? For example:
> DF2
Date
1 11012018
2 7312014
3 6102015
4 10202017
into modified date values, with "/" as the separator:
> DF2
Date
1 11/01/2018
2 7/31/2014
3 6/10/2015
4 10/20/2017
Use lubridate for all date- and time-related tasks:
> lubridate::mdy(c("11012018", "7/31/2014"))
[1] "2018-11-01" "2014-07-31"
You can also format it if needed:
format(lubridate::mdy(c("11012018", "7/31/2014")), "%m/%d/%Y")
[1] "11/01/2018" "07/31/2014"
This assumes your dates are in month-day-year format; otherwise, use one of the other lubridate functions (for example dmy or ymd).
We could also use gsub (assuming you just need to add a separator; in any case, you could convert back to a date-time type afterwards):
new<-gsub("([0-9]{,2})([0-9]{2})([0-9]{4})","\\1 \\2 \\3",df$Date)
gsub(" ","/",new)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
Edit:
More generally, as suggested by @jay.sf:
test4<-gsub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)","\\1 \\2 \\3",df$Date)
gsub(" ","/",test4)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
This is to account for such date formats as:
test3<-c("11012018", "1112015", "7312014", "7312014", "10202017", "772007", "772007",
"7072007")
One possible solution could be (note that this format string reads the values as day-month-year):
df <- transform(df, V1 = as.Date(as.character(V1), "%d%m%Y"))
Another, which reads the values as month-day-year (mm/dd/yyyy), is:
df <- data.frame(lapply(df, function(x) as.Date(as.character(x), "%m%d%Y")))
Both solutions use only base R.
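Undelimited strings in which the month may have one or two digits are ambiguous, and it can be hard to predict how the digits get split between %m and %d. One way to sidestep that, sketched here under the assumption that a 7-character value is only missing the leading zero of the month (as in the sample data), is to pad to eight digits before parsing:

```r
raw <- c("11012018", "7312014", "6102015", "10202017")

# Assumption: 7 characters means a one-digit month, so prepend a zero
padded <- ifelse(nchar(raw) == 7, paste0("0", raw), raw)

dates <- as.Date(padded, format = "%m%d%Y")
format(dates, "%m/%d/%Y")
# [1] "11/01/2018" "07/31/2014" "06/10/2015" "10/20/2017"
```

Unlike the gsub approach, this yields real Date objects, so the months come out zero-padded when formatted.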

Is this an encoding issue?

I downloaded a text file that contains basically two columns—a date column and a contents column.
The date column was initially in the format: mm/dd/yy h:mm:ss am/pm. For example, one such date would be 10/16/2018 8:10:10 PM
I wanted to isolate the plain date. I split the text column using strsplit(), so now I have a vector with dates in the common mm/dd/yy format. I want to convert this using the as.Date(x, format = '%m/%d/%y') command.
I notice, however, that I get a good chunk of my character vector coming out as NA. I compared the NA values to the values surrounding it. Here is what I see:
normal_vector[1:3]
[1] "10/12/17" "‎10/12/17" "10/12/17"
The middle one (normal_vector[2]) is the problem one.
as.Date(normal_vector[1:3], format = "%m/%d/%y")
[1] "2017-10-12" NA "2017-10-12"
Could this be an encoding issue? I tried using as.Date(iconv(normal_vector[1:3], to = "UTF-8"), format = "%m/%d/%y"), but it does not appear to help. Furthermore, if I inspect the encoding of the character vector as it already is, I get the following:
Encoding(normal_vector[1:3])
[1] "unknown" "UTF-8" "unknown"
Again, I just want to convert all three of these elements into a normal date object in R. They appear identical, and the encoding would have me think that a "UTF-8" character would be easily handled by an as.Date() function. What are some possible reasons that it refuses to be converted to a date?
Thanks!
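There are indeed some strange characters in your second string: it starts with three extra bytes.
Look at the hex: e2 80 8e, which is the UTF-8 encoding of U+200E (LEFT-TO-RIGHT MARK), an invisible formatting character.
fread from the data.table package can read this text just fine: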
data.table::fread("./temp.csv", header = FALSE)
# V1 V2 V3
# 1: 10/12/17 ‎10/12/17 10/12/17
After reading, you can cleanse your data using some regex magic:
dt <- data.table::fread("./temp.csv", header = FALSE)
# V1 V2 V3
# 1: 10/12/17 ‎10/12/17 10/12/17
# strip every character that is NOT 0-9, a-z, A-Z or '/'
cleaned <- as.character( gsub( "[^0-9a-zA-Z/]", "", as.matrix( dt ) ) )
as.Date( cleaned, format = "%m/%d/%y" )
# [1] "2017-10-12" "2017-10-12" "2017-10-12"
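To confirm the diagnosis without reading a file, the hidden bytes can be inspected with charToRaw, and the mark can be removed on its own instead of whitelisting characters. This is only a sketch: the vector x below is a hand-built stand-in for normal_vector, with "\u200e" written out explicitly.

```r
# Stand-in data: the second element starts with an invisible U+200E
x <- c("10/12/17", "\u200e10/12/17", "10/12/17")

charToRaw(x[2])[1:3]
# [1] e2 80 8e

# Remove only the left-to-right mark, then parse as usual
cleaned <- gsub("\u200e", "", x, fixed = TRUE)
dates <- as.Date(cleaned, format = "%m/%d/%y")
dates
# [1] "2017-10-12" "2017-10-12" "2017-10-12"
```

Targeting the one offending character keeps any other legitimate non-ASCII text intact, which the whitelist regex above would strip.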

How to insert "-" (hyphen symbol) within a character vector in R?

I have a vector
a <- "20160402"
I want to insert a "-" symbol at positions 5 and 8.
The result should look like this:
"2016-04-02"
I was trying to use ins(a, "-", pos = c(5,8)),
but this did not work. Can anyone please help me?
Thank you
We can easily do the conversion with lubridate
library(lubridate)
ymd(a)
#[1] "2016-04-02"
Or use the correct format with as.Date
as.Date(a, '%Y%m%d')
#[1] "2016-04-02"
If we are looking for a regex solution, capture the characters as a group and use the backreference as the replacement
sub('(.{4})(.{2})(.{2})', '\\1-\\2-\\3', a)
#[1] "2016-04-02"
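Since the positions are fixed, the same result can also be had without a regex. A minimal sketch with base R's substr, assuming every string is exactly eight characters in yyyymmdd order:

```r
a <- "20160402"

# Cut out year, month and day by position and join them with "-"
res <- paste(substr(a, 1, 4), substr(a, 5, 6), substr(a, 7, 8), sep = "-")
res
# [1] "2016-04-02"
```

Like the sub version, this returns a character string, not a Date.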

Convert date format to appropriate one?

I have a column filled with dates in the format 09nov1992 and want to convert them to 1992-Nov-01.
Any help would be appreciated.
Here's a simple way:
vec <- "09nov1992"
format(as.Date(vec, "%d%b%Y"), "%Y-%b-%d")
# [1] "1992-Nov-09"
An alternative version using regular expressions:
sub("(\\d+)(\\w)(\\w+?)(\\d+)", "\\4-\\U\\2\\L\\3-\\1", vec, perl = TRUE)
# [1] "1992-Nov-09"
