Convert date format to appropriate one? - r

I have a column filled with dates in the format 09nov1992 and want to convert them to the format 1992-Nov-09.
Any help would be appreciated.

Here's a simple way:
vec <- "09nov1992"
format(as.Date(vec, "%d%b%Y"), "%Y-%b-%d")
# [1] "1992-Nov-09"
An alternative version using regular expressions:
sub("(\\d+)(\\w)(\\w+?)(\\d+)", "\\4-\\U\\2\\L\\3-\\1", vec, perl = TRUE)
# [1] "1992-Nov-09"
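To apply the as.Date() approach to a whole column, something along these lines should work. This is only a sketch: the data frame and the column names dob and dob_fmt are made up for illustration, and %b relies on an English (or C) locale for the month abbreviations.

```r
# Hypothetical data frame; the column names are assumptions, not from the question.
df <- data.frame(dob = c("09nov1992", "23feb2001"), stringsAsFactors = FALSE)

# Parse day, abbreviated month name and four-digit year in one step.
parsed <- as.Date(df$dob, format = "%d%b%Y")

# Values that do not match the format come back as NA, so check before formatting.
if (anyNA(parsed)) warning("some dates could not be parsed")

# Write the dates back out as year-Mon-day text.
df$dob_fmt <- format(parsed, "%Y-%b-%d")
df
```

Keeping the parsed Date vector around is usually more useful than the formatted text, since it can still be sorted and used in date arithmetic.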

Related

R: Regex to string detect date-time format in R

What would be a solution to detect date-time format
14/07/2009 19:15:29
Is this a foolproof solution?
str_detect(s,regex("([0-9]{2}/[0-9]{2}/[0-9]{4}) [0-9]{2}:[0-9]{2}:[0-9]{2}"))
For example for the format
14.07.2009
I have written the regex to be
str_detect(date,regex("([0-9]{2}\\.[0-9]{2}\\.[0-9]{4})"))
I don't know much about regex in R, or regex in general, beyond the very basics, so I would appreciate an easy approach with the logic explained in detail. Thanks in advance.
As a beginner, I sometimes found it helpful to assemble the pattern as follows:
c(
"[0-9]{2}", # day
"/",
"[0-9]{2}", # month
"/",
"[0-9]{4}", # year
" ",
"[0-9]{2}", # Hour
":",
"[0-9]{2}", # minute
":",
"[0-9]{2}" # second
) |> paste(collapse = "")
Returns the pattern:
[1] "[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}"
stringr::str_detect("14/07/2009 19:15:29",
"[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}")
# [1] TRUE
Update (as per comments)
Here is how you could use the lubridate package. dmy_hms() parses datetimes in your format:
lubridate::dmy_hms("14/07/2009 19:15:29")
# [1] "2009-07-14 19:15:29 UTC"
But it will not parse invalid dates:
lubridate::dmy_hms("14/07/2009 19:15:70") # invalid seconds
# [1] NA
So to validate you could do:
(! is.na(lubridate::dmy_hms("14/07/2009 19:15:29")))
# [1] TRUE
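If you would rather stay in base R, the two checks can be combined: an anchored regex for the shape, and a parse to reject impossible values. This is a rough sketch; the helper name is_dmy_hms is made up, and the ^ and $ anchors are an addition so that extra text around the date is rejected.

```r
# Hypothetical helper: TRUE only for strings that look like dd/mm/yyyy HH:MM:SS
# and also parse as a real date-time.
is_dmy_hms <- function(x) {
  pattern <- "^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}$"
  shape_ok <- grepl(pattern, x)

  # as.POSIXct() returns NA when the values are not a valid date-time.
  parsed <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

  shape_ok & !is.na(parsed)
}

is_dmy_hms(c("14/07/2009 19:15:29",    # valid
             "14/13/2009 19:15:29",    # month 13 does not exist
             "x14/07/2009 19:15:29"))  # extra leading character
```

The regex alone accepts anything with the right number of digits, which is why the parse step is there as well.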

How to modify date values?

How could I modify raw date values? For example:
> DF2
Date
1 11012018
2 7312014
3 6102015
4 10202017
into modified date values, i.e. the ones with "/":
> DF2
Date
1 11/01/2018
2 7/31/2014
3 6/10/2015
4 10/20/2017
Use lubridate for all date and time related tasks
> lubridate::mdy(c("11012018", "7/31/2014"))
[1] "2018-11-01" "2014-07-31"
You can also format it if needed:
format(lubridate::mdy(c("11012018", "7/31/2014")), "%m/%d/%Y")
[1] "11/01/2018" "07/31/2014"
This assumes your dates are in month-day-year format; otherwise use one of the other lubridate parsers, such as dmy() or ymd().
We could also use gsub. (This assumes you just need to add a separator; you can still convert the result to a date type afterwards.)
new<-gsub("([0-9]{,2})([0-9]{2})([0-9]{4})","\\1 \\2 \\3",df$Date)
gsub(" ","/",new)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
Edit:
More generally, as suggested by @jay.sf,
test4<-gsub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)","\\1 \\2 \\3",df$Date)
gsub(" ","/",test4)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
This is to account for such date formats as:
test3<-c("11012018", "1112015", "7312014", "7312014", "10202017", "772007", "772007",
"7072007")
One possible solution could be:
df <- transform(df, V1 = as.Date(as.character(V1), "%d%m%Y"))
Another option parses every column in month-day-year order. Note that the result is a Date, so wrap it in format(x, "%m/%d/%Y") if you need mm/dd/yyyy text:
df <- data.frame(lapply(df, function(x) as.Date(as.character(x), "%m%d%Y")))
Both solutions use only base R.
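If the values were read in as numbers, the 7-digit ones have most likely just lost the leading zero of the month. Under that assumption (it is an assumption, and it does not cover ambiguous inputs such as 772007), a minimal base R sketch is to pad back to eight digits before parsing:

```r
# Raw values as in the question; stored as numbers, so a leading zero is lost.
raw <- c(11012018, 7312014, 6102015, 10202017)

# Assumption: every value is mmddyyyy with only the month's leading zero dropped.
padded <- sprintf("%08d", as.integer(raw))

# With a fixed width of eight digits the format is unambiguous.
dates <- as.Date(padded, format = "%m%d%Y")

# Back to text with "/" separators (months are zero-padded here).
format(dates, "%m/%d/%Y")
```

Unlike the gsub versions, this gives 07/31/2014 rather than 7/31/2014, because format() always pads %m to two digits.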

convert Single digit day in R

I have dates in the format Apr42016, Aug12017, Apr112018. I am trying to convert them to Y/m/d using R. I have tried the code below, but when the day has a single digit it returns NA. Could anyone help me, please?
strptime(data$date, "%b%e%Y")
as.Date (data$date, format="%b%d%Y")
as.POSIXct(data$date, format="%b%e%Y")
Thank you
You can modify the strings with sub (and add a 0 if necessary) before using as.Date:
myvec <- c("Apr42016", "Aug12017", "Apr112018") # the data
# insert a 0 only where a non-digit is followed by exactly five digits (one-digit day + year)
myvec2 <- sub("(?<=\\D)(?=[0-9]{5}$)", "0", myvec, perl = TRUE)
# [1] "Apr042016" "Aug012017" "Apr112018"
as.Date(myvec2, format = "%b%d%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
If you can break up the numbers before as.Date, it will make things much easier. (Borrowing Sven's look-behind.)
sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2",
c("Apr42016", "Aug12017", "Apr112018"), perl=TRUE)
# [1] "Apr-4-2016" "Aug-1-2017" "Apr-11-2018"
From here, the format should be rather straight-forward:
as.Date(sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", c("Apr42016", "Aug12017", "Apr112018"), perl = TRUE),
format="%b-%d-%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
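as.Date() prints dates with dashes, so if the Y/m/d layout from the question is needed as text, one more format() call does it. A short sketch reusing the same sub() pattern (it assumes an English or C locale so that %b recognises Apr and Aug):

```r
x <- c("Apr42016", "Aug12017", "Apr112018")

# Split the trailing digits into day and four-digit year.
with_sep <- sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", x, perl = TRUE)

# Parse, then write out in year/month/day order.
dates <- as.Date(with_sep, format = "%b-%d-%Y")
format(dates, "%Y/%m/%d")
```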

String between first two (.dots)

Hi, I have data that contains two or more dots. My requirement is to get the string between the first and the second dot.
E.g. string <- "abcd.vdgd.dhdsg"
Expected result = vdgd
I have used
pt <-strapply(string, "\\.(.*)\\.", simplify = TRUE)
which gives the correct result, but for strings with more than two dots it does not work as expected.
E.g. string <- "abcd.vdgd.dhdsg.jsgs"
It gives dhdsg.jsgs, but the expected result is vdgd.
Could anyone help me?
Thanks & Regards,
In base R we can use strsplit
ss <- "abcd.vdgd.dhdsg"
unlist(strsplit(ss, "\\."))[2]
#[1] "vdgd"
Or using gregexpr with regmatches
unlist(regmatches(ss, gregexpr("[^\\.]+", ss)))[2]
#[1] "vdgd"
Or using gsub (thanks @TCZhang)
gsub("^.+?\\.(.+?)\\..*$", "\\1", ss)
#[1] "vdgd"
Another option:
string <- "abcd.vdgd.dhdsg.jsgs"
library(stringr)
str_extract(string = string, pattern = "(?<=\\.).*?(?=\\.)")
[1] "vdgd"
I like this one because the str_extract function will return the first instance of the correct pattern, but you could also use str_extract_all to get all instances.
str_extract_all(string = string, pattern = "(?<=\\.).*?(?=\\.)")
[[1]]
[1] "vdgd" "dhdsg"
From here, you could index to get any position between two dots you want.
Another solution with the qdapRegex package:
library(qdapRegex)
ex_between("abcd.vdgd.dhdsg.jsgs", ".", ".")[[1]][1]
# "vdgd"
You can use read.table as well if you wish. Pass the string from your problem as text and set the separator to a dot ("."). Once the string has been converted into a data.frame, you can select whichever column you want (in this case, column 2).
read.table(text=string, sep=".",stringsAsFactors = FALSE)[,2]
Output:
> read.table(text=string, sep=".",stringsAsFactors = FALSE)[,2]
[1] "vdgd"
Here is a fun easy way via stringr
stringr::word(string, 2, sep = '\\.')
Here are two options that are vectorized over the input string vector.
You can try tstrsplit from data.table:
> string <- c("abcd.vdgd.dhdsg", "abcd.vdgd.dhdsg.jsgs")
> tstrsplit(string, '.', fixed = TRUE)[[2]]
[1] "vdgd" "vdgd"
or regex:
> sub('.*?\\.(.*?)\\..*', '\\1', string)
[1] "vdgd" "vdgd"
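A vectorized version is also possible in base R alone. The sketch below uses a made-up helper name, second_field, and returns NA for inputs that contain no dot at all, which is an added convention rather than something the question asks for:

```r
# Hypothetical helper: second dot-separated field of each element.
second_field <- function(x) {
  # fixed = TRUE treats "." literally, so no regex escaping is needed.
  parts <- strsplit(x, ".", fixed = TRUE)

  # Take the second piece where it exists, NA otherwise.
  vapply(parts,
         function(p) if (length(p) >= 2) p[2] else NA_character_,
         character(1))
}

second_field(c("abcd.vdgd.dhdsg", "abcd.vdgd.dhdsg.jsgs", "nodots"))
```

vapply() is used instead of sapply() so the result is always a character vector, even for empty input.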

numeric sort a list of strings in R

I have a list:
a <- ["12file.txt", "8file.txt", "66file.txt"]
I would like to sort by number:
a would be: ["8file.txt", "12file.txt", "66file.txt"]
Now I could get only this:
a = ["12file.txt", "66file.txt", "8file.txt"]
Thanks
I'm assuming you have a character vector:
a <- c("12file.txt", "8file.txt", "66file.txt")
I would approach this by pulling out the number at the start of each string and sorting on that:
num <- as.numeric(sub("([0-9]+).*", "\\1", a))
a[order(num)]
#[1] "8file.txt" "12file.txt" "66file.txt"
You could also pad your strings with spaces by setting a field length to sprintf to achieve the sorting you want:
a[order(sprintf("%10s",a))]
[1] "8file.txt" "12file.txt" "66file.txt"
You can use str_sort(..., numeric = TRUE) function from stringr package:
library(stringr)
a <- c("12file.txt", "8file.txt", "66file.txt")
str_sort(a, numeric = TRUE)
#> [1] "8file.txt" "12file.txt" "66file.txt"
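If some file names might not start with a number, the sub() approach needs a small guard. A possible base R sketch follows; the function name sort_by_leading_number and the choice to put such names last are both assumptions made for illustration:

```r
# Hypothetical helper: sort by the leading number, names without one go last.
sort_by_leading_number <- function(x) {
  has_num <- grepl("^[0-9]+", x)

  # Start with NA and fill in the number only where one is present.
  num <- rep(NA_real_, length(x))
  num[has_num] <- as.numeric(sub("^([0-9]+).*$", "\\1", x[has_num]))

  x[order(num, na.last = TRUE)]
}

sort_by_leading_number(c("12file.txt", "8file.txt", "readme.txt", "66file.txt"))
```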
