Add "\\-" every "x" letters of a string vector - r

I have a date vector like this
date <- c("01jan2020", "04mar2020", "20dec2020")
and I want to separate it with - following the next pattern (after the first 2 digits and after the first 5 digits):
date_transform1 <- c("01-jan-2020", "04-mar-2020", "20-dec-2020")
Next I want to convert the first letter of the month into a capital letter:
date_transform2 <- c("01-Jan-2020", "04-Mar-2020", "20-Dec-2020")
Any clue?
Regards

An option with lubridate and format
library(lubridate)
format(dmy(date), "%d-%b-%Y")
#[1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"

You can try this approach, splitting each string into its components:
#Data
date <- c("01jan2020", "04mar2020", "20dec2020")
#Keep only the numeric characters, e.g. "012020"
digits <- gsub("[^0-9.-]", "", date)
#Day: first two digits
x1 <- substr(digits, 1, 2)
#Year: last four digits
x2 <- substr(digits, nchar(digits) - 3, nchar(digits))
#Month: drop the digits, then capitalize the first letter
x3 <- gsub('[[:digit:]]+', '', date)
x3 <- paste0(toupper(substr(x3, 1, 1)), substr(x3, 2, nchar(x3)))
#Now concatenate
xf <- paste0(x1,'-',x3,'-',x2)
Output:
[1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"
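Because every string here has the same layout (two digits, three letters, four digits), the same pieces can also be taken by position. A minimal sketch under that assumption; it would not suit inputs of varying length:

```r
date <- c("01jan2020", "04mar2020", "20dec2020")

# Assumes a fixed layout: characters 1-2 = day, 3-5 = month, 6-9 = year
day   <- substr(date, 1, 2)
month <- paste0(toupper(substr(date, 3, 3)), substr(date, 4, 5))
year  <- substr(date, 6, 9)

result <- paste(day, month, year, sep = "-")
result
# [1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"
```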

You can convert the character object to Date and change its format.
format(as.Date(date, "%d%b%Y"), "%d-%b-%Y")
# [1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"
The month abbreviation produced by %b is capitalized (in an English locale). You can also use dmy() from lubridate or anydate() from anytime to parse the strings into Date objects.
format(lubridate::dmy(date), "%d-%b-%Y")
format(anytime::anydate(date), "%d-%b-%Y")
Another option with stringr package:
library(stringr)
str_replace(date, "[a-z]+", function(x) sprintf("-%s-", str_to_title(x)))
# [1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"
or
str_replace(date, "[a-z]+", function(x) str_pad(str_to_title(x), 5, "both", "-"))
# [1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"
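The date-parsing answers rely on %b, whose input and output depend on the session locale. If that is a concern, a pure string approach with base R's sub() and perl = TRUE might look like the following sketch, in which \\U and \\E switch upper-casing on and off inside the replacement. It assumes lower-case three-letter months, as in the question:

```r
date <- c("01jan2020", "04mar2020", "20dec2020")

# Groups: (day)(first month letter)(rest of month)(year)
pattern <- "^(\\d{2})([a-z])([a-z]{2})(\\d{4})$"

# \\U upper-cases what follows until \\E; this only works with perl = TRUE
result <- sub(pattern, "\\1-\\U\\2\\E\\3-\\4", date, perl = TRUE)
result
# [1] "01-Jan-2020" "04-Mar-2020" "20-Dec-2020"
```

Strings that do not match the pattern are returned unchanged, so malformed input is easy to spot afterwards.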

Related

String Split in R Studio

Can anyone help me as I am trying to split the date from a string and the word "football" from the date in R?
Before 30/8/2020football
After 30/8/2020 in a date format and "football" as a string
Thanks
Alan
Here is one way, based on the information you have provided:
string <- '30/8/2020football'
date <- sub('(\\d+\\d+\\d+).*', '\\1', string)
remaining_string <- sub('.*\\d+(.*)', '\\1', string)
remaining_string
#[1] "football"
date <- as.Date(date, '%d/%m/%Y')
date
#[1] "2020-08-30"
Data:
v <- '30/8/2020football'
Solution:
df <- data.frame(Date = format(as.Date(unlist(strsplit(sub('([0-9/]+)(football).*', '\\1 \\2', v), " "))[1], "%d/%m/%Y")),
String = unlist(strsplit(sub('([0-9/]+)(football).*', '\\1 \\2', v), " "))[2])
Result:
df
        Date   String
1 2020-08-30 football
Or, if you prefer a more transparent procedure:
First split the vector:
v_split <- unlist(strsplit(sub('([0-9/]+)(football).*', '\\1 \\2', v), " "))
Then set up the dataframe:
df <- data.frame(
Date = format(as.Date(v_split [1], "%d/%m/%Y")),
String = v_split [2])
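The answers above hard-code the word "football" or depend on greedy matching. A slightly more general sketch, assuming the string always starts with a d/m/yyyy date followed by arbitrary text, captures both pieces in one pass with regexec() and regmatches() (the variable names are illustrative):

```r
string <- "30/8/2020football"

# Group 1: d/m/yyyy at the start; group 2: whatever follows
pattern <- "^([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})(.*)$"
parts <- regmatches(string, regexec(pattern, string))[[1]]

# parts[1] is the full match; the capture groups follow
date_part <- as.Date(parts[2], format = "%d/%m/%Y")
text_part <- parts[3]

date_part
# [1] "2020-08-30"
text_part
# [1] "football"
```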

How to modify date values?

How can I modify raw date values? For example, from this:
> DF2
Date
1 11012018
2 7312014
3 6102015
4 10202017
into date values separated by "/":
> DF2
Date
1 11/01/2018
2 7/31/2014
3 6/10/2015
4 10/20/2017
Use lubridate for all date and time related tasks
> lubridate::mdy(c("11012018", "7/31/2014"))
[1] "2018-11-01" "2014-07-31"
You can also format it if needed:
format(lubridate::mdy(c("11012018", "7/31/2014")), "%m/%d/%Y")
[1] "11/01/2018" "07/31/2014"
This assumes your dates are in month-day-year order; otherwise use one of the other lubridate parsers, such as dmy() or ymd().
We could also use a regular expression. (This assumes you just need to insert a separator; you can still convert the result to a date type afterwards.)
new<-gsub("([0-9]{,2})([0-9]{2})([0-9]{4})","\\1 \\2 \\3",df$Date)
gsub(" ","/",new)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
Edit:
More generally, as suggested by @jay.sf:
test4<-gsub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)","\\1 \\2 \\3",df$Date)
gsub(" ","/",test4)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
This is to account for such date formats as:
test3<-c("11012018", "1112015", "7312014", "7312014", "10202017", "772007", "772007",
"7072007")
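One possible solution, if the values are in day-month-year order:
df <- transform(df, V1 = as.Date(as.character(V1), "%d%m%Y"))
And another, which parses the values in month-day-year order as in the question:
df <- data.frame(lapply(df, function(x) as.Date(as.character(x), "%m%d%Y")))
Both solutions use only base R. Note that as.Date() returns Date objects, which print as yyyy-mm-dd; wrap the result in format(x, "%m/%d/%Y") to get mm/dd/yyyy strings.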
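If the column holds whole numbers, as the printout in the question suggests, the only digit that can go missing is the leading zero of the month. A hedged sketch under that assumption (the vector raw is made up from the question's values) pads each value to eight digits before parsing:

```r
raw <- c(11012018L, 7312014L, 6102015L, 10202017L)

# Restore the leading zero that single-digit months lose as integers
padded <- sprintf("%08d", raw)

# Parse as month-day-year, then render as mm/dd/yyyy text
parsed <- as.Date(padded, format = "%m%d%Y")
formatted <- format(parsed, "%m/%d/%Y")
formatted
# [1] "11/01/2018" "07/31/2014" "06/10/2015" "10/20/2017"
```

Unlike the regex answers, this keeps the zero-padded month ("07/31/2014" rather than "7/31/2014"), and it will not help if the day can also be a single digit.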

Adding a new column with month extracted from a separate already existing "date" (mdy) column

Trying to add a new column in my data table denoting the month (either as a numeric value or character) using an already available column of "SetDate", which is in the format mdy.
I'm new to R and having trouble. Thank you
base solution:
f = "%m/%d/%y" # note the lowercase y; it's because the year is 92, not 1992
dataset$SetDateMonth <- format(as.POSIXct(dataset$SetDate, format = f), "%m")
This converts the column from character (the presumed class) to POSIXct, which makes it easy to extract the month.
Quick test:
format(as.POSIXct('1/1/92', format = "%m/%d/%y"), "%m")
[1] "01"
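With %y the century has to be inferred. In current versions of R, ?strptime documents that two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx, which this short check illustrates (the sample strings are made up):

```r
f <- "%m/%d/%y"

# Same day and month, three different two-digit years
years <- format(as.Date(c("1/1/92", "1/1/69", "1/1/68"), format = f), "%Y")
years
# [1] "1992" "1969" "2068"
```

If the data can contain years on both sides of that cutoff, it is worth checking the parsed result before extracting the month.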
Try this (created a small example):
library(lubridate)
library(magrittr) # provides the %>% pipe used below
date_example <- "1/1/92"
lubridate::mdy(date_example)
[1] "1992-01-01"
lubridate::mdy(date_example) %>% lubridate::month()
[1] 1
If you want full month as character string, use:
lubridate::mdy(date_example) %>% lubridate::month(label = TRUE, abbr = FALSE)
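Month labels produced by date formatting generally follow the session locale. If English names are wanted regardless of locale, one option is to index base R's built-in month.name constant. A sketch, with a made-up set_date vector standing in for the real SetDate column:

```r
set_date <- c("1/1/92", "12/25/91")  # hypothetical sample values

d <- as.Date(set_date, format = "%m/%d/%y")

month_num  <- as.integer(format(d, "%m"))  # numeric month
month_name <- month.name[month_num]        # English month name

month_num
# [1]  1 12
month_name
# [1] "January"  "December"
```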

convert Single digit day in R

I have dates in the format Apr42016, Aug12017, Apr112018. I am trying to convert them to Y/m/d using R. I have tried the code below, but when the day is a single digit it returns NA. Could anyone help me, please?
strptime(data$date, "%b%e%Y")
as.Date (data$date, format="%b%d%Y")
as.POSIXct(data$date, format="%b%e%Y")
Thank you
You can modify the strings with sub (and add a 0 if necessary) before using as.Date:
myvec <- c("Apr42016", "Aug12017", "Apr112018") # the data
myvec2 <- sub("(?<=[A-Za-z])(?=[0-9]{5}$)", "0", myvec, perl = TRUE) # pad only when exactly 5 digits follow the month
# [1] "Apr042016" "Aug012017" "Apr112018"
as.Date(myvec2, format = "%b%d%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
If you can break up the numbers before as.Date, it will make things much easier. (Borrowing Sven's look-behind.)
sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2",
c("Apr42016", "Aug12017", "Apr112018"), perl=TRUE)
# [1] "Apr-4-2016" "Aug-1-2017" "Apr-11-2018"
From here, the format should be rather straightforward:
as.Date(sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", c("Apr42016", "Aug12017", "Apr112018"), perl = TRUE),
format="%b-%d-%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
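Since %b matches month abbreviations in the current locale, parsing "Apr" or "Aug" can fail on a non-English system. A locale-independent sketch, assuming each string is a three-letter English abbreviation, a one- or two-digit day and a four-digit year, looks the month up in month.abb instead:

```r
x <- c("Apr42016", "Aug12017", "Apr112018")

# Capture month, day and year; the year is always the last four digits
parts <- regmatches(x, regexec("^([A-Za-z]{3})(\\d{1,2})(\\d{4})$", x))
m <- do.call(rbind, parts)  # columns: full match, month, day, year

# Build ISO yyyy-mm-dd strings, which as.Date() reads without a format
iso <- sprintf("%s-%02d-%02d", m[, 4], match(m[, 2], month.abb), as.integer(m[, 3]))
dates <- as.Date(iso)
dates
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
```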

Process Date Regex Capturing Groups outputs in R

I'm trying to coerce dates from two formats into a single one that I can easily feed into as.Date. Here's a sample:
library(dplyr)
df <- tibble(date = c("Mar 29 2017 9:30AM", "5/4/2016")) # tibble() replaces the deprecated data_frame()
I've tried this:
df %>%
mutate(date = gsub("([A-z]{3}) (\\d{2}) (\\d{4}).*",
paste0(which(month.abb == "\\1"),"/\\2","/\\3"), date))
But it gave me this:
date
1 /29/2017
2 5/4/2016
but I want this!
date
1 3/29/2017
2 5/4/2016
It looks like month.abb == "\\1" doesn't see the capturing group's output ("Mar"); it just compares against the literal text "\\1". I want to do this in regex if possible. I know you can do it another way but want to be slick.
Any ideas?
Here is one way with gsubfn
library(gsubfn)
df$date <- gsubfn("^([A-Za-z]{3})\\s+(\\d{2})\\s+(\\d{4}).*", function(x, y, z)
paste(match(x, month.abb),y, z, sep="/"), df$date)
df$date
#[1] "3/29/2017" "5/4/2016"
Or sub in combination with gsubfn
sub("(\\S+)\\s+(\\S+)\\s+(\\S+).*", "\\1/\\2/\\3",
gsubfn("^([A-Za-z]{3})", setNames(as.list(1:12), month.abb), df$date))
#[1] "3/29/2017" "5/4/2016"
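The original attempt fails because R evaluates the replacement argument before gsub() runs, so month.abb is compared with the literal string "\\1" and nothing matches. A base R sketch that avoids gsubfn by extracting the groups first (the names pattern, hits and parts are illustrative):

```r
date <- c("Mar 29 2017 9:30AM", "5/4/2016")

# The replacement is built once, before any matching happens
paste0(which(month.abb == "\\1"), "/\\2", "/\\3")
# [1] "/\\2/\\3"

pattern <- "^([A-Za-z]{3}) (\\d{2}) (\\d{4}).*$"
hits <- grepl(pattern, date)
parts <- regmatches(date[hits], regexec(pattern, date[hits]))

# Each p is c(full match, month, day, year)
date[hits] <- vapply(
  parts,
  function(p) paste(match(p[2], month.abb), p[3], p[4], sep = "/"),
  character(1)
)
date
# [1] "3/29/2017" "5/4/2016"
```

Entries that do not match the pattern, such as "5/4/2016", are left untouched.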
