Converting the date format in R - r

I am trying to convert a date format in R and have run into the following problem. My original date is
"Dec-2011"
and I want it to become
2011-12
So I tried
as.Date("Dec-2011",format = "%b-%Y")
which produces NA.
After some searching, I found that if you type
as.Date(gsub("^", "01-", "Dec-2011"), format="%d-%b-%Y")
it gives you
2011-12-01
I understand what gsub does here: it replaces every "^" in "Dec-2011" with "01-". However, as you can see, "Dec-2011" contains no "^", and I am wondering whether the replacement should be "01-" or "-01". I am a little puzzled. What does gsub really do here? And how should I perform date format conversion in R?

The ^ is a regex metacharacter that matches the start of a string, not a literal "^" character. In the gsub example, the pattern specifies the position where the replacement ("01-") is inserted, so it is inserted only at the start. If we remove the ^, the empty pattern matches at every position in the string, and "01-" is inserted at each of them:
gsub("", "01-", "Dec-2011")
#[1] "01-D01-e01-c01--01-201-001-101-101-"
gsub("^", "01-", "Dec-2011")
#[1] "01-Dec-2011"
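gsub stands for global substitution. Because only one insertion is needed here, sub also works, even with an empty pattern, since it stops after the first match (the empty match at the start of the string):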
sub("", "01-", "Dec-2011")
#[1] "01-Dec-2011"
For this case, it may be easier to use paste
paste0("01-", "Dec-2011")
#[1] "01-Dec-2011"
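Putting the pieces together, a minimal sketch of the full conversion to "2011-12" might look like the following. as.Date() returns NA for "Dec-2011" on its own because a Date needs a day, so a placeholder day is added for parsing and dropped again when formatting. Note that %b is matched against the month abbreviations of the current locale. The second variant, built on the month.abb constant, is an addition for illustration and assumes English abbreviations and the fixed "Mon-YYYY" layout.

```r
x <- "Dec-2011"

# as.Date() needs a day, so prepend a placeholder day before parsing.
# %b depends on the current locale's month abbreviations.
d <- as.Date(paste0("01-", x), format = "%d-%b-%Y")

# Drop the placeholder day again when formatting.
ym <- format(d, "%Y-%m")

# Locale-independent alternative (assumes English abbreviations, "Mon-YYYY" layout):
# look up the month number in the built-in month.abb constant.
ym_alt <- sprintf("%s-%02d", substr(x, 5, 8), match(substr(x, 1, 3), month.abb))
ym_alt
# [1] "2011-12"
```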

Related

R turn 6 digit number into HMS i.e. "130041" into "13:00:41"

As the question states, I want to turn "130041" into "13:00:41" i.e. HMS data
lubridate::ymd("20220413") works without problems, but lubridate::hms("130041") does not.
I assume there should be a reasonably simple solution?!
Thank you.
If you need the output as a lubridate Period object rather than a character vector, because you need to perform operations on it, you can use the approach suggested by Tim Biegeleisen: add colon separators to the character vector and then pass the result to lubridate::hms():
x <- "130041"
gsub("(\\d{2})(?!$)", "\\1:", x, perl = TRUE) |>
lubridate::hms()
# [1] "13H 0M 41S"
The output is similar, but it is a Period object. I also used a slightly different regex (add a colon after every pair of digits that is not followed by the end of the string), but it is fundamentally the same approach.
You could use sub here:
x <- "130041"
output <- sub("(\\d{2})(\\d{2})(\\d{2})", "\\1:\\2:\\3", x)
output
[1] "13:00:41"
Another regex, which also works when the hour has only one digit:
gsub("(?=(..){1,2}$)", ":", c("130041", "30041"), perl=TRUE)
#[1] "13:00:41" "3:00:41"
The result can then be passed to, e.g., lubridate::hms or hms::as_hms.
base::as.difftime also accepts a format: as.difftime("130041", "%H%M%S")
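If a two-digit hour is wanted in the result even for five-digit inputs such as "30041", one option is to left-pad before inserting the colons. This is only a sketch that combines the sub() pattern shown above with sprintf(); it assumes every input is a string of at most six digits.

```r
x <- c("130041", "30041")

# Left-pad to six digits so the hour always has two digits.
padded <- sprintf("%06d", as.integer(x))

# Insert colons between the three two-digit groups.
out <- sub("(\\d{2})(\\d{2})(\\d{2})", "\\1:\\2:\\3", padded)
out
# [1] "13:00:41" "03:00:41"
```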

Replacing string variable with punctuation in R without removing other string

In R, I am having trouble replacing a substring that contains punctuation. That is, within the string "r.Export", I am trying to replace "r." with "Report.". I've used gsub; my code is below:
string <- "r.Export"
short <- "r."
replacement <- "Report."
gsub(short,replacement,string)
The desired output is "Report.Export"; however, gsub also replaces the second r, so the output is:
Report.ExpoReport.
Using sub() instead is not a solution either because I am doing multiple gsubs where sometimes the string to be replaced is:
short <- "o."
So, then the o's in r.Export are replaced anyway and it becomes a complete mess.
string <- "r.Export"
short <- "r\\."
replacement <- "Report."
gsub(short,replacement,string)
Returns:
[1] "Report.Export"
Or, using fixed=TRUE:
string <- "r.Export"
short <- "r."
replacement <- "Report."
gsub(short,replacement,string, fixed=TRUE)
Returns:
[1] "Report.Export"
Explanation: Without the fixed=TRUE argument, gsub expects a regular expression as its first argument, and in regular expressions . is a placeholder for 'any character'. If you want a literal . (period), you have to either escape it as \\. or use the aforementioned argument fixed=TRUE.
Since your pattern contains a character (.) that has a special meaning in regex, use fixed = TRUE, which matches the string as is.
gsub(short,replacement,string, fixed = TRUE)
#[1] "Report.Export"
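Since the question mentions several replacements in a row, here is a rough sketch of how fixed = TRUE could be applied over a lookup table. The table itself is hypothetical (only "r." and "Report." come from the question; "o." mapping to "Output." is made up for illustration), and the replacements are applied in order, so an earlier replacement could in principle produce text that a later one matches.

```r
strings <- c("r.Export", "o.Import")

# Hypothetical lookup table: names are the short forms, values the replacements.
lookup <- c("r." = "Report.", "o." = "Output.")

# Apply each replacement literally (no regex interpretation of ".").
for (short in names(lookup)) {
  strings <- gsub(short, lookup[[short]], strings, fixed = TRUE)
}
strings
# [1] "Report.Export" "Output.Import"
```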
I might actually add word boundaries and lookaheads to the mix here, to ensure as targeted a match as possible:
string <- "r.Export"
replacement <- "Report."
output <- gsub("\\br\\.(?=\\w)", replacement, string, perl=TRUE)
output
[1] "Report.Export"
This approach ensures that we only match r. when the r starts a word (it is preceded by a non-word character such as whitespace, or sits at the start of the string) and when what follows the dot is a word character. Consider the sentence "The project r.Export needed a programmer." We wouldn't want to replace the final r. in this case.
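The difference shows up when the example sentence is run through both variants. The variable names below are only for illustration.

```r
sentence <- "The project r.Export needed a programmer."

# Literal match only: also hits the "r." that ends the sentence.
literal <- gsub("r.", "Report.", sentence, fixed = TRUE)
literal
# [1] "The project Report.Export needed a programmeReport."

# Word boundary plus lookahead: only an "r." that starts a word
# and is followed by a word character is replaced.
targeted <- gsub("\\br\\.(?=\\w)", "Report.", sentence, perl = TRUE)
targeted
# [1] "The project Report.Export needed a programmer."
```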
We can use sub
sub(short,replacement,string, fixed = TRUE)
#[1] "Report.Export"

What is the best way in R to identify the first character in a string?

I am trying to find a way to process some data in R that contains both digits and letters: for each value, find the first letter and return everything from that letter onward. For example:
column
000HU89
87YU899
902JUK8
result
HU89
YU899
JUK8
I have tried stringr's str_detect and grepl, but the first letter is by nature unknown, so I am having difficulty pulling it out.
We could use str_extract
stringr::str_extract(x, "[A-Z].*")
#[1] "HU89" "YU899" "JUK8"
data
x <- c("000HU89", "87YU899", "902JUK8")
Ronak's answer is simple.
Though I would also like to provide another method:
column <- c("000HU89", "87YU899", "902JUK8")
# Find the position of the first non-digit character in each element
loc <- regexpr("[^[:digit:]]", column)
# Extract everything from that character to the right
substring(column, loc, last = 1000000L)
We can use sub from base R to match one or more digits (\\d+) at the start (^) of the string and replace with blank ("")
sub("^\\d+", "", x)
#[1] "HU89" "YU899" "JUK8"
data
x <- c("000HU89", "87YU899", "902JUK8")
In base R we can do
x <- c("000HU89", "87YU899", "902JUK8")
regmatches(x, regexpr("\\D.+", x))
# [1] "HU89" "YU899" "JUK8"
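The base R variants above behave differently when a value contains no letter at all, which may matter when the result has to line up with the original column. A small sketch, with an extra all-digit value ("12345") added purely for illustration:

```r
x <- c("000HU89", "87YU899", "902JUK8", "12345")

# sub() returns one result per input; an all-digit string becomes "".
a <- sub("^\\d+", "", x)
a
# [1] "HU89"  "YU899" "JUK8"  ""

# regmatches() returns only the matches, so the fourth element is dropped.
b <- regmatches(x, regexpr("\\D.+", x))
b
# [1] "HU89"  "YU899" "JUK8"
```

Because sub() keeps the length of its input, it is the safer choice for assigning the result back into a data frame column.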

Replace dashes with colon in part of string

I have a dataframe with date and time values as characters, as extracted from a database. Currently the date/time looks like this: 2017-03-17 11-56-20
I want it to look like this: 2017-03-17 11:56:20
It doesn't seem to be as simple as replacing all the dashes using gsub as in R: How to replace . in a string?
I'm thinking it has something to do with the positioning, like telling R to only look after the space. Ideas?
Since your string represents a date-time, you can parse it with strptime and convert the result back to character:
x <- "2017-03-17 11-56-20"
as.character(strptime(x, "%Y-%m-%d %H-%M-%S", tz = ""))
# [1] "2017-03-17 11:56:20"
Try matching the following pattern:
(\\d\\d)-(\\d\\d)-(\\d\\d)$
and then replace that with:
\\1:\\2:\\3
This will match your timestamp exclusively, because of the terminal anchor $ at the end of the pattern. Then, we rebuild the timestamp the way you want using colons and the three capture groups.
gsub("(\\d\\d)-(\\d\\d)-(\\d\\d)$", "\\1:\\2:\\3", x)
[1] "2017-03-17 11:56:20"
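The idea from the question of only looking after the space can also be written without a regex. The following is a sketch, assuming every value consists of exactly one date part and one time part separated by a single space. The second input value is made up for illustration.

```r
x <- c("2017-03-17 11-56-20", "2018-12-01 00-05-09")

# Split each value into its date part and time part at the space.
parts <- strsplit(x, " ", fixed = TRUE)

# Translate "-" to ":" in the time part only, then glue the parts back together.
out <- vapply(
  parts,
  function(p) paste(p[1], chartr("-", ":", p[2])),
  character(1)
)
out
# [1] "2017-03-17 11:56:20" "2018-12-01 00:05:09"
```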
You can also use the anytime package to take care of the formatting for you (it also coerces the result to POSIXct):
library(anytime)
anytime(x)
# [1] "2017-03-17 11:56:20 AEDT"
as.character(anytime(x))
# [1] "2017-03-17 11:56:20"

Converting to a standard as.Date date-time then back again, without leading zeros

Is it possible to convert from R's default date format to a user-defined format ("m/d/yyyy" here) and avoid getting leading zeros in the resulting date?
In the example below I want to have date_2 look just like date_1. Is there a way to do this with format or another function (ideally in one line of code), or will I need to resort to gsub to find and remove the leading zeros in front of the month ("09") and day ("05") in date_2?
I looked in documentation on DateTimeClasses, strptime, POSIXct, and format, but didn't come across an answer.
date_1 <- "9/5/2008"
date_num <- as.Date(date_1, "%m/%d/%Y")
date_num
[1] "2008-09-05"
date_2<-format(date_num,"%m/%d/%Y")
date_2
[1] "09/05/2008"
We can use gsub
gsub('(?<=\\/)0|^0', '', date_2, perl=TRUE)
#[1] "9/5/2008"
Or another version is
gsub('0(?=[1-9]\\/)', '', date_2, perl=TRUE)
#[1] "9/5/2008"
We could also convert to the POSIXlt class, extract the components, and paste them together.
v1 <- as.POSIXlt(date_num)
paste(v1$mon+1L, v1$mday, v1$year+1900, sep='/')
#[1] "9/5/2008"
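The POSIXlt approach is vectorized, so it can be applied to several dates at once. Below is a minimal sketch using sprintf() instead of paste(). The example dates other than 2008-09-05 are made up.

```r
dates <- as.Date(c("2008-09-05", "2015-10-09", "2021-12-25"))
lt <- as.POSIXlt(dates)

# mon is zero-based and year counts from 1900, so shift both.
# %d prints the numbers without leading zeros.
out <- sprintf("%d/%d/%d", lt$mon + 1L, lt$mday, lt$year + 1900L)
out
# [1] "9/5/2008"   "10/9/2015"  "12/25/2021"
```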
Here is a way to go straight from date_num to your desired result, using the chron package.
paste(chron::month.day.year(date_num), collapse = "/")
# [1] "9/5/2008"
This also works nicely for multiple dates. The code is slightly different as we need do.call() here.
do.call(paste, c(chron::month.day.year(Sys.Date()-0:3), sep = "/"))
# [1] "10/9/2015" "10/8/2015" "10/7/2015" "10/6/2015"
