Extract string with digits and special characters in R

I have a list of filenames in the format "filename PID00-00-00" or just "PID00-00-00".
I want to extract part of the filename to create an ID column.
I am currently using this code for the string extraction:
names(df) <- stringr::str_extract(names(df), "(?<=PID)\\d+")
binded1 <- rbindlist(df, idcol = "ID") %>%
  as.data.frame()
This gives the ID as the first set of digits after PID, e.g. filename PID1234-00-01 becomes ID 1234.
I also want to extract the first hyphen and the digits that follow it, so from filename PID1234-00-01 I want 1234-00.
What should my regex be?

Try this, assuming the ID always has four digits followed by a hyphen and two digits:
stringr::str_extract(names(df),"(?<=PID)\\d{4}-\\d{2}")
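If the number of digits varies (the question also mentions names like PID00-00-00), a more permissive pattern may be safer. The following is a minimal sketch in base R, assuming the ID is always one or more digits, a hyphen, and one or more digits directly after PID; the vector x is made-up sample data.

```r
# Made-up sample names covering both formats from the question
x <- c("filename PID1234-00-01", "PID00-00-00")

# (?<=PID) is a lookbehind, so perl = TRUE is required in base R
m <- regexpr("(?<=PID)\\d+-\\d+", x, perl = TRUE)

# regmatches() returns only the matched substrings (non-matching names are dropped)
ids <- regmatches(x, m)
ids
```

With stringr, the same pattern can be passed to str_extract(), which returns NA for names without a match instead of dropping them.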

Related

How to rename values from list based on information provided in the list name

I have 4 values in the list
c("JSMITH_WWWFRecvd2001_asof_20220901.xlsx", "WSMITH_AMEXRecvd2002_asof_20220901.xlsx",
"PSMITH_WWWFRecvd2003_asof_20220901.xlsx", "QSMITH_AMEXRecvd2004_asof_20220901.xlsx")
I would like my outcome to be
"wwwf_01","amex_02","wwwf_03","amex_04"
You can use sub:
tolower(sub('.+_(.+)Recvd[0-9][0-9](..).+', '\\1_\\2', x))
Something like this would work. You can extract the string you want with str_extract(), make it lower case with tolower(), and paste the formatted counter onto the end of the string with "_" as the separator.
paste(tolower(stringr::str_extract(x,"WWWF|AMEX" )), sprintf("%02d",seq_along(x)), sep = "_")
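Note that the two answers are not equivalent: sub() takes the number from the file name, while seq_along() numbers the elements by their position in the vector. A small sketch in base R, with hypothetical input y invented for illustration, where the two disagree:

```r
# Hypothetical input whose embedded numbers are not 01, 02, ...
y <- c("JSMITH_WWWFRecvd2007_asof_20220901.xlsx",
       "WSMITH_AMEXRecvd2012_asof_20220901.xlsx")

# Number taken from the file name (the two digits after "Recvd20")
from_name <- tolower(sub(".+_(.+)Recvd[0-9][0-9](..).+", "\\1_\\2", y))

# Number taken from the position of each element in the vector
label <- regmatches(y, regexpr("WWWF|AMEX", y))
from_position <- paste(tolower(label), sprintf("%02d", seq_along(y)), sep = "_")

from_name      # "wwwf_07" "amex_12"
from_position  # "wwwf_01" "amex_02"
```

Which one is appropriate depends on whether the suffix is meant to reflect the file name or simply the order of the list.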

Add string if the pattern matched using Regex

I have a data table and I want to prepend a string (FirstWord!) to the values of one column if the pattern (letter, digits, colon, letter(s), digits) matches, as shown below:
ColName
New test defiend
G54:Y23 (matched)
test:New
The expected results would be
New test defiend
FirstWord!G54:Y23
test:New
dt[, ColName := ColName %>% str_replace('(?<=\d)\:(?=[[:upper:]])',
paste0("'FirstWord!'",.))]
I don't know how to add "FirstWord!" when I find the pattern in ColName.
Using transform with sub and a backreference:
transform(df, ColName = sub("([A-Z][0-9]+:[A-Z]+[0-9]+)", 'FirstWord!\\1', ColName))
ColName
1 New test defiend
2 FirstWord!G54:Y23
3 test:New
You can use sub and backreference:
sub("([A-Z]+\\d+:[A-Z]+)", "FirstWord!\\1", txt)
[1] "ColName" "New test defiend" "FirstWord!G54:Y23 (matched)" "test:New"
Here, we wrap the pattern (upper-case letter(s), digit(s), a colon, upper-case letter(s)) in a capturing group so that we can refer to it with the backreference \\1 in sub's replacement argument; there, we also add the desired "FirstWord!" string.
If the pattern should not be case-sensitive, just add (?i) to the front of the pattern.
Data:
txt <- c("ColName","New test defiend","G54:Y23 (matched)","test:New")
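For case-insensitive matching, base R's sub() also accepts ignore.case = TRUE. A minimal sketch, assuming the pattern should also require the digits after the colon; txt2 is made-up data extending the example above:

```r
# Made-up data: the original rows plus a lower-case variant
txt2 <- c("New test defiend", "G54:Y23 (matched)", "g54:y23", "test:New")

# Letters, digits, a colon, letters, digits; case is ignored
res <- sub("([A-Z]+[0-9]+:[A-Z]+[0-9]+)", "FirstWord!\\1", txt2,
           ignore.case = TRUE)
res
```

Rows without digits on both sides of the colon, such as "test:New", are left unchanged.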

Insert a character as an item delimiter into string in R

In R, I have a single string with multiple entries such as:
mydata <- c("(first data entry) (second data entry) (third data entry) ")
I want to insert the pipe symbol "|" between the entries as an item delimiter, ending up with the following list:
"(first data entry)|(second data entry)|(third data entry)"
Not all rows of mydata contain the same number of entries. If mydata contains 0 or just 1 entry, then no "|" pipe symbol is required.
I've tried the following without success:
newdata <- paste(mydata, collapse = "|")
Thanks for your help!
You do not need a regex if the entries are consistently separated by exactly one space, i.e. the text between them is always ) followed by one space and (.
You can simply use
gsub(") (", ")|(", mydata, fixed=TRUE)
If your strings contain a variable number of spaces, tabs, etc., you can use any of the following:
gsub("\\)\\s*\\(", ")|(", mydata)
gsub("\\)[[:space:]]*\\(", ")|(", mydata)
stringr::str_replace_all(mydata, "\\)\\s*\\(", ")|(")
Here, the \)\s*\( pattern matches a ) (escaped because ) is a special regex metacharacter), then zero or more whitespace characters, and then a (.
If there is always at least one whitespace character between the parentheses, use \s+ instead of \s*.
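This will replace the single space between entries with "|" in your string. Note that fixed = TRUE is needed, because ") (" is not a valid regular expression. If you need more complex rules, use a regex with gsub.
gsub(") (", ")|(", yourString, fixed = TRUE)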
I think you could use the following solution too:
gsub("(?<=\\))\\s+(?=\\()", "|", mydata, perl = TRUE)
[1] "(first data entry)|(second data entry)|(third data entry) "
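As the output above shows, the trailing space after the last entry is kept, whereas the desired result in the question has none. A minimal sketch that also trims it with trimws(); the second and third elements are made-up rows covering the one-entry and zero-entry cases mentioned in the question:

```r
mydata <- c("(first data entry) (second data entry) (third data entry) ",
            "(only one entry) ",  # made-up row with a single entry
            "")                   # made-up row with no entries

# Insert "|" between adjacent entries, then strip leading/trailing whitespace
newdata <- trimws(gsub("\\)\\s*\\(", ")|(", mydata))
newdata
```

Rows with zero or one entry contain no ") (" sequence, so no pipe is inserted for them.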

replace element of string 3 - 10 positions after a pattern

Ideally, in base R I need some kind of string manipulation that will let me detect a pattern and change the string 3 positions after the pattern.
example <- "when string says SOMETHING = #c792ea"
desired output:
when string says SOMETHING = #001628
I have tried gsub but I am not sure how I can get it to replace the characters after a pattern.
If it is based on the position of the character, then we can use substring assignment:
substring(example, 30) <- "#001628"
example
#[1] "when string says SOMETHING = #001628"
Or, if we need to find the position of the word that starts with #:
library(stringr)
posvec <- c(str_locate(example, "#\\w+"))
substring(example, posvec[1], posvec[2]) <- "#001628"
# or, with stringr's str_sub:
# str_sub(example, posvec[1], posvec[2]) <- "#001628"
Another option is sub, which replaces everything from the = (followed by zero or more whitespace characters, \\s*) to the end of the string:
sub("=\\s*.*", "= #001628", example)
#[1] "when string says SOMETHING = #001628"
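If the string may continue after the colour code, replacing everything up to the end of the string could remove too much. A minimal sketch in base R that keeps the anchor text through a capturing group and replaces only the value after it, assuming the value is always a six-digit hex colour:

```r
example <- "when string says SOMETHING = #c792ea"

# \\1 puts back the captured "SOMETHING = "; only the hex colour is replaced
result <- sub("(SOMETHING = )#[0-9A-Fa-f]{6}", "\\1#001628", example)
result
```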

Get a list of files with exceptions using pattern

Using only the pattern argument of the list.files() function, how could I get a file list that excludes some files with a similar pattern?
Let's say I have these files in my working directory:
med_t_1_1.csv, 01_t_1_1.csv, 02_t_1_1.csv, 03_t_1_1.csv,
med_t_2_1.csv, 01_t_2_1.csv, 02_t_2_1.csv, 03_t_2_1.csv
I want to get the files with the pattern t_1_1, except the one that starts with med:
01_t_1_1.csv, 02_t_1_1.csv, 03_t_1_1.csv
file_chrs <- c("med_t_1_1.csv", "01_t_1_1.csv", "02_t_1_1.csv", "03_t_1_1.csv",
"med_t_2_1.csv", "01_t_2_1.csv", "02_t_2_1.csv", "03_t_2_1.csv")
file_chrs[grepl("\\d_t_1_1", file_chrs)] # \\d matches any digit [0-9]
# console
[1] "01_t_1_1.csv" "02_t_1_1.csv" "03_t_1_1.csv"
# so in your working directory
list.files( pattern = "\\d_t_1_1" )
You can use a regular expression with a negative lookbehind (here S is the character vector of file names):
S[grepl("(?<!med_)t_1_1", S, perl=TRUE)]
# "01_t_1_1.csv" "02_t_1_1.csv" "03_t_1_1.csv"
Explanation of the regex:
(?<!med_) is a negative lookbehind: it asserts that the text immediately before the current position is not med_.
t_1_1 matches the literal string t_1_1.
Together, the pattern matches t_1_1 only when it is not directly preceded by med_.
Another example:
S1 <- c("med_t_1_1.csv", "S_t_1_1.csv", "04_t_1_1.csv")
S1[grepl("(?<!med_)t_1_1", S1, perl=TRUE)]
# "S_t_1_1.csv" "04_t_1_1.csv"
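As far as I know, list.files() has no perl argument, so the lookbehind pattern cannot be passed to it directly. The sketch below compares two options under that assumption: an anchored pattern given straight to list.files(), and listing everything first and then filtering with grepl(). The scratch directory and the file names in it are created only for illustration.

```r
# Hypothetical scratch directory so the example does not touch real files
demo_dir <- tempfile("pattern_demo")
dir.create(demo_dir)
files <- c("med_t_1_1.csv", "01_t_1_1.csv", "02_t_1_1.csv", "03_t_1_1.csv",
           "med_t_2_1.csv", "01_t_2_1.csv")
invisible(file.create(file.path(demo_dir, files)))

# Option 1: pattern argument only, anchored so the name must start with digits
by_pattern <- list.files(demo_dir, pattern = "^[0-9]+_t_1_1\\.csv$")

# Option 2: list everything, then filter with the negative lookbehind
all_files <- list.files(demo_dir)
by_lookbehind <- all_files[grepl("(?<!med_)t_1_1", all_files, perl = TRUE)]

by_pattern
by_lookbehind
```

Option 1 answers the question as asked (pattern only) but assumes the wanted files start with digits; Option 2 excludes only the med_ prefix and keeps any other prefix.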
