This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
How can I remove text within parentheses with a regex?
(9 answers)
Closed 3 years ago.
I need to remove a single closing parenthesis from a string to fix an edge case in a simpler regex problem.
I need to remove text within parentheses, but the solution I am currently using doesn't handle an extra, unmatched closing parenthesis well. Should I use a different approach, or can I add an extra step to handle this case?
Below is an example where every call should return "brother". I have marked the line that fails.
cleaner = function(x){
  x = tolower(x)
  ## if terms are in brackets - assume this is an alternative and remove
  x = stringr::str_remove_all(x, "\\(.*\\)")
  ## if terms are separated by semicolons or commas, take the first, assume others are alternatives and remove
  x = gsub("^(.*?)(,|;).*", "\\1", x)
  ## remove whitespace
  x = stringi::stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")
  x
}
cleaner("brother(bro)")
cleaner("brother;bro")
cleaner("bro ther")
cleaner("(bro)brother ;bro")
cleaner("(bro)brother ;bro)") ## this fails
cleaner("(bro)brother ;(bro") ## this doesn't
stringr::str_remove_all("(bro)brother ;bro)", "\\(.*\\)")
Thanks,
Sam
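One possible fix, as a rough sketch rather than a definitive answer: the greedy pattern "\\(.*\\)" runs from the first "(" to the last ")", so an extra closing parenthesis makes it swallow the whole string. Matching only balanced groups with no parentheses inside them, and then dropping any stray parentheses, avoids that. The name cleaner2 is made up for illustration, and this version uses base R only instead of stringr and stringi.

```r
# cleaner2 is a hypothetical name; base R only, no stringr/stringi needed
cleaner2 <- function(x) {
  x <- tolower(x)
  # remove balanced groups that contain no parentheses inside them
  x <- gsub("\\([^()]*\\)", "", x)
  # drop any stray, unmatched parentheses that are left over
  x <- gsub("[()]", "", x)
  # keep only the text before the first comma or semicolon
  x <- sub("[,;].*$", "", x)
  # remove all whitespace
  gsub("[[:space:]]+", "", x)
}

tests <- c("brother(bro)", "brother;bro", "bro ther",
           "(bro)brother ;bro", "(bro)brother ;bro)", "(bro)brother ;(bro")
cleaner2(tests)
```

Note that this assumes parentheses are never nested; nested groups would need a different approach.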
This question already has answers here:
Exclude everything after the second occurrence of a certain string
(2 answers)
Closed 3 years ago.
I have a vector that holds the names of some columns:
group <- c("amount_bin_group", "fico_bin_group", "cltv_bin_group", "p_region_bin")
I want to remove everything from the second "_" onward in each element, i.e. I want it to be
group <- c("amount_bin", "fico_bin", "cltv_bin", "p_region")
I could split this into two vectors and try gsub or substr, but it would be nice to do it in a single vectorized call. Any thoughts?
I checked other posts on the same question, but none of them covers this case.
> sub("(.*)_.*$", "\\1", group)
[1] "amount_bin" "fico_bin" "cltv_bin" "p_region"
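The pattern above actually cuts at the last underscore, which gives the wanted result here only because every element has exactly two. If some names might contain more underscores, a stricter variant can anchor on the second one explicitly. This is a sketch; the name trimmed and the test value "a_b_c_d" are made up for illustration.

```r
group <- c("amount_bin_group", "fico_bin_group", "cltv_bin_group", "p_region_bin")

# capture "<no underscores>_<no underscores>" from the start, then drop the rest
second_underscore <- "^([^_]*_[^_]*)_.*$"
trimmed <- sub(second_underscore, "\\1", group)
trimmed

# with more than two underscores, everything from the second one onward is removed
sub(second_underscore, "\\1", "a_b_c_d")
```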
This question already has answers here:
How to delete everything after nth delimiter in R?
(2 answers)
Remove text after second colon
(3 answers)
Remove all characters after the 2nd occurrence of "-" in each element of a vector
(1 answer)
Closed 3 years ago.
How can I remove everything up to the second occurrence of a pattern in a data frame using R?
I used:
for (i in 1:length(df1)){
  df1[, i] <- gsub(".*_", "", df1[, i])
}
But I guess there is a better way to apply that to the whole data frame?
Here is an example of the values in the data frame:
name_000004_A_B_C
name_00003_C_D
and I would like to get
A_B_C
C_D
Thank you for your help.
x <- c("name_000004_A_B_C", "name_00003_C_D")
gsub("(name_[0-9]*_)(.*)", "\\2", x)
##[1] "A_B_C" "C_D"
More generalised:
gsub("([a-z0-9]*_[a-z0-9]*_)(.*)", "\\2", x)
#[1] "A_B_C" "C_D"
The substitution uses two capture groups: the first, (name_[0-9]*_), matches the prefix up to and including the second underscore, and the second, (.*), matches whatever comes after it. Replacing the whole match with \\2 keeps only the second group. Hope this helps!
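To apply this to every column without an explicit loop, something like the following should work. It is only a sketch: the contents of df1 are invented from the two example values in the question, and it assumes every column is character.

```r
# df1 is built from the example values; the column names are made up
df1 <- data.frame(
  a = c("name_000004_A_B_C", "name_00003_C_D"),
  b = c("name_00003_C_D", "name_000004_A_B_C"),
  stringsAsFactors = FALSE
)

# strip everything up to and including the second underscore in each column;
# assigning into df1[] keeps the data frame structure
df1[] <- lapply(df1, function(col) sub("^[^_]*_[^_]*_", "", col))
df1
```

Using [^_]* instead of .* keeps the match from running past the second underscore, which is what went wrong with gsub(".*_", ...) in the loop.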
This question already has answers here:
Extracting numbers from vectors of strings
(12 answers)
Closed 3 years ago.
What is the easiest way to get a number out of a string? I have a huge list of links like this one, and I need to extract the number 98548 from it.
https://address.com/admin/customers/98548/contacts
Note that the number can have a varying number of digits and can start with any digit from 0 to 9.
This is the simplest way I know:
library(stringr)
str <- "https://address.com/admin/customers/98548/contacts"
str_extract_all(str, "\\d+")[[1]]
Using stringr:
no="https://address.com/admin/customers/98548/contacts"
unlist(stringr::str_extract_all(no,"\\d{1,}"))
[1] "98548"
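If other parts of the link could also contain digits, one option is to anchor on the surrounding path instead of grabbing every digit run. This is a base R sketch that assumes the number always follows "/customers/"; the second link is made up to show a leading zero.

```r
links <- c("https://address.com/admin/customers/98548/contacts",
           "https://address.com/admin/customers/007/contacts")  # second link is hypothetical

# keep only the digits that sit between "/customers/" and the next "/"
ids <- sub("^.*/customers/([0-9]+)/.*$", "\\1", links)
ids
```

The result is a character vector, so leading zeros are preserved; wrap it in as.integer() only if that does not matter.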
This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
To begin with: yes, similar questions already exist here, but the solutions don't work as they should, at least for me.
I'd like to remove everything (letters, digits, or any combination of characters) before the first semicolon, and remove that semicolon too.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's use gsub() with . and *, which together mean any character occurring zero or more times:
gsub(".*;", "", x)
gsub(".*;", "", y)
And as a result I get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why did it match up to the second semicolon instead of the first?
You could use
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"
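As for why: .* is greedy, so it grabs as much as it can and only gives back enough for the final ; to match, which means the match ends at the last semicolon. A short sketch of two common ways around that, either excluding semicolons from the repeated part or using a lazy quantifier (perl = TRUE is passed here to be explicit about the regex engine):

```r
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"

# greedy: the match extends to the LAST semicolon
greedy <- sub(".*;", "", x)

# negated class: the repeated part cannot cross a semicolon
negated <- sub("^[^;]*;", "", c(x, y))

# lazy quantifier: stop at the first semicolon that allows a match
lazy <- sub("^.*?;", "", c(x, y), perl = TRUE)

negated
lazy
```

Since only the first match needs removing, sub() is enough here; gsub() is not required.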
This question already has answers here:
How can I trim leading and trailing white space?
(15 answers)
Closed 7 years ago.
My data is
student_Name= c("Sachin Tendulkar ","Virendar Shewag ",
"Saurav Ganguly ")
I want to remove the trailing blank spaces (on the right side) in R.
My output should be "Sachin Tendulkar","Virendar Shewag","Saurav Ganguly"
You can use str_trim from stringr
library(stringr)
student_Name <- str_trim(student_Name, side='right')
Or use sub
sub('\\s+$', '', student_Name)
#[1] "Sachin Tendulkar" "Virendar Shewag" "Saurav Ganguly"
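Base R also has trimws(), so no extra package is needed. A minimal sketch using the same data (the name trimmed is just for illustration):

```r
student_Name <- c("Sachin Tendulkar ", "Virendar Shewag ",
                  "Saurav Ganguly ")

# which = "right" strips trailing whitespace only; leading spaces are kept
trimmed <- trimws(student_Name, which = "right")
trimmed
```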