Remove pattern from string with gsub - r

I am struggling to remove the substring before the underscore in my string.
I want to use * (as a wildcard), because the part before the underscore can vary:
a <- c("foo_5", "bar_7")
a <- gsub("*_", "", a, perl = TRUE)
The result should look like:
> a
[1] 5 7
I also tried patterns like "^*" or "?", but they did not work.

The following code works on your example:
gsub(".*_", "", a)
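In a regular expression, * is not a wildcard. It is a quantifier meaning "zero or more of the preceding item", so in "*_" it has nothing to repeat. The wildcard is . (any single character), which is why ".*" works. Because ".*" is greedy, it runs up to the last underscore. The sketch below uses a hypothetical extra input "foo_bar_9" to contrast it with "^[^_]*_", which stops at the first underscore:

```r
x <- c("foo_5", "bar_7", "foo_bar_9")  # "foo_bar_9" is a made-up extra case

# Greedy: ".*" consumes as much as possible, so everything up to the LAST "_" goes
greedy <- gsub(".*_", "", x)

# Anchored negated class: "[^_]*" cannot cross an underscore, so only the
# text up to the FIRST "_" is removed
first_only <- sub("^[^_]*_", "", x)

greedy      # "5" "7" "9"
first_only  # "5" "7" "bar_9"
```

Which one is right depends on whether you want the text after the first or the last underscore.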

Alternatively, you can also try:
gsub("\\S+_", "", a)

Just to point out that there is an approach using functions from the tidyverse, which I find more readable than gsub:
library(magrittr)  # provides the %>% pipe
a %>% stringr::str_remove(pattern = ".*_")

as.numeric(gsub(pattern = ".*_", replacement = "", a))
[1] 5 7
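If what you actually need is the number rather than the text after the underscore, you can also extract the trailing digits directly and convert them. This is a sketch that assumes the number always sits at the end of each string:

```r
a <- c("foo_5", "bar_7")

# regexpr() locates a run of digits anchored at the end of each string;
# regmatches() pulls out the matched text, which is then converted
nums <- as.numeric(regmatches(a, regexpr("[0-9]+$", a)))
nums
#> [1] 5 7
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input if some strings do not end in digits.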

Related

gsub to remove escape string

> pname <- "Ratchanon \"TK\" Chantananuwat (Am)"
> gsub(\"TK\", "", pname)
Error: unexpected string constant in "gsub(\"TK\", ""
Is it possible to remove the \"TK\" from this person's name?
I would suggest doing it in the following way: first remove the special characters from your string, then apply gsub() to get rid of the letters or word you want to drop.
pname <- "Ratchanon \"TK\" Chantananuwat (Am)"
library(stringr)
pname <- str_replace_all(pname, "[[:punct:]]", "") # removes all the special chars
gsub("TK", "", pname)
Hope this might help you!
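Note that [[:punct:]] removes every punctuation character, including the parentheses around Am, and deleting TK afterwards leaves two spaces between the names. A base R sketch of the same approach, with an extra step (an assumption about the desired output) that collapses repeated spaces:

```r
pname <- "Ratchanon \"TK\" Chantananuwat (Am)"

no_punct <- gsub("[[:punct:]]", "", pname)  # "Ratchanon TK Chantananuwat Am"
no_tk <- gsub("TK", "", no_punct)           # "Ratchanon  Chantananuwat Am" (two spaces)
tidy <- gsub(" +", " ", no_tk)              # collapse runs of spaces into one
tidy
#> [1] "Ratchanon Chantananuwat Am"
```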
In base R:
gsub('\\"TK\\"', "", pname)
#> [1] "Ratchanon  Chantananuwat (Am)"
Another possible solution, based on stringr::str_remove:
library(stringr)
str_remove(pname, '\\"TK\\"')
#> [1] "Ratchanon  Chantananuwat (Am)"
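The error in the question comes from writing \"TK\" outside of a string literal. Inside a string literal, \" is just a double-quote character, so the text can also be matched literally with fixed = TRUE, which avoids regex escaping altogether. In this sketch, also removing the space after "TK" is an assumption about the desired output (it avoids the double space left by the answers above):

```r
pname <- "Ratchanon \"TK\" Chantananuwat (Am)"

# '"TK" ' is a plain 5-character string: quote, T, K, quote, space.
# fixed = TRUE treats it literally instead of as a regular expression.
cleaned <- gsub('"TK" ', "", pname, fixed = TRUE)
cleaned
#> [1] "Ratchanon Chantananuwat (Am)"
```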

substitute string when there is a dot + number + ':'

I have strings that look like these:
> ABCD.1:f_HJK
> ABFD.1:f_HTK
> CJD:f_HRK
> QQYP.2:f_HDP
So basically, the first part is always a string, which may be followed by a '.' and a number; after that there is always ':' and another string.
I would like to remove the '. + number' when it is included in the string, using R.
I know that regular expressions could be useful, but I have no idea how to apply them in this context. I know that I can substitute the '.' with gsub, but I don't know how to add the information about the number and the ':'.
Thank you for your help.
Does this work:
v <- c('ABCD.1:f_HJK','ABFD.1:f_HTK','CJD:f_HRK','QQYP.2:f_HDP')
v
[1] "ABCD.1:f_HJK" "ABFD.1:f_HTK" "CJD:f_HRK" "QQYP.2:f_HDP"
gsub('([A-Z]{,4})(\\.\\d)?(:.*)','\\1\\3',v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
You could also use any of the following, depending on the structure of your strings.
If there is no other period followed by digits in the string:
sub("\\.\\d+", "", v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
If the period and number always come directly after the leading capital letters and right before the ':':
sub("^([A-Z]+)\\.\\d+:", "\\1:", v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
Same as above, but with perl = TRUE, so that \\K can be used instead of a captured group:
sub("^[A-Z]+\\K\\.\\d+", "", v, perl = TRUE)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
If I understood your explanation correctly, this should do the trick:
gsub("(\\.\\d+)", "", v)
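The first answer's (\\.\\d)? allows only a single digit, and the unanchored \\.\\d+ removes any period followed by digits wherever it appears. If the number can have several digits and should only be removed when it sits right before the ':', a lookahead expresses that directly. This is a sketch, not taken from the answers above; the extra input "XY.12:f_A" is hypothetical:

```r
v <- c("ABCD.1:f_HJK", "ABFD.1:f_HTK", "CJD:f_HRK", "QQYP.2:f_HDP", "XY.12:f_A")

# "(?=:)" is a lookahead: the ".digits" part must be followed by ":",
# but the ":" itself is not consumed, so it stays in the output
out <- sub("\\.\\d+(?=:)", "", v, perl = TRUE)
out
# "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP" "XY:f_A"
```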

Count number of dots in character string with str_count?

I am trying to count the number of dots in a character string.
I have tried to use str_count, but it gives me the number of characters in the string instead:
ex_str <- "This.is.a.string"
str_count(ex_str, '.')
nchar(ex_str)
. is a special regex symbol that matches any character, so you need to escape it:
str_count(ex_str, '\\.')
# [1] 3
Using just base R you could do:
nchar(gsub("[^.]", "", ex_str))
Using stringi:
stringi::stri_count_fixed(ex_str, '.')
Another base R solution could be:
length(grepRaw(".", ex_str, fixed = TRUE, all = TRUE))
[1] 3
You may also use the base function gregexpr:
sum(gregexpr(".", ex_str, fixed=TRUE)[[1]] > 0)
[1] 3
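The gregexpr() answer only looks at the first element ([[1]]). A vectorised sketch that works on a whole character vector follows; count_char is a hypothetical helper name, not from any package. Note that the naive lengths(gregexpr(...)) would report 1 for a string with no match, because gregexpr() returns -1 in that case:

```r
# Hypothetical helper: counts literal occurrences of `ch` in every element of `x`
count_char <- function(x, ch) {
  # gregexpr() returns -1 for "no match"; regmatches() turns that into
  # character(0), so lengths() gives 0 rather than 1
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

counts <- count_char(c("This.is.a.string", "nodots", "a.b"), ".")
counts
#> [1] 3 0 1
```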
You can use stringr::str_count with a fixed(...) argument to avoid treating it as a regular expression:
str_count(ex_str, fixed('.'))
Complete example:
library(stringr)
ex_str <- "This.is.a.string"
str_count(ex_str, fixed('.'))
## => [1] 3

Remove part of a string until a character is found R

I have a regex problem, or a somewhat regex-related problem...
I have strings that look like this:
"..........))))..)))))))"
"....))))))))...)).))))..))"
"......))))...)))...)))))"
I want to remove the initial dot sequence, so that I only get the string starting with the first occurrence of the ")" symbol. The output would be something like:
"))))..)))))))"
"))))))))...)).))))..))"
"))))...)))...)))))"
I assume it would be somewhat similar to a lookahead regex but cannot figure out the correct one...
Any help?
Thanks
We match 0 or more dots (\\.*) at the start (^) of the string and replace them with an empty string:
sub("^\\.*", "", v1)
#[1] "))))..)))))))" "))))))))...)).))))..))" "))))...)))...)))))"
If the result must start with ), then, as above, match 0 or more dots up to the first ) and replace the match with ):
sub("^\\.*\\)", ")", v1)
#[1] "))))..)))))))" "))))))))...)).))))..))" "))))...)))...)))))"
data
v1 <- c("..........))))..)))))))", "....))))))))...)).))))..))", "......))))...)))...)))))")
You can simply remove dots from the beginning of the line (marked in the regex by ^) until you reach a non-dot character:
a <- "..........))))..)))))))"
b <- "....))))))))...)).))))..))"
c <- "......))))...)))...)))))"
sub("^\\.*", "", a) # "))))..)))))))"
sub("^\\.*", "", b) # "))))))))...)).))))..))"
sub("^\\.*", "", c) # "))))...)))...)))))"
The way your question is worded, the goal is not just to remove . from the beginning, but to remove any symbol until the first ) is encountered, so this answer gives a more general solution.
stringr::str_extract("..........))))..)))))))","\\).*$")
Alternatively, if you want to stick with base R, you could use sub/gsub like this:
gsub("[^\\)]*(\\).*$)","\\1","..........))))..)))))))")
sub("[^\\)]*","","..........))))..)))))))")
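The two families of answers differ only when something other than dots precedes the first ). A small sketch with a hypothetical second input "ab..))).)" shows the difference:

```r
x <- c("..........))))..)))))))", "ab..))).)")  # second string is made up

dots_only   <- sub("^\\.*", "", x)   # strips leading dots only
up_to_paren <- sub("^[^)]*", "", x)  # strips any characters before the first ")"

dots_only    # "))))..)))))))" "ab..))).)"
up_to_paren  # "))))..)))))))" "))).)"
```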

str_replace (package stringr) cannot replace brackets in r?

I have a string, say
fruit <- "()goodapple"
I want to remove the brackets from the string. I decided to use the stringr package because it can usually handle this kind of issue. I used:
str_replace(fruit,"()","")
But nothing is replaced; this is what I get:
[1] "()goodapple"
If I only want to replace the closing bracket, it works:
str_replace(fruit,")","")
[1] "(goodapple"
However, replacing the opening bracket does not work:
str_replace(fruit,"(","")
and the following error is shown:
Error in sub("(", "", "()good", fixed = FALSE, ignore.case = FALSE, perl = FALSE) :
invalid regular expression '(', reason 'Missing ')''
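Does anyone have an idea why this happens? How can I remove the "()" from the string, then?
Parentheses are regex metacharacters that delimit a group. "()" is therefore an empty group that matches the empty string (so nothing visible is replaced), and "(" on its own opens a group that is never closed (hence the error). Escaping the parentheses makes them literal: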
str_replace(fruit,"\\(\\)","")
# [1] "goodapple"
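A small base R sketch makes the explanation above visible. perl = TRUE is used for the first call only to be sure an empty group is accepted; the second call uses the default regex engine, the same one that produced the error in the question:

```r
fruit <- "()goodapple"

# "()" is an empty capture group: it matches the empty string at position 1,
# so replacing that (empty) match with "" changes nothing
m <- regexpr("()", fruit, perl = TRUE)
m[1]                     # 1
attr(m, "match.length")  # 0

# "(" opens a group that is never closed, so the pattern fails to compile
bad <- suppressWarnings(try(regexpr("(", fruit), silent = TRUE))
inherits(bad, "try-error")  # TRUE

# Escaping makes both parentheses literal characters
sub("\\(\\)", "", fruit)  # "goodapple"
```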
You may also want to consider exploring the "stringi" package, which has a similar approach to "stringr" but has more flexible functions. For instance, there is stri_replace_all_fixed, which would be useful here since your search string is a fixed pattern, not a regex pattern:
library(stringi)
stri_replace_all_fixed(fruit, "()", "")
# [1] "goodapple"
Of course, basic gsub handles this just fine too:
gsub("()", "", fruit, fixed=TRUE)
# [1] "goodapple"
The accepted answer works for your exact problem, but not for the more general problem:
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace(my_fruits,"\\(\\)","")
## [1] "goodapple" "(bad)apple" "(funnyapple"
This is because the regex only matches a "(" immediately followed by a ")".
Assuming you care only about bracket pairs, this is a stronger solution:
str_replace(my_fruits, "\\([^()]{0,}\\)", "")
## [1] "goodapple" "apple" "(funnyapple"
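str_replace() stops after the first match, so a string with several bracket pairs keeps all but the first. A base R sketch with gsub() removes every pair whose contents contain no other parentheses. The extra input "(a)(b)apple" is hypothetical, and nested brackets such as "((x))" are not handled, because only the innermost pair is removed in a single pass:

```r
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple", "(a)(b)apple")

# "\\(" and "\\)" are literal parentheses; "[^()]*" is any run of
# non-parenthesis characters between them. gsub() replaces every match.
no_pairs <- gsub("\\([^()]*\\)", "", my_fruits)
no_pairs
# "goodapple" "apple" "(funnyapple" "apple"
```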
Building off of MJH's answer, this removes all ( or ):
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace_all(my_fruits, "[()]", "")  # inside [] the parentheses are literal
[1] "goodapple" "badapple" "funnyapple"
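The same result in base R, as a sketch: inside a bracket expression "(" and ")" are ordinary characters, so "[()]" needs no escaping and gsub() removes every occurrence:

```r
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")

# "[()]" matches a single "(" or ")"; gsub() removes all of them
no_parens <- gsub("[()]", "", my_fruits)
no_parens
# "goodapple" "badapple" "funnyapple"
```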
