gsub to remove escape string - r

> pname <- "Ratchanon \"TK\" Chantananuwat (Am)"
> gsub(\"TK\", "", pname)
Error: unexpected string constant in "gsub(\"TK\", ""
Is it possible to remove the \"TK\" from this person's name?

I would suggest doing it as follows: first remove the special characters (punctuation) from your string, then apply gsub() to get rid of the letters/word you want to drop.
library(stringr)
pname <- "Ratchanon \"TK\" Chantananuwat (Am)"
pname <- str_replace_all(pname, "[[:punct:]]", "") # removes all punctuation, including the parentheses around "Am"
gsub("TK", "", pname)
#> [1] "Ratchanon  Chantananuwat Am"
Hope this might help you!

In base R:
gsub('\\"TK\\"', "", pname)
#> [1] "Ratchanon  Chantananuwat (Am)"
Another possible solution, based on stringr::str_remove (a shortcut for str_replace with an empty replacement):
library(stringr)
str_remove(pname, '\\"TK\\"')
#> [1] "Ratchanon  Chantananuwat (Am)"
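The error in the question occurs because \" is an escape that only has meaning inside a string literal, so the pattern itself must be quoted. Note also that removing just "TK" with its quotes leaves two spaces between the names. A few more base R variants, as a sketch; the last one assumes the nickname is always wrapped in double quotes and contains no quotes itself:

```r
pname <- "Ratchanon \"TK\" Chantananuwat (Am)"

# Inside single quotes a double quote needs no escaping, and fixed = TRUE
# matches the text literally, so no regex escaping is needed either
gsub('"TK"', "", pname, fixed = TRUE)
#> [1] "Ratchanon  Chantananuwat (Am)"

# Include the preceding space in the pattern to avoid a double space
sub(' "TK"', "", pname, fixed = TRUE)
#> [1] "Ratchanon Chantananuwat (Am)"

# Remove any double-quoted nickname, not just "TK"
# (assumes the nickname itself contains no double quotes)
gsub(' "[^"]*"', "", pname)
#> [1] "Ratchanon Chantananuwat (Am)"
```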

Related

substitute string when there is a dot + number + ':'

I have strings that look like these:
> ABCD.1:f_HJK
> ABFD.1:f_HTK
> CJD:f_HRK
> QQYP.2:f_HDP
So basically, there is always a string in the first part; then there may be a part with a . and a number; and after that there is always a ':' followed by a string.
I would like to remove the '. + number' when it is included in the string, using R.
I know that regular expressions could be useful, but I have no idea how to apply them in this context. I know that I can substitute the '.' with gsub, but I don't know how to add the information about the number and the ':'.
Thank you for your help.
Does this work:
v <- c('ABCD.1:f_HJK','ABFD.1:f_HTK','CJD:f_HRK','QQYP.2:f_HDP')
v
[1] "ABCD.1:f_HJK" "ABFD.1:f_HTK" "CJD:f_HRK" "QQYP.2:f_HDP"
gsub('([A-Z]{,4})(\\.\\d)?(:.*)','\\1\\3',v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
You could also use any of the following, depending on the structure of your strings.
If there is no other period followed by a number in the string:
sub("\\.\\d+", "", v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
If you only want to target the period and number that directly follow the leading capital letters:
sub("^([A-Z]+)\\.\\d+:", "\\1:", v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
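Same as above, but with perl = TRUE so that \K can be used: it resets the start of the match after the letters, so only the period and number are replaced and no capture group is needed: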
sub("^[A-Z]+\\K\\.\\d+", "", v, perl = TRUE)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
If I understood your explanation correctly, this should do the trick (where string is your character vector):
gsub("(\\.\\d+)", "", string)
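A variation, as a sketch: if the ".number" part should be removed only when it sits right before the colon, a Perl lookahead expresses that condition without capturing anything. The string w is a made-up example with an extra ".2" after the colon, to show how this differs from an unanchored gsub():

```r
v <- c("ABCD.1:f_HJK", "ABFD.1:f_HTK", "CJD:f_HRK", "QQYP.2:f_HDP")

# (?=:) is a lookahead: it requires a colon after the digits but does not
# consume it, so the colon stays in the result (needs perl = TRUE)
sub("\\.\\d+(?=:)", "", v, perl = TRUE)
#> [1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK"  "QQYP:f_HDP"

w <- "ABCD.1:f_HJK.2"  # hypothetical string with a second ".number"
gsub("\\.\\d+", "", w)                    # removes both
#> [1] "ABCD:f_HJK"
gsub("\\.\\d+(?=:)", "", w, perl = TRUE)  # removes only the one before ":"
#> [1] "ABCD:f_HJK.2"
```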

How to throw out spaces and underscores only from the beginning of the string?

I want to ignore the spaces and underscores at the beginning of a string in R.
I can write something like
txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)
But I think there could be a more elegant solution.
txt <- " 9PM 8-Oct-2014_0.335kwh "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt
The output should be "9PM 8-Oct-2014_0.335kwh ". But my code gives " 9PM 8-Oct-2014_0.335kwh ".
How can I fix it?
You could put \s and the underscore together in a character class and use a quantifier to repeat it one or more times:
^[\s_]+
For example:
txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)
Or, as @Tim Biegeleisen points out in the comments, since only the first occurrence needs replacing you could use sub instead (keep the ^ anchor, otherwise sub removes the first run of spaces/underscores wherever it occurs):
txt <- sub("^[\\s_]+", "", txt, perl=TRUE)
Or using a POSIX character class:
txt <- sub("^[[:space:]_]+", "", txt)
For more information on perl=TRUE and the regular expressions used in R, see ?regex.
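To see why the ^ anchor matters when using sub(), here is a quick check on a string that does not start with whitespace or an underscore (both strings are made up for illustration):

```r
x <- "9PM 8-Oct-2014_0.335kwh"   # no leading whitespace or underscore

# Without ^, sub() removes the first run of spaces/underscores wherever it is
sub("[[:space:]_]+", "", x)
#> [1] "9PM8-Oct-2014_0.335kwh"

# With ^, only a run at the very start is removed, so x is unchanged
sub("^[[:space:]_]+", "", x)
#> [1] "9PM 8-Oct-2014_0.335kwh"

y <- " _ 9PM 8-Oct-2014_0.335kwh "
sub("^[[:space:]_]+", "", y)     # the trailing space is kept
#> [1] "9PM 8-Oct-2014_0.335kwh "
```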
The stringr package offers some task-specific functions with helpful names. In your original question you say you would like to remove whitespace and underscores from the start of your string, but in a comment you imply that you also wish to remove the same characters from the end of the string. To that end, I'll include a few different options.
Given string s <- " \t_blah_ ", which contains whitespace (spaces and tabs) and underscores:
library(stringr)
# Remove whitespace and underscores at the start.
str_remove(s, "^[\\s_]+")
# [1] "blah_ "
# Remove whitespace and underscores at the start and end.
str_remove_all(s, "^[\\s_]+|[\\s_]+$")
# [1] "blah"
In case you're looking to remove whitespace only – there are, after all, no underscores at the start or end of your example string – there are a couple of stringr functions that will help you keep things simple:
# `str_trim` trims whitespace (e.g. spaces and tabs) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "
str_trim(s, side = "right")
# [1] " \t_blah_"
str_trim(s, side = "both") # This is the default.
# [1] "_blah_"
# `str_squish` trims whitespace from both ends and collapses each internal run of whitespace to a single space.
s <- " \t_blah blah_ "
str_squish(s)
# [1] "_blah blah_"
The same pattern [\\s_]+ will also work in base R's sub or gsub, with some minor modifications such as perl = TRUE, if that's your jam (see The fourth bird's answer).
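For reference, a sketch of base R versions of the two operations above (start only, and start and end), using the same s; perl = TRUE lets PCRE interpret \s inside the character class, and the anchors ^ and $ restrict the match to the ends of the string:

```r
s <- " \t_blah_ "

# Start only: one anchored replacement
sub("^[\\s_]+", "", s, perl = TRUE)
#> [1] "blah_ "

# Start and end: alternation of two anchored runs, replaced globally
gsub("^[\\s_]+|[\\s_]+$", "", s, perl = TRUE)
#> [1] "blah"
```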
You can use stringr:
txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
Or trimws() in base R:
trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
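Note that str_trim() and trimws() both trim the trailing space too, while the question expects it to be kept. A sketch using trimws()'s which and whitespace arguments (whitespace is available since R 3.6.0) handles both the left-only trimming and the underscores; the second input string is made up for illustration:

```r
txt <- " 9PM 8-Oct-2014_0.335kwh "

# Trim the left side only, keeping the trailing space
trimws(txt, which = "left")
#> [1] "9PM 8-Oct-2014_0.335kwh "

# `whitespace` sets the character class to strip; adding "_" removes
# leading underscores as well (trimws matches with perl = TRUE, so \s works)
trimws("_ _9PM 8-Oct-2014_0.335kwh ", which = "left", whitespace = "[\\s_]")
#> [1] "9PM 8-Oct-2014_0.335kwh "
```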

Use gsub to keep only the first part of my string

In R, I have strings looking like:
test <- 'ZYG11B|79699'
I want to keep only 'ZYG11B'.
My best attempt yet:
gsub ("|.*$", "", test) # should replace everything after '|' by nothing
but returns
> [1] ""
How should I do that?
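| is a regex metacharacter (alternation), so "|.*$" means "the empty string or .*$" and ends up matching the whole string. To match a literal pipe, enclose it in square brackets or escape it with a double backslash: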
> gsub('[|].*$','', test)
[1] "ZYG11B"
> gsub('\\|.*$','', test)
[1] "ZYG11B"
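With stringr, we can extract the leading run of word characters (letters, digits and underscores), which stops at the |: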
library(stringr)
str_extract(test, "\\w+")
#[1] "ZYG11B"
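One caveat on the \\w+ approach: \w matches only letters, digits and underscores, so a name containing a hyphen or a dot would be cut short (for a made-up entry such as "GENE-X|12345" it returns "GENE"). Two base R sketches that rely only on the position of the |:

```r
test <- c("ZYG11B|79699", "GENE-X|12345")  # second entry is a made-up example

# Regex: escape the pipe, then drop it and everything after it
sub("\\|.*$", "", test)
#> [1] "ZYG11B" "GENE-X"

# No regex at all: split on a literal "|" and keep the first piece
vapply(strsplit(test, "|", fixed = TRUE), `[`, character(1), 1)
#> [1] "ZYG11B" "GENE-X"
```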

Replacing underscore "_" with backslash-underscore "\_" in an R string

Q: How can I replace underscores "_" with backslash-underscores "\_" in an R string? I'd prefer to use the stringr package.
Also, can anyone explain why line 5 below fails to get the desired result? I was almost certain that would work.
library(stringr)
s <- "foo_bar_baz"
str_replace_all(s, "_", 5) # [1] "foo5bar5baz"
str_replace_all(s, "_", "\_") # Error: '\_' is an unrecognized escape in character string starting ""\_"
str_replace_all(s, "_", "\\_") # [1] "foo_bar_baz"
str_replace_all(s, "_", "\\\_") # Error: '\_' is an unrecognized escape in character string starting ""\\\_"
str_replace_all(s, "_", "\\\\_") # [1] "foo\\_bar\\_baz"
Context: I'm making a LaTeX table using xtable and need to sanitize my column names since they all have underscores and break LaTeX.
It is much easier than that: replace the literal string with a literal string by wrapping the pattern in fixed("_"); no regex is needed.
> library(stringr)
> s <- "foo_bar_baz"
> str_replace_all(s, fixed("_"), "\\_")
[1] "foo\\_bar\\_baz"
And if you use cat:
> cat(str_replace_all(s, fixed("_"), "\\_"))
foo\_bar\_baz>
You will see that the result actually contains a single backslash before each underscore; print() just displays it escaped as \\.
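Since the goal is to sanitize column names for LaTeX, the same result is also available in base R without stringr. A sketch of two equivalent calls, the regex one and the fixed = TRUE one:

```r
s <- "foo_bar_baz"

# Regex replacement: the R literal "\\\\_" is the three characters \\_ and,
# in a replacement string, \\ stands for one literal backslash
gsub("_", "\\\\_", s)
#> [1] "foo\\_bar\\_baz"

# fixed = TRUE: pattern and replacement are used literally, so "\\_"
# (one backslash followed by an underscore) is enough
escaped <- gsub("_", "\\_", s, fixed = TRUE)
escaped
#> [1] "foo\\_bar\\_baz"

# print() shows the backslash escaped; writeLines() shows the real characters
writeLines(escaped)
#> foo\_bar\_baz
```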

Remove pattern from string with gsub

I am struggling to remove the substring before the underscore in my string.
I want to use * (wildcard) as the bit before the underscore can vary:
a <- c("foo_5", "bar_7")
a <- gsub("*_", "", a, perl = TRUE)
The result should look like:
> a
[1] 5 7
I also tried patterns like "^*" or "?", but they did not work either.
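In a regular expression * is not a wildcard by itself; it repeats the preceding item. Use .* ("any character, any number of times") instead. The following code works on your example: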
gsub(".*_", "", a)
Alternatively, you can also try:
gsub("\\S+_", "", a)
Just to point out that there is an approach using functions from the tidyverse, which I find more readable than gsub:
library(magrittr) # provides the %>% pipe
a %>% stringr::str_remove(pattern = ".*_")
To get numbers rather than character strings, wrap the result in as.numeric():
as.numeric(gsub(pattern = ".*_", replacement = "", a))
[1] 5 7
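To illustrate the difference between the greedy .* and a pattern that stops at the first underscore, here is a sketch with a made-up third element containing two underscores:

```r
a <- c("foo_5", "bar_7", "foo_bar_9")  # third element added for illustration

# Greedy .* runs to the LAST underscore
sub(".*_", "", a)
#> [1] "5" "7" "9"

# [^_]* cannot cross an underscore, so this stops at the FIRST one
sub("^[^_]*_", "", a)
#> [1] "5"     "7"     "bar_9"

# Convert to numbers as in the question's expected output
as.numeric(sub(".*_", "", a))
#> [1] 5 7 9
```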
