Remove a part of a string in R [duplicate]

This question already has answers here:
Remove part of string after "."
(6 answers)
Closed 5 years ago.
I have a string:
"Enviroment is dangerous.123"
Now I want to remove everything after "dangerous", so that the result is:
"Enviroment is dangerous"
I have different text strings of different lengths, so the solution needs to key on the word "dangerous".
How do I do that?

We can use sub to match a literal . (escaped as \\.) followed by one or more digits (\\d+) up to the end of the string ($), and replace it with a blank (""):
sub("\\.\\d+$", "", str1)
#[1] "Enviroment is dangerous"
data
str1 <- "Enviroment is dangerous.123"
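The pattern above removes a trailing "." plus digits no matter which word comes before it. If the cut really has to happen right after the word "dangerous", as the question asks, a minimal sketch is to capture that word and drop everything after it. The second example string is made up for illustration, and the sketch assumes "dangerous" contains no regex metacharacters (true here).

```r
# Example strings; the second one is hypothetical, added to show a different suffix
str1 <- c("Enviroment is dangerous.123", "This road is dangerous at night")

# Capture "dangerous" and replace it plus everything after it with just the capture
res_capture <- sub("(dangerous).*$", "\\1", str1)
print(res_capture)

# Same idea with a PCRE lookbehind: delete everything that follows "dangerous"
res_lookbehind <- sub("(?<=dangerous).*$", "", str1, perl = TRUE)
print(res_lookbehind)
```

Both calls return "Enviroment is dangerous" and "This road is dangerous". Strings that do not contain "dangerous" are returned unchanged, because sub only modifies elements that match.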

Related

Remove characters which repeat more than twice in a string [duplicate]

This question already has answers here:
remove repeated character between words
(4 answers)
Closed 3 years ago.
I have this text:
F <- "hhhappy birthhhhhhdayyy"
and I want to remove the repeated characters. I tried the code from
https://stackoverflow.com/a/11165145/10718214
and it works, but I need to collapse a character only when it repeats more than twice; if it appears exactly twice, it should be kept.
So the output I expect is
"happy birthday"
Any help?
Try using gsub (sub would replace only the first run) with the pattern (.)\\1{2,}:
F <- "hhhappy birthhhhhhdayyy"
gsub("(.)\\1{2,}", "\\1", F)
[1] "happy birthday"
Explanation of regex:
(.) match and capture any single character
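\\1{2,} then match that same character two or more further times (three or more in total)
We replace the whole run with the single captured character. \\1 is a backreference to the first capture group: inside the pattern it stands for the character captured by (.), and in the replacement string of gsub it inserts that character.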
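To check that runs of exactly two survive while longer runs collapse, here is a small sketch wrapping the same pattern in a helper. The function name collapse_runs and the extra test strings are hypothetical, chosen only for illustration. For comparison it also shows \\1+ (one or more repeats), which collapses pairs as well.

```r
# Hypothetical helper: collapse any run of 3+ identical characters to one character;
# runs of exactly 2 (like "pp" in "happy") are left alone
collapse_runs <- function(x) gsub("(.)\\1{2,}", "\\1", x)

examples <- c("hhhappy birthhhhhhdayyy", "aabbb", "cool")
res <- collapse_runs(examples)
print(res)

# For comparison: \\1+ also collapses runs of two, so "happy" becomes "hapy"
collapse_all <- gsub("(.)\\1+", "\\1", examples)
print(collapse_all)
```

Here res is "happy birthday", "aab", "cool", while collapse_all is "hapy birthday", "ab", "col". Note that (.) also matches spaces and punctuation, so runs of three or more of those are collapsed too.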

Keep the part of a string after the last sign [duplicate]

This question already has answers here:
Extract last word in string in R
(5 answers)
Closed 4 years ago.
I would like to keep only the part after the last | sign in my row names, which look like this:
in:
"d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Chromatiales|f__Woeseiaceae|g__Woeseia"
out:
g__Woeseia
I have this code, which replaces everything from the start up to and including the last . with a single .:
gsub("^.*\\.",".",x)
We can do this with a capture group. Using sub, match any characters (.*) up to the last | (.* is greedy, so it runs as far as possible), then capture zero or more characters that are not a | (([^|]*)) until the end of the string ($), and replace the whole match with the backreference (\\1) to the captured group:
sub(".*\\|([^|]*)$", "\\1", str1)
#[1] "g__Woeseia"
Or match everything up to and including the last | and replace it with a blank (""):
sub(".*\\|", "", str1)
#[1] "g__Woeseia"
data
str1 <- "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Chromatiales|f__Woeseiaceae|g__Woeseia"
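Since the question is about row names, here is a sketch applying the second pattern to them directly, plus a non-regex alternative with strsplit. The data frame df, its count column and the extra paths are hypothetical. One caveat: if two paths end in the same genus, assigning the shortened names fails, because data frame row names must be unique.

```r
# Hypothetical data frame whose row names are "|"-separated taxonomy paths
df <- data.frame(
  count = c(10, 5),
  row.names = c(
    "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Chromatiales|f__Woeseiaceae|g__Woeseia",
    "d__Bacteria|p__Firmicutes|g__Bacillus"
  )
)

# Regex approach: delete everything up to and including the last "|"
rownames(df) <- sub(".*\\|", "", rownames(df))
print(rownames(df))

# Non-regex alternative: split on a literal "|" and keep the last piece of each string
paths <- c("d__Bacteria|p__Firmicutes|g__Bacillus", "no_separator_here")
last_piece <- vapply(strsplit(paths, "|", fixed = TRUE),
                     function(p) p[length(p)], character(1))
print(last_piece)
```

The row names become "g__Woeseia" and "g__Bacillus". A string without any | is returned whole by both approaches.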

Remove characters in a string before a specific symbol (including it) [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
First of all: yes, similar questions exist here, but their solutions don't work as expected, at least for me.
I'd like to remove all characters (letters and numbers in any combination) before the first semicolon, and remove the semicolon itself as well.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's use gsub() with .*, where . means any character and * means zero or more occurrences:
gsub(".*;", "", x)
gsub(".*;", "", y)
As a result I get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
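Why does it match up to the second semicolon instead of the first?
The reason is that .* is greedy: it matches as many characters as possible, so .*; runs all the way to the last semicolon. Instead, use [^;]*, which cannot cross a semicolon and therefore stops at the first one: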
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"
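Two equivalent ways to stop at the first semicolon are sketched below, applied to both example strings at once. The first anchors a negated character class at the start; the second swaps in a lazy quantifier (.*?), run through PCRE with perl = TRUE. sub is used because only one replacement is needed.

```r
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"

# [^;]* cannot run past a ";", so the match ends at the first semicolon;
# ^ ties the match to the start of the string
res_class <- sub("^[^;]*;", "", c(x, y))
print(res_class)

# Lazy quantifier: .*? matches as few characters as possible before the ";"
res_lazy <- sub("^.*?;", "", c(x, y), perl = TRUE)
print(res_lazy)
```

Both return "ABC;GEF2" and "EER;3DR". The negated class is usually the clearer choice because it states directly which character must not be crossed.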

Remove underscore from a string in R [duplicate]

This question already has answers here:
Replace specific characters within strings
(7 answers)
Closed 7 years ago.
In my data.frame, I have a character column where all the values look like 123_456 (three digits, an underscore, three digits).
I need to convert these values to numbers, but as.numeric(my_dataframe$my_column) gives me NA. So I first need to remove the underscore before calling as.numeric.
How would I do that?
Thanks
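We can use sub to delete the underscore and then convert the result with as.numeric. sub replaces only the first match, which is enough here because each value contains exactly one underscore: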
as.numeric(sub("_", "", my_dataframe$my_column))
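A runnable sketch on a small made-up data frame (the values and the new column my_number are hypothetical). It passes fixed = TRUE so "_" is treated as a literal string rather than a regex, and shows gsub for values that might contain more than one underscore.

```r
# Hypothetical data frame mirroring the question's column
my_dataframe <- data.frame(my_column = c("123_456", "789_012"),
                           stringsAsFactors = FALSE)

# fixed = TRUE: match "_" literally; sub() removes the single underscore
my_dataframe$my_number <- as.numeric(sub("_", "", my_dataframe$my_column, fixed = TRUE))
print(my_dataframe$my_number)

# If a value could contain several underscores, gsub() removes all of them
big <- as.numeric(gsub("_", "", "1_000_000", fixed = TRUE))
print(big)
```

The new column holds 123456 and 789012, and big is 1e6. With sub, "1_000_000" would become "1000_000", which as.numeric turns into NA with a warning.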

Remove blank spaces on the right side of each element of a character vector in R [duplicate]

This question already has answers here:
How can I trim leading and trailing white space?
(15 answers)
Closed 7 years ago.
My data is:
student_Name <- c("Sachin Tendulkar ", "Virendar Shewag ", "Saurav Ganguly ")
I want to remove the trailing blank spaces (on the right side) in R,
so that my output is "Sachin Tendulkar", "Virendar Shewag", "Saurav Ganguly".
You can use str_trim from stringr
library(stringr)
student_Name <- str_trim(student_Name, side='right')
Or, in base R, use sub to delete one or more whitespace characters (\\s+) at the end of the string ($):
sub('\\s+$', '', student_Name)
#[1] "Sachin Tendulkar" "Virendar Shewag" "Saurav Ganguly"
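Base R also has trimws(), which takes a which argument ("both", "left" or "right"), so no extra package or hand-written regex is needed. A minimal sketch, assuming an R version recent enough to include trimws (it is not available in very old releases); the "  padded  " string is made up to show that leading spaces are kept.

```r
student_Name <- c("Sachin Tendulkar ", "Virendar Shewag ", "Saurav Ganguly ")

# which = "right" strips trailing whitespace only
trimmed <- trimws(student_Name, which = "right")
print(trimmed)

# Leading spaces survive when only the right side is trimmed
padded <- trimws("  padded  ", which = "right")
print(padded)
```

trimmed is "Sachin Tendulkar", "Virendar Shewag", "Saurav Ganguly", and padded is "  padded".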
