Split string after last underscore in R [duplicate]

I have a string like "ABC_Something_Filename". How can I split it into "ABC_Something" and "Filename" in R?
I do not want to remove anything. I want both components - before and after last underscore.
Edit: I tried what is suggested for column separation, but that is too extensive for my use case. Hence, I am looking for a regex alternative that simply splits a string.

One option would be to use strsplit with a negative lookahead which asserts that the underscore on which to split is the final one in the input:
input <- "ABC_Something_Filename"
parts <- strsplit(input, "_(?!.*_)", perl=TRUE)[[1]]
parts
[1] "ABC_Something" "Filename"
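As a sketch of how this generalises, the same pattern can be wrapped in a small helper and applied to a whole character vector. The function name split_at_last_underscore and the extra inputs are made up for illustration; a string without an underscore should simply come back unchanged.

```r
# Hypothetical helper (not part of any package): split each string at its last underscore.
split_at_last_underscore <- function(x) {
  # "_(?!.*_)" matches an underscore only if no other underscore follows it;
  # the lookahead requires perl = TRUE.
  strsplit(x, "_(?!.*_)", perl = TRUE)
}

parts <- split_at_last_underscore(c("ABC_Something_Filename", "a_b", "nounderscore"))
parts[[1]]
```

strsplit returns a list with one element per input string, so parts[[1]] holds the two pieces for the first input.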

You can use str_match and capture data in two groups.
x <- 'ABC_Something_Filename'
stringr::str_match(x, '(.*)_(.*)')[, -1]
#[1] "ABC_Something" "Filename"
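This works because the first (.*) is greedy: it grabs as much as possible, so the _ in the pattern ends up being the last underscore. A base R sketch of the same capture-group idea, assuming stringr is not available, could use regexec together with regmatches:

```r
x <- "ABC_Something_Filename"

# regexec returns the full match plus one entry per capture group;
# the greedy first group runs up to the last underscore.
m <- regmatches(x, regexec("(.*)_(.*)", x))[[1]]
pieces <- m[-1]  # drop the full match, keep the two groups
pieces
```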


String split to remove everything after _ [duplicate]

I have a list of file names and want to extract just the part of each name before the _.
I tried the following, but it was unsuccessful:
condition <- strsplit(count_files, "_*")
I also tried:
condition <- strsplit(count_files, "_*.[c,t]sv")
Any suggestions?
Just use trimws from base R
trimws(count_files, whitespace = "_.*")
[1] "Fibroblast" "Fibroblast"
The output of strsplit is a list, so it may need to be unlisted. Also, the regex _* means zero or more underscores, so it can match the empty string. Instead, use _.*, i.e. an underscore followed by zero or more other characters (.*):
unlist(strsplit(count_files, "_.*"))
Data:
count_files <- c("Fibroblast_1.csv", "Fibroblast_2.csv")
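Since only the part before the first underscore is needed, a split is not strictly required. A minimal alternative sketch, assuming the same count_files vector, deletes everything from the first underscore onward with sub:

```r
count_files <- c("Fibroblast_1.csv", "Fibroblast_2.csv")

# "_.*" matches the first underscore and everything after it; replace it with nothing.
condition <- sub("_.*", "", count_files)
condition
```

The result is a plain character vector, so no unlist step is needed.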

Replace "$" in a string in R [duplicate]

I would like to replace $ in my R strings. I have tried:
mystring <- "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked"
stringr::str_replace(mystring, pattern="$",
replacement="!")
However, it fails and my replacement character is put as the last character in my original string:
[1] "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked!"
I tried a variation using pattern = "/$", but that fails as well. Can someone suggest a strategy for this?
In base R, you could use:
chartr("$","!", mystring)
[1] "file.tree.id!HASHd15962267-44c21f1cee1057d95d6840!HASHe92451fece3b3341962516acfa962b2f!checked"
Or even
gsub("$","!", mystring, fixed = TRUE)
With stringr, wrap the pattern in fixed(), because by default the pattern is treated as a regex, and in a regex $ matches the end of the string:
stringr::str_replace_all(mystring, pattern = stringr::fixed("$"),
replacement = "!")
Alternatively, escape it (\\$) or place it in square brackets ([$]), but fixed() would be faster.
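The variants mentioned above can be compared side by side in base R. This is only a sketch on a shortened, made-up string (not the one from the question):

```r
mystring <- "id$HASHabc$checked"  # shortened example string, assumed for illustration

unescaped <- gsub("$", "!", mystring)                # regex: "$" is the end-of-string anchor
escaped   <- gsub("\\$", "!", mystring)              # escaped metacharacter
bracketed <- gsub("[$]", "!", mystring)              # character class
literal   <- gsub("$", "!", mystring, fixed = TRUE)  # no regex interpretation at all

c(unescaped, escaped, bracketed, literal)
```

Only the unescaped call leaves the dollar signs in place and appends the replacement at the end, which is the behaviour described in the question.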

Replace a string with first few characters [duplicate]

Let's say I have a string like this:
Str = "#sometext_any_character_including_&**(_etc_blabla\\s"
Now I want to replace the above text with
"#some\\s"
i.e. I just want to retain the leading #, the first 4 characters after it, and the trailing \\s. Is there any way to do this in R?
Any pointers will be highly appreciated.
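I would extract the pieces with a regex. To keep the trailing \\s as well, capture it as a second alternative, for example (this snippet is Python, not R):
import re
# Python: my_string mirrors Str from the question and ends in a literal backslash followed by "s"
my_string = r"#sometext_any_character_including_&**(_etc_blabla\s"
# Extract "#" plus four lowercase letters, or a literal backslash-s
pattern = re.compile(r"(#[a-z]{4}|\\s)")
my_match = "".join(pattern.findall(my_string))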
An option with sub
sub("^(#.{4}).*(\\\\s)$", "\\1\\2", Str)
#[1] "#some\\s"
You can use
str_replace(string, pattern, replacement)
or
str_replace_all(string, pattern, replacement)
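As a concrete sketch of the stringr call, assuming the same Str as in the question and reusing the regex from the sub answer (stringr accepts \\1-style backreferences in the replacement):

```r
library(stringr)

Str <- "#sometext_any_character_including_&**(_etc_blabla\\s"

# Keep "#" plus the next 4 characters (group 1) and the trailing literal "\s" (group 2).
short <- str_replace(Str, "^(#.{4}).*(\\\\s)$", "\\1\\2")
short
```

Because the pattern is anchored at both ends, there is at most one match per string, so str_replace and str_replace_all would give the same result here.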

Extract the strings that follows a regex pattern in R [duplicate]

The original inputs are a list of free-text fields. The task is to extract a pattern like "234-5678" from each string.
For example, given the following list:
text <- c("abced 156-8790","kien 3578-562839 bewsd","$nietl 66320-98703","789-55340")
what I would like to extract is:
return <- c("156-8790","578-5628","320-9870","789-5534")
I was considering using gsub("^[([:digit:]{3})[-]([:digit:]{4})]", replacement = "", text), but the regex does not work the way I wanted. Could anyone please help with this? Many thanks in advance!
We can use str_extract to match 3 digits (\\d{3}) followed by a -, followed by 4 digits (\\d{4}):
library(stringr)
str_extract(text, "\\d{3}-\\d{4}")
#[1] "156-8790" "578-5628" "320-9870" "789-5534"
Or using base R with regmatches/regexpr
regmatches(text, regexpr("\\d{3}-\\d{4}", text))
#[1] "156-8790" "578-5628" "320-9870" "789-5534"
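One difference between the two approaches is worth noting: str_extract returns NA for an element without a match, whereas regmatches silently drops it, so the result can be shorter than the input. A sketch that keeps the base R result aligned with the input, using a made-up vector with one non-matching element:

```r
text <- c("abced 156-8790", "no number here", "789-55340")  # second element is invented

m <- regexpr("[0-9]{3}-[0-9]{4}", text)   # -1 where there is no match
out <- rep(NA_character_, length(text))
out[m != -1] <- regmatches(text, m)       # fill only the positions that matched
out
```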

how to add a character to a string in R [duplicate]

I have something like this:
text <- "abcdefg"
and I want something like this:
"abcde.fg"
How could I achieve this without assigning a new string to the vector text, instead changing the element of the vector itself? Ultimately, I would like to insert at a random position, and the inserted character would not be a dot but an element of a character vector.
We can use sub to capture the first 5 characters as a group ((.{5})), followed by zero or more characters in another capture group ((.*)), and then replace with the backreference to the first group (\\1), followed by a ., followed by the second backreference (\\2).
sub("(.{5})(.*)", "\\1.\\2", text)
#[1] "abcde.fg"
NOTE: This solution is direct and doesn't need to paste anything together.
Also, substring with paste will work:
paste(substring(text, c(1,6), c(5,7)), collapse=".")
[1] "abcde.fg"
The substring function accepts vector start-stop arguments and "splits" the string at the desired locations. We can then paste these elements together using the collapse argument.
Without relying on the vector arguments, we could use the newer and recommended substr function:
paste(c(substr(text, 1, 5), substr(text, 6,7)), collapse=".")
[1] "abcde.fg"
Note that as mentioned by konrad-rudolph, this will create a copy of the vector.
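For the last part of the question (inserting a character taken from a vector at a random position), a minimal sketch could look like this. The helper insert_at and the vector chars are hypothetical names introduced for illustration, and the result is assigned back to text.

```r
# Hypothetical helper: insert `ch` into `s` after character position `pos` (0 = at the start).
insert_at <- function(s, ch, pos) {
  paste0(substr(s, 1, pos), ch, substr(s, pos + 1, nchar(s)))
}

text <- "abcdefg"
fixed_pos <- insert_at(text, ".", 5)   # same result as the sub() call above

chars <- c("-", "+", "#")              # hypothetical vector of characters to insert
pos <- sample(0:nchar(text), 1)        # random insertion point, including both ends
text <- insert_at(text, sample(chars, 1), pos)
text
```

Calling set.seed beforehand would make the random position and character reproducible.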
