String split to remove everything after _ [duplicate] - r

This question already has answers here:
How to extract everything until first occurrence of pattern
(4 answers)
Closed 1 year ago.
I have a list of file names and want to extract just the part of each name before the _.
I tried using the following but was unsuccessful.
condition <- strsplit(count_files, "_*")
I also tried
condition <- strsplit(count_files, "_*.[c,t]sv")
Any suggestions?

Just use trimws from base R
trimws(count_files, whitespace = "_.*")
[1] "Fibroblast" "Fibroblast"
The output of strsplit is a list, so it may need to be unlisted. Also, the regex _* means zero or more _ characters. Instead, it should be _.*, i.e. _ followed by zero or more other characters (.*):
unlist(strsplit(count_files, "_.*"))
data
count_files <- c("Fibroblast_1.csv", "Fibroblast_2.csv")
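As a further illustration of the same _.* pattern, here is a minimal sketch using base R's sub, which returns a character vector directly, so no unlist is needed. It assumes file names shaped like the sample data; the variable name condition is borrowed from the question.

```r
count_files <- c("Fibroblast_1.csv", "Fibroblast_2.csv")

# Delete the first underscore and everything after it.
# "_.*" means: a literal "_" followed by zero or more of any character.
condition <- sub("_.*", "", count_files)
print(condition)
# [1] "Fibroblast" "Fibroblast"

# A name without an underscore has no match and is returned unchanged.
no_underscore <- sub("_.*", "", "Fibroblast.csv")
print(no_underscore)
# [1] "Fibroblast.csv"
```

Because sub is vectorised over its input, this works on the whole vector of file names in one call.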

Related

Replace "$" in a string in R [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 1 year ago.
I would like to replace $ in my R strings. I have tried:
mystring <- "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked"
stringr::str_replace(mystring, pattern="$",
replacement="!")
However, it fails and my replacement character is put as the last character in my original string:
[1] "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked!"
I tried some variations such as pattern="/$", but they fail as well. Can someone suggest a strategy for doing this?
In base R, you could use:
chartr("$","!", mystring)
[1] "file.tree.id!HASHd15962267-44c21f1cee1057d95d6840!HASHe92451fece3b3341962516acfa962b2f!checked"
Or even
gsub("$","!", mystring, fixed = TRUE)
We need fixed = TRUE (or, in stringr, a pattern wrapped in fixed()) because by default the pattern is treated as a regex, and in a regex $ means the end of the string:
stringr::str_replace_all(mystring, pattern = stringr::fixed("$"),
replacement = "!")
Alternatively, we could escape it (\\$) or place it in square brackets ([$]), but fixed should be faster.
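To make the comparison concrete, here is a small base R sketch of the three ways to match a literal $. The short input string x is a made-up stand-in for the longer string in the question.

```r
x <- "id$HASHabc$checked"  # hypothetical short input

# 1. Treat the whole pattern as a literal string.
literal <- gsub("$", "!", x, fixed = TRUE)

# 2. Escape the regex metacharacter.
escaped <- gsub("\\$", "!", x)

# 3. Put it inside a character class, where "$" is not special.
bracketed <- gsub("[$]", "!", x)

print(literal)
# [1] "id!HASHabc!checked"
stopifnot(identical(literal, escaped), identical(literal, bracketed))

# Unescaped, "$" matches only the empty position at the end of the string,
# which reproduces the behaviour described in the question.
appended <- sub("$", "!", x)
print(appended)
# [1] "id$HASHabc$checked!"
```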

Split string after last underscore in R [duplicate]

This question already has answers here:
Separate string after last underscore
(2 answers)
Closed 2 years ago.
I have a string like "ABC_Something_Filename". How can I split it into "ABC_Something" and "Filename" in R?
I do not want to remove anything. I want both components - before and after last underscore.
Edit: I tried using what's mentioned for column separation, but that is too extensive for my use case. Hence, I am looking for a regex alternative to simply split a string.
One option would be to use strsplit with a negative lookahead which asserts that the underscore on which to split is the final one in the input:
input <- "ABC_Something_Filename"
parts <- strsplit(input, "_(?!.*_)", perl=TRUE)[[1]]
parts
[1] "ABC_Something" "Filename"
You can use str_match and capture data in two groups.
x <- 'ABC_Something_Filename'
stringr::str_match(x, '(.*)_(.*)')[, -1]
#[1] "ABC_Something" "Filename"
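If neither a lookahead nor stringr is wanted, the two pieces can also be obtained with two plain sub calls. This is only a sketch of an alternative; the names before and after are chosen here for illustration.

```r
x <- "ABC_Something_Filename"

# Remove the last underscore and the underscore-free text that follows it.
before <- sub("_[^_]*$", "", x)

# Greedy ".*_" consumes everything up to and including the last underscore.
after <- sub(".*_", "", x)

print(c(before, after))
# [1] "ABC_Something" "Filename"
```

Note that if the input contains no underscore, neither pattern matches and both calls return the input unchanged.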

How to extract substring from a string up to a certain occurrence of a character [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
How can I extract a substring from a string up to a certain occurrence of a character?
For example:
string <- 'test_file_csv_name.csv'
Up to the second occurrence of _
Expected output:
'test_file'
Using sub, we can capture the portion of the file name you want, removing the rest:
string <- 'test_file_csv_name.csv'
sub("^([^_]+_[^_]+).*$", "\\1", string)
[1] "test_file"
If you are asking how to keep a certain number of _, you can use:
sub("((.*?_){1}.*?)_.*","\\1",string)
Change {1} to the number of _ you would like to keep.
Besides using sub (e.g. sub("([^_]*_[^_]*).*", "\\1", string)), you can use substr with the position found by gregexpr:
substr(string, 1, gregexpr("_", string)[[1]][2]-1)
#[1] "test_file"
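The gregexpr approach generalises to any occurrence. The helper below, up_to_nth, is a hypothetical name introduced for this sketch; it handles a single string and returns the input unchanged when there are fewer than n occurrences.

```r
# Return the part of x before the n-th occurrence of ch.
# Works on a single string; x is returned as-is if ch occurs fewer than n times.
up_to_nth <- function(x, n, ch = "_") {
  pos <- gregexpr(ch, x, fixed = TRUE)[[1]]  # start positions of all matches
  if (pos[1] == -1 || length(pos) < n) {
    return(x)
  }
  substr(x, 1, pos[n] - 1)
}

string <- "test_file_csv_name.csv"
up_to_nth(string, 2)
# [1] "test_file"
up_to_nth(string, 1)
# [1] "test"
```

For a whole vector of strings, the helper could be applied element by element, for example with vapply.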

Replace a string with first few characters [duplicate]

This question already has answers here:
Regex group capture in R with multiple capture-groups
(9 answers)
Closed 2 years ago.
Let's say I have a string like:
Str = "#sometext_any_character_including_&**(_etc_blabla\\s"
Now I want to replace above text with
"#some\\s"
i.e. I just want to retain the leading #, the first 4 characters after it, and the trailing \\s. Is there any way to do this in R?
Any pointer will be highly appreciated.
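I would extract the parts using a regex. For example, in Python (rather than R), you can capture the leading # with the next four letters and the literal \s, then join the matches:
import re
# Example input: ends with a literal backslash followed by "s"
my_string = r"#sometext_any_character_including_&**(_etc_blabla\s"
# Match either "#" plus four lowercase letters, or a literal "\s"
pattern = re.compile(r"(#[a-z]{4}|\\s)")
my_match = "".join(pattern.findall(my_string))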
An option with sub
sub("^(#.{4}).*(\\\\s)$", "\\1\\2", Str)
#[1] "#some\\s"
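To see what the sub call keeps, it helps to remember that the R literal "\\s" is two characters, a backslash and an s. The sketch below repeats the call and compares it with a regex-free version built from substr and substring; it assumes, as in the question, that the string always ends with those two characters.

```r
Str <- "#sometext_any_character_including_&**(_etc_blabla\\s"

# Group 1: "#" plus the next four characters. Group 2: the literal "\s" at the end.
result <- sub("^(#.{4}).*(\\\\s)$", "\\1\\2", Str)
writeLines(result)  # shows the actual characters
# #some\s
nchar(result)
# [1] 7

# Same result without a regex: first five and last two characters.
no_regex <- paste0(substr(Str, 1, 5), substring(Str, nchar(Str) - 1))
identical(result, no_regex)
# [1] TRUE
```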
You can use
str_replace(string, pattern, replacement)
or
str_replace_all(string, pattern, replacement)

how to add a character to a string in R [duplicate]

This question already has answers here:
Insert a character at a specific location in a string
(8 answers)
Closed 6 years ago.
I have something like this:
text <- "abcdefg"
and I want something like this:
"abcde.fg"
How could I achieve this without assigning a new string to the vector text, but instead changing the element of the vector itself? Ultimately, I would like to insert at a random position, and not actually a dot but a character element of a vector.
We can use sub to capture the first 5 characters as a group ((.{5})), followed by zero or more characters in another capture group ((.*)), and then replace with the backreference to the first group (\\1), followed by a ., followed by the second backreference (\\2).
sub("(.{5})(.*)", "\\1.\\2", text)
#[1] "abcde.fg"
NOTE: This solution is direct and doesn't need to paste anything together.
Also, substring with paste will work:
paste(substring(text, c(1,6), c(5,7)), collapse=".")
"abcde.fg"
The substring function accepts vector start-stop arguments and "splits" the string at the desired locations. We can then paste these elements together using the collapse argument.
Without relying on the vector arguments, we could use the newer and recommended substr function:
paste(c(substr(text, 1, 5), substr(text, 6,7)), collapse=".")
[1] "abcde.fg"
Note that as mentioned by konrad-rudolph, this will create a copy of the vector.
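The question also asks about inserting a character taken from a vector at a random position. A minimal sketch of that follows; the helper name insert_at and the candidate vector chars are assumptions made for illustration. As with the approaches above, reassigning to text produces a new string rather than modifying the old one in place.

```r
# Insert ch after the first pos characters of x.
insert_at <- function(x, pos, ch) {
  paste0(substr(x, 1, pos), ch, substr(x, pos + 1, nchar(x)))
}

text <- "abcdefg"
insert_at(text, 5, ".")
# [1] "abcde.fg"

# Random variant: pick a position strictly inside the string
# and a character from a (hypothetical) vector of candidates.
set.seed(1)                        # only to make the example repeatable
chars <- c(".", "-", "#")
pos <- sample(nchar(text) - 1, 1)  # a value between 1 and 6
text <- insert_at(text, pos, sample(chars, 1))
print(text)
```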
