String after and before character - r

I have this string
x = "Hello how are you Peter /"
And I would like to get only
x = "Peter"
I would like to find a pattern that extracts only the word after "you" and before "/" (excluded)
I would like to use something like
x = sub(" you*/.", "", x)
But I don't know how to write the pattern correctly.

gsub(".*you (.*) /$", "\\1", x)

library(stringr)
str_match(x, "you\\s*(.*?)\\s*\\/")[, 2]
#[1] "Peter"

With lookahead and lookbehind:
library(stringr)
x = "Hello how are you Peter /"
str_extract(x,"(?<=you )\\w+(?= /)")
[1] "Peter"
To be a bit more robust to spaces (for example, when there may or may not be a space after the name, in which case the pattern above fails):
str_extract(x,"(?<=you)[\\w ]+(?=/)") %>%
trimws()
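The lookaround answers above rely on stringr. A minimal base R sketch of the same idea follows, assuming the wanted text never contains the closing delimiter itself. The helper name extract_between is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not from the original answers): return the text between
# the first `left` and the next `right`, trimmed of surrounding whitespace.
# Assumes `left` and `right` contain no regex metacharacters and that `right`
# is a single character that is safe inside a character class.
extract_between <- function(x, left, right) {
  pattern <- paste0("(?<=", left, ")[^", right, "]+(?=", right, ")")
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- trimws(regmatches(x, m))
  out
}

x <- c("Hello how are you Peter /", "Hello how are you Mary/", "no match here")
res <- extract_between(x, "you", "/")
print(res)
# [1] "Peter" "Mary"  NA
```

Returning NA for strings without a match keeps the result the same length as the input, which is usually what you want when the result goes into a data frame column.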

Related

Regex to add comma between any character

I'm relatively new to regex, so bear with me if the question is trivial. I'd like to place a comma between every letter of a string using regex, e.g.:
x <- "ABCD"
I want to get
"A,B,C,D"
It would be nice if I could do that using gsub, sub or related on a vector of strings of arbitrary number of characters.
I tried
> sub("(\\w)", "\\1,", x)
[1] "A,BCD"
> gsub("(\\w)", "\\1,", x)
[1] "A,B,C,D,"
> gsub("(\\w)(\\w{1})$", "\\1,\\2", x)
[1] "ABC,D"
Try:
x <- 'ABCD'
gsub('\\B', ',', x, perl = TRUE)
Prints:
[1] "A,B,C,D"
I might have misread the question; the OP wants to add commas between letters only. In that case, try:
gsub('(\\p{L})(?=\\p{L})', '\\1,', x, perl = TRUE)
(\p{L}) - Match any kind of letter from any language and capture it in a 1st group;
(?=\p{L}) - Positive lookahead requiring that another letter follows.
We can use the backreference to this capture group in the replacement.
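To make the "letters only" distinction concrete, here is a small sketch comparing the \p{L} pattern with a pattern that accepts any character. The test strings are invented for illustration.

```r
x <- c("ABCD", "AB12", "A B")

# Comma only between two adjacent letters
letters_only <- gsub("(\\p{L})(?=\\p{L})", "\\1,", x, perl = TRUE)

# Comma after every character that is followed by another character
any_char <- gsub("(.)(?=.)", "\\1,", x, perl = TRUE)

print(letters_only)
# [1] "A,B,C,D" "A,B12"   "A B"
print(any_char)
# [1] "A,B,C,D" "A,B,1,2" "A, ,B"
```

Both give the same result for purely alphabetic input; they only differ once digits or spaces appear.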
You can use
> gsub("(.)(?=.)", "\\1,", x, perl=TRUE)
[1] "A,B,C,D"
The (.)(?=.) regex matches any char, capturing it into Group 1 with (.), and requires that it be followed by another char: (?=.) is a positive lookahead that requires a char immediately to the right of the current location.
Variations of the solution:
> gsub("(.)(?!$)", "\\1,", x, perl=TRUE)
## Or with stringr:
## stringr::str_replace_all(x, "(.)(?!$)", "\\1,")
[1] "A,B,C,D"
Here, (?!$) is a negative lookahead that fails the match if the current position is the end of the string.
See the R demo online:
x <- "ABCD"
gsub("(.)(?=.)", "\\1,", x, perl=TRUE)
# => [1] "A,B,C,D"
gsub("(.)(?!$)", "\\1,", x, perl=TRUE)
# => [1] "A,B,C,D"
stringr::str_replace_all(x, "(.)(?!$)", "\\1,")
# => [1] "A,B,C,D"
A non-regex answer:
paste(strsplit(x, "")[[1]], collapse = ",")
#[1] "A,B,C,D"
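The question asks for a vector of strings, while the strsplit line above handles only the first element. A possible vectorised variant, sketched with made-up input:

```r
x <- c("ABCD", "xy", "Q")

# strsplit() returns one character vector per input string;
# vapply() pastes each of them back together with commas.
res <- vapply(strsplit(x, ""), paste, character(1), collapse = ",")
print(res)
# [1] "A,B,C,D" "x,y"     "Q"
```

vapply is used instead of sapply so the result is guaranteed to be a character vector with one element per input string.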
Another option is to use a positive lookbehind and lookahead to assert that there is both a preceding and a following character:
library(stringr)
str_replace_all(x, "(?<=.)(?=.)", ",")
[1] "A,B,C,D"

A regex to remove the pattern "[0-9]g"

I have the following sample dataset:
XYZ 185g
ABC 60G
Gha 20g
How do I remove the strings "185g", "60G", "20g" without accidentally removing the letters g and G in the main words?
I tried the code below, but it replaces the letters in the main words as well.
a <- str_replace_all(a$words,"[0-9]"," ")
a <- str_replace_all(a$words,"[gG]"," ")
You need to combine them into something like
a$words <- str_replace_all(a$words,"\\s*\\d+[gG]$", "")
The \s*\d+[gG]$ regex matches
\s* - zero or more whitespaces
\d+ - one or more digits
[gG] - g or G
$ - end of string.
If you can have these strings inside a string, not just at the end, you may use
a$words <- str_replace_all(a$words,"\\s*\\d+[gG]\\b", "")
where $ is replaced with a \b, a word boundary.
To ignore case,
a$words <- str_replace_all(a$words, regex("\\s*\\d+g\\b", ignore_case=TRUE), "")
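The difference between anchoring with $ and with \b only shows up when the weight is not the last token. A base R sketch with gsub, using invented sample strings rather than the original data:

```r
words <- c("XYZ 185g", "ABC 60G pack", "Egg 20g")

# Anchored at the end of the string: a weight in the middle is left alone
at_end <- gsub("\\s*\\d+[gG]$", "", words, perl = TRUE)

# Word boundary instead of $: the weight is removed wherever it occurs
anywhere <- gsub("\\s*\\d+[gG]\\b", "", words, perl = TRUE)

print(at_end)
# [1] "XYZ"          "ABC 60G pack" "Egg"
print(anywhere)
# [1] "XYZ"      "ABC pack" "Egg"
```

In both cases the g's in "Egg" survive, because the pattern requires at least one digit directly before the g.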
You can try
> gsub("\\s\\d+g$", "", c("XYZ 185g", "ABC 60G", "Gha 20g"), ignore.case = TRUE)
[1] "XYZ" "ABC" "Gha"
You can also use the following solution:
vec <- c("XYZ 185g", "ABC 60G", "Gha 20g")
gsub("[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+", "", vec, perl = TRUE)
[1] "XYZ" "ABC" "Gha"

In R, how to remove everything before the last slash

I have a dataset say
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')
I'd like to remove everything before (including) the last slash, the result should look like
my cat is handsome
I googled this code which gives me everything before the last slash
gsub('(.*)/\\w+', '\\1', x)
[1] "test/test" "et/tom" "set/eat" "sk / handsome"
How can I change this code, so that the other part of the string after the last slash can be shown?
Thanks
You can use basename:
paste(trimws(basename(x)),collapse=" ")
# [1] "my cat is handsome"
Using strsplit
> sapply(strsplit(x, "/\\s*"), tail, 1)
[1] "my" "cat" "is" "handsome"
Another way for gsub
> gsub("(.*/\\s*(.*$))", "\\2", x) # without 'unwanted' spaces
[1] "my" "cat" "is" "handsome"
Using str_extract from stringr package
> library(stringr)
> str_extract(x, "\\w+$") # without 'unwanted' spaces
[1] "my" "cat" "is" "handsome"
You can basically just move where the parentheses are in the regex you already found:
gsub('.*/ ?(\\w+)', '\\1', x)
You could use
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')
gsub('^(?:[^/]*/)*\\s*(.*)', '\\1', x)
Which yields
[1] "my" "cat" "is" "handsome"
To have it in one sentence, you could paste it:
(paste0(gsub('^(?:[^/]*/)*\\s*(.*)', '\\1', x), collapse = " "))
The pattern here is:
^ # start of the string
(?:[^/]*/)* # not a slash, followed by a slash, 0+ times
\\s* # optional whitespace
(.*) # capture the rest of the string
This is replaced by \\1, hence the content of the first captured group.
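Since .* is greedy, a shorter pattern can reach the last slash without an explicit group. A minimal sketch of that variant, using the sample data from the question:

```r
x <- c("test/test/my", "et/tom/cat", "set/eat/is", "sk / handsome")

# ".*" is greedy, so ".*/" consumes everything up to and including the LAST slash;
# trimws() then drops the space left over in "sk / handsome".
last_part <- trimws(sub(".*/", "", x))
print(last_part)
# [1] "my"       "cat"      "is"       "handsome"

sentence <- paste(last_part, collapse = " ")
print(sentence)
# [1] "my cat is handsome"
```

A string without any slash is returned unchanged, because sub leaves non-matching input as it is.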

Extract characters between specified characters in R

I have this variable
x= "379_exp_mirror1.csv"
I need to extract the number ("379") at the beginning (which doesn't always have 3 characters), i.e. everything before the first "_". And then I need to extract everything between the second "_" and the ".", in this case "mirror1".
I have tried several combinations with sub and gsub with no success, can anyone give me some indications please?
Thank you
You can use a regular expression. For your problem, ^(?<Number>[0-9]*)_.* does the job.
You can test your regular expression with this website: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
Or you can split the string on the underscore and then try to parse the first part (int.TryParse in .NET). I think the second approach is better, but if you want to become a regular expression master, try the first one.
You can use sub to extract the substrings:
x <- "379_exp_mirror1.csv"
sub("_.*", "", x)
# [1] "379"
sub("^(?:.*_){2}(.*?)\\..*", "\\1", x)
# [1] "mirror1"
Another approach with gregexpr:
regmatches(x, gregexpr("^.*?(?=_)|(?<=_)[^_]*?(?=\\.)", x, perl = TRUE))[[1]]
# [1] "379" "mirror1"
Maybe you can try:
library(stringr)
x <- "379_exp_mirror1.csv"
# perl() is deprecated in stringr; the default regex engine (ICU) supports lookaheads
str_extract_all(x, '^[0-9]+(?=_)|[[:alnum:]]+(?=\\.)')[[1]]
#[1] "379" "mirror1"
Or
strsplit(x, "[._]")[[1]][c(TRUE, FALSE)]
#[1] "379" "mirror1"
Or
scan(text=gsub("[.]","_", x),what="",sep="_")[c(TRUE, FALSE)]
#Read 4 items
#[1] "379" "mirror1"
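If the input is always a file name of the form number_something_label.ext, a regex-free sketch is also possible. It assumes that the extension is alphanumeric and that the wanted parts are the first and third underscore-separated fields:

```r
x <- "379_exp_mirror1.csv"

# Drop the extension, then split the remaining stem on literal underscores
stem <- tools::file_path_sans_ext(x)
parts <- strsplit(stem, "_", fixed = TRUE)[[1]]

number <- parts[1]
label <- parts[3]
print(c(number, label))
# [1] "379"     "mirror1"
```

Picking the fields by position makes the assumption about the file name layout explicit, whereas the c(TRUE, FALSE) recycling trick silently takes every other field.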
