In R, how to remove everything before the last slash

I have a dataset say
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')
I'd like to remove everything before (and including) the last slash. The result should look like
my cat is handsome
I googled this code which gives me everything before the last slash
gsub('(.*)/\\w+', '\\1', x)
[1] "test/test" "et/tom" "set/eat" "sk / handsome"
How can I change this code, so that the other part of the string after the last slash can be shown?
Thanks

You can use basename:
paste(trimws(basename(x)),collapse=" ")
# [1] "my cat is handsome"

Using strsplit
> sapply(strsplit(x, "/\\s*"), tail, 1)
[1] "my" "cat" "is" "handsome"
Another way for gsub
> gsub("(.*/\\s*(.*$))", "\\2", x) # without 'unwanted' spaces
[1] "my" "cat" "is" "handsome"
Using str_extract from stringr package
> library(stringr)
> str_extract(x, "\\w+$") # without 'unwanted' spaces
[1] "my" "cat" "is" "handsome"

You can basically just move where the parentheses are in the regex you already found:
gsub('.*/ ?(\\w+)', '\\1', x)

You could use
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')
gsub('^(?:[^/]*/)*\\s*(.*)', '\\1', x)
Which yields
[1] "my" "cat" "is" "handsome"
To have it in one sentence, you could paste it:
(paste0(gsub('^(?:[^/]*/)*\\s*(.*)', '\\1', x), collapse = " "))
The pattern here is:
^ # start of the string
(?:[^/]*/)* # any non-slash characters followed by a slash, 0+ times
\\s* # optional whitespace
(.*) # capture the rest of the string
The match is replaced by \\1, i.e. the content of the first capture group.
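For comparison, a shorter base R variant is sketched below. It is not taken from the answers above; it assumes you only want to delete everything up to and including the last slash, plus any whitespace directly after it. Because .* is greedy, it runs to the last slash in each string.

```r
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')

# Greedy .* consumes up to the LAST slash; \\s* also drops spaces after it.
# sub() is enough because each string needs only one replacement.
last_part <- sub('.*/\\s*', '', x)
last_part
# [1] "my" "cat" "is" "handsome"

# Join the pieces into one sentence, as asked in the question
sentence <- paste(last_part, collapse = " ")
sentence
# [1] "my cat is handsome"
```

Strings without a slash come back unchanged, since sub leaves non-matching input as it is.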

Related

Regex to add comma between any character

I'm relatively new to regex, so bear with me if the question is trivial. I'd like to place a comma between every letter of a string using regex, e.g.:
x <- "ABCD"
I want to get
"A,B,C,D"
It would be nice if I could do that using gsub, sub or related on a vector of strings of arbitrary number of characters.
I tried
> sub("(\\w)", "\\1,", x)
[1] "A,BCD"
> gsub("(\\w)", "\\1,", x)
[1] "A,B,C,D,"
> gsub("(\\w)(\\w{1})$", "\\1,\\2", x)
[1] "ABC,D"
Try:
x <- 'ABCD'
gsub('\\B', ',', x, perl = TRUE)
Prints:
[1] "A,B,C,D"
I might have misread the query; the OP is looking to add commas between letters only. Therefore, try:
gsub('(\\p{L})(?=\\p{L})', '\\1,', x, perl = TRUE)
(\p{L}) - Match any kind of letter from any language in a 1st group;
(?=\p{L}) - Positive lookahead to match as per above.
We can use the backreference to this capture group in the replacement.
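The two patterns above agree on "ABCD" but differ once digits are involved: \B sits between any two word characters, while \p{L} only matches letters. A small sketch, using a made-up input "AB12" that is not part of the question:

```r
y <- "AB12"  # hypothetical input mixing letters and digits

# \\B matches between any two word characters (letters, digits, underscore)
with_B <- gsub('\\B', ',', y, perl = TRUE)
with_B
# [1] "A,B,1,2"

# \\p{L} restricts the match to positions between two letters
with_L <- gsub('(\\p{L})(?=\\p{L})', '\\1,', y, perl = TRUE)
with_L
# [1] "A,B12"
```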
You can use
> gsub("(.)(?=.)", "\\1,", x, perl=TRUE)
[1] "A,B,C,D"
The (.)(?=.) regex matches any char, capturing it into Group 1 (with (.)), that must be followed by any single char ((?=.) is a positive lookahead that requires a char immediately to the right of the current location).
Variations of the solution:
> gsub("(.)(?!$)", "\\1,", x, perl=TRUE)
## Or with stringr:
## stringr::str_replace_all(x, "(.)(?!$)", "\\1,")
[1] "A,B,C,D"
Here, (?!$) is a negative lookahead that fails the match if the end of the string is immediately to the right of the current location.
See the R demo online:
x <- "ABCD"
gsub("(.)(?=.)", "\\1,", x, perl=TRUE)
# => [1] "A,B,C,D"
gsub("(.)(?!$)", "\\1,", x, perl=TRUE)
# => [1] "A,B,C,D"
stringr::str_replace_all(x, "(.)(?!$)", "\\1,")
# => [1] "A,B,C,D"
A regex-free answer:
paste(strsplit(x, "")[[1]], collapse = ",")
#[1] "A,B,C,D"
Another option is to use positive look behind and look ahead to assert there is a preceding and a following character:
library(stringr)
str_replace_all(x, "(?<=.)(?=.)", ",")
[1] "A,B,C,D"
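The question asks for something that works on a vector of strings of arbitrary length. Since gsub is vectorised, the lookahead pattern can be wrapped as is. The helper name insert_commas below is hypothetical, chosen only for this sketch:

```r
# insert_commas is a hypothetical helper name, not from any package
insert_commas <- function(strings) {
  # Same lookahead idea as above: add a comma after every character
  # that is followed by another character
  gsub("(.)(?=.)", "\\1,", strings, perl = TRUE)
}

result <- insert_commas(c("ABCD", "xy", "Z", ""))
result
# [1] "A,B,C,D" "x,y" "Z" ""
```

Single-character and empty strings pass through unchanged because the lookahead never succeeds on them.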

String after and before character

I have this string
x = "Hello how are you Peter /"
And I would like to get only
x = "Peter"
I would like to find a pattern that extracts only the word after "you" and before "/" (excluded).
I would like to use something like
x = sub(" you*/.", "", x)
But I don't know how to write the pattern correctly.
gsub(".*you (.*) /$", "\\1", x)
library(stringr)
str_match(x, "you\\s*(.*?)\\s*\\/")[, 2]
#[1] "Peter"
With lookahead and lookbehind:
library(stringr)
x = "Hello how are you Peter /"
str_extract(x,"(?<=you )\\w+(?= /)")
[1] "Peter"
If you want to be more robust to spaces (for example, the pattern above fails when there is no space after the name):
str_extract(x,"(?<=you)[\\w ]+(?=/)") %>%
trimws()
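The same space-tolerant idea can be sketched in base R without stringr. This is one possible variant, not taken from the answers above: it lets \s* absorb optional spaces on both sides of the captured text.

```r
x <- "Hello how are you Peter /"

# Capture whatever sits between "you" and the following "/",
# letting \\s* absorb optional spaces on either side of the name.
with_space <- sub(".*you\\s*(.*?)\\s*/.*", "\\1", x, perl = TRUE)
with_space
# [1] "Peter"

# Also works when there is no space before the slash
no_space <- sub(".*you\\s*(.*?)\\s*/.*", "\\1", "Hello how are you Peter/", perl = TRUE)
no_space
# [1] "Peter"
```

Note that sub returns the input unchanged when the pattern does not match, so a string without "you" or "/" would come back as is.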

How to extract everything until first occurrence of pattern

I'm trying to use the stringr package in R to extract everything from a string up until the first occurrence of an underscore.
What I've tried
str_extract("L0_123_abc", ".+?(?<=_)")
> "L0_"
Close, but no cigar. How do I get this one? Also, ideally I'd like something that's easy to extend so that I can get the information between the 1st and 2nd underscore and the information after the 2nd underscore.
To get L0, you may use
> library(stringr)
> str_extract("L0_123_abc", "[^_]+")
[1] "L0"
The [^_]+ matches 1 or more chars other than _.
Also, you may split the string with _:
x <- str_split("L0_123_abc", fixed("_"))
> x
[[1]]
[1] "L0" "123" "abc"
This way, you will have all the substrings you need.
The same can be achieved with
> str_extract_all("L0_123_abc", "[^_]+")
[[1]]
[1] "L0" "123" "abc"
The regex lookaround should be
str_extract("L0_123_abc", ".+?(?=_)")
#[1] "L0"
Using gsub...
gsub("(.+?)(\\_.*)", "\\1", "L0_123_abc")
You can use sub from base R with the pattern _.*, which matches everything from the first _ onward:
sub("_.*", "", "L0_123_abc")
#[1] "L0"
Or use [^_], which matches any character except _:
sub("([^_]*).*", "\\1", "L0_123_abc")
#[1] "L0"
Or use substr with regexpr:
substr("L0_123_abc", 1, regexpr("_", "L0_123_abc")-1)
#substr("L0_123_abc", 1, regexpr("_", "L0_123_abc", fixed=TRUE)-1) #More performant alternative
#[1] "L0"
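The question also asks for something easy to extend to the second and third fields. A minimal base R sketch of the split-and-index approach follows; the second input string and the helper name nth_piece are made up for illustration.

```r
ids <- c("L0_123_abc", "L1_456_def")  # second element is a made-up example

# Split each string on the literal underscore
parts <- strsplit(ids, "_", fixed = TRUE)

# nth_piece is a hypothetical helper: pick the n-th piece from every string
nth_piece <- function(parts, n) vapply(parts, `[`, character(1), n)

first <- nth_piece(parts, 1)
first
# [1] "L0" "L1"
second <- nth_piece(parts, 2)
second
# [1] "123" "456"
third <- nth_piece(parts, 3)
third
# [1] "abc" "def"
```

A string with fewer than n pieces yields NA for that position rather than an error.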

Splitting CamelCase in R

Is there a way to split camel case strings in R?
I have attempted:
string.to.split = "thisIsSomeCamelCase"
unlist(strsplit(string.to.split, split="[A-Z]") )
# [1] "this" "s" "ome" "amel" "ase"
string.to.split = "thisIsSomeCamelCase"
gsub("([A-Z])", " \\1", string.to.split)
# [1] "this Is Some Camel Case"
strsplit(gsub("([A-Z])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
Looking at Ramnath's answer and mine, I can say that my initial impression that this was an underspecified question has been supported.
And give Tommy and Ramnath upvotes for pointing out [:upper:]:
strsplit(gsub("([[:upper:]])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
Here is one way to do it
split_camelcase <- function(...){
  strings <- unlist(list(...))
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  return(strsplit(tolower(strings), " ")[[1]])
}
split_camelcase("thisIsSomeGood")
# [1] "this" "is" "some" "good"
Here's an approach using a single regex (a Lookahead and Lookbehind):
strsplit(string.to.split, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
## [[1]]
## [1] "this" "Is" "Some" "Camel" "Case"
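Because strsplit is vectorised, the same lookaround pattern can be applied to several strings at once. A short sketch, with a hypothetical wrapper name split_camel and a made-up second input:

```r
# split_camel is a hypothetical wrapper name used only for this sketch
split_camel <- function(strings) {
  # Split at every position preceded by a lower-case letter
  # and followed by an upper-case letter
  strsplit(strings, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
}

words <- split_camel(c("thisIsSomeCamelCase", "anotherExample"))
words[[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
words[[2]]
# [1] "another" "Example"
```

The result is a list with one character vector per input string, so no [[1]] shortcut is hard-coded into the function.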
Here is a one-liner using the gsubfn package's strapply. The regular expression matches the beginning of the string (^) followed by one or more lower case letters ([[:lower:]]+) or (|) an upper case letter ([[:upper:]]) followed by zero or more lower case letters ([[:lower:]]*) and processes the matched strings with c (which concatenates the individual matches into a vector). As with strsplit it returns a list so we take the first component ([[1]]) :
library(gsubfn)
strapply(string.to.split, "^[[:lower:]]+|[[:upper:]][[:lower:]]*", c)[[1]]
## [1] "this" "Is" "Some" "Camel" "Case"
I think my other answer is better than the following, but if only a one-liner to split is needed, here we go:
library(snakecase)
unlist(strsplit(to_parsed_case(string.to.split), "_"))
#> [1] "this" "Is" "Some" "Camel" "Case"
The beginning of an answer is to split all the characters:
sp.x <- strsplit(string.to.split, "")
Then find which string positions are upper case:
ind.x <- lapply(sp.x, function(x) which(!tolower(x) == x))
Then use that to split out each run of characters . . .
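One possible way to finish this idea is sketched below. It is an assumption about where the answer was heading, not the author's own code: each upper-case letter starts a new group, and the characters in each group are pasted back together.

```r
string.to.split <- "thisIsSomeCamelCase"

# Split into single characters
chars <- strsplit(string.to.split, "")[[1]]

# TRUE wherever the character is upper case
is_upper <- tolower(chars) != chars

# Running count of upper-case letters gives one group id per word
group <- cumsum(is_upper)

# Paste the characters of each group back together
words <- unname(vapply(split(chars, group), paste, character(1), collapse = ""))
words
# [1] "this" "Is" "Some" "Camel" "Case"
```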
Here is an easy solution via snakecase plus some tidyverse helpers:
install.packages("snakecase")
library(snakecase)
library(magrittr)
library(stringr)
library(purrr)
string.to.split = "thisIsSomeCamelCase"
to_parsed_case(string.to.split) %>%
  str_split(pattern = "_") %>%
  purrr::flatten_chr()
#> [1] "this" "Is" "Some" "Camel" "Case"
GitHub link to snakecase: https://github.com/Tazinho/snakecase
