String between first two (.dots) - r

Hi, I have data that contains two or more dots. My requirement is to get the string between the first and second dot.
E.g. string <- "abcd.vdgd.dhdsg"
Expected result = vdgd
I have used
pt <-strapply(string, "\\.(.*)\\.", simplify = TRUE)
which gives the correct result, but for strings with more than two dots it does not work as expected.
e.g. string <- "abcd.vdgd.dhdsg.jsgs"
It gives dhdsg.jsgs, but the expected result is vdgd.
Could anyone help me?
Thanks & Regards,

In base R we can use strsplit
ss <- "abcd.vdgd.dhdsg"
unlist(strsplit(ss, "\\."))[2]
#[1] "vdgd"
Or using gregexpr with regmatches
unlist(regmatches(ss, gregexpr("[^\\.]+", ss)))[2]
#[1] "vdgd"
Or using gsub (thanks @TCZhang)
gsub("^.+?\\.(.+?)\\..*$", "\\1", ss)
#[1] "vdgd"
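The pattern in the question returns too much because .* is greedy and runs up to the last dot. As a minimal sketch (an alternative, not one of the answers above), a negated character class avoids this without relying on lazy quantifiers, and both variants below work on a whole vector. The variable names are illustrative.

```r
# Illustrative input: strings with two or more dots
x <- c("abcd.vdgd.dhdsg", "abcd.vdgd.dhdsg.jsgs")

# [^.]* cannot cross a dot, so the capture group stops at the second dot
by_regex <- sub("^[^.]*\\.([^.]*)\\..*$", "\\1", x)

# Split on a literal dot and take the second piece of each element
by_split <- vapply(strsplit(x, ".", fixed = TRUE), `[`, character(1), 2)

print(by_regex)
print(by_split)
```

Note that sub returns the input unchanged when a string has fewer than two dots, whereas the strsplit variant returns NA when there is no second piece.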

Another option:
string <- "abcd.vdgd.dhdsg.jsgs"
library(stringr)
str_extract(string = string, pattern = "(?<=\\.).*?(?=\\.)")
[1] "vdgd"
I like this one because the str_extract function will return the first instance of the correct pattern, but you could also use str_extract_all to get all instances.
str_extract_all(string = string, pattern = "(?<=\\.).*?(?=\\.)")
[[1]]
[1] "vdgd" "dhdsg"
From here, you could index to get any position between two dots you want.
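If you would rather not depend on stringr, the same lookaround pattern should also work in base R. This is a sketch assuming perl = TRUE, which is needed for lookbehind and lookahead; between_dots is an illustrative name.

```r
string <- "abcd.vdgd.dhdsg.jsgs"

# perl = TRUE enables lookbehind/lookahead in base R regular expressions
between_dots <- regmatches(
  string,
  gregexpr("(?<=\\.).*?(?=\\.)", string, perl = TRUE)
)[[1]]

print(between_dots)     # every piece that sits between two dots
print(between_dots[1])  # the piece between the first and second dot
```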

Another solution with the qdapRegex package:
library(qdapRegex)
ex_between("abcd.vdgd.dhdsg.jsgs", ".", ".")[[1]][1]
# "vdgd"

You can use read.table as well if you wish. Here the string from your problem is passed as text and a dot (".") is used as the separator. Once the string has been read into a data.frame, you can select whichever column you want (in this case, column number 2).
read.table(text=string, sep=".",stringsAsFactors = FALSE)[,2]
Output:
> read.table(text=string, sep=".",stringsAsFactors = FALSE)[,2]
[1] "vdgd"

Here is a fun easy way via stringr
stringr::word(string, 2, sep = '\\.')

Here are two options that are vectorized over the input string vector.
You can try tstrsplit from data.table (load it first with library(data.table)):
> string <- c("abcd.vdgd.dhdsg", "abcd.vdgd.dhdsg.jsgs")
> tstrsplit(string, '.', fixed = TRUE)[[2]]
[1] "vdgd" "vdgd"
or regex:
> sub('.*?\\.(.*?)\\..*', '\\1', string)
[1] "vdgd" "vdgd"

Related

append letter to a string in r

I have a vector:
c("BAAAVAST", "BAACEZ", "BAAGECBA", "LOL")
And I would like to remove "BAA" from the words that contain it. And to those words I would like to append ".PR".
Desired outcome:
c("AVAST.PR", "CEZ.PR", "GECBA.PR", "LOL")
Any ideas? Ideally using stringr. Thank you a lot.
You could use the following solution:
vec <- c("BAAAVAST", "BAACEZ", "BAAGECBA", "LOL")
gsub("BAA(.*)", "\\1.PR", vec)
[1] "AVAST.PR" "CEZ.PR" "GECBA.PR" "LOL"
You could use
library(stringr)
# optimized thanks to Anoushiravan
str_replace(c("BAAAVAST", "BAACEZ", "BAAGECBA", "LOL"), "BAA(\\w*)", "\\1.PR")
#> [1] "AVAST.PR" "CEZ.PR" "GECBA.PR" "LOL"
Use \\w* if you want to match word characters only, or .* if there are no restrictions on the characters.
This is more verbose than the other answers. It finds the strings containing 'BAA', removes 'BAA' from them, and appends '.PR'.
inds <- grepl('BAA', vec, fixed = TRUE)
vec[inds] <- paste(sub('BAA', '', vec[inds]), 'PR', sep = '.')
vec
#[1] "AVAST.PR" "CEZ.PR" "GECBA.PR" "LOL"
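The patterns above match BAA anywhere in a string. If only a leading BAA should count, anchoring the pattern makes that explicit. A small sketch of the difference, where "XBAAY" is a hypothetical extra element added purely for illustration:

```r
# "XBAAY" is a hypothetical element with BAA in the middle
vec <- c("BAAAVAST", "BAACEZ", "BAAGECBA", "LOL", "XBAAY")

# Unanchored: removes the first BAA wherever it occurs
anywhere <- sub("BAA(.*)", "\\1.PR", vec)

# Anchored: only touches strings that start with BAA
prefix_only <- sub("^BAA(.*)$", "\\1.PR", vec)

print(anywhere)
print(prefix_only)
```

Which one is right depends on whether "contain" in the question means "start with" or "contain anywhere".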

string split and interchange the position of string in R

I have a vector called myvec. I would like to split it at _ and interchange the position. What would be the simplest way to do this?
myvec <- c("08AD09144_NACC022453", "08AD8245_NACC657970")
Result I want:
NACC022453_08AD09144, NACC657970_08AD8245
You can do this with a regex that captures the two parts in groups and swaps them using backreferences.
myvec <- c("A1_B1", "B2_C1", "D1_A2")
sub('(\\w+)_(\\w+)', '\\2_\\1', myvec)
#[1] "B1_A1" "C1_B2" "A2_D1"
We can use strsplit from base R
sapply(strsplit(myvec, "_"), function(x) paste(x[2], x[1], sep = "_"))
#[1] "NACC022453_08AD09144" "NACC657970_08AD8245"
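If a string may contain more than two parts, one possible generalisation of the strsplit approach is to reverse all the pieces. This is a sketch, not part of the original answers; swapped is an illustrative name.

```r
myvec <- c("08AD09144_NACC022453", "08AD8245_NACC657970")

# Split on a literal underscore, reverse the pieces, and glue them back together
swapped <- vapply(
  strsplit(myvec, "_", fixed = TRUE),
  function(parts) paste(rev(parts), collapse = "_"),
  character(1)
)

print(swapped)
```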

Count number of dots in character string with str_count?

I am trying to count the number of dots in a character string.
I have tried to use str_count but it gives me the number of letters of the string instead.
ex_str <- "This.is.a.string"
str_count(ex_str, '.')
nchar(ex_str)
. is a special regex symbol that matches any character, so you need to escape it:
str_count(ex_str, '\\.')
# [1] 3
Using just base R you could do:
nchar(gsub("[^.]", "", ex_str))
Using stringi:
library(stringi)
stri_count_fixed(ex_str, '.')
Another base R solution could be:
length(grepRaw(".", ex_str, fixed = TRUE, all = TRUE))
[1] 3
You may also use the base function gregexpr:
sum(gregexpr(".", ex_str, fixed=TRUE)[[1]] > 0)
[1] 3
You can use stringr::str_count with a fixed(...) argument to avoid treating it as a regular expression:
str_count(ex_str, fixed('.'))
A complete example:
library(stringr)
ex_str <- "This.is.a.string"
str_count(ex_str, fixed('.'))
## => [1] 3
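To see why the unescaped pattern counted letters, it can help to compare the regex and literal interpretations side by side. Below is a base R sketch; count_matches is a hypothetical helper written for this illustration, and "nodots" is an extra test string.

```r
ex_str <- c("This.is.a.string", "nodots")

# Hypothetical helper: number of matches of `pattern` in each element of `x`
count_matches <- function(pattern, x, fixed = FALSE) {
  lengths(regmatches(x, gregexpr(pattern, x, fixed = fixed)))
}

as_regex   <- count_matches(".", ex_str)                # "." matches every character
as_literal <- count_matches(".", ex_str, fixed = TRUE)  # "." matches only real dots

print(as_regex)
print(as_literal)
```

The regex version gives the same numbers as nchar, which is exactly the symptom described in the question.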

How to get any string we want?

The string is as shown below:
s <- "12N10-3A 12N10-3A-1 12N10-3A-2 YB10L-A2"
I can get every string except the second one.
gsub("\\s.*","",s) #12N10-3A
gsub(".*\\s","",s) #YB10L-A2
gsub(".*\\s.*\\s(.*).*\\s(.*)","\\1",s) #12N10-3A-2
How can I get the second string from s, and is there a shorter approach for each line of code? I tried what I learnt on regex101.com.
We can use stri_extract_last from stringi
library(stringi)
stri_extract_last(s, regex = '\\S+')
#[1] "YB10L-A2"
Or use word from stringr
library(stringr)
word(s, 4)
#[1] "YB10L-A2"
Just use strsplit:
items <- strsplit(s, "\\s+")[[1]]
If you want to access the last item, then just use:
items[4]
[1] "YB10L-A2"
If you really wanted to isolate the last term using sub, then here is one way:
sub(".*\\s+", "", s)
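The answers above show the last item; the question also asks for the second one. A minimal sketch using the same two ideas (the variable names are illustrative):

```r
s <- "12N10-3A 12N10-3A-1 12N10-3A-2 YB10L-A2"

# Split on runs of whitespace, then index whichever position is needed
items  <- strsplit(s, "\\s+")[[1]]
second <- items[2]
last   <- items[length(items)]

# Regex alternative: skip the first token, capture the second, drop the rest
second_by_regex <- sub("^\\S+\\s+(\\S+).*$", "\\1", s)

print(second)
print(last)
print(second_by_regex)
```

Splitting once and indexing tends to be easier to read than writing a separate regex for each position.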

numeric sort a list of strings in R

I have a list:
a <- ["12file.txt", "8file.txt", "66file.txt"]
I would like to sort by number:
a would be: ["8file.txt", "12file.txt", "66file.txt"]
So far I can only get this:
a = ["12file.txt", "66file.txt", "8file.txt"]
Thanks
I'm assuming you have a character vector:
a <- c("12file.txt", "8file.txt", "66file.txt")
I would approach this by pulling out the number at the start of each string and sorting on that:
num <- as.numeric(sub("([0-9]+).*", "\\1", a))
a[order(num)]
#[1] "8file.txt" "12file.txt" "66file.txt"
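You could also left-pad your strings with spaces by giving sprintf a field width, so that a plain character sort gives the order you want (this works here because the part after the number has the same length in every string):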
a[order(sprintf("%10s",a))]
[1] "8file.txt" "12file.txt" "66file.txt"
You can use str_sort(..., numeric = TRUE) function from stringr package:
library(stringr)
a <- c("12file.txt", "8file.txt", "66file.txt")
str_sort(a, numeric = TRUE)
#> [1] "8file.txt" "12file.txt" "66file.txt"
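When several names share the same leading number, the numeric key alone leaves their relative order to the input. One possible refinement of the base R approach is to pass the name itself as a second sort key. In this sketch "8data.txt" is a hypothetical extra file added to show the tie-break.

```r
# "8data.txt" is a hypothetical extra entry that ties with "8file.txt" on the number
a <- c("12file.txt", "8file.txt", "66file.txt", "8data.txt")

# Leading digits as the primary key, the full name as the tie-breaker
num <- as.numeric(sub("^([0-9]+).*$", "\\1", a))
sorted <- a[order(num, a)]

print(sorted)
```

Strings without a leading number would become NA (with a warning) and, with order's default na.last = TRUE, end up at the end.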

Resources