How to calculate the number of elements in a string [duplicate] - r

I have many strings whose elements are separated by a hyphen (-):
string<-c("aaa","aaa-bbb","aaa-bbb-ccc","aaa-bbb-ccc-ddd")
I want to calculate the number of elements in each string. The expected vector is
[1] 1 2 3 4

Does this work?
sapply(strsplit(string, split = '-'), length)
[1] 1 2 3 4
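The same count can also be had with lengths(), which returns the length of each list element directly. A minimal sketch, assuming R 3.2.0 or later (where lengths() is available) and that the hyphen should be treated as a literal character rather than a regular expression:

```r
string <- c("aaa", "aaa-bbb", "aaa-bbb-ccc", "aaa-bbb-ccc-ddd")

# fixed = TRUE treats "-" as a literal separator, not a regex
parts <- strsplit(string, "-", fixed = TRUE)

# lengths() gives the number of pieces per string
n_elements <- lengths(parts)
n_elements
#[1] 1 2 3 4
```

Note that strsplit() does not produce an extra empty element for a trailing separator, so "aaa-" would count as one element with this approach.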


How would you extract the last 3 characters from a string in a column of an R data frame? [duplicate]

WH-LKO
WH-CHE
WH-BLR
WH-BLR
WH-HYD
W1- GGN
WH12-GGN
WH3-GUW
F2-AMD
You can try substring with nchar to extract the last 3 characters from a string.
substring(x, nchar(x)-2)
#[1] "LKO" "CHE"
Data:
x <- c("WH-LKO", "WH-CHE")
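Since the question asks about a data frame column, here is a minimal sketch of the same idea applied to a column. The data frame `df` and the column names `code` and `last3` are assumptions for illustration; the values are taken from the question.

```r
# Hypothetical data frame; the column name `code` is an assumption
df <- data.frame(code = c("WH-LKO", "WH-CHE", "W1- GGN", "WH12-GGN", "F2-AMD"),
                 stringsAsFactors = FALSE)

# Start at nchar - 2 so that exactly the last 3 characters are kept
df$last3 <- substring(df$code, nchar(df$code) - 2)
df$last3
#[1] "LKO" "CHE" "GGN" "GGN" "AMD"
```

Because the start position is computed per element, this works even though the prefixes before the hyphen differ in length.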

Need to extract only those numbers that are 5 to 6 digits long [duplicate]

I need to extract only the 5- or 6-digit numbers from a string. For example: "hi 23456678 is number, also there is a number 92844 and 741653 "
I tried \d{5,6}, but it gives me (23456, 92844, 741653), whereas my desired outcome is only 92844 and 741653. How can I get that?
I am using R. Please suggest.
You can try this
^[0-9]{5,6}$
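{5,6} = the preceding token (here a digit) repeated 5 to 6 times. Note that the ^ and $ anchors make this pattern match only when the entire string is a 5- or 6-digit number.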
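To pull the numbers out of a longer sentence like the one in the question, one option is to use word boundaries instead of string anchors. A minimal sketch, assuming the numbers are delimited by spaces or punctuation; the variable names `s`, `pattern` and `nums` are just for illustration:

```r
s <- "hi 23456678 is number, also there is a number 92844 and 741653 "

# \\b is a word boundary, so a run of 5-6 digits cannot be part of a longer run
pattern <- "\\b[0-9]{5,6}\\b"

# gregexpr() locates every match; regmatches() extracts the matched text
nums <- regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
nums
#[1] "92844"  "741653"
```

If digits can sit directly next to letters (as in "abc12345"), \\b will not fire there; lookarounds such as "(?<![0-9])[0-9]{5,6}(?![0-9])" with perl = TRUE may suit that case better.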

Cutting values in a vector at given positions [duplicate]

Is there a function in R that can cut a value in a vector?
For example, I have this vector:
40754831597
64278107602
64212163451
From each value in the vector, I want to cut out, for example, the digits from position 3 to 6 and get a new vector that looks like this:
7548
2781
2121
and so on
I don't really get why you would like to do this, but here you go:
# assuming it's a character vector
substring(vec,3,6)
# if it's numeric
substring(as.character(vec),3,6)
#output
#[1] "7548" "2781" "2121"
We can use sub
sub(".{2}(.{4}).*", "\\1", v1)
#[1] "7548" "2781" "2121"
data
v1 <- c(40754831597, 64278107602, 64212163451)
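One caveat with numeric input: as.character() can switch to scientific notation for some large values (for example, 1e15 becomes "1e+15"), which would shift the positions. A minimal sketch of a more defensive conversion, assuming the values are whole numbers; the names `chr` and `middle` are just for illustration:

```r
v1 <- c(40754831597, 64278107602, 64212163451)

# scientific = FALSE writes the digits out in full;
# trim = TRUE suppresses padding to a common width
chr <- format(v1, scientific = FALSE, trim = TRUE)

# substr() takes inclusive start and stop positions
middle <- substr(chr, 3, 6)
middle
#[1] "7548" "2781" "2121"
```

Wrap the result in as.numeric() if numbers rather than strings are needed.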

R: Extracting non-duplicated values from vector (not keeping one value for duplicates) [duplicate]

I would like to keep the non-duplicated values from a vector, but without retaining one element from duplicated values. unique() does not work for this. Neither would duplicated().
For example:
> test <- c(1,1,2,3,4,4,4,5,6,6,7,8,9,9)
> unique(test)
[1] 1 2 3 4 5 6 7 8 9
Whereas I would like the result to be: 2,3,5,7,8
Any ideas on how to approach this? Thank you!
We can use duplicated
test[!(duplicated(test)|duplicated(test, fromLast=TRUE))]
#[1] 2 3 5 7 8
You can use ave to count the length of sub-groups divided by unique values in test and retain only the ones whose length is 1 (the ones that have no duplicates)
test[ave(test, test, FUN = length) == 1]
#[1] 2 3 5 7 8
If test is a character vector, use seq_along as the first argument of ave:
test[ave(seq_along(test), test, FUN = length) == 1]
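Another way to see the same logic is to count occurrences with table() and keep only the values that occur once. A minimal sketch, not taken from the answers above; the names `counts`, `singletons` and `result` are just for illustration:

```r
test <- c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9)

# table() counts how often each distinct value occurs
counts <- table(test)

# names(counts) are character, so compare against as.character(test)
singletons <- names(counts)[counts == 1]
result <- test[as.character(test) %in% singletons]
result
#[1] 2 3 5 7 8
```

This keeps the original order and type of test, since the table is only used to build the filter.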

R - count number of items in a piped list [duplicate]

I have a column with a piped list of identifiers
Identifier
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786
P28066|P60900|O14818|P20618|P40306
Q99436|P28062|P28065
P28062|P28065|P62191|P35998|P17980|P43686
How do I produce a column of the numbers of identifiers in each row?
The output should read something like this:
Identifier Count
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786 7
P28066|P60900|O14818|P20618|P40306 5
Q99436|P28062|P28065 3
P28062|P28065|P62191|P35998|P17980|P43686 6
Thanks in advance!
sapply(strsplit(df$Identifier, '[|]'), length)
To count only the unique identifiers, add the unique function:
sapply(strsplit(df$Identifier, '[|]'), function(i) length(unique(i)))
A base R option without splitting would be
df1$Count <- nchar(gsub("[^|]", "", df1$Identifier)) + 1L
df1$Count
#[1] 7 5 3 6
Or with gregexpr
sapply(gregexpr("[|]", df1$Identifier),
       function(x) sum(attr(x, "match.length")) + 1)
#[1] 7 5 3 6
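One edge case with the gregexpr version: for a row that contains no pipe at all, gregexpr reports a match.length of -1, so sum(...) + 1 gives 0 rather than 1. A minimal sketch that sidesteps this by counting the extracted matches instead, assuming Identifier is a character column; the last row is a hypothetical single-identifier case added for illustration:

```r
# Rows from the question plus one hypothetical single-identifier row
df1 <- data.frame(
  Identifier = c("O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786",
                 "P28066|P60900|O14818|P20618|P40306",
                 "Q99436|P28062|P28065",
                 "P28062|P28065|P62191|P35998|P17980|P43686",
                 "P28062"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches a literal "|" without any regex escaping
m <- gregexpr("|", df1$Identifier, fixed = TRUE)

# regmatches() yields character(0) where nothing matched, so lengths() is 0 there
df1$Count <- lengths(regmatches(df1$Identifier, m)) + 1L
df1$Count
#[1] 7 5 3 6 1
```

Using fixed = TRUE also avoids the '[|]' bracket trick, which is only needed because a bare | means alternation in a regular expression.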
