R - count number of items in a piped list [duplicate]

This question already has answers here:
Count values separated by a comma in a character string
(5 answers)
How to calculate the number of occurrence of a given character in each row of a column of strings?
(14 answers)
Closed 6 years ago.
I have a column with a piped list of identifiers
Identifier
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786
P28066|P60900|O14818|P20618|P40306
Q99436|P28062|P28065
P28062|P28065|P62191|P35998|P17980|P43686
How do I produce a column with the number of identifiers in each row?
The output should look something like this:
Identifier Count
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786 7
P28066|P60900|O14818|P20618|P40306 5
Q99436|P28062|P28065 3
P28062|P28065|P62191|P35998|P17980|P43686 6
Thanks in advance!

sapply(strsplit(df$Identifier, '[|]'), length)
To count only the unique identifiers in each row, add the unique function:
sapply(strsplit(df$Identifier, '[|]'), function(i) length(unique(i)))
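To try this end to end, here is a minimal sketch. The `df` below is rebuilt from the sample rows in the question, and `stringsAsFactors = FALSE` is an assumption about how the data was read (strsplit needs a character column, not a factor). The column name `UniqueCount` is only illustrative.

```r
# Sample data rebuilt from the question; stringsAsFactors = FALSE keeps the
# column as character, which strsplit() requires.
df <- data.frame(
  Identifier = c(
    "O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786",
    "P28066|P60900|O14818|P20618|P40306",
    "Q99436|P28062|P28065",
    "P28062|P28065|P62191|P35998|P17980|P43686"
  ),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" as a literal character instead of regex alternation
parts <- strsplit(df$Identifier, "|", fixed = TRUE)

# Number of identifiers per row
df$Count <- lengths(parts)

# Number of distinct identifiers per row (illustrative column name)
df$UniqueCount <- vapply(parts, function(p) length(unique(p)), integer(1))

df$Count
#[1] 7 5 3 6
```

`lengths()` does the same job as `sapply(..., length)` but always returns an integer vector, which is a little safer when the input is empty.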

A base R option without splitting would be
df1$Count <- nchar(gsub("[^|]", "", df1$Identifier)) + 1L
df1$Count
#[1] 7 5 3 6
Or with gregexpr
sapply(gregexpr("[|]", df1$Identifier),
# x > 0 keeps only real matches; gregexpr returns -1 for a string with no "|"
function(x) sum(x > 0) + 1)
#[1] 7 5 3 6

Related

split the lines of a data frame into a variable number of lines based on a character in R [duplicate]

This question already has answers here:
Split delimited strings in a column and insert as new rows [duplicate]
(6 answers)
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 10 months ago.
I have this df:
df = data.frame(ID = c(1,2,3),
A = c("h;d;c", "j;k", "k"))
I want to get a new df in which each row is split on the ";" character, like this:
ID A
1 1 h
2 1 d
3 1 c
4 2 j
5 2 k
6 3 k
I searched other questions, but their answers need the number of pieces to be known in advance. (Split data frame string column into multiple columns)
Thanks for the help!
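No answer is quoted here, so the following is only one possible base R sketch, not the accepted solution. It splits each string, then repeats every `ID` once per piece. The names `pieces` and `long` are made up for illustration.

```r
df <- data.frame(ID = c(1, 2, 3),
                 A = c("h;d;c", "j;k", "k"),
                 stringsAsFactors = FALSE)

# Split each string on the literal ";" to get a list of character vectors
pieces <- strsplit(df$A, ";", fixed = TRUE)

# Repeat each ID once per piece and flatten the pieces into one column
long <- data.frame(ID = rep(df$ID, lengths(pieces)),
                   A = unlist(pieces),
                   stringsAsFactors = FALSE)
long
```

This works for any number of pieces per row because `lengths(pieces)` supplies the repeat count row by row. If tidyr is available, `tidyr::separate_rows(df, A, sep = ";")` should give the same shape.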

How to calculate the number of elements in a string [duplicate]

This question already has answers here:
How to calculate the number of occurrence of a given character in each row of a column of strings?
(14 answers)
Closed 1 year ago.
I have a lot of strings whose elements are separated by a hyphen (-):
string<-c("aaa","aaa-bbb","aaa-bbb-ccc","aaa-bbb-ccc-ddd")
I want to calculate the number of elements in each string. The expected vector is
[1] 1 2 3 4
Does this work:
sapply(strsplit(string, split = '-'), length)
[1] 1 2 3 4
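An alternative sketch, in case counting the separators directly is preferable to splitting: count the hyphens and add one. The variable name `n_elements` is just illustrative.

```r
string <- c("aaa", "aaa-bbb", "aaa-bbb-ccc", "aaa-bbb-ccc-ddd")

# gregexpr() finds every "-" and regmatches() extracts them; a string with
# no hyphen yields character(0), so lengths() is 0 and the count becomes 1
hyphens <- regmatches(string, gregexpr("-", string, fixed = TRUE))
n_elements <- lengths(hyphens) + 1L
n_elements
#[1] 1 2 3 4
```

The two approaches can disagree on edge cases: strsplit drops a trailing empty piece, so "aaa-" counts as 1 element there but 2 here. Which one is right depends on what a trailing hyphen means in your data.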

Cutting value in vector by determine positions [duplicate]

This question already has answers here:
Trying to return a specified number of characters from a gene sequence in R
(3 answers)
Extracting the last n characters from a string in R
(15 answers)
Closed 5 years ago.
Is there a function in R that can cut out part of each value in a vector?
For example, I have this vector:
40754831597
64278107602
64212163451
and from each value in the vector I want to keep the digits from position 3 to 6, for example, and get a new vector that looks like this:
7548
2781
2121
and so on
I don't really get why you would like to do this, but here you go:
# assuming it's a character vector
substring(vec,3,6)
# if it's numeric
substring(as.character(vec),3,6)
#output
#[1] "7548" "2781" "2121"
We can use sub
sub(".{2}(.{4}).*", "\\1", v1)
#[1] "7548" "2781" "2121"
data
v1 <- c(40754831597, 64278107602, 64212163451)
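One caveat for numeric input: as.character() can switch to scientific notation for some values (for example, as.character(100000) gives "1e+05"), which would break position-based cutting. A hedged sketch that formats the numbers explicitly first, assuming the values are whole numbers small enough to be represented exactly as doubles:

```r
v1 <- c(40754831597, 64278107602, 64212163451)

# sprintf("%.0f", ...) writes whole numbers out in full, avoiding the
# scientific notation that as.character() can produce
digits <- sprintf("%.0f", v1)

# Characters 3 through 6 of each string
cut_chr <- substr(digits, 3, 6)
cut_chr
#[1] "7548" "2781" "2121"

# Convert back if numbers are needed rather than strings
cut_num <- as.numeric(cut_chr)
```

Note that converting back to numeric drops leading zeros, so keep the character result if those matter.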

R: Extracting non-duplicated values from vector (not keeping one value for duplicates) [duplicate]

This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
How can I remove all duplicates so that NONE are left in a data frame?
(3 answers)
Closed 5 years ago.
I would like to keep the non-duplicated values from a vector, but without retaining one element from duplicated values. unique() does not work for this. Neither would duplicated().
For example:
> test <- c(1,1,2,3,4,4,4,5,6,6,7,8,9,9)
> unique(test)
[1] 1 2 3 4 5 6 7 8 9
Whereas I would like the result to be: 2,3,5,7,8
Any ideas on how to approach this? Thank you!
We can use duplicated
test[!(duplicated(test)|duplicated(test, fromLast=TRUE))]
#[1] 2 3 5 7 8
You can use ave to count the length of sub-groups divided by unique values in test and retain only the ones whose length is 1 (the ones that have no duplicates)
test[ave(test, test, FUN = length) == 1]
#[1] 2 3 5 7 8
If test is a character vector, use seq_along(test) as the first argument of ave:
test[ave(seq_along(test), test, FUN = length) == 1]
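Another sketch that behaves the same for numeric and character vectors, using match and tabulate to count occurrences. The helper name `keep_singletons` is hypothetical, not a base R function.

```r
# Hypothetical helper: keep only the values that occur exactly once
keep_singletons <- function(x) {
  first_pos <- match(x, x)                 # index of each value's first occurrence
  counts <- tabulate(first_pos)[first_pos] # how often each element's value occurs
  x[counts == 1]
}

test <- c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9)
keep_singletons(test)
#[1] 2 3 5 7 8

keep_singletons(c("a", "b", "a", "c"))
#[1] "b" "c"
```

It should return the same elements as the duplicated() answer above; the main difference is that the per-value counts are available if you need them for something else.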

change numbers into the sum of their figures [duplicate]

This question already has answers here:
Digit sum function in R
(4 answers)
Closed 7 years ago.
I've got this simple question: how can I change a vector consisting of 10 numbers into a vector consisting of ten numbers which are the sum of the figures of the first numbers? So 11 in the first vector becomes 2, 234 becomes 9.
We can use str_extract_all from stringr to extract the individual digits, convert them to numeric, and sum them.
library(stringr)
sapply(str_extract_all(c(11, 234), '\\d'), function(x) sum(as.numeric(x)))
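If you would rather avoid the stringr dependency, here is a base R sketch that uses arithmetic instead of string handling. The function name `digit_sum` is illustrative, and it assumes the inputs are non-negative whole numbers.

```r
# Illustrative helper: sum the decimal digits of each element of n.
# Assumes non-negative whole numbers.
digit_sum <- function(n) {
  total <- numeric(length(n))
  while (any(n > 0)) {
    total <- total + n %% 10  # add the last digit of every element
    n <- n %/% 10             # drop the last digit
  }
  total
}

digit_sum(c(11, 234))
#[1] 2 9
```

The loop is vectorised over the whole input and runs once per digit of the largest number, so elements that have already reached 0 simply contribute nothing further.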
