unique string count in a sequence [duplicate] - r

This question already has answers here:
transitions in a sequence
(2 answers)
Closed 2 years ago.
I am trying to get the unique counts of the strings in a sequence.
For example,
A<- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
library('stringr')
B<- str_count(A, "-")
df<- data.frame(A, B)
I am expecting output as follows:
Here C is the total diversity, i.e. the number of different states in the sequence. Any thoughts or suggestions? I looked around on SO but couldn't find a reasonable solution.
df$C
4
3
2
1

I would do this using unique:
df$res <- sapply(str_split(A,"-"),function(x) length(unique(x)))
df
                        A B res
1         CCE-CRE-DEE-DEE 3   3
2 FOE-FOE-GOE-GOE-GOE-ISE 5   3
3                 ISE-PCE 1   2
4                     ISE 0   1
I suppose that what you expect is actually 3 for CCE-CRE-DEE-DEE, since it contains only three distinct states (CCE, CRE, DEE).
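For reference, the same count can also be sketched in base R, without stringr. This is only a minimal sketch, assuming the states are always separated by a single literal hyphen; the names `parts` and `n_unique` are introduced here for illustration and are not part of the original answer.

```r
# Same input as in the question
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# Split each string on a literal "-" (fixed = TRUE avoids regex interpretation)
parts <- strsplit(A, "-", fixed = TRUE)

# Count the distinct states per sequence; vapply guarantees an integer vector
n_unique <- vapply(parts, function(x) length(unique(x)), integer(1))
n_unique
#> [1] 3 3 2 1
```

Compared with sapply, vapply fails loudly if a result is not a single integer, which can help when the input is messier than in this example.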

Related

R how to remove repeated value while save unique values in running length [duplicate]

This question already has answers here:
Remove/collapse consecutive duplicate values in sequence
(5 answers)
Closed 1 year ago.
For example, I have this vector:
v <- c(3,3,3,3,3,1,1,1,1,1,1,
3,3,3,3,3,3,3,3,3,3,3,3,
3,3,3,2,2,2,2,2,2,2,3,3,
3,3,3,3,3,3,3,3,3,3,3)
I would like to simplify the vector to this expected output:
exp_output <- c(3,1,3,2,3)
What's the best and most convenient way to do this? Thank you.
Try rle(v)$values which results in [1] 3 1 3 2 3.
Another option using diff and which.
v[c(1, which(diff(v) != 0) + 1)]
#[1] 3 1 3 2 3
Another option is with lag:
library(dplyr)
v[v!=lag(v, default=1)]
[1] 3 1 3 2 3
We can use rleid from data.table:
library(data.table)
tapply(v, rleid(v), FUN = first)
1 2 3 4 5
3 1 3 2 3
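To make the rle approach a little more concrete, here is a small sketch of what the run-length encoding contains. It uses a shorter vector, `v_small`, made up here for illustration (it is not the vector from the question, only one with the same run pattern).

```r
# A shorter vector with the same run pattern, used here only for illustration
v_small <- c(3, 3, 1, 1, 1, 3, 2, 2, 3)

r <- rle(v_small)
r$values   # one entry per run of equal values
#> [1] 3 1 3 2 3
r$lengths  # how long each run was
#> [1] 2 3 1 2 1

# inverse.rle() rebuilds the original vector from the run encoding
identical(inverse.rle(r), v_small)
#> [1] TRUE
```

Because the run lengths are kept alongside the values, rle is a reasonable choice when the collapsed vector may need to be expanded again later.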

Is there a way in R to make all possible combinations between rows of different columns? [duplicate]

This question already has answers here:
Unique combination of all elements from two (or more) vectors
(6 answers)
Generate list of all possible combinations of elements of vector
(10 answers)
Closed 2 years ago.
I have a df with one column, and I would like to combine the values of this column with each other to get a new df with two columns, like the simple example below. (Note: my df has ~5000 rows.)
df
CG
1
2
3
##I would like a result similar to this:
> head(df1)
C1 C2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Could someone help me?
Thank you in advance.
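One possible approach, in line with the linked duplicates, is base R's expand.grid. The following is a minimal sketch on the three-row toy data from the question; the column names C1 and C2 come from the desired output, and the column reordering is an assumption about the row order the asker wants.

```r
# Toy data frame matching the question's example
df <- data.frame(CG = 1:3)

# expand.grid() varies its first argument fastest, so C2 is passed first
# and the columns are reordered afterwards so that C1 changes slowest
df1 <- expand.grid(C2 = df$CG, C1 = df$CG)[, c("C1", "C2")]

nrow(df1)  # 3 * 3 combinations
#> [1] 9
```

Note that the result has the square of the input's row count, so a column with ~5000 values would produce about 25 million rows.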

Missing values in sequence into actual sequence in R? [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 5 years ago.
I have a vector of integers, for example, v <- c(1,5,1,2,2,4,7,5,7). If I sort(unique(v)), the values 3 and 6 would be missing in the sequence. How can I transform v into a vector where sort(unique(v)) is an actual sequence of integers? That is, transforming v into c(1,4,1,2,2,3,5,4,5) (in general, of course).
Converting v to factor and back to numeric could do the trick
as.numeric(as.factor(v))
#[1] 1 4 1 2 2 3 5 4 5
Using OP's method, we get the expected output with match
match(v, sort(unique(v)))
#[1] 1 4 1 2 2 3 5 4 5
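As a quick check of why match works here, the sketch below confirms that the distinct values of the result form a gap-free sequence. The name `res` is introduced only for this illustration.

```r
v <- c(1, 5, 1, 2, 2, 4, 7, 5, 7)

# Position of each value within the sorted distinct values
res <- match(v, sort(unique(v)))
res
#> [1] 1 4 1 2 2 3 5 4 5

# The distinct results now form the sequence 1, 2, ..., k with no gaps
identical(sort(unique(res)), seq_len(length(unique(v))))
#> [1] TRUE
```

Unlike the factor round trip, match returns an integer vector directly, so no conversion back to numeric is needed.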

Adding group column to data frame [duplicate]

This question already has an answer here:
Compute the minimum of a pair of vectors
(1 answer)
Closed 7 years ago.
Say I have the following data frame:
dx=data.frame(id=letters[1:4], count=1:4)
# id count
# 1 a 1
# 2 b 2
# 3 c 3
# 4 d 4
And I would like to (programmatically) add a column that takes the value of count whenever count<3, and 3 otherwise, so I'll get the following:
# id count group
# 1 a 1 1
# 2 b 2 2
# 3 c 3 3
# 4 d 4 3
I thought to use
dx$group=if(dx$count<3){dx$count}else{3}
but it doesn't work on arrays. How can I do it?
In this particular case you can just use pmin (as I stated in the comments above):
dx$group <- pmin(dx$count, 3)
In general, your if/else construction does not work on vectors, but you can use the function ifelse. It takes three arguments: first the condition, then the result if the condition is met, and finally the result if the condition is not met. For your example you would write the following:
dx$group <- ifelse(dx$count < 3, dx$count, 3)
Note that in your example the pmin solution is better. Just mentioning the ifelse solution for completeness.
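To illustrate the point, here is a small sketch on the data frame from the question showing that both vectorised forms give the same result. The names `g_ifelse` and `g_pmin` are introduced here for illustration only.

```r
dx <- data.frame(id = letters[1:4], count = 1:4)

# ifelse(test, yes, no) picks from yes or no element by element
g_ifelse <- ifelse(dx$count < 3, dx$count, 3)

# pmin() takes the element-wise minimum, recycling the scalar 3
g_pmin <- pmin(dx$count, 3)

g_ifelse
#> [1] 1 2 3 3
all(g_ifelse == g_pmin)
#> [1] TRUE
```

pmin expresses the intent (cap the value at 3) more directly, while ifelse generalises to conditions that are not a simple minimum.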

How to create an expanding sequence in R, such as c(1,1,2,1,2,3,1,2,3,4,1,2,3,4,5) [duplicate]

This question already has answers here:
Generate an incrementally increasing sequence like 112123123412345
(4 answers)
Closed 5 years ago.
I have been trying to create the sequence c(1,1,2,1,2,3,1,2,3,4,1,2,3,4,5...) without using any loops. Does anyone have any idea how to create such a sequence?
I'll throw in
unlist(lapply(1:5, seq_len))
which is equivalent to, if a bit longer than, alexis_jaz's comment:
sequence(1:5)
do.call(c, sapply(1:5, function(x) 1:x))
Or
v1 <- 1:5
seq_len(sum(v1))-rep(cumsum(c(0L, v1[-length(v1)])), v1)
#[1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5
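The last one-liner is fairly dense, so here is a sketch that breaks it into steps. The intermediate names `idx`, `offset` and `res` are introduced here for illustration and do not appear in the original answer.

```r
v1 <- 1:5

# Running index over the whole result: 1, 2, ..., 15
idx <- seq_len(sum(v1))

# Number of elements that precede each block: 0, 1, 3, 6, 10
offset <- cumsum(c(0L, v1[-length(v1)]))

# Repeat each offset once per element of its block, then subtract,
# so that the index restarts at 1 at the beginning of every block
res <- idx - rep(offset, v1)
res
#> [1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5

# Same result as the sequence() and lapply() versions
all(res == sequence(v1))
#> [1] TRUE
```

This version avoids building a list of intermediate vectors, though for most purposes sequence(1:5) is the more readable choice.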
