Count the maximum number of consecutive letters in a string - R

I have this vector:
vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")
I want to find the maximum number of consecutive times X appears in each string. So my expected vector would be:
4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2

In base R, we can split each string into separate characters and then use rle to find the longest run of "X".
sapply(strsplit(vector, ""), function(x) {
  inds = rle(x)
  max(inds$lengths[inds$values == "X"])
})
#[1] 4 1 2 1 2 1 2 1 2 2 3 2
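One caveat with the rle approach: if a string contains no X at all, max() receives an empty vector and returns -Inf with a warning. A minimal sketch that returns 0 in that case instead, wrapped in a hypothetical helper named max_x_run (the helper name and its ch argument are illustrative, not from the answers here):

```r
# Hypothetical helper: length of the longest run of `ch` in each string
max_x_run <- function(s, ch = "X") {
  vapply(strsplit(s, ""), function(chars) {
    r <- rle(chars)
    # Prepending 0 keeps max() from returning -Inf when `ch` never occurs
    max(c(0L, r$lengths[r$values == ch]))
  }, integer(1))
}

vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X",
            "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")
max_x_run(vector)
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
max_x_run(c("----", "XX-X"))
# [1] 0 2
```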

Here is a slightly different approach: split each string in the input vector on runs of dashes, then take the length of the longest remaining substring.
sapply(vector, function(x) {
  max(nchar(unlist(strsplit(x, "-+"))))
})
# XXXX-X-X ---X-X-X --X---XX --X-X--X -X---XX- -X--X--X X-----XX X----X-X
#        4        1        2        1        2        1        2        1
# X---XX-- XX--X--- ---X-XXX --X-XX-X
#        2        2        3        2
I suspect that X really just stands for any non-dash character, so we don't need to check for it explicitly. If you really only want to count X, split on runs of any character other than X instead (removing the non-X characters after splitting on dashes would wrongly join a run such as "XYX" into "XX"):
sapply(vector, function(x) {
  max(nchar(unlist(strsplit(x, "[^X]+"))))
})

Use strapply from the gsubfn package to extract the runs of X, applying nchar to each match to count its characters. This produces a list of vectors of run lengths; then sapply max over each vector.
library(gsubfn)
sapply(strapply(vector, "X+", nchar), max)
## [1] 4 1 2 1 2 1 2 1 2 2 3 2
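If you would rather not add the gsubfn dependency, base R can do the same extraction with gregexpr and regmatches. This is a swapped-in base-R technique, not how strapply works internally. A minimal sketch, again returning 0 for strings without any X:

```r
vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X",
            "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")

# gregexpr() locates every run of one or more X; regmatches() extracts the matched text
runs <- regmatches(vector, gregexpr("X+", vector))

# nchar() gives each run's length; c(0L, ...) yields 0 for strings with no X
max_runs <- vapply(runs, function(m) max(c(0L, nchar(m))), integer(1))
max_runs
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
```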

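Here are a couple of tidyverse alternatives (map_dbl comes from purrr, str_detect from stringr):
library(purrr)
library(stringr)
# Counts how many of "X", "XX", ..., "XXXXXXXX" occur; this equals the longest run (assumes runs of at most 8)
map_dbl(vector, ~sum(str_detect(., strrep("X", 1:8))))
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
map_dbl(strsplit(vector, "-"), ~max(nchar(.)))
# [1] 4 1 2 1 2 1 2 1 2 2 3 2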
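The str_detect answer relies on the patterns "X", "XX", "XXX", ... being nested: if a run of length k exists, every shorter run exists too, so the number of matching patterns equals the longest run. It hard-codes 8 as the upper bound. A base-R sketch of the same idea that derives the bound from each string's length instead (count_by_detection is a hypothetical name, and ch is assumed to be a single character):

```r
# Hypothetical helper: longest run of `ch`, found by testing "X", "XX", ... up to nchar(s)
count_by_detection <- function(s, ch = "X") {
  vapply(s, function(one) {
    # Candidate patterns of length 1, 2, ..., nchar(one)
    patterns <- strrep(ch, seq_len(nchar(one)))
    # Matches are nested, so the number of hits equals the longest run
    sum(vapply(patterns, grepl, logical(1), x = one, fixed = TRUE))
  }, integer(1), USE.NAMES = FALSE)
}

vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X",
            "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")
count_by_detection(vector)
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
```

Testing every length up to nchar(s) costs more than a single rle pass, so this is mainly useful as an illustration of why the str_detect trick works.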

Related

How to repeat the indices of a vector based on the values of that same vector?

Given a random integer vector below:
z <- c(3, 2, 4, 2, 1)
I'd like to create a new vector that contains each index of z repeated as many times as the value of z at that index. For example, the desired result in this case is:
[1] 1 1 1 2 2 3 3 3 3 4 4 5
There must be a simple way to do this.
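You can use rep and seq_along to repeat the indices of a vector based on the values of that same vector: seq_along gets the indices and rep repeats each one z[i] times (seq_along(z) is safer than seq(z), which returns 1:z when z has length 1).
rep(seq_along(z), z)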
# [1] 1 1 1 2 2 3 3 3 3 4 4 5
Start with all the indices of the vector z, which are given by:
1:length(z)
Each index then needs to be repeated as many times as the value of z at that position. This can be done by combining lapply or sapply with rep:
unlist(lapply(X = 1:length(z), FUN = function(x) rep(x = x, times = z[x])))
# [1] 1 1 1 2 2 3 3 3 3 4 4 5
unlist(sapply(X = 1:length(z), FUN = function(x) rep(x = x, times = z[x])))
# [1] 1 1 1 2 2 3 3 3 3 4 4 5
Both alternatives give the same result.
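All of these answers expand each index i into z[i] copies. Two base-R functions make that relationship explicit: inverse.rle, which expands a run-length encoding, builds the same vector when the indices are treated as run values and z as run lengths, and tabulate counts the copies of each index to recover z. A minimal sketch:

```r
z <- c(3, 2, 4, 2, 1)

# Treat the indices as run values and z as run lengths, then expand the runs
idx <- inverse.rle(list(lengths = z, values = seq_along(z)))
idx
# [1] 1 1 1 2 2 3 3 3 3 4 4 5

# tabulate() counts how often each index occurs, which gives back z
tabulate(idx, nbins = length(z))
# [1] 3 2 4 2 1
```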

Sort vector into repeating sequence when sequential values are missing R

I would like to take a vector such as this:
x <- c(1,1,1,2,2,2,2,3,3)
and sort it into a repeating sequence that keeps the order 1, 2, 3 within each round, even when some values run out.
Expected result: c(1,2,3,1,2,3,1,2,2)
We can order by the occurrence number of each element within its value group (1 for the first occurrence of a value, 2 for the second, and so on):
x[order(ave(x, x, FUN = seq_along))]
#[1] 1 2 3 1 2 3 1 2 2
Or with rowid from data.table:
library(data.table)
x[order(rowid(x))]
#[1] 1 2 3 1 2 3 1 2 2
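Both answers break ties by original position, so they produce 1, 2, 3 within each round only because x is already sorted. If the input might be unsorted, a sketch that adds x itself as a secondary sort key (x2 is a made-up example containing the same values as x in shuffled order):

```r
x2 <- c(3, 1, 2, 2, 1, 3, 1, 2, 2)

# Occurrence number of each element within its value group: 1st, 2nd, 3rd, ...
occ <- ave(x2, x2, FUN = seq_along)

# Sort by round first, then by value within each round
res <- x2[order(occ, x2)]
res
# [1] 1 2 3 1 2 3 1 2 2
```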

What is an expressive and efficient way to vectorize seq in R

I am aware of seq, which, used this way:
seq(from=1, to=3, by=1)
takes you from c(1) to
c(1,2,3)
How can I vectorize this behavior to go from
Input:
c(1,1,1)
Output:
c(1,1,1,2,2,2,3,3,3)
seq isn't vectorized over its from and to arguments, so you can use one of the apply-family functions to get the same behavior.
For example, with mapply:
x <- c(1,1,1)
c(t(mapply(seq, x, 3)))
#[1] 1 1 1 2 2 2 3 3 3
If you want every sequence to go up to length(x), use that instead of the hard-coded 3.
Alternatively, if x always starts with 1 as in the example, you can use rep and sequence:
sort(sequence(rep(length(x), length(x))))
#[1] 1 1 1 2 2 2 3 3 3
Another option is rep, which is vectorized, so no loop is needed:
rep(seq_along(v1), each = length(v1))
#[1] 1 1 1 2 2 2 3 3 3
Another option is replicate:
c(t(replicate(3, seq(1, 3, 1))))
#[1] 1 1 1 2 2 2 3 3 3
To vectorize seq itself, use Vectorize:
c(t(Vectorize(function(x) seq(x, 3, 1))(v1)))
#[1] 1 1 1 2 2 2 3 3 3
data
v1 <- c(1, 1, 1)
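Since R 4.0.0, sequence() also accepts a from argument, which makes it a vectorized seq for step-1 sequences of given lengths. A minimal sketch, assuming R >= 4.0.0 and that every sequence ends at the same value (here 3), so the results have equal lengths and can be interleaved with a matrix:

```r
x <- c(1, 1, 1)
to <- 3

# sequence(nvec, from) concatenates seq(from[i], length.out = nvec[i]) for each i
lens <- as.integer(to - x + 1)
seqs <- sequence(lens, from = as.integer(x))
seqs
# [1] 1 2 3 1 2 3 1 2 3

# Interleave: one row per sequence, then read the matrix column by column
res <- c(matrix(seqs, nrow = length(x), byrow = TRUE))
res
# [1] 1 1 1 2 2 2 3 3 3
```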

How to append to a list after its last element in a loop in R?

I have a vector block_sizes=c(3,3,4), and I want to make a list that has block_sizes[1] 1s, block_sizes[2] 2s, block_sizes[3] 3s, and so on.
In this example, I should give [3,3,4] to my code and get [1,1,1,2,2,2,3,3,3,3].
I wrote the code below, but I don't know why it gives me [0,3,3,3,3]. I think it is because I should be appending after the last element of the vector, which I am not doing now. Any input is appreciated.
vector=0
for (i in length(block_sizes )) {
  buf=rep.int(i,times =block_sizes[i])
  membership_vector=append(vector,buf)
}
We can use rep without any loop, since its times argument is vectorized:
rep(seq_along(block_sizes), block_sizes)
#[1] 1 1 1 2 2 2 3 3 3 3
Or, for a faster implementation, use rep.int:
rep.int(seq_along(block_sizes), times = block_sizes)
#[1] 1 1 1 2 2 2 3 3 3 3
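Or, if we need a loop (this fixes both bugs in the question: iterate over seq_along(block_sizes) rather than the single value length(block_sizes), and assign the result back to the vector being grown):
v1 <- c()
for(i in seq_along(block_sizes)) {
  v1 <- c(v1, rep.int(i, times = block_sizes[i]))
}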
data
block_sizes <- c(3, 3, 4)
We can use mapply:
unlist(mapply(rep, c(1,2,3), c(3,3,4)))
# [1] 1 1 1 2 2 2 3 3 3 3
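Growing v1 with c() inside the loop allocates a new, longer vector on every iteration. If a loop is really needed, a sketch that preallocates the result (its final length is sum(block_sizes)) and fills one block per iteration; membership_from_sizes is a hypothetical helper name:

```r
# Hypothetical helper: index i repeated block_sizes[i] times, built with a preallocated loop
membership_from_sizes <- function(block_sizes) {
  out <- integer(sum(block_sizes))  # final length is known up front
  pos <- 0                          # number of slots filled so far
  for (i in seq_along(block_sizes)) {
    n <- block_sizes[i]
    out[pos + seq_len(n)] <- i      # seq_len(0) is empty, so zero-size blocks are skipped
    pos <- pos + n
  }
  out
}

membership_from_sizes(c(3, 3, 4))
# [1] 1 1 1 2 2 2 3 3 3 3
```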

How many times does a pair of 1s occur in a vector

I have a problem.
I have a vector that consists of 0s and 1s, for example (011011111011100001111). In R, I need to count how many times two consecutive 1s, three consecutive 1s, four consecutive 1s, and so on appear in the vector. In this example vector, 11 appears once, 111 once, 1111 once, and 11111 once.
Thanks a lot, Peter
I'm assuming you have an actual vector like c(0, 1, 1, 0...).
Here is a solution using table and rle. I've also provided some longer sample data to make it a bit more interesting.
set.seed(1)
myvec <- sample(c(0, 1), 100, replace = TRUE)
temp <- rle(myvec)
table(temp$lengths[temp$values == 1])
#
# 1 2 3 4 6
# 15 8 1 2 1
If, indeed, you are dealing with a crazy-long character string of ones and zeroes, just use strsplit and follow the same logic as above.
myvec <- "00110111100010101101101000001001001110101111110011010000011010001001"
myvec <- as.numeric(strsplit(myvec, "")[[1]])
Here, I've converted to numeric, but that's just so you can use the same code as earlier. You can use rle on a character vector too.
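Applied to the string from the question, a minimal sketch that keeps the characters as they are and, matching the question, counts only runs of at least two 1s:

```r
s <- "011011111011100001111"

# Split into single characters; rle() works on a character vector directly
r <- rle(strsplit(s, "")[[1]])

# Lengths of the runs of "1", keeping only runs of two or more
run_lengths <- r$lengths[r$values == "1" & r$lengths >= 2]
counts <- table(run_lengths)
counts
```

For this string the table reports runs of length 2, 3, 4 and 5, each occurring once, which matches the counts given in the question.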
rle is your friend:
vec <- c(0,1,1,0,1,1,1,1,1,0,1,1,1,0,0,0,0,1,1,1,1)
res <- data.frame(table(rle(vec)))
res[res$values == 1, ]
#    lengths values Freq
# 6        1      1    0
# 7        2      1    1
# 8        3      1    1
# 9        4      1    1
# 10       5      1    1
