How to find where each interval of consecutive numbers starts and ends? - r

I have a vector
vec <- c(2, 3, 5, 6, 7, 8, 16, 19, 22, 23, 24)
The runs of consecutive numbers are:
c(2, 3)
c(5, 6, 7, 8)
c(22, 23, 24)
So the first run starts at 2 and ends at 3,
the second starts at 5 and ends at 8,
and the third starts at 22 and ends at 24.
Is there a function to identify where the consecutive numbers start and end?

By using diff to check the differences between each consecutive value, you can find where the difference is not +1.
diff(vec)
## [1] 1 2 1 1 1 8 3 3 1 1
c(1, diff(vec)) != 1
## [1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
Then use cumsum to make a group identifier:
cumsum(c(1, diff(vec))!=1)
## [1] 0 0 1 1 1 1 2 3 4 4 4
And use this to split your data up:
split(vec, cumsum(c(1, diff(vec))!=1))
##$`0`
##[1] 2 3
##
##$`1`
##[1] 5 6 7 8
##
##$`2`
##[1] 16
##
##$`3`
##[1] 19
##
##$`4`
##[1] 22 23 24
The result can then be passed through Filter to keep only the runs with more than one value:
Filter(\(x) length(x) > 1, split(vec, cumsum(c(1, diff(vec))!=1)))
##$`0`
##[1] 2 3
##
##$`1`
##[1] 5 6 7 8
##
##$`4`
##[1] 22 23 24
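Since the question asks for the start and end of each run rather than the runs themselves, the filtered list can be reduced with range. This is only a sketch building on the answer above; the names grp, runs and bounds are illustrative, and it assumes vec is sorted and contains at least one run longer than one value.

```r
vec <- c(2, 3, 5, 6, 7, 8, 16, 19, 22, 23, 24)

# group id increases by one whenever the step to the previous value is not 1
grp <- cumsum(c(1, diff(vec)) != 1)

# keep only groups holding more than one value
runs <- Filter(function(x) length(x) > 1, split(vec, grp))

# range() returns c(min, max) per run; transpose to get one row per run
bounds <- t(sapply(runs, range))
colnames(bounds) <- c("start", "end")
bounds  # one row per run: 2-3, 5-8 and 22-24
```

The row names of bounds are the group ids produced by cumsum, so they can be used to look up the full run in runs if needed.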

Another approach, which relies on the values being positive integers because they are used as index positions:
vec <- c(2, 3, 5, 6, 7, 8, 16, 19, 22, 23, 24)
x <- replace(NA, vec, vec)
# [1] NA 2 3 NA 5 6 7 8 NA NA NA NA NA NA NA 16 NA NA 19 NA NA 22 23 24
l <- split(x, with(rle(is.na(x)), rep(seq.int(length(lengths)), lengths)))
# l <- split(x, data.table::rleid(is.na(x))) ## same as above
l <- Filter(Negate(anyNA), l)
l
# $`2`
# [1] 2 3
#
# $`4`
# [1] 5 6 7 8
#
# $`6`
# [1] 16
#
# $`8`
# [1] 19
#
# $`10`
# [1] 22 23 24
If you have a length requirement:
l[lengths(l) > 1]
# $`2`
# [1] 2 3
#
# $`4`
# [1] 5 6 7 8
#
# $`10`
# [1] 22 23 24
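The replace trick above uses the values as positions, so it would not work for zero or negative numbers. A possible alternative that works on positions instead is sketched here; the example vector and the names run_id, first, last and long_runs are made up for illustration, and the input is assumed to be sorted.

```r
vec <- c(-3, -2, 0, 4, 5, 6)  # hypothetical input containing negatives and zero

# a new run starts at the first element and wherever the step is not 1
run_id <- cumsum(c(TRUE, diff(vec) != 1))

# rle() gives the length of each run; cumulative sums give the last position
r <- rle(run_id)
last <- cumsum(r$lengths)
first <- last - r$lengths + 1

runs <- data.frame(start = vec[first], end = vec[last], n = r$lengths)

# apply the length requirement
long_runs <- runs[runs$n > 1, ]
long_runs
```

Here runs also lists the single-value runs (start equal to end), so the length filter is an optional last step.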

Related

How to do a simple pattern coding

I want to generate the numbers in the pattern below. Could you suggest some code for this?
1
2 3
6 5 4
7 8 9 10
15 14 13 12 11
...........
Thanks!!
Another base R option
v <- choose((1:5) + 1, 2)
Map(function(x, y) {
  # reverse odd-numbered rows, leave even-numbered rows as they are
  f <- if (x %% 2 == 1) rev else identity
  f((y - x + 1):y)
}, seq_along(v), v)
gives
[[1]]
[1] 1
[[2]]
[1] 2 3
[[3]]
[1] 6 5 4
[[4]]
[1] 7 8 9 10
[[5]]
[1] 15 14 13 12 11
Here is one option with split
lst1 <- split(1:15, rep(1:5, 1:5))
lst1[c(TRUE, FALSE)] <- lapply(lst1[c(TRUE, FALSE)], rev)
Output:
lst1
#$`1`
#[1] 1
#$`2`
#[1] 2 3
#$`3`
#[1] 6 5 4
#$`4`
#[1] 7 8 9 10
#$`5`
#[1] 15 14 13 12 11
Or another option is
m1 <- matrix(NA, 5, 5)
m1[upper.tri(m1, diag = TRUE)] <- 1:15
m1 <- t(m1)
m1[c(TRUE, FALSE), ] <- t(apply(m1[c(TRUE, FALSE), ], 1,
    function(x) c(rev(x[!is.na(x)]), x[is.na(x)])))
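The split answer hard-codes five rows. A small generalisation might look like the sketch below; zigzag_rows is a hypothetical helper name, not something from the answers, and it assumes n is a non-negative whole number.

```r
# Build n rows of a number triangle, reversing every odd-numbered row
zigzag_rows <- function(n) {
  total <- n * (n + 1) / 2                        # nth triangular number
  rows <- split(seq_len(total), rep(seq_len(n), seq_len(n)))
  odd <- seq_len(n) %% 2 == 1
  rows[odd] <- lapply(rows[odd], rev)
  unname(rows)
}

zigzag_rows(5)
```

Row k always holds k numbers, so rep(seq_len(n), seq_len(n)) is enough to label which row each number belongs to.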

Split dataframe into a list with vectors of unequal lengths

Suppose I have a dataframe with 10 rows and 5 columns like this
df <- data.frame(matrix(rep(1:10, 5), ncol=5))
I want to split the dataframe into a list with 5 vectors of unequal length, where first vector begins with the first row of column one, second vector begins with the second row of column two, and so on. I will provide the index where each of the vectors shall end. For example, if this index vector is c(3, 5, 4, 8, 10) then I expect the list to be list(1:3, 2:5, 3:4, 4:8, 5:10). Thanks!
Map(function(x, a, b) x[a:b], df, seq_along(df), c(3, 5, 4, 8, 10))
# $X1
# [1] 1 2 3
# $X2
# [1] 2 3 4 5
# $X3
# [1] 3 4
# $X4
# [1] 4 5 6 7 8
# $X5
# [1] 5 6 7 8 9 10
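One thing to watch with a:b is that it counts downwards when the end index is smaller than the start, which would silently return a reversed slice. A guarded variant could look like this sketch; slice_from_diagonal and ends are illustrative names, and returning an empty vector in that case is an assumption about the desired behaviour.

```r
df <- data.frame(matrix(rep(1:10, 5), ncol = 5))
ends <- c(3, 5, 4, 8, 10)

# take col[start:end], but return an empty slice if end lies before start
slice_from_diagonal <- function(col, start, end) {
  if (end < start) return(col[0])
  col[start:end]
}

# a data frame is a list of columns, so Map walks over the columns in order
res <- Map(slice_from_diagonal, df, seq_along(df), ends)
res
```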
Another option is imap from purrr, which by default passes the element's position as .y when the input vector or list is unnamed (v1 is defined further down)
library(purrr)
imap(v1, ~ .y:.x)
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 2 3 4 5
#[[3]]
#[1] 3 4
#[[4]]
#[1] 4 5 6 7 8
#[[5]]
#[1] 5 6 7 8 9 10
Data:
v1 <- c(3, 5, 4, 8, 10)
x = c(3, 5, 4, 8, 10)
lapply(seq_along(x), function(i) df[cbind(i:x[i], i)])
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 2 3 4 5
#[[3]]
#[1] 3 4
#[[4]]
#[1] 4 5 6 7 8
#[[5]]
#[1] 5 6 7 8 9 10
OR
x = c(3, 5, 4, 8, 10)
i = rep(seq_along(x), x - seq_along(x) + 1)
split(df[cbind(i + sequence(rle(i)$lengths) - 1, i)], names(df)[i])
#$X1
#[1] 1 2 3
#$X2
#[1] 2 3 4 5
#$X3
#[1] 3 4
#$X4
#[1] 4 5 6 7 8
#$X5
#[1] 5 6 7 8 9 10

Adding a vector to components of a list

I have the following list:
A <- c(11)
B <- c(7, 13)
C <- c(1, 10, 11, 12)
my_list <- list(A, B, C)
> my_list
[[1]]
[1] 11
[[2]]
[1] 7 13
[[3]]
[1] 1 10 11 12
I would like to add -2, -1, 0, 1, and 2 to each number in this list, and retain all of the unique values within each list element, to obtain the following resulting list:
> my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14
I tried the following code, but I did not get the result I was hoping for:
my_new_list <- lapply(my_list, `+`, -2:2)
> my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 12 7 14 9
[[3]]
[1] -1 9 11 13 3
Why is this happening, and how can I obtain the result I'd like? Thanks!
Assuming that we need the unique values
lapply(my_list, function(x) sort(unique(unlist(lapply(x, `+`, -2:2)))))
Or with outer
lapply(my_list, function(x) sort(unique(c(outer(x, -2:2, `+`)))))
Or with rep and recycling
lapply(my_list, function(x) sort(unique(rep(-2:2, each = length(x)) + x)))
#[[1]]
# [1] 9 10 11 12 13
#[[2]]
# [1] 5 6 7 8 9 11 12 13 14 15
#[[3]]
# [1] -1 0 1 2 3 8 9 10 11 12 13 14
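As for why the original attempt gives the wrong result: `+` works element by element and recycles the shorter vector, so it never forms every combination of value and offset. A short sketch of the difference, using the second list element (the variable names are illustrative):

```r
x <- c(7, 13)
offsets <- -2:2

# element-wise addition recycles x as 7, 13, 7, 13, 7 (and warns about it)
recycled <- suppressWarnings(x + offsets)
recycled  # 5 12 7 14 9

# outer() builds the full table of x[i] + offsets[j]
all_pairs <- outer(x, offsets, `+`)
result <- sort(unique(as.vector(all_pairs)))
result    # 5 6 7 8 9 11 12 13 14 15
```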
How about this:
my_new_list <- lapply(my_list, function(x) unique(union(x, sapply(x, function(y) y + (-2:2)))))
my_new_list <- lapply(my_new_list, sort)
my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14

how to make a vector of x from 1 to max value

In a dataset like this one
what code should I use to make, for each ID, a vector running from 1 to the maximum day, i.e.
x <- 1:max(day) per ID?
So x would be
1:7 for B1
1:11 for B2
1:22 for B3
I tried
library(doBy) # provides summaryBy()
max_day <- summaryBy(day ~ ID, df, FUN = max) # to extract the maximum day per ID
df <- merge(df, max_day) ## to create another column with the maximum day
max_day <- unique(df[, c("ID", "day.max")]) ## to have one value (max) per ID
## & finally the vector
x <- 1:(max_day$day.max)
I got this message
Warning message:
In 1:(max_day$day.max) :
numerical expression has 11134 elements: only the first used
Any suggestions?
tapply(df$day, df$ID, function(x) 1:max(x))
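The warning in the question comes from the fact that the : operator only uses the first element of each operand, so 1:(max_day$day.max) cannot build one sequence per ID. A grouped version along the same lines as the tapply call is sketched below; the data frame reuses the values posted elsewhere in this thread, and days_by_id is an illustrative name.

```r
df <- data.frame(
  ID  = rep(c("B1", "B2", "B3"), times = c(3, 4, 5)),
  day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22)
)

# split the day column by ID, then build 1..max for each group
days_by_id <- split(df$day, df$ID)
x <- lapply(days_by_id, function(d) seq_len(max(d)))

lengths(x)  # 7, 11 and 22 elements for B1, B2 and B3
```

lapply always returns a list, whereas sapply may simplify to a matrix when all groups happen to have the same maximum.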
I don't know what your output should look like, but you can try this:
my_data <- data.frame(ID = c(rep("B1", 3), rep("B2", 4), rep("B3", 3)),
                      day = sample(1:20, 10, replace = TRUE))
tmp <- aggregate(my_data$day, by = list(my_data$ID), FUN = max)
sapply(1:nrow(tmp), function(y) return(1:tmp$x[y]))
# [[1]]
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# [[2]]
# [1] 1 2 3 4 5 6 7 8 9 10 11
# [[3]]
# [1] 1 2 3 4 5 6 7 8 9 10 11
We can use sapply to loop over the unique elements of ID and generate a sequence from 1 to the max of the day column for that ID
sapply(unique(df$ID), function(x) seq(1, max(df[df$ID == x, "day"])))
#[[1]]
#[1] 1 2 3 4 5 6 7
#[[2]]
#[1] 1 2 3 4 5 6 7 8 9 10 11
#[[3]]
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
If we want everything as one vector, we can use unlist
unlist(sapply(unique(df$ID), function(x) seq(1, max(df[df$ID == x, "day"]))))
#[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10
# 11 12 13 14 15 16 17 18 19 20 21 22
Yet another option, using Hadley Wickham's purrr package, as part of the tidyverse.
d <- data.frame(id = rep(c("B1", "B2", "B3"), c(3, 4, 5)),
                v = c(1:3, 1:4, 1:5),
                day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22),
                number = c(15, 20, 30, 25, 26, 28, 35, 10, 12, 14, 16, 18))
library(purrr)
d %>%
  split(.$id) %>%
  map(~1:max(.$day))
# $B1
# [1] 1 2 3 4 5 6 7
# $B2
# [1] 1 2 3 4 5 6 7 8 9 10 11
# $B3
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
df <-
  data.frame(ID = c(rep("B1", 3), rep("B2", 4), rep("B3", 5)),
             V = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5),
             day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22),
             number = c(15, 20, 30, 25, 26, 28, 35, 10, 12, 14, 16, 18))
x <- list()
n <- 1
for (i in unique(df$ID)) {
  max_day <- max(df$day[df$ID == i])
  x[[n]] <- 1:max_day
  n <- n + 1
}
x
[[1]]
[1] 1 2 3 4 5 6 7
[[2]]
[1] 1 2 3 4 5 6 7 8 9 10 11
[[3]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

R Order elements of list

I want to order two lists in R by the length of their elements and then intersect them pairwise on the two longest elements. Example:
Membership1
[[1]]
[1] 3 4 6 7 8
[[2]]
[1] 5 13 23
[[3]]
[1] 1 2 12 14 15 16 18 21 25 28
Membership2
[[1]]
[1] 8 13 20 21 23
[[2]]
[1] 3 6 7
[[3]]
[1] 1 2 4 5 10 15 17 19 24 25 29
Here, the result would be:
[[3]]
[1] 1 2 12 14 15 16 18 21 25 28
[[1]]
[1] 3 4 6 7 8
[[2]]
[1] 5 13 23
and
[[3]]
[1] 1 2 4 5 10 15 17 19 24 25 29
[[1]]
[1] 8 13 20 21 23
[[2]]
[1] 3 6 7
And then 1, 2, 15, 25 and 8 (intersect of both [[3]])
The intersect function is pretty straightforward, but I don't understand how to order those lists the way I want.
Manuel
To order a list w.r.t. the lengths of its elements, nonincreasingly, call:
x <- list(1:5, 1:3, 1:7)
(x <- x[order(sapply(x, length), decreasing=TRUE)])
## [[1]]
## [1] 1 2 3 4 5 6 7
##
## [[2]]
## [1] 1 2 3 4 5
##
## [[3]]
## [1] 1 2 3
Thus, the whole task may be solved with e.g.:
Membership1 <- list(c(3, 4, 6, 7, 8), c(5, 13, 23), c(1, 2, 12, 14, 15, 16, 18, 21, 25, 28))
Membership2 <- list(c(8, 13, 20, 21, 23), c(3, 6, 7), c(1, 2, 4, 5, 10, 15, 17, 19, 24, 25, 29))
Membership1 <- Membership1[order(sapply(Membership1, length), decreasing=TRUE)]
Membership2 <- Membership2[order(sapply(Membership2, length), decreasing=TRUE)]
lapply(seq_along(Membership1), function(i) intersect(Membership1[[i]], Membership2[[i]]))
## [[1]]
## [1] 1 2 15 25
##
## [[2]]
## [1] 8
##
## [[3]]
## numeric(0)
Equivalently, as @flodel suggested, the last step may be performed as follows:
Map(intersect, Membership1, Membership2)
or even:
mapply(intersect, Membership1, Membership2, SIMPLIFY=FALSE)
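Since the question only asks about the two longest elements, the steps above could be wrapped up roughly as follows. This is a sketch: order_by_length, m1, m2 and top_two are illustrative names, and lengths() is used in place of sapply(x, length).

```r
# reorder a list so that its longest elements come first
order_by_length <- function(x) x[order(lengths(x), decreasing = TRUE)]

m1 <- list(c(3, 4, 6, 7, 8), c(5, 13, 23),
           c(1, 2, 12, 14, 15, 16, 18, 21, 25, 28))
m2 <- list(c(8, 13, 20, 21, 23), c(3, 6, 7),
           c(1, 2, 4, 5, 10, 15, 17, 19, 24, 25, 29))

# intersect the i-th longest element of m1 with the i-th longest of m2
common <- Map(intersect, order_by_length(m1), order_by_length(m2))

# keep only the pairs built from the two longest elements
top_two <- head(common, 2)
top_two
```

order() is stable, so elements of equal length keep their original relative order.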

Resources