R: create a vector based on a list - r

I have the following list called m1:
> m1
[[1]]
[1] 36 37 38
[[2]]
[1] 34 35
[[3]]
[1] 30 31 32 33
[[4]]
[1] 24 25 26 27 28 29
[[5]]
[1] 20 21 22 23
[[6]]
[1] 14 15 16 17 18 19
[[7]]
[1] 11 12 13
[[8]]
[1] 7 8 9 10
[[9]]
[1] 5 6
[[10]]
[1] 1 2 3 4
[[11]]
integer(0)
I would like to create a vector based on this list, which has the value 1 at positions 36, 37, and 38; the value 2 at positions 34 and 35, etc. The final output should be:
vector_1 <- c(10, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 1, 1, 1)
How can I accomplish this in R?
EDIT:
Thanks to a comment below:
> rep(length(m1):1, sapply(m1, length))
[1] 11 11 11 10 10 9 9 9 9 8 8 8 8 8 8 7 7 7 7 6 6 6 6 6 6 5 5 5 4
[30] 4 4 4 3 3 2 2 2 2
That doesn't quite give me what I want, but it's definitely on the right track!

This should handle lists with empty elements and with positions that are not listed in sequential order:
m1 <- list(c(7,4,5), c(2,10,9), c(1,3,6,8), integer())
# [[1]]
# [1] 7 4 5
#
# [[2]]
# [1] 2 10 9
#
# [[3]]
# [1] 1 3 6 8
#
# [[4]]
# integer(0)
rep(seq_along(m1), sapply(m1, length))[order(unlist(m1))]
#[1] 3 2 3 1 1 3 1 3 2 2
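The same result can also be reached by assigning directly into the target positions, which may be easier to read than the order() trick. This is only a sketch: it assumes every position from 1 to the maximum appears in exactly one element of m1, and the names pos, idx and out are illustrative rather than taken from the answer above.

```r
m1 <- list(c(7, 4, 5), c(2, 10, 9), c(1, 3, 6, 8), integer())

# Every position named in the list, and the index of the element that names it
pos <- unlist(m1)
idx <- rep(seq_along(m1), lengths(m1))

# Write each element's index into the positions it lists
out <- integer(max(pos))
out[pos] <- idx
out
# [1] 3 2 3 1 1 3 1 3 2 2
```

Empty elements contribute nothing to pos or idx, so they need no special handling.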

This solution should also work for more general cases, even if the elements inside m1 are not in any specific order:
#DATA
m1 = list(36:38, 34:35, 30:33, 24:29, 20:23,
14:19, 11:13, 7:10, 5:6, 1:4, integer(0))
# Largest position mentioned anywhere in m1
mymax = max(unlist(m1))
# For each index i, build a zero vector of length mymax that holds i at the
# positions listed in m1[[i]], then add these vectors together elementwise
Reduce("+", lapply(seq_along(m1), function(i)
replace(rep(0, mymax), m1[[i]], i)))
# [1] 10 10 10 10 9 9 8 8 8 8 7 7 7 6 6 6 6 6 6 5 5 5 5
#[24] 4 4 4 4 4 4 3 3 3 3 2 2 1 1 1

Here is a straightforward base-R solution:
# data
m1 <- list(36:38, 34:35, 30:33, 24:29, 20:23, 14:19, 11:13, 7:10, 5:6, 1:4, integer(0))
# Count length, and repeat each number in 1:11 accordingly
rev(rep(1:11, sapply(m1, length)))
[1] 10 10 10 10 9 9 8 8 8 8 7 7 7 6 6 6 6 6 6 5 5 5 5 4 4 4 4 4 4 3 3 3
[33] 3 2 2 1 1 1
Edit:
A more generalisable answer would be:
rev(rep(seq_along(m1), sapply(m1, length)))
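Note that reversing only gives the right answer when the elements of m1 list consecutive positions in descending blocks, as they do in the question. A small check on a list without that layout (m2 is a hypothetical example, not the asker's data) suggests the difference from an order()-based lookup:

```r
m2 <- list(c(7, 4, 5), c(2, 10, 9), c(1, 3, 6, 8), integer())

# Reverse the repeated indices (only valid for descending, contiguous blocks)
by_rev <- rev(rep(seq_along(m2), lengths(m2)))
# Place each index at the position its element actually names
by_order <- rep(seq_along(m2), lengths(m2))[order(unlist(m2))]

by_rev
# [1] 3 3 3 3 2 2 2 1 1 1
by_order
# [1] 3 2 3 1 1 3 1 3 2 2
```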

Try this:
rev(unlist(sapply(1:length(m1), function(x) rep(x,length(m1[[x]])))))
# or even better, @snoram's edited version of this:
rev(rep(seq_along(m1), sapply(m1, length)))
Output:
[1] 10 10 10 10 9 9 8 8 8 8 7 7 7 6 6 6 6 6 6 5 5 5 5 4
[25] 4 4 4 4 4 3 3 3 3 2 2 1 1 1
Sample data:
m1 <- list(36:38,34:35,30:33,24:29,20:23,
14:19,11:13,7:10,5:6,1:4)
names(m1) <- 1:10

Related

Logical function of the sample numbers to be sequential

Suppose I generate a data frame of cards, each with its own letter (A/B/C) and its own number (2-11).
I then randomly sample 5 cards.
card <- data.frame(
pack = rep(c("A","B","C"), 10),
rank = rep(2:11, 3)
)
card
pack rank
1 A 2
2 B 3
3 C 4
4 A 5
5 B 6
6 C 7
7 A 8
8 B 9
9 C 10
10 A 11
11 B 2
12 C 3
13 A 4
14 B 5
15 C 6
16 A 7
17 B 8
18 C 9
19 A 10
20 B 11
21 C 2
22 A 3
23 B 4
24 C 5
25 A 6
26 B 7
27 C 8
28 A 9
29 B 10
30 C 11
s<-card[sample(seq_len(nrow(card)), 5),]
s
That gives us our sample.
Now, define "sequential rank".
A sample is said to contain five cards of sequential rank if the numbers are consecutive.
For example, cards with ranks 3 4 5 6 7 are sequential.
We also define 2 to be a special number, such that
2 3 4 5 6 is sequential and 8 9 10 11 2 is also sequential, but 9 10 11 2 3 is not sequential.
My goal is to write a logical function that returns TRUE if the sampled cards are sequential and FALSE otherwise.
Listing all the sequential combinations gives:
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
7 8 9 10 11
8 9 10 11 2
But note that 2 4 3 5 6 is also sequential, as is any other reordering, so there are actually many cases to deal with.
My question is: is there a more compact way to write this function (TRUE if the sampled cards are sequential, FALSE otherwise)?
Here's another approach:
is_run <- function(x) {
x <- sort(x)
# All consecutive runs should return TRUE
if(all(diff(x) == 1)) return(TRUE)
# The special case of 8, 9, 10, 11, 2 should also return TRUE
if(all(x == c(2, 8, 9, 10, 11))) return(TRUE)
# In all other cases this is not a consecutive run
return(FALSE)
}
Tests:
# Sequence
is_run(c(4, 5, 6, 7, 8))
#> [1] TRUE
# Sequence starting with 2
is_run(c(2, 4, 3, 5, 6))
#> [1] TRUE
# Sequence ending with 2
is_run(c(8, 9, 10, 11, 2))
#> [1] TRUE
# Cards not in order but can be arranged into sequence
is_run(c(6, 5, 8, 7, 4))
#> [1] TRUE
# Cards not in order but can be arranged into sequence (including a 2)
is_run(c(9, 11, 10, 2, 8))
#> [1] TRUE
# Non-sequence
is_run(c(3, 4, 6, 7, 8))
#> [1] FALSE
# Wrap around not allowed
is_run(c(10, 11, 2, 3, 4))
#> [1] FALSE
Created on 2022-10-06 with reprex v2.0.2
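Since the question lists only seven valid rank sets, another option is to compare the sorted sample against that list directly. This is a sketch under the question's rules; valid_runs and is_run_lookup are hypothetical names, not part of the answer above.

```r
# The seven valid rank sets from the question, each stored in sorted order
valid_runs <- c(
  lapply(2:7, function(start) start:(start + 4)),  # 2-6, 3-7, ..., 7-11
  list(c(2, 8, 9, 10, 11))                         # 8 9 10 11 2, sorted
)

is_run_lookup <- function(x) {
  key <- sort(x)
  # TRUE if the sorted sample equals any one of the valid sets
  any(vapply(valid_runs, function(run) {
    length(key) == length(run) && all(key == run)
  }, logical(1)))
}

is_run_lookup(c(2, 4, 3, 5, 6))    # TRUE
is_run_lookup(c(9, 11, 10, 2, 8))  # TRUE
is_run_lookup(c(10, 11, 2, 3, 4))  # FALSE
```

With the sample from the question this would be called as is_run_lookup(s$rank).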
Exclude special number 2, then sort and check if difference is always 1.
# example input
d <- read.table(text = "
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
7 8 9 10 11
8 9 10 11 2
9 10 11 2 3")
foo <- function(x) all(diff(sort(x[ x != 2 ])) == 1)
apply(d, 1, foo)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
It is unclear exactly how 2 operates. This solution assumes the only "sequential rank" multisets involving 2 are {2, 3, 4, 5, 6} and {2, 8, 9, 10, 11} (e.g., {2, 4, 5, 6, 7} and {2, 2, 9, 10, 11} are not considered "sequential rank").
!any((diff(sort(s)) - 1) %% 5:8)
It can also be vectorized for a 5-by-n matrix:
library(Rfast)
colAll(!((diff(colSort(s)) - 1) %% 5:8))
Example usage:
set.seed(1013119055L)
s <- matrix(sample(2:11, 5e3, TRUE), 5)
s[,colAll(!((diff(colSort(s)) - 1) %% 5:8))]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 3 7 4 6 9 3 9 8 11 6
#> [2,] 7 8 2 2 8 7 8 7 2 8
#> [3,] 4 9 6 4 10 5 7 6 10 7
#> [4,] 5 10 5 5 11 4 6 10 9 10
#> [5,] 6 6 3 3 7 6 5 9 8 9
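To see why the modulo expression above accepts 8 9 10 11 2 but rejects 9 10 11 2 3, it may help to look at the intermediate values for a single hand. The wrapper name is_seq_mod is hypothetical, and the sketch assumes exactly five ranks between 2 and 11, so that the four gaps line up with the moduli 5:8.

```r
is_seq_mod <- function(s) !any((diff(sort(s)) - 1) %% 5:8)

# 8 9 10 11 2: the sorted gaps are 6 1 1 1, so gaps - 1 is 5 0 0 0.
# The jump from 2 to 8 sits in the first gap, where the modulus is 5,
# and 5 %% 5 == 0 lets it through.
(diff(sort(c(8, 9, 10, 11, 2))) - 1) %% 5:8
# [1] 0 0 0 0

# 9 10 11 2 3: the same jump sits in the second gap, where the modulus is 6,
# so 5 %% 6 == 5 is nonzero and the hand is rejected.
(diff(sort(c(9, 10, 11, 2, 3))) - 1) %% 5:8
# [1] 0 5 0 0

is_seq_mod(c(8, 9, 10, 11, 2))  # TRUE
is_seq_mod(c(9, 10, 11, 2, 3))  # FALSE
```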
So you want a function that tells you if a vector of integers contains a straight sequence when ordered.
Edit: And if the last number is 2 but everything else is sequential, it should still be deemed sequential.
This should work:
is.sequential <- function(x) {
if (x[length(x)]==2) {
x <- x[-length(x)] # neglect last element if it is 2
if (2 %in% x) return(FALSE) # but then no other 2s are allowed
}
# no duplicates, and the values span exactly length(x) consecutive integers
!anyDuplicated(x) && length(x) == max(x) - min(x) + 1
}
x <- c(1, 4, 5, 2, 3)
is.sequential(x) # TRUE
y <- c(1, 2, 3, 5, 6)
is.sequential(y) # FALSE
z <- c(4, 5, 6, 7, 2)
is.sequential(z) # TRUE
z2 <- c(2, 3, 4, 5, 2)
is.sequential(z2) # FALSE
Note that this function will not work well if x contains anything that is not an integer.

Adding a vector to components of a list

I have the following list:
A <- c(11)
B <- c(7, 13)
C <- c(1, 10, 11, 12)
my_list <- list(A, B, C)
> my_list
[[1]]
[1] 11
[[2]]
[1] 7 13
[[3]]
[1] 1 10 11 12
I would like to add -2, -1, 0, 1, and 2 to each number in this list, and retain all of the unique values within each list element, to obtain the following resulting list:
> my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14
I tried the following code, but I did not get the result I was hoping for:
my_new_list <- lapply(res, `+`, -2:2)
> my_new_list
$`1`
[1] 9 10 11 12 13
$`2`
[1] 5 12 7 14 9
$`3`
[1] -1 9 11 13 3
Why is this happening, and how can I obtain the result I'd like? Thanks!
Assuming that we need the unique values
lapply(my_list, function(x) sort(unique(unlist(lapply(x, `+`, -2:2)))))
Or with outer
lapply(my_list, function(x) sort(unique(c(outer(x, -2:2, `+`)))))
Or with rep and recycling
lapply(my_list, function(x) sort(unique(rep(-2:2, each = length(x)) + x)))
#[[1]]
# [1] 9 10 11 12 13
#[[2]]
# [1] 5 6 7 8 9 11 12 13 14 15
#[[3]]
# [1] -1 0 1 2 3 8 9 10 11 12 13 14
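As for why the attempt in the question gives 5 12 7 14 9 for the second element: `+` pairs elements by position and recycles the shorter vector, rather than forming every combination. A short sketch on the second element of my_list (the names recycled and all_sums are illustrative):

```r
x <- c(7, 13)

# `+` recycles x along -2:2: 7-2, 13-1, 7+0, 13+1, 7+2.
# R also warns here because 5 is not a multiple of 2; the warning is
# suppressed only to keep the example output clean.
recycled <- suppressWarnings(x + -2:2)
recycled
# [1]  5 12  7 14  9

# outer() instead forms every pairwise sum of x and the offsets
all_sums <- sort(unique(c(outer(x, -2:2, `+`))))
all_sums
# [1]  5  6  7  8  9 11 12 13 14 15
```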
How about this:
my_new_list <- lapply(my_list, function(x) unique(union(x,sapply(x, function(y) y +c(-2:2)) )))
my_new_list <- lapply(my_new_list, sort)
my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14

how to make a vector of x from 1 to max value

In a dataset like this one
what code should I use if I want to make a vector of
x <- 1: max (day)/ID
? So x will be
1:7 for B1
1:11 for B2
1:22 for B3
I tried
max_day <- summaryBy(day ~ ID , df ,FUN=max) # to extract the maximum day per ID
df<- merge (df, max_day) ## to create another column with the maximum day
max_day<- unique(df[,c("ID", "day.max")]) ## to have one value (max) per ID
## and finally the vector
x <- 1: (max_day$day.max)
I got this message
Warning message:
In 1:(max_day$day.max) :
numerical expression has 11134 elements: only the first used
Any suggestions?
tapply(df$day, df$ID, function(x) 1:max(x))
I don't know what your output should look like, but you can try this:
my_data <- data.frame(ID = c(rep("B1", 3), rep("B2", 4), rep("B3", 3)),
day = sample(1:20, 10, replace = TRUE))
tmp <- aggregate(my_data$day, by = list(my_data$ID), FUN = max)
sapply(1:nrow(tmp), function(y) return(1:tmp$x[y]))
# [[1]]
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# [[2]]
# [1] 1 2 3 4 5 6 7 8 9 10 11
# [[3]]
# [1] 1 2 3 4 5 6 7 8 9 10 11
We can use sapply to loop over unique element of ID and generate a sequence from 1 to the max for that ID in the day column
sapply(unique(df$ID), function(x) seq(1, max(df[df$ID == x, "day"])))
#[[1]]
#[1] 1 2 3 4 5 6 7
#[[2]]
#[1] 1 2 3 4 5 6 7 8 9 10 11
#[[3]]
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
If we want all as one vector , we can try unlist
unlist(sapply(unique(df$ID), function(x) seq(1, max(df[df$ID == x, "day"]))))
#[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10
# 11 12 13 14 15 16 17 18 19 20 21 22
Yet another option, using Hadley Wickham's purrr package, as part of the tidyverse.
d <- data.frame(id = rep(c("B1", "B2", "B3"), c(3, 4, 5)),
v = c(1:3, 1:4, 1:5),
day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22),
number = c(15, 20, 30, 25, 26, 28, 35, 10, 12, 14, 16, 18))
library(purrr)
d %>%
split(.$id) %>%
map(~1:max(.$day))
# $B1
# [1] 1 2 3 4 5 6 7
# $B2
# [1] 1 2 3 4 5 6 7 8 9 10 11
# $B3
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
df <-
data.frame(ID = c(rep("B1",3),rep("B2",4),rep("B3",5)),
V = c(1,2,3,1,2,3,4,1,2,3,4,5),
day = c(1,3,7,1,5,9,11,3,5,11,20,22),
number = c(15,20,30,25,26,28,35,10,12,14,16,18))
x <- list()
n <- 1
for(i in unique(df$ID)){
max_day <- max(df$day[df$ID==i])
x[[n]] <- 1:max_day
n <- n+1
}
x
[[1]]
[1] 1 2 3 4 5 6 7
[[2]]
[1] 1 2 3 4 5 6 7 8 9 10 11
[[3]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
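The warning in the question appears because `:` expects a single number on each side and only uses the first element of a longer vector, so one sequence has to be built per ID. A compact base-R sketch using split() and seq_len(), with the day values copied from the example data above (only the ID and day columns are kept):

```r
df <- data.frame(ID = rep(c("B1", "B2", "B3"), c(3, 4, 5)),
                 day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22))

# Split the day column by ID, then build 1..max(day) within each group
x <- lapply(split(df$day, df$ID), function(d) seq_len(max(d)))

# One sequence per ID: lengths are 7 for B1, 11 for B2 and 22 for B3
lengths(x)
```

The result is a named list, so x$B2 or x[["B2"]] retrieves the sequence for a single ID.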

Making a list by grouping consecutive months in R [duplicate]

This question already has answers here:
Split a vector by its sequences [duplicate]
(2 answers)
Closed 6 years ago.
This sounds simple, but I'm having a hard time figuring it out. I have a data frame (S) with one column populated with numeric months (1-12, i.e. Jan-Dec):
S$month
[1] 6 7 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
[27] 11 12 2 3 4 6 10 11 12 1 2 3 5 6 7 7
I'd like to split the data frame into a list such that consecutive months are grouped, as shown below:
S[[1]]$month
[1] 6 7
S[[2]]$month
[1] 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
[25] 11 12
S[[3]]$month
[1] 2 3 4
S[[4]]$month
[1] 6
S[[5]]$month
[1] 10 11 12 1 2 3
S[[6]]$month
[1] 5 6 7 7
Note that some months are repetitive because more than one measurement was taken.
Is there any easy way to do it other than writing a lot like:
S[[1]]<-S[c(1:2),]; S[[2]]<-S[c(3:28),]; and so on ...?? because that's quite inefficient!
You can use cumsum and diff to create a group variable and use the split function to turn your vector into a list of consecutive months:
split(S$month, cumsum(!c(1, diff(S$month)) %in% c(0, 1, -11)))
# Differences of 0 (a repeated month), 1 (the next month) and -11 (December to
# January, the only wrap-around case) all count as consecutive; any other difference starts a new group.
# $`0`
# [1] 6 7
# $`1`
# [1] 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
# $`2`
# [1] 2 3 4
# $`3`
# [1] 6
# $`4`
# [1] 10 11 12 1 2 3
# $`5`
# [1] 5 6 7 7
We can do this programmatically and not rely on the output from the diff.
with(S, split(month, cumsum(c(TRUE, diff(cumsum(c(FALSE,
(month==12)[-length(month)]))*12 + month)>1))))
#$`1`
#[1] 6 7
#$`2`
#[1] 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
#$`3`
#[1] 2 3 4
#$`4`
#[1] 6
#$`5`
#[1] 10 11 12 1 2 3
#$`6`
#[1] 5 6 7 7
data
S <- structure(list(month = c(6, 7, 12, 1, 2, 3, 4, 5, 5, 6, 7, 8,
9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 2, 3, 4,
6, 10, 11, 12, 1, 2, 3, 5, 6, 7, 7)), .Names = "month", row.names = c(NA,
-42L), class = "data.frame")
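The answers above split the month vector, while the question asks for the data frame itself to be split. The same grouping variable can be passed to split() on S to get a list of data frames. This sketch spells out the intermediate steps of the cumsum/diff approach; the names step, breaks, grp and parts are illustrative.

```r
S <- data.frame(month = c(6, 7, 12, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12,
                          1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 2, 3, 4,
                          6, 10, 11, 12, 1, 2, 3, 5, 6, 7, 7))

step   <- c(1, diff(S$month))        # change from the previous row
breaks <- !(step %in% c(0, 1, -11))  # TRUE where a new run starts
grp    <- cumsum(breaks)             # run id for every row, starting at 0

parts <- split(S, grp)               # list of data frames, one per run
sapply(parts, nrow)                  # run sizes: 2, 26, 3, 1, 6, 4
```

Any other columns in S stay with their rows, so parts[[1]] holds the first two complete records.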

R Order elements of list

I want to order two lists in R by the length of their elements and intersect them on their two longest elements. Example:
Membership1
[[1]]
[1] 3 4 6 7 8
[[2]]
[1] 5 13 23
[[3]]
[1] 1 2 12 14 15 16 18 21 25 28
Membership2
[[1]]
[1] 8 13 20 21 23
[[2]]
[1] 3 6 7
[[3]]
[1] 1 2 4 5 10 15 17 19 24 25 29
Here, the result would be:
[[3]]
[1] 1 2 12 14 15 16 18 21 25 28
[[1]]
[1] 3 4 6 7 8
[[2]]
[1] 5 13 23
and
[[3]]
[1] 1 2 4 5 10 15 17 19 24 25 29
[[1]]
[1] 8 13 20 21 23
[[2]]
[1] 3 6 7
And then 1, 2, 15, 25 (the intersection of both [[3]]) and 8 (the intersection of both [[1]]).
The intersect function is pretty straightforward, but I don't understand how to order those lists the way I want.
Manuel
To order a list w.r.t. the lengths of its elements, nonincreasingly, call:
x <- list(1:5, 1:3, 1:7)
(x <- x[order(sapply(x, length), decreasing=TRUE)])
## [[1]]
## [1] 1 2 3 4 5 6 7
##
## [[2]]
## [1] 1 2 3 4 5
##
## [[3]]
## [1] 1 2 3
Thus, the whole task may be solved with e.g.:
Membership1 <- list(c(3, 4, 6, 7, 8), c(5, 13, 23), c(1, 2, 12, 14, 15, 16, 18, 21, 25, 28))
Membership2 <- list(c(8, 13, 20, 21, 23), c(3, 6, 7), c(1, 2, 4, 5, 10, 15, 17, 19, 24, 25, 29))
Membership1 <- Membership1[order(sapply(Membership1, length), decreasing=TRUE)]
Membership2 <- Membership2[order(sapply(Membership2, length), decreasing=TRUE)]
lapply(seq_along(Membership1), function(i) intersect(Membership1[[i]], Membership2[[i]]))
## [[1]]
## [1] 1 2 15 25
##
## [[2]]
## [1] 8
##
## [[3]]
## numeric(0)
Equivalently, as @flodel suggested, the last step may be performed as follows:
Map(intersect, Membership1, Membership2)
or even:
mapply(intersect, Membership1, Membership2, SIMPLIFY=FALSE)
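Since the question only asks for the two longest elements of each list, the ordering and truncation can be wrapped in a small helper. This is a sketch: longest is a hypothetical function name, and lengths() is used as a shorter equivalent of sapply(x, length).

```r
Membership1 <- list(c(3, 4, 6, 7, 8), c(5, 13, 23),
                    c(1, 2, 12, 14, 15, 16, 18, 21, 25, 28))
Membership2 <- list(c(8, 13, 20, 21, 23), c(3, 6, 7),
                    c(1, 2, 4, 5, 10, 15, 17, 19, 24, 25, 29))

# Hypothetical helper: keep the n longest elements, longest first
longest <- function(x, n = 2) head(x[order(lengths(x), decreasing = TRUE)], n)

# Intersect the longest with the longest, the second longest with the second
top2 <- Map(intersect, longest(Membership1), longest(Membership2))
top2
# top2[[1]] holds 1 2 15 25 and top2[[2]] holds 8
```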
