Logical function of the sample numbers to be sequential - r

Suppose I generate a data frame of cards, where each card has its own letter (A/B/C) and its own number (2-11).
Now I randomly sample 5 cards.
card <- data.frame(
pack = rep(c("A","B","C"), 10),
rank = rep(2:11, 3)
)
card
pack rank
1 A 2
2 B 3
3 C 4
4 A 5
5 B 6
6 C 7
7 A 8
8 B 9
9 C 10
10 A 11
11 B 2
12 C 3
13 A 4
14 B 5
15 C 6
16 A 7
17 B 8
18 C 9
19 A 10
20 B 11
21 C 2
22 A 3
23 B 4
24 C 5
25 A 6
26 B 7
27 C 8
28 A 9
29 B 10
30 C 11
s<-card[sample(seq_len(nrow(card)), 5),]
s
And then our sample is generated.
Now, define "sequential rank".
A sample is said to contain five cards of sequential rank if the numbers are consecutive.
For example, cards with ranks 3 4 5 6 7 are sequential.
We also define 2 to be a special number, such that
2 3 4 5 6 is sequential. 8 9 10 11 2 is also sequential. But 9 10 11 2 3 is not sequential.
My goal is to write a logical function whose output is TRUE if the sampled cards are sequential and FALSE otherwise.
Here I try to list all the sequential combinations:
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
7 8 9 10 11
8 9 10 11 2
But note that 2 4 3 5 6 is also sequential, as is any other reordering of the hands above. So there are actually a great many cases to deal with.
My question is: is there a more compact way to write the function described above (TRUE if the sampled cards are sequential, FALSE otherwise)?

Here's another approach:
is_run <- function(x) {
x <- sort(x)
# All consecutive runs should return TRUE
if(all(diff(x) == 1)) return(TRUE)
# The special case of 8, 9, 10, 11, 2 should also return TRUE
if(all(x == c(2, 8, 9, 10, 11))) return(TRUE)
# In all other cases this is not a consecutive run
return(FALSE)
}
Tests:
# Sequence
is_run(c(4, 5, 6, 7, 8))
#> [1] TRUE
# Sequence starting with 2
is_run(c(2, 4, 3, 5, 6))
#> [1] TRUE
# Sequence ending with 2
is_run(c(8, 9, 10, 11, 2))
#> [1] TRUE
# Cards not in order but can be arranged into sequence
is_run(c(6, 5, 8, 7, 4))
#> [1] TRUE
# Cards not in order but can be arranged into sequence (including a 2)
is_run(c(9, 11, 10, 2, 8))
#> [1] TRUE
# Non-sequence
is_run(c(3, 4, 6, 7, 8))
#> [1] FALSE
# Wrap around not allowed
is_run(c(10, 11, 2, 3, 4))
#> [1] FALSE
Created on 2022-10-06 with reprex v2.0.2
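To connect this back to the sample s from the question: a check like is_run works on a numeric vector, so it would be given the rank column rather than the whole data frame. A minimal sketch, assuming the deck from the question; the seed and the hand-picked rows are only illustrative choices, and is_run is repeated so the block runs on its own.

```r
# Rebuild the deck from the question.
card <- data.frame(
  pack = rep(c("A", "B", "C"), 10),
  rank = rep(2:11, 3)
)

# is_run() as in the answer above, repeated so this block is self-contained.
is_run <- function(x) {
  x <- sort(x)
  if (all(diff(x) == 1)) return(TRUE)
  if (all(x == c(2, 8, 9, 10, 11))) return(TRUE)
  FALSE
}

set.seed(1)  # arbitrary seed, only so the draw is reproducible
s <- card[sample(seq_len(nrow(card)), 5), ]
is_run(s$rank)  # pass the rank column, not the data frame

# Rows 7 to 11 of the deck hold ranks 8, 9, 10, 11, 2: the special case.
hand <- card[7:11, ]
is_run(hand$rank)
```

Because the ranks are sorted inside is_run, the order in which the cards were drawn does not matter.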

Exclude special number 2, then sort and check if difference is always 1.
# example input
d <- read.table(text = "
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
7 8 9 10 11
8 9 10 11 2
9 10 11 2 3")
foo <- function(x) all(diff(sort(x[ x != 2 ])) == 1)
apply(d, 1, foo)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE

It is unclear exactly how 2 operates. This solution assumes the only "sequential rank" multisets involving 2 are {2, 3, 4, 5, 6} and {2, 8, 9, 10, 11} (e.g., {2, 4, 5, 6, 7} and {2, 2, 9, 10, 11} are not considered "sequential rank").
!any((diff(sort(s)) - 1) %% 5:8)
It can also be vectorized for a 5-by-n matrix:
library(Rfast)
colAll(!((diff(colSort(s)) - 1) %% 5:8))
Example usage:
set.seed(1013119055L)
s <- matrix(sample(2:11, 5e3, TRUE), 5)
s[,colAll(!((diff(colSort(s)) - 1) %% 5:8))]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 3 7 4 6 9 3 9 8 11 6
#> [2,] 7 8 2 2 8 7 8 7 2 8
#> [3,] 4 9 6 4 10 5 7 6 10 7
#> [4,] 5 10 5 5 11 4 6 10 9 10
#> [5,] 6 6 3 3 7 6 5 9 8 9

So you want a function that tells you if a vector of integers contains a straight sequence when ordered.
Edit: And if the last number is 2 but everything else is sequential, it should still be deemed sequential.
This should work:
is.sequential <- function(x) {
if (x[length(x)]==2) {
x <- x[-length(x)] # neglect last element if it is 2
if (2 %in% x) return(FALSE) # but then no other 2s are allowed
}
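# no repeated values, and the range spans exactly length(x) consecutive integers
!anyDuplicated(x) && length(x) == max(x) - min(x) + 1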
}
x <- c(1, 4, 5, 2, 3)
is.sequential(x) # TRUE
y <- c(1, 2, 3, 5, 6)
is.sequential(y) # FALSE
z <- c(4, 5, 6, 7, 2)
is.sequential(z) # TRUE
z2 <- c(2, 3, 4, 5, 2)
is.sequential(z2) # FALSE
Note that this function will not work well if x contains anything that is not an integer.

Related

Extract cumulative unique values in a rolling basis (reset and resume) using data.table R

Given a data.table, I would like to extract the cumulative unique elements until the set reaches three unique values, then reset and resume:
y <- data.table(a=c(1, 2, 2, 3, 3, 4, 3, 2, 2, 5, 6, 7, 9, 8))
The desired output unique_acc_roll_3 is:
a unique_acc_roll_3
1 1
2 1 2
2 1 2
3 1 2 3
3 1 2 3
4 4 #4 is the fourth distinct value, so it resets and starts again
3 3 4
2 2 3 4
2 2 3 4
5 5 #5 is the fourth distinct value, so it resets and starts again
6 5 6
7 5 6 7
9 9 #9 is the fourth distinct value, so it resets and starts again
8 8 9
Because each row refers back to the previous result, I really got stuck... The real data is large, so data.table solutions would be great.
I can't think of any way to avoid a for loop essentially, except to hide it behind a Reduce call. My logic is to keep union-ing each new value at each row, until the set grows to length == n, at which point the new value is used as the starting point to the next iteration of the loop.
unionlim <- function(x, y, n=4) {
u <- union(x,y)
if(length(u) == n) y else u
}
y[, out := sapply(Reduce(unionlim, a, accumulate=TRUE), paste, collapse=" ")]
# a out
# 1: 1 1
# 2: 2 1 2
# 3: 2 1 2
# 4: 3 1 2 3
# 5: 3 1 2 3
# 6: 4 4
# 7: 3 4 3
# 8: 2 4 3 2
# 9: 2 4 3 2
#10: 5 5
#11: 6 5 6
#12: 7 5 6 7
#13: 9 9
#14: 8 9 8
This is far from the fastest code on the planet, but a quick test suggests it will chew about 1M cases in ~15 seconds on my decent machine.
bigy <- y[rep(1:nrow(y), 75e3)]
system.time({
bigy[, out := sapply(Reduce(unionlim, a, accumulate=TRUE), paste, collapse=" ")]
})
# user system elapsed
# 14.27 0.09 15.06
purrr::accumulate also does the work here
y$b <- accumulate(y$a, ~if(length(union(.x, .y)) == 4) .y else union(.x, .y))
y
a b
1 1 1
2 2 1, 2
3 2 1, 2
4 3 1, 2, 3
5 3 1, 2, 3
6 4 4
7 3 4, 3
8 2 4, 3, 2
9 2 4, 3, 2
10 5 5
11 6 5, 6
12 7 5, 6, 7
13 9 9
14 8 9, 8
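Both answers above list the values in order of appearance ("4 3"), whereas the desired output in the question is sorted ("3 4"). A plain-loop sketch of the same reset-and-resume logic that sorts before pasting; unique_acc_roll is a hypothetical name, and the loop is not claimed to be faster than the Reduce or accumulate versions.

```r
# Hypothetical helper: running set of distinct values that restarts whenever a
# new value would make the set larger than `n`.
unique_acc_roll <- function(a, n = 3) {
  out <- character(length(a))
  current <- a[0]  # empty vector of the same type as `a`
  for (i in seq_along(a)) {
    candidate <- union(current, a[i])
    # Restart from the new value if it would be the (n + 1)-th distinct one.
    current <- if (length(candidate) > n) a[i] else candidate
    out[i] <- paste(sort(current), collapse = " ")
  }
  out
}

a <- c(1, 2, 2, 3, 3, 4, 3, 2, 2, 5, 6, 7, 9, 8)
unique_acc_roll(a)
```

With data.table loaded, the same helper could be used as y[, unique_acc_roll_3 := unique_acc_roll(a)].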

R: create a vector based on a list

I have the following list called m1:
> m1
[[1]]
[1] 36 37 38
[[2]]
[1] 34 35
[[3]]
[1] 30 31 32 33
[[4]]
[1] 24 25 26 27 28 29
[[5]]
[1] 20 21 22 23
[[6]]
[1] 14 15 16 17 18 19
[[7]]
[1] 11 12 13
[[8]]
[1] 7 8 9 10
[[9]]
[1] 5 6
[[10]]
[1] 1 2 3 4
[[11]]
integer(0)
I would like to create a vector based on this list, which has the value 1 at positions 36, 37, and 38; the value 2 at positions 34 and 35, etc. The final output should be:
vector_1 <- c(10, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 1, 1, 1)
How can I accomplish this in R?
EDIT:
Thanks to a comment below:
> rep(length(m1):1, sapply(m1, length))
[1] 11 11 11 10 10 9 9 9 9 8 8 8 8 8 8 7 7 7 7 6 6 6 6 6 6 5 5 5 4
[30] 4 4 4 3 3 2 2 2 2
That doesn't quite give me what I want, but it's definitely on the right track!
This should handle cases with empty entries and non-sequential entries....
m1 <- list(c(7,4,5), c(2,10,9), c(1,3,6,8), integer())
# [[1]]
# [1] 7 4 5
#
# [[2]]
# [1] 2 10 9
#
# [[3]]
# [1] 1 3 6 8
#
# [[4]]
# integer(0)
rep(seq_along(m1), sapply(m1, length))[order(unlist(m1))]
#[1] 3 2 3 1 1 3 1 3 2 2
This solution should work for more general cases too even if the elements inside m1 are not in a specific order
#DATA
m1 = list(36:38, 34:35, 30:33, 24:29, 20:23,
14:19, 11:13, 7:10, 5:6, 1:4, integer(0))
#Extract the maximum element in m1
mymax = max(unlist(m1))
#Go through m1 using index and replace respective indices in the position
#defined by the elements of m1, otherwise make the elements zero
Reduce("+", lapply(1:length(m1), function(i)
replace(rep(0, mymax), m1[[i]], i)))
# [1] 10 10 10 10 9 9 8 8 8 8 7 7 7 6 6 6 6 6 6 5 5 5 5
#[24] 4 4 4 4 4 4 3 3 3 3 2 2 1 1 1
Here is a straightforward base-R solution:
# data
m1 <- list(36:38, 34:35, 30:33, 24:29, 20:23, 14:19, 11:13, 7:10, 5:6, 1:4, integer(0))
# Count length, and repeat each number in 1:11 accordingly
rev(rep(1:11, sapply(m1, length)))
[1] 10 10 10 10 9 9 8 8 8 8 7 7 7 6 6 6 6 6 6 5 5 5 5 4 4 4 4 4 4 3 3 3
[33] 3 2 2 1 1 1
Edit:
A more generalisable answer would be:
rev(rep(seq_along(m1), sapply(m1, length)))
Try this:
rev(unlist(sapply(1:length(m1), function(x) rep(x,length(m1[[x]])))))
#or even better, #snoram's edited version of this:
rev(rep(seq_along(m1), sapply(m1, length)))
Output:
[1] 10 10 10 10 9 9 8 8 8 8 7 7 7 6 6 6 6 6 6 5 5 5 5 4
[25] 4 4 4 4 4 3 3 3 3 2 2 1 1 1
Sample data:
m1 <- list(36:38,34:35,30:33,24:29,20:23,
14:19,11:13,7:10,5:6,1:4)
names(m1) <- 1:10
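Another way to read the task is as direct index assignment: each number stored in m1 is a position in the result, and the value written there is the index of the list element it came from. A small sketch under that reading; the name labels is only illustrative. It does not rely on the elements of m1 being in any particular order, and positions not mentioned in m1 would stay NA.

```r
m1 <- list(36:38, 34:35, 30:33, 24:29, 20:23,
           14:19, 11:13, 7:10, 5:6, 1:4, integer(0))

# Label every stored position with the index of its list element.
labels <- rep(seq_along(m1), lengths(m1))

# Write each label at the position given in m1.
vector_1 <- rep(NA_integer_, max(unlist(m1)))
vector_1[unlist(m1)] <- labels
vector_1
```

lengths(m1) is the base-R shorthand for sapply(m1, length); the empty last element contributes nothing to either vector.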

how to make a vector of x from 1 to max value

In a dataset like this one
what code should I use if I want to make a vector of
x <- 1: max (day)/ID
? So x will be
1:7 for B1
1:11 for B2
1:22 for B3
I tried
max_day <- summaryBy(day ~ ID, df, FUN = max) # extract the maximum day per ID (summaryBy is from the doBy package)
df <- merge(df, max_day) # add a column with the maximum day
max_day <- unique(df[, c("ID", "day.max")]) # keep one value (the max) per ID
## and finally the vector
x <- 1:(max_day$day.max)
I got this message
Warning message:
In 1:(max_day$day.max) :
numerical expression has 11134 elements: only the first used
Any suggestions?
tapply(df$day, df$ID, function(x) 1:max(x))
I don't know what your output should look like, but you can try this (the output below depends on the random draw):
my_data <- data.frame(ID = c(rep("B1", 3), rep("B2", 4), rep("B3", 3)),
day = sample(1:20, 10, replace = TRUE))
tmp <- aggregate(my_data$day, by = list(my_data$ID), FUN = max)
sapply(1:nrow(tmp), function(y) return(1:tmp$x[y]))
# [[1]]
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# [[2]]
# [1] 1 2 3 4 5 6 7 8 9 10 11
# [[3]]
# [1] 1 2 3 4 5 6 7 8 9 10 11
We can use sapply to loop over unique element of ID and generate a sequence from 1 to the max for that ID in the day column
sapply(unique(df$ID), function(x) seq(1, max(df[df$ID == x, "day"])))
#[[1]]
#[1] 1 2 3 4 5 6 7
#[[2]]
#[1] 1 2 3 4 5 6 7 8 9 10 11
#[[3]]
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
If we want everything as a single vector, we can use unlist
unlist(sapply(unique(df$ID), function(x) seq(1, max(df[df$ID == x, "day"]))))
#[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10
# 11 12 13 14 15 16 17 18 19 20 21 22
Yet another option, using Hadley Wickham's purrr package, as part of the tidyverse.
d <- data.frame(id = rep(c("B1", "B2", "B3"), c(3, 4, 5)),
v = c(1:3, 1:4, 1:5),
day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22),
number = c(15, 20, 30, 25, 26, 28, 35, 10, 12, 14, 16, 18))
library(purrr)
d %>%
split(.$id) %>%
map(~1:max(.$day))
# $B1
# [1] 1 2 3 4 5 6 7
# $B2
# [1] 1 2 3 4 5 6 7 8 9 10 11
# $B3
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
df <-
data.frame(ID = c(rep("B1",3),rep("B2",4),rep("B3",5)),
V = c(1,2,3,1,2,3,4,1,2,3,4,5),
day = c(1,3,7,1,5,9,11,3,5,11,20,22),
number = c(15,20,30,25,26,28,35,10,12,14,16,18))
x <- list()
n <- 1
for(i in unique(df$ID)){
max_day <- max(df$day[df$ID==i])
x[[n]] <- 1:max_day
n <- n+1
}
x
[[1]]
[1] 1 2 3 4 5 6 7
[[2]]
[1] 1 2 3 4 5 6 7 8 9 10 11
[[3]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
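The warning in the question arises because the `:` operator only uses the first element when it is given a vector, so one sequence has to be built per ID. A compact base-R sketch, using the same example values as the answers above (the data frame is an assumption, since the original dataset is not shown): split the day column by ID and build each sequence separately.

```r
df <- data.frame(
  ID  = rep(c("B1", "B2", "B3"), c(3, 4, 5)),
  day = c(1, 3, 7, 1, 5, 9, 11, 3, 5, 11, 20, 22)
)

# One sequence per ID, returned as a list named by ID.
x <- lapply(split(df$day, df$ID), function(d) seq_len(max(d)))

lengths(x)  # how long each sequence is
x$B1
```

Using lapply rather than sapply keeps the result a list even when every ID happens to have the same maximum.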

Making a list by grouping consecutive months in R [duplicate]

This question already has answers here:
Split a vector by its sequences [duplicate]
(2 answers)
Closed 6 years ago.
This sounds simple, but I'm having a hard time figuring it out. I have a dataframe (S) with one column populated with numeric months (1-12, i.e. Jan-Dec):
S$month
[1] 6 7 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
[27] 11 12 2 3 4 6 10 11 12 1 2 3 5 6 7 7
I'd like to split the dataframe into a list such that consecutive months are grouped, as shown below:
S[[1]]$month
[1] 6 7
S[[2]]$month
[1] 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
[25] 11 12
S[[3]]$month
[1] 2 3 4
S[[4]]$month
[1] 6
S[[5]]$month
[1] 10 11 12 1 2 3
S[[6]]$month
[1] 5 6 7 7
Note that some months are repetitive because more than one measurement was taken.
Is there an easy way to do this other than writing out every piece by hand, like:
S[[1]]<-S[c(1:2),]; S[[2]]<-S[c(3:28),]; and so on? That's quite inefficient!
You can use cumsum and diff to create a group variable and use the split function to turn your vector into a list of consecutive months:
split(S$month, cumsum(!c(1, diff(S$month)) %in% c(0, 1, -11)))
# A difference of 0 (same month repeated), 1 (next month) or -11 (December followed
# by January) continues the current run; any other difference starts a new group.
# $`0`
# [1] 6 7
# $`1`
# [1] 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
# $`2`
# [1] 2 3 4
# $`3`
# [1] 6
# $`4`
# [1] 10 11 12 1 2 3
# $`5`
# [1] 5 6 7 7
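The one-liner above packs three steps into one expression. Here they are separated on a short made-up vector (the values and the intermediate names step, new_group and group are only illustrative):

```r
month <- c(6, 7, 12, 1, 2, 2, 5)

step      <- c(1, diff(month))        # change from the previous month
new_group <- !step %in% c(0, 1, -11)  # TRUE where the run is broken
group     <- cumsum(new_group)        # running count of breaks = group id

rbind(month, step, new_group, group)  # view the steps side by side
split(month, group)
```

The leading 1 in step is a placeholder for the first element, which has no predecessor, so it always lands in the first group.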
We can do this programmatically and not rely on the output from the diff.
with(S, split(month, cumsum(c(TRUE, diff(cumsum(c(FALSE,
(month==12)[-length(month)]))*12 + month)>1))))
#$`1`
#[1] 6 7
#$`2`
#[1] 12 1 2 3 4 5 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
#$`3`
#[1] 2 3 4
#$`4`
#[1] 6
#$`5`
#[1] 10 11 12 1 2 3
#$`6`
#[1] 5 6 7 7
data
S <- structure(list(month = c(6, 7, 12, 1, 2, 3, 4, 5, 5, 6, 7, 8,
9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 2, 3, 4,
6, 10, 11, 12, 1, 2, 3, 5, 6, 7, 7)), .Names = "month", row.names = c(NA,
-42L), class = "data.frame")

cbind recycled rows to add to last row in R

I have a question about cbinding recycled items. I simplified my problem into the following code.
I have two objects "a" and "b". "a" has 5 rows and "b" has 10 rows.
When I cbind them, I get a matrix with 10 rows, and column "a" is recycled until it reaches 10 rows. My problem is: how do I recycle the values so that each recycled value is added to the last element of a, i.e. a[length(a)]? Thanks!
a <- c(4, 3, 5, 2, 8)
b <- c(1:10)
cbind(a,b)
a b
1 4 1
2 3 2
3 5 3
4 2 4
5 8 5
6 4 6
7 3 7
8 5 8
9 2 9
10 8 10
What I want to do: a[6] = a[5] + a[1] = 12, a[7] = a[5] + a[2] = 11, ... a[10] = a[5] + a[5] = 16
a b
1 4 1
2 3 2
3 5 3
4 2 4
5 8 5
6 12 6
7 11 7
8 13 8
9 10 9
10 16 10
Do you mean this? I have 5 items and I'm adding a[5] to the next 5 items, 2*a[5] to the next 5 items and so on.
a <- c(4, 3, 5, 2, 8)
b <- c(1:11)
counter <- 0:floor((length(b) - 1) / length(a))
new.col <- rep(a[length(a)] * counter, each = length(a)) + a
length(new.col) <- length(b)
new.col
[1] 4 3 5 2 8 12 11 13 10 16 20
The first length(a) items stay intact, we add a[5] to the next length(a) items, 2*a[5] to the next length(a) items and so on...
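The same idea can be written with integer division and remainder, which avoids building a longer vector and truncating it. A sketch with a hypothetical helper name (extend_with_offset), checked here against the 10-row case from the question:

```r
# Hypothetical helper: extend `a` to length `n`, adding the last element of `a`
# once for every completed pass through `a`.
extend_with_offset <- function(a, n) {
  idx  <- seq_len(n)
  pass <- (idx - 1) %/% length(a)     # 0 on the first pass, 1 on the second, ...
  pos  <- (idx - 1) %% length(a) + 1  # position inside `a`
  a[pos] + pass * a[length(a)]
}

a <- c(4, 3, 5, 2, 8)
b <- 1:10
cbind(a = extend_with_offset(a, length(b)), b)
```

For n = 10 this reproduces the column asked for in the question: 4 3 5 2 8 12 11 13 10 16.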

Resources