R Combinatoric Sequence Generation

I am attempting to generate a matrix where each column represents a sequence of factors in R. The factors can assume the values 1, 2, 3 or 4. Each sequence has 13 elements, giving a total of 4^13 potential sequences. However, only a specific subset of these potential sequences is considered valid. The logic is as follows:
A sequence can start at any factor
If a sequence starts at 4, its second element can be less than or equal to 4
Once an element drops below 4, subsequent entries must be weakly increasing
If a sequence starts with 1, 2, or 3 it must be weakly increasing
So for example, the sequence (1,2,3,3,3.....3) is valid. The sequence (4,4,1,1,2,4,4....4) is also valid. The sequence (4,1,2,3,1,1....1) is not, since it is not weakly increasing after the first drop from 4 to 1.
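The rules above can be sketched as a small validity check. This is only an illustration: the helper name is_valid_seq is hypothetical (it does not appear in the original post), and it assumes that a 4 reached by climbing cannot drop again.

```r
# Hypothetical helper (not from the original post): TRUE if `x` obeys the rules.
is_valid_seq <- function(x) {
  # Skip the leading run of 4s, which may be followed by a drop to any value.
  first_below_4 <- which(x < 4)[1]
  if (is.na(first_below_4)) return(TRUE)  # the sequence is all 4s
  tail_part <- x[first_below_4:length(x)]
  # From the first value below 4 onwards, the sequence must never decrease.
  all(diff(tail_part) >= 0)
}

is_valid_seq(c(1, 2, rep(3, 11)))          # first valid example from the question
is_valid_seq(c(4, 4, 1, 1, 2, rep(4, 8)))  # second valid example from the question
is_valid_seq(c(4, 1, 2, 3, rep(1, 9)))     # invalid example from the question
```

A check like this is useful for testing a generator, but filtering all 4^13 candidates with it would still be too slow; the sequences need to be constructed directly.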
At the moment, I have code to combine the 2&3 factors and generate this matrix. The process involves generating a matrix of all possible sequences and then filtering down based on the above logic. This is highly inefficient, but I can post it if necessary. This process also cannot be generalized to a four factor model, as the 4^13 potential sequences overwhelm my machine.
If any of you can offer insight into how I might generate these valid sequences, it would be greatly appreciated. Thank you.

I am assuming that once a gradually increasing vector reaches 4, it cannot jump back down again to a lower value the way that it can if 4 is the first number (if it can, the code is actually easier).
The following function generates a random compatible sequence, essentially using switch to implement a Markov chain:
generate_seq <- function(n)
{
  x <- numeric(n)
  x[1] <- sample(4, 1)
  # Becomes TRUE once a 4 appears after some value below 4
  had_a_four <- FALSE
  for(i in seq_len(n - 1)) {
    if(!had_a_four)
    {
      # A 4 in the leading run of 4s may be followed by any value
      x[i + 1] <- switch(x[i], sample(1:2, 1, prob = c(3, 1)),
                               sample(2:3, 1, prob = c(3, 1)),
                               sample(3:4, 1, prob = c(3, 1)),
                               sample(4, 1))
    }
    else
    {
      # A 4 reached by climbing is absorbing
      x[i + 1] <- switch(x[i], sample(1:2, 1, prob = c(3, 1)),
                               sample(2:3, 1, prob = c(3, 1)),
                               sample(3:4, 1, prob = c(3, 1)),
                               4)
    }
    if(x[i + 1] == 4 && !all(x[1:(i+1)] == 4)) had_a_four <- TRUE
  }
  x
}
And we can generate a 4-column matrix like this:
set.seed(4)
matrix(replicate(4, generate_seq(13)), ncol = 4)
#>       [,1] [,2] [,3] [,4]
#>  [1,]    4    4    1    1
#>  [2,]    3    3    1    2
#>  [3,]    3    4    2    3
#>  [4,]    3    4    2    3
#>  [5,]    4    4    2    4
#>  [6,]    4    4    3    4
#>  [7,]    4    4    3    4
#>  [8,]    4    4    3    4
#>  [9,]    4    4    4    4
#> [10,]    4    4    4    4
#> [11,]    4    4    4    4
#> [12,]    4    4    4    4
#> [13,]    4    4    4    4

I think you can use RcppAlgos to do this efficiently by generating the combinations with repetition for vectors of length 1:n (where it's assumed that the shorter vectors are left-padded to length 13 with 4):
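library(RcppAlgos)
get_combos <- function(n) {
  # For each tail length x = n, ..., 1: all weakly increasing tails of length x
  # (combinations with repetition), left-padded with n - x leading 4s.
  # lapply keeps the result a list even when n = 1.
  unique(do.call(rbind, lapply(rev(seq(n)), function(x)
    do.call(
      cbind, c(rep(4, n - x), list(comboGeneral(1:4, x, TRUE)))
    ))))
}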
res <- get_combos(13)
head(res)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]    1    1    1    1    1    1    1    1    1     1     1     1     1
[2,]    1    1    1    1    1    1    1    1    1     1     1     1     2
[3,]    1    1    1    1    1    1    1    1    1     1     1     1     3
[4,]    1    1    1    1    1    1    1    1    1     1     1     1     4
[5,]    1    1    1    1    1    1    1    1    1     1     1     2     2
[6,]    1    1    1    1    1    1    1    1    1     1     1     2     3
nrow(res)
[1] 2367
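As a sanity check on the row count, the number of valid sequences can be worked out by counting alone. The sketch below assumes the same reading of the rules (a leading run of 4s followed by a weakly increasing tail); the variable names are illustrative only.

```r
n <- 13
# Weakly increasing sequences of length n over 4 values (stars and bars).
increasing <- choose(n + 3, 3)
# k leading 4s (k = 1, ..., n - 1) followed by a weakly increasing tail of
# length m = n - k whose first element is below 4. The all-4 tail is excluded
# so that no sequence is counted twice.
m <- seq_len(n - 1)
with_leading_4s <- sum(choose(m + 3, 3) - 1)
total <- increasing + with_leading_4s
total
# [1] 2367
```

Here increasing is 560 and with_leading_4s is 1807, which agrees with nrow(res).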

Related

Converting a list of lists into a data frame

I have a function that first generates a list of vectors (generated by using lapply), and then cbinds it to a column vector. I thought this would produce a dataframe. However, it produces a list of lists.
The cbind function isn't working as I thought it would.
Here's a small example of what the function is generating
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2 , 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))
a_df <- cbind(col_test, lst_test)
Typing
> a_df[1,]
gives the output
$`col_test`
[1] 1
$lst_test
[1] 1 2 3
I'd like the data frame to be
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    3
[2,]    2    2    2    2
[3,]    1    1    1    2
[4,]    1    1    2    2
[5,]    2    1    1    1
How do I get it into this form?
# As a data frame:
data.frame(col_test, t(as.data.frame(lst_test)))
# Or as a matrix:
do.call(rbind, Map(c, col_test, lst_test))
#     [,1] [,2] [,3] [,4]
#[1,]    1    1    2    3
#[2,]    2    2    2    2
#[3,]    1    1    1    2
#[4,]    1    1    2    2
#[5,]    2    1    1    1
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2 , 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))
First name the sublists so we can use bind_rows (from dplyr):
library(dplyr)
names(lst_test) <- seq_along(lst_test)
lst_test1 <- bind_rows(lst_test)
The bind_rows function treats each named vector as a column in this case, so we need to transpose the result:
lst_test_pivot <- t(lst_test1)
But this gives us a matrix, so we need to convert it back to a data frame:
lst_test_pivot_df <- as.data.frame(lst_test_pivot)
Now cbind works as expected:
cbind(col_test, lst_test_pivot_df)
This produces
  col_test V1 V2 V3
1        1  1  2  3
2        2  2  2  2
3        1  1  1  2
4        1  1  2  2
5        2  1  1  1
This should do the trick. Note that we are using do.call so that the individual elements of lst_test are sent as parameters to cbind, which prevents cbind from creating a list-of-lists. t is used to transpose the resulting matrix to your preferred orientation, and finally, one more cbind with col_test inserts that data as well.
library(tidyverse)
mat.new <- do.call(cbind, lst_test) %>%
  t %>%
  cbind(col_test, .) %>%
  unname
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    3
[2,]    2    2    2    2
[3,]    1    1    1    2
[4,]    1    1    2    2
[5,]    2    1    1    1
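For comparison, the same result can probably be had in base R without loading any package. This is a minimal sketch using the question's sample data; the names mat_base and df_base are made up for illustration.

```r
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2, 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))

# rbind turns each list element into one row; cbind then prepends col_test.
mat_base <- unname(cbind(col_test, do.call(rbind, lst_test)))
mat_base

# Convert to a data frame only if that is what is needed downstream.
df_base <- as.data.frame(mat_base)
```

Using rbind directly avoids the extra transpose that the cbind-then-t approach needs.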

Randomly populate R dataframe with integers between

I would like to create an R dataframe with random integers WITHOUT repetition.
I have come up with this approach which works:
rank_random <- data.frame(matrix(NA, nrow = 13, ncol = 30))
for (colIdx in 1:30) {
  rank_random[colIdx, ] <- sample(1:ncol(subset(exc_ret, select = -c(Date))), 30,
                                  replace = FALSE)
}
I assume that you mean without repetition on each row. If you meant something else, please clarify.
For your example:
N <- ncol(subset(exc_ret, select = -c(Date)))
num.rows <- 30
# Each row draws num.rows distinct values, so this needs num.rows <= N
t(sapply(seq(num.rows),
         FUN = function(x) sample(1:N, num.rows, replace = FALSE)))
To test it for a simpler case
N <- 5
num.rows <- 5
t(sapply(seq(num.rows),
         FUN = function(x) sample(1:N, num.rows, replace = FALSE)))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2    4    5    1    3
# [2,]    2    5    1    3    4
# [3,]    5    1    4    3    2
# [4,]    3    4    5    2    1
# [5,]    3    2    5    1    4
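If the number of rows and the number of values per row differ (the question mentions 13 rows and 30 columns), it may help to keep them as separate arguments. The sketch below is one way to do that; the function name random_rank_matrix and the value N = 40 are assumptions for illustration, since exc_ret is not shown.

```r
# Hypothetical helper: n_rows rows, each holding n_cols distinct integers from 1..N.
random_rank_matrix <- function(n_rows, n_cols, N) {
  stopifnot(n_cols <= N)  # cannot draw more distinct values than exist
  out <- matrix(NA_integer_, nrow = n_rows, ncol = n_cols)
  for (i in seq_len(n_rows)) {
    out[i, ] <- sample.int(N, n_cols)  # sampling without replacement by default
  }
  out
}

set.seed(1)
# N = 40 is an assumed stand-in for ncol(subset(exc_ret, select = -c(Date))).
rank_random <- as.data.frame(random_rank_matrix(n_rows = 13, n_cols = 30, N = 40))
```

Filling a preallocated matrix keeps the shape correct even in edge cases such as a single column, where sapply or replicate would return a plain vector.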

R: sort using probabilities calculated from vector values

Given a vector, say c(1, 2, 3), I'd like to generate samples of this vector sorted according to probabilities calculated from its values. The process is illustrated below - is there an R function that does this?
As a simple example, use probabilities calculated as each value divided by the vector sum, c(1/6, 2/6, 3/6), to determine the first value in the sorted vector. In this case value 3 has probability 3/6 or 50% of being the first element, value 2 has probability 2/6 or 33.3%, and value 1 has probability 1/6 or 16.7%.
After the first element is selected, the process continues similarly for the remaining elements of the vector until a 'statistically' ordered vector is produced.
As the number of 'statistically' ordered samples grows, I'd expect 3 to be first 50% of the time, etc. A mocked-up example with a sample size of 6:
c(3, 2, 1)
c(2, 3, 1)
c(3, 1, 2)
c(3, 2, 1)
c(1, 3, 2)
c(2, 1, 3)
sample(1:3, prob = 1:3, replace = FALSE)
testing it:
set.seed(42)
res <- replicate(1e5, sample(1:3, prob = 1:3, replace = FALSE))
prop.table(table(res[1,]))
#      1       2       3
#0.16620 0.33324 0.50056
prop.table(table(res[2,]))
#      1       2       3
#0.25026 0.39827 0.35147
prop.table(table(res[3,]))
#      1       2       3
#0.58354 0.26849 0.14797
Try
N <- 100
X <- 3
replicate(N, sample(X, prob=prop.table(1:X)))
Output
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    3    3    3    3    3    1    3    3    3     3     2     3     2     2
[2,]    2    1    2    2    1    3    1    1    1     1     3     2     1     3
[3,]    1    2    1    1    2    2    2    2    2     2     1     1     3     1
# etc
You can transpose the output if you prefer
t(replicate(N, sample(X, prob=prop.table(1:X))))
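For a vector whose values are not simply 1:X, the same idea can be wrapped in a small helper. This is a sketch only: the name weighted_order is hypothetical, and it assumes all values are strictly positive so they can serve as weights. It indexes with sample.int rather than calling sample(v, ...) because sample treats a single number as the range 1:v.

```r
# Hypothetical helper: reorder `v`, choosing each next element with probability
# proportional to its value among the elements not yet chosen.
weighted_order <- function(v) {
  stopifnot(all(v > 0))  # values double as sampling weights
  v[sample.int(length(v), prob = v)]
}

set.seed(42)
weighted_order(c(10, 20, 30))

# Share of draws in which the largest value comes first; expected to be near 0.5.
first <- replicate(2000, weighted_order(c(10, 20, 30))[1])
mean(first == 30)
```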

Swap a negative value in an R data table with the previous column value

I have a data table where I want to replace negative values with the positive value from the previous row in the same column. For example:
1 2 3 4
2 -3 -2 3
should be
1 2 3 4
2 2 3 3
Thanks!
Since there are no answers from more experienced guys, here is what I've come up with.
# I'm reconstructing your example:
n <- matrix(c(1, 2, 2, -3, 3, -2, 4, 3), nrow = 2)
n
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2   -3   -2    3
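changeMat <- function(mat) {
  new_mat <- mat
  # Matrices are stored column by column, so mat[i - 1] is the entry one row
  # up in the same column (this assumes the first row has no negative values)
  for (i in seq_along(mat)) {
    if (mat[i] < 0) new_mat[i] <- mat[i - 1]
  }
  new_mat
}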
changeMat(n)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    2    3    3
For a data.table object dt, I checked that changeMat(as.matrix(dt)) works properly.
Anyway, I am pretty sure that there must be a smarter way...
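One possibly smarter way is to avoid the loop by building a shifted copy of the matrix and using a logical mask. This is a sketch under the same assumption as above (negatives in the first row are left untouched, since there is no row above them); the name fix_neg is made up for illustration.

```r
# Hypothetical helper: replace each negative entry with the entry directly above it.
fix_neg <- function(m) {
  neg <- m < 0
  neg[1, ] <- FALSE  # the first row has nothing above it, so leave it alone
  # `above` holds, at position [i, j], the original value m[i - 1, j].
  above <- rbind(m[1, , drop = FALSE], m[-nrow(m), , drop = FALSE])
  m[neg] <- above[neg]
  m
}

n <- matrix(c(1, 2, 2, -3, 3, -2, 4, 3), nrow = 2)
fix_neg(n)
```

Like the loop version, this reads replacement values from the original matrix, so two negatives stacked in the same column are not resolved in a single pass.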

How to print row index and occurences count of zeros in rows in R data.frame

I want to print the row index and the number of zeros present in each row of an R data.frame.
The input matrix is like this:
          A B
rowIndex1 0 1
rowIndex2 1 1
I thought to use this:
print(which(rowSums(matrix == 0) != 0))
I want it to print something like this:
rowIndex1
1
However, it does not print the number of zeros in each row but a different number (I checked it), like this:
rowIndex1
2400
How to achieve it?
Thanks
As mentioned in my comment, perhaps arr.ind would be of use.
Using @bartektartanus's sample data:
m <- diag(5) + c(0:6,0,0)
table(which(m == 0, arr.ind=TRUE)[, "row"])
#
# 2 3 4 5
# 1 2 1 1
The "names" (in this case, 2, 3, 4, and 5) are your row numbers and the values (in this case, 1, 2, 1, 1) are the counts.
Here is the output of which, so you can understand what is going on:
which(m == 0, arr.ind=TRUE)
#      row col
# [1,]   3   2
# [2,]   4   2
# [3,]   5   2
# [4,]   2   4
# [5,]   3   4
This works well. You get the row numbers that contain a zero.
> m <- diag(5) + c(0:6,0,0)
Warning message:
In diag(5) + c(0:6, 0, 0) :
longer object length is not a multiple of shorter object length
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    1    6    2
[2,]    1    7    2    0    3
[3,]    2    0    4    0    4
[4,]    3    0    4    1    5
[5,]    4    0    5    1    7
> which(rowSums(m == 0) != 0)
[1] 2 3 4 5
To obtain what you want, use this:
> x <- rowSums(m==0)
> cbind(which(x!=0),x[x!=0])
     [,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]    4    1
[4,]    5    1
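When the data has row names, as in the question, the counts can also be kept as a named vector so that each count stays attached to its row label. A minimal sketch, rebuilding the question's two-row example (the names df and zero_counts are illustrative):

```r
df <- data.frame(A = c(0, 1), B = c(1, 1),
                 row.names = c("rowIndex1", "rowIndex2"))

# rowSums keeps the row names, so each count is labelled with its row.
zero_counts <- rowSums(df == 0)
res <- zero_counts[zero_counts > 0]
res
# rowIndex1
#         1
```

The original attempt wrapped the comparison in which(), which returns row positions rather than counts; dropping which() and subsetting the counts themselves gives the desired output.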
