R: split matrix into arbitrary number of blocks

Say I have a matrix of values
set.seed(1)
A <- matrix(runif(25),ncol=5)
I'd like to calculate some statistics for approximately square neighborhoods within this matrix of approximately equal size. Either of these kinds of output would do:
N1 <- matrix(c(rep(c("A","A","B","B","B"),2),rep(c("C","C","D","D","D"),3)),ncol=5)
N2 <- matrix(c(rep(c("A","A","A","B","B"),3),rep(c("C","C","D","D","D"),2)),ncol=5)
N1
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "C" "C" "C"
[2,] "A" "A" "C" "C" "C"
[3,] "B" "B" "D" "D" "D"
[4,] "B" "B" "D" "D" "D"
[5,] "B" "B" "D" "D" "D"
N2
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "A" "C" "C"
[2,] "A" "A" "A" "C" "C"
[3,] "A" "A" "A" "D" "D"
[4,] "B" "B" "B" "D" "D"
[5,] "B" "B" "B" "D" "D"
Other approximations are also OK, since I can always rotate the matrix. Then I can use these neighborhood matrices to calculate stats using tapply(), like this:
tapply(A,N1,mean)
A B C D
0.6201744 0.5057402 0.4574495 0.5594227
What I want is a function that can make me a matrix of arbitrary dimensions with an arbitrary number of block-like neighborhoods like N1 or N2. I'm having a hard time figuring out how such a function would deal with situations where the desired number of blocks is not a perfect square. N1 and N2 have 4 neighborhoods, but say I wanted 5, with output something like this:
N3 <- matrix(c("A","A","B","B","B","A","A","C","C","C","D","D","C","C","C",
"D","D","E","E","E","D","D","E","E","E"),ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "D" "D" "D"
[2,] "A" "A" "D" "D" "D"
[3,] "B" "C" "C" "E" "E"
[4,] "B" "C" "C" "E" "E"
[5,] "B" "C" "C" "E" "E"
Does anyone know of an existing function that can do this kind of split, or have any ideas on how to make one? Thank you!
[[Edit]]
My final function, taking into account Vincent's advice:
DecideBLocks <- function(A, nhoods){
  nc <- ncol(A)
  nr <- nrow(A)
  # side length of a square neighborhood that gives roughly `nhoods` blocks
  nhood_side <- floor(sqrt((nc*nr)/nhoods))
  Neighborhoods <- matrix(paste(ceiling(col(A)/nhood_side), ceiling(row(A)/nhood_side), sep="-"), ncol=nc)
  nhoods.out <- length(unique(c(Neighborhoods)))
  if (nhoods.out != nhoods){
    cat(nhoods.out, "neighborhoods created.\nThese were at most", nhood_side, "by", nhood_side, "cells.\nThis differs from the number requested because the function rounds to square neighborhoods.\n")
  }
  return(Neighborhoods)
}
A <- matrix(rnorm(120),12)
B <- DecideBLocks(A,13)

You can try playing with the row and col functions: they reduce the problem to a one-dimensional one.
The following defines blocks of size at most 2 x 2.
matrix(
paste(
ceiling(col(A)/2),
ceiling(row(A)/2),
sep="-"),
nc=ncol(A)
)
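The same row/col idea can be turned around so that you choose the number of blocks per dimension instead of the block size. A minimal sketch, assuming a hypothetical helper block_labels() (not part of any package) that takes the number of row blocks and column blocks:

```r
# Hypothetical helper: split an nr x nc matrix into
# n_row_blocks x n_col_blocks blocks of roughly equal size.
block_labels <- function(A, n_row_blocks, n_col_blocks) {
  # Rescale each row/column index onto 1..n_blocks; ceiling() picks the block
  row_id <- ceiling(row(A) * n_row_blocks / nrow(A))
  col_id <- ceiling(col(A) * n_col_blocks / ncol(A))
  matrix(paste(row_id, col_id, sep = "-"), nrow = nrow(A), ncol = ncol(A))
}

set.seed(1)
A <- matrix(runif(25), ncol = 5)
N <- block_labels(A, 2, 2)   # rows and columns are both split 2/3, as in N1
tapply(A, N, mean)
```

The number of blocks produced this way is always a product of the two counts, so an irregular layout such as N3 (5 blocks in a 5 x 5 matrix) would need a different approach.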

You can choose your bdeep (row-spec) and bwide (col-spec) parameters near the center of your matrix dimensions in whatever manner you like and use this simple function to construct your matrix. As long as bwide and bdeep are equal, and nrow == ncol, the two diagonal blocks (A and D) will be square.
mkblk <- function(bwide, bdeep, nrow, ncol){
bstr1 <- c(rep("A", bdeep), rep("B", nrow-bdeep))
bstr2 <- c(rep("C", bdeep), rep("D", nrow-bdeep))
matrix(c( rep(bstr1, bwide), rep(bstr2, ncol-bwide)), ncol=ncol, nrow=nrow)}
mkblk(2,2,5,5)
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "C" "C" "C"
[2,] "A" "A" "C" "C" "C"
[3,] "B" "B" "D" "D" "D"
[4,] "B" "B" "D" "D" "D"
[5,] "B" "B" "D" "D" "D"
#Test of your strategy
tapply(A, mkblk(2,2,5,5), mean)
A B C D
0.6201744 0.5057402 0.4574495 0.5594227

Related

Why does gtools::combinations and permutations not work with a vector containing the same elements?

Say I have a vector vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
How do I get all combinations / permutations if I, e.g., draw 5 out of 8, with the expected output:
> head(t, 6)
[,1] [,2] [,3] [,4] [,5]
[1,] "H" "H" "H" "H" "M"
[2,] "H" "H" "H" "H" "M"
[3,] "H" "H" "H" "H" "A"
[4,] "H" "H" "H" "H" "A"
[5,] "H" "H" "H" "M" "M"
[6,] "H" "H" "H" "M" "A"
I tried gtools::combinations(), but I always get the error that there are too few different elements (the same is true for gtools::permutations(), regardless of whether repeats are allowed).
t <- gtools::combinations(8, 5, vec, repeats.allowed = F)
Error in gtools::combinations(8, 5, vec, repeats.allowed = F) :
too few different elements
So I did it in a laborious way:
t <- gtools::combinations(8, 5, letters[1:8], repeats.allowed = F)
for ( i in 1:8) {
if ( i <=4 ) {
t[t == letters[i]] <- "H"
} else if (i <= 6) {
t[t == letters[i]] <- "M"
} else if (i <= 8) {
t[t == letters[i]] <- "A"
}
}
I am looking for an easier solution from any package or base R, and I want to know why it doesn't work. Thanks in advance.
An alternative
combn(vec,5)
which results in 56 combinations (choose(8,5)).
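Note that combn() treats the eight positions as distinct, so many of those 56 results are identical. A small base R sketch of the filtering route, relying on the assumption that equal elements sit next to each other in vec (so every combination comes out in the same H, M, A order and unique() is enough):

```r
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
all_combos <- t(combn(vec, 5))     # one combination per row
distinct   <- unique(all_combos)   # drop repeated rows
nrow(all_combos)                   # 56
nrow(distinct)                     # 8
```

This is only practical for small inputs, since every duplicate is generated before it is thrown away.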
When you need combinations/permutations of a vector that contains repeats, or multisets, many of the available functions in base R and other packages will produce unnecessary duplicate results that eventually need to be filtered out. For smaller problems, this is not an issue; however, this approach quickly becomes impractical.
Currently, there are a couple of packages capable of handling these types of problems. They are arrangements and RcppAlgos (I am the author).
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
tbl_v <- table(vec)
tbl_v
vec
A H M
2 4 2
library(RcppAlgos)
comboGeneral(names(tbl_v), 5, freqs = tbl_v)
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "H" "H" "H"
[2,] "A" "A" "H" "H" "M"
[3,] "A" "A" "H" "M" "M"
[4,] "A" "H" "H" "H" "H"
[5,] "A" "H" "H" "H" "M"
[6,] "A" "H" "H" "M" "M"
[7,] "H" "H" "H" "H" "M"
[8,] "H" "H" "H" "M" "M"
## For package arrangements we have:
## arrangements::combinations(names(tbl_v), 5, freq = tbl_v)
Similarly, for permutations, we have:
permuteGeneral(names(tbl_v), 5, freqs = tbl_v)
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "H" "H" "H"
[2,] "A" "A" "H" "H" "M"
[3,] "A" "A" "H" "M" "H"
[4,] "A" "A" "H" "M" "M"
. . . . . .
. . . . . .
. . . . . .
[137,] "M" "M" "H" "A" "A"
[138,] "M" "M" "H" "A" "H"
[139,] "M" "M" "H" "H" "A"
[140,] "M" "M" "H" "H" "H"
## For package arrangements we have:
## arrangements::permutations(names(tbl_v), 5, freq = tbl_v)
Both packages contain algorithms that generate each result without the need for filtering. This approach is much more efficient.
For example, what if we had big_vec <- rep(vec, 8) and wanted all combinations of length 16? Using the filtering approach, one would need to generate all 16-element combinations of a vector of length 64 and then filter them. That is choose(64, 16) = 4.885269e+14 total combinations. That's going to be difficult.
With these two packages, this problem is a breeze.
big_vec <- rep(vec, 8)
tbl_big_v <- table(big_vec)
tbl_big_v
big_vec
A H M
16 32 16
system.time(test_big <- comboGeneral(names(tbl_big_v), 16,
freqs = tbl_big_v))
user system elapsed
0 0 0
dim(test_big)
[1] 153 16
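The 153 can be cross-checked in base R without either package: a multiset combination is fully described by how many A's, H's and M's it contains, so it is enough to count the count-triples that add up to 16. A quick sketch (the variable names are only illustrative):

```r
# Every way to choose how many A's, H's and M's to take, within the available counts
counts <- expand.grid(A = 0:16, H = 0:32, M = 0:16)
n_distinct <- sum(rowSums(counts) == 16)  # selections that use exactly 16 elements
n_distinct
choose(64, 16)   # what the generate-then-filter approach would have to enumerate
```

Comparing the two numbers shows how much work the filtering approach spends on duplicates.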
apply(gtools::combinations(8,5,repeats.allowed = FALSE),2,\(x) vec[x])
does what you want.
I don't know why the package wants distinct values when it is applied to a vector, though. The documentation is unclear on this.
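As far as I can tell, the error comes from the default set = TRUE argument, which reduces v to its unique values before checking that at least n of them remain; with vec only three remain, hence "too few different elements". A sketch under that assumption, passing set = FALSE so that v is used as given (guarded so it only runs if gtools is installed):

```r
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
if (requireNamespace("gtools", quietly = TRUE)) {
  # set = FALSE: do not collapse v to its unique values first
  res <- gtools::combinations(8, 5, vec, set = FALSE)
  print(dim(res))            # positions are treated as distinct
  print(nrow(unique(res)))   # distinct rows after filtering
}
```

This still generates duplicates that have to be filtered afterwards, so it does not scale the way the multiset-aware functions do.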

How to generate stratified permutations in R

I would like to generate different possible permutations with the same frequency as in the input vector. For example, I would like to generate the permutations using the vector x in the below example.
library(gtools)
x <- c('A','A','B')
permutations(2, 3, x, repeats.allowed = T)
It gives the below output.
# [,1] [,2] [,3]
# [1,] "A" "A" "A"
# [2,] "A" "A" "B"
# [3,] "A" "B" "A"
# [4,] "A" "B" "B"
# [5,] "B" "A" "A"
# [6,] "B" "A" "B"
# [7,] "B" "B" "A"
# [8,] "B" "B" "B"
But, I want only permutations having A, B with frequencies 2, 1 respectively. The expected output is:
# [,1] [,2] [,3]
# [1,] "A" "A" "B"
# [2,] "A" "B" "A"
# [3,] "B" "A" "A"
Is there any function available in R?
Note: I do not want to do post-processing of the output to get the expected output as my original input contains 300 elements. It is not recommended to generate factorial(300) number of permutations.
Update: The suggested link provides a nice faster solution but fails when the input vector is doubled (eg: length=20) with the error message:
Error in matrix(NA, nrow = N, ncol = prod(sapply(foo, ncol))) :
invalid 'ncol' value (too large or NA)
Your problem can be reformulated as finding all possible permutations of the frequency vector. Take a look at combinat::permn:
x <- c( 'A', 'A', 'B' )
unique(combinat::permn( x ))
# [[1]]
# [1] "A" "A" "B"
# [[2]]
# [1] "A" "B" "A"
# [[3]]
# [1] "B" "A" "A"
unique is necessary to remove duplicate entries, which is automatically done by gtools::permutations you've been using (through the default set=TRUE argument).
If you need the result in matrix format, as in your original question, pass the output as arguments to rbind using do.call:
do.call( rbind, unique(combinat::permn( x )) )
# [,1] [,2] [,3]
# [1,] "A" "A" "B"
# [2,] "A" "B" "A"
# [3,] "B" "A" "A"
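If you would rather not generate duplicates at all, the distinct orderings can be built directly from the frequencies. A minimal recursive sketch in base R, assuming a hypothetical helper multiset_perms() that takes a named vector of counts with at least one element in total:

```r
# Hypothetical helper: all distinct orderings of a multiset,
# given as a named vector of counts, e.g. c(A = 2, B = 1).
multiset_perms <- function(counts) {
  counts <- counts[counts > 0]
  if (sum(counts) == 1) return(matrix(names(counts), 1, 1))
  out <- NULL
  for (nm in names(counts)) {
    rest <- counts
    rest[nm] <- rest[nm] - 1   # use one copy of `nm` in the first slot
    out <- rbind(out, cbind(nm, multiset_perms(rest)))
  }
  unname(out)
}

multiset_perms(c(A = 2, B = 1))
```

The number of rows is factorial(sum(counts)) / prod(factorial(counts)), so full enumeration is still only feasible for small inputs; for a 300-element vector you would have to sample orderings rather than list them.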

Getting paired permutations in R

Context: I have a list of sports teams called teamNames, and I would like to generate their match-ups for each week. I'm not sure if permutations are even the right approach, but I feel like they would be. What I would ideally like is to pass a vector of team names to a function, and then have it give me a matrix where each row has that vector of team names in a different order, such that if I go through them in pairs, I'll get a unique set of match-ups for each row.
For example if my input is teamNames <- c("a", "b", "c", "d"), I want the output to be a matrix that says:
a b c d
a c b d
a d c b
Edit: Further clarification: in this case, the matrix has given me three "weeks" of matchups. First week: "a" vs. "b" and "c" vs. "d"
Second week: "a" vs. "c" and "b" vs. "d"
Third week: "a" vs. "d" and "b" vs. "c"
The closest I've gotten from reading other questions is to use the permutations function in the gtools package as follows:
permutations(length(teamNames), 2, teamNames)
This generates all the possible match-ups, but what it doesn't do is divide them into sets/weeks. combinations(length(teamNames), 4, teamNames) only gives me one set of matchups.
If I understand correctly, once 2 of the 4 teams are chosen, the remaining two must play each other, so the task is to select 2 out of 4. Permutations are not appropriate because 'a vs b' is the same as 'b vs a'. No extra package is necessary, as the utils package has combn().
> combn(teamNames, 2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "a" "a" "b" "b" "c"
[2,] "b" "c" "d" "c" "d" "d"
The above shows the selection of 2 teams from 4. There is some duplication: selecting a and b is equivalent to selecting c and d. If one of each pair of duplicate cases is dropped, the result can be used to set up a schedule.
Update
@Buckminster - I keep updating the code. In this update, the remaining two teams are added, although there is still duplication. Also, among 4 teams, once 2 are determined, the remaining two are determined as well (similar to solving a system of equations in linear algebra). In other words, I'm not sure why the -1 was given, probably by you.
# Update
teamNames <- c("a", "b", "c", "d")
first <- combn(teamNames, 2, simplify = FALSE)
second <- lapply(first, function(x) teamNames[!teamNames %in% x])
bind <- rbind(do.call(cbind, first), do.call(cbind, second))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "a" "a" "b" "b" "c"
[2,] "b" "c" "d" "c" "d" "d"
[3,] "c" "b" "b" "a" "a" "a"
[4,] "d" "d" "c" "d" "c" "b"
Let me check if duplication can be removed easily.
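One way the duplication could be removed, sketched for exactly four teams only: keep just the pairs that contain the first team. Every week has to include a match for that team, so each remaining split appears once.

```r
teamNames <- c("a", "b", "c", "d")
first <- combn(teamNames, 2, simplify = FALSE)
# Keep only the pairs that contain the first team; each split is then listed once
keep  <- Filter(function(p) teamNames[1] %in% p, first)
# One row per week: the chosen pair followed by the two teams left over
weeks <- t(sapply(keep, function(p) c(p, setdiff(teamNames, p))))
weeks
```

Reading each row in pairs gives the three weeks: a-b and c-d, a-c and b-d, a-d and b-c. With more than four teams the leftover teams would themselves need to be paired up, so this shortcut does not generalize as written.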

Using a sample list as a template for sampling from a larger list with wraparound

Similar to my question at Using a sample list as a template for sampling from a larger list without wraparound, how can I now do this allowing for a wrap-around?
Thus, if I have a vector of letters:
> all <- letters
> all
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
and then I define a reference sample from letters as follows:
> refSample <- c("j","l","m","s")
in which the spacing between elements is 2 (1st to 2nd), 1 (2nd to 3rd) and 6 (3rd to 4th), how can I then select n samples from all whose elements have the same wrap-around spacing as refSample? For example, "a","c","d","j"; "q","s","t","z"; and "r","t","u","a" would be valid samples, but "a","c","d","k" would not.
Again, parameterised for a function is best.
I would have left it as an exercise but here goes --
all <- letters
refSample <- c("j","l","m","s")
pick_matches <- function(n, ref, full, wrap = FALSE) {
iref <- match(ref,full)
spaces <- diff(iref)
tot_space <- sum(spaces)
N <- length( full ) - 1
max_start <- N - tot_space*(1-wrap)
starts <- sample(0:max_start, n, replace = TRUE)
return( sapply( starts, function(s) full[ 1 + cumsum(c(s, spaces)) %% (N+1) ] ) )
}
> set.seed(1)
> pick_matches(5, refSample, all, wrap = FALSE)
[,1] [,2] [,3] [,4] [,5]
[1,] "e" "g" "j" "p" "d"
[2,] "g" "i" "l" "r" "f"
[3,] "h" "j" "m" "s" "g"
[4,] "n" "p" "s" "y" "m"
> pick_matches(5, refSample, all, wrap = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] "x" "y" "r" "q" "b"
[2,] "z" "a" "t" "s" "d"
[3,] "a" "b" "u" "t" "e"
[4,] "g" "h" "a" "z" "k"
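To check a result, the spacing of each sample can be compared with the reference modulo the length of the full vector. A small sketch, assuming a hypothetical checker same_spacing() that is not part of the answer above:

```r
# Hypothetical checker: TRUE if `candidate` has the same wrap-around spacing as `ref`
same_spacing <- function(candidate, ref, full) {
  n <- length(full)
  # %% n maps a negative step (a wrap past the end) back onto 0..n-1
  identical(diff(match(candidate, full)) %% n, diff(match(ref, full)) %% n)
}

refSample <- c("j", "l", "m", "s")
same_spacing(c("r", "t", "u", "a"), refSample, letters)  # TRUE
same_spacing(c("a", "c", "d", "k"), refSample, letters)  # FALSE
```

Applied column by column, e.g. with apply(result, 2, same_spacing, ref = refSample, full = all), it should return TRUE for every column of the output of pick_matches().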

R Question Number of Unique Combinations of A,A,A,A,B,B,B,B,B

I am trying to find a way to get a list in R of all the possible unique permutations of A,A,A,A,B,B,B,B,B.
Combinations were originally thought to be the way to obtain a solution, hence the answers about combinations.
I think this is what you're after. @bill was on the ball with the recommendation of combining unique and combn. We'll also use the apply family to generate ALL of the combinations. Since unique removes duplicate rows, we need to transpose the results from combn before applying unique. We then transpose them back before printing so that each column represents a unique answer.
#Daters
x <- c(rep("A", 4), rep("B",5))
#Generates a list with ALL of the combinations
zz <- sapply(seq_along(x), function(y) combn(x,y))
#Filter out all the duplicates
sapply(zz, function(z) t(unique(t(z))))
Which returns:
[[1]]
[,1] [,2]
[1,] "A" "B"
[[2]]
[,1] [,2] [,3]
[1,] "A" "A" "B"
[2,] "A" "B" "B"
[[3]]
[,1] [,2] [,3] [,4]
[1,] "A" "A" "A" "B"
[2,] "A" "A" "B" "B"
[3,] "A" "B" "B" "B"
...
EDIT Since the question is about permutations and not combinations, the answer above is not that useful. This post outlines a function to generate the unique permutations given a set of parameters. I have no idea if it could be improved upon, but here's one approach using that function:
fn_perm_list <-
function (n, r, v = 1:n)
{
if (r == 1)
matrix(v, n, 1)
else if (n == 1)
matrix(v, 1, r)
else {
X <- NULL
for (i in 1:n) X <- rbind(X, cbind(v[i], fn_perm_list(n -
1, r - 1, v[-i])))
X
}
}
zz <- fn_perm_list(9, 9)
#Turn into character matrix. This currently does not generalize well, but gets the job done
zz <- ifelse(zz <= 4, "A", "B")
#Returns 126 rows as indicated in comments
unique(zz)
There's no need to generate permutations and then pick out the unique ones.
Here's a much simpler way (and much, much faster as well): To generate all permutations of 4 A's and 5 B's, we just need to enumerate all possible ways of placing 4 A's among 9 possible locations. This is simply a combinations problem. Here's how we can do this:
x <- rep('B',9) # vector of 9 B's
a_pos <- combn(9,4) # all possible ways to place 4 A's among 9 positions
perms <- apply(a_pos, 2, function(p) replace(x,p,'A')) # all desired permutations
Each column of the 9x126 matrix perms is a unique permutation of 4 A's and 5 B's:
> dim(perms)
[1] 9 126
> perms[,1:4] ## look at first few columns
[,1] [,2] [,3] [,4]
[1,] "A" "A" "A" "A"
[2,] "A" "A" "A" "A"
[3,] "A" "A" "A" "A"
[4,] "A" "B" "B" "B"
[5,] "B" "A" "B" "B"
[6,] "B" "B" "A" "B"
[7,] "B" "B" "B" "A"
[8,] "B" "B" "B" "B"
[9,] "B" "B" "B" "B"
