Monte Carlo Simulation with contingency table in R - r

I want to generate 1000 tables from an existing contingency table and estimate the exact p-value by Monte Carlo. To do so, I know that the contingency tables have to satisfy the condition S = {p(T) ≤ p(observed table)}. Then, I have to compute the probabilities of the unique tables among the 1000 generated.
I have started part of the code, but I am stuck on generating tables that respect the condition mentioned above. I am looking for a command that can do so without using fisher.test.
Here is my code, where infarctus.tables corresponds to the generated tables from a contingency table and infarctus is my data.
infarctus <- matrix(c(6, 4, 2, 0, 7, 4, 1, 0, 2, 2, 3, 5, 2, 5, 3, 2), nrow = 4, byrow = TRUE)
infarctus.tables ### how do I generate this?
##probabilities of unique tables among the 1000 generated
tables.prob= lapply( unique(infarctus.tables),
FUN=function(x) { exp( a1 - sum(lgamma( x + 1) ) ) })
hist( unlist(tables.prob))
obs.prob= which(unlist(tables.prob) <= 0.0164141415 )
Any idea on how to do so?

Base R includes the function r2dtable. From its documentation:
Description
Generate random 2-way tables with given marginals using Patefield's algorithm.
infarctus <- matrix(c(6, 4, 2, 0, 7, 4, 1, 0, 2, 2, 3, 5, 2, 5, 3, 2), nrow = 4, byrow = TRUE)
rsums <- rowSums(infarctus)
csums <- colSums(infarctus)
n <- 5
r2dtable(n, rsums, csums)
#> [[1]]
#> [,1] [,2] [,3] [,4]
#> [1,] 3 5 4 0
#> [2,] 6 3 1 2
#> [3,] 3 4 3 2
#> [4,] 5 3 1 3
#>
#> [[2]]
#> [,1] [,2] [,3] [,4]
#> [1,] 8 3 1 0
#> [2,] 1 6 2 3
#> [3,] 4 3 4 1
#> [4,] 4 3 2 3
#>
#> [[3]]
#> [,1] [,2] [,3] [,4]
#> [1,] 4 3 3 2
#> [2,] 5 4 2 1
#> [3,] 6 1 3 2
#> [4,] 2 7 1 2
#>
#> [[4]]
#> [,1] [,2] [,3] [,4]
#> [1,] 4 5 2 1
#> [2,] 4 6 1 1
#> [3,] 7 1 3 1
#> [4,] 2 3 3 4
#>
#> [[5]]
#> [,1] [,2] [,3] [,4]
#> [1,] 3 4 4 1
#> [2,] 5 4 2 1
#> [3,] 7 2 1 2
#> [4,] 2 5 2 3
Created on 2022-10-02 with reprex v2.0.2
To generate 1000 matrices with the given marginals, run
n <- 1000L
infarctus.tables <- r2dtable(n, rsums, csums)
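With the simulated tables in hand, one way to estimate the Monte Carlo p-value is to take the share of simulated tables whose probability is no larger than that of the observed table. The following is a minimal sketch under that assumption. The helper log_prob, the name log_const, the seed, and the tolerance are illustrative choices, not part of the original post; the probability used is the standard hypergeometric probability of a table with fixed margins.

```r
# Observed table from the question
infarctus <- matrix(c(6, 4, 2, 0, 7, 4, 1, 0, 2, 2, 3, 5, 2, 5, 3, 2),
                    nrow = 4, byrow = TRUE)
rsums <- rowSums(infarctus)
csums <- colSums(infarctus)

# Log of the constant shared by all tables with these margins:
# log( prod(r_i!) * prod(c_j!) / n! )
log_const <- sum(lgamma(rsums + 1)) + sum(lgamma(csums + 1)) -
  lgamma(sum(infarctus) + 1)

# Hypothetical helper: log-probability of one table given the margins
log_prob <- function(x) log_const - sum(lgamma(x + 1))

set.seed(1)   # arbitrary seed, only for reproducibility
B <- 1000L
infarctus.tables <- r2dtable(B, rsums, csums)

sim_lp <- vapply(infarctus.tables, log_prob, numeric(1))
obs_lp <- log_prob(infarctus)

# Share of simulated tables at most as probable as the observed one
# (the small tolerance guards against floating-point ties)
p_mc <- mean(sim_lp <= obs_lp + 1e-7)
p_mc
```

Because r2dtable already draws tables according to their probability under fixed margins, this sketch counts every draw rather than only the unique tables.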

Related

How to find the distribution of pairs in a 2-column matrix in R?

Suppose I have a set of pairs that I represent in a 2-column matrix like this:
> myMatrix
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 2 6
As you can see, the pair (2,6) appears twice. I need a way to retrieve the count of each pair, like this:
[,1] [,2] [,3]
[1,] 1 5 1
[2,] 2 6 2
[3,] 3 7 1
Is there any solution for this?
A base R alternative:
> x <- data.frame(table(myMatrix[, 1], myMatrix[, 2]))
> subset(x, Freq!=0)
Var1 Var2 Freq
1 1 5 1
5 2 6 2
9 3 7 1
One method is count from dplyr, after converting to a data.frame:
library(dplyr)
as.data.frame(myMatrix) %>%
count(across(everything())) %>%
as.matrix %>%
`dimnames<-`(., NULL)
Output:
[,1] [,2] [,3]
[1,] 1 5 1
[2,] 2 6 2
[3,] 3 7 1
Data:
myMatrix <- structure(c(1, 2, 3, 2, 5, 6, 7, 6), .Dim = c(4L, 2L))
With the aggregate function:
aggregate(myMatrix[, 1], as.data.frame(myMatrix), length)
#> V1 V2 x
#> 1 1 5 1
#> 2 2 6 2
#> 3 3 7 1

sample with replacement but constrain the max frequency of each member to be drawn

Is it possible to extend the sample function in R to not return more than say 2 of the same element when replace = TRUE?
Suppose I have a list:
l = c(1,1,2,3,4,5)
To sample 3 elements with replacement, I would do:
sample(l, 3, replace = TRUE)
Is there a way to constrain its output so that only a maximum of 2 of the same elements are returned? So (1,1,2) or (1,3,3) is allowed, but (1,1,1) or (3,3,3) is excluded?
set.seed(0)
The basic idea is to convert sampling with replacement to sampling without replacement.
ll <- unique(l) ## unique values
#[1] 1 2 3 4 5
pool <- rep.int(ll, 2) ## replicate each unique so they each appear twice
#[1] 1 2 3 4 5 1 2 3 4 5
sample(pool, 3) ## draw 3 samples without replacement
#[1] 4 3 5
## replicate it a few times
## each column is a sample after the "simplification" done by `replicate`
replicate(5, sample(pool, 3))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 4 2 2 3
#[2,] 4 5 1 2 5
#[3,] 2 1 2 4 1
If you want different values to appear up to different numbers of times, you can do, for example:
pool <- rep.int(ll, c(2, 3, 3, 4, 1))
#[1] 1 1 2 2 2 3 3 3 4 4 4 4 5
## draw 9 samples; replicate 5 times
oo <- replicate(5, sample(pool, 9))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 1 4 3 2
# [2,] 2 2 4 4 1
# [3,] 4 4 1 1 1
# [4,] 4 2 3 2 5
# [5,] 1 4 2 5 2
# [6,] 3 4 3 3 3
# [7,] 1 4 2 2 2
# [8,] 4 1 4 3 3
# [9,] 3 3 2 2 4
We can call tabulate on each column to count the frequency of 1, 2, 3, 4, 5:
## set `nbins` in `tabulate` so frequency table of each column has the same length
apply(oo, 2L, tabulate, nbins = 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2 2 1 1 2
#[2,] 1 2 3 3 3
#[3,] 2 1 2 3 2
#[4,] 3 4 3 1 1
#[5,] 1 0 0 1 1
The counts in all columns meet the frequency upper bounds c(2, 3, 3, 4, 1) we have set.
Would you explain the difference between rep and rep.int?
rep.int is not the "integer" method for rep. It is just a faster primitive function with less functionality than rep. You can get more details of rep, rep.int and rep_len from the doc page ?rep.
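A short illustration of the difference, using a throwaway vector x that is not part of the original answer:

```r
x <- c(1, 2, 3)

# Both functions accept `times`, either a scalar or one count per element
a <- rep.int(x, 2)            # 1 2 3 1 2 3
b <- rep.int(x, c(2, 1, 3))   # 1 1 2 3 3 3

# `each` and `length.out` are only available in rep()
d <- rep(x, each = 2)         # 1 1 2 2 3 3
e <- rep(x, length.out = 5)   # 1 2 3 1 2

# rep_len() is the fast counterpart of rep(x, length.out = n)
f <- rep_len(x, 5)            # 1 2 3 1 2
```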

Minimum of cells in two matrices within a moving kernel

I have two matrices m1 and m2.
m1 <- matrix(1:16, ncol = 4)
m2 <- matrix(16:1, ncol = 4)
# > m1
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
# > m2
# [,1] [,2] [,3] [,4]
# [1,] 16 12 8 4
# [2,] 15 11 7 3
# [3,] 14 10 6 2
# [4,] 13 9 5 1
I want to find the minimum of the two matrices for each cell within a moving 3x3 kernel. The outer margins should be ignored, i.e. they can be filled with NAs, and the min function should then use na.rm = TRUE. The result should look like this:
# > m3
# [,1] [,2] [,3] [,4]
# [1,] 1 1 3 3
# [2,] 1 1 2 2
# [3,] 2 2 1 1
# [4,] 3 3 1 1
I have already tried a combination of pmin{base} and runmin{caTools} like this:
pmin(runmin(m1, 3, endrule = "keep"),
runmin(m2, 3, endrule = "keep"))
However, this did not work, probably because
"If x is a matrix than each column will be processed separately."
(from ?runmin)
Is there a package that performs such operations, or can this be done with apply?
Here is a base R approach:
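# cell-wise minimum of the two matrices
m = pmin(m1, m2)
# all (row, column) index pairs, in column-major order
grid = expand.grid(seq(nrow(m)), seq(ncol(m)))
# minimum over the 3x3 window around each cell, clipped at the borders
x = apply(grid, 1, function(u) {
  min(m[max(1, u[1] - 1):min(nrow(m), u[1] + 1),
        max(1, u[2] - 1):min(ncol(m), u[2] + 1)])
})
dim(x) = dim(m)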
#> x
# [,1] [,2] [,3] [,4]
#[1,] 1 1 3 3
#[2,] 1 1 2 2
#[3,] 2 2 1 1
#[4,] 3 3 1 1
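The same idea can be wrapped in a function with an adjustable window size. This is only a sketch: the helper name moving_min and its argument k (an odd window width) are hypothetical, and it uses plain loops rather than apply for readability.

```r
# Hypothetical helper: minimum over a k x k window (k odd), clipped at the edges
moving_min <- function(m, k = 3L) {
  h <- k %/% 2L                        # half-width of the window
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1L, i - h):min(nrow(m), i + h)
      cols <- max(1L, j - h):min(ncol(m), j + h)
      out[i, j] <- min(m[rows, cols])
    }
  }
  out
}

m1 <- matrix(1:16, ncol = 4)
m2 <- matrix(16:1, ncol = 4)
m3 <- moving_min(pmin(m1, m2), k = 3L)
m3
```

Clipping the window at the borders plays the role of NA padding with na.rm = TRUE, since cells outside the matrix are never included.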

cumsum on matrices using R

I would like to compute the cumulative sum of several matrices, keeping the intermediate steps. If we consider:
A <- structure(c(1, 2, 3, 2, 3, 1, 4, 1, 2), .Dim = c(3, 3))
# [,1] [,2] [,3]
# [1,] 1 2 4
# [2,] 2 3 1
# [3,] 3 1 2
B <- structure(c(6, 1, 9, 6, 3, 7, 3, 2, 8), .Dim = c(3, 3))
# [,1] [,2] [,3]
# [1,] 6 6 3
# [2,] 1 3 2
# [3,] 9 7 8
C <- structure(c(1, 1, 2, 5, 3, 3, 3, 9, 1), .Dim = c(3, 3))
# [,1] [,2] [,3]
# [1,] 1 5 3
# [2,] 1 3 9
# [3,] 2 3 1
I would like the following results:
[,1] [,2] [,3]
[1,] 1 2 4
[2,] 2 3 1
[3,] 3 1 2
[,1] [,2] [,3]
[1,] 7 8 7
[2,] 3 6 3
[3,] 12 8 10
[,1] [,2] [,3]
[1,] 8 13 10
[2,] 4 9 12
[3,] 14 11 11
with all intermediate steps. I could do this with a for loop, but it is slow for big matrices. How can I do this with apply?
This is a perfect job for Reduce:
Reduce("+", list(A,B,C), accumulate=TRUE)
[[1]]
[,1] [,2] [,3]
[1,] 1 2 4
[2,] 2 3 1
[3,] 3 1 2
[[2]]
[,1] [,2] [,3]
[1,] 7 8 7
[2,] 3 6 3
[3,] 12 8 10
[[3]]
[,1] [,2] [,3]
[1,] 8 13 10
[2,] 4 9 12
[3,] 14 11 11
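As a quick check of what accumulate = TRUE returns, the sketch below confirms that the k-th element is the sum of the first k matrices. The names steps and arr are illustrative, and stacking the steps with simplify2array is an optional extra rather than part of the original answer.

```r
A <- structure(c(1, 2, 3, 2, 3, 1, 4, 1, 2), .Dim = c(3, 3))
B <- structure(c(6, 1, 9, 6, 3, 7, 3, 2, 8), .Dim = c(3, 3))
C <- structure(c(1, 1, 2, 5, 3, 3, 3, 9, 1), .Dim = c(3, 3))

steps <- Reduce(`+`, list(A, B, C), accumulate = TRUE)

# The k-th element is the running sum of the first k matrices
ok2 <- identical(steps[[2]], A + B)
ok3 <- identical(steps[[3]], A + B + C)

# Optionally stack the steps into a 3 x 3 x 3 array (one slice per step)
arr <- simplify2array(steps)
dim(arr)
```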

Multiplication of a matrix by a vector in R

I have a matrix m and a vector v. I would like to multiply the matrix m by the vector v and get a matrix with the same dimensions as m, i.e. multiply the first element of m by the first element of v, and so on. How can I do this in R?
m = matrix(c(1, 2, 3, 4, 5), ncol=1)
v = c(1, 2, 3, 4, 5)
> z
[,1]
[1,] 1
[2,] 4
[3,] 9
[4,] 16
[5,] 25
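The %*% operator computes the matrix product (here a 5x5 outer product, since v is treated as a row vector), whereas the element-wise product you want is obtained with the * operator: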
> m = matrix(c(1, 2, 3, 4, 5), ncol=1)
> v = c(1, 2, 3, 4, 5)
> m %*% v
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 4 6 8 10
[3,] 3 6 9 12 15
[4,] 4 8 12 16 20
[5,] 5 10 15 20 25
> m * v
[,1]
[1,] 1
[2,] 4
[3,] 9
[4,] 16
[5,] 25
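The element-wise form also works when m has several columns, because * recycles v down each column. A minimal sketch, using a 5 x 2 matrix that is not from the original question:

```r
m <- matrix(1:10, ncol = 2)   # 5 x 2 matrix
v <- c(1, 2, 3, 4, 5)

# `*` recycles v down each column, so row i is multiplied by v[i]
z <- m * v

# The same operation spelled out with sweep() over rows (MARGIN = 1)
z2 <- sweep(m, 1, v, `*`)
```

Both results keep the dimensions of m. The sweep() form states explicitly which margin the vector applies to.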
