Sampling from a contingency table

I've got as far as the code below in writing a function that samples from a contingency table, with probability proportional to the frequencies in the cells.
It uses expand.grid and then table to get back to a table of the original size. This works fine as long as the sample size is large enough that no category is completely missing. Otherwise table returns a table of smaller dimensions than the original one.
FunSample<- function(Full, n) {
Frame <- expand.grid(lapply(dim(Full), seq))
table(Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
Full<-array(c(1,2,3,4), dim=c(2,2,2))
FunSample(Full, 100) # OK
FunSample(Full, 1) # not OK, I want it to still have dim=c(2,2,2)!
My brain has stopped working, I know it has to be a small tweak to get it back on track!?

Drawing n observations with probabilities proportional to the cell frequencies of a crosstab is a multinomial draw, so you can use rmultinom and reset the dimension on the output. This should give a substantial performance boost and cut down on the code you need to maintain.
> X <- rmultinom(1, 500, Full)
> dim(X) <- dim(Full)
> X
, , 1
[,1] [,2]
[1,] 18 92
[2,] 45 92
, , 2
[,1] [,2]
[1,] 28 72
[2,] 49 104
> X2 <-rmultinom(1, 4, Full)
> dim(X2) <- dim(Full)
> X2
, , 1
[,1] [,2]
[1,] 0 1
[2,] 0 0
, , 2
[,1] [,2]
[1,] 0 1
[2,] 1 1
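If you want this as a drop-in function, a minimal sketch could look like the following. The name sample_table is made up for illustration; it relies on rmultinom normalising prob internally, so the raw counts can be passed as they are.

```r
# Hypothetical wrapper name; not part of the original answer.
sample_table <- function(full, n) {
  # rmultinom() normalises `prob` internally, so raw counts can be passed
  draw <- rmultinom(1, size = n, prob = full)
  # rmultinom() returns a length(full) x 1 matrix; restore the array shape
  array(draw, dim = dim(full), dimnames = dimnames(full))
}

Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
set.seed(1)
s <- sample_table(Full, 1)
dim(s)   # 2 2 2
sum(s)   # 1
```

The dimensions stay at 2 x 2 x 2 however small n is, because every cell is always present in the multinomial draw.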

If you don't want table() to "drop" missing combinations, you need to force the columns of Frame to be factors:
FunSample <- function(Full, n) {
Frame <- as.data.frame( lapply( expand.grid(lapply(dim(Full), seq)), factor) )
table( Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
> dim( FunSample(Full, 1))
[1] 2 2 2
> dim( FunSample(Full, 100))
[1] 2 2 2

You could use tabulate instead of table; it works on integer-valued vectors, as you have here. Pass nbins so that the count vector always has one entry per cell, even when the highest-numbered cells are never sampled. You could also get the output into an array by using array directly, just like when you created the original data.
FunSample<- function(Full, n) {
samp <- sample(1:length(Full), n, prob = Full, replace = TRUE)
array(tabulate(samp, nbins = length(Full)), dim=dim(Full))
}
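One detail worth checking with the tabulate approach is the length of its result: by default it stops at the largest value it sees, and array() would then recycle the short count vector. The sketch below shows the difference; FunSampleTab is a made-up name used only to keep this example separate.

```r
# tabulate() stops counting at the largest value present unless nbins is set
tabulate(3)              # 0 0 1
tabulate(3, nbins = 8)   # 0 0 1 0 0 0 0 0

# Hypothetical variant name, used only for this illustration
FunSampleTab <- function(Full, n) {
  samp <- sample(length(Full), n, prob = Full, replace = TRUE)
  # nbins guarantees one count per cell, so array() has nothing to recycle
  array(tabulate(samp, nbins = length(Full)), dim = dim(Full))
}

Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
sum(FunSampleTab(Full, 1))   # 1
```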

Related

Replacing pair of element of symmetric matrix with NA

I have a positive definite symmetric matrix. Pasting the matrix generated using the following code (genPositiveDefMat comes from the clusterGeneration package):
library(clusterGeneration)
set.seed(123)
m <- genPositiveDefMat(
dim = 3,
covMethod = "unifcorrmat",
rangeVar = c(0,1) )
x <- as.matrix(m$Sigma)
diag(x) <- 1
x
#Output
[,1] [,2] [,3]
[1,] 1.0000000 -0.2432303 -0.4110525
[2,] -0.2432303 1.0000000 -0.1046602
[3,] -0.4110525 -0.1046602 1.0000000
Now, I want to run the matrix through iterations and in each iteration I want to replace the symmetric pair with NA. For example,
Iteration 1:
x[1,2] = x[2,1] <- NA
Iteration2:
x[1,3] = x[3,1] <- NA
and so on....
My idea was to check using a for loop
Prototype:
for( r in 1:nrow(x)
for( c in 1:ncol(x)
if x[r,c]=x[c,r]<-NA
else
x[r,c]
The issue with my code is for row 1 and column 1, the values are equal hence it sets to 0 (which is wrong). Also, the moment it is not NA it comes out of the loop.
Appreciate any help here.
Thanks

If you need the replacement done iteratively, you can use the indexes of the values selected by upper.tri(mat)/lower.tri(mat), where mat is your symmetric matrix, to do the replacements pair by pair. That will allow you to pass the results to a function before/after each replacement, e.g.:
idx <- which(lower.tri(mat), arr.ind=TRUE)
sel <- cbind(
replace(mat, , seq_along(mat))[ idx ],
replace(mat, , seq_along(mat))[ idx[,2:1] ]
)
# [,1] [,2]
#[1,] 2 4 ##each row represents the lower/upper pair
#[2,] 3 7
#[3,] 6 8
for( i in seq_len(nrow(sel)) ) {
mat[ sel[i,] ] <- NA
print(mean(mat, na.rm=TRUE))
}
#[1] 0.2812249
#[1] 0.5581359
#[1] 1
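A plainer variant of the same idea, in case the linear-index trick is hard to follow: loop over the (row, col) positions below the diagonal and blank both mirror elements. This is only a sketch, and the matrix values are made up rather than taken from the question.

```r
# Small symmetric example matrix (made-up values, not the one from the question)
x <- matrix(c( 1.0, -0.2, -0.4,
              -0.2,  1.0, -0.1,
              -0.4, -0.1,  1.0), nrow = 3, byrow = TRUE)

# One row per off-diagonal pair: (row, col) positions below the diagonal
pairs <- which(lower.tri(x), arr.ind = TRUE)

for (k in seq_len(nrow(pairs))) {
  i <- pairs[k, "row"]
  j <- pairs[k, "col"]
  x[i, j] <- NA   # element below the diagonal
  x[j, i] <- NA   # its mirror image above the diagonal
  # ... use x here, e.g. pass it to another function ...
}
x
```

As in the loop above, the NAs accumulate from one iteration to the next. If each iteration should blank only a single pair, assign into a fresh copy of the matrix inside the loop instead.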

Finding cumulative sum and then average the values in R

I want to compute the cumulative sum over the first (n-1) columns (for a matrix with n columns) and then average the values. I created a sample matrix for this task. I have the following matrix
ma = matrix(c(1:10), nrow = 2, ncol = 5)
ma
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I wanted to find the following
ans = matrix(c(1,2,2,3,3,4,4,5), nrow = 2, ncol = 4)
ans
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
The following is my R function.
ColCumSumsAve <- function(y){
for(i in seq_len(dim(y)[2]-1)) {
y[,i] <- cumsum(y[,i])/i
}
}
ColCumSumsAve(ma)
However, when I run the above function it's not producing any output. Are there any mistakes in the code?
Thanks.

There were several mistakes.
Solution
This is what I tested and what works:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
res[, 1:(ncol(m)-1)]
}
Test it with:
> colCumSumAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
which is correct.
Explanation:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum)) # row-wise cumulative sums
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
# This is the trickiest part.
# `/` recycles the vector `1:ncol(m)` down the columns of a matrix,
# because a matrix is stored column by column.
# To divide column j of `csum` by j, I first transpose with `t()`,
# so that the columns of `csum` become rows.
# The result then comes out transposed as well, so I apply `t()`
# once more at the end to restore the orientation of the input.
# `Reduce` with `/` over the list of `t(csum)` and `1:ncol(m)`
# amounts to `t(csum) / 1:ncol(m)`, i.e. each `csum` value is
# divided by its column position.
res[, 1:(ncol(m)-1)] # drop the last column for the answer.
# This could of course be done right at the beginning,
# which saves calculating the values in the last column.
# That division is vectorized, though, so it is not the slow step;
# `apply` and `Reduce` are more likely to limit the speed.
}
Well, okay, I could do then:
colCumSumAve <- function(m) {
csum <- t(apply(X=m[, 1:(ncol(m)-1)], MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(csum))))
}
or:
colCumSumAve <- function(m) {
m <- m[, 1:(ncol(m)-1)] # remove last column
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
This is actually the more optimized solution, then.
Original Function
Your original function makes only assignments in the for-loop and doesn't return anything.
So I copied first your input into a res, processed it with your for-loop and then returned res.
ColCumSumsAve <- function(y){
res <- y
for(i in seq_len(dim(y)[2]-1)) {
res[,i] <- cumsum(y[,i])/i
}
res
}
However, this gives:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1.5 1.666667 1.75 9
[2,] 3 3.5 3.666667 3.75 10
The problem is that cumsum(y[,i]) runs down the rows within column i, whereas you want the running sum across the columns of each row. (Applied to a whole matrix, cumsum treats it like one long vector and also goes through it column by column.)
Corrected Original Function
After some fiddling, I realized that the correct solution is:
ColCumSumsAve <- function(y){
res <- matrix(NA, nrow(y), ncol(y)-1)
# create empty matrix with the dimensions of y minus last column
for (i in 1:(nrow(y))) { # go through rows
for (j in 1:(ncol(y)-1)) { # go through columns
res[i, j] <- sum(y[i, 1:j])/j # for each position do this
}
}
res # return `res`ult by calling it at the end!
}
with the testing:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Note: dim(y)[2] is ncol(y), and dim(y)[1] is nrow(y).
Also, 1:n is shorter than seq_len(n), but seq_len(n) is the safer choice when n can be 0, because 1:0 yields c(1, 0).
Note: My solution given first will be faster, since it uses apply, vectorized cumsum and Reduce; for-loops in R are slower.
Late Note: Not so sure that the first solution is faster. Since R-3.x it seems that for loops are faster. Reduce will be the speed-limiting function and can sometimes be incredibly slow.
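For comparison, the transpose-and-divide step can also be written with sweep(), which divides along a chosen margin directly. This is an alternative sketch rather than the code from the solution above.

```r
ma <- matrix(1:10, nrow = 2, ncol = 5)

# Running sums along each row; apply() returns them column-wise, hence t()
csum <- t(apply(ma, 1, cumsum))

# Divide column j by j (margin 2 = columns), then drop the last column
res <- sweep(csum, 2, seq_len(ncol(csum)), "/")[, -ncol(csum)]
res
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    2    3    4    5
```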

k <- t(apply(ma,1,cumsum))[,-ncol(ma)]
for (i in 1:ncol(k)){
k[,i] <- k[,i]/i
}
k
This should work.

All you need is rowMeans:
nc <- 4
cbind(ma[,1],sapply(2:nc,function(x) rowMeans(ma[,1:x])))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5

Here's how I did it
> t(apply(ma, 1, function(x) cumsum(x) / 1:length(x)))[,-NCOL(ma)]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
This applies the cumsum function row-wise to the matrix ma and then divides by the correct length to get the average (cumsum(x) and 1:length(x) will have the same length). Then simply transpose with t and remove the last column with [,-NCOL(ma)].
The reason why there is no output from your function is because you aren't returning anything. You should end the function with return(y) or simply y as Marius suggested. Regardless, your function doesn't seem to give you the correct response anyway.

Multiply a matrix' columns by its columns

I have a 4x100 matrix where I would like to multiply column 1 with row 1 of its transpose, and so on, and store these matrices somewhere so that I can take the sum of these new matrices later on.
I really don't know where to start, because the column-row multiplication gives me 4x4 matrices, so I cannot store them in a matrix.
data:
mm num[1:4,1:100]
mm_t num[1:100,1:4]
I'm thinking of creating a list in some way
list1=list()
for(i in 1:100){
list1[i] <- mm[,i]%*%mm_t[i,]
}
but I think I need some more indices, because this just leaves me with a number in each entry.

First, your description of the data is not clear. Second, are you trying to multiply each value by itself, or to do matrix multiplication?
We create a 4x100 matrix and its transpose:
mm <- matrix(1:400, nrow = 4, ncol = 100)
mm.t <- t(mm)
Then we can do the matrix multiplication (which is what you did, and you get a 4 x 4 matrix from the definition of matrix multiplication https://www.wikiwand.com/en/Matrix_multiplication)
If we want to multiply each element by itself (so mm[1,1] by mm[1,1]) then:
mm * mm
This will result in 4x100 matrix where each value is the square of the original value.
If we want the matrix product of each column with itself, then:
sapply(1:100, function(x) {
mm[, x] %*% mm[, x]
})
This results in 100 values: because R drops the dimensions of a single column, each one is the dot product (sum of squares) of a length-4 vector with itself, not a 4x4 matrix.

Let's start with some sample data. Please get in the habit of including things like this in your question:
nr = 4
nc = 100
set.seed(47)
mm = matrix(runif(nr * nc), nrow = nr)
Here's a working answer, very similar to your attempt:
result = list()
for (i in 1:ncol(mm)) result[[i]] = mm[, i] %*% t(mm[, i])
result[1:2]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0.9544547 0.3653018 0.7439585 0.8035430
# [2,] 0.3653018 0.1398132 0.2847378 0.3075428
# [3,] 0.7439585 0.2847378 0.5798853 0.6263290
# [4,] 0.8035430 0.3075428 0.6263290 0.6764924
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 0.3289532 0.3965557 0.2231443 0.2689613
# [2,] 0.3965557 0.4780511 0.2690022 0.3242351
# [3,] 0.2231443 0.2690022 0.1513691 0.1824490
# [4,] 0.2689613 0.3242351 0.1824490 0.2199103
As to why yours didn't work, we can experiment and see that indeed we get a number rather than a matrix. The reason is that when you subset a single row or column of a matrix, the dimensions are "dropped" and it is coerced to a plain vector. And when you matrix multiply two vectors, you get their dot product.
mmt = t(mm)
mm[, 1] %*% mmt[1, ]
# [,1]
# [1,] 2.350646
dim(mm[, 1])
# NULL
dim(mmt[1, ])
# NULL
We can avoid this by specifying drop = FALSE in the subset code
dim(mmt[1, , drop = FALSE])
# [1] 1 4
And thus slightly modify your attempt, just adding drop = FALSE will make it work.
res2 = list()
for (i in 1:ncol(mm)) res2[[i]] = mm[, i] %*% mmt[i, , drop = FALSE]
identical(result, res2)
# [1] TRUE
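Since the stated goal is to sum these matrices later on, it may help to know that the sum of the per-column outer products equals the single matrix product mm %*% t(mm). The sketch below checks this on the same kind of sample data; it uses tcrossprod(), which computes x %*% t(x).

```r
set.seed(47)
mm <- matrix(runif(4 * 100), nrow = 4)

# One 4 x 4 outer product per column, as in the loop above
result <- lapply(seq_len(ncol(mm)), function(i) tcrossprod(mm[, i]))

# Adding them up one by one ...
total <- Reduce(`+`, result)

# ... matches the single matrix product mm %*% t(mm)
all.equal(total, tcrossprod(mm))
```

So if only the sum is needed, the list of 100 matrices can be skipped altogether.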

replace diagonal elements in an array

Does anyone know a neat/efficient way to replace diagonal elements in array, similar to the use of diag(x) <- value for a matrix? In other words something like this:
> m<-array(1:27,c(3,3,3))
> for(k in 1:3){
+ diag(m[,,k])<-5
+ }
> m
, , 1
[,1] [,2] [,3]
[1,] 5 4 7
[2,] 2 5 8
[3,] 3 6 5
, , 2
[,1] [,2] [,3]
[1,] 5 13 16
[2,] 11 5 17
[3,] 12 15 5
, , 3
[,1] [,2] [,3]
[1,] 5 22 25
[2,] 20 5 26
[3,] 21 24 5
but without the use of a for loop (my arrays are pretty large and this manipulation will already be within a loop).
Many thanks.

Try this:
with(expand.grid(a = 1:3, b = 1:3), replace(m, cbind(a, a, b), 5))
EDIT:
The question asked for neat/efficient but, of course, those are not the same thing. The one liner here is compact and loop-free but if you are looking for speed I think you will find that the loop in the question is actually the fastest of all the answers.

You can use the following function for that, provided you have only 3 dimensions in your array. You can generalize to more dimensions based on this code, but I'm too lazy to do that for you ;-)
`arraydiag<-` <- function(x,value){
dims <- dim(x)
id <- seq_len(dims[1]) +
dims[2]*(seq_len(dims[2])-1)
id <- outer(id,(seq_len(dims[3])-1)*prod(dims[1:2]),`+`)
x[id] <- value
dim(x) <- dims
x
}
This works like :
m<-array(1:36,c(3,3,4))
arraydiag(m)<-NA
m
Note that, contrary to the diag() function, this function cannot deal with matrices that are not square. You can look at the source code of diag() to find out how to adapt this code so that it does.

diagArr <-
function (dim)
{
n <- dim[2]
if(dim[1] != n) stop("expecting first two dimensions to be equal")
d <- seq(1, n*n, by=n+1)
as.vector(outer(d, seq(0, by=n*n, length.out=prod(dim[-1:-2])), "+"))
}
m[diagArr(dim(m))] <- 5
This is written with the intention that it works for dimensions higher than 3 but I haven't tested it in that case. Should be okay though.
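A quick way to check the higher-dimensional case is to compare against explicit loops on a small 4-D array. The sketch below repeats the function so that it runs on its own; the test array and the names fast and slow are made up for the check.

```r
diagArr <- function(dim) {
  n <- dim[2]
  if (dim[1] != n) stop("expecting first two dimensions to be equal")
  d <- seq(1, n * n, by = n + 1)
  as.vector(outer(d, seq(0, by = n * n, length.out = prod(dim[-1:-2])), "+"))
}

# 4-D test array: 3 x 3 slices indexed by two trailing dimensions
a <- array(seq_len(3 * 3 * 2 * 2), c(3, 3, 2, 2))
fast <- a
fast[diagArr(dim(a))] <- 0L

# Reference result from explicit loops over every 3 x 3 slice
slow <- a
for (k in 1:2) for (l in 1:2) diag(slow[, , k, l]) <- 0L

identical(fast, slow)
```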

Replacing every row in a matrix

I've got a matrix (mat1), say 100 rows and 100 columns; I want to create another matrix where every row is the same as the 1st row in mat1 (except that I want to keep the 1st col as the original values)
I've managed to do this using a loop:
mat2 <- mat1
for(i in 1:nrow(mat1))
{
mat2[i,2:ncol(mat2)] <- mat1[1,2:ncol(mat1)]
}
this works and produces the result I expect; however, I'd have thought there should be a way to do it without a loop; I've tried:
mat2 <- mat1
mat2[c(2:100),2:ncol(mat2)] <- mat1[1,2:ncol(mat1)]
Can someone point out my error?!
Thanks,
Chris

The problem is the way R fills matrices, by columns. Here is a simple example that illustrates this:
mat1 <- matrix(1:9, ncol = 3)
mat2 <- matrix(1:9, ncol = 3)
mat2[-1, -1] <- mat1[1, -1]
> mat2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 4 4
[3,] 3 7 7
mat1[1, -1] is the vector 4,7, which you can see that R has used to fill the bit of mat2 column-wise. You wanted a row-wise operation.
One solution is to replicate the replacement vector as many times as is required:
> mat2[-1, -1] <- rep(mat1[1, -1], each = nrow(mat1)-1)
> mat2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 4 7
[3,] 3 4 7
This works because the rep() call replicates each value in the vector when we use the "each" argument, instead of replicating (repeating) the vector:
> rep(mat1[1, -1], each = nrow(mat1)-1)
[1] 4 4 7 7
The default behaviour would also give the wrong answer:
> rep(mat1[1, -1], nrow(mat1)-1)
[1] 4 7 4 7
In part, the problem you are seeing is also the way R extends arguments to the appropriate length for the replacement. R actually, and silently, extended the replacement vector exactly in the way rep(mat1[1, -1], nrow(mat1)-1) does, which when coupled with the fill-by-column principle gave the behaviour you saw.

Try
mat2[c(2:nrow(mat2)), 2:ncol(mat2)] <- mat1[rep.int(1,nrow(mat1)-1),2:ncol(mat1)]

Another option...
n = 5
mat1 = matrix(sample(n^2, n^2), n, n)
# use matrix with byrow to copy 1st row n times
mat2 = matrix(rep(mat1[1, ], n), n, n, byrow = TRUE)
# copy 1st column
mat2[ , 1] = mat1[ , 1]
mat1
mat2
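To confirm that the loop-free versions give the same matrix as the loop from the question, a small comparison like this can be run. It is a sketch on a made-up 5 x 5 matrix; the names loop, v1, v2 and v3 are only for the check.

```r
set.seed(1)
mat1 <- matrix(sample(25), 5, 5)

# Reference: the loop from the question
loop <- mat1
for (i in 1:nrow(mat1)) loop[i, 2:ncol(loop)] <- mat1[1, 2:ncol(mat1)]

# rep(each = ) version: repeat each value so the column-wise fill lines up
v1 <- mat1
v1[-1, -1] <- rep(mat1[1, -1], each = nrow(mat1) - 1)

# Row-index version: select row 1 as many times as there are rows to replace
v2 <- mat1
v2[-1, -1] <- mat1[rep.int(1, nrow(mat1) - 1), -1]

# byrow version: rebuild the matrix from row 1, then restore column 1
v3 <- matrix(rep(mat1[1, ], nrow(mat1)), nrow(mat1), ncol(mat1), byrow = TRUE)
v3[, 1] <- mat1[, 1]

c(identical(loop, v1), identical(loop, v2), identical(loop, v3))
```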
