MIC correlation between 2 matrices in R

The minerva package provides a function to compute the Maximal Information Coefficient (MIC). According to the package description, mine(x, y) only works with two matrices A and B of the same size.
Here I would like to obtain the MIC values from the correlation of two matrices A and B of different sizes; specifically, A is n by m and B is n by z, with n being the number of observations (rows).
In other words, my aim is to obtain an m by z matrix C that gives, for each pair of columns, the MIC correlation coefficient (and, if possible, the associated p-value, if any).
Here is an example of what I want, using the Pearson correlation:
set.seed(1)
x <- matrix(rnorm(50), nrow = 5, ncol = 10)    # A: n = 5 observations, m = 10 variables
y <- matrix(rnorm(100), nrow = 5, ncol = 20)   # B: n = 5 observations, z = 20 variables
P <- cor(x, y = y)                             # 10 x 20 matrix of Pearson correlations
I e-mailed one of the authors of the minerva package without success. Is there any way I can apply the mine function to obtain the desired m by z correlation matrix?

Let me answer my own post. In the code below I use nested for loops, which may not be the smartest/fastest way to do it, but it works as expected.
library(minerva)

set.seed(1)
x <- matrix(rnorm(50), nrow = 5, ncol = 10)
y <- matrix(rnorm(100), nrow = 5, ncol = 20)

# Pre-allocate the m x z result matrix: rows follow columns of x, columns follow columns of y
Result <- matrix(nrow = ncol(x), ncol = ncol(y))

for (i in 1:ncol(x)) {
  Thisvar <- x[, i]
  print(i)                    # crude progress indicator
  for (k in 1:ncol(y)) {
    Thisvar2 <- y[, k]
    res <- mine(Thisvar, Thisvar2, master = TRUE, use = "all.obs")
    Result[i, k] <- res$MIC
  }
}
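For readers who prefer to avoid the explicit loops, the same pairwise computation can be collected with nested sapply calls. This is only a compactness sketch built on the loop above; mic_matrix is a hypothetical helper name, not part of minerva, and the mine() call is copied verbatim from the loop.
# Hypothetical wrapper: same pairwise mine() calls as the loop above,
# gathered by nested sapply into the m x z matrix directly.
mic_matrix <- function(x, y) {
  sapply(seq_len(ncol(y)), function(k)
    sapply(seq_len(ncol(x)), function(i)
      mine(x[, i], y[, k], master = TRUE, use = "all.obs")$MIC))
}

C <- mic_matrix(x, y)   # rows follow columns of x, columns follow columns of y
all.equal(C, Result)    # should be TRUE, since the calls are identical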

Related

How to create correlation matrix after mice multiple imputation

I'm using the mice package to create multiple imputations. I want to create a correlation matrix (and a matrix of p-values for the correlation coefficients). I use miceadds::micombine.cor to do this, but it gives a data frame with the variables in the first two columns and then a number of columns containing r, p, t-values, and the like.
I'm looking for a way to turn this data frame into a "good old" matrix with the correlation coefficient between x and y in position [x,y], plus a matrix with p-values. Does anyone have an easy way to do this?
Here's some code to reproduce:
library(mice)
library(missForest)   # for prodNA()

data <- mtcars
mt.mis <- prodNA(data, noNA = 0.1)   # introduce roughly 10% missing values
imputed <- mice(mt.mis, m = 5, maxit = 5, method = "pmm")
correlations <- miceadds::micombine.cor(mi.res = imputed, variables = c(1:3))
What I'm looking for is something like the output from cor(mtcars). Who can help?
I ended up writing my own function. It can probably be done much more efficiently, but this is what I made.
cormatrix <- function(r, N) {
  x <- 1
  cormatrix <- matrix(nrow = N, ncol = N)   # create empty N x N matrix
  for (i in 1:N) {
    for (j in i:N) {
      if (j > i) {
        cormatrix[i, j] <- r[x]             # fill the upper triangle...
        cormatrix[j, i] <- r[x]             # ...and mirror it into the lower triangle
        x <- x + 1
      }
    }
  }
  diag(cormatrix) <- 1                      # every variable correlates 1 with itself
  cormatrix
}
You can call it with the output of micombine.cor and the number of variables in your model as arguments. So for example cormatrix(correlations$r,ncol(df)).
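For a quick sanity check of the function, here is an illustrative call with made-up numbers. It assumes r is ordered the way the function expects: variable 1 paired with every later variable, then variable 2, and so on.
# Three variables give choose(3, 2) = 3 pairwise correlations,
# assumed to be ordered (1,2), (1,3), (2,3).
r <- c(0.5, 0.3, -0.2)
cormatrix(r, 3)
#      [,1] [,2] [,3]
# [1,]  1.0  0.5  0.3
# [2,]  0.5  1.0 -0.2
# [3,]  0.3 -0.2  1.0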

How can I simulate m random samples of size n from a given distribution with R?

I know how to generate a random sample of size n from a standard statistical distribution, say the exponential. But if I want to generate m such random samples of size n (i.e. m vectors of dimension n), how can I do it?
To create an n by m matrix containing m samples of size n you can use:
x <- replicate(m, rnorm(n, ...))
Obviously, substitute rnorm with another distribution if desired. If you then want to store the samples as separate individual vectors, you can use
v <- x[ , i]
This puts the ith column of x into v, which corresponds to the ith sample. It may be easier/quicker to just use a simple for loop instead, though:
for (i in 1:m) {
  name <- paste("V", i, sep = "")
  assign(name, rnorm(n, ...))
}
This generates a random sample at each iteration, and for stage i, names the sample Vi. By the end of it you'll have m random samples named V1, V2, ..., Vm.
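As a concrete (made-up) illustration, replacing the ... with actual parameters, here are 10 exponential samples of size 5:
m <- 10
n <- 5
x <- replicate(m, rexp(n, rate = 1))   # 5 x 10 matrix, one sample per column
dim(x)                                 # 5 10
V1 <- x[, 1]                           # the first sample as a plain vector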

How to create a matrix of p-values in ks.test

I'm new to R. If you could help me, that would be great. My problem is as follows. I have this code:
N <- 1000   # number of simulations
n <- 30
prob <- 0.001
Y2 <- rep(0, times = N)
for (i in 1:N) {
  X <- rbinom(n, size = 1, prob)   # binomial experiment
  Y2[i] <- sum(X)
}
table(Y2)
library(dgof)
ks.test(Y2, "pnorm", mean = prob * n, sd = sqrt(n * prob * (1 - prob)))
I want to run this code 10 times, so I'll get 10 different p-values and D statistics. Also, the parameter n will not be fixed but will take a vector of values (50, 100, 500, 1000, 5000). So I want a matrix of ks.test results like the one in the picture.
Thanks.
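One possible way to organise this, sketched under the assumption that a long-format table of results (one row per sample size and repetition) is acceptable instead of the exact matrix layout in the picture; all object names below are illustrative.
library(dgof)

N <- 1000                                  # simulations per ks.test
prob <- 0.001
n.values <- c(50, 100, 500, 1000, 5000)    # sample sizes to try
reps <- 10                                 # repetitions per sample size

results <- expand.grid(n = n.values, rep = 1:reps)
results$D <- NA
results$p.value <- NA

for (r in seq_len(nrow(results))) {
  n <- results$n[r]
  Y2 <- replicate(N, sum(rbinom(n, size = 1, prob)))   # rerun the binomial experiment
  kt <- ks.test(Y2, "pnorm", mean = prob * n, sd = sqrt(n * prob * (1 - prob)))
  results$D[r] <- kt$statistic
  results$p.value[r] <- kt$p.value
}

head(results)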

Matrix computation with for loop

I am a newcomer to R, having migrated from GAUSS because of license verification issues.
I want to speed up the following code, which creates an n×k matrix A. Given an n×1 vector x and vectors of parameters mu and sig (both k-dimensional), A is created as A[i,j] = dnorm(x[i], mu[j], sig[j]). The following code works fine for small numbers (n = 40, k = 4) but slows down significantly when n is around 10^6 and k is about the same size as n^(1/3).
I am running a simulation experiment to verify bootstrap validity, so I need to compute the matrix A repeatedly, #simulations × #bootstrap times, and it becomes a little time-consuming as I want to experiment with many different values of n and k. I have vectorized the code as much as I could (thanks to the vector argument of dnorm), but can I ask for more speed-up?
Thanks in advance for any help.
x <- rnorm(40)
mu <- c(-1, 0, 4, 5)
sig <- c(2^2, 0.5^2, 2^2, 3^2)
n <- length(x)
k <- length(mu)
A <- matrix(NA, n, k)
for (j in 1:k) {
  A[, j] <- dnorm(x, mu[j], sig[j])
}
Your method can be put into a function like this:
A.fill <- function(x, mu, sig) {
  k <- length(mu)
  n <- length(x)
  A <- matrix(NA, n, k)
  for (j in 1:k) A[, j] <- dnorm(x, mu[j], sig[j])
  A
}
and it's clear that you are filling the matrix A column by column.
R stores the entries of a matrix column-wise (just like Fortran).
This means that the matrix can be filled with a single call of dnorm using suitable repetitions of x, mu, and sig. The vector z will contain the columns of the desired matrix stacked on top of each other, and the matrix to be returned can then be formed from that vector just by specifying the number of rows and columns. See the following function:
B.fill <- function(x, mu, sig) {
  k <- length(mu)
  n <- length(x)
  # rep(x, times = k) repeats the whole vector once per column;
  # rep(mu, each = n) and rep(sig, each = n) repeat each parameter n times
  z <- dnorm(rep(x, times = k), rep(mu, each = n), rep(sig, each = n))
  B <- matrix(z, nrow = n, ncol = k)
  B
}
Let's make an example with your data and test this as follows:
N <- 40
set.seed(11)
x <- rnorm(N)
mu <- c(-1,0,4,5)
sig <- c(2^2,0.5^2,2^2,3^2)
A <- A.fill(x,mu,sig)
B <- B.fill(x,mu,sig)
all.equal(A,B)
# [1] TRUE
I'm assuming that n is an integer multiple of k.
Addition
As noted in the comments, B.fill is quite slow for large values of n.
The reason lies in the construct rep(..., each = ...).
So is there a way to speed up A.fill?
I tested this function:
C.fill <- function(x, mu, sig) {
  k <- length(mu)
  n <- length(x)
  sapply(1:k, function(j) dnorm(x, mu[j], sig[j]), simplify = TRUE)
}
This function is about 20% faster than A.fill.
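For completeness, here is a rough way to compare the three fillers on a problem closer to the sizes mentioned in the question. The sizes below are just illustrative, and actual timings depend on the machine, so none are quoted.
n <- 1e5
k <- round(n^(1/3))
x <- rnorm(n)
mu <- rnorm(k)
sig <- runif(k, 0.5, 3)

system.time(a <- A.fill(x, mu, sig))
system.time(b <- B.fill(x, mu, sig))
system.time(cc <- C.fill(x, mu, sig))
all.equal(a, b)    # all three fillers should agree
all.equal(a, cc)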

Computing linear regressions for every possible permutation of matrix columns

I have a (k x n) matrix. So far I have managed to linearly regress (using the lm function) column 1 against each and every other column, extracting only the coefficients.
fore.choose <- matrix(0, 1, NCOL(assets))
for (i in seq(1, NCOL(assets), 1)) {
  abc <- lm(assets[, 1] ~ assets[, i])$coefficients
  fore.choose[1, i] <- abc[2:length(abc)]
}
The coefficients are placed in the fore.choose matrix.
What I now need to do is linearly regress column 2 against each and every other column, then column 3, and so on and so forth, extracting only the coefficients.
The output will be a square matrix of OLS univariate coefficients. Kind of similar to a correlation matrix, but it is the beta coefficients I am interested in.
fore.choose <- matrix(0, 1, NCOL(assets))
will initially need to become
fore.choose <- matrix(0, NCOL(assets), NCOL(assets))
I'd just compute the coefficients directly from the correlation matrix, using beta = cor(x, y) * sd(x) / sd(y) for the regression of x on y, like this:
# set up some sample data
set.seed(1)
d <- matrix(rnorm(50), ncol = 5)
# get the coefficients: entry [i, j] is the slope from regressing column i on column j
s <- apply(d, 2, sd)
cor(d) * outer(s, s, "/")
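As a quick sanity check on the formula (using the d and s defined just above), the [1, 2] entry of that matrix should match the slope from fitting the corresponding regression directly:
betas <- cor(d) * outer(s, s, "/")   # betas[i, j]: slope from regressing column i on column j
betas[1, 2]
coef(lm(d[, 1] ~ d[, 2]))[2]         # should agree with betas[1, 2]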
You could also use lsfit to get the coefficients of one term on all the others at once, so there is only one loop to do:
sapply(1:ncol(d), function(i) {
  coef(lsfit(d[, i], d))[2, ]
})
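Again as an illustration (with d and s from above), the lsfit result can be checked against the correlation-based matrix; row i, column j of both should be the slope from regressing column i on column j.
lsfit.betas <- sapply(1:ncol(d), function(i) coef(lsfit(d[, i], d))[2, ])
all.equal(unname(lsfit.betas), unname(cor(d) * outer(s, s, "/")))   # should be TRUE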
I'm sure there must be a more elegant way than nested loops.
fore.choose <- matrix(NA, NCOL(assets), NCOL(assets))
abc <- NULL
for (i in seq_len(ncol(assets))) {       # loop over "dependent" columns
  for (j in seq_len(ncol(assets))) {     # loop over "independent" columns
    abc <- lm(assets[, i] ~ assets[, j])$coefficients
    fore.choose[i, j] <- abc[-1]
  }
}
