Apply a pairwise function on the columns of a matrix in R

Apply a pairwise function on the columns of a matrix in R - r

I would like to apply some pairwise correlation estimator on all the columns of large matrix. Since I am working with large dimensions, I am looking for a function that might be of help here. I have been experimenting with the apply function but I have not gotten very far so all help is greatly appreciated.
require(ccaPP)
require(mvtnorm)
d<- 10
Sigma <- matrix(0.2, nrow = d, ncol = d)
diag(Sigma) <- 1
#Data generation
X <- rmvnorm(100, sigma = Sigma) # 100 x d matrix
Q <- apply(X, 2, FUN = corQuadrant, consistent = TRUE)
I apologise if this has been asked before but a search did not reveal something I can use. Thank you.

Nested for loops are one way:
require(ccaPP)
require(mvtnorm)
d<- 10
Sigma <- matrix(0.2, nrow = d, ncol = d)
diag(Sigma) <- 1
#Data generation
X <- rmvnorm(100, sigma = Sigma) # 100 x d matrix
Q <- matrix(nrow = d,ncol = d)
for (i in 1: (d-1)) {
for (j in (i+1):d) {
Q[i,j] <- corQuadrant( X[,i], X[,j] )
}
}

Related

How to write a for loop to compute max of each column for a dataset in R?

A similar question in How to write a double for loop in r with choosing maximal element in one loop?.
The same setup:
If I want to sample theta[j] as first for j=1,2,...,71, then draw replicated( like 1000 times) yrep[k] form Bin(n[j], theta[j]), n[j] is known.
For theta[1], we have yrep[1,1], yrep[1,2], ..., yrep[1,1000]. Then for all theta[j], we will have a matrix of data set of yrep[i,j], i=1,...,71, j=1,..,1000.Then compute mean, max or min of each column yrep[1,1], yrep[1,2], yrep[1,3], ... yrep[1,71], we will get 1000 mean, max or min.
How to write this for loop?
I first try to write a loop to sample theta[j] and yrep. I do not know how to add a code to compute the maximal, mean, and minimal in this loop. I am not sure if this code is right:
theta<-NULL
yrep<-NULL
test<-NULL
k=1
for(i in 1:1000){
for(j in 1:71){
theta[j] <- rbeta(1,samp_A+y[j], samp_B+n[j]-y[j])
yrep[k]<-rbinom(1, n[j], theta[j])
k=k+1
}
t<-c(test, max(yrep))
}
Data is given in How to write a double for loop in r with choosing maximal element in one loop?:
#Data
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,
2,1,5,2,5,3,2,7,7,3,3,2,9,10,4,4,4,4,4,4,4,10,4,4,4,5,11,12,
5,5,6,5,6,6,6,6,16,15,15,9,4)
n <-
c(20,20,20,20,20,20,20,19,19,19,19,18,18,17,20,20,20,20,19,19,18,18,25,24,
23,20,20,20,20,20,20,10,49,19,46,27,17,49,47,20,20,13,48,50,20,20,20,20,
20,20,20,48,19,19,19,22,46,49,20,20,23,19,22,20,20,20,52,46,47,24,14)
#Evaluate densities in grid
x <- seq(0.0001, 0.9999, length.out = 1000)
#Compute the marginal posterior of alpha and beta in hierarchical model Use grid
A <- seq(0.5, 15, length.out = 100)
B <- seq(0.3, 45, length.out = 100)
#Make vectors that contain all pairwise combinations of A and B
cA <- rep(A, each = length(B))
cB <- rep(B, length(A))
#Use logarithms for numerical accuracy!
lpfun <- function(a, b, y, n) log(a+b)*(-5/2) +
sum(lgamma(a+b)-lgamma(a)-lgamma(b)+lgamma(a+y)+lgamma(b+n-y)-
lgamma(a+b+n))
lp <- mapply(lpfun, cA, cB, MoreArgs = list(y, n))
#Subtract maximum value to avoid over/underflow in exponentiation
df_marg <- data.frame(x = cA, y = cB, p = exp(lp - max(lp)))
#Sample from the grid (with replacement)
nsamp <- 100
samp_indices <- sample(length(df_marg$p), size = nsamp,
replace = T, prob = df_marg$p/sum(df_marg$p))
samp_A <- cA[samp_indices[1:nsamp]]
samp_B <- cB[samp_indices[1:nsamp]]
df_psamp <- mapply(function(a, b, x) dbeta(x, a, b),
samp_A, samp_B, MoreArgs = list(x = x)) %>%
as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)

This is not very well tested.
There is no need for loops to sample from distributions included in base R, those functions are vectorized on their arguments. Code following the lines below should be able to do what the question asks for.
Ni <- 1000
Nj <- 17
theta <- rbeta(Ni*Nj, rep(samp_A + y, each = Ni), rep(samp_B + n - y, each = Ni))
yrep <- rbinom(Ni*Nj, n, theta)
test1 <- matrix(yrep, nrow = Ni)
mins1 <- matrixStats::colMins(test1)

R reverse function to solve for parameter when output is a fixed constant

If I have a function
estimator <- function(A,B) {
A*(B+23)
}
How can I reverse this function to find the value of A for B as a sequence between 0 and 120 (B=1,2,3,4,...,120) that would give a fixed result, say C = 20?
I would use it to map the values for which satisfy the equation A*(B+23)= C = 20 with B being a list b.list between 0 and 120, for c.list, of different C?
b.list <- seq(0,120,by=1)
c.list <- tibble(seq(10,32,by=2))
In the end, I would like to plot the lines of curves of the function for different C using purrr or similar.
I.e.: Given that the height of a tree in metres at age 100 will follow the function, C = A*(B+23), solve for A that will give the result C=10 when B, Age is a list of years between 0 and 120?
Here's a link showing what I'm trying to make!
Here's another one
Many thanks!

For the inverse it is a quick inversion :
A = C/(B+23)
One answer could be :
B <- seq(0, 120)
C <- seq(10, 32, 2)
A <- matrix(0,
nrow = length(B),
ncol = length(C))
for(i in 1:ncol(M)){
A[,i] <- C[i] / (B + 23)
}
matplot(B, A, type ="l", col = "black")
In case of a more complex function indeed you need an automatic solving problem. One way is to see it like an optimisation problem where you want to minimise the distance from C :
B <- seq(1, 120)
C <- seq(10, 32, 2)
A <- matrix(0,
nrow = length(B),
ncol = length(C))
fct <- function(A, B, C){
paramasi <- 25
parambeta<- 7395.6
paramb2 <- -1.7829
refB <- 100
d <- parambeta*(paramasi^paramb2)
r <- (((A-d)^2)+(4*parambeta*A*(B^paramb2)))^0.5
si_est <- (A+d+r)/ (2+(4*parambeta*(refB^paramb2)) / (A-d+r))
return(sum(si_est - C)^2)}
for(c in 1:length(C)){
for(b in 1:length(B)){
# fixe parameters + optimisation
res <- optim(par = 1, fn = fct, B = B[b], C = C[c])
A[b, c] <- res$par
}
}
matplot(B, A, type = "l", col = "black")
You need to be careful because in your case I think that you could find an analytical formula for the inverse which would be better.
Good luck !

Multi-data likelihood function and mle2 function from bbmle package in R

I have written a custom likelihood function that fits a multi-data model that integrates mark-recapture and telemetry data (sensu Royle et al. 2013 Methods in Ecology and Evolution). The likelihood function is designed to be flexible in terms of whether and how many covariates are specified for different linear models in different likelihood components which is determined by values supplied as function arguments (i.e., data matrices "detcovs" and "dencovs" in my code). The likelihood function works when I directly supply it to optimization functions (e.g., optim or nlm), but does not play nice with the mle2 function in the bbmle package. My problem is that I continually run into the following error: "some named arguments in 'start' are not arguments to the specified log-likelihood function". This is my first attempt at writing custom likelihood functions so I'm sure there are general coding conventions of which I'm unaware that make such tasks much more efficient and amendable to the mle2 function. Below is my likelihood function, code creating the staring value objects, and code calling the mle2 function. Any advice how to solve the error problem and general comments on writing cleaner functions is welcome. Many thanks in advance.
Edit: As requested, I have simplified the likelihood function and provided code to simulate reproducible data to which the model can be fit. Included in the simulation code are 2 custom functions and use of the raster function from the raster package. Hopefully, I have sufficiently simplified everything to enable others to troubleshoot. Again, many thanks for your help!
Jared
Likelihood function:
CSCR.RSF.intlik2.EXAMPLE <- function(alpha0,sigma,alphas=NULL,betas=NULL,n0,yscr=NULL,K=NULL,X=X,trapcovs=NULL,Gden=NULL,Gdet=NULL,ytel=NULL,stel=NULL,
dencovs=NULL,detcovs=NULL){
#
# this version of the code handles a covariate on log(Density). This is starting value 5
#
# start = vector of starting values
# yscr = nind x ntraps encounter matrix
# K = number of occasions
# X = trap locations
# Gden = matrix with grid cell coordinates for density raster
# Gdet = matrix with gride cell coordinates for RSF raster
# dencovs = all covariate values for all nGden pixels in density raster
# trapcovs = covariate value at trap locations
# detcovs = all covariate values for all nGrsf pixels in RSF raster
# ytel = nguys x nGdet matrix of telemetry fixes in each nGdet pixels
# stel = home range center of telemetered individuals, IF you wish to estimate it. Not necessary
# alphas = starting values for RSF/detfn coefficients excluding sigma and intercept
# alpha0 = starting values for RSF/detfn intercept
# sigma = starting value for RSF/detfn sigma
# betas = starting values for density function coefficients
# n0 = starting value for number of undetected individuals on log scale
#
n0 = exp(n0)
nGden = nrow(Gden)
D = e2dist(X,Gden)
nGdet <- nrow(Gdet)
alphas = alphas
loglam = alpha0 -(1/(2*sigma*sigma))*D*D + as.vector(trapcovs%*%alphas) # ztrap recycled over nG
psi = exp(as.vector(dencovs%*%betas))
psi = psi/sum(psi)
probcap = 1-exp(-exp(loglam))
#probcap = (exp(theta0)/(1+exp(theta0)))*exp(-theta1*D*D)
Pm = matrix(NA,nrow=nrow(probcap),ncol=ncol(probcap))
ymat = yscr
ymat = rbind(yscr,rep(0,ncol(yscr)))
lik.marg = rep(NA,nrow(ymat))
for(i in 1:nrow(ymat)){
Pm[1:length(Pm)] = (dbinom(rep(ymat[i,],nGden),rep(K,nGden),probcap[1:length(Pm)],log=TRUE))
lik.cond = exp(colSums(Pm))
lik.marg[i] = sum( lik.cond*psi )
}
nv = c(rep(1,length(lik.marg)-1),n0)
part1 = lgamma(nrow(yscr)+n0+1) - lgamma(n0+1)
part2 = sum(nv*log(lik.marg))
out = -1*(part1+ part2)
lam = t(exp(a0 - (1/(2*sigma*sigma))*t(D2)+ as.vector(detcovs%*%alphas)))# recycle zall over all ytel guys
# lam is now nGdet x nG!
denom = rowSums(lam)
probs = lam/denom # each column is the probs for a guy at column [j]
tel.loglik = -1*sum( ytel*log(probs) )
out = out + tel.loglik
out
}
Data simulation code:
library(raster)
library(bbmle)
e2dist <- function (x, y){
i <- sort(rep(1:nrow(y), nrow(x)))
dvec <- sqrt((x[, 1] - y[i, 1])^2 + (x[, 2] - y[i, 2])^2)
matrix(dvec, nrow = nrow(x), ncol = nrow(y), byrow = F)
}
spcov <- function(R) {
v <- sqrt(nrow(R))
D <- as.matrix(dist(R))
V <- exp(-D/2)
cov1 <- t(chol(V)) %*% rnorm(nrow(R))
Rd <- as.data.frame(R)
colnames(Rd) <- c("x", "y")
Rd$C <- as.numeric((cov1 - mean(cov1)) / sd(cov1))
return(Rd)
}
set.seed(1234)
co <- seq(0.3, 0.7, length=5)
X <- cbind(rep(co, each=5),
rep(co, times=5))
B <- 10
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
dencovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(dencovs)[[2]][3:4] <- c("dencov1","dencov2")
denr.list <- vector("list",2)
for(i in 1:2){
denr.list[[i]] <- raster(
list(x=seq(0,1,length=10),
y=seq(0,1,length=10),
z=t(matrix(dencovs[,i+2],10,10,byrow=TRUE)))
)
}
B <- 20
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
detcovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(detcovs)[[2]][3:4] <- c("detcov1","detcov2")
detcov.raster.list <- vector("list",2)
trapcovs <- matrix(0,J,2)
for(i in 1:2){
detr.list[[i]] <- raster(
list(x=seq(0,1,length=20),
y=seq(0,1,length=20),
z=t(matrix(detcovs[,i+2],20,20,byrow=TRUE)))
)
trapcovs[,i] <- extract(detr.list[[i]],X)
}
alpha0 <- -3
sigma <- 0.15
alphas <- c(1,-1)
beta0 <- 3
betas <- c(-1,1)
pixelArea <- (dencovs$y[2] - dencovs$y[1])^2
mu <- exp(beta0 + as.matrix(dencovs[,3:4])%*%betas)*pixelArea
EN <- sum(mu)
N <- rpois(1, EN)
pi <- mu/sum(mu)
s <- dencovs[sample(1:nrow(dencovs), size=N, replace=TRUE, prob=pi),1:2]
J <- nrow(X)
K <- 10
yc <- d <- p <- matrix(NA, N, J)
D <- e2dist(s,X)
loglam <- t(alpha0 - t((1/(2*sigma*sigma))*D*D) + as.vector(trapcovs%*%alphas))
p <- 1-exp(-exp(loglam))
for(i in 1:N) {
for(j in 1:J) {
yc[i,j] <- rbinom(1, K, p[i,j])
}
}
detected <- apply(yc>0, 1, any)
yscr <- yc[detected,]
ntel <- 5
nfixes <- 100
poss.tel <- which(s[,1]>0.2 & s[,1]<0.8 & s[,2]>0.2 & s[,2]<0.8)
stel.id <- sample(poss.tel,ntel)
stel <- s[stel.id,]
ytel <- matrix(NA,ntel,nrow(detcovs))
d <- e2dist(stel,detcovs[,1:2])
lam <- t(exp(1 - t((1/(2*sigma*sigma))*d*d) + as.vector(as.matrix(detcovs[,3:4])%*%alphas)))
for(i in 1:ntel){
ytel[i,] <- rmultinom(1,nfixes,lam[i,]/sum(lam[i,]))
}
Specify starting values and call mle2 function:
start1 <- list(alpha0=alpha0,sigma=sigma,alphas=alphas,betas=betas,n0=log(N-nrow(yscr)))
parnames(CSCR.RSF.intlik2.EXAMPLE) <- names(start)
out1 <- mle2(CSCR.RSF.intlik2.EXAMPLE,start=start1,method="SANN",optimizer="optim",
data=list(yscr=yscr,K=K,X=X,trapcovs=trapcovs,Gden=dencovs[,1:2],Gdet=detcovs[,1:2],
ytel=ytel,stel=stel,dencovs=as.matrix(dencovs[,3:4]),detcovs=as.matrix(detcovs[,3:4]))
)

Asymmetric eta square matrix in R

I have a function that calculates the eta square coefficient for the correlation between qualitative and quantitative variable:
eta <- function(x,y){
m <- mean(x,na.rm = TRUE)
sct <- sum((x-m)^2,na.rm = TRUE)
n <- table(y)
mk <- tapply(x,y,mean)
sce <- sum(n*(mk-m)^2)
return(ifelse(sct>0,sce/sct,0))
}
Imagine I have the following variables (just an example), with missing values in it:
a <- factor(c("M","NA","F","F","NA","M","F"))
b <- factor(c("y","y","y","n","n","n","y")
d <- c(2,5,4,8,9,6,4)
e <- c(5,7,8,5,6,9,7)
I need to program a function that returns a matrix with the eta coefficient for each combination of qualitative and quantitative variable x and y.
Something like this maybe (it doesnt work):
matrix<-function(x,y){
col.y<-ncol(y)
row.x<-nrow(x)
M<-matrix(ncol=col.y,nrow=row.x,dimnames = list(names(x), names(y)))
for(i in 1:col.y){
for(j in 1:row.x){
M[i,j]<-rap.corr(y[,i],x[,j])
}
}
return(M)
}
How can I do this? Thank you in advance..

You can do this using mapply:
qual = list(a=a, b=b)
quant = list(e=e,d=d)
m = matrix(mapply(eta, rep(quant, length(qual)), rep(qual, each=length(quant))), ncol=length(qual))
rownames(m) = names(quant)
colnames(m) = names(qual)
#> m
# a b
#e 0.01950355 0.0008865248
#d 0.25333333 0.7363333333

Crossed for loops: Pick i th element of first loop then loop completely through second loop

Suppose I have the following code:
X <- model.matrix(~factor(1:2))
beta <- c(1, 2)
I then draw 70 and 40 values from two multivariate normal distributions:
library(MASS)
S1 <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
S2 <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 4, 4, 2), ncol = 2))
As can be easily seen S1 is 70x2 matrix und S2 a 40x2 matrix.
Now I build a for loop in R:
z <- list()
for(i in 1:dim(S2)[1]){
z[[i]] <- X %*% beta + X %*% S1[1,] + X %*% S2[i,] + rnorm(2, mean = 0, sd = 0.45)
Y <- do.call(rbind, z)
}
This gives me a matrix that contains all combinations for the 40 elements in S2 with the 1st element of S1. What I want is to completely cross the two matrices S1 and S2. That is I want the for loop to pick out S1[1,] first, then iterate completely through S2[i,] (e.g. in an inner loop) and store the results in a matrix then pick out S1[2,] iterate again through S2[i,] and store the results in a matrix and so on. If I would need to give a name to what I am looking for I would say "crossed for loops". I find it incredibly hard to come up with R-code that will allow me to do this. Any hints would be appreciated.
Maybe the idea will get clearer with this example:
My idea is equivalent to construction 70 for-loops for every element in S1[i,] and binding the result in a 70*40*2x1 matrix:
for(i in 1:dim(S2)[1]){
z[[i]] <- X %*% beta+X %*% S1[1,]+X %*% S2[i,]+rnorm(2, mean = 0 , sd = sigma)
Y1 <- unname(do.call(rbind, z))
}
for(i in 1:dim(S2)[1]){
z[[i]] <- X %*% beta+X %*% S1[2,]+X %*% S2[i,]+rnorm(2, mean = 0 , sd = sigma)
Y2 <- unname(do.call(rbind, z))
}
for(i in 1:dim(S2)[1]){
z[[i]] <- X %*% beta+X %*% S1[3,]+X %*% S2[i,]+rnorm(2, mean = 0 , sd = sigma)
Y3 <- unname(do.call(rbind, z))
}
.
.
.
for(i in 1:dim(S2)[1]){
z[[i]] <- X %*% beta+X %*% S1[70,]+X %*% S2[i,]+rnorm(2, mean = 0 , sd = sigma)
Y70 <- unname(do.call(rbind, z))
}
Y <- rbind(Y1, Y2, Y3, …, Y70)
What I ideally would want is to do this with for-loops or any other flexible way that can handle different dimensions for S1 and S2.

OK. I might do a few things to make this as efficient as possible. First, we can pre-calculate all the matrix multiplication with
Xb <- X %*% beta
XS1 <- X %*% t(S1)
XS2 <- X %*% t(S2)
Then we can clculate all the combinations of the S1/S2 values with expand.grid
idx <- unname(c(expand.grid(A=1:ncol(XS1), B=1:ncol(XS2))))
Then we can define the transformation
fx<-function(a, b) {
t(Xb + XS1[,a, drop=F] + XS2[,b,drop=F] + rnorm(2, mean = 0, sd = 0.45))
}
we assume we will be passed an index for S1 and an index for S2. Then we combine the data as in your formula. Finally, we use this helper function and the indexes with a set of do.calls
xx <- do.call(rbind, do.call(Map,c(list(fx), idx)))
First we use Map to calculate all the combinations, then we use rbind to merge all the results. This actually produces a 2800x2 matrix. (70*40)*2. The rows are ordered with S1 moving the fastest, then S2.

I realised that this was not a problem with for-loops but with the way I stored the variables. The solution to what I want is:
library(MASS)
z <- list()
y <- list()
for(j in 1:dim(S1)[1]){
for(i in 1:dim(S2)[1]){
z[[i]] <- X %*% beta+X %*% S1[j,]+X %*% S2[i,]+matrix(rnorm(2, mean = 0 , sd = sigma), ncol = 2, nrow = 2)
Z <- unname(do.call(rbind, z))
}
y[[j]] <- Z
Y <- unname(do.call(rbind, y))
}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Apply a pairwise function on the columns of a matrix in R - r

Related

How to write a for loop to compute max of each column for a dataset in R?

R reverse function to solve for parameter when output is a fixed constant

Multi-data likelihood function and mle2 function from bbmle package in R

Asymmetric eta square matrix in R

Crossed for loops: Pick i th element of first loop then loop completely through second loop

Categories

Resources