Faster 'outer' implementation in R

I am trying to use the outer() function in R to build a matrix by evaluating a function on every pair of elements of a vector of length n. Specifically, let x be a vector of length n; I want to compare each pair of elements of x. To do so, I use the following naive implementation based on outer().
# example code
n <- 500
x <- rnorm(n)
f <- function(x, y) {
  as.numeric(x < y) + 0.5 * as.numeric(x == y)
}
#new.mat <- outer(seq_len(n), seq_len(n), f) this was posted wrongly
new.mat <- outer(x, x, f) # edited
This implementation becomes extremely slow as n increases, and I would like to know a more efficient way of doing this. I would really appreciate it if you could share a trick.
Thanks,
Alemu
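One possible direction, offered as a sketch rather than a definitive answer: since f only combines the comparisons x < y and x == y, the same matrix can be built from built-in operators passed to outer(), which avoids the as.numeric() conversions. The names slow, fast and alt are introduced here for illustration only, and whether this is actually faster on a given machine is worth checking with system.time().

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Original formulation from the question
f <- function(x, y) as.numeric(x < y) + 0.5 * as.numeric(x == y)
slow <- outer(x, x, f)

# Logical matrices are coerced to 0/1 in arithmetic, so as.numeric() is not needed
fast <- outer(x, x, "<") + 0.5 * outer(x, x, "==")

# Same values from a single outer() call:
# outer(x, x, "-")[i, j] is x[i] - x[j], and f(x[i], x[j]) = 0.5 * (1 - sign(x[i] - x[j]))
alt <- 0.5 * (1 - sign(outer(x, x, "-")))

all.equal(slow, fast)
all.equal(slow, alt)
```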

Related

How to create multiple matrices based on a formula using two data frames and sum those matrices up in one go?

I'm fairly new to R and am thus not that knowledgeable yet about its different functionalities. I'm wondering if there is a more efficient way to replicate the following other than writing and running 230 lines of code.
I have two matrices, Z and E, which contain continuous numerical data and have the dimensions 7x229 and 17x229 respectively. For each column (so 229 times) I want to create a new 119x119 matrix by using the (repeated) formula below
ZZEE1 <- kronecker((Z[,1] %*% t(Z[,1])), (E[,1] %*% t(E[,1])))
ZZEE2 <- kronecker((Z[,2] %*% t(Z[,2])), (E[,2] %*% t(E[,2])))
ZZEE3 <- kronecker((Z[,3] %*% t(Z[,3])), (E[,3] %*% t(E[,3])))
ZZEE4 <- kronecker((Z[,4] %*% t(Z[,4])), (E[,4] %*% t(E[,4])))
#...
ZZEE228 <- kronecker((Z[,228] %*% t(Z[,228])), (E[,228] %*% t(E[,228])))
ZZEE229 <- kronecker((Z[,229] %*% t(Z[,229])), (E[,229] %*% t(E[,229])))
After this is done, I want to add all 229 matrices up into one matrix like this (not complete)
Sum_ZZEE <- ZZEE1 + ZZEE2 + ZZEE3 + ZZEE4 + ZZEE228 + ZZEE229 #Sum of all matrices from ZZEE1 to ZZEE229
Is there a quicker fix out there that will do exactly this? I have tried to find an answer online but did not find something that worked or something that I understood to the extent that I could modify it to my own data/code. As far as I understood it, there might be a fix with the function() function, but I would not know how to code it correctly. Getting the 'Sum_ZZEE' matrix is the final goal, I do not necessarily need the individual matrices stored in the workspace. Much obliged!
First construct a list of matrices: the following two code chunks are equivalent, use whichever is clearer to you.
ZZ_list <- lapply(1:229,
  function(i) kronecker((Z[,i] %*% t(Z[,i])), (E[,i] %*% t(E[,i])))
)
or
ZZ_list <- list()
for (i in 1:229) {
  ZZ_list[[i]] <- kronecker((Z[,i] %*% t(Z[,i])), (E[,i] %*% t(E[,i])))
}
Then use Reduce() (unfortunately sum() doesn't work the way you want):
answer <- Reduce("+", ZZ_list)
There might be some super-clever answer that works in pure linear algebra (e.g. with stacking/unstacking operators) ...
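One candidate for such a linear-algebra shortcut, given as a sketch and not as part of the original answer: by the mixed-product property of the Kronecker product, (z z') %x% (e e') equals (z %x% e)(z %x% e)'. Stacking the vectors z_i %x% e_i as the columns of a matrix K therefore gives the whole sum as K K'. The random Z and E below are stand-ins with the dimensions from the question, and K and reference are names made up for this example.

```r
set.seed(1)
Z <- matrix(rnorm(7 * 229), nrow = 7)    # stand-in for the 7 x 229 matrix Z
E <- matrix(rnorm(17 * 229), nrow = 17)  # stand-in for the 17 x 229 matrix E

# Column i of K is kronecker(Z[, i], E[, i]), a vector of length 7 * 17 = 119
K <- vapply(
  seq_len(ncol(Z)),
  function(i) as.vector(kronecker(Z[, i], E[, i])),
  numeric(nrow(Z) * nrow(E))
)

# Sum over all columns of (z z') %x% (e e') in a single matrix product
Sum_ZZEE <- tcrossprod(K)  # K %*% t(K)

# Reference: the list-and-Reduce approach from the answer above
reference <- Reduce("+", lapply(seq_len(ncol(Z)), function(i) {
  kronecker(Z[, i] %*% t(Z[, i]), E[, i] %*% t(E[, i]))
}))

all.equal(Sum_ZZEE, reference)
```

This never materialises the 229 individual 119x119 matrices, which is in line with the stated goal of only needing the sum.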

Problems with Gaussian Quadrature in R

I'm using the gaussquad package to evaluate some integrals numerically.
I thought the ghermite.h.quadrature command worked by evaluating a function f(x) at points x1, ..., xn and then constructing the sum w1*f(x1) + ... + wn*f(xn), where x1, ..., xn and w1, ..., wn are nodes and weights supplied by the user.
Thus I thought the commands
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
would yield the same output for any function f, where "rule" is a data frame that stores the nodes in a column labelled "x" and the weights in a column labelled "w". For many functions the output is indeed the same, but for some functions I get very different results. Can someone please help me understand this discrepancy?
Thanks!
Code:
library(gaussquad)
n.quad = 50
rule = hermite.h.quadrature.rules(n.quad)[[n.quad]]
f <- function(z) {
  f1 <- function(x, y) pnorm(x + y)
  f2 <- function(y) ghermite.h.quadrature(f1, rule, y = y)
  g <- function(x, y) x / (1 + y) / f2(y) * pnorm(x + y)
  h <- function(y) ghermite.h.quadrature(g, rule, y = y)
  h(z)
}
ghermite.h.quadrature(f, rule)
sum(sapply(rule$x, f) * rule$w)
Ok, that problem got me interested.
I looked at the gaussquad source, and the author clearly does not call sapply internally: every integrand is expected to accept a vector argument and return a vector of the same length.
It is clearly stated in documentation:
functn an R function which should take a numeric argument x and possibly some parameters.
The function returns a numerical vector value for the given argument x
Built-in functions are already written that way, so with them everything works.
You have to rewrite your function so that it accepts a vector argument and returns a vector.
UPDATE
Vectorize() fixes the problem for me, as does a simple wrapper around sapply:
vf <- function(z) {
  sapply(z, f)
}
After either of those changes, results are identical: 0.2029512
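To make the mechanism concrete without depending on gaussquad, here is a small base-R sketch. The nodes and weights are made-up numbers (not a real Gauss-Hermite rule), and inner is a hypothetical function that, like f in the question, collapses its work with an internal sum and so is only correct for a scalar argument.

```r
nodes   <- c(-1, 0, 1)         # hypothetical nodes, for illustration only
weights <- c(0.25, 0.5, 0.25)  # hypothetical weights, for illustration only

# Correct only for scalar z: with a vector z, nodes + z pairs up elementwise
# and sum() collapses everything into one number
inner <- function(z) sum(weights * (nodes + z)^2)

# What a routine does if it calls the integrand once on the whole node vector
direct <- sum(inner(nodes) * weights)              # 2

# What sum(sapply(...) * w) does: one call per node
per_node <- sum(sapply(nodes, inner) * weights)    # 1

# Vectorize() restores the one-call-per-node behaviour
vinner <- Vectorize(inner)
via_vectorize <- sum(vinner(nodes) * weights)      # 1
```

The two results differ only because inner silently returns a single number for a vector input; wrapping it makes the vector call agree with the per-node evaluation.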

R - function input and optimization

I usually have trouble inputting functions in R, but they are usually simple enough that I manage to work them out. Now, however, I have a very complicated problem at hand that requires a function with unknowns, summations and a matrix, and I am clueless about where to begin. (This is not a homework question; I am just trying to work something out using a different method, hoping it works.)
So I want to input a function:
$$A = \sum_{i=1}^{N} \exp\left( \sum_{j=1}^{M} \mathrm{Matrix}_{ij} \, \mathrm{unknown}_j \right)$$
and then minimize the function:
$$B = \log A - \sum_{j=1}^{M} \mathrm{unknown}_j \, C_j$$
So my goal is to find the M unknown parameters that minimize function B.
But this is very complicated. You do not have to give me an answer directly. You can use another example to answer my question indirectly. Any help/tips/guidance is appreciated.
Let's see if we can break the problem into smaller things:
Let's name some variables first:
Let Q be a matrix with N rows and M columns
Let x be a (column) vector of length M (for a moment, think it's not an "unknown")
Let C be a (column) vector of length M
Notice that both A and B will be "scalars" (or, in R parlance, numeric vectors of length 1).
Hint: In R, you can do matrix multiplication using the %*% operator. See Quick-R: Matrix algebra.
Working on function A
Q %*% x is the product inside the sum which is inside the exponential function, so:
A <- function(Q, x) {
  y <- Q %*% x  # This will be a (column) vector of length `N`
  return(sum(exp(y)))  # This will be a scalar (more precisely, a length-1 vector)
}
Not so hard, is it?
Working on function B
B <- function(Q, C, x) {
  y <- sum(x * C)  # or, since both x and C are column vectors:
  # y <- t(x) %*% C
  a <- A(Q, x)
  return(log(a) - y)
}
So, that's how you would input the functions.
As for the optimization, I suggest you take a look at the optimx package; you'll need to supply starting values for the vector x.
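As a rough sketch of the optimization step, the example below uses base R's optim() rather than optimx, simply to stay dependency-free. Q and C are made-up data: C is set to the column means of Q on purpose, because the gradient of B then vanishes at x = 0 and the minimum value is log(N), which makes the result easy to check. The starting values are arbitrary.

```r
A <- function(Q, x) sum(exp(Q %*% x))
B <- function(Q, C, x) log(A(Q, x)) - sum(x * C)

set.seed(1)
N <- 20
M <- 3
Q <- matrix(rnorm(N * M), nrow = N)  # made-up N x M matrix
C <- colMeans(Q)                     # chosen so that the minimizer is x = 0

# optim() minimizes over its first argument, so wrap B to put x first
fit <- optim(
  par    = rep(0.5, M),              # arbitrary starting values
  fn     = function(x) B(Q, C, x),
  method = "BFGS"
)

fit$par    # close to c(0, 0, 0)
fit$value  # close to log(N)
```

With real data there is no guarantee that a finite minimizer exists, so it is worth inspecting fit$convergence and trying several starting values.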

Integration of a vector return one value

I am using R to do some multivariate analysis. For this work I need to integrate a trivariate PDF. Since I want to use this in an MLE, I want a vector of integrals. Is there a way to make integrate return a vector instead of one value?
Here is simple example:
f1=function(x, y, z) {dmvnorm(x=as.matrix(cbind(x,y,z)), mean=c(0,0,0), sigma=sigma)}
f1(x=c(1,1,1), y=c(1,1,1), z=c(1,1,1))
integrate(Vectorize(function(x) {f1(x=c(1,1,1), y=c(1,1,1), z=c(1,1,1))}), lower = - Inf, upper = -1)$value
Error in integrate(Vectorize(function(x) { : evaluation of function gave a result of wrong length
To integrate a function of one variable, with vector values,
you can transform the function into n functions with real values,
and integrate each of them.
This is very inefficient (when integrating the i-th function,
I evaluate all the functions, and discard all but one value).
# Function to integrate
d <- rnorm(10)
f <- function(x) dnorm(d, mean=x)
# Integrate those n functions separately.
n <- length(f(1))
r <- sapply( 1:n,
  function(i) integrate(
    Vectorize(function(x) f(x)[i]),
    lower=-Inf, upper=0
  )$value
)
r
For 2-dimensional integrals, you can check pracma::integral2,
but the same manipulation (transforming a bivariate function with vector values
into n bivariate functions with real values) will probably be needed.
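When each component can be evaluated on its own, the wasted evaluations in the one-dimensional example can be avoided by integrating each component directly. The sketch below does this for the same f; it relies on dnorm() being vectorized over mean, so no Vectorize() is needed. For this particular integrand the answer is known in closed form, pnorm(-d), which gives a check. The name r_direct is introduced here for illustration.

```r
set.seed(1)
d <- rnorm(10)

# Integrate the i-th component directly instead of computing all n and discarding n - 1
r_direct <- vapply(
  d,
  function(di) {
    integrate(function(x) dnorm(di, mean = x), lower = -Inf, upper = 0)$value
  },
  numeric(1)
)

# Closed form for this specific integrand: the integral over x from -Inf to 0
# of dnorm(d, mean = x) equals pnorm(-d)
closed_form <- pnorm(-d)

max(abs(r_direct - closed_form))
```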

R: looping to search for max of non-monotonic function

Refer to the R code below. The function (someRfunction) operates on a vector and returns a scalar value. The data are pairs (x,y), where x and y are vectors of length n, which may be large.
I want to know the value of x* such that the result of someRfunction on y where {x>x*} is maximized. The function operates on y values and is non-monotonic in x*. I need to evaluate for all x* (i.e. each element of x). Speed is not an issue if executed once, but the code would be executed many times in a simulation. Is there any way to make this code more efficient/faster?
### x and y are vectors of length n
### sort x and y such that they are ordered by descending x
ord <- order(-x)
xord <- x[ord]
yord <- y[ord]
maxf <- -Inf
maxcut <- NA
for (i in 1:n) {
  ### yi is the subvector of y that corresponds to y[x>x{i}],
  ### where x{i} is the (n-i+1)th order statistic of x
  ### (seq_len() makes yi empty when i = 1; 1:(i-1) would give c(1, 0))
  yi <- yord[seq_len(i - 1)]
  fxi <- someRfunction(yi)
  if (fxi > maxf) {
    maxf <- fxi
    maxcut <- xord[i]
  }
}
Thanks.
Edit: let someRfunction(yi)=t.test(yi)$statistic.
If you can say anything more about the function, particularly whether it is smooth and whether its gradient can be determined, you will get a better answer. At the moment the only speed-up will be modest: pre-allocate a vector to hold the results, drop the if-max clause, and then use which.max() on that vector. You might also want to look at the function optimx in the package "optimx".
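A minimal sketch of that suggestion, using the t statistic from the question's edit and made-up data. The loop starts at k = 2 because t.test() needs at least two observations; here k counts how many of the largest-x observations are included, so xord[best_k] is the smallest x in the chosen subset. The second half is a different technique that only applies to this particular statistic: the one-sample t statistic depends only on running sums, so cumsum() yields every prefix at once. The names stats, t_fast and best_k are illustrative.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rnorm(n, mean = 0.3 * x)  # made-up data for illustration

ord  <- order(-x)
xord <- x[ord]
yord <- y[ord]

# Suggested version: pre-allocate, fill, then which.max()
stats <- rep(NA_real_, n)
for (k in 2:n) {
  stats[k] <- t.test(yord[1:k])$statistic
}
best_k <- which.max(stats)  # which.max() discards the NA at k = 1
xord[best_k]                # smallest x among the best_k observations used

# Statistic-specific shortcut: all prefix t statistics from cumulative sums
k      <- seq_len(n)
m      <- cumsum(yord) / k                         # running mean
v      <- (cumsum(yord^2) - k * m^2) / (k - 1)     # running sample variance
t_fast <- m / sqrt(v / k)
t_fast[1] <- NA_real_                              # undefined for one observation

all.equal(stats[-1], t_fast[-1])
```

The cumulative-sum form can lose precision when the mean is large relative to the spread, so it is worth comparing against t.test() on a sample of your own data before relying on it inside a simulation.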
