I'm looking for a way to properly integrate my function:
lik = function(par, x){
cl = c()
for(i in 1:ncluster){
sub = c()
for(j in 1:nsub){
times = t[[i]][[j]]
m = c(1,t[[i]][j],t(cov[[i]][j,]))
repmat = cbind(1,1:t[[i]][j],matrix(rep(cov[[i]][j,], times),times, 3,byrow=T))
sub[j] = d[[i]][j]*m%*%c(par[-5],x)-sum(log((1+exp(repmat%*%c(par[-5],x)))))
}
cl[i] = sum(sub)
}
return(exp(cl))
}
The function lik (the likelihood) takes a parameter vector par of length 5 and a scalar x, and returns a vector of likelihood values at x, one per cluster. For example,
> lik(1:5,1)
[1] 4.640101e-30 3.632315e-44 5.348611e-09 1.121790e-27 1.696704e-98
> #number of clusters=5
I want to integrate out x to obtain the vector of marginal densities, one per cluster, but integrate() and the other numerical integration packages I have found only handle scalar-valued functions. I've searched related questions, and vectorization may be the key to solving this, but I do not know how to apply it.
I would really appreciate any help. Thanks.
Typically I recommend converting a function of two variables into a function of one variable before integrating, as follows.
myfunc <- function(x, y) { stuff }
intfunc <- function(x) { myfunc(x, y) }  # y is taken from the enclosing environment
integrate(intfunc, lower, upper)
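For a function that returns one value per cluster, one option is to integrate each component separately. The following is a minimal sketch under assumptions: `lik_toy`, `par_toy` and the integration limits are hypothetical stand-ins, since the data objects used by `lik` are not shown.

```r
# Toy stand-in for lik(par, x): returns one value per "cluster" for a scalar x.
# Each cluster is a normal density with its own mean (illustrative only).
lik_toy <- function(par, x) {
  dnorm(x, mean = par, sd = 1)
}

par_toy <- c(-1, 0, 2)  # hypothetical parameters, one mean per cluster

# Integrate the i-th component over x. integrate() passes a whole vector of
# x values, so the scalar-x function is applied point by point with sapply().
marginal_i <- function(i, lower, upper) {
  integrand <- function(x) sapply(x, function(xx) lik_toy(par_toy, xx)[i])
  integrate(integrand, lower, upper)$value
}

# One integral per cluster.
marginals <- sapply(seq_along(par_toy),
                    function(i) marginal_i(i, lower = -Inf, upper = Inf))
marginals
```

This evaluates the full vector for every x and keeps only one component, so it does redundant work; it trades speed for simplicity.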
Roughly speaking, I want to solve a maximization problem in R where the objective function has the structure log[f(theta)*g(theta)]. That is, I want to solve
Max log[f(theta)*g(theta)]
The difficulty is that g(theta) is itself the solution of another, constrained minimization problem that depends on theta. g(theta) is defined as:
g(theta) = argmin_g {h(x,theta)*g}.
Since h(x,theta) depends on theta, the optimal g that minimizes h(x,theta)*g must be a function of theta.
My approach so far has been to first use constrOptim(.) to minimize h(x,theta)*g subject to the restrictions, and then plug that optimal g(.) into the second function (in theta). I then use optim(.) to try to maximize log[f(theta)*g(theta)].
Here is my code:
First, the function problem_min, which is a function of theta, gives the optimal g(theta), which is obtained as a solution to the minimization problem with constraints (sol_min).
problem_min <- function(theta_est){
reg_theta_int <- reg_matrix%*%theta_est
for(i in 1:(n*T)){
for (j in 1:length(alpha)){
L_1[i,j] = exp(alpha[j]+reg_theta_int[i])/(1+exp(alpha[j]+reg_theta_int[i]))
}
}
y_tilde <- full_data$y - L_1[,1]
L_1_tilde <- L_1[,-1] - L_1[,1]
eval_funct_g <- function(g){
return((sum((y_tilde-L_1%*%g)^2))*0.5*(1/n)*(1/T))
}
sol_min <- constrOptim(theta = g_int, f = eval_funct_g, grad = NULL, ui = R, ci = r, mu = 1e-04,
method = "Nelder-Mead",
outer.iterations = 100, outer.eps = 1e-05,
hessian = FALSE)
g_theta = c(1-sum(sol_min$par), sol_min$par)
return(g_theta)
}
Once I have the optimal g(theta), which is a vector of numbers, I plug it into log[f(theta)*g(theta)] and maximize the whole expression using optim(.):
funct_f_g <- function(theta_est){
full_data$reg_theta_est <- reg_matrix%*%theta_est
for(i in 1:n){
for (j in 1:length(alpha)){
for(t in 1:T){
product[t] = exp(full_data$y[full_data$t==t & full_data$id==i]*(alpha[j]+full_data$reg_theta_est[full_data$t==t & full_data$id==i]))/(1+exp(alpha[j]+full_data$reg_theta_est[full_data$t==t & full_data$id==i]))
}
L_ml[i,j] = prod(product)
}}
return(sum(log(L_ml%*%problem_min(theta_est))))
}
sol_ml <- optim(par = theta_int, fn = funct_f_g, method=c("Nelder-Mead"),
lower=-Inf, upper=Inf,
control=list(fnscale=-1),
hessian = FALSE)
theta_opt <- sol_ml$par
sol_ml is intended to solve Max log[f(theta)*g(theta)] while accounting for the fact that g(theta) must first be chosen optimally.
A variable with the suffix _int holds an initial value.
When I run the code above, R tells me that the objective function in optim(.) cannot be evaluated. Nevertheless, when I evaluate funct_f_g at a given theta_est, it runs perfectly. So I think something is wrong with how I am telling optim(.) that the problem has this nested structure.
If you can suggest a different way to tackle my problem, or explain what I am doing incorrectly, that would be great!
I know that I am not describing all the matrices and operations involved in the problem. I have skipped them for simplicity, hoping that the general structure of the problem can still be understood.
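The nested structure described above can be sketched on a toy problem. Everything here is an assumption made for illustration: `inner_g`, `outer_obj`, the quadratic inner objective and the constraints are hypothetical and are not the matrices from the question. The point is only that the outer objective calls the inner solver and returns a single finite number, which optim needs at the starting values.

```r
# Inner problem: for a given theta, minimise a toy quadratic in g subject to
# g >= 0 and sum(g) <= 1, written as ui %*% g - ci >= 0 for constrOptim().
inner_g <- function(theta) {
  obj <- function(g) sum((g - theta)^2)  # toy stand-in for the inner objective
  ui  <- rbind(diag(2), c(-1, -1))
  ci  <- c(0, 0, -1)
  sol <- constrOptim(theta = c(0.25, 0.25), f = obj, grad = NULL,
                     ui = ui, ci = ci, method = "Nelder-Mead")
  sol$par
}

# Outer objective: log of a positive toy f(theta) times a term built from g(theta).
outer_obj <- function(theta) {
  g       <- inner_g(theta)
  f_theta <- exp(-sum((theta - c(0.3, 0.2))^2))  # toy f(theta) > 0
  val     <- log(f_theta * (1 + sum(g)))
  # Guard: fail early if the value is not a single finite number.
  stopifnot(length(val) == 1, is.finite(val))
  val
}

# fnscale = -1 turns optim's minimisation into a maximisation.
sol <- optim(par = c(0.1, 0.1), fn = outer_obj, method = "Nelder-Mead",
             control = list(fnscale = -1))
sol$par
sol$value
```

A guard like the `stopifnot` call inside `outer_obj` can help locate the parameter values at which the real objective stops being finite.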
I am trying to vectorise the following function, which calculates the Hoeffding distance between two random variables on [0,1]^2 in a discretised way.
The hoeffd function from the Hmisc package provides a Fortran implementation (which you can find here: https://github.com/harrelfe/Hmisc/blob/master/src/hoeffd.f), but it only returns the maximum of the matrix I am trying to analyse. I am interested in the location of the maximum, and hence I need to compute the whole matrix.
Here is my current implementation :
hoeffding_D <- function(x,y){
n = length(x)
indep <- outer(0:n,0:n)/(n)^2
bp = list(
c(0,sort(x)) + (c(sort(x),1) - c(0,sort(x)))/2,
c(0,sort(y)) + (c(sort(y),1) - c(0,sort(y)))/2
)
pre_calc <- t(outer(rep(1,n+1),x)<=bp[[1]])
# This is the problematic part :
dep <- t(sapply(bp[[2]],function(bpy){
colMeans(pre_calc*(y<=bpy))
}))
rez <- abs(dep-indep)
return(rez)
}
To use it, consider the following example:
library(copula)
# for 10 values, it's fast enough, but for 1000 it takes a lot of time..
x = pobs(rnorm(10),ties.method = "max")
y = pobs(rnorm(10),ties.method = "max")
hoeffding_D(x,y)
I already eliminated a first sapply by using the outer function, but I can't get rid of the other. The issue is that the comparison x<=bpx must be done for all x and for all bpx, and the same for y; altogether this adds a lot of dimensions to the problem...
Do you have an idea of how to speed it up?
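One possible way to drop the remaining sapply is to note that each entry of dep is a mean of products of two indicators, so the whole matrix is a cross-product of two indicator matrices. The sketch below is an assumption-laden illustration, not a tested replacement: `hoeffding_D_loop` and `hoeffding_D_vec` are names introduced here, and `rank(.)/(n + 1)` stands in for `copula::pobs` so that the example runs without extra packages.

```r
# Reference version: one sapply() pass per y breakpoint, as in the question.
hoeffding_D_loop <- function(x, y) {
  n <- length(x)
  indep <- outer(0:n, 0:n) / n^2
  sx <- sort(x); sy <- sort(y)
  bpx <- c(0, sx) + (c(sx, 1) - c(0, sx)) / 2
  bpy <- c(0, sy) + (c(sy, 1) - c(0, sy)) / 2
  pre_calc <- outer(x, bpx, "<=")              # n x (n+1) indicators
  dep <- t(sapply(bpy, function(b) colMeans(pre_calc * (y <= b))))
  abs(dep - indep)
}

# Vectorised version: dep[k, l] = mean_i 1(y_i <= bpy_k) * 1(x_i <= bpx_l),
# which equals t(Y) %*% X / n for the two indicator matrices X and Y.
hoeffding_D_vec <- function(x, y) {
  n <- length(x)
  indep <- outer(0:n, 0:n) / n^2
  sx <- sort(x); sy <- sort(y)
  bpx <- c(0, sx) + (c(sx, 1) - c(0, sx)) / 2
  bpy <- c(0, sy) + (c(sy, 1) - c(0, sy)) / 2
  X <- outer(x, bpx, "<=") * 1                 # convert logical to numeric
  Y <- outer(y, bpy, "<=") * 1
  dep <- crossprod(Y, X) / n
  abs(dep - indep)
}

set.seed(1)
n <- 200
x <- rank(rnorm(n)) / (n + 1)   # stand-in for pobs()
y <- rank(rnorm(n)) / (n + 1)

D <- hoeffding_D_vec(x, y)
all.equal(hoeffding_D_loop(x, y), D)
which(D == max(D), arr.ind = TRUE)  # location(s) of the maximum
```

The two indicator matrices each hold n*(n+1) entries, so memory use grows quadratically with n.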
In numerical analysis, we students are required to implement R code that, given a function f(x), finds its Fourier interpolant tN(x) and computes the interpolation error
$\|f(x)-t_{N}(x)\| = \int_{0}^{2\pi} |f(x)-t_{N}(x)|^2 \, dx$
for a variety of different $N$.
I first tried to compute the d-coefficients according to this formula:
$d = \frac 1N M y$
with M denoting the DFT matrix and y denoting a series of equidistant function values with
$y_j = f(x_j)$ and
$x_j = e^{\frac{2 \pi i}{N} j}$
for $j = 1,\dots,N-1$.
My goal was to come up with a sum that can be described by:
$t_{N}(x) = \sum_{k=0}^{N-1} d_k e^{i k x}$
which would be easier to integrate later, in a term-by-term additive notation.
f <- function(x) 3/(6+4*cos(x)) #first function to compare with
g <- function(x) sin(32*x) #second one
xj <- function(x,n) 2*pi*x/n
M <- function(n){
w = exp(-2*pi*1i/n)
m = outer(0:(n-1),0:(n-1))
return(w^m)
}
y <- function(n){
f(xj(0:(n-1),n))
}
transformFunction <- function(n, f){
d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
script <- paste(d[1])
# d[i] belongs to exp(1i*x*(i-1)); the parentheses keep each complex coefficient intact
for(i in 2:n)
script <- paste0(script,paste0("+(",d[i],")*exp(1i*x*",i-1,")"))
#trans <- sum(d[1:n] * exp(1i*x*(0:(n-1))))
return(script)
}
The main purpose of the transform function was, initially, to return a function, or rather a mathematical expression, which could then be used to declare my Fourier interpolation function. The problem, as far as my fairly limited knowledge goes, is that I cannot integrate functions that still have sums nested in them (which is why I commented out the corresponding line in the code).
Out of absolute desperation I then tried to paste the summands together one after another as text, only to parse them again as an expression.
So the main question that remains is: how do I return mathematical expressions in a manner that allows me to use them as a function and later integrate them?
I am sincerely sorry for any misunderstanding or confusion, as well as my seemingly amateurish coding.
Thanks in advance!
A function in R can return an object of any class, including a function. Hence, you can make trans a function of x and return that.
Since integrate requires a vectorized function, we apply Vectorize before returning it.
transformFunction <- function(n, f){
d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
## Output function
trans <- function(x) sum(d[1:n] * exp(1i*x*(0:(n-1))))
## Vectorize output for the integrate function
Vectorize(trans)
}
To integrate, now simply make a new variable with the output of transformFunction:
myint <- transformFunction(n = 10,f = f)
Test: (integrate can only handle real-valued functions)
integrate(function(x) Re(myint(x)),0,2)$value +
1i*integrate(function(x) Im(myint(x)),0,2)$value
# [1] 1.091337-0.271636i
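The same returned function can be used for the squared-error integral from the question. The following is a minimal sketch that restates the definitions above so it runs on its own; `interp_error` is a name introduced here, and taking `Mod(.)^2` as the integrand is an assumption about how the error is meant to be measured.

```r
f  <- function(x) 3 / (6 + 4 * cos(x))
xj <- function(j, n) 2 * pi * j / n

# DFT matrix, as in the question.
M <- function(n) {
  w <- exp(-2 * pi * 1i / n)
  w^outer(0:(n - 1), 0:(n - 1))
}

# Returns the interpolant t_N as a vectorised function of x.
transformFunction <- function(n, f) {
  d <- as.vector(M(n) %*% f(xj(0:(n - 1), n))) / n
  trans <- function(x) sum(d * exp(1i * x * (0:(n - 1))))
  Vectorize(trans)
}

# Squared interpolation error over [0, 2*pi]; the integrand is real-valued,
# so integrate() can handle it directly.
interp_error <- function(n, f) {
  tN <- transformFunction(n, f)
  integrate(function(x) Mod(f(x) - tN(x))^2, 0, 2 * pi)$value
}

# Sanity check: t_N should reproduce f at the interpolation nodes.
tN    <- transformFunction(8, f)
nodes <- xj(0:7, 8)
max(Mod(tN(nodes) - f(nodes)))

errors <- sapply(c(4, 8, 16), function(n) interp_error(n, f))
errors
```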
I'm using the gaussquad package to evaluate some integrals numerically.
I thought the ghermite.h.quadrature command worked by evaluating a function f(x) at points x1, ..., xn and then constructing the sum w1*f(x1) + ... + wn*f(xn), where x1, ..., xn and w1, ..., wn are nodes and weights supplied by the user.
Thus I thought the commands
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
would yield the same output for any function f, where "rule" is a data frame that stores the nodes in a column labelled "x" and the weights in a column labelled "w". For many functions the output is indeed the same, but for some functions I get very different results. Can someone please help me understand this discrepancy?
Thanks!
Code:
n.quad = 50
rule = hermite.h.quadrature.rules(n.quad)[[n.quad]]
f <- function(z){
f1 <- function(x,y) pnorm(x+y)
f2 <- function(y) ghermite.h.quadrature(f1,rule,y = y)
g <- function(x,y) x/(1+y) / f2(y)*pnorm(x+y)
h <- function(y) ghermite.h.quadrature(g,rule,y=y)
h(z)
}
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
Ok, that problem got me interested.
I've looked into the gaussquad sources, and the author clearly does not run sapply internally; every integrand is expected to return a vector when given a vector argument.
This is stated in the documentation:
functn an R function which should take a numeric argument x and possibly some parameters.
The function returns a numerical vector value for the given argument x
The package's own internal functions are written that way, so everything works for them.
You have to rewrite your function so that it accepts a vector argument and returns a vector.
UPDATE
Vectorize() rectifies the problem for me, as does a simple wrapper with sapply:
vf <- function(z) {
sapply(z, f)
}
After either of those changes, results are identical: 0.2029512
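The effect can be reproduced without gaussquad. In this sketch `quad_sum` is a hypothetical helper that, like the behaviour described above, evaluates the integrand once on the whole node vector; the nodes and weights are made up for illustration and are not a real Gauss-Hermite rule.

```r
# Hypothetical quadrature helper: calls f ONCE on the full vector of nodes.
quad_sum <- function(f, x, w) sum(w * f(x))

# Illustrative nodes and weights (NOT a real Gauss-Hermite rule).
x <- c(-1, 0, 1, 2)
w <- c(0.1, 0.4, 0.4, 0.1)

# Correct for a scalar z, but not vectorised: sum() collapses a vector
# input into a single number.
f <- function(z) sum(exp(-(z - c(0, 1))^2))

naive   <- quad_sum(f, x, w)             # f sees all nodes at once: wrong
by_node <- sum(sapply(x, f) * w)         # f sees one node at a time
fixed   <- quad_sum(Vectorize(f), x, w)  # Vectorize() restores agreement

c(naive = naive, by_node = by_node, fixed = fixed)
```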
I am trying to create an n-by-k matrix of k MVN covariates using a loop.
It should be quite simple, but it is not working so far. Here is my code:
n=1000
k=5
p=100
mu=0
sigma=1
x=matrix(data=NA, nrow=n, ncol=k)
for (i in 1:k){
x [[i]]= mvrnorm(n,mu,sigma)
}
What's missing?
I see several things here:
You may want to set the random seed for replicability (set.seed(20430)). This means that every time you run the code, you will get exactly the same set of pseudorandom variates.
Next, your data will just be independent variates; they won't actually have any multivariate structure (although that may be what you want). In general, if you want to generate multivariate data, you should use ?mvrnorm, from the MASS package. (For more info, see here.)
As a minor point, if you want standard normal data, you don't need to specify mu = 0 and sigma = 1, as those are the default values for rnorm().
You don't need a loop to fill a matrix in R, just generate as many values as you like and add them directly using the data= argument in the matrix() function. If you really were committed to using a loop, you should probably use a double loop, so that you are looping over the columns, and within each loop, looping over the rows. (Note that this is a very inefficient way to code in R--although I do things like that all the time ;-).
Lastly, I can't tell what p is supposed to be doing in your code.
Here is a basic way to do what you seem to be going for:
set.seed(20430)
n = 1000
k = 5
dat = rnorm(n*k)
x = matrix(data=dat, nrow=n, ncol=k)
If you really wanted to use loops you could do it like this:
mu = 0
sigma = 1
x = matrix(data=NA, nrow=n, ncol=k)
for(j in 1:k){
for(i in 1:n){
x[i,j] = rnorm(1, mu, sigma)
}
}
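If correlated columns are wanted, one base R option is to multiply independent normals by a Cholesky factor of the target covariance. Note that this swaps in a Cholesky construction in place of MASS::mvrnorm so the sketch runs without extra packages; `rho` and `Sigma` are hypothetical values chosen for illustration.

```r
set.seed(20430)
n <- 1000
k <- 5

rho   <- 0.5                      # hypothetical common correlation
Sigma <- matrix(rho, k, k)
diag(Sigma) <- 1

# chol() returns upper-triangular R with t(R) %*% R = Sigma, so rows of
# z %*% R have covariance Sigma when rows of z are independent N(0, I).
z <- matrix(rnorm(n * k), nrow = n, ncol = k)
x <- z %*% chol(Sigma)

dim(x)
round(cor(x), 2)  # should be roughly rho off the diagonal
```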
Define the matrix first:
E<-matrix(data=0, nrow=10, ncol=10);
Then run two loops, with i iterating over rows and j over columns; mine is an exchangeable correlation structure:
for (i in 1:10)
{
for (j in 1:10)
{
if (i==j) {E[i,j]=1}
else {E[i,j]=0.6}
}
};
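For this particular structure the loops can be avoided. A short sketch, with `E_loop` and `E_vec` as names introduced here for the comparison:

```r
# Loop version, equivalent to the code above.
E_loop <- matrix(0, nrow = 10, ncol = 10)
for (i in 1:10) {
  for (j in 1:10) {
    E_loop[i, j] <- if (i == j) 1 else 0.6
  }
}

# Vectorised version: fill with the off-diagonal value, then set the diagonal.
E_vec <- matrix(0.6, nrow = 10, ncol = 10)
diag(E_vec) <- 1

all.equal(E_loop, E_vec)
```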
A=c(2,3,4,5);# In your case row terms
B=c(3,4,5,6);# In your case column terms
x=matrix(,nrow = length(A), ncol = length(B));
for (i in 1:length(A)){
for (j in 1:length(B)){
x[i,j]<-(A[i]*B[j])# do the similarity function, simi(A[i],B[j])
}
}
x # matrix is filled
I was thinking of this from the perspective of my own problem.
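The same fill can be written with outer(), which applies a function to every pair of elements. In this sketch `simi` is a hypothetical placeholder for a similarity function; wrapping it in Vectorize() is only needed when the function cannot handle vector arguments itself.

```r
A <- c(2, 3, 4, 5)
B <- c(3, 4, 5, 6)

# Loop version, as above.
x_loop <- matrix(NA_real_, nrow = length(A), ncol = length(B))
for (i in seq_along(A)) {
  for (j in seq_along(B)) {
    x_loop[i, j] <- A[i] * B[j]
  }
}

# outer() defaults to multiplication.
x_outer <- outer(A, B)

# For a custom, possibly non-vectorised function, wrap it with Vectorize().
simi   <- function(a, b) a * b   # hypothetical placeholder
x_simi <- outer(A, B, Vectorize(simi))

all.equal(x_loop, x_outer)
all.equal(x_loop, x_simi)
```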