Issue with the dimension of the matrix being optimised in R

I am attempting to calculate some weights in order to perform an indirect treatment comparison using R. I have altered some code slightly to reflect that I am only centring the mean, but the altered code will not run.
I believe this is due to the a1 matrix having two columns instead of one, but I really can't work out how to change this. I have tried adding a column of zeros and ones to the matrix, but I'm not sure this would give a correct result.
Of course, this may not be the issue at all, but I fail to see what else could be causing it. I have included the code below; any advice would be appreciated.
# Objective function
objfn <- function(a1, X){
  sum(exp(X %*% a1))
}
# Gradient function
gradfn <- function(a1, X){
  colSums(sweep(X, 1, exp(X %*% a1), "*"))
}
X.EM.0 = data$A - age.mean
# Estimate weights
print(opt1 <- optim(par = c(0,0), fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS"))
a1 <- opt1$par

Such a simple solution, I'm slightly embarrassed to have posted this.
par = c(0,0) should be altered to match the number of columns in the data passed as X. Here, X.EM.0 has a single column, so it should have been changed to a single starting value, par = 0.
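For reference, a minimal sketch of the corrected call (the weight line is an addition here, under the usual matching-adjusted weighting assumption, and not part of the original answer):
# par now has length one, matching the single centred covariate
print(opt1 <- optim(par = 0, fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS"))
a1 <- opt1$par
wt <- exp(X.EM.0 %*% a1)  # assumed MAIC-style weights; normalise as needed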

Related

Solve non-linear equations using "nleqslv" package

I tried to solve these non-linear equations using nleqslv, but it does not work well. I know the reason it does not: I didn't separate the two unknowns onto different sides of the equations.
My questions are:
1. Are there any other packages that could solve this kind of equations?
2. Is there any effective way in R to rearrange the equations so that they meet the requirements of the nleqslv package?
Thank you guys.
Here is the code; p[1] and p[2] are the two unknowns I want to solve for.
dslnex <- function(p){
  p <- numeric(2)
  0.015 = sum(exp(Calib2$Median_Score*p[1]+p[2])*weight_pd_bad)
  cum_dr <- 0
  for (i in 1:length(label)){
    cum_dr[i] <- exp(Calib2$Median_Score*p[1]+p[2])[1:i]*weight_pd_bad[1:i]/0.015
  }
  mid <- 0
  for (i in 1:length(label)){
    mid[i] <- sum(cum_dr[1:i])/2
  }
  0.4 = (sum(mid*weight_pd_bad)-0.5)/(0.5*(1-0.015))
}
pstart <- c(-0.000679354, -4.203065891)
z <- nleqslv(pstart, dslnex, jacobian=TRUE, control=list(btol=.01))
Following up on my comment, I have rewritten your function as follows, correcting errors and inefficiencies. The errors and other changes are noted in inline comments.
# no need to use dslnex as the name for your function
# dslnex <- function(p){
# any valid name will do
f <- function(p) {
  # do not do this:
  # you are overwriting p as passed by nleqslv
  # p <- numeric(2)
  # declare the return vector
  y <- numeric(2)
  y[1] <- 0.015 - (sum(exp(Calib2$Median_Score*p[1]+p[2])*weight_pd_bad))
  # do not do this:
  # cum_dr is initialized as a scalar and will be made into a vector
  # which is grown as each new element is inserted (can be very inefficient)
  # cum_dr <- 0
  # so declare cum_dr to be a vector with length(label) elements
  cum_dr <- numeric(length(label))
  for (i in 1:length(label)){
    cum_dr[i] <- exp(Calib2$Median_Score*p[1]+p[2])[1:i]*weight_pd_bad[1:i]/0.015
  }
  # same problem as above
  # mid <- 0
  mid <- numeric(length(label))
  for (i in 1:length(label)){
    mid[i] <- sum(cum_dr[1:i])/2
  }
  y[2] <- 0.4 - (sum(mid*weight_pd_bad)-0.5)/(0.5*(1-0.015))
  # return the vector y
  y
}
pstart <- c(-0.000679354, -4.203065891)
z <- nleqslv(pstart, f, jacobian=TRUE, control=list(btol=.01))
nleqslv is intended for solving systems of equations of the form f(x) = 0, and the system must be square: the function must return a vector of the same length as the x vector.
You should now be able to proceed, provided your system of equations has a solution and there are no further errors in your equations. I have my doubts about the [1:i] in the expression for cum_dr and in the expression for mid[i]. The loop calculating mid can possibly be written as a single statement: mid <- cumsum(cum_dr)/2. Up to you.
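As an aside, here is a minimal sketch of the f(x) = 0 convention that nleqslv expects, using a simple made-up two-equation system (not the asker's model):
library(nleqslv)
g <- function(p) {
  y <- numeric(2)                     # square system: 2 equations, 2 unknowns
  y[1] <- p[1]^2 + p[2]^2 - 2         # each equation rearranged to equal zero
  y[2] <- exp(p[1] - 1) + p[2]^3 - 2
  y
}
nleqslv(c(2, 0.5), g)$x               # converges to approximately c(1, 1)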

Vectorising two similar functions in R works for one

Today I came across a problem: two almost identical functions work as intended before vectorisation, but after it, one works fine and the other returns an error.
I am examining the robustness of various estimators with respect to different transformations of the residuals and different aggregating functions. Quantile regression and least median of squares are particular cases of what I am doing.
So I wrote the following code to see how the least trimean of squares would behave, and found that it works fine if the model parameters are supplied as separate arguments, but fails if they come in a vector. I need the first form for plotting (it is convenient to use outer(...) to get a value matrix for persp, or just to supply f(x, y) to persp3d from library(rgl)), and the second form for estimation (R optimisers expect a vector of inputs as the first argument over which the minimisation is done).
MWE:
set.seed(105)
N <- 204
x <- rlnorm(N)
y <- 1 + x + rnorm(N)*sqrt(.1+.2*x+.3*x^2)
# A simple linear model with heteroskedastic errors
resfun <- function(x) return(x^2)
# Going to minimise a function of squared residuals...
distfun <- function(x) return(mean(quantile(x, c(0.25, 0.5, 0.5, 0.75))))
# ...which in this case is the trimean
penalty <- function(theta0, theta1) {
  r <- y - theta0 - theta1*x
  return(distfun(resfun(r)))
}
pen2 <- function(theta) {
  r <- y - theta[1] - theta[2]*x
  return(distfun(resfun(r)))
}
penalty(1, 1) # 0.5352602
pen2(c(1, 1)) # 0.5352602
vpenalty <- Vectorize(penalty)
vpen2 <- Vectorize(pen2)
vpenalty(1, 1) # 0.5352602
vpen2(c(1, 1))
Error in quantile.default(x, c(0.25, 0.5, 0.5, 0.75)) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Why does vpen2, being vectorised pen2, choke even on a single input?
As jogo pointed out, vpen2 is vectorised over the elements of the input vector, so it calls pen2 on each scalar element separately; theta[2] is then NA inside pen2, which is how the NaNs reach quantile. The right way to go is to use something like
a <- matrix(..., ncol=2)
apply(a, 1, pen2)
This will return a vector of values of pen2 evaluated at each row of the matrix.
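An illustrative sketch of using both forms together (the grid values are arbitrary, chosen only for demonstration):
th0 <- seq(0, 2, by = 0.5)
th1 <- seq(0, 2, by = 0.5)
zmat <- outer(th0, th1, vpenalty)       # value matrix for persp(), as in the question
a <- as.matrix(expand.grid(th0, th1))   # parameter pairs as rows
apply(a, 1, pen2)                       # one penalty value per row
optim(c(1, 1), pen2)$par                # pen2 plugs straight into optim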

Problems with Gaussian Quadrature in R

I'm using the gaussquad package to evaluate some integrals numerically.
I thought the ghermite.h.quadrature command worked by evaluating a function f(x) at points x1, ..., xn and then constructing the sum w1*f(x1) + ... + wn*f(xn), where x1, ..., xn and w1, ..., wn are nodes and weights supplied by the user.
Thus I thought the commands
ghermite.h.quadrature(f, rule)
sum(sapply(rule$x, f)*rule$w)
would yield the same output for any function f, where "rule" is a data frame that stores the nodes in a column labelled "x" and the weights in a column labelled "w". For many functions the output is indeed the same, but for some functions I get very different results. Can someone please help me understand this discrepancy?
Thanks!
Code:
n.quad = 50
rule = hermite.h.quadrature.rules(n.quad)[[n.quad]]
f <- function(z){
  f1 <- function(x, y) pnorm(x+y)
  f2 <- function(y) ghermite.h.quadrature(f1, rule, y = y)
  g <- function(x, y) x/(1+y) / f2(y)*pnorm(x+y)
  h <- function(y) ghermite.h.quadrature(g, rule, y = y)
  h(z)
}
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
Ok, that problem got me interested.
I've looked into the gaussquad sources, and clearly the author is not running sapply internally: every integrand must return a vector when given a vector argument.
It is clearly stated in the documentation:
functn: an R function which should take a numeric argument x and possibly some parameters. The function returns a numerical vector value for the given argument x.
When you use built-in functions, they are already written that way, so everything works.
You have to rewrite your function to take a vector argument and return a vector.
UPDATE
Vectorize() works for me to rectify the problem, as does a simple wrapper with sapply:
vf <- function(z) {
  sapply(z, f)
}
After either of those changes, the results are identical: 0.2029512.
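For completeness, a sketch of the Vectorize() route mentioned above (vf2 is a name introduced here for illustration):
vf2 <- Vectorize(f)                # element-wise wrapper around f
ghermite.h.quadrature(vf2, rule)   # now matches sum(sapply(rule$x, f)*rule$w)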

Find minimums with R (1 Variable X, n times a fixed parameter U)

I'm trying to minimise the function f(X,U) = (X*log(X) - 1/(1-U))^2,
where U = (U_1, ..., U_n) ~ U(0,1); that is, I have n fixed values of U and want to find the minimum of each of:
(x_1*ln(x_1)-1/(1-u_1))^2
(x_2*ln(x_2)-1/(1-u_2))^2
......
(x_n*ln(x_n)-1/(1-u_n))^2
For that, I wanted to use the optim function.
I have defined:
n <- 10^3
U <- sort(runif(n, min=0, max=1))
X <- c()
Xsolution <- c()
f <- function(X, U){
  return(-(X*log(X)-(1/(1-U)))^2)
} # negated, because min(f) = max(-f)
Now I have no idea how to do this with optim(). I always get the following error with this code:
for(i in 1:n){
  Xsolution[i] <- optim(f(X, U[i])
}
Error in log(X) : non-numeric argument to mathematical function
Side note: I would welcome a method without a for-loop, since for large n it will take too long. Maybe you can help me get it working with sapply? Or an alternative way?
Alternatively, I thought I had it working with optimize(..., maximum=FALSE, ...):
f <- function (X, a) ((X*log(X)-(1/(1-a)))^2)
for (i in 1:n){
  xmin[i] <- optimize(f, c(0, 10000), tol = 0.0001, a = U[i])
}
This doesn't work properly either...
The problem may also be that it will take too long: I want to do it with n = 10^6, and I'm quite sure there has to be a way of doing it without a for-loop, which I think is what makes this take ages. Please help me; I've been sitting on this problem for ages and it's quite frustrating.
Since X*log(X) = 1/(1-U[i]) can be solved numerically for any U[i], each term (X*log(X) - 1/(1-U[i]))^2 can be driven to zero, so there is one solution for each distinct U[i]. If the U[i] are all distinct, that means there are length(U) solutions, given by the following (the unique can be omitted if the U[i] are all distinct):
f <- function (X, a) ((X*log(X)-(1/(1-a)))^2)
unique(sapply(U, function(a) optimize(f, c(0, 1000000), a = a)$minimum))
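An equivalent sketch via root finding, since minimising each square amounts to solving X*log(X) = 1/(1-a) directly (the interval c(1, 1e6) is an assumption that brackets the root for typical U values):
g <- function(X, a) X*log(X) - 1/(1-a)   # the root of g is the minimiser of f
sapply(U, function(a) uniroot(g, c(1, 1e6), a = a)$root)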

How do I repeat a calculation for each row of a matrix in R?

I am very, very new to programming and R. I have tried to find an answer to my question, but part of the problem is that I don't know exactly what to search for.
I am trying to repeat a calculation (a statistical distance) for each row of a matrix. Here is what I have so far:
pollution1 <-as.matrix(pollution[,5:6])
ss <- var(pollution1)
ssinv <- solve(ss)
xbar <- colMeans(pollution1)
t(pollution1[1,]-xbar)%*%ssinv%*%(pollution1[1,]-xbar)
This gets me only the first statistical distance, but I don't want to retype this line with a different row index to get all of them.
From what I have read, I may need a loop or apply(), but I haven't had success on my own. Any help with this, and advice on how to search for this sort of thing so I don't need to post, would be appreciated.
Thank you.
You might also consider the mahalanobis function. From ?mahalanobis:
Returns the squared Mahalanobis distance of all rows in x and the vector mu = center with respect to Sigma = cov. This is (for vector x) defined as
D^2 = (x - mu)' Sigma^-1 (x - mu)
Of course, it's good to learn how to use apply too ...
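A minimal sketch using the objects defined in the question (assuming pollution1, xbar, and ss as above):
mahalanobis(pollution1, center = xbar, cov = ss)   # one squared distance per row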
What about just using apply?
apply(pollution1, 1, function(i) t(i-xbar) %*% ssinv %*% (i-xbar))
Also, it's helpful if you make your example reproducible, for example:
pollution1 = matrix(rnorm(100), ncol=2)
ss = var(pollution1)
ssinv = solve(ss)
xbar = colMeans(pollution1)
t(pollution1[1,]-xbar) %*% ssinv %*% (pollution1[1,]-xbar)
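As an illustrative sanity check (an addition, not part of the original answer), the two routes should agree on this reproducible example:
d2.apply <- apply(pollution1, 1, function(i) t(i-xbar) %*% ssinv %*% (i-xbar))
d2.maha  <- mahalanobis(pollution1, center = xbar, cov = ss)
all.equal(as.numeric(d2.apply), d2.maha)   # TRUE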
