Solve equation with data.table in R

I am working with a large dataset in R using data.table. I need to solve an equation, that is, find the value of x for which:
data[,mean(pnorm(qnorm(var1)+x))] = 0.07
I have tried the function optimx with the following code:
library(optimx)
fnToFindRoot = function(x) {
data[,mean(pnorm(qnorm(var1)+x))]}
rootSearch = optimx(0.07, fnToFindRoot)
str(rootSearch)
fnToFindRoot(rootSearch$par)
But the result it produces is not correct. Can someone help me to solve this equation?

Since you are only optimising over one variable, optimize should work fine, e.g.:
fnToFindRoot = function(x, a=0.07) {
y <- data[,(mean(pnorm(qnorm(var1)+x)) - a)^2]
print(sprintf("x: %s, y:%s", x, y))
y
}
rootSearch = optimize(fnToFindRoot, interval=c(-5, 5), a=0.07)
fnToFindRoot(rootSearch$minimum)
The problem with the way you had it set up is that optimx always tries to minimise the objective. As written, it was minimising mean(pnorm(qnorm(var1)+x)) itself, with 0.07 as the starting value of x. What you want instead is to get the objective as close to 0.07 as possible, so minimise (mean(pnorm(qnorm(var1)+x)) - a)^2.
The interval argument controls the range of x that optimize searches.
Edit: I was using made-up data, so check whether rootSearch$minimum works for you. My made-up data:
library(data.table)
set.seed(1)
# Build the column directly instead of adding it to an empty data.table
data <- data.table(var1 = runif(100, 0.04, 0.45))
> fnToFindRoot(rootSearch$minimum)
[1] "x: -0.857685927870974, y:4.1043516952502e-13"
[1] 4.104352e-13
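Because mean(pnorm(qnorm(var1)+x)) is increasing in x, a one-dimensional root finder is another option. Below is a minimal sketch using base R's uniroot. It assumes var1 is held in a plain numeric vector (made-up data generated as above) rather than in a data.table column, and the helper name rootFn is only illustrative.

```r
# Made-up data, generated as above but kept as a plain vector
set.seed(1)
var1 <- runif(100, 0.04, 0.45)

# Root function: zero exactly where the mean equals the target a
rootFn <- function(x, a = 0.07) mean(pnorm(qnorm(var1) + x)) - a

# rootFn changes sign on [-5, 5], so uniroot can bracket the root
sol <- uniroot(rootFn, interval = c(-5, 5), tol = 1e-10)
sol$root
mean(pnorm(qnorm(var1) + sol$root))  # should be very close to 0.07
```

With a data.table, the body of rootFn would presumably become data[, mean(pnorm(qnorm(var1) + x))] - a.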

Related

How to solve nonlinear equations in R with Controls

I am trying to solve a system of nonlinear equations subject to bounds on the variables.
Here is my code:
fun <- function(x) {
b0 <- (0.64*1+(1-0.64)*x[1]*(x[2]*x[1]-1)+x[1]*1*(1-x[2])*x[3])/(x[1]-1) -1805*2.85*0.64
b1plus <- (0.64*1+(1-0.64)*x[1]*(x[2]*x[1]-1.01)+x[1]*1.01*(1-x[2])*x[3])/(1.01*(x[1]-1)) -1805*2.85*0.64*(1+0.00235)
b1minus <- (0.64*1+(1-0.64)*x[1]*(x[2]*x[1]-0.99)+x[1]*0.99*(1-x[2])*x[3])/(0.99*(x[1]-1)) -1805*2.85*0.64*(1-0.00235)
return(c(b0,b1plus,b1minus))
}
library(rootSolve)
multiroot(fun, c(1.5, 0, 0))
However, the result I get is far from what I expect. I want to restrict x1 to the range (1.5, 4), x2 to (0, 1) and x3 to (0, 10000). How can I do that?
Thank you!!
Methods like Newton-Raphson, as used in multiroot or nleqslv, do not work well with bound constraints. One possible approach is to square and sum the components of your function:
fun1 <- function(x) sum(fun(x)^2)
and then treat this as a global optimization problem where you hope for a minimum value of 0.0. For example, GenSA provides an implementation of "simulated annealing" that works reasonably well in low dimensions.
library(GenSA)
res <- GenSA(par=NULL, fn=fun1,
lower=c(1.5,0,0), upper=c(4,1,10000),
control=list(maxit=10e5))
res$value; res$par
## [1] 119.7869
## [1] 4.00 0.00 2469.44
Several tries did not find a lower function value than this one, which makes me think there is no common root in the constraint box you requested.
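As a rough cross-check of that conclusion, the same sum of squares can be handed to a bounded local optimiser started from many random points. The sketch below uses base R's optim with method "L-BFGS-B"; the number of starts and the seed are arbitrary choices, fun is a compact restatement of the three equations from the question, and a multi-start search like this is not guaranteed to find the global minimum.

```r
# Compact restatement of the three equations from the question
fun <- function(x) {
  k <- 1805 * 2.85 * 0.64
  g <- function(s) {
    (0.64 + (1 - 0.64) * x[1] * (x[2] * x[1] - s) + x[1] * s * (1 - x[2]) * x[3]) /
      (s * (x[1] - 1))
  }
  c(g(1) - k, g(1.01) - k * (1 + 0.00235), g(0.99) - k * (1 - 0.00235))
}
fun1 <- function(x) sum(fun(x)^2)

lower <- c(1.5, 0, 0)
upper <- c(4, 1, 10000)

set.seed(42)  # arbitrary seed, only for reproducibility
best <- NULL
for (i in 1:50) {
  start <- runif(3, lower, upper)  # random start inside the box
  fit <- optim(start, fun1, method = "L-BFGS-B", lower = lower, upper = upper)
  if (is.null(best) || fit$value < best$value) best <- fit
}
best$par
best$value  # a value clearly above 0 suggests no common root in the box
```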

Calculating the mean and variance of a periodic (circular) variable in R

I have several variables in my dataset that represent daily timing of events across a week.
For example, two rows might look like:
t1 = c(NA,12.6,10.7,11.5,12.5,9.5,14.1)
t2 = c(23.7,1.2,NA,22.9,23.2,0.5,0.1)
I want to calculate the variance of these rows. To do this, I need the mean and because these are periodic variables, I've adapted the code from this page:
#This can all be wrapped in a function like this
circ.mean <- function(m,int,na.rm=T) {
if(na.rm) m <- m[!is.na(m)]
rad.m = m*(360/int)*(pi/180)
mean.cos = mean(cos(rad.m))
mean.sin = mean(sin(rad.m))
x.deg = atan(mean.sin/mean.cos)*(180/pi)
return(x.deg/(360/int))
}
This works as expected for t2:
> circ.mean(t2,24)
[1] -0.06803088
although ideally the answer would be 23.93197. But for t1, it gives an incorrect answer:
> circ.mean(t1,24)
[1] -0.1810074
whereas using the normal mean function gives the right answer:
> mean(t1,na.rm=T)
[1] 11.81667
My questions are:
1) Is this "circular mean" code correct and if so, am I using it correctly?
2) I've had a stab at my own circ.var function (see below) to calculate the variance of a periodic variable - will this produce the correct variances for all possible input timing vectors?
circ.var <- function(m,int=NULL,na.rm=TRUE) {
if(is.null(int)) stop("Period parameter missing")
if(na.rm) m <- m[!is.na(m)]
if(sum(!is.na(m))==0) return(NA)
n=length(m)
mean.m = circ.mean(m,int)
var.m = 1/(n-1)*sum((((m-mean.m+(int/2))%%int)-(int/2))^2)
return(var.m)
}
Any help would be hugely appreciated! Thanks for taking the time to read this!
I deleted my old answer, as I believe there was a mistake in the solution I provided.
I've written a series of R scripts that I've made available at my GitHub page which should calculate the mean, variance and other stats.
Thanks to @Gregor for his help.
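Independently of those scripts, note that the t1 result in the question is off by exactly half a period. That is what happens when atan(mean.sin/mean.cos) is used: it cannot tell opposite quadrants apart, whereas atan2 can. A minimal sketch of a circular mean on that basis follows; the name circ_mean2 is illustrative and not taken from the answer.

```r
circ_mean2 <- function(m, int, na.rm = TRUE) {
  if (na.rm) m <- m[!is.na(m)]
  rad <- m * 2 * pi / int                       # map one period onto a full circle
  ang <- atan2(mean(sin(rad)), mean(cos(rad)))  # quadrant-aware angle in (-pi, pi]
  (ang * int / (2 * pi)) %% int                 # back to original units, wrapped into [0, int)
}

t1 <- c(NA, 12.6, 10.7, 11.5, 12.5, 9.5, 14.1)
t2 <- c(23.7, 1.2, NA, 22.9, 23.2, 0.5, 0.1)
circ_mean2(t1, 24)  # close to the arithmetic mean of t1 (about 11.8)
circ_mean2(t2, 24)  # just before midnight (about 23.9)
```

The final %% int is what turns the -0.068 from the question into 23.93.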

Non-linear optimization for exponential function with linear constraints

I am trying to solve a non-linear optimization problem using the function donlp2 in R. My goal is to find the maximum value of the following function:
442.8658*(x1+1)^(0.008752747)*(y1+1)^(0.555782)+(x2+1)^(0.008752747)*(y2+1)^(0.555782)
There are no non-linear constraints. The linear constraints are listed below:
x1+x2<=20000;
y1+y2<=20000;
x1<=4662.41;
x2<=149339;
y1<=14013.94;
y2<=1342738;
x1>=0;
x2>=0;
y1>=0;
y2>=0;
Below is my code:
p <- c(rep(0,4))
par.l <- c(rep(0,4))
par.u <- c(4662.41, 149339, 14013.94, 1342738)
fn <- function(par){
x1 <- par[1]; y1<-par[3]
x2 <- par[2]; y2<-par[4]
y <- 1 / (442.8658*(x1+1)^(0.008752747)*(y1+1)^(0.555782)
+ (x2+1)^(0.008752747)*(y2+1)^(0.555782))
}
A <- matrix(c(rep(c(1,0),2), rep(c(0,1),2)), nrow=2)
lin.l <- c(-Inf, 20000)
lin.u <- c(-Inf, 20000)
ret <- donlp2(p, fn, par.u=par.u, par.l=par.l, A=A, lin.l=lin.l, lin.u=lin.u)
I searched and found some related posts saying that donlp2 can only find the minimum of a function, which is why I took the reciprocal in the objective function.
The code runs without errors, but I have concerns about the results, since I can easily find other values that give a larger outcome, i.e. the objective function has not really been minimised.
I also found that when I change the initial value or the lower bound of x1, x2, y1, y2, the results change dramatically. For example, if I set p=c(rep(0,4)), par.l<-c(rep(1,4)) instead of p=c(rep(0,4)), par.l<-c(rep(0,4)), the results change from
$par
[1] 2.410409e+00 5.442753e-03 1.000000e+04 1.000000e+04
to
$par
[1] 2331.748 74670.025 3180.113 16819.887
Any ideas? I appreciate your suggestions and help!
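One thing that may be worth checking in the code above: lin.l and lin.u both read c(-Inf, 20000). If these are the lower and upper limits on A %*% par, that would pin y1+y2 to exactly 20000 and give x1+x2 an upper limit of -Inf; both reported solutions do have y1+y2 = 20000. As a cross-check that does not depend on donlp2, here is a minimal sketch using base R's constrOptim with its default Nelder-Mead method. It assumes the parameter order (x1, x2, y1, y2) used in the question and takes the objective exactly as written there, with 442.8658 multiplying only the first term. The starting point is an arbitrary interior point.

```r
a <- 0.008752747
b <- 0.555782

# Negated objective: constrOptim minimises, so minimise -f to maximise f
negf <- function(par) {
  x1 <- par[1]; x2 <- par[2]; y1 <- par[3]; y2 <- par[4]
  -(442.8658 * (x1 + 1)^a * (y1 + 1)^b + (x2 + 1)^a * (y2 + 1)^b)
}

# Constraints written in the form ui %*% par - ci >= 0
ui <- rbind(
  c(-1, -1,  0,  0),  # x1 + x2 <= 20000
  c( 0,  0, -1, -1),  # y1 + y2 <= 20000
  -diag(4),           # upper bounds on x1, x2, y1, y2
  diag(4)             # lower bounds (all >= 0)
)
ci <- c(-20000, -20000, -c(4662.41, 149339, 14013.94, 1342738), rep(0, 4))

start <- c(100, 100, 100, 100)  # must lie strictly inside the feasible region
res <- constrOptim(start, negf, grad = NULL, ui = ui, ci = ci)
res$par
-res$value  # value of the original objective at the solution found
```

Negating the objective instead of taking its reciprocal keeps the scale of the problem unchanged, which may make the optimiser less sensitive to the starting point.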

Issue with the dimension of matrix being optimised in R

I am attempting to calculate some weights in order to perform an indirect treatment comparison using R. I have altered some code slightly, in order to reflect that I am only centring the mean. However, this code will not run.
I believe this is due to the a1 matrix having two columns instead of one, but I really can't work out how to change this. I have tried adding a column of zeros and ones to the matrix, but I'm not sure if this will give me a correct result.
Of course, this may not be the issue at all, but I fail to see what else could be causing this. I have included the code and any advice would be appreciated.
# Objective function
objfn <- function(a1, X){
sum(exp(X %*% a1))
}
# Gradient function
gradfn <- function(a1, X){
colSums(sweep(X, 1, exp(X %*% a1), "*"))
}
X.EM.0 = data$A-age.mean
# Estimate weights
print(opt1 <- optim(par = c(0,0), fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS"))
a1 <- opt1$par
Such a simple solution, I'm slightly embarrassed to have posted this.
The length of par must match the number of columns of X. Here X has a single column, so par = c(0,0) should have been a single value, e.g. par = 0.
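A minimal sketch of the corrected call, under some assumptions: the ages and the target mean are made-up values, X is built explicitly as a one-column matrix, and the gradient is written with plain recycling instead of sweep (for a matrix X both should give the same result).

```r
# Hypothetical data: individual ages and a target mean to centre on
age <- c(45, 52, 60, 38, 57, 49, 63, 41)
age.mean <- 52

X <- matrix(age - age.mean, ncol = 1)  # one covariate, so one column

objfn <- function(a1, X) sum(exp(X %*% a1))
gradfn <- function(a1, X) colSums(X * drop(exp(X %*% a1)))

# par has one entry per column of X
opt1 <- optim(par = rep(0, ncol(X)), fn = objfn, gr = gradfn, X = X, method = "BFGS")
a1 <- opt1$par

w <- drop(exp(X %*% a1))  # resulting weights
sum(w * age) / sum(w)     # weighted mean, should be close to age.mean
```

Using rep(0, ncol(X)) for par means the call keeps working if more covariates are added later.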

Finding Sample Size

I am attempting to use several methods (Wald, Wilson, Clopper-Pearson, Jeffreys, etc.) to calculate sample sizes for confidence intervals. I have been unable to find, in R, how to calculate these. Is there a better way to calculate these besides brute force? Does R have a package that will output all to compare?
I have been unsuccessful with the likes of n.clopper.pearson{GenBinomApps} and some of these require lots of by-hand computations. I have done this for the Wald method:
#Variables
z <- 1.95996
d <- .05
p <- 0.5
q <- 1 - p
#Wald
n_wald <- (z^2 * (p*q))/(d^2)
n_wald
But I have not been able to find a way, besides guess-and-check methods, to produce the others in R.
I was able to answer my own question with help from the comments:
library(binomSamSize)
n_wald <- ciss.wald(p, d, alpha = 0.05)
n_wilson <- ciss.wilson(p, d, alpha = 0.05)
n_agricoull <- ciss.agresticoull(p, d, alpha = 0.05)
These functions are from the binomSamSize package. I am still struggling with an optimization for the Clopper-Pearson and Jeffreys methods, if anyone can provide direction there, but these commands calculated the sample size easily.
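For Clopper-Pearson, one possible direction is a direct search using only base R's qbeta. The sketch below looks for the smallest n at which the half-width of the Clopper-Pearson interval, evaluated at the expected count round(n*p), is at most d. That criterion is an assumption made for illustration and may differ from the definition a package would use. The function names cp_halfwidth and n_cp are hypothetical.

```r
# Half-width of the Clopper-Pearson interval at the expected count round(n * p)
cp_halfwidth <- function(n, p, alpha = 0.05) {
  x <- round(n * p)
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  (upper - lower) / 2
}

# First n (searching upwards) whose half-width is at most d
n_cp <- function(p, d, alpha = 0.05, n_max = 1e5) {
  for (n in 2:n_max) {
    if (cp_halfwidth(n, p, alpha) <= d) return(n)
  }
  NA_integer_  # nothing found up to n_max
}

n_cp(p = 0.5, d = 0.05)
```

The half-width is not strictly decreasing in n because of the rounding, so this returns the first n that meets the target rather than a guarantee for every larger n.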
