Underflow in R, sum of logarithm of probabilities - r

How do I calculate the logarithm of a sum of probabilities, i.e. ln(p1 + p2), where p1 and p2 are both very small numbers, given only lp1 = ln(p1) and lp2 = ln(p2)?
If p1 and p2 are very small, underflow will happen. How can this be avoided?

In general, the following tips are useful when taking logs in R:
If you are taking log(1+x) for a very small x there is a function log1p that is more accurate (see also expm1).
log(x^a) = a*log(x)
log(a*x) = log(a) + log(x)
Calculating log(x) for small x is fine: log(1e-308) does not suffer from underflow. Calculating exp(-1e308) is different (it underflows to 0), but the true value is far smaller than any representable number anyway.
One way to solve your problem (even when p1 and p2 are smaller than about 1e-308 and would underflow if computed directly) is to work with log(p2) and the ratio p1/p2 = exp(lp1 - lp2), and then
log(p1 + p2) = log(1 + p1/p2) + log(p2)
Calculate the first term using log1p; you already have the second.
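For concreteness, here is a minimal sketch of this idea in R (the helper name logaddexp is my own choice, not from the question):
logaddexp <- function(lp1, lp2) {
  ## make sure lp2 is the larger log-probability, so exp() below cannot overflow
  if (lp1 > lp2) { tmp <- lp1; lp1 <- lp2; lp2 <- tmp }
  ## log(p1 + p2) = log(p2) + log(1 + p1/p2), without ever forming p1 or p2
  lp2 + log1p(exp(lp1 - lp2))
}
lp1 <- -800; lp2 <- -801   # p1 and p2 would both underflow to 0 as plain doubles
logaddexp(lp1, lp2)        # finite answer, approximately -799.69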

Related

Using optim to choose initial values for nls

One method I have seen in the literature is the use of optim() to choose initial values for nonlinear models fitted with nls or nlme; however, I am puzzled by the actual implementation.
Take an example using COVID data from Alachua, FL:
dat=data.frame(x=seq(1,10,1), y=c(27.9,23.1,24.6,33.0,48.0,136.4,243.4,396.7,519.9,602.8))
x are time points and y is the number of people infected per 10,000 people
Now, if I wanted to fit a four-parameter logistic model in nls, I could use
n1 <- nls(y ~ SSfpl(x, A, B, M, S), data = dat)
But now imagine that parameter estimation is highly sensitive to the initial values so I want to optimize my approach. How would this be achieved?
The way I have thought to try is as follows
fun_to_optim <- function(data, guess){
  x = data$x
  y = data$y
  A = guess[1]
  B = guess[2]
  M = guess[3]
  S = guess[4]
  y = A + (B-A)/(1+exp((M-x)/S))
  return(-sum(y))
}
optim(fn=fun_to_optim, data=dat,
      par=c(10,10,10,10),
      method="Nelder-Mead")
The result from optim() is wrong but I cannot see my error. Thank you for any assistance.
The main issue is that you're not computing/returning the sum of squares from your objective function. However: I think you really have it backwards. Using nls() with SSfpl is about the best you're going to do in terms of optimization: it has sensible heuristics for picking starting values (SS stands for "self-starting"), and it provides a gradient function for the optimizer. It's not impossible that, with a considerable amount of work, you could find better heuristics for picking starting values for a particular system, but in general switching from nls to optim + Nelder-Mead will leave you worse off than when you started (illustration below).
fun_to_optim <- function(data, guess){
  x = data$x
  y = data$y
  A = guess[1]
  B = guess[2]
  M = guess[3]
  S = guess[4]
  y_pred = A + (B-A)/(1+exp((M-x)/S))
  return(sum((y-y_pred)^2))
}
Now run optim() with (1) your suggested starting values; (2) better starting values that are somewhere near the correct ones (you could get most of these by knowing the geometry of the function: A is the left asymptote, B the right asymptote, M the midpoint, S the scale); (3) the same as #2, but using BFGS rather than Nelder-Mead.
opt1 <- optim(fn=fun_to_optim, data=dat,
              par=c(A=10,B=10,M=10,S=10),
              method="Nelder-Mead")
opt2 <- optim(fn=fun_to_optim, data=dat,
              par=c(A=10,B=500,M=10,S=1),
              method="Nelder-Mead")
opt3 <- optim(fn=fun_to_optim, data=dat,
              par=c(A=10,B=500,M=10,S=1),
              method="BFGS")
Results:
xvec <- seq(1,10,length=101)
plot(y~x, data=dat)
lines(xvec, predict(n1, newdata=data.frame(x=xvec)))
p1 <- with(as.list(opt1$par), A + (B-A)/(1+exp((M-xvec)/S)))
lines(xvec, p1, col=2)
p2 <- with(as.list(opt2$par), A + (B-A)/(1+exp((M-xvec)/S)))
lines(xvec, p2, col=4)
p3 <- with(as.list(opt3$par), A + (B-A)/(1+exp((M-xvec)/S)))
lines(xvec, p3, col=6)
legend("topleft", col=c(1,2,4,6), lty=1,
legend=c("nls","NM (bad start)", "NM", "BFGS"))
The nls fit and the fit from good starting values + BFGS overlap and both provide a good fit.
optim/Nelder-Mead from bad starting values is absolutely terrible; it converges on a constant line.
optim/Nelder-Mead from good starting values gets a reasonable fit, but a clearly worse one; I haven't analyzed why it gets stuck there.
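If you do still want to feed the optim() estimates back into nls() as explicit starting values (the workflow the question asks about), a minimal sketch reusing opt3 from above would be:
## use the optim() estimates as starting values for nls()
n2 <- nls(y ~ A + (B-A)/(1+exp((M-x)/S)),
          data = dat,
          start = as.list(opt3$par))
coef(n2)   # should essentially reproduce coef(n1)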

How to simulate a dataset with a binary target in proportions determined 'a priori'?

Can someone tell me the best way to simulate a dataset with a binary target?
I understand how a dataset can be simulated, but what I'm looking for is a way to fix the proportion of each class 'a priori'. What I thought of was changing the intercept to achieve this, but I couldn't get it to work and I don't know why. I guess the averaging is playing a trick on me.
set.seed(666)
x1 = rnorm(1000)
x2 = rnorm(1000)
p=0.25 # <<< I'm looking for a 25%/75%
mean_z=log(p/(1-p))
b0 = mean( mean_z - (4*x1 + 3*x2)) # = mean_z - mean( 4*x1 + 3*x2)
z = b0 + 4*x1 + 3*x2 # = mean_z - (4*x1 + 3*x2) + (4*x1 + 3*x2) = rep(mean_z,1000)
mean( b0 + 4*x1 + 3*x2 ) == mean_z # TRUE!!
pr = 1/(1+exp(-z))
y = rbinom(1000,1,pr)
mean(pr) # ~ 40% << not achieved
table(y)/1000
What I'm looking for is to simulate the typical "logistic" problem in which the binary target can be modeled as a linear combination of features.
These 'logistic' models assume that the log-odds of the binary variable behave linearly. That means:
log(p / (1-p)) = z = b0 + b1*x1 + b2*x2, where p = prob(y = 1)
Going back to my sample code, we could use, for example, z = 1.3 + 4*x1 + 2*x2, but then the class probability would simply be whatever comes out. Or instead we could choose the coefficient b0 such that the probability is (statistically) similar to the one sought:
log(0.25 / 0.75) = b0 + 4*x1 + 2*x2
This is my approach, but there may be better ones.
I gather that you are considering a logistic regression model, right? If so, one way to generate a data set is to create two Gaussian bumps and say that one is class 1 and the other is class 0. Then generate 25 items from class 1 and 75 items from class 0. Then each generated item plus its label is a datum or record or whatever you want to call it.
Obviously you can choose any proportions of 1's and 0's. It is also interesting to make the problem "easy" by making the Gaussian bumps farther apart (i.e. variances smaller in comparison to difference of means) or "hard" by making the bumps overlapping (i.e. variances larger compared to difference of means).
EDIT: In order to make sample data which correspond exactly to a logistic regression model, just make the variances of the two Gaussian bumps the same. When the variances (by this I mean specifically the covariance matrix) are the same, the surfaces of equal posterior class probability are planes; when the covariances are different, the surfaces of equal probability are quadratics. This is a standard result which will appear in many textbooks. I also have some notes online about this, which I can locate if it will help.
Aside from generating the two classes separately and then merging the results into one set, you can also sample from a single distribution over x, plug x into a logistic regression model with some weights (which you choose by any means you wish), and then use the resulting output as a probability for a coin toss. This method isn't guaranteed to output proportions that correspond exactly to prior class probabilities.
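A minimal sketch of the two-Gaussian approach described above; the means, the common unit variance, the sample size, and the seed are arbitrary illustrative choices:
set.seed(1)
n  <- 1000
n1 <- round(0.25 * n)                       # exactly 25% of class 1
n0 <- n - n1                                # the remaining 75% of class 0
## two Gaussian bumps with the same (identity) covariance, so the posterior
## class probability follows a logistic regression model
x_class1 <- cbind(rnorm(n1, mean =  1), rnorm(n1, mean =  1))
x_class0 <- cbind(rnorm(n0, mean = -1), rnorm(n0, mean = -1))
dat <- data.frame(rbind(x_class1, x_class0),
                  y = c(rep(1, n1), rep(0, n0)))
names(dat)[1:2] <- c("x1", "x2")
table(dat$y) / n                            # 0.75 / 0.25 by construction
fit <- glm(y ~ x1 + x2, family = binomial, data = dat)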

How to calculate log(sum of terms) from its component log-terms

(1) The simple version of the problem:
How can I calculate log(P1+P2+...+Pn), given log(P1), log(P2), ..., log(Pn), without exponentiating any of the terms to recover the original Pi? I don't want to recover the original Pi because they are extremely small and may cause numerical underflow.
(2) The long version of the problem:
I am using Bayes' Theorem to calculate a conditional probability P(Y|E).
P(Y|E) = P(E|Y)*P(Y) / P(E)
I have a thousand probabilities multiplied together.
P(E|Y) = P(E1|Y) * P(E2|Y) * ... * P(E1000|Y)
To avoid numerical underflow, I work with log(p) and calculate the sum of the 1000 log(p) values instead of the product of the 1000 p values.
log(P(E|Y)) = log(P(E1|Y)) + log(P(E2|Y)) + ... + log(P(E1000|Y))
However, I also need to calculate P(E), which is
P(E) = sum of P(E|Y)*P(Y)
log(P(E)) does not equal the sum of the log(P(E|Y)*P(Y)) terms. How can I get log(P(E)) without computing the individual P(E|Y)*P(Y) values (which are extremely small) and adding them?
You can use
log(P1+P2+...+Pn) = log(P1*(1 + P2/P1 + ... + Pn/P1))
                  = log(P1) + log(1 + P2/P1 + ... + Pn/P1)
which works for any Pi. So, factoring out maxP = max_i Pi (assume for concreteness that P1 is the largest) gives
log(P1+P2+...+Pn) = log(maxP) + log(1 + P2/maxP + ... + Pn/maxP)
where all of the ratios are at most 1 and can be computed as exp(log(Pi) - log(maxP)) without overflow.
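In R this is the usual "log-sum-exp" computation; a minimal sketch (the matrixStats package also provides a logSumExp() function that does the same thing):
## log(P1 + ... + Pn) computed from lp = c(log(P1), ..., log(Pn))
logsumexp <- function(lp) {
  m <- max(lp)                 # factor out the largest term ...
  m + log(sum(exp(lp - m)))    # ... so the largest exponent is 0 and the sum cannot underflow to 0
}
lp <- c(-1000, -1001, -1002)   # the Pi themselves would all underflow to 0
logsumexp(lp)                  # approximately -999.59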

How can extreme values of a functional be found using R?

I have a functional like this:
v[y] = integral from 0 to 2 of ( y'^2 + 23*y*y' + 12*y^2 + 3*y*exp(2*t) ) dt
with given start and end conditions y(0)=-1, y(2)=18.
How can I find extreme values of this functional in R? I realize how it can be done for example in Excel but didn't find appropriate solution in R.
Before trying to solve such a task in a numerical setting, it might be better to lean back and think about it for a moment.
This is a problem typically treated in the mathematical discipline of "variational calculus". A necessary condition for a function y(t) to be an extremum of the functional (i.e. of the integral) is the so-called Euler-Lagrange equation; see
Calculus of Variations at Wolfram Mathworld.
Applying it to f(t, y, y') as the integrand in your request, I get (please check, I can easily have made a mistake)
y'' - 12*y - 3/2*exp(2*t) = 0
You can now go and find a symbolic solution of this differential equation (with the help of a textbook or a CAS), or solve it numerically with the help of an R package such as 'deSolve'.
PS: Solving this as an optimization problem based on discretization is possible, but may lead you on a long and stony road. I remember solving the "brachistochrone problem" to a satisfactory accuracy only by applying several hundred variables (not in R).
Here is a numerical solution in R. First the functional:
f <- function(y, t=head(seq(0,2,len=length(y)),-1)){
  len <- length(y)-1
  dy  <- diff(y)*len/2                                 # y' ~ diff(y)/dt, with dt = 2/len
  y0  <- (head(y,-1)+y[-1])/2                          # y at the midpoint of each subinterval
  2*sum(dy^2+23*y0*dy+12*y0^2+3*y0*exp(2*t))/len       # midpoint-rule approximation of the integral
}
Now the function that does the actual optimization. The best results I got were using the BFGS optimization method, and parametrizing using dy rather than y:
findMinY <- function(points=100,            ## number of points of evaluation
                     boundary=c(-1,18),     ## boundary values
                     y0=NULL,               ## optional initial value
                     method="Nelder-Mead",  ## optimization method
                     dff=TRUE)              ## if TRUE, optimizes based on dy rather than y
{
  t <- head(seq(0,2,len=points),-1)
  if(is.null(y0) || length(y0)!=points)
    y0 <- seq(boundary[1],boundary[2],len=points)
  if(dff)
    y0 <- diff(y0)
  else
    y0 <- y0[-1]
  y0 <- head(y0,-1)
  ff <- function(z){
    if(dff)
      y <- c(cumsum(c(boundary[1],z)),boundary[2])
    else
      y <- c(boundary[1],z,boundary[2])
    f(y,t)
  }
  res <- optim(y0,ff,control=list(maxit=1e9),method=method)
  cat("Iterations:",res$counts,"\n")
  ymin <- res$par
  if(dff)
    c(cumsum(c(boundary[1],ymin)),boundary[2])
  else
    c(boundary[1],ymin,boundary[2])
}
With 500 points of evaluation, it only takes a few seconds with BFGS:
> system.time(yy<-findMinY(500,method="BFGS"))
Iterations: 90 18
user system elapsed
2.696 0.000 2.703
The resulting function looks like this:
plot(seq(0,2,len=length(yy)),yy,type='l')
And now a solution that numerically integrates the Euler-Lagrange equation.
As @HansWerner pointed out, this problem boils down to applying the Euler-Lagrange equation to the integrand in the OP's question and then solving the resulting differential equation, either analytically or numerically. In this case the relevant ODE is
y'' - 12*y = 3/2*exp(2*t)
subject to:
y(0) = -1
y(2) = 18
So this is a boundary value problem, best approached using bvpcol(...) in package bvpSolve.
library(bvpSolve)
F <- function(t, y.in, pars){
  dy  <- y.in[2]
  d2y <- 12*y.in[1] + 1.5*exp(2*t)
  return(list(c(dy,d2y)))
}
init <- c(-1, NA)   # y(0) = -1; y'(0) unknown, so NA
end  <- c(18, NA)   # y(2) = 18; y'(2) unknown, so NA
t <- seq(0, 2, by = 0.01)
sol <- bvpcol(yini = init, yend = end, x = t, func = F)
y <- function(t){ # analytic solution...
  b  <- sqrt(12)
  a  <- 1.5/(4-b*b)
  u  <- exp(2*b)
  C1 <- ((18*u + 1) - a*(exp(4)*u-1))/(u*u - 1)
  C2 <- -1 - a - C1
  return(a*exp(2*t) + C1*exp(b*t) + C2*exp(-b*t))
}
par(mfrow=c(1,2))
plot(t,y(t), type="l", xlim=c(0,2),ylim=c(-1,18), col="red", main="Analytical Solution")
plot(sol[,1],sol[,2], type="l", xlim=c(0,2),ylim=c(-1,18), xlab="t", ylab="y(t)", main="Numerical Solution")
It turns out that in this very simple example, there is an analytical solution:
y(t) = a * exp(2*t) + C1 * exp(sqrt(12)*t) + C2 * exp(-sqrt(12)*t)
where a = -3/16, and C1 and C2 are determined so as to satisfy the boundary conditions. As the plots show, the numerical and analytical solutions agree completely, and they also agree with the solution provided by @mrip.

Minimization with constraint on all parameters in R

I want to fit a simple linear model Y ~ x1 + x2 + x3 + x4 + x5 by ordinary least squares, with the constraint that the sum of all the coefficients equals 5. How can I accomplish this in R? All of the packages I've seen seem to allow constraints on individual coefficients, but I can't figure out how to impose a single constraint involving several coefficients at once. I'm not tied to OLS; if this requires an iterative approach, that's fine as well.
The basic math is as follows: we start with
mu = a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4
and we want to find a0-a4 to minimize the SSQ between mu and our response variable y.
If we replace the last parameter (say a4) with C - a1 - a2 - a3 to honour the constraint (where C is the required sum of the coefficients), we end up with a new set of linear equations
mu = a0 + a1*x1 + a2*x2 + a3*x3 + (C-a1-a2-a3)*x4
= a0 + a1*(x1-x4) + a2*(x2-x4) + a3*(x3-x4) + C*x4
(note that a4 has disappeared ...)
Something like this (untested!) implements it in R.
Original data frame:
d <- data.frame(y=runif(20),
                x1=runif(20),
                x2=runif(20),
                x3=runif(20),
                x4=runif(20))
Create a transformed version where all but the last column have the last column "swept out", e.g. x1 -> x1-x4; x2 -> x2-x4; ...
dtrans <- data.frame(y=d$y,
                     sweep(d[,2:4], 1, d[,5], "-"),
                     x4=d$x4)
Rename to tx1, tx2, ... to minimize confusion:
names(dtrans)[2:4] <- paste("t",names(dtrans[2:4]),sep="")
Sum-of-coefficients constraint:
constr <- 5
Now fit the model with an offset:
lm(y~tx1+tx2+tx3,offset=constr*x4,data=dtrans)
It wouldn't be too hard to make this more general.
This requires a little more thought and manipulation than simply specifying a constraint to a canned optimization program. On the other hand, (1) it could easily be wrapped in a convenience function; (2) it's much more efficient than calling a general-purpose optimizer, since the problem is still linear (and in fact one dimension smaller than the one you started with). It could even be done with big data (e.g. biglm). (Actually, it occurs to me that if this is a linear model, you don't even need the offset, although using the offset means you don't have to compute a0=intercept-C*x4 after you finish.)
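If you do want the constrained coefficient on x4 itself, it can be recovered from the substitution above; a small sketch continuing from the lm() call (the object name fit is mine, for illustration):
fit  <- lm(y ~ tx1 + tx2 + tx3, offset = constr*x4, data = dtrans)
a123 <- coef(fit)[-1]            # estimates of a1, a2, a3
a4   <- constr - sum(a123)       # from the constraint a1 + a2 + a3 + a4 = constr
c(a123, x4 = a4)                 # all four constrained coefficients; they sum to constr by construction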
Since you said you are open to other approaches, this can also be solved as a quadratic programming (QP) problem:
Minimize a quadratic objective: the sum of the squared errors,
subject to a linear constraint: your weights must sum to 5.
Assuming X is your n-by-5 design matrix and Y is your response vector of length n, this solves for the optimal weights:
library(limSolve)
lsei(A = X,
     B = Y,
     E = matrix(1, nrow = 1, ncol = 5),
     F = 5)
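For illustration, a minimal self-contained sketch of that call; the simulated X and Y below are assumptions for the example, not from the question, and the fitted coefficients are returned in the X component of the result:
library(limSolve)
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 5), nrow = n, ncol = 5)         # n-by-5 design matrix
Y <- as.vector(X %*% rep(1, 5)) + rnorm(n, sd = 0.1)  # true coefficients are all 1, so they sum to 5
fit <- lsei(A = X, B = Y,
            E = matrix(1, nrow = 1, ncol = 5),        # equality constraint: the weights ...
            F = 5)                                    # ... must sum to 5
fit$X         # constrained least-squares weights
sum(fit$X)    # 5, up to numerical tolerance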
