Bhattacharyya Distance Bug in R Function

While using the bhattacharyya.dist() function from library("fps"), we ran into a bug that returns -Inf when the distance is calculated for two identical distributions. (The correct value in this case is zero.)
This happens because the function multiplies two large determinants: the product overflows to Inf, so the subsequent log of the ratio becomes -Inf. I have posted the old code and the fixed code below.
Could some experts please verify that this is correct, and also suggest how the fix can be brought to the attention of the library's maintainers and distributed? Please note that I am very new to R and statistics, so I am concerned there may be issues with this fix.
I have run some tests with matrices of different sizes and seen the expected results, but admittedly (given my lack of software development experience) this is nowhere close to thorough testing.
OLD ROUTINE
bhattacharyya.dist
function (mu1, mu2, Sigma1, Sigma2)
{
    aggregatesigma <- (Sigma1 + Sigma2)/2
    d1 <- mahalanobis(mu1, mu2, aggregatesigma)/8
    d2 <- log(det(as.matrix(aggregatesigma))/sqrt(det(as.matrix(Sigma1)) *
        det(as.matrix(Sigma2))))/2
    out <- d1 + d2
    out
}
NEW ROUTINE WITH BUG FIX
bhattacharyyaDistance <- function (mu1, mu2, Sigma1, Sigma2)
{
    aggregatesigma <- (Sigma1 + Sigma2)/2
    d1 <- mahalanobis(mu1, mu2, aggregatesigma)/8
    # old: d2 <- log(det(as.matrix(aggregatesigma))/sqrt(det(as.matrix(Sigma1)) *
    #     det(as.matrix(Sigma2))))/2
    # fix: divide by each sqrt(det) separately, so the two large determinants
    # are never multiplied together
    d2 <- log((det(as.matrix(aggregatesigma))/sqrt(det(as.matrix(Sigma1))))
        / sqrt(det(as.matrix(Sigma2))))/2
    out <- d1 + d2
    return(out)
}
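For what it's worth, the overflow can be avoided entirely by staying on the log scale. Below is a minimal sketch (the function name is my own, not part of fps); it relies on base R's determinant(), which returns the log-determinant directly, so the large determinants are never multiplied or divided at all:
# Sketch: log-scale variant of the distance (hypothetical name, not in fps).
# determinant(M, logarithm = TRUE)$modulus is log(det(M)), so d2 can be
# assembled from sums and differences of logs instead of a huge quotient.
bhattacharyyaDistanceLog <- function(mu1, mu2, Sigma1, Sigma2)
{
    aggregatesigma <- (Sigma1 + Sigma2)/2
    d1 <- mahalanobis(mu1, mu2, aggregatesigma)/8
    ldAgg <- determinant(as.matrix(aggregatesigma), logarithm = TRUE)$modulus
    ld1 <- determinant(as.matrix(Sigma1), logarithm = TRUE)$modulus
    ld2 <- determinant(as.matrix(Sigma2), logarithm = TRUE)$modulus
    d2 <- (ldAgg - (ld1 + ld2)/2)/2
    as.numeric(d1 + d2)
}
For two identical distributions ldAgg, ld1 and ld2 coincide, so d2 is exactly zero.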

Related

Function to Produce Repeating Spikes

I asked a similar question on CrossValidated, but did not get a response. I went ahead anyway, and built out a function but am having a problem with replication...
The original question, posted here, is as follows:
I am seeking a function (or short algorithm, ideally implemented in R) that produces something similar to the following:
See, I would like to be able to generate a vector of n items that follows this sort of pattern, mapped to a set of inputs (say, seq(1:n)). Ideally, I would be able to tell the algorithm to "spike" to a maximum height h on every kth time period, and decay at rate r. However, I would be sufficiently happy with simply being able to generate a spike pattern that occurs periodically.
I wrote some code in R, which is included here, that works fairly well...
## Neural Networks / Deep Learning ##
# first, must install Python from:
# https://www.anaconda.com/download/#windows
# https://www.python.org/downloads/
if (!require(keras)) devtools::install_github("rstudio/keras") ; library(keras)
# install_tensorflow()
spikes_model <- function(maxiter, total_spikes = 10, max_height = 0.001, min_height = 0.000005, decay_rate = 1) {
    value_at_iteration <- rep(0, maxiter)
    spike_at <- maxiter / total_spikes
    current_rate <- min_height
    holder_timeval <- 0
    for (i in 1:maxiter) {
        spike_indicator <- i / spike_at
        if (is.integer(spike_indicator)) {
            current_rate <- max_height
            value_at_iteration[i] <- current_rate
            holder_timeval <- spike_indicator
        } else if (i < spike_at) {
            current_rate <- min_height
            value_at_iteration[i] <- current_rate
        } else {
            timeval <- i - (holder_timeval*spike_at)
            current_rate <- max_height*exp(-decay_rate*timeval) + min_height
            value_at_iteration[i] <- current_rate
        }
    }
    return(value_at_iteration)
}
asdf <- spikes_model(maxiter = 100)
plot(asdf, type="l")
... which results in the following plot:
This is exactly what I want, except there is only one spike. I know there is a code or logic error somewhere, but I cannot find where I am going wrong. Please help me replicate this spike procedure across time.
The code this scheduler is used in:
eps <- 1000
sch <- spikes_model(eps)
lr_schedule <- function(epoch, lr) {
    lrn <- sch[as.integer(epoch)]
    lrn <- k_cast_to_floatx(lrn)
    return(lrn)
}
## Add callback to automatically adjust learning rate downward when training reaches plateau ##
reduce_lr <- callback_learning_rate_scheduler(lr_schedule)
## Fit model using training data, validate with validation data ##
mod1.hst <- mod1 %>% fit(
    x = X.train, y = Y.train,
    epochs = eps, batch_size = nrow(X.train),
    validation_data = list(X.val, Y.val),
    shuffle = TRUE, callbacks = list(checkpoint, reduce_lr)
)
Wow, I just figured out my own error. I was using the is.integer() function, which does not work the way I expected: it tests how a value is stored, not whether its value is a whole number. I needed to use the is.whole.number() function from mosaic.
Fixing that single error, I find the following chart, which is exactly what I wanted.
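For readers who would rather avoid the mosaic dependency, the same test can be written in a couple of lines of base R (a sketch; is_whole is my own helper name):
# is.integer() tests how a value is stored, not whether it is whole,
# which is why it always returned FALSE for the division result above.
# A tolerance-based check does what was intended:
is_whole <- function(x, tol = .Machine$double.eps^0.5) {
    abs(x - round(x)) < tol
}
# e.g. replace is.integer(spike_indicator) with is_whole(spike_indicator)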

Issue with the dimension of matrix being optimised in R

I am attempting to calculate some weights in order to perform an indirect treatment comparison using R. I have altered some existing code slightly to reflect that I am only centring the mean. However, this code will not run.
I believe this is because the a1 matrix has two columns instead of one, but I really can't work out how to change this. I have tried adding a column of zeros and ones to the matrix, but I'm not sure this would give a correct result.
Of course, this may not be the issue at all, but I fail to see what else could be causing it. I have included the code below; any advice would be appreciated.
# Objective function
objfn <- function(a1, X){
    sum(exp(X %*% a1))
}
# Gradient function
gradfn <- function(a1, X){
    colSums(sweep(X, 1, exp(X %*% a1), "*"))
}
X.EM.0 <- data$A - age.mean
# Estimate weights
print(opt1 <- optim(par = c(0,0), fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS"))
a1 <- opt1$par
Such a simple solution; I'm slightly embarrassed to have posted this.
par = c(0,0) should be altered to match the number of columns in the data. Here, with a single centred covariate, it should have length one.
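In other words, something like the following sketch (assuming X.EM.0 is kept as a one-column matrix, so that sweep() in gradfn still receives an array):
X.EM.0 <- matrix(data$A - age.mean, ncol = 1)  # single centred covariate
# par must have length ncol(X.EM.0), i.e. one starting value here
print(opt1 <- optim(par = 0, fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS"))
a1 <- opt1$par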

Solve non-linear system of equations R

I am trying to solve a set of non-linear systems of equations using the nleqslv function in R. Unfortunately I am running into trouble guessing initial values that make the function run successfully. I have a vector of values between 0 and 1, called c(t), which should satisfy the following equation:
c(t) = A*(exp(-m*t) + exp(-m*(1024-t))) + B^2
Using three consecutive values of t, I aim to determine the coefficients A, B and m with the following code:
library(nleqslv)
C10 <- c(1.000000e+00, 9.754920e-01, 9.547681e-01, 9.359057e-01, 9.182586e-01, 9.014674e-01)
system_size <- 1024
for (i in 2:5)
{
    C <- c(C10[i-1], C10[i], C10[i+1], i-2)
    # function
    target <- function(Coeffs){
        y <- numeric(3)
        y[1] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4]-1)) + exp(-Coeffs[2]*(system_size-(C[4]-1)))) + Coeffs[3]^2 - C[1]
        y[2] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4])) + exp(-Coeffs[2]*(system_size-(C[4])))) + Coeffs[3]^2 - C[2]
        y[3] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4]+1)) + exp(-Coeffs[2]*(system_size-(C[4]+1)))) + Coeffs[3]^2 - C[3]
        y
    }
    init <- c(0.001, 0.01, 0)
    sol <- nleqslv(init, target, control = list(btol = .01), method = "Broyden")
}
The initial values used reflect what I see when plotting the values c(t). Nonetheless, the generated output sol gives
chr "Jacobian is ill-conditioned (1/condition=9.0e-18) (see allowSingular option)"
Any idea what is going wrong and how to solve this?
OP edit: Modified Code to have minimal working example: added first few values for C10, adjusted loop and added value for system_size
With your added data for C10, the example runs perfectly well once one takes into account that the loop bounds must respect length(C10).
Like this (with some changes; the reasons are explained below):
library(nleqslv)
C10 <- c(1.000000e+00, 9.754920e-01, 9.547681e-01, 9.359057e-01, 9.182586e-01, 9.014674e-01)
system_size <- 1024
target <- function(Coeffs){
    y <- numeric(3)
    y[1] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4]-1)) + exp(-Coeffs[2]*(system_size-(C[4]-1)))) + Coeffs[3]^2 - C[1]
    y[2] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4])) + exp(-Coeffs[2]*(system_size-(C[4])))) + Coeffs[3]^2 - C[2]
    y[3] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4]+1)) + exp(-Coeffs[2]*(system_size-(C[4]+1)))) + Coeffs[3]^2 - C[3]
    y
}
init <- 50*c(0.001, 0.01, 0)
for (i in 2:min(length(C10)-1, (system_size/2)))
{
    C <- c(C10[i-1], C10[i], C10[i+1], i-2)
    # function
    target <- function(Coeffs){
        y <- numeric(3)
        y[1] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4]-1)) + exp(-Coeffs[2]*(system_size-(C[4]-1)))) + Coeffs[3]^2 - C[1]
        y[2] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4])) + exp(-Coeffs[2]*(system_size-(C[4])))) + Coeffs[3]^2 - C[2]
        y[3] <- Coeffs[1]*(exp(-Coeffs[2]*(C[4]+1)) + exp(-Coeffs[2]*(system_size-(C[4]+1)))) + Coeffs[3]^2 - C[3]
        y
    }
    cat("i=", i, "init=", init, "target(init)=", target(init), "\n")
    sol <- nleqslv(init, target, control = list(btol = .01), method = "Broyden")
    print(sol)
}
With your original starting values the model doesn't solve and gives the error message you mention. I have increased the value of init, and then a solution is found up to i=5. Larger values of i with the given C10 won't run, since C10[i+1] is referenced inside the loop (and it doesn't exist).
I have inserted a cat statement before the call of nleqslv and a print(sol) after the function call so that one can at least see what's going on and if a solution is actually found.
You do not need to specify method="Broyden" since it is the default.
You should test sol$termcd inside the for loop and exit if an error occurs.
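Something along these lines (a sketch; see ?nleqslv for the full list of termination codes):
sol <- nleqslv(init, target, control = list(btol = .01))
# termcd == 1 means the function-value criterion was met; codes above 2
# signal trouble (no progress, iteration limit, ill-conditioned Jacobian, ...)
if (sol$termcd > 2) {
    cat("nleqslv failed at i =", i, ":", sol$message, "\n")
    break
}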
With scripts like this always print stuff inside the loop!
Mislav was correct: starting values can be totally wrong.
Even if starting values are ok the algorithms used can fail. That's why the package provides function testnslv and searchZeros.
I did some experiments (not shown here) with testnslv and the conclusion is that method="Newton" is a failure. The dogleg global strategies always seem to work. The linesearch strategies don't always work.
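As a rough sketch of such an experiment (consult ?testnslv for the exact signature and options; the call below only assumes the initial point and the function as the first two arguments):
# Try the available methods/global strategies on the problem set up above
library(nleqslv)
tst <- testnslv(init, target)
print(tst)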

Error in Gradient Descent Calculation

I tried to write a function to calculate gradient descent for a linear regression model. However, the answers I was getting do not match those I get using the normal equation method.
My sample data is:
df <- data.frame(c(1,5,6),c(3,5,6),c(4,6,8))
with c(4,6,8) being the y values.
lm_gradient_descent <- function(df, learning_rate, y_col = length(df), scale = TRUE){
    n_features <- length(df) # n_features is the number of features in the data set
    # using mean normalization to scale features
    if (scale == TRUE){
        for (i in 1:(n_features)){
            df[,i] <- (df[,i] - mean(df[,i]))/sd(df[,i])
        }
    }
    y_data <- df[,y_col]
    df[,y_col] <- NULL
    par <- rep(1, n_features)
    df <- merge(1, df)
    data_mat <- data.matrix(df)
    # we need a temp_arr to store each iteration of parameter values so that we can do a
    # simultaneous update
    temp_arr <- rep(0, n_features)
    diff <- 1
    while (diff > 0.0000001){
        for (i in 1:(n_features)){
            temp_arr[i] <- par[i] - learning_rate*sum((data_mat %*% par - y_data)*df[,i])/length(y_data)
        }
        diff <- par[1] - temp_arr[1]
        print(diff)
        par <- temp_arr
    }
    return(par)
}
Running this function,
lm_gradient_descent(df,0.0001,,0)
the results I got were
c(0.9165891,0.6115482,0.5652970)
When I use the normal equation method, I get
c(2,1,0).
Hope someone can shed some light on where I went wrong in this function.
You used the stopping criterion
old parameters - new parameters <= 0.0000001
First of all I think there's an abs() missing if you want to use this criterion (though my ignorance of R may be at fault).
But even if you use
abs(old parameters - new parameters) <= 0.0000001
this is not a good stopping criterion: it only tells you that progress has slowed down, not that it's already sufficiently accurate. Try instead simply to iterate for a fixed number of iterations. Unfortunately it's not that easy to give a good, generally applicable stopping criterion for gradient descent here.
It seems that you have not implemented a bias term. In a linear model like this, you always want to have an additional additive constant, i.e., your model should be like
w_0 + w_1*x_1 + ... + w_n*x_n.
Without the w_0 term, you usually won't get a good fit.
I know this is a couple of weeks old at this point, but I'm going to take a stab at it for several reasons, namely:
Relatively new to R, so deciphering your code and rewriting it is good practice for me
Working on a different gradient descent problem, so this is all fresh to me
Need the Stack Overflow points, and
As far as I can tell you never got a working answer.
First, regarding your data structures. You start with a dataframe, rename a column, strip out a vector, then strip out a matrix. It would be a lot easier to just start with an X matrix (capitalized since its component 'features' are referred to as x_i) and a y solution vector.
X <- cbind(c(1,5,6),c(3,5,6))
y <- c(4,6,8)
We can easily see what the desired solutions are, with and without scaling, by fitting a linear model. (NOTE: we only scale X/features and not y/solutions.)
> lm(y~X)
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X1 X2
-4 -1 3
> lm(y~scale(X))
Call:
lm(formula = y ~ scale(X))
Coefficients:
(Intercept) scale(X)1 scale(X)2
6.000 -2.646 4.583
With regards to your code, one of the beauties of R is that it can perform matrix multiplication which is significantly faster than using loops.
lm_gradient_descent <- function(X, y, learning_rate, scale = TRUE){
    if (scale == TRUE){ X <- scale(X) }
    X <- cbind(1, X)
    theta <- rep(0, ncol(X)) # your old temp_arr
    diff <- 1
    old.error <- sum((X %*% theta - y)^2) / (2*length(y))
    while (diff > 0.000000001){
        theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
        new.error <- sum((X %*% theta - y)^2) / (2*length(y))
        diff <- abs(old.error - new.error)
        old.error <- new.error
    }
    return(theta)
}
And to show it works...
> lm_gradient_descent(X, y, .01, 0)
[,1]
[1,] -3.9360685
[2,] -0.9851775
[3,] 2.9736566
vs expected of (-4, -1, 3)
For what it's worth, while I agree with @cfh that I would prefer a loop with a defined number of iterations, I'm actually not sure you need the abs() function: if diff < 0, your function is not converging.
Finally, rather than using something like old.error and new.error, I'd suggest using a vector that records all errors. You can then plot that vector to see how quickly your function converges.
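As a sketch of that idea (lm_gd_trace is my own name; it reuses the update rule from the function above with a fixed iteration count):
# Run a fixed number of iterations and record the cost after each update,
# so the convergence behaviour can be inspected visually.
lm_gd_trace <- function(X, y, learning_rate, iters = 5000, scale = TRUE){
    if (scale == TRUE){ X <- scale(X) }
    X <- cbind(1, X)
    theta <- rep(0, ncol(X))
    errors <- numeric(iters)
    for (i in 1:iters){
        theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
        errors[i] <- sum((X %*% theta - y)^2) / (2*length(y))
    }
    list(theta = theta, errors = errors)
}
res <- lm_gd_trace(X, y, .01, scale = FALSE)
plot(res$errors, type = "l", xlab = "iteration", ylab = "cost")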

How can extreme values of a functional be found using R?

I have a functional like this:
(LaTeX formula: $v[y]=\int_0^2 (y'^2+23yy'+12y^2+3ye^{2t})\,dt$)
with given start and end conditions y(0) = -1, y(2) = 18.
How can I find extreme values of this functional in R? I know how it can be done in Excel, for example, but couldn't find an appropriate approach in R.
Before trying to solve such a task in a numerical setting, it might be better to lean back and think about it for a moment.
This is a problem typically treated in the mathematical discipline of "variational calculus". A necessary condition for a function y(t) to be an extremum of the functional (ie. the integral) is the so-called Euler-Lagrange equation, see
Calculus of Variations at Wolfram Mathworld.
Applying it to f(t, y, y') as the integrand in your request, I get (please check, I can easily have made a mistake)
y'' - 12*y - 3/2*exp(2*t) = 0
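For completeness, the intermediate computation (my own working; worth double-checking):
(LaTeX formula: $\frac{d}{dt}\frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y} = \frac{d}{dt}(2y' + 23y) - (23y' + 24y + 3e^{2t}) = 2y'' - 24y - 3e^{2t} = 0$)
Dividing by 2 gives the ODE above.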
You can go now and find a symbolic solution for this differential equation (with the help of a textbook, or some CAS), or solve it numerically with the help of an R package such as 'deSolve'.
PS: Solving this as an optimization problem based on discretization is possible, but may lead you on a long and stony road. I remember solving the "brachistochrone problem" to a satisfactory accuracy only by applying several hundred variables (not in R).
Here is a numerical solution in R. First the functional:
f <- function(y, t = head(seq(0, 2, len = length(y)), -1)){
    len <- length(y) - 1                # number of subintervals
    dy <- diff(y)*len/2                 # finite-difference approximation of y'
    y0 <- (head(y, -1) + y[-1])/2       # midpoint values of y
    2*sum(dy^2 + 23*y0*dy + 12*y0^2 + 3*y0*exp(2*t))/len  # midpoint-rule quadrature
}
Now the function that does the actual optimization. The best results I got were using the BFGS optimization method, and parametrizing using dy rather than y:
findMinY <- function(points = 100,         ## number of points of evaluation
                     boundary = c(-1, 18), ## boundary values
                     y0 = NULL,            ## optional initial value
                     method = "Nelder-Mead", ## optimization method
                     dff = TRUE)           ## if TRUE, optimizes based on dy rather than y
{
    t <- head(seq(0, 2, len = points), -1)
    if (is.null(y0) || length(y0) != points)
        y0 <- seq(boundary[1], boundary[2], len = points)
    if (dff)
        y0 <- diff(y0)
    else
        y0 <- y0[-1]
    y0 <- head(y0, -1)
    ff <- function(z){
        if (dff)
            y <- c(cumsum(c(boundary[1], z)), boundary[2])
        else
            y <- c(boundary[1], z, boundary[2])
        f(y, t)
    }
    res <- optim(y0, ff, control = list(maxit = 1e9), method = method)
    cat("Iterations:", res$counts, "\n")
    ymin <- res$par
    if (dff)
        c(cumsum(c(boundary[1], ymin)), boundary[2])
    else
        c(boundary[1], ymin, boundary[2])
}
With 500 points of evaluation, it only takes a few seconds with BFGS:
> system.time(yy<-findMinY(500,method="BFGS"))
Iterations: 90 18
user system elapsed
2.696 0.000 2.703
The resulting function looks like this:
plot(seq(0,2,len=length(yy)),yy,type='l')
And now a solution that numerically integrates the Euler equation.
As @HansWerner pointed out, this problem boils down to applying the Euler-Lagrange equation to the integrand in the OP's question, and then solving the resulting differential equation, either analytically or numerically. In this case the relevant ODE is
y'' - 12*y = 3/2*exp(2*t)
subject to:
y(0) = -1
y(2) = 18
So this is a boundary value problem, best approached using bvpcol(...) in package bvpSolve.
library(bvpSolve)
F <- function(t, y.in, pars){
    dy <- y.in[2]
    d2y <- 12*y.in[1] + 1.5*exp(2*t)
    return(list(c(dy, d2y)))
}
init <- c(-1, NA)
end <- c(18, NA)
t <- seq(0, 2, by = 0.01)
sol <- bvpcol(yini = init, yend = end, x = t, func = F)
y <- function(t){ # analytic solution...
    b <- sqrt(12)
    a <- 1.5/(4 - b*b)
    u <- exp(2*b)
    C1 <- ((18*u + 1) - a*(exp(4)*u - 1))/(u*u - 1)
    C2 <- -1 - a - C1
    return(a*exp(2*t) + C1*exp(b*t) + C2*exp(-b*t))
}
par(mfrow=c(1,2))
plot(t,y(t), type="l", xlim=c(0,2),ylim=c(-1,18), col="red", main="Analytical Solution")
plot(sol[,1],sol[,2], type="l", xlim=c(0,2),ylim=c(-1,18), xlab="t", ylab="y(t)", main="Numerical Solution")
It turns out that in this very simple example there is an analytical solution:
y(t) = a * exp(2*t) + C1 * exp(sqrt(12)*t) + C2 * exp(-sqrt(12)*t)
where a = -3/16 and C1 and C2 are determined so as to satisfy the boundary conditions. As the plots show, the numerical and analytic solutions agree completely, and also agree with the solution provided by @mrip.
