How to fit a function to measurements with error in Julia?

I am currently using Measurements.jl for error propagation and LsqFit.jl for fitting functions to data. Is there a simple way to fit a function to data with errors? It would be no problem to use another package if that makes things easier.
Thanks in advance for your help.

While in principle it should be possible to make these packages work together, the implementation of LsqFit.jl does not seem to play nicely with the Measurement type. However, one can write a simple least-squares linear regression directly:
# Generate test data, with noise
x = 1:10
y = 2x .+ 3
using Measurements
x_observed = (x .+ randn.()) .± 1
y_observed = (y .+ randn.()) .± 1
# Simple least-squares linear regression
# for an equation of the form y = a + bx
# using `\` for matrix division
linreg(x, y) = hcat(fill!(similar(x), 1), x) \ y
(a, b) = linreg(x_observed, y_observed)
then
julia> (a, b) = linreg(x_observed, y_observed)
2-element Vector{Measurement{Float64}}:
3.9 ± 1.4
1.84 ± 0.23
This ought to be able to work with either x uncertainties, y uncertainties, or both.
If you need a nonlinear fit, it should also be possible to extend the above approach to nonlinear least squares, though in that case it may be easier to just find where the incompatibility is in LsqFit.jl and make a PR.
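For illustration, here is a rough, untested sketch of how the same `\`-based idea could carry over to a nonlinear model via a hand-written Gauss-Newton iteration; the model y ≈ p[1]*exp(p[2]*x), the starting values, and the fixed iteration count are arbitrary choices, not something from LsqFit.jl:
using Measurements

# Hypothetical nonlinear model: y ≈ p[1] * exp(p[2] * x)
function gauss_newton_fit(x, y; p = [1.0, 0.1], iters = 20)
    for _ in 1:iters
        model = p[1] .* exp.(p[2] .* x)           # current model values
        r = y .- model                            # residuals
        J = hcat(exp.(p[2] .* x),                 # ∂model/∂p[1]
                 p[1] .* x .* exp.(p[2] .* x))    # ∂model/∂p[2]
        p = p .+ (J \ r)                          # least-squares update step
    end
    return p
end
Since `\` works on Measurement arrays (as shown above), the uncertainties in x_observed and y_observed should propagate into the fitted parameters, but treat this as a sketch rather than a vetted implementation.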

Related

Is there a way to optimize the calculation of Bernoulli Log-Likelihoods for many multivariate samples?

I currently have two PyTorch tensors, p and x, which both have shape (batch_size, input_size).
I would like to calculate the Bernoulli log likelihoods for the given data, and return a tensor of size (batch_size)
Here's an example of what I'd like to do:
I have the formula for log likelihoods of Bernoulli Random variables:
$\sum_{i=1}^{d} x_i \ln(p_i) + (1 - x_i)\ln(1 - p_i)$
Say I have p Tensor:
[[0.6 0.4 0], [0.33 0.34 0.33]]
And say I have the x tensor for the binary inputs based on those probabilities:
[[1 1 0], [0 1 1]]
And I want to calculate the log likelihood for every sample, which would result in:
[[ln(0.6)+ln(0.4)], [ln(0.67)+ln(0.34)+ln(0.33)]]
Would it be possible to do this computation without the use of for loops?
I know I could use torch.sum(axis=1) to do the final summation of the logs, but is it possible to do the Bernoulli log-likelihood computation without for loops, or with at most one? I am trying to vectorize this operation as much as possible. (I could have sworn we could use LaTeX for equations before; did something change, or am I thinking of another website?)
Though it is not good practice, you can apply the formula directly to the tensors as follows (this works because these are element-wise operations):
import torch
p = torch.tensor([
[0.6, 0.4, 0],
[0.33, 0.34, 0.33]
])
x = torch.tensor([
[1., 1, 0],
[0, 1, 1]
])
eps = 1e-8
bll1 = (x * torch.log(p+eps) + (1-x) * torch.log(1-p+eps)).sum(axis=1)
print(bll1)
#tensor([-1.4271162748, -2.5879497528])
Note that to avoid a log(0) error, I have introduced a very small constant eps inside the logarithms.
A better way to do this is to use BCELoss from PyTorch's nn module.
import torch.nn as nn
bce = nn.BCELoss(reduction='none')
bll2 = -bce(p, x).sum(axis=1)
print(bll2)
#tensor([-1.4271162748, -2.5879497528])
Since PyTorch computes BCE as a loss, it prepends your formula with a negative sign. The argument reduction='none' says that the computed losses should not be reduced (averaged/summed) across the batch in any way. This is advisable since we then do not need to take care of numerical stability and error handling manually (such as adding eps above).
You can verify that the two solutions actually return the same tensor (up to a tolerance):
torch.allclose(bll1, bll2)
# True
or compare the full tensors (without summing each row):
torch.allclose((x * torch.log(p+eps) + (1-x) * torch.log(1-p+eps)), -bce(p, x))
# True
Feel free to ask for further clarifications.

Multi label classification in Flux.jl?

I am currently working with a dataset in which the boundary between classes is not very well defined. I don't want to use regular classification, since the nuanced overlap between these classes might not be represented with that setup.
I've seen a similar setup in PyTorch where the Binary Cross Entropy Loss function was used, but other than that, I am not sure what needs to be done to reformulate my problem from classification to multi-label classification in Flux. From the Flux.jl docs, it looks like I may want to use a custom Split layer?
This question made me think about how to implement multi-label classification in BetaML, my own ML library, and it turned out to be relatively easy:
(EDIT: model simplified to just a couple of DenseLayers, with the second layer's activation function f = x -> (tanh(x) + 1)/2.)
using BetaML
# Creating test data..
X = rand(2000,2)
# note that the Y are 0.0/1.0 floats
Y = hcat(round.(tanh.(0.5 .* X[:,1] + 0.8 .* X[:,2])),
round.(tanh.(0.5 .* X[:,1] + 0.3 .* X[:,2])),
round.(tanh.(max.(0.0,-3 .* X[:,1].^2 + 2 * X[:,1] + 0.5 .* X[:,2]))))
# Creating the NN model...
l1 = DenseLayer(2,10,f = relu)
l2 = DenseLayer(10,3,f = x -> (tanh(x) + 1)/2)
mynn = buildNetwork([l1,l2],squaredCost,name="Multinomial multilabel regression Model")
# Training the model...
train!(mynn,X,Y,epochs=100,batchSize=8)
# Predictions...
ŷ = round.(predict(mynn,X))
(nrec,ncat) = size(Y)
# Just a basic accuracy measure. The ConfusionMatrix measures could be extended to multi-label classification if needed..
overallAccuracy = sum(ŷ .== Y)/(nrec*ncat) # 0.999
I initially thought of using softmax with a learnable beta parameter, but then I realised that such an approach cannot work: how would the model be able to distinguish between Y = [0 0 0] and Y = [1 1 1]? So I ended up with a layer whose adjusted tanh activation guarantees an output in the [0,1] range for each label "independently", with the threshold set at 0.5, the value that maximises the loss (in BetaML the output is already a vector if the last layer has more than a single node).
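To tie this back to the original Flux.jl question: the same idea (one independent output in [0,1] per label, thresholded at 0.5) would typically be expressed in Flux with a sigmoid output and binary cross-entropy. A rough, untested sketch, assuming a recent Flux version and data laid out with features in columns (Xt is 2×n, Yt is 3×n with 0/1 entries):
using Flux

# Two dense layers; the last one outputs raw logits, one per label
model = Chain(Dense(2 => 10, relu), Dense(10 => 3))

# Numerically stable binary cross-entropy applied to the logits
loss(m, x, y) = Flux.Losses.logitbinarycrossentropy(m(x), y)

opt_state = Flux.setup(Adam(), model)
data = Flux.DataLoader((Xt, Yt), batchsize = 8, shuffle = true)
for epoch in 1:100
    Flux.train!(loss, model, data, opt_state)
end

# Per-label predictions: sigmoid, then threshold at 0.5
Ŷ = Flux.σ.(model(Xt)) .> 0.5
Because each output unit is squashed and thresholded independently, Y = [0 0 0] and Y = [1 1 1] remain distinguishable, just as with the adjusted-tanh layer above.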

How to specify final value (rather than initial value) for solving differential equations

I would like to solve a differential equation in R (with deSolve?) for which I do not have the initial condition, but only the final condition of the state variable. How can this be done?
The typical code is: ode(times, y, parameters, function ...) where y is the initial condition and function defines the differential equation.
Are your equations time reversible, that is, can you change your differential equations so they run backward in time? Most typically this will just mean reversing the sign of the gradient. For example, for a simple exponential growth model with rate r (gradient of x is r*x), flipping the sign makes the gradient -r*x and generates exponential decay rather than exponential growth.
If so, all you have to do is use your final condition(s) as your initial condition(s), change the signs of the gradients, and you're done.
As suggested by @LutzLehmann, there's an even easier answer: ode can handle negative time steps, so just enter your time vector as (t_end, 0). Here's an example, using x'(t) = r*x (i.e. exponential growth). If x(1) = 3, r = 1, and we want the value at t = 0, analytically we would say:
x(T) = x(0) * exp(r*T)
x(0) = x(T) * exp(-r*T)
= 3 * exp(-1*1)
= 1.103638
Now let's try it in R:
library(deSolve)
g <- function(t, y, parms) { list(parms*y) }
res <- ode(3, times = c(1, 0), func = g, parms = 1)
print(res)
## time 1
## 1 1 3.000000
## 2 0 1.103639
I initially misread your question as stating that you knew both the initial and final conditions. This type of problem is called a boundary value problem and requires a separate class of numerical algorithms from standard (more elementary) initial-value problems.
library(sos)
findFn("{boundary value problem}")
tells us that there are several R packages on CRAN (bvpSolve looks the most promising) for solving these kinds of problems.
Given a differential equation
y'(t) = F(t,y(t))
over the interval [t0,tf] where y(tf) = yf is given as the terminal condition, one can transform this into standard initial-value form by considering
x(s) = y(tf - s)
==> x'(s) = - y'(tf-s) = - F( tf-s, y(tf-s) )
x'(s) = - F( tf-s, x(s) )
now with
x(0) = x0 = yf.
This should be easy to code using wrapper functions and in the end some list reversal to get from x to y.
Some ODE solvers also allow negative step sizes, so that one can simply give the times for the construction of y in the descending order tf to t0 without using some intermediary x.

Julia Linear Regression

I was trying to fit a linear regression in Julia.
I have a data frame with 10 columns. The first 9 columns are the predictors,
which I call X, and the last column is the response variable, which I call Y.
I typed linreg(X, Y), but I get an error message saying that
linreg has no method matching DataFrame and DataArray{Float64}.
I was wondering how I could fix the issue.
I was thinking of converting X to an Array.
I tried convert(X, Array), but that threw an error as well:
'convert has no method matching convert'
Does anyone have any suggestions?
If you already have your data in a DataFrame, you should take a look at the GLM.jl package.
Specifically, the lm function should do what you want and will feel very familiar if you are an R user.
If you post more code (maybe which columns in your DataFrame store X and Y) we could help you further.
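For example, a minimal sketch (the column names x1, x2 and y are placeholders for whatever your DataFrame actually contains; with 9 predictors you would list them all on the right-hand side of the formula):
using DataFrames, GLM

df = DataFrame(x1 = randn(100), x2 = randn(100), y = randn(100))
ols = lm(@formula(y ~ x1 + x2), df)   # R-style formula interface
coef(ols)                             # fitted intercept and slopes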
Update: in Julia 1.0 you have to use the dot operator when performing scalar addition on arrays, i.e. y = m*x .+ b.
You can also do linear regression using simple linear algebra.
Here is an example:
# Linear Algebra style
# For simple linear regression: y = m*x .+ b
m = 3.3; b = 2; x = rand(100,1)
y = m * x .+ b
# add noise
yn= y + randn(size(y)) * 0.5
# regression
X = zeros(100,2); X[:,1] = x; X[:,2] .= 1.0
coeff_pred = X\yn
slope = round(coeff_pred[1], digits=2)
intercept = round(coeff_pred[2], digits=2)
println("The real slope is $m, and the predicted slope is $slope")
println("The real intercept is $b, and the predicted intercept is $intercept")
You are just using convert wrong. The correct syntax is convert(T, x), which reads: convert x to a value of type T.
so basically you need to do:
linreg(convert(Array,X),convert(Array,Y))
and it should work.

How can extreme values of a functional be found using R?

I have a functional like this:
$v[y] = \int_0^2 \left( y'^2 + 23yy' + 12y^2 + 3ye^{2t} \right) dt$
with given start and end conditions y(0)=-1, y(2)=18.
How can I find the extreme values of this functional in R? I know how it could be done, for example, in Excel, but I didn't find an appropriate solution in R.
Before trying to solve such a task in a numerical setting, it might be better to lean back and think about it for a moment.
This is a problem typically treated in the mathematical discipline of "variational calculus". A necessary condition for a function y(t) to be an extremum of the functional (i.e. the integral) is the so-called Euler-Lagrange equation; see
Calculus of Variations at Wolfram MathWorld.
Applying it to f(t, y, y') as the integrand in your request, I get (please check, I can easily have made a mistake)
y'' - 12*y - 3/2*exp(2*t) = 0
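Spelling out the check: with $f(t,y,y') = y'^2 + 23yy' + 12y^2 + 3ye^{2t}$, the Euler-Lagrange equation $\frac{\partial f}{\partial y} - \frac{d}{dt}\frac{\partial f}{\partial y'} = 0$ becomes
$(23y' + 24y + 3e^{2t}) - \frac{d}{dt}(2y' + 23y) = 24y + 3e^{2t} - 2y'' = 0,$
which, after dividing by $-2$, is the equation above.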
You can go now and find a symbolic solution for this differential equation (with the help of a textbook, or some CAS), or solve it numerically with the help of an R package such as 'deSolve'.
PS: Solving this as an optimization problem based on discretization is possible, but may lead you on a long and stony road. I remember solving the "brachistochrone problem" to a satisfactory accuracy only by applying several hundred variables (not in R).
Here is a numerical solution in R. First the functional:
f <- function(y, t = head(seq(0, 2, len = length(y)), -1)) {
  len <- length(y) - 1                 # number of subintervals
  dy <- diff(y) * len / 2              # y' = diff(y)/dt, with dt = 2/len
  y0 <- (head(y, -1) + y[-1]) / 2      # midpoint values of y
  2 * sum(dy^2 + 23*y0*dy + 12*y0^2 + 3*y0*exp(2*t)) / len   # sum(...) * dt
}
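In other words, f approximates the integral with forward differences for $y'$ and midpoint values for $y$:
$v[y] \approx \sum_{k=1}^{n-1} \left[ (y_k')^2 + 23\,\bar y_k y_k' + 12\,\bar y_k^2 + 3\,\bar y_k e^{2t_k} \right] \Delta t, \qquad y_k' = \frac{y_{k+1}-y_k}{\Delta t},\ \ \bar y_k = \frac{y_k + y_{k+1}}{2},\ \ \Delta t = \frac{2}{n-1},$
where $n$ = length(y) and $t_k$ is the left endpoint of the $k$-th subinterval.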
Now the function that does the actual optimization. The best results I got were using the BFGS optimization method, and parametrizing using dy rather than y:
findMinY<-function(points=100, ## number of points of evaluation
boundary=c(-1,18), ## boundary values
y0=NULL, ## optional initial value
method="Nelder-Mead", ## optimization method
dff=TRUE) ## if TRUE, optimizes based on dy rather than y
{
t<-head(seq(0,2,len=points),-1)
if(is.null(y0) || length(y0)!=points)
y0<-seq(boundary[1],boundary[2],len=points)
if(dff)
y0<-diff(y0)
else
y0<-y0[-1]
y0<-head(y0,-1)
ff<-function(z){
if(dff)
y<-c(cumsum(c(boundary[1],z)),boundary[2])
else
y<-c(boundary[1],z,boundary[2])
f(y,t)
}
res<-optim(y0,ff,control=list(maxit=1e9),method=method)
cat("Iterations:",res$counts,"\n")
ymin<-res$par
if(dff)
c(cumsum(c(boundary[1],ymin)),boundary[2])
else
c(boundary[1],ymin,boundary[2])
}
With 500 points of evaluation, it only takes a few seconds with BFGS:
> system.time(yy<-findMinY(500,method="BFGS"))
Iterations: 90 18
user system elapsed
2.696 0.000 2.703
The resulting function looks like this:
plot(seq(0,2,len=length(yy)),yy,type='l')
And now a solution that numerically integrates the Euler equation.
As @HansWerner pointed out, this problem boils down to applying the Euler-Lagrange equation to the integrand in OP's question, and then solving that differential equation, either analytically or numerically. In this case the relevant ODE is
y'' - 12*y = 3/2*exp(2*t)
subject to:
y(0) = -1
y(2) = 18
So this is a boundary value problem, best approached using bvpcol(...) in package bvpSolve.
library(bvpSolve)
F <- function(t, y.in, pars){
dy <- y.in[2]
d2y <- 12*y.in[1] + 1.5*exp(2*t)
return(list(c(dy,d2y)))
}
init <- c(-1,NA)
end <- c(18,NA)
t <- seq(0, 2, by = 0.01)
sol <- bvpcol(yini = init, yend = end, x = t, func = F)
y = function(t){ # analytic solution...
b <- sqrt(12)
a <- 1.5/(4-b*b)
u <- exp(2*b)
C1 <- ((18*u + 1) - a*(exp(4)*u-1))/(u*u - 1)
C2 <- -1 - a - C1
return(a*exp(2*t) + C1*exp(b*t) + C2*exp(-b*t))
}
par(mfrow=c(1,2))
plot(t,y(t), type="l", xlim=c(0,2),ylim=c(-1,18), col="red", main="Analytical Solution")
plot(sol[,1],sol[,2], type="l", xlim=c(0,2),ylim=c(-1,18), xlab="t", ylab="y(t)", main="Numerical Solution")
It turns out that in this very simple example, there is an analytical solution:
y(t) = a * exp(2*t) + C1 * exp(sqrt(12)*t) + C2 * exp(-sqrt(12)*t)
where a = -3/16 and C1 and C2 are determined to satisfy the boundary conditions. As the plots show, the numerical and analytic solutions agree completely, and also agree with the solution provided by @mrip.
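For completeness, $C_1$ and $C_2$ follow from the two boundary conditions
$y(0) = a + C_1 + C_2 = -1, \qquad y(2) = a e^4 + C_1 e^{2\sqrt{12}} + C_2 e^{-2\sqrt{12}} = 18,$
which is exactly the 2×2 linear system solved in the code above (with $u = e^{2\sqrt{12}}$).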
