R non-linear model fitting using the fitModel function

I want to fit a non-linear model to real data.
The data consist of two known numeric vectors: thickness as 'x' and fh as 'y'.
thickness = seq(0.15, 2.00, by = 0.05)
fh = c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141, 1.62821,
       1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202, 1.12918, 1.10355,
       1.11867, 1.09740, 1.08324, 1.05687, 1.19422, 1.22984, 1.34516, 1.19713,
       1.25398, 1.29885, 1.33658, 1.31166, 1.40332, 1.39550, 1.37855, 1.41491,
       1.59549, 1.56027, 1.63925, 1.72440, 1.74192, 1.82049)
plot(thickness,fh)
This is clearly non-linear, so I am trying to fit a model of the form
y = (2/3)*x + (2 + 2*a)/(3*x)
where a is an unknown constant; I am trying to find the constant a that minimizes the sum of squared errors between the fitted curve and the real data.
I first used a function fitModel that I found in a YouTube video, Fitting Functions to Data in R.
library(TIMP)
f = fitModel(fh ~ thickness^2/3 + (2 + 2*A)/(3*thickness))  # it should find the coefficient 'A'
coef(f)  # to report just the coefficient
However, this throws an error:
Error in modelspec[[datasetind[i]]] : subscript out of bounds
So, as an alternative, I want to plot 'a' against the sum of squared errors. This time, I am having a hard time finding 'a' and producing that plot. By manual trial I figured out that the value of 'a' is somewhere near 0.2, but this is not a precise value.
It would be helpful if someone could explain either:
why the fitModel function didn't work, or
how to find the value of a and plot the graph.

You could try this instead:
# predicted y for a given a and a vector of x values
yf = function(a, xv) xv*(2/3) + (2 + 2*a)/(3*xv)
yf(2, thickness)
# sum of squared errors for a given a
f <- function(a, y, xv) sum((y - yf(a, xv))^2)
f(2, fh, thickness)
# minimize the sum of squared errors over a in [0, 10]
xmin <- optimize(f, c(0, 10), tol = 0.0001, y = fh, xv = thickness)
xmin
plot(thickness, fh)
lines(thickness, yf(xmin$minimum, thickness), col = 3)
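This also gives you the plot of 'a' against the sum of squared errors that you asked for; a minimal sketch, reusing the f and yf defined above, is to evaluate f over a grid of candidate a values:
# sweep a over a grid and plot the resulting SSE curve
a_grid <- seq(0, 1, by = 0.001)
sse <- sapply(a_grid, f, y = fh, xv = thickness)
plot(a_grid, sse, type = "l", xlab = "a", ylab = "sum of squared errors")
abline(v = a_grid[which.min(sse)], col = 2)  # grid minimum, close to optimize()'s answer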


How can I load a library in R to call it from Excel with bert-toolkit?

Bert-toolkit is a very nice package to call R functions from Excel. See: https://bert-toolkit.com/
I have used bert-toolkit to call a fitted neural net (avNNet, fitted with caret) within a wrapper function in R from Excel VBA. This runs perfectly. This is the code that loads the model within the wrapper function in bert-toolkit:
load("D:/my_model_avNNet.rda")
neuraln <- function(x1,x2,x3){
xx <- data.frame(x1,x2,x3)
z <- predict(my_model_avNNET, xx)
z
}
Currently I am trying to do the same with a fitted GAM (fitted with the mgcv package), but I do not succeed. If I call the fitted GAM from Excel VBA, it gives error 2015. If I call it from a cell, it gives #VALUE!. At the same time, the correct outcome of the calculation is shown in the bert console!
This is the code that loads the model in the wrapper function in bert-toolkit:
library(mgcv)
load("D:/gam_y_model.rda")
testfunction <- function(k1, k2){
  z <- predict(gam_y, data.frame(x = k1, x2 = k2))
  print(z)
}
The difference between the avNNet model (caret) and the GAM model (mgcv) is that the avNNet model does NOT need the caret library to be loaded to generate a prediction, while the GAM model DOES need the mgcv library to be loaded.
Loading the mgcv library in the script that loads the GAM model into a wrapper function in bert-toolkit, as I did in the code above, seems not to be sufficient: the correct outcome of the model is shown in the bert console, but the correct outcome is not generated in Excel.
I wonder how this is possible and how it can be solved. It seems to me that maybe there are two instances of R running in bert-toolkit.
How can I load the mgcv library in such a way that it can be used by the GAM model within the function called from Excel?
This is some example code to fit the GAM with mgcv and save the model (after running this code, the model can be loaded in bert-toolkit with the code above):
library(mgcv)
# construct some sample data:
x <- seq(0, pi * 2, 0.1)
x2 <- seq(0, pi * 20, 1)
sin_x <- sin(x)
tan_x2 <- tan(x2)
y <- sin_x + rnorm(n = length(x), mean = 0, sd = sd(sin_x / 2))
Sample_data <- data.frame(y,x,x2)
# fit gam:
gam_y <- gam(y ~ s(x) + s(x2), method = "REML")
# Make predictions with the fitted model:
x_new <- seq(0, max(x), length.out = 100)
x2_new <- seq(0, max(x2), length.out = 100)
y_pred <- predict(gam_y, data.frame(x = x_new, x2 = x2_new))
# save model, to load it later in bert-toolkit:
setwd("D:/")
save(gam_y, file = "gam_y_model.rda")
One of R's signature features is method dispatch: users call the same generic, such as predict, but internally a different method is run, such as predict.lm, predict.glm, or predict.gam, depending on the class of the model object passed in. Therefore, calling predict on an avNNet model is not the same as calling predict on a gam model. And just as the method changes with the input, so does the shape of the output.
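You can see the dispatch directly; a small illustration with a built-in dataset (cars), not part of the original question:
fit <- lm(dist ~ speed, data = cars)
class(fit)    # "lm", so predict(fit) dispatches to predict.lm
class(gam_y)  # "gam" "glm" "lm", so predict(gam_y) dispatches to mgcv::predict.gam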
According to MSDN documentation regarding the Excel #VALUE! error exposed as Error 2015:
#VALUE is Excel's way of saying, "There's something wrong with the way your formula is typed. Or, there's something wrong with the cells you are referencing."
Fundamentally, without seeing actual results, Excel may not be able to interpret or translate the result R returns from the gam model into an Excel range or VBA type, especially since, as you describe, R raises no error.
For example, per docs, the return value of the standard predict.lm is:
predict.lm produces a vector of predictions or a matrix of predictions...
However, per docs, the return value of predict.gam is a bit more nuanced:
If type=="lpmatrix" then a matrix is returned which will give a vector of linear predictor values (minus any offset) at the supplied covariate values, when applied to the model coefficient vector. Otherwise, if se.fit is TRUE then a 2 item list is returned with items (both arrays) fit and se.fit containing predictions and associated standard error estimates, otherwise an array of predictions is returned. The dimensions of the returned arrays depends on whether type is "terms" or not: if it is then the array is 2 dimensional with each term in the linear predictor separate, otherwise the array is 1 dimensional and contains the linear predictor/predicted values (or corresponding s.e.s). The linear predictor returned termwise will not include the offset or the intercept.
Altogether, consider adjusting the parameters of your predict call so that it renders a numeric vector for easy Excel interpretation, and not a matrix/array or some other higher-dimensional R type that Excel cannot render. Note also the mgcv:: prefix below: a namespace-qualified call does not depend on library(mgcv) having been attached in whichever R session evaluates the function.
testfunction <- function(k1, k2){
  z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "response")
  return(z)
}
testfunction <- function(k1, k2){
  z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "lpmatrix")
  return(z)
}
testfunction <- function(k1, k2){
  z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "link", se.fit = TRUE)
  return(z$fit)  # NOTICE fit ELEMENT USED
}
...
Further diagnostics (see the sketch below):
Check the returned object of predict.gam with str(obj) and class(obj)/typeof(obj) to see its dimensions and underlying elements, and compare with predict in caret;
Check whether high-precision decimals are an issue, given Excel's limit of 15 significant digits;
Check the amount of data returned (does it exceed Excel's sheet row limit of 2^20 = 1,048,576 rows, or the cell limit of 32,767 characters?).
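A minimal inspection sketch along those lines, assuming the gam_y, x_new and x2_new objects from the example code above are in scope:
z <- predict(gam_y, data.frame(x = x_new, x2 = x2_new))
class(z)       # what predict.gam actually handed back
typeof(z)      # underlying storage, e.g. "double"
str(z)         # dimensions and attributes
as.numeric(z)  # flatten to a plain numeric vector that Excel can consume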

Setting the y value for a ROC

Apologies for a very basic question. I'm trying to do a basic ROC analysis, but I'm struggling to get R to recognise the y values: I can't seem to set the vector for y.
fullmodel = glm(culture_positive ~ No_symptoms + sex + art_status_v1 + current_cd4 +
                  bmi_v1 + nurse_tb_diagnosis_crp_v1 + temperature_v1,
                family = "binomial", data = Data1)
roc(y, fullmodel$fitted.values, plot = TRUE)
Error in roc(y, fullmodel$fitted.values, plot = TRUE) :
object 'y' not found
So 'y' is a column in my dataset Data1 labelled 'culture_positive', as per the glm, but whatever I try, I keep getting this message that 'y' is not found.
Once again, apologies for a basic question, but it is really holding me up.
Since y is not in your global environment, you need to tell roc() where to find it. You can either use the column you used to fit the model:
roc(Data1$culture_positive, fullmodel$fitted.values, plot = TRUE)
or the response stored in the glm object:
roc(fullmodel$y, fullmodel$fitted.values, plot = TRUE)
I would recommend the second option; it is somewhat safer, because you take y and fitted.values from the same object, so they are guaranteed to match up.
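A self-contained illustration with a built-in dataset, assuming roc() comes from the pROC package (the question does not show which package is loaded):
library(pROC)
fit <- glm(am ~ hp + wt, family = "binomial", data = mtcars)
roc(fit$y, fit$fitted.values, plot = TRUE)  # response and fitted values taken from the same object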

Error when fitting a beta distribution: the function mle failed to estimate the parameters with error code 100

I'm trying to use the fitdist() function from the fitdistrplus package to fit my data to different distributions. Let's say that my data look like:
x = c(1.300000, 1.220000, 1.160000, 1.300000, 1.380000, 1.240000,
1.150000, 1.180000, 1.350000, 1.290000, 1.150000, 1.240000,
1.150000, 1.120000, 1.260000, 1.120000, 1.460000, 1.310000,
1.270000, 1.260000, 1.270000, 1.180000, 1.290000, 1.120000,
1.310000, 1.120000, 1.220000, 1.160000, 1.460000, 1.410000,
1.250000, 1.200000, 1.180000, 1.830000, 1.670000, 1.130000,
1.150000, 1.170000, 1.190000, 1.380000, 1.160000, 1.120000,
1.280000, 1.180000, 1.170000, 1.410000, 1.550000, 1.170000,
1.298701, 1.123595, 1.098901, 1.123595, 1.110000, 1.420000,
1.360000, 1.290000, 1.230000, 1.270000, 1.190000, 1.180000,
1.298701, 1.136364, 1.098901, 1.123595, 1.316900, 1.281800,
1.239400, 1.216989, 1.785077, 1.250800, 1.370000)
Next, if I run fitdist(x, "gamma") everything is fine, but if I use fitdist(x, "beta") instead, I get the following error:
Error in start.arg.default(data10, distr = distname) :
values must be in [0-1] to fit a beta distribution
OK, I'm not a native English speaker, but as far as I understand it, this method requires the data to be in the range [0, 1], so I scale it using x_scaled = (x - min(x))/max(x). This gives me a vector of values in that range that is perfectly correlated with the original vector x.
Because x_scaled is of class matrix, I convert it into a numeric vector using as.numeric(), and then fit the model with fitdist(x_scale, "beta").
This time I get the following error:
Error in fitdist(x_scale, "beta") :
the function mle failed to estimate the parameters, with the error code 100
After that I've done some searching, but I haven't found anything useful. Does anybody have an idea of what's going wrong here? Thank you.
Reading the source code shows that the default estimation method of fitdist is mle, which calls mledist from the same package; that function constructs a negative log-likelihood for the distribution you have chosen and uses optim or constrOptim to minimize it numerically. If anything goes wrong in the numerical optimization process, you get the error message you got.
It seems the error occurs because x_scaled contains 0 (and a value of 1 would break things equally): the beta log-likelihood involves log(x) and log(1 - x), which are infinite at the endpoints, so the numerical optimization simply breaks. One dirty trick is to set x_scaled <- (x - min(x) + 0.001) / (max(x) - min(x) + 0.002), so that x_scaled contains neither 0 nor 1, and fitdist will work.
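Putting that together, a minimal sketch using the x vector from the question:
library(fitdistrplus)
# shift and stretch so that every value lies strictly inside (0, 1)
x_scaled <- (x - min(x) + 0.001) / (max(x) - min(x) + 0.002)
range(x_scaled)  # no exact 0 or 1 left
fit_beta <- fitdist(x_scaled, "beta")
summary(fit_beta)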

Maximum Likelihood Estimation by hand for normal distribution in R

I am a newbie in R and have searched in several forums but didn't get an answer so far. We are asked to do a maximum likelihood estimation in R for an AR(1) model without using the arima() command. We should estimate the intercept alpha, the coefficient beta (rho in the code below) and the variance sigma2. The data are assumed to follow a normal distribution, from which I derived the log-likelihood function. I then tried to program the function with the following code:
Y <- data$V2
nlogL <- function(theta, Y){
  alpha <- theta[1]
  rho <- theta[2]
  sigma2 <- theta[3]
  logl <- -(100/2)*log(2*pi) - (100/2)*log(theta[3]) - (0.5*theta[3])*sum(Y-(theta[1]/(1-theta[2]))**2)
  return(-logl)
}
par0 <- c(0.1,0.1,0.1)
opt <- optim(par0, nlogL, hessian = TRUE)
When running this code I always get the error message: Error in Y - (theta[1]/(1 - theta[2]))^2 : 'Y' is missing.
It would also be great if you could have a look at whether the likelihood function is derived correctly.
Thank you very much in advance for your help!
Your nlogL function should only take a single argument, theta. So you can fix your immediate problem simply by removing the second argument from the function; the Y variable would then be resolved by its definition outside nlogL. Alternatively, you can keep the signature of nlogL as-is and pass Y as an additional argument through optim, like this: optim(par0, nlogL, hessian = TRUE, Y = Y). I would also second chinsoon12's suggestion to review ?optim.
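A sketch of that second option, which also fixes what looks like a misplaced parenthesis in the sum-of-squares term (the square should apply to the whole residual, and the quadratic term should be divided by sigma2 rather than multiplied by it). This keeps the question's own mean specification alpha/(1 - rho), so it is only a sketch, not a full AR(1) likelihood:
nlogL <- function(theta, Y){
  alpha <- theta[1]
  rho <- theta[2]
  sigma2 <- theta[3]
  n <- length(Y)  # rather than hard-coding 100
  logl <- -(n/2)*log(2*pi) - (n/2)*log(sigma2) - sum((Y - alpha/(1 - rho))^2)/(2*sigma2)
  return(-logl)
}
par0 <- c(0.1, 0.1, 0.1)
# box constraints keep sigma2 positive and rho away from 1, so log() and 1/(1 - rho) stay defined
opt <- optim(par0, nlogL, Y = Y, hessian = TRUE,
             method = "L-BFGS-B", lower = c(-Inf, -0.99, 1e-6), upper = c(Inf, 0.99, Inf))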

Model fitting with nls.lm in R, "Error: unused argument"

I'm trying to use the nls.lm function from the minpack.lm package to fit a non-linear model to some data from a psychophysics experiment.
I've had a search around and can't find a lot of information about the package, so I have essentially copied the format of the example given on the nls.lm help page. Unfortunately my script is still failing to run, and R is throwing out this error:
Error in fn(par, ...) :
unused argument (observed = c(0.1429, 0.2857, 0.375, 0.3846, 0.4667, 0.6154))
It appears that the script thinks the data I want to fit the model to are irrelevant, which is definitely wrong.
I'm expecting it to fit the model and produce a value of 0.5403 for the free parameter (w).
Any help is greatly appreciated.
I'm making the transfer from Matlab over to R, so apologies if my code looks sloppy.
Here's the script.
install.packages("pracma")
require(pracma)
install.packages("minpack.lm")
require(minpack.lm)
# residual function: uses parameter w (e.g. .23) to predict accuracy error at a given ratio (e.g. 2:1)
residFun = function(w, n) .5 * erfc(abs(n[,1] - n[,2]) / ((sqrt(2)*w) * sqrt((n[,1]^2) + (n[,2]^2))))
# example for residFun:
# calculates an error rate of 2.59%
a = matrix(c(2, 1), 1, byrow = TRUE)
residFun(.23, a)
# initial guess for the parameter to be fitted (w)
parStart = list(w = 0.2)
# recorded accuracies in a matrix; 1 - Acc gives the errors to input into residFun,
# i.e. the y-values I want to fit the model to
Acc = 1 - matrix(c(0.8571, 0.7143, 0.6250, 0.6154, 0.5333, 0.3846), ncol = 6)
# ratios (converted to proportions) used in testing,
# i.e. the points along the x-axis to fit the above data to
Ratios = matrix(c(0.3, 0.7, 0.4, 0.6, 0.42, 0.58, 0.45, 0.55, 0.47, 0.53, 0.49, 0.51), nrow = 6, byrow = TRUE)
# non-linear model fitting: calculate the value of w using the Levenberg-Marquardt nonlinear least-squares algorithm
output = nls.lm(par = parStart, fn = residFun, observed = Acc, n = Ratios)
# error message shown after running:
# Error in fn(par, ...) :
#   unused argument (observed = c(0.1429, 0.2857, 0.375, 0.3846, 0.4667, 0.6154))
The error means you passed a function an argument that it did not expect. ?nls.lm shows no argument named observed, so it is passed on through ... to the function supplied as fn, in your case residFun. However, residFun doesn't expect this argument either, hence the error. You need to redefine the function like this:
# residual function: uses parameter w (e.g. .23) to predict accuracy error at a given ratio (e.g. 2:1)
residFun = function(par, observed, n) {
  w <- par$w
  r <- observed - (.5 * erfc(abs(n[,1] - n[,2]) / ((sqrt(2)*w) * sqrt((n[,1]^2) + (n[,2]^2)))))
  return(r)
}
It gives the following result:
> output = nls.lm(par=parStart,fn=residFun,observed=Acc,n=Ratios)
> output
Nonlinear regression via the Levenberg-Marquardt algorithm
parameter estimates: 0.540285874836135
residual sum-of-squares: 0.02166
reason terminated: Relative error in the sum of squares is at most `ftol'.
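To pull out just the fitted parameter from the returned object, a couple of options (coef has an nls.lm method in minpack.lm, and output$par holds the raw parameter list):
coef(output)  # named estimate of w, about 0.5403
output$par    # the same estimate, as the raw parameter list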
Why that happened:
It seems that you were inspired by this example in the documentation:
## residual function
residFun <- function(p, observed, xx) observed - getPred(p,xx)
## starting values for parameters
parStart <- list(a=3,b=-.001, c=1)
## perform fit
nls.out <- nls.lm(par = parStart, fn = residFun, observed = simDNoisy,
                  xx = x, control = nls.lm.control(nprint = 1))
Note that observed is an argument of residFun here.
