I am writing a function that allows many starting parameter combinations to be tried when fitting a nonlinear regression, since nlsList() only allows one set of starting parameters.
I have managed this fine, but I want to add a predictions data frame to the function for easy plotting; it should return the best-fit curve at smaller increments of x than the data supply, for example 100 points instead of 10, to achieve a nice smooth predicted curve.
In my function arguments I specify the formula as an argument and treat it as a formula within the function. Some of the formulas include a function that I wrote to encompass the non-linear relationship. I then use do.call() with that function and the new starting parameters to pass the fitted parameters on to a predictions data frame.
I have not found a way of isolating the defined, fixed variables in the formula and passing them on to the do.call() call.
Is there a way to get the values that are defined in the formula? So Tc = 25 in this example...
model = y ~ schoolfield.high(ln.c, Ea, Eh, Th, temp = x, Tc = 25)
formula <- as.formula(model)
vars <- all.vars(formula[[3]])
This returns:
"ln.c" "Ea" "Eh" "Th" "x"
I am wondering if there is a way to isolate defined variables from a formula object, or if there is any other way I could do this?
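One way to do this is to treat the right-hand side of the formula as an unevaluated call and keep only the arguments that are literal constants. A minimal sketch (the Filter() step simply keeps anything that is already numeric):
model <- y ~ schoolfield.high(ln.c, Ea, Eh, Th, temp = x, Tc = 25)
rhs <- model[[3]]                  # the unevaluated call schoolfield.high(...)
args <- as.list(rhs)[-1]           # drop the function name, keep its arguments
fixed <- Filter(is.numeric, args)  # symbols such as ln.c are not numeric; the literal 25 is
fixed
# $Tc
# [1] 25
The named constants in fixed can then be concatenated with the fitted starting parameters in the argument list handed to do.call().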
I had to transform a response variable (e.g. Variable1) to fulfil the assumptions of linear models in lmer, using an approach for heavy-tailed data suggested here https://www.r-bloggers.com/2020/01/a-guide-to-data-transformation/ and demonstrated below:
TransformVariable1 <- sqrt(abs(Variable1 - median(Variable1)))
I then fit the data to the following example model:
fit <- lmer(TransformVariable1 ~ x + y + (1|z), data = dataframe)
Next, I update the reference grid to account for the transformation, as suggested here: "Specifying that model is logit transformed to plot backtransformed trends":
rg <- update(ref_grid(fit), tran = "TransformVariable1")
Nevertheless, the emmeans are not back-transformed to the original scale after using the following command:
fitemm <- as.data.frame(emmeans(rg, ~ x + y, type = "response"))
My question is: How can I back transform the emmeans to the original scale?
Thank you in advance.
There are two major problems here.
The lesser of them is in specifying tran. You need to either specify one of a handful of known transformations, such as "log", or a list with the needed functions to undo the transformation and implement the delta method. See the help for make.link, make.tran, and vignette("transformations", "emmeans").
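For example, if the response column had been created beforehand as logVariable1 <- log(Variable1) (a hypothetical name), declaring the known "log" transformation following the question's own pattern would be enough:
logfit <- lmer(logVariable1 ~ x + y + (1|z), data = dataframe)
rg <- update(ref_grid(logfit), tran = "log")
summary(emmeans(rg, ~ x + y), type = "response")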
The much more serious issue is that the transformation used here is not a monotone function, so it is impossible to back-transform the results. Each transformed response value corresponds to two possible values on either side of the median of the original variable. The model we have here does not estimate effects on the given variable, but rather effects on the dispersion of that variable. It's like trying to use the speedometer as a substitute for a navigation system.
I would suggest using a different model, or at least a different response variable.
A possible remedy
Looking again at this, I wonder if what was meant was the symmetric square-root transformation -- what is shown multiplied by sign(Variable1 - median(Variable1)). This transformation is available in emmeans::make.tran(). You will need to re-fit the model.
What I suggest is creating the transformation object first, then using it throughout:
require(lme4)
require(emmeans)

symsqrt <- make.tran("sympower", param = c(0.5, median(Variable1)))
fit <- with(symsqrt,
    lmer(linkfun(Variable1) ~ x + y + (1|z), data = dataframe)
)
emmeans(fit, ~ x + y, type = "response")
symsqrt comprises a list of functions needed to implement the transformation. The transformation itself is symsqrt$linkfun, and the emmeans package knows to look for the other stuff when the response transformation is named linkfun.
BTW, please break the habit of wrapping emmeans() in as.data.frame(). That renders invisible some important annotations, and also disables the possibility of following up with contrasts and comparisons. If you think you want to see more precision than is shown, you can precede the call with emm_options(opt.digits = FALSE); but really, you are kidding yourself if you think those extra digits give you useful information.
me again!
In one of my assignments I have to create a plot with a regression line in it, and simply read data off this plot.
Question: "at 80 degrees F what is the wind-speed?"
By simply looking at the plot you can state it's ~9 m/s at 80 °F. This would suffice, but knowing what you can do in R, I would like to know how to do it properly, both for now and for future reference.
How does one, using only the data frame (in the picture), extract a Y value for a given X value using linear regression?
Linear regression, because the value itself isn't given in the data, but it can be extracted if you assume a linear relationship.
So in essence, instead of reading the value off the plot (pic 2), I would like a function that, given an X (temp) value in the data frame, prints out a Y (wind) value using linear regression.
I tried other things I found on Stack Overflow, using
lm(data ~ data, dataframe), but that didn't give me the result I desired.
You are looking for the predict function.
First fit a linear regression, then calculate the predicted value with predict. Just keep in mind that you need to supply your X value in a data.frame.
# airquality ships with R in the datasets package
lm_air <- lm(Wind ~ Temp, data = airquality)
# newdata must be a data frame whose column name matches the predictor
predict(lm_air, data.frame(Temp = 80))
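If you want predictions at several temperatures at once, newdata can simply contain more rows (the extra values here are hypothetical):
predict(lm_air, data.frame(Temp = c(70, 80, 90)))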
I am trying to convert Absorbance (Abs) values to Concentration (ng/mL), based on an established linear model & standard curve. I planned to do this by using the predict() function. I am having trouble getting predict() to return the desired results. Here is a sample of my code:
Standards <- data.frame(ng_mL = c(0, 0.4, 1, 4),
                        Abs550nm = c(1.7535, 1.5896, 1.4285, 0.9362))
LM.2 <- lm(log(Standards[['Abs550nm']]) ~ Standards[['ng_mL']])
Abs <- c(1.7812, 1.7309, 1.3537, 1.6757, 1.7409, 1.7875, 1.7533, 1.8169, 1.753, 1.6721, 1.7036, 1.6707,
         0.3903, 0.3362, 0.2886, 0.281, 0.3596, 0.4122, 0.218, 0.2331, 1.3292, 1.2734)
predict(object = LM.2,
        newdata = data.frame(Concentration = Abs[1]))  # using Abs[1] as an example, but I eventually want predictions for all values in Abs
Running those last lines gives this output:
> predict(object=LM.2,
+ newdata=data.frame(Concentration=Abs[1]))
1 2 3 4
0.5338437 0.4731341 0.3820697 -0.0732525
Warning message:
'newdata' had 1 row but variables found have 4 rows
This does not seem to be the output I want. I am trying to get a single predicted Concentration value for each Absorbance (Abs) entry. It would be nice to be able to predict all of the entries at once and add them to an existing data frame, but I can't even get it to give me a single value correctly. I've read many threads on here, webpages found on Google, and all of the help files, and for the life of me I cannot understand what is going on with this function. Any help would be appreciated, thanks.
You must have a variable in newdata that has the same name as that used in the model formula used to fit the model initially.
You have two errors:
You don't use a variable in newdata with the same name as the covariate used to fit the model, and
You make the problem much more difficult to resolve because you abuse the formula interface.
Don't fit your model like this:
mod <- lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])
Fit it like this instead:
mod <- lm(log(Abs550nm) ~ ng_mL, data = Standards)
Isn't that so much more readable?
To predict you would need a data frame with a variable ng_mL:
predict(mod, newdata = data.frame(ng_mL = c(0.5, 1.2)))
Now you may have a third error. You appear to be trying to predict with new values of Absorbance, but the way you fitted the model, Absorbance is the response variable. You would need to supply new values for ng_mL.
The behaviour you are seeing is what happens when R can't find a correctly-named variable in newdata; it returns the fitted values from the model or the predictions at the observed data.
This makes me think you have the formula back to front. Did you mean:
mod2 <- lm(ng_mL ~ log(Abs550nm), data = Standards)
?? In which case, you'd need
predict(mod2, newdata = data.frame(Abs550nm = c(1.7812,1.7309)))
say. Note you don't need to include the log() bit in the name. R recognises that as a function and applies it to the variable Abs550nm for you.
If the model really is log(Abs550nm) ~ ng_mL and you want to find values of ng_mL for new values of Abs550nm you'll need to invert the fitted model in some way.
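For a linear calibration like this one, the inversion can be done directly from the coefficients. A minimal sketch, assuming the Standards and Abs objects from the question:
mod <- lm(log(Abs550nm) ~ ng_mL, data = Standards)
cf <- coef(mod)  # intercept and slope on the log scale
# log(Abs) = cf[1] + cf[2] * ng_mL  =>  ng_mL = (log(Abs) - cf[1]) / cf[2]
ng_mL_pred <- (log(Abs) - cf[1]) / cf[2]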
I have just finished fitting a GAM using the mgcv package (I will call this model gam1.5). I have been playing around with the vis.gam function and I have a question I have not been able to solve.
I would like to normalize the fitted values of my model so when I use vis.gam, the z-axis has limits [0, 1].
My idea was to apply the normalization formula to the $fitted.values of my GAM model as follows:
gam1.5$fitted.values <- (gam1.5$fitted.values - min(gam1.5$fitted.values)) /
  (max(gam1.5$fitted.values) - min(gam1.5$fitted.values))
However, when I run vis.gam, it does not change the scale of the z-axis. I was wondering if I am applying the normalization formula to the incorrect object (a different one from $fitted.values) within the GAM object.
Yes. Because vis.gam is based on predict.gam, your change to $fitted.values has no effect!
In fact, you can't achieve your goal with vis.gam at all. This function simply produces a plot and returns nothing for the user to later reproduce the plot (unless vis.gam is called again). This means we need to work with predict.gam. Here are the basic steps.
Set up a 2D grid / mesh. You may want to use exclude.too.far to flag grid points far away from the training data, to avoid ridiculous spline / polynomial extrapolation (as vis.gam does);
Construct a new data frame newdat (from the above grid) and call oo <- predict.gam(gam1.5, newdat, type = "terms") to obtain term-wise prediction. This is a matrix. You need to retain only the column associated with the 2D smooth you want to plot. Let's say this column is stored into a vector z;
Augment z into a matrix, padding with NA for the too-far grid points.
Normalize z onto [0, 1].
Use image or contour to produce the plot yourself.
Ideally we would take an example (maybe from ?vis.gam) and work through the above steps in full. However, you got back to me saying that you quickly sorted out the problem using predict.gam, so instead of a full demonstration I will just leave a brief sketch of the steps below.
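A minimal sketch of those steps, assuming gam1.5 contains a 2D smooth s(x1, x2) fitted to a data frame dat (the variable names and grid size are placeholders):
library(mgcv)
n <- 50
x1 <- seq(min(dat$x1), max(dat$x1), length.out = n)
x2 <- seq(min(dat$x2), max(dat$x2), length.out = n)
newdat <- expand.grid(x1 = x1, x2 = x2)
# term-wise prediction; keep only the column for the 2D smooth
oo <- predict(gam1.5, newdat, type = "terms")
z <- oo[, "s(x1,x2)"]
# flag grid points too far from the training data, as vis.gam does
too.far <- exclude.too.far(newdat$x1, newdat$x2, dat$x1, dat$x2, dist = 0.1)
z[too.far] <- NA
# normalize onto [0, 1], then reshape into a matrix for plotting
z <- (z - min(z, na.rm = TRUE)) / (max(z, na.rm = TRUE) - min(z, na.rm = TRUE))
image(x1, x2, matrix(z, n, n))
contour(x1, x2, matrix(z, n, n), add = TRUE)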
I have created a loop to fit a non-linear model to six data points per participant. The first model is a one-parameter model. Here is the code for that model, which works great. The time variable is defined, the Participant variable is the id variable, and the data are in long form (one row for each data point of each participant).
Here is the loop code with 1 parameter that works:
one_p_model <- dlply(discounting_long, .(Participant), function(d) {
  wrapnls(indiff ~ 1/(1 + k * time), data = d, start = c(k = 0))
})
However, when I try to fit a two-parameter model, I get the error "Error: singular gradient matrix at initial parameter estimates" while still using the wrapnls function. I realize that the model is likely over-parameterized; that is why I am trying to use wrapnls instead of just nls (or nlsList). Some in my field insist on seeing both model fits, and I thought that wrapnls avoids the problem of 0 or near-0 residuals. Here is my code that does not work. The start values and limits are standard in the field for this model.
two_p_model <- dlply(discounting_long, .(Participant), function(d) {
  nlxb(indiff ~ 1/(1 + k * time^s), data = d,
       lower = c(s = 0), start = c(k = 0, s = 0.99), upper = c(s = 1))
})
I realize that I could use nlxb (which does give me the correct parameter values for each participant), but that function does not give predicted values or residuals for each data point (at least I don't think it does), which I would need to compute AIC values.
I am also open to other solutions for running a loop through the data by participants.
You mention at the end that nlxb "doesn't give you residuals", but it does. If the result of your call to nlxb is called fit, then the residuals are in fit$resid, so you can get the fitted values just by adding them to the original data. Honestly, I don't know why nlxb hasn't been made to work with the predict() function, but at least there is a way to get the predicted values.
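For instance, a hedged sketch of recovering fitted values and a Gaussian AIC from a single participant's data frame d (the finite bounds are assumptions, and the sign convention of fit$resid should be checked against your own data):
library(nlsr)  # nlxb also lives in the older nlmrt package
fit <- nlxb(indiff ~ 1/(1 + k * time^s), data = d,
            start = c(k = 0, s = 0.99),
            lower = c(k = 0, s = 0), upper = c(k = 100, s = 1))
fitted_vals <- d$indiff + fit$resid  # resid may be model - data; flip the sign if needed
rss <- sum(fit$resid^2)
n <- length(fit$resid)
p <- 2  # fitted parameters: k and s
aic <- n * log(rss / n) + 2 * (p + 1)  # Gaussian AIC up to an additive constant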