I'm trying to fit a known function to data points by finding the best parameter settings. For that purpose I'm using the optimr package.
Here's a reproducible example:
packages = c("optimr", "ggplot2", "tidyverse")
lapply(packages, library, character.only=T)
testfun <- function(par, x){
case_when(
x < par[1] ~ par[2]*x,
x >= par[1] ~ par[3] + par[4]*x^2)
}
optfun <- function(par, x, y){
sum((testfun(par, x) - y)^2)
}
optout <- optimr(par = c(20000,0,0,0), optfun, x = ggplot2::midwest$popdensity, y = ggplot2::midwest$poptotal, method = "L-BFGS-B")
ggplot(ggplot2::midwest, aes(x = popdensity, y = poptotal)) +
geom_point() +
stat_function(fun = testfun, args = list(par = optout$par))
I can fit a simple function and the setup will find the best-fitting parameters. But when I introduce a piecewise function, par[1] does not change and simply stays at its initial value.
Any help is much appreciated!
Piecewise functions, especially piecewise functions with flat parts, will give optimisers trouble. Optimisers will generally give up on a parameter if the function value doesn't change for some step size, and might not search far enough to hit the step.
Look for other parameters to optimr that might widen the search space.
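To illustrate the flat-objective problem, here is a minimal sketch in base R. It is not the optimr setup from the question: the data are made up, ifelse() stands in for case_when(), and the grid search over the breakpoint (with the other three parameters fitted by least squares for each candidate) is a swapped-in technique, offered only as one possible workaround. The names sse, profile_sse and best_break are hypothetical.

```r
# Piecewise model from the question; ifelse() replaces case_when() so only base R is needed.
testfun <- function(par, x) {
  ifelse(x < par[1], par[2] * x, par[3] + par[4] * x^2)
}
sse <- function(par, x, y) sum((testfun(par, x) - y)^2)

# Made-up data (an assumption) with a true breakpoint between x = 40 and x = 41.
set.seed(1)
x <- 1:100
y <- testfun(c(40.5, 2, 10, 0.05), x) + rnorm(length(x), sd = 1)

# Moving the breakpoint by a small step carries no data point across it,
# so the objective does not change and a numerical gradient in par[1] is zero.
p0 <- c(20.5, 1, 0, 0)
flat <- sse(p0, x, y) == sse(p0 + c(1e-3, 0, 0, 0), x, y)

# For a fixed breakpoint b the model is linear in the other three parameters,
# so fit them by least squares and keep the residual sum of squares.
profile_sse <- function(b) {
  left <- x < b
  right <- !left
  X <- cbind(x * left, as.numeric(right), x^2 * right)
  sum(lm.fit(X, y)$residuals^2)
}

# Scan candidate breakpoints and keep the one with the smallest residual sum of squares.
breaks <- seq(10.5, 90.5, by = 1)
best_break <- breaks[which.min(sapply(breaks, profile_sse))]
```

Here flat comes out TRUE, which is exactly the situation in which a gradient-based method such as L-BFGS-B leaves par[1] at its starting value. The grid scan avoids the problem because it never asks for a derivative with respect to the breakpoint.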
I created a function to roll apply an exponentially weighted least-squares using the dynlm package. Here is the code:
residualization<-function(df,formula_ref, size){
rollapply(df,
width=size,
FUN = ewma_regression,
formula_ref = formula_ref,
by.column=FALSE, align="right")
}
ewma_regression<-function(x,formula_ref) {
n<-nrow(x)
weights <- 0.06*0.94^(seq(n-1,0,by=-1))
t <- dynlm(formula=as.formula(formula_ref), data = as.zoo(x),weights = weights)
return(t$residuals)
}
However when I run this code on my dataset, it shows the problem:
Error in as.formula(formula_ref) : object 'formula_ref' not found
When I try to debug it, inside the environment of the function, the variable formula_ref does exist! However even inside the debug mode, I cannot run the dynlm regression even if I try to set formula_ref to a temporary formula object.
Can anyone help me out? I know it might be a silly mistake but I can't find out!
A reproducible example would be:
dates<-seq.Date(from=as.Date("2010-01-01"), length.out = 1000, by="day")
teste1<-data.frame(x=rnorm(1000),y=rnorm(1000)*5)
teste2<-xts(teste1,order.by = dates)
formula.test<- y ~ x + I(x^2)
teste3<-residualization(df=teste2,formula_ref = formula.test, size=100)
You can just wrap y ~ x + I(x^2) in quotation marks ("y ~ x + I(x^2)").
I am trying to use the package rgp for equations discovery
library(rgp)
x = c (1:100)
y = 5*x+3*sin(x)+4*x^2+75
data1 = data.frame(x,y)
newFuncSet <- functionSet("+","-","*")
result1 <- symbolicRegression(y ~ x, data = data1, functionSet = newFuncSet, stopCondition = makeStepsStopCondition(2000))
plot(data1$y, col=1, type="l"); points(predict(result1, newdata = data1), col=2, type="l")
model <- result1$population[[which.min(result1$fitnessValues)]]
However, I keep getting an error message. I would be grateful for your help in pointing out the errors I have made above.
Useful references (it would be great to have this in R):
https://www.researchgate.net/publication/237050734_Improving_Genetic_Programming_Based_Symbolic_Regression_Using_Deterministic_Machine_Learning
The problem is that R stores the x vector as integers, which causes type problems further down the line. Try converting x to numeric explicitly:
x <- as.numeric(1:100)
It worked for me.
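A quick base R check of the type difference, as a sketch. It does not call rgp at all, and the names x_int and x_num are only for illustration.

```r
# c(1:100) is an integer vector; as.numeric() turns it into doubles.
x_int <- c(1:100)
x_num <- as.numeric(1:100)

type_before <- typeof(x_int)  # "integer"
type_after <- typeof(x_num)   # "double"

# Rebuild the data frame from the question with the numeric x.
y <- 5 * x_num + 3 * sin(x_num) + 4 * x_num^2 + 75
data1 <- data.frame(x = x_num, y = y)
```

The values are the same either way; only the storage type of the x column changes.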
The R code shown below is a minimal working example to reproduce an error that I can't say I understand. Running the script should yield the error, Error in eval(expr, envir, enclos) : could not find function "fitModel". After reading a thing or two on environments I think I understand why this is happening in this case, "fitModel" is not defined in the execution environment of "obscureFunction". This I fixed by making the following change to "myFormula":
myFormula <- "y ~ eval(fitModel(x, a), envir = environment(fitModel))"
I don't understand how "fitModel" can be evaluated in the environment of "fitModel" when the function can't be found in the calling environment of "obscureFunction", in other words I don't understand why this code change works. I also don't understand why the original code works fine if the body of "topFunction" is run without calling it, i.e. if we define "fitModel" and "obscureFunction" in R_GlobalEnv and call "obscureFunction" from the console.
## Minimum Working Example to reproduce error
rm(list = ls())
library(minpack.lm)
topFunction <- function(){
fitModel <- function(x, a){
exp(-a * x)
}
## Create a function to use with lapply()
obscureFunction <- function(){
x <- seq(-1, 1, 0.01)
y <- exp(-0.5 * x)
Data <- data.frame(x, y)
init <- c(a = 1)
myFormula <- "y ~ fitModel(x, a)"
myFormula <- as.formula(myFormula)
nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data)
return(nlsOutput)
}
## Function call
obscureFunction()
## Other calculations done with fitModel()
}
topFunction()
Well, there are two issues here. The first is using a string for a formula. It's better to use
myFormula <- y ~ fitModel(x, a)
The reason is that formulas capture their environment, strings do not. (As noted by @BridieG, the as.formula() will capture the environment; I skipped over that line when reading the code. I still think it's better to create the formula directly.) Having a reference environment makes it easier to find functions used in a formula. So if you were using lm() rather than nlsLM, this would work with these two changes
# myFormula <- "y ~ fitModel(x, a)" ... becomes
myFormula <- y ~ fitModel(x, 1)
#nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data) ...becomes
nlsOutput <- lm(formula = myFormula, data = Data)
This works with the formula syntax (unquoted var names) and not the string because the formula can capture the environment.
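A small base R sketch of that environment capture, using lm() only. The helper half() and the wrapper make_fit() are made up for the illustration and have nothing to do with minpack.lm.

```r
make_fit <- function() {
  # Helper that exists only inside this function's frame.
  half <- function(v) v / 2
  d <- data.frame(x = 1:10, y = (1:10) / 2 + 1)

  # A formula created here remembers this frame as its environment.
  f <- y ~ half(x)

  list(
    env_has_half = exists("half", envir = environment(f), inherits = FALSE),
    coef = coef(lm(f, data = d))  # lm() finds half() through environment(f)
  )
}

res <- make_fit()
```

lm() looks up x and y in d first and falls back to the formula's environment for anything else, which is how it reaches half().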
At least that's how it should work. Package authors are free to evaluate formulas however they want, and the authors of the nlsLM() function decided to ignore the environment assigned to the formula. They do so in this function inside nlsLM()
FCT <- function(par) {
mf[m] <- par
rhs <- eval(formula[[3L]], envir = mf)
res <- lhs - rhs
res <- .swts * res
res
}
So this is the second problem. Here they enforce the evaluation in the mf object which is a data.frame made up of the covariates of the data and the parameter estimates. Had it been written as
rhs <- eval(formula[[3L]], envir = mf, environment(formula))
it would have worked. This is basically what model.frame() does in lm() that allows this to work. We can make our own "corrected" version of the function with
# tested with minpack.lm_1.1-8
nlsLM2<-nlsLM
body(nlsLM2)[[27]][[3]][[3]][[3]]<-quote(rhs<-eval(formula[[3L]], envir = mf, environment(formula)))
And then make these substitutions
# myFormula <- "y ~ fitModel(x, a)" ... becomes
myFormula <- y ~ fitModel(x, a)
#nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data) ...becomes
nlsOutput <- nlsLM2(formula = myFormula, start = init, data = Data)
it will work and return
Nonlinear regression model
model: y ~ fitModel(x, a)
data: Data
a
0.5
residual sum-of-squares: 0
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.49e-08
So there's not really much you can say about how all R functions handle environments and scope. This behavior is unique to how the nlsLM() authors decided to evaluate their parameters.
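The effect of the third argument to eval() can be reproduced in base R without minpack.lm. This is only a sketch: build_pieces() is a made-up stand-in, and the mf list merely mimics the mix of data and parameters described above.

```r
build_pieces <- function() {
  # fitModel lives only in this function's frame, as in the question.
  fitModel <- function(x, a) exp(-a * x)
  list(
    formula = y ~ fitModel(x, a),
    mf = list(x = c(0, 1, 2), a = 0.5)  # data plus parameter values
  )
}
p <- build_pieces()

# Without enclos, names missing from mf are looked up from the calling frame,
# where fitModel does not exist, so the evaluation fails.
msg <- tryCatch(
  eval(p$formula[[3L]], envir = p$mf),
  error = function(e) conditionMessage(e)
)

# With enclos set to the formula's environment, fitModel is found.
rhs <- eval(p$formula[[3L]], envir = p$mf, enclos = environment(p$formula))
```

The first call mirrors what FCT() does, the second mirrors the patched line in nlsLM2.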
I would like to write a function that calls different sub-functions with parameters specified by string, such as:
genericModel <- function(model, dat, y, x, ...) {
fit <- get(model)(get(y) ~ get(x), data = dat, ...)
return(fit)
}
I am able to get it to work with simple cases:
> d <- data.frame(x.var = rnorm(10), y.var = rnorm(10), w = rep(1, 10))
> genericModel('lm', d, 'y.var', 'x.var')
Call:
get(model)(formula = get(y) ~ get(x), data = dat)
Coefficients:
(Intercept) get(x)
-0.04242 -0.31619
However, I have not been successful in terms of passing other optional arguments by string:
> genericModel('lm', d, 'y.var', 'x.var', weights = 'w')
Error in model.frame.default(formula = get(y) ~ get(x), data = dat, weights = "w", :
variable lengths differ (found for '(weights)')
I know I can do genericModel('lm', d, 'y.var', 'x.var', weights = d$w), but that defeats the purpose of creating a flexible function where I can specify the model and column names by string.
I can also foresee complications where the optional parameters include both column names of the data.frame (e.g. weights = w) and generic options for the sub-function (e.g. na.action = na.pass).
EDIT:
Just to clarify, what I am hoping to achieve is:
genericModel('lm', d, 'y.var', 'x.var', weights = 'w')
genericModel('glm', d, 'y.var', 'x.var', family = 'binomial')
To run linear regression and logistic regression, respectively. I need some way to pass the optional arguments when calling genericModel.
Does anyone know how to deal with this? Thanks.
One suggestion: rather than fiddling with strings to specify analysis variables, what you should do is pass the formula. This is also much more flexible, since you'll be able to pass complicated model formulas directly to the underlying functions without any parsing.
If you do this, then obtaining what you want is simple with some language hacking. Get the call to the function, then manipulate it to call the model-fitting function instead.
genericModel <- function(mod, formula, data, ...)
{
cl <- match.call(expand.dots = TRUE)
cl[[1]] <- cl$mod
cl$mod <- NULL
eval(cl, parent.frame())
}
genericModel(lm, mpg ~ hp, data=mtcars, weights=gear)
genericModel(glm, Volume ~ Girth + Height, data=trees, family=Gamma(link=log))
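As a sanity check, the rewritten call should behave like calling the fitting function directly. A sketch using the built-in mtcars data; fit_generic and fit_direct are just illustrative names.

```r
genericModel <- function(mod, formula, data, ...) {
  cl <- match.call(expand.dots = TRUE)  # the call exactly as the user wrote it
  cl[[1]] <- cl$mod                     # swap genericModel for the model function
  cl$mod <- NULL                        # drop the now redundant mod argument
  eval(cl, parent.frame())              # evaluate where the user called us
}

fit_generic <- genericModel(lm, mpg ~ hp, data = mtcars, weights = gear)
fit_direct <- lm(mpg ~ hp, data = mtcars, weights = gear)

# Both fits should give the same coefficients.
same_coef <- isTRUE(all.equal(coef(fit_generic), coef(fit_direct)))
```

Because the call is evaluated unevaluated-argument-style in the caller's frame, weights = gear is resolved against mtcars by lm() itself, just as in the direct call.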
I've done a fair amount of reading here on SO and learned that I should generally avoid manipulation of formula objects as strings, but I haven't quite found how to do this in a safe manner:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
# Arguments are unquoted and in the typical form for lm etc
# Do some plotting with lattice using formula & groups (works, not shown)
# Append 'groups' to 'formula':
# Change y ~ x as passed in argument 'formula' to
# y ~ x * gr where gr is the argument 'groups' with
# scoping so it will be understood by aov
new_formula <- y ~ x * gr
# Now do some anova (could do if formula were right)
model <- aov(formula = new_formula, data = data)
# And print the aov table on the plot (can do)
print(summary(model)) # this will do for testing
}
Perhaps the closest I came was to use reformulate but that only gives + on the RHS, not *. I want to use the function like this:
p <- tf(carat ~ color, groups = clarity, data = diamonds)
and have the aov results for carat ~ color * clarity. Thanks in Advance.
Solution
Here is a working version based on @Aaron's comment which demonstrates what's happening:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
print(deparse(substitute(groups)))
f <- paste(".~.*", deparse(substitute(groups)))
new_formula <- update.formula(formula, f)
print(new_formula)
model <- aov(formula = new_formula, data = data)
print(summary(model))
}
I think update.formula can solve your problem, but I've had trouble with update within function calls. It will work as I've coded it below, but note that I'm passing the column to group, not the variable name. You then add that column to the function dataset, then update works.
I also don't know if it's doing exactly what you want in the second equation, but take a look at the help file for update.formula and mess around with it a bit.
http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.formula.html
tf <- function(formula,groups,d){
d$groups=groups
newForm = update(formula,~.*groups)
mod = lm(newForm,data=d)
}
dat = data.frame(carat=rnorm(10,0,1),color=rnorm(10,0,1),color2=rnorm(10,0,1),clarity=rnorm(10,0,1))
m = tf(carat~color,dat$clarity,d=dat)
m2 = tf(carat~color+color2,dat$clarity,d=dat)
tf2 <- function(formula, group, d) {
f <- paste(".~.*", deparse(substitute(group)))
newForm <- update.formula(formula, f)
lm(newForm, data=d)
}
mA = tf2(carat~color,clarity,d=dat)
m2A = tf2(carat~color+color2,clarity,d=dat)
EDIT:
As @Aaron pointed out, it's deparse and substitute that solve my problem: I've added tf2 as the better option to the code example so you can see how both work.
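To see the deparse(substitute()) step in isolation, here is a stripped-down sketch on mtcars. The function add_interaction() is a made-up helper, not part of any package.

```r
add_interaction <- function(formula, groups) {
  # substitute() grabs the unevaluated argument, deparse() turns it into text.
  g <- deparse(substitute(groups))
  # ". ~ . * g" keeps both sides of the old formula and crosses the RHS with g.
  update.formula(formula, paste(". ~ . *", g))
}

new_f <- add_interaction(mpg ~ vs, am)

# The right-hand side now expands to vs, am and their interaction vs:am.
labels <- attr(terms(new_f), "term.labels")

model <- aov(new_f, data = mtcars)
```

Note that am is never evaluated inside add_interaction(); only its name is used, so it does not need to exist outside the data frame.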
One technique I use when I have trouble with scoping and calling functions within functions is to pass the parameters as strings and then construct the call within the function from those strings. Here's what that would look like here.
tf <- function(formula, data, groups) {
f <- paste(".~.*", groups)
m <- eval(call("aov", update.formula(as.formula(formula), f), data = as.name(data)))
summary(m)
}
tf("mpg~vs", "mtcars", "am")
See this answer to one of my previous questions for another example of this: https://stackoverflow.com/a/7668846/210673.
Also see this answer to the sister question of this one, where I suggest something similar for use with xyplot: https://stackoverflow.com/a/14858661/210673
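To make the string-based approach a little more concrete, the call can be built and inspected before it is evaluated. A sketch on mtcars, where cl and m are just illustrative names:

```r
# Build the call from strings without evaluating it yet.
f <- update.formula(as.formula("mpg ~ vs"), ". ~ . * am")
cl <- call("aov", f, data = as.name("mtcars"))

# as.name() stores the symbol mtcars in the call, not a copy of the data frame.
data_is_symbol <- identical(cl$data, as.name("mtcars"))

# Evaluating the call runs aov() as if it had been typed by hand.
m <- eval(cl)
```

Keeping data as a symbol means the call recorded in the fitted object stays short and readable instead of containing the whole data frame.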