R - object formula not found inside a function

I created a function to roll apply an exponentially weighted least-squares using the dynlm package. Here is the code:
residualization <- function(df, formula_ref, size) {
  rollapply(df,
            width = size,
            FUN = ewma_regression,
            formula_ref = formula_ref,
            by.column = FALSE, align = "right")
}
ewma_regression <- function(x, formula_ref) {
  n <- nrow(x)
  weights <- 0.06 * 0.94^(seq(n - 1, 0, by = -1))
  t <- dynlm(formula = as.formula(formula_ref), data = as.zoo(x), weights = weights)
  return(t$residuals)
}
However when I run this code on my dataset, it shows the problem:
Error in as.formula(formula_ref) : object 'formula_ref' not found
When I try to debug it, inside the environment of the function, the variable formula_ref does exist! However even inside the debug mode, I cannot run the dynlm regression even if I try to set formula_ref to a temporary formula object.
Can anyone help me out? I know it might be a silly mistake but I can't find out!
A reproducible example would be:
library(xts)    # also attaches zoo, which provides rollapply() and as.zoo()
library(dynlm)
dates <- seq.Date(from = as.Date("2010-01-01"), length.out = 1000, by = "day")
teste1 <- data.frame(x = rnorm(1000), y = rnorm(1000) * 5)
teste2 <- xts(teste1, order.by = dates)
formula.test <- y ~ x + I(x^2)
teste3 <- residualization(df = teste2, formula_ref = formula.test, size = 100)

You can just wrap y ~ x + I(x^2) in quotation marks ("y ~ x + I(x^2)").
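
As a minimal sketch of what that change looks like, the snippet below stores the formula as a character string and converts it with as.formula() just before fitting. It uses base R's lm() on simulated data in place of dynlm, purely to stay self-contained; the names sim and formula_chr are made up for illustration, and it does not test the dynlm behaviour itself.

```r
# Simulated stand-in data (illustrative only, not the asker's dataset).
set.seed(42)
sim <- data.frame(x = rnorm(100))
sim$y <- 5 * sim$x + rnorm(100)

# The formula is kept as a plain character string ...
formula_chr <- "y ~ x + I(x^2)"

# ... and turned into a formula object only at the point of use.
f <- as.formula(formula_chr)
fit <- lm(f, data = sim)   # lm() stands in for dynlm() here (assumption)

class(f)             # "formula"
length(resid(fit))   # one residual per row of sim
```

In the question's code the same idea would mean calling residualization() with formula_ref = "y ~ x + I(x^2)", since ewma_regression() already calls as.formula() on it.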

Related

Error including correlation structure in function with gamm

I am trying to create my own function that contains 1.) the mgcv gamm function and 2.) a nested autocorrelation (ARMA) argument. I am getting an error when I try to run the function like this:
df <- AirPassengers
df <- as.data.frame(df)
df$month <- rep(1:12)
df$yr <- rep(1949:1960,each=12)
df$datediff <- 1:nrow(df)
try_fxn1 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+s(datediff,bs="ts",k=20), data=dfz,correlation = corARMA(form = ~ 1|yr, p=2))}
try_fxn1(df,"x")
Error in eval(predvars, data, env) : object 'dfz' not found
I know the issue is with the correlation portion of the formula, as when I run the same function without the correlation structure included (as seen below), the function behaves as expected.
try_fxn2 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+ s(datediff,bs="ts",k=20), data=dfz)}
try_fxn2(df,"x")
Any ideas on how I can modify try_fxn1 to make the function behave as expected? Thank you!
You are confusing a vector with the symbolic representation of that vector when building a formula.
You don't want dfz[[colz]] as the response in the formula; you want x, or whatever you set colz to. What you are getting is
dfz[[colz]] ~ ...
when what you really want is the variable colz:
colz ~ ...
And you don't want a literal colz but whatever colz evaluates to. To do this you can create a formula by pasting the parts together:
fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
This turns colz into whatever it was storing, not the literal colz:
> fml
[1] "x ~ s(month, bs=\"cc\", k=12) + s(datediff,bs=\"ts\",k=20)"
Then convert the string into a formula object using formula() or as.formula().
The final solution then is:
fit_fun <- function(dfz, colz) {
  fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
  fml <- formula(fml)
  # use the function argument dfz, not the global df
  gamm(fml, data = dfz, correlation = corARMA(form = ~ 1|yr, p=2))
}
This really is not an issue with the corARMA() part, other than that it triggers somewhat different evaluation code for the formula. The guiding mantra here is to always get a formula as you would type it if not programming with formulas. You would never (or should never) write a formula like
gamm(df[[var]] ~ x + s(z), ....)
While this might work in some settings, it will fail miserably if you ever want to use predict(), and it fails when you have to do something a little more complicated.
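
The paste-then-convert pattern can be tried without mgcv. Here is a small sketch with lm() and the built-in mtcars data; fit_lm is a hypothetical helper, and the predictors wt and hp are picked only for illustration. Base R's reformulate() is shown as an alternative that skips the manual paste() when the right-hand side is a plain list of terms.

```r
# Hypothetical helper: build the formula from a column name, then fit.
fit_lm <- function(dfz, colz) {
  fml <- as.formula(paste(colz, "~ wt + hp"))
  lm(fml, data = dfz)
}

fit <- fit_lm(mtcars, "mpg")
all.vars(formula(fit))[1]   # the response is the column name, not dfz[[colz]]

# reformulate() assembles the same formula from character vectors.
fml2 <- reformulate(c("wt", "hp"), response = "mpg")
fit2 <- lm(fml2, data = mtcars)
```

Both fits use the same model, so their coefficients agree.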

Getting R^2 from Dredge() when global model has an optimizer

Goal: get R^2 marginal and conditional in dredge results using an optimizer in the original model
This branches off of this question: dredge doesnt work when specifying glmer optimizer and the two solutions provided.
Solution 1: change r.squaredLR.R package code
Solution 2: add a function into the dredge function to call r.squaredGLMM instead of r.squaredLR
I tried Solution 2 first, which works perfectly on the simulated data, but when I try it on my model I get the error :
Error in r.squaredGLMM(x, null = nullmodel)["delta", ] :
subscript out of bounds
I then tried Solution 1 by altering the source code of r.squaredLR.R as described, saving it as an R script, and using source() to load the edited 'null.fit' function so as to avoid editing r.squaredLR.R permanently (I load MuMIn before sourcing the edited function). Yet this doesn't work.
Back to Solution 2...
I tried to simulate data similar to mine and was able to get the same error (the lmercontrol argument is disregarded in this global model, but I get the desired error so I didn't try to correct the data to need lmercontrol).
#Solution 2 attempt
set.seed(101)
dd <- data.frame(x1= rnorm(1920), x2=rnorm(1920), x3=rnorm(1920), x4=rnorm(1920),
treatment = factor(rep(1:2, each=3)),
replicate = factor(rep(1:3, each=1)),
stage = factor(rep(1:5, each=384)),
country = factor(rep(1:4, each=96)),
plot = factor(rep(1:10, each=24)),
chamber = factor(rep(1:6, each=1)),
n = 1920)
library(lme4)
dd$y <- simulate(~ x1 + x2 + x3 + (1|plot),
family = binomial,
weights = dd$n,
newdata = dd,
newparams = list(beta = c(1,1,1,1),
theta = 1))[[1]]
# my real response variable 'y' has a poisson distribution, but I had difficulty figuring
# out how to simulate a poisson distribution so I left the binomial.
m0 <- lmer(y~ x1 + x2 + x3 + x4 + treatment*replicate*stage + (1|chamber) + (1|country/plot),
data=dd,
na.action = "na.fail",
REML = F,
lmercontrol = glmerControl(optimizer="bobyqa"))
nullmodel <- MuMIn:::.nullFitRE(m0)
dredge(m0, m.lim = c(0,5), rank = "AIC", extra =list(R2 = function(x) {
r.squaredGLMM(x, null = nullmodel)["delta", ]}))
A suggested reason for the error "subscript out of bounds" was that "the data being put into the algorithm are not in the format that the function expects."
Indeed, the function works when I remove ["delta", ] and I get the columns R21 and R22, but without taking into account the delta column these values are probably incorrect and I'm not sure which one is marginal and conditional R^2.
If you have any ideas, I'm all ears! Thanks in advance for all help.

Find Breakpoint with optimr

I'm trying to fit a known function to data points, finding the best parameters settings. For that purpose I'm using the optimr package.
Here's a reproducible example:
packages = c("optimr", "ggplot2", "tidyverse")
lapply(packages, library, character.only=T)
testfun <- function(par, x){
case_when(
x < par[1] ~ par[2]*x,
x >= par[1] ~ par[3] + par[4]*x^2)
}
optfun <- function(par, x, y){
sum((testfun(par, x) - y)^2)
}
optout <- optimr(par = c(20000,0,0,0), optfun, x = ggplot2::midwest$popdensity, y = ggplot2::midwest$poptotal, method = "L-BFGS-B")
ggplot(ggplot2::midwest, aes(x = popdensity, y = poptotal)) +
geom_point() +
stat_function(fun = testfun, args = list(par = optout$par))
I can fit a simple function and the setup will find the best-fitting parameters. But par[1] does not change and simply stays at its initial value when I introduce a piecewise function.
Any help is much appreciated!
Piecewise functions, especially piecewise functions with flat parts, will give optimisers trouble. Optimisers will generally give up on a parameter if the function value doesn't change for some step size, and might not search far enough to hit the step.
Look for other parameters to optimr that might widen the search space.
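
To make the flat-objective point concrete, here is a base R sketch on synthetic, noise-free data (the data, the starting values, and the helper profile_sse are all made up for illustration). It first shows that nudging the breakpoint leaves the sum of squares unchanged, then uses a different technique from tuning the optimiser: a grid scan over candidate breakpoints, with the other coefficients fitted by least squares for each candidate.

```r
# Piecewise model from the question, written with base R's ifelse().
testfun <- function(par, x) {
  ifelse(x < par[1], par[2] * x, par[3] + par[4] * x^2)
}
optfun <- function(par, x, y) sum((testfun(par, x) - y)^2)

# Synthetic data with a true breakpoint at 4.2 (illustrative only).
x <- seq(0.5, 10, by = 0.5)
y <- testfun(c(4.2, 2, 1, 0.3), x)

# Nudging the breakpoint changes nothing unless a data point crosses it,
# so the objective is locally flat in par[1].
par0 <- c(5.25, 1, 0, 0.1)
par_shift <- par0 + c(1e-3, 0, 0, 0)
flat <- optfun(par_shift, x, y) - optfun(par0, x, y)

# Grid scan: for a fixed breakpoint the model is linear in the remaining
# coefficients, so each branch can be fitted with lm().
profile_sse <- function(bp, x, y) {
  lo <- x < bp
  fit_lo <- lm(y[lo] ~ 0 + x[lo])        # y = b * x
  fit_hi <- lm(y[!lo] ~ I(x[!lo]^2))     # y = c + d * x^2
  sum(resid(fit_lo)^2) + sum(resid(fit_hi)^2)
}
candidates <- seq(1.25, 9.25, by = 0.5)
sse <- sapply(candidates, profile_sse, x = x, y = y)
best_bp <- candidates[which.min(sse)]
```

Here flat is exactly zero, and best_bp falls in the gap between the data points at 4.0 and 4.5 that contains the true breakpoint. With real, noisy data the breakpoint can only be located up to the spacing of the x values, for the same reason.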

R using rgp symbolicRegression for equation discovery

I am trying to use the package rgp for equations discovery
library(rgp)
x = c (1:100)
y = 5*x+3*sin(x)+4*x^2+75
data1 = data.frame(x,y)
newFuncSet <- functionSet("+","-","*")
result1 <- symbolicRegression(y ~ x, data = data1, functionSet = newFuncSet, stopCondition = makeStepsStopCondition(2000))
plot(data1$y, col=1, type="l"); points(predict(result1, newdata = data1), col=2, type="l")
model <- result1$population[[which.min(result1$fitnessValues)]]
However, I keep getting an error message. I would be grateful for your help in pointing out the errors I have made above.
Useful references (it would be great to have this in R):
https://www.researchgate.net/publication/237050734_Improving_Genetic_Programming_Based_Symbolic_Regression_Using_Deterministic_Machine_Learning
The problem is that R treats the x vector as integer, which causes type problems further on. Try converting x to numeric explicitly:
x <- as.numeric(1:100)
It worked for me.
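
A quick way to see the type difference in plain R, without rgp (the variable names are illustrative):

```r
x_int <- c(1:100)            # the colon operator produces an integer vector
x_num <- as.numeric(1:100)   # explicit conversion to double

class(x_int)   # "integer"
class(x_num)   # "numeric"

# The values are the same; only the storage type differs.
all(x_int == x_num)
```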

Proper method to append to a formula where both formula and stuff to be appended are arguments

I've done a fair amount of reading here on SO and learned that I should generally avoid manipulation of formula objects as strings, but I haven't quite found how to do this in a safe manner:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
# Arguments are unquoted and in the typical form for lm etc
# Do some plotting with lattice using formula & groups (works, not shown)
# Append 'groups' to 'formula':
# Change y ~ x as passed in argument 'formula' to
# y ~ x * gr where gr is the argument 'groups' with
# scoping so it will be understood by aov
new_formula <- y ~ x * gr
# Now do some anova (could do if formula were right)
model <- aov(formula = new_formula, data = data)
# And print the aov table on the plot (can do)
print(summary(model)) # this will do for testing
}
Perhaps the closest I came was to use reformulate but that only gives + on the RHS, not *. I want to use the function like this:
p <- tf(carat ~ color, groups = clarity, data = diamonds)
and have the aov results for carat ~ color * clarity. Thanks in Advance.
Solution
Here is a working version based on @Aaron's comment which demonstrates what's happening:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
print(deparse(substitute(groups)))
f <- paste(".~.*", deparse(substitute(groups)))
new_formula <- update.formula(formula, f)
print(new_formula)
model <- aov(formula = new_formula, data = data)
print(summary(model))
}
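
For anyone who wants to see just the formula step in isolation, here is a small sketch using mtcars instead of diamonds so it runs without ggplot2. append_group is a hypothetical helper name; the rest is base R.

```r
# Hypothetical helper: capture the unquoted 'groups' argument as text and
# append it to the right-hand side as an interaction.
append_group <- function(formula, groups) {
  g <- deparse(substitute(groups))
  update(formula, paste(". ~ . *", g))
}

new_formula <- append_group(mpg ~ vs, am)

# update() expands vs * am into main effects plus the interaction.
term_labels <- attr(terms(new_formula), "term.labels")

model <- aov(new_formula, data = mtcars)
summary(model)
```

The dots in ". ~ . * am" stand for the existing left-hand and right-hand sides, which is why the original response and terms carry over unchanged.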
I think update.formula can solve your problem, but I've had trouble with update within function calls. It will work as I've coded it below, but note that I'm passing the column to group, not the variable name. You then add that column to the function dataset, then update works.
I also don't know if it's doing exactly what you want in the second equation, but take a look at the help file for update.formula and mess around with it a bit.
http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.formula.html
tf <- function(formula,groups,d){
d$groups=groups
newForm = update(formula,~.*groups)
mod = lm(newForm,data=d)
}
dat = data.frame(carat=rnorm(10,0,1),color=rnorm(10,0,1),color2=rnorm(10,0,1),clarity=rnorm(10,0,1))
m = tf(carat~color,dat$clarity,d=dat)
m2 = tf(carat~color+color2,dat$clarity,d=dat)
tf2 <- function(formula, group, d) {
f <- paste(".~.*", deparse(substitute(group)))
newForm <- update.formula(formula, f)
lm(newForm, data=d)
}
mA = tf2(carat~color,clarity,d=dat)
m2A = tf2(carat~color+color2,clarity,d=dat)
EDIT:
As @Aaron pointed out, it's deparse and substitute that solve my problem: I've added tf2 as the better option to the code example so you can see how both work.
One technique I use when I have trouble with scoping and calling functions within functions is to pass the parameters as strings and then construct the call within the function from those strings. Here's what that would look like here.
tf <- function(formula, data, groups) {
f <- paste(".~.*", groups)
m <- eval(call("aov", update.formula(as.formula(formula), f), data = as.name(data)))
summary(m)
}
tf("mpg~vs", "mtcars", "am")
See this answer to one of my previous questions for another example of this: https://stackoverflow.com/a/7668846/210673.
Also see this answer to the sister question of this one, where I suggest something similar for use with xyplot: https://stackoverflow.com/a/14858661/210673
