stop when call is not from "aov()" or "lm()" in R? - r

I have an R function programmed to stop when input is not a call from "aov()" or "lm()".
Below, I expect when using fit3 as input, my function to stop, but I'm wondering why it does not?
P.S. The function correctly stops when fed fit4, BUT doesn't stop when fed fit3; WHY?
fit2 <- aov(mpg ~ wt, data = mtcars)
library(rstanarm)
fit3 <- stan_glm(mpg ~ wt, data = mtcars) # This call is from "rstanarm" package !!
fit4 <- glm(vs~mpg, data = mtcars)
bb <- function(fit = NA){
if(!(any(is.na(fit)))){
if(fit$call[1] != "lm()" && fit$call[1] != "aov()") stop("Error") else "OK"
}
}
# Examples of use:
bb(fit = fit4) # stops as expected ! because call is not from "lm()" or "aov()"
bb(fit = fit3) # I expect HERE to stop also; why it doesn't? !!!!

This is the reason:
> !(any(is.na(fit3)))
[1] FALSE
is.na(fits3) returns a logical named vector. Maybe you tried to check if is.na(fit3$call), instead of all elements of the fit3 object?

Related

Sequentially calling functions within function in R

How can you create a function that calls some predefined functions simultaneously?
E.g. I have 3 different functions like
myplot(data)
fmodel(data)
mymodel <- fmodel(data)
myconclusion(model = mymodel)
Now I want to create a new function that calls those predefined functions (from 1 to 3). What should I do?
I tried to do something like the below and receive the following error message, but I don't what was wrong.
P/s: my model involves linear regression and I've already put in the 'data' arguments.
myplot(mydata)
fmodel(mydata)
myconclusion(mymodel)
funlist <- list(
F1 = myplot
F2 = fmodel
mymodel <- fmodel
F3 = myconclusion
)
callfun <- function(funrange, data, ...){
for(i in funrange){
funlist[[i]](...)
}
}
callfun(1:3, data = mydata)
#Error in model.frame.default(formula = Y ~ X, data = mydata, drop.unused.levels = TRUE) :
#argument "data" is missing, with no default
Running the 3 functions inside another function should execute them, However, depending on what the functions actually do, there may not be any visible output.
f1 <- function(mydata, mymodel){
myplot(mydata)
fmodel(mydata)
myconclusion(mymodel)
}
f1(mydata, mymodel)
Again, depending on what these functions actually do will dictate the output.
EDIT
Here is an example for you
my_plot <- function(my_data){
my_data %>%
ggplot(aes(mpg, hp))+
geom_point()
}
my_model <- function(my_data){
my_data %>%
lm(mpg ~ hp, data = .) %>%
summary
}
my_model_2 <- function(my_data){
my_data %>%
lm(mpg ~ disp, data = .) %>%
summary
}
f1 <- function(my_data){
my_plot(my_data)
my_model(my_data)
my_model_2(my_data)
}
If you call f1(mtcars), all you will see is the output from my_model_2(), because that was the last function to be executed. my_plot() and my_model() were still executed, but you just couldn't see the results because all it does is preview a plot in the viewer, or print the model summary to the console.
One way to 'see' the plot produced by my_plot() is to change what it does, from previewing a plot in the viewer, to saving a copy of the plot. This may be done like this:
my_plot <- function(my_data){
my_data %>%
ggplot(aes(mpg, hp))+
geom_point()
ggsave('my_saved_plot.png')
}
Or, wrapping each function inside print will print the model summaries to the console, and show the plot in the viewer
f1 <- function(my_data){
print(my_plot(my_data))
print(my_model(my_data))
print(my_model_2(my_data))
}

control scoping of arguments supplied to lm() from whithin a function called by lapply

I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the default environment (the actual function evaluates the model call in the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and different wrapper functions for calling make_mods from lapply but it seems like my function, in using substitute only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment, and when passed data frames from within lapply.
The only thing that I could think of was to add an if statement to my make mods function that tests if the input is a call or not. If it's a call, it expects it to be a call for a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.

function of effect fails with xlevels

I am working on developing a function, with the effects package in R. I am constantly running into a problem, which I can't debug.
I run the following code:
rm(list = ls()) # clear working directory
library(effects)
head(mtcars)
mod <- lm(mpg ~ gear*cyl + gear + cyl + carb, data=mtcars)
summary(mod)
eff.dat <- effect("gear*cyl", mod=mod, KR=TRUE, xlevels=list(gear=seq(3,5,1)))
eff.dat <- as.data.frame(eff.dat)
View(eff.dat)
It works like a charm, I get the effects for the interaction term on mpg when gear equals 3, 4, and 5 and the corresponding values of cyl.
However, once I put this into a function like:
proba <- function(term, model, main) {
eff.dat2 <<- effect(term, mod=model, KR=TRUE,
xlevels=list(main=seq(min(mtcars[[main]]),
max(mtcars[[main]]), 1)))
eff.dat2 <<- as.data.frame(eff.dat2)
}
proba("gear*cyl", model=mod, main="gear")
View(eff.dat2)
The xlevels part fails and the interaction term is estimated for the default values of gear, not the ones I specify. Obviously, this is part of a larger function, otherwise I would not bother to write something solely for effect.
First an illustration what happens:
foo <- function(x) {
list(x = x)
}
foo("bar")
#$x
#[1] "bar
Note how the list element is named x. setNames can be used to set names programmatically:
foo <- function(x) {
setNames(list(x), x)
}
foo("bar")
#$bar
#[1] "bar"
Also, you should avoid creating function side effects with <<-. It's very bad practice. Create a proper return value instead:
proba <- function(term, model, main) {
as.data.frame(
effect(term, mod=model, KR=TRUE,
xlevels= setNames(list(seq(min(mtcars[[main]]),
max(mtcars[[main]]), 1)), main))
)
}
eff.dat2 <- proba("gear*cyl", model=mod, main="gear")
all.equal(eff.dat, eff.dat2)
#[1] TRUE

Change arguments in a "call" object

I have a call object and I want to add an argument, and I don't want to use parse in the way this answer does.
So, say I have an lm object, and so the call from the lm
lma <- lm(mpg ~ cyl, data=mtcars)
lma$call
# lm(formula = mpg ~ cyl, data = mtcars)
now, say I wanted to add an argument, weights=wt, using the call. I realize that there is an incredibly easy way to just create a new call, but I'm wondering if I can work with a call object. there is also a way to edit weights if it were already in there
lmb <- lm(mpg ~ cyl, data=mtcars, wei=wt)
cl <- lmb$call
wtpos <- which.max(pmatch(names(cl), "weights"))
cl[[wtpos]] <- mtcars$qsec
eval(cl)
but this won't work on lma$call because there is no weights argument in lma$call.
so, it feels like I should be able to simply "grow" the call by adding another element, but I don't see how to do that. For example, the following fails:
cl <- lma$call
cl <- c(cl, weights=quote(wt))
eval(cl)
# [[1]]
# lm(formula = mpg ~ cyl, data = mtcars)
#
# $weights
# wt
so, I would hope for the result to be a new "lm" object equal to the lmb, not just a list.
While there is a workaround, that doesn't use parse (to modify a copy of lm to have wt=weights as the default similar to in this solution) that, again, doesn't involve editing the call object.
I believe that the pryr package provides some useful functions for manipulating calls:
lma <- lm(mpg ~ cyl, data=mtcars)
lm_call <- lma$call
library(pryr)
modify_call(lm_call,list(weights = runif(32)))
> lm_call2 <- modify_call(lm_call,list(weights = runif(32)))
> eval(lm_call2)
Call:
lm(formula = mpg ~ cyl, data = mtcars, weights = c(0.934802365722135,
0.983909613220021, 0.762353664264083, 0.23217184189707, 0.850970500381663,
0.430563687346876, 0.962665138067678, 0.318865151610225, 0.697970792884007,
0.389103061752394, 0.824285467388108, 0.676439745584503, 0.344414771301672,
0.292265978176147, 0.925716639030725, 0.517001488478854, 0.726312294835225,
0.842773627489805, 0.669753148220479, 0.618112818570808, 0.139365098671988,
0.843711007386446, 0.851153723662719, 0.134744396666065, 0.92681276681833,
0.00274682720191777, 0.732672147220001, 0.4184603120666, 0.0912447033915669,
0.427389309043065, 0.721000595251098, 0.614837386412546))
Coefficients:
(Intercept) cyl
38.508 -2.945
You can look inside pryr::modify_call to see what it's doing if you'd like to do it manually, I suppose.

loop through column glmer

I am trying to run a glmer by looping through columns in my dataset which contain response variables (dat_prob).The code I am using is as follows, adapted from code researched on another stackoverflow question (Looping through columns in R).
Their code:
dat_y<-(dat[,c(2:1130)])
dat_x<-(dat[,c(1)])
models <- list()
#
for(i in names(dat_y)){
y <- dat_y[i]
model[[i]] = lm( y~dat_x )
}
My code:
dat_prob<-(probs[,c(108:188)])
dat_age<-(probs[,c(12)])
dat_dist<-(probs[,c(20)])
fyearcap=(probs[,c(25)])
fstation=(probs[,c(22)])
fnetnum=(probs[,c(23)])
fdepth=(probs[,c(24)])
models <- list()
#
for(i in names(dat_prob)){
y <- dat_prob[i]
y2=as.vector(y)
model[[i]] = glmer( y ~ dat_age * dat_dist + (1|fyearcap) + (1|fstation)+
(1|fnetnum)+ (1|fdepth),family=binomial,REML=TRUE )
}
And I receive this error, similar to the error received in the hyperlinked question:
Error in model.frame.default(drop.unused.levels = TRUE, formula = y ~ :
invalid type (list) for variable 'y'
I have been working through this for hours and now can't see the forest through the trees.
Any help is appreciated.
y <- dat_prob[i] makes y a list (or data frame, whatever). Lists are vectors - try is.vector(list()), so even y2 = as.vector(y) is still a list/data frame (even though you don't use it).
class(as.vector(mtcars[1]))
# [1] "data.frame"
To extract a numeric vector from a data frame, use [[: y <- dat_prob[[i]].
class(mtcars[[1]])
# [1] "numeric"
Though I agree with Roman - using formulas is probably a nicer way to go. Try something like this:
for(i in names(dat_prob)) {
my_formula = as.formula(paste(i,
"~ dat_age * dat_dist + (1|fyearcap) + (1|fstation)+ (1|fnetnum)+ (1|fdepth)"
))
model[[i]] = glmer(my_formula, family = binomial, REML = TRUE)
}
I'm also pretty skeptical of whatever you're doing trying 80 different response variables, but that's not your question...

Resources