I am looking for an elegant (and safe!) way to evaluate an amended call in the parent frame. By "amended" I mean I modified the call in such a way that it refers to something not included in parent frame but in another frame. I guess one could also say: "send something up but only for evaluation".
It is clarified what I want by the example below which works in some circumstances, but not all. The update function (stats:::update.default) uses eval and I added the weights argument with something (res) that is not in the same environment as the evaluation takes place. So I used get("res", pos = -1L) and I hope it is a safe way to refer to the environment res lives in. For models estimated with a variable as formula, both defined methods fail:
mod <- lm(mpg ~ cyl, data = mtcars)
form <- mpg ~ cyl
mod2 <- lm(form, data = mtcars)
wls1 <- function(x) {
res <- residuals(x)^2 # example
result <- update(x, weights = 1/get("res", pos = -1L))
return(result)
}
wls2 <- function(x) {
res <- residuals(x)^2 # example
result <- update(x, weights = 1/res)
return(result)
}
wls3 <- function(x) {
data(ChickWeight)
ChickWeight$cyl <- ChickWeight$weight
ChickWeight$mpg <- ChickWeight$Time
result <- update(x, data = ChickWeight)
return(result)
}
wls1(mod) # works
wls1(mod2) # errors
wls2(mod) # works
wls2(mod2) # erros
wls3(mod) # works
wls3(mod2) # works
How can this be solved in general in a safe way?
I was looking for a function that gives the current environment (something like a fictious this.environment() function) so avoid the pos argument and use the envir of get (I know I can create my own temporary environment and have res associated to it to use something like envir = my.eny).
We can do this by creating a quoted 'language' object for the formula and then update the call of the model
form <- quote(mpg ~ cyl)
wlsN <- function(x, formula) {
x$call$formula <- formula
res <- residuals(x)^2
update(x, weights = 1/res) # it is in the same environment. No need for get
}
wlsN(mod2, form)
#Call:
#lm(formula = mpg ~ cyl, data = mtcars, weights = 1/res)
#Coefficients:
#(Intercept) cyl
# 37.705 -2.841
-checking with other formula
form1 <- quote(disp ~ cyl + vs)
form2 <- quote(mpq ~ gear + carb)
mod1 <- lm(form1, data = mtcars)
mod2 <- lm(form2, data = mtcars)
wlsN(mod1, form1) # works
wlsN(mod2, form2) # works
It's hard to work around the fact that R looks for the value of weights in either data or the environment of the formula - which in the case of the variable named form in your example, is the global environment.
An alternative that riffs on the same theme as akrun's answer:
wls3 <- function(x) {
environment(x$call$formula) <- environment()
res <- residuals(x)^2
result <- update(x, weights=1/res)
}
I can see how this could get ugly in less trivial uses of this workaround such as when the formula of x already has an environment that does not enclose (potentially wrong use of the term) the environment in the call to wls3().
Another alternative (not recommended) is to use assign, e.g.
wls4 <- function(x) {
assign('res', residuals(x)^2, envir=environment(formula(x)))
result <- update(x, weights=1/res)
}
however this has the unintended consequence of leaving the variable res in the global environment.
Related
I want to perform a certain number of statistical models based on selection criteria specified in a dataframe. So using a basic example, say I had 2 responses variables and 2 explanatory variables:
#######################Data Input############################
Responses <- as.data.frame(matrix(sample(0:10, 1*100, replace=TRUE), ncol=2))
colnames(Responses) <- c("A","B")
Explanatories <- as.data.frame(matrix(sample(20:30, 1*100, replace=TRUE), ncol=2))
colnames(Explanatories) <- c("x","y")
I then define which statistical models that I would like to run, which can include different combinations of Response / Explanatory variables and different statistical functions:
###################Model selection#########################
Function <- c("LIN","LOG","EXP") ##Linear, Logarithmic (base 10) and exponential - see the formula for these below
Respo <- c("A","B","B")
Explan <- c("x","x","y")
Model_selection <- data.frame(Function,Respo,Explan)
How do I then perform a list of models based on these selection criteria? Here is an example of the models I would like to create based on the inputs from the Model_selection data frame.
####################Model creation#########################
Models <- list(
lm(Responses$A ~ Explanatories$x),
lm(Responses$B ~ log10(Explanatories$x)),
lm(Responses$B ~ exp(Explanatories$y))
)
I would guess that some kind of loop function would be required and after looking around perhaps paste too? Thanks in advance for any help with this
This isn't the prettiest solution, but it seems to work for your example:
Models <- list()
idx <- 1L
for (row in 1:nrow(Model_selection)){
if (Model_selection$Function[row]=='LOG'){
expl <- paste0('LOG', Model_selection$Explan[row])
Explanatories[[expl]] <- log10(Explanatories[[Model_selection$Explan[row]]])
Models[[idx]] <- lm(Responses[[Model_selection$Respo[row]]] ~ Explanatories[[expl]])
}
if (Model_selection$Function[row]=='EXP'){
expl <- paste0('EXP', Model_selection$Explan[row])
Explanatories[[expl]] <- exp(Explanatories[[Model_selection$Explan[row]]])
Models[[idx]] <- lm(Responses[[Model_selection$Respo[row]]] ~ Explanatories[[expl]])
}
if (Model_selection$Function[row]=='LIN'){
expl <- paste0('LIN', Model_selection$Explan[row])
Explanatories[[expl]] <- Explanatories[[Model_selection$Explan[row]]]
Models[[idx]] <- lm(Responses[[Model_selection$Respo[row]]] ~ Explanatories[[expl]])
}
names(Models)[idx] <- paste(Model_selection$Respo[row], '~', expl)
idx <- idx+1L
}
Models
This is a perfect use-case for the tidyverse
library(tidyverse)
## cbind both data sets into one
my_data <- cbind(Responses, Explanatories)
## use 'mutate' to change function names to the existing function names
## mutate_all to transform implicit factors to characters
## NB this step could be ommitted if Function would already use the proper names
model_params <- Model_selection %>%
mutate(Function = case_when(Function == "LIN" ~ "identity",
Function == "LOG" ~ "log10",
Function == "EXP" ~ "exp")) %>%
mutate_all(as.character)
## create a function which estimates the model given the parameters
## NB: function params must be named exactly like columns
## in the model_selection df
make_model <- function(Function, Respo, Explan) {
my_formula <- formula(paste0(Respo, "~", Function, "(", Explan, ")"))
my_mod <- lm(my_formula, data = my_data)
## syntactic sugar: such that we see the value of the formula in the print
my_mod$call$formula <- my_formula
my_mod
}
## use purrr::pmap to loop over the model params
## creates a list with all the models
pmap(model_params, make_model)
I am printing a table to PDF with much success. Standard hierarchical regression
with three steps. However, my questions is twofold: 1) How do I add the asterisk to mark sig p values on the covariates and 2) how do I remove rows like AIC etc.
At this point, I am just opening the pdf in word to edit the table but thought someone might have a solution.
H_regression <- apa_print(list(Step1 = model1,
Step2 = model2,
Step3 = model3),
boot_samples = 0)
I'll use the example from the papaja documentation as an example.
mod1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
mod2 <- update(mod1, formula = . ~ . + Petal.Length)
mod3 <- update(mod2, formula = . ~ . + Petal.Width)
moi <- list(Baseline = mod1, Length = mod2, Both = mod3)
h_reg <- apa_print(moi, boot_samples = 0)
h_reg_table <- h_reg$table
2) how do I remove rows like AIC
The table returned by apa_print() is a data.frame with some additional information. Hence, you can index and subset it as you would any other table. You can select rows by name (see below) or by row number.
# Remove rows
rows_to_remove <- c("$\\mathrm{AIC}$", "$\\mathrm{BIC}$")
h_reg_table <- h_reg_table[!rownames(h_reg_table) %in% rows_to_remove, ]
1) How do I add the asterisk to mark sig p values on the covariates
There is currently no way to highlight significant predictors (I'm not a fan of this practice). But here's some code that will let you add highlighting after the fact. The following function takes the formatted table, the list of the compared models and a character symbol to highlight significant predictors as input.
# Define custom function
highlight_sig_predictors <- function(x, models, symbol) {
n_coefs <- sapply(models, function(y) length(coef(y)))
for(i in seq_along(models)) {
sig_stars <- rep(FALSE, max(n_coefs))
sig_stars[1:n_coefs[i]] <- apply(confint(models[[i]]), 1, function(y) all(y > 0) || all(y < 0))
x[1:max(n_coefs), i] <- paste0(x[1:max(n_coefs), i], ifelse(sig_stars, symbol, paste0("\\phantom{", symbol, "}")))
}
x
}
Now this function can be used to customize the table returned by apa_print().
# Add significance symbols to predictors
h_reg_table <- highlight_sig_predictors(h_reg_table, moi, symbol = "*")
# Print table
apa_table(h_reg_table, escape = FALSE, align = c("lrrr"))
I'm looking to loop a number of independent variables through a mixed effect model. There are a couple of similar questions but nothing that quite works for me. An example using mtcars:
data(mtcars)
mtcars <- mtcars
t <- as.data.frame(seq(from = 10, to = 1000, by = 100))
names(t)[1] <- "return"
t <- as.data.frame(t(t))
#create some new variables to loop through
new <- cbind(mtcars$drat, t)
new2 <- 1-exp(-mtcars$drat/new[, 2:10])
new3 <- cbind(mtcars, new2)
xnam <- paste(colnames(mtcars)[c(3:4)], sep="")
xnam2 <- paste(colnames(reference)[c(12:20)], sep="")
#basic model (works fine)
fmla <- paste(xnam, collapse= "+")
fla <- paste("cyl ~", paste(fmla))
f <- paste0(paste(fla), " +(carb|gear)")
mtcarsmodel <- lmer(f, data= mtcars)
mtcarsmodel
So with my 'basic' model, I now want iteratively run each of the variables in xnam2 through the model as a fixed effect, but can't get it working with lapply and paste method:
f2 <- " +(carb|gear)"
newmodels <- lapply(xnam2, function(x) {
lmer(substitute(paste(fla), i + (paste(f2)), list(i = as.name(x))), data = mtcars)
})
So cyl ~ disp+hp + looping variable + (carb|gear) is what I'm after.
Hopefully that's clear with what I'm trying to accomplish. I'm getting a bit confused with the multiple pastes, but seems like the best way to approach dealing with many variables. Any suggestions?
If I've understood your question, I think you can just create the model formula with paste and use lapply to iterate through each new variable.
library(lme4)
vars = names(mtcars)[c(1,5,6,7)]
models = lapply(setNames(vars, vars), function(var) {
form = paste("cyl ~ disp + hp + ", var, "+ (carb|gear)")
lmer(form, data=mtcars)
})
A slight variant on #eipi10's solution:
library(lme4)
vars = names(mtcars)[c(1,5,6,7)]
otherVars <- c("disp","hp","(carb|gear)")
formList <- lapply(vars,function(x) {
reformulate(c(otherVars,x),response="cyl")
})
modList <- lapply(formList,lmer,data=mtcars)
EDIT: Ok, it has something to do with the data.all.filtered datatype.
The filtered datatype gets created from data.all.raw which works fine with any lapply below. The weird thing is that I can't find out how do the two differ...
data.selectedFeatures <- sapply(data.train.raw, FUN = sf.getGoodFeaturesVector, treshold = 5)
data.train.filtered <- lapply(seq(1, 8), FUN = function(i) sf.filterFeatures(data.train.raw[[i]], data.selectedFeatures[[i]]))
st.testFeature <- function(featureVector, treshold) {
if(!is.numeric(featureVector)) {return(T)}
numberOfNonZero <- sum(featureVector > 0)
numberOfZero <- length(featureVector) - numberOfNonZero
return(min(numberOfNonZero, numberOfZero) >= treshold)
}
sf.getGoodFeaturesVector <- function(data, treshold) {
selectedFeatures <- sapply(data, FUN = st.testFeature, treshold <- treshold)
whitelistedFeatures <- names(data) %in% c("id", "tp")
return(selectedFeatures | whitelistedFeatures)
}
sf.filterFeatures <- function(data, selectedFeatures) {
return(data[, selectedFeatures])
}
Any idea what am I doing wrong when manipulating the data that causes subsequent lapply to not to work?
Original post:
I have a list of datasets called data.train.filtered and want to get a list of models (for predicting a feature called tp) trained by rplot on them. The easiest solution I could think of was using lapply but it doesn't work for some reason.
lapply(data.train.filtered, function(dta) rpart(tp ~ ., data = dta))
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
The problem is probably not in the data as using it just for one (any) dataset works fine:
rpart(tp ~ ., data = data.train.filtered[[1]])
Even though accessing just one dataset via index works fine (as shown above) using lapply trough indexes fails just the same way the first example did.
lapply(1:8, function(i) rpart(tp ~ ., data = data.train.filtered[[i]]))
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
The traceback for the index version is following:
10 terms.formula(formula, data = data)
9 terms(formula, data = data)
8 model.frame.default(formula = tp ~ ., data = data.train.filtered[[i]],
na.action = function (x)
{
Terms <- attr(x, "terms") ...
7 stats::model.frame(formula = tp ~ ., data = data.train.filtered[[i]],
na.action = function (x)
{
Terms <- attr(x, "terms") ...
6 eval(expr, envir, enclos)
5 eval(expr, p)
4 eval.parent(temp)
3 rpart(tp ~ ., data = data.train.filtered[[i]])
2 FUN(X[[i]], ...)
1 lapply(1:8, function(i) rpart(tp ~ ., data = data.train.filtered[[i]]))
I'm quite sure I'm missing something extremely trivial here but being quite new to R I just can't find the problem.
PS: I know that I could iterate trough all the datasets via for loop but that feels really dirty and I'd prefer an R idiomatic solution.
The trick is to use lapply() on the original list, not on an index vector. For example:
# toy data:
data.train.filtered <- list()
# create 10 different length data frames:
for(i in 1:10){
n <- rpois(1, 15)
x = rnorm(n)
data.train.filtered[[i]] <- data.frame(x =x,
tp = 3 + 2 * x + rnorm(n)
)
}
library(rpart)
lapply(data.train.filtered, function(dta){rpart(tp ~ ., data = dta)})
using data(iris) and purrr::map:
datas <- split(iris, rep(sample(c(1,2,3)), length.out = nrow(iris))
models <- purrr::map(datas, ~ rpart(Species ~ ., data = .x)) # a better syntax
Ok, I finally managed to find the answer. The problem was that data.train.all was actually not what I thought it was. I had an error in the filtering process which corrupted (silently, thanks R) everything.
The fix was to use:
data.selectedFeatures <- lapply(data.train.raw, FUN = sf.getGoodFeaturesVector, treshold = 5)
instead of
data.selectedFeatures <- sapply(data.train.raw, FUN = sf.getGoodFeaturesVector, treshold = 5)
Thanks for all the other answers, though.
In my dataset I have 6 variables(x1,x2,x3,x4,x5,x6), i wish to create a function that allows me to input one variable and it will do the formula with the rest of the variables in the data set.
For instance,
fitRegression <- function(data, dependentVariable) {
fit = lm(formula = x1 ~., data = data1)
return(fit)
}
fitRegression(x2)
However, this function only returns me with results of x1. My desire result will be inputting whatever variables and will automatically do the formula with the rest of the variables.
For Example:
fitRegression(x2)
should subtract x2 from the variable list therefore we only compare x2 with x1,x3,x4,x5,x6.
and if:
fitRegression(x3)
should subtract x3 from the comparable list, therefore we compare x3 with x1,x2,x4,x5,x6.
Is there any ways to express this into my function, or even a better function.
You can do it like this:
# sample data
sampleData <- data.frame(matrix(rnorm(500),100,5))
colnames(sampleData) <- c("A","B","C","D","E")
# function
fitRegression <- function(mydata, dependentVariable) {
# select your independent and dependent variables
dependentVariableIndex<-which(colnames(mydata)==dependentVariable)
independentVariableIndices<-which(colnames(mydata)!=dependentVariable)
fit = lm(formula = as.formula(paste(colnames(mydata)[dependentVariableIndex], "~", paste(colnames(mydata)[independentVariableIndices], collapse = "+"), sep = "" )), data = mydata)
return(fit)
}
# ground truth
lm(formula = A~B+C+D+E, data = sampleData)
# reconcile results
fitRegression(sampleData, "A")
You want to select the Y variable in your argument. The main difficulty is to pass this argument without any quotes in your function (it is apparently the expected result in your code). Therefore you can use this method, using the combination deparse(substitute(...)):
fitRegression <- function(data, dependentVariable) {
formula <- as.formula(paste0(deparse(substitute(dependentVariable)), "~."))
return(lm(formula, data) )
}
fitRegression(mtcars, disp)
That will return the model.
The below function uses "purrr" and "caret" it produces a list of models.
df <-mtcars
library(purrr);library(caret)
#create training set
vect <- createDataPartition(1:nrow(df), p=0.8, list = FALSE)
#build model list
ModList <- 1:length(df) %>%
map(function(col) train(y= df[vect,col], x= df[vect,-col], method="lm"))