Linear regression function malfunction - r

variables.null.model <- paste('utalter', 'lcsex', 'utcigreg', 'utbmi', 'month', sep = '+')
variables.full.model <- paste('utalter', 'lcsex', 'utcigreg', 'utbmi', 'month', 'ltedyrs','occ_status', 'marital_status', 'social_cat','GC_linc125_07', 'GC_linc250_07', 'GC_linc500_07', 'GC_linc1000_07', 'GC_linc5000_07', 'GC_pop500_08','utalkkon', 'activity', 'utpyrs', 'cvd', 'utmstati', 'utmfibra', 'utantihy', 'utmeddia', 'utmadins','utwhrat','ul_choln', sep='+')
pollutants_3 <- c('GC_PM10_09', 'GC_PM25_09', 'GC_Coarse_09', 'GC_BS25_09', 'GC_NOX_09', '$GC_NO2_09')
null <- paste(variables.null.model, pollutants_3, sep='+')
full <- paste(variables.full.model, pollutants_3, sep='+')
fun.model.summary <- function(x) {
  formula <- as.formula(paste("log_sfrp5 ~", x))
  lm <- lm(formula, data = kalonji.na)
  coef(summary(lm))
}
lm.summary <- lapply(full, fun.model.summary)
I am working on some air pollution data and would like to run a linear regression for each pollutant and summarize the coefficients. I have the code above, but I am getting this error:
Error in parse(text = x, keep.source = FALSE) :
:1:269: unexpected '$'
Any ideas how I can fix this?

Your last pollutant is '$GC_NO2_09'. Note the stray $ sign.
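To illustrate, here is a minimal sketch with a made-up response y and a shortened pollutant vector (both are assumptions for the example, not the asker's data). as.formula parses its string argument as R code, so a leading $ is a syntax error; stripping it from the names lets the formulas parse.

```r
# Hypothetical, shortened version of the pollutant vector from the question
pollutants <- c("GC_NOX_09", "$GC_NO2_09")

# as.formula() parses the string as R code, so the stray "$" is a syntax error
bad <- tryCatch(as.formula(paste("y ~ utbmi +", pollutants[2])),
                error = function(e) conditionMessage(e))

# Remove a leading "$" from each name, then build the formulas again
clean <- sub("^\\$", "", pollutants)
good <- lapply(paste("y ~ utbmi +", clean), as.formula)
```

Here bad holds the parser's error message, while every element of good is a regular formula object.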
But as I said in a comment, I strongly recommend against using character strings here (see footnote 1). Construct the formula directly from R objects by transforming the strings into R identifiers via as.name.
You can combine a list of names into a sum using Reduce and call. E.g.:
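# Combine two expressions into the call `lhs + rhs`
make_addition = function (lhs, rhs) {
  call('+', lhs, rhs)
}
variables_null_model = c('utalter', 'lcsex', 'utcigreg', 'utbmi', 'month')
# Fold the variable names (as symbols) into utalter + lcsex + ... + month
interaction_terms_full_model = Reduce(make_addition, lapply(variables_null_model, as.name))
fun_model_summary = function (x) {
  # Build log_sfrp5 ~ <covariates> + <pollutant> without pasting strings
  formula = call('~', quote(log_sfrp5), call('+', interaction_terms_full_model, as.name(x)))
  lm = lm(formula, data = kalonji_na)  # kalonji_na: the data frame called kalonji.na in the question
  coef(summary(lm))
}
lm_summary = lapply(pollutants_3, fun_model_summary)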
Footnote 1: For a bit of background, using strings here subverts the type system and replaces proper, distinct types by untyped strings. This is known as "stringly typing", and it’s an anti-pattern because it hides bugs. Your question is an example of such a bug.
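For readers who want to try the Reduce and call approach without the original data, here is a small sketch on the built-in mtcars data set. The column choices (mpg as response, wt and hp as covariates, qsec and drat standing in for the pollutants) are assumptions made only so the example runs.

```r
# Combine two expressions into the call `lhs + rhs`
make_addition <- function(lhs, rhs) call("+", lhs, rhs)

# Stand-ins for the covariates in the question
covariates <- c("wt", "hp")
base_terms <- Reduce(make_addition, lapply(covariates, as.name))

fit_one <- function(extra) {
  # Build mpg ~ wt + hp + <extra> from symbols, then evaluate `~` to get a formula
  f <- eval(call("~", quote(mpg), make_addition(base_terms, as.name(extra))))
  coef(summary(lm(f, data = mtcars)))
}

# One coefficient table per extra variable
results <- lapply(c("qsec", "drat"), fit_one)
```

deparse(base_terms) shows the sum that Reduce built, which is a convenient way to check the right-hand side before fitting.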

Related

Create a function with an argument used in a formula

I'm a beginner at creating functions and I have some trouble with something probably basic.
I'd like to create a function that takes as arguments a data.frame and the name of a variable, and returns the linear regression of this variable on the others (there is no real point in doing that, I'm just trying to learn how to create functions).
my_lm <- function(df, var) lm(var~., data = df)
my_lm(diamonds, price)
But I get this error:
Error in eval(predvars, data, env) : object 'price' not found
Thanks for your help, and sorry for my bad English.
One solution is to pass price as a character string and use formula() to convert the string into the proper object for lm.
my_lm <- function(df, var) {
  f = formula(paste0(var, "~.")) # this creates "price ~ ." in the example
  lm(f, data = df)
}
my_lm(diamonds, var="price")
Or, if you have to pass price as "not a string", you need non-standard evaluation (NSE):
my_lm <- function(df, var) {
  var = substitute(var) # capture the unevaluated argument as a symbol
  f = formula(paste0(var, "~."))
  lm(f, data = df)
}
my_lm(diamonds, var=price)
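As a hedged variation on both versions, the sketch below uses stats::reformulate to build the formula and deparse(substitute()) to turn an unquoted argument into a string. The function names my_lm_chr and my_lm_nse are made up for this example, and mtcars replaces diamonds so that no extra package is needed.

```r
# Character version: reformulate() builds `var ~ .` from a string
my_lm_chr <- function(df, var) {
  lm(reformulate(".", response = var), data = df)
}

# NSE version: capture the unquoted argument, convert it to a string, reuse the above
my_lm_nse <- function(df, var) {
  var_name <- deparse(substitute(var))
  my_lm_chr(df, var_name)
}

fit_a <- my_lm_chr(mtcars, "mpg")
fit_b <- my_lm_nse(mtcars, mpg)
```

Keeping the NSE layer as a thin wrapper around the character version means the character version stays easy to call from other functions.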

R how to pass NULL for optional parameters to function (e.g. in for loop)

I wrote a for loop to test different settings for an ordination function in R (package "vegan", called by "phyloseq"). I have several subsets of my data within a list (sample_subset_list) and therefore, testing different parameters for all these subsets results in many combinations.
The ordination function contains the optional argument formula and I would like to perform my ordinations with and without a formula. I assume NULL would be the correct way to not use the formula parameter? But how do I pass NULL when using a for loop (or apply etc)?
Using the phyloseq example data:
library(phyloseq)
data(GlobalPatterns)
ps <- GlobalPatterns
ps1 <- filter_taxa(ps, function (x) {sum(x > 0) > 10}, prune = TRUE)
ps2 <- filter_taxa(ps, function (x) {sum(x > 0) > 20}, prune = TRUE)
sample_subset_list <- list()
sample_subset_list <- c(ps1, ps2)
I tried:
formula <- c("~ SampleType", NULL)
> formula
[1] "~ SampleType"
ordination_list <- list()
for (current_formula in formula) {
  tmp <- lapply(sample_subset_list,
                ordinate,
                method = "CCA",
                formula = as.formula(current_formula))
  ordination_list[[paste(current_formula)]] <- tmp
}
This way, formula only consists of "~ SampleType". If I put NULL in quotes, it gets wrongly interpreted as a formula:
formula <- c("~ SampleType", "NULL")
Error in parse(text = x, keep.source = FALSE)
What is the right way to solve this?
Regarding Lyzander's answer:
# make sure to use (as suggested)
formula <- list("~ SampleType", NULL)
# and not
formula <- list()
formula <- c("~ SampleType", NULL)
You can use a list instead:
formula <- list("~ my_constraint", NULL)
# for (i in formula) print(i)
#[1] "~ my_constraint"
#NULL
If the function accepts NULL for that argument, you should also pass NULL through unchanged instead of calling as.formula on it:
ordination_list <- list()
for (current_formula in formula) {
  tmp <- lapply(sample_subset_list,
                ordinate,
                method = "CCA",
                formula = if (is.null(current_formula)) NULL else as.formula(current_formula))
  ordination_list[[length(ordination_list) + 1]] <- tmp
}
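The difference between c() and list() can be checked without phyloseq. In this small sketch the helper to_formula is a hypothetical name introduced for illustration: c() silently drops NULL, while list() keeps it as an element that a loop or lapply will visit.

```r
# c() drops NULL, so only one element survives
with_c <- c("~ SampleType", NULL)

# list() keeps NULL as a second element
with_list <- list("~ SampleType", NULL)

# Hypothetical helper: leave NULL untouched, convert strings to formulas
to_formula <- function(x) if (is.null(x)) NULL else as.formula(x)
converted <- lapply(with_list, to_formula)
```

Note that paste(NULL) returns character(0), which cannot serve as a list name; that is presumably why the loop above appends by position instead of by name.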

Calling update within a lapply within a function, why isn't it working?

This is a follow-up question to Error in calling `lm` in a `lapply` with `weights` argument, but it may not be the same problem (though it is related).
Here is a reproducible example:
dd <- data.frame(y = rnorm(100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100),
                 x4 = rnorm(100),
                 wg = runif(100,1,100))
ls.form <- list(
  formula(y~x1+x2),
  formula(y~x3+x4),
  formula(y~x1|x2|x3),
  formula(y~x1+x2+x3+x4)
)
I have a function that takes different arguments (1- a subsample, 2- a colname for the weights argument, 3- a list of formulas to try and 4- the data.frame to use)
f1 <- function(samp, dat, forms, wgt){
  baselm <- lm(y~x1, data = dat[samp,], weights = dat[samp,wgt])
  lapply(forms, update, object = baselm)
}
If I call the function, I get an error:
f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
Error in is.data.frame(data) : object 'dat' not found
I don't really get why it doesn't find the dat object; it should be part of the function environment. The problem is in the update part of the code, because if you remove this line from the function, the code works.
In the end, this function will be called with lapply:
lapply(list(1:66, 33:99), f1, dat=dd, forms = ls.form, wgt="wg")
I think your problems are due to the scoping rules used by lm which are quite frankly a pain in the r-squared.
One option is to use do.call to get it to work, but you get some ugly output when it deparses the inputs to give the call used for the standard print method.
f1 <- function(samp, dat, forms, wgt){
  baselm <- do.call(lm,list(formula=y~x1, data = dat[samp,], weights = dat[samp,wgt]))
  lapply(forms, update, object = baselm)
}
A better way is to use an eval(substitute(...)) construct which gives the output you originally expected:
f2 <- function(samp, dat, forms, wgt){
  baselm <- eval(substitute(lm(y~x1, data = dat[samp,], weights = dat[samp,wgt])))
  lapply(forms, update, object = baselm)
}
Such scoping issues are very common with lm objects. You can solve this by specifying the correct environment for evaluation:
f1 <- function(samp, dat, forms, wgt){
  baselm <- lm(y~x1, data = dat[samp,], weights = dat[samp,wgt])
  mods <- lapply(forms, update, object = baselm, evaluate = FALSE)
  e <- environment()
  lapply(mods, eval, envir = e)
}
f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
#works
The accepted answer works, but I continued digging and found an old r-help question which gave more options and explanation. I thought I would post it here in case somebody else needs it.
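The environment-based fix can be tried in isolation with the sketch below. The function name refit, the seed, and the two formulas are assumptions chosen for illustration. update(..., evaluate = FALSE) returns the modified call rather than running it, and the call is then evaluated in the function's own frame, where samp, dat and wgt exist.

```r
set.seed(42)
dd <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                 wg = runif(100, 1, 100))
forms <- list(y ~ x1 + x2, y ~ x2)

refit <- function(samp, dat, forms, wgt) {
  base <- lm(y ~ x1, data = dat[samp, ], weights = dat[samp, wgt])
  # evaluate = FALSE makes update() return the modified call instead of running it
  calls <- lapply(forms, update, object = base, evaluate = FALSE)
  # Evaluate each call in this function's frame, where samp, dat and wgt are visible
  here <- environment()
  lapply(calls, eval, envir = here)
}

fits <- refit(1:66, dd, forms, "wg")
```

Each element of fits is an ordinary lm object fitted on the 66 selected rows.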

Write a wrapper function to successfully take additional arguments (like subset) via ellipsis (...)

I am writing a function that calls another function (e.g. lm), and I would like to pass other arguments to it using ellipsis (...). However, the data to be used is not in the global environment, but inside a list. A minimal example:
L <- list(data = chickwts, other = 1:5)
wrapper <- function(list, formula = NULL, ...){
  if (missing(formula)) formula <- formula(weight~feed)
  lm(formula, data = list$data, ...)
}
wrapper(L, subset = feed != "casein") #fails
I can make it work using attach, but I'm sure there are more efficient ways of doing it by specifying the evaluation frame...?
wrapper2 <- function(list, formula = NULL, ...){
  if (missing(formula)) formula <- formula(weight~feed)
  attach(list$data)
  m <- lm(formula, ...)
  detach(list$data)
  return(m)
}
wrapper2(L, subset = feed != "casein") #works
Another solution I have used before is to use list(...), and dealing with the arguments manually, but that would not be practical in the real situation.
I can see that this is fairly basic, but I couldn't find a solution. Any suggestion to the specific problem and also a link to a good conceptual explanation of environments in general would be appreciated.
We would need to construct a call and eval it.
wrapper <- function(list, formula = NULL, ...){
  if (missing(formula)) formula <- weight ~ feed
  cl <- match.call()
  cl$list <- NULL
  cl$formula <- formula
  cl$data <- quote(list$data)
  cl[[1]] <- quote(stats::lm)
  eval(cl)
}
Reproducible example:
L <- list(data = trees, other = 1:5)
wrapper(L, Height ~ Girth, subset = Volume > 20)
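As a quick sanity check (a sketch, not part of the original answer), the wrapper's result can be compared with a direct lm call on the same data. The object names fit_wrapped, fit_direct, fit_default and fit_default_direct are made up for this example.

```r
wrapper <- function(list, formula = NULL, ...) {
  if (missing(formula)) formula <- weight ~ feed
  cl <- match.call()           # the call as the user wrote it, including subset = ...
  cl$list <- NULL              # drop the argument lm() does not know
  cl$formula <- formula        # insert the (possibly default) formula
  cl$data <- quote(list$data)  # point lm() at the data inside the list
  cl[[1]] <- quote(stats::lm)  # replace the function being called
  eval(cl)                     # evaluate in this frame, where `list` exists
}

# Explicit formula with a subset expression
L <- list(data = trees, other = 1:5)
fit_wrapped <- wrapper(L, Height ~ Girth, subset = Volume > 20)
fit_direct <- lm(Height ~ Girth, data = trees, subset = Volume > 20)

# Default formula, using the data from the question
L2 <- list(data = chickwts, other = 1:5)
fit_default <- wrapper(L2, subset = feed != "casein")
fit_default_direct <- lm(weight ~ feed, data = chickwts, subset = feed != "casein")
```

Because subset stays an unevaluated expression inside the call, lm evaluates it against the data columns, which is what the plain ... forwarding could not do.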

I do not understand error "object not found" inside the function

I have roughly this function:
plot_pca_models <- function(models, id) {
  library(lattice)
  splom(models, groups=id)
}
and I'm calling it like this:
plot_pca_models(data.pca, log$id)
which results in this error:
Error in eval(expr, envir, enclos) : object 'id' not found
when I call it without the wrapping function:
splom(data.pca, groups=log$id)
it raises this error:
Error in log$id : object of type 'special' is not subsettable
but when I do this:
id <- log$id
splom(models, groups=id)
it behaves as expected.
Please can anybody explain why it behaves like this and how to correct it? Thanks.
btw:
I'm aware of similar questions here, eg:
Help understand the error in a function I defined in R
Object not found error with ddply inside a function
Object disappears from namespace in function
but none of them helped me.
edit:
As requested, here is the full "plot_pca_models" function:
plot_pca_models <- function(data, id, sel=c(1:4), comp=1) {
  # 'data' ... princomp objects
  # 'id' ... list of samples id (classes)
  # 'sel' ... list of models to compare
  # 'comp' ... which pca component to compare
  library(lattice)
  models <- c()
  models.size <- 1:length(data)
  for(model in models.size) {
    models <- c(models, list(data[[model]]$scores[,comp]))
  }
  names(models) <- 1:length(data)
  models <- do.call(cbind, models[sel])
  splom(models, groups=id)
}
edit2:
I've managed to make the problem reproducible.
require(lattice)
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100))
my.id <- data.frame(id = sample(letters[1:4], 100, replace = TRUE))
plot_pca_models2 <- function(x, ajdi) {
  splom(x, group = ajdi)
}
plot_pca_models2(x = my.data, ajdi = my.id$id)
which produces the same error as above.
The problem is that splom evaluates its groups argument in a nonstandard way. A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
  eval(substitute(splom(data, groups=.id), list(.id=id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
log is a function in base R. Good practice is to not name objects after functions...it can create confusion. Type log$test into a clean R session and you'll see what's happening:
object of type 'special' is not subsettable
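A short sketch of the name clash follows; sample_log is a hypothetical replacement name chosen for illustration. Without a user-defined object called log, the name resolves to the base function, and $ cannot be applied to it.

```r
# With no user object named `log`, the name refers to the base function
log_type <- typeof(log)

# Subsetting a function with $ fails; capture the message instead of stopping
msg <- tryCatch(log$id, error = function(e) conditionMessage(e))

# Using a name that does not shadow a base function avoids the confusion
sample_log <- data.frame(id = c("a", "b"))
ids <- as.character(sample_log$id)
```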
Here's a modification of Hong Oi's answer. First, I would recommend including id in the main data frame, i.e.
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100), id = sample(letters[1:4], 100, replace = TRUE))
.. and then
plot_pca_models2 <- function(x, ajdi) {
  Call <- bquote(splom(x, group = x[[.(ajdi)]]))
  eval(Call)
}
plot_pca_models2(x = my.data, ajdi = "id")
The cause of the confusion is the following line in lattice:::splom.formula:
groups <- eval(substitute(groups), data, environment(formula))
... whose only point is to be able to specify groups without quotation marks, that is,
# instead of
splom(DATA, groups="ID")
# you can now be much shorter, thanks to eval and substitute:
splom(DATA, groups=ID)
But of course, this makes splom (and other functions, e.g. substitute, which use "nonstandard evaluation") harder to use from within other functions, and it goes against the philosophy that is "mostly" followed in the rest of R.
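The effect can be reproduced without lattice. In the sketch below, group_sizes is a hypothetical stand-in that captures its groups argument with eval(substitute()); it is not lattice's code and only mimics the idea of looking the expression up in the data rather than in the caller's frame. A naive wrapper fails because the symbol id is searched for in the data, while splicing the value into the call works.

```r
# Hypothetical stand-in for a function that captures `groups` unevaluated.
# This is NOT lattice's code; it only mimics the eval(substitute()) idea.
group_sizes <- function(data, groups) {
  g <- eval(substitute(groups), data, baseenv())
  table(g)
}

# Naive wrapper: the symbol `id` is looked up in the data and not found
naive <- function(x, id) group_sizes(x, groups = id)

# Fix: splice the value of id into the call before evaluating it
spliced <- function(x, id) {
  eval(substitute(group_sizes(x, groups = .id), list(.id = id)))
}

ir <- iris[-5]
sp <- iris[, 5]
naive_result <- tryCatch(naive(ir, sp), error = function(e) conditionMessage(e))
spliced_result <- spliced(ir, sp)
```

naive_result ends up holding an error message, whereas spliced_result is the table of group sizes.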
