R: substitute pattern in formula for a variable name

I have a general function that calls an expression involving a formula. I would like to pass this function to various environments that store specific variables, and have it modify the parts of the formula designated by a specific pattern.
Here is an example:
# Let's assume I have an environment storing a variable
env <- new.env()
env$..M.. <- "Sepal.Length"
# And a function that calls an expression
func <- function() summary(lm(..M.. ~ Species, data = iris))$r.squared
# And let's assume I am trying to evaluate it within the environment
environment(func) <- env
# And I would like to have some method that makes it evaluate as:
summary(lm(Sepal.Length ~ Species, data = iris))$r.squared
So far I have come up with a rather dirty solution based on deparsing the function to a string, grepping, and parsing it back. It goes like this:
tfunc <- paste(deparse(func), collapse = "")
tfunc <- gsub("\\.\\.M\\.\\.", env$..M.., tfunc, perl = TRUE)
tfunc <- eval(parse(text = tfunc))
So yes, it works, but I would like to find a cleaner method that would somehow substitute the ..M.. pattern with Sepal.Length without all this parsing and deparsing.
I would really appreciate any help and hints with this problem.
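One cleaner direction (a minimal sketch, not from the original thread) is to rewrite the function body directly with substitute(), replacing the ..M.. symbol with the symbol named in the environment:

env <- new.env()
env$..M.. <- "Sepal.Length"
func <- function() summary(lm(..M.. ~ Species, data = iris))$r.squared
# do.call() hands substitute() the already-quoted body, so ..M.. is
# replaced by the symbol stored in env$..M.. without any deparsing
body(func) <- do.call(substitute,
                      list(body(func), list(..M.. = as.name(env$..M..))))
func()  # now evaluates summary(lm(Sepal.Length ~ Species, data = iris))$r.squared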

Related

Creating a function for GWR maps

I have created a function for GWR maps. I have run the code outside the function and it works well; however, when I turn it into a function I get an error. I was wondering if anyone could help, thank you!
# a = polygon shapefile
# b = dependent variable of shapefile
# c = explanatory variable 1
# d = explanatory variable 2
GWR_map <- function(a, b, c, d){
  GWRbandwidth <- gwr.sel(a$b ~ a$c + a$d, a, adapt = T)
  gwr.model <- gwr(a$b ~ a$c + a$d, data = a, adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
  gwr.model
}
GWR_map(OA.Census,"Qualification", "Unemployed", "White_British")
The above code produces the following error:
Error in model.frame.default(formula = a$b ~ a$c + a$d, data = a, drop.unused.levels = TRUE) :
invalid type (NULL) for variable 'a$b'
You can't use function parameters with the $. Try changing your function to use the [[x]] notation instead. It should look like this:
GWR_map <- function(a, b, c, d){
  GWRbandwidth <- gwr.sel(a[[b]] ~ a[[c]] + a[[d]], a, adapt = T)
  gwr.model <- gwr(a[[b]] ~ a[[c]] + a[[d]], data = a, adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
  gwr.model
}
The R help docs (section 6.2 on lists) explain this difference well:
Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful when the name of the component to be extracted is stored in another variable, as in x <- "name"; Lst[[x]].
It is very important to distinguish Lst[[1]] from Lst[1]. '[[...]]' is the operator used to select a single element, whereas '[...]' is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist.
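As a further note (a sketch, not part of the original answer): the formula can also be built from the column-name strings with reformulate(), so the model keeps an ordinary formula and data argument. Illustrated here with lm() on mtcars, since the spgwr setup is not reproducible in this snippet; fit_by_name is a hypothetical helper name:

# Sketch: build b ~ c + d from the name strings, then fit against data = a
fit_by_name <- function(a, b, c, d){
  f <- reformulate(termlabels = c(c, d), response = b)
  lm(f, data = a)
}
fit_by_name(mtcars, "mpg", "wt", "hp")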

dot stored in call to update formula leads to scoping issue

I am relying on the compareGroups package to do some comparisons after a pipe-chain. When subsetting the final results, the call to [ triggers a call to update (both in their bespoke compareGroups versions), which leads to a scoping problem.
Try this:
library(tidyverse)
# install.packages("compareGroups")
library(compareGroups)
get_data <- function() return(mtcars)
assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}
get_results <- function(){
  get_data() %>% assign_group %>% compareGroups(group ~ ., data = .)
}
res <- get_results()
# all the above works, but the following triggers the error:
res["mpg"]
This leads to the following error:
Error in compareGroups(formula = group ~ mpg, data = .) :
object '.' not found
The relevant (abbreviated) traceback is this:
compareGroups(formula = group ~ mpg, data = .)
eval(call, parent.frame())
update.compareGroups(x, formula = group ~ mpg)
update(x, formula = group ~ mpg) at <text>#1
eval(parse(text = cmd))
`[.compareGroups`(res, "mpg")
res["mpg"]
So my understanding is that the dot notation in the dplyr pipe-chain prevents the update call from finding the dataframe, which is stored as . in the call. The error makes sense: . is neither the name of the dataframe nor available outside the scope of the function get_results (though the main issue is the .). One obvious way of avoiding this error is to fix the update.compareGroups function - I don't think we need another call to the package to redo all the calculations when I simply want to retrieve individual results that have already been calculated.
However, this is a more general issue with the . notation of dplyr and the fact that it is stored in the call. The problem seems general enough that I would imagine someone has encountered it before and found a more general solution?
Firstly, I don't think piping your data into compareGroups makes sense - remember that piping means the first argument to compareGroups() is now the dataframe, even though the function specification is:
compareGroups(formula, data, ...)
Secondly, this dplyr vignette shows you can use .data instead of just . to access the piped data. However, in this case the following crashes with the message "data argument will be ignored since formula is already a data set" (because the data is piped into the first argument).
get_results <- function(){
  get_data() %>% assign_group %>% compareGroups(group ~ ., data = .data) # does NOT work
}
Making a separate call to compareGroups without piping then gets me into an unholy mess of environments: res does not have access to the data when res['mpg'] is requested outside the function get_results(), as you already alluded to with the scoping problem. I think this is a compareGroups problem, because the same architecture with glm shows no such issue. So the best I can do is to take the dataframe out of the function environment, which I think doesn't properly answer your question:
get_data <- function() return(mtcars)
assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}
df = get_data() %>% assign_group()
res = compareGroups(group ~ ., data = df)
print(res['mpg'])
But I hope the first two points I made get you closer to an answer.
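A more general workaround is also worth sketching, though it is an assumption that it plays well with compareGroups' bespoke update method: construct the call with do.call(), so that the evaluated data frame itself, rather than the out-of-scope symbol ., is recorded in the object's call for later re-evaluation.

get_results <- function(){
  df <- get_data() %>% assign_group()
  # do.call() records evaluated arguments in the stored call, so a later
  # update() does not need to resolve `.` in a dead environment
  do.call(compareGroups, list(formula = group ~ ., data = df))
}
res <- get_results()
res["mpg"]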

R Function - assign LMER to dynamic variable name

To create a more compact script, I am trying to create my first function.
The general function is:
f.mean <- function(var, fig, datafile){
  require(lme4)
  change <- as.symbol(paste("change", var, sep = ""))
  base <- as.symbol(paste("baseline", var, sep = ""))
  x <- substitute(lmer(change ~ base + (1|ID), data = datafile))
  out <- eval(x)
  name <- paste(fig, ".", var, sep = "")
  as.symbol(name) <- out
}
The purpose of this function is to take var, fig and datafile as input and to output a new variable named fig.var containing out (the evaluated lmer fit).
Apparently it is difficult to 'change' the variable name on the left side of the <-.
What we have tried so far:
- assign(name, out)
- as.symbol(name) <<- out
- makeActiveBinding("y", function() x, .GlobalEnv)
- several rename options to rename out to the specified var name
Can someone help me to assign the out value to this 'run' specific variable name? All other suggestions are welcome as well.
As #Roland comments, in R (or any) programming one should avoid indirect environment manipulators such as assign, attach, list2env, and <<-, which are difficult to debug and break the flow of the usual programming style involving explicitly defined objects and methods.
Additionally, avoid flooding your global environment with potentially hundreds or thousands of similarly structured objects, which may then require environment mining such as ls, mget, or eapply. Simply use one large container, a list of named elements, which is more manageable and makes code more maintainable.
Specifically, be direct in assigning objects: pass string literals (var, fig) or objects (datafile) as function parameters and have the function return values. For many inputs, build lists with lapply or Map (a wrapper to mapply) to retain the needed objects. Consider the adjustment below, which builds a formula from string literals, passes it into your model, and returns the result at the end.
f.mean <- function(var, fig, datafile){
  require(lme4)
  myformula <- as.formula(paste0("change", var, " ~ baseline", var, " + (1|ID)"))
  x <- lmer(myformula, data = datafile)
  return(x)
}
var_list <- # ... list/vector of var character literals
fig_list <- # ... list/vector of fig character literals
# BUILD AND NAME LIST OF LMER OUTPUTS
lmer_list <- setNames(Map(f.mean, var_list, fig_list, MoreArgs = list(datafile = df)),
                      paste0(fig_list, ".", var_list))
# IDENTIFY NEEDED var*fig* BY ELEMENT NAME OF LARGER CONTAINER
lmer_list$fig1.var1
lmer_list$fig2.var2
lmer_list$fig3.var3
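For concreteness, here is a hypothetical setup (the data frame and all names below are invented for illustration; the data is pure noise, so lmer may warn about a singular fit):

set.seed(1)
df <- data.frame(ID = factor(rep(1:20, each = 4)),
                 changeA = rnorm(80), baselineA = rnorm(80),
                 changeB = rnorm(80), baselineB = rnorm(80))
var_list <- c("A", "B")
fig_list <- c("fig1", "fig2")
lmer_list <- setNames(Map(f.mean, var_list, fig_list, MoreArgs = list(datafile = df)),
                      paste0(fig_list, ".", var_list))
lmer_list$fig1.A  # the fit for changeA ~ baselineA + (1|ID)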

R: How to pass two filepaths as parameters to a function?

I'm trying to pass two file paths as parameters to a function. But it's not accepting the inputs. Here's what I'm doing:
partition <- function(d1, p2){
  d1 <- read.table(file = d1, fill = TRUE)
  p2 <- read.table(file = p2, fill = TRUE)
}
and while calling the function:
partition("samcopy.txt","partcopy.txt")
The .txt files are not being read into the variables inside the function. How can I make the function read the tables?
AidanGawronSki's approach works, but from a programming standpoint should be avoided! Here is a more traditional answer to your problem.
partition <- function(d1, p2){
  a <- read.table(file = d1, fill = TRUE)
  b <- read.table(file = p2, fill = TRUE)
  res <- list(a, b)
  names(res) <- c(d1, p2)
  res
}
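Hypothetical usage (assuming both files exist in the working directory):

tables <- partition("samcopy.txt", "partcopy.txt")
str(tables)                    # a named list of two data frames
head(tables[["samcopy.txt"]])  # access each table by the path it came from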
To understand why the above approach is "better", it is important to understand what environments are and, more generally, the R scoping rules. Environments are essentially your workspace. For example, when you first open R and begin assigning objects, these objects are stored within the Global Environment. Another example: when you call a function, the function creates its own environment made up of any parameters you have passed to it. By doing this, R ensures that calling a function has no "side effects", or said another way, that it does not affect the global environment.
Let me show you an example. Imagine you begin an R session, and assign d1 <- 1 in your Global Environment. You're going to want to use d1 later on in your analysis and it would be a shame if it changed without you knowing it, right?
If you utilize AidanGawronSki's approach when you call
partition <- function(d1, p2){
  d1 <<- read.table(file = d1, fill = TRUE)
}
the d1 in your Global Environment will change to the result of read.table(file = d1, fill = TRUE). This is very dangerous! An object you previously assigned to be one thing is now another thing, and you are not even warned of this change.
The same problem, however, will never occur with the approach I have proposed. I strongly recommend you get in the habit of using this approach! If you don't, any function can change things in your Global Environment without you knowing.
For more info read this, this or just google something like "functions with no side effects"
FYI, there are also several other problems with your code. First, you need to tell your function what to return. All you did is call a function, assign stuff to the local environment, and then close the function. Functions return the value of their last line (as long as it is not an assignment); this is why, in my example, I put res as the last line of the function. Also, you are not correctly assigning your objects. You pass a string like d1 <- "text.txt" to your function and then ask the function to do "text.txt" <- read.table("text.txt", ...), which simply does not make sense. You need to assign the output of read.table to an object; in my example, I assign the results to a and b.
Use the super-assignment operator <<-:
partition <- function(d1, p2){
  d1 <<- read.table(file = d1, fill = TRUE)
  p2 <<- read.table(file = p2, fill = TRUE)
}

Accessing Arbitrary Columns from an R Data Frame using with()

Suppose that I have a data frame with a column whose name is stored in a variable. Accessing this column using the variable is easy using bracket notation:
df <- data.frame(A = rep(1, 10), B = rep(2, 10))
column.name <- 'B'
df[,column.name]
But it is not obvious how to access an arbitrary column using a call to with(). The naive approach
with(df, column.name)
effectively evaluates column.name in the caller's environment. How can I delay evaluation sufficiently that with() will provide the same results that brackets give?
You can use get:
with(df, get(column.name))
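For completeness, a quick check using the example data from the question, alongside the bracket alternatives:

df <- data.frame(A = rep(1, 10), B = rep(2, 10))
column.name <- 'B'
with(df, get(column.name))  # same values as df[, column.name]
df[[column.name]]           # double-bracket form, no with() needed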
You use 'with' to create a localized and temporary namespace inside which you evaluate some expression. In your code above, you haven't passed in an expression.
For instance:
data(iris) # this dataset ships with your R installation; just call data() to load it
Ordinarily you have to refer to variable names within a data frame like this:
tx = tapply(iris$Sepal.Length, list(iris$Species), mean)
Unless you do this:
attach(iris)
The problem with using 'attach' is the likelihood of namespace clashes, so you've got to remember to call 'detach'.
It's much cleaner to use 'with':
tx = with( iris, tapply(Sepal.Length, list(Species), mean) )
So, the call signature (informally) is: with(data, expression).
