Scoping in R, and dealing with 'with'

In R, there are lots of situations where with seems to be used to help you write shorter code; however, this masks existing symbols like local variables and function parameters. Is there any way to refer to them without renaming them so they don't clash with your data?
For instance, in this frame, I've got a state column;
df <- data.frame(
label=c("a", "b", "c"),
state=c("off","on","off"))
I can write a filtering function with a .state parameter, and the filter works;
instateWorks <- function(.state) {
subset(df, df$state == .state)
}
# correct - 1 observation for "b"
onWorks <- instateWorks("on")
but if I give my function a sensible parameter name, there's a problem and the symbol state seems to refer to the data frame's column;
instateFails <- function(state) {
subset(df, df$state == state)
}
# fails - all 3 observations
onFails <- instateFails("on")
Is there any way to qualify that state is supposed to mean the parameter, to make the script work as expected?
Edit - to clarify why 'with' and 'eval' are the issue I'm struggling with, consider this code;
df <- data.frame(
label=c("a", "b", "c"),
state=c("off","on","off"))
with(df, state == "on")
# FALSE TRUE FALSE
state <- "on"
with(df, state == state)
# TRUE TRUE TRUE
In the last with statement, I'm looking for a way to say: 'tell me which rows have a 'state' value in df equal to the value of the 'state' variable defined on the line above'.
Without this ability, I can't write a function with a parameter that has the same name as a column.

Thanks to #HaddE.Nuff, I came up with this;
instateFails <- function(state) {
args <- environment()
subset(df, state == args$state)
}
Capture the current environment before you make the call, which gives you a way to refer to all the locals in the calling function. Then refer to that environment variable inside the filter expression: a bare state still means the column, while args$state means the parameter.
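As a minimal sketch of two ways out (the names in_state_index and in_state_env are made up for illustration, not from the posts above): either avoid non-standard evaluation by indexing with df$state, or keep subset() and reach the argument through the captured environment.

```r
df <- data.frame(
  label = c("a", "b", "c"),
  state = c("off", "on", "off"))

# Option 1 (hypothetical name): plain indexing, so `state` can only mean the argument
in_state_index <- function(state) {
  df[df$state == state, , drop = FALSE]
}

# Option 2 (hypothetical name): keep subset(), qualify the argument via the environment
in_state_env <- function(state) {
  args <- environment()             # the function's own frame, which holds `state`
  subset(df, state == args$state)   # bare `state` is the column; args$state is the argument
}

on_index <- in_state_index("on")
on_env <- in_state_env("on")
```

Both calls should return only the row labelled "b". Option 1 is usually the easier one to reason about inside functions, since nothing is looked up in the data frame implicitly.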

Related

IF statements inside function do not recognize conditions

I want to adjust my function so that my if and else if statements recognize the name of the dataframe used and execute the correct plotting function. These are some mock data structured the same as mine:
df1<-data.frame(A=c(1,2,2,3,4,5,1,1,2,3),
B=c(4,4,2,3,4,2,1,5,2,2),
C=c(3,3,3,3,4,2,5,1,2,3),
D=c(1,2,5,5,5,4,5,5,2,3),
E=c(1,4,2,3,4,2,5,1,2,3),
dummy1=c("yes","yes","no","no","no","no","yes","no","yes","yes"),
dummy2=c("high","low","low","low","high","high","high","low","low","high"))
df1[colnames(df1)] <- lapply(df1[colnames(df1)], factor)
vals <- colnames(df1)[1:5]
dummies <- colnames(df1)[-(1:5)]
step1 <- lapply(dummies, function(x) df1[, c(vals, x)])
step2 <- lapply(step1, function(x) split(x, x[, 6]))
names(step2) <- dummies
tbls <- unlist(step2, recursive=FALSE)
tbls<-lapply(tbls, function(x) x[(names(x) %in% names(df1[c(1:5)]))])
A<-lapply(tbls,"[", c(1,2))
B<-lapply(tbls,"[", c(3,4))
C<-lapply(tbls,"[", c(3,4))
list<-list(A,B,C)
names(list)<-c("A","B","C")
And this is my function:
plot_1<-function (section, subsample) {
data<-list[grep(section, names(list))]
data<-data[[1]]
name=as.character(names(data))
if(section=="A" && subsample=="None"){plot_likert_general_section(df1[c(1:2)],"A")}
else if (section==name && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the",name,"topics rank?"));plot_likert(data$dummy1.no, title = paste("How do the",name,"topics rank?"))}
}
Basically, what I want it to do is plot a certain graph when I specify the section and subsample I'm interested in. If, for example, I want to plot section C and subsample dummy1, I just write:
plot_1(section="C", subsample="dummy1")
I want to avoid writing this:
else if (section=="A" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the A topics rank?"));plot_likert(data$dummy1.no, title = paste("How do the A topics rank?"))}
else if (section=="B" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the B topics rank?"));plot_likert(data$dummy1.no, title = paste("How do the B topics rank?"))}
else if (section=="C" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the C topics rank?"));plot_likert(data$dummy1.no, title = paste("How do the C topics rank?"))}
else if (section=="C" && subsample=="dummy2")...
.
.
}
So I tried to extract the dataframe used from the list so that it matches the string of the section typed in the function (data<-list[grep(section, names(list))]) and store its name as a character (name=as.character(names(data))), because I thought that in this way the function would have recognized the string "A", "B" or "C" by itself, without the need for me to specify each condition.
However, if I run it, I get this error: Warning message: In section == name && subsample == "dummy1" : 'length(x) = 4 > 1' in coercion to 'logical(1)', that, from what I understand, is due to the presence of a vector in the statement. But I have no idea how to correct for this (I'm still quite new to R).
How can I fix the function so that it does what I want? Thanks in advance!
Well, I can't really test your code without the plot_likert_general_section function or the plot_likert function, but I've done a bit of simplifying and best practices--passing list in as an argument, consistent spaces and assignment operators, etc.--and this is my best guess as to what you want:
plot_1 = function(list, section, subsample) { ## added `list` as an argument
data = list[[grep(section, names(list))]] # use [[ to extract a single item
if(subsample == "None"){
plot_likert_general_section(df1[c(1:2)], section)
} else {
yesno = paste(subsample, c("yes", "no"), sep = ".")
title = paste("How do the", section, "topics rank?") # `section` is one string; names(data) would give several
plot_likert(data[[yesno[1]]], title = title)
plot_likert(data[[yesno[2]]], title = title)
}
}
plot_1(list, section = "C", subsample = "dummy1")
I'm not sure if your plot_likert functions use base or grid graphics, but either way you'll need to handle the multiple plots. With base, probably use par(mfrow = c(1, 2)) to display both of them; with grid, I'd suggest putting them in a list to return them both, and then maybe using gridExtra::grid.arrange() (or similar) to plot both of them.
You're right that the error is due to passing a vector where a single value is expected. Try inserting print statements before the equality test to diagnose why this is.
Also, be careful with variable names such as list, which mask base R functions (name is likewise a documented base R topic; see ?name). I'd also recommend following the tidyverse style guide here: https://style.tidyverse.org/.
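To illustrate the vector-in-a-condition problem and the name-building fix without the plotting functions, here is a small sketch. The toy list tables and the helper pick_tables are hypothetical stand-ins, not part of the original code.

```r
# Toy stand-in for the nested list in the question (values are made up)
tables <- list(
  A = list(dummy1.yes = 1:3, dummy1.no = 4:6),
  B = list(dummy1.yes = 7:9, dummy1.no = 10:12)
)

# Comparing a string with names(...) gives one result per name, not a single TRUE/FALSE,
# which is why it cannot be used directly on either side of &&
n_results <- length("A" == names(tables$A))

# pick_tables() is a hypothetical helper: build the element names and look them up
pick_tables <- function(tables, section, subsample) {
  stopifnot(length(section) == 1, section %in% names(tables))
  data <- tables[[section]]
  wanted <- paste(subsample, c("yes", "no"), sep = ".")
  stopifnot(all(wanted %in% names(data)))
  data[wanted]
}

picked <- pick_tables(tables, "B", "dummy1")
```

Here picked is a list holding the "yes" and "no" pieces for section B, which could then be passed to whatever plotting function is in use.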

Is it possible to add external arguments to form partial field names?

I have two fields:
FirstVisit
SecondVisit
I am building a function to pull data from either field depending on user input (heavily reduced yet relevant version of function):
pullData <- function(visit) {
# Do something
}
What I am looking to do is for the function to take the user's input and use it to form part of the call to the data frame field.
For example, when the user runs:
pullData(First)
The function will run like this:
print(df$FirstVisit)
Conversely, when the user runs:
pullData(Second)
The function will run:
print(df$SecondVisit)
My function is considerably more complex than this, but this basic example relates to just the specific aspect of it that I am trying to work out.
So far I have tried something like:
print(paste0(df["df$", visit, "Visit", ])
# The intention is to result in df$FirstVisit or df$SecondVisit depending on the input
And this:
print(paste0(df[df$", visit, "Visit, ])
# Again, intended result should be df$FirstVisit or df$SecondVisit, depending on the input
among other alternatives (some with paste()), yet nothing has worked so far.
I suspect that it is possible and feel that I am close.
How can I achieve this?
If you really want to run the function like pullData(First), you need to use metaprogramming (to get the name of the argument instead of the argument's value), like
pullData <- function(...) {
arg <- rlang::ensyms(...)
if(length(arg)!=1) stop("invalid argument in pullData")
dataName <- paste0(as.character(arg[[1]]),"Visit")
print(df[[dataName]])
}
If you can manage to call the function with a character-argument like pullData("First"), you can simply do:
pullData <- function(choice = "First") {
dataName <- paste0(choice,"Visit")
print(df[[dataName]])
}
I am not quite sure if this is what you're going for, but here's a possible solution:
pullData <- function(visit){
visit <- rlang::quo_text(rlang::enquo(visit)) # enquo() also needs the rlang:: prefix unless rlang is attached
visit <- tolower(visit)
if (visit %in% c("first", "firstvisit")){
data <- df$FirstVisit
}
if (visit %in% c("second", "secondvisit")){
data <- df$SecondVisit
}
data
}
Using this sample data:
df <- data.frame(FirstVisit = c("first value"),
SecondVisit = c("second value"))
Gets us:
> pullData(first)
[1] "first value"
> pullData(second)
[1] "second value"
For the sake of completeness, R allows for partial matching when subsetting with character indices; see help("$").
df <- data.frame(FirstVisit = 11:12, SecondVisit = 21:22)
For interactive use:
df$F
[1] 11 12
df$S
[1] 21 22
For programming on computed indices, the [[ operator has to be used, e.g.,
df[["F", exact = FALSE]]
[1] 11 12
This can be wrapped in a function call:
pullData <- function(x) df[[x, exact = FALSE]]
Thus,
pullData("F")
pullData("Fi")
pullData("First")
pullData("FirstVisit")
return all
[1] 11 12
while
pullData("S")
pullData("Second")
return
[1] 21 22
But watch out when dealing with user-supplied input, as typos and differences in case lead to unexpected results:
pullData("f")
pullData("first")
pullData("Frist")
all return
NULL
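If silent NULL results are a concern, one possible guard is match.arg() from base R, which still allows partial matching but raises an error for input that matches none of the allowed choices. The sketch below assumes only the two visits from the question; pull_data_checked is a hypothetical name.

```r
df <- data.frame(FirstVisit = 11:12, SecondVisit = 21:22)

# Hypothetical variant: the allowed values are listed in the default argument
pull_data_checked <- function(df, visit = c("First", "Second")) {
  visit <- match.arg(visit)        # partial matches are fine; anything else is an error
  df[[paste0(visit, "Visit")]]
}

first_full <- pull_data_checked(df, "First")
second_partial <- pull_data_checked(df, "Sec")
typo_result <- try(pull_data_checked(df, "Frist"), silent = TRUE)
```

With this version a typo such as "Frist" stops with an error instead of quietly returning NULL. Matching is still case-sensitive, so "first" is rejected as well.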

Check expression argument of function

When writing functions it is important to check for the type of arguments. For example, take the following (not necessarily useful) function which is performing subsetting:
data_subset = function(data, date_col) {
if (!TRUE %in% (is.character(date_col) | is.expression(date_col))){
stop("Input variable date is of wrong format")
}
if (is.character(date_col)) {
x <- match(date_col, names(data))
} else x <- match(deparse(substitute(date_col)), names(data))
sub <- data[,x]
}
I would like to allow the user to provide the column which should be extracted as character or expression (e.g. a column called "date" vs. just date). At the beginning I would like to check that the input for date_col is really either a character value or an expression. However, 'is.expression' does not work:
Error in match(x, table, nomatch = 0L) : object '...' not found
Since deparse(substitute()) works if one provides expressions, I thought 'is.expression' would have to work as well.
What is wrong here, can anyone give me a hint?
I think you are not looking for is.expression but for is.name.
The tricky part is to get the type of date_col and to check if it is of type character only if it is not of type name. If you called is.character when it's a name, then it would get evaluated, typically resulting in an error because the object is not defined.
To do this, short circuit evaluation can be used: In
if(!(is.name(substitute(date_col)) || is.character(date_col)))
is.character is only called if is.name returns FALSE.
Your function boils down to:
data_subset = function(data, date_col) {
if(!(is.name(substitute(date_col)) || is.character(date_col))) {
stop("Input variable date is of wrong format")
}
date_col2 <- as.character(substitute(date_col))
return(data[, date_col2])
}
Of course, you could use if(is.name(…)) to convert only to character when date_col is a name.
This works:
testDF <- data.frame(col1 = rnorm(10), col2 = rnorm(10, mean = 10), col3 = rnorm(10, mean = 50), rnorm(10, mean = 100))
data_subset(testDF, "col1") # ok
data_subset(testDF, col1) # ok
data_subset(testDF, 1) # Error in data_subset(testDF, 1) : Input variable date is of wrong format
However, I don't think you should do this. Consider the following example:
var <- "col1"
data_subset(testDF, var) # Error in `[.data.frame`(data, , date_col2) : undefined columns selected
col1 <- "col2"
data_subset(testDF, col1) # Gives content of column 1, not column 2.
Though this "works as designed", it is confusing because unless carefully reading your function's documentation one would expect to get col1 in the first case and col2 in the second case.
Abusing a famous quote:
Some people, when confronted with a problem, think “I know, I'll use non-standard evaluation.” Now they have two problems.
Hadley Wickham in Non-standard evaluation:
Non-standard evaluation allows you to write functions that are extremely powerful. However, they are harder to understand and to program with. As well as always providing an escape hatch, carefully consider both the costs and benefits of NSE before using it in a new domain.
Unless you expect large benefits from allowing users to skip the quotes around the name of the column, don't do it.
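One way to read the "escape hatch" advice is to put the real work in a function that takes the column name as a string, and to add an unquoted-name wrapper only as a convenience. This is a sketch under that assumption; data_subset_chr and data_subset_nse are hypothetical names.

```r
testDF <- data.frame(col1 = 1:3, col2 = 4:6)

# Standard-evaluation core: the column name is an ordinary character value
data_subset_chr <- function(data, col_name) {
  stopifnot(is.character(col_name), length(col_name) == 1, col_name %in% names(data))
  data[[col_name]]
}

# Optional convenience wrapper: captures the unquoted name and delegates to the core
data_subset_nse <- function(data, col) {
  data_subset_chr(data, deparse(substitute(col)))
}

var <- "col2"
from_variable <- data_subset_chr(testDF, var)   # the variable's value is used, as expected
from_name <- data_subset_nse(testDF, col1)      # interactive shortcut
```

Code that computes the column name can call the string version directly, so the surprising cases shown above never arise there.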

How to add an attribute to any level of objects (list, list$frame, list$frame$column)?

My problem is as follows: I'm trying to write a function that sets a collection of attributes on an object in a given environment. I'm trying to mimic a metadata layer, like SAS does, so you can set various attributes on a variable, like label, decimal places, date format, and many others.
Example:
SetAttributes(object = "list$dataframe$column", label="A label", width=20, decDigits=2,
dateTimeFormat="....", env=environment())
But I have to set attributes on different levels of objects, say:
comment(list$dataframe$column) <- "comment on a column of a dataframe in a list"
comment(dataframe$column) <- "comment on a column of a dataframe"
comment(list) <- "comment on a list/dataframe/vector"
Alternatively it can be done like this:
comment(env[["list"]][["dataframe"]][["column"]]) <- "text"
# (my function recognizes both formats, as a variable and as a string with chain of
# [[]] components).
So I have implemented it this way:
SetAttributes <- function(varDescription, label="", .........., env=.GlobalEnv) {
parts <- strsplit( varDescription, "$", fixed=TRUE)[[1]]
if(length(parts) == 3) {
lst <- parts[1]
df <- parts[2]
col <- parts[3]
if(!is.na(label)) comment(env[[lst]][[df]][[col]]) <- label
if(!is.na(textWidth)) attr(env[[lst]][[df]][[col]], "width") <- textWidth
....
} else if(length(parts) == 2) {
df <- parts[1]
col <- parts[2]
if(!is.na(label)) comment(env[[df]][[col]]) <- label
if(!is.na(textWidth)) attr(env[[df]][[col]], "width") <- textWidth
....
} else if(length(parts) == 1) {
....
You see the problem now: I have three blocks of similar code for length(parts) == 3, 2 and 1
When I tried to automatize it this way:
path <- c()
sapply(parts, FUN=function(comp){ path <<- paste0(path, "[[", comp, "]]") })
comment(eval(parse(text=paste0(".GlobalEnv", path)))) <- "a comment"
I've got an error:
Error in comment(eval(parse(text = paste0(".GlobalEnv", path)))) <- "a comment" :
target of assignment expands to non-language object
Is there any way to get an object on any level and set attributes for it not having a lot of repeated code?
PS: yes, I heard thousand times that changing external variables from inside a function is an evil, so please don't mention it. I know what I want to achieve.
Just to make sure you hear it 1001 times, it's a very bad idea for a function to have side effects like this. This is a very un R-like way to program something like this. If you're going to write R code, it's better to do things the R way. This means returning modified objects that can optionally be reassigned. This would make life much easier.
Here's a simplified version which only focuses on the comment.
SetComment <- function(varDescription, label=NULL, env=.GlobalEnv) {
obj <- parse(text= varDescription)[[1]]
eval(substitute(comment(X)<-Y, list(X=obj, Y=label)), env)
}
a<-list(b=4)
comment(a$b)
# NULL
SetComment("a$b", "check")
comment(a$b)
# [1] "check"
Here, rather than parsing and splitting the string, we build an expression that we evaluate in the proper context. We use substitute() to pop in the values you want to the actual call.
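To see what substitute() actually builds, it can help to inspect the call before evaluating it. The sketch below uses a throwaway environment, demo_env (a name made up for this example), instead of .GlobalEnv.

```r
obj <- parse(text = "a$b")[[1]]     # the unevaluated call a$b, not its value
cmd <- substitute(comment(X) <- Y, list(X = obj, Y = "check"))
cmd_text <- deparse(cmd)            # text form of the assignment that was built

demo_env <- new.env()               # stand-in for the target environment
demo_env$a <- list(b = 4)
eval(cmd, demo_env)                 # runs the assignment inside demo_env
result <- comment(demo_env$a$b)
```

Because the target of the assignment is a real language object (a$b) rather than the result of eval(parse(...)), R accepts it on the left-hand side of <-, which is what the "target of assignment expands to non-language object" error was complaining about.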

How to pass function arguments to within?

I wonder how I can pass arguments of some function to a subpart of the function that uses with / within. e.g.:
myfunction <- function(dataframe,col1,col2){
res <- within(dataframe, somenewcol <-paste(col1,"-","col2",sep=""))
return(res)
}
where col1 and col2 are columns contained in the dataframe. What's the correct way to pass the arguments col1 and col2 to the within expression? When I just try to use it, I get:
Error in paste(col1, "-", , :
object 'Some_passed_col' not found
Here's an example:
dataset <- data.frame(rnorm(20),2001:2020,rep(1:10,2))
names(dataset) <- c("mydata","col1","col2")
myfunction <- function(dataframe,arg1,arg2){
res <- with(dataframe, onesinglecol <- paste(arg1,"-","arg2",sep=""))
return(res)
}
# call function
myfunction(dataset,col1,col2)
EDIT:
the following works for me now, but I cannot completely understand why... so any further explanation is appreciated:
myfunction(dataset,arg1="col1",arg2="col2")
if I adjust
res <- with(dataframe, onesinglecol <- paste(get(arg1),"-",get(arg2),sep=""))
Try
myfunction <- function(dataframe,arg1,arg2){
dataframe["onesinglecol"] <- dataframe[[arg1]] -dataframe[[arg2]]
return(dataframe)
}
And call it with character-valued column names rather than object names that are nowhere defined:
myfunction(dataset,"col1","col2")
mydata col1 col2 onesinglecol
1 0.6834402 2001 1 2000
2 1.6623748 2002 2 2000
3 -0.5769926 2003 3 2000 .... etc
I think this is done via the ... directive:
E.g.:
myfunction <- function(dataframe, ...){
var <- anotherfunction( arg1= 1, arg2 = 2 , ...)
return(var)
}
... is a placeholder for additional arguments passed through to "anotherfunction".
You are missing the fact that col1 and col2 exist neither in dataframe (from your example) nor in the user workspace.
Basically, with() and within() work like this:
> foo <- 10
> bar <- data.frame(FOO = 10)
> FOO + foo
Error: object 'FOO' not found
> with(bar, FOO + foo)
[1] 20
In the first case, FOO was not found as it is inside bar. In the second case, we set up an environment within which our expression is evaluated. Inside that environment FOO does exist. foo is also found in the workspace.
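The lookup order matters when a column and a workspace variable share a name. As a small sketch (bar2 is a made-up second data frame), the column wins:

```r
foo <- 10
bar <- data.frame(FOO = 10)
only_workspace <- with(bar, FOO + foo)   # foo is not a column, so the workspace foo is used

bar2 <- data.frame(FOO = 10, foo = 1)
masked <- with(bar2, FOO + foo)          # the column foo now masks the workspace foo
```

The first result uses the workspace value of foo, while the second silently uses the column instead.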
In your first example (please don't edit error messages etc.; show us exactly what code you ran and what error was produced), either col1 or col2 didn't exist in the environment within which your expression was evaluated.
Further, you appear to want to store in col1 and col2 the name of a column (component) of your dataframe. DWin has shown you one way to use this information. An alternative maintaining the use of within() is to use get() like this:
res <- within(dataframe, somenewcol <- paste(get(col1), "-", get(col2), sep=""))
Why this works, as per your extra edit and quandary, is that get() returns the object named by its first argument. get("foo") will return the object named foo (continuing from my example above):
get("foo") ## finds foo and returns it
[1] 10
In your example, you have a data frame with names inter alia "col1" and "col2". You changed your code to get(arg1) (where arg1 <- "col1"), where you are asking get() to return the object with name "col1" from the evaluation environment visible at the time your function is being evaluated. As your dataframe contained a col1 component, which was visible because within() had made it available, get() was able to find an object with the required name and include it in the expression.
But at this point you are trying to jump through too many hoops because your questions haven't been specific. I presume you are asking this because of my answer here to your previous Q. That answer suggested a better alternative than attach(). But you weren't clear there what some arguments were or what you really wanted to do. If I had known then what you really wanted to do, I would have suggested you use DWin's Answer above.
You seem to not want to hard code the column/component names. If you could hard code, this would be the solution:
res <- within(dataframe, somenewcol <- paste(col1, "-", col2, sep = ""))
But seeing as you don't want to hard code you need a get() version or DWin's solution.
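Putting the two suggestions side by side, as a sketch with a small made-up dataset (paste_cols_get and paste_cols_index are hypothetical names): the get() version keeps within(), while the [[ version avoids non-standard evaluation altogether.

```r
dataset <- data.frame(mydata = c(0.1, 0.2, 0.3), col1 = 2001:2003, col2 = 1:3)

# get() version: arg1/arg2 hold column names, get() fetches those columns inside within()
paste_cols_get <- function(dataframe, arg1, arg2) {
  within(dataframe, somenewcol <- paste(get(arg1), "-", get(arg2), sep = ""))
}

# [[ version: plain indexing by name, no special evaluation rules involved
paste_cols_index <- function(dataframe, arg1, arg2) {
  dataframe$somenewcol <- paste(dataframe[[arg1]], "-", dataframe[[arg2]], sep = "")
  dataframe
}

res_get <- paste_cols_get(dataset, "col1", "col2")
res_index <- paste_cols_index(dataset, "col1", "col2")
```

Both add the same somenewcol column. One caveat with the get() version: if the data frame happens to contain a column called arg1 or arg2, that column masks the function argument, which is the same clash discussed at the top of this page.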

Resources