I am building a gui which lets me select a subset of a data.frame by clicking on the various factor names.
After having received user input, how do I pass it to the subset function?
e.g.: I have a dataframe df with factors MORNING and EVENING in column timeofday and RECEIVE and SEND in column optype. From the GUI I know that the user wants a subset containing only RECEIVE operations, so I have the following strings as well:
RequestedFactor1 which equals "optype"
RequestedRelationship1 equals "=="
RequestedValue1 which equals "RECEIVE"
What can I do to those strings to pass them to subset, so that I will receive the same output as if I had called subset(df,optype=="RECEIVE")?
TIA
For this you can use an eval-parse construct, but again I warn that this is actually tricky business. Please read the help files for eval and parse very carefully. In your case, writing RF1, RR1 and RV1 for your three strings, this becomes:
subset(df,eval(parse(text=paste(RF1,RR1,RV1))))
An example to illustrate some tricky parts :
> RF1 <- "optype"
> RR1 <- "=="
> RV1 <- "\"RECEIVE\""
> optype <- c("RECEIVE","Not")
> ifelse(eval(parse(text=paste(RF1,RR1,RV1))),1,0)
[1] 1 0
Mind the escaped quote-marks (\"). This is necessary as you want to test against a string, and not the RECEIVE object. Alternatively you can do :
> RF1 <- "optype"
> RR1 <- "=="
> RV1 <- "Text"
> optype <- c("RECEIVE","Not")
> Text <- "RECEIVE"
> ifelse(eval(parse(text=paste(RF1,RR1,RV1))),1,0)
[1] 1 0
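If the value arrives from the GUI without quotes, one way to avoid hand-escaping is to let deparse() add them. The sketch below assumes a small made-up data frame shaped like the one in the question and evaluates the parsed condition directly against it; the names cond_text, keep and result are only illustrative.

```r
# Assumed sample data, shaped like the question's data frame
df <- data.frame(
  timeofday = c("MORNING", "EVENING", "MORNING"),
  optype    = c("RECEIVE", "SEND", "RECEIVE")
)

RF1 <- "optype"
RR1 <- "=="
RV1 <- "RECEIVE"   # plain string, no escaped quotes needed

# deparse() turns the string into its quoted source form
cond_text <- paste(RF1, RR1, deparse(RV1))

# Evaluate the parsed condition with the data frame's columns in scope
keep <- eval(parse(text = cond_text), envir = df)
result <- df[keep, , drop = FALSE]
```

Because eval(parse()) runs whatever text it is given, this is only reasonable when the strings come from a fixed set of GUI choices rather than free-form input.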
The comparison operators in R are actually special functions, so you can use do.call to run the functions, no need for eval and parse and the potential headaches that can come from there. e.g.:
rf1 <- 'Species'
rr1 <- '=='
rv1 <- 'setosa'
subset(iris, do.call(rr1, list( get(rf1), rv1 ) ) )
You need to "get" the variable so that you have the variable value rather than the name, the rest can be the character strings.
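The same idea can be applied to the data frame from the question without going through subset(). This is a minimal sketch: subset_by is a hypothetical helper, the sample data is made up, and the list of allowed operators is an assumption about what the GUI offers.

```r
# Hypothetical helper: filter rows using three strings from the GUI
subset_by <- function(data, column, relationship, value) {
  allowed <- c("==", "!=", "<", "<=", ">", ">=")
  stopifnot(column %in% names(data), relationship %in% allowed)
  # The operator is looked up by name and called like any other function
  keep <- do.call(relationship, list(data[[column]], value))
  data[which(keep), , drop = FALSE]
}

# Assumed sample data, shaped like the question's data frame
df <- data.frame(
  timeofday = c("MORNING", "EVENING", "MORNING"),
  optype    = c("RECEIVE", "SEND", "RECEIVE")
)

result <- subset_by(df, "optype", "==", "RECEIVE")
```

Using data[[column]] instead of get() keeps the lookup confined to the data frame, and checking the operator against a fixed list stops arbitrary function names from being called.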
I have two fields:
FirstVisit
SecondVisit
I am building a function to pull data from either field depending on user input (heavily reduced yet relevant version of function):
pullData <- function(visit){
# Do something
}
What I am looking to do is for the function to take the user's input and use it to form part of the call to the data frame field.
For example, when the user runs:
pullData(First)
The function will run like this:
print(df$FirstVisit)
Conversely, when the user runs:
pullData(Second)
The function will run:
print(df$SecondVisit)
My function is considerably more complex than this, but this basic example relates to just the specific aspect of it that I am trying to work out.
So far I have tried something like:
print(paste0(df["df$", visit, "Visit", ])
# The intention is to result in df$FirstVisit or df$SecondVisit depending on the input
And this:
print(paste0(df[df$", visit, "Visit, ])
# Again, intended result should be df$FirstVisit or df$SecondVisit, depending on the input
among other alternatives (some with paste()), yet nothing has worked so far.
I suspect that it is possible and feel that I am close.
How can I achieve this?
If you really want to run the function like pullData(First), you need to use metaprogramming (to get the name of the argument instead of the argument's value), like
pullData <- function(...) {
arg <- rlang::ensyms(...)
if(length(arg)!=1) stop("invalid argument in pullData")
dataName <- paste0(as.character(arg[[1]]),"Visit")
print(df[[dataName]])
}
If you can manage to call the function with a character-argument like pullData("First"), you can simply do:
pullData <- function(choice = "First") {
dataName <- paste0(choice,"Visit")
print(df[[dataName]])
}
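If the set of visits is fixed, the character-argument version can also validate its input. The following is a sketch under that assumption, using match.arg() and made-up sample data; the data argument is added here only for illustration.

```r
# Assumed sample data
df <- data.frame(FirstVisit = 11:12, SecondVisit = 21:22)

pullData <- function(choice = c("First", "Second"), data = df) {
  # match.arg() rejects anything that is not one of the listed choices
  choice <- match.arg(choice)
  data[[paste0(choice, "Visit")]]
}

first_values  <- pullData("First")
second_values <- pullData("Second")
```

A call such as pullData("Third") then stops with an error instead of silently returning NULL.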
I am not quite sure if this is what you're going for, but here's a possible solution:
pullData <- function(visit){
visit <- rlang::quo_text(rlang::enquo(visit))
visit <- tolower(visit)
if (visit %in% c("first", "firstvisit")){
data <- df$FirstVisit
}
if (visit %in% c("second", "secondvisit")){
data <- df$SecondVisit
}
data
}
Using this sample data:
df <- data.frame(FirstVisit = c("first value"),
SecondVisit = c("second value"))
Gets us:
> pullData(first)
[1] "first value"
> pullData(second)
[1] "second value"
For the sake of completeness, R allows for partial matching when subsetting with character indices; see help("$").
df <- data.frame(FirstVisit = 11:12, SecondVisit = 21:22)
For interactive use:
df$F
[1] 11 12
df$S
[1] 21 22
For programming on computed indices, the [[ operator has to be used, e.g.,
df[["F", exact = FALSE]]
[1] 11 12
This can be wrapped in a function call:
pullData <- function(x) df[[x, exact = FALSE]]
Thus,
pullData("F")
pullData("Fi")
pullData("First")
pullData("FirstVisit")
return all
[1] 11 12
while
pullData("S")
pullData("Second")
return
[1] 21 22
But watch out when dealing with user-supplied input, as typos might lead to unexpected results. Each of the following calls returns NULL:
pullData("f")
pullData("first")
pullData("Frist")
NULL
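To turn those silent NULL results into errors, the partial match can be done explicitly with pmatch(). This is a sketch only; pullDataStrict is a hypothetical name and the data argument is added for illustration.

```r
df <- data.frame(FirstVisit = 11:12, SecondVisit = 21:22)

pullDataStrict <- function(x, data = df) {
  # pmatch() returns NA when there is no match or the match is ambiguous
  idx <- pmatch(x, names(data))
  if (is.na(idx)) stop("no unique column matches '", x, "'")
  data[[idx]]
}

ok  <- pullDataStrict("Fi")
bad <- tryCatch(pullDataStrict("first"),
                error = function(e) conditionMessage(e))
```

Here ok holds the FirstVisit column, while bad holds the error message, because matching is case-sensitive.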
As applied to the same R code or objects, quote and substitute typically return different objects. How can one make this difference apparent?
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
> tmc <- function(X){
out <- list(typ = typeof(X), mod = mode(X), cls = class(X))
out
}
> df1 <- data.frame(a = 1, b = 2)
Here the printed output of quote and substitute are the same.
> quote(df1)
df1
> substitute(df1)
df1
And the structure of the two are the same.
> str(quote(df1))
symbol df1
> str(substitute(df1))
symbol df1
And the type, mode and class are all the same.
> tmc(quote(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
> tmc(substitute(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
And yet, the outputs are not the same.
> is.identical(df1)
[1] FALSE
Note that this question shows some inputs that cause the two functions to display different outputs. However, the outputs are different even when they appear the same, and are the same by most of the usual tests, as shown by the output of is.identical() above. What is this invisible difference, and how can I make it appear?
note on the tags: I am guessing that the Common LISP quote and the R quote are similar
The reason is that the behavior of substitute() is different based on where you call it, or more precisely, what you are calling it on.
Understanding what will happen requires a very careful parsing of the (subtle) documentation for substitute(), specifically:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
So there are essentially three options.
In this case:
> df1 <- data.frame(a = 1, b = 2)
> identical(quote(df1),substitute(df1))
[1] TRUE
df1 is an "ordinary variable", but it is called in .GlobalEnv, since the env argument defaults to the current evaluation environment. Hence we're in the very last case, where the symbol, df1, is left unchanged, and so it is identical to the result of quote(df1).
In the context of the function:
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
The important distinction is that now we're calling these functions on X, not df1. For most R users, this is a silly, trivial distinction, but when playing with subtle tools like substitute it becomes important. X is a formal argument of a function, so that implies we're in a different case of the documented behavior.
Specifically, it says that now "the expression slot of the promise replaces the symbol". We can see what this means if we debug() the function and examine the objects in the context of the function environment:
> debugonce(is.identical)
> is.identical(X = df1)
debugging in: is.identical(X = df1)
debug at #1: {
out <- identical(quote(X), substitute(X))
out
}
Browse[2]>
debug at #2: out <- identical(quote(X), substitute(X))
Browse[2]> str(quote(X))
symbol X
Browse[2]> str(substitute(X))
symbol df1
Browse[2]> Q
Now we can see that what happened is precisely what the documentation said would happen (Ha! So obvious! ;) )
X is a formal argument, or a promise, which according to R is not the same thing as df1. For most people writing functions, they are effectively the same, but the internal implementation disagrees. X is a promise object, and substitute replaces the symbol X with the one that it "points to", namely df1. This is what the docs mean by the "expression slot of the promise"; it is what R sees in the X = df1 part of the function call.
To round things out, try to guess what will happen in this case:
is.identical <- function(X){
out <- identical(quote(A), substitute(A))
out
}
is.identical(X = df1)
(Hint: now A is not a "bound symbol in the environment".)
A final example illustrating more directly the final case in the docs with the confusing exception:
#Ordinary variable, but in .GlobalEnv
> a <- 2
> substitute(a)
a
#Ordinary variable, but NOT in .GlobalEnv
> e <- new.env()
> e$a <- 2
> substitute(a,env = e)
[1] 2
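One way to make the difference visible without the debugger is to return both objects from inside a function and deparse them. This is a small sketch; show_both and local_case are hypothetical names.

```r
# Return both results from inside the function so they can be inspected
show_both <- function(X) {
  list(quoted = quote(X), substituted = substitute(X))
}

df1 <- data.frame(a = 1, b = 2)
res <- show_both(df1)

quoted_name      <- deparse(res$quoted)       # the formal argument's own name
substituted_name <- deparse(res$substituted)  # the expression the caller supplied

# An ordinary local variable (not a promise, not in .GlobalEnv)
# is replaced by its value
local_case <- function() {
  a <- 2
  substitute(a)
}
local_value <- local_case()
```

Here quoted_name is "X" and substituted_name is "df1", which is the invisible difference that identical() picks up; local_value is the number 2 rather than a symbol.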
I am trying to understand names, lists and lists of lists in R. It would be convenient to have a way to dynamically label them like this:
> ll <- list("1" = 2)
> ll
$`1`
[1] 2
But this is not working:
> ll <- list(as.character(1) = 2)
Error: unexpected '=' in "ll <- list(as.character(1) ="
Neither is this:
> ll <- list(paste(1) = 2)
Error: unexpected '=' in "ll <- list(paste(1) ="
Why is that? Both paste() and as.character() are returning "1".
The reason is that paste(1) is a function call that evaluates to a string, not a string itself.
The R Language Definition says this:
Each argument can be tagged (tag=expr), or just be a simple expression.
It can also be empty or it can be one of the special tokens ‘...’, ‘..2’, etc.
A tag can be an identifier or a text string.
Thus, tags can't be expressions.
However, if you want to set names (which are just an attribute), you can do so with structure, eg
> structure(1:5, names=LETTERS[1:5])
A B C D E
1 2 3 4 5
Here, LETTERS[1:5] is most definitely an expression.
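Applied to the list from the question, the same trick looks like the sketch below. It uses setNames() from the stats package, which is attached by default, as well as structure(); the variable names are only illustrative.

```r
# The name is computed at run time rather than typed as a tag
nm <- as.character(1)

ll  <- setNames(list(2), nm)
ll2 <- structure(list(2), names = paste(1))

# Both are the same as the literal version from the question
same_as_literal <- identical(ll, list("1" = 2)) && identical(ll2, ll)
```

In both cases the name is passed as an ordinary argument value, so any expression that evaluates to a character vector is allowed.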
If your goal is simply to use integers as names (as in the question title), you can type them in with backticks or single- or double-quotes (as the OP already knows). They are converted to characters, since all names are characters in R.
I can't offer a deep technical explanation for why your later code fails beyond "the left-hand side of = is not evaluated in that context (of enumerating items in a list)". Here's one workaround:
mylist <- list()
mylist[[paste("a")]] <- 2
mylist[[paste("b")]] <- 3
mylist[[paste("c")]] <- matrix(1:4,ncol=2)
mylist[[paste("d")]] <- mean
And here's another:
library(data.table)
tmp <- rbindlist(list(
list(paste("a"), list(2)),
list(paste("b"), list(3)),
list(paste("c"), list(matrix(1:4,ncol=2))),
list(paste("d"), list(mean))
))
res <- setNames(tmp$V2,tmp$V1)
identical(mylist,res) # TRUE
The drawbacks of each approach are pretty serious, I think. On the other hand, I've never found myself in need of richer naming syntax.
I have a bunch of data frames named Ldat.1, Ldat.2, etc., in my normal R environment that I can access interactively.
From the console, I can type:
> dim(Ldat.1)[1]
[1] 40
> dim(Ldat.2)[1]
[1] 39
So I can tell that the first has 40 rows and the second has 39 rows.
However, with dozens of data frames, I want to write a script to tell me how many rows are in each frame.
I tried the following:
print(dim(Ldat.1)[1])
print(dim(Ldat.2)[1])
for (i in 1:2){
namex<-paste("Ldat.",i,sep="")
size<-dim(.GlobalEnv$namex)
print(size[1])
}
and the console showed:
> print(dim(Ldat.1)[1])
[1] 40
> print(dim(Ldat.2)[1])
[1] 39
> for (i in 1:2){
+ namex<-paste("Ldat.",i,sep="")
+ size<-dim(.GlobalEnv$namex)
+ print(size[1])
+ }
NULL
NULL
It's easy enough to construct the strings:
for (i in 1:2){
namex<-paste("Ldat.",i,sep="")
size<-dim(namex)
print(namex)
}
produces:
> for (i in 1:2){
+ namex<-paste("Ldat.",i,sep="")
+ size<-dim(namex)
+ print(namex)
+ }
[1] "Ldat.1"
[1] "Ldat.2"
But despite trying various combinations of "as.data.frame" and "envir=" I can't seem to get R to interpret the string "Ldat.1" as meaning the name of an object accessible from the console.
Thanks in advance.
1) This will list the names of each data frame and the number of rows it has:
sapply(Filter(is.data.frame, mget(ls())), nrow)
If we already had nms, a vector of names of data frames, then we could reduce this to:
nms <- c("Ldat.1", "Ldat.2")
sapply(mget(nms), nrow)
2) Here is another way:
simplify2array(Filter(Negate(is.null), eapply(.GlobalEnv, nrow)))
3) Also try the ll function in the R.oo package.
Try this instead:
for (i in 1:2){
namex<-paste("Ldat.",i,sep="")
size<-dim(.GlobalEnv[[namex]])
print(size[1])
}
The problem has nothing to do with environments and everything to do with the fact that $ does not evaluate its second argument (its first argument being the token name that precedes it, .GlobalEnv in this case). There is no object in .GlobalEnv named "namex". On the other hand, "[[" does an evaluation step, so the value of namex (which is "Ldat.1" during the first pass of the for-loop) will get substituted and the lookup will succeed.
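For comparison, get() also accepts a computed name as a string. The sketch below uses two made-up data frames with the row counts from the question; nms and sizes are illustrative names.

```r
# Assumed stand-ins for the question's data frames
Ldat.1 <- data.frame(x = 1:40)
Ldat.2 <- data.frame(x = 1:39)

nms <- paste0("Ldat.", 1:2)

# get() looks the object up by the string held in n
sizes <- vapply(nms, function(n) nrow(get(n)), integer(1))
```

sizes is a named integer vector with one row count per data frame, so no explicit loop or print() call is needed.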
I want to use information from a field and include it in a R function, e.g.:
data #name of the data.frame with only one row
"(if(nclusters>0){OptmizationInputs[3,3]*beta[1]}else{0})" # this is the row
If I want to use this information inside a function how could I do it?
Another example:
A=c('x^2')
B=function (x) A
B(2)
"x^2" # this is the return. I would like to have the return something like 2^2=4.
Use body<- and parse
A <- 'x^2'
B <- function(x) {}
body(B) <- parse(text = A)
B(3)
## [1] 9
There are more ideas here
Another option using plyr:
A <- 'x^2'
library(plyr)
B <- function(x) {}
body(B) <- as.quoted(A)[[1]]
> B(5)
[1] 25
A <- "x^2"; x <- 2
BB <- function(z){ print( as.expression(do.call("substitute",
list( parse(text=A)[[1]], list(x=eval(x) ) )))[[1]] );
cat( "is equal to ", eval(parse(text=A)))
}
BB(2)
#2^2
#is equal to 4
Managing expressions in R is very weird. substitute refuses to evaluate its first argument so you need to use do.call to allow the evaluation to occur before the substitution. Furthermore the printed representation of the expressions hides their underlying representation. Try removing the fairly cryptic (to my way of thinking) [[1]] after the as.expression(.) result.
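A shorter variant of the same idea, as a sketch: parse the string once, then either evaluate it with x supplied in a list, or substitute x to get the displayed form. The names expr and shown are only illustrative.

```r
A <- "x^2"

# parse() returns an expression vector; [[1]] extracts the call x^2
expr <- parse(text = A)[[1]]

# Evaluate the call with x bound to the function's argument
B <- function(x) eval(expr, list(x = x))
value <- B(2)

# do.call() forces expr to be evaluated before substitute() sees it
shown <- deparse(do.call(substitute, list(expr, list(x = 2))))
```

Here value is 4 and shown is the string "2^2". As with any parse/eval approach, the text in A is run as code, so it should come from a trusted source.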