R base::options with variables

I have noticed some behaviour in R's base::options() that I am unable to fully understand.
The following is fine:
> vals_vector
[1] "temp" "hum" "co2" "voc" "pm1" "pm2_5" "pm10"
> options("hum" = TRUE)
> if (getOption("hum")) {
+ print("stuff")
+ }
[1] "stuff"
And this is also fine:
> options(TEMP_ENABLE = "temp" %in% vals_vector)
> getOption("TEMP_ENABLE")
[1] TRUE
However, the following does not work:
> options(as.character(vals_vector[1]) = TRUE)
Error: unexpected '=' in "options(as.character(vals_vector[1]) ="
> as.character(vals_vector[1])
[1] "temp"
> "temp"
[1] "temp"
This makes no sense to me. As you can see, I have evaluated the argument and it is exactly the same in both cases; in one of them I have just used the variable. My intention was to use a loop to set an option for each variable present in a data set. Why doesn't this work as expected?

You can't use function calls as stand-ins for argument names in R. This has nothing to do with options; it's just how the R parser works. Take the following example, using the function data.frame.
data.frame(A_B = 2)
#> A_B
#> 1 2
Suppose we wanted to generate the name A_B programmatically:
paste("A", "B", sep = "_")
#> [1] "A_B"
Looks good. But if we try to use this function call with the intention that its output is interpreted as an argument name, the parser will simply tell us we have a syntax error:
data.frame(paste("A", "B", sep = "_") = 2)
#> Error: unexpected '=' in "data.frame(paste("A", "B", sep = "_") ="
There are ways around this. With most base R functions, we would create a named list programmatically and pass it as the argument list using do.call:
mylist <- list(TRUE)
names(mylist) <- vals_vector[1]
do.call(options, mylist)
getOption("temp")
#> [1] TRUE
However, the documentation for options says:
Options can also be passed by giving a single unnamed argument which is a named list.
So a more concise idiom would be:
options(setNames(list(TRUE), vals_vector[1]))
getOption("temp")
#> [1] TRUE
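Since the stated goal was to set one option per variable, the same idiom extends to the whole vector in a single call. The following is only a sketch under assumptions: vals_vector is copied from the question, and the names flags, old and enabled are made up for illustration.

```r
# Sensor names copied from the question.
vals_vector <- c("temp", "hum", "co2", "voc", "pm1", "pm2_5", "pm10")

# Build a named list: one TRUE per name in vals_vector.
flags <- setNames(as.list(rep(TRUE, length(vals_vector))), vals_vector)

# options() accepts the named list as a single unnamed argument and
# invisibly returns the previous values, which we keep for restoring.
old <- options(flags)

# Check that every option is now set.
enabled <- vapply(vals_vector, function(v) isTRUE(getOption(v)), logical(1))
all(enabled)

# Restore the previous state (options that were unset before are removed again).
options(old)
```

Keeping the return value of options() and passing it back afterwards is a common way to avoid leaving session-wide settings behind.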

Related

R error: "duplicate 'row.names' are not allowed"

I got the error when I wanted to set the first column as the row names:
dt <- fread('../data/data_logTMP.csv', header = T)
rownames(dt) <- dt$GENE
I used duplicated() to check the values:
> which(duplicated(dt$GENE) == TRUE)
[1] 20209 21919
Therefore, I compared these values:
> dt$GENE[20209] == dt$GENE[21919]
[1] FALSE
> dt$GENE[20209]
[1] "1-Mar"
> dt$GENE[21919]
[1] "2-Mar"
Why were these two values recognized as duplicated? And how can I fix this problem?
Because you are reading the file with fread, the object dt will have class data.table by default. By design, data.table does not support row names. To get a plain data.frame instead, pass data.table = FALSE to fread, as shown below.
data.table::fread(input = "file name", sep = ",", header = TRUE, data.table = FALSE)
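On the "why were these recognized as duplicated" part: duplicated() flags every element that repeats an earlier one, so the two indices returned by which() each duplicate some earlier row rather than each other. A small sketch with made-up gene names (genes and the other variable names here are hypothetical, not taken from the real file) shows this, plus one possible way to get unique row names with base R's make.unique. Values like "1-Mar" often come from a spreadsheet program converting gene symbols to dates, but that is only a guess about this data.

```r
# Made-up gene names containing two repeated values.
genes <- c("1-Mar", "2-Mar", "TP53", "1-Mar", "2-Mar")

# duplicated() marks elements that repeat an EARLIER element.
dup_idx <- which(duplicated(genes))

# The two flagged entries differ from each other; each one repeats
# an earlier entry instead.
flagged_equal <- genes[dup_idx[1]] == genes[dup_idx[2]]

# All copies, including the first occurrences.
all_copies <- which(duplicated(genes) | duplicated(genes, fromLast = TRUE))

# One option: make the names unique before using them as row names.
df <- data.frame(GENE = genes, value = seq_along(genes))
rownames(df) <- make.unique(df$GENE)
rownames(df)
```

Whether renaming the repeats or dropping or merging them is appropriate depends on what the duplicated rows mean in the data.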

Why are `invisible` and `capture.output` insufficient in removing both concatenated and diagnostic messages in R?

I would like to use invisible and capture.output to remove both concatenated and diagnostic messages. However, it seems that I can only remove one or the other. Here is an example function:
func <- function(x){
message('This is a diagnostic message')
cat('This is a cat message')
var <- x^2
return(var)
}
If we use type = "message", we get:
> invisible(capture.output(out <- func(5), type = "message"))
This is a cat message
> out
[1] 25
but with type = "output", we instead get:
> rm(list = "out")
> invisible(capture.output(out <- func(5), type = "output"))
This is a diagnostic message
> out
[1] 25
When I do both,
> invisible(capture.output(out <- func(5), type = c("output", "message")))
This is a diagnostic message
> out
[1] 25
I still get the diagnostic message. Now, if I do
> suppressMessages(invisible(capture.output(out <- func(5))))
> out
[1] 25
Then it finally seems to work. However, why was having two of the types specified not working? Is it a bug? Are there special cases where having suppressMessages(invisible(capture.output())) all together would result in outputs that do not show properly?
invisible() sets a flag on a function result so that it won't auto-print. In your examples, it is setting the invisible flag on the result of capture.output(). In the first example, that's the string
[1] "This is a diagnostic message"
In the second example, that's
[1] "This is a cat message"
In the third example you pass both. Although the default value of type is the vector containing both, only its first element is used. It's an awkward convention that is very old in the S language: the vector in the signature lists the allowed choices, and the first one is the default. If you really want to capture both kinds of message, you need to call capture.output twice:
capture.output(capture.output(out <- func(5), type = "message"), type="output")
and if you don't want the result to auto-print, you can wrap it in invisible(), or just assign it to a variable:
msgs <- capture.output(capture.output(out <- func(5), type = "message"), type="output")
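The "only the first is used" behaviour is what match.arg does, and as far as I can tell capture.output resolves its type argument that way. A minimal sketch with a hypothetical function pick_type (not part of any package) that has the same default:

```r
# Hypothetical function with the same style of default as capture.output's
# type argument: the vector lists the allowed choices.
pick_type <- function(type = c("output", "message")) {
  match.arg(type)
}

pick_type()                         # no argument: first choice, "output"
pick_type(c("output", "message"))   # identical to the choices: also "output"
pick_type("message")                # explicit single choice: "message"
```

So passing type = c("output", "message") is treated the same as not passing type at all, which is why the third example in the question behaves like type = "output".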
There is a function to suppress console messages. Rather unsurprisingly its name is suppressMessages.
res <- capture.output( suppressMessages(out <- func(5) ) )
# So :
> res
[1] "This is a cat message"
> out
[1] 25

How can one make visible the difference in the outputs of quote() and substitute()?

As applied to the same R code or objects, quote and substitute typically return different objects. How can one make this difference apparent?
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
> tmc <- function(X){
out <- list(typ = typeof(X), mod = mode(X), cls = class(X))
out
}
> df1 <- data.frame(a = 1, b = 2)
Here the printed output of quote and substitute are the same.
> quote(df1)
df1
> substitute(df1)
df1
And the structure of the two are the same.
> str(quote(df1))
symbol df1
> str(substitute(df1))
symbol df1
And the type, mode and class are all the same.
> tmc(quote(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
> tmc(substitute(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
And yet, the outputs are not the same.
> is.identical(df1)
[1] FALSE
Note that this question shows some inputs that cause the two functions to display different outputs. However, the outputs are different even when they appear the same, and are the same by most of the usual tests, as shown by the output of is.identical() above. What is this invisible difference, and how can I make it appear?
note on the tags: I am guessing that the Common LISP quote and the R quote are similar
The reason is that the behavior of substitute() is different based on where you call it, or more precisely, what you are calling it on.
Understanding what will happen requires a very careful parsing of the (subtle) documentation for substitute(), specifically:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
So there are essentially three options.
In this case:
> df1 <- data.frame(a = 1, b = 2)
> identical(quote(df1),substitute(df1))
[1] TRUE
df1 is an "ordinary variable", but substitute is called in .GlobalEnv, since the env argument defaults to the current evaluation environment. Hence we're in the very last case, where the symbol df1 is left unchanged, and so it is identical to the result of quote(df1).
In the context of the function:
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
The important distinction is that now we're calling these functions on X, not df1. For most R users, this is a silly, trivial distinction, but when playing with subtle tools like substitute it becomes important. X is a formal argument of a function, so that implies we're in a different case of the documented behavior.
Specifically, it says that now "the expression slot of the promise replaces the symbol". We can see what this means if we debug() the function and examine the objects in the context of the function environment:
> debugonce(is.identical)
> is.identical(X = df1)
debugging in: is.identical(X = df1)
debug at #1: {
out <- identical(quote(X), substitute(X))
out
}
Browse[2]>
debug at #2: out <- identical(quote(X), substitute(X))
Browse[2]> str(quote(X))
symbol X
Browse[2]> str(substitute(X))
symbol df1
Browse[2]> Q
Now we can see that what happened is precisely what the documentation said would happen (Ha! So obvious! ;) )
X is a formal argument, and therefore a promise, which according to R is not the same thing as df1. For most people writing functions they are effectively the same, but the internal implementation disagrees. X is a promise object, and substitute replaces the symbol X with the expression that it "points to", namely df1. This is what the docs mean by the "expression slot of the promise": it is the expression R saw in the X = df1 part of the function call.
To round things out, try to guess what will happen in this case:
is.identical <- function(X){
out <- identical(quote(A), substitute(A))
out
}
is.identical(X = df1)
(Hint: now A is not a "bound symbol in the environment".)
A final example illustrating more directly the final case in the docs with the confusing exception:
#Ordinary variable, but in .GlobalEnv
> a <- 2
> substitute(a)
a
#Ordinary variable, but NOT in .GlobalEnv
> e <- new.env()
> e$a <- 2
> substitute(a,env = e)
[1] 2
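The three documented cases can also be put side by side inside functions, where the .GlobalEnv exception does not apply. This is just a sketch; the function names show_promise, show_unbound and show_local are invented for the illustration.

```r
df1 <- data.frame(a = 1, b = 2)

# Case 1: X is a formal argument (a promise), so the promise's
# expression replaces the symbol.
show_promise <- function(X) substitute(X)
show_promise(df1)    # the symbol df1

# Case 2: A is not bound in the function's environment, so it is unchanged.
show_unbound <- function(X) substitute(A)
show_unbound(df1)    # the symbol A

# Case 3: y is an ordinary local variable (not a promise, not in .GlobalEnv),
# so its value is substituted.
show_local <- function() {
  y <- 2
  substitute(y)
}
show_local()         # the value 2
```

Comparing each result against quote() with identical() is what makes the otherwise invisible difference show up.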

Extraction operator `$`() returns zero-length vectors within function

I am encountering an issue when I use the extraction operator `$`() inside of a function. The problem does not exist if I follow the same logic outside of the function, so I assume there might be a scoping issue that I'm unaware of.
The general setup:
## Make some fake data for your reproducible needs.
set.seed(2345)
my_df <- data.frame(cat_1 = sample(c("a", "b"), 100, replace = TRUE),
cat_2 = sample(c("c", "d"), 100, replace = TRUE),
continuous = rnorm(100),
stringsAsFactors = FALSE)
head(my_df)
This process I am trying to dynamically reproduce:
index <- which(`$`(my_df, "cat_1") == "a")
my_df$continuous[index]
But once I program this logic into a function, it fails:
## Function should take a string for the following:
## cat_var - string with the categorical variable name as it appears in df
## level - a level of cat_var appearing in df
## df - data frame to operate on. Function assumes it has a column
## "continuous".
extract_sample <- function(cat_var, level, df = my_df) {
index <- which(`$`(df, cat_var) == level)
df$continuous[index]
}
## Does not work.
extract_sample(cat_var = "cat_1", level = "a")
This is returning numeric(0). Any thoughts on what I'm missing? Alternative approaches are welcome as well.
The problem isn't the function, it's the way $ handles the input.
cat_var = "cat_1"
length(`$`(my_df,"cat_1"))
#> [1] 100
length(`$`(my_df,cat_var))
#> [1] 0
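To trace how that zero-length result turns into numeric(0) in the original function, here is a step-by-step sketch on a tiny made-up data frame (small_df and the intermediate names are hypothetical):

```r
# Tiny stand-in for my_df.
small_df <- data.frame(cat_1 = c("a", "b", "a"),
                       continuous = c(1, 2, 3),
                       stringsAsFactors = FALSE)
cat_var <- "cat_1"

# `$` uses the name as typed, so it looks for a column called "cat_var".
col <- `$`(small_df, cat_var)
is.null(col)                      # TRUE: there is no such column

# Comparing NULL with a string gives logical(0), and which() of that
# gives integer(0).
index <- which(col == "a")

# Indexing with integer(0) selects nothing.
result <- small_df$continuous[index]
result                            # numeric(0)
```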
You can instead use [[ to achieve your desired outcome.
cat_var = "cat_1"
length(`[[`(my_df,"cat_1"))
#> [1] 100
length(`[[`(my_df,cat_var))
#> [1] 100
UPDATE
It's been noted that using [[ this way is ugly. And it is. It's useful when you want to write something like lapply(stuff,'[[',1)
Here, you should probably be writing it as my_df[[cat_var]].
Also, this question/answer goes into a little more detail about why $ doesn't work the way you want it to.
The problem is that $ uses non-standard evaluation: it does not evaluate its second argument but uses the name exactly as you typed it, even if you meant that name to refer to another variable.
Or more simply, as #42 put it in the first comment in the linked question:
The "$" function does not evaluate its arguments, whereas "[[" does.
Here's a much simpler data set as an example.
my_df <- data.frame(a=c(1,2))
v <- "a"
Compare the usual usage. The first two give the same result, because $ takes an unquoted name literally. The third therefore looks for a column named v, not a, and returns NULL.
my_df$"a"
## [1] 1 2
my_df$a
## [1] 1 2
my_df$v
## NULL
That's exactly what's happening to you:
`$`(my_df, "a")
## [1] 1 2
`$`(my_df, v)
## NULL
Instead we need to evaluate v before sending to $ by using do.call.
do.call(`$`, list(my_df, v))
## [1] 1 2
Or, more appropriately, use the [[ version which does evaluate the parameters first.
`[[`(my_df, v)
## [1] 1 2
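Applied to the function from the question, the `[[` approach might look like the sketch below. The data setup is copied from the question; the name extract_sample_bracket is made up here to keep it apart from the original.

```r
# Same fake data as in the question.
set.seed(2345)
my_df <- data.frame(cat_1 = sample(c("a", "b"), 100, replace = TRUE),
                    cat_2 = sample(c("c", "d"), 100, replace = TRUE),
                    continuous = rnorm(100),
                    stringsAsFactors = FALSE)

# `[[` evaluates cat_var, so the column is looked up by the string it holds.
extract_sample_bracket <- function(cat_var, level, df = my_df) {
  index <- which(df[[cat_var]] == level)
  df$continuous[index]
}

res <- extract_sample_bracket(cat_var = "cat_1", level = "a")
length(res) == sum(my_df$cat_1 == "a")
```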
The problem lies in the way you are indexing the column. A slight tweak to your function makes it work:
extract_sample <- function(cat_var, level, df = my_df) {
index <- df[, cat_var] == level
df$continuous[index]
}
Using it dynamically:
> extract_sample(cat_var = "cat_2", level = "d")
[1] -0.42769207 -0.75650031 0.64077840 -1.02986889 1.34800344 0.70258431 1.25193247
[8] -0.62892048 0.48822673 0.10432070 1.11986063 -0.88222370 0.39158408 1.39553002
[15] -0.51464283 -1.05265106 0.58391650 0.10555913 0.16277385 -0.55387829 -1.07822831
[22] -1.23894422 -2.32291394 0.11118881 0.34410388 0.07097271 1.00036812 -2.01981056
[29] 0.63417799 -0.53008375 1.16633422 -0.57130500 0.61614135 1.06768285 0.74182293
[36] 0.56538633 0.16784205 -0.14757303 -0.70928924 -1.91557732 0.61471302 -2.80741967
[43] 0.40552376 -1.88020372 -0.38821089 -0.42043745 1.87370600 -0.46198139 0.10788358
[50] -1.83945868 -0.11052531 -0.38743950 0.68110902 -1.48026285

Using partial within map

I've come across an interesting problem. I have a function of three variables; let's say (for simplicity and transparency) it is this:
my_fun <- function(a, b, c) paste(a, b, c, sep = '-')
I want to create multiple functions that take only the argument c, for several combinations of a and b. I am using the functions map2 and partial (both from the purrr package).
require(purrr)
funs <- map2(letters[1:5], LETTERS[1:5], partial, ...f = my_fun)
I would expect each function in the list to produce different output, but that is not the case.
funs[[1]]('hi') # [1] "e-E-hi"
funs[[3]]('hi') # [1] "e-E-hi"
funs[[5]]('hi') # [1] "e-E-hi"
I can come up with a different solution to my problem, so my question isn't "how to do it". I am rather interested in why it does this.
Another example using base mapply:
mapply(partial, letters[1:5], LETTERS[1:5], MoreArgs = list(...f = my_fun))[[1]]('hi')
# [1] "e-E-hi"
The problem stems from the fact that partial uses lazy evaluation, which within map2 means that it stores the unevaluated expressions .x[[i]] and .y[[i]] instead of the values "a" and "A". Luckily there is an argument for that, and we can use:
funs <- map2(letters[1:5], LETTERS[1:5], partial, ...f = my_fun, .lazy = FALSE)
funs[[1]]('hi')
# [1] "a-A-hi"
If you look at your version, we see this:
funs[[1]]
# function (...)
# my_fun(.x[[i]], .y[[i]], ...)
# <environment: 0x00000000201d9598>
And the same for each one of the other 4.
Now, if we look into that environment, we can see:
ls(envir = environment(funs[[1]]))
# [1] "i"
So there is an object i stored in there, which determines which elements of .x and .y we get. Its value is:
get('i', environment(funs[[1]]))
# [1] 5
Also note that your arguments are stored there as well, but are hidden due to their starting with a .:
ls(envir = environment(funs[[1]]), all.names = TRUE)
# [1] "..." ".f" ".x" ".y" "i"
get('.x', envir = environment(funs[[1]]))
# [1] "a" "b" "c" "d" "e"
So for all of these, we get the same result. Specifically, the executed call ends up being:
my_fun(letters[1:5][[5]], LETTERS[1:5][[5]], 'hi')
The lazy evaluation does not play nicely here: the stored expressions are only evaluated when the function is called, by which time map2's internal loop counter i has reached its final value.
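For comparison, the same idea can be sketched in base R with a closure factory that forces its arguments, so each generated function captures the values rather than an expression to be evaluated later. This is a swapped-in technique, not purrr's partial, and make_partial is a hypothetical helper name.

```r
my_fun <- function(a, b, c) paste(a, b, c, sep = "-")

# Hypothetical factory: force() evaluates a and b immediately, so each
# returned function keeps its own values.
make_partial <- function(a, b) {
  force(a)
  force(b)
  function(c) my_fun(a, b, c)
}

funs <- Map(make_partial, letters[1:5], LETTERS[1:5])
funs[[1]]("hi")   # "a-A-hi"
funs[[5]]("hi")   # "e-E-hi"
```

Each call to make_partial gets its own environment holding a and b, so nothing depends on a shared loop counter.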
