If I have a data table, foo, in R with a column named "date", I can get the vector of date values by the notation
foo[, date]
(Unlike data frames, date doesn't need to be in quotes).
How can this be done programmatically? That is, if I have a variable x whose value is the string "date", then how do I access the column of foo with that name?
Something that sort of works is to create a symbol:
sym <- as.name(x)
v <- foo[, eval(sym)]
...
As I say, that sort of works, but there is something not quite right about it. If that code is inside a function myFun in package myPackage, then it seems that it doesn't work if I explicitly use the package through:
myPackage::myFun(...)
I get an error message saying "undefined columns selected".
[edited] Some more details
Suppose I create a package called myPackage. This package has a single file with the following in it:
library(data.table)
#' export
myFun <- function(table1) {
names1 <- names(table1)
name1 <- names1[[1]]
sym <- as.Name(name1)
table1[, eval(sym)]
}
If I load that function using RStudio, then
myFun(tbl)
returns the first column of the data table tbl.
On the other hand, if I call
myPackage::myFun(tbl)
it doesn't work. It complains about
Error in .subset(x, j) : invalid subscript type 'builtin'
I'm just curious as to why myPackage:: would make this difference.
A quick way which points to a longer way is this:
subset(foo, TRUE, date)
The subset function accepts unquoted symbols/names for its 'subset' and 'select' arguments. (Its author, however, thinks this was a bad idea and suggests we use formulas instead.) This was the jumping-off point for sections of Hadley Wickham's Advanced Programming webpages (and book): http://adv-r.had.co.nz/Computing-on-the-language.html and http://adv-r.had.co.nz/Functional-programming.html . You can also look at the code for subset.data.frame:
> subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
The problem with "naked" expressions that get passed into functions is that their evaluation frame is sometimes not the one you expect. R formulas, like functions, carry a pointer to the environment in which they were defined.
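A minimal sketch of that last point, using only base R. The names make_formula and threshold are made up for illustration. A formula created inside a function remembers that function's environment, so its expression can be evaluated later with the data supplied separately:

```r
# Hypothetical helper: builds a one-sided formula inside a function
make_formula <- function() {
  threshold <- 10          # local variable, not visible from the outside
  ~ x > threshold          # the formula records the environment it was created in
}

f <- make_formula()
env <- environment(f)      # the environment captured by the formula

# Evaluate the formula's expression: 'x' comes from the data list,
# 'threshold' is found in the captured environment
result <- eval(f[[2]], list(x = c(5, 15)), env)
result
```

A bare expression obtained with substitute() carries no such pointer, which is why subset.data.frame has to pass parent.frame() explicitly.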
I think the problem is that you've defined myFun in your global environment, so it only appeared to work.
I changed as.Name to as.name, and created a package with the following functions:
library(data.table)
myFun <- function(table1) {
names1 <- names(table1)
name1 <- names1[[1]]
sym <- as.name(name1)
table1[, eval(sym)]
}
myFun_mod <- function(dt) {
# dt[, eval(as.name(colnames(dt)[1]))]
dt[[colnames(dt)[1]]]
}
Then, I tested it using this:
library(data.table)
myDt <- data.table(a=letters[1:3],b=1:3)
myFun(myDt)
myFun_mod(myDt)
myFun didn't work
myFun_mod did work
The output:
> library(test)
> myFun(myDt)
Error in eval(expr, envir, enclos) : object 'a' not found
> myFun_mod(myDt)
[1] "a" "b" "c"
Then I added the following line to the NAMESPACE file:
import(data.table)
This is what @mnel was talking about with this link:
Using data.table package inside my own package
After adding import(data.table), both functions work.
I'm still not sure why you got the particular .subset error, which is why I went through the effort of reproducing the result...
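For reference, here is a short sketch of the usual ways to pull a column out of a data.table when its name is held in a string. The table foo and the variable x mirror the question; the sample values are invented. Inside a package, the bracket forms still need import(data.table) in the NAMESPACE, as described above, whereas [[ ]] is plain list extraction and works regardless:

```r
library(data.table)

# Invented sample data; 'x' holds the column name as a string
foo <- data.table(date = as.Date("2024-01-01") + 0:2, value = 1:3)
x <- "date"

v1 <- foo[[x]]           # list-style extraction, returns the column as a vector
v2 <- foo[, get(x)]      # get() looks the name up among the columns
sub <- foo[, ..x]        # the '..' prefix returns a one-column data.table
```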
Within a loop, I try to remove a list of data frames simply with
rm(a,b,c,d)
However, if a data frame (e.g. b) does not exist in the global environment, I get a warning
In rm(a,b,c,d,...:
object 'b' not found
How can I suppress this warning?
Use suppressWarnings
suppressWarnings(rm(a,b,c,d))
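@RonakShah has the correct answer here, though it is possible to avoid generating a warning at all by defining a function that checks for the existence of variables before trying to remove them:
rm2 <- function(...)
{
  env <- parent.frame()
  # Capture the unevaluated argument names as strings
  names <- vapply(as.list(match.call()[-1]), deparse, character(1))
  # Keep only the names that exist in the caller's own environment
  found <- vapply(names, exists, logical(1), envir = env, inherits = FALSE)
  rm(list = names[found], envir = env)
}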
So you can do:
x <- 1; y <- 2;
ls()
#> [1] "rm2" "x" "y"
rm2(x, y, z) # Note no warning generated since no attempt made to remove z
ls()
#> [1] "rm2"
In the R-Package data.table the manual entry for ?data.table-class says that 'data.table' can be used for inheritance in a class definition, i.e. in the contains argument in a call to setClass:
library("data.table")
setClass("Data.Table", contains = "data.table")
However, when I create an instance of a Data.Table, I would have expected to be able to treat it like a data.table. This is not so. The following snippet results in an error, which, as far as I understand, is because the [.data.table function cannot handle the mix of S3 and S4 dispatch:
dat <- new("Data.Table", data.table(x = 1))
dat[TRUE]
I solved this by defining a new method for [ and coercing any Data.Table to a data.table before evaluating it therein.
setMethod(
"[",
"Data.Table",
function(x, i, j, ..., drop = TRUE) {
mc <- match.call()
mc$x <- substitute(S3Part(x, strictS3 = TRUE))
Data.Table(
eval(mc, envir = parent.frame())
)
})
And a constructor function to feel more comfortable with it:
Data.Table <- function(...) new("Data.Table", data.table(...))
dat <- Data.Table(x = 1, key = "x")
dat[1]
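The coercion step used in that method can be seen in isolation with a plain data.frame, which avoids the data.table-specific problems. This is only a sketch, and the class name WrappedFrame is hypothetical:

```r
library(methods)

# Hypothetical S4 class extending the S3 class "data.frame"
setClass("WrappedFrame", contains = "data.frame")
obj <- new("WrappedFrame", data.frame(x = 1:3))

isS4(obj)                               # the wrapper is an S4 object
plain <- S3Part(obj, strictS3 = TRUE)   # strip the S4 layer, keep the S3 data
isS4(plain)
plain[2, , drop = FALSE]                # ordinary S3 indexing on the data.frame
```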
This is acceptable for some scenarios, but I lose all get and set functions from the data.table package and I suspect that I destroyed some other features. So the question is how to implement a working S4 data.table class? I would appreciate
Pointers to similar attempts/projects
Better/alternative solutions/ideas for an implementation
Any advice on what I lose with respect to performance with the above solution
There is one related question on SO I found, which presents a similar approach. However, I think it would involve too much coding to be feasible.
I think the short answer (the problem is still as valid as it was when raised) is that using data.table as a superclass in S4 is not advisable, and not possible without a considerable amount of effort and a certain risk of instability.
It is also not quite clear what the goal should have been with the case at hand, but let's assume there was no alternative like forking and modifying the existing data.table package.
Then, to illustrate the case mentioned above with the [, let's first initialize the example:
# replicating some code from above
library("data.table")
Data.Table <- setClass("Data.Table", contains = "data.table")
dat <- Data.Table(data.table(x = 1))
dat[1]
> Error in if (n > 0) c(NA_integer_, -n) else integer() :
argument is of length zero
dat2 <- data.table(x = 1)
Now to check [.data.table, which is a lot of code as you can see on the Github repo data.table.R, so just reproducing the relevant part in the simplest dummy way:
# initializing output
ans = vector("list", 1)
# data (just one line of code as we have just one value in our example).
# desired subscript is row 1, but we have just one column as well.
ans[[1]] <- dat[[1]][1]
# add 'names' attribute
setattr(ans, "names", "x")
# set 'class' attribute
setattr(ans, "class", class(dat))
# set 'row.names'
setattr(ans, "row.names", .set_row_names(nrow(ans)))
And there we have the error, trying to set the row.names, which doesn't work because dim(ans) and therefore nrow is NULL.
So the real problem is here with the usage of setattr(ans, "class", class(dat)), which doesn't work well (try isS4(ans) or print(ans) just afterwards). In fact, from ?class we can read about S4:
The replacement version of the function sets the class to the value provided. For classes that have a formal definition, directly replacing the class this way is strongly deprecated. The expression as(object, value) is the way to coerce an object to a particular class.
data.table's setattr, which through C uses R's setAttrib function, is similar to calling attr(ans, "class") <- "Data.Table" or class(ans) <- "Data.Table", which would screw up as well.
If you do setattr(ans, "class", class(dat2)) instead, you will see that everything is fine here, as should be with S3.
One more word of caution though:
setattr(ans, "class", "data.frame")
and then print(ans) or dim(ans) may not look very nice to you... (although ans$x is ok).
Overriding setattr() in a good way isn't trivial either and such an approach will probably not get you any farther than the approach you have outlined above. Result could be something like:
setattr_new <- function(x, name, value) {
if (name == "class" && "Data.Table" %in% value) {
value <- c("data.table", "data.frame")
}
if (name == "names" && is.data.table(x) && length(attr(x, "names")) && !is.null(value))
setnames(x, value)
else {
ans = .Call(Csetattrib, x, name, value)
if (!is.null(ans)) {
warning("Input is a length=1 logical that points to the same address as R's global TRUE value. Therefore the attribute has not been set by reference, rather on a copy. You will need to assign the result back to a variable. See https://github.com/Rdatatable/data.table/issues/1281 for more.")
x = ans
}
}
if (name == "levels" && is.factor(x) && anyDuplicated(value))
.Call(Csetlevels, x, (value <- as.character(value)), unique(value))
invisible(x)
}
godmode:::assignAnywhere("setattr", setattr_new)
identical(dat[1], dat2[1])
[1] TRUE
# then possibly convert back to S4 class if desired for further processing at the end
as(dat[1], "Data.Table")
I built an R package and include a dataset called mouse where I can access it using data(mouse). In the package, I also have a function called fun which takes, as its first argument, the name of a dataset (included in the package):
fun = function(dt = NULL, ...) {
data(dt)
...
dt.sub = dt[ ,1:6]
...
}
However, when I use the function as fun(dt = "mouse"), it says In data(dt) : data set ‘dt’ not found. Also, I cannot use dt[ ,1:6] since dt here is a string. I tried to use noquote and as.name functions to get rid of the quotation marks, but the object dt does NOT refer to the mouse dataset.
My question is, what's the best approach to pass the name of a dataset (mouse in this case) in the function argument, and then use it in the function body? Thanks!
Try this:
f <- function(dt = NULL) {
do.call("data", list(dt))
dt <- eval(as.name(dt))
head(dt)
}
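A slightly more defensive variant, as a sketch only: data() accepts names as strings through its list argument, and loading into a private environment avoids both the global workspace and the eval() step. The helper name load_dataset is made up, and mtcars from the datasets package stands in for mouse:

```r
# Hypothetical helper: load a packaged dataset whose name is given as a string
load_dataset <- function(name, package = "datasets") {
  stopifnot(is.character(name), length(name) == 1L)
  env <- new.env()
  # 'list =' takes the name as a string; 'envir =' keeps the workspace clean
  data(list = name, package = package, envir = env)
  if (!exists(name, envir = env, inherits = FALSE)) {
    stop(sprintf("data set '%s' not found in package '%s'", name, package))
  }
  get(name, envir = env, inherits = FALSE)
}

dt <- load_dataset("mtcars")   # stand-in for fun(dt = "mouse")
dt.sub <- dt[, 1:6]
head(dt.sub)
```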
Is there any way to "check" or "verify" a source code file in R when sourcing it ?
For example, I have this function in a file "source.R"
MyFunction <- function(x)
{
print(x+y)
}
When sourcing "source.R", I would like to see some sort of warning: MyFunction refers to an undefined object y.
Any hints on how to check / verify R code?
Cheers!
I use a function like this one for scanning all the functions in a file:
critic <- function(file) {
require(codetools)
tmp.env <- new.env()
sys.source(file, envir = tmp.env)
checkUsageEnv(tmp.env, all = TRUE)
}
Assuming source.R contains the definitions of two rather poorly written functions:
MyFunction <- function(x) {
print(x+y)
}
MyFunction2 <- function(x, z) {
a <- 10
x <- x + 1
print(x)
}
Here is the output:
critic("source.R")
# MyFunction: no visible binding for global variable ‘y’
# MyFunction2: local variable ‘a’ assigned but may not be used
# MyFunction2: parameter ‘x’ changed by assignment
# MyFunction2: parameter ‘z’ may not be used
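The same checks can be run on a single function object without going through a file. The sketch below assumes y is not defined anywhere in the session, and collects the messages in a character vector (messages is an illustrative name) instead of printing them:

```r
library(codetools)

MyFunction <- function(x) {
  print(x + y)
}

# Names used but not defined locally, split into functions and variables
globals <- findGlobals(MyFunction, merge = FALSE)
globals$variables

# Collect the checkUsage() messages instead of printing them
messages <- character(0)
checkUsage(MyFunction, name = "MyFunction",
           report = function(msg) messages <<- c(messages, msg))
messages
```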
You can use the codetools package, which ships with R, for that. And if you had your code in a package, it would tell you about this:
I have loaded in a R console different type of objects.
I can remove them all using
rm(list=ls())
or remove only the functions (but not the variables) using
rm(list=lsf.str())
My question is:
is there a way to remove all variables except the functions
Here's a one-liner that removes all objects except for functions:
rm(list = setdiff(ls(), lsf.str()))
It uses setdiff to take the names of all objects in the global environment (as returned by ls()) and drop those that are functions (as returned by lsf.str()).
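To try the idea without touching your own workspace, the same one-liner can be pointed at a scratch environment. This is a sketch; env and the objects in it are invented for the demonstration:

```r
# Hypothetical scratch environment standing in for the global workspace
env <- new.env()
assign("x", 1:5, envir = env)
assign("d", data.frame(a = 1), envir = env)
assign("f", function(z) z + 1, envir = env)

all_names <- ls(envir = env)
fun_names <- as.character(lsf.str(envir = env))   # names of the functions only
rm(list = setdiff(all_names, fun_names), envir = env)

remaining <- ls(envir = env)
remaining
```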
The posted setdiff answer is nice. I just thought I'd post this related function I wrote a while back. Its usefulness is up to the reader :-).
lstype<-function(type='closure'){
inlist<-ls(.GlobalEnv)
if (type=='function') type <-'closure'
typelist<-sapply(sapply(inlist,get),typeof)
return(names(typelist[typelist==type]))
}
You can use the following command to clear out ALL variables. Be careful, because you cannot get your variables back.
rm(list=ls(all=TRUE))
Here's a pretty convenient function I picked up somewhere and adjusted a little. Might be nice to keep in the directory.
list.objects <- function(env = .GlobalEnv)
{
if(!is.environment(env)){
env <- deparse(substitute(env))
stop(sprintf('"%s" must be an environment', env))
}
obj.type <- function(x) class(get(x, envir = env))
foo <- sapply(ls(envir = env), obj.type)
object.name <- names(foo)
names(foo) <- seq(length(foo))
dd <- data.frame(CLASS = foo, OBJECT = object.name,
stringsAsFactors = FALSE)
dd[order(dd$CLASS),]
}
> x <- 1:5
> d <- data.frame(x)
> list.objects()
# CLASS OBJECT
# 1 data.frame d
# 2 function list.objects
# 3 integer x
> list.objects(env = x)
# Error in list.objects(env = x) : "x" must be an environment
I wrote this to remove all objects apart from functions from the current environment (the programming language is R, with RStudio as the IDE):
obj_names <- ls()                                # snapshot the names once, before the loop creates 'i'
remove_list <- character(0)                      # names of the objects to remove
for (i in seq_along(obj_names)) {                # iterate over all objects in the environment
  if (!is.function(get(obj_names[i]))) {         # if the object is *not* a function
    remove_list <- c(remove_list, obj_names[i])  # ...add its name to remove_list
  }
}
rm(list = remove_list)                           # remove all objects named in remove_list
Notes:
The argument "list" in rm(list=) must be a character vector.
The names are captured once in obj_names because calling ls() again inside the loop would also return i and the helper variables, which shifts the positions. The name of the object in position i is obj_names[i], and the object itself is returned by get(obj_names[i]). is.function() is used rather than comparing class() with "function" because class() can return more than one value.