Conditionally remove data frames from environment - r

How can I drop data frames with fewer than 3 variables? I tried this:
`1001.AFG.1.A`<-data.frame(x = 1, y = 1:10)
apply(ls(), function(x) {if (dim(x)[2]<3) rm(x)})
The error message is:
Error in match.fun(FUN) : argument "FUN" is missing, with no default

1) The first line produces a named logical vector, to.rm, with one component per object; a component is TRUE if that object should be removed and FALSE otherwise. Thus names(to.rm)[to.rm] gives the names of the objects to remove, so feed that into rm. Splitting the work into two steps lets you review to.rm before actually performing the rm.
to.rm <- unlist(eapply(.GlobalEnv, function(x) is.data.frame(x) && ncol(x) < 3))
rm(list = names(to.rm)[to.rm], envir = .GlobalEnv)
If this is entered directly into the global environment (i.e. not placed in a function), then envir = .GlobalEnv in the last line is the default and can be omitted.
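To try the two-step approach without touching your real workspace, here is a minimal sketch that runs it against a scratch environment. The environment env and the object names small, wide and num are made up for illustration only.

```r
# Hypothetical scratch environment, so nothing in the global workspace is removed
env <- new.env()
assign("small", data.frame(x = 1, y = 1:10), envir = env)    # 2 columns
assign("wide", data.frame(a = 1, b = 2, c = 3), envir = env)  # 3 columns
assign("num", 42, envir = env)                                # not a data frame

# Step 1: flag the data frames with fewer than 3 columns
to.rm <- unlist(eapply(env, function(x) is.data.frame(x) && ncol(x) < 3))
print(to.rm)  # review the flags before removing anything

# Step 2: remove only the flagged objects
rm(list = names(to.rm)[to.rm], envir = env)
remaining <- sort(ls(env))
print(remaining)
```

Only the two-column data frame is flagged; the wider data frame and the plain number are left alone.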
2) Another way is to iterate through the object names of env as shown. We have provided a verbose argument to show what it is doing and a dryrun argument to show what it would remove without actually removing anything.
rm2 <- function(env = .GlobalEnv, verbose = FALSE, dryrun = FALSE, all.names = FALSE) {
  for(nm in ls(env, all.names = all.names)) {
    obj <- get(nm, env)
    if (is.data.frame(obj) && ncol(obj) < 3) {
      if (verbose || dryrun) cat("removing", nm, "\n")
      if (!dryrun) rm(list = nm, envir = env)
    }
  }
}
rm2(dryrun = TRUE)
rm2(verbose = TRUE)
Update Added envir argument to rm in (1). It was already in (2).
Update 2 Minor improvements to (2).

You may want to try:
sapply(ls(), function(x) {
if (is.data.frame(get(x)) && dim(get(x))[2]<3) rm(list=x,envir=.GlobalEnv)
})
If you want to suppress the printed output, you can wrap the call in invisible():
invisible(sapply(ls(), function(x) {
if (is.data.frame(get(x)) && dim(get(x))[2]<3) rm(list=x,envir=.GlobalEnv)
}))

Related

Overriding a method of data.table - failing to perfectly forward arguments

I'm able to successfully modify the behaviour of [.data.frame, but fail to do so for [.data.table.
For data.frame:
# Exact same signature as "[.data.frame" :
"[.my.data.frame" <- function (x, i, j,
drop = if (missing(i)) TRUE
else length(cols) == 1) {
if(!missing(j) && j==8 ) {
cat("Oy vey\n")
}
NextMethod()
}
df <- data.frame(a=1,b=2)
class(df) <- c("my.data.frame", class(df))
# Works as expected:
df[1,2] # 2
df[1,8] # Oy Vey NULL
df[1,] # 1 2
However, for (the considerably more complicated) data.table:
# Exact same signature as "[.data.table" :
"[.my.data.table" <- function (x, i, j, by, keyby, with = TRUE, nomatch = getOption("datatable.nomatch"),
mult = "all", roll = FALSE,
rollends = if (roll == "nearest") c(TRUE, TRUE)
else if (roll >= 0) c(FALSE, TRUE) else c(TRUE, FALSE),
which = FALSE, .SDcols, verbose = getOption("datatable.verbose"),
allow.cartesian = getOption("datatable.allow.cartesian"),
drop = NULL, on = NULL) {
if(!missing(j) && j==8 ) {
cat("Oy vey\n")
}
NextMethod()
}
dt <- data.table(a=1,b=2)
class(dt) <- c("my.data.table", class(dt))
dt[1,2] # ERROR: i is not found in calling scope and it is not a column of type logical. When the first argument inside DT[...] is a single symbol, data.table looks for it in calling scope.
I know better than to pass arguments to NextMethod. It looks like I must call [.data.table explicitly, capture and pass the arguments as unevaluated promises - but all my attempts with quote, substitute or match.call have so far failed. Any insight would be appreciated.
I've found a partial solution, posting here in hope someone might improve on it.
"[.my.data.table" <- function (x, ...) {
# Modifications and tests galore - which can be tricky with this signature
class(x) <- class(x)[-1]
ret <- x[...]
class(x) <- c("my.data.table", class(x))
ret
}
I still consider this partial, because actually doing something in the function probably involves at least something like arglist <- list(...), and this fails when [ is called like this -
dt[1,]
Other directions are still very welcome.

Unexpected behavior from `...` in a function's definition

NB: This question is not a duplicate of How to use R's ellipsis feature when writing your own function?. That question asks how to use ellipses, and in particular, "how can [one] convert the ellipsis from the function's signature into, for example, a list?" Below I don't ask anything of the sort. I am asking why my use of ellipses fails to produce the expected result.
As an example, the function my.ls below is meant to be a wrapper around base::ls that makes all.names = TRUE the default:
my.ls <- function (...) base::ls(..., all.names = TRUE)
I had expected that, with this definition, my.ls() would produce the same value as base::ls(all.names = TRUE) does. To my surprise, the values from these two expressions are not even close.
For example:
% /usr/bin/R --quiet --vanilla
> x <- 3; .y <- 1; z <- 4
> base::ls(all.names = TRUE)
[1] "x" ".y" "z"
> my.ls <- function (...) base::ls(..., all.names = TRUE)
> my.ls()
[1] "..."
What am I doing wrong?
Is it because the default argument for envir in base::ls is as.environment(pos)?
Adding envir = globalenv() seems to work for me, assuming you are working from the global environment.
OK, I figured out the problem. The following implementation of my.ls is closer to what I'm after:
my.ls <- function (name, all.names = TRUE, ...) {
if (missing(name))
return(my.ls(parent.frame(1), all.names = all.names, ...))
base::ls(name, all.names = all.names, ...)
}
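For what it's worth, the reason my.ls() returned "..." is that ls() with no name lists the frame it was called from, which here is my.ls's own frame, and the only thing in that frame is `...`. Below is a shorter sketch along the same lines. It assumes the environment is only ever chosen through envir (a name passed through `...` would be ignored), and my.ls2 and f are hypothetical names used just for the demonstration.

```r
# Hypothetical variant: default to the caller's frame instead of our own
my.ls2 <- function(envir = parent.frame(), ...) {
  force(envir)  # resolve the caller's frame before handing control to ls()
  base::ls(envir = envir, all.names = TRUE, ...)
}

# Demonstration inside a function, so the result does not depend on the workspace
f <- function() {
  x <- 3; .y <- 1; z <- 4
  my.ls2()
}
res <- f()
print(res)
```

The order of the names depends on the locale's collation, as in the transcript above.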

The arguments of rm() in language R

I am new to R, and I found something odd.
When using rm(), I wonder why I can't pass ls() as an argument,
while rm(list = ls()) works.
ls() returns a list of names, doesn't it?
This is the first time I have asked a question on a foreign website, and my English is poor, sorry! Looking forward to your answers!
It has to do with the ... special argument in R (AKA "dot-dot-dot" or "ellipsis"). ... captures all unnamed arguments (as well as named arguments that match no formal parameter), positionally.
See ?rm for its arguments: rm(..., list = character(), pos = -1, envir = as.environment(pos), inherits = FALSE).
Since ... is the first argument, it captures ls() in rm(ls()).
But there are expectations on ... as you can see in the source code of rm (simply type rm at the command line):
function (..., list = character(), pos = -1, envir = as.environment(pos),
inherits = FALSE)
{
dots <- match.call(expand.dots = FALSE)$...
if (length(dots) && !all(vapply(dots, function(x) is.symbol(x) ||
is.character(x), NA, USE.NAMES = FALSE)))
stop("... must contain names or character strings")
names <- vapply(dots, as.character, "")
if (length(names) == 0L)
names <- character()
list <- .Primitive("c")(list, names)
.Internal(remove(list, envir, inherits))
}
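Here it is the is.symbol() || is.character() check that fails: the unevaluated argument ls() is a call, which is neither a symbol nor a character string.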
Maybe it will be easier with an example:
foo <- 1L
bar <- 2L
rm(ls())
# Error
ls()
# [1] "bar" "foo"
rm(c("foo", "bar"))
# Same error
rm("foo", "bar")
# OK
If you want to investigate further, I suggest: debugonce(rm) then rm(ls()) then step by step execution (easier in an IDE like RStudio).
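To make the check inside rm more concrete, here is a minimal sketch that captures `...` the same way rm does and reports what kind of object each unevaluated argument is. show_dots is a hypothetical helper, and the scratch environment env is only there so the example does not delete anything from your workspace.

```r
# Hypothetical helper: capture the dots unevaluated, as rm() does,
# and report the class of each captured argument
show_dots <- function(...) {
  dots <- match.call(expand.dots = FALSE)$...
  vapply(dots, function(x) class(x)[1], "", USE.NAMES = FALSE)
}
kinds <- show_dots(foo, "bar", ls())
print(kinds)  # a symbol, a character string, and a call

# The supported way to pass a character vector of names is the list argument
env <- new.env()
assign("foo", 1L, envir = env)
assign("bar", 2L, envir = env)
rm(list = ls(env), envir = env)
left <- ls(env)
print(left)
```

The first two kinds pass rm's check; the third (the call) is what triggers "... must contain names or character strings".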

Pass an argument in the form of a character vector in order to use it later inside the select argument of the `subset()` function

I'm trying to pass an argument in the form of a character vector (called keep_col) in order to use it later inside the select argument of the subset() function, and all of this is inside a bigger function called early_prep() I created.
Below is the relevant part of my code.
early_prep <- function(file_name, results_name, id = NULL ,keep_rows = FALSE, keep_col = FALSE, within_vars = c(), reaction_time = NULL, accuracy = NULL, clear_all = FALSE, decimal_places = 4){
if (clear_all %in% TRUE){
# Removes all objects from the console
rm(list = ls())
}
# Call read_data() function
read_data(file_name)
if (keep_rows != FALSE) {
raw_data <<- subset(raw_data, eval(parse(text = keep_rows)))
# Print to console
print("#### Deleting unnecesarry rows in raw_data ####", quote = FALSE)
}
if (keep_col != FALSE) {
raw_data <<- subset(raw_data, select = keep_col)
# Print to console
print("#### Deleting unnecesarry columns in raw_data ####", quote = FALSE)
}
}
The problem is when I call early_prep(file_name ="n44.txt", keep_col = c("subject", "soa", "congruency")) I get the following warning message:
> early_prep(file_name = "n44.txt", keep_col = c("subject", "soa", "congruency"))
[1] #### Reading txt file ####
[1] #### Deleting unnecesarry columns in raw_data ####
Warning message:
In if (keep_col != FALSE) { :
the condition has length > 1 and only the first element will be used
Does anyone have an idea about how I can solve this problem?
Any help will be greatly appreciated
Best,
Ayala
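One possible way around the warning, sketched under some assumptions: keep_col != FALSE compares every element of the character vector with FALSE and so yields one logical per column name, while if() needs a single value (in R 4.2.0 and later, a longer condition is an error rather than a warning). Using NULL as the "not supplied" default and testing with is.null() avoids that. select_cols and the toy data frame raw are hypothetical names, not part of the original early_prep(), and the sketch indexes with [ instead of subset() to keep it free of non-standard evaluation.

```r
# Hypothetical helper showing the NULL-default pattern
select_cols <- function(data, keep_col = NULL) {
  # is.null() always returns a single TRUE/FALSE, so it is safe inside if()
  if (!is.null(keep_col)) {
    data <- data[, keep_col, drop = FALSE]
  }
  data
}

# Toy data standing in for raw_data
raw <- data.frame(subject = 1:3,
                  soa = c(100, 200, 300),
                  congruency = c("c", "i", "c"),
                  rt = c(512, 634, 580))

kept <- select_cols(raw, keep_col = c("subject", "soa", "congruency"))
untouched <- select_cols(raw)
print(names(kept))
```

The same pattern would apply to keep_rows, which has the same FALSE default in the code above.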

Grep in R using conditions on the matching

I am using the tm package in R and would like to change the stemCompletion function a little.
Currently when I have a string
x <- c('everi','new')
and a pattern
dictionary <- c ('every','everyone','new')
When I run the built-in code in stemCompletion, the part that runs is the following
possibleCompletions <- lapply(x, function(w) grep(sprintf("^%s", w),
dictionary,
value = TRUE))
structure(sapply(possibleCompletions, "[", 1), names = x)
and the result is
everi new
NA "new"
I want to change the function so that if grep does not find anything for a particular value of x,
it tries again after dropping the last character of the string: in my case, 'ever' instead of 'everi'.
I tried this code, but it does not work.
substrLeft <- function(x, n) { substr(x, 1, nchar(x)-n) }
possibleCompletions <- lapply(x, function(w)
if (grepl(sprintf("^%s", w),dictionary,fixed = FALSE) = FALSE) {
grep(sprintf("^%s", substrLeft(w,1)),dictionary,value = TRUE,fixed = FALSE)
} else {
grep(sprintf("^%s", w),dictionary,value = TRUE,fixed = FALSE, invert = TRUE)
})
structure(sapply(possibleCompletions, "[", 1), names = x)
Thanks all.
Basically you have a few syntax problems.
First, = is assignment, == compares.
Also, if conditions in R must be, according to the manual, "a single logical value". You could get the code to parse by using == inside the condition ((grepl(sprintf("^%s", w),dictionary,fixed = FALSE) == FALSE)), but you'll get warnings, since grepl returns a logical vector with one element per entry of dictionary (length 3 here) rather than a single value.
Finally, your one-line function has expanded to several lines, so you'll need to wrap it in {}:
possibleCompletions <- lapply(x, function (w) {
+ if ((grepl(sprintf("^%s", w),dictionary,fixed = FALSE) == FALSE) ) {
+ grep(sprintf("^%s", substrLeft(w,1)),dictionary,value = TRUE,fixed = FALSE)
+ } else {
+ grep(sprintf("^%s", w),dictionary,value = TRUE,fixed = FALSE, invert = TRUE)
+ }
+ })
Warning messages:
1: In if ((grepl(sprintf("^%s", w), dictionary, fixed = FALSE) == FALSE)) { :
the condition has length > 1 and only the first element will be used
2: In if ((grepl(sprintf("^%s", w), dictionary, fixed = FALSE) == FALSE)) { :
the condition has length > 1 and only the first element will be used
> str(possibleCompletions)
List of 2
$ : chr [1:2] "every" "everyone"
$ : chr "new"
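To avoid the warnings altogether, one option is to test whether grep found anything at all rather than comparing the whole grepl vector. This is a minimal sketch, not tm's own implementation: complete_one is a hypothetical helper, and it assumes the invert = TRUE in the else branch above was unintended, so it is dropped here.

```r
x <- c("everi", "new")
dictionary <- c("every", "everyone", "new")

substrLeft <- function(x, n) substr(x, 1, nchar(x) - n)

# Hypothetical helper: try the full stem first, then retry without its last character
complete_one <- function(w) {
  hits <- grep(sprintf("^%s", w), dictionary, value = TRUE)
  if (length(hits) == 0L) {  # a single TRUE/FALSE, so if() is happy
    hits <- grep(sprintf("^%s", substrLeft(w, 1)), dictionary, value = TRUE)
  }
  hits
}

possibleCompletions <- lapply(x, complete_one)
result <- structure(sapply(possibleCompletions, "[", 1), names = x)
print(result)
```

With the sample data, 'everi' falls back to 'ever' and picks up "every", while 'new' matches directly.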
