Using `subset()` within a secondary R function

When requested, function foo1 can subset a list by a desired condition (e.g., by = ESL == 1). Otherwise, foo1 simply returns the list itself.
For my purposes, I need to use foo1 within a new function called foo2. But foo2 fails with the error below. Why does it fail, and how can I fix it?
Error in eval(e, x, parent.frame()) : object 'ESL' not found
The full reproducible data and code is below:
foo1 <- function(by, data){
  L <- split(data, data$study.name) ; L[[1]] <- NULL
  if(!missing(by)){
    s <- substitute(by)
    H <- lapply(L, function(x) do.call("subset", list(x, s)))
    L <- Filter(nrow, H)
  }
  return(L)
}
## EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", h = T) ## Data
foo1(data = D, by = ESL == 1) ## works fine :-) ####
## BUT:
foo2 <- function(by, data){
  foo1(by = by, data = data)
}
## EXAMPLE OF USE:
foo2(data = D, by = ESL == 1) ## Fails :-( ####

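The original foo2 fails because substitute(by) inside foo1 only sees the symbol by, not the expression ESL == 1. When subset() later evaluates that symbol, the promise is forced in the global environment, where ESL does not exist. We can fix this by having foo2 substitute its arguments into the call to foo1 and then evaluate that call:
foo2 <- function(by, data){
  eval(substitute(foo1(by = by, data = data)))
}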
out1 <- foo1(data = D, by = ESL == 1)
out2 <- foo2(data = D, by = ESL == 1)
identical(out1, out2)
#[1] TRUE
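The next sketch isolates why the extra eval(substitute()) is needed. It uses made-up helpers (inner, outer_naive, outer_fixed) that only return what substitute() sees, so no data is required:

```r
# inner() returns the unevaluated expression passed as `by`
inner <- function(by) substitute(by)

# Passing `by` straight through: inner() only sees the symbol `by`
outer_naive <- function(by) inner(by)

# Substituting first builds the call inner(ESL == 1), then evaluates it
outer_fixed <- function(by) eval(substitute(inner(by)))

outer_naive(ESL == 1)  # the symbol: by
outer_fixed(ESL == 1)  # the call: ESL == 1
```

outer_fixed() plays the role of the fixed foo2: it hands the inner function the expression itself rather than the symbol by.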

Related

Subsetting in a second level R function

Function foo1 can subset a list by a requested condition (e.g., by = type == 1). Otherwise, foo1 simply returns the list itself.
For my purposes, I need to use foo1 within a new function called foo2.
In my code below, my desired output is obtained like so: foo2(data = D, by = G[[1]]) ; foo2(data = D, by = G[[2]]) ; foo2(data = D, by = G[[3]]).
But why do I get the error shown below when I loop over G using lapply?
foo1 <- function(data, by){
  L <- split(data, data$study.name) ; L[[1]] <- NULL
  if(!missing(by)){
    L <- lapply(L, function(x) do.call("subset", list(x, by)))
  }
  return(L)
}
foo2 <- function(data, by){
  eval(substitute(foo1(data = data, by = by)))
}
## EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", h = T) ## Data
G <- lapply(unique(na.omit(D$type)), function(i) bquote(type == .(i)))# all levels of `type`
foo2(data = D, by = G[[1]]) # Works fine without `lapply` :-)
lapply(1:3, function(i) foo2(data = D, by = G[[i]])) # Doesn't work with `lapply`! :-(
# Error in do.call("subset", list(x, by)) : object 'i' not found

Your foo2 function tries to evaluate the expression
foo1(data = D, by = G[[i]])
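but it doesn't have i available. That call is evaluated in foo2's own frame, whose enclosing environment is the global environment, so the anonymous function's i is never visible. You need to evaluate G[[i]] in the function you're passing to lapply to get an expression defining the subset, and then pass that expression on unchanged so that subset() evaluates it inside each data frame. I recommend naming that function instead of using an anonymous one; it makes debugging a lot easier.
Here's some recoding that appears to work:
Redefine foo2 to
foo2 <- function(data, by){
  # `by` is already a call such as type == 1; foo1's do.call() splices it
  # into subset(), which evaluates it within each split data frame
  foo1(data = data, by = by)
}
and
foo3 <- function(i) {
  expr <- G[[i]]
  foo2(data = D, by = expr)
}
and then
lapply(1:3, foo3)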
I'm not sure this does exactly what you want, but it should be close enough that you can fix it up.
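Here is a self-contained sketch of the forwarding approach. It uses a small made-up data frame in place of the remote k.csv; only the column names study.name and type come from the question. It shows that lapply() over G works once foo2 passes the call object through instead of using eval(substitute()):

```r
# Made-up stand-in for k.csv; only the column names come from the question
D <- data.frame(study.name = c("s1", "s2", "s2", "s3", "s3"),
                type = c(1, 1, 2, 2, 3))

foo1 <- function(data, by) {
  L <- split(data, data$study.name); L[[1]] <- NULL
  if (!missing(by)) {
    # do.call() splices the call object `by` into subset(x, <call>)
    L <- lapply(L, function(x) do.call("subset", list(x, by)))
  }
  L
}

# foo2 forwards the already-built call; no eval(substitute()) needed
foo2 <- function(data, by) foo1(data = data, by = by)

G <- lapply(unique(na.omit(D$type)), function(i) bquote(type == .(i)))
res <- lapply(seq_along(G), function(i) foo2(data = D, by = G[[i]]))
```

The trade-off is that this foo2 expects a ready-made call such as G[[i]]. A bare expression, as in foo2(data = D, by = type == 1), would now be evaluated in the caller's environment and fail.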

Instead of using lapply, here a for loop can be used
lst1 <- vector("list", length(G))
for(i in 1:3) lst1[[i]] <- foo2(data = D, by = G[[i]])
Checking:
identical(lst1[[2]], foo2(data = D, by = G[[2]]))
#[1] TRUE
identical(lst1[[3]], foo2(data = D, by = G[[3]]))
#[1] TRUE
For the lapply part, the error comes from foo2: it evaluates the substituted call in its own frame, where the anonymous function's i is not visible. Calling foo1 directly avoids that step (the new variable name j is incidental):
lst2 <- lapply(1:3, function(j) foo1(data = D, by = G[[j]]))
Checking:
identical(lst2[[2]], lst1[[2]])
#[1] TRUE

how to append an element to a list without keeping track of the index?

I am looking for the R equivalent of this simple code in Python:
mylist = []
for this in that:
    df = 1
    mylist.append(df)
Basically, I just create an empty list and then add the objects created within the loop to it.
I only saw R solutions where one has to specify the index of the new element (say mylist[[i]] <- df), which requires creating an index i in the loop.
Is there any simpler way to just append after the last element?

There is a function called append:
ans <- list()
for (i in 1992:1994){
  n <- 1 #whatever the function is
  ans <- append(ans, n)
}
ans
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
Note: Using apply functions instead of a for loop is better (not necessarily faster) but it depends on the actual purpose of your loop.
Answering OP's comment: about using ggplot2 and saving plots to a list, something like this would be more efficient:
library(ggplot2)
plotlist <- lapply(seq(2, 4), function(i) {
  dat <- mtcars[mtcars$cyl == 2 * i, ]
  ggplot() + geom_point(data = dat, aes(x = cyl, y = mpg))
})
Thanks to @Wen for sharing a comparison of the c() and append() functions:
"Concatenation (c) is pretty fast, but append is even faster and therefore preferable when concatenating just two vectors."

There is: mylist <- c(mylist, df) but that's usually not the recommended way in R. Depending on what you're trying to achieve, lapply() is often a better option.
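Here is a minimal sketch of the lapply() route. It assumes `that` is a vector (1:5 here, made up) and that the loop body computes one object per element; squaring stands in for the real work:

```r
that <- 1:5  # hypothetical input standing in for `that`
# lapply() returns one list element per input element, so there is no
# index bookkeeping and no appending
mylist <- lapply(that, function(this) this^2)
```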

mylist <- list()
for (i in 1:100){
  n <- 1
  mylist[[(length(mylist) +1)]] <- n
}
This seems to me the faster solution.
library(microbenchmark)
x <- 1:1000
aa <- microbenchmark({xx <- list(); for(i in x) {xx <- append(xx, values = i)} })
bb <- microbenchmark({xx <- list(); for(i in x) {xx <- c(xx, i)} } )
cc <- microbenchmark({xx <- list(); for(i in x) {xx[(length(xx) + 1)] <- i} } )
sapply(list(aa, bb, cc), (function(i){ median(i[["time"]]) / 10e5 }))
#{append}=4.466634 #{c}=3.185096 #{this.one}=2.925718
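When the final length is known up front, another option is to preallocate with vector("list", n). It does need an index, which the question hoped to avoid, but the list is allocated only once. Here is a sketch; no timings were measured for it:

```r
x <- 1:1000
# Allocate the whole list once, then fill it slot by slot
xx <- vector("list", length(x))
for (k in seq_along(x)) xx[[k]] <- x[k]
```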

mylist <- list()
for (i in 1:100) {
  df <- 1
  mylist <- c(mylist, df)
}
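One caveat is worth showing, since the question's object is named df. If each new element is itself a list, such as a data frame, c() (and append(), which calls c()) splices its components in; wrapping it in list() keeps it whole. Here is a sketch with made-up data:

```r
mylist <- list()
for (i in 1:3) {
  df <- data.frame(id = i, value = i * 10)  # made-up data frame
  # list(df) keeps the data frame as a single element; plain c(mylist, df)
  # would splice its columns into mylist instead
  mylist <- c(mylist, list(df))
}
length(mylist)  # 3

spliced <- c(list(), data.frame(id = 1, value = 10))
length(spliced)  # 2: one element per column
```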

Use
first_list = list(a=0,b=1)
newlist = c(first_list,list(c=2,d=3))
print(newlist)
$a
[1] 0
$b
[1] 1
$c
[1] 2
$d
[1] 3
Here's an example:
glmnet_params = list(family="binomial", alpha = 1,
                     type.measure = "auc", nfolds = 3, thresh = 1e-4, maxit = 1e3)
Now:
library(glmnet)
glmnet_classifier = do.call("cv.glmnet",
                            c(list(x = dtm_train, y = train$target), glmnet_params))
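The same c()-then-do.call() pattern works with any function. Here is a base-R sketch that needs no glmnet data, with mean() standing in for cv.glmnet(); base_args and opts are made-up names:

```r
# Fixed arguments plus a reusable list of options, combined with c()
base_args <- list(x = c(1, NA, 3))
opts <- list(na.rm = TRUE)
do.call("mean", c(base_args, opts))  # 2
```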

Add a progress bar to boot function in R

I am trying to add a progress bar to a bootstrap function in R.
I tried to make the example function as simple as possible (hence I'm using mean in this example).
library(boot)
v1 <- rnorm(1000)
rep_count = 1
m.boot <- function(data, indices) {
  d <- data[indices]
  setWinProgressBar(pb, rep_count)
  rep_count <- rep_count + 1
  Sys.sleep(0.01)
  mean(d, na.rm = T)
}
tot_rep <- 200
pb <- winProgressBar(title = "Bootstrap in progress", label = "",
min = 0, max = tot_rep, initial = 0, width = 300)
b <- boot(v1, m.boot, R = tot_rep)
close(pb)
The bootstrap runs properly, but the value of rep_count does not increase in the loop, and the progress bar stays frozen during the process.
If I check the value of rep_count after the bootstrap is complete, it is still 1.
What am I doing wrong? Maybe the boot function does not simply call m.boot in a loop, so the variables in it are not increased?
Thank you.

You could use the package progress as below:
library(boot)
library(progress)
v1 <- rnorm(1000)
#add progress bar as parameter to function
m.boot <- function(data, indices, prog) {
  #display progress with each run of the function
  prog$tick()
  d <- data[indices]
  Sys.sleep(0.01)
  mean(d, na.rm = T)
}
tot_rep <- 200
#initialize progress bar object
pb <- progress_bar$new(total = tot_rep + 1)
#perform bootstrap
boot(data = v1, statistic = m.boot, R = tot_rep, prog = pb)
It is necessary to set the number of ticks for progress_bar to tot_rep + 1, one more than the number of bootstrap replicates (parameter R). This is because boot() calls the statistic function once on the original data to compute t0, and then once per replicate. If the progress bar is set to only R ticks, it thinks the job is finished before it really is, and the extra tick throws an error.
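The extra call is easy to check. The sketch below counts how often boot() invokes the statistic; count_calls is a made-up helper. The counter is kept inside a function, so nothing leaks into the global environment:

```r
library(boot)

# Count how many times boot() calls the statistic; the counter lives in
# count_calls()'s frame and is updated with <<-
count_calls <- function(R) {
  n_calls <- 0
  stat <- function(data, indices) {
    n_calls <<- n_calls + 1
    mean(data[indices])
  }
  boot(rnorm(50), stat, R = R)
  n_calls
}

count_calls(20)  # 21: one call for t0 plus R = 20 replicates
```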

The pbapply package was designed to work with vectorized functions. There are 2 ways to achieve that in the context of this question: (1) write a wrapper as was suggested, which will not produce the same object of class 'boot'; (2) alternatively, the line lapply(seq_len(RR), fn) can be written as pblapply(seq_len(RR), fn). Option 2 can happen either by locally copying/updating the boot function as shown in the example below, or asking the package maintainer, Brian Ripley, if he would consider adding a progress bar directly or through pbapply as dependency.
My solution (changes indicated by comments):
library(boot)
library(pbapply)
boot2 <- function (data, statistic, R, sim = "ordinary", stype = c("i",
"f", "w"), strata = rep(1, n), L = NULL, m = 0, weights = NULL,
ran.gen = function(d, p) d, mle = NULL, simple = FALSE, ...,
parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",
1L), cl = NULL)
{
call <- match.call()
stype <- match.arg(stype)
if (missing(parallel))
parallel <- getOption("boot.parallel", "no")
parallel <- match.arg(parallel)
have_mc <- have_snow <- FALSE
if (parallel != "no" && ncpus > 1L) {
if (parallel == "multicore")
have_mc <- .Platform$OS.type != "windows"
else if (parallel == "snow")
have_snow <- TRUE
if (!have_mc && !have_snow)
ncpus <- 1L
loadNamespace("parallel")
}
if (simple && (sim != "ordinary" || stype != "i" || sum(m))) {
warning("'simple=TRUE' is only valid for 'sim=\"ordinary\", stype=\"i\", n=0', so ignored")
simple <- FALSE
}
if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
runif(1)
seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
n <- NROW(data)
if ((n == 0) || is.null(n))
stop("no data in call to 'boot'")
temp.str <- strata
strata <- tapply(seq_len(n), as.numeric(strata))
t0 <- if (sim != "parametric") {
if ((sim == "antithetic") && is.null(L))
L <- empinf(data = data, statistic = statistic, stype = stype,
strata = strata, ...)
if (sim != "ordinary")
m <- 0
else if (any(m < 0))
stop("negative value of 'm' supplied")
if ((length(m) != 1L) && (length(m) != length(table(strata))))
stop("length of 'm' incompatible with 'strata'")
if ((sim == "ordinary") || (sim == "balanced")) {
if (isMatrix(weights) && (nrow(weights) != length(R)))
stop("dimensions of 'R' and 'weights' do not match")
}
else weights <- NULL
if (!is.null(weights))
weights <- t(apply(matrix(weights, n, length(R),
byrow = TRUE), 2L, normalize, strata))
if (!simple)
i <- index.array(n, R, sim, strata, m, L, weights)
original <- if (stype == "f")
rep(1, n)
else if (stype == "w") {
ns <- tabulate(strata)[strata]
1/ns
}
else seq_len(n)
t0 <- if (sum(m) > 0L)
statistic(data, original, rep(1, sum(m)), ...)
else statistic(data, original, ...)
rm(original)
t0
}
else statistic(data, ...)
pred.i <- NULL
fn <- if (sim == "parametric") {
ran.gen
data
mle
function(r) {
dd <- ran.gen(data, mle)
statistic(dd, ...)
}
}
else {
if (!simple && ncol(i) > n) {
pred.i <- as.matrix(i[, (n + 1L):ncol(i)])
i <- i[, seq_len(n)]
}
if (stype %in% c("f", "w")) {
f <- freq.array(i)
rm(i)
if (stype == "w")
f <- f/ns
if (sum(m) == 0L)
function(r) statistic(data, f[r, ], ...)
else function(r) statistic(data, f[r, ], pred.i[r,
], ...)
}
else if (sum(m) > 0L)
function(r) statistic(data, i[r, ], pred.i[r, ],
...)
else if (simple)
function(r) statistic(data, index.array(n, 1, sim,
strata, m, L, weights), ...)
else function(r) statistic(data, i[r, ], ...)
}
RR <- sum(R)
res <- if (ncpus > 1L && (have_mc || have_snow)) {
if (have_mc) {
parallel::mclapply(seq_len(RR), fn, mc.cores = ncpus)
}
else if (have_snow) {
list(...)
if (is.null(cl)) {
cl <- parallel::makePSOCKcluster(rep("localhost",
ncpus))
if (RNGkind()[1L] == "L'Ecuyer-CMRG")
parallel::clusterSetRNGStream(cl)
res <- parallel::parLapply(cl, seq_len(RR), fn)
parallel::stopCluster(cl)
res
}
else parallel::parLapply(cl, seq_len(RR), fn)
}
}
else pblapply(seq_len(RR), fn) #### changed !!!
t.star <- matrix(, RR, length(t0))
for (r in seq_len(RR)) t.star[r, ] <- res[[r]]
if (is.null(weights))
weights <- 1/tabulate(strata)[strata]
boot.return(sim, t0, t.star, temp.str, R, data, statistic,
stype, call, seed, L, m, pred.i, weights, ran.gen, mle)
}
## Functions not exported by boot
isMatrix <- boot:::isMatrix
index.array <- boot:::index.array
boot.return <- boot:::boot.return
## Now the example
m.boot <- function(data, indices) {
d <- data[indices]
mean(d, na.rm = T)
}
tot_rep <- 200
v1 <- rnorm(1000)
b <- boot2(v1, m.boot, R = tot_rep)

The increased rep_count is a local variable and lost after each function call. In the next iteration the function gets rep_count from the global environment again, i.e., its value is 1.
You can use <<-:
rep_count <<- rep_count + 1
This assigns to the first rep_count found in the enclosing environments of the function (here, the global environment). Of course, using <<- is usually not recommended because side effects of functions should be avoided, but here you have a legitimate use case. However, you should probably wrap the whole thing in a function to avoid a side effect on the global environment.
There might be better solutions ...
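Here is a sketch of the suggestion to wrap the whole thing in a function. It uses utils::txtProgressBar() so that it also runs outside Windows, and the name boot_with_progress is made up. The bar's maximum is R + 1 because boot() also calls the statistic once on the original data:

```r
library(boot)

# Wraps boot() so the counter and progress bar live in this function's
# frame rather than in the global environment
boot_with_progress <- function(data, R) {
  # R + 1 steps: boot() also calls the statistic once to compute t0
  pb <- txtProgressBar(min = 0, max = R + 1, style = 3)
  on.exit(close(pb))
  rep_count <- 0
  m.boot <- function(data, indices) {
    rep_count <<- rep_count + 1  # updates boot_with_progress()'s rep_count
    setTxtProgressBar(pb, rep_count)
    mean(data[indices], na.rm = TRUE)
  }
  boot(data, m.boot, R = R)
}

b <- boot_with_progress(rnorm(1000), R = 200)
```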

I think I found a possible solution. It merges the answer of @Roland with the convenience of the pbapply package, using its functions startpb(), setpb(), closepb(), etc.:
library(boot)
library(pbapply)
v1 <- rnorm(1000)
rep_count = 1
tot_rep = 200
m.boot <- function(data, indices) {
  d <- data[indices]
  setpb(pb, rep_count)
  rep_count <<- rep_count + 1
  Sys.sleep(0.01) #Just to slow down the process
  mean(d, na.rm = T)
}
pb <- startpb(min = 0, max = tot_rep)
b <- boot(v1, m.boot, R = tot_rep)
closepb(pb)
rep_count = 1
As previously suggested, wrapping everything in a function avoids messing with the rep_count variable.

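The progress bar from the package dplyr works well (note that progress_estimated() has since been deprecated in dplyr 1.0.0 in favour of the progress package):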
library(dplyr)
library(boot)
v1 <- rnorm(1000)
m.boot <- function(data, indices) {
  d <- data[indices]
  p$tick()$print() # update progress bar
  Sys.sleep(0.01)
  mean(d, na.rm = T)
}
tot_rep <- 200
p <- progress_estimated(tot_rep+1) # init progress bar
b <- boot(v1, m.boot, R = tot_rep)

You can use the package pbapply
library(boot)
library(pbapply)
v1 <- rnorm(1000)
rep_count = 1
# your m.boot function ....
m.boot <- function(data, indices) {
  d <- data[indices]
  mean(d, na.rm = T)
}
# ... wrapped in `bootfunc`
bootfunc <- function(x) { boot(x, m.boot, R = 200) }
# apply function to v1, showing a progress bar; note that pblapply() iterates
# over the elements of v1, so boot() runs once per value of v1
pblapply(v1, bootfunc)
# > b <- pblapply(v1, bootfunc)
# > |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% Elapsed time: 02s

Sorting list of list of elements of a custom class in R?

I have a custom class object (a list of tuples).
I have defined <.myclass, >.myclass and ==.myclass on it as well.
Now I have:
a <- obj1 # of myclass
b <- obj2 # of myclass
c <- obj3 # of myclass
L <- list(list(a,12,1),list(b,215,23),list(c,21,9))
I want to sort L on index 1, i.e., on the myclass objects. If b < c < a, I want the sorted L in this form: list(list(b,215,23),list(c,21,9),list(a,12,1))
How do I achieve this?
In my searches, I found how to sort on particular index, and using that I wrote the following function
magic_sort <- function(lst, sortind, dec = T) {
  return(lst[order(sapply(lst,'[[',sortind), decreasing = dec)])
}
But when I give index 1 to it, to sort on obj1, it fails with
> magic_sort(L,1)
Error in order(sapply(lst, "[[", sortind), decreasing = dec) :
unimplemented type 'list' in 'orderVector1'
Is there any fix for this? In general, can I have functions like sort, minimum and so on, based on custom definition of comparison operators?
Edit: The following will perhaps help in understanding the structure better: http://pastebin.com/0M7JRLTu
Edit 2:
library("sets")
a <- list()
class(a) <- "dfsc"
a[[1]] <- tuple(1L, 2L, "C", "a", "B")
b <- list()
class(b) <- "dfsc"
b[[1]] <- tuple(1L, 2L, "A", "b", "B")
c <- list()
class(c) <- "dfsc"
c[[1]] <- tuple(1L, 2L, "A", "a", "B")
L <- list()
L[[1]] <- list(a, 12, 132)
L[[2]] <- list(b, 21, 21)
L[[3]] <- list(c, 32, 123)
`<.dfsc` <- function(c1, c2) {
  return(lt_list(toList(c1),toList(c2)))
}
`==.dfsc` <- function(c1, c2) {
  return(toString(c1) == toString(c2))
}
`>.dfsc` <- function(c1, c2) {
  return(!((c1 < c2) || (c1 == c2)))
}
lt_list <- function(l1, l2) {
  n1 <- length(l1)
  n2 <- length(l2)
  j = 1
  while(j <= n1 && j <= n2) {
    if (l1[[j]] != l2[[j]]) {
      return (l1[[j]] < l2[[j]])
    }
    j = j + 1
  }
  return(n1 < n2)
}
toString.dfsc <- function(x) {
  code_string <- ""
  #for(ii in x[[1]]) {
  for(ii in x) {
    code_string <- paste(code_string,"(",ii[[1]],",",ii[[2]],",",ii[[3]],",",ii[[4]],",",ii[[5]],")", sep = "")
  }
  return(code_string)
}
Now I want the sorted L to be list(list(c,_,_),list(b,_,_),list(a,_,_))

This answer from Aaron demonstrates exactly what is needed to apply a customized sort on a classed object. As Roland notes, you actually need to sort "L", so that is where the custom sort should focus. To provide flexibility in specifying which index of "L"'s elements to sort on, one way is to store an extra attribute on "L":
Turn "L" to an appropriate object:
class(L) = "myclass"
attr(L, "sort_ind") = 1L
Ops methods need to be defined (each extracts the relevant element of your data):
"<.myclass" = function(x, y)
{
  i = attr(x, "sort_ind") ## also check if 'x' and 'y' have the same 'attr(, "sort_ind")'
  x[[1]][[i]] < y[[1]][[i]]
}
"==.myclass" = function(x, y)
{
  i = attr(x, "sort_ind")
  x[[1]][[i]] == y[[1]][[i]]
}
">.myclass" = function(x, y)
{
  i = attr(x, "sort_ind")
  x[[1]][[i]] > y[[1]][[i]]
}
And a subset method:
"[.myclass" = function(x, i)
{
  y = .subset(x, i)
  attributes(y) = attributes(x)
  return(y)
}
The above methods (except perhaps "<") need to be defined because a call to sort()/order() on a classed object ends up calling rank(). rank() uses the internal helper .gt(), which subsets each pair of elements with `[` and compares them with == and >.
Finally, a get/set function for good measure:
sort_ind = function(x) attr(x, "sort_ind")
"sort_ind<-" = function(x, value)
{
  attr(x, "sort_ind") = value
  return(x)
}
And:
order(L)
#[1] 3 2 1
sort_ind(L) = 3
order(L)
#[1] 2 3 1
A sort method can also be created to wrap all of the above:
sort.myclass = function(x, sort_ind = attr(x, "sort_ind"), ...)
{
  sort_ind(x) = sort_ind
  NextMethod()
}
sort(L)
sort(L, sort_ind = 1)
(I assumed that your toList function would look like something toList = function(x) x[[1L]])
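The same machinery can be seen on a much smaller scale. This sketch uses a hypothetical class absnum, a list of numbers that should sort by absolute value. It defines only `[`, == and >, which is what order() and sort() need via xtfrm() and rank():

```r
# Hypothetical class "absnum": a list of numbers that sorts by absolute value
x <- structure(list(-3, 1, -2), class = "absnum")

# Subsetting must keep the class so that each x[i] still dispatches
`[.absnum` <- function(x, i) structure(unclass(x)[i], class = "absnum")

# Comparisons used by the internal .gt() helper when ranking
`==.absnum` <- function(e1, e2) abs(e1[[1]]) == abs(e2[[1]])
`>.absnum` <- function(e1, e2) abs(e1[[1]]) > abs(e2[[1]])

order(x)  # 2 3 1
sort(x)   # elements in the order 1, -2, -3
```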

I wanted to make use of the internal and supposedly more efficient sort, but it doesn't seem to have a facility to take a custom comparison operator. So I ended up using an implementation of quicksort to sort lists of lists on an arbitrary index, assuming comparison exists between the elements at that index.
swap <- function(a, b) {
  # assumed helper (swap() is not defined in this answer): exchanges the
  # two elements in the caller's frame, macro style
  eval.parent(substitute({ tmp <- a; a <- b; b <- tmp }))
}
# Lomuto partition around the last element, modifying the caller's list
part_qsort <- function(l, idx, low, high) {
  lst <- l
  pivot <- lst[[high]][[idx]]
  i <- low - 1
  for(j in low:(high-1)) {
    if ((lst[[j]][[idx]] < pivot) || (lst[[j]][[idx]] == pivot)) {
      i <- i + 1
      swap(lst[[i]], lst[[j]])
    }
  }
  swap(lst[[(i+1)]], lst[[high]])
  eval.parent(substitute(l <- lst))  # write the partitioned list back
  return(i+1)
}
# recursive calls to quicksort
qsort <- function(l, idx, low, high) {
  if (low < high) {
    lst <- l
    pi <- part_qsort(lst, idx, low, high)
    qsort(lst, idx, low, pi-1)
    qsort(lst, idx, pi+1, high)
    eval.parent(substitute(l <- lst))
  }
}
# usage: qsort(L, 1, 1, length(L)) sorts L in place on index 1
Another thing to look into can be library("rlist") which seems to have a bunch of functions implemented on lists.
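If a general-purpose alternative to the in-place quicksort is wanted, a merge sort can take the comparison as a function argument. This is a sketch, not part of the answer above. merge_sort_by and its less argument are made-up names, and the toy L holds strings in place of the myclass objects:

```r
# Stable merge sort for an R list, ordered by a comparator less(a, b)
# that returns TRUE when a should come before b
merge_sort_by <- function(lst, less) {
  n <- length(lst)
  if (n <= 1) return(lst)
  mid <- n %/% 2
  left <- merge_sort_by(lst[seq_len(mid)], less)
  right <- merge_sort_by(lst[(mid + 1):n], less)
  out <- vector("list", n)
  i <- 1
  j <- 1
  for (k in seq_len(n)) {
    # take from the right only when it is strictly smaller (keeps stability)
    take_right <- j <= length(right) &&
      (i > length(left) || less(right[[j]], left[[i]]))
    if (take_right) {
      out[[k]] <- right[[j]]
      j <- j + 1
    } else {
      out[[k]] <- left[[i]]
      i <- i + 1
    }
  }
  out
}

L <- list(list("a", 12, 1), list("b", 215, 23), list("c", 21, 9))
sorted <- merge_sort_by(L, function(p, q) p[[2]] < q[[2]])
sapply(sorted, `[[`, 1)  # "a" "c" "b"
```

With the question's dfsc objects, less could be function(p, q) p[[1]] < q[[1]], which dispatches to the custom `<.dfsc` method (assuming toList is defined). Merge sort was chosen here because it is stable and needs no in-place swaps.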

Create new functions using a list of functions and list of function parameters to Be Passed

I am trying to create new functions from a list of functions and a list of parameters to be passed to these functions, but have not managed to so far. Please see the example below.
fun_list <- list(f = function(x, params) {x+params[1]},
                 z = function(a, params) {a * params[1] * params[2]})
params_list <- list(f = 1, z = c(3, 5))
# goal is to create 2 new functions in global environment
# fnew <- function(x) {x+1}
# znew <- function(a) {a*3*5}
# I've tried
for(x in names(fun_list)){
  force(x)
  assign(paste0(x, "new"), function(...) fun_list[[x]] (..., params = params_list[[x]]))
}
The goal is to do this dynamically for arbitrary functions and parameters.

Well, force() doesn't work in a for-loop because for loops do not create new environments. Based on a previous question of mine, I created a capture() function
capture <- function(...) {
  vars <- sapply(substitute(...()), deparse);
  pf <- parent.frame();
  Map(assign, vars, mget(vars, envir=pf, inherits = TRUE), MoreArgs=list(envir=pf))
}
This allows:
for(x in names(fun_list)) {
  f = local({
    capture(x);
    p = params_list[[x]];
    f = fun_list[[x]];
    function(x) f(x, p)
  })
  assign(paste0(x, "new"), f)
}
where we create a local, private environment for the functions to store their default parameter values.
Which gives
fnew(2)
# [1] 3
znew(2)
# [1] 30
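An alternative that avoids capture() is a small function factory. Each call to it creates its own environment, which is exactly what the for loop lacks. Here is a sketch reusing fun_list and params_list from the question; make_fixed and new_funs are made-up names:

```r
fun_list <- list(f = function(x, params) {x + params[1]},
                 z = function(a, params) {a * params[1] * params[2]})
params_list <- list(f = 1, z = c(3, 5))

# Each call gets its own environment; force() fixes fun and params now
make_fixed <- function(fun, params) {
  force(fun)
  force(params)
  function(x) fun(x, params)
}

# Map() pairs functions and parameters by position and keeps the names f, z
new_funs <- Map(make_fixed, fun_list, params_list)
for (nm in names(new_funs)) assign(paste0(nm, "new"), new_funs[[nm]])

fnew(2)  # 3
znew(2)  # 30
```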

How about this:
for(x in names(fun_list)) {
  formals(fun_list[[x]])$params <- params_list[[x]]
  assign(paste0(x, "new"), fun_list[[x]])
}

This is similar in spirit:
ps <- list(fp=1,zp=c(3,5))
f0s <- substitute(list(f=function(x)x+fp,z=function(a)a*zp1*zp2),as.list(unlist(ps)))
f0s # list(f = function(x) x + 1, z = function(a) a * 3 * 5)
fs <- eval(f0s)
fs$f(1) # 2
To do the fancy thing described in the OP, you'd probably have to mess with formals.

Develop Reference
