check for a particular object name in a loop - r

I have a loop with a lot of data frames.
When I come to certain data frames my logic needs to adjust. I'm looking for a way to make a TRUE/FALSE statement out of this, but I'm not sure exactly how to pull this off.
I want to convert the actual name of the variable into the text of the variable.
my_list_of_dfs = list(iris,mtcars)
for (i in my_list_of_dfs){ if i =='iris'{ print('this works') } }

Try a named list instead
# Created a named list instead...
my_list_of_dfs = list('iris'=iris, 'mtcars'=mtcars)
# check the names
names(my_list_of_dfs)
# [1] "iris" "mtcars"
for (i in names(my_list_of_dfs) ) {
if (i =='iris') {
print('this works')
}
print(my_list_of_dfs[[i]]) # [[i]] returns the data frame itself; [i] would return a one-element list
}
BTW, you also forgot the parentheses around the condition in your if statement.

An addition to the answer by @Ismail (so please upvote that answer if you agree with the idea of using a named list) is to create a function that generates the named list for you. That is, instead of explicitly typing something like list('iris'=iris, 'mtcars'=mtcars), the following function takes R objects, combines them in a list, and names the list elements after the objects:
named_list <- function(...) {
.l <- list(...)
.names <- deparse(substitute(list(...)))
.names <- strsplit(gsub("list\\(|\\)| ", "", .names), ",")[[1]]
names(.l) <- .names
.l
}
x <- 3
y <- data.frame(a=1,b=2)
named_list(x, y)
#> $x
#> [1] 3
#>
#> $y
#> a b
#> 1 1 2
my_list_of_dfs <- named_list(iris, mtcars)
names(my_list_of_dfs )
#> [1] "iris" "mtcars"
You can then follow @Ismail's answer from here with something like:
for (i in names(my_list_of_dfs )) {
if (i == "iris")
print(names(my_list_of_dfs[[i]]))
else
print ("This is NOT iris")
}
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#> [1] "This is NOT iris"
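A variation that avoids the string surgery is to take the unevaluated arguments from substitute() and deparse each one separately. This is only a sketch: named_list2 is a made-up name, and it assumes every argument is passed positionally, not as name = value. The point is that an argument containing a comma or a space, such as c(1, 2), would not be split or altered the way the gsub/strsplit version would split it.

```r
# Hypothetical helper: name each list element after the expression
# that was passed in for it.
named_list2 <- function(...) {
  values <- list(...)
  # substitute(list(...)) is the call list(arg1, arg2, ...);
  # dropping its first element leaves the unevaluated arguments.
  exprs <- as.list(substitute(list(...)))[-1L]
  names(values) <- vapply(
    exprs,
    function(e) paste(deparse(e), collapse = ""),
    character(1)
  )
  values
}

my_list_of_dfs <- named_list2(iris, mtcars)
names(my_list_of_dfs)
```

From here the loop over names(my_list_of_dfs) works exactly as before.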


How to get rid of data frame name in `deparse(substitute())`?

Just as the title asks: how can I get rid of the data frame name in deparse(substitute())?
For example, create a simple function:
foo<-function(x){
print(
paste(
'This is called', deparse(substitute(x))
)
)
}
Feeding it with a vector gives desired behaviour:
test<-1:3
foo(test)
[1] "This is called test"
But when I pass a column of data frame it includes data frame name:
> df<-data.frame(ones=c(1,1,1), twos=c(2,2,2))
> foo(df$ones)
[1] "This is called df$ones"
How to get rid of this df$ in output to get "This is called ones" (hopefully with no use of regular expressions)?
The way to do this seems to be not to build it into the function, but to use the function that was presented in the question like this:
with(df, foo(ones))
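A minimal side-by-side sketch of why this works, assuming a variant of foo (called describe here, a made-up name) that returns the string instead of printing it. Inside with(), the call that reaches the function is written with the bare column name, so that is what substitute() sees.

```r
# Hypothetical variant of foo that returns the label.
describe <- function(x) paste("This is called", deparse(substitute(x)))

df <- data.frame(ones = c(1, 1, 1), twos = c(2, 2, 2))

direct   <- describe(df$ones)         # the argument expression is df$ones
via_with <- with(df, describe(ones))  # the argument expression is ones

print(direct)
print(via_with)
```

The first call yields "This is called df$ones" and the second "This is called ones", with no change to the function itself.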
Of course, one could modify the function to strip everything up to the $, but a column can be specified in other ways too, such as DF[["ones"]], and patching each case leads into an endless hole of new situations that have to be handled.
# not recommended
foo2 <- function(x) {
Name <- sub(".*\\$", "", deparse(substitute(x)))
print( paste('This is called', Name) )
}
DF <- data.frame(ones = 5:6)
foo2(DF$ones)
## [1] "This is called ones"
Really the problem is the design of foo. Functions such as lm that involve column names and data frames use a call which separates them:
foo3 <- function(x, data = parent.frame()) {
Name <- deparse(substitute(x))
print( paste('This is called', Name ))
data[[Name]]
}
ones <- 3:4
foo3(ones)
## [1] "This is called ones"
## [1] 3 4
DF <- data.frame(ones = 5:6)
foo3(ones, DF)
## [1] "This is called ones"
## [1] 5 6
or another design would be to specify the column as a formula:
foo4 <- function(formula, data = parent.frame()) {
Name <- all.vars(formula)[1]
print( paste('This is called', Name ))
data[[Name]]
}
DF <- data.frame(ones = 5:6)
foo4(~ ones, DF)
## [1] "This is called ones"
## [1] 5 6
ones <- 4:5
foo4(~ ones)
## [1] "This is called ones"
## [1] 4 5

get name of an object inside a loop

I know the deparse+substitute trick to get the name from an object passed as an argument to a function, but the same trick inside a loop does not work.
My code (just for testing):
mylist <- list(first = c("lawyer","janitor"), second = c("engineer","housewife"))
for (element in names(mylist)){
print(deparse(substitute(mylist[[element]])))
}
[1] "mylist[[element]]"
[1] "mylist[[element]]"
Is there any way of getting this result instead?
first
second
Using lapply does not help here, because each element is an unnamed vector and names(x) is therefore NULL:
lapply(mylist, function(x) { print(names(x))} )
# NULL
# NULL
# $first
# NULL
#
# $second
# NULL
Using a for loop, as in your question:
for (element in names(mylist)){
print(element)
}
# [1] "first"
# [1] "second"
Use "names"
for (element in names(mylist)){
print(as.name(element))
}
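If both the name and the value are needed in the loop body, one option is to loop over positions with seq_along(), or to pass the names alongside the values with Map(). The sketch below uses the list from the question; the variable names labels and labels2 are made up for illustration. deparse(substitute()) is of no use here because substitute() only replaces the formal arguments of a function with the expressions supplied for them, and a loop variable is not one.

```r
mylist <- list(first = c("lawyer", "janitor"),
               second = c("engineer", "housewife"))

# Option 1: loop over positions, look up name and value separately.
labels <- character(length(mylist))
for (i in seq_along(mylist)) {
  nm <- names(mylist)[i]
  value <- mylist[[i]]
  labels[i] <- paste0(nm, ": ", paste(value, collapse = ", "))
}
print(labels)

# Option 2: let Map() walk the values and their names in parallel.
labels2 <- unlist(
  Map(function(value, nm) paste0(nm, ": ", paste(value, collapse = ", ")),
      mylist, names(mylist)),
  use.names = FALSE
)
print(labels2)
```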

In R, is it possible to create a new object inside a function and pass it to the parent environment?

The function would look something like:
function(input, FUN, output) {
output <- FUN(input)
return(input)
}
Where output is an unquoted name of an object to be created.
Let's skip the part where this is probably a bad idea: is this sort of thing possible? How would you go about doing it?
Clean code would just return it.
But you have other options:
the <<- operator
the assign() function where you can list the environment to assign to
Here is a trivial example:
R> foo <- function(x=21) { y <<- 2*x; return(3*x) }
R> foo(10)
[1] 30
R> y
[1] 20
R>
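The assign() route can be sketched in the same spirit. This assumes the unquoted name from the question is captured with deparse(substitute()); assign_result is a made-up function name, and defaulting envir to parent.frame() is a choice made for this sketch.

```r
# Hypothetical helper: create an object called `output` in the caller's
# environment and return the input invisibly.
assign_result <- function(input, FUN, output, envir = parent.frame()) {
  name <- deparse(substitute(output))  # unquoted name -> string, never evaluated
  assign(name, FUN(input), envir = envir)
  invisible(input)
}

assign_result(16, sqrt, root)
print(root)

# Writing into an explicit environment keeps the side effect contained.
e <- new.env()
assign_result(9, sqrt, root, envir = e)
print(get("root", envir = e))
```

Unlike <<-, which assigns to the nearest enclosing environment that already has the name (or the global environment if none does), assign() writes to exactly the environment it is given.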
1) Try this:
fun <- function(input, FUN, output = "output", envir = parent.frame()) {
envir[[output]] <- FUN(input)
input
}
fun(4, sqrt)
## [1] 4
output
## [1] 2
Note that if hardcoding the output variable name to output is ok then the assignment could be written:
envir$output <- FUN(input)
2) Another possibility if you want to output both the input and output yet avoiding side effects is to return both in a list:
fun2 <- function(input, FUN, output = "output")
setNames(list(input, FUN(input)), c("input", output))
fun2(4, sqrt)
giving:
$input
[1] 4
$output
[1] 2
2a) A variation of this is:
devtools::install_github("ggrothendieck/gsubfn")
library(gsubfn) # list[...] <- ...
list[input, output] <- fun2(4, sqrt)
giving:
> input
[1] 4
> output
[1] 2
3) Yet another possibility is to pass the input in an attribute:
fun3 <- function(input, FUN) {
out <- FUN(input)
attr(out, "input") <- input
out
}
fun3(4, sqrt)
giving:
[1] 2
attr(,"input")
[1] 4

How to let print() pass arguments to a user defined print method in R?

I have defined an S3 class in R that needs its own print method. When I create a list of these objects and print it, R uses my print method for each element of the list, as it should.
I would like to have some control over how much the print method actually shows. Therefore, the print method for my class takes a few additional arguments. However, I have not found a way to make use of these arguments, when printing a list of objects.
To make this more clear, I give an example. The following code defines two objects of class test, a list that contains both objects and a print method for the class:
obj1 <- list(a = 3, b = 2)
class(obj1) <- "test"
obj2 <- list(a = 1, b = 5)
class(obj2) <- "test"
obj_list <- list(obj1, obj2)
print.test <- function(x, show_b = FALSE, ...) {
cat("a is", x$a, "\n")
if (show_b) cat("b is", x$b, "\n")
}
Printing a single object works as expected:
print(obj1)
## a is 3
print(obj2, show_b = TRUE)
## a is 1
## b is 5
When I print obj_list, my print method is used to print each object in the list:
print(obj_list)
## [[1]]
## a is 3
##
## [[2]]
## a is 1
But I would like to be able to tell print() to show b also in this situation. The following (a bit naive...) code does not produce the desired result:
print(obj_list, show_b = TRUE)
## [[1]]
## a is 3
##
## [[2]]
## a is 1
Is it possible to print obj_list and at the same time pass the argument show_b = TRUE to print.test()? How?
Following Josh's suggestion, I found a way to avoid print.default() being called when printing a list. I simply wrote a print method for lists, since none seems to exist as part of base R:
print.list <- function(x, ...) {
list_names <- names(x)
if (is.null(list_names)) list_names <- rep("", length(x))
print_listelement <- function(i) {
if (list_names[i]=="") {
cat("[[",i,"]]\n", sep="")
} else {
cat("$", list_names[i], "\n", sep="")
}
print(x[[i]], ...)
cat("\n")
}
invisible(lapply(seq_along(x), print_listelement))
}
The relevant part is that ... is passed on to print, when the objects inside the list are printed. So now, coming back to the example in the question, printing a list of test objects works together with show_b =TRUE:
print(obj_list, show_b = TRUE)
## [[1]]
## a is 3
## b is 2
##
## [[2]]
## a is 1
## b is 5
However, I am a bit uncomfortable with defining print.list myself. Chances are that it is not working as well as the built-in printing mechanism for lists.
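One way to avoid redefining printing for every list is to give the container its own S3 class and write the method for that class only. This is a sketch, not a drop-in replacement: the class name test_list and the constructor new_test_list are made up for illustration, and print.test is repeated from the question so the block is self-contained.

```r
print.test <- function(x, show_b = FALSE, ...) {
  cat("a is", x$a, "\n")
  if (show_b) cat("b is", x$b, "\n")
  invisible(x)
}

# Hypothetical constructor: a plain list tagged with its own class.
new_test_list <- function(...) structure(list(...), class = "test_list")

# Forward ... to the element-level print method.
print.test_list <- function(x, ...) {
  for (i in seq_along(x)) {
    cat("[[", i, "]]\n", sep = "")
    print(x[[i]], ...)
    cat("\n")
  }
  invisible(x)
}

obj1 <- structure(list(a = 3, b = 2), class = "test")
obj2 <- structure(list(a = 1, b = 5), class = "test")
objs <- new_test_list(obj1, obj2)

print(objs, show_b = TRUE)
```

Ordinary lists keep the built-in printing, and only objects created through new_test_list pass show_b along.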

R - store functions in a data.frame

I would like to return a matrix/data.frame each row containing arguments and the content of a file.
However, there may be many files, so I would prefer if I could load the file lazily, so the file is only read if the actual content is requested. The function below loads the files actively if as.func=F.
It would be perfect if it could load them lazily, but it would also be acceptable, if instead of the content a function is returned that would read the content.
I can make functions that read the content (see below with as.func=T), but for some reason I cannot put that into the data.frame to return.
load_parallel_results <- function(resdir,as.func=F) {
## Find files called .../stdout
stdoutnames <- list.files(path=resdir, pattern="stdout", recursive=T);
## Find files called .../stderr
stderrnames <- list.files(path=resdir, pattern="stderr", recursive=T);
if(as.func) {
## Create functions to read them
stdoutcontents <-
lapply(stdoutnames, function(x) { force(x); return(function() { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } ) } );
stderrcontents <-
lapply(stderrnames, function(x) { force(x); return(function() { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } ) } );
} else {
## Read them
stdoutcontents <-
lapply(stdoutnames, function(x) { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } );
stderrcontents <-
lapply(stderrnames, function(x) { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } );
}
if(length(stdoutnames) == 0) {
## Return empty data frame if no files found
return(data.frame());
}
## Make the columns containing the variable values
m <- matrix(unlist(strsplit(stdoutnames, "/")),nrow = length(stdoutnames),byrow=T);
mm <- as.data.frame(m[,c(F,T)]);
## Append the stdout and stderr column
mmm <- cbind(mm,unlist(stdoutcontents),unlist(stderrcontents));
colnames(mmm) <- c(strsplit(stdoutnames[1],"/")[[1]][c(T,F)],"stderr");
## Example:
## parallel --results my/res/dir --header : 'echo {};seq {myvar1}' ::: myvar1 1 2 ::: myvar2 A B
## > load_parallel_results("my/res/dir")
## myvar1 myvar2 stdout stderr
## [1,] "1" "A" "1 A\n1" ""
## [2,] "1" "B" "1 B\n1" ""
## [3,] "2" "A" "2 A\n1\n2" ""
## [4,] "2" "B" "2 B\n1\n2" ""
return(mmm);
}
Background
GNU Parallel has a --results option that stores output in a structured way. If there are 1000000 outputfiles it may be hard to manage them. R is good for that, but it would be awfully slow if you had to read all 1000000 files just to get the ones where argument 1 = "Foo" and argument 2 = "Bar".
Unfortunately I don't think you can save a function in a data.frame column.
But you could store the deparsed text of the function and evaluate it when needed:
e.g.
myFunc <- function(x) { print(x) }
# convert the function to text
funcAsText <- deparse(myFunc)
# convert the text back to a function
newMyFunc <- eval(parse(text=funcAsText))
# now you can use the function newMyFunc exactly like myFunc
newMyFunc("foo")
# [1] "foo"
EDIT:
Since there are many files, I suggest simply storing a string that indicates the type of each file, and writing a function that understands the types and reads the file accordingly; you can then call it when needed, passing the type and file path.
(Without reading the question body:)
You can store functions in a data.frame like this:
df <- data.frame(fun = 1:3)
df$fun <- c(mean, sd, function(x) x^2)
I am not sure if this will break other things, so consider using tibble or data.table, from the packages of the same name, which properly support columns of arbitrary object types.
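Applied to the lazy-loading problem in the question, the same idea might look like the sketch below. The file names, their contents, and the helper make_reader are invented for illustration and stand in for the real stdout files; the data frame is deliberately not printed, since printing list columns of functions is the part that may misbehave.

```r
# Create two small stand-in files in a temporary directory.
resdir <- tempfile("results")
dir.create(resdir)
paths <- file.path(resdir, c("run1.txt", "run2.txt"))
writeLines(c("1 A", "1"), paths[1])
writeLines(c("2 A", "1", "2"), paths[2])

# Hypothetical helper: returns a function that reads the file only when called.
make_reader <- function(path) {
  force(path)
  function() paste(readLines(path), collapse = "\n")
}

res <- data.frame(myvar1 = c("1", "2"), stringsAsFactors = FALSE)
res$stdout <- lapply(paths, make_reader)  # list column of closures

# Filter on the cheap columns first, then read only the matching file.
hit <- which(res$myvar1 == "2")
content <- res$stdout[[hit]]()
cat(content, "\n")
```

Nothing is read from disk until one of the stored functions is called, so filtering on the argument columns stays fast however many files there are.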
You can use 2D lists to store your functions. Obviously, you lose some of the checks you get with DFs, but that's the whole point here:
> funs <- c(replicate(5, function(x) NULL), replicate(5, function(y) TRUE))
> names <- as.list(letters[1:10])
> # df doesn't work
> df <- data.frame(names=names)
> df.2 <- cbind(df, funs)
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "function" to a data.frame
# but 2d lists do
> lst.2d <- cbind(funs, names)
> lst.2d[2, 1]
$funs
function (x)
NULL
> lst.2d[6, 1]
$funs
function (y)
TRUE
