ifelse and append not working within a function - r

I want to distribute the names of files in two different vectors. If they exist, they should go to existing_files and if not, they should go to missing_files.
existing_files <- ""
missing_files <- ""
f1 <- function(file_name){
if (any(file_name == vector_of_file_names)) {
existing_files <- append(existing_files, file_name)
} else {
missing_files <- append(missing_files, file_name)
}
}
f1("file1")
Executing "file1" does not work and i don't get any warning or error. Why is that? If i remove "existing_files <-" and "missing_files <-" in the function, i would get the correct result in the console. But i need it stored in the global environment.
A bit more information on what i am trying to solve here:
I am scanning the folder i am working with for file names i have stored in "vector_of_file_names". I just wanna know, if all the files i need are in that folder or not. I do that with list.files, which gives me all the files. Then i compare those files to "vector_of_file_names". The result is what is seen in my question. I want two vectors that list the files (that i want in that folder) according to their existence in that folder.

existing_files and missing_files are in the global scope. They can't be changed within the scope of your function without super-assignment <<-
existing_files <- ""
missing_files <- ""
f1 <- function(file_name){
if (any(file_name == existing_files)) {
existing_files <<- append(existing_files, file_name)
} else {
missing _files <<- append(missing_files, file_name)
}
}
f1("file1")

Can you give a little more context on the problem you are trying to solve? I don't see anything in your code that actually tests for the "existence" of files, only the state of the vector existing_files.
It is unusual design choice to have a function which acts on a single object at a time and then potentially modifies state in one of multiple external objects depending on the result. Yes, you can "solve" your problem using exotic methods like out of superassignment (where objects in others scopes are modified as a side effect of your function) but your code will quickly become hard to reason about.

Related

Precise specification of function environments

I would like to have the option of specifying very precisely (exactly, if possible) the behavior of functions. For example,
foo <- function() {}
Env <- new.env(parent=emptyenv())
environment(foo) <- Env
foo()
# Error in { : could not find function "{"
Env$`{` <- base::`{`
environment(foo) <- Env
foo()
# NULL
I know I need to set the environment of foo, and that whatever foo depends upon is defined in that environment (or that environment's parent, or the parent's parent, etc.). But
what all must be specified, and
what can't be specified (i.e., in what cases am I stuck with default behavior)?
For example, in the following, why is there no error when the call to bar is attempted, when I leave " undefined (what kind of a thing is ", anyway? Or is it ""?).
bar <- function() {""}
Env <- new.env(parent=emptyenv())
Env$`{` <- base::`{`
environment(bar) <- Env
bar()
# [1] ""
My larger exercise is exploring ways to build very large systems with lots of dependencies, but no circular dependencies. I think I will need precise control over function environments in order to guarantee that there are no cycles. I am aiming for a ball of mud of sorts, where I can, given certain discipline, put all the R code I ever write into a single structure (which might not all be in memory, or even on local storage)—and, whenever I write new code, have the option of depending on some code I wrote in the past, provided that I have access to that ealier code.
Related posts include:
[distinct], [specify], and [environments-enclosures-frames].

R: IF object is TRUE then assign object NOT WORKING

I am trying to write a very basic IF statement in R and am stuck. I thought I'd find someone with the same problem, but I cant. Im sorry if this has been solved before.
I want to check if a variable/object has been assigned, IF TRUE I want to execute a function that is part of a R-package. First I wrote
FileAssignment <- function(x){
if(exists("x")==TRUE){
print("yes!")
x <- parse.vdjtools(x)
} else { print("Nope!")}
}
I assign a filename as x
FILENAME <- "FILENAME.txt"
I run the function
FileAssignment(FILENAME)
I use print("yes!") and print("Nope!") to check if the IF-Statement works, and it does. However, the parse.vdjtools(x) part is not assigned. Now I tested the same IF-statement outside of the function:
if(exists("FILENAME1")==TRUE){
FILENAME1 <- parse.vdjtools(FILENAME1)
}
This works. I read here that it might be because the function uses {} and the if-statement does too. So I should remove the brackets from the if-statement.
FileAssignment <- function(x){
if(exists("x")==TRUE)
x <- parse.vdjtools(x)
else { print("Nope!")
}
Did not work either.
I thought it might be related to the specific parse.vdjtools(x) function, so I just tried assigning a normal value to x with x <- 20. Also did not work inside the function, however, it does outside.
I dont really know what you are trying to acheive, but I wpuld say that the use of exists in this context is wrong. There is no way that the x cannot exist inside the function. See this example
# All this does is report if x exists
f <- function(x){
if(exists("x"))
cat("Found x!", fill = TRUE)
}
f()
f("a")
f(iris)
# All will be found!
Investigate file.exists instead? This is vectorised, so a vector of files can be investigated at the same time.
The question that you are asking is less trivial than you seem to believe. There are two points that should be addressed to obtain the desired behavior, and especially the first one is somewhat tricky:
As pointed out by #NJBurgo and #KonradRudolph the variable x will always exist within the function since it is an argument of the function. In your case the function exists() should therefore not check whether the variable x is defined. Instead, it should be used to verify whether a variable with a name corresponding to the character string stored in x exists.
This is achieved by using a combination of deparse() and
substitute():
if (exists(deparse(substitute(x)))) { …
Since x is defined only within the scope of the function, the superassignment operator <<- would be required to make a value assigned to x visible outside the function, as suggested by #thothai. However, functions should not have such side effects. Problems with this kind of programming include possible conflicts with another variable named x that could be defined in a different context outside the function body, as well as a lack of clarity concerning the operations performed by the function.
A better way is to return the value instead of assigning it to a variable.
Combining these two aspects, the function could be rewritten like this:
FileAssignment <- function(x){
if (exists(deparse(substitute(x)))) {
print("yes!")
return(parse.vdjtools(x))
} else {
print("Nope!")
return(NULL)}
}
In this version of the function, the scope of x is limited to the function body and the function has no side effects. The return value of FileAssignment(a) is either parse.vdjtools(a) or NULL, depending on whether a exists or not.
Outside the function, this value can be assigned to x with
x <- FileAssignment(a)

How to change the loop structure into sapply function?

I had changed the for loop into sapply function ,but failed .
I want to know why ?
list.files("c:/",pattern="mp3$",recursive=TRUE,full.names=TRUE)->z
c()->w
left<-basename(z)
for (i in 1:length(z)){
if (is.element(basename(z[i]),left))
{
append(w,values=z[i])->w;
setdiff(left,basename(z[i]))->left
}}
print(w)
list.files("c:/",pattern="mp3$",recursive=TRUE,full.names=TRUE)->z
c()->w
left<-basename(z)
sapply(z,function(y){
if (is.element(basename(y),left))
{ append(w,values=y)->w;
setdiff(left,basename(y))->left
}})
print(w)
my rule of selecting music is that if the basename(music) is the same ,then save only one full.name of music ,so unique can not be used directly.there are two concepts full.name and basename in file path which can confuse people here.
The problem you have here is that you want your function to have two side-effects. By side-effect, I mean modify objects that are outside its scope:w and left.
Curently, w and left are only modified within the function's scope, then they are eventually lost as the function call ends.
Instead, you want to modify w and left outside the function's environment. For that you can use <<- instead of <-:
sapply(z, function(y) {
if (is.element(basename(y),left)) {
w <<- append(w, values = y)
left <<- setdiff(left, basename(y))
}
})
Note that I have been saying "you want", "you can", but this is not what "you should" do. Functions with side-effects are really considered bad programming. Try reading about it.
Also, it is good to reserve the *apply tools to functions that can run their inputs independently. Here instead, you have an algorithm where the outcome of an iteration depends on the outcome of the previous ones. These are cases where you're better off using a for loop, unless you can rethink the algorithm in a framework that better suits *apply or can make use of functions that can handle such dependent situations: filter, unique, rle, etc.
For example, using unique, your code can be rewritten as:
base.names <- basename(z)
left <- unique(base.names)
w <- z[match(left, base.names)]
It also has the advantage that it is not building an object recursively, another no-no of your current code.

R Script - How to Continue Code Execution on Error

I have written an R script which includes a loop that retrieves external (web) data. The format of the data are most of the time the same, however sometimes the format changes in an unpredictable way and my loop is crashing (stops running).
Is there a way to continue code execution regardless the error? I am looking for something similar to "On error Resume Next" from VBA.
Thank you in advance.
Use try or tryCatch.
for(i in something)
{
res <- try(expression_to_get_data)
if(inherits(res, "try-error"))
{
#error handling code, maybe just skip this iteration using
next
}
#rest of iteration for case of no error
}
The modern way to do this uses purrr::possibly.
First, write a function that gets your data, get_data().
Then modify the function to return a default value in the case of an error.
get_data2 <- possibly(get_data, otherwise = NA)
Now call the modified function in the loop.
for(i in something) {
res <- get_data2(i)
}
You can use try:
# a has not been defined
for(i in 1:3)
{
if(i==2) try(print(a),silent=TRUE)
else print(i)
}
How about these solutions on this related question :
Is there a way to `source()` and continue after an error?
Either parse(file = "script.R") followed by a loop'd try(eval()) on each expression in the result.
Or the evaluate package.
If all you need to do is a small piece of clean up, then on.exit() may be the simplest option. It will execute the expression "when the current function exits (either naturally or as the result of an error)" (documentation here).
For example, the following will delete my_large_dataframe regardless of whether output_to_save gets created.
on.exit(rm("my_large_dataframe"))
my_large_dataframe = function_that_does_not_error()
output_to_save = function_that_does_error(my_large_dataframe)

How does local() differ from other approaches to closure in R?

Yesterday I learned from Bill Venables how local() can help create static functions and variables, e.g.,
example <- local({
hidden.x <- "You can't see me!"
hidden.fn <- function(){
cat("\"hidden.fn()\"")
}
function(){
cat("You can see and call example()\n")
cat("but you can't see hidden.x\n")
cat("and you can't call ")
hidden.fn()
cat("\n")
}
})
which behaves as follows from the command prompt:
> ls()
[1] "example"
> example()
You can see and call example()
but you can't see hidden.x
and you can't call "hidden.fn()"
> hidden.x
Error: object 'hidden.x' not found
> hidden.fn()
Error: could not find function "hidden.fn"
I've seen this kind of thing discussed in Static Variables in R where a different approach was employed.
What the pros and cons of these two methods?
Encapsulation
The advantage of this style of programming is that the hidden objects won't likely be overwritten by anything else so you can be more confident that they contain what you think. They won't be used by mistake since they can't readily be accessed. In the linked-to post in the question there is a global variable, count, which could be accessed and overwritten from anywhere so if we are debugging code and looking at count and see its changed we cannnot really be sure what part of the code has changed it. In contrast, in the example code of the question we have greater assurance that no other part of the code is involved.
Note that we actually can access the hidden function although its not that easy:
# run hidden.fn
environment(example)$hidden.fn()
Object Oriented Programming
Also note that this is very close to object oriented programming where example and hidden.fn are methods and hidden.x is a property. We could do it like this to make it explicit:
library(proto)
p <- proto(x = "x",
fn = function(.) cat(' "fn()"\n '),
example = function(.) .$fn()
)
p$example() # prints "fn()"
proto does not hide x and fn but its not that easy to access them by mistake since you must use p$x and p$fn() to access them which is not really that different than being able to write e <- environment(example); e$hidden.fn()
EDIT:
The object oriented approach does add the possibility of inheritance, e.g. one could define a child of p which acts like p except that it overrides fn.
ch <- p$proto(fn = function(.) cat("Hello from ch\n")) # child
ch$example() # prints: Hello from ch
local() can implement a singleton pattern -- e.g., the snow package uses this to track the single Rmpi instance that the user might create.
getMPIcluster <- NULL
setMPIcluster <- NULL
local({
cl <- NULL
getMPIcluster <<- function() cl
setMPIcluster <<- function(new) cl <<- new
})
local() might also be used to manage memory in a script, e.g., allocating large intermediate objects required to create a final object on the last line of the clause. The large intermediate objects are available for garbage collection when local returns.
Using a function to create a closure is a factory pattern -- the bank account example in the Introduction To R documentation, where each time open.account is invoked, a new account is created.
As #otsaw mentions, memoization might be implemented using local, e.g., to cache web sites in a crawler
library(XML)
crawler <- local({
seen <- new.env(parent=emptyenv())
.do_crawl <- function(url, base, pattern) {
if (!exists(url, seen)) {
message(url)
xml <- htmlTreeParse(url, useInternal=TRUE)
hrefs <- unlist(getNodeSet(xml, "//a/#href"))
urls <-
sprintf("%s%s", base, grep(pattern, hrefs, value=TRUE))
seen[[url]] <- length(urls)
for (url in urls)
.do_crawl(url, base, pattern)
}
}
.do_report <- function(url) {
urls <- as.list(seen)
data.frame(Url=names(urls), Links=unlist(unname(urls)),
stringsAsFactors=FALSE)
}
list(crawl=function(base, pattern="^/.*html$") {
.do_crawl(base, base, pattern)
}, report=.do_report)
})
crawler$crawl(favorite_url)
dim(crawler$report())
(the usual example of memoization, Fibonacci numbers, is not satisfying -- the range of numbers that don't overflow R's numeric representation is small , so one would probably use a look-up table of efficiently pre-calculated values). Interesting how crawler here is a singleton; could as easily have followed a factory pattern, so one crawler per base URL.

Resources