How to change the loop structure into sapply function? - r

I had changed the for loop into sapply function ,but failed .
I want to know why ?
list.files("c:/",pattern="mp3$",recursive=TRUE,full.names=TRUE)->z
c()->w
left<-basename(z)
for (i in 1:length(z)){
if (is.element(basename(z[i]),left))
{
append(w,values=z[i])->w;
setdiff(left,basename(z[i]))->left
}}
print(w)
list.files("c:/",pattern="mp3$",recursive=TRUE,full.names=TRUE)->z
c()->w
left<-basename(z)
sapply(z,function(y){
if (is.element(basename(y),left))
{ append(w,values=y)->w;
setdiff(left,basename(y))->left
}})
print(w)
my rule of selecting music is that if the basename(music) is the same ,then save only one full.name of music ,so unique can not be used directly.there are two concepts full.name and basename in file path which can confuse people here.

The problem you have here is that you want your function to have two side-effects. By side-effect, I mean modify objects that are outside its scope:w and left.
Curently, w and left are only modified within the function's scope, then they are eventually lost as the function call ends.
Instead, you want to modify w and left outside the function's environment. For that you can use <<- instead of <-:
sapply(z, function(y) {
if (is.element(basename(y),left)) {
w <<- append(w, values = y)
left <<- setdiff(left, basename(y))
}
})
Note that I have been saying "you want", "you can", but this is not what "you should" do. Functions with side-effects are really considered bad programming. Try reading about it.
Also, it is good to reserve the *apply tools to functions that can run their inputs independently. Here instead, you have an algorithm where the outcome of an iteration depends on the outcome of the previous ones. These are cases where you're better off using a for loop, unless you can rethink the algorithm in a framework that better suits *apply or can make use of functions that can handle such dependent situations: filter, unique, rle, etc.
For example, using unique, your code can be rewritten as:
base.names <- basename(z)
left <- unique(base.names)
w <- z[match(left, base.names)]
It also has the advantage that it is not building an object recursively, another no-no of your current code.

Related

My defined R function does not 'save' the changes made to a matrix [duplicate]

I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.

Global vs Local within userdefined function

I am making changes to a global dataframe within my user defined function. The dataframe is created outside of the function.
However, my changes to the dataframe are not visible outside of the function. Only if I use a return option, I end up with the dataframe.
Is there a way to change this?
Whether you should do "call by reference" functionality in R is one question (addressed in the comments - generally the answer is no).
However, you asked whether you can do it. The answer is yes, you can modify your global dataframe in the local scope of your function. Here is how you do it: 1) Use eval.parent() (set the evaluation scope to the calling scope, which, presumably, is the global scope) and 2) substitute() (to replace the variable reference instead of destroying one and creating a new one).
Here's an example:
> attach(mtcars)
> my_cars <- mtcars[mpg,] #not sorted
> pointless_sort <- function() {
+ eval.parent(substitute(my_cars<-mtcars[order(mpg),]))
+ }
> pointless_sort()
> #here the global my_cars is ordered/sorted by mpg
Important points: 1) You can do it; 2) Good programming generally means not doing it (but we've all been lazy, wanted a convenient way to split up code). Now you have the power.
"With Great Power Comes Great Responsibility."

R: IF object is TRUE then assign object NOT WORKING

I am trying to write a very basic IF statement in R and am stuck. I thought I'd find someone with the same problem, but I cant. Im sorry if this has been solved before.
I want to check if a variable/object has been assigned, IF TRUE I want to execute a function that is part of a R-package. First I wrote
FileAssignment <- function(x){
if(exists("x")==TRUE){
print("yes!")
x <- parse.vdjtools(x)
} else { print("Nope!")}
}
I assign a filename as x
FILENAME <- "FILENAME.txt"
I run the function
FileAssignment(FILENAME)
I use print("yes!") and print("Nope!") to check if the IF-Statement works, and it does. However, the parse.vdjtools(x) part is not assigned. Now I tested the same IF-statement outside of the function:
if(exists("FILENAME1")==TRUE){
FILENAME1 <- parse.vdjtools(FILENAME1)
}
This works. I read here that it might be because the function uses {} and the if-statement does too. So I should remove the brackets from the if-statement.
FileAssignment <- function(x){
if(exists("x")==TRUE)
x <- parse.vdjtools(x)
else { print("Nope!")
}
Did not work either.
I thought it might be related to the specific parse.vdjtools(x) function, so I just tried assigning a normal value to x with x <- 20. Also did not work inside the function, however, it does outside.
I dont really know what you are trying to acheive, but I wpuld say that the use of exists in this context is wrong. There is no way that the x cannot exist inside the function. See this example
# All this does is report if x exists
f <- function(x){
if(exists("x"))
cat("Found x!", fill = TRUE)
}
f()
f("a")
f(iris)
# All will be found!
Investigate file.exists instead? This is vectorised, so a vector of files can be investigated at the same time.
The question that you are asking is less trivial than you seem to believe. There are two points that should be addressed to obtain the desired behavior, and especially the first one is somewhat tricky:
As pointed out by #NJBurgo and #KonradRudolph the variable x will always exist within the function since it is an argument of the function. In your case the function exists() should therefore not check whether the variable x is defined. Instead, it should be used to verify whether a variable with a name corresponding to the character string stored in x exists.
This is achieved by using a combination of deparse() and
substitute():
if (exists(deparse(substitute(x)))) { …
Since x is defined only within the scope of the function, the superassignment operator <<- would be required to make a value assigned to x visible outside the function, as suggested by #thothai. However, functions should not have such side effects. Problems with this kind of programming include possible conflicts with another variable named x that could be defined in a different context outside the function body, as well as a lack of clarity concerning the operations performed by the function.
A better way is to return the value instead of assigning it to a variable.
Combining these two aspects, the function could be rewritten like this:
FileAssignment <- function(x){
if (exists(deparse(substitute(x)))) {
print("yes!")
return(parse.vdjtools(x))
} else {
print("Nope!")
return(NULL)}
}
In this version of the function, the scope of x is limited to the function body and the function has no side effects. The return value of FileAssignment(a) is either parse.vdjtools(a) or NULL, depending on whether a exists or not.
Outside the function, this value can be assigned to x with
x <- FileAssignment(a)

Write R function with control flow that depends on argument value

For small function, it is trivial to just write conditional statement based on the argument value. For example, I have a function that extracts variable label from an ex-STATA dataframe. There are two options for output-type, environment and df.
f_extract_stata_label <- function(df, output="environment") {
if (output=="env") {
lab_env <- new.env()
for (i in seq_along(names(df))) {
lab_env[[names(df)[i]]] <- attr(df, "var.labels")[i]
}
return(lab_env)
} else if (output=="df") {
lab_df <- data.frame(var.name = names(d_tmp),
var.label = attr(d_tmp, "var.labels"))
return(lab_df)
}
}
However, I suspect that this is not good R idiom. First, how the function depends on output is not clear -- the reader has to read half way through the code to find out. Second, adding options to output in the future makes the function very hard to read.
So how should I rewrite this function?
R uses this kind of pattern in its core stats libraries where "label" strings make sense. These are functions where R's dispatch system is not that useful. That said, what you want is still dispatch-like.
You could refactor it to use a switch that calls a function dedicated to a specific output type. Two things happen then. First, the extra function call makes it clear what context you're in when using the traceback. Second, it makes the functions smaller and easier to read.
I would question whether you really want to use a dispatch function though, and why separate direct functions are not appropriate.

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod<-prod(x,y)
return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
#
I want to store "Prod" at each iteration so I can access it after running the function fun2. I tried using assign() to create space assign("Prod",numeric(10),pos=1)
and then assigning Prod at j-th iteration to Prod[j] but it does not work.
#
Any idea how this can be done?
Thank you
You can add anything you like in the return() command. You could return a list return(list(Sum,Prod)) or a data frame return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod))
I would then convert that list of data.frames into a single data.frame
Try2 <- do.call(rbind,Try)
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking about the vectorized version is as applicable to whatever your 'real' problem is as it is to the sum / prod example; in practice and when thinking in terms of vectors fails I've never used the environment or concatenation approaches in other answers, but rather the simple solution of returning a list or vector.
I have done this before, and it works. Good for a quick fix, but its kind of sloppy. The <<- operator assigns outside the function to the global environment.
fun2<-function(j){
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back a bunch of different data items and I would encourage you to use it. Quoted here so no one thinks I'm advocating the direct option.
return(list(Sum,Prod))
Having said that, suppose that you really don't want to pass them back, you could also put them directly in the parent environment from within the function using either assign or the superassignment operator. This practice can be looked down on by functional programming purists, but it does work. This is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j] <<- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
Note that the superassignment searches back for the first environment in which the variable exists and modifies it there. It's not appropriate for creating new variables above where you are.
And an example version using the environment directly
fun2<-function(j,env)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
env$Prod[j] <- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function you would need to go back two frames because lapply() creates its own. This approach has the advantage that you could pass it any environment you want instead of parent.frame() and the value would be modified there. This is the seldom-used R implementation of writeable passing by reference. It's safer than superassignment because you know where the variable is that is being modified.

Resources