Why are these functions different? - r

I am not sure why I get different results from these functions.
change_it1 <- function(x) {
  x[x == 5] <- -10
}
change_it2 <- function(x) {
  x[x == 5] <- -10
  x
}
x <- 1:5
x <- change_it1(x)
x
x <- 1:5
x <- change_it2(x)
x
Why do the two functions not change x in the same way as the following statement does?
x[x==5] <- -10

The assignment operator <- is really a function that has the side effect of changing a variable's value. But as a function, it also invisibly returns the value that was used on the right-hand side of the assignment. We can force the invisible value to be seen with print(). For example:
x <- 1:2
print(names(x) <- c("a","b"))
# [1] "a" "b"
or again with subsetting
print(x[1] <- 10)
# [1] 10
print(x[2] <- 20)
# [1] 20
x
# a b
# 10 20
See in each case the assignment returned the right-hand-side value and not the updated value of x. Functions will return whatever value was returned by the last expression. In the first case, you are returning the value returned by the assignment (which is just the value -10) and in the second case you are explicitly returning the updated x value.
The functions both change x in the same way (at least in the scope of the function), but you are just not returning the updated x value in both cases.
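To make the difference concrete, here is a small sketch that captures what each function from the question returns. The names r1 and r2 are only introduced for this illustration.

```r
change_it1 <- function(x) {
  x[x == 5] <- -10   # last expression is an assignment, so its value is the right-hand side
}
change_it2 <- function(x) {
  x[x == 5] <- -10
  x                  # last expression is the updated vector
}

r1 <- change_it1(1:5)  # receives only the assigned value
r2 <- change_it2(1:5)  # receives the modified vector
print(r1)
print(r2)
```

Here r1 holds just -10, while r2 holds the full vector with the 5 replaced.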

Related

Understanding code for custom in-place modification function?

I came across this post: http://r.789695.n4.nabble.com/speeding-up-perception-tp3640920p3646694.html from Matt Dowle, discussing some early(?) implementation ideas for the data.table package.
He uses the following code:
x = list(a = 1:10000, b = 1:10000)
class(x) = "newclass"
"[<-.newclass" = function(x,i,j,value) x # i.e. do nothing
tracemem(x)
x[1, 2] = 42L
Specifically I am looking at:
"[<-.newclass" = function(x,i,j,value) x
I am trying to understand what is done there and how I could use this notation.
It looks to me like:
i is the row index
j is column index
value is the value to be assigned
x is the object under consideration
My best guess would therefore be that I define a custom function for in-place modification (for a given class).
[<-.newclass is in-place modification for class newclass.
Understanding what happens:
Usually the following code should return an error:
x = list(a = 1:10000, b = 1:10000)
x[1, 2] = 42L
so I guess the sample code does not have any practical use.
Attempt to use the logic:
A simple nonsense try would be to square the value to be inserted:
x[i, j] <- value^2
Full try:
> x = matrix(1:9, 3, 3)
> class(x) = "newclass"
> "[<-.newclass" = function(x, i, j, value) x[i, j] <- value^2 # i.e. do something
> x[1, 2] = 9
Error: C stack usage 19923536 is too close to the limit
This doesn't seem to work.
My question(s):
"[<-.newclass" = function(x,i,j,value) x
How exactly does this notation work and how would I use it?
(I added the data.table tag since the linked discussion is about "by-reference" in-place modification in data.table, I think.)
The `[<-`() function is (traditionally) used for subassignment, and is, more broadly, a type of replacement function. It is also generic (more specifically, an internal generic), which allows you to write custom methods for it, as you correctly surmised.
Replacement functions
In general, when you call a replacement function, such as ...
foo(x) <- bar(y)
... the expression on the right hand side of <- (so here bar(y)) gets passed as a named value argument to `foo<-`() with x as the first argument, and the object x is reassigned with the result: that is, the said call is equivalent to writing:
x <- `foo<-`(x, value = bar(y))
So in order to work at all, all replacement functions must take at least two arguments, one of which must be named value.
Most replacement functions only have these two arguments, but there are also exceptions: such as `attr<-` and, typically, subassignment.
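As a minimal sketch of the rule above, the following defines a two-argument replacement function. The name `second<-` is chosen only for illustration; it is not part of the original discussion.

```r
# A hypothetical replacement function: sets the second element of x
`second<-` <- function(x, value) {
  x[2] <- value
  x   # a replacement function must return the modified object
}

v <- 1:5
second(v) <- 10L                      # sugar for: v <- `second<-`(v, value = 10L)
v2 <- `second<-`(1:5, value = 10L)    # the explicit functional form
```

Both v and v2 end up as the same vector, which shows that the two spellings are equivalent.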
Subassignment
When you have a subassignment call like x[i, j] <- y, i and j get passed as additional arguments to the `[<-`() function with x and y as the first and value arguments, respectively:
x <- `[<-`(x, i, j, value = y) # x[i, j] <- y
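A quick way to check this equivalence on a plain matrix (the names m, m1 and m2 are assumptions made for this sketch):

```r
m <- matrix(1:9, 3, 3)

m1 <- m
m1[1, 2] <- 0L                      # the usual subassignment syntax

m2 <- `[<-`(m, 1, 2, value = 0L)    # the same call written in functional form

identical(m1, m2)
```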
In the case of a matrix or a data.frame, i and j would be used for selecting rows and columns; but in general, this does not need to be the case. A method for a custom class could do anything with the arguments. Consider this example:
x <- matrix(1:9, 3, 3)
class(x) <- "newclass"
`[<-.newclass` <- function(x, y, z, value) {
  x + (y - z) * value # absolute nonsense
}
x[1, 2] <- 9
x
#>      [,1] [,2] [,3]
#> [1,]   -8   -5   -2
#> [2,]   -7   -4   -1
#> [3,]   -6   -3    0
#> attr(,"class")
#> [1] "newclass"
Is this useful or reasonable? Probably not. But is it valid R code? Absolutely!
It's less common to see custom subassignment methods in real applications, as `[<-`() usually "just works" as you might expect it to, based on the underlying object of your class. A notable exception is `[<-.data.frame`, where the underlying object is a list, but subassignment behaves matrix-like. (On the other hand, many classes do need a custom subsetting method, as the default `[`() method drops most attributes, including the class attribute, see ?`[` for details).
As to why your example doesn't work: remember that you are writing a method for a generic function, and all the regular rules apply. If we use the functional form of `[<-`() and expand the method dispatch in your example, we can see immediately why it fails:
`[<-.newclass` <- function(x, i, j, value) {
x <- `[<-.newclass`(x, i, j, value = value^2) # x[i, j] <- value^2
}
That is, the function was defined recursively, without a base case, resulting in infinite recursion (hence the C stack error). One way to get around this would be to unclass(x) before calling the next method:
`[<-.newclass` <- function(x, i, j, value) {
  x <- unclass(x)
  x[i, j] <- value^2
  x # typically you would also add the class back here
}
(Or, using a somewhat more advanced technique, the body could also be replaced with an explicit next method like this: NextMethod(value = value^2). This plays nicer with inheritance and superclasses.)
And just to verify that it works:
x <- matrix(1:9, 3, 3)
class(x) <- "newclass"
x[1, 2] <- 9
x
#>      [,1] [,2] [,3]
#> [1,]    1   81    7
#> [2,]    2    5    8
#> [3,]    3    6    9
Perfectly confusing!
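For completeness, here is a sketch of the same method with the class put back before returning, as the comment in the method suggests. Saving the class in a local variable cls is an assumption of this sketch, not something taken from the original post.

```r
`[<-.newclass` <- function(x, i, j, value) {
  cls <- class(x)       # remember the class before stripping it
  x <- unclass(x)       # avoid dispatching to this method again
  x[i, j] <- value^2    # default matrix subassignment
  class(x) <- cls       # restore the class
  x
}

x <- matrix(1:9, 3, 3)
class(x) <- "newclass"
x[1, 2] <- 9
inherits(x, "newclass")
unclass(x)[1, 2]
```

With this version the object keeps its class, so later subassignments still dispatch to the custom method.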
As for the context of Dowle's "do nothing" subassignment example, I believe this was to illustrate that back in R 2.13.0, a custom subassignment method would always cause a deep copy of the object to be made, even if the method itself did nothing at all. (This is no longer the case, since R 3.1.0 I believe.)
Created on 2018-08-15 by the reprex package (v0.2.0).

R programming language LOOPS

y <- vector()
i <- 5
while((2<3)<i){
  y[i] <- "Hello World!"
  i <- i-1
}
y
I don't understand how the while loop works when the condition is while((2<3)<i). 2<3 is always TRUE, so I end up with TRUE<i. What does this mean? Or am I thinking about it wrong?
I just don't get how the condition of the while loop works; if I understand that, I believe I can work out the rest.
Also another question:
xxx <- function(vec){
  n <- length(vec)
  for(i in 1:n){
    x <- vec[i]
    if (vec[i]<x){
      x <- vec[i]
    }
  }
  return(x)
}
This xxx function is supposed to output the minimum value of the vector? OK, I see, but how?
When we enter the loop we first do x <- vec[i]; without doing this we can't get to the next command, the if statement, right? But since we do x <- vec[i] first, the if statement probably never fires, since x == vec[i] all the time.
Please help, since I have the exam tomorrow :(
1) ?Comparison says, referring to the two arguments of any comparison operator such as < :
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So in this case we have one logical argument and one numeric argument, and the logical argument is coerced to numeric (FALSE is converted to 0 and TRUE is converted to 1). Thus (2<3)<5 is the same as TRUE < 5, which is the same as 1 < 5, which is TRUE:
(2<3)<5
## [1] TRUE
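Applied to the loop from the question, the condition is effectively 1 < i, so the body runs for i = 5, 4, 3, 2 and stops once i reaches 1. A small sketch that runs the loop and inspects the result:

```r
y <- vector()
i <- 5
while ((2 < 3) < i) {   # (2 < 3) is TRUE, which is coerced to 1, so this tests 1 < i
  y[i] <- "Hello World!"
  i <- i - 1
}
i   # the loop ends when 1 < i is no longer TRUE
y   # position 1 was never assigned, so it is NA
```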
2) For xxx you probably want this:
xxx <- function(vec) {
  x <- Inf
  for(i in seq_along(vec)) if (vec[i] < x) x <- vec[i]
  x
}
The first statement in the body assigns Inf to x. In the second statement in the body, seq_along(vec) is 1, 2, ..., length(vec), so the for loop iterates i over 1, 2, ..., length(vec), with each iteration replacing x with vec[i] if vec[i] is less than x. Note that if vec has zero length then the loop is not run at all, since seq_along(vec) has zero length.
Testing it out:
> xxx(1:3)
[1] 1
> xxx(3:1)
[1] 1
> xxx(numeric(0)) # zero length input
[1] Inf
Of course R already has the min function which does the same thing.

Apply a function to two vectors the "R" way?

There are two vectors x and y. If x contains an NA I want the NA to be replaced by a value from "y" with the corresponding index. Here is some example code that works:
x <- c(1,2,3,NA,5)
y <- c(6,7,8,9,10)
combineVector <- function(x,y)
{
  for (i in 1:length(x)){
    if (is.na(x[i]) && !is.na(y[i])){
      x[i] = y[i]
    }
  }
  return (x)
}
combineVector(x,y)
# [1] 1 2 3 9 5
I could have written this in almost any programming language. Is there a more "R" way to perform this task?
x <- c(1,2,3,NA,5)
y <- c(6,7,8,9,10)
x[is.na(x)] <- y[is.na(x)]
See the above. Using is.na() on x returns a logical vector that is TRUE for the NA elements of x. Using it as the selector for both x and y selects only those positions. Using it in an assignment replaces the NA elements of x with the corresponding elements of y.
That will be much faster than looping as the vector gets large.
Try this code:
x[is.na(x)] <- y[is.na(x)]
By subsetting the x vector with is.na(x), you assign only to those elements of x which are NA, taking the values at the corresponding indices of the y vector.
To generate a new vector taking x and y as input, you can use the ifelse function:
x<-c(1,2,3,NA,NA)
y<-c(6,7,8,9,NA)
ifelse(is.na(x), y, x)
# [1] 1 2 3 9 NA
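As a quick sanity check, the logical-index assignment and ifelse can be compared on the same input. The names by_index and by_ifelse are made up for this sketch.

```r
x <- c(1, 2, 3, NA, NA)
y <- c(6, 7, 8, 9, NA)

by_index <- x
by_index[is.na(x)] <- y[is.na(x)]     # modify a copy of x in place

by_ifelse <- ifelse(is.na(x), y, x)   # build a new vector instead

identical(by_index, by_ifelse)
```

Both approaches leave an NA where x and y are both missing, which matches the behaviour of the original loop.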

R split numeric vector at position

I am wondering about the simple task of splitting a vector into two at a certain index:
splitAt <- function(x, pos){
  list(x[1:pos-1], x[pos:length(x)])
}
a <- c(1, 2, 2, 3)
> splitAt(a, 4)
[[1]]
[1] 1 2 2
[[2]]
[1] 3
My question: there must be some existing function for this, but I can't find it. Is split maybe a possibility? My naive implementation also does not work if pos=0 or pos>length(a).
An improvement would be:
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
which can now take a vector of positions:
splitAt(a, c(2, 4))
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2 2
#
# [[3]]
# [1] 3
And it behaves properly (subjectively) if pos <= 1 or pos > length(x), in the sense that it returns the whole original vector in a single list item. If you'd like it to error out instead, use stopifnot at the top of the function.
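A short sketch that probes the boundary positions of this splitAt on the vector from the question. The result names are only for illustration.

```r
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

a <- c(1, 2, 2, 3)
below  <- splitAt(a, 0)   # position before the vector: nothing to split
beyond <- splitAt(a, 5)   # position past the end: nothing to split
last   <- splitAt(a, 4)   # position of the last element: it is split off

length(below)
length(beyond)
length(last)
```

Positions outside 2..length(x) leave the vector in one piece, while pos equal to length(x) still separates the final element.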
I tried to use flodel's answer, but it was too slow in my case with a very large x (and the function has to be called repeatedly). So I created the following function that is much faster, but also very ugly and doesn't behave properly. In particular, it doesn't check anything and will return buggy results at least for pos >= length(x) or pos <= 0 (you can add those checks yourself if you're unsure about your inputs and not too concerned about speed), and perhaps some other cases as well, so be careful.
splitAt2 <- function(x, pos) {
  out <- list()
  pos2 <- c(1, pos, length(x)+1)
  for (i in seq_along(pos2[-1])) {
    out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
  }
  return(out)
}
However, splitAt2 runs about 20 times faster with an x of length 10^6:
library(microbenchmark)
W <- rnorm(1e6)
splits <- cumsum(rep(1e5, 9))
tm <- microbenchmark(
splitAt(W, splits),
splitAt2(W, splits),
times=10)
tm
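Before comparing timings it is worth confirming that the two functions agree on well-behaved input (sorted positions strictly inside the vector). This is a small check on a short vector, not the benchmark itself; x_small and pos_small are names assumed for this sketch.

```r
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

splitAt2 <- function(x, pos) {
  out <- list()
  pos2 <- c(1, pos, length(x) + 1)
  for (i in seq_along(pos2[-1])) {
    out[[i]] <- x[pos2[i]:(pos2[i + 1] - 1)]
  }
  out
}

x_small <- as.numeric(1:20)
pos_small <- c(5, 10, 15)
same <- identical(splitAt(x_small, pos_small), splitAt2(x_small, pos_small))
same
```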
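Another alternative that might be faster and/or more readable/elegant than flodel's solution:
splitAt <- function(x, pos) {
  # findInterval() is applied to the element positions, not the values; pos must be sorted
  unname(split(x, findInterval(seq_along(x), pos)))
}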

How to assign from a function which returns more than one value?

Still trying to get into the R logic... what is the "best" way to unpack (on LHS) the results from a function returning multiple values?
I can't do this apparently:
R> functionReturningTwoValues <- function() { return(c(1, 2)) }
R> functionReturningTwoValues()
[1] 1 2
R> a, b <- functionReturningTwoValues()
Error: unexpected ',' in "a,"
R> c(a, b) <- functionReturningTwoValues()
Error in c(a, b) <- functionReturningTwoValues() : object 'a' not found
must I really do the following?
R> r <- functionReturningTwoValues()
R> a <- r[1]; b <- r[2]
or would the R programmer write something more like this:
R> functionReturningTwoValues <- function() {return(list(first=1, second=2))}
R> r <- functionReturningTwoValues()
R> r$first
[1] 1
R> r$second
[1] 2
--- edited to answer Shane's questions ---
I don't really need to give names to the parts of the result. I am applying one aggregate function to the first component and another to the second component (min and max; if it were the same function for both components, I would not need to split them).
(1) list[...]<- I had posted this over a decade ago on r-help. Since then it has been added to the gsubfn package. It does not require a special operator but does require that the left hand side be written using list[...] like this:
library(gsubfn) # need 0.7-0 or later
list[a, b] <- functionReturningTwoValues()
If you only need the first or second component these all work too:
list[a] <- functionReturningTwoValues()
list[a, ] <- functionReturningTwoValues()
list[, b] <- functionReturningTwoValues()
(Of course, if you only needed one value then functionReturningTwoValues()[[1]] or functionReturningTwoValues()[[2]] would be sufficient.)
See the cited r-help thread for more examples.
(2) with: If the intent is merely to combine the multiple values subsequently and the return values are named, then a simple alternative is to use with:
myfun <- function() list(a = 1, b = 2)
list[a, b] <- myfun()
a + b
# same
with(myfun(), a + b)
(3) attach: Another alternative is attach:
attach(myfun())
a + b
ADDED: with and attach
I somehow stumbled on this clever hack on the internet ... I'm not sure if it's nasty or beautiful, but it lets you create a "magical" operator that allows you to unpack multiple return values into their own variable. The := function is defined here, and included below for posterity:
':=' <- function(lhs, rhs) {
  frame <- parent.frame()
  lhs <- as.list(substitute(lhs))
  if (length(lhs) > 1)
    lhs <- lhs[-1]
  if (length(lhs) == 1) {
    do.call(`=`, list(lhs[[1]], rhs), envir=frame)
    return(invisible(NULL))
  }
  if (is.function(rhs) || is(rhs, 'formula'))
    rhs <- list(rhs)
  if (length(lhs) > length(rhs))
    rhs <- c(rhs, rep(list(NULL), length(lhs) - length(rhs)))
  for (i in 1:length(lhs))
    do.call(`=`, list(lhs[[i]], rhs[[i]]), envir=frame)
  return(invisible(NULL))
}
With that in hand, you can do what you're after:
functionReturningTwoValues <- function() {
return(list(1, matrix(0, 2, 2)))
}
c(a, b) := functionReturningTwoValues()
a
#[1] 1
b
# [,1] [,2]
# [1,] 0 0
# [2,] 0 0
I don't know how I feel about that. Perhaps you might find it helpful in your interactive workspace. Using it to build (re-)usable libraries (for mass consumption) might not be the best idea, but I guess that's up to you.
... you know what they say about responsibility and power ...
Usually I wrap the output into a list, which is very flexible (you can have any combination of numbers, strings, vectors, matrices, arrays, lists, and objects in the output),
like so:
func2<-function(input) {
a<-input+1
b<-input+2
output<-list(a,b)
return(output)
}
output<-func2(5)
for (i in output) {
print(i)
}
[1] 6
[1] 7
I put together an R package zeallot to tackle this problem. zeallot includes a multiple assignment or unpacking assignment operator, %<-%. The LHS of the operator is any number of variables to assign, built using calls to c(). The RHS of the operator is a vector, list, data frame, date object, or any custom object with an implemented destructure method (see ?zeallot::destructure).
Here are a handful of examples based on the original post,
library(zeallot)
functionReturningTwoValues <- function() {
return(c(1, 2))
}
c(a, b) %<-% functionReturningTwoValues()
a # 1
b # 2
functionReturningListOfValues <- function() {
return(list(1, 2, 3))
}
c(d, e, f) %<-% functionReturningListOfValues()
d # 1
e # 2
f # 3
functionReturningNestedList <- function() {
return(list(1, list(2, 3)))
}
c(f, c(g, h)) %<-% functionReturningNestedList()
f # 1
g # 2
h # 3
functionReturningTooManyValues <- function() {
return(as.list(1:20))
}
c(i, j, ...rest) %<-% functionReturningTooManyValues()
i # 1
j # 2
rest # list(3, 4, 5, ..)
Check out the package vignette for more information and examples.
functionReturningTwoValues <- function() {
results <- list()
results$first <- 1
results$second <-2
return(results)
}
a <- functionReturningTwoValues()
I think this works.
There's no right answer to this question. It really depends on what you're doing with the data. In the simple example above, I would strongly suggest:
Keep things as simple as possible.
Wherever possible, it's a best practice to keep your functions vectorized. That provides the greatest amount of flexibility and speed in the long run.
Is it important that the values 1 and 2 above have names? In other words, why is it important in this example that 1 and 2 be named a and b, rather than just r[1] and r[2]? One important thing to understand in this context is that a and b are also both vectors of length 1. So you're not really changing anything in the process of making that assignment, other than having 2 new vectors that don't need subscripts to be referenced:
> r <- c(1,2)
> a <- r[1]
> b <- r[2]
> class(r)
[1] "numeric"
> class(a)
[1] "numeric"
> a
[1] 1
> a[1]
[1] 1
You can also assign the names to the original vector if you would rather reference the letter than the index:
> names(r) <- c("a","b")
> names(r)
[1] "a" "b"
> r["a"]
a
1
[Edit] Given that you will be applying min and max to each vector separately, I would suggest either using a matrix (if a and b will be the same length and the same data type) or data frame (if a and b will be the same length but can be different data types) or else use a list like in your last example (if they can be of differing lengths and data types).
> r <- data.frame(a=1:4, b=5:8)
> r
a b
1 1 5
2 2 6
3 3 7
4 4 8
> min(r$a)
[1] 1
> max(r$b)
[1] 8
If you want to return the output of your function to the Global Environment, you can use list2env, like in this example:
myfun <- function(x) {
  a <- 1:x
  b <- 5:x
  df <- data.frame(a=a, b=b)
  newList <- list("my_obj1" = a, "my_obj2" = b, "myDF"=df)
  list2env(newList, .GlobalEnv)
}
myfun(3)
This function will create three objects in your Global Environment:
> my_obj1
[1] 1 2 3
> my_obj2
[1] 5 4 3
> myDF
a b
1 1 5
2 2 4
3 3 3
Lists seem perfect for this purpose. For example, within the function you would have
x = desired_return_value_1 # (vector, matrix, etc)
y = desired_return_value_2 # (vector, matrix, etc)
returnlist = list(x,y...)
} # end of function
and in the main program, after storing the function's result in returnlist:
x = returnlist[[1]]
y = returnlist[[2]]
Yes to your second and third questions -- that's what you need to do as you cannot have multiple 'lvalues' on the left of an assignment.
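For the use case described in the question (one aggregate for the first component, another for the second), a plain base R sketch could look like this. The function two_parts and its data are invented for illustration.

```r
# Hypothetical function returning two components in a named list
two_parts <- function() {
  list(first = c(3, 1, 2), second = c(7, 9, 8))
}

r <- two_parts()
lo <- min(r$first)    # aggregate applied to the first component
hi <- max(r$second)   # a different aggregate applied to the second
c(lo, hi)
```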
How about using assign?
functionReturningTwoValues <- function(a, b) {
assign(a, 1, pos=1)
assign(b, 2, pos=1)
}
You can pass the names of the variable you want to be passed by reference.
> functionReturningTwoValues('a', 'b')
> a
[1] 1
> b
[1] 2
If you need to access the existing values, the converse of assign is get.
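A variant of the same idea that writes into an explicit environment instead of the global one, and reads the values back with get. The helper set_two and the environment env are assumptions made for this sketch.

```r
# set_two() is a hypothetical helper; it assigns into whatever environment it is given
set_two <- function(name1, name2, envir) {
  assign(name1, 1, envir = envir)
  assign(name2, 2, envir = envir)
  invisible(NULL)
}

env <- new.env()
set_two("a", "b", envir = env)

a_val <- get("a", envir = env)   # get() is the converse of assign()
b_val <- get("b", envir = env)
```

Using a dedicated environment keeps the side effects contained, which tends to be easier to reason about than writing to the global workspace.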
[A]
If each of foo and bar is a single number, then there's nothing wrong with c(foo,bar); and you can also name the components: c(Foo=foo,Bar=bar). So you could access the components of the result 'res' as res[1], res[2]; or, in the named case, as res["Foo"], res["Bar"].
[B]
If foo and bar are vectors of the same type and length, then again there's nothing wrong with returning cbind(foo,bar) or rbind(foo,bar); likewise nameable. In the 'cbind' case, you would access foo and bar as res[,1], res[,2] or as res[,"Foo"], res[,"Bar"]. You might also prefer to return a dataframe rather than a matrix:
data.frame(Foo=foo,Bar=bar)
and access them as res$Foo, res$Bar. This would also work well if foo and bar were of the same length but not of the same type (e.g. foo is a vector of numbers, bar a vector of character strings).
[C]
If foo and bar are sufficiently different not to combine conveniently as above, then you should definitely return a list.
For example, your function might fit a linear model and also calculate predicted values, so you could have
LM<-lm(....) ; foo<-summary(LM); bar<-LM$fit
and then you would return list(Foo=foo,Bar=bar) and then access the summary as res$Foo and the predicted values as res$Bar.
source: http://r.789695.n4.nabble.com/How-to-return-multiple-values-in-a-function-td858528.html
It's 2021, and this is something I use frequently.
The tidyverse has a function called lst (from the tibble package) that names the list elements after the variables when creating the list.
After that, I use list2env() to assign the variables, or use the list directly.
library(tidyverse)
fun <- function(){
a<-1
b<-2
lst(a,b)
}
list2env(fun(), envir=.GlobalEnv) # unpacks the list's name-value pairs into variables in the global environment
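If loading the tidyverse is not wanted, roughly the same pattern can be sketched in base R by naming the list elements by hand. Here the values are unpacked into a fresh environment env rather than the global one; both that choice and the names are assumptions of this sketch.

```r
fun <- function() {
  a <- 1
  b <- 2
  list(a = a, b = b)   # base R: names must be written out explicitly
}

env <- new.env()
list2env(fun(), envir = env)   # each named element becomes a variable in env
ls(env)
env$a + env$b
```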
This is only for the sake of completeness and not because I personally prefer it. You can pipe %>% the result, evaluate it with curly braces {} and write variables to the parent environment using double-arrow <<-.
library(tidyverse)
functionReturningTwoValues() %>% {a <<- .[1]; b <<- .[2]}
UPDATE:
You can also use the multiple assignment operator from the zeallot package, %<-%:
c(a, b) %<-% list(0, 1)
I will post a function that returns multiple objects by way of vectors:
Median <- function(X){
  X_Sort <- sort(X)
  if (length(X)%%2==0){
    Median <- (X_Sort[(length(X)/2)]+X_Sort[(length(X)/2)+1])/2
  } else{
    Median <- X_Sort[(length(X)+1)/2]
  }
  return(Median)
}
That was a function I created to calculate the median. I know that there's an inbuilt function in R called median() but nonetheless I programmed it to build other function to calculate the quartiles of a numeric data-set by using the Median() function I just programmed. The Median() function works like this:
If a numeric vector X has an even number of elements (i.e., length(X)%%2==0), the median is calculated by averaging the elements sort(X)[length(X)/2] and sort(X)[(length(X)/2+1)].
If X doesn't have an even number of elements, the median is sort(X)[(length(X)+1)/2].
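A quick check of the two branches against R's built-in median(), using two small made-up vectors:

```r
Median <- function(X) {
  X_Sort <- sort(X)
  if (length(X) %% 2 == 0) {
    # even length: average the two middle elements
    (X_Sort[length(X) / 2] + X_Sort[length(X) / 2 + 1]) / 2
  } else {
    # odd length: take the single middle element
    X_Sort[(length(X) + 1) / 2]
  }
}

odd_case  <- Median(c(5, 1, 3))
even_case <- Median(c(4, 1, 3, 2))
c(odd_case, median(c(5, 1, 3)))
c(even_case, median(c(4, 1, 3, 2)))
```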
On to the QuartilesFunction():
QuartilesFunction <- function(X){
X_Sort <- sort(X) # Data is sorted in ascending order
if (length(X)%%2==0){
# Data number is even
HalfDN <- X_Sort[1:(length(X)/2)]
HalfUP <- X_Sort[((length(X)/2)+1):length(X)]
QL <- Median(HalfDN)
QU <- Median(HalfUP)
QL1 <- QL
QL2 <- QL
QU1 <- QU
QU2 <- QU
QL3 <- QL
QU3 <- QU
Quartiles <- c(QL1,QU1,QL2,QU2,QL3,QU3)
names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)","QL (3)", "QU (3)")
} else{ # Data number is odd
# Including the median
Half1DN <- X_Sort[1:((length(X)+1)/2)]
Half1UP <- X_Sort[(((length(X)+1)/2)):length(X)]
QL1 <- Median(Half1DN)
QU1 <- Median(Half1UP)
# Not including the median
Half2DN <- X_Sort[1:(((length(X)+1)/2)-1)]
Half2UP <- X_Sort[(((length(X)+1)/2)+1):length(X)]
QL2 <- Median(Half2DN)
QU2 <- Median(Half2UP)
# Methods (1) and (2) averaged
QL3 <- (QL1+QL2)/2
QU3 <- (QU1+QU2)/2
Quartiles <- c(QL1,QU1,QL2,QU2,QL3,QU3)
names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)","QL (3)", "QU (3)")
}
return(Quartiles)
}
This function returns the quartiles of a numeric vector by using three methods:
(1) Keeping the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
(2) Discarding the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
(3) Averaging the results obtained by using methods (1) and (2).
When the number of elements in the numeric vector X is even, the three methods coincide.
The result of the QuartilesFunction() is a vector that depicts the first and third quartiles calculated by using the three methods outlined.
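To see how the three methods differ on an odd-length input, here is a compact sketch that uses the built-in median() instead of the Median() function above. It is a restatement of the idea for one example vector, not the QuartilesFunction() itself, and the names incl, excl and avg are made up.

```r
x <- sort(c(7, 1, 4, 2, 6, 3, 5))   # odd number of elements
n <- length(x)
mid <- (n + 1) / 2                  # index of the median element

incl <- c(median(x[1:mid]), median(x[mid:n]))               # halves include the median
excl <- c(median(x[1:(mid - 1)]), median(x[(mid + 1):n]))   # halves exclude the median
avg  <- (incl + excl) / 2                                   # average of the two methods

rbind(incl, excl, avg)
```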
With R 3.6.1, I can do the following
fr2v <- function() { c(5,3) }
a_b <- fr2v()
(a_b[[1]]) # prints "5"
(a_b[[2]]) # prints "3"
To obtain multiple outputs from a function and keep them in the desired format you can save the outputs to your hard disk (in the working directory) from within the function and then load them from outside the function:
myfun <- function(x) {
df1 <- ...
df2 <- ...
save(df1, file = "myfile1")
save(df2, file = "myfile2")
}
load("myfile1")
load("myfile2")
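A runnable sketch of the same save/load pattern, writing to a temporary directory instead of the working directory. The function write_two and the two small data frames are placeholders invented for this example.

```r
# write_two() and its data frames are hypothetical placeholders
write_two <- function(dir) {
  df1 <- data.frame(a = 1:3)
  df2 <- data.frame(b = letters[1:3])
  save(df1, file = file.path(dir, "myfile1"))
  save(df2, file = file.path(dir, "myfile2"))
  invisible(NULL)
}

out_dir <- tempdir()
write_two(out_dir)

# load() restores the objects under their original names and returns those names
loaded1 <- load(file.path(out_dir, "myfile1"))
loaded2 <- load(file.path(out_dir, "myfile2"))
loaded1
loaded2
```

Note that load() recreates df1 and df2 in the calling environment, so any existing objects with those names are overwritten.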

Resources