Understanding the source code of the "ave" function in R

Here is the source code for the "ave" function in R:
function (x, ..., FUN = mean)
{
    if (missing(...))
        x[] <- FUN(x)
    else {
        g <- interaction(...)
        split(x, g) <- lapply(split(x, g), FUN)
    }
    x
}
I am having trouble understanding how the assignment "split(x, g) <- lapply(split(x, g), FUN)" works. Consider the following example:
# Overview: function inputs and outputs
> x = 10*1:6
> g = c('a', 'b', 'a', 'b', 'a', 'b')
> ave(x, g)
[1] 30 40 30 40 30 40
# Individual components of "split" assignment
> split(x, g)
$a
[1] 10 30 50
$b
[1] 20 40 60
> lapply(split(x, g), mean)
$a
[1] 30
$b
[1] 40
# Examine "x" before and after assignment
> x
[1] 10 20 30 40 50 60
> split(x, g) <- lapply(split(x, g), mean)
> x
[1] 30 40 30 40 30 40
Questions:
• Why does the assignment, "split(x,g) <- lapply(split(x,g), mean)", directly modify x? Does "<-" always modify the first argument of a function, or is there some other rule for this?
• How does this assignment even work? Both the "split" and "lapply" statements have lost the original ordering of x. They are also length 2. How do you end up with a vector of length(x) that matches the original ordering of x?

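This is a tricky one. <- usually does not work in this way. What is actually happening is that you are not calling split(); you are calling a replacement function called split<-(). R evaluates split(x, g) <- value as x <- `split<-`(x, g, value = value), so it is x, the variable named in the first argument, that gets reassigned. The documentation of split says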
[...] The replacement forms replace values corresponding to such a division. unsplit reverses the effect of split.
See also this answer
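A small sketch of what happens behind the scenes, using the example data from the question. The first part writes out the replacement call explicitly. The helper split_assign() is hypothetical (it is not part of base R) and only illustrates, in simplified form, how the original order can be recovered: it splits the positions seq_along(x) by g rather than the values, so each group's result can be written straight back into the slots it came from, with a length-1 result recycled across its group.

```r
x <- 10 * 1:6
g <- c("a", "b", "a", "b", "a", "b")

# split(x, g) <- value is evaluated by R as an explicit call to `split<-`
value <- lapply(split(x, g), mean)
x <- `split<-`(x, g, value = value)
print(x)
# [1] 30 40 30 40 30 40

# Hypothetical helper (not base R): a simplified version of the same idea
split_assign <- function(x, f, value) {
  # List of original positions, one element per group (same group order as split())
  idx <- split(seq_along(x), f)
  for (j in seq_along(idx)) {
    # Write group j's result back to its original positions;
    # a length-1 result is recycled over the whole group
    x[idx[[j]]] <- value[[j]]
  }
  x
}

y <- 10 * 1:6
split_assign(y, g, lapply(split(y, g), mean))
# [1] 30 40 30 40 30 40
```

Splitting the indices instead of the values is what lets the result keep length(x) and the original ordering, even though the list of group results has only two elements.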

Related

How to make the result of a function the argument of the same function automatically in R

I am working on a complicated project, and each time I need to run my function on the result of its previous run. To make my point clearer, suppose that I have a vector x and a function myfunc. I need to run myfunc on x, then take the output and pass it back in as the argument of the same function. I need to repeat this automatically several times.
For example,
x <- c(1,2,3)
myfunc <- function(x){
  res <- 2*x
  return(res)
}
Then,
x <- myfunc(x)
> x
[1] 2 4 6
x <- myfunc(x)
> x
[1] 4 8 12
How can I do this automatically (repeating, say, 5 times)? In the end, I only need the result of the final run, for example, the result of the fifth run.
x <- c(1,2,3)
for (i in 1:5) {
  x <- myfunc(x)
}
This leaves x as [1] 32 64 96, the same result as myfunc(myfunc(myfunc(myfunc(myfunc(x))))) on the original x.
Just keep reassigning in a loop.
A good way to do so would be to include an argument repeats in your function itself.
myfunc <- function(x, repeats = 1){
  res <- x
  # seq_len() runs zero times when repeats = 0 (1:0 would run twice)
  for (i in seq_len(repeats)) {
    res <- 2*res
  }
  return(res)
}
> myfunc(x, 5)
[1] 32 64 96
Here's a one-liner. Recall() calls the current function again, so f keeps calling itself with rep - 1 until rep reaches 0. I assume that whatever happens in the expression is vectorized, as * is; if it is not, wrap the function in Vectorize().
f <- function(n, rep) if(rep) Recall(n * 2, rep - 1) else n
f(1:3, 5)
[1] 32 64 96
Here is another option, using reduce() from purrr:
library(purrr)
reduce(1:5, ~ .x *2, .init = x)
[1] 32 64 96
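If you would rather not depend on purrr, a minimal base-R sketch of the same idea uses Reduce() with an initial value and ignores the element being iterated over. The helper name iterate() is hypothetical. Both forms use seq_len(), so asking for zero repetitions returns the input unchanged.

```r
myfunc <- function(x) 2 * x
x <- c(1, 2, 3)

# Reduce(f, along, init) calls f(acc, i) once per element of 'along',
# starting from acc = init; the element i itself is ignored here
Reduce(function(acc, i) myfunc(acc), seq_len(5), x)
# [1] 32 64 96

# Hypothetical general helper: apply f to x, n times in a row
iterate <- function(f, x, n) {
  for (i in seq_len(n)) {
    x <- f(x)
  }
  x
}

iterate(myfunc, x, 5)
# [1] 32 64 96
```

Unlike the <<- approach, neither version modifies x in the calling environment; you keep only the final result.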
You can use:
x <- c(1,2,3)
myfunc <- function(x){
  res <- 2*x
  x <<- res
  return(res)
}
The double assign operator makes sure that your initial x gets overwritten in each function call.
Here's the result for 5 runs:
replicate(5, myfunc(x))
[,1] [,2] [,3] [,4] [,5]
[1,] 2 4 8 16 32
[2,] 4 8 16 32 64
[3,] 6 12 24 48 96

How to add possible divisor numbers?

How do I retrieve the maximum sum of possible divisor numbers?
I have the function below, which gives the possible divisors of a number.
Code
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}
Example
For 99, it gives the following divisors:
divisors(99)
[1] 1 3 9 11 33
My expected logic:
Go through the divisors from the last value to the first.
The last number is 33. The next smaller number that divides 33 is 11, so I select 11. Starting from 11, the next smaller number that divides 11 is 1, so I select 1. Then I add all the selected numbers:
33 + 11 + 1 = 45
Move to the next number, 11. The next smaller number that divides 11 is 1, so I select 1 and add:
11 + 1 = 12
Move to the next number, 9. The next smaller number that divides 9 is 3, and the next one that divides 3 is 1, so I add:
9 + 3 + 1 = 13
Move to the next number, 3. The next smaller number that divides 3 is 1, so I add:
3 + 1 = 4
The maximum among these is 45.
I am struggling to write this logic in R. Any help or advice is much appreciated.
Note: prime numbers can be ignored.
Update
For large integers, e.g., values near the maximum integer .Machine$integer.max (which is prime), you can run the code below (note that I modified the functions divisors and f a bit):
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
f <- function(y) {
  if (length(y) <= 2) {
    return(as.integer(sum(y)))
  }
  l <- length(y)
  h <- y[l]
  yy <- y[-l]
  h + f(yy[h %% yy == 0])
}
and you will see
> n <- .Machine$integer.max - 1
> x <- divisors(n)
> max(sapply(length(x):2, function(k) f(head(x, k))))
[1] 1569603656
You can define a recursive function f that gives successive divisors:
f <- function(y) {
  if (length(y) == 1) {
    return(y)
  }
  h <- y[length(y)]
  yy <- y[-length(y)]
  c(f(yy[h %% yy == 0]), h)
}
and you will see all possible successive divisor tuples
> sapply(rev(seq_along(x)), function(k) f(head(x, k)))
[[1]]
[1] 1 11 33
[[2]]
[1] 1 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then we apply f within sapply() and take the maximum of the sums:
> max(sapply(rev(seq_along(x)), function(k) sum(f(head(x, k)))))
[1] 45
which gives the desired output.
You can also use the following solution. It may look a little complicated, and there is certainly an easier, more efficient solution, but I thought it could be useful to you. I will start from your divisors output:
> x
[1] 1 3 9 11 33
# First I created a list whose first element is our original x and from then on
# I subset the first element till the last element of the list
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
> lst
[[1]]
[1] 1 3 9 11 33
[[2]]
[1] 1 3 9 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then I wrote a custom function to implement your conditions and collect the desired output. For this I created a function factory, i.e., a function that creates another function:
As you might have noticed, the outermost function does not take any arguments. It only sets up an empty vector, out, in which to save the desired elements. It is created in the execution environment of the outermost function, which shields it from any changes in the global environment.
The inner function is the one that takes our vector x, so in general we call the whole setup as fnf()(x). The first element of the out vector is the first element of the (decreasingly sorted) x, i.e., 33. Then I find all elements of x that divide this first element, i.e., those for which the remainder is 0. After finding them, I take the second one (11), since the first one is 33 itself, and store it in the out vector. Then I trim x so that it starts at that value (11) and repeat the same process.
Since the process is repeated over and over, I thought this was a good case for recursion. Recursion is a programming technique in which a function calls itself from within its own body. As you might have noticed, I call fn inside the function to repeat the process, each time with fewer values.
This may look a bit complicated, but I believe there are some good points here to pick up for future exploration; I found them very useful, and I hope that's the case for you too.
fnf <- function() {
  out <- c()
  fn <- function(x) {
    out <<- c(out, x[1])
    z <- x[out[length(out)] %% x == 0]
    if (length(z) >= 2) {
      out[length(out) + 1] <<- z[2]
    } else {
      return(out)
    }
    x <- x[!duplicated(x)][which(x[!duplicated(x)] == z[2]):length(x[!duplicated(x)])]
    fn(x)
    out[!duplicated(out)]
  }
}
# The result of applying the custom function on `lst` would result in your
# divisor values
lapply(lst, function(x) fnf()(sort(x, decreasing = TRUE)))
[[1]]
[1] 33 11 1
[[2]]
[1] 11 1
[[3]]
[1] 9 3 1
[[4]]
[1] 3 1
[[5]]
[1] 1
Finally, we sum each element and extract the maximum value:
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 45
To test a very large integer, I used ThomasIsCoding's modified divisors function:
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
x <- divisors(.Machine$integer.max - 1)
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 1569603656
You'll need to recurse. If I understand correctly, this should do what you want:
fact <- function(x) {
  x <- as.integer(x)
  div <- seq_len(abs(x)/2)
  factors <- div[x %% div == 0L]
  return(factors)
}
maxfact <- function(x) {
  factors <- fact(x)
  if (length(factors) < 3L) {
    return(sum(factors))
  } else {
    return(max(factors + mapply(maxfact, factors)))
  }
}
maxfact(99)
[1] 45
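For comparison, here is a dynamic-programming sketch, which is a different technique from the recursive answers above: walk the divisors in increasing order and record, for each divisor d, the best chain sum ending at d, namely d plus the best value among the smaller divisors that divide d. The function name best_chain_sum() is hypothetical, and its input is assumed to be the ascending output of the question's divisors().

```r
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}

# Hypothetical helper: maximum sum over chains in which each element
# divides the previous (larger) one. 'd' must be sorted ascending.
best_chain_sum <- function(d) {
  best <- numeric(length(d))
  for (i in seq_along(d)) {
    smaller <- seq_len(i - 1)
    # indices of smaller divisors that divide d[i]
    cand <- smaller[d[i] %% d[smaller] == 0]
    best[i] <- d[i] + if (length(cand)) max(best[cand]) else 0
  }
  max(best)
}

best_chain_sum(divisors(99))
# [1] 45
```

Each best value is computed once and reused, so no chain is re-explored; once the divisors are known, the work is quadratic in the number of divisors, which is usually small.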

How to write generic function with two inputs?

I am a newbie in programming, and I have run into an issue with generic functions in R: how do I write one when there are multiple inputs?
As a simple example, take this dataset and function:
z <- c(2,3,4,5,8)
calc.simp <- function(a,x){a*x+8}
# Test the function:
calc.simp(x=z,a=3)
[1] 14 17 20 23 32
Now I change the class of z:
class(z) <- 'simp'
How should I write the generic function 'calc' as there are two inputs?
My attempts and errors are below:
calc <- function(x) UseMethod('calc',x)
calc(x=z)
Error in calc.simp(x = z) : argument "a" is missing, with no default
And
calc <- function(x,y) UseMethod('calc',x,y)
Error in UseMethod("calc", x, y) : unused argument (y)
My confusion might be a fundamental one as I am just a beginner. Please help! Thank you very much!
I'd suggest you model your generic function on the template used by innumerable base R functions, e.g., mean:
> mean
function (x, ...)
UseMethod("mean")
In your case, that would translate to the following generic which (if I understand your question correctly) works just fine:
calc <- function(x, ...) UseMethod('calc')
calc.simp <- function(a, x) {
  x <- unclass(x)
  a * x + 8
}
## Try it out
z <- c(2,3,4,5,8)
class(z) <- "simp"
calc.simp(x = z, 10)
## [1] 28 38 48 58 88
calc(x = z, 10)
## [1] 28 38 48 58 88
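A variant worth considering, as a sketch: give the method the same first argument as the generic (x) and let the other inputs travel through ..., so dispatch happens on x and a can be supplied by name or position. UseMethod() itself only takes the generic's name and, optionally, the object to dispatch on, which is why UseMethod('calc', x, y) failed with "unused argument". The calc.default method is a hypothetical addition that just reports unsupported classes.

```r
# Generic: dispatches on the class of x; extra inputs travel through '...'
calc <- function(x, ...) UseMethod("calc")

# Method for class "simp": same first argument as the generic, then 'a'
calc.simp <- function(x, a, ...) {
  a * unclass(x) + 8
}

# Hypothetical fallback for objects without a dedicated method
calc.default <- function(x, ...) {
  stop("no calc() method for class ", paste(class(x), collapse = "/"))
}

z <- structure(c(2, 3, 4, 5, 8), class = "simp")
calc(z, a = 3)
# [1] 14 17 20 23 32
calc(z, 10)
# [1] 28 38 48 58 88
```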

Take the first unique value from a function

This is my function:
g <- function(x,y){
x <- (x-y):x
y <- 1:30 # ------> (y is always fixed 1:30)
z<- outer(x,y,fv) # ---->(fv is a previous function)
s <- colSums(z)
which(s==max(s),arr.ind=T)
}
It tells me the position of the max value in s. My problem is choosing y: for a small y, max(s) appears more than once in s. For example:
#given x=53
> g(53,1)
[1] 13 16 20 22 25 26 27
> g(53,2)
[1] 20 25 26
> g(53,3)
[1] 20 25 26
> g(53,4)
[1] 20 25 26
> g(53,5)
[1] 20 25
> g(53,6)
[1] 25 -----> This is the only result I would like from my function (right y=6)
Another example:
# given x=71
> g(71,1)
[1] 7 9 14
> g(71,2)
[1] 7 14
> g(71,3)
[1] 14 -----> my desired result (right y=3)
Therefore, I would like a function that returns the first unique solution, using the smallest possible y (e.g., g(53) = 25, g(71) = 14, ...). Any help? Thanks.
Here is a simplified example; I hope this makes the question clearer:
#The idea is the same:
n <- 1:9
e <- rep(n, 500)
p <- sample(e) # ---> (shuffle so the matrix is mixed and ties for the max occur later)
mat <- matrix(p, 90)
g <- function(x,y){
  x <- (x-y):x
  k <- rowSums(mat[,x])
  which(k==max(k), arr.ind=TRUE)
}
#In my sample matrix :
k <- rowSums(mat[,44:45])
which(k==max(k), arr.ind=T)
[1] 44 71 90
#In fact
g(45,1)
[1] 44 71 90 # ---> more than one solution
g(45,2)
[1] 90 # ----> I would like to pick up this value, which is the first unique solution given x=45
Therefore, I would like a function that returns the first unique solution for the smallest possible y, given x (in this new example: g(45) = 90, ...).
I got it. It is a bit long, but I think it is right.
Taking into consideration the second, simplified example:
g <- function(x,y){
  x <- (x-y):x
  k <- rowSums(mat[,x])
  q <- which(k==max(k), arr.ind=TRUE)
  length(q)
}
gv <- Vectorize(g)
l <- function(x){
  y <- 1:30 # <- (up to 30 to be sure)
  z <- outer(x,y,gv)
  y <- which.min(z) # <- (the min is surely length 1, and which.min takes the first)
  x <- (x-y):x
  k <- rowSums(mat[,x])
  q <- which(k==max(k), arr.ind=TRUE)
  q
}
l(45)
[1] 90
It seems like you could just do this with a recursive function. Consider the following:
set.seed(42)
n = 1:9
e = rep(n, 500)
p = sample(e)
mat = matrix(p, 90)
g <- function(x, y=1) {
  xv <- (x-y):x
  k <- rowSums(mat[, xv])
  i <- which(k == max(k), arr.ind=TRUE)
  n <- length(i)
  if (n == 1) {
    return(y) # want to know the min y that solves the problem, right?
  } else {
    y <- y + 1 # increase y by 1
    g(x,y) # run our function again with a new value of y
  }
}
You should now be able to run g(45) and get 1, since that is the smallest value of y that solves the problem, and run g(33) to get 2.
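The recursive version above returns the smallest y; if you also want the row index itself (the 90 in the question), a loop-based sketch can return both. The name first_unique_max() and its max_y cap are hypothetical additions: the cap stops the search (returning NULL) if no unique maximum appears, and its default keeps (x - y):x from reaching column 0 or below. The matrix is generated with set.seed(42) only to keep the sketch self-contained, so its result will differ from the numbers in the question.

```r
set.seed(42)
mat <- matrix(sample(rep(1:9, 500)), 90)  # 90 rows x 50 columns

# Hypothetical helper: smallest y for which the row sums over columns
# (x - y):x have a single maximum; returns that y and the row index.
first_unique_max <- function(mat, x, max_y = x - 1) {
  for (y in seq_len(max_y)) {
    k <- rowSums(mat[, (x - y):x, drop = FALSE])
    i <- which(k == max(k))
    if (length(i) == 1) {
      return(list(y = y, row = i))
    }
  }
  NULL  # no unique maximum found within max_y steps
}

first_unique_max(mat, 45)
```

A loop also avoids deep recursion and never runs past the first column, whereas the recursive g() keeps increasing y until a unique maximum appears.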

vectorize head(which(t > x), n=1) for many values of x

I have a situation similar to the following in R:
t <- (1:100) * 15
x <- c(134, 552, 864, 5000)
And I want to find for each value in x what the first index in t where t > x is. The following works using a loop:
y <- numeric(length(x))
for (i in 1:length(x))
  y[i] <- which(t > x[i])[1]
# y
# [1] 9 37 58 NA
I was taught that loops in R are 'bad and slow', and while the time this takes to run for a reasonably large x is not a deal-breaker, I would like to know whether there is a better way.
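If the objects are not too big (so that RAM is not limiting), you don't need *apply functions, which are just hidden loops. This counts how many elements of t exceed each x, so it relies on t being sorted in increasing order, as it is here: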
temp <- outer(x,t,'<')
y <- length(t) - (rowSums(temp)-1)
y[y>length(t)] <- NA
#[1] 9 37 58 NA
fun <- function(x){
which(t > x)[1]
}
> sapply(x, fun)
[1] 9 37 58 NA
Almost the same:
require(functional)
apply(matrix(t > rep(x, each=length(t)), length(t)), 2, Compose(which, Curry(append, Inf), min))
## [1] 9 37 58 Inf
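Since t is sorted in increasing order here, base R's findInterval() offers another loop-free sketch: findInterval(x, t) counts how many elements of t are less than or equal to each x, so adding 1 gives the first index where t > x. Values with no larger element in t are set to NA to match the loop's output. Like the outer() answer, this assumes t is sorted; it gives wrong answers for unsorted t.

```r
t <- (1:100) * 15
x <- c(134, 552, 864, 5000)

# For sorted t, findInterval(x, t) is the number of t values <= each x,
# so the next position is the first index where t > x
y <- findInterval(x, t) + 1L
y[y > length(t)] <- NA  # no element of t exceeds x
y
# gives 9 37 58 NA
```

findInterval() uses a binary search on t, so it avoids building the length(x) by length(t) matrix that outer() needs.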
