In R, what is the most efficient way of implementing
y[1] = x[1]
for (i in 2:length(x)) {
y[i] = f (y[i-1], x[i])
}
where x is my input and f is a given function?
Ideally I would have liked to be able to say
y[1] = x[1]
y[2:N] = sapply (2:N, function (i) {f(y[i-1], x[i])}) # WRONG of course!
This cannot work, because every call inside sapply sees the old y: the new values are only assigned after sapply returns. Is there a built-in looping function that will do this for me?
Reduce will reduce the number of keystrokes needed to do this sort of operation:
Reduce(f, x, accumulate=TRUE)
For instance, let's consider a simple case where you are just adding the elements (of course cumsum(x) would be more efficient in this case):
x <- c(1, 2, 4, 5)
Reduce("+", x, accumulate=TRUE)
# [1] 1 3 7 12
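Since "+" is a special case, here is a minimal sketch with a custom two-argument f, checked against the explicit loop from the question. The smoothing function f below is a made-up example, not something taken from the question.

```r
# Hypothetical f: average the previous result with the current input
f <- function(y_prev, x_i) 0.5 * y_prev + 0.5 * x_i

x <- c(1, 2, 4, 5)

# Explicit loop from the question
y_loop <- numeric(length(x))
y_loop[1] <- x[1]
for (i in 2:length(x)) {
  y_loop[i] <- f(y_loop[i - 1], x[i])
}

# Same recurrence expressed with Reduce
y_reduce <- Reduce(f, x, accumulate = TRUE)

all.equal(y_loop, y_reduce)
# [1] TRUE
y_reduce
# [1] 1.000 1.500 2.750 3.875
```

Reduce passes the accumulated value as the first argument of f and the next element of x as the second, which matches f(y[i-1], x[i]) in the loop.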
I was given a task to write a function, which I named my_mode_k.
It takes two arguments:
(x, k)
where x is a vector of natural numbers of length n, and the largest value in x can be k, with k < n.
my_mode_k returns the most frequent value in x. If more than one value occurs the same maximal number of times, the function returns the smallest of them.
for example:
my_mode_k(x = c(1, 1, 2, 3, 3) , k =3)
1
This is code I wrote:
my_mode_k <- function(x, k){
n <- length(x)
x_lemma <- rep(0, k)
for(i in 1:n){
x_lemma[i] < x_lemma[i] +1
}
x_lem2 <- 1
for( j in 2:k){
if(x_lemma[x_lem2] < x_lemma[j]){
x_lem2 <- j
}
}
x_lem2
}
which isn't working properly.
for example:
my_mode_k(x = c(2,3,4,3,2,2,5,5,5,5,5,5,5,5), k=5)
[1] 1
whereas the function is supposed to return 5.
I don't understand why it fails, or what intuition I should have to even tell whether a function is working properly (it took me some time to realize that it isn't doing the required task), so that I can fix the mistake.
Here are a few steps on how you can achieve this.
k <- 5
input <- c(2,3,4,3,3,3,3,3,3,3,2,2,5,5,5,5,5,5,5,5)
# Calculate frequencies of elements.
tbl <- table(input[input <= k])
# Find which is max. Notice that it returns the minimum if there is a tie.
tbl.max <- which.max(tbl)
# Find which value is your result.
names(tbl.max)
input <- c(2,2,3,3,3,5,5,5)
names(which.max(table(input[input <= k])))
# 3
input <- c(2,2,5,5,5,3,3,3)
names(which.max(table(input[input <= k])))
# 3
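As for why the original function returns 1: the line x_lemma[i] < x_lemma[i] +1 is a comparison (<) rather than an assignment (<-), so the counts never change, and the counter should be indexed by the value x[i] rather than by the position i. Below is a minimal sketch of a corrected version, assuming x contains only integers between 1 and k. The second variant, my_mode_k2, is a name introduced here for illustration; it relies on the base R function tabulate() to do the counting.

```r
my_mode_k <- function(x, k) {
  counts <- rep(0, k)
  for (i in seq_along(x)) {
    # index by the value x[i], and assign with <-
    counts[x[i]] <- counts[x[i]] + 1
  }
  # which.max returns the first maximum, i.e. the smallest value on ties
  which.max(counts)
}

# Equivalent without an explicit loop
my_mode_k2 <- function(x, k) which.max(tabulate(x, nbins = k))

my_mode_k(c(1, 1, 2, 3, 3), k = 3)
# [1] 1
my_mode_k(c(2, 3, 4, 3, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5), k = 5)
# [1] 5
```

A quick way to tell whether such a function works is to run it on a few small inputs whose answer you can count by hand, including one with a tie.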
I would like to create a bunch of functions with a particular structure in the variable name as a crude workaround for what should be one function with multiple arguments (this I cannot do directly). Let's consider the following analogous example:
for(i in 1:3){
for(j in 1:2){
temp_fun <- function(x){
(x+i)^j
}
assign(paste0("fun", paste0("_plus_", i, "_pow_", j)), temp_fun)
}
}
This loop creates 6 functions that take x as their only argument:
fun_plus_1_pow_1
fun_plus_1_pow_2
fun_plus_2_pow_1
fun_plus_2_pow_2
fun_plus_3_pow_1
fun_plus_3_pow_2
For instance, fun_plus_2_pow_1(2) should return (2+2)^1 = 4, but it returns 25. I know what happens here: the values of i and j get updated while the loop is running, and eventually i=3 and j=2 are used, resulting in (2+3)^2 = 25.
But how can I make them local?
Here is one option. I also changed that assign stuff (creating a bunch of systematically named objects in the global environment is a clear sign to use a list instead).
funs <- matrix(list(), 3, 2, dimnames = list(paste0("plus", 1:3),
paste0("pow", 1:2)))
for(i in 1:3){
for(j in 1:2){
create_fun <- function(i, j){
#force evaluation so that the values are stored in the closure
force(i); force(j)
function(x) (x+i)^j
}
funs[i, j][[1]] <- create_fun(i, j)
}
}
funs["plus2", "pow1"][[1]](2)
#[1] 4
Why do you need to do this? Would it be sufficient to just define one function fun(x, i, j) and then use partial application:
library(pryr)
fun <- function(x, i, j) (x + i)^j
partial(fun, i = 2, j = 1)(2)
## [1] 4
# an example of passing partial(...) as a function to another function, i.e. to sapply
sapply(1:10, partial(fun, i = 2, j = 1))
## [1] 3 4 5 6 7 8 9 10 11 12
Note that partial(fun, i = i, j = j) for particular values of i and j is a function of x alone.
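If you would rather stay in base R, a function factory combined with Map gives a named list of closures without pryr. This is only a sketch: make_fun and the list name funs are names chosen here for illustration, and the element names mimic the naming scheme from the question.

```r
# Factory: each call gets its own environment holding i and j
make_fun <- function(i, j) {
  force(i); force(j)  # evaluate the arguments now, not at first use
  function(x) (x + i)^j
}

# All (i, j) combinations; i varies fastest
grid <- expand.grid(i = 1:3, j = 1:2)

# One closure per row of the grid, stored in a named list
funs <- Map(make_fun, grid$i, grid$j)
names(funs) <- paste0("fun_plus_", grid$i, "_pow_", grid$j)

funs$fun_plus_2_pow_1(2)
# [1] 4
funs$fun_plus_3_pow_2(2)
# [1] 25
```

Keeping the functions in a list means they can be looped over or passed around together, instead of being looked up by name in the global environment.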
a <- c(0,3,7,2)
b <- 10
I am trying to distribute a certain number of units (b) over the elements of vector a. Instead of adding 10/4 to every value of a, I want to fill up the smallest values first. The result vector for this case should be c(5,5,7,5).
What I've tried:
f = e + b
opt.vert <- function(b,a,f) {
repeat{lapply(1:length(a),
function(x) if((a[[x]] == min(a)) && (a[[x]]) < (b/length(a))){
a[[x]] <- a[[x]] +1
} else {
a[[x]] <- a[[x]]
} )
if(sum(a) >= f) break
}
return(a)
}
Apart from being horribly inelegant, this approach also doesn't work. I'm having a hard time figuring out what's wrong with it, because it seems to get stuck in an infinite loop, so I get no error message.
for (i in seq_len(b)) a[which.min(a)] <- a[which.min(a)] + 1
#[1] 5 5 7 5
Note that which.min returns the position of the first minimum. If you want to break ties differently, you'll have to modify this slightly.
(I suspect spending some time on the mathematical background of the task might lead to more efficient solutions that could avoid loops. Might be a nice puzzle for people with more spare time.)
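One possible loop-free sketch along those lines, assuming a holds whole numbers and b is a non-negative whole number: sort the values, work out how many of the smallest ones can be lifted to a common level with b units, then hand out any leftover units one by one. The function name fill_up and its internals are made up for this illustration; the leftover units go to the earliest positions, to mimic the tie-breaking of which.min.

```r
fill_up <- function(a, b) {
  s <- sort(a)
  # cost[k]: units needed to lift the k smallest values up to s[k]
  cost <- seq_along(s) * s - cumsum(s)
  k <- max(which(cost <= b))   # how many elements end up being raised
  spare <- b - cost[k]         # units left once those k are level
  level <- s[k] + spare %/% k  # common level all k elements reach
  out <- pmax(a, level)
  # leftover units go to the first positions at that level, one each
  idx <- which(out == level)
  extra <- head(idx, spare %% k)
  out[extra] <- out[extra] + 1
  out
}

fill_up(c(0, 3, 7, 2), 10)
# [1] 5 5 7 5
```

The cost is dominated by the sort, so it no longer grows with b as the loop does.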
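Something like this, using Reduce. With accumulate=TRUE it returns a list of every intermediate state, and the last element of that list is the final vector: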
a <- c(0, 3, 7, 2)
b <- 10
Reduce(function(x, y) {
idx <- which.min(x)
x[idx] <- x[idx] + 1
x
}, rep(1, b), a, accumulate=TRUE)
I am struggling to write efficient code that computes a result vector r from an input vector v using this function:
r(i) = \sum_{j=i-N}^{i} [v(i)-v(j)] * exp(v(i)-v(j))
where i runs from N to M over the vector v, and the length M of v is much larger than N (M >> N).
Of course this is feasible with two nested for loops, but that is too slow for my purposes and probably not idiomatic R.
A MWE:
for (i in c(N+1):length(v)){
csum <- 0
for (j in i:c(i-N)) {
csum <- csum + (v[i]-v[j])*exp(v[i]-v[j])
}
r[i] <- csum
}
In my real application M > 10^5 and the v vector is indeed several vectors.
I have been trying with nested applications of lapply and rollapply without success.
Any suggestion is welcome.
Thanks!
I don't know if it is any more efficient but something you can try:
r[N:M] <- sapply(N:M, function(i) tail(cumsum((v[i]-v[1:N])*exp(v[i]-v[1:N])), 1))
To check that both computations give the same results, I computed r your way and r2 my way (initializing r2 to rep(NA, M)) and compared them:
all(abs(r-r2)<1e-12, na.rm=TRUE)
# [1] TRUE
NOTE: as in @lmo's answer, tail(cumsum(...), 1) can be efficiently replaced by just using sum(...):
r[N:M] <- sapply(N:M, function(i) sum((v[i]-v[1:N])*exp(v[i]-v[1:N])))
Here is a method with a single for loop.
# create new blank vector
rr <- rep(NA,M)
for(i in N:length(v)) {
rr[i] <- sum((v[i] - v[seq_len(N)]) * exp(v[i] - v[seq_len(N)]))
}
Check for equality:
all.equal(r, rr)
[1] TRUE
You can avoid computing the difference twice by storing it, which should give a small speed-up.
for(i in N:length(v)) {
x <- v[i] - v[seq_len(N)]
rr[i] <- sum(x * exp(x))
}
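Note that the snippets above index v[1:N] (the first N elements), whereas the nested loops in the question sum over the sliding window v[(i-N):i]. If the sliding window is what is intended, one possible sketch uses the base R function embed() to lay out all windows as rows of a matrix, so that no R-level loop is needed. The data, N and M below are made up for the check, and the nested loop is kept only as a reference.

```r
set.seed(1)
N <- 5
M <- 50
v <- rnorm(M)

# Reference: nested loops as in the question
r <- rep(NA_real_, M)
for (i in (N + 1):M) {
  csum <- 0
  for (j in i:(i - N)) {
    csum <- csum + (v[i] - v[j]) * exp(v[i] - v[j])
  }
  r[i] <- csum
}

# Row k of E is v[k+N], v[k+N-1], ..., v[k], i.e. the window ending at i = k+N
E <- embed(v, N + 1)
d <- E[, 1] - E  # v[i] - v[j] for every j in the window
r2 <- rep(NA_real_, M)
r2[(N + 1):M] <- rowSums(d * exp(d))

all.equal(r, r2)
# [1] TRUE
```

The matrix has (M - N) * (N + 1) entries, so this trades memory for speed; for very large N a loop over i with a vectorised inner sum may be the safer choice.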
I am trying to construct a new variable, z, using two pre-existing variables - x and y. Suppose for simplicity that there are only 5 observations (corresponding to 5 time periods) and that x=c(5,7,9,10,14) and y=c(0,2,1,2,3). I’m really only using the first observation in x as the initial value, and then constructing the new variable z using depreciated values of x[1] (depreciation rate of 0.05 per annum) and each of the observations over time in the vector, y. The variable I am constructing takes the form of a new 5 by 1 vector, z, and it can be obtained using the following simple commands in R:
z=NULL
for(i in 1:length(x)){n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))}
The problem I am having is that I need to define this operation as a function. That is, I need to create a function f that will spit out the vector z whenever any arbitrary vectors x and y are plugged into the function, f(x,y). I’ve been going around in circles for days now and I was wondering if someone would be kind enough to provide me with a suggestion about how to proceed. Thanks in advance.
I hope following will work for you...
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getZ = function(x,y){
z = NULL
for(i in 1:length(x)){
n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))
}
return(z)
}
z = getZ(x,y)
z
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
This version lets you pass the depreciation rate (.05 or any other value) as r.
ConstructZ <- function(x, y, r){
  d <- 1 - r                 # share of the value retained per period
  Z <- numeric(length(x))    # preallocate the result
  for(i in seq_along(x)){
    n <- seq_len(i)
    Z[i] <- sum(c(d^(i-1)*x[1], d^(i-n)*y[n]))
  }
  return(Z)
}
Here is a cool (if I say so myself) way to implement this as an infix operator (since you called it an operation).
ff = function (x, y, i) {
n = seq.int(i)
sum(c(0.95 ^ (i - 1) * x[[1]],
0.95 ^ (i - n) * y[n]))
}
`%dep%` = function (x, y) sapply(seq_along(x), ff, x=x, y=y)
x %dep% y
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
Doing the loop multiple times and recalculating the exponents every time may be inefficient. Here's another way to implement your calculation
getval <- function(x,y,lambda=.95) {
n <- length(y)
pp <- lambda^(1:n-1)
yy <- sapply(1:n, function(i) {
sum(y * c(pp[i:1], rep.int(0, n-i)))
})
pp*x[1] + yy
}
Testing with @vrajs5's sample data
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getval(x,y)
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
This gives the same result, but appears to be about 10x faster when tested on larger data such as
set.seed(15)
x <- rpois(200,20)
y <- rpois(200,20)
I'm not sure how often you will run this or on what size of data, so perhaps efficiency isn't a concern for you. Readability is often more important for long-term maintenance.
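Another way to look at it: each value of z depends only on the previous one, since z[i] = 0.95 * z[i-1] + y[i] with z[1] = x[1] + y[1]. Under that reading, a minimal sketch with Reduce computes the whole vector in a single pass. The function name dep_recursive and the argument d are introduced here for illustration.

```r
dep_recursive <- function(x, y, d = 0.95) {
  # start from z[1] = x[1] + y[1], then fold in y[2], y[3], ...
  Reduce(function(z_prev, y_i) d * z_prev + y_i,
         y[-1], accumulate = TRUE, init = x[1] + y[1])
}

x <- c(5, 7, 9, 10, 14)
y <- c(0, 2, 1, 2, 3)
dep_recursive(x, y)
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
```

This does a constant amount of work per element, instead of re-summing all earlier terms for every i.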