R - apply to replace for loop with functions [closed] - r

I have seen a dozen sites to learn apply, sapply, lapply, but they don't get much farther than teaching you how to take the sum or mean of a row or column. I have to use a lot of for loops, often nested. Please show me how to replace for with apply in the following.
for (i in 1:NTR) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
for (j in 1:NOUT) {
F[i+NOUT*(j-1)] = Y[j]-YTR[i,j]
} # j
} # i
and
for (k in 1:NOUT) {
for (iwt in 1:NWT) {
m = NHID*(NNLIN+1)
if (iwt <= m) {
i = (iwt-1) %/% (NNLIN+1) + 1
j = (iwt-1) %% (NNLIN+1)
EVZ = V[k,i+1]*Z[i]*(1-Z[i])
if (j>0) EVZ = EVZ*X[j]
J[k+(n-1)*NOUT,iwt] = EVZ
}
} # iwt
} # k
Thanks very much for any replies.

The easiest way to transform a loop into an lapply is to take what's inside your for loop and plug it inside a function. Let's start with your inner loop.
foo_inner <- function(j) {
F[i+NOUT*(j-1)] = Y[j]-YTR[i,j]
}
There is one problem here though: your function should return a value instead of assigning it. In this case, you want to return Y[j]-YTR[i,j].
foo_inner <- function(j) {
return(Y[j]-YTR[i,j])
}
You can then simply apply this function to 1:NOUT. It seems (though I can't be sure given the lack of info in your post) that the return value is just a number, so you can create a vector directly instead of a list. In that case it is better to use sapply rather than lapply (the 's' stands for simplify, here to a vector, while the 'l' means list):
sapply(1:NOUT, foo_inner)
This new vector must now be assigned in your outer loop. From your code, it appears you want to assign this vector to F[i+NOUT*(1:NOUT-1)]. I simply replaced your j value by 1:NOUT, which is your iterator. Your code snippet can therefore be modified as follows:
for (i in 1:NTR) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner)
} # i
Let's tackle the outer loop now. As before, we can plug what's inside in a function:
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner)
}
There are two problems to overcome here. First, foo_inner uses i (and Y) as free variables. Since foo_inner is defined outside foo_outer, it will always look them up in the environment where it was defined (your global environment) and never see the argument i or the local Y of foo_outer. There are two solutions to this: either define foo_inner inside foo_outer:
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
foo_inner <- function(j) {
return(Y[j]-YTR[i,j])
}
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner)
}
Or modify foo_inner so that it takes i and Y as parameters and returns the function to apply. You then pass foo_inner(i, Y), rather than foo_inner, to the inner sapply:
foo_inner <- function(i, Y) {
function(j) {
return(Y[j]-YTR[i,j])
}
}
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner(i, Y))
}
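To see why the placement of foo_inner matters, here is a small self-contained sketch of R's lexical scoping. The names inner_global, outer_bad, outer_good and make_inner are made up for illustration and are not part of the original code.

```r
i <- 100  # an i at the defining level, standing in for a leftover loop variable

# Defined at top level: its free variable i resolves where the function was defined.
inner_global <- function(j) i + j

# The argument i of outer_bad is invisible to inner_global.
outer_bad <- function(i) inner_global(1)

# Defined inside the caller: the closure sees the caller's i.
outer_good <- function(i) {
  inner_local <- function(j) i + j
  inner_local(1)
}

# Function factory: i is captured when the inner function is created.
make_inner <- function(i) function(j) i + j

bad <- outer_bad(5)          # uses the outer i (100), not the argument 5
good <- outer_good(5)        # uses the argument 5
factory <- make_inner(5)(1)  # uses the captured 5
```

The same reasoning applies to any other free variable of the inner function, such as Y in the code above.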
The second problem is that foo_outer should return a value rather than do an assignment inside the function (the assignment to F there would only change a local copy):
foo_inner <- function(i, Y) {
function(j) {
return(Y[j]-YTR[i,j])
}
}
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
return(sapply(1:NOUT, foo_inner(i, Y)))
}
You can now apply this function:
lapply(1:NTR, foo_outer)
In this case I used lapply because each element returned is a vector. sapply would simplify equal-length vectors into a matrix with one column per i, so I'd rather return a list and collapse it afterwards.
You want one big vector, so I'll just collapse all that.
do.call(c, lapply(1:NTR, foo_outer))
I can then just assign this value to F directly:
F <- do.call(c, lapply(1:NTR, foo_outer))
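As a quick illustration of how lapply and sapply differ when the applied function returns a vector, consider the sketch below. The function f is hypothetical and merely stands in for foo_outer.

```r
f <- function(i) c(i, i^2, i^3)    # hypothetical function returning a length-3 vector

as_list <- lapply(1:4, f)          # list of 4 vectors
as_matrix <- sapply(1:4, f)        # simplified to a 3 x 4 matrix, one column per i

collapsed <- do.call(c, as_list)   # one long vector, values for i = 1 first
```

Because R stores matrices column by column, as.vector(as_matrix) yields the same long vector as collapsing the list, and so does unlist(as_list).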
Obviously, I can't ensure this is exactly what you wanted without knowing what F, YTR, Y and all your inputs are. But I hope it gets you started on the right path!
Edit: I think I created the wrong order for your final vector: the code above puts all the "j" values for i = 1, then all the "j" values for i = 2, and so on. But looking back at your F, it looks like the order you want is all the "i" values for j = 1, then for j = 2, and so on. To get this, you just have to rearrange the output of the lapply. This should work.
I want to create a function that will get all j elements from the list:
LL <- lapply(1:NTR, foo_outer)
get_j_values <- function(j) {
sapply(LL, function(x) x[[j]])
}
For any j value, get_j_values returns a vector with the jth element of every entry of LL. Applying this function to all possible j values returns a list whose first element holds all "i" values for j=1, whose second element holds all "i" values for j=2, and so on.
LL2 <- lapply(1:NOUT, get_j_values)
I can then collapse this list in one big vector and assign it.
F <- do.call(c, LL2)
Edit 2: Although it is possible to recreate your for loop with apply functions, this might be one of those times where a for loop is actually better: no result is accumulated, so the apply version shouldn't be faster, and I believe the for loop would be clearer. This is usually the case when you loop over indices that are used across several different objects, so that you can't apply a function to any specific object directly but have to apply it to a vector of indices. Just my two cents.
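For anyone who wants to check the two orderings on real numbers, here is a minimal sketch with toy data. Everything in it is an assumption made for illustration: the question does not show FN, XTR or YTR, so FN is a placeholder that takes only X, and the matrices are random.

```r
set.seed(1)                                   # arbitrary seed
NTR <- 4; NOUT <- 3
XTR <- matrix(rnorm(NTR * 2), nrow = NTR)     # toy inputs
YTR <- matrix(rnorm(NTR * NOUT), nrow = NTR)  # toy targets

# Placeholder for the question's FN(W, V, U, X); only the Y component matters here.
FN <- function(X) list(Y = sum(X) * seq_len(NOUT), Z = X)

foo_outer <- function(i) {
  Y <- FN(XTR[i, ])$Y
  Y - YTR[i, ]        # vectorised over j, so no inner sapply is needed
}

LL <- lapply(seq_len(NTR), foo_outer)

F_by_i <- do.call(c, LL)                      # all j for i = 1, then i = 2, ...
F_by_j <- do.call(c, lapply(seq_len(NOUT),
                            function(j) sapply(LL, function(x) x[[j]])))

# Reference result from explicit nested loops, stored as an NTR x NOUT matrix.
E <- matrix(NA_real_, NTR, NOUT)
for (i in seq_len(NTR)) {
  Y <- FN(XTR[i, ])$Y
  for (j in seq_len(NOUT)) E[i, j] <- Y[j] - YTR[i, j]
}
```

In this toy setup F_by_j matches as.vector(E) and F_by_i matches as.vector(t(E)), which makes the difference between the two orderings easy to inspect.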

Related

Apply arbitrary function that returns a list to vector in R

I recently asked a similar question (link), but the example that I gave there was a little too simple, and the answers did not work for my actual use case. Again, I am using R and want to apply a function to a vector. The function returns a list, and I want the results to be formatted as a list of vectors, where the names of the output list correspond to the names in the list returned by the function, and the value for each list element is the vector of values over the elements of the input vector. The following example shows a basic setup, together with two ways of calculating the desired output (sum.of.differences and sum.of.differences.2). The first method (sum.of.differences) seems to be the easiest way to understand what the desired output is; the second method (sum.of.differences.2) avoids two major problems with the first method: computing the function twice for each element of the input vector, and being forced to give the names of the list elements explicitly. However, the second method also seems relatively complicated for such a fundamental task. Is there a more idiomatic way to get the desired results in R?
x <- rnorm(n = 10)
a <- seq(from = -1, to = +1, by = 0.01)
sum.of.differences.fun <- function(a) {
d <- x - a
list(
sum.of.absolute.differences = sum(abs(d)),
sum.of.squared.differences = sum(d^2)
)
}
sum.of.differences <- list(
sum.of.absolute.differences = sapply(
X = a,
FUN = function(a) sum.of.differences.fun(a)$sum.of.absolute.differences
),
sum.of.squared.differences = sapply(
X = a,
FUN = function(a) sum.of.differences.fun(a)$sum.of.squared.differences
)
)
sum.of.differences.2 <- (function(lst) {
processed.lst <- lapply(
X = names(lst[[1]]),
FUN = function(name) {
sapply(
X = lst,
FUN = function(x) x[[name]]
)
}
)
names(processed.lst) <- names(lst[[1]])
return(processed.lst)
})(lapply(X = a, FUN = sum.of.differences.fun))
What language did you learn before R? It seems as though you might be using design patterns from a different functional language (I'd guess Lisp). The following code is much simpler and the output is identical (aside from names) as far as I can tell.
x <- rnorm(n = 10)
a <- seq(from = -1, to = +1, by = 0.01)
funs <- c(
sumabsdiff = function(a) sum(abs(x - a)),
sumsquarediff = function(a) sum((x - a) ^ 2)
)
sumdiff <- lapply(
funs,
function(fun) sapply(a, fun)
)
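If the function really has to return a list (as in the question), one possible way to get a named list of vectors is sketched below. It keeps the question's sum.of.differences.fun; the names m, by.name and by.name.2 and the seed are introduced here for illustration only.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(n = 10)
a <- seq(from = -1, to = 1, by = 0.01)

sum.of.differences.fun <- function(a) {
  d <- x - a
  list(
    sum.of.absolute.differences = sum(abs(d)),
    sum.of.squared.differences = sum(d^2)
  )
}

# One call per element of a; each result is flattened to a named numeric vector.
m <- vapply(a, function(ai) unlist(sum.of.differences.fun(ai)), numeric(2))

# m has one row per list element; turn the rows into a named list of vectors.
by.name <- as.list(as.data.frame(t(m)))

# Alternative: transpose the list of lists directly with Map.
by.name.2 <- do.call(Map, c(f = c, lapply(a, sum.of.differences.fun)))
```

Both variants call the function once per element of a and pick up the element names from the function's own return value.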

Function mapped to reduce function to concatenate vectors together

I'm trying to write a function to pass to reduce so that it concatenates a list of vectors into one vector, running from the very first entry to the very last entry.
For example,
reduce(list(1:10, 11:20, 21:100), r_cat, .init = NULL)
should return a vector equal to
1:100
This is what I have so far.
r_cat = function(x, y) {
out <- y[[1]]
for(i in seq(2, length(y))) {
out <- x(out, y[[i]])
}
out
}
Any thoughts?
No need to write a new function, unlist solves your problem:
List <- list(1:10, 11:20, 21:100)
unlist(List)
If you want to use Reduce from R base, then you can use c
Reduce("c", List)
You can also get the same result plugging c into reduce from purrr
library(purrr)
reduce(List, c)
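If a hand-written reducing function is still wanted, a minimal sketch with base R's Reduce could look like this. The function passed to Reduce receives the accumulated value and the next element; the argument names acc and nxt are made up here.

```r
List <- list(1:10, 11:20, 21:100)

# The reducing function receives the accumulated value and the next element.
r_cat <- function(acc, nxt) c(acc, nxt)

via_reduce <- Reduce(r_cat, List)              # starts from the first element
via_init <- Reduce(r_cat, List, init = NULL)   # starts from an explicit empty value
via_unlist <- unlist(List)
```

All three give the same integer vector, so unlist remains the simplest choice unless the reduction step needs to do more than concatenate.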

Using for() over variables that need to be changed

I'd like to be able to use a for() loop to automate the same operation over many variables, modifying them.
Here's the simplest example I could design:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10]) # assume that it has many more than just 3 elements
varToChange
for (i in varToChange ) {
if (is.character(i)) i <- as.integer(as.ordered(i))
if (is.factor(i)) i <- as.integer(i)
}
varToChange # <-- Here I want to see my elements as integers now
Here's actual example that led me to this question - taken from: Best way to plot automatically all data.table columns using ggplot2
In the following function
f <- function(dt, x,y,k) {
if (is.numeric(x)) x <- names(dt)[x]
if (is.numeric(y)) y <- names(dt)[y]
if (is.numeric(k)) k <- names(dt)[k]
ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
instead of brutally repeating the same line many times, as a programmer, I would rather have a loop to repeat this line for me.
Something like this one:
for (i in c(x,y,k)) {
if (is.numeric(i)) i <- names(dt)[i]
}
In C/C++ this would have been done using pointers. In R, is it at all possible?
UPDATE: Very nice idea to use Map below. However, it does not work for this example:
getColName <- function(dt, x) {
if (is.numeric(x)) {
x <- names(dt)[x]
}
x
}
f<- function(dt, x,y,k) {
list(x,y,k) <- Map(getColName, list(x,y,k), dt)
# if (is.numeric(x)) x <- names(dt)[x]
# if (is.numeric(y)) y <- names(dt)[y]
# if (is.numeric(k)) k <- names(dt)[k]
ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2) # Brrr..
No need for a for loop; just Map a function over each of your list items:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
myfun <- function(y) {
if (is.character(y)) y <- as.integer(as.ordered(y))
if (is.factor(y)) y <- as.integer(y)
y
}
varToChange <- Map(myfun, varToChange)
UPDATE: Map never modifies variables in place; this is simply not done in R. Use the new values returned by Map:
f<- function(dt, x, y, k) {
args <- Map(function(x) getColName(dt, x), list(x=x,y=y,k=k))
ggplot(dt, aes_string(args$x,args$y, col=args$k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
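The attempt in the question's update fails for two reasons: Map iterates over every argument it is given, so passing dt makes it walk over the columns of dt in parallel with the list, and list(x,y,k) <- ... is not a valid assignment target in R. The sketch below shows the same idea without ggplot2, using the built-in iris data as a stand-in for dt; the name selectors is made up for illustration.

```r
getColName <- function(dt, x) {
  if (is.numeric(x)) names(dt)[x] else x
}

selectors <- list(x = 1, y = "Sepal.Width", k = 5)  # mix of positions and names

# Map iterates over every argument in ..., so a fixed data frame goes in MoreArgs.
args <- Map(getColName, x = selectors, MoreArgs = list(dt = iris))

# Equivalent with lapply and an anonymous function that fixes the data frame.
args2 <- lapply(selectors, function(s) getColName(iris, s))
```

Either way the result is a named list, so the resolved names are available as args$x, args$y and args$k.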
You have two choices for iteration in R: iterate over the variables themselves, or over their indices. I generally recommend iterating over indices. This case illustrates a strong advantage of that, because your question is a non-issue if you are using indices.
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
for (i in seq_along(varToChange)) {
if (is.character(varToChange[[i]])) varToChange[[i]] <- as.integer(as.factor(varToChange[[i]]))
if (is.factor(varToChange[[i]])) varToChange[[i]] <- as.integer(varToChange[[i]])
}
I also replaced as.ordered() with as.factor(). An ordered factor differs from a regular factor mainly in the default contrasts used in modeling; as you are just coercing to integer, it doesn't matter.
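A short sketch of why the original loop has no effect, and of a functional alternative that writes the converted values back into the list. The helper name to_int and the toy list are assumptions for illustration.

```r
varToChange <- list(1:3, factor(c("b", "a", "b")), c("z", "y", "z"))

# Assigning to the loop variable only rebinds a copy; the list is untouched.
for (el in varToChange) el <- 0L
still_factor <- is.factor(varToChange[[2]])

to_int <- function(y) {
  if (is.character(y)) y <- as.factor(y)
  if (is.factor(y)) y <- as.integer(y)
  y
}

# lapply returns new values; `[]<-` writes them back and keeps the list's shape.
varToChange[] <- lapply(varToChange, to_int)
```

After the last line every element of varToChange is an integer vector, while the loop before it left the list exactly as it was.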

R: Compute on previous elements of an ordered vector [closed]

Given an ordered vector vec <- c(1, 4, 6, 3, 2, 7), I want to compute for each element i of vec the weighted average of the previous elements where the weight is the inverse of the distance from the element i.
The function should proceed as follows.
For the first element 1, it should return NA (no previous element).
For the second element 4, it should return 1.
For the third element 6, it should return weighted.mean(x = c(1,4), w = c(1,2)).
For the fourth element 3, it should return weighted.mean(x = c(1,4,6), w = c(1,2,3)).
The resulting vector result should be, with length(result) == length(vec), c(NA, 1, 3, 4.5, 3.9, 3.266667).
UPDATE:
To be clear, I mean without using a loop like this one:
result <- numeric()
for (i in 1:length(vec)) {
if (i == 1) {
result <- c(result, NA)
} else {
previous_elements <- vec[1:(i-1)]
result <- c(result, weighted.mean(x = previous_elements, w = 1:length(previous_elements)))
}
}
Here's a naive implementation. Create a function that does what you say; the only 'clever' thing is to use the function seq_len() instead of 1:i to generate the indexes
fun = function(i, vec)
weighted.mean(head(vec, i - 1), w=seq_len(i - 1))
and then use it in sapply
sapply(seq_along(vec), fun, vec)
This is good enough -- NaN as the first element, rather than NA, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, but still 'using a loop' -- the management of the result vector is done by sapply(), rather than in your loop where you have to manage it yourself. And in particular your 'copy and append' approach is very bad performance-wise, making a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length result = numeric(length(vec)) and then fill it result[[i]] = ..., and better still to just let sapply() do the right thing for you!
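The pre-allocation advice can be sketched as follows, using the vector from the question. This is only one way to write it; the variable name prev is introduced here.

```r
vec <- c(1, 4, 6, 3, 2, 7)

# Pre-allocate the full result; position 1 keeps its NA (no previous elements).
result <- rep(NA_real_, length(vec))
for (i in seq_along(vec)[-1]) {
  prev <- seq_len(i - 1)                       # indices of the previous elements
  result[[i]] <- weighted.mean(vec[prev], w = prev)
}
```

Filling an existing vector avoids copying the growing result on every iteration, and starting from NA also gives the NA the question asked for at position 1.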
The problem is that the naive implementation scales quadratically -- you make a pass along vec to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are n (n - 1) / 2 calculations. So...
Take a look at weighted.mean
> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
## SNIP -- edited for brevity
w <- as.double(w)
if (na.rm) {
i <- !is.na(x)
w <- w[i]
x <- x[i]
}
sum((x * w)[w != 0])/sum(w)
}
and use cumsum() instead of sum() to get the cumulative weights, rather than the individual weights, i.e., return a vector as long as x, where the ith element is the weighted mean up to that point
cumweighted.mean <- function(x, w) {
## handle NA values?
w <- as.numeric(w) # to avoid integer overflow
xw <- x * w
xw[w == 0] <- 0 # zero-weight elements contribute nothing, as in weighted.mean()
cumsum(xw) / cumsum(w)
}
You'd like something a little different
myweighted.mean <- function(x)
c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), - 1)))
This makes a single pass through the data, so scales linearly (at least in theory).
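A quick consistency check between the naive and the cumulative approach might look like the sketch below. The function names naive_wmean and cumsum_wmean are hypothetical; they restate the two ideas in self-contained form and assume the weights 1, 2, ... are never zero.

```r
# Naive version: one weighted.mean() call per position (quadratic work overall).
naive_wmean <- function(x) {
  out <- vapply(seq_along(x),
                function(i) weighted.mean(head(x, i - 1), w = seq_len(i - 1)),
                numeric(1))
  out[is.nan(out)] <- NA   # first position has no previous elements
  out
}

# Cumulative version: running sums give every prefix mean in one pass.
cumsum_wmean <- function(x) {
  prev <- head(x, -1)
  w <- as.numeric(seq_along(prev))
  c(NA, cumsum(prev * w) / cumsum(w))
}

vec <- c(1, 4, 6, 3, 2, 7)
set.seed(1)                # arbitrary seed
big <- rnorm(500)

small_naive <- naive_wmean(vec)
small_cum <- cumsum_wmean(vec)
```

On the question's vector both versions reproduce the expected result, and all.equal(naive_wmean(big), cumsum_wmean(big)) can be used to compare them on longer inputs.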

Need help in understanding this R script [closed]

I am very new to R, so I would appreciate it if anybody could help me understand the following script so that I can write it in Matlab.
I am only interested in "R" method section of this script:
The following function is from copula package in R and it is for empirical copula:
> F.n
function (x, X, offset = 0, method = c("C", "R"))
{
stopifnot(is.numeric(d <- ncol(X)), is.matrix(x), d == ncol(x))
n <- nrow(X)
if (d == 1)
vapply(x, function(x.) sum(X <= x.), NA_real_)/(n + offset)
else {
method <- match.arg(method)
switch(method, C = {
m <- nrow(x)
.C(Cn_C, as.double(X), as.integer(n), as.integer(d),
as.double(x), as.integer(m), ec = double(m),
as.double(offset))$ec
}, R = {
tX <- t(X)
vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k,
]) == d), NA_real_)/(n + offset)
}, stop("wrong 'method': ", method))
}
}
<environment: namespace:copula>
Walk-Through
function (x, X, offset = 0, method = c("C", "R"))
{
Function definition that takes two mandatory and two optional named arguments.
stopifnot(is.numeric(d <- ncol(X)), is.matrix(x), d == ncol(x))
This asserts (and exits with an error otherwise) that the return from ncol(X) is numeric (it is not when X is a plain vector, for which ncol returns NULL), that x is a matrix (2-dim array), and that x and X have the same number of columns.
n <- nrow(X)
Stores the number of rows for future use. (nrow is not an "expensive" function, but it does no harm to do it this way.)
if (d == 1)
vapply(x, function(x.) sum(X <= x.), NA_real_)/(n + offset)
Branch on the condition that X has only one column.
The vapply function iterates over each element of x, passes the number to the inner function, and captures each response into another vector. The function(x.) ... is an anonymous function, and in this case it is called as many times as there are elements in x. The NA_real_ tells vapply what type of value each call will return; sapply would have worked just as well, though without that type check and not as quickly.
The resulting vector is divided element-wise by n + offset and is returned out of the overall function.
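To make the one-column branch concrete, here is a small sketch with made-up numbers. With offset = 0 the result is the ordinary empirical CDF, which the sketch cross-checks against stats::ecdf.

```r
X <- matrix(c(0.2, 0.5, 0.9, 0.7), ncol = 1)   # n = 4 observations, d = 1
x <- matrix(c(0.5, 1.0), ncol = 1)             # two evaluation points
n <- nrow(X); offset <- 0

# vapply checks that every call returns a single number (integer counts are
# promoted to double to match NA_real_).
Fn <- vapply(x, function(x.) sum(X <= x.), NA_real_) / (n + offset)

# With offset = 0 this is the ordinary empirical CDF.
Fn_ecdf <- ecdf(X[, 1])(x[, 1])
```

Each entry of Fn is simply the fraction of observations that are less than or equal to the corresponding evaluation point.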
else {
method <- match.arg(method)
This ensures that method will be either "C" or "R", or the function will error out. When the argument is omitted, match.arg picks the first choice, "C".
The following, switch, is similar to a select or case statement, where it will do the first block of code if method is "C", the second if "R", and will stop otherwise.
switch(method, C = {
m <- nrow(x)
.C(Cn_C, as.double(X), as.integer(n), as.integer(d),
as.double(x), as.integer(m), ec = double(m),
as.double(offset))$ec
Makes a call to a C library function named Cn_C with the provided arguments, and extracts the ec component from the return.
}, R = {
tX <- t(X)
Takes the transpose of X.
vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k,
]) == d), NA_real_)/(n + offset)
(See my discussion above for vapply. This time it is iterating over a sequence of incrementing numbers from 1 to the number of rows in x.)
For the anonymous (inner) function, going inside-out, it starts by comparing the transposed X with the kth row of x, resulting in a matrix of booleans. The colSums in this case is merely counting the number of TRUEs in each column that come from that matrix-wise comparison. The outer sum is counting the number of these column-wise sums that are equal to the number of columns in X (as stored in d, earlier).
This vector is divided element-wise by (n + offset), resulting in another vector.
}, stop("wrong 'method': ", method))
Throw an exception if something else was provided to method in the function call.
}
}
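The "R" branch can be tried on a tiny example, which may help when porting it. The data below are made up, and the loop version at the end is only an equivalent restatement, not code from the copula package.

```r
X <- rbind(c(0.1, 0.2),
           c(0.5, 0.4),
           c(0.9, 0.8))      # n = 3 observations (rows), d = 2 columns
x <- rbind(c(0.5, 0.5),
           c(1.0, 0.3))      # m = 2 evaluation points
n <- nrow(X); d <- ncol(X); offset <- 0

tX <- t(X)                   # d x n: each column is now one observation

# x[k, ] has length d and is recycled down each column of tX, so column i of
# the result holds the coordinate-wise comparison X[i, ] <= x[k, ].
cmp <- tX <= x[1, ]

# An observation counts only if all d coordinates pass the comparison.
Fn <- vapply(seq_len(nrow(x)),
             function(k) sum(colSums(tX <= x[k, ]) == d),
             NA_real_) / (n + offset)

# Same quantity written row by row, which may be easier to port.
Fn_loop <- numeric(nrow(x))
for (k in seq_len(nrow(x))) {
  below <- apply(X, 1, function(obs) all(obs <= x[k, ]))
  Fn_loop[k] <- sum(below) / (n + offset)
}
```

The transpose is what makes the recycling line up: R recycles the shorter vector down the columns of a matrix, so the observations have to sit in columns for x[k, ] to be compared coordinate by coordinate.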
Wrap-Up
You really should do some more study on help(vapply) if you are not very comfortable with Map-like functions on vectors. You might benefit from reading help(match.arg), help(switch). I think the other major functions (nrow, ncol, t) are clear enough.
You probably don't need to worry about the .C(...) call to a library function, since this function is kind enough to provide an R-native implementation as well. (Well, that's what I'm inferring, at least.)
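For readers unfamiliar with match.arg and switch, a minimal sketch of the same argument-handling pattern follows. The function pick and its return strings are hypothetical and only mirror the structure of F.n.

```r
# Hypothetical helper mirroring the argument handling in F.n
pick <- function(method = c("C", "R")) {
  method <- match.arg(method)   # an omitted argument resolves to the first choice
  switch(method,
         C = "compiled branch",
         R = "pure R branch",
         stop("wrong 'method': ", method))
}

default_choice <- pick()
r_choice <- pick("R")
bad_choice <- tryCatch(pick("matlab"), error = function(e) "error")
```

Note that an invalid value is already rejected by match.arg, so the final stop() in the switch acts as a safety net rather than the usual error path.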
