How would one use rollapply (or some other R function) to grow the window size as the function progresses through the data? To phrase it another way: the first application works with the first element, the second with the first two elements, the third with the first three elements, and so on.
If you are looking to apply min, max, sum or prod, these functions already have cumulative counterparts:
cummin, cummax, cumsum and cumprod
To apply more exotic functions on a growing (expanding) window, you can simply use sapply, e.g.:
# your vector of interest
x <- c(1,2,3,4,5)
# `yourfunction` is a placeholder; because y = x is named, sapply passes each index to n
sapply(seq_along(x), function(y,n) yourfunction(y[seq_len(n)]), y = x)
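As a concrete instance of the sapply pattern, here is a minimal sketch with median standing in for yourfunction. The median and the running range width are arbitrary choices for illustration, not part of the original answer:

```r
x <- c(5, 1, 4, 2, 3)

# Expanding window: step n sees x[1], ..., x[n]
exp_median <- sapply(seq_along(x), function(n) median(x[seq_len(n)]))
exp_median   # 5 3 4 3 3

# vapply() does the same, but checks that each step returns exactly one number
exp_range <- vapply(seq_along(x),
                    function(n) diff(range(x[seq_len(n)])),
                    numeric(1))
exp_range    # 0 4 4 4 4
```

vapply is a slightly safer choice when the function should always return a single value, since it errors instead of silently returning a list.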
For a basic zoo object (this needs the zoo package):
library(zoo)
x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 9, 14) - 1
x <- zoo(rnorm(5), x.Date)
# cumsum etc will work and return a zoo object
cs.zoo <- cumsum(x)
# convert back to zoo for the `sapply` solution
# here `sum`
foo.zoo <- zoo(sapply(seq_along(x), function(n,y) sum(y[seq_len(n)]), y= x), index(x))
identical(cs.zoo, foo.zoo)
## [1] TRUE
From peering at the documentation at ?rollapply (zoo package), I think this will do what you want, where a is your matrix and sum can be any function:
library(zoo)
a <- cbind(1:5,1:5)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3
# [4,]    4    4
# [5,]    5    5
rollapply(a,width=seq_len(nrow(a)),sum,align="right")
#      [,1] [,2]
# [1,]    1    1
# [2,]    3    3
# [3,]    6    6
# [4,]   10   10
# [5,]   15   15
But mnel's answer seems sufficient and more generalizable.
In addition to @mnel's answer:
For more exotic functions you can simply use sapply, and if the sapply approach takes too long, you may be better off formulating your function iteratively.
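One way to read "formulating your function iteratively": instead of recomputing the function on every window x[1:n] (quadratic work overall), carry a running state forward from one element to the next. A minimal sketch for a running mean, which is just an illustrative choice of function:

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# O(n^2): recompute the mean from scratch for every window x[1:n]
slow_mean <- sapply(seq_along(x), function(n) mean(x[seq_len(n)]))

# O(n): reuse the running total instead of recomputing it
fast_mean <- cumsum(x) / seq_along(x)

# Generic iterative form with Reduce(accumulate = TRUE): the state carried
# between steps is c(count, mean), updated one element at a time
state <- Reduce(function(s, xi) {
  n <- s[1] + 1
  c(n, s[2] + (xi - s[2]) / n)   # incremental mean update
}, x, init = c(0, 0), accumulate = TRUE)
iter_mean <- vapply(state[-1], function(s) s[2], numeric(1))  # drop init

all.equal(slow_mean, fast_mean)
all.equal(slow_mean, iter_mean)
```

Reduce(accumulate = TRUE) is still an R-level loop, so when a vectorised form such as the cumsum() line exists, it is usually the faster option.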
Related
I'm trying to learn how to use the apply() functions.
Suppose we have a 3-row, 2-column matrix, test <- matrix(c(1,2,3,4,5,6), ncol = 2), and we would like each element in the first column (1, 2, 3) not to exceed, for example, 2, so we end up with a matrix of (1,2,2,4,5,6).
How would one write an apply() function to do this?
Here's my latest attempt: test1 <- apply(test[,1], 2, function(x) {if(x > 2){return(x = 2)} else {return(x)}})
We may use pmin on the first column with 2 as the second argument: pmin recycles the 2, compares element by element, and returns the smaller of each first-column value and 2.
test[,1] <- pmin(test[,1], 2)
Output:
> test
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    2    6
Note that apply needs its X argument to be an array or matrix, i.e. an object with dimensions. When we subset only a single column or row, the result loses its dimensions because drop = TRUE by default, so test[,1] is a plain vector and apply(test[,1], 2, ...) fails.
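A minimal sketch of the drop behaviour, using the matrix from the question (the max inside try() is just any function to trigger the error):

```r
test <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)

dim(test[, 1])                 # NULL: a single column is dropped to a plain vector
dim(test[, 1, drop = FALSE])   # 3 1: still a one-column matrix

# apply() refuses an object without dimensions
res <- try(apply(test[, 1], 2, max), silent = TRUE)
inherits(res, "try-error")     # TRUE

# With drop = FALSE the column stays a matrix; the body uses pmin()
# because `if` expects a single TRUE/FALSE, not a whole column
test[, 1] <- apply(test[, 1, drop = FALSE], 2, function(x) pmin(x, 2))
test
```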
If you really want to use the apply() function, I guess you're looking for something like this:
t(apply(test, 1, function(x) c(min(x[1], 2), x[2])))
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    2    6
But if you want my opinion, akrun's suggestion is definitely better.
Suppose we have a "test" matrix that looks like this: (1,2,3, 4,5,6, 7,8,9, 10,11,12), generated by running test <- matrix(1:12, ncol = 4): a simple 3 x 4 (rows x columns) matrix of the numbers 1 to 12.
Now suppose we'd like to add a value of 1 to each element in each odd-numbered matrix column, so we end up with a matrix of the following values: (2,3,4, 4,5,6, 8,9,10, 10,11,12). How would we use an apply() function to do this?
Note that this is a simplified example. In the more complete code I'm working with, the matrix dynamically expands/contracts based on user inputs, so I need an apply() function that counts the actual number of matrix columns rather than assuming a fixed 4 columns as in the example above. (And I'm not adding 1 to the elements; I'm running the parallel minima function test[,1] <- pmin(test1[,1], 5) to, say, cap each value at 5.)
With my current limited understanding of the apply() family of functions, so far all I can do is apply(test, 2, function(x) {return(x+1)}), but this adds 1 to all elements in all columns rather than only to the odd-numbered columns.
You may simply subset the input data frame to access only the odd- or even-numbered columns. Consider, where f is the function you want to apply:
test[c(TRUE, FALSE)] <- apply(test[c(TRUE, FALSE)], 2, function(x) f(x))
test[c(FALSE, TRUE)] <- apply(test[c(FALSE, TRUE)], 2, function(x) f(x))
This works because R's recycling rules repeat e.g. c(TRUE, FALSE) as many times as needed to cover all the columns of the input test data frame.
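A minimal sketch of the recycling, plus an lapply() variant for data frames. The data frame df is a made-up example, and lapply() is a swapped-in alternative to the apply() call above, not the answer's own code:

```r
# Logical indices are recycled to the length of what they index, so
# c(TRUE, FALSE) picks positions 1, 3, 5, ... whatever the length is
seq_len(5)[c(TRUE, FALSE)]    # 1 3 5

df <- as.data.frame(matrix(1:12, ncol = 4))   # columns V1..V4
# For a data frame, lapply() keeps each column's own type, whereas
# apply() first converts the whole data frame to a single matrix
df[c(TRUE, FALSE)] <- lapply(df[c(TRUE, FALSE)], function(x) x + 1)
df
```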
For a matrix, we need to use the drop=FALSE flag when subsetting the matrix in order to keep it in matrix form when using apply():
test <- matrix(1:12, ncol = 4)
test[,c(TRUE, FALSE)] <- apply(test[,c(TRUE, FALSE),drop=FALSE], 2, function(x) x+1)
test
     [,1] [,2] [,3] [,4]
[1,]    2    4    8   10
[2,]    3    5    9   11
[3,]    4    6   10   12
        ^         ^   ... these columns incremented by 1
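You may use modulo %% 2 to flag the odd-numbered columns. Adding drop = FALSE keeps the selection a matrix even when only one column is odd:
odd <- !seq(ncol(test)) %% 2 == 0   # parsed as !(... == 0): TRUE for columns 1, 3, 5, ...
test[, odd] <- apply(test[, odd, drop = FALSE], 2, function(x) {return(x + 1)})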
#      [,1] [,2] [,3] [,4]
# [1,]    2    4    8   10
# [2,]    3    5    9   11
# [3,]    4    6   10   12
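For the dynamic-size requirement in the question, here is a minimal sketch of a helper (the name cap_odd_columns is hypothetical) that caps every odd-numbered column with pmin, as in the question, whatever the number of columns:

```r
# Hypothetical helper: cap every odd-numbered column of m at `cap`
cap_odd_columns <- function(m, cap) {
  odd <- seq_len(ncol(m)) %% 2 == 1        # TRUE, FALSE, TRUE, ...
  # drop = FALSE keeps a one-column selection as a matrix for apply()
  m[, odd] <- apply(m[, odd, drop = FALSE], 2, function(x) pmin(x, cap))
  m
}

test <- matrix(1:12, ncol = 4)
cap_odd_columns(test, 5)
cap_odd_columns(matrix(1:3, ncol = 1), 2)   # a single column also works
```

Because pmin() is vectorised, m[, odd] <- pmin(m[, odd], cap) would give the same result without apply(); apply() is kept here only because the question asks for it.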
I want to make sure that the result of which(..., arr.ind = TRUE) is always ordered, specifically: arranged ascending by (col, row). I do not see such a remark in the documentation of which, although it seems to be the case based on some experiments I made. How can I check, or find out, whether this is guaranteed?
For example, when I run the code below, the output is a matrix whose rows are arranged in ascending order by (col, row).
> set.seed(1)
> vals <- rnorm(10)
> valsall <- sample(as.numeric(replicate(10, vals)))
> mat <- matrix(valsall, 10, 10)
> which(mat == max(mat), arr.ind = TRUE)
      row col
 [1,]   1   1
 [2,]   3   1
 [3,]   1   2
 [4,]   2   2
 [5,]  10   2
 [6,]   1   6
 [7,]   2   8
 [8,]   4   8
 [9,]   1   9
[10,]   6   9
Part 1:
This part answers the more general side of your question, how to understand a function on a deeper level when the documentation is not enough, without going into the details of which() itself.
Because which() is not a primitive function (primitives are implemented directly in C), i.e. it is written using the basic building blocks of R, we can see what's going on behind the scenes by printing the function itself. Note that backticks let you refer to functions with reserved or non-syntactic names, e.g. `+`, and are therefore optional in this example. This dense R code can be extremely tiresome to read, but I've found it very educational, and it does untangle some mental knots every once in a while.
> print(`which`)
function (x, arr.ind = FALSE, useNames = TRUE)
{
wh <- .Internal(which(x))
if (arr.ind && !is.null(d <- dim(x)))
arrayInd(wh, d, dimnames(x), useNames = useNames)
else wh
}
<bytecode: 0x00000000058673e0>
<environment: namespace:base>
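Part 2:
So, after giving up on trying to understand the which and arrayInd functions in the way described above, I'm approaching it with common sense. The most efficient way I can think of to check each value of a matrix/array is to convert it to a one-dimensional object at some point. R stores matrices and arrays in column-major order, so coercing a matrix to an atomic vector, or reducing its dimensions in any other way, concatenates its complete columns one after another, and it seems natural that higher-level functions follow this fundamental rule too. For which(), .Internal(which(x)) returns the positions of the TRUE values in increasing order, and arrayInd() converts each linear index back into a (row, col) pair; since linear indices run down a whole column before moving on to the next, the result is ordered by column, and by row within each column.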
> testmat <- matrix(1:10, nrow = 2, ncol = 5)
> testmat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> as.numeric(testmat)
[1]  1  2  3  4  5  6  7  8  9 10
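This does not prove the ordering is guaranteed, but a minimal sketch like the following checks the reasoning above on one made-up random matrix: which() returns increasing linear indices, and the (row, col) pairs are their column-major decomposition.

```r
set.seed(1)
# Made-up example: a 10 x 5 matrix with many ties for the maximum
mat <- matrix(sample(1:3, 50, replace = TRUE), nrow = 10)

lin <- which(mat == max(mat))                 # linear (column-major) indices
idx <- which(mat == max(mat), arr.ind = TRUE) # (row, col) pairs

!is.unsorted(lin)   # the linear indices are increasing

# arrayInd() maps linear index k to row (k - 1) %% nrow + 1
# and column (k - 1) %/% nrow + 1
all(idx[, "row"] == (lin - 1) %% nrow(mat) + 1)
all(idx[, "col"] == (lin - 1) %/% nrow(mat) + 1)

# Re-sorting by (col, row) therefore changes nothing
identical(idx, idx[order(idx[, "col"], idx[, "row"]), , drop = FALSE])
```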
I found Hadley Wickham's Advanced R an extremely valuable resource in answering your question, especially the chapters about functions and data structures.
http://adv-r.had.co.nz/
I want to find the maximum value in each column for every 2 rows (say). How can I do that in R? For example, given
matrix(c(3,1,20,5,4,12,6,2,9,7,8,7), byrow=T, ncol=3)
I want output like this:
matrix(c(5,4,20,7,8,9), byrow=T, ncol=3)
Here is one way of doing it.
Define a vector that contains information about the groups you want. In this case, I use rep to repeat a sequence of numbers.
Then define a helper function to calculate the column maximum of an array; this is a simple apply of max.
Finally, use sapply with an anonymous function that applies colMax to each of your grouped array subsets.
The code:
x <- matrix(c(3,1,20,5,4,12,6,2,9,7,8,7), byrow=TRUE, ncol=3)  # the input matrix
groups <- rep(1:2, each=2)
colMax <- function(x)apply(x, 2, max)
t(
sapply(unique(groups), function(i)colMax(x[which(groups==i), ]))
)
The results:
     [,1] [,2] [,3]
[1,]    5    4   20
[2,]    7    8    9
As one long line, where df1 is the input matrix:
t(sapply(seq(1,nrow(df1),by=2),function(i) apply(df1[seq(i,1+i),],2,max)))
Another option, where m is the input matrix:
do.call(rbind, by(m, gl(nrow(m)/2, 2), function(x) apply(x, 2, max)))
Or, working on each column with tapply, where mat is the input matrix:
apply(mat, 2, function(x) tapply(x, # work on each column
  # create groups of 2 of the proper length: 1,1,2,2,3,3,...
  rep(seq_len(ceiling(length(x)/2)), each=2, length.out=length(x)),
  max))
  [,1] [,2] [,3]
1    5    4   20
2    7    8    9
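All of the answers above hard-code blocks of 2 rows. A minimal sketch that generalises to blocks of k rows (block_col_max is a hypothetical name; the last block is simply shorter when nrow(x) is not a multiple of k):

```r
# Hypothetical helper: column-wise maxima over consecutive blocks of k rows
block_col_max <- function(x, k = 2) {
  groups <- ceiling(seq_len(nrow(x)) / k)          # 1,1,2,2,... for k = 2
  rows_by_group <- split(seq_len(nrow(x)), groups) # row indices per block
  t(vapply(rows_by_group,
           function(r) apply(x[r, , drop = FALSE], 2, max),
           numeric(ncol(x))))
}

x <- matrix(c(3,1,20,5,4,12,6,2,9,7,8,7), byrow = TRUE, ncol = 3)
block_col_max(x, 2)
block_col_max(x, 3)   # blocks: rows 1-3 and row 4
```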
Starting with a simple matrix and a simple function:
numbers <- matrix(c(1:10), nrow = 5, ncol=2)
numbers
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
add <- function(x,y){
sum <- x + y
return(sum)
}
I'd like to add a third column that applies the add function by taking the first two elements of each row.
cheet_sheet <- cbind(numbers, apply(numbers,<MARGIN=first_2_elements_of_row>, add))
The MARGIN seems like a natural place to specify this, but MARGIN=1 is not enough, as it seems to only take one variable from each row while I need two.
With apply, each selected margin (here, each row) is passed as the first argument to the function. Your function, however, requires two arguments. One way to handle this without changing your function is to wrap it in a function of a vector (each row) that passes the first element as the first argument and the second element as the second:
numbers <- matrix(c(1:10), nrow = 5, ncol=2)
add <- function(x,y){
sum <- x + y
return(sum)
}
cheet_sheet <- cbind(numbers, apply(numbers,1, function(x)add(x[1],x[2])))
However, I guess this is meant as a purely theoretical exercise? Otherwise this would be much easier:
cheet_sheet <- cbind(numbers, rowSums(numbers))
Why not just use mapply:
> cbind( numbers, mapply( add, numbers[,1], numbers[,2]))
     [,1] [,2] [,3]
[1,]    1    6    7
[2,]    2    7    9
[3,]    3    8   11
[4,]    4    9   13
[5,]    5   10   15
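Since add() only uses `+`, which is vectorised, it can also take whole columns directly. A minimal sketch comparing that with the apply() and mapply() answers (add is redefined in the shorter form x + y, which behaves the same as the original):

```r
numbers <- matrix(1:10, nrow = 5, ncol = 2)
add <- function(x, y) x + y

# `+` is vectorised, so add() can take the two columns as a whole
direct <- add(numbers[, 1], numbers[, 2])

# apply() over rows: each row arrives as one vector, split into x and y
via_apply <- apply(numbers, 1, function(r) add(r[1], r[2]))

# mapply() walks the two columns in parallel
via_mapply <- mapply(add, numbers[, 1], numbers[, 2])

cbind(numbers, direct)
```

The row-wise wrappers are only needed when the function itself cannot handle vectors, e.g. when it uses `if` on its arguments.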