Better way to apply rolling function to zoo or xts object? - r

I am wondering if there isn’t a more elegant way to do this. I tried rollapply but could never get it to respond to more than the first column of the zoo object.
I want to access a 2-dimensional zoo or xts object, create a rolling window that includes all columns, perform some operation on each instance of the rolling window, and return a matrix containing the result of the operation on each of the rolling windows. I want the operation on a windowed snippet to be assignable to a function that I externally define.
Here is an example that works, but is not very elegant:
rolling_function <- function(my_data, w, FUN = my_func)
{
## Produce a rolling window of width w starting at
## w, ending at nrow(my_data), with window width w.
## FUN is some function passed that performs some
## operation on 'snippet' and returns a value for
## each column of snippet. That is assembled into
## a matrix and returned.
## Set up a matrix to hold results
results <- matrix(ncol = ncol(my_data),
nrow = (nrow(my_data) - w + 1))
nn <-nrow(my_data)
for(jstart in 1:(nn - w + 1))
{
snippet <- window(my_data,
start = index(my_data[jstart]),
end = index(my_data[jstart + w - 1]))
## Do something with snippet here
# print(my_func(snippet))
results[jstart, ] <- FUN(snippet)
}
return(results)
}
my_func <- function(x)
{
# An example function that takes the difference between
# the first and last rows of the snippet, x
result <- as.vector(x[1,]) - as.vector(x[nrow(x),])
return(result)
}
A small test case is given below:
## Main code
## Define a zoo object with dummy dates
my_data <-zoo(matrix(data = c(1,5,6,5,3,7,8,8,8,2,4,5),
nrow = 4, ncol = 3), order.by = as.Date(100:103))
## Define a window width of 2 and call the rolling function
width = 2
print(rolling_function(my_data, width))
The test zoo object is:
1970-04-11 1 3 8
1970-04-12 5 7 2
1970-04-13 6 8 4
1970-04-14 5 8 5
and the test output is:
[,1] [,2] [,3]
[1,] -4 -4 6
[2,] -1 -1 -2
[3,] 1 0 -1
Is there a more elegant/straightforward/faster way to perform this operation, perhaps using rollapply (I could not make this work)?

Assuming the input z shown reproducibly in the Note at the end, if the width is 2 then:
library(zoo)
-diff(z)
## V2 V3 V4
## 1970-04-12 -4 -4 6
## 1970-04-13 -1 -1 -2
## 1970-04-14 1 0 -1
and in general:
w <- 2 # modify as needed
-diff(z, w-1)
## V2 V3 V4
## 1970-04-12 -4 -4 6
## 1970-04-13 -1 -1 -2
## 1970-04-14 1 0 -1
or using rollapplyr:
w <- 2 # modify as needed
rollapplyr(z, w, function(x) x[1] - x[w])
## V2 V3 V4
## 1970-04-12 -4 -4 6
## 1970-04-13 -1 -1 -2
## 1970-04-14 1 0 -1
Note
Lines <- "
1970-04-11 1 3 8
1970-04-12 5 7 2
1970-04-13 6 8 4
1970-04-14 5 8 5"
library(zoo)
z <- read.zoo(text = Lines)

Related

Varying Symbols of sign programmatically

I have a vector of numbers:
v1 <- c(1,2,3)
and I want to programmatically analyze the impact of sign change whose
variants could be:
v1[1] + v1[2] + v1[3]
[1] 6
v1[1] + v1[2] - v1[3]
[1] 0
v1[1] - v1[2] - v1[3]
[1] -4
v1[1] - v1[2] + v1[3]
[1] 2
How can I exchange signs ('+', '-') programatically? I'm thinking this is a silly question, but can't think my way out of my box, though my line of analysis points to evaluating changing signs.
Here's a quick way to get all possibilities with matrix multiplication:
signs = rep(list(c(1, -1)), length(v1))
signs = do.call(expand.grid, args = signs)
signs$sum = as.matrix(signs) %*% v1
signs
# Var1 Var2 Var3 sum
# 1 1 1 1 6
# 2 -1 1 1 4
# 3 1 -1 1 2
# 4 -1 -1 1 0
# 5 1 1 -1 0
# 6 -1 1 -1 -2
# 7 1 -1 -1 -4
# 8 -1 -1 -1 -6
If you don't want all combinations, you could filter down the signs data frame to the combos of interest, or build it in a way that only creates the combos you care about.
You may use gtools::permutations to get all possible permutations of signs, use apply to evaluate the values for each combination.
v1 <- c(1,2,3)
sign <- c('+', '-')
all_comb <- gtools::permutations(length(sign), length(v1) - 1, v = symbols, repeats.allowed = TRUE)
do.call(rbind, apply(all_comb, 1, function(x) {
exp <- do.call(sprintf, c(fmt = gsub(',', ' %s', toString(v1)), as.list(x)))
data.frame(exp, value = eval(parse(text = exp)))
}))
# exp value
#1 1 - 2 - 3 -4
#2 1 - 2 + 3 2
#3 1 + 2 - 3 0
#4 1 + 2 + 3 6

Getting the values in a matrix with row and column names stored in a dataframe [duplicate]

I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two columns I and J, which are integers to index the row, column from mat. I would like to add the values from mat to the rows of dat.
Here is my conceptual fail:
> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737
(I am using R 2.13.1 on Win32). Digging a bit deeper, I see that I'm misusing matrix indexing, as it appears that I'm only getting a sub-matrix of mat, and not a single-dimension array of values as I expected, i.e.:
> str(mat[dat$I[1:100], dat$J[1:100]])
int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...
I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 .... What is the correct way to index a 2D matrix using indices of row, column to get the values?
Almost. Needs to be offered to "[" as a two column matrix:
dat$matval <- mat[ cbind(dat$I, dat$J) ] # should do it.
There is a caveat: Although this also works for dataframes, they are first coerced to matrix-class and if any are non-numeric, the entire matrix becomes the "lowest denominator" class.
Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:
# Huge sample data
mat <- matrix(sin(1:1e7), ncol=1000)
dat <- data.frame(I=sample.int(nrow(mat), 1e7, rep=T),
J=sample.int(ncol(mat), 1e7, rep=T))
system.time( x <- mat[cbind(dat$I, dat$J)] ) # 0.51 seconds
system.time( mat[dat$I + (dat$J-1L)*nrow(mat)] ) # 0.44 seconds
The dat$I + (dat$J-1L)*nrow(m) part turns the 2-D indices into 1-D ones. The 1L is the way to specify an integer instead of a double value. This avoids some coercions.
...I also tried gsk3's apply-based solution. It's almost 500x slower though:
system.time( apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat ) ) # 212
Here's a one-liner using apply's row-based operations
> dat <- as.data.frame(matrix(rep(seq(4),4),ncol=2))
> colnames(dat) <- c('I','J')
> dat
I J
1 1 1
2 2 2
3 3 3
4 4 4
5 1 1
6 2 2
7 3 3
8 4 4
> mat <- matrix(seq(16),ncol=4)
> mat
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> dat$K <- apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat )
> dat
I J K
1 1 1 1
2 2 2 6
3 3 3 11
4 4 4 16
5 1 1 1
6 2 2 6
7 3 3 11
8 4 4 16
n <- 10
mat <- cor(matrix(rnorm(n*n),n,n))
ix <- matrix(NA,n*(n-1)/2,2)
k<-0
for (i in 1:(n-1)){
for (j in (i+1):n){
k <- k+1
ix[k,1]<-i
ix[k,2]<-j
}
}
o <- rep(NA,nrow(ix))
o <- mat[ix]
out <- cbind(ix,o)

Subsetting rows and columns with given indices [duplicate]

I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two columns I and J, which are integers to index the row, column from mat. I would like to add the values from mat to the rows of dat.
Here is my conceptual fail:
> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737
(I am using R 2.13.1 on Win32). Digging a bit deeper, I see that I'm misusing matrix indexing, as it appears that I'm only getting a sub-matrix of mat, and not a single-dimension array of values as I expected, i.e.:
> str(mat[dat$I[1:100], dat$J[1:100]])
int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...
I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 .... What is the correct way to index a 2D matrix using indices of row, column to get the values?
Almost. Needs to be offered to "[" as a two column matrix:
dat$matval <- mat[ cbind(dat$I, dat$J) ] # should do it.
There is a caveat: Although this also works for dataframes, they are first coerced to matrix-class and if any are non-numeric, the entire matrix becomes the "lowest denominator" class.
Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:
# Huge sample data
mat <- matrix(sin(1:1e7), ncol=1000)
dat <- data.frame(I=sample.int(nrow(mat), 1e7, rep=T),
J=sample.int(ncol(mat), 1e7, rep=T))
system.time( x <- mat[cbind(dat$I, dat$J)] ) # 0.51 seconds
system.time( mat[dat$I + (dat$J-1L)*nrow(mat)] ) # 0.44 seconds
The dat$I + (dat$J-1L)*nrow(m) part turns the 2-D indices into 1-D ones. The 1L is the way to specify an integer instead of a double value. This avoids some coercions.
...I also tried gsk3's apply-based solution. It's almost 500x slower though:
system.time( apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat ) ) # 212
Here's a one-liner using apply's row-based operations
> dat <- as.data.frame(matrix(rep(seq(4),4),ncol=2))
> colnames(dat) <- c('I','J')
> dat
I J
1 1 1
2 2 2
3 3 3
4 4 4
5 1 1
6 2 2
7 3 3
8 4 4
> mat <- matrix(seq(16),ncol=4)
> mat
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> dat$K <- apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat )
> dat
I J K
1 1 1 1
2 2 2 6
3 3 3 11
4 4 4 16
5 1 1 1
6 2 2 6
7 3 3 11
8 4 4 16
n <- 10
mat <- cor(matrix(rnorm(n*n),n,n))
ix <- matrix(NA,n*(n-1)/2,2)
k<-0
for (i in 1:(n-1)){
for (j in (i+1):n){
k <- k+1
ix[k,1]<-i
ix[k,2]<-j
}
}
o <- rep(NA,nrow(ix))
o <- mat[ix]
out <- cbind(ix,o)

R: How to attribute a specific value to specific elements of a matrix [duplicate]

I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two columns I and J, which are integers to index the row, column from mat. I would like to add the values from mat to the rows of dat.
Here is my conceptual fail:
> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737
(I am using R 2.13.1 on Win32). Digging a bit deeper, I see that I'm misusing matrix indexing, as it appears that I'm only getting a sub-matrix of mat, and not a single-dimension array of values as I expected, i.e.:
> str(mat[dat$I[1:100], dat$J[1:100]])
int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...
I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 .... What is the correct way to index a 2D matrix using indices of row, column to get the values?
Almost. Needs to be offered to "[" as a two column matrix:
dat$matval <- mat[ cbind(dat$I, dat$J) ] # should do it.
There is a caveat: Although this also works for dataframes, they are first coerced to matrix-class and if any are non-numeric, the entire matrix becomes the "lowest denominator" class.
Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:
# Huge sample data
mat <- matrix(sin(1:1e7), ncol=1000)
dat <- data.frame(I=sample.int(nrow(mat), 1e7, rep=T),
J=sample.int(ncol(mat), 1e7, rep=T))
system.time( x <- mat[cbind(dat$I, dat$J)] ) # 0.51 seconds
system.time( mat[dat$I + (dat$J-1L)*nrow(mat)] ) # 0.44 seconds
The dat$I + (dat$J-1L)*nrow(m) part turns the 2-D indices into 1-D ones. The 1L is the way to specify an integer instead of a double value. This avoids some coercions.
...I also tried gsk3's apply-based solution. It's almost 500x slower though:
system.time( apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat ) ) # 212
Here's a one-liner using apply's row-based operations
> dat <- as.data.frame(matrix(rep(seq(4),4),ncol=2))
> colnames(dat) <- c('I','J')
> dat
I J
1 1 1
2 2 2
3 3 3
4 4 4
5 1 1
6 2 2
7 3 3
8 4 4
> mat <- matrix(seq(16),ncol=4)
> mat
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> dat$K <- apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat )
> dat
I J K
1 1 1 1
2 2 2 6
3 3 3 11
4 4 4 16
5 1 1 1
6 2 2 6
7 3 3 11
8 4 4 16
n <- 10
mat <- cor(matrix(rnorm(n*n),n,n))
ix <- matrix(NA,n*(n-1)/2,2)
k<-0
for (i in 1:(n-1)){
for (j in (i+1):n){
k <- k+1
ix[k,1]<-i
ix[k,2]<-j
}
}
o <- rep(NA,nrow(ix))
o <- mat[ix]
out <- cbind(ix,o)

How to select/find coordinates within a distance from a list (X/Y) using R

I have a data frame with list of X/Y locations (>2000 rows). What I want is to select or find all the rows/locations based on a max distance. For example, from the data frame select all the locations that are between 1-100 km from each other. Any suggestions on how to do this?
You need to somehow determine the distance between each pair of rows.
The simplest way is with a corresponding distance matrix
# Assuming Thresh is your threshold
thresh <- 10
# create some sample data
set.seed(123)
DT <- data.table(X=sample(-10:10, 5, TRUE), Y=sample(-10:10, 5, TRUE))
# create the disance matrix
distTable <- matrix(apply(createTable(DT), 1, distance), nrow=nrow(DT))
# remove the lower.triangle since we have symmetry (we don't want duplicates)
distTable[lower.tri(distTable)] <- NA
# Show which rows are above the threshold
pairedRows <- which(distTable >= thresh, arr.ind=TRUE)
colnames(pairedRows) <- c("RowA", "RowB") # clean up the names
Starting with:
> DT
X Y
1: -4 -10
2: 6 1
3: -2 8
4: 8 1
5: 9 -1
We get:
> pairedRows
RowA RowB
[1,] 1 2
[2,] 1 3
[3,] 2 3
[4,] 1 4
[5,] 3 4
[6,] 1 5
[7,] 3 5
These are the two functions used for creating the distance matrix
# pair-up all of the rows
createTable <- function(DT)
expand.grid(apply(DT, 1, list), apply(DT, 1, list))
# simple cartesian/pythagorean distance
distance <- function(CoordPair)
sqrt(sum((CoordPair[[2]][[1]] - CoordPair[[1]][[1]])^2, na.rm=FALSE))
I'm not entirely clear from your question, but assuming you mean you want to take each row of coordinates and find all the other rows whose coordinates fall within a certain distance:
# Create data set for example
set.seed(42)
x <- sample(-100:100, 10)
set.seed(456)
y <- sample(-100:100, 10)
coords <- data.frame(
"x" = x,
"y" = y)
# Loop through all rows
lapply(1:nrow(coords), function(i) {
dis <- sqrt(
(coords[i,"x"] - coords[, "x"])^2 + # insert your preferred
(coords[i,"y"] - coords[, "y"])^2 # distance calculation here
)
names(dis) <- 1:nrow(coords) # replace this part with an index or
# row names if you have them
dis[dis > 0 & dis <= 100] # change numbers to preferred threshold
})
[[1]]
2 6 7 9 10
25.31798 95.01579 40.01250 30.87070 73.75636
[[2]]
1 6 7 9 10
25.317978 89.022469 51.107729 9.486833 60.539243
[[3]]
5 6 8
70.71068 91.78780 94.86833
[[4]]
5 10
40.16217 99.32774
[[5]]
3 4 6 10
70.71068 40.16217 93.40771 82.49242
[[6]]
1 2 3 5 7 8 9 10
95.01579 89.02247 91.78780 93.40771 64.53681 75.66373 97.08244 34.92850
[[7]]
1 2 6 9 10
40.01250 51.10773 64.53681 60.41523 57.55867
[[8]]
3 6
94.86833 75.66373
[[9]]
1 2 6 7 10
30.870698 9.486833 97.082439 60.415230 67.119297
[[10]]
1 2 4 5 6 7 9
73.75636 60.53924 99.32774 82.49242 34.92850 57.55867 67.11930

Resources