alignment and offsets in rollapply - r

I am trying to calculate some statistics for a moving window and am using rollapply in the zoo package. My question is how do I get rollapply to apply that function to the previous n observations instead of the current observation and the previous n-1 observations as align right seems to do.
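require(zoo)
# generate data (origin given explicitly, since older R versions require it)
z <- data.frame(x1 = 11:111, x2 = 111:211,
                x3 = as.Date(31:131, origin = "1970-01-01"))
output <- data.frame(dates = z$x3,
                     rollapply(z[, 1:2], 5, max, by.column = TRUE, fill = NA, align = "right"))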
I have a hunch this is answered by ?rollapply: "If width is a plain numeric vector its elements are regarded as widths to be interpreted in conjunction with align whereas if width is a list its components are regarded as offsets. In the above cases if the length of width is 1 then width is recycled for every by-th point. If width is a list its components represent integer offsets such that the i-th component of the list refers to time points at positions i + width[[i]]." But I have no idea what that means in terms of R code, and no example is provided.

Never mind, I deciphered the help. Passing a list of offsets as the width argument of rollapply, in place of the plain 5, like this:
width=list(-1:-5)
accomplishes it.
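To see what the offset list changes, here is a small sketch on a toy vector (the vector x and the result names are illustrative, not from the question). With a plain width of 5 and align = "right", each result covers the current point and the four before it; with width = list(-1:-5), it covers the five points before the current one.

```r
library(zoo)

x <- 1:10  # toy data, chosen so the window maximum is easy to read off

# Plain width: window is positions i-4 .. i (current point included)
incl_current <- rollapply(x, 5, max, fill = NA, align = "right")

# Offsets: window is positions i-5 .. i-1 (current point excluded)
prev_only <- rollapply(x, list(-1:-5), max, fill = NA)

print(incl_current)  # first non-NA value is at position 5
print(prev_only)     # first non-NA value is at position 6
```

The second result is the first one moved down by one position, which is the "previous n observations" behaviour asked about.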

If I'm reading correctly, you just want the column "shifted" down by one, so that the value in row n is the value that row n-1 has now.
This can be done with the lag function from dplyr (stats::lag from base R only changes the time-series attributes of a plain vector and leaves its values where they are):
require(zoo)
z <- data.frame(x1 = 11:111, x2 = 111:211,
                x3 = as.Date(31:131, origin = "1970-01-01"))  # generate data
output <- data.frame(dates = z$x3,
                     rollapply(z[, 1:2], 5, max, by.column = TRUE, fill = NA, align = "right"))
output$x1 <- dplyr::lag(output$x1, 1)
output$x2 <- dplyr::lag(output$x2, 1)
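If the goal is simply to shift a plain vector down by one row, a dependency-free sketch looks like this. The helper name shift_down and the toy vector are hypothetical, introduced only for illustration; the last line shows that stats::lag leaves the values of a plain vector in place.

```r
# shift_down is a hypothetical helper, not part of zoo or base R.
# It assumes k >= 1.
shift_down <- function(v, k = 1) {
  # drop the last k values and pad the front with k NAs
  c(rep(NA, k), head(v, -k))
}

v <- c(10, 20, 30, 40)
shifted <- shift_down(v)

# stats::lag() on a plain vector only adjusts the tsp attribute;
# stripping the attributes shows the values have not moved
unchanged <- as.vector(stats::lag(v, 1))

print(shifted)
print(unchanged)
```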

Related

r - Use apply to take values of one column and calculate values for another column

I have a data frame with measurements.
One column shows the measurements in mm, and the other in units (a relative scale that varies with the zoom level of the stereo microscope I was using).
I want to go through every row of my data frame, and for each "length_mm" that equals NA, I want to use the value in "length_units" to calculate "length_mm".
So far I've tried this code:
convert_to_mm <- function(x) {
x["length_mm"==NA] <- x["length_units"]/10
x["length_mm"]
}
apply(zooplankton[,c("length_mm","length_units")], 1, convert_to_mm)
Sometimes both "length_mm" and "length_units" are NA. In these cases I just want to skip the row.
This is easier than you're making it:
# Which are the rows with bad values for mm? Create an indexing vector:
bad_mm <- is.na(zooplankton$length_mm)
# Now, for those rows, replace length_mm with length_units/10
zooplankton$length_mm[bad_mm] <- zooplankton$length_units[bad_mm]/10
Remember to use is.na(x) instead of x==NA when checking for NA vals. Why? Take a look at NA==NA for a hint!
Almost there. You should use is.na(myVariable) and not myVariable == NA.
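The hint about NA == NA can be checked directly. The sketch below uses a made-up three-row data frame (zoo_df, standing in for zooplankton) to show the is.na-based replacement, including the case where both columns are NA.

```r
na_compare <- NA == NA   # evaluates to NA, not TRUE
na_check   <- is.na(NA)  # evaluates to TRUE

# toy stand-in for the zooplankton data frame (values are made up)
zoo_df <- data.frame(length_mm    = c(1.2, NA, NA),
                     length_units = c(12, 30, NA))

# rows where length_mm is missing
bad_mm <- is.na(zoo_df$length_mm)

# fill them from length_units; row 3 has NA in both columns,
# so NA / 10 leaves length_mm as NA there
zoo_df$length_mm[bad_mm] <- zoo_df$length_units[bad_mm] / 10

print(zoo_df)
```

Rows with NA in both columns need no special handling: the division propagates NA, so they are effectively skipped.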

rollmean length of timeseries

I have a time series of regular 15-minute data and I want to take a rolling average over every 2 hours, so the width of the rollmean window is 8. The only issue is that the original time series has length 35034, but the output has length 35027. Is there a way to make the output the same length as the input, with data at the ends as well? I don't want to fill the ends with NAs.
interval <- 2 #2 hours
data <- data.frame(t=streamflowDateTime,flow=streamflow)
data2hr <- data
rollingWidth <- (interval*60)/15
library(zoo)
smoothedFlow <- rollmean(data2hr$flow,rollingWidth,align="center")
I am not completely clear on how you want to do the filling but here are some ways:
1) Add the argument fill = NA to rollmean to add NA's to the ends.
2) If you want partial means at the ends then use this where x is your data, width is the number of points to take the mean of each time:
rollapply(x, width, mean, partial = TRUE)
(The points near the ends will be the mean of fewer than width points since, of course, there just aren't that many near the ends.)
3) Assuming width is odd you could pad the input series with (width-1)/2 values (possibly NA depending on what you want) on each end.
4) This keeps the original values near the ends:
out <- rollmean(x, width, fill = NA)
isNA <- is.na(out)
out[isNA] <- x[isNA]
Note: align = "center" is the default for both rollmean and rollapply so that can be omitted.
If you don't want NAs, you can use the fill argument to pad the result with values instead.
With the keyword "extend", the outermost computed means are repeated to fill the ends:
rollmean(x, k, align="center", fill = "extend")
Or supply three constants, used for the left end, any interior gaps, and the right end:
rollmean(x, k, align="center", fill = c(42, 69, 666))
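The length difference in the question follows from rollmean returning length(x) - k + 1 values when no fill is given (35034 - 8 + 1 = 35027). A small sketch on a toy series (x and the result names are illustrative) compares the output lengths of the options discussed above:

```r
library(zoo)

x <- as.numeric(1:20)  # toy series
k <- 8                 # same window width as in the question

len_plain   <- length(rollmean(x, k))                         # length(x) - k + 1
len_na      <- length(rollmean(x, k, fill = NA))              # padded with NA
len_extend  <- length(rollmean(x, k, fill = "extend"))        # padded with edge means
len_partial <- length(rollapply(x, k, mean, partial = TRUE))  # shorter windows at the ends

print(c(len_plain, len_na, len_extend, len_partial))
```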

Moving average with changing period in R

I have a data frame named abc on which I'm doing moving average using rollapply. The following code works:
forecast <- rollapply(abc, width=12, FUN=mean, align = "right", fill=NA)
Now I want to do the same thing with a variable width: for the 1st month the result is empty, for the 2nd month it is the first month's value, and for the 3rd month it is (first month + second month)/2. In general, for the ith month with i <= 12 the value should be the mean of months 1 to i-1, and for i > 12 it should be the average of the last 12 months, as in the forecast above. Please help.
Here are some approaches:
1) partial=TRUE
n <- length(x)
c(NA, rollapplyr(x, 12, mean, partial = TRUE)[-n])
Note the r at the end of rollapplyr.
2) width as list. The width argument of rollapply can be a list such that the ith list element is a vector of the offsets to use for the ith rolling computation. If we specify partial=TRUE then offsets that run off the end of the vector will be ignored. If we only specify one element in the list it will be recycled:
rollapply(x, list(-seq(12)), mean, partial = TRUE, fill = NA)
2a) Rather than recycling and depending on partial we can write it out. Here we want width <- list(numeric(0), -1, -(1:2), -(1:3), ..., -(1:12), ..., -(1:12)) which can be calculated like this:
width <- lapply(seq_along(x), function(x) -seq_len(min(12, x-1)))
rollapply(x, width, mean)
This one would mainly be of interest if you want to modify the specification slightly because it is very flexible.
Note: Later in the comments the poster asked for the same rolling average except for it not to be lagged. That would be just:
rollapplyr(x, 12, mean, partial = TRUE)
Note the r at the end of rollapplyr.
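As a check on approach 1, the sketch below compares the lagged partial rolling mean with a direct base R loop on a toy vector. The data, the width of 3 (instead of 12) and the names lagged and by_hand are illustrative assumptions, chosen to keep the numbers easy to verify by hand.

```r
library(zoo)

x <- c(2, 4, 6, 8, 10)  # toy data
w <- 3                  # window width (12 in the answer above)
n <- length(x)

# partial right-aligned means, then shifted down by one position
lagged <- c(NA, rollapplyr(x, w, mean, partial = TRUE)[-n])

# direct base R version of the same definition:
# mean of up to w values strictly before position i
by_hand <- sapply(seq_len(n), function(i) {
  if (i == 1) NA else mean(x[max(1, i - w):(i - 1)])
})

print(lagged)
print(by_hand)
```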
Update Some improvements and additional solutions.

converting irregular grid to regular grid

I have a set of observations on an irregular grid. I want to put them on a regular grid with a resolution of 5. This is an example:
d <- data.frame(x=runif(1e3, 0, 30), y=runif(1e3, 0, 30), z=runif(1e3, 0, 30))
## interpolate xy grid to change irregular grid to regular
library(akima)
d2 <- with(d,interp(x, y, z, xo=seq(0, 30, length = 500),
yo=seq(0, 30, length = 500), duplicate="mean"))
How can I get d2 as a SpatialPixelsDataFrame, with 3 columns: the two coordinates and the interpolated values?
You can use code like this (thanks to the comment by @hadley):
d3 <- data.frame(x=d2$x[row(d2$z)],
y=d2$y[col(d2$z)],
z=as.vector(d2$z))
The idea here is that a matrix in R is just a vector with a bit of extra information about its dimensions. The as.vector call drops that information, turning the 500x500 matrix into a linear vector of length 500*500=250000. The subscript operator [ does the same, so although row and col originally return a matrix, that is treated as a linear vector as well. So in total, you have three matrices, turn them all to linear vectors with the same order, use two of them to index the x and y vectors, and combine the results into a single data frame.
My original solution didn't use row and col, but instead rep to formulate the x and y columns. It is a bit more difficult to understand and remember, but might be a bit more efficient, and give you some insight useful for more difficult applications.
d3 <- data.frame(x=rep(d2$x, times=500),
y=rep(d2$y, each=500),
z=as.vector(d2$z))
For this formulation, you have to know that a matrix in R is stored in column-major order. The second element of the linearized vector therefore is d2$z[2,1], so the row number will change between two subsequent values, while the column number will remain the same for a whole column. Consequently, you want to repeat the x vector as a whole, but repeat each element of y by itself. That's what the two rep calls do.
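The column-major argument can be verified on a tiny matrix. In this sketch m, xs and ys are made-up stand-ins for d2$z, d2$x and d2$y; it builds the data frame both ways and confirms they agree.

```r
m  <- matrix(1:6, nrow = 2)  # stands in for d2$z (2 rows, 3 columns)
xs <- c(10, 20)              # one x value per row, like d2$x
ys <- c(1, 2, 3)             # one y value per column, like d2$y

flat <- as.vector(m)         # first column comes first: 1 2 3 4 5 6
second_is_row2 <- m[2, 1] == flat[2]

# indexing with row() and col()
via_index <- data.frame(x = xs[row(m)], y = ys[col(m)], z = flat)

# repeating x as a whole and each y element by itself
via_rep <- data.frame(x = rep(xs, times = ncol(m)),
                      y = rep(ys, each = nrow(m)),
                      z = flat)

print(via_index)
print(identical(via_index, via_rep))
```

Using ncol(m) and nrow(m) instead of a literal 500 keeps the rep version correct if the grid size changes.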

R: apply() type function for two 2-d arrays

I'm trying to find an apply() type function that can run a function that operates on two arrays instead of one.
Sort of like:
apply(X1 = doy_stack, X2 = snow_stack, MARGIN = 2, FUN = r_part(a, b))
The data is a stack of band arrays from Landsat tiles that are stacked together using rbind. Each row contains the data from a single tile, and in the end, I need to apply a function on each column (pixel) of data in this stack. One such stack contains whether each pixel has snow on it or not, and the other stack contains the day of year for that row. I want to run a classifier (rpart) on each pixel and have it identify the snow free day of year for each pixel.
What I'm doing now is pretty silly: mapply(paste, doy, snow_free) concatenates the day of year and the snow status together for each pixel as a string, apply(strstack, 2, FUN) runs the classifier on each pixel, and inside the apply function, I'm exploding each string using strsplit. As you might imagine, this is pretty inefficient, especially on 1 million pixels x 300 tiles.
Thanks!
I wouldn't try to get too fancy. A for loop might be all you need.
n <- ncol(doy_stack)  # one column per pixel
out <- numeric(n)
for(i in seq_len(n)) {
out[i] <- snow_free(doy_stack[,i], snow_stack[,i])
}
Or, if you don't want to do the bookkeeping yourself,
sapply(1:n, function(i) snow_free(doy_stack[,i], snow_stack[,i]))
I've just encountered the same problem and, if I clearly understood the question, I may have solved it using mapply.
We'll use two 10x10 matrices populated with uniform random values.
set.seed(1)
X <- matrix(runif(100), 10, 10)
set.seed(2)
Y <- matrix(runif(100), 10, 10)
Next, determine how operations between the matrices will be performed. If it is row-wise, you need to transpose X and Y then cast to data.frame. This is because a data.frame is a list with columns as list elements. mapply() assumes that you are passing a list. In this example I'll perform correlation row-wise.
res.row <- mapply(function(x, y){cor(x, y)}, as.data.frame(t(X)), as.data.frame(t(Y)))
res.row[1]
V1
0.36788
should be the same as
cor(X[1,], Y[1,])
[1] 0.36788
For column-wise operations exclude the t():
res.col <- mapply(function(x, y){cor(x, y)}, as.data.frame(X), as.data.frame(Y))
This obviously assumes that X and Y have dimensions consistent with the operation of interest (i.e. they don't have to be exactly the same dimensions). For instance, one could require a statistical test row-wise but having differing numbers of columns in each matrix.
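As a sanity check on the column-wise mapply call, the sketch below compares it with an explicit loop over column indices. The matrix sizes and the names col_mapply and col_loop are illustrative assumptions.

```r
set.seed(1)
X <- matrix(runif(20), 5, 4)
Y <- matrix(runif(20), 5, 4)

# mapply walks the two data frames column by column in parallel
col_mapply <- mapply(cor, as.data.frame(X), as.data.frame(Y))

# the same computation, indexing the matrix columns directly
col_loop <- vapply(seq_len(ncol(X)),
                   function(j) cor(X[, j], Y[, j]),
                   numeric(1))

print(col_mapply)
print(col_loop)
```

The two results differ only in that mapply keeps the column names (V1, V2, ...) of the data frames.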
Wouldn't it be more natural to implement this as a raster stack? With the raster package you can use entire rasters in functions (eg ras3 <- ras1^2 + ras2), as well as extract a single cell value from XY coordinates, or many cell values using a block or polygon mask.
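apply needs an object with dimensions, so it cannot iterate over a list of two matrices directly. Not sure how your data is set up, but if both stacks have the same dimensions you could bind them into a 3-d array and apply over the column margin, something like this:
stacked <- array(c(doy_stack, snow_stack), dim = c(dim(doy_stack), 2))
apply(stacked, 2, function(m) r_part(m[, 1], m[, 2]))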
