I have a time series of regular 15-minute data intervals, and I want to take a rolling average of the data over every 2 hours, so the width of the rollmean window is 8. The only issue is that the original time series I put in has length 35034, but the output has length 35027. Is there a way to make the output the same length as the input, with data at the ends as well? I don't want to fill the ends with NAs.
library(zoo)
interval <- 2 # window length in hours
data <- data.frame(t=streamflowDateTime,flow=streamflow)
data2hr <- data
rollingWidth <- (interval*60)/15 # number of 15-minute readings per window (8)
smoothedFlow <- rollmean(data2hr$flow,rollingWidth,align="center")
I am not completely clear on how you want to do the filling, but here are some ways:
1) Add the argument fill = NA to rollmean to add NAs at the ends.
2) If you want partial means at the ends, use this, where x is your data and width is the number of points to average each time:
rollapply(x, width, mean, partial = TRUE)
(The points near the ends will be the mean of fewer than width points since, of course, there just aren't that many points near the ends.)
3) Assuming width is odd, you could pad the input series with (width-1)/2 values (possibly NA, depending on what you want) on each end.
4) This keeps the original values near the ends:
out <- rollmean(x, width, fill = NA)
isNA <- is.na(out)
out[isNA] <- x[isNA]
Note: align = "center" is the default for both rollmean and rollapply so that can be omitted.
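A minimal sketch comparing output lengths for options 1, 2 and 4, assuming a hypothetical random-walk vector `x` in place of the streamflow data and the question's width of 8. Without any fill, rollmean drops width - 1 = 7 points, which is exactly the 35034 vs 35027 gap in the question.

```r
library(zoo)

set.seed(42)
x <- cumsum(rnorm(100))  # hypothetical stand-in for the 15-minute flow series
width <- 8               # 2 hours of 15-minute readings

m_plain <- rollmean(x, width)  # no fill: length(x) - width + 1 values

# 1) NA at the ends, same length as x
m_fill <- rollmean(x, width, fill = NA)

# 2) Partial means at the ends: same length as x and no NAs
m_partial <- rollapply(x, width, mean, partial = TRUE)

# 4) Keep the raw values wherever the full-width mean is undefined
m_keep <- rollmean(x, width, fill = NA)
isNA <- is.na(m_keep)
m_keep[isNA] <- x[isNA]

c(plain = length(m_plain), fill = length(m_fill),
  partial = length(m_partial), keep = length(m_keep))
```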
If you don't want NAs, you can use the fill parameter to fill the ends of the output with simple constant values.
With the keyword "extend", rollmean repeats the first and last computed means out to the ends of the output:
rollmean(x, k, align="center", fill = "extend")
or supply a three-component vector giving the fill values for the left end, the interior and the right end:
rollmean(x, k, align="center", fill = c(42, 69, 666))
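A small sketch of both fill styles on a hypothetical input `x <- 1:10` with k = 3, chosen so that every full 3-point mean equals the middle value and the fill values are easy to spot. The interior value 69 is unused here because the rolling mean leaves no NAs in the interior.

```r
library(zoo)

x <- 1:10  # hypothetical input; each full 3-point mean equals the middle value

# "extend": left end gets the first computed mean, right end the last one
ext <- rollmean(x, 3, align = "center", fill = "extend")

# Three constants: left end, interior, right end
const <- rollmean(x, 3, align = "center", fill = c(42, 69, 666))

rbind(ext, const)
```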
I am trying to calculate some statistics for a moving window and am using rollapply from the zoo package. My question is: how do I get rollapply to apply the function to the previous n observations, rather than to the current observation and the previous n-1 observations, which is what align = 'right' seems to do?
require(zoo)
z <- data.frame(x1=11:111, x2=111:211, x3=as.Date(31:131, origin="1970-01-01"))#generate data
output<-data.frame(dates=z$x3,
rollapply(z[,1:2],by.column=TRUE, 5, max, fill=NA, align='right'))
I have a hunch this is answered by ?rollapply: "If width is a plain numeric vector its elements are regarded as widths to be interpreted in conjunction with align whereas if width is a list its components are regarded as offsets. In the above cases if the length of width is 1 then width is recycled for every by-th point. If width is a list its components represent integer offsets such that the i-th component of the list refers to time points at positions i + width[[i]]." But I have no idea what that means in terms of R code, and no example is provided.
Never mind, I deciphered the help page. Adding the width parameter to rollapply like this:
width=list(-1:-5)
accomplishes it.
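A minimal sketch contrasting the offset form with ordinary right alignment, using the values of column x1 from the question. With width = list(-1:-5), position i uses x[i-1], ..., x[i-5] and excludes x[i], so the result is the right-aligned maximum shifted down by one row.

```r
library(zoo)

x <- 11:111  # same values as column x1 in the question

# Offsets -1:-5: position i uses x[i-1], ..., x[i-5] (excludes x[i])
prev5 <- rollapply(x, width = list(-1:-5), FUN = max, fill = NA)

# Plain right alignment: position i uses x[i-4], ..., x[i] (includes x[i])
right5 <- rollapply(x, width = 5, FUN = max, fill = NA, align = "right")

head(cbind(x, right5, prev5), 7)
```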
If I'm reading correctly, you just want the column "shifted" down by one, so that row n+1 gets the value that row n has now.
This can be easily done using the lag function from the dplyr package. Call it as dplyr::lag, because base R's stats::lag does not shift the values of a plain vector:
z <- data.frame(x1=11:111, x2=111:211, x3=as.Date(31:131, origin="1970-01-01"))#generate data
output<-data.frame(dates=z$x3,
rollapply(z[,1:2],by.column=TRUE, 5, max, fill=NA, align='right'))
output$x1 <- dplyr::lag(output$x1, 1)
output$x2 <- dplyr::lag(output$x2, 1)
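A lag call only moves the values if it resolves to dplyr::lag; base R's stats::lag changes the time attributes but leaves a plain vector's values in place. Here is a base R sketch of the same one-row shift. The helper name `shift_down` is hypothetical, not from any package.

```r
library(zoo)

# Hypothetical helper: move every value down one row, padding the top with NA
shift_down <- function(v) c(NA, head(v, -1))

z <- data.frame(x1 = 11:111, x2 = 111:211,
                x3 = as.Date(31:131, origin = "1970-01-01"))
output <- data.frame(dates = z$x3,
  rollapply(z[, 1:2], by.column = TRUE, 5, max, fill = NA, align = "right"))
output$x1 <- shift_down(output$x1)
output$x2 <- shift_down(output$x2)

head(output, 7)
```

The shifted column matches the offset form width = list(-1:-5) from the previous answer.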
I wish to execute a function FUN over a rolling window of 1 year. My xts object does not have the same number of points every year. How can I do that efficiently?
P.S. Usually, to execute FUN over a fixed number of data points (for instance 100), I use:
as.xts(rollapply(data = zoo(indicator), FUN = FUN, width = 100, align = "right"))
but obviously this doesn't work if there are not always the same number of points per year.
I'll try to answer my own question. One way to do it is:
First, NA-pad the time series so that there is one data point per day (or whatever unit is relevant in your case).
(Optional, depending on your FUN) Then, use na.locf to carry the last observation forward into the holes.
Finally, use the usual rollapply shown in the question, over a fixed number of data points that corresponds to 1 year.
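The three steps above can be sketched with zoo as follows. This assumes a daily unit, FUN = mean, and 365 points as an approximation of one year (it ignores leap days); the data are hypothetical. Wrap the result in as.xts() if you need an xts object back.

```r
library(zoo)

set.seed(1)
# Hypothetical irregular data: 600 distinct days drawn from three years
cal <- seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day")
indicator <- zoo(rnorm(600), order.by = sort(sample(cal, 600)))

# Step 1: NA-pad to exactly one point per day (zoo(, grid) is a zero-width series)
grid <- seq(start(indicator), end(indicator), by = "day")
padded <- merge(indicator, zoo(, grid))

# Step 2 (optional): carry the last observation forward into the gaps
filled <- na.locf(padded)

# Step 3: a fixed width of 365 points now spans roughly one calendar year
yearly <- rollapplyr(filled, width = 365, FUN = mean)
```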
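You can use the apply.yearly(x, FUN, ...) function from the xts package. Note that it applies FUN to each calendar year separately (non-overlapping periods), not over a rolling one-year window.
library(xts)
dat <- xts(rnorm(1000), order.by = as.Date(1:1000, origin = "1970-01-01"))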
plot(dat)
apply.yearly(dat, mean)
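For a genuinely rolling one-year window on irregular timestamps, here is a sketch of a different technique (not from this thread): for each observation, apply FUN to every observation whose date falls in the 365 days ending at it. The helper `rolling_year` is hypothetical; it does O(n^2) comparisons, so it suits moderately sized series.

```r
library(zoo)

# Hypothetical helper: apply FUN to all observations in the window (t - days, t]
rolling_year <- function(z, FUN, days = 365) {
  tt <- index(z)
  vals <- coredata(z)
  out <- sapply(seq_along(tt), function(i) {
    in_window <- tt > tt[i] - days & tt <= tt[i]
    FUN(vals[in_window])
  })
  zoo(out, tt)
}

set.seed(1)
# Hypothetical irregular series: 400 distinct days out of 1000
obs_days <- sort(sample(seq(as.Date("2020-01-01"), by = "day", length.out = 1000), 400))
z <- zoo(rnorm(400), obs_days)

roll_mean <- rolling_year(z, mean)
roll_n <- rolling_year(z, length)  # number of points that fell in each window
```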
I have a data frame named abc on which I'm doing moving average using rollapply. The following code works:
forecast <- rollapply(abc, width=12, FUN=mean, align = "right", fill=NA)
Now I want to do the same thing with a variable width: for the 1st month the value will be empty, for the 2nd month it will be the 1st month's value, and for the 3rd month it will be (1st month + 2nd month)/2. In general, for the ith month with i <= 12, the value will be the mean of the preceding months, sum(x[1:(i-1)])/(i-1), and for later months it will be the average of the last 12 months, as done by forecast. Please help.
Here are some approaches:
1) partial=TRUE
n <- length(x)
c(NA, rollapplyr(x, 12, mean, partial = TRUE)[-n])
Note the r at the end of rollapplyr.
2) width as list The width argument of rollapply can be a list such that the ith list element is a vector of the offsets to use for the ith rolling computation. If we specify partial=TRUE then offsets that run off the end of the vector will be ignored. If we only specify one element in the list it will be recycled:
rollapply(x, list(-seq(12)), mean, partial = TRUE, fill = NA)
2a) Rather than recycling and depending on partial we can write it out. Here we want width <- list(numeric(0), -1, -(1:2), -(1:3), ..., -(1:12), ..., -(1:12)) which can be calculated like this:
width <- lapply(seq_along(x), function(i) -seq_len(min(12, i - 1)))
rollapply(x, width, mean)
This one would mainly be of interest if you want to modify the specification slightly because it is very flexible.
Note: Later in the comments the poster asked for the same rolling average except for it not to be lagged. That would be just:
rollapplyr(x, 12, mean, partial = TRUE)
Note the r at the end of rollapplyr.
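A quick check of approach 1 against a direct translation of the requested definition, using a hypothetical monthly vector `x`. The unlagged variant from the note is included for comparison.

```r
library(zoo)

set.seed(7)
x <- round(runif(30, 50, 150))  # hypothetical monthly values
n <- length(x)

# Approach 1: mean of up to 12 preceding months, NA for the first month
lagged <- c(NA, rollapplyr(x, 12, mean, partial = TRUE)[-n])

# Direct translation of the requested definition, for comparison
by_hand <- sapply(seq_len(n), function(i)
  if (i == 1) NA else mean(x[max(1, i - 12):(i - 1)]))

# Unlagged variant from the note: the window includes the current month
current <- rollapplyr(x, 12, mean, partial = TRUE)
```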
I have a vector of numeric samples. I have calculated a smaller vector of breaks that group the values. I would like to create a boxplot that has one box for every interval, with the width of each box coming from a third vector, the same length as the breaks vector.
Here is some sample data. Please note that my real data has thousands of samples and at least tens of breaks:
v <- c(seq.int(5), seq.int(7) * 2, seq.int(4) * 3)
v1 <- c(1, 6, 13) # breaks
v2 <- c(5, 10, 2) # relative widths
This is how I might make separate boxplots, ignorant of the widths:
boxplot(v[v1[1]:(v1[2]-1)])
boxplot(v[v1[2]:(v1[3]-1)])
boxplot(v[v1[3]:length(v)])
I would like a solution that does a single boxplot() call without excessive data conditioning. For example, putting the vector in a data frame and adding a column for region/break number seems inelegant, but I'm not yet "thinking in R", so perhaps that is best.
Base R is preferred, but I will take what I can get.
Thanks.
Try this:
v1 <- c(v1, length(v) + 1)
a01 <- unlist(mapply(rep, 1:(length(v1)-1), diff(v1)))
boxplot(v ~ a01, names= paste0("g", 1:(length(v1)-1)), width = v2) # width gives the relative box widths
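An alternative sketch for building the group vector, swapping in base R's findInterval() (not used in the answer above). It assumes, as in the question, that v1 holds the 1-based start position of each group in ascending order. boxplot's width argument takes the relative box widths from v2.

```r
v <- c(seq.int(5), seq.int(7) * 2, seq.int(4) * 3)
v1 <- c(1, 6, 13)  # start position of each group (must be ascending)
v2 <- c(5, 10, 2)  # relative box widths

# findInterval() gives, for each position, the index of the last start <= it
grp <- findInterval(seq_along(v), v1)

boxplot(v ~ grp, width = v2, names = paste0("g", seq_along(v1)))
```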
I have a set of observations on an irregular grid. I want to put them on a regular grid with a resolution of 5. Here is an example:
d <- data.frame(x=runif(1e3, 0, 30), y=runif(1e3, 0, 30), z=runif(1e3, 0, 30))
## interpolate xy grid to change irregular grid to regular
library(akima)
d2 <- with(d,interp(x, y, z, xo=seq(0, 30, length = 500),
yo=seq(0, 30, length = 500), duplicate="mean"))
How can I get d2 as a SpatialPixelsDataFrame, with three columns: the x and y coordinates and the interpolated values?
You can use code like this (thanks to the comment by @hadley):
d3 <- data.frame(x=d2$x[row(d2$z)],
y=d2$y[col(d2$z)],
z=as.vector(d2$z))
The idea here is that a matrix in R is just a vector with a bit of extra information about its dimensions. The as.vector call drops that information, turning the 500x500 matrix into a linear vector of length 500*500=250000. The subscript operator [ does the same, so although row and col return matrices, they are treated as linear vectors as well. So in total, you have three matrices, turn them all into linear vectors in the same order, use two of them to index the x and y vectors, and combine the results into a single data frame.
My original solution didn't use row and col, but instead rep to formulate the x and y columns. It is a bit more difficult to understand and remember, but might be a bit more efficient, and give you some insight useful for more difficult applications.
d3 <- data.frame(x=rep(d2$x, times=500),
y=rep(d2$y, each=500),
z=as.vector(d2$z))
For this formulation, you have to know that a matrix in R is stored in column-major order. The second element of the linearized vector is therefore d2$z[2,1], so the row number changes between consecutive values, while the column number stays the same for a whole column. Consequently, you want to repeat the x vector as a whole, but repeat each element of y by itself. That's what the two rep calls do.
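A small check that the two formulations agree, using a hypothetical 3x2 stand-in for the interp() result (a list with grid vectors x, y and a matrix z where z[i, j] is the value at (x[i], y[j])). On a non-square grid, the rep counts become times = length(y) and each = length(x); the 500 in the answer works because that grid is square.

```r
# Hypothetical stand-in for akima::interp() output
g <- list(x = c(0, 10, 20), y = c(0, 5), z = matrix(1:6, nrow = 3))
nx <- length(g$x)
ny <- length(g$y)

# Formulation 1: row()/col() index matrices
a <- data.frame(x = g$x[row(g$z)], y = g$y[col(g$z)], z = as.vector(g$z))

# Formulation 2: rep(), relying on column-major storage
b <- data.frame(x = rep(g$x, times = ny), y = rep(g$y, each = nx),
                z = as.vector(g$z))

a
```

The second row of `a` is x = 10, y = 0, z = 2, which is g$z[2, 1], matching the column-major reasoning above.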