I want to use rollapply to compute Value at Risk over a rolling window. I use the following code:
var<-rollapply(phelix, width=1000, FUN=function(x) VaR(R=phelix, p=0.95, method="historical"),by=1, by.column=TRUE )
phelix is the data vector of returns, with 3995 observations. I want a rolling window of 1000 observations, so that the VaR function is evaluated for every single observation from the 1001st onwards.
After executing the rollapply function I get a vector of 2996 identical values. It seems that my window is stuck and doesn't roll :)
Can you please help me with that? Many thanks in advance!
rollapply repeatedly calls the function you supply, passing it a vector that contains the data within the rolling window. In your case you supply FUN=function(x), so x will contain the data within the window. However, the function you define never refers to x, so it always returns the same thing. Assuming that the first argument to VaR is the one that should receive the rolling data, you should use:
var<-rollapply(phelix, width=1000, FUN=function(x) VaR(R=x, p=0.95, method="historical"),by=1, by.column=TRUE )
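A minimal sketch of the difference, assuming the zoo package is installed. The returns are simulated (phelix_demo is a made-up stand-in for phelix), and quantile() stands in for VaR so the example does not depend on PerformanceAnalytics; the point is only whether the function body uses its argument x.

```r
library(zoo)

set.seed(1)
phelix_demo <- rnorm(3995, sd = 0.02)  # hypothetical returns, not the real data

# Bug: the body ignores x and always uses the full vector
stuck <- rollapply(phelix_demo, width = 1000,
                   FUN = function(x) quantile(phelix_demo, 0.05, names = FALSE))

# Fix: the body uses x, i.e. only the 1000 observations in the current window
rolling <- rollapply(phelix_demo, width = 1000,
                     FUN = function(x) quantile(x, 0.05, names = FALSE))

length(rolling)          # one value per window: 3995 - 1000 + 1
length(unique(stuck))    # a single repeated value
length(unique(rolling))  # many distinct values
```

The same pattern applies to any function passed as FUN: whatever should change from window to window has to be computed from x.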
I have two time series (ts) objects Yt and Yt1 that both contain daily values over five years (start = c(1, 1), end = c(5, 365), frequency = 365). Yt is an original time series while Yt1 represents a smoothed and gap-filled version of Yt. I want to find the Normalized Root Mean Square Error (NRMSE) between the two time series, but I'd like to get one result for each of the five years. For this, I wanted to use the aggregate() function. But since this function only takes one input variable to aggregate, I thought I could just bind the two time series together with ts.union and then call aggregate() with a function that uses matrix subsetting.
So I have data in the form of
Yt <- ts(rnorm(1825), frequency=365) # would be a seasonal signal in reality
Yt1 <- smooth(Yt) # smoothed version of Yt
Yt_union <- ts.union(Yt1, Yt)
and want to apply the NRMSE function
nrmse_fun <- function(Yt_matrix) sqrt(mean((Yt_matrix[,1] - Yt_matrix[,2])^2, na.rm=TRUE)) / mean(Yt_matrix[,2], na.rm=TRUE)
Calling aggregate() like
aggregate(Yt_union, FUN=nrmse_fun)
I expect a result in the form of
Time Series:
Start = 1
End = 5
Frequency = 1
[1] 0.1256365 0.1091591 0.0989738 0.1071725 0.1188176
However, instead I get an error
Error in Yt_matrix[, 1]: incorrect number of dimensions
I know this has probably something to do with the matrix subsetting within the NRMSE function but I don't know how I could rephrase the function so that aggregate() has no problem with it? Using a function with two arguments also wouldn't work since I need both time series to be aggregated simultaneously. I should also mention that I need the result to still be a time series object.
I'm fairly new to R programming so I don't know if there is a simple workaround I'm missing. Maybe aggregate() isn't even needed here? Any help is appreciated!
Looks like I found an easy workaround for this. If I calculate different parts of the NRMSE equation separately I can call aggregate() only when dealing with one time series.
This is my solution:
inner_term <- aggregate((Yt1 - Yt)^2, FUN = mean, na.rm = TRUE)
Yt_mean <- aggregate(Yt, FUN = mean, na.rm = TRUE)
rmse <- sqrt(inner_term)
nrmse <- rmse / Yt_mean
However, since this approach doesn't need simultaneous aggregation over two time series, I'm not sure if another solution exists that is more in line with my initial question. But like @Onyambu suggests, it might just not be possible to use the aggregate() function for matrices.
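For comparison, here is a sketch of one way to evaluate both series together for each year without aggregate(), using base R only. It assumes every year has exactly frequency(Yt) observations; the names year, nrmse_one, Ytn and Yt1n are made up for this example, and the simulated series is given a mean of 10 so that the normalising mean is not close to zero.

```r
set.seed(42)
Yt  <- ts(rnorm(1825, mean = 10), frequency = 365)  # simulated stand-in data
Yt1 <- smooth(Yt)                                   # smoothed version of Yt

# Work on plain numeric vectors and build a year label for every observation
Ytn  <- as.numeric(Yt)
Yt1n <- as.numeric(Yt1)
year <- (seq_along(Ytn) - 1) %/% frequency(Yt) + 1

# NRMSE for one block of row indices, using both series at once
nrmse_one <- function(idx) {
  sqrt(mean((Yt1n[idx] - Ytn[idx])^2, na.rm = TRUE)) / mean(Ytn[idx], na.rm = TRUE)
}

# One value per year, returned as an annual time series
nrmse <- ts(unname(vapply(split(seq_along(Ytn), year), nrmse_one, numeric(1))),
            start = 1, frequency = 1)
nrmse
```

Splitting the row indices rather than the data keeps the two series aligned, which is the part aggregate() could not do with a matrix-subsetting function.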
I am trying to create a column which holds the mean of a variable for subgroups of my data set. In this case, the mean is the crime rate of each state calculated from county observations, and this number is then assigned to each county according to the state it is located in. Here is the code I wrote.
Create the new column
Data.Final$state_mean <- 0
Then calculate and assign the mean.
for (j in range[1:3136])
{
state <- Data.Final[j, "state"]
Data.Final[j, "state_mean"] <- mean(Data.Final$violent_crime_2009-2014,
which(Data.Final[, "state"] == state))
}
Here is the following error
Error in range[1:3137] : object of type 'builtin' is not subsettable
It would be very much appreciated if you could take a few minutes to help a beginner out.
You've got a few problems:
range[1:3136] isn't valid syntax. range(1:3136) is valid syntax, but the range() function just returns the minimum and maximum. You don't need anything more than 1:3136, just use for (j in 1:3136) instead.
Because of the dash, violent_crime_2009-2014 isn't a standard column name. You'll need to use it in backticks, Data.Final$`violent_crime_2009-2014`, or in quotes with [: Data.Final[["violent_crime_2009-2014"]] or Data.Final[, "violent_crime_2009-2014"]
Also, your code is very inefficient - you re-calculate the mean on every single iteration. Try having a look at the Mean by Group R-FAQ. There are many faster and easier methods to get grouped means.
Without using extra packages, you could do
Data.Final$state_mean = ave(x = Data.Final[["violent_crime_2009-2014"]],
Data.Final$state,
FUN = mean)
For friendlier syntax and greater efficiency, the data.table and dplyr packages are popular. You can see examples using them at the link above.
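As a small illustration of the ave() approach, here is a sketch on a made-up five-row data frame (Data.Demo and its values are invented for this example). It uses check.names = FALSE so that the hyphenated column name survives, which mirrors the non-standard name in the question.

```r
# Toy data: two states, with a hyphenated (non-syntactic) column name
Data.Demo <- data.frame(
  state = c("A", "A", "B", "B", "B"),
  `violent_crime_2009-2014` = c(1, 3, 10, 20, 30),
  check.names = FALSE
)

# Group mean per state, repeated on every row of that state
Data.Demo$state_mean <- ave(Data.Demo[["violent_crime_2009-2014"]],
                            Data.Demo$state,
                            FUN = mean)

Data.Demo$state_mean
# [1]  2  2 20 20 20
```

Because ave() returns a vector the same length as its input, it can be assigned straight to a new column with no loop and no merge.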
Here is one of many ways this can be done (I'm sure someone will post a tidyverse answer soon if not before I manage to post):
# Data for my example:
data(InsectSprays)
# Note I have a response column and a column I could subset on
str(InsectSprays)
# Take the averages with the by var:
mn <- with(InsectSprays,aggregate(x=list(mean=count),by=list(spray=spray),FUN=mean))
# Map the means back to your data using the by var as the key to map on:
InsectSprays <- merge(InsectSprays,mn,by="spray",all=TRUE)
Since you mentioned you're a beginner, I'll just mention that whenever you can, avoid looping in R. Vectorize your operations when you can. The nice thing about using aggregate, and merge, is that you don't have to worry about errors in your mapping because you get an index shift while looping and something weird happens.
Cheers!
This is probably a basic rollapply or loop question; however, I cannot find a way to instruct the rollapply function, or to write an expression for a loop calculation.
I have a vector of growth rates with an initial value of 100. I would like to calculate the value at each point of the growth series and obtain a vector of these values. Given that the actual growth series is much longer than the one below, writing it out by hand as in the example below is not feasible.
x<-c(0.02,0.01,0.4,0.09,-0.3,0.1)
100*(1+x[1])*(1+x[2])*(1+x[3])*(1+x[4])*(1+x[5])*(1+x[6])#End Value
a1<-100*(1+x[1])#1st value
a2<-a1*(1+x[2])#2nd value
a3<-a2*(1+x[3])#3rd value
a4<-a3*(1+x[4])#4th value
a5<-a4*(1+x[5])#5th value
a6<-a5*(1+x[6])#6th value
s<-c(a1,a2,a3,a4,a5,a6) #vector of values
I believe rollapply could be used here; however, I cannot write a function that takes the prior value and the next one sequentially, and I am also unsure if and how to incorporate the initial value of 100, either in the function or by adding it at the beginning of x. Alternatively, maybe this can be done as a loop. (Below is the function in pseudo code.)
x<-c(0.02,0.01,0.4,0.09,-0.3,0.1)
require(zoo);
fn<- function(y) {(1+prior x)*(1+next x)}
rollapply(x, 1, fun= fn, fill=NA, align='right')
Any help is welcomed
x<-c(0.02,0.01,0.4,0.09,-0.3,0.1)
desired <- 100*(1+x[1])*(1+x[2])*(1+x[3])*(1+x[4])*(1+x[5])*(1+x[6])#End Value
desired
100*tail( cumprod (1+x) , 1)
Oh, dammit. I should have read the comments first. @G.Grothendieck has already been here. I suppose showing how to do it with Reduce could be useful:
> 100*Reduce("*", 1+x)
[1] 121.0506
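The snippets above return only the end value. Since the question asks for the value at each point, dropping the tail() call, or passing accumulate = TRUE to Reduce, should give the whole path. A minimal sketch, where s and s_reduce are names chosen for this example:

```r
x <- c(0.02, 0.01, 0.4, 0.09, -0.3, 0.1)

# Running product of the growth factors, scaled by the initial value of 100
s <- 100 * cumprod(1 + x)
s

# The same path with Reduce, keeping every intermediate product
s_reduce <- 100 * Reduce(`*`, 1 + x, accumulate = TRUE)

# Prepend the starting value if the initial 100 should be part of the series
c(100, s)
```

cumprod() is vectorised, so this stays a one-liner however long the growth series is.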
I have a dataset, which is broken into 20 groups. The matrices storing the data for each group (2 columns of data), are stored in a list, so that I can perform functions on each set within a loop. I would like to store the output of any function that I might run in another matrix.
For example, if I run a fitdistr() on all 20 groups, I would like the output of the function stored in a matrix, so that I can call distribution[1] to call the results from group 1. I have tried the following:
distribution<-ls()
for(i in (1:20))
{ distribution[[i]]<-fitdistr(as.numeric(data[[i]]$Column2,"normal") }
This successfully stores the outputs, and I can call:
distribution[1]
The issue is that fitdistr() returns two values - a mean and a standard deviation. I checked, and I cannot access the mean for a given group by name:
names(distribution)
"NULL"
So I obviously cannot get the means, say by:
distribution[1]$mean
I will be looking for trends in the means and standard deviations (and other parameters for other distributions), so I would like to have the results of fitdistr() stored in a matrix somehow if at all possible. Even if I could somehow extract only, say, the mean when running the function, I could just create an empty vector and populate it in a loop, then repeat for the standard deviation.
I have considered creating an empty matrix large enough to store the data (so it would be 20 rows, 1 for each group, and 2 columns, 1 for each calculated value). I'm still not sure how I would dictate that I want the calculated mean stored in column 1 and the calculated standard deviation stored in column 2. Again, it is an issue of asking the function for only one of its multiple outputs at a time.
I've also looked into one of the apply functions, but these do not seem to be appropriate for what I am doing.
ls() is a function that lists the objects in a given environment. It returns a character vector.
You (probably) mean to have list().
But then you would be growing your list within a loop. Which is the second circle of R hell.
Instead use lapply with the appropriate function (hard to tell where you want the as.numeric to go, but it is not correct in your example).
something like..
distribution <- lapply(data, function(x) fitdistr(as.numeric(x[['Column2']]),"normal"))
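To get from that list to the matrix asked for in the question, the fitted parameters can be pulled out of each element. The sketch below assumes fitdistr() comes from the MASS package and uses simulated data (data_demo is a made-up stand-in for the real list of 20 groups). Note that single brackets, as in distribution[1], return a one-element list, which is why $ found nothing; double brackets return the fit itself.

```r
library(MASS)

set.seed(1)
# Three simulated groups instead of 20, each with two columns
data_demo <- lapply(1:3, function(i) {
  data.frame(Column1 = seq_len(50), Column2 = rnorm(50, mean = i))
})

distribution <- lapply(data_demo, function(x) {
  fitdistr(as.numeric(x[["Column2"]]), "normal")
})

# One fit: use [[ ]] and the estimate component
distribution[[1]]$estimate["mean"]

# All fits: one row per group, columns "mean" and "sd"
estimates <- t(sapply(distribution, function(fit) fit$estimate))
estimates
estimates[, "mean"]
```

The standard errors of the estimates are stored in the sd component of each fit and can be collected the same way.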
I am trying to use the "to.minutes3" function in the xts package to segment my data.
This function does correctly put the time column into the desired intervals. But the data columns become "open", "close", "high" and "low". Is there a way to tell the function to average the data points that fall into the same interval?
Thanks,
Derek
You want period.apply. Assuming your data are in object x and are more frequent than 3-minutes, the code below will give you a mean for each distinct, non-overlapping, 3-minute interval.
> period.apply(x, endpoints(x,k=3,"minutes"), mean)
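A small sketch of this on made-up one-minute data, assuming the xts package is installed (x_demo and ep are names invented for the example). With more than one data column, mean() would average all the values in a block into a single number, so colMeans is used here to get one mean per column.

```r
library(xts)

# Nine one-minute observations in two columns (made-up data)
idx    <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 60 * (0:8)
x_demo <- xts(cbind(a = 1:9, b = 11:19), order.by = idx)

# Row positions where each 3-minute block ends
ep <- endpoints(x_demo, on = "minutes", k = 3)

# Column-wise mean within each block, stamped with the block's last time
res <- period.apply(x_demo, ep, colMeans)
res
```

Each row of the result carries the timestamp of the last observation in its interval.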
It looks to me like the answer is no, without completely changing that function, based on help("to.period"). to.minutes uses to.period, which says the following w.r.t. the OHLC parameter:
OHLC: should an OHLC object be returned? (only OHLC=TRUE currently supported)
So other return values aren't supported.