Calculate Value at Risk in a data frame - R

My data set has returns for thousands of hedge funds over 140 months, and I was trying to calculate Value at Risk (VaR) using the VaR function in the PerformanceAnalytics package. However, I ran into several questions when using this function. I have created a sample data frame to show my problem.
df <- data.frame(matrix(rnorm(24), nrow = 8))
df$X1 <- c('2007-01','2007-02','2007-03','2007-04','2007-05','2007-06','2007-07','2007-08')
df[2, 2] <- NA
df[2, 3] <- NA
df[1, 3] <- NA
df
I got a data frame:
X1 X2 X3
1 2007-01 -1.4420195 NA
2 2007-02 NA NA
3 2007-03 -0.4503824 -0.78506597
4 2007-04 1.4083746 0.02095307
5 2007-05 0.9636549 0.19584430
6 2007-06 1.1935281 -0.14175623
7 2007-07 -0.3986336 1.58128683
8 2007-08 0.8211377 -1.13347168
I then run
apply(df,2,FUN=VaR, na.rm=TRUE)
and received a warning message:
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
I have tried converting my data frame into a time series using zoo(), but it didn't help. Can someone help me figure out what I should do now?

@user2893255, you should convert your data frame into an xts object before using the apply function:
df.xts <- as.xts(df[, 2:3], order.by = as.Date(paste0(df$X1, "-01")))  # as.Date needs a day of month, so append "-01"
and then
apply(df.xts, 2, FUN = VaR, na.rm = TRUE)
gives you the result without warnings or error messages.
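If you would rather keep genuine monthly stamps than coerce to the first of the month, here is a sketch using zoo's yearmon class as the index (assumes the zoo, xts, and PerformanceAnalytics packages are installed):
library(zoo)
library(xts)
library(PerformanceAnalytics)

# as.yearmon parses "2007-01" directly, so no day of month is needed
df.xts <- xts(df[, 2:3], order.by = as.yearmon(df$X1, "%Y-%m"))
apply(df.xts, 2, FUN = VaR, na.rm = TRUE)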

Try dropping the date column, so that apply passes VaR plain numeric vectors instead of coercing the whole data frame (dates included) to a character matrix:
apply(df[,-1L], 2, FUN=VaR, na.rm=TRUE)

Related

Aggregating data on a monthly basis

I have the following data in R:
date category1 category2 category3 category4
1 2012-04-01 7496.00 77288.37 224099.15 700050.04
2 2012-04-02 24541.00 59103.94 138408.65 625006.84
3 2012-04-03 1249.00 15951.50 574170.30 249390.53
4 2012-04-04 5205.00 10866.00 0.00 358703.88
5 2012-04-05 10398.00 0.00 119745.17 270585.46
And use following script to aggregate data on monthly basis:
data <- as.xts(data$category1,order.by=as.Date(data$date))
monthly <- apply.monthly(data,sum)
monthly
Question: Instead of repeating this step for every category and then joining the monthly data frames, how can I apply as.xts(...) to all columns? I tried
as.xts(c("data$category1","data$category1"),order.by=as.Date(data$date))
which did not work.
Also: Is there a better way to aggregate on a monthly basis?
Use xts instead of as.xts:
apply.monthly(xts(data[-1], order.by = as.Date(data$date)), mean)
However, this seems to work only for mean, not for sum. You can always use sapply to iterate over the columns:
sapply(colnames(data[, -1]), function(x)
  apply.monthly(as.xts(data[, x], order.by = as.Date(data$date)), sum))
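Alternatively, apply.monthly hands each month's sub-series to FUN whole, so a column-wise function avoids the sapply loop altogether; a sketch under that assumption:
library(xts)
x <- xts(data[-1], order.by = as.Date(data$date))
# colSums collapses each month's block to one row per month,
# one column per category; use colMeans for monthly averages
apply.monthly(x, colSums, na.rm = TRUE)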
You can use the daily2monthly function from the hydroTSM package. It can handle more than just xts objects, including multiple columns. FUN can be sum or mean.
monthly <- daily2monthly(data, FUN=sum, na.rm=TRUE)

Time series analysis applicability?

I have a sample data frame like this (date column format is mm-dd-YYYY):
date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6
I want to convert this data frame into a time series using ts(), but the problem is that the current data frame has multiple values for the same date. Can we apply time series methods in this case?
Can I convert data frame into time series, and build a model (ARIMA) which can forecast count value on a daily basis?
OR should I forecast the count value based on grp? In that case I have to select only the grp and count columns of the data frame, which means skipping the date column, so a daily forecast of the count value is not possible?
Suppose I want to aggregate the count value on a per-day basis. I tried the aggregate function, but it requires specifying the date values, and I have a very large data set. Is there another option available in R?
Can somebody please suggest a better approach to follow? My assumption is that time series forecasting works only for bivariate data. Is this assumption right?
It seems like there are two aspects of your problem:
I want to convert this data frame into a time series using ts(), but the problem is that the current data frame has multiple values for the same date. Can we apply time series methods in this case?
If you are happy making use of the xts package, you could attempt:
dta2$date <- as.Date(dta2$date, "%d-%m-%Y")
dtaXTS <- xts::as.xts(dta2[,2:3], dta2$date)
which would result in:
> head(dtaXTS)
count grp
2009-09-01 54 1
2009-09-01 100 2
2009-09-01 546 3
2009-10-01 67 4
2009-11-01 80 5
2009-11-01 45 6
of the following classes:
> class(dtaXTS)
[1] "xts" "zoo"
You could then use your time series object as a univariate time series, referring to the selected variable, or as a multivariate time series. For example, using the PerformanceAnalytics package:
PerformanceAnalytics::chart.TimeSeries(dtaXTS)
Side points
Concerning your second question:
Can somebody please suggest a better approach to follow? My assumption is that time series forecasting works only for bivariate data. Is this assumption right?
IMHO, this is rather broad. I would suggest that you use the created xts object and elaborate on the model you want to utilise and why; if it is a conceptual question about the nature of time series analysis, you may prefer to post your follow-up question on Cross Validated.
Data sourced via: dta2 <- read.delim(pipe("pbpaste"), sep = "") using the provided example.
Since daily forecasts are wanted, we need to aggregate to daily data. Using DF from the Note at the end, read the first two columns of data into a zoo series z using read.zoo with the argument aggregate = sum. We could optionally convert that to a "ts" series (tser <- as.ts(z)), although this is unnecessary for many forecasting functions; in particular, the source code of auto.arima shows that it runs x <- as.ts(x) on its input before further processing. Finally, run auto.arima, forecast, or another forecasting function.
library(forecast)
library(zoo)
z <- read.zoo(DF[1:2], format = "%m-%d-%Y", aggregate = sum)
auto.arima(z)
forecast(z)
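To control the forecast horizon explicitly and plot the result, a sketch (the 7-day horizon is an arbitrary illustration, not from the question):
fit <- auto.arima(z)
fc <- forecast(fit, h = 7)  # predict 7 days ahead; horizon chosen for illustration
plot(fc)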
Note: DF is given reproducibly here:
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)
Updated: Revised after re-reading question.

Creating a time series from a dataset including missing values

I need to create a time series from a data frame. The problem is that the observations are not well-ordered. The data frame looks like this:
Cases Date
15 1/2009
30 3/2010
45 12/2013
I have 60 observations like that. As you can see, the data was collected irregularly, starting from 1/2008 and ending 12/2013 (cases are missing for the bulk of the months between these years). My assumption will be that there are no cases in those months. So, how can I convert this dataset into a time series? Then, I will try to make some predictions for the possible number of cases in the future.
Try installing the plyr package,
install.packages("plyr")
and then sum the duplicated date rows (Date2 here stands for the question's Date column converted to Date class):
library(plyr)
mergedData <- ddply(dat, .(Date2), .fun = function(x) {
  data.frame(Cases = sum(x$Cases))
})
> head(mergedData)
Date2 Cases
1 2008-01-01 16352
2 2008-11-01 10
3 2009-01-01 23
4 2009-02-01 138
5 2009-04-01 18
6 2009-06-01 3534
You can create a separate complete sequence of dates and merge it with your data series. This will create a complete time series with the missing values as NA.
If df is your data frame with Date as a date column, then create a new date sequence and merge as below:
date_seq <- data.frame(Date = seq(as.Date("2008-01-01"), as.Date("2013-12-31"), by = "1 month"))
dfwithmissing <- merge(date_seq, df, by = "Date", all = TRUE)
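Given the asker's stated assumption that a missing month means zero cases, a sketch that finishes the job by zero-filling the merged frame and converting it to a monthly ts object (start and frequency follow the 1/2008 to 12/2013 range given in the question):
dfwithmissing$Cases[is.na(dfwithmissing$Cases)] <- 0  # missing month = no cases, per the assumption
cases_ts <- ts(dfwithmissing$Cases, start = c(2008, 1), frequency = 12)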

How to align dates for merging two xts files?

I'm trying to analyze one-year percent-change data in R for two series by merging them into one file. One series is weekly and the other is monthly, and converting the weekly series to monthly is the problem. Using apply.monthly() on the weekly data creates a monthly file, but with intra-month dates that don't match the first-day-of-month format of the monthly series, so after combining the two files via merge.xts() the rows don't line up. Question: how do I change the resulting merged file (sample below) to one monthly entry for both series?
2012-11-01 0.02079801 NA
2012-11-24 NA -0.03375796
2012-12-01 0.02052502 NA
2012-12-29 NA 0.04442094
2013-01-01 0.01881466 NA
2013-01-26 NA 0.06370272
2013-02-01 0.01859883 NA
2013-02-23 NA 0.02999318
You can pass indexAt="firstof" in a call to to.monthly to get monthly data using the first of the month for the index.
library(quantmod)
getSymbols(c("USPRIV", "ICSA"), src="FRED")
merge(USPRIV, to.monthly(ICSA, indexAt="firstof", OHLC=FALSE))
Something like this:
do.call(rbind, by(d[-1], d[[1]] - as.POSIXlt(d[[1]])$mday, FUN=apply, 2, sum, na.rm=TRUE))
## V2 V3
## 2012-10-31 0.02079801 -0.03375796
## 2012-11-30 0.02052502 0.04442094
## 2012-12-31 0.01881466 0.06370272
## 2013-01-31 0.01859883 0.02999318
Note that the dates are encoded as row names, not as a column in the result.
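If you need the dates as a proper column rather than row names, a sketch (assuming d is the merged data read in as above, and res holds the result of the by() call):
res <- do.call(rbind, by(d[-1], d[[1]] - as.POSIXlt(d[[1]])$mday,
                         FUN = apply, 2, sum, na.rm = TRUE))
out <- data.frame(date = as.Date(rownames(res)), res, row.names = NULL)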
This is a frequently occurring issue, and sometimes I forget my own solution for it and Google does not easily lead to one, so I am posting my solution here.
Basically, just convert the index of each monthly aggregated series to yearmon. You can also optionally convert it back to yyyy-mm-dd format (the 1st of each month) with as.Date. After the exact dates are stripped and the indices are homogenised, all the columns align perfectly.
# With the magrittr pipe (re-exported by dplyr)
time(myxts) <- time(myxts) %>% as.yearmon() %>% as.Date()
# or without the pipe
time(myxts) <- as.Date(as.yearmon(time(myxts)))
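Once both monthly series carry the same first-of-month index, merging them collapses to one row per month. A sketch, where monthly_a and monthly_b are hypothetical stand-ins for your two aggregated xts objects (assumes zoo is loaded for as.yearmon):
time(monthly_a) <- as.Date(as.yearmon(time(monthly_a)))  # homogenise both indices
time(monthly_b) <- as.Date(as.yearmon(time(monthly_b)))
merged <- merge(monthly_a, monthly_b)                    # no more staggered NA rows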

Data aggregation loop in R

I am facing a problem with aggregating my data to daily values.
I have a data frame from which NAs have been removed (a link to a picture of the data is given below). Data was collected 3 times a day, but sometimes, due to NAs, there are just 1 or 2 entries per day; on some days the data is missing completely.
I am now interested in calculating the daily mean of "dist": this means summing up the "dist" values of one day and dividing by the number of entries for that day (3 if no data is missing that day). I would like to do this via a loop.
How can I do this with a loop? The problem is that sometimes I have 3 entries per day and sometimes just 2 or even 1. I would like to tell R that, for every day, it should sum up "dist" and divide it by the number of entries available for that day.
I just have no idea how to formulate a for loop for this purpose. I would really appreciate any advice on this problem. Thanks for your efforts and kind regards,
Jan
Data frame: http://www.pic-upload.de/view-11435581/Data_loop.jpg.html
Edit: I used aggregate and tapply as suggested; however, the mean value of the data was not really calculated:
Group.1 x
1 2006-10-06 12:00:00 636.5395
2 2006-10-06 20:00:00 859.0109
3 2006-10-07 04:00:00 301.8548
4 2006-10-07 12:00:00 649.3357
5 2006-10-07 20:00:00 944.8272
6 2006-10-08 04:00:00 136.7393
7 2006-10-08 12:00:00 360.9560
8 2006-10-08 20:00:00 NaN
The code used was:
dates <- Dis_sub$date
distance <- Dis_sub$dist
aggregate(distance, list(dates), mean, na.rm = TRUE)
tapply(distance, dates, mean, na.rm = TRUE)
Don't use a loop. Use R. Some example data:
dates <- rep(seq(as.Date("2001-01-05"), as.Date("2001-01-20"), by = "day"), each = 3)
values <- rep(1:16, each = 3)
values[c(4, 5, 6, 10, 14, 15, 30)] <- NA
and any of:
aggregate(values, list(dates), mean, na.rm = TRUE)
tapply(values, dates, mean, na.rm = TRUE)
gives you what you want. See also ?aggregate and ?tapply.
If you want a data frame back, you can look at the plyr package:
Data <- data.frame(dates, values)
require(plyr)
ddply(Data, "dates", summarise, mean = mean(values, na.rm = TRUE))
Keep in mind that ddply does not fully support the date format (yet).
Look at the data.table package, especially if your data is huge. Here is some code that calculates the mean of dist by day:
library(data.table)
dt <- data.table(Data)
dt[, list(avg_dist = mean(dist, na.rm = TRUE)), by = "date"]
It looks like your main problem is that your date field has times attached. The first thing you need to do is create a column that has just the date, using something like:
Dis_sub$date_only <- as.Date(Dis_sub$date)
Then using Joris Meys' solution (which is the right way to do it) should work.
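Putting the two steps together, a sketch (column names follow the question's Dis_sub data frame):
Dis_sub$date_only <- as.Date(Dis_sub$date)                   # strip the time component
aggregate(Dis_sub$dist, list(date = Dis_sub$date_only), mean, na.rm = TRUE)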
However, if for some reason you really want to use a loop, you could try something like:
newFrame <- data.frame()
for (d in unique(Dis_sub$date)) {
  meanDist <- mean(Dis_sub$dist[Dis_sub$date == d], na.rm = TRUE)
  newFrame <- rbind(newFrame, data.frame(date = d, meanDist = meanDist))
}
But keep in mind that this will be slow and memory-inefficient.
