Using 'PerformanceAnalytics' package to calculate Performance Measures - r

I need to use the 'PerformanceAnalytics' package in R, and to use this package I understand that I need to convert my data, which is panel data, into an xts object. Following this forum's suggestion, I have done the following:
library(foreign)
library(xts)

RNOM <- read.dta("Return Panel without missing.dta")
RNOM_list <- split(RNOM, RNOM$gvkey)
xts_list <- lapply(RNOM_list, function(x) {
  xts(x[, -1], order.by = as.Date(x$datadate, format = "%d/%m/%Y"))
})
It gives me RNOM_list and xts_list.
After this, can someone please help me estimate the monthly returns using the function Return.calculate and lapply, and save the output as an additional variable in my original data set for regression analysis? Subsequently, I also need to estimate VaR, ES, and semi-deviation.
The data can be downloaded here. Note, prccm is the monthly closing price in the data and gvkey is the firm ID.

An efficient way to achieve this goal is to convert the panel data (long format) into wide format using the 'reshape2' package, perform the estimations, and then convert the result back to long (panel) format. Here is an example:
library(foreign)
library(reshape2)
require(PerformanceAnalytics)

# DDA.dta is Stata data; keep only date, id, and the variable of interest
# (i.e. three columns in total)
dd <- read.dta("DDA.dta")
wdd <- dcast(dd, datadate ~ gvkey)  # gvkey is the id
wddxts <- xts(wdd[, -1], order.by = as.Date(wdd$datadate, format = "%Y-%m-%d"))

# Example of a rolling-window calculation: 60-period semi-deviation
ssd60A <- rollapply(wddxts, width = 60, SemiDeviation, by.column = TRUE, fill = NA)

ssd60A.df <- as.data.frame(ssd60A)         # convert the xts back to a data frame
ssd60A.df$datadate <- rownames(ssd60A.df)  # insert the time index as a column
lssd60A.df <- melt(ssd60A.df, id.vars = c("datadate"), variable.name = "gvkey")  # convert back to panel format
write.dta(lssd60A.df, "ssd60A.dta", convert.factors = "string")  # export as a Stata file
Then simply merge it with the master database to perform some regression.
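For the monthly returns, VaR, and ES that the question asks about, the same wide xts object can be reused. A minimal sketch (the 60-month window, 95% level, and historical method are my assumptions, not part of the original answer):

# Monthly simple returns from the wide matrix of closing prices (prccm)
rets <- Return.calculate(wddxts, method = "discrete")

# 60-month rolling historical VaR and ES, one firm (column) at a time
var60 <- rollapply(rets, width = 60,
                   FUN = function(x) VaR(x, p = 0.95, method = "historical"),
                   by.column = TRUE, fill = NA)
es60 <- rollapply(rets, width = 60,
                  FUN = function(x) ES(x, p = 0.95, method = "historical"),
                  by.column = TRUE, fill = NA)

Each of these can then be melted back to panel format and exported exactly as shown for ssd60A above.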

Related

R native time series: date data

There are R native datasets, such as the Nile dataset, that are time series. However, if I actually look at the data set, whether raw, after as_tibble(), or after as.data.frame(), there is only one column: x (which, in this specific case, is the measurement of the annual flow of the river). Yet if I plot() the data in any of the three formats (raw, tibble, or data.frame), I get plots with the dates:
(Technically, the x axis label changes, but that's not the point).
Where are these dates stored? How can I access them (to use them with ggplot(), for example), or even just see them?
If you use str(Nile) or print(Nile), you'll see that the Nile data set is stored in a Time-Series object. You can use the start(), end(), and frequency() functions to extract those attributes and then create a new column to store that information.
data(Nile)
new_df <- data.frame(Nile)
# the step between observations is 1/frequency (1 for this annual series)
new_df$Time <- seq(from = start(Nile)[[1]], to = end(Nile)[[1]], by = 1 / frequency(Nile))
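If you want the full index rather than just its endpoints, time() extracts it directly, which is handy for ggplot(). A small sketch (the data frame and column names here are my own):

library(ggplot2)

# time() recovers the time attribute stored inside the ts object
nile_df <- data.frame(Time = as.numeric(time(Nile)),
                      Flow = as.numeric(Nile))

ggplot(nile_df, aes(x = Time, y = Flow)) +
  geom_line()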

Download forecast values from tsibble (Fable)

I'm sure this is a simple question, but I'm relatively new here. I'm trying to extract the forecast values into a CSV/table I can use outside of R. I followed along with the multiple series example from here: https://www.mitchelloharawild.com/blog/fable/ . I'm trying to extract the 2 years of forecast data that are produced in this step:
fit %>%
  forecast(h = "2 years") %>%
  autoplot(tourism_state, level = NULL)
I can see the 3 models in the autoplot, but can't figure out how to get the forecasted values from the fit tsibble. Any help is appreciated. It looks like there's quite a bit of information that can be generated (forecast intervals, etc.), so if there's somewhere I can reference on how to parse through what can be downloaded and how, please let me know. Thanks!
The forecasted values of a fable can be saved to a csv using readr::write_csv().
When used with columns that are not in a flat format (such as forecast distributions or intervals), the values will be stored as character strings and information will be lost. Before writing to a file, you should flatten these structures by extracting their components into separate columns.
You can use unpack_hilo() to extract the lower, upper, and level values within a <hilo> to create a flat data structure. Alternatively you can access the components of a <hilo> with $, for example: my_interval$lower.
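Putting that together, a sketch of a full pipeline (the 95% level, the output file name, and the use of hilo() before unpacking are my assumptions; exact column names can vary across fabletools versions):

library(fable)
library(dplyr)
library(readr)

fc <- fit %>% forecast(h = "2 years")  # `fit` is the mable from the question

fc %>%
  hilo(level = 95) %>%      # add a 95% interval as a <hilo> column
  unpack_hilo("95%") %>%    # flatten it into 95%_lower and 95%_upper
  as_tibble() %>%
  write_csv("tourism_forecasts.csv")

Any column that is still not flat (such as the forecast distribution) will be coerced to character strings on write, as noted above.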

SuperLearner for survival outcome in R

I recently started reading about the SuperLearner and I am trying to run SuperLearner for a survival outcome in R. I found example code in the Targeted Learning book by Mark J. van der Laan and Sherri Rose, which requires the data to be converted to long format to run.
The function that converts the data to the long format is no longer available. Here is the code:
library(survival)
data(lung)
subLung <- subset(lung, select = c(time, status, age, ph.ecog, ph.karno, pat.karno))
subLung$female <- (lung$sex - 1)
subLung <- subLung[complete.cases(subLung), ]

## Expand subLung to long format
longData <- SuperLearner:::createDiscrete(time = subLung$time,
                                          event = (subLung$status == 2),
                                          dataX = subset(subLung, select = -c(time, status)),
                                          n.delta = 30)
The createDiscrete function is no longer available in the SuperLearner package. Is there any other function that will convert the data to long format? If not, then a toy example of how to convert the data into appropriate long format would be very helpful. Or a sample R code to run SuperLearner for survival outcome would be also helpful.
I found the answer. To run SuperLearner for a survival outcome, the data structure has to be converted to counting-process format, meaning that the time variable should be split in such a way that at most one event can happen in a given time interval. The survSplit() function in the survival package does exactly that! Thanks to Dr. Eric C. Polley.
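For completeness, here is a sketch of that conversion applied to the lung example above; the 30-day interval width mirrors n.delta = 30 from the old createDiscrete call, and the details are my assumptions:

library(survival)

# Reuse subLung from the question; survSplit's formula interface wants a 0/1 event
subLung$event <- as.integer(subLung$status == 2)
subLung$status <- NULL

# Split each subject's follow-up at 30-day cut points (counting-process format)
longData <- survSplit(Surv(time, event) ~ ., data = subLung,
                      cut = seq(30, max(subLung$time), by = 30),
                      episode = "interval")
head(longData[, c("tstart", "time", "event", "interval")])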

Moving from zoo to xts object

I have various financial data that I am trying to merge into an xts object so I can perform multiple statistical analyses. I am having difficulty, however, with dates when moving from the original data to a zoo object to an xts object.
For instance, I read in some hedge fund return data, change the report date variable using the ymd function from the lubridate package, create a zoo object, then just as a check create a timeSeries object. All seems to be OK, but I continue to get an error when I attempt to create the xts object, as shown below:
hfIndexes$ReportDt <- ymd(hfIndexes$ReportDt)
hfIndexesZoo <- zoo(hfIndexes,order.by="ReportDt")
hfIndexesTimeSeries <- as.timeSeries(hfIndexesZoo)
hfIndexesXTS <- as.xts(hfIndexesZoo)
Error in xts(coredata(x), order.by = order.by, frequency = frequency, :
order.by requires an appropriate time-based object
What do I need to do to ensure that I have the correct time-based object to create the desired xts object?
Consider this answer: https://stackoverflow.com/a/4297342/3253015
order.by is a required argument when building xts objects. Since we are dealing with time series, think of order.by as supplying the time index: a frame of sorts into which the data are put. You are telling as.xts that the data you want inside is ordered by the time-based object given in order.by, so order.by must actually be such an object (for example a Date vector), not a column name given as a character string.
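Concretely, the error most likely comes from passing the column name "ReportDt" as a character string. A sketch of the fix (assuming ReportDt holds the parsed dates):

library(zoo)
library(xts)
library(lubridate)

hfIndexes$ReportDt <- ymd(hfIndexes$ReportDt)

# order.by must be the Date vector itself, not the column's name, and the
# date column should be dropped from the zoo's core data
hfIndexesZoo <- zoo(hfIndexes[, names(hfIndexes) != "ReportDt"],
                    order.by = hfIndexes$ReportDt)
hfIndexesXTS <- as.xts(hfIndexesZoo)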

Using R to create and merge zoo object time series from csv files

I have a large set of csv files in a single directory. These files contain two columns, Date and Price. The name of each <filename>.csv file contains the unique identifier of the data series. I understand that missing values in merged data series can be handled when these time series are zoo objects. I also understand that, using the na.locf(merge(...)) idiom, I can fill in the missing values with the most recent observations.
I want to automate the process of:
loading the *.csv files' columnar Date and Price data into R data frames,
establishing each distinct time series within the merged zoo "portfolio of time series" object with an identity equal to its <filename>, and
merging these zoo time series using MergedData <- na.locf(merge( )).
The ultimate goal, of course, is to use the fPortfolio package.
I've used the following statement to create a data frame of Date,Price pairs. The problem with this approach is that I lose the <filename> identifier of the time series data from the files.
result <- lapply(files, read.csv)
I understand that I can write code to generate the R statements required to do all these steps instance by instance. I'm wondering if there is some approach that wouldn't require me to do that. It's hard for me to believe that others haven't wanted to perform this same task.
Try this:
z <- read.zoo(files, header = TRUE, sep = ",")
z <- na.locf(z)
I have assumed a header line and data lines like 2000-01-31,23.40. Use whatever read.zoo arguments are necessary to accommodate whatever format you actually have.
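For instance, if the dates were instead written like 31/01/2000, a format argument would handle it (the format string here is hypothetical):

z <- read.zoo(files, header = TRUE, sep = ",", format = "%d/%m/%Y")
z <- na.locf(z)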
You can get better naming by using sapply (it keeps the file names). Here I will keep lapply.
Assuming that all your files are in the same directory, you can use list.files; it is very handy for such a workflow. I would use read.zoo to get zoo objects directly and avoid coercing them later.
For example:
zoo.objs <- lapply(list.files(path = MY_FILES_DIRECTORY,
                              pattern = "^zoo_.*\\.csv$",  # csv files whose names start with zoo_
                              full.names = TRUE),          # full names: path + filename
                   read.zoo)
Now I use list.files again to name my results:
names(zoo.objs) <- list.files(path = MY_FILES_DIRECTORY,
                              pattern = "^zoo_.*\\.csv$")
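To keep the <filename> identifiers clean and finish with the merge described in the question, one possible sketch (the extension stripping and do.call(merge, ...) are my own suggestions):

# Strip the .csv extension so each series keeps its bare <filename> id
names(zoo.objs) <- sub("\\.csv$", "",
                       list.files(path = MY_FILES_DIRECTORY,
                                  pattern = "^zoo_.*\\.csv$"))

# Merge all series into one multivariate zoo object and carry the
# last observation forward, as in the question
MergedData <- na.locf(do.call(merge, zoo.objs))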
