Price index modelling with a loop in R - r

I am trying to reproduce an equally weighted price index. I have the prices and a matrix which tells when a stock is in and out of the index. Here is a small part of the data.
table="date, A, B,C,D,E,F
1,31/01/1998,1,1,1,1,1,0
2,28/02/1998,1,1,1,1,1,0
3,31/03/1998,1,1,1,1,1,0
4,30/04/1998,1,1,1,1,1,0
5,31/05/1998,1,1,1,1,1,0"
matrix=read.csv(text=table)
table2="date,A,B,C,D,E,F
1,05/01/98,20.56,97.40,279.70,72.85,20.33,298.00
2,06/01/98,20.56,96.50,276.20,72.90,20.22,299.90
3,14/02/98,20.84,98.45,282.50,73.75,20.70,302.80
4,15/02/98,20.90,98.50,280.70,73.65,20.71,306.50
5,09/03/98,20.58,97.00,276.20,72.95,20.25,304.00"
price=read.csv(text=table2)
The order of the stocks is the same in the price and matrix data. Since I would like to multiply the matrix with the prices, I converted both to matrices.
as.matrix(price)
as.matrix(matrix)
as.Date[price[,1], format="%d/%m/%y"] #Error: object of type 'closure' is not subsettable
as.Date[matrix[,1], format="%d/%m/%Y"]
(1) However, this is where I hit my first problem. The dates are not recognized, neither in the matrix nor when I convert them before calling as.matrix(). I also tried the methods proposed here (Extract month and year from a zoo::yearmon object). I need the dates because I would like to write a loop which 1. takes the month and year from the matrix and searches for the same month and year in the price data, and 2. if a matching month and year is found, multiplies the row from the matrix with the rows from the prices. This is necessary because the matrix is monthly and the prices are daily. This is my idea of the loop:
for (matrix(%m/%Y) in price$date){
if (matrix(%m/%Y)== price(%m/%y)
c<- matrix[position of matrix(%m/%Y),] %*% price[position of price(%m/%y),]
}
(2) However, I have never worked with loops before, so my second question is whether a for loop is suitable for my problem. The desired output of the loop would be the following:
table3="date,A,B,C,D,E,F
1,05/01/98,20.56,97.40,279.70,72.85,20.33,0
2,06/01/98,20.56,96.50,276.20,72.90,20.22,0
3,14/02/98,20.84,98.45,282.50,73.75,20.70,0
4,15/02/98,20.90,98.50,280.70,73.65,20.71,0
5,09/03/98,20.58,97.00,276.20,72.95,20.25,0"
desired_c=read.csv(text=table3)
At the end however, I would like to have an equally weighted price index like this:
table4="date, price
1,05/01/98,98.168
2,06/01/98,97.276
3,14/02/98,99.248
4,15/02/98,98.892
5,09/03/98,97.396"
desired_index=read.csv(text=table4)
If I could put that in my loop, that would be great. Please note that the matrix and the prices consist of many observations, so simply deleting the last column is not an option.
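One possible way to get there without an explicit loop is sketched below. It assumes the membership rows are month-end dates in %d/%m/%Y format, the prices use %d/%m/%y, and both tables list the stocks in the same columns. The names membership, idx, masked and index are chosen here for illustration, and the row-number column from the sample data is dropped for brevity.

```r
# Membership (1 = in the index) is monthly; prices are daily.
membership <- read.csv(text = "date,A,B,C,D,E,F
31/01/1998,1,1,1,1,1,0
28/02/1998,1,1,1,1,1,0
31/03/1998,1,1,1,1,1,0")
price <- read.csv(text = "date,A,B,C,D,E,F
05/01/98,20.56,97.40,279.70,72.85,20.33,298.00
06/01/98,20.56,96.50,276.20,72.90,20.22,299.90
14/02/98,20.84,98.45,282.50,73.75,20.70,302.80
15/02/98,20.90,98.50,280.70,73.65,20.71,306.50
09/03/98,20.58,97.00,276.20,72.95,20.25,304.00")

# as.Date() is a function, so it takes parentheses, not square brackets
membership$date <- as.Date(membership$date, format = "%d/%m/%Y")
price$date <- as.Date(price$date, format = "%d/%m/%y")

# For every daily row, find the membership row of the same year and month
idx <- match(format(price$date, "%Y-%m"), format(membership$date, "%Y-%m"))

stocks <- setdiff(names(price), "date")
m <- as.matrix(membership[idx, stocks])
p <- as.matrix(price[, stocks])

# Element-wise product (*, not %*%) zeroes out non-members
masked <- p * m

# Equally weighted index: average price over the current members
index <- data.frame(date = price$date, price = rowSums(masked) / rowSums(m))
```

Here masked corresponds to the desired_c table and index to desired_index. Because match() does the month lookup in one step, the same code should scale to many rows and to stocks that enter or leave the index in different months.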

Related

R: Annualize rolling quarterly returns

I am new to R and have a quite basic question I guess. I couldn't find help for my specific issue.
I have a data frame consisting of two columns. The first indicating the year and quarter (e.g. 19901, 19902, 19903 etc.). The second shows the corresponding quarterly return.
Now, I want to annualize the returns. As a result I want to have a data.frame with only a year column and the corresponding annualized return.
I know there is the function Return.annualized from the ‘PerformanceAnalytics’ package. However, this function does not calculate rolling annualized returns.
Is there a nice package or function that could solve my problem?
Any help is really appreciated. Thank you!
If you have log returns, they exhibit the nice advantage of being summable over time.
If that's the case you can simply apply a rolling window that sums the four previous quarters.
Let's assume you have a vector or list with quarterly log returns called q_ret
library('zoo')
an_ret <- rollapply(q_ret, 4, sum)
Note that, from a finance perspective, this does not hold for simple returns, which compound multiplicatively rather than add up.
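For simple returns, a rolling window can compound the four quarters instead of summing them. The following is a minimal base R sketch with made-up numbers; q_simple, an_simple and an_from_log are illustrative names, not part of the original question.

```r
# Hypothetical quarterly simple returns (illustrative numbers only)
q_simple <- c(0.02, -0.01, 0.03, 0.01, 0.04, -0.02)

# Rolling 4-quarter compounded return: prod(1 + r) - 1 over each window
window <- 4
n <- length(q_simple)
an_simple <- sapply(window:n, function(i) {
  prod(1 + q_simple[(i - window + 1):i]) - 1
})

# Equivalent route via log returns: sum the logs, then convert back
q_log <- log1p(q_simple)
an_from_log <- sapply(window:n, function(i) {
  expm1(sum(q_log[(i - window + 1):i]))
})
```

Both vectors hold one value per complete four-quarter window and should agree up to floating-point error.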

How to calculate the average of different groups in a dataset using R

I have a dataset in R that I would like to find the average of a given variable for each year in the dataset (here, from 1871-2019). Not every year has the same number of entries, and so I have encountered two problems: first, how to find the average of the variable for each year, and second, how to add the column of averages to the dataset. I am unsure how to approach the first problem, but I attempted a version of the second problem by simply finding the sum of each group and then trying to add those values to the dataset for each entry of a given year with the code teams$SBtotal <- tapply(teams$SB, teams$yearID, FUN=sum). That code resulted in an error that notes replacement has 149 rows, data has 2925. I know that this can be done less quickly in Excel, but I'm hoping to be able to use R to solve this problem.
The tapply should work
data(iris)
tapply(iris$Sepal.Length, iris$Species, FUN = sum)
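The error in the question arises because tapply returns one value per group (149 years), not one per row (2925 entries). A small sketch of both steps follows, using a made-up stand-in for teams; the column name SBmean is an assumption chosen for illustration.

```r
# Hypothetical stand-in for the `teams` data in the question
teams <- data.frame(
  yearID = c(1871, 1871, 1872, 1872, 1872),
  SB     = c(10, 20, 5, 7, 9)
)

# One value per year (a named vector with one entry per distinct year)
year_means <- tapply(teams$SB, teams$yearID, FUN = mean)

# One value per row: ave() repeats each group's mean across that group's rows
teams$SBmean <- ave(teams$SB, teams$yearID, FUN = mean)
```

ave() returns a vector as long as its input, so it can be assigned directly as a new column.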

How to do two sorting in r when order matters

I have a data frame consisting of three variables: momentum returns (numeric), volatility (factor) and market state (factor). Volatility and market state each have two levels: volatility has the levels high and low, and market state has the levels positive and negative. I want to make a two-way sorted table that shows the mean of the momentum returns in every case.
library(wakefield)
mom<-rnorm(30)
vol<-r_sample_factor(30,x=c("high","low"))
mar<-r_sample_factor(30,x=c("positive","negative"))
df<-data.frame(mom,vol,mar)
Based on the suggestion given by @r2evans, if you want the mean for every sorted case you can apply the following code.
xtabs(mom~vol+mar,aggregate(mom~vol+mar,data=df,mean))
## If you want simple sum in every case
xtabs(mom~vol+mar,data=df)
You can also do this with the data.table package, which will usually do the same task in less time on large data.
library(data.table)
df<-as.data.table(df)
## if you want results in data frame format
df[,.(mean(mom)),by=.(vol,mar)]
## if you want in simple vector form
df[,mean(mom),by=.(vol,mar)]$V1

How do I calculate overlapping three-day log returns in the same dataframe in R?

I've just started learning R. As for now, I have prices PRC in a dataframe test together with the date and several other variables.
My goal is to calculate the following within the same dataframe so I can maintain the connection to the date.
1. Overlapping three-day log returns
2. One-day log returns
Through other posts I came up with the following code for the three day lag returns and the one-day lag returns respectively, but I am still unsure on how to incorporate it into my dataframe:
test$logR3 <- diff(log(test$PRC), lag=3)
This code currently doesn't work due to the difference in number of rows. How do I take this into account? Can I somehow put zeros or NAs in order to fill the missing rows?
Thank you in advance.
maybe something like:
days=c()
for(i in seq(3,nrow(test),3)){ #loop through it in steps of 3
one_day_ago_diff=log(test$PRC[i])-log(test$PRC[i-1]) #difference between today and yesterday
three_days_ago_diff=log(test$PRC[i])-log(test$PRC[i-3]) #difference between today and three days ago
days=c(days,c(three_days_ago_diff,NA,one_day_ago_diff)) # fills empty vector with diff from 3 days ago- followed by NA to skip 2 days ago and then one day ago
}
if(length(days)<nrow(test)){days=c(days, rep(NA,nrow(test)-length(days)))} #check they're the same length
test$lags=days #add column to test
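A vectorized alternative is to pad the output of diff() with leading NAs so that it matches the number of rows. The sketch below uses made-up prices in place of test$PRC; the column name logR1 is an assumption that mirrors logR3 from the question.

```r
# Hypothetical prices standing in for test$PRC
test <- data.frame(PRC = c(100, 101, 103, 102, 105, 107))

log_prc <- log(test$PRC)

# diff() returns n - lag values, so pad the front with NA to keep n rows
test$logR1 <- c(NA, diff(log_prc, lag = 1))
test$logR3 <- c(rep(NA, 3), diff(log_prc, lag = 3))
```

Each row then holds the return ending on that row's date, and the first rows, which have no earlier price to compare against, stay NA.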

Time Dummy Variables and Regressing columns of a dataframe as dependent variables

I have tried to search for this question on here but couldn't find anything, so sorry if it has already been answered. My dataset consists of daily information for a large number of stocks (1000+) over a 10-year period. I have read the dataset as a data frame time series where each column is a separate stock. I would like to regress each stock against month dummy variables to capture the seasonal variation and obtain the residuals. What I have done is the following:
for (i in 1:1000){
month.f<-factor(months(time(stockinfo[,i])))
dummy<-model.matrix(month.f)
residStock[,1]<-residuals(lm(stockinfo[,i]~dummy,na.action=na.exclude))
}
#Stockinfo is data.frame
Is this the correct way to do it?
Secondly, I would like to run a regression using the residuals as the dependent variable and other independent variables from another data frame. What would be the best way to do this? Would I have to use a for loop again?
Thank you a lot for your help.
You can put the stock names in a list and use Map instead of an explicit for loop (not tested, since you didn't provide sample data).
Assume your data is mydata, with one column per stock and a month column coded 1, 2, ..., 12. Wrapping month in factor() makes lm create 11 dummies, with the first month as the base.
mystock<-list("APP~","INTEL~","MICROSOFT~") # stock names with a trailing tilde
myresi<-Map(function(x) resid(lm(as.formula(paste(x,"factor(month)")),data=mydata,na.action=na.exclude)),mystock) # residuals of each stock on the month dummies
Say your independent variables are indep1, indep2 and indep3, stored in mydata as well and the same for every stock. The residuals then serve as the dependent variable:
myestimate<-Map(function(r) lm(r~indep1+indep2+indep3,data=mydata),myresi)
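The first step can also be sketched end to end on simulated data. Everything below is hypothetical: stockinfo is filled with random numbers, and month_f and resid_stock are illustrative names.

```r
set.seed(1)

# Hypothetical data: two stocks observed daily over two years
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
stockinfo <- data.frame(S1 = rnorm(length(dates)), S2 = rnorm(length(dates)))
stockinfo$S1[5] <- NA  # a missing value, to show na.exclude at work

# One factor for the calendar month; lm() expands it into 11 dummies
# plus an intercept, using the first level as the base month
month_f <- factor(format(dates, "%m"))

# Residuals for every stock column; na.exclude pads residuals with NA
# so each result has as many rows as the input
resid_stock <- as.data.frame(lapply(stockinfo, function(y) {
  residuals(lm(y ~ month_f, na.action = na.exclude))
}))
```

Building the month factor once outside the loop avoids recomputing it for every stock, and lapply over the columns removes the need to index the result by hand.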
