Extract years from panel data and portfolio construction - r

I have a panel data set with return, ESG score and market value for a number of companies over 11 years. I need to extract data for all variables for one year at a time, to make yearly portfolios.
The data frame looks like this:
How can I extract one year at a time and then construct portfolios of high and low ESG score for each year?
Thanks in advance

Have you considered processing the data with Python and Pandas instead of R? The following solution should help to slice your data into different time intervals:
Slice JSON File into Different Time Intercepts with Python
In terms of sorting ESG scores, you can use the following command: df.sort_values('ESG')
Hope that helps and good luck with your dataset.

Related

How to set up a time series for this project (r)?

I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I created matrices for each race, with odd columns including runner score and even columns including time, where each team has a team_score and team_time column. I want to analyze growth of teams in a time series, but I have two questions about this:
(1): can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2): Am I closing myself off to insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?

Finding and visualizing 1 lag autocorrelation for large amount of variables at once in R

So I have a rather large dataset for Stock returns. Specifically it contains 4 columns, One with the Stock name, date, returns, and lagged returns. What I would like is to somehow illustrate how many of the Stocks that is autocorrelated with the 1-lagged return. It is especially difficult as my data contains about 20k Stocks and 227k observations. Any suggestions? :)
Thanks in advance!

Approach to predictive modelling of monthly orders using R

I have a question regarding what approach to building a predictive model in R would be best for my data.
Say I have a series of orders per month for the past 5 years. The data have three variables- month, year and sum or orders.
What is the best way to build a model that will predict the number of orders for next month based on the number of orders over the past 6 months and the normal seasonal peaks and troughs for the number of orders? What is the best way to approach this problem using R?
Unfortunately I do not have the data at hand, but am just asking generally how to approach this problem in R.
Thanks in advance.

Using acf function in r for time series data

I am new to time-series analysis and have a data set with a daily time step at 5 factor levels. My goal is to use the acf function in R to determine whether there is significant autocorrelation across the response variable of interest so that I can justify whether or not a time-series model is necessary.
I have sorted the dataset by Day, and am using the following code:
acf(DE_vec, lag.max=7)
The dataset has not been converted to a time-series object…it is a vector sorted by Day.
My first question is whether the dataframe should be converted to a time-series object, or if it is also correct to sort the vector by Day?
Second, if I have a variable repeated over the 5 levels for each Day, then should I construct 5 different acf plots for each level, or would it be ok to pool over stations as was done with the code above?
Thanks in advance,
Yes, acf() will work on a data.frame class, and yes, you should compute the ACF for each of the 5 levels separately. If you pass the entire df to acf(), it will return the ACF for each of the levels.
If you are curious about the relationship across levels, then you need to use ccf() or some mutual information metric like those in the entropy or infotheo pkgs.

Time Dummy Variables and Regressing columns of a dataframe as dependent variables

I have tried to search this question on here but I couldn't find anything so sorry if this question has already been answered. My dataset consists of daily information for a large number of stocks (1000+) over a 10 year period. So I have read my dataset as a data frame time series where each column is a separate stock. I would like to regress each of the stock against month dummy variables capture the season variation and obtain the residuals. What I have done is the following:
for (i in 1:1000){
month.f<-factor(months(time(stockinfo[,i])))
dummy<-model.matrix(month.f)
residStock[,1]<-residuals(lm(stockinfo[,i]~dummy,na.action=na.exclude))
}
#Stockinfo is data.frame
Is this the correct way to do it?
Secondly, i would like to run a regression using the residuals as the the dependent variable and other independent variables from another data frame. What would be the best way to do this, would I have to use a for loop again?
Thank you a lot for your help.
You can create a list of stocks as follows and then use Map function and can avoid R for loop (Not tested since you didn't provide the sample data)
Assume your data is mydata with month as 1,2, you use 11 months as dummy if there are 12 months
mystock<-list("APP~","INTEL~","MICROSOFT~") # stocks with tilde sign
myresi<-Map(function(x) resi(lm(as.formula(paste(x,paste(levels(as.factor(mydata$month))[-1],collapse="+"))),data=mydata),mystock) #-1 means we are using only 11 months excluding first as base month
Say your independent var is indep1,indep2, and indep3 and dependent is dep (And assuming that dep and indep are same for each stocks)
myestimate<-Map(function(x)lm(dep~indep1+indep2+indep3,data=x),myresi)

Resources