Beta (market model regression) value company-wise with moving window in R

I have a dataframe which looks something like this:
company_name co_stkdate dailyreturns marketreturn
A 01-01-2000 5.67 4.54
A 02-01-2000 3.43 1.23
A 03-01-2000 -1.01 -0.53
.
.
.
A 30-12-2018 5.65 3.45
A 31-12-2018 2.32 1.32
B 01-01-2000 -2.34 -1.12
B 02-01-2000 1.32 0.34
.
.
.
There are a hundred such companies. I want to perform an OLS regression company-wise with a moving window of one year.
The regression model is
dailyreturn = alpha + beta * marketreturn
After performing the regression, I want to get the beta value for each year.
The output should look something like this:
company_name year beta
A 2000 0.87
A 2001 0.99
A 2002 0.76
A 2003 0.65
.
.
.
This is what I have done so far:
betas <- dbdf %>% group_by(co_code, company_name) %>% do(model = lm(formula = dailylogrtn ~ niftyreturns, data = .))
This helped me get one beta value per company for 2000-2018. I am not sure how to perform the regression with moving windows of one year.
The regression should run from 01-01-2000 to 31-12-2000, then a new window from 01-01-2001 to 31-12-2001, then 01-01-2002 to 31-12-2002, and so on.

I have solved this on my own. First you need to convert the date into a year; then you just need to add it to group_by():
betas <- dbdf %>% group_by(company_name, year) %>% do(model = lm(formula = dailylogrtn ~ niftyreturns, data = .))
The model column can be converted into a data frame with the tidy() function from library(broom).
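For completeness, a minimal sketch of the whole pipeline (assuming co_stkdate parses as day-month-year and the columns are named as in the example data; do() is kept from the original approach even though newer dplyr offers alternatives):

library(dplyr)
library(lubridate)
library(broom)

betas <- dbdf %>%
  mutate(year = year(dmy(co_stkdate))) %>%                   # calendar year of each observation
  group_by(company_name, year) %>%                           # one window per company and year
  do(tidy(lm(dailyreturns ~ marketreturn, data = .))) %>%    # yearly market model regression
  filter(term == "marketreturn") %>%                         # keep only the slope (beta) rows
  select(company_name, year, beta = estimate)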

Related

Monthly slope coefficients regressing the first column against the rest of the columns

I would like to regress the first column, the market return (as y), against each of the remaining columns (as X) and create a data frame with the monthly slope coefficients. My data frame looks like this:
Date Market return AFARAK GROUP PLC AFFECTO OYJ
1/3/2007 -0.45 0.00 0.85
1/4/2007 -0.92 2.47 -0.85
1/5/2007 -1.98 3.98 -1.14
The expected output data frame of slope coefficients looks like this:
Date AFARAK GROUP PLC AFFECTO OYJ
Jan-07 1 0.5
Feb-07 2 1.5
Mar-07 2 1
Apr-07 3 2
Could someone help me in this regard?
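No answer is included above, but one possible approach might look like the sketch below (assumptions: the data frame is called df, the dates parse as month/day/year, and the column names become Market.return, AFARAK.GROUP.PLC, etc. after reading the data in; this regresses the market return on each stock within each calendar month):

library(dplyr)
library(tidyr)
library(lubridate)
library(broom)

monthly_slopes <- df %>%
  mutate(month = floor_date(mdy(Date), "month")) %>%               # calendar month of each row
  pivot_longer(-c(Date, Market.return, month),
               names_to = "stock", values_to = "stock_return") %>% # one row per date and stock
  group_by(month, stock) %>%
  do(tidy(lm(Market.return ~ stock_return, data = .))) %>%         # slope per month and stock
  filter(term == "stock_return") %>%
  select(month, stock, slope = estimate) %>%
  ungroup() %>%
  pivot_wider(names_from = stock, values_from = slope)             # one column per stock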

Apply model coefficients to new data

I have two matrices, sub and macro_data. They contain the estimated coefficients of a model and the macro data, respectively:
> sub
coeff varname
1 -1.50 gdp
2 0.005 inflation
3 -2.4 constant
> macro_data
gdp inflation
1 18.0 -0.17
2 15.8 -0.14
3 17.7 -0.15
I would like to apply the following formula: -1.5*gdp+0.005*inflation-2.4 in order to get the scores.
I have tried
for (i in 1:1){
  sub$coeff[i]*macro_data[,1] + sub$coeff[i+1]*macro_data[,2] + sub$coeff[i+2]
}
It works, but it is not the best solution, because I would like something more general. Any ideas?
You can do a matrix multiplication:
cbind(macro_data, 1) %*% sub[, "coeff", drop=FALSE]
If your coefficients come from estimating a model, then normally the corresponding predict() method can take a newdata= argument to calculate estimates for new data.
For your example data the matrix multiplication above won't work directly because you have data frames. This will do:
sub <- read.table(header=TRUE, text=
"coeff varname
-1.50 gdp
0.005 inflation
-2.4 constant ")
macro_data <- read.table(header=TRUE, text=
"gdp inflation
1 18.0 -0.17
2 15.8 -0.14
3 17.7 -0.15")
m <- cbind(macro_data, constant=1)
C <- sub$coeff
names(C) <- sub$varname
m$gdp*C["gdp"] + m$inflation*C["inflation"] + m$constant*C["constant"]
The last line can be shorten to:
as.matrix(m) %*% C[names(m)]
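For reference, if the coefficients had come from a fitted model object rather than a coefficient table, the predict() route mentioned above could be used directly. A minimal sketch, with a hypothetical fitted model fit and response score (not part of the original answer):

# hypothetical: a model fitted earlier on training data train_df
fit <- lm(score ~ gdp + inflation, data = train_df)
predict(fit, newdata = macro_data)   # scores for the new macro data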

Find where species accumulation curve reaches asymptote

I have used the specaccum() command to develop species accumulation curves for my samples.
Here is some example data:
site1<-c(0,8,9,7,0,0,0,8,0,7,8,0)
site2<-c(5,0,9,0,5,0,0,0,0,0,0,0)
site3<-c(5,0,9,0,0,0,0,0,0,6,0,0)
site4<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site5<-c(5,0,9,0,0,6,6,0,0,0,0,0)
site6<-c(5,0,9,0,0,0,6,6,0,0,0,0)
site7<-c(5,0,9,0,0,0,0,0,7,0,0,3)
site8<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site9<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site10<-c(5,0,9,0,0,0,0,0,0,0,1,6)
site11<-c(5,0,9,0,0,0,5,0,0,0,0,0)
site12<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site13<-c(5,1,9,0,0,0,0,0,0,0,0,0)
species_counts<-rbind(site1,site2,site3,site4,site5,site6,site7,site8,site9,site10,site11,site12,site13)
accum <- specaccum(species_counts, method="random", permutations=100)
plot(accum)
In order to ensure I have sampled sufficiently, I need to make sure the curve of the species accumulation plot reaches an asymptote, defined as a slope of < 0.3 between the last two points (i.e. between sites 12 and 13).
results <- with(accum, data.frame(sites, richness, sd))
Produces this:
sites richness sd
1 1 3.46 0.9991916
2 2 4.94 1.6625403
3 3 5.94 1.7513054
4 4 7.05 1.6779918
5 5 8.03 1.6542263
6 6 8.74 1.6794660
7 7 9.32 1.5497149
8 8 9.92 1.3534841
9 9 10.51 1.0492422
10 10 11.00 0.8408750
11 11 11.35 0.7017295
12 12 11.67 0.4725816
13 13 12.00 0.0000000
I feel like I'm getting there. I could fit an lm of richness against sites and extract the exact slope (tangent?) between sites 12 and 13. I am going to search a bit longer here.
Streamlining your data generation process a little bit:
species_counts <- matrix(c(0,8,9,7,0,0,0,8,0,7,8,0,
5,0,9,0,5,0,0,0,0,0,0,0, 5,0,9,0,0,0,0,0,0,6,0,0,
5,0,9,0,0,0,0,0,0,0,0,0, 5,0,9,0,0,6,6,0,0,0,0,0,
5,0,9,0,0,0,6,6,0,0,0,0, 5,0,9,0,0,0,0,0,7,0,0,3,
5,0,9,0,0,0,0,0,0,0,1,0, 5,0,9,0,0,0,0,0,0,0,1,0,
5,0,9,0,0,0,0,0,0,0,1,6, 5,0,9,0,0,0,5,0,0,0,0,0,
5,0,9,0,0,0,0,0,0,0,0,0, 5,1,9,0,0,0,0,0,0,0,0,0),
byrow=TRUE,nrow=13)
It is always a good idea to set.seed() before running randomization tests (and to mention that specaccum() is in the vegan package):
set.seed(101)
library(vegan)
accum <- specaccum(species_counts, method="random", permutations=100)
Extract the richness and sites components from within the returned object and compute d(richness)/d(sites). Note that the slope vector is one element shorter than the original sites/richness vectors: be careful if you're trying to match up slopes with particular numbers of sites.
(slopes <- with(accum,diff(richness)/diff(sites)))
## [1] 1.45 1.07 0.93 0.91 0.86 0.66 0.65 0.45 0.54 0.39 0.32 0.31
In this case, the slope never actually goes below 0.3, so this code for finding the first time that the slope falls below 0.3:
which(slopes<0.3)[1]
returns NA.
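Since the question defines the asymptote by the slope between the last two sites only, that single value can also be checked directly (a small follow-up, not part of the original answer):

tail(slopes, 1)        # slope between sites 12 and 13
tail(slopes, 1) < 0.3  # TRUE would mean the asymptote criterion is met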

How to analyse irregular results with discriminant analysis in R?

I have learnt to use the lda() function in R to analyse regular results like this
(x1 and x2 are factors, G is the classification):
X1 X2 G
2.95 6.63 1
2.53 7.79 1
3.57 5.65 1
3.16 5.47 2
2.16 6.22 2
Now my question is how to analyse the data if G contains irregular results like (1.2; 2.3; 1.6; ...).
I have also learnt how to get the scores from LDA, but I can't get them from QDA.
I have read the help for predict.qda and can't find a score option. Is this impossible in QDA?
This is my code:
C <- read.table("clipboard", header = TRUE)
attach(C)
library(MASS)
ld <- lda(G ~ X1 + X2)
Z <- predict(ld)
newG <- Z$class
cbind(G, Z$x, newG)
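No answer is included above. On the QDA part, one note that may help: predict() on a qda fit from MASS returns $class and $posterior (the per-class posterior probabilities) but no $x component, because QDA has no linear discriminant scores to project onto. A minimal sketch with the same variables (assuming enough observations per class):

library(MASS)
qd <- qda(G ~ X1 + X2, data = C)   # needs more observations per class than predictors
Zq <- predict(qd)
head(Zq$class)      # predicted classes
head(Zq$posterior)  # posterior probabilities, the closest analogue to LDA scores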

Error in R: mixed-effects models

I'm having a few issues I'd appreciate some help with.
head(new.data)
WSZ_Code Treatment_Code Year Month TTHM CL2_FREE BrO3 Colour PH TURB seasons
1 2 3 1996 1 30.7 0.35 0.5000750 0.75 7.4 0.055 winter
2 6 1 1996 2 24.8 0.25 0.5001375 0.75 6.9 0.200 winter
3 7 4 1996 2 60.4 0.05 0.5001375 0.75 7.1 0.055 winter
4 7 4 1996 2 58.1 0.15 0.5001570 0.75 7.5 0.055 winter
5 7 4 1996 3 62.2 0.20 0.5003881 2.00 7.6 0.055 spring
6 5 2 1996 3 40.3 0.15 0.5003500 2.00 7.7 0.055 spring
library(nlme)
> mod3 <- lme(TTHM ~ CL2_FREE, random= ~ 1| Treatment_Code/WSZ_Code, data=new.data, method ="ML")
> mod3
Linear mixed-effects model fit by maximum likelihood
Data: new.data
Log-likelihood: -1401.529
Fixed: TTHM ~ CL2_FREE
(Intercept) CL2_FREE
54.45240 -40.15033
Random effects:
Formula: ~1 | Treatment_Code
(Intercept)
StdDev: 0.004156934
Formula: ~1 | WSZ_Code %in% Treatment_Code
(Intercept) Residual
StdDev: 10.90637 13.52372
Number of Observations: 345
Number of Groups:
Treatment_Code WSZ_Code %in% Treatment_Code
4 8
> plot(augPred(mod3))
Error in plot(augPred(mod3)) :
error in evaluating the argument 'x' in selecting a method for function 'plot': Error in sprintf(gettext(fmt, domain = domain), ...) :
invalid type of argument[1]: 'symbol'
I'm not sure why I get this error. The ranef plot seems OK
plot(ranef(mod3))
But that only gives the value of the random intercepts, no TTHM predictions.
I'm looking for a way to plot the predictions as in a typical augPred plot, which would show all the random effects for each zone. I hope that makes sense.
You need a groupedData object to use augPred(). I hope this helps.
Best wishes #CSJCampbell
con <- textConnection("
WSZ_Code Treatment_Code Year Month TTHM CL2_FREE BrO3 Colour PH TURB seasons
2 3 1996 1 30.7 0.35 0.5000750 0.75 7.4 0.055 winter
6 1 1996 2 24.8 0.25 0.5001375 0.75 6.9 0.200 winter
7 4 1996 2 60.4 0.05 0.5001375 0.75 7.1 0.055 winter
7 4 1996 2 58.1 0.15 0.5001570 0.75 7.5 0.055 winter
7 4 1996 3 62.2 0.20 0.5003881 2.00 7.6 0.055 spring
5 2 1996 3 40.3 0.15 0.5003500 2.00 7.7 0.055 spring
")
new.data <- read.table(con, header = TRUE)
library(nlme)
new.data.grp <- groupedData(TTHM ~ CL2_FREE | Treatment_Code/WSZ_Code, data = new.data)
mod3 <- lme(TTHM ~ CL2_FREE, random= ~ 1| Treatment_Code/WSZ_Code, data=new.data.grp, method ="ML")
mod3
ap3 <- augPred(mod3)
plot(ap3)
I realize most people are probably using ggplot2 and lme4 at this point, but I'm a bit crufty.
Here are a couple of things I've found while working with lists of response variables that are fit using lme().
I've been working with a number of response variables that I want to fit to a particular set of inputs. In short, my code looks something like this:
mymodels = list()
for(resp in my_response_vars){
  f = as.formula(paste(resp, paste(my_input_vars, collapse='+'), sep='~'))
  mymodels[[resp]] = lme(fixed=f, random=~wave|group, method="ML",
                         data=mydata, na.action=na.exclude)
}
I've been successful in treating the entries in the resulting list as normal lme() objects. The problem comes when I want to plot predictions via augPred(). Specifically, I get the following error:
Error in tapply(object[[nm]], groups, FUN[["numeric"]], ...) :
arguments must have same length
So, after much searching, I decided to have a look under the hood of augPred() via debug(). Here are some of the insights I came to. I'm not sure whether these qualify as bugs or would require a patch, but I hope they can help others with similar problems.
When calling augPred(), the function looks for the name of the data used in the original lme() call, then inherits this object from the parent.frame() via a call to eval(). I'm not sure whether this defaults to the object frame or the global environment, but when I change this to data = object$data in the debugger, things work. So, ostensibly, if you have fit your model to a subset of the data, augPred() will still pull in the full data set.
The above causes issues if one response has missing values and you are interested in one that does not. Since every column of the data.frame is included in an eventual call to gsummary(), the missing values in the other response variable will throw a wrench into things.
So, missing values mess things up. I have defaulted to making a temporary data.frame with the columns of interest, then running complete.cases() on it before fitting the lme() model.
mymods = list()
for(resp in my_response_vars){
  f = as.formula(paste(resp, paste(my_input_vars, collapse='+'), sep='~'))
  v2keep = all.vars(f)                    # grab the terms used in the formula
  smdat = mydata[, c(v2keep, 'group')]    # include the grouping variable
  smdat = smdat[complete.cases(smdat), ]  # scrub missing rows
  tmpmod = lme(fixed=f, random=~wave|group,
               method='ML', data=smdat)
  mymods[[resp]] = tmpmod
  # include augPred() call here
}
If you are not including a primary argument in your call to augPred(), it will require that your data.frame is a groupedData object.
So, if you are running into the 'arguments must have same length' error: subset your data first under a different name, and make sure to clear out missing rows explicitly before fitting your model.
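For instance, one way to supply the primary argument (a sketch using the hypothetical wave predictor from the loop above; not part of the original answer):

# supplying the primary covariate explicitly, so a plain data.frame suffices
ap <- augPred(tmpmod, primary = ~ wave)
plot(ap)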
