Fama MacBeth regression pmg function error in R - r

I've been trying to run a Fama Macbeth regression using the pmg function for my data "Dev_Panel" but I keep getting this error message:
Fehler in pmg(BooktoMarket ~ Returns + Profitability + BEtoMEpersistence, :
Insufficient number of time periods
I've read in other posts on here that this could be due to NAs in the data. But I've already removed these from the panel.
Additionally, I've used the pmg function on the data frame "Em_Panel" for which I have undertaken the exact same data cleaning measures as for the "Dev_Panel". The regression for this panel worked, but it only produces a coefficient for the intercept. The other coefficients are NA.
Here's the code I used for the Em_Panel:
require(foreign)
require(plm)
require(lmtest)
Em_Panel <- read.csv2("Em_Panel.csv", na="NA")
FMR_Em <- pmg(BooktoMarket~Returns+Profitability+BEtoMEpersistence, Em_Panel, index = c("companyID", "years"))
And here's the code for the Dev_Panel:
Em_Panel <- read.csv2("Dev_Panel.csv", na="NA")
FMR_Dev <- pmg(BooktoMarket~Returns+Profitability+BEtoMEpersistence, Dev_Panel, index = c("companyID", "years"))
Since this seemingly is a problem concerning my data I will gladly provide it:
http://www.filedropper.com/empanel
http://www.filedropper.com/devpanel
Thank you so much for any help!!!
Edit
After switching the arguments as suggested the error is now produced by the Dev_Panel and not the Em_Panel.
Also the regression for the Em_Panel now only provides a coefficient for the intercept. The other coefficients are NA.

Related

Using a For Loop to Run Multiple Response Variables through a Train function to create multiple seperate models in R

I am trying to create a for loop to index thorugh each individual response variable I have and train a model using the train() funciton within the Caret Package. I have about 30 response variable and 43 predictor variables. I can train each model individually but I would like to automate the process and have a for loop run through a model (I would like to eventually upscale to multiple models if possible, i.e. lm, rf, cubist, etc.). I then want to save each model to a dataframe along with R-squared values and RMSE values. The individual models that I currenlty have that will run for me goes as follows, with column 11 being the response variable and column 35-68 being predictor variables.
data_Mg <- subset(data_3, !is.na(Mg))
mg.lm <- train(Mg~., data=data_Mg[,c(11,35:68)], method="lm", trControl=control)
mg.cubist <- train(Mg~., data=data_Mg[,c(11,35:68)], method="cubist", trControl=control)
mg.rf <- train(Mg~., data=data_Mg[,c(11,35:68)], method="rf", trControl=control, na.action = na.roughfix)
max(mg.lm$results$Rsquared)
min(mg.lm$results$RMSE)
max(mg.cubist$results$Rsquared)
min(mg.cubist$results$RMSE)
max(mg.rf$results$Rsquared) #Highest R squared
min(mg.rf$results$RMSE)
This gives me 3 models with everything the relevant information that I need. Now for the for loop. I've only tried the lm model so far for this.
bucket <- list()
for(i in 1:ncol(data_4)) { # for-loop response variables, need to end it at response variables, rn will run through all variables
data_y<-subset(data_4, !is.na(i))#get rid of NA's in the "i" column
predictors_i <- colnames(data_4)[i] # Create vector of predictor names
predictors_1.1 <- noquote(predictors_i)
i.lm <- train(predictors_1.1~., data=data_4[,c(i,35:68)], method="lm", trControl=control)
bucket <- i.lm
#mod_summaries[[i - 1]] <- summary(lm(y ~ ., data_y[ , c("i.lm", predictors_i)]))
#data_y <- data_4
}
Below is the error code that I am getting, with Bulk_Densi being the first variable in predictors_1.1. The error code is that variable lengths differ so I originally thought that my issue was that quotes were being added around "Bulk_Densi" but after trying the NoQuote() function I have not gotten anywehre so I am unsure of where I am going wrong.
Error code that I am getting
Please let me know if I can provide any extra info and thanks in advance for the help! I've already tried the info in How to train several models within a loop for and was struggling with that as well.

Cannot use coxph.predict for type="expected" with newdata in Competing Risks context

I'm using a Cox Proportional Hazards (survival::coxph) model in a competing risks context- i.e. multiple event types with one endpoint for each observation. I'm having a hard time using the coxph.predict function to show an estimate of expected number of events given a supplied set of covariates and follow-up time.
Here is an example using the mgus2 dataset in the survival package:
library(survival)
#Modify data so each subject transitions only once to a state.
crdata <- mgus2
crdata$etime <- pmin(crdata$ptime, crdata$futime)
crdata$event <- ifelse(crdata$pstat==1, 1, 2*crdata$death)
crdata$event <- factor(crdata$event, 0:2, c("censor", "PCM", "death"))
cfit <- coxph(Surv(etime, event) ~ I(age/10) + sex + mspike,
id = id, crdata)
Once I fit a model, and create a "newdata" data frame, R throws an error.
I tried using a from-scratch dataframe but this results in an error suggesting that the column size or the number of rows does not mesh:
#providing both follow-up time and covariates
nd=data.frame(etime=81 ,sex= "M", age=60, mspike=1.2)
predict(cfit, newdata=nd ,type="expected")
> Data is not the same size as it was in the original fit
I get the same issue Using model.frame when extracting the same data.frame used fitting the model.
nd=model.frame(cfit)
predict(cfit,newdata=nd,type="expected")
> Data is not the same size as it was in the original fit
This results in the same error. Trying to use the original data frame to make predictions doesn't work either:
nd=crdata[1,]
predict(cfit,newdata=nd,type="expected")
> Data is not the same size as it was in the original fit
I'm wondering what I'm missing here. Thanks in advance!
I've updated my survival package from 2.7 to 3.1 and the error thrown states that "expected" predict type is not available for multistate coxph.
> predict(fit,type="expected",newdata=newdat)
Error in predict.coxphms(fit, type = "expected", newdata = newdat) :
predict method not yet available for multistate coxph

Can't run glm due to the following error: "variable lengths differ (found for 'data')"

I try to run a regression using the glm function, however I keer getting the same error message: "variable lengths differ (found for 'data')". I can't see how my data does not have the same length as I use a sample of 1000 for both my dependent and independent variables. The reason I take a sample of my total data is because I have more than a million observations and I want to see if the model works properly. (running it with all the data takes a very long time) This is the code I use:
sample = sample(1:nrow(agg), 1000, replace = FALSE)
y=agg$TO_DEFAULT_IN_12M_INDICATOR[sample]
test <- glm(as.factor(y) ~., data = as.factor(agg[sample,]), family = binomial)
#coef(full.model)
Here agg contains all my data, and my y is an indicator function of 0's and 1's. Does anyone know how I could fix this problem?

R : TUKEY following one-way ANOVA

I conducted a one-way ANOVA in R, but I keep getting error messages when I attempt to do a Tukey post-hoc to see which treatments differ from each other.
(I would like the results to be ranked (a, ab, b, bcd...etc.)
DATA details:
data = "abh2"
x = 6 treatments : "treatment"
y= moisture readings "moist" (n=63 per treatment, total=378)
I ran a one-way ANOVA:
anov <- anova(lm(moist~treatment, data=abh2))
.# RESULTS indicate I can move to a post hoc (p<0.05):
Analysis of Variance Table
Response: moist
Df Sum Sq Mean Sq F value Pr(>F)
treatment 5 1706.3 341.27 25.911 < 2.2e-16 ***
I chose Tukey HSD and tried to run it with 2 methods, but get error messages for both:
Built-in R function:
TukeyHSD(anov)
# ERROR : no applicable method for 'TukeyHSD' applied to an object of class "c('anova', 'data.frame')"
Agricolae package:
HSD.test(anov, "treatment", group=TRUE, console=TRUE)
# ERROR : Error in HSD.test(anov, "treatment", group = TRUE, console = TRUE) :
argument "MSerror" is missing, with no default
I found the MSerror was
1) An "# Old version HSD.test()" (But I've just updated the agricolae package)
2) MSerror<-deviance(model)/df
So I tried:
HSD.test(anov, "treatment", MSerror=deviance(moist)/5, group=TRUE, console=TRUE)
*but still* # ERROR: $ operator is invalid for atomic vectors
Could anyone help me move ahead from here? It seems like a pretty simple problem but I've spent hours on this!
Many thanks :)
Try specifying your treatment as a factor with the following code:
abh2$treatment <- factor(abh2$treatment)
Thanks for the feed back Annie-Claude, it got me on the right track that R wasn't recognizing the data as it should.
I solved the problem using this code though:
library(agricolae)
model<-aov(moist~treatment, data=abh2)
out <- HSD.test(model,"treatment", group=TRUE,console=TRUE)
It appears that the ANOVA command that I was initially using (from R base package), was simply not understood by the Tukey test I was trying to perform afterwards (with the agricolae package).
Take-home message for me: Try to conduct a related string of analyses in the same package!
p.s. To obtain the p-values:
summary(model)
they can be converted to a dataframe like so:
as.data.frame( summary(model)[[1]] )

Predict function from Caret package give an Error

I am doing just a regular logistic regression using the caret package in R. I have a binomial response variable coded 1 or 0 that is called a SALES_FLAG and 140 numeric response variables that I used dummyVars function in R to transform to dummy variables.
data <- dummyVars(~., data = data_2, fullRank=TRUE,sep="_",levelsOnly = FALSE )
dummies<-(predict(data, data_2))
model_data<- as.data.frame(dummies)
This gives me a data frame to work with. All of the variables are numeric. Next I split into training and testing:
trainIndex <- createDataPartition(model_data$SALE_FLAG, p = .80,list = FALSE)
train <- model_data[ trainIndex,]
test <- model_data[-trainIndex,]
Time to train my model using the train function:
model <- train(SALE_FLAG~. data=train,method = "glm")
Everything runs nice and I get a model. But when I run the predict function it does not give me what I need:
predict(model, newdata =test,type="prob")
and I get an ERROR:
Error in dimnames(out)[[2]] <- modelFit$obsLevels :
length of 'dimnames' [2] not equal to array extent
On the other hand when I replace "prob" with "raw" for type inside of the predict function I get prediction but I need probabilities so I can code them into binary variable given my threshold.
Not sure why this happens. I did the same thing without using the caret package and it worked how it should:
model2 <- glm(SALE_FLAG ~ ., family = binomial(logit), data = train)
predict(model2, newdata =test, type="response")
I spend some time looking at this but not sure what is going on and it seems very weird to me. I have tried many variations of the train function meaning I didn't use the formula and used X and Y. I used method = 'bayesglm' as well to check and id gave me the same error. I hope someone can help me out. I don't need to use it since the train function to get what I need but caret package is a good package with lots of tools and I would like to be able to figure this out.
Show us str(train) and str(test). I suspect the outcome variable is numeric, which makes train think that you are doing regression. That should also be apparent from printing model. Make it a factor if you want to do classification.
Max

Resources