I am trying to run Fama Macbeth regression by the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
It is working when I regress the data using the independent variable named 'max_1'. However when I change it and use another independent variable named 'ivol_1' the result is showing an error. The code is
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
the error message is like this:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes the error is like this
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data with you. The data link is
data frame
I am wondering why this is happening in case of the different variable in the same data frame. I would be grateful if you can solve this problem.
This problem can be solved by mice function
library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11<-read.csv("df_all_11.csv.part",sep = ",",header = TRUE,stringsAsFactor = F)
x<-data.frame(ivol_1=df_all_11$ivol_1,month=df_all_11$Month)
imputed_Data <- mice(x, m=3, maxit =5, method = 'pmm', seed = 500)
completeData <- complete(imputed_Data, 3)
df_all_11<-mutate(df_all_11,ivol_1=completeData$ivol_1)
fpmg2 <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms"))
coeftest(fpmg2)
this problem because the variable ivol_1 have a lots of NA so you should impute the NA first then run the pmg function.
Related
I am trying to use the lqmm package in R and receiving the error Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1). I can successfully use it for a version of my data in which a variable called cluster_name is averaged over.
I've tried to verify that there are no NaNs or infinite values in my dataset this way:
na_data = mydata
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # yields a dataframe with no observations
is.na(na_data) <- sapply(na_data, is.infinite)
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # still a dataframe with no observations
There are no variables in my dataframe that are type char -- every such variable has been converted to a factor.
When I run my model
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = begin_data, tau=.5, na.action=na.exclude)
on the first 12,528 lines of my dataset, the model works fine. Line 12,529 looks totally normal.
Similarly, if I run tail(mydata, 11943) I get a dataframe that runs without error, but tail(mydata, 11944) gives me a dataframe that generates the error. I can also run a subset from 9990:21825 without error, but extending the dataframe on either side generates the error. The whole dataframe is 29450 observations, and thus this middle slice contains the supposedly problematic observations. I tried making a smaller version of my dataset that contained just the borders of problems, and some observations around them, and I can see that 3/4 cases involve the same subject (7645), but I don't know what to make of that. I don't see how to make this reproducible without providing the whole dataframe (in case you were wondering, the small dataset doesn't cause any error). So here is the csv file I used.
Here is the function that gets the dataframe ready for analysis:
prep_data_set <- function(data_file, brain_var = 'beta', beh_var = 'accuracy') {
data = read.csv(data_file)
data$subject <- factor(data$subject)
data$type <- factor(data$type)
data$type <- relevel(data$type, ref = "S")
data$taught <- factor(data$taught)
data <- subset(data, data$run_num < 13)
data$run = factor(data$run_num)
brain_mean <- mean(data[[brain_var]])
brain_sd <- sd(data[[brain_var]])
beh_mean <- mean(data[[beh_var]])
beh_sd <- sd(data[[beh_var]])
data <- subset(data, data$cluster_name != "")
data$cluster_name <- factor(data$cluster_name)
data$mean_centered_brain <- data[[brain_var]]
data$std_brain <- data$mean_centered_brain/brain_sd
data$mean_centered_beh <- data[[beh_var]]
data$std_beh <- data$mean_centered_beh/beh_sd
return(data)
}
I run
mydata = prep_data_set(file.path(resdir, 'robust0005', 'pos_rel_con__all_clusters.csv'))
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = mydata, tau=.5, na.action=na.exclude)
to generate the error.
By comparison
regular_model = lmer(std_brain ~ type*taught*std_beh + (1|subject/run) +
(1|subject:cluster_name), data = mydata)
runs fine.
I hope there is something interesting and generalizable in this question; I know it's kind of annoying to post to Stack Overflow with some idiosyncratic problem in a ~30000 line dataset.
I am trying to predict a single variable in R, I have tried everything but I still get no result.
I am using the Auto data and I build a model like that
my_acc<-auto_df$acceleration
my_horse<-auto_df$horsepower
mydata <- data.frame(my_acc, my_horse )
car_linear_regression <- lm(my_acc ~ my_horse, mydata )
then I go on and I want to predict the following:
What is the predicted acceleration associated with a horsepower of 93.5? What are the associated 99% confidence and prediction intervals?
I use the following code and I get a warning
Warning: 'newdata' had 1 row but variables found have 392 rows
but also prints out all the rows in the dataframe while I am suposed to get only one back.
can someone help?
predict(car_linear_regression,newdata = data.frame(horsepower = 93.5))
It would help to make the example reproducible but I did some work and was able to reproduce the error:
library(ISLR)
auto_df = Auto
my_acc<-auto_df$acceleration
my_horse<-auto_df$horsepower
mydata <- data.frame(my_acc, my_horse )
car_linear_regression <- lm(my_acc ~ my_horse, mydata )
predict(car_linear_regression,newdata = data.frame(horsepower = 93.5))
The problem is in your model there is no variable called horsepower. You called it my_horse, so the following works:
predict(car_linear_regression,newdata = data.frame(my_horse = 93.5))
As a side note, instead of creating separate variables I would just call the regression model as:
car_linear_regression <- lm(acceleration ~ horsepower, data=auto_df )
Then your original prediction would have worked.
Can some-one help me with my code, i have a code which is calculating a lot of logistic regression at the same time. i used this code also for a lm model and then it worked quite wel, however i tried to adapt it to a glm model but it does not work anymore.
Output_logistic <- data.frame()
glm_output = glm(test[,1] ~ test_2[,1], family = binomial ('logit'))
Output_2 <- data.frame(R_spuared = summary(glm_output)$r.squared)
Output_2$P_value <- summary(glm_output)$coefficients[2,4]
Output_2$Variabele <- paste(colnames(test))
Output_2$Variabele_1 <- paste(colnames(test_2))
Output_2$N_NA <- length(glm_output$na.action)
Output_2$df <- paste(glm_output$df.residual)
Output_logistic <- rbind(Output_logistic,Output_2)
running this code gives the next error:
Error in $<-.data.frame(*tmp*, "P_value", value = 9.66218350888067e-05) :
replacement has 1 row, data has 0
does anybody know what i have to adapt so that the code will work?
Thanks in advance
Your Output_2 is an empty data.frame (it has no rows) because summary(glm_output)$r.squared does not exist, because glm doesn’t report this value.
If you need the R-squared value you’ll have to calculate it yourself. But to fix the error you can simply change your code to construct the data-frame from the existing data in the summary:
output_2 = data.frame(
P_value = summary(glm_output)$coefficients[2, 4],
Variable = colnames(test),
# … etc.
)
I am new to R and trying to save my svm model in R and have read the documentation but still do not understand what is wrong.
I am getting the error "object is not a matrix" which would seem to mean that my data is not a matrix, but it is... so something is missing.
My data is defined as:
data = read.table("data.csv")
trainSet = as.data.frame(data[,1:(ncol(data)-1)])
Where the last line is my label
I am trying to define my model as:
svm.model <- svm(type ~ ., data=trainSet, type='C-classification', kernel='polynomial',scale=FALSE)
This seems like it should be correct but I am having trouble finding other examples.
Here is my code so far:
# load libraries
require(e1071)
require(pracma)
require(kernlab)
options(warn=-1)
# load dataset
SVMtimes = 1
KERNEL="polynomial"
DEGREE = 2
data = read.table("head.csv")
results10foldAll=c()
# Cross Fold for training and validation datasets
for(timesRun in 1:SVMtimes) {
cat("Running SVM = ",timesRun," result = ")
trainSet = as.data.frame(data[,1:(ncol(data)-1)])
trainClasses = as.factor(data[,ncol(data)])
model = svm(trainSet, trainClasses, type="C-classification",
kernel = KERNEL, degree = DEGREE, coef0=1, cost=1,
cachesize = 10000, cross = 10)
accAll = model$accuracies
cat(mean(accAll), "/", sd(accAll),"\n")
results10foldAll = rbind(results10foldAll, c(mean(accAll),sd(accAll)))
}
# create model
svm.model <- svm(type ~ ., data = trainSet, type='C-classification', kernel='polynomial',scale=FALSE)
An example of one of my samples would be:
10.135338 7.214543 5.758917 6.361316 0.000000 18.455875 14.082668 31
Here, trainSet is a data frame but in the svm.model function it expects data to be a matrix(where you are assigning trainSet to data). Hence, set data = as.matrix(trainSet). This should work fine.
Indeed as pointed out by #user5196900 you need a matrix to run the svm(). However beware that matrix object means all columns have same datatypes, all numeric or all categorical/factors. If this is true for your data as.matrix() may be fine.
In practice more than often people want to model.matrix() or sparse.model.matrix() (from package Matrix) which gives dummy columns for categorical variables, while having single column for numerical variables. But a matrix indeed.
I was hoping someone would be able to help me out with an issue I am having with the prediction function of the randomForest package in R. I keep getting the same error when I try to predict my test data:
Here's my code so far:
extractFeatures <- function(RCdata) {
features <- c(4, 9:13, 17:20)
fea <- RCdata[, features]
fea$Week <- as.factor(fea$Week)
fea$Age_Range <- as.factor(fea$Age_Range)
fea$Race <- as.factor(fea$Race)
fea$Referral_Source <- as.factor(fea$Referral_Source)
fea$Referral_Source_Category <- as.factor(fea$Referral_Source_Category)
fea$Rehire <- as.factor(fea$Rehire)
fea$CLFPR_.HS <- as.factor(fea$CLFPR_.HS)
fea$CLFPR_HS <- as.factor(fea$CLFPR_HS)
fea$Job_Openings <- as.factor(fea$Job_Openings)
fea$Turnover <- as.factor(fea$Turnover)
return(fea)
}
gp <- runif(nrow(RCdata))
RCdata <- RCdata[order(gp), ]
train <- RCdata[1:4600, ]
test <- RCdata[4601:6149, ]
rf <- randomForest(extractFeatures(train), suppressWarnings(as.factor(train$disposition_category)), ntree=100, importance=TRUE)
testpredict <- predict(rf, extractFeatures(test))
"Error in predict.randomForest(rf, extractFeatures(test)) :
Type of predictors in new data do not match that of the training data."
I have tried adding in the following line to the code, and still receive the same error:
testpredict <- predict(rf, extractFeatures(test), type="prob")
I found the source of the error being the fact that the training data has a level or two that is not found in the test data. So when I tried another suggestion I found online to adjust the levels of the test data to that of the training data, I keep getting NULL values in the fields I am using in both the training and test sets.
levels(test$Referral)
NULL
I can see the levels when I use the function, however.
levels(as.factor(test$Referral))
So then I tried the same suggestion I found online with adjusting the levels of the test to equal that of the training data using the following function and received an error:
levels(as.factor(test$Referral)) -> levels(as.factor(train$Referral))
Error in `levels<-.factor`(`*tmp*`, value = c(... :
number of levels differs
I am sure there is something simple I am missing (I am still very new to R), so any insight you can provide would be unbelievably helpful. Thanks!