Using panel regression on Hedonic data with the plm package in R

I am trying to run a panel regression for an unbalanced panel in R using the plm package, working with the 'Hedonic' data set.
I am trying to replicate something similar to what is done in the following paper: http://ftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/vignettes/plm/plmEN.pdf (page 14, 3.2.5 Unbalanced Panel).
My code looks something like this:
form = mv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + blacks + lstat
ba = plm(form, data = Hedonic)
However, I am getting the following error on execution:
Error in names(y) <- namesy :
'names' attribute [506] must be the same length as the vector [0]
traceback() yields the following result:
4: pmodel.response.pFormula(formula, data, model = model, effect = effect,
theta = theta)
3: pmodel.response(formula, data, model = model, effect = effect,
theta = theta)
2: plm.fit(formula, data, model, effect, random.method, random.dfcor,
inst.method)
1: plm(form, data = Hedonic)
I am new to panel regression and would be really grateful if someone can help me with this issue.
Thanks.

That paper is ten years old, and I'm not sure plm works like that. The latest docs are here https://cran.r-project.org/web/packages/plm/vignettes/plm.pdf
Your problem arises because, as the docs explain:
the current version of plm is capable of working with a regular
data.frame without any further transformation, provided that the
individual and time indexes are in the first two columns,
The Hedonic data set does not have individual and time indexes in the first two columns. I'm not sure where the individual and time indexes are in the data, but if I specify townid for the index I at least get something that runs:
> p <- plm(mv~crim,data=Hedonic)
Error in names(y) <- namesy :
'names' attribute [506] must be the same length as the vector [0]
> p <- plm(mv~crim,data=Hedonic, index="townid")
> p
Model Formula: mv ~ crim
Coefficients:
crim
-0.0097455
When you don't specify the id and time indexes, plm tries to use the first two columns; in Hedonic those columns contain what look like unique id values for every row, so the whole model falls apart.
If you look at the examples in help(plm) you might notice that the first two columns in all the data sets define the id and the time.
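A minimal sketch pulling these points together (assuming the Hedonic data set shipped with plm, which stores the town identifier in the townid column rather than in the first two columns):
library(plm)
data("Hedonic", package = "plm")
form <- mv ~ crim + zn + indus + chas + nox + rm + age + dis +
    rad + tax + ptratio + blacks + lstat
# Pass the individual index explicitly so plm does not guess it
# from the first two columns of the data frame.
ba <- plm(form, data = Hedonic, index = "townid")
summary(ba)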

Related

Simple slopes: seq.default(to = nrow(grid)) : 'to' must be of length 1

I am trying to create some simple slopes using the following code:
model <- lm(EE ~ HO4 + A + interaction, data = MyData)
Coefficients:
(Intercept) HO4 AFactor1 AFactor2 AFactor3 AFactor4
4.6190 -0.2876 -1.9633 -1.4149 -1.8414 -1.6004
AFactor5 interactionPlot
-1.7431 -0.1724
My simple model seems to be working, as the output shows.
What I am trying to do is insert the 3 variables I use for my moderation analysis.
I know that those 3 variables and the interaction (ZHO4 * ZA) produce a significant moderation.
After creating a simple model I try to run the simple slopes function, resulting in this error:
library("reghelper")
simple_slopes(model)
> Error in seq.default(to = nrow(grid)) : 'to' must be of length 1
EE = Dependent Variable
HO4 = Independent Variable
A = Moderator
Z = standardized
How can I get it to work?
Or are there any other options to output simple slopes?
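A hedged guess rather than a confirmed answer: simple_slopes() from reghelper inspects the model formula for an interaction term, so the moderation has to be written with * (or :) rather than entered as a pre-computed 'interaction' column. A minimal sketch using the variable names from the question:
library(reghelper)
model <- lm(EE ~ HO4 * A, data = MyData)  # HO4 * A expands to HO4 + A + HO4:A
simple_slopes(model)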

Leads/Lags in linear model with a subsample of the data frame in R

I want to perform in R the next linear model:
\begin{equation}
lPC_t = \beta_0 + \beta_1PIBtvh_{t+1} + \beta_2txDes_t + \beta_3Spread_{t+4} + u_t
\end{equation}
The name of my data frame is Dados_R. I need to impose a restriction on the data, since I want to estimate over just the observations 19 to 45. The problem is that once I create the lead variables, I cannot restrict their range without modifying the original data frame myself, which is not convenient because I want to estimate more models with different leads.
So my question is: how can I change the range of the variables that I created (leadPIBtvh0 and leadSpread0) in such a way that lets me run the linear model on just the observations 19 to 45?
The code that I wrote:
attach(Dados_R)
leadPIBtvh0=lag(PIBtvh,1)
leadSpread0=lag(Spread,4)
data=Dados_R[19:45,]
detach(Dados_R)
attach(data)
lPC=log(PC/(1-PC))
lm_lPC=lm(lPC~leadPIBtvh0+txDes+leadSpread0)
This code give me the error (that I understood):
Error in model.frame.default(formula = lPC ~ leadPIBtvh0 + txDes + leadSpread0, :
variable lengths differ (found for 'leadPIBtvh0')
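A minimal sketch of one possible fix (my own suggestion, not from the original post): build the lead variables inside the data frame and let lm() do the subsetting, so Dados_R never has to be overwritten. Column names are taken from the question.
Dados_R$lPC <- log(Dados_R$PC / (1 - Dados_R$PC))
Dados_R$leadPIBtvh0 <- c(Dados_R$PIBtvh[-1], NA)               # PIBtvh_{t+1}
Dados_R$leadSpread0 <- c(Dados_R$Spread[-(1:4)], rep(NA, 4))   # Spread_{t+4}
lm_lPC <- lm(lPC ~ leadPIBtvh0 + txDes + leadSpread0,
             data = Dados_R, subset = 19:45)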

R: How to update model frame after reducing model formula

I am working on a phylogenetic multiple regression using the caper package on Windows 7, and I consistently receive a model frame / formula mismatch error whenever I try to graph a residual-leverage plot after generating a reduced model.
Here is the minimal code needed to reproduce the error:
g <- Response ~ (Name1 + Name2 + Name3 + Name4 + Name5 + Name6 + Name7)^2 + Name1Sqd +
    Name2Sqd + Name3Sqd + Name4Sqd + Name5Sqd + Name6Sqd + Name7Sqd
crunchMod <- crunch(g, data = contrasts)
plot(crunchMod, which=c(5)) ####Works just fine####
varName <- row.names(summary(crunchMod)$coefficients)[1]
#it doesn't matter which predictor I remove.
Reduce(paste, deparse(g))
g <- as.formula(paste(Reduce(paste, deparse(g)), as.name(varName), sep=" - "))
#Edits the model formula to remove varName
crunchMod <- crunch(g, data = contrasts)
plot(crunchMod, which=c(5)) ####Error Happens Here####
When I try to graph a residual leverage plot to look at the effects of model complexity, I get the following error:
Error in model.matrix.default(object, data = list(Response = c(-0.0458443124730482,
: model frame and formula mismatch in model.matrix()
The code that triggers this error is plot(crunchMod, which=c(5)), where crunchMod
holds my regression model via crunchMod <- crunch(g, data = contrasts) from the
caper package, on Windows 7.
How can I update my model frame so that I can examine Cook's distance again (either graphically or numerically)?
Within the source code of crunch() there is the line:
data <- subset(data, select = all.vars(formula))
which has the side effect of invalidating, in the model frame, every interaction effect built from a deleted main effect. This becomes more apparent once you realize that plotting Cook's distance vs. leverage still works if you only delete interaction effects.
Thus to solve this problem, all interaction effects must be included in the original data frame before calling crunch() to create a linear model. While this makes transforming the data slightly more complicated, it is easy to add these interactions following these two links:
Generating interaction variables in R dataframes (second answer down)
http://www.r-bloggers.com/type-conversion-and-you-or-and-r/
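A short sketch of what that workaround looks like in practice (my own illustration, with a hypothetical plain data frame dat standing in for whatever underlies the poster's contrasts object; the Name* columns are assumed numeric): pre-compute the interaction columns so they survive the subset(data, select = all.vars(formula)) step inside crunch().
dat$Name1xName2 <- dat$Name1 * dat$Name2   # pairwise interaction as its own column
dat$Name1xName3 <- dat$Name1 * dat$Name3
# ...and so on for the remaining pairs, then refer to these columns by name:
g <- Response ~ Name1 + Name2 + Name3 + Name1xName2 + Name1xName3 + Name1Sqd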

I get many predictions after running predict.lm in R for 1 row of new input data

I used ApacheData data with 83784 rows to build a linear regression model:
fit <-lm(tomorrow_apache~ as.factor(state_today)
+as.numeric(daily_creat)
+ as.numeric(last1yr_min_hosp_icu_MDRD)
+as.numeric(bun)
+as.numeric(urin)
+as.numeric(category6)
+as.numeric(category7)
+as.numeric(other_fluid)
+ as.factor(daily)
+ as.factor(age)
+ as.numeric(apache3)
+ as.factor(mv)
+ as.factor(icu_loc)
+ as.factor(liver_tr_before_admit)
+ as.numeric(min_GCS)
+ as.numeric(min_PH)
+ as.numeric(previous_day_creat)
+ as.numeric(previous_day_bun) ,ApacheData)
And I want to use this model to predict a new input so I give each predictor variable a value:
predict(fit, data=data.frame(state_today=1, daily_creat=2.3, last1yr_min_hosp_icu_MDRD=3, bun=10, urin=0.01, category6=10, category7=20, other_fluid=0, daily=2 , age=25, apache3=12, mv=1, icu_loc=1, liver_tr_before_admit=0, min_GCS=20, min_PH=3, previous_day_creat=2.1, previous_day_bun=14))
I expect a single value as the prediction for this new input, but I get many, many predictions! I don't know why this is happening. What am I doing wrong?
Thanks a lot for your time!
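One likely cause (a guess, not a confirmed diagnosis): predict.lm() takes new observations through the newdata argument; data= is not one of its arguments and is silently ignored, so predict() returns the fitted values for all 83784 rows. A minimal sketch with the values from the question:
newobs <- data.frame(state_today = 1, daily_creat = 2.3,
                     last1yr_min_hosp_icu_MDRD = 3, bun = 10, urin = 0.01,
                     category6 = 10, category7 = 20, other_fluid = 0,
                     daily = 2, age = 25, apache3 = 12, mv = 1, icu_loc = 1,
                     liver_tr_before_admit = 0, min_GCS = 20, min_PH = 3,
                     previous_day_creat = 2.1, previous_day_bun = 14)
predict(fit, newdata = newobs)   # should return a single prediction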
You may also want to try the excellent effects package in R (?effects). It's very useful for graphing the predicted values from your model by setting the inputs on the right-hand side of the equation to particular values. I can't reproduce the example you've given in your question, but to give you an idea of how to quickly extract predicted values in R and then plot them (since this is vital to understanding what they mean), here's a toy example using the built-in data sets in R:
install.packages("effects") # installs the "effects" package in R
library(effects) # loads the "effects" package
data(Prestige) # loads in-built dataset
m <- lm(prestige ~ income + education + type, data=Prestige)
# this last step creates predicted values of the outcome based on a range of values
# on the "income" variable and holding the other inputs constant at their mean values
eff <- effect("income", m, default.levels=10)
plot(eff) # graphs the predicted values

R: How to make column of predictions for logistic regression model?

So I have a data set called x. The contents are simple enough to just write out so I'll just outline it here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1= white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: how do I create a fifth column on the right side of this data set x that contains the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert glm$fitted as a column, but I'd like to write code that predicts it from whatever is in the race, sex, and gender columns, for more generalized use.
Right now I have the following code, which I hope will create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appear to be a few typos in your code. First, the xnew line calls glm5, but your model, as far as I can see, is named glm (by the way, using glm as the name of your output is probably not a good idea). Secondly, make sure the variable race.f is actually in the data set you want to predict from; my guess is that R can't find that variable, hence the error.
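A minimal sketch of those two fixes combined (names adapted from the question; fit replaces the ill-advised model name glm, and the predictions are attached to the original data frame x):
fit <- glm(Report ~ race.f + sex.f + gender.f, data = x,
           family = binomial(link = "logit"))
preds <- predict(fit, newdata = x, type = "link", se.fit = TRUE)
x$PredictedProb <- plogis(preds$fit)              # fitted probability for each row
x$LL <- plogis(preds$fit - 1.96 * preds$se.fit)   # lower 95% bound
x$UL <- plogis(preds$fit + 1.96 * preds$se.fit)   # upper 95% bound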
