R splm: Spatial panel with no exogenous regressors

I want to estimate the spatial panel autoregressive model
y_{t} = a + \rho W y_{t} + \epsilon_{t}
where a is a vector of individual fixed effects. I am using the excellent splm package in R.
Note that I don't have any independent variables X here. If I include some regressors X there is no problem, but I wonder how to specify the model in splm in the absence of independent variables.
library(splm)
library(spdep)
data("Produc", package = "Ecdat")
data("usaww")
usalw <- mat2listw(usaww)  # spml() accepts either a listw object or a W matrix
# this works, since I have independent regressors
spml(formula = log(gsp) ~ log(pcap), data = Produc,
     listw = usalw, lag = TRUE, spatial.error = "none",
     model = "within", effect = "twoways")
# this does not work
spml(formula = log(gsp) ~ ., data = Produc,
     listw = usalw, lag = TRUE, spatial.error = "none",
     model = "within", effect = "individual")

To estimate an "empty" model (intercept only), the formula has to be y ~ 1. This currently works with random or no individual effects; the "within" (fixed effects) estimators need a fix.
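For reference, a minimal sketch of an intercept-only call that does work (random effects shown here; model = "pooling" is analogous):

spml(formula = log(gsp) ~ 1, data = Produc,
     listw = usalw, lag = TRUE, spatial.error = "none",
     model = "random")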
A workaround for getting the FE estimates is explicit demeaning of the data:
library(plm)
spml(formula = Within(log(gsp)) ~ 1, data = Produc,
     listw = usalw, lag = TRUE, spatial.error = "none",
     model = "pooling")


Error when predicting partial effects using new data for gamlss model

I'm here re-raising the issue of predicting CIs for gamlss models using the newdata argument. A further complication is that I'm interested in partial effects as well.
A closely related issue (without partial effects) was left unresolved in 2018: Error when predicting new fitted values from R gamlss object.
I'm wondering if there have been updates that also extend to partial effects. The example below reproduces the error (notice the `type = "terms"` argument, specifying that I'm interested in the effects of each model term).
library(gamlss)
library(tidyverse)
# example data
set.seed(123)  # for reproducibility
test_df <- tibble(x = rnorm(1e4),
                  x2 = rnorm(n = 1e4),
                  y = x2^2 + rnorm(1e4, sd = 0.5))
# fitting gamlss model
gam_test <- gamlss(formula = y ~ pb(x2) + x,
                   sigma.fo = ~ pb(x2) + x,
                   data = test_df)
# data I want predictions for
pred_df <- tibble(x = seq(-0.5, 0.5, length.out = 300),
                  x2 = seq(-0.5, 0.5, length.out = 300))
# returns error when se.fit = TRUE
pred <- predictAll(object = gam_test,
                   type = "terms",
                   se.fit = TRUE,  # works if se.fit = FALSE
                   newdata = pred_df)
Many thanks in advance!
I talked to the main developer of the gamlss software (who is responsible for this function). He says that the option se.fit = TRUE with type = "terms" has not yet been implemented, and unfortunately he is too busy at present. One idea is to bootstrap the original data, predict the terms for each bootstrap sample, and then use the results to obtain CIs.
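A rough sketch of that bootstrap idea (illustrative only: it refits the model on resampled rows and collects the mu-model terms, assuming predictAll(type = "terms") behaves on each refit as it does above with se.fit = FALSE):

set.seed(1)
boot_terms <- replicate(200, {
  idx <- sample(nrow(test_df), replace = TRUE)
  fit_b <- gamlss(y ~ pb(x2) + x, sigma.fo = ~ pb(x2) + x,
                  data = test_df[idx, ],
                  control = gamlss.control(trace = FALSE))
  predictAll(fit_b, newdata = pred_df, type = "terms")$mu
}, simplify = FALSE)
# pointwise 95% interval for, e.g., the first model term:
term1 <- sapply(boot_terms, function(m) m[, 1])
apply(term1, 1, quantile, probs = c(0.025, 0.975))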

What do you set the grouping factor to when using glmer/lme4 and predictInterval?

Problem: Using a multilevel (mixed-effects) model and not sure what to set the grouping variable to in order to generate predicted probabilities for a measured group-level variable from a glmer model using merTools' predictInterval function.
Goal: Generate predicted probabilities and SEs/CIs across a range of values of a "second-level" (group-level) variable.
Seeking: Advice on how to properly do this, or other recommendations for generating predicted probabilities and CIs across the range of values of a group-level variable from a glmer model.
library(lme4)
library(merTools)
library(ggplot2)
set.seed(1234)  # for reproducibility
hier_data <- data.frame(pass = sample(c(0, 1), size = 1000, replace = TRUE),
                        wt = rnorm(1000),
                        ht = sample(1:5, size = 1000, replace = TRUE,
                                    prob = c(.1, .1, .6, .1, .1)),
                        school_funding = rnorm(1000),
                        # a factor, so levels() works further below
                        school = factor(rep(c("A", "B", "C", "D", "E"), each = 200)))
mod <- glmer(pass ~ wt + ht + school_funding + (1 | school),
             family = binomial("logit"), data = hier_data)
### Without school: error
ndata <- data.frame(wt = median(hier_data$wt),
                    ht = median(hier_data$ht),
                    school_funding = seq(from = min(hier_data$school_funding),
                                         to = max(hier_data$school_funding),
                                         length.out = 100))
pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata,
                                   type = "probability"))
### Problem, when adding the school variable: which school?
ndata <- data.frame(wt = median(hier_data$wt),
                    ht = median(hier_data$ht),
                    school_funding = seq(from = min(hier_data$school_funding),
                                         to = max(hier_data$school_funding),
                                         length.out = 100),
                    school = "A")
pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata,
                                   type = "probability"))
ggplot(pp, aes(x = school_funding, y = fit)) +
  geom_point() +
  geom_errorbar(aes(ymin = lwr, ymax = upr))
It seems what you are trying to achieve is effects plots for your variables, with fast prediction intervals. Note first of all that predictInterval does not incorporate the uncertainty in the estimated values of the variance parameters, theta. If more accurate confidence intervals are needed, you should use the bootMer function, as described in ?bootMer, which uses bootstrapping to estimate the uncertainty. It might simply be infeasible, however, as model size and complexity increase. Alternatively, the effects package contains the capability to illustrate the effects of merMod objects (though the documentation is simply atrocious).
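A minimal bootMer sketch for the fixed-effects predictions on the grid above (the summary function and nsim here are illustrative choices, not part of the original answer):

boot_fun <- function(fit) predict(fit, newdata = ndata, re.form = NA, type = "response")
bb <- bootMer(mod, boot_fun, nsim = 200)
# pointwise 95% percentile intervals across the funding grid
ci <- apply(bb$t, 2, quantile, probs = c(0.025, 0.975))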
In general, when illustrating the effects of merMod objects, a question is "which effects?". Are you interested in the marginal effects or the conditional effects (i.e., accounting for the variability in the random effects)? If your model contains only random intercepts (no random slopes), and you are interested in the uncertainty of the fixed-effect coefficients or the effect on the conditional mean, you can get away with using any school and specifying which = "fixed" in predictInterval, as
pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata,  # <= any school chosen
                                   type = "probability",
                                   which = "fixed"))
Note that the size of the effect will depend on the chosen school and on the remaining coefficients, as in standard models, and is thus not causal.
If you are interested in the marginal effect, there are multiple methods for approximating it. The optimal one would be to bootstrap the predicted values of the marginal mean. Alternatively, if the number of independent groups in your grouping variable is "large" enough, you could (maybe) average the prediction intervals across groups, as illustrated below:
newData <- expand.grid(wt = median(hier_data$wt),
                       ht = median(hier_data$ht),
                       school = levels(hier_data$school),
                       school_funding = seq(from = min(hier_data$school_funding),
                                            to = max(hier_data$school_funding),
                                            length.out = 100))
pp <- predictInterval(merMod = mod,
                      newdata = newData,
                      type = "probability")
# split predictions by every column but school,
# and calculate the estimated means across schools
predictions <- do.call("rbind",
                       lapply(split(as.data.frame(pp),
                                    newData[, !names(newData) == "school"]),
                              colMeans))
rownames(predictions) <- 1:nrow(predictions)
# create a plot
ggplot(as.data.frame(cbind(predictions,
                           funding = newData$school_funding[newData$school == "A"])),
       aes(x = funding, y = fit, ymax = upr, ymin = lwr)) +
  geom_point() +
  geom_errorbar()
For this example the model is more often than not singular and contains very few groups, so the result is unlikely to be a great estimator of the marginal effect, but short of extracting the simulations from predictInterval it might suffice. It is likely to improve for models with more grouping levels in the random effect. predictInterval doesn't seem to incorporate a method for this situation directly.
An alternative for looking at marginal effects would be to assume a marginal mean of the form 1/(1 + exp(-eta)) (which is often assumed for new groupings of the random effect). This isn't directly implemented in the predictInterval function, but it can be achieved by subtracting the random effect from the linear predictor and estimating only the randomness of the fixed effects, as below:
pp <- predictInterval(merMod = mod,
                      newdata = ndata,  # <= any school chosen
                      type = "linear.prediction",
                      which = "fixed")
# remove the random effect from the linear predictor
pp <- sweep(pp, 1, predict(mod, newdata = ndata, random.only = TRUE), "-")
# back-transform to the probability scale
pp <- 1 / (1 + exp(-pp))
which could then be plotted using similar methods.
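For example, a minimal plotting sketch reusing ndata from above (the fit/lwr/upr names follow predictInterval's output):

ggplot(cbind(ndata, pp),
       aes(x = school_funding, y = fit, ymin = lwr, ymax = upr)) +
  geom_point() +
  geom_errorbar()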
For fewer groups this might be a better predictor of the marginal mean (though someone might correct me here). In either case, adding a bit of x-jitter might improve the plot. In all cases there might be some golden nuggets in Ben Bolker and others' GLMM FAQ.

Polynomial regression model to predict values for a variable with 500 rows

There are around 500 values of planned, and on the basis of these I have to predict new values of actual.
Kindly help me with the coding; here is what I am doing manually:
predict(poly_reg, data.frame(planned = 48.80000,
                             Level2 = 48.80000^2,
                             Level3 = 48.80000^3,
                             Level4 = 48.80000^0.25))
# Fitting Linear Regression to the dataset
lin_reg = lm(formula = actual ~ ., data = dataset)
summary(lin_reg)
# Fitting Polynomial Regression to the dataset
dataset$Level2 = dataset$planned^2
dataset$Level3 = dataset$planned^3
dataset$Level4 = dataset$planned^0.25
poly_reg = lm(formula = actual ~ ., data = dataset)
# Visualising the Linear Regression results
# install.packages('ggplot2')
library(ggplot2)
ggplot() +
  geom_point(aes(x = dataset$planned, y = dataset$actual), colour = 'red') +
  geom_line(aes(x = dataset$planned, y = predict(lin_reg, newdata = dataset)),
            colour = 'blue') +
  ggtitle('Truth or Bluff (Linear Regression)') +
  xlab('planned') +
  ylab('actual')
# Predicting a new result with Linear Regression
predict(lin_reg, data.frame(planned = 48.80000))
# Predicting a new result with Polynomial Regression
predict(poly_reg, data.frame(planned = 48.80000, Level2 = 48.80000^2,
                             Level3 = 48.80000^3, Level4 = 48.80000^0.25))
Try using poly on planned to whatever order polynomial you're using (I used 4):
# Fitting Linear Regression to the dataset
lin_reg = lm(formula = actual ~ planned, data = dataset)
# Fitting Polynomial Regression to the dataset
poly_reg = lm(formula = actual ~ poly(planned, 4), data = dataset)
# Predictions for the original data with Linear Regression
predict(lin_reg)
# Predictions for the original data with Polynomial Regression
predict(poly_reg)
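Since poly() takes care of the basis expansion, predicting for new values no longer requires building the Level2/Level3 columns by hand, for example:

# a single new value of planned, or a whole vector at once
predict(poly_reg, newdata = data.frame(planned = 48.8))
predict(poly_reg, newdata = data.frame(planned = dataset$planned))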

R: Bayesian package for nonlinear mixed effects model

I'm looking for a Bayesian parallel to nonlinear mixed-effects models, specifically those fitted with the nlme package in R.
I've come across blme, but that seems to be only for linear mixed-effects models. Would brms be appropriate in this case? I've tried to write some code analogous to the nlme construction below using the function brm.
library(nlme)
model <- nlme(height ~ exp(beta1 * age + 1),
              data = Loblolly,
              fixed = list(beta1 ~ 1),
              random = list(Seed = pdDiag(list(beta1 ~ 1))),
              start = list(fixed = c(beta1 = 3)))
library(brms)
bayesian_model <- brm(bf(height ~ exp(beta1 * age + 1), beta1 ~ 1, nl = TRUE),
                      data = Loblolly,
                      prior = c(prior(normal(0, 1), nlpar = beta1)))
I was able to get to this point, but how exactly do I specify random effects for beta1? And how would I specify the diagonal variance structure like I have with random = list(Seed = pdDiag(list(beta1 ~ 1)))?
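For what it's worth, a sketch of how this is usually written in brms (unverified against the nlme fit above): group-level terms are attached to the nonlinear parameter's own formula. With a single nonlinear parameter there is no random-effects covariance to restrict, so the pdDiag structure is implicit; with several parameters, separate (1 | Seed) terms in each parameter's formula keep the effects uncorrelated (diagonal), while the (1 | s | Seed) syntax would model their correlations.

bayesian_model <- brm(
  bf(height ~ exp(beta1 * age + 1),
     beta1 ~ 1 + (1 | Seed),  # random intercept for beta1 by Seed
     nl = TRUE),
  data = Loblolly,
  prior = prior(normal(0, 1), nlpar = "beta1")
)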

Estimate h2o glm coefficients by a categorical variable level

I would like to estimate a coefficient for a predictor at each level of a categorical variable in h2o glm. For example, if my data frame has product price (continuous) and product type (categorical), then I want to estimate a coefficient for price by product type. In SAS, you can easily accomplish this by specifying the model effect price*type. How can I do the same in h2o or R?
There is an interactions() function, but it cannot handle an interaction between a continuous and a categorical variable. Any tips to get around this problem?
Many thanks,
set.seed(1234)
x1 = rnorm(100, 0, 1)
x2 = as.factor(rep(c("A", "B", "C", "D"), each = 25))
y = as.factor(rep(0:1, each = 50))
data = data.frame(x1 = x1, x2 = x2, y = y)
Interactions can be specified using a ":" in the formula argument:
# glm base example
fit <- glm(data = data, y ~ x1 + x2 + x1:x2, family = "binomial")
print(fit)
Using h2o.glm, pairwise interactions can be specified by passing column indices to the interactions argument:
# h2o.glm example
library(h2o)
h2o.init(nthreads = -1)
data.hex = as.h2o(data)
h2o_fit <- h2o.glm(x = 1:2, y = 3, training_frame = data.hex,
                   family = "binomial", interactions = 1:2)
h2o_fit@model$coefficients_table
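To read off the price-by-type slopes from such a fit: the slope at a given level is the base continuous coefficient plus the matching interaction coefficient. A hypothetical sketch (the interaction term names below are illustrative; check the coefficients table for the exact names h2o generates):

coefs <- h2o.coef(h2o_fit)
# e.g., slope of x1 at level "B" of x2, if the term is named "x1_x2.B":
# coefs["x1"] + coefs["x1_x2.B"]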
h2o.shutdown(prompt = FALSE)
