How to plot an ols model with restricted cubic splines in R

I'd like to plot the predicted line of a regression that contains a restricted cubic spline (used because of non-linearity in the model), along with its standard-error bands. I can get the predicted points, but am not sure how to plot just the lines and error bands. ggplot is preferred, but base graphics is fine too. Thanks.
Here is an example from the documentation:
library(rms)
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
pred <- fitted(f) # or predict(f) or f$linear.predictors
f2 <- ols(pred ~ rcs(x1,4) + x2 + x3 + x4, sigma=1)
fastbw(f2, aics=100000)
options(datadist=NULL)
And a plot of the predicted values of the model (note that this plots them against the observation index, not against x1):
plot(predict(f2))

The rms package has a number of helpful functions for this purpose. It is worth looking at http://biostat.mc.vanderbilt.edu/wiki/Main/RmS
In this instance, you can simply set datadist (which sets up distribution summaries for the predictor variables) appropriately and then use plot(Predict(f)) or ggplot(Predict(f)).
library(rms)
set.seed(5)
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
ddist <- datadist(x1,x2,x3,x4)
options(datadist='ddist')
plot(Predict(f))
ggplot(Predict(f))
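If you want to build the plot by hand instead, Predict() returns a data frame containing the point estimates and confidence limits, so you can pass it to ggplot2 yourself. A minimal sketch (the yhat, lower, and upper column names come from the rms documentation):
library(ggplot2)
p <- as.data.frame(Predict(f, x1))  # predictions over x1, other predictors held at datadist defaults
ggplot(p, aes(x = x1, y = yhat)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +  # confidence band
  geom_line()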

Related

Increase the font size of labels in a dendrogram plot in R

How can I increase the font size of the labels (x1, x2, x3, x4) in the plot produced by the varclus function?
library(Hmisc)  # provides varclus()
set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200)
x3 <- x1 + x2 + rnorm(200)
x4 <- x2 + rnorm(200)
x <- cbind(x1,x2,x3,x4)
v <- varclus(x, similarity="spear") # spearman is the default anyway
v # invokes print.varclus
print(round(v$sim,2))
plot(v)
Thanks.
plot.varclus internally calls plot.hclust, as you can see by running:
getS3method("plot",class = 'varclus')
and it passes along the labels argument (and any ... arguments). These include the font-scaling argument cex, so try:
plot(v, cex = 1.5)
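Because the extra arguments are forwarded, you can combine cex with the labels argument mentioned above in one call. A sketch (the replacement label names are purely illustrative):
plot(v, labels = c("VarA", "VarB", "VarC", "VarD"), cex = 1.5)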

Why does xtnbreg, fe in Stata produce different findings than femlm and glm.nb (both with fixed effects) in R?

I have estimated the following negative binomial regression model with group fixed effects in Stata. The data are time-series cross-sectional; the panelvar is group and the timevar is time.
tsset group time
xtnbreg y x1 x2 x3 x4 x5, fe
I want to replicate these findings in R. To do this, I have tried these 4 models:
library(fixest)    # femlm(), fenegbin()
library(MASS)      # glm.nb()
library(glmmADMB)  # glmmadmb()
nb1 <- femlm(y ~ x1 + x2 + x3 + x4 + x5 | group, panel.id = ~group + time, family = "negbin", mydata)
nb2 <- fenegbin(y ~ x1 + x2 + x3 + x4 + x5 | group, panel.id = ~group + time, mydata)
nb3 <- glm.nb(y ~ x1 + x2 + x3 + x4 + x5 + factor(group), data = mydata)
nb4 <- glmmadmb(y ~ x1 + x2 + x3 + x4 + x5 + factor(group), data = mydata, family = "nbinom")
The results produced by nb1-4 are all identical, but different from the results produced by xtnbreg in Stata. The coefficients, standard errors, and p-values are all substantively different.
I have tried replicating a standard negative binomial regression in Stata and R and have been able to do so successfully.
Does anyone have any idea what's going on here? I have reviewed related posts on this forum (such as this one: is there an R function for Stata's xtnbreg?) and have not found any answers.
SOLVED (mostly): The R code that reproduces the results generated by xtnbreg, fe in Stata:
library(pglm)  # panel GLMs, including the within (conditional) negative binomial
nb5 <- pglm(y ~ x1 + x2 + x3 + x4 + x5, family = negbin, data = mydata, effect = "individual", model = "within", index = "group")
I found the solution on RPubs: https://rpubs.com/cuborican/xtpoisson.
I still do not know why this works, only that it does. I suspect that Ben is correct and it has something to do with estimating conditional vs unconditional ML. If anyone knows for sure, please share.

Incremental variance explained in multivariate multiple linear regression

I'm trying to calculate the incremental variance explained by each variable in a multivariate multiple linear regression model, but I don't have sums-of-squares output the way I do in a univariate multiple linear regression. I'd like something like:
library(car)
#Create variables and adjusted the model
set.seed(123)
N <- 100
X1 <- rnorm(N, 175, 7)
X2 <- rnorm(N, 30, 8)
X3 <- abs(rnorm(N, 60, 30))
Y1 <- 0.2*X1 - 0.3*X2 - 0.4*X3 + 10 + rnorm(N, 0, 10)
Y2 <- -0.3*X2 + 0.2*X3 + rnorm(N, 10)
Y <- cbind(Y1, Y2)
dfRegr <- data.frame(X1, X2, X3, Y1, Y2)
(fit <- lm(cbind(Y1, Y2) ~ X1 + X2 + X3, data=dfRegr))
#How do we get the proportion now?
af <- Anova(fit)
afss <- af$"test stat"
print(cbind(af,PctExp=afss/sum(afss)*100))
#
This obviously doesn't work. Is there some kind of approach for this?
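One workaround (my sketch, not from the original thread): since Anova() on an mlm object reports multivariate test statistics rather than sums of squares, you can fall back to a univariate anova() per response and compute percentages from its Sum Sq column. Note these are sequential (Type I) sums of squares, so the shares depend on the order of the predictors:
pct_explained <- function(response) {
  f1 <- lm(reformulate(c("X1", "X2", "X3"), response), data = dfRegr)
  ss <- anova(f1)$"Sum Sq"  # sequential sums of squares, residuals included
  round(ss / sum(ss) * 100, 2)
}
sapply(c("Y1", "Y2"), pct_explained)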

Linear regression of the same outcome, shared covariates, and one unique covariate in each model

I want to run a linear regression for the same outcome with a set of shared covariates plus one unique covariate in each model. I have looked at the example on this page, but it did not provide what I wanted.
Sample data
a <- data.frame(y = c(30,12,18), x1 = c(7,6,9), x2 = c(6,8,5),
x3 = c(4,-2,-3), x4 = c(8,3,-3), x5 = c(4,-4,-2))
m1 <- lm(y ~ x1 + x4 + x5, data = a)
m2 <- lm(y ~ x2 + x4 + x5, data = a)
m3 <- lm(y ~ x3 + x4 + x5, data = a)
How could I run these models concisely, without repeating the same covariates again and again?
Following this example you could do this:
lapply(1:3, function(i) {
  lm(as.formula(sprintf("y ~ x%i + x4 + x5", i)), a)
})
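A variant of the same idea that avoids pasting strings, using reformulate() and keeping the fitted models named (just an alternative sketch):
xs <- paste0("x", 1:3)
models <- lapply(xs, function(v) lm(reformulate(c(v, "x4", "x5"), "y"), data = a))
names(models) <- xs
lapply(models, coef)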

What is the difference between x^2 and I(x^2) in R?

What is the difference between these two models in R?
model1 <- glm(y ~ x + x^2, family = binomial(link = logit), weights = numbers)
model2 <- glm(y ~ x + I(x^2), family = binomial(link = logit), weights = numbers)
Also, what is the equivalent of I(x^2) in SAS?
The I() function means 'as is', whereas the ^n operator (to the power of n) means 'include these variables and all interactions up to n-way'.
This means:
I(X^2) literally regresses Y against X squared, whereas
X^2 means 'include X and the two-way interaction of X', but since there is only one variable there is no interaction, so it reduces to X itself. Note that your formula says X + X^2, which translates to X + X; in formula syntax a duplicated term is only counted once, i.e. one of the two Xs is dropped.
Demonstration:
Y <- runif(100)
X2 <- runif(100)
df <- data.frame(Y, X2)
b <- lm(Y ~ X2 + X2^2 + X2, data = df)
> b

Call:
lm(formula = Y ~ X2 + X2^2 + X2, data = df)

Coefficients:
(Intercept)           X2
    0.48470      0.05098
a <- lm(Y ~ X2 + I(X2^2), data = df)
> a

Call:
lm(formula = Y ~ X2 + I(X2^2), data = df)

Coefficients:
(Intercept)           X2      I(X2^2)
    0.47545      0.11339     -0.06682
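The interaction reading of ^ only becomes visible with more than one variable: with a second predictor, the formula expands into both main effects plus their two-way interaction. A quick sketch continuing the example above:
X1 <- runif(100)
df2 <- data.frame(Y, X1, X2)
lm(Y ~ (X1 + X2)^2, data = df2)  # expands to X1 + X2 + X1:X2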
Hope it helps!