Errors Plotting a Restricted Cubic Spline with ggplot2 - r

I would like to use ggplot2 to illustrate a fit using a restricted cubic spline using geom_smooth() but it seems to be working incorrectly. Here is a short example:
# rms package Contains Restricted Cubic Splines (RCS)
library(rms)
library(ggplot2)
# Load Data
data(cars)
# Model Fit with RCS
fit <- lm(speed ~ rcs(dist, 5), data=cars)
# Obtain Diagnostic Data
plot.dat <- cbind(cars, fitted=fitted(fit))
# Compare Smooth to Actual
ggplot(data=plot.dat) +
geom_point(aes(x=dist, y=speed)) +
geom_smooth(aes(x=dist, y=speed), method="lm",
formula=y ~ rcs(x, 5), se=FALSE, colour="blue") +
geom_line(aes(y=fitted, x=dist), size=1.25, colour="red")
This results in the following image:
[Image: Comparison of Splines]
I am not sure why geom_smooth() is not giving the correct results. Clearly there is a work-around (as illustrated above), but is there a way to make geom_smooth() produce the correct results?

I don't know how to integrate this with geom_smooth but I can do it with ggplot.Predict from the rms package:
ddist <- datadist(cars)
options(datadist='ddist')
fit <- ols(speed~ rcs(dist,5),data=cars,
x=TRUE, y=TRUE)
ggplot(Predict(fit))+geom_point(data=cars, aes(x=dist, y=speed))

It has been a long time, but I finally recognized the problem, and I thought I would post it here for those interested. Internally, geom_smooth() creates a sequence of predictor values at which to plot the predicted response. Because this sequence is spread evenly across the range of the x-axis rather than distributed like the original data, the knot points selected by rcs() inside geom_smooth() differ from the knot points selected by rcs() on the original data. To address this, you need to pass in the correct knot locations yourself.
# rms package Contains Restricted Cubic Splines (RCS)
library(rms)
library(ggplot2)
# Load Data
data(cars)
# Model Fit with RCS
fit <- lm(speed ~ rcs(dist, 5), data=cars)
# Obtain Diagnostic Data
plot.dat <- cbind(cars, fitted=fitted(fit))
# Compare Smooth to Actual
ggplot(data=plot.dat) +
geom_point(aes(x=dist, y=speed)) +
geom_smooth(aes(x=dist, y=speed), method="lm",
formula=y ~ rcs(x, quantile(plot.dat$dist, probs = c(0.05, 0.275, 0.5, 0.725, 0.95))), se=FALSE, colour="blue") +
geom_line(aes(y=fitted, x=dist), colour="red")
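A slightly tidier variant of the same idea: rather than retyping the quantile probabilities, you can pull the knots the original fit actually used and pass them straight into the geom_smooth() formula. This is a sketch only, and it assumes that rcs() stores its chosen knot locations in the "parms" attribute of the design matrix it returns:
# Sketch: extract the knots rcs() chose on the original data
# (assumption: they are stored in the "parms" attribute of the rcs() design matrix)
knots <- attr(rcs(plot.dat$dist, 5), "parms")
ggplot(data=plot.dat) +
geom_point(aes(x=dist, y=speed)) +
geom_smooth(aes(x=dist, y=speed), method="lm",
formula=y ~ rcs(x, knots), se=FALSE, colour="blue") +
geom_line(aes(y=fitted, x=dist), colour="red")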

Related

R Finding logistic curve with nls

I have a problem with a logistic curve in R for panel data, which are here:
https://docs.google.com/spreadsheets/d/1SO3EzFib7T3XqTz1xZCU2bZFTB0Ddhou/edit?usp=sharing&ouid=110784858039906954607&rtpof=true&sd=true
I tried:
log <- nls(lrc~SSlogis(time, Asym, xmid, scal), data = data_log)
I have an error: Error in qr.solve(QR.B, cc): singular matrix 'a' in solve.
What can I do?
I get a different error:
Error in nls(y ~ 1/(1 + exp((xmid - x)/scal)), data = xy, start = list(xmid = aux[[1L]], :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562
(it's not surprising that different platforms will get slightly different results on numerically "difficult" problems ...)
However, here's what plot(lrc ~ time, data = dd) produces when I use your data:
It seems optimistic to think that you could fit a logistic curve to these data, or that the fit would make very much sense ...
I did find that I could fit a logistic to the logged data, i.e. nls(log(lrc) ...)
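For concreteness, the fit referred to as m2 below would look something like this (a sketch only; the exact call isn't shown above):
# Sketch: self-starting logistic fit to the logged response (assumed form of 'm2')
m2 <- nls(log(lrc) ~ SSlogis(time, Asym, xmid, scal), data = dd)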
plot(log(lrc) ~ time, data = dd)
tvec <- seq(2008, 2016, by = 0.1)
lines(tvec, predict(m2, newdata=data.frame(time=tvec)), col=2, lwd=2)
If I really needed logistic coefficients for this plot (e.g. to compare with other cases) I would fit a linear model to the data, assume that the midpoint was equal to the midpoint of the data, set the scaling parameter equal to the linear slope coefficient divided by 4 (this is a standard rule of thumb for the logistic), and say that the asymptotic value could not be estimated.
Plotting all of your data unit-by-unit doesn't make me much more hopeful that you will be able to fit logistic curves unit-by-unit either. While there might be a few units where a logistic curve is a sensible description of the data, in most cases it isn't. You might have to back up and consider your analytical strategy, i.e., what are you hoping to learn from these data, and how might you go about it? (If you can frame the question suitably you could post on CrossValidated. If you are a student/trainee, you might want to ask your supervisor/teacher/mentor for advice ...)
library(readxl)
library(tidyverse)
(dd <- read_excel("data2.xlsx")
%>% pivot_longer(-code, names_transform = as.numeric,
names_to = "year")
## indices to break groups into chunks/facets
%>% mutate(grp_cat = factor(as.numeric(factor(code)) %% 30))
)
gg1 <- ggplot(dd, aes(year, value, group = code,
colour=code)) + geom_point() +
facet_wrap(~grp_cat, scales="free_y") +
expand_limits(y=0) +
theme_bw() +
theme(legend.position = "none",
panel.spacing = grid::unit(0, "lines"),
axis.text.x = element_blank())
gg1 + geom_smooth(se=FALSE)
ggsave("all.png", width=8, height=8)

Adding fixed effects regression line to ggplot

I am plotting panel data using ggplot and I want to add the regression line for my fixed effects model "fixed" to the plot. This is the current code:
# Fixed Effects Model in plm
fixed <- plm(progenyMean ~ damMean, data=finalDT, model= "within", index = c("sireID", "cropNum"))
# Plotting Function
plotFunction <- function(Data){
ggplot(Data, aes(x=damMean, y=progenyMean)) +
geom_point() +
geom_smooth(method = "lm", se = T, formula=fixed)
}
However, the plot doesn't recognise the geom_smooth() layer and there is no regression line on the plot.
Is it possible to plot a regression line for a fixed effects model here?
OP, please include a reproducible example in your next question so that we can help you better. In this case, I'll answer using the same dataset that is used on Princeton's site here, since I'm not too familiar with the data structure needed by the plm() function from the plm package. I do wish the dataset were one that is more dependably available... but hopefully this example remains illustrative even if the dataset is no longer available.
library(foreign)
library(plm)
library(ggplot2)
library(dplyr)
library(tidyr)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
fixed <-plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
my_lm <- lm(y ~ x1, data=Panel) # including for some reference
Example: Plotting a Simple Linear Regression
Note that I've also referenced a standard linear model - this is to show you how you can extract the values and plot a line from that to match geom_smooth(). Here's an example plot of that data plus a line plotted with the lm() function used by geom_smooth().
plot <- Panel %>%
ggplot(aes(x1, y)) + geom_point() + theme_bw() +
geom_smooth(method="lm", alpha=0.1, color='gray', size=4)
plot
If you want to plot a line that matches the linear regression from geom_smooth(), you can use geom_abline() and specify slope= and intercept=. You can see those values come directly from our my_lm object:
> my_lm
Call:
lm(formula = y ~ x1, data = Panel)
Coefficients:
(Intercept) x1
1.524e+09 4.950e+08
Extracting those values from my_lm$coefficients gives us our intercept and slope (noting that the named vector has the intercept in the first position and the slope in the second). You'll see our new blue line runs directly over the top of the geom_smooth() line - which is why I made that one so fat :).
plot + geom_abline(
slope=my_lm$coefficients[2],
intercept = my_lm$coefficients[1], color='blue')
Plotting line from plm()
The same strategy can be used to plot the line from your predictive model using plm(). Here, it's simpler, since the "within" model from plm() has no overall intercept (effectively an intercept of 0):
> fixed
Model Formula: y ~ x1
Coefficients:
x1
2475617827
Well, then it's pretty easy to plot in the same way:
plot + geom_abline(slope=fixed$coefficients, color='red')
In your case, I'd try this:
ggplot(Data, aes(x=damMean, y=progenyMean)) +
geom_point() +
geom_abline(slope=fixed$coefficients)

Not getting a smooth curve using ggplot2

I am trying to fit a mixed effects model using the lme4 package. Unfortunately I cannot share the data that I am working with, and I couldn't find a toy data set relevant to my problem, so here are the steps I have followed so far:
First I plotted the overall trend of the data as follows:
p21 <- ggplot(data = sub_data, aes(x = age_cent, y = y))
p21+ geom_point() + geom_smooth()
Based on this, there seems to be some nonlinear trend in the data. Hence I tried to fit a quadratic model as follows:
sub_data$age_cent=sub_data$age-mean((sub_data)$age)
sub_data$age_centsqr=(sub_data$age-mean((sub_data)$age))^2
m1= lmer(y ~ 1 + age_cent + age_centsqr +(1 | id) , sub_data, REML = TRUE)
In the above model I only included a random intercept because I don't have enough data to include both a random slope and intercept. Then I extracted the predictions of this model at the population level as follows:
pred1=predict(m1,re.form=NA)
Next I plotted these predictions along with a smooth quadratic function like this:
p21 + geom_point() +
geom_smooth(method = "lm", formula = y ~ I(x) + I(x^2), col = "red") +
geom_line(aes(y = pred1, group = id), col = "blue", lwd = 0.5)
In the above plot, the curve corresponding to the predictions is not smooth. Can anyone help me figure out the reason for that?
Am I doing anything wrong here?
Update:
As eipi10 pointed out, this may be due to fitting different curves for different people.
But when I tried the same thing using a toy data set from the lme4 package, I got the same curve for each person, as follows:
m1 <- lmer(Reaction ~ 1 + I(Days) + (1 + Days | Subject), data = sleepstudy)
pred1new1=predict(m1,re.form=NA)
p21 <- ggplot(data = sleepstudy, aes(x = Days, y = Reaction))
p21+ geom_point() + geom_smooth()
p21+ geom_point() + geom_smooth()+ geom_line(aes(y=pred1new1,group = Subject) ,col="red", lwd = 0.5)
What may be the reason for the different results? Is this due to the unbalance of the data?
The data I used were collected at 3 time steps, and some people didn't have observations for all 3 time steps. But the toy data set is a balanced data set.
Thank you
tl;dr use expand.grid() or something like it to generate a balanced/evenly spaced sample for every group (if you have a strongly nonlinear curve you may want to generate a larger/more finely spaced set of x values than in the original data)
You could also take a look at the sjPlot package, which does a lot of this stuff automatically (a quick sketch is at the end of this answer) ...
You need both an unbalanced data set and a non-linear (e.g. polynomial) model for the fixed effects to see this effect.
if the model is linear, then you don't notice missing values because the linear interpolation done by geom_line() works perfectly
if the data are balanced then there are no gaps to get weirdly filled by linear interpolation
Generate an example with quadratic effects and an unbalanced data set; fit the model
library(lme4)
set.seed(101)
dd <- expand.grid(id=factor(1:10),x=1:10)
dd$y <- simulate(~poly(x,2)+(poly(x,2)|id),
newdata=dd,
family=gaussian,
newparams=list(beta=c(0,0,0.1),
theta=rep(0.1,6),
sigma=1))[[1]]
## subsample randomly (missing values)
dd <- dd[sort(sample(nrow(dd),size=round(0.7*nrow(dd)))),]
m1 <- lmer(y ~ poly(x,2) + (poly(x,2)|id) , data = dd)
Naive prediction and plot:
dd$pred1 <- predict(m1,re.form=NA)
library(ggplot2)
p11 <- (ggplot(data = dd, aes(x = x, y = y))
+ geom_point() + geom_smooth(method="lm",formula=y~poly(x,2))
)
p11 + geom_line(aes(y=pred1,group = id) ,col="red", lwd = 0.5)
Now generate a balanced data set. This version generates 51 evenly spaced points between the min and max - this will be useful if the original data are unevenly spaced. If you have NA values in your x variable, don't forget na.rm=TRUE ...
pframe <- with(dd, expand.grid(id=levels(id), x=seq(min(x), max(x), length.out=51)))
Make predictions, and overlay them on the original plot:
pframe$pred1 <- predict(m1,newdata=pframe,re.form=NA)
p11 + geom_line(data=pframe,aes(y=pred1,group = id) ,col="red", lwd = 0.5)
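As mentioned earlier, the sjPlot package can produce this kind of population-level prediction plot with much less manual work. A minimal sketch (assuming a reasonably current sjPlot; it builds the evenly spaced prediction grid for you):
# Sketch: automatic population-level predictions for the quadratic mixed model m1
library(sjPlot)
plot_model(m1, type = "pred", terms = "x")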

Using ggplot2 to plot an already-existing linear model

Let's say that I have some data and I have created a linear model to fit the data. Then I plot the data using ggplot2 and I want to add the linear model to the plot. As far as I know, this is the standard way of doing it (using the built-in cars dataset):
library(ggplot2)
fit <- lm(dist ~ speed, data = cars)
summary(fit)
p <- ggplot(cars, aes(speed, dist))
p <- p + geom_point()
p <- p + geom_smooth(method='lm')
p
However, the above violates the DRY principle ('don't repeat yourself'): it involves creating the linear model in the call to lm and then recreating it in the call to geom_smooth. This seems inelegant to me, and it also introduces a space for bugs. For example, if I change the model that is created with lm but forget to change the model that is created with geom_smooth, then the summary and the plot won't be of the same model.
Is there a way of using ggplot2 to plot an already existing linear model, e.g. by passing the lm object itself to the geom_smooth function?
What one needs to do is to create a new data frame with the observations from the old one plus the predicted values from the model, then plot that dataframe using ggplot2.
library(ggplot2)
# create and summarise model
cars.model <- lm(dist ~ speed, data = cars)
summary(cars.model)
# add 'fit', 'lwr', and 'upr' columns to dataframe (generated by predict)
cars.predict <- cbind(cars, predict(cars.model, interval = 'confidence'))
# plot the points (actual observations), regression line, and confidence interval
p <- ggplot(cars.predict, aes(speed,dist))
p <- p + geom_point()
p <- p + geom_line(aes(speed, fit))
p <- p + geom_ribbon(aes(ymin=lwr,ymax=upr), alpha=0.3)
p
The great advantage of doing this is that if one changes the model (e.g. cars.model <- lm(dist ~ poly(speed, 2), data = cars)) then the plot and the summary will both change.
Thanks to Plamen Petrov for making me realise what was needed here. As he points out, this approach will only work if predict is defined for the model in question; if not, one has to define it oneself.
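If you do hit a model class without a predict() method, one fallback is to rebuild the fitted values from the design matrix yourself. This is a sketch only, with a hypothetical helper name, and it assumes the model object supports the usual terms() and coef() accessors (it also ignores subtleties around factor levels and contrasts):
# Sketch: manual predictions for a model object lacking a predict() method
# (assumes terms() and coef() work on 'fit'; factor-level handling not shown)
manual_predict <- function(fit, newdata) {
X <- model.matrix(delete.response(terms(fit)), data = newdata)
drop(X %*% coef(fit))
}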
I believe you want to do something along the lines of :
library(ggplot2)
# install.packages('dplyr')
library(dplyr)
fit <- lm(dist ~ speed, data = cars)
cars %>%
mutate( my_model = predict(fit) ) %>%
ggplot() +
geom_point( aes(speed, dist) ) +
geom_line( aes(speed, my_model) )
This will also work for more complex models as long as the corresponding predict method is defined. Otherwise you will need to define it yourself.
In the case of a linear model, you can add the confidence/prediction bands with slightly more work and reproduce your plot; a sketch is below.
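Here is one way to do that extra work in the same pipeline style (a sketch; it reuses the fit object from this answer, and the lwr/upr column names come from predict(..., interval = "confidence")):
# Sketch: add a confidence band around the fitted line
cars %>%
bind_cols( as.data.frame(predict(fit, newdata = cars, interval = "confidence")) ) %>%
ggplot() +
geom_point( aes(speed, dist) ) +
geom_ribbon( aes(speed, ymin = lwr, ymax = upr), alpha = 0.3 ) +
geom_line( aes(speed, fit) )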

Plotting y~1/x with stat_smooth

Good evening,
I am currently trying to plot a reciprocal model of the form y~1/x in a graph. I want to plot the original data, the predicted Y-values and the fitted model (including confidence intervals). Everything is working fine except for plotting the fitted model with the reciprocal 1/x. I can get log transformations etc. done, but the reciprocal transformation will not work for me. I tried:
stat_smooth(method="lm", formula= y ~ 1/x)
stat_smooth(method="lm", formula= y ~ poly(x,-1))
stat_smooth(method="lm", formula= y ~ (x^-1))
None of these work. Is there anything I am missing? I included an example below. Any help is appreciated!
library(ggplot2)
library(car)
df <- Leinhardt
df1 <- na.omit(df)
df1 <- df1[order(df1$infant),]
df1["reincome"] <- 1/(df1$income)
model3 <- lm(infant~reincome, df1)
df1["yhat"]<- predict(model3)
ggplot(df1, aes(x=income, y=infant))+
geom_point()+geom_point(aes(y=df1$yhat), color="red")+
stat_smooth(method="lm",formula= y ~ 1/x)
Use I to indicate that 1/x should be treated ‘as is’ (?AsIs for help):
ggplot(df1, aes(x=income, y=infant))+
geom_point()+geom_point(aes(y=df1$yhat), color="red")+
stat_smooth(method="lm",formula= y ~ I(1/x))
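As a quick sanity check (a sketch): the model that stat_smooth() fits with y ~ I(1/x) is the same model as model3, just expressed in terms of income instead of the precomputed reincome column, so the coefficients should agree:
# Sketch: both calls fit infant ~ 1/income, so the coefficients should match
coef(lm(infant ~ I(1/income), data = df1))
coef(model3)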
