How to plot estimates through model in R

I'm trying to use R to do some modelling, and I've started with the BodyWeight data (from the nlme package), since I've seen some examples online, just to understand and get used to the commands.
I've arrived at my final model, with estimates, and I was wondering how to plot these estimates, but I haven't found anything online.
Is there a way to plot the values of the estimates with a line, and dots for the values of each observation?
Where can I find information about how to do this? Do I have to extract the values myself, or is it possible to just say "plot the estimates of this model"?
I'm only starting with R. Any help is welcome.
Thank you

There is no function that just plots the output of a model, since there are usually many different possible ways of plotting the output.
Take a look at the predict function for whatever model type you are using (for example, linear regressions using lm have a predict.lm function).
Then choose a plotting system (you will likely want different panels for different levels of diet, so use either ggplot2 or lattice). Then see if you can describe more clearly in words how you want the plot to look. Then update your question if you get stuck.
Now we've identified which dataset you are using, here's a possible plot:
# Load the packages used below
library(nlme)     # lme() and the BodyWeight data
library(ggplot2)  # plotting
# Run your model
model <- lme(weight ~ Time + Diet, data = BodyWeight, random = ~ 1 | Rat)
summary(model)
# Predict the values
# predict.lme is a pain because you have to specify which rat
# you are interested in, but we don't want that,
# so predict manually from the fixed effects instead
times <- seq.int(0, 65, 0.1)
mcf <- model$coefficients$fixed
predicted <-
  mcf["(Intercept)"] +
  rep.int(mcf["Time"] * times, nlevels(BodyWeight$Diet)) +
  rep(c(0, mcf["Diet2"], mcf["Diet3"]), each = length(times))
prediction_data <- data.frame(
  weight = predicted,
  Time = rep.int(times, nlevels(BodyWeight$Diet)),
  Diet = rep(levels(BodyWeight$Diet), each = length(times))
)
# Draw the plot (using ggplot2)
(p <- ggplot(BodyWeight, aes(Time, weight, colour = Diet)) +
  geom_point() +
  geom_line(data = prediction_data)
)
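As an alternative to adding up the fixed effects by hand, predict() on an lme fit takes a level argument, and level = 0 asks for population-level predictions that do not depend on any particular rat. A minimal sketch, assuming the same model as above; newdata is an illustrative name, not part of the original answer:

```r
library(nlme)

# Same model as in the answer above
model <- lme(weight ~ Time + Diet, data = BodyWeight, random = ~ 1 | Rat)

# One row per (Time, Diet) combination; Diet keeps the factor levels used in the fit
newdata <- expand.grid(
  Time = seq(0, 65, by = 0.1),
  Diet = factor(levels(BodyWeight$Diet), levels = levels(BodyWeight$Diet))
)

# level = 0 uses the fixed effects only (no rat-specific random intercept)
newdata$weight <- as.numeric(predict(model, newdata = newdata, level = 0))

head(newdata)
```

The resulting data frame has the columns Time, Diet and weight, so it can be passed to geom_line(data = newdata) in the same way as a hand-built prediction table.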

Related

How to graph inc exponential decay in R?

My prof decided that our first experience with coding was going to be trying to fit the function z(t) = A(1-e^(-t/T)) into a given data-set from class using R. I'm completely lost. I keep using lm and nls functions, without quite knowing how they work. So far, I have the data graphed but I have no clue how to get any sort of line more complicated than
mod3<-lm(y~I(x^1/5))
pre3<-predict(mod3)
lines(pre3)
To sum up: how do I find the A and T parameters? Do I use nls for the formula? Anything helps. I'll include a picture of the graph and the data. Please ignore the random lines on the plot.
[Images: graph depicting my dataset; dataset I have to use]
One could attempt to transform your expression into a linear relationship, but sometimes it is easier to just let the computer do the work. As mentioned in the comments, R has the nls function to perform nonlinear regression.
Here is an example using some dummy data. Supply the nls function with your equation, the data frame containing the data, and initial estimates of the parameters.
See comments for additional details.
# create dummy data
A <- 0.8
T1 <- 13
t <- seq(2, 50, 3)
z <- A * (1 - exp(-t / T1))
z <- z + rnorm(length(z), 0, 0.005)  # add noise
# starting data frame
df <- data.frame(t, z)
# solve non-linear model
model <- nls(z ~ A * (1 - exp(-t / Tc)), data = df, start = list(A = 1, Tc = 1))
print(summary(model))
# predict
pred_y <- predict(model, data.frame(t))
# plot
plot(x = t, y = z)
lines(y = pred_y, x = t, col = "blue")
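To read off the fitted A and T directly, coef() on the nls fit returns the parameter estimates as a named vector, and predicting on a finer time grid gives a smoother curve. A sketch on the same kind of dummy data; the seed, the starting guesses and the names est and t_grid are my own choices, not part of the answer above:

```r
# Dummy data with a fixed seed so the run is reproducible
set.seed(42)
t <- seq(2, 50, 3)
z <- 0.8 * (1 - exp(-t / 13)) + rnorm(length(t), 0, 0.005)
df <- data.frame(t, z)

# Fit z(t) = A * (1 - exp(-t / Tc)).
# Tc is used rather than T, because T is also R's built-in alias for TRUE.
# Rough starting guesses: A near the observed plateau, Tc on the scale of t.
model <- nls(z ~ A * (1 - exp(-t / Tc)), data = df,
             start = list(A = max(df$z), Tc = 10))

# Named vector of parameter estimates
est <- coef(model)
print(est)

# Evaluate the fitted curve on a fine grid and overlay it on the data
t_grid <- data.frame(t = seq(min(t), max(t), length.out = 200))
plot(z ~ t, data = df)
lines(t_grid$t, predict(model, newdata = t_grid), col = "blue")
```

With real data, the same two ideas usually give workable starting values: the level at which the curve flattens out for A, and a time of the same order as the x-axis range for the time constant.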

Extracting baseline hazards from stpm2-object

I need to extract the baseline hazards from a general survival model (GSM) that I've constructed using the rstpm2-package (a conversion of the stpm2 module in stata).
Using the data in the rstpm2 package, let's take this as an example:
library(rstpm2)
gsm <- stpm2(Surv(rectime,censrec==1)~hormon, data=brcancer, df=3)
sum.gsm <- summary(gsm)
So I've noticed that the summary has an element named bhazard:
sum.gsm@args$bhazard
However, it seems to be filled with zeroes and holds one value per patient. As far as I understand, the baseline hazard should consist of one hazard for every time point in the data.
Does anyone have any experience that could be of assistance?
You can use the plot and lines methods to plot survival and a number of other predictions. For example:
plot(gsm, newdata=data.frame(hormon=0))
Note that you need to specify the newdata argument. For more general plots, you can get predictions on a time grid with full covariates with standard errors using:
out <- predict(gsm,newdata=data.frame(hormon=0:1), grid=TRUE,
full=TRUE, se.fit=TRUE)
Then you could use ggplot2 or lattice to plot the intervals. For example,
library(ggplot2)
ggplot(out, aes(x=rectime, y=Estimate, ymin=lower, ymax=upper,
fill=factor(hormon))) +
geom_ribbon(alpha=0.6) + geom_line()
Edit: to predict survival at specific times, you can include the times in the newdata argument:
predict(gsm, newdata=data.frame(hormon=0,rectime=1:365))
(Note that survival is defined in terms of log(time), hence I have excluded rectime=0.)
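Since the question asks about the baseline hazard specifically, it may help that the same predict and plot methods also take a type argument. A sketch, assuming that "baseline" means the hazard with all covariates at their reference level (here hormon = 0); haz is an illustrative name:

```r
library(rstpm2)

# Same model as in the question
gsm <- stpm2(Surv(rectime, censrec == 1) ~ hormon, data = brcancer, df = 3)

# Hazard over a grid of times for the reference group (hormon = 0),
# returned as a data frame that includes Estimate, lower and upper columns
haz <- predict(gsm, newdata = data.frame(hormon = 0), type = "hazard",
               grid = TRUE, full = TRUE, se.fit = TRUE)
head(haz)

# The plot method accepts the same type argument
plot(gsm, newdata = data.frame(hormon = 0), type = "hazard")
```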

Plotting random slopes from glmer model using sjPlot

In the past, I had used the sjp.glmer from the package sjPlot to visualize the different slopes from a generalized mixed effects model. However, with the new package, I can't figure out how to plot the individual slopes, as in the figure for the probabilities of fixed effects by (random) group level, located here
Here is the code that, I think, should allow for the production of the figure. I just can't seem to get it in the new version of sjPlot.
library(lme4)
library(sjPlot)
data(efc)
# create binary response
efc$hi_qol = 0
efc$hi_qol[efc$quol_5 > mean(efc$quol_5,na.rm=T)] = 1
# prepare group variable
efc$grp = as.factor(efc$e15relat)
# data frame for 2nd fitted model
mydf <- na.omit(data.frame(hi_qol = as.factor(efc$hi_qol),
                           sex = as.factor(efc$c161sex),
                           c12hour = as.numeric(efc$c12hour),
                           neg_c_7 = as.numeric(efc$neg_c_7),
                           grp = efc$grp))
# fit 2nd model
fit2 <- glmer(hi_qol ~ sex + c12hour + neg_c_7 + (1|grp),
              data = mydf,
              family = binomial("logit"))
I have tried to graph the model using the following code.
plot_model(fit2,type="re")
plot_model(fit2,type="prob")
plot_model(fit2,type="eff")
I think that I may be missing a flag, but after reading through the documentation, I can't find out what that flag may be.
Looks like this might do what you want:
(pp <- plot_model(fit2,type="pred",
terms=c("c12hour","grp"),pred.type="re"))
type="pred": plot predicted values
terms=c("c12hour", "grp"): include c12hour (as the x-axis variable) and grp in the predictions
pred.type="re": random effects
I haven't been able to get confidence-interval ribbons yet (tried ci.lvl=0.9, but no luck ...)
pp+facet_wrap(~group) comes closer to the plot shown in the linked blog post (each random-effects level gets its own facet ...)
Ben already posted the correct answer. sjPlot uses the ggeffects-package for marginal effects plot, so an alternative would be using ggeffects directly:
library(ggeffects)
plot(ggpredict(fit2, terms = c("c12hour", "grp"), type = "re"))
There's a new vignette describing how to get marginal effects for mixed models / random effects. However, confidence intervals are currently not available for this plot-type.
The type = "ri.prob" option in the linked blog-post did not adjust for covariates, that's why I first removed that option and later re-implemented it (correctly) in ggeffects / sjPlot. The confidence intervals shown in the linked blog-post are not correct, either. Once I figure out a way how to obtain CI or prediction intervals, I'll add this option as well.

How to plot log odds for a binary outcome against a continuous variable in ggplot2?

I am trying to generate some graphs to visualize the functional form of different variables for exploratory data analysis and simple modeling. I have a binary outcome variable, and some continuous predictor variables. What I would like to do is make a graph in ggplot2 that compares the log odds of my outcome variable in the Y axis to the predictor variable in the X-axis. This way I can visually estimate if the predictor variable needs to be transformed before using it in a linear regression.
Disclaimer: I am a novice at both R and Statistics.
So far the closest thing I've found is in the Hmisc package:
library(tidyverse)
library(Hmisc)
df <- mtcars %>% mutate(mpg = if_else((mpg > 18), 1, 0))
plsmo(
  df$hp,
  df$mpg,
  method = "lowess",
  datadensity = TRUE,
  ylab = "Log odds of mpg > 18 as a function of hp",
  fun = plogis,
  lty = 3
)
Even in this case, it doesn't quite make a scatterplot of the outcome variables, and instead represents the data densities in the form of the tick marks on the line. If I could at least replicate that graph, it might be a good starting point. My first thought would be to do it manually by breaking the predictor variable into bins and calculating the odds for each bin, but this might require a large degree of fiddling for each graph.
This is close but doesn't help much with seeing relationships.
Thanks!
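The binning idea described in the question can be sketched in base R without much fiddling. The version below assumes five quantile-based bins and uses the empirical logit (adding 0.5 to the counts) so that bins where every or no car exceeds 18 mpg still give a finite value; the bin count and all variable names are illustrative choices. Note also that qlogis(), not plogis(), is the function that maps a probability to log odds.

```r
# Binary outcome and continuous predictor, as in the question
df <- transform(mtcars, high_mpg = as.integer(mpg > 18))

# Split hp into 5 bins with roughly equal counts (quantile breaks)
breaks <- quantile(df$hp, probs = seq(0, 1, length.out = 6))
df$bin <- cut(df$hp, breaks = breaks, include.lowest = TRUE)

# Per-bin summaries: mean hp, number of "successes", bin size
bin_hp <- tapply(df$hp, df$bin, mean)
events <- tapply(df$high_mpg, df$bin, sum)
n      <- tapply(df$high_mpg, df$bin, length)

# Empirical logit: the 0.5 correction keeps bins at 0% or 100% finite
emp_logit <- log((events + 0.5) / (n - events + 0.5))

plot(bin_hp, emp_logit, type = "b",
     xlab = "Mean hp in bin", ylab = "Empirical log odds of mpg > 18")
```

If the points fall roughly on a straight line, the predictor can plausibly enter a logistic regression untransformed; clear curvature suggests trying a transformation. The same data frame can be handed to ggplot2 with geom_point() and geom_line() if that is the preferred plotting system.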

add a logarithmic regression line to a scatterplot (comparison with Excel)

In Excel, it's pretty easy to fit a logarithmic trend line to a given set of data points: just click "Add Trendline" and then select "Logarithmic". Having switched to R for more power, I am a bit lost as to which function one should use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But the code does local polynomial regression fitting which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
Alternatively, I would like to get a log equation of the form y = (c*ln(x))+b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are the y-values, while the x-values are simply the integers 1:length(y). In Excel, I can simply plot this and add a logarithmic trend line, and the result would look like this:
[Image: Excel chart, with the logarithmic trend line in black]
In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
         y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
                   interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
      predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula:
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
I'm pretty sure a simple + scale_y_log10() would get you what you wanted. ggplot2 stats are calculated after scale transformations, so the loess() would then be calculated on the log-transformed data.
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.
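For the specific form asked about in the question, y = c*ln(x) + b, the two numbers can be picked out of coef() by name. A short sketch using the full-precision data from the question; c_slope and b are illustrative names (c itself is avoided because it is a base R function):

```r
# y-values from the question; x is simply 1, 2, ..., length(y)
y <- c(0.599885189, 0.588404133, 0.577784156, 0.567164179, 0.556257176,
       0.545350172, 0.535112897, 0.52449292, 0.51540375, 0.507271336,
       0.499904325, 0.498851894, 0.498851894, 0.497321087, 0.4964600,
       0.495885955, 0.494068121, 0.492154612, 0.490145427, 0.486892461,
       0.482395714, 0.477229238, 0.471010333)
x <- seq_along(y)

# Ordinary least squares on log(x)
fit <- lm(y ~ log(x))

# Map lm's coefficient names onto y = c * ln(x) + b
b       <- unname(coef(fit)["(Intercept)"])
c_slope <- unname(coef(fit)["log(x)"])
cat(sprintf("y = %.4f * ln(x) + %.4f\n", c_slope, b))

# Draw the data and the fitted logarithmic curve
plot(x, y)
curve(c_slope * log(x) + b, from = 1, to = length(y), add = TRUE)
```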
