Editing the y axis on a sjplot in R (plot_model) - r

I am having trouble with my y axis on this sjplot I have created. I am not sure why the values are arranged like that (see image) and was wondering if anyone could help me e.g. set my y axis to start at 0.
library(jtools)
library(carData)
library(effects)
library(sjPlot)
library(lme4) # provides glmer()
mod <- glmer(Golden.Trevally ~ Maturity.Status + Behavioural.Activity + (1 | ID.Number), family = "binomial", data = mydf2)
summary(mod)
plot_model(mod, "pred", title="")

By far the hardest part of answering this question was recreating your data to make it reproducible. However, the following is pretty close:
library(jtools)
library(carData)
library(effects)
library(sjPlot)
library(lme4)
set.seed(69)
Behavioural.Activity <- factor(sample(c("Cleaning", "Courtship",
"Cruising", "Feeding"),
size = 10000,
replace = TRUE))
Maturity.Status <- factor(sample(LETTERS[1:3], 10000, TRUE))
ID.Number <- factor(sample(500, 10000, TRUE))
Golden.Trevally <- rbinom(10000, 1, prob =
(c(6, 4, 7, 3)/600)[as.numeric(Behavioural.Activity)] *
c(0.8, 1, 1.2)[as.numeric(Maturity.Status)] *
(as.numeric(ID.Number) / 1000 + 0.75))
mydf2 <- data.frame(ID.Number, Golden.Trevally,
Behavioural.Activity, Maturity.Status)
mod <- glmer(Golden.Trevally ~ Maturity.Status + Behavioural.Activity + (1 | ID.Number),
family = "binomial", data = mydf2)
my_sjplot <- plot_model(mod, "pred", title = "")
my_sjplot$Behavioural.Activity
The solution here is to realize that the object returned by plot_model is a list containing two ggplot objects. You are seeing the one for Behavioural.Activity. It looks the way it does because it has a scale_y_continuous whose labelling function rounds the break labels to the nearest percent. You can simply override this scale with one of your own:
my_sjplot$Behavioural.Activity +
scale_y_continuous(limits = c(0, 0.01),
labels = scales::percent_format(accuracy = 0.01))
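To illustrate how replacing a scale works in general, here is a minimal sketch that does not need the mixed model at all. The data frame pred and its values are made up for illustration and only stand in for the plot that plot_model returns; the upper limit of 0.015 is likewise an arbitrary choice.

```r
library(ggplot2)

# Hypothetical predicted probabilities, standing in for the plot_model() output
pred <- data.frame(
  activity  = c("Cleaning", "Courtship", "Cruising", "Feeding"),
  predicted = c(0.010, 0.007, 0.012, 0.005)
)

# A plot whose y labels are rounded to whole percentages
p <- ggplot(pred, aes(activity, predicted)) +
  geom_point() +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))

# Adding a second y scale replaces the first one (ggplot2 prints a message):
# the axis now starts at 0 and labels carry one decimal place
p_fixed <- p +
  scale_y_continuous(limits = c(0, 0.015),
                     labels = scales::percent_format(accuracy = 0.1))
p_fixed
```

Note that limits set inside scale_y_continuous drop any data outside the range, so pick limits that cover the predictions and their intervals.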

Related

Mixed model plotting with R - showing the data points

I have run a mixed effects binary model using the following code:
model = glmer(A ~ B + (1|C), data = data, family = "binomial")
summary(model)
I am now plotting the marginal fixed effects for a variable of interest (B). I have taken the code from the nice page on:
https://cran.r-project.org/web/packages/ggeffects/vignettes/practical_logisticmixedmodel.html
To produce the graph I have used:
ggpredict(model, "B")
plot(ggpredict(model, "B"))
The following is created, which I like. But I also want the data points from the variable B to show on the graph. How can I add these in? Thanks.
Welcome to Stack Overflow :)
Sadly, I don't know how to add points to your plot of the ggpredict object (or whether it is possible), since I am no good with ggplot :/
But I can offer a workaround with base plotting. The only thing missing is the grey confidence band, which may be crucial for good looks? :D
Cheers
#using the example data from the link you provided:
library(magrittr)
library(ggeffects)
library(sjmisc)
library(lme4)
library(splines)
set.seed(123)
#creating the data:
dat <- data.frame(
outcome = rbinom(n = 100, size = 1, prob = 0.35),
var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
var_cont = rnorm(n = 100, mean = 10, sd = 7),
group = sample(letters[1:4], size = 100, replace = TRUE)
)
dat$var_cont <- sjmisc::std(dat$var_cont)
#model creation:
m1 <- glmer( outcome ~ var_binom + var_cont + (1 | group),
data = dat,
family = binomial(link = "logit")
)
#save results:
m1_results <- ggpredict(m1, "var_cont")
#same plot you did:
plot(m1_results)
#workaround using baseplot:
#plotting the raw data:
plot(dat$outcome~dat$var_cont,
pch = 16,
ylab = "outcome",
xlab = "var_cont",
yaxt = "n")
#adding yaxis with percentages:
axis(2, at = pretty(dat$outcome), labels = paste0(pretty(dat$outcome) * 100, " %"), las = 1)
#adding the model taken from ggpredict:
lines(m1_results$predicted~m1_results$x,
type = "l")
#upper and lower conf intervals:
lines(m1_results$conf.low~m1_results$x,
lty=2)
lines(m1_results$conf.high~m1_results$x,
lty=2)
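For readers who would rather stay in ggplot2, the same idea (predicted curve, confidence band, raw points) can be sketched as below. This is only a sketch under simplifying assumptions: it swaps the mixed model for a plain glm() so that it needs nothing beyond ggplot2, and the simulated data and the names grid and link are made up for illustration. With ggeffects, the data frame returned by ggpredict could play the role of grid.

```r
library(ggplot2)

set.seed(123)
# Simulated data (hypothetical), loosely following the example above
dat <- data.frame(var_cont = rnorm(100))
dat$outcome <- rbinom(100, 1, plogis(-0.5 + 0.8 * dat$var_cont))

# Plain logistic regression instead of glmer(), to keep the sketch small
fit <- glm(outcome ~ var_cont, data = dat, family = binomial)

# Predictions on a grid, with an approximate 95% interval built on the link scale
grid <- data.frame(var_cont = seq(min(dat$var_cont), max(dat$var_cont),
                                  length.out = 50))
link <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
grid$predicted <- plogis(link$fit)
grid$conf.low  <- plogis(link$fit - 1.96 * link$se.fit)
grid$conf.high <- plogis(link$fit + 1.96 * link$se.fit)

# Ribbon and line come from the predictions, points from the raw data
p <- ggplot(grid, aes(var_cont, predicted)) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
  geom_line() +
  geom_point(data = dat, aes(y = outcome), alpha = 0.4) +
  scale_y_continuous(labels = scales::percent)
p
```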

How to visualise an interaction effect using ggplot with method = glmmTMB::glmmTMB and point weights

I am running an analysis in R on the effect of canopy cover (OverheadCover) and the number of carcasses placed on the same location (CarcassNumber) on the proportion of carrion eaten by birds (ProportionBirdsScavenging). The interaction effect OverheadCover * CarcassNumber is significant and I would like to visualise this using ggplot as explained here: https://sebastiansauer.github.io/vis_interaction_effects/. I won't be using method = "lm" like in the example, but method = glmmTMB::glmmTMB. I've added the extra arguments formula = and method.args = to make sure R computes the smooth correctly.
This is how it should look, but I prefer the graph to be made with ggplot because then all my graphs will be in the same style.
glmm_interaction <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1|Area), data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled)
plot_model(glmm_interaction, type = "int", ci.lvl = 0.682) # conf. int. of 68.3% -> ± standard error
This is the code I'm trying to run, but I can't get it to work. It keeps giving me errors, like object 'pointWeight_scaled' not found. Does anyone have an idea what I'm doing wrong here?
qplot(x = OverheadCover, y = ProportionBirdsScavenging, color = CarcassNumber, data = data_both) +
geom_smooth(method = glmmTMB::glmmTMB,
formula = ProportionBirdsScavenging ~ OverheadCover * CarcassNumber,
method.args = list(data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled))
I know that it might be easier to just individually run the models and plot them on the same graph. I've done that, and it works. However, my calculated standard errors are larger than the ones in the plot_model(), so I wanted to see how these standard errors look if R does all the work, hence my intention to plot it this way.
"This is how it should look, but I prefer the graph to be made with ggplot"
The plot returned by plot_model() is a ggplot object, which you can modify as you like. You could also use the ggeffects package, which returns the underlying data that can be used to create the plot. There are many examples in the package vignettes, both on how to create your own plots and on how to modify plots returned by plot().
Here is a toy example:
library(ggplot2)
library(ggeffects)
library(lme4)
#> Loading required package: Matrix
set.seed(123)
dat <- data.frame(
outcome = rbinom(n = 500, size = 1, prob = 0.25),
var_binom = as.factor(rbinom(n = 500, size = 1, prob = 0.3)),
var_cont = rnorm(n = 500, mean = 10, sd = 3),
group = sample(letters[1:4], size =500, replace = TRUE)
)
model <- glmer(
outcome ~ var_binom * poly(var_cont, 2) + (1 | group),
data = dat,
family = binomial(link = "logit")
)
predictions <- ggpredict(model, c("var_cont [all]", "var_binom"))
# plot-function from ggeffects
plot(predictions)
# self made ggplot
ggplot(
predictions,
aes(x = x, y = predicted, ymin = conf.low, ymax = conf.high, colour = group, fill = group)
) +
geom_line() +
geom_ribbon(alpha = .1, colour = NA) +
theme_minimal()
Created on 2020-02-06 by the reprex package (v0.3.0)
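As a side note on the 'pointWeight_scaled' not found error from the question: one plausible cause (my reading, not something the answer above states) is that list() evaluates its arguments immediately in the calling environment, so a bare column name inside method.args = list(...) is not looked up in data_both. The small sketch below shows that behaviour with a made-up data frame.

```r
# Hypothetical stand-in for the question's data frame
data_both <- data.frame(
  ProportionBirdsScavenging = c(0.2, 0.4, 0.6),
  pointWeight_scaled        = c(1, 2, 3)
)

# list() evaluates its arguments right away, so a bare column name fails
# unless an object of that name exists in the calling environment
msg <- tryCatch(
  list(weights = pointWeight_scaled),
  error = function(e) conditionMessage(e)
)
print(msg)

# Referring to the column through the data frame works
args_ok <- list(weights = data_both$pointWeight_scaled)
str(args_ok)
```

Whether geom_smooth then handles the weights as intended is a separate matter; building the plot from model predictions, as shown in the answer, avoids the issue.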

Replicating lattice graph for a mixed model

I am trying to replicate a lattice graph using ggplot2 for a mixed model. My ggplot graph looks very similar, but I am not sure about the fitted loess line.
My goal is to add a loess line from the mixed model using ggplot2. Below is an example of my commands :
library(nlme)
library(ggplot2)
library(lattice)
library(lme4)
data(MathAchieve)
attach(MathAchieve)
mses <- tapply(SES, School, mean)
mses[as.character(MathAchSchool$School[1:10])]
Bryk <- as.data.frame(MathAchieve[, c("School", "SES", "MathAch")])
names(Bryk) <- c("school", "ses", "mathach")
sample20 <- sort(sample(7185, 20)) # 20 randomly sampled students
Bryk$meanses <- mses[as.character(Bryk$school)]
Bryk$cses <- Bryk$ses - Bryk$meanses
sector <- MathAchSchool$Sector
names(sector) <- row.names(MathAchSchool)
Bryk$sector <- sector[as.character(Bryk$school)]
attach(Bryk)
cat <- sample(unique(school[sector=="Catholic"]), 20)
Cat.20 <- groupedData(mathach ~ ses | school, data=Bryk[is.element(school, cat),])
Graph with Lattice:
trellis.device(color=T)
xyplot(mathach ~ ses | school, data=Cat.20, main="Catholic",
panel=function(x, y) {
panel.loess(x, y, span=1)
panel.xyplot(x, y)
panel.lmline(x, y, lty=2)
})
Graph with ggplot:
ggplot(Cat.20, aes(x = ses, y =mathach )) +
geom_point(size=1, shape=1) +
stat_smooth(method="lm",se=F)+
stat_smooth(colour="Red",se=F)+
facet_wrap(school~., scale = "free_y")
Any advice will be appreciated.
Preamble
Before going into the explanation, allow me to refer you to this question: Why is it not advisable to use attach() in R, and what should I use instead?
While it's commendable that you made your question reproducible, the code you used could do with some clean-up. For example:
Don't include packages that aren't used in the code (I didn't see a need for the lme4 package);
There's no need to use data(...) to load MathAchieve. See the "Good Practices" section from ?data for more details.
As mentioned above, don't use attach().
For complete reproducibility, use set.seed() before any random sampling.
For a minimal example, don't plot 20 schools when a smaller number would do.
Since you are using one of the tidyverse packages for plotting, I recommend another from its collection for data manipulation:
library(nlme)
library(ggplot2)
library(lattice)
library(dplyr)
Bryk <- MathAchieve %>%
select(School, SES, MathAch) %>%
group_by(School) %>%
mutate(meanses = mean(SES),
cses = SES - meanses) %>%
ungroup() %>%
left_join(MathAchSchool %>% select(School, Sector),
by = "School")
colnames(Bryk) <- tolower(colnames(Bryk))
set.seed(123)
cat <- sample(unique(Bryk$school[Bryk$sector == "Catholic"]), 2)
Cat.2 <- groupedData(mathach ~ ses | school,
data = Bryk %>% filter(school %in% cat))
Explanation
Now that that's out of the way, let's look at the relevant functions for loess:
from ?panel.loess:
panel.loess(x, y, span = 2/3, degree = 1,
family = c("symmetric", "gaussian"),
... # omitted for space
)
from ?stat_smooth:
stat_smooth(mapping = NULL, data = NULL, geom = "smooth",
method = "auto", formula = y ~ x, span = 0.75, method.args = list(),
... # omitted for space
)
where method = "auto" defaults to loess from the stats package for <1000 observations.
from ?loess:
loess(formula, data, span = 0.75, degree = 2,
family = c("gaussian", "symmetric"),
... #omitted for space
)
In short, a loess plot's default parameters are span = 2/3, degree = 1, family = "symmetric" for the lattice package, and span = 0.75, degree = 2, family = "gaussian" for the ggplot2 package. You have to specify matching parameters if you want the resulting plots to match:
xyplot(mathach ~ ses | school, data = Cat.2, main = "Catholic",
panel=function(x, y) {
panel.loess(x, y, span=1, col = "red") # match ggplot's colours
panel.xyplot(x, y, col = "black") # to facilitate comparison
panel.lmline(x, y, lty=2, col = "blue")
})
ggplot(Cat.2, aes(x = ses, y = mathach)) +
geom_point(size = 2, shape = 1) +
stat_smooth(method = "lm", se = F)+
stat_smooth(span = 1,
method.args = list(degree = 1, family = "symmetric"),
colour = "red", se = F)+
facet_wrap(school ~ .) +
theme_classic() # less cluttered background to facilitate comparison
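To see how much the defaults matter, here is a small base R sketch. It uses the built-in cars data rather than MathAchieve (an assumption made purely to keep it self-contained) and fits loess twice: once with the stats::loess defaults that ggplot2 picks up, and once with the lattice-style settings listed above.

```r
# Evaluation grid spanning the observed speeds
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 50))

# stats::loess defaults: span = 0.75, degree = 2, family = "gaussian"
fit_default <- loess(dist ~ speed, data = cars)

# lattice-style settings: span = 2/3, degree = 1, family = "symmetric"
fit_lattice <- loess(dist ~ speed, data = cars,
                     span = 2/3, degree = 1, family = "symmetric")

curve_default <- predict(fit_default, newdata = grid)
curve_lattice <- predict(fit_lattice, newdata = grid)

# Overlay both curves on the raw data to compare them
plot(dist ~ speed, data = cars)
lines(grid$speed, curve_default, col = "blue")
lines(grid$speed, curve_lattice, col = "red", lty = 2)

# Largest absolute gap between the two curves
max(abs(curve_default - curve_lattice))
```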

Having several fits in one plot (in R)

I was wondering how I can modify the following code to have a plot something like
data(airquality)
library(quantreg)
library(ggplot2)
library(data.table)
library(devtools)
# source Quantile LOESS
source("https://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")
airquality2 <- na.omit(airquality[ , c(1, 4)])
#'' quantreg::rq
rq_fit <- rq(Ozone ~ Temp, 0.95, airquality2)
rq_fit_df <- data.table(t(coef(rq_fit)))
names(rq_fit_df) <- c("intercept", "slope")
#'' quantreg::lprq
lprq_fit <- lapply(1:3, function(bw){
fit <- lprq(airquality2$Temp, airquality2$Ozone, h = bw, tau = 0.95)
return(data.table(x = fit$xx, y = fit$fv, bw = paste0("bw=", bw), fit = "quantreg::lprq"))
})
#'' Quantile LOESS
ql_fit <- Quantile.loess(airquality2$Ozone, jitter(airquality2$Temp), window.size = 10,
the.quant = .95, window.alignment = c("center"))
ql_fit_df <- data.table(x = ql_fit$x, y = ql_fit$y.loess, bw = "bw=1", fit = "Quantile LOESS")
I want to have all these fits in a plot.
geom_quantile can calculate quantiles using the rq method internally, so we don't need to create the rq_fit_df separately. However, the lprq and Quantile LOESS methods aren't available within geom_quantile, so I've used the data frames you provided and plotted them using geom_line.
In addition, to include the rq line in the color and linetype mappings and in the legend we add aes(colour="rq", linetype="rq") as a sort of "artificial" mapping inside geom_quantile.
library(dplyr) # For bind_rows()
ggplot(airquality2, aes(Temp, Ozone)) +
geom_point() +
geom_quantile(quantiles=0.95, formula=y ~ x, aes(colour="rq", linetype="rq")) +
geom_line(data=bind_rows(lprq_fit, ql_fit_df),
aes(x, y, colour=paste0(gsub("q.*:","",fit),": ", bw),
linetype=paste0(gsub("q.*:","",fit),": ", bw))) +
theme_bw() +
scale_linetype_manual(values=c(2,4,5,1,1)) +
labs(colour="Method", linetype="Method",
title="Different methods of estimating the 95th percentile by quantile regression")
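The general pattern used here (stack the fitted curves into one long data frame, then map colour and linetype to a column naming the method) can be sketched with simpler fits. The sketch below swaps the quantile methods for lm() and loess() so that it runs with ggplot2 alone; the names aq, grid and fits are made up for illustration.

```r
library(ggplot2)

aq <- na.omit(airquality[, c("Ozone", "Temp")])

# Common grid on which every fit is evaluated
grid <- data.frame(Temp = seq(min(aq$Temp), max(aq$Temp), length.out = 50))

# One long data frame: a row per grid point and method
fits <- rbind(
  data.frame(x = grid$Temp,
             y = predict(lm(Ozone ~ Temp, data = aq), grid),
             method = "lm"),
  data.frame(x = grid$Temp,
             y = predict(loess(Ozone ~ Temp, data = aq), grid),
             method = "loess")
)

# Mapping colour and linetype to `method` gives one legend entry per fit
p <- ggplot(aq, aes(Temp, Ozone)) +
  geom_point() +
  geom_line(data = fits, aes(x, y, colour = method, linetype = method)) +
  theme_bw() +
  labs(colour = "Method", linetype = "Method")
p
```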

R Language - Sorting data into ranges; averaging; ignore outliers

I am analyzing data from a wind turbine. Normally this is the sort of thing I would do in Excel, but the quantity of data requires something heavy-duty. I have never used R before and so I am just looking for some pointers.
The data consists of 2 columns, WindSpeed and Power. So far I have imported the data from a CSV file and scatter-plotted the two against each other.
What I would like to do next is to sort the data into ranges; for example all data where WindSpeed is between x and y and then find the average of power generated for each range and graph the curve formed.
From this average I want to recalculate the average based on data which falls within one or two standard deviations of the average (basically ignoring outliers).
Any pointers are appreciated.
For those who are interested, I am trying to create a graph similar to this. It's a pretty standard type of graph, but like I said, the sheer quantity of data requires something heavier than Excel.
Since you're no longer in Excel, why not use a modern statistical methodology that doesn't require crude binning of the data and ad hoc methods to remove outliers: locally smooth regression, as implemented by loess.
Using a slight modification of csgillespie's sample data:
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
plot(w_sp, power)
x_grid <- seq(0, 100, length = 100)
lines(x_grid, predict(loess(power ~ w_sp), x_grid), col = "red", lwd = 3)
Throw this version, similar in motivation to @hadley's, into the mix: an additive model with an adaptive smoother, using package mgcv.
Dummy data first, as used by @hadley:
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
df <- data.frame(power = power, w_sp = w_sp)
Fit the additive model using gam(), using an adaptive smoother and smoothness selection via REML
require(mgcv)
mod <- gam(power ~ s(w_sp, bs = "ad", k = 20), data = df, method = "REML")
summary(mod)
Predict from our model and get standard errors of fit, use latter to generate an approximate 95% confidence interval
x_grid <- with(df, data.frame(w_sp = seq(min(w_sp), max(w_sp), length = 100)))
pred <- predict(mod, x_grid, se.fit = TRUE)
x_grid <- within(x_grid, fit <- pred$fit)
x_grid <- within(x_grid, upr <- fit + 2 * pred$se.fit)
x_grid <- within(x_grid, lwr <- fit - 2 * pred$se.fit)
Plot everything and the Loess fit for comparison
plot(power ~ w_sp, data = df, col = "grey")
lines(fit ~ w_sp, data = x_grid, col = "red", lwd = 3)
## upper and lower confidence intervals ~95%
lines(upr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
lines(lwr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
## add loess fit from @hadley's answer
lines(x_grid$w_sp, predict(loess(power ~ w_sp, data = df), x_grid), col = "blue",
lwd = 3)
First we will create some example data to make the problem concrete:
w_sp = sample(seq(0, 100, 0.01), 1000)
power = 1/(1+exp(-(rnorm(1000, mean=w_sp, sd=5) -40)/5))
Suppose we want to bin the power values between [0,5), [5,10), etc. Then
bin_incr = 5
bins = seq(0, 95, bin_incr)
y_mean = sapply(bins, function(x) mean(power[w_sp >= x & w_sp < (x+bin_incr)]))
We have now created the mean values between the ranges of interest. Note, if you wanted the median values, just change mean to median. All that's left to do, is to plot them:
plot(w_sp, power)
points(seq(2.5, 97.5, 5), y_mean, col=3, pch=16)
To get the average based on data that falls within two standard deviations of the average, we need to create a slightly more complicated function:
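noOutliers = function(x, power, w_sp, bin_incr) {
# power values whose wind speed falls in the bin [x, x + bin_incr)
d = power[w_sp >= x & w_sp < (x + bin_incr)]
m_d = mean(d)
s_d = sd(d)
# keep only the values within two standard deviations of the bin mean
d_trim = d[d > (m_d - 2*s_d) & d < (m_d + 2*s_d)]
return(mean(d_trim))
}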
y_no_outliers = sapply(bins, noOutliers, power, w_sp, bin_incr)
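The same binning can also be sketched with cut() and tapply(), which avoids looping over the bin edges by hand. This is an alternative formulation, not the approach used above; the simulated data and the helper name trimmed_mean are made up for illustration.

```r
set.seed(1)
# Simulated wind speed and power (hypothetical), in the spirit of the data above
w_sp  <- runif(1000, 0, 100)
power <- 1 / (1 + exp(-(w_sp - 40) / 5)) + rnorm(1000, sd = 0.1)

# Assign each observation to a 5-unit bin: [0,5), [5,10), ..., [95,100)
bin <- cut(w_sp, breaks = seq(0, 100, 5), right = FALSE)

# Plain mean of power per bin
bin_mean <- tapply(power, bin, mean)

# Mean of the values lying within two standard deviations of the bin mean
trimmed_mean <- function(d) {
  keep <- abs(d - mean(d)) < 2 * sd(d)
  mean(d[keep])
}
bin_trim <- tapply(power, bin, trimmed_mean)

# Plot raw data, then both sets of bin averages at the bin midpoints
mid <- seq(2.5, 97.5, 5)
plot(w_sp, power, col = "grey")
points(mid, bin_mean, col = 3, pch = 16)
points(mid, bin_trim, col = 2, pch = 1)
```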
Here are some examples of fitted curves (Weibull analysis) for commercial turbines:
http://www.inl.gov/wind/software/
http://www.irec.cmerp.net/papers/WOE/Paper%20ID%20161.pdf
http://www.icaen.uiowa.edu/~ie_155/Lecture/Power_Curve.pdf
I'd recommend also playing around with Hadley's own ggplot2. His website is a great resource: http://had.co.nz/ggplot2/ .
# If you haven't already installed ggplot2:
install.packages("ggplot2", dependencies = TRUE)
# Load the ggplot2 package
require(ggplot2)
# csgillespie's example data
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
# Bind the two variables into a data frame, which ggplot prefers
wind <- data.frame(w_sp = w_sp, power = power)
# Take a look at how the first few rows look, just for fun
head(wind)
# Create a simple plot
ggplot(data = wind, aes(x = w_sp, y = power)) + geom_point() + geom_smooth()
# Create a slightly more complicated plot as an example of how to fine tune
# plots in ggplot
p1 <- ggplot(data = wind, aes(x = w_sp, y = power))
p2 <- p1 + geom_point(colour = "darkblue", size = 1, shape = 16) # 16 = solid dot
p3 <- p2 + geom_smooth(method = "loess", se = TRUE, colour = "purple")
p3 + scale_x_continuous(name = "mph") +
scale_y_continuous(name = "power") +
ggtitle("Wind speed and power") # replaces opts(), which no longer exists in ggplot2

Resources