Ordering of gganimate numeric transition - r

I'm stuck on a small but frustrating problem with gganimate.
I'd like to illustrate the improvement of various metrics while fitting model parameters. However, I can't figure out how to force frames to be ordered according to the metric being maximized or minimized.
Take a simple example. We'll fit an OLS model on mtcars using mpg as the response and hp as the explanatory variable (a polynomial for a better fit). The alternative is the empty model. I would like to illustrate how the residual sum of squares (RSS) decreases as the model parameters transition towards their MLE values. This is fairly simple: just generate some data to visualize with geom_line for each value of RSS along the chosen parameters. But to illustrate this better I'd like the title to show the decreasing value of RSS and the animation to start from the empty model. I can achieve the latter by reversing the sign of rss, or I can visualize the RSS change, but I can't seem to get both. How do I achieve this?
###
# Data generation for plot (start)
###
# import data
library(ggplot2)
library(gganimate)
data(mtcars)
# Find MLE's
fit <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)
# Generate matrix of parameter values from the empty model to the polynomial MLE model
startcoef <- c(mean(mtcars$mpg), 0, 0)
coefs <- sapply(1:3, function(i)seq(startcoef[i], coef(fit)[i], length = 100))
colnames(coefs) <- names(coef(fit))
RSS <- function(coefs)
  sum((mtcars$mpg - (coefs[1] + coefs[2] * mtcars$hp + coefs[3] * mtcars$hp^2))^2)
# Generate line data for each value of RSS
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), length = 100)
plot_data <- do.call(rbind, lapply(1:100, function(i, x = hp_seq, n = length(hp_seq)) {
  cbind(hp = hp_seq,
        mpg = cbind(1, hp_seq, hp_seq^2) %*% coefs[i, 1:3],
        rss = rep(RSS(coefs[i, ]), n),
        id = rep(i, n))
}))
plot_data <- as.data.frame(plot_data)
names(plot_data) <- c('hp', 'mpg', 'rss', 'id')
###
# Data generation for plot (end)
###
# Generate gif:
p <- ggplot(plot_data, aes(x = hp, y = mpg)) +
  geom_line(col = 'blue') +
  geom_point(aes(x = hp, y = mpg), data = mtcars, inherit.aes = FALSE) +
  transition_manual(rss) +
  ease_aes('linear') +
  labs(title = 'RSS: {current_frame}')
animate(p, duration = 10, rewind = FALSE, start_pause = 5, end_pause = 5,
        renderer = gifski_renderer(loop = TRUE))
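One possible way to get both the frame order and the RSS title (a hedged sketch, relying on transition_manual() playing frames in the order of the factor levels of the variable it is given): build that factor yourself from the rounded RSS values, in the order of the fitting path, and use it both for the transition and for the {current_frame} title.
# Sketch: order frames along the fitting path (id = 1..100, i.e. decreasing RSS)
# while still showing the RSS value in the title. Rounding keeps the title readable;
# increase the digits if adjacent RSS values collide after rounding.
plot_data$rss_lab <- factor(round(plot_data$rss, 2),
                            levels = unique(round(plot_data$rss, 2)))
p <- ggplot(plot_data, aes(x = hp, y = mpg)) +
  geom_line(col = 'blue') +
  geom_point(aes(x = hp, y = mpg), data = mtcars, inherit.aes = FALSE) +
  transition_manual(rss_lab) +
  labs(title = 'RSS: {current_frame}')
animate(p, duration = 10, rewind = FALSE, start_pause = 5, end_pause = 5,
        renderer = gifski_renderer(loop = TRUE))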

Related

How do I get a smooth curve from a few data points, in R?

I am trying to plot the rate 1/t as it changes with mue. The code is given below and I have highlighted the relevant lines with input and output.
library("deSolve")
library("reshape")
library("tidyverse")
Fd <- data.frame()
MUES <- c(100, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 100010, 100020, 100050, 100060, 100080, 100090, 100100, 100500) # <------ THIS IS THE INPUT
for (i in 1:length(MUES)){
parameters <- c(tau = 0.005, tau_r = 0.0025, mui=0, Ve=0.06, Vi=-0.01, s=0.015, mue=MUES[i])
state <- c(X = 0.015, Y = 0)
Derivatives <- function(t, state, parameters) {
  #cc <- signal(t)
  with(as.list(c(state, parameters)), {
    # rate of change
    dX <- -(1/tau + mue - mui)*X + (Y-X)/tau_r + mue*Ve - mui*Vi
    dY <- -Y/tau + (X-Y)/tau_r
    # return the rate of change
    list(c(dX, dY))
  }) # end with(as.list ...
}
times <- seq(0, 0.1, by = 0.0001)
out <- ode(y = state, times = times, func = Derivatives, parms = parameters)
out.1 <- out %>%
as.data.frame() %>% summarise(d = min(times[Y >=0.015]))
Time <- out.1$d
localdf <- data.frame(t=Time, rate= 1/Time, input=MUES[i])
Fd <- rbind.data.frame(Fd, localdf)} # <------ THIS IS THE DATAFRAME WITH OUTPUT AND INPUT
spline_int <- as.data.frame(spline(Fd$input, Fd$rate))
ggplot(Fd) +
geom_point(aes(x = input, y = rate), size = 3) +
geom_line(data = spline_int, aes(x = x, y = y))
The rate 1/t has a limiting value at 1276 and thats why I have taken quite a few values of mue in the end, to highlight this. I get a graph like this:
What I want is something like below, so I can highlight the fact that the rate 1/t doesn't grow to infinity and infact has a limiting value. The below figure is from the Python question.
How do I accomplish this in R? I have tried loess and splines and geom_smooth (but just with changing span), perhaps I am missing something obvious.
Splines are piecewise polynomials and can have multiple inflection points. It sounds like you instead want to fit a logarithmic curve:
# fit a logarithmic curve with your data
logEstimate <- lm(rate~log(input),data=Fd)
# create a series of x values for which to predict y
xvec <- seq(0,max(Fd$input),length=1000)
# predict y based on the log curve fitted to your data
logpred <- predict(logEstimate,newdata=data.frame(input=xvec))
# save the result in a data frame
# these values will be used to plot the log curve
pred <- data.frame(x = xvec, y = logpred)
ggplot() +
geom_point(data = Fd, size = 3, aes(x=input, y=rate)) +
geom_line(data = pred, aes(x=x, y=y))
Result:
I borrowed some of the code from this answer.
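Since the rate genuinely has a limiting value, another option (a sketch, not tested against the actual Fd values) is an asymptotic regression via nls() with the self-starting SSasymp() model, which has a true horizontal asymptote, unlike the logarithmic curve that keeps growing slowly:
# asymptotic model: Asym + (R0 - Asym) * exp(-exp(lrc) * input)
asymFit <- nls(rate ~ SSasymp(input, Asym, R0, lrc), data = Fd)
xvec <- seq(0, max(Fd$input), length = 1000)
pred_asym <- data.frame(x = xvec,
                        y = predict(asymFit, newdata = data.frame(input = xvec)))
ggplot() +
  geom_point(data = Fd, size = 3, aes(x = input, y = rate)) +
  geom_line(data = pred_asym, aes(x = x, y = y))
# if the self-start struggles with the wide range of `input`, rescaling input
# (e.g. dividing by 1000) before fitting usually helps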

Stop plotting predictions beyond data limits LME ggpredict Effects

Using the 'iris' dataset (slightly modified as below), I plot the results of an LME.
PLEASE NOTE: I am only using the iris dataset as mock data for the purpose of plotting, so please do not critique the appropriateness of this test. I'm not interested in the statistics, rather the plotting.
Using the ggpredict function and plotting the results, the plot extends the predictions beyond the range of the data. Is there a systematic way to plot predictions only within the range of each facet's data?
I can plot each facet separately, limit the axis per plot manually, and cowplot them back together, but if there is a way to say 'predict only to the max. and min. of the data for that group', this would be great.
Given that these are facets of a single model, perhaps not showing the predictions for different groups is in fact misleading, and I should rather create three different models if I only want predictions within those data subsets?
library(lme4)
library(ggeffects)
library(ggplot2)
library(dplyr) # for group_by()/mutate() used below
data(iris)
glimpse(iris)
df = iris
glimpse(df)
df_ed = df %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "setosa",Sepal.Length+10,Sepal.Length+0))
df_ed = df_ed %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "versicolor",Sepal.Length-3,Sepal.Length+0))
glimpse(df_ed)
m_test =
lmer(Sepal.Width ~ Sepal.Length * Species +
(1|Petal.Width),
data = df_ed, REML = T)
summary(m_test)
test_plot = ggpredict(m_test, c("Sepal.Length", "Species"), type = "re") %>% plot(rawdata = T, dot.alpha = 0.6, facet = T, alpha = 0.3)
As per the OP's comment, I think this will provide a solution. In this example, I use data from the sleepstudy dataset that comes with the lme4 package. First, we have to postulate a mixed model, which I generically call fit.
Note that I do not perform any hypothesis test to formally select an appropriate random-effects structure. Of course, this is essential to adequately capture the correlations in the repeated measurements, but falls outside the scope of this post.
library(lme4)
library(splines)
# quantiles of Days
quantile(sleepstudy$Days, c(0.05, 0.95))
# 5% 95%
# 0 9
# mixed model
fit <- lmer(Reaction ~ ns(Days, df = 2, B = c(0, 9)) +
(Days | Subject), data = sleepstudy)
# new data.frame for prediction
ND <- with(sleepstudy, expand.grid(Days = seq(0L, 9L, len = 50)))
Then, we need a function that enables us to obtain predictions from fit for specific values of the covariates. The function effectPlot_lmer() takes the following arguments:
object: the fitted merMod object (the mixed model).
ND: the new data.frame that specifies the values of the covariates for which we want to obtain predictions.
orig_data: the data on which the mixed model was fitted.
# function to obtain predicted reaction times
effectPlot_lmer <- function (object, ND, orig_data) {
  form <- formula(object, fixed.only = TRUE)
  namesVars <- all.vars(form)
  betas <- fixef(object)
  V <- vcov(object)
  orig_data <- orig_data[complete.cases(orig_data[namesVars]), ]
  Terms <- delete.response(terms(form))
  mfX <- model.frame(Terms, data = orig_data)
  Terms_new <- attr(mfX, "terms")
  mfX_new <- model.frame(Terms_new, ND, xlev = .getXlevels(Terms, mfX))
  X <- model.matrix(Terms_new, mfX_new)
  pred <- c(X %*% betas)
  ses <- sqrt(diag(X %*% V %*% t(X)))
  ND$pred <- pred
  ND$low <- pred - 1.96 * ses
  ND$upp <- pred + 1.96 * ses
  return(ND)
}
Finally, we can make an effect plot with ggplot.
# effect plot
library(ggplot2)
ggplot(effectPlot_lmer(fit, ND, orig_data = sleepstudy),
aes(x = Days, y = pred)) +
geom_line(size = 1.2, colour = 'blue4') +
geom_ribbon(aes(ymin = low, ymax = upp), colour = NA,
fill = adjustcolor('blue4', 0.2)) +
theme_bw() + ylab('Expected Reaction (ms)')
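Coming back to the iris example in the question: the same idea can restrict each facet to its own data range. A hedged sketch (assuming the m_test model and df_ed data from the question, plus the effectPlot_lmer() helper above): build the prediction grid per Species, limited to each group's observed Sepal.Length range, so no line extends beyond its group's data.
# per-group prediction grid: each Species only covers its own observed range
ND_by_species <- do.call(rbind, lapply(split(df_ed, df_ed$Species), function(d) {
  data.frame(Species = d$Species[1],
             Sepal.Length = seq(min(d$Sepal.Length), max(d$Sepal.Length),
                                length.out = 50))
}))
preds <- effectPlot_lmer(m_test, ND_by_species, orig_data = as.data.frame(df_ed))
ggplot(preds, aes(x = Sepal.Length, y = pred)) +
  geom_ribbon(aes(ymin = low, ymax = upp), fill = 'grey80') +
  geom_line(colour = 'blue4') +
  facet_wrap(~ Species, scales = 'free_x') +
  theme_bw() + ylab('Predicted Sepal.Width')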

How do I plot a single numerical covariate using emmeans (or other package) from a model?

After variable selection I usually end up with a model containing a numerical covariate (2nd or 3rd degree polynomial). What I want to do is plot it, preferably using the emmeans package. Is there a way of doing it?
I can do it using predict:
m1 <- lm(mpg ~ poly(disp,2), data = mtcars)
df <- cbind(disp = mtcars$disp, predict.lm(m1, interval = "confidence"))
df <- as.data.frame(df)
ggplot(data = df, aes(x = disp, y = fit)) +
geom_line() +
geom_ribbon(aes(ymin = lwr, ymax = upr, x = disp, y = fit),alpha = 0.2)
I haven't figured out a way of doing it using either emmip or emtrends.
For illustration purposes, how could I do it using mixed models via lme?
library(nlme)
m1 <- lme(mpg ~ poly(disp,2), random = ~1|factor(am), data = mtcars)
I suspect that your issue is due to the fact that, by default, covariates are reduced to their means in emmeans. You can use the at or cov.reduce arguments to specify a larger number of values. See the documentation for ref_grid and vignette("basics", "emmeans"), or the index of vignette topics.
Using sjPlot:
plot_model(m1, terms = "disp [all]", type = "pred")
gives the same graphic.
Using emmeans:
em1 <- ref_grid(m1, at = list(disp = seq(min(mtcars$disp), max(mtcars$disp), 1)))
emmip(em1, ~disp, CIs = T)
returns a graphic with a small difference in layout. An alternative is to store the result in an object and plot it the way I want:
d1 <- emmip(em1, ~disp, CIs = T, plotit = F)
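With plotit = FALSE, emmip() returns the prediction grid as a data frame, so it can be plotted manually. A sketch (column names may differ between emmeans versions; check names(d1) if yours does): the fitted values are in yvar and the confidence bounds in LCL/UCL.
ggplot(d1, aes(x = disp, y = yvar)) +
  geom_ribbon(aes(ymin = LCL, ymax = UCL), alpha = 0.2) +
  geom_line()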

How to plot vector of bootstrapped slopes in ggplot2?

I've been using ggplot2 to plot the results of bootstrapping various statistical outputs such as correlation coefficients. Most recently, I bootstrapped the slope of a linear regression model. Here's how that looks using the plot() function from the graphics package:
plot(main="Relationship Between Eruption Length at Wait Time at \n
Old Faithful With Bootstrapped Regression Lines",
xlab = "Eruption Length (minutes)",
ylab = "Wait Time (minutes)",
waiting ~ eruptions,
data = faithful,
col = spot_color,
pch = 19)
index <- 1:nrow(faithful)
for (i in 1:10000) {
index_boot <- sample(index, replace = TRUE) # getting a bootstrap sample (of indices)
faithful_boot <- faithful[index_boot, ]
# Fitting the linear model to the bootstrapped data:
fit.boot <- lm(waiting ~ eruptions, data = faithful_boot)
abline(fit.boot, lwd = 0.1, col = rgb(0, 0.1, 0.25, alpha = 0.05)) # Add line to plot
}
fit <- lm(waiting ~ eruptions, data=faithful)
abline(fit, lwd = 2.5, col = "blue")
That works, but it depends on a workflow where we first create a plot, then add the lines in a loop. I'd rather create a list of slopes with a function and then plot all of them in ggplot2.
For example, the function might look something like this:
set.seed(777) # included so the following output is reproducible
n_resample <- 10000 # set the number of times to resample the data
# First argument is the data; second is the number of resampled datasets
bootstrap <- function(df, n_resample) {
slope_resample <- matrix(NA, nrow = n_resample) # initialize vector
index <- 1:nrow(df) # create an index for supplied table
for (i in 1:n_resample) {
index_boot <- sample(index, replace = TRUE) # sample row numbers, with replacement
df_boot <- df[index_boot, ] # create a bootstrap sample from original data
a <- lm(waiting ~ eruptions, data=df_boot) # compute linear model
slope_resample[i] <- slope <- a$coefficients[2] # take the slope
}
return(slope_resample) # Return the vector of bootstrapped slopes
}
bootstrapped_slopes <- bootstrap(faithful, 10000)
But how to get geom_line() or geom_smooth() to take the data from bootstrapped_slopes? Any assistance is much appreciated.
EDIT: More direct adaptation from the OP
For plotting, I presume you want both the slopes and the intercepts, so here's a modified bootstrap function:
bootstrap <- function(df, n_resample) {
# Note 2 dimensions here, for slope and intercept
slope_resample <- matrix(NA, nrow = n_resample, ncol = 2) # initialize matrix: column 1 intercept, column 2 slope
index <- 1:nrow(df) # create an index for supplied table
for (i in 1:n_resample) {
index_boot <- sample(index, replace = TRUE) # sample row numbers, with replacement
df_boot <- df[index_boot, ] # create a bootstrap sample from original data
a <- lm(waiting ~ eruptions, data=df_boot) # compute linear model
slope_resample[i, 1] <- a$coefficients[1] # take the intercept
slope_resample[i, 2] <- a$coefficients[2] # take the slope
}
# Return a data frame with all the slopes and intercepts
return(as.data.frame(slope_resample))
}
Then run it and plot the lines from that data frame:
bootstrapped_slopes <- bootstrap(faithful, 10000)
library(dplyr); library(ggplot2)
ggplot(faithful, aes(eruptions, waiting)) +
geom_abline(data = bootstrapped_slopes %>%
sample_n(1000), # 10k lines look about the same as 1k, just darker and slower
aes(slope = V2, intercept = V1), #, group = id),
alpha = 0.01) +
geom_point(shape = 19, color = "red")
Alternative solution
This could also be done using modelr and broom to simplify some of the bootstrapping. Based on the main help example for modelr::bootstrap, we can do the following:
library(purrr); library(modelr); library(broom); library(dplyr); library(tidyr) # tidyr provides spread()
set.seed(777)
# Creates bootstrap object with 10k extracts from faithful
boot <- modelr::bootstrap(faithful, 10000)
# Applies the linear regression to each
models <- map(boot$strap, ~ lm(waiting ~ eruptions, data = .))
# Extracts the model results into a tidy format
tidied <- map_df(models, broom::tidy, .id = "id")
# We just need the slope and intercept here
tidied_wide <- tidied %>% select(id, term, estimate) %>% spread(term, estimate)
ggplot(faithful, aes(eruptions, waiting)) +
geom_abline(data = tidied_wide %>%
sample_n(1000), # 10k lines look about the same as 1k, just darker and slower
aes(slope = eruptions, intercept = `(Intercept)`, group = id), alpha = 0.05) +
geom_point(shape = 19, color = "red") # spot_color wasn't provided in OP

R: Loess regression produces a staircase-like graph, rather than being smoothed, after the value 10

What are possible reasons as to why this is happening? It always happens after the value 10.
A subset of the dataset around the area of interest before and after the regression was applied:
Before
After
Dataset to reproduce graph
This is the ggplot2 call that I am using to generate the graph. The smoothing span used is 0.05.
library(ggplot2)
library(reshape2) # provides melt() with the variable.name argument
library(dplyr)    # provides mutate(), used inside applyLoessSmooth()
dat <- read.csv("before_loess.csv", stringsAsFactors = FALSE)
smoothed.data <- applyLoessSmooth(dat, 0.05) # dat is the dataset before being smoothed
scan.plot.data <- melt(smoothed.data, id.vars = "sample.diameters", variable.name = 'series')
scan.plot <- ggplot(data = scan.plot.data, aes(sample.diameters, value)) +
geom_line(aes(colour = series)) +
xlab("Diameters (nm)") +
ylab("Concentration (dN#/cm^2)") +
theme(plot.title = element_text(hjust = 0.5))
Function used to apply the loess filter:
applyLoessSmooth <- function(raw.data, smoothing.span) {
raw.data <- raw.data[complete.cases(raw.data),]
## response
vars <- colnames(raw.data)
## covariate
id <- 1:nrow(raw.data)
## define a loess filter function (fitting loess regression line)
loess.filter <- function (x, given.data, span) loess(formula = as.formula(paste(x, "id", sep = "~")),
data = given.data,
degree = 1,
span = span)$fitted
## apply filter column-by-column
loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data, span = smoothing.span),
col.names = colnames(raw.data))
sample.rows <- length(loess.graph.data[1])
loess.graph.data <- loess.graph.data %>% mutate("sample.diameters" = raw.data$sample.diameters[1:nrow(raw.data)])
}
The first problem is simply that your data is rounded to three significant figures. Below 10, the values on your x axis scan.plot.data$sample.diameters increase in 0.01 increments, which produces a smooth curve on the chart, but after 10 they increase in 0.1 increments, which shows up as visible steps on the chart.
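A quick illustration of that resolution loss (hypothetical values, not taken from the dataset):
signif(c(9.876, 10.234), 3)
# [1]  9.88 10.20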
The second problem is that you should be regressing against the values of sample.diameters, rather than against the row numbers id. I think this is causing there to be multiple smoothed values for each distinct value of x - hence the steps. Here are a couple of suggested small modifications to your function...
applyLoessSmooth <- function(raw.data, smoothing.span) {
raw.data <- raw.data[complete.cases(raw.data),]
vars <- colnames(raw.data)
vars <- vars[vars != "sample.diameters"] #you are regressing against this, so exclude it from vars
loess.filter <- function (x, given.data, span) loess(
formula = as.formula(paste(x, "sample.diameters", sep = "~")), #not 'id'
data = given.data,
degree = 1,
span = span)$fitted
loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data,
span = smoothing.span),
col.names = vars) #final argument edited
loess.graph.data$sample.diameters <- raw.data$sample.diameters #simplified
return(loess.graph.data)
}
All of which seems to do the trick...
Of course, you could have just done this...
dat.melt <- melt(dat, id.vars = "sample.diameters", variable.name = 'series')
ggplot(data = dat.melt, aes(sample.diameters, value, colour=series)) +
geom_smooth(method="loess", span=0.05, se=FALSE)
