Plotting multiple models in facet_wrap using ggplot - r

I have a multi-panel plot created with facet_wrap in ggplot, and model outputs from the FlexParamCurve package.
FlexParamCurve provides a model to fit each set of data, i.e. each panel in the plot. I have found code elsewhere for plotting the same curve across all panels, and for plotting model curves for individual lines in each panel. But how can I plot the model curve for each specific plot?
Sample data;
DATA <- data.frame(Cond = rep(1:2, each = 60),
Site = rep((seq(1:2)), each = 30),
Survey = rep((seq(1:5)), each = 6),
Time = rep(1:6))
NUMBERS <- data.frame(Numbers = c(10,13,10,16,31,25,4,11,11,16,21,23,7,12,15,18,20,19,9,15,22,21,24,30,5,10,15,21,21,24,5,7,10,12,20,17,7,11,17,25,27,34,4,9,13,18,23,28,11,15,17,20,21,25,9,18,21,24,21,30,10,11,8,15,20,17,2,6,5,13,14,15,7,9,13,23,28,25,8,13,17,20,24,24,10,10,15,19,23,26,1,2,5,10,17,18,1,3,5,8,15,19,1,8,14,18,26,27,4,9,17,23,25,23,6,12,15,17,20,23))
GRAPH.DATA <- data.frame(cbind(DATA, NUMBERS))
ggplot code;
ggplot(GRAPH.DATA, aes(Time, Numbers, colour=factor(Survey))) +
geom_line(aes(group = Survey), size = 1.5) +
facet_grid(Cond ~ Site)
FlexParamCurve code and Model outputs;
MODELS <- pn.mod.compare(GRAPH.DATA$Time, GRAPH.DATA$Numbers, GRAPH.DATA$Site, pn.options ="MOD.OUTPUTS")
Call:
Model: y ~ SSposnegRichards(x, Asym = Asym, K = K, Infl = Infl, M = M,
RAsym = RAsym, modno = 11, pn.options = "MOD.OUTPUTS") | grp
Data: userdata
Coefficients:
Asym K Infl M RAsym
1-1 24.32301 1.0579649 35.61585 17.961961 -8.1278653
1-2 25.15115 0.4167590 31.70722 6.473804 -14.2503569
2-1 23.69160 0.9179826 36.32567 15.839351 -11.2709848
2-2 24.75980 0.3559937 35.14258 4.499553 -14.4040608
The FlexParamCurve call is in the format (x, y, grouping variable, output_object)

Related

Plotting GAM in R: Setting custom x-axis limits?

Is there a way to set the x-axis limits when plotting the predicted fits for GAM models? More specifically, I'm fitting a smoother for each level of a factor using 'by = ', however, each factor level has a different range of values. Plotting the variable in ggplot results in an x-axis that automatically accommodates the different ranges of 'x'; however, after fitting a GAM (mgcv::gam()) the default behavior of plot.gam() appears to be predicting values across a shared x-axis limit.
The dummy data below has some continuous variable for 'x', but in my real data, 'x' is Time (year), and 'group' is sampling location. Because I did not collect data from each site across the same time range, I feel it is inappropriate to show a model fit in these empty years.
library(tidyverse)
library(mgcv)
library(gratia)
theme_set(theme_classic())
## simulate data with a grouping variable of three levels:
d = data.frame(group = rep(c('A','B','C'), each = 100),
x = c(seq(0,1,length=100),
seq(.2,1,length=100),
seq(0,.5,length=100))) %>%
mutate(y = sin(2*pi*x) + rnorm(100, sd=0.3),
group = as.factor(group))
## Look at data
ggplot(d, aes(x = x, y = y, colour = group))+
facet_wrap(~group)+
geom_point()+
geom_smooth()
Here is the raw data with loess smoother in ggplot:
## fit simple GAM with smoother for X
m1 = mgcv::gam(y ~ s(x, by = group), data = d)
## base R plot
par(mfrow = c(2,2), bty = 'l', las = 1, mai = c(.6,.6,.2,.1), mgp = c(2,.5,0))
plot(m1)
## Gavin's neat plotter
gratia::draw(m1)
Here is the predicted GAM fit that spans the same range (0,1) for all three groups:
Can I limit the prediction/plot to actual values of 'x'?
If you install the current development version (>= 0.6.0.9111) from GitHub, {gratia} will now do what you want, sort of. I added some functionality to smooth_estimates() that I had planned to add eventually but your post kicked it the top of the ToDo list and motivated me to add it now.
You can use smooth_estimates() to evaluate the smooths at the observed (or any user-supplied) data only and then a bit of ggplot() recreates most of the plot.
remotes::install_github("gavinsimpson/gratia")
library('mgcv')
library('gratia')
library('dplyr')
library('ggplot2')
d <- data.frame(group = rep(c('A','B','C'), each = 100),
x = c(seq(0,1,length=100),
seq(.2,1,length=100),
seq(0,.5,length=100))) %>%
mutate(y = sin(2*pi*x) + rnorm(100, sd=0.3),
group = as.factor(group))
m <- gam(y ~ group + s(x, by = group), data = d, method = 'REML')
sm <- smooth_estimates(m, data = d) %>%
add_confint()
ggplot(sm, aes(x = x, y = est, colour = group)) +
geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci, colour = NULL, fill = group),
alpha = 0.2) +
geom_line() +
facet_wrap(~ group)

Plotting multiple lift curves

I am new to R and trying to learn. I am trying to plot lift curves of multiple classifiers in one graph. I can't figure out a way to do it. I know the below two classifiers are essentially the same but they both give different graphs and I just want to combine the two. Below is the code I tried. Could someone please point me in the right direction
fullmod = glm(Response ~ page_views_90d+win_visits+osx_visits+mc_1+mc_2+mc_3+mc_4+mc_5+mc_6+store_page+orders+orderlines+bookings+purchase, data=training, family=binomial)
summary(fullmod)
fullmod.results <- predict(fullmod, newdata = testing, type='response')
plotLift(fitted.results, test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
redmod1 = glm(Response ~ win_visits+osx_visits+mc_2+mc_4+mc_6+store_page+orders+orderlines+bookings+purchase, data=training, family=binomial)
redmod1.results <- predict(redmod1, newdata = testing, type = 'response')
plotLift(redmod1.results, test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
# Attempt to plot multiple classifiers
plotLift((redmod1.results, fullmod.results), test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
Here is a way to plot multiple lift curves using the caret library. But first some data:
set.seed(1)
for_lift <- data.frame(Class = factor(rep(1:2, each = 50)),
model1 = sort(runif(100), decreasing = TRUE),
model2 = runif(100),
model3 = runif(100))
Here the Class column is the real classes
model1 is the predicted probabilities by the first model and so on.
Now create a lift object from the data using:
library(caret)
lift_curve <- lift(Class ~ model1 + model2, data = for_lift)
and plot it
xyplot(lift_curve, auto.key = list(columns = 3))
If you would like to plot with ggplot:
library(ggplot2)
ggplot(lift_curve$data)+
geom_line(aes(CumTestedPct, CumEventPct, color = liftModelVar))+
xlab("% Samples tested")+
ylab("% Samples found")+
scale_color_discrete(guide = guide_legend(title = "method"))+
geom_polygon(data = data.frame(x = c(0, lift_curve$pct, 100, 0),
y = c(0, 100, 100, 0)),
aes(x = x, y = y), alpha = 0.1)

brms package in R smoother

I have this data frame in R:
x = rep(seq(-10,10,1),each=5)
y = rep(0,length(x) )
weights = sample( seq(1,20,1) ,length(x), replace = TRUE)
weights = weights/sum(weights)
groups = rep( letters[1:5], times =length(x)/5 )
and some data that looks like this:
library(ggplot2)
ggplot(data = dat, aes(x = x, y = y, color = group))+geom_point( aes(size = weights))+
ylab("outcome")+
xlab("predictor x1")+
geom_vline(xintercept = 0)+ geom_hline(yintercept = 0)
fit_brms = brm(y~ s(x)+(1|group), data = dat)
by_group = marginal_effects(fit_brms, conditions = data.frame(group = dat$group) ,
re_formula = NULL, method = "predict")
plot(by_group, ncol = 5, points = TRUE)
I'd like to make a hierarchical nonlinear model so that there is a different nonlinear fit for each group.
In brms I have the code below which is doing a spline fit on the x predictor with random intercepts on group the fitted line is the same for all groups. the difference is where the lines cross the y intercept. Is there a way to make the non-linear fit be different for each group's data points?
ON page 13 here : https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf
It states "As the smooth term itself cannot be modeled as varying by year in a multilevel manner,we add a basic varying intercept in an effort to account for variation between years"
So the spline will be the same for all groups it appears? The only difference in the plots is where the spline cross the y intercept. That seems very restrictive. Can this be modified to make the spline unique to each group?
Use the formula: y ~ s(x, by = group) + (1|group)

Plot the observed and fitted values from a linear regression using xyplot() from the lattice package

I can create simple graphs. I would like to have observed and predicted values (from a linear regression) on the same graph. I am plotting say Yvariable vs Xvariable. There is only 1 predictor and only 1 response. How could I also add linear regression curve to the same graph?
So to conclude need help with:
plotting actuals and predicted both
plotting regression line
Here is one option for the observed and predicted values in a single plot as points. It is easier to get the regression line on the observed points, which I illustrate second
First some dummy data
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)
Fit our model
mod <- lm(y ~ x, data = dat)
Combine the model output and observed x into a single object for plott
res <- stack(data.frame(Observed = dat$y, Predicted = fitted(mod)))
res <- cbind(res, x = rep(dat$x, 2))
head(res)
Load lattice and plot
require("lattice")
xyplot(values ~ x, data = res, group = ind, auto.key = TRUE)
The resulting plot should look similar to this
To get just the regression line on the observed data, and the regression model is a simple straight line model as per the one I show then you can circumvent most of this and just plot using
xyplot(y ~ x, data = dat, type = c("p","r"), col.line = "red")
(i.e. you don't even need to fit the model or make new data for plotting)
The resulting plot should look like this
An alternative to the first example which can be used with anything that will give coefficients for the regression line is to write your own panel functions - not as scary as it seems
xyplot(y ~ x, data = dat, col.line = "red",
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.abline(coef = coef(mod), ...) ## using mod from earlier
}
)
That gives a plot from Figure 2 above, but by hand.
Assuming you've done this with caret then
mod <- train(y ~ x, data = dat, method = "lm",
trControl = trainControl(method = "cv"))
xyplot(y ~ x, data = dat, col.line = "red",
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.abline(coef = coef(mod$finalModel), ...) ## using mod from caret
}
)
Will produce a plot the same as Figure 2 above.
Another option is to use panel.lmlineq from latticeExtra.
library(latticeExtra)
set.seed(0)
xsim <- rnorm(50, mean = 3)
ysim <- (0 + 2 * xsim) * (1 + rnorm(50, sd = 0.3))
## basic use as a panel function
xyplot(ysim ~ xsim, panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.lmlineq(x, y, adj = c(1,0), lty = 1,xol.text='red',
col.line = "blue", digits = 1,r.squared =TRUE)
})

R Language - Sorting data into ranges; averaging; ignore outliers

I am analyzing data from a wind turbine, normally this is the sort of thing I would do in excel but the quantity of data requires something heavy-duty. I have never used R before and so I am just looking for some pointers.
The data consists of 2 columns WindSpeed and Power, so far I have arrived at importing the data from a CSV file and scatter-plotted the two against each other.
What I would like to do next is to sort the data into ranges; for example all data where WindSpeed is between x and y and then find the average of power generated for each range and graph the curve formed.
From this average I want recalculate the average based on data which falls within one of two standard deviations of the average (basically ignoring outliers).
Any pointers are appreciated.
For those who are interested I am trying to create a graph similar to this. Its a pretty standard type of graph but like I said the shear quantity of data requires something heavier than excel.
Since you're no longer in Excel, why not use a modern statistical methodology that doesn't require crude binning of the data and ad hoc methods to remove outliers: locally smooth regression, as implemented by loess.
Using a slight modification of csgillespie's sample data:
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
plot(w_sp, power)
x_grid <- seq(0, 100, length = 100)
lines(x_grid, predict(loess(power ~ w_sp), x_grid), col = "red", lwd = 3)
Throw this version, similar in motivation as #hadley's, into the mix using an additive model with an adaptive smoother using package mgcv:
Dummy data first, as used by #hadley
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
df <- data.frame(power = power, w_sp = w_sp)
Fit the additive model using gam(), using an adaptive smoother and smoothness selection via REML
require(mgcv)
mod <- gam(power ~ s(w_sp, bs = "ad", k = 20), data = df, method = "REML")
summary(mod)
Predict from our model and get standard errors of fit, use latter to generate an approximate 95% confidence interval
x_grid <- with(df, data.frame(w_sp = seq(min(w_sp), max(w_sp), length = 100)))
pred <- predict(mod, x_grid, se.fit = TRUE)
x_grid <- within(x_grid, fit <- pred$fit)
x_grid <- within(x_grid, upr <- fit + 2 * pred$se.fit)
x_grid <- within(x_grid, lwr <- fit - 2 * pred$se.fit)
Plot everything and the Loess fit for comparison
plot(power ~ w_sp, data = df, col = "grey")
lines(fit ~ w_sp, data = x_grid, col = "red", lwd = 3)
## upper and lower confidence intervals ~95%
lines(upr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
lines(lwr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
## add loess fit from #hadley's answer
lines(x_grid$w_sp, predict(loess(power ~ w_sp, data = df), x_grid), col = "blue",
lwd = 3)
First we will create some example data to make the problem concrete:
w_sp = sample(seq(0, 100, 0.01), 1000)
power = 1/(1+exp(-(rnorm(1000, mean=w_sp, sd=5) -40)/5))
Suppose we want to bin the power values between [0,5), [5,10), etc. Then
bin_incr = 5
bins = seq(0, 95, bin_incr)
y_mean = sapply(bins, function(x) mean(power[w_sp >= x & w_sp < (x+bin_incr)]))
We have now created the mean values between the ranges of interest. Note, if you wanted the median values, just change mean to median. All that's left to do, is to plot them:
plot(w_sp, power)
points(seq(2.5, 97.5, 5), y_mean, col=3, pch=16)
To get the average based on data that falls within two standard deviations of the average, we need to create a slightly more complicated function:
noOutliers = function(x, power, w_sp, bin_incr) {
d = power[w_sp >= x & w_sp < (x + bin_incr)]
m_d = mean(d)
d_trim = mean(d[d > (m_d - 2*sd(d)) & (d < m_d + 2*sd(d))])
return(mean(d_trim))
}
y_no_outliers = sapply(bins, noOutliers, power, w_sp, bin_incr)
Here are some examples of fitted curves (weibull analysis) for commercial turbines:
http://www.inl.gov/wind/software/
http://www.irec.cmerp.net/papers/WOE/Paper%20ID%20161.pdf
http://www.icaen.uiowa.edu/~ie_155/Lecture/Power_Curve.pdf
I'd recommend also playing around with Hadley's own ggplot2. His website is a great resource: http://had.co.nz/ggplot2/ .
# If you haven't already installed ggplot2:
install.pacakges("ggplot2", dependencies = T)
# Load the ggplot2 package
require(ggplot2)
# csgillespie's example data
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
# Bind the two variables into a data frame, which ggplot prefers
wind <- data.frame(w_sp = w_sp, power = power)
# Take a look at how the first few rows look, just for fun
head(wind)
# Create a simple plot
ggplot(data = wind, aes(x = w_sp, y = power)) + geom_point() + geom_smooth()
# Create a slightly more complicated plot as an example of how to fine tune
# plots in ggplot
p1 <- ggplot(data = wind, aes(x = w_sp, y = power))
p2 <- p1 + geom_point(colour = "darkblue", size = 1, shape = "dot")
p3 <- p2 + geom_smooth(method = "loess", se = TRUE, colour = "purple")
p3 + scale_x_continuous(name = "mph") +
scale_y_continuous(name = "power") +
opts(title = "Wind speed and power")

Resources