How to use manipulate() on ggplot geom_smooth() - r

Does anyone know how to use manipulate() on a ggplot, in order to easily select a smoothing (span) level? I've tried the following without success:
# fake data
xvals <- 1:10
yvals <- xvals^2*exp(rnorm(2,5,0.6))
data <- data.frame(xvals,yvals)
# plot with manipulate
manipulate(
ggplot(data,aes(xvals,yvals)) +
geom_smooth(span=slider(0.5,5)) +
geom_point()
)
I want to be able to cycle through "smoothing levels" easily.

I changed your data to have more data points.
xvals <- 1:100
yvals <- rnorm(100)
data <- data.frame(xvals,yvals)
You have to give a name to the value passed to span= in geom_smooth() (for example, span.val) and then define span.val=slider(0.1,1) outside the ggplot() call, in this example as the second argument to manipulate().
library(manipulate)
library(ggplot2)
manipulate({
  # plot expression, re-evaluated whenever the slider moves
  ggplot(data, aes(xvals, yvals)) +
    geom_smooth(method="loess", span=span.val) +
    geom_point()
  },
  # control that supplies span.val to the expression above
  span.val=slider(0.1,1)
)
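The manipulate package only works inside RStudio. Elsewhere, a similar comparison can be approximated by building one plot per span value. The following is only a sketch: the span values and the `spans` and `plots` names are illustrative choices, not part of the answer above.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(xvals = 1:100, yvals = rnorm(100))

# Illustrative span values (assumption); any values in (0, 1] could be tried
spans <- c(0.2, 0.5, 0.9)

# One plot per span value, titled so they can be told apart
plots <- lapply(spans, function(s) {
  ggplot(data, aes(xvals, yvals)) +
    geom_smooth(method = "loess", formula = y ~ x, span = s) +
    geom_point() +
    ggtitle(paste("span =", s))
})
names(plots) <- paste0("span_", spans)

# Print one of them, here the smoothest fit
print(plots[["span_0.9"]])
```

The plots can then be printed one at a time or arranged side by side to compare smoothing levels.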

Related

R ggplot loop: in a for loop of ggplot histograms, how can you automatically set the y axis scale based on max frequency?

I have the following loop to produce several histograms based off certain columns (columns 2 to 5) in a larger dataset (df):
loop.vector <- 2:5
for (i in loop.vector){
x <- df[,i]
print(ggplot(df,aes(x=x)) + geom_histogram(binwidth=1)+scale_x_continuous(breaks=seq(0,max(x),1)))
}
I'd like to have my y-axis scale done automatically as I have for the x-axis, where it ranges between zero and whatever the maximum frequency value is, at increments of 1.
I know how to set these values manually if I were to plot, take a look at it, and enter the max y-axis value separately, but I'd like to do this automatically within the loop.
Thanks!
Answering the question: how to access max counts for a histogram plot?
The information you're missing on each plot in order to create your scale_y_continuous command is the maximum number of counts. There is a nice way to access this information once you have created a ggplot object, which is to use the built-in ggplot_build() function from ggplot2. For a given plot, myPlot, the following will give you a list of dataframes that are used for each layer in your plot:
ggplot_build(myPlot)$data
In the case of your example, you can access the count column of the first data frame (since you only have one histogram geom layer). Here's how you can write the loop to do what you need it to do. I'll use an example dataset that can show you the results. Note that I've also changed your scale_x_continuous line to be able to accommodate positive and negative numbers by using a combination of min(), max(), and the ceiling() and floor() functions:
library(ggplot2)
set.seed(1234)
df <- data.frame(
y1=rnorm(100,10,1),
y2=rnorm(100,12,3),
y3=rnorm(100,5,4),
y4=rnorm(100,13,5))
for (i in 1:ncol(df)) {
p <- ggplot(df, aes(df[,i])) +
geom_histogram(alpha=0.5, color='black', fill='red', binwidth=1) +
scale_x_continuous(breaks=seq(floor(min(df[,i])),ceiling(max(df[,i])))) +
ggtitle(names(df)[i])
# get max counts
max_count <- max(ggplot_build(p)$data[[1]]$count)
p <- p + scale_y_continuous(breaks=seq(0,max_count,1))
print(p)
}
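To see what ggplot_build() returns on a case small enough to check by hand, here is a minimal sketch. The `toy` data frame and the explicit `boundary = 0.5` are assumptions made for illustration, so that each integer value falls into its own bin.

```r
library(ggplot2)

# Six integer values: 1 occurs twice, 2 once, 3 three times
toy <- data.frame(v = c(1, 1, 2, 3, 3, 3))

# boundary = 0.5 puts the bin edges at 0.5, 1.5, 2.5, ...
p <- ggplot(toy, aes(v)) +
  geom_histogram(binwidth = 1, boundary = 0.5)

# Layer data computed by the histogram stat; one row per bin
bins <- ggplot_build(p)$data[[1]]
max_count <- max(bins$count)
max_count  # 3 for this toy data
```

The same `max_count` value is what the loop above feeds into scale_y_continuous().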
Is there a better way?
While that gets you what you need, it's typically hard to deal with multiple plots output to your graphics device iteratively. I would recommend reformatting the above code as a function, then using lapply() and something like plot_grid() from cowplot to display the output. This suggested approach is detailed in the code below:
myPlots <- function(data, column, fill_color='red') {
  # column = character name of column
  p <- ggplot(data, aes_string(x=column)) +
    geom_histogram(fill=fill_color, binwidth=1, alpha=0.5, color='black') +
    scale_x_continuous(breaks=seq(floor(min(data[[column]])), ceiling(max(data[[column]])),1)) +
    ggtitle(column)
  # get max counts from the built plot
  max_count <- max(ggplot_build(p)$data[[1]]$count)
  p <- p + scale_y_continuous(breaks=seq(0,max_count,1))
  return(p)
}
library(cowplot)
plotList <- lapply(names(df), myPlots, data=df)
plot_grid(plotlist = plotList)
Figured it out - my values are integers, so what ended up working was a variation on Duck's response. See below:
loop.vector <- 2:5
for (i in loop.vector){
x <- df[,i]
print(ggplot(df,aes(x=x)) + geom_histogram(binwidth=1)+scale_x_continuous(breaks=seq(0,max(x),1))+scale_y_continuous(breaks=seq(0,max(table(x)),1)))
}

Storing do.call("grid.arrange") output without printing

I would like to create a variable p that contains a plot with four ggplot2 subplots. I am able to achieve this with the below code:
library(ggplot2)
library(gridExtra)
data = diamonds[1:50,]
x = data$x
myPlots = lapply(c(1,5,6,7), function(i){
y = as.data.frame(data[,i])
y = y[,1]
df = data.frame(x=x,y=y)
p <- qplot(x, y, data=df)
p
})
p = do.call("grid.arrange", c(myPlots, ncol=2))
I like that I can use the variable p later by calling:
library(grid)
grid.draw(p)
However, I do not like that when I initially create p with the do.call("grid.arrange") syntax, it plots it automatically (at least in RStudio).
My question is: Is it possible to create p to be stored for later use, without plotting it upon its creation?
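One possibility, sketched here under the assumption that gridExtra's arrangeGrob() fits the use case: arrangeGrob() builds the same combined layout as grid.arrange() but returns it without drawing. The plots below are built with ggplot() rather than qplot(), which is an illustrative substitution and not part of the question.

```r
library(ggplot2)
library(gridExtra)
library(grid)

data <- diamonds[1:50, ]

# Four scatterplots of columns 1, 5, 6 and 7 against x
myPlots <- lapply(c(1, 5, 6, 7), function(i) {
  df <- data.frame(x = data$x, y = data[[i]])
  ggplot(df, aes(x, y)) + geom_point() + ylab(names(data)[i])
})

# Build the combined layout without drawing it
p <- arrangeGrob(grobs = myPlots, ncol = 2)

# Draw later, when needed
grid.newpage()
grid.draw(p)
```

The stored object `p` is a gtable, so it can be passed to grid.draw() at any later point.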

Changing ggplot objects generated by ggiNEXT()

This is the basic example given in the iNEXT package:
library(iNEXT)
data(spider)
# multiple abundance-based data with multiple order q
z <- iNEXT(spider, q=c(0,1,2), datatype="abundance")
p1 <- ggiNEXT(z, facet.var="site", color.var="order")
In my dataset, I have more samples and the faceting does not work so well, so I want to change the ncol/nrow arguments in the facet_wrap/facet_grid call inside the object "p1". p1 is a ggplot object, so it can be altered (e.g. p1 + xlab("") removes the x-axis title).
In general, it would be nice to know how ggiNEXT() can be decomposed into single lines, and which objects are used in the data arguments, so I can change the order of the samples and reduce the number of samples used per plot. I wasn't able to find that out by looking at the function itself, and I get "Error: ggplot2 doesn't know how to deal with data of class iNEXT" when I try to follow ggiNEXT() step by step.
You could use facet_wrap(~site, ncol=3) to tune your plot. Here is a simple example:
library(iNEXT)
library(ggplot2)
set.seed(123)
p <- 1/1:sample(1:50, 1)
p <- p/sum(p)
dat <- as.data.frame(rmultinom(9, 200, p))
z <- iNEXT(dat, q=c(0,1,2))
p1 <- ggiNEXT(z, facet.var="site", color.var="order")
p1 + facet_wrap(~site, ncol=3)

Change colors of select lines in ggplot2 coefficient plot in R

I would like to change the color of coefficient lines based on whether the point estimate is negative or positive in a ggplot2 coefficient plot in R. For example:
require(coefplot)
set.seed(123)
dat <- data.frame(y1 = rnorm(100), x = rnorm(100), z = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
coefplot.lm(mod1)
Which produces the following plot:
In this plot, I would like to change the "x" variable to red when plotted. Any ideas? Thanks.
I don't think you can do this with a plot produced by coefplot.lm. The coefplot package uses ggplot2 as its plotting system, which is good in itself, but it does not let you control colors as easily as you would like. To get the desired colors, you need a variable in your dataset that color-codes the values, and you need to specify color = color-code in the aes() function of the layer that draws the dots with CE. Apparently, this is impossible to do with the output of the coefplot.lm function. You may be able to change the colors using the ggplot2 ggplot_build() function, but I would say it's easier to write your own function for this task.
I've done this once to plot odds. If you want, you may use my code. Feel free to change it. The idea is the same as in coefplot. First, we extract coefficients from a model object and prepare the data set for plotting; second, actually plot.
The code for extracting coefficients and data set preparation
df_plot_odds <- function(x){
  # exponentiated coefficients and confidence limits, intercept dropped
  tmp <- data.frame(cbind(exp(coef(x)), exp(confint.default(x))))
  odds <- tmp[-1,]
  names(odds) <- c('OR', 'lower', 'upper')
  odds$vars <- row.names(odds)
  # blue for OR > 1 (positive effect), red otherwise
  odds$col <- ifelse(odds$OR > 1, 'blue', 'red')
  odds$pvalue <- summary(x)$coef[-1, "Pr(>|t|)"]
  return(odds)
}
Plot the output of the extract function
plot_odds <- function(odds_df, xlab="Odds Ratio", ylab="", asp=1){
  require(ggplot2)
  p <- ggplot(odds_df, aes(x=vars, y=OR, ymin=lower, ymax=upper)) +
    geom_errorbar(aes(color=col), width=0.1) +
    geom_point(aes(color=col), size=3) +
    geom_hline(yintercept = 1, linetype=2) +
    # named values keep blue/red correct even if only one of them occurs
    scale_color_manual('Effect', breaks=c('blue','red'),
                       labels=c('Positive','Negative'),
                       values=c(blue='blue', red='red')) +
    coord_flip() +
    theme_bw() +
    theme(legend.position="none", aspect.ratio = asp) +
    ylab(xlab) +
    xlab(ylab) # switch because of the coord_flip() above
  return(p)
}
Plotting your example
set.seed(123)
dat <- data.frame(x = rnorm(100),y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)
df <- df_plot_odds(mod1)
plot <- plot_odds(df)
plot
Which yields
Note that I chose theme_bw() as the default. The output is a ggplot2 object, so you can change it quite a lot.
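If the coefficients should stay on their original scale rather than being exponentiated, the same idea can be sketched directly with ggplot2. This is an illustrative variant, not the code above: the `est` data frame, its `direction` column, and the use of geom_segment() for the intervals are all assumptions made for the example.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
mod <- lm(y ~ x + z, data = dat)

# Point estimates and 95% confidence limits, one row per term
ci <- confint(mod)
est <- data.frame(term = names(coef(mod)),
                  estimate = unname(coef(mod)),
                  lower = ci[, 1],
                  upper = ci[, 2])

# Variable that color-codes the sign of each estimate
est$direction <- ifelse(est$estimate < 0, "negative", "positive")

p_coef <- ggplot(est, aes(x = estimate, y = term, colour = direction)) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_segment(aes(x = lower, xend = upper, yend = term)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(negative = "red", positive = "blue")) +
  theme_bw()
print(p_coef)
```

Because the colors are keyed by name, a term is drawn in red only when its point estimate is negative.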

plotting multiple plots in ggplot2 on same graph that are unrelated

How would one use the smooth.spline() method in a ggplot2 scatterplot?
My data is in a data frame called data, with two columns, x and y.
The smooth.spline would be sm <- smooth.spline(data$x, data$y). I believe I should use geom_line(), with sm$x and sm$y as the xy coordinates. However, how would one plot a scatterplot and a lineplot on the same graph that are completely unrelated? I suspect it has something to do with the aes() but I am getting a little confused.
You can use different data frames in different geoms and refer to the relevant variables with aes(), or you can combine the relevant variables from the output of smooth.spline():
# example data
set.seed(1)
dat <- data.frame(x = rnorm(20, 10,2))
dat$y <- dat$x^2 - 20*dat$x + rnorm(20,10,2)
# spline
s <- smooth.spline(dat)
# plot - combine the original x & y and the fitted values returned by
# smooth.spline into a data.frame
library(ggplot2)
ggplot(data.frame(x=s$data$x, y=s$data$y, xfit=s$x, yfit=s$y)) +
geom_point(aes(x,y)) + geom_line(aes(xfit, yfit))
# or you could use geom_smooth
ggplot(dat, aes(x , y)) + geom_point() + geom_smooth()
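The first option mentioned above, a separate data frame per layer, might look like the following sketch. The `grid_x` and `fit` names and the 100-point prediction grid are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(20, 10, 2))
dat$y <- dat$x^2 - 20 * dat$x + rnorm(20, 10, 2)

# Fit the spline, then evaluate it on an evenly spaced grid
s <- smooth.spline(dat$x, dat$y)
grid_x <- seq(min(dat$x), max(dat$x), length.out = 100)
fit <- as.data.frame(predict(s, grid_x))  # columns x and y

# Points come from dat; the line layer gets its own data frame
p_spline <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_line(data = fit, colour = "blue")
print(p_spline)
```

Since `fit` also has columns named x and y, geom_line() can reuse the aes(x, y) mapping from ggplot() while drawing from different data.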
