I'm trying to use a combination of this answer for annotating equations onto a ggplot plot and this answer of putting different texts onto different facets.
The problem I'm getting is that I can't get different formulas using mathematical expressions onto different facets.
#Required package
library(ggplot2)
#Split the mtcars dataset by the number of cylinders in each engine
cars.split <- split(mtcars, mtcars$cyl)
#Create a linear model to get the equation for the line for each cylinder
cars.mod <- lapply(cars.split, function(x){
lm(wt ~ mpg, data = x)
})
#Create predicted data set to add a 'geom_line()' in ggplot2
cars.pred <- as.data.frame(do.call(rbind,
mapply(x = cars.split, y = cars.mod,
FUN = function(x, y){
newdata <- data.frame(mpg = seq(min(x$mpg),
max(x$mpg),
length.out = 100))
pred <- data.frame(wt = predict(y, newdata),
mpg = newdata$mpg)
}, SIMPLIFY = F)))
cars.pred$cyl <- rep(c(4,6,8), each = 100)
(cars.coef <- as.data.frame(do.call(rbind, lapply(cars.mod, function(x)x$coefficients))))
#Create a data frame of line equations with a 'cyl' variable to facilitate facetting
#as per second link. I had to MANUALLY take the values 'cars.coef' and put them
#into the data frame.
equation.text <- data.frame(label = c('y = 4.69-0.09x^{1}',
'y = 6.42-0.17x^{1}',
'y = 6.91-0.19x^{1}'),
cyl = c(4,6,8))
#Plot it
ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
geom_point() +
geom_line(data = cars.pred, mapping = aes(x = mpg, y = wt)) +
geom_text(data = equation.text, mapping = aes(x = 20, y = 5, label = label)) +
facet_wrap(.~ cyl)
The equation in the plot is exactly as I had written in the equation.text data frame, which is no surprise since the equations are in ''. But I'm trying to get it to be in mathematical notation, like $y = 4.69–0.09x^1$
I know I need to use expression as it said in the first link I had, but when I try to put it into a data frame:
equation.text <- data.frame(label = c(expression(y==4.69-0.09*x^{1}),
expression(y==6.42-0.17*x^{1}),
expression(y==6.91-0.19*x^{1})),
cyl = c(4,6,8))
I get an error saying expressions can't be put into data frames:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class '"expression"' to a data.frame
My questions are:
How can I get different equations in mathematical notation (italicized letters, superscripts, subscripts) in different facets?
What's a more automated way of getting values from the cars.coef data frame into the equations table (rather than typing out all the numbers!)?
UPDATE: This has been brought to my attention, but a lot of the answers seem to work for linear models. Is there a way to do it for, say, a non-linear model as well?
Hopefully this satisfies both parts of the question. I'm also not great with putting together expressions.
For the first part, you can create a data frame of equation text from your data frame of intercepts and coefficients, and format it how you need. I set up the sprintf to match the number of decimal places you had, and to flag the coefficient's sign.
library(ggplot2)
# same preparation as in question
# renamed just to have standard column names
names(cars.coef) <- c("intercept", "mpg")
equation.text <- data.frame(
cyl = unique(cars.pred$cyl),
label = sprintf("y == %1.2f %+1.2f*x^{1}", cars.coef$intercept, cars.coef$mpg),
# stringsAsFactors belongs to data.frame(), not sprintf()
stringsAsFactors = F
)
The label column looks like this:
"y == 4.69 -0.09*x^{1}" "y == 6.42 -0.17*x^{1}" "y == 6.91 -0.19*x^{1}"
For the second part, you can just set parse = T in your geom_text, similar to the argument available in annotate.
ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
geom_point() +
geom_line(data = cars.pred, mapping = aes(x = mpg, y = wt)) +
geom_text(data = equation.text, mapping = aes(x = 20, y = 5, label = label), parse = T) +
facet_wrap(.~ cyl)
Notes on sprintf: % marks where a format specification starts. The + flag forces a sign (plus or minus) onto the number, so the coefficient reads as being added or subtracted. In 1.2f, the 1 is the minimum total field width and the .2 is the number of digits after the decimal point; this can be adjusted as needed, but worked to display numbers such as 4.69. The arguments after the format string are substituted into it in the order they are passed to sprintf.
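As a hedged sketch of how the whole label table could be generated without typing any numbers, the block below refits the per-cylinder models from mtcars, pulls the coefficients with coef(), and checks that each label parses as an R expression (which is what parse = T relies on). The helper name make_label is hypothetical and not part of the original answer.

```r
# Fit one linear model per cylinder group, as in the question
cars.mod <- lapply(split(mtcars, mtcars$cyl), function(d) lm(wt ~ mpg, data = d))

# make_label() is a hypothetical helper: it formats the intercept and slope
# into a plotmath-compatible string ("==" is plotmath's equals sign)
make_label <- function(model) {
  b <- unname(coef(model))
  sprintf("y == %1.2f %+1.2f*x^{1}", b[1], b[2])
}

equation.text <- data.frame(
  cyl = as.numeric(names(cars.mod)),
  label = vapply(cars.mod, make_label, character(1)),
  stringsAsFactors = FALSE
)

# Each label should parse cleanly; geom_text(parse = TRUE) needs this
parsed <- lapply(equation.text$label, function(s) parse(text = s))
```

Because the labels come straight from the fitted models, refitting with different data or a different grouping variable should update the annotations without manual edits.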
This sounds like a very trivial question at first, but no one has managed to help me so far, so I'm reaching out to you all.
I'd like to do the following:
I'm writing a simple function that allows me to plot two variables against each other, with a third variable coloring the observation points (depending on the corresponding value of the color variable). The code looks like that:
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes(x=x.variable, y = y.variable)) +
geom_point(aes_string(color = color.variable))
}
scatterplot(data_used = example_data, x.variable = example_data$education,
y.variable = example_data$wages,
color.variable = example_data$sex)
What I would like R to do now is to label the x- and y-axis (respectively) by the corresponding variable's name that I decide to be plotted. In this example here, x-axis would be 'education', y-axis would be 'wages'.
I tried to simply put + labs (x = x.variable, y = y.variable) and it doesn't work (when doing that, R labels the axes by the variable values!). By default, R just names the axes "x.variable" and "y.variable".
Can someone help me achieve what I'm trying to do?
Best regards,
xifrix
jpenzer's answer is a good one. Here it is without the quasi-quotation stuff.
library(ggplot2)
library(dplyr) # for %>% and mutate()
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes_string(x=x.variable, y = y.variable)) +
geom_point(aes_string(color = color.variable)) +
labs(x=x.variable, y=y.variable, colour=color.variable)
}
mtcars %>%
mutate(am = as.factor(am)) %>%
scatterplot(., x.variable = "hp",
y.variable = "mpg",
color.variable = "am")
I'm not sure the quasi-quotation stuff is 100% necessary in hindsight, but this is the pattern I use for similar needs:
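my_scatterplot <- function(data, x, y){
# capture the unquoted column names as quosures
.x = rlang::enquo(x)
.y = rlang::enquo(y)
data %>%
# unquote with !! so aes() maps the captured columns
ggplot(aes(x = !!.x, y = !!.y))+
geom_point()+
# as_label() turns each quosure into a plain string for the axis title
labs(x = rlang::as_label(.x),
y = rlang::as_label(.y))
}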
Let me know if it doesn't work for you, though it should. Edit: I should add, after DaveArmstrong's answer, that this function is called without quotes around the x and y variables, e.g.
diamonds %>% my_scatterplot(price, table)
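To pass a column name into the function you could wrap it in double curly braces {{...}} in the function body. Note that inside aes_string() with quoted names, as here, the braces are ordinary R blocks that simply return the string; the tidy-evaluation meaning of {{...}} only applies inside aes() with unquoted names: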
library(dplyr)
library(ggplot2)
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes_string({{x.variable}}, {{y.variable}})) +
geom_point(aes_string(color = {{color.variable}})) +
labs(x=x.variable, y=y.variable, colour=color.variable)
}
scatterplot(mtcars %>% mutate(am = as.factor(am)), x.variable = "mpg",
y.variable = "hp",
color.variable = "am")
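For comparison, here is a hedged sketch of two idioms that current ggplot2 supports for this situation: the .data pronoun when column names arrive as strings, and {{ }} inside aes() when they arrive unquoted. The function names scatter_str and scatter_bare are made up for illustration.

```r
library(ggplot2)

# Column names passed as strings: index the .data pronoun
scatter_str <- function(data, x, y, colour) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]], colour = .data[[colour]])) +
    geom_point() +
    labs(x = x, y = y, colour = colour)
}

# Column names passed unquoted: embrace them with {{ }} inside aes()
scatter_bare <- function(data, x, y, colour) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }}, colour = {{ colour }})) +
    geom_point()
}

cars <- transform(mtcars, am = factor(am))
p1 <- scatter_str(cars, "hp", "mpg", "am")   # quoted names
p2 <- scatter_bare(cars, hp, mpg, am)        # bare names
```

The string version sets the labels explicitly because the default label would otherwise be derived from the .data[[x]] expression rather than from the column name.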
I am making a Shiny application in which the user specifies the independent variables; Shiny then displays a time series plot with plotly where, on hover, each point shows the selected parameters.
If I know the exact number of variables that the user selects, I am able to construct the time series plot without a problem. Let's say there are 3 parameters chosen:
ggp <- ggplot(data = data.depend(), aes(x = Datum, y = y, tmp1 = .data[[input$Coockpit.Dependencies.Undependables[1]]], tmp2 = .data[[input$Coockpit.Dependencies.Undependables[2]]], tmp3 = .data[[input$Coockpit.Dependencies.Undependables[3]]])) +
geom_point()
ggplotly(ggp)
where data.depend() returns the data frame (screenshot not reproduced) and the selected parameters are stored in a character vector (screenshot not reproduced).
So the problem is that for each parameter I want to include in the tooltip, I have to hard-code it in the aes function as tmpi = .data[[input$Coockpit.Dependencies.Undependables[i]]]. I would, however, like to write a generic function that handles any number of selected parameters. Any comments or suggestions are welcome.
EDIT:
Below a minimal working example:
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
ggp <- ggplot(data = data.dummy, aes(x = Datum, y = y, tmp1 = .data[[ChosenParams[1]]], tmp2 = .data[[ChosenParams[2]]], tmp3 = .data[[ChosenParams[3]]])) + geom_point()
ggplotly(ggp)
Result:
So this works at the "cost" of me knowing the user is choosing three parameters and therefore I write in aes tmpi = .data[[ChosenParams[i]]]; i=1:3. I am interested in a solution with the same result but where I don't have to write tmpi = .data[[ChosenParams[i]]] i-number of times
Thank you!
One solution is to use eval(parse(...)) to create the code for you:
library(ggplot2)
library(plotly)
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
ggp <- eval(parse(text = paste0("ggplot(data = data.dummy, aes(x = Datum, y = y, ",
paste0("tmp", seq_along(ChosenParams), " = .data[[ChosenParams[", seq_along(ChosenParams), "]]]", collapse = ", "),
")) + geom_point()"
)
))
ggplotly(ggp)
Just note that this is not very efficient and in some cases it is not advised to use it (see What specifically are the dangers of eval(parse(...))?). There might also be a way to use quasiquotation in aes(), but I am not really familiar with it.
EDIT: Added a way to do it with quasiquotation.
I had a closer look at quasiquotation in aes() and found a nicer way to do it using syms() and !!!:
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
names(ChosenParams) <- paste0("tmp", seq_along(ChosenParams))
ChosenParams <- syms(ChosenParams)
ggp <- ggplot(data = data.dummy, aes(x = Datum, y = y, !!!ChosenParams)) + geom_point()
ggplotly(ggp)
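The same idea can be wrapped in a small function so that any number of selected parameters is handled. This is only a sketch: tooltip_aes is a hypothetical name, and it assumes the data has Datum and y columns as in the example above.

```r
library(ggplot2)

# Build an aes() with x, y and one extra aesthetic (tmp1, tmp2, ...) per parameter
tooltip_aes <- function(params) {
  extra <- rlang::syms(params)                      # strings -> symbols
  names(extra) <- paste0("tmp", seq_along(extra))   # tmp1, tmp2, ...
  aes(x = Datum, y = y, !!!extra)                   # splice the list into aes()
}

mapping <- tooltip_aes(c("ZuluftTemperatur", "ZuluftFeuchtigkeit"))
names(mapping)
```

In the Shiny app this could then be used as ggplot(data.depend(), tooltip_aes(input$Coockpit.Dependencies.Undependables)) + geom_point(), without knowing in advance how many parameters the user picked.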
This question is related to
Create custom geom to compute summary statistics and display them *outside* the plotting region
(NOTE: All functions have been simplified; no error checks for correct objects types, NAs, etc.)
In base R, it is quite easy to create a function that produces a stripchart with the sample size indicated below each level of the grouping variable: you can add the sample size information using the mtext() function:
stripchart_w_n_ver1 <- function(data, x.var, y.var) {
x <- factor(data[, x.var])
y <- data[, y.var]
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
levels.x <- levels(x)
x.ticks <- 1:length(levels(x))
axis(1, at = x.ticks, labels = levels.x)
n <- sapply(split(y, x), length)
mtext(paste0("N=", n), side = 1, line = 2, at = x.ticks)
}
stripchart_w_n_ver1(mtcars, "cyl", "mpg")
or you can add the sample size information to the x-axis tick labels using the axis() function:
stripchart_w_n_ver2 <- function(data, x.var, y.var) {
x <- factor(data[, x.var])
y <- data[, y.var]
# Need to set the second element of mgp to 1.5
# to allow room for two lines for the x-axis tick labels.
o.par <- par(mgp = c(3, 1.5, 0))
on.exit(par(o.par))
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
n <- sapply(split(y, x), length)
levels.x <- levels(x)
axis(1, at = 1:length(levels.x), labels = paste0(levels.x, "\nN=", n))
}
stripchart_w_n_ver2(mtcars, "cyl", "mpg")
While this is a very easy task in base R, it is maddeningly complex in ggplot2 because it is very hard to get at the data being used to generate the plot, and while there are functions equivalent to axis() (e.g., scale_x_discrete, etc.) there is no equivalent to mtext() that lets you easily place text at specified coordinates within the margins.
I tried using the built in stat_summary() function to compute the sample sizes (i.e., fun.y = "length") and then place that information on the x-axis tick labels, but as far as I can tell, you can't extract the sample sizes and then somehow add them to the x-axis tick labels using the function scale_x_discrete(), you have to tell stat_summary() what geom you want it to use. You could set geom="text", but then you have to supply the labels, and the point is that the labels should be the values of the sample sizes, which is what stat_summary() is computing but which you can't get at (and you would also have to specify where you want the text to be placed, and again, it is difficult to figure out where to place it so that it lies directly underneath the x-axis tick labels).
The vignette "Extending ggplot2" (http://docs.ggplot2.org/dev/vignettes/extending-ggplot2.html) shows you how to create your own stat function that allows you to get directly at the data, but the problem is that you always have to define a geom to go with your stat function (i.e., ggplot thinks you want to plot this information within the plot, not in the margins); as far as I can tell, you can't take the information you compute in your custom stat function, not plot anything in the plot area, and instead pass the information to a scales function like scale_x_discrete(). Here was my try at doing it this way; the best I could do was place the sample size information at the minimum value of y for each group:
StatN <- ggproto("StatN", Stat,
required_aes = c("x", "y"),
compute_group = function(data, scales) {
y <- data$y
y <- y[!is.na(y)]
n <- length(y)
data.frame(x = data$x[1], y = min(y), label = paste0("n=", n))
}
)
stat_n <- function(mapping = NULL, data = NULL, geom = "text",
position = "identity", inherit.aes = TRUE, show.legend = NA,
na.rm = FALSE, ...) {
ggplot2::layer(stat = StatN, mapping = mapping, data = data, geom = geom,
position = position, inherit.aes = inherit.aes, show.legend = show.legend,
params = list(na.rm = na.rm, ...))
}
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point() + stat_n()
I thought I had solved the problem by simply creating a wrapper function to ggplot:
ggstripchart <- function(data, x.name, y.name,
point.params = list(),
x.axis.params = list(labels = levels(x)),
y.axis.params = list(), ...) {
if(!is.factor(data[, x.name]))
data[, x.name] <- factor(data[, x.name])
x <- data[, x.name]
y <- data[, y.name]
params <- list(...)
point.params <- modifyList(params, point.params)
x.axis.params <- modifyList(params, x.axis.params)
y.axis.params <- modifyList(params, y.axis.params)
point <- do.call("geom_point", point.params)
stripchart.list <- list(
point,
theme(legend.position = "none")
)
n <- sapply(split(y, x), length)
x.axis.params$labels <- paste0(x.axis.params$labels, "\nN=", n)
x.axis <- do.call("scale_x_discrete", x.axis.params)
y.axis <- do.call("scale_y_continuous", y.axis.params)
stripchart.list <- c(stripchart.list, x.axis, y.axis)
ggplot(data = data, mapping = aes_string(x = x.name, y = y.name)) + stripchart.list
}
ggstripchart(mtcars, "cyl", "mpg")
However, this function does not work correctly with faceting. For example:
ggstripchart(mtcars, "cyl", "mpg") + facet_wrap(~am)
shows the sample sizes for both facets combined in each facet. I would have to build faceting into the wrapper function, which defeats the point of trying to use everything ggplot has to offer.
If anyone has any insights to this problem I would be grateful. Thanks so much for your time!
I have updated the EnvStats package to include a stat called stat_n_text which will add the sample size (the number of unique y-values) below each unique x-value. See the help file for stat_n_text for more information and a list of examples. Below is a simple example:
library(ggplot2)
library(EnvStats)
p <- ggplot(mtcars,
aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
theme(legend.position = "none")
p + geom_point() +
stat_n_text() +
labs(x = "Number of Cylinders", y = "Miles per Gallon")
My solution might be a little simple but it works well.
Given an example with faceting by am, I start by creating labels using paste and \n.
library(dplyr)
library(ggplot2)
mtcars2 <- mtcars %>%
group_by(cyl, am) %>% mutate(n = n()) %>%
mutate(label = paste0(cyl,'\nN = ',n))
I then use these labels instead of cyl in the ggplot code
ggplot(mtcars2,
aes(x = factor(label), y = mpg, color = factor(label))) +
geom_point() +
xlab('cyl') +
facet_wrap(~am, scales = 'free_x') +
theme(legend.position = "none")
To produce something like the figure below.
You can print the counts below the x-axis labels using geom_text if you turn off clipping, but you'll probably have to tweak the placement. I've included a "nudge" parameter for that in the code below. Also, the method below is intended for cases where all the facets (if any) are column facets.
I realize you ultimately want code that will work inside a new geom, but perhaps the examples below can be adapted for use in a geom.
library(ggplot2)
library(dplyr)
library(grid) # for grid.draw()
pgg = function(dat, x, y, facet=NULL, nudge=0.17) {
# Convert x-variable to a factor
dat[,x] = as.factor(dat[,x])
# Plot points
p = ggplot(dat, aes_string(x, y)) +
geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw()
# Summarise data to get counts by x-variable and (if present) facet variables
dots = lapply(c(facet, x), as.symbol)
# group_by_() is deprecated; splice the symbols into group_by() instead
nn = dat %>% group_by(!!!dots) %>% tally()
# If there are facets, add them to the plot
if (!is.null(facet)) {
p = p + facet_grid(paste("~", paste(facet, collapse="+")))
}
# Add counts as text labels
p = p + geom_text(data=nn, aes(label=paste0("N = ", nn$n)),
y=min(dat[,y]) - nudge*1.05*diff(range(dat[,y])),
colour="grey20", size=3.5) +
theme(axis.title.x=element_text(margin=unit(c(1.5,0,0,0),"lines")))
# Turn off clipping and return plot
p <- ggplot_gtable(ggplot_build(p))
p$layout$clip[p$layout$name=="panel"] <- "off"
grid.draw(p)
}
pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet=c("am","vs"))
Another, potentially more flexible, option is to add the counts to the bottom of the plot panel. For example:
pgg = function(dat, x, y, facet_r=NULL, facet_c=NULL) {
# Convert x-variable to a factor
dat[,x] = as.factor(dat[,x])
# Plot points
p = ggplot(dat, aes_string(x, y)) +
geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw()
# Summarise data to get counts by x-variable and (if present) facet variables
dots = lapply(c(facet_r, facet_c, x), as.symbol)
# group_by_() is deprecated; splice the symbols into group_by() instead
nn = dat %>% group_by(!!!dots) %>% tally()
# If there are facets, add them to the plot
if (!is.null(facet_r) | !is.null(facet_c)) {
facets = paste(ifelse(is.null(facet_r),".",facet_r), " ~ " ,
ifelse(is.null(facet_c),".",facet_c))
p = p + facet_grid(facets)
}
# Add counts as text labels
p + geom_text(data=nn, aes(label=paste0("N = ", nn$n)),
y=min(dat[,y]) - 0.15*min(dat[,y]), colour="grey20", size=3) +
scale_y_continuous(limits=range(dat[,y]) + c(-0.1*min(dat[,y]), 0.01*max(dat[,y])))
}
pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet_c="am")
pgg(mtcars, "cyl", "mpg", facet_c="am", facet_r="vs")
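A variation on the second approach, offered as a sketch rather than a drop-in replacement: compute the per-panel counts with dplyr::count() and pin the labels to the bottom edge of each panel with y = -Inf, which avoids computing an offset from the data range. The names counts and p are illustrative.

```r
library(ggplot2)
library(dplyr)

# One row per cyl/am combination, with the sample size in column n
counts <- count(mtcars, cyl, am)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point(position = position_jitter(width = 0.3, height = 0)) +
  # y = -Inf anchors the text at the lower panel border; a negative vjust
  # lifts it just inside the panel
  geom_text(data = counts,
            aes(x = factor(cyl), y = -Inf, label = paste0("N = ", n)),
            vjust = -0.5, size = 3, colour = "grey20",
            inherit.aes = FALSE) +
  facet_wrap(~am)
```

Since the counts table carries the faceting variable, each panel only receives its own sample sizes, and other facet layouts only need the same variables added to count().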
I have a data frame mydataAll with columns DESWC, journal, and highlight. To calculate the average and standard deviation of DESWC for each journal, I do
avg <- aggregate(DESWC ~ journal, data = mydataAll, mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, sd)
Now I plot a horizontal stripchart with the values of DESWC along the x-axis and each journal along the y-axis. But for each journal, I want to indicate the standard deviation and average with a simple line. Here is my current code and the results.
stripchart2 <-
ggplot(data=mydataAll, aes(x=mydataAll$DESWC, y=mydataAll$journal, color=highlight)) +
geom_segment(aes(x=avg[1,2] - stddev[1,2],
y = avg[1,1],
xend=avg[1,2] + stddev[1,2],
yend = avg[1,1]), color="gray78") +
geom_segment(aes(x=avg[2,2] - stddev[2,2],
y = avg[2,1],
xend=avg[2,2] + stddev[2,2],
yend = avg[2,1]), color="gray78") +
geom_segment(aes(x=avg[3,2] - stddev[3,2],
y = avg[3,1],
xend=avg[3,2] + stddev[3,2],
yend = avg[3,1]), color="gray78") +
geom_point(size=3, aes(alpha=highlight)) +
scale_x_continuous(limit=x_axis_range) +
scale_y_discrete(limits=mydataAll$journal) +
scale_alpha_discrete(range = c(1.0, 0.5), guide='none')
show(stripchart2)
See the three horizontal geom_segments at the bottom of the image indicating the spread? I want to do that for all journals, but without handcrafting each one. I tried using the solution from this question, but when I put everything in a loop and remove the aes(), it gives me an error that says:
Error in x - from[1] : non-numeric argument to binary operator
Can anyone help me condense the geom_segment() statements?
I generated some dummy data to demonstrate. First, we use aggregate like you have done, then we combine those results to create a data.frame in which we create upper and lower columns. Then, we pass these to the geom_segment specifying our new dataset. Also, I specify x as the character variable and y as the numeric variable, and then use coord_flip():
library(ggplot2)
set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = T),
vals = rnorm(100),
stringsAsFactors = F)
means <- aggregate(vals~lets, data = df, FUN = mean)
sds <- aggregate(vals~lets, data = df, FUN = sd)
df2 <- data.frame(means, sds)
df2$upper = df2$vals + df2$vals.1
df2$lower = df2$vals - df2$vals.1
ggplot(df, aes(x = lets, y = vals))+geom_point()+
geom_segment(data = df2, aes(x = lets, xend = lets, y = lower, yend = upper))+
coord_flip()+theme_bw()
Here, the lets column would resemble your character variable.
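Applied to the column names from the question (journal, DESWC), the same approach might look like the sketch below, here without coord_flip() since x and y are mapped directly. The data is randomly generated stand-in data, and spread is a hypothetical name for the summary table.

```r
library(ggplot2)

set.seed(1)
# Stand-in data: three made-up journals with ten DESWC values each
mydataAll <- data.frame(journal = rep(c("A", "B", "C"), each = 10),
                        DESWC = rnorm(30, mean = 5000, sd = 800))

avg <- aggregate(DESWC ~ journal, data = mydataAll, FUN = mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, FUN = sd)

# One row per journal with mean, sd and the segment end points
spread <- merge(avg, stddev, by = "journal", suffixes = c(".mean", ".sd"))
spread$lower <- spread$DESWC.mean - spread$DESWC.sd
spread$upper <- spread$DESWC.mean + spread$DESWC.sd

p <- ggplot(mydataAll, aes(x = DESWC, y = journal)) +
  # a single geom_segment layer draws one line per row of spread
  geom_segment(data = spread,
               aes(x = lower, xend = upper, y = journal, yend = journal),
               colour = "gray78", inherit.aes = FALSE) +
  geom_point(size = 3)
```

The key point is that geom_segment() receives its own data frame, so one layer replaces the hand-written segment per journal.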
So I have two sets of data (of different length) that I am trying to group up and display the density plots for:
dat <- data.frame(dens = c(nEXP,nCNT),lines = rep(c("Exp","Cont")))
ggplot(dat, aes(x = dens, group=lines, fill = lines)) + geom_density(alpha = .5)
when I run the code it spits an error about the different lengths, i.e.
"arguments imply different num of rows: x, y"
I then augment the code to:
dat <- data.frame(dens = c(nEXP,nCNT),lines = rep(c("Exp","Cont"),X))
Where X is the length of the longer argument so the lengths of "lines" will match that of dens.
Now the issue is that when I go to plot the data I am only getting ONE density plot.... I know there should be two, since plotting the densities with plot/lines clearly shows two non-equal overlapping distributions, so I am assuming the error is with the grouping...
hope that makes sense.
So I am not sure why but basically I simply had to do the rep() function manually:
A<-data.frame(ExpN, key = "exp")
B<-data.frame(ConN,key = "con")
colnames(A) <- c("a","key")
colnames(B) <- c("a","key")
dat <- rbind(A,B)
ggplot(dat, aes(x = a, fill = key)) + geom_density(alpha = .5)
You need to tell rep how many times to repeat each element to get it to line up
dat <- data.frame(dens = c(nEXP,nCNT),
lines = rep(c("Exp","Cont"), c(length(nEXP),length(nCNT))))
That should give you a dat you can use with your ggplot call.
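A small self-contained sketch of that fix, using made-up sample vectors (nEXP and nCNT below are random stand-ins for the real data):

```r
library(ggplot2)

set.seed(42)
nEXP <- rnorm(30, mean = 1)   # stand-in for the experimental sample
nCNT <- rnorm(20, mean = 0)   # stand-in for the control sample

# A vector of times repeats each label the matching number of times:
# "Exp" 30 times, then "Cont" 20 times
dat <- data.frame(dens = c(nEXP, nCNT),
                  lines = rep(c("Exp", "Cont"),
                              times = c(length(nEXP), length(nCNT))))

p <- ggplot(dat, aes(x = dens, fill = lines)) + geom_density(alpha = .5)
```

With a single number as the second argument, rep() repeats the whole vector, so the labels alternate Exp, Cont, Exp, Cont and each group ends up as a mix of both samples; that would be consistent with the two densities looking like one.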