In the following reproducible example I'm attempting to build a ggplot2 function call dynamically so that it can accommodate an unknown number of mixture distribution components. The code produces this error message: Error in parse(text = g) : <text>:8:0: unexpected end of input. What is the problem with the code? (I'm aware of the method of pre-calculating plot data, storing it in a data frame, melting it and supplying it to ggplot2. I would like to explore the option below as well.) Thank you!
library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)
NUM_COMPONENTS <- 2
set.seed(12345) # for reproducibility
data(diamonds, package='ggplot2') # use built-in data
myData <- diamonds$price
calc.component <- function(x, lambda, mu, sigma) {
lambda * dnorm(x, mean = mu, sd = sigma)
}
overlayHistDensity <- function(data, func) {
# extract 'k' components from mixed distribution 'data'
mix <- normalmixEM(data, k = NUM_COMPONENTS,
maxit = 100, epsilon = 0.01)
summary(mix)
DISTRIB_COLORS <-
suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))
# plot histogram, empirical and fitted densities
g <- "ggplot(data) +\n"
for (i in seq(length(mix$lambda))) {
args <- paste0("args.", i)
assign(args, list(lambda = mix$lambda[i], mu = mix$mu[i],
sigma = mix$sigma[i]))
g <- paste0(g,
"stat_function(fun = func, args = ",
args,
", aes(color = ",
DISTRIB_COLORS[i], ")) +\n")
}
tailStr <-
"geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')"
g <- paste0(g, tailStr)
gr <- eval(parse(text = g))
return (gr)
}
overlayHistDensity(log10(myData), 'calc.component')
As long as you realize you are going about this a hard way...
If you look at the value of g before it is parsed, it is
ggplot(data) +
stat_function(fun = func, args = args.1, aes(color = #E41A1C)) +
stat_function(fun = func, args = args.2, aes(color = #377EB8)) +
geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')
Usually the unexpected end of input message comes from unbalanced quotes or parentheses, but you don't (obviously) have that problem here. The problem is in the color specification. Literal hex colors should be specified as strings:
ggplot(data) +
stat_function(fun = func, args = args.1, aes(color = "#E41A1C")) +
stat_function(fun = func, args = args.2, aes(color = "#377EB8")) +
geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')
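Without the quotes, the hash starts a comment, so the rest of that line (the closing parentheses and the trailing + in particular) is dropped. The stat_function call is therefore never closed, the parser runs to the end of the text still waiting for a right parenthesis, and you get the error you saw. (Note the syntax highlighting that SO gives on the first code snippet.)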
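To see the comment effect in isolation, here is a minimal base-R sketch. The one-line strings are stand-ins for the generated call, not the original code. It parses a fragment with an unquoted hex color, captures the error message, and then parses the quoted version.

```r
# Unquoted: everything from '#' to the end of the line is a comment,
# so the parser never sees the closing parenthesis.
unquoted <- "list(color = #E41A1C)"
msg <- tryCatch(
  {
    parse(text = unquoted)
    "parsed OK"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Quoted: the hex code is an ordinary string literal, so the call parses
# and can be evaluated.
quoted <- "list(color = '#E41A1C')"
result <- eval(parse(text = quoted))
print(result$color)
```

Printing the generated string with cat(g) before calling parse() is usually the quickest way to spot this kind of problem.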
That said, I think you can get what you want without the eval(parse()) approach. In particular, look at aes_string which allows the specification of which variable is used as the aesthetic by the value of a string variable and adding a list of stats or geoms (which can be of un-pre-specified length created using lapply, for example). Also, you seem to be specifying literal colors and then mapping them to just red and blue; possibly you want scale_colour_identity? All this (last paragraph) is more code review and is not what you actually asked about.
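As a rough sketch of the list-of-layers idea, the code below builds one stat_function layer per component with lapply and adds the whole list to the plot. The parameter values in params are made up for illustration (they stand in for what normalmixEM would return), and the names calc_component, params and component_colors are assumptions, not part of the original code.

```r
library(ggplot2)

calc_component <- function(x, lambda, mu, sigma) {
  lambda * dnorm(x, mean = mu, sd = sigma)
}

# Hypothetical mixture parameters standing in for a fitted model
params <- data.frame(
  lambda = c(0.4, 0.6),
  mu     = c(2.8, 3.7),
  sigma  = c(0.2, 0.3)
)
component_colors <- c("#E41A1C", "#377EB8")

# One layer per component; the list can hold any number of layers
layers <- lapply(seq_len(nrow(params)), function(i) {
  stat_function(
    fun = calc_component,
    args = list(lambda = params$lambda[i],
                mu     = params$mu[i],
                sigma  = params$sigma[i]),
    colour = component_colors[i]  # constant colour, so it stays outside aes()
  )
})

# Simulated data drawn from the same made-up mixture
set.seed(1)
df <- data.frame(x = c(rnorm(400, 2.8, 0.2), rnorm(600, 3.7, 0.3)))

# Adding a list to a ggplot adds each element as its own layer
p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.4) +
  layers
```

Because the colours are passed as constants, no colour scale is involved here; mapping them through aes() would need scale_colour_identity or a manual scale instead.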
You've got several problems:
ggplot's data argument must be a data.frame, not a vector
hex color names starting with # must be quoted, or they'll be interpreted as comments
you must provide an aes(x = ) mapping
color definitions that are constant do not go in aes
This should work:
overlayHistDensity <- function(data, func) {
# extract 'k' components from mixed distribution 'data'
mix <- normalmixEM(data, k = NUM_COMPONENTS,
maxit = 100, epsilon = 0.01)
summary(mix)
DISTRIB_COLORS <-
suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))
# plot histogram, empirical and fitted densities
g <- "ggplot(as.data.frame(data), aes(x = data)) +\n"
for (i in seq(length(mix$lambda))) {
args <- paste0("args.", i)
assign(args, list(lambda = mix$lambda[i], mu = mix$mu[i],
sigma = mix$sigma[i]))
g <- paste0(g,
"stat_function(fun = func, args = ",
args,
", color = '",
DISTRIB_COLORS[i], "') +\n")
}
tailStr <-
"geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')"
g <- paste0(g, tailStr)
gr <- eval(parse(text = g))
return (gr)
}
Like Brian, I'll finish with 2 comments:
This is standard debugging and you shouldn't need an SO post for it. It's essentially several syntax errors and a couple little mistakes. I took your code outside of a function and ran it up through the final g <- paste0 line, and put the g output in a code window and looked for problems. Try to write code that works outside of a function first, then put it in a function.
Seconding Brian's comment, a more natural approach is to avoid eval(parse()) and all this pasting. Instead, use aes_string, and melt your data so that you can use one stat_function call based on a grouping variable.
I can plot a function in ggplot2, like so:
library(ggplot2)
ggplot(data.frame(x=0), aes(x)) + geom_function(fun = sin) + xlim(c(-5,5))
Can I use ggplot2's facetting to make the plot for multiple functions, one in each facet? (for example sin and cos)
Looks like you in fact can facet by the function if you feed each layer its own data with the faceting variable specified:
library(ggplot2) # using ggplot2 3.3.5
ggplot(data.frame(x=0), aes(x)) +
geom_function(fun = sin, data = data.frame(x = -5:5, fun_name = "sin")) +
geom_function(fun = cos, data = data.frame(x = -5:5, fun_name = "cos")) +
facet_wrap(~fun_name)
... fun is not an aesthetic ... you can make it one :)
It's a bit of an overkill, but here is a quick demonstration of what is possible by modifying the Geoms and Stats. What follows is a quick hack that reaches into ggplot2 internals with quite a lot of ugly :::, which you wouldn't do if you packaged this properly. It is also clearly not tested on many use cases. A few more comments are in the code.
This was quite ... fun :)
df <- data.frame(x = 0, fun = c("sin", "cos", "tan", "mean"))
ggplot(df, aes(x)) +
stat_function2(aes(fun = fun)) +
xlim(c(-5,5)) +
facet_wrap(~fun, scales = "free_y")
Modifying Geom and Stat - StatFunction2
StatFunction2 <- ggproto(NULL, StatFunction)
## removing fun from the arguments
StatFunction2$compute_group <- function (data, scales, xlim = NULL, n = 101, args = list())
{
if (is.null(scales$x)) {
## need to change that here a bit
range <- rlang::`%||%`(xlim, c(0, 1))
xseq <- seq(range[1], range[2], length.out = n)
x_trans <- xseq
}
else {
## same same
range <- rlang::`%||%`(xlim, scales$x$dimension())
xseq <- seq(range[1], range[2], length.out = n)
if (scales$x$is_discrete()) {
x_trans <- xseq
}
else {
x_trans <- scales$x$trans$inverse(xseq)
}
}
## get the function, this is the trick :)
fun <- unique(data$fun)
if (rlang::is_formula(fun))
fun <- rlang::as_function(fun)
y_out <- do.call(fun, c(list(quote(x_trans)), args))
if (!is.null(scales$y) && !scales$y$is_discrete()) {
y_out <- scales$y$trans$transform(y_out)
}
ggplot2:::new_data_frame(list(x = xseq, y = y_out))
}
## update stat_function - remove fun argument and reference new geom_function2
stat_function2 <- function (mapping = NULL, data = NULL, geom = "function2", position = "identity",
..., fun, xlim = NULL, n = 101, args = list(), na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE)
{
if (is.null(data)) {
### the ::: calls are only needed to make this work outside the ggplot2 namespace
data <- ggplot2:::ensure_nonempty_data
}
layer(data = data, mapping = mapping, stat = StatFunction2,
geom = geom, position = position, show.legend = show.legend,
## fun needs to be removed here too.
inherit.aes = inherit.aes, params = list(n = n,
args = args, na.rm = na.rm, xlim = xlim, ...))
}
## This is the correct way to create copies (children) of ggproto objects
## see https://stackoverflow.com/a/70637511/7941188
GeomFunction2 <- ggproto(NULL, GeomFunction)
## change the required aesthetics - this removes the warning that aesthetics are not known
GeomFunction2$required_aes <- c("x", "y", "fun")
## update the corresponding geom (two locations in this function definition)
geom_function2 <- function (mapping = NULL, data = NULL, stat = "function2", position = "identity",
..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
{
if (is.null(data)) {
data <- ggplot2:::ensure_nonempty_data
}
layer(data = data, mapping = mapping, stat = stat, geom = GeomFunction2,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...))
}
Because fun= is not an aesthetic, I think you cannot facet on it. However, you can faux-facet using the patchwork package.
library(ggplot2)
library(patchwork) # provides `+` for combining separate ggplot objects
gsin <- ggplot(data.frame(x=0), aes(x)) +
geom_function(fun = sin) +
xlim(c(-5,5)) +
labs(title = "sin()")
gtan <- ggplot(data.frame(x=0), aes(x)) +
geom_function(fun = tan) +
xlim(c(-5,5)) +
labs(title = "tan()")
gsin + gtan
If you prefer the "look" of ggplot2's facets, you can choose this method instead:
gsin <- ggplot(data.frame(x=0, fun="sin"), aes(x)) +
facet_wrap(~fun) +
geom_function(fun = sin) +
xlim(c(-5,5))
gtan <- ggplot(data.frame(x=0, fun="tan"), aes(x)) +
facet_wrap(~fun) +
geom_function(fun = tan) +
xlim(c(-5,5))
gsin + gtan
All of this so far has the effect of facet_*(scales="free_y") (because we fixed xlim(.)). If you want to mimic faceting more closely, you need to control the limits of all facets:
ylims <- c(-1, 1)
gsin <- ggplot(data.frame(x=0, fun="sin"), aes(x)) +
facet_wrap(~fun) +
geom_function(fun = sin) +
xlim(c(-5,5)) +
scale_y_continuous(limits = ylims)
gtan <- ggplot(data.frame(x=0, fun="tan"), aes(x)) +
facet_wrap(~fun) +
geom_function(fun = tan) +
xlim(c(-5,5)) +
scale_y_continuous(name = NULL, guide = NULL, limits = ylims)
gsin + gtan
# Warning: Removed 22 row(s) containing missing values (geom_path).
Technically you are not required here to set the y-limits on all, but ... unless you know with certainty that the limits on unconstrained y-axes will be what you need, it is possible that they could be slightly off. For instance, if you change the initial function (in a quick hack) to be 2*sin(x) but forget to update the remaining facets' y-limits, then your plots will be lying. It's best to set the limits in a single place (ylims <- ...) and reference in all plots.
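One way to keep the limits in a single place is to wrap the panel construction in a small helper. The sketch below assumes a hypothetical function make_panel (not part of ggplot2 or patchwork) and uses guide = "none" to hide the axis on the second panel.

```r
library(ggplot2)

ylims <- c(-1, 1)  # single source of truth for the shared y-range
xlims <- c(-5, 5)

# make_panel() is a hypothetical helper: one faux-facet per function
make_panel <- function(f, label, show_axis = TRUE) {
  y_scale <- if (show_axis) {
    scale_y_continuous(limits = ylims)
  } else {
    # hide the axis title and guide on panels after the first
    scale_y_continuous(name = NULL, guide = "none", limits = ylims)
  }
  ggplot(data.frame(x = 0, fun = label), aes(x)) +
    facet_wrap(~fun) +  # only used to draw the facet-style strip label
    geom_function(fun = f) +
    xlim(xlims) +
    y_scale
}

panels <- list(
  make_panel(sin, "sin"),
  make_panel(cos, "cos", show_axis = FALSE)
)
```

With patchwork loaded, panels[[1]] + panels[[2]] or patchwork::wrap_plots(panels) should then combine the panels, and changing ylims once updates every panel.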
I have several dataframes that I have called and placed within a list as follows:
plots_list <- list(data_1, data_2, data_3, data_4, data_5)
I have defined a function, which takes in data from each dataframe and plots the residuals:
residual_plots <- function(data) {
ggplot(data, aes(x = x, y = y)) +
geom_point(
aes(color = col1,
shape = col2),
na.rm = T,
size = 1.2
) +
geom_hline(
yintercept = 0,
colour = "dimgray",
linetype = "solid",
size = 0.25
) +
geom_abline(
intercept = 0,
slope = 0.1,
colour = "dimgray",
linetype = "dashed",
size = 0.25
) +
geom_abline(
intercept = 0,
slope = -0.1,
colour = "dimgray",
linetype = "dashed",
size = 0.25
)
}
Since I have multiple dataframes, I wanted to use a for-loop to get the job done quickly. This is the code I have used:
res_plot_list <- vector("list", length = length(plots_list))
for (i in (plots_list)) {
res_plot_list[[i]] <- residual_plots(i)
}
ml <- marrangeGrob(res_plot_list, nrow = 2, ncol = 3)
ml
I have tried multiple variations to try and get all the plots onto the same page but I keep hitting some form of dead end. As of now, when I use this code, I keep running into the following error:
Error in `[[<-`(`*tmp*`, i, value = residual_plots(i)) :
invalid subscript type 'list'
I would really appreciate any help on how I can best approach this problem, because I am unable to find the right one. Thanks!
In your for loop, i is each data frame itself rather than an index, so res_plot_list[[i]] tries to subscript with a list and fails. Loop over the indices instead:
for (i in seq_along(plots_list)) {
res_plot_list[[i]] <- residual_plots(plots_list[[i]])
}
Or why not just use lapply -
res_plot_list <- lapply(plots_list, residual_plots)
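The two approaches can be compared on toy data without ggplot2. In this sketch, describe is a made-up stand-in for residual_plots, and the two small data frames are invented for illustration.

```r
# Toy inputs; describe() is a hypothetical stand-in for residual_plots()
plots_list <- list(
  a = data.frame(x = 1:3),
  b = data.frame(x = 1:5)
)
describe <- function(data) nrow(data)

# Index-based loop: i is an integer position, so [[i]] is a valid subscript
by_index <- vector("list", length(plots_list))
for (i in seq_along(plots_list)) {
  by_index[[i]] <- describe(plots_list[[i]])
}

# lapply: no pre-allocation needed, and element names are carried over
by_lapply <- lapply(plots_list, describe)

print(names(by_lapply))
```

Both give the same values; lapply additionally keeps the list names, which can be handy for labelling the plots later.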
So I'm making pyramid visualizations. I'm using scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) to produce the labels. However, I want to get rid of the negative sign on the female section of the graph.
I think the best way to keep the SI suffixes, but remove the negative sign is to modify the labels output by label_number_si, but labels = abs(label_number_si()) gives the following error: Error in abs: non-numeric argument to mathematical function
Any insight is appreciated.
EDIT: Use demo_continuous(c(-1e10,1e10), label = label_number_si()) labels should look as they do below EXCEPT that negative numbers should not have a "-" in front
I bet there's a simpler way to do this but I haven't figured it out yet.
Here's an example that replicates your question's result using the normal scales::label_number_si:
ggplot(data = data.frame(x = 1000*c(-5:-1, 1:5),
type = rep(1:2, each = 5))) +
geom_col(aes(x,abs(x),fill = type), orientation = "y") +
scale_x_continuous(labels = scales::label_number_si()) +
facet_wrap(~type, scales = "free_x")
We could make a custom version of scales::label_number_si which makes them absolute values in the last step. To make this, I used command-click (Mac OS X) on the function name to see the underlying function's code, and then just pasted that into a new function definition with minor modifications.
label_number_si_abs <- function (accuracy = 1, unit = NULL, sep = NULL, ...)
{
sep <- if (is.null(unit))
""
else " "
function(x) {
breaks <- c(0, 10^c(K = 3, M = 6, B = 9, T = 12))
n_suffix <- cut(abs(x), breaks = c(unname(breaks), Inf),
labels = c(names(breaks)), right = FALSE)
n_suffix[is.na(n_suffix)] <- ""
suffix <- paste0(sep, n_suffix, unit)
scale <- 1/breaks[n_suffix]
scale[which(scale %in% c(Inf, NA))] <- 1
scales::number(abs(x), accuracy = accuracy, scale = unname(scale),
suffix = suffix, ...)
}
}
We could replace with the custom function to get abs value labels:
ggplot(data = data.frame(x = 1000*c(-5:-1, 1:5),
type = rep(1:2, each = 5))) +
geom_col(aes(x,abs(x),fill = type), orientation = "y") +
scale_x_continuous(labels = label_number_si_abs()) +
facet_wrap(~type, scales = "free_x")
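A possibly simpler route: abs(label_number_si()) fails because label_number_si() returns a function, not numbers, so abs() has nothing numeric to work on. Wrapping the labeller so that abs() is applied to the breaks first avoids copying the scales source. The sketch below uses a hypothetical helper name, label_abs_si, and assumes scales 1.2.0 or later, where label_number() accepts scale_cut = cut_short_scale().

```r
library(scales)

# label_abs_si() is a hypothetical helper, not part of scales
label_abs_si <- function(accuracy = 0.1) {
  base_labeller <- label_number(accuracy = accuracy,
                                scale_cut = cut_short_scale())
  # Return a function of the breaks: take abs() first, then format
  function(x) base_labeller(abs(x))
}

example_labels <- label_abs_si()(c(-2500, 0, 2500))
print(example_labels)
```

It would be used as scale_y_continuous(labels = label_abs_si()). The same wrapping pattern should work with label_number_si() itself on older scales versions.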
I'm trying to graph multiple nonlinear least squares regression in r in different colors based on the value of a variable.
However, I also display the equation of the last one, and I would like the color in the nonlinear regression corresponding to the equation to be black as well.
What I've tried is shown in the geom_smooth() layer - I tried to include an ifelse() statement, but this doesn't work because of reasons described here: Different between colour argument and aes colour in ggplot2?
test <- function() {
require(ggplot2)
set.seed(1);
master <- data.frame(matrix(NA_real_, nrow = 0, ncol = 3))
for( i in 1:5 ) {
df <- data.frame(matrix(NA_real_, nrow = 50, ncol = 3))
colnames(df) <- c("xdata", "ydata", "test")
df$xdata = as.numeric(sample(1:100, size = nrow(df), replace = FALSE))
df$ydata = as.numeric(sample(1:3, size = nrow(df), prob=c(.60, .25, .15), replace = TRUE))
# browser()
df$test = i
master <- rbind(master, df)
}
df <- master
last <- 5
# based on https://stackoverflow.com/questions/18305852/power-regression-in-r-similar-to-excel
power_eqn = function(df, start = list(a=300,b=1)) {
m = nls(as.numeric(reorder(xdata,-ydata)) ~ a*ydata^b, start = start, data = df)
# View(summary(m))
# browser()
# eq <- substitute(italic(hat(y)) == a ~italic(x)^b*","~~italic(r)^2~"="~r2*","~~p~"="~italic(pvalue),
eq <- substitute(italic(y) == a ~italic(x)^b*","~~italic('se')~"="~se*","~~italic(p)~"="~pvalue,
list(a = format(coef(m)[1], digits = 6), # a
b = format(coef(m)[2], digits = 6), # b
# r2 = format(summary(m)$r.squared, digits = 3),
se = format(summary(m)$parameters[2,'Std. Error'], digits = 6), # standard error
pvalue = format(summary(m)$coefficients[2,'Pr(>|t|)'], digits=6) )) # p value (based on t statistic)
as.character(as.expression(eq))
}
plot1 <- ggplot(df, aes(x = as.numeric(reorder(xdata,-ydata)), y = ydata ) ) +
geom_point(color="black", shape=1 ) +
# PROBLEM LINE
stat_smooth(aes(color=ifelse(test==5, "black", test)), method = 'nls', formula = 'y~a*x^b', method.args = list(start= c(a =1,b=1)),se=FALSE, fullrange=TRUE) +
geom_text(x = quantile(df$xdata)[4], y = max(df$ydata), label = power_eqn(df), parse = TRUE, size=4, color="black") + # make bigger? add border around?
theme(legend.position = "none", axis.ticks.x = element_blank() ) + #, axis.title.x = "family number", axis.title.y = "number of languages" ) # axis.text.x = element_blank(),
labs( x = "xdata", y = "ydata", title="test" )
plot1
}
test()
This is the graph I got.
I would like the line corresponding to the points and equation to be black as well. Does anyone know how to do this?
I do not want to use a scale_fill_manual, etc., because my real data would have many, many more lines - unless the scale_fill_manual/etc. can be randomly generated.
You could use scale_color_manual with a custom palette in which your level of interest (in your example, where test equals 5) is set to black. Below I use palettes from RColorBrewer, extend them if necessary to the number of levels needed, and set the last color to black.
library(RColorBrewer) # provides several great palettes
createPalette <- function(n, colors = 'Greens') {
max_colors <- brewer.pal.info[colors, ]$maxcolors # Get maximum colors in palette
palette <- brewer.pal(min(max_colors, n), colors) # Get RColorBrewer palette
if (n > max_colors) {
palette <- colorRampPalette(palette)(n) # make it longer if n > max_colors
}
# assume that n-th color should be black
palette[n] <- "#000000"
# return palette
palette[1:n]
}
# create a palette with 5 levels using the Spectral palette
# change from 5 to the needed number of levels in your real data.
mypalette <- createPalette(5, 'Spectral') # palettes from RColorBrewer
We can then use mypalette with scale_color_manual(values=mypalette) to color points and lines according to the test variable.
Please note that I have updated geom_point and stat_smooth so that they use aes(color=as.factor(test)). I have also changed the call to power_eqn to only use data points where df$test==5. The black points, lines and equation should now be based on the same data.
plot1 <- ggplot(df, aes(x = as.numeric(reorder(xdata,-ydata)), y = ydata )) +
geom_point(aes(color=as.factor(test)), shape=1) +
stat_smooth(aes(color=as.factor(test)), method = 'nls', formula = 'y~a*x^b', method.args = list(start= c(a =1,b=1)),se=FALSE, fullrange=TRUE) +
geom_text(x = quantile(df$xdata)[4], y = max(df$ydata), label = power_eqn(df[df$test == 5,]), parse = TRUE, size=4, color="black") +
theme(legend.position = "none", axis.ticks.x = element_blank() ) +
labs( x = "xdata", y = "ydata", title="test" ) +
scale_color_manual(values = mypalette)
plot1
See resulting figure here (not reputation enough to include them)
I hope you find my answer useful.
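If the level to highlight is not always the last one, a named palette may be more robust than relying on position. The following is a sketch under that assumption; make_highlight_palette is a hypothetical helper, and it uses base R's hcl.colors (R 3.6 or later) rather than RColorBrewer so that any number of levels works.

```r
# make_highlight_palette() is a hypothetical helper, not from any package
make_highlight_palette <- function(levels, highlight,
                                   highlight_colour = "black") {
  levels <- as.character(levels)
  # One colour per level, generated automatically
  pal <- grDevices::hcl.colors(length(levels), palette = "Dark 3")
  names(pal) <- levels
  # Override the colour of the level of interest by name
  pal[as.character(highlight)] <- highlight_colour
  pal
}

mypalette_named <- make_highlight_palette(levels = 1:5, highlight = 5)
print(mypalette_named)
```

Passing this to scale_color_manual(values = mypalette_named) matches colours to factor levels by name, so the highlighted level stays black regardless of its position.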
I'd like to be able to build a string based on the number of columns in my matrix and pass that to ggplot as an aesthetic. This doesn't seem to be covered by the aes_string() function. The reason I want this is that I'm using the ggalluvial package but the intricacies matter less than the principle. My code looks like this:
library(ggplot2)
library(ggalluvial)
my_alluvial_plot <- function(scores, n_groups = 5) {
score_names <- names(scores)
scr_mat <- data.matrix(scores)
n_cols <- ncol(scores)
# create ntiles of scores so that flow can be seen between groups
ranks <- apply(scr_mat, 2, function(x) {
rk <- dplyr::ntile(x, n_groups)
return(as.factor(rk))
})
to_plot <- data.frame(ranks)
# build the string for the aes() function
a_string <- ""
for (i in 1:n_cols) {
a_string <- paste0(a_string, "axis", i, " = to_plot[, ", i, "],")
}
# remove final comma
a_string <- substr(a_string, 1, nchar(a_string) - 1)
ggplot(to_plot,
aes(eval(a_string))) +
geom_alluvium(aes(fill = to_plot[, n_cols], width = 1/12)) +
geom_stratum(width = 1/12, fill = "black", color = "grey") +
scale_x_continuous(breaks = 1:n_cols, labels = score_names) +
scale_fill_brewer(type = "qual", palette = "Set1")
}
df <- data.frame(col1 = runif(10),
col2 = runif(10),
col3 = rnorm(10),
col4 = rnorm(10))
my_alluvial_plot(df)
This produces a blank plot with the following error:
Warning: Ignoring unknown aesthetics: width
Error: Discrete value supplied to continuous scale
Basically, I want to build an alluvial plot that can support an arbitrary number of columns, so the ggplot code as it's evaluated would end up being like
ggplot(to_plot,
aes(axis1 = data[, 1], axis2 = data[, 2], axis3 = data[, 3], ...))
But neither eval() nor parse() produces anything sensible. aes_string() produces the same problem. Is there any way to do this systematically?
The reason you can't run parse() or eval() on strings like "axis1 = col1, axis2 = col2" is that such a string is not valid R code by itself. But the entire ggplot call? That can be parsed!
If you rework the plot call like this, it produces the alluvial plot just fine:
gg_string <- paste0("ggplot(to_plot,
aes(", a_string, ")) +
geom_alluvium(aes(fill = to_plot[, n_cols], width = 1/12)) +
geom_stratum(width = 1/12, fill = 'black', color = 'grey') +
scale_x_continuous(breaks = 1:n_cols, labels = score_names) +
scale_fill_brewer(type = 'qual', palette = 'Set1')")
eval(parse(text = gg_string))
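An alternative that avoids string parsing altogether is to build the mapping from symbols and splice it into aes() with !!!, which aes() supports through rlang quasiquotation (ggplot2 3.0.0 or later). This is a sketch with made-up data; it is not tested against ggalluvial here, and the names axis_syms and mapping are only illustrative.

```r
library(ggplot2)

# Toy stand-in for the ranked scores
to_plot <- data.frame(
  col1 = factor(c(1, 2, 1)),
  col2 = factor(c(2, 2, 1)),
  col3 = factor(c(1, 1, 2))
)

# One symbol per column, named axis1, axis2, ...
axis_syms <- rlang::syms(names(to_plot))
names(axis_syms) <- paste0("axis", seq_along(axis_syms))

# Splice the named symbols into aes(); equivalent to writing
# aes(axis1 = col1, axis2 = col2, axis3 = col3) by hand
mapping <- aes(!!!axis_syms)
print(mapping)
```

The result can be passed as ggplot(to_plot, mapping) and should behave like the hand-written aes() call for any number of columns. Referring to columns by name, rather than to_plot[, i], also keeps the mapping tied to the plot data.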