I'm building a function that lets users select a column from a list of options. Inside the function, I want an if statement: if the x variable is one of the options the function is designed to work with, it generates the plot; otherwise, it prints an error message.
However, when I try to do if({{x_variable}} == ses)--the correct variable in this example--I keep getting
Error in make_plot(test_data, x_variable = ses) : object 'ses' not found
That's the same error I get for if(enquo(x_variable) == ses) and if(!!x_variable == ses)
The correct answer will produce the plot when x_variable is ses and will print the error when x_variable is anything else.
Here's a sample dataset and my function (that does not work collectively, but each individual part does):
library(dplyr)
library(ggplot2)
library(rlang) #if enquo() is needed
test_data <- tibble(ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
total = c(10, 20, 20, 30, 9, 11, 40, 60))
make_plot <- function(data, x_variable) {
if({{x_variable}} == "ses") {
ggplot(data = test_data, aes(x = {{x_variable}}, y = total)) +
geom_col()
} else {
print("This function isn't designed for this variable, sorry!")
}
}
make_plot(test_data, x_variable = ses)
make_plot(test_data, x_variable = anything_else)
You have to use:
make_plot(test_data, x_variable = "ses")
or alternatively:
ses <- "ses"
make_plot(test_data, x_variable = ses)
This error means that the object ses is not declared.
If you want to be able to pass undeclared objects such as ses as an input, you could use substitute(x_variable) or deparse(substitute(x_variable)):
make_plot <- function(data, x_variable) {
  # deparse(substitute()) turns the unevaluated argument into a string
  if(deparse(substitute(x_variable)) == "ses") {
    # use the data argument rather than the global test_data
    ggplot(data = data, aes(x = ses, y = total)) +
      geom_col()
  } else {
    print("This function isn't designed for this variable, sorry!")
  }
}
This is non-standard evaluation, however, so make sure this is indeed what you're after, as it can lead to surprising behaviours.
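To make the "surprising behaviours" concrete, here is a minimal base R sketch. The helper names get_name and wrapper are hypothetical, chosen only for illustration. substitute() captures the expression exactly as it was written at the immediate call site, so wrapping the function in another function changes what it sees:

```r
# Hypothetical helper: returns the text of whatever expression the caller typed.
get_name <- function(x_variable) {
  deparse(substitute(x_variable))
}

# Hypothetical wrapper that simply forwards its argument to get_name().
wrapper <- function(v) {
  get_name(v)
}

direct  <- get_name(ses)  # the expression typed here is `ses`
wrapped <- wrapper(ses)   # the expression get_name() sees is `v`

print(direct)   # "ses"
print(wrapped)  # "v"
```

Note that ses never has to exist here, because the argument is never evaluated.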
This explains the difference between both options, from Advanced R
To supplement @gaut's elaborate answer, or if it's easier for you to remember (or to understand how {{ works), you would need the following to use the curly brackets.
names(select(data, {{x_variable}})) == "ses"
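A complete version of the function might look like the sketch below. It swaps in rlang::as_name(enquo(x_variable)) to get the column name as a string, rather than the select() call above, because select() raises its own error when the column doesn't exist in the data (as with anything_else), so the else branch would never be reached. The supported argument is a hypothetical addition, not part of the original question.

```r
library(ggplot2)
library(rlang)
library(tibble)

test_data <- tibble(
  ses   = rep(c("High", "Mid", "Mid Low", "Low"), 2),
  total = c(10, 20, 20, 30, 9, 11, 40, 60)
)

# `supported` is a hypothetical argument listing the columns this function handles.
make_plot <- function(data, x_variable, supported = c("ses")) {
  # Capture the argument without evaluating it and convert it to a string.
  x_name <- as_name(enquo(x_variable))

  if (x_name %in% supported) {
    ggplot(data, aes(x = {{ x_variable }}, y = total)) +
      geom_col()
  } else {
    message("This function isn't designed for this variable, sorry!")
    invisible(NULL)
  }
}

p_ok  <- make_plot(test_data, x_variable = ses)            # a ggplot object
p_bad <- make_plot(test_data, x_variable = anything_else)  # message, returns NULL
```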
Suppose you have data:
df = data.frame(A = rep(1:10,each=10),B = rep(1:10,times=10),C = runif(100,0,1))
I've written a function that takes a column name as an argument:
plotFill<-function(dataframe,variable){
if(!(variable %in% names(dataframe))) stop("Variable not in data.frame")
plot = ggplot(data=dataframe,aes(x=A,y=B,z=get(variable))) +
geom_tile(aes(fill=get(variable)))
return(plot)
}
You can therefore run this doing: plotFill(df,"C")
I'm trying to label the legend with the name of the variable passed, but adding labs(colour=variable) doesn't work, which I think it should since variable is a string...
If it's only about the label name, you could use plot$labels$fill:
plotFill<-function(dataframe,variable){
if(!(variable %in% names(dataframe))) stop("Variable not in data.frame")
plot = ggplot(data=dataframe,aes(x=A,y=B,z=get(variable))) +
geom_tile(aes(fill=get(variable)))
plot$labels$fill <- variable
return(plot)
}
You shouldn't use get here. Instead, use aes_string.
plotFill<-function(dataframe,variable){
if(!(variable %in% names(dataframe))) stop("Variable not in data.frame")
plot = ggplot(data=dataframe,aes_string(x="A",y="B",z=variable)) +
geom_tile(aes_string(fill=variable))
return(plot)
}
plotFill(df,"C")
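In more recent ggplot2 releases aes_string() is deprecated; the .data pronoun is one alternative that also accepts a column name stored in a string. A sketch, assuming ggplot2 3.0.0 or later, which also sets the legend title explicitly with labs(fill = variable), since the legend here belongs to the fill aesthetic rather than colour:

```r
library(ggplot2)

df <- data.frame(A = rep(1:10, each = 10),
                 B = rep(1:10, times = 10),
                 C = runif(100, 0, 1))

plotFill <- function(dataframe, variable) {
  if (!(variable %in% names(dataframe))) stop("Variable not in data.frame")
  ggplot(dataframe, aes(x = A, y = B)) +
    # .data[[variable]] looks up the column named by the string `variable`
    geom_tile(aes(fill = .data[[variable]])) +
    # label the fill legend with the column name
    labs(fill = variable)
}

p <- plotFill(df, "C")
```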
I have a question about converting a variable name into a string to use as an x-axis name.
I tried to apply deparse(substitute(input)), but unfortunately it doesn't work well when I call this function within another function.
plot_CI <- function(input){
nm <- deparse(substitute(input))
if (substring(nm,1,1) == 'u') {
prior <- 'uniform'
} else if ((substring(nm,1,1) == 'l')) {
prior <- 'logit_Normal'
} else {
prior <- paste(strsplit(nm,"_")[[1]][1:2],collapse="_")
}
plot <- ggplot(temp_data, aes(x = x, y = mean)) +
geom_point(size = 2) +
geom_errorbar(aes(ymax = high, ymin = low)) +
geom_hline(yintercept = true_value, col = 'blue') +
labs(x=prior, y='value')
return(plot)
}
sen_plot <- function(variable){
# variable <- deparse(substitute(var))
file_name <- paste0('C:/Users/Qiangsuper/Dropbox/Papers/1/plot/sensitivity_', variable, '.png')
png(filename = file_name, width = 1000, height = 400)
p1 <- plot_CI(eval(parse(text = paste0('uniform_', variable))))
p2 <- plot_CI(eval(parse(text = paste0('logitN_', variable))))
multiplot(p1,p2,cols=2)
dev.off()
}
for (i in c("beta_1", "beta_2", "beta_3", "phi", "p", "delta")) {
sen_plot(i)
}
I expected 'uniform' as the x-axis name; however, I only received eval(parse(text = paste0('uniform_', variable))).
Thank you very much for your help.
UPDATE ABOUT QUESTION:
I think I should make the question more concise. Here is an easily-understood scenario.
uniform_beta_1 is a data frame or data table which stores my results. I am trying to develop an automatic plotting algorithm that identifies which prior distribution I applied and names the x-axis after it. For uniform_beta_1, the prior distribution is uniform, so the x-axis name will be uniform. Here is what I tried:
input <- uniform_beta_1
nm <- deparse(substitute(input))
Then apply the substring command to judge the prior distribution. However, in this case, nm will return 'input' rather than 'uniform_beta_1'. I am wondering if there is any way I can return 'uniform_beta_1'.
Thank you very much for your help.
The final solution to my question was to create a list at the very beginning and store all these strings in it, which avoids the transformation inside the function.
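A minimal sketch of that list-based approach, using only base R. The contents of results and the helper prior_from_name are hypothetical stand-ins. The idea is that the names travel with the data as plain strings, so nothing depends on deparse(substitute()), which only ever sees the expression written at the call site (input, or the eval(parse(...)) call) rather than the object's original name.

```r
# Hypothetical stand-ins for the real result tables.
results <- list(
  uniform_beta_1 = data.frame(x = 1:3, mean = c(0.9, 1.0, 1.1)),
  logitN_beta_1  = data.frame(x = 1:3, mean = c(0.8, 1.0, 1.2))
)

# Derive the prior label from a name given as a string
# (same rules as plot_CI() in the question).
prior_from_name <- function(nm) {
  first <- substring(nm, 1, 1)
  if (first == "u") {
    "uniform"
  } else if (first == "l") {
    "logit_Normal"
  } else {
    paste(strsplit(nm, "_")[[1]][1:2], collapse = "_")
  }
}

# Iterate over the names and look each table up by name.
priors <- vapply(names(results), prior_from_name, character(1))
for (nm in names(results)) {
  temp_data <- results[[nm]]
  cat(nm, "->", priors[[nm]], "with", nrow(temp_data), "rows\n")
}
```

With this layout, a plotting function could take the data and the label as two separate arguments instead of trying to recover the label from the argument expression.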
Before you mark as dup, I know about Use character string as function argument, but my use case is slightly different. I don't need to pass a parameter INSIDE the function, I would like to pass a dynamic number of parameters after a + (think ggplot2).
(Note: Please don't format and remove the extra-looking ####, I have left them in so people can copy paste the code into R for simplicity).
This has been my process:
#### So let's reproduce this example:
library(condformat)
condformat(iris[c(1:5,70:75, 120:125),]) +
rule_fill_discrete(Species) +
rule_fill_discrete(Petal.Width)
#### I would like to be able to pass the two rule_fill_discrete() functions dynamically (in my real use-case I have a variable number of possible inputs and it's not possible to hardcode these in).
#### First, create a function to generalize:
PlotSeries <- function(x){
b=NULL
for (i in 1:length(x)){
a <- paste('rule_fill_discrete(',x[i],')',sep="")
b <- paste(paste(b,a,sep="+"))
}
b <- gsub("^\\+","",b)
eval(parse(text = b))
}
#### Which works with one argument
condformat(iris[c(1:5,70:75, 120:125),]) +
PlotSeries("Species")
#### But not if we pass two arguments:
condformat(iris[c(1:5,70:75, 120:125),]) +
PlotSeries(c("Species","Petal.Width"))
Error in rule_fill_discrete(Species) + rule_fill_discrete(Petal.Width) :
non-numeric argument to binary operator
#### It will work if we call each individually
condformat(iris[c(1:5,70:75, 120:125),]) +
PlotSeries("Species") +
PlotSeries("Petal.Width")
#### Which gives us an indication as to what the problem is... the fact that it doesn't like when the rule_fill_discrete statements are passed in as one statement. Let's test this:
condformat(iris[c(1:5,70:75, 120:125),]) +
eval(rule_fill_discrete(Species) +
rule_fill_discrete(Petal.Width) )
Error in rule_fill_discrete(Species) + rule_fill_discrete(Petal.Width) :
non-numeric argument to binary operator
#### Fails. But:
condformat(iris[c(1:5,70:75, 120:125),]) +
eval(rule_fill_discrete(Species)) +
eval(rule_fill_discrete(Petal.Width) )
#### This works. But we need to be able to pass in a GROUP of statements (that's kinda the whole point). So let's try to get the eval statements in:
Nasty <- "eval(rule_fill_discrete(Species)) eval(rule_fill_discrete(Petal.Width))"
condformat(iris[c(1:5,70:75, 120:125),]) + Nasty #### FAIL
Error in +.default(condformat(iris[c(1:5, 70:75, 120:125), ]), Nasty) :
non-numeric argument to binary operator
condformat(iris[c(1:5,70:75, 120:125),]) + eval(Nasty) #### FAIL
Error in +.default(condformat(iris[c(1:5, 70:75, 120:125), ]), eval(Nasty)) :
non-numeric argument to binary operator
condformat(iris[c(1:5,70:75, 120:125),]) + parse(text=Nasty) #### FAIL
Error in +.default(condformat(iris[c(1:5, 70:75, 120:125), ]), parse(text = Nasty)) :
non-numeric argument to binary operator
condformat(iris[c(1:5,70:75, 120:125),]) + eval(parse(text=Nasty)) #### FAIL
Error in eval(rule_fill_discrete(Species)) + eval(rule_fill_discrete(Petal.Width)) :
non-numeric argument to binary operator
So how can we do it?
Thanks to this Stack Overflow question and to the bug report from @amit-kohli, I was made aware that there was a bug in the condformat package.
Update: Answer updated to reflect the new condformat API introduced in condformat 0.7.
Here I show how to do it (using condformat 0.7.0). Note that the syntax I use in the standard evaluation function is derived from the rlang package.
Install condformat:
install.packages("condformat")
A simple example, asked in the question:
# Reproduce the example
library(condformat)
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_discrete(Species) %>%
rule_fill_discrete(Petal.Width)
# With variables:
col1 <- rlang::quo(Species)
col2 <- rlang::quo(Petal.Width)
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_discrete(!! col1) %>%
rule_fill_discrete(!! col2)
# Or even with character strings to give the column names:
col1 <- "Species"
col2 <- "Petal.Width"
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_discrete(!! col1) %>%
rule_fill_discrete(!! col2)
# Do it programmatically (In a function)
#' @importFrom magrittr %>%
some_color <- function(data, col1, col2) {
condformat::condformat(data) %>%
condformat::rule_fill_discrete(!! col1) %>%
condformat::rule_fill_discrete(!! col2)
}
some_color(iris[c(1:5,70:75, 120:125),], "Species", "Petal.Width")
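The same pattern extends to a variable number of columns, which is what the question originally asked for. The following is only a sketch, assuming the condformat 0.7+ API used above (rules take the condformat object as their first argument and accept column names as strings via !!); the function name color_columns is hypothetical. Reduce() applies one rule per column name in turn.

```r
library(condformat)

# Hypothetical helper: apply rule_fill_discrete() to every column named in `cols`.
color_columns <- function(data, cols) {
  Reduce(
    # cf is the table built so far; col is the next column name (a string)
    f = function(cf, col) rule_fill_discrete(cf, !! col),
    x = cols,
    init = condformat(data)
  )
}

out <- color_columns(iris[c(1:5, 70:75, 120:125), ],
                     c("Species", "Petal.Width"))
```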
A more general example, using an expression:
# General example, using an expression:
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_gradient(Species, expression = Sepal.Width - Sepal.Length)
# General example, using a column given as character and an
# expression given as character as well:
expr <- rlang::parse_expr("Sepal.Width - Sepal.Length")
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_gradient("Species", expression = !! expr)
# General example, in a function, everything given as a character:
two_column_difference <- function(data, col_to_colour, col1, col2) {
expr1 <- rlang::parse_expr(col1)
expr2 <- rlang::parse_expr(col2)
condformat::condformat(data) %>%
condformat::rule_fill_gradient(
!! col_to_colour,
expression = (!!expr1) - (!!expr2))
}
two_column_difference(iris[c(1:5,70:75, 120:125),],
col_to_colour = "Species",
col1 = "Sepal.Width",
col2 = "Sepal.Length")
Custom discretized scales for continuous values
Custom discrete color values can be specified with a function that preprocesses a continuous column into a discrete scale:
discretize <- function(column) {
sapply(column,
FUN = function(value) {
if (value < 4.7) {
return("low")
} else if (value < 5.0) {
return("mid")
} else {
return("high")
}
})
}
And we can specify the colors for each of the levels of the scale using colours =:
condformat(head(iris)) %>%
rule_fill_discrete(
"Sepal.Length",
expression = discretize(Sepal.Length),
colours = c("low" = "red", "mid" = "yellow", "high" = "green"))
If we want, the discretize function can return colours:
discretize_colours <- function(column) {
sapply(column,
FUN = function(value) {
if (value < 4.7) {
return("red")
} else if (value < 5.0) {
return("yellow")
} else {
return("green")
}
})
}
The code to use it:
condformat(head(iris)) %>%
rule_fill_discrete(
"Sepal.Length",
expression = discretize_colours(Sepal.Length),
colours = identity)
Note that as expression returns the colours we use colours = identity. identity is just function(x) x.
Finally, using some rlang tidy evaluation we can create a function:
colour_based_function <- function(data, col1) {
col <- rlang::parse_expr(col1)
condformat::condformat(data) %>%
condformat::rule_fill_discrete(
columns = !! col1,
expression = discretize_colours(!! col),
colours = identity)
}
colour_based_function(head(iris), "Sepal.Length")
NOTE: This answer provides a workaround for a bug in an old version of condformat. The bug has since been fixed; see @zeehio's answer for the current version.
I think you have two mostly separate questions that are mixed together in your post. I will attempt to restate and answer them individually, and then put things together, which doesn't work all the way at this point but gets close.
First, let's save some typing by defining a couple variables:
ir = iris[c(1:5,70:75, 120:125), ]
cf = condformat(ir)
Q1: How do I use + on a vector or list of inputs?
This is the easy question. The base answer is Reduce. The following are all equivalent:
10 + 1 + 2 + 5
"+"("+"("+"(10, 1), 2), 5)
Reduce("+", c(1, 2, 5), init = 10)
More pertinent to your case, we can do this to replicate your desired output:
fills = list(rule_fill_discrete(Species), rule_fill_discrete(Petal.Width))
res = Reduce(f = "+", x = fills, init = cf)
res
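To see what Reduce is doing without condformat installed, here is a toy sketch in base R. The toy_table and toy_rule classes and the + method are hypothetical stand-ins for condformat's objects, not its real implementation.

```r
# Hypothetical stand-ins: a "table" that collects rules, and a "rule" naming a column.
new_table <- function() structure(list(rules = character()), class = "toy_table")
new_rule  <- function(column) structure(list(column = column), class = "toy_rule")

# `+` method: adding a rule to a table appends the rule's column name.
"+.toy_table" <- function(e1, e2) {
  e1$rules <- c(e1$rules, e2$column)
  e1
}

rules <- lapply(c("Species", "Petal.Width"), new_rule)

# Fold the list of rules onto the table, left to right:
# ((table + rule1) + rule2)
folded  <- Reduce(function(tbl, rule) tbl + rule, rules, init = new_table())
by_hand <- new_table() + rules[[1]] + rules[[2]]

print(folded$rules)
```

The fold and the hand-written chain of + calls produce the same object, which is why a list of any length can replace the hardcoded chain.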
Q2: How do I use string inputs with rule_fill_discrete?
This is my first time using condformat, but it looks to be written in the lazyeval paradigm, with rule_fill_discrete_ as a standard-evaluating counterpart to the non-standard-evaluating rule_fill_discrete. This example is even given in ?rule_fill_discrete, but it doesn't work as expected:
cf + rule_fill_discrete_(columns = "Species")
# bad: Species column colored entirely red, not colored by species
# possibly a bug? At the very least misleading documentation...
cf + rule_fill_discrete_(columns = "Species", expression = expression(Species))
# bad: works as expected, but still uses an unquoted Species
# other failed attempts
cf + rule_fill_discrete_(columns = "Species", expression = expression("Species"))
cf + rule_fill_discrete_(columns = "Species", expression = "Species")
# bad: the column is still filled with a single color
There is also an env environment argument in the SE function, but I had no luck with that either. Maybe someone with more lazyeval/expression experience can point out something I'm overlooking or doing wrong.
Work-around: What we can do is pass the column directly. This works because we're not doing any fancy functions of the column, just using its values directly to determine the coloring:
cf + rule_fill_discrete_(columns = c("Species"), expression = ir[["Species"]])
# hacky, but it works
Putting it together
Using the NSE version with Reduce is easy:
fills = list(rule_fill_discrete(Species), rule_fill_discrete(Petal.Width))
res = Reduce(f = "+", x = fills, init = cf)
res
# works!
Using SE with input strings, we can use the hacky workaround.
input = c("Species", "Petal.Width")
fills_ = lapply(input, function(x) rule_fill_discrete_(x, expression = ir[[x]]))
res_ = Reduce(f = "+", x = fills_, init = cf)
res_
# works!
And this, of course, you could wrap up into a custom function that takes a data frame and a string vector of column names as input.
@Gregor's answer was perfect. A bit hacky, but works excellently.
My use-case was a bit more complicated, so I will post it here in case it's useful to somebody else.
In my use-case, I needed to be able to color multiple columns based on the values of one column. condformat allows us to do this already, but again we run into the parametrization problem. Here's my solution to that, based on the response by Gregor:
CondFormatForInput <- function(Table,VectorToColor,VectorFromColor) {
  cf <- condformat(Table)
  # Pair each column to color with the column whose values decide the color.
  # Use Table (not the global iris) so the function works for any input.
  fills2_ = purrr::map2(VectorToColor, VectorFromColor,
    function(x, y) rule_fill_discrete_(x, expression = Table[[y]]))
  res_ = Reduce(f = "+", x = fills2_, init = cf)
  res_
}
CondFormatForInput(iris,
c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
c("Sepal.Width","Sepal.Width","Petal.Width","Petal.Width"))
Question: why can't I call sapply inside aes()?
Goal of following figure: Create histogram showing proportion that died/lived so that the proportion for each combination of group/type sums to 1 (example inspired by previous post).
I know you could make the figure by summarising outside of ggplot but the question is really about why the function isn't working inside of aes.
## Data
set.seed(999)
dat <- data.frame(group=factor(rep(1:2, 25)),
type=factor(sample(1:2, 50, rep=T)),
died=factor(sample(0:1, 50, rep=T)))
## Setup the figure
p <- ggplot(dat, aes(x=died, group=interaction(group, type), fill=group, alpha=type)) +
theme_bw() +
scale_alpha_discrete(range=c(0.5, 1)) +
ylab("Proportion")
## Proportions, all groups/types together sum to 1 (not wanted)
p + geom_histogram(aes(y=..count../sum(..count..)), position=position_dodge())
## Look at groups
stuff <- ggplot_build(p)
stuff$data[[1]]
## The long way works: proportions by group/type
p + geom_histogram(
aes(y=c(..count..[..group..==1] / sum(..count..[..group..==1]),
..count..[..group..==2] / sum(..count..[..group..==2]),
..count..[..group..==3] / sum(..count..[..group..==3]),
..count..[..group..==4] / sum(..count..[..group..==4]))),
position='dodge'
)
## Why can't I call sapply there?
p + geom_histogram(
aes(y=sapply(unique(..group..), function(g)
..count..[..group..==g] / sum(..count..[..group..==g]))),
position='dodge'
)
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'expr' of mode 'function' was not found
So, the issue arises because of a recursive call to ggplot2:::strip_dots for any aesthetics that include 'calculated aesthetics'. There is some discussion around the calculated aesthetics in this SO question and answer. The relevant code in layer.r is here:
new <- strip_dots(aesthetics[is_calculated_aes(aesthetics)])
i.e. strip_dots is called only if there are calculated aesthetics, defined using the regex "\\.\\.([a-zA-z._]+)\\.\\.".
strip_dots in aes_calculated.r takes a recursive approach, working down through the nested calls and stripping out the dots. The code is like this:
function (expr)
{
if (is.atomic(expr)) {
expr
}
else if (is.name(expr)) {
as.name(gsub(match_calculated_aes, "\\1", as.character(expr)))
}
else if (is.call(expr)) {
expr[-1] <- lapply(expr[-1], strip_dots)
expr
}
else if (is.pairlist(expr)) {
as.pairlist(lapply(expr, expr))
}
else if (is.list(expr)) {
lapply(expr, strip_dots)
}
else {
stop("Unknown input:", class(expr)[1])
}
}
If we supply an anonymous function to this code as follows:
anon <- as.call(quote(function(g) mean(g)))
ggplot2:::strip_dots(anon)
we reproduce the error:
#Error in get(as.character(FUN), mode = "function", envir = envir) :
# object 'expr' of mode 'function' was not found
Working through this, we can see that anon is a call. For calls, strip_dots will use lapply to call strip_dots on the second and third elements of the call. For an anonymous function like this, the second element is the formals of the function. If we look at the formals of anon using dput(formals(eval(anon))) or dput(anon[[2]]) we see this:
#pairlist(g = )
For pairlists, strip_dots tries to lapply it to itself. I'm not sure why this code is there, but certainly in this circumstance it leads to the error:
expr <- anon[[2]]
lapply(expr, expr)
# Error in get(as.character(FUN), mode = "function", envir = envir) :
# object 'expr' of mode 'function' was not found
TL;DR: At this stage, ggplot2 doesn't support the use of anonymous functions within aes where a calculated aesthetic (such as ..count..) is used.
Anyway, the desired end result can be achieved using dplyr; in general I think it makes for more readable code to separate out the data summarisation from the plotting:
newDat <- dat %>%
group_by(died, type, group) %>%
summarise(count = n()) %>%
group_by(type, group) %>%
mutate(Proportion = count / sum(count))
p <- ggplot(newDat, aes(x = died, y = Proportion, group = interaction(group, type), fill=group, alpha=type)) +
theme_bw() +
scale_alpha_discrete(range=c(0.5, 1)) +
geom_bar(stat = "identity", position = "dodge")
ggplot2 fix
I've forked ggplot2 and have made two changes to aes_calculated.r which fix the problem. The first was to correct the handling of pairlists to lapply strip_dots instead of expr, which I think must have been the intended behaviour. The second was that for formals with no default value (like in the examples provided here), as.character(as.name(expr)) throws an error because expr is an empty name, and while this is a valid construct, it's not possible to create one from an empty string.
Forked version of ggplot2 at https://github.com/NikNakk/ggplot2 and pull request just made.
Finally, after all that, the sapply example given doesn't work because it returns a 2 row by 4 column matrix rather than an 8 length vector. The corrected version is like this:
p + geom_histogram(
aes(y=unlist(lapply(unique(..group..), function(g)
..count..[..group..==g] / sum(..count..[..group..==g])))),
position='dodge'
)
This gives the same output as the dplyr solution above.
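The difference between the two return shapes is easy to check outside ggplot2. A small base R sketch with made-up counts (the count and group vectors are illustrative, not the values from the plot above):

```r
# Made-up example: 4 groups with 2 bars each, sorted by group.
count <- c(10, 15, 12, 13, 8, 17, 11, 14)
group <- rep(1:4, each = 2)

prop_in_group <- function(g) count[group == g] / sum(count[group == g])

as_matrix <- sapply(unique(group), prop_in_group)          # simplified to a matrix
as_vector <- unlist(lapply(unique(group), prop_in_group))  # flat vector

print(dim(as_matrix))     # 2 4
print(length(as_vector))  # 8
```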
One other thing to note is that this lapply code assumes that the data at that stage is sorted by group. I think this is always the case, but if for whatever reason it weren't you would end up with the y data out of order. An alternative which preserves the order of the rows in the calculated data would be:
p + geom_histogram(
aes(y={grp_total <- tapply(..count.., ..group.., sum);
..count.. / grp_total[as.character(..group..)]
}),
position='dodge'
)
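To illustrate the ordering point, here is a base R sketch with made-up, deliberately unsorted data. The tapply version returns proportions in the original row order, while the unlist(lapply(...)) version returns them grouped by the first appearance of each group.

```r
# Made-up counts with groups deliberately out of order.
count <- c(3, 1, 2, 4, 6, 4)
group <- c(2, 1, 2, 1, 3, 3)

# Per-group totals, named by group ("1", "2", "3").
grp_total <- tapply(count, group, sum)

# Look up each row's group total by name; row order is preserved.
by_row <- as.numeric(count / grp_total[as.character(group)])

# Group-by-group version: rows come out in the order the groups first appear.
by_group <- unlist(lapply(unique(group), function(g)
  count[group == g] / sum(count[group == g])))

print(by_row)    # 0.6 0.2 0.4 0.8 0.6 0.4
print(by_group)  # 0.6 0.4 0.2 0.8 0.6 0.4
```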
It's also worth being aware that these expressions are evaluated in baseenv(), the namespace of the base package. This means that any functions from other packages, even standard ones like stats and utils, need to be used with the :: operator (e.g. stats::rnorm).
After playing around a little, the problem appears to be using anonymous functions with ..group.. or ..count.. inside aes:
xy <- data.frame(x=1:10,y=1:10) #data
ggplot(xy, aes(x = x, y = sapply(y, mean))) + geom_line() #sapply is fine
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(..group.., mean))) #sapply with ..group.. is fine
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(..group.., function(g) {mean(g)})))
#broken, with same error
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(y, function(g) {mean(g)})), stat = "identity")
#sapply with anonymous functions works fine!
It seems like a really weird bug, unless I'm missing something stupid.