How can I pass character strings as independent parameters after a `+`? - r

Before you mark as dup, I know about Use character string as function argument, but my use case is slightly different. I don't need to pass a parameter INSIDE the function, I would like to pass a dynamic number of parameters after a + (think ggplot2).
(Note: Please don't format and remove the extra-looking ####, I have left them in so people can copy paste the code into R for simplicity).
This has been my process:
#### So let's reproduce this example:
library(condformat)
condformat(iris[c(1:5,70:75, 120:125),]) +
rule_fill_discrete(Species) +
rule_fill_discrete(Petal.Width)
#### I would like to be able to pass the two rule_fill_discrete() functions dynamically (in my real use-case I have a variable number of possible inputs and it's not possible to hardcode these in).
#### First, create a function to generalize:
PlotSeries <- function(x){
b=NULL
for (i in 1:length(x)){
a <- paste('rule_fill_discrete(',x[i],')',sep="")
b <- paste(paste(b,a,sep="+"))
}
b <- gsub("^\\+","",b)
eval(parse(text = b))
}
#### Which works with one argument
condformat(iris[c(1:5,70:75, 120:125),]) +
PlotSeries("Species")
#### But not if we pass two arguments:
condformat(iris[c(1:5,70:75, 120:125),]) +
PlotSeries(c("Species","Petal.Width"))
Error in rule_fill_discrete(Species) + rule_fill_discrete(Petal.Width) :
non-numeric argument to binary operator
#### It will work if we call each individually
condformat(iris[c(1:5,70:75, 120:125),]) +
PlotSeries("Species") +
PlotSeries("Petal.Width")
#### Which gives us an indication as to what the problem is... the fact that it doesn't like when the rule_fill_discrete statements are passed in as one statement. Let's test this:
condformat(iris[c(1:5,70:75, 120:125),]) +
eval(rule_fill_discrete(Species) +
rule_fill_discrete(Petal.Width) )
Error in rule_fill_discrete(Species) + rule_fill_discrete(Petal.Width) :
non-numeric argument to binary operator
#### Fails. But:
condformat(iris[c(1:5,70:75, 120:125),]) +
eval(rule_fill_discrete(Species)) +
eval(rule_fill_discrete(Petal.Width) )
#### This works. But we need to be able to pass in a GROUP of statements (that's kinda the whole point). So let's try to get the eval statements in:
Nasty <- "eval(rule_fill_discrete(Species)) eval(rule_fill_discrete(Petal.Width))"
condformat(iris[c(1:5,70:75, 120:125),]) + Nasty #### FAIL
Error in +.default(condformat(iris[c(1:5, 70:75, 120:125), ]), Nasty) :
non-numeric argument to binary operator
condformat(iris[c(1:5,70:75, 120:125),]) + eval(Nasty) #### FAIL
Error in +.default(condformat(iris[c(1:5, 70:75, 120:125), ]), eval(Nasty)) :
non-numeric argument to binary operator
condformat(iris[c(1:5,70:75, 120:125),]) + parse(text=Nasty) #### FAIL
Error in +.default(condformat(iris[c(1:5, 70:75, 120:125), ]), parse(text = Nasty)) :
non-numeric argument to binary operator
condformat(iris[c(1:5,70:75, 120:125),]) + eval(parse(text=Nasty)) #### FAIL
Error in eval(rule_fill_discrete(Species)) + eval(rule_fill_discrete(Petal.Width)) :
non-numeric argument to binary operator
So how can we do it?

Thanks to this Stack Overflow question and the bug report from @amit-kohli, I was made aware that there was a bug in the condformat package.
Update: Answer updated to reflect the new condformat API introduced in condformat 0.7.
Here I show how to do it using condformat 0.7.0. Note that the syntax I use in the standard evaluation function comes from the rlang package.
Install condformat:
install.packages("condformat")
A simple example, asked in the question:
# Reproduce the example
library(condformat)
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_discrete(Species) %>%
rule_fill_discrete(Petal.Width)
# With variables:
col1 <- rlang::quo(Species)
col2 <- rlang::quo(Petal.Width)
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_discrete(!! col1) %>%
rule_fill_discrete(!! col2)
# Or even with character strings to give the column names:
col1 <- "Species"
col2 <- "Petal.Width"
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_discrete(!! col1) %>%
rule_fill_discrete(!! col2)
# Do it programmatically (In a function)
#' @importFrom magrittr %>%
some_color <- function(data, col1, col2) {
condformat::condformat(data) %>%
condformat::rule_fill_discrete(!! col1) %>%
condformat::rule_fill_discrete(!! col2)
}
some_color(iris[c(1:5,70:75, 120:125),], "Species", "Petal.Width")
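The function above fixes the number of columns at two. For a variable number of column names, one possible sketch folds the rules onto the table with base R's Reduce(), reusing the same `!!` unquoting shown above. The helper name fill_columns is made up for illustration, the condformat 0.7 or newer API is assumed, and the call is guarded so nothing runs when condformat is not installed.

```r
# Hypothetical helper (not part of condformat): apply rule_fill_discrete()
# once per column name, assuming the condformat >= 0.7 API used above.
fill_columns <- function(data, cols) {
  Reduce(
    # tbl is the table built so far, col is the next column name (a string)
    f = function(tbl, col) condformat::rule_fill_discrete(tbl, !!col),
    x = cols,
    init = condformat::condformat(data)
  )
}

# Only run the example when the package is available
if (requireNamespace("condformat", quietly = TRUE)) {
  res <- fill_columns(iris[c(1:5, 70:75, 120:125), ],
                      c("Species", "Petal.Width"))
}
```

Because the column names arrive as an ordinary character vector, its length can vary from call to call.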
A more general example, using an expression:
# General example, using an expression:
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_gradient(Species, expression = Sepal.Width - Sepal.Length)
# General example, using a column given as character and an
# expression given as character as well:
expr <- rlang::parse_expr("Sepal.Width - Sepal.Length")
condformat(iris[c(1:5,70:75, 120:125),]) %>%
rule_fill_gradient("Species", expression = !! expr)
# General example, in a function, everything given as a character:
two_column_difference <- function(data, col_to_colour, col1, col2) {
expr1 <- rlang::parse_expr(col1)
expr2 <- rlang::parse_expr(col2)
condformat::condformat(data) %>%
condformat::rule_fill_gradient(
!! col_to_colour,
expression = (!!expr1) - (!!expr2))
}
two_column_difference(iris[c(1:5,70:75, 120:125),],
col_to_colour = "Species",
col1 = "Sepal.Width",
col2 = "Sepal.Length")
Custom discretized scales for continuous values
Custom discrete color values can be specified with a function that preprocesses a continuous column into a discrete scale:
discretize <- function(column) {
sapply(column,
FUN = function(value) {
if (value < 4.7) {
return("low")
} else if (value < 5.0) {
return("mid")
} else {
return("high")
}
})
}
And we can specify the colors for each of the levels of the scale using colours =:
condformat(head(iris)) %>%
rule_fill_discrete(
"Sepal.Length",
expression = discretize(Sepal.Length),
colours = c("low" = "red", "mid" = "yellow", "high" = "green"))
If we want, the discretize function can return colours:
discretize_colours <- function(column) {
sapply(column,
FUN = function(value) {
if (value < 4.7) {
return("red")
} else if (value < 5.0) {
return("yellow")
} else {
return("green")
}
})
}
The code to use it:
condformat(head(iris)) %>%
rule_fill_discrete(
"Sepal.Length",
expression = discretize_colours(Sepal.Length),
colours = identity)
Note that as expression returns the colours we use colours = identity. identity is just function(x) x.
Finally, using some rlang tidy evaluation we can create a function:
colour_based_function <- function(data, col1) {
col <- rlang::parse_expr(col1)
condformat::condformat(data) %>%
condformat::rule_fill_discrete(
columns = !! col1,
expression = discretize_colours(!! col),
colours = identity)
}
colour_based_function(head(iris), "Sepal.Length")

NOTE: This answer provides a workaround for a bug in an old version of condformat. The bug has since been fixed; see @zeehio's answer for the current version.
I think you have two mostly separate questions that are mixed together in your post. I will attempt to restate and answer them individually, and then put things together, which doesn't work all the way at this point but gets close.
First, let's save some typing by defining a couple variables:
ir = iris[c(1:5,70:75, 120:125), ]
cf = condformat(ir)
Q1: How do I use + on a vector or list of inputs?
This is the easy question. The base answer is Reduce. The following are all equivalent:
10 + 1 + 2 + 5
"+"("+"("+"(10, 1), 2), 5)
Reduce("+", c(1, 2, 5), init = 10)
More pertinent to your case, we can do this to replicate your desired output:
fills = list(rule_fill_discrete(Species), rule_fill_discrete(Petal.Width))
res = Reduce(f = "+", x = fills, init = cf)
res
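Why the fold works while adding two rules to each other fails can be illustrated without condformat. The sketch below uses made-up classes toy_tbl and toy_rule (assumptions standing in for the package's objects, not its real internals): only the table class has a `+` method, so rule + rule falls through to default arithmetic and raises the same error as in the question.

```r
# Toy constructors; both classes are hypothetical stand-ins
new_tbl  <- function(data) structure(list(data = data, rules = list()), class = "toy_tbl")
new_rule <- function(column) structure(list(column = column), class = "toy_rule")

# A "+" method exists only for the table class: it appends the rule
"+.toy_tbl" <- function(e1, e2) {
  e1$rules <- c(e1$rules, list(e2))
  e1
}

# rule + rule has no method to dispatch to, so default arithmetic errors
rule_error <- tryCatch(
  new_rule("Species") + new_rule("Petal.Width"),
  error = function(e) conditionMessage(e)
)

# Folding from the table keeps a toy_tbl on the left-hand side at every step
# (the same fold as Reduce("+", ...), written out as a closure)
cols    <- c("Species", "Petal.Width")
rules   <- lapply(cols, new_rule)
tbl     <- Reduce(function(acc, rule) acc + rule, rules, new_tbl(iris))
applied <- vapply(tbl$rules, function(r) r$column, character(1))
```

Here rule_error holds the error message and applied lists the columns in the order the rules were attached.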
Q2: How do I use string inputs with rule_fill_discrete?
This is my first time using condformat, but it looks to be written in the lazyeval paradigm with rule_fill_discrete_ as a standard-evaluating counterpart to the non-standard-evaluating rule_fill_discrete. This example is even given in ?rule_fill_discrete, but it doesn't work as expected:
cf + rule_fill_discrete_(columns = "Species")
# bad: Species column colored entirely red, not colored by species
# possibly a bug? At the very least misleading documentation...
cf + rule_fill_discrete_(columns = "Species", expression = expression(Species))
# bad: works as expected, but still uses an unquoted Species
# other failed attempts
cf + rule_fill_discrete_(columns = "Species", expression = expression("Species"))
cf + rule_fill_discrete_(columns = "Species", expression = "Species")
# bad: the column is still filled with a single color
There is also an env environment argument in the SE function, but I had no luck with that either. Maybe someone with more lazyeval/expression experience can point out something I'm overlooking or doing wrong.
Work-around: What we can do is pass the column directly. This works because we're not doing any fancy functions of the column, just using its values directly to determine the coloring:
cf + rule_fill_discrete_(columns = c("Species"), expression = ir[["Species"]])
# hacky, but it works
Putting it together
Using the NSE version with Reduce is easy:
fills = list(rule_fill_discrete(Species), rule_fill_discrete(Petal.Width))
res = Reduce(f = "+", x = fills, init = cf)
res
# works!
Using SE with input strings, we can use the hacky workaround.
input = c("Species", "Petal.Width")
fills_ = lapply(input, function(x) rule_fill_discrete_(x, expression = ir[[x]]))
res_ = Reduce(f = "+", x = fills_, init = cf)
res_
# works!
And this, of course, you could wrap up into a custom function that takes a data frame and a string vector of column names as input.

@Gregor's answer was perfect. A bit hacky, but works excellently.
In my use-case I needed something a bit more complicated; I will post it here in case it's useful to somebody else.
I needed to be able to color multiple columns based on the values of one column. condformat allows us to do this already, but again we run into the parametrization problem. Here's my solution to that, based on the response by Gregor:
library(purrr)  # for map2()
CondFormatForInput <- function(Table, VectorToColor, VectorFromColor) {
  cf <- condformat(Table)
  # One rule per pair: color column x by the values found in column y.
  # Index Table (not a hardcoded iris) so any data frame can be passed in.
  fills2_ <- map2(VectorToColor, VectorFromColor,
                  function(x, y) rule_fill_discrete_(x, expression = Table[[y]]))
  Reduce(f = "+", x = fills2_, init = cf)
}
CondFormatForInput(iris,
c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
c("Sepal.Width","Sepal.Width","Petal.Width","Petal.Width"))

Related

Using map (purrr) to iterate over two vectors in R

I am trying to plot maps for three different dependent variables (each plotted separately), over multiple metro areas. To do this, I am trying to do 2 nested maps. I start by doing it just over the 3 different variables for one metro area: I define the function, run it through map, and it works. The code for this part is below:
plot_fun <- function(x){
base_plot_2015$`14460` +
data_2015 %>% subset(met2013 == 14460) %>% subset(onetsoccode == "151021") %>%
geom_sf(mapping = aes_string(fill = x)) +
scale_fill_viridis_c(option = "plasma")
}
expl <- names(data_2015)[17:19]
expl <- set_names(expl)
plot_Boston <- map(expl, ~plot_fun(.x))
This gives me a list of 3 maps and so far so good. (Please note that the base_plot is a list containing maps of each of the metro areas saved as the metro area id, which is stored as double).
Next, I want to do the same, but also iterate it over different metro codes. I try doing this by defining the following function:
plot_fun <- function(x,y){
base_plot_2015$`y` +
data_2015 %>% subset(met2013 == y) %>% subset(onetsoccode == "151021") %>%
geom_sf(mapping = aes_string(fill = x)) +
scale_fill_viridis_c(option = "plasma")
}
The idea is to map it over a vector of metro areas (y) for each dependent variable to be plotted(x). However, before I do that, I can't get this function to work. When I run this function using plot_fun("work_home", 14460), I get the following error:
Error in base_plot_2015$y + data_2015 %>% subset(met2013 == y) %>% subset(onetsoccode == :
non-numeric argument to binary operator
Would someone be kind enough to point out what I am doing wrong here?
Also, to do the nested map, does the code map(met, ~map(expl, ~plot_fun, y=.x)) (where met is a vector of metro area codes) make sense?
TLDR: When you are subsetting your base_plot list, you need to use the [[ operator instead of $.
In your second version, $ treats y as the literal name "y", not as the variable y. Since, presumably, there is no metro code "y", NULL is returned. The binary operator is + and NULL is non-numeric. Here's a simplified example of what's happening:
base_data <- list(
a = letters,
b = LETTERS,
c = rnorm(100)
)
getMyData <- function(x){
base_data$`x`
}
getMyData("a")
#> NULL
base_data$x <- "QSBjbHVlIHBlcmhhcHM="
val <- getMyData("a") %>% print()
#> [1] "QSBjbHVlIHBlcmhhcHM="
base64enc::base64decode(val) %>% rawToChar()
#> [1] "A clue perhaps"
If you change your function to use the [[ operator, you can use the string value in the variable y to subset:
actuallyGetMyData <- function(x){
base_data[[x]]
}
actuallyGetMyData("a")[1:10]
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
EDIT: Regarding the mapping over the permutations (because it's hard to read code in comments)
the call map(expl, ~plot_fun) creates an anonymous function that returns the function plot_fun. You either need to drop the anonymous function
map(met, ~map(expl, plot_fun, y=.x))
or call the function
map(met, ~map(expl, ~plot_fun(.x, .y), .y=.x))
I find the nested maps a little hard to read (.x becoming .y and what not) so alternatively you could just first enumerate all the combinations and then map over those
plotVars <- expand.grid(expl, met, stringsAsFactors = FALSE)
plot_final <- map2(plotVars[[1]], plotVars[[2]], plot_fun) %>%
set_names(paste(plotVars[[1]], plotVars[[2]], sep = "_"))
Note that your plot_final list will no longer be nested though.
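The enumerate-then-map idea can be tried without the plotting data. In this sketch the variable names, the second metro code and the stand-in describe() function are all placeholders for the real expl, met and plot_fun; base R's Map() plays the role of map2().

```r
# Placeholder inputs (assumptions, not the asker's real data)
expl <- c("work_home", "commute_time")
met  <- c(14460, 35620)

# Every (variable, metro) pair; the first column varies fastest
combos <- expand.grid(var = expl, metro = met, stringsAsFactors = FALSE)

# Stand-in for plot_fun(x, y): just reports which pair it received
describe <- function(x, y) paste(x, y, sep = " @ ")

# Walk both columns in parallel, one call per row of combos
out <- Map(describe, combos$var, combos$metro)
names(out) <- paste(combos$var, combos$metro, sep = "_")
```

The result is a flat list with one element per combination, named after the pair that produced it.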

Error: cannot join on columns: index out of bounds [duplicate]

I am trying to perform an inner join of two tables using dplyr, and I think I'm getting tripped up by non-standard evaluation rules. When using the by=c("a" = "b") argument, everything works as expected when "a" and "b" are actual strings. Here's a toy example that works:
library(dplyr)
data(iris)
inner_join(iris, iris, by=c("Sepal.Length" = "Sepal.Width"))
But let's say I was putting inner_join in a function:
library(dplyr)
data(iris)
myfn <- function(xname, yname) {
data(iris)
inner_join(iris, iris, by=c(xname = yname))
}
myfn("Sepal.Length", "Sepal.Width")
This returns the following error:
Error: cannot join on columns 'xname' x 'Sepal.Width': index out of bounds
I suspect there is some fancy expression, deparsing, quoting, or unquoting that I could do to make this work, but I'm a bit murky on those details.
You can use
myfn <- function(xname, yname) {
data(iris)
inner_join(iris, iris, by=setNames(yname, xname))
}
The suggested syntax in the ?inner_join documentation of
by = c("a"="b") # same as by = c(a="b")
is slightly misleading because the two sides aren't both plain character values. You're actually creating a named character vector: the left of the equals sign is a name, which c() takes literally, and the right is a value, which is evaluated. You can use setNames() to set the names of the vector dynamically.
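The difference is visible in base R alone. A small sketch, with variable names chosen only for illustration:

```r
xname <- "Sepal.Length"
yname <- "Sepal.Width"

# c() takes the left-hand side as a literal name, so the name is "xname"
literal <- c(xname = yname)

# setNames() evaluates both arguments, so the name is the value of xname
dynamic <- setNames(yname, xname)

literal_name <- names(literal)
dynamic_name <- names(dynamic)
```

Only the second vector names a column that exists in iris, which is why the join on c(xname = yname) fails.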
I like MrFlick's answer and fber's addendum, but I prefer structure. For me setNames feels as something at the end of a pipe, not as an on-the-fly constructor. On another note, both setNames and structure enable the use of variables in the function call.
myfn <- function(xnames, ynames) {
data(iris)
inner_join(iris, iris, by = structure(names = xnames, .Data = ynames))
}
x <- "Sepal.Length"
myfn(x, "Sepal.Width")
A named vector argument would run into problems here:
myfn <- function(byvars) {
data(iris)
inner_join(iris, iris, by = byvars)
}
x <- "Sepal.Length"
myfn(c(x = "Sepal.Width"))
You could solve that, though, by using setNames or structure in the function call.
I know I'm late to the party, but how about:
myfn <- function(byvar) {
data(iris)
inner_join(iris, iris, by=byvar)
}
This way you can do what you want with:
myfn(c("Sepal.Length"="Sepal.Width"))
I faced a nearly identical challenge as @Peter, but needed to pass multiple different sets of by = join parameters at one time. I chose to use the map() function from the tidyverse package, purrr.
This is the subset of the tidyverse that I used.
library(magrittr)
library(dplyr)
library(rlang)
library(purrr)
First, I adapted myfn to use map() for the case posted by Peter. 42's comment and Felipe Gerard's answer made it clear that the by argument can take a named vector. map() requires a list over which to iterate.
myfn_2 <- function(xname, yname) {
by_names <- list(setNames(nm = xname, yname ))
data(iris)
# map() returns a single-element list. We index to retrieve dataframe.
map( .x = by_names,
.f = ~inner_join(x = iris,
y = iris,
by = .x)) %>%
`[[`(1)
}
myfn_2("Sepal.Length", "Sepal.Width")
I found that I didn't need quo_name() / !! in building the function.
Then, I adapted the function to take a list of by parameters. For each by_i in by_grps, we could extend x and y to add named values on which to join.
by_grps <- list( by_1 = list(x = c("Sepal.Length"), y = c("Sepal.Width")),
by_2 = list(x = c("Sepal.Width"), y = c("Petal.Width"))
)
myfn_3 <- function(by_grps_list, nm_dataset) {
by_named_vectors_list <- lapply(by_grps_list,
function(by_grp) setNames(object = by_grp$y,
nm = by_grp$x))
map(.x = by_named_vectors_list,
.f = ~inner_join(nm_dataset, nm_dataset, by = .x))
}
myfn_3(by_grps, iris)

Passing data and column names to ggplot via another function

I'll skip right to an example and comment afterwards:
cont <- data.frame(value = c(1:20),variable = c(1:20,(1:20)^1.5,(1:20)^2),group=rep(c(1,2,3),each=20))
value variable group
1 1 1.000000 1
2 2 2.000000 1
3 3 3.000000 1
#... etc.
#ser is shorthand for "series".
plot_scat <- function(data,x,y,ser) {
ggplot(data,aes(x=x,y=y,color=factor(ser)))+geom_point()
}
plot_scat(cont,value,variable,group)
#This gives the error:
#Error in eval(expr,envir,enclose) : object 'x' not found
Now, I know that ggplot2 has a known bug where aes() will only look in the global environment and not in the local environment. Following advice from: Use of ggplot() within another function in R, I tried another route.
plot_scat <- function(data,x,y,ser) {
#environment=environment() added
ggplot(data,aes(x=x,y=y,color=factor(ser)),environment=environment())+geom_point()
}
plot_scat(cont,value,variable,group)
#This gives the error:
#Error in eval(expr,envir,enclos) : object 'value' not found
#In addition: Warning message:
#In eval(expr,envir,enclos) : restarting interrupted promise evaluation
I don't know what that last line means. If I call:
ggplot(cont,aes(x=value,y=variable,color=group))+geom_point()
I get the graph you would expect. At the command line, aes() is looking for the variable names in ggplot(), but it is not doing this within the function call. So I tried to change my call:
plot_scat(cont,cont$value,cont$variable,cont$group)
This gets me what I want. So I add the next layer of complexity:
plot_scat <- function(data,x,y,ser) {
#added facet_grid
ggplot(data,aes(x=x,y=y,color=factor(ser)),environment=environment())+geom_point()+
facet_grid(.~ser)
}
plot_scat(cont,cont$value,cont$variable,cont$group)
#This gives the error:
#Error in layout_base(data, cols, drop = drop):
# At least one layer must contain all variables used for facetting
My thought on this is that ser is actually cont$group, which is fine for use in aes(), but when passed to facet_grid is now a one column data frame with no information about value and variable. According to the help page, facet_grid does not take a "data=" argument so I can't use facet_grid(data=data,.~ser) to get around this. I don't know how to proceed from here.
This is an extremely simple example, but the long term goal is to have a function I can give to non-R-literate people in my office and say "give it your data frame name, column names and the column you want to split on and it will make pretty plots for you". It will also get a lot more complex, with a very customized theme, which is irrelevant to the problems I'm having.
If you do not want to pass your variables (column names) as strings/quoted, then one approach that I tried (see also here) was to make use of match.call() and eval. It works with faceting (as in your case) as well:
library(ggplot2)
cont <- data.frame( value = c(1:20),
variable = c(1:20, (1:20) ^ 1.5, (1:20) ^ 2),
group = rep(c(1, 2, 3), each = 20))
plot_scat <- function(data, x, y, ser) {
arg <- match.call()
ggplot(data, aes(x = eval(arg$x),
y = eval(arg$y),
color = factor(eval(arg$ser)))) +
geom_point() +
facet_grid(. ~ eval(arg$ser))
}
# Call your custom function without quoting the variables
plot_scat(data = cont, x = value, y = variable, ser = group)
To get an idea what match.call() does, maybe try to run this:
plot_scat <- function(data, x, y, ser) {
str(as.list(match.call()))
}
plot_scat(cont, value, variable, group)
#> List of 5
#> $ : symbol plot_scat
#> $ data: symbol cont
#> $ x : symbol value
#> $ y : symbol variable
#> $ ser : symbol group
Created on 2019-01-10 by the reprex package (v0.2.1)
Or, another workaround, but this time with passing quoted column names to the custom plotting function is using get():
plot_scat <- function(data, x, y, ser) {
ggplot(data, aes(x = get(x),
y = get(y),
color = factor(get(ser)))) +
geom_point() +
facet_grid(. ~ get(ser))
}
plot_scat(data = cont, x = "value", y = "variable", ser = "group")
You could use aes_string() in place of aes() and pass the column names as strings.
plot_scat <- function(data,x,y,ser) {
ser_col = paste("factor(",ser,")")
ggplot(data,aes_string(x=x,y=y,col=ser_col))+geom_point()+facet_grid(as.formula(sprintf('~%s',ser)))
}
plot_scat(cont,"value","variable","group")
facet_grid requires a formula so you can use as.formula to parse the string to a formula.
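aes_string() was soft-deprecated in later ggplot2 releases (3.0.0 onwards) in favour of tidy evaluation. A sketch of the same string-based function using the .data pronoun and vars(), assuming ggplot2 3.0.0 or newer; the call is guarded so nothing runs when ggplot2 is not installed.

```r
cont <- data.frame(value = 1:20,
                   variable = c(1:20, (1:20)^1.5, (1:20)^2),
                   group = rep(c(1, 2, 3), each = 20))

# x, y and ser are column names given as strings
plot_scat <- function(data, x, y, ser) {
  ggplot2::ggplot(data, ggplot2::aes(x = .data[[x]],
                                     y = .data[[y]],
                                     colour = factor(.data[[ser]]))) +
    ggplot2::geom_point() +
    # vars() accepts the same .data[[...]] lookup, so no formula is needed
    ggplot2::facet_grid(cols = ggplot2::vars(.data[[ser]]))
}

# Only build the plot when the package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- plot_scat(cont, "value", "variable", "group")
}
```

This avoids pasting strings into expressions, so column names with spaces or special characters need no extra quoting.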

Functions inside aes

Question: why can't I call sapply inside aes()?
Goal of following figure: Create histogram showing proportion that died/lived so that the proportion for each combination of group/type sums to 1 (example inspired by previous post).
I know you could make the figure by summarising outside of ggplot but the question is really about why the function isn't working inside of aes.
## Data
set.seed(999)
dat <- data.frame(group=factor(rep(1:2, 25)),
type=factor(sample(1:2, 50, rep=T)),
died=factor(sample(0:1, 50, rep=T)))
## Setup the figure
p <- ggplot(dat, aes(x=died, group=interaction(group, type), fill=group, alpha=type)) +
theme_bw() +
scale_alpha_discrete(range=c(0.5, 1)) +
ylab("Proportion")
## Proportions, all groups/types together sum to 1 (not wanted)
p + geom_histogram(aes(y=..count../sum(..count..)), position=position_dodge())
## Look at groups
stuff <- ggplot_build(p)
stuff$data[[1]]
## The long way works: proportions by group/type
p + geom_histogram(
aes(y=c(..count..[..group..==1] / sum(..count..[..group..==1]),
..count..[..group..==2] / sum(..count..[..group..==2]),
..count..[..group..==3] / sum(..count..[..group..==3]),
..count..[..group..==4] / sum(..count..[..group..==4]))),
position='dodge'
)
## Why can't I call sapply there?
p + geom_histogram(
aes(y=sapply(unique(..group..), function(g)
..count..[..group..==g] / sum(..count..[..group..==g]))),
position='dodge'
)
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'expr' of mode 'function' was not found
So, the issue arises because of a recursive call to ggplot2:::strip_dots for any aesthetics that include 'calculated aesthetics'. There is some discussion around the calculated aesthetics in this SO question and answer. The relevant code in layer.r is here:
new <- strip_dots(aesthetics[is_calculated_aes(aesthetics)])
i.e. strip_dots is called only if there are calculated aesthetics, defined using the regex "\\.\\.([a-zA-z._]+)\\.\\.".
strip_dots takes a recursive approach, working down through the nested calls and stripping out the dots. The code is like this:
function (expr)
{
if (is.atomic(expr)) {
expr
}
else if (is.name(expr)) {
as.name(gsub(match_calculated_aes, "\\1", as.character(expr)))
}
else if (is.call(expr)) {
expr[-1] <- lapply(expr[-1], strip_dots)
expr
}
else if (is.pairlist(expr)) {
as.pairlist(lapply(expr, expr))
}
else if (is.list(expr)) {
lapply(expr, strip_dots)
}
else {
stop("Unknown input:", class(expr)[1])
}
}
If we supply an anonymous function to this code as follows:
anon <- as.call(quote(function(g) mean(g)))
ggplot2:::strip_dots(anon)
we reproduce the error:
#Error in get(as.character(FUN), mode = "function", envir = envir) :
# object 'expr' of mode 'function' was not found
Working through this, we can see that anon is a call. For calls, strip_dots will use lapply to call strip_dots on the second and third elements of the call. For an anonymous function like this, the second element is the formals of the function. If we look at the formals of anon using dput(formals(eval(anon))) or dput(anon[[2]]) we see this:
#pairlist(g = )
For pairlists, strip_dots tries to lapply it to itself. I'm not sure why this code is there, but certainly in this circumstance it leads to the error:
expr <- anon[[2]]
lapply(expr, expr)
# Error in get(as.character(FUN), mode = "function", envir = envir) :
# object 'expr' of mode 'function' was not found
TL; DR At this stage, ggplot2 doesn't support the use of anonymous functions within aes where a calculated aesthetic (such as ..count..) is used.
Anyway, the desired end result can be achieved using dplyr; in general I think it makes for more readable code to separate out the data summarisation from the plotting:
newDat <- dat %>%
group_by(died, type, group) %>%
summarise(count = n()) %>%
group_by(type, group) %>%
mutate(Proportion = count / sum(count))
p <- ggplot(newDat, aes(x = died, y = Proportion, group = interaction(group, type), fill=group, alpha=type)) +
theme_bw() +
scale_alpha_discrete(range=c(0.5, 1)) +
geom_bar(stat = "identity", position = "dodge")
ggplot2 fix
I've forked ggplot2 and have made two changes to aes_calculated.r which fix the problem. The first was to correct the handling of pairlists to lapply strip_dots instead of expr, which I think must have been the intended behaviour. The second was that for formals with no default value (like in the examples provided here), as.character(as.name(expr)) throws an error because expr is an empty name, and while this is a valid construct, it's not possible to create one from an empty string.
Forked version of ggplot2 at https://github.com/NikNakk/ggplot2 and pull request just made.
Finally, after all that, the sapply example given doesn't work because it returns a 2 row by 4 column matrix rather than an 8 length vector. The corrected version is like this:
p + geom_histogram(
aes(y=unlist(lapply(unique(..group..), function(g)
..count..[..group..==g] / sum(..count..[..group..==g])))),
position='dodge'
)
This gives the same output as the dplyr solution above.
One other thing to note is that this lapply code assumes that the data at that stage is sorted by group. I think this is always the case, but if for whatever reason it weren't you would end up with the y data out of order. An alternative which preserves the order of the rows in the calculated data would be:
p + geom_histogram(
aes(y={grp_total <- tapply(..count.., ..group.., sum);
..count.. / grp_total[as.character(..group..)]
}),
position='dodge'
)
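The shape problem and both fixes can be checked in base R, away from ggplot2. In this sketch count and group are made-up stand-ins for ..count.. and ..group.., with the groups already sorted:

```r
count <- c(3, 7, 4, 6, 5, 5, 2, 8)   # made-up bin counts
group <- rep(1:4, each = 2)           # made-up group ids, sorted

# Proportions within a single group
prop_one <- function(g) count[group == g] / sum(count[group == g])

# sapply() simplifies equal-length results into a 2 x 4 matrix
as_matrix <- sapply(unique(group), prop_one)

# unlist(lapply()) keeps one value per row, in group order
as_vector <- unlist(lapply(unique(group), prop_one))

# tapply() gives per-group totals; indexing by group id keeps row order
grp_total <- tapply(count, group, sum)
by_row <- as.numeric(count / grp_total[as.character(group)])
```

With sorted groups as_vector and by_row agree; only by_row stays correct if the rows are not sorted by group.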
It's also worth being aware that these expressions are evaluated in baseenv(), the namespace of the base package. This means that any functions from other packages, even standard ones like stats and utils, need to be used with the :: operator (e.g. stats::rnorm).
After playing around a little, the problem appears to be using anonymous functions with ..group.. or ..count.. inside aes:
xy <- data.frame(x=1:10,y=1:10) #data
ggplot(xy, aes(x = x, y = sapply(y, mean))) + geom_line() #sapply is fine
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(..group.., mean))) #sapply with ..group.. is fine
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(..group.., function(g) {mean(g)})))
#broken, with same error
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(y, function(g) {mean(g)})), stat = "identity")
#sapply with anonymous functions works fine!
It seems like a really weird bug, unless I'm missing something stupid.

character string as function argument r

I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to create a function with this code. It should take a name of a column of the data frame that is manipulated by dplyr. However, trying to work with columnnames does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
df %>%
filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function which accepts the string "B" as an argument (as in your question's title)
foo_string <- function (x) {
eval(substitute(df %>% filter(xx < 1),list(xx=as.name(x))))
}
foo_string("B")
If you want to create a function which captures B as an argument (as in dplyr)
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
eval(substitute(df %>% filter(xx < 1),list(xx=x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3. Functions with suffixes "_" accept a string or an expression as an argument
foo_string <- function (x) {
# construct the string
string <- paste(x,"< 1")
# use filter_ instead of filter
df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
# construct the expression
expression <- lazyeval::interp(quote(xx < 1), xx = x)
# use filter_ instead of filter
df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
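The underscore verbs such as filter_() were deprecated in later dplyr releases. A sketch of both functions with the tidy evaluation tools that replaced them, assuming a recent dplyr (1.0 or newer is the assumption here); unlike the versions above, the data frame is passed as an argument, and the calls are guarded so nothing runs when dplyr is not installed.

```r
df <- data.frame(A = seq(-5, 5, 1), B = seq(0, 10, 1))

# Column name given as a string: look it up through the .data pronoun
foo_string <- function(data, x) {
  dplyr::filter(data, .data[[x]] < 1)
}

# Column name given bare: forward it with the embrace operator {{ }}
foo_nse <- function(data, x) {
  dplyr::filter(data, {{ x }} < 1)
}

# Only run the examples when the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_string <- foo_string(df, "B")  # rows where B < 1
  by_name   <- foo_nse(df, A)       # rows where A < 1
}
```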
I remember a similar question which was answered by @Richard Scriven. I think you need to write something like this.
foo <- function(x,...)filter(x,...)
What @Richard Scriven mentioned was that you need to use ... here. If you type ?dplyr, you will be able to find this: filter(.data, ...). I think you replace .data with x or whatever. If you want to pick up rows which have values smaller than 1 in B in your df, it will be like this.
foo <- function (x,...) filter(x,...)
foo(df, B < 1)

Resources