How to pass literal text to a function in R

I'm new to R (and programming in general), and having some trouble wrapping my head around functions. I am trying to write a function for plotting a histogram of a given variable with a normal curve overlaid. Here, I have code that does this given a specific variable in data:
dev.new()
ggplot(data,aes(x = variable)) +
geom_histogram(aes(y=..density..),binwidth=.2)+
geom_density(na.rm=TRUE)+
stat_function(fun=dnorm, args=list(mean=mean(data$variable,na.rm=TRUE),
sd=sd(data$variable, na.rm=TRUE)), linetype=4, colour="red")
This code works just fine if I put in the specific data and variable, but if I try to pass the same things through a function, it no longer does. For instance:
plotnormal<-function(data,variable){
dev.new()
ggplot(data,aes(x = variable)) +
geom_histogram(aes(y=..density..),binwidth=.2)+
geom_density(na.rm=TRUE)+
stat_function(fun=dnorm, args=list(mean=mean(data$variable,na.rm=TRUE),
sd=sd(data$variable, na.rm=TRUE)), linetype=4, colour="red")}
plotnormal(data, variable)
What gives? Is there any way to just pass the exact text that I enter into the function?

When passing a string variable name into a function where you plan on using that name in a subset of a data frame or other structure, use
data[[variable]]
instead of
data$variable
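A quick way to see the difference, as a minimal sketch on a made-up data frame (the names df, height and col_name are hypothetical, not from the question): $ takes the text after it literally as a column name, while [[ evaluates its argument first.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(height = c(150, 160, 170))
col_name <- "height"   # the column name stored as a string

# `$` looks for a column literally called "col_name"; there is none, so this is NULL
via_dollar <- df$col_name

# `[[` evaluates col_name first, then looks up the column "height"
via_brackets <- df[[col_name]]

is.null(via_dollar)
mean(via_brackets)
```

This is the same situation as data$variable inside plotnormal(): R searches for a column named "variable" rather than the column whose name was passed in.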
For more, read help(Extract). You'll also want to use aes_string() instead of aes(). Your updated function should then be
plotnormal <- function(data, variable) {
dev.new()
ggplot(data, aes_string(x = variable)) +
geom_histogram(aes(y = ..density..), binwidth = .2) +
geom_density(na.rm = TRUE) +
stat_function(fun = dnorm, args = list(mean = mean(data[[variable]], na.rm = TRUE),
sd = sd(data[[variable]], na.rm = TRUE)), linetype = 4, colour = "red")
}
Hitting the space bar every once in a while helps too.
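In more recent ggplot2 releases aes_string() is deprecated, and the ..density.. notation has been superseded by after_stat(density). Assuming ggplot2 3.4 or later, the same function might be sketched with the .data pronoun as follows. The built-in mtcars data and its "mpg" column are stand-ins for the asker's data, and dev.new() is left out so that the function simply returns the plot.

```r
library(ggplot2)

plotnormal <- function(data, variable) {
  # variable is a string such as "mpg"; [[ ]] looks the column up by that name
  values <- data[[variable]]
  ggplot(data, aes(x = .data[[variable]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.2) +
    geom_density(na.rm = TRUE) +
    stat_function(
      fun = dnorm,
      args = list(mean = mean(values, na.rm = TRUE),
                  sd = sd(values, na.rm = TRUE)),
      linetype = 4, colour = "red"
    ) +
    labs(x = variable)   # label the axis with the column name
}

# Stand-in data: the built-in mtcars data frame
p <- plotnormal(mtcars, "mpg")
```

Printing p (or calling print(p) inside a script) draws the histogram, the density estimate and the fitted normal curve.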

Related

Is there a way to pass the data of a ggplot2 call to the scale_* functions that works with .+gg in one pass [duplicate]

I would like to use a variable of the data frame passed to the data parameter of the ggplot function in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x in the dataframe passed to the data parameter in ggplot in another function scale_x_continuous such as in:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size= 1000, replace = T)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error:
Error in seq(min(x)) : object 'x' not found
which I understand. Of course, I can avoid the problem by doing:
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot
The scale_x_continuous function does not evaluate its parameters in the data environment. One reason for this is that each layer can have its own data source, so by the time you get to the scales it would no longer be clear which data environment is the "correct" one.
You could write a helper function to initialize the plot with your default. For example
helper <- function(df, col) {
ggplot(data = df, mapping = aes_string(x = col)) +
scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
Or you could write a wrapper around just the scale part. For example
scale_x_custom <- function(x) {
scale_x_continuous(breaks = seq(min(x) , max(x)))
}
and then you can add your custom scale to your plot
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom(df$x)
Or, since you just want breaks at integer values, you can calculate the breaks from the default limits without needing to actually specify the data. For example
scale_x_custom <- function() {
scale_x_continuous(expand=expansion(0, .3),
breaks = function(x) {
seq(ceiling(min(x)), floor(max(x)))
})
}
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom()
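To see what the breaks function above receives, one can call it by hand: ggplot2 passes it a numeric vector holding the limits of the scale. The helper name integer_breaks and the limits below are made up for illustration.

```r
# Same body as the breaks function above, pulled out so it can be called directly
integer_breaks <- function(limits) {
  seq(ceiling(min(limits)), floor(max(limits)))
}

# Hypothetical limits, roughly what a bar chart of the values 1 to 20 might produce
breaks <- integer_breaks(c(0.4, 20.6))
breaks
```

Because the function only sees the limits, it works without any reference to df or its columns.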
Another, less than ideal, alternative would be to use the . placeholder of the magrittr pipe (%>%) in combination with {}.
Enclosing the ggplot call in curly brackets allows one to reference . multiple times.
data.frame(x = samp) %>%
{ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}

ggplot: plotting more than one function on one plot

I have a task to plot a histogram of my data, a column named NoPodsWeight, together with its density and the normal distribution over the same range (min(NoPodsWeight) to max(NoPodsWeight)).
I am trying this:
myframe <- read.csv(filepath, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
myframe <- myframe[rowSums(is.na(myframe)) <= 0,]
nopodsweight <- myframe$NoPodsWeight
height <- myframe$Height
ggplot(myframe, aes(x = NoPodsWeight, y = ..density..)) +
geom_histogram(color="black", fill="white") +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(myframe$NoPodsWeight), sd = sd(myframe$NoPodsWeight)))
Using this code I get an error:
Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..density...
Did you map your stat in the wrong layer?
I don't understand how to plot two or more functions on one plot. For example I can solve my problem using standard plot (but without density):
hist(x = nopodsweight, freq = F, ylim = c(0, 0.45), breaks = 37)
n_norm<-seq(min(nopodsweight)-1, max(nopodsweight)+1, 0.0001)
lines(n_norm, dnorm(n_norm), col = "red")
Is there any function in ggplot to plot (normal) distribution (or maybe using another function) like in lines?
You need to take ..density.. out of the ggplot() layer and put it specifically in the geom_histogram layer. I didn't download and import your data, but here's an example on mtcars:
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density..)) +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)))
The error message says "did you map your stat in the wrong layer?"; that's a hint. Moving aes(y=..density..) to apply specifically to geom_histogram() seems to make everything OK ...
ggplot(myframe, aes(x = NoPodsWeight)) +
geom_histogram(color="black", fill="white",
aes(y = ..density..)) +
## [... everything else ...]

How to define a function that invokes ggplot2 functions and takes a variable name among its arguments?

As an example, suppose that I have this snippet of code:
binwidth <- 0.01
my.histogram <- ggplot(my.data, aes(x = foo, fill = type)) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = "foo", y = "density")
Further, suppose that my.data has many other columns besides foo that could be plotted using pretty much the same code. Therefore, I would like to define a helper function make.histogram, so that I could replace the assignment above with something like:
my.histogram <- make.histogram(foo, binwidth = 0.01)
Actually, this looks a bit weird to me. Would R complain that foo is not defined? Maybe the call would have to be this instead:
my.histogram <- make.histogram("foo", binwidth = 0.01)
Be that as it may, how would one define make.histogram?
For the purpose of this question, make.histogram may treat my.data as a global variable.
Also, note that in the snippet above, foo appears twice: once (as a variable) as the x argument in the first aes call, and once (as a string) as the x argument in the labs call. In other words, the make.histogram function somehow needs to translate the column specified in its first argument into both a variable name and a string.
I'm not sure I understand your question.
Why couldn't you use aes_string() and define a function like the one below?
make.histogram <- function(variable) {
p <- ggplot(my.data, aes_string(x = variable, fill = "type")) + (...) + xlab(variable)
print(p)
}
Since ggplot is part of the tidyverse, I think tidyeval will come in handy:
make.histogram <- function(var = "foo", binwidth = 0.01) {
varName <- as.name(var)
enquo_varName <- enquo(varName)
ggplot(my.data, aes(x = !!enquo_varName, fill = type)) +
...
labs(x = var)
}
Basically, with as.name() we generate a name object that matches var (here var is a string like "foo"). Then, following Programming with dplyr, we use enquo() to look at that name and return the associated value as a quosure. This quosure can then be unquoted inside the ggplot() call using !!.
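For an unquoted column name, newer tidy evaluation offers the embrace operator {{ }}, which, assuming ggplot2 3.3 or later, can be used inside aes() directly. The following is only a sketch: the name make_histogram and its data and fill arguments are hypothetical, it takes the data frame as an argument instead of relying on the global my.data, and mtcars with its mpg and cyl columns stands in for the asker's data.

```r
library(ggplot2)

make_histogram <- function(data, column, fill, binwidth = 0.01) {
  # Capture the unquoted column name as a string for the axis label
  x_label <- deparse(substitute(column))
  # {{ }} forwards the caller's unquoted expressions into aes()
  ggplot(data, aes(x = {{ column }}, fill = {{ fill }})) +
    geom_histogram(aes(y = after_stat(density)),
                   binwidth = binwidth,
                   position = "identity",
                   alpha = 0.5) +
    labs(x = x_label, y = "density")
}

# Stand-in data: miles per gallon, filled by number of cylinders
p <- make_histogram(mtcars, mpg, factor(cyl), binwidth = 1)
```

If the column name arrives as a string instead, aes(x = .data[[var]]) plays the same role without any quoting step.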
After reading the material that @andrew.punnett linked in his comment, it was very easy to code the desired function:
make.histogram <- function(column.name, binwidth = 0.02) {
base.aes <- eval(substitute(aes(x = column.name, fill = type)))
x.label <- deparse(substitute(column.name))
ggplot(my.data, base.aes) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = x.label, y = "density")
}
my.histogram <- make.histogram(foo, binwidth = 0.01)
The benefit of this solution is its generality: it relies only on base R functions (substitute, eval, and deparse), so it can be easily ported to situations outside of the ggplot2 context.

Computation failed for stat_summary, 'what' must be a character string or a function

I am trying to follow the script/example on plotting the raw ChickWeight data in "Independent Group T intervals test", but I keep running into the following error from the stat_summary function.
Code to reproduce here:
library(datasets)
data(ChickWeight)
library(ggplot2)
g <- ggplot(ChickWeight, aes(x = Time, y = weight,
colour = Diet, group = Chick))
g <- g + geom_line()
g <- g + stat_summary(aes(group = 1), geom = "line", fun.y = mean, size = 1, color = "black")
g <- g + facet_grid(. ~ Diet)
Error message:
"Computation failed in stat_summary():
'what' must be a character string or a function"
The error message is not very intuitive: I do not even see "what" as a parameter in the documentation of stat_summary. I did some research and checked others' answers, but so far I have found no concrete answer or solution to this problem.
The reason is that you have a variable called mean in your workspace. So when you call stat_summary …
stat_summary(aes(group = 1), geom = "line", fun.y = mean, size = 1, color = "black")
… R thinks that you’re referring to that variable, rather than the mean function from the {base} package.
R is usually able to disambiguate between functions and other variables, even if they have the same name. However, in this case the disambiguation isn’t working because you’re not calling mean directly but passing it as an argument. The solution is to manually disambiguate the function from the variable by doing one of the following:
In the call to stat_summary, use the fully qualified name base::mean, rather than bare mean.
In the call to stat_summary, use match.fun(mean) instead of bare mean: this will tell R that you want to use a function.
Remove or rename the variable.
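The wording of the error suggests that it is raised by do.call(), whose first argument is named what; that stat_summary() calls it internally is an assumption here, not something confirmed in this thread. A minimal base R sketch of the shadowing, with made-up values:

```r
x <- c(1, 2, 3, 10)
mean <- 5   # a plain variable that shadows base::mean when looked up by value

shadowed <- is.function(mean)   # FALSE: the bare name now yields the number 5
direct <- mean(x)               # still works: in call position R skips non-functions

# Passing the bare name as a value hands the number 5 to do.call(), which fails
msg <- tryCatch(do.call(mean, list(x)),
                error = function(e) conditionMessage(e))

qualified <- do.call(base::mean, list(x))   # fully qualified name
by_string <- do.call("mean", list(x))       # name as a string, resolved as a function

rm(mean)   # remove the shadowing variable again
```

Here msg holds the "'what' must be ..." error text, while the qualified name and the string both give the expected average.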
I had a similar problem with geom_smooth(method=lm, se=FALSE, fullrange=TRUE) and got exactly the same error message, because I had a variable named lm in my global environment.
I fixed the problem by changing lm to "lm":
geom_smooth(method="lm", se=FALSE, fullrange=TRUE)
I got the same error when trying to plot summary stats for categories for a given continuous variable. The problem for mine was:
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymax = max,
fun.ymin = min,
fun.y = median)
Passing the functions as bare objects did not work here. After switching to their names as strings, this worked for me:
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymax = "max",
fun.ymin = "min",
fun.y = "median")
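Note that fun.y, fun.ymin and fun.ymax were deprecated in ggplot2 3.3.0 in favour of fun, fun.min and fun.max. Assuming that version or later, the same plot might be written as follows (a sketch using the diamonds data that ships with ggplot2):

```r
library(ggplot2)

# Median depth per cut, with the range from minimum to maximum
p <- ggplot(data = diamonds) +
  geom_pointrange(mapping = aes(x = cut, y = depth),
                  stat = "summary",
                  fun = "median",
                  fun.min = "min",
                  fun.max = "max")
```

Passing the function names as strings keeps the call unaffected by any variables called median, min or max in the workspace.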
Same problem here.
For me the trick was also that I needed to pass strings as parameters. Example:
expBar + stat_summary(fun.y = "sum", geom = "bar", fill = "white", colour = "black")
instead of
expBar + stat_summary(fun.y = sum, geom = "bar", fill = "white", colour = "black")
made it work.
Hope that helps,
rikojir
Some of my students experienced that error because there were superfluous variables in the dataframe (that had not been dropped before/during the use of the reshape function). Try creating a temporary dataframe where variables that will not be used are dropped.
df_temp <- df[c("needed1", "needed2", "needed3")]
You can also drop variables by putting a - before them, e.g. in the select argument of subset().
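As a sketch on a made-up data frame (all column names here are hypothetical), keeping and dropping columns could look like this. Note that a minus sign in front of a character vector of names does not work inside df[...]; subset() with unquoted names, or setdiff() on the names, does.

```r
# Hypothetical data frame with one column that is not needed
df <- data.frame(needed1 = 1:3, needed2 = 4:6, needed3 = 7:9,
                 extra = c("a", "b", "c"))

# Keep only the listed columns
df_keep <- df[c("needed1", "needed2", "needed3")]

# Drop by name with subset(); the minus sign applies to unquoted column names
df_drop <- subset(df, select = -extra)

# Drop by name with plain indexing
df_setdiff <- df[setdiff(names(df), "extra")]

names(df_drop)
```

All three results contain the same three columns.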
