I have to plot multiple datasets in the same format, and after copy-pasting the code several times, I decided to write a function.
I understand simple function in R, and managed to write the following:
testplot <- function(data, mapping){
output <- ggplot(data) +
geom_bar(mapping,
stat="identity",
position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))
this works fine, however, my plot is more complicated and requires the "data" argument to go separately into each component:
output <- ggplot() +
geom_bar(df1, mapping,
stat="identity",
position='stack')+
geom_errorbar(df1, ...)+
geom+bar(df2, mapping,
...+
geom_errorbar(df2, ...)
but when I write the function and try to run it as
output <- ggplot() +
geom_bar(data, mapping,
stat="identity",
position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))
it gives me an error:
Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class uneval Did you accidentally pass `aes()` to the `data` argument?
Is there a way around it?
EDIT: when I try to include 2 dataframes like this:
testplot <- function(data, data2, mapping){
output <- ggplot() +
geom_bar(data=data, mapping=mapping,
stat="identity",
position='stack',
width = barwidth)+
geom_bar(data2=data2, mapping=mapping,
stat="identity",
position='stack',
width = barwidth)
}
p <- testplot(data=df, data2=df2, mapping=aes(x=norms_number, y=coeff.BLDRT, fill=type))
it says "Ignoring unknown parameters: data2"
Most of the first arguments to the ggplot2 layer functions are reserved for the mapping argument, which is from aes.
So in your function definition you have a dataframe "data" being implicitly assigned to the mapping variable.
To get around this, explicitly assign data = data in your function definitions.
for example
output <- ggplot() +
geom_bar(data = data, mapping = mapping,
stat="identity",
position='stack')
}
EDIT:
There are many ways to do this and it really depends on how complex you want your function to be. If you are gonna stick to a global aesthetic mapping, then you can leave the mapping in the main ggplot call and assign data = NULL, then specify which data frame will be associated with which layer.
Consider the following reproducible example
library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10))
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_point(data = df1, color = "blue") +
geom_line(data = df2, color = "red")
}
plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))
In this example, the mapping variable for each of the geom_* layer functions are left blank and instead the mapping is inherited from the main ggplot call.
This is usually how each layer function knows what data to use, because generally it is inherited in the main ggplot function. Whenever you specify a data argument or a mapping argument, you are generally overriding the inherited values. Any missing required aes mappings are attempted to be found in the main call.
library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10), z = c("A","B"))
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_point(data = df1, color = "blue") +
geom_line(data = df2, mapping = aes(color = z)) #inherits x and y mapping from main ggplot call.
}
plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))
But adding additional aes mappings is risky if you are also specifying data. This is because you data variable may not always contain the correct columns.
plot_custom_ggplot(df1 = data2, df2 = data1, aes(x = v1, y = v2))
#Error in FUN(X[[i]], ...) : object 'z' not found
#
#the column z is not present in data1 object -
#R then looked globally for a z object and didnt find anything.
I believe it is best practices to use tidy data when working with ggplot because things become so much easier. There is usually no reason to use multiple data frames. Especially if you plan to use one set of mapping for all data frames. A good exception is if you are writing a plotting function for a custom R object, in which you know how it is defined.
Otherwise, consider and compare how these two functions work in this example:
data1 <- data.frame(v1=rnorm(20, 50, 20), v2=rnorm(20,30,5), letters= letters[1:20], id = "df1")
data2 <- data.frame(v1=rnorm(20, 100, 20), v2=rnorm(20,50,10), letters = letters[17:26], id = "df2")
set.seed(76)
plot_custom_ggplot2 <- function(df, mapping) {
ggplot(data = df, mapping = mapping) +
geom_bar(stat = "identity",
position="stack")
}
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_bar(data = df1, stat = "identity",
position="stack") +
geom_bar(data = df2, stat = "identity",
position="stack")
}
plot_custom_ggplot(data1,data2, aes(x = letters,y = v2, fill = id))
plot_custom_ggplot2(rbind(data1,data2), aes(x = letters, y = v2, fill = id))
In the first plot, the red bars for q, r, s, and t are hidden behind the blue bars. This is because they are added on top of each other as layers. In the second plot, these values actually stack because these values were added together in a single layer rather than two separate ones.
I hope this gives you enough information to write your ggplot function.
library(tidyverse)
testplot <- function(df1, df2, mapping){
a <- ggplot() +
geom_point(data = df1, mapping = mapping) +
geom_point(data = df2, mapping = mapping)
return(a)
}
mtcars2 <- mtcars / 100 # creating a separate dataframe to provide the function
testplot(mtcars, mtcars2, mapping = aes(x = drat, y = vs))
From your example you have "data2=data2" - geom_bar doesn't have an argument 'data2', only data. I got the above to work, so an adaptation for your purposes should work too!
The reason I split my dataframe was because I wanted a grouped and stacked plot, and used this question:
How to plot a Stacked and grouped bar chart in ggplot?
The mapping has to be different so that they don't end up on top on each other (so it's x=var1, and then x=var1+barwidth)
Anyway, I can make a plot with multiple geom_bar, but it's the subsequent geom_errorbar that doesn't work in a single function. I just added the error bars separately in the end, and maybe I'll look into the other options some other time.
I realise these are already functions so probably not meant to be used this way, and maybe that's why I can't do multiple geom_errorbar in one function. I just wanted my code to be more readable because I had to plot the same thing 12 times, with very minor differences and it was very long. Perhaps there is a more elegant way to do it though.
Related
I would like to use a variable of the dataframe passed to the data parameter of function the ggplot in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x in the dataframe passed to the data parameter in ggplot in another function scale_x_continuous such as in:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size= 1000, replace = T)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error :
Error in seq(min(x)) : object 'x' not found
which I understand. Of course I can avoid the problem by doing :
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot
The scale_x_continuous function does not evaluate it's parameters in the data environment. One reason for this is that each layer can have it's own data source so by the time you got to the scales it wouldn't be clear which data environment is the "correct" one any more.
You could write a helper function to initialize the plot with your default. For example
helper <- function(df, col) {
ggplot(data = df, mapping = aes_string(x = col)) +
scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
Or you could write a wrapper around just the scale part. For example
scale_x_custom <- function(x) {
scale_x_continuous(breaks = seq(min(x) , max(x)))
}
and then you can add your custom scale to your plot
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom(df$x)
Or since you just want breaks at integer values, you can calculate the breaks from the default limits without needed to actually specify the data. For example
scale_x_custom <- function() {
scale_x_continuous(expand=expansion(0, .3),
breaks = function(x) {
seq(ceiling(min(x)), floor(max(x)))
})
}
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom()
Another less than ideal alternative would be to utilize the . special symbol in combination with {} which is imported from magrittr.
Enclosing the ggplot call in curly brackets allows one to reference . multiple times.
data.frame(x = samp) %>%
{ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}
I find very difficult to put labels for sites with a DCA in a autoplot or ggplot.
I also want to differentiate the points on the autoplot/ggplot according to their groups.
This is the data and the code I used and it went well until the command for autoplot/ggplot:
library(vegan)
data(dune)
d <- vegdist(dune)
csin <- hclust(d, method = "single")
cl <- cutree(csin, 3)
dune.dca <- decorana(dune)
autoplot(dune.dca)
This is the autoplot obtained:
I am using simple coding and I tried these codes but they didn't led me anywhere:
autoplot(dune.dca, label.size = 3, data = dune, colour = cl)
ggplot(dune.dca(x=DCA1, y=DCA2,colour=cl))
ggplot(dune.dca, display = ‘site’, pch = 16, col = cl)
ggrepel::geom_text_repel(aes(dune.dca))
If anyone has a simple suggestion, it could be great.
With the added information (package) I was able to go and dig a bit deeper.
The problem is (in short) that autoplot.decorana adds the data to the specific layer (either geom_point or geom_text). This is not inherited to other layers, so adding additional layers results in blank pages.
Basically notice that one of the 2 code strings below results in an error, and note the position of the data argument:
# Error:
ggplot() +
geom_point(data = mtcars, mapping = aes_string(x = 'hp', y = 'mpg')) +
geom_label(aes(x = hp, y = mpg, label = cyl))
# Work:
ggplot(data = mtcars) +
geom_point(mapping = aes_string(x = 'hp', y = 'mpg')) +
geom_label(aes(x = hp, y = mpg, label = cyl))
ggvegan:::autoplot.decorana places data as in the example the returns an error.
I see 2 ways to get around this problem:
Extract the layers data using ggplot_build or layer_data and create an overall or single layer mapping.
Extract the code for generating the data, and create our plot manually (not using autoplot).
I honestly think the second is simpler, as we might have to extract more information to make our data sensible. By looking at the source code of ggvegan:::autoplot.decorana (simply printing it to console by leaving out brackets) we can extract the below code which generates the same data as used in the plot
ggvegan_data <- function(object, axes = c(1, 2), layers = c("species", "sites"), ...){
obj <- fortify(object, axes = axes, ...)
obj <- obj[obj$Score %in% layers, , drop = FALSE]
want <- obj$Score %in% c("species", "sites")
obj[want, , drop = FALSE]
}
With this we can then generate any plot that we desire, with appropriate mappings rather than layer-individual mappings
dune.plot.data <- ggvegan_data(dune.dca)
p <- ggplot(data = dune.dca, aes(x = DCA1, DCA2, colour = Score)) +
geom_point() +
geom_text(aes(label = Label), nudge_y = 0.3)
p
Which gives us what I hope is your desired output
I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
Simply use aes_string to pass string variables into the ggplot() call. Right now, your plot uses different data sources, not aligned with ggplot's data argument. Below x, color, and fill are separate, unrelated vectors though they derive from same source but ggplot does not know that:
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
Here is strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))
I would like to use a variable of the dataframe passed to the data parameter of function the ggplot in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x in the dataframe passed to the data parameter in ggplot in another function scale_x_continuous such as in:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size= 1000, replace = T)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error :
Error in seq(min(x)) : object 'x' not found
which I understand. Of course I can avoid the problem by doing :
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot
The scale_x_continuous function does not evaluate it's parameters in the data environment. One reason for this is that each layer can have it's own data source so by the time you got to the scales it wouldn't be clear which data environment is the "correct" one any more.
You could write a helper function to initialize the plot with your default. For example
helper <- function(df, col) {
ggplot(data = df, mapping = aes_string(x = col)) +
scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
Or you could write a wrapper around just the scale part. For example
scale_x_custom <- function(x) {
scale_x_continuous(breaks = seq(min(x) , max(x)))
}
and then you can add your custom scale to your plot
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom(df$x)
Or since you just want breaks at integer values, you can calculate the breaks from the default limits without needed to actually specify the data. For example
scale_x_custom <- function() {
scale_x_continuous(expand=expansion(0, .3),
breaks = function(x) {
seq(ceiling(min(x)), floor(max(x)))
})
}
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom()
Another less than ideal alternative would be to utilize the . special symbol in combination with {} which is imported from magrittr.
Enclosing the ggplot call in curly brackets allows one to reference . multiple times.
data.frame(x = samp) %>%
{ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}
I have the following plot function using ggplot2.
Function_Plot <- function(Fun_Data, Fun_Color)
{
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
The result is :
I need to upgrade my function to reorder the bar according frequencies of the words (in descending order). As I see the answer for another question about reordering, I try to introduce reorder function in the aes_string but it doesn't work.
A reproducible example :
a <- c("G1","G1","G1","G1","G1","G1","G1","G1","G1","G1","G2","G2","G2","G2","G2","G2","G2","G2")
b <- c("happy","sad","happy","bravery","bravery","God","sad","happy","freedom","happy","freedom",
"God","sad","happy","freedom",NA,"money","sad")
MyData <- data.frame(Cluster = a, Word = b)
MyColor <- c("red","blue")
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)
Well, if reordering doesn't work inside aes_string, let's try it beforehand.
Function_Plot <- function(Fun_Data, Fun_Color)
{
Fun_Data[[2]] <- reorder(Fun_Data[[2]], Fun_Data[[2]], length)
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
Function_Plot()
Couple other notes - I'd recommend you use a more consistent style, mixing whether or not use use _ to separate words in variable names is confusing and asking for bugs.
It won't matter much unless your data is really big, but extracting names from a data frame is very efficient, whereas subsetting a data frame is less efficient. Your code subsets a data frame and then extracts the column names remaining, e.g., colnames(Fun_Data[1]). It will be cleaner to extract the names and then subset that vector: colnames(Fun_Data)[1]