R - Reorder a bar plot in a function using ggplot2 - r

I have the following plot function using ggplot2.
Function_Plot <- function(Fun_Data, Fun_Color)
{
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
The result is :
I need to update my function to reorder the bars according to the frequencies of the words (in descending order). Following an answer to another question about reordering, I tried to use the reorder function inside aes_string, but it doesn't work.
A reproducible example :
a <- c("G1","G1","G1","G1","G1","G1","G1","G1","G1","G1","G2","G2","G2","G2","G2","G2","G2","G2")
b <- c("happy","sad","happy","bravery","bravery","God","sad","happy","freedom","happy","freedom",
"God","sad","happy","freedom",NA,"money","sad")
MyData <- data.frame(Cluster = a, Word = b)
MyColor <- c("red","blue")
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)

Well, if reordering doesn't work inside aes_string, let's try it beforehand.
Function_Plot <- function(Fun_Data, Fun_Color)
{
Fun_Data[[2]] <- reorder(Fun_Data[[2]], Fun_Data[[2]], length)
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)
A couple of other notes: I'd recommend a more consistent style. Mixing whether or not you use _ to separate words in variable names is confusing and invites bugs.
It won't matter much unless your data is really big, but extracting names from a data frame is very efficient, whereas subsetting a data frame is less efficient. Your code subsets a data frame and then extracts the column names remaining, e.g., colnames(Fun_Data[1]). It will be cleaner to extract the names and then subset that vector: colnames(Fun_Data)[1]
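Putting both notes together, here is a minimal sketch of how the function might look with consistent snake_case names. The function name plot_word_counts and its argument names are made up for illustration, and it swaps aes_string for the .data pronoun, which assumes a reasonably recent ggplot2 (3.0.0 or later).

```r
library(ggplot2)

# Hypothetical rewrite of the answer's function; names are illustrative only.
plot_word_counts <- function(fun_data, fun_color) {
  fun_data <- na.omit(fun_data)
  # Extract the column names first, then index that vector.
  cluster_col <- colnames(fun_data)[1]
  word_col <- colnames(fun_data)[2]
  # Order the levels by frequency (ascending), so that after coord_flip()
  # the most frequent word is drawn at the top.
  fun_data[[word_col]] <- reorder(fun_data[[word_col]], fun_data[[word_col]], length)
  ggplot(fun_data, aes(x = .data[[word_col]], fill = .data[[cluster_col]])) +
    geom_bar() +
    coord_flip() +
    scale_fill_manual(values = fun_color)
}

# Same data as in the question.
a <- c(rep("G1", 10), rep("G2", 8))
b <- c("happy", "sad", "happy", "bravery", "bravery", "God", "sad", "happy",
       "freedom", "happy", "freedom", "God", "sad", "happy", "freedom", NA,
       "money", "sad")
MyData <- data.frame(Cluster = a, Word = b)
MyColor <- c("red", "blue")
p <- plot_word_counts(MyData, MyColor)
```

Printing p draws the plot. The factor levels stored in p$data$Word show the order that the bars will follow.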

Related

Represent dataset in column bar in R using ggplot [duplicate]

I have a csv file which looks like the following:
Name,Count1,Count2,Count3
application_name1,x1,x2,x3
application_name2,x4,x5,x6
The x variables represent numbers and the applications_name variables represent names of different applications.
Now I would like to make a barplot for each row by using ggplot2. The barplot should have the application_name as title. The x axis should show Count1, Count2, Count3 and the y axis should show the corresponding values (x1, x2, x3).
I would like to have a single barplot for each row, because I have to store the different plots in different files. So I guess I cannot use "melt".
I would like to have something like:
for each row in rows {
print barplot in file
}
Thanks for your help.
You can use melt to rearrange your data and then use either facet_wrap or facet_grid to get a separate plot for each application name
library(ggplot2)
library(reshape2)
# example data
mydf <- data.frame(name = paste0("name",1:4), replicate(5,rpois(4,30)))
names(mydf)[2:6] <- paste0("count",1:5)
# rearrange data
m <- melt(mydf)
# if you are wanting to export each plot separately
# I used facet_wrap as a quick way to add the application name as a plot title
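for(i in unique(m$name)) { # unique() also works when name is a character column (the default since R 4.0)
p <- ggplot(subset(m, name==i), aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show.legend=FALSE) # show_guide was replaced by show.legend in ggplot2 2.0.0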
ggsave(paste0("figure_",i,".pdf"), p)
}
# or all plots in one window
ggplot(m, aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show.legend=FALSE)
I didn't see @user20650's nice answer before preparing this. It's almost identical, except that I use plyr::d_ply to save things instead of a loop. I believe dplyr::do() is another good option (you'd group_by(Name) first).
yourData <- data.frame(Name = sample(letters, 10),
Count1 = rpois(10, 20),
Count2 = rpois(10, 10),
Count3 = rpois(10, 8))
library(reshape2)
yourMelt <- melt(yourData, id.vars = "Name")
library(ggplot2)
# Test on one piece of the data to develop the graph
ggplot(subset(yourMelt, Name == "a"), aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = subset(yourMelt, Name == 'a')$Name)
# Wrap it up, with saving to file
bp <- function(dat) {
myPlot <- ggplot(dat, aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = dat$Name[1]) # each piece holds a single Name, so use the first value
ggsave(filename = paste0("path/to/save/", dat$Name[1], "_plot.pdf"),
myPlot)
}
library(plyr)
d_ply(yourMelt, .variables = "Name", .fun = bp)
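If you would rather avoid melt altogether, as the question suggests, a plain loop over the rows also works. This is only a sketch: the apps data frame, the app_one/app_two names and the use of tempdir() as the output folder are assumptions made for the example, and geom_col() needs ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Hypothetical example data in the wide layout described in the question.
apps <- data.frame(Name = c("app_one", "app_two"),
                   Count1 = c(3, 7), Count2 = c(5, 2), Count3 = c(4, 9))

out_dir <- tempdir()  # assumed output location; replace with your own path
count_cols <- setdiff(names(apps), "Name")
saved <- character(0)

for (i in seq_len(nrow(apps))) {
  # Build a small long-format data frame for this row only.
  row_long <- data.frame(variable = count_cols,
                         value = unlist(apps[i, count_cols], use.names = FALSE))
  p <- ggplot(row_long, aes(x = variable, y = value)) +
    geom_col() +
    ggtitle(apps$Name[i])  # application name as the plot title
  out_file <- file.path(out_dir, paste0(apps$Name[i], "_plot.pdf"))
  ggsave(filename = out_file, plot = p, width = 5, height = 4)
  saved <- c(saved, out_file)
}
```

After the loop, saved holds the path of one PDF per row.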

writing R function with ggplot

I have to plot multiple datasets in the same format, and after copy-pasting the code several times, I decided to write a function.
I understand simple function in R, and managed to write the following:
testplot <- function(data, mapping){
output <- ggplot(data) +
geom_bar(mapping,
stat="identity",
position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))
this works fine, however, my plot is more complicated and requires the "data" argument to go separately into each component:
output <- ggplot() +
geom_bar(df1, mapping,
stat="identity",
position='stack')+
geom_errorbar(df1, ...)+
geom_bar(df2, mapping,
...+
geom_errorbar(df2, ...)
but when I write the function and try to run it as
testplot <- function(data, mapping){
output <- ggplot() +
geom_bar(data, mapping,
stat="identity",
position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))
it gives me an error:
Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class uneval Did you accidentally pass `aes()` to the `data` argument?
Is there a way around it?
EDIT: when I try to include 2 dataframes like this:
testplot <- function(data, data2, mapping){
output <- ggplot() +
geom_bar(data=data, mapping=mapping,
stat="identity",
position='stack',
width = barwidth)+
geom_bar(data2=data2, mapping=mapping,
stat="identity",
position='stack',
width = barwidth)
}
p <- testplot(data=df, data2=df2, mapping=aes(x=norms_number, y=coeff.BLDRT, fill=type))
it says "Ignoring unknown parameters: data2"
The first argument of the ggplot2 layer functions (geom_bar, geom_errorbar, and so on) is mapping, which expects the result of aes(); data is the second argument.
So in your function definition, the data frame "data" is matched by position to mapping, and your aes() call is matched to data.
To get around this, name the arguments explicitly in your function definition: data = data, mapping = mapping.
for example
testplot <- function(data, mapping){
output <- ggplot() +
geom_bar(data = data, mapping = mapping,
stat="identity",
position='stack')
}
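You can check the argument order yourself. The sketch below inspects the formals of geom_bar and then builds a two-layer plot with every argument named; two_layer_plot, df_a and df_b are names invented for this illustration.

```r
library(ggplot2)

# Inspect the positional order of the first two arguments of a layer function.
arg_order <- names(formals(geom_bar))[1:2]
print(arg_order)  # "mapping" "data"

# Hypothetical two-data-frame helper: every layer names data and mapping,
# so nothing depends on positional matching.
two_layer_plot <- function(df1, df2, mapping) {
  ggplot() +
    geom_col(data = df1, mapping = mapping) +
    geom_col(data = df2, mapping = mapping)
}

df_a <- data.frame(xvar = c("a", "b"), yvar = c(1, 2))
df_b <- data.frame(xvar = c("c", "d"), yvar = c(3, 4))
p <- two_layer_plot(df_a, df_b, aes(x = xvar, y = yvar))
```

Because mapping comes first, an unnamed data frame in the first position is what triggers the error quoted in the question.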
EDIT:
There are many ways to do this and it really depends on how complex you want your function to be. If you are gonna stick to a global aesthetic mapping, then you can leave the mapping in the main ggplot call and assign data = NULL, then specify which data frame will be associated with which layer.
Consider the following reproducible example
library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10))
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_point(data = df1, color = "blue") +
geom_line(data = df2, color = "red")
}
plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))
In this example, the mapping argument of each geom_* layer function is left blank, and the mapping is instead inherited from the main ggplot call.
This is usually how each layer function knows what data to use, because generally it is inherited in the main ggplot function. Whenever you specify a data argument or a mapping argument, you are generally overriding the inherited values. Any missing required aes mappings are attempted to be found in the main call.
library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10), z = c("A","B"))
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_point(data = df1, color = "blue") +
geom_line(data = df2, mapping = aes(color = z)) #inherits x and y mapping from main ggplot call.
}
plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))
But adding additional aes mappings is risky if you are also specifying data. This is because your data variable may not always contain the correct columns.
plot_custom_ggplot(df1 = data2, df2 = data1, aes(x = v1, y = v2))
#Error in FUN(X[[i]], ...) : object 'z' not found
#
#the column z is not present in data1 object -
#R then looked globally for a z object and didn't find anything.
I believe it is best practice to use tidy data when working with ggplot because things become so much easier. There is usually no reason to use multiple data frames, especially if you plan to use one set of mappings for all data frames. A good exception is if you are writing a plotting function for a custom R object whose structure you know.
Otherwise, consider and compare how these two functions work in this example:
set.seed(76) # set the seed before generating the data so the example is reproducible
data1 <- data.frame(v1=rnorm(20, 50, 20), v2=rnorm(20,30,5), letters= letters[1:20], id = "df1")
data2 <- data.frame(v1=rnorm(20, 100, 20), v2=rnorm(20,50,10), letters = letters[17:26], id = "df2")
plot_custom_ggplot2 <- function(df, mapping) {
ggplot(data = df, mapping = mapping) +
geom_bar(stat = "identity",
position="stack")
}
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_bar(data = df1, stat = "identity",
position="stack") +
geom_bar(data = df2, stat = "identity",
position="stack")
}
plot_custom_ggplot(data1,data2, aes(x = letters,y = v2, fill = id))
plot_custom_ggplot2(rbind(data1,data2), aes(x = letters, y = v2, fill = id))
In the first plot, the red bars for q, r, s, and t are hidden behind the blue bars. This is because they are added on top of each other as layers. In the second plot, these values actually stack because these values were added together in a single layer rather than two separate ones.
I hope this gives you enough information to write your ggplot function.
library(tidyverse)
testplot <- function(df1, df2, mapping){
a <- ggplot() +
geom_point(data = df1, mapping = mapping) +
geom_point(data = df2, mapping = mapping)
return(a)
}
mtcars2 <- mtcars / 100 # creating a separate dataframe to provide the function
testplot(mtcars, mtcars2, mapping = aes(x = drat, y = vs))
From your example you have "data2=data2" - geom_bar doesn't have an argument 'data2', only data. I got the above to work, so an adaptation for your purposes should work too!
The reason I split my dataframe was because I wanted a grouped and stacked plot, and used this question:
How to plot a Stacked and grouped bar chart in ggplot?
The mapping has to be different so that they don't end up on top on each other (so it's x=var1, and then x=var1+barwidth)
Anyway, I can make a plot with multiple geom_bar, but it's the subsequent geom_errorbar that doesn't work in a single function. I just added the error bars separately in the end, and maybe I'll look into the other options some other time.
I realise these are already functions so probably not meant to be used this way, and maybe that's why I can't do multiple geom_errorbar in one function. I just wanted my code to be more readable because I had to plot the same thing 12 times, with very minor differences and it was very long. Perhaps there is a more elegant way to do it though.

Loops, dataframes and ggplot

I would like to display multiple plots on the same page using ggplot, and the multiplot function described here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/. My data is stored in a large dataframe with the first column corresponding to period. I want to visualize columns 2:26. My issue is reproducible using:
rawdata1 <- data.frame("Period" = 1:34, "Sample" = sample(x = c(1,2),34, replace = TRUE),"Runif" = runif(n = 34))
Intuitively, I would use the following code: (With 2:3 replaced with 2:26)
out <- NULL
for (i in 2:3){
out[[i-1]] <- ggplot(rawdata1, aes("Period", y = value)) + geom_line(aes(x = Period, y = rawdata1[[i]])) + ggtitle(label = colnames(rawdata1)[i])
}
multiplot(plotlist = out, cols = 2)
This succeeds in plotting multiple graphs, however my problem is that each graph that is plotted uses data from the same column (column 3 in the above example, column 26 in my dataset). I've puzzled out that this is because my "out" list stores the ggplot list with the y values stored dynamically.
i's final value is 26, and when I call an item from "out", it uses the current value of i to create the graph. So every graph displays the same column. As I am new to R, my guess is that I am not managing my variables correctly. Any help would be appreciated.
Below you find an alternative: using the melt function from reshape2 and then faceting with facet_wrap.
require(ggplot2)
require(reshape2)
data.melt <- melt(rawdata1, id.var='Period')
ggplot(data.melt, aes(Period, value)) +
geom_line() +
facet_wrap(~variable, scales='free_y')
If you want to use multiplot instead, you could do the following:
out <- lapply(names(rawdata1)[-1],
function(index) ggplot(rawdata1) +
geom_line(aes_string(x = 'Period', y = index)) +
ggtitle(label = index))
multiplot(plotlist = out, cols = 2)
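aes_string has since been deprecated in ggplot2, so on a recent version you may prefer the .data pronoun. The following is a sketch of the same idea, assuming ggplot2 3.0.0 or later; value_cols and plots are names chosen for the example. Each call of the anonymous function has its own col, so every plot keeps its own column.

```r
library(ggplot2)

set.seed(1)
rawdata1 <- data.frame(Period = 1:34,
                       Sample = sample(x = c(1, 2), 34, replace = TRUE),
                       Runif = runif(n = 34))

value_cols <- names(rawdata1)[-1]  # every column except Period

plots <- lapply(value_cols, function(col) {
  # .data[[col]] looks the column up by name when the plot is built.
  ggplot(rawdata1, aes(x = Period, y = .data[[col]])) +
    geom_line() +
    labs(y = col) +
    ggtitle(label = col)
})

# Pull the computed y values back out to confirm the plots differ.
y_first <- layer_data(plots[[1]])$y
y_second <- layer_data(plots[[2]])$y
```

The resulting list can be passed to multiplot(plotlist = plots, cols = 2) in the same way as above.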

Data driven plot names in data.table

This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:
# Generate dummy data
require(data.table)
set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10),
x = rnorm(30, mean=5, sd=1),
y = rnorm(30, mean=8, sd=1))
setkey(DT, grp)
The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:
# Example of plotting all groups in one plot
require(ggplot2)
p <- ggplot(data=DT, aes(x = x, y = y)) +
aes(shape = factor(grp))+
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
labs(title = "Group: ALL")
p
This creates the following plot:
Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:
.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()
That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.
Here is the data.table solution, though again, not what I would recommend:
make_plot <- function(dat, grp.name) {
print(
ggplot(dat, aes(x=x, y=y)) +
geom_point() + labs(title=paste0("Group: ", grp.name$grp))
)
NULL
}
DT[, make_plot(.SD, .BY), by=grp]
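To see what .BY holds without drawing anything, here is a small sketch that builds the titles only. The tiny table and the column names title and n are made up for the illustration.

```r
library(data.table)

# Small stand-in for the question's data: three groups, two rows each.
DT <- data.table(grp = rep(c("a", "b", "c"), each = 2), x = 1:6)

# Inside j, .BY is a list with one length-1 element per by variable,
# so .BY$grp is the current group's value.
titles <- DT[, .(title = paste0("Group: ", .BY$grp), n = .N), by = grp]
print(titles)
```

Each row of titles corresponds to one group, which is the same value that make_plot receives as grp.name above.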
What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
Instead of using data.table, you could use facet_grid in ggplot with the labeller argument:
p <- ggplot(data=DT, aes(x = x, y = y)) + aes(shape = factor(grp)) +
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
facet_grid(. ~ grp, labeller = label_both)
See the ggplot documentation for more information.
I see you already have a "facetting" option. I had done this
p+facet_wrap('grp')
But this gives the same result:
p+facet_wrap(~grp)

How to specify columns in facet_grid OR how to change labels in facet_wrap

I have a large number of data series that I want to plot using small multiples. A combination of ggplot2 and facet_wrap does what I want, typically resulting in a nice little block of 6 x 6 facets. Here's a simpler version:
The problem is that I don't have adequate control over the labels in facet strips. The names of the columns in the data frame are short and I want to keep them that way, but I want the labels in the facets to be more descriptive. I can use facet_grid so that I can take advantage of the labeller function but then there seems to be no straightforward way to specify the number of columns and a long row of facets just doesn't work for this particular task. Am I missing something obvious?
Q. How can I change the facet labels when using facet_wrap without changing the column names? Alternatively, how can I specify the number of columns and rows when using facet_grid?
Code for a simplified example follows. In real life I am dealing with multiple groups each containing dozens of data series, each of which changes frequently, so any solution would have to be automated rather than relying on manually assigning values.
require(ggplot2)
require(reshape)
# Random data with short column names
set.seed(123)
myrows <- 30
mydf <- data.frame(date = seq(as.Date('2012-01-01'), by = "day", length.out = myrows),
aa = runif(myrows, min=1, max=2),
bb = runif(myrows, min=1, max=2),
cc = runif(myrows, min=1, max=2),
dd = runif(myrows, min=1, max=2),
ee = runif(myrows, min=1, max=2),
ff = runif(myrows, min=1, max=2))
# Plot using facet wrap - we want to specify the columns
# and the rows and this works just fine, we have a little block
# of 2 columns and 3 rows
mydf <- melt(mydf, id = c('date'))
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variable, ncol = 2)
print (p1)
# Problem: we want more descriptive labels without changing column names.
# We can change the labels, but doing so requires us to
# switch from facet_wrap to facet_grid
# However, in facet_grid we can't specify the columns and rows...
mf_labeller <- function(var, value){ # lifted bodily from the R Cookbook
value <- as.character(value)
if (var=="variable") {
value[value=="aa"] <- "A long label"
value[value=="bb"] <- "B Partners"
value[value=="cc"] <- "CC Inc."
value[value=="dd"] <- "DD Company"
value[value=="ee"] <- "Eeeeeek!"
value[value=="ff"] <- "Final"
}
return(value)
}
p2 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_grid( ~ variable, labeller = mf_labeller)
print (p2)
I don't quite understand. You've already written a function that converts your short labels to long, descriptive labels. What is wrong with simply adding a new column and using facet_wrap on that column instead?
mydf <- melt(mydf, id = c('date'))
mydf$variableLab <- mf_labeller('variable',mydf$variable)
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variableLab, ncol = 2)
print (p1)
To change the label names, just change the factor levels of the factor you use in facet_wrap. These will be used in facet_wrap on the strips. You can use a similar setup as you would using the labeller function in facet_grid. Just do something like:
new_labels = sapply(levels(df$factor_variable), custom_labeller_function)
df$factor_variable = factor(df$factor_variable, labels = new_labels) # use labels, not levels: levels = new_labels would turn every value into NA
Now you can use factor_variable in facet_wrap.
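As a concrete sketch of the relabelling step, using only base R: label_map and the short vector below are assumptions for the example, with the long labels taken from the question.

```r
# Lookup table from short column names to descriptive labels.
label_map <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.")

# A stand-in for the melted variable column.
variable <- factor(c("aa", "bb", "cc", "aa"))

# Replace each level with its descriptive label; the data values follow.
levels(variable) <- unname(label_map[levels(variable)])
print(levels(variable))
```

The relabelled factor can then go straight into facet_wrap, and the lookup table is easy to generate automatically when the series change.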
Just add labeller = label_wrap_gen(width = 25, multi_line = TRUE) to the facet_wrap() arguments.
E.g.: ... + facet_wrap( ~ variable, labeller = label_wrap_gen(width = 25, multi_line = TRUE))
More info: ?ggplot2::label_wrap_gen
Simply add labeller = label_both to the facet_wrap() arguments.
... + facet_wrap( ~ variable, labeller = label_both)
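If the goal is custom text rather than wrapping or a "variable: value" prefix, another option on ggplot2 2.0.0 or later is as_labeller() with a named vector, which keeps the ncol control of facet_wrap. This is a sketch under that version assumption; label_map and the small data frame are invented for the example.

```r
library(ggplot2)

# Hypothetical long-format data with short series names.
set.seed(123)
long_df <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 5), times = 3),
  variable = rep(c("aa", "bb", "cc"), each = 5),
  value = runif(15, min = 1, max = 2)
)

# Named vector: names are the values in the data, elements are strip labels.
label_map <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.")

p <- ggplot(long_df, aes(x = date, y = value, group = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2, labeller = as_labeller(label_map))

built <- ggplot_build(p)  # forces the facets and labels to be computed
```

The column names in the data stay short, and only the strip text changes.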
