Bar Plots with two variables - r

I am trying to create a bar plot, with both base R plotting and a geom_point plot in ggplot, to visualize two variables by a column of factors.
I have created a datasheet that uses r-squared values for Species and Elasmobranch variables calculated using each factor.
Graph <- structure(list(Factors = structure(c(5L, 4L, 11L, 6L, 8L, 10L
), .Label = c("Activity", "Bait", "Depth", "Location", "Marine Park",
"Month", "Sea State", "Start Time", "Substrate", "Swell", "Year"
), class = "factor"), Species = c(0.1064, 0.5806, 0.05974, 0.07888,
0.1325, 0.05725), Elasmobranchs = c(0.02658, 0.4074, 0.02072,
0.1419, 0.1065, 0.08661)), row.names = c(NA, 6L), class = "data.frame")
ggplot(data = Graph, aes(x = Species, y = Factors, colour = Factors)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
theme_classic()
barplot(Species~Factors, data = Graph, xlab="Factors", ylab="R-Squared Values",
horizontal = TRUE,
main = "Factor correlations with species richness", frame.plot=FALSE)
barplot(Elasmobranchs~Factors, data = Graph, xlab="Factors", ylab="R-Squared Values",
horizontal = TRUE,
main = "Factor correlations with species richness", frame.plot=FALSE)
These ggplot and base plots work nicely; however, I would like to add Elasmobranchs on the x-axis alongside Species and have the result displayed in decreasing order. Is there a simple way to do this by adding a small line of code to my existing plots?
Thank you for any assistance.

If I understood correctly what you're trying to do, you can reshape your data frame using pivot_longer (formerly gather) and then differentiate between Species and Elasmobranchs through shape. You can reverse the scale using scale_x_reverse.
library(tidyverse)
ggplot(data = Graph %>% pivot_longer(-Factors),
aes(x = value, y = Factors, colour = Factors, shape=name)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
scale_x_reverse() +
theme_classic()
EDIT:
In case you're working with an older version of tidyr that doesn't have the pivot_longer function, you can use gather (from the same package).
ggplot(data = Graph %>% gather("name", "value", -Factors),
aes(x = value, y = Factors, colour = Factors, shape=name)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
scale_x_reverse() +
theme_classic()
EDIT 2:
To reorder the y-axis based on the x-axis values, wrap Factors in fct_reorder:
ggplot(data = Graph %>% gather("name", "value", -Factors),
aes(x = value, y = fct_reorder(Factors, value), colour = Factors, shape=name)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
theme_classic()
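Since the question asks for bars rather than points, a dodged bar chart is one possible variant of the same reshaping idea. What follows is only a sketch under a few assumptions: the data frame is retyped by hand from the question, the column names Response and R2 are made up for illustration, and "decreasing order" is taken to mean ordering each factor by its larger R-squared value.

```r
library(ggplot2)
library(tidyr)

# Data retyped from the question (Factors kept as plain character here)
Graph <- data.frame(
  Factors = c("Marine Park", "Location", "Year", "Month", "Start Time", "Swell"),
  Species = c(0.1064, 0.5806, 0.05974, 0.07888, 0.1325, 0.05725),
  Elasmobranchs = c(0.02658, 0.4074, 0.02072, 0.1419, 0.1065, 0.08661)
)

# Wide to long: one row per factor/response pair
# (Response and R2 are illustrative names, not from the original post)
long <- pivot_longer(Graph, cols = -Factors,
                     names_to = "Response", values_to = "R2")

# reorder() sorts levels by increasing score, so negate the maximum
# to get the factor with the largest R2 first
long$Factors <- reorder(long$Factors, long$R2, FUN = function(v) -max(v))

# Side-by-side bars for Species and Elasmobranchs
p <- ggplot(long, aes(x = Factors, y = R2, fill = Response)) +
  geom_col(position = "dodge") +
  labs(x = "Factors", y = "R-squared", fill = NULL) +
  theme_classic()
p
```

Adding coord_flip() to p would give horizontal bars, closer to the layout of the original base plots.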

Related

plotly overrules ggplot2's scale_fill_manual's labels

I have a sample data set containing an end-of-week date and a churn value, which can be either negative or positive. In ggplot2 I use scale_fill_manual() with the sign of the value as the group.
This works perfectly fine, showing different colors for positive and negative values. The legend labels are also rewritten according to the labels provided. However, if I simply convert it to a plotly graph I lose my labels and they are set back to the -1, 1 factor levels instead. Does plotly not support this, and if so, is there another way to get this done?
library(ggplot2)
library(plotly)
dt <- structure(list(date = structure(c(18651L, 18658L, 18665L, 18672L,
18679L, 18686L, 18693L, 18700L, 18707L, 18714L), class = c("IDate",
"Date")), churn = c(-3.27088948787062, -0.582518144525087, -0.125024925224327,
-0.333746898263027, -0.685714285714286, -0.340165549862042, 0.0601176470588235,
-0.119351608461635, -0.0132513279284316, -0.011201854099989)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
plot_ggplot <- ggplot(dt, aes(x = date, y = churn * 100)) +
geom_bar(stat = "identity", aes(fill = factor(sign(churn)))) +
scale_fill_manual(
values = c("#4da63f", "#e84e62"),
breaks = c("-1", "1"),
labels = c("Growing base", "Declining base")
) +
ylim(-75, 25) +
labs(
title = "Weekly churn rate",
fill = "Legend"
)
plot_ggplot
plot_ggplotly <- ggplotly(plot_ggplot)
plot_ggplotly
Does this do the trick?
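# Negative churn means the base is growing, matching the labels in the question
dt$base = ifelse(dt$churn < 0, "Growing base", "Declining base")
plot_ggplot <- ggplot(dt, aes(x = date, y = churn * 100)) +
geom_bar(stat = "identity", aes(fill = base)) +
scale_fill_manual(
# Named values keep each colour tied to its label regardless of level order
values = c("Growing base" = "#4da63f", "Declining base" = "#e84e62")
) +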
ylim(-75, 25) +
labs(
title = "Weekly churn rate",
fill = "Legend"
)
plot_ggplot
plot_ggplotly <- ggplotly(plot_ggplot)
edit: I just read the comment, I think it is what was suggested

Adding percentage labels on pie chart in R

My data frame looks like
df
Group value
1 Positive 52
2 Negative 239
3 Neutral 9
I would like to make a pie chart of the data frame using ggplot.
pie <- ggplot(df, aes(x="", y=value, fill=Group)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start=0)
This is my pie chart.
But when I try to add percentage labels on the chart
pie <- ggplot(df, aes(x="", y=value, fill=Group)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start=0) +
geom_text(aes(y = value/2 + c(0, cumsum(value)[-length(value)]),
label = percent(value/300 )), size=5)
This is my result.
I have already seen many questions similar to mine, e.g. "R + ggplot2 => add labels on facet pie chart", but the solutions are not helping.
How about:
vals <- c(239, 52, 9)
val_names <- sprintf("%s (%s)", c("Negative", "Positive", "Neutral"), scales::percent(round(vals/sum(vals), 2)))
names(vals) <- val_names
waffle::waffle(vals) +
ggthemes::scale_fill_tableau(name=NULL)
instead?
It's "fresher" than a pie chart and you aren't really gaining anything with the level of precision you have/want on those pie labels now.
For example, I create a vector e3 describing 400 vehicles:
e3 <- rep( c("car", "truck", "other", "bike", "suv"), c(60, 120, 20, 50, 150))
Since pie charts are especially useful for proportions, let's have a look at the proportions of our vehicles, which we will then report on the graph (table() sorts the categories alphabetically: bike, car, other, suv, truck):
paste(prop.table(table(e3))*100, "%", sep = "")
[1] "12.5%" "15%" "5%" "37.5%" "30%"
Then you can draw your pie chart,
pie(table(e3), labels = paste(round(prop.table(table(e3))*100), "%", sep = ""),
col = heat.colors(5), main = "Vehicles proportions - n: 400")
Here is an idea for matching the order of groups in the pie chart with the order of the labels. I sorted the data in descending order by value and calculated the percentages in advance. When I drew the ggplot figure, I fixed the order of Group to the row order of mydf (i.e., Negative, Positive, and Neutral) using fct_inorder(). When geom_label_repel() added labels to the pie, the order of the labels was identical to that of the wedges.
library(dplyr)
library(ggplot2)
library(ggrepel)
library(forcats)
library(scales)
mydf %>%
arrange(desc(value)) %>%
mutate(prop = percent(value / sum(value))) -> mydf
pie <- ggplot(mydf, aes(x = "", y = value, fill = fct_inorder(Group))) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
geom_label_repel(aes(label = prop), size=5, show.legend = F, nudge_x = 1) +
guides(fill = guide_legend(title = "Group"))
DATA
mydf <- structure(list(Group = structure(c(3L, 1L, 2L), .Label = c("Negative",
"Neutral", "Positive"), class = "factor"), value = c(52L, 239L,
9L)), .Names = c("Group", "value"), class = "data.frame", row.names = c("1",
"2", "3"))
I agree with @hrbrmstr that a waffle chart would be better. But to answer the original question: your problem comes from the order in which the wedges are drawn, which defaults to alphabetical. Because you calculate the label positions based on the row order of your data frame, the labels end up in the wrong places.
As a general principle of readability, do all the fancy calculations of labels and their positions before the actual code that draws the graphic.
library(dplyr)
library(ggplot2)
library(ggmap) # for theme_nothing
df <- data.frame(value = c(52, 239, 9),
Group = c("Positive", "Negative", "Neutral")) %>%
# factor levels need to be the opposite order of the cumulative sum of the values
mutate(Group = factor(Group, levels = c("Neutral", "Negative", "Positive")),
cumulative = cumsum(value),
midpoint = cumulative - value / 2,
label = paste0(Group, " ", round(value / sum(value) * 100, 1), "%"))
ggplot(df, aes(x = 1, weight = value, fill = Group)) +
geom_bar(width = 1, position = "stack") +
coord_polar(theta = "y") +
geom_text(aes(x = 1.3, y = midpoint, label = label)) +
theme_nothing()
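The manual midpoint calculation above can also be avoided by letting ggplot2 place the labels itself. This is only a sketch of that alternative, not the method used in the answers here: position_stack(vjust = 0.5) centres each label in its own stacked segment, so the labels follow whatever order the wedges are drawn in. The data frame is retyped from the question, and the label column is an illustrative name.

```r
library(ggplot2)

# Data retyped from the question
df <- data.frame(Group = c("Positive", "Negative", "Neutral"),
                 value = c(52, 239, 9))

# Percentage labels computed before plotting
df$label <- paste0(round(100 * df$value / sum(df$value), 1), "%")

pie <- ggplot(df, aes(x = "", y = value, fill = Group)) +
  geom_col(width = 1) +
  # The text layer inherits fill, so it is stacked in the same order
  # as the bars; vjust = 0.5 puts each label mid-segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
pie
```

With a wedge as small as Neutral the label may still be cramped, which is where something like geom_label_repel() from the other answer can help.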
This is my example, using only base R code. Hope it helps.
Take iris for example
attach(iris)
check the ratio of iris$Species
a<- table(iris$Species)
class(a)
then convert the table into a matrix so that rownames() can be used
a_mat<- as.matrix(a)
a_mat
calculate the ratio of each Species
a_ratio<- a_mat[,1]/sum(a_mat[,1])*100
a_ratio
since each Species accounts for 0.33333 (i.e. 33.33333%), I keep only three significant digits by using signif()
a_ratio<- signif(a_ratio,3)
a_ratio
basic pie chart code of R base
pie(a_ratio,labels=rownames(a_mat))
further add the ratio values to the labels by using paste()
pie(a_ratio,labels=paste(rownames(a_mat),paste0(a_ratio,"%")))
final pie chart, please click this link

Ordered bar graphs using ggplot2 and facet

I have a data.frame that looks something like this:
HSP90AA1 SSH2 ACTB TotalTranscripts
ESC_11_TTCGCCAAATCC 8.053308 12.038484 10.557234 33367.23
ESC_10_TTGAGCTGCACT 9.430003 10.687959 10.437068 30285.41
ESC_11_GCCGCGTTATAA 7.953726 9.918988 10.078192 30133.94
ESC_11_GCATTCTGGCTC 11.184402 11.056144 8.316846 24857.07
ESC_11_GTTACATTTCAC 11.943733 11.004500 9.240883 23629.00
ESC_11_CCGTTGCCCCTC 7.441695 9.774733 7.566619 22792.18
The TotalTranscripts column is sorted in descending order. What I'd like to do is generate three bar graphs using ggplot2, one for each column of the data.frame except TotalTranscripts. I'd like the bars to be ordered by TotalTranscripts, just as in the data.frame. It would be ideal to have these bar graphs on one plot using a facet wrap.
Any help would be greatly appreciated! Thank you!
EDIT: Here is my current code using barplot().
cells = "ESC"
genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
barplot(as.matrix(g[1]), beside=TRUE, names.arg=paste(rownames(g)," (",g$TotalTranscripts,")",sep=""), las=2, col="light blue", cex.names=0.3, main=paste(colnames(g)[1], "\nCells sorted by total number of transcripts (colSums)", sep=""))
This will generate a plot that looks like this.
Again, the problem I seem to be having here is how to have multiple of these plots on the same image. I would like to add 20+ columns to this data.frame but I've cut this down to 3 for the sake of simplicity.
EDIT: Current code incorporating the answer below
cells = "ESC"
genes = rownames(data[x,])[1:8]
# genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
g$rowz <- row.names(g)
g$Cells <- reorder(g$rowz, rev(g$TotalTranscripts))
df1 <- melt(g, id.vars = c("Cells", "TotalTranscripts"), measure.vars=genes)
ggplot(df1, aes(x = Cells, y = value)) + geom_bar(stat = "identity") +
theme(axis.title.x=element_blank(), axis.text.x = element_blank()) +
facet_wrap(~ variable, scales = "free") +
theme_bw() + theme(axis.text.x = element_text(angle = 90))
Here is the example data for anybody else:
df <- structure(list(HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402,
11.943733, 7.441695), SSH2 = c(12.038484, 10.687959, 9.918988,
11.056144, 11.0045, 9.774733), ACTB = c(10.557234, 10.437068,
10.078192, 8.316846, 9.240883, 7.566619), TotalTranscripts = c(33367.23,
30285.41, 30133.94, 24857.07, 23629, 22792.18)), .Names = c("HSP90AA1",
"SSH2", "ACTB", "TotalTranscripts"), class = "data.frame", row.names = c("ESC_11_TTCGCCAAATCC",
"ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA", "ESC_11_GCATTCTGGCTC",
"ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))
And here is a solution:
#New column for row names so they can be used as x-axis elements
df$rowz <- row.names(df)
#Explicitly order the rows (see the Kohske link)
df$rowz1 <- reorder(df$rowz, rev(df$TotalTranscripts))
library(reshape2)
#Melt the data from wide to long
df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"),
measure.vars = c("HSP90AA1", "SSH2", "ACTB"))
library(ggplot2)
gp <- ggplot(df1, aes(x = rowz1, y = value)) + geom_bar(stat = "identity") +
facet_wrap(~ variable, scales = "free") +
theme_bw()
gp + theme(axis.text.x = element_text(angle = 90))
This example by Kohske is a constant reference for me on ordering elements in ggplot2.
If you have many columns, but the same six ESC complexes, you can switch the groupings, i.e. x = variable and facet_wrap(~ rowz1), but this fundamentally changes how you are visualizing/comparing your data. Also, consider facet_grid(row ~ column) if you can organize the columns by 2 components (Columns being the data that are melted into 'variable' and 'value').
And this additional SO solution isn't related to your question, but it is an elegant way to reorder elements in each facet by their values (for future reference).
Finally, the method that will give you the finest control is to plot each graph separately and combine the grobs. Baptiste's packages like gridExtra and gtable are useful for these tasks.
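The ordering step in the solution, reorder(df$rowz, rev(df$TotalTranscripts)), only works because the rows are already sorted by TotalTranscripts. A variant that does not rely on row order is to reorder by the negated totals. The sketch below assumes tidyr is available in place of reshape2, and the names Cell, Gene and Expression are invented for illustration; the rows are deliberately reversed first to show that the input order does not matter.

```r
library(ggplot2)
library(tidyr)

# Example data retyped from the question
df <- data.frame(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 11.943733, 7.441695),
  SSH2 = c(12.038484, 10.687959, 9.918988, 11.056144, 11.0045, 9.774733),
  ACTB = c(10.557234, 10.437068, 10.078192, 8.316846, 9.240883, 7.566619),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94, 24857.07, 23629, 22792.18),
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT",
                "ESC_11_GCCGCGTTATAA", "ESC_11_GCATTCTGGCTC",
                "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC")
)

# Reverse the rows on purpose: the ordering below should not care
df <- df[rev(seq_len(nrow(df))), ]

# Levels sorted by decreasing TotalTranscripts
# (reorder() sorts by increasing score, hence the minus sign)
df$Cell <- reorder(rownames(df), -df$TotalTranscripts)

# Wide to long, one row per cell/gene pair
long <- pivot_longer(df, cols = c(HSP90AA1, SSH2, ACTB),
                     names_to = "Gene", values_to = "Expression")

gp2 <- ggplot(long, aes(x = Cell, y = Expression)) +
  geom_col() +
  facet_wrap(~ Gene, scales = "free") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
gp2
```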
**EDIT in response to new information from OP**
The OP has subsequently asked how to visualize the data, especially when there are more ESC categorical variables (up to 600+).
Here are some examples, with the big caveat that with many categorical variables, they should be grouped or converted to a continuous variable somehow.
#Plot colour to a few discrete, categorical variables
gp + aes(fill = rowz1) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
labs(x = NULL, fill = "Cell", title = "Discrete categorical variables")
#Plot colour on a continuous scale.
#Ultimately, not appropriate for this example! (but shown for reference)
#More appropriate: fill = TotalTranscripts
gp + aes(fill = as.numeric(rowz1)) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
labs(x = NULL, title = "Continuous variables (legend won't work for many values)") +
scale_fill_gradient2(name = "Cell",
breaks = as.numeric(df1$rowz1),
labels = df1$rowz1,
midpoint=median(as.numeric(df1$rowz1)))
#x is continuous, colour plotted to the categorical variable.
#Same caveats as earlier.
gp1 <- ggplot(df1, aes(x = TotalTranscripts/1000, y = value, colour = rowz1)) +
geom_point(size=3) + facet_wrap(~ variable, scales = "free") +
labs(title = "X is an actual continuous variable") +
theme_bw() + labs(x = bquote("Total Transcripts,"~10^3), colour = "Cell")
gp1

R Side-by-side grouped boxplot

I have temporal data of gas emissions from two species of plant, both of which have been subjected to the same treatments. With some previous help to get this code together [edit]:
soilflux = read.csv("soil_fluxes.csv")
library(ggplot2)
soilflux$Treatment <- factor(soilflux$Treatment,levels=c("L-","C","L+"))
soilplot = ggplot(soilflux, aes(factor(Week), Flux, fill=Species, alpha=Treatment)) + stat_boxplot(geom ='errorbar') + geom_boxplot()
soilplot = soilplot + labs(x = "Week", y = "Flux (mg m-2 d-1)") + theme_bw(base_size = 12, base_family = "Helvetica")
soilplot
This produces a plot that works well but has its flaws.
Whilst it conveys all the information I need it to, despite Google trawls and looking through here I just couldn't get the 'Treatment' part of the legend to show that L- is light and L+ darkest. I've also been told that a monochrome colour scheme is easier to differentiate hence I'm trying to get something like this where the legend is clear.
(source: biomedcentral.com)
As a workaround you could create a combined factor from species and treatment and assign the fill colors manually:
library(ggplot2)
library(RColorBrewer)
d <- expand.grid(week = factor(1:4), species = factor(c("Heisteria", "Simarouba")),
trt = factor(c("C", "L-", "L+"), levels = c("L-", "C", "L+")))
d <- d[rep(1:24, each = 30), ]
d$flux <- runif(NROW(d))
# Create a combined factor for coding the color
d$spec.trt <- interaction(d$species, d$trt, lex.order = TRUE, sep = " - ")
ggplot(d, aes(x = week, y = flux, fill = spec.trt)) +
stat_boxplot(geom ='errorbar') + geom_boxplot() +
scale_fill_manual(values = c(brewer.pal(3, "Greens"), brewer.pal(3, "Reds")))

More bullseye plotting in R

I'm using ggplot2 to make some bullseye charts in R. They look delightful, and everyone is very pleased - except that they'd like to have the values of the bullseye layers plotted on the chart. I'd be happy just to put them in the lower-right corner of the plot, or even in the plot margins, but I'm having some difficulty doing this.
Here's the example data again:
critters <- structure(list(Zoo = "Omaha", Animals = 50, Bears = 10, PolarBears = 3), .Names = c("Zoo",
"Animals", "Bears", "PolarBears"), row.names = c(NA, -1L), class = "data.frame")
And how to plot it:
d <- data.frame(animal=factor(c(rep("Animals", critters$Animals),
rep("Bears", critters$Bears), rep("PolarBears", critters$PolarBears)),
levels = c("PolarBears", "Bears", "Animals"), ordered= TRUE))
grr <- ggplot(d, aes(x = factor(1), fill = factor(animal))) + geom_bar() +
coord_polar() + labs(x = NULL, fill = NULL) +
scale_fill_manual(values = c("firebrick2", "yellow2", "green3")) +
ggtitle("Animals, Bears and Polar Bears:\nOmaha Zoo")
I'd like to add a list to, say, the lower right corner of this plot saying,
Animals: 50
Bears: 10
PolarBears: 3
But I can't figure out how. My efforts so far with annotate() have been thwarted, in part by the polar coordinates. If I have to add the numbers to the title, so be it - but I always hold out hope for a more elegant solution.
EDIT:
An important note for those who come after: the bullseye is a bar plot mapped to polar coordinates. The ggplot2 default for bar plots is, sensibly, to stack them. However, that means that the rings of your bullseye will also be stacked (e.g. the radius in my example equals the sum of all three groups, 63, instead of the size of the largest group, 50). I don't think that's what most people expect from a bullseye plot, especially when the groups are nested. Using geom_bar(position = position_identity()) will turn the stacked rings into layered circles.
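To make the note above concrete, here is a rough sketch of a layered (non-stacked) bullseye that also prints the counts in the lower-right corner. It is one possible approach rather than the approach from this thread: it assumes a ggplot2 version that supports labs(caption = ...), and it builds a small summary data frame (rings, an illustrative name) instead of expanding one row per animal.

```r
library(ggplot2)

# Counts taken from the critters example
counts <- c(Animals = 50, Bears = 10, PolarBears = 3)

# One row per ring, largest first so smaller rings are drawn on top
rings <- data.frame(
  animal = factor(names(counts), levels = names(counts)),
  n = as.numeric(counts)
)

# Text block with one "name: count" line per group
caption <- paste(paste0(names(counts), ": ", counts), collapse = "\n")

bullseye <- ggplot(rings, aes(x = factor(1), y = n, fill = animal)) +
  # position = "identity" overlays the bars instead of stacking them,
  # so the outer radius is 50 rather than 63
  geom_col(position = "identity", width = 1) +
  coord_polar() +
  scale_fill_manual(values = c(Animals = "green3", Bears = "yellow2",
                               PolarBears = "firebrick2")) +
  labs(x = NULL, y = NULL, fill = NULL,
       title = "Animals, Bears and Polar Bears:\nOmaha Zoo",
       caption = caption)
bullseye
```

The caption sits outside the panel, so it is unaffected by the polar coordinates that make annotate() awkward here.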
EDIT 2: Example from ggplot2 docs:
you can also add it directly to the plot:
grr <- ggplot(d, aes(x = factor(1), fill = factor(animal))) + geom_bar() +
coord_polar() + labs(x = NULL, fill = NULL) +
scale_fill_manual(values = c("firebrick2", "yellow2", "green3")) +
ggtitle("Animals, Bears and Polar Bears:\nOmaha Zoo") +
annotate("text", x = 1, y = c(3,10,50)-3, label = c("3","10","50"), size = 4)
grr
You could add the numbers to the legend.
library(ggplot2)
critters <- structure(list(Zoo = "Omaha", Animals = 50, Bears = 10, PolarBears = 3), .Names = c("Zoo", "Animals", "Bears", "PolarBears"), row.names = c(NA, -1L), class = "data.frame")
d <- data.frame(animal=factor(c(rep("Animals", critters$Animals),
rep("Bears", critters$Bears), rep("PolarBears", critters$PolarBears)),
levels = c("PolarBears", "Bears", "Animals"), ordered= TRUE))
# table() follows the factor's level order, so names and counts line up
counts <- table(d$animal)
levels(d$animal) <- paste(names(counts), counts, sep = ": ")
ggplot(d, aes(x = factor(1), fill = factor(animal))) + geom_bar() +
coord_polar() + labs(x = NULL, fill = NULL) +
scale_fill_manual(values = c("firebrick2", "yellow2", "green3")) +
ggtitle("Animals, Bears and Polar Bears:\nOmaha Zoo")
