How to avoid overlapping plots in ggplot2

I want to plot estimates for three age groups (agecat) by two exposures (expo). The code below produces overlapping plots with the age groups rearranged alphabetically. How can I avoid the overlap and keep the existing order of the age groups?
I used this code:
ggplot(mydf, aes(x = agecat, y = est,ymin = lcl, ymax = ucl, group=agecat,color=agecat,shape=agecat)) +
geom_point(position="dodge",size = 4) +
geom_linerange(position="dodge",size =0.7) +
geom_hline(aes(yintercept = 0)) +
labs(colour="Age Group", shape="Age Group") + theme(axis.title=element_text(face="bold",size="12"),axis.text=element_text(size=12,face="bold"))
Sample data:
> dput(mydf)
structure(list(expo = c(0, 1, 0, 1, 0, 1), est = c(0.290780632898979,
0.208093573361601, 0.140524761247529, 0.156713614649751, 0.444402395010579,
0.711469870845916), lcl = c(0.0679784035303221, -0.00413163014975071,
-0.208866152400888, -0.175393089838871, -0.227660022186016, 0.0755871550441212
), ucl = c(0.514078933380535, 0.420769190852455, 0.491138970050864,
0.489925205664665, 1.12099179726843, 1.35139300089608), agecat = c("young",
"young", "middle", "middle", "old", "old")), .Names = c("expo",
"est", "lcl", "ucl", "agecat"), row.names = c(2L, 4L, 6L, 8L,
10L, 12L), class = "data.frame")

I would do this by using expo as a variable in the plot. This would let ggplot know that you have overlap and so you need dodging at each level of your x variable. Once you do this, you can use position = position_dodge() directly in the two geoms and set the width argument to whatever you'd like. See the help page for position_dodge for examples of when you need to set width explicitly.
Here I'll replace group = agecat with group = expo. Using group instead of an aesthetic like shape means that there is no indication which point represents which expo level on the graphic.
mydf$agecat = factor(mydf$agecat, levels = c("young", "middle", "old"))
ggplot(mydf, aes(x = agecat, y = est, ymin = lcl, ymax = ucl, group = expo, color = agecat, shape = agecat)) +
geom_point(position = position_dodge(width = .5), size = 4) +
geom_linerange(position = position_dodge(width = .5), size = 0.7) +
geom_hline(aes(yintercept = 0)) +
labs(colour="Age Group", shape="Age Group") +
theme(axis.title = element_text(face="bold", size=12),
axis.text = element_text(size=12, face="bold"))
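If the two exposure levels should also be distinguishable on the plot, one option is to map expo to an aesthetic such as shape in addition to using it for grouping. The following is a minimal sketch rather than part of the original answer: the data values are rounded from the question, and the labels "Unexposed" and "Exposed" are assumptions made for illustration.

```r
library(ggplot2)

# Rounded copy of the question's data, rebuilt so the example runs on its own
mydf <- data.frame(
  expo   = c(0, 1, 0, 1, 0, 1),
  est    = c(0.291, 0.208, 0.141, 0.157, 0.444, 0.711),
  lcl    = c(0.068, -0.004, -0.209, -0.175, -0.228, 0.076),
  ucl    = c(0.514, 0.421, 0.491, 0.490, 1.121, 1.351),
  agecat = c("young", "young", "middle", "middle", "old", "old")
)
mydf$agecat <- factor(mydf$agecat, levels = c("young", "middle", "old"))
# The level labels below are assumed; the question only gives 0 and 1
mydf$expo <- factor(mydf$expo, levels = c(0, 1), labels = c("Unexposed", "Exposed"))

pd <- position_dodge(width = 0.5)
p <- ggplot(mydf, aes(x = agecat, y = est, ymin = lcl, ymax = ucl,
                      group = expo, color = agecat, shape = expo)) +
  geom_point(position = pd, size = 4) +
  geom_linerange(position = pd) +
  geom_hline(yintercept = 0) +
  labs(colour = "Age Group", shape = "Exposure")
print(p)
```

Because shape now carries the exposure, the legend shows which point belongs to which expo level, while colour still identifies the age group.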

You can convert the column agecat to factor with the levels in the desired order. Then, as Heroka pointed out in the comments, we can achieve a similar effect using facet_wrap:
mydf$agecat <- factor(mydf$agecat, levels=c("young", "middle", "old"))
ggplot(mydf, aes(x = agecat, y = est, ymin = lcl, ymax = ucl, group=agecat,color=agecat, shape=agecat)) +
geom_linerange(size =0.7) +
geom_hline(aes(yintercept = 0)) + labs(colour="Age Group", shape="Age Group") +
facet_wrap(agecat~est, scales="free_x", ncol=6) + geom_point(size = 4)+ theme(axis.title=element_text(face="bold",size="12"),axis.text=element_text(size=12,face="bold"),strip.text.x = element_blank())

Related

changing color of errorbars in ggplot2 chart

I have a problem with errorbars in bar chart in ggplot. I have an interaction between categorical (condition) and continuous (moderator) variable. I want to show error bars, but they are the same color as bars, which makes them impossible to interpret.
I tried adding color = "black" etc. for error bars, but it won't change anything.
Here is the code:
moderator = runif(n = 100, min = 1, max = 7)
condition <- rep(letters[1:2], length.out = 100)
y = runif(n = 100, min = 1, max = 100)
df <- data.frame(moderator, condition, y)
lm21 <- lm(y~ condition* moderator, data = df)
summary(lm21)
library(ggeffects)
library(ggplot2)
library(magrittr)
pd <- position_dodge()
ggeffect(lm21, terms = c("condition", "moderator")) %>%
plot(show.title = FALSE) +
stat_summary(fun.y = mean, geom = "bar", position = pd, width = 0.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
position = pd, size = 8.5, alpha=13.2) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_color_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_x_continuous(name = "Condition",
breaks = 0:1,
labels = c("Low","High"))
The graph looks like this:
How can I change the color of error bars so that they are fully visible?
Thank you in advance!

I tried to convert the ggeffect value to a data.frame and ended like this, hope it's what you wanted.
The dodge widths are tuned by hand, sorry; I played with them until the error bars sat in the middle of the bars. Maybe someone knows a cleaner way to do it.
ggplot(as.data.frame(ggeffect(lm21, terms = c("condition", "moderator"))), aes(x = factor(x))) +
geom_col(aes(y = predicted, fill = factor(group)), position = position_dodge2(width = .5, preserve = "single", padding = 0)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high, group = factor(group)), position = position_dodge(width = .9), width = .15) +
geom_point(aes(y = predicted, group = factor(group)), position = position_dodge2(width = .9)) +
scale_fill_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_x_discrete(name = "Condition", labels = c("Low","High")) +
theme_light()
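The reason the error bars stay visible in this approach is that their colour is not mapped to the grouping variable. A minimal sketch of the same idea on a small made-up data frame (the numbers are invented for illustration and are not output from lm21; the column names mirror those used above) sets colour as a fixed parameter outside aes():

```r
library(ggplot2)

# Made-up predictions, for illustration only (not output from lm21)
pred <- data.frame(
  x         = rep(c("a", "b"), times = 3),
  group     = rep(c("low", "medium", "high"), each = 2),
  predicted = c(40, 45, 50, 52, 60, 58),
  conf.low  = c(30, 35, 40, 42, 50, 48),
  conf.high = c(50, 55, 60, 62, 70, 68)
)
pred$group <- factor(pred$group, levels = c("low", "medium", "high"))

pd <- position_dodge(width = 0.9)
p <- ggplot(pred, aes(x = x, y = predicted, fill = group)) +
  geom_col(position = pd) +
  # colour is set outside aes(), so every error bar is drawn in black
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                position = pd, width = 0.2, colour = "black")
print(p)
```

Using the same position_dodge() object for both layers keeps each error bar centered on its bar.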

Ok it's not the easiest way but that's what I'd done:
p = ggeffect(lm21, terms = c("condition", "moderator")) %>%
plot(show.title = FALSE) +
stat_summary(fun.y = mean, geom = "bar", position = pd, width = 0.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
position = pd, size = 8.5, alpha=13.2) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_color_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_x_continuous(name = "Condition",
breaks = 0:1,
labels = c("Low","High"))+
scale_colour_manual(values = rep('black',3))+
theme(legend.position = 'none')
The output is:
The only drawback is that the legend is missing, because scale_colour_manual changes it. But you can use the post "How to plot just the legends in ggplot2?" to extract the legend and then combine it with your plot.
I hope this is what you wanted

Here is another solution based on grobs manipulation.
p <- ggeffect(lm21, terms = c("condition", "moderator")) %>%
plot(show.title = FALSE) +
stat_summary(fun.y = mean, geom = "bar", position = pd, width = 0.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
position = pd, size = 8.5, alpha=13.2) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_color_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_x_continuous(name = "Condition",
breaks = 0:1,
labels = c("Low","High"))
# Change the order of ggplot layers (error bars are printed after mean bars)
p$layers <- p$layers[c(3,1,2,4)]
# Set colors of polyline grob (error bars)
q <- ggplotGrob(p)
q$grobs[[6]]$children[[5]]$gp$col <- rep("black",6)
grid::grid.draw(q)

Positioning labels and color coding in sunburst - R

This is the output I currently get. I have a data set that contains the unit, the weight of each unit, and the compliance score for each unit in 2016.
I was not able to add the table, but here is a screenshot of the data in the CSV.
I have named the columns unit, weight, and year (which holds the compliance score).
I want to create a sunburst chart in which the first ring shows the units sized by weight and the second ring shows the same segments labelled with the compliance score.
The colour for each ring will be different.
With help from an online blog I wrote some code whose output is close to what I want, but I am having difficulty positioning the labels and colour-coding each ring.
#using ggplot
library(ggplot2) # Visualisation
library(dplyr) # data wrangling
library(scales) # formatting
#read file
weight.eg = read.csv("Dummy Data.csv", header = FALSE, sep = ";", encoding = "UTF-8")
#change column names
colnames(weight.eg) <- c ("unit","weight","year")
#as weight column is factor change into integer
weight.eg$weight = as.numeric(levels(weight.eg$weight))[as.integer(weight.eg$weight)]
weight.eg$year = as.numeric(levels(weight.eg$year))[as.integer(weight.eg$year)]
#Nas are introduced, remove
weight.eg <- na.omit(weight.eg)
#Sum of the total weight
sum_total_weight = sum(weight.eg$weight)
#First layer
firstLevel = weight.eg %>% summarize(total_weight=sum(weight))
sunburst_0 = ggplot(firstLevel) # Just a foundation
#this will generate a bar chart
sunburst_1 =
sunburst_0 +
geom_bar(data=firstLevel, aes(x=1, y=total_weight),
fill='darkgrey', stat='identity') +
geom_text(aes(x=1, y=sum_total_weight/2, label=paste("Total Weight", comma(total_weight))), color='black')
#View
sunburst_1
#this argument is used to rotate the plot around the y-axis, which is the total weight
sunburst_1 + coord_polar(theta = "y")
sunburst_2=
sunburst_1 +
geom_bar(data=weight.eg,
aes(x=2, y=weight.eg$weight, fill=weight.eg$weight),
color='white', position='stack', stat='identity', size=0.6) +
geom_text(data=weight.eg, aes(label=paste(weight.eg$unit,
weight.eg$weight), x=2, y=weight.eg$weight), position='stack')
sunburst_2 + coord_polar(theta = "y")
sunburst_3 =
sunburst_2 +
geom_bar(data=weight.eg,
aes(x=3, y=weight.eg$weight,fill=weight.eg$weight),
color='white', position='stack', stat='identity',
size=0.6)+
geom_text(data = weight.eg,
aes(label=paste(weight.eg$year),x=3,y=weight.eg$weight),position =
'stack')
sunburst_3 + coord_polar(theta = "y")
sunburst_3 + scale_y_continuous(labels=comma) +
scale_fill_continuous(low='white', high='darkred') +
coord_polar('y') + theme_minimal()
Output for dput(weight.eg)
structure(list(unit = structure(2:7, .Label = c("", "A", "B",
"C", "D", "E", "F", "Unit"), class = "factor"), weight = c(30,
25, 10, 17, 5, 13), year = c(70, 80, 50, 30, 60, 40)), .Names =
c("unit",
"weight", "year"), row.names = 2:7, class = "data.frame", na.action
= structure(c(1L,
8L), .Names = c("1", "8"), class = "omit"))
output for dput(firstLevel)
structure(list(total_weight = 100), .Names = "total_weight", row.names
= c(NA,
-1L), na.action = structure(c(1L, 8L), .Names = c("1", "8"), class =
"omit"), class = "data.frame")

So I think I might have some sort of solution for you. I wasn't sure what you wanted to color-code on the outer ring; from your code it seems you wanted it to be the weight again, but it was not obvious to me. For different colour scales per ring, you could use the ggnewscale package:
library(ggnewscale)
For the centering of the labels you could write a function:
cs_fun <- function(x){(cumsum(x) + c(0, cumsum(head(x , -1))))/ 2}
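To see what cs_fun returns, here is a small worked example using the weights from the question. It is only an illustration of the arithmetic: each result is the midpoint between the lower and upper edge of a stacked segment, which is where a centered label belongs. The names upper, lower, and mid are introduced here for the example.

```r
cs_fun <- function(x) (cumsum(x) + c(0, cumsum(head(x, -1)))) / 2

weight <- c(30, 25, 10, 17, 5, 13)

upper <- cumsum(weight)   # upper edge of each segment: 30 55 65 82 87 100
lower <- upper - weight   # lower edge of each segment:  0 30 55 65 82 87
mid   <- cs_fun(weight)   # midpoints: 15 42.5 60 73.5 84.5 93.5

print(data.frame(weight, lower, upper, mid))
```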
Now the plotting code could look something like this:
ggplot(weight.eg) +
# Note: geom_col is equivalent to geom_bar(stat = "identity")
geom_col(data = firstLevel,
aes(x = 1, y = total_weight)) +
geom_text(data = firstLevel,
aes(x = 1, y = total_weight / 2,
label = paste("Total Weight:", total_weight)),
colour = "black") +
geom_col(aes(x = 2,
y = weight, fill = weight),
colour = "white", size = 0.6) +
scale_fill_gradient(name = "Weight",
low = "white", high = "darkred") +
# Open up new fill scale for next ring
new_scale_fill() +
geom_text(aes(x = 2, y = cs_fun(weight),
label = paste(unit, weight))) +
geom_col(aes(x = 3, y = weight, fill = weight),
size = 0.6, colour = "white") +
scale_fill_gradient(name = "Another Weight?",
low = "forestgreen", high = "white") +
geom_text(aes(label = paste0(year), x = 3,
y = cs_fun(weight))) +
coord_polar(theta = "y")
Which looks like this:

Plot confusion matrix in R using ggplot

I have two confusion matrices with calculated values as true positive (tp), false positives (fp), true negatives(tn) and false negatives (fn), corresponding to two different methods. I want to represent them as
I believe facet_grid or facet_wrap can do this, but I find it difficult to get started.
Here is the data of two confusion matrices corresponding to method1 and method2
dframe<-structure(list(label = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L,
3L), .Label = c("fn", "fp", "tn", "tp"), class = "factor"), value = c(9,
0, 3, 1716, 6, 3, 6, 1713), method = structure(c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("method1", "method2"), class = "factor")), .Names = c("label",
"value", "method"), row.names = c(NA, -8L), class = "data.frame")

This could be a good start
library(ggplot2)
ggplot(data = dframe, mapping = aes(x = label, y = method)) +
geom_tile(aes(fill = value), colour = "white") +
geom_text(aes(label = sprintf("%1.0f",value)), vjust = 1) +
scale_fill_gradient(low = "white", high = "steelblue")
Edited
TClass <- factor(c(0, 0, 1, 1))
PClass <- factor(c(0, 1, 0, 1))
Y <- c(2816, 248, 34, 235)
df <- data.frame(TClass, PClass, Y)
library(ggplot2)
ggplot(data = df, mapping = aes(x = TClass, y = PClass)) +
geom_tile(aes(fill = Y), colour = "white") +
geom_text(aes(label = sprintf("%1.0f", Y)), vjust = 1) +
scale_fill_gradient(low = "blue", high = "red") +
theme_bw() + theme(legend.position = "none")
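Returning to the two-method data in the question, one way to get a conventional 2 x 2 layout per method is to derive the actual and predicted class from each label and then facet by method. This is a sketch under the usual definitions (tp and fn are actual positives; tp and fp are predicted positives); the column names actual and predicted are introduced here and do not appear in the original data.

```r
library(ggplot2)

# Same values as the question's dframe, rebuilt so the example runs on its own
dframe <- data.frame(
  label  = rep(c("tp", "fp", "fn", "tn"), times = 2),
  value  = c(9, 0, 3, 1716, 6, 3, 6, 1713),
  method = rep(c("method1", "method2"), each = 4)
)

# Derive the two axes of the confusion matrix from the cell label
dframe$actual    <- ifelse(dframe$label %in% c("tp", "fn"), "positive", "negative")
dframe$predicted <- ifelse(dframe$label %in% c("tp", "fp"), "positive", "negative")

p <- ggplot(dframe, aes(x = actual, y = predicted)) +
  geom_tile(aes(fill = value), colour = "white") +
  geom_text(aes(label = value)) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  facet_wrap(~ method)   # one panel per method
print(p)
```

Because the true negatives dominate the counts, the fill scale is driven almost entirely by that one cell; a transformed scale may make the smaller cells easier to compare.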

This is a very old question, but there is a fairly straightforward ggplot2 solution that hasn't been mentioned yet.
I hope it is helpful to someone:
library(caret) # provides confusionMatrix()
library(ggplot2)
# y.pred and y.test are your own predicted and true labels
cm <- confusionMatrix(factor(y.pred), factor(y.test), dnn = c("Prediction", "Reference"))
plt <- as.data.frame(cm$table)
plt$Prediction <- factor(plt$Prediction, levels=rev(levels(plt$Prediction)))
# x is Reference and y is Prediction so that the axes match the titles and class labels set below
ggplot(plt, aes(Reference,Prediction, fill= Freq)) +
geom_tile() + geom_text(aes(label=Freq)) +
scale_fill_gradient(low="white", high="#009194") +
labs(x = "Reference",y = "Prediction") +
scale_x_discrete(labels=c("Class_1","Class_2","Class_3","Class_4")) +
scale_y_discrete(labels=c("Class_4","Class_3","Class_2","Class_1"))

A slightly more modular solution based on MYaseen208's answer. Might be more effective for large datasets / multinomial classification:
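# Naming the table dimensions fixes the column names of the resulting data frame
confusion_matrix <- as.data.frame(table(Predicted = predicted_class, Actual = actual_class))
ggplot(data = confusion_matrix,
mapping = aes(x = Predicted,
y = Actual)) +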
geom_tile(aes(fill = Freq)) +
geom_text(aes(label = sprintf("%1.0f", Freq)), vjust = 1) +
scale_fill_gradient(low = "blue",
high = "red",
trans = "log") # if your results aren't quite as clear as the above example

Here's another ggplot2 based option; first the data (from caret):
library(caret)
# data/code from "2 class example" example courtesy of ?caret::confusionMatrix
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
levels = rev(lvs))
pred <- factor(
c(
rep(lvs, times = c(54, 32)),
rep(lvs, times = c(27, 231))),
levels = rev(lvs))
confusionMatrix(pred, truth)
And to construct the plots (substitute your own matrix below as needed when setting up "table"):
library(ggplot2)
library(dplyr)
table <- data.frame(confusionMatrix(pred, truth)$table)
plotTable <- table %>%
mutate(goodbad = ifelse(table$Prediction == table$Reference, "good", "bad")) %>%
group_by(Reference) %>%
mutate(prop = Freq/sum(Freq))
# fill alpha relative to sensitivity/specificity by proportional outcomes within reference groups (see dplyr code above as well as original confusion matrix for comparison)
ggplot(data = plotTable, mapping = aes(x = Reference, y = Prediction, fill = goodbad, alpha = prop)) +
geom_tile() +
geom_text(aes(label = Freq), vjust = .5, fontface = "bold", alpha = 1) +
scale_fill_manual(values = c(good = "green", bad = "red")) +
theme_bw() +
xlim(rev(levels(table$Reference)))
# note: for simple alpha shading by frequency across the table at large, simply use "alpha = Freq" in place of "alpha = prop" when setting up the ggplot call above, e.g.,
ggplot(data = plotTable, mapping = aes(x = Reference, y = Prediction, fill = goodbad, alpha = Freq)) +
geom_tile() +
geom_text(aes(label = Freq), vjust = .5, fontface = "bold", alpha = 1) +
scale_fill_manual(values = c(good = "green", bad = "red")) +
theme_bw() +
xlim(rev(levels(table$Reference)))

Here is a reprex using the cvms package, which provides a ggplot2 wrapper function for plotting confusion matrices.
library(cvms)
library(broom)
library(tibble)
library(ggimage)
#> Loading required package: ggplot2
library(rsvg)
set.seed(1)
d_multi <- tibble("target" = floor(runif(100) * 3),
"prediction" = floor(runif(100) * 3))
conf_mat <- confusion_matrix(targets = d_multi$target,
predictions = d_multi$prediction)
# plot_confusion_matrix(conf_mat$`Confusion Matrix`[[1]], add_sums = TRUE)
plot_confusion_matrix(
conf_mat$`Confusion Matrix`[[1]],
add_sums = TRUE,
sums_settings = sum_tile_settings(
palette = "Oranges",
label = "Total",
tc_tile_border_color = "black"
)
)
Created on 2021-01-19 by the reprex package (v0.3.0)

Old question, but I wrote this function which I think makes a prettier answer. Results in a divergent color palette (or whatever you want, but default is divergent):
prettyConfused<-function(Actual,Predict,colors=c("white","red4","dodgerblue3"),text.scl=5){
actual = as.data.frame(table(Actual))
names(actual) = c("Actual","ActualFreq")
#build confusion matrix
confusion = as.data.frame(table(Actual, Predict))
names(confusion) = c("Actual","Predicted","Freq")
#calculate percentage of test cases based on actual frequency
confusion = merge(confusion, actual, by=c('Actual','Actual'))
confusion$Percent = confusion$Freq/confusion$ActualFreq*100
confusion$ColorScale<-confusion$Percent*-1
confusion[which(confusion$Actual==confusion$Predicted),]$ColorScale<-confusion[which(confusion$Actual==confusion$Predicted),]$ColorScale*-1
confusion$Label<-paste(round(confusion$Percent,0),"%, n=",confusion$Freq,sep="")
tile <- ggplot() +
geom_tile(aes(x=Actual, y=Predicted,fill=ColorScale),data=confusion, color="black",size=0.1) +
labs(x="Actual",y="Predicted")
tile = tile +
geom_text(aes(x=Actual,y=Predicted, label=Label),data=confusion, size=text.scl, colour="black") +
scale_fill_gradient2(low=colors[2],high=colors[3],mid=colors[1],midpoint = 0,guide='none')
}

Add axis text above horizontal geom_bars; justify text flush left

I would like to place axis text above the appropriate horizontal bars in a ggplot2 plot. Below is as far as I have gotten, with the plotting code afterwards. The data is at the bottom of this question.
My questions, aside from the ever-present "what code would accomplish the goal better", are (1) instead of my manually entering rectangle and text locations, how can R place them algorithmically, (2) how can R move the text in the rectangles to the far left (I tried with calculating a mid-point based on the number of characters in the text, but it doesn't work)?
For the plotting I created the sequence variable instead of struggling with as.numeric(as.character(risks)).
ggplot(plotpg19, aes(x = sequence, y = scores)) +
geom_bar(stat = "identity", width = 0.4) +
coord_flip() +
labs(x = "", y = "") +
theme_bw() +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
geom_rect(data=plotpg19, aes(xmin= seq(1.5, 8.5, 1),
xmax= seq(1.8, 8.8, 1), ymin=0, ymax=30), fill = "white") +
geom_text(data=plotpg19, aes(x=seq(1.6, 8.6, 1), y= nchar(as.character(risks))/2, label=risks, size = 5, show_guide = FALSE)) +
guides(size = FALSE)
Below is the data.
plotpg19 <- structure(list(risks = structure(c(8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L), .Label = c("Other", "Third parties/associates acting on our behalf",
"Rogue employees", "Lack of understanding by top executives",
"Lack of anti-bribery/corruption training or awareness within the business",
"Geographic locations in which the company operates", "Industries/sector(s) in which the company operates",
"Inadequate resources for anti-bribery/corruption compliance activities"
), class = "factor"), scores = c(15, 28, 71, 16, 5, 48, 55, 2
), sequence = 1:8), .Names = c("risks", "scores", "sequence"), class = "data.frame", row.names = c(NA,
-8L))
This question gave me some guidance: "fitting geom_text inside geom_rect".

I do not understand why you are plotting a white geom_rect. For the second question, setting y=0 in the aes of geom_text and adding hjust=0 (which starts the text precisely at y) works. I adjusted the x parameter so that the text is plotted halfway between the bars:
library(dplyr)
plotpg19 <- mutate(plotpg19, xtext = sequence + 0.55)
library(ggplot2)
ggplot(plotpg19, aes(x = sequence, y = scores)) +
geom_bar(stat = "identity", width = 0.4) +
coord_flip() +
labs(x = "", y = "") +
theme_bw() +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
geom_text(data = plotpg19,
aes(x = xtext, y = 0, label = risks),
# size and legend visibility are fixed settings, so they go outside aes()
size = 5, hjust = 0, vjust = 1, show.legend = FALSE)

Stacked Bar Graph Labels with ggplot2

I am trying to graph the following data:
to_graph <- structure(list(Teacher = c("BS", "BS", "FA"
), Level = structure(c(2L, 1L, 1L), .Label = c("BE", "AE", "ME",
"EE"), class = "factor"), Count = c(2L, 25L, 28L)), .Names = c("Teacher",
"Level", "Count"), row.names = c(NA, 3L), class = "data.frame")
and want to add labels in the middle of each piece of the bars that are the percentage for that piece. Based on this post, I came up with:
ggplot(data=to_graph, aes(x=Teacher, y=Count, fill=Level), ordered=TRUE) +
geom_bar(aes(fill = Level), position = 'fill') +
opts(axis.text.x=theme_text(angle=45)) +
scale_y_continuous("",formatter="percent") +
opts(title = "Score Distribution") +
scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF")) +
geom_text(aes(label = Count), size = 3, hjust = 0.5, vjust = 3, position = "stack")
But it:
1. Doesn't have any effect on the graph
2. Probably wouldn't display the percentage even if it did (although I'm not entirely sure of this point)
Any help is greatly appreciated. Thanks!

The y-coordinate of the text is the actual count (2, 25 or 28), whereas the y-coordinates in the plot panel range from 0 to 1, so the text is being printed off the top.
Calculate the fraction of counts using ddply from the plyr package (or tapply or whatever).
library(plyr)
graph_avgs <- ddply(
to_graph,
.(Teacher),
summarise,
Count.Fraction = Count / sum(Count)
)
to_graph <- cbind(to_graph, graph_avgs$Count.Fraction)
A simplified version of your plot. I haven't bothered to play about with factor orders so the numbers match up to the bars yet.
ggplot(to_graph, aes(Teacher), ordered = TRUE) +
# stat = "identity" is needed because the bar heights are supplied through y
geom_bar(aes(y = Count, fill = Level), stat = "identity", position = 'fill') +
scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF")) +
geom_text(
aes(y = graph_avgs$Count.Fraction, label = graph_avgs$Count.Fraction),
size = 3
)
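In more recent versions of ggplot2, a possible alternative is to let the position adjustment place the labels: position_fill(vjust = 0.5) rescales the text in the same way as the bars and centers each label within its segment. The sketch below is not part of the original answer; the Fraction column is a name introduced here, and the data frame is rebuilt from the question so the example runs on its own.

```r
library(ggplot2)

# Same values as the question's to_graph
to_graph <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Level   = factor(c("AE", "BE", "BE"), levels = c("BE", "AE", "ME", "EE")),
  Count   = c(2L, 25L, 28L)
)

# Share of each teacher's total, used only for the label text
to_graph$Fraction <- to_graph$Count / ave(to_graph$Count, to_graph$Teacher, FUN = sum)

p <- ggplot(to_graph, aes(x = Teacher, y = Count, fill = Level)) +
  geom_col(position = "fill") +
  # position_fill(vjust = 0.5) puts each label in the middle of its segment
  geom_text(aes(label = sprintf("%.0f%%", 100 * Fraction)),
            position = position_fill(vjust = 0.5), size = 3)
print(p)
```

Because both layers inherit fill = Level, the text is stacked in the same order as the bars, so no manual reordering of factor levels is needed to match numbers to segments.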