R geom_col does not show the 'bars'

I am having this strange error regarding displaying the actual bars in a geom_col() plot.
Suppose I have a data set (called user_data) that contains a count of the total number of changes ('adjustments') done for a particular user (and a plethora of other columns). Let's say it looks like this:
User_ID total_adjustments additional column_1 additional column_2 ...
1 'Blah_17' 21 random_data random_data
2 'Blah_1' 47 random_data random_data
3 'foobar' 2 random_data random_data
4 'acbd1' 17 random_data random_data
5 'user27' 9 random_data random_data
I am using the following code to reduce it into a dataframe with only the two columns I care about:
total_adj_count = user_data %>%
select(User_ID, total_adjustments) %>%
arrange(desc(total_adjustments)) %>%
mutate(User_ID = factor(User_ID, User_ID))
This results in my dataframe (total_adj_count) looking like so:
User_ID total_adjustments
1 'Blah_1' 47
2 'Blah_17' 21
3 'acbd1' 17
4 'user27' 9
5 'foobar' 2
Moving along, here is the code I used to attempt to create a geom_col() plot of that data:
g = ggplot(data=total_adj_count, aes(x = User_ID, y = total_adjustments)) +
geom_bar(width=.5, alpha=1, show.legend = FALSE, fill="#000066", stat="identity") +
labs(x="", y="Adjustment Count", caption="(based on sample data)") +
theme_few(base_size = 10) + scale_color_few() +
theme(axis.text.x=element_text(angle = 45, hjust = 1)) +
geom_text(aes(label=round(total_adjustments, digits = 2)), size=3, nudge_y = 2000) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
p = ggplotly(g)
p = p %>%
layout(margin = m,
showlegend = FALSE,
title = "Number of Adjustments per User"
)
p
And for some strange reason when I try to view plot p it displays all parts of the plot as intended, but does not show the actual bars (or columns).
In fact I get this strange plot and am sort of stuck where to fix it:

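Change the nudge_y argument to a smaller number. Right now it is set to 2000, which offsets the labels by 2000 units on the y-axis. The axis then has to stretch to roughly 2000 to include the labels, so bars that are at most 47 units tall are squashed to almost nothing. Below I've changed it to nudge_y = 2 and it looks like so: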
g <-
ggplot(total_adj_count, aes(User_ID, total_adjustments)) +
geom_col(width = .5, alpha = 1, show.legend = FALSE, fill = "#000066") +
labs(x = "", y = "Adjustment Count", caption = "(based on sample data)") +
theme_few(base_size = 10) +
scale_color_few() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_text(aes(label = round(total_adjustments, digits = 2)), size = 3, nudge_y = 2) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
Full copy/paste:
library(ggplot2)
library(ggthemes)
library(plotly)
library(dplyr)
text <- " User_ID total_adjustments
1 'Blah_1' 47
2 'Blah_17' 21
3 'acbd1' 17
4 'user27' 9
5 'foobar' 2"
total_adj_count <- read.table(text = text, header = TRUE, stringsAsFactors = FALSE)
g <-
ggplot(total_adj_count, aes(User_ID, total_adjustments)) +
geom_col(width = .5, alpha = 1, show.legend = FALSE, fill = "#000066") +
labs(x = NULL, y = "Adjustment Count", caption = "(based on sample data)", title = "Number of Adjustments per User") +
theme_few(base_size = 10) +
scale_color_few() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_text(aes(label = round(total_adjustments, digits = 2)), size = 3, nudge_y = 2) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
p <- ggplotly(g)
p <- layout(p, showlegend = FALSE)
p
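To see why nudge_y = 2000 hides the bars, one can inspect the label positions that ggplot2 actually computes. The sketch below rebuilds the sample data and compares the two settings with layer_data(); the object names base, g_bad and g_ok are made up for illustration, and the theme and plotly steps are left out.

```r
library(ggplot2)

# Sample data from the question
total_adj_count <- data.frame(
  User_ID = c("Blah_1", "Blah_17", "acbd1", "user27", "foobar"),
  total_adjustments = c(47, 21, 17, 9, 2)
)

base <- ggplot(total_adj_count, aes(User_ID, total_adjustments)) +
  geom_col(fill = "#000066")

# Same plot with the original and the corrected nudge
g_bad <- base + geom_text(aes(label = total_adjustments), nudge_y = 2000)
g_ok  <- base + geom_text(aes(label = total_adjustments), nudge_y = 2)

# Layer 2 is geom_text; y holds the label positions after nudging
max_label_bad <- max(layer_data(g_bad, 2)$y)
max_label_ok  <- max(layer_data(g_ok, 2)$y)

# Tallest bar as a fraction of the height the y-axis has to cover
bar_share_bad <- max(total_adj_count$total_adjustments) / max_label_bad
bar_share_ok  <- max(total_adj_count$total_adjustments) / max_label_ok
```

With the original setting the tallest bar covers only a small fraction of the axis range, which is why the columns appear to be missing.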

Related

How to customize Horizontal dots plot?

I want to plot customized Horizontal dots using my data and the code given here
data:
df <- data.frame (origin = c("A","B","C","D","E","F","G","H","I","J"),
Percentage = c(23,16,32,71,3,60,15,21,44,60),
rate = c(10,12,20,200,-25,12,13,90,-105,23),
change = c(10,12,-5,12,6,8,0.5,-2,5,-2))
origin Percentage rate change
1 A 23 10 10.0
2 B 16 12 12.0
3 C 32 20 -5.0
4 D 71 200 12.0
5 E 3 -25 6.0
6 F 60 12 8.0
7 G 15 13 0.5
8 H 21 90 -2.0
9 I 44 -105 5.0
10 J 60 23 -2.0
The observations from the 'origin' column need to go on the y-axis. The corresponding values in the 'change' and 'rate' columns should be shown in boxes instead of circles and distinguished by color, for example values from 'change' in light blue and values from 'rate' in blue. In addition, I want to add a second vertical axis on the right and put circles on it whose size is defined by the corresponding value in the 'Percentage' column.
Output of code from the link:
Expected outcome (something like this):
Try this.
First, reshaping so that both rate and change are in one column better supports ggplot's general preference towards "long" data.
df2 <- reshape2::melt(df, id.vars = c("origin", "Percentage"))
(That can also be done using tidyr::pivot_longer.)
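For reference, a tidyr version of the same wide-to-long reshape might look like the sketch below. It assumes the tidyr package is installed and reuses the column names variable and value so that the plotting code can stay the same.

```r
library(tidyr)

df <- data.frame(
  origin = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  Percentage = c(23, 16, 32, 71, 3, 60, 15, 21, 44, 60),
  rate = c(10, 12, 20, 200, -25, 12, 13, 90, -105, 23),
  change = c(10, 12, -5, 12, 6, 8, 0.5, -2, 5, -2)
)

# Stack 'rate' and 'change' into one 'value' column;
# 'variable' records which column each value came from
df2 <- pivot_longer(df, cols = c(rate, change),
                    names_to = "variable", values_to = "value")
```

Unlike reshape2::melt, this returns variable as a character column and interleaves the rows per origin, which should not matter for the plot because the manual scales are matched by name.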
The plot:
ggplot(df2, aes(value, origin)) +
geom_label(aes(label = value, fill = variable, color = variable)) +
geom_point(aes(size = Percentage), x = max(df2$value) +
20, shape = 21) +
scale_x_continuous(expand = expansion(add = c(15, 25))) +
scale_fill_manual(values = c(change="lightblue", rate="blue")) +
scale_color_manual(values = c(change="black", rate="white")) +
theme_bw() +
theme(panel.border = element_blank(), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank()) +
labs(x = NULL, y = NULL)
The legend and labels can be adjusted in the usual ggplot methods. Overlapping of labels is an issue with which you will need to contend.
Update on OP request: See comments:
gg_dot +
geom_text(aes(x = rate, y = origin,
label = paste0(round(rate, 1), "%")),
col = "black") +
geom_text(aes(x = change, y = origin,
label = paste0(round(change, 1), "%")),
col = "white") +
geom_text(aes(x = x, y = y, label = label, col = label),
data.frame(x = c(40 - 1.1, 180 + 0.6), y = 11,
label = c("change", "rate")), size = 6) +
scale_color_manual(values = c("#9DBEBB", "#468189"), guide = "none") +
scale_y_discrete(expand = c(0.2, 0))
First answer:
Something like this?
library(tidyverse)
library(dslabs)
gg_dot <- df %>%
arrange(rate) %>%
mutate(origin = fct_inorder(origin)) %>%
ggplot() +
# remove axes and superfluous grids
theme_classic() +
theme(axis.title = element_blank(),
axis.ticks.y = element_blank(),
axis.line = element_blank()) +
# add a dummy point for scaling purposes
geom_point(aes(x = 12, y = origin),
size = 0, col = "white") +
# add a horizontal guide line for each origin
geom_hline(yintercept = 1:10, col = "grey80") +
# add a point for each 'rate' value
geom_point(aes(x = rate, y = origin),
size = 11, col = "#9DBEBB") +
# add a point for each 'change' value
geom_point(aes(x = change, y = origin),
size = 11, col = "#468189")
gg_dot +
geom_text(aes(x = rate, y = origin,
label = paste0(round(rate, 1))),
col = "black") +
geom_text(aes(x = change, y = origin,
label = paste0(round(change, 1))),
col = "white") +
geom_text(aes(x = x, y = y, label = label, col = label),
data.frame(x = c(40 - 1.1, 180 + 0.6), y = 11,
label = c("change", "rate")), size = 6) +
scale_color_manual(values = c("#9DBEBB", "#468189"), guide = "none") +
scale_y_discrete(expand = c(0.2, 0))

Table below x axis in ggplot

Hello everyone, I was trying to add some text below the x axis in ggplot2. I was able to do so using geom_text with the help of coord_cartesian, but I couldn't make it reproducible, as this needs to run in a loop. I thought that adding the values I want, with the row names (First, Second), in a table would fix it. Does anybody have experience with that? Below is the workaround I did. Thank you very much in advance.
## Data
Grade <- 1 : 20
Case <- rep(paste('case' , 1:5,sep = ''),4)
Number <- paste('n', 1:20 , sep = '')
Class <- c(rep('Class1',5) , rep('Class2',5) , rep('Class3',5) , rep('Class4',5))
se <- 0.2
df <- data.frame(Grade,Case ,Number, Class , se)
## plot
ggplot(df, aes(x= factor(Case , levels = c('case1','case2' , 'case3' , 'case4','case5')) , y=Grade ,
fill= Grade)) +
geom_bar(position="dodge", stat="identity",
colour="black",
size=.4) +
geom_errorbar(aes(ymin=Grade +se, ymax=Grade +se),
size=.3,
width=.2,
position=position_dodge(.9))+
geom_linerange(aes(ymin = Grade , ymax = Grade +se),position=position_dodge(.9))+
geom_text(aes(label=Number , y = Grade + se + 1),data=df, position=position_dodge(0.9), size= 4) +
ggtitle('Place a table below x axis')+
facet_grid(~Class) +
xlab('') +
ylab('Case Num') +
theme_gray()+
theme(plot.margin = unit(c(1,1,1,6), "lines"),
axis.text.x = element_text(size = 15)) +
scale_x_discrete(labels = paste(1:5 , '\n' , 10:15, sep = '')) +
geom_text(data = df[df$Class == 'Class1',],x = -1 , y = -3,
label= 'First\nSecond' , size = 4)+
coord_cartesian(clip = "off" , xlim = c(1, 5) )
EDIT:
Sorry for the confusion. Although the solution suggested by @stefan is quite convenient, the main purpose is to have something like this:
considering that the proposed table will contain external characters, not taken from the data frame at all (if possible!).
As an alternative approach to tackle this problem I simply set up the table as a second ggplot which I glue together with the major ggplot using patchwork.
## Data
Grade <- 1 : 20
Case <- rep(paste('case' , 1:5,sep = ''),4)
Number <- paste('n', 1:20 , sep = '')
Class <- c(rep('Class1',5) , rep('Class2',5) , rep('Class3',5) , rep('Class4',5))
se <- 0.2
df <- data.frame(Grade,Case ,Number, Class , se)
library(patchwork)
library(ggplot2)
library(tidyr)
library(dplyr)
## plot
p1 <- ggplot(df, aes(x= factor(Case , levels = c('case1','case2' , 'case3' , 'case4','case5')) , y=Grade ,
fill= Grade)) +
geom_bar(position="dodge", stat="identity",
colour="black",
size=.4) +
geom_errorbar(aes(ymin=Grade +se, ymax=Grade +se),
size=.3,
width=.2,
position=position_dodge(.9))+
geom_linerange(aes(ymin = Grade , ymax = Grade +se),position=position_dodge(.9))+
geom_text(aes(label=Number , y = Grade + se + 1),data=df, position=position_dodge(0.9), size= 4) +
ggtitle('Place a table below x axis')+
facet_grid(~Class) +
xlab(NULL) +
ylab('Case Num') +
theme_gray()+
theme(axis.text.x = element_blank())
p2 <- df %>%
mutate(First = as.integer(stringr::str_extract(Case, "\\d")),
Second = First + 9,
Third = Second + 9) %>%
pivot_longer(c(First, Second, Third), names_to = "layer", values_to = "label") %>%
ggplot(aes(x = Case)) +
geom_text(aes(y = factor(layer, c("Third", "Second", "First")), label = label)) +
labs(y = "", x = NULL) +
theme_minimal() +
theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
panel.grid = element_blank(), strip.text = element_blank()) +
facet_grid(~Class)
p1 / p2 + plot_layout(heights = c(8, 1))
Created on 2020-05-23 by the reprex package (v0.3.0)
EDIT: Tweak to get a more table like output by adding a geom_tile and removing the spacing between facets as well as setting expansion of x-axis to zero:
p2 <- df %>%
select(Case, Class) %>%
mutate(First = letters[1:nrow(.)],
Second = LETTERS[1:nrow(.)],
Third = as.character(1:nrow(.))) %>%
pivot_longer(c(First, Second, Third), names_to = "layer", values_to = "label") %>%
ggplot(aes(x = Case, y = factor(layer, c("Third", "Second", "First")))) +
# Add Table Style
geom_tile(fill = "blue", alpha = .4, color = "black") +
geom_text(aes(label = label)) +
# Remove expansion of axis
scale_x_discrete(expand = expansion(mult = c(0, 0))) +
labs(y = "", x = NULL) +
theme_minimal() +
theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
panel.grid = element_blank(), strip.text = element_blank(), panel.spacing.x = unit(0, "mm")) +
facet_grid(~Class)
p1 / p2 + plot_layout(heights = c(8, 1))
Created on 2020-05-24 by the reprex package (v0.3.0)
If I understand your requirement correctly (as in my comment above), this may help you. You just need to name your graph, build the labels in a loop, and render the plot outside the loop.
...
theme(plot.margin = unit(c(1,1,1,6), "lines"),
axis.text.x = element_text(size = 15)) +
scale_x_discrete(labels = paste(1:5 , '\n' , 10:15, sep = '')) +
coord_cartesian(clip = "off" , xlim = c(1, 5) )
label = NULL
ordinal <- c('first','second','third','fourth','fifth','sixth','seventh','eighth','ninth','tenth')
for (i in 1:5) {
label <- paste(label, '\n', ordinal[i])
}
g1 <- g1 + geom_text(data = df[df$Class == 'Class1',],x = -1 , y = -3,
label= label , size = 4)
g1
This is what I get as a result:
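If the loop only concatenates the first few ordinals, a vectorized paste() with collapse builds a similar multi-line label in one step. This is a sketch of an alternative, not the code from the answer. Unlike the loop, it adds no leading line break and no padding spaces around each entry.

```r
ordinal <- c('first', 'second', 'third', 'fourth', 'fifth',
             'sixth', 'seventh', 'eighth', 'ninth', 'tenth')

# Join the first five entries with line breaks, one entry per line
label <- paste(ordinal[1:5], collapse = '\n')

# cat() prints the label with the line breaks rendered
cat(label)
```

The resulting string can be passed to the label argument of geom_text() in the same way as the loop-built one.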

`geom_text()` labels are very light/faint - need them normal/dark

# Create the data frame
library(tidyverse)
dat <- read.table(text = "A B C
1 23 234 324
2 34 534 120
3 56 324 124
4 34 234 124
5 123 534 654",
sep = "",
header = TRUE) %>%
gather(key = "variable", value = "value") %>%
group_by(variable) %>%
mutate(ind = as.factor(rep(1:5)),
perc = value / sum(value)) %>%
arrange(variable, -perc) %>%
mutate(ordering = row_number()) %>%
mutate(lab.y = cumsum(perc),
lab.y.mid = lab.y - (perc / 2))
# Toggle whether red is on top/bottom with '1L' or '-1L'
red <- 1L
n_ord <- length(unique(dat$ordering))
fill_scale <- c("darkred", rep("black", n_ord - 1L)) %>%
setNames(red * seq(n_ord))
alpha_scale <- c(0.5, rep(0.3, n_ord - 1L)) %>%
setNames(red * seq(n_ord))
# Plot the data
ggplot(dat, aes(variable,
perc,
fill = factor(red * ordering),
alpha = factor(red * ordering))) +
geom_col(color = "white", size = 1.5) +
scale_fill_manual(guide = "none", values = fill_scale) +
scale_alpha_manual(guide = "none", values = alpha_scale) +
facet_grid(~ variable, scales = "free_x") +
theme_minimal() +
theme(panel.grid.major.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "none") +
scale_y_continuous(labels = scales::percent_format()) +
geom_text(aes(y = 1 - lab.y.mid, label = ind), color = "black")
I've been under the assumption that ggplot plots things in sequential order, line-item by line-item. The last line of my ggplot above is:
geom_text(aes(y = 1 - lab.y.mid, label = ind), color = "black")
But it doesn't appear this command is the last thing that ggplot "did". If you look at the plot above you'll see that my text labels are very faint. The text is either behind some of the sections of the plot, or it has inherited some type of alpha level, or something else is going on I haven't thought of.
How do I get the text to be dark (like it is normally)? Like this plot below.
geom_text inherits the alpha aesthetic from ggplot() which is the reason the text doesn't appear in "black".
Change your last line to
... +
geom_text(aes(x = variable, y = 1 - lab.y.mid, label = ind), inherit.aes = FALSE)
To get this result
Another option is to overwrite alpha
... +
geom_text(aes(y = 1 - lab.y.mid, label = ind), alpha = 1)
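The inheritance can be checked on a small example by looking at the alpha values each text layer ends up with. The data frame toy and its columns are invented for this sketch; only the mechanism (alpha mapped in ggplot() and reused by geom_text) mirrors the question.

```r
library(ggplot2)

# Hypothetical toy data: two stacked parts per group
toy <- data.frame(
  grp  = c("a", "a", "b", "b"),
  part = factor(c(1, 2, 1, 2)),
  val  = c(0.6, 0.4, 0.7, 0.3)
)

# alpha is mapped globally, so every layer inherits it by default
base <- ggplot(toy, aes(grp, val, alpha = part)) +
  geom_col() +
  scale_alpha_manual(values = c("1" = 0.5, "2" = 0.3), guide = "none")

# Text layer that inherits the alpha mapping
inherited <- base +
  geom_text(aes(label = part), position = position_stack(vjust = 0.5))

# Text layer with alpha set to a constant, overriding the mapping
fixed <- base +
  geom_text(aes(label = part), alpha = 1,
            position = position_stack(vjust = 0.5))

# Layer 2 is the text layer in both plots
alpha_inherited <- sort(unique(layer_data(inherited, 2)$alpha))
alpha_fixed     <- unique(layer_data(fixed, 2)$alpha)
```

In the first plot the labels carry the same transparency values as the bars, while in the second every label is fully opaque.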

adding annotation_custom with rasterGrob after function call

Apologies for the title, I know it sucks.
I am trying to create a waterfall chart function. So, I am trying to create a basic plot, which people can configure however they wish. I ran into a problem, though, adding a gradient to the plot. For example:
I have this df:
> wfDF
category value sign id end start labels
1 Basic Materials 0.0024 pos 1 0.0024 0.0000 0.0024
2 Communications 0.0492 pos 2 0.0516 0.0024 0.0516
3 Consumer, Cyclical 0.0268 pos 3 0.0784 0.0516 0.0784
4 Consumer, Non-cyclical 0.0245 pos 4 0.1029 0.0784 0.1029
5 Diversified -0.0037 neg 5 0.0992 0.1029 0.1029
6 Energy -0.0040 neg 6 0.0952 0.0992 0.0992
7 Financial 0.0445 pos 7 0.1397 0.0952 0.1397
8 Industrial 0.0006 pos 8 0.1403 0.1397 0.1403
9 Technology -0.0059 neg 9 0.1344 0.1403 0.1403
10 Total 0.1345 pos 10 0.0000 0.1344 0.1344
With this code:
ggplot(wfDF, aes(category, fill = sign, color = sign)) + guides(fill = FALSE, color=FALSE) +
ggtitle("Risk by Industry") +
annotation_custom(g, xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
theme(plot.title = element_text(vjust=1.5, face="bold", size = 20),
axis.title.x = element_blank(), axis.title.y = element_blank()) +
geom_rect(aes(x = category, xmin = id - 0.475, xmax = id + 0.475, ymin = end, ymax = start)) +
scale_fill_manual(values=c("red", "forestgreen")) +
scale_color_manual(values=c("black", "black")) +
scale_y_continuous(labels = percent) +
scale_x_discrete("", breaks = levels(wfDF$category), labels = gsub(" ", "\n", levels(wfDF$category))) +
geom_text(data = wfDF, aes(id, labels, label = paste0(value*100, "%")), vjust = -.5, size = 5, fontface = 4)
Which produces this graph:
Which looks great. I am trying to write a function which will do all this with any set of categories and values, and allows for any colors or customization to be added or used. I have this function:
waterfall <- function(categories, values, has.total = FALSE, offset = .475, labelType = c("decimal", "percent")) {
library(scales)
library(grid)
library(ggplot2)
library(dplyr)
theData <- data.frame("category" = as.character(categories), "value" = as.numeric(values))
labelType <- match.arg(labelType) # resolve the default vector to a single choice
if (labelType == "percent") theData$value = theData$value/100
if (!has.total) theData <- theData %>% rbind(.,list("Total", sum(.$val)))
theData$sign <- ifelse(theData$val >= 0, "pos","neg")
theData <- data.frame(category = factor(theData$category, levels = unique(theData$category)),
value = round(theData$value,4),
sign = factor(theData$sign, levels = unique(theData$sign)))
theData$id <- seq_along(theData$value)
theData$end <- cumsum(theData$value)
theData$end <- c(head(theData$end, -1), 0)
theData$start <- c(0, head(theData$end, -1))
theData$labels <- paste0(theData$value*100, "%")
theData$labellocs <- pmax(theData$end,theData$start)
theGG <- ggplot(theData, aes(category, fill = sign, color = sign)) +
geom_rect(aes(x = category, xmin = id - offset, xmax = id + offset, ymin = end, ymax = start)) +
scale_x_discrete("", breaks = levels(theData$category), labels = gsub(" ", "\n", levels(theData$category))) +
geom_text(data = theData, aes(id, labellocs, label = labels), vjust = -.5, size = 5, fontface = 4)
return(theGG)
}
waterfall(categories = riskDecomp$ID, values = riskDecomp$val, labelType = "percent")
Which produces a pretty ugly basic thing:
However, if I try to run something like the following:
test <- waterfall(categories = riskDecomp$ID, values = riskDecomp$val, labelType = "percent")
g <- rasterGrob(blues9, width=unit(1,"npc"), height = unit(1,"npc"), interpolate = TRUE)
test + guides(fill = FALSE, color=FALSE) +
ggtitle("Risk Decomposition") +
annotation_custom(g, xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
theme(plot.title = element_text(vjust=1.5, face="bold", size = 20),
axis.title.x = element_blank(), axis.title.y = element_blank()) +
scale_fill_manual(values=c("red", "forestgreen")) +
scale_color_manual(values=c("black", "black")) +
scale_y_continuous(labels = percent)
I get this nonsense:
The rasterGrob thing seems to overlay the entire rest of the plot. The only workaround I can find is to add the gradient to the inside of the function. Which kind of removes the... customization of the function. Is there a way to fix this? To fix the order of the grobs? If that makes sense?
You can change the order of the layers manually, moving the annotation layer (added last) to the bottom of the stack:
library(grid)
library(ggplot2)
g <- rasterGrob(matrix(blues9, ,1), interpolate=TRUE,
width=unit(1,"npc"), height=unit(1,"npc"))
p <- qplot(rnorm(10), rnorm(10)) +
annotation_custom(g)
nl <- length(p$layers)
p$layers <- c(p$layers[[nl]], p$layers[-nl])
p
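For a plot returned by a function such as waterfall(), the same reordering could be wrapped in a small helper. The function name send_last_layer_to_back is made up for this sketch, and it assumes, like the code above, that the plot's layers list can be modified directly.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: move the most recently added layer behind all others
send_last_layer_to_back <- function(p) {
  n <- length(p$layers)
  if (n > 1) {
    p$layers <- c(p$layers[n], p$layers[-n])
  }
  p
}

# Gradient background as a single-column raster
g <- rasterGrob(matrix(blues9, ncol = 1), interpolate = TRUE,
                width = unit(1, "npc"), height = unit(1, "npc"))

# Stand-in for a plot built elsewhere, e.g. inside a function
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Add the gradient afterwards, then push it behind the points
p2 <- send_last_layer_to_back(p + annotation_custom(g))
```

This keeps the plotting function free of styling decisions: the caller adds the background and then reorders the layers.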

Combined frequency histogram using two attributes

I'm using ggplot2 to create histograms for two different parameters. My current approach is attached at the end of my question (including a dataset, which can be loaded right from pastebin.com), which creates
a histogram visualizing the frequency for the spatial distribution of logged user data based on the "location"-attribute (either "WITHIN" or "NOT_WITHIN").
a histogram visualizing the frequency for the distribution of logged user data based on the "context"-attribute (either "Clicked A" or "Clicked B").
This looks like the following:
# Load my example dataset from pastebin
RawDataSet <- read.csv("http://pastebin.com/raw/uKybDy03", sep=";")
# Load packages
library(plyr)
library(dplyr)
library(reshape2)
library(ggplot2)
###### Create Frequency Table for Location-Information
LocationFrequency <- ddply(RawDataSet, .(UserEmail), summarize,
All = length(UserEmail),
Within_area = sum(location=="WITHIN"),
Not_within_area = sum(location=="NOT_WITHIN"))
# Create a column for unique identifiers
LocationFrequency <- mutate(LocationFrequency, id = rownames(LocationFrequency))
# Reorder columns
LocationFrequency <- LocationFrequency[,c(5,1:4)]
# Format id-column as numbers (not as string)
LocationFrequency[,c(1)] <- sapply(LocationFrequency[, c(1)], as.numeric)
# Melt data
LocationFrequency.m = melt(LocationFrequency, id.var=c("UserEmail","All","id"))
# Plot data
p <- ggplot(LocationFrequency.m, aes(x=id, y=value, fill=variable)) +
geom_bar(stat="identity") +
theme_grey(base_size = 16)+
labs(title="Histogram showing the distribution of all spatial information per user.") +
labs(x="User", y="Number of notifications interaction within/not within the area") +
# using IDs instead of UserEmail
scale_x_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30), labels=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30"))
# Change legend Title
p + labs(fill = "Type of location")
##### Create Frequency Table for Interaction-Information
InterationFrequency <- ddply(RawDataSet, .(UserEmail), summarize,
All = length(UserEmail),
Clicked_A = sum(context=="Clicked A"),
Clicked_B = sum(context=="Clicked B"))
# Create a column for unique identifiers
InterationFrequency <- mutate(InterationFrequency, id = rownames(InterationFrequency))
# Reorder columns
InterationFrequency <- InterationFrequency[,c(5,1:4)]
# Format id-column as numbers (not as string)
InterationFrequency[,c(1)] <- sapply(InterationFrequency[, c(1)], as.numeric)
# Melt data
InterationFrequency.m = melt(InterationFrequency, id.var=c("UserEmail","All","id"))
# Plot data
p <- ggplot(InterationFrequency.m, aes(x=id, y=value, fill=variable)) +
geom_bar(stat="identity") +
theme_grey(base_size = 16)+
labs(title="Histogram showing the distribution of all interaction types per user.") +
labs(x="User", y="Number of interaction") +
# using IDs instead of UserEmail
scale_x_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30), labels=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30"))
# Change legend Title
p + labs(fill = "Type of interaction")
But what I'm trying to achieve: how can I combine both histograms in only one plot? Would it be somehow possible to place the corresponding percentage on each part? Something like the following sketch, which represents the total number of observations per user (the complete height of the bar) and uses the different segments to visualize the corresponding data. Each bar would be divided into two parts (within and not_within), and each part would then be divided into two subparts showing the percentage of the interaction types ('Clicked A' or 'Clicked B').
With the updated description, I would make a combined barplot with two parts: a negative and a positive one. In order to achieve that, you have to get your data into the correct format:
# load needed libraries
library(dplyr)
library(tidyr)
library(ggplot2)
# summarise your data
new.df <- RawDataSet %>%
group_by(UserEmail,location,context) %>%
tally() %>%
mutate(n2 = n * c(1,-1)[(location=="NOT_WITHIN")+1L]) %>%
group_by(UserEmail,location) %>%
mutate(p = c(1,-1)[(location=="NOT_WITHIN")+1L] * n/sum(n))
The new.df dataframe looks like:
> new.df
Source: local data frame [90 x 6]
Groups: UserEmail, location [54]
UserEmail location context n n2 p
(fctr) (fctr) (fctr) (int) (dbl) (dbl)
1 andre NOT_WITHIN Clicked A 3 -3 -1.0000000
2 bibi NOT_WITHIN Clicked A 4 -4 -0.5000000
3 bibi NOT_WITHIN Clicked B 4 -4 -0.5000000
4 bibi WITHIN Clicked A 9 9 0.6000000
5 bibi WITHIN Clicked B 6 6 0.4000000
6 corinn NOT_WITHIN Clicked A 10 -10 -0.5882353
7 corinn NOT_WITHIN Clicked B 7 -7 -0.4117647
8 corinn WITHIN Clicked A 9 9 0.7500000
9 corinn WITHIN Clicked B 3 3 0.2500000
10 dpfeifer NOT_WITHIN Clicked A 7 -7 -1.0000000
.. ... ... ... ... ... ...
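The expression c(1,-1)[(location=="NOT_WITHIN")+1L] used above is an indexing trick: the comparison yields FALSE/TRUE, adding 1L turns that into index 1 or 2, and the index picks the sign. A minimal illustration with made-up values:

```r
# Made-up example values
location <- c("WITHIN", "NOT_WITHIN", "WITHIN")
n <- c(9, 4, 6)

# FALSE + 1L is 1 (picks  1), TRUE + 1L is 2 (picks -1)
sgn <- c(1, -1)[(location == "NOT_WITHIN") + 1L]
n2 <- n * sgn

# An equivalent, more explicit spelling
n2_alt <- ifelse(location == "NOT_WITHIN", -n, n)
```

Both spellings flip the counts for NOT_WITHIN to negative so that those bars are drawn below zero.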
Next you can create a plot with:
ggplot() +
geom_bar(data = new.df[new.df$location == "NOT_WITHIN",],
aes(x = UserEmail, y = n2, color = "darkgreen", fill = context),
size = 1, stat = "identity", width = 0.7) +
geom_bar(data = new.df[new.df$location == "WITHIN",],
aes(x = UserEmail, y = n2, color = "darkred", fill = context),
size = 1, stat = "identity", width = 0.7) +
scale_y_continuous(breaks = seq(-20,20,5),
labels = c(20,15,10,5,0,5,10,15,20)) +
scale_color_manual("Location of interaction",
values = c("darkgreen","darkred"),
labels = c("NOT_WITHIN","WITHIN")) +
scale_fill_manual("Type of interaction",
values = c("lightyellow","lightblue"),
labels = c("Clicked A","Clicked B")) +
guides(color = guide_legend(override.aes = list(color = c("darkred","darkgreen"),
fill = NA, size = 2), reverse = TRUE),
fill = guide_legend(override.aes = list(fill = c("lightyellow","lightblue"),
color = "black", size = 0.5))) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 14),
axis.title = element_blank(),
legend.title = element_text(face = "italic", size = 14),
legend.key.size = unit(1, "lines"),
legend.text = element_text(size = 11))
which results in:
If you want to use percentage values, you can use the p-column to make a plot:
ggplot() +
geom_bar(data = new.df[new.df$location == "NOT_WITHIN",],
aes(x = UserEmail, y = p, color = "darkgreen", fill = context),
size = 1, stat = "identity", width = 0.7) +
geom_bar(data = new.df[new.df$location == "WITHIN",],
aes(x = UserEmail, y = p, color = "darkred", fill = context),
size = 1, stat = "identity", width = 0.7) +
scale_y_continuous(breaks = c(-1,-0.75,-0.5,-0.25,0,0.25,0.5,0.75,1),
labels = scales::percent(c(1,0.75,0.5,0.25,0,0.25,0.5,0.75,1))) +
scale_color_manual("Location of interaction",
values = c("darkgreen","darkred"),
labels = c("NOT_WITHIN","WITHIN")) +
scale_fill_manual("Type of interaction",
values = c("lightyellow","lightblue"),
labels = c("Clicked A","Clicked B")) +
coord_flip() +
guides(color = guide_legend(override.aes = list(color = c("darkred","darkgreen"),
fill = NA, size = 2), reverse = TRUE),
fill = guide_legend(override.aes = list(fill = c("lightyellow","lightblue"),
color = "black", size = 0.5))) +
theme_minimal(base_size = 14) +
theme(axis.title = element_blank(),
legend.title = element_text(face = "italic", size = 14),
legend.key.size = unit(1, "lines"),
legend.text = element_text(size = 11))
which results in:
In response to the comment
If you want to place the text-labels inside the bars, you will have to calculate a position variable too:
new.df <- RawDataSet %>%
group_by(UserEmail,location,context) %>%
tally() %>%
mutate(n2 = n * c(1,-1)[(location=="NOT_WITHIN")+1L]) %>%
group_by(UserEmail,location) %>%
mutate(p = c(1,-1)[(location=="NOT_WITHIN")+1L] * n/sum(n),
pos = (context=="Clicked A")*p/2 + (context=="Clicked B")*(c(1,-1)[(location=="NOT_WITHIN")+1L] * (1 - abs(p)/2)))
Then add the following line to your ggplot code after the geom_bar's:
geom_text(data = new.df, aes(x = UserEmail, y = pos, label = n))
which results in:
Instead of label = n you can also use label = scales::percent(abs(p)) to display the percentages.
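To check what the pos formula computes, here is a small worked example with invented shares. It assumes, as the formula does, that the shares within one user and location sum to 1 and that the "Clicked A" segment is the one stacked next to zero.

```r
# Invented shares for one user, in the same layout as the 'p' column
ex <- data.frame(
  location = c("WITHIN", "WITHIN", "NOT_WITHIN", "NOT_WITHIN"),
  context  = c("Clicked A", "Clicked B", "Clicked A", "Clicked B"),
  p        = c(0.6, 0.4, -0.5, -0.5)
)

# +1 for WITHIN, -1 for NOT_WITHIN
sgn <- c(1, -1)[(ex$location == "NOT_WITHIN") + 1L]

# "Clicked A": middle of the segment that starts at zero
# "Clicked B": middle of the segment that ends at +1 or -1
ex$pos <- (ex$context == "Clicked A") * ex$p / 2 +
  (ex$context == "Clicked B") * (sgn * (1 - abs(ex$p) / 2))
```

For the WITHIN rows this gives 0.3 and 0.8, the midpoints of the segments from 0 to 0.6 and from 0.6 to 1. These positions match the percentage plot (y = p) rather than the count plot.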
