Combined frequency histogram using two attributes - r

I'm using ggplot2 to create histograms for two different parameters. My current approach is attached at the end of my question (including a dataset, which can be loaded directly from pastebin.com). It creates:
a histogram visualizing the frequency of the spatial distribution of logged user data, based on the "location" attribute (either "WITHIN" or "NOT_WITHIN").
a histogram visualizing the frequency distribution of logged user data, based on the "context" attribute (either "Clicked A" or "Clicked B").
The code looks like the following:
# Load my example dataset from pastebin
RawDataSet <- read.csv("http://pastebin.com/raw/uKybDy03", sep=";")
# Load packages
library(plyr)
library(dplyr)
library(reshape2)
library(ggplot2)
###### Create Frequency Table for Location-Information
LocationFrequency <- ddply(RawDataSet, .(UserEmail), summarize,
All = length(UserEmail),
Within_area = sum(location=="WITHIN"),
Not_within_area = sum(location=="NOT_WITHIN"))
# Create a column for unique identifiers
LocationFrequency <- mutate(LocationFrequency, id = rownames(LocationFrequency))
# Reorder columns
LocationFrequency <- LocationFrequency[,c(5,1:4)]
# Format id-column as numbers (not as string)
LocationFrequency[,c(1)] <- sapply(LocationFrequency[, c(1)], as.numeric)
# Melt data
LocationFrequency.m = melt(LocationFrequency, id.var=c("UserEmail","All","id"))
# Plot data
p <- ggplot(LocationFrequency.m, aes(x=id, y=value, fill=variable)) +
geom_bar(stat="identity") +
theme_grey(base_size = 16)+
labs(title="Histogram showing the distribution of all spatial information per user.") +
labs(x="User", y="Number of notifications interaction within/not within the area") +
# using IDs instead of UserEmail
scale_x_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30), labels=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30"))
# Change legend Title
p + labs(fill = "Type of location")
##### Create Frequency Table for Interaction-Information
InterationFrequency <- ddply(RawDataSet, .(UserEmail), summarize,
All = length(UserEmail),
Clicked_A = sum(context=="Clicked A"),
Clicked_B = sum(context=="Clicked B"))
# Create a column for unique identifiers
InterationFrequency <- mutate(InterationFrequency, id = rownames(InterationFrequency))
# Reorder columns
InterationFrequency <- InterationFrequency[,c(5,1:4)]
# Format id-column as numbers (not as string)
InterationFrequency[,c(1)] <- sapply(InterationFrequency[, c(1)], as.numeric)
# Melt data
InterationFrequency.m = melt(InterationFrequency, id.var=c("UserEmail","All","id"))
# Plot data
p <- ggplot(InterationFrequency.m, aes(x=id, y=value, fill=variable)) +
geom_bar(stat="identity") +
theme_grey(base_size = 16)+
labs(title="Histogram showing the distribution of all interaction types per user.") +
labs(x="User", y="Number of interaction") +
# using IDs instead of UserEmail
scale_x_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30), labels=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30"))
# Change legend Title
p + labs(fill = "Type of interaction")
But what I'm trying to achieve is this: how can I combine both histograms in a single plot? Would it be possible to show the corresponding percentage for each part? Something like the following sketch, which represents the total number of observations per user (the complete height of the bar) and uses the different segments to visualize the corresponding data. Each bar would be divided into two parts (within and not_within), and each part would then be divided into two subparts showing the percentage of the interaction types ("Clicked A" or "Clicked B").

Based on the updated description, I would make a combined barplot with two parts: a negative and a positive one. To achieve that, you have to get your data into the correct format:
# load needed libraries
library(dplyr)
library(tidyr)
library(ggplot2)
# summarise your data
new.df <- RawDataSet %>%
group_by(UserEmail,location,context) %>%
tally() %>%
mutate(n2 = n * c(1,-1)[(location=="NOT_WITHIN")+1L]) %>%
group_by(UserEmail,location) %>%
mutate(p = c(1,-1)[(location=="NOT_WITHIN")+1L] * n/sum(n))
The new.df dataframe looks like:
> new.df
Source: local data frame [90 x 6]
Groups: UserEmail, location [54]
UserEmail location context n n2 p
(fctr) (fctr) (fctr) (int) (dbl) (dbl)
1 andre NOT_WITHIN Clicked A 3 -3 -1.0000000
2 bibi NOT_WITHIN Clicked A 4 -4 -0.5000000
3 bibi NOT_WITHIN Clicked B 4 -4 -0.5000000
4 bibi WITHIN Clicked A 9 9 0.6000000
5 bibi WITHIN Clicked B 6 6 0.4000000
6 corinn NOT_WITHIN Clicked A 10 -10 -0.5882353
7 corinn NOT_WITHIN Clicked B 7 -7 -0.4117647
8 corinn WITHIN Clicked A 9 9 0.7500000
9 corinn WITHIN Clicked B 3 3 0.2500000
10 dpfeifer NOT_WITHIN Clicked A 7 -7 -1.0000000
.. ... ... ... ... ... ...
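The sign trick used for n2 and p can look cryptic, so here is a minimal base R sketch of what it does. The toy vectors below are made up for illustration and are not taken from the pastebin dataset; the ifelse() variant is an assumed equivalent, not part of the original answer.

```r
# Hypothetical toy data standing in for the location and n columns
location <- c("WITHIN", "NOT_WITHIN", "WITHIN", "NOT_WITHIN")
n <- c(9L, 4L, 6L, 4L)

# (location == "NOT_WITHIN") is TRUE/FALSE; adding 1L turns it into index 2 or 1,
# which picks -1 (NOT_WITHIN) or 1 (WITHIN) from c(1, -1)
sign_factor <- c(1, -1)[(location == "NOT_WITHIN") + 1L]
n2 <- n * sign_factor

# An equivalent formulation that may be easier to read
n2_alt <- ifelse(location == "NOT_WITHIN", -n, n)

print(n2)
```

Counts for NOT_WITHIN end up negative, which is what lets the two geom_bar layers grow in opposite directions from zero.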
Next you can create a plot with:
ggplot() +
geom_bar(data = new.df[new.df$location == "NOT_WITHIN",],
aes(x = UserEmail, y = n2, color = "darkgreen", fill = context),
size = 1, stat = "identity", width = 0.7) +
geom_bar(data = new.df[new.df$location == "WITHIN",],
aes(x = UserEmail, y = n2, color = "darkred", fill = context),
size = 1, stat = "identity", width = 0.7) +
scale_y_continuous(breaks = seq(-20,20,5),
labels = c(20,15,10,5,0,5,10,15,20)) +
scale_color_manual("Location of interaction",
values = c("darkgreen","darkred"),
labels = c("NOT_WITHIN","WITHIN")) +
scale_fill_manual("Type of interaction",
values = c("lightyellow","lightblue"),
labels = c("Clicked A","Clicked B")) +
guides(color = guide_legend(override.aes = list(color = c("darkred","darkgreen"),
fill = NA, size = 2), reverse = TRUE),
fill = guide_legend(override.aes = list(fill = c("lightyellow","lightblue"),
color = "black", size = 0.5))) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 14),
axis.title = element_blank(),
legend.title = element_text(face = "italic", size = 14),
legend.key.size = unit(1, "lines"),
legend.text = element_text(size = 11))
which results in:
If you want to use percentage values, you can use the p-column to make a plot:
ggplot() +
geom_bar(data = new.df[new.df$location == "NOT_WITHIN",],
aes(x = UserEmail, y = p, color = "darkgreen", fill = context),
size = 1, stat = "identity", width = 0.7) +
geom_bar(data = new.df[new.df$location == "WITHIN",],
aes(x = UserEmail, y = p, color = "darkred", fill = context),
size = 1, stat = "identity", width = 0.7) +
scale_y_continuous(breaks = c(-1,-0.75,-0.5,-0.25,0,0.25,0.5,0.75,1),
labels = scales::percent(c(1,0.75,0.5,0.25,0,0.25,0.5,0.75,1))) +
scale_color_manual("Location of interaction",
values = c("darkgreen","darkred"),
labels = c("NOT_WITHIN","WITHIN")) +
scale_fill_manual("Type of interaction",
values = c("lightyellow","lightblue"),
labels = c("Clicked A","Clicked B")) +
coord_flip() +
guides(color = guide_legend(override.aes = list(color = c("darkred","darkgreen"),
fill = NA, size = 2), reverse = TRUE),
fill = guide_legend(override.aes = list(fill = c("lightyellow","lightblue"),
color = "black", size = 0.5))) +
theme_minimal(base_size = 14) +
theme(axis.title = element_blank(),
legend.title = element_text(face = "italic", size = 14),
legend.key.size = unit(1, "lines"),
legend.text = element_text(size = 11))
which results in:
In response to the comment
If you want to place the text-labels inside the bars, you will have to calculate a position variable too:
new.df <- RawDataSet %>%
group_by(UserEmail,location,context) %>%
tally() %>%
mutate(n2 = n * c(1,-1)[(location=="NOT_WITHIN")+1L]) %>%
group_by(UserEmail,location) %>%
mutate(p = c(1,-1)[(location=="NOT_WITHIN")+1L] * n/sum(n),
pos = (context=="Clicked A")*p/2 + (context=="Clicked B")*(c(1,-1)[(location=="NOT_WITHIN")+1L] * (1 - abs(p)/2)))
Then add the following line to your ggplot code after the geom_bar's:
geom_text(data = new.df, aes(x = UserEmail, y = pos, label = n))
which results in:
Instead of label = n you can also use label = scales::percent(abs(p)) to display the percentages.
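To make the pos formula less opaque, here is a small worked sketch in base R. The proportions are hypothetical (they resemble the "bibi" rows above), and the sketch assumes the "Clicked A" segment is the one drawn next to zero, as the formula itself implies.

```r
# Hypothetical proportions for one user: two WITHIN rows and two NOT_WITHIN rows
location <- c("WITHIN", "WITHIN", "NOT_WITHIN", "NOT_WITHIN")
context  <- c("Clicked A", "Clicked B", "Clicked A", "Clicked B")
p        <- c(0.6, 0.4, -0.5, -0.5)

# +1 for WITHIN, -1 for NOT_WITHIN (same role as c(1,-1)[...] in the answer)
sgn <- ifelse(location == "NOT_WITHIN", -1, 1)

# "Clicked A" label: middle of its own segment, i.e. p / 2
# "Clicked B" label: middle of the remaining segment, i.e. 1 - |p| / 2 from zero
pos <- (context == "Clicked A") * p / 2 +
  (context == "Clicked B") * sgn * (1 - abs(p) / 2)

print(pos)
```

Each label lands in the middle of its segment: for the WITHIN bar the "Clicked A" segment spans 0 to 0.6 (midpoint 0.3) and "Clicked B" spans 0.6 to 1 (midpoint 0.8).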

Related

How to customize Horizontal dots plot?

I want to plot customized Horizontal dots using my data and the code given here
data:
df <- data.frame (origin = c("A","B","C","D","E","F","G","H","I","J"),
Percentage = c(23,16,32,71,3,60,15,21,44,60),
rate = c(10,12,20,200,-25,12,13,90,-105,23),
change = c(10,12,-5,12,6,8,0.5,-2,5,-2))
origin Percentage rate change
1 A 23 10 10.0
2 B 16 12 12.0
3 C 32 20 -5.0
4 D 71 200 12.0
5 E 3 -25 6.0
6 F 60 12 8.0
7 G 15 13 0.5
8 H 21 90 -2.0
9 I 44 -105 5.0
10 J 60 23 -2.0
The observations from the 'origin' column need to go on the y-axis. The corresponding values in the 'change' and 'rate' columns should be shown in boxes instead of circles and be distinguished by color, for example values from 'change' in light blue and values from 'rate' in blue. In addition, I want to add a second vertical axis on the right and put circles on it whose size is defined by the corresponding value in the 'Percentage' column.
Output of code from the link:
Expected outcome (something like this):
Try this.
First, reshaping so that both rate and change are in one column better supports ggplot's general preference towards "long" data.
df2 <- reshape2::melt(df, id.vars = c("origin", "Percentage"))
(That can also be done using tidyr::pivot_longer.)
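For readers who prefer tidyr over reshape2, a minimal sketch of the same wide-to-long reshaping with pivot_longer follows. The column names "variable" and "value" are chosen here only to match what melt produces by default, so the plotting code can stay unchanged; the row order differs from melt's, which should not matter for the plot.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(origin = c("A","B","C","D","E","F","G","H","I","J"),
                 Percentage = c(23,16,32,71,3,60,15,21,44,60),
                 rate = c(10,12,20,200,-25,12,13,90,-105,23),
                 change = c(10,12,-5,12,6,8,0.5,-2,5,-2))

# Stack rate and change into one value column, keeping origin and Percentage as ids
df2 <- pivot_longer(df,
                    cols = c(rate, change),
                    names_to = "variable",
                    values_to = "value")

print(head(df2))
```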
The plot:
ggplot(df2, aes(value, origin)) +
geom_label(aes(label = value, fill = variable, color = variable)) +
geom_point(aes(size = Percentage), x = max(df2$value) + 20, shape = 21) +
scale_x_continuous(expand = expansion(add = c(15, 25))) +
scale_fill_manual(values = c(change="lightblue", rate="blue")) +
scale_color_manual(values = c(change="black", rate="white")) +
theme_bw() +
theme(panel.border = element_blank(), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank()) +
labs(x = NULL, y = NULL)
The legend and labels can be adjusted in the usual ggplot methods. Overlapping of labels is an issue with which you will need to contend.
Update on OP request: See comments:
gg_dot +
geom_text(aes(x = rate, y = origin,
label = paste0(round(rate, 1), "%")),
col = "black") +
geom_text(aes(x = change, y = origin,
label = paste0(round(change, 1), "%")),
col = "white") +
geom_text(aes(x = x, y = y, label = label, col = label),
data.frame(x = c(40 - 1.1, 180 + 0.6), y = 11,
label = c("change", "rate")), size = 6) +
scale_color_manual(values = c("#9DBEBB", "#468189"), guide = "none") +
scale_y_discrete(expand = c(0.2, 0))
First answer:
Something like this?
library(tidyverse)
library(dslabs)
gg_dot <- df %>%
arrange(rate) %>%
mutate(origin = fct_inorder(origin)) %>%
ggplot() +
# remove axes and superfluous grids
theme_classic() +
theme(axis.title = element_blank(),
axis.ticks.y = element_blank(),
axis.line = element_blank()) +
# add a dummy point for scaling purposes
geom_point(aes(x = 12, y = origin),
size = 0, col = "white") +
# add one horizontal guide line per origin
geom_hline(yintercept = 1:10, col = "grey80") +
# add a point for each rate value
geom_point(aes(x = rate, y = origin),
size = 11, col = "#9DBEBB") +
# add a point for each change value
geom_point(aes(x = change, y = origin),
size = 11, col = "#468189")
gg_dot +
geom_text(aes(x = rate, y = origin,
label = paste0(round(rate, 1))),
col = "black") +
geom_text(aes(x = change, y = origin,
label = paste0(round(change, 1))),
col = "white") +
geom_text(aes(x = x, y = y, label = label, col = label),
data.frame(x = c(40 - 1.1, 180 + 0.6), y = 11,
label = c("change", "rate")), size = 6) +
scale_color_manual(values = c("#9DBEBB", "#468189"), guide = "none") +
scale_y_discrete(expand = c(0.2, 0))

Plot lists of time series data for factors in R

I have a series of lists describing duration (in days) of events, and I would like to plot this data as lines to compare the lists.
Below is some example data on what lunch options were served on which days at school. I have already parsed my data and this is the reduced form. Originally it was in the form of complex character strings.
soup = c(15:18)
grilledcheese = c(0:19)
pasta = c(3:13)
I want to create a graph similar to this one, with days on the x axis and soup, grilled cheese, and pasta on the y axis:
I looked online and I'm not sure what kind of graph to use for this. Part of the difficulty is that the data does not start at 0 and the y axis should represent factors.
What I tried:
I tried plotting this in ggplot but it only takes data frames. I am wondering if there is a way to plot directly from lists. It seems like there should be a straightforward solution here that maybe I am missing.
I also tried this
plot(x = grilledcheese, y = rep(1, length(grilledcheese)))
which is closer to what I want, but I'm not sure how to plot multiple factors on the y axis.
First, let's get your data in a shape easier to handle with ggplot2:
library(tidyverse)
soup = c(15:18)
grilledcheese = c(0:19)
pasta = c(3:13)
df <- data.frame(soup = c(min(soup),max(soup)),
grilledcheese = c(min(grilledcheese),max(grilledcheese)),
pasta = c(min(pasta),max(pasta)))
df <- pivot_longer(df, cols = 1:3) %>%
group_by(name) %>%
mutate(minv = min(value),
maxv = max(value)) %>%
ungroup() %>%
select(-value) %>%
distinct()
Data
# A tibble: 3 x 3
name minv maxv
<chr> <int> <int>
1 soup 15 18
2 grilledcheese 0 19
3 pasta 3 13
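Since the question asks about plotting more or less directly from lists, here is a minimal base R sketch that builds the same three-row summary from a named list. The list name foods is made up for this example; everything else uses the vectors from the question.

```r
# Collect the vectors from the question in a named list (the name "foods" is hypothetical)
foods <- list(soup = 15:18, grilledcheese = 0:19, pasta = 3:13)

# One row per list element: its name plus the first and last day it was served
df_foods <- data.frame(
  name = names(foods),
  minv = unname(vapply(foods, min, integer(1))),
  maxv = unname(vapply(foods, max, integer(1)))
)

print(df_foods)
```

A data frame of this shape, with name, minv and maxv columns, is what the geom_segment call in this answer expects, and adding another food only means adding one element to the list.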
Graph
We can then plot the different elements you want: the starting and ending dots for each line, the lines themselves and the axis theme.
ggplot(df) +
geom_segment(aes(x = minv, xend = maxv, y = name, yend = name)) +
geom_point(aes(x = minv, y = name)) +
geom_point(aes(x = maxv, y = name)) +
scale_x_continuous(breaks = c(0:20),
labels = c(0:20),
limits = c(0,20),
expand = c(0,0)) +
theme(axis.ticks.x = element_line(size = 1),
axis.ticks.y = element_blank(),
axis.ticks.length =unit(.25, "cm"),
axis.line.x = element_line(size = 1),
panel.background = element_blank()) +
labs(x = "",
y = "")
We get this:
This should do the trick.
Extra custom
Now, if you want the tick labels in between the ticks, you might want to check here, because you will have to reshape your data and build the graph once you have all the food types you want. Until then, I just add spacing within the labels:
ggplot(df) +
geom_segment(aes(x = minv, xend = maxv, y = name, yend = name)) +
geom_point(aes(x = minv, y = name)) +
geom_point(aes(x = maxv, y = name)) +
scale_x_continuous(breaks = c(0:20),
labels = paste(" ",0:20),
limits = c(0,20),
expand = c(0,0)) +
theme(axis.ticks.x = element_line(size = 1),
axis.ticks.y = element_blank(),
axis.ticks.length =unit(.25, "cm"),
axis.line.x = element_line(size = 1),
panel.background = element_blank()) +
labs(x = "",
y = "")
You will first need to engineer your data into a data frame. You could do, e.g.
soup = c(15:18)
grilledcheese = c(0:19)
pasta = c(3:13)
## make dataframe
library(tidyverse)
# build the x column directly; calling as_tibble() on a bare vector is discouraged
my_x_axis <- tibble(x = seq(0, 20))
my_x_axis %>% mutate(soup_y = 1*ifelse(as.numeric(x %in% soup) == 1, 1, NA)) %>%
mutate(grilledcheese_y = 2*ifelse(as.numeric(x %in% grilledcheese) == 1, 1, NA)) %>%
mutate(pasta_y = 3*ifelse(as.numeric(x %in% pasta) == 1, 1, NA)) -> data
Here, I use the knowledge that your x axis values are between 0 and 20. You could also choose them by, e.g., min(c(soup,grilledcheese,pasta)) and max(c(soup,grilledcheese,pasta)), or some other logic.
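As a minimal sketch of deriving the axis limits from the data instead of hard-coding them, base R's range() returns the minimum and maximum in one call. The names x_limits and x_values are made up for this example.

```r
soup <- 15:18
grilledcheese <- 0:19
pasta <- 3:13

# Smallest and largest day across all three vectors
x_limits <- range(c(soup, grilledcheese, pasta))

# Every day between the two limits, usable as the x column of the data frame
x_values <- seq(x_limits[1], x_limits[2])

print(x_limits)
```

Note that with these vectors the data-driven upper limit is 19, not the 20 used above, so the axis would end one day earlier unless some padding is added.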
Following the idea from this answer, I hard-code the y axis positions for the three foods as 1, 2, and 3.
The ggplot command reads as:
library(ggplot2)
ggplot() +
geom_line(data=data, aes(x = x, y=soup_y)) +
geom_line(data=data, aes(x = x, y=grilledcheese_y)) +
geom_line(data=data, aes(x = x, y=pasta_y)) +
scale_y_discrete(labels = NULL, breaks = NULL) + labs(y = "") + ## drop y axis labels
scale_x_continuous(labels=seq(0,20,1), breaks=seq(0,20,1)) + # x axis tick marks
geom_text(aes(label = c('soup','grilledcheese','pasta'), x = 0, y = c(1,2,3), vjust = -.2,hjust=-.3)) # add labels

Stacked Diverging Bar Chart Plot by groups in ggplot

I am trying to do a chart like this one:
The idea is to plot 3 amounts. In this mixed stacked bar chart we have a dataframe with one row for a negative value and two rows for positive values. However, I need to stack the negative bar with the first positive bar, and I also need 3 colors. The code I have so far is as follows (the dataframe already has the desired shape):
df3 <- read.table(
text =
"region group metric somevalue
blue T1 epsilon 63
blue T2 epsilon -40
red T1 epsilon 100
blue T1 kappa 19
blue T2 kappa -30
red T1 kappa 75
blue T1 zulu 50
blue T2 zulu -18
red T1 zulu 68", header=TRUE)
p2 <- ggplot(df3, aes(x = metric, y = somevalue, fill=region))+
geom_col(aes(fill = group), width = 0.7) + geom_bar(position = 'dodge', stat='identity')
p2
Please help me out. If you think the dataframe has to be modified, please let me know. Thanks.
Stacking and dodging is always a bit tricky. In your case this could be achieved like so:
1. Convert region to a factor. (This makes sure that step 3 works.)
2. Split your dataset in two, one for the negative and one for the positive values.
3. Fill up the datasets using tidyr::complete so that each dataset contains "all" combinations of metric, region and group. (This makes sure that the dodging works.)
4. Use two geom_col layers to plot the positive and negative values using position="dodge". I added na.rm = TRUE to remove the missing values we added via complete.
library(ggplot2)
library(dplyr)
library(tidyr)
df3$region <- factor(df3$region)
df3_neg <- filter(df3, somevalue < 0) %>%
tidyr::complete(region, group, metric)
df3_pos<- filter(df3, somevalue > 0) %>%
tidyr::complete(region, group, metric)
p2 <- ggplot(df3, aes(somevalue, metric)) +
geom_col(aes(alpha = group, fill=region), data = df3_pos, position = "dodge", na.rm = TRUE) +
geom_col(aes(alpha = group, fill=region), data = df3_neg, position = "dodge", na.rm = TRUE) +
scale_fill_identity() +
scale_alpha_manual(values = c(T2 = .6, T1 = 1)) +
guides(alpha = "none")
p2
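To see what complete contributes in this approach, here is a small sketch using only the negative rows of the example data. It assumes tidyr is installed; the object names neg and neg_full are made up for illustration.

```r
library(tidyr)

# The three negative rows from df3; region is a factor that still knows about "red"
neg <- data.frame(
  region = factor(c("blue", "blue", "blue"), levels = c("blue", "red")),
  group = "T2",
  metric = c("epsilon", "kappa", "zulu"),
  somevalue = c(-40, -30, -18)
)

# complete() adds one row per missing region/group/metric combination,
# filling somevalue with NA for the rows it had to invent
neg_full <- complete(neg, region, group, metric)

print(neg_full)
```

Because region is a factor, complete() also expands the unused level "red", so the negative dataset gets placeholder rows for it. Those NA rows are what keep the dodge slots aligned between the positive and negative layers, and na.rm = TRUE silences the warning about them.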
EDIT Adding annotations could be achieved the same way, e.g. my code below uses two geom_text to add the values next to the bar where I make use of position_dodge2(.9) so that the labels align nicely with the bars:
p2 <- ggplot(df3, aes(somevalue, metric)) +
geom_col(aes(alpha = group, fill=region), data = df3_pos, position = "dodge", na.rm = TRUE) +
geom_col(aes(alpha = group, fill=region), data = df3_neg, position = "dodge", na.rm = TRUE) +
geom_text(aes(x = somevalue + 1, label = somevalue), data = df3_pos, position = position_dodge2(width = .9), hjust = 0, na.rm = TRUE) +
geom_text(aes(x = somevalue - 1, label = somevalue), data = df3_neg, position = position_dodge2(width = .9), hjust = 1, na.rm = TRUE) +
scale_fill_identity() +
scale_alpha_manual(values = c(T2 = .6, T1 = 1)) +
guides(alpha = "none")
p2
EDIT2 Adding a table is indeed a different thing. In that case I would go for patchwork, which means making plots that mimic the table layout. To make the dodging work, and to make sure that the table rows align with the bars, you have to make a plot for each table column. The basic approach may look like so:
library(patchwork)
# 1. Make a dataframe with all combinations of region and metric using expand_grid
d_table <- expand_grid(region = unique(df3$region), metric = unique(df3$metric))
# 2. Add columns with the table content
d_table$column1 <- LETTERS[1:6]
d_table$column2 <- letters[1:6]
# 3. Make a plot for each column of the table
p_column1 <- ggplot(d_table, aes(y = metric, x = 1, label = column1)) +
geom_text(aes(group = region), position = position_dodge2(width = .9), na.rm = TRUE) +
scale_x_continuous(position = "top", breaks = 1, labels = "column1") +
labs(y = NULL, x = "") +
theme(axis.text.y = element_blank(),
axis.text.x.bottom = element_blank(),
axis.ticks = element_blank(),
plot.margin = unit(rep(0, 4), "pt"),
panel.background = element_rect(fill = NA))
p_column2 <- ggplot(d_table, aes(y = metric, x = 1, label = column2)) +
geom_text(aes(group = region), position = position_dodge2(width = .9), na.rm = TRUE) +
scale_x_continuous(position = "top", breaks = 1, labels = "column2") +
labs(y = NULL, x = "") +
theme(axis.text.y = element_blank(),
axis.text.x.bottom = element_blank(),
axis.ticks = element_blank(),
plot.margin = unit(rep(0, 4), "pt"),
panel.background = element_rect(fill = NA))
# 4. Add the table columns via patchwork
p2 + p_column1 + p_column2 + plot_layout(widths = c(1, .1, .1))

using y-axis values to create secondary x-axis in ggplot2

I would like to create a dot plot with percentiles, which looks something like this-
Here is the ggplot2 code I used to create the dot plot. There are two things I'd like to change:
1. I can plot the percentile values on the y-axis, but I want these values on the x-axis (as shown in the graph above). Note that the coordinates are flipped.
2. The axes don't display a label for the minimum value (for example, the percentile axis labels start at 25 when they should start at 0 instead).
# loading needed libraries
library(tidyverse)
library(ggstatsplot)
# creating dataframe with mean mileage per manufacturer
cty_mpg <- ggplot2::mpg %>%
dplyr::group_by(.data = ., manufacturer) %>%
dplyr::summarise(.data = ., mileage = mean(cty, na.rm = TRUE)) %>%
dplyr::rename(.data = ., make = manufacturer) %>%
dplyr::arrange(.data = ., mileage) %>%
dplyr::mutate(.data = ., make = factor(x = make, levels = .$make)) %>%
dplyr::mutate(
.data = .,
percent_rank = (trunc(rank(mileage)) / length(mileage)) * 100
) %>%
tibble::as_tibble(x = .)
# plot
ggplot2::ggplot(data = cty_mpg, mapping = ggplot2::aes(x = make, y = mileage)) +
ggplot2::geom_point(col = "tomato2", size = 3) + # Draw points
ggplot2::geom_segment(
mapping = ggplot2::aes(
x = make,
xend = make,
y = min(mileage),
yend = max(mileage)
),
linetype = "dashed",
size = 0.1
) + # Draw dashed lines
ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(trans = ~(trunc(rank(.)) / length(.)) * 100, name = "percentile")) +
ggplot2::coord_flip() +
ggplot2::labs(
title = "City mileage by car manufacturer",
subtitle = "Dot plot",
caption = "source: mpg dataset in ggplot2"
) +
ggstatsplot::theme_ggstatsplot()
Created on 2018-08-17 by the reprex package (v0.2.0.9000).
I am not 100% sure I have understood what you really want, but below is my attempt to reproduce the first picture with the mpg data:
require(ggplot2)
data <- aggregate(cty~manufacturer, mpg, FUN = mean)
data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
g <- ggplot(data, aes(y = rank, x = cty))
g <- g + geom_point(size = 2)
g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
sec.axis = dup_axis(name = element_blank(),
breaks = seq(1, nrow(data), (nrow(data)-1)/4),
labels = 25 * 0:4))
g <- g + scale_x_continuous(name = "Mileage", limits = c(10, 25),
sec.axis = dup_axis(name = element_blank()))
g <- g + theme_classic()
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
print(g)
That produces:
data <- aggregate(cty~manufacturer, mpg, FUN = mean)
data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
These two lines generate the data for the graph. Basically we need the manufacturers, the mileage (average of cty by manufacturer) and the rank.
g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
sec.axis = dup_axis(name = element_blank(),
breaks = seq(1, nrow(data), (nrow(data)-1)/4),
labels = 25 * 0:4))
Note that here the scale is using rank and not the column manufacturer. To display the names of the manufacturers, you must use the labels property and force the breaks to be placed at every value (see the breaks property).
The second y-axis is generated using the sec.axis property. This is very straightforward with dup_axis, which duplicates the axis. By replacing the labels and the breaks, you can display the %-value.
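A short base R sketch of why those secondary breaks line up with the percentage labels: the breaks are placed on the rank scale, and rescaling them linearly from [1, n] to [0, 100] gives exactly 0, 25, 50, 75 and 100. The value n = 15 is the number of manufacturers in ggplot2's mpg data; the variable names are made up for this example.

```r
n <- 15  # number of manufacturers, i.e. nrow(data) in the answer

# Five equally spaced positions on the rank axis, from rank 1 to rank n
sec_breaks <- seq(1, n, (n - 1) / 4)

# The labels the answer attaches to those positions
sec_labels <- 25 * 0:4

# Rescaling the break positions to 0..100 reproduces the labels
rescaled <- (sec_breaks - 1) / (n - 1) * 100

print(sec_breaks)
print(rescaled)
```

Here rank 1 (the lowest mileage) is labelled 0% and rank n is labelled 100%, which is a convention chosen by the answer rather than the only possible definition of a percentile.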
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
The horizontal lines are just the major grid. This is much easier to manipulate than geom_segments in my opinion.
Regarding your question 1, you can flip the coordinates easily using coord_flip, with minor adjustments. Replace the following line:
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
By the following two lines:
g <- g + coord_flip()
g <- g + theme(panel.grid.major.x = element_line(color = "black", linetype = "dotted"),
axis.text.x = element_text(angle = 90, hjust = 1))
Which produces:
Regarding your question 2, the problem is that the value 0% is outside the limits. You can solve this issue by changing the way you calculate the percentage (starting from zero and not from one), or you can extend the limit of your plot to include the value zero, but then no point will be associated to 0%.
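The first option, computing the percentage so that it starts at zero, can be sketched in base R as follows. The mileage values are hypothetical and only serve to show the difference between the two formulas.

```r
# Hypothetical mean mileages for four manufacturers
mileage <- c(11.3, 13.5, 17.2, 24.4)
n <- length(mileage)

# Formula from the question: the smallest value gets 100 / n, never 0
pct_from_one <- rank(mileage) / n * 100

# Shifted variant: the smallest value gets 0 and the largest gets 100
pct_from_zero <- (rank(mileage) - 1) / (n - 1) * 100

print(pct_from_one)
print(pct_from_zero)
```

With the shifted variant a data point sits exactly at 0%, so the axis label for the minimum has something to attach to. If dplyr is already loaded, its percent_rank() function computes a similar rescaled rank on a 0 to 1 scale.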

R geom_col does not show the 'bars'

I am having this strange error regarding displaying the actual bars in a geom_col() plot.
Suppose I have a data set (called user_data) that contains a count of the total number of changes ('adjustments') done for a particular user (and a plethora of other columns). Let's say it looks like this:
User_ID total_adjustments additional column_1 additional column_2 ...
1 'Blah_17' 21 random_data random_data
2 'Blah_1' 47 random_data random_data
3 'foobar' 2 random_data random_data
4 'acbd1' 17 random_data random_data
5 'user27' 9 random_data random_data
I am using the following code to reduce it into a dataframe with only the two columns I care about:
total_adj_count = user_data %>%
select(User_ID, total_adjustments) %>%
arrange(desc(total_adjustments)) %>%
mutate(User_ID = factor(User_ID, User_ID))
This results in my dataframe (total_adj_count) looking like so:
User_ID total_adjustments
1 'Blah_1' 47
2 'Blah_17' 21
3 'acbd1' 17
4 'user27' 9
5 'foobar' 2
Moving along, here is the code I used to attempt to create a geom_col() plot of that data:
g = ggplot(data=total_adj_count, aes(x = User_ID, y = total_adjustments)) +
geom_bar(width=.5, alpha=1, show.legend = FALSE, fill="#000066", stat="identity") +
labs(x="", y="Adjustment Count", caption="(based on sample data)") +
theme_few(base_size = 10) + scale_color_few() +
theme(axis.text.x=element_text(angle = 45, hjust = 1)) +
geom_text(aes(label=round(total_adjustments, digits = 2)), size=3, nudge_y = 2000) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
p = ggplotly(g)
p = p %>%
layout(margin = m,
showlegend = FALSE,
title = "Number of Adjustments per User"
)
p
And for some strange reason when I try to view plot p it displays all parts of the plot as intended, but does not show the actual bars (or columns).
In fact I get this strange plot and am sort of stuck where to fix it:
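Change the nudge_y argument to a smaller number. Right now you have it set to 2000, which offsets the labels by 2000 units on the y-axis and stretches the axis so far that the bars (at most 47 high in the sample data) are squashed out of sight. Below I've changed it to nudge_y = 2 and it looks like so: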
g <-
ggplot(total_adj_count, aes(User_ID, total_adjustments)) +
geom_col(width = .5, alpha = 1, show.legend = FALSE, fill = "#000066") +
labs(x = "", y = "Adjustment Count", caption = "(based on sample data)") +
theme_few(base_size = 10) +
scale_color_few() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_text(aes(label = round(total_adjustments, digits = 2)), size = 3, nudge_y = 2) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
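Rather than picking the offset by hand, one option is to derive it from the data so it keeps working when the counts change. This is a minimal sketch under that assumption; the 4% factor and the name label_offset are arbitrary choices for illustration, not part of the original answer.

```r
library(ggplot2)

# Sample data from the question, with User_ID as a factor to keep the sorted order
total_adj_count <- data.frame(
  User_ID = c("Blah_1", "Blah_17", "acbd1", "user27", "foobar"),
  total_adjustments = c(47, 21, 17, 9, 2)
)
total_adj_count$User_ID <- factor(total_adj_count$User_ID,
                                  levels = total_adj_count$User_ID)

# Offset the labels by a small fraction of the tallest bar (4% is an arbitrary choice)
label_offset <- 0.04 * max(total_adj_count$total_adjustments)

g <- ggplot(total_adj_count, aes(User_ID, total_adjustments)) +
  geom_col(width = .5, fill = "#000066") +
  geom_text(aes(label = total_adjustments), size = 3, nudge_y = label_offset)
```

With the sample data the offset comes out just under 2, close to the hand-picked value above.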
Full copy/paste:
library(ggplot2)
library(ggthemes)
library(plotly)
library(dplyr)
text <- " User_ID total_adjustments
1 'Blah_1' 47
2 'Blah_17' 21
3 'acbd1' 17
4 'user27' 9
5 'foobar' 2"
total_adj_count <- read.table(text = text, header = TRUE, stringsAsFactors = FALSE)
g <-
ggplot(total_adj_count, aes(User_ID, total_adjustments)) +
geom_col(width = .5, alpha = 1, show.legend = FALSE, fill = "#000066") +
labs(x = NULL, y = "Adjustment Count", caption = "(based on sample data)", title = "Number of Adjustments per User") +
theme_few(base_size = 10) +
scale_color_few() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_text(aes(label = round(total_adjustments, digits = 2)), size = 3, nudge_y = 2) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
p <- ggplotly(g)
p <- layout(p, showlegend = FALSE)
p
