GGPLOT2: Stacked bar plot for two discrete variable columns - r

I have a dataset with three columns: one categorical column and two columns of discrete variables. I want to make a stacked bar plot to compare the values of the two discrete variables for each category. However, I get continuous coloring rather than discrete colors.
Reproducible code
library(ggplot2)
sampleData <- data.frame(grp = c("A","B", "C"),
var_1 = c(15,20, 25),
var_2 = c(12, 13, 20))
sampleData
p <- ggplot(sampleData, aes(x = grp, y = var_1, fill= var_2)) +
geom_bar( stat="identity", position = "fill")+
coord_flip()+ theme_bw()
p
Instead, what I want is
*Var2 will always be smaller than its corresponding Var1 value for a particular category.
Thanks for the help!

Your problem here is that you haven't reshaped your data from wide to long format.
library(dplyr) # provides %>%
library(tidyr) # provides pivot_longer()
FixedData <- sampleData %>%
pivot_longer(cols = c("var_1", "var_2"), names_prefix = "var_",
names_to = "Variable Number", values_to = "ValueName")
Once you do this, the problem becomes much easier to solve. You only need to change a few things, most notably the y and fill aesthetics and the position argument, to make it work.
p2 <- ggplot(FixedData, aes(x = grp, y = ValueName, fill = `Variable Number`)) +
geom_bar(stat="identity", position = "stack")+
coord_flip()+ theme_bw()
p2
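For what it's worth, here is a minimal sketch of the same reshape-then-plot idea written with geom_col(), which is shorthand for geom_bar(stat = "identity"). The column names variable and value and the object names long_data, p_stack and p_fill are my own choices for illustration, not something from the question.

```r
library(tidyr)
library(ggplot2)

sampleData <- data.frame(grp = c("A", "B", "C"),
                         var_1 = c(15, 20, 25),
                         var_2 = c(12, 13, 20))

# Reshape wide -> long: one row per (grp, variable) pair
long_data <- pivot_longer(sampleData,
                          cols = c(var_1, var_2),
                          names_to = "variable",   # hypothetical column name
                          values_to = "value")     # hypothetical column name

# fill is now mapped to a discrete column, so ggplot2 uses a discrete palette
p_stack <- ggplot(long_data, aes(x = grp, y = value, fill = variable)) +
  geom_col(position = "stack") +
  coord_flip() +
  theme_bw()

# Same data, but every bar rescaled to 100% (the position used in the question)
p_fill <- ggplot(long_data, aes(x = grp, y = value, fill = variable)) +
  geom_col(position = "fill") +
  coord_flip() +
  theme_bw()
```

Which position you want depends on the comparison: "stack" keeps the raw totals, "fill" shows shares, and "dodge" would put the two variables side by side.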

Related

ggplot2: geom_bar with facet-wise proportion and fill argument

I'm trying to plot proportions with geom_bar() combining fill and facet_grid.
library(tidyverse)
set.seed(123)
df <- tibble(val_num = c(rep(1, 60), rep(2, 40), rep(1, 30), rep(2, 70)),
val_cat = ifelse(val_num == 1, "cat", "mouse"),
val_fill = sample(c("black", "white", "gray"), 200, replace = TRUE),
group = rep(c("A", "B"), each = 100))
ggplot(df) +
stat_count(mapping = aes(x = val_cat, y = ..count../tapply(..count.., ..x.. , sum)[..x..],
fill = val_fill),
position = position_dodge2(preserve = "single")) +
facet_grid(.~ group)
However, it seems that proportions are calculated for all cats (or all mice) in categories A and B together. In other words, the sum of proportions in the first three columns is not 1.
It should be solved with adding group = group into the mapping. However:
ggplot(df) +
stat_count(mapping = aes(x = val_cat, y = ..count../tapply(..count.., ..x.. , sum)[..x..],
fill = val_fill, group = group),
position = position_dodge2(preserve = "single")) +
facet_grid(.~ group)
the plot ignores the fill argument (and, moreover, this does not solve the issue). I tried to specify group with different choices, including interaction(), but without any real success.
I would like to solve the problem within ggplot and avoid data manipulation before plotting.
So it wasn't as easy as I thought because I don't tend to use the stat_xxx() functions. Although you seem set on not manipulating the data beforehand, here is an approach you can use.
grouped.df <- df %>%
group_by( group, val_fill ) %>%
count( val_cat ) %>%
ungroup() %>%
group_by( group, val_cat ) %>%
mutate( prop=n/sum(n) ) %>%
ungroup()
grouped.df %>%
ggplot() +
geom_col( aes(x=val_cat,y=prop,fill=val_fill), position="dodge" ) +
facet_wrap( ~ group )
to produce
But getting back to your "no data manipulation approach", I think your error is within your y variable. For example, consider the following code and output.
df %>%
ggplot() +
stat_count( aes(x=val_cat,y=..count..,color=val_fill,label=tapply(..count.., ..x.. , sum)[..x..]),
geom="text" ) +
facet_wrap( ~ group )
In the plot above, the y value is the numerator of your attempted proportion and the label value is the denominator of your attempted proportion. I think all you need to do is mess around some more with your tapply() function calls until you have the right combination of y and label.
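One possible way to finish that thought, as a sketch rather than a definitive fix: the computed data that after_stat() sees covers the whole layer and carries a PANEL column, so the denominator can be summed per facet panel and per x position. I rebuild the data as a plain data.frame here, and the use of ave() with PANEL is my assumption about how to express the grouping, not something from the question.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(
  val_cat  = rep(c("cat", "mouse", "cat", "mouse"), times = c(60, 40, 30, 70)),
  val_fill = sample(c("black", "white", "gray"), 200, replace = TRUE),
  group    = rep(c("A", "B"), each = 100)
)

# Denominator: total count within each facet panel and x position,
# so the bars for one animal in one panel add up to 1
p <- ggplot(df) +
  stat_count(
    aes(x = val_cat,
        y = after_stat(count / ave(count, PANEL, as.integer(x), FUN = sum)),
        fill = val_fill),
    position = position_dodge2(preserve = "single")
  ) +
  facet_grid(. ~ group)

# Inspect the computed bar heights; x is rounded because dodging shifts it
ld <- layer_data(p)
sums <- tapply(ld$y, list(ld$PANEL, round(as.numeric(ld$x))), sum)
```

If sums contains anything other than 1, the grouping in the denominator does not match the bars you expect to add up.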

Within a function, how to create a discrete axis with _repeated and ordered_ labels

I want to create a function that makes a heatmap where the y axis will have unique breaks, but repeated and ordered labels. I know that this might not be great practice. I am also aware that similar questions have been asked before. For example: ggplot in R, reordering the bars. But I want to achieve these repeated and ordered labels through sorting within a function, not by typing them manually. I am aware of solutions for reordering axes based on the values of a factor (e.g., Order Bars in ggplot2 bar graph), but I don't think they apply, or I can't see how to apply them to my case, where the breaks are unique but the labels repeat.
Here is some code to reproduce the problem and some of my attempts:
Libraries and data
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(4)
id <- LETTERS[1:10]
lab <- paste(c("AB", "CD"), 1:5, sep = "_") %>%
sample(., size = 10, replace = TRUE)
val <- sample.int(n = 6, size = 10, replace = TRUE)
tes <- ifelse(val >= 4, 1, 0)
dat <- data.frame(id, lab, val, tes)
A heatmap with unique breaks on the y axis
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)
A heatmap where the y axis is labeled with repeated labels instead of the unique breaks
This works, to the point that labels are used instead of unique ids, but the y axis is not ordered by the labels. Also, I am not sure about setting breaks and labels from the data frame in wide format (dat), rather than the data frame in long format used by ggplot (dat2).
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks=dat$id, labels=dat$lab)
Mapping the vector with repeated values to the y axis obviously doesn't work
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = lab, fill = value), color="white", size=1)
Repeated and ordered labels, try 1
As expected, merely sorting the input data by the non-unique lab variable does not work.
dat2 <- dat %>% gather(kind, value, val:tes) %>%
arrange(lab)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks=id, label=lab)
Repeated and ordered labels, try 2
Try to create a named breaks vector ordered by the (repeating) labels. This gets me nowhere. Half the labels are missing and they are still not sorted.
dat2 <- dat %>% gather(kind, value, val:tes)
brks <- setNames(dat$id, dat$lab)[sort(dat$lab)]
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks = brks, labels = names(brks))
Repeated and ordered labels, try 3
Starting with the data frame sorted by label, try to create an ordered factor for lab. Then sort the table by this ordered factor. No luck.
dat2 <- dat %>% gather(kind, value, val:tes) %>% arrange(lab)
dat2 <- mutate(dat2, lab_f=factor(lab, levels=sort(unique(lab)), ordered = TRUE))
dat2 <- arrange(dat2, lab_f)
# check
dat2$lab_f
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks = dat2$id, labels = dat2$lab_f)
A workaround, which I can use if I have to, but I am trying to avoid
We can create a combination of id and lab which will be unique and use it for the y axis
dat2 <- dat %>% gather(kind, value, val:tes) %>%
mutate(id_lab=paste(lab, id, sep="_"))
ggplot(dat2) +
geom_tile(aes(x = kind, y = id_lab, fill = value), color="white", size=1)
I must be missing something. Any help is much appreciated.
The goal is to have a function that will take an arbitrarily long table and plot a y axis with unique breaks but (possibly) repeated and ordered labels.
heat <- function(dat) {
dat2 <- dat %>% gather(kind, value, val:tes)
# any other manipulation here
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)
# scale_y_discrete() (if needed)
}
The plot I am looking for is something like this (created in Inkscape)
Using limits instead of breaks sets the order:
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
geom_text(aes(x = 1, y = id, label = id), col = 'white') +
scale_y_discrete(limits = dat$id[order(dat$lab)], labels = sort(dat$lab))
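To fit this into the function from the question, something along these lines might work. It is only a sketch: it assumes the wide input has exactly one row per id and columns named id, lab, val and tes as in the question, it swaps gather() for pivot_longer(), and it breaks ties between equal labels by id, which is my own choice.

```r
library(ggplot2)
library(tidyr)

heat <- function(dat) {
  # Long format: one row per (id, kind) pair
  dat2 <- pivot_longer(dat, cols = c(val, tes),
                       names_to = "kind", values_to = "value")
  # Order the unique ids by their (possibly repeated) labels; ties broken by id
  ord <- order(dat$lab, dat$id)
  ggplot(dat2) +
    geom_tile(aes(x = kind, y = id, fill = value), color = "white") +
    # limits fixes the order of the unique breaks, labels relabels them
    scale_y_discrete(limits = dat$id[ord], labels = dat$lab[ord])
}

# Small hand-made example with repeated labels
dat <- data.frame(id  = c("A", "B", "C", "D"),
                  lab = c("CD_2", "AB_1", "CD_2", "AB_1"),
                  val = c(5, 2, 6, 1),
                  tes = c(1, 0, 1, 0))
p <- heat(dat)
```

Using the same ord vector for both limits and labels keeps each label attached to its own id, which is easier to reason about than sorting the two vectors separately.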

ggplot faceted cumulative histogram

I have the following data
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(100, 6, 1))
gender = rep(c("Male", "Female"), each=100)
mydata = data.frame(x=x, gender=gender)
and I want to plot two cumulative histograms (one for males and the other for females) with ggplot.
I have tried the code below
ggplot(data=mydata, aes(x=x, fill=gender)) +
stat_bin(aes(y=cumsum(..count..)), geom="bar", breaks=1:10, colour=I("white")) +
facet_grid(gender~.)
but I get this chart
that, obviously, is not correct.
How can I get the correct one, like this:
Thanks!
I would pre-compute the cumsum values per bin per group, and then use geom_histogram to plot.
library(dplyr)
library(tidyr)
library(ggplot2)
mydata %>%
mutate(x = cut(x, breaks = 1:10, labels = FALSE)) %>% # Bin x
count(gender, x) %>% # Counts per bin per gender
mutate(x = factor(x, levels = 1:10)) %>% # x as factor
complete(x, gender, fill = list(n = 0)) %>% # Fill missing bins with 0
group_by(gender) %>% # Group by gender ...
mutate(y = cumsum(n)) %>% # ... and calculate cumsum
ggplot(aes(x, y, fill = gender)) + # The rest is (gg)plotting
geom_histogram(stat = "identity", colour = "white") +
facet_grid(gender ~ .)
Like @Edo, I also came here looking for exactly this. @Edo's solution was the key for me. It's great. But I post here a few additions that increase the information density and allow comparisons across different situations.
library(ggplot2)
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender = c(rep("Male", 100), rep("Female", 50))
grade = rep(1:3, 50)
mydata = data.frame(x=x, gender=gender, grade = grade)
ggplot(mydata, aes(x,
y = ave(after_stat(density), group, FUN = cumsum)*after_stat(width),
group = interaction(gender, grade),
color = gender)) +
geom_line(stat = "bin") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~grade)
I rescale the y so that the cumulative plot always ends at 100%. Otherwise, if the groups are not the same size (like they are in the original example data) then the cumulative plots have different final heights. This obscures their relative distribution.
Secondly, I use geom_line(stat="bin") instead of geom_histogram() so that I can put more than one line on a panel. This way I can compare them easily.
Finally, because I also want to compare across facets, I need to make sure the ggplot group variable uses more than just color=gender. We set it manually with group = interaction(gender, grade).
Answering a million years later....
I was looking for a solution for the same problem and I got here..
Eventually I figured it out by myself, so I'll drop it here in case other people will ever need it.
As required: no pre-work is necessary!
ggplot(mydata) +
geom_histogram(aes(x = x, y = ave(..count.., group, FUN = cumsum),
fill = gender, group = gender),
colour = "gray70", breaks = 1:10) +
facet_grid(rows = "gender")
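For newer ggplot2 versions, where the ..count.. notation is deprecated in favor of after_stat(), the same idea could be sketched as below. The data is the one from the question. The layer_data() call at the end is just my way of inspecting the computed values and is not required for the plot.

```r
library(ggplot2)

set.seed(123)
mydata <- data.frame(x = c(rnorm(100, 4, 1), rnorm(100, 6, 1)),
                     gender = rep(c("Male", "Female"), each = 100))

# ave(count, group, FUN = cumsum) accumulates the bin counts separately
# for each group, so every gender gets its own cumulative histogram
p <- ggplot(mydata) +
  geom_histogram(aes(x = x,
                     y = after_stat(ave(count, group, FUN = cumsum)),
                     fill = gender, group = gender),
                 colour = "gray70", breaks = 1:10) +
  facet_grid(rows = vars(gender))

# Computed layer data: y holds the cumulative counts, count the raw bin counts
ld <- layer_data(p)
```

The last bar in each panel should equal the number of observations of that gender that fall inside the breaks.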

How to draw a barplot from counts data in R?

I have a data-frame 'x'
I want barplot like this
I tried
barplot(x$Value, names.arg = x$'Categorical variable')
ggplot(as.data.frame(x$Value), aes(x$'Categorical variable')
Nothing seems to work properly. In barplot, all axis labels (freq values) are different. ggplot is filling all bars to 100%.
You can try plotting using geom_bar(). Following code generates what you are looking for.
df = data.frame(X = c("A","B C","D"),Y = c(23,12,43))
ggplot(df,aes(x=X,y=Y)) + geom_bar(stat='identity') + coord_flip()
It helps to read the ggplot documentation. ggplot requires a few things, including data and aes(). You've got both of those statements there but you're not using them correctly.
library(ggplot2)
library(dplyr) # provides %>%
set.seed(256)
dat <-
data.frame(variable = c("a", "b", "c"),
value = rnorm(3, 10))
dat %>%
ggplot(aes(x = variable, y = value)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip()
Here, I'm piping dat to ggplot() as the data argument and using the bare names of the x and y variables rather than passing data$... values. Next, I add geom_bar() with stat = "identity", which tells ggplot to use the values in the value column as bar heights instead of counting the rows in each category.
You have to use stat = "identity" in geom_bar().
dat <- data.frame("cat" = c("A", "BC", "D"),
"val" = c(23, 12, 43))
ggplot(dat, aes(as.factor(cat), val)) +
geom_bar(stat = "identity") +
coord_flip()
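As a side note, geom_col() does the same thing as geom_bar(stat = "identity") with a bit less typing. A minimal sketch, where the data frame counts and its column names are made up for illustration:

```r
library(ggplot2)

counts <- data.frame(category = c("A", "B C", "D"),
                     value = c(23, 12, 43))

# geom_col() uses the y values directly as bar heights
p <- ggplot(counts, aes(x = category, y = value)) +
  geom_col() +
  coord_flip()

# Rough base-graphics equivalent of the same horizontal bar plot:
# barplot(counts$value, names.arg = counts$category, horiz = TRUE)
```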

grouping labels in ggplot

Here's my attempt to create a heatmap using ggplot2.
#DATA
set.seed(42)
df1 = data.frame(ID = paste0("I", 1:40),
group = rep(c("Dry", "Rain"), each = 20),
subgroup = rep(paste0("S", 1:4), each = 10),
setNames(data.frame(replicate(8, rnorm(40))), letters[1:8]))
library(reshape2)
df1 = melt(df1, id.vars = c("ID", "group", "subgroup"))
df1 = df1[order(df1$group, df1$subgroup),]
df1$fact = paste(df1$subgroup, df1$ID)
df1$fact = factor(df1$fact, levels = unique(df1$fact))
#PLOT
library(ggplot2)
ggplot(df1, aes(x = variable, y = fact, fill = value)) +
geom_tile() +
scale_y_discrete(labels = df1$subgroup[!duplicated(df1$ID)])
The plot is exactly what I want except for the fact that the labels S1, S2, S3, and S4 repeat 10 times each. Is there a way to display them only one time and then put some kind of break between S1, S2, S3, and S4.
I am also curious if there is way to put group to the left of subgroup in the plot as a secondary y-axis but that is optional.
You can use facet_grid which would address both having a subgroup indicator on the y-axis and a white space separation between the subgroups.
You can also remove y-axis labels in theme to avoid redundancy.
ggplot(df1, aes(x = variable, y = fact, fill = value)) +
geom_tile() +
facet_grid(subgroup~., scales="free_y") +
theme(axis.text.y = element_blank())
Note: scales="free_y" is necessary because fact is not identical across subgroups, see output if this parameter is absent.
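For the optional part of the question (showing group to the left of subgroup), one thing that might be worth trying is faceting by both variables and moving the strips to the left with switch = "y". This is only a sketch on a small made-up data set (df_long and its values are hypothetical, shaped like the melted data from the question); space = "free_y" is added so tiles keep the same height across panels.

```r
library(ggplot2)

set.seed(42)
# Small stand-in for the melted data: 8 IDs x 3 variables
df_long <- expand.grid(ID = paste0("I", 1:8), variable = letters[1:3],
                       stringsAsFactors = FALSE)
idx <- as.integer(sub("I", "", df_long$ID))
df_long$subgroup <- paste0("S", (idx - 1) %/% 2 + 1)  # S1..S4, two IDs each
df_long$group <- ifelse(idx <= 4, "Dry", "Rain")      # S1-S2 dry, S3-S4 rain
df_long$value <- rnorm(nrow(df_long))

p <- ggplot(df_long, aes(x = variable, y = ID, fill = value)) +
  geom_tile() +
  # group + subgroup gives nested strips; switch = "y" draws them on the left
  facet_grid(group + subgroup ~ ., scales = "free_y", space = "free_y",
             switch = "y") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        strip.placement = "outside")
```

Each subgroup label then appears once, next to its block of tiles, with the group label beside it.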
