ggplot geom_bar leave blank spaces for 0 values by group - r

Below is a simple ggplot bar plot:
library(ggplot2)
x <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
y <- c(1,2,3,4,5,3,3,3,3,4,5,5,6,7,6,5,4,3,2,3,4,5,3,2,1,1,1,1,1)
# ggplot() needs a data frame; cbind() would return a matrix
d <- data.frame(x, y)
ggplot(data = d, aes(x = x, fill = as.factor(y))) +
  geom_bar(position = position_dodge())
The issue I'm having is that not every value of y is present in each x grouping. For example, group 1 along the x-axis only contains values 1-5 of the y variable and has no observations for 6 or 7. I would like the plot to leave a blank space wherever a y value has no observations in a given x group, so that the x groups are easier to compare.

One solution is to compute the frequencies manually and plot the graph from that frequency table.
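library(ggplot2)
# Cross-tabulate x and y; as.data.frame(d) also covers the case where d is a matrix
d1 <- as.data.frame(table(as.data.frame(d)))
d1$x <- factor(d1$x)
# Zero counts stay in d1, so their slots remain empty in the dodged plot
ggplot(d1, aes(x, Freq, fill = factor(y))) +
  geom_bar(stat = "identity", position = position_dodge())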
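To confirm that the frequency table really contains the empty combinations, it can help to inspect it before plotting. The sketch below rebuilds the example data from the question and counts the zero cells; the name freq is only for illustration.

```r
# Rebuild the example data from the question as a data frame
x <- c(rep(1, 9), rep(2, 11), rep(3, 9))
y <- c(1,2,3,4,5,3,3,3,3,4,5,5,6,7,6,5,4,3,2,3,4,5,3,2,1,1,1,1,1)
d <- data.frame(x, y)

# table() cross-tabulates every x value against every y value,
# so combinations that never occur get a count of 0
freq <- as.data.frame(table(d))

nrow(freq)              # 3 x groups times 7 y values
freq[freq$Freq == 0, ]  # the x/y pairs that become blank slots in the plot
```

The rows with Freq equal to 0 are exactly the positions that the dodged bar plot leaves empty.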

library(tidyverse)
# set factor levels so that every x/y combination is known, even if unobserved
d2 <- d %>%
  data.frame() %>%
  mutate(x = factor(x, levels = 1:3),
         y = factor(y, levels = 1:7))
# count frequencies (.drop = FALSE keeps zero-count groups) and send to ggplot2
d2 %>%
  group_by(x, y, .drop = FALSE) %>%
  tally() %>%
  ggplot(aes(x = x, y = n, fill = y, color = y)) +
  geom_bar(position = position_dodge2(), stat = "identity")
Another way to do this with dplyr is to count the frequencies with tally(), but you need to make sure your variables are set as factors first.
Using both color = y and fill = y in the aes() call helps to show exactly where the zero values sit on the plot: a zero-height bar still draws its outline. You can now see that y = 6 and y = 7 are missing from x = 1 and x = 3, and that y = 1 is missing from x = 2.
I chose position_dodge2() purely out of personal preference.

Related

Where does ggplot set the order of the color scheme?

I have a data set that I'm showing in a series of violin plots with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categorical variable was plotted alphabetically (I rotated the plot, so it appears alphabetically from bottom to top). I thought it would look better if I sorted them using the numeric variable.
When I do this, the color scheme doesn't turn out as I wanted it to. It's like R assigned the colors to the violins before it sorted them; after the sorting, they kept their original colors - which is the opposite of what I wanted. I wanted R to sort them first and then apply the color scheme.
I'm using the viridis color scheme here, but I've run into the same thing when I used RColorBrewer.
Here is my code:
# Start plotting
g <- ggplot(NULL)
# Violin plot
g <- g + geom_violin(data = df, aes(x = reorder(catval, -numval,
na.rm = TRUE), y = numval, fill = catval), trim = TRUE,
scale = "width", adjust = 0.5)
(snip)
# Specify colors
g <- g + scale_colour_viridis_d()
# Remove legend
g <- g + theme(legend.position = "none")
# Flip for readability
g <- g + coord_flip()
# Produce plot
g
Here is the resulting plot.
If I leave out the reorder() argument when I call geom_violin(), the color order is what I would like, but then my categorical variable is sorted alphabetically and not by the numeric variable.
Is there a way to get what I'm after?
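I think this is a reproducible example of what you're seeing. In the diamonds dataset, mean price does not follow the quality order of cut: "Very Good" diamonds have a slightly higher mean price than "Good" ones, "Fair" diamonds have a higher mean price than both, and "Ideal" diamonds have the lowest mean price.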
library(ggplot2)  # provides the diamonds dataset
library(dplyr)
diamonds %>%
  group_by(cut) %>%
  summarize(mean_price = mean(price))
# A tibble: 5 x 2
  cut       mean_price
  <ord>          <dbl>
1 Fair           4359.
2 Good           3929.
3 Very Good      3982.
4 Premium        4584.
5 Ideal          3458.
By default, reorder uses the mean of the sorting variable, so Good is plotted above Very Good. But the fill is still based on the un-reordered variable cut, which is a factor in order of quality.
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price, fill = cut)) +
geom_violin() +
coord_flip()
If you want the color to follow the ordering, then you could reorder upstream of ggplot2, or reorder in both aesthetics:
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price,
fill = reorder(cut, -price))) +
geom_violin() +
coord_flip()
Or
diamonds %>%
mutate(cut = reorder(cut, -price)) %>%
ggplot(aes(x = cut, y = price, fill = cut)) +
geom_violin() +
coord_flip()
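To see what reorder() actually changes, it may help to compare the factor levels before and after. This is a small sketch using the same diamonds data; cut_by_price is just an illustrative name.

```r
library(ggplot2)  # provides the diamonds dataset

# Original quality ordering of the factor
levels(diamonds$cut)

# Levels after reordering by descending mean price
# (reorder() sorts by the mean of its second argument by default)
cut_by_price <- reorder(diamonds$cut, -diamonds$price)
levels(cut_by_price)
```

Given the means in the table above, the reordered levels should start with Premium and end with Ideal. Because both the axis and the fill scale follow the level order, mapping the reordered factor to both aesthetics keeps positions and colors in step.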

How to always have fixed number of bins in geom_bar with missing values

How can I always get a fixed number of bars in a bar plot, no matter how many distinct values the variable has in the data? It must be a bar plot, not a histogram.
For example:
DF <- mtcars
ggplot(DF, aes(gear)) + geom_bar()
This produces three bars (for gear values 3 to 5). I would also like bars for the values 1 and 2, with a count of zero, so that the plot ends up with five bar positions: two with a count of 0 and three with the counts from the dataset.
You need to include the counts for all missing values of gear that you want. One way of achieving that is by using complete:
library(tidyverse)
DF <- mtcars %>%
  group_by(gear) %>%
  tally() %>%
  # add a row with n = 0 for every gear value from 1 up to the maximum
  complete(gear = 1:max(gear), fill = list(n = 0))
ggplot(DF, aes(x = gear, y = n)) + geom_bar(stat = 'identity')
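As a quick check of what complete() adds, the completed table can be printed before plotting. This is a sketch that assumes the desired range is 1 to 5; gear_counts is an illustrative name.

```r
library(dplyr)
library(tidyr)

gear_counts <- mtcars %>%
  count(gear) %>%                           # one row per observed gear value
  complete(gear = 1:5, fill = list(n = 0))  # add the unobserved values with n = 0

gear_counts
```

The result has one row for each gear value from 1 to 5, with n = 0 for the values 1 and 2 that never occur in mtcars.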
Alternatively, you can edit the properties of the x-axis to include 1 and 2: add scale_x_continuous() and manually define the breaks and the limits. However, you will not really see a bar for these values, because a bar of height zero is just a line.
library(tidyverse)
DF <- mtcars
ggplot(DF, aes(gear)) + geom_bar() +
scale_x_continuous(breaks = 1:5, limits = c(0.5,5.5))
Created on 2019-12-06 by the reprex package (v0.3.0)
Does this help?
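Another option, not taken from the answers above, is to treat gear as a factor with an explicit set of levels and ask the discrete x scale to keep unused levels. This sketch assumes that 1 to 5 is the full set of categories to show; gear_f is a hypothetical helper column.

```r
library(ggplot2)

# Assumption: the values 1 to 5 are the complete set of categories to display
DF <- mtcars
DF$gear_f <- factor(DF$gear, levels = 1:5)

p <- ggplot(DF, aes(gear_f)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE) +  # keep levels that have no observations
  labs(x = "gear")
p
```

With drop = FALSE the axis reserves a slot for every factor level, so the positions for 1 and 2 stay on the axis even though no bar is drawn there.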

How to filter ggplot bar graph to only show counts above a threshold

Say I use the following to produce the bar graph below. How would I only visualize bars where the count is above, say, 20? I'm able to do this kind of filtering with the which function on variables I create or that exist in my data, but I'm not sure how to access/filter the auto-counts generated by ggplot. Thanks.
g <- ggplot(mpg, aes(class))
g + geom_bar()
(Figure: simple bar graph)
Aggregating and filtering your data before plotting and using stat = "identity" is probably the easiest solution, e.g.:
library(tidyverse)
mpg %>%
group_by(class) %>%
count %>%
filter(n > 20) %>%
ggplot(aes(x = class, y = n)) +
geom_bar(stat = "identity")
You can try this: the automatically computed counts are available as ..count.. inside aes() (yes, I know it looks weird; see "Special variables in ggplot (..count.., ..density.., etc.)"). If you wrap that in an ifelse() that turns every count of 20 or less into NA, those bars are dropped and you have your plot (not very nice code, though).
g <- ggplot(mpg, aes(class))
g + geom_bar(aes(y = ifelse(..count.. > 20, ..count.., NA)))
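Since ggplot2 3.4.0 the ..count.. notation is deprecated in favour of after_stat(). A sketch of the same idea in the newer spelling, assuming a recent ggplot2 version, might look like this:

```r
library(ggplot2)

# Same threshold as above: keep a bar only if its computed count exceeds 20
p <- ggplot(mpg, aes(class)) +
  geom_bar(aes(y = after_stat(ifelse(count > 20, count, NA))))
p

# The counts that the stat computes, for comparison
class_counts <- table(mpg$class)
class_counts[class_counts > 20]
```

ggplot2 usually warns about removed rows when it drops the NA bars. Note that the dropped classes still keep their slots on the x-axis; pre-aggregating and filtering the data, as in the first answer, removes them from the axis as well.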

Boxplots grouped by group value

I have the following example data frame:
Parameter<-c("As","Hg","Pb")
Loc1<-c("1","10","12")
Loc2<-c("3","14","9")
Loc3<-c("5","12","8")
Loc4<-c("9","20","6")
x<-data.frame(Parameter,Loc1,Loc2,Loc3,Loc4)
x$Loc1<-as.numeric(x$Loc1)
x$Loc2<-as.numeric(x$Loc2)
x$Loc3<-as.numeric(x$Loc3)
x$Loc4<-as.numeric(x$Loc4)
The Parameter column holds the names of the heavy metal and Loc1 to Loc4 columns hold the measured value of the heavy metal at the individual location.
I need a plot with one boxplot for each heavy metal at each location. The location is the grouping value. I tried the following:
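library(reshape2)
library(ggplot2)
# Name the result x_long so it does not mask the melt() function
x_long <- melt(x, id = c("Parameter"))
ggplot(x_long) +
  geom_boxplot(aes(x = Parameter, y = value, colour = variable))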
However, the resulting plot did not group the boxplots by location as I expected.
A boxplot with one observation per Parameter per Location makes little sense (see my example at the end of my post). I assume you are in fact after a barplot.
You can do something like this
library(tidyverse)
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity")
Or with dodged bars
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity", position = "dodge")
To demonstrate why a boxplot makes little sense, let's show the plot
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, colour = Location)) +
geom_boxplot()
Note that with a single observation per group each box collapses to a single horizontal line. This is probably not what you want to show.
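In current versions of tidyr, gather() is superseded by pivot_longer(). A sketch of the dodged bar plot in that style, rebuilding the example data with numeric columns directly (the name long is illustrative), could be:

```r
library(tidyr)
library(ggplot2)

# Example data from the question, created as numeric from the start
x <- data.frame(
  Parameter = c("As", "Hg", "Pb"),
  Loc1 = c(1, 10, 12),
  Loc2 = c(3, 14, 9),
  Loc3 = c(5, 12, 8),
  Loc4 = c(9, 20, 6)
)

# One row per Parameter/Location pair
long <- pivot_longer(x, cols = -Parameter,
                     names_to = "Location", values_to = "Value")

# geom_col() is the shorthand for geom_bar(stat = "identity")
p <- ggplot(long, aes(x = Parameter, y = Value, fill = Location)) +
  geom_col(position = "dodge")
p
```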

Fill with three groups - how to center the third group and remove unused boxplot

How do I stop ggplot from drawing an empty boxplot when I have only three groups? My code is: ggplot(df, aes(group, value, fill=group)) + geom_boxplot()
It is hard to know for sure without seeing the data but it seems like you have four groups as follows:
library(ggplot2)
# Number of repetitions per group (arbitrary value chosen for this example)
n <- 10
# Make 3 repetitive groups
group <- rep(c("group_1","group_2","group_3"), n)
# Generate values for the defined groups
value <- rnorm(length(group), mean = 5, sd = 1)
# Data frame with one extra group that has a single value
df <- data.frame(group = c("group_01", group), value = c(5, value))
ggplot(df, aes(group, value, fill = group)) + geom_boxplot()
This simulated dataset produces a boxplot with a fourth, nearly empty group, which seems to match your case.
You should check for levels in your data frame and remove the ones that are not necessary:
# Check which groups are present (use levels() instead if group is a factor)
unique(df$group)
# Remove the unwanted group
df <- df[df$group != "group_01", ]
# Plot the cleaned df
ggplot(df, aes(group, value, fill = group)) + geom_boxplot()
Now the plot shows only the three groups.
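If the group column is stored as a factor, filtering rows does not remove the unused level by itself. The base R sketch below, with made-up group labels, shows the difference and how droplevels() cleans it up:

```r
# A factor with four levels, one of which is unwanted
group <- factor(c("group_01", "group_1", "group_2", "group_3"))

# Subsetting removes the value but keeps the level
kept <- group[group != "group_01"]
levels(kept)

# droplevels() discards levels that no longer occur
kept <- droplevels(kept)
levels(kept)
```

ggplot2's discrete scales drop unused levels by default, so the plot itself is usually unaffected, but clearing the level avoids surprises in tables and in scales set with drop = FALSE.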
