A way to always dodge a histogram? [duplicate] - r

Using ggplot2 I'm creating a histogram with a factor on the horizontal axis and another factor for the fill color, using a dodged position. My problem is that the fill factor sometimes takes only one value for a value of the horizontal factor, and with nothing to dodge the bar takes up the full width. Is there a way to make it dodge nothing so that all bar widths are the same? Or equivalently to plot the 0's?
For example
ggplot(data = mtcars, aes(x = factor(carb), fill = factor(gear))) +
geom_histogram(position = "dodge")
This answer has a couple of ideas, but it was asked before the new version was released, so maybe something has changed? I don't like using facets (also shown here) for my situation, and while editing the data and using geom_bar could work, it feels inelegant. Moreover, when I tried faceting anyway
ggplot(mtcars, aes(x = factor(carb), fill = factor(gear))) +
geom_bar() + facet_grid(~factor(carb))
I get the error "Error in layout_base(data, cols, drop = drop):
At least one layer must contain all variables used for facetting"
I suppose I could generate a data frame of counts and then use geom_bar,
library(plyr)  # provides ddply() and count()
mtcounts <- ddply(subset(mtcars, select = c("carb", "gear")),
                  .fun = count, .variables = c("carb", "gear"))
filling out the levels that aren't present with 0's. Does anyone know if that would work or if there's a better way?
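One way to sketch the "fill in the zeros" idea in base R: table() keeps every combination of factor levels, including those that never occur, so converting it to a data frame yields explicit zero counts. The name mtcounts_full is only an illustrative choice, not something from the original post.

```r
library(ggplot2)

# Cross-tabulate both factors; combinations that never occur get a count of 0.
mtcounts_full <- as.data.frame(
  table(carb = factor(mtcars$carb), gear = factor(mtcars$gear))
)

# Every carb level now has one row per gear level, so each dodge slot exists.
p <- ggplot(mtcounts_full, aes(x = carb, y = Freq, fill = gear)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

Because the zero-height bars still occupy their slot, all visible bars end up with the same width.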

Updated geom_bar needs stat = "identity"
I'm not sure if this is too late for you, but see the answer to a recent post here
That is, I'd take Joran's advice to pre-calculate the counts outside the ggplot call and to use geom_bar. As in the answer to the other post, the counts are obtained in two steps: first, dcast produces a cross-tabulation of counts; second, melt reshapes the cross-tabulation back into long format.
library(ggplot2)
library(reshape2)
dat = dcast(mtcars, factor(carb) ~ factor(gear), fun.aggregate = length)
dat.melt = melt(dat, id.vars = "factor(carb)", measure.vars = c("3", "4", "5"))
dat.melt
(p <- ggplot(dat.melt, aes(x = `factor(carb)`, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge"))
The chart:

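As shown in this answer, in newer versions of ggplot2 (3.0.0 and later; the feature first appeared in development version 2.2.1.9000) there is a simpler way: position_dodge gains a preserve argument that, if set to "single", keeps every bar the same width even when there is nothing to dodge.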
ggplot(data = mtcars, aes(x = factor(carb), fill = factor(gear))) +
geom_bar(position = position_dodge(preserve = "single"))

Related

Make x-axis appear in a particular order in ggplot [duplicate]

I have a dataset:
data <- c('real','real','real','real','real','pred','pred','pred','pred','pred','real','real','real','real','pred','pred','pred','pred')
threshold <- c('>=1','>=2','>=3','>=4','>=101','>=1','>=2','>=3','>=4','>=101','>=1','>=2','>=3','>=4','>=1','>=2','>=3','>=4')
accuracy <- c(63.4,64.4,65.1,64.3,65.4,62.1,63.6,64.1,65.4,64.8,62.2,63.3,64.4,65.6,63.1,63.8,64.6,65.1)
types<-c('morning','morning','morning','morning','morning','morning','morning','morning','morning','morning','evening','evening','evening','evening','evening','evening','evening','evening')
df <- data.frame(data,threshold,accuracy,types)
I want to plot 'data' column as stacked barplot for morning and evening separately. So I use facet wrap. My code for plotting is:
ggplot(df, aes(x = threshold, y = accuracy)) + geom_bar(aes(fill = data), stat = "identity", color = "white",position = position_dodge(0.9))+
facet_wrap(~types) +
fill_palette("jco")
And the plot I get looks like:
However, as you can see the order of threshold got messed up. I want the order for morning to look like:
'>=1','>=2','>=3','>=4','>=101'
And the order for evening should be:
'>=1','>=2','>=3','>=4'
So I have three questions:
1. How can I enforce the order using my code?
2. Also, for evening I shouldn't be getting '>=101', so how can I remove that from the plot?
3. Is there a way to make the background white but keep the grid?
And on a slightly unrelated note, can you point at a graph type that might be slightly better looking than this? I am new at visualisation so I am still learning.
Insights will be appreciated.
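You can set the factor levels of threshold to control the order of the x axis.
Then add scales = 'free' in facet_wrap to drop >=101 from the evening panel (scales = 'free_x' would do the same while keeping a shared y axis),
and add theme_bw() for a white background that keeps the grid.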
library(dplyr)   # %>% and mutate()
library(ggplot2)
library(ggpubr)  # fill_palette()
df %>%
  mutate(threshold = factor(threshold, levels = c('>=1','>=2','>=3','>=4','>=101'))) %>%
  ggplot(aes(x = threshold, y = accuracy)) +
  geom_bar(aes(fill = data), stat = "identity", color = "white", position = position_dodge(0.9)) +
  facet_wrap(~types, scales = 'free') +
  theme_bw() +
  fill_palette("jco")
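If the thresholds change often, the level order could also be derived from the numbers instead of being typed by hand. A minimal sketch, assuming every label has the form '>=' followed by a number (threshold_levels and threshold_f are names made up for this example):

```r
threshold <- c(">=1", ">=2", ">=3", ">=4", ">=101", ">=1", ">=2", ">=3", ">=4")

# Strip the ">=" prefix and sort the unique labels by their numeric value.
labels <- unique(threshold)
threshold_levels <- labels[order(as.numeric(sub(">=", "", labels, fixed = TRUE)))]

# Use the numerically sorted labels as the factor levels.
threshold_f <- factor(threshold, levels = threshold_levels)
levels(threshold_f)
```

The resulting factor can be used in place of the hand-written levels in the mutate() call.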

Adding category labels on a stacked bar chart in R

I have made a geom_col() chart in ggplot2 to which I would like to add labels on the bars, but only one per stacked bar.
This is what my chart looks like without the labels:
gI <- df %>%
ggplot(aes(fill = Category, y=csum, x= tijdas)) +
geom_col()
plot(gI)
And now comes the tricky part. Every stacked bar corresponds to a specific Meeting_Type, and I would like to add this to the plot. I have tried adding
geom_text(aes(label = Meeting_Type), position = position_dodge(width=0.9), vjust=-0.25)
But this resulted in a label for every category within each bar (so a lot of text):
I would like only one label per (stacked) bar that indicates the Meeting_Type, preferably in a readable place.
It's difficult to know the best approach without seeing your data, but replacing geom_text with stat_summary may be a good way of summing each column to get the position of the label:
library(ggplot2)
library(dplyr)
mtcars %>%
ggplot(aes(factor(cyl), y = mpg, fill = factor(gear))) +
geom_col() +
stat_summary(aes(label = factor(cyl), fill = NULL),
fun = sum,
geom = "text", vjust = -0.25)
Created on 2020-12-15 by the reprex package (v0.3.0)
As I say, you may need a different function for your data. If this isn't a solution, post a sample using dput and we'll see if we can make it work for your data.
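An alternative sketch is to summarise the data to one row per bar and hand that to geom_text. The data frame below is a hypothetical stand-in that only borrows the column names from the question, and it assumes each bar has exactly one Meeting_Type; bar_labels is an illustrative name.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data (column names from the question).
df <- data.frame(
  tijdas       = rep(c("09:00", "10:00", "11:00"), each = 2),
  Category     = rep(c("A", "B"), times = 3),
  csum         = c(2, 3, 1, 4, 5, 2),
  Meeting_Type = rep(c("Standup", "Review", "Planning"), each = 2)
)

# One row per bar: the bar's total height and its single meeting type.
bar_labels <- aggregate(csum ~ tijdas + Meeting_Type, data = df, FUN = sum)

# The text layer uses the summarised data, so it draws one label per bar.
p <- ggplot(df, aes(x = tijdas, y = csum)) +
  geom_col(aes(fill = Category)) +
  geom_text(data = bar_labels, aes(label = Meeting_Type), vjust = -0.25)
print(p)
```

Keeping fill inside geom_col means the text layer does not need a Category column.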

ggplot2's geom_text refusing to dodge

I have a pretty straightforward dataset consisting of a week of two totals in groups, which I'm displaying in an identity bar plot using ggplot2 (version 3.3.0).
library(ggplot2)
library(lubridate)
weeksummary <- data.frame(
Date = rep(as.POSIXct("2020-01-01") + days(0:6), 2),
Total = rpois(14, 30),
Group = c(rep("group1", 7), rep("group2", 7))
)
ggplot(data = weeksummary, mapping = aes(x = Date, y = Total, fill = Group)) +
geom_col(position = "dodge") +
geom_text(aes(label = Total), position = position_dodge(width = 0.9), size = 3)
I cannot for the life of me get this to put the numbers at the top of their own bars. I've been hunting around for an answer and trying everything I found with no luck, until I randomly tried this:
weeksummary$Date <- as.factor(weeksummary$Date)
But this seems unnecessary manipulation, and I'd need to make sure the dates appear in the right format and order and rewrite the additional bits that currently rely on dates... I'd rather understand what I'm doing wrong.
What you're looking for is as.Date.POSIXct. as.factor() does make the dodging work, but it converts your POSIXct values to character first (thus erasing the "date"). The underlying issue is the unit of the x scale: a POSIXct axis is measured in seconds, so position_dodge(width = 0.9) shifts the labels by less than a second, whereas on a Date axis one unit is one day, which matches the spacing of the bars.
You can either convert before (e.g. weeksummary$Date <- as.Date.POSIXct(weeksummary$Date)), or do it right in your plot call:
ggplot(weeksummary, aes(x = as.Date.POSIXct(Date), y = Total, fill = Group)) +
geom_col(position = 'dodge') +
geom_text(aes(label = Total, y = Total + 1),
position = position_dodge(width = 0.9), size = 3)
Giving you this:
Note: the values are different than your values, since our randomization seeds are likely not the same :)
You'll notice I nudged the labels up a bit. You can normally do this with nudge_y, but you cannot specify nudge_x or nudge_y at the same time as a position= argument. In this case, you can just nudge by overwriting the y aesthetic.
This happens because geom_text inherits the x aesthetic, which is Date in this case; that is entirely correct behaviour. You don't have to mutate your data frame; you can specify the conversion when plotting instead:
aes(x = factor(Date), y = ...),
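The difference between the two classes can be seen in a short check. This sketch builds its own seven-day vector in UTC (an assumption made to keep it independent of lubridate and the local time zone) and compares the numeric spacing of neighbouring values, which is the unit that position_dodge(width = 0.9) works in.

```r
# Seven consecutive midnights, stored as POSIXct (seconds since the epoch).
dates_ct <- as.POSIXct("2020-01-01", tz = "UTC") + 86400 * 0:6

# The same days stored as Date (days since the epoch).
dates_d <- as.Date(dates_ct)

# Numeric spacing between neighbouring bars in each representation.
step_ct <- unique(diff(as.numeric(dates_ct)))  # seconds between bars
step_d  <- unique(diff(as.numeric(dates_d)))   # days between bars
c(POSIXct = step_ct, Date = step_d)
```

A dodge width of 0.9 is negligible against the POSIXct spacing but comparable to the Date spacing, which is why the labels only separate after the conversion.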

How to reduce binwidth in geom_bar for one single bar?

I'm trying to get a side-by-side bar plot using ggplot's geom_bar(). Here's some sample data I made up for replication purposes:
dat <- data.frame("x"=c(rep(c(1,2,3,4,5),5)),
"by"=c(NA,0,0,0,0,NA,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
I want to plot "x" grouped by "by". Because I don't need to plot NA values, I filtered with !is.na(by):
library(dplyr)
dat <- filter(dat, !is.na(by))
Now for the plot:
library(ggplot2)
library(ggthemes)  # theme_tufte() comes from ggthemes
ggplot(dat, aes(x=x, fill=as.factor(by))) + geom_bar(position="dodge") + theme_tufte()
This returns almost what I need. Unfortunately, the first bar looks really weird because its binwidth is twice that of the others (there are no zeros in "by" for "x"==1).
Is there a way to reduce the binwidth for the first bar back to "normal"?
You could also do it like this. Precalculate the table and use geom_col.
ggplot(as.data.frame(table(dat)), aes(x = x, y = Freq, fill = by)) +
theme_bw() +
geom_col(position = "dodge")
Never mind, I just figured out that you can manipulate the binwidth argument using an ifelse statement.
...geom_bar(..., binwidth = ifelse("by"==1 & is.na("x"), .5, 1)))
So if you play around with this, it will work. At least it worked for me.
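In ggplot2 3.0.0 and later there is another option that avoids touching the data: position_dodge(preserve = "single") keeps each bar at the width of a single dodged element. A sketch using the sample data from the question (the theme is omitted, and bar_widths is just an illustrative name for checking the result):

```r
library(ggplot2)

# Sample data from the question, with the NA rows dropped.
dat <- data.frame(
  x  = rep(1:5, 5),
  by = c(NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, rep(1, 15))
)
dat <- dat[!is.na(dat$by), ]

# preserve = "single" sizes every bar as one dodge slot, even at x == 1
# where only one "by" group is present.
p <- ggplot(dat, aes(x = factor(x), fill = factor(by))) +
  geom_bar(position = position_dodge(preserve = "single"))

# Inspect the computed bar widths; they should all be identical.
bar_widths <- with(layer_data(p), xmax - xmin)
print(p)
```

With this setting the lone bar at x == 1 sits in one slot rather than spanning the full width; position_dodge2(preserve = "single") is a variant that centres it instead.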

Looping over variables in ggplot to create a grid of density distributions for each variable

I want to create a grid of density distribution plots, with a dashed vertical line at the mean, for multiple variables I have in a dataset. Using mtcars dataset as an example, the code for a single variable plot would be:
ggplot(mtcars, aes(x = mpg)) + geom_density() +
  geom_vline(aes(xintercept = mean(mpg)), linetype = "dashed", size = 0.6)
I am unclear about how I alter this to make it loop over specified variables in my dataset and produce a grid with the plots of each one. It seems like it would involve some combination of adding facet_grid and the "vars" argument but I have tried a number of combinations with no success.
It seems like in all the examples I can find online, facet_grid splits the plots by subsets of a variable, while keeping the same x and y for each plot, but I want to have the plot of x vary in each graph and the y is the density of values.
In trying to solve this, it is also my understanding that the new release of ggplot includes something involving "quasiquotation" which may help solve my problem (https://www.tidyverse.org/articles/2018/07/ggplot2-tidy-evaluation/) but again, I couldn't quite figure out how to apply the examples provided here to my own issue.
Consider reshaping the data into long format and then plotting with facets. Here both the x and y scales are free, since the variables differ in magnitude across the columns.
rdf <- reshape(mtcars, varying = names(mtcars), v.names = "value",
times = names(mtcars), timevar = "variable",
new.row.names = 1:1000, direction = "long")
ggplot(rdf, aes(x = value)) + geom_density() +
geom_vline(aes(xintercept = mean(value)), linetype = "dashed", size = 0.6) +
facet_grid(~variable, scales="free")
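One caveat with the code above: as far as I can tell, aes(xintercept = mean(value)) is evaluated over the whole data frame rather than per panel, so every facet would get the same pooled mean. A sketch of one workaround is to compute the means per variable first and pass them to geom_vline as its own data. Here stack() is used as a compact base R alternative to reshape(), long and means are illustrative names, and linewidth assumes ggplot2 3.4.0 or later (use size on older versions).

```r
library(ggplot2)

# Long format: one row per observation; stack() names the columns values/ind.
long <- stack(mtcars)

# One mean per variable, to be drawn in that variable's panel only.
means <- aggregate(values ~ ind, data = long, FUN = mean)

# The vline layer gets its own data, so each panel shows its own mean.
p <- ggplot(long, aes(x = values)) +
  geom_density() +
  geom_vline(data = means, aes(xintercept = values),
             linetype = "dashed", linewidth = 0.6) +
  facet_wrap(~ind, scales = "free")
print(p)
```

facet_wrap is used instead of facet_grid so the eleven panels wrap into a grid rather than a single row.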
