ggplot2: axis markers when grouping by two variables - r

I have a simple bar graph in ggplot, with two factor variables on the x axis:
library(ggplot2)
dat <- data.frame(group1= c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
group2= rep(1:4,4),
val = 1:16)
ggplot(dat, aes(x=group1,y=val,group=group2))+
geom_bar(stat="identity", position="dodge")
What is the simplest way to add a second x axis label (for group2)? There is a more complex version of this question here, but I don't see how to apply this logic to this simple case.

As suggested at the question posted by Jimbou, one solution is:
ggplot(dat, aes(y=val,x=group2))+
geom_bar(stat="identity")+
facet_grid(.~group1,scales="free")
I'd be curious to know whether there is another solution using annotate, as also suggested in that question, that works in the case in which the grouping variables are two factors.
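Building on the facet approach above, one possible refinement (a sketch, not taken from the original answers) is to move the facet strips below the panels so that they read like a second x-axis label. It assumes both grouping variables are converted to factors, and the axis title text is only illustrative.

```r
library(ggplot2)

# Same data as in the question, with both grouping variables as factors
dat <- data.frame(group1 = rep(1:4, each = 4),
                  group2 = rep(1:4, times = 4),
                  val    = 1:16)
dat$group1 <- factor(dat$group1)
dat$group2 <- factor(dat$group2)

p <- ggplot(dat, aes(x = group2, y = val)) +
  geom_col() +
  # switch = "x" puts the group1 strips at the bottom of each panel
  facet_grid(. ~ group1, switch = "x") +
  theme(strip.placement  = "outside",        # strips sit below the axis text
        strip.background = element_blank(),  # plain text, no grey box
        panel.spacing    = unit(0, "lines")) +
  labs(x = "group2 (top row) within group1 (bottom row)")

print(p)
```

The panels are still facets, so this is a visual trick rather than a true second axis.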

Related

Unexpected behaviour when re-ordering facets (for ggplot2)

I'd like some help understanding an error so that it can't happen again.
I was producing some (gg) plots and wanted to change the order of facets for aesthetic reasons. The way I did this had unexpected consequences and almost slipped through the net when I was checking the results - it could have caused serious problems with the article I'm working on!
I wanted to re-order the facets based on a numerical vector that I could define up-front, e.g. facet_order=c(1,2,4,3). This was so the graph syntax could be copied / pasted for repeat graphs more easily and I wouldn't have to dig around too much in the code each time.
# some example data:
df <- data.frame(x=c(1,2,3,4), y=c(1,2,3,4), facet_var=factor(c('A','B','C','D')))
# First plot (facet order defined by default):
ggplot(df, aes(x,y))+geom_point()+facet_wrap(~facet_var, nrow = 1)+labs(title='Original data')
In the second plot, facets 'C' and 'D' are swapped as intended:
# reorder facets (normal method)
df$facet_var2 <- factor(df$facet_var, levels=c('A','B','D','C')) # Set the facets var as a factor, to define the order
# Second plot:
ggplot(df, aes(x,y))+geom_point()+facet_wrap(~facet_var2, nrow = 1)+labs(title='Re-ordered facets', subtitle='working as expected')
However, this is the mistake I made:
# different syntax to reorder the facets
df$facet_var3 <- df$facet_var # duplicate the faceting variable
levels(df$facet_var3) <- levels(df$facet_var3)[c(1,2,4,3)] # I thought I was just re-ordering the levels here
# Third plot:
ggplot(df, aes(x,y))+geom_point()+facet_wrap(~facet_var3, nrow = 1)+labs(title='Re-ordered facets (method 2)', subtitle='Unexpected behaviour')
In the third graph, it looks like the data doesn't move, but the facet labels do, which is obviously wrong.
Digging a bit deeper, it appears that my syntax changed not only the order of the factor, but actually the underlying data in the factor variable. Is this behaviour expected?
Here's the crux of it:
facet_order <- c(1,2,4,3)
levels(df$facet_var) <- levels(df$facet_var)[facet_order] # bad
df$facet_var <- factor(df$facet_var, levels=levels(df$facet_var)[facet_order]) # good
Obviously I now know the solution but I'm still unclear what I actually did wrong here. Any pointers?
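Quick and dirty: reorder the levels after the fact with fct_relevel() from {forcats} (part of the tidyverse), which moves the named levels into the given order without touching the data:
library(forcats)
ggplot(df, aes(x,y)) +
geom_point() +
facet_wrap(~ fct_relevel(facet_var, 'B','A','D','C'),
nrow = 1)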
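As for what went wrong with the "bad" line in the question: a factor stores integer codes plus a vector of labels, and assigning to levels() only swaps the labels while the codes stay put. The following base-R sketch (variable names are illustrative) contrasts the two lines:

```r
f <- factor(c('A', 'B', 'C', 'D'))
facet_order <- c(1, 2, 4, 3)

# "bad": levels<- renames the labels; the integer codes are untouched,
# so the observations that were 'C' are now called 'D' and vice versa
relabelled <- f
levels(relabelled) <- levels(relabelled)[facet_order]

# "good": factor(..., levels = ) keeps each value and only changes
# the order in which the levels are stored (and hence plotted)
reordered <- factor(f, levels = levels(f)[facet_order])

as.character(relabelled)  # "A" "B" "D" "C"  (data changed)
as.character(reordered)   # "A" "B" "C" "D"  (data intact)
as.integer(relabelled)    # 1 2 3 4
as.integer(reordered)     # 1 2 4 3
```

So yes, the behaviour is expected: `levels(x) <- value` is a renaming operation, which is why the labels moved but the points did not.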

How to obtain a 'normal' boxplot? (R)

I was trying to make a boxplot in R following the many guides that I found online (such as this one: http://www.sthda.com/english/wiki/ggplot2-box-plot-quick-start-guide-r-software-and-data-visualization) using my dataframe:
library(ggplot2)
value=c('2000000','115000','500000','20000','3000','1000000')
condition=c('C','C','C','H','H','H')
df=data.frame(value,condition)
df$value=as.factor(df$value)
ggplot(df, aes(x=condition, y=value))+
geom_boxplot()
However, following these steps, my result is similar to this figure:
https://i.stack.imgur.com/HloKG.png
I can't figure out why ggplot cannot understand that I'm using two conditions!
Thanks for your help
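Why are your value values character (originally) or factor (after as.factor)? They need to be numeric for a boxplot y axis. Because value is already a factor at this point, convert via character first; as.numeric() on a factor returns the level codes, not the numbers.
library(ggplot2)
df$value <- as.numeric(as.character(df$value))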
ggplot(df, aes(x = condition, y = value))+
geom_boxplot()
The value attribute should be numerical, not a factor:
df$value=as.numeric(as.character(df$value))
Then you will have two boxplots, one per condition.
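To illustrate why the conversion has to go through as.character(), here is a small base-R sketch using the values from the question (the names codes and nums are just for illustration):

```r
value <- c('2000000', '115000', '500000', '20000', '3000', '1000000')
f <- as.factor(value)

# as.numeric() on a factor returns the internal level codes (1..6),
# which would silently give a boxplot of meaningless small integers
codes <- as.numeric(f)

# going via character recovers the actual numbers
nums <- as.numeric(as.character(f))

print(codes)
print(nums)
```

If the data are still character (not yet a factor), a plain as.numeric(value) is enough.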

R - Bar Plot with transparency based on values?

I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a Spaghetti Plot when I have a lot more Samples added, which makes the information hard to understand, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to what is in myData$y. I've tried to represent that data in photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll go ahead and post the pretty simple solution to my question here for others to find.
Basically, I needed to apply the alpha aesthetic via aes(alpha=y, ...). In theory, I could apply this over any geom. I tried geom_col(), which worked, but the best solution was to use geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" up the segments in order to avoid the problem of overplotting similar to those found here, here, and here.
ggplot(myData, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', size=14)
That gives us the nice gradient:
Since the max y values are not the same for both lines, I normalized the data (myDataNorm) to "match" the intensities and could then make the same plot. In my particular case, I preferred bars that did not have a gradient, but which showed a hard edge for the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', size=14) +
theme(legend.position='none')
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to apply to the x aesthetic, so the parts of the segment drawn were only those with high enough y values. Note my data "starts" at x=290 here. Probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(
x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
y=Sample, yend=Sample), color='blue3', size=14) +
xlim(290,400) # needed to show entire scale
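The post does not show how myDataNorm was built. One plausible way, sketched here under the assumption that each Sample is scaled to its own maximum, uses base R's ave(); the small data frame is a made-up stand-in for the elided myData.

```r
# Hypothetical stand-in for the long-format myData (x, Sample, y)
myData <- data.frame(
  x      = rep(290:294, times = 2),
  Sample = rep(c("X52241", "X75123"), each = 5),
  y      = c(1, 4, 8, 4, 1, 2, 10, 20, 10, 2)
)

# Divide every y by the maximum y of its own Sample,
# so each sample peaks at exactly 1
myDataNorm <- myData
myDataNorm$y <- ave(myData$y, myData$Sample, FUN = function(v) v / max(v))

# Rows that would be drawn with the y > 0.9 threshold used above
myDataNorm[myDataNorm$y > 0.9, ]
```

Other normalizations (for example scaling to the global maximum) would change which parts of each bar pass the 0.9 threshold.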

ggplot: How to increase space between axis labels for categorical data?

I love ggplot, but find it hard to customize some elements such as X axis labels and grid lines. The title of the question says it all, but here's a reproducible example to go with it:
Reproducible example
library(ggplot2)
library(data.table) # provides data.table() and melt() used below
# Make a dataset
set.seed(123)
x1 <- c('2015_46','2015_47','2015_48','2015_49'
,'2015_50','2015_51','2015_52','2016_01',
'2016_02','2016_03')
y1 <- runif(10,0.0,1.0)
y2 <- runif(10,0.5,2.0)
# Make the dataset ggplot friendly
df_wide <- data.table(x1, y1, y2)
df_long <- melt(df_wide, id.vars = 'x1')
# Plot it
p <- ggplot(df_long, aes(x=x1,
y=value,
group=variable,
colour=variable )) + geom_line(size=1)
plot(p)
# Now, plot the same thing with the same lines and numbers,
# but with increased space between x-axis labels
# and / or space between x-axis grid lines.
The plot looks like this, and doesn't look too bad in its current form:
Plot1
The problem occurs when the dataset gets bigger, and the labels on the x-axis start overlapping each other like this:
Plot2
What I've tried so far:
I've made several attempts using scale_x_discrete as suggested here, but I've had no luck so far. What really bugs me is that I saw some tutorial about these things a while back, but despite two days of intense googling I just can't find it. I'm going to update this section when I try new things.
I'm looking forward to your suggestions!
As mentioned above, assuming that x1 represents a year_day, ggplot provides sensible defaults for date scales.
First make x1 into a valid date format, then plot as you already did:
df_long$x1 <- as.Date(as.character(df_long$x1), format="%Y_%j")
ggplot(df_long, aes(x=x1, y=value, group=variable, colour=variable)) +
geom_line(size=1)
The plot looks a little odd because of the disconnected time series, but scale_x_date() provides an easy way to customize the axis:
http://docs.ggplot2.org/current/scale_date.html
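As a rough sketch of what that customization could look like, the example below parses a few of the labels as year_day (the same assumption as in the answer) and sets the break spacing and label format explicitly. The data values and the "3 months" spacing are made up for illustration.

```r
library(ggplot2)

# A few of the original labels, interpreted as year_dayOfYear
x1 <- c('2015_46', '2015_52', '2016_01', '2016_03')
dates <- as.Date(x1, format = "%Y_%j")

df <- data.frame(date = dates, value = c(0.3, 0.8, 0.4, 0.9))

p <- ggplot(df, aes(x = date, y = value)) +
  geom_line() +
  # one tick every 3 months, labelled like "Feb 2015"
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y")

print(p)
```

With a continuous date scale, ggplot2 chooses the number of ticks itself, so labels no longer pile up as the dataset grows.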

colour single ggplot axis item

I have created a chart and am wanting to colour one of the x-axis items based on a variable. I have seen this post (How to get axis ticks labels with different colors within a single axis for a ggplot graph?), but am struggling to apply it to my dataset.
df1 <- data.frame(var=c("a","b","c","a","b","c","a","b","c"),
val=c(99,120,79,22,43,53,12,27,31),
type=c("alpha","alpha","alpha","bravo","bravo","bravo","charlie","charlie","charlie"))
myvar="a"
ggplot(df1,aes(x=reorder(var,-val), y=val,fill=type)) + geom_bar(stat="identity")
Any tips on how to make the x-axis value red when it is equal to myvar?
Update: Thanks to @ddiez for some guidance. I finally came around to the fact that I would have to reorder prior to plotting. I also should have made my original example with data.table, so I am not sure whether this would have influenced the original responses. I modified my original dataset to be a data.table and used the following code to achieve success.
library(data.table)
df1 <- data.table(var=c("a","b","c","a","b","c","a","b","c"),
val=c(99,120,79,22,43,53,12,27,31),
type=c("alpha","alpha","alpha","bravo","bravo","bravo","charlie","charlie","charlie"))
myvar="a"
df1[,axisColour := ifelse(var==myvar,"red","black")]
df1$var <- reorder(df1$var,-df1$val,sum)
setkey(df1,var,type)
ggplot(df1,aes(x=var, y=val,fill=type)) + geom_bar(stat="identity") +
theme(axis.text.x = element_text(colour=df1[,axisColour[1],by=var][,V1]))
There may be a more elegant solution but a quick hack (requires you to know the final order) would be:
ggplot(df1,aes(x=reorder(var,-val), y=val,fill=type)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(colour=c("black","black","red")))
A solution using the variable myvar (yet, there may be better ways):
# reorder the factors in the data.frame (instead of in situ).
df1$var=reorder(df1$var, -df1$val)
# create a vector of colors for each level.
mycol=rep("black", nlevels(df1$var))
names(mycol)=levels(df1$var)
# assign the desired ones a different color.
mycol[myvar]="red"
ggplot(df1,aes(x=var, y=val,fill=type)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(colour=mycol))
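Both answers rely on the colour vector lining up with the order of the axis labels, since the colours appear to be applied by position rather than by name. The base-R sketch below (the helper name lv is illustrative) derives that order from the reordered factor and checks that it matches the hard-coded vector from the quick hack:

```r
df1 <- data.frame(var = rep(c("a", "b", "c"), 3),
                  val = c(99, 120, 79, 22, 43, 53, 12, 27, 31))
myvar <- "a"

# reorder() sorts levels by the mean of -val, i.e. largest mean val first
df1$var <- reorder(df1$var, -df1$val)
lv <- levels(df1$var)

# one colour per level, in the same order as the axis labels
mycol <- ifelse(lv == myvar, "red", "black")
names(mycol) <- lv

print(mycol)
```

Recent ggplot2 versions warn that vectorised input to element_text() is not officially supported, so this approach may change in future releases.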
