Remove unused facet combinations in 2-way facet_grid - r

I have two factors and two continuous variables, and I use this to create a two-way facet plot using ggplot2. However, not all of my factor combinations have data, so I end up with dummy facets. Here's some dummy code to produce an equivalent output:
library(ggplot2)
dummy<-data.frame(x=rnorm(60),y=rnorm(60),
col=rep(c("A","B","C","B","C","C"),each=10),
row=rep(c("a","a","a","b","b","c"),each=10))
ggplot(data=dummy,aes(x=x,y=y))+
geom_point()+
facet_grid(row~col)
This produces this figure
Is there any way to remove the facets that don't plot any data? And, ideally, move the x and y axis labels up or right to the remaining plots? As shown in this GIMPed version
I've searched here and elsewhere and, unless my search terms just aren't good enough, I can't find the same problem anywhere. Similar issues are often about unused factor levels, but here no factor level is unused, only factor level combinations. So facet_grid(drop=TRUE) or ggplot(data=droplevels(dummy)) doesn't help here. Combining the factors into a single factor and dropping unused levels of the new factor can only produce a 1-dimensional facet grid, which isn't what I want.
Note: my actual data has a third factor, which I represent by different point colours. Thus a single-plot solution allowing me to retain a legend would be ideal.

It's not too difficult to rearrange the graphical objects (grobs) manually to achieve what you're after.
Load the necessary libraries.
library(grid);
library(gtable);
Turn your ggplot2 plot into a grob.
gg <- ggplot(data = dummy, aes(x = x,y = y)) +
geom_point() +
facet_grid(row ~ col);
grob <- ggplotGrob(gg);
Working out which facets to remove, and which axes to move where depends on the grid-structure of your grob. gtable_show_layout(grob) gives a visual representation of your grid structure, where numbers like (7, 4) denote a panel in row 7 and column 4.
Remove the empty facets.
# Remove facets
idx <- which(grob$layout$name %in% c("panel-2-1", "panel-3-1", "panel-3-2"));
for (i in idx) grob$grobs[[i]] <- nullGrob();
Move the x axes up.
# Move x axes up
# axis-b-1 needs to move up 4 rows
# axis-b-2 needs to move up 2 rows
idx <- which(grob$layout$name %in% c("axis-b-1", "axis-b-2"));
grob$layout[idx, c("t", "b")] <- grob$layout[idx, c("t", "b")] - c(4, 2);
Move the y axes to the right.
# Move y axes right
# axis-l-2 needs to move 2 columns to the right
# axis-l-3 needs to move 4 columns to the right
idx <- which(grob$layout$name %in% c("axis-l-2", "axis-l-3"));
grob$layout[idx, c("l", "r")] <- grob$layout[idx, c("l", "r")] + c(2, 4);
Plot.
# Plot
grid.newpage();
grid.draw(grob);
Extending this to more facets is straightforward.
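As one way of extending it, the names of the empty panels need not be hard-coded. The following is a minimal sketch, assuming the facet_grid panels are named panel-<row>-<col> as in the layout used above (worth checking against grob$layout$name in your ggplot2 version); empty_panels is just an illustrative name.

```r
# Same dummy data as in the question
dummy <- data.frame(
  x = rnorm(60), y = rnorm(60),
  col = rep(c("A", "B", "C", "B", "C", "C"), each = 10),
  row = rep(c("a", "a", "a", "b", "b", "c"), each = 10)
)

# Cross-tabulate the two faceting variables; zero cells are the empty facets
counts <- table(dummy$row, dummy$col)
empty <- which(counts == 0, arr.ind = TRUE)

# Assumption: panels are named "panel-<row index>-<column index>"
empty_panels <- sprintf("panel-%d-%d", empty[, 1], empty[, 2])
empty_panels
```

empty_panels could then stand in for the hard-coded vector in the which(grob$layout$name %in% ...) step.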

Maurits Evers's solution worked great, but it is quite cumbersome to modify.
An alternative solution is to use facet_manual from {ggh4x}.
It is not strictly equivalent, since it behaves like facet_wrap rather than facet_grid, but it lets you place each facet where you want it.
# devtools::install_github("teunbrand/ggh4x")
library(ggplot2)
dummy<-data.frame(x=rnorm(60),y=rnorm(60),
col=rep(c("A","B","C","B","C","C"),each=10),
row=rep(c("a","a","a","b","b","c"),each=10))
design <- "
ABC
#DE
##F
"
ggplot(data=dummy,aes(x=x,y=y))+
geom_point()+
ggh4x::facet_manual(vars(row,col), design = design, labeller = label_both)
Created on 2022-02-25 by the reprex package (v2.0.0)

One possible solution, of course, would be to create a plot for each factor combination separately and then combine them using grid.arrange() from gridExtra. This would probably lose my legend and would be an all-around pain, so I would love to hear if anyone has a better suggestion.

This particular case looks like a job for ggpairs (link to a SO example). I haven't used it myself, but for paired plots this seems like the best tool for the job.
In a more general case, where you're not looking for pairs, you could try creating a column with a composite (pasted) factor and facet_grid or facet_wrap by that variable (example)

Related

Unexpected behaviour when re-ordering facets (for ggplot2)

I'd like some help understanding an error so that it can't happen again.
I was producing some (gg) plots and wanted to change the order of facets for aesthetic reasons. The way I did this had unexpected consequences and almost slipped through the net when I was checking the results - it could have caused serious problems with the article I'm working on!
I wanted to re-order the facets based on a numerical vector that I could define up-front
E.g. facet_order=c(1,2,4,3). This was so the graph syntax could be copied / pasted for repeat graphs more easily and I wouldn't have to dig around too much in the code each time.
# some example data:
df <- data.frame(x=c(1,2,3,4), y=c(1,2,3,4), facet_var=factor(c('A','B','C','D')))
# First plot (facet order defined by default):
ggplot(df, aes(x,y))+geom_point()+facet_wrap(~facet_var, nrow = 1)+labs(title='Original data')
In the second plot, facets 'C' and 'D' are swapped as intended:
# reorder facets (normal method)
df$facet_var2 <- factor(df$facet_var, levels=c('A','B','D','C')) # Set the facets var as a factor, to define the order
# Second plot:
ggplot(df, aes(x,y))+geom_point()+facet_wrap(~facet_var2, nrow = 1)+labs(title='Re-ordered facets', subtitle='working as expected')
However, this is the mistake I made:
# different syntax to reorder the facets
df$facet_var3 <- df$facet_var # duplicate the faceting variable
levels(df$facet_var3) <- levels(df$facet_var3)[c(1,2,4,3)] # I thought I was just re-ordering the levels here
# Third plot:
ggplot(df, aes(x,y))+geom_point()+facet_wrap(~facet_var3, nrow = 1)+labs(title='Re-ordered facets (method 2)',subtitle='Unexpected behaviour')
In the third graph, it looks like the data doesn't move, but the facet labels do, which is obviously wrong.
Digging a bit deeper, it appears that my syntax changed not only the order of the factor, but actually the underlying data in the factor variable. Is this behaviour expected?
Here's the crux of it:
facet_order <- c(1,2,4,3)
levels(df$facet_var) <- levels(df$facet_var)[facet_order] # bad
df$facet_var <- factor(df$facet_var, levels=c(levels(df$facet_var)[facet_order])) # good
Obviously I now know the solution but I'm still unclear what I actually did wrong here. Any pointers?
Hang on while I try and fix the images:
quick'n'dirty: reordering on the fly with fct_relevel from {forcats} (part of the tidyverse):
library(forcats)
ggplot(df, aes(x,y)) +
geom_point() +
facet_wrap(~ fct_relevel(facet_var, 'B', 'A', 'D', 'C'),
nrow = 1)
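On the question of what actually went wrong: assigning to levels() renames the existing levels in place and leaves the underlying integer codes untouched, whereas factor(..., levels = ...) keeps each value and only changes the order of the levels. A small base R sketch of the difference (the names relabelled and reordered are illustrative):

```r
f <- factor(c("A", "B", "C", "D"))
facet_order <- c(1, 2, 4, 3)

# levels<- renames: code 3 is now called "D" and code 4 is now called "C"
relabelled <- f
levels(relabelled) <- levels(relabelled)[facet_order]
as.character(relabelled)  # the values themselves have changed

# factor(levels = ) reorders: the values stay the same, only the level order changes
reordered <- factor(f, levels = levels(f)[facet_order])
as.character(reordered)   # same values as f
levels(reordered)         # new level order
```

This matches the third plot: the points stay where they are while the labels swap, because each observation's label was renamed rather than moved.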

R - Bar Plot with transparency based on values?

I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a Spaghetti Plot when I have a lot more Samples added, which makes the information hard to understand, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to what is in myData$y. I've tried to represent that data in photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll go ahead and post the pretty simple solution to my question here for others to find.
Basically, I needed to apply the alpha aesthetic via aes(alpha=y, ...). In theory, I could apply this over any geom. I tried geom_col(), which worked, but the best solution was to use geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" up the segments in order to avoid the problem of overplotting similar to those found here, here, and here.
ggplot(myData, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', size=14)
That gives us the nice gradient:
Since the max y values are not the same for both lines, if I wanted to "match" the intensity I normalized the data (myDataNorm) and could make the same plot. In my particular case, I kind of preferred bars that did not have a gradient, but which showed a hard edge for the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', size=14) +
theme(legend.position='none')
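The normalization step is not shown above. One way it might be done, assuming "normalized" means dividing each Sample's y by that Sample's own maximum so every sample peaks at 1 (an assumption, and the toy values below are made up for illustration):

```r
# Toy long-format data in the same shape as myData after gather()
myData <- data.frame(
  x = rep(290:294, times = 2),
  Sample = rep(c("X52241", "X75123"), each = 5),
  y = c(1, 4, 8, 4, 1, 2, 5, 10, 20, 5)
)

# Assumed normalization: scale y within each Sample by that Sample's maximum
myDataNorm <- myData
myDataNorm$y <- ave(myData$y, myData$Sample, FUN = function(v) v / max(v))
myDataNorm
```

With this scaling, a threshold such as y > 0.9 means "within 10% of that sample's peak", which is how the ifelse() calls here use it.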
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to apply to the x aesthetic, so the parts of the segment drawn were only those with high enough y values. Note my data "starts" at x=290 here. Probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(
x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
y=Sample, yend=Sample), color='blue3', size=14) +
xlim(290,400) # needed to show entire scale

Plotting two legends side by side or one legend with two columns

I have a data.frame that includes x and y values which I want to plot (y vs. x). There are two factors, one with three levels and the other with two levels, to which each point is assigned:
df = data.frame(x=c(1:90),y=runif(90,5,10),pch=c(rep(0,30),rep(5,30),rep(6,30)),col=c(rep("red",45),rep("blue",45)),cell=c(rep("A",30),rep("B",30),rep("C",30)),group=c(rep("p",45),rep("v",45)))
As you can see, the factors are: cell and group, with respective levels: c("A","B","C") for cell and c("p","v") for group. I have three different shapes (pch) for the cell factor levels and two different colors for the group factor levels.
I want to plot the df$y vs. df$x points with pch and colors specified according to df$pch and df$col and have two legends side by side: one for the cell factor levels and the other for the group factor levels or one legend with two columns of both factors.
So far this is what I'm playing around with:
plot(df$x,df$y,pch=df$pch,col=as.character(df$col),xlim=c(min(df$x),max(df$x)+5),ylim=c(min(df$y),max(df$y)+2))
legend("topright",title="Cell",legend=c("A","B","C"),col="black",pch=c(0,5,6),bty="n",border=F)
legend(x=75,y=12,title="Group",legend=c("p","v"),col=c("red","blue"),lty=c(1,1),bty="n",border=F)
Which produces this plot:
Which I'm not happy with since I need to adjust the locations of the two legends to get them properly aligned. I'm wondering whether there's a better, more automatic, way to achieve this.
On the same note, it would be nice to know if there's also an automatic way to figure out how much extra space in the plot is needed to fit the legends and specify that in the xlim and ylim rather than manually adjusting them.
One last thing - if possible I'd prefer a solution that's not ggplot
This is a bit hackish, but if you specify the ncol parameter in legend, you can force the legend to have multiple columns. That way you combine your two legend calls into one and let R handle the column creation/spacing.
The hackish part is that you then just need to manually set up the spacing on the legend title so that your "Cell" and "Group" labels fall where they need to be:
plot(df$x, df$y, pch=df$pch, col=as.character(df$col),
xlim=c(min(df$x),max(df$x)+5), ylim=c(min(df$y),max(df$y)+2))
legend("topright", title="Cell Group", # << THIS IS THE HACKISH PART
legend=c("A","B","C","p","v"),
col=c(rep("black",3),'red','blue'), pch=c(0,5,6,1,1),
bty="n", border=F, ncol=2)
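If two separate legends are preferred, their alignment can also be computed rather than adjusted by hand: legend() invisibly returns the rectangle it occupies, and plot = FALSE computes that rectangle without drawing. A rough sketch along those lines, with group_legend, cell_args, cell_size and cell_x as illustrative names:

```r
set.seed(1)
df <- data.frame(
  x = 1:90, y = runif(90, 5, 10),
  pch = rep(c(0, 5, 6), each = 30),
  col = rep(c("red", "blue"), each = 45)
)
plot(df$x, df$y, pch = df$pch, col = as.character(df$col),
     xlim = c(min(df$x), max(df$x) + 5), ylim = c(min(df$y), max(df$y) + 2))

# Draw the first legend and keep the rectangle it reports
group_legend <- legend("topright", title = "Group", legend = c("p", "v"),
                       col = c("red", "blue"), pch = 1, bty = "n")

# Measure the second legend without drawing it
cell_args <- list(title = "Cell", legend = c("A", "B", "C"),
                  col = "black", pch = c(0, 5, 6), bty = "n")
cell_size <- do.call(legend, c(list(x = "topright", plot = FALSE), cell_args))

# Place it immediately to the left of the first legend, tops aligned
cell_x <- group_legend$rect$left - cell_size$rect$w
cell_legend <- do.call(legend,
                       c(list(x = cell_x, y = group_legend$rect$top), cell_args))
```

The same rect values (w and h, in user coordinates) could be used to decide how much to extend xlim and ylim, although that needs a second plot() call, since the legend size is only known once a plot exists.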
You might want to have a look at ggplot2
ggplot(df, aes(y = y, x = x,shape = cell, colour = group)) +
geom_point(aes(group = interaction(group,cell)))
which produces:
Docs on ggplot2 http://docs.ggplot2.org/0.9.3.1/
Also, forgot to add:
read ?interaction which is what I used in my geom_point call to compute a factor
that represents interaction between cell and group.

Making ordered heat maps in qplot (ggplot2)

I am making heat maps from correlations. I have two columns that represent ID's and a third column that gives the correlation between those two datapoints. I am struggling to get qplot to keep the order of my data in the file. Link to data:
https://www.dropbox.com/s/3l9p1od5vjt0p4d/SNPS.txt?n=7399684
Here is the code I am using to make the plot:
test <- qplot(x=x, y=y, data=PCIT, fill = col1, geom = "tile")
I have tried several order options but they don't seem to do the trick? Ideas?
Thanks and Happy Holidays
You need to set the levels of the factors x and y to be in the order you want them (as they come in from the file). Try
PCIT$x <- factor(PCIT$x, levels=unique(as.character(PCIT$x)))
and similarly with y.
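To see why this works, here is a small sketch with made-up IDs (the linked file is not reproduced here, so the values are purely illustrative). By default factor() sorts the levels, while levels = unique(...) keeps them in order of first appearance:

```r
# Hypothetical stand-in for the PCIT data
PCIT <- data.frame(
  x = c("rs3", "rs1", "rs2", "rs3"),
  y = c("rs1", "rs2", "rs3", "rs2"),
  col1 = c(0.9, 0.1, 0.5, 0.3)
)

default_levels <- levels(factor(PCIT$x))  # sorted order

# Keep the order in which the IDs first appear in the file
PCIT$x <- factor(PCIT$x, levels = unique(as.character(PCIT$x)))
PCIT$y <- factor(PCIT$y, levels = unique(as.character(PCIT$y)))
levels(PCIT$x)
```

The plot axes follow the factor level order, so the tiles then appear in file order.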

How to better create stacked bar graphs with multiple variables from ggplot2?

I often have to make stacked barplots to compare variables, and because I do all my stats in R, I prefer to do all my graphics in R with ggplot2. I would like to learn how to do two things:
First, I would like to be able to add proper percentage tick marks for each variable rather than tick marks by count. Counts would be confusing, which is why I take out the axis labels completely.
Second, there must be a simpler way to reorganize my data to make this happen. It seems like the sort of thing I should be able to do natively in ggplot2 with plyR, but the documentation for plyR is not very clear (and I have read both the ggplot2 book and the online plyR documentation).
My best graph looks like this, the code to create it follows:
The R code I use to get it is the following:
library(epicalc)
### recode the variables to factors ###
recode(c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ), c(1,2,3,4,5,6,7,8,9, NA),
c('Very Interested','Somewhat Interested','Not Very Interested','Not At All interested',NA,NA,NA,NA,NA,NA))
### Combine recoded variables to a common vector
Interest1<-c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ)
### Create a second vector to label the first vector by original variable ###
a1<-rep("News about Bangladesh", length(int_newcoun))
a2<-rep("Neighboring Countries", length(int_newneigh))
[...]
a17<-rep("Education", length(int_educ))
Interest2<-c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17)
### Create a Weighting vector of the proper length ###
Interest.weight<-rep(weight, 17)
### Make and save a new data frame from the three vectors ###
Interest.df<-cbind(Interest1, Interest2, Interest.weight)
Interest.df<-as.data.frame(Interest.df)
write.csv(Interest.df, 'C:\\Documents and Settings\\[name]\\Desktop\\Sweave\\InterestBangladesh.csv')
### Sort the factor levels to display properly ###
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Not Very Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Somewhat Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Very Interested')
Interest.df$Interest2<-relevel(Interest$Interest2, ref='News about Bangladesh')
Interest.df$Interest2<-relevel(Interest$Interest2, ref='Education')
[...]
Interest.df$Interest2<-relevel(Interest$Interest2, ref='European Politics')
detach(Interest)
attach(Interest)
### Finally create the graph in ggplot2 ###
library(ggplot2)
p<-ggplot(Interest, aes(Interest2, ..count..))
p<-p+geom_bar((aes(weight=Interest.weight, fill=Interest1)))
p<-p+coord_flip()
p<-p+scale_y_continuous("", breaks=NULL) # current ggplot2 expects NULL, not NA, to suppress breaks
p<-p+scale_fill_manual(values = rev(RColorBrewer::brewer.pal(5, "Purples")))
p
update_labels(p, list(fill='', x='', y=''))
I'd very much appreciate any tips, tricks or hints.
Your second problem can be solved with melt and cast from the reshape package
After you've factored the elements in your data.frame (called your.df below), you can use something like:
install.packages("reshape")
library(reshape)
x <- melt(your.df, c()) ## Assume you have some kind of data.frame of all factors
x <- na.omit(x) ## Be careful, sometimes removing NA can mess with your frequency calculations
x <- cast(x, variable + value ~., length)
colnames(x) <- c("variable","value","freq")
## Presto!
ggplot(x, aes(variable, freq, fill = value)) + geom_bar(stat = "identity", position = "fill") + coord_flip() + scale_y_continuous("", labels = scales::percent)
As an aside, I like to use grep to pull in columns from a messy import. For example:
x <- your.df[, grep("^int_", names(your.df))] ## pulls all columns starting with "int_"
And factoring is easier when you don't have to type c(' ', ...) a million times.
for(i in seq_len(ncol(df))) {
df[,i] <- factor(df[,i], labels = strsplit('
Very Interested
Somewhat Interested
Not Very Interested
Not At All interested
NA
NA
NA
NA
NA
NA
', '\n')[[1]][-1])
}
You don't need prop.tables or count etc to do the 100% stacked bars. You just need +geom_bar(position="fill")
About percentages instead of ..count.. , try:
ggplot(mtcars, aes(factor(cyl), prop.table(..count..) * 100)) + geom_bar()
but since it's not a good idea to shove a function into the aes(), you can write custom function to create percentages out of ..count.. , round it to n decimals etc.
You labeled this post with plyr, but I don't see any plyr in action here, and I bet that one ddply() can do the job. Online plyr documentation should suffice.
If I am understanding you correctly, to fix the axis labeling problem make the following change:
# p<-ggplot(Interest, aes(Interest2, ..count..))
p<-ggplot(Interest, aes(Interest2, ..density..))
As for the second one, I think you would be better off working with the reshape package. You can use it to aggregate data into groups very easily.
In reference to aL3xa's comment below...
library(ggplot2)
r<-rnorm(1000)
d<-as.data.frame(cbind(r,1:1000))
ggplot(d,aes(r,..density..))+geom_bar()
Returns...
The bins are now densities...
Your first question: Would this help?
geom_bar(aes(y=..count../sum(..count..)))
Your second question; could you use reorder to sort the bars? Something like
aes(reorder(Interest, Value, mean), Value)
(just back from a seven hour drive - am tired - but I guess it should work)
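Pulling the suggestions above together, here is a minimal sketch of the whole pipeline on made-up data: reshape the int_ columns to long format, compute weighted shares in base R, and plot them with percentage ticks. The survey data frame and its values are invented for illustration, only two of the seventeen variables are shown, and the plotting step is guarded in case ggplot2 and scales are not installed.

```r
# Hypothetical survey: two interest items coded 1-4, plus a weight
survey <- data.frame(
  int_newcoun = c(1, 2, 2, 4, 1, 3),
  int_educ    = c(1, 1, 3, 4, 2, 2),
  weight      = c(1, 2, 1, 1, 2, 1)
)
labels <- c("Very Interested", "Somewhat Interested",
            "Not Very Interested", "Not At All interested")

# Wide to long: one row per respondent and item
int_cols <- grep("^int_", names(survey), value = TRUE)
long <- data.frame(
  variable = rep(int_cols, each = nrow(survey)),
  value = factor(unlist(survey[int_cols], use.names = FALSE),
                 levels = 1:4, labels = labels),
  weight = rep(survey$weight, times = length(int_cols))
)

# Weighted counts per item and answer, then row-wise shares
weighted <- xtabs(weight ~ variable + value, data = long)
pct <- prop.table(weighted, margin = 1)
plot_df <- as.data.frame(pct, responseName = "share")

# Stacked bars with percentage tick labels
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(variable, share, fill = value)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous("", labels = scales::percent)
}
```

Printing p draws the chart. Because the shares within each item already sum to 1, the default stacking gives 100% bars without needing position = "fill".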

Resources