Formatting legend to fit 45+ legend items in R

I need help formatting a legend in ggplot2. I have approximately 45 legend items. When I display the legend, my graph shrinks because the graph and the legend items don't both fit. I'm wondering how I can get all my legend items to display while still having a reasonably sized graph. Is there a way to make my longer legend items wrap onto multiple lines? Or is there a way to make some legend items occupy more of the white space above/below the page? Any help will be super appreciated! Below is a screenshot of my current plot, along with my code.
library(ggplot2)
guild_chart <-
  ggplot(chart, aes(x = factor(Site, levels = level_order1), y = `Row 1`, fill = Label)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = colfundose) +
  theme_bw() +
  ylab("# of reads") +
  xlab("Location")
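For the literal question (long labels over several lines, and a more compact legend), here is a minimal sketch. It assumes toy data in place of `chart` (a hypothetical `reads` column stands in for `Row 1`) and uses base R's `strwrap()` to wrap each label; the same `labels =` function could be passed to `scale_fill_manual(values = colfundose, labels = ...)` in the original code.

```r
library(ggplot2)

# Hypothetical stand-in for `chart`: 7 sites x 45 long-ish labels
set.seed(1)
chart <- expand.grid(Site = paste0("Site", 1:7),
                     Label = paste("Fungal guild category number", 1:45),
                     stringsAsFactors = FALSE)
chart$reads <- rpois(nrow(chart), lambda = 50)

# Wrap each legend label onto several lines shorter than `width` characters
wrap_labels <- function(x, width = 20) {
  vapply(x, function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

p <- ggplot(chart, aes(x = Site, y = reads, fill = Label)) +
  geom_col() +
  scale_fill_discrete(labels = wrap_labels) +
  guides(fill = guide_legend(ncol = 4)) +     # spread the keys over several columns
  theme_bw() +
  theme(legend.position = "bottom",           # use the space below the panel
        legend.text = element_text(size = 7),
        legend.key.size = unit(0.4, "cm")) +
  labs(x = "Location", y = "# of reads")
```

Moving the legend below the panel and splitting it into columns keeps the panel width intact; it does not make 45 similar colours any easier to tell apart.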

A frame challenge, if I may:
This is probably a bad way to visualise this data. The groups are impossible to distinguish from one another, and very difficult to compare. What is the purpose of this graph? What information do you wish to convey to the viewer? With that question in mind, think about how you can design the graph in a legible way.
To increase legibility, I would consider combining factors into groups and visualising these instead of the individual levels you are currently displaying.
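Here is a minimal sketch of that idea in base R plus ggplot2: rank the labels by total reads, keep the nine most abundant, and lump the rest into "Other". The toy data and the `reads` column are assumptions standing in for the asker's `chart` and `Row 1`, and the cut-off of nine is arbitrary. If the labels have a natural higher-level grouping, a lookup table that maps each label to its group would be the better choice.

```r
library(ggplot2)

# Hypothetical toy data standing in for `chart`: 7 sites x 45 labels
set.seed(1)
chart <- expand.grid(Site = paste0("Site", 1:7), Label = paste0("Taxon_", 1:45))
chart$reads <- rpois(nrow(chart), lambda = rep(seq(200, 5, length.out = 45), each = 7))

# Rank labels by total reads across all sites and keep the top 9
totals <- tapply(chart$reads, chart$Label, sum)
top <- names(totals)[order(totals, decreasing = TRUE)][1:9]

# Everything outside the top 9 becomes "Other"
chart$Group <- ifelse(as.character(chart$Label) %in% top,
                      as.character(chart$Label), "Other")
chart$Group <- factor(chart$Group, levels = c(top, "Other"))

p <- ggplot(chart, aes(x = Site, y = reads, fill = Group)) +
  geom_col() +
  theme_bw() +
  labs(x = "Location", y = "# of reads", fill = NULL)
```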

As others have noted, this stacked bar layout makes your data difficult to interpret. Besides the difficulty of telling the groups apart, it's also hard to estimate the number of reads for any sort of comparison.
As an alternative presentation, would it make sense to visualize these read counts as a heatmap? You could have a column (or row) for each of your seven locations containing 45 squares, colored to indicate # of reads. Now your legend is a color gradient with the range of read counts across the dataset. An advantage here is you can keep your 45 categories, if this is important, but have them right next to their respective rows, minimizing lookups in a legend.
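Here is a minimal sketch of the heatmap idea. Again it uses hypothetical toy data in place of `chart`, with a `reads` column in place of `Row 1`. `geom_tile()` draws one square per location/label pair, and `scale_fill_viridis_c()` supplies the continuous colour gradient that replaces the 45-entry legend.

```r
library(ggplot2)

# Hypothetical toy data: 7 locations x 45 labels, with a read count for each pair
set.seed(1)
chart <- expand.grid(Site = paste0("Site", 1:7), Label = paste0("Taxon_", 1:45))
chart$reads <- rpois(nrow(chart), lambda = 100)

p <- ggplot(chart, aes(x = Site, y = Label, fill = reads)) +
  geom_tile(colour = "white") +                 # one square per Site/Label pair
  scale_fill_viridis_c(name = "# of reads") +   # continuous gradient legend
  labs(x = "Location", y = NULL) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 6))   # 45 row labels need small text
```

If the read counts are very skewed, a log-transformed fill scale may make the differences easier to see.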

Related

ggplot draw multiple plots by levels of a variable

I have a sample dataset
d <- data.frame(n = rep(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3), 2), group = rep(c("A", "B"), each = 20), stringsAsFactors = FALSE)
I want to draw two separate histograms based on the group variable.
I tried this method, suggested by @jenesaisquoi in a separate post (Generating Multiple Plots in ggplot by Factor):
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)+facet_wrap(~group)
It did the trick, but if you look closely, the proportions are wrong: it didn't calculate the proportion within each group but rather a grand proportion. I want the proportion to be 0.6 for number 1 in each group, not 0.3.
Then I tried the dplyr package, and it didn't even create two graphs; it ignored the group_by command. The proportion is right this time, though.
d%>%group_by(group)%>%ggplot(data=.)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)
Finally, I tried mapping group to color:
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..),color=group),binwidth = 1)
But the result is far from ideal. I would accept a single plot, but with the bins side by side rather than on top of each other.
In conclusion, I want to draw two separate histograms with the proportions calculated correctly within each group. If there is no easy way to do this, I can live with one graph that has the bins side by side and the correct proportion for each group. In this example, number 1 should have a proportion of 0.6.
Changing ..count../sum(..count..) to ..density.. gives you the desired proportion (with binwidth = 1, a bin's density equals its proportion within the group):
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
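The question's fallback is a single panel with the bins side by side. The same idea works there with `position = "dodge"`: `density` is computed separately for each fill group, so with `binwidth = 1` each bar shows the within-group proportion. This sketch uses `after_stat(density)`, which is how ggplot2 3.3.0 and later spell `..density..`.

```r
library(ggplot2)

# The question's data: n is recycled to 40 rows, 20 per group
d <- data.frame(n = rep(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3), 2),
                group = rep(c("A", "B"), each = 20),
                stringsAsFactors = FALSE)

# One panel, bars dodged by group; density is computed within each group
p <- ggplot(d, aes(x = n, y = after_stat(density), fill = group)) +
  geom_histogram(binwidth = 1, position = "dodge") +
  labs(y = "proportion within group")
```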
You actually have the separation of charts by variable correct! Especially with ggplot, you sometimes need to consider the scales of the graph separately from its shape. facet_wrap() splits your data into panels regardless of scale; it behaves the same no matter what your axes are. You could also try adding scale_y_log10(), and you'll notice that the overall shape and style of your graph are the same; you've just changed the axes.
What you actually need is a fix to your scales. Understandable: frequency plots can be confusing. In ..count../sum(..count..), the sum is taken over every bin in the layer, across all facets, so each bin is divided by the grand total (40 here) rather than by its own group's total (20). See a good explanation of this here: Show % instead of counts in charts of categorical variables
What you want is ..density.., which is computed within each group: the count divided by the group's total count and by the bin width. The difference is subtle in principle, but the important bit is that the bin width on the x-axis matters. For an extreme case of this, see here: Normalizing y-axis in histograms in R ggplot to proportion, where tiny x-axis values produced huge densities.
Your original code will still work, just substituting the aesthetics I described above.
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
If you're still confused about density, so are lots of people. Hadley Wickham wrote a long piece about it, you can find that here: http://vita.had.co.nz/papers/density-estimation.pdf

How to enlarge the size of facet in R

I'm working on the following dataset, where each facet shows the bleaching for one kind of coral at one site across the time period. My problem is how to enlarge each facet so the trend is easier to see: in the current facets it is hard to see the trend because the changes in bleaching are small.
Here is my code:
cb1<-aggregate(cb$latitude, list(Site=cb$site), mean)
cb$site=factor(cb$site, levels=cb1$Site[order(cb1$x)])
ggplot(cb,aes(year,bleaching)) +
geom_point() +
facet_grid(site~kind) +
geom_smooth(method="lm",color="grey") +
coord_cartesian(ylim=c(0,1))
Due to the current size of the grid of facets, some lines look flat, but actually they are not.
You cannot really increase the size of the facets unless you increase the size of the plot overall. One option would be to save a large version of the plot:
p <- ggplot(cb, aes(year, bleaching)) +
  geom_point() +
  facet_grid(site ~ kind) +
  geom_smooth(method = "lm", color = "grey") +
  coord_cartesian(ylim = c(0, 1))
ggsave("file_name.jpg", plot = p, width = 24, height = 24, units = "in")
If you have limited space (e.g. the plot has to go on an A4 sheet) then the facet_grid_paginate function from ggforce would be a good option. It allows you to split faceted plots over multiple pages. You can define the number of rows and columns per page. See this link.
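The sketch below is a ggplot2-only substitute for `facet_grid_paginate()`, not the ggforce function itself. It splits the sites into chunks of four and writes one faceted plot per page of an A4-sized PDF. The toy `cb` data, the chunk size and the output file name are all assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for `cb`: 12 sites x 2 coral kinds x 11 years
set.seed(42)
cb <- expand.grid(year = 2000:2010, site = paste0("Site", 1:12),
                  kind = c("hard", "soft"))
cb$bleaching <- pmin(1, pmax(0, 0.3 + 0.01 * (cb$year - 2000) +
                                rnorm(nrow(cb), sd = 0.05)))

# Split the sites into pages of 4 facet rows each
sites <- levels(cb$site)
pages <- split(sites, ceiling(seq_along(sites) / 4))

plots <- lapply(pages, function(s) {
  ggplot(cb[cb$site %in% s, ], aes(year, bleaching)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, colour = "grey") +
    facet_grid(site ~ kind) +       # unused site levels are dropped by default
    coord_cartesian(ylim = c(0, 1))
})

# One plot per page; A4 portrait is 8.27 x 11.69 inches
out <- file.path(tempdir(), "bleaching_pages.pdf")
pdf(out, width = 8.27, height = 11.69)
for (p in plots) print(p)
invisible(dev.off())
```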
Alternatively, if you want to show more clearly that the lines are not flat, you can try toying with a couple of the arguments to facet_grid(). facet_grid() lets you set scales to "free", "free_x" or "free_y". Setting "free_y" would give each row of facets (here, each site) its own y-axis range, no longer fixed between 0 and 1, provided you also remove coord_cartesian(ylim = c(0, 1)). This would, however, make the facets more difficult to compare with each other.
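Here is a minimal sketch of the `scales = "free_y"` option, using hypothetical toy data in place of `cb`. In `facet_grid()` a free y scale is shared along each row, so there is one range per site. If every panel should get its own range, `facet_wrap(~ site + kind, scales = "free_y")` is the variant to try.

```r
library(ggplot2)

# Hypothetical stand-in for `cb`: 4 sites x 2 kinds x 11 years
set.seed(42)
cb <- expand.grid(year = 2000:2010, site = paste0("Site", 1:4),
                  kind = c("hard", "soft"))
cb$bleaching <- pmin(1, pmax(0, 0.3 + 0.01 * (cb$year - 2000) +
                                rnorm(nrow(cb), sd = 0.05)))

# No coord_cartesian(ylim = c(0, 1)) here: a fixed range would undo the free scales
p <- ggplot(cb, aes(year, bleaching)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "grey") +
  facet_grid(site ~ kind, scales = "free_y")  # one y range per row (site)
```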

Making one variable be shapes of different colors (ggplot2)

So right now I've got this plot (sorry it's only a link rather than an inline image; this is my first time on Stack Overflow and it wouldn't let me post images).
The plot is produced with this code:
ggplot(potassium.data,
aes(x=Experiment,y=value,
colour=Pedigree))+geom_jitter()+labs(title=element)
The problem is, there are 31 different maize pedigrees being plotted here, so it's difficult to distinguish the colors from each other. I was wondering if it's possible to make it so that the color and shape of the point are used to uniquely identify a pedigree, so that for example one pedigree is red squares, another is red circles, a third one is blue squares, a fourth is blue circles, and so on. This would make it far easier to distinguish the points. Anyone know how to do this?
I don't think that's possible: if you do the shaping by pedigree, you will just end up with as many categories of shapes as you have colors now.
geom_label() and geom_text() would let you plot the cultivar ID directly onto the plot. Then maybe you could build a separate column for something equivalent to genus, so that the cultivars could be grouped somehow (maybe A, B, PH, etc.). You could then color by that "genus" column, which would make the plot look better:
ggplot(potassium.data,
aes(x=Experiment,y=value, label=Pedigree, colour = genus))+
geom_label(position = position_jitter())+
labs(title=element)
Ideally you would end up with a plot colored by the genus while only plotting the suffix digits currently in Pedigree.
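Here is a sketch of how the "genus" and suffix columns might be derived. It assumes pedigree IDs made of a letter prefix followed by digits. The IDs, the toy `value` column and the fixed title are hypothetical. `sub()` with a regular expression either strips the trailing digits, which gives the prefix, or strips the leading non-digits, which gives the suffix.

```r
library(ggplot2)

# Hypothetical pedigree IDs (letter prefix + digits) and toy values
set.seed(1)
potassium.data <- data.frame(
  Experiment = rep(c("Exp1", "Exp2"), each = 6),
  Pedigree = rep(c("B73", "B97", "Mo17", "PH207", "PH222", "Oh43"), times = 2),
  stringsAsFactors = FALSE
)
potassium.data$value <- rnorm(nrow(potassium.data), mean = 40, sd = 5)

# "genus"-like prefix: drop the trailing digits ("PH207" -> "PH")
potassium.data$genus <- sub("[0-9]+$", "", potassium.data$Pedigree)
# Suffix to print on the plot: drop the leading non-digits ("PH207" -> "207")
potassium.data$suffix <- sub("^[^0-9]*", "", potassium.data$Pedigree)

p <- ggplot(potassium.data,
            aes(x = Experiment, y = value, label = suffix, colour = genus)) +
  geom_label(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
  labs(title = "Potassium")
```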
I have to agree with Nathan and Joran, the plot is quite confusing by having so many different points and adding shapes into the mix is unlikely to help.
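To answer your question, you should be able to use shape = Pedigree (note that the default shape palette only provides six shapes, so with 31 pedigrees you would need scale_shape_manual()). To make the graph more readable, though, you could join each pedigree's points from one experiment to the next with geom_line(), so the reader spends less time scanning.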
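One way to get the combined encoding the question describes is to map both `colour` and `shape` to `Pedigree` and give each a manual scale with the same name, which lets ggplot2 merge them into a single legend. The toy data and the particular colours and shapes are hypothetical choices: 8 colours times 4 filled shapes gives 32 distinct pairs, just enough for 31 pedigrees.

```r
library(ggplot2)

# Hypothetical stand-in for potassium.data: 31 pedigrees x 3 experiments x 3 reps
set.seed(1)
pedigrees <- sprintf("P%02d", 1:31)
potassium.data <- expand.grid(Pedigree = pedigrees,
                              Experiment = c("E1", "E2", "E3"), rep = 1:3)
potassium.data$value <- rnorm(nrow(potassium.data), mean = 40, sd = 5)

# Colours cycle fastest, shapes change every 8 pedigrees -> unique pairs
cols <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
          "#FF7F00", "#A65628", "#F781BF", "#999999")
shps <- c(16, 15, 17, 18)  # filled circle, square, triangle, diamond
n <- length(pedigrees)
col_map <- setNames(rep(cols, times = length(shps))[seq_len(n)], pedigrees)
shp_map <- setNames(rep(shps, each = length(cols))[seq_len(n)], pedigrees)

# Same scale name and guide settings on both -> one merged legend
p <- ggplot(potassium.data,
            aes(x = Experiment, y = value, colour = Pedigree, shape = Pedigree)) +
  geom_jitter(width = 0.2, height = 0) +
  scale_colour_manual(name = "Pedigree", values = col_map) +
  scale_shape_manual(name = "Pedigree", values = shp_map) +
  guides(colour = guide_legend(ncol = 2), shape = guide_legend(ncol = 2))
```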

How do I create a stacked area plot with many areas, or where the legend "points" at the respective areas?

Example plot:
http://i56.tinypic.com/eagjfn.jpg
Created with:
qplot(score, ..count.., data=df, fill=method, geom='density', position='stack')
Pretty much impossible to tell what goes with what. Any way to make this better? Ideally the legend draws lines "connecting" the areas to the item in the legend. Alternatively, I'd at least need some very different filling patterns for the areas.
The human eye does not do well distinguishing between more than 7-10 different categories whether they are indicated using color, shading or pattern. Adding lines or shadings here will, I think, only make this graph harder to read.
In situations like this, I often think that it's best to take a step back and rethink what message you intend for the graph to convey. Do you really need to compare all ~23 methods in a single graph, or can the methods be placed into subgroups and compared in multiple plots or facets? Are some of the methods' curves so similar that they could be combined into a single category?
For instance, I see ~3-4 natural groups just based on the similarity of the curves in your plot. You could plot a single, representative, method from each group to illustrate the large scale differences, and then create additional plots that focus in on the differences between methods within groups.
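Here is a sketch of the subgrouping idea. Each method is assigned to a group, and one stacked-density plot is drawn per group, so each plot's legend has at most six entries. The grouping below is a hypothetical placeholder; in practice it would come from the similarity of the curves or from domain knowledge. The toy `df` with `score` and `method` columns only mimics the structure of the data in the question.

```r
library(ggplot2)

# Hypothetical data: 23 methods, 100 scores each
set.seed(1)
methods <- sprintf("m%02d", 1:23)
method_group <- setNames(rep(c("G1", "G2", "G3", "G4"), length.out = 23), methods)
df <- data.frame(method = rep(methods, each = 100), stringsAsFactors = FALSE)
df$group <- unname(method_group[df$method])
shift <- c(G1 = 0, G2 = 2, G3 = 4, G4 = 6)
df$score <- rnorm(nrow(df), mean = unname(shift[df$group]))

# One plot per group; each fill scale only sees that group's methods
plots <- lapply(split(df, df$group), function(d) {
  ggplot(d, aes(score, after_stat(count), fill = method)) +
    geom_density(position = "stack", alpha = 0.8) +
    ggtitle(unique(d$group))
})
```

Drawing separate plots rather than facets keeps each legend short, because a faceted plot would still share one 23-entry fill scale.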

Adjusting the relative space of facets (without regard to coordinate space)

I have a primary graph and some secondary information that I want to facet in another graph below it. Faceting works great, except that I do not know how to control the relative space used by one facet versus another. I am aware of space='free', but this is only useful if the data ranges correspond to the desired relative sizing.
So for instance, I may want a graph where the first facet occupies 80% and the second 20%. Here is an example:
data <- rbind(
data.frame(x=1:500, y=rnorm(500,sd=1), type='A'),
data.frame(x=1:500, y=rnorm(500,sd=5), type='B'))
ggplot() +
geom_line(aes(x=x, y=y, colour=type), data=data) +
facet_grid(type ~ ., scales='free_y')
The above creates 2 facets of equal vertical dimension. Adding in space='free' in the facet_grid function changes the dimensions such that the lower facet is roughly 5x larger than the upper (as expected).
Supposing I want the upper to be 2x as large, with the same data set and ordering of facets. How can I accomplish this?
Is the only way to do this with some trickery in rescaling the data set and manually overriding axis labels (and if so, how)?
Alternative
As indicated below, I can use viewports to render this as multiple graphs. I had considered this, and in fact had implemented this approach in the past with standard plots and viewports.
The problem is that it is very difficult to get the x-axes to align with this approach. If there were a way to fix the size of the y-axis label region and of the legend region, I could produce 2 graphs that have the same rendering area.
You don't need facets for this; you can also do it with the viewport() function from the grid package.
library(ggplot2)
library(grid)  # viewport() comes from grid
df <- data.frame(x = 1:10, y = 11:20)
ratio <- 1/3
v1 <- viewport(width = 1, height = ratio, y = 1 - ratio/2)
v2 <- viewport(width = 1, height = 1 - ratio, y = (1 - ratio)/2)
grid.newpage()
print(ggplot(df, aes(x, y)) + geom_point(), vp = v1)
print(ggplot(df, aes(x, y)) + geom_line(), vp = v2)
ratio is the fraction of the page height given to the top panel. Try 2/3 and 4/5 as well.
This approach can get ugly if the legends or axis labels in the two plots are different sizes, but for a fix, see the align.plots function in the ggExtra package and ggplot2 author Hadley Wickham's notes on this very topic.
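Here is a sketch of the alignment approach using gtable (a dependency of ggplot2) instead of the older `align.plots()`. Convert both plots to grobs with `ggplotGrob()`, then `rbind()` them so the axis-label columns share the wider of the two widths. Finally, set the two panel rows to a 2:1 height ratio. The toy data and the 2:1 ratio are assumptions, and both plots must have the same column layout (here neither has a legend).

```r
library(ggplot2)
library(grid)
library(gtable)

# Hypothetical data: the lower plot has much wider y tick labels
set.seed(1)
df <- data.frame(x = 1:500, y = cumsum(rnorm(500)))
p1 <- ggplot(df, aes(x, y)) + geom_line() + labs(y = "main series")
p2 <- ggplot(df, aes(x, y * 1000)) + geom_line() + labs(y = "secondary (x1000)")

# Stack the two gtables; size = "max" keeps the wider width for each column
g <- rbind(ggplotGrob(p1), ggplotGrob(p2), size = "max")

# Give the two panel rows relative heights of 2 and 1
panels <- sort(unique(g$layout$t[grepl("^panel", g$layout$name)]))
g$heights[panels] <- unit(c(2, 1), "null")

grid.newpage()
grid.draw(g)
```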
There's no easy way to do this with facets currently, although if you are prepared to edit the underlying grid objects, you can modify the ggplot graph after it has been built to get this effect.
See also this question on using grid and ggplot2 to create join plots using R.
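Here is a sketch of the grid-editing route, applied to the question's own example. Build the gtable with `ggplotGrob()`, find the rows that hold the panels, and replace their relative ("null") heights with 2 and 1. Finding the panels by the `panel` prefix in `g$layout$name` relies on ggplot2's internal layout naming, which may change between versions.

```r
library(ggplot2)
library(grid)

# The question's example data
set.seed(1)
data <- rbind(
  data.frame(x = 1:500, y = rnorm(500, sd = 1), type = "A"),
  data.frame(x = 1:500, y = rnorm(500, sd = 5), type = "B"))

p <- ggplot(data, aes(x = x, y = y, colour = type)) +
  geom_line() +
  facet_grid(type ~ ., scales = "free_y")

# Convert to a gtable and make the upper facet twice as tall as the lower one
g <- ggplotGrob(p)
panel_rows <- sort(unique(g$layout$t[grepl("^panel", g$layout$name)]))
g$heights[panel_rows] <- unit(c(2, 1), "null")

grid.newpage()
grid.draw(g)
```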
Kohske Takahashi posted a patch to facet_grid that allows specification of the relative sizing of facets. See the thread:
http://groups.google.com/group/ggplot2/browse_thread/thread/7c5454dcc04bc7b8
With luck we'll see this in a future version of ggplot2.
