ggplot geom_text_repel text exceeding the limit of plot - r

How can I prevent geom_text_repel() from displaying only part of a label when the label is close to the plot boundary? Here is an example with facet_grid(): in the chr3 facet, the label "ZNF717" at the top is not completely displayed.
An example with mtcars, forcing many facets (runif() gives every row its own facet value between 1 and 20) and long labels:
library(ggplot2)
library(ggrepel)
library(dplyr)
mtcars %>%
rowwise() %>%
mutate(label="test_label") %>%
mutate(facet=runif(n = n(),min = 1,max=20)) %>%
ggplot(aes(x=disp,y=hp,label=label)) +
geom_text_repel() +
facet_grid(~facet)

Each panel is self-contained, and by default plotting is limited to the plotting area. This can be overridden by modifying the default coordinates. With this extreme example, I also had to use facet_wrap() with two rows. In addition, I decreased the font size of the labels and restricted repulsion so that labels move only vertically. (Obviously, tick labels and panel names would need further tweaking in real use.)
library(ggplot2)
library(ggrepel)
library(dplyr)
mtcars %>%
rowwise() %>%
mutate(label="test_label") %>%
mutate(facet=runif(n = n(),min = 1,max=20)) %>%
ggplot(aes(x=disp,y=hp,label=label)) +
geom_text_repel(direction = "y", hjust = 0.5, size = 2) +
facet_wrap(~facet, nrow = 2) +
coord_cartesian(clip = "off")
The code above answers the question but creates a new problem, at least in the mtcars example: because geoms work on a panel-by-panel basis, the repulsion cannot prevent overlaps between labels that extend into neighbouring panels. Surprisingly, some unexpected clipping on the left side also takes place when saving to bitmap formats, but not when saving to PDF (at least within RStudio).
A further option is to make sure that the labels fit in the available space, either by using the angle aesthetic to rotate the labels or by abbreviating the text used for labels.
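As a rough sketch of that last option, the mtcars example from the question can combine both ideas: rotating every label with a fixed angle = 90 and shortening the text with base R's abbreviate(). The column name short_label, minlength = 6 and set.seed(42) are arbitrary choices for illustration, and whether the result actually fits will depend on the real labels and panel sizes.

```r
library(ggplot2)
library(ggrepel)
library(dplyr)

set.seed(42)  # make the random facet assignment reproducible

plot_df <- mtcars %>%
  mutate(
    label = "test_label",
    # one random facet value per row, as in the question
    facet = runif(n = n(), min = 1, max = 20),
    # hypothetical helper column: labels shortened with base R's abbreviate()
    short_label = unname(abbreviate(label, minlength = 6))
  )

p <- ggplot(plot_df, aes(x = disp, y = hp, label = short_label)) +
  # rotate labels by 90 degrees so they need less horizontal space per panel
  geom_text_repel(angle = 90, size = 2) +
  facet_grid(~facet)
p
```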

Related

Aligning groups of points and of boxplots in ggplotly

I am trying to interactively show both points and boxplots of the same data in a ggplotly situation.
"Dodged" positioning does the job in ggplot, but when the plot is passed to plotly the positioning goes off. How do I get boxes and points to line up? (Essentially, this throws points on top of this question. I also realize that an answer to that question would likely also answer mine, though there may be more answers for my issue.)
What I want is for both layers to show up together, even when a group is missing at a location (either centered or in the group location), for example like this:
What I get with interactivity so far is this:
library(plotly)
library(dplyr)
mtcars_boxplot <- mtcars %>%
mutate(cyl=as.factor(cyl)) %>%
mutate(vs=as.factor(vs)) %>%
ggplot(aes(y=mpg, x=cyl)) +
geom_boxplot(aes(color=vs), position=position_dodge())+
geom_point(aes(color=vs), position=position_jitterdodge(), size = 0.5)
mtcars_boxplot %>%
ggplotly() %>%
layout(boxmode='group')
You can see that for cyl=8, the points are centered, but the box shows up in its group's location.
My question is: how do I get an interactive version of the first image, or something similar (preferably using ggplotly)?
I found a way to do this, not with ggplot but with pure plotly:
library(plotly)
library(dplyr)
mtcars_boxplot <- mtcars %>%
mutate(cyl=as.factor(cyl)) %>%
mutate(vs=as.factor(vs)) %>%
plot_ly(type="box",
x = ~cyl,
y = ~mpg,
color = ~vs,
# the original also set alignmentgroup = ~MOTART, but mtcars has no MOTART column
boxpoints = "all",
pointpos = 0,
jitter = 1) %>%
layout(boxmode='group')
If there is a ggplotly-answer, I would still love to know that one. (This actually ends up aligning more nicely, but is also more work when working in ggplot otherwise.)

Possible to force non-occurring elements to show in ggplot legend?

I'm plotting a sort of choropleth of up to three selectable species abundances across a research area. This toy code behaves as expected and does almost what I want:
library(dplyr)
library(ggplot2)
square <- expand.grid(X=0:10, Y=0:10)
sq2 <- square[rep(row.names(square), 2),] %>%
arrange(X,Y) %>%
mutate(SPEC = rep(c('red','blue'),len=n())) %>%
mutate(POP = ifelse(SPEC %in% 'red', X, Y)) %>%
group_by(X,Y) %>%
mutate(CLR = rgb(X/10,0,Y/10)) %>% ungroup()
ggplot(sq2, aes(x=X, y=Y, fill=CLR)) + geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=c('red','blue'), breaks=c('#FF0000','#0000FF'))
Producing this:
A modified version properly plots the real map, appropriately mixing the RGBs to show the species proportions per map unit. But given that mixing, the real data does not necessarily include the specific values listed in breaks, in which case no entry appears in the legend for that species. If you change the last line of the example to
labels=c('red','blue','green'), breaks=c('#FF0000','#0000FF','#00FF00'))
you get the same legend as shown, with only 'red' and 'blue' displayed, as there is no green in it. Searching the data for each max(Species) and assigning those to the legend is possible but won't make good legend keys for species that only occur in low proportions. What's needed is for the legend to display the idea of the entities present, not their attested presences -- three colors in the legend even if only one species is detected.
I'd think that scale_fill_manual() or the override.aes argument might help me here but I haven't been able to make any combination work.
Edit: Episode IV -- A New Dead End
(Thanks @r2evans for fixing my omission of packages.)
I thought I might be able to trick the legend by mutating a further column into the df in the processing pipe called spCLR to represent the color ('#FF0000', e.g.) that codes each entry's species (redundant info, but fine). Now the plotting call in my real version goes:
df %>% [everything] %>%
ggplot(aes(x = X, y = Y, height = WIDTH, width = WIDTH, fill = CLR)) +
geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=spCODE, breaks=spCLR)
But this gives the error: Error in check_breaks_labels(breaks, labels) : object 'spCLR' not found. That seems weird since spCLR is indeed in the pipe-modified df, and of all the values supplied to the ggplot functions spCODE is the only one present in the original df -- so if there's some kind of scope problem I don't get it. [Re-edit -- I see that neither labels nor breaks wants to look at df$anything. Anyway.]
I assume (rightly?) there's some way to make this one work [?], but it still wouldn't make the legend show 'red', 'blue' and 'green' in my toy example -- which is what my original question is really about -- because there is still no actual green-data present in that. So to reiterate, isn't there any way to force a ggplot2 legend to show the things you want to talk about, rather than just the ones that are present in the data?
I have belatedly discovered that my question is a near-duplicate of this one. The accepted answer there (from @joran) doesn't work for this, but the second answer (from @Axeman) does. So the way for me to go here is to make the last line
labels=c('red','blue','green'), limits=c('#FF0000','#0000FF','#00FF00'))
using the limits argument instead of breaks, and now both my example and my real version work as desired.
I have to say I spent a lot of time digging around in the ggplot2 reference without ever suspecting that limits was the correct alternative to breaks: breaks is explicitly mentioned on that reference page, while limits does not appear. The ?limits page is quite uninformative, and I can't find anything that lays out the distinction between the two, i.e. when to use one rather than the other.
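The difference can be seen by building the toy example twice, once with breaks and once with limits (a sketch; base, p_breaks and p_limits are illustrative names). For a discrete scale such as scale_fill_identity(), the limits define the set of possible values, and breaks that fall outside the limits are dropped from the legend. By default the limits are the values found in the data, so a green break with no green tiles never gets a key; supplying limits directly makes all three colours part of the scale whether or not they occur.

```r
library(dplyr)
library(ggplot2)

# toy data from the question
square <- expand.grid(X = 0:10, Y = 0:10)
sq2 <- square[rep(row.names(square), 2), ] %>%
  arrange(X, Y) %>%
  mutate(SPEC = rep(c("red", "blue"), length.out = n())) %>%
  mutate(POP = ifelse(SPEC %in% "red", X, Y)) %>%
  group_by(X, Y) %>%
  mutate(CLR = rgb(X / 10, 0, Y / 10)) %>%
  ungroup()

base <- ggplot(sq2, aes(x = X, y = Y, fill = CLR)) + geom_tile()

# breaks only: "#00FF00" is not among the data values (the default limits),
# so it is dropped and the legend shows just two keys
p_breaks <- base +
  scale_fill_identity("Species", guide = "legend",
                      labels = c("red", "blue", "green"),
                      breaks = c("#FF0000", "#0000FF", "#00FF00"))

# limits: all three colours belong to the scale, so all three get a key
p_limits <- base +
  scale_fill_identity("Species", guide = "legend",
                      labels = c("red", "blue", "green"),
                      limits = c("#FF0000", "#0000FF", "#00FF00"))
p_limits
```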
I assume from the heatmap use case that you have no other need for colour mapping in the chart. In this case, a possible workaround is to leave the fill scale alone and create an invisible geom layer with a colour aesthetic mapping to generate the desired legend instead:
ggplot(sq2, aes(x=X, y=Y)) +
geom_tile(aes(fill = CLR)) + # move fill mapping here so the new point layer doesn't inherit it
scale_fill_identity() + # scale_*_identity() has guide = "none" by default
# add an invisible layer with a colour (not fill) mapping, using x/y coordinates
# within the same range as the geom_tile() layer above
geom_point(data = . %>%
slice(1:3) %>%
# optional: list colours in the desired label order
mutate(col = forcats::fct_inorder(c("red", "blue", "green"))),
aes(colour = col),
alpha = 0) +
# add colour scale with alpha set to 1 (overriding alpha = 0 above),
# also make the shape square & larger to mimic the default legend keys
# associated with fill scale
scale_color_manual(name = "Species",
values = c("red" = '#FF0000', "blue" = '#0000FF', "green" = '#00FF00'),
guide = guide_legend(override.aes = list(alpha = 1, shape = 15, size = 5)))

ggplot2 multi-variable scatterplot, Changing Labels and View in Margins

I am trying to create a scatterplot based on four values. My data is just lists of prices (BASIC, VALUE, DELUXE, ULTIMATE). I want DELUXE and VALUE to be the two axes (x and y), and the size and color of the points to represent the data in the other two columns.
It is hard to set up a reproducible example, because it is only an issue when I have a lot of values listed. I have about 300 points, with about 30 different color/value labels (for ULTIMATE) and 20 size/value labels (for BASIC).
gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE, color=ULTIMATE_PRICE, size=BASIC_PRICE)) +
geom_point(alpha = 1)
plot(gg)
My code does this well, and lists the colors/size with the corresponding value on the side. This is great, but I would like to alter how that is displayed, so that it is not cut off. I would like to be able to "wrap" the values into more columns, or shrink the display size of those so that they fit.
Currently, this lists ULTIMATE in three columns to the right of the plot area, but cuts off the top of the labels (the legend extends well above the plot area).
This lists BASIC size/value labels to the right of the plot area, below ULTIMATE labels, in one column, so about half are cut off at the bottom.
I can increase the margins with:
gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE, color=ULTIMATE_PRICE, size=BASIC_PRICE)) +
geom_point(alpha = 1) +
theme(plot.margin = unit(c(4,2,4,2), "cm"))
plot(gg)
This gets more of it in, but creates lots of white space and a smaller view of the plot. I would like to be able to increase just the right margin if necessary, and "wrap" the labels into more columns extending to the right (i.e. put ULTIMATE into 4 columns instead of 3, and BASIC into 3-4 columns instead of 1), so that the legends are shorter and don't run out of the plot area.
I found some built-in functionality that does what I need. It lies in adding a guides() call to the plot, specifying whether I am dealing with the color or the size legend, and setting the number of columns with ncol = (you can also specify rows with nrow =). Giving each legend an order ranking lets you order them as well (order = 0 is the default and leaves the ordering to ggplot2, so positive values are used here), so my resulting code was:
gg <- ggplot(Table, aes(x=DELUXE_PRICE, y=VALUE_PRICE, color=ULTIMATE_PRICE, size=BASIC_PRICE)) +
geom_point(alpha = 1) +
guides(color = guide_legend(order = 1, ncol = 4),
size = guide_legend(order = 2, ncol = 4))
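Since the original data isn't available, here is a self-contained sketch of the same guides() approach on simulated data. The data frame d, its value ranges, and the choice to store ULTIMATE_PRICE and BASIC_PRICE as ordered factors (so that every distinct price gets its own legend key, roughly matching the 30 and 20 labels described in the question) are all assumptions. The theme(legend.text = ...) line is an optional extra for shrinking the key labels.

```r
library(ggplot2)

set.seed(1)
# hypothetical simulated data: 300 rows, 30 possible ULTIMATE prices and
# 20 possible BASIC prices
ultimate_levels <- seq(10, 300, by = 10)  # 30 values
basic_levels <- seq(5, 100, by = 5)       # 20 values
d <- data.frame(
  DELUXE_PRICE = runif(300, 50, 150),
  VALUE_PRICE = runif(300, 20, 80),
  ULTIMATE_PRICE = factor(sample(ultimate_levels, 300, replace = TRUE),
                          levels = ultimate_levels, ordered = TRUE),
  BASIC_PRICE = factor(sample(basic_levels, 300, replace = TRUE),
                       levels = basic_levels, ordered = TRUE)
)

gg <- ggplot(d, aes(x = DELUXE_PRICE, y = VALUE_PRICE,
                    color = ULTIMATE_PRICE, size = BASIC_PRICE)) +
  geom_point(alpha = 1) +
  # wrap both legends into 4 columns; colour legend first, size legend second
  guides(color = guide_legend(order = 1, ncol = 4),
         size = guide_legend(order = 2, ncol = 4)) +
  # optional: smaller key labels so the wrapped legends take less room
  theme(legend.text = element_text(size = 7))
gg
```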

ggplot facet_wrap_paginate pages grouped by variable

I'm working on creating some harvest plots for a paper and am stumbling a bit with the code. I have recreated the important bits using 'diamonds' so that it's easier for people to reproduce.
Part 1
The aim is to create a bar chart that facets by multiple variables, e.g. 'carat' and 'color', as these will act as the titles for the plots. I've used ggforce's facet_wrap_paginate() to spread it over multiple pages, as I'd like each page to show the results for one group; here I've added the value '1', '2' or '3' to each row of the data frame. Whilst I could subset the data frame and create the plots individually, the issue is that the widths of the bars aren't consistent between pages, even when I add width = x to geom_bar (though the widths are the same within each page).
Does anyone have an idea of how I can accomplish this? I was wondering if aes_string would help, but wasn't sure it'd work with the multiple facets I need.
Part 2
When I try to add some code to save the images, it overrides the grid.arrange ... command that specifies the plot size (so that they are all consistent) and adjusts to fill the white space. Is this easily fixed?
Thanks,
Cal
library(ggplot2)
library(ggforce)
library(plyr)
library(dplyr)
library(grid)
library(gridExtra) # grid.arrange()
library(egg)       # set_panel_size()
df = diamonds
df$Group<- rep(1:3,length.out=nrow(df))
# loop once per group value; looping over df$Group itself would run
# nrow(df) (53,940) times
for (i in unique(df$Group)) {
p <- ggplot(data=df, aes(x=cut, y=clarity, fill=price)) +
# preserve = single keeps all bars same width, rather than adjusting to
# the space
geom_bar(position=position_dodge2(preserve = 'single'),
stat="identity", color = "black", linewidth = 0.2) + # size = 0.2 before ggplot2 3.4.0
# paginate allows the chart to be printed on multiple pages
# strip position adds facet title to top of page
facet_wrap_paginate(c("carat","color"), ncol = 3, nrow = 3,
scales = "fixed", strip.position = "top", page = i)
# manually adjust the size of the plot panels
grid.arrange(grobs = lapply(
list(p),
set_panel_size,
width = unit(8,"cm"),
height = unit(5,"cm")
))
}

ggplot axis tick labels getting cut off [duplicate]

I have a plot that is a simple barplot of number of each type of an event. I need the labels of the plot to be under the plot as some of the events have very long names and were squashing the plot sideways. I tried to move the labels underneath the plot but it now gets squashed upwards when there are lots of event types. Is there a way of having a static plot size (i.e. for the bar graph) so that long legends don't squash the plot?
My code:
library(ggplot2)
ggplot(counts_df, aes(x = Var2, y = value, fill = Var1)) +
geom_bar(stat = "identity") +
theme(legend.position = "bottom") +
theme(legend.direction = "vertical") +
theme(axis.text.x = element_text(angle = -90))
The result:
I think this is because the image size must be static so the plot gets sacrificed for the axis. The same thing happens when I put a legend beneath the plot.
There are several ways to avoid overplotting of labels or squeezing of the plot area, or to improve readability in general. Which of the proposed solutions is most suitable will depend on the lengths of the labels, the number of bars, and a number of other factors. So, you will probably have to play around.
Dummy data
Unfortunately, the OP hasn't included a reproducible example, so we have to make up our own data:
V1 <- c("Long label", "Longer label", "An even longer label",
"A very, very long label", "An extremely long label",
"Long, longer, longest label of all possible labels",
"Another label", "Short", "Not so short label")
df <- data.frame(V1, V2 = nchar(V1))
yaxis_label <- "A rather long axis label of character counts"
"Standard" bar chart
Labels on the x-axis are printed upright, overplotting each other:
library(ggplot2) # version 2.2.0+
p <- ggplot(df, aes(V1, V2)) + geom_col() + xlab(NULL) +
ylab(yaxis_label)
p
Note that geom_col(), added in ggplot2 2.2.0, is used here instead of geom_bar(stat="identity").
OP's approach: rotate labels
Labels on the x-axis are rotated by 90°, squeezing the plot area:
p + theme(axis.text.x = element_text(angle = 90))
Horizontal bar chart
All labels (including the y-axis label) are printed upright, improving readability but still squeezing the plot area (though to a lesser extent, as the chart is in landscape format):
p + coord_flip()
Vertical bar chart with labels wrapped
Labels are printed upright, overplotting is avoided, and squeezing of the plot area is reduced. You may have to play around with the width parameter of stringr::str_wrap().
q <- p + aes(stringr::str_wrap(V1, 15), V2) + xlab(NULL) +
ylab(yaxis_label)
q
Horizontal bar chart with labels wrapped
My favorite approach: all labels are printed upright, improving readability, and squeezing of the plot area is reduced. Again, you may have to play around with the width parameter of stringr::str_wrap() to control the number of lines the labels are split into.
q + coord_flip()
Addendum: Abbreviate labels using scale_x_discrete()
For the sake of completeness, it should be mentioned that ggplot2 is able to abbreviate labels. In this case, I find the result disappointing.
p + scale_x_discrete(labels = abbreviate)
To clarify, what this question appears to be asking about is how to specify the panel size in ggplot2.
I believe that the correct answer to this question is 'you just can't do that'.
As of the present time, there does not seem to be any parameter that can be set in any ggplot2 function to achieve this. If there were one, I think it would most likely take the form of height and width arguments to an element_rect() call within theme() (which is how we make other changes to the panel, e.g. altering its background colour), but there is nothing resembling those in the docs for element_rect, so my best guess is that specifying the panel size is impossible:
https://ggplot2.tidyverse.org/reference/element.html
The following reference is old but I can't find anything more up to date that positively confirms whether or not this is the case:
https://groups.google.com/forum/#!topic/ggplot2/nbhph_arQ7E
In that discussion, someone asks whether it is possible to specify the panel size, and Hadley says 'Not yet, but it's on my to do list'. That was nine years ago; I guess it's still on his to do list!
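Outside ggplot2 itself, one workaround is the egg package's set_panel_size() (also used in the facet_wrap_paginate question above), which edits the plot's gtable so the panel gets a fixed absolute size and long labels enlarge the overall figure instead of shrinking the panel. A sketch, assuming egg is installed and reusing the dummy data from the answer above; the 8 cm x 5 cm panel size is arbitrary.

```r
library(ggplot2)
library(grid)
library(egg)  # provides set_panel_size()

V1 <- c("Long label", "Longer label", "An even longer label",
        "A very, very long label", "An extremely long label",
        "Long, longer, longest label of all possible labels",
        "Another label", "Short", "Not so short label")
df <- data.frame(V1, V2 = nchar(V1))

p <- ggplot(df, aes(V1, V2)) + geom_col() + xlab(NULL) +
  theme(axis.text.x = element_text(angle = 90))

# convert the plot to a gtable and fix the panel at 8 cm x 5 cm;
# the axis labels get whatever extra space they need around it
g <- set_panel_size(p, width = unit(8, "cm"), height = unit(5, "cm"))
grid.newpage()
grid.draw(g)
```

When saving, the device still has to be large enough for the fixed panel plus the labels.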
One more solution in addition to those above - use staggered labels. These can be used with text wrapping to get a fairly readable result:
p + scale_x_discrete(guide = ggplot2::guide_axis(n.dodge = 2),
labels = function(x) stringr::str_wrap(x, width = 20))
(Using the plot p from @Uwe's answer)
I found that other methods didn't quite give me what I wanted, so I made this function to add a couple of dots after long names:
library(magrittr) # for %>%
# names longer than n_char are cut to n_char - 2 characters plus "..",
# so the result is never longer than n_char
tidy_name <- function(name, n_char) {
ifelse(nchar(name) > n_char,
paste0(substr(name, 1, n_char - 2), ".."),
name)
}
vec <- c("short", "medium string", "very long string which will be shortened")
vec %>% tidy_name(20)
# [1] "short" "medium string" "very long string w.."
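To apply a truncating helper like this to the axis itself, it can be passed as the labels function of scale_x_discrete(). The sketch below is self-contained, so it repeats the dummy data from @Uwe's answer and defines a pipe-free variant of tidy_name() that keeps every result within n_char characters; the width of 20 is arbitrary.

```r
library(ggplot2)

# truncate names longer than n_char to n_char - 2 characters plus ".."
tidy_name <- function(name, n_char) {
  ifelse(nchar(name) > n_char,
         paste0(substr(name, 1, n_char - 2), ".."),
         name)
}

V1 <- c("Long label", "Longer label", "An even longer label",
        "A very, very long label", "An extremely long label",
        "Long, longer, longest label of all possible labels",
        "Another label", "Short", "Not so short label")
df <- data.frame(V1, V2 = nchar(V1))

p <- ggplot(df, aes(V1, V2)) + geom_col() + xlab(NULL)

# shorten only the displayed tick labels; the data itself is unchanged
p_short <- p + scale_x_discrete(labels = function(x) tidy_name(x, 20))
p_short
```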
