Aligning groups of points and of boxplots in ggplotly - r

I am trying to show both points and boxplots of the same data interactively with ggplotly.
"Dodged" positioning does the job in ggplot, but after conversion to plotly the positioning goes off. How do I get boxes and points to line up? (Essentially throwing points on top of this question. I also realize that an answer to this question would likely also be an answer to my question, though there may be more answers for my issue.)
What I want is for both layers to show up together, even when a group is missing at a location (either centered or in the group location), for example like so:
What I get with interactivity so far is this:
library(plotly)
mtcars_boxplot <- mtcars %>%
mutate(cyl=as.factor(cyl)) %>%
mutate(vs=as.factor(vs)) %>%
ggplot(aes(y=mpg, x=cyl)) +
geom_boxplot(aes(color=vs), position=position_dodge())+
geom_point(aes(color=vs), position=position_jitterdodge(), size = 0.5)
mtcars_boxplot %>%
ggplotly() %>%
layout(boxmode='group')
You can see that for cyl=8, the points are centered, but the box shows up in its group's location.
My question is: how do I get an interactive version of the first image, or something similar (preferably using ggplotly)?

I found a way to do this--not with ggplot, but pure plotly:
mtcars_boxplot <- mtcars %>%
mutate(cyl=as.factor(cyl)) %>%
mutate(vs=as.factor(vs)) %>%
plot_ly(type="box",
x = ~cyl,
y = ~mpg,
color = ~vs,
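# NOTE: MOTART is not a column of mtcars (presumably left over from the author's own data);
# substitute a column from your data or drop this argument
alignmentgroup = ~MOTART,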
boxpoints = "all",
pointpos = 0,
jitter = 1) %>%
layout(boxmode='group')
If there is a ggplotly-answer, I would still love to know that one. (This actually ends up aligning more nicely, but is also more work when working in ggplot otherwise.)

Related

Possible to force non-occurring elements to show in ggplot legend?

I'm plotting a sort of choropleth of up to three selectable species abundances across a research area. This toy code behaves as expected and does almost what I want:
library(dplyr)
library(ggplot2)
square <- expand.grid(X=0:10, Y=0:10)
sq2 <- square[rep(row.names(square), 2),] %>%
arrange(X,Y) %>%
mutate(SPEC = rep(c('red','blue'),len=n())) %>%
mutate(POP = ifelse(SPEC %in% 'red', X, Y)) %>%
group_by(X,Y) %>%
mutate(CLR = rgb(X/10,0,Y/10)) %>% ungroup()
ggplot(sq2, aes(x=X, y=Y, fill=CLR)) + geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=c('red','blue'), breaks=c('#FF0000','#0000FF'))
Producing this:
A modified version properly plots the real map, appropriately mixing the RGBs to show the species proportions per map unit. But given that mixing, the real data does not necessarily include the specific values listed in breaks, in which case no entry appears in the legend for that species. If you change the last line of the example to
labels=c('red','blue','green'), breaks=c('#FF0000','#0000FF','#00FF00'))
you get the same legend as shown, with only 'red' and 'blue' displayed, as there is no green in it. Searching the data for each max(Species) and assigning those to the legend is possible but won't make good legend keys for species that only occur in low proportions. What's needed is for the legend to display the idea of the entities present, not their attested presences -- three colors in the legend even if only one species is detected.
I'd think that scale_fill_manual() or the override.aes argument might help me here but I haven't been able to make any combination work.
Edit: Episode IV -- A New Dead End
(Thanks @r2evans for fixing my omission of packages.)
I thought I might be able to trick the legend by mutating a further column into the df in the processing pipe called spCLR to represent the color ('#FF0000', e.g.) that codes each entry's species (redundant info, but fine). Now the plotting call in my real version goes:
df %>% [everything] %>%
ggplot(aes(x = X, y = Y, height = WIDTH, width = WIDTH, fill = CLR)) +
geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=spCODE, breaks=spCLR)
But this gives the error: Error in check_breaks_labels(breaks, labels) : object 'spCLR' not found. That seems weird since spCLR is indeed in the pipe-modified df, and of all the values supplied to the ggplot functions spCODE is the only one present in the original df -- so if there's some kind of scope problem I don't get it. [Re-edit -- I see that neither labels nor breaks wants to look at df$anything. Anyway.]
I assume (rightly?) there's some way to make this one work [?], but it still wouldn't make the legend show 'red', 'blue' and 'green' in my toy example -- which is what my original question is really about -- because there is still no actual green-data present in that. So to reiterate, isn't there any way to force a ggplot2 legend to show the things you want to talk about, rather than just the ones that are present in the data?
I have belatedly discovered that my question is a near-duplicate of this. The accepted answer there (from @joran) doesn't work for this but the second answer (from @Axeman) does. So the way for me to go here is that the last line should be
labels=c('red','blue','green'), limits=c('#FF0000','#0000FF','#00FF00'))
that is, passing the limits argument instead of breaks, and now my example and my real version work as desired.
I have to say I spent a lot of time digging around in the ggplot2 reference without ever gaining a suspicion that limits() was the correct alternative to breaks() -- which is explicitly mentioned in that ref page while limits() does not appear. The ?limits() page is quite uninformative, and I can't find anything that lays out the distinctions between the two: when this rather than that.
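For what it's worth, the difference can be sketched with a toy example. As far as I understand it, limits defines the full set of values a discrete scale covers (whether or not they occur in the data), while breaks only picks which of those values get a legend key, so a break that falls outside the limits is dropped. The data frame and the species_colours name below are made up for illustration. show.legend = TRUE is an extra assumption on my part: newer ggplot2 releases (3.5.0 onwards, I believe) may otherwise leave the key for an unused value blank.

```r
library(ggplot2)

# Toy data (made up): only red and blue actually occur
toy <- data.frame(X = 1:2, Y = 1, CLR = c("#FF0000", "#0000FF"))

# Every colour the legend should list, present in the data or not
species_colours <- c("#FF0000", "#0000FF", "#00FF00")

p <- ggplot(toy, aes(x = X, y = Y, fill = CLR)) +
  geom_tile(show.legend = TRUE) +  # assumption: needed on newer ggplot2 to draw unused keys
  scale_fill_identity(
    name = "Species", guide = "legend",
    labels = c("red", "blue", "green"),
    limits = species_colours       # limits, not breaks: green is kept although absent
  )
p
```

Swapping limits back to breaks in this sketch should reproduce the original problem, with green missing from the legend.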
I assume from the heatmap use case that you have no other need for colour mapping in the chart. In this case, a possible workaround is to leave the fill scale alone, & create an invisible geom layer with colour aesthetic mapping to generate the desired legend instead:
ggplot(sq2, aes(x=X, y=Y)) +
geom_tile(aes(fill = CLR)) + # move fill mapping here so new point layer doesn't inherit it
scale_fill_identity() + # scale_*_identity has guide = "none" by default, so no fill legend is drawn
# add invisible layer with colour (not fill) mapping, within x/y coordinates within
# same range as geom_tile layer above
geom_point(data = . %>%
slice(1:3) %>%
# optional: list colours in the desired label order
mutate(col = forcats::fct_inorder(c("red", "blue", "green"))),
aes(colour = col),
alpha = 0) +
# add colour scale with alpha set to 1 (overriding alpha = 0 above),
# also make the shape square & larger to mimic the default legend keys
# associated with fill scale
scale_color_manual(name = "Species",
values = c("red" = '#FF0000', "blue" = '#0000FF', "green" = '#00FF00'),
guide = guide_legend(override.aes = list(alpha = 1, shape = 15, size = 5)))

ggplot geom_text_repel text exceeding the limit of plot

How can I prevent geom_text_repel() from cutting off labels that are close to the plot boundary? Here is an example with facet_grid: in the chr3 facet, the label at the top, "ZNF717", is not completely displayed.
An example with mtcars, forcing 20 facets and long labels:
mtcars %>%
rowwise() %>%
mutate(label="test_label") %>%
mutate(facet=runif(n = n(),min = 1,max=20)) %>%
ggplot(aes(x=disp,y=hp,label=label)) +
geom_text_repel() +
facet_grid(~facet)
Each panel is self contained and by default plotting is limited to the plotting area. This can be overridden by modifying the default coordinates. With this extreme example, using facet_wrap() with two rows was needed. I also decreased the font size of the labels, and restricted repulsion so that it moves labels only vertically. (Obviously tick labels and panel names would need to be tweaked further in real use.)
library(ggplot2)
library(ggrepel)
library(dplyr)
mtcars %>%
rowwise() %>%
mutate(label="test_label") %>%
mutate(facet=runif(n = n(),min = 1,max=20)) %>%
ggplot(aes(x=disp,y=hp,label=label)) +
geom_text_repel(direction = "y", hjust = 0.5, size = 2) +
facet_wrap(~facet, nrow = 2) +
coord_cartesian(clip = "off")
The code above answers the question but creates a new problem, at least in the mtcars example: because geoms work on a panel-by-panel basis, the repulsion cannot prevent overlap of labels that extend into neighbouring panels. Surprisingly, some unexpected clipping on the left side also takes place when saving to bitmap formats but not when saving to PDF (at least within RStudio).
A further option is to make sure that the labels fit in the available space, either by using the angle aesthetic to rotate the labels or by abbreviating the label text.

Highcharter: Adjust colours for multiple series

I am trying to add multiple layers to highcharter plots. I am not sure how to adjust colours for each layer independently. I want each group to have the same colour and the background polygon at lower opacity. Below is a working example. Perhaps there is a better way to build up layers.
library(highcharter)
data(iris)
hull <- data.frame(x=c(5.5,4.5,4.3,4.6,5.2,5.7,5.8,5.7,6.2,5,4.9,5.4,6,7,6.8,7.7,6,4.9,6.2,7.7,7.9),y=c(3.5,2.3,3,3.6,4.1,4.4,4,3.8,2.2,2,2.4,3,3.4,3.2,2.8,2.6,2.2,2.5,3.4,3.8,3.8),Species=c('setosa','setosa','setosa','setosa','setosa','setosa','setosa','setosa','versicolor','versicolor','versicolor','versicolor','versicolor','versicolor','versicolor','virginica','virginica','virginica','virginica','virginica','virginica'))
hchart(hull,"polygon",hcaes(x,y,group="Species",opacity=0.2)) %>%
hc_add_series(data=iris,type="scatter",hcaes(Sepal.Length,Sepal.Width,group="Species"),showInLegend=F) %>%
hc_colors(colors=c("#A6CEE3","#1F78B4","#B2DF8A","#33A02C"))
Is this something like what you want?
hchart(hull,"polygon",hcaes(x,y,group="Species",opacity=0.5)) %>%
hc_add_series(
data=iris,type="scatter",
hcaes(
Sepal.Length,
Sepal.Width,
group="Species",
color = c(setosa = "#A6CEE3",versicolor = "#1F78B4",virginica = "#B2DF8A")[Species]
),
showInLegend=F
) %>%
hc_colors(colors=c("#A6CEE399","#1F78B499","#B2DF8A99","#33A02C99"))
You can add opacity to hex colors by appending two more hex digits for the alpha channel; in this case I used 99 (153 of 255, i.e. 60% opacity).
Hope this helps
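If you would rather compute the two-digit suffix from an opacity value than look it up, here is a minimal sketch in base R. add_alpha and polygon_cols are names made up for this example and are not part of highcharter.

```r
# Hypothetical helper: append an alpha channel to "#RRGGBB" colours
add_alpha <- function(hex, opacity) {
  stopifnot(all(grepl("^#[0-9A-Fa-f]{6}$", hex)), opacity >= 0, opacity <= 1)
  # Scale opacity (0..1) to 0..255 and format it as two upper-case hex digits
  alpha <- sprintf("%02X", as.integer(round(opacity * 255)))
  paste0(hex, alpha)
}

# 60% opacity gives the "99" suffix used above
polygon_cols <- add_alpha(c("#A6CEE3", "#1F78B4", "#B2DF8A"), 0.6)
polygon_cols
```

The resulting vector can be passed to hc_colors(colors = polygon_cols) in place of the hand-written strings.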

Tornado-like plot with two variables

A related question that uses three variables is easier to solve.
This should be seemingly simple, but I couldn't get it to work. Here's a simple example:
test_me<-data.frame(A=c(-1.5,-5.6,-4.6,-7.8,0.98,0.07,-0.32,-0.4,-0.4),
B=c("A","A","A","B","B","B","C","C","C"),
stringsAsFactors=TRUE) # B must be a factor for col= below; R >= 4.0 no longer converts strings by default
The kind of plot I would like to make with ggplot2 (not shown, to keep the post as concise as possible), done with base:
barplot(test_me$A,col=test_me$B,legend=test_me$B)
This gives me the kind of plot I need. However, barplot returns duplicated names in the legend and efforts to remove these were futile. I could use lattice or barchart but would prefer a solution that either replicates this in ggplot2 or removes the duplicated legend entries in base's output.
Here is one of several things I've tried:
library(ggplot2)
ggplot(test_me,aes(B,A,fill=B))+geom_col()
The above won't work with changes to position. How can I best make this plot? I tried to set manual legends with legend.text in barplot but that removes the "grouping".
EDIT:
The solution below might solve the issue but it leads to overlap in bars unlike the base equivalent. I would therefore prefer a solution that uses base with elimination of the multiple entries in the legend. In short, how can I have a grouped barplot with just two variables and unique legend entries?
test_me %>%
mutate(x = row_number()) %>%
ggplot(aes(x = x, y = A, fill = B)) +
geom_col()
The issue, however, is that the above solution results in overlap, whereas the base plot shows three groups of non-overlapping bars.
Thanks.
You need to give each element a discrete value on the x axis. Try this:
library(dplyr) # for %>%, mutate() and row_number()
library(ggplot2)
test_me %>%
mutate(x = row_number()) %>%
ggplot(aes(x = x, y = A, fill = B)) +
geom_col()
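Since the question also asks for a base-graphics route with unique legend entries, here is a minimal sketch of one way to do that: leave the legend argument out of barplot() and draw the legend separately with legend(), one entry per group. The names groups and group_cols are made up for this example, and the colours are simply the first three entries of the default palette.

```r
test_me <- data.frame(
  A = c(-1.5, -5.6, -4.6, -7.8, 0.98, 0.07, -0.32, -0.4, -0.4),
  B = factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C"))
)

groups <- levels(test_me$B)      # one legend label per group
group_cols <- seq_along(groups)  # palette indices 1, 2, 3

# Colour each bar by the palette index of its group
barplot(test_me$A, col = group_cols[as.integer(test_me$B)])

# Draw the legend separately so that each group appears only once
legend("bottomright", legend = groups, fill = group_cols, title = "B")
```

The legend position is arbitrary; "bottomright" is used here only because most of the bars point downwards on the left.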

stacked bar chart in ggplot doesn't render right when converted to plotly

Update: adding minimal reproducible code for the data.
I'm trying to convert a ggplot to a plotly chart in Shiny. The problem is that in ggplot the stacked bar chart (with stat = "identity") stacks up nicely without any spaces in between, whereas after conversion to plotly there are spaces between the individual items.
I am not posting the entire Shiny code, as it is difficult to follow. However, here are the images and much simplified code (not the Shiny version):
library(dplyr) # for left_join()
library(plotly) # for ggplotly(); also attaches ggplot2
t<- 1:50
GIB_Rating = rep(c('2','3','4+','5','5-','7','8','6','6+','5'),5)
t1<-data.frame(t,GIB_Rating)
CapitalChargeType = c('Credit_risk_Capital','NameConcentration','SectorConcentration')
t2<- expand.grid(t=t, CapitalChargeType=CapitalChargeType)
t3<-left_join(t2,t1)
New = rnorm(150, mean=100,sd=250)
t3<- data.frame(t3,New)
t3<- ggplot(t3, aes(x=GIB_Rating, y=New, fill=CapitalChargeType)) + geom_bar(stat='identity')
t3
This produces an image somewhat like this, which is exactly what I want.
However, as it is not interactive, I want a plotly chart that shows the total for each capital charge type when the cursor hovers over it. So I use the code below:
t4<-ggplotly(t3)
t4
The plotly plot now has white lines between the individual items within each color class (CapitalChargeType), which I want to avoid. The tooltip also shows individual items rather than the sum for each CapitalChargeType.
The issue is in the way plotly handles stacked bars of factors. Each factor gets wrapped in a border which is white by default. There's a very easy workaround: Just add color = CapitalChargeType to the ggplot object.
library(tidyverse)
df <- tibble( # data_frame() is deprecated in favour of tibble()
t = 1:50,
GIB_Rating = rep(c('2','3','4+','5','5-','7','8','6','6+','5'),5)
)
df <- df %>%
expand(t, CapitalChargeType =
c('Credit_risk_Capital','NameConcentration','SectorConcentration')) %>%
left_join(df) %>%
mutate(New = rnorm(150, mean=100,sd=250))
g <- ggplot(df, aes(x = GIB_Rating, y = New, fill = CapitalChargeType, color = CapitalChargeType)) +
geom_bar(stat='identity')
plotly::ggplotly(g)
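The border fix does not change what the tooltip shows. One possible way to get totals on hover, sketched here under my own assumptions rather than taken from the answer above, is to sum New per rating and charge type before plotting, so that each coloured segment is a single bar. The names ratings, charges and totals and the seed are made up for this example. Note that with mixed-sign values like these, the summed segments can look different from the stacked original, because positive and negative items are netted instead of being stacked separately.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

ratings <- data.frame(
  t = 1:50,
  GIB_Rating = rep(c('2','3','4+','5','5-','7','8','6','6+','5'), 5)
)
charges <- c('Credit_risk_Capital', 'NameConcentration', 'SectorConcentration')

# merge() without shared columns is a cross join: 50 x 3 = 150 rows
df <- merge(ratings, data.frame(CapitalChargeType = charges))
df$New <- rnorm(nrow(df), mean = 100, sd = 250)

# One row per rating and charge type, so every segment is a single bar
totals <- df %>%
  group_by(GIB_Rating, CapitalChargeType) %>%
  summarise(New = sum(New), .groups = "drop")

g <- ggplot(totals, aes(x = GIB_Rating, y = New, fill = CapitalChargeType)) +
  geom_col()
```

Passing g to plotly::ggplotly() as above should then give one hover label per segment, carrying the summed value.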
