Plotly flips ggplot2 boxplot - r

I want to draw a boxplot from summary statistics only (INPUT holds the summary statistics for each ID).
plot1 is what I want, but when I convert it to a plotly object something goes wrong (i.e., plotly flips the boxplot).
However, if I draw the boxplot the usual way (without stat = "identity"), everything works fine.
Question: Why does plotly "flip" a summarised ggplot2 boxplot, and how can I avoid this?
library(broom)
library(plotly)
library(tidyverse)
# Generate random data
# Calculate statistics
INPUT <- rnorm(100) %>%
matrix(10) %>%
apply(2, function(x) tidy(summary(x))) %>%
bind_rows() %>%
mutate(ID = letters[1:10])
# Plot boxplot using statistics
plot1 <- ggplot(INPUT, aes(ID)) +
geom_boxplot(stat = "identity", aes(
lower = q1,
upper = q3,
middle = median,
ymin = minimum,
ymax = maximum))
# Only ggplot2 produces the right result
plot1; ggplotly(plot1)
# Plot boxplot usual way
plot2 <- INPUT %>%
gather(variable, value, -ID) %>%
ggplot(aes(ID, value)) +
geom_boxplot()
# Both ggplot2 and plotly produce the right result
plot2; ggplotly(plot2)
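One possible workaround, which is not from the thread and does not explain why ggplotly() flips the boxes, is to skip the ggplot2 conversion and hand the summary statistics straight to a plotly box trace. The sketch below assumes a plotly version whose box trace accepts the precomputed attributes q1, median, q3, lowerfence and upperfence. The statistics are computed with base R rather than broom, and the seed is only there to make the example reproducible.

```r
library(plotly)

set.seed(1)  # assumption: fixed seed, not part of the original question
m <- matrix(rnorm(100), nrow = 10)

# One row of summary statistics per column of m (one row per ID)
stats <- data.frame(
  ID      = letters[1:10],
  minimum = apply(m, 2, min),
  q1      = apply(m, 2, quantile, probs = 0.25, names = FALSE),
  median  = apply(m, 2, median),
  q3      = apply(m, 2, quantile, probs = 0.75, names = FALSE),
  maximum = apply(m, 2, max)
)

# Pass the precomputed statistics to a box trace instead of raw observations
p <- plot_ly(
  stats, x = ~ID, type = "box",
  q1 = ~q1, median = ~median, q3 = ~q3,
  lowerfence = ~minimum, upperfence = ~maximum
)
# print(p) to display the widget
```

Because the whiskers are mapped to the minimum and maximum, no points are drawn as outliers, which mirrors what plot1 does with ymin and ymax.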


How do you get end-points on an ecdf curve using plotly in R?

I am trying to create a cumulative distribution function as a plotly object using ggplot2 and converting it using ggplotly, but the tails for 0 and 1 get lost in the conversion. How can I get the lines to extend to 0 and 1 using plotly in R?
This is the code I'm using:
gg <- ggplot(data=some_data, aes(x = x_var, color = grouping_var)) +
stat_ecdf()
ggplotly(gg)
This is the ggplot I get
But when I turn it into a plotly object the tails disappear and this is what I get
Edit: Editing to incorporate Quinten's request for recreation data:
new_data <- iris %>%
arrange(Petal.Length)
gg <- ggplot(data = new_data,
aes(x = Petal.Length,
color = Species)) +
stat_ecdf()
ggplotly(gg)
I think the problem here is the way that plotly handles the infinite x-axis values that stat_ecdf produces by default. Although this can be turned off with stat_ecdf(pad = FALSE), that just leaves your initial ggplot without the endpoints too.
We can replicate your problem, as you suggested in the comments, with the iris data set:
library(dplyr)
library(ggplot2)
library(plotly)
new_data <- iris %>%
arrange(Petal.Length)
gg <- ggplot(data = new_data, aes( x = Petal.Length, color = Species )) +
stat_ecdf()
In native ggplot, everything looks OK:
gg
But in plotly we miss the 0% and 100% lines because they stretch off to negative and positive infinity and are therefore dropped:
ggplotly(gg)
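To see those infinite values, you can inspect the data that ggplot2 computes for the ECDF layer. This is a minimal sketch, assuming layer_data() from ggplot2 and the default pad = TRUE; the object name built is just for illustration.

```r
library(ggplot2)

gg <- ggplot(iris, aes(x = Petal.Length, color = Species)) +
  stat_ecdf()

# Data computed by the first (and only) layer of the plot
built <- layer_data(gg, 1)

# With padding on, the computed x values include infinite endpoints
range(built$x)
table(is.infinite(built$x))
```

Those padded rows are what draw the flat tails in ggplot2, so a renderer that discards non-finite coordinates loses them.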
The way round this is to make the ecdf ourselves over a fixed range (say, 0 to 7). This requires a little data manipulation:
xmin <- 0
xmax <- 7
gg2 <- new_data %>%
group_by(Species) %>%
summarise(y = sapply(seq(xmin, xmax, 0.1), function(x) ecdf(Petal.Length)(x)),
Petal.Length = seq(xmin, xmax, 0.1)) %>%
ggplot(aes(Petal.Length, y, color = Species)) +
geom_step()
Now our ggplot looks like this:
gg2
And the plotly version remains faithful to this:
ggplotly(gg2)
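The reason this works can be checked without any plotting: ecdf() returns a vectorised step function, so evaluating it on a finite grid that covers the data gives curves that start at 0 and end at 1, with no infinite coordinates involved. This is a base-R sketch of the same computation; the names grid and curves are illustrative only.

```r
xmin <- 0
xmax <- 7
grid <- seq(xmin, xmax, by = 0.1)

# Evaluate each species' ECDF on the whole grid at once
by_species <- split(iris$Petal.Length, iris$Species)
curves <- do.call(rbind, lapply(names(by_species), function(sp) {
  data.frame(
    Species      = sp,
    Petal.Length = grid,
    y            = ecdf(by_species[[sp]])(grid)
  )
}))

# Lowest and highest ECDF value per species
tapply(curves$y, curves$Species, range)
```

The grid only has to extend past the smallest and largest observation (1.0 and 6.9 for Petal.Length) for both endpoints to appear.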

Filling bar colours with the mean of another continuous variable in ggplot2 histograms

I have a dataset at the municipality level. I would like to draw a histogram of a given variable and, at the same time, fill the bars with another continuous variable (using a color gradient). This is because I believe the municipalities with low values of the variable I am plotting the histogram for have very different population size (on average) when comparing with the municipalities that are in the upper end of the distribution.
Using the mtcars data, say I would like to plot the distribution of mpg and fill the bars with a continuous color to represent the mean of the variable wt for each of the histogram bars. I typed the code below, but I don't know how to make the fill option take the average of wt. I would like a legend with a color gradient to show whether the mean value of wt for each histogram bar is low, medium, or high in relative terms.
mtcars %>%
ggplot(aes(x=mpg, fill=wt)) +
geom_histogram()
If you want a genuine histogram you need to transform your data to do this by summarizing it first, and plot with geom_col rather than geom_histogram. The base R function hist will help you here to generate the breaks and midpoints:
library(ggplot2)
library(dplyr)
mtcars %>%
mutate(mpg = cut(x = mpg,
breaks = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$breaks,
labels = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$mids)) %>%
group_by(mpg) %>%
summarize(n = n(), wt = mean(wt)) %>%
ggplot(aes(x = as.numeric(as.character(mpg)), y = n, fill = wt)) +
scale_x_continuous(limits = c(0, 40), name = "mpg") +
geom_col(width = 10) +
theme_bw()
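To make the binning step easier to follow, here is what hist() returns for these breaks and how cut() reuses it. This is a base-R sketch of the summarising step only; the names h, bins and mean_wt are illustrative.

```r
h <- hist(mtcars$mpg, breaks = 0:4 * 10, plot = FALSE)
h$breaks  # bin edges: 0 10 20 30 40
h$mids    # bin midpoints: 5 15 25 35
h$counts  # cars per bin: 0 18 10 4

# Assign each car to a bin, labelled by the bin midpoint
bins <- cut(mtcars$mpg, breaks = h$breaks, labels = h$mids)

# Mean weight per bin; the empty first bin yields NA
mean_wt <- tapply(mtcars$wt, bins, mean)
mean_wt
```

The dplyr pipeline above drops the empty bin, which is why only three bars are drawn.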
This is not exactly a histogram, but it is the closest I could think of for your problem:
library(tidyverse)
mtcars %>%
#Create breaks for mpg, where this sequence is just an example
mutate(mpg_cut = cut(mpg,seq(10,35,5))) %>%
#Count and mean of wt by mpg_cut
group_by(mpg_cut) %>%
summarise(
n = n(),
wt = mean(wt)
) %>%
ggplot(aes(x=mpg_cut, fill=wt)) +
#Bar plot
geom_col(aes(y = n), width = 1)

Can we plot percentage with Plot_ly

Is there a way to plot percentages using plot_ly? For example, the code below plots the count of cut from the diamonds dataset:
plot_ly(diamonds, x = ~cut)
But I want to plot the percentage for each cut. For example, I need the percentage of "Good" relative to the total count. Is there a way to get it?
It could be done like this.
First, create percentage for each cut category
diamonds %>% group_by(cut) %>% summarize(perc = n()/53940*100)
Second, pipe the resultant data set to plot_ly()
diamonds %>% group_by(cut) %>% summarize(perc = n()/53940*100) %>% plot_ly(x = ~cut, y = ~perc)
You can use data.table and ggplot2:
library(data.table)
library(ggplot2)
dt <- data.table(diamonds)
Calculate the number of records by each cut, and then calculate the prop.table of those counts:
result <- dt[, .N, by = cut][, .(cut, N, percentCut = prop.table(N))]
Now you can plot it with ggplot and use the library scales to have a beautiful percent-formatted y-axis:
p <- ggplot(result, aes(x = cut, y = percentCut))+
geom_col()+
scale_y_continuous(labels = scales::percent)
Now you can pass p to plotly, if you want:
plotly::ggplotly(p)
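The first answer hard-codes the row count (53940). Here is a sketch that derives the total from the data instead and stays in plot_ly, assuming dplyr's count() and an explicit bar trace; the names pct and p_bar are illustrative.

```r
library(dplyr)
library(plotly)  # also attaches ggplot2, which provides the diamonds data

pct <- diamonds %>%
  count(cut) %>%                    # one row per cut with its count n
  mutate(perc = n / sum(n) * 100)   # share of the total, in percent

p_bar <- plot_ly(pct, x = ~cut, y = ~perc, type = "bar")
# print(p_bar) to display the widget
```

Dividing by sum(n) keeps the percentages correct if the data is filtered first.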

R Highlight point on ecdf line graph

I'm creating a frequency plot using ggplot and the stat_ecdf function. I would like to add the Y-value to the graph for specific X-values, but I just can't figure out how. geom_point or geom_text seem like likely options, but as stat_ecdf calculates Y automatically, I don't know how to refer to that value in the geom_point/geom_text mappings.
Sample code for my initial plot is:
x = as.data.frame(rnorm(100))
ggplot(x, aes(x)) +
stat_ecdf()
Now, how would I add specific x-y points here, e.g. the y-value at x = -1?
The easiest way is to create the ecdf function beforehand using ecdf() from the stats package, then plot it using geom_label().
library(ggplot2)
# create a data.frame with column name
x = data.frame(col1 = rnorm(100))
# create ecdf function
e = ecdf(x$col1)
# plot the result
ggplot(x, aes(col1)) +
stat_ecdf() +
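# give the label its own one-row data so it is drawn once, not once per observation
geom_label(data = data.frame(col1 = -1, y = e(-1)),
aes(y = y, label = round(y, 2)))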
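As a quick check of what that label shows: the function returned by ecdf() gives, for any q, the proportion of observations less than or equal to q. This is a base-R sketch; the seed is an assumption added for reproducibility.

```r
set.seed(42)  # assumption: any seed works, this only fixes the sample
x <- rnorm(100)

e <- ecdf(x)

# The ECDF at -1 is the share of observations at or below -1
e(-1)
mean(x <= -1)

# The function is vectorised, so several x positions can be looked up at once
e(c(-1, 0, 1))
```

Because e() is vectorised, the same approach extends to labelling several x positions in one geom_label() call.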
You can try
library(tidyverse)
# data
set.seed(123)
df = data.frame(x=rnorm(100))
# Plot
Values <- c(-1,0.5,2)
df %>%
mutate(gr=FALSE) %>%
bind_rows(data.frame(x=Values,gr=TRUE)) %>%
mutate(y=ecdf(x)(x)) %>%
mutate(xmin=min(x)) %>%
ggplot(aes(x, y)) +
stat_ecdf() +
geom_point(data=. %>% filter(gr), aes(x, y)) +
geom_segment(data=. %>% filter(gr),aes(y=y,x=xmin, xend=x,yend=y), color="red")+
geom_segment(data=. %>% filter(gr),aes(y=0,x=x, xend=x,yend=y), color="red") +
ggrepel::geom_label_repel(data=. %>% filter(gr),
aes(x, y, label=paste("x=",round(x,2),"\ny=",round(y,2))))
The idea is to compute the y values up front, together with the indicator gr, which specifies which of the Values you want to show.
Edit:
Since this code adds the highlighted points to the actual data, which can distort the curve, consider excluding them at least from the ECDF layer: stat_ecdf(data = . %>% filter(!gr))
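Another way to avoid the problem is to keep the highlighted points in their own data frame, so that both the curve and the y values come from the observed data only. This is a sketch of that variant, not the answer's own code; the names marks and p_marks are illustrative.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(x = rnorm(100))
Values <- c(-1, 0.5, 2)

# ECDF of the observed data only; the highlighted x values are not mixed in
e <- ecdf(df$x)
marks <- data.frame(x = Values, y = e(Values))

p_marks <- ggplot(df, aes(x)) +
  stat_ecdf() +
  geom_point(data = marks, aes(x, y), color = "red") +
  geom_text(data = marks, aes(x, y, label = round(y, 2)), vjust = -1)
# print(p_marks) to draw the plot
```

The segments and repelled labels from the answer above can be added in the same way by passing data = marks to those layers.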

gradient fill violin plots using ggplot2

I want to gradient-fill a violin plot based on the density of points in the bins (blue for the highest density and red for the lowest).
I generated a plot using the following commands but failed to color it by density (in this case, the width of the violin). I would also like to generate box plots with similar coloring.
library("ggplot2")
data(diamonds)
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
To change the colour of the violin plot, use fill = variable, like this:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin(aes(fill=cut))
The same goes for a boxplot:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=cut))
However, the fill variable must have a single value for each cut. If you wanted to use, for example, the mean price per cut as the colour variable, you would have to compute it yourself.
With dplyr, group diamonds by cut and use summarize to get the mean price (or any other variable):
library(dplyr)
diamonds_group <- group_by(diamonds, cut)
diamonds_group <- summarize(diamonds_group, Mean_Price = mean(price))
Then I used diamonds2 as a copy of diamonds to then manipulate the dataset
diamonds2 <- diamonds
I merge both data frames to get Mean_Price as a variable in diamonds2
diamonds2 <- merge(diamonds2, diamonds_group)
And now I can plot it with the mean price as the colour variable
ggplot(diamonds2, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=Mean_Price)) + scale_fill_gradient2(midpoint = mean(diamonds2$price))
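A shorter route to the same per-cut column is base R's ave(), which returns the group mean repeated for every row, so no summarize() and merge() step is needed. This swaps in a different technique from the one used above and is only a sketch; the name p_box is illustrative.

```r
library(ggplot2)

diamonds2 <- as.data.frame(diamonds)

# ave() defaults to the mean and returns one value per row, constant within each cut
diamonds2$Mean_Price <- ave(as.numeric(diamonds2$price), diamonds2$cut)

p_box <- ggplot(diamonds2, aes(x = cut, y = carat)) +
  geom_boxplot(aes(fill = Mean_Price)) +
  scale_fill_gradient2(midpoint = mean(diamonds2$price))
# print(p_box) to draw the plot
```

Unlike merge(), this keeps the rows in their original order.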
I just answered this in another thread, but I believe it is more appropriate here. You can create a pseudo-fill by drawing many segments. You can get those directly from the underlying data in the object returned by ggplot_build().
If you want an additional polygon outline ("border"), you need to create it from the x/y coordinates. One option is shown below.
library(tidyverse)
p <- ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
mywidth <- .35 # bit of trial and error
# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
mutate(xnew = x- mywidth*violinwidth, xend = x+ mywidth*violinwidth)
# the outline is a bit more convoluted, as the order matters
vl_poly <- vl_fill %>%
select(xnew, xend, y, group) %>%
pivot_longer(-c(y, group), names_to = "oldx", values_to = "x") %>%
arrange(y) %>%
split(., .$oldx) %>%
map(., function(x) {
if(all(x$oldx == "xnew")) x <- arrange(x, desc(y))
x
}) %>%
bind_rows()
ggplot() +
geom_polygon(data = vl_poly, aes(x, y, group = group),
color= "black", size = 1, fill = NA) +
geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
color = violinwidth))
Created on 2021-04-14 by the reprex package (v1.0.0)
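To get the colours asked for in the question (blue where the violin is widest, red where it is narrowest), the segment colours can be mapped through an explicit gradient. This is a sketch of the fill part only, assuming layer_data() from ggplot2 as a shortcut for the built layer data; vl and p_fill are illustrative names.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = cut, y = carat)) + geom_violin()
mywidth <- 0.35  # same trial-and-error half-width as in the answer above

# Computed violin data: one row per evaluated y position and group
vl <- layer_data(p, 1)
vl$xnew <- vl$x - mywidth * vl$violinwidth
vl$xend <- vl$x + mywidth * vl$violinwidth

# Horizontal segments coloured by relative width: red = narrow, blue = wide
p_fill <- ggplot(vl) +
  geom_segment(aes(x = xnew, xend = xend, y = y, yend = y,
                   color = violinwidth)) +
  scale_color_gradient(low = "red", high = "blue")
# print(p_fill) to draw the plot
```

The x axis of this plot is numeric (1 to 5), because the built data stores the cut categories as positions.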
