Plotly and stat_summary produce "NA" in tooltip - r

I'd like to use stat_summary to show mean values of my data on a ggplot2 graph, and use plotly to show the data and mean values in the tooltip. I'm using the dev version of ggplot2 (ggplot2_2.2.1.9000) and I just reinstalled plotly (plotly_4.7.1).
stat_smooth works - you can see I'm hovering over the smoothed line and I get values in the tooltip.
library(plotly)
library(ggplot2)
p <- iris %>%
ggplot(aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point()
ggplotly(p + stat_smooth(se = FALSE))
However, stat_summary doesn't - and that's what I need for my "real-world" problem :). Note how Sepal.Width is NA. I could have sworn this worked for me last year.
ggplotly(p + stat_summary(fun.y = "mean", geom="line", size = 1))


ggplot add Normal Distribution while using `facet_wrap` [duplicate]

I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species)
To each histogram, I would like to add a Normal distribution curve that uses that species' mean and standard deviation.
Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species) +
stat_function(fun = dnorm)
But this only gives me a thin line at the bottom:
Any ideas?
Thanks!
Edit
I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which gives me this:
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.
A while ago I more or less automated drawing theoretical densities with a function in the ggh4x package, which I wrote and which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
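The binwidth scaling described above can be checked numerically. This is a minimal base-R sketch, not part of the original answer: it uses simulated data as a stand-in for one species, and the bin centres are hypothetical values chosen to cover the data.

```r
# Sketch: expected count per bin = normal density * n * binwidth.
set.seed(1)
x <- rnorm(500, mean = 40, sd = 5)    # simulated stand-in for one species
binwidth <- 2
mids <- seq(20, 60, by = binwidth)    # hypothetical bin centres covering the data

# Scale the fitted normal density to the count scale of the histogram
expected <- dnorm(mids, mean = mean(x), sd = sd(x)) * length(x) * binwidth

# The expected counts should add up to roughly the number of observations
sum(expected)
length(x)
```

If the factor of binwidth is left out, the curve ends up too low by that factor for bins wider than one unit, which is why the second plot multiplies after_stat(count) by 2.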
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)
While the ggh4x package is the way to go in this case, a more generalizable approach is with tapply and the use of the PANEL variable which is added to the data when a facet is applied.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density)), bins = 30) +
facet_wrap(~species) +
geom_line(aes(y = dnorm(bill_length_mm,
mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
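For comparison, the approach the question mentions (computing each group's mean and SD up front) can be sketched as follows. This is only an illustration: it uses the built-in iris data as a stand-in for penguins, and the curves data frame and its density column are names made up for this example.

```r
library(ggplot2)

# One normal curve per species, evaluated on a shared grid of x values.
grid <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 200)
curves <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  data.frame(
    Species      = d$Species[1],
    Sepal.Length = grid,
    density      = dnorm(grid, mean(d$Sepal.Length), sd(d$Sepal.Length))
  )
}))

# The Species column in `curves` routes each curve to its own facet.
p <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  geom_line(data = curves, aes(y = density)) +
  facet_wrap(~Species)
p
```

This is longer than the tapply version, but the curves are drawn on a regular grid rather than only at the observed data values.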

Overlay density plot to each existing facet wrapped density plot in ggplot2?

I have a dataframe with ~37000 rows that contains 'name' in string format and 'UTCDateTime' in posixct format and am using it to produce a facet wrapped density plot of time grouped by the names:
I also have a separate density plot of posixct datetime data from an entirely different dataframe:
I want to overlay this second density plot on each individual facet_wrapped plot in the first density plot. Is there a way to do that? In general, if I have plots of any kind that are facet wrapped and another plot of the same type but different data that I want to overlay on each facet of the facet wrap, how do I do so?
This should in theory be as simple as not having the column that you're facetting by in the second dataframe. Example below:
library(ggplot2)
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions)) +
facet_wrap(~ Species)
Created on 2020-08-12 by the reprex package (v0.3.0)
EDIT: To put the densities for the two data sets on the same scale, you can map y to a computed variable with after_stat()*:
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(y = after_stat(scaled),
fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions,
y = after_stat(scaled))) +
facet_wrap(~ Species)
* In versions of ggplot2 before v3.3.0, use stat(variable) or ..variable.. instead.
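As a rough illustration of what the scaled variable represents (ggplot2 documents it as the density estimate scaled to a maximum of 1), the same rescaling can be sketched in base R. The variable names here are only for this example.

```r
# Kernel density estimate for the overlay data
d <- density(faithful$eruptions)

# Rescale so the peak of the curve is exactly 1
scaled <- d$y / max(d$y)

range(scaled)
```

Because every curve is rescaled to peak at 1, the two data sets become visually comparable, but the y-axis no longer shows a true density.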

How to specify ggplot2 boxplot fill colour for continuous data?

I want to plot a ggplot2 boxplot using all columns of a data.frame, and I want to reorder the columns by the median for each column, rotate the x-axis labels, and fill each box with the colour corresponding to the same median. I can't figure out how to do the last part. There are plenty of examples where the fill colour corresponds to a factor variable, but I haven't seen a clear example of using a continuous variable to control fill colour. (The reason I'm trying to do this is that the resultant plot will provide context for a force-directed network graph with nodes that will be colour-coded in the same way as the boxplot -- the colour will then provide a mapping between the two plots.) It would be nice if I could re-use the value-to-colour mapping for later plots so that colours are consistent between plots. So, for example, the box corresponding to the column variable with a high median value will have a colour that denotes this mapping and matches perfectly the colour for the same column variable in other plots (such as the corresponding node in a force-directed network graph).
So far, I have something like this:
# Melt the data.frame:
DT.m <- melt(results, id.vars = NULL) # using reshape2
# I can now make a boxplot for every column in the data.frame:
g <- ggplot(DT.m, aes(x = reorder(variable, value, FUN=median), y = value)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
stat_summary(fun.y=mean, colour="darkred", geom="point") +
geom_boxplot(???, alpha=0.5)
The colour fill information is what I'm stuck on. "value" is a continuous variable in the range [0,1] and there are 55 columns in my data.frame. Various approaches I've tried seem to result in the boxes being split vertically down the middle, and I haven't got any further. Any ideas?
You can do this by adding the median-by-group to your data frame and then mapping the new median variable to the fill aesthetic. Here's an example with the built-in mtcars data frame. By using this same mapping across different plots, you should get the same colors:
library(ggplot2)
library(dplyr)
ggplot(mtcars %>% group_by(carb) %>%
mutate(medMPG = median(mpg)),
aes(x = reorder(carb, mpg, FUN=median), y = mpg)) +
geom_boxplot(aes(fill=medMPG)) +
stat_summary(fun.y=mean, colour="darkred", geom="point") +
scale_fill_gradient(low=hcl(15,100,75), high=hcl(195,100,75))
If you have various data frames with different ranges of medians, you can still use the method above, but to get a consistent mapping of color to median across all your plots, you'll need to also set the same limits for scale_fill_gradient in each plot. In this example, the median of mpg (by carb grouping) varies from 15.0 to 22.8. But let's say across all my data sets, it varies from 13.3 to 39.8. Then I could add this to all my plots:
scale_fill_gradient(limits=c(13.3, 39.8),
low=hcl(15,100,75), high=hcl(195,100,75))
This is just for illustration. For ease of maintenance if your data might change, you'll want to set the actual limits programmatically.
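One way to set the limits programmatically is sketched below. It assumes all the data frames to be plotted are available in a list; the datasets list, the group_medians helper, and the rescaled copy of mtcars are made up for this example.

```r
library(ggplot2)

# Hypothetical collection of data frames, each of which gets its own plot
datasets <- list(
  a = mtcars,
  b = transform(mtcars, mpg = mpg * 1.5)   # artificial second data set
)

# Median of mpg within each carb group, for one data frame
group_medians <- function(d) tapply(d$mpg, d$carb, median)

# Overall range of the group medians across all data sets
fill_limits <- range(unlist(lapply(datasets, group_medians)))

# Reusable scale: add this same object to every plot
shared_fill <- scale_fill_gradient(limits = fill_limits,
                                   low = hcl(15, 100, 75),
                                   high = hcl(195, 100, 75))
```

Adding shared_fill to each plot keeps the colour for a given median the same across plots, and the limits update automatically when the data change.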
I built on eipi10's solution and obtained the following code which does what I want:
# "results" is a 55-column data.frame containing
# bootstrapped estimates of the Gini impurity for each column variable
# (But can synthesize fake data for testing with a bunch of rnorms)
DT.m <- melt(results, id.vars = NULL) # using reshape2
g <- ggplot(DT.m %>% group_by(variable) %>%
mutate(median.gini = median(value)),
aes(x = reorder(variable, value, FUN=median), y = value)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_boxplot(aes(fill=median.gini)) +
stat_summary(fun.y=mean, colour="darkred", geom="point") +
scale_fill_gradientn(colours = heat.colors(9)) +
ylab("Gini impurity") +
xlab("Feature") +
guides(fill=guide_colourbar(title="Median\nGini\nimpurity"))
plot(g)
Later, for the second plot:
medians <- lapply(results, median)
color <- colorRampPalette(colors =
heat.colors(9))(1000)[cut(unlist(medians),1000,labels = F)]
color is then a character vector containing the colours of the nodes in my subsequent network graph, and these colours match those in the boxplot. Job done!

How do I color by factors of a categorical variable for faceted barplots?

My question relates to plots in ggplot. Running the code below each image should work if you load the "diamonds" dataset that comes with ggplot2.
I am trying to generate a graph like this:
library(ggplot2)
#First plot
p1 <- ggplot(diamonds, aes(color)) + geom_bar(aes(group = cut, y = ..density..))
p1 <- p1 + facet_wrap(~cut)
p1
but I want to color each bar in each facet by factor, like in this plot:
#Second plot
p2 <- ggplot(diamonds, aes(color)) + geom_bar(aes( y = ..density.., fill = color))
p2 <- p2 + facet_wrap(~cut)
p2
The problem is that "group =" and "fill=" appear to interfere with each other when I attempt to call them both; ggplot seems to ignore the "fill" command when "group" is also called.
The call to group is important because it forces the y-axis to scale for each facet, so that densities within each facet add up to 1. However, I'd like to be able to visually distinguish between groups easily using fill colors.
How can I work around this?
The problem is with the ..density.. variable. It is often a convenient shortcut, but in a more complicated situation like this one it's usually easier to calculate the values yourself:
library(dplyr)
diam2 <- diamonds %>% group_by(cut) %>%
mutate(ncut = n()) %>%
group_by(cut, color) %>%
summarize(den = n() / first(ncut))
ggplot(diam2, aes(x = color, fill = color, y = den)) +
geom_bar(stat = "identity") +
facet_wrap(~ cut)
I should add, comparing my plot with your p1, the shapes are the same but the scale looks a little different (mine being a little lower overall). I'm not sure why.
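As a sanity check on the manual calculation, the within-cut proportions can also be computed in base R. This is a small sketch rather than part of the original answer; it relies only on the diamonds data that ships with ggplot2.

```r
library(ggplot2)  # only needed for the diamonds data set

# Proportion of each color within each cut (rows = cut, columns = color)
den <- prop.table(table(diamonds$cut, diamonds$color), margin = 1)

# Each row should sum to 1, i.e. the bars in each facet add up to 1
rowSums(den)
```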

How do I create a categorical scatterplot in R like boxplots?

Does anyone know how to create a scatterplot in R like the ones GraphPad Prism produces:
I tried using boxplots, but they don't display the data the way I want. The column scatterplots that GraphPad can generate show the data better for me.
Any suggestions would be appreciated.
As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
g = g + stat_summary(fun.y = function(i) mean(i),
geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds and adjust the bar width:
g = g + stat_summary(
# 95% t-interval for the mean: mean +/- t(0.975, n - 1) * sd / sqrt(n)
fun.ymax=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
fun.ymin=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
geom="errorbar", width=0.2)
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
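The separate-data-frame alternative might look like the sketch below. It assumes the error bars are meant to be a 95% t-interval for the mean; the summ data frame and its column names (avg, lower, upper) are made up for this example.

```r
library(ggplot2)

set.seed(42)
dd <- data.frame(values = runif(21),
                 type = rep(c("Control", "Treated", "Treated + A"), 7))

# Per-group mean and 95% t-interval (assumed to be the intended bounds)
summ <- do.call(rbind, lapply(split(dd, dd$type), function(d) {
  n <- nrow(d)
  half <- qt(0.975, n - 1) * sd(d$values) / sqrt(n)
  data.frame(type = d$type[1],
             avg = mean(d$values),
             lower = mean(d$values) - half,
             upper = mean(d$values) + half)
}))

g <- ggplot(dd, aes(type, values)) +
  # Bars and error bars come from the precomputed summary
  geom_col(data = summ, aes(y = avg), fill = "white", colour = "black") +
  geom_errorbar(data = summ, aes(x = type, ymin = lower, ymax = upper),
                width = 0.2, inherit.aes = FALSE) +
  # Raw points on top, jittered horizontally only
  geom_jitter(aes(shape = type), width = 0.1, height = 0)
g
```

Precomputing the summary makes the numbers easy to inspect or reuse in a table, at the cost of a few extra lines compared with stat_summary.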
To use base R, have a look at my answer to this question.
If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since geom_dotplot() does not let you change the dots to other symbols, I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1
