How to add mean to grouped bwplot Lattice R

I have a grouped boxplot that shows for each category two boxes side by side (see code). Now I am interested in adding the mean for each category and box separately. I can calculate and visualize the mean for each category but not conditioned on the grouped variable "year". I tried to calculate the means for each year individually and add them separately, but that did not work.
data(mpg, package = "ggplot2")
library(latticeExtra)
tmp <- tapply(mpg$hwy, mpg$class, FUN =mean)
bwplot(class~hwy, data = mpg, groups = year,
box.width = 1/3,
panel = panel.superpose,
panel.groups = function(x, y,..., group.number) {
panel.bwplot(x,y + (group.number-1.5)/3,...)
panel.points(tmp, seq(tmp),...)
}
)
Which produces the following plot:
The example is based on: Grouped horizontal boxplot with bwplot
Can someone show how to do this, if possible using lattice graphics? All the plots in my master's thesis are based on it.

As a last resort, you could use ggplot2. In the code below, the red points mark the means:
library(ggplot2)
library(dplyr)
#Data
data(mpg, package = "ggplot2")
#Compute summary for points
Avg <- mpg %>% group_by(class,year) %>%
summarise(Avg=mean(hwy))
#Plot
ggplot(data = mpg, aes(x = class, y = hwy, fill = factor(year))) +
geom_boxplot(alpha=.25) +
geom_point(data=Avg,aes(x = class, y = Avg,color=factor(year)),
position=position_dodge(width=0.75),show.legend = FALSE)+ # 0.75 matches the default dodge width of geom_boxplot
scale_color_manual(values = c('red','red'))+
coord_flip()+
labs(fill='Year')+
theme_bw()
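Since the question asks for lattice specifically, here is a minimal sketch of one way to do it there. It is not a tested drop-in for the thesis plots. It assumes that panel.superpose hands each group's y values to panel.groups as numeric level codes, which the question's own `y + (group.number-1.5)/3` already relies on. The helper name group_means is made up for illustration.

```r
library(lattice)
data(mpg, package = "ggplot2")

# Hypothetical helper: mean of x for each level code in y.
# Returns a named vector; the names are the level codes present in this group.
group_means <- function(x, y) tapply(x, y, mean)

p <- bwplot(class ~ hwy, data = mpg, groups = year,
            box.width = 1/3,
            panel = panel.superpose,
            panel.groups = function(x, y, ..., group.number) {
              # Same vertical offset that is applied to this group's boxes
              shift <- (group.number - 1.5) / 3
              panel.bwplot(x, y + shift, ...)
              # Means computed from this group's data only (one year)
              m <- group_means(x, y)
              panel.points(as.numeric(m), as.numeric(names(m)) + shift,
                           pch = 16, col = "red")
            })
print(p)
```

The key difference from the attempt in the question is that the means are computed inside panel.groups, where x and y already hold only one year's data, and are shifted by the same offset as the boxes.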

Related

ggplotly: unable to add a frame in PCA score plot in ggplot2

I would like to make a PCA score plot using ggplot2, and then convert the plot into interactive plot using plotly.
What I want to do is add a frame around each group (not an ellipse; I know stat_ellipse works).
My problem is that when I try to use sample name as tooltip in ggplotly, the frame will disappear. I don't know how to fix it.
Below is my code
library(ggplot2)
library(plotly)
library(dplyr)
## Demo data
dat <- iris[1:4]
Group <- iris$Species
## Calculate PCA
df_pca <- prcomp(dat, center = TRUE, scale. = FALSE)
df_pcs <- data.frame(df_pca$x, Group = Group)
percentage <-round(df_pca$sdev^2 / sum(df_pca$sdev^2) * 100, 2)
percentage <-paste(colnames(df_pcs),"(", paste(as.character(percentage), "%", ")", sep = ""))
## Visualization
Sample_Name <- rownames(df_pcs)
p <- ggplot(df_pcs, aes(x = PC1, y = PC2, color = Group, label = Sample_Name)) +
xlab(percentage[1]) +
ylab(percentage[2]) +
geom_point(size = 3)
ggplotly(p, tooltip = "label")
Until here it works! You can see that sample names can be properly shown in the ggplotly plot.
Next I tried to add a frame
## add frame
hull_group <- df_pcs %>%
dplyr::mutate(Sample_Name = Sample_Name) %>%
dplyr::group_by(Group) %>%
dplyr::slice(chull(PC1, PC2))
p2 <- p +
ggplot2::geom_polygon(data = hull_group, aes(fill = Group), alpha = 0.1)
You can see that the static plot still works: the frame is properly added.
However, when I tried to convert it to an interactive plotly plot, the frame disappeared.
ggplotly(p2, tooltip = "label")
Thanks a lot for your help.
It works if you move the data and mapping from the ggplot() call to the geom_point() call:
p2 <- ggplot() +
geom_point(data = df_pcs, mapping = aes(x = PC1, y = PC2, color = Group, label = Sample_Name), size = 3) +
ggplot2::geom_polygon(data = hull_group, aes(x = PC1, y = PC2, fill = Group, group = Group), alpha = 0.2)
ggplotly(p2, tooltip = "label")
You might want to change the order of the geom_point and geom_polygon to make sure that the points are on top of the polygon (this also affects the tooltip location).
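As a rough sketch of that reordering (same data preparation as in the question, only condensed; the name p3 is just for illustration), the polygon layer is added first so the points are drawn on top of it:

```r
library(ggplot2)
library(plotly)
library(dplyr)

# PCA scores for the iris demo data, as in the question
pca <- prcomp(iris[1:4], center = TRUE, scale. = FALSE)
df_pcs <- data.frame(pca$x, Group = iris$Species,
                     Sample_Name = rownames(iris))

# Convex hull points per group
hull_group <- df_pcs %>%
  group_by(Group) %>%
  slice(chull(PC1, PC2))

# Polygon first, points second: later layers are drawn on top.
# ggplot2 warns about the unknown `label` aesthetic; it is only there
# so that ggplotly can pick it up for the tooltip.
p3 <- ggplot() +
  geom_polygon(data = hull_group,
               aes(x = PC1, y = PC2, fill = Group, group = Group),
               alpha = 0.2) +
  geom_point(data = df_pcs,
             aes(x = PC1, y = PC2, color = Group, label = Sample_Name),
             size = 3)

ggplotly(p3, tooltip = "label")
```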

Is there a way to add the bin range label into the tooltip for a histogram using ggplotly in R?

library(tidyverse)
library(ggplot2)
library(plotly)
data(mpg)
ggplotly(
mpg %>%
ggplot(aes(x=hwy)) +
geom_histogram(),
tooltip = ("all"))
When you hover over the bar, I'd like for the tooltip to show the start and stop of the bin (e.g. 20-21)
Thanks for the simple plot_ly answer. For other reasons, I'd like to keep ggplot. Here's one possible solution I came up with: it extracts the histogram elements with ggplot_build() and plots them as a bar graph.
ggplotly(
ggplot_build(
mpg %>%
ggplot(aes(x=hwy)) +
geom_histogram()
)$data[[1]] %>%
ggplot(aes(x=factor(x), y = count, text = paste0("range: ",round(xmin, 1), " - ", round(xmax,1)))) +
geom_bar(stat="identity") +
theme(axis.text.x = element_blank()),
tooltip = c("text"))
If using ggplot2 is not mandatory, an easier fix is to build the histogram with plot_ly directly:
plot_ly(x = mpg$hwy, type = "histogram")
I ran into this issue but also needed to label the x-axis with bin ranges, so I built on your answer (which was great!)
I broke it down into three steps: using ggplot to create the first histogram that generates the bin ranges, using ggplot again to create the second histogram that uses those ranges for labels, and then using plotly to make it interactive.
Here's a reprex that should be customizable for other use cases. Once you get the gist you can ditch the intermediate variables and run the whole thing at once with pipes.
library(tidyverse)
library(plotly)
# step 1: create a ggplot histogram, extract the internal data
plot_step1 <- ggplot_build(
mpg %>%
ggplot() +
geom_histogram(aes(x=hwy),
bins = 11 # set histogram parameters here
)
)$data[[1]]
# step 2: create a new plot, using the derived xmin and xmax values from the
# first plot, and set the labels and axes
plot_step2 <- plot_step1 %>% {
ggplot(data = .,
aes(x=factor(x),
y = count,
text = sprintf("Count: %d\nRange (MPG): %.1f-%.1f", y, round(xmin,1), round(xmax,1)))) +
scale_x_discrete(labels = sprintf("%.1f-%.1f", .$xmin, .$xmax)) +
geom_bar(stat="identity",
width = 1) +
labs(title = "Histogram: Highway Miles per Gallon",
x = "MPG",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45 ))
}
# step 3: make this new plot interactive
plotly::ggplotly(plot_step2, tooltip = c("text"))
A solution using library(ggiraph):
library(tidyverse)
library(ggplot2)
library(ggiraph)
p1 <- mpg %>%
ggplot(., aes(x=hwy)) +
geom_histogram_interactive(bins = 20, aes(tooltip = paste0("[",round(..xmin..,2),",",round(..xmax..,2),"] count: ",..count..)))
girafe(ggobj = p1) # girafe() supersedes the older ggiraph() function

Plot a stacked barplot - amended

I have 4 dataframes, which all have a column called Results showing Wins, Draws, Losses. I would like to create a layered histogram as the picture below. Any idea if it is achievable in R?
This is what I was playing with:
ggplot(results, aes(x = Country, y = ??)) +
geom_bar(aes(fill = Performance), stat = "identity")
The problem with this is that I don't know what to set the y axis to. These are supposed to be counts.
Another option I tried which is almost what I want is this:
counts <- table(results$Performance, results$Country)
barplot(counts, main="Game Count per Football Team",
xlab="Football Teams", ylab = "Game Count", col=c("darkblue","red", "Yellow"),
legend = rownames(counts))
However, the y axis stops at 800, even though one of the countries has as many as 908 observations.
Well, I can give you some code that will show you how you could do this. You basically would just want four different geom_bar statements.
To demonstrate, I'll create two different dataframes from the mpg dataset that comes with the ggplot2 package, because you didn't provide any data.
library(tidyverse)
# I'm making two different data frames from the
# 'mpg' dataset, which comes with the ggplot package
mpg$year = as.character(mpg$year)
df1 = filter(mpg, year == "1999")
df2 = filter(mpg, year == "2008")
plot = ggplot() +
geom_bar(data=df1
, aes(x = year, y = hwy, fill = manufacturer)
, stat = "identity") +
geom_bar(data=df2
, aes(x = year, y = hwy, fill = manufacturer)
, stat = "identity")
print(plot)
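If the goal is bars whose height is the number of games, another option is to combine the data frames into one and let geom_bar() do the counting. The sketch below is only an illustration: the `results` data frame is randomly generated stand-in data, and the country names are made up. It also shows one way to extend the y axis of the base barplot from the question so that it covers the tallest stacked bar.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's combined `results` data
set.seed(1)
results <- data.frame(
  Country = sample(c("England", "Spain", "Italy", "Germany"),
                   200, replace = TRUE),
  Performance = sample(c("Win", "Draw", "Loss"), 200, replace = TRUE)
)

# geom_bar() defaults to stat = "count", so no y aesthetic is needed;
# the fill aesthetic stacks the counts per Performance level.
p <- ggplot(results, aes(x = Country, fill = Performance)) +
  geom_bar()
print(p)

# Base graphics version: set ylim from the tallest stacked bar
counts <- table(results$Performance, results$Country)
barplot(counts, main = "Game Count per Football Team",
        xlab = "Football Teams", ylab = "Game Count",
        col = c("darkblue", "red", "yellow"),
        ylim = c(0, max(colSums(counts)) * 1.1),
        legend.text = rownames(counts))
```

With four separate data frames, something like rbind() on the shared columns would produce the single `results` table assumed here.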

ggplot2: Different vlines for each graph using facet_wrap [duplicate]

I've poked around, but have been unable to find an answer. I want to do a weighted geom_bar plot overlaid with a vertical line that shows the overall weighted average per facet. I'm unable to make this happen. The vertical line seems to be a single value applied to all facets.
require('ggplot2')
require('plyr')
# data vectors
panel <- c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
instrument <-c("V1","V2","V1","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1")
cost <- c(1,4,1.5,1,4,4,1,2,1.5,1,2,1.5,2,1.5,1,2)
sensitivity <- c(3,5,2,5,5,1,1,2,3,4,3,2,1,3,1,2)
# put an initial data frame together
mydata <- data.frame(panel, instrument, cost, sensitivity)
# add a "contribution to" vector to the data frame: contribution of each instrument
# to the panel's weighted average sensitivity.
myfunc <- function(cost, sensitivity) {
return(cost*sensitivity/sum(cost))
}
mydata <- ddply(mydata, .(panel), transform, contrib=myfunc(cost, sensitivity))
# two views of each panels weighted average; should be the same numbers either way
ddply(mydata, c("panel"), summarize, wavg=weighted.mean(sensitivity, cost))
ddply(mydata, c("panel"), summarize, wavg2=sum(contrib))
# plot where each panel is getting its overall cost-weighted sensitivity from. Also
# put each panel's weighted average on the plot as a simple vertical line.
#
# PROBLEM! I don't know how to get geom_vline to honor the facet breakdown. It
# seems to be computing it overall the data and showing the resulting
# value identically in each facet plot.
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_bar(binwidth=1) +
geom_vline(xintercept=sum(contrib)) +
facet_wrap(~ panel) +
ylab("contrib")
If you pass in the pre-summarized data, it seems to work:
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_histogram(binwidth=1) + # geom_bar() no longer accepts binwidth in current ggplot2
geom_vline(data = ddply(mydata, "panel", summarize, wavg = sum(contrib)), aes(xintercept=wavg)) +
facet_wrap(~ panel) +
ylab("contrib") +
theme_bw()
An example using dplyr and facet_wrap, in case anyone wants it:
library(dplyr)
library(ggplot2)
df1 <- mutate(iris, Big.Petal = Petal.Length > 4)
df2 <- df1 %>%
group_by(Species, Big.Petal) %>%
summarise(Mean.SL = mean(Sepal.Length))
ggplot() +
geom_histogram(data = df1, aes(x = Sepal.Length, y = ..density..)) +
geom_vline(data = df2, mapping = aes(xintercept = Mean.SL)) +
facet_wrap(Species ~ Big.Petal)
# Alternative: merge the per-panel sums into the data before plotting
vlines <- ddply(mydata, .(panel), summarize, sumc = sum(contrib))
ggplot(merge(mydata, vlines), aes(sensitivity, weight = contrib)) +
geom_histogram(binwidth = 1) + geom_vline(aes(xintercept = sumc)) +
facet_wrap(~panel) + ylab("contrib")
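For anyone who prefers not to load plyr, here is a sketch of the same idea with dplyr only, using the cost and sensitivity vectors from the question (the instrument column is left out because the plot does not use it). The name panel_avg is made up for illustration. The point is that the data passed to geom_vline contains the faceting variable `panel`, so each facet gets its own line.

```r
library(dplyr)
library(ggplot2)

mydata <- data.frame(
  panel = rep(c("A", "B"), times = c(6, 10)),
  cost = c(1, 4, 1.5, 1, 4, 4, 1, 2, 1.5, 1, 2, 1.5, 2, 1.5, 1, 2),
  sensitivity = c(3, 5, 2, 5, 5, 1, 1, 2, 3, 4, 3, 2, 1, 3, 1, 2)
)

# Contribution of each row to its panel's cost-weighted sensitivity
mydata <- mydata %>%
  group_by(panel) %>%
  mutate(contrib = cost * sensitivity / sum(cost)) %>%
  ungroup()

# One row per panel; keeping `panel` lets facet_wrap split the lines
panel_avg <- mydata %>%
  group_by(panel) %>%
  summarise(wavg = sum(contrib))

p <- ggplot(mydata, aes(x = sensitivity, weight = contrib)) +
  geom_histogram(binwidth = 1) +
  geom_vline(data = panel_avg, aes(xintercept = wavg)) +
  facet_wrap(~ panel) +
  ylab("contrib")
print(p)
```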

ggplot2 boxplot medians aren't plotting as expected

So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:
require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)
df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))
d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) +
facet_grid(station~.) +
scale_y_continuous(limits = c(0, 15)) +
theme(legend.position = "none")
d
However, when you dig a little deeper, problems creep in that freak me out. When I labeled the boxplot medians with their values, the following plot results.
df.m <- aggregate(value~year+station, data = df, FUN = function(x) median(x))
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value))
d
The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis values, but the middle lines of the boxes are definitely not at the medians. I've been stumped by this for a few days now.
What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?
The cause of this problem is the use of scale_y_continuous. ggplot2 performs operations in the following order:
1. Scale transformations
2. Statistical computations
3. Coordinate transformations
In this case, because limits are set on the scale, ggplot2 drops the data outside those limits before it computes the boxplot statistics. The medians calculated by the aggregate function and used in geom_text, however, are based on the entire dataset. This is why the median lines of the boxes and the text labels can disagree.
The solution is to omit the scale_y_continuous instruction and instead use:
d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) +
facet_grid(station~.) +
theme(legend.position = "none") +
coord_cartesian(ylim = c(0,15))
This allows ggplot2 to calculate the boxplot statistics from the entire dataset, while limiting only the visible range of the figure.
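The difference is easy to check on a small made-up dataset (the values below are invented for illustration and have nothing to do with the CSV from the question). layer_data() returns the statistics ggplot2 actually computed for a layer, so the median line can be compared directly with median():

```r
library(ggplot2)

# Hypothetical data: ten small values plus five values above the limit of 15
df_demo <- data.frame(g = "a", value = c(1:10, 20, 30, 40, 50, 60))

# Limits on the scale: values above 15 are dropped before the stats are computed
p_scale <- ggplot(df_demo, aes(x = g, y = value)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 15))

# Limits on the coordinate system: stats use all data, only the view is cropped
p_coord <- ggplot(df_demo, aes(x = g, y = value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 15))

# `middle` is the median line of the box
med_scale <- suppressWarnings(layer_data(p_scale)$middle)  # median of 1:10 only
med_coord <- layer_data(p_coord)$middle                    # median of all 15 values
true_med  <- median(df_demo$value)

print(c(scale = med_scale, coord = med_coord, true = true_med))
```

Here the scale-limited plot puts its median line at 5.5, while the coord_cartesian version agrees with median() at 8.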
