I am using ggiraph to make an interactive plot in R. My data is grouped and what I'm hoping to do is plot just the mean value of the group but when I hover over that point in the plot, the other points appear. Hopefully, my example below will explain what I mean.
To begin I create some data and make a basic plot:
library(ggplot2)
library(ggiraph)
# create some data
dat1 <- data.frame(X=rnorm(21),
Y=rnorm(21),
groupID=rep(1,21))
dat2 <- data.frame(X=rnorm(21,5),
Y=rnorm(21,5),
groupID=rep(2,21))
dat3 <- data.frame(X=rnorm(21,10),
Y=rnorm(21,10),
groupID=rep(3,21))
ggdat <- rbind(dat1,dat2,dat3)
ggdat$groupID <- as.factor(ggdat$groupID)
# create a plot
ggplot(ggdat, aes(X,Y)) +
geom_point(aes(color = groupID)) +
theme(legend.position = 'none')
We can see the 3 different groups in the above plot.
Then I find the mean value of each group and plot it. In the example plot below, I also plot all the points with a low alpha value, with the mean points in black.
library(dplyr)
# create mean data frame
dfMean <- ggdat %>%
group_by(groupID) %>%
dplyr::summarize(mX = mean(X), mY = mean(Y))
gg_scatter <- ggplot(dfMean, aes(mX, mY, tooltip = groupID, data_id = groupID)) +
geom_point(data = ggdat, aes(X,Y), alpha = 0.1, color = ggdat$groupID) +
theme(legend.position = 'none') +
geom_point_interactive()
gg_scatter
What I'm hoping to do is when I hover over one of the black points, it changes the alpha value for that group to, say, alpha = 1 and shows all the points for that group.
Naively I just tried:
girafe(ggobj = gg_scatter,
options = list(
opts_hover_inv(css = "opacity:0.5;"),
opts_hover(css = "fill:red;")
) )
but this just highlights the mean point that I'm hovering over and changes the alpha of the other mean points.
Is there a way to hover over the mean value point, which changes the alpha for that particular group?
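I am not sure I have understood the question correctly, but I hope this helps:
In your code, you did not use geom_point_interactive() when plotting the first layer of points, so they cannot be interactive. In the version below, both layers inherit data_id = groupID, so hovering over a mean point also selects the points of its group, and opts_hover_inv() fades everything else.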
library(ggplot2)
library(ggiraph)
# create some data
dat1 <- data.frame(X=rnorm(21),
Y=rnorm(21),
groupID=rep(1,21))
dat2 <- data.frame(X=rnorm(21,5),
Y=rnorm(21,5),
groupID=rep(2,21))
dat3 <- data.frame(X=rnorm(21,10),
Y=rnorm(21,10),
groupID=rep(3,21))
ggdat <- rbind(dat1,dat2,dat3)
ggdat$groupID <- as.factor(ggdat$groupID)
library(dplyr)
# create mean data frame
dfMean <- ggdat %>%
group_by(groupID) %>%
dplyr::summarize(mX = mean(X), mY = mean(Y))
gg_scatter <- ggplot(dfMean, aes(mX, mY, tooltip = groupID, data_id = groupID)) +
geom_point_interactive(data = ggdat, aes(X,Y, color = groupID), alpha = 0.9) +
theme(legend.position = 'none') +
geom_point_interactive()
gg_scatter
girafe(ggobj = gg_scatter,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "fill:red;")
) )
I know how to plot several density curves/polygrams on one plot, but not conditional density plots.
Reproducible example:
require(ggplot2)
# generate data
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
c <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,c)
# plot 1
ggplot(df, aes(a, fill = c)) +
geom_density(position='fill', alpha = 0.5)
# plot 2
ggplot(df, aes(b, fill = c)) +
geom_density(position='fill', alpha = 0.5)
In my real data I have a bunch of these paired conditional density plots and I would need to overlay one over the other to see (and show) how different (or similar) they are. Does anyone know how to do this?
One way would be to plot the two versions as layers. The overlapping areas will be slightly different, depending on the layer order, based on how alpha works in ggplot2. This may or may not be what you want. You might fiddle with the two alphas, or vary the border colors, to distinguish them more.
ggplot(df, aes(fill = c)) +
geom_density(aes(a), position='fill', alpha = 0.5) +
geom_density(aes(b), position='fill', alpha = 0.5)
For example, you might make it so the fill only applies to one layer, but the other layer distinguishes groups using the group aesthetic, and perhaps a different linetype. This one seems more readable to me, especially if there is a natural ordering to the two variables that justifies putting one in the "foreground" and one in the "background."
ggplot(df) +
geom_density(aes(a, group = c), position='fill', alpha = 0.2, linetype = "dashed") +
geom_density(aes(b, fill = c), position='fill', alpha = 0.5)
I'm not so sure that "on top of one another" is a great idea. Jon's ideas are probably the way to go. But what about just plotting side by side? Our brains can cope with that, and we can compare the panels pretty well.
Make it long, then use facet.
Another option might be an animated graph (see 2nd code chunk below).
require(ggplot2)
#> Loading required package: ggplot2
library(tidyverse)
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
#### BAAAAAD idea to call anything "c" in R!!! Don't do this. ever!
d <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,d)
df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
facet_grid(~name)
library(gganimate)
p <- df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
labs(title = "{closest_state}")
p_anim <- p + transition_states(name)
animate(p_anim, duration = 2, fps = 5)
Created on 2022-06-14 by the reprex package (v2.0.1)
Although it is not the overlay you might have thought of, it facilitates the comparison of density curves:
library(tidyverse)
library(ggridges)
library(truncnorm)
DF <- tibble(
alpha = rtruncnorm(n = 200, a = 0, b = 1000, mean = 500, sd = 50),
beta = rtruncnorm(n = 200, a = 0, b = 1000, mean = 550, sd = 50)
)
DF <- DF %>%
pivot_longer(c(alpha, beta), names_to = "name", values_to = "meas") %>%
mutate(name = factor(name))
DF %>%
ggplot(aes(meas, name, fill = factor(after_stat(quantile)))) +
stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = T,
quantiles = 4,
quantile_lines = T
) +
scale_fill_viridis_d(name = "Quartiles")
I wrote the following procedure in R:
Start with a data frame called "giraffe" data
Sample 30% of this data and label it "sample"
Create a histogram for this data, and color the areas of this histogram that were "sampled" as one color, and the other rows another color
Repeat this process 100 times and make an animation of this process
library(ggplot2)
library(dplyr)
library(gganimate)
giraffe_data <- data.frame( a = abs(rnorm(1000,17,10)), b = abs(rnorm(1000,17,10)))
results <- list()
for( i in 1:100)
{
giraffe_data_i <- giraffe_data
a_i <- c("sample", "not_sampled")
aa_i <- as.factor(sample(a_i, 1000, replace=TRUE, prob=c(0.3, 0.7)))
giraffe_data_i$col <- cut(giraffe_data_i$a, c(-Inf, 17, Inf))
giraffe_data_i$sample <- aa_i
giraffe_data_i$iteration <- i + 1
results[[i]] <- giraffe_data_i
}
results
results_df <- do.call(rbind.data.frame, results)
animate(
ggplot(results_df, aes(x=a, fill = col)) +
geom_histogram(binwidth=1) +
scale_fill_manual(breaks = levels(results_df$col), values = c('blue', 'red')) +
transition_states(iteration, state_length = 0.2) +
labs(title = "Group: {closest_state}"),
fps = 25)
But for some reason, this graph does not change colors in the animation.
Can someone please show me how to fix this?
Thanks
Note: I was able to get the colors to change with the following code:
animate(
ggplot(results_df, aes(x=a, color = sample)) +
geom_histogram(fill="white", position="dodge")+
transition_states(iteration, state_length = 0.2) +
labs(title = "Group: {closest_state}"),
fps = 5)
But this shows the two colors as two separate "groups". I want there to be only one "group", but there to be different colors within this one "group". Can someone please show me how to fix this?
Thanks
Sometimes I find it easier to do transformations of the data upstream of gganimate. So here's an approach of binning the data and counting for each iteration, and then plotting as a normal column geom.
library(tidyverse); library(gganimate)
# bins of width 2
bin_wid = 2
results_df_bins <- results_df %>%
# "col" is set at 17 but my bins are at even #s, so to align
# bins with that I offset by 1
mutate(a_bin = floor((a + 1)/ bin_wid)*bin_wid) %>%
count(a_bin, col, sample, iteration) %>%
mutate(sample = fct_rev(sample)) # put "sample" first
animate(
ggplot(results_df_bins, aes(x=a_bin, y = n, fill = sample)) +
geom_col(position = position_stack(reverse = TRUE)) +
transition_states(iteration, state_length = 0.2) +
labs(title = "Group: {closest_state}"),
fps = 25, nframes = 500, height = 300)
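The offset in the binning step above can be checked by hand. This is a minimal sketch in base R, assuming the same bin width of 2 and the cut point of 17 from the question; the helper name bin_of is hypothetical and only wraps the expression used in the mutate() call.

```r
# Hypothetical helper reproducing the binning expression from the answer
bin_wid <- 2
bin_of <- function(a) floor((a + 1) / bin_wid) * bin_wid

# Values just below and exactly at the cut point of 17 land in different bins,
# so no bin straddles the boundary that defines "col"
bins <- bin_of(c(15, 16.9, 17, 18.9, 19))
bins
```

Each bin label is the midpoint of a half-open interval of width 2, for example 16 covers values from 15 up to (but not including) 17.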
I was trying the following code in order to get a graph of boxplots with ggplot2 which are grouped according to different categories:
library(ggplot2)
library(plotly)
category_1 <- rep(LETTERS[1:4], each = 20)
value <- rnorm(length(category_1), mean = 200, sd = 20)
category_2 <- rep(as.factor(c("Good", "Medium", "Bad")), length.out = length(category_1))
category_3 <- rep(as.factor(c("Bright", "Dark")), length.out = length(category_1))
df <- data.frame( category_1, value, category_2, category_3)
p <- ggplot(df, aes(x = category_1, y = value, color = category_2, shape = category_3)) +
geom_boxplot(alpha = 0.5) +
geom_point(position=position_jitterdodge(), alpha=0.7)
p
I'm still too noob in stackoverflow to post images, but this is the result I want.
However, when I try to convert it to plotly using
pp <- ggplotly(p)
pp
the last 2 grouping layers (shape and color) are "ignored" and all the boxplots are plotted on top of each other, only respecting the x-axis grouping specified in aes(x = category_1, ...) as you can see here.
How can I avoid this problem? Thanks for your time.
EDIT
I've tried using plotly syntax directly and I get a similar result using the following code:
pp <- plot_ly(df, x = ~category_1, y = ~value, color = ~category_2,
mode = "markers", symbol = ~category_3, type = "box", boxpoints = "all") %>%
layout(boxmode = "group")
pp
Here is the result. I said similar because plotly forces the dots to be next to, and not on top of, the boxplot, which is not exactly what I wanted.
I guess the question is "solved". Although, I'm still curious if there is an explanation for the problem above. Thanks again!
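I think this will solve your issue. As far as I know, plotly's default boxmode is "overlay", so the converted boxes are drawn on top of each other until boxmode = "group" is set in layout():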
p <- ggplot(df, aes(x = category_1, y = value, color = category_2, shape = category_3)) +
geom_boxplot(alpha = 0.5) +
geom_point(position=position_jitterdodge(), alpha=0.7)
p %>%
ggplotly() %>%
layout(boxmode = "group")
Cheers.
I am trying to reproduce a graph from Stata in R. I have several variables and want to display their mean in each treatment group of which there are two. The Stata graph is as follows:
This coefficient plot is not actually a plot of coefficients, but of the mean values by treatment for each separate variable. The df basically looks something like this:
[image: workable data]
It is difficult to answer your question without reproducible data.
However, this might get what you desire just with mean:
library(dplyr)
library(ggplot2) # provides the mpg data set and ggplot()
mpg %>%
select(manufacturer, cty, trans) %>%
group_by(manufacturer, trans) %>%
summarize(cty_mean = mean(cty)) %>%
ggplot(aes(x=cty_mean, y=reorder(manufacturer, cty_mean), color=trans)) +
geom_point()
If you also wish to include the coefficients or standard errors, you could do so by computing them with an additional function call in summarize().
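As a rough sketch of that idea, the standard error of the mean can be computed next to the mean inside the same summarize() call. The column name cty_se is made up for illustration, and the formula sd / sqrt(n) assumes the usual standard error of a sample mean; groups with a single observation get NA.

```r
library(dplyr)
library(ggplot2) # only needed here for the mpg data set

cty_summary <- mpg %>%
  group_by(manufacturer, trans) %>%
  summarize(
    cty_mean = mean(cty),
    # standard error of the mean; NA when a group has only one row
    cty_se = sd(cty) / sqrt(n()),
    .groups = "drop"
  )

head(cty_summary)
```

The resulting cty_mean and cty_se columns could then feed ymin and ymax (or xmin and xmax) in a range-type geom.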
I figured out geom_pointrange() is probably what you are looking for:
library("ggplot2")
set.seed(111018)
interval1 <- -qnorm((1-0.9)/2)
means_treatment_1 <- rnorm(2)
se_treatment_1 <- abs(rnorm(2)) # abs() keeps the simulated standard errors non-negative
df_treatment_1 <- data.frame("Mean" = means_treatment_1,
"lower" = means_treatment_1 - se_treatment_1*interval1,
"upper" = means_treatment_1 + se_treatment_1*interval1,
"Variable" = c("medicare_spending_dummy",
"job_training_dummy"),
"Treatment" = "a")
means_treatment_2 <- rnorm(2)
se_treatment_2 <- abs(rnorm(2)) # abs() keeps the simulated standard errors non-negative
df_treatment_2 <- data.frame("Mean" = means_treatment_2,
"lower" = means_treatment_2 - se_treatment_2*interval1,
"upper" = means_treatment_2 + se_treatment_2*interval1,
"Variable" = c("medicare_spending_dummy",
"job_training_dummy"),
"Treatment" = "b")
df_tot<-rbind(df_treatment_1, df_treatment_2)
# Plot
ggplot(df_tot, aes(colour = Treatment)) +
geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
geom_pointrange(aes(x = Variable, y = Mean, ymin = lower, ymax = upper ),lwd = 1, position = position_dodge(width = 1/2)) +
coord_flip() +
theme_bw()
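For readers wondering where interval1 comes from: it is the normal quantile for a two-sided 90% interval, about 1.645. The sketch below works through one interval by hand; the mean and standard error are made-up numbers for illustration, not values from the example above.

```r
# z multiplier for a two-sided 90% interval, as in interval1 above
conf_level <- 0.9
z <- -qnorm((1 - conf_level) / 2) # equivalent to qnorm(0.95)

# Made-up mean and standard error, for illustration only
m <- 0.40
se <- 0.10
ci <- c(lower = m - z * se, upper = m + z * se)

round(z, 3)
ci
```

Changing conf_level to 0.95 would give the more familiar multiplier of roughly 1.96.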
I am making a pie chart and want to label it with the value for each slice. I have the information in a data frame but the column in which to look should be defined in the function call.
The code is (decently) long, but I think only one line needs to be changed. I have tried mainsym, as.symbol, as.name, quote, and anything else I could think to throw at it, but to no avail.
Thanks
library(dplyr)
library(ggplot2)
library(gridExtra)
pie_chart <- function(df, main, labels, labels_title=NULL) {
mainsym <- as.symbol(main)
labelssym <- as.symbol(labels)
# convert the data into percentages. add label position and inner label text
df <- df %>%
mutate(perc = .data[[main]] / sum(.data[[main]])) %>% # look the column up by its name; a bare symbol is not evaluated here
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",main)) #NEED HELP HERE! Replace 'main' with something
#debug print statement
print(df)
# reorder the category factor levels to order the legend
df[[labels]] <- factor(df[[labels]], levels = unique(df[[labels]]))
p <- ggplot(data = df, aes_(x = factor(1), y = ~perc, fill = labelssym)) +
# make stacked bar chart with black border
geom_bar(stat = "identity", color = "black", width = 1) +
# add the percents and values to the interior of the chart
geom_text(aes(x = 1.25, y = label_pos, label = inner_label_text), size = 4) +
# convert to polar coordinates
coord_polar(theta = "y",direction=-1)
return(p)
}
set.seed(42)
donations <- data.frame(donation_total=sample(1:1E5,50,replace=TRUE))
donation_size_levels_same <- seq(0,2E6,10E3)
donations$bracket <- cut(donations$donation_total,breaks=donation_size_levels_same,right=FALSE,dig.lab = 50)
donations.by_bracket <- donations %>%
group_by(bracket) %>%
summarize(n=n(),total=sum(donation_total)) %>%
ungroup() %>%
arrange(bracket)
grid.arrange(
pie_chart(df=donations.by_bracket,main="n",labels="bracket",labels_title="Total Amount Donated"),
pie_chart(df=donations.by_bracket,main="total",labels="bracket",labels_title="Total Amount Donated"))
The label placement still needs some adjustment but this seems to address the labelling issue, if you just replace that one line (where you say need help here) as follows:
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",as.character(df[[main]])))
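Another option, sketched here under the assumption of a reasonably recent dplyr, is the .data pronoun, which looks up a column by a name stored in a string. The function add_perc and the toy data frame are hypothetical and only isolate the labelling step from pie_chart().

```r
library(dplyr)

# Hypothetical helper: 'main' is a column name passed as a string
add_perc <- function(df, main) {
  df %>%
    mutate(
      perc = .data[[main]] / sum(.data[[main]]),
      # the label combines the rounded percentage with the raw value
      inner_label_text = paste0(round(perc * 100), "%\n", .data[[main]])
    )
}

toy <- data.frame(bracket = c("a", "b", "c"), n = c(1, 1, 2))
out <- add_perc(toy, "n")
out
```

Because .data[[main]] is resolved against the data at each step of the pipeline, it does not depend on the df object that exists outside the pipe.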