Overlay two density plots in plotly - r

I have a dataset which contains two columns.
Each row is a user, with frequency (1 to 31, how often the user plays the game each month) and is_consumed (0 or 1, whether the user has ever consumed in the game).
I want to draw two density plots for frequency separated by the value of is_consumed.
I finished it in ggplot2 but I want to use plotly.
ggplot2 code:
p2 <- p_plot %>%
ggplot(aes(frequency, fill = is_consumed)) +
geom_density(alpha = 0.5)
p2
Output
Red is the density plot of is_consumed == 1. Green is is_consumed == 0

There's a cool function in the plotly library called ggplotly() that converts a ggplot object into a plotly object: https://plot.ly/ggplot2/geom_density/
So you could do:
library(plotly)
library(ggplot2)
p_plot <- data.frame(frequency = c(rnorm(31, 1), rnorm(31)),
is_consumed = factor(round(runif(62))))
p2 <- p_plot %>%
ggplot(aes(frequency, fill = is_consumed)) +
geom_density(alpha = 0.5)
ggplotly(p2)
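If you would rather build the figure with plotly directly instead of converting a ggplot object, one possible approach is to estimate each density yourself with base R's density() and draw the two curves as filled line traces. This is only a sketch on simulated data: the toy p_plot, the seed, and the trace names are assumptions, and density() uses its default bandwidth, so the curves need not match geom_density() exactly.

```r
library(plotly)

set.seed(1)  # reproducible toy data; the real p_plot is not shown in the question
p_plot <- data.frame(
  frequency   = c(rnorm(31, 1), rnorm(31)),
  is_consumed = factor(rep(c(0, 1), each = 31))
)

# Kernel density estimate per group (base R, default bandwidth)
d0 <- density(p_plot$frequency[p_plot$is_consumed == "0"])
d1 <- density(p_plot$frequency[p_plot$is_consumed == "1"])

# One filled line trace per group; overlapping fills are semi-transparent
fig <- plot_ly(x = d0$x, y = d0$y, type = "scatter", mode = "lines",
               fill = "tozeroy", name = "is_consumed = 0") %>%
  add_trace(x = d1$x, y = d1$y, type = "scatter", mode = "lines",
            fill = "tozeroy", name = "is_consumed = 1") %>%
  layout(xaxis = list(title = "frequency"),
         yaxis = list(title = "density"))
```

Printing fig displays the interactive plot.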

Related

How to make a circled bubble plot using ggplot2 coord_polar()?

I have some example data with no x- or y-axis information. I would like to make a bubble plot with the R package ggplot2 and arrange the bubbles in a circle.
data <- data.frame(group = paste("Group", letters[1:11]),
value = sample(seq(1,100),11))
Thanks a lot.
You can just put a dummy value for y and make group your x values in aes.
ggplot(data, aes(x = group, y = 0, size = value)) +
coord_polar() +
geom_point()

Use two colour scales possible (with work around)?

I'm trying to plot insect counts of 2 species in 18 experimental plots on a single graph. Since the second species population peaks later, it is visually doable (see picture below). I would like the 18 population lines from species 1 to be green (using "Greens" from RColorBrewer) and the 18 of species 2 to be red (using "Reds"). I do realize this may be problematic for a colourblind audience, but that is irrelevant here.
I've read here that it is not possible with standard ggplot2 options: R ggplot two color palette on the same plot, but this post is more than two years old.
There is a sort of "cheat" for points: Using two scale colour gradients ggplot2, but since I prefer lines to show the population through time, I can't use it.
Are there any new "cheats" available for this?
Or does anyone have another idea to visualize my data in a way that shows population trends through time in all plots and shows the difference in timing of the peak? I've included a picture at the bottom that shows my real data, all in the same colour scale though.
Sample code
library(ggplot2)
library(RColorBrewer)
# example data frame
plot <- as.factor(rep(c("A","B","C"),each=5))
time <- as.numeric(rep(c(1:5),times=3))
S1 <- c(1,4,7,5,2, 2,8,9,3,1, 1,6,6,3,1)
S2 <- c(0,0,2,3,2, 1,2,1,5,3, 0,1,1,6,7)
df <- data.frame(time, plot, S1, S2)
# example colour scales
S1Colours <- colorRampPalette(brewer.pal(9,"Greens"))(3)
S2Colours <- colorRampPalette(brewer.pal(9,"Reds"))(3)
names(S1Colours) <- levels(df$plot)
names(S2Colours) <- levels(df$plot)
# example plot
ggplot(data=df) +
geom_line(aes(x=time, y=S1, colour=plot)) +
geom_line(aes(x=time, y=S2, colour=plot)) +
scale_colour_manual(name = "plot", values = S1Colours) +
scale_colour_manual(name = "plot", values = S2Colours)
# this gives the note "Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale."
Plot real data
I would also go with a manual colour scale that covers every combination of plot and species.
library(tidyverse)
library(RColorBrewer)
df_long=pivot_longer(df,cols=c(S1,S2),names_to = "Species",values_to = "counts") %>% # create long format and
mutate(plot_Species=paste(plot,Species,sep="_")) # make identifiers for combined plot and Species
#make color palette
mycolors=c(colorRampPalette(brewer.pal(9,"Greens"))(sum(grepl("S1",unique(df_long$plot_Species)))),
colorRampPalette(brewer.pal(9,"Reds"))(sum(grepl("S2",unique(df_long$plot_Species)))))
names(mycolors)=c(grep("S1",unique(df_long$plot_Species),value = T),
grep("S2",unique(df_long$plot_Species),value = T))
# example plot
ggplot(data=df_long) +
geom_line(aes(x=time, y=counts, colour=plot_Species)) +
scale_colour_manual(name = "Species by plot", values = mycolors)
You can do this easily with the ggnewscale package (disclaimer: I'm the author).
This is how you would do it:
library(RColorBrewer)
library(ggplot2)
library(ggnewscale)
plot <- as.factor(rep(c("A","B","C"),each=5))
time <- as.numeric(rep(c(1:5),times=3))
S1 <- c(1,4,7,5,2, 2,8,9,3,1, 1,6,6,3,1)
S2 <- c(0,0,2,3,2, 1,2,1,5,3, 0,1,1,6,7)
df <- data.frame(time, plot, S1, S2)
# example colour scales
S1Colours <- colorRampPalette(brewer.pal(9,"Greens"))(3)
S2Colours <- colorRampPalette(brewer.pal(9,"Reds"))(3)
names(S1Colours) <- levels(df$plot)
names(S2Colours) <- levels(df$plot)
ggplot(data=df) +
geom_line(aes(x=time, y=S1, colour=plot)) +
scale_colour_manual(name = "plot 1", values = S1Colours) +
new_scale_color() +
geom_line(aes(x=time, y=S2, colour=plot)) +
scale_colour_manual(name = "plot 2", values = S2Colours)
Created on 2019-12-19 by the reprex package (v0.3.0)

R: overlying trajectory plot and scatter plot

I'm working with ggplot2 and trajectory plots, which are like scatter plots but with lines that connect the points according to a specific rule.
My goal is to overlay a trajectory plot with a scatter plot, and each of them has different data.
First of all, the data:
# first dataset
ideal <- data.frame(ideal=c('a','b')
,x_i=c(0.3,0.8)
,y_i=c(0.11, 0.23))
# second dataset
calculated <- data.frame(calc = c("alpha","alpha","alpha")
,time = c(1,2,3)
,x_c = c(0.1,0.9,0.3)
,y_c = c(0.01,0.26,0.17)
)
Creating a scatter plot of the calculated data is easy:
library(ggplot2)
ggplot(calculated, aes(x=x_c, y=y_c)) + geom_point()
After that, I created the trajectory plot, using this helpful link:
library(grid)
library(data.table)
qplot(x_c, y_c, data = calculated, color = calc, group = calc)+
geom_path (linetype=1, size=0.5, arrow=arrow(angle=15, type="closed"))+
geom_point (data = calculated, colour = "red")+
geom_point (shape=19, size=5, fill="black")
With this result:
How can I overlay the ideal data to this trajectory plot (without trajectory of course, they should be only points)?
Thanks in advance!
qplot isn't usually recommended. Here's how you could plot the two dataframes. However, ggplot might work better for you if the dataframes were merged, so that you had an x and a y column plus an additional method column containing either calculated or ideal.
library(ggplot2)
ideal <- data.frame(ideal=c('a','b')
,x_i=c(0.3,0.8)
,y_i=c(0.11, 0.23)
)
# second dataset
calculated <- data.frame(calc = c("alpha","alpha","alpha")
,time = c(1,2,3)
,x_c = c(0.1,0.9,0.3)
,y_c = c(0.01,0.26,0.17)
)
ggplot(aes(x_c, y_c, color = "calculated"), data = calculated) +
geom_point( size = 5) +
geom_path (linetype=1, size=0.5, arrow = arrow(angle=15, type="closed"))+
geom_point(aes(x_i, y_i, color = "ideal"), data = ideal, size = 5) +
labs(x = "x", y = "y", color = "method")
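The merged layout mentioned above could look roughly like this. It is a sketch rather than the only way to do it: the names combined, x, y and method are made up for illustration, and the path layer is restricted to the calculated rows so that the ideal points stay unconnected.

```r
library(ggplot2)

ideal <- data.frame(ideal = c("a", "b"),
                    x_i = c(0.3, 0.8),
                    y_i = c(0.11, 0.23))
calculated <- data.frame(calc = c("alpha", "alpha", "alpha"),
                         time = c(1, 2, 3),
                         x_c = c(0.1, 0.9, 0.3),
                         y_c = c(0.01, 0.26, 0.17))

# Stack both datasets into one long data frame with shared x/y columns
combined <- rbind(
  data.frame(x = calculated$x_c, y = calculated$y_c, method = "calculated"),
  data.frame(x = ideal$x_i, y = ideal$y_i, method = "ideal")
)

p <- ggplot(combined, aes(x, y, color = method)) +
  geom_point(size = 5) +
  # draw the trajectory only through the calculated points, in row order
  geom_path(data = subset(combined, method == "calculated"),
            arrow = arrow(angle = 15, type = "closed"))
```

With a single data frame, the legend comes from the method column, so no hard-coded color strings are needed inside aes().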

How to plot histograms of raw data on the margins of a plot of interpolated data

I would like to show interpolated data and a histogram of the raw data for each predictor in the same plot. In other threads, like this one, people explain how to add marginal histograms of the same data that is shown in a scatter plot. In this case, however, the histogram is based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kind of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side. I hereby attempted to count them towards the lower and upper end of the plotted range. I also attempted to plot the sample size in numbers somewhere on the plot. However, I failed with both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it in the far right panel but it was difficult (/impossible for me) to adjust. I tried to put it on the main graph (which would anyway probably not be the best solution), but it didn’t work either (it removed the histogram and legend, on the right side and it did not plot all geom_texts). I also tried to add a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend) }
thm = list(theme_void(),
guides(fill="none"), # "none" replaces the deprecated fill=FALSE
theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
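For the follow-up in the question's EDIT (flagging values outside the plotted range and reporting sample sizes), here is one possible sketch for the carat margin. It is not part of the layout above: the range c(1, 4) comes from the question's own attempt, the names n_by_range and p_hist are made up, and putting the counts in a plot caption is just one assumed way to show them.

```r
library(ggplot2)

xrng <- c(1, 4)  # plotted carat range, as in the question's attempt

# transform() evaluates both expressions against the original columns,
# so carat_range is derived from the unclamped carat values.
plot_data <- transform(
  diamonds,
  carat_range = factor(ifelse(carat < xrng[1] | carat > xrng[2],
                              "outside", "within")),
  carat = pmin(pmax(carat, xrng[1]), xrng[2])  # clamp to the plotted range
)

# Sample sizes as plain numbers
n_by_range <- table(plot_data$carat_range)

p_hist <- ggplot(plot_data, aes(carat, fill = carat_range)) +
  geom_histogram(binwidth = 0.1, boundary = xrng[1]) +
  labs(caption = paste0("N within = ", n_by_range[["within"]],
                        ", N outside = ", n_by_range[["outside"]]))
```

The clamped values pile up in the first and last bins, where the fill colour marks them as outside. Note that thm as defined above turns the fill legend off, so the colours would need a legend or a note elsewhere if this plot were combined with thm.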

ggplot only printing gray boxes to file

So SO nailed getting my graph to work, but now I can't get it to print! The end goal is that I need to automate the updating of these plots, so the ggplot and print calls need to be in a function. When I run this code, each file just contains a gray square.
toyfn <- function(plotdata){
library(ggplot2)
plotS1 <- ggplot(plotdata)
plotS1 + geom_bar(aes(x=year,y=value,factor=variable,fill=variable,
order=-as.numeric(variable)), stat="identity") +
geom_line(data=linedata, aes(x=year,y=production))
ggsave('testprint.png',plotS1)
png(filename='testprint2.png')
print(plotS1)
dev.off()
}
library(ggplot2)
library(reshape)
# First let's make a toy dataset for our stacked plot/line plot example.
year = c(1,2,3,4,5,6)
stocks = c(2,4,3,2,4,3)
exports = stocks*2
domestic = stocks*3
production = c(15,16,15,16,15,16)
# Make 2 df's: alldata is for stacked bar chart, linedata is for plotting a line on top of it.
alldata = data.frame(year,stocks,exports,domestic)
linedata = data.frame(year,production)
# Make alldata 'long' for the stacking
melteddata = melt(alldata,id.vars="year")
toyfn(melteddata)
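You are saving a plot with no geoms: the result of plotS1 + geom_bar(...) + geom_line(...) is never assigned back to plotS1, so plotS1 is still the bare ggplot(plotdata) object. The plot with geoms will display on the screen, but not in the file.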
Try this:
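toyfn <- function(plotdata){
# Assign the complete plot, geoms included, to plotS1 before saving it.
plotS1 <- ggplot(plotdata, aes(year, value, factor = variable, fill = variable)) +
geom_bar(stat="identity", aes(order = -as.numeric(variable))) +
# linedata comes from the calling environment and has no `variable` column,
# so this layer should not inherit the plot-level aesthetics.
geom_line(data=linedata, aes(x=year,y=production), inherit.aes = FALSE)
ggsave('testprint.png', plot = plotS1)
}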
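To see why the original function wrote an empty plot, it may help to count the layers on each object. The following is a small sketch with made-up toy data (toy, base_plot and with_bars are illustrative names, and geom_col() stands in for geom_bar(stat="identity")). It writes to a temporary file rather than the working directory.

```r
library(ggplot2)

# Toy data with the same long-format columns as the question
toy <- data.frame(year = rep(1:3, times = 2),
                  variable = rep(c("stocks", "exports"), each = 3),
                  value = c(2, 4, 3, 4, 8, 6))

base_plot <- ggplot(toy, aes(year, value, fill = variable))

# `+` builds a new plot object and leaves base_plot untouched
with_bars <- base_plot + geom_col()

n_layers_base <- length(base_plot$layers)  # 0: nothing to draw
n_layers_bars <- length(with_bars$layers)  # 1: the bar layer

# Save the object that actually carries the layers
out_file <- file.path(tempdir(), "testprint.png")
ggsave(out_file, plot = with_bars, width = 5, height = 4)
```

The same applies to the png() / print() / dev.off() route from the question: print(with_bars) would draw the bars, while print(base_plot) gives only the gray panel.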
