This question already has answers here (closed as a duplicate): Align multiple plots in ggplot2 when some have legends and others don't
I'm trying to use ggplot to draw a graph comparing the absolute values of two variables, and also to show the ratio between them. Since the ratio is unitless and the values are not, I can't show them on the same y-axis, so I'd like to stack them vertically as two separate graphs with aligned x-axes.
Here's what I've got so far:
library(ggplot2)
library(dplyr)
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot absolute values
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control"))
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Arrange the two plots above each other
grid.arrange(plot_values, plot_ratios, ncol=1, nrow=2)
The big problem is that the legend on the right of the first plot makes its plot area narrower than the second plot's, so the x-axes don't line up. A minor problem is that I'd rather not show the x-axis name and tick marks on the top plot, to avoid clutter and make it clear that both plots share the same axis.
I've looked at this question and its answers:
Align plot areas in ggplot
Unfortunately, neither answer there works well for me. Faceting doesn't seem like a good fit, since I want completely different y scales for my two graphs. Manipulating the dimensions returned by ggplot_gtable seems more promising, but I don't know how to get around the fact that the two graphs have a different number of cells. Naively copying that code doesn't seem to change the resulting graph dimensions in my case.
Here's another similar question:
The perils of aligning plots in ggplot
The question itself seems to suggest a good option, but rbind.gtable complains if the tables have different numbers of columns, which is the case here due to the legend. Perhaps there's a way to slot in an extra empty column in the second table? Or a way to suppress the legend in the first graph and then re-add it to the combined graph?
Here's a solution that doesn't require explicit use of grid graphics. It uses facets, and hides the legend entry for "ratio" (using a technique from https://stackoverflow.com/a/21802022).
library(reshape2)
results_long <- melt(results, id.vars="index")
results_long$facet <- ifelse(results_long$variable=="ratio", "ratio", "values")
results_long$facet <- factor(results_long$facet, levels=c("values", "ratio"))
ggplot(results_long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"),
values=c("#1B9E77", "#D95F02", "#7570B3")) +
theme(legend.justification=c(0,1), legend.position=c(0,1)) +
guides(colour=guide_legend(title=NULL)) +
theme(axis.title.y = element_blank())
Try this:
library(ggplot2)
library(gtable)
library(gridExtra)
library(grid)  # for unit() and unit.pmax()
AlignPlots <- function(...) {
  # NOTE: the legend is assumed to be grobs[[8]] and its width widths[[4]];
  # these positions depend on the ggplot2 version and may need adjusting
  LegendWidth <- function(x) x$grobs[[8]]$grobs[[1]]$widths[[4]]
  plots.grobs <- lapply(list(...), ggplotGrob)
  # Give every plot the same column widths (the widest value of each column)
  max.widths <- do.call(unit.pmax, lapply(plots.grobs, "[[", "widths"))
  plots.grobs.eq.widths <- lapply(plots.grobs, function(x) {
    x$widths <- max.widths
    x
  })
  # Pad the narrower legends so that all legends end up equally wide
  legends.widths <- lapply(plots.grobs, LegendWidth)
  max.legends.width <- do.call(max, legends.widths)
  plots.grobs.eq.widths.aligned <- lapply(plots.grobs.eq.widths, function(x) {
    if (is.gtable(x$grobs[[8]])) {
      x$grobs[[8]] <- gtable_add_cols(x$grobs[[8]],
                                      unit(abs(diff(c(LegendWidth(x),
                                                      max.legends.width))),
                                           "mm"))
    }
    x
  })
  plots.grobs.eq.widths.aligned
}
df <- data.frame(x = c(1:5, 1:5),
y = c(1:5, seq.int(5,1)),
type = factor(c(rep_len("t1", 5), rep_len("t2", 5))))
p1.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p1.2 <- ggplot(df, aes(x = x, y = y, colour = type)) + geom_line()
plots1 <- AlignPlots(p1.1, p1.2)
do.call(grid.arrange, plots1)
p2.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p2.2 <- ggplot(df, aes(x = x, y = y)) + geom_line()
plots2 <- AlignPlots(p2.1, p2.2)
do.call(grid.arrange, plots2)
Produces this:
(Based on several of baptiste's answers.)
Encouraged by baptiste's comment, here's what I did in the end:
library(ggplot2)
library(dplyr)
library(gridExtra)
library(gtable)  # for gtable_add_cols() and gtable_add_rows()
library(grid)    # for unit() and grid.draw()
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
  geom_point()
# Plot absolute values, without the x-axis title, text and tick marks
remove_x_axis <- theme(
  axis.ticks.x = element_blank(),
  axis.text.x = element_blank(),
  axis.title.x = element_blank())
plot_values <- ggplot(results, aes(x=index)) +
  geom_point(aes(y=value, color="value")) +
  geom_point(aes(y=control, color="control")) +
  remove_x_axis
# Arrange the two plots above each other
grob_ratios <- ggplotGrob(plot_ratios)
grob_values <- ggplotGrob(plot_values)
# Add a legend column to the ratio plot so both tables have the same number
# of columns. The legend column index (5) depends on the ggplot2 version;
# check the guide-box entry in grob_values$layout if the result is misaligned.
legend_column <- 5
legend_width <- grob_values$widths[legend_column]
grob_ratios <- gtable_add_cols(grob_ratios, legend_width, legend_column-1)
# Stack the tables; size="first" takes the column widths from grob_values
grob_combined <- rbind(grob_values, grob_ratios, size="first")
# A negative row height pulls the two panels closer together
grob_combined <- gtable_add_rows(
  grob_combined, unit(-1.2,"cm"), pos=nrow(grob_values))
grid.newpage()
grid.draw(grob_combined)
(I later realised I didn't even need to extract the legend width: the size="first" argument to rbind makes the combined table use the first table's column widths, so the width of the added column doesn't matter.)
It feels a bit messy, but it is exactly the layout I was hoping for.
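For comparison, the patchwork package (also used further down this page, and assumed to be installed here) aligns the panel areas of stacked plots itself, so the legend on the top plot should not shift its panel. A minimal sketch under that assumption, reusing the sample data from the question:

```r
library(ggplot2)
library(patchwork)

# Same sample data as in the question
results <- data.frame(index = 1:20)
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5 * results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control

# Top plot: absolute values, with the x-axis title, text and ticks hidden
plot_values <- ggplot(results, aes(x = index)) +
  geom_point(aes(y = value, color = "value")) +
  geom_point(aes(y = control, color = "control")) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

# Bottom plot: the unitless ratio on its own y scale
plot_ratios <- ggplot(results, aes(x = index, y = ratio)) +
  geom_point()

# `/` stacks the plots vertically and lines up their panel areas
combined <- plot_values / plot_ratios
combined
```

Unlike the gtable approach, this does not depend on the column index of the legend, which can change between ggplot2 versions.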
An alternative, and fairly simple, solution is as follows:
# loading needed packages
library(ggplot2)
library(dplyr)
library(tidyr)
# Prepare some sample data
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# reshape into long format
long <- results %>%
gather(variable, value, -index) %>%
mutate(facet = ifelse(variable=="ratio", "ratio", "values"))
long$facet <- factor(long$facet, levels=c("values", "ratio"))
# create the plot & remove facet labels with theme() elements
ggplot(long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"), values=c("green", "red", "blue")) +
theme(axis.title.y=element_blank(), strip.text=element_blank(), strip.background=element_blank())
which gives:
Related
I have barplots, and I would like to run a wilcox.test within each "grp1" comparing each bar to the control for that group, and then put an asterisk above the bar if the difference is significant.
I've seen "compare_means" for getting the comparisons, but I'm trying to make this automated rather than doing it by hand. Would "geom_signif" or "stat_compare_means" do this? Can someone help with this? Thank you very much.
I need the comparison to be made using the full dataset, not just the means (which is only one value per bar). I added a line at the end of the code running one of the comparisons so you can see where I need the p-values from.
library(ggpubr)  # for compare_means()
y <- c(runif(100,0,4.5),runif(100,3,6),runif(100,4,7))
grp1 <- sample(c("A","B","C","D"),size = 300, replace = TRUE)
grp2 <- rep(c("High","Med","Contrl"),each=100)
dataset <- data.frame(y,grp1,grp2)
means <- aggregate(y~grp1+grp2,data=dataset,mean)
sd <- aggregate(y~grp1+grp2,data=dataset,function(x){sd(x)})
means.all <- merge(sd,means,by=c("grp1","grp2"))
names(means.all)[3:4] <- c("sd","y.mean")
library(ggplot2)
p<- ggplot(means.all, aes(x=grp1, y=y.mean, fill=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))
p
compare_means(y~grp2,data = dataset[dataset$grp1=="A",],method="wilcox.test")
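For reference, the per-group p-values themselves can also be computed in base R with `split()` and `wilcox.test()` on the full dataset rather than the means. A sketch, in which `pvals_vs_control` is a hypothetical helper, the seed is added only for reproducibility, and the significance cut-offs are assumed to mirror ggpubr's default `p.signif` symbols:

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
y    <- c(runif(100, 0, 4.5), runif(100, 3, 6), runif(100, 4, 7))
grp1 <- sample(c("A", "B", "C", "D"), size = 300, replace = TRUE)
grp2 <- rep(c("High", "Med", "Contrl"), each = 100)
dataset <- data.frame(y, grp1, grp2, stringsAsFactors = FALSE)

# Hypothetical helper: within one grp1 level, test each treatment vs "Contrl"
pvals_vs_control <- function(d, control = "Contrl") {
  ctrl <- d$y[d$grp2 == control]
  treatments <- setdiff(unique(d$grp2), control)
  p <- sapply(treatments, function(t) wilcox.test(d$y[d$grp2 == t], ctrl)$p.value)
  data.frame(grp1 = d$grp1[1], grp2 = treatments, p = unname(p),
             stringsAsFactors = FALSE)
}

# Run the tests on the raw observations within each grp1 level
res <- do.call(rbind, lapply(split(dataset, dataset$grp1), pvals_vs_control))
rownames(res) <- NULL

# Significance symbols; cut-offs assumed to match ggpubr's default p.signif
res$p.signif <- as.character(cut(res$p,
                                 breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, 1),
                                 labels = c("****", "***", "**", "*", "ns"),
                                 include.lowest = TRUE))
res
```

The resulting `res` data frame has one row per grp1/treatment pair, so it can be joined to `means.all` to place a `geom_text()` label above each bar.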
This may not be the optimal way, but you can split the data into a list, apply stat_compare_means() to each level of grp1 separately, and then combine the plots into one with patchwork:
library(ggplot2)
library(ggpubr)
library(patchwork)
#Split data
List <- split(means.all,means.all$grp1)
#Function for plot
myfun <- function(x)
{
#Ref group
rg <- paste0(unique(x$grp1),'.','Contrl')
#Plot
G <- ggplot(x, aes(x=interaction(grp1,grp2), y=y.mean, fill=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))+
stat_compare_means(ref.group = rg,label = "p.signif",method = "wilcox.test",label.y = 7)+
theme(axis.text.x = element_blank())+
xlab(unique(x$grp1))
return(G)
}
#Apply
Lplot <- lapply(List, myfun)
#Wrap plots
wrap_plots(Lplot,nrow = 1)+plot_layout(guides = 'collect')
Output:
Consider this update, which computes the p-values with compare_means() on the full dataset and stores the asterisks in a new data frame:
#Create p-vals dataset
List2 <- split(dataset,dataset$grp1)
#p-val function
mypval <- function(x)
{
y <- compare_means(y~grp2,data = x,method="wilcox.test")
y <- y[,c('group2', 'group1','p.signif')]
names(y)<-c('grp2','grp1','p.signif')
y <- y[y$grp2=='Contrl',]
y$grp2 <- y$grp1
y <- rbind(y,data.frame(grp2='Contrl',grp1='',p.signif=''))
y$grp1 <- unique(x$grp1)
y$y.mean=7
return(y)
}
#Apply
dfpvals <- lapply(List2, mypval)
df <- do.call(rbind,dfpvals)
#Plot
ggplot(means.all, aes(x=grp1, y=y.mean, fill=grp2,group=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))+
geom_text(data=df,aes(x=grp1, y=y.mean,group=grp2,label=p.signif),
position=position_dodge(0.9))
Output:
Is there a more efficient way to present these data in ggplot2? Ideally, I would like both in one plot. I know this can be done in Python with matplotlib, but I prefer the look of ggplot2.
R code used to generate the plots:
#load libraries
library(ggplot2)
library (gridExtra)
library(scales)
#generate some data plot 1
var_iter <- c(seq(0, 4000, 20))
x <- runif(201,0.877813, 2.283210)
var_loss <- c(sort(x, decreasing = TRUE))
rndm1 <- data.frame(var_iter, var_loss)
#generate some data plot 2
var_iter2 <- c(seq(0, 3500, 500))
x2 <- runif(8,0.1821, 0.6675)
var_acc <- c(sort(x2, decreasing = FALSE))
rndm2 <- data.frame(var_iter2, var_acc)
#plot loss
# (stored as p_loss rather than c, so that the base function c() isn't masked)
p_loss <- ggplot(data=rndm1, aes(x=var_iter, y=var_loss)) + geom_line(aes(colour="Log Loss")) +
  scale_colour_manual(name='', values=c('Log Loss'='#00BFC4')) + #theme_bw() +
  xlab("iterations") + ylab("log loss") +
  theme(legend.position=c(1,1), legend.justification=c(1,1),
        legend.direction="horizontal",
        legend.box="horizontal",
        legend.box.just = c("top"),
        legend.background = element_rect(fill=alpha('white', 0.3)))
#plot accuracy
p_acc <- ggplot(data=rndm2, aes(x=var_iter2, y=var_acc)) + geom_line(aes(colour="Accuracy")) +
  scale_colour_manual(name='', values=c('Accuracy'='#F8766D')) + #theme_bw() +
  xlab("iterations") + ylab("accuracy") +
  theme(legend.position=c(0.80, 1), legend.justification=c(1,1),
        legend.direction="horizontal",
        legend.box="horizontal",
        legend.box.just = c("top"),
        legend.background = element_rect(fill=alpha('white', 0.3)))
grid.arrange(p_loss, p_acc, ncol=2)
You can still use the same idea of adding one layer on top of another.
ggplot(rndm1, aes(x=var_iter)) +
geom_line(aes(y=var_loss, color="var_loss")) +
geom_line(data=rndm2, aes(x=var_iter2, y=var_acc, color="var_acc"))
Or combine the two data frames and create another variable for the colour.
# Change the column name, so they can combine together
names(rndm1) <- c("x", "y")
names(rndm2) <- c("x", "y")
rndm <- rbind(rndm1, rndm2)
# Create a variable for color
rndm$group <- rep(c("Log Loss", "Accuracy"), c(dim(rndm1)[1], dim(rndm2)[1]))
ggplot(rndm, aes(x=x, y=y, color=group)) + geom_line()
I wanted to suggest the same idea as JasonWang, but he was faster. I think it is the way to go (hence my upvote).
ggplot2 doesn't allow two independent y axes, and for a reason (see: Plot with 2 y axes, one y axis on the left, and another y axis on the right):
they can be misleading.
But if you still want to, you can do it with base graphics or with dygraphs, for example:
library(dygraphs)
# Rename so that both data frames share the x column that merge() joins on
rndm2$var_iter <- rndm2$var_iter2
rndm2$var_iter2 <- NULL
merged.rndm <- merge(rndm1, rndm2, all = TRUE)
dygraph(merged.rndm) %>% dySeries("var_acc", axis = "y2")
But this will only give you isolated points for var_acc, as it has far fewer observations.
You could fill in the gaps by interpolation:
merged.rndm1 <- as.data.frame(zoo::na.approx(merged.rndm))
dygraph(merged.rndm1) %>% dySeries("var_acc", axis = "y2")
Note: this plots interpolated values, which may not be what you want.
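For completeness, ggplot2 (version 2.2.0 and later) does offer `sec_axis()`, but only as a fixed transformation of the primary axis, so the second series has to be rescaled onto the first one's range. A sketch, assuming stand-in data shaped like the question's and a hypothetical scale factor `k`:

```r
library(ggplot2)

# Stand-in data shaped like the question's (seed added for reproducibility)
set.seed(42)
rndm1 <- data.frame(var_iter = seq(0, 4000, 20),
                    var_loss = sort(runif(201, 0.877813, 2.283210), decreasing = TRUE))
rndm2 <- data.frame(var_iter2 = seq(0, 3500, 500),
                    var_acc = sort(runif(8, 0.1821, 0.6675)))

# Hypothetical scale factor mapping accuracy onto the log-loss range
k <- max(rndm1$var_loss) / max(rndm2$var_acc)

p <- ggplot() +
  geom_line(data = rndm1, aes(x = var_iter, y = var_loss, colour = "Log Loss")) +
  # accuracy is multiplied by k so it can share the left-hand scale ...
  geom_line(data = rndm2, aes(x = var_iter2, y = var_acc * k, colour = "Accuracy")) +
  # ... and the right-hand axis divides by k to show the true accuracy values
  scale_y_continuous(name = "log loss",
                     sec.axis = sec_axis(~ . / k, name = "accuracy")) +
  labs(x = "iterations", colour = NULL)
p
```

The choice of `k` is arbitrary, which is exactly why dual axes can mislead: a different factor would change how the two curves appear to relate.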
New to programming, and this is my first post.
I'm trying to create a stacked bubble chart to display how a population breaks down into its proportions. My aim is to write this as a function so that I can easily reuse it, but I need to get the core of the code working before turning it into a function.
This is the type of plot I would like:
This is the code I've tried so far:
library(ggplot2)
# some data
observations = c(850, 500, 200, 50)
plot_data = data.frame(
"x" = rep.int(1,length(observations))
,"y" = rep.int(1,length(observations))
, "size" = rep.int(1,length(observations))
,"colour" = c(1:length(observations))
)
# convert to percentage for relative sizes
for (i in 1:length(observations))
{
plot_data$size[i] = (observations[i]/max(observations))*100
}
ggplot(plot_data,aes(x = x, y = y)) +
geom_point(aes(size = size, color = colour)) +
scale_size_identity() +
scale_y_continuous (limits = c(0.5, 1.5)) +
theme(legend.position = "none")
This produces a bullseye-type image.
My approach has been to try to work out how the circle radii are calculated, and then update the y value in the for loop for each entry so that all the circles touch at the base. This is where I have been failing.
So my question:
How can I work out what the y coordinate of each circle needs to be?
Thank you for any help and hints.
I think this simplifies the answer that Henrick found:
library(ggplot2)
# Returns 200 points on a circle with the given centre and radius
circle <- function(center, radius, group) {
th <- seq(0, 2*pi, len=200)
data.frame(group=group,
x=center[1] + radius*cos(th),
y=center[2] + radius*sin(th))
}
# Create a named vector for your values
obs <- c(Org1=500, Org2=850, Org3=50, Org4=200)
# this reverse sorts them (so the stacked layered circles work)
# and makes it a list
obs <- as.list(rev(sort(obs)))
# need the radii
rads <- lapply(obs, "/", 2)
# need the max
x <- max(sapply(rads, "["))
# build a data frame of created circles
do.call(rbind.data.frame, lapply(1:length(rads), function(i) {
circle(c(x, rads[[i]]), rads[[i]], names(rads[i]))
})) -> dat
# make the plot
gg <- ggplot(dat)
gg <- gg + geom_polygon(aes(x=x, y=y, group=group, fill=group),
color="black")
gg <- gg + coord_equal()
gg <- gg + ggthemes::theme_map()
gg <- gg + theme(legend.position="right")
gg
You can tweak the guides/colors with standard ggplot functions.
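The key geometric fact behind both the question and this answer: a circle of radius r resting on the line y = 0 has its centre at y = r. The answer above makes the radius proportional to each value; if the bubble *area* should encode the value instead (an assumption about the intended chart), use r = sqrt(value / pi). A sketch with a hypothetical helper `circle_on_base`:

```r
library(ggplot2)

# Hypothetical helper: points on a circle whose bottom touches y = 0,
# i.e. a circle centred at (x0, radius)
circle_on_base <- function(x0, radius, group, n = 200) {
  th <- seq(0, 2 * pi, length.out = n)
  data.frame(group = group,
             x = x0 + radius * cos(th),
             y = radius + radius * sin(th))
}

# Sorted largest first; with the default alphabetical group order (Org1 first)
# the smaller circles are drawn on top of the larger ones
observations <- sort(c(850, 500, 200, 50), decreasing = TRUE)
radii <- sqrt(observations / pi)  # assumption: area, not radius, encodes the value
x0 <- max(radii)                  # shared centre x, so the circles nest

dat <- do.call(rbind, lapply(seq_along(radii), function(i) {
  circle_on_base(x0, radii[i], group = paste0("Org", i))
}))

p <- ggplot(dat, aes(x = x, y = y, group = group, fill = group)) +
  geom_polygon(colour = "black") +
  coord_equal()
p
```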
Let's say we have a simple plot of the following kind.
library(ggplot2)
df = data.frame(y=c(0,1.1,2.3,3.1,2.9,5.8,6,7.4,8.2,9.1),x=seq(1,100, length.out=10))
ggplot(df,aes(x=x,y=y)) + geom_point()
x is perfectly related to a second variable z by the relation x^2 * z = 1.23 (a constant).
Therefore I could add z to the data.frame like this:
df$z = 1.23/df$x^2
The question is:
How can I display both variables x and z on the x-axis? One could be at the bottom and one at the top of the graph, or both could be at the bottom.
Here's a dangerous attempt. Previous version with a log-scale was just wrong.
library(ggplot2)
df = data.frame(y=c(0,1.1,2.3,3.1,2.9,5.8,6,7.4,8.2,9.1),
x=seq(1,100, length.out=10))
df$z = 1.23/df$x^2
## let's at least remove the gridlines
p1 <- ggplot(df,aes(x=x,y=y)) + geom_point() +
scale_x_continuous(expand=c(0,0)) +
theme(panel.grid.major=element_blank(),
panel.grid.minor = element_blank())
## make sure both plots have expand = c(0,0)
## otherwise data and top-axis won't necessarily be aligned...
p2 <- ggplot(df,aes(x=z,y=y)) + geom_point() +
scale_x_continuous(expand=c(0,0))
library(gtable)
library(grid)  # for unit(), grid.newpage() and grid.draw()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(p2)
tmp <- gtable_filter(g2, pattern="axis-b")
## ugly tricks to extract and reshape the axis
axis <- tmp[["grobs"]][[1]][["children"]][["axis"]] # corrupt the children
axis$layout <- axis$layout[2:1,]
axis$grobs[[1]][["y"]] <- axis$grobs[[1]][["y"]] - unit(1,"npc") + unit(0.15,"cm")
## back to "normality"
g1 <- gtable_add_rows(g1, sum(tmp$heights), 2)
gtableAddGrobs <- gtable_add_grob # alias, making sure #!hadley doesn't see this
g1 <- gtableAddGrobs(g1,
grobs=list(gtable_filter(g2, pattern="xlab"),axis),
t=c(1,3), l=4)
grid.newpage()
grid.draw(g1)
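In ggplot2 2.2.0 and later, a top axis can also be declared directly with `sec_axis()`, as long as the second variable is a monotonic function of the first. Since z = 1.23 / x^2 is monotonic for x > 0, a sketch follows; the positions in `x_at` are a hypothetical choice of where to label z:

```r
library(ggplot2)

df <- data.frame(y = c(0, 1.1, 2.3, 3.1, 2.9, 5.8, 6, 7.4, 8.2, 9.1),
                 x = seq(1, 100, length.out = 10))
df$z <- 1.23 / df$x^2

# Hypothetical x positions at which to place z ticks on the top axis
x_at <- c(2, 5, 10, 20, 50)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  scale_x_continuous(
    # no expansion: keeps the axis range at x > 0, where 1.23 / x^2 is monotonic
    expand = c(0, 0),
    sec.axis = sec_axis(~ 1.23 / .^2, name = "z",
                        breaks = 1.23 / x_at^2,
                        labels = function(b) signif(b, 2))
  )
p
```

Without explicit breaks, the default z ticks would all crowd near the left edge, because 1.23 / x^2 shrinks very quickly as x grows.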
A both-on-the-bottom approach can be done with the excellent cowplot library.
library(ggplot2)
library(cowplot)
data <- data.frame(temp_c=runif(100, min=-5, max=30), outcome=runif(100))
plot <- ggplot(data) +
geom_point(aes(x=temp_c, y=outcome)) +
theme_classic() +
labs(x='Temperature (Celsius)')
x2plot <- ggplot(data) +
geom_point(aes(x=temp_c, y=outcome)) +
theme_classic() +
scale_x_continuous(labels=function(x){round(x*(9/5) + 32)}) +
labs(x='Temperature (Fahrenheit)')
x <- get_x_axis(x2plot)
xl <- get_plot_component(x2plot, "xlab-b")
plot_grid(plot, ggdraw(x), ggdraw(xl), align='v', axis='rl', ncol=1,
rel_heights=c(0.8, 0.05, 0.05))
I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
p1 <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
p2 <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I could combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
library(ggplot2)
library(reshape2)  # for melt()
#Sample data
survey <- data.frame(
often_post = runif(10, 0, 5),
frequent_read = 5 * rbeta(10, 1, 1),
often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
geom_point() +
geom_smooth()
)
You can use + to add more layers to the same ggplot object. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
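With the layers above, both pairs are drawn in the same colour. Mapping `colour` to a constant label inside each `aes()` gives each pair its own colour and a legend entry. A sketch, with stand-in data shaped like the sample in the previous answer (seed added for reproducibility):

```r
library(ggplot2)

set.seed(1)
survey <- data.frame(
  often_post = runif(10, 0, 5),
  frequent_read = 5 * rbeta(10, 1, 1),
  often_privacy = sample(10, replace = TRUE)
)

# A constant string mapped to colour acts as a series label
p <- ggplot(survey, aes(y = often_privacy)) +
  geom_point(aes(x = often_post, colour = "often_post")) +
  geom_smooth(aes(x = often_post, colour = "often_post"),
              method = "loess", formula = y ~ x) +
  geom_point(aes(x = frequent_read, colour = "frequent_read")) +
  geom_smooth(aes(x = frequent_read, colour = "frequent_read"),
              method = "loess", formula = y ~ x) +
  labs(x = "often_post / frequent_read", colour = NULL)
p
```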
Try this, where x_var is your x vector and y1_var and y2_var are your two y vectors:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()