How to reorder plots in combined ggplot2 graph? - r

I have seen the solutions to reordering subplots when it's just one object being plotted (e.g. mydata), but I am not sure how to do this when there are multiple objects being plotted (in this instance, mydata1 and mydata2). I would like to switch the order of the violins such that Treatment2 is on the left, and Treatment1 is on the right, instead of vice-versa like I currently have it:
mycp <- ggplot() + geom_violin(data = mydata1, aes(x= treatment, y = Myc_List1, fill = Myc_List1, colour="Myc Pathway (Treatment1)")) +
geom_violin(data = mydata2, aes(x= treatment, y = Myc_List1, fill = Myc_List1, colour = "Myc Pathway (Treatment2)"))
When I try solutions such as in Ordering of bars in ggplot, or the following solution posed at https://www.r-graph-gallery.com/22-order-boxplot-labels-by-names.html, this graph remains unchanged.
Hopefully this makes sense, and thank you for reading!
UPDATE
Here is another solution as well from https://www.datanovia.com/en/blog/how-to-change-ggplot-legend-order/
mydata$treatment<- factor(mydata$treatment, levels = c("Treatment2", "Treatment1"))

I'm not sure how to reorder factors in this case, but you can change the x axis scale to get the desired result, e.g.
library(tidyverse)
data("Puromycin")
dat1 <- Puromycin %>%
filter(state == "treated")
dat2 <- Puromycin %>%
filter(state == "untreated")
mycp <- ggplot() +
geom_violin(data = dat1, aes(x= state, y = conc, colour = "Puromycin (Treatment1)")) +
geom_violin(data = dat2, aes(x= state, y = conc, colour = "Puromycin (Treatment2)"))
mycp
mycp2 <- ggplot() +
geom_violin(data = dat1, aes(x = state, y = conc, colour = "Puromycin (Treatment1)")) +
geom_violin(data = dat2, aes(x = state, y = conc, colour = "Puromycin (Treatment2)")) +
scale_x_discrete(limits = c("untreated", "treated"))
mycp2
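To check which order the x scale will use without looking at the rendered plot, one option is to inspect the trained scale. This is only a minimal sketch: it assumes ggplot2's layer_scales() helper and base subset() in place of filter(), and the names two_layer_plot, p_default and p_swapped are made up for illustration.

```r
library(ggplot2)

# Two separate data frames, as in the example above
dat1 <- subset(Puromycin, state == "treated")
dat2 <- subset(Puromycin, state == "untreated")

# Hypothetical helper: build the two-layer violin plot
two_layer_plot <- function() {
  ggplot() +
    geom_violin(data = dat1, aes(x = state, y = conc)) +
    geom_violin(data = dat2, aes(x = state, y = conc))
}

p_default <- two_layer_plot()
p_swapped <- two_layer_plot() +
  scale_x_discrete(limits = c("untreated", "treated"))

# Limits of the x scale once each plot has been built
default_order <- layer_scales(p_default)$x$get_limits()
swapped_order <- layer_scales(p_swapped)$x$get_limits()
```

If swapped_order comes back in the order passed to limits, the violins should be drawn left to right in that order.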

Stack the data into a single data frame and set the order by converting treatment to a factor. In your example, the colors and legend are redundant, since you can label the x-axis values to describe each treatment, or change the x-axis title to "Myc Pathway", but the code below in any case shows how to get the ordering.
library(tidyverse)
bind_rows(mydata1, mydata2) %>%
mutate(treatment = factor(treatment, levels = paste0("Treatment", c(2, 1)))) %>%
ggplot(aes(treatment, Myc_List1, colour=treatment)) +
geom_violin()
Here's a reproducible example:
library(tidyverse)
theme_set(theme_bw(base_size=15))
# Create two separate data frames to start with
d1=iris %>% filter(Species=="setosa")
d2=iris %>% filter(Species=="versicolor")
bind_rows(d1, d2) %>%
mutate(Species = factor(Species, levels=c("versicolor", "setosa"))) %>%
ggplot(aes(Species, Petal.Width, colour=Species)) +
geom_violin()
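The factor levels are what drive the left-to-right order, so it can help to confirm them before plotting. A minimal base R sketch, assuming rbind() in place of bind_rows() and a made-up object name both:

```r
# Two separate data frames to start with
d1 <- subset(iris, Species == "setosa")
d2 <- subset(iris, Species == "versicolor")

# Stack them, then set the desired order explicitly
both <- rbind(d1, d2)
both$Species <- factor(both$Species, levels = c("versicolor", "setosa"))

levels(both$Species)  # order that a discrete x-axis would follow
table(both$Species)   # rows per group; the unused level has been dropped
```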

Related

Sorting Y-axis of barplot based on the decreasing value of last facet grid in ggplot2

Question:
I am trying to sort the Y-axis of the barplot by the decreasing value of the last facet group, "Step4", while keeping a common Y-axis label across facets. There are suggestions for ordering each facet group within itself, but how can this be done with a common Y-axis label using the values of only one facet group? I have attached sample data and the code for the initial plot to illustrate the question.
Thanks in advance.
Data:
Download the sample data here
Code:
library(ggplot2)
library(reshape2)
library(ggpubr) # provides theme_pubr()
#reading data
data <- read.csv(file = "./sample_data.csv", stringsAsFactors = TRUE)
#reshaping data in longer format using reshape2::melt
data.melt <- melt(data)
#plotting the data in multi-panel barplot
ggplot(data.melt, aes(x= value, y=reorder(variable, value))) +
geom_col(aes(fill = Days), width = 0.7) +
facet_grid(.~step, scales = "free")+
theme_pubr() +
labs(x = "Number of Days", y = "X")
Graph: Barplot Graph for the sample data
Summarise the values for last 'step' and extract the levels from the data.
library(dplyr)
library(ggplot2)
library(ggpubr) # provides theme_pubr()
lvls <- data.melt %>%
arrange(step) %>%
filter(step == last(step)) %>%
#Or
#filter(step == 'Step4') %>%
group_by(variable) %>%
summarise(sum = sum(value)) %>%
arrange(sum) %>%
pull(variable)
data.melt$variable <- factor(data.melt$variable, lvls)
ggplot(data.melt, aes(x= value, y= variable)) +
geom_col(aes(fill = Days), width = 0.7) +
facet_grid(.~step, scales = "free")+
theme_pubr() +
labs(x = "Number of Days", y = "X")
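The level-extraction step can be tried on a small example first. The sketch below uses base R and a hypothetical toy data frame (not the asker's file); the column names step, variable and value are assumed to match data.melt.

```r
# Hypothetical toy data standing in for data.melt
toy <- data.frame(
  step     = rep(c("Step1", "Step4"), each = 3),
  variable = rep(c("A", "B", "C"), times = 2),
  value    = c(5, 1, 3,   2, 9, 4)
)

# Sum the values per variable within the last step only
last_step <- subset(toy, step == "Step4")
totals    <- tapply(last_step$value, last_step$variable, sum)

# Ascending order: on a y-axis the first level is drawn at the bottom,
# so the largest Step4 total ends up at the top
lvls <- names(sort(totals))
toy$variable <- factor(toy$variable, levels = lvls)

levels(toy$variable)
```

Because the factor is set on the whole data frame, every facet shares the same y-axis order, which is taken from Step4 alone.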

Same variables from 2 separate data.frames side by side in ggplot2

I want to plot the exact same variable names (ses & math) from 2 separate data.frames (dat1 & dat2) but side by side so I can visually compare them.
I have tried the following but it places both data.frames on top of each other.
Is there a function within ggplot2 to plot ses vs. math from dat1 and the same from dat2 side by side and placed on the same axes scales?
library(ggplot2)
dat1 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
dat2 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/sm.csv')
ggplot(dat1, aes(x = ses, y = math, colour = factor(sector))) +
geom_point() +
geom_point(data = dat2, aes(x = ses, y = math, colour = factor(sector)))
You can combine the two datasets into one and facet by their source:
library(dplyr)
library(ggplot2)
list(dat1 = dat1 %>%
select(sector,ses, math) %>%
mutate(sector = as.character(sector)) ,
dat2 = dat2 %>% select(sector,ses, math)) %>%
bind_rows(.id = 'name') %>%
ggplot() +
aes(x = ses, y = math, colour = factor(sector)) +
geom_point() +
facet_wrap(.~name)
Another option is to create a list of plots and arrange them with grid.arrange:
list_plots <- lapply(list(dat1, dat2), function(df) {
ggplot(df, aes(x = ses, y = math, colour = factor(sector))) + geom_point()
})
do.call(gridExtra::grid.arrange, c(list_plots, ncol = 2))
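For readers who cannot download the CSV files, the faceting idea can be sketched on toy data. Everything below is illustrative: toy1 and toy2 are hypothetical stand-ins for dat1 and dat2, and base rbind() is used instead of bind_rows(.id = ...).

```r
library(ggplot2)

# Hypothetical stand-ins for dat1 and dat2 (same columns, different rows)
toy1 <- data.frame(sector = c(0, 1, 0), ses = c(-1, 0, 1),  math = c(10, 12, 15))
toy2 <- data.frame(sector = c(1, 0),    ses = c(0.5, -0.5), math = c(11, 9))

# Tag each row with its source, then stack the two data frames
toy1$name <- "dat1"
toy2$name <- "dat2"
both <- rbind(toy1, toy2)

# facet_wrap() uses fixed scales by default, so both panels share axes
p <- ggplot(both, aes(x = ses, y = math, colour = factor(sector))) +
  geom_point() +
  facet_wrap(~name)
```

Printing p should give one panel per source data frame, side by side.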

Boxplot for several variables with different Y scale

I have 4 variables (A, B, C, D) with a similar pattern at 3 locations. I would like to draw a box plot (variable values as dots on the Y-axis, locations on the X-axis), but the variables differ by orders of magnitude. Is there a way of scaling the Y-axis so that all variables can be shown on the boxplots, perhaps differentiated by colour?
Location = c("Washington","Washington","Washington","Washington","Washington","Washington", "Maine","Maine","Maine","Maine","Maine", "Florida","Florida","Florida","Florida","Florida","Florida")
A = c(0.000693156, 0.000677354, 0.000727863, 0.000650822, 0.000908343, 0.001126689, 0.001316292, 0.000975274, 0.00109082, 0.001057585, 0.000927826, 0.000552769, 0.000532546, 0.000559781, 0.000771569, 0.000563436, 0.000551136)
B = c(0.001915388, 0.001936627, 0.001476521, 0.001573681, 0.002584282, 0.00738909, 0.008089839, 0.006616564, 0.00495211, 0.004515925, 0.003791596, 0.000653847, 0.000350701, 0.000559781, 0.001920087, 0.000738206, 0.001077627)
C = c(0.000138966, 0.000104745, 0.000145573, 0.000103305, 5.08255E-05, 0.000361988, 0.000264876, 0.000454172, 0.000277471, 0.000117919, 8.9214E-05, 0.000173727, 0.000108241, 8.54628E-05, 2.35593E-05, 3.1302E-05, 1.12019E-05)
D = c(0.000108829, 0.000135005, 0.000120617, 9.29746E-05, 0.000105561, 9.27596E-05, 0.000121317, 0.000131471, 0.000152503, 0.000128974, 0.000196271, 0.000142141, 0.000147208, 0.00013674, 0.000147246, 0.000185204, 0.000103058)
df = data.frame(Location, A, B, C, D)
And this is what I have tried for two variables as individual graphs
library(ggplot2)
a <- ggplot(df, aes(x=Location, y=A)) +
geom_boxplot()
a + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="red")
b <- ggplot(df, aes(x=Location, y=B)) +
geom_boxplot()
b + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="blue")
Can I merge all 4 variables in 1 graph with a scaled Y-axis?
Can I add a legend only showing "A" and "D"?
If you reshape your data to "long" format, faceting is one option. Note that you must set scales = 'free' in facet_wrap().
library(tidyverse)
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value')
g <- ggplot(data = df.long, aes(x = Location, y = value)) +
geom_boxplot() +
facet_wrap(facets = ~variable, scales = 'free')
print(g)
If you wanted to get everything on one plot, you'd have to rescale the data per group. Here I've normalized each data point to between 0 and 1, relative to its original scale.
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value') %>%
group_by(variable) %>%
mutate(value_norm = value - min(value),
value_norm = value_norm / max(value_norm)
)
g.norm <- ggplot(data = df.long, aes(x = Location, y = value_norm, fill = variable)) +
geom_boxplot()
print(g.norm)
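The per-group rescaling above is min-max normalization, (x - min) / (max - min), applied within each variable. A small base R sketch on hypothetical toy values (the helper name minmax is made up here) shows the arithmetic:

```r
# Hypothetical toy data: two variables on very different scales
toy <- data.frame(
  variable = rep(c("A", "B"), each = 3),
  value    = c(1, 2, 3,   100, 150, 300)
)

# Min-max normalization: smallest value maps to 0, largest to 1
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# ave() applies the function within each group and keeps the row order
toy$value_norm <- ave(toy$value, toy$variable, FUN = minmax)
toy
```

Note that the normalized values are only comparable in shape, not in magnitude, since each variable is rescaled against its own range.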
Try this. Using scale_y_log10. Not the most beautiful plot, but ...
library(ggplot2)
library(tidyr)
library(dplyr)
df %>%
pivot_longer(-Location) %>%
ggplot(aes(x=Location, y=value, color = name)) +
geom_boxplot() +
geom_dotplot(aes(fill = name), color = "black", binaxis='y', dotsize=.5) +
scale_y_log10()
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2020-04-14 by the reprex package (v0.3.0)

More compact use of ggplot : grid spaghetti plot

The following code plots the predicted probability of several models against time. Having all the plots on one graph was not readable, so I divided the result into a grid.
I was wondering if it was possible to have only one ggplot with all the models then somehow specify which goes where with grid.arrange
Current :
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
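The ordering logic inside grid_plot() can be checked on its own. The sketch below uses a hypothetical vector of plot names in place of names(x); it also passes fixed = TRUE, which is an addition not in the answer, so that the dots in the model names are matched literally rather than as regex wildcards.

```r
# Hypothetical plot names, standing in for names(x) in grid_plot()
plot_names <- c("predp.glm.gen", "predp.glm1", "predp.glm3", "predp.glm4")
selection  <- "predp.glm3"

# fixed = TRUE treats the selection as a literal string, not a regex
is_selected <- grepl(selection, plot_names, fixed = TRUE)

# Selected plots first, the rest in their original order
plot_order <- c(plot_names[is_selected], plot_names[!is_selected])
plot_order
```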
Or you could do this with ggplot, using facet_wrap and converting the variable column to a factor with the proper order. This has the benefit of shared axes across the plots, which makes comparison easier. You can alter the helper functions in the first approach to set the axes explicitly and achieve the same effect, but it's just easier to keep it in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
mutate(variable = fct_relevel(variable, selection))
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")
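What the releveling step is meant to achieve, moving the selected model to the first facet, can be sketched in base R without forcats. The values below are hypothetical and only illustrate the reordering.

```r
# Hypothetical values of the long-format `variable` column
variable  <- c("predp.glm1", "predp.glm3", "predp.glm.gen", "predp.glm3")
selection <- "predp.glm3"

# Put the selected level first, keep the others in order of appearance
other_levels <- setdiff(unique(variable), selection)
variable_f   <- factor(variable, levels = c(selection, other_levels))

levels(variable_f)
```

facet_wrap() lays out panels in level order, so the selected model would appear in the first panel.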

Plot discrete values with different color

Given a dataframe with discrete values,
d=data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
I want to make a plot like
However I want to make different color for each layer, say red and green for "a", yellow/blue for "b".
The idea is to reshape your data (define coordinates to draw the rectangles) in order to use geom_rect from ggplot:
library(ggplot2)
library(reshape2)
i = setNames(expand.grid(1:nrow(d),1:ncol(d[-1])),c('x1','y1'))
ggplot(cbind(i,melt(d, id.vars='id')),
aes(xmin=x1, xmax=x1+1, ymin=y1, ymax=y1+1, color=variable, fill=value)) +
geom_rect()
Try geom_tile(). But you need to reshape your data to get exactly the same figure as you presented.
df <- data.frame(id=factor(c(1:6)), a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
library(reshape2)
df <- melt(df, id.vars = "id")
library(ggplot2)
ggplot(aes(x = id, y = variable, fill = value), data = df) + geom_tile()
require("dplyr")
require("tidyr")
require("ggplot2")
d=data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
ggplot(d %>% gather(type, value, a, b, c) %>% mutate(value = paste0(type, value)),
aes(x = id, y = type)) +
geom_tile(aes(fill = value), color = "white") +
scale_fill_manual(values = c("forestgreen", "indianred", "lightgoldenrod1",
"royalblue", "plum1", "plum2", "plum3"))
First we use reshape2 to transform the data from wide to long. Then to get discrete values we use as.factor(value) and finally we use scale_fill_manual to assign the 5 different colours we need. In geom_tile we specify the colour of the tile borders.
library(reshape2)
library(ggplot2)
df <- data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
df <- melt(df, id.vars=c("id"))
ggplot(df, aes(id, variable, fill = as.factor(value))) + geom_tile(colour = "white") +
scale_fill_manual(values = c("lightblue", "steelblue2", "steelblue3", "steelblue4", "darkblue"), name = "Values")+
scale_x_continuous(breaks = 1:6)
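If the colours should stay tied to particular values regardless of level order, a named vector can be passed to scale_fill_manual. This is a hedged sketch rather than part of the answers above: it swaps in base stack() for melt(), and the object names long and fill_colours as well as the colour choices are arbitrary.

```r
library(ggplot2)

d <- data.frame(id = 1:6, a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                c = c(10, 20, 30, 30, 10, 20))

# Wide to long with base R: stack() returns columns `values` and `ind`
long <- cbind(id = rep(d$id, times = 3), stack(d[c("a", "b", "c")]))
long$value_f <- factor(long$values)

# Names must match the factor levels; the colours are arbitrary choices
fill_colours <- c("0" = "lightblue", "1" = "steelblue2", "10" = "steelblue3",
                  "20" = "steelblue4", "30" = "darkblue")

p <- ggplot(long, aes(x = id, y = ind, fill = value_f)) +
  geom_tile(colour = "white") +
  scale_fill_manual(values = fill_colours, name = "Values") +
  scale_x_continuous(breaks = 1:6)
```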
