I want to plot the same variables (ses and math) from two separate data frames (dat1 and dat2) side by side so I can compare them visually.
I have tried the following, but it overlays both data frames in the same panel.
Is there a function within ggplot2 to plot ses vs. math from dat1 and the same from dat2 side by side, using the same axis scales?
```r
library(ggplot2)
dat1 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
dat2 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/sm.csv')
ggplot(dat1, aes(x = ses, y = math, colour = factor(sector))) +
  geom_point() +
  geom_point(data = dat2, aes(x = ses, y = math, colour = factor(sector)))
```
You can try faceting after combining the two datasets. By default, facet_wrap() gives all panels the same (fixed) x and y scales:
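```r
library(dplyr)
library(ggplot2)

list(dat1 = dat1 %>%
       select(sector, ses, math) %>%
       # dat1's sector is converted to character so bind_rows() can combine it with dat2's
       mutate(sector = as.character(sector)),
     dat2 = dat2 %>% select(sector, ses, math)) %>%
  bind_rows(.id = 'name') %>%   # 'name' records which data frame each row came from
  ggplot() +
  aes(x = ses, y = math, colour = factor(sector)) +
  geom_point() +
  facet_wrap(.~name)
```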
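The same stacking idea can be sketched in base R for readers who would rather not depend on dplyr: tag each data frame with a source column, rbind() them, and facet on that column. The tiny dat1/dat2 below are hypothetical stand-ins (the real ones are read from the CSV URLs above), and only the ses, math and sector columns are assumed to exist.

```r
library(ggplot2)

# Hypothetical stand-ins for dat1 and dat2 (the real data are read from CSV)
dat1 <- data.frame(ses = c(-1.2, 0.1, 0.8, 1.5), math = c(8, 12, 15, 19),
                   sector = c(0, 0, 1, 1))
dat2 <- data.frame(ses = c(-0.5, 0.3, 1.1), math = c(10, 14, 17),
                   sector = c("0", "1", "1"))

# Keep the shared columns, give sector the same type in both, and tag the source
cols <- c("ses", "math", "sector")
part1 <- transform(dat1[cols], sector = as.character(sector), name = "dat1")
part2 <- transform(dat2[cols], sector = as.character(sector), name = "dat2")
combined <- rbind(part1, part2)

# facet_wrap() uses fixed (shared) x and y scales by default
p <- ggplot(combined, aes(x = ses, y = math, colour = sector)) +
  geom_point() +
  facet_wrap(~ name)
p
```

The name column plays the same role as the .id column created by bind_rows() in the answer above.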
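Another option is to create a list of plots and arrange them with gridExtra::grid.arrange(). Note that each plot then gets its own axis scales unless you set the limits yourself: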
```r
list_plots <- lapply(list(dat1, dat2), function(df) {
  ggplot(df, aes(x = ses, y = math, colour = factor(sector))) + geom_point()
})
do.call(gridExtra::grid.arrange, c(list_plots, ncol = 2))
```
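Because separately built plots do not share scales, one way to get the same axes with grid.arrange() is to compute the ranges over both data frames first and pass them to coord_cartesian() in every plot. A minimal sketch, assuming small hypothetical stand-ins for dat1 and dat2; the arrangement step runs only if gridExtra is installed.

```r
library(ggplot2)

# Hypothetical stand-ins for dat1 and dat2 (the real data are read from CSV)
dat1 <- data.frame(ses = c(-1.2, 0.1, 0.8, 1.5), math = c(8, 12, 15, 19),
                   sector = c(0, 0, 1, 1))
dat2 <- data.frame(ses = c(-0.5, 0.3, 2.1), math = c(10, 14, 25),
                   sector = c(0, 1, 1))

# One x range and one y range computed over both data frames
x_lim <- range(c(dat1$ses, dat2$ses))
y_lim <- range(c(dat1$math, dat2$math))

# Every plot gets the same limits, so the panels are directly comparable
list_plots <- lapply(list(dat1, dat2), function(df) {
  ggplot(df, aes(x = ses, y = math, colour = factor(sector))) +
    geom_point() +
    coord_cartesian(xlim = x_lim, ylim = y_lim)
})

# Arrange side by side (needs the gridExtra package)
if (requireNamespace("gridExtra", quietly = TRUE)) {
  gridExtra::grid.arrange(grobs = list_plots, ncol = 2)
}
```

coord_cartesian() zooms without dropping points, which is why it is used here rather than xlim()/ylim(); with shared limits computed from the data, nothing would be dropped either way.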
Related
I have seen solutions for reordering the categories on the x-axis when a single data frame is plotted (e.g. mydata), but I am not sure how to do this when several data frames are plotted (in this instance, mydata1 and mydata2). I would like to switch the order of the violins so that Treatment2 is on the left and Treatment1 is on the right, instead of the other way around as I currently have it:
```r
mycp <- ggplot() +
  geom_violin(data = mydata1, aes(x = treatment, y = Myc_List1, fill = Myc_List1, colour = "Myc Pathway (Treatment1)")) +
  geom_violin(data = mydata2, aes(x = treatment, y = Myc_List1, fill = Myc_List1, colour = "Myc Pathway (Treatment2)"))
```
When I try solutions such as those in "Ordering of bars in ggplot", or the one posted at https://www.r-graph-gallery.com/22-order-boxplot-labels-by-names.html, the graph remains unchanged.
Hopefully this makes sense, and thank you for reading!
UPDATE
Here is another solution as well from https://www.datanovia.com/en/blog/how-to-change-ggplot-legend-order/
```r
mydata$treatment <- factor(mydata$treatment, levels = c("Treatment2", "Treatment1"))
```
I'm not sure how to reorder the factors in this case, but you can set the order of the x-axis scale directly with scale_x_discrete(limits = ...) to get the desired result, e.g.:
```r
library(tidyverse)
data("Puromycin")

dat1 <- Puromycin %>%
  filter(state == "treated")
dat2 <- Puromycin %>%
  filter(state == "untreated")

# Original plot
mycp <- ggplot() +
  geom_violin(data = dat1, aes(x = state, y = conc, colour = "Puromycin (Treatment1)")) +
  geom_violin(data = dat2, aes(x = state, y = conc, colour = "Puromycin (Treatment2)"))
mycp

# limits sets the order of the x-axis categories explicitly
mycp2 <- ggplot() +
  geom_violin(data = dat1, aes(x = state, y = conc, colour = "Puromycin (Treatment1)")) +
  geom_violin(data = dat2, aes(x = state, y = conc, colour = "Puromycin (Treatment2)")) +
  scale_x_discrete(limits = c("untreated", "treated"))
mycp2
```
Stack the data into a single data frame and set the order by converting treatment to a factor. In your example the colors and legend are redundant, since you can label the x-axis values to describe each treatment or change the x-axis title to "Myc Pathway", but in any case the code below shows how to get the ordering.
```r
library(tidyverse)

bind_rows(mydata1, mydata2) %>%
  mutate(treatment = factor(treatment, levels = paste0("Treatment", c(2, 1)))) %>%
  ggplot(aes(treatment, Myc_List1, colour = treatment)) +
  geom_violin()
```
Here's a reproducible example:
```r
library(tidyverse)
theme_set(theme_bw(base_size = 15))

# Create two separate data frames to start with
d1 <- iris %>% filter(Species == "setosa")
d2 <- iris %>% filter(Species == "versicolor")

bind_rows(d1, d2) %>%
  mutate(Species = factor(Species, levels = c("versicolor", "setosa"))) %>%
  ggplot(aes(Species, Petal.Width, colour = Species)) +
  geom_violin()
```
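If the per-treatment legend text from the question should survive the stacking, one option is to carry it along as a label column and keep the x-axis order in the factor levels. This sketch uses the built-in Puromycin data in place of the asker's mydata1/mydata2 and base R instead of dplyr; the label strings are illustrative.

```r
library(ggplot2)

# Two separate data frames, as in the question (Puromycin stands in for mydata1/mydata2)
dat1 <- subset(Puromycin, state == "treated")
dat2 <- subset(Puromycin, state == "untreated")

# Tag each data frame with the legend text it should keep
dat1$label <- "Puromycin (Treatment1)"
dat2$label <- "Puromycin (Treatment2)"

# Stack them, then fix the x-axis order through the factor levels
stacked <- rbind(dat1, dat2)
stacked$state <- factor(stacked$state, levels = c("untreated", "treated"))

p <- ggplot(stacked, aes(x = state, y = conc, colour = label)) +
  geom_violin()
p
```

Because both violins now come from one layer and one factor, the level order decides the left-to-right order.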
I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure which variable ggplot uses to identify the layers in the plot. In this example, there are two geoms...
```r
require(tidyverse)

a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c

# output below
gridExtra::grid.arrange(
  a + b,
  a + c,
  ncol = 2
)

# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
```
The gridExtra output that I'm hoping to recreate is:
A hacky way of doing this is to make as many copies of the existing data frame as you need panels (two, three, ...), each tagged with a value that is later used for faceting and filtering. Combine the copies into one data frame with rbind (a union). Then set up the ggplot and geoms, giving each geom only the rows carrying its tag, and facet on that same tag column to split the plots.
This can be seen below:
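```r
library(ggplot2)
library(dplyr)  # for filter(); otherwise stats::filter() would be used

# One copy of mpg per panel, tagged with the panel it belongs to
df1 <- data.frame(
  graph = "point_plot",
  mpg
)
df2 <- data.frame(
  graph = "spline_plot",
  mpg
)
df <- rbind(df1, df2)

ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(data = filter(df, graph == "point_plot")) +
  geom_smooth(data = filter(df, graph == "spline_plot"), se = FALSE) +
  facet_grid(. ~ graph)
```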
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those:
```r
library(ggplot2)
library(dplyr)

# Two copies of mpg, tagged facet = 1 and facet = 2
mpg2 <- mpg %>% mutate(facet = 1) %>%
  bind_rows(mpg %>% mutate(facet = 2))

ggplot(mpg2, aes(displ, cyl, color = drv)) +
  geom_point(data = subset(mpg2, facet == 1)) +
  geom_smooth(data = subset(mpg2, facet == 2)) +
  facet_wrap(~facet)
```
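Instead of naming the stacked data frame again inside every layer, a layer's data argument can also be a function: ggplot2 calls it with the plot's data and uses the returned data frame for that layer. A sketch of the same two-panel idea under that approach, stacking the copies with base R; the panel labels are illustrative, and method = "lm" is swapped in for the default smoother only to keep the example quiet and fast.

```r
library(ggplot2)

# Two tagged copies of mpg, one per panel
mpg2 <- rbind(
  transform(mpg, panel = "points"),
  transform(mpg, panel = "smooth")
)

p <- ggplot(mpg2, aes(displ, cyl, color = drv)) +
  # Each layer receives the plot data and keeps only its own panel's rows
  geom_point(data = function(d) subset(d, panel == "points")) +
  geom_smooth(data = function(d) subset(d, panel == "smooth"),
              method = "lm", formula = y ~ x) +
  facet_wrap(~ panel)
p
```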
I am wondering if there is a better way to produce four bar charts of different outcome variables arranged in a grid:
This is the code I used:
```r
library(ggplot2)
library(cowplot)

bar1 <- ggplot(data = subset(data, !is.na(MHQ_Heading_Male_Quartile))) +
  geom_bar(mapping = aes(x = MHQ_Heading_Male_Quartile))
bar2 <- ggplot(data = subset(data, !is.na(AHQ_Heading_Male_Quartile))) +
  geom_bar(mapping = aes(x = AHQ_Heading_Male_Quartile))
bar3 <- ggplot(data = subset(data, !is.na(MHQ_Heading_Female_Quartile))) +
  geom_bar(mapping = aes(x = MHQ_Heading_Female_Quartile))
bar4 <- ggplot(data = subset(data, !is.na(AHQ_Heading_Female_Quartile))) +
  geom_bar(mapping = aes(x = AHQ_Heading_Female_Quartile))
plot_grid(bar1, bar2, bar3, bar4, ncol = 2)
```
However, there is a lot of repeated code. Is there a function or approach that creates the same plot with ggplot2 in fewer lines?
I would convert the relevant columns (the ones ending in "_Quartile") from wide to long format and then use facet_wrap to show the four plots in a 2x2 grid with scales = "free".
Something like this:
```r
library(tidyverse)

data %>%
  gather(key, value, ends_with("Quartile")) %>%
  filter(!is.na(value)) %>%
  ggplot(aes(value)) +
  geom_bar() +
  facet_wrap(~ key, scales = "free", ncol = 2, nrow = 2)
```
As mentioned, you need to convert the data to long format with tidyr's gather() (or the reshape package) and then facet over the new key column.
```r
library(tidyverse)

data %>%
  select(MHQ_Heading_Male_Quartile, AHQ_Heading_Male_Quartile,
         MHQ_Heading_Female_Quartile, AHQ_Heading_Female_Quartile) %>%
  gather("Type", "Range", MHQ_Heading_Male_Quartile:AHQ_Heading_Female_Quartile) %>%
  filter(!is.na(Range)) %>%
  ggplot(aes(x = Range)) +
  geom_bar() +
  facet_wrap(~Type, scales = "free")
```
I'll leave it to you to clean the graphs up but that's the basic premise.
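In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshaping with pivot_longer(), assuming the four *_Quartile columns hold comparable values; the data frame here is a small hypothetical stand-in for the asker's data, and values_drop_na = TRUE replaces the separate filter(!is.na(...)) step.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for `data` with the four quartile columns (NA = missing)
data <- data.frame(
  MHQ_Heading_Male_Quartile   = c(1, 2, 4, NA, 3),
  AHQ_Heading_Male_Quartile   = c(2, 2, NA, 1, 4),
  MHQ_Heading_Female_Quartile = c(3, 1, 1, 2, NA),
  AHQ_Heading_Female_Quartile = c(4, NA, 3, 3, 2)
)

# Wide -> long: one row per (column, value); NAs are dropped while pivoting
long <- pivot_longer(
  data,
  cols = ends_with("_Quartile"),
  names_to = "key",
  values_to = "value",
  values_drop_na = TRUE
)

# One bar chart per original column, in a 2 x 2 grid
p <- ggplot(long, aes(x = factor(value))) +
  geom_bar() +
  facet_wrap(~ key, scales = "free", ncol = 2) +
  labs(x = "Quartile")
p
```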
Extract the names of the columns to be shown into nms, then use qplot on each one to create a ggplot object, so that bars is a list of such objects. Then run plot_grid on that list.
```r
library(ggplot2)
library(cowplot)

nms <- grep("Quartile", names(data), value = TRUE)
bars <- lapply(nms, function(nm) qplot(na.omit(data[[nm]]), xlab = nm))
do.call("plot_grid", bars)
```
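qplot() has been deprecated in recent ggplot2 releases, so the same loop can be sketched with ggplot() and the .data pronoun, which looks a column up by its name. The small data frame is a hypothetical stand-in for the question's data, and cowplot is only needed for the final arrangement (plot_grid() also accepts a list through its plotlist argument).

```r
library(ggplot2)

# Hypothetical stand-in for `data`
data <- data.frame(
  MHQ_Heading_Male_Quartile = c(1, 2, 4, NA, 3),
  AHQ_Heading_Male_Quartile = c(2, 2, NA, 1, 4),
  Other_Column              = c("a", "b", "c", "d", "e")
)

nms <- grep("Quartile", names(data), value = TRUE)

# One bar chart per column; .data[[nm]] refers to the column named nm
bars <- lapply(nms, function(nm) {
  ggplot(data[!is.na(data[[nm]]), , drop = FALSE],
         aes(x = factor(.data[[nm]]))) +
    geom_bar() +
    labs(x = nm)
})

if (requireNamespace("cowplot", quietly = TRUE)) {
  print(cowplot::plot_grid(plotlist = bars, ncol = 2))
}
```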
I have written a function that draws histograms with ggplot2 for the numeric columns of the data frame passed to it. The function stores these plots in a list and then returns the list.
However, when I run the function I get the same plot over and over.
My code is below, followed by a reproducible example.
```r
hist_of_columns = function(data, class, variables_to_exclude = c()){
  library(ggplot2)
  library(ggthemes)
  data = as.data.frame(data)
  variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
  variables_not_to_plot = c(class, variables_to_exclude)
  variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
  indices = match(variables_to_plot, names(data))
  index_of_class = match(class, names(data))
  plots = list()
  for (i in (1 : length(variables_to_plot))){
    p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
      geom_histogram(aes(y=..density..), alpha=0.3,
                     position="identity", bins = 100)+ theme_economist() +
      geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
    name = names(data)[indices[i]]
    plots[[name]] = p
  }
  plots
}

data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
```
If you inspect the list plots, you will see that it contains the same plot repeated.
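Simply use aes_string to pass string variable names into the ggplot() call. Right now, your plot takes its aesthetics from vectors extracted from data rather than from columns referenced through ggplot's data argument. In the call below, x, color, and fill are separate, unrelated vectors; they derive from the same source, but ggplot does not know that. On top of that, aes() does not evaluate data[, indices[i]] immediately: it stores the expression and evaluates it when the plot is built, after the loop has finished, so every plot in the list uses the last value of i: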
```r
ggplot(data, aes(x = data[, indices[i]], color = data[, index_of_class], fill = data[, index_of_class]))
```
With aes_string, by contrast, the strings passed to x, color, and fill are looked up as column names in data:
```r
ggplot(data, aes_string(x = names(data)[indices[i]], color = class, fill = class))
```
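aes_string() is deprecated in current ggplot2, whose documentation points to the .data pronoun instead. Because aes() is evaluated lazily, this sketch also builds each plot inside lapply(), so every plot keeps its own column name v rather than reading a loop variable after the loop has ended. It is a minimal rewrite of the question's function under these assumptions: theme_economist() is dropped so only ggplot2 is needed, after_stat(density) replaces the older ..density.. notation, and guides(colour = "none") replaces guides(color = FALSE).

```r
library(ggplot2)

hist_of_columns <- function(data, class, variables_to_exclude = character()) {
  data <- as.data.frame(data)
  # Numeric columns, minus the class column and any excluded ones
  is_num <- vapply(data, is.numeric, logical(1))
  variables_to_plot <- setdiff(names(data)[is_num], c(class, variables_to_exclude))

  # Each call of the anonymous function has its own v, and .data[[v]]
  # looks the column up by name when the plot is built
  plots <- lapply(variables_to_plot, function(v) {
    ggplot(data, aes(x = .data[[v]], colour = .data[[class]], fill = .data[[class]])) +
      geom_histogram(aes(y = after_stat(density)), alpha = 0.3,
                     position = "identity", bins = 100) +
      geom_density(alpha = 0.2) +
      labs(x = v, fill = class) +
      guides(colour = "none")
  })
  names(plots) <- variables_to_plot
  plots
}

mtcars2 <- mtcars
mtcars2$am <- factor(mtcars2$am)
plots <- hist_of_columns(mtcars2, class = "am", variables_to_exclude = "mpg")
```

plots[["hp"]] and plots[["cyl"]] now show different columns instead of repeating the last one.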
Here is a strategy using tidy evaluation that does what you are after:
```r
library(rlang)
library(tidyverse)

hist_of_cols <- function(data, class, drop_vars) {
  # tidyeval overhead
  class_enq <- enquo(class)
  drop_enqs <- enquo(drop_vars)

  data %>%
    group_by(!!class_enq) %>%                  # keep the 'class' column always
    select(-!!drop_enqs) %>%                   # drop any 'drop_vars'
    select_if(is.numeric) %>%                  # keep only numeric columns
    gather("key", "value", -!!class_enq) %>%   # go to long form
    split(.$key) %>%                           # make a list of data frames
    map(~ ggplot(., aes(value, fill = !!class_enq)) +  # plot as usual
          geom_histogram() +
          geom_density(alpha = .5) +
          labs(x = unique(.$key)))
}

hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))
```
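With dplyr 1.0 and later, the same function can be sketched with the embrace operator {{ }}, across(), where() and pivot_longer() in place of select_if() and gather(). This version assumes the class column should be treated as a factor (so a numeric 0/1 column such as mtcars$am gets a discrete fill), and it fixes bins = 30 only to silence the default bin-width message; both are choices of this sketch, not of the answer above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

hist_of_cols <- function(data, class, drop_vars) {
  class_quo <- rlang::enquo(class)  # captured once, reused in aes() below

  long <- data %>%
    select(-{{ drop_vars }}) %>%             # drop any 'drop_vars'
    mutate(across({{ class }}, factor)) %>%  # class becomes a factor...
    pivot_longer(where(is.numeric),          # ...so it is not pivoted
                 names_to = "key", values_to = "value")

  # One data frame per original column, one plot per data frame
  lapply(split(long, long$key), function(d) {
    ggplot(d, aes(value, fill = !!class_quo)) +
      geom_histogram(bins = 30, alpha = 0.5, position = "identity") +
      labs(x = unique(d$key))
  })
}

plots <- hist_of_cols(mtcars, am, mpg)
plots2 <- hist_of_cols(mtcars, am, c(mpg, wt))
```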
The following code plots the predicted probability of several models against time. Having all the lines on one graph was not readable, so I split the result into a grid.
I was wondering whether it is possible to build only one ggplot with all the models and then somehow specify which goes where with grid.arrange.
Current:
```r
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm1, predp.glm2, predp.glm3, predp.glm4)
mdf1 <- melt(p2.dat1, id.vars = "EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
                        aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line()

p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5, predp.glm.step)
mdf2 <- melt(p2.dat2, id.vars = "EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
                        aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line()

grid.arrange(plm.plot.all1, plm.plot.all2, nrow = 2)
```
Expected:
```r
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm1, predp.glm2, predp.glm3, predp.glm4, predp.glm5, predp.glm.step)
mdf <- melt(p2.dat, id.vars = "EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
                       aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow = 2)
```
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
```r
library(tidyverse)
library(gridExtra)

# Helper Functions ----
plot_function <- function(x) {
  ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
    geom_line() +
    labs(title = unique(x$variable)) +
    theme(legend.position = "none")
}

grid_plot <- function(x, selection) {
  order <- c(names(x)[grepl(selection, names(x))], names(x)[!grepl(selection, names(x))])
  grid.arrange(grobs = x[order], nrow = 2)
}

# Actually make the plot ----
ppf %>%
  select(EXPOSURE, predp.glm.gen, predp.glm1, predp.glm2, predp.glm3, predp.glm4, predp.glm5, predp.glm.step) %>%
  gather(variable, value, -EXPOSURE) %>%
  split(.$variable) %>%
  map(plot_function) %>%
  grid_plot("predp.glm3")
```
Or you could do this entirely in ggplot with facet_wrap, converting the variable column to a factor in the desired order. This has the benefit of shared axes across the plots, which makes comparison easy. You could alter the helper functions in the first approach to set the axes explicitly and achieve the same effect, but it's just easier to keep it in ggplot.
```r
library(tidyverse)

selection <- "predp.glm3"

plot_data <- ppf %>%
  select(EXPOSURE, predp.glm.gen, predp.glm1, predp.glm2, predp.glm3, predp.glm4, predp.glm5, predp.glm.step) %>%
  gather(variable, value, -EXPOSURE) %>%
  # move the selected model to the first facet; the other levels keep their order
  mutate(variable = fct_relevel(variable, selection))

ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~variable, nrow = 2) +
  theme(legend.position = "none")
```
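The layout from the question's "Expected" section, two panels with predp.glm.gen repeated in both, can also be built as a single ggplot: stack the data, add a column saying which panel each model belongs to (duplicating the rows of models that appear in more than one panel), and facet on that column. A sketch under these assumptions: ppf below is a hypothetical stand-in with made-up curves, and the panel names and grouping are illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for ppf: EXPOSURE plus one predicted-probability column per model
models <- c("predp.glm.gen", "predp.glm1", "predp.glm2", "predp.glm3",
            "predp.glm4", "predp.glm5", "predp.glm.step")
ppf <- data.frame(EXPOSURE = seq(0, 10, length.out = 50))
for (k in seq_along(models)) {
  ppf[[models[k]]] <- plogis(ppf$EXPOSURE - 5 + k / 4)
}

# Which models go in which panel; predp.glm.gen is shown in both
groups <- list(
  "Models 1-4"       = c("predp.glm.gen", "predp.glm1", "predp.glm2", "predp.glm3", "predp.glm4"),
  "Model 5 and step" = c("predp.glm.gen", "predp.glm5", "predp.glm.step")
)

# Long format with one row per EXPOSURE x model x panel
long <- do.call(rbind, lapply(names(groups), function(g) {
  vars <- groups[[g]]
  data.frame(
    EXPOSURE = rep(ppf$EXPOSURE, times = length(vars)),
    variable = rep(vars, each = nrow(ppf)),
    value    = unlist(ppf[vars], use.names = FALSE),
    panel    = g
  )
}))
long$panel <- factor(long$panel, levels = names(groups))  # keep the panel order

p <- ggplot(long, aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~ panel, nrow = 2)
p
```

Since both panels come from one plot, they share axes and a single legend, which is what grid.arrange() could not provide without extra work.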