I'm looking for help merging two plots while respecting their respective scale sizes.
Here is a reproducible example:
library(ggplot2)
library(gridExtra)
data1 <- subset(mtcars, cyl = 4)
data1$mpg <- data1$mpg * 5.6
data2 <- subset(mtcars, cyl = 8)
p1 <- ggplot(data1, aes(wt, mpg, colour = cyl)) + geom_point()
p2 <- ggplot(data2, aes(wt, mpg, colour = cyl)) + geom_point()
grid.arrange(p1, p2, ncol = 2)
What I'm looking for instead is a single merged figure in which the two plots keep their respective scale sizes.
Ideally I'd avoid a package that requires me to define the ratio, since it's hard to know how much to shrink the second plot relative to the first, and even harder with more than two plots.
I think what you are trying to achieve is something like this:
library(tidyverse)
mtcars %>%
filter(cyl %in% c(4, 8)) %>%
mutate(mpg = ifelse(cyl == 4, mpg * 5.6, mpg)) %>%
ggplot(aes(x = wt, y = mpg, col = as.factor(cyl))) +
geom_point(show.legend = FALSE) +
facet_wrap(~ cyl)
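The facet_wrap() call above uses a single fixed y scale shared by both panels. If the goal is instead for each panel to keep its own y range, with panel heights roughly proportional to those ranges (one possible reading of "respect the scale size"), facet_grid() can do that without you working out a ratio by hand. A minimal sketch, assuming the same transformed mtcars data as above:

```r
library(ggplot2)
library(dplyr)

# Same data as above: keep 4- and 8-cylinder cars, rescale mpg for cyl == 4 only
dat <- mtcars %>%
  filter(cyl %in% c(4, 8)) %>%
  mutate(mpg = ifelse(cyl == 4, mpg * 5.6, mpg))

# scales = "free_y" gives each row its own y range;
# space = "free_y" sizes each row's height according to that range
p <- ggplot(dat, aes(x = wt, y = mpg, colour = as.factor(cyl))) +
  geom_point(show.legend = FALSE) +
  facet_grid(cyl ~ ., scales = "free_y", space = "free_y")
p
```

Because the heights come from each panel's own range, the same call should work unchanged with more than two groups.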
NOTE: I see some bugs in your original code. For example, if you want to use subset() to subset your data, you have to change your code from:
data1 <- subset(mtcars, cyl = 4)
to:
data1 <- subset(mtcars, cyl == 4)
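subset(mtcars, cyl = 4) does not filter anything: cyl = 4 is a named argument rather than a logical condition, so it is silently ignored and all 32 rows of mtcars are returned.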
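A quick base-R check (no packages needed) makes the difference visible:

```r
# `cyl = 4` is a named argument, not a condition: it lands in `...` and is ignored
all_rows <- subset(mtcars, cyl = 4)
# `cyl == 4` is a logical condition, so the rows are actually filtered
four_cyl <- subset(mtcars, cyl == 4)

nrow(all_rows)  # 32
nrow(four_cyl)  # 11
```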
Related
I would like to create multiple plots using a for loop, but my code does not work. Could anyone give me some guidance on this?
for i in 1:4 {
paste0("p_carb_",i) <- ggplot(mtcars%>% filter(carb==4), aes(x = wt, y = mpg, color = disp))
+ geom_point()
}
Perhaps this?
library(ggplot2)
library(dplyr)
ggs <- lapply(sort(unique(mtcars$carb)), function(crb) {
ggplot(filter(mtcars, carb == crb), aes(x = wt, y = mpg, color = disp)) +
geom_point()
})
This produces six plots, one per value of carb (1, 2, 3, 4, 6 and 8); view them one at a time with ggs[[1]], ggs[[2]], and so on.
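To see several of these plots side by side, the list can be passed straight to gridExtra. A sketch, assuming gridExtra is installed; the carb_* names and the titles are illustrative additions:

```r
library(ggplot2)
library(dplyr)
library(gridExtra)

carbs <- sort(unique(mtcars$carb))

# One plot per carb value, titled so the panels can be told apart
ggs <- lapply(carbs, function(crb) {
  ggplot(filter(mtcars, carb == crb), aes(x = wt, y = mpg, color = disp)) +
    geom_point() +
    ggtitle(paste("carb =", crb))
})
names(ggs) <- paste0("carb_", carbs)

# Draw all six plots in a 2 x 3 grid on the current device
grid.arrange(grobs = ggs, ncol = 3)
```

Naming the list also lets you pull out a single plot with, for example, ggs[["carb_4"]].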
An alternative might be to facet the data, as in
ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
facet_wrap(~ carb) +
geom_point()
For a literal, syntactically correct translation of your paste(..) <- ... code, we'd have to use an anti-pattern in R, assign():
for (crb in sort(unique(mtcars$carb))) {
gg <- ggplot(filter(mtcars, carb == crb), aes(x = wt, y = mpg, color = disp)) +
geom_point()
assign(paste0("carb_", crb), gg)
}
Again, this is not the preferred/best-practices way of doing things. It is generally considered much better to keep like-things in a list for uniform/consistent processing of them.
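If you want to keep the for loop but avoid assign(), the loop can fill a named list instead. A minimal sketch; the carb_* names are only illustrative:

```r
library(ggplot2)
library(dplyr)

carbs <- sort(unique(mtcars$carb))

# Pre-allocate a named list instead of creating carb_1, carb_2, ... with assign()
plots <- vector("list", length(carbs))
names(plots) <- paste0("carb_", carbs)

for (crb in carbs) {
  # The data is filtered before ggplot() is called, so each plot stores its own subset
  plots[[paste0("carb_", crb)]] <-
    ggplot(filter(mtcars, carb == crb), aes(x = wt, y = mpg, color = disp)) +
    geom_point()
}

# Every plot can now be handled uniformly, e.g.:
# for (p in plots) print(p)
```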
For multiple IDs (for example carb and gear), there are two ways:
Nested lapply:
carbs <- sort(unique(mtcars$carb))
ggs <- lapply(carbs, function(crb) {
# unique() so each carb/gear combination gets exactly one plot
gears <- sort(unique(subset(mtcars, carb == crb)$gear))
lapply(gears, function(gr) {
ggplot(dplyr::filter(mtcars, carb == crb, gear == gr), aes(x = wt, y = mpg, color = disp)) +
geom_point()
})
})
Here ggs is a list of lists: ggs[[1]] holds the plots for the first carb value, and ggs[[1]][[1]] is a single plot.
split(), one level deep:
carbsgears <- split(mtcars, mtcars[,c("carb", "gear")], drop = TRUE)
ggs <- lapply(carbsgears, function(dat) {
ggplot(dat, aes(x = wt, y = mpg, color = disp)) + geom_point()
})
Here, ggs is only one level deep. Its names are the values of the two fields joined with a dot: mtcars$carb takes the values c(1,2,3,4,6,8) and mtcars$gear takes c(3,4,5), so after dropping combinations without data the names are:
names(ggs)
# [1] "1.3" "2.3" "3.3" "4.3" "1.4" "2.4" "4.4" "2.5" "4.5" "6.5" "8.5"
where "1.3" is carb == 1 and gear == 3. If the values themselves contain dots, these names can become ambiguous.
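split() accepts a sep argument (passed on from split.data.frame() to split.default()), so choosing a separator that cannot occur in the values avoids that ambiguity. A base-R sketch:

```r
# Join the carb and gear values with "_" instead of the default "."
carbsgears <- split(mtcars, mtcars[, c("carb", "gear")], drop = TRUE, sep = "_")

names(carbsgears)
# [1] "1_3" "2_3" "3_3" "4_3" "1_4" "2_4" "4_4" "2_5" "4_5" "6_5" "8_5"

# Look up a single combination by name: carb == 4 and gear == 5
carbsgears[["4_5"]]
```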
I have a figure with 16 regression lines and I need to be able to identify them. A color gradient, symbols, or different line types don't really help.
My idea, therefore, is to just (haha) annotate every line.
So I built a dataset (hpAnnotatedLines) with each line's maximum x value, which is where the text should start. However, I have no idea how to automatically extract the y values of the fitted regression lines at those maximum x values, which differ for each line.
Here is a smaller example using mtcars:
library(ggplot2)
library(dplyr)
library(ggrepel)
#just select the data I need
mtcars1 <- select(mtcars, disp,cyl,hp)
mtcars1$cyl <- as.factor(mtcars1$cyl)
#extract max values
mtcars2 <- mtcars1 %>%
group_by(cyl) %>%
summarise(Max.disp= max(disp))
#build dataset for the annotation layer
#note that hp was done by hand. Here I need help
hpAnnotatedLines <- data.frame(cyl=levels(mtcars2$cyl),
disp=mtcars2$Max.disp,
hp=c(90,100,210))
#example plot
ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm)+
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50)) +
geom_text_repel(
data = hpAnnotatedLines,
aes(label = cyl),
size = 3,
nudge_x = 1)
Instead of extracting the fitted values you could add the labels via geom_text by switching the stat to smooth and setting the label aesthetic via after_stat such that only the last point of each regression line gets labelled:
library(ggplot2)
library(dplyr)
myfun <- function(x, color) {
data.frame(x = x, color = color) %>%
group_by(color) %>%
mutate(label = ifelse(x %in% max(x), as.character(color), "")) %>%
pull(label)
}
ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm) +
geom_text(aes(label = after_stat(myfun(x, color))),
stat = "smooth", method = "lm", hjust = 0, size = 3, nudge_x = 1, show.legend = FALSE) +
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))
It's a bit of a hack, but you can extract the data from the built plot object. For example, first make the plot without the labels:
myplot <- ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm)+
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))
Then use ggplot_build() to get the data from the second layer (the geom_smooth layer) and translate it back into the column names used by your data. Here we find the largest x value per group and take the corresponding y value.
pobj <- ggplot_build(myplot)
hpAnnotatedLines <- pobj$data[[2]] %>% group_by(group) %>%
top_n(1, x) %>%
# mtcars$cyl is numeric, so convert it to a factor before taking its levels
transmute(disp = x, hp = y, cyl = levels(factor(mtcars$cyl))[group])
Then add an additional layer to your plot
myplot +
geom_text_repel(
data = hpAnnotatedLines,
aes(label = cyl),
size = 3,
nudge_x = 1)
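ggplot2 also has a layer_data() helper that returns the built data for a single layer, which shortens the ggplot_build() step. A sketch, assuming dplyr 1.0.0 or later for slice_max():

```r
library(ggplot2)
library(dplyr)

myplot <- ggplot(mtcars, aes(x = disp, y = hp, color = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data(plot, i) returns the built data for layer i; layer 2 is geom_smooth
ends <- layer_data(myplot, 2) %>%
  group_by(group) %>%
  slice_max(x, n = 1) %>%
  ungroup() %>%
  arrange(group) %>%
  # group numbers follow the level order of factor(cyl)
  transmute(disp = x, hp = y, cyl = levels(factor(mtcars$cyl))[group])
ends
```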
If your data isn't too large, you can extract the predictions with augment() from broom and keep the row with the largest disp in each group:
library(broom)
library(dplyr)
library(ggplot2)
library(ggrepel)
hpAnn = mtcars %>% group_by(cyl) %>%
do(augment(lm(hp ~ disp,data=.))) %>%
top_n(1,disp) %>%
select(cyl,disp,.fitted) %>%
rename(hp = .fitted)
hpAnn
#> # A tibble: 3 x 3
#> # Groups:   cyl [3]
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     4  147.  96.7
#> 2     6  258   99.9
#> 3     8  472  220.
Then plot:
ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm)+
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))+
geom_text_repel(
data = hpAnn,
aes(label = cyl),
size = 3,
nudge_x = 1)
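The same end points can also be computed with base R's split(), lm() and predict(), without broom or do(). A sketch; geom_text() is used here instead of geom_text_repel() only to avoid the ggrepel dependency:

```r
library(ggplot2)

# Fit hp ~ disp separately for each cylinder group and predict hp at that
# group's largest disp, i.e. where each fitted line ends
ends <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
  fit <- lm(hp ~ disp, data = d)
  x_end <- max(d$disp)
  data.frame(cyl = d$cyl[1],
             disp = x_end,
             hp = unname(predict(fit, newdata = data.frame(disp = x_end))))
}))

ggplot(mtcars, aes(x = disp, y = hp, color = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_text(data = ends, aes(label = cyl), hjust = 0, nudge_x = 5, size = 3,
            show.legend = FALSE) +
  coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))
```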
I would like to take a faceted histogram and add text on each plot indicating the total number of observations in that facet. For example, for carb = 1 the total count would be 7, for carb = 2 it would be 10, and so on.
p <- ggplot(mtcars, aes(x = mpg, stat = "count",fill=as.factor(carb))) + geom_histogram(bins = 8)
p <- p + facet_grid(as.factor(carb) ~ .)
p
I can get the counts with table(), but how can I do this quickly for more complex faceting?
You can try this. It may not be optimal, because you have to define the x and y positions of the labels yourself (x in Labels, and y = 3 in geom_text()), but it should help:
library(tidyverse)
# One label per facet level, holding its observation count
Labels <- mtcars %>% group_by(carb) %>% summarise(N = paste0('Number is: ', n()))
# x position of the labels
Labels$mpg <- 25
# Plot; the y position of the labels is fixed at 3
ggplot(mtcars, aes(x = mpg, fill = as.factor(carb))) +
  geom_histogram(bins = 8) +
  geom_text(data = Labels, aes(x = mpg, y = 3, label = N)) +
  facet_grid(as.factor(carb) ~ .)
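To avoid hard-coding the label position, one common trick is to place the text at x = Inf, y = Inf and pull it back inside the panel with hjust/vjust. A sketch, assuming dplyr's count() for the per-facet totals:

```r
library(ggplot2)
library(dplyr)

# One row per facet level: the carb value and its number of observations
counts <- mtcars %>%
  count(carb) %>%
  mutate(label = paste0("n = ", n))

ggplot(mtcars, aes(x = mpg, fill = as.factor(carb))) +
  geom_histogram(bins = 8) +
  # x = Inf / y = Inf anchor the text at the top-right corner of each panel;
  # hjust/vjust move it back inside, so no data-dependent position is needed
  geom_text(data = counts, aes(x = Inf, y = Inf, label = label),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE) +
  facet_grid(as.factor(carb) ~ .)
```

For more complex faceting, the same idea should extend by counting every faceting variable, e.g. count(carb, gear) together with facet_grid(carb ~ gear).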
I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure which variable ggplot associates with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate has the point layer in the left panel and the smooth layer in the right panel.
A hacky way to do this is to make as many copies of the existing data frame as you need panels, each tagged with a value that is later used for both faceting and filtering. Combine the copies into one data frame with rbind() (a union), set up the ggplot and geoms, and filter each geom's data down to the tag it should appear under. Finally, facet on that same tag column to split the plots.
This can be seen below:
library(ggplot2)
library(dplyr)
# One copy of mpg per panel, tagged with the panel it belongs to
df1 <- data.frame(
  graph = "point_plot",
  mpg
)
df2 <- data.frame(
  graph = "spline_plot",
  mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(data = filter(df, graph == "point_plot")) +
  geom_smooth(data = filter(df, graph == "spline_plot"), se = FALSE) +
  facet_grid(. ~ graph)
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
library(ggplot2)
library(dplyr)
mpg2 <- mpg %>% mutate(facet = 1) %>%
  bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
  geom_point(data = subset(mpg2, facet == 1)) +
  geom_smooth(data = subset(mpg2, facet == 2)) +
  facet_wrap(~facet)
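The copy-and-tag step can be shortened with the .id argument of dplyr's bind_rows(), which fills a column with the argument names, and each layer can filter the plot data itself by passing a formula as its data (assuming a ggplot2 version that accepts a formula as layer data). A sketch; the panel names "points" and "smooth" are arbitrary:

```r
library(ggplot2)
library(dplyr)

# Stack two labelled copies of mpg; .id fills `panel` with the argument names
stacked <- bind_rows(points = mpg, smooth = mpg, .id = "panel")

ggplot(stacked, aes(displ, cyl, color = drv)) +
  # Layer data given as a formula: each layer keeps only its own copy
  geom_point(data = ~ filter(.x, panel == "points")) +
  geom_smooth(data = ~ filter(.x, panel == "smooth"),
              method = "loess", formula = y ~ x) +
  facet_wrap(~ panel)
```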
I have data that looks like the example in the facet_wrap documentation (image source: ggplot2.org).
I would like to fill the last facet with the overall view, using all data.
Is there an easy way to add a 'total' facet with facet_wrap? It's easy to add margins to facet_grid, but that option does not exist in facet_wrap.
Note: using facet_grid is not an option if you want a quadrant layout like the one in the documentation example, which requires the ncol or nrow arguments from facet_wrap.
library(ggplot2)
# qplot() is deprecated since ggplot2 3.4.0; this is the equivalent ggplot() call
p <- ggplot(transform(mpg, cyl = as.character(cyl)), aes(displ, hwy)) + geom_point()
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
geom_point(data = transform(mpg, cyl = "all"), colour = "blue") +
facet_wrap(~ cyl)
I prefer a slightly different approach: the data is duplicated before the plot is created, with the copy labelled as the "all" data. I wrote the following CreateAllFacet function to simplify the process. It returns a new data frame with the duplicated data and an additional column, facet.
library(ggplot2)
#' Duplicates data to create an additional "all" facet
#' @param df a data frame
#' @param col the name of the facet column
#' @return a copy of df labelled "all" stacked on top of df, with a new column `facet`
CreateAllFacet <- function(df, col){
  df$facet <- df[[col]]
  temp <- df
  temp$facet <- "all"
  merged <- rbind(temp, df)
  # make the original column a factor so it maps to a discrete colour scale
  merged[[col]] <- as.factor(merged[[col]])
  return(merged)
}
The benefit of adding a new facet column, rather than overwriting cyl, is that cyl can still be mapped to colour in the aesthetics:
df <- CreateAllFacet(mpg, "cyl")
ggplot(data=df, aes(x=displ,y=hwy)) +
geom_point(aes(color=cyl)) +
facet_wrap(~ facet) +
theme(legend.position = "none")
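CreateAllFacet relies on "all" sorting after the other values. If that cannot be guaranteed, the facet levels can be set explicitly so the "all" panel is always last. A sketch with a hypothetical helper add_all_facet() (not from any package):

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: stack df with a copy of itself labelled "all",
# forcing "all" to be the last facet level
add_all_facet <- function(df, col) {
  df$facet <- as.character(df[[col]])
  all_copy <- df
  all_copy$facet <- "all"
  out <- bind_rows(df, all_copy)
  # sort the real values in their own type (numeric here), then append "all"
  out$facet <- factor(out$facet,
                      levels = c(as.character(sort(unique(df[[col]]))), "all"))
  out
}

mpg_all <- add_all_facet(mpg, "cyl")

ggplot(mpg_all, aes(displ, hwy, colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ facet, ncol = 3) +
  theme(legend.position = "none")
```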
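You can try the "margins" option, but note that it belongs to facet_grid(); facet_wrap() has no margins argument, so facet_wrap(~ cyl, margins = TRUE) fails with an "unused argument" error. With facet_grid() it adds an "(all)" panel, although the panels are then laid out in a single row rather than wrapped:
library(ggplot2)
p <- ggplot(transform(mpg, cyl = as.character(cyl)), aes(displ, hwy)) + geom_point()
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
  facet_grid(. ~ cyl, margins = TRUE)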