I am producing a scatterplot using ggplot and colouring the data points by a given factor. The legend that is produced shows the colour assigned to each level of the factor, but is it possible for it to also show the number of points in each level?
For example, here is the code for the mtcars data set:
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_point(aes(colour = factor(cyl)))
In this plot, I would like the legend to show the count for each number of cylinders, e.g. 4 (Count 1), 6 (Count 2) and 8 (Count 3).
Thanks in advance.
You can try something like this:
library(dplyr)
library(ggplot2)
mtcars %>%
group_by(cyl) %>%
mutate(label = paste0(cyl, ' (Count ', n(), ')')) %>%
ggplot(aes(wt, mpg)) +
geom_point(aes(colour = factor(label)))
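An alternative that leaves the colour variable untouched and only relabels the legend is to compute the counts first and pass them to scale_colour_discrete(labels = ...). This is a minimal sketch; it relies on table() sorting the cyl values in the same order that factor(cyl) uses for its levels, which holds for numeric values like these.

```r
library(ggplot2)

# Count the points for each cylinder level; table() sorts the levels
# the same way factor(cyl) does, so the labels line up with the colours.
counts <- table(mtcars$cyl)
legend_labels <- paste0(names(counts), " (Count ", counts, ")")
# legend_labels is c("4 (Count 11)", "6 (Count 7)", "8 (Count 14)")

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  scale_colour_discrete(labels = legend_labels) +  # only the legend text changes
  labs(colour = "cyl")
```

Because the mapping is still factor(cyl), the colours stay tied to the cylinder values and only the legend text carries the counts.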
I want to create a large facet_wrap with ggplot2.
I want an automatic way to add further rows of facets so that each facet keeps the size it would have by default if (with the same data) only the first row of four facets were plotted.
In particular, I am concerned about the plot height. I know how to change the plot height manually, but I need an automatic way.
Some example data:
library(ggplot2)
library(dplyr)
mpg <- mpg %>%
distinct(model, year, .keep_all = TRUE)
ggplot(mpg, aes(x=year, y=hwy))+
geom_point()+
facet_wrap(~model)
All facets should have the same height as in this plot:
mpg %>%
distinct(model, year, .keep_all = TRUE) %>%
filter(model %in% c("4runner 4wd", "a4", "a4 quattro", "altima")) %>%
ggplot(aes(x=year, y=hwy))+
geom_point()+
facet_wrap(~model, ncol = 4)
You can keep the panel proportions fixed by using theme(aspect.ratio), which sets the ratio of each panel's height to its width. For example:
library(ggplot2)
library(dplyr)
mpg <- mpg %>%
distinct(model, year, .keep_all = TRUE)
ggplot(mpg, aes(x=year, y=hwy))+
geom_point()+
facet_wrap(~model) + theme(aspect.ratio=2)
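If the goal is for every facet to keep the size it has in the four-column reference plot, another option is to compute how many rows of facets there will be and scale the saved figure's height accordingly. A minimal sketch, assuming the plot is written out with ggsave() and that 2.5 inches per row (a hypothetical value) matches the reference plot; it ignores the small fixed space taken by axis titles.

```r
library(ggplot2)
library(dplyr)

mpg_distinct <- mpg %>%
  distinct(model, year, .keep_all = TRUE)

n_col <- 4                                  # columns, as in the reference plot
n_facets <- n_distinct(mpg_distinct$model)  # facet_wrap draws one panel per model
n_row <- ceiling(n_facets / n_col)          # rows needed to fit all panels

row_height <- 2.5  # inches per row of facets (hypothetical; match your reference plot)

p <- ggplot(mpg_distinct, aes(x = year, y = hwy)) +
  geom_point() +
  facet_wrap(~model, ncol = n_col)

# The height grows with the number of rows, so each panel keeps roughly the same size
out_file <- file.path(tempdir(), "facets.pdf")
ggsave(out_file, p, width = 10, height = row_height * n_row, units = "in")
```

Fixing ncol is what makes the row count predictable; with the default layout facet_wrap chooses the number of columns itself.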
I’m totally new to ggplot and relatively fresh with R, and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
show each individual observation separately (currently, identical values overlap)
connect the related before and after measures (x=0 and x=1) with lines to more clearly illustrate the direction of change
distinguish the data by class and id using shapes and colors
What is the best way to create a scatterplot with ggplot (or another tool) that meets the requirements above?
Main alternative: geom_point()
Here is some sample data and example code using geom_point():
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I am not able to distinguish the subgroups by colors and shapes because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to separate the overlapping observations that arise from using geom_point(), as in the example below. However, I have not yet worked out how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
library(gridExtra) # provides grid.arrange()
grid.arrange(p1, p2, nrow = 1)
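To put both time points in a single dot plot, one option is a single ggplot call with x as a discrete variable: with binaxis = "y", geom_dotplot() draws a separate centred stack at each x position, so grid.arrange() is no longer needed. A minimal sketch using the sample data above; rows with missing y are dropped first, and the binwidth is carried over from the question.

```r
library(ggplot2)

# Sample data from the question
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95)
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1)
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6)
df <- data.frame(x, y, class, id)

df_complete <- df[!is.na(df$y), ]  # drop missing measurements
df_complete$time <- factor(df_complete$x, labels = c("before", "after"))

# A discrete x gives one centred stack per time point in the same panel
p <- ggplot(df_complete, aes(x = time, y = y)) +
  geom_dotplot(binaxis = "y", stackdir = "center", stackratio = 2,
               binwidth = 1/0.3)
```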
Or maybe it is better to summarize the data by x, id and class as the mean/median of y, filter out ids that produce NAs (here ids 3 and 6), and connect the points with lines? If you don't need to show the variability within each id (which could be fine if the plot only illustrates tendencies), you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("multiple days", "one day")) # 0 = multiple days, 1 = one day
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = FALSE) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
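If some of the summarised points still sit close to or on top of each other, one option (not part of the answer above) is to apply the same position_dodge() to both the points and the lines, which shifts each id sideways by a fixed amount so the lines still meet their points. A minimal sketch that rebuilds the summarised data; the dodge width of 0.2 is an arbitrary choice.

```r
library(ggplot2)
library(dplyr)

# Sample data from the question
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95)
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1)
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6)

df <- data.frame(x, y, class, id) %>%
  group_by(x, id, class) %>%
  summarize(y = median(y, na.rm = TRUE), .groups = "drop") %>%
  mutate(
    id = factor(id),
    x = factor(x, labels = c("before", "after")),
    class = factor(class, labels = c("multiple days", "one day"))
  ) %>%
  group_by(id) %>%
  filter(!any(is.na(y))) %>%  # keep only ids with both measurements
  ungroup()

pd <- position_dodge(width = 0.2)  # the same shift for points and lines

p <- ggplot(df, aes(x = x, y = y, col = id, group = id)) +
  geom_point(aes(shape = class), position = pd) +
  geom_line(position = pd, show.legend = FALSE) +
  labs(x = NULL, y = "Feelings of peace, %")
```

Using one shared position object is the design point: if the points and lines were shifted differently, the lines would no longer end at their points.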
Here's one possible solution for you.
First, to have colour and shape determined by variables, you need to map them inside the aes function. I turned several variables into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To deal with several points per id, one solution is to use geom_smooth with method = "lm". This plots the regression line for each id instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
To connect the dots instead, use geom_line; feel free to try that as well.
Within geom_point, the option position = position_jitter(width = .1, height = 0) adds random noise along the x-axis only, so points do not overlap while the y values stay exact.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1, height = 0)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)
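A variation that keeps every observation visible while still connecting before and after is to jitter only the raw points horizontally and let stat_summary() draw a line through each id's median. A minimal sketch, assuming ggplot2 3.3.0 or later (for the fun argument of stat_summary()); the seed only makes the jitter reproducible. Rows with missing y are dropped first, so ids with a single remaining time point (ids 3 and 6) get a point but no line.

```r
library(ggplot2)

# Sample data from the question
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95)
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1)
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6)
df <- data.frame(x, y, class, id)
df <- df[!is.na(df$y), ]  # drop missing measurements

p <- ggplot(df, aes(x = factor(x, labels = c("before", "after")), y = y,
                    colour = factor(id), group = id)) +
  # jitter horizontally only (height = 0), so the percentages stay exact
  geom_point(aes(shape = factor(class)),
             position = position_jitter(width = 0.1, height = 0, seed = 1)) +
  # one line per id through its median at each time point
  stat_summary(fun = median, geom = "line") +
  labs(x = NULL, colour = "ID", shape = "Class")
```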
I would like to rearrange the facet_wrap panels in a more useful order.
library(ggplot2)
set.seed(123)
freq <- sample(1:10, 20, replace = TRUE)
labels <- sample(LETTERS, 20)
value <- paste("i",1:13,sep='')
lab <- rep(unlist(lapply(1:length(freq), function(x) rep(labels[x],freq[x]))),2)
ival <- rep(unlist(lapply(1:length(freq), function(x) value[1:freq[x]])),2)
n <- sum(freq) # number of rows per type, derived from freq
df <- data.frame(lab, ival, type=c(rep('Type1',n),rep('Type2',n)),val=runif(2*n,0,1))
ggplot(df, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
It results in the below plot:
Is there any way to rearrange the plots based on their frequency? Some of the lab frequencies (i.e. the number of points per type) are very low (1-3). I would like facet_wrap to arrange the panels by their frequencies instead of by label order. One advantage would be to reduce the plotting area and get a better intuition from the plots.
Can this be done by computing the frequency values on the fly and passing them to facet_wrap? Or should it be done separately with dplyr, dividing the data into low-, medium- and high-frequency sets of plots?
Here is one idea. We can use dplyr to count the rows in each lab group and fct_reorder from forcats to reorder the factor levels by that count.
library(dplyr)
library(forcats)
df2 <- df %>%
group_by(lab) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(lab = fct_reorder(lab, N))
ggplot(df2, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
Set .desc = TRUE when using fct_reorder if you want to reverse the factor levels.
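If you would rather not add dplyr and forcats, the same ordering can be done in base R by setting the factor levels from table(). A minimal sketch that rebuilds the question's data; it puts the most frequent lab first, whereas the fct_reorder() call above (without .desc = TRUE) puts the least frequent first.

```r
library(ggplot2)

# Data from the question, with the row counts derived from freq
set.seed(123)
freq <- sample(1:10, 20, replace = TRUE)
labels <- sample(LETTERS, 20)
value <- paste0("i", 1:13)
lab <- rep(unlist(lapply(seq_along(freq), function(x) rep(labels[x], freq[x]))), 2)
ival <- rep(unlist(lapply(seq_along(freq), function(x) value[1:freq[x]])), 2)
n <- sum(freq)
df <- data.frame(lab, ival,
                 type = rep(c("Type1", "Type2"), each = n),
                 val = runif(2 * n, 0, 1))

# Order the facet levels by how many rows each lab has, most frequent first
lab_counts <- table(df$lab)
df$lab <- factor(df$lab, levels = names(sort(lab_counts, decreasing = TRUE)))

p <- ggplot(df, aes(x = ival, y = val, col = type, group = type)) +
  geom_line() +
  geom_point() +
  facet_wrap(~lab, ncol = 3) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.3)) +
  scale_x_discrete(limits = paste0("i", 1:13))
```

facet_wrap lays out panels in the order of the factor levels, so reordering the levels is all that is needed; no change to the facet_wrap call itself.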
I have data that looks like this example in the facet_wrap documentation:
(source: ggplot2.org)
I would like to fill the last facet with the overall view, using all data.
Is there an easy way to add a 'total' facet with facet_wrap? It's easy to add margins to facet_grid, but that option does not exist in facet_wrap.
Note: using facet_grid is not an option if you want a quadrant as in the plot above, which requires the ncol or nrow arguments from facet_wrap.
library(ggplot2)
p <- ggplot(transform(mpg, cyl = as.character(cyl)), aes(displ, hwy)) + geom_point()
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
geom_point(data = transform(mpg, cyl = "all"), colour = "blue") +
facet_wrap(~ cyl)
I prefer a slightly different approach. Essentially, the data is duplicated before creating the plot, with the extra copy labelled "all". I wrote the following CreateAllFacet function to simplify the process. It returns a new data frame containing the duplicated data and an additional column, facet.
library(ggplot2)
#' Duplicates data to create an additional "all" facet
#' @param df a data frame
#' @param col the name of the facet column
#'
CreateAllFacet <- function(df, col){
df$facet <- df[[col]]
temp <- df
temp$facet <- "all"
merged <-rbind(temp, df)
# convert the original column to a factor so it maps to a discrete colour scale
merged[[col]] <- as.factor(merged[[col]])
return(merged)
}
The benefit of adding the new column facet to the original data is that the cyl variable can still be used to colour the points within the aesthetics:
df <- CreateAllFacet(mpg, "cyl")
ggplot(data=df, aes(x=displ,y=hwy)) +
geom_point(aes(color=cyl)) +
facet_wrap(~ facet) +
theme(legend.position = "none")
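The same duplication can be written with dplyr's bind_rows(). This is a minimal sketch, not the author's function: it also sets the facet levels explicitly so that "all" is always the last panel regardless of how the locale sorts strings, and ncol = 3 is an arbitrary layout choice.

```r
library(ggplot2)
library(dplyr)

mpg_facet <- mutate(mpg, facet = as.character(cyl))

# Stack the original rows on top of a copy labelled "all"
with_all <- bind_rows(mpg_facet, mutate(mpg_facet, facet = "all"))

# Explicit levels keep the "all" panel last
with_all$facet <- factor(with_all$facet,
                         levels = c(sort(unique(mpg_facet$facet)), "all"))

p <- ggplot(with_all, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  facet_wrap(~ facet, ncol = 3) +
  theme(legend.position = "none")
```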
You can try the "margins" option in facet_wrap as follows:
library(ggplot2)
p <- ggplot(transform(mpg, cyl = as.character(cyl)), aes(displ, hwy)) + geom_point()
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
facet_wrap(~ cyl, margins=TRUE)