Extract ggplot from a nested dataframe - r

I have created a set of ggplots using a grouped dataframe and the map function and I would like to extract the plots to be able to manipulate them individually.
library(tidyverse)
plot <- function(df, title){
df %>% ggplot(aes(class)) +
geom_bar() +
labs(title = title)
}
plots <- mpg %>% group_by(manufacturer) %>% nest() %>%
mutate(plots= map(.x=data, ~plot(.x, manufacturer)))
nissan <- plots %>% filter(manufacturer == "nissan") %>% pull(plots)
nissan
nissan + labs(title = "Nissan")
In this case, "nissan" is a list object and I am not able to manipulate it. How do I extract the ggplot?

In terms of data structures, I think retaining a tibble (or data.frame) is suboptimal for the usage shown. If you have one plot per manufacturer, and you plan to access them by manufacturer, then I would recommend using transmute() and then deframe() to get a named list.
That is, I would find it more conceptually clear here to do something like:
library(tidyverse)
plot <- function(df, title){
df %>% ggplot(aes(class)) +
geom_bar() +
labs(title = title)
}
plots <- mpg %>%
group_by(manufacturer) %>% nest() %>%
transmute(plot=map(.x=data, ~plot(.x, manufacturer))) %>%
deframe()
plots[['nissan']]
plots[['nissan']] + labs(title = "Nissan")
Otherwise, if you want to keep the tibble, another option, similar to what has been suggested in the comments, is to call first() after pull(), since pull() returns a one-element list rather than the plot itself.
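A minimal sketch of that second option, assuming the same mpg data. The helper is renamed make_plot here (a hypothetical name, chosen only to avoid masking base plot()), and map2() is used so that the manufacturer is passed explicitly as the title:

```r
library(tidyverse)

# make_plot is a hypothetical helper name, not from the original post
make_plot <- function(df, title) {
  ggplot(df, aes(class)) +
    geom_bar() +
    labs(title = title)
}

plots <- mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  mutate(plots = map2(data, manufacturer, make_plot))

# pull() yields a list of length one; first() extracts the ggplot inside it
nissan <- plots %>%
  filter(manufacturer == "nissan") %>%
  pull(plots) %>%
  first()

nissan + labs(title = "Nissan")
```

Indexing with [[1]] instead of first() should behave the same way here.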

Related

lapply on a list of dataframe to create plots

I have created a list of dataframes, df_list, each of which is an 18 x 4 dataframe.
The first column of each dataframe is the gene name repeated 18 times, and the remaining three columns hold the gene's information. Each dataframe describes a different gene.
Now I'd like to apply a boxplot function to every dataframe in the list (i.e., a list of genes and their respective information) to get one plot per gene; however, I am not sure how to deal with the ggtitle below:
Here is my simplified boxplot function:
box <- function(df){
df %>%
ggplot(df, aes(x = df[,4], y = df[,2])) +
geom_boxplot() +
ggtitle(g)
}
g is the gene name in each dataframe in df_list.
When I run lapply(df_list, box),
I get Error: Mapping should be created with aes() or aes_().
Does anyone know how to fix this? Thank you.
Because you pipe df into ggplot(), its first argument (data) is already filled, so the explicit df is interpreted as the mapping argument, which is what triggers the error. Drop the extra df:
box <- function(df){
df %>%
ggplot(aes(x = df[,4], y = df[,2])) +
geom_boxplot() +
ggtitle(g)
}
Instead of using df[,4] or df[,2], use column names in aes(). Assuming the column on the x-axis is named col1 and the one on the y-axis is named col2, try:
box <- function(df){
df %>%
ggplot(aes(x = col1, y = col2)) +
geom_boxplot() +
ggtitle(df$g[1])
}
lapply(df_list,box)
g is the column name in the dataframe, so we can take the first value from it in title.
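To make the approach above runnable end to end, here is a small sketch with toy data. The column names g, col1 and col2 and the gene names are assumptions made for illustration, not the asker's real data:

```r
library(ggplot2)

# Toy stand-in for df_list; column names g, col1, col2 are assumptions
set.seed(1)
df_list <- lapply(c("geneA", "geneB"), function(gene) {
  data.frame(
    g    = gene,
    col1 = rep(c("ctrl", "treated"), each = 9),
    col2 = rnorm(18)
  )
})

box <- function(df) {
  ggplot(df, aes(x = col1, y = col2)) +
    geom_boxplot() +
    ggtitle(df$g[1])  # every row holds the same gene name
}

gene_plots <- lapply(df_list, box)
gene_plots[[1]]
```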
This is how to do it with another example dataset (Species instead of genes):
library(tidyverse)
plots <-
iris %>%
pivot_longer(-Species) %>%
nest(data = -Species) %>%
mutate(
plt = data %>% map2(Species, ~ {
.x %>%
ggplot(aes(name, value)) +
geom_boxplot() +
labs(title = .y)
})
) %>%
pull(plt)
plots[[1]]
Created on 2021-09-09 by the reprex package (v2.0.1)

How to derive a relationship between the variables in R by using ggplot()

I have tried to explore the relationship between the variable "RainTomorrow" and the others with the code below, but it does not produce any output. How do I determine the relationship between RainTomorrow and all the other variables?
rattle::weatherAUS # to load the dataset into R
str(weather)
weather$Date <- as.Date(weather$Date)
weather$RainTomorrow <- as.factor(weather$RainTomorrow)
# exploring all the variables
weather %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
rattle::weatherAUS merely prints the data to the console. You need to assign it first: weather <- rattle::weatherAUS
After that, everything will work fine.
I use facet_grid() to show RainTomorrow in each row and other numeric variables in each column.
library(tidyverse)
library(rattle)
# exploring all the variables
weather %>%
mutate(RainTomorrow = as.integer(RainTomorrow)) %>%
keep(is.numeric) %>%
mutate(RainTomorrow = weather$RainTomorrow) %>%
pivot_longer(-RainTomorrow, names_to = "name", values_to = "value") %>%
ggplot(aes(value)) +
geom_histogram() +
facet_grid(vars(RainTomorrow), vars(name), scales = "free") +
theme_test()

How do I use a dynamically declared variable in R ggplot when using count() and factor() functions?

I would like to plot some relative frequency data using ggplot in a more efficient manner.
I have many variables of interest, and want to plot a separate bar chart for each. The following is my current code for one variable of interest, Gender:
chart.gender <- data %>%
count(Gender = factor(Gender)) %>%
mutate(Gender = fct_reorder(Gender,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=Gender, y=n, fill=Gender)) +
geom_col()
This works, but the variable Gender is repeated many times. Since I need to repeat plots for many variables of interest (Gender, Age, Location, etc.) with similar code, I would like to simplify this by declaring the variable of interest once at the top and using that declared variable for the rest of the code. Intuitively, something like:
var <- "Gender"
chart.gender <- data %>%
count(var = factor(var)) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
Which does not result in a plot of three-level factor count of gender frequencies, but merely a single column named 'Gender'. I believe I see why it's not working, but I do not know the solution for it: I want R to retrieve the variable name I stored in var, and then use that to retrieve the data for that variable in 'data'.
With some research I've found suggestions like using as.name(var), but there seems to (at the least) be a problem with declaring the variable var as a factor within the count() function.
Some reproducible data:
library(tidyverse)
library(ggplot2)
set.seed(1)
data <- data.frame(sample(c("Male", "Female", "Prefer not to say"),20,replace=TRUE))
colnames(data) <- c("Gender")
I'm using the following packages in R: tidyverse, ggplot2
Use the .data pronoun to select the column whose name is stored in var.
library(tidyverse)
var <- "Gender"
data %>%
count(var = factor(.data[[var]])) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
Another way is to use sym() and !!:
data %>%
count(var = factor(!!sym(var))) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
If you use as.name() when you set the variable initially, you can use !! ("bang-bang") to unquote the variable for the count() step.
var <- as.name("Gender")
chart.gender <- data %>%
count(var = factor(!! var)) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
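Since the goal is to repeat the chart for many variables, the .data[[var]] approach can be wrapped in a function and mapped over a vector of column names. This is only a sketch: freq_chart is a hypothetical helper name, and the Location column is made up here so that there is a second variable to plot.

```r
library(tidyverse)

set.seed(1)
data <- data.frame(
  Gender   = sample(c("Male", "Female", "Prefer not to say"), 20, replace = TRUE),
  Location = sample(c("North", "South"), 20, replace = TRUE)  # made-up second variable
)

# freq_chart is a hypothetical helper; var is a column name given as a string
freq_chart <- function(df, var) {
  df %>%
    count(value = factor(.data[[var]])) %>%
    mutate(value = fct_reorder(value, desc(n)),
           pct = prop.table(n)) %>%
    ggplot(aes(x = value, y = n, fill = value)) +
    geom_col() +
    labs(x = var, fill = var)  # label the axis and legend with the real column name
}

vars <- c("Gender", "Location")
charts <- map(set_names(vars), ~ freq_chart(data, .x))
charts$Gender
```

Naming the input with set_names() means the resulting list can be indexed by variable name, as in charts$Location.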

Using Purrr package to produce plots with correct xlab

I am trying to use the map function from the purrr package to produce a bunch of plots at one time. I ran into issues with the x-axis label.
library(dplyr)
library(purrr)
df <- mtcars
df %>% keep(is.numeric) %>%
map(~qplot(.), geom = 'density')
The x-axis label of each resulting plot turns out to be ".". I have tried to include xlab = . in the function, but it does not work. How can I add the correct x-axis label (e.g., the column name) to each plot? Thanks!
map only iterates the columns, not the names of the columns. You can also iterate the names with imap. For example
df %>% keep(is.numeric) %>%
imap(~qplot(.x, xlab=.y, geom = 'density'))
We can use imap instead of map and use the .y in xlab
library(tidyverse)
library(ggplot2)
df %>%
keep(is.numeric) %>%
imap(~qplot(.x) +
geom_density() +
xlab(.y))
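qplot() has been deprecated since ggplot2 3.4.0, so the same imap() idea can also be written with ggplot() directly. A minimal sketch, assuming mtcars as in the question; the names value and density_plots are arbitrary choices for this example:

```r
library(dplyr)
library(purrr)
library(ggplot2)

density_plots <- mtcars %>%
  keep(is.numeric) %>%
  # .x is the column's values, .y is the column's name
  imap(~ ggplot(data.frame(value = .x), aes(value)) +
         geom_density() +
         xlab(.y))

density_plots$mpg
```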

Set ggplot title to reflect dplyr grouping

I've got a grouped dataframe generated in dplyr where each group reflects a unique combination of factor variable levels. I'd like to plot the different groups using code similar to this post. However, I can't figure out how to include two (or more) variables in the title of my plots, which is a hassle since I've got a bunch of different combinations.
Fake data and plotting code:
library(dplyr)
library(ggplot2)
spiris<-iris
spiris$site<-as.factor(rep(c("A","B","C")))
spiris$year<-as.factor(rep(2012:2016))
spiris$treatment<-as.factor(rep(1:2))
g<-spiris %>%
group_by(site, Species) %>%
do(plots=ggplot(data=.) +
aes(x=Petal.Width)+geom_histogram()+
facet_grid(treatment~year))
##Need code for title here
g[[3]] ##view plots
I need the title of each plot to reflect both "site" and "Species". Any ideas?
Use split() %>% purrr::map2() instead of group_by() %>% do() like this:
spiris %>%
split(list(.$site, .$Species)) %>%
purrr::map2(.y = names(.),
~ ggplot(data=., aes(x=Petal.Width)) +
geom_histogram()+
facet_grid(treatment~year) +
labs(title = .y) )
You just need to set the title with ggtitle():
g <- spiris %>% group_by(site, Species) %>% do(plots = ggplot(data = .) +
aes(x = Petal.Width) + geom_histogram() + facet_grid(treatment ~
year) + ggtitle(paste(.$Species[1], .$site[1], sep = " - ")))

Resources