Using Purrr package to produce plots with correct xlab - r

I am trying to use the map function from the purrr package to produce a batch of plots at once, but I ran into a problem with the x-axis label.
library(dplyr)
library(purrr)
df <- mtcars
df %>% keep(is.numeric) %>%
map(~qplot(., geom = 'density'))
The x-axis label of each resulting plot turns out to be a literal ".". I have tried adding xlab = . to the call, but it does not work. How can I add the correct xlab (e.g., the column name) to each plot? Thanks!

map iterates only over the columns themselves, not over their names. To iterate over the names as well, use imap, which passes each element as .x and its name as .y. For example:
df %>% keep(is.numeric) %>%
imap(~qplot(.x, xlab=.y, geom = 'density'))
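The same idea can be sketched without qplot, which is deprecated in recent ggplot2 releases. This is only an illustration under stated assumptions: the names num_cols, plots and value are introduced here and do not appear in the original answer.

```r
library(ggplot2)
library(purrr)

# Keep only the numeric columns (every mtcars column is numeric).
num_cols <- keep(mtcars, is.numeric)

# imap() passes each column as .x and its name as .y.
# "value" is an illustrative column name for the one-column data frame.
plots <- imap(num_cols, ~ ggplot(data.frame(value = .x), aes(x = value)) +
  geom_density() +
  xlab(.y))
```

The result is a named list, so a single plot can be displayed with plots$mpg or plots[["hp"]].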

We can use imap instead of map and pass .y to xlab():
library(tidyverse)
library(ggplot2)
df %>%
keep(is.numeric) %>%
imap(~qplot(.x) +
geom_density() +
xlab(.y))

Related

Extract ggplot from a nested dataframe

I have created a set of ggplots using a grouped dataframe and the map function and I would like to extract the plots to be able to manipulate them individually.
library(tidyverse)
plot <- function(df, title){
df %>% ggplot(aes(class)) +
geom_bar() +
labs(title = title)
}
plots <- mpg %>% group_by(manufacturer) %>% nest() %>%
mutate(plots= map(.x=data, ~plot(.x, manufacturer)))
nissan <- plots %>% filter(manufacturer == "nissan") %>% pull(plots)
nissan
nissan + labs(title = "Nissan")
In this case, "nissan" is a list object and I am not able to manipulate it. How do I extract the ggplot?
In terms of data structures, I think keeping a tibble (or data.frame) is suboptimal for the usage shown. If you have one plot per manufacturer and plan to access the plots by manufacturer, I would recommend using transmute and then deframe to obtain a named list.
That is, I would find something like the following conceptually clearer:
library(tidyverse)
plot <- function(df, title){
df %>% ggplot(aes(class)) +
geom_bar() +
labs(title = title)
}
plots <- mpg %>%
group_by(manufacturer) %>% nest() %>%
transmute(plot=map(.x=data, ~plot(.x, manufacturer))) %>%
deframe()
plots[['nissan']]
plots[['nissan']] + labs(title = "Nissan")
Otherwise, if you want to keep the tibble, another option, similar to what was suggested in the comments, is to call first() after pull().
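A small sketch of that tibble-keeping route follows. pull() returns the whole list-column, which here is a list of length one, so the plot still has to be taken out of the list. The helper name make_plot and the use of map2 are assumptions made for this illustration (the rename avoids masking base plot()); they are not taken from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

# Hypothetical helper name, chosen to avoid masking base::plot().
make_plot <- function(df, title) {
  ggplot(df, aes(class)) +
    geom_bar() +
    labs(title = title)
}

plots <- mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  # map2() pairs each nested data frame with its manufacturer name.
  mutate(plots = map2(data, manufacturer, ~ make_plot(.x, .y)))

# pull() gives a list of length one; first() (or [[1]]) unwraps it.
nissan <- plots %>%
  filter(manufacturer == "nissan") %>%
  pull(plots) %>%
  first()

# The extracted object is an ordinary ggplot, so layers can be added.
nissan_titled <- nissan + labs(title = "Nissan")
```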

R ggplot - Boxplot with fill displaying different values than bargraph

I have a boxplot generated using the following code, and after checking the dataset all the values are correct here.
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
ggplot(aes(x=ACTARMCD,y=AVAL,fill=ACTARMCD))+
geom_boxplot()+
stat_summary(fun=mean,na.rm=TRUE,shape=25,col='black',geom='point') # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
I want to generate a second boxplot where I split each x category into groups by using a different variable as the fill. I use the following code, but the values shown in the graph are incorrect.
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
group_by(ACTARMCD, RESPFL) %>%
ggplot(aes(x=ACTARMCD,y=AVAL))+
geom_boxplot(aes(fill=RESPFL))
However, when I generate a bar graph using this code, the numbers are correct.
myplot <- inDATA %>%
filter(PARAMCD=="param1") %>%
group_by(ACTARMCD,RESPFL) %>%
dplyr::mutate(AVAL = mean(AVAL, na.rm=TRUE)) %>%
ggplot(aes(x=ACTARMCD,y=AVAL,fill=RESPFL))+
geom_bar(stat="identity",position="dodge")
Can anyone please help me understand what I am doing incorrectly with the second boxplot?
I ended up solving the issue by using plotly instead of ggplot. The code that worked is:
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
plot_ly(x = ~ACTARMCD, y = ~AVAL, color = ~RESPFL, type = "box",boxmean=TRUE) %>% layout(boxmode = "group")
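For readers who prefer to stay in ggplot2, one possible equivalent of the grouped plotly boxplot is sketched below. The original inDATA is not shown, so mtcars stands in for it, with cyl in the role of ACTARMCD, am in the role of RESPFL and mpg in the role of AVAL; all of these substitutions are assumptions. Two points may be worth checking against the real data: ggplot2 ignores dplyr's group_by(), and a numeric fill variable is treated as continuous and does not split the boxes, so the sketch converts it to a factor first.

```r
library(ggplot2)

# Stand-in data: the real inDATA is not shown in the question.
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Use the same dodge for both layers so the mean markers sit on their boxes.
dodge <- position_dodge(width = 0.75)

p <- ggplot(d, aes(x = cyl, y = mpg, fill = am)) +
  geom_boxplot(position = dodge) +
  # fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0).
  stat_summary(fun = mean, geom = "point", shape = 25,
               colour = "black", position = dodge)
```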

Color/fill bars in geom_col based on another variable?

I have an uncolored geom_col and would like it to display information about another (continuous) variable by displaying different shades of color in the bars.
Example
Starting with a geom_col
library(dplyr)
library(ggplot2)
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n()) %>%
ggplot(aes(Species, n)) +
geom_col()
Suppose we want to color the bars according to how low or high mean(Sepal.Width) is in each group
(note: I don't know if there's a way to provide 'continuous' colors to a ggplot, but, if not, the following colors would be fine to use)
library(RColorBrewer)
display.brewer.pal(n = 3, name= "PuBu")
brewer.pal(n = 3, name = "PuBu")
[1] "#ECE7F2" "#A6BDDB" "#2B8CBE"
The end result should be the same geom_col as above but with the bars colored according to how low/high mean(Sepal.Width) is.
Notes
This answer shows something similar but is highly manual. That is fine for 3 bars, but not sustainable for many plots with a large number of bars, since it would require setting too many case_when conditions by hand.
This is similar, but the coloring is based on a variable already displayed in the plot rather than on another variable.
Note also that in the example above there are 3 bars and I provide 3 colors. This is somewhat manual, and if there is a better (i.e. less manual) way to designate colors, I would be glad to learn it.
What I've tried
I thought this would work, but it seems to ignore the colors I provide
library(RColorBrewer)
# fill info from: https://stackoverflow.com/questions/38788357/change-bar-plot-colour-in-geom-bar-with-ggplot2-in-r
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n(), sep_mean = mean(Sepal.Width)) %>%
arrange(desc(n)) %>%
mutate(colors = brewer.pal(n = 3, name = "PuBu")) %>%
mutate(Species=factor(Species, levels=Species)) %>%
ggplot(aes(Species, n, fill = colors)) +
geom_col()
Do the following:
add fill = sep_mean to aes()
add + scale_fill_gradient()
remove mutate(colors = brewer.pal(n = 3, name = "PuBu")) since the previous step takes care of colors for you
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n(), sep_mean = mean(Sepal.Width)) %>%
arrange(desc(n)) %>%
mutate(Species=factor(Species, levels=Species)) %>%
ggplot(aes(Species, n, fill = sep_mean, label=sprintf("%.2f", sep_mean))) +
geom_col() +
scale_fill_gradient() +
labs(fill="Sepal Width\n(mean cm)") +
geom_text()
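If the PuBu palette from the question is preferred over the default blue gradient, one possible variation is scale_fill_distiller(), which interpolates a ColorBrewer palette into a continuous scale. This is a sketch of that variation rather than part of the answer above; the name bar_data is introduced here, and direction = 1 is chosen so that higher means get darker bars. The attempt in the question ignored the hex codes because fill = colors treats them as category labels; scale_fill_identity() would be needed to use them literally.

```r
library(dplyr)
library(ggplot2)

set.seed(124)
bar_data <- iris[sample(1:150, 50), ] %>%
  group_by(Species) %>%
  summarise(n = n(), sep_mean = mean(Sepal.Width))

p <- ggplot(bar_data, aes(Species, n, fill = sep_mean)) +
  geom_col() +
  # Continuous fill scale built from the ColorBrewer "PuBu" palette.
  scale_fill_distiller(palette = "PuBu", direction = 1) +
  labs(fill = "Sepal Width\n(mean cm)")
```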

Set ggplot title to reflect dplyr grouping

I've got a grouped dataframe generated in dplyr where each group reflects a unique combination of factor variable levels. I'd like to plot the different groups using code similar to this post. However, I can't figure out how to include two (or more) variables in the title of my plots, which is a hassle since I've got a bunch of different combinations.
Fake data and plotting code:
library(dplyr)
library(ggplot2)
spiris<-iris
spiris$site<-as.factor(rep(c("A","B","C")))
spiris$year<-as.factor(rep(2012:2016))
spiris$treatment<-as.factor(rep(1:2))
g<-spiris %>%
group_by(site, Species) %>%
do(plots=ggplot(data=.) +
aes(x=Petal.Width)+geom_histogram()+
facet_grid(treatment~year))
##Need code for title here
g[[3]] ##view plots
I need the title of each plot to reflect both "site" and "Species". Any ideas?
Use split() %>% purrr::map2() instead of group_by() %>% do() like this:
spiris %>%
split(list(.$site, .$Species)) %>%
purrr::map2(.y = names(.),
~ ggplot(data=., aes(x=Petal.Width)) +
geom_histogram()+
facet_grid(treatment~year) +
labs(title = .y) )
You just need to set the title with ggtitle():
g <- spiris %>% group_by(site, Species) %>%
do(plots = ggplot(data = .) +
aes(x = Petal.Width) + geom_histogram() +
facet_grid(treatment ~ year) +
# each group holds one Species/site pair, so unique() yields a single title
ggtitle(paste(unique(.$Species), unique(.$site), sep = " - ")))
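A third way to get both variables into the title, sketched here under a few assumptions, is to build the combined label once with interaction() and let imap() hand it to each plot. The names key, groups and plots, the " - " separator and bins = 10 are illustrative choices, not part of either answer.

```r
library(ggplot2)
library(purrr)

spiris <- iris
spiris$site <- factor(rep(c("A", "B", "C"), length.out = nrow(spiris)))
spiris$year <- factor(rep(2012:2016, length.out = nrow(spiris)))
spiris$treatment <- factor(rep(1:2, length.out = nrow(spiris)))

# One label per row, such as "A - setosa", combining site and Species.
key <- interaction(spiris$site, spiris$Species, sep = " - ")

# One data frame per site/Species combination, named by its label.
groups <- split(spiris, key)

# imap() supplies the list name as .y, so it can be used as the title.
plots <- imap(groups, ~ ggplot(.x, aes(x = Petal.Width)) +
  geom_histogram(bins = 10) +
  facet_grid(treatment ~ year) +
  ggtitle(.y))
```

Because the list is named, an individual plot can be retrieved with plots[["A - setosa"]].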

Violin plot of a data frame

I have a data.frame, for example:
df = data.frame(AAA=rnorm(100,1,1),BBB=rnorm(100,2,1.5),CCC=rnorm(100,1.5,1.2))
And I'd like to plot each of its columns in a joint violin plot.
Here's where I'm at so far:
names(df)[1] = 'x'
do.call('vioplot', c(df,col="red",drawRect=FALSE))
Next, I want to use the column names of df as the x-axis labels instead of vioplot's defaults, and to do so in a way that keeps the labels from running into each other. I imagine this can be achieved either by spreading out the columns of df in the plot or by slanting the x-axis labels, but I can't figure out how.
It is probably easier to use ggplot:
df = data.frame(AAA=rnorm(100,1,1),
BBB=rnorm(100,2,1.5),
CCC=rnorm(100,1.5,1.2))
Need to transform the data into something ggplot can handle:
df.m <- reshape2::melt(df, id.vars = NULL)
and plot:
library(ggplot2)
ggplot(df.m, aes(x = variable, y = value)) + geom_violin()
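The question also asks about slanting the x-axis labels. A minimal sketch of one way to do that in ggplot2 follows; it uses base R's stack() instead of reshape2::melt() to reach long format, and the seed, the 45-degree angle and the red fill are arbitrary choices made for this illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(AAA = rnorm(100, 1, 1),
                 BBB = rnorm(100, 2, 1.5),
                 CCC = rnorm(100, 1.5, 1.2))

# stack() gives long format with columns "values" and "ind" (the column name).
long <- stack(df)

p <- ggplot(long, aes(x = ind, y = values)) +
  geom_violin(fill = "red") +
  xlab(NULL) +
  # Slant the x-axis labels so long column names do not overlap.
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```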
I like the ggplot solution the best, but here is how you would do it with do.call:
do.call(vioplot,c(unname(df),col='red',drawRect=FALSE,names=list(names(df))))
Notably, you wouldn't have to do names(df)[1] = 'x' because you remove the names with unname.
Have you tried dropping do.call and passing the columns individually?
vioplot(df[,"AAA"], df[,"BBB"], df[,"CCC"],
col = "red", drawRect = FALSE,names = names(df))
Another simple option is using the ggviolin function from ggpubr with long formatted data like this:
df = data.frame(AAA=rnorm(100,1,1),BBB=rnorm(100,2,1.5),CCC=rnorm(100,1.5,1.2))
library(dplyr)
library(tidyr)
library(ggpubr)
df %>%
pivot_longer(cols = everything()) %>%
ggviolin(x = "name",
y = "value")
Created on 2022-08-14 by the reprex package (v2.0.1)
