I am trying to write a function to plot graphs in a grid. I am using ggplot and facet grid. I am unable to pass the argument for facet grid. I wonder if anybody can point me in the right direction.
The data example:
Year = as.factor(rep(c("01", "02"), each = 4, times = 1))
Group = as.factor(rep(c("G1", "G2"), each = 2, times = 2))
Gender = as.factor(rep(c("Male", "Female"), times = 4))
Percentage = as.integer(c("80","20","50","50","45","55","15","85"))
df1 = data.frame (Year, Group, Gender, Percentage)
The code for the grid plot without function is:
p = ggplot(data=df1, aes(x=Year, y=Percentage, fill = Gender)) + geom_bar(stat = "identity")
p = p + facet_grid(~ Group, scales = 'free')
p
This produces a plot like the ones I want to do. However, when I put it into a function:
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
And then run:
MyGridPlot(df1, df1Year, df1$Percentage, df1$Gender, df1$Group)
It comes up with the error:
Error: At least one layer must contain all faceting variables: `fgrid`.
* Plot is missing `fgrid`
* Layer 1 is missing `fgrid
I have tried using aes_string, which works for the x, y and fill but not for the grid.
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes_string(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
and then run:
MyGridPlot(df1, Year, Percentage, Gender, Group)
This produces the same error. If I delete the facet grid, both function code runs well, though no grid :-(
Thanks a lot for helping this beginner.
Gustavo
Your problem is that in your function, ggplot is looking for variable names (x_axis, y_axis, etc), but you're giving it objects (df1$year...).
There are a couple ways you could deal with this. Maybe the simplest would be to rewrite the function so that it expects objects. For example:
MyGridPlot <- function(x_axis, y_axis, bar_fill, fgrid){ # Note no df parameter here
df1 <- data.frame(x_axis = x_axis, y_axis = y_axis, bar_fill = bar_fill, fgrid = fgrid) # Create a data frame from inputs
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
MyGridPlot(Year, Percentage, Gender, Group)
Alternatively, you could set up the function with a data frame and variable names. There isn't really much reason to do this if you're working with individual objects the way you are here, but if you're working with a data frame, it might make your life easier:
MyGridPlot <- function(df, x_var, y_var, fill_var, grid_var){
# Need to "tell" R to treat parameters as variable names.
df <- df %>% mutate(x_var = UQ(enquo(x_var)), y_var = UQ(enquo(y_var)), fill_var = UQ(enquo(fill_var)), grid_var = UQ(enquo(grid_var)))
p = ggplot(data = df, aes(x = x_var, y = y_var, fill = fill_var)) + geom_bar(stat = "identity")
p = p + facet_grid(~grid_var, scales = 'free')
return(p)
}
MyGridPlot(df1, Year, Percentage, Gender, Group)
Related
I have a dataframe to plot where y_axis variable is a character one. I want to take only the last part of character with '_' as separation.
Here an example with iris dataset. As you can see, all y_axis labels are the same. How can I do it? thanks
iris$trial = paste('hello', 'good_bye', iris$Sepal.Length, sep = '_')
myfun = function(x) {
tail(unlist(strsplit(x, '_')), n = 1)
}
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = function(x) myfun(x)) +
theme_bw()
It seems to me that you function is only applied to the first row of the column. That value is replicated. Using lapply returns all the unique values. However, I don't know if it makes sense in this example without making it numeric (and sorting it) so you might want to add that as well.
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = lapply(iris$trial, myfun)) +
theme_bw()
You can make use of regex instead to extract the required value.
library(ggplot2)
#This removes everything until the last underscore
myfun = function(x) sub('.*_', '', x)
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = myfun) +
theme_bw()
If you want to extract numbers from y-axis value, you can also use scale_y_discrete(labels = readr::parse_number).
I have an issue with my ggplot() reordering the data. I have an example code below. I have data, and reordered the factors in feed to my content, but after the str_extract() in facet_wrap(), the data gets reordered back before I reordered it. Is there a way to prevent that from occurring? For my actual code, it is important for me to use regex within the facet_wrap() in ggplot,
data <- chickwts
data <- mutate(data, time = 1:nrow(data))
lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
"casein", "horsebean")
data$feed <- factor(data$feed, levels = lvl)
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~str_extract(feed,"[a-z]+"))
You could put the factor inside the facet_wrap:
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~ factor(str_extract(feed,"[a-z]+"), levels = lvl))
I have a ggplot with facets and colors. The colors are related to "ID" and the columns of the facets are related to "Type". One ID is always in the same Type but there are a different numbers of IDs in each Type. I would like to reset the colors with each column of the facets to have a bigger difference in the colors.
ggplot(data = plt_cont_em, aes(x = Jahr, y = Konz)) +
geom_point(aes(color=factor(ID))) +
facet_grid(Schadstoff_ID ~ Type, scales = "free_y")
Now it looks like:
I understand, that I have to introduce a dummy var for the color. But is there an easy way of numerating the IDs in each Type, starting in each Type with 1?
Since the data is confidential, I created dummy data that shows the same problem.
ID<-c()
Type<-c()
Jahr<-c()
Schadstoff<-c()
Monat<-c()
Konz<-c()
for (i in 1:25){
#i = ID
#t = Type
t<-sample(c("A","B","C"),1)
for (j in 1:5){
#j = Schadstoff
if(runif(1)<0.75){
for(k in 2015:2020){
#k = Jahr
for(l in 1:12){
#l = Monat
if(runif(1)<0.9){
ID<-c( ID,i)
Type<-c( Type,t)
Jahr<-c( Jahr,k)
Schadstoff<-c( Schadstoff,j)
Monat<-c( Monat,l)
Konz<-c( Konz,runif(1))
}
}
}
}
}
}
tmp<- data.frame(ID,Type, Jahr, Schadstoff, Monat, Konz)
tmp<-tmp %>% group_by( Type) %>% mutate( Color=row_number())
p<-ggplot(data = tmp, aes(x = Jahr, y = Konz)) +
geom_point(aes(color=factor(Color)), size=0.8) +
facet_grid(Schadstoff ~ Type, scales = "free") +
theme_light() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
Problem still exists, that the grouping doesn't work and Color is unique for each line.
Using dplyr you can group_by Type and create a new column with the dense_rank of the ID inside each group:
plt_cont_em %>%
group_by(Type) %>%
mutate(Type_ID = dense_rank()) %>%
ggplot() +
...
This will 'rank' each ID from smallest to biggest inside the group, keeping records with the same ID with the same value.
You will probabily then want to exclude the legend, as it'll have little sense.
library(dplyr)
library(ggplot2)
# Using provided random data
tmp <- tmp %>%
group_by(Type) %>%
mutate(Color = dense_rank(ID))
ggplot(data = tmp, aes(x = Jahr, y = Konz)) +
geom_point(aes(color = factor(Color)), size = 0.8) +
facet_grid(Schadstoff ~ Type, scales = "free") +
theme_light() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Created on 2020-04-02 by the reprex package (v0.3.0)
I have a boxplot with multiple groups in R.
When i add the dots within the boxplots, they are not in the center.
Since each week has a different number of boxplots, the dots are not centered within the box.
The problem is in the geom_point part.
I uploaded my data of df.m in a text file and a figure of what i get.
I am using ggplot, and here is my code:
setwd("/home/usuario")
dput("df.m")
df.m = read.table("df.m.txt")
df.m$variable <- as.factor(df.m$variable)
give.n = function(elita){
return(c(y = median(elita)*-0.1, label = length(elita)))
}
p = ggplot(data = df.m, aes(x=variable, y=value))
p = p + geom_boxplot(aes(fill = Label))
p = p + geom_point(aes(fill = Label), shape = 21,
position = position_jitterdodge(jitter.width = 0))
p = p + stat_summary(fun.data = give.n, geom = "text", fun.y = median)
p
Here is my data in a text file:
https://drive.google.com/file/d/1kpMx7Ao01bAol5eUC6BZUiulLBKV_rtH/view?usp=sharing
Only in variable 12 is in the center, because there are 3 groups (the maximum of possibilities!
I would also like to show the counting of observations. If I use the code shown, I can only get the number of observations for all the groups. I would like to add the counting for EACH GROUP.
Thank you in advance
enter image description here
Here's a solution using boxplot and dotplot and an example dataset:
library(tidyverse)
# example data
dt <- data.frame(week = c(1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2),
value = c(6.40,6.75,6.11,6.33,5.50,5.40,5.83,4.57,5.80,
6.00,6.11,6.40,7.00,3,5.44,6.00,5,6.00),
donor_type = c("A","A","A","A","CB","CB","CB","CB","CB",
"CB","CB","CB","CB","CB","A","A","A","A"))
# create the plot
ggplot(dt, aes(x = factor(week), y = value, fill = donor_type)) +
geom_boxplot() +
geom_dotplot(binaxis='y', stackdir='center', position = position_dodge(0.75))
You should be able to adjust my code to your real dataset easily.
Edited answer with OP's dataset:
Using some generated data and geom_point():
library(tidyverse)
df.m <- df.m %>%
mutate(variable = as.factor(variable)) %>%
filter(!is.na(value))
ggplot(df.m, aes(x = variable, y = value, fill = Label)) +
geom_boxplot() +
geom_point(shape = 21, position = position_jitterdodge(jitter.width = 0)) +
scale_x_discrete("variable", drop = FALSE)
The following code plot the predicted probability of several models against time. Having, all the plots on one graph was not readable so I divided the result in a grid.
I was wondering if it was possible to have only one ggplot with all the models then somehow specify which goes where with grid.arrange
Current :
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
or you could do this with ggplot, a facet_wrap and factoring the variable column to the proper order. This has the benefits of shared axes across the plots, which facilitates easy comparison. You can alter the helper functions in the first approach to set the axes explicitly to achieve the same effect, but its just easier keeping it in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
mutate(variable = fct_relevel(variable, c(selection, levels(variable)[-grepl(selection, levels(variable))])))
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")