Standard evaluation inside a function with dplyr - r

I have data with lots of factor variables that I am visualising to get a feel for each variable. I am repeating a lot of the code with minor tweaks for variable names etc., so I decided to write a function to simplify things. I just can't get it to work...
Dummy Data
library(dplyr)
library(ggplot2)
ID <- sample(1:32, 128, replace = TRUE)
AgeGrp <- sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE)
ID <- factor(ID)
AgeGrp <- factor(AgeGrp)
data <- tibble(ID, AgeGrp) # tibble() replaces the deprecated data_frame()
data
Basically, what I am trying to do with each factor variable is produce a bar chart with percentage labels inside the bars. For example, with the dummy data:
# Create a table with pre-summarised percentages
plotstats <- data %>%
  group_by(AgeGrp) %>%
  summarise(count = n()) %>%
  mutate(pct = count / sum(count) * 100)
# Plot the data, adding the percentage labels from the pre-summarised table
age_plot <- ggplot(data, aes(x = AgeGrp)) +
  geom_bar() +
  geom_text(data = plotstats, aes(label = paste0(round(pct, 1), "%"), y = pct),
            size = 3.5, vjust = -1, colour = "sky blue") +
  ggtitle("Count of Age Group")
age_plot
This works fine with the dummy data - but when I try to create a function...
basic_plot <-
function(df, x){
plotstats <-
df %>%
group_by_(x) %>%
summarise_(
count = ~n(),
pct = ~count/sum(count)*100)
plot <-
ggplot(df,aes(x = x)) +
geom_bar() +
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),
y=pct), size=3.5, vjust = -1, colour = "sky blue")
plot
}
basic_plot(data, AgeGrp)
I get the error:
Error in UseMethod("as.lazy") : no applicable method for 'as.lazy' applied to an object of class "factor"
I have looked at questions here, here, and here and also looked at the NSE Vignette but can't find my fault.
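One possible way around the error, as a minimal sketch rather than a definitive fix: the call basic_plot(data, AgeGrp) hands the factor AgeGrp from the global environment to group_by_(), which expects a string or formula, hence the 'as.lazy' message. The version below assumes dplyr >= 1.0 and ggplot2 >= 3.0 so that the {{ }} operator can forward a bare column name; the toy tibble `dummy` and the choice to anchor labels at the bar height (y = count) are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# Sketch: forward a bare column name with {{ }} instead of the underscore verbs.
basic_plot <- function(df, x) {
  # Pre-summarised counts and percentages for the chosen column
  plotstats <- df %>%
    count({{ x }}, name = "count") %>%
    mutate(pct = count / sum(count) * 100)

  ggplot(df, aes(x = {{ x }})) +
    geom_bar() +
    # y = count anchors each label at its bar height (an assumption;
    # the original code used y = pct); vjust = 1.5 nudges it inside the bar
    geom_text(data = plotstats,
              aes(label = paste0(round(pct, 1), "%"), y = count),
              size = 3.5, vjust = 1.5, colour = "skyblue")
}

# Toy data standing in for the dummy data above
set.seed(42)
dummy <- tibble(
  AgeGrp = factor(sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE))
)
age_plot <- basic_plot(dummy, AgeGrp)
```

Printing age_plot draws the chart; because the function returns the plot object, further layers such as ggtitle() can be added afterwards.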

Related

Put dplyr & ggplot in Loop/Apply

I'm newish to R programming and am trying to standardise, or generalise, a piece of code so that I can apply it to different data exports of the same structure. The code is trivial, but I am having trouble getting it to loop.
Here is my code:
plot <- data %>%
group_by(Age, ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
ggplot(aes(
x = AgeGroup,
y = Rev,
fill = AgeGroup
)) +
geom_col(alpha = 0.9) +
theme_minimal()
I want to generalise the code so that I can switch out 'Age' with variables I put into a list. Here is my amateur code:
cols <- c(data$Col1, data$Col2) #Im pretty sure this is wrong
for (i in cols) {
plot <- data %>%
group_by(i, ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
ggplot(aes(
x = AgeGroup,
y = Rev,
fill = AgeGroup
)) +
geom_col(alpha = 0.9) +
theme_minimal()
}
And this doesn't work. The datasets I will be receiving will have the same variables, just different observations and so standardising this process will be a lifesaver.
Thanks in advance.
You were probably trying to do:
library(dplyr)
library(ggplot2)
library(rlang)
# Column names as strings (adjust to the grouping columns in your data)
cols <- c('col1', 'col2')
plot_list <- lapply(cols, function(i)
  data %>%
    group_by(!!sym(i), ID) %>%
    summarise(Rev = sum(TotalRevenue)) %>%
    # only i, ID and Rev survive summarise(), so plot the current column i
    ggplot(aes(x = !!sym(i), y = Rev, fill = !!sym(i))) +
    geom_col(alpha = 0.9) + theme_minimal())
This returns a list of plots, which can be accessed as plot_list[[1]], plot_list[[2]], etc. Also look into facets to combine multiple plots.
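To illustrate the facet suggestion, here is a rough sketch that reshapes the grouping columns into long form and draws one panel per column. The toy tibble `toy` and its column names (Col1, Col2, TotalRevenue) are assumptions standing in for the real export, and pivot_longer() assumes tidyr >= 1.0.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data standing in for the real export (column names are assumptions)
set.seed(1)
toy <- tibble(
  ID = rep(1:4, each = 5),
  Col1 = sample(c("18-65", "65+"), 20, replace = TRUE),
  Col2 = sample(c("North", "South"), 20, replace = TRUE),
  TotalRevenue = runif(20, 10, 100)
)

# Stack the grouping columns so each row records which variable it came from
rev_long <- toy %>%
  pivot_longer(c(Col1, Col2), names_to = "variable", values_to = "level") %>%
  group_by(variable, level) %>%
  summarise(Rev = sum(TotalRevenue), .groups = "drop")

# One panel per original column instead of one plot per loop iteration
facet_plot <- ggplot(rev_long, aes(x = level, y = Rev, fill = level)) +
  geom_col(alpha = 0.9) +
  facet_wrap(~ variable, scales = "free_x") +
  theme_minimal()
```

Whether a list of separate plots or a single faceted plot is preferable depends on how the figures will be used; facets keep a shared y-axis, which makes the columns easier to compare.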

How to parse LaTeX math formulas within geom_text() of ggplot2

I want to add a geom_text() containing a LaTeX formula to the plot, to show the mean percentage of each value across the 2 matrices:
library(latex2exp)
library(ggplot2)
library(tidyverse)
percentage <- matrix(c(10,100,90,80,100,97,80,19,90,82,9,87),nrow=2)
colnames(percentage) <- c("value1","value2","value3","value4","value5","value6")
rownames(percentage) <- c("matrix1", "matrix2")
mean_p <- apply(percentage,2,mean)
mat <- c("matrix1", "matrix2")
percentage %>%
as_data_frame() %>%
gather(., Value , Percentage) %>%
ggplot(., aes(x=Value,y=Percentage,color=rep(mat,ncol(percentage)))) +
geom_bar(position = position_dodge(width = 0.8), stat = "identity", fill = "white")
I tried to add
lab <- character()
for(i in 1:ncol(percentage)){
lab <- c(lab,"",sprintf('$\\oslash%s$',mean_p[i]))
}
geom_text(aes(label=lapply(lab,TeX)),vjust=0,show.legend = FALSE,color="lightblue")
but this doesn't convert the LaTeX expression correctly. Does anybody have an idea how to fix this problem?
The output I want to generate should look like this:
I propose a solution using annotate() instead of geom_text(). It is largely inspired by the following solution:
Annotate a plot made with ggplot2 with an equation using latex2exp::TeX
lab <- character()
for(i in 1:ncol(percentage)){
lab <- c(lab, paste('$\\oslash$', mean_p[i], '$\\%$', sep = " "))
}
percentage %>%
as_data_frame() %>%
gather(., Value , Percentage) %>%
ggplot(., aes(x=Value,y=Percentage,color=rep(mat,ncol(percentage)))) +
geom_bar(position = position_dodge(width = 0.8), stat = "identity", fill = "white") +
annotate('text', x = 1:6, y = percentage[2,], label = lapply(lab, function(x){TeX(x, output = "character")}), hjust=0, size = 4, parse = TRUE)

R: Unexplainable behavior of ggplot inside a function

I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
Simply use aes_string() to pass column names as strings into the ggplot() call. Right now, your aesthetics are not tied to ggplot's data argument. Below, x, color, and fill are separate, unrelated vectors; they derive from the same source, but ggplot does not know that:
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
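As a side note, aes_string() is deprecated in recent ggplot2 releases, and the .data pronoun is the usual replacement. The sketch below assumes ggplot2 >= 3.3.0 (for after_stat()); hist_by_name and its arguments are hypothetical names, not the original function. It also sidesteps the repeated-plot problem: aes() expressions are evaluated when a plot is drawn, by which time the loop index i holds its final value, whereas each lapply() call here keeps its own v.

```r
library(ggplot2)

# Hypothetical helper: one density histogram per numeric column,
# filled by the column named in `class_var`.
hist_by_name <- function(data, class_var, exclude = character()) {
  numeric_vars <- names(data)[vapply(data, is.numeric, logical(1))]
  vars <- setdiff(numeric_vars, c(class_var, exclude))

  plots <- lapply(vars, function(v) {
    # .data[[v]] looks the column up by name inside `data` at draw time
    ggplot(data, aes(x = .data[[v]], fill = .data[[class_var]])) +
      geom_histogram(aes(y = after_stat(density)), alpha = 0.3,
                     position = "identity", bins = 30) +
      labs(x = v, fill = class_var)
  })
  setNames(plots, vars)
}

cars <- mtcars
cars$am <- factor(cars$am)
plots <- hist_by_name(cars, "am", exclude = "mpg")
```

Each element of plots is named after its column, so plots$cyl and plots$disp can be printed and compared directly.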
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
  # tidyeval overhead
  class_enq <- enquo(class)
  drop_enqs <- enquo(drop_vars)
  data %>%
    group_by(!!class_enq) %>% # keep the 'class' column always
    select(-!!drop_enqs) %>% # drop any 'drop_vars'
    select_if(is.numeric) %>% # keep only numeric columns
    gather("key", "value", -!!class_enq) %>% # go to long form
    split(.$key) %>% # make a list of data frames
    map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
          geom_histogram() +
          geom_density(alpha = .5) +
          labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))

Revising ggplot after a function: non-numeric argument to binary operator error

I am attempting to produce a ggplot from within a function. I can do so using the sample data and code below.
If I produce the plot (p) outside of the function, I can revise it with no problem to add a title, subtitle, axis labels, etc. (e.g., p + labs(title = "Most frequent words, by gender")).
However, if I produce the plot from within the function and then attempt to modify it, I get the following error: non-numeric argument to binary operator.
In both cases, the object "p" shows up under Values.
I would of course like to use a function because I have a number of different group_by variables to test, and I want to eliminate typing mistakes (e.g., forgetting to change "gender" to "income" on a later analysis).
Can someone explain why the error arises only after modifying a ggplot created in a function? And of course I would be grateful for advice about how to eliminate the source of the error.
# sample data of favorite activities
df <- tibble(
word = c("walk","hike","garden","garden","walk","hike", "garden","hike","hike","hike","walk"),
gender = c("Male","Female","Female","Female","Male","Male","Male", "Male","Male","Female","Female")
)
df
# function to figure out the proportions of the activities
sum_text_prop <- function(df, groupbyvar) {
groupbyvar <- enquo(groupbyvar)
df %>%
count(!!groupbyvar, word, sort = TRUE) %>%
group_by(groupbyvar = !!groupbyvar) %>%
mutate(proportion = n / sum(n)) %>%
top_n(proportion, n = 5) %>%
ungroup()
}
# function to plot the most common words
plot_text_prop <- function(df) {
p <- ggplot(data = df, aes(x = word, y = proportion, fill = groupbyvar)) +
geom_bar(stat = "identity", alpha = 0.8, show.legend = FALSE) +
facet_wrap(~ groupbyvar, ncol = 2, scales = "free") +
coord_flip()
print(p)
}
# deploy the functions
df %>%
sum_text_prop(groupbyvar = gender) %>%
plot_text_prop()
# add a title to the plot
p + labs(title = "Most frequent words, by gender")
# error: Error in p + labs(title = "Most frequent words, by gender") :
#   non-numeric argument to binary operator
Update
Thanks to the helpful responses, my revised code is as follows:
plot_text_prop <- function(df) {
ggplot(data = df, aes(reorder_within(word, proportion, groupbyvar),
proportion, fill = groupbyvar)) +
geom_bar(stat = "identity", alpha = 0.8, show.legend = FALSE) +
scale_x_reordered() +
facet_wrap(~ groupbyvar, ncol = 2, scales = "free") +
coord_flip()
}
p <- tidy_infl %>%
sum_text_prop(groupbyvar = gender) %>%
plot_text_prop()
p + labs(title = "Most frequent words, by gender")

How to generate grouped bar plot or pie chart from list of csv files?

I have a list of data.frames that need to be classified. I manipulated the list and exported the results as CSV files to the default folder. To make the exported data more informative, I think it would be better to generate a grouped bar plot or pie chart for each data.frame object. As a beginner, I am still learning the features of the ggplot2 package, so I have little idea how to do this easily. Can anyone suggest how to generate an informative grouped bar plot for a list of files? Thanks in advance :)
reproducible data :
savedDF <- list(
bar.saved = data.frame(start=sample(100, 15), stop=sample(150, 15), score=sample(36, 15)),
cat.saved = data.frame(start=sample(100, 20), stop=sample(100,20), score=sample(45,20)),
foo.saved = data.frame(start=sample(125, 24), stop=sample(140, 24), score=sample(32, 24))
)
dropedDF <- list(
bar.droped = data.frame(start=sample(60, 12), stop=sample(90,12), score=sample(35,12)),
cat.droped = data.frame(start=sample(75, 18), stop=sample(84,18), score=sample(28,18)),
foo.droped = data.frame(start=sample(54, 14), stop=sample(72,14), score=sample(25,14))
)
I get the list of CSV files from this pipeline:
comb <- do.call("rbind", c(savedDF, dropedDF))
cn <- c("letter", "saved","seq")
DF <- cbind(read.table(text = chartr("_", ".", rownames(comb)), sep = ".", col.names = cn), comb)
DF <- transform(DF, updown = ifelse(score>= 12, "stringent", "weak"))
by(DF, DF[c("letter", "saved", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$updown[1], x$saved[1])))
To better understand the exported data, I think generating a grouped bar plot and a pie chart for each data.frame object would be much more informative.
In the desired plot, I intend to see the number of features in each CSV file for each data.frame object. Can anyone give me ideas for this task?
How can I do this easily with the ggplot2 package? Is there a more efficient way to get this done? Thanks a lot
If I understand correctly, this may work for you as a rough solution. Please comment to let me know if this is acceptable. In the future, if you can provide a rough sketch along with your data to show what you're trying to achieve that would be a good idea.
library(dplyr)
library(ggplot2)
plot_data <- DF %>%
group_by(letter, saved, updown) %>%
tally %>%
group_by(saved, updown) %>%
mutate(percentage = n/sum(n))
ggplot(plot_data, aes(x = saved, y = n, fill = saved)) +
geom_bar(stat = "identity") +
facet_wrap(~ letter + updown, ncol = 2)
You can always change the facet_wrap(~ letter + updown, ncol = 2) to an explicit facet_grid(letter ~ updown) if you wish.
Or you could view it this way:
ggplot(plot_data, aes(x = letter, y = n)) +
geom_bar(stat = "identity") +
facet_wrap(~updown+saved, ncol = 2)
For a pie (cleaning up and labeling is up to you):
ggplot(plot_data, aes(x = 1, y = percentage, fill = letter)) +
geom_bar(stat = "identity", width =1) +
facet_wrap(~updown+saved, ncol = 2) +
coord_polar(theta = "y") +
theme_void()
A pie for the letter "bar" alone, split into its 4 saved/updown interactions, just requires some manipulation of your data:
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data <- DF %>%
unite(interaction, saved, updown, sep = "-") %>%
group_by(letter, interaction) %>%
tally %>%
mutate(percentage = n/sum(n)) %>%
filter(letter == "bar")
ggplot(plot_data, aes(x = 1, y = percentage, fill = interaction)) +
geom_bar(stat = "identity", width =1) +
coord_polar(theta = "y") +
theme_void()
You should really look into the dplyr, tidyr and ggplot2 packages. Read their documentation and vignettes and work through the examples. The best way to learn is by doing.

Resources