Put dplyr & ggplot in Loop/Apply


I'm newish to R programming and am trying to standardise, or generalise, a piece of code so that I can apply it to different data exports of the same structure. The code is trivial, but I am having trouble getting it to loop.
Here is my code:
plot <- data %>%
group_by(Age, ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
ggplot(aes(
x = AgeGroup,
y = Rev,
fill = AgeGroup
)) +
geom_col(alpha = 0.9) +
theme_minimal()
I want to generalise the code so that I can switch out 'Age' w/ variables I put into a list. Here is my amateur code:
cols <- c(data$Col1, data$Col2) #Im pretty sure this is wrong
for (i in cols) {
plot <- data %>%
group_by(i, ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
ggplot(aes(
x = AgeGroup,
y = Rev,
fill = AgeGroup
)) +
geom_col(alpha = 0.9) +
theme_minimal()
}
And this doesn't work. The datasets I will be receiving will have the same variables, just different observations and so standardising this process will be a lifesaver.
Thanks in advance.

You were probably trying to do something like this:
library(dplyr)
library(ggplot2)
library(rlang)
# column names as strings, not the column values themselves
cols <- c('col1', 'col2')
plot_list <- lapply(cols, function(i)
data %>%
group_by(!!sym(i), ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
# plot the column that was grouped on; AgeGroup no longer exists after summarise()
ggplot(aes(x = .data[[i]], y = Rev, fill = .data[[i]])) +
geom_col(alpha = 0.9) + labs(x = i, fill = i) + theme_minimal())
This returns a list of plots, which can be accessed as plot_list[[1]], plot_list[[2]], etc. Also look into facets to combine multiple plots.
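The facet suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the toy `data` tibble, its values, and the column names `col1` and `col2` are made up, and the `ID` grouping is dropped to keep the summary simple. The idea is to stack the grouping columns into long form with `pivot_longer()` and let `facet_wrap()` draw one panel per original column.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data standing in for the real export (hypothetical values).
data <- tibble(
  ID = rep(1:4, each = 2),
  col1 = rep(c("a", "b"), times = 4),
  col2 = rep(c("x", "y", "y", "x"), times = 2),
  TotalRevenue = c(10, 20, 30, 40, 50, 60, 70, 80)
)

cols <- c("col1", "col2")

# Stack the grouping columns into long form, then summarise once.
rev_long <- data %>%
  pivot_longer(all_of(cols), names_to = "variable", values_to = "level") %>%
  group_by(variable, level) %>%
  summarise(Rev = sum(TotalRevenue), .groups = "drop")

# One panel per original column instead of one plot object per column.
p <- ggplot(rev_long, aes(x = level, y = Rev, fill = level)) +
  geom_col(alpha = 0.9) +
  facet_wrap(~variable, scales = "free_x") +
  theme_minimal()
```

This only works cleanly when the stacked columns share a type (here both are character); otherwise they would need converting before `pivot_longer()`.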

Related

Apply a function to a list of dataframes n list elements at a time

I have a dataframe that contains a grouping variable. It is trivial to create a list of dataframes using group_split, but then I'd like to turn around and make plots that group these n at a time using faceting. For reproducibility I'll use mtcars:
ldf <- mtcars %>%
group_split(carb)
Now I'm having a brain lock on how to do the equivalent of:
ldf[1:3] %>%
bind_rows( .id = "column_label") %>%
ggplot(aes(x = disp, y = hp)) +
geom_line() +
facet_wrap(carb ~ ., ncol = 1)
ldf[4:6] %>%
bind_rows( .id = "column_label") %>%
ggplot(aes(x = disp, y = hp)) +
geom_line() +
facet_wrap(carb ~ ., ncol = 1)
Ideally I wouldn't have to manually slice the list with [1:3], [4:6] etc., and could simply provide an n value like 3 or 5.
I'd prefer a tidyverse solution, with base R as second choice. Thank you in advance.

As per comments, here's my suggestion without the group_split:
n_per_group = 3
mtcars %>%
mutate(
carb_grp = as.integer(factor(carb)),
plot_grp = (carb_grp - 1) %/% n_per_group
) %>%
group_by(plot_grp) %>%
group_map(
~ggplot(., aes(x = disp, y = hp)) +
geom_line() +
facet_wrap(carb ~ ., ncol = 1)
)
In general, I find most of what I might want to do after group_split can be done with group_map instead, and there are sometimes advantages to keeping the data together, like ease of regrouping, as in this example.
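To see what the two mutate() lines are doing, here is a small base R sketch of the same arithmetic applied to the distinct carb values only. The names `carb_levels` and `grouping` are just for illustration.

```r
n_per_group <- 3

# Distinct carb values in mtcars, in sorted order.
carb_levels <- sort(unique(mtcars$carb))

# factor() numbers the distinct values 1, 2, 3, ... in sorted order.
carb_grp <- as.integer(factor(carb_levels))

# Integer division turns ranks 1-3 into group 0, ranks 4-6 into group 1, etc.
plot_grp <- (carb_grp - 1) %/% n_per_group

grouping <- data.frame(carb = carb_levels, carb_grp, plot_grp)
print(grouping)
```

With n_per_group = 3 the six carb values fall into two plot groups, which matches the two plots the group_map call produces.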

I think you should first look at solutions that do not require splitting then un-splitting ... but if you're stuck with it, then you can group them such as this:
ggs <- split(ldf, (seq_along(ldf)-1) %/% 3) %>%
lapply(function(z) {
bind_rows(z, .id = "column_label") %>%
ggplot(aes(x = disp, y = hp)) +
geom_line() +
facet_wrap(carb ~ ., ncol = 1)
})
(Produces a list of 2 gg objects.)

Splitting a dataframe by every n unique values of a variable

I have a dataframe of Lots, Time, Value with the same structure as the sample data below.
df <- tibble(Lot = c(rep(123,4),rep(265,5),rep(132,3),rep(455,4)),
time = c(seq(4), seq(5), seq(3), seq(4)), Value = runif(16))
I'd like to split the dataframe by every N Lots and plot them. The Lots are different sizes so I can't subset the data by every n rows!
I've been using an approach like this but it's not scalable for a large dataset.
df %>% filter(Lot %in% c(123, 265)) %>% ggplot(., aes(x = time, y = Value)) +
geom_point() + stat_smooth()
How can I do this?

Create a lot number column, then build one plot for every n unique lot values.
This gives you a list of plots.
library(tidyverse)
lot_n <- 2
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
group_split(group) %>%
map(~ggplot(.x, aes(x = time, y = Value)) +
geom_point() + stat_smooth()) -> list_plots
list_plots
Individual plots can be accessed via list_plots[[1]], list_plots[[2]] etc.
You can also plot the data with facets.
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
ggplot(aes(x = time, y = Value)) +
geom_point() + stat_smooth() +
facet_wrap(~group, scales = 'free')
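As a quick check of how lots end up in groups, the sketch below rebuilds the sample data and lists each Lot next to its Lot_number and group. The `set.seed()` call and the name `lot_groups` are additions for illustration only.

```r
library(dplyr)

set.seed(1)  # only so the random Value column is reproducible
df <- tibble(Lot = c(rep(123, 4), rep(265, 5), rep(132, 3), rep(455, 4)),
             time = c(seq(4), seq(5), seq(3), seq(4)),
             Value = runif(16))

lot_n <- 2

# match() numbers lots in order of first appearance (not sorted order);
# ceiling() then assigns every lot_n consecutive lots to the same group.
lot_groups <- df %>%
  mutate(Lot_number = match(Lot, unique(Lot)),
         group = ceiling(Lot_number / lot_n)) %>%
  distinct(Lot, Lot_number, group)

print(lot_groups)
```

Because the numbering follows order of appearance, lots 123 and 265 share group 1 and lots 132 and 455 share group 2, regardless of how many rows each lot has.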

Timeseries graphs of mean values of group in R

I am learning R and working with a data set in which the same set of columns is repeated about 200 times.
I want to take the mean of each column and then group the means by variable, so there will be 200 mean values for each variable. I want to make a line chart of the mean values of each variable.
I am trying this code:
library(data.table)
library(tidyverse)
library(ggplot2)
library(viridisLite)
df <- read.table("H-W.csv", sep = ",")
df
dat %>% filter(Scenario != 'NULL') %>%
mutate("Scenario" = ifelse(Scenario == 'NULL2', "BASELINE", Scenario)) %>%
group_by(.dots = c("X.step.", "Scenario")) %>%
summarise('height.people' = mean(height),
'weight.people' = mean(weight),
"wealth.people" = mean(wealth)) %>%
pivot_longer(c('height.people', 'weight.people', 'wealth.people')) %>%
ggplot(aes(x = X.step., y = value, colour = Scenario)) +
geom_line(size = 1) + facet_grid(name~., scales = "free_y") + theme_classic() +
scale_colour_viridis_d() + scale_y_log10()
I get this error:
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "NULL"

I think you might have the same problem as this...
Is your data in a data.frame or tibble?
Otherwise, if that doesn't work, try this:
filter is a function in both stats and dplyr, so the wrong one may be getting called.
You could try changing
dat %>% filter(Scenario != 'NULL') %>%
to
dat %>% dplyr::filter(Scenario != "NULL") %>%
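Note also that the posted code reads the file into df but then pipes dat, so it is worth checking that the object being filtered really is a data frame. A rough sketch, using a few made-up inline rows in place of "H-W.csv":

```r
library(dplyr)

# Hypothetical stand-in for the CSV contents (the real file is not shown).
df <- read.csv(text = "X.step.,Scenario,height
1,NULL,1.6
1,NULL2,1.7
2,A,1.8")

# Confirm the object is a data frame before piping it into dplyr verbs.
print(is.data.frame(df))

# Namespace the call so dplyr's filter is used, not stats::filter.
kept <- df %>% dplyr::filter(Scenario != "NULL")
print(kept)
```

Here the same name (df) is used for reading and for filtering, and the "NULL" row is dropped while "NULL2" is kept.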

dplyr() and ggplot2()::geom_tile, filtering a group of summary statistics

I've got a data frame (df) with three categorical variables called site, purchase, and happycustomer.
I'd like to use ggplot2's geom_tile function to create a heat map of customer experience. I'd like site on the x-axis, purchase on the y-axis, and happycustomer as the fill. I'd like the heat map to show the percentage of happy customers grouped by site and purchase (i.e. the ones for which the value of happycustomer is y).
My problem's that at the moment the plot features both the happy and the unhappy customers.
Any help would be much appreciated.
Starting point (df):
df <- data.frame(site=c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),purchase=c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),happycustomer=c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n"))
Current code:
library(ggplot2)
library(dplyr)
df %>%
group_by(site, purchase,happycustomer) %>%
summarize(bin = sum(happycustomer==happycustomer)) %>%
group_by(site,happycustomer) %>%
mutate(bin_per = (bin/sum(bin)*100)) %>%
ggplot(aes(site,purchase)) + geom_tile(aes(fill = bin_per),colour = "white") + geom_text(aes(label = round(bin_per, 1))) +
scale_fill_gradient(low = "blue", high = "red")

Here is the solution with two data frames.
happyDF <- df %>%
filter(happycustomer == "y") %>%
group_by(site, purchase) %>%
summarise( n = n() )
totalDF <- df %>%
group_by(site, purchase) %>%
summarise( n = n() )
And the ggplot code:
merge(happyDF, totalDF, by=c("site", "purchase") ) %>%
mutate(prop = 100 * (n.x / n.y) ) %>%
ggplot(., aes(site, purchase)) +
geom_tile(aes(fill = prop),colour = "white") +
geom_text(aes(label = round(prop, 1))) +
scale_fill_gradient(low = "blue", high = "red")
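An alternative, offered only as a sketch, is to compute the percentage in a single summarise() by taking the mean of the logical happycustomer == "y". This is not the approach above, and it behaves differently in one respect: site/purchase combinations with no happy customers are kept with a value of 0 instead of being dropped by the merge. The name `happy_pct` is made up for the example.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  site = c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),
  purchase = c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),
  happycustomer = c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n")
)

# The mean of a logical vector is the share of TRUE values.
happy_pct <- df %>%
  group_by(site, purchase) %>%
  summarise(prop = 100 * mean(happycustomer == "y"), .groups = "drop")

p <- ggplot(happy_pct, aes(site, purchase)) +
  geom_tile(aes(fill = prop), colour = "white") +
  geom_text(aes(label = round(prop, 1))) +
  scale_fill_gradient(low = "blue", high = "red")
```

If the all-unhappy tiles should be hidden, as they are with the merge, a filter(prop > 0) before plotting would do it.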

Standard evaluation inside a function with dplyr

I have data with lots of factor variables that I am visualising to get a feel for each of the variables. I am reproducing a lot of the code with minor tweaks for variable names etc., so I decided to write a function to simplify things. I just can't get it to work...
Dummy Data
ID <- sample(1:32, 128, replace = TRUE)
AgeGrp <- sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE)
ID <- factor(ID)
AgeGrp <- factor(AgeGrp)
data <- data_frame(ID, AgeGrp)
data
Basically what I am trying to do with each factor variable is produce a bar chart with labels of percentages inside the bars. For example with the dummy data.
plotstats <- #Create a table with pre-summarised percentages
data %>%
group_by(AgeGrp) %>%
summarise(count = n()) %>%
mutate(pct = count/sum(count)*100)
age_plot <- #Plot the data
ggplot(data,aes(x = AgeGrp)) +
geom_bar() + #Add the percentage labels using pre-summarised table
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),y=pct),
size=3.5, vjust = -1, colour = "sky blue") +
ggtitle("Count of Age Group")
age_plot
This works fine with the dummy data - but when I try to create a function...
basic_plot <-
function(df, x){
plotstats <-
df %>%
group_by_(x) %>%
summarise_(
count = ~n(),
pct = ~count/sum(count)*100)
plot <-
ggplot(df,aes(x = x)) +
geom_bar() +
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),
y=pct), size=3.5, vjust = -1, colour = "sky blue")
plot
}
basic_plot(data, AgeGrp)
I get the error:
Error in UseMethod("as.lazy") : no applicable method for 'as.lazy' applied to an object of class "factor"
I have looked at questions here, here, and here and also looked at the NSE Vignette but can't find my fault.
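For what it's worth, one way this kind of function is often written with current dplyr is the embrace operator `{{ }}` instead of the underscore verbs. The sketch below assumes dplyr 1.0 or later and a ggplot2 version that supports tidy evaluation in aes(). It also places the labels at the bar height (y = count) and uses tibble() for the dummy data, both of which are changes from the code above.

```r
library(dplyr)
library(ggplot2)

basic_plot <- function(df, x) {
  # {{ x }} forwards the bare column name the caller supplied.
  plotstats <- df %>%
    group_by({{ x }}) %>%
    summarise(count = n(), .groups = "drop") %>%
    mutate(pct = count / sum(count) * 100)

  ggplot(df, aes(x = {{ x }})) +
    geom_bar() +
    # Labels come from the pre-summarised table; y = count is an assumption
    # made here so each label sits at the top of its bar.
    geom_text(data = plotstats,
              aes(label = paste0(round(pct, 1), "%"), y = count),
              size = 3.5, vjust = -1, colour = "skyblue")
}

# Dummy data in the same shape as in the question.
set.seed(42)
data <- tibble(
  ID = factor(sample(1:32, 128, replace = TRUE)),
  AgeGrp = factor(sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE))
)

age_plot <- basic_plot(data, AgeGrp)
```

The call passes the column unquoted, as in the original attempt; passing a string instead would need a different approach, such as .data[[x]].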
