Having trouble converting variable to factor for ggplot - r

I am trying to plot data from the nycflights13 data set. I want the month and dep_delay variables to be factors rather than continuous. I am getting an error with no explanation and am stuck. Here's my code:
library(ggplot2)
library(dplyr)
library(nycflights13)
f <- group_by(flights, month) %>%
summarise(delay = mean(dep_delay, na.rm = TRUE)) %>%
ggplot(mutate(month = as.factor(unlist(month))) +
geom_bar(aes(month, delay, fill=month),stat = "identity")

You can't do the mutate inside the ggplot call like that. The pipe hands the summarised data to ggplot() as its first argument, but the nested mutate() never receives that data, so the mutate step cannot be carried out.
Do it as a separate step before the ggplot call:
f <- group_by(flights, month) %>%
summarise(delay = mean(dep_delay, na.rm = TRUE)) %>%
mutate(month = as.factor(month)) %>%
ggplot() +
geom_bar(aes(month, delay, fill=month),stat = "identity")
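The same pattern can be checked on a small scale without downloading nycflights13. This is only a sketch: the toy data frame and its values are invented for illustration, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Invented toy data standing in for flights (month and dep_delay only)
toy <- data.frame(
  month     = rep(1:3, each = 2),
  dep_delay = c(5, NA, 10, 20, -3, 3)
)

# Summarise first, convert month to a factor, and only then plot
monthly <- toy %>%
  group_by(month) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE)) %>%
  mutate(month = as.factor(month))

p <- ggplot(monthly) +
  geom_col(aes(month, delay, fill = month))
```

Because month is already a factor when it reaches ggplot, the fill scale is discrete rather than a continuous gradient.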

Related

Timeseries graphs of mean values of group in R

I am learning R and working with a data set in which the same set of columns is repeated about 200 times.
I want to take the mean of each column and then group the means by variable, so there will be 200 mean values for each variable. I want to make a line chart like this of the mean values of each variable.
This is the code I am trying:
library(data.table)
library(tidyverse)
library(ggplot2)
library(viridisLite)
df <- read.table("H-W.csv", sep = ",")
df
dat %>% filter(Scenario != 'NULL') %>%
mutate("Scenario" = ifelse(Scenario == 'NULL2', "BASELINE", Scenario)) %>%
group_by(.dots = c("X.step.", "Scenario")) %>%
summarise('height.people' = mean(height),
'weight.people' = mean(weight),
"wealth.people" = mean(wealth)) %>%
pivot_longer(c('height.people', 'weight.people', 'wealth.people')) %>%
ggplot(aes(x = X.step., y = value, colour = Scenario)) +
geom_line(size = 1) + facet_grid(name~., scales = "free_y") + theme_classic() +
scale_colour_viridis_d() + scale_y_log10()
I get this error:
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "NULL"
I think you might have the same problem as this...
Is your data in a data.frame or tibble?
Otherwise, if that doesn't work, try this...
filter is a function in both stats and dplyr,
so you could try changing
dat %>% filter(Scenario != 'NULL') %>%
to
dat %>% dplyr::filter(Scenario != "NULL") %>%
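The error text is also a clue in itself: UseMethod("filter") failing on class "NULL" points to the piped object being NULL rather than a data frame, and the question's code reads the file into df but pipes dat. A minimal sketch of checking this before the pipeline runs, assuming a made-up stand-in for H-W.csv (the values are invented for illustration):

```r
library(dplyr)

# Invented stand-in for the contents of H-W.csv
dat <- data.frame(
  X.step.  = c(1, 1, 2, 2, 1, 2),
  Scenario = c("NULL2", "NULL2", "NULL2", "NULL2", "NULL", "A"),
  height   = c(150, 170, 160, 180, 155, 165)
)

# Fail early if the object being piped is not a data frame
stopifnot(is.data.frame(dat))

means <- dat %>%
  dplyr::filter(Scenario != "NULL") %>%                      # explicit namespace
  mutate(Scenario = ifelse(Scenario == "NULL2", "BASELINE", Scenario)) %>%
  group_by(X.step., Scenario) %>%
  summarise(height.people = mean(height), .groups = "drop")
```

If the stopifnot() line fails, the problem lies in how the data was read or named, not in filter().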

How do I use a dynamically declared variable in R ggplot when using count() and factor() functions?

I would like to plot some relative frequency data using ggplot in a more efficient manner.
I have many variables of interest, and want to plot a separate bar chart for each. The following is my current code for one variable of interest, Gender:
chart.gender <- data %>%
count(Gender = factor(Gender)) %>%
mutate(Gender = fct_reorder(Gender,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=Gender, y=n, fill=Gender)) +
geom_col()
This works, but the variable Gender is repeated many times. Since I need to repeat plots for many variables of interest (Gender, Age, Location, etc.) with similar code, I would like to simplify this by declaring the variable of interest once at the top and using that declared variable for the rest of the code. Intuitively, something like:
var <- "Gender"
chart.gender <- data %>%
count(var = factor(var)) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
This does not produce a plot of the three-level factor counts for gender, but merely a single column named 'Gender'. I believe I see why it's not working, but I do not know the solution for it: I want R to retrieve the variable name I stored in var, and then use that to retrieve the data for that variable in 'data'.
With some research I've found suggestions like using as.name(var), but there seems to (at the least) be a problem with declaring the variable var as a factor within the count() function.
Some reproducible data:
library(tidyverse)
library(ggplot2)
set.seed(1)
data <- data.frame(sample(c("Male", "Female", "Prefer not to say"),20,replace=TRUE))
colnames(data) <- c("Gender")
I'm using the following packages in R: tidyverse, ggplot2
Use the .data pronoun to subset the column whose name is stored in var.
library(tidyverse)
var <- "Gender"
data %>%
count(var = factor(.data[[var]])) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
Another way is to use sym() and !!:
data %>%
count(var = factor(!!sym(var))) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
If you use as.name() when you set the variable initially, you can use !! ("bang-bang") to unquote the variable for the count() step.
var <- as.name("Gender")
chart.gender <- data %>%
count(var = factor(!! var)) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
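Since the goal is one chart per variable of interest, the .data approach can be wrapped in a function and mapped over the column names. This is a sketch under assumptions: count_levels() and plot_counts() are hypothetical helper names, and the Location column is invented so that there is a second variable to loop over.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical helper: count the levels of the column named by `var`
count_levels <- function(data, var) {
  data %>%
    count(value = factor(.data[[var]])) %>%
    mutate(value = fct_reorder(value, desc(n)),
           pct   = prop.table(n))
}

# Hypothetical helper: bar chart of those counts, labelled with the column name
plot_counts <- function(data, var) {
  count_levels(data, var) %>%
    ggplot(aes(x = value, y = n, fill = value)) +
    geom_col() +
    labs(x = var, fill = var)
}

set.seed(1)
data <- data.frame(
  Gender   = sample(c("Male", "Female", "Prefer not to say"), 20, replace = TRUE),
  Location = sample(c("North", "South"), 20, replace = TRUE)  # invented column
)

# One chart per variable of interest
charts <- lapply(c("Gender", "Location"), function(v) plot_counts(data, v))
```

Using a fixed internal column name (value) and restoring the real name with labs() keeps the aes() call identical for every variable.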

Working with tidyverse, ggplot, and broom to add confidence interval to a proportion test (prop.test) in R

Let's say I'm working with proportions and have two main variables (sex and pain_level). It's not difficult to plot them:
With tidyverse and broom (and thanks for this link here: Calling prop.test function in R with dplyr) I can compare if the proportions are statistically different.
Now comes the question!
I want to add error bars to the plot. I know it's not as difficult as I'm thinking, but I could not find a way to do it. I've tried to replicate this link here (http://www.andrew.cmu.edu/user/achoulde/94842/labs/lab07_solution.html) but I'm trying to stay within the tidyverse.
The desired output should be something like this:
Please feel free to use the script below, which simulates the original dataset.
library(tidyverse)
ds <- data.frame(sex = rep(c("M","F"), 18),
pain_level = c("High","Moderate","low"))
#plot
ds %>%
group_by(pain_level, sex) %>%
summarise(n=n()) %>%
mutate(prop = n/sum(n)*100) %>%
ggplot(., aes(x = sex, fill = pain_level, y = prop)) +
geom_bar(stat = "summary") +
facet_wrap( ~ pain_level) +
theme(legend.position = "none")
#p values of proportion test
ds %>%
rowwise %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>% #compute totals
distinct(., pain_level, .keep_all= TRUE) %>% #keep only one value of the row
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst)
I think the following might roughly resemble your desired output:
ds %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>%
rowwise() %>%
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst) %>%
ggplot(aes(sex, estimate, group = pain_level)) +
geom_col(aes(fill = pain_level)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
facet_wrap(~ pain_level)
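To see where conf.low and conf.high come from, the same intervals can be taken straight from prop.test() without broom. This is a rough sketch rather than a replacement for the pipeline above; the counts object and the base R steps are my own choices for illustration.

```r
library(ggplot2)

ds <- data.frame(sex = rep(c("M", "F"), 18),
                 pain_level = c("High", "Moderate", "low"))

# Cases per pain_level/sex combination, and totals per pain_level
counts <- as.data.frame(table(pain_level = ds$pain_level, sex = ds$sex),
                        responseName = "cases")
counts$pop <- ave(counts$cases, counts$pain_level, FUN = sum)

# One prop.test() per row; conf.int holds the lower and upper bounds
ci <- t(mapply(function(x, n) prop.test(x, n, conf.level = 0.95)$conf.int,
               counts$cases, counts$pop))
counts$estimate  <- counts$cases / counts$pop
counts$conf.low  <- ci[, 1]
counts$conf.high <- ci[, 2]

p <- ggplot(counts, aes(sex, estimate)) +
  geom_col(aes(fill = pain_level)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.3) +
  facet_wrap(~ pain_level)
```

Note that estimate is a proportion between 0 and 1, not the percentage used in the question's first plot.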

Using gganimate and ggplot for a boxplot: Cumulative not working

I'm trying to produce an animation for a simulation model, and I want to show how the distribution of results changes as the simulation runs.
I've seen gganimate used for scatter plots but not for boxplots (or ideally violin plots). Here I've provided a reprex.
When I use sim_category (which is a bucket for a certain number of simulation runs) I want the result to be cumulative of all previous runs to show the total distribution.
In this example (and my actual code), cumulative = TRUE does not do this. Why is this?
library(gganimate)
library(animation)
library(ggplot2)
df = as.data.frame(structure(list(ID = c(1,1,2,2,1,1,2,2,1,1,2,2),
value = c(10,15,5,10,7,17,4,12,9,20,6,17),
sim_category = c(1,1,1,1,2,2,2,2,3,3,3,3))))
df$ID <- factor(df$ID, levels = (unique(df$ID)))
df$sim_category <- factor(df$sim_category, levels = (unique(df$sim_category)))
ani.options(convert = shQuote('C:/Program Files/ImageMagick-7.0.5-Q16/magick.exe'))
p <- ggplot(df, aes(ID, value, frame= sim_category, cumulative = TRUE)) + geom_boxplot(position = "identity")
gganimate(p)
gganimate's cumulative doesn't accumulate the data; it only keeps what was drawn in earlier frames visible in the subsequent frames. To achieve what you want, you have to do the accumulation before building the plot, something along the following lines:
library(tidyverse)
library(gganimate)
df <- tibble(
ID = factor(c(1,1,2,2,1,1,2,2,1,1,2,2), levels = 1:2),
value = c(10,15,5,10,7,17,4,12,9,20,6,17),
sim_category = factor(c(1,1,1,1,2,2,2,2,3,3,3,3), levels = 1:3)
)
p <- df %>%
pull(sim_category) %>%
levels() %>%
as.integer() %>%
map_df(~ df %>% filter(sim_category %in% 1:.x) %>% mutate(sim_category = .x)) %>%
ggplot(aes(ID, value, frame = factor(sim_category))) +
geom_boxplot(position = "identity")
gganimate(p)
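The accumulation step can be checked on its own, independently of gganimate. A minimal sketch, assuming a new frame column (my own name) and lapply() with bind_rows() in place of map_df():

```r
library(dplyr)

df <- tibble(
  ID           = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2), levels = 1:2),
  value        = c(10, 15, 5, 10, 7, 17, 4, 12, 9, 20, 6, 17),
  sim_category = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
)

# For each frame k, keep every row from simulation buckets 1..k
cumulative_df <- bind_rows(lapply(
  sort(unique(df$sim_category)),
  function(k) df %>% filter(sim_category <= k) %>% mutate(frame = k)
))

# Rows per frame should grow as more buckets are included
rows_per_frame <- count(cumulative_df, frame)
```

Each frame then contains all earlier runs as well, so a boxplot drawn per frame shows the total distribution up to that point.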

Plotting grouped probabilities in R

I'm new to R and I'm trying to graph probability of flight delays by hour of day. Probability of flight delays would be calculated using a "Delays" column of 1's and 0's.
Here's what I have. I was trying to put a custom function into fun.y, but it doesn't seem like it's allowed.
library(ggplot2)
ggplot(data = flights, aes(flights$HourOfDay, flights$ArrDelay)) +
stat_summary(fun.y = (sum(flights$Delay)/no_na_flights), geom = "bar") +
scale_x_discrete(limits=c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)) +
ylim(0,500)
What's the best way to do this?
Thanks in advance.
I am not sure if that is what you wanted, but I did it in the following way:
library(ggplot2)
library(dplyr)
library(nycflights13)
probs <- flights %>%
# Testing whether a delay occurred for departure or arrival
mutate(Delay = dep_delay > 0 | arr_delay > 0) %>%
# Grouping the data by hour
group_by(hour) %>%
# Calculating the proportion of delays for each hour
summarize(Prob_Delay = sum(Delay, na.rm = TRUE) / n()) %>%
ungroup()
theme_set(theme_bw())
ggplot(probs) +
aes(x = hour,
y = Prob_Delay) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = 0:24)
Which gives the following plot:
I think it is always better to do data manipulation outside ggplot, using for instance dplyr.
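One detail worth checking in the summarise step is the denominator: sum(Delay, na.rm = TRUE) / n() divides by all flights in the hour, including those with a missing delay, whereas mean(Delay, na.rm = TRUE) divides only by flights with a known delay. A small sketch on invented toy data (the column names prob_all and prob_known are mine):

```r
library(dplyr)

# Invented toy data: one missing delay in hour 5
toy <- data.frame(
  hour      = c(5, 5, 5, 6, 6, 6, 6),
  dep_delay = c(-2, 10, NA, 0, 3, 15, -1)
)

probs <- toy %>%
  mutate(Delay = dep_delay > 0) %>%
  group_by(hour) %>%
  summarise(
    prob_all   = sum(Delay, na.rm = TRUE) / n(),  # NA rows stay in the denominator
    prob_known = mean(Delay, na.rm = TRUE)        # NA rows are dropped entirely
  )
```

The two columns agree for hour 6 and differ for hour 5, so which one to plot depends on whether flights with a missing delay should count as "not delayed".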
