ggplot works in individual code but does not work in a loop - r

I am having an issue with ggplot in a loop.
It works well on its own,
as below:
plotgg<-
ggplot(renewalplot, aes(x = Month,y=Rate)) +
stat_summary(fun = mean, na.rm = TRUE,geom="bar")+
labs(x = "Month") +
ggtitle("Rate Change Distribution")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(cols = vars(Year))
print(plotgg)
When I put it in a loop, it gives me an error:
vars <- colnames(detailinfo_renewal_1)
varslist1 = vars[c(13)]
for (i in varslist1) {
renewalplot <- detailinfo_renewal_1 %>%
filter(Product=="FI")
plotgg<-
ggplot(renewalplot, aes(x = renewalplot[, i],y=Rate)) +
stat_summary(fun = mean, na.rm = TRUE,geom="bar")+
labs(x = i) +
ggtitle("Mean of Rate Change Distribution")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(cols = vars(Year))
print(plotgg)
}
Much appreciated!
LC

The bit that you've changed when putting into a loop tells you a lot about where the error might be:
aes(x = renewalplot[, i],y=Rate)
This method of mapping an aesthetic won't work. Normally, when you map an aesthetic, you pass the bare name of a column in your data as the value of x, such as in x = Month. Behind the scenes, ggplot() then figures out the appropriate values from your data source.
An alternative way of mapping aesthetics is to use aes_string(), which takes column names as strings and would probably be better suited to your use case. Since i is already the name of the column as a string, this will fit right in. (Note that aes_string() is soft-deprecated as of ggplot2 3.0.0; the recommended replacement is aes() with the .data pronoun, as in x = .data[[i]].)
for (i in varslist1) {
renewalplot <- detailinfo_renewal_1 %>%
filter(Product=="FI")
plotgg<-
ggplot(renewalplot, aes_string(x = i, y = "Rate")) +
stat_summary(fun = mean, na.rm = TRUE,geom="bar")+
labs(x = i) +
ggtitle("Mean of Rate Change Distribution")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(cols = vars(Year))
print(plotgg)
}

You can write a function:
library(dplyr)
library(ggplot2)
plot_fn <- function(col) {
renewalplot <- detailinfo_renewal_1 %>% filter(Product=="FI")
ggplot(renewalplot, aes(x = .data[[col]],y=Rate)) +
stat_summary(fun = mean, na.rm = TRUE,geom="bar")+
labs(x = col) +
ggtitle("Mean of Rate Change Distribution")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(cols = vars(Year))
}
and use lapply to apply it for every value in varslist1.
list_plot <- lapply(varslist1, plot_fn)
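As a minimal sketch of how this might look end to end, the example below runs the same pattern on mock data. The data values and the column names Month and Region are invented for illustration; only the .data[[col]] mapping and the lapply call come from the answer above.

```r
library(dplyr)
library(ggplot2)

# Mock data standing in for detailinfo_renewal_1 (all values are invented)
detailinfo_renewal_1 <- data.frame(
  Product = rep(c("FI", "XX"), each = 6),
  Year    = rep(c(2019, 2020), times = 6),
  Month   = rep(c("Jan", "Feb", "Mar"), times = 4),
  Region  = rep(c("North", "South"), each = 3, times = 2),
  Rate    = c(0.02, 0.05, 0.01, 0.03, 0.04, 0.06,
              0.07, 0.02, 0.03, 0.05, 0.01, 0.04)
)

plot_fn <- function(col) {
  renewalplot <- detailinfo_renewal_1 %>% filter(Product == "FI")
  # .data[[col]] looks up the column whose name is stored in the string col
  ggplot(renewalplot, aes(x = .data[[col]], y = Rate)) +
    stat_summary(fun = mean, na.rm = TRUE, geom = "bar") +
    labs(x = col) +
    facet_grid(cols = vars(Year))
}

varslist1 <- c("Month", "Region")  # hypothetical column names
list_plot <- lapply(varslist1, plot_fn)
names(list_plot) <- varslist1

# Plots are not auto-printed inside a loop, so print each one explicitly
for (p in list_plot) print(p)
```

Naming the list elements after the columns makes it easy to pull out a single plot later, for example list_plot[["Month"]].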

Related

Selecting using geom_bar

I am using ggplot2 and geom_bar to summarize the number of spices in a column. I want the number of bars to be restricted to a value, say count > 10.
My code is:
ggplot(r_df, aes(spice, colour="black", fill = "orange")) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "black", size = 3.,
aes(label = after_stat(count)), position=position_stack(vjust=1.05)) +
theme(axis.text.x = element_text(angle=90, vjust=.6))
Since the code is counting the number of each item for me, I don't think I can subset prior to executing the plot. I am hoping there is a simple way to put the count>10 into the ggplot function itself.
Since I don't have the exact data in r_df, I'm taking a guess here. You should be able to use filter() prior to building your plot:
library(tidyverse)
r_df %>%
group_by(spice) %>%
filter(n() > 10) %>%
ungroup() %>%
ggplot(aes(spice, colour = "black", fill = "orange")) +
geom_bar(stat = "count") +
stat_count(
geom = "text",
colour = "black",
size = 3.,
aes(label = after_stat(count)),
position = position_stack(vjust = 1.05)
) +
theme(axis.text.x = element_text(angle = 90, vjust = .6))
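To see what the grouped filter does on its own, here is a small sketch on mock data. The spice names and counts are invented, and r_df is assumed to hold one row per observation.

```r
library(dplyr)

# Mock data: one row per observation, as assumed for r_df (values are invented)
r_df <- data.frame(
  spice = c(rep("basil", 12), rep("cumin", 3), rep("thyme", 15))
)

kept <- r_df %>%
  group_by(spice) %>%
  filter(n() > 10) %>%  # n() is the number of rows in the current group
  ungroup()

# Only spices with more than 10 rows remain
count(kept, spice)
```

Because whole groups are kept or dropped, the bar heights that geom_bar() later counts are unchanged for the spices that survive the filter.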

Add mean line to ggplot?

I currently have this plot:
[Image: current plot without mean line]
I want to add a continuous line in the plot that shows the mean value of each x-axis point.
How can I do this? Here is my code:
data <- ndpdata[which(ndpdata$FC.Fill.Size==250),] #250 fill size
data$PS_DATE <- as.Date(data$PS_DATE, "%Y-%m-%d")
data$PS_DATE <- as.Date(data$PS_DATE, "%m-%d-%Y")
data$final <- paste(data$PS_DATE, data$FC.Batch.Nbr, sep=" ") %>% na.omit()
library(tidyr)
my_df_long <- gather(data, group, y, -final)
data = my_df_long[2075:2550,] %>% na.omit()
ggplot(data, aes(final, y, color=final), na.rm=TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + theme(legend.position = "none") + geom_point(na.rm=TRUE) +
scale_y_discrete(breaks = c(251,270,290,310,325))
First, for future questions, please take note of MrFlick's comment.
We could use stat_summary. x should be a factor, with its levels in a meaningful order.
I can't test this because no data was provided:
ggplot(data, aes(x=factor(final), y, color=final), na.rm=TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + theme(legend.position = "none") + geom_point(na.rm=TRUE) +
scale_y_discrete(breaks = c(251,270,290,310,325)) +
stat_summary(fun=mean, colour="red", geom="line", aes(group = 1))
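Since the answer above could not be tested, here is a rough sketch of the same idea on mock data. It assumes y is numeric (the values and labels below are invented) and leaves the y scale at its continuous default.

```r
library(ggplot2)

# Mock data: three measurements per date/batch label (values are invented)
df <- data.frame(
  final = rep(c("2020-01-01 A", "2020-01-02 B", "2020-01-03 C"), each = 3),
  y     = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

p <- ggplot(df, aes(x = factor(final), y = y)) +
  geom_point() +
  # group = 1 joins the per-x means into one line across the discrete axis
  stat_summary(fun = mean, geom = "line", colour = "red", aes(group = 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

# The second layer holds the computed means, one per x value
layer_data(p, 2)[, c("x", "y")]
```

Without aes(group = 1), each x value forms its own group on a discrete axis and no connecting line is drawn.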

R Windrose percent label on figure

I am using the windrose function posted here: Wind rose with ggplot (R)?
I need to have the percents on the figure showing on the individual lines (rather than on the left side), but so far I have not been able to figure out how. (see figure below for depiction of goal)
Here is the code that makes the figure:
p.windrose <- ggplot(data = data,
aes(x = dir.binned,y = (..count..)/sum(..count..),
fill = spd.binned)) +
geom_bar()+
scale_y_continuous(breaks = ybreaks.prct,labels=percent)+
ylab("")+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica")
I marked up the figure I have so far with what I am trying to do! It'd be neat if the labels either auto-picked the location with the least wind in that direction, or if it had a tag for the placement so that it could be changed.
I tried using geom_text, but I get an error saying that "aesthetics must be valid data columns".
Thanks for your help!
One of the things you could do is to make an extra data.frame that you use for the labels. Since the data isn't available from your question, I'll illustrate with mock data below:
library(ggplot2)
# Mock data
df <- data.frame(
x = 1:360,
y = runif(360, 0, 0.20)
)
labels <- data.frame(
x = 90,
y = scales::extended_breaks()(range(df$y))
)
ggplot(data = df,
aes(x = as.factor(x), y = y)) +
geom_point() +
geom_text(data = labels,
aes(label = scales::percent(y, 1))) +
scale_x_discrete(breaks = seq(0, 1, length.out = 9) * 360) +
coord_polar() +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
@teunbrand's answer got me very close! I wanted to add the code I used to get everything just right in case anyone in the future has a similar problem.
# Create the labels:
x_location <- pi # x location of the labels
# Get the percentage
T_data <- data %>%
dplyr::group_by(dir.binned) %>%
dplyr::summarise(count= n()) %>%
dplyr::mutate(y = count/sum(count))
labels <- data.frame(x = x_location,
y = scales::extended_breaks()(range(T_data$y)))
# Create figure
p.windrose <- ggplot() +
geom_bar(data = data,
aes(x = dir.binned, y = (..count..)/sum(..count..),
fill = spd.binned))+
geom_text(data = labels,
aes(x=x, y=y, label = scales::percent(y, 1))) +
scale_y_continuous(breaks = waiver(),labels=NULL)+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
ylab("")+xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica") +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
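The label-building step can be tried in isolation. The sketch below uses invented wind directions and a hypothetical label position; it only shows how the per-direction shares and the percent labels are derived, not the full wind rose.

```r
library(dplyr)

# Mock wind data (invented): one row per observation with a binned direction
set.seed(1)
data <- data.frame(
  dir.binned = factor(sample(c("N", "E", "S", "W"), 200, replace = TRUE),
                      levels = c("N", "E", "S", "W"))
)

# Share of observations falling in each direction bin
T_data <- data %>%
  group_by(dir.binned) %>%
  summarise(count = n()) %>%
  mutate(y = count / sum(count))

# One label per "nice" break across the observed range of shares
label_df <- data.frame(
  x = 2,  # hypothetical position along the direction axis
  y = scales::extended_breaks()(range(T_data$y))
)
label_df$label <- scales::percent(label_df$y, accuracy = 1)
label_df
```

Changing the x value moves all labels to a different direction, which is one way to get the adjustable placement asked for in the question.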

geom_histogram with proportions and factor data

I'm trying to consistently plot histograms for zonal statistics from a thematic map. The data within a single zone often looks something like this:
dat <- data.frame("CLASS" = sample(LETTERS[1:6], 250, replace = TRUE,
prob = c(.15, .06, .35, .4, .02, 0)))
dat$CLASS <- factor(dat$CLASS, levels = LETTERS[1:6], ordered = T)
wherein not all possible classes may have been present in the zone.
I can pre-compute the data summary and use geom_bar and a manual colour scale to get consistent bar colours regardless of missing data:
library(dplyr)
library(ggplot2)
library(viridis)
dat_summ <- dat %>%
group_by(CLASS, .drop = FALSE) %>%
summarise(percentage = n() / nrow(.) * 100)
mancols <- viridis_pal()(6)
names(mancols) <- LETTERS[1:6]
ggplot(dat_summ) +
geom_bar(aes(x = CLASS, y = percentage, fill = CLASS),
stat = 'identity', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = mancols, drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
But I can't keep the colours consistent across plots when I try to use geom_histogram:
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = ..x..), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_c() +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
If any of the outside-edge columns (A, F) are count = 0, the colours rescale to where data is present. This doesn't happen if there's a gap in one of the middle classes. Using scale_fill_viridis_b() doesn't solve the problem - it always rescales the palette against the number of non-0 columns.
Is it possible to prevent this behaviour and output consistent colours no matter which columns are count = 0, or am I stuck with my geom_bar approach?
Maybe scale_fill_discrete/scale_fill_viridis_d(drop = F) is what you want (with fill = CLASS).
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_d(drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
I think that the problem is that you pass the calculated variable ..x.. to fill in the aesthetics. It appears the length of this variable changes with your data set. You could instead map fill to CLASS and use scale_fill_manual; with drop = FALSE, unused levels keep their slot in the scale, so you will get the same plot colours regardless of how many levels are present in your CLASS variable:
ggplot(dat) +
geom_histogram(aes(x = CLASS, y = stat(count/sum(count) * 100), fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = c("#FF0000FF", "#CCFF00FF", "#00FF66FF", "#0066FFFF", "#CC00FFFF", "#FF99FFFF"), drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
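One way to make the colour assignment explicit is to name the colour vector by class, as the question's own geom_bar version does. The sketch below uses a tiny invented data set in which class A never occurs and checks which fills the bar layer ends up with; the three colours are arbitrary.

```r
library(ggplot2)

# Mock zone in which class A never occurs (values are invented)
dat <- data.frame(CLASS = factor(c("B", "B", "C"), levels = c("A", "B", "C")))

# Named vector: each class is tied to a colour by name, not by position
mancols <- c(A = "#FF0000", B = "#00FF00", C = "#0000FF")

p <- ggplot(dat) +
  geom_bar(aes(x = CLASS, y = after_stat(count / sum(count) * 100), fill = CLASS),
           show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  scale_fill_manual(values = mancols, drop = FALSE) +
  labs(x = "Class", y = "Percent")

# Fill colours actually used by the bar layer
unique(layer_data(p, 1)$fill)
```

With names on the vector, B and C keep their own colours even though A is missing from the data.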

ggplot fill variable to add to 100%

Here is a dataframe
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'), 50, replace = TRUE))
I have created this graph.
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/sum(..count..))) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
Is there a way to make the data for each school year add up to 100%, but not have the data stacked, in the graph?
I know this question is similar to this question: Create stacked barplot where each stack is scaled to sum to 100%, but I don't want the graph to be stacked. I can't figure out how to apply the solution from that question to this situation. Also, I would prefer not to summarize the data before graphing, as I have to make this graph many times using different data each time.
I'm not sure how to create the plot that you want without transforming the data. But if you want to re-use the same code for multiple datasets, you can write a function to transform your data and generate the plot at the same time:
plot.fun <- function (original.data) {
newDF <- reshape2::melt(apply(table(original.data), 1, prop.table))
Plot <- ggplot(newDF, aes(x=Value, y=value)) +
geom_bar(aes(fill=SchoolYear), stat="identity", position="dodge") +
geom_text(aes(group=SchoolYear, label=scales::percent(value)), stat="identity", vjust=-0.25, size=2, position=position_dodge(width=0.85)) +
scale_y_continuous(labels=scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
return (Plot)
}
plot.fun(DF)
Big Disclaimer: I would highly recommend you summarize your data beforehand and not try to do these calculations within ggplot. That is not what ggplot is meant to do. Furthermore, it not only complicates your code unnecessarily, but can also easily introduce bugs and unintended results.
Given that, it appears that what you want is doable (without summarizing first). A very hacky way to get what you want by doing the calculations within ggplot would be:
#Store factor values
fac <- unique(DF$SchoolYear)
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))) +
geom_text(aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum), label = scales::percent((..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
This takes the ..count.. variable and divides it by the sum within its respective group using stats::ave. Note this can be messed up extremely easily.
Finally, we check to see that the plot is in fact giving us what we want.
#Check to see we have the correct values
library(data.table) # provides setDT() and the [, j, by] syntax used below
d2 <- DF
d2 <- setDT(d2)[, .(count = .N), by = .(SchoolYear, Value)][, percent := count/sum(count), by = SchoolYear]
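For comparison, the summarize-first route recommended in the disclaimer could be sketched as below. This uses dplyr rather than the data.table check above, the seed is arbitrary, and DF_summ is a name introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # arbitrary seed so the mock sample is reproducible
DF <- data.frame(
  SchoolYear = c("2015-2016", "2016-2017"),
  Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"),
                 50, replace = TRUE)
)

# Count each response per year, then convert to a share within that year
DF_summ <- DF %>%
  count(SchoolYear, Value) %>%
  group_by(SchoolYear) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

ggplot(DF_summ, aes(x = Value, y = percent, fill = SchoolYear)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = scales::percent(percent, accuracy = 1)),
            position = position_dodge(width = 0.9), vjust = -0.25, size = 2) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Response", y = "Percent")
```

Wrapping the summary and the plot in one function would keep the call site as short as the all-in-ggplot version while leaving the arithmetic easy to inspect.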
