Exclude observations below a certain threshold in a stacked bar chart in ggplot2 - r

I need to exclude some observations below a certain threshold in stacked bar chart done with ggplot2.
An example of my dataframe:
My code:
ggplot(df, aes(x=reorder(UserName,-Nb_Interrogations, sum), y=Nb_Interrogations, fill=Folder)) +
geom_bar(stat="identity") +
theme_bw()+
theme(legend.key.size = unit(0.5,"line"), legend.position = c(0.8,0.7)) +
labs(x = "UserName") +
ylim(0, 95000) +
scale_y_continuous(breaks = seq(0, 95000, 10000)) +
scale_fill_brewer(palette = "Blues") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
The problem is that I have many observations (UserName) with low values on the Y axes (Nb_Interrogations). So I'd like to exclude all the UserName below a certain threshold from the barplot, let's say 100.
I tried with the which function changing my code:
ggplot(df[which(df$Nb_Interrogations>100),], aes(x=reorder(UserName,-Nb_Interrogations, sum), y=Nb_Interrogations, fill=Folder)) +
geom_bar(stat="identity") +
theme_bw()+
theme(legend.key.size = unit(0.5,"line"), legend.position = c(0.8,0.7)) +
labs(x = "UserName") +
ylim(0, 95000) +
scale_y_continuous(breaks = seq(0, 95000, 10000)) +
scale_fill_brewer(palette = "Blues") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
But this doesn't fit my case: it removes every individual row below the threshold of 100 from the data, so the totals change and with them the values on the Y axis. How can I solve this problem? Thanks.

It looks like the simplest solution for you will involve subsetting your data first, and then plotting. Without workable data to test, this is just a theoretical answer, so you may have to adapt for your needs. You can pipe the subsetting and plotting together for ease. Something like this might do the trick for you:
df %>%
group_by(UserName) %>%
filter(sum(Nb_Interrogations) > 100) %>%
ggplot(., aes(x=reorder(UserName,-Nb_Interrogations, sum), y=Nb_Interrogations, fill=Folder)) +
## the rest of your plotting code here ##
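
Since the question includes no data, here is a minimal sketch of the same idea on a made-up data frame. The values, the user names, and the `threshold` and `kept` variables are assumptions for illustration only. The point it shows is that the filter is evaluated per user on the total, so a user who passes keeps all of their rows, including the small ones.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: three users, two folders each
df <- data.frame(
  UserName = rep(c("alice", "bob", "carol"), each = 2),
  Folder = rep(c("F1", "F2"), times = 3),
  Nb_Interrogations = c(5000, 60, 40, 30, 900, 20)
)

threshold <- 100  # assumed cut-off on the per-user total

# Keep every row of a user whose TOTAL exceeds the threshold
kept <- df %>%
  group_by(UserName) %>%
  filter(sum(Nb_Interrogations) > threshold) %>%
  ungroup()

# bob (total 70) is dropped; alice keeps her 60-count row
p <- ggplot(kept, aes(x = reorder(UserName, -Nb_Interrogations, sum),
                      y = Nb_Interrogations, fill = Folder)) +
  geom_col() +
  labs(x = "UserName")
print(p)
```

Because whole users are removed rather than individual rows, the bar heights of the remaining users are unchanged.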

Related

Add mean line to ggplot?

I currently have this plot:
[Image: current plot without mean line]
I want to add a continuous line in the plot that shows the mean value of each x-axis point.
How can I do this? Here is my code:
data <- ndpdata[which(ndpdata$FC.Fill.Size==250),] #250 fill size
data$PS_DATE <- as.Date(data$PS_DATE, "%Y-%m-%d")
data$PS_DATE <- as.Date(data$PS_DATE, "%m-%d-%Y")
data$final <- paste(data$PS_DATE, data$FC.Batch.Nbr, sep=" ") %>% na.omit()
library(tidyr)
my_df_long <- gather(data, group, y, -final)
data = my_df_long[2075:2550,] %>% na.omit()
ggplot(data, aes(final, y, color=final), na.rm=TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + theme(legend.position = "none") + geom_point(na.rm=TRUE) +
scale_y_discrete(breaks = c(251,270,290,310,325))
First, for future questions, please take note of MrFlick's comment.
We could use stat_summary. x should be factor and in a meaningful order.
I can't test because no data provided:
ggplot(data, aes(x=factor(final), y, color=final), na.rm=TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + theme(legend.position = "none") + geom_point(na.rm=TRUE) +
scale_y_discrete(breaks = c(251,270,290,310,325)) +
stat_summary(fun=mean, colour="red", geom="line", aes(group = 1))
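
As a self-contained illustration of the `stat_summary` approach, here is a small sketch on invented data (the `toy` data frame and its values are assumptions, not the asker's data). It assumes `y` is numeric; if `y` arrives as character after reshaping, it would need converting first, and a continuous y scale would be used instead of `scale_y_discrete`.

```r
library(ggplot2)

# Hypothetical data: three batches with three measurements each
toy <- data.frame(
  final = rep(c("2020-01-01 A", "2020-01-02 B", "2020-01-03 C"), each = 3),
  y = c(251, 255, 259, 270, 272, 274, 290, 300, 310)
)

p <- ggplot(toy, aes(x = factor(final), y = y)) +
  geom_point() +
  # group = 1 joins the per-x means into one continuous line
  stat_summary(fun = mean, geom = "line", colour = "red", aes(group = 1))
print(p)

# The values the red line passes through, computed by hand for comparison
means <- tapply(toy$y, toy$final, mean)
```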

How to change color of data points on a boxplot based on a factored variable using ggplot R

I am trying to make a series of graphs that are based on a binomial variable. I want to add data points to the graph, coloured by a different factor variable with 3 levels. I have been using geom_jitter, which put the points on the box plot, but I haven't been able to change the colors to represent the different levels of the factor.
Here is the code I have been using
longg <- ggplot(long, aes(x = mbbase, y= beta)) +
geom_boxplot() + facet_wrap(~test) +
ylab("Beta") +
theme_cleveland() +
scale_fill_viridis(discrete = TRUE, alpha=0.09) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(axis.title.x = element_blank()) +
geom_jitter( size=0.7, alpha=1, width = 0.05)
Here is an example of the graph I want, made with the mtcars data. Instead of a numeric variable as the color, I'd like a factor variable with 3 levels. I only want the color of the data points to change, without adding a new box plot for each level of the factor.
With mtcars, you can try this:
library(ggplot2)
library(dplyr)
library(viridis)
mtcars %>%
# optional: divide the column to color in three. There are more elegant ways
#to do it, but in this way probably it's easier to use it in your data
mutate(new_carb = as.factor(ifelse(carb %in% c(1,2),1,
ifelse(carb %in% c(3,4),2,3)))) %>%
ggplot( aes(x = as.factor(am), y= mpg)) +
# draw the boxes once, without outlier markers: the jittered points already show every observation
geom_boxplot(outlier.shape=NA) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
axis.title.x = element_blank()) +
# add the color here
geom_jitter(aes(color = new_carb),
size=0.7, alpha=1, width = 0.05) +
scale_color_viridis(discrete = TRUE, alpha=0.09)
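
One of the "more elegant ways" to build the three-level factor might be base R's `cut`. The sketch below is only an alternative to the nested `ifelse`, with break points chosen (as an assumption) to reproduce the same grouping of `carb`.

```r
# Bin carb into three groups: (0,2] -> "1", (2,4] -> "2", (4,Inf] -> "3"
new_carb_cut <- cut(mtcars$carb, breaks = c(0, 2, 4, Inf), labels = c("1", "2", "3"))

# The nested ifelse from the answer, for comparison
new_carb_ifelse <- as.factor(ifelse(mtcars$carb %in% c(1, 2), 1,
                             ifelse(mtcars$carb %in% c(3, 4), 2, 3)))
```

Either vector can be mapped to `color` inside `geom_jitter(aes(...))`.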

How do I represent percent of a variable in a filled barplot?

I have a data frame (t1) and I want to illustrate the shares of companies by size.
I added a dummy variable so that I get a single filled bar rather than three:
t1$row <- 1
Company sizes are split into medium, small and micro:
f_size <- factor(t1$size,
ordered = TRUE,
levels = c("medium", "small", "micro"))
The plot is built with theme_economist:
ggplot(t1, aes(x = "Size", y = prop.table(row), fill = f_size)) +
geom_col() +
geom_text(aes(label = as.numeric(f_size)),
position = position_stack(vjust = 0.5)) +
theme_economist(base_size = 14) +
scale_fill_economist() +
theme(legend.position = "right",
legend.title = element_blank()) +
theme(axis.title.y = element_text(margin = margin(r = 20))) +
ylab("Percentage") +
xlab(NULL)
How can I modify my code to get the share for medium, small and micro in the middle of the three filled parts in the barplot?
Thanks in advance!
Your question isn't quite clear to me, and I suggest you rephrase it. But I believe you're trying to get the annotations accurately aligned on the Y-axis. To do this, pre-calculate the labels and then use annotate:
library(data.table)
library(ggplot2)
set.seed(3432)
df <- data.table(
cat= sample(LETTERS[1:3], 1000, replace = TRUE)
, x= rpois(1000, lambda = 5)
)
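# ggplot2 stacks the last level at the bottom, so order the rows bottom-to-top
# before taking the running total; each label then sits at the middle of its segment
tmp <- df[, .(pct= sum(x) / sum(df[,x])), cat][order(cat, decreasing = TRUE)][, cumsum := cumsum(pct)]
ggplot(tmp, aes(x= 'size', y= pct, fill= cat)) + geom_bar(stat='identity') +
annotate('text', y= tmp[, cumsum - pct / 2], x= 1, label= scales::percent(tmp[,pct], accuracy = 0.1))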
But this is a poor choice graphically. The segments of this stacked bar sum to 100% by construction. Rather than labeling the components with text, just let the graphic do this for you via the axis labels:
ggplot(tmp, aes(x= cat, y= pct, fill= cat)) + geom_bar(stat='identity') + coord_flip() +
scale_y_continuous(breaks= seq(0,1,.05))
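
If labels centred inside the stacked segments are still wanted, one possible route is to compute the shares first and let `position_stack(vjust = 0.5)` place the text. This is a sketch on invented data: the `t1` contents and the `shares` table are assumptions standing in for the asker's data frame.

```r
library(ggplot2)

# Hypothetical stand-in for t1: 2 medium, 3 small and 5 micro companies
t1 <- data.frame(size = c(rep("medium", 2), rep("small", 3), rep("micro", 5)))
t1$size <- factor(t1$size, levels = c("medium", "small", "micro"))

# One row per size with its share of all companies (columns: size, Freq)
shares <- as.data.frame(prop.table(table(size = t1$size)))

p <- ggplot(shares, aes(x = "Size", y = Freq, fill = size)) +
  geom_col() +
  # vjust = 0.5 centres each label vertically within its own segment
  geom_text(aes(label = scales::percent(Freq)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Percentage")
print(p)
```

Because the text layer uses the same stacking as the bars, the labels follow the segments without any manual cumulative sums.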

Grouped bar chart, ggplot2 in R, need to display one observation at a time

I have a dataset that looks like the following:
df <- data.frame(Name=rep(c('Sarah', 'Casey', 'Mary', 'Tom'), 3),
Scale=rep(c('Scale1', 'Scale2', 'Scale3'), 4),
Score=sample(1:7, 12, replace=T))
I am trying to create a bar chart in ggplot2 that currently looks like this:
ggplot(df, aes(x=Name, y=Score, fill=Scale)) + geom_bar(stat='identity', position='dodge') +
coord_flip() +
scale_y_continuous(breaks=seq(0, 7, 1), limits = c(0, 7)) +
scale_x_discrete() +
scale_fill_manual(values=c('#253494', '#2c7fb8', '#000000')) +
theme(panel.background = element_blank(),
legend.position = 'right',
axis.line = element_line(),
axis.title = element_blank(),
axis.text = element_text(size=10))
However, I only want to show one observation (one Name) at a time. Is this possible to do without creating a ton of separate datasets, one for each person? I would like the end result to look like the example below, where I can just iterate through the names to produce a separate plot for each, or some similar process.
# Trying to avoid creating separate datasets, but for the sake of the example:
df2 <- data.frame(Name=rep(c('Sarah'), 3),
Scale=c('Scale1', 'Scale2', 'Scale3'),
Score=sample(1:7, 3, replace=T))
ggplot(df2, aes(x=Name, y=Score, fill=Scale)) + geom_bar(stat='identity', position='dodge') +
coord_flip() +
scale_y_continuous(breaks=seq(0, 7, 1), limits = c(0, 7)) +
scale_x_discrete() +
scale_fill_manual(values=c('#253494', '#2c7fb8', '#000000')) +
theme(panel.background = element_blank(),
legend.position = 'right',
axis.line = element_line(),
axis.title = element_blank(),
axis.text = element_text(size=10))
Since your data is already tidy, i.e. in long format, you can use facet_wrap as suggested and set the scales to "free", which creates one facet per Name group.
df %>% ggplot(aes(y = Score, x = Name)) +
geom_bar(stat = "identity", aes(colour = Scale, fill = Scale),
position = "dodge") +
coord_flip() +
facet_wrap(~Name, scales = "free")
You can get rid of the facet labels or the axis labels depending which you prefer.
EDIT: in response to comment.
You can use the same data frame to create separate plots by piping a filter in at the start:
df %>%
filter(Name == "Sarah") %>%
ggplot(aes(y = Score, x = Name)) +
geom_bar(stat = "identity", aes(colour = Scale, fill = Scale),
position = "dodge") +
coord_flip()
Since you are using Rmarkdown you could throw a for loop around that to plot all the names
for(i in c("Sarah", "Casey", "Mary", "Tom")){
p <- df %>%
filter(Name == i) %>%
ggplot(aes(y = Score, x = Name)) +
geom_bar(stat = "identity", aes(colour = Scale, fill = Scale),
position = "dodge") +
coord_flip()
# ggplot objects are not drawn automatically inside a loop, so print explicitly
print(p)
}
If you want to arrange all these into a group you can use ggpubr::ggarrange to place all the plots into the same object.
facet_grid(.~Name)
Adding this to the plot may also work: it draws all the names at once, but each in its own panel.
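
Another way to avoid hand-made per-person datasets might be to split the data frame by Name and build one plot per piece. The sketch below assumes the question's `df` layout; `plot_one`, `parts` and `plots` are hypothetical names introduced only for this example.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Name = rep(c('Sarah', 'Casey', 'Mary', 'Tom'), 3),
                 Scale = rep(c('Scale1', 'Scale2', 'Scale3'), 4),
                 Score = sample(1:7, 12, replace = TRUE))

# Hypothetical helper: the same chart for whatever subset it receives
plot_one <- function(d) {
  ggplot(d, aes(x = Name, y = Score, fill = Scale)) +
    geom_col(position = "dodge") +
    coord_flip() +
    scale_y_continuous(breaks = 0:7, limits = c(0, 7)) +
    scale_fill_manual(values = c('#253494', '#2c7fb8', '#000000'))
}

# One data frame per person, then one plot per data frame
parts <- split(df, df$Name)
plots <- lapply(parts, plot_one)

print(plots[["Sarah"]])  # draw a single person's chart
```

The resulting named list can be printed one element at a time or passed on to an arranging function.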

ggplot fill variable to add to 100%

Here is a dataframe
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'), 50, replace = TRUE))
I have created this graph.
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/sum(..count..))) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
Is there a way to make the data for each school year add up to 100%, but not have the data stacked, in the graph?
I know this is similar to the question "Create stacked barplot where each stack is scaled to sum to 100%", but I don't want the graph to be stacked, and I can't figure out how to apply that question's solution to my situation. I would also prefer not to summarize the data before graphing, as I have to make this graph many times with different data.
I'm not sure how to create the plot that you want without transforming the data. But if you want to re-use the same code for multiple datasets, you can write a function to transform your data and generate the plot at the same time:
plot.fun <- function (original.data) {
newDF <- reshape2::melt(apply(table(original.data), 1, prop.table))
Plot <- ggplot(newDF, aes(x=Value, y=value)) +
geom_bar(aes(fill=SchoolYear), stat="identity", position="dodge") +
geom_text(aes(group=SchoolYear, label=scales::percent(value)), stat="identity", vjust=-0.25, size=2, position=position_dodge(width=0.85)) +
scale_y_continuous(labels=scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
return (Plot)
}
plot.fun(DF)
Big Disclaimer: I would highly recommend you summarize your data beforehand and not try to do these calculations within ggplot. That is not what ggplot is meant to do. It not only complicates your code unnecessarily, but can also easily introduce bugs and unintended results.
Given that, it appears that what you want is doable (without summarizing first). A very hacky way to get what you want by doing the calculations within ggplot would be:
#Store factor values
fac <- unique(DF$SchoolYear)
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))) +
geom_text(aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum), label = scales::percent((..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
This takes the ..count.. variable and divides it by the sum within its respective group using stats::ave. Note that this can go wrong extremely easily.
Finally, we check to see the plot is in fact giving us what we want.
#Check to see we have the correct values
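library(data.table)
# as.data.table() makes a copy, so DF itself stays a plain data frame
# (setDT() on an alias of DF would convert DF in place)
d2 <- as.data.table(DF)[, .(count = .N), by = .(SchoolYear, Value)][, percent := count/sum(count), by = SchoolYear]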
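
For comparison, the summarize-first route recommended in the disclaimer could look roughly like this with dplyr. It is a sketch under assumptions: the seed, the `pct` table and its `share` column are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
                 Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'),
                                50, replace = TRUE))

# Count responses, then turn counts into shares within each school year
pct <- DF %>%
  count(SchoolYear, Value) %>%
  group_by(SchoolYear) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

p <- ggplot(pct, aes(x = Value, y = share, fill = SchoolYear)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_dodge(width = 0.9), vjust = -0.25, size = 2) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent") + xlab("Response") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
print(p)

# Each school year's shares should add up to 1
totals <- tapply(pct$share, pct$SchoolYear, sum)
```

Wrapping the summarizing step and the plot in one function would keep this reusable across datasets while leaving the arithmetic outside ggplot.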

Resources