Geom_errorbar adding multiple error bars to bar plot (ggplot) - r

I'm trying to create a ggplot based on the means of various groups in my data. The means were found using aggregate;
# create function for mean and standard error
f1 <- function(x) c(Mean = mean(x), std_error = std.error(x))
coverboard_stat <- aggregate(no_count ~ EEM.or.NCOS + Species + Period..Oct.Sep. + Habitat, data = coverboard, f1)
so it now looks like this (as an example; the real data-set is much larger)
Group     no_count[,"Mean"]    no_count[,"std_error"]
Type 1    1                    .05
Type 2    2                    .75
this is my barplot code:
ggplot(aes(x = Species, y = no_count[,"Mean"]), data = post_rest) +
geom_bar(aes(fill=EEM.or.NCOS), stat = "identity", position = "dodge") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.9, hjust=1)) +
labs(fill="Site") +
scale_fill_brewer(palette = "YlOrRd", labels=c("South Parcel", "North Campus Open Space")) +
labs(x="", y="Encounter rate") +
theme(legend.position="bottom")+
geom_errorbar(aes(ymin=no_count[,"Mean"]-no_count[,"std_error"],
ymax=no_count[,"Mean"]+no_count[,"std_error"],
fill=EEM.or.NCOS),
width=.2,
position=position_dodge(.9))
And this is the result I get:
Obviously this isn't correct; I just want one error bar for each bar.
I tried formatting the aes() call differently and using summary instead of aggregate (which led me down a whole different rabbit hole of errors).
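One possible cause, offered as a guess since the plot data are not shown: the summary is grouped by four variables (site, species, period, habitat) while the plot only separates bars by species and site, so each bar position receives one error bar per period and habitat combination. A minimal sketch under that assumption is below. The data are made up, the standard error is written out by hand instead of using std.error(), and the column names mean and se are chosen here for illustration.

```r
# Hypothetical stand-in for the coverboard data (values are made up).
set.seed(1)
coverboard <- expand.grid(
  Species     = c("Lizard", "Skink"),
  EEM.or.NCOS = c("EEM", "NCOS"),
  Habitat     = c("Grassland", "Scrub"),
  rep         = 1:5
)
coverboard$no_count <- rpois(nrow(coverboard), lambda = 3)

# Standard error written out by hand so the sketch needs no extra package.
se_fun <- function(x) sd(x) / sqrt(length(x))
f1 <- function(x) c(Mean = mean(x), std_error = se_fun(x))

# Summarise by exactly the variables that appear in the plot (x and fill),
# so there is one row, and therefore one error bar, per bar.
stat <- aggregate(no_count ~ Species + EEM.or.NCOS, data = coverboard, FUN = f1)

# aggregate() stores both statistics in one matrix column; flatten it.
stat <- do.call(data.frame, stat)
names(stat) <- c("Species", "EEM.or.NCOS", "mean", "se")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stat, aes(x = Species, y = mean, fill = EEM.or.NCOS)) +
    geom_col(position = position_dodge(0.9)) +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se, group = EEM.or.NCOS),
                  width = 0.2, position = position_dodge(0.9))
  print(p)
}
```

If the period or habitat breakdown is needed, facetting by those variables (for example with facet_grid()) would be another way to keep one error bar per bar.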

Related

How to label plot with value of bars?

I'm trying to make a barplot with two categorical values. This particular thread was very helpful
My code was this
ggplot(DF, aes(Participant.Type, ..count..)) +
geom_bar(aes(fill=Sex), position ="dodge") +
theme_classic() +
ggtitle("Main phenotypes stated for the PCDH19 cohort on GEL") +
scale_fill_viridis(option ="viridis")
This was my resulting graph. I'm now trying to add the count of the particular bars on top - like Female proband is 135, Male proband is 165 and so on. I tried adding different iterations of the geom_text command so I could achieve this. Commands here:
+ geom_text(aes(label= ..count))
+ geom_text(aes(label= Sex))
Could anyone please help?
With some sample data from that question you linked you can do it like this:
library(ggplot2)
library(viridis)
#> Loading required package: viridisLite
Fruit <- c(rep("Apple", 3), rep("Orange", 5))
Bug <- c("worm", "spider", "spider", "worm", "worm", "worm", "worm", "spider")
df <- data.frame(Fruit, Bug)
ggplot(df, aes(Fruit, fill = Bug)) + geom_bar(position = "dodge") +
geom_text(
aes(label = after_stat(count)),
stat = "count",
vjust = -0.5,
position = position_dodge(width = 0.9)
) +
geom_text(
aes(y = after_stat(count), label = Bug),
stat = "count",
vjust = -1.5,
position = position_dodge(width = 0.9)
) +
scale_y_continuous(expand = expansion(add = c(0, 1))) +
scale_fill_viridis(option = "viridis", discrete = TRUE)
A few things to note:
geom_bar doesn't need ..count.. passed as a y-value - it defaults to counting
after_stat(count) is the updated form of .. notation
Text labels need dodges added - default width is 0.9 for bars so this width matches the placement of the bars.
I can't test the process without your input data, but here's something for you to give a try:
+ geom_text(stat='count', aes(label=..count..), vjust=-1)

Using geom_pointrange() to plot means and standard errors

I have three groups (categorical variable) who completed a test and a dataframe with the mean and standard error on the test per group. I would like to plot their means as a single point in the plot accompanied by a short horizontal line indicating the standard error (i.e., error bars). I'm using R with ggplot2.
My x-axis represents all the possible scores in the test (from -218 to 218) and the groups are plotted on the y-axis (I used coord_flip() for this).
I was able to create the graph but the standard error lines don't show up, so I don't know what I'm doing wrong. I think it has to do with my use of geom_pointrange(), but I have no idea what I'm supposed to change.
This is my code:
ggplot(descriptive_blp_data) +
aes(x = group, y = mean_blp, colour = group, size = 5) +
geom_pointrange(aes(ymin = mean_blp - se_blp, ymax = mean_blp + se_blp), width=.2,
position=position_dodge(.9)) +
scale_color_manual(
values = list(
Group_2 = "#9EBCDA",
Group_3 = "#8856A7",
Group_1 = "#E0ECF4"
)
) +
labs(y = "Mean BLP score (SE)") +
coord_flip() +
theme_classic() +
theme(legend.position = "none", axis.title.y = element_blank()) +
ylim(-218, 218)
And this is my graph so far:
It would be easier to check this if you could provide the actual dataframe descriptive_blp_data. Running your code with an arbitrary dataset works as intended and produces error bars, so there is nothing really wrong with the ggplot part.
There may be a few reasons why this does not work with your actual dataset. For example, the standard errors may be too small to show up next to a point of size 5.
descriptive_blp_data <- data.frame(
"group" = c("Group_3", "Group_2", "Group_1"),
"mean_blp" = c(150, 50, -50),
"se_blp" = c(40, 20, 30)
)
library(ggplot2)
ggplot(descriptive_blp_data) +
aes(x = group, y = mean_blp, colour = group, size = 5) +
geom_pointrange(aes(ymin = mean_blp - se_blp, ymax = mean_blp + se_blp), width=.2,
position=position_dodge(.9)) +
scale_color_manual(
values = list(
Group_2 = "#9EBCDA",
Group_3 = "#8856A7",
Group_1 = "#E0ECF4"
)
) +
labs(y = "Mean BLP score (SE)") +
coord_flip() +
theme_classic() +
theme(legend.position = "none", axis.title.y = element_blank()) +
ylim(-218, 218)
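If the interval is simply hidden behind a large point, one option is to draw the interval and the point as separate layers. The sketch below is only an illustration with the same made-up numbers as above: geom_errorbar() draws the interval with end caps (its width argument controls the cap width), and geom_point() takes a fixed size outside aes(). The lower and upper columns are added here for convenience and are not part of the original data.

```r
descriptive_blp_data <- data.frame(
  group    = c("Group_3", "Group_2", "Group_1"),
  mean_blp = c(150, 50, -50),
  se_blp   = c(40, 20, 30)
)

# Precompute the interval ends so they can be inspected before plotting.
descriptive_blp_data$lower <- with(descriptive_blp_data, mean_blp - se_blp)
descriptive_blp_data$upper <- with(descriptive_blp_data, mean_blp + se_blp)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(descriptive_blp_data,
              aes(x = group, y = mean_blp, colour = group)) +
    geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
    geom_point(size = 3) +  # constant size, set outside aes()
    labs(y = "Mean BLP score (SE)") +
    coord_flip() +
    ylim(-218, 218) +
    theme_classic() +
    theme(legend.position = "none", axis.title.y = element_blank())
  print(p)
}
```

Printing descriptive_blp_data before plotting also shows at a glance whether the intervals in the real data are narrow compared with the -218 to 218 axis range.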

Exclude observations below a certain threshold in a stacked bar chart in ggplot2

I need to exclude some observations below a certain threshold in stacked bar chart done with ggplot2.
An example of my dataframe:
My code:
ggplot(df, aes(x=reorder(UserName,-Nb_Interrogations, sum), y=Nb_Interrogations, fill=Folder)) +
geom_bar(stat="identity") +
theme_bw()+
theme(legend.key.size = unit(0.5,"line"), legend.position = c(0.8,0.7)) +
labs(x = "UserName") +
ylim(0, 95000) +
scale_y_continuous(breaks = seq(0, 95000, 10000)) +
scale_fill_brewer(palette = "Blues") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
The problem is that I have many observations (UserName) with low values on the Y axes (Nb_Interrogations). So I'd like to exclude all the UserName below a certain threshold from the barplot, let's say 100.
I tried with the which function changing my code:
ggplot(df[which(df$Nb_Interrogations>100),], aes(x=reorder(UserName,-Nb_Interrogations, sum), y=Nb_Interrogations, fill=Folder)) +
geom_bar(stat="identity") +
theme_bw()+
theme(legend.key.size = unit(0.5,"line"), legend.position = c(0.8,0.7)) +
labs(x = "UserName") +
ylim(0, 95000) +
scale_y_continuous(breaks = seq(0, 95000, 10000)) +
scale_fill_brewer(palette = "Blues") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
But this doesn't fit my case: it drops every individual row below 100, including rows that belong to users I want to keep, so the bar totals and the Y axis values change as well. How can I solve this problem? Thanks.
It looks like the simplest solution for you will involve subsetting your data first, and then plotting. Without workable data to test, this is just a theoretical answer, so you may have to adapt for your needs. You can pipe the subsetting and plotting together for ease. Something like this might do the trick for you:
library(dplyr)
library(ggplot2)
df %>%
group_by(UserName) %>%
# keep every row of a user whose total exceeds the threshold
filter(sum(Nb_Interrogations) > 100) %>%
ggplot(., aes(x=reorder(UserName,-Nb_Interrogations, sum), y=Nb_Interrogations, fill=Folder)) +
## the rest of your plotting code here ##
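The same idea can be sketched in base R, which makes the difference between filtering rows and filtering users easy to check. The data frame below is invented for illustration (user names, folders, and counts are all assumptions); ave() returns each user's total repeated on every one of that user's rows.

```r
# Hypothetical example data: one row per UserName/Folder combination.
df <- data.frame(
  UserName          = c("A", "A", "B", "B", "C"),
  Folder            = c("x", "y", "x", "y", "x"),
  Nb_Interrogations = c(50, 30, 200, 10, 120)
)

# Total per user, aligned with the original rows.
user_total <- ave(df$Nb_Interrogations, df$UserName, FUN = sum)

# Row-wise filter (the original attempt): also drops B's small segment.
row_filtered <- df[df$Nb_Interrogations > 100, ]

# User-wise filter: keeps all rows of users whose total is above 100.
user_filtered <- df[user_total > 100, ]

print(row_filtered)
print(user_filtered)
```

Here user A (total 80) is removed entirely, while user B keeps both segments, so B's stacked bar still sums to 210.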

ggplot fill variable to add to 100%

Here is a dataframe
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'), 50, replace = TRUE))
I have created this graph.
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/sum(..count..))) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
Is there a way to make the data for each school year add up to 100%, but not have the data stacked, in the graph?
I know this question is similar to "Create stacked barplot where each stack is scaled to sum to 100%", but I don't want the graph to be stacked, and I can't figure out how to apply the solution from that question to this situation. I would also prefer not to summarize the data before graphing, as I have to make this graph many times with different data and would rather not summarize the data each time.
I'm not sure how to create the plot that you want without transforming the data. But if you want to re-use the same code for multiple datasets, you can write a function to transform your data and generate the plot at the same time:
plot.fun <- function (original.data) {
newDF <- reshape2::melt(apply(table(original.data), 1, prop.table))
Plot <- ggplot(newDF, aes(x=Value, y=value)) +
geom_bar(aes(fill=SchoolYear), stat="identity", position="dodge") +
geom_text(aes(group=SchoolYear, label=scales::percent(value)), stat="identity", vjust=-0.25, size=2, position=position_dodge(width=0.85)) +
scale_y_continuous(labels=scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
return (Plot)
}
plot.fun(DF)
Big Disclaimer: I would highly recommend you summarize your data beforehand and not try to do these calculations within ggplot. That is not what ggplot is meant to do. Doing so not only complicates your code unnecessarily, but can also easily introduce bugs or unintended results.
Given that, it appears that what you want is doable (without summarizing first). A very hacky way to get what you want by doing the calculations within ggplot would be:
#Store factor values
fac <- unique(DF$SchoolYear)
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))) +
geom_text(aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum), label = scales::percent((..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
This takes the ..count.. variable and divides it by the sum within its respective group using stats::ave. Note that this can go wrong extremely easily, because it relies on the order of the computed counts matching the stored factor values.
Finally, we check to see the plot is in fact giving us what we want.
#Check to see we have the correct values
library(data.table)
d2 <- DF
d2 <- setDT(d2)[, .(count = .N), by = .(SchoolYear, Value)][, percent := count/sum(count), by = SchoolYear]
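For comparison, the "summarize first" route recommended above can be quite short in base R. This is only a sketch: the seed and the column name percent are choices made here, and prop.table() with margin = 1 is used so that each school year's shares sum to 1.

```r
set.seed(42)
DF <- data.frame(
  SchoolYear = c("2015-2016", "2016-2017"),
  Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"),
                 50, replace = TRUE)
)

# Cross-tabulate, then convert counts to within-year proportions.
counts <- table(DF$SchoolYear, DF$Value, dnn = c("SchoolYear", "Value"))
shares <- as.data.frame(prop.table(counts, margin = 1), responseName = "percent")

# Each school year's proportions should add up to 1.
year_totals <- tapply(shares$percent, shares$SchoolYear, sum)
print(year_totals)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(shares, aes(x = Value, y = percent, fill = SchoolYear)) +
    geom_col(position = "dodge") +
    ylab("Percent") + xlab("Response")
  print(p)
}
```

Wrapping the two summary lines in a small function would let the same plot be produced for each new dataset without repeating the calculation by hand.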

How to plot 95 percentile and 5 percentile on ggplot2 plot with already calculated values?

I have this dataset and use this R code:
library(reshape2)
library(ggplot2)
library(RGraphics)
library(gridExtra)
long <- read.csv("long.csv")
ix <- 1:14
ggp2 <- ggplot(long, aes(x = id, y = value, fill = type)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = numbers), vjust=-0.5, position = position_dodge(0.9), size = 3, angle = 0) +
scale_x_continuous("Nodes", breaks = ix) +
scale_y_continuous("Throughput (Mbps)", limits = c(0,1060)) +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)",
"Inside Firewall (Source)",
"Outside Firewall (Dest)",
"Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right") +
theme(legend.title = element_text(colour="black", size=14, face="bold")) +
theme(legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
plot(ggp2)
to get the following result:
Now I need to add the 95th percentile and 5th percentile to the plot. The numbers are already calculated in this dataset (columns NFPnumbers for the 95th percentile and FPnumbers for the 5th percentile).
It seems boxplot() may work here but I am not sure how to use it with ggplot.
stat_quantile(quantiles = c(0.05,0.95)) could work as well, but the function calculates the numbers itself. Can I use my numbers here?
I also tried:
geom_line(aes(x = id, y = long$FPnumbers)) +
geom_line(aes(x = id, y = long$NFPnumbers))
but the result did not look good enough.
geom_boxplot() did not work either:
geom_boxplot(aes(x = id, y = long$FPnumbers)) +
geom_boxplot(aes(x = id, y = long$NFPnumbers))
When you want to set the parameters for a boxplot, you also need ymin and ymax values. As they are not in the dataset, I calculated them.
ggplot(long, aes(x = factor(id), y = value, fill = type)) +
geom_boxplot(aes(lower = FPnumbers, middle = value, upper = NFPnumbers, ymin = FPnumbers*0.5, ymax = NFPnumbers*1.2, fill = type), stat = "identity") +
xlab("Nodes") +
ylab("Throughput (Mbps)") +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)", "Inside Firewall (Source)",
"Outside Firewall (Dest)", "Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right",
legend.title = element_text(colour="black", size=14, face="bold"),
legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
The result:
In the dataset you provided, you gave the value, FPnumbers & NFPnumbers variables. As FPnumbers & NFPnumbers represent the 5 and 95 percentiles, I suppose that the mean is represented by value. For this solution to work, you'll need min and max values for each "Node". I guess you have them somewhere in your raw data.
However, as they are not provided in the dataset, I made them up by calculating them based on FPnumbers & NFPnumbers. The multiplication factors of 0.5 and 1.2 are arbitrary. It is just a way of creating fictitious min and max values.
There are several suitable geoms for that, geom_errorbar is one of them:
ggp2 + geom_errorbar(aes(ymax = NFPnumbers, ymin = FPnumbers), alpha = 0.5, width = 0.5)
I don't know if there's a way to get rid of the central line though.

Resources