reorder bars of ggplot with increasing y value - r

I need to reorder the bars of the plot by increasing value of y (nbi).
Thanks in advance!
#NBI*Sig_lip
library(ggplot2)
p4 <- ggplot(DF, aes(x = sig_lip, y = nbi, fill = sig_lip)) +
  stat_summary(fun = mean, geom = "bar", show.legend = TRUE) +
  stat_summary(fun = mean,
               fun.min = function(y) mean(y) - sd(y),
               fun.max = function(y) mean(y) + sd(y),
               geom = "errorbar", width = 0.2) +
  theme_minimal()
p4 + coord_flip() + ggtitle(label = "nbi associated to signaling lipids")

Here the bars show means (and SDs) that ggplot2 calculates for you inside stat_summary(), so the order has to be based on the per-group mean of nbi rather than on a column of your data.
The most explicit way is to calculate Mean and SD before passing them to ggplot2, and then reorder by Mean:
library(dplyr)
library(ggplot2)
DF %>%
  group_by(sig_lip) %>%
  summarise(Mean = mean(nbi, na.rm = TRUE),
            SD = sd(nbi, na.rm = TRUE)) %>%
  # reorder(sig_lip, Mean) sorts bars by increasing mean; use -Mean for decreasing
  ggplot(aes(x = reorder(sig_lip, Mean), y = Mean, fill = sig_lip)) +
  geom_col() +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), width = 0.2) +
  labs(x = "sig_lip")
Does this answer your question?
If not, please provide a reproducible example of your dataset by following this guide: How to make a great R reproducible example
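A reproducible sketch of the same idea, using the built-in mtcars data because DF was not posted (factor(cyl) stands in for sig_lip and mpg for nbi). It also shows that you can keep stat_summary() if you prefer: stats::reorder() aggregates its second argument with FUN = mean by default, so reorder(cyl, mpg) inside aes() orders the levels by increasing group mean.

```r
library(ggplot2)

# mtcars stands in for the unposted DF: cyl plays sig_lip, mpg plays nbi
mt <- transform(mtcars, cyl = factor(cyl))

# reorder() averages mpg within each cyl level (FUN = mean is its default)
# and sorts the levels by that average, smallest first
print(levels(reorder(mt$cyl, mt$mpg)))  # "8" "6" "4"

p <- ggplot(mt, aes(x = reorder(cyl, mpg), y = mpg, fill = cyl)) +
  # bar height = mean mpg per cyl level
  stat_summary(fun = mean, geom = "bar") +
  # error bars = mean +/- 1 SD per cyl level
  stat_summary(fun = mean,
               fun.min = function(y) mean(y) - sd(y),
               fun.max = function(y) mean(y) + sd(y),
               geom = "errorbar", width = 0.2) +
  labs(x = "cyl", y = "mean mpg") +
  theme_minimal()
```

Add coord_flip() for horizontal bars as in the question, and use reorder(cyl, -mpg) for decreasing order.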

Related

ggplot median and percentile

I'm trying to replicate this image.
I was able to plot a scatter plot and the median (but it's not continuous).
I failed to plot the percentiles.
The median varies according to different spell length.
ggplot(df,aes(x=Spell.Length,y=Growth.Rate)) +
geom_point() +
stat_summary(fun = median, fun.min = median, fun.max = median,
geom = "crossbar", width = 0.5,colour="red")
[Figures: "What I'm trying to do" and "What I got so far"]
Use dplyr::group_by(Spell.Length) and summarize() to build a data frame of the percentile values for each spell length, then plot those with geom_line(). Add the horizontal lines with geom_hline().
library(dplyr)
library(ggplot2)
df %>%
  group_by(Spell.Length) %>%
  summarize(median = quantile(Growth.Rate, probs = 0.5),
            q1 = quantile(Growth.Rate, probs = 0.25)) %>%
  ggplot(aes(x = Spell.Length, y = median)) +
  geom_line() +
  geom_line(aes(x = Spell.Length, y = q1)) +
  geom_hline(yintercept = 3)
That is the basic idea: use one geom_line() for each specific line style/group, and geom_hline() for the red horizontal lines.
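Since the question's data were not posted, the sketch below uses a hypothetical df with the same column names. Instead of pre-computing the percentiles with summarize(), stat_summary() can compute them per Spell.Length on the fly: fun = quantile with fun.args = list(probs = ...) picks the percentile, and geom = "line" joins the per-length values into one continuous line.

```r
library(ggplot2)

# Hypothetical data: 30 growth rates for each spell length 1..10
set.seed(42)
df <- data.frame(Spell.Length = rep(1:10, each = 30),
                 Growth.Rate = rnorm(300, mean = 2, sd = 1))

p <- ggplot(df, aes(x = Spell.Length, y = Growth.Rate)) +
  geom_point(alpha = 0.4) +
  # median per Spell.Length, joined into a continuous line
  stat_summary(fun = median, geom = "line") +
  # 25th and 75th percentiles, passed to quantile() through fun.args
  stat_summary(fun = quantile, fun.args = list(probs = 0.25),
               geom = "line", linetype = "dashed") +
  stat_summary(fun = quantile, fun.args = list(probs = 0.75),
               geom = "line", linetype = "dashed") +
  # horizontal reference line; 3 is only a placeholder value
  geom_hline(yintercept = 3, colour = "red")
```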

Standard Error Bars in wrong position on my graph

I am trying to graph my data in R for my research project and for some reason on the three graphs I have created my error bars look like this. They are all at the bottom of the bars rather than in the correct spot on the top.
Here is my coding for that specific graph:
ggplot(Epiphyte_Biomass,aes(x=Treatment, y=Epiphyte.Biomass,fill=Treatment))+
geom_bar(stat="Identity")+
geom_errorbar(aes(ymin=mean(Epiphyte.Biomass)-sd(Epiphyte.Biomass),
ymax=mean(Epiphyte.Biomass)+ sd(Epiphyte.Biomass)),
width=0.2)+
theme_classic()
When you computed the mean and sd, ggplot didn't automatically subdivide the data by group, so I think you got the overall mean and SD (the mean looks low, but perhaps you have fewer data points in the "NC+N" treatment?)
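ggplot2 has some built-in convenience wrappers for functions from the Hmisc package that compute different kinds of ranges, but they need Hmisc installed and mean_sdl() spans ±2 SD by default, so for ±1 SD bars it is simplest to write your own summary function. Try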
msd <- function(y) {
my <- mean(y, na.rm = TRUE)
sy <- sd(y, na.rm = TRUE)
data.frame(y = my, ymin = my - sy, ymax = my + sy)
}
## and use this in place of `geom_errorbar()`:
+ stat_summary(fun.data = msd, geom = "errorbar")
Here is an example using mtcars:
ggplot(mtcars, aes(cyl, mpg, fill = factor(cyl))) +
stat_summary(geom = "bar", fun = mean) +
stat_summary(geom = "errorbar", fun.data = msd)
The point is that this way ggplot does all the mean and SD calculations per treatment for you, on the fly, rather than your having to do them separately ...
It looks as though your data set may already contain the mean epiphyte biomass per treatment, in which case your SD calculations will be wrong anyway (they will be the SDs across treatment means rather than the within-treatment SDs).
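A quick way to confirm that stat_summary() really splits the data by group before calling msd() is to pull the computed layer data out with ggplot_build() and compare it with a direct per-group calculation. The sketch uses mtcars with factor(cyl) on the x axis, so each cyl value is a separate group.

```r
library(ggplot2)

# msd(): mean +/- 1 SD in the data-frame format stat_summary(fun.data = ) expects
msd <- function(y) {
  my <- mean(y, na.rm = TRUE)
  sy <- sd(y, na.rm = TRUE)
  data.frame(y = my, ymin = my - sy, ymax = my + sy)
}

p <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(cyl))) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = msd, width = 0.2)

# the error-bar layer (second layer) holds one row per cyl group
bars <- ggplot_build(p)$data[[2]]
bars <- bars[order(as.numeric(bars$x)), c("x", "y", "ymin", "ymax")]
print(bars)

# the same numbers computed directly for each cyl value
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
sds <- tapply(mtcars$mpg, mtcars$cyl, sd)
print(cbind(mean = means, lower = means - sds, upper = means + sds))
```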
I think the error is in the -/+ sd(Epiphyte.Biomass) part: sd() is computed over the whole data set, so in your case the SD is the same for both treatments. You have to calculate it for each treatment separately.
Here is an example with the mtcars dataset; just substitute your own variables.
I really appreciate Ben Bolker's answer. It is not trivial to set the error bars, at least if you are not doing it every day.
library(ggplot2)
# plyr functions are called with plyr:: so that plyr does not mask dplyr verbs
# function: mean and sd of `varname` for each combination of `groupnames`
data_summary <- function(data, varname, groupnames) {
  summary_func <- function(x, col) {
    c(mean = mean(x[[col]], na.rm = TRUE),
      sd = sd(x[[col]], na.rm = TRUE))
  }
  data_sum <- plyr::ddply(data, groupnames, .fun = summary_func, varname)
  # rename the "mean" column to the variable name (e.g. "mpg")
  data_sum <- plyr::rename(data_sum, c("mean" = varname))
  return(data_sum)
}
# definition of variables
df <- data_summary(mtcars, varname = "mpg", groupnames = c("cyl"))
# plot
ggplot(df, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_col(color = "black") +
  geom_errorbar(aes(ymin = mpg - sd, ymax = mpg + sd), width = .2)
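The plyr helper can also be written with dplyr alone, which avoids loading plyr on top of the tidyverse (when loaded afterwards, plyr masks dplyr verbs such as summarise() and rename()). A sketch of the equivalent summary and plot; the column names mean and sd are my own choice:

```r
library(dplyr)
library(ggplot2)

# per-cyl mean and SD of mpg; .groups = "drop" returns an ungrouped tibble
df_sum <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(mpg, na.rm = TRUE),
            sd = sd(mpg, na.rm = TRUE),
            .groups = "drop")
print(df_sum)

p <- ggplot(df_sum, aes(x = factor(cyl), y = mean, fill = factor(cyl))) +
  geom_col(color = "black") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  labs(x = "cyl", y = "mpg")
```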

How do you plot multiple columns of a data frame all within the same boxplot in r (using ggplot2)?

I have a data frame that looks like this:
Train_Table_Time_Power <- data.frame(
Mean = runif(100),
STD = runif(100),
Kurt = runif(100),
Skew = runif(100),
TI = sample(c("0.05", "0.10", "0.15", "0.20"), 10, replace = TRUE)
)
I then created a box for the Skew Feature using the code below:
Skew_BoxPlot <- ggplot(Train_Table_Time_Power, aes(x = TI, y = Skew, color = TI)) +
  geom_boxplot(notch = TRUE) +
  stat_summary(fun = mean, geom = "point", shape = 19, color = "red", size = 2) +
  geom_jitter(shape = 16, position = position_jitter(0.2), size = 0.3) +
  labs(title = "Crest_Time", x = "TI", y = "Normalized Magnitude") +
  theme_minimal() + theme_Publication()  # theme_Publication() is a user-defined theme, not part of ggplot2
The above box plot displays the different distributions of the Skew feature as the TI feature varies. However, I now want to create a new box plot that shows the distributions of all of the features (Mean, STD, Kurt, and Skew) for just one value of TI, say TI = 0.05, and I would like the figure to plot all of the box plot distributions on the same graph horizontally, next to each other. Can anyone direct me on how best to go about doing this?
You can convert your data into a long table and then plot it. Using the tidyverse, this can be done easily:
library(tidyverse)
Train_Table_Time_Power %>%
  filter(TI == "0.05") %>%
  pivot_longer(cols = 1:4) %>%
  ggplot(aes(x = name, y = value)) +
  geom_boxplot()
You can change TI == "0.05" to any value you want (keep the quotes: TI holds strings such as "0.10", and the number 0.10 would be compared as "0.1" and match nothing), or you can keep all TI values and use facet_grid() to split them into individual plots:
Train_Table_Time_Power %>%
  pivot_longer(cols = 1:4) %>%
  ggplot(aes(x = name, y = value)) +
  geom_boxplot() +
  facet_grid(~TI)
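If you would rather compare the TI values side by side within a single panel instead of faceting, you can map TI to fill after pivoting; geom_boxplot() then dodges one box per TI value within each feature. The sketch uses simulated data with the question's column names (TI is assigned in a fixed cycle so all four levels appear), and names_to/values_to only give the new columns clearer names than the default name/value.

```r
library(tidyr)
library(ggplot2)

# Hypothetical data shaped like the question's
set.seed(1)
Train_Table_Time_Power <- data.frame(
  Mean = runif(100),
  STD = runif(100),
  Kurt = runif(100),
  Skew = runif(100),
  TI = rep(c("0.05", "0.10", "0.15", "0.20"), length.out = 100)
)

# long format: one row per (observation, feature)
long <- Train_Table_Time_Power %>%
  pivot_longer(cols = Mean:Skew, names_to = "feature", values_to = "value")

# features on the x axis, one dodged box per TI value
p <- ggplot(long, aes(x = feature, y = value, fill = TI)) +
  geom_boxplot()
```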

ggplot2 barplot - adding percentage labels inside the stacked bars but retaining counts on the y-axis

I have created a stacked barplot with the counts of a variable. I want to keep these as counts, so that the different bar sizes represent different group sizes. However, inside the bars I would like to add labels that show the proportion of each stack segment as a percentage.
I managed to create the stacked plot of counts for every group. I have also created the labels, and they are placed correctly. What I struggle with is how to calculate the percentage there.
I have tried this, but I get an error:
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n())
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = ifelse((..count..)==0,"",scales::percent((..count..)/sum(..count..)))), position = position_stack(vjust = 0.5), size = 3) +
theme_pubclean()
Error in (count) == 0 : comparison (1) is possible only for atomic and list types
[Figure: desired result]
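Well, I just found an answer ... or a workaround. The error occurs because ..count.. only exists when ggplot2 counts the rows itself; with stat = "identity" there is no computed count. Maybe this will help someone in the future: calculate the percentage before calling ggplot() and then just use that column as labels.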
library(dplyr)
library(ggplot2)
library(ggpubr)  # for theme_pubclean()
dataex <- iris %>%
  dplyr::group_by(group, Species) %>%
  dplyr::summarise(N = n()) %>%
  # summarise() drops Species, so sum(N) is the total of each `group` bar
  dplyr::mutate(pct = paste0(round(N / sum(N) * 100, 2), " %"))
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
  geom_bar(position = "stack", stat = "identity") +
  geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3) +
  theme_pubclean()
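A reproducible version of this workaround, with two hypothetical changes: iris has no group column, so one is derived from Sepal.Width purely for the example, and the label is formatted with scales::percent() instead of paste0(). The key step is that summarise() drops the last grouping variable (Species), so the following mutate() divides each count by the total of its own bar.

```r
library(dplyr)
library(ggplot2)

# `group` is made up here for illustration; iris does not have it
dataex <- iris %>%
  mutate(group = ifelse(Sepal.Width > 3, "wide", "narrow")) %>%
  group_by(group, Species) %>%
  summarise(N = n(), .groups = "drop_last") %>%  # still grouped by `group`
  mutate(pct = N / sum(N)) %>%                   # share within each bar
  ungroup()

# y stays as counts; only the text labels show percentages
p <- ggplot(dataex, aes(x = group, y = N, fill = Species)) +
  geom_col(position = "stack") +
  geom_text(aes(label = scales::percent(pct, accuracy = 0.1)),
            position = position_stack(vjust = 0.5), size = 3)
```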

ggplot faceted cumulative histogram

I have the following data
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(100, 6, 1))
gender = rep(c("Male", "Female"), each=100)
mydata = data.frame(x=x, gender=gender)
and I want to plot two cumulative histograms (one for males and the other for females) with ggplot.
I have tried the code below
ggplot(data=mydata, aes(x=x, fill=gender)) + stat_bin(aes(y=cumsum(..count..)), geom="bar", breaks=1:10, colour=I("white")) + facet_grid(gender~.)
but I get this chart
that, obviously, is not correct.
How can I get the correct one, like this:
Thanks!
I would pre-compute the cumsum values per bin per group, and then use geom_histogram to plot.
library(dplyr)
library(tidyr)
library(ggplot2)
mydata %>%
  mutate(x = cut(x, breaks = 1:10, labels = FALSE)) %>%  # Bin x (breaks 1:10 give 9 bins)
  count(gender, x) %>%                                    # Counts per bin per gender
  mutate(x = factor(x, levels = 1:9)) %>%                 # x as factor
  complete(x, gender, fill = list(n = 0)) %>%             # Fill missing bins with 0
  group_by(gender) %>%                                    # Group by gender ...
  mutate(y = cumsum(n)) %>%                               # ... and calculate cumsum
  ggplot(aes(x, y, fill = gender)) +                      # The rest is (gg)plotting
  geom_histogram(stat = "identity", colour = "white") +
  facet_grid(gender ~ .)
Like @Edo, I also came here looking for exactly this, and @Edo's solution was the key for me. It's great. But I post here a few additions that increase the information density and allow comparisons across different situations.
library(ggplot2)
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender = c(rep("Male", 100), rep("Female", 50))
grade = rep(1:3, 50)
mydata = data.frame(x=x, gender=gender, grade = grade)
ggplot(mydata, aes(x,
y = ave(after_stat(density), group, FUN = cumsum)*after_stat(width),
group = interaction(gender, grade),
color = gender)) +
geom_line(stat = "bin") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~grade)
I rescale y so that each cumulative plot always ends at 100%. Otherwise, if the groups are not the same size (as in the modified data above, with 100 males and 50 females), the cumulative plots end at different heights, which obscures their relative distributions.
Secondly, I use geom_line(stat="bin") instead of geom_histogram() so that I can put more than one line on a panel. This way I can compare them easily.
Finally, because I also want to compare across facets, I need to make sure the ggplot group variable uses more than just color=gender. We set it manually with group = interaction(gender, grade).
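If the goal is only to compare cumulative distributions that all end at 100%, ggplot2's stat_ecdf() does that rescaling itself: it draws the empirical cumulative distribution of each group as a step line, without binning. This is a different technique from the binned cumulative histogram, shown here as a sketch on the same modified data. Because facets are split before the stat runs, colour = gender is enough to separate the groups within each grade panel.

```r
library(ggplot2)

# the modified data from the answer above
set.seed(123)
x <- c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender <- c(rep("Male", 100), rep("Female", 50))
grade <- rep(1:3, 50)
mydata <- data.frame(x = x, gender = gender, grade = grade)

# empirical CDF per gender within each grade panel; every line ends at 100%
p <- ggplot(mydata, aes(x, colour = gender)) +
  stat_ecdf(geom = "step") +
  scale_y_continuous(labels = scales::percent_format()) +
  facet_wrap(~grade)
```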
Answering a million years later...
I was looking for a solution to the same problem and ended up here.
Eventually I figured it out myself, so I'll drop it here in case other people ever need it.
As required: no pre-processing is necessary!
ggplot(mydata) +
  geom_histogram(aes(x = x, y = ave(after_stat(count), group, FUN = cumsum),
                     fill = gender, group = gender),
                 colour = "gray70", breaks = 1:10) +
  facet_grid(rows = vars(gender))
