In publications, statistically significant differences are usually shown by putting a * above the bar. I have a lot of bars in my plot, and I was hoping to make the significant ones stand out by coloring them differently.
For example:
this is the dataset
some_data = data.frame(name = sample(LETTERS, 5),
value = rnorm(5, 5, 7),
pvalue = rnorm(5, 0.05, 0.02))
> some_data
name value pvalue
1 Q 8.8101784 0.01691628
2 Z 5.9426036 0.10228445
3 U 1.4862314 0.02062453
4 K -0.1365665 0.04405621
5 N 8.8828848 0.05992229
ggplot(some_data, aes(name, value)) +
geom_bar(stat = "identity") +
geom_text(aes(label=pvalue), position=position_dodge(width=0.9), vjust=-0.25)
What I want is to color the bars differently if pvalue is less than 0.05.
ggplot aesthetics let you evaluate R code, which allows you to do stuff like this:
ggplot(some_data, aes(x = name, y = value, fill = pvalue < 0.05)) +
geom_col() +
geom_text(aes(label=pvalue), position=position_dodge(width=0.9), vjust=-0.25)
EDIT: Use geom_col instead of geom_bar(stat = 'identity') per Axeman's comment.
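If you also want to control which colors are used, a manual fill scale can be added. This is a minimal sketch, assuming fixed made-up data in place of the random values above; the column name `significant` and the two colors are arbitrary choices for illustration.

```r
library(ggplot2)

# Made-up, fixed data so the example is reproducible
some_data <- data.frame(
  name   = c("Q", "Z", "U", "K", "N"),
  value  = c(8.81, 5.94, 1.49, -0.14, 8.88),
  pvalue = c(0.017, 0.102, 0.021, 0.044, 0.060)
)

# Precompute the flag so the legend gets a readable title
some_data$significant <- some_data$pvalue < 0.05

p <- ggplot(some_data, aes(x = name, y = value, fill = significant)) +
  geom_col() +
  # round the labels so they do not clutter the plot
  geom_text(aes(label = signif(pvalue, 2)), vjust = -0.25) +
  # map the logical values to explicit colors
  scale_fill_manual(
    values = c(`TRUE` = "firebrick", `FALSE` = "grey60"),
    name = "p < 0.05"
  )

print(p)
```

Bars with a p-value below 0.05 are drawn in the first color and all others in grey, so the significant ones stand out without a * annotation.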
I am trying to create a swimlane plot of different subjects' doses over time. When I run my code, the bars are stacked by dose amount. My issue is that a subject's doses vary: they could have 5, 10, 5, and in my plot the 5's are stacked together, but I want them represented as they happen over time. In my dataset I have the amount of time each patient was on a dose, ordered by when they had the dose. I want the bars stacked by an ordering variable called "p", which is numeric (1, 2, 3, 4, 5, 6, etc.) and indicates at which visit the subject had that dose.
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(EXDOSE))) +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
I want the bars stacked by my variable "p", not by fill.
I tried forcats, but that does not work. I am unsure how to go about this. The data in the dataset is arranged by p for each subject.
example data
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "1034","1034","1034","1034"),
exdose = c(5,10,20,5,5,10,20,20),
p= c(1,2,3,4,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5)
)
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(exdose)),position ="stack") +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
If you want to order your stacked bar chart by p you have to tell ggplot2 to do so by mapping p on the group aesthetic. Otherwise ggplot2 will make a guess which by default is based on the categorical variables mapped on any aesthetic, i.e. in your case the fill aes:
Note: I dropped the scale_fill_manual as you did not provide the vector of colors. But that's not important for the issue.
library(ggplot2)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)))
EDIT And to get the right order we have to reverse the order of the stack which could be achieved using position_stack(reverse = TRUE):
Note: To check that we have the right order I added a geom_text showing the p value.
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE)) +
geom_text(aes(label = p), position = position_stack(reverse = TRUE))
A second option would be to convert p to a factor with the order of its levels reversed:
ggplot(dataset, aes(x = diff + 1, y = subject, group = factor(p, rev(sort(unique(p)))))) +
geom_col(aes(fill = as.factor(exdose))) +
geom_text(aes(label = p), position = "stack")
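To verify the plot, the expected position of each segment can be computed directly. This is a minimal sketch in base R using the example data from the question; the columns `len`, `start`, and `end` are names introduced here for illustration only.

```r
dataset <- data.frame(
  subject = c("1002", "1002", "1002", "1002", "1034", "1034", "1034", "1034"),
  exdose  = c(5, 10, 20, 5, 5, 10, 20, 20),
  p       = c(1, 2, 3, 4, 1, 2, 3, 4),
  diff    = c(3, 3, 9, 7, 3, 3, 4, 5)
)

# Sort by subject and visit, then accumulate segment lengths within subject
ordered <- dataset[order(dataset$subject, dataset$p), ]
ordered$len   <- ordered$diff + 1
ordered$end   <- ave(ordered$len, ordered$subject, FUN = cumsum)
ordered$start <- ordered$end - ordered$len

print(ordered[, c("subject", "p", "exdose", "start", "end")])
```

If the stacking is in the right order, the segment labelled p = 1 for subject 1002 should span 0 to 4, p = 2 should span 4 to 8, and so on.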
In base R it is easy (but cumbersome) to create a plot with error bars based on descriptive statistics. With ggplot2 I am struggling to do so, and all the examples I have found are based on the raw data.
Specifically, how can I create a barplot with confidence intervals for a simple two-group design? M1 = 3, M2 = 4, SD1 = 1, SD2 = 1.2, n1 = 111, n2 = 222? I started off simply with
ggplot(aes(x=c(1:2), y=c(3, 4))) + geom_bar()
# or
ggplot(aes(y=c(3, 4))) + geom_bar()
but not even this seems to work to create a barplot.
Any suggestions?
What about using ggplot2::stat_summary()? You can let it take care of your mean and se calculations (it relies on library(Hmisc) for most of these summary functions, so look there for more help).
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(geom = "bar", fun = mean) + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
stat_summary(geom = "errorbar", fun.data = mean_se)
Adjust width = for skinnier bars or error bars.
You can also use a true confidence interval with mean_cl_normal or mean_cl_boot, and a crossbar for a better visualization of the data dispersion:
ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(geom = "crossbar", fun.data = mean_cl_normal)
Edit:
If you want to recreate a plot from a published paper's summary statistics, just roll your data into a data.frame first:
datf <- data.frame(
group = c("1", "2"),
means = c(3,4),
sds = c(1,1.2),
ns = c(111, 222)
)
# add your CI calcs as column called upr and lwr
library(tidyverse)
datf <- datf %>% mutate(lwr = means - (qnorm(.975)*(sds/sqrt(ns))),
upr = means + (qnorm(.975)*(sds/sqrt(ns))))
ggplot(datf, aes(group, y = means, ymin = lwr, ymax = upr)) +
geom_crossbar()
Or, if you must, the traditional columns with error bars, like this:
ggplot(datf, aes(group, y = means, ymin = lwr, ymax = upr)) +
geom_col() +
geom_errorbar()
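As a quick check of the interval arithmetic, the limits can be computed in base R from the summary statistics alone. This is a minimal sketch using the numbers from the question; the names `summ`, `se`, `z_half`, and `t_half` are made up for illustration, and the t-based half-width is an optional alternative to the normal quantile used above, not something the answer prescribes.

```r
# Summary statistics from the question (M, SD, n for two groups)
summ <- data.frame(
  group = c("1", "2"),
  means = c(3, 4),
  sds   = c(1, 1.2),
  ns    = c(111, 222)
)

# Standard error of each mean
summ$se <- summ$sds / sqrt(summ$ns)

# Half-width of a 95% interval: normal quantile and t quantile
summ$z_half <- qnorm(0.975) * summ$se
summ$t_half <- qt(0.975, df = summ$ns - 1) * summ$se

# Normal-based limits, matching the lwr/upr columns computed earlier
summ$lwr <- summ$means - summ$z_half
summ$upr <- summ$means + summ$z_half

print(summ)
```

With samples this large the two half-widths differ by less than 0.01, so the choice between them matters little here.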
You can draw an error bar to whatever values you want. Error bars have aesthetics called ymin and ymax that you can set. Here I draw the bars +/- 1 standard deviation from the mean:
dd<-read.table(text="sample mean sd n
1 3 1 111
2 4 1.2 222", header=T)
ggplot(dd, aes(sample)) +
geom_col(aes(y=mean)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd))
I have developed code in ggplot for a boxplot that displays the mean --calling a custom function. The code is the following:
fun_mean <- function(x){
return(data.frame(y=round(mean(x), digits = 3),label=mean(x,na.rm=T)))}
ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year, fill = as.factor(viotiko))) + geom_boxplot() +
labs(title="Does the PD differ significantly by 'Viotiko' group?",x="Viotiko Group", y = "PD (pd_1year)") +
coord_cartesian(ylim = c(0,0.05)) + stat_summary(fun.y = mean, geom="point",colour="darkred", size=3) +
stat_summary(fun.data = fun_mean, geom="text", vjust=-0.7)
The boxplot outputted is the following:
As you can see, although I have rounded the means to contain only 3 decimal digits, they appear long and clutter the plot.
What should I do to limit the digits displayed to only 3?
Moreover, I am puzzled by the fact that the means appear outside the distribution depicted by the boxplots in the majority of the groups. How could this be interpreted?
Your advice will be appreciated.
I faked up some data to reproduce this - in the future you should do that yourself before posting:
library(ggplot2)
set.seed(1234)
n <- 100
my_data <- data.frame(viotiko=sample(0:8,n,T),pd_1year=exp(rnorm(n,-4.5,0.8)))
fun_mean <- function(x){
y = mean(x,na.rm=T)
return(data.frame(y=y,label=round(y,3)))
}
ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year, fill = as.factor(viotiko))) +
geom_boxplot() +
labs(title="Does the PD differ significantly by 'Viotiko' group?",
x="Viotiko Group", y = "PD (pd_1year)") +
coord_cartesian(ylim = c(0,0.05)) +
stat_summary(fun = mean, geom="point",colour="darkred", size=3) + # fun replaces the deprecated fun.y
stat_summary(fun.data = fun_mean, geom="text", vjust=-0.7)
Yielding this:
As for your question about why the mean falls outside the box (the range between the 0.25 and 0.75 quantiles), that is quite common, and to be expected, for long-tailed data, even if it does seem a bit counter-intuitive. In this case I used a log-normal distribution and I had 3 of 8 mean values outside those quartiles.
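Here is a small deterministic illustration of both points, the label formatting and a mean that falls outside the box, in base R. The vector `x` is made-up data, chosen only because one large value skews it.

```r
# Made-up, strongly right-skewed data: one large value drags the mean up
x <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 100)

m   <- mean(x)                               # 11.6
box <- quantile(x, probs = c(0.25, 0.75))    # lower and upper edge of the box

# TRUE: the mean lies above the upper quartile, i.e. outside the box
print(m > box[["75%"]])

# Two ways to limit a label to 3 decimals
y <- mean(c(0.01, 0.03))
print(round(y, 3))          # numeric; trailing zeros are dropped when printed
print(sprintf("%.3f", y))   # character; always shows exactly 3 decimals
```

Using sprintf() for the label column is worth considering when you want every label to have the same width.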
I have a data set where I need to show a stacked bar chart for two cohorts over three time periods. Currently, I am faceting by year and filling based on probability values for my DV (# of times, t, that someone goes to a nursing home; pr that t=0, t=1, ... t >= 5). I am trying to figure out whether it is possible to introduce another color scale, so that each of the "Comparison" bars would be filled with a yellow gradient and the treatment bars with a blue gradient. I figure the best way to do this may be to overlay the two plots, but I'm not sure whether that is possible in ggplot (or some other package). Code and screenshot are below:
tempPlot <- ggplot(tempDF,aes(x = HBPCI, y = margin, fill=factor(prob))) +
scale_x_continuous(breaks=c(0,1), labels=c("Comparison", "Treatment"))+
scale_y_continuous(labels = percent_format())+
ylab("Prob snf= x")+
xlab("Program Year")+
ggtitle(tempFlag)+
geom_bar(stat="identity")+
scale_fill_brewer(palette = "Blues")+ #can change the color scheme here.
theme(axis.title.y =element_text(vjust=1.5, size=11))+
theme(axis.title.x =element_text(vjust=0.1, size=11))+
theme(axis.text.x = element_text(size=10,angle=-45,hjust=.5,vjust=.5))+
theme(axis.text.y = element_text(size=10,angle=0,hjust=1,vjust=0))+
facet_grid(~yearQual, scales="fixed")
You may want to consider using interaction() -- here's a reproducible solution:
year <- c("BP", "PY1", "PY2")
type <- c("comparison", "treatment")
df <- data.frame(year = sample(year, 100, T),
type = sample(type, 100, T),
marg = abs(rnorm(100)),
fact = sample(1:5, 100, T))
head(df)
# year type marg fact
# 1 BP comparison 0.2794279 3
# 2 PY2 comparison 1.6776371 1
# 3 BP comparison 0.8301721 2
# 4 PY1 treatment 0.6900511 1
# 5 PY2 comparison 0.6857421 3
# 6 PY1 treatment 1.4835672 3
library(ggplot2)
blues <- RColorBrewer::brewer.pal(5, "Blues")
oranges <- RColorBrewer::brewer.pal(5, "Oranges")
ggplot(df, aes(x = type, y = marg, fill = interaction(factor(fact), type))) +
geom_bar(stat = "identity") +
facet_wrap(~ year) +
scale_fill_manual(values = c(blues, oranges))
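The reason c(blues, oranges) lines up with the bars is the order of the levels that interaction() produces. This is a small base R sketch with made-up factors; the hex colors and the name `pal` are placeholders for illustration.

```r
# Small made-up factors: two probability bins and two cohorts
fact <- factor(c(1, 2, 1, 2))
type <- factor(c("comparison", "comparison", "treatment", "treatment"))

# interaction() varies the first factor fastest, so all "comparison"
# levels come before all "treatment" levels
lv <- levels(interaction(fact, type))
print(lv)

# Naming the palette by level makes the color assignment explicit
pal <- setNames(c("#C6DBEF", "#3182BD", "#FDD0A2", "#E6550D"), lv)
print(pal)
```

A named vector like `pal` can be passed to scale_fill_manual(values = ...), which matches the names to the levels instead of relying on position.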
If I want to order the bars in a ggplot2 barchart from largest to smallest, then I'd usually update the factor levels of the bar category, like so
one_group <- data.frame(
height = runif(5),
category = gl(5, 1)
)
o <- order(one_group$height, decreasing = TRUE)
one_group$category <- factor(one_group$category, levels = one_group$category[o])
p_one_group <- ggplot(one_group, aes(category, height)) +
geom_bar(stat = "identity")
p_one_group
If I have several groups of barcharts that I'd like in different facets, with each facet having bars ordered from largest to smallest (and different x-axes), then the technique breaks down.
Given some sample data
two_groups <- data.frame(
height = runif(10),
category = gl(5, 2),
group = gl(2, 1, 10, labels = letters[1:2])
)
and the plotting code
p_two_groups <- ggplot(two_groups, aes(category, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x")
p_two_groups
what do I need to do to get the bar ordering right?
If it helps, an equivalent problem to solve is: how do I update factor levels after I've done the faceting?
Here is a hack:
two_groups <- transform(two_groups, category2 = factor(paste(group, category)))
two_groups <- transform(two_groups, category2 = reorder(category2, rank(height)))
ggplot(two_groups, aes(category2, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
scale_x_discrete(labels=two_groups$category, breaks=two_groups$category2)
1. Make a UNIQUE factor variable for all entries (category2).
2. Reorder the variable based on the height.
3. Plot on the variable: aes(x=category2).
4. Re-label the axis using the original value (category) for the variable (category2) in scale_x_discrete.
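The first two steps can be checked without plotting. This is a minimal base R sketch with made-up heights; it orders by -height so the largest bar comes first, which is a small departure from the rank(height) call used above.

```r
# Made-up heights so the result is reproducible
two_groups <- data.frame(
  height   = c(0.9, 0.2, 0.5, 0.7, 0.1, 0.4),
  category = gl(3, 2),
  group    = gl(2, 1, 6, labels = c("a", "b"))
)

# Step 1: one unique level per group/category combination
two_groups$category2 <- factor(paste(two_groups$group, two_groups$category))

# Step 2: order the levels by height, largest first (hence the minus sign)
two_groups$category2 <- reorder(two_groups$category2, -two_groups$height)

print(levels(two_groups$category2))
```

Because every level is unique to one facet, each panel with scales = "free_x" shows only its own levels, in this global order.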
Here is a hack to achieve what you want. I was unable to figure out how to get the category values below the tick marks. So if someone can help fix that, it would be wonderful. Let me know if this works
# add a height rank variable within each group (ddply comes from plyr)
library(plyr)
two_groups <- ddply(two_groups, .(group), transform, hrank = rank(height))
# plot the graph, hiding the numeric axis labels and drawing the categories as text
p_two_groups <- ggplot(two_groups, aes(-hrank, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
theme(axis.text.x = element_blank()) + # opts() and theme_blank() are defunct in current ggplot2
geom_text(aes(y = 0, label = category, vjust = 1.5))