I am trying to plot three boxplots and always label the 2 highest values.
I tried creating a position element and using it to get the same position for points and labels, but somehow this does not work.
set.seed(1)
df <-
data.frame(a=rep(letters, 3),
b=LETTERS[1:3],
int=runif(78, 15, 30))
jitter_pos <- position_jitter(width=.4, seed = 1)
ggplot(df,
aes(1, int,
color=b)) +
geom_point(position=jitter_pos) +
geom_boxplot(alpha=.3, outlier.shape=NA, fill=NA, color='#993404') +
facet_wrap(~ b) +
guides(color=FALSE) +
geom_label_repel(data=df %>%
group_by(b) %>%
arrange(desc(int)) %>%
slice(1:2),
aes(label=a),
size=2.5, color='black',
fill='#FFFFFF33',
box.padding=1,
position=jitter_pos)
I am pretty sure it is just a little mistake but somehow I can't find my error. The labels do not match the dot positions.
Maybe a better solution would be to put b on the x axis and use jitterdodge somehow, but this didn't work either, so I tried to get it running with facets. So far nothing works for me.
The problem lies in how position_jitter behaves; it is discussed at length in this GitHub issue.
The solution suggested by the ggrepel author is to add an explicit label column to the dataframe, with empty strings for the rows you want to omit:
library(ggplot2)
library(ggrepel)
library(tidyverse)
set.seed(1)
df <-
data.frame(a=rep(letters, 3),
b=LETTERS[1:3],
int=runif(78, 15, 30)) %>%
group_by(b) %>%
mutate(label = if_else(rank(-int) %in% 1:2, as.character(a), ""))
jitter_pos <- position_jitter(width=.4, height = 0, seed = 1)
ggplot(df,
aes(1, int,
color=b)) +
geom_jitter(position=jitter_pos) +
geom_boxplot(alpha=.3, outlier.shape=NA, fill=NA, color='#993404') +
guides(color="none") +
facet_wrap(~ b) +
geom_label_repel(aes(x=1, y=int, label=label),
size=2.5, color='black',
fill='#FFFFFF33',
box.padding=1,
position=jitter_pos)
Created on 2020-09-01 by the reprex package (v0.3.0)
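As a rough illustration of why the label column helps: as far as I understand it, a seeded position_jitter only reproduces the same offsets when the layer receives the same rows in the same order, so the label layer has to see all 78 rows rather than a 6-row subset. The sketch below only checks the data-preparation step (exactly two non-empty labels per group); the name label_counts is made up for this example.

```r
library(dplyr)

set.seed(1)
df <- data.frame(a = rep(letters, 3),
                 b = LETTERS[1:3],
                 int = runif(78, 15, 30)) %>%
  group_by(b) %>%
  # keep the letter for the two largest values per group, "" otherwise
  mutate(label = if_else(rank(-int) %in% 1:2, as.character(a), "")) %>%
  ungroup()

# hypothetical helper table: number of labelled rows per group
label_counts <- df %>%
  group_by(b) %>%
  summarise(n_labelled = sum(label != ""), .groups = "drop")

label_counts
```

Every row is kept, so the point layer and the label layer are jittered over identical data; ggrepel simply draws nothing for the empty strings.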
My question is basically a follow-up to this question. However, the problem is that the answer to that question completely bypasses the fact that ggarrange is used and instead hands the whole issue over to the facet functionality of ggplot.
This doesn't work for me since I already am using facets in the sub-plots and I cannot use them again.
Here is some example code. I am wondering how to achieve that the two plots which are joined with ggarrange have the same range of y-axis (of course, not setting the limits manually).
mtcars %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
ggarrange(plotlist = .)
As you can see, the left image's y-axis ranges from 2 to 5, while the right plot's y-axis ranges from 1.5 to 3.5. How can I make them be the same?
I'm once again arguing for abandoning the 'ggarrange' approach, this time in favour of the {patchwork} package, which allows you to apply an operation to all previous plots. In this case, we can use & scale_y_continuous(limits = ...) to set the limits for all plots.
library(ggplot2)
library(dplyr)
library(purrr)
library(patchwork)
mtcars %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
wrap_plots() &
scale_y_continuous(limits = range(mtcars$wt))
Created on 2022-12-08 by the reprex package (v2.0.0)
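To make the difference between + and & a bit more concrete, here is a minimal sketch with two hand-built plots instead of the map() pipeline. The names p1, p2, shared_y and combined are made up for this example. In patchwork, + adds a component to the last plot only, while & applies it to every plot in the composition.

```r
library(ggplot2)
library(patchwork)

# two plots on different subsets, so their default y ranges differ
p1 <- ggplot(subset(mtcars, vs == 0), aes(mpg, wt)) + geom_point()
p2 <- ggplot(subset(mtcars, vs == 1), aes(mpg, wt)) + geom_point()

# limits computed once from the full data, not set by hand
shared_y <- range(mtcars$wt)

# & applies the scale to both p1 and p2
combined <- (p1 + p2) & scale_y_continuous(limits = shared_y)
combined
```

Because shared_y comes from the unsplit data, no subset can fall outside the common limits.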
One option would be to compute and add the range of your x and y variables to your dataset before splitting, which could then be used to set the limits.
library(dplyr)
library(ggplot2)
library(ggpubr)
library(purrr)
mtcars %>%
mutate(across(c(mpg, wt), list(range = ~list(range(.x))))) %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
scale_x_continuous(limits = .$mpg_range[[1]]) +
scale_y_continuous(limits = .$wt_range[[1]]) +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
ggarrange(plotlist = .)
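In case the across() call looks opaque, this small sketch shows what it adds to the data before splitting: two list columns, mpg_range and wt_range, that repeat the overall range in every row. The name with_ranges is made up for this example.

```r
library(dplyr)

with_ranges <- mtcars %>%
  # one list column per variable, each row holding c(min, max) of the full data
  mutate(across(c(mpg, wt), list(range = ~list(range(.x)))))

names(with_ranges)
with_ranges$wt_range[[1]]
```

Since every row carries the same range, each piece produced by group_split() still knows the limits of the full data, which is why .$wt_range[[1]] works inside map().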
I've plotted a specific set of meteorological data using ggplot as described in the R code below. However, when I use scale_fill_brewer to specify the fill color, a legend does not appear.
What changes are necessary for the legend to appear?
library(tidyverse)
library(lubridate)
library(ggplot2)
library(RColorBrewer)
qurl <- "https://www.geo.fu-berlin.de/met/ag/strat/produkte/qbo/singapore.dat"
sing <- read_table(qurl, skip=4)
# the data file adds a 100 mb data row starting in 1997, increasing the number of rows per year from
# 14 to 15. So, one calculation must be applied to rnum <= 140 and a different one to rnum > 140.
sing2 <- sing %>% separate(1,into=c('hpa','JAN'),sep='\\s+') %>% drop_na() %>%
subset(hpa != 'hPa') %>%
mutate(rnum = row_number(),
hpa=as.integer(hpa)) %>%
mutate(year = case_when(rnum <=140 ~ 1987 + floor(rnum/14), # the last year with 14 rows of data
rnum >=141 ~ 1987 + floor(rnum+10/15))) %>% # the first year with 15 rows of data
relocate(year, .before='hpa') %>% arrange(year,hpa) %>%
pivot_longer(cols=3:14, names_to='month',values_to='qbo') %>%
mutate(date=ymd(paste0(year,'-',month,'-15')),
hpa=as.integer(hpa),
qbo=as.numeric(qbo))
sing2 <- sing %>% separate(1,into=c('hpa','JAN'),sep='\\s+') %>% drop_na() %>%
subset(hpa != 'hPa') %>%
mutate(year=1987+floor(row_number()/15),
hpa=as.integer(hpa)) %>%
relocate(year, .before='hpa') %>% arrange(year,hpa) %>%
pivot_longer(cols=2:13, names_to='month',values_to='qbo') %>%
mutate(date=ymd(paste0(year,'-',month,'-15')),
hpa=as.integer(hpa),
qbo=as.numeric(qbo))
# End Data Massaging. It's ready to be graphed
# A simple call to ggplot with geom_contour_filled generates a legend
sing2 %>%
ggplot(aes(x=date,y=hpa)) +
geom_contour_filled(aes(z=qbo*0.1)) +
scale_y_reverse()
# Adding scale_fill_brewer removes the legend.
# Adding show.legend = TRUE to the geom_contour_filled options has no effect.
limits = c(-1,1)*max(abs(sing2$qbo),na.rm=TRUE)
zCuts <- round(seq(limits[1], limits[2], length.out = 11), digits=0)
sing2 %>%
ggplot() +
geom_contour_filled(aes(x=date,y=hpa, z = qbo*0.1),breaks=zCuts*0.1) +
scale_y_reverse(expand=c(0,0)) +
scale_x_date(expand=c(0,0), date_breaks = '1 year', date_labels = '%Y') +
scale_fill_brewer(palette = 5,type='div',breaks=zCuts) +
theme_bw() +
theme(legend.position = 'right',
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
OP, I don't have a direct answer for you, since your example cannot be reproduced (I was unable to access the data you linked). Instead, I can give you a bit of advice on debugging, since the issue seems to be related to the breaks= argument of scale_fill_brewer(). As you mention, you get a legend when using geom_contour_filled(), but not when you add the scale_fill_brewer() part.
Let me illustrate this behavior with the example from the documentation for geom_contour_filled(), which uses the built-in dataset faithfuld.
I'll add in your own palette and type choice, leaving out the breaks argument for now:
v <- ggplot(faithfuld, aes(waiting, eruptions, z = density))
v + geom_contour_filled() +
scale_fill_brewer(palette = 5, type='div')
If you do the same thing, but add in a "nonsensical" breaks argument, you get the same plot, but without a legend (like you are seeing):
v + geom_contour_filled() +
scale_fill_brewer(palette = 5, type='div', breaks=1:4)
For me, this is good evidence that the issue in your code relates to the value for breaks= not being within the range expected. Is this just a typo? Note that breaks=zCuts in scale_fill_brewer(), yet breaks=zCuts*0.1 in geom_contour_filled(). This would put each value for your color scale to be 10 times outside the range of the breaks for the contours themselves. I'd be willing to bet that this change to that scale_fill_brewer() line will do the trick:
# earlier plot code
... +
scale_fill_brewer(palette = 5,type='div',breaks=zCuts*0.1) +
...
# remaining plot code
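To see the factor-of-ten mismatch in isolation, here is a small sketch with made-up qbo values (assumed to be stored in units of 0.1 m/s, as the qbo*0.1 in your code suggests). It only reproduces the arithmetic behind zCuts and does not touch the plot itself.

```r
# hypothetical stand-in for sing2$qbo
qbo <- c(-300, -120, 0, 150, 280)

limits <- c(-1, 1) * max(abs(qbo), na.rm = TRUE)
zCuts <- round(seq(limits[1], limits[2], length.out = 11), digits = 0)

contour_breaks <- zCuts * 0.1  # what geom_contour_filled() receives
scale_breaks   <- zCuts        # what scale_fill_brewer() receives

range(contour_breaks)
range(scale_breaks)
```

The two ranges differ by a factor of ten, which is the inconsistency I am pointing at; whether aligning them is enough to bring the legend back is something I cannot verify without your data.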
In the following plot, which is a simple scatter plot + theme_apa(), I would like that both axes go through 0.
I tried some of the solutions proposed in the answers to similar questions to that but none of them worked.
A MWE to reproduce the plot:
library(papaja)
library(ggplot2)
library(MASS)
plot_two_factor <- function(factor_sol, groups) {
the_df <- as.data.frame(factor_sol)
the_df$groups <- groups
p1 <- ggplot(data = the_df, aes(x = MR1, y = MR2, color = groups)) +
geom_point() + theme_apa()
}
set.seed(131340)
n <- 30
group1 <- mvrnorm(n, mu=c(0,0.6), Sigma = diag(c(0.01,0.01)))
group2 <- mvrnorm(n, mu=c(0.6,0), Sigma = diag(c(0.01,0.01)))
factor_sol <- rbind(group1, group2)
colnames(factor_sol) <- c("MR1", "MR2")
groups <- as.factor(rep(c(1,2), each = n))
print(plot_two_factor(factor_sol, groups))
The papaja package can be installed via
devtools::install_github("crsh/papaja")
What you request cannot be achieved in ggplot2, and for a good reason: if you include axis and tick labels within the plotting area, they will sooner or later overlap with the points or lines representing data. I used @phiggins' and @Job Nmadu's answers as a starting point. I changed the order of the geoms to make sure the "data" are plotted on top of the axes. I changed the theme to theme_minimal() so that axes are not drawn outside the plotting area. I modified the offsets used for the data to better demonstrate how the code works.
library(ggplot2)
library(dplyr) # provides the %>% pipe used below
iris %>%
ggplot(aes(Sepal.Length - 5, Sepal.Width - 2, col = Species)) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_point() +
theme_minimal()
This gets as close as possible to answering the question using ggplot2.
Using package 'ggpmisc' we can slightly simplify the code.
library(ggpmisc)
iris %>%
ggplot(aes(Sepal.Length - 5, Sepal.Width - 2, col = Species)) +
geom_quadrant_lines(linetype = "solid") +
geom_point() +
theme_minimal()
This code produces exactly the same plot as shown above.
If you want to always have the origin centered, i.e., symmetrical plus and minus limits in the plots irrespective of the data range, then package 'ggpmisc' provides a simple solution with function symmetric_limits(). This is how quadrant plots for gene expression and similar bidirectional responses are usually drawn.
iris %>%
ggplot(aes(Sepal.Length - 5, Sepal.Width - 2, col = Species)) +
geom_quadrant_lines(linetype = "solid") +
geom_point() +
scale_x_continuous(limits = symmetric_limits) +
scale_y_continuous(limits = symmetric_limits) +
theme_minimal()
The grid can be removed from the plotting area by adding + theme(panel.grid = element_blank()) after theme_minimal() to any of the three examples.
Loading 'ggpmisc' just for function symmetric_limits() is overkill, so here I show its definition, which is extremely simple:
symmetric_limits <- function (x)
{
max <- max(abs(x))
c(-max, max)
}
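A quick check of what the function returns, as a sketch with made-up input values: ggplot2 calls a function passed as limits with the default limits of the scale, and here the result is mirrored around zero.

```r
symmetric_limits <- function(x) {
  m <- max(abs(x))
  c(-m, m)
}

# hypothetical data range that is not centred on zero
symmetric_limits(c(-0.7, 2.9))
# returns c(-2.9, 2.9)
```

Whichever side of zero reaches further determines both limits, so the origin always ends up in the middle of the axis.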
For the record, the following also works as above.
iris %>%
ggplot(aes(Sepal.Length-6.2, Sepal.Width-3.2, col = Species)) +
geom_point() +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0)
Setting xlim and ylim should work.
library(tidyverse)
# default
iris %>%
ggplot(aes(Sepal.Length, Sepal.Width, col = Species)) +
geom_point()
# setting xlim and ylim
iris %>%
ggplot(aes(Sepal.Length, Sepal.Width, col = Species)) +
geom_point() +
xlim(c(0,8)) +
ylim(c(0,4.5))
Created on 2020-06-12 by the reprex package (v0.3.0)
While the question is not very clear, PoGibas seems to think that this is what the OP wanted.
library(tidyverse)
# default
iris %>%
ggplot(aes(Sepal.Length-6.2, Sepal.Width-3.2, col = Species)) +
geom_point() +
xlim(c(-2.5,2.5)) +
ylim(c(-1.5,1.5)) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0)
Created on 2020-06-12 by the reprex package (v0.3.0)
I am trying to add significance levels to my boxplots in the form of asterisks using ggplot2 and the ggpubr package, but I have many comparisons and I only want to show the significant ones.
I tried to use the option hide.ns=TRUE in stat_compare_means, but it does not seem to work; it might be a bug in the ggpubr package.
Besides, you see that I leave out group "PGMC4" from the pairwise wilcox.test comparisons; how can I leave this group out also for the kruskal.test?
The last question I have is how the significance levels work. Is * significant below 0.05, ** below 0.025, *** below 0.01? What convention does ggpubr use? Is it showing p-values or adjusted p-values? If the latter, what is the adjustment method? BH?
Please check my MWE below and this link and this other one for reference
##############################
##MWE
set.seed(5)
#test df
mydf <- data.frame(ID=paste(sample(LETTERS, 163, replace=TRUE), sample(1:1000, 163, replace=FALSE), sep=''),
Group=c(rep('C',10),rep('FH',10),rep('I',19),rep('IF',42),rep('NA',14),rep('NF',42),rep('NI',15),rep('NS',10),rep('PGMC4',1)),
Value=rnorm(n=163))
#I don't want to compare PGMC4 because I have only one sample
groups <- as.character(unique(mydf$Group[which(mydf$Group!="PGMC4")]))
#function to make combinations of groups without repeating pairs, and avoiding self-combinations
expand.grid.unique <- function(x, y, include.equals=FALSE){
x <- unique(x)
y <- unique(y)
g <- function(i){
z <- setdiff(y, x[seq_len(i-include.equals)])
if(length(z)) cbind(x[i], z, deparse.level=0)
}
do.call(rbind, lapply(seq_along(x), g))
}
#all pairs I want to compare
combs <- as.data.frame(expand.grid.unique(groups, groups), stringsAsFactors=FALSE)
head(combs)
my.comps <- as.data.frame(t(combs), stringsAsFactors=FALSE)
colnames(my.comps) <- NULL
rownames(my.comps) <- NULL
#pairs I want to compare in list format for stat_compare_means
my.comps <- as.list(my.comps)
head(my.comps)
pdf(file="test.pdf", height=20, width=25)
print(#or ggsave()
ggplot(mydf, aes(x=Group, y=Value, fill=Group)) + geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=5, size=4) +
scale_fill_manual(values=myPal) +
ggtitle("TEST TITLE") +
theme(plot.title = element_text(size=30),
axis.text=element_text(size=12),
axis.text.x = element_text(angle=45, hjust=1),
axis.ticks = element_blank(),
axis.title=element_text(size=20,face="bold"),
legend.text=element_text(size=16)) +
stat_compare_means(comparisons=my.comps, method="wilcox.test", label="p.signif", size=14) + #WHY DOES hide.ns=TRUE NOT WORK??? WHY DOES size=14 NOT WORK???
stat_compare_means(method="kruskal.test", size=14) #GLOBAL COMPARISON ACROSS GROUPS (HOW TO LEAVE PGMC4 OUT OF THIS??)
)
dev.off()
##############################
The MWE will produce the following boxplots:
The questions would be:
1- How to make hide.ns=TRUE work?
2- How to increase the size of the *?
3- How to exclude a group from the kruskal.test comparison?
4- What is the * convention used by ggpubr, and are the p-values shown adjusted or not?
Many thanks!!
EDIT
Besides, when doing
stat_compare_means(comparisons=my.comps, method="wilcox.test", p.adjust.method="BH")
I do not obtain the same p-values as when doing
wilcox.test(Value ~ Group, data=mydf.sub)$p.value
where mydf.sub is a subset() of mydf for a given comparison of 2 groups.
What is ggpubr doing here? How does it calculate the p.values?
EDIT 2
Please help, the solution does not have to be with ggpubr (but it has to be with ggplot2), I just need to be able to hide the NS and make the size of the asterisks bigger, as well as a p-value calculation identical to wilcox.test() + p.adjust(method="BH").
Thanks!
Edit: Since I discovered the rstatix package I would do:
set.seed(123)
#test df
mydf <- data.frame(ID=paste(sample(LETTERS, 163, replace=TRUE), sample(1:1000, 163, replace=FALSE), sep=''),
Group=c(rep('C',10),rep('FH',10),rep('I',19),rep('IF',42),rep('NA',14),rep('NF',42),rep('NI',15),rep('NS',10),rep('PGMC4',1)),
Value=c(runif(n=100), runif(63,max= 0.5)))
library(tidyverse)
stat_pvalue <- mydf %>%
rstatix::wilcox_test(Value ~ Group) %>%
filter(p < 0.05) %>%
rstatix::add_significance("p") %>%
rstatix::add_y_position() %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
ggplot(mydf, aes(x=Group, y=Value)) + geom_boxplot() +
ggpubr::stat_pvalue_manual(stat_pvalue, label = "p.signif") +
theme_bw(base_size = 16)
Old Answer:
You can try the following. The idea is that you calculate the stats on your own using pairwise.wilcox.test. Then you use the ggsignif function geom_signif
to add the precalculated p-values. With y_position you can place the brackets so they don't overlap.
library(tidyverse)
library(ggsignif)
library(ggpubr) # needed for stat_compare_means() below
library(broom)
# your list of combinations you want to compare
CN <- combn(levels(mydf$Group)[-9], 2, simplify = FALSE)
# the pvalues. I use broom and tidy to get a nice formatted dataframe. Note, I turned off the adjustment of the pvalues.
pv <- tidy(with(mydf[ mydf$Group != "PGMC4", ], pairwise.wilcox.test(Value, Group, p.adjust.method = "none")))
# data preparation
CN2 <- do.call(rbind.data.frame, CN)
colnames(CN2) <- colnames(pv)[-3]
# subset the pvalues, by merging the CN list
pv_final <- merge(CN2, pv, by.x = c("group2", "group1"), by.y = c("group1", "group2"))
# fix ordering
pv_final <- pv_final[order(pv_final$group1), ]
# set signif level
pv_final$map_signif <- ifelse(pv_final$p.value > 0.05, "", ifelse(pv_final$p.value > 0.01,"*", "**"))
# the plot
ggplot(mydf, aes(x=Group, y=Value, fill=Group)) + geom_boxplot() +
stat_compare_means(data=mydf[ mydf$Group != "PGMC4", ], aes(x=Group, y=Value, fill=Group), size=5) +
ylim(-4,30)+
geom_signif(comparisons=CN,
y_position = 3:30, annotation= pv_final$map_signif) +
theme_bw(base_size = 16)
The arguments vjust, textsize, and size are not properly working. Seems to be a bug in the latest version ggsignif_0.3.0.
Edit: When you want to show only the significant comparisons, you can easily subset the dataset CN. Since I updated to ggsignif_0.4.0 and R version 3.4.1, vjust and textsize are working now as expected. Instead of y_position you can try step_increase.
# subset
gr <- pv_final$p.value <= 0.05
CN[gr]
ggplot(mydf, aes(x=Group, y=Value, fill=Group)) +
geom_boxplot() +
stat_compare_means(data=mydf[ mydf$Group != "PGMC4", ], aes(x=Group, y=Value, fill=Group), size=5) +
geom_signif(comparisons=CN[gr], textsize = 12, vjust = 0.7,
step_increase=0.12, annotation= pv_final$map_signif[gr]) +
theme_bw(base_size = 16)
You can use ggpubr as well. Add:
stat_compare_means(comparisons=CN[gr], method="wilcox.test", label="p.signif", color="red")
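Regarding the request for p-values identical to wilcox.test() + p.adjust(method="BH"): a base-R sketch on a small made-up data set (three groups only, names dat, pw, raw_p and adj_p invented for this example) suggests that pairwise.wilcox.test() with p.adjust.method = "BH" gives the same numbers as running wilcox.test() on each pair and adjusting afterwards. I have not checked how this relates to what ggpubr computes internally.

```r
set.seed(5)
dat <- data.frame(Group = rep(c("C", "FH", "I"), times = c(10, 10, 19)),
                  Value = rnorm(39))

# all pairs at once, BH-adjusted
pw <- pairwise.wilcox.test(dat$Value, dat$Group, p.adjust.method = "BH")

# the same by hand: one wilcox.test() per pair, then p.adjust()
pairs <- combn(sort(unique(dat$Group)), 2, simplify = FALSE)
raw_p <- sapply(pairs, function(p) {
  wilcox.test(Value ~ Group, data = dat[dat$Group %in% p, ])$p.value
})
adj_p <- p.adjust(raw_p, method = "BH")

sort(adj_p)
sort(as.vector(pw$p.value))  # sort() drops the NA cells of the matrix
```

The adjusted values could then replace the unadjusted ones in pv_final before map_signif is built, if adjusted asterisks are what you want to display.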
Hi I really have googled this a lot without any joy. Would be happy to get a reference to a website if it exists. I'm struggling to understand the Hadley documentation on polar coordinates and I know that pie/donut charts are considered inherently evil.
That said, what I'm trying to do is
1. Create a donut/ring chart (so a pie with an empty middle) like the tikz ring chart shown here
2. Add a second layer circle on top (with alpha=0.5 or so) that shows a second (comparable) variable.
Why? I'm looking to show financial information. The first ring is costs (broken down) and the second is total income. The idea is then to add + facet=period for each review period to show the trend in both revenues and expenses and the growth in both.
Any thoughts would be most appreciated
Note: the data are completely arbitrary, but if an MWE is needed, suppose this were tried with
donut_data=iris[,2:4]
revenue_data=iris[,1]
facet=iris$Species
That would be similar to what I'm trying to do.. Thanks
I don't have a full answer to your question, but I can offer some code that may help get you started making ring plots using ggplot2.
library(ggplot2)
# Create test data.
dat = data.frame(count=c(10, 60, 30), category=c("A", "B", "C"))
# Add additional columns, needed for drawing with geom_rect.
dat$fraction = dat$count / sum(dat$count)
dat = dat[order(dat$fraction), ]
dat$ymax = cumsum(dat$fraction)
dat$ymin = c(0, head(dat$ymax, n=-1))
p1 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect() +
coord_polar(theta="y") +
xlim(c(0, 4)) +
labs(title="Basic ring plot")
p2 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
xlim(c(0, 4)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank()) +
labs(title="Customized ring plot")
library(gridExtra)
png("ring_plots_1.png", height=4, width=8, units="in", res=120)
grid.arrange(p1, p2, nrow=1)
dev.off()
Thoughts:
You may get more useful answers if you post some well-structured sample data. You have mentioned using some columns from the iris dataset (a good start), but I am unable to see how to use that data to make a ring plot. For example, the ring plot you have linked to shows proportions of several categories, but neither iris[, 2:4] nor iris[, 1] are categorical.
You want to "Add a second layer circle on top": Do you mean to superimpose the second ring directly on top of the first? Or do you want the second ring to be inside or outside of the first? You could add a second internal ring with something like geom_rect(data=dat2, xmax=3, xmin=2, aes(ymax=ymax, ymin=ymin))
If your data.frame has a column named period, you can use facet_wrap(~ period) for facetting.
To use ggplot2 most easily, you will want your data in 'long-form'; melt() from the reshape2 package may be useful for converting the data.
Make some barplots for comparison, even if you decide not to use them. For example, try:
ggplot(dat, aes(x=category, y=count, fill=category)) +
geom_bar(stat="identity")
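To expand on the inner-ring idea mentioned above, here is a rough sketch with made-up numbers. The helper ring_positions() and the income data frame are hypothetical and only serve to show how a second geom_rect() layer with its own data and x range sits inside the first ring.

```r
library(ggplot2)

# hypothetical helper: add the ymin/ymax columns that geom_rect() needs
ring_positions <- function(d) {
  d$fraction <- d$count / sum(d$count)
  d$ymax <- cumsum(d$fraction)
  d$ymin <- c(0, head(d$ymax, n = -1))
  d
}

costs  <- ring_positions(data.frame(count = c(10, 60, 30),
                                    category = c("A", "B", "C")))
income <- ring_positions(data.frame(count = 100, category = "Income"))

p <- ggplot() +
  # outer ring: cost breakdown
  geom_rect(data = costs,
            aes(fill = category, ymin = ymin, ymax = ymax, xmin = 3, xmax = 4)) +
  # inner ring: second variable, semi-transparent
  geom_rect(data = income,
            aes(fill = category, ymin = ymin, ymax = ymax, xmin = 2, xmax = 3),
            alpha = 0.5) +
  coord_polar(theta = "y") +
  xlim(c(0, 4))
p
```

Here both rings are scaled to their own totals, so the sketch compares proportions only; showing income and costs on a common scale would need both fractions computed against the same denominator.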
Just trying to solve question 2 with the same approach from bdemarest's answer. Also using his code as a scaffold. I added some tests to make it more complete but feel free to remove them.
library(broom)
library(tidyverse)
library(magrittr) # provides the %<>% assignment pipe used below
# Create test data.
dat = data.frame(count=c(10,60,20,50),
ring=c("A", "A","B","B"),
category=c("C","D","C","D"))
# compute pvalue
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
ungroup() %>% select(-ring) %>%
chisq.test() %>% tidy()
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
select(-ring) %>%
fisher.test() %>% tidy() %>% full_join(cs.pvalue)
# compute fractions
#dat = dat[order(dat$count), ]
dat %<>% group_by(ring) %>% mutate(fraction = count / sum(count),
ymax = cumsum(fraction),
ymin = c(0,ymax[1:length(ymax)-1]))
# Add x limits
baseNum <- 4
#numCat <- length(unique(dat$ring))
# ring is a character column under R >= 4.0 defaults, so convert via factor first
dat$xmax <- as.numeric(factor(dat$ring)) + baseNum
dat$xmin = dat$xmax -1
# plot
p2 = ggplot(dat, aes(fill=category,
alpha = ring,
ymax=ymax,
ymin=ymin,
xmax=xmax,
xmin=xmin)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
geom_text(inherit.aes = F,
x=c(-1,1),
y=0,
data = cs.pvalue,aes(label = paste(method,
"\n",
format(p.value,
scientific = T,
digits = 2))))+
xlim(c(0, 6)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank(),
panel.border = element_blank()) +
labs(title="Customized ring plot") +
scale_fill_brewer(palette = "Set1") +
scale_alpha_discrete(range = c(0.5,0.9))
p2
And the result: