I have a dataset with a lot of overlapping points and used ggplot to create a bubble plot to show that data. I need to add bars on my plot for the means of each group on the x axis (values can be 0, 1, or 2). I have tried to use geom_errorbar but haven't been able to get it to work with my data. Any help/suggestions would be greatly appreciated.
The following is my code and a script to generate fake data that is similar:
y <- seq(from=0, to=3.5, by=0.5)
x <- seq(from=0, to=2, by=1)
xnew <- sample(x, 100, replace=T)
ynew <- sample(y, 100, replace=T)
data <- data.frame(xnew,ynew)
data2 <- aggregate(data$xnew, by=list(x=data$xnew, y=data$ynew), length)
names(data2)[3] <- "Count"
ggplot(data2, aes(x = x, y = y)) +
geom_point(aes(size=Count)) +
labs(x = "Copies", y = "Score") +
aes(ymax=..y.., ymin=..y..) +
scale_x_continuous(breaks = seq(0, 2, 1)) +
scale_y_continuous(breaks = seq(0, 3, 0.5)) +
theme(legend.position = "bottom", legend.direction = "horizontal",
axis.line = element_line(size=1, colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.text.x = element_text(colour="black", size = 10),
axis.text.y = element_text(colour="black", size = 10))
I am not entirely sure that I understand your question correctly. It seems to me that in addition to the bubbles, you want to visualise the mean value of y for each value of x as a bar of some kind. (You mention error bars, but it seems that this is not a requirement, but just what you have tried. I will use geom_col() instead.)
I assume that you want to weight the mean of y by the counts, i.e., sum(y * Count) / sum(Count). You can create a data frame that contains these values by using dplyr:
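library(dplyr)
# Count-weighted mean of y for each x (reconstructed from the formula above)
data2_mean <- data2 %>%
  group_by(x) %>%
  summarise(y = sum(y * Count) / sum(Count))
data2_mean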
## # A tibble: 3 × 2
## x y
## <dbl> <dbl>
## 1 0 1.833333
## 2 1 1.750000
## 3 2 2.200000
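As a quick sanity check on the weighting formula: because data2 holds one row per (x, y) combination together with its count, the count-weighted mean per x should equal the plain mean of the raw scores per x. A minimal base R sketch, assuming data and data2 are built as in the question (the seed and the names raw_mean and wtd_mean are introduced here only for illustration):

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
y <- seq(from = 0, to = 3.5, by = 0.5)
x <- seq(from = 0, to = 2, by = 1)
data <- data.frame(xnew = sample(x, 100, replace = TRUE),
                   ynew = sample(y, 100, replace = TRUE))
data2 <- aggregate(data$xnew, by = list(x = data$xnew, y = data$ynew), length)
names(data2)[3] <- "Count"

# Plain mean of the raw scores for each value of x
raw_mean <- tapply(data$ynew, data$xnew, mean)

# Count-weighted mean of the aggregated rows for each value of x
wtd_mean <- sapply(split(data2, data2$x),
                   function(d) sum(d$y * d$Count) / sum(d$Count))

# Compare the two sets of group means
all.equal(as.numeric(raw_mean), as.numeric(wtd_mean))
```

An unweighted mean(y) on data2 would generally give a different number, because it averages the distinct y values rather than the observations.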
When creating the plot, I use data2 as the data set for geom_point() and data2_mean as the data set for geom_col(). It is important to put the bars first, since the bubbles should be on top of the bars.
ggplot() +
geom_col(aes(x = x, y = y), data2_mean, fill = "gray60", width = 0.7) +
geom_point(aes(x = x, y = y, size = Count), data2) +
labs(x = "Copies", y = "Score") +
scale_x_continuous(breaks = seq(0, 2, 1)) +
scale_y_continuous(breaks = seq(0, 3, 0.5)) +
theme(legend.position = "bottom", legend.direction = "horizontal",
axis.line = element_line(size=1, colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.text.x = element_text(colour="black", size = 10),
axis.text.y = element_text(colour="black", size = 10))
Everything that I changed compared to your code comes before scale_x_continuous(). This produces the following plot:
Is this what you're after? I first calculated the group-level means using the dplyr package and then added line segments to your plot using geom_segment:
library(ggplot2)
library(dplyr)
# data2 is aggregated, so weight each y by its Count to get the true group mean
data2 <- data2 %>% group_by(x) %>% mutate(mean.y = weighted.mean(y, Count))
ggplot(data2, aes(x = x, y = y)) +
geom_point(aes(size=Count)) +
labs(x = "Copies", y = "Score") +
aes(ymax=..y.., ymin=..y..) +
scale_x_continuous(breaks = seq(0, 2, 1)) +
scale_y_continuous(breaks = seq(0, 3, 0.5)) +
theme(legend.position = "bottom", legend.direction = "horizontal",
axis.line = element_line(size=1, colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.text.x = element_text(colour="black", size = 10),
axis.text.y = element_text(colour="black", size = 10)) +
geom_segment(aes(y = mean.y, yend = mean.y, x = x -0.25, xend = x + 0.25))
Suppose I have this dataset:
data3 <- data.frame(
id = c(1:10),
marker = paste("Marker", seq(1, 10, 1)),
value = paste(rep(c(0,1), times = 2, length.out = 10))
) %>%
mutate(id = row_number(), angle = 90 - 360 * (id - 0.5) / n())
I want to make a chart like this:
Image taken from Royam et al, 2019
I have tried using coord_polar() with the following code:
ggplot(data = data3, aes(x = factor(id), y = 2, fill = factor(value), label = marker)) +
geom_bar(stat = 'identity', position = 'dodge') +
geom_text(hjust = 1.5, angle = data3$angle) +
coord_polar() +
scale_fill_manual(values = alpha(c('green', 'red'), 0.3), breaks = c(0, 1), labels = c('Upregulated', 'Downregulated')) +
guides(fill = 'none') +
theme(
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank(),
axis.ticks = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.border = element_blank(),
panel.background = element_blank()
)
Which returns this figure:
How can I keep the labels upright? Additionally, am I heading in the right direction in recreating the sample plot? Is there another function in ggplot2 that can create such a figure?
Thank you very much in advance
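One common way to keep labels readable in a polar bar chart is to rotate any label that would end up upside down by 180 degrees and switch its justification. A minimal sketch, assuming the data3 layout from the question; the columns flip, label_angle and label_hjust are names introduced here for illustration, and the styling is not meant to reproduce the published figure:

```r
library(ggplot2)

n <- 10
data3 <- data.frame(
  id     = seq_len(n),
  marker = paste("Marker", seq_len(n)),
  value  = rep(c("0", "1"), length.out = n)
)

# Angle of each bar's centre, as in the question
data3$angle <- 90 - 360 * (data3$id - 0.5) / n

# Labels with an angle below -90 would be drawn upside down:
# rotate them by 180 degrees and anchor them at the other end
data3$flip        <- data3$angle < -90
data3$label_angle <- ifelse(data3$flip, data3$angle + 180, data3$angle)
data3$label_hjust <- ifelse(data3$flip, 1, 0)

p <- ggplot(data3, aes(x = factor(id), y = 2, fill = value)) +
  geom_col() +
  geom_text(aes(y = 2.2, label = marker,
                angle = label_angle, hjust = label_hjust)) +
  ylim(0, 3) +  # leave some room outside the bars for the labels
  coord_polar() +
  scale_fill_manual(values = alpha(c("green", "red"), 0.3)) +
  guides(fill = "none") +
  theme_void()
```

After the flip, every label angle lies between -90 and 90 degrees, so no text is drawn upside down.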
I'm trying to plot a line along the x axis that is basically a sequence of zeros and ones. Ones are green and zeros are red. When I try to do that, ggplot's scale_colour_gradient() basically colors on top of the line.
It looks like this
Where the line should be colored as follows:
colorbar is a vector of zeros and ones.
p <- ggplot(data1,aes(newx,newy, group = 1, colour=newy))+
geom_line(size=1.5, show.legend = FALSE)+
scale_colour_gradient(low="red2", high="green3") +
geom_line(data = colorFrame, aes(as.numeric(x)-5,as.numeric(ys), color = colorbar),size=3, show.legend = FALSE)+
xlim(0,1300)
p <- p +
theme(panel.background = element_blank(), axis.ticks.x = element_blank(),
axis.text.x = element_blank(), axis.line.y = element_line(colour = 'black'),
axis.ticks.y.left = element_line(colour = 'black')) +
scale_y_continuous(breaks = seq(0, 12, 1), limits = c(-1, 12), expand = c(0,0))
One solution would be to create two subplots and stitch them together. I use cowplot and theme_void here, but really the second plot below could look however you want it to.
p1 <- ggplot(df, aes(x,y, group = 1, colour=y)) +
geom_line(size=1.5, show.legend = FALSE) +
scale_colour_gradient(low="red2", high="green3") +
theme(panel.background = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.line.y = element_line(colour = 'black'),
axis.ticks.y.left = element_line(colour = 'black')) +
scale_y_continuous(breaks = seq(0, 12, 1), limits = c(-1, 12), expand = c(0,0)) +
labs(x = NULL)
p2 <- ggplot(df, aes(x, y = 0, colour=z)) +
geom_line(size=1.5, show.legend = FALSE) +
scale_colour_gradient(low="red2", high="green3") +
theme_void()
cowplot::plot_grid(p1, p2,
ncol = 1,
rel_heights = c(1, .05),
align = 'v')
Data
df <- data.frame(x = 1:50,
y = runif(50, 0, 12),
z = sample(c(0,1), 50, replace = TRUE))
I have made a histogram in R using the following code:
(I have tried generating a reprex; see the code below.)
progressiveNumber = c(1:50)
c = c(-0.22037439, -0.21536365, -0.34203720, 0.04501624, -0.13141665, -1.28155157, -0.08394700, -0.08484768, -0.12577287, 0.30402612, -0.40578251,
0.00000000, -0.16849942, -0.04212114, 0.12577287, 0.57366312, -0.84766743, -1.03909659, -0.21536365, -0.46263648, -0.48181028, -0.38887381,
-0.38571106, -0.38571106, -0.26220026, 0.73227348, -0.38887381, -0.96590662, -0.29931065, 0.04272655, 0.04182587, -0.38571106, -0.13141665,
-0.34614726, -0.49063020, -0.08484768, 0.05249378, 0.08484768, -0.74591104, 0.46263648, -0.42081062, 0.00000000, 0.08394700, -0.38571106,
-0.34203720, -0.04212114, -0.79517364, 0.25429442, -0.30402612, -0.08365173)
library(tidyverse)
# DEFINING BREAKS AND CUT A VECTOR INTO BINS
# set up cut-off values
breaks <- c(-1.2816,-0.3881,-0.2154, 0.0000, 0.3 ,0.7323)
# specify interval/bin labels
tags <- c("[-1.2 / -0.3]","[-0.3 / -0.2]", "[-0.2 / 0]", "[0 / 0.3]","[0.3 / 0.7]")
# bucketing values into bins
group_tags <- cut(c,
breaks=breaks,
include.lowest=TRUE,
right=FALSE,
labels=tags)
# inspect bins
summary(group_tags)
# c_groups <- factor(group_tags,levels = labels, ordered = TRUE) # this line doesn't work for some reason
#tiff("percentageBinsC.tiff", units="in", width=5, height=5, res=300,)
p2 = ggplot(data = as_tibble(group_tags), mapping = aes(x=value)) +
geom_bar(fill="deepskyblue1",color="white",alpha=0.7, ) +
stat_count(geom="text", aes(label=sprintf("%.2f",..count../length(group_tags))), vjust=-0.5) +
labs(y = 'Count', x='C') +
theme(text = element_text(size=20), axis.line.x = element_line(color = "black", size = 1),
axis.line.y = element_line(color = "black", size = 1), axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1),
panel.background = element_blank(), panel.border = element_blank(),
panel.grid.minor = element_blank(),panel.grid.major = element_blank())
p2
#dev.off()
Result
I would like to change the label on the bars (not the x-axis label but the ones that are right on top of each bar) from, e.g., 0.26 to 26%, 22% and so on.
How can I do that?
You can use percent_format() from the scales package. First define a function that does the conversion and rounds to the nearest whole percent (accuracy = 1), which matches the two decimals you kept with sprintf:
convert2perc = scales::percent_format(accuracy = 1)
You can test it:
convert2perc(0.107)
[1] "11%"
Then use it in the plotting:
p2 = ggplot(data = as_tibble(group_tags), mapping = aes(x=value)) +
geom_bar(fill="deepskyblue1",color="white",alpha=0.7, ) +
stat_count(geom="text", aes(label=convert2perc(..count../length(group_tags))), vjust=-0.5) +
labs(y = 'Count', x='C') +
theme(text = element_text(size=20), axis.line.x = element_line(color = "black", size = 1),
axis.line.y = element_line(color = "black", size = 1), axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1),
panel.background = element_blank(), panel.border = element_blank(),
panel.grid.minor = element_blank(),panel.grid.major = element_blank())
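If you would rather not depend on scales, the same labels can be built with sprintf() alone, since a literal percent sign is written as %% in the format string. A small base R sketch (to_percent is a hypothetical helper name, not part of any package):

```r
# Hypothetical helper: format proportions as whole-number percentages
to_percent <- function(p) sprintf("%.0f%%", 100 * p)

props <- c(0.26, 0.22, 0.107)
to_percent(props)
```

In the plot, to_percent() would take the place of convert2perc() inside aes(label = ...).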
I am trying to add the p-value and R2 from mgcv::gam results to ggplot with facets. The sample dataframe and code are below. Is there a way to successfully paste the p-value and R2 on the ggplots?
DF <- data.frame(Site = rep(LETTERS[20:24], each = 4),
Region = rep(LETTERS[14:18], each = 4),
time = rep(LETTERS[1:10], each = 10),
group = rep(LETTERS[1:4], each = 10),
value1 = runif(n = 1000, min = 10, max = 15),
value2 = runif(n = 1000, min = 100, max = 150))
DF$time <- as.numeric(factor(DF$time)) # as.numeric() on the letters themselves gives NA in R >= 4.0
GAMFORMULA <- y ~ s(x,bs="cr",k=3)
plot1 <- ggplot(data=DF,
aes(x=time, y=value2)) +
geom_point(col="gray", alpha=0.8,
name="") +
geom_line(col="gray", alpha=0.8,
name="",aes(group=group)) +
geom_smooth(se=T, col="darkorange", alpha=0.8,
name="", fill="orange",
method="gam",formula=GAMFORMULA) +
theme_bw() +
theme(strip.text.x = element_text(size=10),
strip.text.y = element_text(size=10, face="bold", angle=0),
strip.background = element_rect(colour="black", fill="gray90"),
axis.text.x = element_text(size=10), # x-axis text size
axis.text.y = element_text(size=10), # y-axis text size
axis.ticks = element_blank(), # remove axis ticks
axis.title.x = element_text(size=18), # x-axis title size
axis.title.y = element_text(size=25), # y-axis title size
panel.background = element_blank(),
panel.grid.major = element_blank(), # remove major grid lines
panel.grid.minor = element_blank(), # remove minor grid lines
plot.background = element_blank()) +
labs(y="Value", x="Time", title = "") +
stat_fit_glance(method = "gam",
method.args = list(formula = GAMFORMULA),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE)
plot1 + facet_wrap(Site~group, scales="free_y", ncol=3)
Error in sprintf("R^2~\"=\"~%.3f~~italic(p)~\"=\"~%.2f", r.squared, p.value) :
object 'r.squared' not found
My answer explains why stat_fit_glance() cannot be used to add r.sq to a plot, but I am afraid it does not provide an alternative approach.
stat_fit_glance() is a wrapper around broom::glance(): it fits the model and passes the fitted model object to broom::glance(). In the case of gam(), broom::glance() does not return an estimate of R2, and consequently stat_fit_glance() cannot return it either.
To see what computed values are available one can use geom_debug() from package 'gginnards'.
library(ggpmisc)
library(gginnards)
library(mgcv)
DF <- data.frame(Site = rep(LETTERS[20:24], each = 4),
Region = rep(LETTERS[14:18], each = 4),
time = rep(LETTERS[1:10], each = 10),
group = rep(LETTERS[1:4], each = 10),
value1 = runif(n = 1000, min = 10, max = 15),
value2 = runif(n = 1000, min = 100, max = 150))
DF$time <- as.numeric(factor(DF$time)) # as.numeric() on the letters themselves gives NA in R >= 4.0
GAMFORMULA <- y ~ s(x,bs="cr",k=3)
plot1 <- ggplot(data=DF,
aes(x=time, y=value2)) +
geom_point(col="gray", alpha=0.8,
name="") +
geom_line(col="gray", alpha=0.8,
name="",aes(group=group)) +
geom_smooth(se=T, col="darkorange", alpha=0.8,
name="", fill="orange",
method="gam",formula=GAMFORMULA) +
theme_bw() +
theme(strip.text.x = element_text(size=10),
strip.text.y = element_text(size=10, face="bold", angle=0),
strip.background = element_rect(colour="black", fill="gray90"),
axis.text.x = element_text(size=10), # x-axis text size
axis.text.y = element_text(size=10), # y-axis text size
axis.ticks = element_blank(), # remove axis ticks
axis.title.x = element_text(size=18), # x-axis title size
axis.title.y = element_text(size=25), # y-axis title size
panel.background = element_blank(),
panel.grid.major = element_blank(), # remove major grid lines
panel.grid.minor = element_blank(), # remove minor grid lines
plot.background = element_blank()) +
labs(y="Value", x="Time", title = "") +
stat_fit_glance(method = "gam",
method.args = list(formula = GAMFORMULA),
# aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
# stat(..r.squared..),stat(..p.value..))),
# parse = TRUE)
geom = "debug")
plot1 + facet_wrap(Site~group, scales="free_y", ncol=3)
Shown above are the values returned by stat_fit_glance() for the first two panels in the plot.
Note: There does not seem to be agreement on whether R-square is meaningful for GAM. However the summary() method for gam does return an adjusted R-square estimate as member r.sq.
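To illustrate the note above, the adjusted R-square and the approximate p-value of the smooth term can be read directly from summary() of a model fitted with mgcv::gam(). A minimal sketch on simulated data (the data frame d, the seed, and the label format are assumptions for illustration; this is not a drop-in replacement for stat_fit_glance()):

```r
library(mgcv)

set.seed(123)  # assumed seed, only for reproducibility
d <- data.frame(x = rep(1:10, each = 10))
d$y <- 100 + 5 * sin(d$x / 2) + rnorm(nrow(d))

# Same kind of smooth as GAMFORMULA in the question
fit <- gam(y ~ s(x, bs = "cr", k = 3), data = d)
fit_summary <- summary(fit)

r_sq  <- fit_summary$r.sq  # adjusted R-square
p_val <- fit_summary$s.pv  # approximate p-value(s), one per smooth term

# Text that could be passed to annotate() or geom_text() for one panel
label <- sprintf("R2 = %.3f, p = %.3g", r_sq, p_val[1])
```

With facets, the same extraction would have to be repeated on each panel's subset of the data, with the resulting labels collected in a data frame for geom_text().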
Thanks for the suggested duplicate, this is however not only about the labels, but is also about adjusting the points themselves so they do not overlap.
Have a quick look at the plot below...
I need the coloured points, and their corresponding labels, to never overlap. They should be clustered together and all visible, perhaps with some indication that they are spaced and not 100% accurate, perhaps some sort of call out? Open to suggestions on that.
I've tried adding position = 'jitter' to both geom_point and geom_text, but that doesn't seem to be working (assume it is only for small overlaps?)
Ideas?
# TEST DATA
srvc_data <- data.frame(
Key = 1:20,
X = sample(40:80, 20, replace = T),
Y = sample(30:65, 20, replace = T)
)
srvc_data$Z <- with(srvc_data,abs(X-Y))
t1<-theme(
plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.line = element_line(size=.4)
)
main_plot <- ggplot(srvc_data, aes(x = X, y = Y),xlim=c(0,100), ylim=c(0,100)) +
t1 +
theme_bw() +
labs(x="X", y="Y") +
scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0, 100)) +
geom_abline(intercept = 0, slope = 1, colour="blue", size=34, alpha=.1)+
geom_abline(intercept = 0, slope = 1, colour="black", size=.2, alpha=.5,linetype="dashed")+
geom_point(size = 7, aes(color = Z), alpha=.7) +
scale_color_gradient("Gap %\n",low="green", high="red")+
coord_fixed()+
geom_text(aes(label=Key,size=6),show_guide = FALSE)
main_plot
Produces this plot (of course with your random data it will vary)
Thanks in advance.
Here's your plot with geom_text_repel() from the ggrepel package, which pushes the labels away from each other and from the points:
library(ggrepel)
# TEST DATA
set.seed(42)
srvc_data <- data.frame(
Key = 1:20,
X = sample(40:80, 20, replace = T),
Y = sample(30:65, 20, replace = T)
)
srvc_data$Z <- with(srvc_data,abs(X-Y))
t1<-theme(
plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.line = element_line(size=.4)
)
ggplot(srvc_data, aes(x = X, y = Y)) + # axis limits are set by the scales below
theme_bw() +
t1 + # added after theme_bw(), which would otherwise override it
labs(x="X", y="Y") +
scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0, 100)) +
geom_abline(intercept = 0, slope = 1, colour="blue", size=34, alpha=.1)+
geom_abline(intercept = 0, slope = 1, colour="black", size=.2, alpha=.5,linetype="dashed")+
geom_point(size = 7, aes(color = Z), alpha=.7) +
scale_color_gradient("Gap %\n",low="green", high="red")+
coord_fixed()+
geom_text_repel(aes(label = Key), size = 6, show.legend = FALSE) # show_guide is deprecated; size is a constant, so it goes outside aes()