I am attempting to make a multi-panelled plot from three individual plots (see images).However, I am unable to rectify the bunched x-axis tick labels when the plots are in the multi-panel format. Following is the script for the individual plots and the multi-panel:
Individual Plot:
NewDat [[60]]
EstRes <- NewDat [[60]]
EstResPlt = ggplot(EstRes,aes(Distance3, `newBa`))+geom_line() + scale_x_continuous(n.breaks = 10, limits = c(0, 3500))+ scale_y_continuous(n.breaks = 10, limits = c(0,25))+ xlab("Distance from Core (μm)") + ylab("Ba:Ca concentration(μmol:mol)") + geom_hline(yintercept=2.25, linetype="dashed", color = "red")+ geom_vline(xintercept = 1193.9, linetype="dashed", color = "grey")+ geom_vline(xintercept = 1965.5, linetype="dashed", color = "grey") + geom_vline(xintercept = 2616.9, linetype="dashed", color = "grey") + geom_vline(xintercept = 3202.8, linetype="dashed", color = "grey")+ geom_vline(xintercept = 3698.9, linetype="dashed", color = "grey")
EstResPlt
Multi-panel plot:
MultiP <- grid.arrange(MigrPlt,OcResPlt,EstResPlt, nrow =1)
I have attempted to include:
MultiP <- grid.arrange(MigrPlt,OcResPlt,EstResPlt, nrow =1)+
theme(axis.text.x = element_text (angle = 45)) )
MultiP
but have only received errors. It's not necessary for all tick marks to be included. An initial, mid and end value is sufficient and therefore they would not need to all be included or angled. I'm just not sure how to do this. Assistance would be much appreciated.
There are several options to resolve the crowded axes. Let's consider the following example which parallels your case. The default labelling strategy wouldn't overcrowd the x-axis.
library(ggplot2)
library(patchwork)
library(scales)
df <- data.frame(
x = seq(0, 3200, by = 20),
y = cumsum(rnorm(161))
)
p <- ggplot(df, aes(x, y)) +
geom_line()
(p + p + p) / p &
scale_x_continuous(
name = "Distance (um)"
)
However, because you've given n.breaks = 10 to the scale, it becomes crowded. So a simple solution would just be to remove that.
(p + p + p) / p &
scale_x_continuous(
n.breaks = 10,
name = "Distance (um)"
)
Alternatively, you could convert the micrometers to millimeters, which makes the labels less wide.
(p + p + p) / p &
scale_x_continuous(
n.breaks = 10,
labels = label_number(scale = 1e-3, accuracy = 0.1),
name = "Distance (mm)"
)
Yet another alternative is to put breaks only every n units, in the case below, a 1000. This happens to coincide with omitting n.breaks = 10 by chance.
(p + p + p) / p &
scale_x_continuous(
breaks = breaks_width(1000),
name = "Distance (um)"
)
Created on 2021-11-02 by the reprex package (v2.0.1)
I thought it would be better to show with an example.
What I mean was, you made MigrPlt, OcResPlt, EstResPlt each with ggplot() +...... For plot that you want to rotate x axis, add + theme(axis.text.x = element_text (angle = 45)).
For example, in iris data, only rotate x axis text for a like
a <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point() +
theme(axis.text.x = element_text (angle = 45))
b <- ggplot(iris, aes(Petal.Width, Petal.Length)) +
geom_point()
gridExtra::grid.arrange(a,b, nrow = 1)
Related
I am trying to display a graph showing the log10 transformed values of some data. When creating a violin graph with the below code:
tt <- EditedDF1 %>%
ggplot(aes(x=ct_marshallFAC, y=log10(uchl1_d1_pgml), fill = EditedDF1$ct_marshallFAC)) +
geom_violin() +
geom_boxplot(width=0.1, outlier.shape = NA, fill="white") +
theme_classic() +
theme(axis.text.x = element_text(size= 20)) +
labs(x="", y= "Log UCHL1(pg/ml)") +
ylim(-3.5, 6) +
theme(legend.position="none")+
scale_x_discrete(labels=c("I", "II", "III-IV", "V-VI")) +
geom_hline(yintercept = 1.568, colour = "red", linetype="dotted" )
My problem is the y axis gives a value of -3, -2, -1, 0 when what I want it to show is the actual value the log10 would equate to. For instance -1 would be 0.1, 0 would be 1, 1 would be 10 and so on.
I have chosen to display the log-transformed data as I have multiple panels of data. However I feel the reader would be better able to understand if I gave the actual values on the y axis.
Many thanks.
Dan W
Take out your ylim(-3.5, 6) and use ggplot's scale_y_log10 instead with limits there:
library(ggplot2)
EditedDF1 <- data.frame(ct_marshallFAC = rep(c("I", "II", "III-IV", "V-VI"),10), uchl1_d1_pgml =runif(40)*100)
tt <- EditedDF1 %>%
ggplot(aes(x=ct_marshallFAC, y=log10(uchl1_d1_pgml),
fill = EditedDF1$ct_marshallFAC)) +
geom_violin() +
geom_boxplot(width=0.1, outlier.shape = NA, fill="white") +
theme_classic() +
theme(axis.text.x = element_text(size= 20)) +
labs(x="", y= "UCHL1(pg/ml)") +
theme(legend.position="none")+
scale_x_discrete(labels=c("I", "II", "III-IV", "V-VI")) +
geom_hline(yintercept = 1.568, colour = "red", linetype="dotted" ) +
scale_y_log10(limits = c(exp(-3.5), exp(6)))
Edit: I've never had someone tell me that a logarithmic scale is confusing...I'd just use what you had to begin with and keep the label to reflect that the values are logarithmic.
I would like to apply a position_nudge to an object, but it should always be a certain distance (e.g. in "cm") rather than relative to the scale of the measured variable.
data <- data.frame(
name=c("de","gb","cn","ir","ru") ,
value=c(3,12,5,18,45)*1
)
ggplot(data,
aes(x=name, y=value)) +
geom_bar(stat = "identity") +
geom_text(aes(y = 0,
label = paste0(name,value)),
position = position_nudge(y = -12)) +
coord_cartesian(ylim = c(0, 50), # This focuses the x-axis on the range of interest
clip = 'off') + # This keeps the labels from disappearing
theme(plot.margin = unit(c(1,1,1,1), "lines"))
When changing the scale of the variable, that adjustment should not need to be made in the position_nudge argument, e.g.
factor = 100
data <- data.frame(
name=c("de","gb","cn","ir","ru") ,
value=c(3,12,5,18,45)*factor
)
ggplot(data,
aes(x=name, y=value)) +
geom_bar(stat = "identity") +
geom_text(aes(y = 0,
label = paste0(name,value)),
position = position_nudge(y = -12)) +
coord_cartesian(ylim = c(0, 50*factor), # This focuses the x-axis on the range of interest
clip = 'off') + # This keeps the labels from disappearing
theme(plot.margin = unit(c(1,1,1,1), "lines"))
Currently, this does not work, so that I need to manually change -12 to -1200 to achieve this:
This is of course only a short reproducible example, the actual use-case is placing country flags as x-axis labels below the plot.
The final product will look somewhat like this, but currently requires updating the nudges each time the y-values change:
Thank you very much!
The easiest "hack" is to make this two plots and bind them with patchwork or cowplot. If you try it differently, you'd soon get into deep grid ... trouble.
Related
baptiste on github
baptiste on stackoverflow
Sandy Muspratt's answer
The easy way:
library(ggplot2)
library(patchwork)
foo <- data.frame(
name=c("de","gb","cn","ir","ru") ,
value=c(3,12,5,18,45)*1
)
foo_label = paste(foo$name, foo$value)
p <- ggplot(foo, aes(x=name, y=value)) +
geom_blank() # essential, so that both plots have same scaling
p_1 <-
p + geom_col() +
coord_cartesian(ylim = c(0, 50),clip = 'off') +
theme(plot.margin = margin())
p_text <-
p + annotate("text", label = foo_label, x = 1:5, y = 0, col="red") +
theme_void() +
coord_cartesian(clip = "off") +
theme(plot.margin = margin(1,0,1,0, unit = "lines"))
p_1/p_text + plot_layout(heights = c(1,0)) #this is a workaround to make the height of the text plot minimal!
You can then of course annotate with anything.
For your stated goal, the ggtext library may be more appropriate, as it allows you to embed images directly into the x axis labels. See also here for another example.
library(ggplot2)
library(ggtext)
labels <- c(
setosa = "<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Iris_setosa.JPG/180px-Iris_setosa.JPG'
width='100' /><br>*I. setosa*",
virginica = "<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Iris_virginica_-_NRCS.jpg/320px-Iris_virginica_-_NRCS.jpg'
width='100' /><br>*I. virginica*",
versicolor = "<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/20140427Iris_versicolor1.jpg/320px-20140427Iris_versicolor1.jpg'
width='100' /><br>*I. versicolor*"
)
ggplot(iris, aes(Species, Sepal.Width)) +
geom_boxplot() +
scale_x_discrete(
name = NULL,
labels = labels
) +
theme(
axis.text.x = element_markdown(color = "black", size = 11)
)
This is my first question here so hope this makes sense and thank you for your time in advance!
I am trying to generate a scatterplot with the data points being the log2 expression values of genes from 2 treatments from an RNA-Seq data set. With this code I have generated the plot below:
ggplot(control, aes(x=log2_iFGFR1_uninduced, y=log2_iFGFR4_uninduced)) +
geom_point(shape = 21, color = "black", fill = "gray70") +
ggtitle("Uninduced iFGFR1 vs Uninduced iFGFR4 ") +
xlab("Uninduced iFGFR1") +
ylab("Uninduced iFGFR4") +
scale_y_continuous(breaks = seq(-15,15,by = 1)) +
scale_x_continuous(breaks = seq(-15,15,by = 1)) +
geom_abline(intercept = 1, slope = 1, color="blue", size = 1) +
geom_abline(intercept = 0, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = -1, slope = 1, colour = "red", size = 1) +
theme_classic() +
theme(plot.title = element_text(hjust=0.5))
Current scatterplot:
However, I would like to change the background of the plot below the red line to a lighter red and above the blue line to a lighter blue, but still being able to see the data points in these regions. I have tried so far by using polygons in the code below.
pol1 <- data.frame(x = c(-14, 15, 15), y = c(-15, -15, 14))
pol2 <- data.frame(x = c(-15, -15, 14), y = c(-14, 15, 15))
ggplot(control, aes(x=log2_iFGFR1_uninduced, y=log2_iFGFR4_uninduced)) +
geom_point(shape = 21, color = "black", fill = "gray70") +
ggtitle("Uninduced iFGFR1 vs Uninduced iFGFR4 ") +
xlab("Uninduced iFGFR1") +
ylab("Uninduced iFGFR4") +
scale_y_continuous(breaks = seq(-15,15,by = 1)) +
scale_x_continuous(breaks = seq(-15,15,by = 1)) +
geom_polygon(data = pol1, aes(x = x, y = y), color ="pink1") +
geom_polygon(data = pol2, aes(x = x, y = y), color ="powderblue") +
geom_abline(intercept = 1, slope = 1, color="blue", size = 1) +
geom_abline(intercept = 0, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = -1, slope = 1, colour = "red", size = 1) +
theme_classic() +
theme(plot.title = element_text(hjust=0.5))
New scatterplot:
However, these polygons hide my data points in this area and I don't know how to keep the polygon color but see the data points as well. I have also tried adding "fill = NA" to the geom_polygon code but this makes the area white and only keeps a colored border. Also, these polygons shift my axis limits so how do I change the axes to begin at -15 and end at 15 rather than having that extra unwanted length?
Any help would be massively appreciated as I have struggled with this for a while now and asked friends and colleagues who were unable to help.
Thanks,
Liv
Your question has two parts, so I'll answer each in turn using a dummy dataset:
df <- data.frame(x=rnorm(20,5,1), y=rnorm(20,5,1))
Stop geom_polygon from hiding geom_point
Stefan had commented with the answer to this one. Here's an illustration. Order of operations matters in ggplot. The plot you create is a result of each geom (drawing operation) performed in sequence. In your case, you have geom_polygon after geom_point, so it means that it will plot on top of geom_point. To have the points plotted on top of the polygons, just have geom_point happen after geom_polygon. Here's an illustrative example:
p <- ggplot(df, aes(x,y)) + theme_bw()
p + geom_point() + xlim(0,10) + ylim(0,10)
Now if we add a geom_rect after, it hides the points:
p + geom_point() +
geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
xlim(0,10) + ylim(0,10)
The way to prevent that is to just reverse the order of geom_point and geom_rect. It works this way for all geoms.
p + geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
geom_point() +
xlim(0,10) + ylim(0,10)
Removing whitespace between the axis and limits of the axis
The second part of your question asks about how to remove the white space between the edges of your geom_polygon and the axes. Notice how I have been using xlim and ylim to set limits? It is a shortcut for scale_x_continuous(limits=...) and scale_y_continuous(limits=...); however, we can use the argument expand= within scale_... functions to set how far to "expand" the plot before reaching the axis. You can set the expand setting for upper and lower axis limits independently, which is why this argument expects a two-component number vector, similar to the limits= argument.
Here's how to remove that whitespace:
p + geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
geom_point() +
scale_x_continuous(limits=c(0,10), expand=c(0,0)) +
scale_y_continuous(limits=c(0,10), expand=c(0,0))
I have two scatter plots obtained from two sets of data that I would like to overlay, when using the ggplo2 for creating single plot i am using log scale and than ordering the numbers sothe scatter plot falls into kind if horizontal S shape. Byt when i want to overlay, the information about reordering gets lost, and the plot loses its shape.
this is how the df looks like (one has 1076 entries and the other 1448)
protein Light_Dark log10
AT1G01080 1.1744852 0.06984755
AT1G01090 1.0710359 0.02980403
AT1G01100 0.4716955 -0.32633823
AT1G01320 156.6594802 2.19495668
AT1G02500 0.6406005 -0.19341276
AT1G02560 1.3381804 0.12651467
AT1G03130 0.6361147 -0.19646458
AT1G03475 0.7529015 -0.12326181
AT1G03630 0.7646064 -0.11656207
AT1G03680 0.8340107 -0.07882836
this is for single plot:
p1 <- ggplot(ratio_log_ENR4, aes(x=reorder(protein, -log10), y=log10)) +
geom_point(size = 1) +
#coord_cartesian(xlim = c(0, 1000)) +
geom_hline(yintercept=0.1, col = "red") + #check gene
geom_hline(yintercept=-0.12, col = "red") +#check gene
labs(x = "Protein")+
theme_classic()+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(y = "ratio Light_Dark log10")+
labs(x="Protein")
image=p1
ggsave(file="p1_ratio_data_ENR4_cys.svg", plot=image, width=10, height=8)
and for over lay:
p1_14a <- ggplot(ratio_log_ENR1, aes(x=reorder(protein, -log10), y=log10)) +
geom_point(size = 1) +
#coord_cartesian(xlim = c(0, 1000)) +
geom_hline(yintercept=0.1, col = "red") + #check gene
geom_hline(yintercept=-0.12, col = "red") +#check gene
labs(x = "Protein")+
theme_classic()+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(y = "ratio Light_Dark log10")+
labs(x="Protein")+
geom_point()+
geom_point(data=ratio_log_ENR4, color="red")
p=ggplot(ratio_log_ENR1, aes(x=reorder(protein, -log10), y=log10)) +
geom_point(size = 1) +
#coord_cartesian(xlim = c(0, 1000)) +
geom_hline(yintercept=0.1, col = "red") + #check gene
geom_hline(yintercept=-0.12, col = "red") +#check gene
labs(x = "Protein")+
theme_classic()+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(y = "ratio Light_Dark log10")+
labs(x="Protein")
p = p + geom_point(data=ratio_log_ENR4, aes(x=reorder(protein, -log10), y=log10), color ="red" )
p
I tried to change classes... but it cant be the problem since for single plot its working like it is
The easiest solution I see for you is just binding together your two dataframes before plotting.
a$color <- 'red'
b$color <- 'blue'
ab <- a %>%
rbind(b)
ggplot(ab, aes(x = fct_reorder(protein, -log10), y = log10, color = color)) +
geom_point() +
scale_color_identity()
You can find a nice cheat-sheet for working with factors here: https://stat545.com/block029_factors.html
I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.
However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:
library(scales)
library(ggplot2)
# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900),
rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)
# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")
lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
pch=15)
lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Grouped-Wrong Symbols.png")
#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
aes(shape=Clone))
lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Ungrouped-Right Symbols.png")
If anyone has any suggestions I'd really appreciate it.
Thank you
Nathan
To get the boxplots to appear, the shape aesthetic needs to be inside geom_point, rather than in the main call to ggplot. The reason for this is that when the shape aesthetic is in the main ggplot call, it applies to all the geoms, including geom_boxplot. However, applying a shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there's only one row of data for each combination of variable and Clone, no boxplot is produced.
That the shape aesthetic affects geom_boxplot seems counterintuitive to me, but maybe there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem by applying the shape aesthetic only to geom_point.
Then, to get the points to appear with the correct boxplot, we need to group by Gene. I also added theme_classic to make it easier to see the plot (although it's still very busy):
ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7,
aes(shape=Clone, group=Gene)) +
scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x))) +
theme_classic()
I think the plot would be easier to understand if you use faceting for Gene and the x-axis for variable. Putting time on the x-axis seems more intuitive, while using facetting frees up the color aesthetic for the points. With six different clones, it's still difficult (for me at least) to differentiate the point markers, but this looks cleaner to me than the previous version.
library(dplyr)
ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)),
aes(x=gsub("Day","",variable), y=value)) +
stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y = "Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
If you really need to keep the points, maybe it would be better to separate the boxplots and points with some manual dodging:
set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
Gene = gsub("Gene","Gene ", Gene)),
aes(x=Day - 2, y=value, group=Day)) +
stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
position=position_jitter(width=1, height=0)) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y="Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
One more thing: For future reference, you can simplify your data creation code:
Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
value = rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24),
sd=rep(c(0.5,8,900,9000,3000), each=24))
Tdata = data.frame(Gene, Clone, variable, value)