I have a graph made in ggplot that looks like this:
I want the numeric label on each bar to be anchored ("glued") to the x axis, at y <= 0, rather than sitting at the top of the bar.
This is the code that generates the graph:
library(ggplot2)

# Columns are referred to by name inside aes(), not as df$col
ggplot(data = df) +
  geom_bar(aes(x = row, y = numofpics, fill = crop, group = 1), stat = "identity") +
  geom_point(aes(x = row, y = numofparcels * 50, group = 2), alpha = 0.25) +
  geom_line(aes(x = row, y = numofparcels * 50, group = 2), alpha = 0.25) +
  geom_text(aes(x = row, y = numofpics, label = bbch)) +
  geom_hline(yintercept = 300, linetype = "dashed", color = "red", size = 1) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 50, name = "Number of Parcels")) +
  scale_x_discrete(name = NULL, breaks = unique(df$crop), labels = as.character(unique(df$crop))) +
  labs(x = NULL, y = "Number of Pictures")
I've tried vjust and experimented with position_nudge for the geom_text layer, but every solution I can find shifts each label relative to its current position, so everything I try ends up looking like this:
How can I make ggplot place the text at the bottom, on the x axis (y <= 0), ideally with the option of rotating it with angle = 45?
Link to the data frame: https://drive.google.com/file/d/1b-5AfBECap3TZjlpLhl1m3v74Lept2em/view?usp=sharing
As I said in the comments, just set the y-coordinate of the text to 0 or below, and specify the angle:
geom_text(aes(x=row, y=-100, label=bbch), angle=45)
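A self-contained sketch of the constant-y idea, using a small made-up data frame in place of the asker's (the values and BBCH labels below are invented). The hjust = 1 setting and the expand_limits() value are my own choices for this example, not part of the answer: with angle = 45, hjust = 1 makes each label end at the axis point and hang below it.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data (the real file is not available here)
df <- data.frame(
  row = factor(1:6),
  numofpics = c(120, 340, 80, 410, 260, 30),
  bbch = c("10", "21", "31", "51", "65", "89")
)

p <- ggplot(df, aes(x = row, y = numofpics)) +
  geom_col() +
  # A constant y = 0 pins every label to the x axis, whatever the bar height;
  # hjust = 1 puts the end of each rotated label at that point
  geom_text(aes(y = 0, label = bbch), angle = 45, hjust = 1) +
  # Leave room below zero so the rotated labels fit (value tuned by eye)
  expand_limits(y = -60)

# print(p) to draw the plot
```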
I'm behind a proxy server that blocks connections to Google Drive, so I can't access your data or test this. I would introduce a new label field in the dataset that sets y to 0 wherever y < 0:
library(dplyr)

df <- df %>%
  mutate(labelField = if_else(numofpics < 0, 0, numofpics))
I would then use this label field in my geom_text call:
geom_text(aes(x=row, y=labelField, label=bbch), angle = 45)
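A quick way to check what labelField contains, sketched with dplyr and three invented values (one negative). Note that this only moves the labels of negative bars to the axis; positive bars keep their labels at the bar height, so to pin every label to the axis a constant y such as 0 is needed instead.

```r
library(dplyr)

# Hypothetical example values; one bar is negative
df <- data.frame(numofpics = c(-20, 120, 300), bbch = c("10", "21", "31"))

df <- df %>%
  # Negative bars get their label at y = 0; positive bars keep their own height
  mutate(labelField = if_else(numofpics < 0, 0, numofpics))

df$labelField
```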
Hope that helps.
You can simply set a fixed y-value in geom_text (e.g. -50):
library(ggplot2)

ggplot(data = df) +
  geom_bar(aes(x = row, y = numofpics, fill = crop, group = 1), stat = "identity") +
  geom_point(aes(x = row, y = numofparcels * 50, group = 2), alpha = 0.25) +
  geom_line(aes(x = row, y = numofparcels * 50, group = 2), alpha = 0.25) +
  # Fixed y: every label sits just below the x axis
  geom_text(aes(x = row, y = -50, label = bbch)) +
  geom_hline(yintercept = 300, linetype = "dashed", color = "red", size = 1) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 50, name = "Number of Parcels")) +
  scale_x_discrete(name = NULL, breaks = unique(df$crop), labels = as.character(unique(df$crop))) +
  labs(x = NULL, y = "Number of Pictures")
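Because -50 is in data units, the gap below the axis changes with the scale of the data. One option, sketched here with invented data, is to derive the offset from the data instead; the 5% factor is an arbitrary choice, not something from the answer.

```r
library(ggplot2)

# Hypothetical stand-in data
df <- data.frame(row = factor(1:4), numofpics = c(150, 420, 90, 310),
                 bbch = c("10", "30", "51", "65"))

# Offset of 5% of the tallest bar instead of a hard-coded -50
label_y <- -0.05 * max(df$numofpics)

p <- ggplot(df, aes(x = row, y = numofpics)) +
  geom_col() +
  geom_text(aes(y = label_y, label = bbch))
```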
I am planning to plot vertical profiles of multiple parameters (for example salinity, temperature and density) on the x axis against pressure on the y axis, all in the same graph. This is the kind of plot I am hoping to get:
Here is a sample of my data:
  IntD.Date.    IntT.Time.  Salinity..psu. SIGMA.Kg.m3. Pressure.dbar. Temp..0C. Vbatt.volt.
1 21-April-2019 5:31:55 PM  30.2502        20.2241      0.7160         23.4054   12.29
2 21-April-2019 5:32:00 PM  31.0254        20.8081      0.8409         23.4148   12.30
3 21-April-2019 5:32:05 PM  31.2654        20.9930      1.0551         23.4060   12.29
4 21-April-2019 5:32:10 PM  31.2953        21.0176      1.2694         23.4024   12.33
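I have already tried this code:
library(ggplot2)

# fileEncoding = "UTF-8-BOM" drops the byte-order mark that otherwise
# turns the first column name into "ï..IntD.Date."
data <- read.csv('file location', fileEncoding = "UTF-8-BOM")
vert_plot <- ggplot(data, aes(x = Pressure.dbar., y = Temp..0C.)) +
  geom_line(color = '#088DA5', size = 0.75) +
  labs(size = 18) +
  ggtitle("temp vs pressure") +
  theme_grey() +
  coord_flip() +
  scale_y_reverse()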
This generated the following plot:
As you can see, I was able to plot a single profile, but the pressure axis is not reversed: pressure (0, 5, 10, ...) starts in the bottom-left corner, whereas I'd prefer it to start in the top-left corner.
I'd be grateful if someone could help me make a figure that shows multiple vertical profiles in the same graph, with pressure on the y axis in reverse order, as in the barrier layer thickness picture.
Add as many geom_line() layers as required, with a separate aes() in each. For breaks every 5 units, add scale_x_reverse() and pass a sequence of breaks to it (pressure is mapped to x and only moved to the vertical axis by coord_flip()).
library(ggplot2)

vert_plot <- ggplot(df) +
geom_line(aes(x = Pressure.dbar., y = Temp..0C.), color = 'blue', size = 0.75) +
geom_line(aes(x = Pressure.dbar., y = Salinity..psu.), color = 'red', size = 0.75) +
geom_line(aes(x = Pressure.dbar., y = SIGMA.Kg.m3.), color = 'green', size = 0.75) +
labs(size = 18) + ggtitle("Dummy Title") + xlab("Pressure") + ylab("Dummy Label") +
scale_x_reverse(limits = c(40, 0), breaks = seq(40, 0, -5)) +
theme_grey() + coord_flip() + scale_y_reverse()
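An alternative that avoids coord_flip() altogether is to map pressure directly to y and reverse that axis. The sketch below uses a simulated cast (all numbers are made up) in long format with one column naming the parameter. geom_path() joins points in row order, i.e. in the order of the cast; in ggplot2 3.3.0 and later, geom_line(orientation = "y") would instead sort each profile by pressure.

```r
library(ggplot2)

# Simulated CTD cast (made-up numbers, not the asker's data)
pressure <- seq(0.5, 40, by = 0.5)
ctd <- data.frame(
  Pressure.dbar. = rep(pressure, 3),
  parameter = rep(c("Temperature", "Salinity", "Sigma-t"), each = length(pressure)),
  value = c(23.4 - 0.05 * pressure,   # temperature falling with depth
            30.3 + 0.03 * pressure,   # salinity rising with depth
            20.2 + 0.04 * pressure)   # density rising with depth
)

p <- ggplot(ctd, aes(x = value, y = Pressure.dbar., colour = parameter)) +
  # One path per parameter, drawn in the order the rows appear
  geom_path() +
  # Reverse the pressure axis so 0 dbar is at the top
  scale_y_reverse(breaks = seq(0, 40, 5)) +
  labs(x = "Value", y = "Pressure (dbar)", colour = NULL)
```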
Alternative method:
Instead of adding one geom_line() per variable, you can melt the data frame into long format, so that the variable names become the grouping variable.
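library(reshape2)
library(ggplot2)

# Melt only the profile variables; otherwise Vbatt.volt. becomes a fourth profile
newdf <- melt(df, id.vars = c("IntD.Date.", "IntT.Time.", "Pressure.dbar."),
              measure.vars = c("Temp..0C.", "Salinity..psu.", "SIGMA.Kg.m3."),
              variable.name = "group")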
vert_plot <- ggplot(newdf, aes(x = Pressure.dbar., y = value, color = group)) +
geom_line(size = 0.75) +
labs(size = 18) + ggtitle("Dummy Title") +
xlab("Pressure") + ylab("Dummy Label") +
scale_x_reverse(limits = c(40, 0), breaks = seq(40, 0, -5)) +
theme_grey() + coord_flip() + scale_y_reverse()
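reshape2 is superseded; if tidyr (1.0.0 or later) is available, pivot_longer() does the same reshaping. A sketch using the four sample rows from the question (only the columns needed here); the resulting "group" and "value" columns match what the plotting code above expects.

```r
library(tidyr)

# The four sample rows from the question
df <- data.frame(
  Pressure.dbar. = c(0.7160, 0.8409, 1.0551, 1.2694),
  Salinity..psu. = c(30.2502, 31.0254, 31.2654, 31.2953),
  SIGMA.Kg.m3.   = c(20.2241, 20.8081, 20.9930, 21.0176),
  Temp..0C.      = c(23.4054, 23.4148, 23.4060, 23.4024)
)

# One row per (pressure, parameter) pair; "group" holds the former column name
newdf <- pivot_longer(df, cols = c(Salinity..psu., SIGMA.Kg.m3., Temp..0C.),
                      names_to = "group", values_to = "value")
```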
This is my first question here, so I hope it makes sense. Thank you in advance for your time!
I am trying to generate a scatterplot of the log2 expression values of genes under two treatments from an RNA-Seq data set. With this code I have generated the plot below:
library(ggplot2)

ggplot(control, aes(x=log2_iFGFR1_uninduced, y=log2_iFGFR4_uninduced)) +
geom_point(shape = 21, color = "black", fill = "gray70") +
ggtitle("Uninduced iFGFR1 vs Uninduced iFGFR4 ") +
xlab("Uninduced iFGFR1") +
ylab("Uninduced iFGFR4") +
scale_y_continuous(breaks = seq(-15,15,by = 1)) +
scale_x_continuous(breaks = seq(-15,15,by = 1)) +
geom_abline(intercept = 1, slope = 1, color="blue", size = 1) +
geom_abline(intercept = 0, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = -1, slope = 1, colour = "red", size = 1) +
theme_classic() +
theme(plot.title = element_text(hjust=0.5))
Current scatterplot:
However, I would like to shade the background below the red line in light red and above the blue line in light blue, while still being able to see the data points in these regions. So far I have tried using polygons, as in the code below.
pol1 <- data.frame(x = c(-14, 15, 15), y = c(-15, -15, 14))
pol2 <- data.frame(x = c(-15, -15, 14), y = c(-14, 15, 15))
ggplot(control, aes(x=log2_iFGFR1_uninduced, y=log2_iFGFR4_uninduced)) +
geom_point(shape = 21, color = "black", fill = "gray70") +
ggtitle("Uninduced iFGFR1 vs Uninduced iFGFR4 ") +
xlab("Uninduced iFGFR1") +
ylab("Uninduced iFGFR4") +
scale_y_continuous(breaks = seq(-15,15,by = 1)) +
scale_x_continuous(breaks = seq(-15,15,by = 1)) +
geom_polygon(data = pol1, aes(x = x, y = y), color ="pink1") +
geom_polygon(data = pol2, aes(x = x, y = y), color ="powderblue") +
geom_abline(intercept = 1, slope = 1, color="blue", size = 1) +
geom_abline(intercept = 0, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = -1, slope = 1, colour = "red", size = 1) +
theme_classic() +
theme(plot.title = element_text(hjust=0.5))
New scatterplot:
However, these polygons hide my data points in these areas, and I don't know how to keep the polygon color while still seeing the data points. I have also tried adding fill = NA to the geom_polygon code, but this makes the area white and only keeps a colored border. Also, the polygons extend my axis limits: how do I make the axes begin at -15 and end at 15 without that extra, unwanted length?
Any help would be massively appreciated, as I have struggled with this for a while now and asked friends and colleagues, who were unable to help.
Thanks,
Liv
Your question has two parts, so I'll answer each in turn using a dummy dataset:
library(ggplot2)

df <- data.frame(x=rnorm(20,5,1), y=rnorm(20,5,1))
Stop geom_polygon from hiding geom_point
Stefan answered this one in the comments; here's an illustration. Order of operations matters in ggplot: the plot is the result of drawing each geom (layer) in sequence. In your code, geom_polygon comes after geom_point, so the polygons are drawn on top of the points. To have the points plotted on top of the polygons, just add geom_point after geom_polygon. Note also that color= only sets a polygon's outline; use fill= to color its interior. Here's an illustrative example:
p <- ggplot(df, aes(x,y)) + theme_bw()
p + geom_point() + xlim(0,10) + ylim(0,10)
Now if we add a geom_rect after, it hides the points:
p + geom_point() +
geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
xlim(0,10) + ylim(0,10)
The way to prevent that is to reverse the order of geom_point and geom_rect. Layers are always drawn in the order they are added, so this works the same way for all geoms.
p + geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
geom_point() +
xlim(0,10) + ylim(0,10)
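A variation on the same idea, sketched with the answer's dummy data: annotate("rect", ...) draws a single rectangle, whereas geom_rect() with fixed corners still inherits the plot's data and may draw one copy per row, which makes any transparency pile up. Adding alpha (0.4 here, an arbitrary value) keeps the shading light.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(20, 5, 1), y = rnorm(20, 5, 1))

p <- ggplot(df, aes(x, y)) + theme_bw() +
  # One translucent rectangle, drawn first so it sits behind the points
  annotate("rect", xmin = 0, xmax = 5, ymin = 0, ymax = 5,
           fill = "lightblue", alpha = 0.4) +
  geom_point() +
  xlim(0, 10) + ylim(0, 10)
```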
Removing whitespace between the axis and limits of the axis
The second part of your question asks how to remove the white space between the edges of your geom_polygon and the axes. Notice how I have been using xlim and ylim to set limits? They are shortcuts for scale_x_continuous(limits=...) and scale_y_continuous(limits=...). The full scale_... functions also take an expand= argument, which sets how far the plot is padded beyond the limits before the axis is reached. For continuous scales it is a two-component vector: a multiplicative expansion followed by an additive one, both applied to each end of the axis, so expand=c(0,0) removes the padding entirely. To pad the lower and upper ends differently, use the expansion() helper (ggplot2 3.3.0 and later), e.g. expansion(mult = c(0, 0.05)).
Here's how to remove that whitespace:
p + geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
geom_point() +
scale_x_continuous(limits=c(0,10), expand=c(0,0)) +
scale_y_continuous(limits=c(0,10), expand=c(0,0))
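Putting both parts together for the original plot, a sketch with simulated data (the control data frame below is invented; only its column names and the polygons come from the question): shading layers first with fill= rather than color=, points last, and fixed -15 to 15 limits with no expansion. Scale limits drop any points outside them; coord_cartesian(xlim = c(-15, 15), ylim = c(-15, 15), expand = FALSE) is an alternative that only zooms.

```r
library(ggplot2)

# Simulated stand-in for the asker's `control` data
set.seed(42)
control <- data.frame(log2_iFGFR1_uninduced = rnorm(300, 0, 3))
control$log2_iFGFR4_uninduced <- control$log2_iFGFR1_uninduced + rnorm(300, 0, 1)

# Triangles below the red line and above the blue line (from the question)
pol1 <- data.frame(x = c(-14, 15, 15), y = c(-15, -15, 14))
pol2 <- data.frame(x = c(-15, -15, 14), y = c(-14, 15, 15))

p <- ggplot(control, aes(x = log2_iFGFR1_uninduced, y = log2_iFGFR4_uninduced)) +
  # Background first; fill (not color) colours the interior
  geom_polygon(data = pol1, aes(x = x, y = y), fill = "pink1") +
  geom_polygon(data = pol2, aes(x = x, y = y), fill = "powderblue") +
  geom_abline(intercept = 1, slope = 1, colour = "blue") +
  geom_abline(intercept = 0, slope = 1, colour = "black") +
  geom_abline(intercept = -1, slope = 1, colour = "red") +
  # Points last, so they are drawn on top of the shading
  geom_point(shape = 21, colour = "black", fill = "gray70") +
  scale_x_continuous(limits = c(-15, 15), breaks = seq(-15, 15, by = 5), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-15, 15), breaks = seq(-15, 15, by = 5), expand = c(0, 0)) +
  theme_classic()
```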
The data I am working on are clustered, with multiple observations within each group. I generated a caterpillar plot and want one label for each group (zipid), not for every line. My current graph and code look like this:
text = hosp_new[,c("zipid")]
ggplot(hosp_new, aes(x = id, y = oe, colour = zipid, shape = group)) +
# theme(panel.grid.major = element_blank()) +
geom_point(size=1) +
scale_shape_manual(values = c(1, 2, 4)) +
geom_errorbar(aes(ymin = low_ci, ymax = high_ci)) +
geom_smooth(method = lm, se = FALSE) +
scale_linetype_manual(values = linetype) +
geom_segment(aes(x = start_id, xend = end_id, y = region_oe, yend = region_oe, linetype = "4", size = 1.2)) +
geom_ribbon(aes(ymin = region_low_ci, ymax = region_high_ci), alpha=0.2, linetype = "blank") +
geom_hline(aes(yintercept = 1, alpha = 0.2, colour = "red", size = 1), show.legend = "FALSE") +
scale_size_identity() +
scale_x_continuous(name = "hospital id", breaks = seq(0,210, by = 10)) +
scale_y_continuous(name = "O:E ratio", breaks = seq(0,7, by = 1)) +
geom_text(aes(label = text), position = position_stack(vjust = 10.0), size = 2)
Caterpillar plot:
Each color represents a region. I just want one label per region, but I don't know how to remove the duplicated labels in this graph.
Any idea?
The key is to have geom_text return only one value for each zipid, rather than multiple values. If we want each zipid label located in the middle of its group, then we can use the average value of id as the x-coordinate for each label. In the code below, we use stat_summaryh (from the ggstance package) to calculate that average id value for the x-coordinate of the label and return a single label for each zipid.
library(ggplot2)
theme_set(theme_bw())
library(ggstance)
# Fake data
set.seed(300)
dat = data.frame(id=1:100, y=cumsum(rnorm(100)),
zipid=rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13,8)))
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
stat_summaryh(fun.x=mean, aes(label=zipid, y=1.02*max(y)), geom="text") +
guides(colour="none")
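If you would rather not depend on ggstance, the same one-label-per-group idea can be sketched by computing the label positions up front with base R's aggregate() and passing that small data frame to geom_text(). The vertical placement (5% of the data range above the maximum) is my own choice.

```r
library(ggplot2)

# Same fake data as in the answer
set.seed(300)
dat <- data.frame(id = 1:100, y = cumsum(rnorm(100)),
                  zipid = rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13, 8)))

# One row per zipid: the mean id is the horizontal centre of each group
labs_df <- aggregate(id ~ zipid, data = dat, FUN = mean)
labs_df$y <- max(dat$y) + 0.05 * diff(range(dat$y))

p <- ggplot(dat, aes(id, y, colour = zipid)) +
  geom_segment(aes(xend = id, yend = 0)) +
  # labs_df has id, y and zipid columns, so the plot's aesthetics carry over
  geom_text(data = labs_df, aes(label = zipid)) +
  guides(colour = "none")
```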
You could also use faceting, as mentioned by @user20650. In the code below, panel.spacing.x=unit(0,'pt') removes the space between facet panels, while expand=c(0,0.5) adds 0.5 units of padding on the sides of each panel. Together, these ensure constant spacing between tick marks, even across facets.
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
facet_grid(. ~ zipid, scales="free_x", space="free_x") +
guides(colour="none") +
theme_classic() +
scale_x_continuous(breaks=0:nrow(dat),
labels=c(rbind(seq(0,100,5),'','','',''))[1:(nrow(dat)+1)],
expand=c(0,0.5)) +
theme(panel.spacing.x = unit(0,"pt"))
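The labels= expression above builds a vector with a label on every fifth tick and blanks elsewhere. The same effect can be sketched with a small labelling function, since scale_x_continuous() also accepts a function for labels; every_fifth is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: label ticks divisible by 5, leave the others blank
every_fifth <- function(breaks) ifelse(breaks %% 5 == 0, breaks, "")

sc <- scale_x_continuous(breaks = 0:100, labels = every_fifth, expand = c(0, 0.5))

every_fifth(0:6)
```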
I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.
However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:
library(scales)
library(ggplot2)
# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900),
rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)
# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")
lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
pch=15)
lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Grouped-Wrong Symbols.png")
#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
aes(shape=Clone))
lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Ungrouped-Right Symbols.png")
If anyone has any suggestions I'd really appreciate it.
Thank you
Nathan
To get the boxplots to appear, the shape aesthetic needs to be inside geom_point, rather than in the main call to ggplot. The reason for this is that when the shape aesthetic is in the main ggplot call, it applies to all the geoms, including geom_boxplot. However, applying a shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there's only one row of data for each combination of variable and Clone, no boxplot is produced.
That the shape aesthetic affects geom_boxplot seems counterintuitive to me, but maybe there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem by applying the shape aesthetic only to geom_point.
Then, to get the points to appear with the correct boxplot, we need to group by Gene. I also added theme_classic to make it easier to see the plot (although it's still very busy):
ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7,
aes(shape=Clone, group=Gene)) +
scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x))) +
theme_classic()
I think the plot would be easier to understand if you use faceting for Gene and the x-axis for variable. Putting time on the x-axis seems more intuitive, while using faceting frees up the color aesthetic for the points. With six different clones, it's still difficult (for me at least) to differentiate the point markers, but this looks cleaner to me than the previous version.
library(dplyr)
ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)),
aes(x=gsub("Day","",variable), y=value)) +
stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y = "Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
If you really need to keep the points, maybe it would be better to separate the boxplots and points with some manual dodging:
set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
Gene = gsub("Gene","Gene ", Gene)),
aes(x=Day - 2, y=value, group=Day)) +
stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
position=position_jitter(width=1, height=0)) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y="Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
One more thing: For future reference, you can simplify your data creation code:
Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
value = rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24),
sd=rep(c(0.5,8,900,9000,3000), each=24))
Tdata = data.frame(Gene, Clone, variable, value)
I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter offsets the data randomly, the two geom_jitter layers do not line up with each other. How do I make sure the outlines are in the same place as the filled points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old, and I'm not sure whether anything new has been added to ggplot that would solve this issue.
The code above works if I use the regular geom_point instead of geom_jitter, but I have too many overlapping points for that to be useful.
EDIT:
The solution in the posted answer works. However, it doesn't quite work for some of my other graphs, where I'm binning by another variable and using it to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can use one of the filled plotting characters (shapes 21 to 25), which have a separate outline (colour) and interior (fill):
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
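On the question of whether anything newer exists: to my knowledge, since ggplot2 3.0.0 position_jitter() takes a seed argument, so two layers given the same seed (here one shared position object) receive identical offsets. A sketch with random data; the width, height and seed values are arbitrary.

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# The same seed in both layers reproduces the same random offsets
jit <- position_jitter(width = 0.2, height = 0.05, seed = 123)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(size = 3, alpha = 0.6, colour = "skyblue", position = jit) +
  geom_point(shape = 1, size = 3, colour = "black", position = jit)
```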
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")