I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to being what I want. But I would like each point to be plotted only on top of the violin plot for the matching assay type. That is, the jittered positions of the points should be set such that the triangles are only ever on the upper teal violin plot and the circles on the bottom red violin plot for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
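If you want to check the separation numerically rather than by eye, here is a minimal sketch. It assumes assay is stored as a factor and adds a seed for reproducibility (neither is in the question); layer_data() is used only to inspect the positions ggplot2 computed for the point layer.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(
  x = sample(1:5, 100, replace = TRUE),
  y = rnorm(100),
  assay = factor(sample(c("a", "b"), 100, replace = TRUE)),
  project = rep(c("primary", "secondary"), 50)
)

p <- ggplot(df, aes(x = project, y = x, fill = assay)) +
  geom_violin() +
  geom_point(aes(shape = assay, colour = y),
             position = position_jitterdodge(dodge.width = 0.9,
                                             jitter.width = 0.5,
                                             jitter.height = 0.2),
             size = 2)

# Layer 2 is the point layer; x holds the dodged and jittered positions.
pts <- layer_data(p, 2)
offset <- as.numeric(pts$x) - round(as.numeric(pts$x))

# Range of horizontal offsets per point shape (one shape per assay level).
r <- tapply(offset, pts$shape, range)
r
```

The two ranges should not overlap, which means every point sits on the side of its own assay's violin.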
You can use interaction between assay & project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labeling can be adjusted by converting the interaction to a numeric x axis and placing the breaks manually:
# cbind the interaction as a numeric
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))
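The break positions 1.5 and 3.5 follow from how interaction() orders its levels: the first factor varies fastest, so each project occupies two consecutive numeric codes. A small base R sketch with made-up vectors (not the question's data) illustrates this:

```r
assay   <- c("a", "b", "a", "b")
project <- c("primary", "primary", "secondary", "secondary")

grp <- interaction(assay, project)
levels(grp)      # "a.primary" "b.primary" "a.secondary" "b.secondary"
as.numeric(grp)  # 1 2 3 4

# Midpoint of the numeric codes within each project: 1.5 and 3.5
mid <- tapply(as.numeric(grp), project, mean)
mid
```

With more than two assays the midpoints shift, so the breaks would need to be recomputed in the same way.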
I am trying to plot data using ggplot2 in R.
The datapoints occur for each 2^i-th x-value (4, 8, 16, 32,...). For that reason, I want to scale my x-Axis by log_2 so that my datapoints are spread out evenly. Currently most of the datapoints are clustered on the left side, making my plot hard to read (see first image).
I used the following command to get this image:
ggplot(summary, aes(x=xData, y=yData, colour=groups)) +
geom_errorbar(aes(ymin=yData-se, ymax=yData+se), width=2000, position=pd) +
geom_line(position=pd) +
geom_point(size=3, position=pd)
However trying to scale my x-axis with log2_trans yields the second image, which is not what I expected and does not follow my data.
Code used:
ggplot(summary, aes(x=settings.numPoints, y=benchmark.costs.average, colour=solver.name)) +
geom_errorbar(aes(ymin=benchmark.costs.average-se, ymax=benchmark.costs.average+se), width=2000, position=pd) +
geom_line(position=pd) +
geom_point(size=3, position=pd) +
scale_x_continuous(trans = log2_trans(),
breaks = trans_breaks("log2", function(x) 2^x),
labels = trans_format("log2", math_format(2^.x)))
Using scale_x_continuous(trans = log2_trans()) alone doesn't help either.
EDIT:
Attached the data for reproducing the results:
https://pastebin.com/N1W0z11x
EDIT 2:
I have used the function pd <- position_dodge(1000) to avoid overlapping of my error bars, which caused the problem.
Removing the position=pd statements solved the issue.
Here is a way you could format your x-axis:
# Generate dummy data
x <- 2^seq(1, 10)
df <- data.frame(
x = c(x, x, x),
y = c(0.5*x, x, 1.5*x),
z = rep(letters[seq_len(3)], each = length(x))
)
The plot of this would look like this:
ggplot(df, aes(x, y, colour = z)) +
geom_point() +
geom_line()
Adjusting the x-axis would work like so:
ggplot(df, aes(x, y, colour = z)) +
geom_point() +
geom_line() +
scale_x_continuous(
trans = "log2",
labels = scales::math_format(2^.x, format = log2)
)
The labels argument is just so you have labels in the format 2^x, you could change that to whatever you like.
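To see what that labels argument produces, you can call the labeller on its own. This is just a sketch using the scales package directly; the input values are arbitrary.

```r
library(scales)

# math_format() returns a function; format = log2 is applied to the breaks
# before they are substituted for .x in the expression 2^.x
lab_fun <- math_format(2^.x, format = log2)

labs <- lab_fun(c(2, 4, 8))
labs
```

The result is an expression vector, which ggplot2 renders as plotmath, so the exponents appear as superscripts on the axis.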
I have used the function pd <- position_dodge(1000) to avoid overlapping of my error bars, which caused the problem.
Adjusting the amount of position dodge and the width of the error bars according to the new scaling solved the problem.
pd <- position_dodge(0.2) # move them .2 to the left and right
ggplot(summary, aes(x=settings.numPoints, y=benchmark.costs.average, colour=algorithm)) +
geom_errorbar(aes(ymin=benchmark.costs.average-se, ymax=benchmark.costs.average+se), width=0.4, position=pd) +
geom_line(position=pd) +
geom_point(size=3, position=pd) +
scale_x_continuous(
trans = "log2",
labels = scales::math_format(2^.x, format = log2)
)
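The reason the smaller numbers work is that ggplot2 applies position adjustments and error bar widths after the scale transformation, so they are measured in log2 units rather than in the original x units. A rough base R illustration, with made-up x values and a hypothetical shift of 0.1:

```r
x  <- 2^(2:6)    # 4, 8, 16, 32, 64
tx <- log2(x)    # positions on the transformed axis
diff(tx)         # neighbouring points are 1 unit apart

shift <- 0.1     # hypothetical offset in log2 units
ratio <- 2^(tx + shift) / x
ratio            # the same multiplicative factor, 2^0.1, at every point

# A dodge of 1000 is far larger than the whole transformed axis
1000 > diff(range(tx))
```

So a dodge of 1000 that was sensible on the linear axis pushes the points far outside the data range once the axis is transformed.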
Adding scale_y_continuous(trans="log2") yields the results I was looking for:
I am pretty sure this is easy to do, but I can't find a good way to phrase it as a search on Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
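For a real factor, the same idea could be generalised by mapping each level to an odd position and offsetting the two geoms around it. A sketch, where lev, centre, jitter_row and violin_row are illustrative names and not part of the question's data:

```r
lev <- factor(c("A", "B", "C", "A"))

centre     <- 2 * as.integer(lev) - 1   # odd positions: 1, 3, 5, 1
jitter_row <- centre - 0.5              # row used for the jittered points
violin_row <- centre + 0.5              # row used for the violin

data.frame(lev, centre, jitter_row, violin_row)
```

Because the centres are two units apart, the rows end up evenly spaced and interleaved: jitter A, violin A, jitter B, violin B, and so on. The centres can then be used as breaks with levels(lev) as labels.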
I want to create a graph that looks something like this:
However, I would like to incorporate density based on the connected lines (and not individual plot points, as the graph above using geom_density_2d does). The data, in reality, looks something like this:
Here I am showing gene expression over a 4-point time series (y = gene expression value, x = time). In both examples, the centre line was created using LOESS curve fitting.
How can I create a density or contour plot based on the actual individual connecting lines that span from time=1 to time=4?
This is what I have done so far:
# make a dataset
test <- data.frame(gene=rep(1:500, each=4),
time=rep(1:4, 500),
value=rep(c(1,2,3,1), 500))
# add random noise to dataset
test$value <- jitter(test$value, factor=1,amount=2)
# first graph created as follows:
ggplot(data=test, aes(x=time, y=value)) +
geom_density_2d(colour="grey") +
scale_x_continuous(limits = c(0,5),
breaks = seq(1,4),
minor_breaks = seq(1)) +
scale_y_continuous(limits = c(-3,8)) +
guides(fill="none") +
theme_classic()
# second plot created as follows
ggplot(test, aes(time, value)) +
geom_line(aes(group = gene),
size = 0.5,
alpha = 0.3,
color = "snow3") +
geom_point() +
scale_y_continuous(limits = c(-3, 8)) +
scale_x_continuous(breaks = seq(1,4), minor_breaks = seq(1)) +
theme_classic()
Thanks in advance for your help!
In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided this following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
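One way such a function might look is sketched here. The name label_y and its arguments cutoff and shift are made up for illustration; the logic is the same ifelse() as in the plot code.

```r
# Hypothetical helper: returns the y position for each label.
# Bars shorter than cutoff * (tallest bar) get the label above the bar,
# all other bars get it inside, shifted by shift * (tallest bar).
label_y <- function(value, cutoff = 0.3, shift = 0.1) {
  top <- max(value)
  ifelse(value < cutoff * top,
         value + shift * top,
         value - shift * top)
}

label_y(c(1, 5, 10))  # 2 4 9
```

Inside the plot you would then write aes(y = label_y(Riserva_Riv_Fine_Periodo)) and adjust cutoff or shift in one place.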
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")
I have a plot with three different lines. I want one of those lines to have points on it as well. I also want the two lines without points to be thicker than the one with points. I have managed to get the plot I want, but the legend isn't keeping up.
library(ggplot2)
y <- c(1:10, 2:11, 3:12)
x <- c(1:10, 1:10, 1:10)
testnames <- c(rep('mod1', 10), rep('mod2', 10), rep('meas', 10))
df <- data.frame(testnames, y, x)
ggplot(data=df, aes(x=x, y=y, colour=testnames)) +
geom_line(aes(size=testnames)) +
scale_size_manual("", values=c(0.5,1,1)) +
geom_point(aes(alpha=testnames), size=5, shape=4) +
scale_alpha_manual("", values=c(1, 0, 0))
I can remove the second (black) legend:
ggplot(data = df, aes(x=x, y=y, colour=testnames)) +
geom_line(aes(size=testnames)) +
scale_size_manual("", values=c(0.5,1,1), guide='none') +
geom_point(aes(alpha=testnames), size=5, shape=4) +
scale_alpha_manual("", values=c(1, 0.05, 0.05), guide='none')
But what I really want is a merge of the two legends - a legend with colours, cross only on the first variable (meas) and the lines of mod1 and mod2 thicker than the first line. I have tried guide and override, but with little luck.
You don't need transparency to hide the shapes for mod1 and mod2. You can omit these points from the plot and legend by setting their shape to NA in scale_shape_manual:
ggplot(data = df, aes(x = x, y = y, colour = testnames, size = testnames)) +
geom_line() +
geom_point(aes(shape = testnames), size = 5) +
scale_size_manual(values=c(0.5, 2, 2)) +
scale_shape_manual(values=c(8, NA, NA))
This gives the following plot:
NOTE: I used some more distinct values in the size-scale and another shape in order to better illustrate the effect.
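The values above are matched to the levels in alphabetical order (meas, mod1, mod2). If you would rather not rely on that ordering, a named vector should work as well. A minimal sketch that rebuilds the question's data and inspects the shapes assigned to the point layer:

```r
library(ggplot2)

df <- data.frame(
  testnames = rep(c("mod1", "mod2", "meas"), each = 10),
  x = rep(1:10, 3),
  y = c(1:10, 2:11, 3:12)
)

p <- ggplot(df, aes(x = x, y = y, colour = testnames)) +
  geom_line() +
  geom_point(aes(shape = testnames), size = 5) +
  # Named values: only meas gets a visible shape
  scale_shape_manual(values = c(meas = 8, mod1 = NA, mod2 = NA))

# Shapes assigned in the point layer (layer 2), counted including NA
shapes <- layer_data(p, 2)$shape
table(shapes, useNA = "ifany")
```

Printing p will warn that rows with missing values were removed from the point layer; those are the mod1 and mod2 points that are intentionally hidden.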