I want to create a graph that looks something like this:
However, I would like to incorporate density based on the connected lines (and not individual plot points, as the graph above using geom_density_2d does). The data, in reality, looks something like this:
Here I am showing gene expression over a 4-point time series (y = gene expression value, x = time). In both examples, the centre line was created using LOESS curve fitting.
How can I create a density or contour plot based on the actual individual connecting lines that span from time=1 to time=4?
This is what I have done so far:
library(ggplot2)
# make a dataset: 500 genes, each measured at 4 time points
test <- data.frame(gene=rep(1:500, each=4),
                   time=rep(1:4, 500),
                   value=rep(c(1,2,3,1), 500))
# add random noise to dataset
test$value <- jitter(test$value, factor=1,amount=2)
# first graph created as follows:
ggplot(data=test, aes(x=time, y=value)) +
geom_density_2d(colour="grey") +
scale_x_continuous(limits = c(0,5),
breaks = seq(1,4),
minor_breaks = seq(1)) +
scale_y_continuous(limits = c(-3,8)) +
guides(fill="none") +
theme_classic()
# second plot created as follows
ggplot(test, aes(time, value)) +
geom_line(aes(group = gene),
size = 0.5,
alpha = 0.3,
color = "snow3") +
geom_point() +
scale_y_continuous(limits = c(-3, 8)) +
scale_x_continuous(breaks = seq(1,4), minor_breaks = seq(1)) +
theme_classic()
Thanks in advance for your help!
I am currently plotting (long-format) data consisting of fluorescence (RFU) on the primary y-axis and growth (OD600) on the secondary y-axis. I have managed to create the plots, but I find it very difficult to log-transform the secondary y-axis (for OD600) without messing up the entire plot. (The data all come from the same data frame.)
My question is this: is there any way to log10-transform only the secondary y-axis (from 0.01 to 1), with perhaps 5 breaks, something like ("0.01","0.1","0.5","0.1")?
My code looks like this (I apologize for the ugly code):
for (i in 1:length(unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))])){
print(i)
coeff <- 1/max(lf_combined_test$normalized_gfp)
p1<-lf_combined_test[lf_combined_test$media %in% unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i], ] %>%
# filter(normalized_gfp>0) %>%
filter(row_number() %% 3 == 1) %>%
ggplot( aes(x=time)) +
geom_bar( aes(y=normalized_gfp), stat="identity", size=.1, fill="green", color="green", alpha=.4)+
geom_line( aes(y=od / coeff), size=2, color="tomato") +
scale_x_continuous(breaks = round(seq(0,92, by = 5),1))+
geom_vline(xintercept = 12, linetype="dotted",
color = "blue", size=1)+
scale_y_continuous(limits = c(0,80000),
name = "Relative Flourescence [RFU]/[OD] ",
sec.axis = sec_axis(~.*coeff, name="[OD600]")
) +
scale_y_log10(limits=c(0.01,1))+
theme_grey() +
theme(
axis.title.y = element_text(color = "green", size=13),
axis.title.y.right = element_text(color = "tomato", size=13)
) +
ggtitle(paste("Relative fluorescence & OD600 time series for",unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i],sep=" "))
print(p1)
}
Which gives a plot that looks like this for now:
Thank you very much in advance! :))
Yes, this is certainly possible. Without your data set it is difficult to give you specific code, but here is an example using the built-in mtcars data set. We plot a best-fitting line for mpg against an x axis of wt.
p <- ggplot(mtcars, aes(wt, mpg)) + geom_smooth(aes(color = 'mpg'))
p
Suppose we want to draw the value of disp on a log scale, which we will show on a secondary y axis. We need to log-transform our data to do this, but also multiply it by 10 to get it onto a similar visual scale to the mpg line:
p <- p + geom_smooth(aes(y = 10 * log10(disp), color = 'disp'))
p
To draw the secondary axis, we need to supply it with the inverse of the transformation 10 * log10(x), which is 10^(x/10), and we will supply appropriately logarithmic breaks at 10, 100 and 1000:
p + scale_y_continuous(
sec.axis = sec_axis(~ 10^(.x/10), breaks = c(10, 100, 1000), name = 'disp'))
It seems that you are generating the values of your line by using od / coeff, and reversing that transform with .*coeff, which seems appropriate. To get a log10 axis, however, you will need to plot something like log10(od) * constant and reverse it in sec_axis with 10^(.x / constant). Without your data, it's impossible to know what this constant should be, but you can play around with different values until it looks right visually.
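As a rough sketch of that last step (not the asker's data: the `demo` data frame, `primary_max`, and the helpers `to_primary()` and `from_primary()` are made-up names for illustration), one way to avoid guessing the constant is to map log10(od) linearly onto the primary range and invert that map in sec_axis. Because log10(od) is negative for OD values below 1, this variant adds an offset so that 0.01 lands at 0 and 1 lands at the top of the primary axis:

```r
library(ggplot2)

# Hypothetical stand-in for the real data: fluorescence on 0..80000, OD on 0.01..1
demo <- data.frame(
  time = 0:50,
  gfp  = seq(0, 80000, length.out = 51),
  od   = 0.01 * 100^seq(0, 1, length.out = 51)
)

primary_max <- 80000  # assumed upper limit of the primary (RFU) axis

# Map log10(od) from [-2, 0] onto [0, primary_max], and back again
to_primary   <- function(od) (log10(od) + 2) / 2 * primary_max
from_primary <- function(y) 10^(y / primary_max * 2 - 2)

p <- ggplot(demo, aes(x = time)) +
  geom_col(aes(y = gfp), fill = "green", alpha = 0.4) +
  geom_line(aes(y = to_primary(od)), colour = "tomato") +
  scale_y_continuous(
    name     = "Relative fluorescence [RFU]",
    limits   = c(0, primary_max),
    sec.axis = sec_axis(~ from_primary(.), name = "[OD600]",
                        breaks = c(0.01, 0.1, 1))
  )
print(p)
```

The primary scale stays linear; only the labels on the right-hand axis are logarithmic, so there is no need for a separate scale_y_log10() call.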
I'd like to use facet_zoom, but for some reason the zoomed area comes out empty.
The two data sets I use are just numeric vectors of 1,000,000 numbers generated from a modified polynomial distribution. In the zoomed area there is a small spike that I'd like to show.
library(ggplot2)
library(ggforce)
prova <-readRDS("probcond1.rds")
prova1 <-readRDS("probpoly.rds")
dfGamma <-data.frame(prova)
ggplot(dfGamma, aes(x=prova)) + stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3)
g <- ggplot(dfGamma, aes(x=prova)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000,4000,5000), trans="log1p", expand=c(0,0)) +
theme_bw()
g+expand_limits(x = c(1, 6000)) +facet_zoom(xlim = c(4000,5000))
I'm really new to R, sorry for my ignorance.
Your axis is on a log1p scale, so your xlim should be wrapped inside log1p to do a zoom. You can do as follows:
g+expand_limits(x = c(1, 6000)) +facet_zoom(xlim = c(log1p(4000),log1p(5000)))
Here is a sample using the mtcars dataset.
library(ggplot2)
library(ggforce)
g <- ggplot(mtcars, aes(x=hp)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300), trans="log1p", expand=c(0,0)) +
theme_bw()
Using facet_zoom(xlim = c(100,300)) as follows produces an empty zoom panel, because the raw values 100 and 300 do not exist on the transformed x-axis of g:
g+expand_limits(x = c(1, 300)) +facet_zoom(xlim = c(100,300))
Output-1 (flat value zoom)
If you transform the xlim using log1p, you can zoom on the corresponding values of the x-axis of plot g. You can do that as follows:
g+expand_limits(x = c(1, 300)) +facet_zoom(xlim = c(log1p(100),log1p(300)))
Output-2 (log1p zoom)
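To see why the untransformed limits miss the data, it may help to look at the numbers themselves. A minimal sketch using only base R (`zoom_limits` is just an illustrative name):

```r
# Limits as they appear on the log1p-transformed axis
zoom_limits <- log1p(c(100, 300))
zoom_limits          # roughly 4.62 and 5.71

# expm1() is the inverse of log1p(), so this recovers the raw limits
expm1(zoom_limits)
```

On the transformed scale the whole mtcars hp range sits below 6, so raw limits of 100 to 300 lie far beyond the right edge of the data, which is consistent with the empty zoom panel.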
If you want to zoom in the axis independently, you can do as follows:
g+expand_limits(x = c(1, 300)) +facet_zoom(xlim = c(log1p(100),log1p(300)), ylim = c(5,10), split = TRUE)
Output
As you can see, I zoomed the y-axis between 5 and 10. Setting split = TRUE makes the x and y zooms independent, so you get separate views for each zoomed axis; if you just want one view, leave split at its default value, FALSE. The ggforce package manual has a lot more information that you might want to consult.
Hope that helps.
I am pretty sure this is easy to do, but I can't seem to find a proper way to phrase this question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), effectively creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
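If the grouping variable is an actual factor, as in the question, the same idea might be applied by deriving the numeric positions from the factor codes. This is only a sketch on made-up data: the column names E and C are borrowed from the question, while `plot_factor_data` and `pos` are hypothetical names introduced here.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: a two-level factor E and a value C in [0, 1]
plot_factor_data <- data.frame(
  E = factor(rep(c("A", "B"), each = 10)),
  C = runif(20)
)

# Space the levels two units apart (1, 3, ...) so that each level has room
# for a jitter row and a violin row
plot_factor_data$pos <- 2 * as.integer(plot_factor_data$E) - 1

p <- ggplot(plot_factor_data) +
  geom_jitter(aes(x = pos - 0.5, y = C), width = 0.25, height = 0) +
  geom_violin(aes(x = pos + 0.5, y = C, group = E), width = 0.5) +
  # Label each pair of rows with its factor level; reverse so A ends up on top
  scale_x_reverse(breaks = sort(unique(plot_factor_data$pos)),
                  labels = levels(plot_factor_data$E)) +
  coord_flip() +
  labs(x = "E")
print(p)
```

Because each level gets its own jitter row directly followed by its violin row, this layout also covers the interleaving asked about in the side quest.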
I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to what I want. But I would like each point to be plotted only on top of the violin plot for the matching assay type. That is, the jittered positions of the points should be set such that the triangles only ever fall on the upper teal violin plot and the circles on the bottom red violin plot for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
You can use interaction between assay & project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labeling can be adjusted by using a numeric x axis:
# cbind the interaction as a numeric
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))
I am doing some research on non-defaulters and defaulters with regard to banking. In that context I am plotting their distributions relative to some score in a bar plot. The higher the score, the better the credit rating.
Since the number of defaults is very small compared to the number of non-defaults, plotting both on the same bar plot is not very informative, as you can hardly see the defaults. I therefore make a second bar plot based on the defaulters' scores only, but on the same interval scale as the full bar plot of both defaulters' and non-defaulters' scores. I would then like to add vertical lines to the first bar plot indicating where the highest and the lowest defaulter scores are located, to get a view of where the distribution of the defaulters fits into the overall distribution of defaulters and non-defaulters.
Below is the code I am using, with (seeded) random data substituted for the real data.
library(ggplot2)
library(grid)  # needed for grid.newpage(), viewport() and friends below
#NDS represents non-defaults and DS defaults on the same scale
#although here being just some random normals for the sake of simplicity.
set.seed(10)
NDS<-rnorm(10000,sd=1)-2
DS<-rnorm(100,sd=2)-5
#Cutoffs are constructed such that intervals of size 0.3
#contain all values of NDS & DS
minCutoff <- -9.3
maxCutoff <- 2.1
#Generate the actual interval "bins"
NDS_CUT<-cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
DS_CUT<-cut(DS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
#Manually generate where to put the vertical lines for min(DS) and max(DS)
minDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[1]
maxDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[32]
#Generate data frame - seems stupid, but makes sense
#when the "real" data is used :-)
NDSdataframe<-cbind(as.data.frame(NDS_CUT),rep(factor("State-1"),length(NDS_CUT)))
colnames(NDSdataframe)<-c("Score","Action")
DSdataframe<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe)<-c("Score","Action")
fulldataframe<-rbind(NDSdataframe,DSdataframe)
attach(fulldataframe)
#Plot the full distribution of NDS & DS
# with geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
# that unfortunately does not show :-(
fullplot <- ggplot(fulldataframe, aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) +
  geom_bar(position="stack") +
  theme(axis.text.x = element_text(angle = 45)) +
  theme(legend.position = "none") +
  xlab("Scoreinterval") + ylab("Antal pr. interval") +
  geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
#Generate dataframe for DS only
#It might seem stupid, but again makes sense
#when using the original data :-)
DSdataframe2<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe2)<-c("theScore","theAction")
#Calculate max number of observations to adjust bar plot of DS only
myMax<-max(table(DSdataframe2))+1
attach(DSdataframe2)
#Generate bar plot of DS only
subplot <- ggplot(DSdataframe2, aes(theScore, fill=factor(theAction))) +
  geom_bar(position="stack") +
  theme(axis.text.x = element_text(angle = 45)) +
  theme(legend.position = "none") +
  ylim(0, myMax) +
  xlab("Scoreinterval") + ylab("Antal pr. interval")
#plot on a grid
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
vplayout <- function(x, y)
viewport(layout.pos.row = x, layout.pos.col = y)
print(fullplot, vp = vplayout(1, 1))
print(subplot, vp = vplayout(2, 1))
#detach dataframes
detach(DSdataframe2)
detach(fulldataframe)
Furthermore, if anybody has an idea of how I can align the two plots so that the matching intervals sit directly above and below each other on the grid plot, that would be much appreciated.
Hope somebody is able to help!
Thanks in advance,
Christian
Wrap aes around the xintercept in the geom_vline layer:
... + geom_vline(aes(xintercept = minDS_bar)) + geom_vline(aes(xintercept = maxDS_bar))
Question 1:
Since you provide the vertical lines as data, you have to map the aesthetics first, using aes()
fullplot <-ggplot(
fulldataframe,
aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) +
geom_bar(position="stack") +
theme(axis.text.x = element_text(angle = 45)) +
theme(legend.position = "none") +
xlab("Scoreinterval") +
ylab("Antal pr. interval") +
geom_vline(aes(xintercept = minDS_bar)) +
geom_vline(aes(xintercept = maxDS_bar))
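On more recent ggplot2 versions, where opts() and theme_text() have been replaced by theme() and element_text(), an alternative is to place the lines by bin position rather than by bin label. The following is only a sketch under that assumption and is not the approach from the answer above; the `scores` data frame, the 0.5-wide bins and the variable names are illustrative and not taken from the question:

```r
library(ggplot2)

set.seed(10)
NDS <- rnorm(10000, sd = 1) - 2   # non-defaulters (random stand-in)
DS  <- rnorm(100, sd = 2) - 5     # defaulters (random stand-in)

scores <- data.frame(
  score  = c(NDS, DS),
  status = rep(c("non-default", "default"), c(length(NDS), length(DS)))
)

# Bin edges chosen from the data so that every score falls into a bin
bin_breaks <- seq(floor(min(scores$score)), ceiling(max(scores$score)), by = 0.5)
scores$bin <- cut(scores$score, breaks = bin_breaks, include.lowest = TRUE)

# Integer positions of the bins holding the lowest and highest defaulter score
ds_bins <- range(as.integer(scores$bin[scores$status == "default"]))

p <- ggplot(scores, aes(bin, fill = status)) +
  geom_bar(position = "stack") +
  # drop = FALSE keeps empty bins, so factor codes match axis positions
  scale_x_discrete(drop = FALSE) +
  geom_vline(xintercept = ds_bins, linetype = "dashed") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  xlab("Score interval") + ylab("Count per interval")
print(p)
```

Keeping the empty bins matters here: if unused levels were dropped, the integer codes would no longer line up with the positions on the axis.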
Second question:
To align the plots, you can use the align.plots() function in package ggExtra
install.packages("dichromat")
install.packages("ggExtra", repos="http://R-Forge.R-project.org")
library(ggExtra)
ggExtra::align.plots(fullplot, subplot)