I have some biological data for two individuals, and I graph it using R as a scatterplot using ggplot like this:
p1<-ggplot(data, aes(meth_matrix$sample1, meth_matrix$sample3)) +
geom_point() +
theme_minimal()
which works perfectly, but I want to add lines to it. First, the abline that divides the scatterplot in half:
p1 + geom_abline(color="blue")
My question is: how can I draw two red lines parallel to that diagonal (the y-intercept would be 0.2, and the slope would be the same as the blue line)?
Also, how can I draw the difference of both samples in a similar scatterplot (it will look like a horizontal scatterplot) with ggplot? Right now I can only do it with plot, like this:
dif_samples<-meth_matrix$sample1- meth_matrix$sample3
plot(dif_samples, main="difference",
xlab="CpGs ", ylab="Methylation ", pch=19)
(I'd also like to add the horizontal blue line and the red lines parallel to it.)
Please help!!!
Thank you very much.
You can specify slopes and intercepts in the geom_abline() function. I'll use the iris dataset that ships with base R (in the datasets package) to illustrate:
# I'll use the iris dataset. I normalise by dividing the variables by their max so that
# a line through the origin will be visible
library(ggplot2)
p1 <- ggplot(iris, aes(Sepal.Length/max(Sepal.Length), Sepal.Width/max(Sepal.Width))) +
geom_point() + theme_minimal()
# Draw lines by specifying their slopes and intercepts. Since all lines
# share a slope, I just give one instead of a vector of slopes
p1 + geom_abline(intercept = c(0, .2, -.2), slope = 1,
color = c("blue", "red", "red"))
I'm not as clear on exactly what you want for the second plot, but you can plot differences directly in the call to ggplot() and you can add horizontal lines with geom_hline():
# Now let's plot the difference between sepal length and width
# for each observation
p2 <- ggplot(iris, aes(x = 1:nrow(iris),
y = (Sepal.Length - Sepal.Width) )) +
geom_point() + theme_minimal()
# we'll add horizontal lines -- you can pick values that make sense for your problem
p2 + geom_hline(yintercept = c(3, 3.2, 2.8),
color = c("blue", "red", "red"))
Created on 2018-03-21 by the reprex package (v0.2.0).
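Applied to the asker's own column names, the two plots might look like the sketch below. The meth_matrix here is simulated stand-in data (the real one was not posted), and the helper columns cpg and dif are hypothetical names added for illustration. The offsets of +0.2 and -0.2 follow the answer above.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's meth_matrix: methylation values in [0, 1]
set.seed(1)
meth_matrix <- data.frame(sample1 = runif(200))
meth_matrix$sample3 <- pmin(pmax(meth_matrix$sample1 + rnorm(200, sd = 0.1), 0), 1)

# Scatterplot with the blue diagonal and two red lines parallel to it
p1 <- ggplot(meth_matrix, aes(sample1, sample3)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "blue") +
  geom_abline(intercept = c(0.2, -0.2), slope = 1, color = "red") +
  theme_minimal()

# Difference plot: one point per CpG, in row order
meth_matrix$cpg <- seq_len(nrow(meth_matrix))                  # hypothetical index column
meth_matrix$dif <- meth_matrix$sample1 - meth_matrix$sample3   # hypothetical difference column

p2 <- ggplot(meth_matrix, aes(cpg, dif)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "blue") +
  geom_hline(yintercept = c(0.2, -0.2), color = "red") +
  labs(title = "difference", x = "CpGs", y = "Methylation") +
  theme_minimal()

p1
p2
```

Passing the data frame to ggplot() and using bare column names inside aes() avoids the meth_matrix$sample1 style, which bypasses the data argument.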
I am trying to add a fitted distribution to the histogram, but after I run it, it is just a straight line. How can I get a density line?
hist(data$price)
lines(density(data$price), lwd = 2, col = "red")
You are using the graphics function hist, which plots counts by default, so a density curve drawn on that scale looks flat. Use the MASS function truehist instead, which plots the histogram on the density scale:
MASS::truehist(data$price)
lines(density(data$price), lwd = 2, col = "red")
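If you would rather stay in base graphics, hist() can also draw on the density scale through its freq argument. A minimal sketch, using mtcars$hp as a stand-in because the question's data was not posted (the variable name price is only illustrative):

```r
# Stand-in for data$price, which was not provided in the question
price <- mtcars$hp

# freq = FALSE puts the bars on the density scale, so the bar areas sum to 1
h <- hist(price, freq = FALSE, main = "Histogram with density curve")

# The kernel density estimate is now on the same scale as the bars
lines(density(price), lwd = 2, col = "red")
```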
@Chriss gave a good solution: it does produce a density curve on top of the histogram. However, it changes the y-axis so that you only see the density values (losing the count values).
Here is an alternate solution that will place the frequency counts on the left-side y-axis and add density as a right-side y-axis. Tweak code as needed for things like bins, color, etc. I'm using the mtcars data as an example since there was no code or data provided in the question to replicate. In addition to the two libraries used here (ggpubr and cowplot), you may need to use some ggplot functions to better customize these plot options.
Code for this solution was modified from https://www.datanovia.com/en/blog/ggplot-histogram-with-density-curve-in-r-using-secondary-y-axis/
# packages needed
library(ggpubr)
library(cowplot)
# load data (none provided in the original question)
data("mtcars")
# create histogram (I have 10 bins here, but you may need a different amount)
phist <- gghistogram(mtcars, x="hp", bins=10, fill="blue", ylab="Count (blue)") + ggtitle("Car Horsepower Histogram")
# create density plot, removing many plot elements
pdens <- ggdensity(mtcars, x="hp", col="red", size=2, alpha = 0, ylab="Density (red)") +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), position = "right") +
theme_half_open(11, rel_small = 1) +
rremove("x.axis")+
rremove("xlab") +
rremove("x.text") +
rremove("x.ticks") +
rremove("legend")
# overlay and display the plots
aligned_plots <- align_plots(phist, pdens, align="hv", axis="tblr")
ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
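A different route to the same picture, using ggplot2 alone, is to rescale the density curve to the count scale and label the right-hand axis with sec_axis(). This is a sketch of a swapped-in technique, not the datanovia method above. It assumes ggplot2 3.3.0 or later (for after_stat()), and the bin width of 40 is an arbitrary choice for mtcars$hp.

```r
library(ggplot2)

n  <- nrow(mtcars)
bw <- 40                      # assumed bin width, chosen only for this example
scale_factor <- n * bw        # count = density * n * bin width

p <- ggplot(mtcars, aes(x = hp)) +
  # Histogram on the count scale (left axis)
  geom_histogram(binwidth = bw, fill = "blue", color = "white") +
  # Density curve stretched to the count scale so it overlays the bars
  geom_density(aes(y = after_stat(density) * scale_factor), color = "red") +
  # Right axis undoes the stretch, so it reads in density units
  scale_y_continuous(
    name = "Count (blue)",
    sec.axis = sec_axis(~ . / scale_factor, name = "Density (red)")
  ) +
  ggtitle("Car Horsepower Histogram")

p
```

Because both layers share one panel, the curve and bars stay aligned without align_plots(). The cost is that the scale factor has to be recomputed whenever the bin width changes.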
I would like to draw a histogram with a density curve and then put a boxplot above the top margin. I know how to do this using the hist(), boxplot() and layout() functions, or using functions from the ggplot2 and grid packages. However, I am looking for a specific solution using ggplot2 and the ggMarginal() function within the ggExtra package. Let's simulate some data before I present my problem:
library(ggplot2)
library(ggExtra)
set.seed(1234)
vdat = data.frame(V1 = c(sample(1:10, 100, T), 99))
vname = colnames(vdat)[1]
boxplot(vdat[[vname]], horizontal = T)
To note, I explicitly insert an outlier 99 into a sample of numbers from 1 to 10. Hence, when I draw the boxplot, 99 should be displayed as an outlier.
I can easily draw a histogram using ggplot2.
p = ggplot(data=vdat, aes_string(x=vname)) +
geom_histogram(aes(y=stat(density)),
bins=nclass.Sturges(vdat[[vname]])+1,
color="black", fill="steelblue", na.rm=T) +
geom_density(na.rm=T) +
theme_bw()
p
When I try to use ggMarginal to add a marginal boxplot, the added boxplots are not right.
p1 = ggMarginal(p, type="boxplot")
p1
The boxplot on the right might be right. But the one on top, which is the very one I need, is definitely wrong. The outlier 99 is not there and the median is clearly not right.
When I pass the original data, x, and y instead of the plot p, as suggested by the help documentation, I get the right boxplot but the histogram is now gone.
p2 = ggMarginal(data=vdat, x=vname, y=NA, type="boxplot", margins="x")
p2
How can I combine the correct parts of p1 and p2 such that I have the histogram from p1 and the boxplot from p2?
I am trying something like
p1 + p2
or
ggMarginal(p1, data=vdat, x=vname, y=NA, type="boxplot", margins="x")
But they are not working.
According to ggMarginal's documentation, p is expected to be a ggplot scatterplot. We can insert the following line as the first geom layer in p:
geom_point(aes(y = 0.01), alpha = 0)
y = 0.01 was chosen as a value within the existing plot's y-axis range, and alpha = 0 ensures this layer isn't visible.
Running your code with this p should give you the boxplot with outlier.
p <- ggplot(data=vdat, aes_string(x=vname)) +
geom_point(aes(y = 0.01), alpha = 0) +
geom_histogram(aes(y=stat(density)),
bins=nclass.Sturges(vdat[[vname]])+1,
color="black", fill="steelblue", na.rm=T) +
geom_density(na.rm=T) +
theme_bw()
p1 = ggMarginal(p, type="boxplot", margins = "x")
p1
By the way, I don't think it really makes sense to plot a boxplot to the right in this instance, since you have not assigned any variable to y.
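To check what the top boxplot should display, independently of ggMarginal, you can compute the boxplot statistics for the simulated data directly with boxplot.stats() from base R. A small sketch that reuses the question's seed and data:

```r
# Recreate the question's data: 100 draws from 1..10 plus one planted outlier
set.seed(1234)
vdat <- data.frame(V1 = c(sample(1:10, 100, TRUE), 99))

# boxplot.stats() returns the five-number summary used by boxplot()
bstats <- boxplot.stats(vdat$V1)

bstats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bstats$out    # points drawn individually beyond the whiskers
```

If the marginal boxplot is correct, its median should match the third element of bstats$stats and the planted value 99 should appear as an outlier.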
I'm trying to make a plot that overlays a bunch of simulated density plots that are one color with low alpha and one empirical density plot with high alpha in a new color. This produces a plot that looks about how I want it.
library(ggplot2)
model <- c(1:100)
values <- rnbinom(10000, 1, .4)
df = data.frame(model, values)
empirical_data <- rnbinom(1000, 1, .3)
ggplot() +
geom_density(aes(x=empirical_data), color='orange') +
geom_line(stat='density',
data = df,
aes(x=values,
group = model),
color='blue',
alpha = .05) +
xlab("Value")
However, it doesn't have a legend and I can't figure out how to add a legend to differentiate plots from df and plots from empirical_data.
The other road I started to go down was to put them all in one dataframe but I couldn't figure out how to change the color and alpha for just one of the density plots.
Moving color = ... inside aes() turns it from a fixed setting into a mapping, which is what makes ggplot2 draw a legend. The values you pass there ('a' and 'b' below) are just keys. The actual colors are assigned in scale_color_manual, so you can change them to whatever you want.
ggplot() +
geom_density(aes(x=empirical_data, color='a')) +
geom_line(stat='density',
data = df,
aes(x=values,
group = model,
color='b'),
alpha = .05) +
scale_color_manual(name = 'data source',
values = c('b'='blue','a'='orange'),
breaks = c('a','b'),
labels = c('empirical_data','df')) +
xlab("Value")
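The single-data-frame route mentioned in the question can also work: map both color and alpha to a column that says where each row came from, then set the values per group with manual scales. The sketch below is one way to do that under assumptions. The column name source, the model id 0 for the empirical data, and the seed are all made up for illustration.

```r
library(ggplot2)

set.seed(1)
# 100 simulated models with 100 draws each
sim <- data.frame(model  = rep(1:100, each = 100),
                  values = rnbinom(10000, 1, .4),
                  source = "simulated")
# Empirical data gets its own group id (0 is an arbitrary, unused id)
emp <- data.frame(model  = 0,
                  values = rnbinom(1000, 1, .3),
                  source = "empirical")
both <- rbind(sim, emp)

p <- ggplot(both, aes(x = values, group = model,
                      color = source, alpha = source)) +
  geom_line(stat = "density") +
  # Same scale name on both scales, so they merge into one legend
  scale_color_manual(name = "data source",
                     values = c(simulated = "blue", empirical = "orange")) +
  scale_alpha_manual(name = "data source",
                     values = c(simulated = 0.05, empirical = 1)) +
  xlab("Value")

p
```

Naming the entries of values ties each color and alpha to a group explicitly, so the legend does not depend on the alphabetical order of the groups.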
I have data from 2 populations.
I'd like to get the histogram and density plot of both on the same graphic.
With one color for one population and another color for the other one.
I've tried this (example):
library(ggplot2)
AA <- rnorm(100000, 70,20)
BB <- rnorm(100000,120,20)
valores <- c(AA,BB)
grupo <- c(rep("AA", 100000),c(rep("BB", 100000)))
todo <- data.frame(valores, grupo)
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram(aes(y=..density..), binwidth=3)+ geom_density(aes(color=grupo))
But I'm just getting a graphic with a single line and a single color.
I would like to have different colors for the two density lines, and if possible for the histograms as well.
I've done it with ggplot2 but base R would also be OK.
or I don't know what I've changed and now I get this:
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram( position="identity", binwidth=3, alpha=0.5)+
geom_density(aes(color=grupo))
but the density lines were not plotted.
or even strange things like
I suggest this ggplot2 solution:
ggplot(todo, aes(valores, color=grupo)) +
geom_histogram(position="identity", binwidth=3, aes(y=..density.., fill=grupo), alpha=0.5) +
geom_density()
@skan: Your attempt was close, but you plotted the frequencies instead of density values in the histogram.
A base R solution could be:
hist(AA, probability = T, col = rgb(1,0,0,0.5), border = rgb(1,0,0,1),
xlim=range(AA,BB), breaks= 50, ylim=c(0,0.025), main="AA and BB", xlab = "")
hist(BB, probability = T, col = rgb(0,0,1,0.5), border = rgb(0,0,1,1), add=T)
lines(density(AA))
lines(density(BB), lty=2)
For alpha I used rgb, but there are other ways to set it; see alpha() in the scales package, for instance. I also added the breaks parameter for the plot of the AAs, which requests about 50 bins instead of the default, so its bins come out narrower than those of the BB group.
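As one more way to get a transparent color without typing the rgb() components by hand, base R's grDevices package provides adjustcolor(). A brief sketch (the variable names are only illustrative):

```r
# Semi-transparent red built from its rgb components, as in the answer above
red_rgb <- rgb(1, 0, 0, 0.5)

# The same color derived from a named color with adjustcolor() (grDevices, base R)
red_adj <- adjustcolor("red", alpha.f = 0.5)

# Both are hex strings with an alpha channel and can be passed to col = in hist()
identical(red_rgb, red_adj)
```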
I have some code that plots a histogram of some values, along with a few horizontal lines to represent reference points to compare against. However, ggplot is not generating a legend for the lines.
library(ggplot2)
library(dplyr)
## Simulate an equal mix of uniform and non-uniform observations on [0,1]
x <- data.frame(PValue=c(runif(500), rbeta(500, 0.25, 1)))
y <- c(Uniform=1, NullFraction=0.5) %>% data.frame(Line=names(.) %>% factor(levels=unique(.)), Intercept=.)
ggplot(x) +
aes(x=PValue, y=..density..) + geom_histogram(binwidth=0.02) +
geom_hline(aes(yintercept=Intercept, group=Line, color=Line, linetype=Line),
data=y, alpha=0.5)
I even tried reducing the problem to just plotting the lines:
ggplot(y) +
geom_hline(aes(yintercept=Intercept, color=Line)) + xlim(0,1)
and I still don't get a legend. Can anyone explain why my code isn't producing plots with legends?
In the ggplot2 version current at the time, geom_hline defaulted to show_guide = FALSE. If you turn this on, the legend will appear. Since ggplot2 2.0.0 the argument is called show.legend (show_guide is deprecated), so that is what the code below uses. Also, alpha needs to be inside of aes, otherwise the colours of the lines will not be plotted properly in the legend. The code looks like this:
ggplot(x) +
aes(x=PValue, y=..density..) + geom_histogram(binwidth=0.02) +
geom_hline(aes(yintercept=Intercept, colour=Line, linetype=Line, alpha=0.5),
data=y, show.legend=TRUE)
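For reference, here is a self-contained variant written against current ggplot2 conventions. It is a sketch under assumptions: it requires ggplot2 3.3.0 or later for after_stat(), it builds the reference-line data frame without dplyr, and it keeps alpha as a fixed setting outside aes() (a choice of this sketch, not of the answer above) so that no separate alpha legend is created. The object names p_values and ref_lines and the seed are made up for the example.

```r
library(ggplot2)

set.seed(42)
# Equal mix of uniform and non-uniform observations on [0, 1]
p_values <- data.frame(PValue = c(runif(500), rbeta(500, 0.25, 1)))

# Reference lines, one row per line
ref_lines <- data.frame(
  Line      = factor(c("Uniform", "NullFraction"),
                     levels = c("Uniform", "NullFraction")),
  Intercept = c(1, 0.5)
)

p <- ggplot(p_values, aes(x = PValue)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.02) +
  # colour and linetype are mapped, so they appear in the legend
  geom_hline(data = ref_lines,
             aes(yintercept = Intercept, colour = Line, linetype = Line),
             alpha = 0.5, show.legend = TRUE)

p
```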