I have a scatterplot and I need to draw a contour that contains all (or almost all) points.
I have managed to do that with stat_density_2d(), with bins set to 2 and geom = 'polygon'. However, since I have to set bins to 2, I still get 2 contours: one in the 'center' of the polygon and the outer one. I only need the outer one.
What I have (a polygon with 2 bins: an inner one and the outer one):
What I need:
The inner bin looks small in this example but it looks unprofessional in bigger and more complex graphs.
Example:
library(ggplot2)  # aes(), geom_point(), stat_density_2d(), ...
library(tibble)   # tibble()
set.seed(20)
x = rnorm(20, 3)
y = rnorm(20, 4)
points = tibble('x'=rnorm(10, 3), 'y'=rnorm(10, 4))
ggplot2::ggplot(data=points, mapping=aes(x=x, y=y, fill='grey', colour='black')) +
geom_point() +
stat_density_2d(aes(colour='black'), bins=2, geom='polygon') +
scale_fill_identity() + scale_colour_identity() +
geom_vline(xintercept=0, colour = 'black', linetype = 'solid') +
geom_hline(yintercept=0, colour = 'black', linetype = 'solid') +
xlim(-8, 8) + ylim(-8, 8)
Similar questions:
(the package on which this solution is based is no longer available)
ggplot: How to draw contour line for 2d scatter plot to outline the data points
How to plot a contour line showing where 95% of values fall within, in R and in ggplot2
Related
I want to draw two circles inside each other with ggplot2.
My attempt so far:
I make some fake data and plot it with geom_line(). If I then convert this with coord_polar(), I cannot see two distinct circles, one inside the other.
library(ggplot2)
library(tidyverse)
x1=seq(0,6000000,1000)
y1=rep(1,length(x1))
y2=rep(2,length(x1))
data=as.data.frame(cbind(x1,y1,y2))
# plot the data
ggplot(data) +
geom_line(aes(x1,y1)) +
geom_line(aes(x1,y2))
#coord_polar()
Created on 2021-12-25 by the reprex package (v2.0.1)
I would avoid the geom_circle option and use the coord_polar option if possible.
The reason is that these two circles have some differences in the x-axis, which I would indicate after drawing the circles.
I would like my plot to look like this
The code you have with coord_polar() is correct; only the plot limits need adjusting so that both circles are visible, e.g.
ggplot(data) +
geom_line(aes(x1,y1)) +
geom_line(aes(x1,y2)) +
coord_polar() + ylim(c(0,NA))
The reason for using ylim is that y is the direction that coord_polar() transforms into the radius.
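To see why the lower y limit matters, the mapping from y to radius can be modelled in a few lines of base R. This is a simplified sketch, not ggplot2's actual code: radius_fraction() is a hypothetical helper, and it assumes the radius is a linear rescaling of y between the axis limits, with no extra scale expansion.

```r
# Fraction of the maximum radius at which a y value would be drawn,
# given the y-axis limits (simplified model, see assumptions above).
radius_fraction <- function(y, limits) {
  (y - limits[1]) / (limits[2] - limits[1])
}

# Limits taken from the data (1 to 2): the line at y = 1 collapses to the centre
radius_fraction(c(1, 2), limits = c(1, 2))
# [1] 0 1

# Lower limit forced to 0, as with ylim(c(0, NA)): both lines get a positive radius
radius_fraction(c(1, 2), limits = c(0, 2))
# [1] 0.5 1.0
```

Under this model, any lower limit below the smallest y value would separate the two circles; 0 is simply a convenient choice.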
Why not use two geom_point() with different sizes and pch = 21?
library(ggplot2)
df <- data.frame(x = 0, y = 0)  # data.frame() avoids needing the tibble package
ggplot(df, aes(x, y)) +
geom_point(pch = 21, size = 50) +
geom_point(pch = 21, size = 40) +
theme_void()
I'm trying to make a plot that overlays a bunch of simulated density plots that are one color with low alpha and one empirical density plot with high alpha in a new color. This produces a plot that looks about how I want it.
library(ggplot2)
model <- c(1:100)
values <- rnbinom(10000, 1, .4)
df = data.frame(model, values)
empirical_data <- rnbinom(1000, 1, .3)
ggplot() +
geom_density(aes(x=empirical_data), color='orange') +
geom_line(stat='density',
data = df,
aes(x=values,
group = model),
color='blue',
alpha = .05) +
xlab("Value")
However, it doesn't have a legend and I can't figure out how to add a legend to differentiate plots from df and plots from empirical_data.
The other road I started to go down was to put them all in one dataframe but I couldn't figure out how to change the color and alpha for just one of the density plots.
Move color = ... inside aes() and pass a label instead of an actual colour. The colour is then mapped through a scale, which is what produces a legend. The actual colours are set in scale_color_manual(), so you can change them to whatever you want there.
ggplot() +
geom_density(aes(x=empirical_data, color='a')) +
geom_line(stat='density',
data = df,
aes(x=values,
group = model,
color='b'),
alpha = .05) +
scale_color_manual(name = 'data source',
breaks = c('b', 'a'),  # fix the key order so the labels below line up with it
values =c('b'='blue','a'='orange'),
labels = c('df','empirical_data')) +
xlab("Value")
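The single-data-frame route mentioned in the question can also work, because alpha can be mapped to the same variable as colour. The following is only a sketch under assumptions: the names sim, emp, all_data and the source column are made up for illustration, and model = 0 is an arbitrary group id for the empirical data.

```r
library(ggplot2)

set.seed(1)
# 100 simulated models with 100 draws each
sim <- data.frame(model = rep(1:100, each = 100),
                  values = rnbinom(10000, 1, .4),
                  source = "simulated")
# Empirical data, given its own group id (0) so it forms one extra curve
emp <- data.frame(model = 0,
                  values = rnbinom(1000, 1, .3),
                  source = "empirical")
all_data <- rbind(sim, emp)

# Colour and alpha are both mapped to `source`, so each gets a manual scale
p <- ggplot(all_data, aes(x = values, group = model,
                          color = source, alpha = source)) +
  geom_line(stat = "density") +
  scale_color_manual(values = c(simulated = "blue", empirical = "orange")) +
  scale_alpha_manual(values = c(simulated = .05, empirical = 1)) +
  xlab("Value")
p
```

Because both scales share the same variable and labels, ggplot2 should merge them into one legend.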
I have some biological data for two individuals, and I graph it using R as a scatterplot using ggplot like this:
p1<-ggplot(data, aes(meth_matrix$sample1, meth_matrix$sample3)) +
geom_point() +
theme_minimal()
which works perfectly, but I want to add lines to it, starting with the abline that divides the scatterplot in half:
p1 + geom_abline(color="blue")
and my question is: how can I draw two red lines parallel to that diagonal (y intercept would be 0.2, slope would be the same as the blue line)?
Also: how can I draw the difference of both samples in a similar scatterplot (it will look like a horizontal scatterplot) with ggplot? right now I can only do it with plot like:
dif_samples<-meth_matrix$sample1- meth_matrix$sample3
plot(dif_samples, main="difference",
xlab="CpGs ", ylab="Methylation ", pch=19)
(I'd also like to add the horizontal blue line and the red lines parallel to the blue line)
Please help!!!
Thank you very much.
You can specify slopes and intercepts in the geom_abline() function. I'll use the iris dataset that ships with base R to illustrate:
# I'll use the iris dataset. I normalise by dividing the variables by their max so that
# a line through the origin will be visible
library(ggplot2)
p1 <- ggplot(iris, aes(Sepal.Length/max(Sepal.Length), Sepal.Width/max(Sepal.Width))) +
geom_point() + theme_minimal()
# Draw lines by specifying their slopes and intercepts. Since all lines
# share a slope, I just give one instead of a vector of slopes
p1 + geom_abline(intercept = c(0, .2, -.2), slope = 1,
color = c("blue", "red", "red"))
I'm not as clear on exactly what you want for the second plot, but you can plot differences directly in the call to ggplot() and you can add horizontal lines with geom_hline():
# Now let's plot the difference between sepal length and width
# for each observation
p2 <- ggplot(iris, aes(x = 1:nrow(iris),
y = (Sepal.Length - Sepal.Width) )) +
geom_point() + theme_minimal()
# we'll add horizontal lines -- you can pick values that make sense for your problem
p2 + geom_hline(yintercept = c(3, 3.2, 2.8),
color = c("blue", "red", "red"))
Created on 2018-03-21 by the reprex package (v0.2.0).
Using ggplot's geom_pointrange() function, how do I change the size of the point and the thickness of the line separately?
Example:
# make test data
df <- data.frame(y=10, ymin=1, ymax=20, x=1)
# store ggplot object
p <- ggplot(data=df, aes(y=y, ymin=ymin, ymax=ymax, x=x))
# plot 1: big dot and thick line
p + geom_pointrange(fill='blue', color='grey', shape=21, size=5)
# plot 2: small dot and thin line (I want small dot and thick line or vice versa)
p + geom_pointrange(fill='blue', color='grey', shape=21, lwd=1, size=5)
Plot 1:
Plot 2:
Can I get a small dot with a thick line (or vice-versa)?
A workaround might be to plot the line and point as separate geoms using geom_point and geom_errorbar. Unfortunately, my real application involves jittering, so the point and the interval end up in different places (unless maybe I can control the jittering?).
I can find similar questions on SO (like this), but they don't directly answer this one.
Thanks!
You can use fatten in combination with size:
p + geom_pointrange(fill='blue', color='grey', shape=21, fatten = 20, size = 5)
p + geom_pointrange(fill='blue', color='grey', shape=21, fatten = .5, size = 5)
See ?geom_pointrange:
fatten
A multiplicative factor used to increase the size of the
middle bar in geom_crossbar() and the middle point in
geom_pointrange().
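In more recent ggplot2 releases (3.4.0 and later, as far as I know) line thickness has its own aesthetic, linewidth, so point size and line thickness can also be set directly. A minimal sketch under that version assumption, rebuilding the df and p from the question:

```r
library(ggplot2)

# Same test data and base plot as in the question
df <- data.frame(y = 10, ymin = 1, ymax = 20, x = 1)
p <- ggplot(data = df, aes(y = y, ymin = ymin, ymax = ymax, x = x))

# Small dot, thick line: size drives the point, linewidth drives the line
p_small_dot <- p + geom_pointrange(fill = 'blue', color = 'grey', shape = 21,
                                   size = 0.5, linewidth = 3)

# Big dot, thin line
p_big_dot <- p + geom_pointrange(fill = 'blue', color = 'grey', shape = 21,
                                 size = 3, linewidth = 0.5)

p_small_dot
p_big_dot
```

On older versions, where linewidth is not available, the fatten approach above remains the way to do it.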
I have a scatterplot in R. Each (x,y) point is colored according to its z value, so you can think of each point as (x,y,z), where (x,y) determines its position and z determines its color along a color gradient. I would like to add two things:
A legend on the right side showing the color gradient and what z values correspond to what colors
I would like to smooth all the color using some type of interpolation, I assume. In other words, the entire plotting region (or at least most of it) should become colored so that it looks like a huge heatmap instead of a scatterplot. So, in the example below, there would be lots of orange/yellow around and then some patches of purple throughout. I'm happy to further clarify what I'm trying to explain here, if need be.
Here is the code I have currently, and the image it makes.
x <- seq(1,150)
y <- runif(150)
z <- c(rnorm(mean=1,100),rnorm(mean=20,50))
colorFunction <- colorRamp(rainbow(100))
zScaled <- (z - min(z)) / (max(z) - min(z))
zMatrix <- colorFunction(zScaled)
zColors <- rgb(zMatrix[,1], zMatrix[,2], zMatrix[,3], maxColorValue=255)
df <- data.frame(x,y)
x <- densCols(x,y, colramp=colorRampPalette(c("black", "white")))
df$dens <- col2rgb(x)[1,] + 1L
plot(y~x, data=df[order(df$dens),],pch=20, col=zColors, cex=1)
Here are some solutions using the ggplot2 package.
# Load library
library(ggplot2)
# Recreate the scatterplot from the example with default colours
ggplot(df) +
geom_point(aes(x=x, y=y, col=dens))
# Recreate the scatterplot with a custom set of colours. I use rainbow(100)
ggplot(df) +
geom_point(aes(x=x, y=y, col=dens)) +
scale_color_gradientn(colours=rainbow(100))
# A 2d density plot, using default colours
ggplot(df) +
stat_density2d(aes(x=x, y=y, z=dens, fill = ..level..), geom="polygon") +
ylim(-0.2, 1.2) + xlim(-30, 180) # I had to twiddle with the ranges to get a nicer plot
# A better density plot, in my opinion. Tiles across your range of data
ggplot(df) +
stat_density2d(aes(x=x, y=y, z=dens, fill = ..density..), geom="tile",
contour = FALSE)
# Using custom colours. I use rainbow(100) again.
ggplot(df) +
stat_density2d(aes(x=x, y=y, z=dens, fill = ..density..), geom="tile",
contour = FALSE) +
scale_fill_gradientn(colours=rainbow(100))
# You can also plot the points on top, if you want
ggplot(df) +
stat_density2d(aes(x=x, y=y, z=dens, fill = ..density..), geom="tile",
contour = FALSE) +
geom_point(aes(x=x, y=y, col=dens)) +
scale_colour_continuous(guide="none") # This removes the extra legend
I attach the plots as well:
Also, using ggplot2, you can use color and size together, as in:
ggplot(df, aes(x=x, y=y, size=dens, color=dens)) + geom_point() +
scale_color_gradientn(name="Density", colours=rev(rainbow(100))) +
scale_size_continuous(range=c(1,15), guide="none")
which might make it a little clearer.
Notes:
The expression rev(rainbow(100)) reverses the rainbow color scale,
so that red goes with the larger values of dens.
Unfortunately, you cannot combine a continuous legend (color) and a
discrete legend (size), so you would normally get two legends. The
expression guide="none" hides the size legend.
Here's the plot: