I am trying to add a line to separate part of data in ggplot2. Following this thread:
Adding linear model abline to log-log plot in ggplot
I tried
d = data.frame(x = 100*rlnorm(100), y = 100*rlnorm(100))
ggplot(d, aes(x, y)) + geom_point() +
geom_abline(intercept = 100, slope = -1, col='red') +
scale_y_log10() + scale_x_log10()
but it did not plot the line. Note that the old plot approach got the line alright:
plot(d$x, d$y, log='xy')
abline(a = 100, b=-1, col='red', untf=TRUE)
This may not be the most elegant solution, but I usually define a separate data frame for predictions when I'm adding them to plots. I know that it's quicker in a lot of ways to add the model specification as part of the plot, but I really like the flexibility of having this as a separate object. Here's what I've got in mind in this case:
d = data.frame(x = 100*rlnorm(100), y = 100*rlnorm(100))
p = ggplot(d, aes(x,y)) + geom_point() + scale_x_log10() + scale_y_log10()
pred.func = function(x){
100 - x
}
new.dat = data.frame(x = seq(from = 5, to = 90))
new.dat$pred = pred.func(new.dat$x)
p + geom_line(aes(x = x, y = pred), data = new.dat, col = "red")
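A likely reason the original call shows nothing: on transformed scales, geom_abline() draws a straight line in the transformed (log10) coordinates, so intercept = 100 and slope = -1 are read as log10(y) = 100 - log10(x) rather than y = 100 - x. The base-R sketch below simply evaluates both readings at a few x values. The variable names are illustrative and not from the original post.

```r
# x positions spanning the range used for the prediction line above
x <- c(1, 10, 50, 90)

# How intercept = 100, slope = -1 is read on log10 scales:
# log10(y) = 100 + (-1) * log10(x), i.e. y = 10^100 / x
y_abline <- 10^(100 - log10(x))

# What abline(a = 100, b = -1, untf = TRUE) draws in base graphics:
# the untransformed line y = 100 - x
y_untf <- 100 - x

data.frame(x, y_abline, y_untf)
```

The y_abline values are astronomically far above the data, which would explain why no line appears in the plotting area, while y_untf matches what pred.func() returns.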
I currently have a plot and have used facet_zoom to focus on records between 0 and 10 in the x axis. The following code reproduces an example:
require(ggplot2)
require(ggforce)
require(dplyr)
x <- rnorm(10000, 50, 25)
y <- rexp(10000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10))
I want to change the breaks on the zoomed portion of the graph to be the equivalent of:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10)) +
scale_x_continuous(breaks = seq(0,10,2))
But this changes the breaks of the original plot as well. Is it possible to just change the breaks of the zoomed portion whilst leaving the original plot as default?
This works for your use case:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10)) +
scale_x_continuous(breaks = pretty)
From ?scale_x_continuous, breaks would accept the following (emphasis added):
One of:
NULL for no breaks
waiver() for the default breaks computed by the transformation object
A numeric vector of positions
A function that takes the limits as input and returns breaks as output
pretty() is one such function. It doesn't offer very fine control, but does allow you to have some leeway to specify breaks across different facets with very different scales.
For illustration, here are two examples with different desired number of breaks. See ?pretty for more details on the other arguments this function accepts.
p <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10))
cowplot::plot_grid(
p + scale_x_continuous(breaks = function(x) pretty(x, n = 3)),
p + scale_x_continuous(breaks = function(x) pretty(x, n = 10)),
labels = c("n = 3", "n = 10"),
nrow = 1
)
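To get a feel for what pretty() returns before plotting, it can be called directly on candidate limits. The limits below are assumptions standing in for the full panel and the zoomed panel; the real limits ggplot2 passes in will include some axis expansion.

```r
# Hypothetical panel limits: roughly the full data range and the zoomed range
full_limits <- c(-50, 150)
zoom_limits <- c(0, 10)

pretty(full_limits)          # breaks chosen for the wide panel
pretty(zoom_limits)          # 0 2 4 6 8 10
pretty(zoom_limits, n = 3)   # fewer, coarser breaks
pretty(zoom_limits, n = 10)  # more, finer breaks
```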
Of course, you can also define your own function to convert plot limits into the desired breaks (e.g. something like p + scale_x_continuous(breaks = function(x) seq(min(x), max(x), length.out = 5))), but I generally find that such functions require more tweaking to get right, and pretty() is often good enough.
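As a rough sketch of such a custom function, the helper below returns fixed breaks when the limits it receives are narrow (as they would be for the zoomed panel) and falls back to pretty() otherwise. The name zoom_breaks and the max_width cutoff are made up for illustration, and the approach assumes the break function is evaluated separately for each panel, as the pretty() examples above suggest.

```r
# Hypothetical helper: use fixed breaks when the panel limits are narrow
# (the zoomed panel), and fall back to pretty() otherwise.
zoom_breaks <- function(limits, max_width = 15) {
  if (diff(range(limits)) <= max_width) {
    seq(0, 10, by = 2)
  } else {
    pretty(limits)
  }
}

zoom_breaks(c(-0.5, 10.5))   # narrow limits: 0 2 4 6 8 10
zoom_breaks(c(-50, 150))     # wide limits: whatever pretty() picks
```

It could then be tried as p + scale_x_continuous(breaks = zoom_breaks), adjusting max_width to sit between the widths of the two panels.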
I am trying to generate a ternary plot using ggtern.
My data ranges from 0 - 1000 for x, y,and z variables. I wondered if it is possible to extend the axis length above 100 to represent my data.
@Nevrome is on the right path: your points will still be plotted as 'compositions', i.e. concentrations that sum to unity, but you can change the labels of the axes to indicate a range from 0 to 1000.
library(ggtern)
set.seed(1)
df = data.frame(x = runif(10)*1000,
y = runif(10)*1000,
z = runif(10)*1000)
breaks = seq(0,1,by=0.2)
ggtern(data = df, aes(x, y, z)) +
geom_point() +
limit_tern(breaks=breaks,labels=1000*breaks)
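To make the "compositions" point concrete, here is a small base-R sketch of the normalization: each row is divided by its own total, so the three parts sum to 1 regardless of the original magnitudes. This only illustrates the idea; it is not a claim about how ggtern implements it internally, and comp and comp_scaled are names introduced here.

```r
set.seed(1)
df <- data.frame(x = runif(10) * 1000,
                 y = runif(10) * 1000,
                 z = runif(10) * 1000)

# Convert each row to proportions of its own total
comp <- df / rowSums(df)

rowSums(comp)               # every row sums to 1
comp_scaled <- comp * 1000  # same positions, relabelled on a 0-1000 scale
```

This is why relabelling the axes changes only the tick labels: the position of a point depends on the proportions, not on the raw values.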
I think there is no direct solution to do this with ggtern. But an easy workaround could look like this:
library(ggtern)
df = data.frame(x = runif(50)*1000,
y = runif(50)*1000,
z = runif(50)*1000,
Group = as.factor(round(runif(50,1,2))))
ggtern() +
geom_point(data = df, aes(x/10, y/10, z/10, color = Group)) +
labs(x="X", y="Y", z="Z", title="Title") +
scale_T_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_L_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_R_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2))
Currently, the below code (part of a more comprehensive code) generates a line that ranges from the very left to the very right of the graph.
geom_abline(intercept=-8.3, slope=1/1.415, col = "black", size = 1,
lty="longdash", lwd=1) +
However, I would like the line to only range from x=1 to x=9; the limits of the x-axis are 1-9.
In ggplot2, is there a command to reduce a line that is derived from a manually defined intercept and slope to only cover the range of the x-axis value limits?
You could use geom_segment instead of geom_abline if you want to manually define the line. If your slope is derived from the dataset you are plotting from, the easiest thing to do is use stat_smooth with method = "lm".
Here is an example with some toy data:
set.seed(16)
x = runif(100, 1, 9)
y = -8.3 + (1/1.415)*x + rnorm(100)
dat = data.frame(x, y)
Estimate intercept and slope:
coef(lm(y~x))
(Intercept) x
-8.3218990 0.7036189
First make the plot with geom_abline for comparison:
ggplot(dat, aes(x, y)) +
geom_point() +
geom_abline(intercept = -8.32, slope = 0.704) +
xlim(1, 9)
With geom_segment you have to define the start and end of the line for both x and y yourself. Compute the y values from the intercept and slope at x = 1 and x = 9 so that the line is truncated to that range on the x axis.
ggplot(dat, aes(x, y)) +
geom_point() +
geom_segment(aes(x = 1, xend = 9, y = -8.32 + .704, yend = -8.32 + .704*9)) +
xlim(1, 9)
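The endpoint arithmetic can be wrapped in a small helper so that the intercept, slope and x range are each written only once. This is a minimal sketch; segment_ends is a hypothetical name, not a ggplot2 function.

```r
# Hypothetical helper: endpoints of y = intercept + slope * x on [x_from, x_to]
segment_ends <- function(intercept, slope, x_from, x_to) {
  data.frame(
    x    = x_from,
    xend = x_to,
    y    = intercept + slope * x_from,
    yend = intercept + slope * x_to
  )
}

ends <- segment_ends(intercept = -8.32, slope = 0.704, x_from = 1, x_to = 9)
ends
```

The one-row data frame can then be passed as geom_segment(data = ends, aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE), which draws the segment once instead of once per row of dat.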
Using stat_smooth. This will draw the line only within the range of the explanatory variable by default.
ggplot(dat, aes(x, y)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE, color = "black") +
xlim(1, 9)
I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you plot(lm(y~x)), using the base package, the second graph that pops up automatically is Residuals vs Fitted, the third is qqplot, and the fourth is Scale-location. Each of these automatically label your extreme Y values by listing their corresponding X value as an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Updated scale_size_area() in place of scale_area()
You might be able to take something from this to suit your needs.
library(ggplot2)
#Some data
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified = fortify(m1)
names(df.fortified) # Names of the variables containing residuals and derived quantities
# Select extreme values
df.fortified$extreme = ifelse(abs(df.fortified$`.stdresid`) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
plot = ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
geom_point() +
geom_text(data = df.fortified[df.fortified$extreme == 1, ],
aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
plot
plot1 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
geom_point() + geom_smooth(se = FALSE)
plot2 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid, size = .cooksd)) +
geom_point() + scale_size_area("Cook's distance") + geom_smooth(se = FALSE, show.legend = FALSE)
library(gridExtra)
grid.arrange(plot1, plot2)
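The selection step can also be sketched without fortify(), using base R's rstandard(), on the assumption that the .stdresid column corresponds to these standardized residuals. The seed and the names std_res, is_extreme and extremes are illustrative.

```r
set.seed(42)
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)

# Standardized residuals from base R
std_res <- rstandard(m1)

# Flag observations beyond the same 1.5 cutoff used above
is_extreme <- abs(std_res) > 1.5

# Labelled subset: the x value to print next to each flagged point
extremes <- data.frame(x = df$x[is_extreme], stdresid = std_res[is_extreme])
extremes
```

A data frame like extremes is what would be handed to the data argument of geom_text(), with label = x.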
I am trying to plot two variables where N=700K. The problem is that there is too much overlap, so that the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in an region? In other words, instead of showing individual points, I want the plot to be a "cloud", with the more the number of points in a region, the darker that region.
One way to deal with this is alpha blending, which makes each point slightly transparent, so regions with more points plotted on them appear darker.
This is easy to do in ggplot2:
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)
Another convenient way to deal with this (and probably a more appropriate one for the number of points you have) is hexagonal binning:
ggplot(df,aes(x=x,y=y)) + stat_binhex()
And there is also regular old rectangular binning (image omitted), which is more like your traditional heatmap:
ggplot(df,aes(x=x,y=y)) + geom_bin2d()
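The idea behind rectangular binning can be sketched in base R: cut each axis into intervals and count how many points fall into each cell. The bin count of 30 is an arbitrary choice for illustration, and this is only the counting step, not a description of how geom_bin2d() is implemented.

```r
set.seed(1)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))

# Split each axis into 30 equal-width intervals
n_bins <- 30
x_bin <- cut(df$x, breaks = n_bins)
y_bin <- cut(df$y, breaks = n_bins)

# Count points in every (x, y) cell; empty cells get 0
counts <- table(x_bin, y_bin)

dim(counts)   # 30 x 30 grid
sum(counts)   # every point lands in exactly one cell
max(counts)   # the densest cell
```

Mapping these counts to a fill colour is what turns a solid block of overplotted points into a readable "cloud".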
An overview of several good options in ggplot2:
library(ggplot2)
x <- rnorm(n = 10000)
y <- rnorm(n = 10000, sd=2) + x
df <- data.frame(x, y)
Option A: transparent points
o1 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05)
Option B: add density contours
o2 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05) +
geom_density_2d()
Option C: add filled density contours
(Note that the points distort the perception of the colors underneath, may be better without points.)
o3 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = stat(level)), geom = 'polygon') +
scale_fill_viridis_c(name = "density") +
geom_point(shape = '.')
Option D: density heatmap
(Same note as C.)
o4 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = stat(density)), geom = 'raster', contour = FALSE) +
scale_fill_viridis_c() +
coord_cartesian(expand = FALSE) +
geom_point(shape = '.', col = 'white')
Option E: hexbins
(Same note as C.)
o5 <- ggplot(df, aes(x, y)) +
geom_hex() +
scale_fill_viridis_c() +
geom_point(shape = '.', col = 'white')
Option F: rugs
Possibly my favorite option. Not quite as flashy, but visually simple and simple to understand. Very effective in many cases.
o6 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.1) +
geom_rug(alpha = 0.01)
Combine in one figure:
cowplot::plot_grid(
o1, o2, o3, o4, o5, o6,
ncol = 2, labels = 'AUTO', align = 'v', axis = 'lr'
)
You can also have a look at the ggsubplot package. This package implements features which were presented by Hadley Wickham back in 2011 (http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html).
(In the following, I include the "points"-layer for illustration purposes.)
library(ggplot2)
library(ggsubplot)
# Make up some data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=5000),
xvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)),
yvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)))
# Scatterplot with subplots (simple)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(rep("dummy", length(xvar)), ..count..))), bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
However, this feature really shines if you have a third variable to control for.
# Scatterplot with subplots (including a third variable)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1, aes(color = factor(cond))) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(cond, ..count.., fill = cond))),
bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
Or another approach would be to use smoothScatter():
smoothScatter(dat[2:3])
Alpha blending is easy to do with base graphics as well.
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
with(df, plot(x, y, col="#00000033"))
The first six hex digits after the # are the color in RGB and the last two are the opacity, again in hex, so 33 (51 out of 255) is about 20% opaque.
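If typing hex by hand feels error-prone, the alpha channel can be decoded or generated with base R. A short sketch:

```r
# Decode the alpha channel of "#00000033"
alpha_hex <- substr("#00000033", 8, 9)        # "33"
alpha_int <- strtoi(alpha_hex, base = 16L)    # 51
alpha_int / 255                               # 0.2

# Build the same colour from a fractional alpha instead of typing hex
rgb(0, 0, 0, alpha = 0.2)                     # "#00000033"
adjustcolor("black", alpha.f = 0.2)           # "#00000033"
```

Either result can be passed to col in plot(), which makes it easier to experiment with different opacities.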
You can also use density contour lines (ggplot2):
df <- data.frame(x = rnorm(15000),y=rnorm(15000))
ggplot(df,aes(x=x,y=y)) + geom_point() + geom_density2d()
Or combine density contours with alpha blending:
ggplot(df,aes(x=x,y=y)) +
geom_point(colour="blue", alpha=0.2) +
geom_density2d(colour="black")
You may find useful the hexbin package. From the help page of hexbinplot:
library(hexbin)
mixdata <- data.frame(x = c(rnorm(5000),rnorm(5000,4,1.5)),
y = c(rnorm(5000),rnorm(5000,2,3)),
a = gl(2, 5000))
hexbinplot(y ~ x | a, mixdata)
geom_pointdensity from the ggpointdensity package (recently developed by Lukas Kremer and Simon Anders (2019)) allows you to visualize density and individual data points at the same time:
library(ggplot2)
# install.packages("ggpointdensity")
library(ggpointdensity)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x=x, y=y)) + geom_pointdensity() + scale_color_viridis_c()
My favorite method for plotting this type of data is the one described in this question - a scatter-density plot. The idea is to do a scatter-plot but to colour the points by their density (roughly speaking, the amount of overlap in that area).
It simultaneously:
clearly shows the location of outliers, and
reveals any structure in the dense area of the plot.
Here is the result from the top answer to the linked question: