Size of points in ggplot2 comparable across plots?

I am using ggplot2 to produce various plots in which the size of a point is proportional to the number of cases that have the same values of x and y. Is there a way to make the size of the points comparable across different plots that have different values of size?
Example using fake data:
df1 = data.frame(x = seq(1:10),
                 y = c(4, 3.8, 3.8, 3.2, 3.1, 2.5, 2, 1.5, 1.2, 1.3),
                 size = c(1, 20, 1, 70, 100, 70, 1, 1, 110, 1))
library(ggplot2)
pdf("plot.1.pdf")
ggplot(df1, aes(x = x, y = y, size = size)) + geom_point()
dev.off()
df2 = data.frame(x = seq(1:10),
                 y = c(4, 3.8, 3.8, 3.2, 3.1, 2.5, 2, 1.5, 1.2, 1.3),
                 size = rep(1, 10))
pdf("plot.2.pdf")
ggplot(df2, aes(x = x, y = y, size = size)) + geom_point()
dev.off()
The points in Plot 2, which all have size equal to 1, are much larger than the points in Plot 1 for which size equals 1. I need a version of the plots in which points with the same value of size are drawn the same size across different plots. Thank you,
Sofia

One possibility is to use scale_size_identity(), which interprets the size values directly as point sizes, so in both plots points with value 1 will be drawn the same size. However, this makes the points far too large when the size values are big (as in your case). To deal with that, you can apply a transformation inside the scale, for example a square root, with the argument trans = "sqrt".
ggplot(df1, aes(x = x, y = y, size = size)) +
  geom_point() +
  scale_size_identity(trans = "sqrt", guide = "legend")

ggplot(df2, aes(x = x, y = y, size = size)) +
  geom_point() +
  scale_size_identity(trans = "sqrt", guide = "legend")
UPDATE
As pointed out by @hadley, the easiest way to achieve this is to set limits = inside scale_size_continuous() to the same values in both plots, which yields identical size mappings.
ggplot(df1, aes(x = x, y = y, size = size)) +
  geom_point() +
  scale_size_continuous(limits = c(1, 110))

ggplot(df2, aes(x = x, y = y, size = size)) +
  geom_point() +
  scale_size_continuous(limits = c(1, 110))
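A small convenience on top of this (my addition, not part of the original answers): the shared scale can be built once from the combined data and added to every plot, so the limits can never drift apart.
# shared size scale; limits taken from the combined data of both data frames
shared_size <- scale_size_continuous(limits = range(c(df1$size, df2$size)))
ggplot(df1, aes(x = x, y = y, size = size)) + geom_point() + shared_size
ggplot(df2, aes(x = x, y = y, size = size)) + geom_point() + shared_size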

Related

R: How to fill geom_hex with a numerical value and heat scale it?

I was wondering how I can scale geom_hex not by count, but rather by another variable, and apply a heat-style colour scale to it. I am also having an overfitting problem in my actual model and was wondering how to eliminate that. Here's an example:
ggplot(data = diamonds) +
  geom_hex(mapping = aes(x = x, y = price, fill = depth, bins = 25)) +
  scale_fill_continuous(type = "viridis")
Thanks!
I think this will do the trick, assuming you want to colour the hexagons according to the mean of depth...
ggplot(diamonds, aes(x = x, y = price, z = depth)) +
stat_summary_hex(fun = mean, bins = 25) +
scale_fill_continuous(type = "viridis")
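A small variation on this answer (my addition): stat_summary_hex() accepts any summary function, and labs(fill = ...) renames the default legend title, so for example you could summarise with the median instead:
# same approach, summarising with the median and renaming the fill legend
ggplot(diamonds, aes(x = x, y = price, z = depth)) +
  stat_summary_hex(fun = median, bins = 25) +
  scale_fill_continuous(type = "viridis") +
  labs(fill = "median depth")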

ggplot2 plots more points than asked

I am trying to fill a square region with non-overlapping squares of different colours, but ggplot2 is plotting more points than there are in the data frame at the upper x and y limits. Here is the code:
l = 1000
a=seq(0,1, 1/(l-1))
x=rep(a, each=length(a))
y=rep(a, length(a))
k = length(x)
c=sample(1:10, k, replace = TRUE)
data <- data.frame(x, y, c)
ggplot(data, aes(x=x, y=y)) + geom_point(shape=15, color=c)
ggsave('k.jpg', width=10, height=10)
The result I am getting with RStudio is this. Notice the extra points on the right and top of the image.
How can I get ggplot to plot exactly one square exclusively for those points in the dataframe and not more?
As a second related question, this is what happens if l is changed from 1000 to l=100
My problem now is that the squares are not perfectly stacked, leaving empty space between them. I would like to know how I can compute, from the number of points in each dimension of the grid (l), the correct value of size inside geom_point so that the squares tile the region without gaps.
Many thanks
You might be better off with geom_tile, rather than geom_point, as this will allow more control over the size of the rectangles and the border width. See ?geom_tile for details.
Here are a couple of alternatives using the OP's example, with the data frame dimensions reduced to increase the size of the tiles:
Data
library(ggplot2)
l = 100
a = seq(0, 1, 1 / (l - 1))
x = rep(a, each = length(a))
y = rep(a, length(a))
k = length(x)
c = sample(1:10, k, replace = TRUE)
data <- data.frame(x, y, c)
Example 1
Very simple: just passing "white" as the colour to make the tiles more distinct.
ggplot(data, aes(x = x, y = y, fill = c)) + geom_tile(colour = "white")
Example 2
Manually creating a palette, and using coord_equal() to force a specified aspect ratio (default 1) so the tiles are squares:
colors <- c("peachpuff", "yellow", "orange", "orangered", "red",
            "darkred", "firebrick", "royalblue", "darkslategrey", "black")
ggplot(data, aes(x = x, y = y)) +
  geom_tile(aes(fill = factor(c)), colour = "white") +
  scale_fill_manual(values = colors, name = "Colours") +
  coord_equal()
Comparing geom_point and geom_tile
Creating a small data frame (10 x 10, l = 10) to observe more closely what happens when using geom_point instead of geom_tile.
Original OP code
ggplot(data, aes(x = x, y = y)) + geom_point(shape = 15, color = c)
Example 1
ggplot(data, aes(x = x, y = y, fill = c)) + geom_tile(colour = "white")
Example 2
colors <- c("peachpuff", "yellow", "orange", "orangered", "red",
            "darkred", "firebrick", "royalblue", "darkslategrey", "black")
ggplot(data, aes(x = x, y = y)) +
  geom_tile(aes(fill = factor(c)), colour = "white") +
  scale_fill_manual(values = colors, name = "Colours") +
  coord_equal()
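To address the OP's follow-up directly (this sketch is mine, not part of the answer above): the grid spacing is 1 / (l - 1), and geom_tile() accepts explicit width and height, so the tile size can be computed from l rather than tuning a point size by eye:
step <- 1 / (l - 1)  # spacing between neighbouring grid points
ggplot(data, aes(x = x, y = y, fill = factor(c))) +
  geom_tile(width = step, height = step) +
  coord_equal()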

ggplot2: how to add sample numbers to density plot?

I am trying to generate a (grouped) density plot labelled with sample sizes.
Sample data:
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))
The unlabelled density plot is generated and looks as follows:
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.
I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013
n_fun <- function(x) {
  return(data.frame(y = max(x), label = paste0("n = ", length(x))))
}
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4) +
stat_summary(geom = "text", fun.data = n_fun)
However, this fails with Error: stat_summary requires the following missing aesthetics: y.
I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.
I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.
Where am I going wrong?
The y in the data frame returned by fun.data is not the y aesthetic. stat_summary complains that it cannot find y, which would have to be supplied either globally, as in ggplot(df, aes(x = val, group = ab.class, y = ...)), or locally via stat_summary(aes(y = ...)) if no global y is set. fun.data only computes where to place the point/text at each x, based on the y values passed in through aes().
Even if you do specify y through aes(), you won't get the desired result, because stat_summary computes one summary y at each x.
However, you can add text to desired positions by geom_text or annotate:
# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
# build the data displayed on the plot.
p.data <- ggplot_build(p)$data[[1]]
# Note that column 'scaled' is used for plotting
# so we extract the max density row for each group
p.text <- lapply(split(p.data, f = p.data$group), function(df) {
  df[which.max(df$scaled), ]
})
p.text <- do.call(rbind, p.text) # we can also get p.text with dplyr.
# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
label = sprintf('n = %d', p.text$n), vjust = 0)
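If you also want the labels coloured to match the densities (the generalisation asked for in the question), one option (my sketch, building on the answer above) is to reuse the fill colours that ggplot_build() already computed:
# p.text$fill holds the hex colours ggplot assigned to each group,
# so the labels can be drawn in matching colours
p + annotate('text', x = p.text$x, y = p.text$y,
             label = sprintf('n = %d', p.text$n),
             colour = p.text$fill, vjust = 0)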

Automated way to prevent ggplot hexbin from cutting geoms off axes

This is a slightly different question from an earlier post (ggplot hexbin shows different number of hexagons in plot versus data frame).
I am using hexbin() to bin data into hexagon objects, and ggplot() to plot the results. I notice that, sometimes, the hexagons on the edge of the plot are cut in half. Below is an example.
library(hexbin)
library(ggplot2)
set.seed(1)
data <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100), D=rnorm(100), E=rnorm(100))
maxVal = max(abs(data))
maxRange = c(-1*maxVal, maxVal)
x = data[,c("A")]
y = data[,c("E")]
h <- hexbin(x=x, y=y, xbins=5, shape=1, IDs=TRUE, xbnds=maxRange, ybnds=maxRange)
hexdf <- data.frame(hcell2xy(h), hexID = h@cell, counts = h@count)
ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
  geom_hex(stat = "identity") +
  coord_cartesian(xlim = c(maxRange[1], maxRange[2]), ylim = c(maxRange[1], maxRange[2]))
This creates a graphic where one hexagon is cut off at the top and one hexagon is cut off at the bottom:
Another approach I can try is to hard-code a multiplier (here 1.5) applied to the limits of the x and y axes. Doing so does seem to solve the problem, in that no hexagons are cut off anymore.
ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
geom_hex(stat = "identity") +
scale_x_continuous(limits = maxRange * 1.5) +
scale_y_continuous(limits = maxRange * 1.5)
However, even though the second approach solves the problem in this instance, the value of 1.5 is arbitrary. I am trying to automate this process for a variety of data and variety of bin sizes and hexagon sizes that could be used. Is there a solution to keeping all hexagons fully visible in the plot without having to hard-code an arbitrary value that may be too large or too small for certain instances?
Consider that you can skip the computation of hexbin, and let ggplot do the job.
Then, if you prefer to manually set the width of the bins you can set the binwidth and modify the limits:
bwd = 1
ggplot(data, aes(x = x, y = y)) +
geom_hex(binwidth = bwd) +
coord_cartesian(xlim = c(min(x) - bwd, max(x) + bwd),
ylim = c(min(y) - bwd, max(y) + bwd),
expand = T) +
geom_point(color = "red") +
theme_bw()
This way, hexagons should never be truncated (though you may end up with some "empty" space).
Result with bwd = 1:
Result with bwd = 3:
If instead you prefer to set the number of bins programmatically, you can use:
nbins_x <- 4
nbins_y <- 6
range_x <- range(data$A, na.rm = T)
range_y <- range(data$E, na.rm = T)
bwd_x <- (range_x[2] - range_x[1])/nbins_x
bwd_y <- (range_y[2] - range_y[1])/nbins_y
ggplot(data, aes(x = A, y = E)) +
geom_hex(bins = c(nbins_x,nbins_y)) +
coord_cartesian(xlim = c(range_x[1] - bwd_x, range_x[2] + bwd_x),
ylim = c(range_y[1] - bwd_y, range_y[2] + bwd_y),
expand = T) +
geom_point(color = "red")+
theme_bw()
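Since the question asks for something that works across many data sets and bin counts, the second approach can be wrapped in a small helper (the function name and arguments below are illustrative, my addition rather than part of the original answer):
# reusable wrapper around the padding logic above
hex_plot <- function(df, xvar, yvar, nbins_x = 4, nbins_y = 6) {
  range_x <- range(df[[xvar]], na.rm = TRUE)
  range_y <- range(df[[yvar]], na.rm = TRUE)
  bwd_x <- diff(range_x) / nbins_x
  bwd_y <- diff(range_y) / nbins_y
  ggplot(df, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_hex(bins = c(nbins_x, nbins_y)) +
    coord_cartesian(xlim = range_x + c(-bwd_x, bwd_x),
                    ylim = range_y + c(-bwd_y, bwd_y)) +
    theme_bw()
}

hex_plot(data, "A", "E")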

Adjusting axis in ggtern ternary plots

I am trying to generate a ternary plot using ggtern.
My data ranges from 0 - 1000 for x, y,and z variables. I wondered if it is possible to extend the axis length above 100 to represent my data.
@Nevrome is on the right path; your points will still be plotted as 'compositions', i.e. concentrations summing to unity, but you can change the labels of the axes to indicate a range from 0 to 1000.
library(ggtern)
set.seed(1)
df = data.frame(x = runif(10) * 1000,
                y = runif(10) * 1000,
                z = runif(10) * 1000)
breaks = seq(0, 1, by = 0.2)
ggtern(data = df, aes(x, y, z)) +
  geom_point() +
  limit_tern(breaks = breaks, labels = 1000 * breaks)
I think there is no direct way to do this with ggtern, but an easy workaround could look like this:
library(ggtern)
df = data.frame(x = runif(50) * 1000,
                y = runif(50) * 1000,
                z = runif(50) * 1000,
                Group = as.factor(round(runif(50, 1, 2))))
ggtern() +
geom_point(data = df, aes(x/10, y/10, z/10, color = Group)) +
labs(x="X", y="Y", z="Z", title="Title") +
scale_T_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_L_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_R_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2))
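The three scale calls in this workaround are identical, so (a minor refactor of mine, nothing new conceptually) they can be built once and added as a list, which ggplot2's + operator accepts and which should also work for ggtern plots since they build on ggplot:
tern_breaks <- seq(0, 1, 0.2)
tern_scales <- list(
  scale_T_continuous(breaks = tern_breaks, labels = 1000 * tern_breaks),
  scale_L_continuous(breaks = tern_breaks, labels = 1000 * tern_breaks),
  scale_R_continuous(breaks = tern_breaks, labels = 1000 * tern_breaks)
)
ggtern() +
  geom_point(data = df, aes(x/10, y/10, z/10, color = Group)) +
  labs(x = "X", y = "Y", z = "Z", title = "Title") +
  tern_scales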
