The line width (size) aesthetic in ggplot2 seems to produce lines roughly 2.13 times wider (in pt) than the size value when printing to a PDF (measured in Adobe Illustrator on a Mac):
library(ggplot2)
dt <- data.frame(id = rep(letters[1:5], each = 3), x = rep(seq(1:3), 5), y = rep(seq(1:5), each = 3), s = rep(c(0.05, 0.1, 0.5, 1, 72.27/96*0.5), each = 3))
lns <- split(dt, dt$id)
ggplot() + geom_line(data = lns[[1]], aes(x = x, y = y), size = unique(lns[[1]]$s)) +
geom_text(data = lns[[1]], y = unique(lns[[1]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[1]]$s))) +
geom_line(data = lns[[2]], aes(x = x, y = y), size = unique(lns[[2]]$s)) +
geom_text(data = lns[[2]], y = unique(lns[[2]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[2]]$s))) +
geom_line(data = lns[[3]], aes(x = x, y = y), size = unique(lns[[3]]$s)) +
geom_text(data = lns[[3]], y = unique(lns[[3]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[3]]$s))) +
geom_line(data = lns[[4]], aes(x = x, y = y), size = unique(lns[[4]]$s)) +
geom_text(data = lns[[4]], y = unique(lns[[4]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[4]]$s))) +
geom_line(data = lns[[5]], aes(x = x, y = y), size = unique(lns[[5]]$s)) +
geom_text(data = lns[[5]], y = unique(lns[[5]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[5]]$s))) +
xlim(1,4) + theme_void()
ggsave("linetest.pdf", width = 8, height = 2)
# Device size does not affect line width:
ggsave("linetest2.pdf", width = 10, height = 6)
I read that one should multiply the line width by 72.27/96 to get a line width in pt, but the experiment above gives me a line width of 0.8 pt, when I try to get 0.5 pt.
As @Pascal points out, the line width does not seem to follow the pt-to-mm conversion that works for fonts, which @hadley gave in one of the comments. That is, the line width does not appear to be defined by "the magic number" 1/0.352777778.
What is the equation behind line width for ggplot2?
You had all the pieces in your post already. First, ggplot2 multiplies the size setting by ggplot2::.pt, which is defined as 72.27/25.4 = 2.845276 (line 165 in geom-.r):
> ggplot2::.pt
[1] 2.845276
Then, as you state, you need to multiply the resulting value by 72.27/96 to convert from R pixels to points. Thus the conversion factor is:
> ggplot2::.pt*72.27/96
[1] 2.141959
As you can see, ggplot2 size = 1 corresponds to approximately 2.14pt, and similarly 0.8 pt corresponds to 0.8/2.141959 = 0.3734899 in ggplot2 size units.
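The conversion above can be wrapped in a small helper so you can ask for a stroke width in points directly. This is a sketch that only encodes the factor derived in this answer (.pt times 72.27/96); pt_to_size() and size_to_pt are hypothetical names, not part of ggplot2.

```r
library(ggplot2)

# Factor derived above: ggplot2 multiplies size by .pt (72.27 / 25.4),
# and the resulting lwd value is converted to points with 72.27 / 96
size_to_pt <- ggplot2::.pt * 72.27 / 96

# Hypothetical helper: ggplot2 size needed for a stroke of `pt` points
pt_to_size <- function(pt) pt / size_to_pt

pt_to_size(0.5)  # size to pass to geom_line() for a 0.5 pt stroke
```

For example, geom_line(size = pt_to_size(0.5)); in ggplot2 3.4.0 and later the same value goes into linewidth instead of size for lines.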
Objective:
Create the XY scatterplot of variables (xx,yy). Color the corresponding Cartesian quadrants according to a third variable's (return) median.
I've created the color vector using colorRampPalette. The issue is that it is being read as continuous (though the vector is discrete).
Have the scatter points be blue (not labeled "blue")
Include a label on each quadrant according to dt.data[, quadrant] so that it is easy to identify what each area corresponds to, i.e. mark A on the top right, B on the bottom right, etc.
This is the code I've written.
library(data.table)
library(ggplot2)
set.seed(42)
dt <- data.table(
xx = rnorm(40, 0, 2),
yy = rnorm(40, 0, 2),
return = rnorm(40, 1, 3))
## compute the range we're going to want to plot over
## in this case 50% more than the max value
RANGE <- 1.5 * dt[, max(abs(c(xx, yy)))]
## compute the medians per quadrant
dtMedians <- dt[,
.(med = median(return)),
.(sign_x = sign(xx), sign_y = sign(yy))]
## set up some fake labels
dtMedians[, quadrant := letters[1:4]]
## compute a color scale for the medians and assign it
fcol <- colorRampPalette(c("#FC4445", "#3FEEE6", "#5CDB95"))
dtMedians[, col := fcol(4)[rank(med)]]
Mycol <- dtMedians[, .(col)]
dt.rects2<- data.table(
quadrant = letters[1:4],
xmin= c(0,0,-RANGE, -RANGE),
xmax= c(RANGE,RANGE,0,0),
ymin= c(0,-RANGE,-RANGE,0),
ymax= c(RANGE,0,0,RANGE))
dt.data <- merge(dtMedians, dt.rects2, by ="quadrant")
gg<- ggplot() +
geom_rect(data = dt.data,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = med ))
gg+
scale_fill_manual(values = Mycol ) +
labs(x="xx", y="yy", title='US. Growth Quadrant') +
geom_point(data = dt,
aes(x = xx,
y = yy,
color = 'blue'))
While I think the code could be much cleaner, I left it unchanged to the extent possible - there were a few mistakes (e.g., with the variables x and y) that I had to correct to be able to run the code. Now as to your questions:
You can tell R to treat a variable as a factor with fill = as.factor(med). In addition, I had to adjust scale_fill_manual(values = Mycol$col) to select the colors stored in column col of Mycol.
To make the scatters blue, I took the color = 'blue' outside of the aes() in the geom_point().
I used annotate() to label the corners of the plot, which relies on manually defining the x and y coordinates. I am sure there are other, potentially better (and automated) solutions out there.
Full code for the plot (taking your data):
ggplot() +
geom_rect(data = dt.data,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = as.factor(med))) +
scale_fill_manual(values = Mycol$col) +
labs(x="xx", y="yy", title='US. Growth Quadrant') +
geom_point(data = dt,
aes(x = xx,
y = yy),
color = 'blue') +
annotate(geom = 'text', label = 'A', x = 5, y = 5, size = 8) +
annotate(geom = 'text', label = 'B', x = 5, y = -5, size = 8) +
annotate(geom = 'text', label = 'C', x = -5, y = -5, size = 8) +
annotate(geom = 'text', label = 'D', x = -5, y = 5, size = 8)
Output:
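The manual coordinates in annotate() can be avoided by computing the label positions from the rectangles themselves. A sketch, assuming the question's data and data.table: it keys each median to its rectangle by the signs of xx and yy (the letters assigned with letters[1:4] follow the order in which the groups first appear, so they need not match dt.rects2), and it passes the hex codes through scale_fill_identity() so each colour stays tied to the rank of its median. The names rects, quad, lab_x and lab_y are hypothetical, introduced only for this example.

```r
library(data.table)
library(ggplot2)

set.seed(42)
dt <- data.table(xx = rnorm(40, 0, 2), yy = rnorm(40, 0, 2), return = rnorm(40, 1, 3))
RANGE <- 1.5 * dt[, max(abs(c(xx, yy)))]

# Median per quadrant, keyed by the signs of the coordinates
dtMedians <- dt[, .(med = median(return)), by = .(sign_x = sign(xx), sign_y = sign(yy))]

# One rectangle per quadrant: A top right, B bottom right, C bottom left, D top left
rects <- data.table(sign_x = c(1, 1, -1, -1), sign_y = c(1, -1, -1, 1),
                    quadrant = c("A", "B", "C", "D"))
rects[, `:=`(xmin = pmin(0, sign_x * RANGE), xmax = pmax(0, sign_x * RANGE),
             ymin = pmin(0, sign_y * RANGE), ymax = pmax(0, sign_y * RANGE))]

# Join on the signs so each median lands on its own rectangle
quad <- merge(rects, dtMedians, by = c("sign_x", "sign_y"))

# Colour by the rank of the median; the hex codes are used directly as fills
fcol <- colorRampPalette(c("#FC4445", "#3FEEE6", "#5CDB95"))
quad[, col := fcol(4)[rank(med, ties.method = "first")]]

# Label positions: the centre of each rectangle
quad[, `:=`(lab_x = (xmin + xmax) / 2, lab_y = (ymin + ymax) / 2)]

p <- ggplot() +
  geom_rect(data = quad, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = col)) +
  scale_fill_identity() +
  geom_point(data = dt, aes(x = xx, y = yy), color = "blue") +
  geom_text(data = quad, aes(x = lab_x, y = lab_y, label = quadrant), size = 8) +
  labs(x = "xx", y = "yy", title = "US. Growth Quadrant")
```

scale_fill_identity() hides the legend by default; pass guide = "legend" to it if you want one.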
I am creating a lot of charts programmatically in R using ggplot2 and have everything working perfectly except the position of bar labels.
The label positions need to depend on the plot height, the y-axis scale and the text size.
Example (stripped down) plot code:
testInput <- data.frame("xAxis" = c("first", "second", "third"), "yAxis" = c(20, 200, 60))
# Changeable variables
yMax <- 220
plotHeight <- 5
textSize <- 4
yInterval <- 20 # not defined in the original code; assumed break spacing for scale_y_continuous()
# Set up labels
geomTextList <- {
textHeightRatio <- textSize / plotHeight
maxHeightRatio <- yMax / plotHeight
values <- testInput[["yAxis"]]
### THIS IS THE FORMULA NEEDING UPDATING
testInput[["labelPositions"]] <- values + 5 # Should instead be a formula, e.g. (x * plotHeight) + (y * textSize) + (z * yMax)?
list(
ggplot2::geom_text(data = testInput, ggplot2::aes_string(x = "xAxis", y = "labelPositions", label = "yAxis"), hjust = 0.5, size = textSize)
)
}
# Create plot
outputPlot <- ggplot2::ggplot(testInput) +
ggplot2::geom_bar(data = testInput, ggplot2::aes_string(x = "xAxis", y = "yAxis"), stat = "identity", position = "dodge", width = 0.5) +
geomTextList +
ggplot2::scale_y_continuous(breaks = seq(0, yMax, yInterval), limits = c(0, yMax))
ggplot2::ggsave(filename = "test.png", plot = outputPlot, width = 4, height = plotHeight, device = "png")
I have tried various combinations of coefficients for the formula, but suspect that at least one of the factors isn't linear. If this is purely a statistical problem I could take it to Cross Validated, but I wondered whether anyone had already solved this.
If your problem is offsetting the text so that it does not overlap the bar while dealing with varying text sizes, just use vjust, which is already proportional to the text size. A value of 0 makes the bottom of the text touch the bar, and a small negative value leaves some space between them:
testInput <- data.frame("xAxis" = c("first", "second", "third"), "yAxis" = c(20, 200, 60))
# Changeable variables
yMax <- 220
plotHeight <- 5
textSize <- 4
# Set up labels
geomTextList <- {
values <- testInput[["yAxis"]]
testInput[["labelPositions"]] <- values # Use the exact value
list(
ggplot2::geom_text(
data = testInput,
# vjust provides proportional offset
ggplot2::aes_string(x = "xAxis", y = "labelPositions", label = "yAxis"),
hjust = 0.5, vjust = -0.15, size = textSize
)
)
}
# Create plot
outputPlot <- ggplot2::ggplot(testInput) +
ggplot2::geom_bar(data = testInput, ggplot2::aes_string(x = "xAxis", y = "yAxis"), stat = "identity", position = "dodge", width = 0.5) +
geomTextList +
ggplot2::scale_y_continuous(limits = c(0, yMax))
ggplot2::ggsave(filename = "test.png", plot = outputPlot, width = 4, height = plotHeight, device = "png")
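For reference, the size of geom_text() is given in mm and converted to points with the same ggplot2::.pt constant that appears in the line width question (font size = size * .pt). The sketch below only does that arithmetic; treating the label's height as roughly its font size is an approximation, so gap_pt is an estimate rather than a measurement.

```r
library(ggplot2)

textSize <- 4
# geom_text() sizes are in mm; multiplying by .pt gives the font size in points
fontsize_pt <- textSize * ggplot2::.pt  # roughly an 11.4 pt font

# vjust is a fraction of the label's own height, so the gap from vjust = -0.15
# is about 0.15 label heights (approximated here by the font size)
gap_pt <- 0.15 * fontsize_pt
```

Because gap_pt is proportional to textSize, the space above the bars grows and shrinks with the text without any change to the label positions themselves.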
In the example below I have a simple plot of mean values with standard deviation error bars on both the X and Y axes. I would like to control the error bar width so that the bars on both axes always plot at the same size.
Ideally I would like the bar width/height to match the size of the symbols (i.e. cex = 3 in this case), regardless of the final plot dimensions. Is there a way to do this?
# Load required packages:
library(ggplot2)
library(plyr)
# Create dataset:
DF <- data.frame(
group = rep(c("a", "b", "c", "d"),each=10),
Ydata = c(seq(1,10,1),seq(5,50,5),seq(20,11,-1),seq(0.3,3,0.3)),
Xdata = c(seq(1,10,1),seq(5,50,5),seq(20,11,-1),seq(0.3,3,0.3)))
# Summarise data:
subDF <- ddply(DF, .(group), summarise,
X = mean(Xdata), Y = mean(Ydata),
X_sd = sd(Xdata, na.rm = T), Y_sd = sd(Ydata))
# Plot data with error bars:
ggplot(subDF, aes(x = X, y = Y)) +
geom_errorbar(aes(x = X,
ymin = (Y-Y_sd),
ymax = (Y+Y_sd)),
width = 1, size = 0.5) +
geom_errorbarh(aes(x = X,
xmin = (X-X_sd),
xmax = (X+X_sd)),
height = 1, size = 0.5) +
geom_point(cex = 3)
Looks fine when plotted at 1:1 ratio (500x500):
but the error bar width/height look different when plotted at 600x200:
I'd use a size variable so you can control all three plot elements at the same time:
geom_size <- 3
# Plot data with error bars:
ggplot(subDF, aes(x = X, y = Y)) +
geom_errorbar(aes(x = X,
ymin = (Y-Y_sd),
ymax = (Y+Y_sd)),
width = 1, size = geom_size) +
geom_errorbarh(aes(x = X,
xmin = (X-X_sd),
xmax = (X+X_sd)),
height = 1, size = geom_size) +
geom_point(cex = geom_size)
This just builds on brettljausn's earlier answer. You can control the aspect ratio of your plot with a variable as well. This only works when you actually save the file with ggsave(), not in a preview. I also used size instead of cex to control the size of the points, as it scaled more nicely with the error bar ends.
plotheight = 100
plotratio = 3
geomsize = 3
plot = ggplot(subDF, aes(x = X, y = Y)) +
geom_errorbar(aes(x = X,
ymin = (Y-Y_sd),
ymax = (Y+Y_sd)),
width = .5 * geomsize / plotratio, size = 0.5) +
geom_errorbarh(aes(x = X,
xmin = (X-X_sd),
xmax = (X+X_sd)),
height = .5 * geomsize, size = 0.5) +
geom_point(size = geomsize)
ggsave(filename = "~/Desktop/plot.png", plot = plot,
width = plotheight * plotratio, height = plotheight, units = "mm")
Change plotheight, plotratio and geomsize to whatever makes the plot look good. You will also have to change the filename in the second-to-last line to save the file in a folder of your choice.
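The underlying reason is that width in geom_errorbar() and height in geom_errorbarh() are given in data units, so the same value covers a different physical length whenever the aspect ratio changes. A hypothetical helper (mm_to_data() is not part of ggplot2) that generalises the division by plotratio: it converts a desired length in mm into data units along one axis, under the simplifying assumption that the panel fills the whole plot, so margins and axis titles are ignored and the result is approximate.

```r
# Hypothetical helper, not part of ggplot2: convert a physical length in mm
# into data units along one axis. `axis_limits` should be the displayed axis
# limits (including any scale expansion); `panel_mm` is the panel's length on
# the page. Margins and axis titles are ignored, so this is an approximation.
mm_to_data <- function(mm, axis_limits, panel_mm) {
  mm * diff(range(axis_limits)) / panel_mm
}

# Example: 3 mm caps on a 300 mm x 100 mm plot where both axes span 0 to 30
cap_width  <- mm_to_data(3, c(0, 30), panel_mm = 300)  # for geom_errorbar(width = )
cap_height <- mm_to_data(3, c(0, 30), panel_mm = 100)  # for geom_errorbarh(height = )
```

With equal axis spans, cap_height / cap_width equals the plot ratio (3 here), which matches dividing the width by plotratio in the answer above.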
How can I use geom_text() to add the "number" field next to each upper error bar, i.e. to the right of the upper error bar?
group= 1:10
count = c(41,640,1000,65,30,4010,222,277,1853,800 )
mu = c(.7143,.66,.6441,.58,.7488,.5616,.5507,.5337,.5513,.5118)
sd = c(.2443,.20,.2843,.2285,.2616,.2365,.2408,.2101,.2295,.1966)
u = mu + 1.96*sd/sqrt(count)
l= mu - 1.96*sd/sqrt(count)
number = c(23,12,35,32,23,63,65,66,66,66)
dat = data.frame(group = group, count = count, mu = mu, sd = sd, u = u, l = l, number = number)
dat[order(dat$count),]
ggplot(dat, aes(y=factor(group), x= mu)) +
geom_point()+
geom_errorbarh(aes(xmax = as.numeric(u),xmin = as.numeric(l)))
aes(label = number, x = as.numeric(u)) to use numbers as the labels and the upper error bar the x coordinates. The y coordinates will remain the same as you've specified in ggplot.
hjust = -1 places each label to the right of its x coordinate, with a gap of about one label width.
Use xlim() to adjust for text that might go over the right edge.
Example:
ggplot(dat, aes(y=factor(group), x= mu)) +
geom_point()+
geom_errorbarh(aes(xmax = as.numeric(u),xmin = as.numeric(l))) +
geom_text(aes(label = number, x = as.numeric(u)), hjust = -1) +
xlim(.49, .85)
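An alternative to a negative hjust, sketched on the question's data: hjust = 0 anchors the left edge of each label at the upper bound, and nudge_x adds a fixed gap in data units (0.005 is an arbitrary value chosen for this example; sdv is renamed from sd only to avoid masking the sd() function).

```r
library(ggplot2)

# Data from the question (the duplicated `u` column is not needed here)
group  <- 1:10
count  <- c(41, 640, 1000, 65, 30, 4010, 222, 277, 1853, 800)
mu     <- c(.7143, .66, .6441, .58, .7488, .5616, .5507, .5337, .5513, .5118)
sdv    <- c(.2443, .20, .2843, .2285, .2616, .2365, .2408, .2101, .2295, .1966)
number <- c(23, 12, 35, 32, 23, 63, 65, 66, 66, 66)
dat <- data.frame(group = group, mu = mu,
                  u = mu + 1.96 * sdv / sqrt(count),
                  l = mu - 1.96 * sdv / sqrt(count),
                  number = number)

# hjust = 0: left edge of each label sits at x; nudge_x shifts it by a fixed amount
p <- ggplot(dat, aes(y = factor(group), x = mu)) +
  geom_point() +
  geom_errorbarh(aes(xmin = l, xmax = u)) +
  geom_text(aes(label = number, x = u), hjust = 0, nudge_x = 0.005) +
  xlim(.49, .85)
```

The design difference: the gap produced by hjust = -1 grows with the width of each label, whereas the nudge_x gap is the same for every label.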
I'm doing a comparison chart of two different estimates of the same time series data. I'm filling the area between the two series in green if the original estimate is more than the latest estimate, and red otherwise.
I've got that part working, but I'd like to add a legend for the fill colors. I tried scale_fill_manual towards the bottom of the code, but it doesn't seem to be doing anything?
Here's the code:
library(ggplot2)
library(scales)
library(colorspace)
# Return a polygon that only plots between yLower and yUpper when yLower is
# less than yUpper.
getLowerPolygon = function(x, yLower, yUpper) {
# Create the table of coordinates
poly = data.frame(
x = numeric(),
y = numeric())
lastReversed = (yUpper[1] < yLower[1])
for (r in 1:length(x)) {
reversed = (yUpper[r] < yLower[r])
if (reversed != lastReversed) {
# Between points r-1 and r, the series intersected, so we need to
# change the polygon from visible to invisible or v.v. In either
# case, just add the intersection between those two segments to the
# polygon. Algorithm from:
# https://en.wikipedia.org/wiki/Line-line_intersection
# First line: x1,y1 - x2,y2
x1 = x[r-1]
y1 = yLower[r-1]
x2 = x[r]
y2 = yLower[r]
# Second line: x3,y3 - x4,y4
x3 = x[r-1]
y3 = yUpper[r-1]
x4 = x[r]
y4 = yUpper[r]
# Calculate determinants
xy12 = det(matrix(c(x1, y1, x2, y2), ncol = 2))
xy34 = det(matrix(c(x3, y3, x4, y4), ncol = 2))
x12 = det(matrix(c(x1, 1, x2, 1), ncol = 2))
x34 = det(matrix(c(x3, 1, x4, 1), ncol = 2))
y12 = det(matrix(c(y1, 1, y2, 1), ncol = 2))
y34 = det(matrix(c(y3, 1, y4, 1), ncol = 2))
# Calculate fraction pieces
xn = det(matrix(c(xy12, x12, xy34, x34), ncol = 2))
yn = det(matrix(c(xy12, y12, xy34, y34), ncol = 2))
d = det(matrix(c(x12 , y12, x34, y34), ncol = 2))
# Calculate intersection
xi = xn / d
yi = yn / d
# Add the point
poly[nrow(poly)+1,] = c(xi, yi)
}
lastReversed = reversed
# http://stackoverflow.com/questions/2563824
poly[nrow(poly)+1,] = c(x[r], min(yLower[r], yUpper[r]))
}
poly = rbind(poly, data.frame(
x = rev(x),
y = rev(yUpper)))
return(poly)
}
getComparisonPlot = function(data, title, lower_name, upper_name,
x_label, y_label, legend_title = '') {
lightGreen = '#b0dd8d'
lightRed = '#fdba9a'
darkGray = RGB(.8, .8, .8)
midGray = RGB(.5, .5, .5)
plot = ggplot(data, aes(x = x))
plot = plot + geom_polygon(
aes(x = x, y = y),
data = data.frame(
x = c(data$x, rev(data$x)),
y = c(data$yLower, rev(data$yUpper))
),
fill = lightRed)
coords = getLowerPolygon(data$x, data$yLower, data$yUpper)
plot = plot + geom_polygon(
aes(x = x, y = y),
data = coords,
fill = lightGreen)
plot = plot + geom_line(
aes(y = yUpper, color = 'upper'),
size = 0.5)
plot = plot + geom_line(
aes(y = yLower, color = 'lower'),
size = 0.5)
plot = plot +
ggtitle(paste(title, '\n', sep='')) +
xlab(x_label) +
ylab(y_label) +
scale_y_continuous(labels = comma)
# http://stackoverflow.com/a/10355844/106302
plot = plot + scale_color_manual(
name = legend_title,
breaks = c('upper' , 'lower'),
values = c('gray20', 'gray50'),
labels = c(upper_name, lower_name))
plot = plot + scale_fill_manual(
name = 'Margin',
breaks = c('upper', 'lower'),
values = c(lightGreen, lightRed),
labels = c('Over', 'Under'))
return(plot)
}
print(getComparisonPlot(
data = data.frame(
x = 1:20,
yLower = 1:20 %% 5 + 2,
yUpper = 1:20 %% 7
),
title = 'Comparison Chart',
lower_name = 'Latest',
upper_name = 'Original',
x_label = 'X axis',
y_label = 'Y axis',
legend_title = 'Thing'
))
Here's an image of the chart, I think it is a cool technique:
I'm also open to any other suggestions for improving my ggplot code.
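Since both series in getLowerPolygon() are sampled at the same x values, the determinant-based line-line intersection reduces to linearly interpolating the gap between them. A sketch of that simplification; crossing_point() is a hypothetical helper, meant for the case where the gap changes sign between indices r-1 and r.

```r
# Hypothetical helper: crossing point of two series that share the x values,
# between index r - 1 and index r (the gap must change sign there)
crossing_point <- function(x, yLower, yUpper, r) {
  d1 <- yUpper[r - 1] - yLower[r - 1]  # gap before the crossing
  d2 <- yUpper[r] - yLower[r]          # gap after the crossing (opposite sign)
  t  <- d1 / (d1 - d2)                 # fraction of the way from r - 1 to r
  c(x = x[r - 1] + t * (x[r] - x[r - 1]),
    y = yLower[r - 1] + t * (yLower[r] - yLower[r - 1]))
}

crossing_point(1:2, c(0, 2), c(2, 0), 2)
```

Inside the loop, poly[nrow(poly)+1,] = crossing_point(x, yLower, yUpper, r) would then take the place of the determinant block.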
ggplot needs you to map the polygons' fill aesthetic to some variable or, in this case, just to "label" the two types of polygon (i.e. 'upper' and 'lower'). You do this by passing a string with the respective label to the fill aesthetic of geom_polygon(). What you are doing instead is passing a fixed colour to each polygon without mapping it to anything ggplot can put in a legend; it is a kind of "hard-coded colour".
Well, here are the changes inside getComparisonPlot:
# The full "lower" polygon is drawn first so that the "upper" polygon stays visible on top
plot = plot + geom_polygon(
aes(x = x, y = y, fill = "lower"),
data = data.frame(
x = c(data$x, rev(data$x)),
y = c(data$yLower, rev(data$yUpper))
))
plot = plot + geom_polygon(
aes(x = x, y = y, fill = "upper"),
data = coords)
One more thing: note that the strings passed to the fill aesthetic coincide with the breaks passed to scale_fill_manual(). This is necessary for the legend to map things correctly.
plot = plot + scale_fill_manual(
name = 'Margin',
breaks = c('upper', 'lower'), # <<< corresponds to fill aesthetic labels
values = c(upper = lightGreen, lower = lightRed), # named, so each colour goes to its label
labels = c('Over', 'Under'))
Result:
Hope it helps.
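A minimal, self-contained sketch of the same idea, using two made-up polygons instead of the question's data: each layer maps fill to a constant label, and scale_fill_manual() uses a named values vector so every colour is tied to its label regardless of the order of the breaks.

```r
library(ggplot2)

# Made-up polygons, for illustration only
under <- data.frame(x = c(0, 4, 4, 0), y = c(0, 0, 2, 2))
over  <- data.frame(x = c(1, 3, 2), y = c(0.5, 0.5, 1.5))

p <- ggplot() +
  # A constant string inside aes() is a mapping, so each layer gets a legend key
  geom_polygon(data = under, aes(x = x, y = y, fill = "lower")) +
  geom_polygon(data = over,  aes(x = x, y = y, fill = "upper")) +
  scale_fill_manual(
    name   = "Margin",
    breaks = c("upper", "lower"),
    values = c(upper = "#b0dd8d", lower = "#fdba9a"),  # named: colour follows label
    labels = c("Over", "Under"))
```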