This is my dataframe:
library(ggplot2)
df = data.frame(info=1:30, type=c(replicate(5,'A'), replicate(5,'B')), group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3')))
I want to make a jitter plot of my data distinguished by group (X-label) and type (colour):
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2), cex=2)+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue"))
How can I reduce the distance between the x-axis labels (D1, D2, D3) in the plot?
P.S. I want to do this even if it leaves blank space in the graphic.
Here are a few options.
# Setting up the plot
library(ggplot2)
df <- data.frame(
info=1:30,
type=c(replicate(5,'A'), replicate(5,'B')),
group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type))
Option 1: increase the dodge distance. This won't put the labels closer, but it makes better use of the space available so that the labels appear less isolated.
p +
geom_point(position = position_dodge(width = 0.9))
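To see why a larger dodge width fills more of the available space, you can inspect the positions ggplot2 actually computes with layer_data(). A small sketch, rebuilding the same df and p so that it runs on its own; the helper dodge_offsets() is hypothetical, written only for this illustration. With two types per group, position_dodge(width = w) should place them about w/4 either side of each group's position.

```r
library(ggplot2)

# Same data and base plot as in the setup above
df <- data.frame(
  info = 1:30,
  type = c(replicate(5, 'A'), replicate(5, 'B')),
  group = c(replicate(10, 'D1'), replicate(10, 'D2'), replicate(10, 'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type))

# Hypothetical helper: horizontal offsets of the dodged points relative to
# the integer position (1, 2, 3) of their group on the discrete x-axis
dodge_offsets <- function(width) {
  d <- layer_data(p + geom_point(position = position_dodge(width = width)))
  x <- as.vector(unclass(d$x))  # drop any discrete-mapping class
  sort(unique(round(x - round(x), 3)))
}

dodge_offsets(0.2)  # the narrow dodge used in the question
dodge_offsets(0.9)  # Option 1: the two types spread over most of each slot
```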
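Option 2: Expand the x-axis. By default a discrete axis is padded by 0.6 units on each side; expand = c(2, 0) instead pads each side by twice the range of the group positions (multiplicative expansion 2, additive 0). The extra space at the ends of the axis squeezes the groups together, putting the labels closer.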
p +
geom_point(position = position_dodge(0.2)) +
scale_x_discrete(expand = c(2, 0))
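One way to see how much room each setting leaves is to read the drawn x range back from ggplot_build(). A sketch assuming ggplot2 3.3.0 or later (for expansion()); the helper x_span() is hypothetical. The three groups sit at positions 1, 2 and 3, so the larger the span, the closer together they end up.

```r
library(ggplot2)

df <- data.frame(
  info = 1:30,
  type = c(replicate(5, 'A'), replicate(5, 'B')),
  group = c(replicate(10, 'D1'), replicate(10, 'D2'), replicate(10, 'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type)) +
  geom_point(position = position_dodge(0.2))

# Hypothetical helper: width of the drawn x range, measured in "group units"
x_span <- function(plot) {
  pp <- ggplot_build(plot)$layout$panel_params[[1]]
  diff(as.vector(unclass(pp$x.range)))
}

x_span(p)                                                    # default padding
x_span(p + scale_x_discrete(expand = expansion(add = 1.5)))  # 1.5 units per side
x_span(p + scale_x_discrete(expand = c(2, 0)))               # the setting above
```

expansion(add = ...) is a gentler alternative to a multiplicative factor when you only want a little extra space.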
Option 3: change the aspect ratio. Depending on the plotting window size, this also visually puts the x-axis labels closer together.
p +
geom_point(position = position_dodge(0.2)) +
theme(aspect.ratio = 2)
Created on 2021-06-25 by the reprex package (v1.0.0)
Try adding coord_fixed(ratio = 0.2) and play around with the ratio.
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2))+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue")) + coord_fixed(ratio = 0.2)
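The ratio in coord_fixed() is an aspect ratio expressed as y / x, so the shape of the panel follows from the data ranges. A small sketch (the helper panel_aspect() is hypothetical) that reads the drawn ranges from ggplot_build() and works out panel height / width: with y spanning 1 to 30 and three groups on x, ratio = 0.2 gives a panel about twice as tall as it is wide, which is what pulls the groups together.

```r
library(ggplot2)

df <- data.frame(
  info = 1:30,
  type = c(replicate(5, 'A'), replicate(5, 'B')),
  group = c(replicate(10, 'D1'), replicate(10, 'D2'), replicate(10, 'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type)) +
  geom_point(position = position_dodge(0.2))

# Hypothetical helper: panel height / panel width under coord_fixed(ratio),
# where one y unit is drawn `ratio` times as long as one x unit
panel_aspect <- function(plot, ratio) {
  pp <- ggplot_build(plot + coord_fixed(ratio = ratio))$layout$panel_params[[1]]
  y_span <- diff(as.vector(unclass(pp$y.range)))  # 1..30 plus 5% padding
  x_span <- diff(as.vector(unclass(pp$x.range)))  # 3 groups plus 0.6 padding
  ratio * y_span / x_span
}

panel_aspect(p, 0.2)  # about 2: panel roughly twice as tall as wide
panel_aspect(p, 0.1)  # about 1: roughly square
```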
The simplest solution is to resize the plot. For example, if you follow your command with ggsave("my_plot.pdf", width = 3, height = 4.5), the narrower device brings the groups closer together.
Or, in an Rmd file, you can control the figure dimensions in various ways (for example with the fig.width and fig.height chunk options): see this link.
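A minimal sketch of the resizing approach, passing the plot object explicitly to ggsave() so the saved figure does not depend on whichever plot happens to be last_plot(). The output path is a temporary file standing in for a real name such as "my_plot.pdf".

```r
library(ggplot2)

df <- data.frame(
  info = 1:30,
  type = c(replicate(5, 'A'), replicate(5, 'B')),
  group = c(replicate(10, 'D1'), replicate(10, 'D2'), replicate(10, 'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type)) +
  geom_point(position = position_dodge(0.2))

# A narrow, tall device: the three groups share only 3 inches of width
out <- tempfile(fileext = ".pdf")
ggsave(out, plot = p, width = 3, height = 4.5, units = "in")
```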
I am trying to make a plot showing, for each individual (ID), the distribution of mean depths (meanDepth) as boxplots. I want the width of each boxplot to depend on its sample size, but at the same time I want to make all boxplots wider overall, since there is too much blank space between them. I tried this:
library(ggplot2)
library(lemon)  # for coord_capped_cart()
Plot <- ggplot(data = Data) +
aes(x=ID, y=meanDepth, col=Site) +
geom_boxplot(size=1.75, varwidth = TRUE, position = position_dodge2(preserve = "single"), width=1.2) +
labs(y="\n Mean depth (m)") +
coord_capped_cart(bottom="both",left="both") +
theme_bw() +
guides(colour = guide_legend(override.aes = list(size=3.5))) +
scale_y_continuous(trans = "reverse", limits = c(55, 0), breaks = round(seq(0,55,55/4),1))
Plot
I have observed that if I use values greater than 1 for width, the boxplots shift along the x-axis. Why is that?
How can I make my boxplots wider while keeping their width proportional to their sample size?
Let me explain what I mean with an example:
library(ggplot2)
set.seed(1) ## dummy data.frame:
df <- data.frame( value1 = sample(5:15, 20, replace = TRUE), value2 = sample(5:15, 20, replace = TRUE),
var1 = c(rep('type1',10), rep('type2',10)), var2 = c('a','b','c','d'))
## Plot 1
ggplot() +
geom_point(data = df, aes(value1, value2)) +
facet_grid(~var1) +
coord_fixed()
ggsave("plot_2facet.pdf", height=5, units = 'in')
#Saving 10.3 x 5 in image
## Plot 2 which I want to save in a separate file (!)
ggplot() +
geom_point(data = df, aes(value1, value2)) +
facet_grid(~var2) +
coord_fixed()
ggsave("plot_4facet.pdf", height=5, units = 'in')
#Saving 10.3 x 5 in image
What happens here is that the devices have the same height, but the plots themselves have different heights. I would like the plots to have the same height.
In the code above I only specified the height, but ggsave then just uses a fixed width for the device.
I tried theme(plot.margin = margin(t=1,b=1)), but this did not change anything.
Taking out coord_fixed() gives plots with the same height, but I would like to use coord_fixed().
Is there a solution for this, or do I need to "guess" the width dimensions of the device to get the correct plot height?
Cheers
Edit
The plots should ideally be created in separate devices/ files.
This is somewhat tricky with ggplot, so please forgive the long, convoluted, and admittedly a bit hacky answer. The basic problem is that with coord_fixed, the height of the y-axis becomes inextricably linked to the length of the x-axis.
There are two ways we can break this dependency:
1) By using the expand argument of scale_y_continuous. This allows us to extend the y-axis by a given amount beyond the range of the data. The tricky bit is knowing how much to expand it, because this depends in a hard-to-predict way on all elements of the plot, including how many facets there are and the size of axis titles and labels.
2) By allowing the widths of the two plots to differ. The tricky thing here is, as above, finding the correct width, as this depends on the various other aspects of the plots.
First I show how we can solve the first version (how much to expand the y-axis). Then using a similar approach and a little extra trickery we can also solve the varying width version.
Solution to finding how much to expand the y-axis
Given the difficulty of predicting how large the plotting area will be (which depends on the relative sizes of all the elements of the plot), what we can do is save a dummy plot in which we shade the plot area in black, read the image file back in, and then measure the size of the black area to determine how large the plot area is:
1) Let's start by assigning your plots to variables:
p1 = ggplot(df) +
geom_point(aes(value1, value2)) +
facet_grid(~var1) +
coord_fixed()
p2 = ggplot(df) +
geom_point(aes(value1, value2)) +
facet_grid(~var2) +
coord_fixed()
2) now we can save some dummy versions of these plots that only show a black rectangle where the plotting region is:
t_blank = theme(strip.background = element_rect(fill = NA),
strip.text = element_text(color=NA),
axis.title = element_text(color = NA),
axis.text = element_text(color = NA),
axis.ticks = element_line(color = NA))
p1 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') +
t_blank
ggsave(fn1 <- tempfile(fileext = '.png'), height=5, units = 'in')
p2 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') +
t_blank
ggsave(fn2 <- tempfile(fileext = '.png'), height=5, units = 'in')
3) then we read these into an array (just the first color band is enough)
library(png)
p1.saved = readPNG(fn1)[,,1]
p2.saved = readPNG(fn2)[,,1]
4) calculate the height of each plotting area (the black-shaded areas which have a value=zero)
p1.height = diff(row(p1.saved)[range(which(p1.saved==0))])
p2.height = diff(row(p2.saved)[range(which(p2.saved==0))])
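The one-liner in step 4 packs several operations together. On a tiny hand-made matrix (hypothetical, standing in for one colour channel of a saved PNG) you can check what it returns. It measures bottom edge minus top edge, which is one pixel less than the number of black rows; the same offset applies to both plots, so at real image sizes its effect on the height ratio is negligible. It also relies on the first and last black pixels (in column-major order) lying at the top and bottom of the panels, which holds when all panels share the same height.

```r
# A tiny 6 x 5 "image": 1 = white background, 0 = black plotting area
img <- matrix(1, nrow = 6, ncol = 5)
img[2:4, 2:3] <- 0

black <- which(img == 0)  # linear (column-major) indices of black pixels
ends <- range(black)      # first and last black pixel
row(img)[ends]            # their rows: 2 and 4, the top and bottom edges
diff(row(img)[ends])      # 2
```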
5) Find how much we need to expand the plotting area based on these. Note that we subtract the ratio of heights from 1.1 to account for the fact that the original plots were already expanded by the default amount of 0.05 in each direction. Disclaimer: this formula works on your example. I haven't had time to check it more broadly, and it may need adapting to ensure generality for other plots.
height.expand = 1.1 - p2.height / p1.height
6) Now we can save the plots using this expansion factor
ggsave("plot_2facet.pdf", p1, height=5, units = 'in')
ggsave("plot_4facet.pdf", p2 + scale_y_continuous(expand=c(height.expand, 0)),
height=5, units = 'in')
Solution to finding how much to alter the width
First, let's set the width of the first plot to what we want:
p1.width = 10
Now, using the same approach as in the previous section we find how tall the plotting area is in this plot.
p1 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') +
t_blank
ggsave(fn1 <- tempfile(fileext = '.png'), height=5, width = p1.width, units = 'in')
p1.saved = readPNG(fn1, info = T)[,,1]
p1.height = diff(row(p1.saved)[range(which(p1.saved==0))])
Next, we find the minimum width the second plot must have to get the same height (note: we look for a minimum here because any greater width will not increase the height, which already fills the vertical space, but will simply add white space to the left and right).
We will solve for the width using the function uniroot, which finds where a function crosses zero. To use uniroot, we first define a function that calculates the height of a plot given its width as an argument, and returns the absolute difference between that height and the height we want. The line if (x==0) x = -1e-8 in this function is a dirty trick to allow uniroot to solve a function that reaches zero but does not cross it (see here).
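The if (x==0) trick can be seen in isolation on a toy function (hypothetical, standing in for the real plot) whose "height" grows with width until it is capped. As far as I can tell, without the replacement f(20) would be exactly 0 and uniroot() would simply return the upper end point 20 rather than the minimum width; with it, f changes sign at the smallest width that reaches the target.

```r
# Toy stand-in for the plot: its "panel height" grows with width w
# until it is capped at 5 (the vertical space is full)
toy_height <- function(w) min(w / 2, 5)
target <- 5

f <- function(w) {
  x <- abs(target - toy_height(w))
  # The trick: replace an exact zero by a tiny negative number, so that f
  # changes sign at the smallest width that reaches the target height
  if (x == 0) x <- -1e-8
  x
}

f(4)   # 3: too narrow, the height is still too small
f(20)  # -1e-08: wide enough, the height is capped
res <- uniroot(f, c(4, 20))
res$root  # close to 10, the minimum width giving the target height
```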
fn2 <- tempfile(fileext = '.png')
find.p2 = function(w){
p = p2 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') +
t_blank
ggsave(fn2, p, height=5, width = w, units = 'in')
p2.saved = readPNG(fn2, info = T)[,,1]
p2.height = diff(row(p2.saved)[range(which(p2.saved==0))])
x = abs(p1.height - p2.height)
if (x==0) x = -1e-8
x
}
N1 = length(unique(df$var1))
N2 = length(unique(df$var2))
p2.width = uniroot(find.p2, c(p1.width, p1.width*N2/N1))
Now we are ready to save the plots with the correct widths to ensure they have the same height.
p1
ggsave("plot_2facet.pdf", p1, height=5, width = p1.width, units = 'in')
p2
ggsave("plot_4facet.pdf", p2, height=5, width = p2.width$root, units = 'in')
You can do this (it turns out) using the awesome egg package. I don't actually know how this works, or if it works more generally than this case; I just took a punt on the basis that ggarrange figures out the alignment. If anyone could shed light on this, that'd be great!
library(egg)
getScale <- ggarrange(p1, p2, draw = F, ncol=2)
p1_sc <- ggarrange(p1, heights = getScale$heights[2])
ggsave("plot_2facet.pdf", plot=p1_sc, height=5, units = 'in')
p2_sc <- ggarrange(p2, heights = getScale$heights[2])
ggsave("plot_4facet.pdf", plot=p2_sc, height=5, units = 'in')
Yeah, I really have no idea how this works:
getScale$heights[2]
# [1] max(1*1null, 1*1null)
class(getScale$heights[2])
# [1] "unit.list" "unit"
EDIT: it does seem to generalise, though:
p3 <- ggplot() +
geom_point(data = df, aes(value1, value2)) +
facet_wrap(~var2, nrow=2) +
coord_fixed()
getScale <- ggarrange(p1, p2, p3, draw = F, ncol=3)
p1_sc <- ggarrange(p1, heights = getScale$heights[2])
ggsave("plot_2facet.pdf", plot=p1_sc, height=5, units = 'in')
p2_sc <- ggarrange(p2, heights = getScale$heights[2])
ggsave("plot_4facet.pdf", plot=p2_sc, height=5, units = 'in')
p3_sc <- ggarrange(p3, heights = getScale$heights[2])
ggsave("plot_4facet_2row.pdf", plot=p3_sc, height=5, units = 'in')
I am trying to make a labeled bubble plot with ggplot2 in R. Here is the simplified scenario:
I have a data frame with 4 variables: 3 quantitative variables, x, y, and z, and another variable that labels the points, lab.
I want to make a scatter plot, where the position is determined by x and y, and the size of the points is determined by z. I then want to place text labels beside the points (say, to the right of the point) without overlapping the text on top of the point.
If the points did not vary in size, I could try to simply modify the aesthetic of the geom_text layer by adding a scaling constant (e.g. aes(x=x+1, y=y+1)). However, even in this simple case, I am having a problem with positioning the text correctly because the points do not scale with the output dimensions of the plot. In other words, the size of the points remains constant in a 500x500 plot and a 1000x1000 plot - they do not scale up with the dimensions of the outputted plot.
Therefore, I think I have to scale the position of the label by the size (e.g. dimensions) of the output plot, or I have to get the radius of the points from ggplot somehow and shift my text labels. Is there a way to do this in ggplot2?
Here is some code:
# Stupid data
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
# Plot with bad label placement
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I should mention, I tried hjust and vjust inside of geom_text, but it does not produce the desired effect.
# Trying hjust and vjust, but it doesn't look nice
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab), hjust=0, vjust=0.5,
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I managed to get something that works for now, thanks to Henrik and shujaa. I will leave the question open just in case someone shares a more general solution.
Just a blurb of what I am using this for: I am plotting a map, and indicating the amount of precipitation at certain stations with a point that is sized proportionally to the amount of precipitation observed. I wanted to add a station label beside each point in an aesthetically pleasing manner. I will be making more of these plots for different regions, and my output plot may have a different resolution or scale (e.g. due to different projections) for each plot, so a general solution is desired. I might try my hand at creating a custom position_jitter, like baptiste suggested, if I have time during the weekend.
It appears that the position_*** functions don't have access to the scales used by other layers, so that's a no-go. You could make a clone of GeomText that shifts the labels according to the mapped size, but it's a lot of effort for a very kludgy and fragile solution:
geom_shiftedtext <- function (mapping = NULL, data = NULL, stat = "identity",
position = "identity",
parse = FALSE, ...) {
GeomShiftedtext$new(mapping = mapping, data = data, stat = stat, position = position,
parse = parse, ...)
}
require(proto)
GeomShiftedtext <- proto(ggplot2:::GeomText, {
objname <- "shiftedtext"
draw <- function(., data, scales, coordinates, ..., parse = FALSE, na.rm = FALSE) {
data <- remove_missing(data, na.rm,
c("x", "y", "label"), name = "geom_shiftedtext")
lab <- data$label
if (parse) {
lab <- parse(text = lab)
}
with(coord_transform(coordinates, data, scales),
textGrob(lab, unit(x, "native") + unit(0.375* size, "mm"),
unit(y, "native"),
hjust=hjust, vjust=vjust, rot=angle,
gp = gpar(col = alpha(colour, alpha),
fontfamily = family, fontface = fontface, lineheight = lineheight))
)
}
})
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1.2,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z), shape=1) +
geom_shiftedtext(aes(label=lab, size=z),
hjust=0, colour="red") +
scale_size_continuous(range=c(5, 100), guide="none")
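The proto-based code above depends on ggplot2 internals from before version 2.0.0, which switched to the ggproto system, so it will not run on current releases. Below is a rough sketch of the same idea ported to ggproto, assuming a recent ggplot2: GeomShiftedText inherits from GeomText and overrides draw_panel() to add 0.375 * size millimetres to each label's x position. The 0.375 factor comes from the answer above; the parse option has been dropped for brevity, and the mapped size only drives the shift (the font size is left at the default, as in the original).

```r
library(ggplot2)
library(grid)

# Sketch of a ggproto port: text labels shifted right in proportion to size
GeomShiftedText <- ggproto("GeomShiftedText", GeomText,
  draw_panel = function(data, panel_params, coord, na.rm = FALSE) {
    # Convert data coordinates to the panel's native [0, 1] range
    data <- coord$transform(data, panel_params)
    textGrob(
      as.character(data$label),
      x = unit(data$x, "native") + unit(0.375 * data$size, "mm"),
      y = unit(data$y, "native"),
      hjust = data$hjust, vjust = data$vjust, rot = data$angle,
      # fontsize deliberately not set: the mapped size only drives the shift
      gp = gpar(col = alpha(data$colour, data$alpha),
                fontfamily = data$family, fontface = data$fontface,
                lineheight = data$lineheight)
    )
  }
)

geom_shiftedtext <- function(mapping = NULL, data = NULL, stat = "identity",
                             position = "identity", ..., na.rm = FALSE,
                             show.legend = NA, inherit.aes = TRUE) {
  layer(geom = GeomShiftedText, mapping = mapping, data = data, stat = stat,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

df <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3), z = c(1.2, 2, 1),
                 lab = c("a", "b", "c"), stringsAsFactors = FALSE)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(size = z), shape = 1) +
  geom_shiftedtext(aes(label = lab, size = z), hjust = 0, colour = "red") +
  scale_size_continuous(range = c(5, 100), guide = "none")
p
```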
This isn't a very general solution, because you'll need to tweak it every time, but you should be able to add to the x value for the text some value that's linear depending on z.
I had luck with
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab, x = x + .06 + .14 * (z - min(z))),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
but, as the font size depends on your window size, you would need to decide on your output size and tweak accordingly. I started with x = x + .05 + 0 * (z-min(z)) and calibrated the intercept based on the smallest point, then when I was happy with that I adjusted the linear term for the biggest point.
Another alternative. It looks OK with your test data, but you need to check how general it is.
df$dodge <- abs(as.vector(scale(df$z))) / 4
ggplot(data = df, aes(x = x, y = y)) +
geom_point(aes(size = z)) +
geom_text(aes(x = x + dodge, label = lab), colour = "red") +
scale_size_continuous(range = c(5, 50), guide = "none")
Update
Just tried position_jitter, but the width argument only takes one value, so right now I am not sure how useful that function would be. But I would be happy to find that I am wrong. Example with another small data set:
df3 <- mtcars[1:10, ]
ggplot(data = df3, aes(x = wt, y = mpg)) +
geom_point(aes(size = qsec), alpha = 0.1) +
geom_text(label = df3$carb, position = position_jitter(width = 0.1, height = 0)) +
scale_size_continuous(range = c(5, 50), guide = "none")
I am doing some research on non-defaulters and defaulters in banking. In that context I am plotting their distributions relative to some score in a bar plot. The higher the score, the better the credit rating.
Since the number of defaults is very limited compared to the number of non-defaults, plotting the defaults and non-defaults on the same bar plot is not very informative, as you can hardly see the defaults. I therefore make a second bar plot based on the defaulters' scores only, but on the same interval scale as the full bar plot of both defaulters' and non-defaulters' scores. I would then like to add vertical lines to the first bar plot indicating where the highest and the lowest defaulter scores are located. The aim is to see where the distribution of the defaulters fits into the overall distribution of both defaulters and non-defaulters.
Below is the code I am using replaced with (seeded) random data instead.
library(ggplot2)
library(grid)  # for grid.newpage(), viewport() and grid.layout()
#NDS represents non-defaults and DS defaults on the same scale
#although here being just some random normals for the sake of simplicity.
set.seed(10)
NDS<-rnorm(10000,sd=1)-2
DS<-rnorm(100,sd=2)-5
#Cutoffs are constructed such that intervals of size 0.3
#contain all values of NDS & DS
minCutoff<--9.3
maxCutoff<-2.1
#Generate the actual interval "bins"
NDS_CUT<-cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
DS_CUT<-cut(DS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
#Manually generate where to put the vertical lines for min(DS) and max(DS)
minDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[1]
maxDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[32]
#Generate data frame - seems stupid, but makes sense
#when the "real" data is used :-)
NDSdataframe<-cbind(as.data.frame(NDS_CUT),rep(factor("State-1"),length(NDS_CUT)))
colnames(NDSdataframe)<-c("Score","Action")
DSdataframe<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe)<-c("Score","Action")
fulldataframe<-rbind(NDSdataframe,DSdataframe)
attach(fulldataframe)
#Plot the full distribution of NDS & DS
# with geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
# that unfortunately does not show :-(
fullplot<-ggplot(fulldataframe, aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) + geom_bar(position="stack") + theme(axis.text.x = element_text(angle = 45)) + theme(legend.position = "none") + xlab("Scoreinterval") + ylab("Antal pr. interval") + geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
#Generate dataframe for DS only
#It might seem stupid, but again makes sense
#when using the original data :-)
DSdataframe2<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe2)<-c("theScore","theAction")
#Calculate max number of observations to adjust bar plot of DS only
myMax<-max(table(DSdataframe2))+1
attach(DSdataframe2)
#Generate bar plot of DS only
subplot<-ggplot(DSdataframe2, aes(theScore, fill=factor(theAction))) + geom_bar(position="stack") + theme(axis.text.x = element_text(angle = 45)) + theme(legend.position = "none") + ylim(0, myMax) + xlab("Scoreinterval") + ylab("Antal pr. interval")
#plot on a grid
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
vplayout <- function(x, y)
viewport(layout.pos.row = x, layout.pos.col = y)
print(fullplot, vp = vplayout(1, 1))
print(subplot, vp = vplayout(2, 1))
#detach dataframes
detach(DSdataframe2)
detach(fulldataframe)
Furthermore, does anybody have an idea how I can align the two plots so that the corresponding intervals are directly above/below each other on the grid plot?
Hope somebody is able to help!
Thanks in advance,
Christian
Wrap aes around the xintercept in the geom_vline layer:
... + geom_vline(aes(xintercept = minDS_bar)) + geom_vline(aes(xintercept = maxDS_bar))
Question 1:
Since you provide the vertical lines as data, you have to map the aesthetics first, using aes()
fullplot <-ggplot(
fulldataframe,
aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) +
geom_bar(position="stack") +
theme(axis.text.x = element_text(angle = 45)) +
theme(legend.position = "none") +
xlab("Scoreinterval") +
ylab("Antal pr. interval") +
geom_vline(aes(xintercept = minDS_bar)) +
geom_vline(aes(xintercept = maxDS_bar))
Second question:
To align the plots, you can use the align.plots() function in package ggExtra
install.packages("dichromat")
install.packages("ggExtra", repos="http://R-Forge.R-project.org")
library(ggExtra)
ggExtra::align.plots(fullplot, subplot)
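align.plots() came from the ggExtra package hosted on R-Forge at the time; it does not appear to be part of the current CRAN ggExtra. An alternative sketch that needs only ggplot2, grid and gtable (a ggplot2 dependency): convert both plots with ggplotGrob() and stack them with rbind(..., size = "max"), which gives each column the larger of the two widths so the panels line up. For the intervals themselves to line up, both x-axes must also show the same set of intervals, hence scale_x_discrete(drop = FALSE). The data are made-up stand-ins for the question's binned scores, drawn inside the cutoffs so no interval is NA.

```r
library(ggplot2)
library(grid)
library(gtable)

set.seed(10)
# Made-up stand-ins for the binned scores, using the question's cutoffs
breaks <- seq(-9.3, 2.1, by = 0.3)
all_scores <- data.frame(Score = cut(runif(1000, -6, 1), breaks))
defaults <- data.frame(Score = cut(runif(50, -9, -3), breaks))

# drop = FALSE keeps every interval on both x-axes, so the bars line up
top <- ggplot(all_scores, aes(Score)) + geom_bar() +
  scale_x_discrete(drop = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
bottom <- ggplot(defaults, aes(Score)) + geom_bar() +
  scale_x_discrete(drop = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

g_top <- ggplotGrob(top)
g_bottom <- ggplotGrob(bottom)
# Stack the gtables; size = "max" gives each column the wider of the two widths
stacked <- rbind(g_top, g_bottom, size = "max")
grid.newpage()
grid.draw(stacked)
```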