geom_text() size definitions in ggplot2 - r

I'm trying to vary the size of a geom_text() layer in a ggplot so that the labels are always narrower than a given range. The ranges are defined in the data, but what I don't know is how to scale the label to be narrower than that, without a ton of trial and error.
What I hope is that I can construct a function of label size and nchar(label) (realizing character width varies a bit) that would return a width that I could compare to the shape width, and scale down until no longer necessary.
Are the ggplot label sizes defined as a number of pixels, percentage of the plot height, or something else like that?

Would this be a helpful place to start? (If not, please feel free to delete my post.) Replace ranges = rnorm(foo, 5, 1) with your own ranges; that column (df[,3]) is what gets passed as cex below.
library(ggplot2)
library(directlabels)
set.seed(67)
foo <- 8
df <- data.frame(x = rnorm(foo, 1, .5), y = rnorm(foo, 1, .5),
                 ranges = rnorm(foo, 5, 1), let = letters[1:foo])
p <- ggplot(df, aes(x, y, color = let)) + geom_point() +
  scale_colour_brewer(palette = 5)
# Place the labels above the points; rot, cex, font and alpha are passed on to the labels
direct.label(p,
             list("top.points", rot = 0, cex = df[, 3],
                  fontface = "bold", fontfamily = "serif", alpha = 0.8))
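On the units question: geom_text()'s size is in millimetres, and ggplot2 exports the constant .pt (72.27 / 25.4) to convert it into a font size in points. That makes it possible to measure a label rather than estimate its width from nchar(). The sketch below uses grid to do the measuring. text_width_mm() and fit_size() are hypothetical helpers (not part of ggplot2), and the code assumes label width grows linearly with font size. Two caveats: measured widths depend on the graphics device and font actually used, and the ranges in your data are in data units, so you would still have to convert them to millimetres using the final panel width.

```r
library(ggplot2)  # for the exported .pt constant
library(grid)

# Hypothetical helper: width (mm) of `label` when drawn by geom_text() at `size`.
# geom_text() sizes are in mm; size * .pt gives the font size in points.
text_width_mm <- function(label, size, family = "") {
  g <- textGrob(label, gp = gpar(fontsize = size * .pt, fontfamily = family))
  convertWidth(grobWidth(g), "mm", valueOnly = TRUE)
}

# Hypothetical helper: largest geom_text() size at which each label fits
# into `max_width_mm`. Assumes width scales linearly with font size,
# so a single measurement at size 1 is enough.
fit_size <- function(label, max_width_mm, family = "") {
  w1 <- vapply(label, text_width_mm, numeric(1), size = 1, family = family,
               USE.NAMES = FALSE)
  max_width_mm / w1
}

fit_size(c("a", "a longer label"), max_width_mm = 20)
```

The resulting sizes can be mapped with aes(size = ...) together with scale_size_identity(), so that ggplot2 uses them as-is instead of rescaling them.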

Related

Adjust the size of panels plotted through ggplot() and facet_grid

I have a dataframe to plot multiple panels with the same x axis and different y axis. The number of columns may vary. I use ggplot and facet_grid to plot these panels.
The problem is that the overall plot size seems to stay the same, so when more panels appear, each one becomes very small.
Are there any ways to fix the size of each panel and the overall size of the figure vary depending on the number of columns and panels? Thanks.
Sorry for the self-promotion, but a while back I wrote a function to control the sizes of panels more precisely, and I've put it in a package (ggh4x) on CRAN. I'm not sure how it would work with a Shiny app, but here is how you'd use it in ggplot2.
You can control the relative widths/heights by setting plain numbers for the rows/columns.
library(ggplot2)
library(ggh4x)
df <- expand.grid(1:12, 3:5)
df$x <- 1
# Plain numbers are relative sizes; respect = TRUE keeps them proportional
ggplot(df, aes(x, x)) +
  geom_point() +
  facet_grid(Var1 ~ Var2) +
  force_panelsizes(rows = 1, cols = 2, respect = TRUE)
You can also control the absolute sizes of the panels by setting a unit object. Note that you can set them for individual rows and columns too, if you know the number of panels in advance.
ggplot(df, aes(x, x)) +
  geom_point() +
  facet_grid(Var1 ~ Var2) +
  # Absolute sizes: one height per row (12) and one width per column (3)
  force_panelsizes(rows = unit(runif(12) + 0.1, "cm"),
                   cols = unit(c(1, 5, 2), "cm"),
                   respect = TRUE)
Created on 2020-05-05 by the reprex package (v0.3.0)
Hope that helped.
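If you would rather stay with plain ggplot2, another option is to let the saved figure grow with the number of panels. This is only an approximation: panel_in and margin_in are assumed values, and the space taken by axes, strips and labels is lumped into one fixed allowance, so each panel will not be exactly panel_in wide.

```r
library(ggplot2)

df <- expand.grid(Var1 = 1:12, Var2 = 3:5)
df$x <- 1
p <- ggplot(df, aes(x, x)) +
  geom_point() +
  facet_grid(Var1 ~ Var2)

# Assumed sizes in inches: one panel, plus a fixed allowance for axes and strips
panel_in  <- 1
margin_in <- 1

n_rows <- length(unique(df$Var1))
n_cols <- length(unique(df$Var2))

# The figure size scales with the number of panel rows and columns
out <- file.path(tempdir(), "facets.pdf")
ggsave(out, p,
       width  = n_cols * panel_in + margin_in,
       height = n_rows * panel_in + margin_in,
       units = "in")
```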

Can I set a minimum font size in geom_text?

I am constructing a scatter plot with variables x and y. Each point is labelled using geom_text. The size of geom_text is controlled by a third variable z.
Is there a way to specify the minimum acceptable font size? I have looked at this question but it only discusses how to set the size to a fixed value.
In the example below, I have reproduced the issue using mtcars, with the size of geom_text controlled by 'disp'. It works, but some of the labels are too small to read (once the value of 'disp' gets lower than about 100).
library(ggplot2)
ggplot(mtcars, aes(y=mpg, x=cyl)) + geom_text(aes(label=rownames(mtcars),size=disp))
I'd like to be able to specify, for example, that the size is controlled by the value of 'disp', but that it should be no smaller than 3.
Obviously this would mean that the larger text was scaled up too.
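You can set the output range of the sizes with the range argument of scale_size: the smallest value of disp is drawn at the first size and the largest at the second.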
ggplot(mtcars, aes(y=mpg, x=cyl)) +
geom_text(aes(label=rownames(mtcars),size=disp)) +
scale_size(range = c(6, 9))
# a little more reasonable
ggplot(mtcars, aes(y=mpg, x=cyl)) +
geom_text(aes(label=rownames(mtcars),size=disp)) +
scale_size(range = c(2, 5))
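To check what scale_size() actually does, you can read the mapped sizes back with layer_data(). The smallest disp ends up at the lower end of range and the largest at the upper end; in between, scale_size() scales by area (square root) rather than linearly, so intermediate values are not evenly spaced.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(y = mpg, x = cyl)) +
  geom_text(aes(label = rownames(mtcars), size = disp)) +
  scale_size(range = c(2, 5))

# layer_data() returns the data as drawn, with `size` already mapped by the scale
sizes <- layer_data(p)$size
range(sizes)  # smallest disp -> 2, largest disp -> 5
```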

How to align the bars of a histogram with the x axis?

Consider this simple example
library(ggplot2)
dat <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15))
ggplot(dat, aes(x = number)) + geom_histogram()
See how the bars are weirdly aligned with the x axis? Why is the first bar on the left of 5.0 while the bar at 10.0 is centered? How can I get control over that? For instance, it would make more sense (to me) to have the bar starting on the right of the label.
Why are the bars "weirdly aligned"?
Let me start by explaining why your code leads to weirdly aligned bars. This has to do with the way a histogram is constructed: first, the x-axis is split up into intervals, and then the number of values in each interval is counted.
By default, ggplot splits the data up into 30 bins. It even spits out a message that says so:
stat_bin() using bins = 30. Pick better value with binwidth.
The default number of bins is not always a good choice. In your case, where all the data points are integers, one might want to choose the boundaries of the bins as 5, 6, 7, 8, ... or 4.5, 5.5, 6.5, ..., such that each bin contains exactly one integer value. You can obtain the boundaries of the bins that have been used in the plot as follows:
data <- data.frame(number = c(5, 10, 11 ,12, 12, 12, 13, 15, 15))
p <- ggplot(data, aes(x = number)) + geom_histogram()
ggplot_build(p)$data[[1]]$xmin
## [1] 4.655172 5.000000 5.344828 5.689655 6.034483 6.379310 6.724138 7.068966 7.413793
## [10] 7.758621 8.103448 8.448276 8.793103 9.137931 9.482759 9.827586 10.172414 10.517241
## [19] 10.862069 11.206897 11.551724 11.896552 12.241379 12.586207 12.931034 13.275862 13.620690
## [28] 13.965517 14.310345 14.655172
As you can see, the boundaries of the bins are not chosen in a way that would lead to a nice alignment of the bars with integers.
So, in short, the reason for the weird alignment is that ggplot simply uses a default number of 30 bins, which is not suitable, in your case, to have bars that are nicely aligned with integers.
There are (at least) two ways to get nicely aligned bars, which I discuss below.
Use a bar plot instead
Since you have integer data, a histogram may just not be the appropriate choice of visualisation. You could instead use geom_bar(), which will lead to bars that are centered on integers:
ggplot(data, aes(x = number)) + geom_bar() + scale_x_continuous(breaks = 1:16)
You could move the bars to the right of the integers by adding 0.5 to number:
ggplot(data, aes(x = number + 0.5)) + geom_bar() + scale_x_continuous(breaks = 1:16)
Create a histogram with appropriate bins
If you nevertheless want to use a histogram, you can make ggplot use more reasonable bins as follows:
ggplot(data, aes(x = number)) +
geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
scale_x_continuous(breaks = 1:16)
With binwidth = 1, you override the default choice of 30 bins and explicitly require that bins have a width of 1. boundary = 0 ensures that the binning starts at an integer value, which is what you need if you want the integers to be to the left of the bars. (If you omit it, bins are chosen such that the bars are centered on integers.)
The argument closed = "left" is a bit trickier to explain. As described above, the boundaries of the bins are now 5, 6, 7, .... The question is which bin a value such as 6 should belong to, since it sits on the boundary between two bins. This choice is controlled by closed. If you set it to "right" (the default), the bins are closed on the right, meaning that each bin includes its right boundary, while its left boundary belongs to the bin to the left; so 6 would be in the lower bin. If you choose "left", each bin includes its left boundary instead, and 6 would be in the upper bin.
Since you want the bars to be to the left of the integers, you need to pick closed = "left".
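To see the effect of boundary and closed directly, you can inspect the computed bins with layer_data(). This is a sketch; xmin, xmax and count are the columns produced by ggplot2's binning stat. With closed = "left", the three 12s land in the bin whose left edge is 12, and the two 15s end up in the last bin, [14, 15], because the rightmost bin includes both of its boundaries.

```r
library(ggplot2)

data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
p <- ggplot(data, aes(x = number)) +
  geom_histogram(binwidth = 1, boundary = 0, closed = "left")

# One row per bin: its edges and how many values fell into it
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
bins
```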
Comparison of the two solutions
If you compare the histogram with the bar plot, you will notice two differences:
There is a little gap between the bars in the bar plot, while they touch in the histogram. You could make the bars touch in the former by using geom_bar(width = 1).
The rightmost bar is between 15 and 16 for the bar plot, while it is between 14 and 15 for the histogram. The reason is that every bin except the last contains only its left boundary, whereas the rightmost bin includes both of its boundaries, so the value 15 falls into the bin [14, 15].
Setting binwidth = 0.5 centers each bar on its value:
data <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15))
ggplot(data,aes(x = number)) + geom_histogram(binwidth = 0.5)
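Here is a trick with the tick labels that makes each bar start at its label (i.e., the bar is aligned on its left edge).
Note that the ticks no longer sit at the true x-values, so any other data you add must be shifted accordingly.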
ggplot(data,aes(x = number)) +
geom_histogram(binwidth = 0.5) +
scale_x_continuous(
breaks=seq(0.75,15.75,1), #show x-ticks align on the bar (0.25 before the value, half of the binwidth)
labels = 1:16 #change tick label to get the bar x-value
)
Another option: binwidth = 1 with breaks = seq(0.5, 15.5, 1) (which may make more sense for integer data).
On top of @Stibu's great answer, note that since ggplot2 3.4.0, geom_col and geom_bar take a new just argument that places the bars/columns to the left or right of the x-value: 0.5 (the default) centers the columns on the value, 0 places them to the right, and 1 to the left:
library(patchwork)
library(ggplot2)
plot1 <- ggplot(dat, aes(x = number)) +
geom_bar(just = 0) +
labs(title = "with just = 0") +
scale_x_continuous(breaks = 1:16)
plot2 <- ggplot(dat, aes(x = number)) +
geom_bar(just = 1) +
labs(title = "with just = 1") +
scale_x_continuous(breaks = 1:16)
plot1 + plot2
This worked for me
+ scale_x_continuous(limits = c(0, NA))
From ?scale_x_continuous, limits is:
One of:
NULL to use the default scale range
A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum
A function that accepts the existing (automatic) limits and returns new limits
Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).
library(ggplot2)
dat <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15))
# bins = 10 avoids the default of 30 bins, which makes the labels hard to read
p1 <- ggplot(dat, aes(x = number)) + geom_histogram(bins = 10, color = "black")
# ggplot_build() gives access to the bin details: xmax is the right edge of
# each bin; use x for the centre or xmin for the left edge instead.
# To see all the components of this chart, run:
# ggplot_build(p1)$data[[1]]
binDetails <- round(ggplot_build(p1)$data[[1]]$xmax, digits = 3)
Scalexx <- scale_x_continuous(breaks = binDetails)
# final chart
p1 + Scalexx
The same method is shown step by step in this video:
https://www.youtube.com/watch?v=Za8bTDvmPLk
With this method, you do not need to work out the bin edges manually. Please comment if you have any questions.

How to automatically set a fixed coordinate ratio (coord_fixed) when the x- and y-axes are on different scales?

My goal is to fix the coordinate aspect ratio of a plot from ggplot2 via coord_fixed(). I thought coord_fixed(ratio = 1) did the job independently of the scales of the x- or y-axis. My intuition: the argument ratio refers to the ratio of the total range of coordinate x to the total range of coordinate y. This would imply that a ratio of 1 always means the x-axis will be as long as the y-axis in the plot.
Yet with x-coordinates in the 1000s and y-coordinates in, e.g., percent, coord_fixed does not behave as I expect.
Two questions:
Can you explain why coord_fixed takes the actual scale of the data into account but not the coordinate length as a whole?
Can I change coord_fixed programmatically to always refer to the whole range of the x- and y-coordinate values?
Here's an illustration:
library("ggplot2")
set.seed(123)
df = data.frame(x=runif(11)*1000,y=seq(0,.5,.05))
ggplot(df, aes(x,y)) +geom_point() +coord_fixed(1)
produces a very wide, flat panel, because one unit on the x-axis is drawn exactly as long as one unit on the y-axis.
Setting the ratio in coord_fixed to the ratio of the x- and y-values solves the issue:
ggplot(df, aes(x, y)) + geom_point() + coord_fixed(1 * max(df$x) / max(df$y))
However, this is not programmatic: I have to specify df$x and df$y manually to get the desired effect. Hence question 2: is there a sensible way to automate the rescaling within coord_fixed, depending on which data is on the x- and y-axes of my plot?
Can you explain why coord_fixed takes the actual scale of the data into account but not the coordinate length as a whole?
That's the point of coord_fixed. It's especially useful when, e.g., x and y are measures of length in the same units. (Basically whenever x and y have the same units, coord_fixed with ratio = 1 is what you want.)
For example, if my data is a square and a triangle, coord_fixed is the only way to make the square actually square
shapes <- data.frame(x = c(1, 1, 2, 2, 1, 3, 3, 4, 3),
y = c(1, 2, 2, 1, 1, 1, 2, 1, 1),
name = c(rep("square", 5), rep("isosceles triangle", 4)))
shape.plot <- ggplot(shapes, aes(x = x, y = y, group = name, color = name)) +
geom_path()
shape.plot # distorted
shape.plot + coord_fixed() # square!
Can I change coord_fixed programmatically to always refer to the whole range of the x- and y-coordinate values?
I would recommend not overwriting it. You could try to create your own version, much as in your answer (though if you want to pull the appropriate values out of the x and y specifications of aes(), you'll have a challenge, and you'll learn more about ggplot's internal workings than I know). However, the default behavior (without specifying any coord) seems to be what you're looking for.
If you compare
# your code
ggplot(df, aes(x,y)) + geom_point() + coord_fixed(1 * max(df$x) / max(df$y))
# no coord at all
ggplot(df, aes(x,y)) + geom_point()
They're basically the same. So the modification of coord_fixed you seem to be looking for is: don't use coord_fixed.
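If you do want a fixed ratio derived from the data, here is a minimal sketch. coord_fixed_by_range() is a hypothetical helper, not part of ggplot2, and it still takes the vectors explicitly rather than reading them from aes(). coord_fixed()'s ratio is y/x: one unit on y is drawn ratio times as long as one unit on x, so the panel's height/width is ratio × (y range) / (x range). Setting ratio = aspect × (x range) / (y range) therefore gives a panel whose height/width equals aspect, the same shape theme(aspect.ratio = aspect) produces. The default 5% expansion of both continuous axes does not change that proportion, and using the ranges rather than max() also works when the data do not start at 0.

```r
library(ggplot2)

# Hypothetical helper: coord_fixed() with a ratio computed from the data ranges,
# so the panel's height/width equals `aspect` (1 = square).
coord_fixed_by_range <- function(x, y, aspect = 1) {
  coord_fixed(ratio = aspect * diff(range(x)) / diff(range(y)))
}

set.seed(123)
df <- data.frame(x = runif(11) * 1000, y = seq(0, .5, .05))
coord <- coord_fixed_by_range(df$x, df$y)
ggplot(df, aes(x, y)) + geom_point() + coord
```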
Aspect ratio of plot area (independent of coordinates): don't use coord_fixed
Just found out about this from this semi-related post: if you want a specific aspect ratio of the plot area, you can get it with theme(),
e.g.
p1 <- ggplot(df, aes(x,y)) + geom_point()
p1 + theme(aspect.ratio = 1)
p1 + theme(aspect.ratio = (1 + sqrt(5))/ 2) # golden ratio portrait
p1 + theme(aspect.ratio = 2 / (1 + sqrt(5))) # golden ratio landscape
This is, of course, data-agnostic. I think the take-home message is that if you want the scales of your data taken into account, relative to each other, use coord_fixed. If you want to change the aspect ratio of the plotting area but still fit the data, use theme(aspect.ratio). If you want to change the aspect ratio of a saved file, use the height and width arguments of your saving function.

How to make variable bar widths in ggplot2 not overlap or gap

geom_bar seems to work best with fixed-width bars; according to the documentation, even the spaces between bars are determined by width. With variable widths, however, it does not behave as I would expect, leading to overlaps or gaps between the bars (as shown here).
To see what I mean, please try this very simple reproducible example:
library(ggplot2)
x <- c("a", "b", "c")
w <- c(1.2, 1.3, 4) # variable widths
y <- c(9, 10, 6) # variable heights
ggplot() +
  geom_bar(aes(x = x, y = y, width = w, fill = x),
           stat = "identity", position = "stack")
What I really want is for the different bars to be just touching, but not overlapping, like in a histogram.
I've tried position = "stack", "dodge", and "fill", but none of them work. Does the solution lie in geom_histogram, or am I just not using geom_bar correctly?
P.S. To see the issue with gaps, try replacing 4 with 0.5 in the above code and look at the outcome.
There doesn't seem to be a straightforward solution, so we treat the x-axis as continuous in terms of w and compute the positions of the ticks and bar centers manually:
# pos is an explicit formula for bar centers that we are interested in:
# last + half(previous_width) + half(current_width)
pos <- 0.5 * (cumsum(w) + cumsum(c(0, w[-length(w)])))
ggplot() +
geom_bar(aes(x = pos, width = w, y = y, fill = x), stat = "identity") +
scale_x_continuous(labels = x, breaks = pos)
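To check that the bars really touch, you can inspect the edges of the drawn rectangles with layer_data(). This sketch reuses the question's x, w and y and uses geom_col(), which is equivalent to geom_bar(stat = "identity"). The centre formula is written in an equivalent form: the total width of all previous bars plus half of the current width.

```r
library(ggplot2)

x <- c("a", "b", "c")
w <- c(1.2, 1.3, 4)  # variable widths
y <- c(9, 10, 6)     # variable heights

# Bar centre = total width of all previous bars + half of the current width
pos <- cumsum(c(0, head(w, -1))) + w / 2

df <- data.frame(x, w, y, pos)
p <- ggplot(df, aes(x = pos, y = y, width = w, fill = x)) +
  geom_col() +
  scale_x_continuous(breaks = pos, labels = x)

# Edges of the drawn rectangles, sorted left to right
bars <- layer_data(p)
bars <- bars[order(bars$xmin), c("xmin", "xmax")]
bars
```

Each bar's xmax should coincide with the next bar's xmin, which is exactly the "touching but not overlapping" layout asked for.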
You can now do this with the mekko package: https://cran.r-project.org/web/packages/mekko/vignettes/mekko-vignette.html
