How to set automatic label position based on box height - r

In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided this following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?

A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make the any manual adjustments a bit easier.
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")

Related

Reduce distance in plot X labels (R: ggplot2)

This is my dataframe:
df = data.frame(info=1:30, type=c(replicate(5,'A'), replicate(5,'B')), group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3')))
I want to make a jitter plot of my data distinguished by group (X-label) and type (colour):
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2), cex=2)+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue"))
How can I reduce the distance between the X-labels (D1, D2, D3) in the representation?
P.D. I want to do it even if I left a blank space in the graphic
Here are a few options.
# Setting up the plot
library(ggplot2)
df <- data.frame(
info=1:30,
type=c(replicate(5,'A'), replicate(5,'B')),
group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type))
Option 1: increase the dodge distance. This won't put the labels closer, but it makes better use of the space available so that the labels appear less isolated.
p +
geom_point(position = position_dodge(width = 0.9))
Option 2: Expand the x-axis. Increasing the expansion factor from the default 0.5 to >0.5 increases the space at the ends of the axis, putting the labels closer.
p +
geom_point(position = position_dodge(0.2)) +
scale_x_discrete(expand = c(2, 0))
Option 3: change the aspect ratio. Depending on the plotting window size, this also visually puts the x-axis labels closer together.
p +
geom_point(position = position_dodge(0.2)) +
theme(aspect.ratio = 2)
Created on 2021-06-25 by the reprex package (v1.0.0)
Try adding coord_fixed(ratio = 0.2) and play around with the ratio.
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2))+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue")) + coord_fixed(ratio = 0.2)
The simplest solution is to resize the plot. For example if you follow your command with ggsave("my_plot.pdf", width = 3, height = 4.5) it looks like this:
Or in an Rmd file you can control the dimensions by various means: see this link.

How to properly form ggplot graphs, without cutting off important parts of the graph?

I have created a barchart using ggplot() + geom_bar() functions, by ggplot2 package. I have also used coord_flip() to reverse the orientation of the bars and geom_text() to add the values at the top of each bar. Some of the bars have different colors, so there is a legend following the graph. What I am getting as result is a picture half occupied by the graph, half by the legend and with the values on top of the longest bars being cut off because of the small size of the graph.
Any ideas on how to enlarge the size of the graph and reduce the size of the legend, in order the values of the bars not to be cut off?
Thank you
This is my code on imaginary data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
df <- as.data.frame(cbind(labels,freq))
type <- c("rich","poor","poor","poor","rich")
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = sort(freq, decreasing = FALSE), size = 3.5, hjust = -0.2)
And this is the graph it gives as result:
There are a few fixes to this:
Change your Limits
As indicated by #Dave2e - see his response
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device will change the result and look of a plot. When I ran your code... no clipping was observed. You can test this out creating the plot and then saving differently. If I take your default code, here's what I get with different arguments to width= and height= for ggsave() as a png:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
Set an Expansion
The third way is to set an expansion to the scale limits. By default, ggplot2 actually adds some "padding" to the ends of a scale. So, if you set your limits from 0 to 10, you'll actually have a plot area that goes a bit beyond this (about 5% beyond by default). You can redefine that setting by using the expand= argument of scale_... commands in ggplot. So you can set this limit, for example in the following code:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can define the lower and upper expansion for an axis, so in the above code I've defined to set no expansion to the lower limit of the y scale and to use a multiplier of 0.15 (about 15%) to the upper limit. Default is 0.05, I believe (or 5%).
You can override the default limits on the y axis scale with with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to code the manual limits but you may want to automate the limit calculation with something like ylimitmax= max(freq) * 1.2.

Spacing between groups of bars in histogram

When I produce histograms in ggplot2 where the bar positions are dodge, I expect something like this where there is space between the groups of bars (i.e. notice the white space between each groups of red/green pairs):
I'm having a hard time producing the same effect when I build a histogram with continuous data. I can't seem to add space between the groups of bars, and instead, everything gets squashed together. As you can see, it makes it visually difficult to compare the red/green pairs:
To reproduce my problem, I created a sample data set here: https://www.dropbox.com/s/i9nxzo1cmbwwfsa/data.csv?dl=0
Code to reproduce:
data <- read.csv("https://www.dropbox.com/s/i9nxzo1cmbwwfsa/data.csv?dl=1")
ggplot(data, aes(x = soldPrice, fill = month)) +
geom_histogram(binwidth=1e5, position=position_dodge()) +
labs(x="Sold Price", y="Sales", fill="") +
scale_x_continuous(labels=scales::comma, breaks=seq(0, 2e6, by = 1e5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
How can I add white space between the groups of red/green pairs?
Alternative 1: overlapping bars with geom_histogram()
From ?position_dodge():
Dodging preserves the vertical position of an geom while adjusting the horizontal position
This function accepts a width argument that determines the space to be created.
To get what I think you want, you need to supply a suitable value to position_dodge(). In your case, where binwidth=1e5, you might play with e.g. 20% of that value: position=position_dodge(1e5-20*(1e3)).
(I left the rest of your code untouched.)
You could use the following code:
ggplot(data, aes(x = soldPrice, fill = month)) +
geom_histogram(binwidth=1e5, position=position_dodge(1e5-20*(1e3))) + ### <-----
labs(x="Sold Price", y="Sales", fill="") +
scale_x_continuous(labels=scales::comma, breaks=seq(0, 2e6, by = 1e5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
yielding this plot:
Alternative 2: use ggplot-object and render with geom_bar
geom_histogram() was not designed to produce what you want. geom_bar() on the other hand provides the flexibility you need.
You can generate the histogram with geom_histogram and save it in an ggplot-object. Then, you generate the plotting information with ggplot_build(). Now,
you may use the histogram plotting information in the object to generate a bar plot with geom_bar()
## save ggplot object to h
h <- ggplot(data, aes(x = soldPrice, fill = month)) +
geom_histogram(binwidth=1e5, position=position_dodge(1e5-20*(1e3)))
## get plotting information as data.frame
h_plotdata <- ggplot_build(h)$data[[1]]
h_plotdata$group <- as.factor(h_plotdata$group)
levels(h_plotdata$group) <- c("May 2018", "May 2019")
## plot with geom_bar
ggplot(h_plotdata, aes(x=x, y=y, fill = group)) +
geom_bar(stat = "identity") +
labs(x="Sold Price", y="Sales", fill="") +
scale_x_continuous(labels=scales::comma, breaks=seq(0, 2e6, by = 1e5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
yielding this graph:
Please, let me know whether this is what you want.

Scale geom_density to match geom_bar with percentage on y

Since I was confused about the math last time I tried asking this, here's another try. I want to combine a histogram with a smoothed distribution fit. And I want the y axis to be in percent.
I can't find a good way to get this result. Last time, I managed to find a way to scale the geom_bar to the same scale as geom_density, but that's the opposite of what I wanted.
My current code produces this output:
ggplot2::ggplot(iris, aes(Sepal.Length)) +
geom_bar(stat="bin", aes(y=..density..)) +
geom_density()
The density and bar y values match up, but the scaling is nonsensical. I want percentage on the y axes, not well, the density.
Some new attempts. We begin with a bar plot modified to show percentages instead of counts:
gg = ggplot2::ggplot(iris, aes(Sepal.Length)) +
geom_bar(aes(y = ..count../sum(..count..))) +
scale_y_continuous(name = "%", labels=scales::percent)
Then we try to add a geom_density to that and somehow get it to scale properly:
gg + geom_density()
gg + geom_density(aes(y=..count..))
gg + geom_density(aes(y=..scaled..))
gg + geom_density(aes(y=..density..))
Same as the first.
gg + geom_density(aes(y = ..count../sum(..count..)))
gg + geom_density(aes(y = ..count../n))
Seems to be off by about factor 10...
gg + geom_density(aes(y = ..count../n/10))
same as:
gg + geom_density(aes(y = ..density../10))
But ad hoc inserting numbers seems like a bad idea.
One useful trick is to inspect the calculated values of the plot. These are not normally saved in the object if one saves it. However, one can use:
gg_data = ggplot_build(gg + geom_density())
gg_data$data[[2]] %>% View
Since we know the density fit around x=6 should be about .04 (4%), we can look around for ggplot2-calculated values that get us there, and the only thing I see is density/10.
How do I get geom_density fit to scale to the same y axis as the modified geom_bar?
Bonus question: why are the grouping of the bars different? The current function does not have spaces in between bars.
Here is an easy solution:
library(scales) # ! important
library(ggplot2)
ggplot(iris, aes(Sepal.Length)) +
stat_bin(aes(y=..density..), breaks = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), by = .1), color="white") +
geom_line(stat="density", size = 1) +
scale_y_continuous(labels = percent, name = "percent") +
theme_classic()
Output:
Try this
ggplot2::ggplot(iris, aes(x=Sepal.Length)) +
geom_histogram(stat="bin", binwidth = .1, aes(y=..density..)) +
geom_density()+
scale_y_continuous(breaks = c(0, .1, .2,.3,.4,.5,.6),
labels =c ("0", "1%", "2%", "3%", "4%", "5%", "6%") ) +
ylab("Percent of Irises") +
xlab("Sepal Length in Bins of .1 cm")
I think your first example is what you want, you just want to change the labels to make it seem like it is percents, so just do that rather than mess around.

How to increase the space between the bars in a bar plot in ggplot2?

How can I increase the space between the bars in a bar plot in ggplot2 ?
You can always play with the width parameter, as shown below:
df <- data.frame(x=factor(LETTERS[1:4]), y=sample(1:100, 4))
library(ggplot2)
ggplot(data=df, aes(x=x, y=y, width=.5)) +
geom_bar(stat="identity", position="identity") +
opts(title="width = .5") + labs(x="", y="") +
theme_bw()
Compare with the following other settings for width:
So far, so good. Now, suppose we have two factors. In case you would like to play with evenly spaced juxtaposed bars (like when using space together with beside=TRUE in barplot()), it's not so easy using geom_bar(position="dodge"): you can change bar width, but not add space in between adjacent bars (and I didn't find a convenient solution on Google). I ended up with something like that:
df <- data.frame(g=gl(2, 1, labels=letters[1:2]), y=sample(1:100, 4))
x.seq <- c(1,2,4,5)
ggplot(data=transform(df, x=x.seq), aes(x=x, y=y, width=.85)) +
geom_bar(stat="identity", aes(fill=g)) + labs(x="", y="") +
scale_x_discrete(breaks = NA) +
geom_text(aes(x=c(sum(x.seq[1:2])/2, sum(x.seq[3:4])/2), y=0,
label=c("X","Y")), vjust=1.2, size=8)
The vector used for the $x$-axis is "injected" in the data.frame, so that so you change the outer spacing if you want, while width allows to control for inner spacing. Labels for the $x$-axis might be enhanced by using scale_x_discrete().
For space between factor bars use
ggplot(data = d, aes(x=X, y=Y, fill=F))
+ geom_bar(width = 0.8, position = position_dodge(width = 0.9))
The width in geom_bar controls the bar width in relation to the x-axis while the width in position_dodge control the width of the space given to both bars also in relation to the x-axis. Play around with the width to find one that you like.
Thank you very much chl.! I had the same problem and you helped me solving it. Instead of using geom_text to add the X-labels I used scale_x_continuous (see below)
geom_text(aes(x=c(sum(x.seq[1:2])/2, sum(x.seq[3:4])/2), y=0,
label=c("X","Y")), vjust=1.2, size=8)
replaced by
scale_x_continuous(breaks=c(mean(x.seq[1:2]), mean(x.seq[3:4])), labels=c("X", "Y"))
For space between POSIXlt bars you need adjust the width from the number of seconds in a day
# POSIXlt example: full & half width
d <- data.frame(dates = strptime(paste(2016, "01", 1:10, sep = "-"), "%Y-%m-%d"),
values = 1:10)
ggplot(d, aes(dates, values)) +
geom_bar(stat = "identity", width = 60*60*24)
ggplot(d, aes(dates, values)) +
geom_bar(stat = "identity", width = 60*60*24*0.5)

Resources