Secondary / Dual axis - ggplot - r

I am opening this question for three reasons : First, to re-open the dual-axis discussion with ggplot. Second, to ask if there is a non-torturing generic approach to do that. And finally to ask for your help with respect to a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids : for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
The plot is wide, so duplicating the y-axis on the right side (or the x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to hold a ruler to the screen because the axis is too far away.)
I need to add a new axis that is a transformation of the original axis (e.g. percentages, quantiles). (I am currently facing this problem; reproducible example below.)
And finally, adding grouping/meta information: I run into this with multi-level categorical data (e.g. Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics). Even though color-coding the meta-levels and adding a legend, or even faceting, solves the issue, things get a little simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features in ggplot2 2.0.0, is there a more robust, no-torture way to have a dual axis without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot, and hjust didn't seem to work for me either.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = T)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data frame preparation :
labels = paste0(sample(letters[1:3], replace = T, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = T))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
  geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal)) +
  geom_bar(stat = "identity", fill = "#00bbd4") +
  geom_hline(yintercept = c(0, maxVal)) + # Min and max values
  geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
  annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
           label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(panel.background = element_blank()) +
  theme(plot.background = element_blank()) +
  theme(plot.margin = unit(rep(2,4), units = "lines"))

In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
The cowplot package can do this:
library(cowplot)
# Assign your original plot to a variable first, e.g. gpv <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis = "y", keep = "y"))
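Since this question was asked, ggplot2 (from version 2.2.0) has gained built-in secondary axes through the sec.axis argument of the continuous scales, which covers the first two cases without any grid work. The sketch below is an illustration rather than part of the original answer; the data and the names max_val, df_demo, base_plot, p_dup and p_pct are made up for the example.

```r
library(ggplot2)

# Hypothetical data standing in for the question's example
set.seed(1)
max_val <- 500
df_demo <- data.frame(
  sample = factor(sprintf("s%02d", 1:20)),
  value  = sort(sample(1:max_val, size = 20), decreasing = TRUE)
)

base_plot <- ggplot(df_demo, aes(x = sample, y = value)) +
  geom_col(fill = "#00bbd4") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Case 1: mirror the y-axis on the right-hand side
p_dup <- base_plot +
  scale_y_continuous(limits = c(0, max_val), sec.axis = dup_axis())

# Case 2: a right-hand axis that is a transformation of the primary one,
# here the value expressed as a percentage of max_val
p_pct <- base_plot +
  scale_y_continuous(
    limits = c(0, max_val),
    sec.axis = sec_axis(
      ~ . / max_val * 100,
      name = "Percent of maximum",
      breaks = c(0, 25, 50, 75, 100),
      labels = function(x) paste0(x, "%")
    )
  )
```

Printing p_dup or p_pct draws the plot. Only one-to-one transformations of the primary axis are supported this way, which is consistent with the objection quoted in the question.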

Related

ggplot2 geom_linerange remove whitespace between rows

Am attempting to create a plot similar to a strip chart recorder showing outage data. Outage Severity is Major and Minor. Plot has a large amount of vertical white space between the two rows and before and after that I would like to remove to create a compact two-row chart.
dataframe is:
> head(dfsub)
StartDateTime EndDateTime Outage.DUR Outage.Severity
1 2021-07-01T00:23:33.0000000 2021-07-01T00:25:26.0000000 1.8833333 Minor
2 2021-07-01T00:25:26.0000000 2021-07-01T00:31:33.0000000 6.1166667 Major
3 2021-07-01T00:31:33.0000000 2021-07-01T00:40:34.0000000 9.0166667 Major
4 2021-07-01T00:40:34.0000000 2021-07-01T00:42:57.0000000 2.3833333 Minor
5 2021-07-01T00:42:57.0000000 2021-07-01T00:43:49.0000000 0.8666667 Minor
6 2021-07-01T00:43:49.0000000 2021-07-01T00:45:35.0000000 1.7666667 Minor
R Code I am running
ggplot(dfsub) +
geom_linerange(aes(y = Outage.Severity,
xmin = StartDateTime,
xmax = EndDateTime,
colour = as.factor(Outage.Severity)
),
show.legend = FALSE,
size = 50) +
scale_color_manual(values = c("red", "yellow")) +
theme(legend.position = "none") +
theme_test()
generates this plot
Two suggestions.
You didn't ask about this, but your x-axis is broken, using time (which is a continuous thing) in a categorical sense. Note that R and ggplot2 are treating the current columns as strings not timestamps. This is easily resolved:
dfsub[c("StartDateTime", "EndDateTime")] <-
lapply(dfsub[c("StartDateTime", "EndDateTime")], as.POSIXct, format="%Y-%m-%dT%H:%M:%OS", tz="UTC")
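A quick way to check that the format string parses these timestamps is to convert one row by hand and compare the result with Outage.DUR. This is only a sanity-check sketch; the variable names are made up, and the two strings are copied from the first row of the printed data.

```r
# First row of the question's data, typed in by hand
start_raw <- "2021-07-01T00:23:33.0000000"
end_raw   <- "2021-07-01T00:25:26.0000000"

fmt <- "%Y-%m-%dT%H:%M:%OS"
start_time <- as.POSIXct(start_raw, format = fmt, tz = "UTC")
end_time   <- as.POSIXct(end_raw,   format = fmt, tz = "UTC")

# Duration in minutes; should agree with Outage.DUR for that row (about 1.88)
duration_min <- as.numeric(difftime(end_time, start_time, units = "mins"))
duration_min
```

If the conversion had failed, start_time would be NA, which is worth checking before plotting.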
I don't think you're going to get fine control over the blank space between the reds and yellows using geom_linerange; I suggest geom_rect instead. With that, remove size= and control ymin= and ymax= directly. This benefits from setting Outage.Severity to a factor; while not strictly necessary, this kind of work commonly comes back with "how do I change the order of the y-axis categories?", for which the only (sane) response is to convert them to factors and control their levels=. We also need to add fill=, which geom_linerange did not need.
dfsub$Outage.Severity <- factor(dfsub$Outage.Severity) # add 'levels=' if you want to control the order
From here, knowing that categorical data are plotted on integers, we'll fill the gap between them by extending their rectangles +/- 0.48 (arbitrary, but should likely be close to but not at/beyond 0.5).
ggplot(dfsub) +
  geom_rect(aes(ymin = as.numeric(Outage.Severity) - 0.48,
                ymax = as.numeric(Outage.Severity) + 0.48,
                xmin = StartDateTime,
                xmax = EndDateTime,
                colour = Outage.Severity,
                fill = Outage.Severity),
            show.legend = FALSE) +
  # label the integer positions with the factor levels
  scale_y_continuous(breaks = seq_along(levels(dfsub$Outage.Severity)),
                     labels = levels(dfsub$Outage.Severity)) +
  scale_color_manual(values = c("Major" = "red", "Minor" = "yellow")) +
  scale_fill_manual(values = c("Major" = "red", "Minor" = "yellow")) +
  # apply the complete theme first so it does not reset the legend setting
  theme_test() +
  theme(legend.position = "none")
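The +/- 0.48 offsets rely on factor levels sitting at integer positions. A small sketch (with a made-up severity vector and a hypothetical bands table) shows the positions and the gap that remains between the two rows:

```r
severity <- factor(c("Minor", "Major", "Major", "Minor"))
levels(severity)               # levels are sorted alphabetically by default
codes <- as.numeric(severity)  # integer position of each row on the y-axis

# Vertical extent of each row when rectangles reach +/- 0.48 around the integer
bands <- data.frame(
  severity = levels(severity),
  ymin = seq_along(levels(severity)) - 0.48,
  ymax = seq_along(levels(severity)) + 0.48
)
bands

# Space left between neighbouring bands
gap <- bands$ymin[2] - bands$ymax[1]
gap
```

Moving the offset closer to 0.5 shrinks the gap further; at exactly 0.5 the two rows touch.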

Y scale on ggplot2 bar graph makes no sense

My bar graph has a weird Y Axis that skips around seemingly at random, from -1.7% to -10.1%, -10.3%, and then to -2%. You can see it below:
Here is my code:
library(ggplot2)
healthd = read.csv("R/states.csv")
states = healthd[[1]]
insuredChange = healthd[[4]]
ggplot(data = healthd, aes(x = states, y = insuredChange)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_text(angle = 90, hjust = 1))
What's going on here? How do I fix it?
Also, how can I get the x axis labels to all be right justified on the same line?
First - what you present isn't a reproducible example and nobody wants to sign up to access your data to help you out...
In your code:
states = healthd[[1]]
and
insuredChange = healthd[[4]]
assign the columns to new variables in the global environment; they do not rename the columns in your data.frame. When you use ggplot, it looks for columns in your data.frame with names that don't exist, hence the NULL statement.
healthd$states = healthd[[1]]
healthd$insuredChange = healthd[[4]]
will change it to something that should work - though I don't have the data so am not completely sure.
This should now generate the figure you want.
ggplot(healthd, aes(states, insuredChange)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_text(angle = 90, hjust = 1))
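The labels quoted in the question (-1.7%, -10.1%, -10.3%, -2%) suggest another possible cause: if the column was read as text rather than numbers, ggplot2 would treat it as discrete and order the values as strings. This is an assumption, since the data are not available. A minimal sketch of the conversion, using made-up values and placeholder state names:

```r
library(ggplot2)

# Hypothetical stand-in for the question's data
healthd_demo <- data.frame(
  states = c("A", "B", "C", "D"),
  insuredChange = c("-1.7%", "-10.1%", "-10.3%", "-2%"),
  stringsAsFactors = FALSE
)

# Strip the percent sign and convert to numeric
healthd_demo$insuredChange <- as.numeric(
  sub("%", "", healthd_demo$insuredChange, fixed = TRUE)
)

# vjust = 0.5 centres each rotated label on its tick; hjust = 1 aligns the ends
p_states <- ggplot(healthd_demo, aes(x = states, y = insuredChange)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
```

Running str() on the data frame after read.csv() would show whether the column is character (or factor) rather than numeric.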

geom_dotplot Gaps Between scale_x_discrete

I am trying to make a histodot using geom_dotplot. For some reason, ggplot is giving me what appear to me are arbitrary breaks in my data along the x axis. My data has been binned seq(0,22000000,500000), so I would like to see gaps in my data where they actually exist. However, I'm struggling to successfully list those breaks(). I wouldn't expect to see my first break until after $6,000,000 with a break until $10,000,000. Bonus points for teaching me why I can't use labels=scales::dollar on my scale_x_discrete.
Here is my data.
library(ggplot2)
data <- read.csv("data2.csv")
ggplot(data,aes(x=factor(constructioncost),fill=designstage)) +
geom_dotplot(binwidth=.8,method="histodot",dotsize=0.75,stackgroups=T) +
scale_x_discrete(breaks=seq(0,22000000,500000)) +
ylim(0,30)
Thanks for any and all help and please, let me know if you have any questions!
Treating the x axis as continuous instead of as a factor will give you what you need. However, as you found, the enormous range of your cost variable (0 to 21 million) makes ggplot2 choke when you try to treat it as continuous.
Because your values (other than 0) are all at least 500000, dividing the cost by 100000 will put things on a scale that ggplot2 can handle but also give you the variable spacing you want.
Note I had to play around with binwidth when I changed the scale of the data.
ggplot(data, aes(x = constructioncost/100000, fill = designstage)) +
  geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
  scale_x_continuous(breaks=seq(0, 220, 5)) +
  ylim(0,30)
Then you can change the labels to reflect the whole dollar amounts if you'd like. The numbers are so big that you'll likely need to either show fewer labels or change their orientation (or both).
ggplot(data, aes(x = constructioncost/100000, fill = designstage)) +
  geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
  scale_x_continuous(breaks = seq(0, 220, 10),
                     labels = scales::dollar(seq(0, 22000000, 1000000))) +
  ylim(0,30) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
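Instead of keeping two sequences in sync, the labels argument can also take a function that converts each break back to dollars. The sketch below uses made-up costs and a hypothetical helper, dollar_from_scaled; it assumes the scales package is installed. On the bonus question, as far as I can tell scales::dollar expects numbers, whereas a discrete scale passes its labelling function the factor levels as character strings.

```r
library(ggplot2)

# Hypothetical costs, already binned to multiples of 500,000
costs_demo <- data.frame(
  constructioncost = c(0, 5e5, 5e5, 1e6, 2.5e6, 6e6, 1e7, 1e7, 2.1e7)
)

# Turn a break on the rescaled axis back into a dollar label
dollar_from_scaled <- function(x) scales::dollar(x * 100000)

p_cost <- ggplot(costs_demo, aes(x = constructioncost / 100000)) +
  geom_dotplot(binwidth = 3, method = "histodot") +
  scale_x_continuous(breaks = seq(0, 220, 20), labels = dollar_from_scaled) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Changing the breaks then updates the labels automatically.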

adjust a legend position in a barplot

I need to adjust the legend for the following barplot in a proper position somewhere outside the plot
COLORS=rainbow(18)
barplot(sort(task3_result$respondents_share,decreasing = TRUE), main="Share of respondents that mentioned brand among top 3 choices ", names.arg=task3_result$brand, col = COLORS)
legend("right", tolower(as.character(task3_result$brand)), yjust=1,col = COLORS, lty=c(1,1) )
Thanks guys, I couldn't solve the problem, but I reached my goal using ggplot:
windows(width = 500, height = 700)
ggplot(data = task3_result, aes(x = factor(brand), y = respondents_share, fill = brand)) +
  geom_bar(colour = 'black', stat = 'identity') +
  scale_fill_discrete(name = 'brands') +
  coord_flip() +
  ggtitle('Share of respondents that mentioned brand among top 3 choices') +
  xlab("Brands") +
  ylab("Share of respondents")
As DatamineR pointed out, your code is not reproducible as-is (we don't have task3_result), but you can probably accomplish what you're talking about by playing with the x and y arguments to legend() - you can just set the x coordinate to something beyond the edges of the bars, for example. See the documentation: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/legend.html. Also note there the cex argument, because that legend might be bulkier than you want.
Note that you will have to specify a larger plot window in order to leave space for the legend; the relevant help file for that is plot.window: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.window.html
Though you won't want to call plot.window directly - better to pass the relevant arguments to it through the barplot() function. If that doesn't make sense, I recommend you read up on R's base plotting package more generally.
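A minimal base-graphics sketch of that idea, assuming a made-up data frame (task3_demo) in place of task3_result: widen the right margin with par(mar = ...), then give legend() an x coordinate past the last bar and let it draw outside the plot region with xpd = TRUE. Note that the question's call sorts the shares but passes the brand names unsorted, so the sketch orders both together.

```r
# Hypothetical stand-in for task3_result
task3_demo <- data.frame(
  brand = c("alpha", "beta", "gamma", "delta"),
  respondents_share = c(0.12, 0.40, 0.25, 0.08)
)

# Order the rows once so bars, names and legend entries stay matched
ord <- order(task3_demo$respondents_share, decreasing = TRUE)
task3_demo <- task3_demo[ord, ]
bar_cols <- rainbow(nrow(task3_demo))

old_par <- par(mar = c(5, 4, 4, 8))  # widen the right margin for the legend
bar_mid <- barplot(task3_demo$respondents_share,
                   names.arg = task3_demo$brand,
                   col = bar_cols,
                   main = "Share of respondents (demo data)")

# x just past the last bar; xpd = TRUE allows drawing outside the plot region
legend(x = max(bar_mid) + 1, y = max(task3_demo$respondents_share),
       legend = task3_demo$brand, fill = bar_cols, cex = 0.8, xpd = TRUE)
par(old_par)  # restore the previous margins
```

The margin width and the offset of 1 are arbitrary and will need adjusting for 18 brands; cex controls how bulky the legend is.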

How to adjust figure settings in plotmatrix?

Can I adjust the point size, alpha, font, and axis ticks in a plotmatrix?
Here is an example:
library(ggplot2)
plotmatrix(iris)
How can I:
make the points twice as big
set alpha = 0.5
have no more than 5 ticks on each axis
set font to 1/2 size?
I have fiddled with the mapping = aes() argument to plotmatrix as well as opts() and adding layers such as + geom_point(alpha = 0.5, size = 14), but none of these seem to do anything. I have hacked a bit of a fix to the size by writing to a large pdf (pdf(file = "foo.pdf", height = 10, width = 10)), but this provides only a limited amount of control.
Pretty much all of the ggplot2 scatterplot matrix options are still fairly new and can be a bit experimental.
The facilities in GGally do allow you to construct this kind of plot manually, though:
library(ggplot2)
library(GGally)
# start from an empty matrix, then place custom panels by (row, column)
custom_iris <- ggpairs(iris, upper = "blank", lower = "blank",
                       title = "Custom Example")
p1 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 1, alpha = 0.3)
p2 <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point()
custom_iris <- putPlot(custom_iris, p1, 2, 1)
custom_iris <- putPlot(custom_iris, p2, 3, 2)
custom_iris
I did that simply by directly following the last example in ?ggpairs.
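In more recent GGally versions, a less manual route is to pass panel settings through wrap(). This is a different technique from the putPlot approach above, and the sketch assumes a GGally version that provides wrap(). It covers point size, alpha and font size; it does not address the limit on axis ticks.

```r
library(ggplot2)
library(GGally)

# Scatterplot panels with semi-transparent, larger points
points_custom <- wrap("points", alpha = 0.5, size = 2)

pm <- ggpairs(
  iris,
  columns = 1:4,                           # numeric columns only
  lower = list(continuous = points_custom),
  upper = list(continuous = points_custom)
) +
  theme(text = element_text(size = 6))     # smaller fonts throughout
```

Printing pm draws the matrix; the size and alpha values here are placeholders to adjust.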

Resources