adjust a legend position in a barplot - r

I need to move the legend for the following barplot to a suitable position somewhere outside the plot area:
COLORS=rainbow(18)
barplot(sort(task3_result$respondents_share,decreasing = TRUE), main="Share of respondents that mentioned brand among top 3 choices ", names.arg=task3_result$brand, col = COLORS)
legend("right", tolower(as.character(task3_result$brand)), yjust=1,col = COLORS, lty=c(1,1) )

Thanks, everyone. I couldn't solve the problem in base graphics, but I reached my goal using ggplot:
windows(width = 500, height= 700)
ggplot(data = task3_result, aes(x = factor(brand), y = respondents_share, fill = brand)) +
geom_bar(colour = 'black', stat = 'identity') + scale_fill_discrete(name = 'brands') + coord_flip()+
ggtitle('Share of respondents that mentioned brand among top 3 choices') +xlab("Brands") + ylab("Share of respondents")

As DatamineR pointed out, your code is not reproducible as-is (we don't have task3_result), but you can probably accomplish what you're describing by adjusting the x and y arguments to legend(). For example, you can set the x coordinate to something beyond the right edge of the last bar. See the documentation: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/legend.html. Also note the cex argument described there, because the legend might be bulkier than you want.
Note that you will have to specify a larger plot window in order to leave space for the legend; the relevant help file for that is plot.window: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.window.html
Though you won't want to call plot.window directly - better to pass the relevant arguments to it through the barplot() function. If that doesn't make sense, I recommend you read up on R's base plotting package more generally.
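A minimal sketch of that approach, assuming a made-up task3_result (the question does not include the real data, so the brand names and shares below are hypothetical). It widens xlim through barplot() and then places the legend at plot coordinates to the right of the last bar:

```r
# Hypothetical stand-in for task3_result, which the question does not include
task3_result <- data.frame(
  brand = paste("Brand", LETTERS[1:18]),
  respondents_share = round(seq(0.60, 0.05, length.out = 18), 2)
)

# Sort values and labels together so bars, colours and legend stay in sync
ord    <- order(task3_result$respondents_share, decreasing = TRUE)
shares <- task3_result$respondents_share[ord]
brands <- tolower(as.character(task3_result$brand[ord]))
cols   <- rainbow(length(shares))

# barplot() hands xlim on to plot.window(); the extra range leaves
# empty space to the right of the last bar
bar_mids <- barplot(shares, col = cols,
                    xlim = c(0, length(shares) * 1.2 + 10),
                    main = "Share of respondents that mentioned brand among top 3 choices")

# x and y are in plot coordinates; cex shrinks the legend text
legend(x = max(bar_mids) + 1, y = max(shares), legend = brands,
       fill = cols, cex = 0.6)
```

barplot() returns the bar midpoints, which is why max(bar_mids) is a convenient reference for where the bars end. The amount of extra x range (10 here) is a guess and will need tuning for the device size.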

Related

annotate edge of plot without changing plot limits or setting "expand" to 0

I have a ggplot object. I want to use annotate() to add a label to the top of the plot, so that the upper edge of the label is also the upper edge of the plot. When using default settings, this doesn't seem possible: adding an annotation at the upper edge of the plot causes the upper y-limit to increase.
One can get around this problem by specifying scale_y_continuous(expand = c(0, 0)) when creating the plot. But I don't want to do that, partly because I like the y limits created by the default expand setting. Given this constraint, is it possible to use annotate() to position a label at the top of the plot?
Here is a minimal example that demonstrates the problem:
library(ggplot2)
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
yMax <- layer_scales(p)$y$range$range[2] # upper y-limit
p + annotate("label", x = 30, y = yMax, vjust = "top", label = "X")
And here is the result:
You see that the annotation is not at the top of the plot. Instead, consistent with the default "expand" settings, the y-limit of the plot has changed.
Possible solutions:
Figure out the y-limits implied by the default expand setting. Then use scale_y_continuous() to both set the y limits and set expand = c(0, 0). This solution will give me the y limits that I want, and it will place the label appropriately. I know how to implement it, but it seems a bit cumbersome. It would also prevent other annotations at the top of the figure from changing the y-limit of the plot -- and I don't want the solution to affect annotations other than the one that I describe here.
Use annotation_custom(), which doesn't change plot limits in the same way. @baptiste suggests a solution like that in this answer to a different question. But annotation_custom() requires a grob. In practice, the annotations that I use may be more complicated than the label in this example, and I won't always know how to create them as a grob that can be passed to annotation_custom(). In addition, I've had some trouble positioning grobs with annotation_custom() while also specifying their exact sizes.
That said, I am quite open to annotation_custom()-based solutions. And perhaps there are solutions other than the two that I've sketched above.
I've read many SO posts on changing plot limits, but I haven't found any that speak to this problem.
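A simple solution is to set y = Inf instead of using the maximum value found on the y-axis (yMax). ggplot2 draws infinite positions at the edge of the panel and ignores them when computing the scale limits, so the default expansion is preserved. The code then becomes: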
# load library
library(ggplot2)
# load data
data(mtcars)
# define plot
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + annotate("label", x = 30, y = Inf, vjust = "top", label = "X")
Here is the output:
Let me know if this is what you're looking for.
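One way to confirm that the annotation leaves the limits alone is to compare the trained y range before and after adding it. This is only a sketch: it reuses the layer_scales() call from the question, and the variable names are my own.

```r
library(ggplot2)

p     <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p_inf <- p + annotate("label", x = 30, y = Inf, vjust = "top", label = "X")

# Trained y range of each plot (same accessor as in the question)
range_before <- layer_scales(p)$y$range$range
range_after  <- layer_scales(p_inf)$y$range$range

# TRUE when the annotation has not stretched the y scale
isTRUE(all.equal(range_before, range_after))
```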
Does this help?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
geom_text(label = "X", x = 30, y = max(mtcars$wt))

Default spacing of grouped boxplots in ggplot2: how to derive correct position_dodge width to line up geoms?

This may be a duplicate, but I have not found the exact solution I need here. I need the answer more for pedagogical purposes: I have made a plot that looks the way I want, but I want to explain to ggplot beginners exactly why it works. The question is, why does a position_dodge(width = 0.75) argument make the points from stat_summary line up with the grouped boxplot? I found this number by trial and error, but I cannot find the default spacing value that causes the 0.75 width to be "correct." Where is this value found?
reprex
set.seed(1)
g1mean <- rep(1:4, times=10)
g2mean <- rep(1:4, each=10)
y <- rnorm(n = length(g1mean), mean = g1mean+g2mean, sd = 2)
dat <- data.frame(g1=factor(g1mean), g2=factor(g2mean), y=y)
library(ggplot2)
ggplot(dat, aes(x=g1, fill=g2, y=y)) +
geom_boxplot() +
stat_summary(fun = mean, geom = 'point', color = 'blue', position = position_dodge(width = 0.75))
result
This looks fine but how can I programmatically determine the optimal width for position_dodge to make the geoms line up?
First of all, 0.75 is not a coincidence. When you do not set width=, stat_boxplot() gives the boxes at each x position a total width of 0.75 times the resolution of x. The groups here are one unit apart, so the dodged boxes together span 0.75, and a dodge width of 0.75 reproduces the same offsets for the points.
But the more useful thing to realize is that there is, in fact, a dodging position applied to the geom_boxplot call as well (position_dodge2() in current ggplot2 versions). When you supply a fill= aesthetic that splits each x position into groups, the boxplot geom dodges those groups by default. Do not expect this behavior for all geoms by default, but that's the case for boxplots.
The real answer here is that in order to make your points line up between the two, you should supply the same value for position= to both. You can even specify this outside the ggplot call:
pos <- position_dodge(width=0.9)
ggplot(dat, aes(x=g1, fill=g2, y=y)) +
geom_boxplot(position=pos) +
stat_summary(fun = mean, geom = 'point', color = 'blue', position = pos)
So it's more important to know how to control the dodge than to memorize the default. You will want better control especially if you start to define the width of your boxplots with width=. With position_dodge(), dodge width = geom width will give you dodging so that the boxes exactly touch each other.
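To check the alignment programmatically rather than by eye, one option is to compare the dodged x positions that ggplot2 computes for each layer. The sketch below rebuilds the question's data and assumes that both built layers expose a numeric x column, which is the case for the ggplot2 versions I have used:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(g1 = factor(rep(1:4, times = 10)),
                  g2 = factor(rep(1:4, each = 10)))
dat$y <- rnorm(nrow(dat), mean = as.integer(dat$g1) + as.integer(dat$g2), sd = 2)

# One shared position object for both layers
pos <- position_dodge(width = 0.9)
p <- ggplot(dat, aes(x = g1, fill = g2, y = y)) +
  geom_boxplot(position = pos) +
  stat_summary(fun = mean, geom = "point", colour = "blue", position = pos)

# ggplot_build() returns the computed data for each layer, after dodging
built   <- ggplot_build(p)
box_x   <- sort(as.numeric(built$data[[1]]$x))  # box centres
point_x <- sort(as.numeric(built$data[[2]]$x))  # summary points

# TRUE when every point sits at the centre of a box
isTRUE(all.equal(box_x, point_x))
```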

Multiple Splines using ggplot2 + Different colours + Line width + Custom X-axis markings

I have two small sets of points, viz. (1,a1),...,(9,a9) and (1,b1),...,(9,b9). I'm trying to interpolate these two sets of points separately using splines with the help of ggplot2. So what I want is two different spline curves interpolating the two sets of points on the same plot (refer to the end of this post).
Since I have very little plotting experience with ggplot2, I copied a code snippet from this answer by Richard Telford. First, I stored the Y-values for the two sets of points in two numeric variables A and B, and wrote the following code:
library(ggplot2)
library(plyr)
A <- c(a1,...,a9)
B <- c(b1,...,b9)
d <- data.frame(x=1:9,y=A)
d2 <- data.frame(x=1:9,y=B)
dd <- rbind(cbind(d, case = "d"), cbind(d2, case = "d2"))
ddsmooth <- plyr::ddply(dd, .(case), function(k) as.data.frame(spline(k)))
ggplot(dd,aes(x, y, group = case)) + geom_point() + geom_line(aes(x, y, group = case), data = ddsmooth)
This produces the following output :
Now, I'm looking for an almost identical plot with the following customizations:
The two spline curves should have different colours
The line width should be user's choice (Like we do in plot function)
A legend (Specifying the colour and the corresponding attribute)
Markings on the X-axis should be 1,2,3,...,9
Hoping for a detailed solution to my problem, though any kind of help is appreciated. Thanks in advance for your time and help.
You have already shaped your data correctly for the plot. It's just a case of associating the case variable with colour and size scales.
Note the following:
I have inferred the values of A and B from your plot
Since the lines are opaque, we plot them first so that the points are still visible
I have included size and colour parameters to the aes call in geom_line
I have selected the colours by passing them as a character vector to scale_colour_manual
I have also selected the sizes of the lines by calling scale_size_manual
I have set the x axis breaks by adding a call to scale_x_continuous
The legend has been added automatically according to the scales used.
ggplot(dd, aes(x, y)) +
geom_line(aes(colour = case, size = case, linetype = case), data = ddsmooth) +
geom_point(colour = "black") +
scale_colour_manual(values = c("red4", "forestgreen"), name = "Legend") +
scale_size_manual(values = c(0.8, 1.5), name = "Legend") +
scale_linetype_manual(values = 1:2, name = "Legend") +
scale_x_continuous(breaks = 1:9)
Created on 2020-07-15 by the reprex package (v0.3.0)
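For readers who want to run the whole thing end to end, here is a self-contained sketch. The values of A and B are hypothetical (the question elides the real ones), the splines are computed with base R instead of plyr, and linewidth is used for the line thickness, which assumes ggplot2 3.4.0 or later (the answer above uses size, as was usual in 2020).

```r
library(ggplot2)

# Hypothetical y-values; the real a1..a9 and b1..b9 are not given
A <- c(2, 4, 3, 6, 5, 7, 6, 8, 9)
B <- c(1, 2, 4, 3, 5, 4, 6, 7, 6)

dd <- rbind(data.frame(x = 1:9, y = A, case = "d"),
            data.frame(x = 1:9, y = B, case = "d2"))

# Interpolating spline per case, evaluated at 100 points each
ddsmooth <- do.call(rbind, lapply(split(dd, dd$case), function(k) {
  s <- spline(k$x, k$y, n = 100)
  data.frame(x = s$x, y = s$y, case = k$case[1])
}))

p <- ggplot(dd, aes(x, y)) +
  geom_line(aes(colour = case, linewidth = case), data = ddsmooth) +
  geom_point(colour = "black") +
  scale_colour_manual(values = c(d = "red4", d2 = "forestgreen"), name = "Legend") +
  scale_linewidth_manual(values = c(d = 0.8, d2 = 1.5), name = "Legend") +
  scale_x_continuous(breaks = 1:9)
print(p)
```

Naming the entries of values (d, d2) ties each colour and width to a specific case rather than relying on factor order.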

Secondary / Dual axis - ggplot

I am opening this question for three reasons : First, to re-open the dual-axis discussion with ggplot. Second, to ask if there is a non-torturing generic approach to do that. And finally to ask for your help with respect to a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids : for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
The plot is wide, so duplicating the y-axis on the right side (or the x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far away.)
I need to add a new axis that is a transformation to the original axes (eg: percentages, quantiles, .. ). (I am currently facing a problem with that. Reproducible example below)
And finally, adding grouping/meta information: I run into this when using categorical data with multiple levels (e.g. Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics). Even though color-coding the meta-levels and adding a legend, or even faceting, solves the issue, things get a little simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features in ggplot2 2.0.0, is there a more robust, no-torture way to have a dual axis without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot. hjust also didn't seem to work for me.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = T)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data frame preparation:
labels = paste0(sample(letters[1:3], replace = T, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = T))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal )) +
geom_bar(stat = "identity", fill = "#00bbd4") +
geom_hline(yintercept = c(0, maxVal)) + # Min and max values
geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(panel.background = element_blank()) +
theme(plot.background = element_blank()) +
theme(plot.margin = unit(rep(2,4), units = "lines"))
In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
The cowplot package can do this:
library(cowplot)
# Assign your original plot to some variable first: gpv <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis = "y", keep = "y"))
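Since ggplot2 2.2.0, continuous scales also accept a sec.axis argument, which covers the first two cases from the question without any grid work: dup_axis() mirrors an axis, and sec_axis() adds one that is a transformation of the primary axis. The sketch below uses a small made-up data set rather than the question's, and the axis label is my own wording:

```r
library(ggplot2)

set.seed(42)
maxVal <- 500
# Hypothetical data: 20 samples with values between 1 and maxVal
df <- data.frame(sample = factor(1:20),
                 value  = sort(sample(1:maxVal, 20), decreasing = TRUE))

base_plot <- ggplot(df, aes(x = sample, y = value)) +
  geom_col(fill = "#00bbd4")

# Case 1: repeat the y-axis on the right-hand side
p_dup <- base_plot + scale_y_continuous(sec.axis = dup_axis())

# Case 2: right-hand axis as a percentage of maxVal
p_pct <- base_plot +
  scale_y_continuous(
    limits = c(0, maxVal),
    sec.axis = sec_axis(~ . / maxVal * 100,
                        name = "Share of maximum (%)",
                        breaks = c(0, 25, 50, 75, 100))
  )

print(p_dup)
print(p_pct)
```

This only supports secondary axes that are one-to-one transformations of the primary axis, which matches the argument quoted above against independent y scales.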

cex equivalent in ggplot2

In my work, I frequently display scatterplots using text labels. That has meant a plot() command, followed by text(). I used cex to adjust the font size to what I wanted very quickly.
I created a text scatterplot very quickly using qplot. But I can't adjust the size fast. Here's a silly code example:
data(state)
qplot(Income, Population,
data=as.data.frame(state.x77),
geom=c("smooth", "text"), method="lm",
label=state.abb)
Whereas in the old days I'd do:
states <- as.data.frame(state.x77)
# Set up an empty plot with the right axis ranges, then draw the labels
plot(states$Income, states$Population, type="n")
text(states$Income, states$Population, state.abb, cex=.5)
If I wanted the text size halved from what I saw at the defaults (oh, and I'd have to do a linear regression manually and add abline() to get the regression line -- nice to do it all in one via ggplot2).
I know I can add a size adjustment with size, but it's not a relative size adjustment like I'm used to. Hadley tweeted me to say that size is measured in mm, which isn't fully intuitive to me. Since I often adjust the size of the plot, either in R or in LaTeX, an absolute scale isn't as useful to me.
I must be missing something really simple. What is it?
I think you are trying to adjust the size of the text itself, not the x-axis, right?
Here's an approach using the ggplot() command.
ggplot(data = as.data.frame(state.x77), aes(x = Income, y = Population)) +
geom_smooth(method = "lm", se = FALSE) +
geom_text(aes(label = state.abb), size = 2.5)
qp <- qplot(Income, Population,data=as.data.frame(state.x77),
geom=c("smooth","text"),
method="lm",
label=state.abb)
qp + theme(axis.text.x = element_text(size = 5))
I think Chase is probably right about wanting points as "labels":
qp <- qplot(Income, Population,data=as.data.frame(state.x77),
geom="smooth",method="lm",label=state.abb)
qp + geom_text(aes(label = state.abb), size = 2.5)
If "text" is given in the geom argument to qplot the default size is used and then gets overwritten (or underwritten as it were in this case). Give Chase the check. (Edit: should make size 2.5)
Edit2: Took digging but I found the way to get ggplot2 to cough up some of its defaults:
https://github.com/hadley/ggplot2/blob/master/R/geom-text.r
GeomText$new()$geom$default_aes
proto method (instantiated with ): function (.)
aes(colour = "black", size = 5, angle = 0, hjust = 0.5, vjust = 0.5,
alpha = 1)
There's got to be a better way....
qp <- qplot(Income, Population,data=as.data.frame(state.x77),
geom="smooth",method="lm",label=state.abb)
qp + geom_text(aes(label = state.abb, cex = 1.2))
Adding cex inside aes() will get what you want, as quoted from the documentation:
aes creates a list of unevaluated expressions. This function also performs partial name matching, converts color to colour, and old style R names to ggplot names (eg. pch to shape, cex to size)
http://docs.ggplot2.org/current/aes.html
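Because geom_text() takes its size in mm, a cex-like multiplier can be emulated by converting from points with the .pt constant that ggplot2 exports. The helper below is hypothetical (cex_to_size is not part of ggplot2), and the 11 pt base size is an assumption chosen to match the default theme's base font size:

```r
library(ggplot2)

# Hypothetical helper: turn a cex-style multiplier into a geom_text() size in mm,
# relative to an assumed base font size in points
cex_to_size <- function(cex, base_pt = 11) {
  cex * base_pt / .pt   # .pt converts between points and mm
}

states <- as.data.frame(state.x77)
states$abb <- state.abb

p <- ggplot(states, aes(x = Income, y = Population)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(aes(label = abb), size = cex_to_size(0.5))  # half the base size
print(p)
```

The size is still absolute once computed, so it does not rescale with the device the way cex does in base graphics; the helper only makes the choice of size relative to a stated base.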
