ggplot: add manually labelled tick marks on top of automatic tick marks - r

I am trying to highlight the point with the lowest y value by attempting the following:
1) draw a line from this point down to the x-axis and another to the y-axis; and
2) add a manual tick mark with this point's x and y value on the x-axis and y-axis, respectively. This manual tick mark must be added in addition to the automatic tick marks on both axes.
Sample data:
df <- data.frame(x=1:100,y=rnorm(100,10,1))
ggplot(df) +
geom_point(aes(x=x,y=y))
Edit:
Here's an illustration of what I am attempting:

It's unclear exactly what you want this to look like but you could do one of two options. You could either use geom_vline() or geom_segment(). Vline will do a line from the bottom to the top, but it sounds like you may prefer to use segment. Try this:
+ geom_segment(x = min(df$x), xend = min(df$x), y = 0, yend = 1)
If you change the yend argument you can make the tick smaller or larger. Drawing one for the max value should be as simple as swapping min() for max(), or you could just input the values manually. Note that x has to be referenced as df$x here, because outside of aes() the bare column name is not visible. Alternatively, you could add a vline that spans the full height of the panel with:
+ geom_vline(xintercept = min(df$x))
Both geoms are described in the ggplot2 documentation. If this doesn't help much, please provide a proper reprex and perhaps a sketch of your desired output, and we can modify that code to get a bit closer to what you want.
edit:
Writing outside of the plot window is a bit more difficult, but this link may help you. I've tried it on a few and always found that in my cases it was easier to use a different solution. Here's one option:
library(ggplot2)
set.seed(123) # so we have the same toy data
df <- data.frame(x=1:100,y=rnorm(100,10,1))
ggplot(df) +
geom_point(aes(x=x,y=y)) +
geom_segment(x=0, xend=18, y=8.033383, yend=8.033383) + # horizontal line across to the y axis
geom_segment(x=18, xend=18, y=0, yend=8.033383) + # vertical line down to the x axis
annotate("text", 18.2, 8.2, label="(18, 8.03)", size=3) # ordered pair just above it
If you didn't want to draw all the way to the point, you could change the xend of the first segment and the yend of the second so that each line ends just above the edge of the plot window, giving a short tick instead of a full line.
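The manual tick marks asked for in the question can also be added without hard-coding coordinates. The following is only a minimal sketch and not part of the original answer: it assumes that pretty() is a good enough stand-in for the automatic breaks, and the names min_pt, x_breaks and y_breaks are made up for illustration.

```r
library(ggplot2)

set.seed(123) # same toy data as above
df <- data.frame(x = 1:100, y = rnorm(100, 10, 1))

# Row holding the lowest y value
min_pt <- df[which.min(df$y), ]

# Approximate the automatic breaks with pretty() and append one manual break
x_breaks <- sort(unique(c(pretty(df$x), min_pt$x)))
y_breaks <- sort(unique(c(pretty(df$y), min_pt$y)))

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # -Inf places the start of each segment on the panel edge
  annotate("segment", x = -Inf, xend = min_pt$x,
           y = min_pt$y, yend = min_pt$y, linetype = "dashed") +
  annotate("segment", x = min_pt$x, xend = min_pt$x,
           y = -Inf, yend = min_pt$y, linetype = "dashed") +
  scale_x_continuous(breaks = x_breaks) +
  # Round the tick labels so the manual break reads as e.g. 8.03
  scale_y_continuous(breaks = y_breaks,
                     labels = function(b) as.character(round(b, 2)))
```

Printing p draws the plot. A manual break that lands very close to an automatic one will overlap it, so in that case one of the two may need to be dropped from the breaks vector.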

Related

How can I ensure consistent axis lengths between plots with discrete variables in ggplot2?

I've been trying to standardise multiple bar plots so that the bars are all identical in width regardless of the number of bars. Note that this is over multiple distinct plots - faceting is not an option. It's easy enough to scale the plot area so that, for instance, a plot with 6 bars is 1.5* the width of a plot with 4 bars. This would work perfectly, except that each plot has an expanded x axis by default, which I would like to keep.
"The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables."
https://ggplot2.tidyverse.org/reference/scale_discrete.html
My problem is that I can't for the life of me work out what '0.6 units' actually means. I've manually measured the distance between the bars and the y axis in various design tools and gotten inconsistent answers, so I can't factor '0.6 units' into my calculations when working out what size the panel windows should be. Additionally I can't find any answers on how many 'units' long a discrete x axis is - I assumed at first it would be 1 unit per category but that doesn't fit with the visuals at all. I've included an image that hopefully shows what I mean - the two graphs
In this image, the top graph has a plot area exactly 1.5* that of the bottom graph. Seeing as it has 6 bars compared with 4, that would mean each bar is the same width, except that that extra space between the axis and the first bar messes this up. Setting expand = expansion(add = c(0, 0)) clears this up but results in not-so-pretty graphs. What I'd like is for the bars to be identical in width between the two plots, accounting for this extra space. I'm specifically looking for a general solution that I can use for future plots, not for the individual solution for this sample. As such, what I'd really like to know is how many 'units' long are these two x axes? Many thanks for any and all help!
Instead of using expansion for the axis, I would probably use the fact that categorical variables are actually plotted on the positive integers on Cartesian co-ordinates. This means that, provided you know the maximum number of columns you are going to use in your plots, you can set this as the range in coord_cartesian. There is a little arithmetic involved to keep the bars centred, but it should give consistent results.
We start with some reproducible data:
library(ggplot2)
set.seed(1)
df <- data.frame(group = letters[1:6], value = 100 * runif(6))
Now we set the value for the maximum number of bars we will need:
MAX_BARS <- 6
And the only thing "funny" about the plot code is the calculation of the x axis limits in coord_cartesian:
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
Now let us remove one factor level and run the exact same plot code:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
You will see the bars remain constant width and centralized, yet the panel size remains fixed.
Created on 2021-11-06 by the reprex package (v2.0.0)
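The limit arithmetic repeated in each plot above can be wrapped in a small helper. This is a sketch under stated assumptions: the function names centred_xlim and axis_units are hypothetical, categories are assumed to sit at the integers 1..n as described above, and axis_units assumes the default discrete expansion of 0.6 units per side quoted in the question.

```r
# Limits that keep n_bars centred in a panel sized for max_bars
centred_xlim <- function(n_bars, max_bars) {
  pad <- (max_bars - n_bars) / 2
  c(1 - pad, max_bars - pad)
}

# Length of a default discrete axis in data units:
# (n_bars - 1) between the first and last category, plus 0.6 on each side
axis_units <- function(n_bars, expand_add = 0.6) {
  (n_bars - 1) + 2 * expand_add
}

centred_xlim(6, 6) # 1 6
centred_xlim(4, 6) # 0 5
axis_units(6) / axis_units(4) # about 1.48, not 1.5
```

It could then be used as coord_cartesian(xlim = centred_xlim(length(unique(df$group)), MAX_BARS)). Under these assumptions, the last line suggests why scaling the panel by exactly 1.5 for 6 versus 4 bars does not give identical bar widths.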

Why is the variable considered continuous in legend?

I have used the following code to generate a plot with ggplot:
I want the legend to show the runs 1-8 and only the volumes 12.5 and 25. Why doesn't it show them?
And is it possible to show all the points in the plot even though there is an overlap? Because right now the plot only shows 4 of 8 points due to overlap.
OP. You've already been given a part of your answer. Here's a solution given your additional comment and some explanation.
For reference, you were looking to:
Change a continuous variable to a discrete/discontinuous one and have that reflected in the legend.
Show runs 1-8 labeled in the legend
Disconnect lines based on some criteria in your dataset.
First, I'm representing your data here again in a way that is reproducible (and takes away the extra characters so you can follow along directly with all the code):
library(ggplot2)
mydata <- data.frame(
Run = 1:8,
Time = c(834, 834, 584, 584, 1184, 1184, 938, 938),
Area = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
Volume = c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)
Changing to a Discrete Variable
If you check the variable type for each column (type str(mydata)), you'll see that mydata$Run is an int and the rest of the columns are num. Each column is understood to be a number, which is treated as if it were a continuous variable. When it comes time to plot the data, ggplot2 understands this to mean that since it is reasonable that values can exist between these (they are continuous), any representation in the form of a legend should be able to show that. For this reason, you get a continuous color scale instead of a discrete one.
To force ggplot2 to give you a discrete scale, you must make your data discrete by converting it to a factor. You can either convert the variable before plotting (e.g. mydata$Volume <- as.factor(mydata$Volume)), or convert it inline, referring to aes(color = as.factor(Volume), ...) instead of just aes(color = Volume, ...).
Converting inline in your ggplot call has the side effect of changing the legend title to "as.factor(Volume)", so you will also have to set the title in the labs() call. In the end, the plot code looks like this:
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line() +
labs(
x = "Area", y = "Time",
# This has to be changed now
color='Volume'
) +
theme_bw()
Note in the above code I am also not referring to mydata$Run, but just Run. It is greatly preferable that you refer to just the name of the column when using ggplot2. It works either way, but much better in practice.
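The difference between the continuous and the discrete version of a column can be checked directly in the console. This is a small illustrative sketch using a cut-down copy of the data; the name vol_factor is made up for the example.

```r
mydata <- data.frame(
  Run = 1:8,
  Volume = rep(c(12.5, 25), each = 4)
)

class(mydata$Run)    # "integer"
class(mydata$Volume) # "numeric"

# Converting to a factor leaves exactly two discrete levels
vol_factor <- as.factor(mydata$Volume)
levels(vol_factor)   # "12.5" "25"
```

Those two levels are what a discrete colour legend is built from.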
Disconnect Lines
The reason your lines are connected throughout the data is because there's no information given to the geom_line() object other than the aesthetics of x= and y=. If you want to have separate lines, much like having separate colors or shapes of points, you need to supply an aesthetic to use as a basis for that. Since the two lines are different based on the variable Volume in your dataset, you want to use that... but keep the same color for both. For this, we use the group= aesthetic. It tells ggplot2 we want to draw a line for each piece of data that is grouped by that aesthetic.
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", color='Volume'
) +
theme_bw()
Show Runs 1-8 Labeled in Legend
Here I'm reading a bit into what you exactly wanted to do in terms of "showing runs 1-8" in the legend. This could mean one of two things, and I'll assume you want both and show you how to do both.
Listing and showing sizes 1-8 in the legend.
To set the values you see in the scale (legend) for size, you can refer to the various scale_ functions for all types of aesthetics. In this case, recall that since mydata$Run is an int, it is treated as a continuous scale. ggplot2 doesn't know how to draw a continuous scale for size, so the legend itself shows discrete sizes of points. This means we don't need to change Run to a factor, but what we do need is to indicate specifically we want to show in the legend all breaks in the sequence from 1 to 8. You can do this using scale_size_continuous(breaks=...).
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", color='Volume'
) +
scale_size_continuous(breaks=c(1:8)) +
theme_bw()
Showing all of your runs as points.
The note about showing all runs might also mean you want to literally see each run represented as a discrete point in your plot. For this... well, they already are! ggplot2 is plotting each of your points from your data into the chart. Since some points share the same values of x= and y=, you are getting overplotting - the points are drawn over top of one another.
If you want to visually see each point represented here, one option could be to use geom_jitter() instead of geom_point(). It's not really great here, because it will look like your data has different x and y values, but it is an option if this is what you want to do. Note in the code below I'm also changing the shape of the point to be a hollow circle for better clarity, where the color= is the line around each point (here it's black), and the fill= aesthetic is instead used for Volume. You should get the idea though.
set.seed(1234) # using the same randomization seed ensures you have the same jitter
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_jitter(aes(fill =as.factor(Volume), size = Run), shape=21, color='black') +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", fill='Volume'
) +
scale_size_continuous(breaks=c(1:8)) +
theme_bw()
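The overplotting described above can be confirmed from the data itself, without plotting anything. A minimal sketch, where coords and n_visible are illustrative names:

```r
mydata <- data.frame(
  Run = 1:8,
  Time = c(834, 834, 584, 584, 1184, 1184, 938, 938),
  Area = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842)
)

# Each distinct (Area, Time) pair is one visible location in the plot
coords <- mydata[c("Area", "Time")]
n_visible <- nrow(unique(coords))
n_visible # 4

# Runs that are hidden behind an earlier run with identical coordinates
mydata$Run[duplicated(coords)] # 2 4 6 8
```

This matches the 4 of 8 points mentioned in the question.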

Dual y axis (second axis) use in ggplot2

I have run into a problem when plotting two different datasets with the help of the second axis function, as described in this previous post: how-to-use-facets-with-a-dual-y-axis-ggplot.
I am trying to use geom_point and geom_bar, but since the geom_bar data range is different, the bars are not shown on the graph.
Here is what I have tried;
point_data=data.frame(gr=seq(1,10),point_y=rnorm(10,0.25,0.1))
bar_data=data.frame(gr=seq(1,10),bar_y=rnorm(10,5,1))
library(ggplot2)
sec_axis_plot <- ggplot(point_data, aes(y=point_y, x=gr,col="red")) + #Enc vs Wafer
geom_point(size=5.5,alpha=1,stat='identity')+
geom_bar(data=bar_data,aes(x = gr, y = bar_y, fill = gr),stat = "identity") +
scale_y_continuous(sec.axis = sec_axis(trans=~ .*15,
name = 'bar_y',breaks=seq(0,10,0.5)),breaks=seq(0.10,0.5,0.05),limits = c(0.1,0.5),expand=c(0,0))+
facet_wrap(~gr, strip.position = 'bottom',nrow=1)+
theme_bw()
As can be seen, bar_data is removed. Is it possible to plot them together in this context?
thx
You're running into problems here because the transformation of the second axis is only used to create the second axis -- it has no impact on the data. Your bar_data is still being plotted on the original axis, which only goes up to 0.5 because of your limits. This prevents the bars from appearing.
In order to make the data show up in the same range, you have to normalize the bar data so that it falls in the same range as the point data. Then, the axis transformation has to undo this normalization so that you get the appropriate tick labels. Like so:
# Normalizer to bring bar data into point data range. This makes
# highest bar equal to highest point. You can use a different
# normalization if you want (e.g., this could be the constant 15
# like you had in your example, though that's fragile if the data
# changes).
normalizer <- max(bar_data$bar_y) / max(point_data$point_y)
sec_axis_plot <- ggplot(point_data,
aes(y=point_y, x=gr)) +
# Plot the bars first so they're on the bottom. Use geom_col,
# which creates bars with specified height as y.
geom_col(data=bar_data,
aes(x = gr,
y = bar_y / normalizer)) + # NORMALIZE Y !!!
# stat="identity" and alpha=1 are defaults for geom_point
geom_point(size=5.5) +
# Create second axis. Notice that the transformation undoes
# the normalization we did for bar_y in geom_col.
scale_y_continuous(sec.axis = sec_axis(trans= ~.*normalizer,
name = 'bar_y')) +
theme_bw()
This gives you the following plot:
I removed some of your bells and whistles to make the axis-specific stuff more clear, but you should be able to add it back in no problem. A couple of notes though:
Remember that the second axis is created by a 1-1 transformation of the primary axis, so make sure they cover the same limits under the transformation. If you have bars that should go to zero, the primary axis should include the untransformed analogue of zero.
Make sure that the data normalization and the axis transformation undo each other so that your axis lines up with the values you're plotting.
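The second note can be checked numerically before plotting. The sketch below uses toy data with a fixed seed (the seed is an addition for reproducibility) and the illustrative names scaled and restored.

```r
set.seed(1) # seed added only so the toy data is reproducible
point_data <- data.frame(gr = 1:10, point_y = rnorm(10, 0.25, 0.1))
bar_data <- data.frame(gr = 1:10, bar_y = rnorm(10, 5, 1))

normalizer <- max(bar_data$bar_y) / max(point_data$point_y)

# What geom_col() receives, and what the secondary axis maps it back to
scaled <- bar_data$bar_y / normalizer
restored <- scaled * normalizer

# The axis transformation undoes the normalization
all.equal(restored, bar_data$bar_y)

# The tallest bar lines up with the highest point on the primary axis
all.equal(max(scaled), max(point_data$point_y))
```

If either comparison fails for a different normalization, the secondary axis labels will not match the bars.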

Customize linetype in ggplot2 OR add automatic arrows/symbols below a line

I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
df <- data.frame(datum = seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y=rnorm(53,mean=100,sd=40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear if the Standard is a maximum value (as it would be for example Chloride) or a minimum value (as it would be for Oxygen). So I would like to make this clear by adding small pointers/arrows Up or Down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
extrapoints <- data.frame(datum2 = seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y2=68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows that point up or down depending on whether the y-value is greater or less than some expected value. If that's the case, then this does it using geom_segment:
require(grid) # as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame(datum = df$datum, y= 70, up=df$y >70),
aes(xend = datum , yend =70 + c(-1,1)[1+up]*5), #select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to modify size or arrow-heads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the stuff about creating and using "up" and use a minus-sign.
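For the case where all arrows should point the same way, a sketch along the following lines may be closer to what the question describes. It is an illustration, not the answerer's code: the names standard, direction and arrows_df, the seed, the count of 12 arrows and the arrow length of 5 are all arbitrary choices.

```r
library(ggplot2)
library(grid) # arrow() and unit()

set.seed(42) # seed added only so the toy data is reproducible
df <- data.frame(
  datum = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week"),
  y = rnorm(53, mean = 100, sd = 40)
)

standard <- 70
direction <- -1 # -1 points the arrows down, +1 points them up

# Evenly spaced arrows that start on the standard line
arrows_df <- data.frame(
  datum = seq(min(df$datum), max(df$datum), length.out = 12),
  y = standard,
  yend = standard + direction * 5
)

p <- ggplot(df, aes(x = datum, y = y)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = standard, colour = "red") +
  geom_segment(data = arrows_df,
               aes(xend = datum, yend = yend),
               colour = "red",
               arrow = arrow(length = unit(0.1, "cm"))) +
  theme_classic()
```

Because the arrow positions are derived from the range of the data, the spacing does not need to be recalculated by hand for each new dataset.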

Can I change where the x-axis intersects the y-axis in ggplot2?

I'm plotting some index data as a bar chart. I'd like to emphasise the "above index" and "below index"-ness of the numbers by forcing the x-axis to cross at 100 (such that a value of 80 would appear as a -20 bar.)
This is part of a much longer process, so it's hard to share data usefully. Here, though, is some bodge-y code that illustrates the problem (and the beginnings of my solution):
df <- data.frame(c("a","b","c"),c(118,80,65))
names(df) <- c("label","index")
my.plot <- ggplot(df,aes(label,index))
my.plot + geom_bar(stat = "identity")
df$adjusted <- df$index - 100
my.plot2 <- ggplot(df,aes(label,adjusted))
my.plot2 + geom_bar(stat = "identity")
I can, of course, change my index calculation to read: (value.new/value.old)*100-100 then title the chart appropriately (something like "xxx relative to index") but this seems clumsy.
So, too, does the approach I've been testing (to run the simple calculation above, then re-label the y-axis.) Is that really the best solution?
No doubt someone's going to tell me that this sort of axis manipulation is frowned upon. If this is the case, please could they point me in the direction of an explanation? At least then I'll have learned something.
This doesn't directly answer your question, but instead of messing about with the x-axis, why not make a single grid line a bit thicker? For example,
dd = data.frame(x = 1:10, y = runif(10))
g = ggplot(dd, aes(x, y)) + geom_point()
g + geom_hline(yintercept=0.2, colour="white", lwd=3)
Or as Paul suggested, with a black line and some text:
g + geom_hline(yintercept=0.2, colour="black", lwd=3) +
annotate("text", x = 2, y = 0.22, label = "Reference")
The coordinate system of your plot has the x-axis and the y-axis crossing at (0,0). This is just the way you define your coordinate system. You can of course draw a horizontal line at y = 100, but calling this the x-axis would be incorrect.
What you already proposed is to redefine your coordinate system by transforming the data. Whether or not this transformation is appropriate is easier to answer with a reproducible example from your side.
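The transform-and-relabel approach mentioned in the question could look roughly like the sketch below. It is one possible way to do it rather than a recommendation from the answers above; the name baseline is illustrative.

```r
library(ggplot2)

df <- data.frame(label = c("a", "b", "c"), index = c(118, 80, 65))
baseline <- 100

p <- ggplot(df, aes(label, index - baseline)) +
  geom_col() +
  geom_hline(yintercept = 0) + # reference line where index equals the baseline
  # Bars are drawn relative to the baseline, labels show the original index
  scale_y_continuous(name = "index",
                     labels = function(b) b + baseline)
```

The bars for 80 and 65 then hang below the reference line while the tick labels still read in index units.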

Resources