ggplot dual reversed y axis and geom_hline intercept calculation - r

I have created a line graph in ggplot2 with two y axes, and want only one dataset (blue) plotted on a reversed axis and want the other dataset (red) plotted on a different scale from the first. However, the code I am working with reverses both axes, and although the second y axis has been coded to have a different scale the second dataset (red) is being plotted using the scale of the first y axis. Furthermore I have created a line (green) for which I have to determine where the blue line intercepts it. I know the latter part of this question has been asked before and was answered, however it was noted in that post that the solution doesn't actually work. Any input would be helpful! Thank you! I've provided a sample dataset as mine is too large to recreate.
time<-c(1,2,3,4,5,6,7,8,9,10)
height<-c(100,330,410,570,200,270,230,390,400,420)
temp<-c(37,33,14,12,35,34,32,28,26,24)
tempdf<-data.frame(time,height,temp)
library(ggplot2)
makeplot<-ggplot(tempdf,aes(x=time)) + geom_line(aes(y=height),color="Blue") +
geom_line(aes(y=temp),color="Red")+
scale_y_continuous(sec.axis=sec_axis(~./100,name = "Temperature"),trans="reverse")+
geom_hline(aes(yintercept=250), color="green")

ggplot2 only supports a secondary axis that is a one-to-one transformation of the primary axis, and reversing the primary axis reverses the secondary one too, so you need to work out an equation that translates one scale into the other. Multiplying (or dividing) by a negative number flips the temperature axis back to a standard increasing scale. These two equations put your sample data on the same scale:
height = temp*(-10) + 600
temp = (height - 600)/(-10)
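As a quick sanity check, the two equations can be wrapped in a pair of helper functions to confirm that they are inverses and that the rescaled temperatures land inside the height range. The names `to_height` and `to_temp` are only illustrative, not part of ggplot2.

```r
# Sample temperatures from the question
temp <- c(37, 33, 14, 12, 35, 34, 32, 28, 26, 24)

# Hypothetical helpers wrapping the two equations above
to_height <- function(temp) temp * (-10) + 600      # temperature -> height scale
to_temp   <- function(height) (height - 600) / (-10) # height scale -> temperature

scaled <- to_height(temp)
range(scaled)                  # 230 480, inside the 100-570 height range
all.equal(to_temp(scaled), temp)  # TRUE: the round trip recovers the data
```

A different slope and intercept would be needed for data with another range; -10 and 600 are specific to this sample.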
Now, you can incorporate the equations into your plot code, the first to translate the temperature data into numbers that fit on the height scale, the second to translate your secondary axis numbers to a scale that shows temperature.
makeplot<-ggplot(tempdf,aes(x=time)) +
geom_line(aes(y=height),color="blue") +
geom_line(aes(y = (temp*(-10)) + 600), color="red")+
scale_y_continuous(sec.axis=sec_axis(~(.-600)/(-10),name =
"Temperature"),trans="reverse")+
geom_hline(aes(yintercept=250), color="green")
makeplot
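The question also asks where the blue height line crosses the green line at 250. A minimal sketch, assuming straight-line segments between consecutive observations (which is what geom_line draws), is to find each sign change of height - 250 and interpolate linearly. The names `level`, `idx` and `crossings` are made up for this example, and points that sit exactly on 250 are not handled.

```r
time   <- 1:10
height <- c(100, 330, 410, 570, 200, 270, 230, 390, 400, 420)
level  <- 250  # y value of the horizontal line

d <- height - level
# i is kept when the line crosses the level between point i and point i + 1
idx <- which(d[-length(d)] * d[-1] < 0)

# linear interpolation inside each crossing segment
frac <- (level - height[idx]) / (height[idx + 1] - height[idx])
crossings <- data.frame(time   = time[idx] + frac * (time[idx + 1] - time[idx]),
                        height = level)
crossings
```

The resulting data frame could then be added to the plot with something like `geom_point(data = crossings, aes(time, height))`.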

Ignoring the intersection of lines problem for now, here are a couple of alternatives to dual axes. First, facets:
library(tidyverse)
library(scales)
tempdf %>%
# convert height to depth
mutate(height = -height) %>%
rename(depth = height) %>%
gather(key, value, -time) %>%
ggplot(aes(time, value)) +
geom_line() +
facet_grid(key ~ ., scales = "free_y") +
scale_x_continuous(breaks = pretty_breaks()) +
theme_bw()
Second, use coloured points to indicate temperature at each depth:
tempdf %>%
mutate(height = -height) %>%
rename(depth = height) %>%
ggplot(aes(time, depth)) +
geom_line() +
geom_point(aes(color = temp), size = 3) +
geom_hline(yintercept = -250, color = "blue") +
scale_color_gradient2(midpoint = 25,
low = "blue",
mid = "yellow",
high = "red") +
scale_x_continuous(breaks = pretty_breaks()) +
theme_dark()

Another alternative is a path plot:
ggplot(tempdf, aes(height, temp)) +
geom_path() +
geom_point(aes(fill = time), size = 8, shape = 21) +
geom_text(aes(label = time)) +
viridis::scale_fill_viridis()

Related

plotting geom_text at individual positions in each facet of facet_wrap

I have this dataset
df <- data.frame(groups=factor(c(rep("A",6), rep("B",6), rep("C",6))),
types=factor(c(rep(c(rep("Z",2), rep("Y",2), rep("X",2)),3))),
values=c(10,11,1,2,0.1, 0.2, 12,13, 2,2.5, 0.2, 0.01,
12,14, 2,3,0.1,0.2))
library(ggplot2)
px <- ggplot(df, aes(groups, values)) + facet_wrap(~types, scale="free") + geom_point()
px
I conducted a post-hoc analysis (values~groups for each level of types) and created a dataset that contains the significance groups (as letters: a, b, c and so on):
df.text <- data.frame(groups=factor(c(rep("A",3), rep("B",3), rep("C",3))),
label=rep("a", 9))
I proceeded to plot the labels on px:
px + geom_text(data=df.text, aes(x=groups, y=0.1, label=label), size=4, col="red", stat="identity") +theme_bw()
which doesn't look great.
My problem is to define an aes(y) in geom_text that plots the labels at a fixed position (e.g. above the x-axis or at the top of the panel) without shifting the limits of the y axis too much. With previous datasets, the y-values were quite homogeneous among groups, so I could get away with a very low y-value. This time, however, the range of y is quite large, so it is not that easy.
So, the question is how to plot the labels inside df.text at a fixed position in facet_wrap, while keeping scale="free". Ideally they would sit above the top panel.border.
You could define a new variable height which determines the height to plot the labels:
library(tidyverse)
df %>%
group_by(types) %>%
mutate(height = max(values) + .3 * sd(values)) %>%
left_join(df.text, by = "groups") %>%
ggplot(aes(groups, values)) +
facet_wrap(~types, scale = "free") +
geom_point() +
geom_text(aes(x = groups, y = height, label = label), size = 4, col = "red", stat = "identity") +
theme_bw()
Here I used the max value plus .3 times the standard deviation but you could change that to whatever you wanted obviously. Not sure how to get the labels on top of the panel strips though.
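Note that df.text has three rows per group and no types column, so joining it by groups alone repeats every label several times. A minimal sketch of one way around that, assuming the nine rows of df.text are meant to be one label per group and type (the `types` column added here is that assumption), is to build a small label data frame with one height per facet:

```r
df <- data.frame(groups = factor(rep(c("A", "B", "C"), each = 6)),
                 types  = factor(rep(rep(c("Z", "Y", "X"), each = 2), 3)),
                 values = c(10, 11, 1, 2, 0.1, 0.2, 12, 13, 2, 2.5, 0.2, 0.01,
                            12, 14, 2, 3, 0.1, 0.2))

# Assumption: one label per (groups, types) combination
df.text <- data.frame(groups = factor(rep(c("A", "B", "C"), each = 3)),
                      types  = factor(rep(c("Z", "Y", "X"), 3)),
                      label  = "a")

# One label height per facet: max value plus 0.3 standard deviations
heights <- aggregate(values ~ types, data = df,
                     FUN = function(v) max(v) + 0.3 * sd(v))
names(heights)[2] <- "height"

# Nine rows, one per label, each carrying its facet's height
label_df <- merge(df.text, heights, by = "types")
label_df
```

This can then be passed to the plot as `geom_text(data = label_df, aes(groups, height, label = label))`, so each label is drawn once per panel.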

Adding different secondary x axis for each facet in ggplot2

I would like to add a different secondary axis to each facet. Here is my working example:
library(ggplot2)
library(data.table)
#Create the data:
data<-data.table(cohort=sample(c(1946,1947,1948),10000,replace=TRUE),
works=sample(c(0,1),10000,replace=TRUE),
year=sample(seq(2006,2013),10000,replace=TRUE))
data[,age_cohort:=year-cohort]
data[,prop_works:=mean(works),by=c("cohort","year")]
#Prepare data for plotting:
data_to_plot<-unique(data,by=c("cohort","year"))
#Plot what I want:
ggplot(data_to_plot,aes(x=age_cohort,y=prop_works))+geom_point()+geom_line()+
facet_wrap(~ cohort)
The plot shows how many people of a particular cohort work at a given age. I would like to add a secondary x axis showing which year corresponds to a particular age for different cohorts.
Since you have the actual values you want to use in your dataset, one workaround is to plot them as an additional geom_text layer:
ggplot(data_to_plot,
aes(x = age_cohort, y = prop_works, label = year))+
geom_point() +
geom_line() +
geom_text(aes(y = min(prop_works)),
hjust = 1.5, angle = 90) + # rotate to save space
expand_limits(y = 0.44) +
scale_x_continuous(breaks = seq(58, 70, 1)) + # ensure x-axis breaks are at whole numbers
scale_y_continuous(labels = scales::percent) +
facet_wrap(~ cohort, scales = "free_x") + # show only relevant age cohorts in each facet
theme(panel.grid.minor.x = element_blank()) # hide minor grid lines for cleaner look
You can adjust the hjust value in geom_text() and y value in expand_limits() for a reasonable look, depending on your desired output's dimensions.
(More data wrangling would be required if there are missing years in the data, but I assume that isn't the case here.)
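If some years were missing, one possible bit of wrangling is to build the age-to-year labels directly from the definition age_cohort = year - cohort instead of reading them from the data. This is only a sketch; the name `year_labels` is made up here.

```r
# Every cohort/year combination, whether or not it occurs in the data
year_labels <- expand.grid(cohort = c(1946, 1947, 1948),
                           year   = 2006:2013)
year_labels$age_cohort <- year_labels$year - year_labels$cohort

head(year_labels)
range(year_labels$age_cohort)  # 58 67
```

The label layer could then use `geom_text(data = year_labels, aes(x = age_cohort, y = ..., label = year))` with a y value chosen as in the code above.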

Nudging geom_segments where x|xend = -Inf and segments are color-coded by group

I have data for 16 analytes (the facet variable) for three groundwater monitoring wells (well = factor basis for color-coding), each screened at different intervals. For each analyte (facet), the intent is to overlay the data for each well and show the corresponding screen intervals along the y-axis. Some screens overlap, so they aren't easily distinguished. The goal is to have them align along the y-axis with equidistant spacing in this fashion: |||. The problem is that the levels of my faceting variable have very different scales. Below is a rough example using the diamonds data set.
require(dplyr)
require(ggplot2)
# Create mock dataframe, where facet variable ("mockvar") has different x-axis scales
mockdf <- filter(diamonds, cut=="Fair"|cut=="Good"|cut=="Ideal") %>%
droplevels() %>% mutate(mockvar=ifelse(clarity=="SI2", 10*table,
ifelse(clarity=="SI1", 100*table,
ifelse(clarity=="VS2", 1000*table, table))))
#Plot Code
ggplot(mockdf, aes(mockvar, depth, color=cut)) + scale_y_reverse() +
geom_point() + facet_wrap(~clarity, scales="free") +
geom_segment(data=mockdf[mockdf$cut=="Fair",], aes(x=-Inf, xend=-Inf, y=55, yend=65)) +
geom_segment(data=mockdf[mockdf$cut=="Good",], aes(x=-Inf, xend=-Inf, y=60, yend=70)) +
geom_segment(data=mockdf[mockdf$cut=="Ideal",], aes(x=-Inf, xend=-Inf, y=65, yend=75))
#calls to position = position_dodge(width = #.#)) ...didn't work
How do I juggle the segments given the different scaling? An alternate long-winded solution would be to subset further on each facet level, for example:
ggplot(mockdf, aes(mockvar, depth, color=cut)) + scale_y_reverse() +
geom_point() + facet_wrap(~clarity, scales="free") +
geom_segment(data=mockdf[mockdf$cut=="Fair"& mockdf$clarity=="I1",], aes(x=49, xend=49, y=55, yend=65)) +
geom_segment(data=mockdf[mockdf$cut=="Good"& mockdf$clarity=="I1",], aes(x=49.5, xend=49.5, y=60, yend=70)) +
geom_segment(data=mockdf[mockdf$cut=="Ideal"& mockdf$clarity=="I1",], aes(x=50, xend=50, y=65, yend=75))
#and so on for all remaining facet levels....
But that's a lot of code and a crude 'jerry-rig' at best. Any suggestions for keeping the initial x|xend=-Inf for the first group, then nudging the next 2 segments relative to -Inf with consistent spacing globally across facets?
You'll have to switch off automatic axis expansion and then you can draw the segments where you want them.
require(dplyr)
require(ggplot2)
# Create mock dataframe, where facet variable ("mockvar") has different x-axis scales
mockdf <- filter(diamonds, cut=="Fair"|cut=="Good"|cut=="Ideal") %>%
droplevels() %>% mutate(mockvar=ifelse(clarity=="SI2", 10*table,
ifelse(clarity=="SI1", 100*table,
ifelse(clarity=="VS2", 1000*table, table))))
# variables to control axis range and segment spacing
s1 = 0.9 # controls distance to minimum point
s2 = 0.03 # controls distance between segment lines
# add min and range variables
mockdf <- group_by(mockdf, clarity) %>%
mutate(min = s1*min(mockvar),
range = (2-s1)*max(mockvar) - s1*min(mockvar))
ggplot(mockdf, aes(mockvar, depth, color=cut)) + scale_y_reverse() +
# switch off automatic axis expansion
scale_x_continuous(expand = c(0, 0)) +
geom_point() + facet_wrap(~clarity, scales="free") +
geom_segment(data=mockdf[mockdf$cut=="Fair",],
aes(x=min, xend=min, y=55, yend=65)) +
geom_segment(data=mockdf[mockdf$cut=="Good",],
aes(x=min + s2*range, xend=min + s2*range, y=60, yend=70)) +
geom_segment(data=mockdf[mockdf$cut=="Ideal",],
aes(x=min + 2*s2*range, xend=min + 2*s2*range, y=65, yend=75)) +
# draw invisible segment to set end of x axis range
geom_segment(data=mockdf[mockdf$cut=="Fair",],
aes(x=min + range, xend=min + range, y=60, yend=70), color = NA)
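To see why this keeps the spacing consistent across facets, the position arithmetic can be pulled out into a small function. This is only an illustration of the formulas above with made-up x values; `segment_x` is a hypothetical helper, and `k` is the segment index (0 for Fair, 1 for Good, 2 for Ideal).

```r
# x position of the k-th segment for one facet's x values
segment_x <- function(x, k, s1 = 0.9, s2 = 0.03) {
  lo  <- s1 * min(x)                  # left edge of the axis
  rng <- (2 - s1) * max(x) - lo       # full width of the axis
  lo + k * s2 * rng                   # k steps of 3% of the axis width
}

segment_x(c(50, 70), 0:2)    # 45.00 45.96 46.92
segment_x(c(500, 700), 0:2)  # 450.0 459.6 469.2
```

Because the offsets are a fixed fraction of each facet's own axis width, the segments look equally spaced in every panel, whatever the scale.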

Visualizing two or more data points where they overlap (ggplot R)

I have a scatterplot that has colour-coded data points. When two or more of the data points overlap only one of the colours is shown (whichever is first in the legend). Each of these data points represents an item and I need to show which items fall at each point on the scale. I'm using R (v.3.3.1). Would anyone have any suggestions as per how I could show that there are multiple items at each point on the scatterplot?
Thanks in advance.
pdf('pedplot.pdf', height = 6, width = 10)
p3 <- ggplot(data=e4, aes(x=e4$domain, y=e4$ped)) +
geom_point(aes(color = e4$Database_acronym), size = 3, shape = 17) +
labs(x = "Domains", y = "Proportion of Elements per Domain", color = "Data Sources") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p3)
dev.off()
You could jitter the points, meaning add a bit of noise to remove the overlap (probably the most commonly used option). Another option would be to use different marker shapes (plus a small size adjustment), chosen so that the markers remain visible when plotted on top of each other. This will work if you have only two or three different marker types. A third option is to vary the size for each color, once again only for cases with maybe two or three colors/sizes, though the size difference might be confusing. If you can have multiple points of the same color with the same coordinates, then only jitter (among the three options above) will make that apparent. In any case, here are examples of each approach:
dat = data.frame(x=1:5, y=rep(1:5,3), group=rep(LETTERS[1:3],each=5))
theme_set(theme_bw())
# Jitter
set.seed(3)
ggplot(dat, aes(x,y, colour=group)) +
geom_point(size=3, position=position_jitter(h=0.15,w=0.15))
# Vary the marker size
ggplot(dat, aes(x,y, colour=group,size=group)) +
geom_point() +
scale_color_manual(values=c("red","blue","orange")) +
scale_size_manual(values=c(5,3,1))
# Vary the marker shape (plus a small size adjustment)
ggplot(dat, aes(x,y, colour=group, size=group, shape=group)) +
geom_point(stroke=1.5) +
scale_colour_manual(values=(c("black", "green", "orange"))) +
scale_shape_manual(values=c(19,17,4)) +
scale_size_manual(values=c(4,3,3))
Separately from or in addition to jittering as mentioned here, you could also consider making the points partially transparent:
linecolors <- c("#714C02", "#01587A", "#024E37")
fillcolors <- c("#9D6C06", "#077DAA", "#026D4E")
# partially transparent points by setting `alpha = 0.5`
ggplot(mpg, aes(displ, cty, colour = drv, fill = drv)) +
geom_point(position=position_jitter(h=0.1, w=0.1),
shape = 21, alpha = 0.5, size = 3) +
scale_color_manual(values=linecolors) +
scale_fill_manual(values=fillcolors) +
theme_bw()
What about using different shapes and fills?
ggplot(mpg, aes(displ, cty, fill = drv, shape = drv)) +
geom_point(position=position_jitter(h=0.1, w=0.1), alpha = 0.5, size = 3) +
scale_fill_manual(values=c("red","blue","orange")) +
scale_shape_manual(values= c(23, 24, 25)) +
theme_bw()
Another option could be by counting the overlapping points using geom_count with scale_size_area to scale the sizes of the points. Here is some reproducible code:
library(ggplot2)
ggplot(mpg, aes(x = displ, y = cty)) +
geom_count() +
scale_size_area()
Also, an example when using a color aesthetic to see the difference of counts of groups:
ggplot(mpg, aes(x = displ, y = cty, colour = drv)) +
geom_count() +
scale_size_area()
Created on 2023-01-31 with reprex v2.0.2
You could change the number of breaks in scale_size_area to show different sizes. Please check the link above for more examples.
Try geom_point(aes(color = e4$Database_acronym), position = "jitter", size = 3, shape = 17).
This adds a little bit of random variation to your scatter plot and thereby prevents overplotting.
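Since the question asks which items fall at each point, another possibility is to collapse the overlapping items into one text label per coordinate. A minimal sketch with the toy data from the first answer (the column name `items` is made up here):

```r
dat <- data.frame(x = 1:5, y = rep(1:5, 3), group = rep(LETTERS[1:3], each = 5))

# One row per distinct (x, y), listing every group plotted there
overlaps <- aggregate(group ~ x + y, data = dat,
                      FUN = function(g) paste(g, collapse = "/"))
names(overlaps)[3] <- "items"
overlaps
```

The labels could then be drawn next to the points with something like `geom_text(data = overlaps, aes(x, y, label = items), inherit.aes = FALSE, vjust = -1)`.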

How to smartly place text labels beside points of different sizes in ggplot2?

I am trying to make a labeled bubble plot with ggplot2 in R. Here is the simplified scenario:
I have a data frame with 4 variables: 3 quantitative variables, x, y, and z, and another variable that labels the points, lab.
I want to make a scatter plot, where the position is determined by x and y, and the size of the points is determined by z. I then want to place text labels beside the points (say, to the right of the point) without overlapping the text on top of the point.
If the points did not vary in size, I could try to simply modify the aesthetic of the geom_text layer by adding a scaling constant (e.g. aes(x=x+1, y=y+1)). However, even in this simple case, I am having a problem with positioning the text correctly because the points do not scale with the output dimensions of the plot. In other words, the size of the points remains constant in a 500x500 plot and a 1000x1000 plot - they do not scale up with the dimensions of the outputted plot.
Therefore, I think I have to scale the position of the label by the size (e.g. dimensions) of the output plot, or I have to get the radius of the points from ggplot somehow and shift my text labels. Is there a way to do this in ggplot2?
Here is some code:
# Stupid data
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
# Plot with bad label placement
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I should mention, I tried hjust and vjust inside of geom_text, but it does not produce the desired effect.
# Trying hjust and vjust, but it doesn't look nice
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab), hjust=0, vjust=0.5,
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I managed to get something that works for now, thanks to Henrik and shujaa. I will leave the question open just in case someone shares a more general solution.
Just a blurb of what I am using this for: I am plotting a map, and indicating the amount of precipitation at certain stations with a point that is sized proportionally to the amount of precipitation observed. I wanted to add a station label beside each point in an aesthetically pleasing manner. I will be making more of these plots for different regions, and my output plot may have a different resolution or scale (e.g. due to different projections) for each plot, so a general solution is desired. I might try my hand at creating a custom position_jitter, like baptiste suggested, if I have time during the weekend.
It appears that position_*** don't have access to the scales used by other layers, so it's a no go. You could make a clone of GeomText that shifts the labels according to the size mapped,
but it's a lot of effort for a very kludgy and fragile solution,
geom_shiftedtext <- function (mapping = NULL, data = NULL, stat = "identity",
position = "identity",
parse = FALSE, ...) {
GeomShiftedtext$new(mapping = mapping, data = data, stat = stat, position = position,
parse = parse, ...)
}
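# Note: this relies on the proto-based internals that ggplot2 used before
# version 2.0.0; later versions use ggproto, so it will not run there unchanged.
require(proto)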
GeomShiftedtext <- proto(ggplot2:::GeomText, {
objname <- "shiftedtext"
draw <- function(., data, scales, coordinates, ..., parse = FALSE, na.rm = FALSE) {
data <- remove_missing(data, na.rm,
c("x", "y", "label"), name = "geom_shiftedtext")
lab <- data$label
if (parse) {
lab <- parse(text = lab)
}
with(coord_transform(coordinates, data, scales),
textGrob(lab, unit(x, "native") + unit(0.375* size, "mm"),
unit(y, "native"),
hjust=hjust, vjust=vjust, rot=angle,
gp = gpar(col = alpha(colour, alpha),
fontfamily = family, fontface = fontface, lineheight = lineheight))
)
}
})
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1.2,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z), shape=1) +
geom_shiftedtext(aes(label=lab, size=z),
hjust=0, colour="red") +
scale_size_continuous(range=c(5, 100), guide="none")
This isn't a very general solution, because you'll need to tweak it every time, but you should be able to add to the x value for the text some value that's linear depending on z.
I had luck with
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab, x = x + .06 + .14 * (z - min(z))),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
but, as the font size depends on your window size, you would need to decide on your output size and tweak accordingly. I started with x = x + .05 + 0 * (z-min(z)) and calibrated the intercept based on the smallest point, then when I was happy with that I adjusted the linear term for the biggest point.
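The calibration described here amounts to a linear offset in z. Written as a small function (the name `label_x` and its arguments are just for illustration), the two numbers to tune are easier to see:

```r
# Shift labels right by an intercept plus a term that grows with point size
label_x <- function(x, z, intercept = 0.06, slope = 0.14) {
  x + intercept + slope * (z - min(z))
}

df <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3), z = c(1, 2, 1),
                 lab = c("a", "b", "c"), stringsAsFactors = FALSE)
label_x(df$x, df$z)  # 1.06 2.20 3.06
```

Tune `intercept` on the smallest point first, then `slope` on the largest, and repeat whenever the output size changes.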
Another alternative. Looks OK with your test data, but you need to check how general it is.
dodge <- abs(scale(df$z))/4
ggplot(data = df, aes(x = x, y = y)) +
geom_point(aes(size = z)) +
geom_text(aes(x = x + dodge), label = df$lab, colour = "red") +
scale_size_continuous(range = c(5, 50), guide = "none")
Update
Just tried position_jitter, but the width argument only takes one value, so right now I am not sure how useful that function would be. But I would be happy to find that I am wrong. Example with another small data set:
df3 <- mtcars[1:10, ]
ggplot(data = df3, aes(x = wt, y = mpg)) +
geom_point(aes(size = qsec), alpha = 0.1) +
geom_text(label = df3$carb, position = position_jitter(width = 0.1, height = 0)) +
scale_size_continuous(range = c(5, 50), guide = "none")
