I am creating plots similar to the first example image below, and need plots like the second example below.
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# plot 2015 data
ggplot(data.2015, aes(x = area, y = score, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major)
The data.2014 has only values for the "Findings" group. I would like to show those 2014 Findings values on the plot, on the appropriate/corresponding data.2015$area, where there is 2014 data available.
To show last year's data just on the "Findings" (red bars) data, I'd like to use a one-sided errorbar/whisker that emanates from the value of the relevant data.2015 bar, and terminates at the data.2014 value, for example:
I thought to do this by using layers and plotting error bars so that the 2015 data could overlap, however this doesn't work when the 2014 result is abs() smaller than the 2015 result and is thus occluded.
Considerations:
I'd like the errorbar/whisker to be the same width as the bars, perhaps even dashed line with a solid cap.
Bonus points for a red line when the value has decreased, and green when the value has increased
I generate lots of these plots in a loop, sometimes with many groups, and with a different number of areas in each plot. The 2014 data is (at this stage) always displayed for a single group only, and every area has some data (except for one NA case, which I still need to provision for).
EDIT
I've built on the solution below. I used that exact code, but swapped in geom_linerange so that the lines are drawn without caps, and then added geom_errorbar with ymin and ymax set to the same value, which draws only the cap. The result is a one-sided error bar on a ggplot geom_bar. Thanks for the help.
I believe you can get most of what you want with a little data manipulation. Doing an outer join of the two datasets will let you add the error bars with the appropriate dodging.
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
To make the error bar one-sided, you'll want ymin to be either the same as y or NA depending on the group. It seemed easiest to make a new variable, which I called plotscore, to achieve this.
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
The last thing I did is to make a variable direction for when the 2015 score decreased vs increased compared to 2014. I included a third category for the Benchmark group as filler because I ran into some issues with the dodging without it.
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
The dataset used for plotting would look like this:
area group score.2015 score.2014 plotscore direction
1 first Benchmark -40 NA NA absent
2 first Findings -50 -30 -50 dec
3 second Benchmark -10 NA NA absent
4 second Findings 20 40 20 dec
5 third Benchmark 60 NA NA absent
6 third Findings 15 -15 15 inc
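For the case mentioned in the question where one area has no 2014 value, the same three steps should still apply, because the outer join keeps the 2015 row and fills score.2014 with NA. A minimal sketch, assuming a hypothetical data.2014 that lacks the "third" area:

```r
# Same 2015 data as in the question
data.2015 <- data.frame(score = c(-50, 20, 15, -40, -10, 60),
                        area  = c("first", "second", "third", "first", "second", "third"),
                        group = rep(c("Findings", "Benchmark"), each = 3))

# Hypothetical 2014 data with no value for the "third" area (an assumption)
data.2014 <- data.frame(score = c(-30, 40),
                        area  = c("first", "second"),
                        group = c("Findings", "Findings"))

# Outer join, then the same derived columns as above
alldat <- merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
                suffixes = c(".2015", ".2014"))
alldat$plotscore <- with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction <- with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] <- "absent"

# Both "third" rows are kept and flagged as having no 2014 value
alldat[alldat$area == "third", ]
```

Rows flagged "absent" have NA for plotscore and score.2014, so no error bar is drawn for them.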
The final code I used looked like this:
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(aes(ymin = plotscore, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
I'm using the development version of ggplot2, ggplot2_1.0.1.9002, and show_guide is now deprecated in favor of show.legend, which I used in geom_errorbar.
I obviously didn't change the line type of the error bars to dashed with a solid cap, nor did I remove the bottom whisker as I don't know an easy way to do either of these things.
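One caveat when these plots are generated in a loop: scale_color_manual(values = c(NA, "red", "green")) assigns colours by position, so I would expect the mapping to shift if a particular plot happens to contain no "dec" rows. A minimal sketch of a named vector that ties each colour to its direction value (direction_cols and the toy data are names introduced here for illustration only):

```r
library(ggplot2)

# Hypothetical name; the keys match the values stored in the direction column
direction_cols <- c(absent = NA, dec = "red", inc = "green")

# Toy data with no "dec" rows, as might happen for one plot in the loop
toy <- data.frame(area = c("first", "second"),
                  score.2015 = c(20, 15),
                  score.2014 = c(10, NA),
                  direction = c("inc", "absent"))

p <- ggplot(toy, aes(x = area, y = score.2015)) +
  geom_col() +
  geom_errorbar(aes(ymin = score.2015, ymax = score.2014, color = direction),
                width = 0.9, show.legend = FALSE) +
  coord_flip() +
  # Named values are matched by name, not by position
  scale_color_manual(values = direction_cols)
```

With the named vector, "inc" should stay green whether or not "dec" occurs in the data for that plot.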
In response to a comment suggesting I add the full solution as an answer:
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# reconfigure data to create values for the additional errorbar/linerange
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
# set the data min and max as the same to have a single 'cap' with no line
geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
#then add the line
geom_linerange(aes(ymin = score.2015, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
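Because the line and the cap are now separate layers, the "dashed line with a solid cap" from the original considerations can probably be had by setting linetype on geom_linerange only. A sketch under that assumption, with the data rebuilt so the block runs on its own (the named colour vector is my own substitution for the positional one):

```r
library(ggplot2)
library(scales)

data.2015 <- data.frame(score = c(-50, 20, 15, -40, -10, 60),
                        area  = c("first", "second", "third", "first", "second", "third"),
                        group = rep(c("Findings", "Benchmark"), each = 3))
data.2014 <- data.frame(score = c(-30, 40, -15),
                        area  = c("first", "second", "third"),
                        group = "Findings")

alldat <- merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
                suffixes = c(".2015", ".2014"))
alldat$direction <- with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] <- "absent"

p <- ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  # solid cap drawn at the 2014 value
  geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
                position = position_dodge(width = 0.9), show.legend = FALSE) +
  # dashed line from the end of the 2015 bar to the 2014 value
  geom_linerange(aes(ymin = score.2015, ymax = score.2014, color = direction),
                 position = position_dodge(width = 0.9), linetype = "dashed",
                 show.legend = FALSE) +
  coord_flip() +
  scale_y_continuous(limits = c(-70, 70), oob = squish) +
  scale_color_manual(values = c(absent = NA, dec = "red", inc = "green"))
```

Rows without a 2014 value have NA for ymin/ymax in both layers, so ggplot2 is expected to drop them with a warning rather than draw anything.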
On the same ggplot figure, I am trying to have the points (from geom_point), the lines (from geom_line) and the errorbars (from geom_errorbar) on the same "plane" (i.e. not overlapping), this for each factor.
As you can see, the "layering" of the errorbars does not follow the "layering" of the lines (not to mention the points).
Here is a reproducible example:
# reproducible example
# package
library(dplyr)
library(ggplot2)
# generate the data
set.seed(244)
d1 <- data.frame(time_serie = as.factor(rep(rep(1:3, each = 6), 3)),
treatment = as.factor(rep(c("HIGH", "MEDIUM", "LOW"), each = 18)),
value = runif(54, 1, 10))
# create the error intervals
d2 <- d1 %>%
dplyr::group_by(time_serie,treatment) %>%
dplyr::summarise(mean_value = mean(value),
SE_value = sd(value/sqrt(length(value)))) %>%
as.data.frame()
# plot
p1 <- ggplot(aes(x = time_serie, y = mean_value, color = treatment, group = treatment), data=d2)
p1
p1a <- p1 + geom_errorbar(aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value), width = .2, position = position_dodge(0.3), size =1) +
geom_point(aes(), position = position_dodge(0.3), size = 3) +
geom_line(aes(color = treatment), position=position_dodge(0.3), size =1)
p1a
Any idea?
Any help would be greatly appreciated :)
Thanks a lot!
Valérian
Up front: this is a partial answer that has two notable issues still to fix (see the end). Edit: the two issues have been resolved, see the far bottom.
I'll change the "dodge" slightly to clarify the point, identify an area of concern, and demonstrate a suggested workaround.
# generate the data
set.seed(244)
d1 <- data.frame(time_serie = as.factor(rep(rep(1:3, each = 6), 3)),
treatment = as.factor(rep(c("HIGH", "MEDIUM", "LOW"), each = 18)),
value = runif(54, 1, 10))
# create the error intervals
d2 <- d1 %>%
dplyr::group_by(time_serie,treatment) %>%
dplyr::summarise(mean_value = mean(value),
SE_value = sd(value/sqrt(length(value)))) %>%
dplyr::arrange(desc(treatment)) %>%
as.data.frame()
# plot
ggplot(aes(x = time_serie, y = mean_value, color = treatment, group = treatment), data=d2) +
geom_errorbar(aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
width = 0.2, position = position_dodge(0.03), size = 2) +
geom_point(aes(), position = position_dodge(0.03), size = 3) +
geom_line(aes(color = treatment), position = position_dodge(0.03), size = 2)
Namely, I'll assume that we want HIGH (red) points/lines/error-bars as the top-most layer, masked by nothing. We can see a clear violation of this in the right-most bar: the red dot is over the green errorbar but under the green line.
Unless/until there is an aes(layer=..) aesthetic (there is not afaik), you need to add layers one treatment at a time. While one could hard-code this with nine geoms, you can automate this with lapply. Note that ggplot(.) + list(geom1,geom2,geom3) works just fine, even with nested lists.
I'll control the order of layers with rev(levels(d2$treatment)), assuming that you want LOW as the bottom-most layer (ergo added first). The order of geoms within the list is what defines their layers. Technically we still have a single treatment's errorbar, point, and line on different layers, but they are consecutive so appear to be the same.
ggplot(aes(x = time_serie, y = mean_value, color = treatment, group = treatment), data=d2) +
lapply(rev(levels(d2$treatment)), function(trtmnt) {
list(
geom_errorbar(data = ~ subset(., treatment == trtmnt),
aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
width = 0.2, position = position_dodge(0.03), size = 2),
geom_point(data = ~ subset(., treatment == trtmnt), aes(), position = position_dodge(0.03), size = 3),
geom_line(data = ~ subset(., treatment == trtmnt), position = position_dodge(0.03), size = 2)
)
})
(Side note: I use levels(d2$treatment) and data=~subset(., treatment==trtmnt) here, but that's just one way to do it. Another would be lapply(split(d2, d2$treatment), function(x) ...) and use data=x in all of the inner geoms. This latter method allows for multi-variable grouping, if desired. I see no immediate advantage to one over the other.)
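The split() variant from the side note could look roughly like this. It is only a sketch: the summary data is made up, the factor levels are set explicitly (an assumption, so that HIGH ends up drawn last), and line widths are left at their defaults.

```r
library(ggplot2)

# Made-up summary data with the same columns as d2
set.seed(244)
d2 <- data.frame(
  time_serie = factor(rep(1:3, times = 3)),
  treatment  = factor(rep(c("HIGH", "MEDIUM", "LOW"), each = 3),
                      levels = c("HIGH", "MEDIUM", "LOW")),
  mean_value = runif(9, 1, 10),
  SE_value   = runif(9, 0.2, 1)
)

# One list of three geoms per treatment; rev() puts LOW first (bottom layer)
layers <- lapply(rev(split(d2, d2$treatment)), function(x) {
  list(
    geom_errorbar(data = x,
                  aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
                  width = 0.2),
    geom_point(data = x, size = 3),
    geom_line(data = x)
  )
})

p <- ggplot(d2, aes(x = time_serie, y = mean_value,
                    color = treatment, group = treatment)) +
  layers +
  scale_color_discrete(drop = FALSE)
```

To group by more than one variable, the second argument of split() can be a list of factors, which is the multi-variable case mentioned above.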
The problems with this:
1. The order of the legend is not consistent with the order of the factor levels; somehow that is lost. (To be clear, I don't demonstrate this very well here: I can move "medium" to the middle of the legend using levels<-, and it works with the non-lapply rendering code with incorrect layering, but it is again lost with the lapply-geoms.)
2. position_dodge no longer has awareness of the other treatments, so it does not dodge the other errorbars. The only way around this is to dodge manually before plotting, shown below.
1: Order of legend elements
This one was solved in lapply'd geoms lose factor-ordering, where we just need to add scale_color_discrete(drop=FALSE).
2: Dodging
The dodge issue can be fixed by using real numerics in the x aesthetic. This is kind of a hack, as it is no longer done by ggplot2 but controlled externally. It's also applying an offset and not dodging, per se. But it does get the desired results.
d2$time_serie2 <- as.integer(as.character(d2$time_serie)) + as.numeric(d2$treatment)/10
ggplot(aes(x = time_serie2, y = mean_value, color = treatment, group = treatment), data = d2) +
lapply(rev(levels(d2$treatment)), function(trtmnt) {
list(
geom_errorbar(data = ~ subset(., treatment == trtmnt),
aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
width = 0.2, size = 2),
geom_point(data = ~ subset(., treatment == trtmnt), aes(), size = 3),
geom_line(data = ~ subset(., treatment == trtmnt), size = 2)
)
}) +
scale_color_discrete(drop = FALSE)
I want to split the y axis of the attached figure so that the part with a score < 25 occupies most of the figure, while the remaining range takes up a small upper part.
I searched for this and understand that I should use scale_y_discrete(limits . I tried p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20, "//", 40:100)), but it doesn't work yet.
I used the attached data and this is my code
p<-ggscatter(data, x = "Year" , y = "Score" ,
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", #reg.line
conf.int = T,
cor.coef = F, cor.method = "pearson",
xlab = "Year" , ylab= "Score")
p<-p+ coord_cartesian(xlim = c(1980, 2020));p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
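To make the arithmetic concrete, here is a small base-R sketch of the same two formulas, written as standalone functions (squeeze and unsqueeze are names made up for this illustration, not part of scales). With a threshold of 25 and a squeeze factor of 10, a score of 65 sits 40 units above the threshold and is drawn only 4 units above it.

```r
threshold <- 25
squeeze_factor <- 10

# Standalone copies of the transform and its inverse (illustrative names)
squeeze <- function(x) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}
unsqueeze <- function(x) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores   <- c(5, 25, 35, 65)
squeezed <- squeeze(scores)     # 5, 25, 26, 29
restored <- unsqueeze(squeezed) # back to 5, 25, 35, 65
```

Values at or below the threshold pass through unchanged, and squeezed values stay above the threshold, which is why the inverse can use the same x > threshold test.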
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.
I'm trying to map different ranges (lines) into different regions in the plot (see below) using geom_segment but some of the ranges overlap and can't be shown at all.
This is a minimal example of the data frames:
# regions to draw as rectangles
start = c(1, 5, 8, 14)
end = c(3, 6, 12, 16)
regions = c(1, 2, 3, 4)
regions = data.frame(regions, start, end)
# ranges (lines) to draw as segments
from = c(1, 2, 5.5, 13.5)
to = c(3, 2.5, 6, 15)
lines = data.frame(from, to)
I plotted the regions with geom_rect and then plot the ranges (lines) with geom_segment.
This is the plot:
plot_splice <- ggplot() +
scale_x_continuous(breaks = seq(1,16)) +
scale_y_continuous() +
geom_hline(yintercept = 1.6,
size = 20,
alpha = 0.1) +
geom_rect(
data = regions,
mapping = aes(
xmin = start,
xmax = end,
ymin = 1.5,
ymax = 1.8,
)) +
geom_segment(
data = lines,
x = (lines$from),
xend = (lines$to),
y = 1.48,
yend = 1.48,
colour = "red",
size = 3
) +
ylim(1.0, 2.2) +
xlab("") +
theme_minimal()
The first plot is the one generated with the code whereas the second one is the desired plot.
As you can see, the second line overlaps with the first one, so you can't see the second line at all.
How can I change the code to produce the second plot?
I'm trying to use an ifelse statement, but I'm not sure what the test argument should be:
I want it to check for each range (line) if it is overlapped with any previous range (line) to change the y position by around .05, so it doesn't overlap.
lines <- lines %>%
dplyr::arrange(desc(from))
new_y$lines = ifelse(from[1] < to[0], 1.48, 1.3)
geom_segment(
data = lines,
x = (lines$from),
xend = (lines$to),
y = new_y,
yend = new_y,
colour = "red",
size = 3
)
Your geom_segment call isn't using any aesthetic mapping, which is how you normally get ggplot elements to change position based on a particular variable (or set of variables).
The stacking of the geom_segment based on the number of overlapping regions is best calculated ahead of the call to ggplot. This allows you to pass the x and y values into an aesthetic mapping:
# First ensure that the data frame is ordered by the start time
lines <- lines[order(lines$from),]
# Now iterate through each row, calculating how many previous rows have
# earlier starts but haven't yet finished when the current row starts.
# Multiply this number by a small negative offset and add the 1.48 baseline value
lines$offset <- 1.48 - 0.03 * sapply(seq(nrow(lines)), function(i) {
with(lines[seq(i),], length(which(from < from[i] & to > from[i])))
})
Now do the same plot but using aesthetic mapping inside geom_segment:
ggplot() +
scale_x_continuous(breaks = seq(1,16), name = "") +
scale_y_continuous(limits = c(1, 2.2), name = "") +
geom_hline(yintercept = 1.6,
size = 20,
alpha = 0.1) +
geom_rect(
data = regions,
mapping = aes(
xmin = start,
xmax = end,
ymin = 1.5,
ymax = 1.8,
)) +
geom_segment(
data = lines,
mapping = aes(
x = from,
xend = to,
y = offset,
yend = offset),
colour = "red",
size = 3
) +
theme_minimal()
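One thing to watch with the count-based offset: as far as I can tell, two segments can receive the same count while still overlapping each other (for example ranges 1-3, 2-10 and 4-5, where the last two both get a count of 1). If that matters for your data, a greedy "lane" assignment is a possible alternative. This is a different technique from the one above, and assign_lanes is a hypothetical helper written for this sketch:

```r
# Hypothetical helper: give each segment the lowest lane whose previous
# segment has already ended. Segments that merely touch share a lane.
assign_lanes <- function(from, to) {
  lane <- integer(length(from))
  lane_end <- numeric(0)  # right end of the last segment placed in each lane
  for (i in order(from, to)) {
    free <- which(lane_end <= from[i])
    k <- if (length(free) > 0) free[1] else length(lane_end) + 1L
    lane_end[k] <- to[i]
    lane[i] <- as.integer(k - 1L)
  }
  lane
}

lines <- data.frame(from = c(1, 2, 5.5, 13.5), to = c(3, 2.5, 6, 15))
lines$lane   <- assign_lanes(lines$from, lines$to)
lines$offset <- 1.48 - 0.03 * lines$lane

# The arrangement where counting overlaps would collide
tricky <- assign_lanes(from = c(1, 2, 4), to = c(3, 10, 5))
```

The resulting offset column can be passed to aes(y = offset, yend = offset) exactly as in the plot above.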
My question is similar to this question.
But I can't transfer it to my own data.
I have a dataframe like this (over 1400 rows):
Code Stationsname Startdatum LAT LON Höhe Area Mean
1 AT0ENK1 Enzenkirchen im Sauwald 03.06.1998 48.39167 13.67111 525 rural 55.76619
2 AT0ILL1 Illmitz 01.05.1978 47.77000 16.76640 117 rural 58.98511
3 AT0PIL1 Pillersdorf bei Retz 01.02.1992 48.72111 15.94223 315 rural 59.47489
4 AT0SON1 Sonnblick 01.09.1986 47.05444 12.95834 3106 rural 97.23856
5 AT0VOR1 Vorhegg bei Kötschach-Mauthen 04.12.1990 46.67972 12.97195 1020 rural 70.65373
6 AT0ZIL1 Ried im Zillertal 08.08.2008 47.30667 11.86389 555 rural 36.76401
Now I want to create a map with ggplot and display the points in different colors based on the value in the Mean column, which ranges from 18 to 98.
Also I would like to change the symbols from a dot to a triangle if the value in the column Höhe is over 700.
Until now I did this:
library(ggmap)
library(ggplot2)
Europe <- get_map(location = "Europe", zoom = 3)
p = ggmap(Europe)
p = p + geom_point(data = Cluster, aes(LON, LAT, color = Mean),
size = 1.5, pch = ifelse(Höhe < 700,'19','17')) +
scale_x_continuous(limits = c(-25.0, 40.00), expand = c(0, 0)) +
scale_y_continuous(limits = c(34.00, 71.0), expand = c(0, 0)) +
scale_colour_gradient ---??
But I don't know how to go on and assign the colors.
I had a discussion with the OP using his data. One of his issues was to make scale_colour_gradient2() work. The solution was to set up a midpoint value. By default, it is set at 0 in the function. In his case, he has a continuous variable that has about 50 as median.
library(ggmap)
library(ggplot2)
library(RColorBrewer)
Europe2 <- get_map(maptype = "toner-2011", location = "Europe", zoom = 4)
ggmap(Europe2) +
geom_point(data = Cluster, aes(x = LON, y = LAT, color = Mean, shape = Höhe > 700), size = 1.5, alpha = 0.4) +
scale_shape_manual(name = "Altitude", values = c(19, 17)) +
scale_colour_gradient2(low = "#3288bd", mid = "#fee08b", high = "#d53e4f",
midpoint = median(Cluster$Mean, na.rm = TRUE))
The colors do not differentiate well on the map, because the values tend to stay close to the median. I think the OP needs to either create a new grouping variable with cut() and assign colors to the groups, or use another scale_color type of function. I came up with the following using the RColorBrewer package. The OP should consider how he wants to use color to polish the graphic.
ggmap(Europe2) +
geom_point(data = Cluster, aes(x = LON, y = LAT, color = Mean, shape = Höhe > 700), size = 1.5, alpha = 0.4) +
scale_shape_manual(name = "Altitude", values = c(19, 17)) +
scale_colour_distiller(palette = "Set1")
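The cut() idea mentioned above might look like the following. It is a rough sketch only: the toy data uses rounded values from the six rows shown in the question, the bin edges are arbitrary assumptions, and a plain ggplot() stands in for ggmap() so the block runs without downloading a map.

```r
library(ggplot2)

# Toy stand-in for the Cluster data frame (rounded values from the question)
Cluster <- data.frame(
  LON  = c(13.67, 16.77, 15.94, 12.96, 12.97, 11.86),
  LAT  = c(48.39, 47.77, 48.72, 47.05, 46.68, 47.31),
  Mean = c(55.77, 58.99, 59.47, 97.24, 70.65, 36.76)
)

# Assumed bin edges; choose ones that suit the real distribution of Mean
Cluster$Mean_bin <- cut(Cluster$Mean, breaks = c(0, 40, 60, 80, 100))

p <- ggplot(Cluster, aes(x = LON, y = LAT, colour = Mean_bin)) +
  geom_point(size = 3) +
  # Sequential ColorBrewer palette, one colour per bin
  scale_colour_brewer(palette = "YlOrRd", name = "Mean", drop = FALSE)
```

Because the bins are a factor, each group gets its own clearly distinct colour, instead of most points sharing a similar shade near the median.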
I am trying to create a color scale with a sharp color transition at one point. What I am currently doing is:
test <- data.frame(x = c(1:20), y = seq(0.01, 0.2, by = 0.01))
cutoff <- 0.10
ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y), width = 1, binwidth = 0)) +
geom_bar(stat = "identity") +
scale_fill_gradientn(colours = c("red", "red", "yellow", "green"),
values = rescale(log(c(0.01, cutoff - 0.0000000000000001, cutoff, 0.2))),
breaks = c(log(cutoff)), label = c(cutoff))
It is producing the plots I want. But the position of the break in colorbar somehow varies depending on the cutoff. Sometimes below the value, sometimes above, sometimes on the line. Here are some plots with different cutoffs (0.05, 0.06, 0.1):
What am I doing wrong? Or alternatively, is there a better way to create such a color scale?
Have you looked into scale_colour_steps or scale_colour_stepsn?
Using the n.breaks option of scale_colour_stepsn, you should be able to specify the number of breaks you want and get sharper transitions.
Be sure to use ggplot2 > 3.3.2
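A minimal sketch of what that could look like for the data in the question, using the fill variant scale_fill_stepsn with a single explicit break at the cutoff. The untransformed y is mapped to fill here for simplicity, and the two colours are an assumption; as far as I understand, the shade each bin receives is derived from the supplied colours rather than being exactly those colours.

```r
library(ggplot2)

test <- data.frame(x = 1:20, y = seq(0.01, 0.2, by = 0.01))
cutoff <- 0.10

# Binned fill scale: one break gives two bins, one on each side of the cutoff
step_scale <- scale_fill_stepsn(colours = c("red", "green"),
                                breaks = cutoff,
                                limits = c(0.01, 0.2))

p <- ggplot(test, aes(x = factor(x), y = y, fill = y)) +
  geom_col(width = 1) +
  step_scale
```

Since the legend of a binned scale is drawn in discrete steps, the break should sit exactly on the boundary between the two colours.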
In case you are still interested in a solution for this, you can add guide = guide_colourbar(nbin = <some arbitrarily large number>) to scale_fill_gradientn(). This increases the number of bins used by the colourbar legend, which makes the transition look sharper.
# illustration using nbin = 1000, & weighted colours below the cutoff
plot.cutoff <- function(cutoff){
p <- ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y))) +
geom_col(width = 1) +
scale_fill_gradientn(colours = c("red4", "red", "yellow", "green"),
values = scales::rescale(log(c(0.01, cutoff - 0.0000000000000001,
cutoff, 0.2))),
breaks = c(log(cutoff)),
label = c(cutoff),
guide = guide_colourbar(nbin = 1000))
return(p)
}
cowplot::plot_grid(plot.cutoff(0.05),
plot.cutoff(0.06),
plot.cutoff(0.08),
plot.cutoff(0.1),
ncol = 2)
(If you find the above insufficiently sharp at very high resolutions, you can also set raster = FALSE in guide_colourbar(), which turns off interpolation & draws rectangles instead.)
I think it is slightly tricky to achieve an exact, discrete cutoff point in the continuous color scale using scale_fill_gradientn. A quick alternative would be to use scale_fill_gradient, set the cutoff with limits, and set the color of 'out-of-bounds' values with na.value.
Here's a slightly simpler example than in your question:
# some data
df <- data.frame(x = factor(1:10), y = 1, z = 1:10)
# a cutoff point
lo <- 4
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green",
limits = c(lo, max(df$z)), na.value = "red")
As you see, the values below your cutpoint will not appear in the legend, but one may consider including a large chunk of red a waste of "legend band width" anyway. You might just add a verbal description of the red bars in the figure caption instead.
You may also wish to differentiate between values below a lower cutpoint and above an upper cutpoint. For example, set 'too low' values to blue and 'too high' values to red. Here I use findInterval to differentiate between low, mid and high values.
# some data
set.seed(2)
df <- data.frame(x = factor(1:10), y = 1, z = sample(1:10))
# lower and upper limits
lo <- 3
hi <- 8
# create a grouping variable based on the break points
df$grp <- findInterval(df$z, c(lo, hi), rightmost.closed = TRUE)
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green", limits = c(lo, hi), na.value = "red") +
geom_bar(data = df[df$grp == 0, ], fill = "blue", stat = "identity")