I want to output two plots in a grid using the same function but with different input for x. I am using ggplot2 with stat_function as per this post and I have combined the two plots as per this post and this post.
f01 <- function(x) {1 - abs(x)}
ggplot() +
stat_function(data = data.frame(x=c(-1, 1)), aes(x = x, color = "red"), fun = f01) +
stat_function(data = data.frame(x=c(-2, 2)), aes(x = x, color = "black"), fun = f01)
With the following outputs:
Plot:
Message:
`mapping` is not used by stat_function()
`data` is not used by stat_function()
`mapping` is not used by stat_function()
`data` is not used by stat_function()
I don't understand why stat_function() won't use either of the arguments. I would expect it to plot two graphs, one with x between -1 and 1 and the second with x between -2 and 2. Furthermore, it takes the colors as labels, and I don't understand why. I must be missing something obvious.
The issue is that according to the docs the data argument is
Ignored by stat_function(), do not use.
Hence, at least in the second call to stat_function the data is ignored.
Second, the docs state:
The function is called with a grid of evenly spaced values along the x axis, and the results are drawn (by default) with a line.
Therefore both functions are plotted over the same range of x values.
If you simply want to draw functions, this can be achieved without data and mappings like so:
library(ggplot2)
f01 <- function(x) {1 - abs(x)}
ggplot() +
stat_function(color = "black", fun = f01, xlim = c(-2, 2)) +
stat_function(color = "red", fun = f01, xlim = c(-1, 1))
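To check that xlim really controls where each function is evaluated, one option is to inspect the data each layer computes. This is only a sketch: it assumes a ggplot2 version recent enough for stat_function() to work without data, and the names p, wide and narrow are just illustrative.

```r
library(ggplot2)

f01 <- function(x) 1 - abs(x)

p <- ggplot() +
  stat_function(fun = f01, xlim = c(-2, 2), colour = "black") +
  stat_function(fun = f01, xlim = c(-1, 1), colour = "red")

# layer_data() returns the data frame a given layer actually computed
wide   <- layer_data(p, 1)  # first layer, evaluated on [-2, 2]
narrow <- layer_data(p, 2)  # second layer, evaluated on [-1, 1]

range(wide$x)
range(narrow$x)
```

Each layer gets its own evenly spaced grid over its xlim, so the red line stops at -1 and 1 while the black one spans the full panel.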
To be honest, I'm not really sure what happens here with ggplot and its inner workings. It seems that the functions are always applied to the complete range, here -2 to 2. Also, there is an issue on github regarding a wrong error message for stat_function.
However, you can use the xlim argument for your stat_function to limit the range on which a function is drawn. Also, if you don't specify the colour argument by a variable, but by a manual label, you need to tell which colours should be used for which label with scale_colour_manual (easiest with a named vector). I also adjusted the line width to show the function better:
library(ggplot2)
f01 <- function(x) {1 - abs(x)}
cols <- c("red" = "red", "black" = "black")
ggplot() +
stat_function(data = data.frame(x=c(-1, 1)), aes(x = x, colour = "red"), fun = f01, size = 1.5, xlim = c(-1, 1)) +
stat_function(data = data.frame(x=c(-2, 2)), aes(x = x, colour = "black"), fun = f01) +
scale_colour_manual(values = cols)
I am trying to add lines for confidence intervals in R but lines() isn't working. In the following code b is a dataframe, 100 observations of 2 variables 'pred' and 'se'.
plot(c(1:300),b$pred,type="l",lwd=1.5)
lines(c(1:300),b$pred+2*b$se,type="l",lty=2,col='red')
The first line is working but the second is not. I have tried it with and without the x values (plot works with or without, lines works for neither). I can get lines to work for different dataframes, but not this one.
It seems very fragile to me to use 1:300 when also referencing b; it might work when b has 300 rows, but any other time it will either fail (plot() and lines() stop with "'x' and 'y' lengths differ") or, in code that recycles, silently show a misleading/meaningless plot. In general, "never" use hard-coded numbers when working programmatically like this; seq_len(nrow(b)) is a better choice than 1:300.
The bounds (x/y limits) for the plot are defined with the first plot command. After that, in base R graphics, no other plotting command will alter the limits. This means it is highly likely that all of pred+2*se are greater than max(pred): R does draw the lines, but they lie outside the plotting region, so nothing visible appears.
For this, you need to set the limits up front, perhaps:
# pred and pred+2*se are y values, so it is the y limits that must cover both
ylims <- with(b, range(c(pred, pred+2*se), na.rm = TRUE))
plot(seq_len(nrow(b)), b$pred, type="l", lwd=1.5, ylim=ylims)
lines(seq_len(nrow(b)), b$pred+2*b$se, type="l", lty=2, col='red')
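Here is a small self-contained sketch of the same idea with both bounds. The data frame b is made up for illustration (deliberately with standard errors much larger than the spread of pred), and pdf(NULL) is used only so the example runs without opening a window.

```r
set.seed(1)
# Hypothetical stand-in for the asker's data
b <- data.frame(pred = rnorm(100, mean = 10, sd = 0.1),
                se   = runif(100, min = 1, max = 2))

idx   <- seq_len(nrow(b))
upper <- b$pred + 2 * b$se
lower <- b$pred - 2 * b$se

# The y limits must cover everything that will be drawn later
ylims <- range(c(b$pred, upper, lower), na.rm = TRUE)

pdf(NULL)                       # null device: nothing is written to disk
plot(idx, b$pred, type = "l", lwd = 1.5, ylim = ylims)
usr_before <- par("usr")        # plotting region set by plot()
lines(idx, upper, lty = 2, col = "red")
lines(idx, lower, lty = 2, col = "red")
usr_after <- par("usr")         # unchanged: lines() never rescales the axes
invisible(dev.off())
```

Comparing usr_before and usr_after shows that lines() leaves the plotting region alone, which is why the limits have to be right in the initial plot() call.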
That should address your question. Continue reading if you want to consider migration to ggplot2 ... not a one-for-one migration, not trivial, and perhaps premature at this point, but still something to think about.
While the above should fix the problem you cited, you might also consider migrating to ggplot2: it allows many other things (too many to discuss here), including the feature of updating the x/y limits with every "layer" you add to it. For instance, something like the following should work:
library(ggplot2)
ggplot(b, aes(x = seq_along(pred), y = pred)) +
geom_line(linewidth = 1.5) + # this is doing what your first 'plot' is doing
geom_line(aes(y = pred + 2*se), linetype = 2, color = "red") # your call to lines (lty=2 is linetype = 2)
(Notice no need to handle the x/y limits manually, ggplot2 figures it out for you with each layer added.)
I'm going to infer that you'll want to add a pred - 2*se as well, in which case it'll be another call to geom_line, as in
ggplot(b, aes(x = seq_along(pred), y = pred)) +
geom_line(linewidth = 1.5) +
geom_line(aes(y = pred + 2*se), linetype = 2, color = "red") +
geom_line(aes(y = pred - 2*se), linetype = 2, color = "blue")
Note that ggplot2 would actually prefer that you handle this with "long" data ... in that case, we can do something like below:
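library(dplyr)
library(tidyr) # pivot_longer
library(ggplot2)
b %>%
select(pred, se) %>% # b has no x column yet; it is created below
mutate(
x = row_number(),
sehigh = pred + 2*se,
selow = pred - 2*se
) %>%
select(-se) %>% # drop se so it is not drawn as a fourth line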
pivot_longer(-x, names_to = "type", values_to = "val") %>%
ggplot(aes(x, val, group = type, color = type)) +
geom_line() +
scale_color_manual(values = c(pred = "black", sehigh = "red", selow = "blue"))
In this case, only one call to geom_line, and ggplot will handle colors automatically (based on the new categorical variable type that we created in a previous step).
I am trying to use ggplot to draw a radar-chart following the guidelines from the Grammar of Graphics. I am aware of the ggradar package but based on the grammar it looks like coord_polar should be enough here. This is the pseudo-code from the grammar:
So I thought something like this may work, however, the contour of the area chart is curved as if I used geom_line:
library(tidyverse)
dd <- tibble(category = c('A', 'B', 'C'), value = c(2, 7, 4))
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_polar(theta = 'x') +
geom_area(color = 'blue', alpha = .00001) +
geom_point()
While I understand why geom_line draws arcs once in coord_polar, my understanding of the explanation from the Grammar of Graphics is that there may be an element/geom area that could plot straight lines:
There is one technical detail concerning the shape of Figure 9.29. Why
is the outer edge of the area graphic a set of straight lines instead
of arcs? The answer has to do with what is being measured. Since
region is a categorical variable, the line segments linking regions
are not in a metric region of the graph. That is, the segments of the
domain between regions are not measurable and thus the straight lines
or edges linking them are arbitrary and perhaps not subject to
geometric transformation. There is one other problem with the
grammatical specification of this figure. Can you spot it? Undo the
polar transformation and think about the domain of the plot. We
cheated.
For completeness, this question derives from this other question I asked about plotting in polar system.
tl;dr we can write a function to solve this problem.
Indeed, ggplot uses a process called data munching for non-linear coordinate systems to draw lines. It basically breaks up a straight line in many pieces, and applies the coordinate transformation on the individual pieces instead of merely the start- and endpoints of lines.
If we look at the panel drawing code of for example GeomArea$draw_group:
function (data, panel_params, coord, na.rm = FALSE)
{
...other_code...
positions <- new_data_frame(list(x = c(data$x, rev(data$x)),
y = c(data$ymax, rev(data$ymin)), id = c(ids, rev(ids))))
munched <- coord_munch(coord, positions, panel_params)
ggname("geom_ribbon", polygonGrob(munched$x, munched$y, id = munched$id,
default.units = "native", gp = gpar(fill = alpha(aes$fill,
aes$alpha), col = aes$colour, lwd = aes$size * .pt,
lty = aes$linetype)))
}
We can see that a coord_munch is applied to the data before it is passed to polygonGrob, which is the grid package function that matters for drawing the data. This happens in almost any line-based geom for which I've checked this.
Subsequently, we would like to know what is going on in coord_munch:
function (coord, data, range, segment_length = 0.01)
{
if (coord$is_linear())
return(coord$transform(data, range))
...other_code...
munched <- munch_data(data, dist, segment_length)
coord$transform(munched, range)
}
We find the logic I mentioned earlier that non-linear coordinate systems break up lines in many pieces, which is handled by ggplot2:::munch_data.
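To make the effect of munching concrete, here is a rough base-R sketch. It is not ggplot2's actual implementation; polar_to_xy() and munch_segment() are hypothetical helpers written only for this illustration.

```r
# Hypothetical helper: map polar (theta, r) to Cartesian (x, y)
polar_to_xy <- function(theta, r) {
  data.frame(x = r * sin(theta), y = r * cos(theta))
}

# Hypothetical helper: cut a straight data-space segment into n pieces
munch_segment <- function(theta, r, n = 50) {
  data.frame(theta = seq(theta[1], theta[2], length.out = n + 1),
             r     = seq(r[1], r[2], length.out = n + 1))
}

theta <- c(0, pi / 2)  # two points a quarter turn apart
r     <- c(1, 1)       # both at radius 1

# "Linear" treatment: transform only the endpoints, join with a chord
ends      <- polar_to_xy(theta, r)
chord_mid <- colMeans(ends)

# "Munched" treatment: transform every intermediate piece, giving an arc
pieces  <- munch_segment(theta, r)
arc     <- polar_to_xy(pieces$theta, pieces$r)
arc_mid <- unlist(arc[26, ])  # middle of the 51 munched points

sqrt(sum(chord_mid^2))  # distance of the chord midpoint from the origin
sqrt(sum(arc_mid^2))    # distance of the arc midpoint from the origin
```

The chord's midpoint ends up closer to the origin than the arc's midpoint, which stays at radius 1. That gap is the visual difference between straight edges and the curved ones produced by munching.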
It would seem to me that we can trick ggplot into transforming straight lines, by somehow setting the output of coord$is_linear() to always be true.
Lucky for us, we wouldn't have to get our hands dirty by doing some deep ggproto based stuff if we just override the is_linear() function to return TRUE:
# Almost identical to coord_polar()
coord_straightpolar <- function(theta = 'x', start = 0, direction = 1, clip = "on") {
theta <- match.arg(theta, c("x", "y"))
r <- if (theta == "x")
"y"
else "x"
ggproto(NULL, CoordPolar, theta = theta, r = r, start = start,
direction = sign(direction), clip = clip,
# This is the different bit
is_linear = function(){TRUE})
}
So now we can plot away with straight lines in polar coordinates:
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_straightpolar(theta = 'x') +
geom_area(color = 'blue', alpha = .00001) +
geom_point()
Now to be fair, I don't know what the unintended consequences are for this change. At least now we know why ggplot behaves this way, and what we can do to avoid it.
EDIT: Unfortunately, I don't know of an easy/elegant way to connect the points across the axis limits but you could try code like this:
# Refactoring the data
dd <- data.frame(category = c(1,2,3,4), value = c(2, 7, 4, 2))
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_straightpolar(theta = 'x') +
geom_path(color = 'blue') +
scale_x_continuous(limits = c(1,4), breaks = 1:3, labels = LETTERS[1:3]) +
scale_y_continuous(limits = c(0, NA)) +
geom_point()
Some discussion about polar coordinates and crossing the boundary, including my own attempt at solving that problem, can be seen here: geom_path() refuses to cross over the 0/360 line in coord_polar()
EDIT2:
I'm mistaken, it seems quite trivial anyway. Assume dd is your original tibble:
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_straightpolar(theta = 'x') +
geom_polygon(color = 'blue', alpha = 0.0001) +
scale_y_continuous(limits = c(0, NA)) +
geom_point()
I am trying to understand the connection between scale_fill_brewer and scale_fill_manual of package ggplot2.
First, generate a ggplot with filled colors:
library(ggplot2)
p <- ggplot(data = mtcars, aes(x = mpg, y = wt,
group = cyl, fill = factor(cyl))) +
geom_area(position = 'stack')
# apply ready-made palette with scale_fill_brewer from ggplot2
p + scale_fill_brewer(palette = "Blues")
Now, replicate with scale_fill_manual
library(RColorBrewer)
p + scale_fill_manual(values = brewer.pal(3, "Blues"))
where 3 is the number of fill-colors in the data. For convenience, I have used the brewer.pal function of package RColorBrewer.
As far as I understand, the convenience of scale_fill_brewer is that it automatically computes the number of unique levels in the data (3 in this example). Here is my attempt at replicating:
p + scale_fill_manual(values = brewer.pal(length(levels(factor(mtcars$cyl))), "Blues"))
My question is: how does scale_fill_brewer compute the number of levels in the data?
I'm interested in understanding what else scale_fill_brewer might be doing under the hood. Might I run into any difficulty if I replace the more user-friendly scale_fill_brewer with a more contorted implementation of scale_fill_manual like the one above?
Perusing the source code:
scale_fill_brewer
function (..., type = "seq", palette = 1) {
discrete_scale("fill", "brewer", brewer_pal(type, palette), ...)
}
I couldn't see through this how scale_fill_brewer computes the number of unique levels in the data. Perhaps hidden in the ... ?
Edit: Where does the function scale_fill_brewer receive instructions to compute the number of levels in the data? Is it in "seq" or in ... or elsewhere?
The discrete_scale function is intricate and I'm lost. Here are its arguments:
discrete_scale <- function(aesthetics, scale_name, palette, name = NULL,
breaks = waiver(), labels = waiver(), legend = NULL, limits = NULL,
expand = waiver(), na.value = NA, drop = TRUE, guide="legend") {
Does any of this compute the number of levels?
The easiest way to trace it is to think in terms of (1) setting up the plot data structure, and (2) resolving the aesthetics. It uses S3, so the branching is implicit.
The setup call sequence
[scale-brewer.R] scale_fill_brewer(type="seq", palette="Blues")
[scale-.R] discrete_scale(...) - return an object representing the scale
structure(list(
call = match.call(),
aesthetics = aesthetics,
scale_name = scale_name,
palette = palette,
range = DiscreteRange$new(), ## this is scales::DiscreteRange
...), class = c(scale_name, "discrete", "scale"))
The resolve call sequence
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_train_df(...) - iterate again over scales$scales (list)
[scale-.r] scale_train_df(...) - iterate again
[scale-.r] scale_train(...) - S3 generic function
[scale-.r] scale_train.discrete(...) - almost there...
scale$range$train(x, drop = scale$drop)
but scale$range is a DiscreteRange instance, so it calls (scales::DiscreteRange$new())$train, which overwrites scale$range!
range <<- train_discrete(x, range, drop)
scales:::train_discrete(...) - again, almost there...
scales:::discrete_range(...) - still not there..
scales:::clevels(...) - there it is!
As of this point, scale$range has been overwritten by the levels of the factor. Unwinding the call stack to #1, we now call scales_map_df
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_map_df(...) - iterate
[scale-.r] scale_map_df(...) - iterate
[scale-.r] scale_map.discrete - fill up the palette (non-position scale!)
scale_map.discrete <- function(scale, x, limits = scale_limits(scale)) {
n <- sum(!is.na(limits))
pal <- scale$palette(n)
...
}
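The last step can be imitated outside of ggplot2. This is a simplified sketch, not the internal code path: it assumes the scales package (which relies on RColorBrewer) is installed, and the limits vector here simply stands in for what the trained scale would hold.

```r
library(scales)  # provides brewer_pal()

# The palette is a function of n, not a fixed vector of colours
pal <- brewer_pal(type = "seq", palette = "Blues")

# Stand-in for the trained scale limits: the levels seen in the data
limits <- levels(factor(mtcars$cyl))  # "4" "6" "8"

# Same counting as in scale_map.discrete above
n <- sum(!is.na(limits))
fill_cols <- pal(n)

fill_cols
```

So the number of levels is never passed to scale_fill_brewer() itself; it is only known once the scale has been trained on the data, at which point the palette function is called with that count.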
I am trying to plot a scatterplot using ggplot2 in R. I have data as follows in csv format
A B
-4.051587034 -2.388276692
-4.389339837 -3.742321425
-4.047207557 -3.460923901
-4.458420756 -2.462180905
-2.12090412 -2.251811973
I want to highlight the two specific dots that correspond to B = -2.462180905 and B = -3.742321425, and draw them in colors different from the default color used for the other points. I tried the following code
library(ggplot2)
library(reshape2)
library(methods)
library(RSvgDevice)
Data<-read.csv("table.csv",header=TRUE,sep=",")
data1<-Data[,-3]
plot2<-ggplot(data1,aes(x = A, y = B)) + geom_point(aes(size=2,color=ifelse(y=-2.462180905,'red')))
graph<-plot2 + theme_bw()+opts(axis.line = theme_segment(colour = "black"),panel.grid.major=theme_blank(),panel.grid.minor=theme_blank(),panel.border = theme_blank())
ggsave(graph,file="figure.svg",height=6,width=7)
It is not working the way I want. It gives all dots in the same color. Can anybody help?
Another way, which may be more or less efficient depending on your requirements, would be to add another geom_point():
x <- c(-4.051587034, -4.389339837, -4.047207557, -4.458420756, -2.12090412)
y <- c(-2.388276692, -3.742321425, -3.460923901, -2.462180905, -2.251811973)
d <- data.frame(x, y)
require("ggplot2")
h <- c(2, 4) # put row numbers in here or use condition
ggplot() +
geom_point(data = d, aes(x, y), colour = "red", size = 5) +
geom_point(data = d[h, ], aes(x, y), colour = "blue", size = 5)
# notice the colour is outside the aesthetic arguments
Which gives you this:
Add a different column with the same value for all points except the highlighted point, assign the colour aesthetic to that column, then change the colours manually.
data1$highlight <- data1$B == -2.462180905 # FALSE except for the one you want
ggplot(data1, aes(x = A, y = B)) +
geom_point(aes(colour = highlight), size = 2) +
scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"))
Note that the condition in the first line will have to be exact in order to get TRUE at the right row. Either ensure the value is exact or use a condition that will match the desired row.
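If exact equality turns out to be too brittle (for example after the values have been read from a file and rounded), one option is to compare within a tolerance. A minimal sketch using the values from the question; targets and tol are names and values chosen here for illustration.

```r
data1 <- data.frame(
  A = c(-4.051587034, -4.389339837, -4.047207557, -4.458420756, -2.12090412),
  B = c(-2.388276692, -3.742321425, -3.460923901, -2.462180905, -2.251811973)
)

targets <- c(-2.462180905, -3.742321425)  # B values to highlight
tol     <- 1e-8                           # assumed tolerance, adjust to your data

# TRUE where B is within tol of any target value
data1$highlight <- vapply(
  data1$B,
  function(b) any(abs(b - targets) < tol),
  logical(1)
)

which(data1$highlight)  # rows 2 and 4
```

The resulting highlight column can be mapped to colour exactly as in the code above.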
Also note that opts is deprecated. Use theme instead. But that's another question.
I am trying to use geom_point to illustrate the count of my data. I would also like to annotate a few of the points in my graph with geom_text. When I add the call to geom_text, it appears that it is plotting something underneath the points in the legend. I've tried reversing the order of the layers to no avail. I can't wrap my head around why it is doing this. Has anyone seen this before?
set.seed(42)
df <- data.frame(x = 1:10
, y = 1:10
, label = sample(LETTERS,10, replace = TRUE)
, count = sample(1:300, 10, replace = FALSE)
)
p <- ggplot(data = df, aes(x = x, y = y, size = count)) + geom_point()
p + geom_text(aes(label = label, size = 150, vjust = 2))
This happened to me all the time. The trick is knowing that aes() maps data to aesthetics. If there's no data to map (e.g., if you have a single value that you determine), there's no reason to use aes(). I believe that only things inside of an aes() will show up in your legend.
Furthermore, when you specify mappings inside of ggplot(aes()), those mappings apply to every subsequent layer. That's good for your x and y, since both geom_point and geom_text use them. That's bad for size = count, as that only applies to the points.
So these are my two rules to prevent this kind of thing:
1. Only put data-based mappings inside of aes(). If the argument is taking a single pre-determined value, pass it to the layer outside of aes().
2. Map data only for those layers that will use it. Corollary: only map data inside of ggplot(aes()) if you're confident that every subsequent layer will use it. Otherwise, map it at the layer level.
So I would plot this thusly:
p <- ggplot(data = df, aes(x = x, y = y)) + geom_point(aes(size = count))
p + geom_text(aes(label = label), size = 4, vjust = 2)
or, if you need to specify the size of the text inside aes(), then show.legend = FALSE (called show_guide = FALSE in ggplot2 versions before 2.0.0) suppresses drawing the legend for that geom:
p <- ggplot(data = df, aes(x = x, y = y, size = count)) + geom_point()
p + geom_text(aes(label = label, size = 150, vjust = 2), show.legend = FALSE)