I'm working with some grid data and I'm having problems with working with discrete diverging scales. Specifically, how to set the midpoint so it's not at the center of the range. This is a reproducible example to get what i mean:
library(ggplot2)
grid <- expand.grid(lon = seq(0, 360, by = 2), lat = seq(-90, 0, by = 2))
grid$z <- with(grid, cos(lat*pi/180) - .7)
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = cut_width(z, .1))) +
scale_fill_brewer(palette = "RdBu")
Here, the center of the scale is not a the divide between positive and negative values. I know I could use a continuous scale, but I find that having fewer colours help with what I'm trying to show.
Is there a way to shift the midpoint in a discrete scale? Other alternatives that achieve the same result are welcome too.
The issue is that your cut points are not falling symmetrically around 0, and are mapping directly to your colors. One approach is to manually set your cut points so that they center around 0. Then, just make sure to not drop unused levels in the legend:
zCuts <-
seq(-.7, 0.7, length.out = 10)
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = cut(z, zCuts))) +
scale_fill_brewer(palette = "RdBu"
, drop = FALSE)
If you are willing to go with a gradient instead of such discrete colors, you can use scale_fill_gradient2 which by default centers at 0 and ranges between two colors:
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = z)) +
scale_fill_gradient2()
Or, if you really want the interpolation from Color Brewer, you can set the limits argument in scale_fill_distiller and get a gradient that way instead. Here, I set them at + and - the range around 0 (max(abs(grid$z)) is getting the largest deviation from 0, whether it is the min or the max, to ensure that the range is symetrical). If you are using more than the 11 available values, that is probably the best way to go:
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = z)) +
scale_fill_distiller(palette = "RdBu"
, limits = c(-1,1)*max(abs(grid$z))
)
If you want more colors, without doing a gradient, you are probably going to need to construct your own palette manually with more colors. The more you add, the less the distinction between the colors you will find. Here is one example stitching together two palettes to ensure that you are working from colors that are distinct.
zCuts <-
seq(-.7, 0.7, length.out = 20)
myPallette <-
c(rev(brewer.pal(9, "YlOrRd"))
, "white"
, brewer.pal(9, "Blues"))
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = cut(z, zCuts))) +
scale_fill_manual(values = myPallette
, drop = FALSE)
Related
Using ggplot, you can change the width of a bar of a bar graph by modifying width:
geom_bar(stat="identity",position=position_dodge(),width = .9)
You can uniformly change the distance of the bars using position_dodge():
geom_bar(stat="identity",position=position_dodge(1),width = .9)
How do I customize the distance between bars so they are varied in a ununiform manner?
It's not clear exactly what you mean. I'm assuming you mean you have a discrete x axis variable and you wish to specify custom spacing between each bar. It's possible to use position_jitter to get random spacing, though this also affects bar width and I'm guessing is not what you want.
I would probably handle this by using a numeric x scale and relabelling the axis with my factor levels:
library(ggplot2)
ggplot(data = data.frame(x = 1:10 + rep(c(0.1, -0.1), 5), y = sample(11:20))) +
geom_bar(aes(x, y, fill = factor(x)), color = "black", stat = "identity") +
scale_x_continuous(breaks = 1:10 + rep(c(0.1, -0.1), 5),
labels = LETTERS[1:10]) +
guides(fill = guide_none())
Of course, we can only guess at what you really want since you didn't provide a motivating example.
I would like to plot with gglot's geom_raster a 2D plot with 2 different gradients, but I do not know if there is a fast and elegant solution for this and I am stuck.
The effect that I would like to see is the overlay of multiple geom_raster, essentially. Also, I would need a solution that scales to N different gradients; let me give an example with N=2 gradients which is easier to follow.
I first create a 100 x 100 grid of positions X and Y
# the domain are 100 points on each axis
domain = seq(0, 100, 1)
# the grid with the data
grid = expand.grid(domain, domain, stringsAsFactors = FALSE)
colnames(grid) = c('x', 'y')
Then I compute one value per grid point; imagine something stupid like this
grid$val = apply(grid, 1, function(w) { w['x'] * w['y'] }
I know how to plot this with a custom white to red gradient
ggplot(grid, aes(x = x, y = y)) +
geom_raster(aes(fill = val), interpolate = TRUE) +
scale_fill_gradient(
low = "white",
high = "red", aesthetics = 'fill')
But now imagine I have another value per grid point
grid$second_val = apply(grid, 1, function(w) { w['x'] * w['y'] + runif(1) }
Now, how do I plot a grid where each position "(x,y)" is coloured with an overlay of:
1 "white to red" gradient with value given by val
1 "white to blue" gradient with value given by second_val
Essentially, in most applications val and second_val will be two 2D density functions and I would like each gradient to represent the density value. I need two different colours to see the different distribution of the values.
I have seen this similar question but don't know how to use that answer in my case.
#Axeman's answer to my question, which you linked to, applies directly the same to your question.
Note that scales::color_ramp() uses values between 0 and 1, so normalize val and second_val between 0, 1 before plotting
grid$val_norm <- (grid$val-min(grid$val))/diff(range(grid$val))
grid$second_val_norm <- (grid$second_val-min(grid$second_val))/diff(range(grid$second_val))
Now plot using #Axeman's answer. You can plot one later as raster, and overlay the second with annotate. I have added transparency (alpha=.5) otherwise you'll only be able to see the second layer.:
ggplot(grid, aes(x = x, y = y)) +
geom_raster(aes(fill=val)) +
scale_fill_gradient(low = "white", high = "red", aesthetics = 'fill') +
annotate(geom="raster", x=grid$x, y=grid$y, alpha=.5,
fill = scales::colour_ramp(c("transparent","blue"))(grid$second_val_norm))
Or, you can plot both layers using annotate().
# plot using annotate()
ggplot(grid, aes(x = x, y = y)) +
annotate(geom="raster", x=grid$x, y=grid$y, alpha=.5,
fill = scales::colour_ramp(c("transparent","red"))(grid$val_norm)) +
annotate(geom="raster", x=grid$x, y=grid$y, alpha=.5,
fill = scales::colour_ramp(c("transparent","blue"))(grid$second_val_norm))
I have got a map with a legend gradient and I would like to add a box for the NA values. My question is really similar to this one and this one. Also I have read this topic, but I can't find a "nice" solution somewhere or maybe there isn't any?
Here is an reproducible example:
library(ggplot2)
map <- map_data("world")
map$value <- setNames(sample(-50:50, length(unique(map$region)), TRUE),
unique(map$region))[map$region]
map[map$region == "Russia", "value"] <- NA
ggplot() +
geom_polygon(data = map,
aes(long, lat, group = group, fill = value)) +
scale_fill_gradient2(low = "brown3", mid = "cornsilk1", high = "turquoise4",
limits = c(-50, 50),
na.value = "black")
So I would like to add a black box for the NA value for Russia. I know, I can replace the NA's by a number, so it will appear in the gradient and I think, I can write a workaround like the following, but all this workarounds do not seem like a pretty solution for me and also I would like to avoid "senseless" warnings:
ggplot() +
geom_polygon(data = map,
aes(long, lat, group = group, fill = value)) +
scale_fill_gradient2(low = "brown3", mid = "cornsilk1", high = "turquoise4",
limits = c(-50, 50),
na.value = "black") +
geom_point(aes(x = -100, y = -50, size = "NA"), shape = NA, colour = "black") +
guides(size = guide_legend("NA", override.aes = list(shape = 15, size = 10)))
Warning messages:
1: Using size for a discrete variable is not advised.
2: Removed 1 rows containing missing values (geom_point).
One approach is to split your value variable into a discrete scale. I have done this using cut(). You can then use a discrete color scale where "NA" is one of the distinct colors labels. I have used scale_fill_brewer(), but there are other ways to do this.
map$discrete_value = cut(map$value, breaks=seq(from=-50, to=50, length.out=8))
p = ggplot() +
geom_polygon(data=map, aes(long, lat, group=group, fill=discrete_value)) +
scale_fill_brewer(palette="RdYlBu", na.value="black") +
coord_quickmap()
ggsave("map.png", plot=p, width=10, height=5, dpi=150)
Another solution
Because the original poster said they need to retain the color gradient scale and the colorbar-style legend, I am posting another possible solution. It has 3 components:
We need to trick ggplot into drawing a separate color scale by using aes() to map something to color. I mapped a column of empty strings using aes(colour="").
To ensure that we do not draw a colored boundary around each polygon, I specified a manual color scale with a single possible value, NA.
Finally, guides() along with override.aes is used to ensure the new color legend is drawn as the correct color.
p2 = ggplot() +
geom_polygon(data=map, aes(long, lat, group=group, fill=value, colour="")) +
scale_fill_gradient2(low="brown3", mid="cornsilk1", high="turquoise4",
limits=c(-50, 50), na.value="black") +
scale_colour_manual(values=NA) +
guides(colour=guide_legend("No data", override.aes=list(colour="black")))
ggsave("map2.png", plot=p2, width=10, height=5, dpi=150)
It's possible, but I did it years ago. You can't use guides. You have to set individually the continuous scale for the values as well as the discrete scale for the NAs. This is what the error is telling you and this is how ggplot2 works. Did you try using both scale_continuous and scale_discrete since your set up is rather awkward, instead of simply using guides which is basically used for simple plot designs?
I have sampled 10,000 coordinates from my data in this file. I have around 130,000 points.
https://www.dropbox.com/s/40hfyx6a5hsjuv7/data.csv
I am trying to plot these points on the Americas map using ggplot2. Here is my code.
library(ggplot2)
library(maps)
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords, legend = FALSE) +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
geom_point(aes(x = lon, y = lat), shape = 19, size = 0.00001,
alpha = 0.3, colour = "red") +
theme(panel.grid.major = element_blank()) +
theme(panel.grid.minor = element_blank()) +
theme(axis.text.x = element_blank(),axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
xlab("") + ylab("")
png("my_plot.png", width = 8000, height = 7000, res = 1000)
print(p)
dev.off()
The points seem to cover the whole area in which they were plotted. I would like them to be more smaller to better represent a location. You can see that I've set the size to 0.00001. I was just trying to see if it has any effect but it doesn't seem to help after a certain limit. Is this the best that is possible at this resolution or could it be reduced more?
I had actually plotted around 400,000 points but only on the US map before and they looked much better like below. Hoping to get something like this. Thanks.
https://www.dropbox.com/s/8d0niu9g6ygz0wo/Clusters_reduced.png
Try playing with very small values of alpha, instead of the point size:
http://docs.ggplot2.org/0.9.3.1/geom_point.html
# Varying alpha is useful for large datasets
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 1/1000)
Edit:
Additional ideas are given in the documentation. Here's a summary:
Details
The scatterplot is useful for displaying the relationship between two continuous variables, although it can also be used with one continuous and one categorical variable, or two categorical variables. See geom_jitter for possibilities.
The bubblechart is a scatterplot with a third variable mapped to the size of points. There are no special names for scatterplots where another variable is mapped to point shape or colour, however.
The biggest potential problem with a scatterplot is overplotting: whenever you have more than a few points, points may be plotted on top of one another. This can severely distort the visual appearance of the plot. There is no one solution to this problem, but there are some techniques that can help. You can add additional information with stat_smooth, stat_quantile or stat_density2d. If you have few unique x values, geom_boxplot may also be useful. Alternatively, you can summarise the number of points at each location and display that in some way, using stat_sum.
Another technique is to use transparent points, geom_point(alpha = 0.05).
Edit 2:
Combining the details from the manual with the hints in Transparency and Alpha levels for ggplot2 stat_density2d with maps and layers in R
This might look like the solution:
library(ggplot2)
library(maps)
data_coords <- read.csv("C:/Downloads/data.csv")
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot( data = data_coords, legend = FALSE) +
geom_polygon( data = map_world, aes(x = long, y = lat, group = group)) +
stat_density2d( data = data_coords, aes(x=lon, y=lat, fill = as.factor(..level..)), size=1, bins=10, geom='polygon') +
scale_fill_manual(values = c("yellow","red","green","royalblue", "black","white","orange","brown","grey"))
png("my_plot2k.png", width = 2000, height = 2000, res = 500)
print(p)
dev.off()
Resulting image (not the best colour palette used):
I am trying to make a labeled bubble plot with ggplot2 in R. Here is the simplified scenario:
I have a data frame with 4 variables: 3 quantitative variables, x, y, and z, and another variable that labels the points, lab.
I want to make a scatter plot, where the position is determined by x and y, and the size of the points is determined by z. I then want to place text labels beside the points (say, to the right of the point) without overlapping the text on top of the point.
If the points did not vary in size, I could try to simply modify the aesthetic of the geom_text layer by adding a scaling constant (e.g. aes(x=x+1, y=y+1)). However, even in this simple case, I am having a problem with positioning the text correctly because the points do not scale with the output dimensions of the plot. In other words, the size of the points remains constant in a 500x500 plot and a 1000x1000 plot - they do not scale up with the dimensions of the outputted plot.
Therefore, I think I have to scale the position of the label by the size (e.g. dimensions) of the output plot, or I have to get the radius of the points from ggplot somehow and shift my text labels. Is there a way to do this in ggplot2?
Here is some code:
# Stupid data
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
# Plot with bad label placement
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I should mention, I tried hjust and vjust inside of geom_text, but it does not produce the desired effect.
# Trying hjust and vjust, but it doesn't look nice
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab), hjust=0, vjust=0.5,
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I managed to get something that works for now, thanks to Henrik and shujaa. I will leave the question open just in case someone shares a more general solution.
Just a blurb of what I am using this for: I am plotting a map, and indicating the amount of precipitation at certain stations with a point that is sized proportionally to the amount of precipitation observed. I wanted to add a station label beside each point in an aesthetically pleasing manner. I will be making more of these plots for different regions, and my output plot may have a different resolution or scale (e.g. due to different projections) for each plot, so a general solution is desired. I might try my hand at creating a custom position_jitter, like baptiste suggested, if I have time during the weekend.
It appears that position_*** don't have access to the scales used by other layers, so it's a no go. You could make a clone of GeomText that shifts the labels according to the size mapped,
but it's a lot of effort for a very kludgy and fragile solution,
geom_shiftedtext <- function (mapping = NULL, data = NULL, stat = "identity",
position = "identity",
parse = FALSE, ...) {
GeomShiftedtext$new(mapping = mapping, data = data, stat = stat, position = position,
parse = parse, ...)
}
require(proto)
GeomShiftedtext <- proto(ggplot2:::GeomText, {
objname <- "shiftedtext"
draw <- function(., data, scales, coordinates, ..., parse = FALSE, na.rm = FALSE) {
data <- remove_missing(data, na.rm,
c("x", "y", "label"), name = "geom_shiftedtext")
lab <- data$label
if (parse) {
lab <- parse(text = lab)
}
with(coord_transform(coordinates, data, scales),
textGrob(lab, unit(x, "native") + unit(0.375* size, "mm"),
unit(y, "native"),
hjust=hjust, vjust=vjust, rot=angle,
gp = gpar(col = alpha(colour, alpha),
fontfamily = family, fontface = fontface, lineheight = lineheight))
)
}
})
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1.2,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z), shape=1) +
geom_shiftedtext(aes(label=lab, size=z),
hjust=0, colour="red") +
scale_size_continuous(range=c(5, 100), guide="none")
This isn't a very general solution, because you'll need to tweak it every time, but you should be able to add to the x value for the text some value that's linear depending on z.
I had luck with
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab, x = x + .06 + .14 * (z - min(z))),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
but, as the font size depends on your window size, you would need to decide on your output size and tweak accordingly. I started with x = x + .05 + 0 * (z-min(z)) and calibrated the intercept based on the smallest point, then when I was happy with that I adjusted the linear term for the biggest point.
Another alternative. Looks OK with your test data, but you need to check how general it is.
dodge <- abs(scale(df$z))/4
ggplot(data = df, aes(x = x, y = y)) +
geom_point(aes(size = z)) +
geom_text(aes(x = x + dodge), label = df$lab, colour = "red") +
scale_size_continuous(range = c(5, 50), guide = "none")
Update
Just tried position_jitter, but the width argument only takes one value, so right now I am not sure how useful that function would be. But I would be happy to find that I am wrong. Example with another small data set:
df3 <- mtcars[1:10, ]
ggplot(data = df3, aes(x = wt, y = mpg)) +
geom_point(aes(size = qsec), alpha = 0.1) +
geom_text(label = df3$carb, position = position_jitter(width = 0.1, height = 0)) +
scale_size_continuous(range = c(5, 50), guide = "none")