ggmap - stat_binhex and scale_fill_gradientn limits - ggmap

I use ggmap and stat_binhex to visualize the density of boat positions in a given area. I don't want to display every position, only the areas with a really high density of positions.
So I use scale_fill_gradientn with the limits parameter to filter out all the hexagons that contain fewer than 500 positions. The limits parameter requires both the lowest and the highest value. Setting the lowest value is fine, but I don't want to specify the highest value manually (currently 100000); I would rather get it from the stat_binhex result. Is that possible, and if so, how?
Here is my current code :
ggmap(map, extent = "panel", maprange=FALSE) +
coord_cartesian() +
theme(legend.position='none') +
geom_point(data=positions, aes(x=positions$x, y=positions$y), alpha=0.1, size=0.01, color="grey") +
stat_binhex(data=positions, aes(x,y), binwidth=c(0.05, 0.05)) +
scale_fill_gradientn( colours= brewer.pal( 6, "YlGn"),
na.value = NA, trans= "log10", limits=c(500,100000))
Thanks for your help
Arnaud

The help page ?scale_fill_gradientn points to ?continuous_scale, which states that you can use NA (or NA_real_) for any limit you don't want to set manually; that limit is then taken from the data. Here is the complete code, using a dummy data set:
library(ggmap)
library(RColorBrewer)
left <- -4.8
bottom <- 45.8
right <- -1.2
top <- 48.2
map <- get_stamenmap(c(left, bottom, right, top),
maptype = "toner", zoom = 8)
positions <- data.frame(x = rnorm(5e5, mean = -4, sd = 0.5),
y = rnorm(5e5, mean = 46.5, sd = 0.3))
positions <- positions[with(positions, x > left & x < right &
y > bottom & y < top), ]
print(ggmap(map, extent = "panel", maprange=FALSE) +
coord_cartesian() +
theme(legend.position="none") +
geom_point(data=positions, aes(x, y), alpha=0.1, size=0.01,
color="grey") +
stat_binhex(data=positions, aes(x, y), binwidth=c(0.05, 0.05)) +
scale_fill_gradientn(colours = brewer.pal( 6, "YlGn"),
na.value = NA, trans= "log10", limits = c(500, NA)))
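If you also need the highest bin count as a number (for example, to annotate the plot), one option is to read it back from the built plot. The following is a minimal sketch and not part of the original answer: it uses geom_bin2d instead of stat_binhex so that it runs without the hexbin package, and the data and the threshold of 50 are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Dummy positions (assumption: any x/y point cloud will do)
pts <- data.frame(x = rnorm(5e4), y = rnorm(5e4))

p <- ggplot(pts, aes(x, y)) +
  geom_bin2d(binwidth = c(0.25, 0.25)) +
  # Lower limit fixed at 50; NA lets ggplot2 take the upper limit from the data
  scale_fill_gradientn(colours = c("lightyellow", "darkgreen"),
                       limits = c(50, NA), na.value = NA)

# layer_data() returns the computed bins, including their counts
bins <- layer_data(p)
max_count <- max(bins$count)

# Bins below the lower limit fall outside the scale and get na.value (NA here)
hidden <- is.na(bins$fill)
```

Here max_count is the value that would otherwise have to be typed in by hand as the upper limit, and hidden flags the bins that are not drawn.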

Related

How to plot filled points and confidence ellipses with the same color using ggplot in R?

I would like to plot a graph from a Discriminant Function Analysis in which points must have a black border and be filled with specific colors and confidence ellipses must be the same color as the points are filled. Using the following code, I get almost the graph I want, except that points do not have a black border:
library(ggplot2)
library(ggord)
library(MASS)
data("iris")
set.seed(123)
linear <- lda(Species~., iris)
linear
dfaplot <- ggord(linear, iris$Species, labcol = "transparent", arrow = NULL, poly = FALSE, ylim = c(-11, 11), xlim = c(-11, 11))
dfaplot +
scale_shape_manual(values = c(16,15,17)) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 1
I could put a black border on the points by using the following code, but then confidence ellipses turn black.
dfaplot +
scale_shape_manual(values = c(21,22,24)) +
scale_color_manual(values = c("black","black","black")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 2
I would like to keep the ellipses as in the first graph but the points as in the second one. However, I have been unable to figure out how to do this. If anyone has suggestions, I would be very grateful. I am using the ggord package because I learned how to run the analysis with it, but a solution that uses only ggplot would be fine too.
This roughly replicates what ggord does. Looking at the package source, ggord implements the ellipses differently from the code below, hence the small differences. If that matters, you can review the source and make changes. The default geom_point shape has no fill attribute, so we switch to shapes that have both a border and a fill (21, 22 and 24) and leave the border black, which is the default point colour; you can also set color = 'black' in geom_point() explicitly. The full code (including projecting the original data) is below.
library(MASS)
library(ggplot2)
set.seed(123)
linear <- lda(Species~., iris)
linear
# Get point x, y coordinates
df <- data.frame(predict(linear, iris[, 1:4]))
df$species <- iris$Species
# Get explained variance for each axis
var_exp <- 100 * linear$svd ^ 2 / sum(linear$svd ^ 2)
ggplot(data = df,
aes(x = x.LD1,
y = x.LD2)) +
geom_point(aes(fill = species,
shape = species),
color = "black",
size = 4) +
stat_ellipse(aes(color = species),
level = 0.95) +
ylim(c(-11, 11)) +
xlim(c(-11, 11)) +
ylab(paste("LD2 (",
round(var_exp[2], 2),
"%)")) +
xlab(paste("LD1 (",
round(var_exp[1], 2),
"%)")) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_shape_manual(values = c(21, 22, 24)) +
coord_fixed(1) +
theme_bw() +
theme(
legend.position = "none"
)
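To check that the border and the fill really are controlled separately, you can inspect the aesthetics that ggplot2 computed for each layer. This is a minimal sketch on the raw iris measurements rather than the LDA scores; the object names (fill_cols, pts, ell) are only illustrative.

```r
library(ggplot2)

# Same three colours as above, named by species so the mapping is explicit
fill_cols <- c(setosa = "#00FF00", versicolor = "#FF00FF", virginica = "#0000FF")

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  # Shapes 21-25 take a border colour and a separate fill
  geom_point(aes(fill = Species, shape = Species), colour = "black", size = 3) +
  # The ellipses use the colour aesthetic, mapped to the same palette
  stat_ellipse(aes(colour = Species)) +
  scale_shape_manual(values = c(21, 22, 24)) +
  scale_fill_manual(values = fill_cols) +
  scale_colour_manual(values = fill_cols)

pts <- layer_data(p, 1)  # point layer
ell <- layer_data(p, 2)  # ellipse layer

unique(pts$colour)  # border colour of the points
unique(pts$fill)    # fill colours of the points
unique(ell$colour)  # line colours of the ellipses
```

Because the point border is set as a constant outside aes(), scale_colour_manual only affects the ellipses.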
To plot arrows, you can grab the scaling from the LDA output and plot it with geom_segment. I adjusted the colors/alpha so that the arrows are visible in the plot below.
scaling <- data.frame(linear$scaling)
...
geom_segment(data = scaling,
aes(x = 0,
y = 0,
xend = LD1,
yend = LD2),
arrow = arrow(),
color = "black") +
geom_text(data = scaling,
aes(x = ifelse(LD1 <= 0.1, LD1 - 2, LD1 + 2),
y = ifelse(LD2 <= 0.1, LD2 - 1, LD2 + 1)),
label = rownames(scaling),
color = "black") +
...

r ggplot when two colors overlap

I have some code that generates a plot; the only problem is that many of the colors overlap.
When two colors overlap, how do I specify the dominant color?
For example, there are 4 black points where indicator = threshold, one at each of the 4 x-axis categories. However, the black points at the "Wire" and "ACH" categories do not show up because they are hidden by blue points, and the black point at "RDFI" barely shows up. How can I make black the dominant color when two colors overlap? Thanks in advance!
ggplot(df, aes(a-axis, y-axis), color=indicator)) +
geom_quasirandom(groupOnX=TRUE, na.rm = TRUE) +
labs(title= 'chart', x='x-axis', y= 'y-axis') +
scale_color_manual(name = 'indicator', values=c("#99ccff","#000000" ))
To specify which color is dominant, you can use the new_scale() function from the ggnewscale package and its aliases new_scale_color() and new_scale_fill(); the layer you add last is drawn on top.
As an example, let's overlay some measurements on a contour map of topography, using the beloved volcano dataset.
library(ggplot2)
library(ggnewscale)
# Equivalent to melt(volcano)
topography <- expand.grid(x = 1:nrow(volcano),
y = 1:ncol(volcano))
topography$z <- c(volcano)
# point measurements of something at a few locations
set.seed(42)
measurements <- data.frame(x = runif(30, 1, 80),
y = runif(30, 1, 60),
thing = rnorm(30))
dominant point:
ggplot(mapping = aes(x, y)) +
geom_contour(data = topography, aes(z = z, color = after_stat(level))) +
# Color scale for topography
scale_color_viridis_c(option = "D") +
# geoms below will use another color scale
new_scale_color() +
geom_point(data = measurements, size = 3, aes(color = thing)) +
# Color scale applied to geoms added after new_scale_color()
scale_color_viridis_c(option = "A")
dominant contour:
ggplot(mapping = aes(x, y)) +
geom_point(data = measurements, size = 3, aes(color = thing)) +
scale_color_viridis_c(option = "A")+
new_scale_color() +
geom_contour(data = topography, aes(z = z, color = after_stat(level))) +
scale_color_viridis_c(option = "D")
Your problem may not lie with which color is dominant. You have chosen colors that will show up often, and you may be losing the bottom of your y-axis. The code in your example cannot possibly have produced that plot, because it contains errors.
Here is a simple example that shows one way to overcome your problem: overplot the threshold points after you have plotted the beeswarm.
library(dplyr)
library(ggbeeswarm)
distro <- data.frame(
'variable'=rep(c('runif','rnorm'),each=1000),
'value'=c(runif(2000, min=-3, max=3))
)
distro$indicator <- "NA"
distro[3,3] <- "Threshhold"
distro[163,3] <- "Threshhold"
ggplot2::ggplot(distro,aes(variable, value, color=indicator)) +
geom_quasirandom(groupOnX=TRUE, na.rm = TRUE, width=0.1) +
scale_color_manual(name = 'indicator', values=c("#99ccff","#000000")) +
geom_point(data = distro %>% filter(indicator == "Threshhold"))
Sort your data by the color variable (your indicator).
Basically, you want the black dots to be plotted last, that is, on top of the others.
# Base R: reorder whole rows so each indicator stays with its own x/y values
df <- df[order(df$indicator, decreasing = TRUE), ]
# Tidyverse solution
df <- df %>% arrange(desc(indicator))
Depending on your levels, you may need to reverse the sort order.
Then you just plot.
library(dplyr)
library(ggplot2)
pd <- tibble(x=rnorm(1000), y=1, indicator=sample(c("A","B"), replace=TRUE, size = 1000))
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
pd <- pd %>% arrange(indicator)
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
pd <- pd %>% arrange(desc(indicator))
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
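The same idea can be sketched with made-up data in which only a few points are threshold points. The data frame and the level names ("other", "threshold") are assumptions for illustration, and the example uses geom_point, where rows are drawn in data order; whether other geoms respect row order in the same way is not verified here.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: 190 ordinary points and 10 threshold points, shuffled
df <- data.frame(
  x = rnorm(200),
  y = rnorm(200),
  indicator = sample(rep(c("other", "threshold"), times = c(190, 10)))
)

# order() on a logical puts FALSE first and TRUE last,
# so threshold rows move to the end and are drawn last (on top)
df_sorted <- df[order(df$indicator == "threshold"), ]

p <- ggplot(df_sorted, aes(x, y, colour = indicator)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(other = "#99ccff", threshold = "#000000"))
```

Reordering whole rows keeps every indicator value attached to its own coordinates, which sorting the indicator column alone would not.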

ggplot2 colorbar with discontinuous jump for skewed data

Here is some fake data, x and y, with color information z. z is highly skewed, and as such renders the colorbar uninformative:
set.seed(1)
N <- 100
x <- rnorm(N)
y <- x + rnorm(N)
z <- x+y+rnorm(N)
z[z>2] <- z[z>2]+exp(z[z>2]-2)
d <- data.frame(x,y,z)
ggplot(d, aes(x=x, y=y, color = z)) + geom_point()
I'd like most of the colorbar to reflect the main range of the data, but with a separate box for overflows, say above 5. Something like this:
Is there a way to do this in ggplot2? Note that I would like the colorbar to remain continuous, rather than discrete, for most of its range. I'll probably either discretize or topcode if what I want isn't feasible.
You can get that general plot, although the legends would need more work:
p <- ggplot(d, aes(x=x, y=y, color = z)) + geom_point(size = 5)
p + scale_color_gradient2(
low = 'green', high = 'red', mid = 'grey80', na.value = 'blue', limits= c(-10, 10)
)
You can cheat in some extra legend fluff, e.g.:
ggplot(d, aes(x=x, y=y, color = z, alpha = '>10')) +
geom_point(size = 5) +
scale_color_gradient2(
low = 'green', high = 'red', mid = 'grey80', na.value = 'blue', limits= c(-10, 10),
guide = guide_colorbar(title.position = 'left')
) +
scale_alpha_manual(
values = 1, name = 'z',
guide = guide_legend(
override.aes = list(color = 'blue'), title.position = 'left',
title.theme = element_text(color = 'white', angle = 0)
)
) +
theme(legend.margin = margin(-5, 10, -5, 10))
Note that red/green palettes are hard to read for people with color vision deficiency.
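The reason this works is that values outside the scale limits are treated as missing by default and therefore receive the na.value colour. A small sketch with made-up z values (d2 and the other names are illustrative only) makes that visible:

```r
library(ggplot2)

# Made-up values: two of them lie outside the limits c(-10, 10)
d2 <- data.frame(x = 1:5, y = 1, z = c(-12, -3, 0, 4, 15))

p <- ggplot(d2, aes(x, y, color = z)) +
  geom_point(size = 5) +
  scale_color_gradient2(low = "green", mid = "grey80", high = "red",
                        na.value = "blue", limits = c(-10, 10))

# Inspect the colours that were actually assigned to each point
built <- layer_data(p)
z_built <- d2$z[match(built$x, d2$x)]      # z value belonging to each built row
outside <- z_built < -10 | z_built > 10    # TRUE where z is out of limits
data.frame(z = z_built, colour = built$colour, outside = outside)
```

Note that values below the lower limit are coloured with na.value as well, not only the overflows at the top.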
Extending upon Axeman's answer I came up with the following slight hack to get blues into your color scale:
First, define a color map with 20 colors for the values within and 5 for the values outside your range.
cmap <- colorRampPalette(c("green","grey80","red"))(20)
cmap <- append(cmap,rep("blue",5))
Then cut the z values into 20 bins between -10 and 10 and convert them to numeric (which gives NA for values above 10). By passing cmap to scale_color_gradientn with limits of c(1, 25), the bin for values just above -10 maps to 1 (green) and the bin just below 10 maps to 20 (red). Finally, the breaks and labels arguments add the correct labels manually (for example, the 5th bin corresponds to values between -6 and -5).
ggplot(d, aes(x=x, y=y, color=as.numeric(cut(z, breaks=seq(-10,10))))) +
geom_point(size=3) +
scale_color_gradientn(colors=cmap, limits=c(1,25), breaks=c(5,11,17,23),
labels=c(-6,0,6,">10"), name="z", na.value = "blue")
Lovely result :)
The only issue is that you have to make sure that no values ever fall below -10, because with this method they would be shown in blue as well.
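To see how the binning behaves at the edges, here is a small base R sketch with hand-picked z values (the values are made up; the breaks are the ones used above):

```r
# Hand-picked values: below the range, inside it, and above it
z <- c(-11, -9.5, -5.5, 0.5, 9.5, 12)

# cut() builds the intervals (-10,-9], (-9,-8], ..., (9,10];
# as.numeric() turns the resulting factor into bin numbers 1..20
bin <- as.numeric(cut(z, breaks = seq(-10, 10)))

data.frame(z = z, bin = bin)

# Values that would silently end up as NA (and hence blue) at the low end
too_low <- z <= -10
```

Because the intervals are open on the left, a value of exactly -10 also becomes NA, so a check such as stopifnot(all(z > -10)) before plotting is a cheap safeguard.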

How to create legend (continuous colourbar) based on specific range of values in R/ggplot2?

I have a data frame named myKrige_new that contains interpolated values by latitude and longitude. You can download it from HERE. I plotted these values over a particular area of a country map using the ggplot2 package in R and got this plot.
But I want the legend (colourbar) of my plot to look like the following legend.
In my dataset, the data (pred) range from 72 to 257. However, I want the legend to show values from 0 to 200, so that it can be compared with other plots, even though there are no values below 72 here.
So I want to use 20 different colours, as in the legend above, which means the last box of the legend would hold the colour for values greater than 200. I used scale_fill_gradientn, but it didn't work. I have spent days looking for a way to do this in R without success. Any help would be highly appreciated.
R code :
library(scales)
library(ggplot2)
myKrige_new <- read.csv ("myKrige_new.csv")
range(myKrige_new$LON)
range(myKrige_new$LAT)
#Original skorea data transformed the same way as myKrige_new
skorea1 <- getData("GADM", country= "KOR", level=1)
skorea1 <- fortify(skorea1)
myKorea1 <- data.frame(skorea1)
###############
ggplot()+
theme_minimal() +
#SOLUTION 1:
#geom_tile(data = myKrige_new, aes(x= LON, y= LAT, fill = pred)) +
#SOLUTION 2: Uncomment the line(s) below:
#geom_point(data = myKrige_new, aes(x= LON, y= LAT, fill = pred),
#shape=22, size=8, colour=NA)+
#Solution 3
stat_summary_2d(data=myKrige_new, aes(x = LON, y = LAT, z = pred),bins = 30,
binwidth = c(0.05,0.05)) +
scale_fill_gradientn(colours=c("white","blue","green","yellow","red"),
values=rescale(c(0,50,100,150,200)),
guide="colorbar", name = "PM10 Conc")+
geom_map(data= myKorea1, map= myKorea1, aes(x=long,y=lat,map_id=id,group=group),
fill=NA, colour="black") +
coord_cartesian(xlim= c(126.6, 127.2), ylim= c(37.2 ,37.7)) +
labs(title= "PM10 Concentration in Seoul Area at South Korea",
x="", y= "")+
theme(legend.position = "bottom")+
guides(fill = guide_colourbar(barwidth = 27, barheight = NULL,
title.position = "bottom", title.hjust = 0.5))
Here is a working solution:
library(scales)
library(ggplot2)
library(raster) # needed for the `getData` function
library(dplyr) # needed for the `mutate` function
myKrige_new <- read.csv("~/Downloads/myKrige_new.csv")[-1]
range(myKrige_new$LON)
range(myKrige_new$LAT)
# Original skorea data transformed the same way as myKrige_new
skorea1 <- getData("GADM", country= "KOR", level=1)
skorea1 <- fortify(skorea1)
myKorea1 <- data.frame(skorea1)
# the range of pred goes above 200 (max = 257)
summary(myKrige_new$pred)
ggplot() +
theme_minimal() +
stat_summary_2d(data = mutate(myKrige_new,
pred = ifelse(pred > 200, 200, pred)),
aes(x = LON, y = LAT, z = pred),
bins = 30,
binwidth = c(0.05,0.05)) +
scale_fill_gradientn(colours=c("white","blue","green","yellow","red"),
values=rescale(c(0,50,100,150,200)),
name = expression(paste(PM[10], group("[",paste(mu,g/m^3), "]"))),
limits = c(0,200),
breaks = seq(0,200, 20),
guide = guide_colorbar(nbin = 20,
barwidth = 27,
title.position = "bottom",
title.hjust = 0.5,
raster = FALSE,
ticks = FALSE)) +
geom_map(data= myKorea1,
map= myKorea1,
aes(x=long,y=lat,map_id=id,group=group),
fill=NA,
colour="black") +
coord_equal(xlim= c(126.6, 127.2),
ylim= c(37.2 ,37.7)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(expand = c(0,0)) +
labs(title = "PM10 Concentration in Seoul Area at South Korea",
x = "",
y = "") +
theme(legend.position = "bottom")
I added limits = c(0, 200) and breaks = seq(0, 200, 20) to scale_fill_gradientn, as well as nbin = 20 to guide_colorbar. In the ggplot2 version used here, 20 was already the default nbin, so that last change was optional; however, the default may differ in other versions and you need exactly 20 here, so it is safer to set it explicitly. Also, setting limits means that values outside the range are plotted in grey50 (the default na.value), so I capped pred values above 200 at 200 to avoid that; red should now be read as 200+.
One more thing: setting raster = FALSE in guide_colorbar changes the colorbar from a raster object to a set of rectangles, which gives the look you were after.
Finally, I changed the coordinate system from cartesian to equal because you are plotting a map.
Here is the result; I hope it helps:
Update: added an expand argument to scale_y_continuous and scale_x_continuous, as requested by the OP.
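As an alternative to capping the values by hand with ifelse, the scale's oob (out of bounds) argument can do the capping. This is a minimal sketch with made-up pred values and geom_tile, not a drop-in replacement for the map code above; squish() comes from the scales package.

```r
library(ggplot2)
library(scales)

# Made-up values: the last one exceeds the upper limit of 200
df <- data.frame(x = 1:4, y = 1, pred = c(72, 150, 200, 257))

p <- ggplot(df, aes(x, y, fill = pred)) +
  geom_tile() +
  scale_fill_gradientn(colours = c("white", "blue", "green", "yellow", "red"),
                       values = rescale(c(0, 50, 100, 150, 200)),
                       limits = c(0, 200),
                       # squish() clamps out-of-range values to the nearest limit
                       # instead of turning them into NA
                       oob = squish)

built <- layer_data(p)
fill_200 <- built$fill[built$x == 3]  # tile with pred = 200
fill_257 <- built$fill[built$x == 4]  # tile with pred = 257
```

With oob = squish the tile for 257 gets the same fill as the tile for 200, so the original data frame does not need to be modified.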

Specifying the scale for the density in ggplot2's stat_density2d

I'm looking to create multiple density graphs, to make an "animated heat map."
Since each frame of the animation should be comparable, I'd like the density -> color mapping on each graph to be the same for all of them, even if the range of the data changes for each one.
Here's the code I'd use for each individual graph:
ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level..), geom="polygon", bins=3, size=1)
Imagine I use this same code, but 'this_df' changes on each frame. So in one graph, maybe density ranges from 0 to 4e-4. On another, density ranges from 0 to 4e-2.
By default, ggplot will calculate a distinct density -> color mapping for each of these. But this would mean the two graphs-- the two frames of the animation--aren't really comparable. If this were a histogram or density plot, I'd simply make a call to coord_cartesian and change the x and y lim. But for the density plot, I have no idea how to change the scale.
The closest I could find is this:
Overlay two ggplot2 stat_density2d plots with alpha channels
But I don't have the option of putting the two density plots on the same graph, since I want them to be distinct frames.
Any help would be hugely appreciated!
EDIT:
Here's a reproducible example:
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)), geom="contour", bins=4, size= 2)
}
print(g) # level has a different scale for each
I would like to leave an update for this question. As of July 2016, stat_density2d no longer accepts breaks. To reproduce the graphic, you need to move breaks=1e-6*seq(0,10,by=2) to scale_alpha_continuous().
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) +
scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)),
geom="contour", bins=4, size= 2) +
scale_alpha_continuous(limits=c(0,1e-5), breaks=1e-6*seq(0,10,by=2))+
scale_color_discrete("Condition")
}
library(gridExtra) # provides grid.arrange
do.call(grid.arrange,c(g,ncol=2))
So to have both plots show contours with the same levels, use the breaks=... argument in stat_density2d(...). To have both plots with the same mapping of alpha to level, use scale_alpha_continuous(limits=...).
Here is the full code to demonstrate:
library(ggplot2)
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)),
breaks=1e-6*seq(0,10,by=2),geom="contour", bins=4, size= 2)+
scale_alpha_continuous(limits=c(0,1e-5))+
scale_color_discrete("Condition")
}
library(gridExtra)
do.call(grid.arrange,c(g,ncol=2))
And the result...
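The effect of fixing the limits can be checked without density estimates at all. In this minimal sketch (the helper make_plot and the toy values are made up for illustration), the same value is mapped to different alphas when each plot trains its own scale, and to the same alpha once the limits are shared:

```r
library(ggplot2)

# Hypothetical helper: one point per value, alpha mapped to the value
make_plot <- function(v, limits = NULL) {
  ggplot(data.frame(x = seq_along(v), y = 0, v = v), aes(x, y, alpha = v)) +
    geom_point() +
    scale_alpha_continuous(limits = limits)
}

a <- c(1, 2, 4)   # "frame" with a small range
b <- c(1, 2, 40)  # "frame" with a much larger range

# Each plot trains its own limits: the value 2 gets a different alpha
free_a <- layer_data(make_plot(a))$alpha
free_b <- layer_data(make_plot(b))$alpha

# Shared limits: the value 2 gets the same alpha in both plots
fixed_a <- layer_data(make_plot(a, limits = c(0, 40)))$alpha
fixed_b <- layer_data(make_plot(b, limits = c(0, 40)))$alpha
```

For an animation, the shared limits need to cover the largest value that occurs in any frame; otherwise values beyond them fall outside the scale.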
Not sure how useful this is, but I found it easier to use either:
scale_fill_gradient(low = "purple", high = "yellow", limits = c(0, 1000))
which lets you override the limits of the fill scale, choose colors, and so on. You can simply add it at the end of your plot code, where it replaces any fill scale set earlier,
or a similar solution using:
library(viridis) # colors for heat map
scale_fill_viridis(option = "inferno", limits = c(0, 1000))

Resources