How to create a legend (continuous colourbar) based on a specific range of values in R/ggplot2?

I have a data frame named myKrige_new that contains interpolated values indexed by latitude and longitude. You can download it from HERE. I have plotted these values on a particular area of a country map using the ggplot2 package in R and got this plot:
But I want the legend (colourbar) of my plot to look like the following legend:
In my dataset, the range of the data (pred) is 72 to 257. But I want my legend to show values from 0 to 200 so that it can be compared with other plots, even though there are no values below 72 here.
So I want to use 20 different colours, like the legend above, which means the last box of the legend would hold the colour for values greater than 200. I have used the scale_fill_gradientn function but it didn't work. I have spent days looking for an option to do this in R without success. Any kind of help would be highly appreciated.
R code:
library(scales)
library(ggplot2)
myKrige_new <- read.csv ("myKrige_new.csv")
range(myKrige_new$LON)
range(myKrige_new$LAT)
# Original skorea data transformed the same way as myKrige_new
skorea1 <- getData("GADM", country= "KOR", level=1)
skorea1 <- fortify(skorea1)
myKorea1 <- data.frame(skorea1)
###############
ggplot()+
theme_minimal() +
#SOLUTION 1:
#geom_tile(data = myKrige_new, aes(x= LON, y= LAT, fill = pred)) +
#SOLUTION 2: Uncomment the line(s) below:
#geom_point(data = myKrige_new, aes(x= LON, y= LAT, fill = pred),
#shape=22, size=8, colour=NA)+
#Solution 3
stat_summary_2d(data=myKrige_new, aes(x = LON, y = LAT, z = pred),bins = 30,
binwidth = c(0.05,0.05)) +
scale_fill_gradientn(colours=c("white","blue","green","yellow","red"),
values=rescale(c(0,50,100,150,200)),
guide="colorbar", name = "PM10 Conc")+
geom_map(data= myKorea1, map= myKorea1, aes(x=long,y=lat,map_id=id,group=group),
fill=NA, colour="black") +
coord_cartesian(xlim= c(126.6, 127.2), ylim= c(37.2 ,37.7)) +
labs(title= "PM10 Concentration in Seoul Area at South Korea",
x="", y= "")+
theme(legend.position = "bottom")+
guides(fill = guide_colourbar(barwidth = 27, barheight = NULL,
title.position = "bottom", title.hjust = 0.5))

Here is a working solution:
library(scales)
library(ggplot2)
library(raster) # needed for the `getData` function
library(dplyr) # needed for the `mutate` function
myKrige_new <- read.csv("~/Downloads/myKrige_new.csv")[-1]
range(myKrige_new$LON)
range(myKrige_new$LAT)
# Original skorea data transformed the same way as myKrige_new
skorea1 <- getData("GADM", country= "KOR", level=1)
skorea1 <- fortify(skorea1)
myKorea1 <- data.frame(skorea1)
# the range of pred goes above 200 (max = 257)
summary(myKrige_new$pred)
ggplot() +
theme_minimal() +
stat_summary_2d(data = mutate(myKrige_new,
pred = ifelse(pred > 200, 200, pred)),
aes(x = LON, y = LAT, z = pred),
bins = 30,
binwidth = c(0.05,0.05)) +
scale_fill_gradientn(colours=c("white","blue","green","yellow","red"),
values=rescale(c(0,50,100,150,200)),
name = expression(paste(PM[10], group("[",paste(mu,g/m^3), "]"))),
limits = c(0,200),
breaks = seq(0,200, 20),
guide = guide_colorbar(nbin = 20,
barwidth = 27,
title.position = "bottom",
title.hjust = 0.5,
raster = FALSE,
ticks = FALSE)) +
geom_map(data= myKorea1,
map= myKorea1,
aes(x=long,y=lat,map_id=id,group=group),
fill=NA,
colour="black") +
coord_equal(xlim= c(126.6, 127.2),
ylim= c(37.2 ,37.7)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(expand = c(0,0)) +
labs(title = "PM10 Concentration in Seoul Area at South Korea",
x = "",
y = "") +
theme(legend.position = "bottom")
I added limits = c(0, 200) and breaks = seq(0, 200, 20) to scale_fill_gradientn, as well as nbin = 20 to guide_colorbar. The nbin change was optional at the time because the default nbin was 20 (newer ggplot2 releases use a larger default), but in your case you actually need exactly 20 bins. Also, adding limits means values outside the range are plotted in grey50, so I capped pred values above 200 at 200 to avoid that; red now means 200 or more.
One more thing: setting raster = FALSE in guide_colorbar draws the colourbar as a set of rectangles instead of a raster image, which gives the boxed look you were after.
Finally, I changed the coordinate system from coord_cartesian to coord_equal because you are plotting a map.
Here is the result; I hope it helps:
Update: added an expand argument to scale_y_continuous and scale_x_continuous, as requested by the OP.
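The data-preparation side of this answer can be checked without plotting. Below is a minimal base-R sketch, assuming a small hypothetical vector `pred` in place of the real `myKrige_new$pred` (the CSV is not reproduced here). It caps values at 200 with `pmin()`, which has the same effect as the `ifelse()` inside `mutate()` above, and shows which of the 20 ten-unit boxes of a 0 to 200 colourbar each value falls into.

```r
# Hypothetical stand-in for myKrige_new$pred; the real file is not included here
pred <- c(72, 95.5, 140, 199.9, 200, 257)

# Cap values above the upper legend limit so nothing falls outside limits = c(0, 200)
# (same effect as ifelse(pred > 200, 200, pred))
pred_capped <- pmin(pred, 200)

# 20 equal boxes between 0 and 200 means each box spans 10 units
box_edges <- seq(0, 200, by = 10)
box_index <- cut(pred_capped, breaks = box_edges,
                 include.lowest = TRUE, labels = FALSE)

print(data.frame(pred, pred_capped, box_index))
```

If you would rather leave the data untouched, `scale_fill_gradientn()` also accepts `oob = scales::squish`, which moves out-of-range values to the nearest limit instead of painting them grey. Note that with `stat_summary_2d()` the squish is applied to each bin's summary value, not to the raw `pred` values, so results can differ slightly from capping before binning.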

Related

Change environment size in scale_colour_manual to assign colours to factors for use across multiple plots

I need to make 5 plots of bacteria species. Each plot has a different number of species present, in a range of 30-90. I want each bacterium to always have the same colour in all plots, so I need to assign a fixed colour to each name.
I tried to use scale_colour_manual to create a colour set, but the environment it creates has only 16 colours. How can I increase the number of colours in that environment?
The code I am using can be replicated as follows:
colour_genus <- stringi::stri_rand_strings(90, 5) #to be random names
nb.cols = nrow(colour_genus) #to set the length of my string
MyPalette = colorRampPalette(brewer.pal(12,"Set1"))(nb.cols) # the palette of choice
colGenus <- scale_color_manual(name = colour_genus, values = MyPalette)
The output contains only 16 values, so when I try to apply it to a figure with 90 factors, it complains that I have only 16 values:
abundance <- runif(90, min = 10, max = 100)
my_data <- data.frame(colour_genus, abundance)
p <- ggplot(my_data, aes(x = colour_genus, y= abundance)) +
geom_bar(aes(color = colour_genus, fill = colour_genus), stat = "identity", position = "stack") +
labs(x = "", y = "Relative Abundance\n") +
theme(panel.background = element_blank())
p + theme(legend.text= element_text(size=7, face="bold"), axis.text.x = element_text(angle = 90)) + guides(fill=guide_legend(ncol=2)) + scale_fill_manual(values=colGenus)
The following error shows:
Error: Insufficient values in manual scale. 90 needed but only 16 provided.
Thank you very much for your help.
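scale_color_manual() returns a scale object, not a vector of colours, so it cannot be passed as values to another scale. When you know all 90 bacteria names before plotting, you can instead build a named colour vector and pass that to scale_fill_manual: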
set.seed(123)
colour_genus <- sort(stringi::stri_rand_strings(90, 5))#to be random names. I sorted the vector to illustrate the output better (optional).
MyPalette <- sample(colors(), length(colour_genus))
# named vector for scale_fill
names(MyPalette) <- colour_genus
# data
abundance <- runif(90, min = 10, max = 100)
my_data <- data.frame(colour_genus, abundance)
# two sets to show results
set1 <- my_data[20:30,]
set2 <- my_data[25:35,]
ggplot(set1, aes(x = colour_genus, y= abundance)) +
geom_col(aes(fill = colour_genus)) +
scale_fill_manual(values = MyPalette)
ggplot(set2, aes(x = colour_genus, y= abundance)) +
geom_col(aes(fill = colour_genus)) +
scale_fill_manual(values = MyPalette)
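The reason this works is that `scale_fill_manual()` matches a *named* `values` vector by name, so a colour stays attached to a genus no matter which subset is plotted. A minimal base-R sketch of that lookup, assuming hypothetical genus names `genus_01` to `genus_90` instead of the random strings above:

```r
set.seed(123)
# Hypothetical genus names standing in for the 90 real bacteria names
genera <- sprintf("genus_%02d", 1:90)

# One colour per genus, drawn without replacement from R's built-in colour names
genus_colours <- setNames(sample(colors(), length(genera)), genera)

# Two overlapping subsets, as in set1 and set2 above
plot1_genera <- genera[20:30]
plot2_genera <- genera[25:35]

# Indexing by name returns the same colour for a genus in either subset
shared <- intersect(plot1_genera, plot2_genera)
print(genus_colours[shared])
# In ggplot2 the whole named vector goes in as scale_fill_manual(values = genus_colours)
```

Some entries of `colors()` are near-white or nearly identical shades, so for 90 genera you may want to filter or hand-pick the pool before sampling.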

Plotting a vertical normal distribution next to a box plot in R

I'm trying to plot box plots with the normal distribution of the underlying data next to each plot, in a vertical format like this:
This is what I have currently graphed from an Excel sheet imported into R:
And the associated code:
set.seed(12345)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
#graphing boxplot and quasirandom scatterplot together
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape=20, fill="gray", color = "gray") +
geom_boxplot(fill="NA", color = c("red4", "orchid4", "dark green", "blue"),
outlier.color = "NA") +
theme_hc()
Is this possible in ggplot2 or R in general? Or is the only feasible way to use something like OriginLab (where the first picture came from)?
You can do something similar to your example plot with the gghalves package:
library(ggplot2)
library(ggthemes)  # for theme_hc()
library(gghalves)
n=0.02
ggplot(iris, aes(Species, Sepal.Length)) +
geom_half_boxplot(center=TRUE, errorbar.draw=FALSE,
width=0.5, nudge=n) +
geom_half_violin(side="r", nudge=n) +
geom_half_dotplot(dotsize=0.5, alpha=0.3, fill="red",
position=position_nudge(x=n, y=0)) +
theme_hc()
There are a few ways to do this. To gain full control over the look of the plot, I would just calculate the curves and plot them. Here's some sample data that's close to your own and shares the same names, so it should be directly applicable:
set.seed(12345)
X8_17_20_R_20_60 <- data.frame(
Diameter = rnorm(4000, rep(c(41, 40, 42, 40), each = 1000), sd = 6),
Type = rep(c("AvgFeret", "CalcDiameter", "Feret", "MinFeret"), each = 1000))
Now we create a little data frame of normal distributions based on the parameters taken from each group:
df <- do.call(rbind, mapply( function(d, n) {
y <- seq(min(d), max(d), length.out = 1000)
data.frame(x = n - 5 * dnorm(y, mean(d), sd(d)) - 0.15, y = y, z = n)
}, with(X8_17_20_R_20_60, split(Diameter, Type)), 1:4, SIMPLIFY = FALSE))
Finally, we draw your plot and add a geom_path with the new data.
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape = 20, fill = "gray", color = "gray") +
geom_boxplot(fill="NA", aes(color = Type), outlier.color = "NA") +
scale_color_manual(values = c("red4", "orchid4", "dark green", "blue")) +
geom_path(data = df, aes(x = x, y = y, group = z), size = 1) +
theme_hc()
Created on 2020-08-21 by the reprex package (v0.3.0)
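The `df` construction above packs the whole idea into one `mapply()` call. Unrolled for a single group, the sketch below (a hypothetical helper `normal_curve()`, using the same 5x width scaling and 0.15 gap as the answer) shows how the density is turned sideways: `y` spans the data, and `x` is the group's position minus the scaled density, so the curve bulges to the left of the box.

```r
# Hypothetical helper: one sideways normal curve for a single group.
# pos   = the group's x position on a discrete axis (1, 2, 3, ...)
# scale = how many x-axis units one unit of density occupies (5 in the answer)
# gap   = horizontal space left between the box and the curve (0.15 in the answer)
normal_curve <- function(d, pos, scale = 5, gap = 0.15, n = 1000) {
  y <- seq(min(d), max(d), length.out = n)
  data.frame(x = pos - scale * dnorm(y, mean(d), sd(d)) - gap, y = y, z = pos)
}

set.seed(12345)
d <- rnorm(1000, mean = 41, sd = 6)
curve1 <- normal_curve(d, pos = 1)

# The widest point sits near the mean, about 5 * dnorm(0, sd = 6) (roughly 0.33)
# to the left of 1 - 0.15
print(range(curve1$x))
```

Calling the helper once per group and `rbind()`-ing the results gives a data frame of the same shape as `df`. The `scale` factor is the knob to turn if the curves overlap the neighbouring box.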

Is there an equivalent to points() on ggplot2

I'm working with stock prices and trying to plot the price difference.
I created one using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the basic plot() function you can do this by calling points(); I'm wondering how to do the same with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate this without your data, but here's an attempt with generated sample data that should be similar enough for you to adapt.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points only above and below the thresholds
# (in-range rows get an NA shape, so geom_point() drops them with a warning)
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can add the points above and below the threshold as separate layers with manually specified shapes, like this. The pch argument sets the shape directly, so no shape scale is needed.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red")
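The first two approaches hinge on one classification step. A compact base-R alternative to the three assignment lines, shown here as a hypothetical helper `classify_spread()`, uses nested `ifelse()`; naming the shape codes makes the mapping explicit instead of relying on the alphabetical order of the levels.

```r
# Hypothetical helper: label each spread relative to symmetric thresholds
classify_spread <- function(spread, thresh) {
  ifelse(spread > thresh, "Above",
         ifelse(spread < -thresh, "Below", "In Range"))
}

spread_class <- classify_spread(c(5, -5, 0, 4, -4.5), thresh = 4)
print(spread_class)

# Named shape codes: 17 = filled triangle, 16 = filled circle, 15 = filled square.
# In ggplot2 this vector could be passed as scale_shape_manual(values = shape_codes).
shape_codes <- c(Above = 17, Below = 16, "In Range" = 15)
print(shape_codes[spread_class])
```

Values exactly equal to the threshold count as "In Range" because both comparisons are strict.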

Create discrete colour axis

I am plotting concentration data on a map, my data looks like this:
Lat Long Conc Colour
-33.90624 151.2237 10.0 #4393C3
-33.92404 151.2280 12.95 #92C5DE
-33.92384 151.2275 14.0 #D1E5F0
Plotting on the map using:
map <- ggmap(map)+
scale_x_continuous(limits = c(151.220, 151.230), expand = c(0, 0)) +
scale_y_continuous(limits = c(-33.927, -33.902), expand = c(0, 0))
map +
geom_point(data = df_avg, aes(x = df_avg$Long, y = df_avg$Lat),
col = df_avg$Colour, cex = 4.2) +
ggtitle(paste0("PM2.5 (ug/m3)", " ", title_start," - ", title_end))
The colour scale I am using is the "RdBu" RColorBrewer palette:
"#67001F" "#B2182B" "#D6604D" "#F4A582" "#FDDBC7"
"#F7F7F7" "#D1E5F0" "#92C5DE" "#4393C3" "#2166AC" "#053061"
I would like to add a labelled colour axis on the side of my map with discrete squares. Can anyone help me with this?
I have attached a picture of what I have in mind; it was taken from R ggplot2 discrete colour palette for gradient map (that question sadly didn't help me, as I am using ggmap).
Thank you!
You should let ggplot do the colour mapping for you, as @MikkoMartillia suggested; then you get a legend automatically. Below I created a reproducible example from your data. First I added the colour labels to the data. Then I basically used your code to create the plot, but moved the colour inside the aes() call. Finally, I added a colour scale with the "RdBu" palette and the correct limits.
# import packages
library(tibble)
library(dplyr)
library(ggmap)
library(RColorBrewer)
# load data
df_avg <- tribble(~Lat, ~Long, ~Conc, ~Colour,
-33.90624, 151.2237, 10.0, "#4393C3",
-33.92404, 151.2280, 12.95, "#92C5DE",
-33.92384, 151.2275, 14.0, "#D1E5F0")
# add colour to data
df_labels <- tibble(label = letters[1:11], # change this to sensible labels
Colour = brewer.pal(11, "RdBu"))
df_avg <- left_join(df_avg, df_labels, by = "Colour")
# download map
map <- get_map(location = c(lon = 151.225, lat = -33.913), zoom = 14)
# map plot
p_map <- ggmap(map)+
scale_x_continuous(limits = c(151.220, 151.230), expand = c(0, 0)) +
scale_y_continuous(limits = c(-33.927, -33.902), expand = c(0, 0))
# add points to map
p_map +
geom_point(data = df_avg, aes(x = Long, y = Lat, color = label), cex = 4.2) +
scale_color_brewer(palette = "RdBu", limits = df_labels$label)
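The join step is just a lookup from hex code to label. A minimal base-R sketch of the same idea, using the eleven "RdBu" hex codes listed in the question and the same placeholder labels `a` to `k` as the answer. Turning the label into a factor with all eleven levels plays the same role as `limits = df_labels$label`: it keeps every box available for the legend even though only three colours occur in the data.

```r
# The 11-class "RdBu" colours listed in the question (dark red to dark blue)
rdbu <- c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7", "#F7F7F7",
          "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061")
# Placeholder labels; replace with sensible concentration ranges
class_labels <- letters[1:11]

df_avg <- data.frame(
  Lat    = c(-33.90624, -33.92404, -33.92384),
  Long   = c(151.2237, 151.2280, 151.2275),
  Conc   = c(10.0, 12.95, 14.0),
  Colour = c("#4393C3", "#92C5DE", "#D1E5F0")
)

# Find each point's position in the palette and keep all 11 levels
df_avg$label <- factor(class_labels[match(df_avg$Colour, rdbu)],
                       levels = class_labels)
print(df_avg)
```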

ggplot Discrete Legend Continuous Data

I'm new to R programming but I'm enjoying the challenge of writing code!
I created a GIF by stitching multiple map plots together. Unfortunately, my legend refers to the particular year of the map being generated, so the GIF shows a legend whose marks move up and down. I think the solution would be to have the legend reference the entire data frame rather than the given year. How do I do this?
Link to the GIF:
https://1drv.ms/i/s!Ap-NxMqZOClHqgsFHSxo-kR1pLrr
##This is the R-Code I used for the year 1950:
kansas1950 <- readShapePoly("KansasCOUNTIES.shp")
## Kansas Winter-Wheat Planted from Quickstats
kansas1950.acres <- read.csv(file = "KWW 19502016 QuickStatsEst.csv",
stringsAsFactors = FALSE)
## Create a smaller dataset by retaining the Kansas acres in 1950 and the FIPS codes,
## which will be used for matching and merging with the input shapefile
smaller.data1950 <- data.frame(FIPS = kansas1950.acres$FIPS, Acres = kansas1950.acres$X1950)
smaller.data1950 <- na.omit(smaller.data1950)
## Join the two datasets using their common field
matched.indices1950 <- match(kansas1950@data[, "FIPS"], smaller.data1950[, "FIPS"])
kansas1950@data <- data.frame(kansas1950@data, smaller.data1950[matched.indices1950, ])
## Compute the cartogram transformation of each county using its acreage
## with the degree of Gaussian blur = 0.5
kansas1950.carto <- quick.carto(kansas1950, kansas1950@data$Acres, blur = 0.5)
## Convert the object into data frame
kansas1950.carto <- gBuffer(kansas1950.carto, byid=TRUE, width=0)
kansas1950.f <- fortify(kansas1950.carto, region = "FIPS")
## Merge the cartogram transformation with the kansas map shapefile
kansas1950.f <- merge(kansas1950.f, kansas1950@data, by.x = "id", by.y = "FIPS")
# Plot of the transformed polygons, where each county is
## further shaded by their acreage (lighter means bigger)
my_map1950 <- ggplot(kansas1950.f, aes(long, lat, group = group,
fill = kansas1950.f$Acres)) + geom_polygon() +
scale_fill_continuous(breaks = c(0, 10000, 100000, 200000, 526000),
labels = c("0 Acres","10k Acres", "100k Acres", "200k Acres", "526k Acres"),
low = "black",
high = "purple"
) +
labs(x=NULL, y=NULL) + labs(fill = "Acres Planted")
# Remove default ggplot layers
my_map1950 <-my_map1950 + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.ticks=element_blank(),
axis.text.x=element_blank(),axis.text.y=element_blank(),
axis.line = element_line(colour = NA))
# Citation
my_map1950 <- my_map1950 + labs(caption = "USDA-NASS Quick Stats") + ggtitle("1950 Kansas Winter-Wheat Acres Planted")
my_map1950
# Save a higher resolution PNG
png('my_map1950kwwpurp.png', units="in", width=10, height=8, res=300)
my_map1950
dev.off()
Assuming this is what you want, try adding this to your plot (but, of course, specifying your own custom lower and upper limits):
+ scale_fill_gradient(limits = c(0, 10))
Here is a sample data frame where this works:
df <- data.frame(x = 1:10)
p <- ggplot(df, aes(x, 1)) + geom_tile(aes(fill = x), colour = "white")
p + scale_fill_gradient(limits = c(0, 10))
p + scale_fill_gradient(limits = c(0, 20))
Here's the graph with the scale set from 0 to 10.
Here's the graph with the scale set from 0 to 20.
EDIT: I see now that you already call scale_fill_continuous() in your code. Try adding a limits argument to that call, similar to what I did above.
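For the animation, the limits should come from all years at once rather than from the year being drawn. A minimal sketch, assuming the QuickStats CSV has one column per year named like `X1950`, `X1951`, ... (as `kansas1950.acres$X1950` suggests); the small data frame below is a hypothetical stand-in, not real data:

```r
# Hypothetical stand-in for the QuickStats table: one row per county, one column per year
acres <- data.frame(
  FIPS  = c(20001, 20003, 20005),
  X1950 = c(12000, NA, 526000),
  X1951 = c(15000, 8000, 410000),
  X1952 = c(0, 9500, 300000)
)

# Pick the year columns by name and take a single range over all of them
year_cols <- grep("^X[0-9]{4}$", names(acres), value = TRUE)
global_limits <- range(unlist(acres[year_cols]), na.rm = TRUE)
print(global_limits)

# Every yearly map would then share the same scale, e.g.
# scale_fill_continuous(limits = global_limits, breaks = ..., labels = ...)
```

Because every frame uses the same `limits`, a given acreage maps to the same colour and legend position in every year.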
