Draw circles on dotplot to mark optimal regions in ggplot2 - r

I have a point plot of power plants that have the properties x and y. The best power plants are those where x and y are both high. I now want to visualize which regions of my plot are desirable and less desirable for a power plant to be in.
What I need to produce is something like this:
Any ideas? I tried drawing huge dots with geom_point, but they change size depending on the export resolution of my plot. I also tried drawing circles like here, but ggplot doesn't draw the circles correctly if parts of them fall outside the actual plot.

When using @joran's circle function here, this seems to work:
# prepare data for circles
green <- circleFun(center = c(5, 5), diameter = 1.5, npoints = 100)
yellow <- circleFun(center = c(5, 5), diameter = 3, npoints = 100)
orange <- circleFun(center = c(5, 5), diameter = 4.5, npoints = 100)
red <- circleFun(center = c(5, 5), diameter = 6, npoints = 100)
dat <- rbind(green, yellow, orange, red)
# specify levels to get the order of colours right,
# and set the plotting order from large (red) to small (green) circle
dat$fill <- factor(rep(c("green", "yellow", "orange", "red"), each = 100),
levels = rev(c("green", "yellow", "orange", "red")))
# dummy data for points
dat2 <- data.frame(x = rnorm(100, mean = 3), y = rnorm(100, mean = 3))
library(ggplot2)
ggplot(data = dat, aes(x = x, y = y)) +
  geom_polygon(aes(fill = fill)) +
  geom_point(data = dat2, aes(x = x, y = y)) +
  coord_cartesian(xlim = c(0, 5), ylim = c(0, 5)) +
  scale_fill_manual(values = rev(c("green", "yellow", "orange", "red")))
Things that did not work so well:
For some reason circles are distorted when (1) the legend is turned off using theme(legend.position = "none"), and (2) when scale_fill_identity() is used, instead of scale_fill_manual(), to pick colours from 'fill' variable in 'dat'. I have no clue why.
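The code above relies on a helper called circleFun that is not defined in this answer. A minimal sketch of such a helper follows, assuming it returns the outline of a circle as a data frame with columns x and y; the signature is inferred from the calls above rather than taken from the linked answer.

```r
# Sketch of the circle helper assumed by the code above.
# The argument names are inferred from the calls circleFun(center, diameter, npoints).
circleFun <- function(center = c(0, 0), diameter = 1, npoints = 100) {
  r  <- diameter / 2
  tt <- seq(0, 2 * pi, length.out = npoints)  # angles around the circle
  data.frame(x = center[1] + r * cos(tt),
             y = center[2] + r * sin(tt))
}

green <- circleFun(center = c(5, 5), diameter = 1.5, npoints = 100)
head(green, 3)  # first few outline points, in data coordinates
```

Because the outline is an ordinary polygon in data coordinates, its size follows the axes rather than the export resolution. If the circles look like ellipses, the panel's aspect ratio may be the cause; coord_fixed() keeps one unit on x as long as one unit on y.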

Related

How to make geom_ribbon have gradation color in ggplot2

I would like to fill a geom_ribbon with a color gradient.
For example, I have a data frame like this:
df <-data.frame(Day = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)), # create random data
Depth = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
group = c(rep('A', 300), rep('B', 150))) # add two groups
With this data.frame, I make ggplot using geom_ribbon as below
gg <-
ggplot(data=df,aes(x=Day))+
geom_ribbon(aes(ymin=Depth,ymax=max(Depth)),alpha = 0.25)+
ylim(max(df$Depth),0)+
facet_wrap(~group,scales = "free_x",ncol=2)+
labs(x="Days(d)",y="Depth (m)")
gg
which makes the following plot:
Here, I would like to fill the ribbon with a color gradient based on the y-axis value (i.e. df$Depth, in this case). However, I do not know how to do it.
I can do it with geom_point as below:
gg <- gg +
geom_point(aes(y=Depth,color=Depth),alpha = 1, shape = 20, size=5)+
scale_color_gradient2(midpoint = 5,
low = "red", mid="gray37", high = "black",
space ="Lab")
gg
However, I want the color gradient to fill the ribbon area, not to sit on each point.
Do you have any suggestions for doing this with geom_ribbon?
I do not know whether this is perfect, but I found a solution that does what I want, as follows.
First, I prepare the data frame:
df <-data.frame(Day = c(rnorm(300, 7, 2), rnorm(150, 5, 1)), # create random data
Depth = c(rnorm(300, 10, 2.5), rnorm(150, 7, 2)),
group = c(rep('A', 300), rep('B', 150))) # add two groups
Second, I prepare the gradient background, following the linked question (log background gradient ggplot):
xlength <- ceiling(max(df$Day))
yseq <- seq(0,max(df$Depth), length=100)
bg <- expand.grid(x=0:xlength, y=yseq) # dataframe for all combinations
Third, plot by using ggplot2;
gg <- ggplot() +
geom_tile(data=bg,
aes(x=x, y=y, fill=y),
alpha = 0.75)+ # plot the gradation
scale_fill_gradient2(low='red', mid="gray37", high = "black",
space ="Lab",midpoint = mean(df$Depth)/2)+ #set the color
geom_ribbon(data=df,
aes(x=Day,ymin=0,ymax=Depth),
fill = "gray92")+ #default ggplot2 background color
ylim(max(df$Depth),0)+
scale_x_continuous()+
facet_wrap(~group,scales = "free_x",ncol=2)+
labs(x="Days(d)",y="Depth (m)")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
gg

How to plot a formula with a given range?

I am looking to plot the following:
L<-((2*pi*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T)-1))))
all variables except l are constant:
T<-6000
h<-6.626070040*10^-34
c<-2.99792458*10^8
k<-1.38064852*10^-23
l has a range of 20*10^-9 to 2000*10^-9.
I have tried l<-seq(20*10^-9,2000*10^-9,by=1*10^-9), however this does not give me the results I expect.
Is there a simple solution for this in R, or do I have to try in another language?
Thank you.
Looking at the spectral radiance equation wikipedia page, it seems that your formula is a bit off. Your formula multiplies an additional pi (not sure if intended) and the -1 is inside the exp instead of outside:
L <- ((2*pi*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T)-1))))
Below is the corrected formula. Also notice I have converted it into a function with parameter l since this is a variable:
T <- 6000 # Absolute temperature
h <- 6.626070040*10^-34 # Planck's constant
c <- 2.99792458*10^8 # Speed of light in the medium
k <- 1.38064852*10^-23 # Boltzmann constant
L <- function(l){((2*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T))-1)))}
# Plotting
plot(L, xlim = c(20*10^-9,2000*10^-9),
xlab = "Wavelength (nm)",
ylab = bquote("Spectral Radiance" ~(kW*sr^-1*m^-2*nm^-1)),
main = "Planck's Law",
xaxt = "n", yaxt = "n")
xtick <- seq(20*10^-9, 2000*10^-9,by=220*10^-9)
ytick <- seq(0, 4*10^13,by=5*10^12)
axis(side=1, at=xtick, labels = (1*10^9)*seq(20*10^-9,2000*10^-9,by=220*10^-9))
axis(side=2, at=ytick, labels = (1*10^-12)*seq(0, 4*10^13,by=5*10^12))
The plot above is not bad, but I think we can do better with ggplot2:
h <- 6.626070040*10^-34 # Planck's constant
c <- 2.99792458*10^8 # Speed of light in the medium
k <- 1.38064852*10^-23 # Boltzmann constant
L2 <- function(l, T){((2*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T))-1)))} # Planck's Law
classical_L <- function(l, T){(2*c*k*T)/l^4} # Rayleigh-Jeans Law
library(ggplot2)
ggplot(data.frame(l = c(20*10^-9,2000*10^-9)), aes(l)) +
geom_rect(aes(xmin=390*10^-9, xmax=700*10^-9, ymin=0, ymax=Inf),
alpha = 0.3, fill = "lightblue") +
stat_function(fun=L2, color = "red", size = 1, args = list(T = 3000)) +
stat_function(fun=L2, color = "green", size = 1, args = list(T = 4000)) +
stat_function(fun=L2, color = "blue", size = 1, args = list(T = 5000)) +
stat_function(fun=L2, color = "purple", size = 1, args = list(T = 6000)) +
stat_function(fun=classical_L, color = "black", size = 1, args = list(T = 5000)) +
theme_bw() +
scale_x_continuous(breaks = seq(20*10^-9, 2000*10^-9,by=220*10^-9),
labels = (1*10^9)*seq(20*10^-9,2000*10^-9,by=220*10^-9),
sec.axis = dup_axis(labels = (1*10^6)*seq(20*10^-9,2000*10^-9,by=220*10^-9),
name = "Wavelength (\U003BCm)")) +
scale_y_continuous(breaks = seq(0, 4*10^13,by=5*10^12),
labels = (1*10^-12)*seq(0, 4*10^13,by=5*10^12),
limits = c(0, 3.5*10^13)) +
labs(title = "Black Body Radiation described by Planck's Law",
x = "Wavelength (nm)",
y = expression("Spectral Radiance" ~(kW*sr^-1*m^-2*nm^-1)),
caption = expression(''^'\U02020' ~'Spectral Radiance described by Rayleigh-Jeans Law, which demonstrates the ultraviolet catastrophe.')) +
annotate("text",
x = c(640*10^-9, 640*10^-9, 640*10^-9, 640*10^-9,
150*10^-9, (((700-390)/2)+390)*10^-9, 1340*10^-9),
y = c(2*10^12, 5*10^12, 14*10^12, 31*10^12,
35*10^12, 35*10^12, 35*10^12),
label = c("3000 K", "4000 K", "5000 K", "6000 K",
"UV", "VISIBLE", "INFRARED"),
color = c(rep("black", 4), "purple", "blue", "red"),
alpha = c(rep(1, 4), rep(0.6, 3)),
size = 4.5) +
annotate("text", x = 1350*10^-9, y = 23*10^12,
label = deparse(bquote("Classical theory (5000 K)"^"\U02020")),
color = "black", parse = TRUE)
Notes:
I created L2 by also making absolute temperature T a variable
For each T, I plot the function L2 using different colors for representation. I've also added a classical_L function to demonstrate classical theory of spectral radiance
geom_rect creates the light blue shaded area for "VISIBLE" light wavelength range
scale_x_continuous sets the breaks of the x axis, while labels sets the axis tick labels. Notice I have multiplied the seq by (1*10^9) to convert the units to nanometer (nm). A second x-axis is added to display the micrometer scale
Analogously, scale_y_continuous sets the breaks and tick labels for y axis. Here I multiplied by (1*10^-12) or (1*10^(-3-9)) to convert from watts (W) to kilowatts (kW), and from inverse meter (m^-1) to inverse nanometer (nm^-1)
bquote displays superscripts correctly in the y axis label
annotate sets the coordinates and text for curve labels. I've also added the labels for "UV", "VISIBLE" and "INFRARED" light wavelengths
ggplot2
Plot from wikipedia:
Image source: https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Black_body.svg/600px-Black_body.svg.png
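As a quick numerical check of the corrected formula, the sketch below evaluates it over the wavelength range from the question and compares the location of the maximum with Wien's displacement law. The names peak and wien are introduced here for illustration, and the Wien constant b = 2.897771955e-3 m K is not part of the original answer.

```r
# Constants as in the answer above
h <- 6.626070040e-34  # Planck's constant (J s)
c <- 2.99792458e8     # speed of light (m/s)
k <- 1.38064852e-23   # Boltzmann constant (J/K)

# Planck's law with wavelength l (m) and absolute temperature T (K)
L2 <- function(l, T) ((2 * h * c^2) / l^5) * (1 / (exp((h * c) / (l * k * T)) - 1))

l    <- seq(20e-9, 2000e-9, by = 1e-9)   # the range from the question, 1 nm steps
peak <- l[which.max(L2(l, T = 6000))]    # grid wavelength with the highest radiance

wien <- 2.897771955e-3 / 6000            # Wien's displacement law, b / T (assumed constant b)
round(c(peak = peak, wien = wien) * 1e9, 1)  # both in nm
```

If the two values agree to within the 1 nm grid spacing, the -1 is in the right place; with the -1 inside exp(), the curve changes shape and the comparison is no longer expected to hold.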

R - Contour plot from raster dataset with country borders overlaid

I have a fairly simple and probably common task: plotting a raster dataset with contour lines and adding country borders together in one plot. However, I did not find a solution anywhere. There are a few hints available (such as this one), but no raster dataset is used there and I can't get it to work.
The dataset I am using is actually in netcdf format and available here (15mb in size) and contains about 40 years of gridded precipitation data.
Here is my line of code:
setwd("...netcdf Data/GPCP")
library("raster")
library("maps")
nc_brick79_17 <- brick("precip.mon.mean.nc") # load in the ncdf data as a raster brick
newextent <- c(85, 125, -20, 20) # specify region of interest
SEA_brick <- crop(nc_brick79_17, newextent) # crop the region
day1 <- SEA_brick[[1]] # select very first day as example
colfunc<-colorRampPalette(c("white","lightblue","yellow","red","purple")) # colorscale for plotting
So it works of course when I just plot the raster data together with a map overlaid:
plot(day1, col=(colfunc(100)), interpolate=F, main="day1",legend.args=list(text='mm/hr', side=4,font=1, line=2.5, cex=1.1))
map("world", add=TRUE, lwd=0.5, interior = FALSE, col = "black")
We get this plot (Raster Plot with country borders added)
Now the code I use to generate the contour plot is the following:
filledContour(day1,zlim=c(0,20),color=colorRampPalette(c("white","lightblue","yellow","red","purple")),
xlab = "Longitude (°)", ylab = "Latitude (°)")
map("world", add=TRUE, lwd=0.5, interior = FALSE, col = "black") # add map overlay
I end up with a plot where obviously the country borders do not align and are even covering the colorbar.
Contour plot with map overlay shifted
In this last part I am trying to add the country boundaries to the contour plot, but it does not work, even though it should I assume. The map is simply not there, no error though:
filledContour(day1, zlim=c(0,20),
color.palette = colorRampPalette(c("white","lightblue","yellow","red","purple")),
xlab = "Longitude (°)", ylab = "Latitude (°)",
xlim = c(90, 120), ylim = c(-20, 20), nlevels = 25,
plot.axes = {axis(1); axis(2);
map('world', xlim = c(90, 120), ylim = c(-20, 20), add = TRUE, lwd=0.5, col = "black")})
From that line of code I get this plot.
Contour plot but no country borders added
What could I improve or is there any mistake somewhere? Thank you!
I chose to use ggplot here and leave two maps for you. The first replicates the plot you already created. The second is the one you could not produce. There is a lot to explain and I do not have time to write it all up, so I left comments in the code below. Please check this question to learn more about the second graphic. Finally, I'd like to give credit to hrbrmstr, who wrote a great answer in the linked question.
library(maptools)
library(akima)
library(raster)
library(ggplot2)
# This is a data set from the maptools package
data(wrld_simpl)
# Create a data.frame object for ggplot. ggplot requires a data frame.
mymap <- fortify(wrld_simpl)
# This part is your code.
nc_brick79_17 <- brick("precip.mon.mean.nc")
newextent <- c(85, 125, -20, 20)
SEA_brick <- crop(nc_brick79_17, newextent)
day1 <- SEA_brick[[1]]
# Create a data frame with a raster object. This is a spatial class
# data frame, not a regular data frame. Then, convert it to a data frame.
spdf <- as(day1, "SpatialPixelsDataFrame")
mydf <- as.data.frame(spdf)
colnames(mydf) <- c("value", "x", "y")
# This part creates the first graphic that you drew. You draw a map.
# Then, you add tiles on it. Then, you add colors as you wish.
# Since we have a world map data set, we trim it at the end.
ggplot() +
geom_map(data = mymap, map = mymap, aes(x = long, y = lat, map_id = id), fill = "white", color = "black") +
geom_tile(data = mydf, aes(x = x, y = y, fill = value), alpha = 0.4) +
scale_fill_gradientn(colors = c("white", "lightblue", "yellow", "red", "purple")) +
scale_x_continuous(limits = c(85, 125), expand = c(0, 0)) +
scale_y_continuous(limits = c( -20, 20), expand = c(0, 0)) +
coord_equal()
ggplot version of filled.contour()
# As I mentioned above, you want to study the linked question for this part.
mydf2 <- with(mydf, interp(x = x,
y = y,
z = value,
xo = seq(min(x), max(x), length = 400),
duplicate = "mean"))
gdat <- interp2xyz(mydf2, data.frame = TRUE)
# You need to draw countries as lines here. You gotta do that after you draw
# the contours. Otherwise, you will not see the map.
ggplot(data = gdat, aes(x = x, y = y, z = z)) +
geom_tile(aes(fill = z)) +
stat_contour(aes(fill = ..level..), geom = "polygon", binwidth = 0.007) +
geom_contour(color = "white") +
geom_path(data = mymap, aes(x = long, y = lat, group = group), inherit.aes = FALSE) +
scale_x_continuous(limits = c(85, 125), expand = c(0, 0)) +
scale_y_continuous(limits = c(-20, 20), expand = c(0, 0)) +
scale_fill_gradientn(colors = c("white", "lightblue", "yellow", "red", "purple")) +
coord_equal() +
theme_bw()

How to superimpose bar plots in R?

I'm trying to create a figure similar to the one below (taken from Ro, Russell, & Lavie, 2001). In their graph, they are plotting bars for the errors (i.e., accuracy) within the reaction time bars. Basically, what I am looking for is a way to plot bars within bars.
I know there are several challenges with creating a graph like this. First, Hadley points out that it is not possible to create a graph with two scales in ggplot2 because those graphs are fundamentally flawed (see Plot with 2 y axes, one y axis on the left, and another y axis on the right)
Nonetheless, the graph with superimposed bars seems to solve this dual scaling problem, and I'm trying to figure out a way to create it in R. Any help would be appreciated.
It's fairly easy in base R, by using par(new = T) to add to an existing graph:
set.seed(54321) # for reproducibility
data.1 <- sample(1000:2000, 10)
data.2 <- sample(seq(0, 5, 0.1), 10)
# Use xpd = F to avoid plotting the bars below the axis
barplot(data.1, las = 1, col = "black", ylim = c(500, 3000), xpd = F)
par(new = T)
# Plot the new data with a different ylim, but don't plot the axis
barplot(data.2, las = 1, col = "white", ylim = c(0, 30), yaxt = "n")
# Add the axis on the right
axis(4, las = 1)
It is pretty easy to make the bars in ggplot. Here is some example code. No two y-axes though (although look here for a way to do that too).
library(ggplot2)
data.1 <- sample(1000:2000, 10)
data.2 <- sample(500:1000, 10)
ggplot(mapping = aes(x, y)) +
geom_bar(data = data.frame(x = 1:10, y = data.1), width = 0.8, stat = 'identity') +
geom_bar(data = data.frame(x = 1:10, y = data.2), width = 0.4, stat = 'identity', fill = 'white') +
theme_classic() + scale_y_continuous(expand = c(0, 0))
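For readers who do want the second axis in ggplot2, one possible sketch is to rescale the second series onto the primary scale and let sec_axis() undo that rescaling for the labels. The data, the column names and the scale factor k below are all assumptions made for illustration; they are not taken from the figure in the question.

```r
library(ggplot2)

set.seed(54321)
rt  <- sample(1000:2000, 10)       # hypothetical reaction times (ms)
err <- sample(seq(0, 5, 0.1), 10)  # hypothetical error rates (%)

# Assumed scale factor: 30 % errors lines up with 3000 ms on the primary axis
k <- 3000 / 30

df <- data.frame(x = 1:10, rt = rt, err_scaled = err * k)

p <- ggplot(df, aes(x)) +
  geom_col(aes(y = rt), width = 0.8, fill = "black") +          # wide outer bars
  geom_col(aes(y = err_scaled), width = 0.4, fill = "white") +  # narrow inner bars
  scale_y_continuous(name = "Reaction time (ms)", expand = c(0, 0),
                     sec.axis = sec_axis(~ . / k, name = "Errors (%)")) +
  theme_classic()
p
```

The secondary axis is only a relabelling of the primary one, so the two series still share a single underlying scale; the choice of k decides how tall the inner bars appear.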

r : ecdf over histogram

In R, with ecdf I can plot an empirical cumulative distribution function
plot(ecdf(mydata))
and with hist I can plot a histogram of my data
hist(mydata)
How can I plot the histogram and the ECDF in the same plot?
EDIT
I am trying to make something like this:
https://mathematica.stackexchange.com/questions/18723/how-do-i-overlay-a-histogram-with-a-plot-of-cdf
Also a bit late, here's another solution that extends @Christoph's solution with a second y-axis.
par(mar = c(5,5,2,5))
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
par(new = T)
ec <- ecdf(dt)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
The trick is the following: You don't add a line to your plot, but plot another plot on top, that's why we need par(new = T). Then you have to add the y-axis later on (otherwise it will be plotted over the y-axis on the left).
Credits go here (@tim_yates' answer) and there.
There are two ways to go about this. One is to ignore the different scales and use relative frequency in your histogram. This results in a harder to read histogram. The second way is to alter the scale of one or the other element.
I suspect this question will soon become interesting to you, particularly @hadley's answer.
ggplot2 single scale
Here is a solution in ggplot2. I am not sure you will be satisfied with the outcome though because the CDF and histograms (count or relative) are on quite different visual scales. Note this solution has the data in a dataframe called mydata with the desired variable in x.
library(ggplot2)
set.seed(27272)
mydata <- data.frame(x= rexp(333, rate=4) + rnorm(333))
ggplot(mydata, aes(x)) +
stat_ecdf(color="red") +
geom_histogram(aes(y = (..count..)/sum(..count..))) # bins x; geom_bar would count each distinct x separately
base R multi scale
Here I will rescale the empirical CDF so that instead of a max value of 1, its maximum value is whatever bin has the highest relative frequency.
h <- hist(mydata$x, freq=F)
ec <- ecdf(mydata$x)
lines(x = knots(ec),
y=(1:length(mydata$x))/length(mydata$x) * max(h$density),
col ='red')
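The base R answers share one idea: multiply the ECDF by the height of the tallest bar so that it fits the histogram's axis, and divide by the same number to label a right-hand axis from 0 to 1. The sketch below isolates that mapping without drawing anything; the helper names to_counts and to_fraction are made up for illustration.

```r
set.seed(15)
dt <- rnorm(500, 50, 10)                          # dummy data
h  <- hist(dt, breaks = seq(0, 100, 1), plot = FALSE)
ec <- ecdf(dt)

scale_factor <- max(h$counts)                     # tallest bar defines the top of the axis
to_counts   <- function(p) p * scale_factor       # ECDF value -> histogram scale
to_fraction <- function(y) y / scale_factor       # histogram scale -> ECDF value

y_line <- to_counts(ec(h$mids))                   # heights of the ECDF line over the bars
ticks  <- to_counts(seq(0, 1, 0.1))               # tick positions for axis(4, at = ticks, ...)
```

Here y_line would be passed to lines() and ticks to axis(4, at = ticks, labels = seq(0, 1, 0.1)). Using max(h$density) instead of max(h$counts) gives the same mapping for a histogram drawn with freq = FALSE.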
You can try a ggplot approach with a second axis:
library(dplyr) # tibble(), bind_cols(), mutate(), count() and %>%
library(ggplot2)
set.seed(15)
a <- rnorm(500, 50, 10)
# calculate ecdf with binsize 30
binsize=30
df <- tibble(x=seq(min(a), max(a), diff(range(a))/binsize)) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(a))
# plot
ggplot() +
geom_histogram(aes(a), bins = binsize) +
geom_line(data = df, aes(x=x, y=Ecdf_scaled), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(a), name = "Ecdf"))
Edit
Since the scaling was wrong, I added a second solution that calculates everything in advance:
binsize=30
a_range= floor(range(a)) +c(0,1)
b <- seq(a_range[1], a_range[2], round(diff(a_range)/binsize)) %>% floor()
df_hist <- tibble(a) %>%
mutate(gr = cut(a,b, labels = floor(b[-1]), include.lowest = T, right = T)) %>%
count(gr) %>%
mutate(gr = as.character(gr) %>% as.numeric())
# calculate ecdf with binsize 30
df <- tibble(x=b) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(df_hist$n))
ggplot(df_hist, aes(gr, n)) +
geom_col(width = 2, color = "white") +
geom_line(data = df, aes(x=x, y=Ecdf*max(df_hist$n)), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(df_hist$n), name = "Ecdf"))
As already pointed out, this is problematic because the plots you want to merge have such different y-scales. You can try
set.seed(15)
mydata<-runif(50)
hist(mydata, freq=F)
lines(ecdf(mydata))
to get
Although a bit late... Another version which is working with preset bins:
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
ec <- ecdf(dt)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
lines(x = c(0,100), y=c(1,1)*max(h$counts), col ='red', lty = 3) # indicates 100%
x90 <- h$mids[which.min(abs(ec(h$mids) - 0.9))] # x value (not bin index) where 90% is reached
lines(x = c(x90, x90), y = c(0, max(h$counts)), col ='black', lty = 3)
(Only the second y-axis is not working yet...)
In addition to the previous answers, I wanted to have ggplot do the tedious calculation (in contrast to @Roman's solution, which was kindly updated upon my request), i.e., calculate and draw the histogram and calculate and overlay the ECDF. I came up with the following (the dummy data and the bin count are placeholders):
library(ggplot2)
set.seed(15)
mydata <- data.frame(x = rnorm(500, 50, 10)) # dummy data
# 1. Prepare the plot
plot <- ggplot(mydata, aes(x)) + geom_histogram(bins = 30)
# 2. Get the max value of Y axis as calculated in the previous step
maxPlotY <- max(ggplot_build(plot)$data[[1]]$y)
# 3. Overlay scaled ECDF and add secondary axis
plot +
stat_ecdf(aes(y=..y..*maxPlotY)) +
scale_y_continuous(name = "Count", sec.axis = sec_axis(trans = ~./maxPlotY, name = "ECDF"))
This way you don't need to calculate everything beforehand and feed the results to ggplot. Just sit back and let it do everything for you!
