Gap between forecast and actual data in ggplot - r

I am trying to plot some data, fitted values and forecasts in a nice ggplot chart. When I plot them the way I think should work, I get a gap between the real data and the forecast. The gap is meaningless, but it would be nice if it were gone.
Here is some R code that reproduces the problem:
library(xts)
library(tidyverse)
library(forecast)
dates <- seq(as.Date("2016-01-01"), length = 100, by = "days")
realdata <- arima.sim(model = list(ar = 0.7, order = c(1,1,0)), n = 99)
data <- xts(realdata, order.by = dates)
user_arima <- arima(data, order = c(1,1,0))
user_arimaf <- forecast(user_arima, h = 10)  # 10-step-ahead forecast
fits <- xts(user_arimaf$fitted, order.by = dates)
fcastdates <- as.Date(dates[100]) + 1:10
meancast <- xts(as.numeric(user_arimaf$mean), order.by = fcastdates)
# lower/upper are matrices with "80%" and "95%" columns; [1:10] would pick the 80% one
lowercast95 <- xts(as.numeric(user_arimaf$lower[, "95%"]), order.by = fcastdates)
uppercast95 <- xts(as.numeric(user_arimaf$upper[, "95%"]), order.by = fcastdates)
frame <- merge(data, fits, meancast, uppercast95, lowercast95, all = TRUE, fill = NA)
frame <- as.data.frame(frame) %>%
mutate(date = as.Date(dates[1] + 0:(109)))
frame %>%
ggplot() +
geom_line(aes(date, data, color = "Data")) +
geom_line(aes(date, fits, color = "Fitted")) +
geom_line(aes(date, meancast, color = "Forecast")) +
geom_ribbon(aes(date, ymin=lowercast95,ymax=uppercast95),alpha=.25) +
scale_color_manual(values = c(
'Data' = 'black',
'Fitted' = 'red',
'Forecast' = 'darkblue')) +
labs(color = 'Legend') +
theme_classic() +
ylab("some data") +
xlab("Date") +
labs(title = "chart showing a gap",
subtitle = "Shaded area is the 95% CI from the ARIMA")
[Chart: "chart showing a gap" between the last data point and the start of the forecast]
I know there is now a geom_forecast() for ggplot (in the forecast package), but I would like to build this particular plot the way I'm doing it. If there's no other way to remove the gap, I'll use geom_forecast().

Closing the gap requires providing a value in the meancast column (and, for the ribbon, in the interval columns) for the blank area, that is, on the last observed date, 2016-04-09. It makes sense to use the value of the last "real" data point.
# Grab the y-value corresponding to the date just before the gap.
last_data_value = frame[frame$date == as.Date("2016-04-09"), "data"]
# Construct a one-row data.frame.
extra_row = data.frame(data=NA_real_,
fits=NA_real_,
meancast=last_data_value,
uppercast95=last_data_value,
lowercast95=last_data_value,
date=as.Date("2016-04-09"))
# Add extra row to the main data.frame.
frame = rbind(frame, extra_row)
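A variant of the same idea, sketched in base R on a small made-up data frame that uses the question's column names: instead of appending a second row for the last observed date, it writes the last observed value into the forecast columns of the existing row, so no date appears twice. The toy values are hypothetical; with the real frame you would skip the data.frame() construction.

```r
# Hypothetical toy frame mimicking the question's layout:
# 5 observed days followed by 3 forecast days (values are made up).
frame <- data.frame(
  date        = as.Date("2016-04-05") + 0:7,
  data        = c(1.0, 1.4, 1.1, 1.6, 1.8, NA, NA, NA),
  meancast    = c(NA, NA, NA, NA, NA, 1.9, 2.0, 2.1),
  lowercast95 = c(NA, NA, NA, NA, NA, 1.2, 1.0, 0.8),
  uppercast95 = c(NA, NA, NA, NA, NA, 2.6, 3.0, 3.4)
)

# Position of the last row that still has an observed value.
last_obs <- max(which(!is.na(frame$data)))

# Copy that observation into the forecast columns on the same date, so the
# forecast line and the ribbon both start exactly where the data ends.
fc_cols <- c("meancast", "lowercast95", "uppercast95")
frame[last_obs, fc_cols] <- frame$data[last_obs]

frame[last_obs:(last_obs + 1), ]
```

Because the dates stay unique, geom_line() and geom_ribbon() can be used on the frame unchanged.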

Related

How to overlap R histograms

Reproducible example:
library(haven)
library(survey)
library(dplyr)
nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))
# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA
nhanesAnalysis <- nhanesDemo %>%
mutate(LowIncome = case_when(
INDFMIN2 < 40 ~ TRUE,
TRUE ~ FALSE
)) %>%
# Select the necessary columns
select(INDFMIN2, LowIncome, persWeight, psu, strata)
# Set up the design
nhanesDesign <- svydesign(id = ~psu,
strata = ~strata,
weights = ~persWeight,
nest = TRUE,
data = nhanesAnalysis)
svyhist(~log10(INDFMIN2), design=nhanesDesign, main = '')
How do I color the histogram by an independent variable, say, LowIncome? I want two separate histograms, one for each value of LowIncome. My example is admittedly not a good one, but I want the histograms to be see-through in case their values overlap.
If you want to plot a histogram from your survey design, you can get its data with model.frame() (this is what svyhist does under the hood). To fill the histogram by group, you can use this data frame inside ggplot:
library(ggplot2)
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
geom_histogram(alpha = 0.5, color = "gray60", breaks = 0:20 / 10) +
theme_classic()
Edit
As Thomas Lumley points out, this does not incorporate the sampling weights. If you want them, map persWeight to the weight aesthetic:
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
geom_histogram(aes(weight = persWeight), alpha = 0.5,
color = "gray60", breaks = 0:20 / 10) +
theme_classic()
To demonstrate that this approach works, we can replicate Thomas's approach in ggplot using the data example from ?svyhist. Getting the uneven bin sizes (if desired) requires two histogram layers, though I'm guessing most use cases won't need that.
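library(survey)
library(ggplot2)
# dstrat is the stratified design from the ?svyhist example
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)
ggplot(model.frame(dstrat), aes(enroll)) +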
geom_histogram(aes(fill = "E", weight = pw, y = after_stat(density)),
data = subset(model.frame(dstrat), stype == "E"),
breaks = 0:35 * 100,
position = "identity", col = "gray50") +
geom_histogram(aes(fill = "Not E", weight = pw, y = after_stat(density)),
data = subset(model.frame(dstrat), stype != "E"),
position = "identity", col = "gray50",
breaks = 0:7 * 500) +
scale_fill_manual(NULL, values = c("#00880020", "#88000020")) +
theme_classic()
You can't just extract the data and use ggplot, because that won't use the weights and so misses the whole point of svyhist. You can use the add=TRUE argument, though. You do need to set the x and y axis ranges on the first call so that both histograms are fully visible.
Using the data example from ?svyhist:
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)
svyhist(~enroll, subset(dstrat,stype=="E"), col="#00880020",ylim=c(0,0.003),xlim=c(0,3500))
svyhist(~enroll, subset(dstrat,stype!="E"), col="#88000020",add=TRUE)
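To see why the weights matter, here is a base-R sketch of what a weighted density histogram represents: each bar is the total sampling weight in the bin, divided by the overall weight and the bin width. The toy data and the helper weighted_density() are hypothetical; this illustrates the idea and is not svyhist's actual code.

```r
# Hypothetical toy data: six observations with very unequal sampling weights.
x <- c(0.2, 0.4, 0.6, 1.2, 1.4, 1.8)
w <- c(10, 10, 10, 1, 1, 40)
breaks <- c(0, 1, 2)

weighted_density <- function(x, w, breaks) {
  # Same bins as hist(): right-closed, with the lowest edge included.
  bin <- cut(x, breaks, include.lowest = TRUE)
  # Total weight falling in each bin (empty bins give NA, set to 0).
  bin_w <- tapply(w, bin, sum)
  bin_w[is.na(bin_w)] <- 0
  # Scale so that the bar areas sum to 1, like hist()$density.
  as.numeric(bin_w / (sum(w) * diff(breaks)))
}

unweighted <- hist(x, breaks = breaks, plot = FALSE)$density
weighted <- weighted_density(x, w, breaks)

unweighted  # 0.5 0.5: both bins hold three observations
weighted    # the heavily weighted observation at 1.8 shifts mass to bin 2
```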

Is there a way I could plot t = 300, 350, 450, and 500 lines in one graph?

I want to plot multiple lines in one graph, but I couldn't figure out which code to use. Also, is there a way to assign a color to each line? I'm new to RStudio and was assigned to pick up someone else's work, so I've been doing a lot of trial and error, without luck for the past few days. I hope someone can help me with this! Thank you so much.
ecdf.shift <- function(OUR_threshold, des_cap = 40, nint = 10000){
#create some empty vectors for later use in the loop
ecdf_med = c()
ecdf_obs = c()
for (i in 1:length(OUR_threshold)){
# keep rows above the OUR threshold, then select only the capture column and create an ecdf function
ecdf_fun <- HRP_rESS_no %>%
filter(ESS > OUR_threshold[i]) %>%
.$TSS_con %>%
ecdf()
# extract the ecdf data and put in tibble dataframe, then create a linear interpolation of the curve.
ecdf_data <- tibble(TSS_con = environment(ecdf_fun)$x, prob = environment(ecdf_fun)$y)
ecdf_interpol <- approx(x = ecdf_data$TSS_con, y = ecdf_data$prob, n = nint)
# find the positions in x that correspond to the desired capture, then look up the matching probabilities in y. Take the median in case of multiple hits, and store it at position i of the output vector.
ecdf_med[i] <- median(ecdf_interpol$y[(round(ecdf_interpol$x,1) == des_cap)])
# calculate the number of observations when the filtering takes place.
ecdf_obs[i] <- HRP_rESS_no %>%
filter(ESS > OUR_threshold[i]) %>%
.$TSS_con %>%
length()
# Flush the ecdf data. The ecdf is encoded as a function with its own enclosed parameters, so reset the data every time the loop runs to avoid pesky bugs.
rm(ecdf_data)
}
#create a tibble dataframe with all the loop data.
ecdf_out <- tibble(OUR_ratio_cutoff = OUR_threshold, prob = (ecdf_med)*100, nobs = ecdf_obs)
return(ecdf_out)
}
ratio_threshold <- seq(0,115, by = 5)
t = ecdf_MLSS_target <- 400 %>%
ecdf.shift(ratio_threshold, .) %>%
filter(nobs > 2) %>%
ggplot(aes( x = OUR_ratio_cutoff, y = prob)) +
geom_line() +
geom_point() +
theme_bw(base_size = 12) +
theme(panel.grid = element_blank()) +
scale_y_continuous(limits = c(0,100),
breaks = seq(0,300, by = 5),
expand = c(0,0)) +
scale_x_continuous(limits = c(0,120),
breaks = seq(0,110, by = 10),
expand = c(0,0)) +
labs(x = "ESS mg TSS/L",
y = "Probability of contactor MLSS > 400 mg TSS/L ")
plot(t)
The easiest approach is to loop over your different t values first, combine the resulting data frames into one big data frame, and use that for your plot. Your code is not fully reproducible (it requires data we don't have, namely HRP_rESS_no), so I have stripped the function down to its core: it creates a data frame whose "line" depends on the t value, which I simply use as a slope.
I hope the idea is clear.
library(tidyverse)
ecdf.shift <- function(OUR_threshold, t) {
data.frame(x = OUR_threshold, y = t * OUR_threshold)
}
ratio_threshold <- seq(0, 115, by = 5)
t_df <-
map(1:5, function(t) ecdf.shift(ratio_threshold, t)) %>%
bind_rows(.id = "t")
ggplot(t_df, aes(x, y, color = t)) +
geom_line() +
geom_point()
Created on 2020-05-07 by the reprex package (v0.3.0)
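The same pattern in base R, this time tagging each line with the actual t values from the question (300, 350, 450 and 500) rather than their positions, so the legend shows the real values. shift_stub() is a hypothetical stand-in for ecdf.shift(), because the real function needs HRP_rESS_no.

```r
ratio_threshold <- seq(0, 115, by = 5)
t_values <- c(300, 350, 450, 500)

# Hypothetical stand-in for ecdf.shift(): one straight line per t value.
shift_stub <- function(threshold, t) {
  data.frame(x = threshold, y = threshold * t / 1000)
}

# Run the stub once per t value, record which t produced each row,
# then stack all results into one long data frame.
t_df <- do.call(rbind, lapply(t_values, function(t) {
  out <- shift_stub(ratio_threshold, t)
  out$t <- t
  out
}))

# A factor gives one discrete colour per t value in the legend.
t_df$t <- factor(t_df$t)

# With ggplot2 loaded, each t then becomes its own coloured line:
# ggplot(t_df, aes(x, y, colour = t)) + geom_line() + geom_point()
```

With the real data, the question's code passes the target value as the second argument (des_cap), so you would call ecdf.shift(ratio_threshold, t) in place of the stub, keep the filter(nobs > 2) step, and map OUR_ratio_cutoff and prob to x and y.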

apply fullrange option on multiple loess smooth lines when subsetting the data

How can I draw separate smooth lines for data from different time periods, but extend both across the full range of the plot?
In my working example below, even with fullrange = TRUE, each smooth line stops at the edge of its own data, and I get missing-values warnings (which makes sense, since each geom_smooth() call gets its own local subset of the data).
# convert time series to data.frame, conserving date info
sb <- data.frame(Seatbelts, date = time(Seatbelts))
# convert from ts to date
library(lubridate)
sb$date <- as_date(date_decimal(as.numeric(sb$date)))
# store seatbelt law date
law <- ymd(19830131)
# plot
library(ggplot2)
ggplot(sb) + aes(x = date, y = front) +
geom_line() +
geom_vline(xintercept = law, colour = "red") +
geom_smooth(data = sb[sb$date < law,],
fullrange = TRUE) +
geom_smooth(data = sb[sb$date > law,],
fullrange = TRUE)
Warning messages:
1: Removed 10 rows containing missing values (geom_smooth).
2: Removed 71 rows containing missing values (geom_smooth).
(currently using ggplot2 3.1.0 and R 3.5.2)
Edit:
Since I thought the issue might be the preliminary subsetting of the data, I also tried this cleaner version, to no avail:
# add before/after
sb$relative <- ifelse(sb$date < law, "before", "after")
# plot v.2
ggplot(sb) + aes(x = date, y = front) +
geom_line() +
geom_vline(xintercept = law, colour = "red") +
geom_smooth(aes(colour = relative),
fullrange = TRUE)
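The explanation for the behaviour you're seeing lies in the way the LOESS fit is performed: by default, predict.loess() returns NA for points outside the range of the data used in the fit, so geom_smooth() has nothing to draw there even with fullrange = TRUE. For example, predicting from the pre-law fit for dates from 1960 onwards: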
dates <- seq(as.Date("1960-01-01"), law, by = "1 day")
head(setNames(predict(
loess(front ~ as.numeric(date), data = sb[sb$date < law, ]),
data.frame(date = as.numeric(dates))), dates))
1960-01-01 1960-01-02 1960-01-03 1960-01-04 1960-01-05 1960-01-06
NA NA NA NA NA NA
This behaviour is explained in ?predict.loess:
When the fit was made using ‘surface = "interpolate"’ (the
default), ‘predict.loess’ will not extrapolate - so points outside
an axis-aligned hypercube enclosing the original data will have
missing (‘NA’) predictions and standard errors.
To extrapolate to points outside the range used for the LOESS fit, we can pass control = loess.control(surface = "direct") to loess().
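To see the difference between the two surface settings directly, here is a small check on the built-in Seatbelts data. It uses decimal years instead of Dates to keep it short (a simplification; the answer works with Dates).

```r
# Front-seat casualties against time as a decimal year (built-in dataset).
sb_num <- data.frame(front = as.numeric(Seatbelts[, "front"]),
                     year  = as.numeric(time(Seatbelts)))
pre <- sb_num[sb_num$year < 1983, ]

# Default surface = "interpolate" versus surface = "direct".
fit_interp <- loess(front ~ year, data = pre)
fit_direct <- loess(front ~ year, data = pre,
                    control = loess.control(surface = "direct"))

# Ask for predictions beyond the last year used in the fit.
new <- data.frame(year = c(1983.5, 1984))
p_interp <- predict(fit_interp, new)
p_direct <- predict(fit_direct, new)

p_interp  # NA: no extrapolation with the default surface
p_direct  # finite values: extrapolated
```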
Unfortunately this means that we need to manually perform the two LOESS fits, predict values for two ranges of interest, and plot everything.
Here is what I would do:
Define a convenience function extrapolate.loess that predicts values for the given dates together with lower/upper confidence bounds (alpha is the confidence level, e.g. 0.95):
library(tidyverse)
library(broom)
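extrapolate.loess <- function(data, dates, alpha = 0.95) {
  # alpha is the confidence level; z is about 1.96 for alpha = 0.95
  # (qnorm((1 - alpha) / 2) is negative, which swapped the lower and upper bounds)
  z <- qnorm(1 - (1 - alpha) / 2)
  loess(
    front ~ as.numeric(date), data = data,
    # surface = "direct" lets predictions extrapolate beyond the fitted range
    control = loess.control(surface = "direct")) %>%
    # recent broom versions only return .se.fit when se_fit = TRUE
    augment(newdata = data.frame(date = as.numeric(dates)), se_fit = TRUE) %>%
    transmute(
      date = dates,
      front = .fitted,
      front.l = front - z * .se.fit,
      front.h = front + z * .se.fit)
}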
We now store the LOESS estimates with their CIs for the two ranges in two data frames:
dates.left <- seq(as.Date("1960-01-01"), law, by = "1 day")
df.left <- extrapolate.loess(sb[sb$date < law, ], dates.left)
dates.right <- seq(law, as.Date("1990-01-01"), by = "1 day")
df.right <- extrapolate.loess(sb[sb$date > law, ], dates.right)
Now we can plot:
ggplot(sb) + aes(x = date, y = front) +
geom_line() +
geom_vline(xintercept = law, colour = "red") +
geom_line(data = df.left, colour = "blue", size = 1) +
geom_ribbon(data = df.left, aes(x = date, ymin = front.l, ymax = front.h), alpha = 0.2) +
geom_line(data = df.right, colour = "blue", size = 1) +
geom_ribbon(data = df.right, aes(x = date, ymin = front.l, ymax = front.h), alpha = 0.2) +
coord_cartesian(ylim = c(400, 1300))
I won't (and can't) comment on how useful or meaningful these LOESS extrapolations are.

How to plot a subset of forecast in R?

I have a simple R script that creates a forecast from a file.
The data has been recorded since 2014, but I am having trouble accomplishing the following two goals:
Plot only a subset of the forecast information (from 11/2017 onwards).
Show month and year in a specific format (e.g. Jun 17).
Here is the link to the dataset, and below is the code I have so far.
# Load required libraries
library(forecast)
library(ggplot2)
# Load dataset
emea <- read.csv(file="C:/Users/nsoria/Downloads/AMS Globales/EMEA_Depuy_Finanzas.csv", header=TRUE, sep=';', dec=",")
# Create time series object
ts_fin <- ts(emea$Value, frequency = 26, start = c(2014,11))
# Pull out the seasonal, trend, and irregular components from the time series
model <- stl(ts_fin, s.window = "periodic")
# Forecast the next 5 biweekly periods of tickets
pred <- forecast(model, h = 5)
# Plot the results
plot(pred, include = 5, showgap = FALSE, main = "Ticket amount", xlab = "Timeframe", ylab = "Quantity")
I would appreciate any help or suggestions on my two points, and on getting a clean plot.
Thanks in advance.
Edit 01/10 - Issue 1:
I added a screenshot of the output from the suggested code.
[Screenshot: Plot1]
Edit 01/10 - Issue 2:
After transforming the data with the code below, the dates seem to get miscounted, which changes the results. Please compare the last value in the two screenshots.
[Screenshots 1 and 2]
Plotting using ggplot2 with the ggfortify, tidyverse, lubridate and scales packages:
library(lubridate)
library(tidyverse)
library(scales)
library(ggfortify)
# Convert pred from list to data frame object
df1 <- fortify(pred) %>% as_tibble()
# Convert ts decimal time to Date class
# (for date-times, the second argument of as.Date() is tz, not a format string)
df1$Date <- as.Date(date_decimal(df1$Index))
str(df1)
# Remove Index column and rename other columns
# Select only data pts after 2017
df1 <- df1 %>%
select(-Index) %>%
filter(Date >= as.Date("2017-01-01")) %>%
rename("Low95" = "Lo 95",
"Low80" = "Lo 80",
"High95" = "Hi 95",
"High80" = "Hi 80",
"Forecast" = "Point Forecast")
df1
### Updated: To connect the gap between the Data & Forecast,
# assign the last non-NA row of Data column to the corresponding row of other columns
lastNonNAinData <- max(which(complete.cases(df1$Data)))
df1[lastNonNAinData, !(colnames(df1) %in% c("Data", "Fitted", "Date"))] <- df1$Data[lastNonNAinData]
# Or: use [geom_segment](http://ggplot2.tidyverse.org/reference/geom_segment.html)
plt1 <- ggplot(df1, aes(x = Date)) +
ggtitle("Ticket amount") +
xlab("Time frame") + ylab("Quantity") +
geom_ribbon(aes(ymin = Low95, ymax = High95, fill = "95%")) +
geom_ribbon(aes(ymin = Low80, ymax = High80, fill = "80%")) +
geom_point(aes(y = Data, colour = "Data"), size = 4) +
geom_line(aes(y = Data, group = 1, colour = "Data"),
linetype = "dotted", size = 0.75) +
geom_line(aes(y = Fitted, group = 2, colour = "Fitted"), size = 0.75) +
geom_line(aes(y = Forecast, group = 3, colour = "Forecast"), size = 0.75) +
scale_x_date(breaks = scales::pretty_breaks(), date_labels = "%b %y") +
scale_colour_brewer(name = "Legend", type = "qual", palette = "Dark2") +
scale_fill_brewer(name = "Intervals") +
guides(colour = guide_legend(order = 1), fill = guide_legend(order = 2)) +
theme_bw(base_size = 14)
plt1
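The question's two goals (plot only November 2017 onwards, and labels like "Jun 17") can also be sketched in base R without ggfortify or lubridate. This is a stand-in, not the answer's method: the series is made up, and decimal_to_date() is a hypothetical helper that approximates what lubridate::date_decimal() does. Converting biweekly decimal times to calendar dates is inherently approximate, since 26 periods don't divide a year exactly.

```r
# Hypothetical biweekly series (frequency 26) starting in period 11 of 2014,
# mirroring the question's ts() call; the values are made up.
ts_fin <- ts(seq_len(100), frequency = 26, start = c(2014, 11))

# Convert a decimal time such as 2014.38 to a Date: the start of the year
# plus the fractional part of the year expressed in days.
decimal_to_date <- function(tt) {
  yr <- floor(tt)
  start <- as.Date(paste0(yr, "-01-01"))
  days_in_year <- as.numeric(as.Date(paste0(yr + 1, "-01-01")) - start)
  start + round((tt - yr) * days_in_year)
}

series_df <- data.frame(Date  = decimal_to_date(as.numeric(time(ts_fin))),
                        Value = as.numeric(ts_fin))

# Goal 1: keep only observations from November 2017 onwards.
recent <- series_df[series_df$Date >= as.Date("2017-11-01"), ]

# Goal 2: month-year labels such as "Jun 17" (month names follow the locale).
month_labels <- format(recent$Date, "%b %y")
```

In ggplot, the same format string goes into scale_x_date(date_labels = "%b %y"), as in the answer above.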

Use R to recreate contour plot made in Igor

This contour plot, made with the Igor program, is popular in atmospheric chemistry and pollution studies:
I'm trying to recreate it in R for a friend who wants to stop using Igor, and we can't quite get it right. Here's the dataset (the same data used to make the Igor plot), and here's what I've got so far in R:
# read in the data
dat <- read.csv("contour_plot_data.csv")
# focus on the untransformed values
dat <- dat[, 1:108]
# get Diameter value from col names
Diameter <- as.numeric(gsub("X", "", names(dat)[-1]))
# interpolate between the Diameter values for a smoother contour,
# a separate interpolation for each row (date value)
# this takes a moment or two...
interp <- seq(min(Diameter), max(Diameter), 0.2)
dat_interp <- data.frame(matrix(0, ncol = length(interp), nrow = nrow(dat)))
for(i in 1:nrow(dat)){
# get the values from row i
vec <- unlist(dat[i, 2:108], use.names = FALSE)
# compute loess interpolations
lo <- loess(vec ~ Diameter)
# predict interpolated values
pr <- predict(lo, newdata = data.frame(Diameter = interp))
# store in a data frame
df <- data.frame(ct = unname(pr), Diameter = interp)
# add as new row to new data frame
dat_interp[i, ] <- df$ct
print(i) # so we can see that it's working
}
# add date col and col names to the interpolated data
names(dat_interp) <- interp
dat_interp$date <- as.character(dat$Time)
# melt data into long format
# see http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
library(tidyr)
gather_cols <- interp
dat_long <- gather_(dat_interp, "Diameter", "dN_dlogDp", gather_cols)
# we want diameter as a numeric
dat_long$Diameter <- as.numeric(as.character(dat_long$Diameter))
# we want date as a date format
x <- as.character(dat_long$date)
date_ <- as.Date(x, format = "%d/%m/%Y")
time_ <- gsub(" ", "", substr(x, nchar(x) - 4, nchar(x)))
dat_long$date_time <- as.POSIXct(paste0(date_, " ", time_))
# The Igor plot seems to use log dN_dlogDp values, so let's get those
dat_long$dN_dlogDp_log <- log10(dat_long$dN_dlogDp)
dat_long$dN_dlogDp_log <- ifelse(is.nan(dat_long$dN_dlogDp_log), 0, dat_long$dN_dlogDp_log)
# get on with plotting...
library(ggplot2)
library(scales)
labels_breaks <- seq(0, max(Diameter), 100)
mytheme <- theme_bw(base_size = 14) + theme(aspect.ratio = 1/4)
ggplot(dat_long, aes(y = Diameter, x = date_time, fill=dN_dlogDp_log)) +
geom_raster(interpolate = TRUE) +
scale_fill_gradientn(name=expression(log(dN/dlogD[p])), colours = rainbow(7)) +
scale_y_continuous(expand = c(0,0), breaks = labels_breaks ) +
scale_x_datetime(expand = c(0,0), breaks = date_breaks("12 hours")) +
ylab("Diameter (nm)") +
xlab("Date and time") +
mytheme
My plot could do with a little more finessing with labels and tick marks, etc. However, my main question is why my contour fill looks so different from the Igor plot. The scale seems reversed, and the interpolation looks very different.
How can I make my plot look more like the Igor plot?
Note that these other questions of mine are closely related to the task of recreating this plot:
geom_raster interpolation with log scale
2d density plot for categories
Since asking this question, I have been keeping an updated gist of R code that combines details from the answers to these questions and successfully replicates these plots (example output included in the gist): https://gist.github.com/benmarwick/9a54cbd325149a8ff405.
UPDATE: I've now made a package that will produce these plots: https://github.com/benmarwick/smps
I can get a lot closer to the Igor plot using akima::interp instead of loess for the interpolation:
# read in the data
dat <- read.csv("contour_plot_data.csv")
# focus on the untransformed values
dat <- dat[, 1:108]
# get Diameter value from col names
Diameter <- as.numeric(gsub("X", "", names(dat)[-1]))
# melt data into long format
# see http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
library(tidyr)
dat_long <- gather(dat, "Diameter", "dN_dlogDp", 2:108)
# we want diameter as a numeric
dat_long$Diameter <- as.numeric(gsub("X", "", dat_long$Diameter ))
# we want time as a date-formatted variable
x <- as.character(dat_long$Time)
date_ <- as.Date(x, format = "%d/%m/%Y")
time_ <- gsub(" ", "", substr(x, nchar(x) - 4, nchar(x)))
dat_long$Time <- as.POSIXct(paste0(date_, " ", time_))
# The Igor plot seems to use log dN_dlogDp values, so let's get those
dat_long$dN_dlogDp_log <- log10(dat_long$dN_dlogDp)
dat_long$dN_dlogDp_log <- ifelse(is.nan(dat_long$dN_dlogDp_log) |
is.infinite(dat_long$dN_dlogDp_log), 0, dat_long$dN_dlogDp_log)
# interpolate between the values for a smoother contour
# this takes a moment or two...
library(akima)
xo <- with(dat_long, seq(min(Time), max(Time), 120))
yo <- with(dat_long, seq(min(Diameter), max(Diameter), 0.5))
dat_interp <- with(dat_long, interp(Time, Diameter, dN_dlogDp_log, xo = xo, yo = yo) )
# get on with plotting...
# make into a data frame for ggplot
dat_interp_df <- data.frame(matrix(data = dat_interp$z, ncol = length(dat_interp$y), nrow = length(dat_interp$x)))
names(dat_interp_df) <- dat_interp$y
dat_interp_df$Time <- as.POSIXct(dat_interp$x, origin = "1970-01-01")
# wide to long
dat_interp_df_long <- gather(dat_interp_df, "Diameter", "dN_dlogDp_log", 1:(ncol(dat_interp_df)-1))
dat_interp_df_long$Diameter <- as.numeric(as.character(dat_interp_df_long$Diameter))
# plot
library(ggplot2)
library(scales)
y_labels_breaks <- seq(0, max(Diameter), 100)
ggplot(dat_interp_df_long, aes(y = Diameter, x = Time, fill = dN_dlogDp_log)) +
geom_raster(interpolate = TRUE) +
scale_fill_gradientn(name=expression(log(dN/dlogD[p])), colours = rev(rainbow(50))) +
scale_y_continuous(expand = c(0,0), breaks = y_labels_breaks ) +
scale_x_datetime(expand = c(0,0), breaks = date_breaks("1 day"))
But there is still quite a big difference in the colour mapping: the Igor plot has wide bands with sharp boundaries, while my plot has fewer colour bands with fuzzy boundaries between them. So I guess I don't quite have the interpolation method that Igor uses.
UPDATE: After experimenting with a bunch of colour ramps, I've found a pretty good match in colorRamps::blue2green2red. I've also put a bit of effort into fancy tick marks here:
# plot
library(ggplot2)
library(scales) # for date_breaks
library(colorRamps) # for blue2green2red
# function for minor tick marks
every_nth <- function(x, nth, empty = TRUE, inverse = FALSE)
{
if (!inverse) {
if(empty) {
x[1:nth == 1] <- ""
x
} else {
x[1:nth != 1]
}
} else {
if(empty) {
x[1:nth != 1] <- ""
x
} else {
x[1:nth == 1]
}
}
}
# add tick marks every two hours
start_date <- min(dat_interp_df_long$Time)
end_date <- max(dat_interp_df_long$Time)
date_breaks_2h <- seq(from = start_date, to = end_date, by = "2 hours")
date_breaks_1_day <- seq(from = start_date, to = end_date, by = "1 day")
multiple <- length(date_breaks_2h) / length(date_breaks_1_day)
insert_minor <- function(major_labs, n_minor) {labs <-
c( sapply( major_labs, function(x) c(x, rep("", multiple) ) ) )
labs[1:(length(labs)-n_minor)]}
y_labels_breaks <- seq(0, max(Diameter), 100)
mytheme <- theme_bw(base_size = 14) + theme(aspect.ratio = 1/5)
ggplot(dat_interp_df_long, aes(y = Diameter, x = Time, fill = dN_dlogDp_log)) +
geom_raster(interpolate = TRUE) +
scale_fill_gradientn(name=expression(log(dN/dlogD[p])), colours = blue2green2red(100)) +
scale_y_continuous(expand = c(0,0),
labels = every_nth(y_labels_breaks, 2, inverse = TRUE),
breaks = y_labels_breaks) +
scale_x_datetime(expand = c(0,0),
breaks=date_breaks_2h,
labels=insert_minor(format(date_breaks_1_day, "%d %b"),
length(date_breaks_1_day))) +
xlab("Day and time") +
ylab("Diameter (nm)") +
mytheme
The green-blue gradient is still a bit different from the Igor plot. I have very little green at all! Perhaps further experimentation with colour ramps might improve the match there.
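The insert_minor() helper above relies on the global multiple and only lines up with the breaks when the ratio of 2-hour to 1-day breaks works out. A possibly sturdier alternative (my own suggestion, not part of the original answer) is to build the labels directly from the 2-hour breaks, labelling every 12th one, so the labels always have exactly one entry per break. The start and end times below are hypothetical.

```r
# Hypothetical time range: three days of data starting at midnight UTC.
start_date <- as.POSIXct("2013-07-01 00:00", tz = "UTC")
end_date   <- start_date + 3 * 24 * 3600

# One break every two hours, as in the answer.
breaks_2h <- seq(from = start_date, to = end_date, by = "2 hours")

# Label every 12th break (once per day) and leave the rest blank, so the
# labels vector always has the same length as the breaks vector.
day_labels <- ifelse(seq_along(breaks_2h) %% 12 == 1,
                     format(breaks_2h, "%d %b"), "")
```

These would then go into scale_x_datetime(breaks = breaks_2h, labels = day_labels).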
To get the y-axis on a log scale, some additional effort is required. We have to use geom_rect and adjust the sizes of each rectangle to fit in the log scale:
################## y-axis with log scale ###########################
# get visually diminishing axis ticks
base_breaks <- function(n = 10){
function(x) {
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
}
# Now with log axis, we need to replace the ymin and ymax
distance <- diff((unique(dat_interp_df_long$Diameter)))/2
upper <- (unique(dat_interp_df_long$Diameter)) + c(distance, distance[length(distance)])
lower <- (unique(dat_interp_df_long$Diameter)) - c(distance[1], distance)
# Create xmin, xmax, ymin, ymax
dat_interp_df_long$xmin <- dat_interp_df_long$Time - 1000 # default of geom_raster is 0.5
dat_interp_df_long$xmax <- dat_interp_df_long$Time + 1000
idx <- rle(dat_interp_df_long$Diameter)$lengths[1]
dat_interp_df_long$ymin <- unlist(lapply(lower, function(i) rep(i, idx)))
dat_interp_df_long$ymax <- unlist(lapply(upper, function(i) rep(i, idx)))
ggplot(dat_interp_df_long, aes(y = Diameter, x = Time,
xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax,
fill = dN_dlogDp_log)) +
geom_rect() +
scale_fill_gradientn(name=expression(log(dN/dlogD[p])), colours = blue2green2red(1000)) +
scale_y_continuous(expand = c(0,0),
trans = log_trans(), breaks = base_breaks()) +
scale_x_datetime(expand = c(0,0),
breaks=date_breaks_2h,
labels=insert_minor(format(date_breaks_1_day, "%d %b"),
length(date_breaks_1_day))) +
xlab("Day and time") +
ylab("Diameter (nm)") +
mytheme
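The rectangle edges above are arithmetic midpoints between neighbouring diameters, so on a log axis each cell is not centred on its diameter. A possible variation (an assumption on my part, not what the answer does) computes the same midpoints in log10 space, which makes each edge the geometric mean of its two neighbours. The diameters below are hypothetical.

```r
# Hypothetical diameter bin centres (nm).
diam <- c(10, 15, 22, 33, 50, 75, 110)

# Same construction as distance/upper/lower above, but in log10 space:
# inner edges sit halfway between neighbouring centres, and the outer
# edges extend by half of the nearest gap.
ld    <- log10(diam)
half  <- diff(ld) / 2
lower <- 10^(ld - c(half[1], half))
upper <- 10^(ld + c(half, half[length(half)]))

# Neighbouring cells share an edge, so the rectangles tile without gaps.
cbind(lower, diam, upper)
```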
UPDATE: After some experimentation with colour ramps, I've found a pretty close match:
# adjust the colour ramp to match the Igor plot (their colour ramp is pretty uneven! lots of red and blue, it seems.)
colfunc <- colorRampPalette(c( rep("red", 3),
rep("yellow", 1),
rep("green", 2),
"cyan",
rep("blue", 3),
"purple"))
y_labels_breaks <- seq(0, max(Diameter), 100)
mytheme <- theme_bw(base_size = 14) + theme(aspect.ratio = 1/5)
ggplot(dat_interp_df_long, aes(y = Diameter, x = Time, fill = dN_dlogDp_log)) +
geom_raster(interpolate = TRUE) +
scale_fill_gradientn(name=expression(log(dN/dlogD[p])), colours = rev(colfunc(100))) +
scale_y_continuous(expand = c(0,0),
labels = every_nth(y_labels_breaks, 2, inverse = TRUE),
breaks = y_labels_breaks) +
scale_x_datetime(expand = c(0,0),
breaks=date_breaks_2h,
labels=insert_minor(format(date_breaks_1_day, "%d %b"),
length(date_breaks_1_day))) +
xlab("Day and time") +
ylab("Diameter (nm)") +
mytheme
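Repeating a colour in the vector passed to colorRampPalette() gives that colour a larger share of the gradient, which is how colfunc above produces broad red and blue bands. A minimal two-colour illustration:

```r
# Default: red fades evenly into blue.
even <- colorRampPalette(c("red", "blue"))(7)

# Repeating "red" three times makes the first two thirds of the ramp
# pure red, like the weighted ramp in colfunc.
weighted <- colorRampPalette(c(rep("red", 3), "blue"))(7)

sum(even == "#FF0000")      # 1
sum(weighted == "#FF0000")  # 5
```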
The code from this post is also at https://gist.github.com/benmarwick/9a54cbd325149a8ff405
