I've been trying to plot different exponential decay curves on one graph. Initially I thought this would be rather easy, but it is turning out to be rather frustrating.
What I want to get:
nlsplot(k_data_nls, model = 6, start = c(a= 603.3, b= -0.03812), xlab = "hours", ylab = "copies")
nlsplot(r4, model=6, start=c(a=25.5487,b=-0.5723), xlab = "hours", ylab = "copies")
Here is some additional code for the data:
df4 <- data.frame(hours=c(0,1,3,5,12,24,48,96,168,336,504,720), copies=c(603.3,406,588,393.27,458.47,501.67,767.53,444.13,340.6,298.47,61.42,51.6))
nlsfit(df4, model=6, start=c(a=603.3,b=-0.009955831526))
d4plot <- nlsplot(df4, model=6, start=c(a=603.3,b=-0.009955831526))
r4 <- data.frame(hours=c(0,1,3,5,12,24,48,96,168,336,504,720), copies=c(26,13.44,4.57,3.12,6.89,0.71,0.47,0.47,0,0,0.24,0.48))
nlsLM(copies ~ a*exp(b*hours), data=r4, start=list(a=26,b=-0.65986))
r4plot <- nlsplot(r4, model=6, start=c(a=25.5487,b=-0.5723))
Essentially, I want to get both of these plots on one graph. I'm new to R, so I'm not sure where to go from here. Thank you!
I don't know if this is actually helpful because it's so specific, but this is how I would do it (with ggplot2). First, you need data for the function you want to plot. Take the x for all the values you want to display and apply your function with your coefficients to the data. You need to have data points, not just a function, to plot data.
df_simulated <- data.frame("x" = rep(1:100, 2),
"class"= rep(c("DNA", "RNA"), each = 100))
df_simulated$y <- c(1683.7 * exp(-0.103 * 1:100), # DNA
578.7455 * exp(-0.156 * 1:100)) # RNA
However, since I never used the packages you used, I don't know how to extract the values from the models, so I took the values in your example plot. It's important that the "simulated" values for both groups are within one dataframe, and that you have a column which attributes the points to the respective group (RNA or DNA). At least it's easier if you do it like this. Then you need a data frame with your actual observations for the dots. I invented data again:
df_observed <- data.frame("x" = c(12, 13, 25, 26, 50, 51),
"y" = c(500, 50, 250, 25, 0, 5),
"class" = rep(c("DNA", "RNA"), 3))
Then you can create the plot. With color=class you specify that the data points will be grouped by "class" and will be colored accordingly. ("apple" and "banana" are just dummy words to demonstrate linebreaks)
ggplot() +
geom_line(data = df_simulated, aes(x = x, y = y, color = class), size = 1, linetype = "dashed") +
geom_point(data = df_observed, aes(x = x, y = y, color = class), size = 4, pch = 1) +
annotate("text", x = 50, y = 1250, label = "DNA\napple", color = "tomato", hjust = 0) +
annotate("text", x = 50, y = 750, label ="RNA\nbanana", color = "steelblue", hjust = 0) +
ggtitle(expression(~italic("Styela clava")~"(isolated)")) +
ylab("COI copies per 1ml") +
xlab("Time since removal of organisms (hours)") +
theme_classic() +
theme(legend.position = "none") +
scale_color_manual(values = c("DNA" = "tomato", "RNA" = "steelblue"))
This is the output:
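The coefficients in the answer above were read off the example plot. As a rough sketch of how they could instead be pulled from a fitted model, the following uses base R's nls() (swapped in here for the nlsplot/nlsLM functions from the question) on the r4 data from the question. The names fit_r and df_sim_r are made up for illustration.

```r
# r4 data copied from the question
r4 <- data.frame(
  hours  = c(0, 1, 3, 5, 12, 24, 48, 96, 168, 336, 504, 720),
  copies = c(26, 13.44, 4.57, 3.12, 6.89, 0.71, 0.47, 0.47, 0, 0, 0.24, 0.48)
)

# Fit copies = a * exp(b * hours) with base R nls();
# the starting values are the ones used in the question
fit_r <- nls(copies ~ a * exp(b * hours), data = r4,
             start = list(a = 26, b = -0.65986))

coef(fit_r)  # named vector holding the estimates of a and b

# Evaluate the fitted curve on a fine grid, ready for geom_line()
df_sim_r <- data.frame(x = seq(0, 100, length.out = 100), class = "RNA")
df_sim_r$y <- predict(fit_r, newdata = data.frame(hours = df_sim_r$x))
```

Repeating this for the DNA data and stacking both results with rbind() would give a data frame with the same layout as df_simulated.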
First note that R squared is normally used for linear models and not for nonlinear models so the use of this statistic is suspect here; however, below we show it anyways since it seems that is what was asked for. A different goodness of fit measurement that is often used is residual standard error. If fm is the fitted model from nls then sigma(fm) is the residual standard error. Smaller values are more favorable. summary(fm) also reports this value.
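As a small illustration of the residual standard error mentioned above, here is a sketch on simulated decay data (the data frame sim and its parameters are invented for the example, not taken from the question). It shows that sigma(fm), summary(fm)$sigma and the manual formula give the same number.

```r
set.seed(42)

# Simulated exponential decay with noise, purely for illustration
sim <- data.frame(hours = 0:20)
sim$copies <- 100 * exp(-0.3 * sim$hours) + rnorm(nrow(sim), sd = 2)

fm <- nls(copies ~ a * exp(b * hours), data = sim,
          start = list(a = 90, b = -0.2))

sigma(fm)          # residual standard error
summary(fm)$sigma  # the same value, as reported by summary()

# Manual computation: sqrt(residual sum of squares / residual df)
sqrt(deviance(fm) / df.residual(fm))
```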
For each of df4 and r4 we use lm to get starting values (taking log of both sides we get a model that is linear in log(a) and b), run nls fits and get the coefficients.
Now plot the points and add the fitted lines and legend. (Note that in setting up the graph we use rbind which assumes that df4 and r4 have the same column names, which they do.)
Note that the data provided in the question is much different than that shown in the question's image.
The code below does not need starting values since it uses lm to get them, runs nls and automatically extracts whatever information is needed for the graph.
1) Classic graphics In this alternative no packages are used.
r2 <- function(fm, digits = 3) {
y <- fitted(fm) + resid(fm)
r2 <- 1 - deviance(fm) / sum((y - mean(y))^2)
if (is.numeric(digits)) r2 <- round(r2, digits)
r2
}
fo <- copies ~ a * exp(b * hours) # formula used in nls
# get nls fitted model and coefficients for df4
co_d0 <- coef(lm(log(copies) ~ hours, df4, subset = copies > 0))
fmd <- nls(fo, df4, start = list(a = exp(co_d0[[1]]), b = co_d0[[2]]))
co_d <- round(coef(fmd), 4)
# get nls fitted model and coefficients for r4
co_r0 <- coef(lm(log(copies) ~ hours, r4, subset = copies > 0))
fmr <- nls(fo, r4, start = list(a = exp(co_r0[[1]]), b = co_r0[[2]]))
co_r <- round(coef(fmr), 4)
both <- rbind(cbind(df4, col = "red"), cbind(r4, col = "blue"))
plot(both[1:2], col = both$col,
xlab = "Time since removal of organisms", ylab = "COI copies per 1ml",
main = "C)" ~ italic("Styela clava") ~ "(isolated)", adj = 0)
lines(fitted(fmd) ~ hours, df4, col = "red", lty = 2)
lines(fitted(fmr) ~ hours, r4, col = "blue", lty = 2)
legend <- c(bquote(DNA),
bquote(y == .(co_d[[1]]) * e ^ {.(co_d[[2]])*x}),
bquote(R^2 == .(r2(fmd))),
bquote(),
bquote(RNA),
bquote(y == .(co_r[[1]]) * e ^ {.(co_r[[2]])*x}),
bquote(R^2 == .(r2(fmr))))
legend("right", legend = as.expression(legend), bty = "n",
text.col = c("red", "red", "red", NA, "blue", "blue", "blue"))
2) ggplot2 This uses ggplot2 and gridtext. r2, fmd, fmr, co_d and co_r are all taken from (1). We use richtext_grob from gridtext to create a custom grob for the legend and pass it using annotation_custom.
library(gridtext)
library(ggplot2)
txt <- sprintf(
"<span style='color:red'>DNA
<br>y = %.3f*e<sup>%.3fx</sup>
<br>R<sup>2</sup> = %.3f</span>
<br><br><span style='color:blue'>RNA
<br>y = %.3f*e<sup>%.3fx</sup>
<br>R<sup>2</sup> = %.3f</span>",
co_d[[1]], co_d[[2]], r2(fmd), co_r[[1]], co_r[[2]], r2(fmr))
both2 <- rbind(cbind(df4, col = "red", fitted = fitted(fmd)),
cbind(r4, col = "blue", fitted = fitted(fmr)))
ggplot(both2, aes(hours, copies, col = I(col))) +
geom_point() +
geom_line(aes(y = fitted), linetype = 2) +
annotation_custom(richtext_grob(txt, hjust = 0)) +
theme(legend.position = "none") +
labs(x = "Time since removal of organisms", y = "COI copies per 1ml") +
ggtitle(("C)" ~ italic("Styela clava") ~ "(isolated)"))
3) lattice
This uses legend from (1) and both2 from (2). First create a plot for the data points. It will also contain the legend, axes and labels. Then add a layer for the fitted lines. main.settings specifies that the main title should be left justified and bold and is adapted from this page.
library(latticeExtra)
main.settings <- list(par.main.text = list(font = 2, just = "left",
x = grid::unit(25, "mm")))
xyplot(copies ~ hours, both2, col = both2$col,
xlab = "Time since removal of organisms", ylab = "COI copies per 1ml",
main = "C)" ~ italic("Styela clava") ~ "(isolated)", adj = 0,
key = list(text = list(as.expression(legend),
col = c("red", "red", "red", NA, "blue", "blue", "blue")),
x = 0.65, y = 0.65, columns = 1),
par.settings = main.settings) +
as.layer(xyplot(fitted ~ hours, both2, groups = col, type = "l", lty = 2))
I would like to draw box plots for a data set that has four categories (Good, Bad, VeryGood and VeryBad) and four normally distributed variables.
My question is how to plot the four categories for each of the four variables in one figure, with the groups clearly separated from each other. I have tried (see below), but the result looks like a mess.
I started from an example that I found here and made some changes to it.
I have also added an edited plot showing what I would like each box plot to look like: it is clearer, and each of the four categories (blue, yellow, red and green) is easy to distinguish.
par(mfrow=c(2,2))
df <- data.frame(id = c(rep("Good",200), rep("Bad", 200),
rep("VeryGood",200), rep("VeryBad",200)),
F1 = c(rnorm(200,10,2), rnorm(200,8,1), rnorm(200,5,2),rnorm(200,7,3)),
F2 = c(rnorm(200,7,1), rnorm(200,6,1), rnorm(200,8,1),rnorm(200,12,4)),
F3 = c(rnorm(200,6,2), rnorm(200,9,3),rnorm(200,12,3),rnorm(200,15,2)),
F4 = c(rnorm(200,12,3), rnorm(200,8,2),rnorm(200,8,5),rnorm(200,5,1)))
boxplot(df[,-1], xlim = c(0.5, ncol(df[,-1])+0.9),
boxfill=rgb(1, 1, 1, alpha=1), border=rgb(1, 1, 1, alpha=1)) #invisible boxes
boxplot(df[which(df$id=="Good"), -1], xaxt = "n", add = TRUE, boxfill="red", boxwex=0.25,
at = 1:ncol(df[,-1]) - 0.15) #shift these left by -0.15
boxplot(df[which(df$id=="Bad"), -1], xaxt = "n", add = TRUE, boxfill="blue", boxwex=0.25,
at = 1:ncol(df[,-1]) + 0.15) #shift these right by +0.15
boxplot(df[which(df$id=="VeryBad"), -1], xaxt = "n", add = TRUE, boxfill="green", boxwex=0.25,
at = 1:ncol(df[,-1]) + 0.25) #shift these right by +0.25
boxplot(df[which(df$id=="VeryGood"), -1], xaxt = "n", add = TRUE, boxfill="yellow", boxwex=0.25,
at = 1:ncol(df[,-1]) + 0.45) #shift these right by +0.45
If you aren't set on using Base R graphics, and looking at the new plot you added to the question, I believe this is what you are looking for:
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(id = c(rep("Good",200), rep("Bad", 200),
rep("VeryGood",200), rep("VeryBad",200)),
F1 = c(rnorm(200,10,2), rnorm(200,8,1), rnorm(200,5,2),rnorm(200,7,3)),
F2 = c(rnorm(200,7,1), rnorm(200,6,1), rnorm(200,8,1),rnorm(200,12,4)),
F3 = c(rnorm(200,6,2), rnorm(200,9,3),rnorm(200,12,3),rnorm(200,15,2)),
F4 = c(rnorm(200,12,3), rnorm(200,8,2),rnorm(200,8,5),rnorm(200,5,1)))
df2 <- tidyr::gather(df, key = "FVar", value = "value", F1:F4)
df2 %>%
ggplot(aes(id, value, fill = id)) +
geom_boxplot() +
facet_grid(. ~ FVar) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
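If base R graphics are preferred after all, a comparable layout can be sketched with one panel per variable, reusing the par(mfrow = c(2, 2)) call from the question. This is only one possible approach; the colour vector cols and the seed are choices made for the example.

```r
set.seed(1)  # only so that the example is reproducible

df <- data.frame(
  id = rep(c("Good", "Bad", "VeryGood", "VeryBad"), each = 200),
  F1 = c(rnorm(200, 10, 2), rnorm(200, 8, 1), rnorm(200, 5, 2), rnorm(200, 7, 3)),
  F2 = c(rnorm(200, 7, 1), rnorm(200, 6, 1), rnorm(200, 8, 1), rnorm(200, 12, 4)),
  F3 = c(rnorm(200, 6, 2), rnorm(200, 9, 3), rnorm(200, 12, 3), rnorm(200, 15, 2)),
  F4 = c(rnorm(200, 12, 3), rnorm(200, 8, 2), rnorm(200, 8, 5), rnorm(200, 5, 1))
)
df$id <- factor(df$id)  # levels are sorted alphabetically

# One colour per category, looked up by level name
cols <- c(Bad = "blue", Good = "red", VeryBad = "green", VeryGood = "yellow")

op <- par(mfrow = c(2, 2))  # 2 x 2 grid: one panel per variable
for (v in c("F1", "F2", "F3", "F4")) {
  boxplot(df[[v]] ~ df$id, col = cols[levels(df$id)],
          main = v, xlab = "", ylab = "value")
}
par(op)  # restore the previous graphics settings
```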
I am using a copula to look at the probability of occurrence of events based on duration and magnitude of the events. I can create contours for recurrence intervals with observed and simulated data in base R graphics, but I can't figure out how to reproduce in ggplot2. Why not just produce the graphs in base graphics and move on you may be wondering? Because I'm including the graphs in a short summary report and want to have consistency with numerous other graphs in the report. Below is some example code. I know that using the location, scale, and shape for the GEV distribution to create random deviates to get the same distribution from is not ideal, but it is the best way I could think of to create a somewhat reproducible example, despite the poor correlation at the end. In base R, the contours are generated from a matrix of simulated data. Is this possible in ggplot2?
library(evd)
library(copula)
dur <- rgev(500, 2.854659, 2.170122, -0.007829)
mag <- rgev(500, 0.02482, 0.01996, 0.04603)
fDurGev <- fgev(dur)
fMagGev <- fgev(mag)
durVec <- dgev(dur, fDurGev[[1]][1], fDurGev[[1]][2], fDurGev[[1]][3])
magVec <- dgev(mag, fMagGev[[1]][1], fMagGev[[1]][2], fMagGev[[1]][3])
durMagMat <- as.matrix(cbind(duration = durVec, magnitude = magVec))
theta <- coef(fitCopula(claytonCopula(dim = 2), durMagMat, method = "itau"))
clayCop <- claytonCopula(theta, dim = 2)
fCopDurMag <- pCopula(durMagMat, clayCop)
copPts <- data.frame(duration = dur, magnitude = mag, copNEP = fCopDurMag,
copEP = (1 - fCopDurMag), copRI = (1 / fCopDurMag))
fSim <- seq(0.05, 0.99998, length.out = 1000)
quaDur <- qgev(fSim, fDurGev[[1]][1], fDurGev[[1]][2], fDurGev[[1]][3])
quaMag <- qgev(fSim, fMagGev[[1]][1], fMagGev[[1]][2], fMagGev[[1]][3])
expDurMagMat <- cbind(expand.grid(fSim, fSim)$Var1, expand.grid(fSim,
fSim)$Var2)
simPred <- pCopula(expDurMagMat, clayCop)
simPredMat <- matrix(simPred, 1000, 1000)
simDF <- data.frame(simDur = quaDur, simMag = quaMag, simPredMat)
rndPred <- data.frame(rCopula(5000, clayCop))
rndPred$rndDur <- qgev(rndPred[,1], fDurGev[[1]][1], fDurGev[[1]][2],
fDurGev[[1]][3])
rndPred$rndMag <- qgev(rndPred[,2], fMagGev[[1]][1], fMagGev[[1]][2],
fMagGev[[1]][3])
RI <- c(1.25, 2 ,5, 10, 20, 50, 100, 200, 500)
NEP <- 1 - (1 / RI)
plot(rndPred$rndDur, rndPred$rndMag, col = "light grey", cex = 0.5, xlab =
"Duration (time)", ylab = "Magnitude (x)")
points(copPts[,1], copPts[,2], col = "red", cex = 0.5)
contour(simDF$simDur, simDF$simMag, simPredMat, levels = NEP, labels = RI,
xaxs = 'i', yaxs = 'i', labcex = 0.6, lwd = 1, col = "black", add =
TRUE, method = "flattest", vfont = c("sans serif", "plain"))
And now for my attempt to recreate in ggplot2 (which fails to draw contours).
library(dplyr)
simDF <- data.frame(dur = expDurMagMat[, 1], mag = expDurMagMat[, 2], NEP = simPred)
simDF <- simDF %>%
dplyr::mutate(quaDur = qgev(NEP, fDurGev[[1]][1], fDurGev[[1]][2], fDurGev[[1]][3])) %>%
dplyr::mutate(quaMag = qgev(NEP, fMagGev[[1]][1], fMagGev[[1]][2], fMagGev[[1]][3]))
library(ggplot2)
ggplot(data = rndPred, aes(x = rndDur, y = rndMag)) +
geom_point(color = "light grey", alpha = 0.5) +
labs(x = "Duration (time)", y = "Magnitude (x)") +
geom_point(data = copPts, aes(x = duration, y = magnitude),
color = "red") +
geom_contour(data = simDF, aes(x = quaDur, y = quaMag, z = NEP),
inherit.aes = FALSE, breaks = NEP) +
theme_classic()
Thank you to anyone who can help.
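One possible direction, sketched on the built-in volcano matrix rather than on the copula data above: contour() accepts a matrix, whereas geom_contour() expects a long data frame with one row per grid cell. The conversion below relies on expand.grid() varying its first argument fastest, which matches the column-major order of as.vector() on a matrix. Applied to the question, this would presumably mean pairing expand.grid(quaDur, quaMag) with as.vector(simPredMat), but that has not been tested here.

```r
library(ggplot2)

# volcano is a built-in 87 x 61 matrix; treat rows as x and columns as y
x <- seq_len(nrow(volcano))
y <- seq_len(ncol(volcano))

# Base graphics: contour() takes the matrix directly
contour(x, y, volcano, levels = c(120, 150, 180))

# ggplot2: build one row per grid cell. expand.grid() varies x fastest,
# which matches the column-major order of as.vector(volcano)
long <- expand.grid(x = x, y = y)
long$z <- as.vector(volcano)

p <- ggplot(long, aes(x, y, z = z)) +
  geom_contour(breaks = c(120, 150, 180), colour = "black") +
  theme_classic()
print(p)
```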
I am really confused. I would like to change the axis labels of a plot (classification or uncertainty) for an 'Mclust' model object in R, and I don't understand why it works for a simple object with just two variables but not for one with several.
Here is an example:
require(mclust)
mod1 = Mclust(iris[,1:2])
plot(mod1, what = "uncertainty", dimens = c(1,2), xlab = "test")
# changed x-axis-label
mod2 = Mclust(iris[,1:4])
plot(mod2, what = "uncertainty", dimens = c(1,2), xlab = "test")
# no changed x-axis-label
Another way I tried was with coordProj:
coordProj(data= iris[, -5], dimens = c(1,2), parameters = mod2$parameters,
z = mod2$z, what = "uncertainty", xlab = "test")
# Error in plot.default(data[, 1], data[, 2], pch = 19, main = "", xlab = xlab, :
# formal argument "xlab" matched by multiple actual arguments
So I thought, maybe it will work with ggplot2 (and that would be my favourite option). Now I can change the axis labels and so on but I don't know how to plot the ellipses?
require(ggplot2)
ggplot(data = iris) +
geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = mod2$uncertainty)) +
scale_x_continuous(name = "test")
It would be nice, if someone might know a solution to change the axis labels in plot.Mclust or to add the ellipses to ggplot.
Thanks a lot!
I started to look at the code for plot.Mclust, but then I just used stat_ellipse and changed the level until the plots looked the same. It appears to be a joint t-distribution (the default) at 50% confidence (instead of the default 95%). There's probably a better way to do it using the actual covariance matrix (mod2$parameters$variance$sigma), but this gets you to where you want.
require(dplyr)
iris %>%
mutate(uncertainty = mod2$uncertainty,
classification = factor(mod2$classification)) %>%
ggplot(aes(Sepal.Length, Sepal.Width, size = uncertainty, colour = classification)) +
geom_point() +
guides(size = "none", colour = "none") + theme_classic() +
stat_ellipse(level = 0.5, type = "t") +
labs(x = "Label X", y = "Label Y")
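The answer above mentions that the ellipses could probably be drawn from the actual covariance matrix. A minimal sketch of that idea in base R follows. It assumes a Gaussian component and uses a chi-square quantile for the contour level; ellipse_points is a hypothetical helper, and whether plot.Mclust uses the same level is not confirmed here. For a fitted model, mu and Sigma would come from one component of mod2$parameters$mean and mod2$parameters$variance$sigma, restricted to the two plotted dimensions; the sample mean and covariance of iris stand in for them below.

```r
# Hypothetical helper: points on the ellipse that encloses probability
# `level` of a bivariate normal with mean mu and covariance Sigma
ellipse_points <- function(mu, Sigma, level = 0.5, n = 100) {
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))   # unit circle, 2 x n
  r <- sqrt(qchisq(level, df = 2))          # radius in Mahalanobis units
  L <- t(chol(Sigma))                       # lower triangular, L %*% t(L) == Sigma
  pts <- mu + r * (L %*% circle)            # mu is recycled down each column
  data.frame(x = pts[1, ], y = pts[2, ])
}

# Stand-in values: sample mean and covariance of two iris columns
mu    <- colMeans(iris[, 1:2])
Sigma <- cov(iris[, 1:2])
ell   <- ellipse_points(mu, Sigma, level = 0.5)

plot(iris$Sepal.Length, iris$Sepal.Width, xlab = "Label X", ylab = "Label Y")
lines(ell$x, ell$y, col = "red")
```

The resulting data frame could also be passed to geom_path() in the ggplot2 version instead of stat_ellipse().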
Consider the following script to plot an impulse response function:
library(vars)
Canada <- Canada * 999
var <- VAR(Canada, p = 2, type = "both")
plot(irf(var, impulse = "rw", response = "U", boot = T, cumulative = FALSE, n.ahead = 20))
plot(irf(var, impulse = "rw", response = "U", boot = T, cumulative = TRUE, n.ahead = 20))
I wonder how I could access the data of the plot (and 95% intervals)?
It would be great to produce a plot with a colour-filled confidence band, a green impulse response line and different axis labels. A solution with R's built-in plot features would be preferred over ggplot.
Thanks!
You can view the data returned by irf:
library("vars")
# generate some dummy data
df <- data.frame(n=rnorm(100), p=rpois(100, 2))
var <- VAR(df, p = 2, type = "both")
irf <- irf(var, impulse = "n", response = "p", boot = T,
cumulative = FALSE, n.ahead = 20)
# inspect coefficients object
str(irf)
All the data you need is accessible from here (e.g. check irf$Lower and irf$Upper).
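As a sketch of how those pieces fit together, the following collects the point estimate and the bootstrap bounds into one data frame, using the Canada data and the rw/U pair from the question. The names fit, res and band are chosen for the example; runs = 100 is the default number of bootstrap runs, written out explicitly.

```r
library(vars)
data(Canada)

fit <- VAR(Canada, p = 2, type = "both")

set.seed(1)  # the bootstrap bounds are random, so fix the seed
res <- irf(fit, impulse = "rw", response = "U",
           boot = TRUE, runs = 100, n.ahead = 20)

# res$irf, res$Lower and res$Upper are lists named by impulse;
# each element is a matrix with one column per response variable
band <- data.frame(
  step  = 0:20,
  irf   = res$irf$rw[, "U"],
  lower = res$Lower$rw[, "U"],
  upper = res$Upper$rw[, "U"]
)
head(band)
```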
One way to customise the default plot would be to look at the source of the function being called when you run plot(irf):
vars:::plot.varirf
In this case it's a bit involved but you can copy the body of this function and edit the code to change the colours, draw a filled polygon and edit the labels of the axes to get them exactly the way you want.
Updated:
Here's a starting point for the confidence bands:
# set up the base plot
plot(irf$irf$n, type="n", ylim = c(-.3, .5),
ylab = "Your label", xlab = "Another label")
abline(h=0)
# draw the filled polygon for confidence intervals
polygon(
c(1:length(irf$Upper$n), length(irf$Lower$n):1),
c(irf$Upper$n, rev(irf$Lower$n)),
col = "grey80", border = NA)
# add coefficient estimate line
lines(irf$irf$n, col = "darkgreen")
I had a similar problem, so I modeled it myself. I am not an advanced R user so maybe someone can put that into a function or so.
This method creates a plot of all IRFs, with a horizontal line at y=0, the names of the impulses on the x-axis and the responses on the y-axis. The IRF plots are also size-adjusted.
"VAR_BS_9016_5VAR" is my "varest" object. I used 5 variables but this method can easily be shortened or expanded.
par(mfrow=c(5,5), oma = c(0,0,0,0) + 0.1, mar = c(5,5,0,0) + 0.1)
for (i in 1:5){
for (j in 1:5){
var_plot=irf(VAR_BS_9016_5VAR, impulse = paste(colnames(VAR_BS_9016_5VAR$y)[i]), response=paste(colnames(VAR_BS_9016_5VAR$y)[j]), n.ahead = 20, ortho=TRUE, boot=TRUE, runs=1000, ci=0.9)
plot(x=c(1:21), y=unlist(var_plot$Lower), type="l", lwd = 3, lty=2,col="red", ylab=paste(colnames(VAR_BS_9016_5VAR$y)[j]), xlab=paste(var_plot$impulse), ylim=range(c(unlist(var_plot$Lower),unlist(var_plot$Upper))) )
lines(x=c(1:21),y=unlist(var_plot$Upper),type="l",lwd = 3, lty=2,col="red")
lines(x=c(1:21),y=unlist(var_plot$irf),type="l", lwd = 3)
abline(a = NULL, h = 0)
}
}
Here is my solution for obtaining a data frame that can be used in ggplot when you have multiple impulses and multiple responses.
For the pipe operator please get library(dplyr). Be careful since dplyr and MASS (dependency of vars-package) have naming conflicts (e.g., for "select"):
getIRFPlotData <- function(impulse, response, list) {
# list is the object returned by irf(): list[[1]] = irf, list[[2]] = Lower, list[[3]] = Upper
# dplyr::select(all_of(...)) replaces the deprecated dplyr::select_()
cbind.data.frame(Week = 0:(nrow(list[[1]][[1]])-1),
Lower = list[[2]][names(list[[2]]) == impulse][[1]] %>% as.data.frame() %>% dplyr::select(dplyr::all_of(response)) %>% dplyr::pull(1),
irf = list[[1]][names(list[[1]]) == impulse][[1]] %>% as.data.frame() %>% dplyr::select(dplyr::all_of(response)) %>% dplyr::pull(1),
Upper = list[[3]][names(list[[3]]) == impulse][[1]] %>% as.data.frame() %>% dplyr::select(dplyr::all_of(response)) %>% dplyr::pull(1),
Impulse = impulse,
Response = response, stringsAsFactors = FALSE)
}
This returns a data.frame with the columns Week, Lower, irf, Upper, Impulse and Response. With dplyr::bind_rows() you can stack the data frames for different impulse/response pairs on top of each other, and with ggplot2::facet_wrap() or facet_grid() you can produce charts similar to the ones produced by vars:::plot.varirf(), while staying fully flexible to add layers and work with the data.
getIRFPlotData("Spendings", "Returns", irf4c) %>% ggplot(.) + geom_line(aes(Week, Lower), linetype="dashed") + geom_line(aes(Week, irf)) + geom_line(aes(Week, Upper),linetype="dashed") + geom_ribbon(aes(Week, ymin=Lower, ymax=Upper), alpha = 0.3) + theme_minimal()