Plotting a regression line with lattice in R

I'm having a bit of trouble here; please help me.
I have this data
set.seed(4)
mydata <- data.frame(var = rnorm(100),
temp = rnorm(100),
subj = as.factor(rep(c(1:10),5)),
trt = rep(c("A","B"), 50))
and this model that fits them
lm <- lm(var ~ temp * subj, data = mydata)
I want to plot the results with lattice and draw the regression line, predicted with my model, through them. To do so, I'm using this approach, outlined in "Lattice Tricks for the power useR" by D. Sarkar:
temp_rng <- range(mydata$temp, finite = TRUE)
grid <- expand.grid(temp = do.breaks(temp_rng, 30),
subj = unique(mydata$subj),
trt = unique(mydata$trt))
model <- cbind(grid, var = predict(lm, newdata = grid))
orig <- mydata[c("var","temp","subj","trt")]
combined <- make.groups(original = orig, model = model)
xyplot(var ~ temp | subj,
data = combined,
groups = which,
type = c("p", "l"),
distribute.type = TRUE
)
So far everything is fine, but I also want to assign a fill color to the data points for the two treatments, trt = "A" and trt = "B".
So I have written this piece of code, which works fine, but when it comes to plotting the regression line, it seems that type is not recognized by the panel function...
my.fill <- c("black", "grey")
plot <- with(combined,
xyplot(var ~ temp | subj,
data = combined,
group = combined$which,
type = c("p", "l"),
distribute.type = TRUE,
panel = function(x, y, ..., subscripts){
fill <- my.fill[combined$trt[subscripts]]
panel.xyplot(x, y, pch = 21, fill = my.fill, col = "black")
},
key = list(space = "right",
text = list(c("trt1", "trt2"), cex = 0.8),
points = list(pch = c(21), fill = c("black", "grey")),
rep = FALSE)
)
)
plot
I've also tried moving type and distribute.type into panel.xyplot, as well as subsetting the data inside the panel function, like this:
plot <- with(combined,
xyplot(var ~ temp | subj,
data = combined,
panel = function(x, y, ..., subscripts){
fill <- my.fill[combined$trt[subscripts]]
panel.xyplot(x[combined$which=="original"], y[combined$which=="original"], pch = 21, fill = my.fill, col = "black")
panel.xyplot(x[combined$which=="model"], y[combined$which=="model"], type = "l", col = "black")
},
key = list(space = "right",
text = list(c("trt1", "trt2"), cex = 0.8),
points = list(pch = c(21), fill = c("black", "grey")),
rep = FALSE)
)
)
plot
but no success with that either.
Can anyone help me to get the predicted values plotted as a line instead of being points?

This might be a job for the latticeExtra package.
library(latticeExtra)
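p1 <- xyplot(var ~ temp | subj, data=orig, panel=function(..., subscripts) {
# trt may be a character column, so convert it to integer codes before indexing my.fill
fill <- my.fill[as.integer(factor(orig$trt))[subscripts]]
panel.xyplot(..., pch=21, fill=fill, col="black")
})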
p2 <- xyplot(var ~ temp | subj, data=model, type="l")
p1+p2
I'm not sure what's going on in your first attempt, but the one with the subscripts isn't working because x and y are a subset of the data for subj, so subsetting them using a vector based on combined won't work the way you think it will. Try this instead.
xyplot(var ~ temp | subj, groups=which, data = combined,
panel = function(x, y, groups, subscripts){
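# one fill colour per row of this panel (trt converted to integer codes first)
fill <- my.fill[as.integer(factor(combined$trt))[subscripts]]
g <- groups[subscripts]
panel.points(x[g=="original"], y[g=="original"], pch = 21,
fill = fill[g=="original"], col = "black")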
panel.lines(x[g=="model"], y[g=="model"], col = "black")
},
key = list(space = "right",
text = list(c("trt1", "trt2"), cex = 0.8),
points = list(pch = c(21), fill = c("black", "grey")),
rep = FALSE)
)
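To see why indexing x with a vector built from the whole data frame goes wrong, here is a small sketch on made-up toy data (the data frame `toy` and the list `seen` are hypothetical names, not part of the question). It records what each panel receives: x holds only that panel's rows, while subscripts holds their row numbers in the full data frame.

```r
library(lattice)

# Toy data (hypothetical): 3 panels with 4 rows each
toy <- data.frame(x = 1:12,
                  y = (1:12)^2,
                  g = factor(rep(c("a", "b", "c"), each = 4)))

seen <- list()  # filled in as each panel is drawn

p <- xyplot(y ~ x | g, data = toy,
            panel = function(x, y, subscripts, ...) {
              # record the number of points and the row numbers for this panel
              seen[[length(seen) + 1]] <<- list(n = length(x), rows = subscripts)
              panel.xyplot(x, y, ...)
            })

pdf(NULL)      # null device: nothing is written, but the panel function runs
print(p)
invisible(dev.off())

sapply(seen, function(s) s$n)  # points per panel, far fewer than nrow(toy)
seen[[2]]$rows                 # rows of `toy` that ended up in the second panel
```

A logical vector such as `combined$which == "original"` has one element per row of the full data, so it only lines up with x and y after being indexed by subscripts, which is what `groups[subscripts]` does above.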

This may be trivial, but you may try:
xyplot(... , type=c("p","l","r"))
"p" adds points, "l" joins them with line segments, and "r" fits a linear regression line through the data in each panel. type="r" alone will plot only the regression lines without showing the data points.

It might be easier to simply use the panel.lmline function on just your original data:
xyplot(var ~ temp | subj,
data = orig,
panel = function(x,y,...,subscripts){
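# trt may be a character column, so convert it to integer codes before indexing my.fill
fill <- my.fill[as.integer(factor(orig$trt))[subscripts]]
panel.xyplot(x, y, pch = 21, fill = fill, col = "black")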
panel.lmline(x,y,col = "salmon")
},
key = list(space = "right",
text = list(c("trt1", "trt2"), cex = 0.8),
points = list(pch = c(21), fill = c("black", "grey")),
rep = FALSE)
)
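Both type = "r" and panel.lmline fit an ordinary least-squares line separately within each panel. That matches the question's model only because var ~ temp * subj gives every subject its own intercept and slope and the plot is conditioned on subj. A minimal sketch that checks this on the question's data (the names `fit_all` and `fit_s3` are introduced here for illustration):

```r
library(lattice)

set.seed(4)
mydata <- data.frame(var  = rnorm(100),
                     temp = rnorm(100),
                     subj = as.factor(rep(1:10, 5)))

# Points plus a per-panel least-squares line
p <- xyplot(var ~ temp | subj, data = mydata, type = c("p", "r"))

# Full interaction model versus a separate regression for one subject
fit_all <- lm(var ~ temp * subj, data = mydata)
fit_s3  <- lm(var ~ temp, data = mydata, subset = subj == 3)

# Fitted values for subject 3 agree between the two fits
all.equal(unname(fitted(fit_all)[mydata$subj == 3]),
          unname(fitted(fit_s3)))
```

For a model that is not equivalent to per-panel regressions (for example one with a common slope), the per-panel line would differ from the model's prediction, and the predict-on-a-grid approach from the question would be needed instead.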

Related

How to plot two `ggscatter` correlation plots with confidence intervals on the same graph in R?

I am trying to show the Spearman correlation for two scatter plots, with confidence intervals, on the same graph, but it is proving difficult.
Given the following data:
spentWithTool <- sample(1:7, 20, replace = TRUE)
understoodWithTool <- sample(1:5, 20, replace = TRUE)
spentWithoutTool <- sample(1:4, 10, replace = TRUE)
understoodWithoutTool <- sample(1:5, 10, replace = TRUE)
This is the best workaround I came up with but it is not what I want - I want the withTool and the withoutTool plots to be on the same graph and NOT side by side.
plot_with <- ggscatter(data = data.frame(spentWithTool, understoodWithTool),
x = 'spentWithTool',
y = 'understoodWithTool',
color = 'darkred',
cor.method = "spearman",
cor.coef = TRUE,
conf.int = TRUE,
add = "reg.line",
add.params = list(color = 'red')
)
plot_without <- ggscatter(data = data.frame(spentWithoutTool, understoodWithoutTool),
x = 'spentWithoutTool',
y = 'understoodWithoutTool',
color = 'darkblue',
cor.method = "spearman",
cor.coef = TRUE,
conf.int = TRUE,
add = "reg.line",
add.params = list(color = 'blue')
)
ggarrange(plotlist = list(plot_with, plot_without))
The above gives me two plots side by side.
Using ggpar(p = list(plot_with, plot_without)) just created two separate graphs, so it was not helpful.
I am trying to get two lines (red and blue), where each line represents withTool and withoutTool respectively and the x and y axis represent timeSpent and understood respectively. Ideally, the confidence intervals, would be the same colour as what the line represents (red for withTool and blue for withoutTool).
Is there a way to get both correlation lines and points on the same graph?
Aside - Is there a way to use ggscatter without creating a data frame from vectors?
After checking the docs and trying several options using the color and ggp arguments of ggscatter, IMHO the easiest and least time-consuming way to achieve your desired result is to build the plot from scratch using ggplot2, with some support from ggpubr to add the correlation coefficients and the theme:
set.seed(1)
spentWithTool <- sample(1:7, 20, replace = TRUE)
understoodWithTool <- sample(1:5, 20, replace = TRUE)
spentWithoutTool <- sample(1:4, 10, replace = TRUE)
understoodWithoutTool <- sample(1:5, 10, replace = TRUE)
library(ggplot2)
library(ggpubr)
df <- rbind.data.frame(
data.frame(x = spentWithTool, y = understoodWithTool, id = "with"),
data.frame(x = spentWithoutTool, y = understoodWithoutTool, id = "without")
)
ggplot(df, aes(x, y, color = id, fill = id)) +
geom_point() +
geom_smooth(method = "lm") +
stat_cor(method = "spearman") +
scale_color_manual(values = c(with = "red", without = "blue"), aesthetics = c("color", "fill")) +
theme_pubr() +
labs(x = "timeSpent", y = "understood")
#> `geom_smooth()` using formula = 'y ~ x'

Is there a way to plot exponential decay curves with different decay constants on one graph?

I've been trying to plot different exponential decay curves onto one graph. Initially I thought this would be rather easy, but it is turning out to be rather frustrating.
What I want to get:
nlsplot(k_data_nls, model = 6, start = c(a= 603.3, b= -0.03812), xlab = "hours", ylab = "copies")
nlsplot(r4, model=6, start=c(a=25.5487,b=-0.5723), xlab = "hours", ylab = "copies")
Here is some additional code for the data:
df4 <- data.frame(hours=c(0,1,3,5,12,24,48,96,168,336,504,720), copies=c(603.3,406,588,393.27,458.47,501.67,767.53,444.13,340.6,298.47,61.42,51.6))
nlsfit(df4, model=6, start=c(a=603.3,b=-0.009955831526))
d4plot <- nlsplot(df4, model=6, start=c(a=603.3,b=-0.009955831526))
r4 <- data.frame(hours=c(0,1,3,5,12,24,48,96,168,336,504,720), copies=c(26,13.44,4.57,3.12,6.89,0.71,0.47,0.47,0,0,0.24,0.48))
nlsLM(copies ~ a*exp(b*hours), data=r4, start=list(a=26,b=-0.65986))
r4plot <- nlsplot(r4, model=6, start=c(a=25.5487,b=-0.5723))
Essentially I want to be able to get both of these plots on one graph. I'm new to R so I'm not too sure where I can go from here. Thank you !
I don't know if this is actually helpful because it's so specific, but this is how I would do it (with ggplot2). First, you need data for the function you want to plot. Take the x for all the values you want to display and apply your function with your coefficients to the data. You need to have data points, not just a function, to plot data.
df_simulated <- data.frame("x" = rep(1:100, 2),
"class"= rep(c("DNA", "RNA"), each = 100))
df_simulated$y <- c(1683.7 * exp(-0.103 * 1:100), # DNA
578.7455 * exp(-0.156 * 1:100)) # RNA
However, since I never used the packages you used, I don't know how to extract the values from the models, so I took the values in your example plot. It's important that the "simulated" values for both groups are within one dataframe, and that you have a column which attributes the points to the respective group (RNA or DNA). At least it's easier if you do it like this. Then you need a data frame with your actual observations for the dots. I invented data again:
df_observed <- data.frame("x" = c(12, 13, 25, 26, 50, 51),
"y" = c(500, 50, 250, 25, 0, 5),
"class" = rep(c("DNA", "RNA"), 3))
Then you can create the plot. With color=class you specify that the data points will be grouped by "class" and will be colored accordingly. ("apple" and "banana" are just dummy words to demonstrate linebreaks)
ggplot() +
geom_line(data = df_simulated, aes(x = x, y = y, color = class), size = 1, linetype = "dashed") +
geom_point(data = df_observed, aes(x = x, y = y, color = class), size = 4, pch = 1) +
annotate("text", x = 50, y = 1250, label = "DNA\napple", color = "tomato", hjust = 0) +
annotate("text", x = 50, y = 750, label ="RNA\nbanana", color = "steelblue", hjust = 0) +
ggtitle(expression(~italic("Styela clava")~"(isolated)")) +
ylab("COI copies per 1ml") +
xlab("Time since removal of organisms (hours)") +
theme_classic() +
theme(legend.position = "none") +
scale_color_manual(values = c("DNA" = "tomato", "RNA" = "steelblue"))
This is the output:
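If the coefficients are not known in advance, one possible way to get the curve values is to fit the model with base R's nls() and call predict() on a fine grid of x values. The sketch below uses the r4 data and the starting values from the question. It swaps in nls() for the nlsfit/nlsLM calls used there, and `curve_df` is a name introduced here.

```r
# Observed data from the question
r4 <- data.frame(hours  = c(0, 1, 3, 5, 12, 24, 48, 96, 168, 336, 504, 720),
                 copies = c(26, 13.44, 4.57, 3.12, 6.89, 0.71, 0.47, 0.47,
                            0, 0, 0.24, 0.48))

# Exponential decay fit; starting values taken from the question
fit <- nls(copies ~ a * exp(b * hours), data = r4,
           start = list(a = 26, b = -0.65986))

coef(fit)  # estimated a and b

# Evaluate the fitted curve on a fine grid so it draws as a smooth line
curve_df <- data.frame(hours = seq(0, 720, length.out = 200))
curve_df$copies <- predict(fit, newdata = curve_df)
head(curve_df)
```

Repeating this for each dataset and stacking the results with a class column gives a data frame with the same layout as df_simulated above.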
First note that R squared is normally used for linear models and not for nonlinear models so the use of this statistic is suspect here; however, below we show it anyways since it seems that is what was asked for. A different goodness of fit measurement that is often used is residual standard error. If fm is the fitted model from nls then sigma(fm) is the residual standard error. Smaller values are more favorable. summary(fm) also reports this value.
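As a small illustration of the residual standard error mentioned above, the sketch below fits df4 with nls() (starting values from a log-linear lm fit) and computes the value three ways. The names `co0` and `fm` are used only in this sketch.

```r
df4 <- data.frame(hours  = c(0, 1, 3, 5, 12, 24, 48, 96, 168, 336, 504, 720),
                  copies = c(603.3, 406, 588, 393.27, 458.47, 501.67, 767.53,
                             444.13, 340.6, 298.47, 61.42, 51.6))

# Starting values from a linear fit on the log scale, then the nonlinear fit
co0 <- coef(lm(log(copies) ~ hours, data = df4))
fm  <- nls(copies ~ a * exp(b * hours), data = df4,
           start = list(a = exp(co0[[1]]), b = co0[[2]]))

sigma(fm)                              # residual standard error
sqrt(deviance(fm) / df.residual(fm))   # the same quantity computed by hand
summary(fm)$sigma                      # as reported by summary()
```

The value is on the scale of the response (copies), so it is only comparable between fits to the same response variable.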
For each of df4 and r4 we use lm to get starting values (taking log of both sides we get a model that is linear in log(a) and b), run nls fits and get the coefficients.
Now plot the points and add the fitted lines and legend. (Note that in setting up the graph we use rbind which assumes that df4 and r4 have the same column names, which they do.)
Note that the data provided in the question is much different than that shown in the question's image.
The code below does not need starting values since it uses lm to get them, runs nls and automatically extracts whatever information is needed for the graph.
1) Classic graphics In this alternative no packages are used.
r2 <- function(fm, digits = 3) {
y <- fitted(fm) + resid(fm)
r2 <- 1 - deviance(fm) / sum((y - mean(y))^2)
if (is.numeric(digits)) r2 <- round(r2, digits)
r2
}
fo <- copies ~ a * exp(b * hours) # formula used in nls
# get nls fitted model and coefficients for df4
co_d0 <- coef(lm(log(copies) ~ hours, df4, subset = copies > 0))
fmd <- nls(fo, df4, start = list(a = exp(co_d0[[1]]), b = co_d0[[2]]))
co_d <- round(coef(fmd), 4)
# get nls fitted model and coefficients for r4
co_r0 <- coef(lm(log(copies) ~ hours, r4, subset = copies > 0))
fmr <- nls(fo, r4, start = list(a = exp(co_r0[[1]]), b = co_r0[[2]]))
co_r <- round(coef(fmr), 4)
both <- rbind(cbind(df4, col = "red"), cbind(r4, col = "blue"))
plot(both[1:2], col = both$col,
xlab = "Time since removal of organisms", ylab = "COI copies per 1ml",
main = "C)" ~ italic("Styela clava") ~ "(isolated)", adj = 0)
lines(fitted(fmd) ~ hours, df4, col = "red", lty = 2)
lines(fitted(fmr) ~ hours, r4, col = "blue", lty = 2)
legend <- c(bquote(DNA),
bquote(y == .(co_d[[1]]) * e ^ {.(co_d[[2]])*x}),
bquote(R^2 == .(r2(fmd))),
bquote(),
bquote(RNA),
bquote(y == .(co_r[[1]]) * e ^ {.(co_r[[2]])*x}),
bquote(R^2 == .(r2(fmr))))
legend("right", legend = as.expression(legend), bty = "n",
text.col = c("red", "red", "red", NA, "blue", "blue", "blue"))
2) ggplot2 This uses ggplot2 and gridtext. r2, fmd, fmr, co_d and co_r are all taken from (1). We use richtext_grob from gridtext to create a custom grob for the legend and pass it using annotation_custom.
library(gridtext)
library(ggplot2)
txt <- sprintf(
"<span style='color:red'>DNA
<br>y = %.3f*e<sup>%.3fx</sup>
<br>R<sup>2</sup> = %.3f</span>
<br><br><span style='color:blue'>RNA
<br>y = %.3f*e<sup>%.3fx</sup>
<br>R<sup>2</sup> = %.3f</span>",
co_d[[1]], co_d[[2]], r2(fmd), co_r[[1]], co_r[[2]], r2(fmr))
both2 <- rbind(cbind(df4, col = "red", fitted = fitted(fmd)),
cbind(r4, col = "blue", fitted = fitted(fmr)))
ggplot(both2, aes(hours, copies, col = I(col))) +
geom_point() +
geom_line(aes(y = fitted), linetype = 2) +
annotation_custom(richtext_grob(txt, hjust = 0)) +
theme(legend.position = "none") +
labs(x = "Time since removal of organisms", y = "COI copies per 1ml") +
ggtitle(("C)" ~ italic("Styela clava") ~ "(isolated)"))
3) lattice
This uses legend from (1) and both2 from (2). First create a plot for the data points. It will also contain the legend, axes and labels. Then add a layer for the fitted lines. main.settings specifies that the main title should be left justified and bold and is adapted from this page.
library(latticeExtra)
main.settings <- list(par.main.text = list(font = 2, just = "left",
x = grid::unit(25, "mm")))
xyplot(copies ~ hours, both2, col = both2$col,
xlab = "Time since removal of organisms", ylab = "COI copies per 1ml",
main = "C)" ~ italic("Styela clava") ~ "(isolated)", adj = 0,
key = list(text = list(as.expression(legend),
col = c("red", "red", "red", NA, "blue", "blue", "blue")),
x = 0.65, y = 0.65, columns = 1),
par.settings = main.settings) +
as.layer(xyplot(fitted ~ hours, both2, groups = col, type = "l", lty = 2))

combining 2 dotplots with different y axis leads to overlap on x axis

I am trying to combine 2 dotplots using lattice and latticeExtra packages but am finding that the data groups on the x axis overlap in the combined plot. Here is a reproducible example:
First I create 2 reproducible data sets and melt them so that they are long instead of wide:
require(lattice)
require(reshape2) # provides melt()
df1 <- data.frame(Treatment = rep(c("B", "C"), each = 6),
LocB = sample(1:100, 12),
LocC = sample(1:100, 12))
dftwo <- data.frame(Treatment = rep(c("A"), each = 6),
LocA = sample(1:100, 6))
dat.reprod1 <- melt(df1, id.vars = 'Treatment')
dat.reprod2 <- melt(dftwo, id.vars = 'Treatment')
And then I create a dotplot for each dataset:
dotreprod1 <- dotplot(value ~ Treatment, data = dat.reprod1,
par.strip.text = list(cex = 3),
cex = 2)
dotreprod2 <- dotplot(value ~ Treatment, data = dat.reprod2,
par.strip.text = list(cex = 3), col = "orange",
cex = 2)
And then I combine them, adding a new Y axis for dotreprod2:
require(latticeExtra)
doubleYScale(dotreprod1, dotreprod2, add.ylab2 = TRUE, use.style = FALSE)
Unfortunately there is no room on the x axis of the combined plot for "A" and so the orange points overlap with the blue ones. Is it possible to create space on the X axis so that "A","B", and "C" are next to one another and the points do not overlap?
In both individual plots, specify the x variable as a factor with levels of the combined data, and set drop.unused.levels = FALSE
dotreprod1 <- dotplot(value ~ factor(Treatment, levels = LETTERS[1:3]),
data = dat.reprod1,
drop.unused.levels = FALSE)
dotreprod2 <- dotplot(value ~ factor(Treatment, levels = LETTERS[1:3]),
data = dat.reprod2,
col = "orange",
drop.unused.levels = FALSE)
doubleYScale(dotreprod1, dotreprod2, add.ylab2 = TRUE, use.style = FALSE)
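The overlap comes from how factor levels map to axis positions: a factor built from one dataset only knows the levels present in that dataset. A short base R sketch of the difference (the vector names are made up for illustration):

```r
# Treatment columns as they appear in the two separate datasets
trt_two <- c("A", "A", "A")
trt_one <- c("B", "B", "C")

# Levels inferred from each dataset on its own
as.integer(factor(trt_two))   # "A" is the first level here
as.integer(factor(trt_one))   # "B" is also the first level here

# Levels fixed in advance from the combined data
f_two <- factor(trt_two, levels = LETTERS[1:3])
f_one <- factor(trt_one, levels = LETTERS[1:3])
as.integer(f_two)
as.integer(f_one)

levels(f_two)   # all three levels are kept even though only "A" occurs
table(f_two)    # counts for B and C are zero
```

With inferred levels, "A" in one plot and "B" in the other share the first axis position, which is why the points overlap. drop.unused.levels = FALSE then stops lattice from discarding the empty levels again.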

How to get ggplot2 geom_contour to replicate base graphics contour

I am using a copula to look at the probability of occurrence of events based on their duration and magnitude. I can create contours for recurrence intervals with observed and simulated data in base R graphics, but I can't figure out how to reproduce them in ggplot2. Why not just produce the graphs in base graphics and move on, you may be wondering? Because I'm including the graphs in a short summary report and want them to be consistent with numerous other graphs in the report. Below is some example code. I know that using the location, scale, and shape parameters of the GEV distribution to generate random deviates is not ideal, but it was the best way I could think of to create a somewhat reproducible example, despite the poor correlation at the end. In base R, the contours are generated from a matrix of simulated data. Is this possible in ggplot2?
library(evd)
library(copula)
dur <- rgev(500, 2.854659, 2.170122, -0.007829)
mag <- rgev(500, 0.02482, 0.01996, 0.04603)
fDurGev <- fgev(dur)
fMagGev <- fgev(mag)
durVec <- dgev(dur, fDurGev[[1]][1], fDurGev[[1]][2], fDurGev[[1]][3])
magVec <- dgev(mag, fMagGev[[1]][1], fMagGev[[1]][2], fMagGev[[1]][3])
durMagMat <- as.matrix(cbind(duration = durVec, magnitude = magVec))
theta <- coef(fitCopula(claytonCopula(dim = 2), durMagMat, method = "itau"))
clayCop <- claytonCopula(theta, dim = 2)
fCopDurMag <- pCopula(durMagMat, clayCop)
copPts <- data.frame(duration = dur, magnitude = mag, copNEP = fCopDurMag,
copEP = (1 - fCopDurMag), copRI = (1 / fCopDurMag))
fSim <- seq(0.05, 0.99998, length.out = 1000)
quaDur <- qgev(fSim, fDurGev[[1]][1], fDurGev[[1]][2], fDurGev[[1]][3])
quaMag <- qgev(fSim, fMagGev[[1]][1], fMagGev[[1]][2], fMagGev[[1]][3])
expDurMagMat <- cbind(expand.grid(fSim, fSim)$Var1, expand.grid(fSim,
fSim)$Var2)
simPred <- pCopula(expDurMagMat, clayCop)
simPredMat <- matrix(simPred, 1000, 1000)
simDF <- data.frame(simDur = quaDur, simMag = quaMag, simPredMat)
rndPred <- data.frame(rCopula(5000, clayCop))
rndPred$rndDur <- qgev(rndPred[,1], fDurGev[[1]][1], fDurGev[[1]][2],
fDurGev[[1]][3])
rndPred$rndMag <- qgev(rndPred[,2], fMagGev[[1]][1], fMagGev[[1]][2],
fMagGev[[1]][3])
RI <- c(1.25, 2 ,5, 10, 20, 50, 100, 200, 500)
NEP <- 1 - (1 / RI)
plot(rndPred$rndDur, rndPred$rndMag, col = "light grey", cex = 0.5, xlab =
"Duration (time)", ylab = "Magnitude (x)")
points(copPts[,1], copPts[,2], col = "red", cex = 0.5)
contour(simDF$simDur, simDF$simMag, simPredMat, levels = NEP, labels = RI,
xaxs = 'i', yaxs = 'i', labcex = 0.6, lwd = 1, col = "black", add =
TRUE, method = "flattest", vfont = c("sans serif", "plain"))
And now for my attempt to recreate in ggplot2 (which fails to draw contours).
library(dplyr)
simDF <- data.frame(dur = expDurMagMat[, 1], mag = expDurMagMat[, 2], NEP = simPred)
simDF <- simDF %>%
dplyr::mutate(quaDur = qgev(NEP, fDurGev[[1]][1], fDurGev[[1]][2], fDurGev[[1]][3])) %>%
dplyr::mutate(quaMag = qgev(NEP, fMagGev[[1]][1], fMagGev[[1]][2], fMagGev[[1]][3]))
library(ggplot2)
ggplot(data = rndPred, aes(x = rndDur, y = rndMag)) +
geom_point(color = "light grey", alpha = 0.5) +
labs(x = "Duration (time)", y = "Magnitude (x)") +
geom_point(data = copPts, aes(x = duration, y = magnitude),
color = "red") +
geom_contour(data = simDF, aes(x = quaDur, y = quaMag, z = NEP),
inherit.aes = FALSE, breaks = NEP) +
theme_classic()
Thank you to anyone who can help.
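One difference that may matter here: base contour() takes two axis vectors plus a matrix, while geom_contour() expects long-format data with one row per cell of a rectangular x/y grid. The sketch below shows only that conversion, using the built-in volcano matrix as a stand-in for the simulated matrix. It is not the copula output, and the names `z_mat` and `grid_df` are hypothetical.

```r
library(ggplot2)

# Stand-in matrix (built-in dataset), with one axis value per row and per column
z_mat  <- volcano
x_vals <- seq_len(nrow(z_mat))
y_vals <- seq_len(ncol(z_mat))

# Long format: expand.grid() varies x fastest, which matches the
# column-major order produced by as.vector() on the matrix
grid_df   <- expand.grid(x = x_vals, y = y_vals)
grid_df$z <- as.vector(z_mat)

p <- ggplot(grid_df, aes(x = x, y = y, z = z)) +
  geom_contour(breaks = c(100, 120, 140, 160, 180), colour = "black") +
  theme_classic()
p
```

In the attempt above, quaDur and quaMag are computed from NEP itself, so the x/y pairs no longer form a grid. In the question's terms, the analogous layout would presumably pair expand.grid(quaDur, quaMag) from the base R code with simPred, though that is an untested assumption.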

Plotting points and lines separately in R with ggplot

I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_line() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
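library(ggplot2)
library(reshape2) # melt()
library(dplyr) # filter()
library(grid) # pushViewport(), viewport(), grid.layout()
xseq <- 1:100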
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for the first item and "blank" for others (no line).
Similarly for first item we set no shape (NA) and for others we will set whatever shape we need (I just put 7 and 8 there for an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you to choose correct shapes for your needs.
If you are happy with dots then you can use my_shapes = c(NA,16,16) and scale_shape_manual(...) is not needed.
my_shapes = c(NA,7,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("solid", "blank","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines

Resources