plotting nls fits with overlapping prediction intervals in a single figure

Say I have some data, d, and I fit nls models to two subsets of the data.
x<- seq(0,4,0.1)
y1<- (x*2 / (0.2 + x))
y1<- y1+rnorm(length(y1),0,0.2)
y2<- (x*3 / (0.2 + x))
y2<- y2+rnorm(length(y2),0,0.4)
d<-data.frame(x,y1,y2)
m.y1<-nls(y1~v*x/(k+x),start=list(v=1.9,k=0.19),data=d)
m.y2<-nls(y2~v*x/(k+x),start=list(v=2.9,k=0.19),data=d)
I then want to plot the fitted regression line over the data and shade the prediction interval. I can do this with the package investr and get nice plots for each subset individually:
require(investr)
plotFit(m.y1,interval="prediction",ylim=c(0,3.5),pch=19,col.pred='light blue',shade=T)
plotFit(m.y2,interval="prediction",ylim=c(0,3.5),pch=19,col.pred='pink',shade=T)
However, if I plot them together I have a problem. The shading of the second plot covers the points and shading of the first plot:
1: How can I make sure the points on the first plot end up on top of the shading of the second plot?
2: How can I make the region where the shaded prediction intervals overlap a new color (like purple, or any fusion of the two colors that are overlapping)?

Use adjustcolor to add transparency like this:
plotFit(m.y1, interval = "prediction", ylim = c(0,3.5), pch = 19,
col.pred = adjustcolor("lightblue", 0.5), shade = TRUE)
par(new = TRUE)
plotFit(m.y2, interval = "prediction", ylim = c(0,3.5), pch = 19,
col.pred = adjustcolor("light pink", 0.5), shade = TRUE)
Depending on what you want you can play around with the two transparency values (here both set to 0.5) and possibly make only one of them transparent.
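For the first part of the question (keeping the points visible), one option is to draw the points last, after both shaded regions. The following is only a base-graphics sketch of that idea: the bands are made-up constant-width bands rather than real prediction intervals, and the names blue_fill and pink_fill are introduced here for illustration.

```r
# Semi-transparent fills; where the two overlap, the colours blend
blue_fill <- adjustcolor("lightblue", alpha.f = 0.5)
pink_fill <- adjustcolor("lightpink", alpha.f = 0.5)

# Simulated curves and points in the spirit of the question
set.seed(1)
x  <- seq(0, 4, 0.1)
f1 <- 2 * x / (0.2 + x)
f2 <- 3 * x / (0.2 + x)
y1 <- f1 + rnorm(length(x), 0, 0.2)
y2 <- f2 + rnorm(length(x), 0, 0.4)

# Empty plot first, then shading, then lines, then points on top
plot(x, y1, type = "n", ylim = c(-0.5, 4.5), xlab = "x", ylab = "y")
# Hypothetical bands of fixed half-width (NOT real prediction intervals)
polygon(c(x, rev(x)), c(f1 - 0.4, rev(f1 + 0.4)), col = blue_fill, border = NA)
polygon(c(x, rev(x)), c(f2 - 0.8, rev(f2 + 0.8)), col = pink_fill, border = NA)
lines(x, f1)
lines(x, f2)
# Points are drawn last so that no shading covers them
points(x, y1, pch = 19, col = "blue")
points(x, y2, pch = 19, col = "red")
```

With the plotFit approach, the same effect could presumably be had by calling points() for the first data set again after the second plotFit call.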

Related

R - update boxplot axis range after adding points

I have a boxplot which summarizes ~60000 turbidity data points into quartiles, median, whiskers and sometimes outliers. Often a few outliers are so high up that the whole plot is compressed at the bottom, and I therefore choose to omit the outliers. However, I have also added averages to the plots as points, and I want these to always be plotted. The problem is that the y-axis of the boxplot does not adjust to the added average points, so when averages are far above the box they are simply plotted outside the chart window (see X-point for 2020, but none for 2021 or 2022). Normally with this parameter, the average will be between the whisker end and the most extreme outliers. This is normal, and expected in the data.
I have tried to capture the boxplot y-axis range to compare with the average, and then setting the ylim if needed, but I just don't know how to retrieve these axis ranges.
My code is just
boxplot(...)
points(...)
and works as far as plotting the points. Just not adjusting the y-axis.
Question 1: is it not possible to get the boxplot to redraw with the new points data? I thought this was standard in R plots.
Question 2: if not, how can I dynamically adjust the y-axis range?

Let's try to show a concrete example of the problem with some simulated data:
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
Here, group "B" has a single outlier at 150, even though most values are a couple of orders of magnitude lower. That means that if we try to draw a boxplot, the boxes get squished at the bottom of the plot:
boxplot(y ~ x, data = df, col = "lightblue")
If we remove outliers, the boxes plot nicely:
boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)
The problem comes when we want to add a point indicating the mean value for each boxplot, since the mean of "B" lies outside the plot limits. Let's calculate and plot the means:
mean_vals <- sapply(split(df$y, df$x), mean)
mean_vals
#> A B
#> 0.9840417 4.0703334
boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)
points(1:2, mean_vals, cex = 2, pch = 16, col = "red")
The mean for "B" is missing because it lies above the upper range of the plot.
The secret here is to use boxplot.stats to get the limits of the whiskers. By concatenating our vector of means to this vector of stats and getting its range, we can set our plot limits exactly where they need to be:
y_limits <- range(c(boxplot.stats(df$y)$stats, mean_vals))
Now we apply these limits to a new boxplot and draw it with the points:
boxplot(y ~ x, data = df, outline = FALSE, ylim = y_limits, col = "lightblue")
points(1:2, mean_vals, cex = 2, pch = 16, col = "red")
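One caveat: boxplot.stats(df$y) is computed on the pooled data, so its whisker ends need not match the per-group whiskers that boxplot actually draws. A possible variant, sketched on the same simulated data, takes the per-group statistics from boxplot(..., plot = FALSE) instead. The call to par("usr") is included only to show how the axis limits in use can be read back after plotting.

```r
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
mean_vals <- sapply(split(df$y, df$x), mean)

# Compute the per-group five-number summaries without drawing anything
bp <- boxplot(y ~ x, data = df, plot = FALSE)

# bp$stats has one column per group (lower whisker, hinges, median, upper whisker)
y_limits <- range(c(bp$stats, mean_vals))

boxplot(y ~ x, data = df, outline = FALSE, ylim = y_limits, col = "lightblue")
points(seq_along(mean_vals), mean_vals, cex = 2, pch = 16, col = "red")

# Axis limits of the current plot: c(xmin, xmax, ymin, ymax)
usr <- par("usr")
usr
```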
For comparison, you could do the whole thing in ggplot like this:
library(ggplot2)
ggplot(df, aes(x, y)) +
  geom_boxplot(fill = "lightblue", outlier.shape = NA) +
  geom_point(size = 3, color = "red", stat = "summary", fun = mean) +
  coord_cartesian(ylim = range(c(boxplot.stats(df$y)$stats, mean_vals))) +
  theme_classic(base_size = 16)
Created on 2023-02-05 with reprex v2.0.2

Adding a 95% confidence interval to NMDS plot

I am trying to plot an NMDS plot of species community composition data with ellipses which represent 95% confidence intervals. I generated the data for my NMDS plot using metaMDS and successfully have ordinations generated using the basic plot functions in R (see code below). However, I am struggling to get my data to plot successfully using ggplot2 and this is the only way I have seen 95% CIs plotted on NMDS plots. I am hoping someone is able to help me correct my code so the ellipses show 95% CIs, or could point me in the right direction for achieving this using other methods?
My basic code for plotting my NMDS plot:
orditorp(dung.families.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(group.variables$Heating)], col = c("green", "blue", "orange", "black") [as.numeric(group.variables$Dungfauna)], cex = 1.3)
ordiellipse(dung.families.mds, groups = group.variables$Dungfauna, draw = "polygon", lty = 1, col = "grey90")
legend("topleft", "stress = 0.1329627", bty = "n", cex = 1)
My ordination:

I realize this question is old, but I found this post useful for plotting confidence ellipses during my work, and maybe it will help you. Plotting ordiellipse function from vegan package onto NMDS plot created in ggplot2
Edit: Below I have copied the code from the second part of Didzis Elferts's answer on the link above.
Where "sol" is the metaMDS object:
First, make NMDS data frame with group column.
NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = sol$points[,2],group=MyMeta$amt)
Next, save result of function ordiellipse() as some object.
ord<-ordiellipse(sol, MyMeta$amt, display = "sites", kind = "se", conf = 0.95, label = T)
Data frame df_ell contains values to show ellipses. It is calculated again with function veganCovEllipse, which is hidden (not exported) in the vegan package, so it has to be called as vegan:::veganCovEllipse. This function is applied to each level of NMDS (group) and now it uses arguments stored in ord object - cov, center and scale of each level.
df_ell <- data.frame()
for(g in levels(NMDS$group)){
  df_ell <- rbind(df_ell, cbind(as.data.frame(with(NMDS[NMDS$group==g,],
                  vegan:::veganCovEllipse(ord[[g]]$cov,ord[[g]]$center,ord[[g]]$scale)))
                  ,group=g))
}
Plotting is done the same way as in the previous example. Because the ellipse coordinates are calculated from the object returned by ordiellipse(), this solution will work with whatever parameters you provide to that function.
ggplot(data = NMDS, aes(MDS1, MDS2)) + geom_point(aes(color = group)) +
geom_path(data=df_ell, aes(x=NMDS1, y=NMDS2,colour=group), size=1, linetype=2)
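To make the ellipse step less of a black box, here is a base-R sketch of the kind of calculation involved. It is not vegan's code: cov_ellipse is a hypothetical helper written for this illustration, the site scores are simulated, and the assumption (as I understand kind = "se" with conf = 0.95) is that the ellipse uses the covariance of the group centroid, scaled by sqrt(qchisq(0.95, 2)).

```r
# Hypothetical helper: outline of the ellipse for a 2 x 2 covariance matrix
cov_ellipse <- function(cov, center = c(0, 0), scale = 1, npoints = 100) {
  theta  <- seq(0, 2 * pi, length.out = npoints + 1)
  circle <- cbind(cos(theta), sin(theta))   # points on the unit circle
  # Stretch the circle by the Cholesky factor, scale it, shift to the centre
  t(center + scale * t(circle %*% chol(cov)))
}

set.seed(1)
# Simulated stand-in for the NMDS site scores of one group
pts <- cbind(NMDS1 = rnorm(30), NMDS2 = rnorm(30, sd = 0.5))

# Covariance of the centroid (standard error), assumed here as cov / n
se_cov <- cov(pts) / nrow(pts)
ctr    <- colMeans(pts)

# 95% confidence ellipse for the centroid
ell <- cov_ellipse(se_cov, ctr, scale = sqrt(qchisq(0.95, df = 2)))

plot(pts, asp = 1, pch = 16)
lines(ell, lty = 2)
```

Every point of ell lies at the same Mahalanobis distance from the centroid, which is what makes it a confidence contour under a bivariate normal assumption.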

Formatting histograms in R

I'm trying to fit a Variance-Gamma distribution to empirical data of 1-minute logarithmic returns. To visualize the results, I plotted 2 histograms together: empirical and theoretical.
(a is the vector of empirical data)
SP_hist <- hist(a,
col = "lightblue",
freq = FALSE,
breaks = seq(min(a), max(a), length.out = 141),
border = "white",
main = "",
xlab = "Value",
xlim = c(-0.001, 0.001))
hist(VG_sim_rescaled,
freq = FALSE,
breaks = seq(min(VG_sim_rescaled), max(VG_sim_rescaled), length.out = 141),
xlab = "Value",
main = "",
col = "orange",
add = TRUE)
(empirical histogram-blue, theoretical histogram-orange)
However, after having plotted 2 histograms together, I started wondering about 2 things:
1. In both histograms I set freq = FALSE. Therefore, the y-axis should be in the range (0, 1). In the actual picture, values on the y-axis exceed 3,000. How could this happen, and how can I fix it?
2. I need to change the bucket size (the width of the buckets) and the density per unit length of the x-axis. How can I do this?
Thank you for your help.

freq=FALSE means that the area of the entire histogram is normalized to one. As your x-axis has a very small range (about 10^(-4)), the y-values must be quite large to achieve an area (= x times y) of one.
The only reliable way to set the number of bins is by providing a vector of break points to the parameter breaks. This parameter also accepts a single number, but hist treats it only as a suggestion (the break points are set to pretty values), so the resulting number of bins can differ. Thus try the following:
bins <- 6 # number of cells
breaks <- seq(min(x), max(x), length.out = bins + 1)
hist(x, freq=FALSE, breaks=breaks)
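Both points can be checked numerically. The sketch below uses simulated values as a stand-in for the returns (the standard deviation of 2e-4 is an arbitrary choice for illustration) and confirms that the bar areas sum to one even though the bar heights are far above one.

```r
set.seed(42)
x <- rnorm(1000, mean = 0, sd = 2e-4)   # simulated stand-in for the returns

bins   <- 6                                          # number of cells wanted
breaks <- seq(min(x), max(x), length.out = bins + 1) # bins + 1 break points

h <- hist(x, freq = FALSE, breaks = breaks)

# Area of each bar = density * bin width; the areas sum to 1
total_area <- sum(h$density * diff(h$breaks))
total_area

# The heights themselves are large because the bins are very narrow
max(h$density)

# Explicit break points give exactly the requested number of cells
length(h$counts)
```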

How to overlay density histogram with gamma distribution fit in R?

I am new to R and would like to add a fit to a gamma distribution to my histogram. I would like the gamma distribution fit to overlay my histogram.
I am able to calculate the gamma distribution with the dgamma function and also with the fitdist function. However, I am not able to overlay this gamma distribution as a fit onto my histogram.
This is the code I tried:
hist(mydata, breaks = 30, freq = FALSE, col = "grey")
lines(dgamma(mydata, shape = 1))
The code I tried does not overlay the gamma distribution fit onto my histogram. I only get the histogram without the fit.

See if the following example can help in overlaying
a fitted line in black
a PDF graph in red, dotted
on a histogram.
First, create a dataset.
set.seed(1234) # Make the example reproducible
mydata <- rgamma(100, shape = 1, rate = 1)
Now fit a gamma distribution to the data.
param <- MASS::fitdistr(mydata, "gamma")
This vector is needed for the fitted line.
x <- seq(min(mydata), max(mydata), length.out = 100)
And plot them all.
hist(mydata, breaks = 30, freq = FALSE, col = "grey", ylim = c(0, 1))
curve(dgamma(x, shape = param$estimate[1], rate = param$estimate[2]), add = TRUE)
lines(sort(mydata), dgamma(sort(mydata), shape = 1),
col = "red", lty = "dotted")

Shade area of full confidence interval (base graphics)

I am using the following code in R to plot a linear regression with confidence interval bands (95%) around the regression line.
Average <- c(0.298,0.783429,0.2295,0.3725,0.598,0.892,2.4816,2.79975,
1.716368,0.4845,0.974133,0.824,0.936846,1.54905,0.8166,1.83535,
1.6902,1.292667,0.2325,0.801,0.516,2.06645,2.64965,2.04785,0.55075,
0.698615,1.285,2.224118,2.8576,2.42905,1.138143,1.94225,2.467357,0.6615,
0.75,0.547,0.4518,0.8002,0.5936,0.804,0.7,0.6415,0.702182,0.7662,0.847)
Area <-c(8.605,16.079,4.17,5.985,12.419,10.062,50.271,61.69,30.262,11.832,25.099,
8.594,17.786,36.995,7.473,33.531,30.97,30.894,4.894,8.572,5.716,45.5,69.431,
40.736,8.613,14.829,4.963,33.159,66.32,37.513,27.302,47.828,39.286,9.244,19.484,
11.877,9.73,11.542,12.603,9.988,7.737,9.298,14.918,17.632,15)
lm.out <- lm (Area ~ Average)
newx = seq(min(Average), by = 0.05)
conf_interval <- predict(lm.out, newdata = data.frame(Average = newx), interval ="confidence",
level = 0.95)
plot(Average, Area, xlab ="Average", ylab = "Area", main = "Regression")
abline(lm.out, col = "lightblue")
lines(newx, conf_interval[,2], col = "blue", lty ="dashed")
lines(newx, conf_interval[,3], col = "blue", lty ="dashed")
I am stuck because the graph I got shows the bands only for the first part of the line, leaving out the rest of it (you find the link to the image at the bottom of the message). What is going wrong? I would also like to shade the area of the confidence interval (not just the lines corresponding to the limits) but I can't understand how to do it.
Any help would be really appreciated, I am completely new in R.

This is very easy with the ggplot2 package. Here is the code:
library(ggplot2)
data = data.frame(Average, Area)
ggplot(data=data, aes(x=Average, y=Area))+
  geom_smooth(method="lm", level=0.95)+
  geom_point()
Code to install the library:
install.packages("ggplot2")
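If you would rather stay in base graphics, as the question title asks, the following sketch may help. The bands in the question stop early because seq(min(Average), by = 0.05) is given no upper limit, so the to argument falls back to its default of 1; passing max(Average) as the end point covers the whole range. The data here are simulated stand-ins (not the values from the question), and polygon() is used for the shading.

```r
set.seed(1)
# Simulated stand-in data, roughly on the scale of the question's variables
Average <- runif(45, 0.2, 2.9)
Area    <- 2 + 20 * Average + rnorm(45, sd = 6)

lm.out <- lm(Area ~ Average)

# Give seq() an explicit end point so the grid spans all of the data
newx <- seq(min(Average), max(Average), by = 0.05)
conf_interval <- predict(lm.out, newdata = data.frame(Average = newx),
                         interval = "confidence", level = 0.95)

plot(Average, Area, type = "n", xlab = "Average", ylab = "Area",
     main = "Regression")
# Shade between the lower and upper confidence limits
polygon(c(newx, rev(newx)),
        c(conf_interval[, "lwr"], rev(conf_interval[, "upr"])),
        col = adjustcolor("lightblue", alpha.f = 0.5), border = NA)
abline(lm.out, col = "blue")
lines(newx, conf_interval[, "lwr"], col = "blue", lty = "dashed")
lines(newx, conf_interval[, "upr"], col = "blue", lty = "dashed")
# Points last, so the shading does not cover them
points(Average, Area)
```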