"lines()" acting like "polygon()" R? - r

I'm trying to draw two black lines around a red regression line, but the lines() command draws something that looks more like a polygon() than a simple line (see the picture below the code).
Is there a fix to simply draw two lines around the regression line (i.e., uncertainty intervals), or am I missing something?
library(rstanarm)
data(kidiq)
d <- kidiq
fit <- stan_glm(kid_score ~ mom_iq,
                data = d,
                prior = normal(0, 2.5),
                prior_intercept = normal(0, 10),
                prior_aux = cauchy(0, 100))
plot(kid_score ~ mom_iq, data = d, type = "n")
abline(fit, col = 2)
pred_lin <- posterior_linpred(fit)
loop <- length(d$mom_iq)
I <- matrix(NA, loop, 2)
for (i in 1:loop) {
  # 95% posterior interval of the linear predictor at observation i
  I[i, ] <- quantile(pred_lin[, i], c(.025, .975))
}
lines(d$mom_iq, I[, 1], lty = 2)
lines(d$mom_iq, I[, 2])

Try ordering the data first, for example by binding the x and y values into a two-column matrix and sorting it by the first column:
a <- cbind(d$mom_iq, I[,1])
a <- a[order(a[,1]),]
lines(a)
So you can also write:
lines(sort(d$mom_iq), I[,2][order(d$mom_iq)])
or simply:
apply(I, 2, function(x) lines(sort(d$mom_iq), x[order(d$mom_iq)]))
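If you would rather show the interval as a shaded ribbon than as two separate lines, the same ordering idea works with polygon(); here is a minimal sketch that assumes the d, fit, and I objects created above:
ord <- order(d$mom_iq)
plot(kid_score ~ mom_iq, data = d, type = "n")
## trace the x values forward along the lower bound, then back along the upper bound
polygon(c(d$mom_iq[ord], rev(d$mom_iq[ord])),
        c(I[ord, 1], rev(I[ord, 2])),
        col = "grey80", border = NA)
abline(fit, col = 2)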

Related

Fitting a sigmoid curve using a logistic function in R

I have data that follows a sigmoid curve and I would like to fit a logistic function to extract the three (or two) parameters for each participant. I have found some methods online, but I'm not sure which is the correct option.
This tutorial explains that you should use the nls() function like this:
fitmodel <- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))
## get the coefficients using the coef function
params=coef(fitmodel)
... where you clearly need the starting values to find the best-fitting values (?).
And then this post explains that, to get the starting values, you can use a self-starting model, which "can estimate good starting values for you, so you don't have to specify them":
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = data.frame(x, y))
However, somewhere else I also read that you should use the SSlogis function for fitting a logistic function. Could someone please confirm whether these two steps are the best way to go about it? Or should I use values extracted from previous, similar data as the starting values?
Additionally, what should I do if I don't want the logistic function to be defined by the asymptote at all?
Thank you!
There isn't a single best way: SSlogis eliminates having to set starting values, whereas specifying the formula yourself gives you more control over the parameterization.
If the question is really how to fix a at a predetermined level (here the value 1) without rewriting the formula, then set a before running nls and omit it from the starting values.
a <- 1
fo <- y ~ a / (1 + exp(-b * (x-c)))
nls(fo, start = list(b = 0.5, c = 25))
Alternatively, this substitutes a = 1 into the formula fo, giving fo2, without you having to rewrite the formula yourself.
fo2 <- do.call("substitute", list(fo, list(a = 1)))
nls(fo2, start = list(b = 0.5, c = 25))
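As a small check (assuming fo is defined as above), printing fo2 should show the formula with a replaced by 1:
fo2
## expected output: y ~ 1/(1 + exp(-b * (x - c)))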
As @G. Grothendieck writes, there is no general "best way"; it always depends on your particular aims. Using SSlogis is a good idea, as you don't need to specify start values, but defining your own function is more flexible. See the following example, where we use heuristics to derive the start values ourselves instead of specifying them manually. We then fit a logistic model and, as a small bonus, the Baranyi growth model with an explicit lag phase.
# time (t)
x <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
# Algae cell counts (Mio cells per ml)
y <- c(0.88, 1.02, 1.43, 2.79, 4.61, 7.12,
       6.47, 8.16, 7.28, 5.67, 6.91)
## we now plot the data linearly and logarithmically
## the layout function is another way to subdivide the plotting area
nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE), respect = TRUE)
layout.show(nf) # this shows how the plotting area is subdivided
plot(x, y)
plot(x, log(y))
## we see that the first points show the steepest increase,
## so we can estimate a start value of the growth rate
r <- (log(y[5]) - log(y[1])) / (x[5] - x[1])
abline(a=log(y[1]), b=r)
## this way, we have a heuristics for all start parameters:
## r: steepest increase of y in log scale
## K: maximum value
## N0: first value
## we can check this by plotting the function with the start values
f <- function(x, r, K, N0) {K / (1 + (K/N0 - 1) * exp(-r * x))}
plot(x, y, pch = 16, xlab = "time (days)", ylab = "algae (Mio cells)")
lines(x, f(x, r = r, K = max(y), N0 = y[1]), col = "blue")
pstart <- c(r = r, K = max(y), N0 = y[1])
aFit <- nls(y ~ f(x, r, K, N0), start = pstart, trace = TRUE)
x1 <- seq(0, 25, length = 100)
lines(x1, predict(aFit, data.frame(x = x1)), col = "red")
legend("topleft",
legend = c("data", "start parameters", "fitted parameters"),
col = c("black", "blue", "red"),
lty = c(0, 1, 1),
pch = c(16, NA, NA))
summary(aFit)
(Rsquared <- 1 - var(residuals(aFit))/var(y))
## =============================================================================
## Approach with Baranyi-Roberts model
## =============================================================================
## sometimes, a logistic is not good enough. In this case, use another growth
## model
baranyi <- function(x, r, K, N0, h0) {
  A <- x + 1/r * log(exp(-r * x) + exp(-h0) - exp(-r * x - h0))
  y <- exp(log(N0) + r * A - log(1 + (exp(r * A) - 1) / exp(log(K) - log(N0))))
  y
}
pstart <- c(r=0.5, K=7, N0=1, h0=2)
fit2 <- nls(y ~ baranyi(x, r, K, N0, h0), start = pstart, trace=TRUE)
lines(x1, predict(fit2, data.frame(x = x1)), col = "forestgreen", lwd=2)
legend("topleft",
legend = c("data", "logistic model", "Baranyi-Roberts model"),
col = c("black", "red", "forestgreen"),
lty = c(0, 1, 1),
pch = c(16, NA, NA))
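If you want a quick numeric comparison of the two models (a small add-on, assuming both aFit and fit2 above converged), their information criteria can be compared directly:
AIC(aFit, fit2)  # the model with the lower AIC is better supported by these data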

Conway Maxwell Distribution Density Plot

I have written my own code to simulate samples from the Conway-Maxwell distribution.
The pmf, in the re-parameterization of Guikema & Goffelt (2008), is P(X = x) = ((lamb^x / x!)^v) / Z, where Z = sum over j >= 0 of (lamb^j / j!)^v is the normalizing constant (this is what the dcomp function below computes, truncating the sum at j = 100).
However, I have run into a problem when plotting the density.
rcomp <- function(n, lamb, v) {
  u <- runif(n)
  w <- integer(n)
  for (i in 1:n) {
    # normalizing constant, truncated at j = 100
    z <- sum(sapply(0:100, function(j) ((lamb^j) / factorial(j))^v))
    x <- seq(1, 50, 1)  # seq of 1 to 50, increase by 1
    # px is the pmf of the re-parameterized Conway-Maxwell distribution
    px <- ((lamb^x / factorial(x))^v) / z
    w[i] <- if (u[i] < px[1]) 0 else max(which(cumsum(px) <= u[i]))
  }
  return(w)
}
dcomp <- function(x, lamb, v) {
  z <- sum(sapply(0:100, function(j) ((lamb^j) / factorial(j))^v))
  px <- ((lamb^x / factorial(x))^v) / z
  return(px)
}
I want to plot the density to check whether lamb or v is a location parameter, but the plot I get is weird.
x  <- rcomp(100, 6, 0.2); pdf  <- dcomp(x, 6, 0.2)
x1 <- rcomp(100, 6, 0.5); pdf1 <- dcomp(x1, 6, 0.5)
x2 <- rcomp(100, 6, 0.7); pdf2 <- dcomp(x2, 6, 0.7)
plot(x2, pdf2, type = "l", lwd = 1, lty = 1, col = "blue")
How could I solve this problem?
Source: Guikema & Goffelt (2008), A Flexible Count Data Regression Model for Risk Analysis. Risk Analysis 28(1): 215.
You have to sort the values of the x coordinate if you want the graph to connect the points in order along the axis.
Note, however, that there might be better ways to graph the density you want. See the red curve below: I first create a vector x of values within a certain range and then compute the PDF for those values. These (x, y) pairs are what lines plots.
set.seed(2673) # Make the results reproducible
x2 <- rcomp(100, 6, 0.7)
x2 <- sort(x2)
pdf2 <- dcomp(x2, 6, 0.7)
plot(x2, pdf2, type = "l", lwd = 1, lty = 1, col = "blue")
x <- seq(0, 50, length.out = 100)
y <- dcomp(x, 6, 0.2)
lines(x, y, type = "l", col = "red")
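Since the Conway-Maxwell distribution is discrete, it can also be clearer to draw the pmf as spikes over the integer support rather than as a connected curve; a minimal sketch reusing the dcomp() function from above:
xs <- 0:50
plot(xs, dcomp(xs, 6, 0.7), type = "h", lwd = 2,
     xlab = "x", ylab = "P(X = x)")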

How can I exclude certain values for curve fitting in R?

I have the following plot of some experimental data (see below). The red line is a curve fitted to the black dots, which are experimental values. Now, the first three dots at 0, 0.583, and 1.916 form a baseline, and the next two, at 2.083 and 2.416, seem to be outliers. How can I set up the fit so that it doesn't take the baseline and the outliers into account? At the moment, R is clearly also trying to optimize for those irrelevant values.
x <- c(0, 0.583333, 1.916666, 2.083333, 2.416666, 2.5,
       3.666666, 5.916666, 9, 16.75, 20)
y <- c(0.05464, 0.05453, 0.0544, 0.18043, 0.18151, 0.12551,
       0.18792, 0.2497, 0.28359, 0.31734, 0.3263)
plot(x, y, ylim = c(0, 0.45), pch = 1)
fit <- nls(y ~ -p1 / exp(x) + p1, start = list(p1 = 1))
xx <- seq(0,20, length = 200)
lines(xx, predict(fit, data.frame(x = xx)), col = "red")
To avoid fitting the first 5 points, use the subset= argument of nls, giving a vector of the (negative) positions to exclude:
nls(y ~ -p1 / exp(x) + p1, start = list(p1 = 1), subset = -seq(5))
Note that this model is actually linear in its single parameter so we could use lm instead of nls:
lm(y ~ I(1-exp(-x)) - 1, subset = -seq(5))
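Putting it together, the curve can be refitted without the first five points and then drawn over all of the data; a minimal sketch reusing x, y, and xx from above (fit_sub is just an illustrative name):
fit_sub <- nls(y ~ -p1 / exp(x) + p1, start = list(p1 = 1), subset = -seq(5))
plot(x, y, ylim = c(0, 0.45), pch = 1)
lines(xx, predict(fit_sub, data.frame(x = xx)), col = "red")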

R legend for color density scatterplot produced using smoothScatter

I am producing a color density scatterplot in R using the smoothScatter() function.
Example:
## A largish data set
n <- 10000
x1 <- matrix(rnorm(n), ncol = 2)
x2 <- matrix(rnorm(n, mean = 3, sd = 1.5), ncol = 2)
x <- rbind(x1, x2)
oldpar <- par(mfrow = c(2, 2))
smoothScatter(x, nrpoints = 0)
Output:
The issue I am having is that I am unsure how to add a legend/color scale that describes the relative difference in numeric terms between different shades. For example, there is no way to tell whether the darkest blue in the figure above is 2 times, 10 times or 100 times as dense as the lightest blue without some sort of legend or color scale. Is there any way in R to retrieve the requisite information to make such a scale, or anything built in that can produce a color scale of this nature automatically?
Here is an answer that relies on fields::image.plot and some fiddling with par(mar) to get the margins correct:
fudgeit <- function() {
  xm <- get('xm', envir = parent.frame(1))
  ym <- get('ym', envir = parent.frame(1))
  z <- get('dens', envir = parent.frame(1))
  colramp <- get('colramp', parent.frame(1))
  fields::image.plot(xm, ym, z, col = colramp(256), legend.only = TRUE, add = FALSE)
}
par(mar = c(5,4,4,5) + .1)
smoothScatter(x, nrpoints = 0, postPlotHook = fudgeit)
You can fiddle around with image.plot to get what you want and look at ?bkde2D and the transformation argument to smoothScatter to get an idea of what the colours represent.
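One way to make that legend easier to read in raw density units (a sketch, not the only option) is to pass an identity transformation, so that the colour ramp maps linearly to the 2D kernel density estimate rather than to smoothScatter's default power transform:
par(mar = c(5, 4, 4, 5) + .1)
smoothScatter(x, nrpoints = 0,
              transformation = function(d) d,  # default is function(x) x^0.25
              postPlotHook = fudgeit)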

Resources